Re: [PATCH][RFC] Add new ipa-reorder pass

2019-10-02 Thread Fangrui Song

On 2019-09-24, Martin Liška wrote:

On 9/19/19 10:33 AM, Martin Liška wrote:

- One needs modified binutils, and that would probably require a configure
detection. The only way I see is based on ld --version. I'm planning to make
the binutils submission soon.


The patch submission link:
https://sourceware.org/ml/binutils/2019-09/msg00219.html


Hi Martin,

I have a question about why .text.sorted.* are needed.

The Sony presentation (your [2] link) embedded a new section
.llvm.call-graph-profile[3] to represent edges in the object files. The
linker (lld) collects all .llvm.call-graph-profile sections and does a
C3 layout. There is no need for a new section type such as .text.sorted.*.

[3]: 
https://github.com/llvm/llvm-project/blob/master/lld/test/ELF/cgprofile-obj.s
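For contrast, the GCC-side scheme being questioned gives each selected function its own sorted text section, which the linker then orders by name. A minimal hypothetical sketch; the section name and function are illustrative only, not taken from the actual patch:

```c
#include <stdio.h>

/* Hypothetical: a function placed in its own .text.sorted.<N> section so
   that sorting sections by name realises the computed layout.  The name
   ".text.sorted.0001" is illustrative, not from the proposed patch.  */
__attribute__ ((section (".text.sorted.0001")))
int
hot_callee (int x)
{
  return x + 1;
}
```

The .llvm.call-graph-profile approach instead records the edges and lets the linker compute the order, so no such per-function sections are needed.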

(Please CC me. I am not subscribed.)


Re: [PATCH target/86811] Mark VAX as not needing speculation barriers

2019-10-02 Thread Jeff Law
On 10/2/19 8:12 PM, Maciej W. Rozycki wrote:
> On Fri, 20 Sep 2019, Jeff Law wrote:
> 
>>> According to Bob Supnik (who is a very authoritative source on VAX),
>>>
>>>> Funny you should ask. The short answer is no. No VAX ever did
>>>> speculative or out of order execution.
>>>
>>> As such, marking VAX as not needing speculation barriers.
>>>
>>>
>>> PR target/86811
>>> * config/vax/vax.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
>>> Define to speculation_safe_value_not_needed.
>> Installed on the trunk.
> 
>  I don't think this is right.  As I have just mentioned in a related 
> discussion elsewhere, the NVAX and NVAX+ implementations include a branch 
> predictor in their microarchitecture[1], so obviously they do execute 
> speculatively.  I think this change would best be reverted and the issue 
> further investigated.
Simply having branch prediction and simple speculation onto one side of
the branch isn't sufficient.  Otherwise a whole lot more processors
would, in theory, be vulnerable.

IIRC you also need some degree of out-of-order execution, or at least the
ability to keep speculating past memory accesses, plus high enough quality
timers to distinguish between a cache hit and a cache miss.
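For concreteness, the load pattern that speculation barriers guard can be sketched in C. The builtin is GCC's real __builtin_speculation_safe_value; the __has_builtin fallback macro is our own addition so the sketch also compiles where the builtin is absent:

```c
#include <stddef.h>

/* Guard an index against speculative out-of-bounds use.  On compilers
   without the builtin we fall back to the plain value, which is exactly
   what speculation_safe_value_not_needed expresses for in-order CPUs.  */
#if defined(__has_builtin)
# if __has_builtin(__builtin_speculation_safe_value)
#  define SPEC_GUARD(x) __builtin_speculation_safe_value (x)
# endif
#endif
#ifndef SPEC_GUARD
# define SPEC_GUARD(x) (x)  /* no barrier needed on a strictly in-order CPU */
#endif

static const int table[4] = { 10, 20, 30, 40 };

int
safe_load (size_t idx)
{
  if (idx < 4)
    /* Without a barrier, an out-of-order CPU may read table[idx]
       speculatively before the bounds check retires.  */
    return table[SPEC_GUARD (idx)];
  return -1;
}
```

On a machine that never speculates, the unguarded fallback is semantically identical, which is the point of the VAX change under discussion.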


All this seems fairly academic for the vax.


jeff


Re: [PATCH target/86811] Mark VAX as not needing speculation barriers

2019-10-02 Thread Maciej W. Rozycki
On Fri, 20 Sep 2019, Jeff Law wrote:

> > According to Bob Supnik (who is a very authoritative source on VAX),
> > 
> >> Funny you should ask. The short answer is no. No VAX ever did
> >> speculative or out of order execution.
> > 
> > As such, marking VAX as not needing speculation barriers.
> > 
> > 
> > PR target/86811
> > * config/vax/vax.c (TARGET_HAVE_SPECULATION_SAFE_VALUE):
> > Define to speculation_safe_value_not_needed.
> Installed on the trunk.

 I don't think this is right.  As I have just mentioned in a related 
discussion elsewhere, the NVAX and NVAX+ implementations include a branch 
predictor in their microarchitecture[1], so obviously they do execute 
speculatively.  I think this change would best be reverted and the issue 
further investigated.

References:

[1] G. Michael Uhler et al., "The NVAX and NVAX+ High-performance VAX
Microprocessors", Digital Technical Journal, Vol. 4 No. 3, Summer 1992



  Maciej


Define WIDTH macros for C2x

2019-10-02 Thread Joseph Myers
As part of the integration of TS 18661-1 into C2x, many features
became unconditional, no longer depending on any feature test macro
being defined.  This patch updates the conditionals on the *_WIDTH
macros in limits.h and stdint.h accordingly so that they are defined
for C2x.  The macro CR_DECIMAL_DIG in float.h does still require
__STDC_WANT_IEC_60559_BFP_EXT__ to be defined, and a test for this is
added.
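As a sketch of what this buys user code: under -std=c2x the width macros are simply available from limits.h, while older modes can fall back to computing the value. The fallback below is our own illustration and assumes the type has no padding bits, an assumption the real macros do not need:

```c
#include <limits.h>

/* Portable fallback mirroring what INT_WIDTH denotes; with this patch and
   -std=c2x the macro is predefined and the fallback is never used.
   Assumes int has no padding bits.  */
#ifndef INT_WIDTH
# define INT_WIDTH (sizeof (int) * CHAR_BIT)
#endif

unsigned
int_width (void)
{
  return (unsigned) INT_WIDTH;
}
```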

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc:
2019-10-02  Joseph Myers  

* ginclude/stdint-gcc.h [__STDC_WANT_IEC_60559_BFP_EXT__]: Change
condition on WIDTH macros to [__STDC_WANT_IEC_60559_BFP_EXT__ ||
(__STDC_VERSION__ && __STDC_VERSION__ > 201710L)].
* glimits.h: Likewise.

gcc/testsuite:
2019-10-02  Joseph Myers  

* gcc.dg/cr-decimal-dig-2.c: New test.
* gcc.dg/limits-width-2.c: New test.  Based on limits-width-1.c.
* gcc.dg/stdint-width-2.c: New test.  Based on stdint-width-1.c.

Index: gcc/ginclude/stdint-gcc.h
===
--- gcc/ginclude/stdint-gcc.h   (revision 276455)
+++ gcc/ginclude/stdint-gcc.h   (working copy)
@@ -260,8 +260,9 @@
 #endif /* (!defined __cplusplus || __cplusplus >= 201103L
   || defined __STDC_CONSTANT_MACROS) */
 
-#ifdef __STDC_WANT_IEC_60559_BFP_EXT__
-/* TS 18661-1 widths of integer types.  */
+#if (defined __STDC_WANT_IEC_60559_BFP_EXT__ \
+     || (defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L))
+/* TS 18661-1 / C2X widths of integer types.  */
 
 #ifdef __INT8_TYPE__
 # undef INT8_WIDTH
Index: gcc/glimits.h
===
--- gcc/glimits.h   (revision 276455)
+++ gcc/glimits.h   (working copy)
@@ -123,8 +123,9 @@
 # define ULONG_LONG_MAX (LONG_LONG_MAX * 2ULL + 1ULL)
 #endif
 
-#ifdef __STDC_WANT_IEC_60559_BFP_EXT__
-/* TS 18661-1 widths of integer types.  */
+#if (defined __STDC_WANT_IEC_60559_BFP_EXT__ \
+     || (defined (__STDC_VERSION__) && __STDC_VERSION__ > 201710L))
+/* TS 18661-1 / C2X widths of integer types.  */
 # undef CHAR_WIDTH
 # define CHAR_WIDTH __SCHAR_WIDTH__
 # undef SCHAR_WIDTH
Index: gcc/testsuite/gcc.dg/cr-decimal-dig-2.c
===
--- gcc/testsuite/gcc.dg/cr-decimal-dig-2.c (nonexistent)
+++ gcc/testsuite/gcc.dg/cr-decimal-dig-2.c (working copy)
@@ -0,0 +1,10 @@
+/* Test TS 18661-1 CR_DECIMAL_DIG: not in C2X without
+   __STDC_WANT_IEC_60559_BFP_EXT__ defined.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x" } */
+
+#include <float.h>
+
+#ifdef CR_DECIMAL_DIG
+#error "CR_DECIMAL_DIG defined"
+#endif
Index: gcc/testsuite/gcc.dg/limits-width-2.c
===
--- gcc/testsuite/gcc.dg/limits-width-2.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/limits-width-2.c   (working copy)
@@ -0,0 +1,54 @@
+/* Test C2X width macros in <limits.h>.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x" } */
+
+#include <limits.h>
+
+#define CHECK_WIDTH(TYPE, MAX, WIDTH)  \
+  _Static_assert ((MAX >> ((TYPE) -1 < 0 ? (WIDTH - 2) : (WIDTH - 1))) == 1, \
+ "width must match type")
+
+#ifndef CHAR_WIDTH
+# error "missing CHAR_WIDTH"
+#endif
+CHECK_WIDTH (char, CHAR_MAX, CHAR_WIDTH);
+#ifndef SCHAR_WIDTH
+# error "missing SCHAR_WIDTH"
+#endif
+CHECK_WIDTH (signed char, SCHAR_MAX, SCHAR_WIDTH);
+#ifndef UCHAR_WIDTH
+# error "missing UCHAR_WIDTH"
+#endif
+CHECK_WIDTH (unsigned char, UCHAR_MAX, UCHAR_WIDTH);
+#ifndef SHRT_WIDTH
+# error "missing SHRT_WIDTH"
+#endif
+CHECK_WIDTH (signed short, SHRT_MAX, SHRT_WIDTH);
+#ifndef USHRT_WIDTH
+# error "missing USHRT_WIDTH"
+#endif
+CHECK_WIDTH (unsigned short, USHRT_MAX, USHRT_WIDTH);
+#ifndef INT_WIDTH
+# error "missing INT_WIDTH"
+#endif
+CHECK_WIDTH (signed int, INT_MAX, INT_WIDTH);
+#ifndef UINT_WIDTH
+# error "missing UINT_WIDTH"
+#endif
+CHECK_WIDTH (unsigned int, UINT_MAX, UINT_WIDTH);
+#ifndef LONG_WIDTH
+# error "missing LONG_WIDTH"
+#endif
+CHECK_WIDTH (signed long, LONG_MAX, LONG_WIDTH);
+#ifndef ULONG_WIDTH
+# error "missing ULONG_WIDTH"
+#endif
+CHECK_WIDTH (unsigned long, ULONG_MAX, ULONG_WIDTH);
+#ifndef LLONG_WIDTH
+# error "missing LLONG_WIDTH"
+#endif
+CHECK_WIDTH (signed long long, LLONG_MAX, LLONG_WIDTH);
+#ifndef ULLONG_WIDTH
+# error "missing ULLONG_WIDTH"
+#endif
+CHECK_WIDTH (unsigned long long, ULLONG_MAX, ULLONG_WIDTH);
Index: gcc/testsuite/gcc.dg/stdint-width-2.c
===
--- gcc/testsuite/gcc.dg/stdint-width-2.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/stdint-width-2.c   (working copy)
@@ -0,0 +1,175 @@
+/* Test C2X width macros in <stdint.h>.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x -ffreestanding" } */
+/* { dg-additional-options "-DSIGNAL_SUPPRESS" { target { ! signal } } } */
+
+#include <stdint.h>
+#include <limits.h>
+#ifndef SIGNAL_SUPPRESS
+#include <signal.h>
+#endif

Re: [SVE] PR86753

2019-10-02 Thread Prathamesh Kulkarni
On Wed, 25 Sep 2019 at 09:17, Prathamesh Kulkarni
 wrote:
>
> On Mon, 16 Sep 2019 at 08:54, Prathamesh Kulkarni
>  wrote:
> >
> > On Mon, 9 Sep 2019 at 09:36, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> > >  wrote:
> > > >
> > > > Prathamesh Kulkarni  writes:
> > > > > With patch, the only following FAIL remains for aarch64-sve.exp:
> > > > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > > > scan-assembler-times \\tmovprfx\\t 6
> > > > > which now contains 14.
> > > > > Should I adjust the test, assuming the change isn't a regression ?
> > > >
> > > > Well, it is kind-of a regression, but it really just means that the
> > > > integer code is now consistent with the floating-point code in having
> > > > an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> > > > Presumably any future fix for the existing redundant MOVPRFXs will
> > > > apply to the new ones as well.
> > > >
> > > > The patch looks good to me, just some very minor nits:
> > > >
> > > > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > > > >
> > > > >  /* Record that a fully-masked version of LOOP_VINFO would need MASKS 
> > > > > to
> > > > > contain a sequence of NVECTORS masks that each control a vector 
> > > > > of type
> > > > > -   VECTYPE.  */
> > > > > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used for 
> > > > > corresponding
> > > > > +   load/store stmt.  */
> > > >
> > > > Should be two spaces between sentences.  Maybe:
> > > >
> > > >VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would AND
> > > >these vector masks with the vector version of SCALAR_MASK.  */
> > > >
> > > > since the mask isn't necessarily for a load or store statement.
> > > >
> > > > > [...]
> > > > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, tree, 
> > > > > tree, stmt_vec_info,
> > > > > says how the load or store is going to be implemented and 
> > > > > GROUP_SIZE
> > > > > is the number of load or store statements in the containing group.
> > > > > If the access is a gather load or scatter store, GS_INFO describes
> > > > > -   its arguments.
> > > > > +   its arguments. SCALAR_MASK is the scalar mask used for 
> > > > > corresponding
> > > > > +   load or store stmt.
> > > >
> > > > Maybe:
> > > >
> > > >its arguments.  If the load or store is conditional, SCALAR_MASK is 
> > > > the
> > > >condition under which it occurs.
> > > >
> > > > since SCALAR_MASK can be null here too.
> > > >
> > > > > [...]
> > > > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info 
> > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > >/* Handle cond expr.  */
> > > > >for (j = 0; j < ncopies; j++)
> > > > >  {
> > > > > +  tree loop_mask = NULL_TREE;
> > > > > +  bool swap_cond_operands = false;
> > > > > +
> > > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > > + {
> > > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > > + {
> > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > > vectype, j);
> > > > > + }
> > > > > +   else
> > > > > + {
> > > > > +   cond.code = invert_tree_comparison (cond.code,
> > > > > +   HONOR_NANS (TREE_TYPE 
> > > > > (cond.op0)));
> > > >
> > > > Long line.  Maybe just split it out into a separate assignment:
> > > >
> > > >   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > >   cond.code = invert_tree_comparison (cond.code, 
> > > > honor_nans);
> > > >
> > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > > + {
> > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS 
> > > > > (loop_vinfo);
> > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > > vectype, j);
> > > >
> > > > Long line here too.
> > > >
> > > > > [...]
> > > > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info 
> > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > >   }
> > > > >   }
> > > > >   }
> > > > > +
> > > > > +   if (loop_mask)
> > > > > + {
> > > > > +   if (COMPARISON_CLASS_P (vec_compare))
> > > > > + {
> > > > > +   tree tmp = make_ssa_name (vec_cmp_type);
> > > > > +   gassign *g = gimple_build_assign (tmp,
> > > > > + TREE_CODE 
> > > > > (vec_compare),
> > > > > + TREE_OPERAND 
> > > > > (vec_compare, 0),
> > > > > + TREE_OPERAND 
> > > > > (vec_compare, 1));
> > > >
> > > 
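A reduced sketch of the kind of if-converted loop this PR concerns (not the PR's actual testcase). When vectorized for SVE with full masking, each iteration's loop mask is ANDed with the vector form of the scalar condition x[i] > 0, the SCALAR_MASK discussed in the review above:

```c
/* Conditional update: the vectorizer if-converts the branch into a
   masked operation, and the patch lets the loop mask be reused when the
   same scalar condition already controls a masked load/store.  */
void
cond_update (int *x, int n)
{
  for (int i = 0; i < n; i++)
    if (x[i] > 0)
      x[i] = x[i] + 1;
}
```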

[PATCH] mention referenced object in more -Wstringop-overflow instances

2019-10-02 Thread Martin Sebor

The attached patch adds an optional argument to
compute_builtin_object_size to let it return the DECL of the object
whose size it's called to compute.  This lets -Wstringop-overflow
point to the object in more instances of the warning than it does
now (but by no means all of them -- more work is needed to make
that happen).  This can be helpful when the object is declared in
a different function than the access.
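A minimal illustration of the underlying query (the improved warning itself is not shown here): for a direct reference to a declared object, the size that compute_builtin_object_size reports is the object's DECL_SIZE_UNIT, and with this patch the DECL itself can be returned alongside it:

```c
#include <stddef.h>

/* The object whose DECL the patch lets the warning point at.  */
char buf[4];

size_t
buf_size (void)
{
  /* __builtin_object_size folds to the declared size for a direct
     reference to a known object.  */
  return __builtin_object_size (buf, 0);
}
```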

Tested on x86_64-linux.

Martin

gcc/ChangeLog:

	* builtins.c (compute_objsize): Add an argument.
	* tree-object-size.c (addr_object_size): Same.
	(compute_builtin_object_size): Same.
	* tree-object-size.h (compute_builtin_object_size): Same.

gcc/testsuite/ChangeLog:

	* gcc.dg/Wstringop-overflow-17.c: New test.

Index: gcc/builtins.c
===
--- gcc/builtins.c	(revision 276472)
+++ gcc/builtins.c	(working copy)
@@ -3587,7 +3587,7 @@ compute_objsize (tree dest, int ostype, tree *pdec
   /* Only the two least significant bits are meaningful.  */
   ostype &= 3;
 
-  if (compute_builtin_object_size (dest, ostype, &size))
+  if (compute_builtin_object_size (dest, ostype, &size, pdecl))
 return build_int_cst (sizetype, size);
 
   if (TREE_CODE (dest) == SSA_NAME)
Index: gcc/tree-object-size.h
===
--- gcc/tree-object-size.h	(revision 276472)
+++ gcc/tree-object-size.h	(working copy)
@@ -22,6 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern void init_object_sizes (void);
 extern void fini_object_sizes (void);
-extern bool compute_builtin_object_size (tree, int, unsigned HOST_WIDE_INT *);
+extern bool compute_builtin_object_size (tree, int, unsigned HOST_WIDE_INT *,
+	 tree * = NULL);
 
 #endif  // GCC_TREE_OBJECT_SIZE_H
Index: gcc/tree-object-size.c
===
--- gcc/tree-object-size.c	(revision 276491)
+++ gcc/tree-object-size.c	(working copy)
@@ -54,7 +54,8 @@ static const unsigned HOST_WIDE_INT unknown[4] = {
 
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
-			  const_tree, int, unsigned HOST_WIDE_INT *);
+			  const_tree, int, unsigned HOST_WIDE_INT *,
+			  tree * = NULL);
 static unsigned HOST_WIDE_INT alloc_object_size (const gcall *, int);
 static tree pass_through_call (const gcall *);
 static void collect_object_sizes_for (struct object_size_info *, tree);
@@ -172,10 +173,15 @@ compute_object_offset (const_tree expr, const_tree
 
 static bool
 addr_object_size (struct object_size_info *osi, const_tree ptr,
-		  int object_size_type, unsigned HOST_WIDE_INT *psize)
+		  int object_size_type, unsigned HOST_WIDE_INT *psize,
+		  tree *pdecl /* = NULL */)
 {
   tree pt_var, pt_var_size = NULL_TREE, var_size, bytes;
 
+  tree dummy;
+  if (!pdecl)
+    pdecl = &dummy;
+
   gcc_assert (TREE_CODE (ptr) == ADDR_EXPR);
 
   /* Set to unknown and overwrite just before returning if the size
@@ -195,7 +201,7 @@ addr_object_size (struct object_size_info *osi, co
 	  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
 	{
 	  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
-   object_size_type & ~1, &sz);
+   object_size_type & ~1, &sz, pdecl);
 	}
   else
 	{
@@ -232,7 +238,10 @@ addr_object_size (struct object_size_info *osi, co
 	   && DECL_P (pt_var)
 	   && tree_fits_uhwi_p (DECL_SIZE_UNIT (pt_var))
 	   && tree_to_uhwi (DECL_SIZE_UNIT (pt_var)) < offset_limit)
-    pt_var_size = DECL_SIZE_UNIT (pt_var);
+    {
+      *pdecl = pt_var;
+      pt_var_size = DECL_SIZE_UNIT (pt_var);
+    }
   else if (pt_var
 	   && TREE_CODE (pt_var) == STRING_CST
 	   && TYPE_SIZE_UNIT (TREE_TYPE (pt_var))
@@ -478,13 +487,16 @@ pass_through_call (const gcall *call)
 
 
 /* Compute __builtin_object_size value for PTR and set *PSIZE to
-   the resulting value.  OBJECT_SIZE_TYPE is the second argument
-   to __builtin_object_size.  Return true on success and false
-   when the object size could not be determined.  */
+   the resulting value.  If the declared object is known and PDECL
+   is nonnull, sets *PDECL to the object's DECL.  OBJECT_SIZE_TYPE
+   is the second argument to __builtin_object_size.
+   Returns true on success and false when the object size could not
+   be determined.  */
 
 bool
 compute_builtin_object_size (tree ptr, int object_size_type,
-			 unsigned HOST_WIDE_INT *psize)
+			 unsigned HOST_WIDE_INT *psize,
+			 tree *pdecl /* = NULL */)
 {
   gcc_assert (object_size_type >= 0 && object_size_type <= 3);
 
@@ -496,7 +508,7 @@ compute_builtin_object_size (tree ptr, int object_
 init_offset_limit ();
 
   if (TREE_CODE (ptr) == ADDR_EXPR)
-    return addr_object_size (NULL, ptr, object_size_type, psize);
+    return addr_object_size (NULL, ptr, object_size_type, psize, pdecl);
 
   if (TREE_CODE (ptr) != SSA_NAME
   || !POINTER_TYPE_P (TREE_TYPE (ptr)))
@@ -520,7 

Re: Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 10:22:22PM +0100, Richard Sandiford wrote:
> My change to the -fipa-ra bookkeeping used ALL_REGS as the supposedly
> safe default assumption, but ALL_REGS isn't literally all registers,
> just a close approximation.
> 
> This caused a bootstrap failure on arm-linux-gnu, where the condition
> code register isn't in ALL_REGS and so was being masked out of some
> call-clobbered sets.

The documentation says

@findex ALL_REGS
@findex NO_REGS
In general, each register will belong to several classes.  In fact, one
class must be named @code{ALL_REGS} and contain all the registers.  Another
class must be named @code{NO_REGS} and contain no registers.  Often the
union of two classes will be another class; however, this is not required.

so is the arm port wrong, or is the documentation wrong?  I think the arm
port simply forgets to include CC_REG, VFPCC_REG, SFP_REG, AFP_REG:

#define REG_CLASS_CONTENTS  \
...
  { 0x00000000, 0x00000000, 0x00000000, 0x00000010 }, /* CC_REG */  \
  { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */   \
  { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */ \
  { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */ \
  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */\

and that last number should be 0x000000ff instead?
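A quick check of that arithmetic, as a standalone sketch rather than part of the port: OR-ing the four special classes' last words into ALL_REGS' last word sets the four missing bits:

```c
/* ALL_REGS' final mask word should cover the four single-register
   classes listed above: CC_REG, VFPCC_REG, SFP_REG, AFP_REG.  */
unsigned
all_regs_last_word (void)
{
  unsigned cc = 0x10, vfpcc = 0x20, sfp = 0x40, afp = 0x80;
  return 0x0f | cc | vfpcc | sfp | afp;  /* 0x0f | 0xf0 == 0xff */
}
```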



Segher


Re: [PATCH], V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 03:04:35PM -0400, Michael Meissner wrote:
> On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote:
> > On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> > > I needed
> > > to add a second memory constraint ('eM') that prevents using a prefixed
> > > address.
> > 
> > Do we need both em and eM?  Do we ever want to allow prefixed insns but not
> > pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
> > prefix trick won't work (because there are multiple insns in the template);
> > we could solve that some other way (by inserting the p's manually for
> > example).
> 
No, right now we need one (no prefix), and the other (no pcrel) is desirable,
but if we can only have one, it will mean extra instructions being generated.

We can have both if we need to, but it should have less confusing names at
a minimum.

> In the case of no prefix, we need this for stack_protect_testdi and
> stack_protect_setdi.  There if we have a large stack frame and enable stack
protection, we don't want the register allocator to re-combine the insns into
an insn with a large offset, which it will do, even though the predicate does
> not allow this case.  This was discovered when we tried to build glibc, and 
> one
> module (vfwprintf-internal.c) has a large stack frame and is built with
> -fstack-protector-all.

Yes, but why does it not allow prefixed insns anyway?  It does not currently
*handle* prefixed insns properly, but that can be fixed.  It won't be
pretty, but it won't be horrible either.

Anyway, we need to be able to handle non-prefixed anyway (for asm, as I
mentioned later), so yes we want a constraint for that, and have "m" in
inline asm mean that (just like right now it actually means "m but not
update form").

> In the case of no pc-rel, this occurs in optimizing vector extracts where the
> vector is in memory.  In this code, we only have one temporary base register.

Why?

> In the case where the address is PC-relative, and the element number being
> extracted is variable, we would need two temporary base registers, one to hold
> the PC-relative address, and the other to hold the offset from the start of 
> the
> vector.  So here, we disallow PC-relative addresses, but numeric addresses 
> that
> result in a prefixed instruction are fine, since the code calculates the
> offset, adds in the offset, and then does the memory operation.

So it should have more scratch registers here?

Or, alternatively, we can just disallow all prefixed addressing here?  Will
that really degrade anything?

> If we only had a no prefixed constraint, the code would not combine the vector
> extract from memory, and instead, it would load the whole vector into a
> register, and then do the appropriate shifts, VSLO, etc. to extract the
> element.  I imagine the case shows up when you have large stack frames (or 
> very
> large structures).

I don't understand.

> > But what should inline asm users do wrt prefixed/pcrel memory?  Should
> > they not use prefixed memory at all?  That means for asm we should always
> > disallow prefixed for "m".
> 
> Yes, I've been thinking that for some time.  But I'm not going to worry about
> that until the patches are in.

Please do worry about it.

> > > 4) In the previous patch, I missed setting the prefixed size and non 
> > > prefixed
> > > size for mov<mode>_ppc64 in the insn.  This pattern is used for moving 
> > > PTImode
> > > in GPR registers (on non-VSX systems, it would move TImode also).  By the 
> > > time
> > > it gets to final, it will have been split, but it is still useful to get 
> > > the
> > > sizes correct before the mode is split.
> > 
> > So that is a separate patch?  Please send it as one, then?
> 
> No, it needs to be part of this patch.  It was just missing from the patch I
> sent out.

This patch does a whole lot of separate things.  It needs to be split up,
it took ages to review it like this.

> > > +  /* The LWA instruction uses the DS-form format where the bottom two 
> > > bits of
> > > + the offset must be 0.  The prefixed PLWA does not have this
> > > + restriction.  */
> > > +  if (TARGET_PREFIXED_ADDR
> > > +  && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > > +return true;
> > 
> > Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
> > part of all its callers?
> 
> Just trying to do the test before the call, so the default case (not 'future')
> won't have the call/return.

Don't micro-optimise like this please, the compiler can do this better
than humans can.  And humans *can* forget it in places, which gives ICEs.

> > What is INSN_FORM_BAD about, in both functions?
> 
> Because I think it is a bad idea to return true in the case where the memory 
> is
> not correct (for example, if somebody created an address with 35-bit offsets,
> the address_to_insn_form would return INSN_FORM_BAD, but you would get a false
> positive if non_pcrel_memory retur

Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-02 Thread Aaron Sawdey
On 10/2/19 5:35 PM, Jakub Jelinek wrote:
> On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
 2019-09-27  Aaron Sawdey 

* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
(expand_builtin_memcpy): Use might_overlap parm.
(expand_builtin_mempcpy_args): Use might_overlap parm.
(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
(expand_builtin_memory_copy_args): Add might_overlap parm.
* expr.c (emit_block_move_via_cpymem): Rename to
emit_block_move_via_pattern, add might_overlap parm, use cpymem
or movmem optab as appropriate.
(emit_block_move_hints): Add might_overlap parm, do the right
thing for might_overlap==true.
* expr.h (emit_block_move_hints): Update prototype.
> 
>> @@ -1622,13 +1624,30 @@
>>set_mem_size (y, const_size);
>>  }
>>
>> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
>> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
>> +  bool pattern_ok = false;
>> +
>> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
> ...
> 
> This change broke rtl checking bootstrap.
> You can't use INTVAL on size that isn't CONST_INT_P.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> committed to trunk as obvious:

Jakub,
  Sorry about that! Now that you point it out, it's obvious. But what it means
for me is that I need to be in the habit of bootstrapping with 
--enable-checking=rtl
when I make these changes.

  Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain


Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-02 Thread Jakub Jelinek
On Wed, Oct 02, 2019 at 09:21:23AM -0500, Aaron Sawdey wrote:
> >> 2019-09-27  Aaron Sawdey 
> >>
> >>* builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
> >>(expand_builtin_memcpy): Use might_overlap parm.
> >>(expand_builtin_mempcpy_args): Use might_overlap parm.
> >>(expand_builtin_memmove): Call expand_builtin_memory_copy_args.
> >>(expand_builtin_memory_copy_args): Add might_overlap parm.
> >>* expr.c (emit_block_move_via_cpymem): Rename to
> >>emit_block_move_via_pattern, add might_overlap parm, use cpymem
> >>or movmem optab as appropriate.
> >>(emit_block_move_hints): Add might_overlap parm, do the right
> >>thing for might_overlap==true.
> >>* expr.h (emit_block_move_hints): Update prototype.

> @@ -1622,13 +1624,30 @@
>set_mem_size (y, const_size);
>  }
> 
> -  if (CONST_INT_P (size) && can_move_by_pieces (INTVAL (size), align))
> +  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
> +  bool pattern_ok = false;
> +
> +  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
...

This change broke rtl checking bootstrap.
You can't use INTVAL on size that isn't CONST_INT_P.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious:

2019-10-03  Jakub Jelinek  

PR rtl-optimization/91976
* expr.c (emit_block_move_hints): Don't call can_move_by_pieces if
size is not CONST_INT_P, set pieces_ok to false in that case.  Simplify
CONST_INT_P (size) && pieces_ok to pieces_ok.  Formatting fix.

--- gcc/expr.c.jj   2019-10-02 16:35:20.977451346 +0200
+++ gcc/expr.c  2019-10-02 21:47:54.900724874 +0200
@@ -1624,16 +1624,18 @@ emit_block_move_hints (rtx x, rtx y, rtx
   set_mem_size (y, const_size);
 }
 
-  bool pieces_ok = can_move_by_pieces (INTVAL (size), align);
+  bool pieces_ok = false;
+  if (CONST_INT_P (size))
+pieces_ok = can_move_by_pieces (INTVAL (size), align);
   bool pattern_ok = false;
 
-  if (!CONST_INT_P (size) || !pieces_ok || might_overlap)
+  if (!pieces_ok || might_overlap)
 {
-  pattern_ok = 
-   emit_block_move_via_pattern (x, y, size, align,
-expected_align, expected_size,
-min_size, max_size, probable_max_size,
-might_overlap);
+  pattern_ok
+   = emit_block_move_via_pattern (x, y, size, align,
+  expected_align, expected_size,
+  min_size, max_size, probable_max_size,
+  might_overlap);
   if (!pattern_ok && might_overlap)
{
  /* Do not try any of the other methods below as they are not safe
@@ -1645,7 +1647,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
 
   if (pattern_ok)
 ;
-  else if (CONST_INT_P (size) && pieces_ok)
+  else if (pieces_ok)
 move_by_pieces (x, y, INTVAL (size), align, RETURN_BEGIN);
   else if (may_use_call && !might_overlap
   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))


Jakub


Re: [PATCH] Builtin fadd variants folding implementation

2019-10-02 Thread Joseph Myers
On Wed, 11 Sep 2019, Tejas Joshi wrote:

> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 8bb7027aac7..2df616c477e 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -355,6 +355,9 @@ DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_FABS, "fabs", 
> FABS_TYPE, ATTR_CONST_NOT
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD32, "fabsd32", 
> BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD64, "fabsd64", 
> BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD128, "fabsd128", 
> BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_FADD, "fadd", BT_FN_FLOAT_DOUBLE_DOUBLE, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_FADDL, "faddl", 
> BT_FN_FLOAT_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_DADDL, "daddl", 
> BT_FN_DOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)

I think these should all use ATTR_MATHFN_FPROUNDING_ERRNO instead of 
ATTR_CONST_NOTHROW_LEAF_LIST.  That way the attributes depend properly on 
the command-line options.

>  DEF_C99_BUILTIN(BUILT_IN_FDIM, "fdim", BT_FN_DOUBLE_DOUBLE_DOUBLE, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>  DEF_C99_BUILTIN(BUILT_IN_FDIMF, "fdimf", BT_FN_FLOAT_FLOAT_FLOAT, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>  DEF_C99_BUILTIN(BUILT_IN_FDIML, "fdiml", 
> BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
> diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
> index 3a14d2a41c1..f6b4508a101 100644
> --- a/gcc/fold-const-call.c
> +++ b/gcc/fold-const-call.c
> @@ -570,6 +570,44 @@ fold_const_nextafter (real_value *result, const 
> real_value *arg0,
>return true;
>  }
>  
> +/* Try to evaluate:
> +
> +  *RESULT = add (*ARG0, *ARG1)
> +
> +   in format FORMAT.  Return true on success.  */

It's not specifically "add".  It's an operation determined by ICODE.

> +
> +static bool
> +fold_const_narrow_binary (real_value *result, const real_value *arg0,
> +   int icode, const real_value *arg1,
> +   const real_format *format)
> +{
> +  if (REAL_VALUE_ISSIGNALING_NAN (*arg0)
> +  || REAL_VALUE_ISSIGNALING_NAN (*arg1))
> +return false;
> +
> +  real_arithmetic (result, icode, arg0, arg1);
> +  /* Underflow condition.  */
> +  if (flag_errno_math
> +  && result->cl == rvc_normal
> +  && REAL_EXP (result) < format->emin)
> +return false;
> +
> +  if (!exact_real_truncate (format, result)
> +  && (flag_rounding_math || flag_trapping_math))
> +return false;
> +
> +  real_convert (result, format, result);
> +  /* Overflow condition.  */
> +  if (!real_isfinite (result) && flag_errno_math)
> +return false;

For overflow, this should be checking for real_isinf, not for 
!real_isfinite.  And specifically for real_isinf with both arguments 
finite, it's not an overflow to have an infinite result from an infinite 
argument.  (Strictly, infinite result with finite arguments could be 
either overflow or divide-by-zero, depending on whether it's an exact or 
inexact infinite result, but both those cases should be handled the same 
here.)

> +  if (REAL_VALUE_ISNAN (*result)
> +  && (flag_errno_math || flag_trapping_math))
> +return false;

This last condition should only apply if neither argument is NaN; it's 
fine to fold to a NaN result when one or both arguments is a quiet NaN 
(and signaling NaNs were handled above).

There is one other case that also needs excluding in this function.  If 
both arguments are zero, and the operation is either adding two zeroes of 
different sign, or subtracting two zeroes of the same sign, it's not valid 
to fold if flag_rounding_math, because the sign of the result of 0 - 0 
depends on the rounding mode.

> diff --git a/gcc/testsuite/gcc.dg/builtin-fadd-5.c 
> b/gcc/testsuite/gcc.dg/builtin-fadd-5.c
> new file mode 100644
> index 000..ac3b547724c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/builtin-fadd-5.c
> @@ -0,0 +1,26 @@
> +/* { dg-do link } */
> +/* { dg-options "-O2 -fno-trapping-math -fno-math-errno" } */
> +
> +extern int link_error (int);
> +
> +#define TEST(FN, VAL1, VAL2, RESULT) \
> +  if (__builtin_##FN (VAL1, VAL2) != RESULT) link_error (__LINE__);
> +
> +int
> +main (void)
> +{
> +  TEST(fadd, __DBL_MAX__, __DBL_MAX__, __builtin_inff());
> +  TEST(fadd, __FLT_MAX__, __FLT_MAX__, __builtin_inff());
> +  TEST(fadd, __DBL_MIN__, __DBL_MIN__, 0.0);
> +
> +  TEST(faddl, __LDBL_MAX__, __LDBL_MAX__, __builtin_inff());
> +  TEST(faddl, __FLT_MAX__, __FLT_MAX__, __builtin_inff());
> +  TEST(faddl, __LDBL_MIN__, __LDBL_MIN__, 0.0);
> +
> +  TEST(daddl, __LDBL_MAX__, __LDBL_MAX__, __builtin_inf());
> +  TEST(daddl, __DBL_MAX__, __DBL_MAX__, __builtin_inf());
> +  TEST(daddl, __LDBL_MIN__, __LDBL_MIN__, 0.0);

In the case where long double = double, this last test isn't correct, as 
LDBL_MIN + LDBL_MIN == DBL_MIN 

Fix ALL_REGS thinko in initialisation of function_used_regs

2019-10-02 Thread Richard Sandiford
My change to the -fipa-ra bookkeeping used ALL_REGS as the supposedly
safe default assumption, but ALL_REGS isn't literally all registers,
just a close approximation.

This caused a bootstrap failure on arm-linux-gnu, where the condition
code register isn't in ALL_REGS and so was being masked out of some
call-clobbered sets.

Tested on arm-linux-gnueabihf (which now bootstraps) and applied
as obvious as r276489.  Sorry for the breakage...

Richard


2019-10-02  Richard Sandiford  

gcc/
* cgraph.c (cgraph_node::rtl_info): Use SET_HARD_REG_SET
instead of reg_class_contents[ALL_REGS].

Index: gcc/cgraph.c
===
--- gcc/cgraph.c2019-10-02 14:11:33.275497853 +0100
+++ gcc/cgraph.c2019-10-02 22:18:36.880570656 +0100
@@ -1866,7 +1866,7 @@ cgraph_node::rtl_info (const_tree decl)
   if (node->rtl == NULL)
 {
      node->rtl = ggc_cleared_alloc<cgraph_rtl_info> ();
-  node->rtl->function_used_regs = reg_class_contents[ALL_REGS];
+  SET_HARD_REG_SET (node->rtl->function_used_regs);
 }
   return node->rtl;
 }


Re: [PATCH 4/4] Modifications to the testsuite

2019-10-02 Thread Segher Boessenkool
On Wed, Oct 02, 2019 at 09:29:22PM +0200, Andreas Schwab wrote:
> FAIL: gcc.dg/ipa/ipa-sra-19.c (test for excess errors)
> Excess errors:
> /daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:19:3: error: 
> AltiVec argument passed to unprototyped function
> /daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:17:12: warning: 
> GCC vector returned by reference: non-standard ABI extension with no 
> compatibility guarantee [-Wpsabi]

Yeah.  Many tests add -Wno-psabi to shut these up; is that good here?


Segher


libgo patch committed: mark go-context.S as no-exec-stack, split-stack

2019-10-02 Thread Ian Lance Taylor
This libgo patch marks go-context.S as not needing an executable stack
and as supporting split-stack.

The .note.GNU-stack section tells the linker that this object does not
require an executable stack.

The .note.GNU-split-stack section tells the linker that functions in
this object can be called directly by split-stack functions, without
requiring a large stack.

The .note.GNU-no-split-stack section tells the linker that functions
in this object do not have a split-stack prologue.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 276382)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-07faafda5fbd66a710153814f30d93c91461e7cb
+a3aef6b6df932ea6c7094d074695bc0b033a3d17
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-context.S
===
--- libgo/runtime/go-context.S  (revision 276382)
+++ libgo/runtime/go-context.S  (working copy)
@@ -66,4 +66,8 @@ __go_makecontext:
 
ret
 
+   .section.note.GNU-stack,"",@progbits
+   .section.note.GNU-split-stack,"",@progbits
+   .section.note.GNU-no-split-stack,"",@progbits
+
 #endif


[C++ PATCH] Improve C++ fold caching efficiency.

2019-10-02 Thread Jason Merrill
While looking at concepts caching I noticed that we were clearing the caches
unnecessarily for non-constant initialization, which shouldn't affect
folding.

Tested x86_64-pc-linux-gnu, applying to trunk.

* typeck2.c (store_init_value): Only clear_cv_and_fold_caches if the
value is constant.
---
 gcc/cp/typeck2.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 58fa54f40af..ec0e6a7e33d 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -907,9 +907,6 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** 
cleanups, int flags)
   /* Handle aggregate NSDMI in non-constant initializers, too.  */
   value = replace_placeholders (value, decl);
 
-  /* DECL may change value; purge caches.  */
-  clear_cv_and_fold_caches ();
-
   /* If the initializer is not a constant, fill in DECL_INITIAL with
  the bits that are constant, and then return an expression that
  will perform the dynamic initialization.  */
@@ -918,6 +915,10 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** 
cleanups, int flags)
  || vla_type_p (type)
  || ! reduced_constant_expression_p (value)))
 return split_nonconstant_init (decl, value);
+
+  /* DECL may change value; purge caches.  */
+  clear_cv_and_fold_caches ();
+
   /* If the value is a constant, just put it in DECL_INITIAL.  If DECL
  is an automatic variable, the middle end will turn this into a
  dynamic initialization later.  */

base-commit: d07428e896b5cfc7d78435de6ea08aad4e5ccfa5
-- 
2.21.0



Re: [PATCH 4/4] Modifications to the testsuite

2019-10-02 Thread Andreas Schwab
FAIL: gcc.dg/ipa/ipa-sra-19.c (test for excess errors)
Excess errors:
/daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:19:3: error: 
AltiVec argument passed to unprototyped function
/daten/gcc/gcc-20191001/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c:17:12: warning: 
GCC vector returned by reference: non-standard ABI extension with no 
compatibility guarantee [-Wpsabi]

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-02 Thread Jason Merrill

On 10/2/19 4:28 AM, Richard Biener wrote:

On Tue, Oct 1, 2019 at 8:50 PM Jason Merrill  wrote:


On 10/1/19 3:34 AM, Richard Biener wrote:

On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:


My comments accidentally got lost.

Several places in the front-end (and elsewhere) use the same lazy
allocation pattern for hash_maps, and this patch replaces them with
hash_map_safe_* functions like vec_safe_*.  They don't provide a way
to specify an initial size, but I don't think that's a significant
issue.

Tested x86_64-pc-linux-gnu.  OK for trunk?


You are using create_ggc but the new functions do not indicate that ggc
allocation is done.
It's then also incomplete with not having a non-ggc variant
of them?  Maybe I'm missing something.


Ah, I had been thinking that this lazy pattern would only be used with
ggc, but I see that I was wrong.  How's this?

Incidentally, now I see another C++11 feature I'd like to be able to
use: default template arguments for function templates.


I presume

template <bool ggc, typename K, typename V, typename H>
inline bool
hash_map_safe_put (hash_map<K,V,H> *&h, const K& k, const V& v, size_t
size = default_hash_map_size)
{
   return hash_map_maybe_create<ggc> (h, size)->put (k, v);
}

was deemed too ugly?  IMHO instantiating the templates for different sizes
is unwanted compile-time overhead (plus not being able to use
a default value makes non-default values creep into the code-base?).

I'd have OKed a variant like above, so - would that work for you
(change hash_map_maybe_create as well then, of course).


Sure.  Applying this version:

commit 136fbc96abe0cbb9458721fb919aacdd799e3ba8
Author: Jason Merrill 
Date:   Mon Sep 30 13:17:27 2019 -0400

Add some hash_map_safe_* functions like vec_safe_*.

gcc/
* hash-map.h (default_hash_map_size): New variable.
(create_ggc): Use it as default argument.
(hash_map_maybe_create, hash_map_safe_get)
(hash_map_safe_get_or_insert, hash_map_safe_put): New fns.
gcc/cp/
* constexpr.c (maybe_initialize_fundef_copies_table): Remove.
(get_fundef_copy): Use hash_map_safe_get_or_insert.
* cp-objcp-common.c (cp_get_debug_type): Use hash_map_safe_*.
* decl.c (store_decomp_type): Remove.
(cp_finish_decomp): Use hash_map_safe_put.
* init.c (get_nsdmi): Use hash_map_safe_*.
* pt.c (store_defaulted_ttp, lookup_defaulted_ttp): Remove.
(add_defaults_to_ttp): Use hash_map_safe_*.

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index ba20fe79f23..73ce6a1dc66 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
removed.  Objects of hash_map type are copy-constructible but not
assignable.  */
 
+const size_t default_hash_map_size = 13;
 template<typename KeyId, typename Value,
	  typename Traits /* = simple_hashmap_traits<default_hash_traits<KeyId>,
						     Value> */>
@@ -129,7 +130,7 @@ class GTY((user)) hash_map
   };
 
 public:
-  explicit hash_map (size_t n = 13, bool ggc = false,
+  explicit hash_map (size_t n = default_hash_map_size, bool ggc = false,
 		 bool sanitize_eq_and_hash = true,
 		 bool gather_mem_stats = GATHER_STATISTICS
 		 CXX_MEM_STAT_INFO)
@@ -146,7 +147,7 @@ public:
 	   HASH_MAP_ORIGIN PASS_MEM_STAT) {}
 
   /* Create a hash_map in ggc memory.  */
-  static hash_map *create_ggc (size_t size,
+  static hash_map *create_ggc (size_t size = default_hash_map_size,
 			   bool gather_mem_stats = GATHER_STATISTICS
 			   CXX_MEM_STAT_INFO)
 {
@@ -326,4 +327,46 @@ gt_pch_nx (hash_map<K, V, H> *h, gt_pointer_operator op, void *cookie)
   op (&h->m_table.m_entries, cookie);
 }
 
+enum hm_alloc { hm_heap = false, hm_ggc = true };
+template<bool ggc, typename K, typename V, typename H>
+inline hash_map<K,V,H> *
+hash_map_maybe_create (hash_map<K,V,H> *&h,
+		       size_t size = default_hash_map_size)
+{
+  if (!h)
+    {
+      if (ggc)
+	h = hash_map<K,V,H>::create_ggc (size);
+      else
+	h = new hash_map<K,V,H> (size);
+    }
+  return h;
+}
+
+/* Like h->get, but handles null h.  */
+template<typename K, typename V, typename H>
+inline V*
+hash_map_safe_get (hash_map<K,V,H> *h, const K& k)
+{
+  return h ? h->get (k) : NULL;
+}
+
+/* Like h->get, but handles null h.  */
+template<bool ggc, typename K, typename V, typename H>
+inline V&
+hash_map_safe_get_or_insert (hash_map<K,V,H> *&h, const K& k, bool *e = NULL,
+			 size_t size = default_hash_map_size)
+{
+  return hash_map_maybe_create (h, size)->get_or_insert (k, e);
+}
+
+/* Like h->put, but handles null h.  */
+template<bool ggc, typename K, typename V, typename H>
+inline bool
+hash_map_safe_put (hash_map<K,V,H> *&h, const K& k, const V& v,
+		   size_t size = default_hash_map_size)
+{
+  return hash_map_maybe_create (h, size)->put (k, v);
+}
+
 #endif
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index cb5484f4b72..06672a2c3b2 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1098,15 +1098,6 @@ maybe_initialize_constexpr_call_table (void)
 
static GTY(()) hash_map<tree, tree> *fundef_copies_table;
 
-/* Initialize FUNDEF_COPIES_TABLE if it's not initialized.  */
-
-static void
-maybe_initialize_fundef_copies_table ()
-{
-  if (fundef_copies_table == NULL)
-fundef_co

Re: [patch, libgomp] testsuite remove alloca header

2019-10-02 Thread Andreas Tobler

On 02.10.19 12:19, Thomas Schwinge wrote:

Hi!

On 2019-09-30T15:22:52+0200, Jakub Jelinek  wrote:

On Mon, Sep 30, 2019 at 03:16:00PM +0200, Andreas Tobler wrote:

here on FreeBSD we lack the alloca.h header so two test cases fail to
compile.

Do the same as I did six years ago for another test case: 'Remove alloca.h
and replace it with __builtin_alloca"

Is this ok for trunk?

TIA,
Andreas

2019-09-30  Andreas Tobler  

* testsuite/libgomp.oacc-c-c++-common/loop-default.h: Remove alloca.h
inlcude. Replace alloca () with __builtin_alloca ().


s/inlcude/include/

Because the tests are already using GCC specific builtins
(__builtin_goacc_parlevel_id), I think that is ok, they aren't portable
anyway, CCing Thomas to confirm.


* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Likewise.


Sure, that's fine, and (as far as I'm concerned) pre-approved, should any
additional such changes be necessary later on.

To record the review effort -- not very much here, admittedly ;-) --
please include "Reviewed-by: Thomas Schwinge "
in the commit log, see .


I did the commit and hopefully I matched your expectations.
I didn't find an example, so I added the tag similar as we do in FreeBSD.

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=276479

Danke,
Andreas


Re: [PATCH], V4, patch #4.1: Enable prefixed/pc-rel addressing (revised)

2019-10-02 Thread Michael Meissner
On Tue, Oct 01, 2019 at 06:56:01PM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Sep 30, 2019 at 10:12:54AM -0400, Michael Meissner wrote:
> > I needed
> > to add a second memory constraint ('eM') that prevents using a prefixed
> > address.
> 
> Do we need both em and eM?  Do we ever want to allow prefixed insns but not
> pcrel?  Or, alternatively, this only uses eM for some insns where the "p"
> prefix trick won't work (because there are multiple insns in the template);
> we could solve that some other way (by inserting the p's manually for
> example).

No; right now we need one (no prefix), and the other (no pcrel) is desirable, but
if we can only have one, it will mean extra instructions being generated.

In the case of no prefix, we need this for stack_protect_testdi and
stack_protect_setdi.  There if we have a large stack frame and enable stack
protection, we don't want the register allocator to re-combine the insns into
an insn with a large offset, which it will do, even though the predicate does
not allow this case.  This was discovered when we tried to build glibc, and one
module (vfwprintf-internal.c) has a large stack frame and is built with
-fstack-protector-all.

In the case of no pc-rel, this occurs in optimizing vector extracts where the
vector is in memory.  In this code, we only have one temporary base register.
In the case where the address is PC-relative, and the element number being
extracted is variable, we would need two temporary base registers, one to hold
the PC-relative address, and the other to hold the offset from the start of the
vector.  So here, we disallow PC-relative addresses, but numeric addresses that
result in a prefixed instruction are fine, since the code calculates the
offset, adds in the offset, and then does the memory operation.

Aaron found this bug where it previously tried to use the single temporary with
two different uses.

If we only had a no prefixed constraint, the code would not combine the vector
extract from memory, and instead, it would load the whole vector into a
register, and then do the appropriate shifts, VSLO, etc. to extract the
element.  I imagine the case shows up when you have large stack frames (or very
large structures).

> But what should inline asm users do wrt prefixed/pcrel memory?  Should
> they not use prefixed memory at all?  That means for asm we should always
> disallow prefixed for "m".

Yes, I've been thinking that for some time.  But I'm not going to worry about
that until the patches are in.

> Having both em and eM names is a bit confusing (which is what?)  The eM
> one should not be documented in the user manual, probably.
> 
> Maybe just using wY here will work just as well?  That is also not ideal
> of course, but we already have that one anyway.

Well, wY does allow prefixed addresses (because it calls mem_operand_ds_form
which also was modified to support prefixed addresses), so it wouldn't help.

> > 4) In the previous patch, I missed setting the prefixed size and non 
> > prefixed
> > size for mov<mode>_ppc64 in the insn.  This pattern is used for moving 
> > PTImode
> > in GPR registers (on non-VSX systems, it would move TImode also).  By the 
> > time
> > it gets to final, it will have been split, but it is still useful to get the
> > sizes correct before the mode is split.
> 
> So that is a separate patch?  Please send it as one, then?

No, it needs to be part of this patch.  It was just missing from the patch I
sent out.

> > +  /* The LWA instruction uses the DS-form format where the bottom two bits 
> > of
> > + the offset must be 0.  The prefixed PLWA does not have this
> > + restriction.  */
> > +  if (TARGET_PREFIXED_ADDR
> > +  && address_is_prefixed (addr, DImode, NON_PREFIXED_DS))
> > +return true;
> 
> Should TARGET_PREFIXED_ADDR be part of address_is_prefixed, instead of
> part of all its callers?

Just trying to do the test before the call, so the default case (not 'future')
won't have the call/return.

> > +;; Return 1 if op is a memory operand that is not prefixed.
> > +(define_predicate "non_prefixed_memory"
> > +  (match_code "mem")
> > +{
> > +  if (!memory_operand (op, mode))
> > +return false;
> > +
> > +  enum insn_form iform
> > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_PREFIXED_NUMERIC
> > +  && iform != INSN_FORM_PCREL_LOCAL
> > +  && iform != INSN_FORM_BAD);
> > +})
> > +
> > +(define_predicate "non_pcrel_memory"
> > +  (match_code "mem")
> > +{
> > +  if (!memory_operand (op, mode))
> > +return false;
> > +
> > +  enum insn_form iform
> > += address_to_insn_form (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
> > +
> > +  return (iform != INSN_FORM_PCREL_EXTERNAL
> > +  && iform != INSN_FORM_PCREL_LOCAL
> > +  && iform != INSN_FORM_BAD);
> > +})
> 
> Why does non_prefixed_memory not check INSN_FORM_PCREL_EXTERNAL?  Why does
> non_prefixed_memory not use non_pcrel_m

Re: [patch] range-ops contribution

2019-10-02 Thread Jeff Law
On 10/2/19 6:56 AM, Aldy Hernandez wrote:
>>> diff --git a/gcc/range.cc b/gcc/range.cc
>>> new file mode 100644
>>> index 000..5e4d90436f2
>>> --- /dev/null
>>> +++ b/gcc/range.cc
>> [ ... ]
>>> +
>>> +value_range_base
>>> +range_intersect (const value_range_base &r1, const value_range_base
>>> &r2)
>>> +{
>>> +  value_range_base tmp (r1);
>>> +  tmp.intersect (r2);
>>> +  return tmp;
>>> +}
>> So low level question here.  This code looks particularly well suited
>> for the NRV optimization.  Can you check if NVR (named-value-return) is
>> triggering here, and if not why.  ISTM these are likely used heavily and
>> NVR would be a win.
> 
> AFAICT, even before the NRV pass, we have already lined up
> value_range_base::intersect to put its return value into RETVAL:
> 
>   const struct value_range_base & r1_2(D) = r1;
>   const struct value_range_base & r2_4(D) = r2;
>   <bb 2> [local count: 1073741824]:
>   <retval> = *r1_2(D);
>   value_range_base::intersect (&<retval>, r2_4(D));
>   return <retval>;
> 
> So...all good?
Yup. We construct into <retval> and return it.

> 
>>
>>
>>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>>> index 5ec4d17f23b..269a3cb090e 100644
>>> --- a/gcc/tree-vrp.c
>>> +++ b/gcc/tree-vrp.c
>> [ ... ]
>>
>>> @@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.  If not see
>>
>>> +
>>> +/* Return the inverse of a range.  */
>>> +
>>> +void
>>> +value_range_base::invert ()
>>> +{
>>> +  if (undefined_p ())
>>> +    return;
>>> +  if (varying_p ())
>>> +    set_undefined ();
>>> +  else if (m_kind == VR_RANGE)
>>> +    m_kind = VR_ANTI_RANGE;
>>> +  else if (m_kind == VR_ANTI_RANGE)
>>> +    m_kind = VR_RANGE;
>>> +  else
>>> +    gcc_unreachable ();
>>> +}
>> I don't think this is right for varying_p.  ISTM that if something is
>> VR_VARYING that inverting it is still VR_VARYING.  VR_UNDEFINED seems
>> particularly bad given its optimistic treatment elsewhere.
> 
> Done.  Andrew agreed, as per his reply.
> 
> Retested for all languages on x86-64 Linux.
> 
> OK?
So the final version has asserts to ensure we don't get into the invert
method with varying/undefined.  That works for me.

OK for the trunk.

Onward!

jeff


Re: [PATCH] Simplify sinh (x) / cosh (x)

2019-10-02 Thread Rafael Tsuha
Hi Jeff,

I've just checked and the proposed optimization is under
-funsafe-math-optimizations and canonicalize_math_p():
 `(if (flag_unsafe_math_optimizations && canonicalize_math_p ())`

I'm sorry, but I don't quite understand what you mean with 'using
those expanders in an unexpected way', should I put the optimization
in another place and try to deal with the zero sign or is it fine the
way it is?

Rafael Tsuha


Em ter, 10 de set de 2019 às 12:23, Jeff Law  escreveu:
>
> On 9/10/19 1:36 AM, Uros Bizjak wrote:
> > On Mon, Sep 9, 2019 at 8:44 PM Jeff Law  wrote:
> >>
> >> On 9/4/19 12:16 PM, Rafael Tsuha wrote:
> >>> Hi, Jeff
> >>>
> >>> Em seg, 29 de abr de 2019 às 18:22, Jeff Law  escreveu:
> 
>  On 1/22/19 12:31 PM, Rafael Tsuha wrote:
> > This patch simplifies the expression sinh (x) / cosh (x) to tanh (x).
> > This rule is mathematically valid.
> >
> > There's a slight difference in the result when applying this
> > optimization with x in the interval 0 < x <= 1e-4951. With the
> > optimization, the result using long double is -0 and without the
> > optimization, the result is +0.
>  That's an indication something has gone wrong.
> 
>  If I'm reading this correctly it sounds like tanh in that range is
>  returning -0?  If so, that just seems like the wrong output from tanh
>  since IIUC for a positive input tanh will always have a positive output.
> 
> >>>
> >>> I reverted the patch sent to solve bug 88556 and found out that
> >>> tanhl(0) started returning -0 after this patch.
> >>>
> >>> patch we reverted:
> >>> (https://gcc.gnu.org/viewcvs/gcc/trunk/gcc/config/i386/i386.c?r1=267325&r2=267324&pathrev=267325)
> >>>
> >>> In the line 44480 of this patch, it checks the sign bit of the input
> >>> and if it's false the expression is multiplied by -1. In the way it's
> >>> being calculated, this should be done only if the input is a number
> >>> greater than zero.
> >>>
> >>> If we follow the code starting at line 44468, replacing op1 with 0, we
> >>> can see that e2 equals 0 at line 44482, flags will be false and
> >>> finally the e2 = -e2 operation will be executed generating the -0
> >>> result.
> >>>
> >>> I have implemented a testcase to reproduce the bug:
> >>> https://paste.debian.net/1098800/
> >>> this code was compiled with -Ofast when we tested it.
> >>>
> >>> Should I file a bug about this? And for fixing, Is it a good solution
> >>> to emit an instruction to return zero immediately if the input equals
> >>> zero?
> >> So if I'm understanding Uros's patch correctly, it's supposed to only be
> >> used for -ffast-math where we don't necessarily honor signed zeros.
> >
> > True. The full patch is at [1], where it is evident that all these
> > expanders are protected by flag_unsafe_math_optimizations. As
> > explained in the patch sumbission, the equations are ported from [2],
> > so barring some unwanted bug in the porting, they should be equal. I
> > didn't analyse the correctness of the original equations.
> It (your patch) looked fine to me given the -ffast-math constraint.
>
> I think the question we need to go back and answer is why the proposed
> patch to improve sinh/cosh -> tanh is using those expanders in an
> unexpected way.
>
> jeff


Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-02 Thread Jeff Law
On 10/2/19 10:51 AM, Oleg Endo wrote:
> On Tue, 2019-10-01 at 14:21 -0600, Jeff Law wrote:
>>
>> So the ask is to just test this on some LE targets?  I can do that :-)
>>
>> I'll throw it in.  Analysis will be slightly more difficult than
>> usual
>> as we've got some fallout from Richard S's work, but it's certainly
>> do-able.
> 
> Thanks a lot!
> 
> 
>> ps.  And yes, I've got a request to the build farm folks to get a
>> jenkins instance on the build farm.  Once that's in place I can have
>> my tester start publishing results that everyone can see.
> 
> Sounds great.  Would it be possible for other people to give the auto
> tester patches for testing and get the results back from it?  Or
> something like that?
That's certainly the medium term plan.

THe initial first step is to proxy the results from the Jenkins instance
on my laptop and Red Hat's Toronto office to a public instance running
in the GCC farm.  That way everyone can see the results & log files even
if they can't (yet) start runs or submit patches for subsequent runs.

jeff


Re: [RFC][Fortran,patch] %C error diagnostic location

2019-10-02 Thread Steve Kargl
On Wed, Oct 02, 2019 at 11:26:17AM +0200, Tobias Burnus wrote:
> Hi all,
> 
> I looked at error messages – and I like it. – Please review.
> 
> There is actually a "fallout": One testsuite case actually checks for 
> the row location – I didn't even know that one can do it that way.
> 
> That's fixed by the attached patch (I also included the original patch).
> OK for the trunk?
> 

Looks good.

-- 
Steve


Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-02 Thread Oleg Endo
On Tue, 2019-10-01 at 14:21 -0600, Jeff Law wrote:
> 
> So the ask is to just test this on some LE targets?  I can do that :-)
> 
> I'll throw it in.  Analysis will be slightly more difficult than
> usual
> as we've got some fallout from Richard S's work, but it's certainly
> do-able.

Thanks a lot!


> ps.  And yes, I've got a request to the build farm folks to get a
> jenkins instance on the build farm.  Once that's in place I can have
> my tester start publishing results that everyone can see.

Sounds great.  Would it be possible for other people to give the auto
tester patches for testing and get the results back from it?  Or
something like that?

Cheers,
Oleg



Re: Ping: question on Multiple level macro expansion

2019-10-02 Thread Joseph Myers
This is not a bug (except in icc, since you said icc accepts the code).  
gcc-help would have been a better mailing list for this sort of question.

-- 
Joseph S. Myers
jos...@codesourcery.com


Ping: question on Multiple level macro expansion

2019-10-02 Thread Qing Zhao
Hi,

This is a ping on:

https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01628.html 


Anyway, I filed a gcc bug on this issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91973 


If you think that this is NOT a gcc bug, please let me know.

thanks.

Qing

Re: [C++ Patch] More cp_expr_loc_or_input_loc and DECL_SOURCE_LOCATION uses

2019-10-02 Thread Jason Merrill

On 9/30/19 8:24 AM, Paolo Carlini wrote:

Hi,

a few more in init.c and name-lookup.c, respectively. Tested 
x86_64-linux, as usual.


Thanks, Paolo.

///


OK.



Re: [PR testsuite/91842] Skip gcc.dg/ipa/ipa-sra-19.c on power

2019-10-02 Thread Rainer Orth
Hi Martin,

> I seem to remember I minimized gcc.dg/ipa/ipa-sra-19.c on power but
> perhaps I am wrong because the testcase fails there with a
> power-specific error:
>
> gcc.dg/ipa/ipa-sra-19.c:19:3: error: AltiVec argument passed to
> unprototyped function
>
> I am going to simply skip it there with the following patch, which I
> hope is obvious.  Tested by running ipa.exp on both ppc64le-linux and
> x86_64-linux.
[...]
> diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
> b/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
> index adebaa5f5e1..d219411d8ba 100644
> --- a/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
> +++ b/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2"  } */
> +/* { dg-skip-if "" { powerpc*-*-* } } */

please use a short comment (first arg to dg-skip-if) explaining why you
skip the test, otherwise this will be anything but obvious shortly.
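For instance, the directive could carry its rationale inline (the exact wording is of course up to Martin, and this text is only a suggestion):

```c
/* { dg-skip-if "AltiVec arguments cannot be passed to unprototyped functions" { powerpc*-*-* } } */
```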

I'll leave approval to the PowerPC maintainers.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] Builtin fadd variants folding implementation

2019-10-02 Thread Martin Jambor
Hi Joseph,

could you please have a look at this latest version of the patch
introducing fadd et al.?  There already is a pre-approved folllow-up
somewhere expanding it on power but we need to get this in first.

Thanks,

Martin


On Wed, Sep 11 2019, Tejas Joshi wrote:
> Hi.
> This patch includes implementation of fadd, faddl, daddl functions.
> The patch boostraps on x86_64-linux-gnu and passes regression tests.
>
> Thanks,
> Tejas
>
> gcc/ChangeLog:
>
> 2019-09-11  Tejas Joshi  
>
> * builtin-types.def: Define narrowing function types.
> * builtins.def: Define fadd variants builtin functions.
> * fold-const-call.c (fold_const_narrow_binary): New. Folding calls
> and conditions for standard arithmetic functions.
> (fold_const_call): Add case for fadd variants.
>
> gcc/testsuite/ChangeLog:
>
> 2019-09-11  Tejas Joshi  
>
> * gcc.dg/builtin-fadd-1.c: New test.
> * gcc.dg/builtin-fadd-2.c: New test.
> * gcc.dg/builtin-fadd-3.c: New test.
> * gcc.dg/builtin-fadd-4.c: New test.
> * gcc.dg/builtin-fadd-5.c: New test.
> * gcc.dg/builtin-fadd-6.c: New test.
> diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
> index e5c9e063c48..6bc552fa71a 100644
> --- a/gcc/builtin-types.def
> +++ b/gcc/builtin-types.def
> @@ -387,8 +387,14 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_PTR,
>BT_VOID, BT_UINT, BT_PTR)
>  DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_FLOAT_FLOAT,
>BT_FLOAT, BT_FLOAT, BT_FLOAT)
> +DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_DOUBLE_DOUBLE,
> +  BT_FLOAT, BT_DOUBLE, BT_DOUBLE)
> +DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT_LONGDOUBLE_LONGDOUBLE,
> +  BT_FLOAT, BT_LONGDOUBLE, BT_LONGDOUBLE)
>  DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_DOUBLE_DOUBLE,
>BT_DOUBLE, BT_DOUBLE, BT_DOUBLE)
> +DEF_FUNCTION_TYPE_2 (BT_FN_DOUBLE_LONGDOUBLE_LONGDOUBLE,
> +  BT_DOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
>  DEF_FUNCTION_TYPE_2 (BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE,
>BT_LONGDOUBLE, BT_LONGDOUBLE, BT_LONGDOUBLE)
>  DEF_FUNCTION_TYPE_2 (BT_FN_FLOAT16_FLOAT16_FLOAT16,
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index 8bb7027aac7..2df616c477e 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -355,6 +355,9 @@ DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_FABS, "fabs", 
> FABS_TYPE, ATTR_CONST_NOT
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD32, "fabsd32", 
> BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD64, "fabsd64", 
> BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FABSD128, "fabsd128", 
> BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_FADD, "fadd", BT_FN_FLOAT_DOUBLE_DOUBLE, 
> ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_FADDL, "faddl", 
> BT_FN_FLOAT_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
> +DEF_EXT_LIB_BUILTIN(BUILT_IN_DADDL, "daddl", 
> BT_FN_DOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
>  DEF_C99_BUILTIN(BUILT_IN_FDIM, "fdim", BT_FN_DOUBLE_DOUBLE_DOUBLE, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>  DEF_C99_BUILTIN(BUILT_IN_FDIMF, "fdimf", BT_FN_FLOAT_FLOAT_FLOAT, 
> ATTR_MATHFN_FPROUNDING_ERRNO)
>  DEF_C99_BUILTIN(BUILT_IN_FDIML, "fdiml", 
> BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
> diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
> index 3a14d2a41c1..f6b4508a101 100644
> --- a/gcc/fold-const-call.c
> +++ b/gcc/fold-const-call.c
> @@ -570,6 +570,44 @@ fold_const_nextafter (real_value *result, const 
> real_value *arg0,
>return true;
>  }
>  
> +/* Try to evaluate:
> +
> +  *RESULT = add (*ARG0, *ARG1)
> +
> +   in format FORMAT.  Return true on success.  */
> +
> +static bool
> +fold_const_narrow_binary (real_value *result, const real_value *arg0,
> +   int icode, const real_value *arg1,
> +   const real_format *format)
> +{
> +  if (REAL_VALUE_ISSIGNALING_NAN (*arg0)
> +  || REAL_VALUE_ISSIGNALING_NAN (*arg1))
> +return false;
> +
> +  real_arithmetic (result, icode, arg0, arg1);
> +  /* Underflow condition.  */
> +  if (flag_errno_math
> +  && result->cl == rvc_normal
> +  && REAL_EXP (result) < format->emin)
> +return false;
> +
> +  if (!exact_real_truncate (format, result)
> +  && (flag_rounding_math || flag_trapping_math))
> +return false;
> +
> +  real_convert (result, format, result);
> +  /* Overflow condition.  */
> +  if (!real_isfinite (result) && flag_errno_math)
> +return false;
> +
> +  if (REAL_VALUE_ISNAN (*result)
> +  && (flag_errno_math || flag_trapping_math))
> +return false;
> +
> +  return true;
> +}
> +
>  /* Try to evaluate:
>  
>*RESULT = ldexp (*ARG0, ARG1)
> @@ -1674,6 +1712,25 @@ fold_const_call (combined_fn fn, tree type, tree arg0, 
> tree arg1)
>  case CFN_FOLD_LEFT_PLUS:
>return fold_const

Ping --- A bug with -fprofile-dir? Profile directory concatenated with object file path

2019-10-02 Thread Qing Zhao
Ping on:

https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01554.html 


Anyway, I filed an gcc bug on this issue as:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91971 


If you think that this is NOT a bug, please let me know.

thanks.

Qing

[PATCH] Remove greedy wildcards from libstdc++ linker script

2019-10-02 Thread Jonathan Wakely

The only symbols matched by std::e[a-q]* and std::e[s-z]* that are
supposed to be in the GLIBCXX_3.4 version are std::exception::* and
std::endl and std::ends. The latter two already have explicit patterns
matching them, so we just need to match std::exception::*.

This change ensures that any new symbols with a return type of
std::enable_if<...> are not added to the GLIBCXX_3.4 version.

* config/abi/pre/gnu.ver: Tighten up greedy wildcards.

Tested powerpc64le-linux and x86_64-linux, committed to trunk.


commit 2f33800e0b630a6e719a9f4c76934dfcb872bc7f
Author: Jonathan Wakely 
Date:   Wed Oct 2 15:49:18 2019 +0100

Remove greedy wildcards from libstdc++ linker script

The only symbols matched by std::e[a-q]* and std::e[s-z]* that are
supposed to be in the GLIBCXX_3.4 version are std::exception::* and
std::endl and std::ends. The latter two already have explicit patterns
matching them, so we just need to match std::exception::*.

This change ensures that any new symbols with a return type of
std::enable_if<...> are not added to the GLIBCXX_3.4 version.

* config/abi/pre/gnu.ver: Tighten up greedy wildcards.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 07a00036827..e61fbe0ad66 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -80,9 +80,8 @@ GLIBCXX_3.4 {
 # std::domain_error::d*;
 # std::domain_error::~d*;
   std::d[p-z]*;
-  std::e[a-q]*;
   std::error[^_]*;
-  std::e[s-z]*;
+  std::exception::*;
   std::gslice*;
   std::h[^a]*;
   std::i[a-m]*;


-O2 inline retuning 2/n: enable -finline-functions for O2

2019-10-02 Thread Jan Hubicka
Hi,
this patch enables -finline-functions at -O2 and throttles it down
by means of new parameters

 --param max-inline-insns-auto-O2 (set to 15 instead 30)
 --param max-inline-insns-single-O2 (set to 30 instead 200)
 --param inline-min-speedup-O2 (set to 30 instead 15)

Overall effect of both patches is
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00952.html

The first patch just reduced early inlining, so it mostly
has positive effects on code size and few negative performance
impacts.  Those are quite small, as benchmarked by LNT tonight.

For the record, tonight's changes in SPEC scores are (this includes the
patch 1 throttling down the early inliner for -O2; non-O2 changes are
clearly unrelated):
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ed76597323f2005730596f3a85583691621aa616%2Cb2902434a2c7dd89c5b808ff6db4d3f0f0839603
CPP tests:
https://lnt.opensuse.org/db_default/v4/CPP/latest_runs_report?younger_in_days=14&older_in_days=0&min_percentage_change=0.02&revisions=ed76597323f2005730596f3a85583691621aa616%2Cb2902434a2c7dd89c5b808ff6db4d3f0f0839603
So the off-noise regressions are 2.8% for povray in SPEC
and relatively large regressions for botan and
nbench/FP EMULATION.
The latter two should be fixed by this patch.

Bootstrapped/regtested x86_64-linux.  The patch triggers an abi_check
failure which is due to an overactive check.  I will wait for
Jonathan to commit a fix for it before committing the patch to mainline.

Honza

* cif-code.def (MAX_INLINE_INSNS_SINGLE_O2_LIMIT,
MAX_INLINE_INSNS_AUTO_O2_LIMIT): New.
* ipa-inline.c (inline_insns_single, inline_insns_auto): New functions.
(can_inline_edge_by_limits_p): Use it.
(big_speedup_p): Use PARAM_INLINE_MIN_SPEEDUP_O2.
(want_inline_small_function_p): Use O2 bounds.
(edge_badness): Likewise.
* opts.c (default_options): Add OPT_finline_functions.
* params.def (PARAM_INLINE_MIN_SPEEDUP_O2,
PARAM_MAX_INLINE_INSNS_SINGLE_O2, PARAM_MAX_INLINE_INSNS_AUTO_O2):
New parameters.

* g++.dg/tree-ssa/pr53844.C: Add -fno-inline-functions --param
max-inline-insns-single-O2=200.
* gcc.c-torture/execute/builtins/builtins.exp: Add
-fno-inline-functions to additional_flags.
* gcc.dg/ipa/inline-7.c: Add -fno-inline-functions.
* gcc.dg/optimize-bswapsi-5.c: Add -fno-inline-functions.
* gcc.dg/tree-ssa/ssa-thread-12.c: Add --param
early-inlining-insns-O2=14 -fno-inline-functions; revert previous
change.
* gcc.dg/winline-3.c: Use --param max-inline-insns-single-O2=1
--param inline-min-speedup-O2=100
instead of --param max-inline-insns-single=1 --param
inline-min-speedup=100

* invoke.texi (-finline-functions): Update documentation.
(max-inline-insns-single-O2, max-inline-insns-auto-O2,
inline-min-speedup-O2): Document.
(early-inlining-insns-O2): Simplify docs.
Index: cif-code.def
===
--- cif-code.def(revision 276441)
+++ cif-code.def(working copy)
@@ -70,8 +70,12 @@ DEFCIFCODE(LARGE_STACK_FRAME_GROWTH_LIMI
   N_("--param large-stack-frame-growth limit reached"))
 DEFCIFCODE(MAX_INLINE_INSNS_SINGLE_LIMIT, CIF_FINAL_NORMAL,
   N_("--param max-inline-insns-single limit reached"))
+DEFCIFCODE(MAX_INLINE_INSNS_SINGLE_O2_LIMIT, CIF_FINAL_NORMAL,
+  N_("--param max-inline-insns-single-O2 limit reached"))
 DEFCIFCODE(MAX_INLINE_INSNS_AUTO_LIMIT, CIF_FINAL_NORMAL,
   N_("--param max-inline-insns-auto limit reached"))
+DEFCIFCODE(MAX_INLINE_INSNS_AUTO_O2_LIMIT, CIF_FINAL_NORMAL,
+  N_("--param max-inline-insns-auto-O2 limit reached"))
 DEFCIFCODE(INLINE_UNIT_GROWTH_LIMIT, CIF_FINAL_NORMAL,
   N_("--param inline-unit-growth limit reached"))
 
Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 276441)
+++ doc/invoke.texi (working copy)
@@ -8346,6 +8346,7 @@ also turns on the following optimization
 -ffinite-loops @gol
 -fgcse  -fgcse-lm  @gol
 -fhoist-adjacent-loads @gol
+-finline-functions @gol
 -finline-small-functions @gol
 -findirect-inlining @gol
 -fipa-bit-cp  -fipa-cp  -fipa-icf @gol
@@ -8379,7 +8380,6 @@ by @option{-O2} and also turns on the fo
 
 @c Please keep the following list alphabetized!
 @gccoptlist{-fgcse-after-reload @gol
--finline-functions @gol
 -fipa-cp-clone
 -floop-interchange @gol
 -floop-unroll-and-jam @gol
@@ -8559,7 +8559,7 @@ If all calls to a given function are int
 declared @code{static}, then the function is normally not output as
 assembler code in its own right.
 
-Enabled at levels @option{-O3}, @option{-Os}.  Also enabled
+Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.  Also enabled
 by @option{-fprofile-use} and @option{-fauto-profile}.
 
 @item -

Re: [C++ PATCH] PR c++/91369 - Implement P0784R7: constexpr new

2019-10-02 Thread Jason Merrill

On 10/2/19 7:36 AM, Jakub Jelinek wrote:

Hi!

Thanks for the review, let me start with the unrelated bugfixes then and
deal with the rest later.

On Tue, Oct 01, 2019 at 05:56:00PM -0400, Jason Merrill wrote:

@@ -3905,7 +4033,7 @@ cxx_eval_store_expression (const constex
 bool no_zero_init = true;
 releasing_vec ctors;
-  while (!refs->is_empty())
+  while (!refs->is_empty ())
   {
 if (*valp == NULL_TREE)
{
@@ -4046,7 +4174,9 @@ cxx_eval_store_expression (const constex
 if (const_object_being_modified)
   {
 bool fail = false;
-  if (!CLASS_TYPE_P (TREE_TYPE (const_object_being_modified)))
+  tree const_objtype
+   = strip_array_types (TREE_TYPE (const_object_being_modified));
+  if (!CLASS_TYPE_P (const_objtype))


This looks like an unrelated bugfix; you might commit it (and the others)
separately if that's convenient.


@@ -4907,14 +5043,21 @@ cxx_eval_constant_expression (const cons
 break;
   case CLEANUP_STMT:
-  r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
+  {
+   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
+   r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
+ non_constant_p, overflow_p,
+ jump_target);
+   if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
+ /* Also evaluate the cleanup.  If we weren't skipping at the
+start of the CLEANUP_BODY, change jump_target temporarily
+to &initial_jump_target, so that even a return or break or
+continue in the body doesn't skip the cleanup.  */


This also looks like an unrelated bugfix.


Both are only partially related, I think they can go in separately, but at
least for the first one I haven't managed to come up with a testcase where
it would matter and nothing in e.g. check-c++-all comes there with ARRAY_TYPE,
is it ok for trunk without a testcase (first attached patch)?

As for the second bugfix, I think it should be accompanied with the
potential_constant_expression_1 change, and there I've actually managed to
come up with a testcase where it matters, though it is using a GCC extension
(generally, CLEANUP_STMTs won't appear in constexpr functions that often
because of the requirement that variables in them have literal type and
literal types have trivial destructors, so really no cleanups in that case).

The testcase where it makes a difference is:

constexpr void
cleanup (int *x)
{
   if (x)
 asm ("");
}

constexpr void
cleanup2 (int *x)
{
}

constexpr bool
foo ()
{
   int a __attribute__((cleanup (cleanup))) = 1;
   return true;
}

constexpr bool
bar ()
{
   int a __attribute__((cleanup (cleanup2))) = 1;
   return true;
}

constexpr auto x = foo ();
constexpr auto y = bar ();

With vanilla trunk, one gets a weird message:
test.C:27:24: error: ‘constexpr bool foo()’ called in a constant expression
27 | constexpr auto x = foo ();
   |^~
test.C:28:24: error: ‘constexpr bool bar()’ called in a constant expression
28 | constexpr auto y = bar ();
   |^~
That is because we call potential_constant_expression_1 on the foo (or
bar) body, see CLEANUP_STMT in there and punt.  With just the
potential_constant_expression_1 change to handle CLEANUP_STMT, we actually
accept it, which is wrong.
Finally, with the whole patch attached below, we reject foo () call and
accept bar ():
test.C:27:24:   in ‘constexpr’ expansion of ‘foo()’
test.C:27:25:   in ‘constexpr’ expansion of ‘cleanup((& a))’
test.C:5:5: error: inline assembly is not a constant expression
 5 | asm ("");
   | ^~~
test.C:5:5: note: only unevaluated inline assembly is allowed in a ‘constexpr’ 
function in C++2a

Is the testcase ok in the second patch?  Are those patches ok for trunk?


OK, thanks.

Jason


[PR testsuite/91842] Skip gcc.dg/ipa/ipa-sra-19.c on power

2019-10-02 Thread Martin Jambor
Hi,

I seem to remember I minimized gcc.dg/ipa/ipa-sra-19.c on power but
perhaps I am wrong because the testcase fails there with a
power-specific error:

gcc.dg/ipa/ipa-sra-19.c:19:3: error: AltiVec argument passed to unprototyped 
function

I am going to simply skip it there with the following patch, which I
hope is obvious.  Tested by running ipa.exp on both ppc64le-linux and
x86_64-linux.

Thanks,

Martin


diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c 
b/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
index adebaa5f5e1..d219411d8ba 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2"  } */
+/* { dg-skip-if "" { powerpc*-*-* } } */
 
 typedef int __attribute__((__vector_size__(16))) vectype;




Re: [PATCH] Disable tests that aren't valid in parallel mode

2019-10-02 Thread Jonathan Wakely

On 01/10/19 21:59 +0100, Jonathan Wakely wrote:

Tested x86_64-linux (normal and parallel modes), committed to trunk.



commit b11c8f480fe1cd5696ec1a8f0db481c5f45429b8
Author: Jonathan Wakely 
Date:   Tue Oct 1 20:31:51 2019 +0100

   Disable tests that aren't valid in parallel mode
   
   Tests that depend on debug mode can't be tested in parallel mode.
   
   * testsuite/17_intro/using_namespace_std_tr1_neg.cc: Skip test for

   parallel mode.
   * testsuite/20_util/hash/84998.cc: Likewise.
   * testsuite/23_containers/deque/types/pmr_typedefs_debug.cc: 
Likewise.
   * testsuite/23_containers/forward_list/pmr_typedefs_debug.cc: 
Likewise.
   * testsuite/23_containers/list/pmr_typedefs_debug.cc: Likewise.
   * testsuite/23_containers/map/pmr_typedefs_debug.cc: Likewise.
   * testsuite/23_containers/multimap/pmr_typedefs_debug.cc: Likewise.
   * testsuite/23_containers/multiset/pmr_typedefs_debug.cc: Likewise.
   * testsuite/23_containers/set/pmr_typedefs_debug.cc: Likewise.
   * testsuite/23_containers/unordered_map/pmr_typedefs_debug.cc:
   Likewise.
   * testsuite/23_containers/unordered_multimap/pmr_typedefs_debug.cc:
   Likewise.
   * testsuite/23_containers/unordered_multiset/pmr_typedefs_debug.cc:
   Likewise.
   * testsuite/23_containers/unordered_set/pmr_typedefs_debug.cc:
   Likewise.
   * testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
   Likewise.
   * testsuite/23_containers/vector/types/pmr_typedefs_debug.cc: 
Likewise.
   * testsuite/25_algorithms/binary_search/partitioned.cc: Likewise.
   * testsuite/25_algorithms/copy/86658.cc: Likewise.
   * testsuite/25_algorithms/equal_range/partitioned.cc: Likewise.
   * testsuite/25_algorithms/lexicographical_compare/71545.cc: Likewise.
   * testsuite/25_algorithms/lower_bound/partitioned.cc: Likewise.
   * testsuite/25_algorithms/upper_bound/partitioned.cc: Likewise.


The remaining failures for 'make check-parallel' are all in tests for C++20
constexpr and move-only function objects.

We should just document that Parallel Mode only really meets the C++03
requirements. For C++17 there are new parallel algorithms, and once
those are able to use OpenMP we should consider deprecating Parallel
Mode entirely.

This documents the limitation. Committed to trunk.

commit 5fad726cad1b930930caeb4f6c070ebde813cb02
Author: Jonathan Wakely 
Date:   Wed Oct 2 15:43:54 2019 +0100

Document non-conformance of parallel mode to recent C++ standards

* doc/xml/manual/parallel_mode.xml: Add caveat about support for
recent standards.
* doc/html/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/parallel_mode.xml b/libstdc++-v3/doc/xml/manual/parallel_mode.xml
index 60e2088b2a1..ab7d2f2b56b 100644
--- a/libstdc++-v3/doc/xml/manual/parallel_mode.xml
+++ b/libstdc++-v3/doc/xml/manual/parallel_mode.xml
@@ -13,17 +13,31 @@
 
 
  The libstdc++ parallel mode is an experimental parallel
-implementation of many algorithms the C++ Standard Library.
+implementation of many algorithms of the C++ Standard Library.
 
 
 
 Several of the standard algorithms, for instance
 std::sort, are made parallel using OpenMP
-annotations. These parallel mode constructs and can be invoked by
+annotations. These parallel mode constructs can be invoked by
 explicit source declaration or by compiling existing sources with a
 specific compiler flag.
 
 
+
+  
+The parallel mode has not been kept up to date with recent C++ standards
+and so it only conforms to the C++03 requirements.
+That means that move-only predicates may not work with parallel mode
+algorithms, and for C++20 most of the algorithms cannot be used in
+constexpr functions.
+  
+  
+For C++17 and above there are new overloads of the standard algorithms
+which take an execution policy argument. You should consider using those
+instead of the non-standard parallel mode extensions.
+  
+
 
 Intro
   


Silence bogus -Wmaybe-uninitialized warning

2019-10-02 Thread Jan Hubicka
Hi,
this patch silences a -Wmaybe-uninitialized warning when GCC is built
with -finline-functions (which I am working to make part of -O2).

Bootstrapped/regtested x86_64-linux, committed as obvious.
Honza
* module.c (load_commons): Initialize flags to 0 to silence
-Wmaybe-uninitialized warning.
(read_module): Likewise for n and comp_name.
Index: module.c
===
--- module.c(revision 276441)
+++ module.c(working copy)
@@ -4745,7 +4745,7 @@ load_commons (void)
 
   while (peek_atom () != ATOM_RPAREN)
 {
-  int flags;
+  int flags = 0;
   char* label;
   mio_lparen ();
   mio_internal_string (name);
@@ -5243,8 +5243,8 @@ read_module (void)
  for (c = sym->components; c; c = c->next)
{
  pointer_info *p;
- const char *comp_name;
- int n;
+ const char *comp_name = NULL;
+ int n = 0;
 
  mio_lparen (); /* component opening.  */
  mio_integer (&n);


Re: [patch] range-ops contribution

2019-10-02 Thread Andrew MacLeod

On 10/2/19 9:36 AM, Richard Sandiford wrote:

Andrew MacLeod  writes:

On 10/2/19 6:52 AM, Richard Biener wrote:

+
+/* Return the inverse of a range.  */
+
+void
+value_range_base::invert ()
+{
+  if (undefined_p ())
+return;
+  if (varying_p ())
+set_undefined ();
+  else if (m_kind == VR_RANGE)
+m_kind = VR_ANTI_RANGE;
+  else if (m_kind == VR_ANTI_RANGE)
+m_kind = VR_RANGE;
+  else
+gcc_unreachable ();
+}

I don't think this is right for varying_p.  ISTM that if something is
VR_VARYING that inverting it is still VR_VARYING.  VR_UNDEFINED seems
particularly bad given its optimistic treatment elsewhere.

VR_VARYING isn't a range, it's a lattice state (likewise for VR_UNDEFINED).
It doesn't make sense to invert a lattice state.  How you treat
VR_VARYING/VR_UNDEFINED depends on context, and so does what 'invert'
would do.  I suggest asserting
that varying/undefined is never inverted.



True for a lattice state, not true for a range in the new context of the
ranger where
   a) varying == range for type and
   b) undefined == unreachable

This is a carry over from really old code where we only got part of it
fixed right a while ago.
invert (varying) == varying because we still know nothing about it;
it's still the range for the type.
invert (undefined) == undefined because undefined is unreachable,
which is viral.

So indeed, varying should return varying.  So if it's undefined or
varying, we should just return from the invert call, i.e., not do anything
to the range.  In the case of a lattice state, doing nothing to it
should not be harmful.  I also expect it will never get called for a
pure lattice state since it's only invoked from range-ops, at which point
we are only dealing with the range it represents.


I took a look and this bug hasn't been triggered because it's only used
in a couple of places:
1) op1_range for EQ_EXPR and NE_EXPR when we have a true or false
constant condition in both the LHS and OP2 position; it sometimes
inverts it via this call.  So it's only when there is a specific boolean
range of [TRUE,TRUE] or [FALSE,FALSE].  When any range is undefined
or varying in those routines, there's a different path for the result.

2) fold () for logical NOT, which also has a preliminary check for
varying or undefined and does nothing in those cases, i.e., returns the
existing value.  In fact, you can probably remove the special-casing in
logical_not with this fix, which is indicative that it is correct.

IMO that makes invert a bit of a dangerous operation.  E.g. for
ranges of unsigned bytes:

   invert ({}) = invert(UNDEFINED) = UNDEFINED = {}
   invert ([255, 255]) = ~[255, 255] = [0, 254]
   ...
   invert ([3, 255]) = ~[3, 255] = [0, 2]
   invert ([2, 255]) = ~[2, 255] = [0, 1]
   invert ([1, 255]) = ~[1, 255] = [0, 0]
   invert ([0, 255]) = invert(VARYING) = VARYING = [0, 255]

where there's no continuity at the extremes.  Maybe it would be
better to open-code it in those two places instead?

Richard


I'm not sure that the continuity is important since ranges are a little
bit odd at the edges anyway :-)


However, I will take the point that perhaps invert () has potentially
different meanings at the edges depending on how you look at it.
Ultimately that is why we ended up getting it wrong in the first place...


So, I audited all the current uses of invert () (it is not commonly used
either), and we already special-case the varying or undefined behaviour
where appropriate before invert is invoked.  So I think we can
reduce everyone's concern about these edge cases by simply doing as you
guys suggest and gcc_assert ()-ing that the ranges are not undefined or
varying for invert ().



Andrew




Re: [PATCH] Use movmem optab to attempt inline expansion of __builtin_memmove()

2019-10-02 Thread Aaron Sawdey
On 10/1/19 4:45 PM, Jeff Law wrote:
> On 9/27/19 12:23 PM, Aaron Sawdey wrote:
>> This is the third piece of my effort to improve inline expansion of memmove. 
>> The
>> first two parts I posted back in June fixed the names of the optab entries
>> involved so that optab cpymem is used for memcpy() and optab movmem is used 
>> for
>> memmove(). This piece adds support for actually attempting to invoke the 
>> movmem
>> optab to do inline expansion of __builtin_memmove().
>>
>> Because what needs to be done for memmove() is very similar to memcpy(), I 
>> have
>> just added a bool parm "might_overlap" to several of the functions involved 
>> so
>> the same functions can handle both. The name might_overlap comes from the 
>> fact
>> that if we still have a memmove() call at expand, this means
>> gimple_fold_builtin_memory_op() was not able to prove that the source and
>> destination do not overlap.
>>
>> There are a few places where might_overlap gets used to keep us from trying 
>> to
>> use the by-pieces infrastructure or generate a copy loop, as neither of those
>> things will work correctly if source and destination overlap.
>>
>> I've restructured things slightly in emit_block_move_hints() so that we can
>> try the pattern first if we already know that by-pieces won't work. This way
>> we can bail out immediately in the might_overlap case.
>>
>> Bootstrap/regtest passed on ppc64le, in progress on x86_64. If everything 
>> passes,
>> is this ok for trunk?
>>
>>
>> 2019-09-27  Aaron Sawdey 
>>
>>  * builtins.c (expand_builtin_memory_copy_args): Add might_overlap parm.
>>  (expand_builtin_memcpy): Use might_overlap parm.
>>  (expand_builtin_mempcpy_args): Use might_overlap parm.
>>  (expand_builtin_memmove): Call expand_builtin_memory_copy_args.
>>  (expand_builtin_memory_copy_args): Add might_overlap parm.
>>  * expr.c (emit_block_move_via_cpymem): Rename to
>>  emit_block_move_via_pattern, add might_overlap parm, use cpymem
>>  or movmem optab as appropriate.
>>  (emit_block_move_hints): Add might_overlap parm, do the right
>>  thing for might_overlap==true.
>>  * expr.h (emit_block_move_hints): Update prototype.
>>
>>
>>
>>
>> Index: gcc/builtins.c
>> ===
>> --- gcc/builtins.c   (revision 276131)
>> +++ gcc/builtins.c   (working copy)
>> @@ -3894,10 +3897,11 @@
>>  &probable_max_size);
>>src_str = c_getstr (src);
>>
>> -  /* If SRC is a string constant and block move would be done
>> - by pieces, we can avoid loading the string from memory
>> - and only stored the computed constants.  */
>> -  if (src_str
>> +  /* If SRC is a string constant and block move would be done by
>> + pieces, we can avoid loading the string from memory and only
>> + stored the computed constants.  I'm not sure if the by pieces
>> + method works if src/dest are overlapping, so avoid that case.  */
>> +  if (src_str && !might_overlap
> I don't think you need the check here.  c_getstr, when it returns
> somethign useful is going to be returning a string constant.  Think read
> only literals here.  I'm pretty sure overlap isn't going to be possible.

After some digging, I agree -- c_getstr () returns a string constant and
that is then used to generate stores of constants.

I've fixed the other issues and also fixed emit_block_move_via_pattern() to
make use of pieces_ok instead of calling can_move_by_pieces() a second
time. The patch I'm actually committing is below.

Thanks for the review!

  Aaron

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 276131)
+++ gcc/builtins.c  (working copy)
@@ -127,7 +127,8 @@
 static rtx expand_builtin_memcpy (tree, rtx);
 static rtx expand_builtin_memory_copy_args (tree dest, tree src, tree len,
rtx target, tree exp,
-   memop_ret retmode);
+   memop_ret retmode,
+   bool might_overlap);
 static rtx expand_builtin_memmove (tree, rtx);
 static rtx expand_builtin_mempcpy (tree, rtx);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, 
memop_ret);
@@ -3790,7 +3791,7 @@
   check_memop_access (exp, dest, src, len);

   return expand_builtin_memory_copy_args (dest, src, len, target, exp,
- /*retmode=*/ RETURN_BEGIN);
+ /*retmode=*/ RETURN_BEGIN, false);
 }

 /* Check a call EXP to the memmove built-in for validity.
@@ -3797,7 +3798,7 @@
Return NULL_RTX on both success and failure.  */

 static rtx
-expand_builtin_memmove (tree exp, rtx)
+expand_builtin_memmove (tree exp, rtx target)
 {
   if (!validate_arglist (exp,
 POINTER_TYPE, POINTER_TYPE, INTEGER_TYPE, VOID_TYPE))
@@ -3809,7 +3810,8 @@


Re: [PATCH, Fortran] Fix Automatics in equivalence test cases was Re: Automatics in Equivalences failures

2019-10-02 Thread Jakub Jelinek
On Wed, Oct 02, 2019 at 02:36:55PM +0100, Mark Eggleston wrote:
> ChangeLog:
> 
>     Mark Eggleston 
> 
>     * gfortran.dg/auto_in_equiv_1.f90: Deleted.
>     * gfortran.dg/auto_in_equiv_2.f90: Deleted.
>     * gfortran.dg/auto_in_equiv_3.f90: Deleted.
>     * gfortran.dg/automatics_in_equivalence_1.f90: New test.
>     * gfortran.dg/automatics_in_equivalence_2.f90: New test.

Why such long testcase names?  Just replacing the first old test with the
new one would IMHO be better.

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/automatics_in_equivalence_2.f90
> @@ -0,0 +1,37 @@
> +! { dg-do run }
> +! { dg-options "-fdec-static -frecursive -fno-automatic" }
...

1) if the test is the same, the only difference is in dg- directives,
it is easier to just include the other test.
2) if -frecursive -fno-automatic is the same as -fno-automatic, then
the test is not valid unless you use the explicit recursive keyword, because
then you recurse on something that shouldn't be called recursively, right?

> +! { dg-warning "Flag '-fno-automatic' overwrites '-frecursive'" "warning" { 
> target *-*-* } 0 } 

I think you want one runtime test (e.g. the one you wrote in
automatics_in_equivalence_1.f90) and the rest just dg-do compile tests that
will check the original or gimple dumps to verify what happened in addition
to checking diagnostics (none) from the compilation, one testing the
default, another -fno-automatic, but in both cases without -frecursive.

Jakub


Re: [PATCH, Fortran] Fix Automatics in equivalence test cases was Re: Automatics in Equivalences failures

2019-10-02 Thread Steve Kargl
On Wed, Oct 02, 2019 at 02:36:55PM +0100, Mark Eggleston wrote:
> Please find attached a patch to replace the test cases for "Automatic in 
> equivalence".
> 
> ChangeLog:
> 
>      Mark Eggleston 
> 
>      * gfortran.dg/auto_in_equiv_1.f90: Deleted.
>      * gfortran.dg/auto_in_equiv_2.f90: Deleted.
>      * gfortran.dg/auto_in_equiv_3.f90: Deleted.
>      * gfortran.dg/automatics_in_equivalence_1.f90: New test.
>      * gfortran.dg/automatics_in_equivalence_2.f90: New test.
> 
> OK to commit?
> 

yes

-- 
Steve


Re: [IRA] Handle fully-tied destinations in a similar way to earlyclobbers

2019-10-02 Thread Vladimir Makarov



On 2019-09-30 10:40 a.m., Richard Sandiford wrote:

IRA's make_early_clobber_and_input_conflicts checks for cases in
which an output operand is likely to be an earlyclobber and an input
operand is unlikely to be tieable with it.  If so, the allocno for
the output conflicts with the allocno for the input.  This seems
to work well.

However, a similar situation arises if an output operand is likely
to be tied to one of a set of input operands X and if another input
operand has a different value from all of the operands in X.
E.g. if we have:

   0: "=r, r"
   1: "0, r"
   2: "r, 0"
   3: "r, r"

operand 0 will always be tied to operand 1 or operand 2, so if operand 3
is different from them both, operand 0 acts like an earlyclobber as far
as operand 3 (only) is concerned.  The same is true for operand 2 in:

   0: "=r"
   1: "0"
   2: "r"

In the second example, we'd normally have a copy between operand 1 and
operand 0 if operand 1 dies in the instruction, and so there's rarely
a problem.  But if operand 1 doesn't die in the instruction, operand 0
still acts as an earlyclobber for operand 2 (if different from operand 1),
since in that case LRA must copy operand 1 to operand 0 before the
instruction.

As the existing comment says:

 Avoid introducing unnecessary conflicts by checking classes of the
 constraints and pseudos because otherwise significant code
 degradation is possible for some targets.

I think that's doubly true here.  E.g. it's perfectly reasonable to have
constraints like:

   0: "=r, r"
   1: "0, r"
   2: "r, r"

on targets like s390 that have shorter instructions for tied operands,
but that don't want the size difference to influence RA too much.
We shouldn't treat operand 0 as earlyclobber wrt operand 2 in that case.

This patch therefore treats a normal tied non-earlyclobber output as
being effectively earlyclobber wrt to an input if it is so for *all*
preferred alternatives.

My usual bogo-comparison of gcc.c-torture, gcc.dg and g++.dg
(this time using -Os -fno-schedule-insns{,2}) gives:

Target Tests  Delta   Best  Worst Median
== =  =     = ==
aarch64-linux-gnu  3 -3 -1 -1 -1
aarch64_be-linux-gnu   4 -4 -1 -1 -1
alpha-linux-gnu  136   -190-56 84 -1
arc-elf   31   -172-27  3 -2
arm-linux-gnueabi 59   -996   -136  4 -1
arm-linux-gnueabihf   59   -996   -136  4 -1
bfin-elf  22-31-19  8 -1
bpf-elf  276   -388   -191 12 -1
cris-elf  73 69-18 26 -1
epiphany-elf  58-91-10  2 -1
fr30-elf 123   -156-33 20 -1
h8300-elf150   -426-36 17 -2
hppa64-hp-hpux11.23   39-65-16  1 -1
i686-apple-darwin 93-51-29 26 -1
i686-pc-linux-gnu 43  8-10 27 -1
m32r-elf  68-92-31 14 -1
m68k-linux-gnu   169-65-23 33 -1
mcore-elf 27-29-14  8 -1
mmix  25-75-28  2 -1
mn10300-elf  166 32-46149 -1
moxie-rtems  937   1461  -1649   6000 -1
msp430-elf89  -1364   -835  5 -4
nds32le-elf   34-54-29  2 -1
pdp11252   -458-23 13 -1
powerpc-ibm-aix7.0 3 -4 -2 -1 -1
powerpc64-linux-gnu1 -1 -1 -1 -1
powerpc64le-linux-gnu  3 -3 -1 -1 -1
rl78-elf   4-12 -4 -2 -4
rx-elf59-99-11  2 -1
s390-linux-gnu   115   -117-53 21 -1
s390x-linux-gnu  120-47-25 21 -1
sh-linux-gnu  54-89-31  8 -1
sparc64-linux-gnu 14 -6 -5  4 -1
tilepro-linux-gnu209   -452-55 16 -1
v850-elf  10 18 -2 21 -1
vax-netbsdelf  5 -5 -1 -1 -1
x86_64-darwin 53-62-33  3 -1
x86_64-linux-gnu  52 -8 -8 13 -1
xstormy16-elf144   -814   -541 25 -1
xtensa-elf   578  -2096   -138 15 -1

The eye-watering moxie-rtems +6,000 outlier is from gcc.dg/pr59992.c,
where the same code template is instantiated 10,000 times.  In some
instances the code improves by one instruction, but it regresses by
one instruction in many more.

To emphasise that this is a poor metric (and is just to get a flavour),
most of the {i686,x86_64}-linux-gnu LOC increases happen in frame info
rather than code.  I should try to script that out...

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


Yes.

It is a very non-trivial patch.  It to

Re: [PATCH] Automatics in equivalence statements

2019-10-02 Thread Mark Eggleston
Thanks, as you point out, all the test needs to do is verify that a
variable with an AUTOMATIC attribute can be used in an EQUIVALENCE
and that the items in the EQUIVALENCE are on the stack by using them in a
recursive routine.


I've created a patch to replace the existing test cases and have sent it 
to this e-mail thread https://gcc.gnu.org/ml/fortran/2019-09/msg00123.html


regards,

Mark

On 30/09/2019 11:24, Jakub Jelinek wrote:

On Sat, Sep 28, 2019 at 10:33:26PM +0200, Andreas Schwab wrote:

On Aug 14 2019, Mark Eggleston  wrote:


     * gfortran.dg/auto_in_equiv_3.f90: New test.

This test fails everywhere.

Yes, and _2 on i686-linux at -O0.

To me both testcases are undefined behavior.
E.g. the first one has:
   subroutine suba(option)
 integer, intent(in) :: option
 integer, automatic :: a
 integer :: b
 integer :: c
 equivalence (a, b)
 if (option.eq.0) then
   ! initialise a and c
   a = 9
   c = 99
   if (a.ne.b) stop 1
   if (loc(a).ne.loc(b)) stop 2
 else
   ! a should've been overwritten
   if (a.eq.9) stop 3
 end if
   end subroutine suba
My understanding is that because a is explicitly automatic and b is automatic 
too
(implicitly), the whole equivalence is automatic, so if you call this
subroutine with non-zero option, you read an uninitialized variable and
compare it to 9.  That can result in anything, .false., .true., disk
formatting, you can't rely on some other routine laying out its automatic
variable at exactly the same spot and overwriting the memory in there, not
to mention that the compiler can easily spot the uninitialized use too.
Similarly in the second test, returning address of an automatic variable
from a function is already something e.g. the C/C++ FEs warn about, because
you really can't do anything useful with that address, it can't be
dereferenced, or even the comparisons to addresses of other automatic
variables that left their scope is wrong.

IMHO if you want to check if a variable is SAVEd or AUTOMATIC, you want to
recurse, either directly or indirectly, pass the address of the variable in
the outer subroutine down to the inner one and compare there, SAVEd
variables need to have the same address, while AUTOMATIC variables where
both the outer and inner variable is at that point still in the scope need
to have the addresses different.

Though, in order to have in Fortran a recursively callable subroutine, one
needs to use RECURSIVE.
So, IMHO we want 4 testcases out of these 2, two dg-do compile only which
will verify the tests compile when mixing automatic with no explicit
save/automatic in equivalence and will -fdump-tree-gimple and scan the
gimple dump to verify there is equiv.\[0-9]* variable which is not static,
and then two runtime testcases like (one with just -fdec-static, one with
also -fno-automatic, though guess it doesn't matter that much, as recursive
already implies that it is automatic).
program test
   integer :: dummy
   integer, parameter :: address = kind(loc(dummy))
   integer(address) :: addr
   addr = 0
   call sub (0, addr)
contains
   recursive subroutine sub (option, addr)
 integer, intent(in) :: option
 integer(address), intent(in) :: addr
 integer, automatic :: a
 integer :: b
 integer(address) :: c
 equivalence (a, b)
 if (option.eq.0) then
   a = 9
   if (a.ne.b) stop 1
   if (loc(a).ne.loc(b)) stop 2
   c = loc(a)
   call sub (1, c)
   if (a.ne.9) stop 3
 else
   a = 10
   if (a.ne.b) stop 4
   if (loc(a).ne.loc(b)) stop 5
   if (addr.eq.loc(a)) stop 6
 end if
   end subroutine sub
end program test

Jakub


--
https://www.codethink.co.uk/privacy.html



[PATCH, Fortran] Fix Automatics in equivalence test cases was Re: Automatics in Equivalences failures

2019-10-02 Thread Mark Eggleston
Please find attached a patch to replace the test cases for "Automatic in 
equivalence".


ChangeLog:

    Mark Eggleston 

    * gfortran.dg/auto_in_equiv_1.f90: Deleted.
    * gfortran.dg/auto_in_equiv_2.f90: Deleted.
    * gfortran.dg/auto_in_equiv_3.f90: Deleted.
    * gfortran.dg/automatics_in_equivalence_1.f90: New test.
    * gfortran.dg/automatics_in_equivalence_2.f90: New test.

OK to commit?

regards,

Mark

On 30/09/2019 13:18, Mark Eggleston wrote:


On 30/09/2019 10:06, Mark Eggleston wrote:

Thanks to Tobias Burnus for fixing the dg directives in test cases.

Now that the test cases for "Automatics in equivalence" (svn revision 
274565) are actually being run, test failures are occurring.


I've investigated the test failures for auto_in_equiv_3.f90:

- -O1, -O2, -O3 and -Os fail
- -O1 fails because the check of the address fails due to a 40 byte 
difference in the location on the stack
- -O2, -O3 and -Os fail due to the evaluation of an .and. operation 
returning .false. when both operands are .true..


The test case could be better and should probably be replaced with a 
better one.


I've discovered that -finit-local-zero doesn't work if the local 
variable is in an equivalence statement where at least one of the 
variables has an AUTOMATIC attribute.
On further investigation I find that local variables are not 
initialised if the EQUIVALENCE attribute is set 
(build_default_init_expr), so this is expected behaviour.  So back to 
the drawing board for a suitable test case.


What is the best way of dealing with this? Reverting the commit and 
resubmitting a corrected patch when it's been fixed?


regards,

Mark



--
https://www.codethink.co.uk/privacy.html

>From e461900f602b48e8d13402fca46d34506308785f Mon Sep 17 00:00:00 2001
From: Mark Eggleston 
Date: Wed, 2 Oct 2019 10:23:49 +0100
Subject: [PATCH] Replace test cases for Automatics in equivalence

---
 gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90  | 36 -
 gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90  | 38 -
 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90  | 63 --
 .../gfortran.dg/automatics_in_equivalence_1.f90| 35 
 .../gfortran.dg/automatics_in_equivalence_2.f90| 37 +
 5 files changed, 72 insertions(+), 137 deletions(-)
 delete mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90
 delete mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90
 delete mode 100644 gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/automatics_in_equivalence_1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/automatics_in_equivalence_2.f90

diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90 b/gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90
deleted file mode 100644
index bf6e0c68a57..000
--- a/gcc/testsuite/gfortran.dg/auto_in_equiv_1.f90
+++ /dev/null
@@ -1,36 +0,0 @@
-! { dg-do compile }
-
-! Contributed by Mark Eggleston 
-program test
-  call suba(0)
-  call subb(0)
-  call suba(1)
-
-contains
-  subroutine suba(option) 
-integer, intent(in) :: option
-integer, automatic :: a ! { dg-error "AUTOMATIC at \\(1\\) is a DEC extension" }
-integer :: b
-integer :: c
-equivalence (a, b)
-if (option.eq.0) then
-  ! initialise a and c
-  a = 9
-  c = 99
-  if (a.ne.b) stop 1
-  if (loc(a).ne.loc(b)) stop 2
-else
-  ! a should've been overwritten
-  if (a.eq.9) stop 3
-end if
-  end subroutine suba
-
-  subroutine subb(dummy)
-integer, intent(in) :: dummy
-integer, automatic :: x ! { dg-error "AUTOMATIC at \\(1\\) is a DEC extension" }
-integer :: y
-x = 77
-y = 7
-  end subroutine subb
-
-end program test
diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90 b/gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90
deleted file mode 100644
index e40c0f15f3e..000
--- a/gcc/testsuite/gfortran.dg/auto_in_equiv_2.f90
+++ /dev/null
@@ -1,38 +0,0 @@
-! { dg-do run }
-! { dg-options "-fdec-static" }
-
-! Contributed by Mark Eggleston 
-
-program test
-  call suba(0)
-  call subb(0)
-  call suba(1)
-
-contains
-  subroutine suba(option) 
-integer, intent(in) :: option
-integer, automatic :: a
-integer :: b
-integer :: c
-equivalence (a, b)
-if (option.eq.0) then
-  ! initialise a and c
-  a = 9
-  c = 99
-  if (a.ne.b) stop 1
-  if (loc(a).ne.loc(b)) stop 2
-else
-  ! a should've been overwritten
-  if (a.eq.9) stop 3
-end if
-  end subroutine suba
-
-  subroutine subb(dummy)
-integer, intent(in) :: dummy
-integer, automatic :: x
-integer :: y
-x = 77
-y = 7
-  end subroutine subb
-
-end program test
diff --git a/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90 b/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
deleted file mode 100644
index 57c384d1772..000
--- a/gcc/testsuite/gfortran.dg/auto_in_equiv_3.f90
+++ /dev/null
@@ -1,63 +0,0 @@
-! { dg-do run }
-! { dg-options

Re: [patch] range-ops contribution

2019-10-02 Thread Richard Sandiford
Andrew MacLeod  writes:
> On 10/2/19 6:52 AM, Richard Biener wrote:
>>
 +
 +/* Return the inverse of a range.  */
 +
 +void
 +value_range_base::invert ()
 +{
 +  if (undefined_p ())
 +return;
 +  if (varying_p ())
 +set_undefined ();
 +  else if (m_kind == VR_RANGE)
 +m_kind = VR_ANTI_RANGE;
 +  else if (m_kind == VR_ANTI_RANGE)
 +m_kind = VR_RANGE;
 +  else
 +gcc_unreachable ();
 +}
>>> I don't think this is right for varying_p.  ISTM that if something is
>>> VR_VARYING that inverting it is still VR_VARYING.  VR_UNDEFINED seems
>>> particularly bad given its optimistic treatment elsewhere.
>> VR_VARYING isn't a range, it's a lattice state (likewise for VR_UNDEFINED).
>> It doesn't make sense to invert a lattice state.  How you treat
>> VR_VARYING/VR_UNDEFINED
>> depends on context and so depends what 'invert' would do.  I suggest to 
>> assert
>> that varying/undefined is never inverted.
>>
>>
> True for a lattice state, not true for a range in the new context of the 
> ranger where
>   a) varying == range for type and
>   b) undefined == unreachable
>
> This is a carry over from really old code where we only got part of it 
> fixed right a while ago.
> invert ( varying ) == varying    because we still know nothing about it, 
> its still range for type.
> invert (undefined) == undefined because undefined is unreachable 
> which is viral.
>
> So indeed, varying should return varying... So If its undefined or 
> varying, we should just return from the invert call. ie, not do anything 
> to the range.    in the case of a lattice state, doing nothing to it 
> should not be harmful.  I also expect it will never get called for a 
> pure lattice state since its only invoked from range-ops, at which point 
> we only are dealing with the range it represents.
>
>
> I took a look and this bug hasn't been triggered because its only used 
> in a couple of places.
> 1)  op1_range for EQ_EXPR and NE_EXPR when we have a true OR false 
> constant condition in both the LHS and OP2 position, it sometimes 
> inverts it via this call..  so its only when there is a specific boolean 
> range of [TRUE,TRUE]  or [FALSE,FALSE].      when any range is undefined 
> or varying in those routines, theres a different path for the result
>
> 2) fold() for logical NOT, which also has a preliminary check for 
> varying or undefined and does nothing in those cases ie, returns the 
> existing value.   IN fact, you can probably remove the special casing in 
> logical_not with this fix, which is indicative that it is correct.

IMO that makes invert a bit of a dangerous operation.  E.g. for
ranges of unsigned bytes:

  invert ({}) = invert(UNDEFINED) = UNDEFINED = {}
  invert ([255, 255]) = ~[255, 255] = [0, 254]
  ...
  invert ([3, 255]) = ~[3, 255] = [0, 2]
  invert ([2, 255]) = ~[2, 255] = [0, 1]
  invert ([1, 255]) = ~[1, 255] = [0, 0]
  invert ([0, 255]) = invert(VARYING) = VARYING = [0, 255]

where there's no continuity at the extremes.  Maybe it would be
better to open-code it in those two places instead?

Richard


Re: [patch] Extend GIMPLE store merging to throwing stores

2019-10-02 Thread Eric Botcazou
> that works for me - the patch is OK with that change.

Thanks.  In the end, I went for your solution and the helper is called 
unsplit_eh_edges, which is even more explicit than unsplit_all_eh.


* tree-eh.h (unsplit_eh_edges): Declare.
* tree-eh.c (maybe_remove_unreachable_handlers): Detect more cases.
(unsplit_eh_edges): New function wrapping unsplit_all_eh.
* gimple-ssa-store-merging.c: Include cfganal.h cfgcleanup.h except.h.
(struct store_immediate_info): Add lp_nr field.
(store_immediate_info::store_immediate_info): Add NR2 parameter and
initialize lp_nr with it.
(struct merged_store_group): Add lp_nr and only_constants fields.
(merged_store_group::merged_store_group): Initialize them.
(merged_store_group::can_be_merged_into): Deal with them.
(pass_store_merging): Rename terminate_and_release_chain into
terminate_and_process_chain.
(pass_store_merging::terminate_and_process_all_chains): Adjust to above
renaming and remove useless assertions.
(pass_store_merging::terminate_all_aliasing_chains): Small tweak.
(stmts_may_clobber_ref_p): Be prepared for different basic blocks.
(imm_store_chain_info::coalesce_immediate_stores): Use only_constants
instead of always recomputing it and compare lp_nr.
(imm_store_chain_info::output_merged_store): If the group is in an
active EH region, register new stores if they can throw.  Moreover,
if the insertion has created new basic blocks, adjust the PHI nodes
of the post landing pad.
(imm_store_chain_info::output_merged_stores): If the original stores
are in an active EH region, deregister them.
(lhs_valid_for_store_merging_p): Prettify.
(adjust_bit_pos): New function extracted from...
(mem_valid_for_store_merging): ...here.  Use it for the base address
and also for the offset if it is the addition of a constant.
(lp_nr_for_store): New function.
(pass_store_merging::process_store): Change return type to bool.
Call lp_nr_for_store to initialize the store info.  Propagate the
return status of various called functions to the return value.
(store_valid_for_store_merging_p): New predicate.
(enum basic_block_status): New enumeration.
(get_status_for_store_merging): New function.
(pass_store_merging::execute): If the function can throw and catch
non-call exceptions, unsplit the EH edges on entry and clean up the
CFG on exit if something changed.  Call get_status_for_store_merging
for every basic block and keep the chains open across basic blocks
when possible.  Terminate and process open chains at the end, if any.


-- 
Eric Botcazou

Index: gimple-ssa-store-merging.c
===
--- gimple-ssa-store-merging.c	(revision 275988)
+++ gimple-ssa-store-merging.c	(working copy)
@@ -159,7 +159,10 @@
 #include "gimple-fold.h"
 #include "stor-layout.h"
 #include "timevar.h"
+#include "cfganal.h"
+#include "cfgcleanup.h"
 #include "tree-cfg.h"
+#include "except.h"
 #include "tree-eh.h"
 #include "target.h"
 #include "gimplify-me.h"
@@ -1375,13 +1378,15 @@ public:
   /* True if ops have been swapped and thus ops[1] represents
  rhs1 of BIT_{AND,IOR,XOR}_EXPR and ops[0] represents rhs2.  */
   bool ops_swapped_p;
+  /* The index number of the landing pad, or 0 if there is none.  */
+  int lp_nr;
   /* Operands.  For BIT_*_EXPR rhs_code both operands are used, otherwise
  just the first one.  */
   store_operand_info ops[2];
   store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
 			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
 			gimple *, unsigned int, enum tree_code,
-			struct symbolic_number &, gimple *, bool,
+			struct symbolic_number &, gimple *, bool, int,
 			const store_operand_info &,
 			const store_operand_info &);
 };
@@ -1396,11 +1401,13 @@ store_immediate_info::store_immediate_in
 	struct symbolic_number &nr,
 	gimple *ins_stmtp,
 	bool bitnotp,
+	int nr2,
 	const store_operand_info &op0r,
 	const store_operand_info &op1r)
   : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre),
 stmt (st), order (ord), rhs_code (rhscode), n (nr),
-ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false)
+ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false),
+lp_nr (nr2)
 #if __cplusplus >= 201103L
 , ops { op0r, op1r }
 {
@@ -1435,6 +1442,7 @@ public:
   bool bit_insertion;
   bool only_constants;
   unsigned int first_nonmergeable_order;
+  int lp_nr;
 
   auto_vec stores;
   /* We record the first and last original statements in the sequence because
@@ -1862,6 +1870,7 @@ merged_store_group::merged_store_group (
   bit_insertion = false;
   only_constants = info->rhs_code == INTEGER_

Re: [01/32] Add function_abi.{h,cc}

2019-10-02 Thread Richard Sandiford
Bernd Edlinger  writes:
> Hi,
>
> I am currently trying to implement -Wshadow=local, and
> this patch triggers a build failure with -Wshadow=local,
> since i is a parameter that is the regno.
> But it is also used as a loop variable,
> so I think this probably introduces a bug:
>
>> @@ -728,7 +731,11 @@ globalize_reg (tree decl, int i)
>>   appropriate regs_invalidated_by_call bit, even if it's already
>>   set in fixed_regs.  */
>>if (i != STACK_POINTER_REGNUM)
>> -SET_HARD_REG_BIT (regs_invalidated_by_call, i);
>> +{
>> +  SET_HARD_REG_BIT (regs_invalidated_by_call, i);
>> +  for (unsigned int i = 0; i < NUM_ABI_IDS; ++i)
>> +function_abis[i].add_full_reg_clobber (i);
>> +}
>
>
> I would think you meant:
>
> for (unsigned int j = 0; j < NUM_ABI_IDS; ++j)
>   function_abis[j].add_full_reg_clobber (i);
>
> Right?

Oops, yes.  Applied as obvious after testing on aarch64-linux-gnu.

Looking forward to the new -W option :-)

Richard


2019-10-02  Richard Sandiford  

gcc/
* reginfo.c (globalize_reg): Fix shadowed variable in
function_abis walk.

Index: gcc/reginfo.c
===
--- gcc/reginfo.c   2019-10-01 09:55:35.150088599 +0100
+++ gcc/reginfo.c   2019-10-02 14:12:15.379196856 +0100
@@ -731,8 +731,8 @@ globalize_reg (tree decl, int i)
   if (i != STACK_POINTER_REGNUM)
 {
   SET_HARD_REG_BIT (regs_invalidated_by_call, i);
-  for (unsigned int i = 0; i < NUM_ABI_IDS; ++i)
-   function_abis[i].add_full_reg_clobber (i);
+  for (unsigned int j = 0; j < NUM_ABI_IDS; ++j)
+   function_abis[j].add_full_reg_clobber (i);
 }
 
   /* If already fixed, nothing else to do.  */


[OG9] Comitted backport of: [Patch][Fortran] Vastly improve error message during parsing – simple patch for OpenMP/OpenACC directives

2019-10-02 Thread Tobias Burnus
Simply committing that patch to the GCC's git OG9 branch as 
32568a014c678e09a251dd3c5f64618779f036f5


Tobias



Fix another -Wodr ICE

2019-10-02 Thread Jan Hubicka
Hi,
this patch finally makes cactusBSSN work.  The logic which was
supposed to get the anonymous namespace type first (if it exists) was
wrong, as was caught by a subsequent assert.

Bootstrapped/regtested x86_64-linux, committed.
Honza

Index: ChangeLog
===
--- ChangeLog   (revision 276453)
+++ ChangeLog   (working copy)
@@ -1,3 +1,9 @@
+2019-10-01  Jan Hubicka  
+
+   PR c++/91222
+   * ipa-devirt.c (warn_types_mismatch): Fix conditional on anonymous
+   namespace types.
+
 2019-10-02  Shahab Vahedi  
 
* config/arc/arc.h (ASM_SPEC): Pass -mcode-density.
Index: ipa-devirt.c
===
--- ipa-devirt.c    (revision 276441)
+++ ipa-devirt.c    (working copy)
@@ -986,8 +986,8 @@ warn_types_mismatch (tree t1, tree t2, l
   || (type_with_linkage_p (TYPE_MAIN_VARIANT (t2))
  && type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t2
 {
-  if (type_with_linkage_p (TYPE_MAIN_VARIANT (t1))
- && !type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t1)))
+  if (!type_with_linkage_p (TYPE_MAIN_VARIANT (t1))
+ || !type_in_anonymous_namespace_p (TYPE_MAIN_VARIANT (t1)))
{
  std::swap (t1, t2);
  std::swap (loc_t1, loc_t2);


Re: Fix MIPS call-clobbered-*.c tests

2019-10-02 Thread Jeff Law
On 10/2/19 5:47 AM, Richard Sandiford wrote:
> Jeff pointed out that gcc.target/mips/call-clobbered-4.c started
> failing after the function-abi series.  This is because IRA used
> to treat partly call-clobbered registers as hard conflicts and
> so wouldn't consider them for -fcaller-saves.  Now that we treat
> call clobbers the same way regardless of where they come from,
> we can use $f21 as a caller-save register.  This in turn means
> that -Os is no longer a special case in call-clobbered-3.c.
> 
> (The new code is the same size as the old code.)
> 
> Tested on mipsel-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2019-10-02  Richard Sandiford  
> 
> gcc/testsuite/
>   * gcc.target/mips/call-clobbered-3.c: Remove skip for -Os.
>   * gcc.target/mips/call-clobbered-4.c: Delete.
OK
jeff


Re: [patch] range-ops contribution

2019-10-02 Thread Aldy Hernandez




On 10/2/19 8:19 AM, Andrew MacLeod wrote:

On 10/2/19 6:52 AM, Richard Biener wrote:



+
+/* Return the inverse of a range.  */
+
+void
+value_range_base::invert ()
+{
+  if (undefined_p ())
+    return;
+  if (varying_p ())
+    set_undefined ();
+  else if (m_kind == VR_RANGE)
+    m_kind = VR_ANTI_RANGE;
+  else if (m_kind == VR_ANTI_RANGE)
+    m_kind = VR_RANGE;
+  else
+    gcc_unreachable ();
+}

I don't think this is right for varying_p.  ISTM that if something is
VR_VARYING that inverting it is still VR_VARYING.  VR_UNDEFINED seems
particularly bad given its optimistic treatment elsewhere.
VR_VARYING isn't a range, it's a lattice state (likewise for 
VR_UNDEFINED).

It doesn't make sense to invert a lattice state.  How you treat
VR_VARYING/VR_UNDEFINED
depends on context and so depends what 'invert' would do.  I suggest 
to assert

that varying/undefined is never inverted.


True for a lattice state, not true for a range in the new context of the 
ranger where

  a) varying == range for type and
  b) undefined == unreachable

This is a carry over from really old code where we only got part of it 
fixed right a while ago.
invert ( varying ) == varying    because we still know nothing about it, 
its still range for type.
invert (undefined) == undefined because undefined is unreachable 
which is viral.


So indeed, varying should return varying... So If its undefined or 
varying, we should just return from the invert call. ie, not do anything 
to the range.    in the case of a lattice state, doing nothing to it 
should not be harmful.  I also expect it will never get called for a 
pure lattice state since its only invoked from range-ops, at which point 
we only are dealing with the range it represents.



I took a look and this bug hasn't been triggered because its only used 
in a couple of places.
1)  op1_range for EQ_EXPR and NE_EXPR when we have a true OR false 
constant condition in both the LHS and OP2 position, it sometimes 
inverts it via this call..  so its only when there is a specific boolean 
range of [TRUE,TRUE]  or [FALSE,FALSE].      when any range is undefined 
or varying in those routines, theres a different path for the result


2) fold() for logical NOT, which also has a preliminary check for 
varying or undefined and does nothing in those cases ie, returns the 
existing value.   IN fact, you can probably remove the special casing in 
logical_not with this fix, which is indicative that it is correct.


Good idea.  I've removed the special casing and am testing a new patch.

Thanks.
Aldy


Re: [patch] range-ops contribution

2019-10-02 Thread Andrew MacLeod

On 10/2/19 6:52 AM, Richard Biener wrote:



+
+/* Return the inverse of a range.  */
+
+void
+value_range_base::invert ()
+{
+  if (undefined_p ())
+return;
+  if (varying_p ())
+set_undefined ();
+  else if (m_kind == VR_RANGE)
+m_kind = VR_ANTI_RANGE;
+  else if (m_kind == VR_ANTI_RANGE)
+m_kind = VR_RANGE;
+  else
+gcc_unreachable ();
+}

I don't think this is right for varying_p.  ISTM that if something is
VR_VARYING that inverting it is still VR_VARYING.  VR_UNDEFINED seems
particularly bad given its optimistic treatment elsewhere.

VR_VARYING isn't a range, it's a lattice state (likewise for VR_UNDEFINED).
It doesn't make sense to invert a lattice state.  How you treat
VR_VARYING/VR_UNDEFINED
depends on context and so depends what 'invert' would do.  I suggest to assert
that varying/undefined is never inverted.


True for a lattice state, not true for a range in the new context of the 
ranger where

 a) varying == range for type and
 b) undefined == unreachable

This is a carry over from really old code where we only got part of it 
fixed right a while ago.
invert ( varying ) == varying    because we still know nothing about it, 
its still range for type.
invert (undefined) == undefined because undefined is unreachable 
which is viral.


So indeed, varying should return varying... So If its undefined or 
varying, we should just return from the invert call. ie, not do anything 
to the range.    in the case of a lattice state, doing nothing to it 
should not be harmful.  I also expect it will never get called for a 
pure lattice state since its only invoked from range-ops, at which point 
we only are dealing with the range it represents.



I took a look and this bug hasn't been triggered because its only used 
in a couple of places.
1)  op1_range for EQ_EXPR and NE_EXPR when we have a true OR false 
constant condition in both the LHS and OP2 position, it sometimes 
inverts it via this call..  so its only when there is a specific boolean 
range of [TRUE,TRUE]  or [FALSE,FALSE].      when any range is undefined 
or varying in those routines, theres a different path for the result


2) fold() for logical NOT, which also has a preliminary check for 
varying or undefined and does nothing in those cases ie, returns the 
existing value.   IN fact, you can probably remove the special casing in 
logical_not with this fix, which is indicative that it is correct.




Andrew




Fix MIPS call-clobbered-*.c tests

2019-10-02 Thread Richard Sandiford
Jeff pointed out that gcc.target/mips/call-clobbered-4.c started
failing after the function-abi series.  This is because IRA used
to treat partly call-clobbered registers as hard conflicts and
so wouldn't consider them for -fcaller-saves.  Now that we treat
call clobbers the same way regardless of where they come from,
we can use $f21 as a caller-save register.  This in turn means
that -Os is no longer a special case in call-clobbered-3.c.

(The new code is the same size as the old code.)

Tested on mipsel-linux-gnu.  OK to install?

Richard


2019-10-02  Richard Sandiford  

gcc/testsuite/
* gcc.target/mips/call-clobbered-3.c: Remove skip for -Os.
* gcc.target/mips/call-clobbered-4.c: Delete.

Index: gcc/testsuite/gcc.target/mips/call-clobbered-3.c
===
--- gcc/testsuite/gcc.target/mips/call-clobbered-3.c    2019-03-08 18:14:29.356996347 +
+++ gcc/testsuite/gcc.target/mips/call-clobbered-3.c    2019-10-02 12:44:54.644608714 +0100
@@ -1,7 +1,5 @@
 /* Check that we handle call-clobbered FPRs correctly.  */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
-/* Refer to call-clobbered-4.c to see the expected output from -Os builds.  */
-/* { dg-skip-if "uses callee-saved GPR" { *-*-* } { "-Os" } { "" } } */
 /* { dg-options "-mabi=32 -modd-spreg -mfpxx -ffixed-f0 -ffixed-f1 -ffixed-f2 -ffixed-f3 -ffixed-f4 -ffixed-f5 -ffixed-f6 -ffixed-f7 -ffixed-f8 -ffixed-f9 -ffixed-f10 -ffixed-f11 -ffixed-f12 -ffixed-f13 -ffixed-f14 -ffixed-f15 -ffixed-f16 -ffixed-f17 -ffixed-f18 -ffixed-f19 -ffixed-f20 -ffixed-f22 -ffixed-f24 -ffixed-f26 -ffixed-f28 -ffixed-f30" } */
 
 void bar (void);
Index: gcc/testsuite/gcc.target/mips/call-clobbered-4.c
===
--- gcc/testsuite/gcc.target/mips/call-clobbered-4.c    2019-03-08 18:14:29.356996347 +
+++ /dev/null   2019-09-17 11:41:18.176664108 +0100
@@ -1,23 +0,0 @@
-/* Check that we handle call-clobbered FPRs correctly.
-   This test differs from call-clobbered-3.c because when optimising for size
-   a callee-saved GPR is used for 'b' to cross the call.  */
-/* { dg-skip-if "code quality test" { *-*-* } { "*" } { "-Os" } } */
-/* { dg-options "-mabi=32 -modd-spreg -mfpxx -ffixed-f0 -ffixed-f1 -ffixed-f2 -ffixed-f3 -ffixed-f4 -ffixed-f5 -ffixed-f6 -ffixed-f7 -ffixed-f8 -ffixed-f9 -ffixed-f10 -ffixed-f11 -ffixed-f12 -ffixed-f13 -ffixed-f14 -ffixed-f15 -ffixed-f16 -ffixed-f17 -ffixed-f18 -ffixed-f19 -ffixed-f20 -ffixed-f22 -ffixed-f24 -ffixed-f26 -ffixed-f28 -ffixed-f30" } */
-
-void bar (void);
-float a;
-float
-foo ()
-{
-  float b = a + 1.0f;
-  bar();
-  return b;
-}
-/* { dg-final { scan-assembler-times "lwc1" 4 } } */
-/* { dg-final { scan-assembler-times "swc1" 2 } } */
-/* { dg-final { scan-assembler-times "mtc" 1 } } */
-/* { dg-final { scan-assembler-times "mfc" 1 } } */
-/* { dg-final { scan-assembler-not "mthc" } } */
-/* { dg-final { scan-assembler-not "mfhc" } } */
-/* { dg-final { scan-assembler-not "ldc1" } } */
-/* { dg-final { scan-assembler-not "sdc1" } } */


Re: [C++ PATCH] PR c++/91369 - Implement P0784R7: constexpr new

2019-10-02 Thread Jakub Jelinek
Hi!

Thanks for the review, let me start with the unrelated bugfixes then and
deal with the rest later.

On Tue, Oct 01, 2019 at 05:56:00PM -0400, Jason Merrill wrote:
> > @@ -3905,7 +4033,7 @@ cxx_eval_store_expression (const constex
> > bool no_zero_init = true;
> > releasing_vec ctors;
> > -  while (!refs->is_empty())
> > +  while (!refs->is_empty ())
> >   {
> > if (*valp == NULL_TREE)
> > {
> > @@ -4046,7 +4174,9 @@ cxx_eval_store_expression (const constex
> > if (const_object_being_modified)
> >   {
> > bool fail = false;
> > -  if (!CLASS_TYPE_P (TREE_TYPE (const_object_being_modified)))
> > +  tree const_objtype
> > +   = strip_array_types (TREE_TYPE (const_object_being_modified));
> > +  if (!CLASS_TYPE_P (const_objtype))
> 
> This looks like an unrelated bugfix; you might commit it (and the others)
> separately if that's convenient.
> 
> > @@ -4907,14 +5043,21 @@ cxx_eval_constant_expression (const cons
> > break;
> >   case CLEANUP_STMT:
> > -  r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
> > +  {
> > +   tree initial_jump_target = jump_target ? *jump_target : NULL_TREE;
> > +   r = cxx_eval_constant_expression (ctx, CLEANUP_BODY (t), lval,
> > + non_constant_p, overflow_p,
> > + jump_target);
> > +   if (!CLEANUP_EH_ONLY (t) && !*non_constant_p)
> > + /* Also evaluate the cleanup.  If we weren't skipping at the
> > +start of the CLEANUP_BODY, change jump_target temporarily
> > +to &initial_jump_target, so that even a return or break or
> > +continue in the body doesn't skip the cleanup.  */
> 
> This also looks like an unrelated bugfix.

Both are only partially related, I think they can go in separately, but at
least for the first one I haven't managed to come up with a testcase where
it would matter and nothing in e.g. check-c++-all comes there with ARRAY_TYPE,
is it ok for trunk without a testcase (first attached patch)?

As for the second bugfix, I think it should be accompanied with the
potential_constant_expression_1 change, and there I've actually managed to
come up with a testcase where it matters, though it is using a GCC extension
(generally, CLEANUP_STMTs won't appear in constexpr functions that often
because of the requirement that variables in them have literal type and
literal types have trivial destructors, so really no cleanups in that case).

The testcase where it makes a difference is:

constexpr void
cleanup (int *x)
{
  if (x)
asm ("");
}

constexpr void
cleanup2 (int *x)
{
}

constexpr bool
foo ()
{
  int a __attribute__((cleanup (cleanup))) = 1;
  return true;
}

constexpr bool
bar ()
{
  int a __attribute__((cleanup (cleanup2))) = 1;
  return true;
}

constexpr auto x = foo ();
constexpr auto y = bar ();

With vanilla trunk, one gets a weird message:
test.C:27:24: error: ‘constexpr bool foo()’ called in a constant expression
   27 | constexpr auto x = foo ();
  |^~
test.C:28:24: error: ‘constexpr bool bar()’ called in a constant expression
   28 | constexpr auto y = bar ();
  |^~
That is because we call potential_constant_expression_1 on the foo (or
bar) body, see CLEANUP_STMT in there and punt.  With just the
potential_constant_expression_1 change to handle CLEANUP_STMT, we actually
accept it, which is wrong.
Finally, with the whole patch attached below, we reject foo () call and
accept bar ():
test.C:27:24:   in ‘constexpr’ expansion of ‘foo()’
test.C:27:25:   in ‘constexpr’ expansion of ‘cleanup((& a))’
test.C:5:5: error: inline assembly is not a constant expression
5 | asm ("");
  | ^~~
test.C:5:5: note: only unevaluated inline assembly is allowed in a ‘constexpr’ 
function in C++2a

Is the testcase ok in the second patch?  Are those patches ok for trunk?

Jakub
2019-10-02  Jakub Jelinek  

* constexpr.c (cxx_eval_store_expression): Formatting fix.  Handle
const_object_being_modified with array type.

--- gcc/cp/constexpr.c.jj   2019-09-27 20:33:37.600208356 +0200
+++ gcc/cp/constexpr.c  2019-09-27 20:38:38.203710246 +0200
@@ -3905,7 +4033,7 @@ cxx_eval_store_expression (const constex
   bool no_zero_init = true;
 
   releasing_vec ctors;
-  while (!refs->is_empty())
+  while (!refs->is_empty ())
 {
   if (*valp == NULL_TREE)
{
@@ -4046,7 +4174,9 @@ cxx_eval_store_expression (const constex
   if (const_object_being_modified)
 {
   bool fail = false;
-  if (!CLASS_TYPE_P (TREE_TYPE (const_object_being_modified)))
+  tree const_objtype
+   = strip_array_types (TREE_TYPE (const_object_being_modified));
+  if (!CLASS_TYPE_P (const_objtype))
fail = true;
   else
{
2019-10-02  Jakub Jelinek  

* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_STMT>: If
not skipping upon entry to body, run cleanup with the same *jump_targe

[PATCH] Split out stmt transform from vectorizable_reduction

2019-10-02 Thread Richard Biener


I've done only "trivial" pruning of unnecessary code to reduce the
chance of functional changes.  Cleanup will happen as a followup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-02  Richard Biener  

* tree-vectorizer.h (vect_transform_reduction): Declare.
* tree-vect-stmts.c (vect_transform_stmt): Use it.
* tree-vect-loop.c (vectorizable_reduction): Split out reduction
stmt transform to ...
(vect_transform_reduction): ... this.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a3fd011e6c4..31e745780ba 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5784,7 +5784,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   int i;
   int ncopies;
   bool single_defuse_cycle = false;
-  int j;
   tree ops[3];
   enum vect_def_type dts[3];
   bool nested_cycle = false, found_nested_cycle_def = false;
@@ -6576,43 +6575,224 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn, vectype_in);
 
-  if (!vec_stmt) /* transformation not required.  */
+  /* transformation not required.  */
+  gcc_assert (!vec_stmt);
+
+  vect_model_reduction_cost (stmt_info, reduc_fn, ncopies, cost_vec);
+  if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
 {
-  vect_model_reduction_cost (stmt_info, reduc_fn, ncopies, cost_vec);
-  if (loop_vinfo && LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo))
+  if (reduction_type != FOLD_LEFT_REDUCTION
+ && !mask_by_cond_expr
+ && (cond_fn == IFN_LAST
+ || !direct_internal_fn_supported_p (cond_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED)))
{
- if (reduction_type != FOLD_LEFT_REDUCTION
- && !mask_by_cond_expr
- && (cond_fn == IFN_LAST
- || !direct_internal_fn_supported_p (cond_fn, vectype_in,
- OPTIMIZE_FOR_SPEED)))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't use a fully-masked loop because no"
-" conditional operation is available.\n");
- LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
-   }
- else if (reduc_index == -1)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't use a fully-masked loop for chained"
-" reductions.\n");
- LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
-   }
- else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in);
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't use a fully-masked loop because no"
+" conditional operation is available.\n");
+ LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
}
-  if (dump_enabled_p ()
- && reduction_type == FOLD_LEFT_REDUCTION)
-   dump_printf_loc (MSG_NOTE, vect_location,
-"using an in-order (fold-left) reduction.\n");
-  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
-  return true;
+  else if (reduc_index == -1)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't use a fully-masked loop for chained"
+" reductions.\n");
+ LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
+   }
+  else
+   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
+  vectype_in);
+}
+  if (dump_enabled_p ()
+  && reduction_type == FOLD_LEFT_REDUCTION)
+dump_printf_loc (MSG_NOTE, vect_location,
+"using an in-order (fold-left) reduction.\n");
+  STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
+  return true;
+}
+
+/* Transform the definition stmt STMT_INFO of a reduction PHI backedge
+   value.  */
+
+bool
+vect_transform_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
+ stmt_vec_info *vec_stmt, slp_tree slp_node)
+{
+  tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
+  tree vectype_in = NULL_TREE;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  enum tree_code code;
+  int op_type;
+  bool is_simple_use;
+  int i;
+  int ncopies;
+  bool single_defuse_cycle = false;
+  int j;
+  tree ops[3];
+  bool nested_cycle = false;
+  int vec_num;
+
+  if (nested_in_vect_loo

Re: [patch] range-ops contribution

2019-10-02 Thread Richard Biener
On Tue, Oct 1, 2019 at 8:07 PM Jeff Law  wrote:
>
> On 10/1/19 11:11 AM, Aldy Hernandez wrote:
> > Hi folks.
> >
> > Here is my official submission of the range-ops part of the ranger to
> > mainline.
> >
> > I realize that I could have split this patch up into 2-3 separate ones,
> > but I don't want to run into the chicken-and-egg scenario of last time,
> > where I had 4 inter-connected patches that were hard to review
> > independently.
> It might have helped a bit, but it was pretty easy to find the mapping
> from bits in wide-int-range.cc into range-op.cc -- the comments were
> copied :-)
>
> >
> > A few notes that may help in reviewing.
> >
> > The range-ops proper is in range-op.*.
> >
> > The range.* files are separate files containing some simple auxiliary
> > functions that will have irange and value_range_base counterparts.  Our
> > development branch will have #define value_range_base irange, and some
> > auxiliary glue, but none of that will be in trunk.  As promised, trunk
> > is all value_range_base.
> >
> > * The changes to tree-vrp.* are:
> >
> > 1. New constructors to align the irange and value_range_base APIs.  We
> > discussed this a few months ago, and these were the agreed upon changes
> > to the API.
> Right.
>
> >
> > 2. Extracting the symbolic handling of PLUS/MINUS and POINTER_PLUS_EXPR
> > into separate functions (extract_range_from_plus_minus_expr and
> > extract_range_from_pointer_plus_expr).
> In retrospect we should have broken down that function in the old vrp
> code.  I suspect that function started out relatively small and just
> kept expanding over time into the horrid mess that became.
>
> THere were a number of places where you ended up pulling code from two
> existing locations into a single point in range-ops.  But again, it was
> just a matter of finding the multiple original source points and mapping
> then into their new location in range-ops.cc, using the copied comments
> as a guide.
>
> >
> > 3. New range_fold_unary_expr and range_fold_binary_expr functions. These
> > are the VRP entry point into range-ops.  They normalize symbolics and do
> > some minor pre-shuffling before calling range-ops to do the actual range
> > operation.
> Right.  I see these as primarily an adapter between existing code and
> the new range ops.
>
> >
> > (I have some minor shuffling of these two functions that I'd like to
> > post as separate clean-ups, but I didn't want to pollute this patchset
> > with them: Fedora taking forever to test and all.)
> Works for me.
>
>
> > 5. Removing the wide-int-range.* files.  Most of the code is now
> > in-lined into range-op.cc with the exception of
> > wide_int_range_set_zero_nonzero_bits which has been moved into tree-vrp.c.
> Right.  Largely follows from #2 above.
>
> >
> > I think that's all!
> >
> > For testing this patchset, I left the old extract_*ary_expr_code in, and
> > added comparison code that trapped if there were any differences between
> > what VRP was getting and what range-ops calculated.  I found no
> > regressions in either a full bootstrap/tests (all languages), or with a
> > full Fedora build.  As a bonus, we found quite a few cases where
> > range-ops was getting better results.
> So to provide a bit more info here.  We ran tests back in the spring
> which resulted in various bugfixes/improvements.  Aldy asked me to
> re-run with their more recent branch.  That run exposed one very clear
> ranger bug which Aldy fixed prior to submitting this patch as well as
> several cases where the results differed.  We verified each and every
> one of them was a case where Ranger was getting better results.
>
> > (Note: At the last minute, Jeff found one regression in the multi-day
> > Fedora build.  I will fix this as a follow-up.  BTW, it does not cause
> > any regressions in a bootstrap or GCC test-run, just a discrepancy on
> > one specific corner case between VRP and range-ops.)
> Right.  What happened was there was a package that failed to build due
> to the Fortran front-end getting tighter in its handling of argument
> checking.  Once that (and various other issues related to using a gcc-10
> snapshot) was worked around I rebuilt the failing packages.  That in
> turn exposed another case where ranger and vrp differed in their results
> (it's a MULT+overflow case IIRC)  Anyway, I'm leaving it to you to
> analyze :-)
>
>
> [ ... ]
>
> >
> > The attached patch is based off of trunk from a few weeks ago.  If
> > approved, I will merge and re-test again with latest trunk.  I won't
> > however, test all of Fedora :-P.
> Agreed, I don't think that's necessary.  FWIW, using a month-old branch
> for testing was amazingly helpful in other respects.  We found ~100
> packages that need updating for gcc-10 as well as a few bugs unrelated
> to Ranger.  I've actually got Sunday's snapshot spinning now and fully
> expect to be spinning Fedora builds with snapshots for the next several
> months.  So I don't expect a Fedora build just to test after 

Re: [PATCH] Fix PR91606

2019-10-02 Thread Nathan Sidwell

On 10/2/19 6:07 AM, Richard Biener wrote:


A middle-end change exposed that the C++ FE handling of pointer-to-member
aggregates in cxx_get_alias_set isn't effective.  The following patch
makes it so by design by marking the structure members as not being
aliased (there can be no pointers like &x.__pfn or &x.__delta) thus
making them behave like fat pointers.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk and affected branches?

Thanks,
Richard.

2019-10-02  Richard Biener  

PR c++/91606
* decl.c (build_ptrmemfunc_type): Mark pointer-to-member
fat pointer structure members as DECL_NONADDRESSABLE_P.



ok, thanks

nathan

--
Nathan Sidwell


Re: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2

2019-10-02 Thread Jan Hubicka
> On Tue, Oct 01, 2019 at 07:04:53PM +0200, Jan Hubicka wrote:
> > * gcc.dg/tree-ssa/ssa-thread-12.c: Mark compute_idf inline.
> 
> > --- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c   (revision 276272)
> > +++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c   (working copy)
> > @@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi
> >  }
> >  
> >  extern int VEC_int_base_length (VEC_int_base *);
> > -bitmap
> > +inline bitmap
> >  compute_idf (bitmap def_blocks, bitmap_head * dfs)
> >  {
> >bitmap_iterator bi;
> 
> Neither this change nor the followup in http://gcc.gnu.org/r276418
> actually works.
> PASS: gcc.dg/tree-ssa/ssa-thread-12.c (test for excess errors)
> gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
> UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread2 "FSM"
> gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
> UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread3 "FSM"
> gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
> UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread4 "FSM"
> 
> We shouldn't really have UNRESOLVED testcases in the testsuite.
> 
> The reason is that with the static __inline__ bitmap in there, no function
> is actually emitted as nothing uses it, so there are no dumps other than
> original (and 4 empty ones) with -fdump-tree-all.

Aha, sorry for that.  I was looking for FAILs and missed the UNRESOLVED.
I will revert that change and go with --param then.  The testcase is
somewhat fragile, depending on the particular chain of inliner decisions.

Honza
> 
>   Jakub


Re: [PR47785] COLLECT_AS_OPTIONS

2019-10-02 Thread Richard Biener
On Wed, Oct 2, 2019 at 10:39 AM Kugan Vivekanandarajah
 wrote:
>
> Hi,
>
> As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
> passing assembler options specified with -Wa, to the link-time driver.
>
> The proposed solution only works for uniform -Wa options across all
> TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
> would require either adjusting partitioning according to flags or
> emitting multiple object files  from a single LTRANS CU. We could
> consider this as a follow up.
>
> Bootstrapped and regression tests on  arm-linux-gcc. Is this OK for trunk?

While it works for your simple cases, it is unlikely to work in practice, since
your implementation needs the assembler options to be present on the link
command line.  I agree that this might be the way for people to go when
they face the issue, but then it needs to be documented somewhere
in the manual.

That is, with COLLECT_AS_OPTION (why singular?  I'd have expected
COLLECT_AS_OPTIONS) available to cc1 we could stream this string
into lto_options and re-materialize it at link time (and diagnose mismatches
if we like).

Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2019-10-02  kugan.vivekanandarajah  
>
> PR lto/78353
> * gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in env.
> (driver::main): Call putenv_COLLECT_AS_OPTION.
> * lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.
>
> gcc/testsuite/ChangeLog:
>
> 2019-10-02  kugan.vivekanandarajah  
>
> PR lto/78353
> * gcc.target/arm/pr78353-1.c: New test.
> * gcc.target/arm/pr78353-2.c: New test.


Re: -O2 inliner returning 1/n: reduce EARLY_INLINING_INSNS for O1 and O2

2019-10-02 Thread Jakub Jelinek
On Tue, Oct 01, 2019 at 07:04:53PM +0200, Jan Hubicka wrote:
>   * gcc.dg/tree-ssa/ssa-thread-12.c: Mark compute_idf inline.

> --- testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (revision 276272)
> +++ testsuite/gcc.dg/tree-ssa/ssa-thread-12.c (working copy)
> @@ -56,7 +56,7 @@ bmp_iter_and_compl (bitmap_iterator * bi
>  }
>  
>  extern int VEC_int_base_length (VEC_int_base *);
> -bitmap
> +inline bitmap
>  compute_idf (bitmap def_blocks, bitmap_head * dfs)
>  {
>bitmap_iterator bi;

Neither this change nor the followup in http://gcc.gnu.org/r276418
actually works.
PASS: gcc.dg/tree-ssa/ssa-thread-12.c (test for excess errors)
gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread2 "FSM"
gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread3 "FSM"
gcc.dg/tree-ssa/ssa-thread-12.c: dump file does not exist
UNRESOLVED: gcc.dg/tree-ssa/ssa-thread-12.c scan-tree-dump thread4 "FSM"

We shouldn't really have UNRESOLVED testcases in the testsuite.

The reason is that with the static __inline__ bitmap in there, no function
is actually emitted as nothing uses it, so there are no dumps other than
original (and 4 empty ones) with -fdump-tree-all.

Jakub


[PATCH] Fix build with GCC 4.3

2019-10-02 Thread Richard Biener


Which complains

libcpp/internal.h:129: error: comma at end of enumerator list

Committed as obvious.

Richard.

Index: libcpp/internal.h
===================================================================
--- libcpp/internal.h   (revision 276439)
+++ libcpp/internal.h   (working copy)
@@ -126,7 +126,7 @@ enum include_type
IT_MAIN, /* main  */
 
IT_DIRECTIVE_HWM = IT_IMPORT + 1,  /* Directives below this.  */
-   IT_HEADER_HWM = IT_DEFAULT + 1,/* Header files below this.  */
+   IT_HEADER_HWM = IT_DEFAULT + 1 /* Header files below this.  */
   };
 
 union utoken


Re: [patch, libgomp] testsuite remove alloca header

2019-10-02 Thread Thomas Schwinge
Hi!

On 2019-09-30T15:22:52+0200, Jakub Jelinek  wrote:
> On Mon, Sep 30, 2019 at 03:16:00PM +0200, Andreas Tobler wrote:
>> here on FreeBSD we lack the alloca.h header so two test cases fail to
>> compile.
>> 
>> Do the same as I did six years ago for another test case: 'Remove alloca.h
>> and replace it with __builtin_alloca'.
>> 
>> Is this ok for trunk?
>> 
>> TIA,
>> Andreas
>> 
>> 2019-09-30  Andreas Tobler  
>> 
>>  * testsuite/libgomp.oacc-c-c++-common/loop-default.h: Remove alloca.h
>>  inlcude. Replace alloca () with __builtin_alloca ().
>
> s/inlcude/include/
>
> Because the tests are already using GCC specific builtins
> (__builtin_goacc_parlevel_id), I think that is ok, they aren't portable
> anyway, CCing Thomas to confirm.
>
>>  * testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Likewise.

Sure, that's fine, and (as far as I'm concerned) pre-approved, should any
additional such changes be necessary later on.

To record the review effort -- not very much here, admittedly ;-) --
please include "Reviewed-by: Thomas Schwinge "
in the commit log, see .


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [Patch, Fortran, OpenMP] Support OpenMP 5.0's use_device_addr

2019-10-02 Thread Jakub Jelinek
On Wed, Oct 02, 2019 at 11:59:46AM +0200, Tobias Burnus wrote:
> This patch adds the first OpenMP 5 feature to gfortran – which is trivial as
> it just adds some parsing; the heavy lifting is already done in the middle
> end. (Minus bugs.)
> 
> Additionally, I add a check for "is_device_ptr" – which is explicitly
> specified in the OpenMP 4.5/5.0 spec (i.e. dummy argument w/o
> pointer/allocatable/value attribute; C/C++ have the requirement pointer or
> array - or (C++) reference to those). – I am not sure whether the Fortran
> restrictions make sense, but that's an OpenMP spec issue (for OpenMP 5.x?).

Indeed.  I guess the reason for requiring a dummy argument that is passed by
reference is that under the hood it works most closely to a C/C++ pointer
(well, more precisely a reference, but for C++ references there are special
rules that might not be appropriate for this purpose; in many cases it then
behaves more like a Fortran scalar pointer, i.e. a pointer + what it points to).

> I am a bit unsure about the checks for use_device_…. The spec has no explicit
> rules, only some which can be deduced from the semantics. For C/C++, gcc/g++
> accepts for OpenMP 4.5 (_ptr) and for OpenMP 5 (_addr) only pointer or
> array types, and C++ additionally references to those. OpenMP 5 for _ptr
> accepts only POINTER_TYPE (C++: plus refs to it).

Indeed, you've seen the discussions.

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -4563,7 +4571,25 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
> *omp_clauses,
>   }
>   break;
> case OMP_LIST_IS_DEVICE_PTR:
if (!n->sym->attr.dummy)
  gfc_error ("Non-dummy object %qs in %s clause at %L",
 n->sym->name, name, &n->where);

Formatting, gfc_error should be only 2 columns to the left of it, several
times below too.

> + if (n->sym->attr.allocatable
> + || (n->sym->ts.type == BT_CLASS
> + && CLASS_DATA (n->sym)->attr.allocatable))
> +   gfc_error ("ALLOCATABLE object %qs in %s clause at %L",
> +  n->sym->name, name, &n->where);
> + if (n->sym->attr.pointer
> + || (n->sym->ts.type == BT_CLASS
> + && CLASS_DATA (n->sym)->attr.pointer))
> +   gfc_error ("POINTER object %qs in %s clause at %L",
> +  n->sym->name, name, &n->where);
> + if (n->sym->attr.value)
> +   gfc_error ("VALUE object %qs in %s clause at %L",
> +  n->sym->name, name, &n->where);
> + break;
> case OMP_LIST_USE_DEVICE_PTR:
> +   case OMP_LIST_USE_DEVICE_ADDR:
>   /* FIXME: Handle these.  */

I think for use_device_addr there is no reason for a FIXME, there shouldn't
be restrictions on what variable is allowed in it, the restrictions are how
one can actually use the variable inside of the region (though, the
restriction is vague).  Though perhaps not worth having a separate break;
for it.

> diff --git a/gcc/testsuite/gfortran.dg/gomp/is_device_ptr-1.f90 
> b/gcc/testsuite/gfortran.dg/gomp/is_device_ptr-1.f90
> new file mode 100644
> index 000..18211df0ea4
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/is_device_ptr-1.f90
> @@ -0,0 +1,24 @@
> +! { dg-do compile }
> +subroutine test(b,c,d)
> +  implicit none
> +  integer, value, target :: b
> +  integer, pointer :: c
> +  integer, allocatable, target :: d
> +
> +  integer, target :: a(5)
> +
> +  !$omp target is_device_ptr(a) ! { dg-error "Non-dummy object .a. in 
> IS_DEVICE_PTR clause" }
> +  !$omp target is_device_ptr(b) ! { dg-error "VALUE object .b. in 
> IS_DEVICE_PTR clause" }
> +  !$omp target is_device_ptr(c) ! { dg-error "POINTER object .c. in 
> IS_DEVICE_PTR clause" }
> +  !$omp target is_device_ptr(d) ! { dg-error "ALLOCATABLE object .d. in 
> IS_DEVICE_PTR clause" }
> +  !$omp end target
> +  !$omp end target
> +  !$omp end target
> +  !$omp end target

Can you please move the !$opm end target directives after each !$omp target?
Encountering a target inside of target is unspecified behavior...

> +
> +  !$omp target data map(a) use_device_addr(a)  ! Should be okay
> +  !$omp end target data
> +
> +  !$omp target data map(c) use_device_ptr(c)  ! Should be okay
> +  !$omp end target data
> +end subroutine test

Ok with those nits fixed.

Jakub


Re: [PATCH] Allow vectorization of __builtin_bswap16 (PR tree-optimization/91940)

2019-10-02 Thread Richard Biener
On Wed, 2 Oct 2019, Jakub Jelinek wrote:

> On Wed, Nov 09, 2016 at 09:14:55AM +0100, Richard Biener wrote:
> > The following implements vectorization of bswap via VEC_PERM_EXPR
> > on the corresponding QImode vector.
> > 
> > ARM already has backend handling via the builtin_vectorized_call
> > hook and thus there were already testcases available.  It doesn't
> > end up working for vect-bswap16.c because we have a promoted
> > argument to __builtin_bswap16 which confuses vectorization.
> 
> Indeed.  The following patch handles that in tree-vect-patterns.c.
> If it sees a __builtin_bswap16 with the promoted argument, it checks if we'd
> vectorize the builtin if it didn't have a promoted argument and if yes,
> it just changes it in a pattern_stmt to use an unpromoted argument or casts
> it first to the right type.  This works e.g. for the SSE4 case.
> Otherwise, it handles __builtin_bswap16 like a x r<< 8, if that is
> vectorizable, emits a pattern_stmt with x r<< 8, if it isn't, falls back to
> (x << 8) | (x >> 8) if that can be vectorized.  The last case matters for 
> SSE2.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2019-10-02  Jakub Jelinek  
> 
>   PR tree-optimization/91940
>   * tree-vect-patterns.c: Include tree-vector-builder.h and
>   vec-perm-indices.h.
>   (vect_recog_rotate_pattern): Also handle __builtin_bswap16, either by
>   unpromoting the argument back to uint16_t, or by converting into a
>   rotate, or into shifts plus ior.
> 
>   * gcc.dg/vect/vect-bswap16.c: Add -msse4 on x86, run on all targets,
>   expect vectorized 1 loops message on both vect_bswap and sse4_runtime
>   targets.
>   * gcc.dg/vect/vect-bswap16a.c: New test.
> 
> --- gcc/tree-vect-patterns.c.jj   2019-09-20 12:25:48.186387075 +0200
> +++ gcc/tree-vect-patterns.c  2019-10-01 11:29:18.229215895 +0200
> @@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.
>  #include "cgraph.h"
>  #include "omp-simd-clone.h"
>  #include "predict.h"
> +#include "tree-vector-builder.h"
> +#include "vec-perm-indices.h"
>  
>  /* Return true if we have a useful VR_RANGE range for VAR, storing it
> in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
> @@ -2168,24 +2170,107 @@ vect_recog_rotate_pattern (stmt_vec_info
>enum vect_def_type dt;
>optab optab1, optab2;
>edge ext_def = NULL;
> +  bool bswap16_p = false;
>  
> -  if (!is_gimple_assign (last_stmt))
> -return NULL;
> -
> -  rhs_code = gimple_assign_rhs_code (last_stmt);
> -  switch (rhs_code)
> +  if (is_gimple_assign (last_stmt))
>  {
> -case LROTATE_EXPR:
> -case RROTATE_EXPR:
> -  break;
> -default:
> -  return NULL;
> +  rhs_code = gimple_assign_rhs_code (last_stmt);
> +  switch (rhs_code)
> + {
> + case LROTATE_EXPR:
> + case RROTATE_EXPR:
> +   break;
> + default:
> +   return NULL;
> + }
> +
> +  lhs = gimple_assign_lhs (last_stmt);
> +  oprnd0 = gimple_assign_rhs1 (last_stmt);
> +  type = TREE_TYPE (oprnd0);
> +  oprnd1 = gimple_assign_rhs2 (last_stmt);
> +}
> +  else if (gimple_call_builtin_p (last_stmt, BUILT_IN_BSWAP16))
> +{
> +  /* __builtin_bswap16 (x) is another form of x r>> 8.
> +  The vectorizer has bswap support, but only if the argument isn't
> +  promoted.  */
> +  lhs = gimple_call_lhs (last_stmt);
> +  oprnd0 = gimple_call_arg (last_stmt, 0);
> +  type = TREE_TYPE (oprnd0);
> +  if (TYPE_PRECISION (TREE_TYPE (lhs)) != 16
> +   || TYPE_PRECISION (type) <= 16
> +   || TREE_CODE (oprnd0) != SSA_NAME
> +   || BITS_PER_UNIT != 8
> +   || !TYPE_UNSIGNED (TREE_TYPE (lhs)))
> + return NULL;
> +
> +  stmt_vec_info def_stmt_info;
> +  if (!vect_is_simple_use (oprnd0, vinfo, &dt, &def_stmt_info, 
> &def_stmt))
> + return NULL;
> +
> +  if (dt != vect_internal_def)
> + return NULL;
> +
> +  if (gimple_assign_cast_p (def_stmt))
> + {
> +   def = gimple_assign_rhs1 (def_stmt);
> +   if (INTEGRAL_TYPE_P (TREE_TYPE (def))
> +   && TYPE_PRECISION (TREE_TYPE (def)) == 16)
> + oprnd0 = def;
> + }
> +
> +  type = TREE_TYPE (lhs);
> +  vectype = get_vectype_for_scalar_type (type);
> +  if (vectype == NULL_TREE)
> + return NULL;
> +
> +  if (tree char_vectype = get_same_sized_vectype (char_type_node, 
> vectype))
> + {
> +   /* The encoding uses one stepped pattern for each byte in the
> +  16-bit word.  */
> +   vec_perm_builder elts (TYPE_VECTOR_SUBPARTS (char_vectype), 2, 3);
> +   for (unsigned i = 0; i < 3; ++i)
> + for (unsigned j = 0; j < 2; ++j)
> +   elts.quick_push ((i + 1) * 2 - j - 1);
> +
> +   vec_perm_indices indices (elts, 1,
> + TYPE_VECTOR_SUBPARTS (char_vectype));
> +   if (can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
> +   

[PATCH] Fix PR91606

2019-10-02 Thread Richard Biener


A middle-end change exposed that the C++ FE handling of pointer-to-member
aggregates in cxx_get_alias_set isn't effective.  The following patch
makes it so by design by marking the structure members as not being
aliased (there can be no pointers like &x.__pfn or &x.__delta) thus
making them behave like fat pointers.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK for trunk and affected branches?

Thanks,
Richard.

2019-10-02  Richard Biener  

PR c++/91606
* decl.c (build_ptrmemfunc_type): Mark pointer-to-member
fat pointer structure members as DECL_NONADDRESSABLE_P.

* g++.dg/torture/pr91606.C: New testcase.

Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c   (revision 276396)
+++ gcc/cp/decl.c   (working copy)
@@ -9554,10 +9554,12 @@ build_ptrmemfunc_type (tree type)
   TYPE_PTRMEMFUNC_FLAG (t) = 1;
 
   field = build_decl (input_location, FIELD_DECL, pfn_identifier, type);
+  DECL_NONADDRESSABLE_P (field) = 1;
   fields = field;
 
   field = build_decl (input_location, FIELD_DECL, delta_identifier, 
  delta_type_node);
+  DECL_NONADDRESSABLE_P (field) = 1;
   DECL_CHAIN (field) = fields;
   fields = field;
 
Index: gcc/testsuite/g++.dg/torture/pr91606.C
===================================================================
--- gcc/testsuite/g++.dg/torture/pr91606.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr91606.C  (working copy)
@@ -0,0 +1,109 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fstrict-aliasing" } */
+
+#include <array>
+#include <cstdlib>
+#include <type_traits>
+
+template <typename T1, typename T2>
+struct variant
+{
+  constexpr variant(T1 arg)
+  : f1(arg),
+  index(0)
+  {}
+
+  constexpr variant(T2 arg)
+  : f2(arg),
+  index(1)
+  {}
+
+  union
+{
+  T1 f1;
+  T2 f2;
+};
+  std::size_t index = 0;
+};
+
+template <typename T1, typename T2>
+constexpr const T1* get_if(const variant<T1, T2>* v)
+{
+  if (v->index != 0)
+{
+  return nullptr;
+}
+  return &v->f1;
+}
+
+template <typename T2, typename T1>
+constexpr const T2* get_if(const variant<T1, T2>* v)
+{
+  if (v->index != 1)
+{
+  return nullptr;
+}
+  return &v->f2;
+}
+
+template <typename T, std::size_t N>
+struct my_array
+{
+  constexpr const T* begin() const
+{
+  return data;
+}
+
+  constexpr const T* end() const
+{
+  return data + N;
+}
+
+  T data[N];
+};
+
+template <typename... Ts>
+constexpr auto get_array_of_variants(Ts ...ptrs)
+{
+  return std::array<variant<std::decay_t<Ts>...>, sizeof...(Ts)>{ ptrs... };
+}
+
+template <typename Class>
+constexpr auto get_member_functions();
+
+template <typename Class, typename Member>
+constexpr int getFuncId(Member (Class::*memFuncPtr))
+{
+  int idx = 0u;
+  for (auto &anyFunc : get_member_functions<Class>())
+{
+  if (auto *specificFunc = get_if<Member (Class::*)>(&anyFunc))
+   {
+ if (*specificFunc == memFuncPtr)
+   {
+ return idx;
+   }
+   }
+  ++idx;
+}
+  std::abort();
+}
+
+struct MyStruct
+{
+  void fun1(int /*a*/) {}
+
+  int fun2(char /*b*/, short /*c*/, bool /*d*/) { return 0; }
+
+};
+
+template <>
+constexpr auto get_member_functions<MyStruct>()
+{
+  return get_array_of_variants(&MyStruct::fun1, &MyStruct::fun2);
+}
+
+int main()
+{
+  return getFuncId(&MyStruct::fun1);
+}


[Patch, Fortran, OpenMP] Support OpenMP 5.0's use_device_addr

2019-10-02 Thread Tobias Burnus
This patch adds the first OpenMP 5 feature to gfortran – which is
trivial as it just adds some parsing; the heavy lifting is already done
in the middle end. (Minus bugs.)


Additionally, I add a check for "is_device_ptr" – which is explicitly
specified in the OpenMP 4.5/5.0 spec (i.e. dummy argument w/o
pointer/allocatable/value attribute; C/C++ have the requirement pointer
or array - or (C++) reference to those). – I am not sure whether the
Fortran restrictions make sense, but that's an OpenMP spec issue (for
OpenMP 5.x?).


I am a bit unsure about the checks for use_device_…. The spec has no
explicit rules, only some which can be deduced from the semantics. For
C/C++, gcc/g++ accepts for OpenMP 4.5 (_ptr) and for OpenMP 5 (_addr)
only pointer or array types, and C++ additionally references to those.
OpenMP 5 for _ptr accepts only POINTER_TYPE (C++: plus refs to it).


In any case, the current omp-low.c implementation has issues with
use_device_ptr and nonallocatable, nonpointer scalars that are locally
defined or have the value attribute.


Built and regtested on x86_64-gnu-linux.

OK for the trunk?

Tobias

	* dump-parse-tree.c (show_omp_clauses): Handle OMP_LIST_USE_DEVICE_ADDR.
	* gfortran.h (enum): Add OMP_LIST_USE_DEVICE_ADDR.
	* openmp.c (omp_mask1): Likewise.
	(gfc_match_omp_clauses): Match 'use_device_addr'.
	(OMP_TARGET_DATA_CLAUSES): Add OMP_LIST_USE_DEVICE_ADDR.
	(resolve_omp_clauses): Add it; add is_device_ptr checks.

	* gfortran.dg/gomp/is_device_ptr-1.f90: New.

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 513f211b68b..9d7b26f5f6a 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1507,6 +1507,7 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	  case OMP_LIST_CACHE: type = "CACHE"; break;
 	  case OMP_LIST_IS_DEVICE_PTR: type = "IS_DEVICE_PTR"; break;
 	  case OMP_LIST_USE_DEVICE_PTR: type = "USE_DEVICE_PTR"; break;
+	  case OMP_LIST_USE_DEVICE_ADDR: type = "USE_DEVICE_ADDR"; break;
 	  default:
 	gcc_unreachable ();
 	  }
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 6f7717d1134..a70978bf49b 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1263,6 +1263,7 @@ enum
   OMP_LIST_CACHE,
   OMP_LIST_IS_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_PTR,
+  OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NUM
 };
 
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index bda7f288989..17b0461276a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -780,6 +780,7 @@ enum omp_mask1
   OMP_CLAUSE_SIMD,
   OMP_CLAUSE_THREADS,
   OMP_CLAUSE_USE_DEVICE_PTR,
+  OMP_CLAUSE_USE_DEVICE_ADDR,  /* Actually, OpenMP 5.0.  */
   OMP_CLAUSE_NOWAIT,
   /* This must come last.  */
   OMP_MASK1_LAST
@@ -1849,6 +1850,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 		   ("use_device_ptr (",
 		&c->lists[OMP_LIST_USE_DEVICE_PTR], false) == MATCH_YES)
 	continue;
+	  if ((mask & OMP_CLAUSE_USE_DEVICE_ADDR)
+	  && gfc_match_omp_variable_list
+		   ("use_device_addr (",
+		&c->lists[OMP_LIST_USE_DEVICE_ADDR], false) == MATCH_YES)
+	continue;
 	  break;
 	case 'v':
 	  /* VECTOR_LENGTH must be matched before VECTOR, because the latter
@@ -2477,7 +2485,7 @@ cleanup:
| OMP_CLAUSE_IS_DEVICE_PTR)
 #define OMP_TARGET_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_DEVICE) | OMP_CLAUSE_MAP | OMP_CLAUSE_IF	\
-   | OMP_CLAUSE_USE_DEVICE_PTR)
+   | OMP_CLAUSE_USE_DEVICE_PTR | OMP_CLAUSE_USE_DEVICE_ADDR)
 #define OMP_TARGET_ENTER_DATA_CLAUSES \
   (omp_mask (OMP_CLAUSE_DEVICE) | OMP_CLAUSE_MAP | OMP_CLAUSE_IF	\
| OMP_CLAUSE_DEPEND | OMP_CLAUSE_NOWAIT)
@@ -4006,7 +4014,7 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 = { "PRIVATE", "FIRSTPRIVATE", "LASTPRIVATE", "COPYPRIVATE", "SHARED",
 	"COPYIN", "UNIFORM", "ALIGNED", "LINEAR", "DEPEND", "MAP",
 	"TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "LINK", "USE_DEVICE",
-	"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR" };
+	"CACHE", "IS_DEVICE_PTR", "USE_DEVICE_PTR", "USE_DEVICE_ADDR" };
 
   if (omp_clauses == NULL)
 return;
@@ -4563,7 +4571,25 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		}
 	break;
 	  case OMP_LIST_IS_DEVICE_PTR:
+	if (!n->sym->attr.dummy)
+		  gfc_error ("Non-dummy object %qs in %s clause at %L",
+			 n->sym->name, name, &n->where);
+	if (n->sym->attr.allocatable
+		|| (n->sym->ts.type == BT_CLASS
+		&& CLASS_DATA (n->sym)->attr.allocatable))
+		  gfc_error ("ALLOCATABLE object %qs in %s clause at %L",
+			 n->sym->name, name, &n->where);
+	if (n->sym->attr.pointer
+		|| (n->sym->ts.type == BT_CLASS
+		&& CLASS_DATA (n->sym)->attr.pointer))
+		  gfc_error ("POINTER object %qs in %s clause at %L",
+			 n->sym->name, name, &n->where);
+	if (n->sym->attr.value)
+		  gfc_error ("VALUE object %qs in %s clause at %L",
+			 n->sym->name, name, &n->where);
+	break;
 	  case OMP_LIST_USE_DEVICE_PTR:
+	  c

Re: [Patch][Fortran] Vastly improve error message during parsing – simple patch for OpenMP/OpenACC directives

2019-10-02 Thread Jakub Jelinek
Hi!

Nice!

On Wed, Oct 02, 2019 at 11:24:46AM +0200, Tobias Burnus wrote:
>   gcc/fortran/
>   * openmp.c (gfc_match_omp_clauses): Show a clause-parsing
>   error if none was raised before.
>   * parse.c (matcha, matcho): If error occurred after
>   OpenMP/OpenACC directive matched, do not try other directives.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/asyncwait-1.f95: Handle new error message.
>   * gfortran.dg/goacc/asyncwait-2.f95: Likewise

s/Likewise/Likewise./g

>   * gfortran.dg/goacc/asyncwait-3.f95: Likewise
>   * gfortran.dg/goacc/asyncwait-4.f95: Likewise
>   * gfortran.dg/goacc/default-2.f: Likewise
>   * gfortran.dg/goacc/enter-exit-data.f95: Likewise
>   * gfortran.dg/goacc/if.f95: Likewise
>   * gfortran.dg/goacc/list.f95: Likewise
>   * gfortran.dg/goacc/literal.f95: Likewise
>   * gfortran.dg/goacc/loop-2-kernels-tile.f: Likewise95

95 should move before :

>   * gfortran.dg/goacc/loop-2-parallel-tile.f95: Likewise
>   * gfortran.dg/goacc/loop-7.f95: Likewise
>   * gfortran.dg/goacc/parallel-kernels-cla: Likewiseuses.f95

Similarly for uses.f95

>   * gfortran.dg/goacc/routine-6.f90: Likewise
>   * gfortran.dg/goacc/several-directives.f95: Likewise
>   * gfortran.dg/goacc/sie.f95: Likewise
>   * gfortran.dg/goacc/tile-1.f90: Likewise
>   * gfortran.dg/goacc/update-if_present-2.: Likewisef90

Similarly for f90

>   * gfortran.dg/gomp/declare-simd-1.f90: Likewise
>   * gfortran.dg/gomp/pr29759.f90: Likewise

Ok with the ChangeLog nits fixed.

Jakub


Re: [Patch][omp-low.c,fortran] Simple fix for optional argument handling with OpenMP's use_device_ptr

2019-10-02 Thread Jakub Jelinek
On Tue, Oct 01, 2019 at 04:19:11PM +0200, Tobias Burnus wrote:
> OK for the trunk?

Ok, with a small formatting nit below.

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -47,6 +47,15 @@ along with GCC; see the file COPYING3.  If not see
>  
>  int ompws_flags;
>  
> +/* True if OpenMP should treat this DECL as an optional argument.  */
> +
> +bool
> +gfc_omp_is_optional_argument (const_tree decl)
> +{
> +  return (TREE_CODE (decl) == PARM_DECL && DECL_LANG_SPECIFIC (decl)
> +   && GFC_DECL_OPTIONAL_ARGUMENT (decl));
> +}

I think our coding style is that if the whole && or || condition fits
on one line, it goes on one line, if it doesn't fit, each && or ||
subexpression goes on a separate line, so no mixing of 2 subexpressions on
this line, 3 on following line, 1 on another one.  So the above should be
  return (TREE_CODE (decl) == PARM_DECL
  && DECL_LANG_SPECIFIC (decl)
  && GFC_DECL_OPTIONAL_ARGUMENT (decl));

And, as I said earlier, more work will be needed on the OpenMP side to handle
optional arguments e.g. in data sharing clauses properly.

Jakub


[PATCH] Allow vectorization of __builtin_bswap16 (PR tree-optimization/91940)

2019-10-02 Thread Jakub Jelinek
On Wed, Nov 09, 2016 at 09:14:55AM +0100, Richard Biener wrote:
> The following implements vectorization of bswap via VEC_PERM_EXPR
> on the corresponding QImode vector.
> 
> ARM already has backend handling via the builtin_vectorized_call
> hook and thus there were already testcases available.  It doesn't
> end up working for vect-bswap16.c because we have a promoted
> argument to __builtin_bswap16 which confuses vectorization.

Indeed.  The following patch handles that in tree-vect-patterns.c.
If it sees a __builtin_bswap16 with the promoted argument, it checks if we'd
vectorize the builtin if it didn't have a promoted argument and if yes,
it just changes it in a pattern_stmt to use an unpromoted argument or casts
it first to the right type.  This works e.g. for the SSE4 case.
Otherwise, it handles __builtin_bswap16 like x r<< 8, if that is
vectorizable, emits a pattern_stmt with x r<< 8, if it isn't, falls back to
(x << 8) | (x >> 8) if that can be vectorized.  The last case matters for SSE2.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-10-02  Jakub Jelinek  

PR tree-optimization/91940
* tree-vect-patterns.c: Include tree-vector-builder.h and
vec-perm-indices.h.
(vect_recog_rotate_pattern): Also handle __builtin_bswap16, either by
unpromoting the argument back to uint16_t, or by converting into a
rotate, or into shifts plus ior.

* gcc.dg/vect/vect-bswap16.c: Add -msse4 on x86, run on all targets,
expect vectorized 1 loops message on both vect_bswap and sse4_runtime
targets.
* gcc.dg/vect/vect-bswap16a.c: New test.

--- gcc/tree-vect-patterns.c.jj 2019-09-20 12:25:48.186387075 +0200
+++ gcc/tree-vect-patterns.c2019-10-01 11:29:18.229215895 +0200
@@ -46,6 +46,8 @@ along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "omp-simd-clone.h"
 #include "predict.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 /* Return true if we have a useful VR_RANGE range for VAR, storing it
in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
@@ -2168,24 +2170,107 @@ vect_recog_rotate_pattern (stmt_vec_info
   enum vect_def_type dt;
   optab optab1, optab2;
   edge ext_def = NULL;
+  bool bswap16_p = false;
 
-  if (!is_gimple_assign (last_stmt))
-return NULL;
-
-  rhs_code = gimple_assign_rhs_code (last_stmt);
-  switch (rhs_code)
+  if (is_gimple_assign (last_stmt))
 {
-case LROTATE_EXPR:
-case RROTATE_EXPR:
-  break;
-default:
-  return NULL;
+  rhs_code = gimple_assign_rhs_code (last_stmt);
+  switch (rhs_code)
+   {
+   case LROTATE_EXPR:
+   case RROTATE_EXPR:
+ break;
+   default:
+ return NULL;
+   }
+
+  lhs = gimple_assign_lhs (last_stmt);
+  oprnd0 = gimple_assign_rhs1 (last_stmt);
+  type = TREE_TYPE (oprnd0);
+  oprnd1 = gimple_assign_rhs2 (last_stmt);
+}
+  else if (gimple_call_builtin_p (last_stmt, BUILT_IN_BSWAP16))
+{
+  /* __builtin_bswap16 (x) is another form of x r>> 8.
+The vectorizer has bswap support, but only if the argument isn't
+promoted.  */
+  lhs = gimple_call_lhs (last_stmt);
+  oprnd0 = gimple_call_arg (last_stmt, 0);
+  type = TREE_TYPE (oprnd0);
+  if (TYPE_PRECISION (TREE_TYPE (lhs)) != 16
+ || TYPE_PRECISION (type) <= 16
+ || TREE_CODE (oprnd0) != SSA_NAME
+ || BITS_PER_UNIT != 8
+ || !TYPE_UNSIGNED (TREE_TYPE (lhs)))
+   return NULL;
+
+  stmt_vec_info def_stmt_info;
+  if (!vect_is_simple_use (oprnd0, vinfo, &dt, &def_stmt_info, &def_stmt))
+   return NULL;
+
+  if (dt != vect_internal_def)
+   return NULL;
+
+  if (gimple_assign_cast_p (def_stmt))
+   {
+ def = gimple_assign_rhs1 (def_stmt);
+ if (INTEGRAL_TYPE_P (TREE_TYPE (def))
+ && TYPE_PRECISION (TREE_TYPE (def)) == 16)
+   oprnd0 = def;
+   }
+
+  type = TREE_TYPE (lhs);
+  vectype = get_vectype_for_scalar_type (type);
+  if (vectype == NULL_TREE)
+   return NULL;
+
+  if (tree char_vectype = get_same_sized_vectype (char_type_node, vectype))
+   {
+ /* The encoding uses one stepped pattern for each byte in the
+16-bit word.  */
+ vec_perm_builder elts (TYPE_VECTOR_SUBPARTS (char_vectype), 2, 3);
+ for (unsigned i = 0; i < 3; ++i)
+   for (unsigned j = 0; j < 2; ++j)
+ elts.quick_push ((i + 1) * 2 - j - 1);
+
+ vec_perm_indices indices (elts, 1,
+   TYPE_VECTOR_SUBPARTS (char_vectype));
+ if (can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
+   {
+ /* vectorizable_bswap can handle the __builtin_bswap16 if we
+undo the argument promotion.  */
+ if (!useless_type_conversion_p (type, TREE_TYPE (oprnd0)))
+   {
+ de

Re: [RFC][Fortran,patch] %C error diagnostic location

2019-10-02 Thread Tobias Burnus

Hi all,

I looked at the error messages – and I like it. – Please review.

There is actually some "fallout": one testsuite case actually checks for 
the column location – I didn't even know that one could do it that way.


That's fixed by the attached patch (I also included the original patch).
OK for the trunk?

Tobias

PS: The effect of the patch can also be seen at the patch description 
at: https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00110.html


On 10/2/19 12:26 AM, Tobias Burnus wrote:

Hi all,

my feeling is that %C locations are always off by one, e.g., showing 
the (1) under the last white-space character before the place where 
the error occurred – the match starts at the character after the 
gfc_current_location.


That bothered me for a while – but today, I was wondering whether one 
shouldn't simply bump the %C location by one – such that it shows at 
the first wrong character and not at the last okay character.


What do you think?


Another observation (unfixed): If gfortran buffers the error, the %C 
does not seem to get resolved at gfc_{error,warning} time but at the 
time when the buffer is flushed – which will have a reset error location.


Cheers,

Tobias


	* error.c (error_print, gfc_format_decoder): Fix off-by-one issue with %C.

	* gfortran.dg/use_without_only_1.f90: Update column num in dg-warning.

diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index a0ce7a6b190..815cae9d7e7 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -618,12 +618,18 @@ error_print (const char *type, const char *format0, va_list argp)
 	  {
 		l2 = loc;
 		arg[pos].u.stringval = "(2)";
+		/* Point %C to the first offending character, not the last good one.  */
+		if (arg[pos].type == TYPE_CURRENTLOC)
+		  l2->nextc++;
 	  }
 	else
 	  {
 		l1 = loc;
 		have_l1 = 1;
 		arg[pos].u.stringval = "(1)";
+		/* Point %C to the first offending character, not the last good one.  */
+		if (arg[pos].type == TYPE_CURRENTLOC)
+		  l1->nextc++;
 	  }
 	break;
 
@@ -963,6 +969,9 @@ gfc_format_decoder (pretty_printer *pp, text_info *text, const char *spec,
 	  loc = va_arg (*text->args_ptr, locus *);
 	gcc_assert (loc->nextc - loc->lb->line >= 0);
 	unsigned int offset = loc->nextc - loc->lb->line;
+	if (*spec == 'C')
+	  /* Point %C to the first offending character, not the last good one.  */
+	  offset++;
 	/* If location[0] != UNKNOWN_LOCATION means that we already
 	   processed one of %C/%L.  */
 	int loc_num = text->get_location (0) == UNKNOWN_LOCATION ? 0 : 1;
@@ -1400,7 +1409,7 @@ gfc_internal_error (const char *gmsgid, ...)
 void
 gfc_clear_error (void)
 {
-  error_buffer.flag = 0;
+  error_buffer.flag = false;
   warnings_not_errors = false;
   gfc_clear_pp_buffer (pp_error_buffer);
 }
diff --git a/gcc/testsuite/gfortran.dg/use_without_only_1.f90 b/gcc/testsuite/gfortran.dg/use_without_only_1.f90
index 06af9853933..ad64b206b76 100644
--- a/gcc/testsuite/gfortran.dg/use_without_only_1.f90
+++ b/gcc/testsuite/gfortran.dg/use_without_only_1.f90
@@ -6,16 +6,16 @@ MODULE foo
 END MODULE
 
 MODULE testmod
-  USE foo ! { dg-warning "6:has no ONLY qualifier" }
+  USE foo ! { dg-warning "7:has no ONLY qualifier" }
   IMPLICIT NONE
 CONTAINS
   SUBROUTINE S1
- USE foo ! { dg-warning "9:has no ONLY qualifier" }
+ USE foo ! { dg-warning "10:has no ONLY qualifier" }
   END SUBROUTINE S1
   SUBROUTINE S2
  USE foo, ONLY: bar 
   END SUBROUTINE
   SUBROUTINE S3
- USE ISO_C_BINDING ! { dg-warning "9:has no ONLY qualifier" }
+ USE ISO_C_BINDING ! { dg-warning "10:has no ONLY qualifier" }
   END SUBROUTINE S3
 END MODULE


[Patch][Fortran] Vastly improve error message during parsing – simple patch for OpenMP/OpenACC directives

2019-10-02 Thread Tobias Burnus

Simple patch – large effect.

Currently, gfortran prints for any parsing issue in !$omp (-fopenmp) and 
likewise for !$acc (-fopenacc) just:


    8 | !$omp target data  map(to:a) is_device_ptr(a)
  |  1
Error: Unclassifiable OpenMP directive at (1)

This patch adds a new error message – such that the error text is better 
and, in particular, the error location makes sense:


8 | !$omp target data  map(to:a) is_device_ptr(a)
  |  1
Error: Failed to match clause at (1)

(is_device_ptr is invalid here, as it is only permitted for "target" and 
not for "target data").



In openmp.c, I now write a more explicit (and correct) error message, 
unless – in a rare case – an error message has already been printed before.


parse.c: In order to get this to work properly, two things need to 
happen in parse.c: First, once an error message has been printed, we must 
not fall through to matching the next item – otherwise, one gets the error 
above for "target data" plus another one for "target" (when parsing the 
"clause" "data"). But if the directive name has matched, we know for sure 
we are in that directive; thus, the MATCH_ERROR must come from a clause.* 
Secondly, reverting the location (undo_new_statement) should be avoided 
such that %C points to the right location. (For buffered messages, it is 
evaluated at buffer-flush time, not when gfc_{error,warning} is called.)


OK for the trunk?

Tobias

PS: Without my %C patch, the error location would not be under the "i" 
of "is_device_ptr" but before the "i" under the space; the old 
"Unclassifiable" message is from GCC 9 and, hence, "1" is under the space.
Please consider approving that patch as well: 
https://gcc.gnu.org/ml/fortran/2019-10/msg00010.html


[* That's different to normal Fortran parsing – especially with 
fixed-form which ignores spaces: "d o i = complex_expr, …": Until 
parsing the "," it can be either an assignment to "doi" or a loop ("do i = 
…"). But here we know.]


	gcc/fortran/
	* openmp.c (gfc_match_omp_clauses): Show a clause-parsing
	error if none was raised before.
	* parse.c (matcha, matcho): If error occurred after
	OpenMP/OpenACC directive matched, do not try other directives.

	gcc/testsuite/
	* gfortran.dg/goacc/asyncwait-1.f95: Handle new error message.
	* gfortran.dg/goacc/asyncwait-2.f95: Likewise
	* gfortran.dg/goacc/asyncwait-3.f95: Likewise
	* gfortran.dg/goacc/asyncwait-4.f95: Likewise
	* gfortran.dg/goacc/default-2.f: Likewise
	* gfortran.dg/goacc/enter-exit-data.f95: Likewise
	* gfortran.dg/goacc/if.f95: Likewise
	* gfortran.dg/goacc/list.f95: Likewise
	* gfortran.dg/goacc/literal.f95: Likewise
	* gfortran.dg/goacc/loop-2-kernels-tile.f: Likewise95
	* gfortran.dg/goacc/loop-2-parallel-tile.f95: Likewise
	* gfortran.dg/goacc/loop-7.f95: Likewise
	* gfortran.dg/goacc/parallel-kernels-cla: Likewiseuses.f95
	* gfortran.dg/goacc/routine-6.f90: Likewise
	* gfortran.dg/goacc/several-directives.f95: Likewise
	* gfortran.dg/goacc/sie.f95: Likewise
	* gfortran.dg/goacc/tile-1.f90: Likewise
	* gfortran.dg/goacc/update-if_present-2.: Likewisef90
	* gfortran.dg/gomp/declare-simd-1.f90: Likewise
	* gfortran.dg/gomp/pr29759.f90: Likewise

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index bda7f288989..17b0461276a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1922,6 +1928,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const omp_mask mask,
 
   if (gfc_match_omp_eos () != MATCH_YES)
 {
+  if (!gfc_error_flag_test ())
+	gfc_error ("Failed to match clause at %C");
   gfc_free_omp_clauses (c);
   return MATCH_ERROR;
 }
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index 5bd04b839ce..4d343450555 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -609,13 +609,18 @@ decode_statement (void)
 
 /* Like match and if spec_only, goto do_spec_only without actually
matching.  */
+/* If the directive matched but the clauses failed, do not start
+   matching the next directive in the same switch statement. */
 #define matcha(keyword, subr, st)\
 do {			\
+  match m2;			\
   if (spec_only && gfc_match (keyword) == MATCH_YES)	\
 	goto do_spec_only;	\
-  else if (match_word (keyword, subr, &old_locus)		\
+  else if ((m2 = match_word (keyword, subr, &old_locus))	\
 	   == MATCH_YES)	\
 	return st;		\
+  else if (m2 == MATCH_ERROR)\
+	goto error_handling;	\
   else			\
 	undo_new_statement ();  	\
 } while (0)
@@ -711,6 +716,7 @@ decode_oacc_directive (void)
   /* Directive not found or stored an error message.
  Check and give up.  */
 
+ error_handling:
   if (gfc_error_check () == 0)
 gfc_error_now ("Unclassifiable OpenACC directive at %C");
 
@@ -746,18 +752,23 @@ decode_oacc_directive (void)
 
 /* Like match, but don't match anything if not -fopenmp
and if spec_only, goto do_spec_only without actually matching.  */
+/* If the directive matched but the clauses failed, do not star

[PATCH] Split out PHI transform from vectorizable_reduction

2019-10-02 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-10-02  Richard Biener  

* tree-vectorizer.h (stmt_vec_info_type::cycle_phi_info_type):
New.
(vect_transform_cycle_phi): Declare.
* tree-vect-stmts.c (vect_transform_stmt): Call
vect_transform_cycle_phi.
* tree-vect-loop.c (vectorizable_reduction): Split out
PHI transformation stage to ...
(vect_transform_cycle_phi): ... here.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 350cee58246..a3fd011e6c4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5783,7 +5783,6 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
   bool is_simple_use;
   int i;
   int ncopies;
-  stmt_vec_info prev_phi_info;
   bool single_defuse_cycle = false;
   int j;
   tree ops[3];
@@ -5811,207 +5810,15 @@ vectorizable_reduction (stmt_vec_info stmt_info, 
gimple_stmt_iterator *gsi,
 gcc_assert (slp_node
&& REDUC_GROUP_FIRST_ELEMENT (stmt_info) == stmt_info);
 
-  if (gphi *phi = dyn_cast <gphi *> (stmt_info->stmt))
+  if (is_a <gphi *> (stmt_info->stmt))
 {
-  tree phi_result = gimple_phi_result (phi);
   /* Analysis is fully done on the reduction stmt invocation.  */
-  if (! vec_stmt)
-   {
- if (slp_node)
-   slp_node_instance->reduc_phis = slp_node;
-
- STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
- return true;
-   }
-
-  if (STMT_VINFO_REDUC_TYPE (stmt_info) == FOLD_LEFT_REDUCTION)
-   /* Leave the scalar phi in place.  Note that checking
-  STMT_VINFO_VEC_REDUCTION_TYPE (as below) only works
-  for reductions involving a single statement.  */
-   return true;
-
-  stmt_vec_info reduc_stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
-  reduc_stmt_info = vect_stmt_to_vectorize (reduc_stmt_info);
-
-  if (STMT_VINFO_VEC_REDUCTION_TYPE (reduc_stmt_info)
- == EXTRACT_LAST_REDUCTION)
-   /* Leave the scalar phi in place.  */
-   return true;
-
-  if (gassign *reduc_stmt = dyn_cast <gassign *> (reduc_stmt_info->stmt))
-   for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k)
- {
-   tree op = gimple_op (reduc_stmt, k);
-   if (op == phi_result)
- continue;
-   if (k == 1 && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
- continue;
-   bool is_simple_use = vect_is_simple_use (op, loop_vinfo, &dt);
-   gcc_assert (is_simple_use);
-   if (dt == vect_constant_def || dt == vect_external_def)
- continue;
-   if (!vectype_in
-   || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
-   < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (op)
- vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op));
-   break;
- }
-  /* For a nested cycle we might end up with an operation like
- phi_result * phi_result.  */
-  if (!vectype_in)
-   vectype_in = STMT_VINFO_VECTYPE (stmt_info);
-  gcc_assert (vectype_in);
+  gcc_assert (! vec_stmt);
 
   if (slp_node)
-   {
- /* The size vect_schedule_slp_instance computes is off for us.  */
- vec_num = vect_get_num_vectors
- (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-  * SLP_TREE_SCALAR_STMTS (slp_node).length (), vectype_in);
- ncopies = 1;
-   }
-  else
-   {
- vec_num = 1;
- ncopies = vect_get_num_copies (loop_vinfo, vectype_in);
-   }
-
-  /* Check whether we can use a single PHI node and accumulate
- vectors to one before the backedge.  */
-  stmt_vec_info use_stmt_info;
-  if (ncopies > 1
- && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
- && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
- && (!STMT_VINFO_IN_PATTERN_P (use_stmt_info)
- || !STMT_VINFO_PATTERN_DEF_SEQ (use_stmt_info))
- && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
-   {
- single_defuse_cycle = true;
- ncopies = 1;
-   }
-
-  /* Create the destination vector  */
-  tree vec_dest = vect_create_destination_var (phi_result, vectype_out);
-
-  /* Get the loop-entry arguments.  */
-  tree vec_initial_def;
- auto_vec<tree> vec_initial_defs;
-  if (slp_node)
-   {
- vec_initial_defs.reserve (vec_num);
- gcc_assert (slp_node == slp_node_instance->reduc_phis);
- stmt_vec_info first = REDUC_GROUP_FIRST_ELEMENT (reduc_stmt_info);
- tree neutral_op
- = neutral_op_for_slp_reduction (slp_node,
- STMT_VINFO_REDUC_CODE
- (first ? first : reduc_stmt_info),
- first != NULL);
- get_initial_defs_for_reduction (slp_node_instance->red

[PR47785] COLLECT_AS_OPTIONS

2019-10-02 Thread Kugan Vivekanandarajah
Hi,

As mentioned in the PR, attached patch adds COLLECT_AS_OPTIONS for
passing assembler options specified with -Wa, to the link-time driver.

The proposed solution only works for uniform -Wa options across all
TUs. As mentioned by Richard Biener, supporting non-uniform -Wa flags
would require either adjusting partitioning according to flags or
emitting multiple object files from a single LTRANS CU. We could
consider this as a follow up.

Bootstrapped and regression tested on arm-linux-gcc. Is this OK for trunk?

Thanks,
Kugan


gcc/ChangeLog:

2019-10-02  kugan.vivekanandarajah  

PR lto/78353
* gcc.c (putenv_COLLECT_AS_OPTION): New to set COLLECT_AS_OPTION in env.
(driver::main): Call putenv_COLLECT_AS_OPTION.
* lto-wrapper.c (run_gcc): use COLLECT_AS_OPTION from env.

gcc/testsuite/ChangeLog:

2019-10-02  kugan.vivekanandarajah  

PR lto/78353
* gcc.target/arm/pr78353-1.c: New test.
* gcc.target/arm/pr78353-2.c: New test.
From 6968d4343b2442736946a07df4eca969c916ccd3 Mon Sep 17 00:00:00 2001
From: Kugan 
Date: Sat, 28 Sep 2019 02:11:49 +1000
Subject: [PATCH] COLLECT_AS support

---
 gcc/gcc.c| 29 
 gcc/lto-wrapper.c|  6 -
 gcc/testsuite/gcc.target/arm/pr78353-1.c |  9 
 gcc/testsuite/gcc.target/arm/pr78353-2.c |  9 
 4 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr78353-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pr78353-2.c

diff --git a/gcc/gcc.c b/gcc/gcc.c
index 1216cdd505a..058f612d1f4 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -5239,6 +5239,34 @@ do_specs_vec (vec<char_p> vec)
 }
 }
 
+/* Store switches specified for as with -Wa in COLLECT_AS_OPTIONS
+   and place that in the environment.  */
+static void
+putenv_COLLECT_AS_OPTION (vec<char_p> vec)
+{
+  unsigned ix;
+  char *opt;
+  int len = vec.length ();
+
+  if (!len)
+ return;
+
+  obstack_init (&collect_obstack);
+  obstack_grow (&collect_obstack, "COLLECT_AS_OPTION=",
+		sizeof ("COLLECT_AS_OPTION=") - 1);
+  obstack_grow (&collect_obstack, "-Wa,", strlen ("-Wa,"));
+
+  FOR_EACH_VEC_ELT (vec, ix, opt)
+  {
+  obstack_grow (&collect_obstack, opt, strlen (opt));
+  --len;
+  if (len)
+	obstack_grow (&collect_obstack, ",", strlen (","));
+  }
+
+  xputenv (XOBFINISH (&collect_obstack, char *));
+}
+
 /* Process the sub-spec SPEC as a portion of a larger spec.
This is like processing a whole spec except that we do
not initialize at the beginning and we do not supply a
@@ -7360,6 +7388,7 @@ driver::main (int argc, char **argv)
   global_initializations ();
   build_multilib_strings ();
   set_up_specs ();
+  putenv_COLLECT_AS_OPTION (assembler_options);
   putenv_COLLECT_GCC (argv[0]);
   maybe_putenv_COLLECT_LTO_WRAPPER ();
   maybe_putenv_OFFLOAD_TARGETS ();
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 9a7bbd0c022..64dfabc202a 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1250,7 +1250,7 @@ run_gcc (unsigned argc, char *argv[])
   const char **argv_ptr;
   char *list_option_full = NULL;
   const char *linker_output = NULL;
-  const char *collect_gcc, *collect_gcc_options;
+  const char *collect_gcc, *collect_gcc_options, *collect_as_option;
   int parallel = 0;
   int jobserver = 0;
   int auto_parallel = 0;
@@ -1283,6 +1283,7 @@ run_gcc (unsigned argc, char *argv[])
   get_options_from_collect_gcc_options (collect_gcc, collect_gcc_options,
 	&decoded_options,
 	&decoded_options_count);
+  collect_as_option = getenv ("COLLECT_AS_OPTION");
 
   /* Allocate array for input object files with LTO IL,
  and for possible preceding arguments.  */
@@ -1345,6 +1346,9 @@ run_gcc (unsigned argc, char *argv[])
   obstack_init (&argv_obstack);
   obstack_ptr_grow (&argv_obstack, collect_gcc);
   obstack_ptr_grow (&argv_obstack, "-xlto");
+  if (collect_as_option)
+obstack_ptr_grow (&argv_obstack, collect_as_option);
+
   obstack_ptr_grow (&argv_obstack, "-c");
 
   append_compiler_options (&argv_obstack, fdecoded_options,
diff --git a/gcc/testsuite/gcc.target/arm/pr78353-1.c b/gcc/testsuite/gcc.target/arm/pr78353-1.c
new file mode 100644
index 000..bba81ee50c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78353-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=armv7-a -mthumb -O2 -flto -Wa,-mimplicit-it=always" }  */
+
+int main(int x)
+{
+  asm("teq %0, #0; addne %0, %0, #1" : "=r" (x));
+  return x;
+}
+
diff --git a/gcc/testsuite/gcc.target/arm/pr78353-2.c b/gcc/testsuite/gcc.target/arm/pr78353-2.c
new file mode 100644
index 000..776eb64b8c7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78353-2.c
@@ -0,0 +1,9 @@
+/* { dg-do compile }  */
+/* { dg-options "-march=armv7-a -mthumb -O2 -flto -Wa,-mimplicit-it=always,-mthumb" }  */
+
+int main(int x)
+{
+  asm("teq %0, #0; addne %0, %0, #1" : "=r" (x));
+  return x;
+}
+
-- 
2.17.1



Re: [PATCH] Add some hash_map_safe_* functions like vec_safe_*.

2019-10-02 Thread Richard Biener
On Tue, Oct 1, 2019 at 8:50 PM Jason Merrill  wrote:
>
> On 10/1/19 3:34 AM, Richard Biener wrote:
> > On Mon, Sep 30, 2019 at 8:33 PM Jason Merrill  wrote:
> >>
> >> My comments accidentally got lost.
> >>
> >> Several places in the front-end (and elsewhere) use the same lazy
> >> allocation pattern for hash_maps, and this patch replaces them with
> >> hash_map_safe_* functions like vec_safe_*.  They don't provide a way
> >> to specify an initial size, but I don't think that's a significant
> >> issue.
> >>
> >> Tested x86_64-pc-linux-gnu.  OK for trunk?
> >
> > You are using create_ggc but the new functions do not indicate that ggc
> > allocation is done.
> > It's then also incomplete with not having a non-ggc variant
> > of them?  Maybe I'm missing something.
>
> Ah, I had been thinking that this lazy pattern would only be used with
> ggc, but I see that I was wrong.  How's this?
>
> Incidentally, now I see another C++11 feature I'd like to be able to
> use: default template arguments for function templates.

I presume

template<typename K, typename V>
inline bool
hash_map_safe_put (hash_map<K, V> *&h, const K& k, const V& v, size_t
size = default_hash_map_size)
{
  return hash_map_maybe_create (h, size)->put (k, v);
}

was deemed too ugly?  IMHO instantiating the templates for different sizes
is unwanted compile-time overhead (plus not being able to use
a default value makes non-default values creep into the code-base?).

I'd have OKed a variant like above, so - would that work for you
(change hash_map_maybe_create as well then, of course).

Thanks,
Richard.


Re: [PATCH, Fortran] Optionally suppress no-automatic overwrites recursive warning - for approval

2019-10-02 Thread Mark Eggleston



On 28/09/2019 17:50, Steve Kargl wrote:

On Thu, Sep 26, 2019 at 09:45:28AM +0100, Mark Eggleston wrote:

Original thread starts here
https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01185.html

OK to commit?


I'm not a big fan of option proliferation.  If you don't
want to see warnings, just use -w.  Of course, this is just
MHO.

Unfortunately -w switches off ALL warnings.

--
https://www.codethink.co.uk/privacy.html