use call-clobbered reg to disalign the stack

2019-10-07 Thread Alexandre Oliva
Some x86 stack-realignment tests that disaligned the stack with
pushes and pops failed when the compiler was configured to tune for a
target that prefers to accumulate outgoing arguments: the stack
space is reserved before the asm push, the call sequence overwrites
the saved register, and then the asm pop restores the overwritten
value.  Since that's a call-preserved register in 32-bit mode, it
should be preserved unchanged, but isn't.

Merely changing the register to a call-clobbered one would be enough,
but the tests would remain fragile and prone to failure under other
optimizations.  So I arranged for the compiler to be made aware of the
register used for the push and the pop, so that it won't use it for
anything else, and forced the function to use a frame pointer, so
that it won't address local variables with stack pointer offsets: those
offsets would likely be wrong between the asm push and pop.

Tested on x86_64-linux-gnu with -m64 and -m32.  Ok to install?


for  gcc/testsuite/ChangeLog

* gcc.target/i386/20060512-1.c (sse2_test): Use a
call-clobbered register variable for stack-disaligning push
and pop.  Require a frame pointer.
* gcc.target/i386/20060512-3.c (sse2_test): Likewise.
---
 gcc/testsuite/gcc.target/i386/20060512-1.c |   16 +++++++++-------
 gcc/testsuite/gcc.target/i386/20060512-3.c |    7 ++++---
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/20060512-1.c b/gcc/testsuite/gcc.target/i386/20060512-1.c
index ec163a9bc51e9..fe95f6d52fa98 100644
--- a/gcc/testsuite/gcc.target/i386/20060512-1.c
+++ b/gcc/testsuite/gcc.target/i386/20060512-1.c
@@ -7,11 +7,11 @@
 #include 
 
 #ifdef __x86_64__
-# define PUSH "pushq %rsi"
-# define POP "popq %rsi"
+# define REG "rcx"
+# define WIDTH "q"
 #else
-# define PUSH "pushl %esi"
-# define POP "popl %esi"
+# define REG "ecx"
+# define WIDTH "l"
 #endif
 
 __m128i __attribute__ ((__noinline__))
@@ -30,13 +30,15 @@ self_aligning_function (int x, int y)
 int g_1 = 20;
 int g_2 = 22;
 
-static void
+static void __attribute__ ((__optimize__ ("-fno-omit-frame-pointer")))
 sse2_test (void)
 {
   int result;
-  asm (PUSH);  /* Misalign runtime stack.  */
+  register int __attribute__ ((__mode__ (__word__))) reg asm (REG);
+  asm volatile ("push" WIDTH "\t%0"  /* Disalign runtime stack.  */
+   : : "r" (reg) : "memory");
   result = self_aligning_function (g_1, g_2);
   if (result != 42)
 abort ();
-  asm (POP);
+  asm volatile ("pop" WIDTH "\t%0" : "=r" (reg));
 }
diff --git a/gcc/testsuite/gcc.target/i386/20060512-3.c b/gcc/testsuite/gcc.target/i386/20060512-3.c
index 3370b9ec25afb..0cebb47f6e9bc 100644
--- a/gcc/testsuite/gcc.target/i386/20060512-3.c
+++ b/gcc/testsuite/gcc.target/i386/20060512-3.c
@@ -23,13 +23,14 @@ self_aligning_function (int x, int y)
 int g_1 = 20;
 int g_2 = 22;
 
-static void
+static void __attribute__ ((__optimize__ ("-fno-omit-frame-pointer")))
 sse2_test (void)
 {
   int result;
-  asm ("pushl %esi");  /* Disalign runtime stack.  */
+  register int reg asm ("ecx");
+  asm ("pushl\t%0": : "r" (reg) : "memory"); /* Disalign runtime stack.  */
   result = self_aligning_function (g_1, g_2);
   if (result != 42)
 abort ();
-  asm ("popl %esi");
+  asm ("popl\t%0" : "=r" (reg));
 }


-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free!FSF VP & FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


[PATCH] avoid a spurious -Wstringop-overflow due to multiple MEM_REFs (PR 92014)

2019-10-07 Thread Martin Sebor

Last week's enhancement to detect one-byte buffer overflows exposed
a bug that let the warning use the size of a prior MEM_REF access
and "override" the size of the actual store to the character array.
When the store was smaller than the prior access (e.g., one byte
vs. an 8-byte null pointer read, such as in a PHI), this would lead
to a false positive.

The attached patch has the function fail after it has determined
the size of the store from a MEM_REF if one of its recursive
invocations finds another MEM_REF.

Tested on x86_64-linux.  Since the bug is causing trouble in Glibc
builds I will plan on committing the fix tomorrow.

Martin

PR middle-end/92014 - bogus warning: writing 8 bytes into a region of size 1 in timezone/zic.c

gcc/ChangeLog:

	PR middle-end/92014
	* tree-ssa-strlen.c (count_nonzero_bytes): Avoid recursing for MEM_REF
	again once nbytes has been set.

gcc/testsuite/ChangeLog:

	PR middle-end/92014
	* gcc.dg/Wstringop-overflow-19.c: New test.

Index: gcc/testsuite/gcc.dg/Wstringop-overflow-19.c
===
--- gcc/testsuite/gcc.dg/Wstringop-overflow-19.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wstringop-overflow-19.c	(working copy)
@@ -0,0 +1,27 @@
+/* PR middle-end/92014 - bogus warning: writing 8 bytes into a region
+   of size 1 in timezone/zic.c
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+struct
+{
+  char *s1, *s2;
+  char c;
+} z;
+
+
+void f (char **a, int i, int j)
+{
+  char * cp = __builtin_strchr (a[i], '%');
+
+  if (cp && *++cp != 's')
+return;
+
+  z.s1 = __builtin_strdup (a[i]);
+  if (!z.s1) __builtin_abort ();
+
+  z.s2 = __builtin_strdup (a[j]);
+  if (!z.s2) __builtin_abort ();
+
+  z.c = cp ? *cp : '\0';// { dg-bogus "\\\[-Wstringop-overflow" }
+}
Index: gcc/tree-ssa-strlen.c
===
--- gcc/tree-ssa-strlen.c	(revision 276657)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -3741,13 +3741,16 @@ int ssa_name_limit_t::next_ssa_name (tree ssa_name
   return 0;
 }
 
-/* Determine the minimum and maximum number of leading non-zero bytes
+/* Determines the minimum and maximum number of leading non-zero bytes
in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
-   to each.  Set LENRANGE[2] to the total number of bytes in
-   the representation.  Set *NULTREM if the representation contains
-   a zero byte, and set *ALLNUL if all the bytes are zero.  Avoid
-   recursing deeper than the limits in SNLIM allow.  Return true
-   on success and false otherwise.  */
+   to each.  Sets LENRANGE[2] to the total number of bytes in
+   the representation.  Sets *NULTREM if the representation contains
+   a zero byte, and sets *ALLNUL if all the bytes are zero.
+   OFFSET and NBYTES are the offset into the representation and
+   the size of the access to it determined from a MEM_REF or zero
+   for other expressions.
+   Avoid recursing deeper than the limits in SNLIM allow.
+   Returns true on success and false otherwise.  */
 
 static bool
 count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
@@ -3843,6 +3846,9 @@ count_nonzero_bytes (tree exp, unsigned HOST_WIDE_
 
   if (TREE_CODE (exp) == MEM_REF)
 {
+  if (nbytes)
+	return false;
+
   tree arg = TREE_OPERAND (exp, 0);
   tree off = TREE_OPERAND (exp, 1);
 
@@ -3910,8 +3916,10 @@ count_nonzero_bytes (tree exp, unsigned HOST_WIDE_
 	  lenrange[0] = 0;
 	  prep = NULL;
 	}
-  else
+  else if (!nbytes)
 	nbytes = repsize;
+  else if (nbytes < repsize)
+	return false;
 }
 
   if (!nbytes)


[PATCH] handle arrays in -Wclass-memaccess (PR 92001)

2019-10-07 Thread Martin Sebor

-Wclass-memaccess doesn't trigger for accesses to arrays of
objects whose type it's designed to trigger for.  It looks
to me like a simple oversight in maybe_warn_class_memaccess.
Attached is a trivial patch to correct it, tested on
x86_64-linux.

Martin
PR c++/92001 - missing -Wclass-memaccess with array as first argument to memset

gcc/cp/ChangeLog:

	PR c++/92001
	* call.c (maybe_warn_class_memaccess): Handle arrays.

gcc/testsuite/ChangeLog:

	PR c++/92001
	* g++.dg/Wclass-memaccess-5.C: New test.

Index: gcc/cp/call.c
===
--- gcc/cp/call.c	(revision 276657)
+++ gcc/cp/call.c	(working copy)
@@ -8910,7 +8910,9 @@ maybe_warn_class_memaccess (location_t loc, tree f
   unsigned srcidx = !dstidx;
 
   tree dest = (*args)[dstidx];
-  if (!TREE_TYPE (dest) || !INDIRECT_TYPE_P (TREE_TYPE (dest)))
+  if (!TREE_TYPE (dest)
+  || (TREE_CODE (TREE_TYPE (dest)) != ARRAY_TYPE
+	  && !INDIRECT_TYPE_P (TREE_TYPE (dest))))
 return;
 
   tree srctype = NULL_TREE;
Index: gcc/testsuite/g++.dg/Wclass-memaccess-5.C
===
--- gcc/testsuite/g++.dg/Wclass-memaccess-5.C	(nonexistent)
+++ gcc/testsuite/g++.dg/Wclass-memaccess-5.C	(working copy)
@@ -0,0 +1,18 @@
+/* PR c++/92001 - missing -Wclass-memaccess with array as first argument
+   to memset
+   { dg-do compile }
+   { dg-options "-Wall" } */
+
+extern "C" void* memset (void*, int, __SIZE_TYPE__);
+
+struct S { S (); };
+
+void test_array_access (S *p, S (*pa)[2], S (&r)[3])
+{
+  S a[1];
+  memset (a, 0, sizeof a);// { dg-warning "-Wclass-memaccess" }
+
+  memset (*pa, 0, sizeof *pa);// { dg-warning "-Wclass-memaccess" }
+
+  memset (r, 0, sizeof r);// { dg-warning "-Wclass-memaccess" }
+}


Make C2X imply -fno-fp-int-builtin-inexact

2019-10-07 Thread Joseph Myers
Since TS 18661-1 has been integrated into C2X, this patch makes C2X
imply -fno-fp-int-builtin-inexact.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc:
2019-10-08  Joseph Myers  

* doc/invoke.texi (-ffp-int-builtin-inexact): Document
-fno-fp-int-builtin-inexact default for C2X.

gcc/c-family:
2019-10-08  Joseph Myers  

* c-opts.c (c_common_post_options): Set
-fno-fp-int-builtin-inexact for C2X.

gcc/testsuite:
2019-10-08  Joseph Myers  

* gcc.dg/torture/builtin-fp-int-inexact-c2x.c: New test.

Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c   (revision 276655)
+++ gcc/c-family/c-opts.c   (working copy)
@@ -826,6 +826,12 @@ c_common_post_options (const char **pfilename)
   else
 flag_permitted_flt_eval_methods = PERMITTED_FLT_EVAL_METHODS_C11;
 
+  /* C2X Annex F does not permit certain built-in functions to raise
+ "inexact".  */
+  if (flag_isoc2x
+  && !global_options_set.x_flag_fp_int_builtin_inexact)
+flag_fp_int_builtin_inexact = 0;
+
   /* By default we use C99 inline semantics in GNU99 or C99 mode.  C99
  inline semantics are not supported in GNU89 or C89 mode.  */
   if (flag_gnu89_inline == -1)
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 276655)
+++ gcc/doc/invoke.texi (working copy)
@@ -10810,12 +10810,12 @@ Do not allow the built-in functions @code{ceil}, @
 double} variants, to generate code that raises the ``inexact''
 floating-point exception for noninteger arguments.  ISO C99 and C11
 allow these functions to raise the ``inexact'' exception, but ISO/IEC
-TS 18661-1:2014, the C bindings to IEEE 754-2008, does not allow these
-functions to do so.
+TS 18661-1:2014, the C bindings to IEEE 754-2008, as integrated into
+ISO C2X, does not allow these functions to do so.
 
 The default is @option{-ffp-int-builtin-inexact}, allowing the
-exception to be raised.  This option does nothing unless
-@option{-ftrapping-math} is in effect.
+exception to be raised, unless C2X or a later C standard is selected.
+This option does nothing unless @option{-ftrapping-math} is in effect.
 
 Even if @option{-fno-fp-int-builtin-inexact} is used, if the functions
 generate a call to a library function then the ``inexact'' exception
Index: gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-c2x.c
===
--- gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-c2x.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/builtin-fp-int-inexact-c2x.c   (working copy)
@@ -0,0 +1,7 @@
+/* Test C2X enables -fno-fp-int-builtin-inexact.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x" } */
+/* { dg-add-options c99_runtime } */
+/* { dg-require-effective-target fenv_exceptions } */
+
+#include "builtin-fp-int-inexact.c"

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [SVE] PR86753

2019-10-07 Thread Prathamesh Kulkarni
On Fri, 4 Oct 2019 at 16:08, Richard Biener  wrote:
>
> On Thu, Oct 3, 2019 at 1:42 AM Prathamesh Kulkarni
>  wrote:
> >
> > On Wed, 25 Sep 2019 at 09:17, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Mon, 16 Sep 2019 at 08:54, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > On Mon, 9 Sep 2019 at 09:36, Prathamesh Kulkarni
> > > >  wrote:
> > > > >
> > > > > On Mon, 9 Sep 2019 at 16:45, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Prathamesh Kulkarni  writes:
> > > > > > > With patch, the only following FAIL remains for aarch64-sve.exp:
> > > > > > > FAIL: gcc.target/aarch64/sve/cond_unary_2.c -march=armv8.2-a+sve
> > > > > > > scan-assembler-times \\tmovprfx\\t 6
> > > > > > > which now contains 14.
> > > > > > > Should I adjust the test, assuming the change isn't a regression ?
> > > > > >
> > > > > > Well, it is kind-of a regression, but it really just means that the
> > > > > > integer code is now consistent with the floating-point code in 
> > > > > > having
> > > > > > an unnecessary MOVPRFX.  So I think adjusting the count is fine.
> > > > > > Presumably any future fix for the existing redundant MOVPRFXs will
> > > > > > apply to the new ones as well.
> > > > > >
> > > > > > The patch looks good to me, just some very minor nits:
> > > > > >
> > > > > > > @@ -8309,11 +8309,12 @@ vect_double_mask_nunits (tree type)
> > > > > > >
> > > > > > >  /* Record that a fully-masked version of LOOP_VINFO would need 
> > > > > > > MASKS to
> > > > > > > contain a sequence of NVECTORS masks that each control a 
> > > > > > > vector of type
> > > > > > > -   VECTYPE.  */
> > > > > > > +   VECTYPE. SCALAR_MASK if non-null, represents the mask used 
> > > > > > > for corresponding
> > > > > > > +   load/store stmt.  */
> > > > > >
> > > > > > Should be two spaces between sentences.  Maybe:
> > > > > >
> > > > > >VECTYPE.  If SCALAR_MASK is nonnull, the fully-masked loop would 
> > > > > > AND
> > > > > >these vector masks with the vector version of SCALAR_MASK.  */
> > > > > >
> > > > > > since the mask isn't necessarily for a load or store statement.
> > > > > >
> > > > > > > [...]
> > > > > > > @@ -1879,7 +1879,8 @@ static tree permute_vec_elements (tree, 
> > > > > > > tree, tree, stmt_vec_info,
> > > > > > > says how the load or store is going to be implemented and 
> > > > > > > GROUP_SIZE
> > > > > > > is the number of load or store statements in the containing 
> > > > > > > group.
> > > > > > > If the access is a gather load or scatter store, GS_INFO 
> > > > > > > describes
> > > > > > > -   its arguments.
> > > > > > > +   its arguments. SCALAR_MASK is the scalar mask used for 
> > > > > > > corresponding
> > > > > > > +   load or store stmt.
> > > > > >
> > > > > > Maybe:
> > > > > >
> > > > > >its arguments.  If the load or store is conditional, SCALAR_MASK 
> > > > > > is the
> > > > > >condition under which it occurs.
> > > > > >
> > > > > > since SCALAR_MASK can be null here too.
> > > > > >
> > > > > > > [...]
> > > > > > > @@ -9975,6 +9978,31 @@ vectorizable_condition (stmt_vec_info 
> > > > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > > > >/* Handle cond expr.  */
> > > > > > >for (j = 0; j < ncopies; j++)
> > > > > > >  {
> > > > > > > +  tree loop_mask = NULL_TREE;
> > > > > > > +  bool swap_cond_operands = false;
> > > > > > > +
> > > > > > > +  if (loop_vinfo && LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> > > > > > > + {
> > > > > > > +   scalar_cond_masked_key cond (cond_expr, ncopies);
> > > > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains (cond))
> > > > > > > + {
> > > > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, ncopies, 
> > > > > > > vectype, j);
> > > > > > > + }
> > > > > > > +   else
> > > > > > > + {
> > > > > > > +   cond.code = invert_tree_comparison (cond.code,
> > > > > > > +   HONOR_NANS 
> > > > > > > (TREE_TYPE (cond.op0)));
> > > > > >
> > > > > > Long line.  Maybe just split it out into a separate assignment:
> > > > > >
> > > > > >   bool honor_nans = HONOR_NANS (TREE_TYPE (cond.op0));
> > > > > >   cond.code = invert_tree_comparison (cond.code, 
> > > > > > honor_nans);
> > > > > >
> > > > > > > +   if (loop_vinfo->scalar_cond_masked_set.contains 
> > > > > > > (cond))
> > > > > > > + {
> > > > > > > +   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
> > > > > > > +   loop_mask = vect_get_loop_mask (gsi, masks, 
> > > > > > > ncopies, vectype, j);
> > > > > >
> > > > > > Long line here too.
> > > > > >
> > > > > > > [...]
> > > > > > > @@ -10090,6 +10121,26 @@ vectorizable_condition (stmt_vec_info 
> > > > > > > stmt_info, gimple_stmt_iterator *gsi,
> > > > > > >   }
> > > > > > >

Re: [SVE] PR91532

2019-10-07 Thread Prathamesh Kulkarni
On Mon, 7 Oct 2019 at 03:11, Richard Biener  wrote:
>
> On Fri, 4 Oct 2019, Prathamesh Kulkarni wrote:
>
> > On Fri, 4 Oct 2019 at 12:18, Richard Biener  wrote:
> > >
> > > On Thu, 3 Oct 2019, Prathamesh Kulkarni wrote:
> > >
> > > > On Wed, 2 Oct 2019 at 12:28, Richard Biener  wrote:
> > > > >
> > > > > On Wed, 2 Oct 2019, Prathamesh Kulkarni wrote:
> > > > >
> > > > > > On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
> > > > > > >
> > > > > > > On 10/1/19 12:40 AM, Richard Biener wrote:
> > > > > > > > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > > >
> > > > > > > >> On Wed, 25 Sep 2019 at 23:44, Richard Biener 
> > > > > > > >>  wrote:
> > > > > > > >>>
> > > > > > > >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > > >>>
> > > > > > >  On Fri, 20 Sep 2019 at 15:20, Jeff Law  
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > > > > >> Hi,
> > > > > > > >> For PR91532, the dead store is trivially deleted if we 
> > > > > > > >> place dse pass
> > > > > > > >> between ifcvt and vect. Would it be OK to add another 
> > > > > > > >> instance of dse there ?
> > > > > > > >> Or should we add an ad-hoc "basic-block dse" sub-pass to 
> > > > > > > >> ifcvt that
> > > > > > > >> will clean up the dead store ?
> > > > > > > > I'd hesitate to add another DSE pass.  If there's one 
> > > > > > > > nearby could we
> > > > > > > > move the existing pass?
> > > > > > >  Well I think the nearest one is just after 
> > > > > > >  pass_warn_restrict. Not
> > > > > > >  sure if it's a good
> > > > > > >  idea to move it up from there ?
> > > > > > > >>>
> > > > > > > >>> You'll need it inbetween ifcvt and vect so it would be 
> > > > > > > >>> disabled
> > > > > > > >>> w/o vectorization, so no, that doesn't work.
> > > > > > > >>>
> > > > > > > >>> ifcvt already invokes SEME region value-numbering so if we had
> > > > > > > >>> MESE region DSE it could use that.  Not sure if you feel like
> > > > > > > >>> refactoring DSE to work on regions - it currently uses a DOM
> > > > > > > >>> walk which isn't suited for that.
> > > > > > > >>>
> > > > > > > >>> if-conversion has a little "local" dead predicate compute 
> > > > > > > >>> removal
> > > > > > > >>> thingy (not that I like that), eventually it can be enhanced 
> > > > > > > >>> to
> > > > > > > >>> do the DSE you want?  Eventually it should be moved after the 
> > > > > > > >>> local
> > > > > > > >>> CSE invocation though.
> > > > > > > >> Hi,
> > > > > > > >> Thanks for the suggestions.
> > > > > > > >> For now, would it be OK to do "dse" on loop header in
> > > > > > > >> tree_if_conversion, as in the attached patch ?
> > > > > > > >> The patch does local dse in a new function ifcvt_local_dse 
> > > > > > > >> instead of
> > > > > > > >> ifcvt_local_dce, because it needed to be done after RPO VN 
> > > > > > > >> which
> > > > > > > >> eliminates:
> > > > > > > >> Removing dead stmt _ifc__62 = *_55;
> > > > > > > >> and makes the following store dead:
> > > > > > > >> *_55 = _ifc__61;
> > > > > > > >
> > > > > > > > I suggested trying to move ifcvt_local_dce after RPO VN, you 
> > > > > > > > could
> > > > > > > > try that as independent patch (pre-approved).
> > > > > > > >
> > > > > > > > I don't mind the extra walk though.
> > > > > > > >
> > > > > > > > What I see as possible issue is that dse_classify_store walks 
> > > > > > > > virtual
> > > > > > > > uses and I'm not sure if the loop exit is a natural boundary for
> > > > > > > > such walk (eventually the loop header virtual PHI is reached but
> > > > > > > > there may also be a loop-closed PHI for the virtual operand,
> > > > > > > > but not necessarily).  So the question is whether to add a
> > > > > > > > "stop at" argument to dse_classify_store specifying the virtual
> > > > > > > > use the walk should stop at?
> > > > > > > I think we want to stop at the block boundary -- aren't the cases 
> > > > > > > we
> > > > > > > care about here local to a block?
> > > > > > This version restricts walking in dse_classify_store to basic-block 
> > > > > > if
> > > > > > bb_only is true,
> > > > > > and removes dead stores in ifcvt_local_dce instead of separate walk.
> > > > > > Does it look OK ?
> > > > >
> > > > > As relied to Jeff please make it trivially work for SESE region walks
> > > > > by specifying the exit virtual operand to stop on.
> > > > Thanks for the suggestions, does the attached patch look OK ?
> > > > Um, sorry, I don't really understand why we need to specify virtual
> > > > phi arg from back edge to stop the walk.
> > > > Would it achieve the same effect if we pass index of loop->latch to
> > > > stop the walk ?
> > >
> > > Since DSE walks the virtual def->use chain it makes sense to stop
> > > it at a specific virtual definition.  Note
> > >
> > > @@ -706,7 +699,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> > >

Re: C++ PATCH for C++20 P0388R4 (conversions to arrays of unknown bounds) and CWG 1307 (c++/91364, c++/69531)

2019-10-07 Thread Marek Polacek
On Mon, Oct 07, 2019 at 02:56:10PM -0400, Jason Merrill wrote:
> > @@ -1378,7 +1381,7 @@ standard_conversion (tree to, tree from, tree expr, 
> > bool c_cast_p,
> > if (same_type_p (from, to))
> > /* OK */;
> > -  else if (c_cast_p && comp_ptr_ttypes_const (to, from))
> > +  else if (c_cast_p && comp_ptr_ttypes_const (to, from, bounds_none))
> > /* In a C-style cast, we ignore CV-qualification because we
> >are allowed to perform a static_cast followed by a
> >const_cast.  */
> 
> Hmm, I'd expect bounds_either for a C-style cast.

Makes sense, it's just that const_cast shouldn't drop the bounds.

> > @@ -1670,7 +1673,14 @@ reference_binding (tree rto, tree rfrom, tree expr, 
> > bool c_cast_p, int flags,
> > maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
> > /* DR 1288: Otherwise, if the initializer list has a single element
> >  of type E and ... [T's] referenced type is reference-related to E,
> > -the object or reference is initialized from that element... */
> > +the object or reference is initialized from that element...
> > +
> > +??? With P0388R4, we should bind 't' directly to U{}:
> > +  using U = A[2];
> > +  A (&)[] = {U{}};
> > +because A[] and A[2] are reference-related.  But we don't do it
> > +because grok_reference_init has deduced the array size (to 1), and
> > +A[1] and A[2] aren't reference-related.  */
> 
> That sounds like a bug in grok_reference_init; it isn't properly
> implementing
> 
> "Otherwise, if the initializer list has a single element of type E and
> either T is not a reference type or its
> referenced type is reference-related to E, the object or reference is
> initialized from that element"

Can that be fixed in a follow up?

> > if (CONSTRUCTOR_NELTS (expr) == 1)
> > {
> >   tree elt = CONSTRUCTOR_ELT (expr, 0)->value;
> > @@ -6982,6 +6992,27 @@ maybe_inform_about_fndecl_for_bogus_argument_init 
> > (tree fn, int argnum)
> > "  initializing argument %P of %qD", argnum, fn);
> >   }
> > +/* Maybe warn about C++20 Conversions to arrays of unknown bound.  C is
> > +   the conversion, EXPR is the expression we're converting.  */
> > +
> > +static void
> > +maybe_warn_array_conv (location_t loc, conversion *c, tree expr)
> > +{
> > +  if (cxx_dialect >= cxx2a)
> > +return;
> > +
> > +  tree type = TREE_TYPE (expr);
> > +  type = strip_pointer_operator (type);
> > +
> > +  if (TREE_CODE (type) != ARRAY_TYPE
> > +  || TYPE_DOMAIN (type) == NULL_TREE)
> > +return;
> > +
> > +  if (conv_binds_to_array_of_unknown_bound (c))
> > +pedwarn (loc, OPT_Wpedantic, "conversions to arrays of unknown bound "
> > +"are only available with %<-std=c++2a%> or %<-std=gnu++2a%>");
> > +}
> > +
> >   /* Perform the conversions in CONVS on the expression EXPR.  FN and
> >  ARGNUM are used for diagnostics.  ARGNUM is zero based, -1
> >  indicates the `this' argument of a method.  INNER is nonzero when
> > @@ -7401,8 +7432,20 @@ convert_like_real (conversion *convs, tree expr, 
> > tree fn, int argnum,
> >   error_at (loc, "cannot bind non-const lvalue reference of "
> > "type %qH to an rvalue of type %qI", totype, extype);
> > else if (!reference_compatible_p (TREE_TYPE (totype), extype))
> > - error_at (loc, "binding reference of type %qH to %qI "
> > -   "discards qualifiers", totype, extype);
> > + {
> > +   /* If we're converting from T[] to T[N], don't talk
> > +  about discarding qualifiers.  (Converting from T[N] to
> > +  T[] is allowed by P0388R4.)  */
> > +   if (TREE_CODE (extype) == ARRAY_TYPE
> > +   && TYPE_DOMAIN (extype) == NULL_TREE
> > +   && TREE_CODE (TREE_TYPE (totype)) == ARRAY_TYPE
> > +   && TYPE_DOMAIN (TREE_TYPE (totype)) != NULL_TREE)
> > + error_at (loc, "binding reference of type %qH to %qI "
> > +   "discards array bounds", totype, extype);
> 
> If we're converting to T[N], that would be adding, not discarding, array
> bounds?

True, I've reworded the error message.

> > +   else
> > + error_at (loc, "binding reference of type %qH to %qI "
> > +   "discards qualifiers", totype, extype);
> > + }
> > else
> >   gcc_unreachable ();
> > maybe_print_user_conv_context (convs);
> > @@ -7410,6 +7453,8 @@ convert_like_real (conversion *convs, tree expr, tree 
> > fn, int argnum,
> > return error_mark_node;
> >   }
> > +   else if (complain & tf_warning)
> > + maybe_warn_array_conv (loc, convs, expr);
> > /* If necessary, create a temporary.
> > @@ -7493,7 +7538,10 @@ convert_like_real (conversion *convs, tree expr, 
> > tree fn, int argnum,
> >   case ck_qual:
> > /* Warn about deprecated conversion if appropriate.  */
> > if (complain & tf_warning)
> 

[patch][OpenMP,Fortran] Fix several OpenMP use_device_addr/map/update errors found by a length test case

2019-10-07 Thread Tobias Burnus
This patch includes a rather comprehensive test case for 
"use_device_addr" – and fixes the fallout. Notes:


* Only scalars and non-descriptor arrays are tested

* In particular, polymorphic types, absent optional arguments, and
array-descriptor arrays are not tested; nor are associate-block variables
or Cray pointees.


* "use_device_ptr" has not been tested (beyond what's currently in the 
test suite)


* Assumption (based on the OpenMP spec): within a "omp target data … 
use_device_addr(var)" block, c_loc is used to access the address and it 
is the only pointer which will end up on the device – everything else 
("meta data") is only host data (i.e. present status, dynamic type, 
array shape).


* OpenMP spec: is_device_ptr with "type(c), value" would be nice (cf. 
test-case file; but OpenMP 5 currently only permits dummy arguments w/o 
value/allocatable/pointer attribute).



This patch fixes:

* An issue with MAP and VALUE

* An issue with UPDATE target and pointer/allocatable scalars

* use_device_addr: Actually pass clause to the ME and fix several issues 
there, mostly related to optional and pointer/allocatable.



Tested with nvptx – and bootstrapped w/ and without offloading support.

OK for the trunk?

Tobias

	gcc/fortran/
	* f95-lang.c (LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR): Re-define to
	gfc_omp_is_allocatable_or_ptr.
	* trans-decl.c (create_function_arglist): Set GFC_DECL_OPTIONAL_ARGUMENT
	only if not passed by value.
	* trans-openmp.c (gfc_omp_is_allocatable_or_ptr): New.
	(gfc_trans_omp_clauses): Actually pass use_device_addr on to the middle
	end; for MAP, handle (present) optional arguments; for target update,
	handle allocatable/pointer scalars.
	* trans.h (gfc_omp_is_allocatable_or_ptr): Declare.

	gcc/
	* langhooks-def.h (LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR): Define.
	(LANG_HOOKS_DECLS): Add it.
	* langhooks.h (lang_hooks_for_decls): Add omp_is_allocatable_or_ptr;
	update comment for omp_is_optional_argument.
	* omp-general.c (omp_is_allocatable_or_ptr): New.
	* omp-general.h (omp_is_allocatable_or_ptr): Declare.
	* omp-low.c (scan_sharing_clauses, lower_omp_target): Handle
	Fortran's optional arguments and allocatable/pointer scalars
	with use_device_addr.

	libgomp/
	* testsuite/libgomp.fortran/use_device_addr-1.f90: New.
	* testsuite/libgomp.fortran/use_device_addr-2.f90: New.


 gcc/fortran/f95-lang.c |2 +
 gcc/fortran/trans-decl.c   |3 +-
 gcc/fortran/trans-openmp.c |   35 +-
 gcc/fortran/trans.h|1 +
 gcc/langhooks-def.h|2 +
 gcc/langhooks.h|9 +-
 gcc/omp-general.c  |8 +
 gcc/omp-general.h  |1 +
 gcc/omp-low.c  |   38 +-
 .../libgomp.fortran/use_device_addr-1.f90  | 1202 
 .../libgomp.fortran/use_device_addr-2.f90  | 1202 
 11 files changed, 2489 insertions(+), 14 deletions(-)

diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c
index 2467cd968af..0f72ab9e3b4 100644
--- a/gcc/fortran/f95-lang.c
+++ b/gcc/fortran/f95-lang.c
@@ -113,6 +113,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #undef LANG_HOOKS_TYPE_FOR_MODE
 #undef LANG_HOOKS_TYPE_FOR_SIZE
 #undef LANG_HOOKS_INIT_TS
+#undef LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR
 #undef LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT
 #undef LANG_HOOKS_OMP_PRIVATIZE_BY_REFERENCE
 #undef LANG_HOOKS_OMP_PREDETERMINED_SHARING
@@ -146,6 +147,7 @@ static const struct attribute_spec gfc_attribute_table[] =
 #define LANG_HOOKS_TYPE_FOR_MODE	gfc_type_for_mode
 #define LANG_HOOKS_TYPE_FOR_SIZE	gfc_type_for_size
 #define LANG_HOOKS_INIT_TS		gfc_init_ts
+#define LANG_HOOKS_OMP_IS_ALLOCATABLE_OR_PTR	gfc_omp_is_allocatable_or_ptr
 #define LANG_HOOKS_OMP_IS_OPTIONAL_ARGUMENT	gfc_omp_is_optional_argument
 #define LANG_HOOKS_OMP_PRIVATIZE_BY_REFERENCE	gfc_omp_privatize_by_reference
 #define LANG_HOOKS_OMP_PREDETERMINED_SHARING	gfc_omp_predetermined_sharing
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index b701f493440..698d90a4d42 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -2691,8 +2691,9 @@ create_function_arglist (gfc_symbol * sym)
 	  && (!f->sym->attr.proc_pointer
 	  && f->sym->attr.flavor != FL_PROCEDURE))
 	DECL_BY_REFERENCE (parm) = 1;
-  if (f->sym->attr.optional)
+  if (f->sym->attr.optional && !f->sym->attr.value)
 	{
+	  // With value, the argument is passed as is
 	  gfc_allocate_lang_decl (parm);
 	  GFC_DECL_OPTIONAL_ARGUMENT (parm) = 1;
 	}
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f83bab4850e..dad11a24430 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -47,7 +47,21 @@ along with GCC; see the file COPYING3.  If not see
 
 int ompws_flags;
 
-/* True if 

Re: [PATCH] Fix -Wshadow=local warnings in elfos.h

2019-10-07 Thread Joseph Myers
On Sat, 5 Oct 2019, Bernd Edlinger wrote:

> > Like
> > 
> > #define DEFAULT_SOME_MACRO(PARMS) { lots of code }
> > 
> > becomes
> > 
> > #define DEFAULT_SOME_MACRO(PARMS) default_some_macro(PARMS)
> > 
> > static inline int
> > default_some_macro (int parm, long another)
> > {
> >   lots of code;
> > }
> > 
> > The point is that all this is completely *local* changes.
> > 
> 
> Hmm, I tried this but it does not work:

The answer to that is an out-of-line function.

See the default_register_move_cost function in targhooks.c, the 
register_move_cost hook and the REGISTER_MOVE_COST target macro, for 
example.  There is a strategy for incremental conversion of a single 
target macro to a hook that goes as follows:

* Define a corresponding target hook.  Make the default definition in 
targhooks.c be one that tests whether the target macro is defined and 
either uses that macro or the existing default as appropriate.  Make 
tm.texi.in document the hook alongside the macro, saying that the macro is 
obsolete and new targets should use the hook.

* Convert all *uses* of the target macro to use the hook instead.  There 
is no need to change any of the definitions in the various targets with 
non-default definitions.

* Remove the default definition of the target macro, as no longer needed 
now targhooks.c tests whether the target macro is defined or not.

At that point, you have a partial conversion of that target macro to a 
hook (and consequent gains regarding avoiding shadowing) without having 
touched any of the target-specific code.  Any target can safely convert 
its own definition of the macro to a definition of the hook, independently 
of all other targets and all other hooks.  Once all targets have changed, 
the macro can be poisoned and the documentation of the macro and support 
in targhooks.c for using the macro removed, in the usual fashion for 
removing a target macro - but that might happen a long time afterwards.  
It's more work than a pure local conversion to an inline function, or than 
just renaming variables, but significantly less than a full conversion of 
the macro to a hook (and in particular should not need testing on more 
than one target).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jason Merrill

On 10/7/19 4:57 PM, Jakub Jelinek wrote:

On Mon, Oct 07, 2019 at 04:37:13PM -0400, Jason Merrill wrote:

How?  By adding a SAVE_EXPR in there, so that generic code is safe?


Something like that, yes.


Ok, will try that tomorrow.


I haven't touched the ARRAY_REF case earlier, because that I believe is
handled right in the gimplifier already.

+  if (flag_strong_eval_order == 2
+ && TREE_SIDE_EFFECTS (TREE_OPERAND (*expr_p, 1)))
+   {
+ enum gimplify_status t
+   = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
+is_gimple_val, fb_rvalue);
+ if (t == GS_ERROR)
+   break;
+ else if (is_gimple_variable (TREE_OPERAND (*expr_p, 0))
+  && TREE_CODE (TREE_OPERAND (*expr_p, 0)) != SSA_NAME)
+   TREE_OPERAND (*expr_p, 0)
+ = get_initialized_tmp_var (TREE_OPERAND (*expr_p, 0), pre_p,
+NULL);


I still think this pattern would be cleaner with a new gimple predicate that
returns true for invariant || SSA_NAME.  I think that would have the same
effect as this code.


The problem is that we need a reliable way to discover safe GIMPLE
temporaries for that to work.  If SSA_NAME is created, great, but in various
contexts (OpenMP/OpenACC bodies, and various other cases) allow_ssa is
false, at which point the gimplifier creates a temporary artificial VAR_DECL.


Yes, but doesn't the code above have the exact same effect?  You're checking
for a variable that isn't an SSA_NAME, and making a copy otherwise.


It will have the same effect except for the ICE.


If there is a predicate that rejects such a temporary, gimplify_expr will
ICE:
gcc_assert (!VOID_TYPE_P (TREE_TYPE (*expr_p)));
*expr_p = get_formal_tmp_var (*expr_p, pre_p);
...
/* Make sure the temporary matches our predicate.  */
gcc_assert ((*gimple_test_f) (*expr_p));


Won't get_formal_tmp_var always give an SSA_NAME?  It looks to me like it
unconditionally passes true for allow_ssa.


Yes, but there is still the && gimplify_ctxp->into_ssa conditional.
All but one push_gimplify_context call are with in_ssa = false.


Ah, I see.  And it occurs to me this situation fails condition #1 for 
get_formal_tmp_var anyway.  So I guess the predicate direction doesn't 
make sense.  Let's factor out the above pattern differently, then. 
Maybe a preevaluate function that takes a predicate as an argument?


Jason


Re: [C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jakub Jelinek
On Mon, Oct 07, 2019 at 04:37:13PM -0400, Jason Merrill wrote:
> > How?  By adding a SAVE_EXPR in there, so that generic code is safe?
> 
> Something like that, yes.

Ok, will try that tomorrow.

> > I haven't touched the ARRAY_REF case earlier, because that I believe is
> > handled right in the gimplifier already.
> > > > +  if (flag_strong_eval_order == 2
> > > > + && TREE_SIDE_EFFECTS (TREE_OPERAND (*expr_p, 1)))
> > > > +   {
> > > > + enum gimplify_status t
> > > > +   = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
> > > > +is_gimple_val, fb_rvalue);
> > > > + if (t == GS_ERROR)
> > > > +   break;
> > > > + else if (is_gimple_variable (TREE_OPERAND (*expr_p, 0))
> > > > +  && TREE_CODE (TREE_OPERAND (*expr_p, 0)) != SSA_NAME)
> > > > +   TREE_OPERAND (*expr_p, 0)
> > > > + = get_initialized_tmp_var (TREE_OPERAND (*expr_p, 0), 
> > > > pre_p,
> > > > +NULL);
> > > 
> > > I still think this pattern would be cleaner with a new gimple predicate 
> > > that
> > > returns true for invariant || SSA_NAME.  I think that would have the same
> > > effect as this code.
> > 
> > The problem is that we need a reliable way to discover safe GIMPLE
> > temporaries for that to work.  If SSA_NAME is created, great, but in various
> > contexts (OpenMP/OpenACC bodies, and various other cases) allow_ssa is
> > false, at which point the gimplifier creates a temporary artificial 
> > VAR_DECL.
> 
> Yes, but doesn't the code above have the exact same effect?  You're checking
> for a variable that isn't an SSA_NAME, and making a copy otherwise.

It will have the same effect except for the ICE.

> > If there is a predicate that rejects such a temporary, gimplify_expr will
> > ICE:
> >gcc_assert (!VOID_TYPE_P (TREE_TYPE (*expr_p)));
> >*expr_p = get_formal_tmp_var (*expr_p, pre_p);
> > ...
> >/* Make sure the temporary matches our predicate.  */
> >gcc_assert ((*gimple_test_f) (*expr_p));
> 
> Won't get_formal_tmp_var always give an SSA_NAME?  It looks to me like it
> unconditionally passes true for allow_ssa.

Yes, but there is still the && gimplify_ctxp->into_ssa conditional.
All but one push_gimplify_context call are with in_ssa = false.

Jakub


Re: Type representation in CTF and DWARF

2019-10-07 Thread Jason Merrill
On Mon, Oct 7, 2019 at 4:47 PM Indu Bhagat  wrote:

> On 10/07/2019 12:35 AM, Richard Biener wrote:
> > On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat 
> wrote:
> >> Hello,
> >>
> >> At GNU Tools Cauldron this year, some folks were curious to know more
> on how
> >> the "type representation" in CTF compares vis-a-vis DWARF.
> >>
> >> [...]
> >>
> >> So, for the small C testcase with a union, enum, array, struct, typedef
> etc, I
> >> see following sizes :
> >>
> >> Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
> >>   .debug_aranges 48 0
> >>   .debug_info   150 0
> >>   .debug_abbrev 314 0
> >>   .debug_line73 0
> >>   .debug_str455 0
> >>   .debug_ranges  32 0
> >>   .debug_types  578 0
> >>
> >> Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
> >>   .debug_aranges  48 0
> >>   .debug_info732 0
> >>   .debug_abbrev  309 0
> >>   .debug_line 73 0
> >>   .debug_str 455 0
> >>   .debug_rnglists 23 0
> >>
> >> Compile with -gt (size -A  excerpt):
> >>   .ctf  966 0
> >>   CTF strings sub-section size (ctf_strlen in disassembly) = 374
> >>   == > CTF section just for representing types = 966 - 374 = 592
> bytes
> >>   (The 592 bytes include the CTF header and other indexes etc.)
> >>
> >> So, the following points are what I would highlight. Hopefully this helps
> you see
> >> that CTF has promise for the task of representing type debug info.
> >>
> >> 1. Type Information layout in sections:
> >>  A .ctf section is self-sufficient to represent types in a program.
> All
> >>  references within the CTF section are via either indexes or
> offsets into the
> >>  CTF section. No relocations are necessary in CTF at this time. In
> contrast,
> >>  DWARF type information is organized in multiple sections -
> .debug_info,
> >>  .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types
> in DWARF4.
> >>
> >> 2. Type Information encoding / compactness matters:
> >>  Because the type information is organized across sections in DWARF
> (and
> >>  contains some debug information like location etc.) , it is not
> feasible
> >>  to put a distinct number to the size in bytes for representing type
> >>  information in DWARF. But the size info of sections shown above
> should
> >>  be helpful to show that CTF does show promise in compactly
> representing
> >>  types.
> >>
> >>  Let's see some size data. CTF string table (= 374 bytes) is left
> out of the
> >>  discussion at hand because it will not be fair to compare with
> .debug_str
> >>  section which contains other information than just names of types.
> >>
> >>  The 592 bytes of the .ctf section are needed to represent types in
> CTF
> >>  format. Now, when using DWARF5, the type information needs 732
> bytes in
> >>  .debug_info and 309 bytes in .debug_abbrev.
> >>
> >>  In DWARF (when using -fdebug-types-section), the base types are
> duplicated
> >>  across type units. So for the above example, the DWARF DIE
> representing
> >>  'unsigned int' will appear in both the  DWARF trees for types -
> node and
> >>  node_payload. In CTF, there is a single lone type 'unsigned int'.
> > It's not clear to me why you are using -fdebug-types-section for this
> > comparison?
> > With just -gdwarf-4 I get
> >
> > .debug_info  292
> > .debug_abbrev 189
> > .debug_str   299
> >
> > this contains all the info CTF provides (and more).  This sums to 780
> bytes,
> > smaller than the CTF variant.  I skimmed over the info and there's not
> much
> > to strip to get to CTF levels, mainly locations.  The strings section
> also
> > has a quite large portion for GCC version and arguments, which is 93
> bytes.
> > So overall the DWARF representation should clock in at less than 700
> bytes,
> > more close to 650.
> >
> > Richard.
>
> It's not in favor of DWARF to go with just -gdwarf-4. Because the types
> in the .debug_info section will not be de-duplicated. For more complicated
> code
> bases with many compilation units, this will skew the results in favor of
> CTF
> (once the CTF de-duplicator is ready :) ).
>
> Now, one might argue that in this example, there is no role for
> de-duplicator.
> Yes to that. But to all users of DWARF type debug information for _real
> codebases_, -fdebug-types-section option is the best option. Isn't it ?
>
> Keeping "the size of type debug information in the shipped artifact small"
> as
> our target is meaningful for both CTF and DWARF.
>
> De-duplication is a key contributor to reducing the size of the type debug
> information; and both CTF and DWARF types can be de-duplicated. At this
> time, I
> stuck to a simple example with one CU because it eases interpreting the
> CTF and
> DWARF 

Re: Type representation in CTF and DWARF

2019-10-07 Thread Indu Bhagat




On 10/07/2019 12:35 AM, Richard Biener wrote:

On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat  wrote:

Hello,

At GNU Tools Cauldron this year, some folks were curious to know more about how
the "type representation" in CTF compares vis-a-vis DWARF.

[...]

So, for the small C testcase with a union, enum, array, struct, typedef etc, I
see following sizes :

Compile with -fdebug-types-section -gdwarf-4 (size -A  excerpt):
  .debug_aranges 48 0
  .debug_info   150 0
  .debug_abbrev 314 0
  .debug_line73 0
  .debug_str455 0
  .debug_ranges  32 0
  .debug_types  578 0

Compile with -fdebug-types-section -gdwarf-5 (size -A  excerpt):
  .debug_aranges  48 0
  .debug_info732 0
  .debug_abbrev  309 0
  .debug_line 73 0
  .debug_str 455 0
  .debug_rnglists 23 0

Compile with -gt (size -A  excerpt):
  .ctf  966 0
  CTF strings sub-section size (ctf_strlen in disassembly) = 374
  ==> CTF section just for representing types = 966 - 374 = 592 bytes
  (The 592 bytes include the CTF header and other indexes etc.)

So, the following points are what I would highlight. Hopefully this helps you see
that CTF has promise for the task of representing type debug info.

1. Type Information layout in sections:
 A .ctf section is self-sufficient to represent types in a program. All
 references within the CTF section are via either indexes or offsets into 
the
 CTF section. No relocations are necessary in CTF at this time. In contrast,
 DWARF type information is organized in multiple sections - .debug_info,
 .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in 
DWARF4.

2. Type Information encoding / compactness matters:
 Because the type information is organized across sections in DWARF (and
 contains some debug information like location etc.) , it is not feasible
 to put a distinct number to the size in bytes for representing type
 information in DWARF. But the size info of sections shown above should
 be helpful to show that CTF does show promise in compactly representing
 types.

 Let's see some size data. CTF string table (= 374 bytes) is left out of the
 discussion at hand because it will not be fair to compare with .debug_str
 section which contains other information than just names of types.

 The 592 bytes of the .ctf section are needed to represent types in CTF
 format. Now, when using DWARF5, the type information needs 732 bytes in
 .debug_info and 309 bytes in .debug_abbrev.

 In DWARF (when using -fdebug-types-section), the base types are duplicated
 across type units. So for the above example, the DWARF DIE representing
 'unsigned int' will appear in both the  DWARF trees for types - node and
 node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info  292
.debug_abbrev 189
.debug_str   299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has a quite large portion for GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
more close to 650.

Richard.


Going with just -gdwarf-4 does not actually favor DWARF, because the types
in the .debug_info section will not be de-duplicated: for more complicated code
bases with many compilation units, this will skew the results in favor of CTF
(once the CTF de-duplicator is ready :) ).

Now, one might argue that in this example there is no role for a de-duplicator.
Yes to that. But for all users of DWARF type debug information for _real
codebases_, the -fdebug-types-section option is the best option, isn't it?

Keeping "the size of type debug information in the shipped artifact small" as
our target is meaningful for both CTF and DWARF.

De-duplication is a key contributor to reducing the size of the type debug
information; and both CTF and DWARF types can be de-duplicated. At this time, I
stuck to a simple example with one CU because it eases interpreting the CTF and
DWARF debug info in the binaries and because the CTF link-time de-duplication
is not fully ready.

(NickA suggested a few days ago comparing how the DWARF and CTF section sizes
 increase when a new member, a new enum, a new union, etc. are added. I can
 share some more data if there is interest in such a comparison. A few examples
 below:

1. Add a new member 'struct node_payload * a' to struct node_payload
   DWARF = 589 - 578 (.debug_types); 331 - 314 (.debug_abbrev); total = 11 + 17 
= 28
   CTF = 980 - 966 

Re: [PATCHv2] Change the library search path when using --with-advance-toolchain

2019-10-07 Thread Michael Meissner
On Fri, Oct 04, 2019 at 06:31:34PM -0300, Tulio Magno Quites Machado Filho 
wrote:
> Michael Meissner  writes:
> 
> > And then I built Spec 2006 and 2017 with my normal options on a power8 
> > system
> > running Ubuntu and an older set of host libraries.  If I have this patch
> > installed, it breaks linking some/all of the C++ and Fortran benchmarks,
> > because it finds the host C++/Fortran libraries before the AT 12.0 
> > libraries.
> 
> You have a use case where the compiler has a different prefix.
> That requires an extra library directory, but it still has to come after
> the user lib directory.

Once I fixed the perl script to remove the extra -L's from the spec config.cfg
file that the perl script had put in for the previous version of
--with-advance-toolchain=, I was able to link one of the problematical
benchmarks (2017 roms_r in this case).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


[COMMITTED][MSP430] s/msp_/msp430_/

2019-10-07 Thread Jozef Lawrynowicz
To improve consistency in the naming scheme for functions and other C objects
in the msp430 backend, the attached patch replaces names that start with "msp_"
with "msp430_".
The patch exposed some whitespace issues in msp430.md. These have also been
fixed.

Regtested on trunk.
Committed as obvious.
>From a7bc2144797a7bc070eeaf01830816f1935a7345 Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Mon, 7 Oct 2019 20:09:49 +
Subject: [PATCH] 2019-10-07  Jozef Lawrynowicz  

	* config/msp430/msp430.c (msp430_file_end): s/msp_/msp430_/
	(msp430_expand_epilogue): Likewise.
	* config/msp430/predicates.md: Likewise.
	* config/msp430/msp430.md: Likewise.
	Replace blocks of 8 spaces with tabs.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@276671 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |   8 +++
 gcc/config/msp430/msp430.c  |  14 ++--
 gcc/config/msp430/msp430.md | 122 
 gcc/config/msp430/predicates.md |  10 +--
 4 files changed, 82 insertions(+), 72 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 91ebb5a114c..7d8aa956633 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2019-10-07  Jozef Lawrynowicz  
+
+	* config/msp430/msp430.c (msp430_file_end): s/msp_/msp430_/
+	(msp430_expand_epilogue): Likewise.
+	* config/msp430/predicates.md: Likewise.
+	* config/msp430/msp430.md: Likewise.
+	Replace blocks of 8 spaces with tabs.
+
 2019-10-07  Jozef Lawrynowicz  
 
 	* config/msp430/msp430-protos.h (msp430_split_addsi): New prototype.
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index add19bdb97c..ae763faada3 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -83,7 +83,7 @@ struct GTY(()) machine_function
 };
 
 /* This is our init_machine_status, as set in
-   msp_option_override.  */
+   msp430_option_override.  */
 static struct machine_function *
 msp430_init_machine_status (void)
 {
@@ -2119,18 +2119,20 @@ msp430_file_end (void)
  construct a .MSP430.attributes section based on the options it is invoked
  with.  The values it reads from these directives are used for validating
  those options.  */
-  const char *msp_attr = ".mspabi_attribute";
+  const char *msp430_attr = ".mspabi_attribute";
   const char *gnu_attr = ".gnu_attribute";
 
   /* Emit .mspabi_attribute directive for OFBA_MSPABI_Tag_ISA.  */
-  fprintf (asm_out_file, "\t%s %d, %d\n", msp_attr, OFBA_MSPABI_Tag_ISA,
+  fprintf (asm_out_file, "\t%s %d, %d\n", msp430_attr, OFBA_MSPABI_Tag_ISA,
 	   msp430x ? OFBA_MSPABI_Val_ISA_MSP430X : OFBA_MSPABI_Val_ISA_MSP430);
   /* Emit .mspabi_attribute directive for OFBA_MSPABI_Tag_Code_Model.  */
-  fprintf (asm_out_file, "\t%s %d, %d\n", msp_attr, OFBA_MSPABI_Tag_Code_Model,
+  fprintf (asm_out_file, "\t%s %d, %d\n", msp430_attr,
+	   OFBA_MSPABI_Tag_Code_Model,
 	   TARGET_LARGE ? OFBA_MSPABI_Val_Model_Large
 	   : OFBA_MSPABI_Val_Model_Small);
   /* Emit .mspabi_attribute directive for OFBA_MSPABI_Tag_Data_Model.  */
-  fprintf (asm_out_file, "\t%s %d, %d\n", msp_attr, OFBA_MSPABI_Tag_Data_Model,
+  fprintf (asm_out_file, "\t%s %d, %d\n", msp430_attr,
+	   OFBA_MSPABI_Tag_Data_Model,
 	   TARGET_LARGE ? OFBA_MSPABI_Val_Model_Large
 	   : OFBA_MSPABI_Val_Model_Small);
 #ifdef HAVE_AS_MSPABI_ATTRIBUTE
@@ -2604,7 +2606,7 @@ msp430_expand_epilogue (int is_eh)
   else if (is_reentrant_func ())
 emit_insn (gen_enable_interrupts ());
 
-  emit_jump_insn (gen_msp_return ());
+  emit_jump_insn (gen_msp430_return ());
 }
 
 /* Implements EH_RETURN_STACKADJ_RTX.  Saved and used later in
diff --git a/gcc/config/msp430/msp430.md b/gcc/config/msp430/msp430.md
index e1c61f5ea3d..ebd9c85fbb6 100644
--- a/gcc/config/msp430/msp430.md
+++ b/gcc/config/msp430/msp430.md
@@ -183,15 +183,15 @@
 )
 
 (define_insn "movqi_topbyte"
-  [(set (match_operand:QI 0 "msp_nonimmediate_operand" "=r")
-	(subreg:QI (match_operand:PSI 1 "msp_general_operand" "r") 2))]
+  [(set (match_operand:QI 0 "msp430_nonimmediate_operand" "=r")
+	(subreg:QI (match_operand:PSI 1 "msp430_general_operand" "r") 2))]
   "msp430x"
   "PUSHM.A\t#1,%1 { POPM.W\t#1,%0 { POPM.W\t#1,%0"
 )
 
 (define_insn "movqi"
-  [(set (match_operand:QI 0 "msp_nonimmediate_operand" "=rYsYx,rm")
-	(match_operand:QI 1 "msp_general_operand" "riYsYx,rmi"))]
+  [(set (match_operand:QI 0 "msp430_nonimmediate_operand" "=rYsYx,rm")
+	(match_operand:QI 1 "msp430_general_operand" "riYsYx,rmi"))]
   ""
   "@
   MOV.B\t%1, %0
@@ -199,8 +199,8 @@
 )
 
 (define_insn "movhi"
-  [(set (match_operand:HI 0 "msp_nonimmediate_operand" "=r,rYsYx,rm")
-	(match_operand:HI 1 "msp_general_operand" "N,riYsYx,rmi"))]
+  [(set (match_operand:HI 0 "msp430_nonimmediate_operand" "=r,rYsYx,rm")
+	(match_operand:HI 1 "msp430_general_operand" "N,riYsYx,rmi"))]
   ""
   "@
   MOV.B\t%1, %0
@@ -243,8 +243,8 @@
 
 ;; FIXME: Some MOVX.A cases can be done with MOVA, this is only a few of them.
 (define_insn "movpsi"
-  [(set (match_operand:PSI 0 

Re: [C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jason Merrill

On 10/7/19 4:10 PM, Jakub Jelinek wrote:

On Mon, Oct 07, 2019 at 03:51:05PM -0400, Jason Merrill wrote:

-  if (TREE_CODE (arg1) == COMPOUND_EXPR)
+  if (TREE_CODE (arg1) == COMPOUND_EXPR
+ && (flag_strong_eval_order != 2
+ /* C++17 disallows this canonicalization for shifts.  */
+ || (code != LSHIFT_EXPR
+ && code != RSHIFT_EXPR
+ && code != LROTATE_EXPR
+ && code != RROTATE_EXPR)))


Perhaps we should handle this in cp_build_binary_op instead?


How?  By adding a SAVE_EXPR in there, so that generic code is safe?


Something like that, yes.


 if (processing_template_decl && expr != error_mark_node)
   {


Hmm, it's weird that we have both grok_array_decl and build_x_array_ref.
I'll try removing the former.


Indeed.  I've noticed that because cp_build_array_ref only swaps in the
non-array case, in:
template 
auto
foo (T , U )
{
   T t;
   U u;
   __builtin_memcpy (, , sizeof (t));
   __builtin_memcpy (, , sizeof (u));
   return t[u];
}

int
main ()
{
   int a[4] = { 3, 4, 5, 6 };
   int b = 2;
   auto c = foo (a, b);
   auto d = foo (b, a);
}
we actually use the *(t+u) form in the second instantiation case
(regardless of -fstrong-eval-order) and t[u] in the former case,
as it is only grok_array_decl that swaps in that case.


Aha.  Yes, it seems there are a few things that work with 
grok_array_decl that will need to be fixed with build_x_array_ref.  I'm 
not going to mess with this any more in stage 1.



--- gcc/cp/typeck.c.jj  2019-10-07 13:08:58.717380894 +0200
+++ gcc/cp/typeck.c 2019-10-07 13:21:56.859450760 +0200
@@ -3550,6 +3550,10 @@ cp_build_array_ref (location_t loc, tree
 {
   tree ar = cp_default_conversion (array, complain);
   tree ind = cp_default_conversion (idx, complain);
+tree first = NULL_TREE;
+
+if (flag_strong_eval_order == 2 && TREE_SIDE_EFFECTS (ind))
+  ar = first = save_expr (ar);


save_expr will make a copy of the array, won't it?  That's unlikely to be


No.  First of all, this is on the else branch of
TREE_CODE (TREE_TYPE (array)) == ARRAY_TYPE, so either array is a pointer,
or idx is an array, or pointer, and it is after cp_default_conversion, so I
think ar must be either a pointer or integer.


Ah, good point.


I haven't touched the ARRAY_REF case earlier, because that I believe is
handled right in the gimplifier already.

+  if (flag_strong_eval_order == 2
+ && TREE_SIDE_EFFECTS (TREE_OPERAND (*expr_p, 1)))
+   {
+ enum gimplify_status t
+   = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
+is_gimple_val, fb_rvalue);
+ if (t == GS_ERROR)
+   break;
+ else if (is_gimple_variable (TREE_OPERAND (*expr_p, 0))
+  && TREE_CODE (TREE_OPERAND (*expr_p, 0)) != SSA_NAME)
+   TREE_OPERAND (*expr_p, 0)
+ = get_initialized_tmp_var (TREE_OPERAND (*expr_p, 0), pre_p,
+NULL);


I still think this pattern would be cleaner with a new gimple predicate that
returns true for invariant || SSA_NAME.  I think that would have the same
effect as this code.


The problem is that we need a reliable way to discover safe GIMPLE
temporaries for that to work.  If SSA_NAME is created, great, but in various
contexts (OpenMP/OpenACC bodies, and various other cases) allow_ssa is
false, at which point the gimplifier creates a temporary artificial VAR_DECL.


Yes, but doesn't the code above have the exact same effect?  You're 
checking for a variable that isn't an SSA_NAME, and making a copy otherwise.



If there is a predicate that rejects such a temporary, gimplify_expr will
ICE:
   gcc_assert (!VOID_TYPE_P (TREE_TYPE (*expr_p)));
   *expr_p = get_formal_tmp_var (*expr_p, pre_p);
...
   /* Make sure the temporary matches our predicate.  */
   gcc_assert ((*gimple_test_f) (*expr_p));


Won't get_formal_tmp_var always give an SSA_NAME?  It looks to me like 
it unconditionally passes true for allow_ssa.


Jason


[COMMITTED][MSP430] Move C code for addsi define_split to its own function

2019-10-07 Thread Jozef Lawrynowicz
In preparation for upcoming changes to the addsi splitter, the attached patch
moves the C code that performs the splitting into its own function.

Regtested on trunk.
Committed as obvious.
From 41e73d742fda612b0978cf84ae8732b430c4ef5a Mon Sep 17 00:00:00 2001
From: jozefl 
Date: Mon, 7 Oct 2019 20:05:30 +
Subject: [PATCH] 2019-10-07  Jozef Lawrynowicz  

	* config/msp430/msp430-protos.h (msp430_split_addsi): New prototype.
	* config/msp430/msp430.c (msp430_split_addsi): New.
	* config/msp430/msp430.md: Call msp430_split_addsi () instead of using
	a block of C code for splitting addsi.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@276670 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  7 +++
 gcc/config/msp430/msp430-protos.h |  1 +
 gcc/config/msp430/msp430.c| 23 +++
 gcc/config/msp430/msp430.md   | 20 +++-
 4 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index bbdce86a0f8..91ebb5a114c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2019-10-07  Jozef Lawrynowicz  
+
+	* config/msp430/msp430-protos.h (msp430_split_addsi): New prototype.
+	* config/msp430/msp430.c (msp430_split_addsi): New.
+	* config/msp430/msp430.md: Call msp430_split_addsi () instead of using
+	a block of C code for splitting addsi.
+
 2019-10-07  Uroš Bizjak  
 
 	* config/i386/i386-expand.c (ix86_expand_floorceildf_32,
diff --git a/gcc/config/msp430/msp430-protos.h b/gcc/config/msp430/msp430-protos.h
index 1c1757fc7ab..37ca48297ac 100644
--- a/gcc/config/msp430/msp430-protos.h
+++ b/gcc/config/msp430/msp430-protos.h
@@ -44,6 +44,7 @@ void	msp430_output_labelref (FILE *, const char *);
 void	msp430_register_pragmas (void);
 rtx	msp430_return_addr_rtx (int);
 void	msp430_split_movsi (rtx *);
+int msp430_split_addsi (rtx *);
void	msp430_start_function (FILE *, const char *, tree);
 rtx	msp430_subreg (machine_mode, rtx, machine_mode, int);
bool	msp430_use_f5_series_hwmult (void);
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 354b4ddb419..add19bdb97c 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -2841,6 +2841,29 @@ msp430_subreg (machine_mode mode, rtx r, machine_mode omode, int byte)
   return rv;
 }
 
+int
+msp430_split_addsi (rtx *operands)
+{
+  operands[3] = msp430_subreg (HImode, operands[0], SImode, 0);
+  operands[4] = msp430_subreg (HImode, operands[1], SImode, 0);
+  operands[5] = msp430_subreg (HImode, operands[2], SImode, 0);
+  operands[6] = msp430_subreg (HImode, operands[0], SImode, 2);
+  operands[7] = msp430_subreg (HImode, operands[1], SImode, 2);
+  operands[8] = msp430_subreg (HImode, operands[2], SImode, 2);
+
+  /* BZ 64160: Do not use this splitter when the dest partially overlaps the
+ source.  */
+  if (reg_overlap_mentioned_p (operands[3], operands[7])
+  || reg_overlap_mentioned_p (operands[3], operands[8]))
+return 1;
+
+  if (GET_CODE (operands[5]) == CONST_INT)
+operands[9] = GEN_INT (INTVAL (operands[5]) & 0xffff);
+  else
+operands[9] = gen_rtx_ZERO_EXTEND (SImode, operands[5]);
+  return 0;
+}
+
 /* Called by movsi_x to generate the HImode operands.  */
 void
 msp430_split_movsi (rtx *operands)
diff --git a/gcc/config/msp430/msp430.md b/gcc/config/msp430/msp430.md
index c72f7aade30..e1c61f5ea3d 100644
--- a/gcc/config/msp430/msp430.md
+++ b/gcc/config/msp430/msp430.md
@@ -423,23 +423,9 @@
 		 (zero_extend:HI (reg:BI CARRY
]
   "
-   operands[3] = msp430_subreg (HImode, operands[0], SImode, 0);
-   operands[4] = msp430_subreg (HImode, operands[1], SImode, 0);
-   operands[5] = msp430_subreg (HImode, operands[2], SImode, 0);
-   operands[6] = msp430_subreg (HImode, operands[0], SImode, 2);
-   operands[7] = msp430_subreg (HImode, operands[1], SImode, 2);
-   operands[8] = msp430_subreg (HImode, operands[2], SImode, 2);
-
-   /* BZ 64160: Do not use this splitter when the dest partially overlaps the source.  */
-   if (reg_overlap_mentioned_p (operands[3], operands[7])
-   || reg_overlap_mentioned_p (operands[3], operands[8]))
-  FAIL;
-
-   if (GET_CODE (operands[5]) == CONST_INT)
- operands[9] = GEN_INT (INTVAL (operands[5]) & 0xffff);
-   else
- operands[9] = gen_rtx_ZERO_EXTEND (SImode, operands[5]);
-   "
+  if (msp430_split_addsi (operands))
+FAIL;
+  "
   )
 
 
-- 
2.17.1



[Darwin, machopic 1/n, conmmitted] Consider visibility in indirections.

2019-10-07 Thread Iain Sandoe
For weak, hidden variables the indirection should just be as normal; that
is, the indirections for such symbols should appear in the non-lazy
symbol pointers table, not in the .data section.

tested on x86_64-darwin16, applied to mainline,
thanks
Iain

gcc/ChangeLog:

2019-10-07  Iain Sandoe  

* config/darwin.c (machopic_output_indirection): Don't put
hidden symbol indirections into the .data section, use the
non-lazy symbol pointers section as normal.
(darwin_encode_section_info): Record if a symbol is hidden.
* config/darwin.h (MACHO_SYMBOL_FLAG_HIDDEN_VIS): New.
(MACHO_SYMBOL_HIDDEN_VIS_P): New.


diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 45e0d744ad..869e850c57 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -1120,6 +1120,7 @@ machopic_output_indirection (machopic_indirection **slot, 
FILE *asm_out_file)
   machopic_output_stub (asm_out_file, sym, stub);
 }
   else if (! indirect_data (symbol)
+  && ! MACHO_SYMBOL_HIDDEN_VIS_P (symbol)
   && (machopic_symbol_defined_p (symbol)
   || SYMBOL_REF_LOCAL_P (symbol)))
 {
@@ -1237,6 +1238,12 @@ darwin_encode_section_info (tree decl, rtx rtl, int 
first)
   if (VAR_P (decl))
 SYMBOL_REF_FLAGS (sym_ref) |= MACHO_SYMBOL_FLAG_VARIABLE;
 
+  /* For Darwin, if we have specified visibility and it's not the default
+ that's counted 'hidden'.  */
+  if (DECL_VISIBILITY_SPECIFIED (decl)
+  && DECL_VISIBILITY (decl) != VISIBILITY_DEFAULT)
+ SYMBOL_REF_FLAGS (sym_ref) |= MACHO_SYMBOL_FLAG_HIDDEN_VIS;
+
   if (!DECL_EXTERNAL (decl)
   && (!TREE_PUBLIC (decl) || !DECL_WEAK (decl))
   && ! lookup_attribute ("weakref", DECL_ATTRIBUTES (decl))
diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 39c54cc720..87e1eb63b1 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -828,6 +828,12 @@ extern GTY(()) section * 
darwin_sections[NUM_DARWIN_SECTIONS];
 #define MACHO_SYMBOL_DEFINED_P(RTX) \
   ((SYMBOL_REF_FLAGS (RTX) & MACHO_SYMBOL_FLAG_DEFINED) != 0)
 
+/* Set on a symbol that has specified non-default visibility.  */
+
+#define MACHO_SYMBOL_FLAG_HIDDEN_VIS ((SYMBOL_FLAG_SUBT_DEP) << 3)
+#define MACHO_SYMBOL_HIDDEN_VIS_P(RTX) \
+  ((SYMBOL_REF_FLAGS (RTX) & MACHO_SYMBOL_FLAG_HIDDEN_VIS) != 0)
+
 /* Set on a symbol to indicate when fix-and-continue style code
generation is being used and the symbol refers to a static symbol
that should be rebound from new instances of a translation unit to



[PATCH][MSP430] Add support for post increment addressing

2019-10-07 Thread Jozef Lawrynowicz
MSP430 supports post increment addressing for the source operand only. The
attached patch enables post increment addressing for MSP430 in GCC.

Regtested on trunk for the small and large memory models.

Ok for trunk?
>From d75e48ba434d7bab141c09d442793b2cb26afa69 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 16 Sep 2019 16:34:51 +0100
Subject: [PATCH] MSP430: Implement post increment addressing

gcc/ChangeLog:

2019-10-07  Jozef Lawrynowicz  

	* config/msp430/constraints.md: Allow post_inc operand for "Ya"
	constraint.
	* config/msp430/msp430.c (msp430_legitimate_address_p): Handle
	POST_INC.
	(msp430_subreg): Likewise.
	(msp430_split_addsi): Likewise.
	(msp430_print_operand_addr): Likewise.
	* config/msp430/msp430.h (HAVE_POST_INCREMENT): Define.
	(USE_STORE_POST_INCREMENT): Define.
	* config/msp430/msp430.md: Use the msp430_general_dst_operand or
	msp430_general_dst_nonv_operand predicates for the lvalues insns.
	* config/msp430/predicates.md (msp430_nonpostinc_operand): New.
	(msp430_general_dst_operand): New.
	(msp430_general_dst_nonv_operand): New.
	(msp430_nonsubreg_operand): Remove.
	(msp430_nonsubreg_dst_operand): New.
	(msp430_nonsubreg_or_imm_operand): Allow reg or mem operands in place
	of defunct msp430_nonsubreg_operand.
	(msp430_nonsubregnonpostinc_or_imm_operand): New.
---
 gcc/config/msp430/constraints.md |   1 +
 gcc/config/msp430/msp430.c   |  32 +-
 gcc/config/msp430/msp430.h   |  12 ++
 gcc/config/msp430/msp430.md  | 191 ---
 gcc/config/msp430/predicates.md  |  46 +++-
 5 files changed, 183 insertions(+), 99 deletions(-)

diff --git a/gcc/config/msp430/constraints.md b/gcc/config/msp430/constraints.md
index 4422b2b6454..d01bcf9a242 100644
--- a/gcc/config/msp430/constraints.md
+++ b/gcc/config/msp430/constraints.md
@@ -60,6 +60,7 @@
 		 (match_code "reg" "00")
 		 (match_test ("CONST_INT_P (XEXP (XEXP (op, 0), 1))")))
 	(match_test "CONSTANT_P (XEXP (op, 0))")
+	(match_code "post_inc" "0")
 	)))
 
 (define_constraint "Yl"
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index ae763faada3..31029395c3d 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -942,12 +942,17 @@ msp430_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
   return false;
 
 case PLUS:
+case POST_INC:
   if (REG_P (XEXP (x, 0)))
 	{
 	  if (GET_MODE (x) != GET_MODE (XEXP (x, 0)))
 	return false;
 	  if (!reg_ok_for_addr (XEXP (x, 0), strict))
 	return false;
+	  if (GET_CODE (x) == POST_INC)
+	/* At this point, if the original rtx was a post_inc, we don't have
+	   anything further to check.  */
+	return true;
 	  switch (GET_CODE (XEXP (x, 1)))
 	{
 	case CONST:
@@ -2810,6 +2815,7 @@ rtx
 msp430_subreg (machine_mode mode, rtx r, machine_mode omode, int byte)
 {
   rtx rv;
+  gcc_assert (mode == HImode);
 
   if (GET_CODE (r) == SUBREG
   && SUBREG_BYTE (r) == 0)
@@ -2826,7 +2832,15 @@ msp430_subreg (machine_mode mode, rtx r, machine_mode omode, int byte)
 	rv = simplify_gen_subreg (mode, ireg, imode, byte);
 }
   else if (GET_CODE (r) == MEM)
-rv = adjust_address (r, mode, byte);
+{
+  /* When byte == 2, we can be certain that we were already called with an
+	 identical rtx with byte == 0.  So we don't need to do anything to
+	 get a 2 byte offset of a (mem (post_inc)) rtx, since the address has
+	 already been offset by the post_inc itself.  */
+  if (GET_CODE (XEXP (r, 0)) == POST_INC && byte == 2)
+	byte = 0;
+  rv = adjust_address (r, mode, byte);
+}
   else if (GET_CODE (r) == SYMBOL_REF
 	   && (byte == 0 || byte == 2)
 	   && mode == HImode)
@@ -2861,6 +2875,18 @@ msp430_split_addsi (rtx *operands)
 
   if (GET_CODE (operands[5]) == CONST_INT)
    operands[9] = GEN_INT (INTVAL (operands[5]) & 0xffff);
+  /* Handle post_inc, for example:
+     (set (reg:SI)
+	  (plus:SI (reg:SI)
+		   (mem:SI (post_inc:PSI (reg:PSI))))).  */
+  else if (MEM_P (operands[5]) && GET_CODE (XEXP (operands[5], 0)) == POST_INC)
+{
+  /* Strip out the post_inc from (mem (post_inc (reg))).  */
+  operands[9] = XEXP (XEXP (operands[5], 0), 0);
+  operands[9] = gen_rtx_MEM (HImode, operands[9]);
+  /* Then zero extend as normal.  */
+  operands[9] = gen_rtx_ZERO_EXTEND (SImode, operands[9]);
+}
   else
 operands[9] = gen_rtx_ZERO_EXTEND (SImode, operands[5]);
   return 0;
@@ -3205,6 +3231,10 @@ msp430_print_operand_addr (FILE * file, machine_mode /*mode*/, rtx addr)
   fprintf (file, "@");
   break;
 
+case POST_INC:
+  fprintf (file, "@%s+", reg_names[REGNO (XEXP (addr, 0))]);
+  return;
+
 case CONST:
 case CONST_INT:
 case SYMBOL_REF:
diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index f885de2bb2f..73afe2e2d16 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -478,6 +478,18 @@ typedef struct
 
 #define 

[Darwin, machopic 0/n, committed] Initial tidy of Mach-O symbol handling.

2019-10-07 Thread Iain Sandoe
We want to improve the detection and caching of symbol properties
so that (a) we can make the compiler's output match the platform
norms, (b) we can improve efficiency by checking flags instead of
inspecting strings, and (c) we can reduce the number of symbols that
are made linker-visible (the fix for PR71767 was a largish hammer).

This first patch is largely typographical changes with no functional
difference intended:

- Tries to ensure that there's no overlap between the symbols used in
  the Mach-O case and those declared in the i386 or rs6000 port trees.

- Some improvement to comments.

- Makes the naming of the symbol flags consistent with other uses.

- Provides a predicate macro for each use.

tested on x86_64-darwin16, applied to mainline,
thanks
Iain
 
gcc/ChangeLog:

2019-10-07  Iain Sandoe  

* config/darwin.c (machopic_symbol_defined_p): Use symbol flag
predicates instead of accessing bits directly.
(machopic_indirect_call_target): Likewise.
(machopic_output_indirection): Likewise.
(darwin_encode_section_info): Improve description.  Use renamed
symbol flags.  Use predicate macros for variables and functions.
* config/darwin.h:
Rename MACHO_SYMBOL_VARIABLE to MACHO_SYMBOL_FLAG_VARIABLE.
Rename MACHO_SYMBOL_DEFINED to MACHO_SYMBOL_FLAG_DEFINED.
Rename MACHO_SYMBOL_STATIC to MACHO_SYMBOL_FLAG_STATIC.
(MACHO_SYMBOL_VARIABLE_P): New.
(MACHO_SYMBOL_DEFINED_P): New.
(MACHO_SYMBOL_STATIC_P): New.
* config/i386/darwin.h (MACHO_SYMBOL_FLAG_VARIABLE): Delete.
(SYMBOL_FLAG_SUBT_DEP): New.
* config/rs6000/darwin.h (SYMBOL_FLAG_SUBT_DEP): New.


diff --git a/gcc/config/darwin.c b/gcc/config/darwin.c
index 5673982394..45e0d744ad 100644
--- a/gcc/config/darwin.c
+++ b/gcc/config/darwin.c
@@ -76,7 +76,7 @@ along with GCC; see the file COPYING3.  If not see
setting the second word in the .non_lazy_symbol_pointer data
structure to symbol.  See indirect_data for the code that handles
the extra indirection, and machopic_output_indirection and its use
-   of MACHO_SYMBOL_STATIC for the code that handles @code{static}
+   of MACHO_SYMBOL_FLAG_STATIC for the code that handles @code{static}
symbol indirection.  */
 
 typedef struct GTY(()) cdtor_record {
@@ -249,7 +249,7 @@ name_needs_quotes (const char *name)
 int
 machopic_symbol_defined_p (rtx sym_ref)
 {
-  if (SYMBOL_REF_FLAGS (sym_ref) & MACHO_SYMBOL_FLAG_DEFINED)
+  if (MACHO_SYMBOL_DEFINED_P (sym_ref))
 return true;
 
   /* If a symbol references local and is not an extern to this
@@ -258,7 +258,7 @@ machopic_symbol_defined_p (rtx sym_ref)
 {
   /* If the symbol references a variable and the variable is a
 common symbol, then this symbol is not defined.  */
-  if (SYMBOL_REF_FLAGS (sym_ref) & MACHO_SYMBOL_FLAG_VARIABLE)
+  if (MACHO_SYMBOL_VARIABLE_P (sym_ref))
{
  tree decl = SYMBOL_REF_DECL (sym_ref);
  if (!decl)
@@ -797,8 +797,7 @@ machopic_indirect_call_target (rtx target)
 
   if (MACHOPIC_INDIRECT
   && GET_CODE (XEXP (target, 0)) == SYMBOL_REF
-  && !(SYMBOL_REF_FLAGS (XEXP (target, 0))
-  & MACHO_SYMBOL_FLAG_DEFINED))
+  && ! MACHO_SYMBOL_DEFINED_P (XEXP (target, 0)))
 {
   rtx sym_ref = XEXP (target, 0);
   const char *stub_name = machopic_indirection_name (sym_ref,
@@ -1167,14 +1166,14 @@ machopic_output_indirection (machopic_indirection **slot, FILE *asm_out_file)
   assemble_name (asm_out_file, sym_name);
   fprintf (asm_out_file, "\n");
 
-  /* Variables that are marked with MACHO_SYMBOL_STATIC need to
+  /* Variables that are marked with MACHO_SYMBOL_FLAG_STATIC need to
 have their symbol name instead of 0 in the second entry of
 the non-lazy symbol pointer data structure when they are
 defined.  This allows the runtime to rebind newer instances
 of the translation unit with the original instance of the
 symbol.  */
 
-  if ((SYMBOL_REF_FLAGS (symbol) & MACHO_SYMBOL_STATIC)
+  if (MACHO_SYMBOL_STATIC_P (symbol)
  && machopic_symbol_defined_p (symbol))
init = gen_rtx_SYMBOL_REF (Pmode, sym_name);
 
@@ -1205,23 +1204,37 @@ machopic_operand_p (rtx op)
&& XINT (XEXP (op, 0), 1) == UNSPEC_MACHOPIC_OFFSET);
 }
 
-/* This function records whether a given name corresponds to a defined
-   or undefined function or variable, for machopic_classify_ident to
-   use later.  */
+/* This function:
+   computes and caches a series of flags that characterise the symbol's
+   properties that affect Mach-O code gen (including accidental cases
+   from older toolchains).
+
+   TODO:
+   Here we also need to do enough analysis to determine if a symbol's
+   name needs to be made linker-visible.  This is more tricky - since
+   it depends on whether we've previously seen a global weak definition
+   in the same section.
+   */
 
 void

Re: [C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jakub Jelinek
On Mon, Oct 07, 2019 at 03:51:05PM -0400, Jason Merrill wrote:
> > -  if (TREE_CODE (arg1) == COMPOUND_EXPR)
> > +  if (TREE_CODE (arg1) == COMPOUND_EXPR
> > + && (flag_strong_eval_order != 2
> > + /* C++17 disallows this canonicalization for shifts.  */
> > + || (code != LSHIFT_EXPR
> > + && code != RSHIFT_EXPR
> > + && code != LROTATE_EXPR
> > + && code != RROTATE_EXPR)))
> 
> Perhaps we should handle this in cp_build_binary_op instead?

How?  By adding a SAVE_EXPR in there, so that generic code is safe?

> > if (processing_template_decl && expr != error_mark_node)
> >   {
> 
> Hmm, it's weird that we have both grok_array_decl and build_x_array_ref.
> I'll try removing the former.

Indeed.  I've noticed that because cp_build_array_ref only swaps in the
non-array case, in:
template 
auto
foo (T , U )
{
  T t;
  U u;
  __builtin_memcpy (, , sizeof (t));
  __builtin_memcpy (, , sizeof (u));
  return t[u];
}

int
main ()
{
  int a[4] = { 3, 4, 5, 6 };
  int b = 2;
  auto c = foo (a, b);
  auto d = foo (b, a);
}
we actually use the *(t+u) form in the second instantiation case
(regardless of -fstrong-eval-order) and t[u] in the former case,
as it is only grok_array_decl that swaps in that case.

> > --- gcc/cp/typeck.c.jj  2019-10-07 13:08:58.717380894 +0200
> > +++ gcc/cp/typeck.c 2019-10-07 13:21:56.859450760 +0200
> > @@ -3550,6 +3550,10 @@ cp_build_array_ref (location_t loc, tree
> > {
> >   tree ar = cp_default_conversion (array, complain);
> >   tree ind = cp_default_conversion (idx, complain);
> > +tree first = NULL_TREE;
> > +
> > +if (flag_strong_eval_order == 2 && TREE_SIDE_EFFECTS (ind))
> > +  ar = first = save_expr (ar);
> 
> save_expr will make a copy of the array, won't it?  That's unlikely to be

No.  First of all, this is on the else branch of
TREE_CODE (TREE_TYPE (array)) == ARRAY_TYPE, so either array is a pointer,
or idx is an array, or pointer, and it is after cp_default_conversion, so I
think ar must be either a pointer or integer.
I haven't touched the ARRAY_REF case earlier, because that I believe is
handled right in the gimplifier already.

> > +  if (flag_strong_eval_order == 2
> > + && TREE_SIDE_EFFECTS (TREE_OPERAND (*expr_p, 1)))
> > +   {
> > + enum gimplify_status t
> > +   = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
> > +is_gimple_val, fb_rvalue);
> > + if (t == GS_ERROR)
> > +   break;
> > + else if (is_gimple_variable (TREE_OPERAND (*expr_p, 0))
> > +  && TREE_CODE (TREE_OPERAND (*expr_p, 0)) != SSA_NAME)
> > +   TREE_OPERAND (*expr_p, 0)
> > + = get_initialized_tmp_var (TREE_OPERAND (*expr_p, 0), pre_p,
> > +NULL);
> 
> I still think this pattern would be cleaner with a new gimple predicate that
> returns true for invariant || SSA_NAME.  I think that would have the same
> effect as this code.

The problem is that we need a reliable way to discover safe GIMPLE
temporaries for that to work.  If SSA_NAME is created, great, but in various
contexts (OpenMP/OpenACC bodies, and various other cases) allow_ssa is
false, at which point the gimplifier creates a temporary artificial VAR_DECL.
If there is a predicate that rejects such a temporary, gimplify_expr will
ICE:
  gcc_assert (!VOID_TYPE_P (TREE_TYPE (*expr_p)));
  *expr_p = get_formal_tmp_var (*expr_p, pre_p);
...
  /* Make sure the temporary matches our predicate.  */
  gcc_assert ((*gimple_test_f) (*expr_p));

Now, we could have a predicate that does invariant || SSA_NAME || (VAR_P &&
DECL_ARTIFICIAL), but I'm not certain that is good enough, DECL_ARTIFICIAL
are also TARGET_EXPR temporaries and many others and I couldn't convince
myself one can't write a valid testcase that would still fail.
Of course, we could add some VAR_DECL flag and set it on the
create_tmp_from_val created temporaries, but is that the way to go?

Jakub


Re: [C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jason Merrill

On 10/7/19 12:23 PM, Jakub Jelinek wrote:

Hi!

So, C++17 and later says that in E1[E2] and E1 << E2 and E1 >> E2 the
E1 expression is sequenced before E2.
Similarly to the recent call expression PR, while the gimplifier
gimplifies the first operand before the second one, it uses is_gimple_val
as predicate (and we don't really have a usable predicate that would require
only gimple temporaries one can't modify), so if the first operand doesn't
gimplify into SSA_NAME which is ok, we need to force it into a temporary.
This is the cp-gimplify.c hunk.  Unfortunately, for shifts that is not good
enough, because even before we reach that fold-const.c actually can move
some side-effects from the arguments, which is ok for the first operand, but
not ok for the second one.

For E1[E2], we either lower it into ARRAY_REF if E1 has array type, but in
that case I think the similar issue can't happen, or we lower it otherwise
to *(E1 + E2), but in that case there is absolutely no ordering guarantee
between the two, folders can swap them any time etc.
So, the patch instead for -fstrong-eval-order lowers it into
SAVE_EXPR, *(SAVE_EXPR + E2) if at least one of the operands has
side-effects.  I had to also tweak grok_array_decl, because it swapped the
operands in some cases, and that causes an ICE on c-c++-common/gomp/reduction-1.c;
I'll need to handle the new form of code in the OpenMP handling code.

Also, the PR contains other testcases that are wrong, but I'm not 100% sure
what to do, I'm afraid we'll need to force aggregate arguments (without
TREE_ADDRESSABLE types) into registers in that case for some of the
CALL_EXPRs with ordered args.  Unlike this patch and the previous one (which
waste at most some compile-time memory for the extra temporaries or, best
case, SSA_NAMEs and one extra stmt to load, and are unlikely to cause any
runtime slowdowns), forcing aggregates into separate temporaries might be very
expensive in some cases.  So, I wonder if we shouldn't do some smarter
analysis in that case, like if we have two args where the first one needs to
be evaluated before the second one and second one has side-effects (or vice
versa), for the first argument check if it has address taken or is global
(== may_be_aliased), if yes, probably force it always, otherwise walk the
second argument to see if it refers to this var.  Thoughts on that?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
once I resolve the reduction-1.c ICE in the OpenMP code?

2019-10-07  Jakub Jelinek  

PR c++/91987
* fold-const.c (fold_binary_loc): Don't optimize COMPOUND_EXPR in
the second operand of -fstrong-eval-order shift/rotate.
cp/
* decl2.c (grok_array_decl): For -fstrong-eval-order, when array ref
operands have been swapped and at least one operand has side-effects,
revert the swapping before calling build_array_ref.
* typeck.c (cp_build_array_ref): For non-ARRAY_TYPE array ref with
side-effects on the index operand, if -fstrong-eval-order use
save_expr around the array operand.
* cp-gimplify.c (cp_gimplify_expr): For shifts/rotates and
-fstrong-eval-order, force the first operand into a temporary.
testsuite/
* g++.dg/cpp1z/eval-order6.C: New test.
* g++.dg/cpp1z/eval-order7.C: New test.
* g++.dg/cpp1z/eval-order8.C: New test.

--- gcc/fold-const.c.jj 2019-10-04 12:24:38.704764109 +0200
+++ gcc/fold-const.c2019-10-07 13:21:56.855450821 +0200
@@ -9447,16 +9447,23 @@ fold_binary_loc (location_t loc, enum tr
if (TREE_CODE (arg0) == COMPOUND_EXPR)
{
  tem = fold_build2_loc (loc, code, type,
-fold_convert_loc (loc, TREE_TYPE (op0),
-  TREE_OPERAND (arg0, 1)), op1);
+fold_convert_loc (loc, TREE_TYPE (op0),
+  TREE_OPERAND (arg0, 1)),
+  op1);
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg0, 0),
 tem);
}
-  if (TREE_CODE (arg1) == COMPOUND_EXPR)
+  if (TREE_CODE (arg1) == COMPOUND_EXPR
+ && (flag_strong_eval_order != 2
+ /* C++17 disallows this canonicalization for shifts.  */
+ || (code != LSHIFT_EXPR
+ && code != RSHIFT_EXPR
+ && code != LROTATE_EXPR
+ && code != RROTATE_EXPR)))


Perhaps we should handle this in cp_build_binary_op instead?


{
  tem = fold_build2_loc (loc, code, type, op0,
-fold_convert_loc (loc, TREE_TYPE (op1),
-  TREE_OPERAND (arg1, 1)));
+fold_convert_loc (loc, TREE_TYPE (op1),
+  TREE_OPERAND (arg1, 1)));
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND 

Re: C++ PATCH for C++20 P0388R4 (conversions to arrays of unknown bounds) and CWG 1307 (c++/91364, c++/69531)

2019-10-07 Thread Jason Merrill

On 10/7/19 1:42 PM, Marek Polacek wrote:

On Sun, Oct 06, 2019 at 03:39:25PM +, Tam S. B. wrote:

Hi, sorry for chiming in.


Hi, no worries, comments are welcome!


IIUC P0388R4 does not allow implicit conversion from `T(*)[]` to `T(*)[N]`. Per 
[conv.qual]/3,


A prvalue of type `T1` can be converted to type `T2` if the cv-combined type of 
`T1` and `T2` is `T2`.


When T1 is `T(*)[]` and T2 is `T(*)[N]`, the "cv-combined type" is `T(*)[]`, 
which is not the same as T2 = `T(*)[N]`, so conversion should not be permitted.

That is to say, either `comp_ptr_ttypes_real` should not use 
`comp_array_types`, or `comp_array_types` should be extended to handle this 
case correctly.


Indeed, I must've glossed over that.  Fixed by introducing compare_bounds_t,
which says how we should compare arrays when one of them is [].

similar_type_p still must say "yes" for T(*)[] and T(*)[N] though.


Also, does this patch permit conversion from `T(*)[N]` to `const T(*)[]`? I 
think [conv.qual] allows this conversion, but IIUC `comp_array_types` will 
return false in this case, as it uses `same_type_p`.


It does now, array-conv14.C tests it.

Thanks for looking and catching these problems!

Bootstrapped/regtested on x86_64-linux, built Boost and cmcstl2.

Ok for trunk?

2019-10-07  Marek Polacek  

PR c++/91364 - Implement P0388R4: Permit conversions to arrays of
unknown bound.
PR c++/69531 - Implement CWG 1307: Differently bounded array
parameters.
* call.c (build_array_conv): Build ck_identity at the beginning
of the conversion.
(standard_conversion): Pass bounds_none to comp_ptr_ttypes_const.
(maybe_warn_array_conv): New.
(convert_like_real): Call it.  Add an error message about converting
from arrays of unknown bounds.
(conv_get_original_expr): New.
(nelts_initialized_by_list_init): New.
(conv_binds_to_array_of_unknown_bound): New.
(compare_ics): Implement list-initialization ranking based on
array sizes, as specified in DR 1307 and P0388R4.
* cp-tree.h (comp_ptr_ttypes_const): Adjust declaration.
(compare_bounds_t): New enum.
* typeck.c (comp_array_types): New bool and compare_bounds_t
parameters.  Use them.
(structural_comptypes): Adjust the call to comp_array_types.
(similar_type_p): Handle ARRAY_TYPE.
(build_const_cast_1): Pass bounds_none to comp_ptr_ttypes_const.
(comp_ptr_ttypes_real): Use comp_array_types.
(comp_ptr_ttypes_const): New compare_bounds_t parameter.  Use
comp_array_types.

* g++.dg/cpp0x/initlist-array3.C: Remove dg-error.
* g++.dg/cpp0x/initlist-array7.C: New test.
* g++.dg/cpp0x/initlist-array8.C: New test.
* g++.dg/cpp2a/array-conv1.C: New test.
* g++.dg/cpp2a/array-conv10.C: New test.
* g++.dg/cpp2a/array-conv11.C: New test.
* g++.dg/cpp2a/array-conv12.C: New test.
* g++.dg/cpp2a/array-conv13.C: New test.
* g++.dg/cpp2a/array-conv14.C: New test.
* g++.dg/cpp2a/array-conv2.C: New test.
* g++.dg/cpp2a/array-conv3.C: New test.
* g++.dg/cpp2a/array-conv4.C: New test.
* g++.dg/cpp2a/array-conv5.C: New test.
* g++.dg/cpp2a/array-conv6.C: New test.
* g++.dg/cpp2a/array-conv7.C: New test.
* g++.dg/cpp2a/array-conv8.C: New test.
* g++.dg/cpp2a/array-conv9.C: New test.
* g++.old-deja/g++.bugs/900321_01.C: Adjust dg-error.

diff --git gcc/gcc/cp/call.c gcc/gcc/cp/call.c
index 56dcbd391c1..1a9c296a812 100644
--- gcc/gcc/cp/call.c
+++ gcc/gcc/cp/call.c
@@ -122,7 +122,8 @@ struct conversion {
 of using this field directly.  */
  conversion *next;
  /* The expression at the beginning of the conversion chain.  This
-   variant is used only if KIND is ck_identity or ck_ambig.  */
+   variant is used only if KIND is ck_identity or ck_ambig.  You can
+   use conv_get_original_expr to get this expression.  */
  tree expr;
  /* The array of conversions for an initializer_list, so this
 variant is used only when KIND is ck_list.  */
@@ -223,6 +224,8 @@ static void add_candidates (tree, tree, const vec *, tree, tree,
tsubst_flags_t);
  static conversion *merge_conversion_sequences (conversion *, conversion *);
  static tree build_temp (tree, tree, int, diagnostic_t *, tsubst_flags_t);
+static conversion *build_identity_conv (tree, tree);
+static inline bool conv_binds_to_array_of_unknown_bound (conversion *);
  
  /* Returns nonzero iff the destructor name specified in NAME matches BASETYPE.

 NAME can take many forms...  */
@@ -1078,7 +1081,7 @@ build_array_conv (tree type, tree ctor, int flags, tsubst_flags_t complain)
c->rank = rank;
c->user_conv_p = user;
c->bad_p = bad;
-  c->u.next = NULL;
+  c->u.next = build_identity_conv (TREE_TYPE (ctor), ctor);
return c;
  }
  
@@ 

Re: [PATCH] Remove broken URL from libstdc++ manual

2019-10-07 Thread Thomas Schwinge
Hi!

On 2019-09-05T08:45:50+0100, Jonathan Wakely  wrote:
> Committed to trunk. I think I'll backport this too, so we don't keep a
> non-working link in the docs on release branches.

> commit 45a605e970ea6db474e40c02aef6b18993fea05c
> Author: Jonathan Wakely 
> Date:   Thu Sep 5 08:40:35 2019 +0100
>
> Remove broken URL from libstdc++ manual
> 
> The URL for the "What Are Allocators Good For?" article has been a
> recurring source of problems. It moved from the C/C++ Users Journal
> website to the Dr Dobbs site after CUJ shut down, and the original
> domain changed hands, leaving old links pointing to nefarious sites.
> 
> Now the URL to the copy on drdobbs.com no longer works either and I
> can't find a (legal) copy of the article online. The simplest solution
> is to remove the URL.
> 
> * doc/xml/manual/allocator.xml: Remove URL for bibliography entry.
> * doc/html/*: Regenerate.
>
> diff --git a/libstdc++-v3/doc/xml/manual/allocator.xml 
> b/libstdc++-v3/doc/xml/manual/allocator.xml
> index 0de1be9465a..922bc49091c 100644
> --- a/libstdc++-v3/doc/xml/manual/allocator.xml
> +++ b/libstdc++-v3/doc/xml/manual/allocator.xml
> @@ -482,12 +482,9 @@
>
>  
>
> -  
> - http://www.w3.org/1999/xlink;
> -   
> xlink:href="http://www.drdobbs.com/the-standard-librarian-what-are-allocato/184403759;>

For what it's worth: see
,
and

seems to be the latest revision they've got.


Grüße
 Thomas


> +
>The Standard Librarian: What Are Allocators Good For?
> - 
> -  
> +
>  
>  
> MattAustern
>  
> @@ -495,6 +492,7 @@
>   C/C++ Users Journal
>
>  
> +2000-12
>
>  
>




Re: [PATCH V6 11/11] bpf: add myself as the maintainer for the eBPF port

2019-10-07 Thread Thomas Schwinge
Hi!

On 2019-09-08T19:35:52+0800, Gerald Pfeifer  wrote:
> On Thu, 29 Aug 2019, Jose E. Marchesi wrote:
>> ChangeLog:
>> 
>>  * MAINTAINERS: Add myself as the maintainer of the eBPF port.
>
> Approved.
>
> As in: Approved by the steering committee assuming the patchset passes
> technical review. :-)
>
> Thanks for working on this, Joe, and congrats to the new hat!

Indeed -- I'm always happy when people/companies demonstrate that good
ol' GCC is still viable for the latest craze, the new shiny thing, like
adding a not-your-standard-CPU-architecture back end!  ;-D


I think, using the "as obvious" rule, you should add a "News" item on
, like done for other new back ends etc.


Grüße
 Thomas




[PATCH, i386]: Reorder a couple of rounding functions

2019-10-07 Thread Uros Bizjak
Move some functions to a better place.

2019-10-07  Uroš Bizjak  

* config/i386/i386-expand.c (ix86_expand_floorceildf_32,
ix86_expand_rounddf_32): Reorder functions.
* config/i386/i386-protos.h: Update.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 22c2823c549f..3635de597d0b 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15903,71 +15903,8 @@ ix86_expand_rint (rtx operand0, rtx operand1)
   emit_move_insn (operand0, res);
 }
 
-/* Expand SSE2 sequence for computing floor or ceil from OPERAND1 storing
-   into OPERAND0.  */
-void
-ix86_expand_floorceildf_32 (rtx operand0, rtx operand1, bool do_floor)
-{
-  /* C code for the stuff we expand below.
-double xa = fabs (x), x2;
-if (!isless (xa, TWO52))
-  return x;
-xa = xa + TWO52 - TWO52;
-x2 = copysign (xa, x);
- Compensate.  Floor:
-if (x2 > x)
-  x2 -= 1;
- Compensate.  Ceil:
-if (x2 < x)
-  x2 += 1;
-   if (HONOR_SIGNED_ZEROS (mode))
- x2 = copysign (x2, x);
-   return x2;
-   */
-  machine_mode mode = GET_MODE (operand0);
-  rtx xa, TWO52, tmp, one, res, mask;
-  rtx_code_label *label;
-
-  TWO52 = ix86_gen_TWO52 (mode);
-
-  /* Temporary for holding the result, initialized to the input
- operand to ease control flow.  */
-  res = gen_reg_rtx (mode);
-  emit_move_insn (res, operand1);
-
-  /* xa = abs (operand1) */
-  xa = ix86_expand_sse_fabs (res, );
-
-  /* if (!isless (xa, TWO52)) goto label; */
-  label = ix86_expand_sse_compare_and_jump (UNLE, TWO52, xa, false);
-
-  /* xa = xa + TWO52 - TWO52; */
-  xa = expand_simple_binop (mode, PLUS, xa, TWO52, NULL_RTX, 0, OPTAB_DIRECT);
-  xa = expand_simple_binop (mode, MINUS, xa, TWO52, xa, 0, OPTAB_DIRECT);
-
-  /* xa = copysign (xa, operand1) */
-  ix86_sse_copysign_to_positive (xa, xa, res, mask);
-
-  /* generate 1.0 */
-  one = force_reg (mode, const_double_from_real_value (dconst1, mode));
-
-  /* Compensate: xa = xa - (xa > operand1 ? 1 : 0) */
-  tmp = ix86_expand_sse_compare_mask (UNGT, xa, res, !do_floor);
-  emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, one, tmp)));
-  tmp = expand_simple_binop (mode, do_floor ? MINUS : PLUS,
-xa, tmp, NULL_RTX, 0, OPTAB_DIRECT);
-  if (!do_floor && HONOR_SIGNED_ZEROS (mode))
-ix86_sse_copysign_to_positive (tmp, tmp, res, mask);
-  emit_move_insn (res, tmp);
-
-  emit_label (label);
-  LABEL_NUSES (label) = 1;
-
-  emit_move_insn (operand0, res);
-}
-
-/* Expand SSE2 sequence for computing floor or ceil from OPERAND1 storing
-   into OPERAND0.  */
+/* Expand SSE2 sequence for computing floor or ceil
+   from OPERAND1 storing into OPERAND0.  */
 void
 ix86_expand_floorceil (rtx operand0, rtx operand1, bool do_floor)
 {
@@ -16027,30 +15964,30 @@ ix86_expand_floorceil (rtx operand0, rtx operand1, bool do_floor)
   emit_move_insn (operand0, res);
 }
 
-/* Expand SSE sequence for computing round from OPERAND1 storing
-   into OPERAND0.  Sequence that works without relying on DImode truncation
-   via cvttsd2siq that is only available on 64bit targets.  */
+/* Expand SSE2 sequence for computing floor or ceil from OPERAND1 storing
+   into OPERAND0 without relying on DImode truncation via cvttsd2siq
+   that is only available on 64bit targets.  */
 void
-ix86_expand_rounddf_32 (rtx operand0, rtx operand1)
+ix86_expand_floorceildf_32 (rtx operand0, rtx operand1, bool do_floor)
 {
   /* C code for the stuff we expand below.
-double xa = fabs (x), xa2, x2;
+double xa = fabs (x), x2;
 if (!isless (xa, TWO52))
   return x;
- Using the absolute value and copying back sign makes
- -0.0 -> -0.0 correct.
-xa2 = xa + TWO52 - TWO52;
- Compensate.
-   dxa = xa2 - xa;
-if (dxa <= -0.5)
-  xa2 += 1;
-else if (dxa > 0.5)
-  xa2 -= 1;
-x2 = copysign (xa2, x);
-return x2;
+xa = xa + TWO52 - TWO52;
+x2 = copysign (xa, x);
+ Compensate.  Floor:
+if (x2 > x)
+  x2 -= 1;
+ Compensate.  Ceil:
+if (x2 < x)
+  x2 += 1;
+   if (HONOR_SIGNED_ZEROS (mode))
+ x2 = copysign (x2, x);
+   return x2;
*/
   machine_mode mode = GET_MODE (operand0);
-  rtx xa, xa2, dxa, TWO52, tmp, half, mhalf, one, res, mask;
+  rtx xa, TWO52, tmp, one, res, mask;
   rtx_code_label *label;
 
   TWO52 = ix86_gen_TWO52 (mode);
@@ -16066,31 +16003,24 @@ ix86_expand_rounddf_32 (rtx operand0, rtx operand1)
   /* if (!isless (xa, TWO52)) goto label; */
   label = ix86_expand_sse_compare_and_jump (UNLE, TWO52, xa, false);
 
-  /* xa2 = xa + TWO52 - TWO52; */
-  xa2 = expand_simple_binop (mode, PLUS, xa, TWO52, NULL_RTX, 0, OPTAB_DIRECT);
-  xa2 = expand_simple_binop (mode, MINUS, xa2, TWO52, xa2, 0, OPTAB_DIRECT);
-
-  /* dxa = xa2 - 

Add OpenACC 2.6 `acc_get_property' support

2019-10-07 Thread Thomas Schwinge
Hi Jakub!

On 2018-12-03T16:51:14+, "Maciej W. Rozycki"  wrote:
> Add generic support for the OpenACC 2.6 `acc_get_property' and 
> `acc_get_property_string' routines [...]

..., which allow for user code to query the implementation for stuff like:

> OpenACC vendor: GNU
> OpenACC name: GOMP
> OpenACC driver: 1.0
>
> with the host driver and output like:
>
> OpenACC vendor: Nvidia
> OpenACC total memory: 12651462656
> OpenACC free memory: 12202737664
> OpenACC name: TITAN V
> OpenACC driver: 9.1
>
> with the NVPTX driver.

(With 's%OpenACC%Device%', as I understand this; see my comment below.)

Before Frederik starts working on integrating this into GCC trunk, do you
(Jakub) agree with the libgomp plugin interface changes as implemented by
Maciej?  For example, top-level 'GOMP_OFFLOAD_get_property' function in
'struct gomp_device_descr' instead of stuffing this into its
'acc_dispatch_t openacc'.  (I never understood why the OpenACC functions
need to be segregated like they are.)

For reference I'm attaching the latest version of the patch, that is the
commit from openacc-gcc-9-branch, plus a small fix-up.


And, some first remarks (or, "thinking aloud", not a conclusive review)
while I have a look at this myself:

> --- a/include/gomp-constants.h
> +++ b/include/gomp-constants.h
> @@ -215,10 +215,24 @@ enum gomp_map_kind
>  #define GOMP_DEVICE_NVIDIA_PTX   5
>  #define GOMP_DEVICE_INTEL_MIC6
>  #define GOMP_DEVICE_HSA  7
> +#define GOMP_DEVICE_CURRENT  8

This is used for 'acc_device_current', relevant only for
'acc_get_property', to return "the value of the property for the current
device".  This should probably use a more special (negative?) value
instead of eight, so that when additional real device types are added
later on, we can just add them with increasing numbers, and keep the
scanning code simple.
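A minimal sketch of that suggestion (the value -1 and the helper are my assumptions, not part of the patch; only the constant names are real):

```cpp
#include <cassert>

/* Real device types stay small non-negative numbers (values as in
   include/gomp-constants.h).  */
#define GOMP_DEVICE_NVIDIA_PTX 5
#define GOMP_DEVICE_INTEL_MIC  6
#define GOMP_DEVICE_HSA        7
/* Hypothetical: a negative sentinel for the "current device"
   pseudo-type keeps it out of the way, so future real device types
   can simply keep counting upwards.  */
#define GOMP_DEVICE_CURRENT    (-1)

/* Scanning code over real device types can then stay a range check.  */
static bool
real_device_type_p (int type)
{
  return type >= 0;
}
```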

(Use of 'acc_device_current' as an argument to other functions taking an
'acc_device_t' is undefined, and should be rejected with 'gomp_fatal'?)

Also, 'acc_device_current' is a libgomp-internal thing (doesn't interface
with the compiler proper), so strictly speaking 'GOMP_DEVICE_CURRENT'
isn't needed in 'include/gomp-constants.h'.  But probably still a good
idea to list it there, in this canonical place, to keep the several lists
of device types coherent.

(I have not checked how 'acc_device_current' is actually implemented in
the following.)

> +/* Device property codes.  Keep in sync with
> +   libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_device_property_t
> +   as well as libgomp/libgomp-plugin.h.  */

Same thing, libgomp-internal, not sure whether to list these here?

> +/* Start from 1 to catch uninitialized use.  */

Hmm, not sure about that either.  Don't think we're generally doing that?

(But I see PGI have 'acc_property_none = 0', oh well.)

> +#define GOMP_DEVICE_PROPERTY_MEMORY  1
> +#define GOMP_DEVICE_PROPERTY_FREE_MEMORY 2
> +#define GOMP_DEVICE_PROPERTY_NAME0x10001
> +#define GOMP_DEVICE_PROPERTY_VENDOR  0x10002
> +#define GOMP_DEVICE_PROPERTY_DRIVER  0x10003
> +
> +/* Internal property mask to tell numeric and string values apart.  */
> +#define GOMP_DEVICE_PROPERTY_STRING_MASK 0x10000
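As I read the patch, the mask has to be 0x10000 for the property codes above to classify correctly; a quick sketch of the intended split (the helper name is mine):

```cpp
#include <cassert>

/* Property codes as in the patch: string-valued properties live above
   the mask bit, numeric ones below it.  */
#define GOMP_DEVICE_PROPERTY_MEMORY      1
#define GOMP_DEVICE_PROPERTY_FREE_MEMORY 2
#define GOMP_DEVICE_PROPERTY_NAME    0x10001
#define GOMP_DEVICE_PROPERTY_VENDOR  0x10002
#define GOMP_DEVICE_PROPERTY_DRIVER  0x10003
#define GOMP_DEVICE_PROPERTY_STRING_MASK 0x10000

/* Hypothetical helper: tell string-valued properties from numeric
   ones, as the "internal property mask" comment describes.  */
static bool
property_string_p (int prop)
{
  return (prop & GOMP_DEVICE_PROPERTY_STRING_MASK) != 0;
}
```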


> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> +union gomp_device_property_value
> +GOMP_OFFLOAD_get_property (int n, int prop)
> +{
> +  union gomp_device_property_value propval = { .val = 0 };
> +
> +  pthread_mutex_lock (&ptx_dev_lock);
> +
> +  if (!nvptx_init () || n >= nvptx_get_num_devices ())
> +{
> +  pthread_mutex_unlock (&ptx_dev_lock);
> +  return propval;
> +}

Isn't it implicit that 'get_num_devices' has been called while loading
the plugin, so we don't have to do any initialization here?  (But I
may be misremembering that.)

> +  switch (prop)
> +{
> +case GOMP_DEVICE_PROPERTY_MEMORY:
> +  {
> + size_t total_mem;
> + CUdevice dev;
> +
> + CUDA_CALL_ERET (propval, cuDeviceGet, &dev, n);

Isn't that already known as 'ptx_devices[n]'?  (Likewise elsewhere.)

> + CUDA_CALL_ERET (propval, cuDeviceTotalMem, &total_mem, dev);
> + propval.val = total_mem;
> +  }
> +  break;

> +case GOMP_DEVICE_PROPERTY_NAME:
> +  {
> + static char name[256];
> + CUdevice dev;
> +
> + CUDA_CALL_ERET (propval, cuDeviceGet, &dev, n);
> + CUDA_CALL_ERET (propval, cuDeviceGetName, name, sizeof (name), dev);
> + propval.ptr = name;
> +  }
> +  break;

Uh, that's not thread-safe, is it?

From a quick look at OpenACC 2.7, that doesn't specify what exactly to
return here (that is, which "class" of memory; who's the "owner" of the
memory object).  (So, returning a 'malloc'ed object would be a memory
leak, for example.)  Best would probably be to return some 'const char *'
living in read-only program memory.  (But that likely is not available,
otherwise I suppose Maciej would have done that.)

Otherwise, 

Re: [patch] disentangle range_fold_*ary_expr into various pieces

2019-10-07 Thread Marc Glisse

On Mon, 7 Oct 2019, Aldy Hernandez wrote:


On 10/4/19 2:09 PM, Jeff Law wrote:


You're right.  Easier to refer to the before/after directly.  Sometimes
diffs just suck.

OK
jeff


In testing this patch in isolation from the non-zero canonicalization patch, 
I found one regression due to the fact that:


a) As discussed, two non-zero representations currently exist for unsigned 
ranges.


b) ipa-prop.c has its own hacked-up value_range structure (ipa_vr) which 
doesn't use any API.  Since there is no agreed upon non-zero, range-ops can 
sometimes (correctly) create an unsigned [1,MAX], and ipa-prop.c is 
open-coding the check for a pointer non-zero to ~[0,0]. This seems like a 
latent bug.


I really have no idea, nor do I care (*), what we do with ipa-prop's lack of 
API.  For now, I have implemented ipa_vr::nonzero_p(), and used it.  When we 
agree on the non-zero normalization we can adjust this method if necessary.


+bool
+ipa_vr::nonzero_p (tree expr_type) const
+{
+  if (type == VR_ANTI_RANGE && wi::eq_p (min, 0) && wi::eq_p (max, 0))
+return true;
+
+  unsigned prec = TYPE_PRECISION (expr_type);
+  return (type == VR_RANGE
+ && wi::eq_p (min, wi::one (prec))
+ && wi::eq_p (max, wi::max_value (prec, TYPE_SIGN (expr_type))));
+}

...

  else if (POINTER_TYPE_P (TREE_TYPE (ddef))
-  && vr[i].type == VR_ANTI_RANGE
-  && wi::eq_p (vr[i].min, 0)
-  && wi::eq_p (vr[i].max, 0))
+  && vr[i].nonzero_p (TREE_TYPE (ddef)))

Attached is the final adjusted patch I have committed to trunk.


I wonder why we would ever want to ask "is this interval the one that 
misses exactly the value 0" instead of "does this interval contain the 
value 0". I naively believe there shouldn't even be any API for the first 
question. Or if pointers really only have 2 possible intervals (besides 
varying and undefined), aka [0,0] and ~[0,0], using intervals seems like 
overkill for them...
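For unsigned bounds, the "does it contain zero" question suggested here is easy to answer uniformly for both nonzero representations; a toy sketch (the interval type is an illustration, not GCC's value_range):

```cpp
#include <cassert>

/* Toy unsigned interval: [lo,hi], or its complement ~[lo,hi] when
   anti is set.  */
struct interval
{
  bool anti;
  unsigned long lo, hi;
};

/* "May this interval contain zero?"  With unsigned bounds, [lo,hi]
   contains 0 iff lo == 0, and ~[lo,hi] contains 0 iff lo > 0 (when
   lo == 0, the excluded range [0,hi] covers zero).  */
static bool
may_contain_zero_p (const interval &r)
{
  return r.anti ? r.lo > 0 : r.lo == 0;
}
```

Both representations of "nonzero", ~[0,0] and unsigned [1,MAX], give the same answer here, which is why asking about zero-containment sidesteps the normalization question entirely.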


--
Marc Glisse


Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Segher Boessenkool
On Mon, Oct 07, 2019 at 05:11:27PM +, Bernd Edlinger wrote:
> On 10/7/19 9:20 AM, Eric Botcazou wrote:
> > No, please, the cure would be much worse than the disease.
> 
> Ack.
> 
> I think the least worse thing would be a pragma in the macro where the 
> shadowing
> variable is declared...

The best thing would be to write this in a way that nothing *can*
accidentally shadow something.
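One conventional way to get there for a statement-like macro is to give the internal temporary a name unique to each expansion, so it can neither shadow nor be shadowed; a sketch (the macro names are mine, not the passes.c ones):

```cpp
/* Paste a per-expansion suffix onto the temporary's name.  Both uses
   of __LINE__ inside one expansion see the same value, so the name is
   consistent within the macro but distinct across expansions.  */
#define SHADOW_PASTE2(a, b) a##b
#define SHADOW_PASTE(a, b) SHADOW_PASTE2 (a, b)

#define SWAP_INTS(x, y)                                   \
  do {                                                    \
    int SHADOW_PASTE (swap_tmp_, __LINE__) = (x);         \
    (x) = (y);                                            \
    (y) = SHADOW_PASTE (swap_tmp_, __LINE__);             \
  } while (0)
```

A caller's own `swap_tmp_` variable can still collide on the same line, so this reduces rather than eliminates the risk, but it silences -Wshadow=local for the common cases.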


Segher


Re: C++ PATCH for C++20 P0388R4 (conversions to arrays of unknown bounds) and CWG 1307 (c++/91364, c++/69531)

2019-10-07 Thread Marek Polacek
On Sun, Oct 06, 2019 at 03:39:25PM +, Tam S. B. wrote:
> Hi, sorry for chiming in.

Hi, no worries, comments are welcome!

> IIUC P0388R4 does not allow implicit conversion from `T(*)[]` to `T(*)[N]`. 
> Per [conv.qual]/3, 
> 
> > A prvalue of type `T1` can be converted to type `T2` if the cv-combined 
> > type of `T1` and `T2` is `T2`.
> 
> When T1 is `T(*)[]` and T2 is `T(*)[N]`, the "cv-combined type" is `T(*)[]`, 
> which is not the same as T2 = `T(*)[N]`, so conversion should not be 
> permitted.
> 
> That is to say, either `comp_ptr_ttypes_real` should not use 
> `comp_array_types`, or `comp_array_types` should be extended to handle this 
> case correctly.

Indeed, I must've glossed over that.  Fixed by introducing compare_bounds_t,
which says how we should compare arrays when one of them is [].

similar_type_p still must say "yes" for T(*)[] and T(*)[N] though.

> Also, does this patch permit conversion from `T(*)[N]` to `const T(*)[]`? I 
> think [conv.qual] allows this conversion, but IIUC `comp_array_types` will 
> return false in this case, as it uses `same_type_p`.

It does now, array-conv14.C tests it.
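The directions under discussion, in a small example (the C++20 guard is mine; std::is_convertible mirrors the front end's answer for the direction that must stay ill-formed):

```cpp
#include <type_traits>

int arr[3];

#if __cplusplus > 201703L
/* The direction P0388R4 adds, plus the qualification conversion from
   the follow-up question: both OK since C++20.  */
int (*pu)[] = &arr;         /* int(*)[3] -> int(*)[]        */
const int (*pcu)[] = &arr;  /* int(*)[3] -> const int(*)[]  */
#endif

/* The reverse direction must stay ill-formed in every mode:
     int (*pn)[3] = pu;     // error: int(*)[] -> int(*)[3]  */
static bool
reverse_convertible_p ()
{
  return std::is_convertible<int (*)[], int (*)[3]>::value;
}
```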

Thanks for looking and catching these problems!

Bootstrapped/regtested on x86_64-linux, built Boost and cmcstl2.

Ok for trunk?

2019-10-07  Marek Polacek  

PR c++/91364 - Implement P0388R4: Permit conversions to arrays of
unknown bound.
PR c++/69531 - Implement CWG 1307: Differently bounded array
parameters.
* call.c (build_array_conv): Build ck_identity at the beginning
of the conversion.
(standard_conversion): Pass bounds_none to comp_ptr_ttypes_const.
(maybe_warn_array_conv): New.
(convert_like_real): Call it.  Add an error message about converting
from arrays of unknown bounds.
(conv_get_original_expr): New.
(nelts_initialized_by_list_init): New.
(conv_binds_to_array_of_unknown_bound): New.
(compare_ics): Implement list-initialization ranking based on
array sizes, as specified in DR 1307 and P0388R.
* cp-tree.h (comp_ptr_ttypes_const): Adjust declaration.
(compare_bounds_t): New enum.
* typeck.c (comp_array_types): New bool and compare_bounds_t
parameters.  Use them.
(structural_comptypes): Adjust the call to comp_array_types.
(similar_type_p): Handle ARRAY_TYPE.
(build_const_cast_1): Pass bounds_none to comp_ptr_ttypes_const.
(comp_ptr_ttypes_real): Use comp_array_types.
(comp_ptr_ttypes_const): New compare_bounds_t parameter.  Use
comp_array_types.

* g++.dg/cpp0x/initlist-array3.C: Remove dg-error.
* g++.dg/cpp0x/initlist-array7.C: New test.
* g++.dg/cpp0x/initlist-array8.C: New test.
* g++.dg/cpp2a/array-conv1.C: New test.
* g++.dg/cpp2a/array-conv10.C: New test.
* g++.dg/cpp2a/array-conv11.C: New test.
* g++.dg/cpp2a/array-conv12.C: New test.
* g++.dg/cpp2a/array-conv13.C: New test.
* g++.dg/cpp2a/array-conv14.C: New test.
* g++.dg/cpp2a/array-conv2.C: New test.
* g++.dg/cpp2a/array-conv3.C: New test.
* g++.dg/cpp2a/array-conv4.C: New test.
* g++.dg/cpp2a/array-conv5.C: New test.
* g++.dg/cpp2a/array-conv6.C: New test.
* g++.dg/cpp2a/array-conv7.C: New test.
* g++.dg/cpp2a/array-conv8.C: New test.
* g++.dg/cpp2a/array-conv9.C: New test.
* g++.old-deja/g++.bugs/900321_01.C: Adjust dg-error.

diff --git gcc/gcc/cp/call.c gcc/gcc/cp/call.c
index 56dcbd391c1..1a9c296a812 100644
--- gcc/gcc/cp/call.c
+++ gcc/gcc/cp/call.c
@@ -122,7 +122,8 @@ struct conversion {
of using this field directly.  */
 conversion *next;
 /* The expression at the beginning of the conversion chain.  This
-   variant is used only if KIND is ck_identity or ck_ambig.  */
+   variant is used only if KIND is ck_identity or ck_ambig.  You can
+   use conv_get_original_expr to get this expression.  */
 tree expr;
 /* The array of conversions for an initializer_list, so this
variant is used only when KIND is ck_list.  */
@@ -223,6 +224,8 @@ static void add_candidates (tree, tree, const vec *, tree, tree,
tsubst_flags_t);
 static conversion *merge_conversion_sequences (conversion *, conversion *);
 static tree build_temp (tree, tree, int, diagnostic_t *, tsubst_flags_t);
+static conversion *build_identity_conv (tree, tree);
+static inline bool conv_binds_to_array_of_unknown_bound (conversion *);
 
 /* Returns nonzero iff the destructor name specified in NAME matches BASETYPE.
NAME can take many forms...  */
@@ -1078,7 +1081,7 @@ build_array_conv (tree type, tree ctor, int flags, 
tsubst_flags_t complain)
   c->rank = rank;
   c->user_conv_p = user;
   c->bad_p = bad;
-  c->u.next = NULL;
+  c->u.next = build_identity_conv (TREE_TYPE (ctor), ctor);
   return c;
 }
 
@@ -1378,7 +1381,7 @@ 

Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Bernd Edlinger
On 10/7/19 9:20 AM, Eric Botcazou wrote:
>> If this ends up acked then please add this to ansidecl.h or
>> somewhere else global as template:
>>
>> template 
>> struct push {
>>   push (T &);
>>   ~push ();
>>   T *m_loc;
>>   T m_val;
>> };
>>
>> because it would be a general solution for _all_ shadow=local warnings?!
> 
> No, please, the cure would be much worse than the disease.
> 

Ack.

I think the least worse thing would be a pragma in the macro where the shadowing
variable is declared...


Bernd.


Re: [PATCH v4 7/7] S/390: Test signaling FP comparison instructions

2019-10-07 Thread Andreas Krebbel
On 01.10.19 15:27, Ilya Leoshkevich wrote:
> gcc/testsuite/ChangeLog:
> 
> 2019-08-09  Ilya Leoshkevich  
> 
>   PR target/77918
>   * gcc.target/s390/s390.exp: Enable Fortran tests.
>   * gcc.target/s390/zvector/autovec-double-quiet-eq.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-ge.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-gt.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-le.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-lt.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-ordered.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-uneq.c: New test.
>   * gcc.target/s390/zvector/autovec-double-quiet-unordered.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-eq-z13-finite.c: New 
> test.
>   * gcc.target/s390/zvector/autovec-double-signaling-eq-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-eq.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ge-z13-finite.c: New 
> test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ge-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ge.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-gt-z13-finite.c: New 
> test.
>   * gcc.target/s390/zvector/autovec-double-signaling-gt-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-gt.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-le-z13-finite.c: New 
> test.
>   * gcc.target/s390/zvector/autovec-double-signaling-le-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-le.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-lt-z13-finite.c: New 
> test.
>   * gcc.target/s390/zvector/autovec-double-signaling-lt-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-lt.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13-finite.c: 
> New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ltgt-z13.c: New test.
>   * gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: New test.
>   * gcc.target/s390/zvector/autovec-double-smax-z13.F90: New test.
>   * gcc.target/s390/zvector/autovec-double-smax.F90: New test.
>   * gcc.target/s390/zvector/autovec-double-smin-z13.F90: New test.
>   * gcc.target/s390/zvector/autovec-double-smin.F90: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-eq.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-ge.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-gt.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-le.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-lt.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-ordered.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-uneq.c: New test.
>   * gcc.target/s390/zvector/autovec-float-quiet-unordered.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-eq.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-ge.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-gt.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-le.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-lt.c: New test.
>   * gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: New test.
>   * gcc.target/s390/zvector/autovec-fortran.h: New test.
>   * gcc.target/s390/zvector/autovec-long-double-signaling-ge.c: New test.
>   * gcc.target/s390/zvector/autovec-long-double-signaling-gt.c: New test.
>   * gcc.target/s390/zvector/autovec-long-double-signaling-le.c: New test.
>   * gcc.target/s390/zvector/autovec-long-double-signaling-lt.c: New test.
>   * gcc.target/s390/zvector/autovec.h: New test.

Do these tests work on 32 bit? We need -mzarch to make the vector instructions 
available there.

Ok, if clean on 32 and 64 bit.

Thanks!

Andreas


> ---
>  gcc/testsuite/gcc.target/s390/s390.exp|  8 
>  .../s390/zvector/autovec-double-quiet-eq.c|  8 
>  .../s390/zvector/autovec-double-quiet-ge.c|  8 
>  .../s390/zvector/autovec-double-quiet-gt.c|  8 
>  .../s390/zvector/autovec-double-quiet-le.c|  8 
>  .../s390/zvector/autovec-double-quiet-lt.c|  8 
>  .../zvector/autovec-double-quiet-ordered.c| 10 +
>  .../s390/zvector/autovec-double-quiet-uneq.c  | 10 +
>  .../zvector/autovec-double-quiet-unordered.c  | 11 +
>  .../autovec-double-signaling-eq-z13-finite.c  | 10 +
>  .../zvector/autovec-double-signaling-eq-z13.c |  9 
>  .../zvector/autovec-double-signaling-eq.c | 11 +
>  .../autovec-double-signaling-ge-z13-finite.c  | 10 +
>  .../zvector/autovec-double-signaling-ge-z13.c |  9 
>  .../zvector/autovec-double-signaling-ge.c

[C++ PATCH] Partial fix for further -fstrong-eval-order issues (PR c++/91987)

2019-10-07 Thread Jakub Jelinek
Hi!

So, C++17 and later says that in E1[E2] and E1 << E2 and E1 >> E2 the
E1 expression is sequenced before E2.
Similarly to the recent call expression PR, while the gimplifier
gimplifies the first operand before the second one, it uses is_gimple_val
as predicate (and we don't really have a usable predicate that would require
only gimple temporaries one can't modify), so if the first operand doesn't
gimplify into SSA_NAME which is ok, we need to force it into a temporary.
This is the cp-gimplify.c hunk.  Unfortunately, for shifts that is not good
enough, because even before we reach that fold-const.c actually can move
some side-effects from the arguments, which is ok for the first operand, but
not ok for the second one.

For E1[E2], we either lower it into ARRAY_REF if E1 has array type, but in
that case I think the similar issue can't happen, or we lower it otherwise
to *(E1 + E2), but in that case there is absolutely no ordering guarantee
between the two, folders can swap them any time etc.
So, the patch instead for -fstrong-eval-order lowers it into
SAVE_EXPR, *(SAVE_EXPR + E2) if at least one of the operands has
side-effects.  I had to also tweak grok_array_decl, because it swapped the
operands in some cases, and that causes an ICE on c-c++-common/gomp/reduction-1.c;
I'll need to handle the new form of code in the OpenMP handling code.
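The guarantee being implemented here can be observed directly (the logging helper is mine):

```cpp
#include <vector>

/* Records the order in which operands are evaluated.  Under C++17's
   rules (-fstrong-eval-order), E1 in E1 << E2 and in E1[E2] must be
   evaluated, side effects included, before E2; earlier standards
   leave the order unspecified.  */
static std::vector<int> eval_order;

static int
logged (int id, int value)
{
  eval_order.push_back (id);
  return value;
}
```

With -std=c++17, `logged (1, 2) << logged (2, 3)` must leave eval_order as {1, 2}; pre-C++17 either order is allowed, which is exactly what this patch tightens in the gimplifier and fold-const.c.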

Also, the PR contains other testcases that are wrong, but I'm not 100% sure
what to do, I'm afraid we'll need to force aggregate arguments (without
TREE_ADDRESSABLE types) into registers in that case for some of the
CALL_EXPRs with ordered args.  Unlike this patch and the previous one (which
waste at most some compile-time memory for the extra temporaries, or in the
best case SSA_NAMEs and one extra stmt to load, and are unlikely to cause any
runtime slowdowns), forcing aggregates into separate temporaries might be very
expensive in some cases.  So, I wonder if we shouldn't do some smarter
analysis in that case, like if we have two args where the first one needs to
be evaluated before the second one and second one has side-effects (or vice
versa), for the first argument check if it has address taken or is global
(== may_be_aliased), if yes, probably force it always, otherwise walk the
second argument to see if it refers to this var.  Thoughts on that?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
once I resolve the reduction-1.c ICE in the OpenMP code?

2019-10-07  Jakub Jelinek  

PR c++/91987
* fold-const.c (fold_binary_loc): Don't optimize COMPOUND_EXPR in
the second operand of -fstrong-eval-order shift/rotate.
cp/
* decl2.c (grok_array_decl): For -fstrong-eval-order, when array ref
operands have been swapped and at least one operand has side-effects,
revert the swapping before calling build_array_ref.
* typeck.c (cp_build_array_ref): For non-ARRAY_TYPE array ref with
side-effects on the index operand, if -fstrong-eval-order use
save_expr around the array operand.
* cp-gimplify.c (cp_gimplify_expr): For shifts/rotates and
-fstrong-eval-order, force the first operand into a temporary.
testsuite/
* g++.dg/cpp1z/eval-order6.C: New test.
* g++.dg/cpp1z/eval-order7.C: New test.
* g++.dg/cpp1z/eval-order8.C: New test.

--- gcc/fold-const.c.jj 2019-10-04 12:24:38.704764109 +0200
+++ gcc/fold-const.c2019-10-07 13:21:56.855450821 +0200
@@ -9447,16 +9447,23 @@ fold_binary_loc (location_t loc, enum tr
   if (TREE_CODE (arg0) == COMPOUND_EXPR)
{
  tem = fold_build2_loc (loc, code, type,
-fold_convert_loc (loc, TREE_TYPE (op0),
-  TREE_OPERAND (arg0, 1)), op1);
+fold_convert_loc (loc, TREE_TYPE (op0),
+  TREE_OPERAND (arg0, 1)),
+  op1);
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg0, 0),
 tem);
}
-  if (TREE_CODE (arg1) == COMPOUND_EXPR)
+  if (TREE_CODE (arg1) == COMPOUND_EXPR
+ && (flag_strong_eval_order != 2
+ /* C++17 disallows this canonicalization for shifts.  */
+ || (code != LSHIFT_EXPR
+ && code != RSHIFT_EXPR
+ && code != LROTATE_EXPR
+ && code != RROTATE_EXPR)))
{
  tem = fold_build2_loc (loc, code, type, op0,
-fold_convert_loc (loc, TREE_TYPE (op1),
-  TREE_OPERAND (arg1, 1)));
+fold_convert_loc (loc, TREE_TYPE (op1),
+  TREE_OPERAND (arg1, 1)));
  return build2_loc (loc, COMPOUND_EXPR, type, TREE_OPERAND (arg1, 0),
 tem);
}
--- gcc/cp/decl2.c.jj   2019-09-26 21:34:21.711916155 

Re: [PATCH v4 6/7] S/390: Use signaling FP comparison instructions

2019-10-07 Thread Andreas Krebbel
On 01.10.19 15:27, Ilya Leoshkevich wrote:
> dg-torture.exp=inf-compare-1.c is failing, because (qNaN > +Inf)
> comparison is compiled to CDB instruction, which does not signal an
> invalid operation exception. KDB should have been used instead.
> 
> This patch introduces a new CCmode and a new pattern in order to
> generate signaling instructions in this and similar cases.
> 
> gcc/ChangeLog:
> 
> 2019-08-09  Ilya Leoshkevich  
> 
>   PR target/77918
>   * config/s390/2827.md: Add new opcodes.
>   * config/s390/2964.md: Likewise.
>   * config/s390/3906.md: Likewise.
>   * config/s390/8561.md: Likewise.
>   * config/s390/s390-builtins.def (s390_vfchesb): Use
>   the new vec_cmpgev4sf_quiet_nocc.
>   (s390_vfchedb): Use the new vec_cmpgev2df_quiet_nocc.
>   (s390_vfchsb): Use the new vec_cmpgtv4sf_quiet_nocc.
>   (s390_vfchdb): Use the new vec_cmpgtv2df_quiet_nocc.
>   (vec_cmplev4sf): Use the new vec_cmplev4sf_quiet_nocc.
>   (vec_cmplev2df): Use the new vec_cmplev2df_quiet_nocc.
>   (vec_cmpltv4sf): Use the new vec_cmpltv4sf_quiet_nocc.
>   (vec_cmpltv2df): Use the new vec_cmpltv2df_quiet_nocc.
>   * config/s390/s390-modes.def (CCSFPS): New mode.
>   * config/s390/s390.c (s390_match_ccmode_set): Support CCSFPS.
>   (s390_select_ccmode): Return CCSFPS for LT, LE, GT, GE and LTGT.
>   (s390_branch_condition_mask): Reuse CCS for CCSFPS.
>   (s390_expand_vec_compare): Use non-signaling patterns where
>   necessary.
>   (s390_reverse_condition): Support CCSFPS.
>   * config/s390/s390.md (*cmp_ccsfps): New pattern.
>   * config/s390/vector.md: (VFCMP_HW_OP): Remove.
>   (asm_fcmp_op): Likewise.
>   (*smaxv2df3_vx): Use pattern for quiet comparison.
>   (*sminv2df3_vx): Likewise.
>   (*vec_cmp_nocc): Remove.
>   (*vec_cmpeq_quiet_nocc): New pattern.
>   (vec_cmpgt_quiet_nocc): Likewise.
>   (vec_cmplt_quiet_nocc): New expander.
>   (vec_cmpge_quiet_nocc): New pattern.
>   (vec_cmple_quiet_nocc): New expander.
>   (*vec_cmpeq_signaling_nocc): New pattern.
>   (*vec_cmpgt_signaling_nocc): Likewise.
>   (*vec_cmpgt_signaling_finite_nocc): Likewise.
>   (*vec_cmpge_signaling_nocc): Likewise.
>   (*vec_cmpge_signaling_finite_nocc): Likewise.
>   (vec_cmpungt): New expander.
>   (vec_cmpunge): Likewise.
>   (vec_cmpuneq): Use quiet patterns.
>   (vec_cmpltgt): Allow only on z14+.
>   (vec_cmpordered): Use quiet patterns.
>   (vec_cmpunordered): Likewise.
>   (VEC_CMP_EXPAND): Add ungt and unge.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-08-09  Ilya Leoshkevich  
> 
>   * gcc.target/s390/vector/vec-scalar-cmp-1.c: Adjust
>   expectations.

...

> diff --git a/gcc/config/s390/s390-modes.def b/gcc/config/s390/s390-modes.def
> index 7b7c1141449..2d9cd9b5945 100644
> --- a/gcc/config/s390/s390-modes.def
> +++ b/gcc/config/s390/s390-modes.def
> @@ -52,6 +52,8 @@ CCS:  EQ  LT   GT  UNORDERED  
> (LTGFR, LTGR, LTR, ICM/Y,
> ADB/R, AEB/R, SDB/R, 
> SEB/R,
> SRAG, SRA, SRDA)
>  CCSR: EQ  GT   LT  UNORDERED  (CGF/R, CH/Y)
> +CCSFPS: EQ  GT   LT  UNORDERED  (KEB/R, KDB/R, KXBR, KDTR,
> +KXTR, WFK)

GT and LT need to be swapped.

Ok with that change. Thanks!

Andreas



Re: [PATCH v4 3/7] S/390: Do not use signaling vector comparisons on z13

2019-10-07 Thread Andreas Krebbel
On 01.10.19 15:27, Ilya Leoshkevich wrote:
> z13 supports only non-signaling vector comparisons.  This means we
> cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.  Notify
> middle-end about this by using more restrictive operator predicate in
> vcond.
> 
> gcc/ChangeLog:
> 
> 2019-08-21  Ilya Leoshkevich  
> 
>   PR target/77918
>   * config/s390/vector.md (vcond_comparison_operator): New
>   predicate.
>   (vcond): Use vcond_comparison_operator.

Ok. Thanks!

Andreas



[committed] Fix up gcc.target/i386/pr71801.c testcase

2019-10-07 Thread Jakub Jelinek
Hi!

This testcase started FAILing on i?86-linux with r276603 (the -O2 inlining
changes):
/home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/pr71801.c:12:3: warning: 
writing 24 bytes into a region of size 1 [-Wstringop-overflow=]
/home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/pr71801.c:14:5: warning: 
writing 24 bytes into a region of size 1 [-Wstringop-overflow=]
Previously the function call wasn't inlined and thus it didn't warn
that it is in fact invalid.
The following patch makes it valid, and it still ICEs with r238210 the same
way as the incorrect testcase, and succeeds with r238211, so I've committed
this as obvious to trunk.

2019-10-07  Jakub Jelinek  

* gcc.target/i386/pr71801.c (uuidcache_init): Fix up size of d array.

--- gcc/testsuite/gcc.target/i386/pr71801.c.jj  2016-07-11 22:18:04.470879189 
+0200
+++ gcc/testsuite/gcc.target/i386/pr71801.c 2019-10-07 17:39:44.310698244 
+0200
@@ -16,7 +16,7 @@ static int get_label_uuid(char *p1) {
 }
void uuidcache_addentry(char *p1) { __builtin_memcpy(&c, p1, sizeof(c)); }
 void uuidcache_init() {
-  char d[1];
+  char d[sizeof(a) + sizeof(c)];
   get_label_uuid(d);
   uuidcache_addentry(d);
 }

Jakub


Re: [PATCH v4 2/7] Introduce can_vcond_compare_p function

2019-10-07 Thread Ilya Leoshkevich
> Am 03.10.2019 um 14:42 schrieb Richard Sandiford :
> 
> Ilya Leoshkevich  writes:
>> z13 supports only non-signaling vector comparisons.  This means we
>> cannot vectorize LT, LE, GT, GE and LTGT when compiling for z13.
>> However, we cannot express this restriction today: the code only checks
>> whether vcond$a$b optab exists, which does not contain information about
>> the operation.
>> 
>> Introduce a function that checks whether back-end supports vector
>> comparisons with individual rtx codes by matching vcond expander's third
>> argument with a fake comparison with the corresponding rtx code.
>> 
>> gcc/ChangeLog:
>> 
>> 2019-08-27  Ilya Leoshkevich  
>> 
>>  PR target/77918
>>  * optabs-tree.c (vcond_icode_p): New function.
>>  (vcond_eq_icode_p): New function.
>>  (expand_vec_cond_expr_p): Use vcond_icode_p and
>>  vcond_eq_icode_p.
>>  * optabs.c (can_vcond_compare_p): New function.
>>  (get_rtx_code): Use get_rtx_code_safe.
>>  (get_rtx_code_safe): New function.
>>  * optabs.h (can_vcond_compare_p): New function.
>>  (get_rtx_code_safe): Likewise.
>> ---
>> gcc/optabs-tree.c | 37 +++--
>> gcc/optabs.c  | 38 ++
>> gcc/optabs.h  |  7 +++
>> 3 files changed, 72 insertions(+), 10 deletions(-)
>> 
>> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
>> index 8157798cc71..7f505c9cdee 100644
>> --- a/gcc/optabs-tree.c
>> +++ b/gcc/optabs-tree.c
>> @@ -23,7 +23,10 @@ along with GCC; see the file COPYING3.  If not see
>> #include "coretypes.h"
>> #include "target.h"
>> #include "insn-codes.h"
>> +#include "rtl.h"
>> #include "tree.h"
>> +#include "memmodel.h"
>> +#include "optabs.h"
>> #include "optabs-tree.h"
>> #include "stor-layout.h"
>> 
>> @@ -329,6 +332,28 @@ expand_vec_cmp_expr_p (tree value_type, tree mask_type, 
>> enum tree_code code)
>>   return false;
>> }
>> 
>> +/* Return true iff vcond_optab/vcondu_optab support the given tree
>> +   comparison.  */
> 
> Realise it's a bit repetitive in the context of this patch, but...
> 
> /* Return true iff vcond_optab/vcondu_optab can handle a vector
>   comparison for code CODE, comparing operands of type CMP_OP_TYPE and
>   producing a result of type VALUE_TYPE.  */
> 
>> +
>> +static bool
>> +vcond_icode_p (tree value_type, tree cmp_op_type, enum tree_code code)
>> +{
>> +  return can_vcond_compare_p (get_rtx_code (code, TYPE_UNSIGNED 
>> (cmp_op_type)),
>> +  TYPE_MODE (value_type), TYPE_MODE (cmp_op_type));
>> +}
>> +
>> +/* Return true iff vcondeq_optab supports the given tree comparison.  */
> 
> Same idea here.
> 
>> +
>> +static bool
>> +vcond_eq_icode_p (tree value_type, tree cmp_op_type, enum tree_code code)
>> +{
>> +  if (code != EQ_EXPR && code != NE_EXPR)
>> +return false;
>> +
>> +  return get_vcond_eq_icode (TYPE_MODE (value_type), TYPE_MODE 
>> (cmp_op_type))
>> + != CODE_FOR_nothing;
>> +}
>> +
>> /* Return TRUE iff, appropriate vector insns are available
>>for vector cond expr with vector type VALUE_TYPE and a comparison
>>with operand vector types in CMP_OP_TYPE.  */
>> @@ -347,14 +372,14 @@ expand_vec_cond_expr_p (tree value_type, tree 
>> cmp_op_type, enum tree_code code)
>>   || maybe_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS 
>> (cmp_op_mode)))
>> return false;
>> 
>> -  if (get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
>> -   TYPE_UNSIGNED (cmp_op_type)) == CODE_FOR_nothing
>> -  && ((code != EQ_EXPR && code != NE_EXPR)
>> -  || get_vcond_eq_icode (TYPE_MODE (value_type),
>> - TYPE_MODE (cmp_op_type)) == CODE_FOR_nothing))
>> +  if (get_rtx_code_safe (code, TYPE_UNSIGNED (cmp_op_type))
>> +  == LAST_AND_UNUSED_RTX_CODE)
>> +/* This may happen, for example, if code == SSA_NAME, in which case we
>> +   cannot be certain whether a vector insn is available.  */
>> return false;
> 
> I think this should just be:
> 
>  if (TREE_CODE_CLASS (tcode) != tcc_comparison)
>return false;
> 
> OK with those changes and without the introduction of get_rtx_code_safe.

Thanks! I've made the changes, retested, and committed the result as
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=276660
alongside
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=276659
which Richard B. has approved as well.

Now only the main S/390 bits remain..

Best regards,
Ilya


Re: C++ PATCH for C++20 P0388R4 (conversions to arrays of unknown bounds) and CWG 1307 (c++/91364, c++/69531)

2019-10-07 Thread Jason Merrill

On 10/6/19 11:39 AM, Tam S. B. wrote:

Hi, sorry for chiming in.

IIUC P0388R4 does not allow implicit conversion from `T(*)[]` to `T(*)[N]`. Per 
[conv.qual]/3,


A prvalue of type `T1` can be converted to type `T2` if the cv-combined type of 
`T1` and `T2` is `T2`.


When T1 is `T(*)[]` and T2 is `T(*)[N]`, the "cv-combined type" is `T(*)[]`, 
which is not the same as T2 = `T(*)[N]`, so conversion should not be permitted.

That is to say, either `comp_ptr_ttypes_real` should not use 
`comp_array_types`, or `comp_array_types` should be extended to handle this 
case correctly.

Also, does this patch permit conversion from `T(*)[N]` to `const T(*)[]`? I 
think [conv.qual] allows this conversion, but IIUC `comp_array_types` will 
return false in this case, as it uses `same_type_p`.


Agreed.

Jason



Re: [C++ Patch] Fix cp_parser_delete_expression and cp_parser_throw_expression locations and more

2019-10-07 Thread Jason Merrill

OK.



Re: [PATCH, OpenACC, 3/3] Non-contiguous array support for OpenACC data clauses (re-submission), libgomp patches

2019-10-07 Thread Thomas Schwinge
Hi Chung-Lin!

On 2019-08-20T19:36:56+0800, Chung-Lin Tang  wrote:
> --- libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c   
> (nonexistent)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c   
> (working copy)
> @@ -0,0 +1,103 @@
> +/* { dg-do run { target { ! openacc_host_selected } } } */

Curious about that restriction, I removed it, and see that these test
cases then fail (SIGSEGV) for host-fallback execution.  Same in presence
of 'if (false)' clauses, which do get used in real-world OpenACC code
(with proper conditionals, of course).

Program received signal SIGSEGV, Segmentation fault.
0x00400fd0 in test1._omp_fn.0 () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c:26
26        a[i][j] = b[i][j];
(gdb) bt
#0  0x00400fd0 in test1._omp_fn.0 () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c:26
#1  0x77bbfdf9 in GOACC_parallel_keyed (flags_m=, 
fn=0x400ef1 , mapnum=2, hostaddrs=0x7fffc8c0, 
sizes=0x606290 <.omp_data_sizes.4>, kinds=0x6062a0 <.omp_data_kinds.5>) at 
[...]/source-gcc/libgomp/oacc-parallel.c:221
#2  0x00400a1c in test1 () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c:22
#3  0x00400ee0 in main () at 
source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c:97

What does it take to make that work?


Grüße
 Thomas




Re: [PATCH, OpenACC, 1/3] Non-contiguous array support for OpenACC data clauses (re-submission), front-end patches

2019-10-07 Thread Thomas Schwinge
Hi Chung-Lin!

Thanks for your work on this.


Please reference PR76739 in your submission/ChangeLog updates.


We'll need Jakub to review the generic code changes, but let me provide
some first review remarks, too.


On 2019-08-20T19:36:24+0800, Chung-Lin Tang  wrote:
> The first patch here are the C/C++ front-end patches.

As far as I'm concerned, it doesn't make sense to artificially split up
patches like that, given that the individual three pieces can only be
considered all together.

And if posting split-up patches for other reasons, then please make sure that
the individual patch submission emails have a common "cover letter" email
so that they show up as one email thread.


> --- gcc/c/c-typeck.c  (revision 274618)
> +++ gcc/c/c-typeck.c  (working copy)

> @@ -13099,6 +13100,21 @@ handle_omp_array_sections_1 (tree c, tree t, vec
>   }
>   }
>   }
> +
> +   /* For OpenACC, if the low_bound/length suggest this is a subarray,
> +  and is referenced through by a pointer, then mark this as
> +  non-contiguous.  */

I don't directly understand this logic.  I'll have to think about it
more.

> +   if (ort == C_ORT_ACC
> +   && types.length () > 0
> +   && (TREE_CODE (low_bound) != INTEGER_CST
> +   || integer_nonzerop (low_bound)
> +   || (length && (TREE_CODE (length) != INTEGER_CST
> +  || !tree_int_cst_equal (size, length)
> + {
> +   tree x = types.last ();
> +   if (TREE_CODE (x) == POINTER_TYPE)
> + non_contiguous = true;
> + }
>   }
>else if (length == NULL_TREE)
>   {
> @@ -13142,7 +13158,8 @@ handle_omp_array_sections_1 (tree c, tree t, vec
>/* If there is a pointer type anywhere but in the very first
>array-section-subscript, the array section can't be contiguous.  */
>if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEPEND
> -   && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
> +   && TREE_CODE (TREE_CHAIN (t)) == TREE_LIST
> +   && ort != C_ORT_ACC)
>   {
> error_at (OMP_CLAUSE_LOCATION (c),
>   "array section is not contiguous in %qs clause",
> @@ -13149,6 +13166,8 @@ handle_omp_array_sections_1 (tree c, tree t, vec
>   omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> return error_mark_node;
>   }
> +  else if (TREE_CODE (TREE_CHAIN (t)) == TREE_LIST)
> + non_contiguous = true;
>  }
>else
>  {


> @@ -13337,6 +13365,14 @@ handle_omp_array_sections (tree c, enum c_omp_regi
>   size = size_binop (MULT_EXPR, size, l);
>   }
>   }
> +  if (non_contiguous)
> + {
> +   int kind = OMP_CLAUSE_MAP_KIND (c);
> +   OMP_CLAUSE_SET_MAP_KIND (c, kind | GOMP_MAP_NONCONTIG_ARRAY);
> +   OMP_CLAUSE_DECL (c) = t;
> +   OMP_CLAUSE_SIZE (c) = ncarray_dims;
> +   return false;
> + }

I'm expecting to see front end test cases (probably
'-fdump-tree-original' scanning?) for a good number of different data
clauses/array variants, whether that flag 'GOMP_MAP_NONCONTIG_ARRAY' has
been set or not.  (That would then also document the logic presented
above, and should thus help me understand that.)
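Such a '-fdump-tree-original' scanning test could look roughly like the following sketch. This is only an illustration: the exact clause, dump string, and required options would need checking against the actual patch.

```c
/* { dg-do compile } */
/* { dg-additional-options "-fdump-tree-original" } */

void
f (int **p)
{
#pragma acc parallel copyin(p[0:2][0:4])
  ;
}

/* { dg-final { scan-tree-dump "GOMP_MAP_NONCONTIG_ARRAY" "original" } } */
```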


> --- gcc/cp/semantics.c(revision 274618)
> +++ gcc/cp/semantics.c(working copy)

Likewise.


Grüße
 Thomas




Re: [29/32] Remove global call sets: sched-deps.c

2019-10-07 Thread Christophe Lyon
On Fri, 4 Oct 2019 at 16:35, Richard Sandiford 
wrote:

> Christophe Lyon  writes:
> > On Mon, 30 Sep 2019 at 00:20, Jeff Law  wrote:
> >
> > On 9/11/19 1:17 PM, Richard Sandiford wrote:
> > > This is a straight replacement of an existing "full or partial"
> > > call-clobber check.
> > >
> > >
> > > 2019-09-11  Richard Sandiford  
> > >
> > > gcc/
> > >   * sched-deps.c (deps_analyze_insn): Use the ABI of the target
> > >   function to test whether a register is fully or partly
> clobbered.
> > OK
> > jeff
> >
> >
> > Hi Richard,
> >
> > My testing shows regressions on arm after you applied this patch
> (r276335):
> > For instance on arm-none-linux-gnueabi
> > --with-mode arm
> > --with-cpu cortex-a9
> > FAIL:  gcc.dg/strlenopt-18g.c execution test
> >
> > If you force -march=armv5t via RUNTESTFLAGS, there's an additional
> failure:
> > FAIL: gcc.dg/strlenopt-19.c execution test
> >
> > In fortran, I see different sets of regressions depending on arm vs
> thumb mode.
> > target arm-none-linux-gnueabi
> > --with-mode arm
> > --with-cpu cortex-a9
> > I get these new FAILs:
> > gfortran.dg/char4_iunit_1.f03   -O0  execution test
> > gfortran.dg/char4_iunit_1.f03   -O1  execution test
> > gfortran.dg/char4_iunit_1.f03   -O2  execution test
> > gfortran.dg/char4_iunit_1.f03   -O3 -fomit-frame-pointer
> -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/char4_iunit_1.f03   -O3 -g  execution test
> > gfortran.dg/char4_iunit_1.f03   -Os  execution test
> > gfortran.dg/namelist_16.f90   -O0  execution test
> > gfortran.dg/namelist_16.f90   -O1  execution test
> > gfortran.dg/namelist_16.f90   -O2  execution test
> > gfortran.dg/namelist_16.f90   -O3 -fomit-frame-pointer -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/namelist_16.f90   -O3 -g  execution test
> > gfortran.dg/namelist_16.f90   -Os  execution test
> > gfortran.dg/namelist_95.f90   -O0  execution test
> > gfortran.dg/namelist_95.f90   -O1  execution test
> > gfortran.dg/namelist_95.f90   -O2  execution test
> > gfortran.dg/namelist_95.f90   -O3 -fomit-frame-pointer -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/namelist_95.f90   -O3 -g  execution test
> > gfortran.dg/namelist_95.f90   -Os  execution test
> > gfortran.dg/real_const_3.f90   -O0  execution test
> > gfortran.dg/real_const_3.f90   -O1  execution test
> > gfortran.dg/real_const_3.f90   -O2  execution test
> > gfortran.dg/real_const_3.f90   -O3 -fomit-frame-pointer
> -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/real_const_3.f90   -O3 -g  execution test
> > gfortran.dg/real_const_3.f90   -Os  execution test
> >
> >
> > When defaulting to thumb:
> > target arm-none-linux-gnueabi
> > --with-mode thumb
> > --with-cpu cortex-a9
> > I get these new FAILs:
> > gfortran.dg/f2003_io_5.f03   -O0  execution test
> > gfortran.dg/f2003_io_5.f03   -O1  execution test
> > gfortran.dg/f2003_io_5.f03   -O2  execution test
> > gfortran.dg/f2003_io_5.f03   -O3 -fomit-frame-pointer -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/f2003_io_5.f03   -O3 -g  execution test
> > gfortran.dg/f2003_io_5.f03   -Os  execution test
> > gfortran.dg/real_const_3.f90   -O0  execution test
> > gfortran.dg/real_const_3.f90   -O1  execution test
> > gfortran.dg/real_const_3.f90   -O2  execution test
> > gfortran.dg/real_const_3.f90   -O3 -fomit-frame-pointer
> -funroll-loops
> > -fpeel-loops -ftracer -finline-functions  execution test
> > gfortran.dg/real_const_3.f90   -O3 -g  execution test
> > gfortran.dg/real_const_3.f90   -Os  execution test
> >
> > This is the most recent validation result I have so far, so maybe you
> already
> > fixed the problem?
>
> This sounds very like
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg00170.html
> Let me know if you see any remaining failures after that though.
>
>
It looks OK now.
Note that r276489 fixed the strlenopt regressions, while the fortran
regressions seem to have been fixed between r276457 and r276488 (that is,
just before the above patch).

Thanks,

Christophe

Thanks,
> Richard
>


Re: [PATCH 2/5, OpenACC] Support Fortran optional arguments in the firstprivate clause

2019-10-07 Thread Kwok Cheung Yeung

On 07/10/2019 10:25 am, Thomas Schwinge wrote:

Hi Kwok, Tobias!

On 2019-07-29T21:55:52+0100, Kwok Cheung Yeung  wrote:

  >  if (omp_is_reference (new_var)
  >  && TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE)

As is, this code in lower_omp_target will always reject optional arguments, so
it must be changed. This was introduced in commit r261025 (SVN trunk) as a fix
for PR 85879, but it also breaks support for arrays in firstprivate, which was
probably an unintended side-effect.


Do we have sufficient testsuite coverage for "arrays in firstprivate", in
all languages?



I don't know about other languages, but for Fortran, definitely not. The 
only testcase that exercises this AFAIK is optional-firstprivate.f90 in 
part 5 of this patch series.


Kwok


Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-07 Thread Bernd Edlinger
On 10/7/19 8:56 AM, Richard Biener wrote:
> On Sun, Oct 6, 2019 at 1:24 PM Bernd Edlinger  
> wrote:
>>
>> On 10/5/19 8:24 PM, Segher Boessenkool wrote:
>>>
>>> I am maintainer of combine, I know all about its many problems (it has much
>>> deeper problems than this unfortunately).  Thanks for your help though, this
>>> is much appreciated, but I do think your current patch is not a step in the
>>> right direction.
>>>
>>
>> Hmm, thanks for your open words these are of course important.  I will not
>> commit this under obvious rules since you objected to the patch in general.
>>
>> What I want to achieve is to make sure that new code is not introducing more
>> variable shadowing.  New shadowing variables are introduced by new code, 
>> unless
>> we have a warning enabled.  And the warning needs to be enabled together with
>> -Werror otherwise it will be overlooked.
>>
>> For instance I believe MISRA has even stronger coding rules with respect
>> to shadowing.
>>
>> What I tried to do was adding -Wshadow=local to the -Werror warnings set
>> and do a more or less mechanical change over the whole code base.
>> How that mechanical change is done - if at all -, needs to be agreed upon.
>>
>> Currently I have the impression that we do not agree if this warning is to be
>> enabled at all.
> 
> I think if the current code-base was clean then enabling it would be a
> no-brainer.
> But I agree that mechanically "fixing" the current code-base, while ending up
> with no new introductions of local shadowing, is worse if it makes the current
> code-base worse.  Consider your
> 
>   for  (int i = )
> {
> ...
>{
>   int i = ...;
>   ... (*)
>}
> }
> 
> when editing (*) using 'i' will usually be picking up the correct one.  If you
> change the inner to 'i1' then fat-fingering 'i' is easy and bound to happen
> and will be silently accepted.  It's also not any more obvious _which_
> of the i's is intended since 'i1' is not any more descriptive than 'i'.
> 
> If only it will confuse developers familiar with the code then such change is
> making things worse.
> 
> But yes, this means that the quest to enable -Werror=shadow=local becomes
> much harder.  Would it be possible to enable it selectively for "clean"
> files in the Makefile rules?  White-listing with -Wno-error=... doesn't work
> because older host compilers treat unknown options as error there.  More
> configury around this might help (like simply not enabling it during stage1
> and using a configure macro in the white-listing makefile rules).
> 

I think the easiest way would be to add something like this to all files that
are too difficult to fix right now:

Index: combine.c
===
--- combine.c   (revision 276634)
+++ combine.c   (working copy)
@@ -107,6 +107,10 @@
 #include "print-rtl.h"
 #include "function-abi.h"
 
+#if __GNUC__ >= 7
+#pragma GCC diagnostic ignored "-Wshadow=local"
+#endif
+
 /* Number of attempts to combine instructions in this function.  */
 
 static int combine_attempts;

I have modified
192 of 453 *.c files in gcc/*.c

and
59 out of 241 files in the frontends, c*/*.c  fortran/*.c lto/*.c etc.

Many of them have only 1-2 changes, so it might be possible to fix at
least some of them; OTOH more than 50% don't have any issues.

> This probably means fixing the header file issues first though.
> 

Yes.


Bernd.

> Richard.
> 
>>
>> Bernd.
>>


[C++ Patch] Fix cp_parser_delete_expression and cp_parser_throw_expression locations and more

2019-10-07 Thread Paolo Carlini

Hi,

the below fixes those two functions consistently with 
cp_parser_new_expression. Additionally, a few rather straightforward 
tweaks along the usual lines, more DECL_SOURCE_LOCATION and id_loc. 
Tested x86_64-linux.


Thanks, Paolo.

/

/cp
2019-10-07  Paolo Carlini  

* call.c (resolve_args): Use cp_expr_loc_or_input_loc in one place.
* decl.c (grokdeclarator): Use id_loc in one place.
* decl2.c (build_anon_union_vars): Use DECL_SOURCE_LOCATION.
* parser.c (cp_parser_delete_expression): Fix the location of the
returned expression.
(cp_parser_throw_expression): Likewise.
* pt.c (determine_specialization): Use DECL_SOURCE_LOCATION.

/testsuite
2019-10-07  Paolo Carlini  

* g++.dg/diagnostic/not-a-function-template-1.C: New.
* g++.dg/template/crash107.C: Adjust expected location.
* g++.dg/template/dependent-expr1.C: Check locations.
* g++.dg/template/error17.C: Check location.
Index: cp/call.c
===
--- cp/call.c   (revision 276646)
+++ cp/call.c   (working copy)
@@ -4381,7 +4381,8 @@ resolve_args (vec *args, tsubst_flags
   else if (VOID_TYPE_P (TREE_TYPE (arg)))
{
  if (complain & tf_error)
-   error ("invalid use of void expression");
+   error_at (cp_expr_loc_or_input_loc (arg),
+ "invalid use of void expression");
  return NULL;
}
   else if (invalid_nonstatic_memfn_p (EXPR_LOCATION (arg), arg, complain))
Index: cp/decl.c
===
--- cp/decl.c   (revision 276646)
+++ cp/decl.c   (working copy)
@@ -12754,8 +12754,8 @@ grokdeclarator (const cp_declarator *declarator,
tree tmpl = TREE_OPERAND (unqualified_id, 0);
if (variable_template_p (tmpl))
  {
-   error ("specialization of variable template %qD "
-  "declared as function", tmpl);
+   error_at (id_loc, "specialization of variable template "
+ "%qD declared as function", tmpl);
inform (DECL_SOURCE_LOCATION (tmpl),
"variable template declared here");
return error_mark_node;
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 276646)
+++ cp/decl2.c  (working copy)
@@ -1608,7 +1608,8 @@ build_anon_union_vars (tree type, tree object)
  just give an error.  */
   if (TREE_CODE (type) != UNION_TYPE)
 {
-  error ("anonymous struct not inside named type");
+  error_at (DECL_SOURCE_LOCATION (TYPE_MAIN_DECL (type)),
+   "anonymous struct not inside named type");
   return error_mark_node;
 }
 
Index: cp/parser.c
===
--- cp/parser.c (revision 276646)
+++ cp/parser.c (working copy)
@@ -9014,6 +9014,7 @@ cp_parser_delete_expression (cp_parser* parser)
   bool global_scope_p;
   bool array_p;
   tree expression;
+  location_t start_loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Look for the optional `::' operator.  */
   global_scope_p
@@ -9043,8 +9044,18 @@ cp_parser_delete_expression (cp_parser* parser)
   if (cp_parser_non_integral_constant_expression (parser, NIC_DEL))
 return error_mark_node;
 
-  return delete_sanity (expression, NULL_TREE, array_p, global_scope_p,
-   tf_warning_or_error);
+  /* Construct a location e.g.:
+   delete [ ] ptr
+   ^~~~~~~~~~~~~
+ with caret == start at the start of the "delete" token, and
+ the end at the end of the final token we consumed.  */
+  location_t combined_loc = make_location (start_loc, start_loc,
+  parser->lexer);
+  expression = delete_sanity (expression, NULL_TREE, array_p,
+ global_scope_p, tf_warning_or_error);
+  protected_set_expr_location (expression, combined_loc);
+
+  return expression;
 }
 
 /* Returns 1 if TOKEN may start a cast-expression and isn't '++', '--',
@@ -25827,6 +25838,7 @@ cp_parser_throw_expression (cp_parser* parser)
 {
   tree expression;
   cp_token* token;
+  location_t start_loc = cp_lexer_peek_token (parser->lexer)->location;
 
   cp_parser_require_keyword (parser, RID_THROW, RT_THROW);
   token = cp_lexer_peek_token (parser->lexer);
@@ -25842,7 +25854,17 @@ cp_parser_throw_expression (cp_parser* parser)
   else
 expression = cp_parser_assignment_expression (parser);
 
-  return build_throw (expression);
+  /* Construct a location e.g.:
+   throw x
+   ^~~~~~~
+ with caret == start at the start of the "throw" token, and
+ the end at the end of the final token we consumed.  */
+  location_t combined_loc = make_location (start_loc, start_loc,
+  

Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]

2019-10-07 Thread Aldy Hernandez

On 10/4/19 1:17 PM, Jeff Law wrote:

On 10/4/19 10:14 AM, Aldy Hernandez wrote:



On 10/4/19 12:02 PM, Jeff Law wrote:

On 10/4/19 9:49 AM, Aldy Hernandez wrote:



On 10/4/19 11:38 AM, Jeff Law wrote:

On 10/4/19 6:59 AM, Aldy Hernandez wrote:

When I did the value_range canonicalization work, I noticed that an
unsigned [1,MAX] and an ~[0,0] could be two different representations
for the same thing.  I didn't address the problem then because callers
to ranges_from_anti_range() would go into an infinite loop trying to
extract ~[0,0] into [1,MAX] and [].  We had a lot of callers to
ranges_from_anti_range, and it smelled like a rat's nest, so I bailed.

Now that we have one main caller (from the symbolic PLUS/MINUS
handling), it's a lot easier to contain.  Well, singleton_p also calls
it, but it's already handling nonzero specially, so it wouldn't be affected.




With some upcoming cleanups I'm about to post, the fact that
[1,MAX] and
~[0,0] are equal_p(), but not nonzero_p(), matters.  Plus, it's just
good form to have one representation, giving us the ability to pick at
nonzero_p ranges with ease.

The code in extract_range_from_plus_minus_expr() continues to be a
mess
(as it has always been), but at least it's contained, and with this
patch, it's slightly smaller.

Note, I'm avoiding adding a comment header for functions with highly
descriptive obvious names.

OK?

Aldy

canonicalize-nonzero-ranges.patch

commit 1c333730deeb4ddadc46ad6d12d5344f92c0352c
Author: Aldy Hernandez 
Date:   Fri Oct 4 08:51:25 2019 +0200

    Canonicalize UNSIGNED [1,MAX] into ~[0,0].

    Adapt PLUS/MINUS symbolic handling, so it doesn't call
    ranges_from_anti_range with a VR_ANTI_RANGE containing one
    sub-range.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6e4f145af46..3934b41fdf9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,18 @@
+2019-10-04  Aldy Hernandez  
+
+    * tree-vrp.c (value_range_base::singleton_p): Use num_pairs
+    instead of calling vrp_val_is_*.
+    (value_range_base::set): Canonicalize unsigned [1,MAX] into
+    non-zero.
+    (range_has_numeric_bounds_p): New.
+    (range_int_cst_p): Use range_has_numeric_bounds_p.
+    (ranges_from_anti_range): Assert that we won't recurse
+    indefinitely.
+    (extract_extremes_from_range): New.
+    (extract_range_from_plus_minus_expr): Adapt so we don't call
+    ranges_from_anti_range with an anti-range containing only one
+    sub-range.

So no problem with the implementation, but I do have a higher level
question.

One of the goals of the representation side of the Ranger project is to
drop anti-ranges.  Canonicalizing [1, MAX] to ~[0,0] seems to be going
in the opposite direction.   So do we really want to canonicalize to
~[0,0]?


Hmmm, Andrew had the same question.

It really doesn't matter what we canonicalize to, as long as we're
consistent, but there are a bunch of non-zero tests throughout that were
checking for the ~[0,0] construct, and I didn't want to rock the boat
too much.  Although in all honesty, most of those should already be
converted to the ::nonzero_p() API.

However, if we canonicalize into [1,MAX] for unsigned, we have the
problem that a signed non-zero will still be ~[0,0], so our ::nonzero_p
code will have to check two different representations, not to mention it
will now have to check TYPE_UNSIGNED(type).

ISTM that the right thing to do is to first move to the ::nonzero_p API,
which should be a behavior preserving change.  It'd probably be a medium
sized change, but highly mechanical and behavior preserving, so easy to
review.


Ughh, please no?  This was just a means to get the general range_fold*
cleanups which are important and make everything easier to read.  I'd
rather not get distracted by having to audit all the nonzero checking
throughout.

Doesn't the audit just have to look for an open-coded check for ~[0,0]
and convert any to use the API?  I don't think we have *that* many
anti-range bits I wouldn't think this would be terrible.

What am I missing here that makes this so painful?


I think I'm just suffering from PTSD from anything VRP related.  If you 
fix something correctly, 20 things are bound to break because they were 
expecting incorrect behavior.







Besides, we can't get away from anti-ranges inasmuch as we have
value_range_base, which hopefully we can move away from in a future
release.  So this will eventually become a non-issue.  At that point,
we'll lose ALL anti-ranges once and for all.

Even if we can't get away from them, introducing more, particularly
canonicalizing on a form using anti-ranges just seems like we're going
backwards.

If we funnel everything through the established API, then changing the
canonical form becomes trivial because it's isolated to just two places.
  The test ::nonzero_p method and the canonicalization point.



~[0,0] has been the accepted way for a long time, I'd really prefer to
keep that (for now).

It has.  Very true.  But I don't 

Re: [patch] disentangle range_fold_*ary_expr into various pieces

2019-10-07 Thread Aldy Hernandez




+bool
+ipa_vr::nonzero_p (tree expr_type) const
+{
+  if (type == VR_ANTI_RANGE && wi::eq_p (min, 0) && wi::eq_p (max, 0))
+    return true;
+
+  unsigned prec = TYPE_PRECISION (expr_type);
+  return (type == VR_RANGE
+ && wi::eq_p (min, wi::one (prec))
+ && wi::eq_p (max, wi::max_value (prec, TYPE_SIGN (expr_type))));
+}


Errr, wrong version posted.  There was a TYPE_UNSIGNED missing.

Fixed and committed.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 20a0bddcbab..5020f4a44d5 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -5117,6 +5117,7 @@ ipa_vr::nonzero_p (tree expr_type) const

   unsigned prec = TYPE_PRECISION (expr_type);
   return (type == VR_RANGE
+ && TYPE_UNSIGNED (expr_type)
  && wi::eq_p (min, wi::one (prec))
  && wi::eq_p (max, wi::max_value (prec, TYPE_SIGN (expr_type))));
 }


Re: [patch] disentangle range_fold_*ary_expr into various pieces

2019-10-07 Thread Aldy Hernandez

On 10/4/19 2:09 PM, Jeff Law wrote:


You're right.  Easier to refer to the before/after directly.  Sometimes
diffs just suck.

OK
jeff


In testing this patch in isolation from the non-zero canonicalization 
patch, I found one regression due to the fact that:


a) As discussed, two non-zero representations currently exist for 
unsigned ranges.


b) ipa-prop.c has its own hacked up value_range structure (ipa_vr) 
which doesn't use any API.  Since there is no agreed upon non-zero, 
range-ops can sometimes (correctly) create an unsigned [1,MAX], and 
ipa-prop.c is open-coding the check for a pointer non-zero to ~[0,0]. 
This seems like a latent bug.


I really have no idea, nor do I care (*), what we do with ipa-prop's 
lack of API.  For now, I have implemented ipa_vr::nonzero_p(), and used 
it.  When we agree on the non-zero normalization we can adjust this 
method if necessary.


+bool
+ipa_vr::nonzero_p (tree expr_type) const
+{
+  if (type == VR_ANTI_RANGE && wi::eq_p (min, 0) && wi::eq_p (max, 0))
+return true;
+
+  unsigned prec = TYPE_PRECISION (expr_type);
+  return (type == VR_RANGE
+ && wi::eq_p (min, wi::one (prec))
+ && wi::eq_p (max, wi::max_value (prec, TYPE_SIGN (expr_type))));
+}

...

   else if (POINTER_TYPE_P (TREE_TYPE (ddef))
-  && vr[i].type == VR_ANTI_RANGE
-  && wi::eq_p (vr[i].min, 0)
-  && wi::eq_p (vr[i].max, 0))
+  && vr[i].nonzero_p (TREE_TYPE (ddef)))

Attached is the final adjusted patch I have committed to trunk.

Aldy

(*) This can be loosely translated as "it'll eventually bite us in the 
ass, at which point I'll end up addressing it." ;-)


commit 3a7c5a78fe9226daed2d5363a97f87081acd82c7
Author: Aldy Hernandez 
Date:   Thu Oct 3 14:53:12 2019 +0200

Disentangle range_fold_*ary_expr() into various independent pieces.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fbb1634cbae..83ad0017b5a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,23 @@
+2019-10-07  Aldy Hernandez  
+
+	* ipa-prop.c (ipa_vr::nonzero_p): New.
+	(ipcp_update_vr): Use nonzero_p instead of open-coding check for
+	non-zero range.
+	* ipa-prop.h (class ipa_vr): Add nonzero_p.
+	* tree-vrp.c (range_has_numeric_bounds_p): New.
+	(range_int_cst_p): Use range_has_numeric_bounds_p.
+	(get_range_op_handler): New.
+	(supported_types_p): New.
+	(defined_ranges_p): New.
+	(drop_undefines_to_varying): New.
+	(range_fold_binary_symbolics_p): New.
+	(range_fold_unary_symbolics_p): New.
+	(range_fold_unary_expr): Extract out into above functions.
+	(range_fold_binary_expr): Same.
+	(value_range_base::normalize_addresses): New.
+	(value_range_base::normalize_symbolics): Normalize addresses.
+	* tree-vrp.h (class value_range_base): Add normalize_addresses.
+
 2019-10-07  Aldy Hernandez  
 
 	* tree-vrp.c (value_range_base::singleton_p): Use
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index e79add12b1e..20a0bddcbab 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -5109,6 +5109,18 @@ ipcp_update_bits (struct cgraph_node *node)
 }
 }
 
+bool
+ipa_vr::nonzero_p (tree expr_type) const
+{
+  if (type == VR_ANTI_RANGE && wi::eq_p (min, 0) && wi::eq_p (max, 0))
+return true;
+
+  unsigned prec = TYPE_PRECISION (expr_type);
+  return (type == VR_RANGE
+	  && wi::eq_p (min, wi::one (prec))
+	  && wi::eq_p (max, wi::max_value (prec, TYPE_SIGN (expr_type))));
+}
+
 /* Update value range of formal parameters as described in
ipcp_transformation.  */
 
@@ -5181,9 +5193,7 @@ ipcp_update_vr (struct cgraph_node *node)
 		  TYPE_SIGN (type)));
 	}
 	  else if (POINTER_TYPE_P (TREE_TYPE (ddef))
-		   && vr[i].type == VR_ANTI_RANGE
-		   && wi::eq_p (vr[i].min, 0)
-		   && wi::eq_p (vr[i].max, 0))
+		   && vr[i].nonzero_p (TREE_TYPE (ddef)))
 	{
 	  if (dump_file)
 		fprintf (dump_file, "Setting nonnull for %u\n", i);
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 0ff80854284..eb3397a6d81 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -165,6 +165,7 @@ public:
   enum value_range_kind type;
   wide_int min;
   wide_int max;
+  bool nonzero_p (tree) const;
 };
 
 /* A jump function for a callsite represents the values passed as actual
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 86e4dace073..d69cfb107cb 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -910,15 +910,21 @@ vrp_bitmap_equal_p (const_bitmap b1, const_bitmap b2)
 	  && bitmap_equal_p (b1, b2)));
 }
 
+static bool
+range_has_numeric_bounds_p (const value_range_base *vr)
+{
+  return (vr->min ()
+	  && TREE_CODE (vr->min ()) == INTEGER_CST
+	  && TREE_CODE (vr->max ()) == INTEGER_CST);
+}
+
 /* Return true if max and min of VR are INTEGER_CST.  It's not necessary
a singleton.  */
 
 bool
 range_int_cst_p (const value_range_base *vr)
 {
-  return (vr->kind () == VR_RANGE
-	  && TREE_CODE (vr->min ()) == INTEGER_CST
-	  && TREE_CODE (vr->max ()) == INTEGER_CST);
+  return (vr->kind () == VR_RANGE && 

Re: [PATCH 1/2][GCC][RFC][middle-end]: Add SLP pattern matcher.

2019-10-07 Thread Richard Biener
On Tue, 1 Oct 2019, Tamar Christina wrote:

> Hi All,
> 
> This adds a framework to allow pattern matchers to be written at based on the
> SLP tree.  The difference between this one and the one in tree-vect-patterns 
> is
> that this matcher allows the matching of an arbitrary number of parallel
> statements and replacing of an arbitrary number of children or statements.
> 
> Any relationship created by the SLP pattern matcher will be undone if SLP 
> fails.
> 
> The pattern matcher can also cancel all permutes depending on what the pattern
> requested it to do.  As soon as one pattern requests the permutes to be
> cancelled all permutes are cancelled.
> 
> Compared to the previous pattern matcher this one will work for an arbitrary
> group size and will match at any arbitrary node in the SLP tree.  The only
> requirement is that the entire node is matched or rejected.
> 
> vect_build_slp_tree_1 is a bit more lenient in what it accepts as "compatible
> operations", but the matcher cannot be, because in the cases it matches, the
> order of the operands may be changed.  So all operands must be changed or none.
> 
> Furthermore the matcher relies on canonicalization of the operations inside the
> SLP tree and on the fact that floating math operations are not commutative.
> This means that matching a pattern does not need to try all alternatives or
> combinations and that arguments will always be in the same order if it's the
> same operation.
> 
> The pattern matcher also ignores uninteresting nodes such as type casts, loads
> and stores.  Doing so is essential to keep the runtime down.
> 
> Each matcher is allowed a post condition that can be run to perform any 
> changes
> to the SLP tree as needed before the patterns are created and may also abort
> the creation of the patterns.
> 
> When a pattern is matched it is not immediately created but instead it is
> deferred until all statements in the node have been analyzed.  Only if all 
> post
> conditions are true, and all statements will be replaced will the patterns be
> created in batch.  This allows us to not have to undo any work if the pattern
> fails but also makes it so we traverse the tree only once.
> 
> When a new pattern is created it is marked as a pattern for the statement it
> is replacing and marked as used in the current SLP scope.  If SLP fails then
> the relationship is undone and the relevancy restored.
> 
> Each pattern matcher can detect any number of patterns it wants.  The only
> constraint is that the optabs they produce must all have the same arity.
> 
> The pattern matcher supports instructions that have no scalar form as they
> are added as pattern statements to the stmt.  The BB is left untouched and
> so the scalar loop is untouched.
> 
> Bootstrapped on aarch64-none-linux-gnu and no issues.
> No regression testing done yet.

If you split out the introduction of SLP_TREE_REF_COUNT you can commit
that right now (sorry for being too lazy there...).

One overall comment - you do pattern matching after SLP tree
creation (good) but still do it before the whole SLP graph is
created (bad).  Would it be possible to instead do it as a separate
phase in vect_analyze_slp, looping over all instances (the instances
present entries into the single unified SLP graph now), avoiding
to visit "duplicates"?

In patch 1/2 I see references (mostly in variable names) to
"complex" which is IMHO too specific.

I'd also think that a useful first pattern to support would be
what we do with SLP_TREE_TWO_OPERATORS, where code generation
inserts extra blends.  I have yet to dive into the complex pattern
details to see if that's feasible or if you maybe re-use that...
You seem to at least rely on the SLP build succeeding with
the mixed plus/minus cases?  Which also restricts the kind of
patterns we can recognize I guess.  Plus it shows the chicken-and-egg
issue here - we'd like to detect the patterns _first_ and then
build the SLP trees (or rather combine lanes into larger groups).
Doing it after the fact makes it no more powerful than doing
it as folding post vectorization?

+typedef hash_map 
+  ssa_name_def_to_slp_tree_map_t;
+

this seems to be unused.

+  FOR_EACH_VEC_ELT_FROM (scalar_stmts, i, stmt_info, n - 1)
+{
...
+  work.stmt_infos = scalar_stmts.begin () + (i - (n - 1));
+  work.idx = i;

so this tries to match patterns on ARITY arbitrary aligned
but adjacent scalar stmts [in a brute force way].  But then
you immediately fail when one matching fails.  So I think
this loop should be written like

 for (unsigned i = n - 1; i < scalar_stmts.length (); i += n)
   {
...

to make this fact clearer.  A missed optimization here is to consider
pre/post shuffling of the group to make more cases match.

In this loop you also do the target support check which could
possibly be done upfront in the pattern matcher itself to save
compile-time?  It also seems to be required that patterns match
a single IFN call?

Looking at the complex patterns I am 

Re: [SVE] PR91532

2019-10-07 Thread Richard Biener
On Fri, 4 Oct 2019, Prathamesh Kulkarni wrote:

> On Fri, 4 Oct 2019 at 12:18, Richard Biener  wrote:
> >
> > On Thu, 3 Oct 2019, Prathamesh Kulkarni wrote:
> >
> > > On Wed, 2 Oct 2019 at 12:28, Richard Biener  wrote:
> > > >
> > > > On Wed, 2 Oct 2019, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Wed, 2 Oct 2019 at 01:08, Jeff Law  wrote:
> > > > > >
> > > > > > On 10/1/19 12:40 AM, Richard Biener wrote:
> > > > > > > On Mon, 30 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > >
> > > > > > >> On Wed, 25 Sep 2019 at 23:44, Richard Biener  
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>> On Wed, 25 Sep 2019, Prathamesh Kulkarni wrote:
> > > > > > >>>
> > > > > >  On Fri, 20 Sep 2019 at 15:20, Jeff Law  wrote:
> > > > > > >
> > > > > > > On 9/19/19 10:19 AM, Prathamesh Kulkarni wrote:
> > > > > > >> Hi,
> > > > > > >> For PR91532, the dead store is trivially deleted if we place 
> > > > > > >> dse pass
> > > > > > >> between ifcvt and vect. Would it be OK to add another 
> > > > > > >> instance of dse there ?
> > > > > > >> Or should we add an ad-hoc "basic-block dse" sub-pass to 
> > > > > > >> ifcvt that
> > > > > > >> will clean up the dead store ?
> > > > > > > I'd hesitate to add another DSE pass.  If there's one nearby 
> > > > > > > could we
> > > > > > > move the existing pass?
> > > > > >  Well I think the nearest one is just after pass_warn_restrict. 
> > > > > >  Not
> > > > > >  sure if it's a good
> > > > > >  idea to move it up from there ?
> > > > > > >>>
> > > > > > >>> You'll need it inbetween ifcvt and vect so it would be disabled
> > > > > > >>> w/o vectorization, so no, that doesn't work.
> > > > > > >>>
> > > > > > >>> ifcvt already invokes SEME region value-numbering so if we had
> > > > > > >>> MESE region DSE it could use that.  Not sure if you feel like
> > > > > > >>> refactoring DSE to work on regions - it currently uses a DOM
> > > > > > >>> walk which isn't suited for that.
> > > > > > >>>
> > > > > > >>> if-conversion has a little "local" dead predicate compute 
> > > > > > >>> removal
> > > > > > >>> thingy (not that I like that), eventually it can be enhanced to
> > > > > > >>> do the DSE you want?  Eventually it should be moved after the 
> > > > > > >>> local
> > > > > > >>> CSE invocation though.
> > > > > > >> Hi,
> > > > > > >> Thanks for the suggestions.
> > > > > > >> For now, would it be OK to do "dse" on loop header in
> > > > > > >> tree_if_conversion, as in the attached patch ?
> > > > > > >> The patch does local dse in a new function ifcvt_local_dse 
> > > > > > >> instead of
> > > > > > >> ifcvt_local_dce, because it needed to be done after RPO VN which
> > > > > > >> eliminates:
> > > > > > >> Removing dead stmt _ifc__62 = *_55;
> > > > > > >> and makes the following store dead:
> > > > > > >> *_55 = _ifc__61;
> > > > > > >
> > > > > > > I suggested trying to move ifcvt_local_dce after RPO VN, you could
> > > > > > > try that as independent patch (pre-approved).
> > > > > > >
> > > > > > > I don't mind the extra walk though.
> > > > > > >
> > > > > > > What I see as possible issue is that dse_classify_store walks 
> > > > > > > virtual
> > > > > > > uses and I'm not sure if the loop exit is a natural boundary for
> > > > > > > such walk (eventually the loop header virtual PHI is reached but
> > > > > > > there may also be a loop-closed PHI for the virtual operand,
> > > > > > > but not necessarily).  So the question is whether to add a
> > > > > > > "stop at" argument to dse_classify_store specifying the virtual
> > > > > > > use the walk should stop at?
> > > > > > I think we want to stop at the block boundary -- aren't the cases we
> > > > > > care about here local to a block?
> > > > > This version restricts walking in dse_classify_store to basic-block if
> > > > > bb_only is true,
> > > > > and removes dead stores in ifcvt_local_dce instead of separate walk.
> > > > > Does it look OK ?
> > > >
> > > > As replied to Jeff please make it trivially work for SESE region walks
> > > > by specifying the exit virtual operand to stop on.
> > > Thanks for the suggestions, does the attached patch look OK ?
> > > Um, sorry, I don't really understand why we need to specify virtual
> > > phi arg from back edge to stop the walk.
> > > Would it achieve the same effect if we pass index of loop->latch to
> > > stop the walk ?
> >
> > Since DSE walks the virtual def->use chain it makes sense to stop
> > it at a specific virtual definition.  Note
> >
> > @@ -706,7 +699,9 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> >FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
> > {
> >   /* Limit stmt walking.  */
> > - if (++cnt > PARAM_VALUE (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE))
> > + if (++cnt > PARAM_VALUE (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE)
> > + || (stop_at_vuse
> > + && gimple_vuse (use_stmt) == stop_at_vuse))
> > 

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-07 Thread Richard Biener
On Tue, Oct 1, 2019 at 1:48 PM Dmitrij Pochepko
 wrote:
>
> Hi Richard,
>
> I updated patch according to all your comments.
> Also bootstrapped and tested again on x86_64-pc-linux-gnu and 
> aarch64-linux-gnu, which took some time.
>
> attached v3.

OK.

Thanks,
Richard.

> Thanks,
> Dmitrij
>
> On Thu, Sep 26, 2019 at 09:47:04AM +0200, Richard Biener wrote:
> > On Tue, Sep 24, 2019 at 5:29 PM Dmitrij Pochepko
> >  wrote:
> > >
> > > Hi,
> > >
> > > can anybody take a look at v2?
> >
> > +(if (tree_to_uhwi (@4) == 1
> > + && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4
> >
> > those will still ICE for large __int128_t constants.  Since you do not match
> > any conversions you should probably restrict the precision of 'type' like
> > with
> >(if (TYPE_PRECISION (type) <= 64
> > && tree_to_uhwi (@4) ...
> >
> > likewise tree_to_uhwi will fail for negative constants thus if the
> > pattern assumes
> > unsigned you should verify that as well with && TYPE_UNSIGNED  (type).
> >
> > Your 'argtype' is simply 'type' so you can elide it.
> >
> > +   (switch
> > +   (if (types_match (argtype, long_long_unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNTLL:integer_type_node @0)))
> > +   (if (types_match (argtype, long_unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNTL:integer_type_node @0)))
> > +   (if (types_match (argtype, unsigned_type_node))
> > + (convert (BUILT_IN_POPCOUNT:integer_type_node @0)))
> >
> > Please test small types first so we can avoid popcountll when long == long 
> > long
> > or long == int.  I also wonder if we really want to use the builtins and
> > check optab availability or if we nowadays should use
> > direct_internal_fn_supported_p (IFN_POPCOUNT, integer_type_node, type,
> > OPTIMIZE_FOR_BOTH) and
> >
> > (convert (IFN_POPCOUNT:type @0))
> >
> > without the switch?
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Dmitrij
> > >
> > > On Mon, Sep 09, 2019 at 10:03:40PM +0300, Dmitrij Pochepko wrote:
> > > > Hi all.
> > > >
> > > > Please take a look at v2 (attached).
> > > > I changed patch according to review comments. The same testing was 
> > > > performed again.
> > > >
> > > > Thanks,
> > > > Dmitrij
> > > >
> > > > On Thu, Sep 05, 2019 at 06:34:49PM +0300, Dmitrij Pochepko wrote:
> > > > > This patch adds matching for Hamming weight (popcount) 
> > > > > implementation. The following sources:
> > > > >
> > > > > int
> > > > > foo64 (unsigned long long a)
> > > > > {
> > > > > unsigned long long b = a;
> > > > > b -= ((b>>1) & 0x5555555555555555ULL);
> > > > > b = ((b>>2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
> > > > > b = ((b>>4) + b) & 0x0F0F0F0F0F0F0F0FULL;
> > > > > b *= 0x0101010101010101ULL;
> > > > > return (int)(b >> 56);
> > > > > }
> > > > >
> > > > > and
> > > > >
> > > > > int
> > > > > foo32 (unsigned int a)
> > > > > {
> > > > > unsigned long b = a;
> > > > > b -= ((b>>1) & 0x55555555UL);
> > > > > b = ((b>>2) & 0x33333333UL) + (b & 0x33333333UL);
> > > > > b = ((b>>4) + b) & 0x0F0F0F0FUL;
> > > > > b *= 0x01010101UL;
> > > > > return (int)(b >> 24);
> > > > > }
> > > > >
> > > > > and equivalents are now recognized as popcount for platforms with hw 
> > > > > popcount support. Bootstrapped and tested on x86_64-pc-linux-gnu and 
> > > > > aarch64-linux-gnu systems with no regressions.
> > > > >
> > > > > (I have no write access to repo)
> > > > >
> > > > > Thanks,
> > > > > Dmitrij
> > > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR tree-optimization/90836
> > > > >
> > > > > * gcc/match.pd (popcount): New pattern.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR tree-optimization/90836
> > > > >
> > > > > * lib/target-supports.exp (check_effective_target_popcount)
> > > > > (check_effective_target_popcountll): New effective targets.
> > > > > * gcc.dg/tree-ssa/popcount4.c: New test.
> > > > > * gcc.dg/tree-ssa/popcount4l.c: New test.
> > > > > * gcc.dg/tree-ssa/popcount4ll.c: New test.
> > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 0317bc7..b1867bf 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -5358,6 +5358,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >(cmp (popcount @0) integer_zerop)
> > > > >(rep @0 { build_zero_cst (TREE_TYPE (@0)); }
> > > > >
> > > > > +/* 64- and 32-bits branchless implementations of popcount are 
> > > > > detected:
> > > > > +
> > > > > +   int popcount64c (uint64_t x)
> > > > > +   {
> > > > > + x -= (x >> 1) & 0x5555555555555555ULL;
> > > > > + x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
> > > > > + x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
> > > > > + return (x * 0x0101010101010101ULL) >> 56;
> > > > > +   }
> > > > > +
> > > > > +   int popcount32c 

Re: [PATCH v4 1/7] Allow COND_EXPR and VEC_COND_EXPR condtions to trap

2019-10-07 Thread Richard Biener
On Tue, Oct 1, 2019 at 3:27 PM Ilya Leoshkevich  wrote:
>
> Right now gimplifier does not allow VEC_COND_EXPR's condition to trap
> and introduces a temporary if this could happen, for example, generating
>
>   _5 = _4 > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 };
>   _6 = VEC_COND_EXPR <_5, { -1, -1, -1, -1 }, { 0, 0, 0, 0 }>;
>
> from GENERIC
>
>   VEC_COND_EXPR < (*b > { 2.0e+0, 2.0e+0, 2.0e+0, 2.0e+0 }) ,
>   { -1, -1, -1, -1 } ,
>   { 0, 0, 0, 0 } >
>
> This is not necessary and makes the resulting GIMPLE harder to analyze.
> Change the gimplifier so as to allow COND_EXPR and VEC_COND_EXPR
> conditions to trap.
>
> This patch takes special care to avoid introducing trapping comparisons
> in GIMPLE_COND.  They are not allowed, because they would require 3
> outgoing edges (then, else and EH), which is awkward to say the least.
> Therefore, computations of such conditions should live in their own basic
> blocks.

OK.  This can go in independently of the other changes in this series.

Thanks and sorry for the delay.
Richard.

> gcc/ChangeLog:
>
> 2019-09-03  Ilya Leoshkevich  
>
> PR target/77918
> * gimple-expr.c (gimple_cond_get_ops_from_tree): Assert that the
> caller passes a non-trapping condition.
> (is_gimple_condexpr): Allow trapping conditions.
> (is_gimple_condexpr_1): New helper function.
> (is_gimple_condexpr_for_cond): New function, acts like old
> is_gimple_condexpr.
> * gimple-expr.h (is_gimple_condexpr_for_cond): New function.
> * gimple.c (gimple_could_trap_p_1): Handle COND_EXPR and
> VEC_COND_EXPR. Fix an issue with statements like i = (fp < 1.).
> * gimplify.c (gimplify_cond_expr): Use
> is_gimple_condexpr_for_cond.
> (gimplify_expr): Allow is_gimple_condexpr_for_cond.
> * tree-eh.c (operation_could_trap_p): Assert on COND_EXPR and
> VEC_COND_EXPR.
> (tree_could_trap_p): Handle COND_EXPR and VEC_COND_EXPR.
> * tree-ssa-forwprop.c (forward_propagate_into_gimple_cond): Use
> is_gimple_condexpr_for_cond, remove pointless tmp check
> (forward_propagate_into_cond): Remove pointless tmp check.
> ---
>  gcc/gimple-expr.c   | 25 +
>  gcc/gimple-expr.h   |  1 +
>  gcc/gimple.c| 14 +-
>  gcc/gimplify.c  |  5 +++--
>  gcc/tree-eh.c   |  8 
>  gcc/tree-ssa-forwprop.c |  7 ---
>  6 files changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
> index 4082828e198..1738af186d7 100644
> --- a/gcc/gimple-expr.c
> +++ b/gcc/gimple-expr.c
> @@ -574,6 +574,7 @@ gimple_cond_get_ops_from_tree (tree cond, enum tree_code 
> *code_p,
>   || TREE_CODE (cond) == TRUTH_NOT_EXPR
>   || is_gimple_min_invariant (cond)
>   || SSA_VAR_P (cond));
> +  gcc_checking_assert (!tree_could_throw_p (cond));
>
>extract_ops_from_tree (cond, code_p, lhs_p, rhs_p);
>
> @@ -605,17 +606,33 @@ is_gimple_lvalue (tree t)
>   || TREE_CODE (t) == BIT_FIELD_REF);
>  }
>
> -/*  Return true if T is a GIMPLE condition.  */
> +/* Helper for is_gimple_condexpr and is_gimple_condexpr_for_cond.  */
>
> -bool
> -is_gimple_condexpr (tree t)
> +static bool
> +is_gimple_condexpr_1 (tree t, bool allow_traps)
>  {
>return (is_gimple_val (t) || (COMPARISON_CLASS_P (t)
> -   && !tree_could_throw_p (t)
> +   && (allow_traps || !tree_could_throw_p (t))
> && is_gimple_val (TREE_OPERAND (t, 0))
> && is_gimple_val (TREE_OPERAND (t, 1))));
>  }
>
> +/* Return true if T is a GIMPLE condition.  */
> +
> +bool
> +is_gimple_condexpr (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, true);
> +}
> +
> +/* Like is_gimple_condexpr, but does not allow T to trap.  */
> +
> +bool
> +is_gimple_condexpr_for_cond (tree t)
> +{
> +  return is_gimple_condexpr_1 (t, false);
> +}
> +
>  /* Return true if T is a gimple address.  */
>
>  bool
> diff --git a/gcc/gimple-expr.h b/gcc/gimple-expr.h
> index 1ad1432bd17..0925aeb0f57 100644
> --- a/gcc/gimple-expr.h
> +++ b/gcc/gimple-expr.h
> @@ -41,6 +41,7 @@ extern void gimple_cond_get_ops_from_tree (tree, enum 
> tree_code *, tree *,
>tree *);
>  extern bool is_gimple_lvalue (tree);
>  extern bool is_gimple_condexpr (tree);
> +extern bool is_gimple_condexpr_for_cond (tree);
>  extern bool is_gimple_address (const_tree);
>  extern bool is_gimple_invariant_address (const_tree);
>  extern bool is_gimple_ip_invariant_address (const_tree);
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index 8e828a5f169..a874c29454c 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -2149,10 +2149,22 @@ gimple_could_trap_p_1 (gimple *s, bool include_mem, 
> bool include_stores)
>return false;
>
>  case GIMPLE_ASSIGN:
> -  t 

[committed] use value_range_base::num_pairs in singleton_p

2019-10-07 Thread Aldy Hernandez
Instead of looking inside a range to determine if it has one sub-range, 
use the API.


Committed as obvious.

Aldy
commit 93d4733dd1f8ce8ca4959f4584cec4bdd96d063e
Author: Aldy Hernandez 
Date:   Mon Oct 7 09:15:30 2019 +0200

Use value_range_base::num_pairs instead of vrp_val_is* to check if a range
has one sub-range.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 798d16cf0c6..fbb1634cbae 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-10-07  Aldy Hernandez  
+
+	* tree-vrp.c (value_range_base::singleton_p): Use
+	value_range_base::num_pairs instead of vrp_val_is* to check
+	if a range has one sub-range.
+
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index a2ab4a21925..86e4dace073 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -379,10 +379,7 @@ value_range_base::singleton_p (tree *result) const
 	}
 	  return false;
 	}
-
-  /* An anti-range that includes an extreme, is just a range with
-	 one sub-range.  Use the one sub-range.  */
-  if (vrp_val_is_min (m_min, true) || vrp_val_is_max (m_max, true))
+  if (num_pairs () == 1)
 	{
 	  value_range_base vr0, vr1;
 	  ranges_from_anti_range (this, &vr0, &vr1, true);


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-10-07 Thread Jan Hubicka
> > This is why we currently have way to order function when outputting them
> > and use that with FDO (using Martin's first execution logic). This has
> > the drawback of making the functions flow in that order through late
> > optimizations and RTL backend and thus we lose IPA-RA and some
> > IP propagation (late pure/const/nothrow discovery).
> 
> But you can also fix that by the parallelization GSoC project approach,
> decoupling things at RTL expansion IPA-wise and using output order
> for the latter (or even do the fragments in GCC itself by refactoring the
> output machinery to use function-local "strings", assembling the final
> file later).  Refactoring output to handle multiple output "files" at the
> same time might also help getting rid of that early-LTO-debug copying
> stuff (now that it seems that linker-plugin extensions for partially claiming
> files will never happen...)

Multiple output files won't solve my concern about padding and
relaxation, but yep, parallelizing RTL will definitely need to introduce
a way to hold the final assembly and glue it together, so this problem may get
solved independently which would be nice.

Honza


Re: [PR90742] OpenACC/OpenMP target offloading: Fortran 'allocatable' scalars in 'firstprivate' clauses

2019-10-07 Thread Thomas Schwinge
Hi!

Jakub, ping -- and/or: Kwok, Tobias, as you recently worked through that
code for related issues (Fortran optional arguments), do you happen to
have any comments?

On 2019-06-07T16:01:29+0200, I wrote:
> As I had mentioned in the PR...
>
> On Tue, 7 Aug 2018 14:55:07 -0700, Cesar Philippidis  
> wrote:
>> This patch
>
> ... would be one component for fixing 
> "OpenACC/OpenMP target offloading: Fortran 'allocatable' scalars in
> 'firstprivate' clauses".
>
> (Also, as mentioned there, such changes have been submitted already, a
> few times, muddled into other changes.  So, thanks, that this also got
> submitted separately, to address just this one issue.)
>
>> updates the way that lower_omp_target uses firstprivate
>> pointers in OpenACC offloaded regions. On host side, when preparing
>> firstprivate data mapping for pointer type objects, not to be confused
>> with GOMP_MAP_FIRSTPRIVATE_POINTER, the compiler passes the
>> address of the value being pointed to and not the address of the pointer
>> itself to the runtime. Correspondingly, on the device side, the compiler
>> generates to code to dereference the remapped pointer once to copy the
>> data to a local buffer.
>> 
>> While this behavior looks like it would break things, it will not affect
>> C or C++ data mappings, because those languages transfer pointers via
>> GOMP_MAP_FIRSTPRIVATE_POINTER.
>
> Not with current GCC sources, as I should eventually find out, which are
> still missing another patch or two, or three, or more.
>
>> In addition, this will not cause
>> problems with array types, because the default remapping rules for
>> OpenACC is to transfer them in via copy. Besides it really doesn't
>> make sense to allow arrays to be transferred in via firstprivate
>> because that would use up a lot of memory on the accelerator.
>
> (Huh, but the latter ought to be supported nevertheless, as far as I
> understand?  Anyway, that'll be for later.)
>
>> Is this OK for trunk? I bootstrapped and regtested it for x86_64 with
>> nvptx offloading.
>
> The patch, as proposed, does introduce regressions.
>
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -7643,15 +7643,21 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
>>{
>>  gcc_assert (is_gimple_omp_oacc (ctx->stmt));
>> -if (omp_is_reference (new_var)
>> -&& TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE)
>> +if (omp_is_reference (new_var))
>>{
>>  /* Create a local object to hold the instance
>> value.  */
>> -tree type = TREE_TYPE (TREE_TYPE (new_var));
>> +tree type = TREE_TYPE (new_var);
>> +/* Pointer types are mapped onto the device via a
>> +   single level of indirection.  */
>> +if (TREE_CODE (type) != POINTER_TYPE)
>> +  type = TREE_TYPE (type);
>>  const char *id = IDENTIFIER_POINTER (DECL_NAME (new_var));
>>  tree inst = create_tmp_var (type, id);
>> -gimplify_assign (inst, fold_indirect_ref (x), );
>> +if (TREE_CODE (TREE_TYPE (new_var)) == POINTER_TYPE)
>> +  gimplify_assign (inst, fold_indirect_ref (x), );
>> +else
>> +  gimplify_assign (inst, fold_indirect_ref (x), );
>>  x = build_fold_addr_expr (inst);
>>}
>>  gimplify_assign (new_var, x, );
>
> (It seems strange to have the same code in both branches of the 'if'
> statement?)
>
>> @@ -7879,7 +7885,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
>> omp_context *ctx)
>>  else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_FIRSTPRIVATE)
>>{
>>  gcc_assert (is_gimple_omp_oacc (ctx->stmt));
>> -if (!omp_is_reference (var))
>> +/* Handle Fortran allocatable scalars.  */
>> +if (!omp_is_reference (var)
>> +&& TREE_CODE (TREE_TYPE (var)) != POINTER_TYPE)
>>{
>>  if (is_gimple_reg (var)
>>  && OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT (c))
> |   TREE_NO_WARNING (var) = 1;
> | var = build_fold_addr_expr (var);
> |   }
> | else
> |   talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
> | gimplify_assign (x, var, );
> |   }
>
> That's what's causing regressions, for example for 'firstprivate' clauses
> even in non-offloading situation ('if(0)' clause, for example):
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00402f8a in main._omp_fn.1 () at 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c:59
> 59  b[ii] = 

Re: [PATCH 2/5, OpenACC] Support Fortran optional arguments in the firstprivate clause

2019-10-07 Thread Thomas Schwinge
Hi Kwok, Tobias!

On 2019-07-29T21:55:52+0100, Kwok Cheung Yeung  wrote:
>  >if (omp_is_reference (new_var)
>  >&& TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE)
>
> As is, this code in lower_omp_target will always reject optional arguments, 
> so 
> it must be changed. This was introduced in commit r261025 (SVN trunk) as a 
> fix 
> for PR 85879, but it also breaks support for arrays in firstprivate, which 
> was 
> probably an unintended side-effect.

Do we have sufficient testsuite coverage for "arrays in firstprivate", in
all languages?


Grüße
 Thomas


> I have found that allowing POINTER_TYPEs 
> that are also DECL_BY_REFERENCE through in this condition allows both 
> optional 
> arguments and arrays to work without regressing the tests in r261025.

>   * omp-low.c (lower_omp_target): Create temporary for received value
>   and take the address for new_var if the original variable was a
>   DECL_BY_REFERENCE.  [...]

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -11173,7 +11173,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
> omp_context *ctx)
> {
>   gcc_assert (is_gimple_omp_oacc (ctx->stmt));
>   if (omp_is_reference (new_var)
> - && TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE)
> + && (TREE_CODE (TREE_TYPE (new_var)) != POINTER_TYPE
> + || DECL_BY_REFERENCE (var)))
> {
>   /* Create a local object to hold the instance
>  value.  */


Grüße
 Thomas




[PATCH] Fix PR91975, tame PRE some more

2019-10-07 Thread Richard Biener


The following tries to address the issue that PRE is quite happy
to introduce new IVs in loops just because it can compute some
constant value on the loop entry edge.  In principle there's
already code that should work against that but it simply searches
for a optimize_edge_for_speed () edge.  That still considers
the loop entry edge to be worth optimizing because it ends
up as maybe_hot_edge_p (e) for -O2 which compares the edge count
against the entry block count.  For PRE we want something more
local (comparing to the destination block count).

Now for the simple testcases this shouldn't make a difference
but hot/cold uses PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) which
isn't the same as profile_probabilities likely or very likely...

Still one key of the patch is that we compare the sum of the
edge counts where the value is available (and thus the redundancy
elimination happens) with the count we have to insert rather
than looking for a single optimize_edge_for_speed_p edge.

For that I've used

if (avail_count < block->count.apply_probability
(profile_probability::unlikely ()))

so we block insertion if the redundancies would overall be "unlikely".

I'm also not sure why maybe_hot_count_p uses HOT_BB_FREQUENCY_FRACTION
while there exists HOT_BB_COUNT_FRACTION (with a ten-fold larger
default value) that seems to match better for scaling a profile-count?

Honza?

Bootstrap & regtest running on x86-64-unknown-linux-gnu.

Does the above predicate look sane or am I on a wrong track with
using the destination block count here (I realize even the "locally cold"
entries into the block might be quite hot globally).

For a 1:1 translation of the existing code to sth using the
original predicate but summing over edges I could use
!maybe_hot_count_p (cfun, avail_count)?  But then we're back to
PRE doing the unwanted insertions.  Changing maybe_hot_count_p
to use HOT_BB_COUNT_FRACTION doesn't make any difference there
(obviously).

Thanks,
Richard.

2019-10-06  Richard Biener  

PR tree-optimization/91975
* tree-ssa-pre.c (do_pre_regular_insertion): Adjust
profitability check to use the sum of all edge counts the
value is available on and check against unlikely execution
of the block.

* gcc.dg/tree-ssa/ldist-39.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
new file mode 100644
index 000..a63548979ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-39.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ldist-details" } */
+
+#define T int
+
+const T a[] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+T b[sizeof a / sizeof *a];
+
+void f0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+d[i] = s[i];
+}
+
+void g0 (void)
+{
+  const T *s = a;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+*d++ = *s++;
+}
+
+extern const T c[sizeof a / sizeof *a];
+
+void f1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+d[i] = s[i];
+}
+
+void g1 (void)
+{
+  const T *s = c;
+  T *d = b;
+  for (unsigned i = 0; i != sizeof a / sizeof *a; ++i)
+*d++ = *s++;
+}
+
+/* { dg-final { scan-tree-dump-times "generated memcpy" 4 "ldist" } } */
diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index c618601a184..af49ba388c1 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -3195,7 +3195,7 @@ do_pre_regular_insertion (basic_block block, basic_block 
dom)
  pre_expr eprime = NULL;
  edge_iterator ei;
  pre_expr edoubleprime = NULL;
- bool do_insertion = false;
+ profile_count avail_count = profile_count::zero ();
 
  val = get_expr_value_id (expr);
  if (bitmap_set_contains_value (PHI_GEN (block), val))
@@ -3250,10 +3250,7 @@ do_pre_regular_insertion (basic_block block, basic_block 
dom)
{
  avail[pred->dest_idx] = edoubleprime;
  by_some = true;
- /* We want to perform insertions to remove a redundancy on
-a path in the CFG we want to optimize for speed.  */
- if (optimize_edge_for_speed_p (pred))
-   do_insertion = true;
+ avail_count += pred->count ();
  if (first_s == NULL)
first_s = edoubleprime;
  else if (!pre_expr_d::equal (first_s, edoubleprime))
@@ -3266,7 +3263,11 @@ do_pre_regular_insertion (basic_block block, basic_block 
dom)
 partially redundant.  */
  if (!cant_insert && !all_same && by_some)
{
- if (!do_insertion)
+ /* We want to perform insertions to remove a redundancy on
+a path in the CFG that is somewhat likely.  Avoid inserting
+when we'd only remove a redundancy on 

[PATCH] Improve unrolling heuristics, PR91975

2019-10-07 Thread Richard Biener


Currently there's a surprising difference in unrolling size estimation
depending on how exactly you formulate your IV expressions.  The following
patch makes it less dependent on this, behaving like the more optimistic
treatment ( + 1 being constant).  In the end it's still a heuristic
and in some sense the estimation of the original size now looks odd
(costing of a[i] vs. *(a + i * 4)).  I still think it's an improvement.

For testcase adjustments I generally tried to disable unrolling if
doing so would defeat the testcases purpose (validate correctness
of vectorization for example).  I've verified that the unrolling we
now do results in no worse code for the cases (even if I ended up
disabling that unrolling).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

2019-10-07  Richard Biener  

PR tree-optimization/91975
* tree-ssa-loop-ivcanon.c (constant_after_peeling): Consistently
handle invariants.

* g++.dg/tree-ssa/ivopts-3.C: Adjust.
* gcc.dg/vect/vect-profile-1.c: Disable cunrolli.
* gcc.dg/vect/vect-double-reduc-6.c: Disable unrolling of
the innermost loop.
* gcc.dg/vect/vect-93.c: Likewise.
* gcc.dg/vect/vect-105.c: Likewise.
* gcc.dg/vect/pr79920.c: Likewise.
* gcc.dg/vect/no-vfa-vect-102.c: Likewise.
* gcc.dg/vect/no-vfa-vect-101.c: Likewise.
* gcc.dg/vect/pr83202-1.c: Operate on a larger array.
* gfortran.dg/vect/vect-8.f90: Likewise.
* gcc.dg/tree-ssa/cunroll-2.c: Scan early unrolling dump instead
of late one.

diff --git a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C 
b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
index 07ff1b770f8..6760a5b1851 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/ivopts-3.C
@@ -70,6 +70,8 @@ int main ( int , char** ) {
 return 0;
 }
 
-// Verify that on x86_64 and i?86 we use a single IV for the innermost loop
+// Verify that on x86_64 and i?86 we unroll the innermost loop and
+// use three IVs for the then-innermost loop
 
-// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64, 3 avg niters, 1 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump "Selected IV set for loop \[0-9\]* at \[^ 
\]*:63, 127 avg niters, 3 IVs" "ivopts" { target x86_64-*-* i?86-*-* } } }
+// { dg-final { scan-tree-dump-not "Selected IV set for loop \[0-9\]* at \[^ 
\]*:64" "ivopts" { target x86_64-*-* i?86-*-* } } }
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-3.c 
b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
index e314a01b1f1..33eb18826fd 100644
--- a/gcc/testsuite/gcc.c-torture/execute/loop-3.c
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-3.c
@@ -13,7 +13,7 @@ f (m)
   i = m;
   do
 {
-  g (i * INT_MAX / 2);
+  g ((int)((unsigned)i * INT_MAX) / 2);
 }
   while (--i > 0);
 }
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
index b1d1c7d3d85..ae3fec99749 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+/* { dg-options "-O3 -fdump-tree-cunrolli-details" } */
 int a[2];
 int test2 (void);
 void
@@ -14,4 +14,4 @@ test(int c)
 }
 }
 /* We are not able to get rid of the final conditional because the loop has 
two exits.  */
-/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" 
"cunroll"} } */
+/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" 
"cunrolli"} } */
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c 
b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
index 91eb28218bd..ce934279ddf 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-101.c
@@ -22,6 +22,7 @@ int main1 (int x, int y) {
   p = (struct extraction *) malloc (sizeof (struct extraction));
 
   /* Not vectorizable: different unknown offset.  */
+#pragma GCC unroll 0
   for (i = 0; i < N; i++)
 {
   *((int *)p + x + i) = a[i];
diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c 
b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
index 51f62788dbf..d9e0529e73f 100644
--- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
+++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-102.c
@@ -28,6 +28,7 @@ int main1 (int x, int y) {
 }
 
   /* Not vectorizable: distance 1.  */
+#pragma GCC unroll 0
   for (i = 0; i < N - 1; i++)
 {
*((int *)p + x + i + 1) = *((int *)p + x + i);
diff --git a/gcc/testsuite/gcc.dg/vect/pr79920.c b/gcc/testsuite/gcc.dg/vect/pr79920.c
index 276a2806f0c..38e0fef779a 100644
--- a/gcc/testsuite/gcc.dg/vect/pr79920.c
+++ b/gcc/testsuite/gcc.dg/vect/pr79920.c
@@ -14,6 +14,7 @@ compute_integral (double w_1[18])
 
   for (int ip_1 = 0; ip_1 < 2; ++ip_1)
 {
+#pragma GCC unroll 0
   for (int i_0 = 0; i_0 < 6; ++i_0)

Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-07 Thread Segher Boessenkool
On Mon, Oct 07, 2019 at 08:56:44AM +0200, Richard Biener wrote:
> But I agree that mechanically "fixing" the current code-base, while ending up
> with no new introductions of local shadowing, is worse if it makes the current
> code-base worse.

Obfuscating the names is not often a good fix for the "this code is too big
and complex" problem :-(

Good fixes for this cannot be done mechanically.

> This probably means fixing the header file issues first though.

Yeah, and those are some of the nastier issues anyway.  We need to try to
get rid of some more big macros.  Not the most exciting work, but useful.


Segher


Re: Type representation in CTF and DWARF

2019-10-07 Thread Richard Biener
On Fri, Oct 4, 2019 at 9:12 PM Indu Bhagat  wrote:
>
> Hello,
>
> At GNU Tools Cauldron this year, some folks were curious to know more on how
> the "type representation" in CTF compares vis-a-vis DWARF.
>
> I use small testcase below to gather some numbers to help drive this 
> discussion.
>
> [ibhagat@ibhagatpc ctf-size]$ cat ctf_sizeme.c
> #define MAX_NUM_MSGS 5
>
> enum node_type
> {
>INIT_TYPE = 0,
>COMM_TYPE = 1,
>COMP_TYPE = 2,
>MSG_TYPE = 3,
>RELEASE_TYPE = 4,
>MAX_NODE_TYPE
> };
>
> typedef struct node_payload
> {
>unsigned short npay_offset;
>const char * npay_msg;
>unsigned int npay_nelems;
>struct node_payload * npay_next;
> } node_payload;
>
> typedef struct node_property
> {
>int timestamp;
>char category;
>long initvalue;
> } node_property_t;
>
> typedef struct node
> {
>enum node_type ntype;
>int nmask:5;
>union
>  {
>struct node_payload * npayload;
>void * nbase;
>  } nu;
>  unsigned int msgs[MAX_NUM_MSGS];
>  node_property_t node_prop;
> } Node;
>
> Node s;
>
> int main (void)
> {
>return 0;
> }
>
> Note that in this case, there is nothing that the de-duplicator has to do
> (neither for the TYPE comdat sections nor CTF types). I chose such an example
> because de-duplication of types is orthogonal to the concept of representation
> of types.
>
> So, for the small C testcase with a union, enum, array, struct, typedef etc, I
> see following sizes :
>
> Compile with -fdebug-types-section -gdwarf-4 (size -A excerpt):
>  .debug_aranges   48 0
>  .debug_info     150 0
>  .debug_abbrev   314 0
>  .debug_line      73 0
>  .debug_str      455 0
>  .debug_ranges    32 0
>  .debug_types    578 0
>
> Compile with -fdebug-types-section -gdwarf-5 (size -A excerpt):
>  .debug_aranges   48 0
>  .debug_info     732 0
>  .debug_abbrev   309 0
>  .debug_line      73 0
>  .debug_str      455 0
>  .debug_rnglists  23 0
>
> Compile with -gt (size -A excerpt):
>  .ctf  966 0
>  CTF strings sub-section size (ctf_strlen in disassembly) = 374
>  ==> CTF section just for representing types = 966 - 374 = 592 bytes
>  (The 592 bytes include the CTF header and other indexes etc.)
>
> So, the following points are what I would highlight.  Hopefully this helps you see
> that CTF has promise for the task of representing type debug info.
>
> 1. Type Information layout in sections:
> A .ctf section is self-sufficient to represent types in a program. All
> references within the CTF section are via either indexes or offsets into 
> the
> CTF section. No relocations are necessary in CTF at this time. In 
> contrast,
> DWARF type information is organized in multiple sections - .debug_info,
> .debug_abbrev and .debug_str sections in DWARF5; plus .debug_types in 
> DWARF4.
>
> 2. Type Information encoding / compactness matters:
> Because the type information is organized across sections in DWARF (and
> contains some debug information like location etc.), it is not feasible
> to put a distinct number to the size in bytes for representing type
> information in DWARF. But the size info of sections shown above should
> be helpful to show that CTF does show promise in compactly representing
> types.
>
> Let's see some size data.  The CTF string table (= 374 bytes) is left out of the
> discussion at hand because it will not be fair to compare with .debug_str
> section which contains other information than just names of types.
>
> The 592 bytes of the .ctf section are needed to represent types in CTF
> format. Now, when using DWARF5, the type information needs 732 bytes in
> .debug_info and 309 bytes in .debug_abbrev.
>
> In DWARF (when using -fdebug-types-section), the base types are duplicated
> across type units. So for the above example, the DWARF DIE representing
> 'unsigned int' will appear in both the  DWARF trees for types - node and
> node_payload. In CTF, there is a single lone type 'unsigned int'.

It's not clear to me why you are using -fdebug-types-section for this
comparison?
With just -gdwarf-4 I get

.debug_info  292
.debug_abbrev 189
.debug_str   299

this contains all the info CTF provides (and more).  This sums to 780 bytes,
smaller than the CTF variant.  I skimmed over the info and there's not much
to strip to get to CTF levels, mainly locations.  The strings section also
has a quite large portion for GCC version and arguments, which is 93 bytes.
So overall the DWARF representation should clock in at less than 700 bytes,
closer to 650.

Richard.

> 3. Type Information retrieval and handling:
> CTF type information is organized as a linear array of CTF types. CTF 
> types
> have references to other CTF types. 

Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Eric Botcazou
> If this ends up acked then please add this to ansidecl.h or
> somewhere else global as template:
> 
> template 
> struct push {
>   push (T &);
>   ~push ();
>   T *m_loc;
>   T m_val;
> };
> 
> because it would be a general solution for _all_ shadow=local warnings?!

No, please, the cure would be much worse than the disease.

-- 
Eric Botcazou


Re: [PATCH] Fix -Wshadow=local warnings in passes.c

2019-10-07 Thread Richard Biener
On Thu, Oct 3, 2019 at 5:18 PM Bernd Edlinger  wrote:
>
> Hi,
>
> this fixes -Wshadow=local warnings in passes.c.
> The non-trivial part is that PUSH_INSERT_PASSES_WITHIN
> is used recursively, and shadows the local variable p
> in each invocation.
>
> Fixed by using a helper class that restores the saved content
> of p at the end of the block.
>
> The shadowing variable in ipa_write_summaries can simply be
> removed since the outer variable of the same name has the
> same type and is not live here, this is a trivial change.
>
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

The class seems to be a poor man's way to avoid the warning
while the current use is more clear.  You do

 {
   int i;
   {
 // int i; ... oops, warning
 int prev_i_1 = i;
 i = ...

 // fixup
 i = prev_i_1;
   }

Using a C++ class doesn't make this less obviously worse.
If this ends up acked then please add this to ansidecl.h or
somewhere else global as template:

template 
struct push {
  push (T &);
  ~push ();
  T *m_loc;
  T m_val;
};

because it would be a general solution for _all_ shadow=local warnings?!

Richard.

>
> Thanks
> Bernd.
>


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-10-07 Thread Richard Biener
On Sun, Oct 6, 2019 at 4:38 PM Jan Hubicka  wrote:
>
> > On 9/19/19 2:33 AM, Martin Liška wrote:
> > > Hi.
> > >
> > > Function reordering has been around for quite some time and a naive
> > > implementation was also part of my diploma thesis some time ago.
> > > Currently, the GCC can reorder function based on first execution, which
> > > happens with PGO and LTO of course. Known limitation is that the order
> > > is preserved only partially as various symbols go into different LTRANS 
> > > partitions.
> > >
> > > There has been some research in the area and I would point out the 
> > > Facebook paper
> > > ([1]) and Sony presentation ([2]). Based on that, I decided to make a new 
> > > implementation
> > > in the GCC that does the same (in a proper way). First part of the 
> > > enablement are patches
> > > to ld.bfd and ld.gold that come up with a new section .text.sorted, that 
> > > is always sorted.
> > > There's a snippet from the modified default linker script:
> > Funny, I was doing this via linker scripts circa ~95, in fact that's why
> > we have -ffunction-sections :-)   We started with scripts which
> > post-processed profiling data to create linker scripts for ELF systems.
> >  We had something for HPUX/SOM as well, but I can't remember what
> > mechanism we used, it may have been a gross level sorting using the SOM
> > section sort key mechanism - it only had 128 or 256 keys with a notable
> > amount of them reserved.
> >
> > We had also built a linker with a basic level of interposition circa
> > 1993 and explored various approaches to reordering executables.  I'd
> > just joined the group at the time and was responsible for wiring up
> > stuff on the OS side, but eventually got the "pleasure" of owning the
> > linker server.  A lot of the C3 algorithmic stuff looks similar to what
> > we did.
>
> For reference I am attaching early LTO version from circa 2010 :)
> >
> > Anyway...
> >
> > I don't see anything objectionable in here.  It's noted as an RFC.  Are
> > you interested in pushing this forward for gcc-10?
>
> I think the plan is to get some form of code layout pass into GCC 10.  I
> will test Martin's version on Firefox and see if it has any effect
> here. It is generally quite hard to evaluate those changes (and that is
> the reason why I did not move forward with the version from 2010 - more
> precisely, it kept falling off the shelf for about a decade).
>
> If LLD has support for cgraph annotations and we support LLD, I think we
> should add support for that, too - it will be very useful to compare
> individual implementations.
> I believe there is gold support (off tree?) for that too and something
> similar is also supported by other toolchains like AIX?
>
> One problem with the sections approach is that every section gets
> aligned to the largest code alignment inside the section. Since our
> alignments are (or should be) often cache-line based, we get circa
> 30 bytes of waste for every function, which is quite a lot.
>
> This is why we currently have a way to order functions when outputting them
> and use that with FDO (using Martin's first-execution logic). This has the
> drawback of making the functions flow in that order through late
> optimizations and RTL backend and thus we lose IPA-RA and some
> IP propagation (late pure/const/nothrow discovery).

But you can also fix that by the parallelization GSoC project approach,
decoupling things at RTL expansion IPA-wise and using output order
for the latter (or even do the fragments in GCC itself by refactoring the
output machinery to use function-local "strings", assembling the final
file later).  Refactoring output to handle multiple output "files" at the
same time might also help getting rid of that early-LTO-debug copying
stuff (now that it seems that linker-plugin extensions for partially claiming
files will never happen...)

> So this approach has a drawback, too. It is why I was trying to push
> myself to get gcc to use gas fragments :)
>
> Anyway, all these issues can be solved incrementally - let's see how
> Martin's patch works on Firefox and if we can get it tested elsewhere and
> start from that.
>
> I will take a look into the details.
>
> Honza
> >
> > jeff
> >
>
> Index: tree-pass.h
> ===
> *** tree-pass.h (revision 164689)
> --- tree-pass.h (working copy)
> *** extern struct ipa_opt_pass_d pass_ipa_lt
> *** 467,472 
> --- 467,473 
>   extern struct ipa_opt_pass_d pass_ipa_lto_finish_out;
>   extern struct ipa_opt_pass_d pass_ipa_profile;
>   extern struct ipa_opt_pass_d pass_ipa_cdtor_merge;
> + extern struct ipa_opt_pass_d pass_ipa_func_reorder;
>
>   extern struct gimple_opt_pass pass_all_optimizations;
>   extern struct gimple_opt_pass pass_cleanup_cfg_post_optimizing;
> Index: cgraphunit.c
> ===
> *** cgraphunit.c(revision 164689)
> --- cgraphunit.c(working copy)
> 

Re: [PATCH, OBVIOUS] Fix -Wshadow=local warnings in gcc/[a-c]*.c

2019-10-07 Thread Richard Biener
On Sun, Oct 6, 2019 at 1:24 PM Bernd Edlinger  wrote:
>
> On 10/5/19 8:24 PM, Segher Boessenkool wrote:
> >
> > I am maintainer of combine, I know all about its many problems (it has much
> > deeper problems than this unfortunately).  Thanks for your help though, this
> > is much appreciated, but I do think your current patch is not a step in the
> > right direction.
> >
>
> Hmm, thanks for your open words, these are of course important.  I will not
> commit this under obvious rules since you objected to the patch in general.
>
> What I want to achieve is to make sure that new code is not introducing more
> variable shadowing.  New shadowing variables are introduced by new code,
> unless we have a warning enabled.  And the warning needs to be enabled
> together with -Werror, otherwise it will be overlooked.
>
> For instance I believe MISRA has even stronger coding rules with respect
> to shadowing.
>
> What I tried to do was add -Wshadow=local to the -Werror warnings set
> and do a more or less mechanical change over the whole code base.
> How that mechanical change is done - if at all - needs to be agreed upon.
>
> Currently I have the impression that we do not agree if this warning is to be
> enabled at all.

I think if the current code-base was clean then enabling it would be a
no-brainer.
But I agree that mechanically "fixing" the current code-base, while ending up
with no new introductions of local shadowing, is worse if it makes the current
code-base worse.  Consider your

  for  (int i = )
{
...
   {
  int i = ...;
  ... (*)
   }
}

when editing (*) using 'i' will usually be picking up the correct one.  If you
change the inner to 'i1' then fat-fingering 'i' is easy and bound to happen
and will be silently accepted.  It's also not any more obvious _which_
of the i's is intended since 'i1' is not any more descriptive than 'i'.

If it will only confuse developers familiar with the code, then such a change is
making things worse.

But yes, this means that the quest to enable -Werror=shadow=local becomes
much harder.  Would it be possible to enable it selectively for "clean"
files in the Makefile rules?  White-listing with -Wno-error=... doesn't work
because older host compilers treat unknown options as error there.  More
configury around this might help (like simply not enabling it during stage1
and using a configure macro in the while-listing makefile rules).

This probably means fixing the header file issues first though.

Richard.

>
> Bernd.
>


Re: [21/32] Remove global call sets: LRA

2019-10-07 Thread Uros Bizjak
On Sun, Oct 6, 2019 at 4:32 PM Richard Sandiford
 wrote:
>
> Uros Bizjak  writes:
>  This caused:
> 
>  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91994
> >>
> >> Thanks for reducing & tracking down the underlying cause.
> >>
> >>> This change doesn't work with -mzeroupper.  When -mzeroupper is used,
> >>> upper bits of vector registers are clobbered upon callee return if any
> >>> MM/ZMM registers are used in callee.  Even if YMM7 isn't used, upper
> >>> bits of YMM7 can still be clobbered by vzeroupper when YMM1 is used.
> >>
> >> The problem here really is that the pattern is just:
> >>
> >> (define_insn "avx_vzeroupper"
> >>   [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> >>   "TARGET_AVX"
> >>   "vzeroupper"
> >>   ...)
> >>
> >> and so its effect on the registers isn't modelled at all in rtl.
> >> Maybe one option would be to add a parallel:
> >>
> >>   (set (reg:V2DI N) (reg:V2DI N))
> >>
> >> for each register.  Or we could do something like I did for the SVE
> >> tlsdesc calls, although here that would mean using a call pattern for
> >> something that isn't really a call.  Or we could reinstate clobber_high
> >> and use that, but that's very much third out of three.
> >>
> >> I don't think we should add target hooks to get around this, since that's
> >> IMO papering over the issue.
> >>
> >> I'll try the parallel set thing first.
> >
> > Please note that vzeroupper insertion pass runs after register
> > allocation, so in effect vzeroupper pattern is hidden to the register
> > allocator.
>
> Right, but even post-RA passes rely on the register usage being accurate.
> Same for collect_fn_hard_reg_usage, which is the issue here.
>
> The info collected by collect_fn_hard_reg_usage was always wrong for
> vzeroupper.  What changed with my patch is that we now use that info
> for partly call-clobbered registers as well as "normally" clobbered
> registers.  So this is another instance of a problem that was previously
> being masked by having ix86_hard_regno_call_part_clobbered enforce Win64
> rules for all ABIs.
>
> My first idea of adding:
>
>   (set (reg:V2DI N) (reg:V2DI N))
>
> for all clobbered registers didn't work well because it left previously-
> dead registers upwards exposed (obvious in hindsight).  And the second
> idea of using a fake call would require too many "is this really a call?"
> hacks.
>
> So in the end I went for a subpass that chooses between:
>
>   (set (reg:V2DI N) (reg:V2DI N))
>
> and
>
>   (clobber (reg:V2DI N))
>
> depending on whether register N is live or not.  This fixes the testcase
> and doesn't seem to regress code quality for the tests I've tried.
>
> Tested on x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> 2019-10-06  Richard Sandiford  
>
> gcc/
> PR target/91994
> * config/i386/sse.md (avx_vzeroupper): Turn into a define_expand
> and wrap the unspec_volatile in a parallel.
> (*avx_vzeroupper): New define_insn.  Use a match_parallel around
> the unspec_volatile.
> * config/i386/predicates.md (vzeroupper_pattern): Expect the
> unspec_volatile to be wrapped in a parallel.
> * config/i386/i386-features.c (ix86_add_reg_usage_to_vzeroupper)
> (ix86_add_reg_usage_to_vzerouppers): New functions.
> (rest_of_handle_insert_vzeroupper): Use them to add register
> usage information to the vzeroupper instructions.
>
> gcc/testsuite/
> PR target/91994
> * gcc.target/i386/pr91994.c: New test.

LGTM.

Thanks,
Uros.

> Index: gcc/config/i386/sse.md
> ===
> --- gcc/config/i386/sse.md  2019-09-17 15:27:10.214075253 +0100
> +++ gcc/config/i386/sse.md  2019-10-06 15:19:10.062769500 +0100
> @@ -19622,9 +19622,16 @@ (define_insn "*avx_vzeroall"
> (set_attr "mode" "OI")])
>
>  ;; Clear the upper 128bits of AVX registers, equivalent to a NOP
> -;; if the upper 128bits are unused.
> -(define_insn "avx_vzeroupper"
> -  [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)]
> +;; if the upper 128bits are unused.  Initially we expand the instructions
> +;; as though they had no effect on the SSE registers, but later add SETs and
> +;; CLOBBERs to the PARALLEL to model the real effect.
> +(define_expand "avx_vzeroupper"
> +  [(parallel [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> +  "TARGET_AVX")
> +
> +(define_insn "*avx_vzeroupper"
> +  [(match_parallel 0 "vzeroupper_pattern"
> + [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
>"TARGET_AVX"
>"vzeroupper"
>[(set_attr "type" "sse")
> Index: gcc/config/i386/predicates.md
> ===
> --- gcc/config/i386/predicates.md   2019-09-10 19:56:45.337178032 +0100
> +++ gcc/config/i386/predicates.md   2019-10-06 15:19:10.054769556 +0100
> @@ -1441,8 +1441,9 @@ (define_predicate "vzeroall_pattern"
>
>  ;; return true if OP is a vzeroupper pattern.