Re: [Ping^2, Patch, Fortran] PR98301 Re: RANDOM_INIT() and coarray Fortran

2021-05-21 Thread Jerry D via Gcc-patches

yes, please commit

On 5/21/21 8:08 AM, Steve Kargl via Fortran wrote:

On Fri, May 21, 2021 at 10:09:02AM +0200, Andre Vehreschild wrote:

Ping, ping!

Please find attached a rebased version of the patch for the RANDOM_INIT issue
with coarray Fortran. Nothing changed to the previous version, just rebased to
current master.

Regtested fine on x86_64-linux/f33. Ok for trunk?


I think you've down your due diligence with 2 pings.
I would commit.





[r12-989 Regression] FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for warnings, line 98) on Linux/x86_64

2021-05-21 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

325aa13996bafce0c4927876c315d1fa706d9881 is the first bad commit
commit 325aa13996bafce0c4927876c315d1fa706d9881
Author: Thomas Schwinge 
Date:   Fri May 21 08:51:47 2021 +0200

[OpenACC privatization] Reject 'static', 'external' in blocks [PR90115]

caused

FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O0   (test for warnings, line 98)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O1   (test for warnings, line 98)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O2   (test for warnings, line 98)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions   (test for warnings, line 98)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g   (test for warnings, line 98)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os   (test for warnings, line 134)
FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -Os   (test for warnings, line 98)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-989/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check 
RUNTESTFLAGS="fortran.exp=libgomp.oacc-fortran/privatized-ref-2.f90 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH 06/57] rs6000: Add helper functions for parsing

2021-05-21 Thread Segher Boessenkool
Hi!

On Fri, May 21, 2021 at 03:56:09PM -0500, Bill Schmidt wrote:
> On 5/21/21 1:51 PM, Segher Boessenkool wrote:
> >>+/* Exit codes for the shell.  */
> >>+enum exit_codes {
> >>+  EC_INTERR
> >>+};
> >Please define this with some specified value (so append " = 1" or such),
> >or just write "exit(1);" etc. directly.  As it is, a compiler is free to
> >do "exit(0);" for this error value!  Not good.
> >
> >All these exit codes are externally visible, so please just make them
> >explicit.  There isn't much point in having symbolic names for it
> >either, because users will see the raw numbers (reported by their
> >shell), instead.
> >
> >Most generator programs allow to use fatal() etc., this is a much richer
> >environment, no need to reinvent stuff?  Include "errors.h" and link
> >with error.o, and that is all you need AFAIK.
> 
> I'll dump the exit codes, which were an early idea that didn't prove 
> useful in the end.  exit (1) will suffice...
> 
> However, I'd like to keep my approach to diagnostics.  I wrote it this 
> way so that I would have a centralized place to produce the file, line, 
> and column information, which was really helpful while debugging these 
> large input files.  Putting wrappers around the errors.c functions seems 
> at least as messy.

Yes, wrappers is a no-go.  But you could just have added the features
you need to the generic code?  Was there a technical reason not to do
that?  It sounds useful in many places, not just here.

> >>+static int
> >>+match_integer ()
> >>+{
> >>+  int startpos = pos;
> >>+  if (linebuf[pos] == '-')
> >>+safe_inc_pos ();
> >>+
> >>+  int lastpos = pos - 1;
> >>+  while (isdigit (linebuf[lastpos + 1]))
> >>+if (++lastpos >= LINELEN - 1)
> >>+  {
> >>+   (*diag) ("line length overrun in match_integer.\n");
> >>+   exit (EC_INTERR);
> >>+  }
> >>+
> >>+  if (lastpos < pos)
> >>+return MININT;
> >>+
> >>+  pos = lastpos + 1;
> >>+  char *buf = (char *) malloc (lastpos - startpos + 2);
> >>+  memcpy (buf, [startpos], lastpos - startpos + 1);
> >>+  buf[lastpos - startpos + 1] = '\0';
> >>+
> >>+  int x;
> >>+  sscanf (buf, "%d", );
> >>+  return x;
> >>+}
> >Can't you just use strtol?
> I could, but (a) it's more overhead because of tracking the base, and 

How so?  If you give base 0 it handles all of decimal, hexadecimal, and
octal (with the usual 0x and 0 prefixes).

> (b) I then have to calculate lastpos afterwards.  /shrug

It can stores it in where its second arg points to!

===
  char *endp;
  const char *str = where_the_number_starts_or_some_whitespace_before_it;

  errno = 0;
  long n = strtol (str, , 0);
  if (!*str || errno)
whoops (...);
===

... and now endp points to the first char that is not part of the
number.  It's one of the cases where errno is *helpful* :-)

> >>+  int lastpos = pos - 1;
> >>+  while (linebuf[lastpos + 1] != ']')
> >>+if (++lastpos >= LINELEN - 1)
> >Please don't use side effects in "if" conditions.
> 
> Really?  Is it actually better to write
> 
>   while (linebuf[lastpos + a] != ']')
>     {
>   ++lastpos;
>   if (lastpos >= LINELEN - 1)
>     ...
>     }

  for (int lastpost = pos - 1; linebuf[lastpos + 1] != ']'; lastpos++)
{
  if (lastpos >= LINELEN - 1)
...

  ...
}

> Frankly I don't see it...and I don't see anything in the GNU or GCC 
> coding conventions about this.  I'd rather keep what I have.

The most used side effect in conditionals is assignment.  This saves a
few keystrokes, and maybe a whole line in the code, but it makes the
code a lot harder to comprehend.

In your code it does not matter so very much, since you exit if there
is an error anyway, but it makes it a lot harder to verify what the code
does in all cases, and to check that that is what is wanted.

> >>+  {
> >>+   (*diag) ("line length overrun.\n");
> >>+   exit (EC_INTERR);
> >>+  }
> >I don't think you shoulod check for line length overrun in any of these
> >functions, btw?  Just check where you read them in, and that is plenty?
> 
> Yes -- I think if I check in advance_line for a line that doesn't end in 
> \n, that will make a lot of those things superfluous.
> 
> I've been a little reluctant to do that, since eventually I want to 
> support escape-newline processing to avoid long input lines, but I can 
> still work around that when I get to it.

Aha, that explains.  Yeah you should be able to check the length again
when concatenating two lines, and that is all you need?

Checking for errors repeatedly is so error-prone :-(


Segher


Re: [PATCH 10/57] rs6000: Red-black tree implementation for balanced tree search

2021-05-21 Thread Segher Boessenkool
On Tue, Apr 27, 2021 at 10:32:45AM -0500, Bill Schmidt via Gcc-patches wrote:
> 2021-03-03  Bill Schmidt  
> 
> gcc/
>   * config/rs6000/rbtree.c: New file.
>   * config/rs6000/rbtree.h: New file.

> +  struct rbt_string_node *nodeptr
> += (struct rbt_string_node *) malloc (sizeof (struct rbt_string_node));

malloc (sizeof *nodeptr)  helps catch foolish errors.  This is one of
the secondary reasons not too use superfluous parens with the sizeof
operator: when you see them you know there is probably something not
quite right (like here, should not repeat the type, it is too easy to
make errors that way).

> + /* Gender-neutral formations are awkward, so let's be fair. ;-)
> +("Parent-sibling" is just awful.)  */
> + struct rbt_string_node *aunt = curr->par->par->left;

"unt" is nice imo :-)

> --- /dev/null
> +++ b/gcc/config/rs6000/rbtree.h

> +/* Red-black binary search tree on strings.  Presently we don't support
> +   deletes; only insert/find operations are implemented.  */
> +enum rbt_color
> +  {
> +RBT_BLACK,
> +RBT_RED
> +  };

You you need this in the header?  You could just say
  enum rbt_color;
and nothing more?

Okay for trunk, whatever you decide to do for these.  Thanks!


Segher


Re: [PATCH 09/57] rs6000: Add functions for matching types, part 3 of 3

2021-05-21 Thread Segher Boessenkool
On Tue, Apr 27, 2021 at 10:32:44AM -0500, Bill Schmidt via Gcc-patches wrote:
>  static int
>  match_const_restriction (typeinfo *typedata)
>  {
> +  int oldpos = pos;
> +  if (linebuf[pos] == '<')

> +  else if (linebuf[pos] == '{')

> +  assert (linebuf[pos] == '[');

Can you factor this please?  The three cases do exactly the same?  And
if not, it would be much clearer if you make the difference explicit.
Pass the two '<', '>' as arguments, and you can use that to
differentiate then, even.

Okay for trunk either way.  Thanks!


Segher


Re: [PATCH 08/57] rs6000: Add functions for matching types, part 2 of 3

2021-05-21 Thread Segher Boessenkool
On Tue, Apr 27, 2021 at 10:32:43AM -0500, Bill Schmidt via Gcc-patches wrote:

Oh, hrm, I deleted it all?  I guess there was nothing to complain about
^W^Wcomment on then!  Okay for trunk.  Thanks!


Segher


Re: [PATCH 06/57] rs6000: Add helper functions for parsing

2021-05-21 Thread Bill Schmidt via Gcc-patches

On 5/21/21 1:51 PM, Segher Boessenkool wrote:

Hi!

On Tue, Apr 27, 2021 at 10:32:41AM -0500, Bill Schmidt via Gcc-patches wrote:

+/* Used as a sentinel for range constraints on integer fields.  No field can
+   be 32 bits wide, so this is a safe sentinel value.  */
+#define MININT INT32_MIN

INT32_MIN is for value of type int32_t.  You use it for "int" though.
There is INT_MIN just for that :-)

OK.



+/* Exit codes for the shell.  */
+enum exit_codes {
+  EC_INTERR
+};

Please define this with some specified value (so append " = 1" or such),
or just write "exit(1);" etc. directly.  As it is, a compiler is free to
do "exit(0);" for this error value!  Not good.

All these exit codes are externally visible, so please just make them
explicit.  There isn't much point in having symbolic names for it
either, because users will see the raw numbers (reported by their
shell), instead.

Most generator programs allow to use fatal() etc., this is a much richer
environment, no need to reinvent stuff?  Include "errors.h" and link
with error.o, and that is all you need AFAIK.


I'll dump the exit codes, which were an early idea that didn't prove 
useful in the end.  exit (1) will suffice...


However, I'd like to keep my approach to diagnostics.  I wrote it this 
way so that I would have a centralized place to produce the file, line, 
and column information, which was really helpful while debugging these 
large input files.  Putting wrappers around the errors.c functions seems 
at least as messy.





+static void
+consume_whitespace ()

Please say (void), empty argument lists have different meanings in
different language versions, and it looks like you just forgot to fill
int the type.  Besides, it is just "what we do" :-)


OK.




+/* Get the next nonblank, noncomment line, returning 0 on EOF, 1 otherwise.  */
+static int
+advance_line (FILE *file)
+{
+  while (1)
+{
+  /* Read ahead one line and check for EOF.  */
+  if (!fgets (linebuf, sizeof(linebuf), file))

sizeof is an operator, not a function, and even if it was a function it
would have a space before the opening parenthesis :-)


OK.




+   return 0;

So it also returns 0 on any error?  This may not be a good idea.
match_identifier returns 0 when the string doesn't match a legal 
identifier, as described in the function note.  Otherwise the string is 
copied into a dynamic buffer and its address is returned.  Not clear to 
me what you're objecting to here.

+/* Match an integer and return its value, or MININT on failure.  */
+static int
+match_integer ()
+{
+  int startpos = pos;
+  if (linebuf[pos] == '-')
+safe_inc_pos ();
+
+  int lastpos = pos - 1;
+  while (isdigit (linebuf[lastpos + 1]))
+if (++lastpos >= LINELEN - 1)
+  {
+   (*diag) ("line length overrun in match_integer.\n");
+   exit (EC_INTERR);
+  }
+
+  if (lastpos < pos)
+return MININT;
+
+  pos = lastpos + 1;
+  char *buf = (char *) malloc (lastpos - startpos + 2);
+  memcpy (buf, [startpos], lastpos - startpos + 1);
+  buf[lastpos - startpos + 1] = '\0';
+
+  int x;
+  sscanf (buf, "%d", );
+  return x;
+}

Can't you just use strtol?
I could, but (a) it's more overhead because of tracking the base, and 
(b) I then have to calculate lastpos afterwards.  /shrug



+static const char *
+match_to_right_bracket ()

This needs a function comment.

OK.



+{
+  int lastpos = pos - 1;
+  while (linebuf[lastpos + 1] != ']')
+if (++lastpos >= LINELEN - 1)

Please don't use side effects in "if" conditions.


Really?  Is it actually better to write

  while (linebuf[lastpos + a] != ']')
    {
  ++lastpos;
  if (lastpos >= LINELEN - 1)
    ...
    }

Frankly I don't see it...and I don't see anything in the GNU or GCC 
coding conventions about this.  I'd rather keep what I have.



+  {
+   (*diag) ("line length overrun.\n");
+   exit (EC_INTERR);
+  }

I don't think you shoulod check for line length overrun in any of these
functions, btw?  Just check where you read them in, and that is plenty?


Yes -- I think if I check in advance_line for a line that doesn't end in 
\n, that will make a lot of those things superfluous.


I've been a little reluctant to do that, since eventually I want to 
support escape-newline processing to avoid long input lines, but I can 
still work around that when I get to it.






+  if (lastpos < pos)
+return 0;
+
+  char *buf = (char *) malloc (lastpos - pos + 2);
+  memcpy (buf, [pos], lastpos - pos + 1);
+  buf[lastpos - pos + 1] = '\0';
+
+  pos = lastpos + 1;
+  return buf;
+}

Are there no utility routines you can use?  It would be useful to have
something that all gen* can use (something less bare than what there is
now...)


I didn't find anything great as I was poking around, hence I wrote my 
own low level utilities.  It goes back to my desire to track line/pos 
information for debug.


Thanks for the review!

Bill




Segher


Re: [PATCH 07/57] rs6000: Add functions for matching types, part 1 of 3

2021-05-21 Thread Segher Boessenkool
Hi!

On Tue, Apr 27, 2021 at 10:32:42AM -0500, Bill Schmidt via Gcc-patches wrote:
> +enum void_status {

The { goes on a new line, in our coding style.

> +struct typeinfo {

The same is true in structs (but that is violated in some places).

> +  char isvoid;

bool?  I'm not asking you to change it, but, is there a reason to use an
integer type?

> +  int val1;
> +  int val2;

Lol :-)  Oh the suspense!

> +/* Match one of the allowable base types.  Consumes one token unless the
> +   token is "long", which must be paired with a second "long".  Optionally
> +   consumes a following '*' token for pointers.  Return 1 for success,
> +   0 for failure.  */
> +static int
> +match_basetype (typeinfo *typedata)
> +{
> +  return 1;
> +}

This will be implemented later?

> +/* A const int argument may be restricted to certain values.  This is
> +   indicated by one of the following occurring after the "int' token:
> +
> +restricts the constant to x bits, interpreted as unsigned
> +  restricts the constant to the inclusive range [x,y]
> + [x,y] restricts the constant to the inclusive range [x,y],
> +but only applies if the argument is constant.
> + {x,y} restricts the constant to one of two values, x or y.
> +
> +   Here x and y are integer tokens.  Note that the "const" token is a
> +   lie when the restriction is [x,y], but this simplifies the parsing
> +   significantly and is hopefully forgivable.
> +
> +   Return 1 for success, else 0.  */
> +static int
> +match_const_restriction (typeinfo *typedata)
> +{
> +  return 1;
> +}

And this.

Okay for trunk with the trivialities fixed.  Thanks!


Segher


[PATCH] c++: access for hidden friend of nested class template [PR100502]

2021-05-21 Thread Patrick Palka via Gcc-patches
Here, during ahead of time access checking for the private member
EnumeratorRange::end_reached_ in the hidden friend f, we're triggering
the the assert in enforce_access that verifies we're not trying to add a
dependent access check to TI_DEFERRED_ACCESS_CHECKS.

The special thing about this class member access expression is that it's
considered to be non-type-dependent (so finish_class_member_access_expr
doesn't exit early at template parse time), and then accessible_p
rejects the access (so we don't exit early from enforce access either,
and end up triggering the assert).  I think we're correct to reject it
because a hidden friend is not a member function, so [class.access.nest]
doesn't apply, and also a hidden friend of a nested class is not a
friend of the enclosing class.  (Note that Clang accepts the testcase
and MSVC and ICC reject it.)

This patch relaxes the problematic assert in enforce_access to check
dependent_scope_p instead of uses_template_parms, which is the more
accurate notion of dependence we care about.  This change alone is
sufficient to fix the ICE, but we now end up diagnosing each access
twice, once at substitution time and again from TI_DEFERRED_ACCESS_CHECKS.
So this patch additionally disables ahead of time access checking
during the call to lookup_member from finish_class_member_access_expr;
we're going to check the same access again at substitution time anyway.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?  For GCC 11, should we just backport the enforce_access hunk?

PR c++/100502

gcc/cp/ChangeLog:

* semantics.c (enforce_access): Relax assert about the type
depedence of the DECL_CONTEXT of the declaration.
* typeck.c (finish_class_member_access_expr): Disable ahead
of time access checking during the member lookup.

gcc/testsuite/ChangeLog:

* g++.dg/template/access37.C: New test.
* g++.dg/template/access37a.C: New test.
---
 gcc/cp/semantics.c|  2 +-
 gcc/cp/typeck.c   |  6 ++
 gcc/testsuite/g++.dg/template/access37.C  | 26 +++
 gcc/testsuite/g++.dg/template/access37a.C |  6 ++
 4 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access37.C
 create mode 100644 gcc/testsuite/g++.dg/template/access37a.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 0d590c318fb..0de14316bba 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -365,7 +365,7 @@ enforce_access (tree basetype_path, tree decl, tree 
diag_decl,
   check here.  */
gcc_assert (!uses_template_parms (decl));
if (TREE_CODE (decl) == FIELD_DECL)
- gcc_assert (!uses_template_parms (DECL_CONTEXT (decl)));
+ gcc_assert (!dependent_scope_p (DECL_CONTEXT (decl)));
 
/* Defer this access check until instantiation time.  */
deferred_access_check access_check;
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 703ddd3cc7a..86cf26b9c84 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3201,9 +3201,15 @@ finish_class_member_access_expr (cp_expr object, tree 
name, bool template_p,
{
  /* Look up the member.  */
  access_failure_info afi;
+ if (processing_template_decl)
+   /* We're going to redo this member lookup at instantiation
+  time, so don't bother checking access ahead of time here.  */
+   push_deferring_access_checks (dk_no_check);
  member = lookup_member (access_path, name, /*protect=*/1,
  /*want_type=*/false, complain,
  );
+ if (processing_template_decl)
+   pop_deferring_access_checks ();
  afi.maybe_suggest_accessor (TYPE_READONLY (object_type));
  if (member == NULL_TREE)
{
diff --git a/gcc/testsuite/g++.dg/template/access37.C 
b/gcc/testsuite/g++.dg/template/access37.C
new file mode 100644
index 000..92fed3e97cb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access37.C
@@ -0,0 +1,26 @@
+// PR c++/100502
+
+template 
+struct EnumeratorRange {
+  struct Iterator {
+EnumeratorRange range_;
+
+friend void f(Iterator i) {
+  i.range_.end_reached_; // { dg-error "private" }
+  i.range_.EnumeratorRange::end_reached_; // { dg-error "private" }
+  _.end_reached_; // { dg-error "private" }
+  _.EnumeratorRange::end_reached_; // { dg-error "private" }
+}
+  };
+
+ private:
+  bool end_reached_;
+#if DECLARE_FRIEND
+  friend void f(Iterator);
+#endif
+};
+
+int main() {
+  EnumeratorRange::Iterator i;
+  f(i);
+}
diff --git a/gcc/testsuite/g++.dg/template/access37a.C 
b/gcc/testsuite/g++.dg/template/access37a.C
new file mode 100644
index 000..4ce1b2718a0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access37a.C
@@ -0,0 +1,6 @@
+// PR c++/100502
+// { dg-additional-options "-DDECLARE_FRIEND 

[PATCH] c++: Fix reference NTTP binding to noexcept fn [PR97420]

2021-05-21 Thread Patrick Palka via Gcc-patches
Here, in C++17 mode, convert_nontype_argument_function is rejecting
binding a non-noexcept function reference template parameter to a
noexcept function (encoded as the template argument '*(int (&) (int)) ').

The first roadblock to making this work is that the argument is wrapped
an an implicit INDIRECT_REF, so we need to unwrap it before calling
strip_fnptr_conv.

The second roadblock is that the NOP_EXPR cast converts from a function
pointer type to a reference type while simultaneously removing the
noexcept qualification, and fnptr_conv_p doesn't consider this cast to
be a function pointer conversion.  This patch fixes this by making
fnptr_conv_p treat REFERENCE_TYPEs and POINTER_TYPEs interchangeably.

Finally, in passing, this patch also simplifies noexcept_conv_p by
removing a bunch of redundant checks already performed by its only
caller fnptr_conv_p.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

PR c++/97420
* cvt.c (noexcept_conv_p): Remove redundant checks and simplify.
(fnptr_conv_p): Don't call non_reference.  Use INDIRECT_TYPE_P
instead of TYPE_PTR_P.
* pt.c (convert_nontype_argument_function): Look through
implicit INDIRECT_REFs before calling strip_fnptr_conv.

gcc/testsuite/ChangeLog:

PR c++/97420
* g++.dg/cpp0x/noexcept68.C: New test.
---
 gcc/cp/cvt.c| 33 +++--
 gcc/cp/pt.c |  5 +++-
 gcc/testsuite/g++.dg/cpp0x/noexcept68.C |  8 ++
 3 files changed, 21 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/noexcept68.C

diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 7fa6e8df52b..582d03f61f5 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -2089,30 +2089,15 @@ noexcept_conv_p (tree to, tree from)
   if (!flag_noexcept_type)
 return false;
 
-  tree t = non_reference (to);
-  tree f = from;
-  if (TYPE_PTRMEMFUNC_P (t)
-  && TYPE_PTRMEMFUNC_P (f))
-{
-  t = TYPE_PTRMEMFUNC_FN_TYPE (t);
-  f = TYPE_PTRMEMFUNC_FN_TYPE (f);
-}
-  if (TYPE_PTR_P (t)
-  && TYPE_PTR_P (f))
-{
-  t = TREE_TYPE (t);
-  f = TREE_TYPE (f);
-}
-  tree_code code = TREE_CODE (f);
-  if (TREE_CODE (t) != code)
+  if (TREE_CODE (to) != TREE_CODE (from))
 return false;
-  if (code != FUNCTION_TYPE && code != METHOD_TYPE)
+  if (!FUNC_OR_METHOD_TYPE_P (from))
 return false;
-  if (!type_throw_all_p (t)
-  || type_throw_all_p (f))
+  if (!type_throw_all_p (to)
+  || type_throw_all_p (from))
 return false;
-  tree v = build_exception_variant (f, NULL_TREE);
-  return same_type_p (t, v);
+  tree v = build_exception_variant (from, NULL_TREE);
+  return same_type_p (to, v);
 }
 
 /* Return true iff FROM can convert to TO by a function pointer conversion.  */
@@ -2120,7 +2105,7 @@ noexcept_conv_p (tree to, tree from)
 bool
 fnptr_conv_p (tree to, tree from)
 {
-  tree t = non_reference (to);
+  tree t = to;
   tree f = from;
   if (TYPE_PTRMEMFUNC_P (t)
   && TYPE_PTRMEMFUNC_P (f))
@@ -2128,8 +2113,8 @@ fnptr_conv_p (tree to, tree from)
   t = TYPE_PTRMEMFUNC_FN_TYPE (t);
   f = TYPE_PTRMEMFUNC_FN_TYPE (f);
 }
-  if (TYPE_PTR_P (t)
-  && TYPE_PTR_P (f))
+  if (INDIRECT_TYPE_P (t)
+  && INDIRECT_TYPE_P (f))
 {
   t = TREE_TYPE (t);
   f = TREE_TYPE (f);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 99a9ee5ade2..f3fa9c192ad 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6622,7 +6622,10 @@ convert_nontype_argument_function (tree type, tree expr,
   if (value_dependent_expression_p (fn))
 goto accept;
 
-  fn_no_ptr = strip_fnptr_conv (fn);
+  fn_no_ptr = fn;
+  if (REFERENCE_REF_P (fn_no_ptr))
+fn_no_ptr = TREE_OPERAND (fn_no_ptr, 0);
+  fn_no_ptr = strip_fnptr_conv (fn_no_ptr);
   if (TREE_CODE (fn_no_ptr) == ADDR_EXPR)
 fn_no_ptr = TREE_OPERAND (fn_no_ptr, 0);
   if (BASELINK_P (fn_no_ptr))
diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept68.C 
b/gcc/testsuite/g++.dg/cpp0x/noexcept68.C
new file mode 100644
index 000..086899a4a19
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/noexcept68.C
@@ -0,0 +1,8 @@
+// PR c++/97420
+// { dg-do compile { target c++11 } }
+
+int f(int) noexcept;
+template void A();
+int main() {
+  A();
+}
-- 
2.32.0.rc0



Re: [committed] Fix minor CC0 removal fallout on cr16

2021-05-21 Thread H.J. Lu via Gcc-patches
On Wed, May 5, 2021 at 8:17 AM Jeff Law via Gcc-patches
 wrote:
>
>
> cr16 failed to build due to remnants of CC0 support (NOTICE_UPDATE_CC).
> This removes the macro and obvious bits necessary to support it.
> Committed to the trunk.
>

cr16-elf won't build:

/export/gnu/import/git/sources/gcc/gcc/config/cr16/cr16.md:797:16:
error: unknown rtx code `cc0'
/export/gnu/import/git/sources/gcc/gcc/config/cr16/cr16.md:797:19:
note: following context is `))]'
make[2]: *** [Makefile:2547: s-preds] Error 1

-- 
H.J.


[OpenACC privatization] Reject 'static', 'external' in blocks [PR90115] (was: [Patch][OG10] omp-low.c: Avoid offload-target lto1 is-missing error by not-privatizing TREE_READONLY vars)

2021-05-21 Thread Thomas Schwinge
Hi!

Might be interesting to also research the comments at the end of the
email from an OpenMP perspective?

On 2020-07-23T17:10:54+0100, Julian Brown  wrote:
> On Thu, 16 Jul 2020 15:53:54 +0200
> Tobias Burnus  wrote:
>
>> [This is an OpenACC issue but I would not completely surprised if
>> something similar could occur for OpenMP offloading as well.
>> However, the patch is for an OpenACC-specific function.]

..., and this issue only occurs in OpenACC-only handling, so shouldn't
affect OpenMP.

>> This issue occurs for libgomp.oacc-fortran/privatized-ref-2.f90, for
>> which on the device lto1 complains:
>> lto1: fatal error: /tmp/ccEGJTZN.o: section A.13.1.21 is missing
>> Here, "A.13" is a TREE_STATIC, TREE_READONLY array generated by the
>> Fortran front-end and containing the array-constructor values, i.e.
>> RHS of: array = [(-2*i, i = 1, size(array))] That testcase works on
>> the trunk or on the OG10 (= devel/omp/gcc-10) branch if one reverts
>> the patch "Re-do OpenACC private variable resolution"
>> https://gcc.gnu.org/g:2f4b477223fddb84f66e494eb88d1defbd5e04a2 which
>> is scheduled but not yet submitted for mainline inclusion.

(Now it's in,
<87fsyfzzk4.fsf@euler.schwinge.homeip.net">http://mid.mail-archive.com/87fsyfzzk4.fsf@euler.schwinge.homeip.net>
"openacc: Add support for gang local storage allocation in shared
memory", etc.)

>> The
>> offloading variable table contains the variable as "A.13.10" (which
>> works fine) and the problem-causing patch causes that the code
>> .UNIQUE (OACC_PRIVATE, 0, 0, , ); gets inserted (via the
>> then-added make_oacc_private_marker in omp-low.c). Here, the decl for
>> 'A.13' does not have a varpool_node entry – and it is not streamed
>> out as separate entity. (This IFN_ is then later processed by the
>> target lto1 via omp-offload.c's execute_oacc_device_lower – where the
>> asm_name "A.13.1" appears.)
>>
>> [While I do not completely understand why the target LTO does
>> not contain the symbol, I think the following still makes sense.
>> (I do understand why the offload var table does not contain it.)]
>>
>>
>> If the variable is TREE_READONLY, there is no need to pass it
>> through the variable-privatization bits.

(I have not yet thought in detail about 'TREE_READONLY' specifically.)

>> The current check is for VAR_P and TREE_ADDRESSABLE. For the fix,
>> one could use:
>>!TREE_READONLY
>> or
>>!(TREE_READONLY && TREE_STATIC)
>> or
>>!(TREE_READONLY && (TREE_STATIC || DECL_EXTERNAL)
>> I am not sure what makes more sense. I initially used the first
>> version and then moved to the last. Thoughts?
>
> I don't know when we'd have TREE_READONLY and DECL_EXTERNAL in a
> situation where it made sense to mark the decl private using this
> mechanism -- but I'll defer to your judgement!

Such things have come up in my review of the OpenACC privatization work
('libgomp.oacc-fortran/privatized-ref-2.f90' as discussed here,
'libgomp.oacc-c-c++-common/static-variable-1.c', and a few others).

There's (a) the case of user code, and indeed code like:

#pragma acc parallel
  {
extern int e;
static int s;
[...]
  }

... are probably fine to reject.  We shall discuss that in
 "C/C++ 'static'
variables" (only visible to members of the GitHub OpenACC organization).

And then there's (b) the case of code that GCC itself synthesizes -- like
the example Tobias detailed above.  We ought to handle that "fine".
Again, I have not yet tought in detail what "fine" means here, so let's
default on doing what we've been doing before, that is, disable the new
OpenACC privatization for (b) (and simply for (a), too).

I've pushed "[OpenACC privatization] Reject 'static', 'external' in
blocks [PR90115]" to master branch in commit
325aa13996bafce0c4927876c315d1fa706d9881, see attached.


Relatedly, (c) we also need to establish and verify what's to happen for
code like:

extern int e;
static int s;
[...]
#pragma acc parallel
  {
#pragma acc loop gang private(e, s)
for ([...])
  [...]
  }

Is the idea that these are to be "properly privatized"?  The gang-private
instances of 'e', 's' lose the 'extern', 'static' properties?  (... and
similar for any other funny such properties that may exist?  I'm sure
Fortran has something to offer, too: 'save', etc.?)

Or, are we to get gang-private instances of 'e', 's', keeping the
'extern', 'static' properties?  ("Fun!")


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 325aa13996bafce0c4927876c315d1fa706d9881 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 21 May 2021 08:51:47 +0200
Subject: [PATCH] [OpenACC privatization] Reject 'static', 'external' in blocks
 [PR90115]

	gcc/
	PR middle-end/90115
	* omp-low.c (oacc_privatization_candidate_p): 

Re: [PATCH] rs6000: MMA test case ICEs using -O3 [PR99842]

2021-05-21 Thread Peter Bergner via Gcc-patches
Getting back to this now that trunk is open again...

On 3/31/21 2:17 PM, Segher Boessenkool wrote:
> On Tue, Mar 30, 2021 at 06:49:29PM -0500, Peter Bergner wrote:
>> The mma_assemble_input_operand predicate does not accept reg+reg indexed
>> addresses which can lead to ICEs.  The problem is that the quad_address_p
>> function only accepts reg+offset addresses that are valid for quad word
>> accesses, but not reg+reg addresses which are also valid for quad word
>> accesses when dealing with vector types.  The solution used here is to
>> call memory_operand, which uses rs6000_legitimate_address_p to ensure
>> the address is valid.  For reg+offset addresses, it uses quad_address_p like
>> before, but for reg+reg addresses, it calls legitimate_indexed_address_p
>> addresses which fixes this specific ICE.
> 
> quad_address_p should really only be used for lq/stq (and their prefixed
> forms).  Those insns have very different semantics: they are atomic,
> while vector loads and stores are not; and the prefixed form has some
> special semantics (it swaps halves on LE).

quad_address_p has since day one been used for both lq/stq as well as
for vector loads/stores.  I think the "quad" here is meant to describe
that the address will be used for a dform quad word memory access
(eg, lq/stq, lxv/stxv and lxvp/stxvp) which use DQ offsets.  I don't
think quad_address_p cares whether the address is being used for an
atomic access or not.  That restriction is done elsewhere it seems.



>> diff --git a/gcc/config/rs6000/predicates.md 
>> b/gcc/config/rs6000/predicates.md
>> index 859af75dfbd..e48c6eee19e 100644
>> --- a/gcc/config/rs6000/predicates.md
>> +++ b/gcc/config/rs6000/predicates.md
>> @@ -1171,8 +1171,7 @@
>>  (define_special_predicate "mma_assemble_input_operand"
>>(match_test "(mode == V16QImode
>>  && (vsx_register_operand (op, mode)
>> -|| (MEM_P (op)
>> -&& quad_address_p (XEXP (op, 0), mode, false"))
>> +|| memory_operand (op, mode)))"))
> 
> This seems like it might need reloads very often, because it allows way
> too much?  Or, can you just use reg_or_mem_operand?

Yeah, I think the simplest change is to just add an additional test for
indexed form addresses here.  That keeps the predicate tight on what it
will and won't accept.


>> +/* { dg-options "-O3 -mdejagnu-cpu=power10 -w" } */
>
> (Please document what warning you want to shut up.  Just a few words is
> plenty.)

Ok, I updated the comment to mention the test case was noisy due to creduce
and silenced just the one warning that was being emitted.



rs6000: MMA test case ICEs using -O3 [PR99842]

The mma_assemble_input_operand predicate does not accept reg+reg indexed
addresses which can lead to ICEs.  The lxv and lxvp instructions have
indexed forms (lxvx and lxvpx), so the simple solution is to just allow
indexed addresses in the predicate.

This passed bootstrap and regtesting on powerpc64le-linux with no regressions.

Ok for trunk?

The same issue exists in GCC 10 & 11.  Ok for the release branches too after
it has baked on trunk for a little while?

Peter


gcc/
PR target/99842
* config/rs6000/predicates.md(mma_assemble_input_operand): Allow
indexed form addresses.

gcc/testsuite/
PR target/99842
* g++.target/powerpc/pr99842.C: New.

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..121cbf14810 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1172,7 +1172,8 @@ (define_special_predicate "mma_assemble_input_operand"
   (match_test "(mode == V16QImode
&& (vsx_register_operand (op, mode)
|| (MEM_P (op)
-   && quad_address_p (XEXP (op, 0), mode, false"))
+   && (indexed_or_indirect_address (XEXP (op, 0), mode)
+   || quad_address_p (XEXP (op, 0), mode, false)"))
 
 ;; Return 1 if this operand is valid for an MMA disassemble insn.
 (define_predicate "mma_disassemble_output_operand"
diff --git a/gcc/testsuite/g++.target/powerpc/pr99842.C 
b/gcc/testsuite/g++.target/powerpc/pr99842.C
new file mode 100644
index 000..d84de3b4570
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr99842.C
@@ -0,0 +1,188 @@
+/* PR target/99842 */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O3 -mdejagnu-cpu=power10 -Wno-return-type" } */
+
+/* Verify we do not ICE on the following noisy creduced test case.  */
+
+enum { a, b, c, d };
+template  struct e;
+template  struct e {
+  typedef h f;
+};
+template  struct ac;
+template  struct ac : ac {};
+template  struct l;
+template  class n;
+template  class o;
+template  class ag;
+template  class af;
+template  struct ad;
+template  struct an {
+  typedef n::ai, ac::aj> f;
+};
+template  struct am { typedef o f; };
+template ::ao,
+  typename = typename ac::av>
+struct ak;

[committed] openmp: Fix up firstprivate+lastprivate clause handling [PR99928]

2021-05-21 Thread Jakub Jelinek via Gcc-patches
Hi!

The C/C++ clause splitting happens very early during construct parsing,
but only the FEs later on handle possible instantiations, non-static
member handling and array section lowering.
In the OpenMP 5.0/5.1 rules, whether firstprivate is added to combined
target depends on whether it isn't also mentioned in lastprivate or map
clauses, but unfortunately I think such checks are much better done only
when the FEs perform all the above mentioned changes.
So, this patch arranges for the firstprivate clause to be copied or moved
to combined target construct (as before), but sets flags on that clause,
which tell the FE *finish_omp_clauses and the gimplifier it has been added
only conditionally and let the FEs and gimplifier DTRT for these.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-05-21  Jakub Jelinek  

PR middle-end/99928
gcc/
* tree.h (OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET): Define.
* gimplify.c (enum gimplify_omp_var_data): Fix up
GOVD_MAP_HAS_ATTACHMENTS value, add GOVD_FIRSTPRIVATE_IMPLICIT.
(omp_lastprivate_for_combined_outer_constructs): If combined target
has GOVD_FIRSTPRIVATE_IMPLICIT set for the decl, change it to
GOVD_MAP | GOVD_SEEN.
(gimplify_scan_omp_clauses): Set GOVD_FIRSTPRIVATE_IMPLICIT for
firstprivate clauses with OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT.
(gimplify_adjust_omp_clauses): For firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT either clear that bit and
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET too, or remove it and
let it be replaced by implicit map clause.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Set OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
on firstprivate clause copy going to target construct, and for
target simd set also OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET bit.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain.  Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/cp/
* semantics.c (finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain.  Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/testsuite/
* c-c++-common/gomp/pr99928-3.c: Remove all xfails.
* c-c++-common/gomp/pr99928-15.c: New test.

--- gcc/tree.h.jj   2021-05-19 09:20:57.760095905 +0200
+++ gcc/tree.h  2021-05-21 18:48:20.426823145 +0200
@@ -1538,6 +1538,11 @@ class auto_suppress_location_wrappers
 #define OMP_CLAUSE_FIRSTPRIVATE_NO_REFERENCE(NODE) \
   TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FIRSTPRIVATE))
 
+/* True on a FIRSTPRIVATE clause with OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT also
+   set if target construct is the only one that accepts the clause.  */
+#define OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET(NODE) \
+  TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_FIRSTPRIVATE))
+
 /* True on a LASTPRIVATE clause if a FIRSTPRIVATE clause for the same
decl is present in the chain.  */
 #define OMP_CLAUSE_LASTPRIVATE_FIRSTPRIVATE(NODE) \
--- gcc/gimplify.c.jj   2021-05-21 11:41:02.647089257 +0200
+++ gcc/gimplify.c  2021-05-21 18:51:04.754024788 +0200
@@ -126,7 +126,10 @@ enum gimplify_omp_var_data
 
   /* Flag for GOVD_MAP: (struct) vars that have pointer attachments for
  fields.  */
-  GOVD_MAP_HAS_ATTACHMENTS = 8388608,
+  GOVD_MAP_HAS_ATTACHMENTS = 0x400,
+
+  /* Flag for GOVD_FIRSTPRIVATE: OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT.  */
+  GOVD_FIRSTPRIVATE_IMPLICIT = 0x800,
 
   GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
@@ -8586,13 +8589,24 @@ omp_lastprivate_for_combined_outer_const
  omp_add_variable (octx, decl, GOVD_LASTPRIVATE | GOVD_SEEN);
  continue;
}
-  if (octx->region_type == ORT_COMBINED_TARGET
- && splay_tree_lookup (octx->variables,
-   (splay_tree_key) decl) == NULL)
+  if (octx->region_type == ORT_COMBINED_TARGET)
{
- omp_add_variable (octx, decl, GOVD_MAP | GOVD_SEEN);
- octx = octx->outer_context;
- break;
+ splay_tree_node n = splay_tree_lookup (octx->variables,
+(splay_tree_key) decl);
+ if (n == NULL)
+   {
+ omp_add_variable (octx, decl, GOVD_MAP | GOVD_SEEN);
+ octx = octx->outer_context;
+   }
+ else if (!implicit_p
+  && (n->value & GOVD_FIRSTPRIVATE_IMPLICIT))
+   {
+ n->value &= ~(GOVD_FIRSTPRIVATE
+

[committed] openmp: Fix up handling of implicit lastprivate on outer constructs for implicit linear and lastprivate IVs [PR99928]

2021-05-21 Thread Jakub Jelinek via Gcc-patches
Hi!

This patch fixes the handling of lastprivate propagation to outer 
combined/composite
leaf constructs from implicit linear or lastprivate clauses on simd IVs and 
adds missing
testsuite coverage for explicit and implicit lastprivate on simd IVs.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-05-21  Jakub Jelinek  

PR middle-end/99928
* gimplify.c (omp_lastprivate_for_combined_outer_constructs): New
function.
(gimplify_scan_omp_clauses) : Use it.
(gimplify_omp_for): Likewise.

* c-c++-common/gomp/pr99928-6.c: Remove all xfails.
* c-c++-common/gomp/pr99928-13.c: New test.
* c-c++-common/gomp/pr99928-14.c: New test.

--- gcc/gimplify.c.jj   2021-05-21 10:33:36.348926909 +0200
+++ gcc/gimplify.c  2021-05-21 11:41:02.647089257 +0200
@@ -8533,6 +8533,73 @@ omp_target_reorder_clauses (tree *list_p
   }
 }
 
+/* DECL is supposed to have lastprivate semantics in the outer contexts
+   of combined/composite constructs, starting with OCTX.
+   Add needed lastprivate, shared or map clause if no data sharing or
+   mapping clause are present.  IMPLICIT_P is true if it is an implicit
+   clause (IV on simd), in which case the lastprivate will not be
+   copied to some constructs.  */
+
+static void
+omp_lastprivate_for_combined_outer_constructs (struct gimplify_omp_ctx *octx,
+  tree decl, bool implicit_p)
+{
+  struct gimplify_omp_ctx *orig_octx = octx;
+  for (; octx; octx = octx->outer_context)
+{
+  if ((octx->region_type == ORT_COMBINED_PARALLEL
+  || (octx->region_type & ORT_COMBINED_TEAMS) == ORT_COMBINED_TEAMS)
+ && splay_tree_lookup (octx->variables,
+   (splay_tree_key) decl) == NULL)
+   {
+ omp_add_variable (octx, decl, GOVD_SHARED | GOVD_SEEN);
+ continue;
+   }
+  if ((octx->region_type & ORT_TASK) != 0
+ && octx->combined_loop
+ && splay_tree_lookup (octx->variables,
+   (splay_tree_key) decl) == NULL)
+   {
+ omp_add_variable (octx, decl, GOVD_LASTPRIVATE | GOVD_SEEN);
+ continue;
+   }
+  if (implicit_p
+ && octx->region_type == ORT_WORKSHARE
+ && octx->combined_loop
+ && splay_tree_lookup (octx->variables,
+   (splay_tree_key) decl) == NULL
+ && octx->outer_context
+ && octx->outer_context->region_type == ORT_COMBINED_PARALLEL
+ && splay_tree_lookup (octx->outer_context->variables,
+   (splay_tree_key) decl) == NULL)
+   {
+ octx = octx->outer_context;
+ omp_add_variable (octx, decl, GOVD_LASTPRIVATE | GOVD_SEEN);
+ continue;
+   }
+  if ((octx->region_type == ORT_WORKSHARE || octx->region_type == ORT_ACC)
+ && octx->combined_loop
+ && splay_tree_lookup (octx->variables,
+   (splay_tree_key) decl) == NULL
+ && !omp_check_private (octx, decl, false))
+   {
+ omp_add_variable (octx, decl, GOVD_LASTPRIVATE | GOVD_SEEN);
+ continue;
+   }
+  if (octx->region_type == ORT_COMBINED_TARGET
+ && splay_tree_lookup (octx->variables,
+   (splay_tree_key) decl) == NULL)
+   {
+ omp_add_variable (octx, decl, GOVD_MAP | GOVD_SEEN);
+ octx = octx->outer_context;
+ break;
+   }
+  break;
+}
+  if (octx && (implicit_p || octx != orig_octx))
+omp_notice_variable (octx, decl, true);
+}
+
 /* Scan the OMP clauses in *LIST_P, installing mappings into a new
and previous omp contexts.  */
 
@@ -8642,48 +8709,8 @@ gimplify_scan_omp_clauses (tree *list_p,
}
  if (OMP_CLAUSE_LASTPRIVATE_CONDITIONAL (c))
flags |= GOVD_LASTPRIVATE_CONDITIONAL;
- struct gimplify_omp_ctx *octx;
- for (octx = outer_ctx; octx; octx = octx->outer_context)
-   {
- if ((octx->region_type == ORT_COMBINED_PARALLEL
-  || ((octx->region_type & ORT_COMBINED_TEAMS)
-   == ORT_COMBINED_TEAMS))
- && splay_tree_lookup (octx->variables,
-   (splay_tree_key) decl) == NULL)
-   {
- omp_add_variable (octx, decl, GOVD_SHARED | GOVD_SEEN);
- continue;
-   }
- if ((octx->region_type & ORT_TASK) != 0
- && octx->combined_loop
- && splay_tree_lookup (octx->variables,
-   (splay_tree_key) decl) == NULL)
-   {
- omp_add_variable (octx, decl, GOVD_LASTPRIVATE | GOVD_SEEN);
- continue;
-   }
- if ((octx->region_type == ORT_WORKSHARE
-  || octx->region_type == ORT_ACC)
- && 

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2021-04-19T12:23:56+0100, Julian Brown  wrote:
> On Thu, 15 Apr 2021 19:26:54 +0200
> Thomas Schwinge  wrote:
>> As that may not be obvious to the reader, I'd like to have the
>> 'TREE_ADDRESSABLE' conditionalization be documented in the code.  You
>> had explained that in
>> : "a
>> non-addressable variable [...]".
>
> Yeah that probably makes sense.

I've pushed "[OpenACC privatization] Explain OpenACC privatization
candidate selection [PR90115]" to master branch in commit
5a0fe1f6c4ad0e50bf4684e723ae2ba17d94c9e4, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 5a0fe1f6c4ad0e50bf4684e723ae2ba17d94c9e4 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 20 May 2021 15:55:18 +0200
Subject: [PATCH] [OpenACC privatization] Explain OpenACC privatization
 candidate selection [PR90115]

	gcc/
	PR middle-end/90115
	* omp-low.c (oacc_privatization_candidate_p): New function.
	(oacc_privatization_scan_clause_chain)
	(oacc_privatization_scan_decl_chain): Use it.  Also
	'gcc_checking_assert' that we're not seeing duplicates.
---
 gcc/omp-low.c | 45 ++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a86c6c1e82c..577676b2a16 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10144,6 +10144,36 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
 }
 }
 
+/* OpenACC privatization.
+
+   Or, in other words, *sharing* at the respective OpenACC level of
+   parallelism.
+
+   From a correctness perspective, a non-addressable variable can't be accessed
+   outside the current thread, so it can go in a (faster than shared memory)
+   register -- though that register may need to be broadcast in some
+   circumstances.  A variable can only meaningfully be "shared" across workers
+   or vector lanes if its address is taken, e.g. by a call to an atomic
+   builtin.
+
+   From an optimisation perspective, the answer might be fuzzier: maybe
+   sometimes, using shared memory directly would be faster than
+   broadcasting.  */
+
+static bool
+oacc_privatization_candidate_p (const tree decl)
+{
+  bool res = true;
+
+  if (res && !VAR_P (decl))
+res = false;
+
+  if (res && !TREE_ADDRESSABLE (decl))
+res = false;
+
+  return res;
+}
+
 /* Scan CLAUSES for candidates for adjusting OpenACC privatization level in
CTX.  */
 
@@ -10154,8 +10184,12 @@ oacc_privatization_scan_clause_chain (omp_context *ctx, tree clauses)
 if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
   {
 	tree decl = OMP_CLAUSE_DECL (c);
-	if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
-	  ctx->oacc_privatization_candidates.safe_push (decl);
+
+	if (!oacc_privatization_candidate_p (decl))
+	  continue;
+
+	gcc_checking_assert (!ctx->oacc_privatization_candidates.contains (decl));
+	ctx->oacc_privatization_candidates.safe_push (decl);
   }
 }
 
@@ -10166,8 +10200,13 @@ static void
 oacc_privatization_scan_decl_chain (omp_context *ctx, tree decls)
 {
   for (tree decl = decls; decl; decl = DECL_CHAIN (decl))
-if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
+{
+  if (!oacc_privatization_candidate_p (decl))
+	continue;
+
+  gcc_checking_assert (!ctx->oacc_privatization_candidates.contains (decl));
   ctx->oacc_privatization_candidates.safe_push (decl);
+}
 }
 
 /* Callback for walk_gimple_seq.  Find #pragma omp scan statement.  */
-- 
2.30.2



Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2021-04-19T12:23:56+0100, Julian Brown  wrote:
> On Thu, 15 Apr 2021 19:26:54 +0200
> Thomas Schwinge  wrote:
>> On 2021-02-26T04:34:50-0800, Julian Brown 
>> wrote:
>> > Two new target hooks are introduced:
>> > TARGET_GOACC_ADJUST_PRIVATE_DECL and TARGET_GOACC_EXPAND_VAR_DECL.
>> > The first can tweak a variable declaration at oaccdevlow time, and
>> > the second at expand time.  The first or both of these target hooks
>> > can be used by a given offload target, depending on its strategy
>> > for implementing private variables.
>>
>> ACK.
>>
>> So, currently we're only looking at making the gang-private level
>> work. Regarding that, we have two configurations: (1) for GCN
>> offloading, 'targetm.goacc.adjust_private_decl' does the work (in
>> particular, change 'TREE_TYPE' etc.) and there is no
>> 'targetm.goacc.expand_var_decl', and (2) for nvptx offloading,
>> 'targetm.goacc.adjust_private_decl' only sets a marker ('oacc
>> gangprivate' attribute) and then 'targetm.goacc.expand_var_decl' does
>> the work.
>>
>> Therefore I suggest we clarify the (currently) expected handling
>> similar to:
>>
>> --- gcc/omp-offload.c
>> +++ gcc/omp-offload.c
>> @@ -1854,6 +1854,19 @@ oacc_rewrite_var_decl (tree *tp, int 
>> *walk_subtrees, void *data) return NULL_TREE;
>>  }
>>
>> +static tree
>> +oacc_rewrite_var_decl_ (tree *tp, int *walk_subtrees, void *data)
>> +{
>> +  tree t = oacc_rewrite_var_decl (tp, walk_subtrees, data);
>> +  if (targetm.goacc.expand_var_decl)
>> +{
>> +  walk_stmt_info *wi = (walk_stmt_info *) data;
>> +  var_decl_rewrite_info *info = (var_decl_rewrite_info *) wi->info;
>> +  gcc_assert (!info->modified);
>> +}
>> +  return t;
>> +}
>
> Why the ugly _ tail on the function name!? I don't think that's a
> typical GNU coding standards thing, is it?

Heh, that was just to make the WIP prototype changes diff as small as
possible.  ;-)

>> +
>>  /* Return TRUE if CALL is a call to a builtin atomic/sync operation.  */
>>  static bool
>> @@ -2195,6 +2208,9 @@ execute_oacc_device_lower ()
>>   COMPONENT_REFS, ARRAY_REFS and plain VAR_DECLs are also
>> rewritten to use the new decl, adjusting types of appropriate tree
>> nodes as necessary.  */
>> +  if (targetm.goacc.expand_var_decl)
>> +gcc_assert (adjusted_vars.is_empty ());
>
> If you like

I've pushed "[OpenACC privatization] Explain two different configurations
[PR90115]" to master branch in commit
21803fcaebeab36de0d7b6b8cf6abb9389f5e51f, see attached.

> -- or do something like
>
>>if (targetm.goacc.adjust_private_decl)
>  && !adjusted_vars.is_empty ())
>
> perhaps.

That, too, additionally: I've pushed "[OpenACC privatization] Skip
processing if no work to be done [PR90115]" to master branch in commit
ad4612cb048b261f6834e9155e41e40e9252c80b, see attached.

>>  {
>>FOR_ALL_BB_FN (bb, cfun)
>> @@ -2217,7 +2233,7 @@ execute_oacc_device_lower ()
>> memset (, 0, sizeof (wi));
>> wi.info = 
>>
>> -   walk_gimple_op (stmt, oacc_rewrite_var_decl, );
>> +   walk_gimple_op (stmt, oacc_rewrite_var_decl_, );
>>
>> if (info.modified)
>>   update_stmt (stmt);
>>
>> Or, in fact, 'if (targetm.goacc.expand_var_decl)', skip the
>> 'adjusted_vars' handling completely?
>
> For the current pair of implementations, sure. I don't think it's
> necessary to set that as a constraint for future targets though? I
> guess it doesn't matter much until such a target exists.
>
>> I do understand that eventually (in particular, for worker-private
>> level?), both 'targetm.goacc.adjust_private_decl' and
>> 'targetm.goacc.expand_var_decl' may need to do things, but that's
>> currently not meant to be addressed, and thus not fully worked out and
>> implemented, and thus untested.  Hence, 'assert' what currently is
>> implemented/tested, only.
>
> If you like, no strong feelings from me on that.
>
>> (Given that eventual goal, that's probably sufficient motivation to
>> indeed add the 'adjusted_vars' handling in generic 'gcc/omp-offload.c'
>> instead of moving it into the GCN back end?)
>
> I'm not sure what moving it to the GCN back end would look like. I
> guess it's a question of keeping the right abstractions in the right
> place.

Right.  I guess we'll figure that out once we have more than one back end
using the 'adjusted_vars' machinery.

>> > +   1. They can be recreated, making a pointer to the variable in the 
>> > new
>> > +address space, or
>> > +
>> > +   2. The address of the variable in the new address space can be 
>> > taken,
>> > +converted to the default (original) address space, and the result of
>> > +that conversion subsituted in place of the original ADDR_EXPR node.
>> > +
>> > + Which of these is done depends on the gimple statement being 
>> > processed.
>> > + At 

Re: [PATCH 1/2] c-family: Copy DECL_USER_ALIGN even if DECL_ALIGN is similar.

2021-05-21 Thread Jason Merrill via Gcc-patches

On 5/19/21 6:03 PM, Martin Sebor via Gcc-patches wrote:

On 5/3/21 8:53 AM, Robin Dapp via Gcc-patches wrote:

Hi,

on s390 a warning test fails:

inline int ATTR ((cold, aligned (8)))
finline_hot_noret_align (int);

inline int ATTR ((warn_unused_result))
finline_hot_noret_align (int);

inline int ATTR ((aligned (4)))
   finline_hot_noret_align (int);  /* { dg-warning "ignoring attribute 
.aligned \\(4\\). because it conflicts with attribute .aligned \\(8\\)."


This test actually uncovered two problems.  First, on s390 the default 
function alignment is 8 bytes.  When the second decl above is merged 
with the first one, DECL_USER_ALIGN is only copied if DECL_ALIGN (old) 
> DECL_ALIGN (new).  Subsequently, when merging the third decl, no 
warning is emitted since DECL_USER_ALIGN is unset.


This patch also copies DECL_USER_ALIGN if DECL_ALIGN (old) == 
DECL_ALIGN (new) && DECL_USER_ALIGN (olddecl) != DECL_USER_ALIGN 
(newdecl)).



Then, while going through the related files I also noticed that we 
emit a wrong warning for:


inline int ATTR ((aligned (32)))
finline_align (int);

inline int ATTR ((aligned (4)))
   finline_align (int);  /* { dg-warning "ignoring attribute .aligned 
\\(4\\). because it conflicts with attribute .aligned \\(32\\)." "" } */


What we emit is

warning: ignoring attribute ‘aligned (4)’ because it conflicts with 
attribute ‘aligned (8)’ [-Wattributes].


This is due to the short circuit evaluation in c-attribs.c:

    && ((curalign = DECL_ALIGN (decl)) > bitalign
    || ((lastalign = DECL_ALIGN (last_decl)) > bitalign)))

where lastalign is only initialized when curalign > bitalign.  On s390 
this is not the case and lastalign is used zero-initialized in the 
following code.


On top, the following condition
   else if (!warn_if_not_aligned_p
   && TREE_CODE (decl) == FUNCTION_DECL
   && DECL_ALIGN (decl) > bitalign)

seems to be fully covered by
 else if (TREE_CODE (decl) == FUNCTION_DECL
    && ((curalign = DECL_ALIGN (decl)) > bitalign
    || ((lastalign = DECL_ALIGN (last_decl)) > bitalign)))

and so is essentially dead. I therefore removed it as there does not 
seem to be a test anywhere for the error message ( "alignment for %q+D 
was previously specified as %d and may not be decreased") either.


The removal of the dead code looks good to me.  The change to
"re-init lastalign" doesn't seem right.  When it's zero it means
the conflict is between two attributes on the same declaration,
in which case the note shouldn't be printed (it would just point
to the same location as the warning).


Agreed.


diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index c1f652d1dc9..d132b6fd3b6 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -2317,6 +2317,10 @@ common_handle_aligned_attribute (tree *node, tree 
name, tree args, int flags,

    && ((curalign = DECL_ALIGN (decl)) > bitalign
    || ((lastalign = DECL_ALIGN (last_decl)) > bitalign)))
     {
+  /* Re-init lastalign in case we short-circuit the condition,
+ i.e.  curalign > bitalign.  */
+  lastalign = DECL_ALIGN (last_decl);
+
   /* Either a prior attribute on the same declaration or one
  on a prior declaration of the same function specifies
  stricter alignment than this attribute.  */

For the C/C++ FE changes:

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 3ea4708c507..2ea5051e9cd 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -2620,6 +2620,9 @@ merge_decls (tree newdecl, tree olddecl, tree 
newtype, tree oldtype)

   SET_DECL_ALIGN (newdecl, DECL_ALIGN (olddecl));
   DECL_USER_ALIGN (newdecl) |= DECL_USER_ALIGN (olddecl);
 }
+  else if (DECL_ALIGN (olddecl) == DECL_ALIGN (newdecl)
+   && DECL_USER_ALIGN (olddecl) != DECL_USER_ALIGN (newdecl))
+    DECL_USER_ALIGN (newdecl) |= DECL_USER_ALIGN (olddecl);

This should be the same as

  DECL_USER_ALIGN (newdecl) = 1;

so I would suggest to use that for clarity.


I'm not sure that's clearer, though it certainly is equivalent.  I'm 
happy either way.


The braces around this statement in the parallel C++ change are unnecessary.

I'm a little nervous about the front-end changes given that we were 
specifically not setting *_USER_ALIGN in this case before, but I'm not 
aware of any rationale for that, so I'm content to make the change and 
see if anything breaks.


Jason



Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2021-02-26T04:34:50-0800, Julian Brown  wrote:
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2957,6 +2957,8 @@ expand_UNIQUE (internal_fn, gcall *stmt)
>else
>   gcc_unreachable ();
>break;
> +case IFN_UNIQUE_OACC_PRIVATE:
> +  break;
>  }
>
>if (pattern)

That's unexpected.  Meaning: better if this doesn't happen.

> --- a/gcc/omp-offload.c
> +++ b/gcc/omp-offload.c

> @@ -1998,6 +2133,45 @@ execute_oacc_device_lower ()
>   case IFN_UNIQUE_OACC_TAIL_MARK:
> remove = true;
> break;
> +
> + case IFN_UNIQUE_OACC_PRIVATE:
> +   {
> + HOST_WIDE_INT level
> +   = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
> + if (level == -1)
> +   break;

They should be all "handled" here (meaning: also for 'level == -1', do
'remove = true' after the real handling):

> + for (unsigned i = 3;
> +  i < gimple_call_num_args (call);
> +  i++)
> +   {
> + [...]
> +   }
> + remove = true;
> +   }
> +   break;
>   }
> break;
>   }

Why we at all can have 'level == -1' cases is a separate bug to be fixed.

I've pushed "[OpenACC privatization] Don't let unhandled
'IFN_UNIQUE_OACC_PRIVATE' linger [PR90115]" to master branch in commit
ff451ea723deb3fe8471eb96ac9381c063ec6533, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From ff451ea723deb3fe8471eb96ac9381c063ec6533 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 20 May 2021 15:37:07 +0200
Subject: [PATCH] [OpenACC privatization] Don't let unhandled
 'IFN_UNIQUE_OACC_PRIVATE' linger [PR90115]

Make sure they're all handled in 'execute_oacc_device_lower'.  Why we at all
can have 'level == -1' cases is a separate bug to be fixed.

	gcc/
	PR middle-end/90115
	* omp-offload.c (execute_oacc_device_lower)
	: Diagnose and handle for 'level == -1'
	case, too.
	* internal-fn.c (expand_UNIQUE): Don't expect
	'IFN_UNIQUE_OACC_PRIVATE'.
---
 gcc/internal-fn.c |  2 --
 gcc/omp-offload.c | 10 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index d92080c8077..d209a52f823 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2969,8 +2969,6 @@ expand_UNIQUE (internal_fn, gcall *stmt)
   else
 	gcc_unreachable ();
   break;
-case IFN_UNIQUE_OACC_PRIVATE:
-  break;
 }
 
   if (pattern)
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 080bdddfe88..36bd2e44d81 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2139,8 +2139,9 @@ execute_oacc_device_lower ()
 		  {
 		HOST_WIDE_INT level
 		  = TREE_INT_CST_LOW (gimple_call_arg (call, 2));
-		if (level == -1)
-		  break;
+		gcc_checking_assert (level == -1
+	 || (level >= 0
+	 && level < GOMP_DIM_MAX));
 		for (unsigned i = 3;
 			 i < gimple_call_num_args (call);
 			 i++)
@@ -2156,11 +2157,12 @@ execute_oacc_device_lower ()
 			  { "gang", "worker", "vector" };
 			fprintf (dump_file, "Decl UID %u has %s "
  "partitioning:", DECL_UID (decl),
- axes[level]);
+ (level == -1 ? "UNKNOWN" : axes[level]));
 			print_generic_decl (dump_file, decl, TDF_SLIM);
 			fputc ('\n', dump_file);
 			  }
-			if (targetm.goacc.adjust_private_decl)
+			if (level != -1
+			&& targetm.goacc.adjust_private_decl)
 			  {
 			tree oldtype = TREE_TYPE (decl);
 			tree newdecl
-- 
2.30.2



Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2019-06-07T15:08:37+0100, Julian Brown  wrote:
> Hi Jakub,
>
> Thanks for the review! I believe I've addressed all your comments in
> the attached version of the patch.
>
> On Mon, 3 Jun 2019 18:23:00 +0200
> Jakub Jelinek  wrote:
>> > +/* Record vars listed in private clauses in CLAUSES in CTX.  This 
>> > information
>> > +   is used to mark up variables that should be made private per-gang.  */
>> > +
>> > +static void
>> > +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
>> > +{
>> > +  [...]
>> > +}
>>
>> You don't want to do this for all GOMP_FOR or GOMP_TARGET context,
>> I'd hope you only want to do that for OpenACC contexts.

> I've [...] fixed the patch to only call oacc_record_private_var_clauses in
> OpenACC contexts.

> commit 6c2a018b940d0b132395048b0600f7d897319ee2
> Author: Julian Brown 
> Date:   Thu Aug 9 20:27:04 2018 -0700
>
> [OpenACC] Add support for gang local storage allocation in shared memory

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> @@ -8599,6 +8681,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context 
> *ctx)
>
>push_gimplify_context ();
>
> +  if (is_gimple_omp_oacc (ctx->stmt))
> +oacc_record_private_var_clauses (ctx, gimple_omp_for_clauses (stmt));
> +
>lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
>
>block = make_node (BLOCK);

So, yes -- but then, apparently, that again got lost in a later version
of the patch.  ;-)

I've pushed "[OpenACC privatization] Don't evaluate OpenMP 'for' clauses
[PR90115]" to master branch in commit
3a285ebd0cf5ab762726018515d23280fa6dd445, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 3a285ebd0cf5ab762726018515d23280fa6dd445 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 20 May 2021 15:22:24 +0200
Subject: [PATCH] [OpenACC privatization] Don't evaluate OpenMP 'for' clauses
 [PR90115]

	gcc/
	PR middle-end/90115
	* omp-low.c (lower_omp_for): Don't evaluate OpenMP 'for' clauses.
---
 gcc/omp-low.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index da827ef2e34..a86c6c1e82c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11067,7 +11067,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   push_gimplify_context ();
 
-  oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
+  if (is_gimple_omp_oacc (ctx->stmt))
+oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt));
 
   lower_omp (gimple_omp_for_pre_body_ptr (stmt), ctx);
 
-- 
2.30.2



Re: [PATCH 3/3] nvptx: NVPTX parts for OpenACC private variables patch

2021-05-21 Thread Thomas Schwinge
Hi!

On 2021-02-26T04:34:52-0800, Julian Brown  wrote:
> This patch contains the NVPTX backend support for placing OpenACC
> gang-private variables in GPU shared memory.
>
> Tested with offloading to NVPTX.
>
> This is substantially the same as the version previously posted: I will
> assume it is already approved (unless I hear objections), and will commit
> it at the same time as the rest of the series.
>
>   (https://gcc.gnu.org/pipermail/gcc-patches/2018-October/507909.html)

I've additionally pushed "[OpenACC privatization, nvptx] Tighten some
aspects [PR90115]" to master branch in commit
f6f45309d9fc140006886456b291e4ac24812cea, see attached.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From f6f45309d9fc140006886456b291e4ac24812cea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 20 May 2021 15:08:38 +0200
Subject: [PATCH] [OpenACC privatization, nvptx] Tighten some aspects [PR90115]

No functional change.

	gcc/
	PR middle-end/90115
	* config/nvptx/nvptx.c (nvptx_goacc_adjust_private_decl)
	(nvptx_goacc_expand_var_decl): Tighten.
---
 gcc/config/nvptx/nvptx.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 80116e570d6..60d3f079048 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6682,12 +6682,12 @@ nvptx_truly_noop_truncation (poly_uint64, poly_uint64)
 static tree
 nvptx_goacc_adjust_private_decl (tree decl, int level)
 {
-  if (level != GOMP_DIM_GANG)
-return decl;
+  gcc_checking_assert (!lookup_attribute ("oacc gang-private",
+	  DECL_ATTRIBUTES (decl)));
 
   /* Set "oacc gang-private" attribute for gang-private variable
  declarations.  */
-  if (!lookup_attribute ("oacc gang-private", DECL_ATTRIBUTES (decl)))
+  if (level == GOMP_DIM_GANG)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -6708,9 +6708,10 @@ static rtx
 nvptx_goacc_expand_var_decl (tree var)
 {
   /* Place "oacc gang-private" variables in shared memory.  */
-  if (VAR_P (var)
-  && lookup_attribute ("oacc gang-private", DECL_ATTRIBUTES (var)))
+  if (lookup_attribute ("oacc gang-private", DECL_ATTRIBUTES (var)))
 {
+  gcc_checking_assert (VAR_P (var));
+
   unsigned int offset, *poffset;
   poffset = gang_private_shared_hmap.get (var);
   if (poffset)
-- 
2.30.2



Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-05-21 Thread Thomas Schwinge
Hi!

On 2021-04-19T12:23:56+0100, Julian Brown  wrote:
> On Thu, 15 Apr 2021 19:26:54 +0200
> Thomas Schwinge  wrote:
>> This has iterated through several conceptually different designs and
>> implementations, by several people, over the past several years.
>
> I hope this wasn't a hint that I'd failed to attribute the authorship of
> the patch properly? Many apologies if so, that certainly wasn't my
> intention!

No, not at all -- this was just to highlight the several iterations this
work has gone though.


With a first set of my modification merged in, I've now pushed "openacc:
Add support for gang local storage allocation in shared memory [PR90115]"
to master branch in commit 29a2f51806c5b30e17a8d0e9ba7915a3c53c34ff, see
attached.  I shall now follow up with a number of further changes, and
more to come later (once developed).


>> On 2021-02-26T04:34:50-0800, Julian Brown 
>> wrote:
>> > This patch implements a method to track the "private-ness" of
>> > OpenACC variables declared in offload regions in gang-partitioned,
>> > worker-partitioned or vector-partitioned modes. Variables declared
>> > implicitly in scoped blocks and those declared "private" on
>> > enclosing directives (e.g. "acc parallel") are both handled.
>> > Variables that are e.g. gang-private can then be adjusted so they
>> > reside in GPU shared memory.
>> >
>> > The reason for doing this is twofold: correct implementation of
>> > OpenACC semantics
>>
>> ACK, and as mentioned before, this very much relates to
>>  "OpenACC: predetermined private levels
>> for variables declared in blocks" (plus the corresponding use of
>> 'private' clauses, implicit/explicit, including 'firstprivate') and
>>  "Predetermined private levels for
>> variables declared in OpenACC accelerator routines", which we thus
>> should refer in testcases/ChangeLog/commit log, as appropriate.  I do
>> understand we're not yet addressing all of that (and that's fine!),
>> but we should capture remaining work items of the PRs and Cesar's
>> list in
>> ),
>> as appropriate.
>
> From that list: [...]

Thanks, that'll be useful for later.


>> > Handling of private variables is intimately
>> > tied to the execution model for gangs/workers/vectors implemented by
>> > a particular target: for current targets, we use (or on mainline,
>> > will soon use) a broadcasting/neutering scheme.
>> >
>> > That is sufficient for code that e.g. sets a variable in
>> > worker-single mode and expects to use the value in
>> > worker-partitioned mode. The difficulty (semantics-wise) comes when
>> > the user wants to do something like an atomic operation in
>> > worker-partitioned mode and expects a worker-single (gang private)
>> > variable to be shared across each partitioned worker. Forcing use
>> > of shared memory for such variables makes that work properly.
>>
>> Are we reliably making sure that gang-private variables (and other
>> levels, in general) are not subject to the usual broadcasting scheme
>> (nvptx, at least), or does that currently work "by accident"?  (I
>> haven't looked into that, yet.)
>
> Yes, that case is explicitly handled by the broadcasting/neutering patch
> recently posted. (One of the reasons that patch depends on this one.)

OK, I shall look into these GCN patches soon -- and I still haven't
looked into the nvptx aspect.


>> > --- a/gcc/expr.c
>> > +++ b/gcc/expr.c
>> > @@ -10224,8 +10224,19 @@ expand_expr_real_1 (tree exp, rtx target,
>> > machine_mode tmode, exp = SSA_NAME_VAR (ssa_name);
>> >goto expand_decl_rtl;
>> >
>> > -case PARM_DECL:
>> >  case VAR_DECL:
>> > +  /* Allow accel compiler to handle variables that require special
>> > +   treatment, e.g. if they have been modified in some way earlier in
>> > +   compilation by the adjust_private_decl OpenACC hook.  */
>> > +  if (flag_openacc && targetm.goacc.expand_var_decl)
>> > +  {
>> > +temp = targetm.goacc.expand_var_decl (exp);
>> > +if (temp)
>> > +  return temp;
>> > +  }
>> > +  /* ... fall through ...  */
>> > +
>> > +case PARM_DECL:
>>
>> [TS] Are we sure that we don't need the same handling for a
>> 'PARM_DECL', too?  (If yes, to document and verify that, should we
>> thus again unify the two 'case's, and in
>> 'targetm.goacc.expand_var_decl' add a 'gcc_checking_assert (TREE_CODE
>> (var) == VAR_DECL')'?)
>
> Maybe for routines? Those bits date from the earliest version of the
> patch and (same excuse again) I didn't have call to revisit those
> decisions.

Indeed we're currently not handling 'p' here:

int f(int p)
{
  int l;
  #pragma acc parallel
{
  #pragma acc loop gang private(l, p) // 'l' is, but 'p' is *not* made 
gang-private here.
  for ([...])

... to be fixed at some later point.


>> Also, are we sure that all the following existing processing is not
>> 

Re: [PATCH 06/57] rs6000: Add helper functions for parsing

2021-05-21 Thread Segher Boessenkool
Hi!

On Tue, Apr 27, 2021 at 10:32:41AM -0500, Bill Schmidt via Gcc-patches wrote:
> +/* Used as a sentinel for range constraints on integer fields.  No field can
> +   be 32 bits wide, so this is a safe sentinel value.  */
> +#define MININT INT32_MIN

INT32_MIN is for value of type int32_t.  You use it for "int" though.
There is INT_MIN just for that :-)

> +/* Exit codes for the shell.  */
> +enum exit_codes {
> +  EC_INTERR
> +};

Please define this with some specified value (so append " = 1" or such),
or just write "exit(1);" etc. directly.  As it is, a compiler is free to
do "exit(0);" for this error value!  Not good.

All these exit codes are externally visible, so please just make them
explicit.  There isn't much point in having symbolic names for it
either, because users will see the raw numbers (reported by their
shell), instead.

Most generator programs allow to use fatal() etc., this is a much richer
environment, no need to reinvent stuff?  Include "errors.h" and link
with error.o, and that is all you need AFAIK.

> +static void
> +consume_whitespace ()

Please say (void), empty argument lists have different meanings in
different language versions, and it looks like you just forgot to fill
int the type.  Besides, it is just "what we do" :-)

> +/* Get the next nonblank, noncomment line, returning 0 on EOF, 1 otherwise.  
> */
> +static int
> +advance_line (FILE *file)
> +{
> +  while (1)
> +{
> +  /* Read ahead one line and check for EOF.  */
> +  if (!fgets (linebuf, sizeof(linebuf), file))

sizeof is an operator, not a function, and even if it was a function it
would have a space before the opening parenthesis :-)

> + return 0;

So it also returns 0 on any error?  This may not be a good idea.

> +/* Match an integer and return its value, or MININT on failure.  */
> +static int
> +match_integer ()
> +{
> +  int startpos = pos;
> +  if (linebuf[pos] == '-')
> +safe_inc_pos ();
> +
> +  int lastpos = pos - 1;
> +  while (isdigit (linebuf[lastpos + 1]))
> +if (++lastpos >= LINELEN - 1)
> +  {
> + (*diag) ("line length overrun in match_integer.\n");
> + exit (EC_INTERR);
> +  }
> +
> +  if (lastpos < pos)
> +return MININT;
> +
> +  pos = lastpos + 1;
> +  char *buf = (char *) malloc (lastpos - startpos + 2);
> +  memcpy (buf, [startpos], lastpos - startpos + 1);
> +  buf[lastpos - startpos + 1] = '\0';
> +
> +  int x;
> +  sscanf (buf, "%d", );
> +  return x;
> +}

Can't you just use strtol?

> +static const char *
> +match_to_right_bracket ()

This needs a function comment.

> +{
> +  int lastpos = pos - 1;
> +  while (linebuf[lastpos + 1] != ']')
> +if (++lastpos >= LINELEN - 1)

Please don't use side effects in "if" conditions.

> +  {
> + (*diag) ("line length overrun.\n");
> + exit (EC_INTERR);
> +  }

I don't think you shoulod check for line length overrun in any of these
functions, btw?  Just check where you read them in, and that is plenty?

> +  if (lastpos < pos)
> +return 0;
> +
> +  char *buf = (char *) malloc (lastpos - pos + 2);
> +  memcpy (buf, [pos], lastpos - pos + 1);
> +  buf[lastpos - pos + 1] = '\0';
> +
> +  pos = lastpos + 1;
> +  return buf;
> +}

Are there no utility routines you can use?  It would be useful to have
something that all gen* can use (something less bare than what there is
now...)


Segher


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
I've checked the generated code and the compiler doesn't figure out
the logic. I added a comment to explain.

(Revised patch below and attached.)

Best wishes,
Cassio.

---

Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
-return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+// Usually we test _M_y % 400 == 0 but, when it's already known that
+// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr

On Fri, May 21, 2021 at 7:05 PM Koning, Paul  wrote:
>
>
>
> > On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches 
> >  wrote:
> >
> > Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
> > then it's divisible by 400 if and only if it's divisible by 16. The latter
> > allows for better code generation.
>
> I wonder if the optimizer could be taught to do that.
>
> The change seems correct but it is very confusing; at the very least the 
> reasoning you gave should be stated in a comment on that check.
>
> paul
>
>
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:
libstdc++-v3/ChangeLog:

	* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..85aa0379432 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
 	const bool __is_multiple_of_100
 	  = __multiplier * (_M_y + __offset) < __bound;
-	return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+	// Usually we test _M_y % 400 == 0 but, when it's already known that
+	// _M_y%100 == 0, then _M_y % 400==0 is equivalent to _M_y % 16 == 0.
+	return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


Re: [PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Koning, Paul via Gcc-patches



> On May 21, 2021, at 1:46 PM, Cassio Neri via Gcc-patches 
>  wrote:
> 
> Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
> then it's divisible by 400 if and only if it's divisible by 16. The latter
> allows for better code generation.

I wonder if the optimizer could be taught to do that.

The change seems correct but it is very confusing; at the very least the 
reasoning you gave should be stated in a comment on that check.

paul




[PATCH] libstdc++: More efficient std::chrono::year::leap.

2021-05-21 Thread Cassio Neri via Gcc-patches
Simple change to std::chrono::year::is_leap. If a year is multiple of 100,
then it's divisible by 400 if and only if it's divisible by 16. The latter
allows for better code generation.

Tested on x86_64-pc-linux-gnu.

libstdc++-v3/ChangeLog:

* include/std/chrono:

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 4631a727d73..6221d02c1b3 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1612,7 +1612,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
  const bool __is_multiple_of_100
   = __multiplier * (_M_y + __offset) < __bound;
- return (!__is_multiple_of_100 || _M_y % 400 == 0) && _M_y % 4 == 0;
+ return (!__is_multiple_of_100 || _M_y % 16 == 0) && _M_y % 4 == 0;
   }

   explicit constexpr


Re: [PATCH] tree-sra: Avoid refreshing into const base decls (PR 100453)

2021-05-21 Thread Martin Jambor
Hi,

On Mon, May 17 2021, Eric Botcazou wrote:
>> sorry for breaking Ada over the weekend, however...
>
> No problem.
>
>> None of the non-ACATS tests fail for me without doing a bootstrap, but I
>> did manage to reproduce this one (by the way, there really should be a
>> documentation on how to run ACATS tests manually, I should not need to
>> spend an hour and half figuring it out).
>
> https://gcc.gnu.org/wiki/DebuggingGCC

I missed that.  I'll try to put the string ACATS there so that people
can search for it.

>
>> The problem is that before (early) SRA, there is a TREE_READONLY decl
>> that is being written to and my patch eliminated those writes.
>> Specifically, I see:
>> 
>>:
>>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[1]{lb: 1 sz: 1} = _877;
>>   _880 = report.ident_bool (1);
>> 
>>:
>>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[2]{lb: 1 sz: 1} = _880;
>>   _883 = report.ident_bool (1);
>> 
>>:
>>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[3]{lb: 1 sz: 1} = _883;
>>   _886 = report.ident_bool (1);
>> 
>>:
>>   var_ara_5D.5012.FD.4868[1]{lb: 1 sz: 4}[4]{lb: 1 sz: 1} = _886;
>>   _889 = report.ident_bool (1);
>> 
>> [...and many more.]  Note that this is an -fdump-tree-all-uid dump, the
>> DECL that is being assigned to has DECL_UID 5012 and when I dump it:
>> 
>> DECL_UID of racc->base is: 5012
>> print_node (dump_file, "", racc->base, 0):
>>  > type > sizes-gimplified type_5 TI size 
>> unit-size 
>> align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
>> 0x7fc249faf690 fields > 0x7fc249faf1f8 c41325a__p__array_5> decl_3 TI c41325a.adb:57:11
>> size 
>> unit-size 
>> align:8 warn_if_not_align:0 offset_align 128
>> offset 
>> bit-offset  context
>> > context
>>  Ada size > constant 128>
>> pointer_to_this  chain > 0x7fc249fb0390 c41325a__p__p__obj_ara_5___PAD>> --> readonly TI
>> c41325a.adb:70:6
>> size > bitsizetype> constant 128> unit-size >  constant 16> align:64
>> warn_if_not_align:0 context  chain
>> >
>> 
>> I can see that base is TREE_READONLY.
>> 
>> Am I right that this is a bug happening at some point earlier in the Ada
>> compiler?
>
> Yes, in the gimplifier apparently, so try with:
>
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index e790f08b23f..52cef6f8bff 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1822,6 +1822,7 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> if (!TREE_STATIC (decl))
>   {
> DECL_INITIAL (decl) = NULL_TREE;
> +   TREE_READONLY (decl) = 0;
> init = build2 (INIT_EXPR, void_type_node, decl, init);
> gimplify_and_add (init, seq_p);
> ggc_free (init);

The problem with this patch is that it causes:

  FAIL: gnat.dg/opt94.adb scan-tree-dump-times optimized "worker" 1

which is exactly the testcase from the commit which caused the bug I am
trying to address.

I have given up attempting to clean IL we get from Ada testcases from
writes to TREE_READONLY decls, at least for now.  I spent a bigger part
of today trying to handle those cases gracefully in SRA but I still
cannot make it work, always some Ada testcase breaks.  I'll try again
next week.

Martin


Re: GCC 9.4 Status Report (2021-05-19), branch frozen for release

2021-05-21 Thread Jason Merrill via Gcc-patches
Sorry, just pushed 3 patches before I noticed this.  They're safe but not
critical, should I back them out?

On Wed, May 19, 2021 at 4:06 AM Richard Biener  wrote:

>
> Status
> ==
>
> The GCC 9 branch is now frozen for the upcoming GCC 9.4 release.
> I will announce a first release candidate shortly.
>
> All changes require release manager approval now.
>
>
> Quality Data
> 
>
> Priority  #   Change from last report
> ---   ---
> P1-   1
> P2  304   -   7
> P3   23   +   7
> P4  174
> P5   23
> ---   ---
> Total P1-P3 328   -   1
> Total   525   -   1
>
>
> Previous Report
> ===
>
> https://gcc.gnu.org/pipermail/gcc/2021-April/235948.html
>
>


Re: [PATCH] c++/88601 - [C/C++] __builtin_shufflevector support

2021-05-21 Thread Jason Merrill via Gcc-patches

On 5/21/21 8:33 AM, Richard Biener wrote:

This adds support for the clang __builtin_shufflevector extension to
the C and C++ frontends.  The builtin is lowered to VEC_PERM_EXPR.
Because VEC_PERM_EXPR does not support different sized vector inputs
or result or the special permute index of -1 (don't-care)
c_build_shufflevector applies lowering by widening inputs and output
to the widest vector, replacing -1 by a defined index and
subsetting the final vector if we produced a wider result than
desired.

Code generation thus can be sub-optimal, followup patches will
aim to fix that by recovering from part of the missing features
during RTL expansion and by relaxing the constraints of the GIMPLE
IL with regard to VEC_PERM_EXPR.

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Honza - you've filed PR88601, can you point me to testcases that
exercise common uses so we can look at code generation quality
and where time is spent best in improving things?

OK for trunk?

Thanks,
Richard.

2021-05-21  Richard Biener  

PR c++/88601
gcc/c-family/
* c-common.c: Include tree-vector-builder.h and
vec-perm-indices.h.
(c_common_reswords): Add __builtin_shufflevector.
(c_build_shufflevector): New funtion.
* c-common.h (enum rid): Add RID_BUILTIN_SHUFFLEVECTOR.
(c_build_shufflevector): Declare.

gcc/c/
* c-decl.c (names_builtin_p): Handle RID_BUILTIN_SHUFFLEVECTOR.
* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/cp/
* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_SHUFFLEVECTOR.
* cp-tree.h (build_x_shufflevector): Declare.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_SHUFFLEVECTOR.
* pt.c (tsubst_copy_and_build): Handle IFN_SHUFFLEVECTOR.
* typeck.c (build_x_shufflevector): Build either a lowered
VEC_PERM_EXPR or an unlowered shufflevector via a temporary
internal function IFN_SHUFFLEVECTOR.

gcc/
* internal-fn.c (expand_SHUFFLEVECTOR): Define.
* internal-fn.def (SHUFFLEVECTOR): New.
* internal-fn.h (expand_SHUFFLEVECTOR): Declare.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-2.c: New testcase.
* c-c++-common/torture/builtin-shufflevector-1.c: Likewise.
* g++.dg/builtin-shufflevector-1.C: Likewise.
* g++.dg/builtin-shufflevector-2.C: Likewise.
---
  gcc/c-family/c-common.c   | 139 ++
  gcc/c-family/c-common.h   |   4 +-
  gcc/c/c-decl.c|   1 +
  gcc/c/c-parser.c  |  38 +
  gcc/cp/cp-objcp-common.c  |   1 +
  gcc/cp/cp-tree.h  |   3 +
  gcc/cp/parser.c   |  15 ++
  gcc/cp/pt.c   |   9 ++
  gcc/cp/typeck.c   |  36 +
  gcc/internal-fn.c |   6 +
  gcc/internal-fn.def   |   3 +
  gcc/internal-fn.h |   1 +
  .../c-c++-common/builtin-shufflevector-2.c|  18 +++
  .../torture/builtin-shufflevector-1.c |  49 ++
  .../g++.dg/builtin-shufflevector-1.C  |  18 +++
  .../g++.dg/builtin-shufflevector-2.C  |  12 ++
  16 files changed, 352 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/c-c++-common/builtin-shufflevector-2.c
  create mode 100644 
gcc/testsuite/c-c++-common/torture/builtin-shufflevector-1.c
  create mode 100644 gcc/testsuite/g++.dg/builtin-shufflevector-1.C
  create mode 100644 gcc/testsuite/g++.dg/builtin-shufflevector-2.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index b7daa2e2654..c4eb2b1c920 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.  If not see
  #include "c-spellcheck.h"
  #include "selftest.h"
  #include "debug.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
  
  cpp_reader *parse_in;		/* Declared in c-pragma.h.  */
  
@@ -383,6 +385,7 @@ const struct c_common_resword c_common_reswords[] =

{ "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
{ "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
{ "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
+  { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
{ "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
{ "__builtin_offsetof", RID_OFFSETOF, 0 },
{ "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
@@ -1108,6 +,142 @@ c_build_vec_perm_expr (location_t loc, tree v0, tree 
v1, tree mask,
return ret;
  }
  
+/* Build a VEC_PERM_EXPR if V0, V1 are not error_mark_nodes

+   and have vector types, V0 has the same element type as V1, and the
+   number of elements the result is that of MASK.  */
+tree
+c_build_shufflevector (location_t loc, tree v0, 

Re: [PATCH] c++: Add new warning options for C++ language mismatches

2021-05-21 Thread Jason Merrill via Gcc-patches

On 5/20/21 4:05 PM, Jonathan Wakely wrote:

On 19/05/21 23:52 +0100, Jonathan Wakely wrote:

On 19/05/21 16:08 -0400, Jason Merrill wrote:

On 5/19/21 4:05 PM, Jonathan Wakely wrote:

Oh, also we have https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93769
which points out a problem with the current wording. Not a very
important one, but still ...

While I'm touching all 38(?) places that say "only available with
-std=c++NN or -std=gnu++NN I could change them to say something like
"only available since C++NN". Should I bother?

Clang's equivalent warnings say "are a C++11 feature" e.g.

ext.C:1:17: warning: inline namespaces are a C++11 feature 
[-Wc++11-inline-namespace]


(They have a specific warning for each feature, with
-Wc++11-extensions to control them all at once.)


The clang wording seems more accurate, as that PR points out.


OK, that requires touching a number of error_at and inform calls as
well as the pedwarns, so I'll address that separately in a later
patch.


Here's a WIP patch that rewords all those diagnostics. This doesn't
include the necessary testsuite changes, and I don't know when I'll
have time to do the rest of it. But it's a start. Does this look like
the right approach?



+   "trailing return type is a C++11 feature");

...

+ "% destructors are a C++20 feature");


I wonder if we want to standardize on singular or plural.

I wonder if we want to keep the mention of the -std= option, possibly 
also mentioning the currently active standard level.


Jason



Re: [PATCH 1.0/2] ipa-sra: Introduce a mini-DCE to tree-inline.c (PR 93385)

2021-05-21 Thread Martin Jambor
Hi,

On Tue, May 11 2021, Richard Biener wrote:
> On Mon, May 10, 2021 at 8:52 PM Martin Jambor  wrote:

[...]

>> @@ -969,6 +969,97 @@ ipa_param_body_adjustments::carry_over_param (tree t)
>>return new_parm;
>>  }
>>
>> +/* Return true if BLOCKS_TO_COPY is NULL or if PHI has an argument ARG in
>> +   position that corresponds to an edge that is coming from a block that has
>> +   the corresponding bit set in BLOCKS_TO_COPY.  */
>> +
>> +static bool
>> +phi_arg_will_live_p (gphi *phi, bitmap blocks_to_copy, tree arg)
>> +{
>> +  bool arg_will_survive = false;
>> +  if (!blocks_to_copy)
>> +arg_will_survive = true;
>> +  else
>> +for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
>> +  if (gimple_phi_arg_def (phi, i) == arg
>
> I think this is prone to quadratic, it would be nice to use the
> faster FOR_EACH_IMM_USE_FAST () below and then
> phi_arg_index_from_use () to get to the corresponding edge
> directly - this would remove the loop over PHI args here and
> you can then inline the function below.

I see, thanks, I have changed the code as suggested.

>
>> + && bitmap_bit_p (blocks_to_copy,
>> +  gimple_phi_arg_edge (phi, i)->src->index))
>> +   {
>> + arg_will_survive = true;
>> + break;
>> +   }
>> +  return arg_will_survive;
>> +}
>> +
>> +/* Populate m_dead_stmts given that DEAD_PARAM is going to be removed 
>> without
>> +   any replacement or splitting.  REPL is the replacement VAR_SECL to base 
>> any
>> +   remaining uses of a removed parameter on.  */
>> +
>> +void
>> +ipa_param_body_adjustments::mark_dead_statements (tree dead_param)
>> +{
>> +  /* Current IPA analyses which remove unused parameters never remove a
>> + non-gimple register ones which have any use except as parameters in 
>> other
>> + calls, so we can safely leve them as they are.  */
>> +  if (!is_gimple_reg (dead_param))
>> +return;
>> +  tree parm_ddef = ssa_default_def (m_id->src_cfun, dead_param);
>> +  if (!parm_ddef || has_zero_uses (parm_ddef))
>> +return;
>> +
>> +  auto_vec stack;
>> +  m_dead_ssas.add (parm_ddef);
>> +  stack.safe_push (parm_ddef);
>> +  while (!stack.is_empty ())
>> +{
>> +  tree t = stack.pop ();
>> +
>> +  imm_use_iterator imm_iter;
>> +  gimple *stmt;
>> +
>> +  insert_decl_map (m_id, t, error_mark_node);
>> +  FOR_EACH_IMM_USE_STMT (stmt, imm_iter, t)
>> +   {
>> + if (is_gimple_call (stmt)
>
> (*) so we just ignore dead calls?  That is, there might be
> a const call like
>
>_1 = foo (removed_param_2);
>
> and followup important to be removed uses of _1?  Above
> you say that eventual IPA SRA argument removal is dealt
> with but this still leaves the const/pure call case.  Unless
> IPA SRA bails on them, of course.

It does, dealing with pure/const functions is only a TODO.  This code is
basically only meant to understand what isra_track_scalar_value_uses
can.  I was looking at somehow merging them but introducing
callbacks/specialization would make it too complex and ugly for little
gain.


> Can you add a comment like
>
> /* IPA SRA only handles dead arguments in calls when it
>removes those arguments from the called function.  The
>IPA machinery arranges for fixup here.  */

I added the following comment which I think better describes the
situation:

/* Calls containing dead arguments cannot be deleted,
   modify_call_stmt will instead remove just the argument later on. 
   If isra_track_scalar_value_uses in ipa-sra.c is extended to look 
   through const functions, we will need to do so here too.  */ 

because arguments are indeed "usually" removed during the final fixup of
calls but this case is actually an exception from that "rule."

>
>> + || (m_id->blocks_to_copy
>> + && !bitmap_bit_p (m_id->blocks_to_copy,
>> +   gimple_bb (stmt)->index)))
>> +   continue;
>> +
>> + if (is_gimple_debug (stmt))
>> +   {
>> + m_dead_stmts.add (stmt);
>> + gcc_assert (gimple_debug_bind_p (stmt));
>> +   }
>> + else if (gimple_code (stmt) == GIMPLE_PHI)
>> +   {
>> + gphi *phi = as_a  (stmt);
>> + if (phi_arg_will_live_p (phi, m_id->blocks_to_copy, t))
>> +   {
>> + m_dead_stmts.add (phi);
>> + tree res = gimple_phi_result (phi);
>> + if (!m_dead_ssas.add (res))
>> +   stack.safe_push (res);
>> +   }
>> +   }
>> + else if (is_gimple_assign (stmt))
>> +   {
>> + m_dead_stmts.add (stmt);
>> + if (!gimple_clobber_p (stmt))
>> +   {
>> + tree lhs = gimple_assign_lhs (stmt);
>> + gcc_assert (TREE_CODE (lhs) == SSA_NAME);
>
> I don't think you can assert this?  Well, maybe you can because
> IPA prop would have refused to mark the param dead?  

Re: [PATCH 0.5/2] ipa-sra: Restructure how cloning and call redirection communicate (PR 93385)

2021-05-21 Thread Martin Jambor
Hi,

this is a PING of the following patch, which however also now got two
fixes, it deallocated the newly added summary in toplev::finalize and
fixes a mistakenly uninitialized field (dummy, used by OpenMP SIMD
cloning).

The subsequent patch which actually deals with the PR has already been
approved by Richi, so this is the one holding it up:


I was asked by Richi to split my fix for PR 93385 for easier review
into IPA-SRA materialization refactoring and the actual DCE addition.
Fortunately it was mostly natural except for a temporary weird
condition in ipa_param_body_adjustments::modify_call_stmt.
Additionally.

This is the first part which basically replaces performed_splits in
clone_info and the code which generates it, keeps it up-to-date and
consumes it with new edge summaries which are much nicer.  It simply
contains 1) a mapping from the original argument indices to the actual
indices in the call statement as it is now, 2) information needed to
identify arguments representing pass-through IPA-SRA splits with which
have been added to the call arguments in place of an original
argument/reference and 3) a delta to the index where va_args may start
- so basically directly all the information that the consumer of
performed_splits had to compute and we also do not need the weird
dummy declarations.

The main disadvantage is that the information has to be created (and
kept up-to-date) for all call graph edges associated with the given
statement from all clones (including inline clones) of the clone where
splitting or removal happened first.  But all of this happens during
clone materialization so the only effect on WPA memory consumption is
the removal of a pointer from clone_info.

The statement modification code also has to know the statement from
the original function in order to be able to locate the edge summaries
which at this point are still keyed to these.  However, the code is
already quite heavily dependant on how things are structured in
tree-inline.c and in order to fix bugs like these it probably has to
be.

The subsequent patch needs this new information to be able to remove
arguments from calls during materialization and communicate this
information to the call redirection.

Bootstrapped and tested on x86_64-linux.  Together with other patches I
also did that on i686-linux and aarch64-linux and (profiled)
LTO-bootstrapped it on x86_64-linux.

OK for trunk?

Thanks,

Martin


2021-05-19  Martin Jambor  

PR ipa/93385
* symtab-clones.h (clone_info): Removed member param_adjustments.
* ipa-param-manipulation.h: Adjust initial comment to reflect how we
deal with pass-through splits now.
(ipa_param_performed_split): Removed.
(ipa_param_adjustments::modify_call): Adjusted parameters.
(class ipa_param_body_adjustments): Adjusted parameters of
register_replacement, modify_gimple_stmt and modify_call_stmt.
(ipa_verify_edge_has_no_modifications): Declare.
(ipa_edge_modifications_finalize): Declare.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Remove
performed_splits processing, pas only edge to padjs->modify_call,
check that call arguments were not modified if they should not have
been.
* cgraphclones.c (cgraph_node::create_clone): Do not copy performed
splits.
* ipa-param-manipulation.c (struct pass_through_split_map): New type.
(ipa_edge_modification_info): Likewise.
(ipa_edge_modification_sum): Likewise.
(ipa_edge_modifications): New edge summary.
(ipa_verify_edge_has_no_modifications): New function.
(transitive_split_p): Removed.
(transitive_split_map): Likewise.
(init_transitive_splits): Likewise.
(ipa_param_adjustments::modify_call): Adjusted to use the new edge
summary instead of performed_splits.
(ipa_param_body_adjustments::register_replacement): Drop dummy
parameter, set base_index of the created ipa_param_body_replacement.
(phi_arg_will_live_p): New function.
(ipa_param_body_adjustments::common_initialization): Do not create
IPA_SRA dummy decls.
(simple_tree_swap_info): Removed.
(remap_split_decl_to_dummy): Likewise.
(record_argument_state_1): New function.
(record_argument_state): Likewise.
(ipa_param_body_adjustments::modify_call_stmt): New parameter
orig_stmt.  Do not work with dummy decls, save necessary info about
changes to ipa_edge_modifications.
(ipa_param_body_adjustments::modify_gimple_stmt): New parameter
orig_stmt, pass it to modify_call_stmt.
(ipa_param_body_adjustments::modify_cfun_body): Adjust call to
modify_gimple_stmt.
(ipa_edge_modifications_finalize): New function.
* tree-inline.c (remap_gimple_stmt): Pass original statement to
modify_gimple_stmt.
(copy_phis_for_bb): Do not copy dead PHI nodes.
 

Re: [Ping^2, Patch, Fortran] PR98301 Re: RANDOM_INIT() and coarray Fortran

2021-05-21 Thread Steve Kargl via Gcc-patches
On Fri, May 21, 2021 at 10:09:02AM +0200, Andre Vehreschild wrote:
> Ping, ping!
> 
> Please find attached a rebased version of the patch for the RANDOM_INIT issue
> with coarray Fortran. Nothing changed to the previous version, just rebased to
> current master.
> 
> Regtested fine on x86_64-linux/f33. Ok for trunk?
> 

I think you've down your due diligence with 2 pings.
I would commit.

-- 
steve


Add 'libgomp.oacc-fortran/privatized-ref-2.f90'

2021-05-21 Thread Thomas Schwinge
Hi!

This came into existance internally, when the og10 branch was set up.

On 2020-06-03T17:23:51+0200, Tobias Burnus  wrote:
> This fixes [...] on OG10 (og10_prerelease); it will be
> later applied to gcn/… to fix the issue. (Upstream is unaffected.)
> [...]

However, that means that your testcase does work on master branch (and
would regress if certain commits got pushed there).  As the testcase has
got a property useful for a thing I'm currently working on, I've pushed
to master branch "Add 'libgomp.oacc-fortran/privatized-ref-2.f90'" in
commit 61796dc03befa9b7426d5bc7c336cca585944143, and "Don't skip
'libgomp.oacc-fortran/privatized-ref-2.f90' for nvptx offloading" in
commit 5d42db533324e80a7382b20b94cace5b202d41ea, see attached.


I confirm that "FIXME: Fails due to PR middle-end/95499" is still a
problem.

And, GCC '-O' reports:

[...]/libgomp.oacc-fortran/privatized-ref-2.f90:147:21:

  147 |   subroutine foobar15 (scalar)
  | ^
Warning: ‘foobar15’ defined but not used [-Wunused-function]
[...]/libgomp.oacc-fortran/privatized-ref-2.f90: In function ‘MAIN__’:
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: ‘a.offset’ 
is used uninitialized [-Wuninitialized]
   31 |   A = [(3*j, j=1, 10)]
  |  ^
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared 
here
   27 |   integer, allocatable :: A(:)
  |  ^
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: 
‘a.dim[0].lbound’ is used uninitialized [-Wuninitialized]
   31 |   A = [(3*j, j=1, 10)]
  |  ^
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared 
here
   27 |   integer, allocatable :: A(:)
  |  ^
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:31:22: warning: 
‘a.dim[0].ubound’ is used uninitialized [-Wuninitialized]
   31 |   A = [(3*j, j=1, 10)]
  |  ^
[...]/libgomp.oacc-fortran/privatized-ref-2.f90:27:30: note: ‘a’ declared 
here
   27 |   integer, allocatable :: A(:)
  |  ^

I haven't looked into these.


Grüße
 Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From 61796dc03befa9b7426d5bc7c336cca585944143 Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Wed, 3 Jun 2020 15:35:12 +0200
Subject: [PATCH 1/2] Add 'libgomp.oacc-fortran/privatized-ref-2.f90'

	libgomp/
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: New.
---
 .../libgomp.oacc-fortran/privatized-ref-2.f90 | 101 ++
 1 file changed, 101 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
new file mode 100644
index 000..6c3e1dcc211
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
@@ -0,0 +1,101 @@
+! { dg-do run { target { ! openacc_nvidia_accel_selected } } }
+
+program main
+  implicit none (type, external)
+  integer :: j
+  integer, allocatable :: A(:)
+  character(len=:), allocatable :: my_str
+  character(len=15), allocatable :: my_str15
+
+  A = [(3*j, j=1, 10)]
+  call foo (A, size(A))
+  call bar (A)
+  my_str = "1234567890"
+  call foo_str(my_str)
+  call bar_str(my_str)
+  my_str15 = "123456789012345"
+  call foobar (my_str15)
+  deallocate (A, my_str, my_str15)
+contains
+  subroutine foo (array, nn)
+integer :: i, nn
+integer :: array(nn)
+
+!$acc parallel copyout(array)
+array = [(-i, i = 1, nn)]
+!$acc loop gang private(array)
+do i = 1, 10
+  array(i) = i
+end do
+if (any (array /= [(-i, i = 1, nn)])) error stop 1
+!$acc end parallel
+  end subroutine foo
+  subroutine bar (array)
+integer :: i
+integer :: array(:)
+
+!$acc parallel copyout(array)
+array = [(-2*i, i = 1, size(array))]
+!$acc loop gang private(array)
+do i = 1, 10
+  array(i) = 9*i
+end do
+if (any (array /= [(-2*i, i = 1, 10)])) error stop 2
+!$acc end parallel
+  end subroutine bar
+  subroutine foo_str(str)
+integer :: i
+character(len=*) :: str
+
+!$acc parallel copyout(str)
+str = "abcdefghij"
+!$acc loop gang private(str)
+do i = 1, 10
+  str(i:i) = achar(ichar('A') + i)
+end do
+if (str /= "abcdefghij") error stop 3
+!$acc end parallel
+  end
+  subroutine bar_str(str)
+integer :: i
+character(len=:), allocatable :: str
+
+! ***
+! FIXME: Fails due to PR middle-end/95499
+! ***
+!!$acc parallel copyout(str)
+str = "abcdefghij"
+!!$acc loop gang private(str)
+!do i = 1, 10

[RFC PATCH] i386: Enable auto-vectorization for 32bit modes (+ testcases)

2021-05-21 Thread Uros Bizjak via Gcc-patches
Here it is, the patch that enables auto-vectorization for 32bit modes.

Sent as RFC, because the patch fails some vectorizer scans, as it
obviously enables more vectorization to happen:

Running target unix
FAIL: gcc.dg/vect/pr71264.c -flto -ffat-lto-objects  scan-tree-dump
vect "vectorized 1 loops in function"
FAIL: gcc.dg/vect/pr71264.c scan-tree-dump vect "vectorized 1 loops in function"
FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
FAIL: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 3
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 3


Running target unix/-m32
FAIL: gcc.dg/vect/no-vfa-vect-101.c scan-tree-dump-times vect "can't
determine dependence" 1
FAIL: gcc.dg/vect/no-vfa-vect-102.c scan-tree-dump-times vect
"possible dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-102a.c scan-tree-dump-times vect
"possible dependence between data-refs" 1
FAIL: gcc.dg/vect/no-vfa-vect-37.c scan-tree-dump-times vect "can't
determine dependence" 2
FAIL: gcc.dg/vect/pr71264.c -flto -ffat-lto-objects  scan-tree-dump
vect "vectorized 1 loops in function"
FAIL: gcc.dg/vect/pr71264.c scan-tree-dump vect "vectorized 1 loops in function"
FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-28.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 1
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/slp-28.c scan-tree-dump-times vect "vectorizing
stmts using SLP" 1
FAIL: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/slp-3.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vectorizing stmts using SLP" 3
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vect "vectorized 3 loops" 1
FAIL: gcc.dg/vect/slp-3.c scan-tree-dump-times vect "vectorizing stmts
using SLP" 3
FAIL: gcc.dg/vect/vect-104.c -flto -ffat-lto-objects
scan-tree-dump-times vect "possible dependence between data-refs" 1
FAIL: gcc.dg/vect/vect-104.c scan-tree-dump-times vect "possible
dependence between data-refs" 1

Please also note that V4QI and V2HI modes do not use MMX registers, so
auto-vectorization can also be enabled on 32bit x86 targets.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f3b451835da..f43f3ba060e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22187,12 +22187,15 @@ ix86_autovectorize_vector_modes (vector_modes *modes, 
bool all)
   modes->safe_push (V16QImode);
   modes->safe_push (V32QImode);
 }
-  else if (TARGET_MMX_WITH_SSE)
+  else if (TARGET_SSE2)
 modes->safe_push (V16QImode);
 
   if (TARGET_MMX_WITH_SSE)
 modes->safe_push (V8QImode);
 
+  if (TARGET_SSE2)
+modes->safe_push (V4QImode);
+
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/pr100637-3b.c 
b/gcc/testsuite/gcc.target/i386/pr100637-3b.c
new file mode 100644
index 000..16df70059a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100637-3b.c
@@ -0,0 +1,56 @@
+/* PR target/100637 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse4" } */
+
+char r[4], a[4], b[4];
+unsigned char ur[4], ua[4], ub[4];
+
+void maxs (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+r[i] = a[i] > b[i] ? a[i] : b[i];
+}
+
+/* { dg-final { scan-assembler "pmaxsb" } } */
+
+void maxu (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+ur[i] = ua[i] > ub[i] ? ua[i] : ub[i];
+}
+
+/* { dg-final { scan-assembler "pmaxub" } } */
+
+void mins (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+r[i] = a[i] < b[i] ? a[i] : b[i];
+}
+
+/* { dg-final { scan-assembler "pminsb" } } */
+
+void minu (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+ur[i] = ua[i] < ub[i] ? ua[i] : ub[i];
+}
+
+/* { dg-final { scan-assembler "pminub" } } */
+
+void _abs (void)
+{
+  int i;
+
+  for (i = 0; i < 4; i++)
+r[i] = a[i] < 0 ? -a[i] : a[i];
+}
+
+/* { dg-final { scan-assembler "pabsb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100637-3w.c 
b/gcc/testsuite/gcc.target/i386/pr100637-3w.c
new file mode 100644
index 000..7f1882e7a56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr100637-3w.c
@@ -0,0 +1,86 @@
+/* PR target/100637 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse4" } */
+
+short r[2], a[2], b[2];
+unsigned short ur[2], ua[2], ub[2];
+
+void mulh (void)
+{
+  

RE: [PATCH] aarch64: Add attributes for builtins specified in aarch64-builtins.c

2021-05-21 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Richard Sandiford 
> Sent: 21 May 2021 13:08
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Add attributes for builtins specified in 
> aarch64-
> builtins.c
> 
> Kyrylo Tkachov  writes:
> > Hi all,
> >
> > Besides the builtins in aarch64-simd-builtins.def there are a number of
> builtins defined in aarch64-builtins.c itself.
> > They could also benefit from the attributes generated by
> aarch64_get_attributes.
> > However aarch64_get_attributes and its helpers are only set up to handle a
> aarch64_simd_builtin_datum.
> >
> > This patch changes these functions to instead take a flag and mode value
> that are extracted from
> > aarch64_simd_builtin_datum.flags and aarch64_simd_builtin_datum.mode
> anyway.
> > Then the various builtin init functions in aarch64-builtins.c can pass down
> their own FLAG_* flags
> > that they want to derive attributes from.
> > I'm not too happy with changing the helper functions in this way, but it
> seemed like the least bad option.
> >
> > Richard, does this look like an ok approach to you?
> 
> Looks good to me.  Just some very minor things:
> 
> I guess this is personal preference, but in:
> 
> > +  unsigned int flags = FLAG_FP;
> > +  tree attrs = aarch64_get_attributes (flags, d->mode);
> 
> it seemed odd to separate out the flags variable instead of just having:
> 
>   tree attrs = aarch64_get_attributes (FLAG_FP, d->mode);
> 
> Long line in:
> 
> > +  tree fndecl = aarch64_general_add_builtin (d->name, ftype, d->fcode,
> attrs);
> 
> For;
> 
> > @@ -906,14 +906,13 @@ aarch64_init_simd_builtin_scalar_types (void)
> >  "__builtin_aarch64_simd_udi");
> >  }
> >
> > -/* Return a set of FLAG_* flags that describe what the function could do,
> > +/* Adjust the set of FLAG_* flags derived from FLAGS
> > +   that describe what the function with result MODE could do,
> 
> I think this is easier to parse without the s/Return a/Adjust the/.
> 
> Also, s/the function/a function/ seems better, now that we're not passing
> a specific function.  Same for the other comments.

Thanks for reviewing.
I've pushed the fixed patch to trunk.

Kyrill

> 
> Thanks,
> Richard


attrs.patch
Description: attrs.patch


Re: [PATCH] c++/88601 - [C/C++] __builtin_shufflevector support

2021-05-21 Thread Richard Biener via Gcc-patches
On Fri, May 21, 2021 at 3:19 PM Richard Biener  wrote:
>
> This adds support for the clang __builtin_shufflevector extension to
> the C and C++ frontends.  The builtin is lowered to VEC_PERM_EXPR.
> Because VEC_PERM_EXPR does not support different sized vector inputs
> or result or the special permute index of -1 (don't-care)
> c_build_shufflevector applies lowering by widening inputs and output
> to the widest vector, replacing -1 by a defined index and
> subsetting the final vector if we produced a wider result than
> desired.
>
> Code generation thus can be sub-optimal, followup patches will
> aim to fix that by recovering from part of the missing features
> during RTL expansion and by relaxing the constraints of the GIMPLE
> IL with regard to VEC_PERM_EXPR.
>
> Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.
>
> Honza - you've filed PR88601, can you point me to testcases that
> exercise common uses so we can look at code generation quality
> and where time is spent best in improving things?
>
> OK for trunk?

Updated patch with added documentation and the C++ testcases
moved to g++.dg/ext/

Richard.


p
Description: Binary data


[Patch, Fortran] PR100337 Should be able to pass non-present optional arguments to CO_BROADCAST

2021-05-21 Thread Andre Vehreschild via Gcc-patches
Hi,

the attached patch fixes an issue when calling CO_BROADCAST in -fcoarray=single
mode, where the optional but non-present (in the calling scope) stat variable
was assigned to before checking for it being not present.

Regtests fine on x86-64-linux/f33. Ok for trunk?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
gcc/fortran/ChangeLog:

	PR fortran/100337
	* trans-intrinsic.c (conv_co_collective): Check stat for null ptr
	before dereferrencing.

gcc/testsuite/ChangeLog:

	PR fortran/100337
	* gfortran.dg/coarray_collectives_17.f90: New test.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4d7451479d3..03a38090051 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -11232,8 +11232,28 @@ conv_co_collective (gfc_code *code)
   if (flag_coarray == GFC_FCOARRAY_SINGLE)
 {
   if (stat != NULL_TREE)
-	gfc_add_modify (, stat,
-			fold_convert (TREE_TYPE (stat), integer_zero_node));
+	{
+	  /* For optional stats, check the pointer is valid before zero'ing.  */
+	  if (gfc_expr_attr (stat_expr).optional)
+	{
+	  tree tmp;
+	  stmtblock_t ass_block;
+	  gfc_start_block (_block);
+	  gfc_add_modify (_block, stat,
+			  fold_convert (TREE_TYPE (stat),
+	integer_zero_node));
+	  tmp = fold_build2 (NE_EXPR, logical_type_node,
+ gfc_build_addr_expr (NULL_TREE, stat),
+ null_pointer_node);
+	  tmp = fold_build3 (COND_EXPR, void_type_node, tmp,
+ gfc_finish_block (_block),
+ build_empty_stmt (input_location));
+	  gfc_add_expr_to_block (, tmp);
+	}
+	  else
+	gfc_add_modify (, stat,
+			fold_convert (TREE_TYPE (stat), integer_zero_node));
+	}
   return gfc_finish_block ();
 }

diff --git a/gcc/testsuite/gfortran.dg/coarray_collectives_17.f90 b/gcc/testsuite/gfortran.dg/coarray_collectives_17.f90
new file mode 100644
index 000..84a6645865e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray_collectives_17.f90
@@ -0,0 +1,42 @@
+! { dg-do run }
+! { dg-options "-fcoarray=single" }
+!
+! PR 100337
+! Test case inspired by code submitted by Brad Richardson
+
+program main
+implicit none
+
+integer, parameter :: MESSAGE = 42
+integer :: result
+
+call myco_broadcast(MESSAGE, result, 1)
+
+if (result /= MESSAGE) error stop 1
+contains
+subroutine myco_broadcast(m, r, source_image, stat, errmsg)
+integer, intent(in) :: m
+integer, intent(out) :: r
+integer, intent(in) :: source_image
+integer, intent(out), optional :: stat
+character(len=*), intent(inout), optional :: errmsg
+
+integer :: data_length
+
+data_length = 1
+
+call co_broadcast(data_length, source_image, stat, errmsg)
+
+if (present(stat)) then
+if (stat /= 0) return
+end if
+
+if (this_image() == source_image) then
+r = m
+end if
+
+call co_broadcast(r, source_image, stat, errmsg)
+end subroutine
+
+end program
+


Re: [PATCH] constructor: Elide expand_constructor when can move by pieces is true

2021-05-21 Thread H.J. Lu via Gcc-patches
On Fri, May 21, 2021 at 12:30 AM Bernd Edlinger
 wrote:
>
>
>
> On 5/21/21 8:57 AM, Richard Biener wrote:
> > On Thu, May 20, 2021 at 4:04 PM H.J. Lu  wrote:
> >>
> >> On Thu, May 20, 2021 at 12:51 AM Richard Biener
> >>  wrote:
> >>>
> >>> On Wed, May 19, 2021 at 3:22 PM H.J. Lu  wrote:
> 
>  On Wed, May 19, 2021 at 2:33 AM Richard Biener
>   wrote:
> >
> > On Tue, May 18, 2021 at 9:16 PM H.J. Lu  wrote:
> >>
> >> When expanding a constant constructor, don't call expand_constructor if
> >> it is more efficient to load the data from the memory via move by 
> >> pieces.
> >>
> >> gcc/
> >>
> >> PR middle-end/90773
> >> * expr.c (expand_expr_real_1): Don't call expand_constructor if
> >> it is more efficient to load the data from the memory.
> >>
> >> gcc/testsuite/
> >>
> >> PR middle-end/90773
> >> * gcc.target/i386/pr90773-24.c: New test.
> >> * gcc.target/i386/pr90773-25.c: Likewise.
> >> ---
> >>  gcc/expr.c | 10 ++
> >>  gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++
> >>  gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
> >>  3 files changed, 52 insertions(+)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c
> >>
> >> diff --git a/gcc/expr.c b/gcc/expr.c
> >> index d09ee42e262..80e01ea1cbe 100644
> >> --- a/gcc/expr.c
> >> +++ b/gcc/expr.c
> >> @@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, 
> >> machine_mode tmode,
> >> unsigned HOST_WIDE_INT ix;
> >> tree field, value;
> >>
> >> +   /* Check if it is more efficient to load the data from
> >> +  the memory directly.  FIXME: How many stores do we
> >> +  need here if not moved by pieces?  */
> >> +   unsigned HOST_WIDE_INT bytes
> >> + = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> >
> > that's prone to fail - it could be a VLA.
> 
>  What do you mean by fail?  Is it ICE or missed optimization?
>  Do you have a testcase?
> 
> >
> >> +   if ((bytes / UNITS_PER_WORD) > 2
> >> +   && MOVE_MAX_PIECES > UNITS_PER_WORD
> >> +   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
> >> + goto normal_inner_ref;
> >> +
> >
> > It looks like you're concerned about aggregate copies but this also 
> > handles
> > non-aggregates (which on GIMPLE might already be optimized of course).
> 
>  Here I check if we copy more than 2 words and we can move more than
>  a word in a single instruction.
> 
> > Also you say "if it's cheaper" but I see no cost considerations.  How do
> > we generally handle immed const vs. load from constant pool costs?
> 
>  This trades 2 (update to 8) stores with one load plus one store.  Is 
>  there
>  a way to check which one is faster?
> >>>
> >>> I'm not sure - it depends on whether the target can do stores from 
> >>> immediates
> >>> at all or what restrictions apply, what the immediate value actually is
> >>> (zero or all-ones should be way cheaper than sth arbitrary) and how the
> >>> pressure on the load unit is.  can_move_by_pieces (bytes, TYPE_ALIGN 
> >>> (type))
> >>> also does not guarantee it will actually move pieces larger than 
> >>> UNITS_PER_WORD,
> >>> that might depend on alignment.  There's by_pieces_ninsns that might 
> >>> provide
> >>> some hint here.
> >>>
> >>> I'm sure it works well for x86.
> >>>
> >>> I wonder if the existing code is in the appropriate place and we
> >>> shouldn't instead
> >>> handle this somewhere upthread where we ask to copy 'exp' into some other
> >>> memory location.  For your testcase that's expand_assignment but I can
> >>> imagine passing array[0] by value to a function resulting in similar 
> >>> copying.
> >>> Testing that shows we get
> >>>
> >>> pushq   array+56(%rip)
> >>> .cfi_def_cfa_offset 24
> >>> pushq   array+48(%rip)
> >>> .cfi_def_cfa_offset 32
> >>> pushq   array+40(%rip)
> >>> .cfi_def_cfa_offset 40
> >>> pushq   array+32(%rip)
> >>> .cfi_def_cfa_offset 48
> >>> pushq   array+24(%rip)
> >>> .cfi_def_cfa_offset 56
> >>> pushq   array+16(%rip)
> >>> .cfi_def_cfa_offset 64
> >>> pushq   array+8(%rip)
> >>> .cfi_def_cfa_offset 72
> >>> pushq   array(%rip)
> >>> .cfi_def_cfa_offset 80
> >>> callbar
> >>>
> >>> for that.  We do have the by-pieces infrastructure to generally do this 
> >>> kind of
> >>> copying but in both of these cases we do not seem to use it.  I also 
> >>> 

[PATCH] Elide expand_constructor if move by pieces is preferred

2021-05-21 Thread H.J. Lu via Gcc-patches
On Thu, May 20, 2021 at 11:57 PM Richard Biener
 wrote:
>
> On Thu, May 20, 2021 at 4:04 PM H.J. Lu  wrote:
> >
> > On Thu, May 20, 2021 at 12:51 AM Richard Biener
> >  wrote:
> > >
> > > On Wed, May 19, 2021 at 3:22 PM H.J. Lu  wrote:
> > > >
> > > > On Wed, May 19, 2021 at 2:33 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Tue, May 18, 2021 at 9:16 PM H.J. Lu  wrote:
> > > > > >
> > > > > > When expanding a constant constructor, don't call 
> > > > > > expand_constructor if
> > > > > > it is more efficient to load the data from the memory via move by 
> > > > > > pieces.
> > > > > >
> > > > > > gcc/
> > > > > >
> > > > > > PR middle-end/90773
> > > > > > * expr.c (expand_expr_real_1): Don't call 
> > > > > > expand_constructor if
> > > > > > it is more efficient to load the data from the memory.
> > > > > >
> > > > > > gcc/testsuite/
> > > > > >
> > > > > > PR middle-end/90773
> > > > > > * gcc.target/i386/pr90773-24.c: New test.
> > > > > > * gcc.target/i386/pr90773-25.c: Likewise.
> > > > > > ---
> > > > > >  gcc/expr.c | 10 ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 
> > > > > > ++
> > > > > >  gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
> > > > > > 
> > > > > >  3 files changed, 52 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c
> > > > > >
> > > > > > diff --git a/gcc/expr.c b/gcc/expr.c
> > > > > > index d09ee42e262..80e01ea1cbe 100644
> > > > > > --- a/gcc/expr.c
> > > > > > +++ b/gcc/expr.c
> > > > > > @@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, 
> > > > > > machine_mode tmode,
> > > > > > unsigned HOST_WIDE_INT ix;
> > > > > > tree field, value;
> > > > > >
> > > > > > +   /* Check if it is more efficient to load the data 
> > > > > > from
> > > > > > +  the memory directly.  FIXME: How many stores do 
> > > > > > we
> > > > > > +  need here if not moved by pieces?  */
> > > > > > +   unsigned HOST_WIDE_INT bytes
> > > > > > + = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> > > > >
> > > > > that's prone to fail - it could be a VLA.
> > > >
> > > > What do you mean by fail?  Is it ICE or missed optimization?
> > > > Do you have a testcase?
> > > >
> > > > >
> > > > > > +   if ((bytes / UNITS_PER_WORD) > 2
> > > > > > +   && MOVE_MAX_PIECES > UNITS_PER_WORD
> > > > > > +   && can_move_by_pieces (bytes, TYPE_ALIGN 
> > > > > > (type)))
> > > > > > + goto normal_inner_ref;
> > > > > > +
> > > > >
> > > > > It looks like you're concerned about aggregate copies but this also 
> > > > > handles
> > > > > non-aggregates (which on GIMPLE might already be optimized of course).
> > > >
> > > > Here I check if we copy more than 2 words and we can move more than
> > > > a word in a single instruction.
> > > >
> > > > > Also you say "if it's cheaper" but I see no cost considerations.  How 
> > > > > do
> > > > > we generally handle immed const vs. load from constant pool costs?
> > > >
> > > > This trades 2 (update to 8) stores with one load plus one store.  Is 
> > > > there
> > > > a way to check which one is faster?
> > >
> > > I'm not sure - it depends on whether the target can do stores from 
> > > immediates
> > > at all or what restrictions apply, what the immediate value actually is
> > > (zero or all-ones should be way cheaper than sth arbitrary) and how the
> > > pressure on the load unit is.  can_move_by_pieces (bytes, TYPE_ALIGN 
> > > (type))
> > > also does not guarantee it will actually move pieces larger than 
> > > UNITS_PER_WORD,
> > > that might depend on alignment.  There's by_pieces_ninsns that might 
> > > provide
> > > some hint here.
> > >
> > > I'm sure it works well for x86.
> > >
> > > I wonder if the existing code is in the appropriate place and we
> > > shouldn't instead
> > > handle this somewhere upthread where we ask to copy 'exp' into some other
> > > memory location.  For your testcase that's expand_assignment but I can
> > > imagine passing array[0] by value to a function resulting in similar 
> > > copying.
> > > Testing that shows we get
> > >
> > > pushq   array+56(%rip)
> > > .cfi_def_cfa_offset 24
> > > pushq   array+48(%rip)
> > > .cfi_def_cfa_offset 32
> > > pushq   array+40(%rip)
> > > .cfi_def_cfa_offset 40
> > > pushq   array+32(%rip)
> > > .cfi_def_cfa_offset 48
> > > pushq   array+24(%rip)
> > > .cfi_def_cfa_offset 56
> > > pushq   array+16(%rip)
> > > .cfi_def_cfa_offset 64
> > > pushq   array+8(%rip)
> > > .cfi_def_cfa_offset 72
> > > pushq   array(%rip)
> > > .cfi_def_cfa_offset 

Re: [PATCH 05/57] rs6000: Add file support and functions for diagnostic support

2021-05-21 Thread Bill Schmidt via Gcc-patches

On 5/20/21 6:03 PM, Segher Boessenkool wrote:

Hi!

On Tue, Apr 27, 2021 at 10:32:40AM -0500, Bill Schmidt via Gcc-patches wrote:

* config/rs6000/rs6000-gen-builtins.c (bif_file): New filescope
variable.

What makes it interesting that this var has file scope?  Did you mean to
say it has internal linkage ("is static")?  I would just leave that off
completely, anyway (just "New." or "New variable.")
OK.  As you say, just pointing out these are all static linkage, outside 
all functions for convenience.  You'll find this in quite a number of 
patches in the series, so you can assume I'll fix this for all of them.



(LINELEN): New defined constant.

It isn't a constant, it's a macro.  "New macro."?
Hey, these are old-school named constants!  :-)  (No problem, will 
change.  Again, you will see a lot of this in the change logs.)



+#define LINELEN 1024
+static char linebuf[LINELEN];

You never get anywhere close to 1024 I suppose?


Correct; I'd be surprised if we get to a tenth of that on any line.


+/* Pointer to a diagnostic function.  */
+void (*diag) (const char *, ...) __attribute__ ((format (printf, 1, 2)))
+  = NULL;

This isn't portable: you cannot assign NULL to a pointer to function.
But it will work on all POSIX machines, and those are the only hosts we
support.  Still icky :-)

If you just leave off the initialisation, it will be initialised to 0,
which is exactly the same problem of course, just less explicit.

Will do.


This pointer is not static btw?  Should it be?

Yes, looks like it should...

Okay for trunk with a little changelog massaging.  Thanks!


Thank you!

BIll



Segher


Re: [PATCH 04/57] rs6000: Add initial input files

2021-05-21 Thread Bill Schmidt via Gcc-patches



On 5/20/21 5:46 PM, Segher Boessenkool wrote:

On Tue, Apr 27, 2021 at 10:32:39AM -0500, Bill Schmidt via Gcc-patches wrote:

This patch adds a tiny subset of the built-in and overload descriptions.
* config/rs6000/rs6000-builtin-new.def: New.

You'll have to rename this to not have "-new" in the name later, I hope
you realise :-)


Noted for the future "removal" patch.  Thanks for the reminder!

Bill


Okay for trunk.  Thanks!


Segher


Re: [PATCH 00/57] Replace the Power target-specific built-in machinery

2021-05-21 Thread Bill Schmidt via Gcc-patches



On 5/20/21 4:57 PM, Segher Boessenkool wrote:

Is there some (semi-)automatic way to compare the results of the old
and new systems?
Yes...very "semi".  There's a patch in the series that updates the 
information printed from -mdebug=builtin.  I use this to print the 
builtins generated by the old and new versions, and run them through a 
Python script to look for anything missing or added, any mismatches in 
the type system, mismatches in const/pure/etc., that sort of thing.  I 
found quite a lot of mistakes this way and got them fixed ahead of 
time.  In the script I maintain a table of expected differences, where 
there were just errors before, etc., and there's a short doc string with 
each of them.  So I can go back and look at that if we run into any 
discrepancies in the future.  The script doesn't have long-term value 
beyond that; once we delete the old version in a future patch, it won't 
have anything to compare against.

Patch 0057: Fix one last late-breaking change

   Keeping the code up-to-date with upstream has been fun.  When I
   rebased to create the patch set, I found one new issue where a
   small change had been made to the overload handling for the
   vec_insert builtins.  This patch reflects that change into the
   new handling.  My version of git is having trouble with
   interactive rebasing, so it was easier to just add the extra patch.

What breaks by keeping this fix after the other patches?
Nothing truly breaks.  There is a VSX-efficient implementation of 
vec_insert and a less-efficient implementation.  Late in GCC 11, 
somebody recognized the VSX-efficient implementation could be used in 
more situations.  Without this patch, we still use the less-efficient 
implementation in the new version.



(3) A number of built-ins used "long" for DImode types, which would
break these for 32-bit.  I changed those arguments and return values
to "long long" to avoid such problems, when those built-ins were not
restricted to 64-bit mode already.  There aren't many such cases.

You can do this for 64-bit-only builtins as well -- the actual argument
type is never visible (to the user), and everything becomes modes early.

Yep -- in fact, the old system generally used a (V2)DImode type for such 
things, and I've just translated all of those to (vector) long long in 
the new prototypes.


Thanks for the ongoing review!

Bill



Re: [PATCH] Add no_sanitize_coverage attribute.

2021-05-21 Thread Marco Elver via Gcc-patches
On Fri, May 21, 2021 at 10:50AM +0200, Martin Liška wrote:
> On 5/20/21 12:55 PM, Marco Elver wrote:
> > I think this came up with other no_sanitize [1] based on what I had
> > written to you last year [2].
> > 
> > [1]https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547618.html
> > [2]https://lore.kernel.org/lkml/canpmjnnrz5ovkb6pe7k6gjfogbht_zhypkng9ad+kjndzk7...@mail.gmail.com/
> 
> Ah, you're right. I've just updated the patch to address that.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?

Looks good, I also just built a kernel with the no_sanitize_coverage
attribute (without the objtool nop-workaround) and works as expected.

Not sure if required, but would such an additional test be useful:

---

diff --git a/gcc/testsuite/gcc.dg/sancov/attribute.c 
b/gcc/testsuite/gcc.dg/sancov/attribute.c
index bf6dbd4bae7..7cfa9134ff1 100644
--- a/gcc/testsuite/gcc.dg/sancov/attribute.c
+++ b/gcc/testsuite/gcc.dg/sancov/attribute.c
@@ -11,5 +11,17 @@ bar(void)
 {
 }
 
+static void inline
+__attribute__((always_inline))
+inline_fn(void)
+{
+}
+
+void
+__attribute__((no_sanitize_coverage))
+baz(void)
+{
+  inline_fn();
+}
 
 /* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_pc 
\\(\\)" 1 "optimized" } } */

---

Otherwise, please go ahead. I assume this is targeting GCC 12?

Thanks,
-- Marco


Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-05-21 Thread David Edelsohn via Gcc-patches
On Fri, May 21, 2021 at 5:25 AM Martin Liška  wrote:
>
> On 5/20/21 2:54 PM, Richard Biener wrote:
> > On Thu, May 20, 2021 at 2:34 PM Martin Liška  wrote:
> >>
> >> Hello.
> >>
> >> I've got a patch candidate that leverages partial linking for a couple of 
> >> selected object files.
> >>
> >> I'm sending make all-host- jX results for my machine:
> >>
> >> before: 3m18s (user 32m52s)
> >> https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/1dd5eae5001295ba0230a689f7edc67284c9b742/gcc-all-host.svg
> >>
> >> after: 2m57m (user 35m)
> >> https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/d659b2187cf622167841efbbe6bc93cb33855fa9/gcc-all-host-partial-lto.svg
> >>
> >> One can utilize it with:
> >> make -j16 all-host PARTIAL_LTO=1
> >>
> >> @Segher, Andrew: Can you please measure time improvement for your slow 
> >> bootstrap?
> >> One can also tweak --param=lto-partitions=16 param value.
> >>
> >> Thoughts?
> >
> > You're LTO linking multiple objects here - that's almost as if you
> > were doing this
> > for the whole of libbackend.a ... so $(OBJS)_CLFAGS += -flto and in the
> > libbackend.a rule do a similar partial link trick.
>
> Yeah, apart from that one can't likely do partial linking for an archive:
>
> $ g++ -no-pie -flto=auto --param=lto-partitions=16 -flinker-output=nolto-rel 
> -r libbackend.a
> collect2: fatal error: ld terminated with signal 11 [Segmentation fault], 
> core dumped
> compilation terminated.
>
> while ld.bfd immediately finishes.
>
> >
> > That gets you half of a LTO bootstrap then.
> >
> > So why did you go from applying this per-file to multiple files?  Does 
> > $(LINKER)
> > have a proper rule to pick up a jobserver?
> >
> > When upstreaming in any form you probably have to gate it on bootstrap-lto
> > being not active.
>
> Sure, that's reasonable, we can likely detect a -flto option in $(COMPILE), 
> right?
>
> One more thing I face is broken dependency:
> $ make clean && make -j32 PARTIAL_LTO=1
>
> g++ -fcf-protection -fno-PIE -c   -g   -DIN_GCC -fPIC-fno-exceptions 
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
> -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. 
> -I/home/marxin/Programming/gcc/gcc/../include 
> -I/home/marxin/Programming/gcc/gcc/../libcpp/include 
> -I/home/marxin/Programming/gcc/gcc/../libcody  
> -I/home/marxin/Programming/gcc/gcc/../libdecnumber 
> -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber 
> -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o gimple-match-lto.o 
> -MT gimple-match-lto.o -MMD -MP -MF ./.deps/gimple-match-lto.TPo 
> gimple-match.c -flto
> g++ -fcf-protection -fno-PIE -c   -g   -DIN_GCC -fPIC-fno-exceptions 
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
> -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
> -Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
> -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. 
> -I/home/marxin/Programming/gcc/gcc/../include 
> -I/home/marxin/Programming/gcc/gcc/../libcpp/include 
> -I/home/marxin/Programming/gcc/gcc/../libcody  
> -I/home/marxin/Programming/gcc/gcc/../libdecnumber 
> -I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber 
> -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o generic-match-lto.o 
> -MT generic-match-lto.o -MMD -MP -MF ./.deps/generic-match-lto.TPo 
> generic-match.c -flto
>
> In file included from ./tm.h:26,
>   from /home/marxin/Programming/gcc/gcc/backend.h:28,
>   from 
> /home/marxin/Programming/gcc/gcc/generic-match-head.c:23,
>   from generic-match.c:4:
> /home/marxin/Programming/gcc/gcc/config/i386/i386.h:2286:10: fatal error: 
> insn-attr-common.h: No such file or directory
>   2286 | #include "insn-attr-common.h"
>|  ^~~~
> compilation terminated.
> make: *** [Makefile:2678: generic-match-lto.o] Error 1
> make: *** Waiting for unfinished jobs
>
> In file included from ./tm.h:26,
>   from /home/marxin/Programming/gcc/gcc/backend.h:28,
>   from 
> /home/marxin/Programming/gcc/gcc/gimple-match-head.c:23,
>   from gimple-match.c:4:
> /home/marxin/Programming/gcc/gcc/config/i386/i386.h:2286:10: fatal error: 
> insn-attr-common.h: No such file or directory
>   2286 | #include "insn-attr-common.h"
>|  ^~~~
>
> I explicitly added:
> gimple-match.o: gimple-match.c $(generated_files)
> generic-match.o: generic-match.c $(generated_files)
>
> But it's not obeyed.

Please 

[PATCH] c++/88601 - [C/C++] __builtin_shufflevector support

2021-05-21 Thread Richard Biener
This adds support for the clang __builtin_shufflevector extension to
the C and C++ frontends.  The builtin is lowered to VEC_PERM_EXPR.
Because VEC_PERM_EXPR does not support different sized vector inputs
or result or the special permute index of -1 (don't-care)
c_build_shufflevector applies lowering by widening inputs and output
to the widest vector, replacing -1 by a defined index and
subsetting the final vector if we produced a wider result than
desired.

Code generation thus can be sub-optimal, followup patches will
aim to fix that by recovering from part of the missing features
during RTL expansion and by relaxing the constraints of the GIMPLE
IL with regard to VEC_PERM_EXPR.

Bootstrapped on x86_64-unknown-linux-gnu, (re-)testing in progress.

Honza - you've filed PR88601, can you point me to testcases that
exercise common uses so we can look at code generation quality
and where time is spent best in improving things?

OK for trunk?

Thanks,
Richard.

2021-05-21  Richard Biener  

PR c++/88601
gcc/c-family/
* c-common.c: Include tree-vector-builder.h and
vec-perm-indices.h.
(c_common_reswords): Add __builtin_shufflevector.
(c_build_shufflevector): New funtion.
* c-common.h (enum rid): Add RID_BUILTIN_SHUFFLEVECTOR.
(c_build_shufflevector): Declare.

gcc/c/
* c-decl.c (names_builtin_p): Handle RID_BUILTIN_SHUFFLEVECTOR.
* c-parser.c (c_parser_postfix_expression): Likewise.

gcc/cp/
* cp-objcp-common.c (names_builtin_p): Handle
RID_BUILTIN_SHUFFLEVECTOR.
* cp-tree.h (build_x_shufflevector): Declare.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_SHUFFLEVECTOR.
* pt.c (tsubst_copy_and_build): Handle IFN_SHUFFLEVECTOR.
* typeck.c (build_x_shufflevector): Build either a lowered
VEC_PERM_EXPR or an unlowered shufflevector via a temporary
internal function IFN_SHUFFLEVECTOR.

gcc/
* internal-fn.c (expand_SHUFFLEVECTOR): Define.
* internal-fn.def (SHUFFLEVECTOR): New.
* internal-fn.h (expand_SHUFFLEVECTOR): Declare.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-2.c: New testcase.
* c-c++-common/torture/builtin-shufflevector-1.c: Likewise.
* g++.dg/builtin-shufflevector-1.C: Likewise.
* g++.dg/builtin-shufflevector-2.C: Likewise.
---
 gcc/c-family/c-common.c   | 139 ++
 gcc/c-family/c-common.h   |   4 +-
 gcc/c/c-decl.c|   1 +
 gcc/c/c-parser.c  |  38 +
 gcc/cp/cp-objcp-common.c  |   1 +
 gcc/cp/cp-tree.h  |   3 +
 gcc/cp/parser.c   |  15 ++
 gcc/cp/pt.c   |   9 ++
 gcc/cp/typeck.c   |  36 +
 gcc/internal-fn.c |   6 +
 gcc/internal-fn.def   |   3 +
 gcc/internal-fn.h |   1 +
 .../c-c++-common/builtin-shufflevector-2.c|  18 +++
 .../torture/builtin-shufflevector-1.c |  49 ++
 .../g++.dg/builtin-shufflevector-1.C  |  18 +++
 .../g++.dg/builtin-shufflevector-2.C  |  12 ++
 16 files changed, 352 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-shufflevector-2.c
 create mode 100644 gcc/testsuite/c-c++-common/torture/builtin-shufflevector-1.c
 create mode 100644 gcc/testsuite/g++.dg/builtin-shufflevector-1.C
 create mode 100644 gcc/testsuite/g++.dg/builtin-shufflevector-2.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index b7daa2e2654..c4eb2b1c920 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-spellcheck.h"
 #include "selftest.h"
 #include "debug.h"
+#include "tree-vector-builder.h"
+#include "vec-perm-indices.h"
 
 cpp_reader *parse_in;  /* Declared in c-pragma.h.  */
 
@@ -383,6 +385,7 @@ const struct c_common_resword c_common_reswords[] =
   { "__builtin_has_attribute", RID_BUILTIN_HAS_ATTRIBUTE, 0 },
   { "__builtin_launder", RID_BUILTIN_LAUNDER, D_CXXONLY },
   { "__builtin_shuffle", RID_BUILTIN_SHUFFLE, 0 },
+  { "__builtin_shufflevector", RID_BUILTIN_SHUFFLEVECTOR, 0 },
   { "__builtin_tgmath", RID_BUILTIN_TGMATH, D_CONLY },
   { "__builtin_offsetof", RID_OFFSETOF, 0 },
   { "__builtin_types_compatible_p", RID_TYPES_COMPATIBLE_P, D_CONLY },
@@ -1108,6 +,142 @@ c_build_vec_perm_expr (location_t loc, tree v0, tree 
v1, tree mask,
   return ret;
 }
 
+/* Build a VEC_PERM_EXPR if V0, V1 are not error_mark_nodes
+   and have vector types, V0 has the same element type as V1, and the
+   number of elements the result is that of MASK.  */
+tree
+c_build_shufflevector (location_t loc, tree v0, tree v1, vec mask,
+  bool complain)
+{
+  tree ret;
+  

Re: [PATCH] aarch64: Add attributes for builtins specified in aarch64-builtins.c

2021-05-21 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
> Hi all,
>
> Besides the builtins in aarch64-simd-builtins.def there are a number of 
> builtins defined in aarch64-builtins.c itself.
> They could also benefit from the attributes generated by 
> aarch64_get_attributes.
> However aarch64_get_attributes and its helpers are only set up to handle a 
> aarch64_simd_builtin_datum.
>
> This patch changes these functions to instead take a flag and mode value that 
> are extracted from
> aarch64_simd_builtin_datum.flags and aarch64_simd_builtin_datum.mode anyway.
> Then the various builtin init functions in aarch64-builtins.c can pass down 
> their own FLAG_* flags
> that they want to derive attributes from.
> I'm not too happy with changing the helper functions in this way, but it 
> seemed like the least bad option.
>
> Richard, does this look like an ok approach to you?

Looks good to me.  Just some very minor things:

I guess this is personal preference, but in:

> +  unsigned int flags = FLAG_FP;
> +  tree attrs = aarch64_get_attributes (flags, d->mode);

it seemed odd to separate out the flags variable instead of just having:

  tree attrs = aarch64_get_attributes (FLAG_FP, d->mode);

Long line in:

> +  tree fndecl = aarch64_general_add_builtin (d->name, ftype, d->fcode, 
> attrs);

For;

> @@ -906,14 +906,13 @@ aarch64_init_simd_builtin_scalar_types (void)
>"__builtin_aarch64_simd_udi");
>  }
>  
> -/* Return a set of FLAG_* flags that describe what the function could do,
> +/* Adjust the set of FLAG_* flags derived from FLAGS
> +   that describe what the function with result MODE could do,

I think this is easier to parse without the s/Return a/Adjust the/.

Also, s/the function/a function/ seems better, now that we're not passing
a specific function.  Same for the other comments.

Thanks,
Richard


Re: [PATCH] Add 3 target hooks for memset

2021-05-21 Thread H.J. Lu via Gcc-patches
On Thu, May 20, 2021 at 10:42 PM Bernd Edlinger
 wrote:
>
> On 5/20/21 10:49 PM, H.J. Lu wrote:
> > On Wed, May 19, 2021 at 5:55 AM H.J. Lu  wrote:
> >>
> >> On Wed, May 19, 2021 at 2:25 AM Richard Biener
> >>  wrote:
> >>>
> >>> On Tue, May 18, 2021 at 9:16 PM H.J. Lu  wrote:
> 
>  Add TARGET_READ_MEMSET_VALUE and TARGET_GEN_MEMSET_VALUE to support
>  target instructions to duplicate QImode value to TImode/OImode/XImode
>  value for memmset.
> 
>  PR middle-end/90773
>  * builtins.c (builtin_memset_read_str): Call
>  targetm.read_memset_value.
>  (builtin_memset_gen_str): Call targetm.gen_memset_value.
>  * target.def (read_memset_value): New hook.
>  (gen_memset_value): Likewise.
>  * targhooks.c: Inclue "builtins.h".
>  (default_read_memset_value): New function.
>  (default_gen_memset_value): Likewise.
>  * targhooks.h (default_read_memset_value): New prototype.
>  (default_gen_memset_value): Likewise.
>  * doc/tm.texi.in: Add TARGET_READ_MEMSET_VALUE and
>  TARGET_GEN_MEMSET_VALUE hooks.
>  * doc/tm.texi: Regenerated.
>  ---
>   gcc/builtins.c | 47 --
>   gcc/doc/tm.texi| 16 +
>   gcc/doc/tm.texi.in |  4 
>   gcc/target.def | 20 +
>   gcc/targhooks.c| 56 ++
>   gcc/targhooks.h|  4 
>   6 files changed, 104 insertions(+), 43 deletions(-)
> 
>  diff --git a/gcc/builtins.c b/gcc/builtins.c
>  index e1b284846b1..f78a36478ef 100644
>  --- a/gcc/builtins.c
>  +++ b/gcc/builtins.c
>  @@ -6584,24 +6584,11 @@ expand_builtin_strncpy (tree exp, rtx target)
>  previous iteration.  */
> 
>   rtx
>  -builtin_memset_read_str (void *data, void *prevp,
>  +builtin_memset_read_str (void *data, void *prev,
>   HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
>   scalar_int_mode mode)
>   {
>  -  by_pieces_prev *prev = (by_pieces_prev *) prevp;
>  -  if (prev != nullptr && prev->data != nullptr)
>  -{
>  -  /* Use the previous data in the same mode.  */
>  -  if (prev->mode == mode)
>  -   return prev->data;
>  -}
>  -
>  -  const char *c = (const char *) data;
>  -  char *p = XALLOCAVEC (char, GET_MODE_SIZE (mode));
>  -
>  -  memset (p, *c, GET_MODE_SIZE (mode));
>  -
>  -  return c_readstr (p, mode);
>  +  return targetm.read_memset_value ((const char *) data, prev, mode);
>   }
> 
>   /* Callback routine for store_by_pieces.  Return the RTL of a register
>  @@ -6611,37 +6598,11 @@ builtin_memset_read_str (void *data, void *prevp,
>  nullptr, it has the RTL info from the previous iteration.  */
> 
>   static rtx
>  -builtin_memset_gen_str (void *data, void *prevp,
>  +builtin_memset_gen_str (void *data, void *prev,
>  HOST_WIDE_INT offset ATTRIBUTE_UNUSED,
>  scalar_int_mode mode)
>   {
>  -  rtx target, coeff;
>  -  size_t size;
>  -  char *p;
>  -
>  -  by_pieces_prev *prev = (by_pieces_prev *) prevp;
>  -  if (prev != nullptr && prev->data != nullptr)
>  -{
>  -  /* Use the previous data in the same mode.  */
>  -  if (prev->mode == mode)
>  -   return prev->data;
>  -
>  -  target = simplify_gen_subreg (mode, prev->data, prev->mode, 0);
>  -  if (target != nullptr)
>  -   return target;
>  -}
>  -
>  -  size = GET_MODE_SIZE (mode);
>  -  if (size == 1)
>  -return (rtx) data;
>  -
>  -  p = XALLOCAVEC (char, size);
>  -  memset (p, 1, size);
>  -  coeff = c_readstr (p, mode);
>  -
>  -  target = convert_to_mode (mode, (rtx) data, 1);
>  -  target = expand_mult (mode, target, coeff, NULL_RTX, 1);
>  -  return force_reg (mode, target);
>  +  return targetm.gen_memset_value ((rtx) data, prev, mode);
>   }
> 
>   /* Expand expression EXP, which is a call to the memset builtin.  Return
>  diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>  index 85ea9395560..51385044e76 100644
>  --- a/gcc/doc/tm.texi
>  +++ b/gcc/doc/tm.texi
>  @@ -11868,6 +11868,22 @@ This function prepares to emit a conditional 
>  comparison within a sequence
>    @var{bit_code} is @code{AND} or @code{IOR}, which is the op on the 
>  compares.
>   @end deftypefn
> 
>  +@deftypefn {Target Hook} rtx TARGET_READ_MEMSET_VALUE (const char 
>  *@var{c}, void *@var{prev}, scalar_int_mode @var{mode})
>  +This function returns the RTL of a constant integer corresponding to
>  +target reading @code{GET_MODE_SIZE (@var{mode})} 

[PATCH 4/5] Convert remaining passes to RANGE_QUERY.

2021-05-21 Thread Aldy Hernandez via Gcc-patches
This patch converts the remaining users of get_range_info and
get_ptr_nonnull to the range_query API.

No effort was made to move passes away from VR_ANTI_RANGE, or any other
use of deprecated methods.  This was a straight up conversion to the new
API, nothing else.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* builtins.c (check_nul_terminated_array): Convert to RANGE_QUERY.
(expand_builtin_strnlen): Same.
(determine_block_size): Same.
* fold-const.c (expr_not_equal_to): Same.
* gimple-fold.c (size_must_be_zero_p): Same.
* gimple-match-head.c: Include gimple-range.h.
* gimple-pretty-print.c (dump_ssaname_info): Convert to RANGE_QUERY.
* gimple-ssa-warn-restrict.c
(builtin_memref::extend_offset_range): Same.
* graphite-sese-to-poly.c (add_param_constraints): Same.
* internal-fn.c (get_min_precision): Same.
* ipa-fnsummary.c (set_switch_stmt_execution_predicate): Same.
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Same.
* match.pd: Same.
* tree-data-ref.c (split_constant_offset): Same.
(dr_step_indicator): Same.
* tree-dfa.c (get_ref_base_and_extent): Same.
* tree-scalar-evolution.c (iv_can_overflow_p): Same.
* tree-ssa-loop-niter.c (refine_value_range_using_guard): Same.
(determine_value_range): Same.
(record_nonwrapping_iv): Same.
(infer_loop_bounds_from_signedness): Same.
(scev_var_range_cant_overflow): Same.
* tree-ssa-phiopt.c (two_value_replacement): Same.
* tree-ssa-pre.c (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.c (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.c (handle_builtin_stxncpy_strncat): Same.
(get_range): Same.
(dump_strlen_info): Same.
(set_strlen_range): Same.
(maybe_diag_stxncpy_trunc): Same.
(get_len_or_size): Same.
(handle_integral_assign): Same.
* tree-ssa-structalias.c (find_what_p_points_to): Same.
* tree-ssa-uninit.c (find_var_cmp_const): Same.
* tree-switch-conversion.c (bit_test_cluster::emit): Same.
* tree-vect-patterns.c (vect_get_range_info): Same.
(vect_recog_divmod_pattern): Same.
* tree-vrp.c (intersect_range_with_nonzero_bits): Same.
(register_edge_assert_for_2): Same.
(determine_value_range_1): Same.
* tree.c (get_range_pos_neg): Same.
* vr-values.c (vr_values::get_lattice_entry): Same.
(vr_values::update_value_range): Same.
(simplify_conversion_using_ranges): Same.
---
 gcc/builtins.c | 40 ++--
 gcc/fold-const.c   |  8 +++-
 gcc/gimple-fold.c  |  7 ++-
 gcc/gimple-match-head.c|  1 +
 gcc/gimple-pretty-print.c  | 12 -
 gcc/gimple-ssa-warn-restrict.c |  8 +++-
 gcc/graphite-sese-to-poly.c|  9 +++-
 gcc/internal-fn.c  | 14 +++---
 gcc/ipa-fnsummary.c| 11 -
 gcc/ipa-prop.c | 16 +++
 gcc/match.pd   | 19 ++--
 gcc/tree-data-ref.c| 24 --
 gcc/tree-dfa.c | 14 +-
 gcc/tree-scalar-evolution.c| 13 +-
 gcc/tree-ssa-loop-niter.c  | 81 +---
 gcc/tree-ssa-phiopt.c  | 11 -
 gcc/tree-ssa-pre.c | 19 
 gcc/tree-ssa-reassoc.c |  9 ++--
 gcc/tree-ssa-strlen.c  | 85 --
 gcc/tree-ssa-structalias.c |  8 ++--
 gcc/tree-ssa-uninit.c  |  8 +++-
 gcc/tree-switch-conversion.c   | 10 ++--
 gcc/tree-vect-patterns.c   | 18 +--
 gcc/tree-vrp.c | 21 -
 gcc/tree.c | 13 +++---
 gcc/vr-values.c| 12 +++--
 26 files changed, 332 insertions(+), 159 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index e1b284846b1..deb7c083315 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-outof-ssa.h"
 #include "attr-fnspec.h"
 #include "demangle.h"
+#include "gimple-range.h"
 
 struct target_builtins default_target_builtins;
 #if SWITCHABLE_TARGET
@@ -1214,14 +1215,15 @@ check_nul_terminated_array (tree expr, tree src,
   wide_int bndrng[2];
   if (bound)
 {
-  if (TREE_CODE (bound) == INTEGER_CST)
-   bndrng[0] = bndrng[1] = wi::to_wide (bound);
-  else
-   {
- value_range_kind rng = get_range_info (bound, bndrng, bndrng + 1);
- if (rng != VR_RANGE)
-   return true;
-   }
+  value_range r;
+
+  GLOBAL_RANGE_QUERY->range_of_expr (r, bound);
+
+  if (r.kind () != VR_RANGE)
+   return true;
+
+  bndrng[0] = r.lower_bound ();
+  bndrng[1] = r.upper_bound ();
 
   if (exact)
{
@@ -3827,9 +3829,12 @@ expand_builtin_strnlen (tree exp, rtx target, 
machine_mode target_mode)
 return 

[PATCH 1/5] Common API for accessing global and on-demand ranges.

2021-05-21 Thread Aldy Hernandez via Gcc-patches
This patch provides a generic API for accessing global ranges.  It is
meant to replace get_range_info() and get_ptr_nonnull() with one
common interface.  It uses the same API as the ranger (class
range_query), so there will now be one API for accessing local and
global ranges alike.

Follow-up patches will convert all users of get_range_info and
get_ptr_nonnull to this API.

For get_range_info, instead of:

  if (!POINTER_TYPE_P (TREE_TYPE (name)) && SSA_NAME_RANGE_INFO (name))
get_range_info (name, vr);

You can now do:

  RANGE_QUERY (cfun)->range_of_expr (vr, name, [stmt]);

...as well as any other of the range_query methods (range_on_edge,
range_of_stmt, value_of_expr, value_on_edge, value_on_stmt, etc).

As per the API, range_of_expr will work on constants, SSA names, and
anything we support in irange::supports_type_p().

For pointers, the interface is the same, so instead of:

  else if (POINTER_TYPE_P (TREE_TYPE (name)) && SSA_NAME_PTR_INFO (name))
{
  if (get_ptr_nonnull (name))
stuff();
}

One can do:

  RANGE_QUERY (cfun)->range_of_expr (vr, name, [stmt]);
  if (vr.nonzero_p ())
stuff ();

Along with this interface, we are providing a mechanism by which a
pass can use an on-demand ranger transparently, without having to
change its code.  Of course, this assumes all get_range_info() and
get_ptr_nonnull() users have been converted to the new API, which
follow-up patches will do.

If a pass would prefer to use an on-demand ranger with finer grained
and context aware ranges, all it would have to do is call
enable_ranger() at the beginning of the pass, and disable_ranger() at
the end of the pass.

Note, that to use context aware ranges, any user of range_of_expr()
would need to pass additional context.  For example, the optional
gimple statement (or perhaps use range_on_edge or range_of_stmt).

The observant reader will note that RANGE_QUERY is tied to cfun, which
may not be available in certain contexts, such as at RTL time,
gimple-fold, or some other places where we may or may not have cfun
set.

For cases where we are sure there is no cfun, you can use
GLOBAL_RANGE_QUERY instead of RANGE_QUERY(cfun).  The API is the same.

For cases where a function may be called with or without a function,
you could use the following idiom:

  range_query *query = cfun ? RANGE_QUERY (cfun) : GLOBAL_RANGE_QUERY;

  query->range_of_expr (range, expr, [stmt]);

By default RANGE_QUERY uses GLOBAL_RANGE_QUERY, unless the user has
enabled an on-demand ranger with enable_ranger(), in which case it
will use the currently active ranger.  That is, until disable_ranger()
is called, at which point, we revert back to GLOBAL_RANGE_QUERY.

We think this provides a generic way of accessing ranges, both
globally and locally, without having to keep track of types,
SSA_NAME_RANGE_INFO, and SSA_NAME_PTR_INFO.  We also hope this can be
used to transition passes from global to on-demand ranges when
appropriate.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* function.c (allocate_struct_function): Set cfun->x_range_query.
* function.h (struct function): Declare x_range_query.
(RANGE_QUERY): New.
(get_global_range_query): New.
(GLOBAL_RANGE_QUERY): New.
* gimple-range-cache.cc (ssa_global_cache::ssa_global_cache):
Remove call to safe_grow_cleared.
* gimple-range.cc (get_range_global): New.
(gimple_range_global): Move from gimple-range.h.
(get_global_range_query): New.
(global_range_query::range_of_expr): New.
(enable_ranger): New.
(disable_ranger): New.
* gimple-range.h (gimple_range_global): Move to gimple-range.cc.
(class global_range_query): New.
(enable_ranger): New.
(disable_ranger): New.
* gimple-ssa-evrp.c (evrp_folder::~evrp_folder): Rename
dump_all_value_ranges to dump.
* tree-vrp.c (vrp_prop::finalize): Same.
* value-query.cc (range_query::dump): New.
* value-query.h (range_query::dump): New.
* vr-values.c (vr_values::dump_all_value_ranges): Rename to...
(vr_values::dump): ...this.
* vr-values.h (class vr_values): Rename dump_all_value_ranges to
dump and make virtual.
---
 gcc/function.c|   4 ++
 gcc/function.h|  16 +
 gcc/gimple-range-cache.cc |   1 -
 gcc/gimple-range.cc   | 130 ++
 gcc/gimple-range.h|  60 +-
 gcc/gimple-ssa-evrp.c |   2 +-
 gcc/tree-vrp.c|   2 +-
 gcc/value-query.cc|   5 ++
 gcc/value-query.h |   1 +
 gcc/vr-values.c   |   2 +-
 gcc/vr-values.h   |   2 +-
 11 files changed, 175 insertions(+), 50 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index fc7b147b5f1..67576950983 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -82,6 +82,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include 

[PATCH 3/5] Convert evrp pass to RANGE_QUERY(cfun).

2021-05-21 Thread Aldy Hernandez via Gcc-patches
Some tests required tweaking, as exporting global ranges from evrp will
now provide better ranges for subsequent passes.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* gimple-ssa-evrp.c (rvrp_folder): Call enable_ranger.  Use
RANGE_QUERY.
(~rvrp_folder): Call disable_ranger.
(hybrid_folder::value_of_expr): Use global ranger.
(hybrid_folder::value_on_edge): Same.
(hybrid_folder::value_of_stmt): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/Wstringop-overflow-55.c: Remove xfail.
* gcc.dg/pr80776-1.c: Same.
---
 gcc/gimple-ssa-evrp.c| 43 +---
 gcc/testsuite/gcc.dg/Wstringop-overflow-55.c |  8 ++--
 gcc/testsuite/gcc.dg/pr80776-1.c |  4 +-
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/gcc/gimple-ssa-evrp.c b/gcc/gimple-ssa-evrp.c
index 829fdcdaef2..bd4c7634b3e 100644
--- a/gcc/gimple-ssa-evrp.c
+++ b/gcc/gimple-ssa-evrp.c
@@ -117,34 +117,33 @@ class rvrp_folder : public substitute_and_fold_engine
 public:
 
   rvrp_folder () : substitute_and_fold_engine (), m_simplifier ()
-  { 
-if (param_evrp_mode & EVRP_MODE_TRACE)
-  m_ranger = new trace_ranger ();
-else
-  m_ranger = new gimple_ranger ();
-m_simplifier.set_range_query (m_ranger);
+  {
+enable_ranger ();
+
+m_simplifier.set_range_query (RANGE_QUERY (cfun));
   }
   
   ~rvrp_folder ()
   {
 if (dump_file && (dump_flags & TDF_DETAILS))
-  m_ranger->dump (dump_file);
-delete m_ranger;
+  RANGE_QUERY (cfun)->dump (dump_file);
+
+disable_ranger ();
   }
 
   tree value_of_expr (tree name, gimple *s = NULL) OVERRIDE
   {
-return m_ranger->value_of_expr (name, s);
+return RANGE_QUERY (cfun)->value_of_expr (name, s);
   }
 
   tree value_on_edge (edge e, tree name) OVERRIDE
   {
-return m_ranger->value_on_edge (e, name);
+return RANGE_QUERY (cfun)->value_on_edge (e, name);
   }
 
   tree value_of_stmt (gimple *s, tree name = NULL) OVERRIDE
   {
-return m_ranger->value_of_stmt (s, name);
+return RANGE_QUERY (cfun)->value_of_stmt (s, name);
   }
 
   bool fold_stmt (gimple_stmt_iterator *gsi) OVERRIDE
@@ -154,7 +153,6 @@ public:
 
 private:
   DISABLE_COPY_AND_ASSIGN (rvrp_folder);
-  gimple_ranger *m_ranger;
   simplify_using_ranges m_simplifier;
 };
 
@@ -175,19 +173,16 @@ class hybrid_folder : public evrp_folder
 public:
   hybrid_folder (bool evrp_first)
   {
-if (param_evrp_mode & EVRP_MODE_TRACE)
-  m_ranger = new trace_ranger ();
-else
-  m_ranger = new gimple_ranger ();
+enable_ranger ();
 
 if (evrp_first)
   {
first = _range_analyzer;
-   second = m_ranger;
+   second = RANGE_QUERY (cfun);
   }
  else
   {
-   first = m_ranger;
+   first = RANGE_QUERY (cfun);
second = _range_analyzer;
   }
   }
@@ -195,8 +190,9 @@ public:
   ~hybrid_folder ()
   {
 if (dump_file && (dump_flags & TDF_DETAILS))
-  m_ranger->dump (dump_file);
-delete m_ranger;
+  RANGE_QUERY (cfun)->dump (dump_file);
+
+disable_ranger ();
   }
 
   bool fold_stmt (gimple_stmt_iterator *gsi) OVERRIDE
@@ -221,7 +217,6 @@ public:
 
 private:
   DISABLE_COPY_AND_ASSIGN (hybrid_folder);
-  gimple_ranger *m_ranger;
   range_query *first;
   range_query *second;
   tree choose_value (tree evrp_val, tree ranger_val);
@@ -232,7 +227,7 @@ tree
 hybrid_folder::value_of_expr (tree op, gimple *stmt)
 {
   tree evrp_ret = evrp_folder::value_of_expr (op, stmt);
-  tree ranger_ret = m_ranger->value_of_expr (op, stmt);
+  tree ranger_ret = RANGE_QUERY (cfun)->value_of_expr (op, stmt);
   return choose_value (evrp_ret, ranger_ret);
 }
 
@@ -242,7 +237,7 @@ hybrid_folder::value_on_edge (edge e, tree op)
   // Call evrp::value_of_expr directly.  Otherwise another dual call is made
   // via hybrid_folder::value_of_expr, but without an edge.
   tree evrp_ret = evrp_folder::value_of_expr (op, NULL);
-  tree ranger_ret = m_ranger->value_on_edge (e, op);
+  tree ranger_ret = RANGE_QUERY (cfun)->value_on_edge (e, op);
   return choose_value (evrp_ret, ranger_ret);
 }
 
@@ -257,7 +252,7 @@ hybrid_folder::value_of_stmt (gimple *stmt, tree op)
   else
 evrp_ret = NULL_TREE;
 
-  tree ranger_ret = m_ranger->value_of_stmt (stmt, op);
+  tree ranger_ret = RANGE_QUERY (cfun)->value_of_stmt (stmt, op);
   return choose_value (evrp_ret, ranger_ret);
 }
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow-55.c 
b/gcc/testsuite/gcc.dg/Wstringop-overflow-55.c
index 25f5b82d9be..8df5cb629ae 100644
--- a/gcc/testsuite/gcc.dg/Wstringop-overflow-55.c
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow-55.c
@@ -66,7 +66,7 @@ void warn_ptrdiff_anti_range_add (ptrdiff_t i)
 {
   i |= 1;
 
-  char ca5[5];  // { dg-message "at offset \\\[1, 5]" "pr?" { 
xfail *-*-* } }
+  char ca5[5];  // { dg-message "at offset \\\[1, 5]" "pr?" }
   char *p0 = ca5;   // offset
   char *p1 = p0 + i;//  

[PATCH 5/5] Cleanup get_range_info

2021-05-21 Thread Aldy Hernandez via Gcc-patches
Now that there is only one user of get_range_info() we can clean this up
to always return a range instead of pairs of wide_ints.

A follow-up patch will inline both get_range_info and get_ptr_nonnull
into its only remaining user.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* tree-ssanames.c (get_range_info): Merge both copies of
get_range_info into one that works with irange.
* tree-ssanames.h (get_range_info): Remove version that works on
wide_ints.
---
 gcc/tree-ssanames.c | 57 +++--
 gcc/tree-ssanames.h |  2 --
 2 files changed, 14 insertions(+), 45 deletions(-)

diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 51a26d2fce1..5329c0a4187 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -423,58 +423,29 @@ set_range_info (tree name, const value_range )
   set_range_info (name, vr.kind (), min, max);
 }
 
-/* Gets range information MIN, MAX and returns enum value_range_kind
-   corresponding to tree ssa_name NAME.  enum value_range_kind returned
-   is used to determine if MIN and MAX are valid values.  */
+/* Gets range information corresponding to ssa_name NAME and stores it
+   in a value_range VR.  Returns the value_range_kind.  */
 
 enum value_range_kind
-get_range_info (const_tree expr, wide_int *min, wide_int *max)
+get_range_info (const_tree name, irange )
 {
-  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (expr)));
-  gcc_assert (min && max);
-  if (TREE_CODE (expr) == INTEGER_CST)
-{
-  *min = wi::to_wide (expr);
-  *max = *min;
-  return VR_RANGE;
-}
-  if (TREE_CODE (expr) != SSA_NAME)
-return VR_VARYING;
+  tree type = TREE_TYPE (name);
+  gcc_checking_assert (!POINTER_TYPE_P (type));
+  gcc_checking_assert (TREE_CODE (name) == SSA_NAME);
 
-  range_info_def *ri = SSA_NAME_RANGE_INFO (expr);
+  range_info_def *ri = SSA_NAME_RANGE_INFO (name);
 
   /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs
  with integral types width > 2 * HOST_BITS_PER_WIDE_INT precision.  */
-  if (!ri || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (TREE_TYPE (expr)))
+  if (!ri || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (TREE_TYPE (name)))
  > 2 * HOST_BITS_PER_WIDE_INT))
-return VR_VARYING;
-
-  *min = ri->get_min ();
-  *max = ri->get_max ();
-  return SSA_NAME_RANGE_TYPE (expr);
-}
-
-/* Gets range information corresponding to ssa_name NAME and stores it
-   in a value_range VR.  Returns the value_range_kind.  */
-
-enum value_range_kind
-get_range_info (const_tree name, irange )
-{
-  tree min, max;
-  wide_int wmin, wmax;
-  enum value_range_kind kind = get_range_info (name, , );
-
-  if (kind == VR_VARYING)
-vr.set_varying (TREE_TYPE (name));
-  else if (kind == VR_UNDEFINED)
-vr.set_undefined ();
+vr.set_varying (type);
   else
-{
-  min = wide_int_to_tree (TREE_TYPE (name), wmin);
-  max = wide_int_to_tree (TREE_TYPE (name), wmax);
-  vr.set (min, max, kind);
-}
-  return kind;
+vr.set (wide_int_to_tree (type, ri->get_min ()),
+   wide_int_to_tree (type, ri->get_max ()),
+   SSA_NAME_RANGE_TYPE (name));
+
+  return vr.kind ();
 }
 
 /* Set nonnull attribute to pointer NAME.  */
diff --git a/gcc/tree-ssanames.h b/gcc/tree-ssanames.h
index cce2d2c22e8..166f921f04c 100644
--- a/gcc/tree-ssanames.h
+++ b/gcc/tree-ssanames.h
@@ -71,8 +71,6 @@ extern void set_range_info (tree, enum value_range_kind, 
const wide_int_ref &,
const wide_int_ref &);
 extern void set_range_info (tree, const value_range &);
 /* Gets the value range from SSA.  */
-extern enum value_range_kind get_range_info (const_tree, wide_int *,
-wide_int *);
 extern enum value_range_kind get_range_info (const_tree, irange &);
 extern void set_nonzero_bits (tree, const wide_int_ref &);
 extern wide_int get_nonzero_bits (const_tree);
-- 
2.31.1



[PATCH 2/5] Convert Walloca pass to RANGE_QUERY(cfun).

2021-05-21 Thread Aldy Hernandez via Gcc-patches
This patch converts the Walloca pass to use an on-demand ranger
accesible with RANGE_QUERY instead of having to create a ranger and pass
it around.

Tested on x86-64 Linux.

OK?

gcc/ChangeLog:

* gimple-ssa-warn-alloca.c (alloca_call_type): Use RANGE_QUERY
instead of query argument.
(pass_walloca::execute): Enable and disable global ranger.
---
 gcc/gimple-ssa-warn-alloca.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/gcc/gimple-ssa-warn-alloca.c b/gcc/gimple-ssa-warn-alloca.c
index e9a24d4d1d0..12f4bce3be8 100644
--- a/gcc/gimple-ssa-warn-alloca.c
+++ b/gcc/gimple-ssa-warn-alloca.c
@@ -165,7 +165,7 @@ adjusted_warn_limit (bool idx)
 // call was created by the gimplifier for a VLA.
 
 static class alloca_type_and_limit
-alloca_call_type (range_query , gimple *stmt, bool is_vla)
+alloca_call_type (gimple *stmt, bool is_vla)
 {
   gcc_assert (gimple_alloca_call_p (stmt));
   tree len = gimple_call_arg (stmt, 0);
@@ -217,7 +217,7 @@ alloca_call_type (range_query , gimple *stmt, bool 
is_vla)
   int_range_max r;
   if (warn_limit_specified_p (is_vla)
   && TREE_CODE (len) == SSA_NAME
-  && query.range_of_expr (r, len, stmt)
+  && RANGE_QUERY (cfun)->range_of_expr (r, len, stmt)
   && !r.varying_p ())
 {
   // The invalid bits are anything outside of [0, MAX_SIZE].
@@ -256,7 +256,7 @@ in_loop_p (gimple *stmt)
 unsigned int
 pass_walloca::execute (function *fun)
 {
-  gimple_ranger ranger;
+  enable_ranger ();
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
@@ -290,7 +290,7 @@ pass_walloca::execute (function *fun)
continue;
 
  class alloca_type_and_limit t
-   = alloca_call_type (ranger, stmt, is_vla);
+   = alloca_call_type (stmt, is_vla);
 
  unsigned HOST_WIDE_INT adjusted_alloca_limit
= adjusted_warn_limit (false);
@@ -383,6 +383,7 @@ pass_walloca::execute (function *fun)
}
}
 }
+  disable_ranger ();
   return 0;
 }
 
-- 
2.31.1



[PATCH] i386: Add comparisons for 4-byte vectors [PR100637]

2021-05-21 Thread Uros Bizjak via Gcc-patches
2021-05-21  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_sse_movcc):
Handle V4QI and V2HI modes.
(ix86_expand_sse_movcc): Ditto.
* config/i386/mmx.md (*3):
New instruction pattern.
(*eq3): Ditto.
(*gt3): Ditto.
(*xop_pcmov_): Ditto.
(mmx_pblendvb32): Ditto.
(mmx_pblendvb64): Rename from mmx_pblendvb.
(vec_cmp): New expander.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
(vcond_mask_): Ditto.

gcc/testsuite/

PR target/100637
* g++.target/i386/pr100637-1b.C: New test.
* g++.target/i386/pr100637-1w.C: Ditto.
* gcc.target/i386/pr100637-2b.c: Ditto.
* gcc.target/i386/pr100637-2w.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 9f3d41955a2..931b3362144 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -3721,7 +3721,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
{
  op_true = force_reg (mode, op_true);
 
- gen = gen_mmx_pblendvb;
+ gen = gen_mmx_pblendvb64;
  if (mode != V8QImode)
d = gen_reg_rtx (V8QImode);
  op_false = gen_lowpart (V8QImode, op_false);
@@ -3729,6 +3729,20 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, 
rtx op_false)
  cmp = gen_lowpart (V8QImode, cmp);
}
   break;
+case E_V4QImode:
+case E_V2HImode:
+  if (TARGET_SSE4_1)
+   {
+ op_true = force_reg (mode, op_true);
+
+ gen = gen_mmx_pblendvb32;
+ if (mode != V4QImode)
+   d = gen_reg_rtx (V4QImode);
+ op_false = gen_lowpart (V4QImode, op_false);
+ op_true = gen_lowpart (V4QImode, op_true);
+ cmp = gen_lowpart (V4QImode, cmp);
+   }
+  break;
 case E_V16QImode:
 case E_V8HImode:
 case E_V4SImode:
@@ -4241,6 +4255,12 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
  else if (code == GT && TARGET_SSE4_1)
gen = gen_sminv8qi3;
  break;
+   case E_V4QImode:
+ if (code == GTU && TARGET_SSE2)
+   gen = gen_uminv4qi3;
+ else if (code == GT && TARGET_SSE4_1)
+   gen = gen_sminv4qi3;
+ break;
case E_V8HImode:
  if (code == GTU && TARGET_SSE4_1)
gen = gen_uminv8hi3;
@@ -4253,6 +4273,12 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
  else if (code == GT && TARGET_SSE2)
gen = gen_sminv4hi3;
  break;
+   case E_V2HImode:
+ if (code == GTU && TARGET_SSE4_1)
+   gen = gen_uminv2hi3;
+ else if (code == GT && TARGET_SSE2)
+   gen = gen_sminv2hi3;
+ break;
case E_V4SImode:
  if (TARGET_SSE4_1)
gen = (code == GTU) ? gen_uminv4si3 : gen_sminv4si3;
@@ -4327,8 +4353,10 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
case E_V16HImode:
case E_V16QImode:
case E_V8QImode:
+   case E_V4QImode:
case E_V8HImode:
case E_V4HImode:
+   case E_V2HImode:
  /* Perform a parallel unsigned saturating subtraction.  */
  x = gen_reg_rtx (mode);
  emit_insn (gen_rtx_SET
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5e92be34545..4c42e6d93dc 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1403,6 +1403,20 @@ (define_insn "*mmx_3"
(set_attr "type" "mmxadd,sseadd,sseadd")
(set_attr "mode" "DI,TI,TI")])
 
+(define_insn "*3"
+  [(set (match_operand:VI_32 0 "register_operand" "=x,Yw")
+(sat_plusminus:VI_32
+ (match_operand:VI_32 1 "register_operand" "0,Yw")
+ (match_operand:VI_32 2 "register_operand" "x,Yw")))]
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseadd")
+   (set_attr "mode" "TI")])
+
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
 (mult:V4HI (match_operand:V4HI 1 "register_mmxmem_operand")
@@ -2032,6 +2046,20 @@ (define_insn "*mmx_eq3"
(set_attr "type" "mmxcmp,ssecmp,ssecmp")
(set_attr "mode" "DI,TI,TI")])
 
+(define_insn "*eq3"
+  [(set (match_operand:VI_32 0 "register_operand" "=x,x")
+(eq:VI_32
+ (match_operand:VI_32 1 "register_operand" "%0,x")
+ (match_operand:VI_32 2 "register_operand" "x,x")))]
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (EQ, mode, operands)"
+  "@
+   pcmpeq\t{%2, %0|%0, %2}
+   vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecmp")
+   (set_attr 

[PATCH][GCC-9][committed][libsanitizer]: Remove cyclades from libsanitizer

2021-05-21 Thread Tamar Christina via Gcc-patches
Hi All,


[rebased patch for GCC-9 but the same as the others]

The Linux kernel has removed the interface to cyclades from
the latest kernel headers[1] due to them being orphaned for the
past 13 years.

libsanitizer uses this header when compiling against glibc, but
glibcs itself doesn't seem to have any references to cyclades.

Further more it seems that the driver is broken in the kernel and
the firmware doesn't seem to be available anymore.

As such since this is breaking the build of libsanitizer (and so the
GCC bootstrap[2]) I propose to remove this.

[1] https://lkml.org/lkml/2021/3/2/153
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100379

Bootstrapped on aarch64-none-linux-gnu and no issues.

Committed to branch under previous approval.

Thanks,
Tamar

libsanitizer/ChangeLog:

PR sanitizer/100379
* sanitizer_common/sanitizer_common_interceptors_ioctl.inc: Cherry-pick
llvm-project revision f7c5351552387bd43f6ca3631016d7f0dfe0f135.
* sanitizer_common/sanitizer_platform_limits_posix.cc: Likewise.
* sanitizer_common/sanitizer_platform_limits_posix.h: Likewise.

--- inline copy of patch -- 
diff --git 
a/libsanitizer/sanitizer_common/sanitizer_common_interceptors_ioctl.inc 
b/libsanitizer/sanitizer_common/sanitizer_common_interceptors_ioctl.inc
index 
5408ea17c595c76d25bf0d04aa3946daaef1e1aa..7a9cd3f5968ad3bc0d09ddbc2bed64e536c8c706
 100644
--- a/libsanitizer/sanitizer_common/sanitizer_common_interceptors_ioctl.inc
+++ b/libsanitizer/sanitizer_common/sanitizer_common_interceptors_ioctl.inc
@@ -365,15 +365,6 @@ static void ioctl_table_fill() {
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
   // _(SIOCDEVPLIP, WRITE, struct_ifreq_sz); // the same as EQL_ENSLAVE
-  _(CYGETDEFTHRESH, WRITE, sizeof(int));
-  _(CYGETDEFTIMEOUT, WRITE, sizeof(int));
-  _(CYGETMON, WRITE, struct_cyclades_monitor_sz);
-  _(CYGETTHRESH, WRITE, sizeof(int));
-  _(CYGETTIMEOUT, WRITE, sizeof(int));
-  _(CYSETDEFTHRESH, NONE, 0);
-  _(CYSETDEFTIMEOUT, NONE, 0);
-  _(CYSETTHRESH, NONE, 0);
-  _(CYSETTIMEOUT, NONE, 0);
   _(EQL_EMANCIPATE, WRITE, struct_ifreq_sz);
   _(EQL_ENSLAVE, WRITE, struct_ifreq_sz);
   _(EQL_GETMASTRCFG, WRITE, struct_ifreq_sz);
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
index 
6a673a7c99598ed4248eba580a23afbe3e813525..f921bf2b5b5bae5e81ea6472bc9d0fede7fbd95b
 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h
@@ -1040,7 +1040,6 @@ struct __sanitizer_cookie_io_functions_t {
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
   extern unsigned struct_ax25_parms_struct_sz;
-  extern unsigned struct_cyclades_monitor_sz;
   extern unsigned struct_input_keymap_entry_sz;
   extern unsigned struct_ipx_config_data_sz;
   extern unsigned struct_kbdiacrs_sz;
@@ -1385,15 +1384,6 @@ struct __sanitizer_cookie_io_functions_t {
 #endif  // SANITIZER_LINUX || SANITIZER_FREEBSD
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
-  extern unsigned IOCTL_CYGETDEFTHRESH;
-  extern unsigned IOCTL_CYGETDEFTIMEOUT;
-  extern unsigned IOCTL_CYGETMON;
-  extern unsigned IOCTL_CYGETTHRESH;
-  extern unsigned IOCTL_CYGETTIMEOUT;
-  extern unsigned IOCTL_CYSETDEFTHRESH;
-  extern unsigned IOCTL_CYSETDEFTIMEOUT;
-  extern unsigned IOCTL_CYSETTHRESH;
-  extern unsigned IOCTL_CYSETTIMEOUT;
   extern unsigned IOCTL_EQL_EMANCIPATE;
   extern unsigned IOCTL_EQL_ENSLAVE;
   extern unsigned IOCTL_EQL_GETMASTRCFG;
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc 
b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
index 
d823a12190c087689b81c1a735005c862c163936..e8fce8a0287dff674016c901cabd4b67a2fabd8c
 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc
@@ -157,7 +157,6 @@ typedef struct user_fpregs elf_fpregset_t;
 # include 
 #endif
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -466,7 +465,6 @@ unsigned struct_ElfW_Phdr_sz = sizeof(Elf_Phdr);
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
   unsigned struct_ax25_parms_struct_sz = sizeof(struct ax25_parms_struct);
-  unsigned struct_cyclades_monitor_sz = sizeof(struct cyclades_monitor);
 #if EV_VERSION > (0x01)
   unsigned struct_input_keymap_entry_sz = sizeof(struct input_keymap_entry);
 #else
@@ -833,15 +831,6 @@ unsigned struct_ElfW_Phdr_sz = sizeof(Elf_Phdr);
 #endif // SANITIZER_LINUX || SANITIZER_FREEBSD
 
 #if SANITIZER_LINUX && !SANITIZER_ANDROID
-  unsigned IOCTL_CYGETDEFTHRESH = CYGETDEFTHRESH;
-  unsigned IOCTL_CYGETDEFTIMEOUT = CYGETDEFTIMEOUT;
-  unsigned IOCTL_CYGETMON = CYGETMON;
-  unsigned IOCTL_CYGETTHRESH = CYGETTHRESH;
-  unsigned IOCTL_CYGETTIMEOUT = CYGETTIMEOUT;
-  unsigned IOCTL_CYSETDEFTHRESH = CYSETDEFTHRESH;
-  unsigned IOCTL_CYSETDEFTIMEOUT = CYSETDEFTIMEOUT;
-  unsigned IOCTL_CYSETTHRESH = 

Re: [PATCH v3] bpf.2: Use standard types and attributes

2021-05-21 Thread Alejandro Colomar via Gcc-patches

Hello Daniel,

On 5/17/21 8:56 PM, Daniel Borkmann wrote:

On 5/16/21 11:16 AM, Alejandro Colomar (man-pages) wrote:


Signed-off-by: Alejandro Colomar 
Discussion: 
 


Nacked-by: Alexei Starovoitov 
Nacked-by: Greg Kroah-Hartman 


You forgot to retain my ...

Nacked-by: Daniel Borkmann 


Yup! Sorry, I forgot :)

Thanks,

Alex




Acked-by: Zack Weinberg 
Cc: LKML 
Cc: glibc 
Cc: GCC 
Cc: bpf 
Cc: David Laight 
Cc: Joseph Myers 
Cc: Florian Weimer 
Cc: Daniel Borkmann 
---
  man2/bpf.2 | 49 -
  1 file changed, 24 insertions(+), 25 deletions(-)





[PATCH] lto-dump : Fix an ICE when lto-dump try to call 'get_size()' on thunk symbols.

2021-05-21 Thread 王留帅 via Gcc-patches
Hi,
​
We, ByteDance, have seen an ICE on trunk which can be reproduced with the
following symbol testcase. The problem is caused by referencing a null
pointer when applying 'n_basic_blocks_for_fn' on a thunk symbol. For the
thunk symbol, 'DECL_STRUCT_FUNCTION (cnode->decl)' will return an empty
pointer.
​
class Base {
public :
virtual void f() {}
private:
long x;
};
class Derived : public virtual Base {
public:
virtual void f() {}
private:
long y;
};
int main() {
Derived d;
}
​
I have fixed it by adding more guard conditions and tested it with
ByteDance internal workloads.

2020-05-21 Wang Liushuai 

gcc/lto/ChangeLog:

* lto-dump.c (get_size): Fix the NPD error about the thunk symbol.
​
>From ce3faba067e4b911bfdafa486b1bb4258272b650 Mon Sep 17 00:00:00 2001
From: wangliushuai 
Date: Fri, 21 May 2021 16:59:51 +0800
Subject: [PATCH] Do not load the thunk symbols.

---
 gcc/lto/lto-dump.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lto/lto-dump.c b/gcc/lto/lto-dump.c
index 8344d3e1737..3586e208f9a 100644
--- a/gcc/lto/lto-dump.c
+++ b/gcc/lto/lto-dump.c
@@ -122,7 +122,7 @@ public:
 cgraph_node *cnode = dyn_cast (node);
 gcc_assert (cnode);

-return (cnode->definition && !cnode->alias)
+return (cnode->definition && !cnode->thunk && !cnode->alias)
  ? n_basic_blocks_for_fn (DECL_STRUCT_FUNCTION (cnode->decl))
  : 0;
   }
--
2.29.2


Re: PowerPC64 ELFv2 -fpatchable-function-entry

2021-05-21 Thread Segher Boessenkool
On Wed, May 19, 2021 at 09:39:30PM +0930, Alan Modra wrote:
> On Tue, May 18, 2021 at 05:48:40AM -0500, Segher Boessenkool wrote:
> > I wrote a bit later:
> > 
> > > > Can you make this less hacky please?  Changing the generic code is just
> > > > fine as well, it needs some love.
> > 
> > In effect making a callback / hook without making that explicit is bad
> > for maintainability.  We are in stage 1, now is a good time for
> > infrastructure changes.
> 
> Yes, perhaps I could do that, and possibly even write a patch that is
> accepted.  I'm not going to though, because I made a decision a little
> while ago that I'm not going to contribute new gcc work.  This one was
> just flushing some patches that I wrote a while ago.

Okay, thanks.  I opened PR100714 to keep track of the work that will
need to be done.

Cheers,


Segher


Re: [Ada] Detect illegal conditions in Raise_xxx_Error nodes

2021-05-21 Thread Eric Botcazou
> 2021-05-21  Piotr Trojanek  
> 
>   * gcc-interface/trans.c (Raise_Error_to_gnu): Add an assertion.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 289a2efcaf1..ee014a35cc2 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -6159,7 +6159,12 @@ Raise_Error_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p)
 	}
 }
   else
-gnu_result = build1 (NULL_EXPR, *gnu_result_type_p, gnu_result);
+{
+  /* The condition field must not be present when the node is used as an
+	 expression form.  */
+  gigi_checking_assert (No (gnat_cond));
+  gnu_result = build1 (NULL_EXPR, *gnu_result_type_p, gnu_result);
+}
 
   return gnu_result;
 }


[Ada] Detect illegal conditions in Raise_xxx_Error nodes

2021-05-21 Thread Eric Botcazou
Enforce comment from sinfo.ads about the Condition field in N_Raise_xxx_Error
nodes. Only an extra sanity check; the behaviour is not affected.


Tested on x86-64/Linux, applied on the mainline.


2021-05-21  Piotr Trojanek  

* gcc-interface/trans.c (Raise_Error_to_gnu): Add an assertion.

-- 
Eric Botcazou




[Ada] Consistently generate debug info for elaboration variables

2021-05-21 Thread Eric Botcazou
This makes sure that debug info is generated for elaboration variables,
even if the variables are not generated exclusively for this purpose.

Tested on x86-64/Linux, applied on the mainline.


2021-05-21  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Do
not pass default value in call to create_var_decl.
: Likewise.
: Both pass true for const_flag and false for
const_decl_allowed_p in call to create_var_decl.
Small tweaks in the generic record type case.
(elaborate_expression): Rename need_debug into need_for_debug and
adjust throughout.
(elaborate_expression_1): Likewise.  Pass Needs_Debug_Info instead
of need_for_debug in call to create_var_decl.
(elaborate_expression_2): Likewise.
* gcc-interface/utils.c (maybe_pad_type): Pass false for
const_decl_allowed_p in call to create_var_decl.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index c7d61763db1..bc7046accb5 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -622,7 +622,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	  = create_var_decl (gnu_entity_name, gnu_ext_name, gnu_type,
 			 gnu_expr, true, Is_Public (gnat_entity),
 			 false, false, false, artificial_p,
-			 debug_info_p, NULL, gnat_entity, true);
+			 debug_info_p, NULL, gnat_entity);
   }
   break;
 
@@ -1527,7 +1527,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 			 imported_p || !definition, static_flag,
 			 volatile_flag, artificial_p,
 			 debug_info_p && definition, attr_list,
-			 gnat_entity, true);
+			 gnat_entity);
 	DECL_BY_REF_P (gnu_decl) = used_by_ref;
 	DECL_POINTS_TO_READONLY_P (gnu_decl) = used_by_ref && inner_const_flag;
 	DECL_CAN_NEVER_BE_NULL_P (gnu_decl) = Can_Never_Be_Null (gnat_entity);
@@ -3526,9 +3526,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		  = create_var_decl (create_concat_name (gnat_entity,
 			 "XVZ"),
 	 NULL_TREE, sizetype, gnu_size_unit,
-	 false, false, false, false, false,
-	 true, debug_info_p,
-	 NULL, gnat_entity);
+	 true, false, false, false, false,
+	 true, true, NULL, gnat_entity, false);
 		}
 
 	  /* Or else, if the subtype is artificial and encodings are not
@@ -4455,21 +4454,20 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
   if (Unknown_RM_Size (gnat_entity) && TYPE_SIZE (gnu_type))
 	Set_RM_Size (gnat_entity, annotate_value (rm_size (gnu_type)));
 
-  /* If we are at global level, GCC will have applied variable_size to
-	 the type, but that won't have done anything.  So, if it's not
-	 a constant or self-referential, call elaborate_expression_1 to
-	 make a variable for the size rather than calculating it each time.
-	 Handle both the RM size and the actual size.  */
+  /* If we are at global level, GCC applied variable_size to the size but
+	 this has done nothing.  So, if it's not constant or self-referential,
+	 call elaborate_expression_1 to make a variable for it rather than
+	 calculating it each time.  */
   if (TYPE_SIZE (gnu_type)
 	  && !TREE_CONSTANT (TYPE_SIZE (gnu_type))
 	  && !CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_type))
 	  && global_bindings_p ())
 	{
-	  tree size = TYPE_SIZE (gnu_type);
+	  tree orig_size = TYPE_SIZE (gnu_type);
 
 	  TYPE_SIZE (gnu_type)
-	= elaborate_expression_1 (size, gnat_entity, "SIZE", definition,
-  false);
+	= elaborate_expression_1 (TYPE_SIZE (gnu_type), gnat_entity,
+  "SIZE", definition, false);
 
 	  /* ??? For now, store the size as a multiple of the alignment in
 	 bytes so that we can see the alignment from the tree.  */
@@ -4482,7 +4480,9 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	 may not be marked by the call to create_type_decl below.  */
 	  MARK_VISITED (TYPE_SIZE_UNIT (gnu_type));
 
-	  if (TREE_CODE (gnu_type) == RECORD_TYPE)
+	  /* For a record type, deal with the variant part, if any, and handle
+	 the Ada size as well.  */
+	  if (RECORD_OR_UNION_TYPE_P (gnu_type))
 	{
 	  tree variant_part = get_variant_part (gnu_type);
 	  tree ada_size = TYPE_ADA_SIZE (gnu_type);
@@ -4535,7 +4535,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		  DECL_SIZE_UNIT (variant_part) = TYPE_SIZE_UNIT (union_type);
 		}
 
-	  if (operand_equal_p (ada_size, size, 0))
+	  if (operand_equal_p (ada_size, orig_size, 0))
 		ada_size = TYPE_SIZE (gnu_type);
 	  else
 		ada_size
@@ -6724,12 +6724,12 @@ prepend_attributes (struct attrib **attr_list, Entity_Id gnat_entity)
if a variable needs to be created and DEFINITION is true if this is done
for a definition of GNAT_ENTITY.  If NEED_VALUE is true, we need a result;
otherwise, we are just 

[Ada] Replace ? with ?? in warning messages

2021-05-21 Thread Eric Botcazou
The former has been deprecated in favor of the latter.

Tested on x86-64/Linux, applied on the mainline.


2021-05-21  Ghjuvan Lacambre  

* gcc-interface/decl.c (gnat_to_gnu_entity): Replace ? with ??.
(gnat_to_gnu_param): Likewise.
(gnat_to_gnu_subprog_type): Likewise.
(warn_on_field_placement): Likewise.
(intrin_arglists_compatible_p): Likewise.
* gcc-interface/trans.c (Pragma_to_gnu): Likewise.
(gnat_to_gnu): Likewise.
(validate_unchecked_conversion): Likewise.
* gcc-interface/utils.c (maybe_pad_type): Likewise.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 8d3c16c624b..c7d61763db1 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -1392,7 +1392,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 
 		if (TREE_CODE (TYPE_SIZE_UNIT (gnu_alloc_type)) == INTEGER_CST
 		&& !valid_constant_size_p (TYPE_SIZE_UNIT (gnu_alloc_type)))
-		  post_error ("?`Storage_Error` will be raised at run time!",
+		  post_error ("??`Storage_Error` will be raised at run time!",
 			  gnat_entity);
 
 		gnu_expr
@@ -4328,7 +4328,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 		 ratio is greater or equal to the byte/bit ratio.  */
 	  if (tree_fits_uhwi_p (size)
 		  && align >= tree_to_uhwi (size) * BITS_PER_UNIT)
-		post_error_ne ("?suspiciously large alignment specified for&",
+		post_error_ne ("??suspiciously large alignment specified for&",
 			   Expression (Alignment_Clause (gnat_entity)),
 			   gnat_entity);
 	}
@@ -5443,7 +5443,7 @@ gnat_to_gnu_param (Entity_Id gnat_param, tree gnu_param_type, bool first,
   input_location = saved_location;
 
   if (mech == By_Copy && (by_ref || by_component_ptr))
-post_error ("?cannot pass & by copy", gnat_param);
+post_error ("??cannot pass & by copy", gnat_param);
 
   /* If this is an Out parameter that isn't passed by reference and whose
  type doesn't require the initialization of formals, we don't make a
@@ -6271,7 +6271,7 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 
 	  if (!intrin_profiles_compatible_p ())
 		post_error
-		  ("?profile of& doesn''t match the builtin it binds!",
+		  ("??profile of& doesn''t match the builtin it binds!",
 		   gnat_subprog);
 
 	  return gnu_builtin_decl;
@@ -6284,7 +6284,7 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 	 on demand without risking false positives with common default sets
 	 of options.  */
 	  if (warn_shadow)
-	post_error ("?gcc intrinsic not found for&!", gnat_subprog);
+	post_error ("??gcc intrinsic not found for&!", gnat_subprog);
 	}
 }
 
@@ -7597,20 +7597,20 @@ warn_on_field_placement (tree gnu_field, Node_Id gnat_component_list,
 
   const char *msg1
 = in_variant
-  ? "?variant layout may cause performance issues"
-  : "?record layout may cause performance issues";
+  ? "??variant layout may cause performance issues"
+  : "??record layout may cause performance issues";
   const char *msg2
 = Ekind (gnat_field) == E_Discriminant
-  ? "?discriminant & whose length is not multiple of a byte"
+  ? "??discriminant & whose length is not multiple of a byte"
   : field_has_self_size (gnu_field)
-	? "?component & whose length depends on a discriminant"
+	? "??component & whose length depends on a discriminant"
 	: field_has_variable_size (gnu_field)
-	  ? "?component & whose length is not fixed"
-	  : "?component & whose length is not multiple of a byte";
+	  ? "??component & whose length is not fixed"
+	  : "??component & whose length is not multiple of a byte";
   const char *msg3
 = do_reorder
-  ? "?comes too early and was moved down"
-  : "?comes too early and ought to be moved down";
+  ? "??comes too early and was moved down"
+  : "??comes too early and ought to be moved down";
 
   post_error (msg1, gnat_field);
   post_error_ne (msg2, gnat_field, gnat_field);
@@ -9481,14 +9481,14 @@ intrin_arglists_compatible_p (intrin_binding_t * inb)
   if (ada_type == void_type_node
 	  && btin_type != void_type_node)
 	{
-	  post_error ("?Ada arguments list too short!", inb->gnat_entity);
+	  post_error ("??Ada arguments list too short!", inb->gnat_entity);
 	  return false;
 	}
 
   if (btin_type == void_type_node
 	  && ada_type != void_type_node)
 	{
-	  post_error_ne_num ("?Ada arguments list too long ('> ^)!",
+	  post_error_ne_num ("??Ada arguments list too long ('> ^)!",
 			 inb->gnat_entity, inb->gnat_entity, argpos);
 	  return false;
 	}
@@ -9497,7 +9497,7 @@ intrin_arglists_compatible_p (intrin_binding_t * inb)
   argpos ++;
   if (intrin_types_incompatible_p (ada_type, btin_type))
 	{
-	  post_error_ne_num ("?intrinsic binding type mismatch on argument ^!",
+	  post_error_ne_num ("??intrinsic binding type mismatch on 

Re: [PATCH][DOCS] Remove install-old.texi

2021-05-21 Thread Martin Liška

On 5/19/21 4:49 PM, Joseph Myers wrote:

On Wed, 19 May 2021, Martin Liška wrote:


I'm not sure where this list comes from


I split parts in contrib/config-list.mk and printed them.


but I'd expect "linux" to be the
canonical "linux-gnu", along with "linux-uclibc", "linux-android",
"linux-musl" ("uclibc" etc. aren't system names on their own) and variants
with "eabi" or "eabihf" on the end (see what config.guess produces for
Arm).


One needs an Arm machine for that. Do you know about a better way how to get
list of all systems?


Looking at the config.sub / config.guess testsuites (in config.git) will
provide lists of many systems (not all supported by GCC) (and the code of
config.guess may also help show what the Arm variants are).  Looking at
config.gcc to see what it matches against may also be helpful (but then
use config.sub to get the canonical form of a target triplet).


All right, I've taken all targets mentioned in contrib/config-list.mk and
called config.sub for them.

Then I've taken first parts[0] as a CPU name and '-'.join(parts[2:]) as system 
name.
That gets me:

CPUs:
aarch64, alpha, alpha64, amdgcn, arc, arceb, arm, avr, bfin, bpf, cr16, cris, 
csky, epiphany, fido, fr30, frv, ft32, h8300, hppa, hppa2.0, hppa64, i486, 
i686, ia64, iq2000, lm32, m32c, m32r, m32rle, m68k, mcore, microblaze, mips, 
mips64, mips64el, mips64octeon, mips64orion, mips64vr, mipsel, mipsisa32, 
mipsisa32r2, mipsisa64, mipsisa64r2, mipsisa64r2el, mipsisa64sb1, 
mipsisa64sr71k, mipstx39, mmix, mn10300, moxie, msp430, nds32be, nds32le, 
nios2, nvptx, or1k, pdp11, powerpc, powerpc64, powerpcle, pru, riscv32, 
riscv64, rl78, rx, s390, s390x, sh, shle, sparc, sparc64, tic6x, tilegx, 
tilegxbe, tilepro, v850, v850e, v850e1, vax, visium, x86_64, xstormy16, xtensa


SYSTEMSs:
aix7.1, aix7.2, amdhsa, aout, cygwin, darwin, darwin10, darwin7, darwin8, 
darwin9, eabi, eabialtivec, eabisim, eabisimaltivec, elf, elf32, elfbare, 
elfoabi, freebsd4, freebsd6, gnu, hpux, hpux10.1, hpux11.0, hpux11.3, hpux11.9, 
kfreebsd-gnu, kopensolaris-gnu, linux-androideabi, linux-gnu, 
linux-gnu_altivec, linux-musl, linux-uclibc, lynxos, mingw32, mingw32crt, 
mmixware, msdosdjgpp, netbsd, netbsdelf, netbsdelf9, nto-qnx, openbsd, rtems, 
solaris2.11, symbianelf, tpf, uclinux, uclinux_eabi, vms, vxworks, vxworksae, 
vxworksmils


Is it a reasonable list?
Thanks,
Martin







[Ada] Fix internal error on locally derived bit-packed array type

2021-05-21 Thread Eric Botcazou
This is a regression present on the mainline, 11 and 10 branches, in the form 
of an ICE on a locally derived bit-packed array type.

Tested on x86-64/Linux, applied on the mainline, 11 and 10 branches.


2021-05-21  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Process
the implementation type of a packed type implemented specially.


2021-05-21  Eric Botcazou  

* gnat.dg/derived_type7.adb, gnat.dg/derived_type7.ads: New test.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 19e851ff712..8d3c16c624b 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -2345,11 +2345,15 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	  set_nonaliased_component_on_array_type (tem);
 	  }
 
-	/* If an alignment is specified, use it if valid.  But ignore it
-	   for the original type of packed array types.  If the alignment
-	   was requested with an explicit alignment clause, state so.  */
-	if (No (Packed_Array_Impl_Type (gnat_entity))
-	&& Known_Alignment (gnat_entity))
+	/* If this is a packed type implemented specially, then process the
+	   implementation type so it is elaborated in the proper scope.  */
+	if (Present (Packed_Array_Impl_Type (gnat_entity)))
+	  gnat_to_gnu_entity (Packed_Array_Impl_Type (gnat_entity), NULL_TREE,
+			  false);
+
+	/* Otherwise, if an alignment is specified, use it if valid and, if
+	   the alignment was requested with an explicit clause, state so.  */
+	else if (Known_Alignment (gnat_entity))
 	  {
 	SET_TYPE_ALIGN (tem,
 			validate_alignment (Alignment (gnat_entity),
-- { dg-do compile }

package Derived_Type7 is

  type Six_Bit_Data_Type is range 0 .. 63;
  for Six_Bit_Data_Type'Size use 6;

  type Six_Bit_Data_Array_Type is array (Integer range <>) of Six_Bit_Data_Type;
  for Six_Bit_Data_Array_Type'Component_Size use 6;

  procedure Proc (Size : Natural);

end Derived_Type7;
package body Derived_Type7 is

  procedure Proc (Size : Natural) is
type Sar_Six_Bit_Arr is new Six_Bit_Data_Array_Type (1 .. Size);
  begin
null;
  end;

end Derived_Type7;


Re: [PATCH] Add no_sanitize_coverage attribute.

2021-05-21 Thread Martin Liška

On 5/20/21 12:55 PM, Marco Elver wrote:

I think this came up with other no_sanitize [1] based on what I had
written to you last year [2].

[1]https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547618.html
[2]https://lore.kernel.org/lkml/canpmjnnrz5ovkb6pe7k6gjfogbht_zhypkng9ad+kjndzk7...@mail.gmail.com/


Ah, you're right. I've just updated the patch to address that.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin
>From b7f9f6c9722e7cb1b81b96ec646cadb5706b1038 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 20 May 2021 09:32:29 +0200
Subject: [PATCH] Add no_sanitize_coverage attribute.

gcc/ChangeLog:

	* asan.h (sanitize_coverage_p): New function.
	* doc/extend.texi: Document it.
	* fold-const.c (fold_range_test): Use sanitize_flags_p
	instead of flag_sanitize_coverage.
	(fold_truth_andor): Likewise.
	* sancov.c: Likewise.
	* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
	* ipa-inline.c (sanitize_attrs_match_for_inline_p): Handle
	-fsanitize-coverage when inlining.

gcc/c-family/ChangeLog:

	* c-attribs.c (handle_no_sanitize_coverage_attribute): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/sancov/attribute.c: New test.
---
 gcc/asan.h  | 10 ++
 gcc/c-family/c-attribs.c| 20 
 gcc/doc/extend.texi |  6 ++
 gcc/fold-const.c|  4 ++--
 gcc/ipa-inline.c|  3 +++
 gcc/sancov.c|  4 ++--
 gcc/testsuite/gcc.dg/sancov/attribute.c | 15 +++
 gcc/tree-ssa-ifcombine.c|  4 +++-
 8 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/sancov/attribute.c

diff --git a/gcc/asan.h b/gcc/asan.h
index f110f1db563..8c0b2baf170 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -249,4 +249,14 @@ sanitize_flags_p (unsigned int flag, const_tree fn = current_function_decl)
   return result_flags;
 }
 
+/* Return true when coverage sanitization should happend for FN function.  */
+
+static inline bool
+sanitize_coverage_p (const_tree fn = current_function_decl)
+{
+  return (flag_sanitize_coverage
+	  && lookup_attribute ("no_sanitize_coverage",
+			   DECL_ATTRIBUTES (fn)) == NULL_TREE);
+}
+
 #endif /* TREE_ASAN */
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index ccf9e4ccf0b..671b27c3200 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -62,6 +62,8 @@ static tree handle_no_address_safety_analysis_attribute (tree *, tree, tree,
 			 int, bool *);
 static tree handle_no_sanitize_undefined_attribute (tree *, tree, tree, int,
 		bool *);
+static tree handle_no_sanitize_coverage_attribute (tree *, tree, tree, int,
+		   bool *);
 static tree handle_asan_odr_indicator_attribute (tree *, tree, tree, int,
 		 bool *);
 static tree handle_stack_protect_attribute (tree *, tree, tree, int, bool *);
@@ -449,6 +451,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  handle_no_sanitize_thread_attribute, NULL },
   { "no_sanitize_undefined",  0, 0, true, false, false, false,
 			  handle_no_sanitize_undefined_attribute, NULL },
+  { "no_sanitize_coverage",   0, 0, true, false, false, false,
+			  handle_no_sanitize_coverage_attribute, NULL },
   { "asan odr indicator", 0, 0, true, false, false, false,
 			  handle_asan_odr_indicator_attribute, NULL },
   { "warning",		  1, 1, true,  false, false, false,
@@ -1211,6 +1215,22 @@ handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle a "no_sanitize_coverage" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_sanitize_coverage_attribute (tree *node, tree name, tree, int,
+   bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes, "%qE attribute ignored", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
 /* Handle an "asan odr indicator" attribute; arguments as in
struct attribute_spec.handler.  */
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 826804e6149..3ddeb0dee3a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3415,6 +3415,12 @@ The @code{no_sanitize_undefined} attribute on functions is used
 to inform the compiler that it should not check for undefined behavior
 in the function when compiling with the @option{-fsanitize=undefined} option.
 
+@item no_sanitize_coverage
+@cindex @code{no_sanitize_coverage} function attribute
+The @code{no_sanitize_coverage} attribute on functions is used
+to inform the compiler that it should not do coverage-guided
+fuzzing code instrumentation (@option{-fsanitize-coverage}).
+
 @item no_split_stack
 @cindex @code{no_split_stack} function attribute
 @opindex fsplit-stack
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 3be9c15e6b2..d088187a057 

[Ada] Always translate Is_Pure flag into pure in C sense

2021-05-21 Thread Eric Botcazou
Gigi has historically translated the Is_Pure flag of the front-end into
the "const" attribute of GNU C.  That's correct for subprograms of pure
Ada units, but not fully exact according to the semantics of the flag.

Tested on x86-64/Linux, applied on the mainline, 11 and 10 branches.


2021-05-21  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Always translate
the Is_Pure flag into the "pure" attribute of GNU C.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index a7595cbf514..19e851ff712 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -5737,16 +5737,16 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
   tree gnu_cico_return_type = NULL_TREE;
   tree gnu_cico_field_list = NULL_TREE;
   bool gnu_cico_only_integral_type = true;
-  /* The semantics of "pure" in Ada essentially matches that of "const"
- or "pure" in GCC.  In particular, both properties are orthogonal
- to the "nothrow" property if the EH circuitry is explicit in the
- internal representation of the middle-end.  If we are to completely
- hide the EH circuitry from it, we need to declare that calls to pure
- Ada subprograms that can throw have side effects since they can
- trigger an "abnormal" transfer of control flow; therefore, they can
- be neither "const" nor "pure" in the GCC sense.  */
-  bool const_flag = (Back_End_Exceptions () && Is_Pure (gnat_subprog));
-  bool pure_flag = false;
+  /* Although the semantics of "pure" units in Ada essentially match those of
+ "const" in GNU C, the semantics of the Is_Pure flag in GNAT do not say
+ anything about access to global memory, that's why it needs to be mapped
+ to "pure" instead of "const" in GNU C.  The property is orthogonal to the
+ "nothrow" property only if the EH circuitry is explicit in the internal
+ representation of the middle-end: if we are to completely hide the EH
+ circuitry from it, we need to declare that calls to pure Ada subprograms
+ that can throw have side effects, since they can trigger an "abnormal"
+ transfer of control; therefore they cannot be "pure" in the GCC sense.  */
+  bool pure_flag = Is_Pure (gnat_subprog) && Back_End_Exceptions ();
   bool return_by_direct_ref_p = false;
   bool return_by_invisi_ref_p = false;
   bool return_unconstrained_p = false;
@@ -5899,14 +5899,14 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 }
 
   /* A procedure (something that doesn't return anything) shouldn't be
- considered const since there would be no reason for calling such a
+ considered pure since there would be no reason for calling such a
  subprogram.  Note that procedures with Out (or In Out) parameters
  have already been converted into a function with a return type.
  Similarly, if the function returns an unconstrained type, then the
  function will allocate the return value on the secondary stack and
  thus calls to it cannot be CSE'ed, lest the stack be reclaimed.  */
   if (VOID_TYPE_P (gnu_return_type) || return_unconstrained_p)
-const_flag = false;
+pure_flag = false;
 
   /* Loop over the parameters and get their associated GCC tree.  While doing
  this, build a copy-in copy-out structure if we need one.  */
@@ -6034,19 +6034,16 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 	  save_gnu_tree (gnat_param, gnu_param, false);
 
 	  /* A pure function in the Ada sense which takes an access parameter
-	 may modify memory through it and thus need be considered neither
-	 const nor pure in the GCC sense, unless it's access-to-function.
-	 Likewise it if takes a by-ref In Out or Out parameter.  But if it
-	 takes a by-ref In parameter, then it may only read memory through
-	 it and can be considered pure in the GCC sense.  */
-	  if ((const_flag || pure_flag)
+	 may modify memory through it and thus cannot be considered pure
+	 in the GCC sense, unless it's access-to-function.  Likewise it if
+	 takes a by-ref In Out or Out parameter.  But if it takes a by-ref
+	 In parameter, then it may only read memory through it and can be
+	 considered pure in the GCC sense.  */
+	  if (pure_flag
 	  && ((POINTER_TYPE_P (gnu_param_type)
 		   && TREE_CODE (TREE_TYPE (gnu_param_type)) != FUNCTION_TYPE)
 		  || TYPE_IS_FAT_POINTER_P (gnu_param_type)))
-	{
-	  const_flag = false;
-	  pure_flag = DECL_POINTS_TO_READONLY_P (gnu_param);
-	}
+	pure_flag = DECL_POINTS_TO_READONLY_P (gnu_param);
 	}
 
   /* If the parameter uses the copy-in copy-out mechanism, allocate a field
@@ -6246,9 +6243,6 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 	}
 	}
 
-  if (const_flag)
-	gnu_type = change_qualified_type (gnu_type, TYPE_QUAL_CONST);
-
   if (pure_flag)
 	gnu_type = change_qualified_type (gnu_type, 

Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-05-21 Thread Martin Liška

On 5/20/21 2:54 PM, Richard Biener wrote:

On Thu, May 20, 2021 at 2:34 PM Martin Liška  wrote:


Hello.

I've got a patch candidate that leverages partial linking for a couple of 
selected object files.

I'm sending make all-host- jX results for my machine:

before: 3m18s (user 32m52s)
https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/1dd5eae5001295ba0230a689f7edc67284c9b742/gcc-all-host.svg

after: 2m57m (user 35m)
https://gist.githubusercontent.com/marxin/223890df4d8d8e490b6b2918b77dacad/raw/d659b2187cf622167841efbbe6bc93cb33855fa9/gcc-all-host-partial-lto.svg

One can utilize it with:
make -j16 all-host PARTIAL_LTO=1

@Segher, Andrew: Can you please measure time improvement for your slow 
bootstrap?
One can also tweak --param=lto-partitions=16 param value.

Thoughts?


You're LTO linking multiple objects here - that's almost as if you
were doing this
for the whole of libbackend.a ... so $(OBJS)_CLFAGS += -flto and in the
libbackend.a rule do a similar partial link trick.


Yeah, apart from that one can't likely do partial linking for an archive:

$ g++ -no-pie -flto=auto --param=lto-partitions=16 -flinker-output=nolto-rel -r 
libbackend.a
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core 
dumped
compilation terminated.

while ld.bfd immediately finishes.



That gets you half of a LTO bootstrap then.

So why did you go from applying this per-file to multiple files?  Does $(LINKER)
have a proper rule to pick up a jobserver?

When upstreaming in any form you probably have to gate it on bootstrap-lto
being not active.


Sure, that's reasonable, we can likely detect a -flto option in $(COMPILE), 
right?

One more thing I face is broken dependency:
$ make clean && make -j32 PARTIAL_LTO=1

g++ -fcf-protection -fno-PIE -c   -g   -DIN_GCC -fPIC-fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
-I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o gimple-match-lto.o -MT 
gimple-match-lto.o -MMD -MP -MF ./.deps/gimple-match-lto.TPo gimple-match.c 
-flto
g++ -fcf-protection -fno-PIE -c   -g   -DIN_GCC -fPIC-fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wno-error=format-diag -Wmissing-format-attribute 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common -Wno-unused -DHAVE_CONFIG_H -I. -I. 
-I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. 
-I/home/marxin/Programming/gcc/gcc/../include 
-I/home/marxin/Programming/gcc/gcc/../libcpp/include 
-I/home/marxin/Programming/gcc/gcc/../libcody  
-I/home/marxin/Programming/gcc/gcc/../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libdecnumber/bid -I../libdecnumber 
-I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o generic-match-lto.o -MT 
generic-match-lto.o -MMD -MP -MF ./.deps/generic-match-lto.TPo generic-match.c 
-flto

In file included from ./tm.h:26,
 from /home/marxin/Programming/gcc/gcc/backend.h:28,
 from /home/marxin/Programming/gcc/gcc/generic-match-head.c:23,
 from generic-match.c:4:
/home/marxin/Programming/gcc/gcc/config/i386/i386.h:2286:10: fatal error: 
insn-attr-common.h: No such file or directory
 2286 | #include "insn-attr-common.h"
  |  ^~~~
compilation terminated.
make: *** [Makefile:2678: generic-match-lto.o] Error 1
make: *** Waiting for unfinished jobs

In file included from ./tm.h:26,
 from /home/marxin/Programming/gcc/gcc/backend.h:28,
 from /home/marxin/Programming/gcc/gcc/gimple-match-head.c:23,
 from gimple-match.c:4:
/home/marxin/Programming/gcc/gcc/config/i386/i386.h:2286:10: fatal error: 
insn-attr-common.h: No such file or directory
 2286 | #include "insn-attr-common.h"
  |  ^~~~

I explicitly added:
gimple-match.o: gimple-match.c $(generated_files)
generic-match.o: generic-match.c $(generated_files)

But it's not obeyed.

Martin



Richard.


Thanks,
Martin


>From b209c7aea76ceb8eadcf5843f30299fc8041a579 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 20 May 2021 14:29:35 +0200
Subject: [PATCH] Try LTO partial linking.

---
 gcc/Makefile.in | 33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in

[Ada] Fix segfault at run time on strict-alignment platforms

2021-05-21 Thread Eric Botcazou
This fixes a regression present on the mainline and 11 branch by restricting 
the problematic change dealing with bitfields whose nomimal subtype is self-
referential to the cases where the size is really lower.

Tested on x86-64/Linux, applied on the mainline and 11 branch.


2021-05-21  Eric Botcazou  

* gcc-interface/trans.c (Call_to_gnu): Restrict previous change
to bitfields whose size is not equal to the type size.
(gnat_to_gnu): Likewise.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 8ce0d8ac9e8..ad11aea 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -4611,6 +4611,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 	  || (gnu_target
 	  && TREE_CODE (gnu_target) == COMPONENT_REF
 	  && DECL_BIT_FIELD (TREE_OPERAND (gnu_target, 1))
+	  && DECL_SIZE (TREE_OPERAND (gnu_target, 1))
+		 != TYPE_SIZE (TREE_TYPE (gnu_target))
 	  && type_is_padding_self_referential (gnu_result_type
 {
   gnu_retval = create_temporary ("R", gnu_result_type);
@@ -8370,7 +8372,9 @@ gnat_to_gnu (Node_Id gnat_node)
 	 much data.  But do not remove it if it is already too small.  */
   if (type_is_padding_self_referential (TREE_TYPE (gnu_result))
 	  && !(TREE_CODE (gnu_result) == COMPONENT_REF
-	   && DECL_BIT_FIELD (TREE_OPERAND (gnu_result, 1
+	   && DECL_BIT_FIELD (TREE_OPERAND (gnu_result, 1))
+	   && DECL_SIZE (TREE_OPERAND (gnu_result, 1))
+		  != TYPE_SIZE (TREE_TYPE (gnu_result
 	gnu_result = convert (TREE_TYPE (TYPE_FIELDS (TREE_TYPE (gnu_result))),
 			  gnu_result);
 }


[Ada] Fix spurious No_Elaboration violation for Size attribute

2021-05-21 Thread Eric Botcazou
We optimize the associated range check but nevertheless flag a violation.

Tested on x86-64/Linux, applied on the mainline.


2021-05-21  Eric Botcazou  

* gcc-interface/trans.c (Call_to_gnu): Minor tweaks.
(gnat_to_gnu_external): Likewise.
(Raise_Error_to_gnu): Return an empty statement list if there is a
condition and it is always false.
(gnat_to_gnu): Do not check for elaboration code a priori during the
translation but a posteriori instead.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index f5686475698..8ce0d8ac9e8 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -4471,8 +4471,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   tree gnu_after_list = NULL_TREE;
   tree gnu_retval = NULL_TREE;
   tree gnu_call, gnu_result;
-  bool went_into_elab_proc = false;
-  bool pushed_binding_level = false;
+  bool went_into_elab_proc;
+  bool pushed_binding_level;
   bool variadic;
   bool by_descriptor;
   Entity_Id gnat_formal;
@@ -4555,6 +4555,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   current_function_decl = get_elaboration_procedure ();
   went_into_elab_proc = true;
 }
+  else
+went_into_elab_proc = false;
 
   /* First, create the temporary for the return value when:
 
@@ -4624,6 +4626,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   gnat_pushlevel ();
   pushed_binding_level = true;
 }
+  else
+pushed_binding_level = false;
 
   /* Create the list of the actual parameters as GCC expects it, namely a
  chain of TREE_LIST nodes in which the TREE_VALUE field of each node
@@ -6146,6 +6150,8 @@ Raise_Error_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p)
 	{
 	  if (!gnu_cond)
 	gnu_cond = gnat_to_gnu (gnat_cond);
+	  if (integer_zerop (gnu_cond))
+	return alloc_stmt_list ();
 	  gnu_result = build3 (COND_EXPR, void_type_node, gnu_cond, gnu_result,
 			   alloc_stmt_list ());
 	}
@@ -6271,12 +6277,12 @@ tree
 gnat_to_gnu (Node_Id gnat_node)
 {
   const Node_Kind kind = Nkind (gnat_node);
-  bool went_into_elab_proc = false;
   tree gnu_result = error_mark_node; /* Default to no value.  */
   tree gnu_result_type = void_type_node;
   tree gnu_expr, gnu_lhs, gnu_rhs;
   Node_Id gnat_temp;
   atomic_acces_t aa_type;
+  bool went_into_elab_proc;
   bool aa_sync;
 
   /* Save node number for error message and set location information.  */
@@ -6308,32 +6314,18 @@ gnat_to_gnu (Node_Id gnat_node)
 		   build_call_raise (CE_Range_Check_Failed, gnat_node,
  N_Raise_Constraint_Error));
 
-  if ((statement_node_p (gnat_node) && kind != N_Null_Statement)
-  || kind == N_Handled_Sequence_Of_Statements
-  || kind == N_Implicit_Label_Declaration)
+  /* If this is a statement and we are at top level, it must be part of the
+ elaboration procedure, so mark us as being in that procedure.  */
+  if ((statement_node_p (gnat_node)
+   || kind == N_Handled_Sequence_Of_Statements
+   || kind == N_Implicit_Label_Declaration)
+  && !current_function_decl)
 {
-  tree current_elab_proc = get_elaboration_procedure ();
-
-  /* If this is a statement and we are at top level, it must be part of
-	 the elaboration procedure, so mark us as being in that procedure.  */
-  if (!current_function_decl)
-	{
-	  current_function_decl = current_elab_proc;
-	  went_into_elab_proc = true;
-	}
-
-  /* If we are in the elaboration procedure, check if we are violating a
-	 No_Elaboration_Code restriction by having a statement there.  Don't
-	 check for a possible No_Elaboration_Code restriction violation on
-	 N_Handled_Sequence_Of_Statements, as we want to signal an error on
-	 every nested real statement instead.  This also avoids triggering
-	 spurious errors on dummy (empty) sequences created by the front-end
-	 for package bodies in some cases.  */
-  if (current_function_decl == current_elab_proc
-	  && kind != N_Handled_Sequence_Of_Statements
-	  && kind != N_Implicit_Label_Declaration)
-	Check_Elaboration_Code_Allowed (gnat_node);
+  current_function_decl = get_elaboration_procedure ();
+  went_into_elab_proc = true;
 }
+  else
+went_into_elab_proc = false;
 
   switch (kind)
 {
@@ -8235,6 +8227,14 @@ gnat_to_gnu (Node_Id gnat_node)
   gcc_unreachable ();
 }
 
+  /* If we are in the elaboration procedure, check if we are violating the
+ No_Elaboration_Code restriction by having a non-empty statement.  */
+  if (statement_node_p (gnat_node)
+  && !(TREE_CODE (gnu_result) == STATEMENT_LIST
+	   && empty_stmt_list_p (gnu_result))
+  && current_function_decl == get_elaboration_procedure ())
+Check_Elaboration_Code_Allowed (gnat_node);
+
   /* If we pushed the processing of the elaboration routine, pop it back.  */
   if (went_into_elab_proc)
 current_function_decl = 

[Ada] Fix incorrect SLOC on instruction

2021-05-21 Thread Eric Botcazou
This puts the missing SLOC on a statement generated by a return.

Tested on x86-64/Linux, applied on the mainline, 11 and 10 branches.


2021-05-21  Eric Botcazou  

* gcc-interface/trans.c (gnat_to_gnu) :
Put a SLOC on the assignment from the return value to the return
object in the copy-in/copy-out case.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 9aeaf038118..f5686475698 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -7624,8 +7624,10 @@ gnat_to_gnu (Node_Id gnat_node)
 	if (gnu_return_label_stack->last ())
 	  {
 	if (gnu_ret_val)
-	  add_stmt (build_binary_op (MODIFY_EXPR, NULL_TREE, gnu_ret_obj,
-	 gnu_ret_val));
+	  add_stmt_with_node (build_binary_op (MODIFY_EXPR,
+		   NULL_TREE, gnu_ret_obj,
+		   gnu_ret_val),
+  gnat_node);
 
 	gnu_result = build1 (GOTO_EXPR, void_type_node,
  gnu_return_label_stack->last ());


Re: [PATCH] Try LTO partial linking. (Was: Speed of compiling gimple-match.c)

2021-05-21 Thread Martin Liška

On 5/20/21 5:55 PM, Jan Hubicka wrote:

Quick solution is to also modify partitioner to use the local symbol
names when doing incremental linking (those mixing in source code and
random seeds) to avoid clashes.


Good hint. I added hash based on object file name (I don't want to handle
proper string escaping) and -frandom-seed.

What do you think about the patch?
Thanks,
Martin
>From 372d2944571906932fd1419bfc51a949d67b857e Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 21 May 2021 10:25:49 +0200
Subject: [PATCH] LTO: add lto_priv suffixfor LTO_LINKER_OUTPUT_NOLTOREL.

gcc/lto/ChangeLog:

	* lto-partition.c (privatize_symbol_name_1): Add random suffix
	based on hash of the object file and -frandom-seed.
---
 gcc/lto/lto-partition.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 15761ac9eb5..fef48c869a2 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-fnsummary.h"
 #include "lto-partition.h"
 #include "sreal.h"
+#include "toplev.h"
 
 vec ltrans_partitions;
 
@@ -941,9 +942,23 @@ privatize_symbol_name_1 (symtab_node *node, tree decl)
 
   name = maybe_rewrite_identifier (name);
   unsigned _number = lto_clone_numbers->get_or_insert (name);
-  symtab->change_decl_assembler_name (decl,
-  clone_function_name (
-	  name, "lto_priv", clone_number));
+
+  char *suffix = NULL;
+  if (flag_lto_linker_output == LTO_LINKER_OUTPUT_NOLTOREL)
+{
+  hashval_t fnhash = 0;
+  if (node->lto_file_data != NULL)
+	fnhash = htab_hash_string (node->lto_file_data->file_name);
+  suffix = XNEWVEC (char, 128);
+  char sep = symbol_table::symbol_suffix_separator ();
+  sprintf (suffix, "lto_priv%c%u%c%" PRIu64, sep, fnhash, sep,
+	   (unsigned HOST_WIDE_INT)get_random_seed (false));
+}
+
+  tree clone
+= clone_function_name (name, suffix ? suffix : "lto_priv", clone_number);
+  symtab->change_decl_assembler_name (decl, clone);
+  free (suffix);
   clone_number++;
 
   if (node->lto_file_data)
-- 
2.31.1



[Ada] Use EXACT_DIV_EXPR as much as possible

2021-05-21 Thread Eric Botcazou
...when the division is exact, typically dividing TYPE_SIZE by BITS_PER_UNIT.

Tested on x86-64/Linux, applied on the mainline.


2021-05-21  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Replace
CEIL_DIV_EXPR with EXACT_DIV_EXPR.
* gcc-interface/misc.c (gnat_type_max_size): Likewise.
* gcc-interface/utils.c (maybe_pad_type): Likewise.
(finish_record_type): Likewise.  And always compute the unit size.

-- 
Eric Botcazoudiff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
index 232b552a60c..a7595cbf514 100644
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -1343,7 +1343,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	|| (gnu_size
 		&& !allocatable_size_p (convert (sizetype,
 		 size_binop
-		 (CEIL_DIV_EXPR, gnu_size,
+		 (EXACT_DIV_EXPR, gnu_size,
 		  bitsize_unit_node)),
 	global_bindings_p ()
 	|| !definition
diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index f302cf0a6f0..72a2624b9da 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -752,7 +752,7 @@ gnat_type_max_size (const_tree gnu_type)
 	 type's alignment and return the result in units.  */
   if (tree_fits_uhwi_p (max_ada_size))
 	max_size_unit
-	  = size_binop (CEIL_DIV_EXPR,
+	  = size_binop (EXACT_DIV_EXPR,
 			round_up (max_ada_size, TYPE_ALIGN (gnu_type)),
 			bitsize_unit_node);
 }
diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index 80a4160adf3..8539fe497e1 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -1547,7 +1547,7 @@ maybe_pad_type (tree type, tree size, unsigned int align,
   TYPE_SIZE (record) = size ? size : orig_size;
   TYPE_SIZE_UNIT (record)
 = convert (sizetype,
-	   size_binop (CEIL_DIV_EXPR, TYPE_SIZE (record),
+	   size_binop (EXACT_DIV_EXPR, TYPE_SIZE (record),
 			   bitsize_unit_node));
 
   /* If we are changing the alignment and the input type is a record with
@@ -1970,7 +1970,6 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
 {
   const enum tree_code orig_code = TREE_CODE (record_type);
   const bool had_size = TYPE_SIZE (record_type) != NULL_TREE;
-  const bool had_size_unit = TYPE_SIZE_UNIT (record_type) != NULL_TREE;
   const bool had_align = TYPE_ALIGN (record_type) > 0;
   /* For all-repped records with a size specified, lay the QUAL_UNION_TYPE
  out just like a UNION_TYPE, since the size will be fixed.  */
@@ -1997,9 +1996,6 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
 
   if (!had_size)
 	TYPE_SIZE (record_type) = bitsize_zero_node;
-
-  if (!had_size_unit)
-	TYPE_SIZE_UNIT (record_type) = size_zero_node;
 }
   else
 {
@@ -2155,19 +2151,22 @@ finish_record_type (tree record_type, tree field_list, int rep_level,
   /* We need to set the regular sizes if REP_LEVEL is one.  */
   if (rep_level == 1)
 {
+  /* We round TYPE_SIZE and TYPE_SIZE_UNIT up to TYPE_ALIGN separately
+	 to avoid having very large masking constants in TYPE_SIZE_UNIT.  */
+  const unsigned int align = TYPE_ALIGN (record_type);
+
   /* If this is a padding record, we never want to make the size smaller
 	 than what was specified in it, if any.  */
-  if (TYPE_IS_PADDING_P (record_type) && TYPE_SIZE (record_type))
+  if (TYPE_IS_PADDING_P (record_type) && had_size)
 	size = TYPE_SIZE (record_type);
-
-  tree size_unit = had_size_unit
-		   ? TYPE_SIZE_UNIT (record_type)
-		   : convert (sizetype,
-  size_binop (CEIL_DIV_EXPR, size,
-	  bitsize_unit_node));
-  const unsigned int align = TYPE_ALIGN (record_type);
+  else
+	size = round_up (size, BITS_PER_UNIT);
 
   TYPE_SIZE (record_type) = variable_size (round_up (size, align));
+
+  tree size_unit
+	= convert (sizetype,
+		   size_binop (EXACT_DIV_EXPR, size, bitsize_unit_node));
   TYPE_SIZE_UNIT (record_type)
 	= variable_size (round_up (size_unit, align / BITS_PER_UNIT));
 }


Re: [Ping^2, Patch, Fortran] PR98301 Re: RANDOM_INIT() and coarray Fortran

2021-05-21 Thread Andre Vehreschild via Gcc-patches
Ping, ping!

Please find attached a rebased version of the patch for the RANDOM_INIT issue
with coarray Fortran. Nothing changed to the previous version, just rebased to
current master.

Regtested fine on x86_64-linux/f33. Ok for trunk?

- Andre

On Mon, 3 May 2021 08:20:36 -0700
Steve Kargl  wrote:

> On Mon, May 03, 2021 at 11:21:10AM +0200, Andre Vehreschild wrote:
> > Ping!
> >
> > Ok for trunk?
> >
> > I have looked at other patches, but none was patching any location I have
> > worked on previously. Therefore I can't return the favor of reviewing any
> > currently open patches and have to ask for volunteers here.
> >
> > - Andre
> >
>
> I doubt I'm allowed to approve a patch, where I wrote a portion
> of it.  However, if no one else steps forward in the next day
> or two, then I think you should commit.
>


--
Andre Vehreschild * Email: vehre ad gmx dot de
Steve Kargl  

PR fortran/98301 - random_init() is broken

Correct implementation of random_init() when -fcoarray=lib is given.

gcc/fortran/ChangeLog:

	PR fortran/98301
	* trans-decl.c (gfc_build_builtin_function_decls): Move decl.
	* trans-intrinsic.c (conv_intrinsic_random_init): Use bool for
	lib-call of caf_random_init instead of logical (4-byte).
	* trans.h: Add tree var for random_init.

libgfortran/ChangeLog:

	PR fortran/98301
	* caf/libcaf.h (_gfortran_caf_random_init): New function.
	* caf/single.c (_gfortran_caf_random_init): New function.
	* gfortran.map: Added fndecl.
	* intrinsics/random_init.f90: Implement random_init.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 406b4aeb1d4..c32bd05bb1b 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -170,6 +170,7 @@ tree gfor_fndecl_co_min;
 tree gfor_fndecl_co_reduce;
 tree gfor_fndecl_co_sum;
 tree gfor_fndecl_caf_is_present;
+tree gfor_fndecl_caf_random_init;


 /* Math functions.  Many other math functions are handled in
@@ -233,7 +234,7 @@ tree gfor_fndecl_cgemm;
 tree gfor_fndecl_zgemm;

 /* RANDOM_INIT function.  */
-tree gfor_fndecl_random_init;
+tree gfor_fndecl_random_init;  /* libgfortran, 1 image only.  */

 static void
 gfc_add_decl_to_parent_function (tree decl)
@@ -3516,6 +3517,8 @@ gfc_build_intrinsic_function_decls (void)
 	void_type_node, 3, gfc_logical4_type_node, gfc_logical4_type_node,
 	gfc_int4_type_node);

+ // gfor_fndecl_caf_rand_init is defined in the lib-coarray section below.
+
   gfor_fndecl_sc_kind = gfc_build_library_function_decl_with_spec (
 	get_identifier (PREFIX("selected_char_kind")), ". . R ",
 	gfc_int4_type_node, 2, gfc_charlen_type_node, pchar_type_node);
@@ -4081,6 +4084,10 @@ gfc_build_builtin_function_decls (void)
 	get_identifier (PREFIX("caf_is_present")), ". r . r ",
 	integer_type_node, 3, pvoid_type_node, integer_type_node,
 	pvoid_type_node);
+
+  gfor_fndecl_caf_random_init = gfc_build_library_function_decl (
+	get_identifier (PREFIX("caf_random_init")),
+	void_type_node, 2, logical_type_node, logical_type_node);
 }

   gfc_build_intrinsic_function_decls ();
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 4d7451479d3..db9248c0043 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3827,38 +3827,43 @@ conv_intrinsic_random_init (gfc_code *code)
 {
   stmtblock_t block;
   gfc_se se;
-  tree arg1, arg2, arg3, tmp;
-  tree logical4_type_node = gfc_get_logical_type (4);
+  tree arg1, arg2, tmp;
+  /* On none coarray == lib compiles use LOGICAL(4) else regular LOGICAL.  */
+  tree used_bool_type_node = flag_coarray == GFC_FCOARRAY_LIB
+			 ? logical_type_node
+			 : gfc_get_logical_type (4);

   /* Make the function call.  */
   gfc_init_block ();
   gfc_init_se (, NULL);

-  /* Convert REPEATABLE to a LOGICAL(4) entity.  */
+  /* Convert REPEATABLE to the desired LOGICAL entity.  */
   gfc_conv_expr (, code->ext.actual->expr);
   gfc_add_block_to_block (, );
-  arg1 = fold_convert (logical4_type_node, gfc_evaluate_now (se.expr, ));
+  arg1 = fold_convert (used_bool_type_node, gfc_evaluate_now (se.expr, ));
   gfc_add_block_to_block (, );

-  /* Convert IMAGE_DISTINCT to a LOGICAL(4) entity.  */
+  /* Convert IMAGE_DISTINCT to the desired LOGICAL entity.  */
   gfc_conv_expr (, code->ext.actual->next->expr);
   gfc_add_block_to_block (, );
-  arg2 = fold_convert (logical4_type_node, gfc_evaluate_now (se.expr, ));
+  arg2 = fold_convert (used_bool_type_node, gfc_evaluate_now (se.expr, ));
   gfc_add_block_to_block (, );

-  /* Create the hidden argument.  For non-coarray codes and -fcoarray=single,
- simply set this to 0.  For -fcoarray=lib, generate a call to
- THIS_IMAGE() without arguments.  */
-  arg3 = build_int_cst (gfc_get_int_type (4), 0);
   if (flag_coarray == GFC_FCOARRAY_LIB)
 {
-  arg3 = build_call_expr_loc (input_location, gfor_fndecl_caf_this_image,
-  1, arg3);
-  se.expr = fold_convert (gfc_get_int_type (4), arg3);
+  tmp = build_call_expr_loc (input_location, 

[PATCH] aarch64: Add attributes for builtins specified in aarch64-builtins.c

2021-05-21 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

Besides the builtins in aarch64-simd-builtins.def there are a number of 
builtins defined in aarch64-builtins.c itself.
They could also benefit from the attributes generated by aarch64_get_attributes.
However aarch64_get_attributes and its helpers are only set up to handle a 
aarch64_simd_builtin_datum.

This patch changes these functions to instead take a flag and mode value that 
are extracted from
aarch64_simd_builtin_datum.flags and aarch64_simd_builtin_datum.mode anyway.
Then the various builtin init functions in aarch64-builtins.c can pass down 
their own FLAG_* flags
that they want to derive attributes from.
I'm not too happy with changing the helper functions in this way, but it seemed 
like the least bad option.

Richard, does this look like an ok approach to you?

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_call_properties):
Take a flag and mode value as arguments.
(aarch64_modifies_global_state_p): Likewise.
(aarch64_reads_global_state_p): Likewise.
(aarch64_could_trap_p): Likewise.
(aarch64_get_attributes): Likewise.
(aarch64_init_simd_builtins): Adjust callsite of above.
(aarch64_init_fcmla_laneq_builtins): Use aarch64_get_attributes to get
function attributes to apply to builtins.
(aarch64_init_crc32_builtins): Likewise.
(aarch64_init_builtin_rsqrt): Likewise.


attrs.patch
Description: attrs.patch


Re: [PATCH] constructor: Elide expand_constructor when can move by pieces is true

2021-05-21 Thread Bernd Edlinger



On 5/21/21 8:57 AM, Richard Biener wrote:
> On Thu, May 20, 2021 at 4:04 PM H.J. Lu  wrote:
>>
>> On Thu, May 20, 2021 at 12:51 AM Richard Biener
>>  wrote:
>>>
>>> On Wed, May 19, 2021 at 3:22 PM H.J. Lu  wrote:

 On Wed, May 19, 2021 at 2:33 AM Richard Biener
  wrote:
>
> On Tue, May 18, 2021 at 9:16 PM H.J. Lu  wrote:
>>
>> When expanding a constant constructor, don't call expand_constructor if
>> it is more efficient to load the data from the memory via move by pieces.
>>
>> gcc/
>>
>> PR middle-end/90773
>> * expr.c (expand_expr_real_1): Don't call expand_constructor if
>> it is more efficient to load the data from the memory.
>>
>> gcc/testsuite/
>>
>> PR middle-end/90773
>> * gcc.target/i386/pr90773-24.c: New test.
>> * gcc.target/i386/pr90773-25.c: Likewise.
>> ---
>>  gcc/expr.c | 10 ++
>>  gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 ++
>>  gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
>>  3 files changed, 52 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c
>>
>> diff --git a/gcc/expr.c b/gcc/expr.c
>> index d09ee42e262..80e01ea1cbe 100644
>> --- a/gcc/expr.c
>> +++ b/gcc/expr.c
>> @@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, 
>> machine_mode tmode,
>> unsigned HOST_WIDE_INT ix;
>> tree field, value;
>>
>> +   /* Check if it is more efficient to load the data from
>> +  the memory directly.  FIXME: How many stores do we
>> +  need here if not moved by pieces?  */
>> +   unsigned HOST_WIDE_INT bytes
>> + = tree_to_uhwi (TYPE_SIZE_UNIT (type));
>
> that's prone to fail - it could be a VLA.

 What do you mean by fail?  Is it ICE or missed optimization?
 Do you have a testcase?

>
>> +   if ((bytes / UNITS_PER_WORD) > 2
>> +   && MOVE_MAX_PIECES > UNITS_PER_WORD
>> +   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
>> + goto normal_inner_ref;
>> +
>
> It looks like you're concerned about aggregate copies but this also 
> handles
> non-aggregates (which on GIMPLE might already be optimized of course).

 Here I check if we copy more than 2 words and we can move more than
 a word in a single instruction.

> Also you say "if it's cheaper" but I see no cost considerations.  How do
> we generally handle immed const vs. load from constant pool costs?

 This trades 2 (update to 8) stores with one load plus one store.  Is there
 a way to check which one is faster?
>>>
>>> I'm not sure - it depends on whether the target can do stores from 
>>> immediates
>>> at all or what restrictions apply, what the immediate value actually is
>>> (zero or all-ones should be way cheaper than sth arbitrary) and how the
>>> pressure on the load unit is.  can_move_by_pieces (bytes, TYPE_ALIGN (type))
>>> also does not guarantee it will actually move pieces larger than 
>>> UNITS_PER_WORD,
>>> that might depend on alignment.  There's by_pieces_ninsns that might provide
>>> some hint here.
>>>
>>> I'm sure it works well for x86.
>>>
>>> I wonder if the existing code is in the appropriate place and we
>>> shouldn't instead
>>> handle this somewhere upthread where we ask to copy 'exp' into some other
>>> memory location.  For your testcase that's expand_assignment but I can
>>> imagine passing array[0] by value to a function resulting in similar 
>>> copying.
>>> Testing that shows we get
>>>
>>> pushq   array+56(%rip)
>>> .cfi_def_cfa_offset 24
>>> pushq   array+48(%rip)
>>> .cfi_def_cfa_offset 32
>>> pushq   array+40(%rip)
>>> .cfi_def_cfa_offset 40
>>> pushq   array+32(%rip)
>>> .cfi_def_cfa_offset 48
>>> pushq   array+24(%rip)
>>> .cfi_def_cfa_offset 56
>>> pushq   array+16(%rip)
>>> .cfi_def_cfa_offset 64
>>> pushq   array+8(%rip)
>>> .cfi_def_cfa_offset 72
>>> pushq   array(%rip)
>>> .cfi_def_cfa_offset 80
>>> callbar
>>>
>>> for that.  We do have the by-pieces infrastructure to generally do this 
>>> kind of
>>> copying but in both of these cases we do not seem to use it.  I also wonder
>>> if the by-pieces infrastructure can pick up constant initializers 
>>> automagically
>>> (we could native_encode the initializer part and feed the by-pieces
>>> infrastructure with an array of bytes).  There for example might be easy to
>>> immediate-store byte parts and difficult ones where we could decide on a
>>> 

Re: [PATCH] constructor: Elide expand_constructor when can move by pieces is true

2021-05-21 Thread Richard Biener via Gcc-patches
On Thu, May 20, 2021 at 4:04 PM H.J. Lu  wrote:
>
> On Thu, May 20, 2021 at 12:51 AM Richard Biener
>  wrote:
> >
> > On Wed, May 19, 2021 at 3:22 PM H.J. Lu  wrote:
> > >
> > > On Wed, May 19, 2021 at 2:33 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Tue, May 18, 2021 at 9:16 PM H.J. Lu  wrote:
> > > > >
> > > > > When expanding a constant constructor, don't call expand_constructor 
> > > > > if
> > > > > it is more efficient to load the data from the memory via move by 
> > > > > pieces.
> > > > >
> > > > > gcc/
> > > > >
> > > > > PR middle-end/90773
> > > > > * expr.c (expand_expr_real_1): Don't call expand_constructor 
> > > > > if
> > > > > it is more efficient to load the data from the memory.
> > > > >
> > > > > gcc/testsuite/
> > > > >
> > > > > PR middle-end/90773
> > > > > * gcc.target/i386/pr90773-24.c: New test.
> > > > > * gcc.target/i386/pr90773-25.c: Likewise.
> > > > > ---
> > > > >  gcc/expr.c | 10 ++
> > > > >  gcc/testsuite/gcc.target/i386/pr90773-24.c | 22 
> > > > > ++
> > > > >  gcc/testsuite/gcc.target/i386/pr90773-25.c | 20 
> > > > >  3 files changed, 52 insertions(+)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-24.c
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-25.c
> > > > >
> > > > > diff --git a/gcc/expr.c b/gcc/expr.c
> > > > > index d09ee42e262..80e01ea1cbe 100644
> > > > > --- a/gcc/expr.c
> > > > > +++ b/gcc/expr.c
> > > > > @@ -10886,6 +10886,16 @@ expand_expr_real_1 (tree exp, rtx target, 
> > > > > machine_mode tmode,
> > > > > unsigned HOST_WIDE_INT ix;
> > > > > tree field, value;
> > > > >
> > > > > +   /* Check if it is more efficient to load the data from
> > > > > +  the memory directly.  FIXME: How many stores do we
> > > > > +  need here if not moved by pieces?  */
> > > > > +   unsigned HOST_WIDE_INT bytes
> > > > > + = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> > > >
> > > > that's prone to fail - it could be a VLA.
> > >
> > > What do you mean by fail?  Is it ICE or missed optimization?
> > > Do you have a testcase?
> > >
> > > >
> > > > > +   if ((bytes / UNITS_PER_WORD) > 2
> > > > > +   && MOVE_MAX_PIECES > UNITS_PER_WORD
> > > > > +   && can_move_by_pieces (bytes, TYPE_ALIGN (type)))
> > > > > + goto normal_inner_ref;
> > > > > +
> > > >
> > > > It looks like you're concerned about aggregate copies but this also 
> > > > handles
> > > > non-aggregates (which on GIMPLE might already be optimized of course).
> > >
> > > Here I check if we copy more than 2 words and we can move more than
> > > a word in a single instruction.
> > >
> > > > Also you say "if it's cheaper" but I see no cost considerations.  How do
> > > > we generally handle immed const vs. load from constant pool costs?
> > >
> > > This trades 2 (update to 8) stores with one load plus one store.  Is there
> > > a way to check which one is faster?
> >
> > I'm not sure - it depends on whether the target can do stores from 
> > immediates
> > at all or what restrictions apply, what the immediate value actually is
> > (zero or all-ones should be way cheaper than sth arbitrary) and how the
> > pressure on the load unit is.  can_move_by_pieces (bytes, TYPE_ALIGN (type))
> > also does not guarantee it will actually move pieces larger than 
> > UNITS_PER_WORD,
> > that might depend on alignment.  There's by_pieces_ninsns that might provide
> > some hint here.
> >
> > I'm sure it works well for x86.
> >
> > I wonder if the existing code is in the appropriate place and we
> > shouldn't instead
> > handle this somewhere upthread where we ask to copy 'exp' into some other
> > memory location.  For your testcase that's expand_assignment but I can
> > imagine passing array[0] by value to a function resulting in similar 
> > copying.
> > Testing that shows we get
> >
> > pushq   array+56(%rip)
> > .cfi_def_cfa_offset 24
> > pushq   array+48(%rip)
> > .cfi_def_cfa_offset 32
> > pushq   array+40(%rip)
> > .cfi_def_cfa_offset 40
> > pushq   array+32(%rip)
> > .cfi_def_cfa_offset 48
> > pushq   array+24(%rip)
> > .cfi_def_cfa_offset 56
> > pushq   array+16(%rip)
> > .cfi_def_cfa_offset 64
> > pushq   array+8(%rip)
> > .cfi_def_cfa_offset 72
> > pushq   array(%rip)
> > .cfi_def_cfa_offset 80
> > callbar
> >
> > for that.  We do have the by-pieces infrastructure to generally do this 
> > kind of
> > copying but in both of these cases we do not seem to use it.  I also wonder
> > if the by-pieces infrastructure can pick up constant initializers 
> > automagically
> > (we could native_encode the initializer part and feed the by-pieces
> > 

Re: [PATCH] Hashtable PR96088

2021-05-21 Thread Jonathan Wakely via Gcc-patches
On Fri, 21 May 2021, 07:48 Jonathan Wakely,  wrote:

>
>
> On Fri, 21 May 2021, 07:31 François Dumont via Libstdc++, <
> libstd...@gcc.gnu.org> wrote:
>
>> On 20/05/21 6:44 pm, Jonathan Wakely wrote:
>> > On 06/05/21 22:03 +0200, François Dumont via Libstdc++ wrote:
>> >> Hi
>> >>
>> >> Considering your feedback on backtrace in debug mode is going to
>> >> take me some time so here is another one.
>> >>
>> >> Compared to latest submission I've added a _Hash_arg_t partial
>> >> specialization for std::hash<>. It is not strictly necessary for the
>> >> moment but when we will eventually remove its nested argument_type it
>> >> will be. I also wonder if it is not easier to handle for the
>> >> compiler, not sure about that thought.
>> >
>> > The std::hash specializations in libstdc++ define argument_type, but
>> > I'm already working on one that doesn't (forstd::stacktrace).
>> >
>> > And std::hash can be specialized by users,
>> > and is not required to provide argument_type.
>> >
>> > So it's already not valid to assume that std::hash::argument_type
>> > exists.
>>
>> Yes, I know that the plan is to get rid of argument_type. But as long as
>> it is there we can still use it even if I didn't realize that you were
>> already in the process of removing it so soon.
>>
>
> I'm adding a new specialization for a C++23 type, and not adding the
> nested types.
>
>
>
>>
>> >
>> >> @@ -850,9 +852,56 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >> iterator
>> >> _M_emplace(const_iterator, false_type __uks, _Args&&... __args);
>> >>
>> >> +  template
>> >> +std::pair
>> >> +_M_insert_unique(_Kt&&, _Arg&&, const _NodeGenerator&);
>> >> +
>> >> +  // Detect nested argument_type.
>> >> +  template>
>> >> +struct _Hash_arg_t
>> >> +{ typedef _Kt argument_type; };
>> >> +
>> >> +  // std::hash
>> >> +  template
>> >> +struct _Hash_arg_t<_Kt, std::hash<_Arg>>
>> >> +{ typedef _Arg argument_type; };
>> >> +
>> >> +  // Nested argument_type.
>> >> +  template
>> >> +struct _Hash_arg_t<_Kt, _Ht,
>> >> +  __void_t>
>> >> +{ typedef typename _Ht::argument_type argument_type; };
>> >> +
>> >> +  // Function pointer.
>> >> +  template
>> >> +struct _Hash_arg_t<_Kt, std::size_t(*)(const _Arg&)>
>> >> +{ typedef _Arg argument_type; };
>> >> +
>> >> +  template> >> +   typename _ArgType
>> >> + = typename _Hash_arg_t<_Kt, _Hash>::argument_type>
>> >> +static typename conditional<
>> >> +  __is_nothrow_convertible<_Kt, _ArgType>::value, _Kt&&,
>> >> key_type>::type
>> >
>> > Please use __conditional_t<...> here instead of
>> > typename conditional<...>::type.
>> >
>> > The purpose of the _Hash_arg_t type is to determine whether invoking
>> > the hash function with _Kt&& can throw, right?
>>
>> No, the purpose of _Hash_arg_t is to find out what is the argument type
>> of the _Hash functor. With this info I can check if invoking that
>> functor is going to instantiate a temporary using a throwing operation.
>
>
> Right, that's what I mean.
>
>
>> If so, I create a temporary at _Hashtable code level and move it to its
>> final storage place when needed.
>>
>> The tricky part is that _Hash can accept different argument types, for
>> the moment I just do not create a temporary in this case.
>>
>> >
>> > And if it can throw, you force a conversion early, and if it can't,
>> > you don't do the conversion.
>> >
>> > Can't you use __is_nothrow_invocable<_Hash&, _Kt> for that, instead of
>> > this fragile approach?
>> >
>> I think I already try but I'll check.
>>
>> I fear that __is_nothrow_invocable<_Hash&, _Kt> tells if the chosen
>> operator()(const _Arg&) is noexcept qualified.
>
>
> No.
>
> Not if the conversion
>> from _Kt to _Arg is noexcept.
>>
>
>
> It does exactly what you need.
>

And even if it didn't, you could do it yourself with the noexcept operator:

noexcept(std::declval<_Hash&>()(std::declval<_Kt>()));


This is what the trait uses, and it tells you if invoking Hash with a Kt
will throw. If a conversion to the argument type is needed, the compiler
will consider whether that can throw.

You don't need to know the argument type, you only need to know if the
expression can throw, and C++11 provides the tools to check that accurately.


Re: [PATCH] Hashtable PR96088

2021-05-21 Thread Jonathan Wakely via Gcc-patches
On Fri, 21 May 2021, 07:31 François Dumont via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On 20/05/21 6:44 pm, Jonathan Wakely wrote:
> > On 06/05/21 22:03 +0200, François Dumont via Libstdc++ wrote:
> >> Hi
> >>
> >> Considering your feedback on backtrace in debug mode is going to
> >> take me some time so here is another one.
> >>
> >> Compared to latest submission I've added a _Hash_arg_t partial
> >> specialization for std::hash<>. It is not strictly necessary for the
> >> moment but when we will eventually remove its nested argument_type it
> >> will be. I also wonder if it is not easier to handle for the
> >> compiler, not sure about that thought.
> >
> > The std::hash specializations in libstdc++ define argument_type, but
> > I'm already working on one that doesn't (forstd::stacktrace).
> >
> > And std::hash can be specialized by users,
> > and is not required to provide argument_type.
> >
> > So it's already not valid to assume that std::hash::argument_type
> > exists.
>
> Yes, I know that the plan is to get rid of argument_type. But as long as
> it is there we can still use it even if I didn't realize that you were
> already in the process of removing it so soon.
>

I'm adding a new specialization for a C++23 type, and not adding the nested
types.



>
> >
> >> @@ -850,9 +852,56 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> iterator
> >> _M_emplace(const_iterator, false_type __uks, _Args&&... __args);
> >>
> >> +  template
> >> +std::pair
> >> +_M_insert_unique(_Kt&&, _Arg&&, const _NodeGenerator&);
> >> +
> >> +  // Detect nested argument_type.
> >> +  template>
> >> +struct _Hash_arg_t
> >> +{ typedef _Kt argument_type; };
> >> +
> >> +  // std::hash
> >> +  template
> >> +struct _Hash_arg_t<_Kt, std::hash<_Arg>>
> >> +{ typedef _Arg argument_type; };
> >> +
> >> +  // Nested argument_type.
> >> +  template
> >> +struct _Hash_arg_t<_Kt, _Ht,
> >> +  __void_t>
> >> +{ typedef typename _Ht::argument_type argument_type; };
> >> +
> >> +  // Function pointer.
> >> +  template
> >> +struct _Hash_arg_t<_Kt, std::size_t(*)(const _Arg&)>
> >> +{ typedef _Arg argument_type; };
> >> +
> >> +  template >> +   typename _ArgType
> >> + = typename _Hash_arg_t<_Kt, _Hash>::argument_type>
> >> +static typename conditional<
> >> +  __is_nothrow_convertible<_Kt, _ArgType>::value, _Kt&&,
> >> key_type>::type
> >
> > Please use __conditional_t<...> here instead of
> > typename conditional<...>::type.
> >
> > The purpose of the _Hash_arg_t type is to determine whether invoking
> > the hash function with _Kt&& can throw, right?
>
> No, the purpose of _Hash_arg_t is to find out what is the argument type
> of the _Hash functor. With this info I can check if invoking that
> functor is going to instantiate a temporary using a throwing operation.


Right, that's what I mean.


> If so, I create a temporary at _Hashtable code level and move it to its
> final storage place when needed.
>
> The tricky part is that _Hash can accept different argument types, for
> the moment I just do not create a temporary in this case.
>
> >
> > And if it can throw, you force a conversion early, and if it can't,
> > you don't do the conversion.
> >
> > Can't you use __is_nothrow_invocable<_Hash&, _Kt> for that, instead of
> > this fragile approach?
> >
> I think I already try but I'll check.
>
> I fear that __is_nothrow_invocable<_Hash&, _Kt> tells if the chosen
> operator()(const _Arg&) is noexcept qualified.


No.

Not if the conversion
> from _Kt to _Arg is noexcept.
>


It does exactly what you need.


>
>


Re: [i386] [PATCH] Fix ICE when lhs is NULL [PR target/100660]

2021-05-21 Thread Richard Biener via Gcc-patches
On Fri, May 21, 2021 at 4:41 AM Hongtao Liu  wrote:
>
> On Thu, May 20, 2021 at 4:30 PM Richard Biener
>  wrote:
> >
> > On Thu, May 20, 2021 at 10:15 AM Hongtao Liu  wrote:
> > >
> > > On Thu, May 20, 2021 at 4:06 PM Richard Biener
> > >  wrote:
> > > >
> > > > On Thu, May 20, 2021 at 8:54 AM Hongtao Liu  wrote:
> > > > >
> > > > > Hi:
> > > > >   In folding target-specific builtin, when lhs is NULL, create a
> > > > > temporary variable for it.
> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}
> > > >
> > > > I would suggest to drop the stmt or leave it unfolded instead.
> > > Will -O0 be able to optimize the builtin away?
> > >  Since i've deleted the corresponding expander, there would be an
> > > error if the builtin goes directly to pass_expand.
> >
> > In that case replace it with a NOP, that should be safe then.
> >
> update patch.

OK.

Thanks,
Richard.

> > Richard.
> >
> > > > Note dropping would mean replacing it with a GIMPLE_NOP
> > > > (gimple_build_nop ()).  But creating a new unused LHS certainly
> > > > works as well.
> > > >
> > > > Jakub, any preference?
> > > >
> > > > Richard.
> > > >
> > > > > gcc/ChangeLog:
> > > > > PR target/100660
> > > > > * config/i386/i386.c (ix86_gimple_fold_builtin): Create a tmp
> > > > > variable for lhs when it doesn't exist.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > > PR target/100660
> > > > > * gcc.target/i386/pr100660.c: New test.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > BR,
> > > > > Hongtao
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao


[PATCH] i386: Add minmax and abs patterns for 4-byte vectors [PR100637]

2021-05-21 Thread Uros Bizjak via Gcc-patches
2021-05-21  Uroš Bizjak  

gcc/
PR target/100637
* config/i386/mmx.md (SMAXMIN_MMXMODEI): New mode iterator.
(3): Macroize expander
from v4hi3> and 3
using SMAXMIN_MMXMODEI mode iterator.
(*v4qi3): New insn pattern.
(*v2hi3): Ditto.
(SMAXMIN_VI_32): New mode iterator.
(mode3): New expander.

(UMAXMIN_MMXMODEI): New mode iterator.
(3): Macroize expander
from v8qi3> and 3
using UMAXMIN_MMXMODEI mode iterator.
(*v4qi3): New insn pattern.
(*v2hi3): Ditto.
(UMAXMIN_VI_32): New mode iterator.
(mode3): New expander.

(abs2): New insn pattern.
(ssse3_abs2, abs2): Move from ...
* config/i386/sse.md: ... here.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index baeed04d8c9..5e92be34545 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1691,13 +1691,11 @@ (define_insn "*sse2_umulv1siv1di3"
(set_attr "type" "mmxmul,ssemul,ssemul")
(set_attr "mode" "DI,TI,TI")])
 
-(define_expand "3"
-  [(set (match_operand:MMXMODE14 0 "register_operand")
-(smaxmin:MMXMODE14
- (match_operand:MMXMODE14 1 "register_operand")
- (match_operand:MMXMODE14 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
-  "ix86_fixup_binary_operands_no_copy (, mode, operands);")
+;
+;;
+;; Parallel integral shifts
+;;
+;
 
 (define_insn "*mmx_3"
   [(set (match_operand:MMXMODE14 0 "register_operand" "=Yr,*x,Yv")
@@ -1725,14 +1723,6 @@ (define_expand "mmx_v4hi3"
&& (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
 
-(define_expand "v4hi3"
-  [(set (match_operand:V4HI 0 "register_operand")
-(smaxmin:V4HI
- (match_operand:V4HI 1 "register_operand")
- (match_operand:V4HI 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
-  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
-
 (define_insn "*mmx_v4hi3"
   [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yw")
 (smaxmin:V4HI
@@ -1750,14 +1740,58 @@ (define_insn "*mmx_v4hi3"
(set_attr "type" "mmxadd,sseiadd,sseiadd")
(set_attr "mode" "DI,TI,TI")])
 
+(define_mode_iterator SMAXMIN_MMXMODEI
+  [(V8QI "TARGET_SSE4_1") V4HI (V2SI "TARGET_SSE4_1")])
+
 (define_expand "3"
-  [(set (match_operand:MMXMODE24 0 "register_operand")
-(umaxmin:MMXMODE24
- (match_operand:MMXMODE24 1 "register_operand")
- (match_operand:MMXMODE24 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE && TARGET_SSE4_1"
+  [(set (match_operand:SMAXMIN_MMXMODEI 0 "register_operand")
+(smaxmin:SMAXMIN_MMXMODEI
+ (match_operand:SMAXMIN_MMXMODEI 1 "register_operand")
+ (match_operand:SMAXMIN_MMXMODEI 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
+(define_insn "*v4qi3"
+  [(set (match_operand:V4QI 0 "register_operand" "=Yr,*x,Yv")
+   (smaxmin:V4QI
+ (match_operand:V4QI 1 "register_operand" "%0,0,Yv")
+ (match_operand:V4QI 2 "register_operand" "Yr,*x,Yv")))]
+  "TARGET_SSE4_1
+   && ix86_binary_operator_ok (, V4QImode, operands)"
+  "@
+   pb\t{%2, %0|%0, %2}
+   pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "sseiadd")
+   (set_attr "prefix_extra" "1,1,*")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "TI")])
+
+(define_insn "*v2hi3"
+  [(set (match_operand:V2HI 0 "register_operand" "=x,Yw")
+(smaxmin:V2HI
+ (match_operand:V2HI 1 "register_operand" "%0,Yw")
+ (match_operand:V2HI 2 "register_operand" "x,Yw")))]
+  "TARGET_SSE2
+   && ix86_binary_operator_ok (, V2HImode, operands)"
+  "@
+   pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseiadd")
+   (set_attr "mode" "TI")])
+
+(define_mode_iterator SMAXMIN_VI_32 [(V4QI "TARGET_SSE4_1") V2HI])
+
+(define_expand "3"
+  [(set (match_operand:SMAXMIN_VI_32 0 "register_operand")
+(smaxmin:SMAXMIN_VI_32
+ (match_operand:SMAXMIN_VI_32 1 "register_operand")
+ (match_operand:SMAXMIN_VI_32 2 "register_operand")))]
+  "TARGET_SSE2"
+  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
+
 (define_insn "*mmx_3"
   [(set (match_operand:MMXMODE24 0 "register_operand" "=Yr,*x,Yv")
(umaxmin:MMXMODE24
@@ -1784,14 +1818,6 @@ (define_expand "mmx_v8qi3"
&& (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
 
-(define_expand "v8qi3"
-  [(set (match_operand:V8QI 0 "register_operand")
-(umaxmin:V8QI
- (match_operand:V8QI 1 "register_operand")
- (match_operand:V8QI 2 "register_operand")))]
-  "TARGET_MMX_WITH_SSE"
-