Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
Yes, as well as anything ASSERT-related in the forward threader.  That'll be
a follow-up patch.

Aldy

On Fri, Oct 29, 2021, 22:58 David Malcolm  wrote:

> On Thu, 2021-10-28 at 17:24 +0200, Aldy Hernandez via Gcc-patches
> wrote:
>
> [...snip...]
>
> > gcc/ChangeLog:
> >
> > * passes.def: Replace the pass_thread_jumps before VRP* with
> > pass_thread_jumps_full.  Remove all pass_vrp_threader
> > instances.
>
> Given that you're deleting all pass_vrp_threader instances, will you be
> deleting make_pass_vrp_threader and class pass_vrp_threader once the
> dust settles?  (and thus execute_vrp_threader, etc?)
>
> Dave
>
>


Re: [Patch] libcpp: Fix _Pragma expansion [PR102409]

2021-10-29 Thread Tobias Burnus

Hi Martin,

On 28.10.21 18:28, Martin Sebor wrote:

There are a number of bug reports of _Pragma not working right
in macros, including (and especially) to control diagnostics:
https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=_Pragma%20macro_id=328003


Just by the description this change seems like it could also
fix some of them.


I think it does not help with them – or only partially.

I believe there are currently still two issues:

* _Pragma("GCC foo") – when "foo" or "GCC foo" is not
  registered, it is processed immediately, leading
  to wrong placement in the output with "-E".

A probably not fully correct draft patch is attached to
https://gcc.gnu.org/PR90400 which fixes the issue
(misses a location before the #pragma).


* With _Pragma("GCC diagnostic") in macros the problem is:
The macro is replaced by all the macro code including the
#pragma and all other code in there.

By construction, all those have the same line. But if
  else { b--;
#pragma GCC diagnostic push
;
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
; a--;
#pragma GCC diagnostic pop
; }

the location is the input_location of the expanded macro,
i.e. all code is in the same line. As the 'pop' check checks
whether the loc is before the pragma, it might pop too early
and the 'ignored' is already ignored for the 'a--' in this
example.  Cf. https://gcc.gnu.org/PR91669
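
To make the second issue concrete, here is a minimal sketch in C of the kind of macro involved. The names IGNORE_MAYBE_UNINIT and decrement_both are hypothetical and not taken from either PR; the sketch only shows the shape of the code, where push, ignored, the statement and pop all expand onto the single line of the macro invocation. Whether a given GCC version mishandles it depends on the bugs discussed above.

```c
/* Hypothetical macro (not from the PRs): suppress -Wmaybe-uninitialized
   for one statement.  After expansion, all three pragmas and STMT share
   the source line of the macro invocation.  */
#define IGNORE_MAYBE_UNINIT(stmt)                                  \
  do {                                                             \
    _Pragma ("GCC diagnostic push")                                \
    _Pragma ("GCC diagnostic ignored \"-Wmaybe-uninitialized\"")   \
    stmt;                                                          \
    _Pragma ("GCC diagnostic pop")                                 \
  } while (0)

/* Mirrors the "b--; ... a--;" fragment quoted above.  */
int
decrement_both (int a, int b)
{
  b--;
  IGNORE_MAYBE_UNINIT (a--);
  return a + b;
}
```

Calling decrement_both (3, 4) decrements both arguments and returns their sum; the pragmas only affect diagnostics, not the generated code.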

@Martin: As you seem to have spare cycles, how about spending
some time fixing either issue?

Tobias



Re: [PATCH] Fortran: recognize Gerhard Steinmetz

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Am 29.10.21 um 21:58 schrieb Harald Anlauf via Fortran:
> Hi Manfred,
>
> Am 29.10.21 um 16:33 schrieb Manfred Schwarb via Fortran:
>> Hi,
>> there were really a lot of test cases provided by Gerhard Steinmetz lately.
>> Although I'm not really in the position to suggest this,
>> I would appreciate it if one could recognize him by adding an entry to
>> gfortran.texi.
>>
>> As e.g. in the proposed patch. Such a patch should probably be signed off by
>> someone of the inner circle and not by me ;-)
>
> well, this is sth. close to obvious to everybody. ;-)
> Anyway, a ChangeLog entry would be nice.

I would feel more comfortable if such a patch originated from somebody else,
not from me, actually.

>
> Harald
>
>> Cheers,
>> Manfred
>>
>
>



Re: [PATCH] Fortran: Correct documentation for REAL intrinsic

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Am 29.10.21 um 21:56 schrieb Harald Anlauf via Fortran:
> Hi Manfred,
>
> Am 29.10.21 um 16:18 schrieb Manfred Schwarb via Gcc-patches:
>> Hi,
>>
>> documentation for REAL intrinsic is slightly wrong. Fix it.
>> Patch is done on top of the column adjustment patch.
>
> the patch looks fine, but it would help a lot to have a ChangeLog entry.

Sorry, forgot the changelog entry, I added it to the patch now.

>
> Thanks,
> Harald
>
>> Signed-off-by Manfred Schwarb 
>>
>>
>> [Note: I do not have commit access]
>>
>
>

2021-10-30  Manfred Schwarb  

gcc/fortran/ChangeLog:

	* intrinsic.texi (REAL): Fix entries in Specific names table.

--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -12251,12 +12255,12 @@ end program test_real
 @item @emph{Specific names}:
 @multitable @columnfractions .20 .23 .20 .33
 @headitem Name @tab Argument   @tab Return type @tab Standard
-@item @code{FLOAT(A)}  @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension
+@item @code{FLOAT(A)}  @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab Fortran 77 and later
 @item @code{DFLOAT(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(8)}  @tab GNU extension
-@item @code{FLOATI(A)} @tab @code{INTEGER(2)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{FLOATJ(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{FLOATK(A)} @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{SNGL(A)}   @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension
+@item @code{FLOATI(A)} @tab @code{INTEGER(2)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{FLOATJ(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{FLOATK(A)} @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{SNGL(A)}   @tab @code{REAL(8)} @tab @code{REAL(4)}  @tab Fortran 77 and later
 @end multitable




Re: [PATCH] Fortran: Remove documentation for SHORT and LONG intrinsics

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Am 29.10.21 um 21:52 schrieb Harald Anlauf via Fortran:
> Hi Manfred,
>
> Am 29.10.21 um 16:13 schrieb Manfred Schwarb via Gcc-patches:
>> Hi,
>>
>> on 2019-07-23, support for SHORT and LONG intrinsics was removed by Steve
>> Kargl by adding an error message in check.c.  As far as I can see code
>> support is still there, though.
>>
>> Remove documentation for these intrinsics.
>
> could you please provide a formatted patch that applies using git apply?
> And a ChangeLog entry?

Sorry, forgot the changelog entry, I added it to the patch now.

>
> Thanks,
> Harald
>
>> Signed-off-by Manfred Schwarb 
>>
>>
>> [Note: I do not have commit access]
>>
>
>

2021-10-30  Manfred Schwarb  

gcc/fortran/ChangeLog:

	* intrinsic.texi: Remove entries for SHORT and LONG intrinsics.

--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -221,7 +221,6 @@ Some basic guidelines for editing this d
 * @code{LOG10}: LOG10, Base 10 logarithm function
 * @code{LOG_GAMMA}: LOG_GAMMA, Logarithm of the Gamma function
 * @code{LOGICAL}:   LOGICAL,   Convert to logical type
-* @code{LONG}:  LONG,  Convert to integer type
 * @code{LSHIFT}:LSHIFT,Left shift bits
 * @code{LSTAT}: LSTAT, Get file status
 * @code{LTIME}: LTIME, Convert time to local time info
@@ -8372,7 +8371,6 @@ end program
 @node INT2
 @section @code{INT2} --- Convert to 16-bit integer type
 @fnindex INT2
-@fnindex SHORT
 @cindex conversion, to integer

 @table @asis
@@ -8381,8 +8379,6 @@ Convert to a @code{KIND=2} integer type.
 standard @code{INT} intrinsic with an optional argument of
 @code{KIND=2}, and is only included for backwards compatibility.

-The @code{SHORT} intrinsic is equivalent to @code{INT2}.
-
 @item @emph{Standard}:
 GNU extension

@@ -8403,8 +8399,7 @@ The return value is a @code{INTEGER(2)}

 @item @emph{See also}:
 @ref{INT}, @gol
-@ref{INT8}, @gol
-@ref{LONG}
+@ref{INT8}
 @end table


@@ -8440,8 +8435,7 @@ The return value is a @code{INTEGER(8)}

 @item @emph{See also}:
 @ref{INT}, @gol
-@ref{INT2}, @gol
-@ref{LONG}
+@ref{INT2}
 @end table


@@ -9848,44 +9842,6 @@ kind corresponding to @var{KIND}, or of
 @end table


-
-@node LONG
-@section @code{LONG} --- Convert to integer type
-@fnindex LONG
-@cindex conversion, to integer
-
-@table @asis
-@item @emph{Description}:
-Convert to a @code{KIND=4} integer type, which is the same size as a C
-@code{long} integer.  This is equivalent to the standard @code{INT}
-intrinsic with an optional argument of @code{KIND=4}, and is only
-included for backwards compatibility.
-
-@item @emph{Standard}:
-GNU extension
-
-@item @emph{Class}:
-Elemental function
-
-@item @emph{Syntax}:
-@code{RESULT = LONG(A)}
-
-@item @emph{Arguments}:
-@multitable @columnfractions .15 .70
-@item @var{A}@tab Shall be of type @code{INTEGER},
-@code{REAL}, or @code{COMPLEX}.
-@end multitable
-
-@item @emph{Return value}:
-The return value is a @code{INTEGER(4)} variable.
-
-@item @emph{See also}:
-@ref{INT}, @gol
-@ref{INT2}, @gol
-@ref{INT8}
-@end table
-
-

 @node LSHIFT
 @section @code{LSHIFT} --- Left shift bits


Re: [PATCH] Fortran: adjust error message for SHORT and LONG intrinsics

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Am 29.10.21 um 21:51 schrieb Harald Anlauf via Fortran:
> Hi Manfred,
>
> Am 29.10.21 um 16:12 schrieb Manfred Schwarb via Fortran:
>> Hi,
>>
>> on 2019-07-23, support for SHORT and LONG intrinsics was removed by Steve
>> Kargl by adding an error message in check.c.  However, the error message
>>Error: 'long' intrinsic subprogram at (1) has been deprecated
>> is misleading, as support has been disabled by this patch.
>>
>> Adjust the error message. This error message does not appear in the
>> testsuite AFAIK.
>
> the patch looks fine.  A testcase checking the error message is missing,
> as well as a ChangeLog entry.

Sorry, forgot the changelog entry, I added it to the patch now.
Testcase was missing already before, but I added a trivial test to the patch 
for completeness.

>
> Thanks,
> Harald
>
>> Signed-off-by Manfred Schwarb 
>>
>>
>> [Note: I do not have commit access]
>>
>
>

2021-10-30  Manfred Schwarb  

gcc/fortran/ChangeLog:

	* check.c (gfc_check_intconv): Change error message.

gcc/testsuite/ChangeLog:

	* gfortran.dg/intrinsic_short-long.f90: New test.

--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -3240,7 +3240,7 @@ gfc_check_intconv (gfc_expr *x)
   if (strcmp (gfc_current_intrinsic, "short") == 0
   || strcmp (gfc_current_intrinsic, "long") == 0)
 {
-  gfc_error ("%qs intrinsic subprogram at %L has been deprecated.  "
+  gfc_error ("%qs intrinsic subprogram at %L has been removed.  "
 		 "Use INT intrinsic subprogram.", gfc_current_intrinsic,
 		 &x->where);
   return false;
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/intrinsic_short-long.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+!
+! Checking for removal of SHORT and LONG intrinsics.
+!
+  real,parameter :: a=3.1415927
+  integer :: i
+
+  i=SHORT(a) ! { dg-error "has been removed" }
+  i=LONG(a)  ! { dg-error "has been removed" }
+
+  end


Re: [PATCH] Fortran: adjust column sizes in intrinsic.texi

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Am 29.10.21 um 21:44 schrieb Harald Anlauf via Fortran:
> Hi Manfred,
>
> Am 29.10.21 um 16:05 schrieb Manfred Schwarb via Fortran:
>> Hi,
>>
>> in intrinsic.texi, a lot of tables wrap lines when viewing the
>> resulting info file in an 80-character terminal.
>>
>> Adjust the @columnfractions items to fit the screen. Some minor white space
>> changes are added as well to help save space.
>
> the patch looks fine.  However, could you please provide a properly
> formatted patch that applies using git patch and a ChangeLog entry?
>

Sorry, forgot the changelog entry, I added it to the patch now.

> Thanks,
> Harald
>
>> Signed-off-by Manfred Schwarb 
>>
>>
>> [Note: I do not have commit access]
>>
>
>

2021-10-30  Manfred Schwarb  

gcc/fortran/ChangeLog:

	* intrinsic.texi: Adjust @columnfractions commands to improve
	appearance for narrow 80 character terminals.

--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -461,7 +461,7 @@ end program test_abs
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument@tab Return type   @tab Standard
 @item @code{ABS(A)}   @tab @code{REAL(4) A}@tab @code{REAL(4)}@tab Fortran 77 and later
 @item @code{CABS(A)}  @tab @code{COMPLEX(4) A} @tab @code{REAL(4)}@tab Fortran 77 and later
@@ -626,7 +626,7 @@ end program test_acos
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type @tab Standard
 @item @code{ACOS(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab Fortran 77 and later
 @item @code{DACOS(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab Fortran 77 and later
@@ -685,7 +685,7 @@ end program test_acosd
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type @tab Standard
 @item @code{ACOSD(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab GNU extension
 @item @code{DACOSD(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab GNU extension
@@ -741,7 +741,7 @@ END PROGRAM
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name @tab Argument  @tab Return type   @tab Standard
 @item @code{DACOSH(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU extension
 @end multitable
@@ -890,7 +890,7 @@ end program test_aimag
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name   @tab Argument@tab Return type @tab Standard
 @item @code{AIMAG(Z)}@tab @code{COMPLEX Z}@tab @code{REAL} @tab Fortran 77 and later
 @item @code{DIMAG(Z)}@tab @code{COMPLEX(8) Z} @tab @code{REAL(8)}  @tab GNU extension
@@ -950,7 +950,7 @@ end program test_aint
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name   @tab Argument @tab Return type  @tab Standard
 @item @code{AINT(A)} @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab Fortran 77 and later
 @item @code{DINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 77 and later
@@ -1230,7 +1230,7 @@ end program test_anint
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type  @tab Standard
 @item @code{ANINT(A)}  @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab Fortran 77 and later
 @item @code{DNINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 77 and later
@@ -1346,7 +1346,7 @@ end program test_asin
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument  @tab Return type   @tab Standard
 @item @code{ASIN(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab Fortran 77 and later
 @item @code{DASIN(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab Fortran 77 and later
@@ -1405,7 +1405,7 @@ end program test_asind
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument  @tab Return type   @tab Standard
 @item @code{ASIND(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab GNU extension
 @item @code{DASIND(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU extension
@@ -1461,7 +1461,7 @@ END PROGRAM
 @end 

Re: [PATCH,FORTRAN 01/29] gdbinit: break on gfc_internal_error

2021-10-29 Thread Jerry D via Gcc-patches

Looks OK.

Cheers

On 10/29/21 11:58 AM, Bernhard Reutner-Fischer via Fortran wrote:

ping

On Wed,  5 Sep 2018 14:57:04 +
Bernhard Reutner-Fischer  wrote:


From: Bernhard Reutner-Fischer 

Aids debugging the fortran FE.

gcc/ChangeLog:

2017-11-12  Bernhard Reutner-Fischer  

* gdbinit.in: Break on gfc_internal_error.
---
  gcc/gdbinit.in | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in
index 4db977f0bab..ac4d7c42e21 100644
--- a/gcc/gdbinit.in
+++ b/gcc/gdbinit.in
@@ -227,6 +227,7 @@ b fancy_abort
  
  # Put a breakpoint on internal_error to help with debugging ICEs.

  b internal_error
+b gfc_internal_error
  
  set complaints 0

  # Don't let abort actually run, as it will make




Re: [PATCH] x86: Document -fcf-protection requires i686 or newer

2021-10-29 Thread Eric Gallager via Gcc-patches
On Thu, Oct 21, 2021 at 12:49 PM H.J. Lu via Gcc-patches
 wrote:
>
> PR target/98667
> * doc/invoke.texi: Document -fcf-protection requires i686 or
> newer.
> ---
>  gcc/doc/invoke.texi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c66a25fcd69..71992b8c597 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -15542,7 +15542,8 @@ which functions and calls should be skipped from 
> instrumentation
>  (@pxref{Function Attributes}).
>
>  Currently the x86 GNU/Linux target provides an implementation based
> -on Intel Control-flow Enforcement Technology (CET).
> +on Intel Control-flow Enforcement Technology (CET) which works for
> +i686 processor or newer.

I think "processor" should be pluralized to "processors"? Also,
possibly a missing comma after "(CET)"?

>
>  @item -fstack-protector
>  @opindex fstack-protector
> --
> 2.32.0
>


Re: [PATCH] attribs: Allow optional second arg for attr deprecated [PR102049]

2021-10-29 Thread Eric Gallager via Gcc-patches
On Mon, Oct 11, 2021 at 11:19 AM Marek Polacek via Gcc-patches
 wrote:
>
> Any thoughts?

I think it's a good idea, but then again I can't approve it, so...
well, who can, then?

>
> On Thu, Sep 23, 2021 at 12:16:36PM -0400, Marek Polacek via Gcc-patches wrote:
> > Clang implements something we don't have:
> >
> > __attribute__((deprecated("message", "replacement")));
> >
> > which seems pretty neat so I wrote this patch to add it to gcc.
> >
> > It doesn't allow the optional second argument in the standard [[]]
> > form so as not to clash with possible future standard additions.
> >
> > I had hoped we could print a nice fix-it replacement hint, but that
> > won't be possible until warn_deprecated_use gets something better than
> > input_location.

Looking forward to the fix-it hint support being added!

> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> >   PR c++/102049
> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-attribs.c (c_common_attribute_table): Increase max_len for
> >   deprecated.
> >   (handle_deprecated_attribute): Allow an optional second argument
> >   in the GNU form of attribute deprecated.
> >
> > gcc/c/ChangeLog:
> >
> >   * c-parser.c (c_parser_std_attribute): Give a diagnostic when
> >   the standard form of an attribute deprecated has a second argument.
> >
> > gcc/ChangeLog:
> >
> >   * doc/extend.texi: Document attribute deprecated with an
> >   optional second argument.
> >   * tree.c (warn_deprecated_use): Print the replacement argument,
> >   if any.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/c2x-attr-deprecated-3.c: Adjust dg-error.
> >   * c-c++-common/Wdeprecated-arg-1.c: New test.
> > ---
> >  gcc/c-family/c-attribs.c  | 17 -
> >  gcc/c/c-parser.c  |  8 ++
> >  gcc/doc/extend.texi   | 24 ++
> >  .../c-c++-common/Wdeprecated-arg-1.c  | 21 
> >  gcc/testsuite/gcc.dg/c2x-attr-deprecated-3.c  |  2 +-
> >  gcc/tree.c| 25 +++
> >  6 files changed, 90 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/Wdeprecated-arg-1.c
> >
> > diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> > index 007b928c54b..ef857a9ae2c 100644
> > --- a/gcc/c-family/c-attribs.c
> > +++ b/gcc/c-family/c-attribs.c
> > @@ -409,7 +409,7 @@ const struct attribute_spec c_common_attribute_table[] =
> >   to prevent its usage in source code.  */
> >{ "no vops",0, 0, true,  false, false, false,
> > handle_novops_attribute, NULL },
> > -  { "deprecated", 0, 1, false, false, false, false,
> > +  { "deprecated", 0, 2, false, false, false, false,
> > handle_deprecated_attribute, NULL },
> >{ "unavailable",0, 1, false, false, false, false,
> > handle_unavailable_attribute, NULL },
> > @@ -4107,6 +4107,21 @@ handle_deprecated_attribute (tree *node, tree name,
> >error ("deprecated message is not a string");
> >*no_add_attrs = true;
> >  }
> > +  else if (TREE_CHAIN (args) != NULL_TREE)
> > +{
> > +  /* We allow an optional second argument in the GNU form of
> > +  attribute deprecated, which specifies the replacement.  */
> > +  if (flags & ATTR_FLAG_CXX11)
> > + {
> > +   error ("replacement argument only allowed in GNU attributes");
> > +   *no_add_attrs = true;
> > + }
> > +  else if (TREE_CODE (TREE_VALUE (TREE_CHAIN (args))) != STRING_CST)
> > + {
> > +   error ("replacement argument is not a string");
> > +   *no_add_attrs = true;
> > + }
> > +}
> >
> >if (DECL_P (*node))
> >  {
> > diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> > index fa29d2c15fc..2b47f01d166 100644
> > --- a/gcc/c/c-parser.c
> > +++ b/gcc/c/c-parser.c
> > @@ -4952,6 +4952,14 @@ c_parser_std_attribute (c_parser *parser, bool 
> > for_tm)
> >   TREE_VALUE (attribute)
> > = c_parser_attribute_arguments (parser, takes_identifier,
> > require_string, false);
> > + if (c_parser_next_token_is (parser, CPP_COMMA)
> > + && strcmp (IDENTIFIER_POINTER (name), "deprecated") == 0)
> > +   {
> > + error_at (open_loc, "replacement argument only allowed in "
> > +   "GNU attributes");
> > + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
> > + return error_mark_node;
> > +   }
> >}
> >  else
> >c_parser_balanced_token_sequence (parser);
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 9501a60f20e..7d399f4b2bc 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -2860,6 +2860,7 @@ StrongAlias (allocate, alloc);
> >
> >  @item deprecated
> >  

Re: rs6000: Fix up flag_shrink_wrap handling in presence of -mrop-protect [PR101324]

2021-10-29 Thread Segher Boessenkool
On Wed, Oct 27, 2021 at 10:17:39PM -0500, Peter Bergner wrote:
> Sorry for reposting, but I forgot to CC the gcc-patches mailing list. :-(

Whoops, and I replied to your original message :-/

> 2021-10-27  Martin Liska  
> 
> gcc/
>   PR target/101324
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Move the
>   disabling of shrink-wrapping when using -mrop-protect from here...
>   (rs6000_override_options_after_change): ...to here.
> 
> 2021-10-27  Peter Bergner  
> 
> gcc/testsuite/
>   PR target/101324
>   * gcc.target/powerpc/pr101324.c: New test.

> +/* Ensure hashst comes after mflr and hashchk comes after ld 0,16(1).  */
> +/* { dg-final { scan-assembler "mflr 0.*hashst 0," } } */
> +/* { dg-final { scan-assembler "ld 0,16\\\(1\\\).*hashchk 0," } } */

First: don't use double quotes, or you get double backslashes (or more)
as well.  Use curlies instead:

/* { dg-final { scan-assembler {ld 0,16\(1\).*hashchk 0,} } } */

But, more importantly, "." by default matches anything, newlines as
well.  You probably do not want that here, because your RE as written
can match an "ld" in one function and a "hashchk" many functions later,
many million lines later.

You can for example do
/* { dg-final { scan-assembler {(?p)ld 0,.*\n.*\mhashchk 0,} } } */

(?p) is "partial newline-sensitive matching": it makes "." not match
newlines.  This is often what you want.  This RE also makes sure that
"hashchk" is the full mnemonic (not the tail of one), and that it is on
the line after that "ld".

Similarly you would have

/* { dg-final { scan-assembler {(?p)\mmflr 0,.*\n.*\mhashst 0,} } } */

I hope I didn't typo those things, I didn't test them out :-)
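
The difference between "." matching newlines and not matching them can be sketched outside of Tcl as well. The following is only an analogy in C using POSIX regcomp, not what DejaGnu runs: REG_NEWLINE stops "." from matching a newline, but unlike Tcl's (?p) it also changes how ^ and $ behave, and POSIX has no \m. The helper name re_matches is made up for this illustration.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN matches somewhere in TEXT,
   0 if it does not, and -1 if PATTERN fails to compile.  When
   NEWLINE_SENSITIVE is nonzero, "." no longer matches a newline.  */
int
re_matches (const char *pattern, const char *text, int newline_sensitive)
{
  regex_t re;
  int flags = REG_EXTENDED | REG_NOSUB;

  if (newline_sensitive)
    flags |= REG_NEWLINE;
  if (regcomp (&re, pattern, flags) != 0)
    return -1;

  int found = regexec (&re, text, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}
```

With assembly text in which "ld 0,16(1)" and "hashchk 0," are separated by several lines, the pattern "ld 0,.*hashchk 0," matches in the default mode but not in the newline-sensitive mode, which is the over-matching described above.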

Okay for trunk with similar robustification.  Thanks!


Segher


Re: [PATCH] configure, d: Add support for bootstrapping the D front-end

2021-10-29 Thread Eric Gallager via Gcc-patches
On Thu, Oct 28, 2021 at 2:38 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 10/9/2021 7:32 AM, Iain Buclaw via Gcc-patches wrote:
> > Hi,
> >
> > The implementation of the D front-end in GCC is based on the original
> > C++ version of the D programming language compiler, which was ported to
> > D itself in version 2.069.0 (released in 2015).  To keep it somewhat
> > up-to-date, I have been backporting fixes from upstream back into C++,
> > but this stopped at version 2.076.1 (released in 2017), and since then
> > I've only been keeping the front-end only updated enough to still be
> > able to build the latest version of the D language (now 2.098.0).
> >
> > Reasons for putting off switching to the D implementation immediately
> > after GCC 9 has been a mixture of the front-end not being ready to use,
> > and current portability status of the D core runtime library.
> >
> > It has come to the point now that I'm happy enough with the process to
> > switch out the C++ sources in gcc/d/dmd with D sources.
> >
> > Before that, there's only this patch that makes the required changes to
> > GCC itself in order to have a D front-end written in D itself.
> >
> > The rest of the series only changes code in the D language front-end or
> > libphobos standard library, so I've left that out for the time being
> > until I'm ready to commit it.
> >
> > The complete set of changes are in the ibuclaw/gdc branch under
> > users/ibuclaw.  It has been well-tested on x86_64-linux-gnu for about 3
> > years now, and I've also been testing the self-hosted compiler on
> > powerpc64le-linux-gnu as well with no regressions from the D language
> > testsuite run.
> >
> > Does anything stand out as being problematic in this patch, or may need
> > splitting out first?  Or would it be OK for trunk?
> >
> > Thanks,
> > Iain.
> >
> > ---
> > ChangeLog:
> >
> >   * Makefile.def: Add bootstrap to libbacktrace, libphobos, zlib, and
> >   libatomic.
> >   * Makefile.in: Regenerate.
> >   * Makefile.tpl (POSTSTAGE1_HOST_EXPORTS): Fix command for GDC.
> >   (STAGE1_CONFIGURE_FLAGS): Add --with-libphobos-druntime-only if
> >   target-libphobos-bootstrap.
> >   (STAGE2_CONFIGURE_FLAGS): Likewise.
> >   * configure: Regenerate.
> >   * configure.ac: Add support for bootstrapping D front-end.
> >
> > config/ChangeLog:
> >
> >   * acx.m4 (ACX_PROG_GDC): New m4 function.
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in (GDC): New variable.
> >   (GDCFLAGS): New variable.
> >   * configure: Regenerate.
> >   * configure.ac: Add call to ACX_PROG_GDC.  Substitute GDCFLAGS.
> >
> > gcc/po/ChangeLog:
> >
> >   * EXCLUDES: Remove d/dmd sources from list.
> Presumably this means that the only way to build D for the first time on
> a new target is to cross from an existing target that supports D, right?
>
> I think that's not unreasonable and I don't think we want to increase
> the burden of maintaining an old codebase just for the sake of a
> marginally easier bootstrap process for a new target.
>
> So I think you should go with this whenever you're ready.
>
> jeff
>

There should be some sort of note about this in the documentation,
IMO; both install.texi and the "Caveats" section of
gcc-12/changes.html (and possibly other places).

Eric


Re: [PATCH,Fortran 1/2] Add uop/name helpers

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 29 Oct 2021 19:15:13 +0200
Bernhard Reutner-Fischer  wrote:

> > But most importantly, I really don't like these helpers at all, they
> > unnecessarily allocate memory for the remaining duration of compilation,
> > and the second one even uses the heap for a temporary.

> I can easily switch the second one to use XALLOCAVEC if you'd accept
> that? Ok?

const char *
gfc_get_name_from_uop (const char *name)
{
  gcc_assert (name[0] == '.');
  const size_t len = strlen (name) - 1;
  gcc_assert (len > 1);
  gcc_assert (name[len] == '.');
  const char *ret = gfc_get_string ("%.*s", len - 1, name + 1);
  return ret;
}
if that's deemed portable enough nowadays? There seem to be preexisting
users in the tree so should be ok.
We can make these checking asserts or remove them altogether of course.
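
For readers without the gfortran sources at hand, the dot-stripping logic can be sketched in plain C. This is an illustration under assumptions, not the proposed patch: uop_to_name is a hypothetical name, it writes into a caller-provided buffer instead of calling gfc_get_string, and it reports malformed input by returning 0 rather than asserting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gfc_get_name_from_uop: copy the text between
   the leading and trailing dot of UOP (e.g. ".localadd.") into BUF.
   Returns the length of the name, or 0 if UOP is not of the form
   ".name." or BUF is too small.  */
size_t
uop_to_name (const char *uop, char *buf, size_t bufsize)
{
  size_t len = strlen (uop);

  /* Need at least one character between the two dots.  */
  if (len < 3 || uop[0] != '.' || uop[len - 1] != '.')
    return 0;

  size_t namelen = len - 2;
  if (namelen + 1 > bufsize)
    return 0;

  memcpy (buf, uop + 1, namelen);
  buf[namelen] = '\0';
  return namelen;
}
```

Passing ".localadd." with a 16-byte buffer stores "localadd" and returns 8; the same length arithmetic is what the "%.*s" precision expresses in the version above.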

Would that be acceptable?
thanks,

> 
> > Can't you just fix the real bug and keep the code as it was otherwise
> > (with XALLOCAVEC etc.)?
> > And, there should be a testcase...  
> 
> I tried to construct a testcase yesterday but failed.
> I took udr10.f90 and experimented with not using a derived type
> (because DERIVED || CLASS bypasses the failure to lookup st).
> I tried to move the module out to its own source to no avail and gave up
> late at night.
> 
> Unrelated note:
> One thing that looked odd to my untrained eyes was in e.g. udr10.f90
> where you write:
> !$omp parallel do reduction(+:j) reduction(.localadd.:k)
>   do i = 1, 100
> j = j .localadd. dl(i)
> k = k + dl(i * 2)
> 
> which may of course be correct (even if + would be implemented by e.g.
> a twice_i procedure (and not addme like the user operator) but i'd have
> assumed the reduction oper to match the target var in the region, like
> !$omp parallel do reduction(.localadd.:j) reduction(+:k)
> But i admittedly know nothing about openmp syntax so it's certainly fine
> as written?
> 
> PS: you have at least
> declare-simd-3.f90:! { dg-do compile { target { lp64 && { ! lp64 } } } }
> declare-target-2.f90:! { dg-do compile { target { lp64 && { ! lp64 } } } }
> 
> and i think later on
> ! { dg-do compile { target skip-all-targets } }
> was added, presumably for this very purpose.
> 
> thanks,



[committed][PATCH]AArch64 [testsuite] Don't expect a complex FMA

2021-10-29 Thread Tamar Christina via Gcc-patches
Hi All,

The sharing of the COMPLEX_MUL node makes it so it's
more efficient to not generate both a MUL and FMA
in this node.

Because the shape for a normal FMA is not different,
the FMA is no longer detected here, which results in
better codegen, so update the testcase.

Regtested on aarch64-none-linux-gnu and no issues.

Committed under the GCC obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr99149.cc: Update case.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/g++.dg/vect/pr99149.cc b/gcc/testsuite/g++.dg/vect/pr99149.cc
index 9d584262770c75f53bea9c193d3a44aa792f4d36..e6e0594a336fa053ffba64a12e2de43a4e373f49 100755
--- a/gcc/testsuite/g++.dg/vect/pr99149.cc
+++ b/gcc/testsuite/g++.dg/vect/pr99149.cc
@@ -25,4 +25,3 @@ public:
 main() { n.j(); }
 
 /* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_MUL" 1 "slp2" } } */
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_FMA" 1 "slp2" } } */




Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread David Malcolm via Gcc-patches
On Thu, 2021-10-28 at 17:24 +0200, Aldy Hernandez via Gcc-patches
wrote:

[...snip...]

> gcc/ChangeLog:
> 
> * passes.def: Replace the pass_thread_jumps before VRP* with
> pass_thread_jumps_full.  Remove all pass_vrp_threader
> instances.

Given that you're deleting all pass_vrp_threader instances, will you be
deleting make_pass_vrp_threader and class pass_vrp_threader once the
dust settles?  (and thus execute_vrp_threader, etc?)

Dave



[committed] libstdc++: Fix typo in std::stack test

2021-10-29 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, committed to trunk.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/stack/deduction.cc: Fix typo.
---
 libstdc++-v3/testsuite/23_containers/stack/deduction.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
index dea7ba060d9..0ac3737021b 100644
--- a/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
+++ b/libstdc++-v3/testsuite/23_containers/stack/deduction.cc
@@ -98,6 +98,6 @@ test03()
   check_type>(s1);
 
   std::stack s2(l.begin(), l.end(), std::allocator());
-  check_type>(s1);
+  check_type>(s2);
 }
 #endif
-- 
2.31.1



Re: [PATCH] Fortran: Correct documentation for REAL intrinsic

2021-10-29 Thread Harald Anlauf via Gcc-patches

Hi Manfred,

Am 29.10.21 um 16:18 schrieb Manfred Schwarb via Gcc-patches:

Hi,

documentation for REAL intrinsic is slightly wrong. Fix it.
Patch is done on top of the column adjustment patch.


the patch looks fine, but it would help a lot to have a ChangeLog entry.

Thanks,
Harald


Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]






Re: [PATCH,FORTRAN 28/29] Free type-bound procedure structs

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 29 Oct 2021 21:36:26 +0200
Harald Anlauf via Gcc-patches  wrote:

> Dear Bernhard, all,
> 
> Am 29.10.21 um 02:05 schrieb Bernhard Reutner-Fischer via Gcc-patches:
> 
> >> diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
> >> index 53c760a6c38..cde34c67482 100644
> >> --- a/gcc/fortran/symbol.c
> >> +++ b/gcc/fortran/symbol.c  
> 
> >> @@ -5052,7 +5052,7 @@ gfc_get_typebound_proc (gfc_typebound_proc *tb0)
> >>   
> >> result = XCNEW (gfc_typebound_proc);
> >> if (tb0)
> >> -*result = *tb0;
> >> +memcpy (result, tb0, sizeof (gfc_typebound_proc));;
> >> result->error = 1;
> >>   
> >> latest_undo_chgset->tbps.safe_push (result);  
> > 
> >   
> 
> please forgive me, but frankly speaking, I don't like this change.
> 
> It seems to serve no obvious purpose other than obfuscating the code
> and defeating the compiler's ability to detect type mismatches.

mhm okay.
IIRC such struct assignments are folded to memcpy early on, which in some
projects, at certain optimization levels, results in a non-obvious call to
memcpy (troublesome if you want to avoid relocations at all cost, since
pulling in memcpy unexpectedly might trigger them).
f951 of course is not in the camp to bother much about this so i admit
the change might stem from a tinfoil-hat moment of mine and might not
be appropriate here.

Although i don't buy the argument of the possibility of papering over
type-mismatches in this particular case (the incoming arg is typed
gfc_typebound_proc*, the result is typed gfc_typebound_proc*, the
allocation is casted to gfc_typebound_proc*) we can certainly revert
that hunk if folks prefer.

> 
> I would not have OKed that part of the patch.

For reference:
gcc/fortran/symbol.c
gfc_typebound_proc*
gfc_get_typebound_proc (gfc_typebound_proc *tb0)
{
  gfc_typebound_proc *result;

  result = XCNEW (gfc_typebound_proc);
  if (tb0)
    memcpy (result, tb0, sizeof (gfc_typebound_proc));
  result->error = 1;

  latest_undo_chgset->tbps.safe_push (result);

  return result;
}

And i did
-*result = *tb0;
+memcpy (result, tb0, sizeof (gfc_typebound_proc));

> 
> Harald
> 
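
To make the disagreement above concrete, here is a minimal sketch in plain C. The struct `tbp` is a hypothetical stand-in for gfc_typebound_proc (the real type has many more members), and both helper names are invented for illustration. Both functions leave the same values in the destination; the difference Harald points at is that the assignment is checked by the compiler against the pointee types, while memcpy takes void pointers and would accept a mismatched source silently.

```c
#include <string.h>

/* Hypothetical stand-in for gfc_typebound_proc; not the real layout.  */
struct tbp
{
  int error;
  int nopass;
  const char *pass_arg;
};

/* Type-checked copy: this fails to compile if SRC points to another
   struct type.  */
static void
copy_by_assignment (struct tbp *dst, const struct tbp *src)
{
  *dst = *src;
}

/* Untyped copy: memcpy takes void pointers, so a source of the wrong
   type would be accepted without a diagnostic.  */
static void
copy_by_memcpy (struct tbp *dst, const struct tbp *src)
{
  memcpy (dst, src, sizeof (struct tbp));
}
```

For a struct of this shape the two forms are interchangeable in effect, which is why the thread treats the hunk as a matter of preference rather than correctness.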



Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 15, 2021, 12:39 Aldy Hernandez  wrote:

>
>
> On 10/15/21 2:47 AM, Andrew MacLeod wrote:
> > On 10/14/21 6:07 PM, Martin Sebor via Gcc-patches wrote:
> >> On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:
> >>> We seem to be passing a lot of context around in the strlen code.  I
> >>> certainly don't want to contribute to more.
> >>>
> >>> Most of the handle_* functions are passing the gsi as well as either
> >>> ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
> >>> of that in the strlen pass object (well, the dom walker object, but we
> >>> can rename it to be less dom centric)?
> >>>
> >>> Something like the attached (untested) patch could be the basis for
> >>> further cleanups.
> >>>
> >>> Jakub, would this line of work interest you?
> >>
> >> You didn't ask me but since no one spoke up against it let me add
> >> some encouragement: this is exactly what I was envisioning and in
> >> line with other such modernization we have been doing elsewhere.
> >> Could you please submit it for review?
> >>
> >> Martin
> >
> > I'm willing to bet he didn't submit it for review because he doesn't
> > have time this release to polish and track it...  (I think the threader
> > has been quite consuming).  Rather, it was offered as a starting point
> > for someone else who might be interested in continuing to pursue this
> > work...  *everyone* is interested in cleanup work others do :-)
>
> Exactly.  There's a lot of work that could be done in this area, and I'm
> trying to avoid the situation with the threaders where what started as
> refactoring ended up with me basically owning them ;-).
>
> That being said, I think there are enough cleanups that are useful on their
> own.  I've removed all the passing around of GSIs, as well as ptr_qry,
> with the exception of anything dealing with the sprintf pass, since it
> has a slightly different interface.
>
> This is patch 0001, which I'm formally submitting for inclusion.  No
> functional changes with this patch.  OK for trunk?
>
> Also, I am PINGing patch 0002, which is the strlen pass conversion to
> the ranger.  As mentioned, this is just a change from an evrp client to
> a ranger client.  The APIs are exactly the same, and besides, the evrp
> analyzer is deprecated and slated for removal.  OK for trunk?
>

Ping * 2 for patch 2, although I'm sure it needs massaging after Martin
Sebor's changes in the same area.

Aldy


Re: [PATCH] Fortran: recognize Gerhard Steinmetz

2021-10-29 Thread Harald Anlauf via Gcc-patches

Hi Manfred,

Am 29.10.21 um 16:33 schrieb Manfred Schwarb via Fortran:

Hi,
there were really a lot of test cases provided by Gerhard Steinmetz lately.
Although I'm not really in the position to suggest this,
I would appreciate it, if one could recognize him by adding an entry to
gfortran.texi.

As e.g. in the proposed patch. Such a patch should probably be signed-off by
someone of the inner circle and not by me ;-)


well, this is sth. close to obvious to everybody. ;-)
Anyway, a ChangeLog entry would be nice.

Harald


Cheers,
Manfred





Re: [PATCH] Fortran: Remove documentation for SHORT and LONG intrinics

2021-10-29 Thread Harald Anlauf via Gcc-patches

Hi Manfred,

Am 29.10.21 um 16:13 schrieb Manfred Schwarb via Gcc-patches:

Hi,

on 2019-07-23, support for SHORT and LONG intrinsics was removed by Steve Kargl
by adding an error message in check.c.  As far as I can see code support is
still there, though.

Remove documentation for these intrinsics.


could you please provide a formatted patch that applies using git apply?
And a ChangeLog entry?

Thanks,
Harald


Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]





Re: [PATCH] Fortran: adjust error message for SHORT and LONG intrinsics

2021-10-29 Thread Harald Anlauf via Gcc-patches

Hi Manfred,

Am 29.10.21 um 16:12 schrieb Manfred Schwarb via Fortran:

Hi,

on 2019-07-23, support for SHORT and LONG intrinsics was removed by Steve
Kargl by adding an error message in check.c.  However, the error message
   Error: 'long' intrinsic subprogram at (1) has been deprecated
is misleading, as support has been disabled by this patch.

Adjust the error message. This error message does not appear in the testsuite
AFAIK.


the patch looks fine.  A testcase checking the error message is missing,
as well as a ChangeLog entry.

Thanks,
Harald


Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]





Re: [PATCH] Fortran: adjust column sizes in intrinsic.texi

2021-10-29 Thread Harald Anlauf via Gcc-patches

Hi Manfred,

Am 29.10.21 um 16:05 schrieb Manfred Schwarb via Fortran:

Hi,

in intrinsic.texi, a lot of tables wrap lines when watching the
resulting info file in a 80char terminal.

Adjust the @columnfractions items to fit screen. Some minor white space
changes are added as well to help saving space.


the patch looks fine.  However, could you please provide a properly
formatted patch that applies using git apply and a ChangeLog entry?

Thanks,
Harald


Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]





Re: [PATCH,FORTRAN 28/29] Free type-bound procedure structs

2021-10-29 Thread Harald Anlauf via Gcc-patches

Dear Bernhard, all,

Am 29.10.21 um 02:05 schrieb Bernhard Reutner-Fischer via Gcc-patches:


diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 53c760a6c38..cde34c67482 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c



@@ -5052,7 +5052,7 @@ gfc_get_typebound_proc (gfc_typebound_proc *tb0)
  
result = XCNEW (gfc_typebound_proc);

if (tb0)
-*result = *tb0;
+memcpy (result, tb0, sizeof (gfc_typebound_proc));;
result->error = 1;
  
latest_undo_chgset->tbps.safe_push (result);





please forgive me, but frankly speaking, I don't like this change.

It seems to serve no obvious purpose other than obfuscating the code
and defeating the compiler's ability to detect type mismatches.

I would not have OKed that part of the patch.

Harald



Re: [PATCH,FORTRAN 01/29] gdbinit: break on gfc_internal_error

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
ping

On Wed,  5 Sep 2018 14:57:04 +
Bernhard Reutner-Fischer  wrote:

> From: Bernhard Reutner-Fischer 
> 
> Aids debugging the fortran FE.
> 
> gcc/ChangeLog:
> 
> 2017-11-12  Bernhard Reutner-Fischer  
> 
>   * gdbinit.in: Break on gfc_internal_error.
> ---
>  gcc/gdbinit.in | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in
> index 4db977f0bab..ac4d7c42e21 100644
> --- a/gcc/gdbinit.in
> +++ b/gcc/gdbinit.in
> @@ -227,6 +227,7 @@ b fancy_abort
>  
>  # Put a breakpoint on internal_error to help with debugging ICEs.
>  b internal_error
> +b gfc_internal_error
>  
>  set complaints 0
>  # Don't let abort actually run, as it will make



Re: [PATCH] PR fortran/99853 - ICE: Cannot convert 'LOGICAL(4)' to 'INTEGER(8)' (etc.)

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
On Thu, 28 Oct 2021 23:03:05 +0200
Harald Anlauf via Fortran  wrote:

> Dear Fortranners,
> 
> the original fix by Steve was lingering in the PR.
> 
> We did ICE in situations where in a SELECT CASE a kind conversion
> was deemed necessary, but it did involve different types.
> The check gfc_convert_type_warn () was invoked with arguments
> requesting to generate an internal error.  A regular gfc_error
> is good enough here.
> 
> Regtested on x86_64-pc-linux-gnu.  OK?

Sounds plausible but i cannot approve it.
PS:
git commit --author 'Steve Kargl '
would give Steve due credit i suppose. Or throw in --amend if you
applied it already to your local tree (e.g rebase -i and reword the
message, then git commit --amend --author ...). HTH.
> 
> Thanks, also to Steve,

thanks,
> 
> Harald
> 
> 
> Fortran: generate regular error on invalid conversions of CASE expressions
> 
> gcc/fortran/ChangeLog:
> 
>   PR fortran/99853
>   * resolve.c (resolve_select): Generate regular gfc_error on
>   invalid conversions instead of an gfc_internal_error.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR fortran/99853
>   * gfortran.dg/pr99853.f90: New test.
> 



[r12-4786 Regression] FAIL: gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c scan-tree-dump slp1 "Found COMPLEX_ADD_ROT90" on Linux/x86_64

2021-10-29 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

4045d5fa42f2ee7b284977c8f2f0edc300a63e43 is the first bad commit
commit 4045d5fa42f2ee7b284977c8f2f0edc300a63e43
Author: Tamar Christina 
Date:   Fri Oct 29 12:47:39 2021 +0100

middle-end: Add target independent tests for Arm complex numbers vectorization.

caused

FAIL: gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c scan-tree-dump slp1 "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c scan-tree-dump slp1 "Found COMPLEX_ADD_ROT90"
FAIL: gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c scan-tree-dump slp1 "Found COMPLEX_ADD_ROT270"
FAIL: gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c scan-tree-dump slp1 "Found COMPLEX_ADD_ROT90"

with GCC configured with

../../gcc/configure \
  --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4786/usr \
  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld \
  --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl \
  --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="complex.exp=gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c --target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH,Fortran 1/2] Add uop/name helpers

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 29 Oct 2021 13:13:52 +0200
Jakub Jelinek  wrote:

> On Fri, Oct 29, 2021 at 01:52:58AM +0200, Bernhard Reutner-Fischer wrote:
> > From: Bernhard Reutner-Fischer 
> > 
> > Introduce a helper to construct a user operator from a name and the
> > reverse operation, i.e. a helper to construct a name from a user
> > operator.
> > 
> > Cc: Jakub Jelinek 
> > 
> > gcc/fortran/ChangeLog:
> > 
> > 2017-10-29  Bernhard Reutner-Fischer  
> > 
> > * gfortran.h (gfc_get_uop_from_name, gfc_get_name_from_uop): Declare.
> > * symbol.c (gfc_get_uop_from_name, gfc_get_name_from_uop): Define.
> > * module.c (load_omp_udrs): Use them.
> > ---
> >  gcc/fortran/gfortran.h |  2 ++
> >  gcc/fortran/module.c   | 21 +++--
> >  gcc/fortran/symbol.c   | 21 +
> >  3 files changed, 26 insertions(+), 18 deletions(-)
> > 
> > diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> > index 9378b4b8a24..afe9f2354ee 100644
> > --- a/gcc/fortran/gfortran.h
> > +++ b/gcc/fortran/gfortran.h
> > @@ -3399,6 +3399,8 @@ void gfc_delete_symtree (gfc_symtree **, const char *);
> >  gfc_symtree *gfc_get_unique_symtree (gfc_namespace *);
> >  gfc_user_op *gfc_get_uop (const char *);
> >  gfc_user_op *gfc_find_uop (const char *, gfc_namespace *);
> > +const char *gfc_get_uop_from_name (const char*);
> > +const char *gfc_get_name_from_uop (const char*);  
> 
> Formatting, space between char and *.
> 
> > --- a/gcc/fortran/symbol.c
> > +++ b/gcc/fortran/symbol.c
> > @@ -3044,6 +3044,27 @@ gfc_find_uop (const char *name, gfc_namespace *ns)
> >return (st == NULL) ? NULL : st->n.uop;
> >  }
> >  
> > +/* Given a name return a string usable as user operator name.  */
> > +const char *
> > +gfc_get_uop_from_name (const char* name) {  
> 
> Formatting, space before * rather than after it, { should go on next line.
> Similarly later.

Fixed the formatting. Sorry for my sloppiness..
> 
> But most importantly, I really don't like these helpers at all, they
> unnecessarily allocate memory of the remaining duration of compilation,
> and the second one even uses heap for temporary.

Where do they allocate memory that remains during the rest of
compilation?
If we end up emitting the thing, then we will have it put into the
stringpool anyway, so how's that bad? I did delete the allocated buffer
after ht_lookup_with_hash copied the content so the temporary buffer in
gfc_get_name_from_uop does not leak AFAICS.

I can easily switch the second one to use XALLOCAVEC if you'd accept
that? Ok?

> Can't you just fix the real bug and keep the code as it was otherwise
> (with XALLOCAVEC etc.)?
> And, there should be a testcase...

I tried to construct a testcase yesterday but failed.
I took udr10.f90 and experimented with not using a derived type
(because DERIVED || CLASS bypasses the failure to lookup st).
I tried to move the module out to its own source to no avail and gave up
late at night.

Unrelated note:
One thing that looked odd to my untrained eyes was in e.g. udr10.f90
where you write:
!$omp parallel do reduction(+:j) reduction(.localadd.:k)
  do i = 1, 100
j = j .localadd. dl(i)
k = k + dl(i * 2)

which may of course be correct (even if + would be implemented by e.g.
a twice_i procedure (and not addme like the user operator)) but i'd have
assumed the reduction oper to match the target var in the region, like
!$omp parallel do reduction(.localadd.:j) reduction(+:k)
But i admittedly know nothing about openmp syntax so it's certainly fine
as written?

PS: you have at least
declare-simd-3.f90:! { dg-do compile { target { lp64 && { ! lp64 } } } }
declare-target-2.f90:! { dg-do compile { target { lp64 && { ! lp64 } } } }

and i think later on
! { dg-do compile { target skip-all-targets } }
was added, presumably for this very purpose.

thanks,


Re: [Version 2][Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 29, 2021 at 04:55:07PM +, Qing Zhao wrote:
> I will commit this patch the beginning of next week.
> Jakub, let me know if you have any objection on this.

No objections, sorry for the delay.

Jakub



Re: [Version 2][Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers

2021-10-29 Thread Qing Zhao via Gcc-patches
Thank you.

I will commit this patch the beginning of next week.
Jakub, let me know if you have any objection on this.

Qing

> On Oct 29, 2021, at 2:21 AM, Richard Biener  wrote:
> 
> On Thu, 28 Oct 2021, Qing Zhao wrote:
> 
>> Ping….
>> 
>> Hi,
>> 
>> Based on the previous discussion, I thought that we have agreed that the 
>> proposed patch for this current bug is the correct  fix. 
>> And This bug is an important bug that need to be fixed.
>> 
>> I have created another new PR for the other potential issue with padding 
>> initialization for  long double/_Complex long double variables with explicit 
>> initializer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102781, and will be 
>> fixed separately later if needed.
>> 
>> Please take a look of the new patch and let me know whether there is any 
>> more issue with this version? Or it’s okay for commit now?
> 
> I think it's reasonable.
> 
> Thus OK unless Jakub has comments.
> 
> Thanks,
> Richard.
> 
>> Thanks.
>> 
>> Qing
>> 
>> 
>> 
>>> On Oct 25, 2021, at 9:16 AM, Qing Zhao via Gcc-patches wrote:
>>> 
>>> Ping….
>>> 
>>> Is this Okay for trunk?
>>> 
>>>> On Oct 18, 2021, at 2:26 PM, Qing Zhao via Gcc-patches wrote:
>>>> 
>>>> Hi, Jakub,
>>>> 
>>>> This is the 2nd version of the patch based on your comment.
>>>> 
>>>> Bootstrapped on both x86 and aarch64. Regression testings are ongoing.
>>> 
>>> The regression testing looks good.
>>> 
>>> Thanks.
>>> 
>>> Qing
 
>>>> Please let me know if this is ready for committing?
>>>> 
>>>> Thanks a lot.
>>>> 
>>>> Qing.
>>>> 
>>>> ==
 
 From d6f60370dee69b5deb3d7ef51873a5e986490782 Mon Sep 17 00:00:00 2001
 From: Qing Zhao 
 Date: Mon, 18 Oct 2021 19:04:39 +
 Subject: [PATCH] PR 102281 (-ftrivial-auto-var-init=zero causes ice)
 
 Do not add call to __builtin_clear_padding when a variable is a gimple
 register or it might not have padding.
 
 gcc/ChangeLog:
 
 2021-10-18  qing zhao  
 
* gimplify.c (gimplify_decl_expr): Do not add call to
__builtin_clear_padding when a variable is a gimple register
or it might not have padding.
(gimplify_init_constructor): Likewise.
 
 gcc/testsuite/ChangeLog:
 
 2021-10-18  qing zhao  
 
* c-c++-common/pr102281.c: New test.
* gcc.target/i386/auto-init-2.c: Adjust testing case.
* gcc.target/i386/auto-init-4.c: Likewise.
* gcc.target/i386/auto-init-6.c: Likewise.
* gcc.target/aarch64/auto-init-6.c: Likewise.
 ---
 gcc/gimplify.c| 25 ++-
 gcc/testsuite/c-c++-common/pr102281.c | 17 +
 .../gcc.target/aarch64/auto-init-6.c  |  4 +--
 gcc/testsuite/gcc.target/i386/auto-init-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/auto-init-4.c   | 10 +++-
 gcc/testsuite/gcc.target/i386/auto-init-6.c   |  7 +++---
 6 files changed, 47 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr102281.c
 
 diff --git a/gcc/gimplify.c b/gcc/gimplify.c
 index d8e4b139349..b27dc0ed308 100644
 --- a/gcc/gimplify.c
 +++ b/gcc/gimplify.c
 @@ -1784,8 +1784,8 @@ gimple_add_init_for_auto_var (tree decl,
  that padding is initialized to zero. So, we always initialize paddings
  to zeroes regardless INIT_TYPE.
  To do the padding initialization, we insert a call to
 -   __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true).
 -   Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING,
 +   __builtin_clear_padding (, 0, for_auto_init = true).
 +   Note, we add an additional dummy argument for __builtin_clear_padding,
  'for_auto_init' to distinguish whether this call is for automatic
  variable initialization or not.
  */
 @@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
 pattern initialization.
 In order to make the paddings as zeroes for pattern init, We
 should add a call to __builtin_clear_padding to clear the
 -   paddings to zero in compatiple with CLANG.  */
 -if (flag_auto_var_init == AUTO_INIT_PATTERN)
 +   paddings to zero in compatiple with CLANG.
 +   We cannot insert this call if the variable is a gimple register
 +   since __builtin_clear_padding will take the address of the
 +   variable.  As a result, if a long double/_Complex long double
 +   variable will spilled into stack later, its padding is 0XFE.  */
 +if (flag_auto_var_init == AUTO_INIT_PATTERN
 +&& !is_gimple_reg (decl)
 +&& clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
}
   }
 @@ -5384,12 +5390,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 

Re: [Patch] OpenMP: Add strictly nested API call check [PR102972]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 29, 2021 at 05:54:57PM +0200, Tobias Burnus wrote:
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -3926,8 +3926,9 @@ omp_runtime_api_call (const_tree fndecl)
>  
>static const char *omp_runtime_apis[] =
>  {
> -  /* This array has 3 sections.  First omp_* calls that don't
> -  have any suffixes.  */
> +  /* This array has 2 sections.  First omp_* calls that don't
> +  have any suffixes in the DECL_NAME; this includes omp_*
> +  but also the omp_*_ of libgomp/fortran.c.  */
>"aligned_alloc",
>"aligned_calloc",
>"alloc",
> @@ -3941,8 +3942,6 @@ omp_runtime_api_call (const_tree fndecl)
>"target_is_present",
>"target_memcpy",
>"target_memcpy_rect",
> -  NULL,
> -  /* Now omp_* calls that are available as omp_* and omp_*_.  */
>"capture_affinity",
>"destroy_allocator",
>"destroy_lock",

If we use just 2 sections, then the two sections should be merged (they were
in alphabetic order in each section).
Or we can keep 3 sections and say that the first one is for the
calls on the library side without suffixes and second is for those with
no and _ suffixes, but that in DECL_NAME those don't make a difference.
Or make it 3 sections but the first two not separated by NULL but just a
comment, i.e. what you have in the patch except that the comments would
be adjusted...
Either of those 3 section solutions would be more useful if we ever reconsider
this and go with DECL_ASSEMBLER_NAME.
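
As a rough illustration of the layout being discussed, the sketch below shows a name table split into sections by NULL entries, with a counter that records which section the scan is in. It is a standalone approximation in plain C, not the code in omp-low.c: the function name `is_runtime_api`, the table name, and the choice of which names sit in which section are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset only; the placement of names in sections is an
   assumption for this sketch.  A NULL entry separates two sections.  */
static const char *const runtime_apis[] = {
  /* Section 0: matched exactly after the "omp_" prefix.  */
  "alloc",
  "get_level",
  NULL,
  /* Section 1: may also carry a "_8" suffix.  */
  "display_env",
  "get_ancestor_thread_num",
};

/* Return 1 if NAME is "omp_" followed by a table entry, optionally with
   a "_8" suffix for entries in section 1; return 0 otherwise.  */
static int
is_runtime_api (const char *name)
{
  if (strncmp (name, "omp_", 4) != 0)
    return 0;

  int mode = 0;
  for (size_t i = 0; i < sizeof runtime_apis / sizeof runtime_apis[0]; i++)
    {
      if (runtime_apis[i] == NULL)
        {
          /* Crossing a separator moves to the next section.  */
          mode++;
          continue;
        }
      size_t len = strlen (runtime_apis[i]);
      if (strncmp (name + 4, runtime_apis[i], len) == 0
          && (name[4 + len] == '\0'
              || (mode && strcmp (name + 4 + len, "_8") == 0)))
        return 1;
    }
  return 0;
}
```

With this layout, merging two sections amounts to dropping a NULL and re-sorting, while keeping a comment-only boundary preserves the grouping without affecting the scan.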

> @@ -3994,7 +3993,8 @@ omp_runtime_api_call (const_tree fndecl)
>"unset_lock",
>"unset_nest_lock",
>NULL,
> -  /* And finally calls available as omp_*, omp_*_ and omp_*_8_.  */
> +  /* Calls available with DECL_NAME omp_* and omp_*_8, the latter matches
> +  omp_*_8_ in libgomp/fortran.c.  */
>"display_env",
>"get_ancestor_thread_num",
>"init_allocator",
> @@ -4024,11 +4024,7 @@ omp_runtime_api_call (const_tree fndecl)
>size_t len = strlen (omp_runtime_apis[i]);
>if (strncmp (name + 4, omp_runtime_apis[i], len) == 0
> && (name[4 + len] == '\0'
> -   || (mode > 0
> -   && name[4 + len] == '_'
> -   && (name[4 + len + 1] == '\0'
> -   || (mode > 1
> -   && strcmp (name + 4 + len + 1, "8_") == 0)
> +   || (mode && strcmp (name + 4 + len, "_8") == 0)))
>   return true;
>  }
>return false;
> @@ -4095,9 +4091,24 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
>   "OpenMP runtime API call %qD in a region with "
>   "% clause", fndecl);
>   }
> +   if (gimple_code (ctx->stmt) == GIMPLE_OMP_TEAMS
> +   && omp_runtime_api_call (fndecl)
> +   && strncmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +   "omp_get_num_teams",
> +   strlen ("omp_get_num_teams")) != 0
> +   && strncmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +   "omp_get_team_num",
> +   strlen ("omp_get_team_num")) != 0)

If we wanted to optimize, we could decide based on IDENTIFIER_LENGTH whether
to use strncmp at all and which one.  Your choice.

> +   #pragma omp distribute
> +   for (int i = 0; i < 1; ++i)
> + if (omp_in_parallel ()
> + || omp_get_level () != 0
> + || omp_get_ancestor_thread_num (0) != 0
> + || omp_get_ancestor_thread_num (1) != -1)
> +   abort ();

One thing I've missed, with such omp distribute we unfortunately test
it only on one of the teams (probably the first one) rather than all of
them.
Can't we use instead
  #pragma omp distribute dist_schedule(static,1)
  for (int i = 0; i < omp_get_num_teams (); ++i)
which I believe should ensure that each team will execute exactly one
iteration (i.e. exactly what the code has been doing before).

Otherwise LGTM.

Jakub



Re: [PATCH v4] attribs: Implement -Wno-attributes=vendor::attr [PR101940]

2021-10-29 Thread Marek Polacek via Gcc-patches
Ping.

On Mon, Oct 11, 2021 at 11:17:11AM -0400, Marek Polacek wrote:
> Ping.
> 
> On Tue, Sep 28, 2021 at 04:20:46PM -0400, Marek Polacek wrote:
> > On Thu, Sep 23, 2021 at 02:25:16PM -0400, Jason Merrill wrote:
> > > On 9/20/21 18:59, Marek Polacek via Gcc-patches wrote:
> > > > +void
> > > > +handle_ignored_attributes_option (vec *v)
> > > > +{
> > > > +  if (v == nullptr)
> > > > +return;
> > > > +
> > > > +  for (auto opt : v)
> > > > +{
> > > > +  if (strcmp (opt, "clang") == 0)
> > > > +   {
> > > > + // TODO
> > > > + continue;
> > > > +   }
> > > 
> > > If this doesn't work yet, let's not accept it at all for now.
> > 
> > Ok.
> >  
> > > > +  char *q = strstr (opt, "::");
> > > > +  /* We don't accept '::attr'.  */
> > > > +  if (q == nullptr || q == opt)
> > > > +   {
> > > > + error ("wrong argument to ignored attributes");
> > > > + inform (input_location, "valid format is %, 
> > > > %, "
> > > > + "or %");
> > > 
> > > > ...or even mention it.  Users can ignore clang:: instead, it doesn't matter
> > > > to us if clang attributes are misspelled.
> > 
> > Removed.
> > 
> > > > + continue;
> > > > +   }
> > > > +  /* Cut off the vendor part.  */
> > > > +  *q = '\0';
> > > > +  char *vendor = opt;
> > > > +  char *attr = q + 2;
> > > > +  /* Verify that they look valid.  */
> > > > +  auto valid_p = [](const char *s) {
> > > > +   for (; *s != '\0'; ++s)
> > > > + if (!ISALNUM (*s) && *s != '_')
> > > > +   return false;
> > > > +   return true;
> > > > +  };
> > > > +  if (!valid_p (vendor) || !valid_p (attr))
> > > > +   {
> > > > + error ("wrong argument to ignored attributes");
> > > > + continue;
> > > > +   }
> > > > > +  /* Turn "__attr__" into "attr" so that we have a canonical form of
> > > > +attribute names.  Likewise for vendor.  */
> > > > +  auto strip = [](char *) {
> > > > +   const size_t l = strlen (s);
> > > > +   if (l > 4 && s[0] == '_' && s[1] == '_'
> > > > +   && s[l - 1] == '_' && s[l - 2] == '_')
> > > > + {
> > > > +   s[l - 2] = '\0';
> > > > +   s += 2;
> > > > + }
> > > > +  };
> > > > +  strip (attr);
> > > > +  strip (vendor);
> > > > > +  /* If we've already seen this vendor::attr, ignore it.  Attempting to
> > > > +register it twice would lead to a crash.  */
> > > > +  if (lookup_scoped_attribute_spec (get_identifier (vendor),
> > > > +   get_identifier (attr)))
> > > > +   continue;
> > > > > +  /* In the "vendor::" case, we should ignore *any* attribute coming
> > > > +from this attribute namespace.  */
> > > > +  const bool ignored_ns = attr[0] == '\0';
> > > 
> > > Maybe set attr to nullptr instead of declaring ignored_ns?
> > > 
> > > > +  /* Create a table with extra attributes which we will register.
> > > > +We can't free it here, so squirrel away the pointers.  */
> > > > +  attribute_spec *table = new attribute_spec[2];
> > > > +  ignored_attributes_table.safe_push (table);
> > > > +  table[0] = { ignored_ns ? nullptr : attr, 0, 0, false, false,
> > > 
> > > ...so this can just use attr.
> > 
> > I also need ignored_ns...
> >  
> > > > +  false, false, nullptr, nullptr };
> > > > > +  table[1] = { nullptr, 0, 0, false, false, false, false, nullptr, nullptr };
> > > > +  register_scoped_attributes (table, vendor, ignored_ns);
> > 
> > ...here, but I tweaked this a bit to get rid of the bool.
> > 
> > > > +}
> > > > +}
> > > > +
> > > > +/* Free data we might have allocated when adding extra attributes.  */
> > > > +
> > > > +void
> > > > +free_attr_data ()
> > > > +{
> > > > +  for (auto x : ignored_attributes_table)
> > > > +delete[] x;
> > > > +}
> > > 
> > > You probably also want to zero out ignored_attributes_table at this point.
> > 
> > Done.
> > 
> > > > >   /* Initialize attribute tables, and make some sanity checks if checking is
> > > >  enabled.  */
> > > > @@ -252,6 +353,9 @@ init_attributes (void)
> > > >   /* Put all the GNU attributes into the "gnu" namespace.  */
> > > >   register_scoped_attributes (attribute_tables[i], "gnu");
> > > > +  vec *ignored = (vec *) flag_ignored_attributes;
> > > > +  handle_ignored_attributes_option (ignored);
> > > > +
> > > > invoke_plugin_callbacks (PLUGIN_ATTRIBUTES, NULL);
> > > > attributes_initialized = true;
> > > >   }
> > > > @@ -456,6 +560,19 @@ diag_attr_exclusions (tree last_decl, tree node, 
> > > > tree attrname,
> > > > return found;
> > > >   }
> > > > +/* Return true iff we should not complain about unknown attributes
> > > > +   coming from the attribute namespace NS.  This is the case for
> > > > +   the -Wno-attributes=ns:: command-line option.  */

Re: [PATCH,FORTRAN 28/29] Free type-bound procedure structs

2021-10-29 Thread Bernhard Reutner-Fischer via Gcc-patches
On Fri, 29 Oct 2021 07:54:21 -0700
Jerry D  wrote:

> Looks good and simple. Proceed. Thanks

Committed as 7883a7f07c1ad9c8aaccc5bbd96e0ae1fa230c89
Thanks!

Maybe somebody could eyeball and ACK "Fix memory leak in finalization
wrappers"
https://gcc.gnu.org/pipermail/fortran/2021-October/056838.html

We were generating (and emitting to modules) finalization wrappers
needlessly, i.e. even when they were not called for.

This 1) leaked as shown in the initial submission and
2) polluted module files with unwarranted (wrong) mention of
finalization wrappers even when compiling without any coarray stuff.

E.g. a modified udr10.f90 (from libgomp) has the following diff in the
module which illustrates the positive side-effect of the fix:

-26 'array' '' '' 25 ((VARIABLE INOUT UNKNOWN-PROC UNKNOWN UNKNOWN 0 0
-ARTIFICIAL DIMENSION CONTIGUOUS DUMMY) () (DERIVED 3 0 0 0 DERIVED ()) 0
-0 () (0 0 ASSUMED_RANK) 0 () () () 0 0)
-27 'byte_stride' '' '' 25 ((VARIABLE UNKNOWN-INTENT UNKNOWN-PROC UNKNOWN
-UNKNOWN 0 0 ARTIFICIAL VALUE DUMMY) () (INTEGER 8 0 0 0 INTEGER ()) 0 0
-() () 0 () () () 0 0)
-28 'fini_coarray' '' '' 25 ((VARIABLE UNKNOWN-INTENT UNKNOWN-PROC
-UNKNOWN UNKNOWN 0 0 ARTIFICIAL VALUE DUMMY) () (LOGICAL 1 0 0 0 LOGICAL
-()) 0 0 () () 0 () () () 0 0)

thanks,


Re: [Patch] libcpp: Fix _Pragma expansion [PR102409]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 29, 2021 at 06:20:15PM +0200, Tobias Burnus wrote:
> On 29.10.21 13:06, Jakub Jelinek wrote:
> > On Thu, Oct 28, 2021 at 05:51:59PM +0200, Tobias Burnus wrote:
> > > libcpp/ChangeLog:
> > > 
> > >  PR c++/102409
> > >  * directives.c (destringize_and_run): Add PRAGMA_OP to the
> > >  CPP_PRAGMA token's flags to mark it as coming from _Pragma.
> > >  * include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
> > >  * macro.c (collect_args): Only handle CPP_PRAGMA special if PRAGMA_OP
> > >  is set.
> > > 
> > The patch itself looks reasonable to me, but it should come with
> > testsuite coverage.  And the testsuite coverage should include both normal
> > testcases that do use integrated preprocessor, and the same with
> > -save-temps to make sure that even when preprocessing separately it works
> > too.
> 
> Yes, I realized myself that I forgot to include a testcase – thanks for
> the -save-temps suggestion!
> 
> Updated patch enclosed.

Ok, thanks.
For backports, I'd wait a few weeks.

Jakub



Re: [Patch] libcpp: Fix _Pragma expansion [PR102409]

2021-10-29 Thread Tobias Burnus

On 29.10.21 13:06, Jakub Jelinek wrote:
> On Thu, Oct 28, 2021 at 05:51:59PM +0200, Tobias Burnus wrote:
> > libcpp/ChangeLog:
> >
> >  PR c++/102409
> >  * directives.c (destringize_and_run): Add PRAGMA_OP to the
> >  CPP_PRAGMA token's flags to mark it as coming from _Pragma.
> >  * include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
> >  * macro.c (collect_args): Only handle CPP_PRAGMA special if PRAGMA_OP
> >  is set.
>
> The patch itself looks reasonable to me, but it should come with
> testsuite coverage.  And the testsuite coverage should include both normal
> testcases that do use integrated preprocessor, and the same with
> -save-temps to make sure that even when preprocessing separately it works
> too.

Yes, I realized myself that I forgot to include a testcase – thanks for
the -save-temps suggestion!

Updated patch enclosed.

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Commercial register: München, HRB 106955
libcpp: Fix _Pragma expansion [PR102409]

Both #pragma and _Pragma ended up as CPP_PRAGMA. Presumably since
r131819 (2008, GCC 4.3) for PR34692, pragmas are not expanded in
macro arguments but are output as-is before the macro expansion. From
the old bug report, that was to fix usage like
  FOO (
#pragma GCC diagnostic
  )
However, that change also affected _Pragma such that
  BAR (
"1";
_Pragma("omp ..."); )
yielded
  #pragma omp ...
followed by what BAR expanded to, possibly including '"1";'.

This commit adds a flag, PRAGMA_OP, to tokens to make the two
distinguishable - and again includes _Pragma in the expanded arguments.
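
For readers who want to try the pattern described above, here is a minimal
sketch of _Pragma operators placed inside a macro argument.  It is not taken
from the patch: WRAP and run_wrapped are made-up names, and it uses
"GCC diagnostic" pragmas rather than OpenMP ones so that it builds without
-fopenmp.

```cpp
#include <cassert>

// Hypothetical helper macro (not from the patch): wraps its argument
// in a do/while block, so everything passed in is a macro argument.
#define WRAP(body) do { body } while (0)

// Returns 42.  The point of interest is that the three _Pragma
// operators are part of the macro argument, which is the situation
// the commit message describes.
int run_wrapped()
{
  int total = 0;
  WRAP(
    _Pragma("GCC diagnostic push")
    _Pragma("GCC diagnostic ignored \"-Wunused-variable\"")
    int scratch;
    total += 42;
    _Pragma("GCC diagnostic pop")
  );
  return total;
}
```

Preprocessing a file like this with -E should show where the resulting
#pragma lines land relative to the expanded body of WRAP.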

libcpp/ChangeLog:

	PR c++/102409
	* directives.c (destringize_and_run): Add PRAGMA_OP to the
	CPP_PRAGMA token's flags to mark it as coming from _Pragma.
	* include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
	* macro.c (collect_args): Only handle CPP_PRAGMA special if PRAGMA_OP
	is set.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/pragma-1.c: New test.
	* c-c++-common/gomp/pragma-2.c: New test.

 gcc/testsuite/c-c++-common/gomp/pragma-1.c | 50 ++
 gcc/testsuite/c-c++-common/gomp/pragma-2.c | 50 ++
 libcpp/directives.c|  2 ++
 libcpp/include/cpplib.h|  1 +
 libcpp/macro.c |  2 +-
 5 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-1.c b/gcc/testsuite/c-c++-common/gomp/pragma-1.c
new file mode 100644
index 000..e330f17204a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-1.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-fdump-tree-original" }  */
+/* PR c++/51484  */
+
+#define TEST(T) { \
+  int fail = 0, trial; \
+  for (int trial = 0; trial < TRIALS && fail == 0; trial++) { \
+_Pragma("omp target teams num_teams(1) thread_limit(1024)") \
+ {T} \
+  } \
+}
+
+#define TRIALS (1)
+#define N (1024*3)
+
+int main(void) {
+
+  double C[N], D[N];
+  double S[N];
+  double p[2];
+  int i; 
+  for (i = 0; i < N; i++)
+  {C[i] = 1; D[i] = i;}
+
+  int max_threads = 224;
+
+#define PARALLEL(X) TEST({ \
+_Pragma("omp parallel if(threads[0] > 1) num_threads(threads[0])") \
+{ \
+_Pragma("omp for ordered") \
+  X  \
+_Pragma("omp for schedule(auto) ordered") \
+  X  \
+} \
+})
+
+  for (int t = 0; t <= max_threads; t += max_threads) {
+int threads[1]; threads[0] = t;
+S[0] = 0;
+PARALLEL(
+for (int i = 0; i < N; i++) { \
+  _Pragma("omp ordered") \
+  S[0] += C[i] + D[i]; \
+})
+  }
+  return 0;
+}
+
+/* On expansion, the _Pragma were wrongly placed, ensure the order is now correct: */
+/* { dg-final { scan-tree-dump "#pragma omp target.*#pragma omp teams num_teams\\(1\\) thread_limit\\(1024\\).*#pragma omp parallel num_threads\\(threads\\\[0\\\]\\) if\\(threads\\\[0\\\] > 1\\).*#pragma omp for ordered.*#pragma omp ordered.*#pragma omp for ordered schedule\\(auto\\).*#pragma omp ordered" "original" } } */
+
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-2.c b/gcc/testsuite/c-c++-common/gomp/pragma-2.c
new file mode 100644
index 000..5358f875959
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-2.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-fdump-tree-original -save-temps" }  */
+/* PR c++/51484  */
+
+#define TEST(T) { \
+  int fail = 0, trial; \
+  for (int trial = 0; trial < TRIALS && fail == 0; trial++) { \
+_Pragma("omp target teams num_teams(1) thread_limit(1024)") \
+ {T} \
+  } \
+}
+
+#define TRIALS (1)
+#define N (1024*3)
+
+int main(void) {
+
+  double C[N], D[N];
+  double S[N];
+  double p[2];
+  int i; 
+  for (i = 0; i < N; i++)
+  {C[i] = 1; D[i] = i;}
+
+  int max_threads = 224;
+
+#define PARALLEL(X) TEST({ \
+_Pragma("omp parallel if(threads[0] > 1) num_threads(threads[0])") \
+{ \
+_Pragma("omp for 

Re: [Patch] OpenMP: Add strictly nested API call check [PR102972]

2021-10-29 Thread Tobias Burnus

On 29.10.21 12:53, Jakub Jelinek wrote:
> On Fri, Oct 29, 2021 at 12:09:55PM +0200, Tobias Burnus wrote:
> > [...] only routine calls to
> >    omp_get_num_teams() and omp_get_team_num()
> > are permitted in teams when closely nested.
>
> I'm afraid using DECL_ASSEMBLER_NAME opens a new can of worms. [...]
> At least for C++, [...]
> is meant to be checked by
>    || (DECL_CONTEXT (fndecl) != NULL_TREE
>    && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
> If that doesn't work for Fortran modules, we need to find out something
> different, e.g. setjmp_or_longjmp_p also relies on that...

It turned out that the current (pre-patch) code works correctly, except
that DECL_NAME for Fortran does not have the '_' suffix. I have now
updated the comments and just for omp_* and omp_*_8. That simplifies the
code and, fortunately, DECL_NAME does seem to work.

> On the other side, when we use DECL_NAME we don't currently differentiate
> between:
> extern "C" int omp_is_initial_device ();
> and say
> extern int omp_is_initial_device (double, float);
> where the latter is in C++ mangled differently.  Sure, one can't use
> the latter together with #include ...

The question is whether anyone cares that we reject the latter?

> As mentioned in the PR, I really don't like this permit_num_teams argument,
> IMHO it is a caller that should check it, otherwise we end up in the
> function with myriads of future exceptions etc.

I concur – given that DECL_NAME seems to work fine (ignoring C++ w/o
extern "C").

> As for tests where you are adding parallel to avoid the new diagnostics,
> I'd suggest parallel if(0) instead, no need to create any extra threads...

Done.

Thanks for the comments!

Tobias
OpenMP: Add strictly nested API call check [PR102972]

The teams construct only permits omp_get_num_teams and omp_get_team_num
as API calls in strictly nested regions - check for it.

Additionally, for Fortran, using DECL_NAME does not show the mangled
name; hence, DECL_ASSEMBLER_NAME had to be used too.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

	PR middle-end/102972
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
	internal Fortran name; new permit_num_teams arg to permit
	omp_get_num_teams and omp_get_team_num.
	(scan_omp_1_stmt): Update call to it, add missing call for
	reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
	routine test.
	* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
	* c-c++-common/gomp/teams-3.c: New test.
	* gfortran.dg/gomp/teams-3.f90: New test.
	* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
	* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
	parallel construct.
	* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
	* testsuite/libgomp.c/target-3.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/target-teams-1.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-3.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.c/thread-limit-5.c: Likewise.
	* testsuite/libgomp.fortran/icv-3.f90: Likewise.
	* testsuite/libgomp.fortran/icv-4.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.

 gcc/omp-low.c  |  33 --
 .../c-c++-common/gomp/target-device-ancestor-3.c   |   2 +
 gcc/testsuite/c-c++-common/gomp/teams-3.c  |  64 
 gcc/testsuite/gfortran.dg/gomp/order-6.f90 |   2 +-
 gcc/testsuite/gfortran.dg/gomp/teams-3.f90 |  65 
 gcc/testsuite/gfortran.dg/gomp/teams-4.f90 |  47 +
 libgomp/testsuite/libgomp.c-c++-common/icv-3.c |   3 +
 libgomp/testsuite/libgomp.c-c++-common/icv-4.c |   1 +
 libgomp/testsuite/libgomp.c/target-3.c |   6 +-
 libgomp/testsuite/libgomp.c/target-5.c |   1 +
 libgomp/testsuite/libgomp.c/target-6.c |  12 ++-
 libgomp/testsuite/libgomp.c/target-teams-1.c   | 115 +++--
 libgomp/testsuite/libgomp.c/teams-1.c  |   6 +-
 libgomp/testsuite/libgomp.c/thread-limit-2.c   |  21 ++--
 libgomp/testsuite/libgomp.c/thread-limit-3.c   |   1 +
 libgomp/testsuite/libgomp.c/thread-limit-4.c   |  25 +++--
 libgomp/testsuite/libgomp.c/thread-limit-5.c   |   1 +
 libgomp/testsuite/libgomp.fortran/icv-3.f90|   6 ++
 libgomp/testsuite/libgomp.fortran/icv-4.f90|   2 +
 libgomp/testsuite/libgomp.fortran/teams1.f90   |  16 +--
 20 files changed, 347 

[COMMITTED] path oracle: Do not look back to the root oracle for killing defs.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
Since registering a kill means removing all references to it from the
path oracle list, make sure we don't look back to the root oracle
either.

Tested on x86-64 Linux.

Co-authored-by: Andrew MacLeod 

gcc/ChangeLog:

* value-relation.cc (path_oracle::killing_def): Add a
self-equivalence so we don't look to the root oracle.
---
 gcc/value-relation.cc | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index 512b51ce022..f572bcd4dc2 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1302,11 +1302,22 @@ path_oracle::killing_def (tree ssa)
   // Walk the equivalency list and remove SSA from any equivalencies.
   if (bitmap_bit_p (m_equiv.m_names, v))
 {
-  bitmap_clear_bit (m_equiv.m_names, v);
   for (equiv_chain *ptr = m_equiv.m_next; ptr; ptr = ptr->m_next)
if (bitmap_bit_p (ptr->m_names, v))
  bitmap_clear_bit (ptr->m_names, v);
 }
+  else
+bitmap_set_bit (m_equiv.m_names, v);
+
+  // Now add an equivalency with itself so we don't look to the root oracle.
+  bitmap b = BITMAP_ALLOC (&m_bitmaps);
+  bitmap_set_bit (b, v);
+  equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
+   sizeof (equiv_chain));
+  ptr->m_names = b;
+  ptr->m_bb = NULL;
+  ptr->m_next = m_equiv.m_next;
+  m_equiv.m_next = ptr;
 
   // Walk the relation list and remove SSA from any relations.
   if (!bitmap_bit_p (m_relations.m_names, v))
-- 
2.31.1



[committed] Avoid overly-greedy match in dejagnu regexp.

2021-10-29 Thread Jeff Law via Gcc-patches
Occasionally I've been seeing failures with the multi-line diagnostics.  
It's never been clear what's causing the spurious failures, though I 
have long suspected a greedy regexp match.


It happened again yesterday with a local change that in no way should 
affect diagnostics, so I finally went searching and found that sure 
enough the multi-line diagnostics had a ".*" in their regexp.  According 
to the comments, the .* is primarily to catch any dg directives that may 
appear -- ie it should eat to EOL, but not multiple lines.  But a .* can 
indeed match a newline and cause it to eat multiple lines.


The fix is simple.  [^\r\n]* will eat to EOL, but not further.
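
To make the difference concrete, here is a small sketch using C++
std::regex rather than Tcl.  It is an illustration only: in the ECMAScript
grammar "." does not match a line terminator, so the "greedy" pattern below
uses (?:.|[\r\n])* as a stand-in for what ".*" does in the Tcl regexp.  The
sample text and the names sample, first_match, greedy and bounded are made
up for the example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sample: a source line with a trailing directive comment,
// followed by two more lines.
const std::string sample = "foo { dg-warning }\nbar\nbaz";

// Returns the text matched by the first hit of PATTERN in TEXT,
// or an empty string when there is no match.
std::string first_match(const std::string &pattern, const std::string &text)
{
  std::smatch m;
  return std::regex_search(text, m, std::regex(pattern)) ? m.str(0)
                                                         : std::string();
}

// Stand-in for a ".*" that may cross newlines: any character,
// including CR and LF, as many times as possible.
const std::string greedy = "foo(?:.|[\r\n])*";

// The shape used by the fix: anything except CR or LF, so the match
// stops at the end of the current line.
const std::string bounded = "foo[^\r\n]*";
```

With these definitions, first_match (greedy, sample) returns all three
lines, while first_match (bounded, sample) returns only the first line.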

Regression tested on x86_64 and on our internal target.

Committed to the trunk.

Jeff
commit 14c7757e9b751781360737f53b71f851fc356d3d
Author: Jeff Law 
Date:   Fri Oct 29 11:30:15 2021 -0400

Avoid overly-greedy match in dejagnu regexp.

Occasionally I've been seeing failures with the multi-line diagnostics.  
It's never been clear what's causing the spurious failures, though I have long 
suspected a greedy regexp match.

It happened again yesterday with a local change that in no way should 
affect diagnostics, so I finally went searching and found that sure enough the 
multi-line diagnostics had a ".*" in their regexp.  According to the comments, 
the .* is primarily to catch any dg directives that may appear -- ie it should 
eat to EOL, but not multiple lines.  But a .* can indeed match a newline and 
cause it to eat multiple lines.

The fix is simple.  [^\r\n]* will eat to EOL, but not further.

Regression tested on x86_64 and on our internal target.

gcc/testsuite

* lib/multiline.exp (_build_multiline_regex): Use a better
regexp than .* to match up to EOL.

diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
index 0e151b6d222..86387f8209b 100644
--- a/gcc/testsuite/lib/multiline.exp
+++ b/gcc/testsuite/lib/multiline.exp
@@ -331,7 +331,7 @@ proc _build_multiline_regex { multiline index } {
# Support arbitrary followup text on each non-empty line,
# to deal with comments containing containing DejaGnu
# directives.
-   append rexp ".*"
+   append rexp "\[^\\n\\r\]*"
}
}
append rexp "\n"


Re: [PATCH, v2] c++: Diagnose taking address of an immediate member function [PR102753]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 26, 2021 at 04:58:11PM -0400, Jason Merrill wrote:
> > I'm afraid I don't have a good idea where to move that diagnostic to though,
> > it would need to be done somewhere where we are certain we aren't in a
> > subexpression of immediate invocation.  Given statement expressions, even
> > diagnostics after parsing whole statements might not be good enough, e.g.
> > void
> > qux ()
> > {
> >static_assert (bar (({ constexpr auto a = 1; foo; })) == 42);
> > }
> 
> I suppose (a wrapper for) fold_build_cleanup_point_expr would be a possible
> place to check, since that's called for full-expressions.

I've played a little bit with this (tried to do it at cp_fold time), but
there are problems with that.
cp_fold of course isn't a good spot for this because it can be called from
fold_for_warn and at that point we don't know if we are inside of immediate
invocation's argument or not, or it can be called even inside of consteval
fn bodies etc.  So, let's suppose we do a separate cp_walk_tree just for
this if cxx_dialect >= cxx20 e.g. from cp_fold_function and
cp_fully_fold_init or some other useful spot, like in the patch below
we avoid walking into THEN_CLAUSE of IF_STMT_CONSTEVAL_P IF_STMTs.
And if this would be done before cp_fold_function's cp_fold_r walk,
we'd also need calls to source_location_current_p as an exception.
The major problem is the location used for the error_at,
e.g. the ADDR_EXPRs pretty much never EXPR_HAS_LOCATION and PTRMEM_CST
doesn't even have location, so while we would report diagnostics, it would
be always
cc1plus: error: taking address of an immediate function ‘consteval int S::foo() const’
etc.
I guess one option is to report it even later, during gimplification where
gimplify_expr etc. track input_location, but what to do with static
initializers?
Another option would be to have a walk_tree_1 variant that would be updating
input_location similarly to how gimplify_expr does that, i.e.
  saved_location = input_location;
  if (save_expr != error_mark_node
  && EXPR_HAS_LOCATION (*expr_p))
input_location = EXPR_LOCATION (*expr_p);
...
  input_location = saved_location;
but probably using RAII because walk_tree_1 has a lot of returns in it.
And turn walk_tree_1 into a template instantiated twice, once as walk_tree_1
without the input_location handling in it and once with it under some
different name?
Or do we have some other expression walker that does update input_location
as it goes?
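
As an illustration of the RAII idea only (not GCC code), the save/restore
could be wrapped in a small guard object along the lines below.  Everything
here is a stand-in invented for the sketch: location_t is a plain unsigned,
current_location plays the role of input_location, 0 plays the role of "no
location", and LocationSaver and visit are hypothetical names.

```cpp
#include <cassert>

using location_t = unsigned;             // stand-in for GCC's location_t
static location_t current_location = 0;  // stand-in for input_location

// Saves the current location on construction, optionally switches to
// a new one, and restores the saved value on every exit path.
class LocationSaver
{
  location_t saved_;

public:
  explicit LocationSaver(location_t loc) : saved_(current_location)
  {
    if (loc != 0)  // 0 means "this node has no location" in this sketch
      current_location = loc;
  }
  ~LocationSaver() { current_location = saved_; }

  LocationSaver(const LocationSaver &) = delete;
  LocationSaver &operator=(const LocationSaver &) = delete;
};

// Toy visitor with more than one return; the guard restores the
// location whichever return is taken.
location_t visit(location_t node_loc, bool early_return)
{
  LocationSaver guard(node_loc);
  if (early_return)
    return current_location;
  return current_location;
}
```

The guard is what makes the many return paths harmless: no exit needs its
own "input_location = saved_location;" line.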

--- gcc/cp/typeck.c.jj  2021-10-27 09:03:07.555043491 +0200
+++ gcc/cp/typeck.c 2021-10-29 15:59:57.871449304 +0200
@@ -6773,16 +6773,6 @@ cp_build_addr_expr_1 (tree arg, bool str
return error_mark_node;
  }
 
-   if (TREE_CODE (t) == FUNCTION_DECL
-   && DECL_IMMEDIATE_FUNCTION_P (t)
-   && !in_immediate_context ())
- {
-   if (complain & tf_error)
- error_at (loc, "taking address of an immediate function %qD",
-   t);
-   return error_mark_node;
- }
-
type = build_ptrmem_type (context_for_name_lookup (t),
  TREE_TYPE (t));
t = make_ptrmem_cst (type, t);
@@ -6809,15 +6799,6 @@ cp_build_addr_expr_1 (tree arg, bool str
 {
   tree stripped_arg = tree_strip_any_location_wrapper (arg);
   if (TREE_CODE (stripped_arg) == FUNCTION_DECL
- && DECL_IMMEDIATE_FUNCTION_P (stripped_arg)
- && !in_immediate_context ())
-   {
- if (complain & tf_error)
-   error_at (loc, "taking address of an immediate function %qD",
- stripped_arg);
- return error_mark_node;
-   }
-  if (TREE_CODE (stripped_arg) == FUNCTION_DECL
  && !mark_used (stripped_arg, complain) && !(complain & tf_error))
return error_mark_node;
   val = build_address (arg);
--- gcc/cp/cp-gimplify.c.jj 2021-09-18 09:47:08.409573816 +0200
+++ gcc/cp/cp-gimplify.c2021-10-29 16:48:42.308261319 +0200
@@ -902,6 +902,17 @@ cp_fold_r (tree *stmt_p, int *walk_subtr
}
   cp_walk_tree (&OMP_FOR_PRE_BODY (stmt), cp_fold_r, data, NULL);
   *walk_subtrees = 0;
+  return NULL;
+}
+
+  if (code == IF_STMT && IF_STMT_CONSTEVAL_P (stmt))
+{
+  /* Don't walk THEN_CLAUSE (stmt) for consteval if.  IF_COND is always
+boolean_false_node.  */
+  cp_walk_tree (&ELSE_CLAUSE (stmt), cp_fold_r, data, NULL);
+  cp_walk_tree (&IF_SCOPE (stmt), cp_fold_r, data, NULL);
+  *walk_subtrees = 0;
+  return NULL;
 }
 
   return NULL;
@@ -1418,9 +1429,9 @@ cp_genericize_r (tree *stmt_p, int *walk
}
 
   if (tree fndecl = cp_get_callee_fndecl_nofold (stmt))
-   if (DECL_IMMEDIATE_FUNCTION_P (fndecl))
+   if (DECL_IMMEDIATE_FUNCTION_P (fndecl)
+   && source_location_current_p (fndecl))
  {
-   gcc_assert (source_location_current_p (fndecl));
*stmt_p = cxx_constant_value (stmt);
break;
  }
@@ 

Re: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-29 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> Attached is a new version that fixes the previous SVE fallouts in a new way.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> --- inline copy of patch ---
>
>
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> b/gcc/config/aarch64/aarch64-cost-tables.h
> index 
> dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389
>  100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),  /* alu.  */
> -COSTS_N_INSNS (4)   /* mult.  */
> +COSTS_N_INSNS (4),  /* mult.  */
> +COSTS_N_INSNS (1),  /* movi.  */
> +COSTS_N_INSNS (2),  /* dup.  */
> +COSTS_N_INSNS (2)   /* extract.  */
>}
>  };
>  
> @@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),   /* Alu.  */
> -COSTS_N_INSNS (4)/* mult.  */
> +COSTS_N_INSNS (4),   /* mult.  */
> +COSTS_N_INSNS (1),   /* movi.  */
> +COSTS_N_INSNS (2),   /* dup.  */
> +COSTS_N_INSNS (2)/* extract.  */
>}
>  };
>  
> @@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),   /* Alu.  */
> -COSTS_N_INSNS (4)/* Mult.  */
> +COSTS_N_INSNS (4),   /* Mult.  */
> +COSTS_N_INSNS (1),   /* movi.  */
> +COSTS_N_INSNS (2),   /* dup.  */
> +COSTS_N_INSNS (2)/* extract.  */
>}
>  };
>  
> @@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),   /* Alu.  */
> -COSTS_N_INSNS (4)/* Mult.  */
> +COSTS_N_INSNS (4),   /* Mult.  */
> +COSTS_N_INSNS (1),   /* movi.  */
> +COSTS_N_INSNS (2),   /* dup.  */
> +COSTS_N_INSNS (2)/* extract.  */
>}
>  };
>  
> @@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),  /* alu.  */
> -COSTS_N_INSNS (4)   /* mult.  */
> +COSTS_N_INSNS (4),  /* mult.  */
> +COSTS_N_INSNS (1),  /* movi.  */
> +COSTS_N_INSNS (2),  /* dup.  */
> +COSTS_N_INSNS (2)   /* extract.  */
>}
>  };
>  
> @@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs =
>/* Vector */
>{
>  COSTS_N_INSNS (1),  /* alu.  */
> -COSTS_N_INSNS (4)   /* mult.  */
> +COSTS_N_INSNS (4),  /* mult.  */
> +COSTS_N_INSNS (1),  /* movi.  */
> +COSTS_N_INSNS (2),  /* dup.  */
> +COSTS_N_INSNS (2)   /* extract.  */
>}
>  };
>  
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 29f381728a3b3d28bcd6a1002ba398c8b87713d2..61c3d7e195c510da88aa513f99af5f76f4d696e7
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup"
>  )
>  
>  (define_insn "aarch64_simd_dup"
> -  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
> +  [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
>   (vec_duplicate:VDQF_F16
> -   (match_operand: 1 "register_operand" "w")))]
> +   (match_operand: 1 "register_operand" "w,r")))]
>"TARGET_SIMD"
> -  "dup\\t%0., %1.[0]"
> -  [(set_attr "type" "neon_dup")]
> +  "@
> +   dup\\t%0., %1.[0]
> +   dup\\t%0., %1"
> +  [(set_attr "type" "neon_dup, neon_from_gp")]
>  )
>  
>  (define_insn "aarch64_dup_lane"
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 699c105a42a613c06c462e2de686795279d85bc9..542fc874a4e224fb2cbe94e64eab590458fe935b
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -12705,7 +12705,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
> outer ATTRIBUTE_UNUSED,
>rtx op0, op1, op2;
>const struct cpu_cost_table *extra_cost
>  = aarch64_tune_params.insn_extra_cost;
> -  int code = GET_CODE (x);
> +  rtx_code code = GET_CODE (x);
>scalar_int_mode int_mode;
>  
>/* By default, assume that everything has equivalent cost to the
> @@ -13466,8 +13466,7 @@ cost_plus:
>  
>we must cost the explicit register move.  */
>if (mode == DImode
> -   && GET_MODE (op0) == SImode
> -   && outer == SET)
> +   && GET_MODE (op0) == SImode)
>   {
> int op_cost = rtx_cost (op0, VOIDmode, ZERO_EXTEND, 0, speed);
>  
> @@ -14006,8 +14005,39 @@ cost_plus:
>mode, MULT, 1, speed);
>return true;
>  }
> + break;
> +case CONST_VECTOR:
> + {
> +   /* Load using MOVI/MVNI.  */
> +   if (aarch64_simd_valid_immediate (x, NULL))
> + *cost = extra_cost->vect.movi;
> +   else /* Load using constant pool.  */
> + *cost = extra_cost->ldst.load;
> +   break;
> + }
> +  

RE: [PATCH 2/2]AArch64: Add better costing for vector constants and operations

2021-10-29 Thread Tamar Christina via Gcc-patches
Hi All,

Attached is a new version that fixes the previous SVE fallouts in a new way.

Ok for master?

Thanks,
Tamar

--- inline copy of patch ---


diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
index dd2e7e7cbb13d24f0b51092270cd7e2d75fabf29..bb499a1eae62a145f1665d521f57c98b49ac5389 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -124,7 +124,10 @@ const struct cpu_cost_table qdf24xx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -229,7 +232,10 @@ const struct cpu_cost_table thunderx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -333,7 +339,10 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -437,7 +446,10 @@ const struct cpu_cost_table thunderx3t110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* Alu.  */
-COSTS_N_INSNS (4)  /* Mult.  */
+COSTS_N_INSNS (4), /* Mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -542,7 +554,10 @@ const struct cpu_cost_table tsv110_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
@@ -646,7 +661,10 @@ const struct cpu_cost_table a64fx_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1),  /* alu.  */
-COSTS_N_INSNS (4)   /* mult.  */
+COSTS_N_INSNS (4),  /* mult.  */
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */
   }
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 29f381728a3b3d28bcd6a1002ba398c8b87713d2..61c3d7e195c510da88aa513f99af5f76f4d696e7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -74,12 +74,14 @@ (define_insn "aarch64_simd_dup"
 )
 
 (define_insn "aarch64_simd_dup"
-  [(set (match_operand:VDQF_F16 0 "register_operand" "=w")
+  [(set (match_operand:VDQF_F16 0 "register_operand" "=w,w")
(vec_duplicate:VDQF_F16
- (match_operand: 1 "register_operand" "w")))]
+ (match_operand: 1 "register_operand" "w,r")))]
   "TARGET_SIMD"
-  "dup\\t%0., %1.[0]"
-  [(set_attr "type" "neon_dup")]
+  "@
+   dup\\t%0., %1.[0]
+   dup\\t%0., %1"
+  [(set_attr "type" "neon_dup, neon_from_gp")]
 )
 
 (define_insn "aarch64_dup_lane"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 699c105a42a613c06c462e2de686795279d85bc9..542fc874a4e224fb2cbe94e64eab590458fe935b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12705,7 +12705,7 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
   rtx op0, op1, op2;
   const struct cpu_cost_table *extra_cost
 = aarch64_tune_params.insn_extra_cost;
-  int code = GET_CODE (x);
+  rtx_code code = GET_CODE (x);
   scalar_int_mode int_mode;
 
   /* By default, assume that everything has equivalent cost to the
@@ -13466,8 +13466,7 @@ cost_plus:
 
 we must cost the explicit register move.  */
   if (mode == DImode
- && GET_MODE (op0) == SImode
- && outer == SET)
+ && GET_MODE (op0) == SImode)
{
  int op_cost = rtx_cost (op0, VOIDmode, ZERO_EXTEND, 0, speed);
 
@@ -14006,8 +14005,39 @@ cost_plus:
 mode, MULT, 1, speed);
   return true;
 }
+   break;
+case CONST_VECTOR:
+   {
+ /* Load using MOVI/MVNI.  */
+ if (aarch64_simd_valid_immediate (x, NULL))
+   *cost = extra_cost->vect.movi;
+ else /* Load using constant pool.  */
+   *cost = extra_cost->ldst.load;
+ break;
+   }
+case VEC_CONCAT:
+   /* depending on the operation, either DUP or INS.
+  For now, keep default costing.  */
+   break;
+   /* Load using a DUP.  */
+case VEC_DUPLICATE:
+   *cost = extra_cost->vect.dup;
+   return false;
+case VEC_SELECT:
+   {
+ rtx op0 = XEXP (x, 0);
+ *cost = rtx_cost (op0, GET_MODE (op0), VEC_SELECT, 0, speed);
 
-  /* 

Re: [PATCH,FORTRAN 28/29] Free type-bound procedure structs

2021-10-29 Thread Jerry D via Gcc-patches

Looks good and simple. Proceed. Thanks

Jerry

On 10/28/21 5:05 PM, Bernhard Reutner-Fischer via Fortran wrote:

ping
[Rebased, re-regtested cleanly. Ok for trunk?]
On Wed,  5 Sep 2018 14:57:31 +
Bernhard Reutner-Fischer  wrote:


From: Bernhard Reutner-Fischer 

compiling gfortran.dg/typebound_proc_31.f90 leaked the type-bound
structs:

56 bytes in 1 blocks are definitely lost.
   at 0x4C2CC05: calloc (vg_replace_malloc.c:711)
   by 0x151EA90: xcalloc (xmalloc.c:162)
   by 0x8E3E4F: gfc_get_typebound_proc(gfc_typebound_proc*) (symbol.c:4945)
   by 0x84C095: match_procedure_in_type (decl.c:10486)
   by 0x84C095: gfc_match_procedure() (decl.c:6696)
...

gcc/fortran/ChangeLog:

2017-12-06  Bernhard Reutner-Fischer  

* symbol.c (free_tb_tree): Free type-bound procedure struct.
(gfc_get_typebound_proc): Use explicit memcpy for clarity.
---
  gcc/fortran/symbol.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 53c760a6c38..cde34c67482 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -3845,7 +3845,7 @@ free_tb_tree (gfc_symtree *t)
  
/* TODO: Free type-bound procedure structs themselves; probably needs some

   sort of ref-counting mechanism.  */
-
+  free (t->n.tb);
free (t);
  }
  
@@ -5052,7 +5052,7 @@ gfc_get_typebound_proc (gfc_typebound_proc *tb0)
  
result = XCNEW (gfc_typebound_proc);

if (tb0)
-*result = *tb0;
+memcpy (result, tb0, sizeof (gfc_typebound_proc));;
result->error = 1;
  
latest_undo_chgset->tbps.safe_push (result);




Re: [PATCH] Always default to DWARF2_DEBUG if not specified, warn about deprecated STABS

2021-10-29 Thread ibuclaw--- via Gcc-patches
> On 26/10/2021 03:28 Joseph Myers  wrote:
> 
>  
> On Mon, 25 Oct 2021, Richard Biener via Gcc-patches wrote:
> 
> > So it looks like tm_d.h is much more stripped down compared to regular
> > tm_p.h but also oddly enough config/default-d.c includes tm_d.h
> > while config/default-c.c explicitly documents itself to not do that.
> 
> I think the intent of that comment in default-c.c (which I wrote) was that 
> if a separate tm_c.h is needed, it should use its own headers, disjoint 
> from those used by tm.h.  In particular, as noted in the original patch 
> submission, that avoids making macros used only to define hooks visible 
> throughout the compiler.
> 
> > Is it maybe a bug that tm_d.h includes defaults.h at all?  Should
> 
> It's a bug that it includes defaults.h, and a bug that it includes 
> ${cpu_type}/${cpu_type}.h.  Any macros used only to define D hooks should 
> be in completely separate headers that aren't used elsewhere in the 
> compiler.
> 
> > "d defaults" be in a defaults-d.h instead?  If I remove the
> 
> Yes, and likewise any target-specific overrides of such macros should be 
> in a separate header, not ${cpu_type}/${cpu_type}.h.
> 

So what default-d.c is doing is pulling down per-CPU back-end information
to populate the targetdm structure where there's a supported CPU, but not
platform.  The reason it does that is that I wanted to avoid both having
#ifdef's in the D front-end and altering gcc/target.def.

It seems then that either all TARGET_D_ macros should be moved to
${cpu_type}/${cpu_type}-d.h, or I do one of the alternatives I was trying to
avoid.

Iain.


[COMMITTED] PR tree-optimization/102983 - Perform on-entry propagation after range_of_stmt on a gcond.

2021-10-29 Thread Andrew MacLeod via Gcc-patches
When range_of_stmt is invoked on a statement, out-of-date updating is
keyed off the timestamp on the definition.


When the def is calculated, and its global value stored, a timestamp is
created for the cache.  If range_of_stmt is invoked again, the timestamps
of the uses are compared to that of the definition, and if they are older,
we simply use the cached value.


If one of the uses is newer than the definition, that means an input may
have changed, and we will recalculate the definition. If this new value
is different, we propagate this new value to any subsequent cache entries.


In the case of a gcond, there is no LHS, so we have no way to determine
if anything might be out of date or need updating. Until now, we just
did nothing except calculate the branch... any cache entries in the
following blocks were never updated.  In this PR, we later determine
that b_4 has a value of 0 instead of [0,1], which would then change the
value of c in subsequent blocks.


This patch triggers a re-evaluation of all exports from a block when
range_of_stmt is invoked on a gcond.  This isn't quite as bad as it seems
because:
  a) range_of_stmt on a stmt without a LHS is never invoked from
within the internal API, so it's only a client like VRP which can make
this call
  b) The cache propagator is already smart enough to only propagate a
value to the following blocks if

      1 - there is already an on-entry cache value, otherwise it's skipped
      2 - the value actually changed.

The net result is that this change has very minimal impact on the
compile-time performance of the ranger VRP pass, on the order of 0.5%.
It also now catches a few things we used to miss.
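
As a rough illustration of the scheme described above (a toy model, not the
ranger's actual code), the sketch below keeps a timestamp per cached value.
A definition counts as stale only when one of its uses is newer, and a
condition, which has no definition to compare against, instead re-propagates
every name it exports.  All names here (ToyCache, is_stale, propagate,
refresh_exports) are hypothetical, and plain ints stand in for ranges.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a timestamped value cache.  Hypothetical names throughout;
// this is not the gimple-range-cache API.
struct ToyCache
{
  std::map<std::string, int> value;       // cached value per name
  std::map<std::string, unsigned> stamp;  // time the value was last set
  unsigned clock = 0;

  // Store VAL for NAME and give it a fresh timestamp.
  void set (const std::string &name, int val)
  {
    value[name] = val;
    stamp[name] = ++clock;
  }

  // A definition is stale if any of its uses was updated after it.
  bool is_stale (const std::string &def,
                 const std::vector<std::string> &uses) const
  {
    unsigned def_time = stamp.at (def);
    for (const std::string &u : uses)
      if (stamp.at (u) > def_time)
        return true;
    return false;
  }
};

// On-entry values already recorded for a following block.
using OnEntry = std::map<std::string, int>;

// Push NAME's current value into ON_ENTRY, following the two rules in the
// text: skip names with no existing on-entry value, and skip values that
// did not change.  Returns true if an entry was updated.
bool propagate (const ToyCache &cache, const std::string &name,
                OnEntry &on_entry)
{
  auto it = on_entry.find (name);
  if (it == on_entry.end ())
    return false;
  int cur = cache.value.at (name);
  if (it->second == cur)
    return false;
  it->second = cur;
  return true;
}

// A condition has no LHS timestamp to key off, so revisit every exported
// name unconditionally.  Returns the number of entries that changed.
int refresh_exports (const ToyCache &cache,
                     const std::vector<std::string> &exports,
                     OnEntry &on_entry)
{
  int updated = 0;
  for (const std::string &name : exports)
    if (propagate (cache, name, on_entry))
      ++updated;
  return updated;
}
```

The two early returns in propagate are what keep the extra work small: most
exported names either have no on-entry value yet or have not changed.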


Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

From cb596fd43667f92c4cb037a4ee8b2061c393ba60 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 28 Oct 2021 13:31:17 -0400
Subject: [PATCH] Perform on-entry propagation after range_of_stmt on a gcond.

Propagation is automatically done by the temporal cache when defs are
out of date from the names on the RHS, but a gcond has no LHS, and any
updates on the RHS are never propagated.  Always propagate them.

	gcc/
	PR tree-optimization/102983
	* gimple-range-cache.h (propagate_updated_value): Make public.
	* gimple-range.cc (gimple_ranger::range_of_stmt): Propagate exports
	when processing gcond stmts.

	gcc/testsuite/
	* gcc.dg/pr102983.c: New.
---
 gcc/gimple-range-cache.h|  4 ++--
 gcc/gimple-range.cc | 12 +++-
 gcc/testsuite/gcc.dg/pr102983.c | 21 +
 3 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102983.c

diff --git a/gcc/gimple-range-cache.h b/gcc/gimple-range-cache.h
index 4937a0b305a..75105008338 100644
--- a/gcc/gimple-range-cache.h
+++ b/gcc/gimple-range-cache.h
@@ -103,6 +103,8 @@ public:
   bool get_non_stale_global_range (irange &r, tree name);
   void set_global_range (tree name, const irange &r);
 
+  void propagate_updated_value (tree name, basic_block bb);
+
   non_null_ref m_non_null;
   gori_compute m_gori;
 
@@ -120,8 +122,6 @@ private:
   void entry_range (irange &r, tree expr, basic_block bb);
   void exit_range (irange &r, tree expr, basic_block bb);
 
-  void propagate_updated_value (tree name, basic_block bb);
-
   bitmap m_propfail;
   vec m_workback;
   vec m_update_list;
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 91bacda6dd0..2c9715a6f2c 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -256,7 +256,17 @@ gimple_ranger::range_of_stmt (irange &r, gimple *s, tree name)
 
   // If no name, simply call the base routine.
   if (!name)
-res = fold_range_internal (r, s, NULL_TREE);
+{
+  res = fold_range_internal (r, s, NULL_TREE);
+  if (res && is_a <gcond *> (s))
+	{
+	  // Update any exports in the cache if this is a gimple cond statement.
+	  tree exp;
+	  basic_block bb = gimple_bb (s);
+	  FOR_EACH_GORI_EXPORT_NAME (m_cache.m_gori, bb, exp)
+	m_cache.propagate_updated_value (exp, bb);
+	}
+}
   else if (!gimple_range_ssa_p (name))
 res = get_tree_range (r, name, NULL);
   // Check if the stmt has already been processed, and is not stale.
diff --git a/gcc/testsuite/gcc.dg/pr102983.c b/gcc/testsuite/gcc.dg/pr102983.c
new file mode 100644
index 000..ef58af6def0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102983.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+void foo(void);
+
+static int a = 1;
+
+int main() {
+  int c = 0;
+  for (int b = 0; b <= 0; b++) {
+if (!a)
+  foo();
+if (b > c){
+  if (c)
+continue;
+  a = 0;
+}
+c = 1;
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate c_.* to 1" 1 "evrp" } } */
-- 
2.17.2



Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 29, 2021 at 10:10 AM Richard Biener
 wrote:
>
> On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez  wrote:
> >
> > On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
> >  wrote:
> >
> > > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > > run the passes in not fully resolving modes at -O1.
> >
> > Sorry for the awkward naming.  I couldn't find a better name :-/.
> > Suggestions welcome.
> >
> > The fast mode assumes any unknown ranges on entry to a path to be
> > VARYING, whereas the fully resolving mode will ask the ranger, so the
> > fully resolving mode will indeed be slower.  Though, I haven't
> > measured how much.  However, we are gaining some time in total
> > compilation speed (1.32%) by replacing two threaders with one.
>
> OK.  Just again, -O1 is to favor compile-speed and should crunch through
> those incredibly stupi^Wlarge machine-generated sources without problems.
> But from your comment it doesn't sound like something completely unreasonable
> or slow.

Oh, I just noticed...we already key off of -fexpensive-optimizations.
Duh.  The only backward threader that runs at -O1 is ethread, which
does not fully resolve.  So I think we're good.
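
To make the difference between the two modes concrete, here is a toy model
(not the threader's implementation): when a name has no range computed along
the path, the fast mode treats it as VARYING, while the resolving mode
consults an oracle standing in for the ranger.  Range, PathQuery and the
oracle callback are all hypothetical names.

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <map>
#include <string>

// A closed integer interval [lo, hi].
struct Range { int lo, hi; };
const Range VARYING = { INT_MIN, INT_MAX };

// Toy path query.  RESOLVE selects between the two modes.
struct PathQuery
{
  bool resolve;                                       // fully resolving?
  std::function<Range (const std::string &)> oracle;  // stand-in for ranger
  std::map<std::string, Range> on_path;               // ranges known on path

  // Range of NAME on entry to the path.
  Range range_on_entry (const std::string &name) const
  {
    auto it = on_path.find (name);
    if (it != on_path.end ())
      return it->second;
    // Unknown on the path: ask the oracle only in resolving mode.
    return resolve ? oracle (name) : VARYING;
  }

  // Can the condition "NAME > LIMIT" be folded to a constant on this path?
  bool folds_gt (const std::string &name, int limit) const
  {
    Range r = range_on_entry (name);
    return r.lo > limit || r.hi <= limit;
  }
};
```

With an oracle that reports [0, 1] for every name, the resolving query can
fold "b > 5" to false, while the fast query sees VARYING and cannot.  The
extra oracle call per unknown name is where the additional cost comes from.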

But your comment still applies when we kill the DOM threader and
replace it with a pre-DOM fully resolving threader, since DOM does run
at -O1.  Ughhh.. I really hate that DOM is an evrp pass in disguise
but at -O1.

Aldy



[PATCH] Fortran: recognize Gerhard Steinmetz

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Hi,
there were really a lot of test cases provided by Gerhard Steinmetz lately.
Although I'm not really in the position to suggest this,
I would appreciate it, if one could recognize him by adding an entry to 
gfortran.texi.

As e.g. in the proposed patch. Such a patch should probably be signed off by
someone from the inner circle and not by me ;-)

Cheers,
Manfred

--- gcc/gcc/fortran/gfortran.texi.orig	2021-05-31 03:23:30.069163688 +0200
+++ gcc/gcc/fortran/gfortran.texi	2021-10-29 15:21:39.309279615 +0200
@@ -5888,6 +5888,7 @@ GNU Fortran project:
 @item Dominique d'Humi@`eres
 @item Kate Hedstrom
 @item Erik Schnetter
+@item Gerhard Steinmetz
 @item Joost VandeVondele
 @end itemize



[PATCH] Fortran: Correct documentation for REAL intrinsic

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Hi,

documentation for REAL intrinsic is slightly wrong. Fix it.
Patch is done on top of the column adjustment patch.

Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]
--- gcc/gcc/fortran/intrinsic.texi.2	2021-10-29 15:08:38.302188947 +0200
+++ gcc/gcc/fortran/intrinsic.texi	2021-10-29 15:14:57.111458299 +0200
@@ -12251,12 +12255,12 @@ end program test_real
 @item @emph{Specific names}:
 @multitable @columnfractions .20 .23 .20 .33
 @headitem Name @tab Argument   @tab Return type @tab Standard
-@item @code{FLOAT(A)}  @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension
+@item @code{FLOAT(A)}  @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab Fortran 77 and later
 @item @code{DFLOAT(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(8)}  @tab GNU extension
-@item @code{FLOATI(A)} @tab @code{INTEGER(2)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{FLOATJ(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{FLOATK(A)} @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension
-@item @code{SNGL(A)}   @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension
+@item @code{FLOATI(A)} @tab @code{INTEGER(2)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{FLOATJ(A)} @tab @code{INTEGER(4)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{FLOATK(A)} @tab @code{INTEGER(8)}  @tab @code{REAL(4)}  @tab GNU extension (-fdec)
+@item @code{SNGL(A)}   @tab @code{REAL(8)} @tab @code{REAL(4)}  @tab Fortran 77 and later
 @end multitable




[PATCH] Fortran: Remove documentation for SHORT and LONG intrinsics

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Hi,

on 2019-07-23, support for the SHORT and LONG intrinsics was removed by
Steve Kargl by adding an error message in check.c.  As far as I can see,
code support is still there, though.

Remove documentation for these intrinsics.

Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]
--- gcc/gcc/fortran/intrinsic.texi.1	2021-10-29 14:24:46.102169856 +0200
+++ gcc/gcc/fortran/intrinsic.texi.2	2021-10-29 15:08:38.302188947 +0200
@@ -221,7 +221,6 @@ Some basic guidelines for editing this d
 * @code{LOG10}: LOG10, Base 10 logarithm function
 * @code{LOG_GAMMA}: LOG_GAMMA, Logarithm of the Gamma function
 * @code{LOGICAL}:   LOGICAL,   Convert to logical type
-* @code{LONG}:  LONG,  Convert to integer type
 * @code{LSHIFT}:LSHIFT,Left shift bits
 * @code{LSTAT}: LSTAT, Get file status
 * @code{LTIME}: LTIME, Convert time to local time info
@@ -8372,7 +8371,6 @@ end program
 @node INT2
 @section @code{INT2} --- Convert to 16-bit integer type
 @fnindex INT2
-@fnindex SHORT
 @cindex conversion, to integer

 @table @asis
@@ -8381,8 +8379,6 @@ Convert to a @code{KIND=2} integer type.
 standard @code{INT} intrinsic with an optional argument of
 @code{KIND=2}, and is only included for backwards compatibility.

-The @code{SHORT} intrinsic is equivalent to @code{INT2}.
-
 @item @emph{Standard}:
 GNU extension

@@ -8403,8 +8399,7 @@ The return value is a @code{INTEGER(2)}

 @item @emph{See also}:
 @ref{INT}, @gol
-@ref{INT8}, @gol
-@ref{LONG}
+@ref{INT8}
 @end table


@@ -8440,8 +8435,7 @@ The return value is a @code{INTEGER(8)}

 @item @emph{See also}:
 @ref{INT}, @gol
-@ref{INT2}, @gol
-@ref{LONG}
+@ref{INT2}
 @end table


@@ -9848,44 +9842,6 @@ kind corresponding to @var{KIND}, or of
 @end table


-
-@node LONG
-@section @code{LONG} --- Convert to integer type
-@fnindex LONG
-@cindex conversion, to integer
-
-@table @asis
-@item @emph{Description}:
-Convert to a @code{KIND=4} integer type, which is the same size as a C
-@code{long} integer.  This is equivalent to the standard @code{INT}
-intrinsic with an optional argument of @code{KIND=4}, and is only
-included for backwards compatibility.
-
-@item @emph{Standard}:
-GNU extension
-
-@item @emph{Class}:
-Elemental function
-
-@item @emph{Syntax}:
-@code{RESULT = LONG(A)}
-
-@item @emph{Arguments}:
-@multitable @columnfractions .15 .70
-@item @var{A}@tab Shall be of type @code{INTEGER},
-@code{REAL}, or @code{COMPLEX}.
-@end multitable
-
-@item @emph{Return value}:
-The return value is a @code{INTEGER(4)} variable.
-
-@item @emph{See also}:
-@ref{INT}, @gol
-@ref{INT2}, @gol
-@ref{INT8}
-@end table
-
-

 @node LSHIFT
 @section @code{LSHIFT} --- Left shift bits


[PATCH] Fortran: adjust error message for SHORT and LONG intrinsics

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Hi,

on 2019-07-23, support for the SHORT and LONG intrinsics was removed by
Steve Kargl by adding an error message in check.c.  However, the error message
  Error: 'long' intrinsic subprogram at (1) has been deprecated
is misleading, as support was disabled by that change.

Adjust the error message. This error message does not appear in the testsuite 
AFAIK.

Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]
--- gcc/gcc/fortran/check.c.orig	2021-10-15 02:20:28.825876592 +0200
+++ gcc/gcc/fortran/check.c	2021-10-29 14:44:51.771512312 +0200
@@ -3240,7 +3240,7 @@ gfc_check_intconv (gfc_expr *x)
   if (strcmp (gfc_current_intrinsic, "short") == 0
   || strcmp (gfc_current_intrinsic, "long") == 0)
 {
-  gfc_error ("%qs intrinsic subprogram at %L has been deprecated.  "
+  gfc_error ("%qs intrinsic subprogram at %L has been removed.  "
 		 "Use INT intrinsic subprogram.", gfc_current_intrinsic,
 		 &x->where);
   return false;


[PATCH] Fortran: adjust column sizes in intrinsic.texi

2021-10-29 Thread Manfred Schwarb via Gcc-patches
Hi,

in intrinsic.texi, a lot of tables wrap lines when viewing the
resulting info file in an 80-character terminal.

Adjust the @columnfractions items to fit screen. Some minor white space
changes are added as well to help saving space.

Signed-off-by Manfred Schwarb 


[Note: I do not have commit access]
--- gcc/gcc/fortran/intrinsic.texi.orig	2021-09-18 03:19:23.645913785 +0200
+++ gcc/gcc/fortran/intrinsic.texi.1	2021-10-29 14:24:46.102169856 +0200
@@ -461,7 +461,7 @@ end program test_abs
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument@tab Return type   @tab Standard
 @item @code{ABS(A)}   @tab @code{REAL(4) A}@tab @code{REAL(4)}@tab Fortran 77 and later
 @item @code{CABS(A)}  @tab @code{COMPLEX(4) A} @tab @code{REAL(4)}@tab Fortran 77 and later
@@ -626,7 +626,7 @@ end program test_acos
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type @tab Standard
 @item @code{ACOS(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab Fortran 77 and later
 @item @code{DACOS(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab Fortran 77 and later
@@ -685,7 +685,7 @@ end program test_acosd
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type @tab Standard
 @item @code{ACOSD(X)}  @tab @code{REAL(4) X} @tab @code{REAL(4)}  @tab GNU extension
 @item @code{DACOSD(X)} @tab @code{REAL(8) X} @tab @code{REAL(8)}  @tab GNU extension
@@ -741,7 +741,7 @@ END PROGRAM
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name @tab Argument  @tab Return type   @tab Standard
 @item @code{DACOSH(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU extension
 @end multitable
@@ -890,7 +890,7 @@ end program test_aimag
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name   @tab Argument@tab Return type @tab Standard
 @item @code{AIMAG(Z)}@tab @code{COMPLEX Z}@tab @code{REAL} @tab Fortran 77 and later
 @item @code{DIMAG(Z)}@tab @code{COMPLEX(8) Z} @tab @code{REAL(8)}  @tab GNU extension
@@ -950,7 +950,7 @@ end program test_aint
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name   @tab Argument @tab Return type  @tab Standard
 @item @code{AINT(A)} @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab Fortran 77 and later
 @item @code{DINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 77 and later
@@ -1230,7 +1230,7 @@ end program test_anint
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument @tab Return type  @tab Standard
 @item @code{ANINT(A)}  @tab @code{REAL(4) A} @tab @code{REAL(4)}   @tab Fortran 77 and later
 @item @code{DNINT(A)} @tab @code{REAL(8) A} @tab @code{REAL(8)}   @tab Fortran 77 and later
@@ -1346,7 +1346,7 @@ end program test_asin
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument  @tab Return type   @tab Standard
 @item @code{ASIN(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab Fortran 77 and later
 @item @code{DASIN(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab Fortran 77 and later
@@ -1405,7 +1405,7 @@ end program test_asind
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name@tab Argument  @tab Return type   @tab Standard
 @item @code{ASIND(X)}  @tab @code{REAL(4) X}  @tab @code{REAL(4)}@tab GNU extension
 @item @code{DASIND(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU extension
@@ -1461,7 +1461,7 @@ END PROGRAM
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable @columnfractions .20 .23 .20 .33
 @headitem Name @tab Argument  @tab Return type   @tab Standard
 @item @code{DASINH(X)} @tab @code{REAL(8) X}  @tab @code{REAL(8)}@tab GNU extension.
 @end multitable
@@ -1597,7 +1597,7 @@ end program test_atan
 @end smallexample

 @item @emph{Specific names}:
-@multitable @columnfractions .20 .20 .20 .25
+@multitable 

Re: [PATCH] c++: quadratic constexpr behavior for left-assoc logical exprs [PR102780]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 28, 2021 at 03:35:20PM -0400, Patrick Palka wrote:
> > Is there a reason to turn this mode of evaluating everything twice if an
> > error should be diagnosed all the time, rather than only if we actually see
> > a TRUTH_*_EXPR we want to handle this way?
> > If we don't see any TRUTH_*_EXPR, or if processing_template_decl, or if
> > the first operand is already a constant, that seems like a waste of time.
> 
> Hmm yeah, at the very least it wouldn't hurt to check
> processing_template_decl before doing the tf_error shenanigans.  I'm not
> sure if we would gain anything by first looking for TRUTH_*_EXPR since
> that'd involve walking the entire expression anyway IIUC.

I meant actually something like:
--- gcc/cp/constexpr.c.jj   2021-10-28 20:07:48.566193259 +0200
+++ gcc/cp/constexpr.c  2021-10-29 13:47:48.824026156 +0200
@@ -8789,7 +8789,7 @@ potential_constant_expression_1 (tree t,
  return false;
}
  else if (!check_for_uninitialized_const_var
-  (tmp, /*constexpr_context_p=*/true, flags))
+  (tmp, /*constexpr_context_p=*/true, flags & ~(1 << 30)))
return false;
}
   return RECUR (tmp, want_rval);
@@ -8896,14 +8896,36 @@ potential_constant_expression_1 (tree t,
tree op1 = TREE_OPERAND (t, 1);
if (!RECUR (op0, rval))
  return false;
-   if (!(flags & tf_error) && RECUR (op1, rval))
- /* When quiet, try to avoid expensive trial evaluation by first
-checking potentiality of the second operand.  */
- return true;
-   if (!processing_template_decl)
- op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (TREE_CODE (op0) != INTEGER_CST && !processing_template_decl)
+ {
+   /* If op0 is not a constant, we can either
+  cxx_eval_outermost_constant_expr first, or RECUR (op1, rval)
+  first.  If quiet, perform the latter first, if not quiet
+  and it is the outermost such TRUTH_*_EXPR, perform the
+  latter first in quiet mode, followed by the former and
+  retry with the latter in non-quiet mode.  */
+   if ((flags & (1 << 30)) != 0)
+ op0 = cxx_eval_outermost_constant_expr (op0, true);
+   else if ((flags & tf_error) != 0)
+ {
+   flags &= ~tf_error;
+   if (RECUR (op1, rval))
+ return true;
+   op0 = cxx_eval_outermost_constant_expr (op0, true);
+   flags |= tf_error | (1 << 30);
+ }
+   else
+ {
+   if (RECUR (op1, rval))
+ return true;
+   op0 = cxx_eval_outermost_constant_expr (op0, true);
+   if (tree_int_cst_equal (op0, tmp))
+ return false;
+   return true;
+ }
+ }
if (tree_int_cst_equal (op0, tmp))
- return (flags & tf_error) ? RECUR (op1, rval) : false;
+ return RECUR (op1, rval);
else
  return true;
   }
@@ -9112,17 +9134,6 @@ bool
 potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 tsubst_flags_t flags)
 {
-  if (flags & tf_error)
-{
-  /* Check potentiality quietly first, as that could be performed more
-efficiently in some cases (currently only for TRUTH_*_EXPR).  If
-that fails, replay the check noisily to give errors.  */
-  flags &= ~tf_error;
-  if (potential_constant_expression_1 (t, want_rval, strict, now, flags))
-   return true;
-  flags |= tf_error;
-}
-
   tree target = NULL_TREE;
   return potential_constant_expression_1 (t, want_rval, strict, now,
  flags, &target);

(perhaps with naming the 1 << 30 as tf_something or using different bit for
that).  So no doubling of potential_constant_expression_1 evaluation
for tf_error unless a TRUTH_*_EXPR is seen outside of template with
potentially constant first operand other than INTEGER_CST, but similarly to
what you did, make sure that there are at most two calls and not more.
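
The "check quietly first, then replay with errors enabled" idea that this
discussion revolves around can be sketched generically as below.  This is
only an illustration under stated assumptions: Check and check_with_replay
are hypothetical names, and a vector of strings stands in for emitted
diagnostics.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// A check takes a "complain" flag; when it is true, the check may append
// diagnostics to DIAGS.  It returns whether the expression is acceptable.
using Check
  = std::function<bool (bool complain, std::vector<std::string> &diags)>;

// Run CHECK quietly first.  Only if that fails, run it once more with
// diagnostics enabled so the user sees why.  CHECK is invoked at most
// twice; CALLS counts the invocations.
bool check_with_replay (const Check &check,
                        std::vector<std::string> &diags, int &calls)
{
  ++calls;
  if (check (false, diags))
    return true;
  ++calls;
  return check (true, diags);
}
```

The point of bounding the number of calls is that the replay must not nest:
if every level of a left-associated chain restarted its own quiet/noisy
pair, the cost would multiply instead of staying at two passes.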

> > As I said, another possibility is something like:
> > /* Try to quietly evaluate T to constant, but don't try too hard.  */
> > 
> > static tree
> > potential_constant_expression_eval (tree t)
> > {
> >   auto o = make_temp_override (constexpr_ops_limit,
> >MIN (constexpr_ops_limit, 100));
> >   return cxx_eval_outermost_constant_expr (t, true);
> > }
> > and using this new function instead of cxx_eval_outermost_constant_expr 
> > (op, true);
> > everywhere in potential_constant_expression_1 should fix the quadraticness
> > too.
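
The temporary-limit idea in the quoted snippet relies on a scoped override:
set a global to a smaller value for the duration of one call and restore it
afterwards.  A minimal generic sketch follows; ScopedOverride, ops_limit and
limit_seen_by_cheap_eval are hypothetical stand-ins, and the initial limit
is an arbitrary number, not a real default.

```cpp
#include <algorithm>
#include <cassert>

// Set a variable to a temporary value and restore the saved value when the
// object goes out of scope.
template <typename T>
class ScopedOverride
{
  T &m_ref;
  T m_saved;

public:
  ScopedOverride (T &ref, T tmp) : m_ref (ref), m_saved (ref) { m_ref = tmp; }
  ~ScopedOverride () { m_ref = m_saved; }
  ScopedOverride (const ScopedOverride &) = delete;
  ScopedOverride &operator= (const ScopedOverride &) = delete;
};

// Hypothetical global evaluation budget (arbitrary value).
long ops_limit = 1000000;

// Pretend "cheap evaluation": reports the budget it ran under.
long limit_seen_by_cheap_eval ()
{
  // Cap the budget at 100 operations for this call only.
  ScopedOverride<long> o (ops_limit, std::min (ops_limit, 100L));
  return ops_limit;
}
```

Because the restore happens in the destructor, the original limit comes back
on every exit path from the function, including early returns.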
> 
> This would technically fix the quadraticness but wouldn't it still mean
> that a huge left-associated constant logical expression is quite a bit
> slower to check than an equivalent right-associated one (depending on
> what we set 

Re: [PATCH v2 2/4] Refactor loop_version

2021-10-29 Thread Richard Biener via Gcc-patches
On Wed, 27 Oct 2021, Xionghu Luo wrote:

> loop_version currently does lv_adjust_loop_entry_edge
> before it loopifys the copy inserted on the header.  This patch moves
> the condition generation later and thus we have four pieces to help
> understanding of how the adjustment works:
>  1) duplicating the loop on the entry edge.
>  2) loopify the duplicated new loop.
>  3) adjusting the CFG to insert a condition branching to either loop
>  with lv_adjust_loop_entry_edge.
>  4) From loopify extract the scale_loop_frequencies bits.
> 
> Also removed some pieces of code that seem obviously useless, though I am
> not completely sure:
>  - redirect_all_edges since it is false and loopify only called once.
>  - extract_cond_bb_edges and lv_flush_pending_stmts (false_edge) as the
>  edge is not redirected actually.

This is OK (you can also commit this independently), thanks for the
cleanup.
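
For readers less familiar with loop versioning, a source-level analogy of
the shape being built may help: the loop is duplicated, and a condition
block placed in front branches to one copy or the other.  The function below
is only an illustration with a hypothetical name; the real transformation
works on basic blocks and edges in the CFG, not on source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum every STRIDE-th element of V.  STRIDE must be nonzero.
long sum_strided (const std::vector<int> &v, std::size_t stride)
{
  long sum = 0;
  // Condition block: selects which copy of the loop runs.
  if (stride == 1)
    {
      // First copy: specialised for unit stride, which an optimiser can
      // handle more easily (for example by vectorising it).
      for (std::size_t i = 0; i < v.size (); i++)
        sum += v[i];
    }
  else
    {
      // Second copy: the general loop, kept for all other strides.
      for (std::size_t i = 0; i < v.size (); i += stride)
        sum += v[i];
    }
  return sum;
}
```

Both copies compute the same result whenever the condition holds, which is
what makes it safe to pick either one at run time.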

Richard.

> gcc/ChangeLog:
> 
>   * cfgloopmanip.c (loop_version): Refactor loopify to
>   loop_version.  Move condition generation after loopify.
>   (loopify): Delete.
>   * cfgloopmanip.h (loopify): Delete.
> ---
>  gcc/cfgloopmanip.c | 113 -
>  gcc/cfgloopmanip.h |   3 --
>  2 files changed, 29 insertions(+), 87 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.c b/gcc/cfgloopmanip.c
> index 82c242dd720..a30ebe1cdb4 100644
> --- a/gcc/cfgloopmanip.c
> +++ b/gcc/cfgloopmanip.c
> @@ -846,71 +846,6 @@ create_empty_loop_on_edge (edge entry_edge,
>return loop;
>  }
>  
> -/* Make area between HEADER_EDGE and LATCH_EDGE a loop by connecting
> -   latch to header and update loop tree and dominators
> -   accordingly. Everything between them plus LATCH_EDGE destination must
> -   be dominated by HEADER_EDGE destination, and back-reachable from
> -   LATCH_EDGE source.  HEADER_EDGE is redirected to basic block SWITCH_BB,
> -   FALSE_EDGE of SWITCH_BB to original destination of HEADER_EDGE and
> -   TRUE_EDGE of SWITCH_BB to original destination of LATCH_EDGE.
> -   Returns the newly created loop.  Frequencies and counts in the new loop
> -   are scaled by FALSE_SCALE and in the old one by TRUE_SCALE.  */
> -
> -class loop *
> -loopify (edge latch_edge, edge header_edge,
> -  basic_block switch_bb, edge true_edge, edge false_edge,
> -  bool redirect_all_edges, profile_probability true_scale,
> -  profile_probability false_scale)
> -{
> -  basic_block succ_bb = latch_edge->dest;
> -  basic_block pred_bb = header_edge->src;
> -  class loop *loop = alloc_loop ();
> -  class loop *outer = loop_outer (succ_bb->loop_father);
> -  profile_count cnt;
> -
> -  loop->header = header_edge->dest;
> -  loop->latch = latch_edge->src;
> -
> -  cnt = header_edge->count ();
> -
> -  /* Redirect edges.  */
> -  loop_redirect_edge (latch_edge, loop->header);
> -  loop_redirect_edge (true_edge, succ_bb);
> -
> -  /* During loop versioning, one of the switch_bb edge is already properly
> - set. Do not redirect it again unless redirect_all_edges is true.  */
> -  if (redirect_all_edges)
> -{
> -  loop_redirect_edge (header_edge, switch_bb);
> -  loop_redirect_edge (false_edge, loop->header);
> -
> -  /* Update dominators.  */
> -  set_immediate_dominator (CDI_DOMINATORS, switch_bb, pred_bb);
> -  set_immediate_dominator (CDI_DOMINATORS, loop->header, switch_bb);
> -}
> -
> -  set_immediate_dominator (CDI_DOMINATORS, succ_bb, switch_bb);
> -
> -  /* Compute new loop.  */
> -  add_loop (loop, outer);
> -
> -  /* Add switch_bb to appropriate loop.  */
> -  if (switch_bb->loop_father)
> -remove_bb_from_loops (switch_bb);
> -  add_bb_to_loop (switch_bb, outer);
> -
> -  /* Fix counts.  */
> -  if (redirect_all_edges)
> -{
> -  switch_bb->count = cnt;
> -}
> -  scale_loop_frequencies (loop, false_scale);
> -  scale_loop_frequencies (succ_bb->loop_father, true_scale);
> -  update_dominators_in_loop (loop);
> -
> -  return loop;
> -}
> -
>  /* Remove the latch edge of a LOOP and update loops to indicate that
> the LOOP was removed.  After this function, original loop latch will
> have no successor, which caller is expected to fix somehow.
> @@ -1681,7 +1616,7 @@ loop_version (class loop *loop,
> bool place_after)
>  {
>basic_block first_head, second_head;
> -  edge entry, latch_edge, true_edge, false_edge;
> +  edge entry, latch_edge;
>int irred_flag;
>class loop *nloop;
>basic_block cond_bb;
> @@ -1694,7 +1629,7 @@ loop_version (class loop *loop,
>/* Note down head of loop as first_head.  */
>first_head = entry->dest;
>  
> -  /* Duplicate loop.  */
> +  /* 1) Duplicate loop on the entry edge.  */
>if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, 1,
>  NULL, NULL, NULL, 0))
>  {
> @@ -1702,11 +1637,28 @@ loop_version (class loop *loop,
>return NULL;
>  }
>  
> +  /* 2) loopify the duplicated new loop. */
> +  latch_edge = 

Re: [PATCH v2 4/4] Rename duplicate_loop_to_header_edge to duplicate_loop_body_to_header_edge

2021-10-29 Thread Richard Biener via Gcc-patches
On Wed, 27 Oct 2021, Xionghu Luo wrote:

> gcc/ChangeLog:
> 
>   * cfghooks.c (cfg_hook_duplicate_loop_to_header_edge): Rename
>   duplicate_loop_to_header_edge to
>   duplicate_loop_body_to_header_edge.
>   (cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
>   * cfghooks.h (struct cfg_hooks): Likewise.
>   (cfg_hook_duplicate_loop_body_to_header_edge): Likewise.
>   * cfgloopmanip.c (duplicate_loop_body_to_header_edge): Likewise.
>   (clone_loop_to_header_edge): Likewise.
>   * cfgloopmanip.h (duplicate_loop_body_to_header_edge): Likewise.
>   * cfgrtl.c (struct cfg_hooks): Likewise.
>   * doc/loop.texi: Likewise.
>   * loop-unroll.c (unroll_loop_constant_iterations): Likewise.
>   (unroll_loop_runtime_iterations): Likewise.
>   (unroll_loop_stupid): Likewise.
>   (apply_opt_in_copies): Likewise.
>   * tree-cfg.c (struct cfg_hooks): Likewise.
>   * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Likewise.
>   (try_peel_loop): Likewise.
>   * tree-ssa-loop-manip.c (copy_phi_node_args): Likewise.
>   (gimple_duplicate_loop_body_to_header_edge): Likewise.
>   (tree_transform_and_unroll_loop): Likewise.
>   * tree-ssa-loop-manip.h (gimple_duplicate_loop_body_to_header_edge):
>   Likewise.

This renaming is OK (you can commit it independently).

Thanks,
Richard.

> ---
>  gcc/cfghooks.c  | 27 ---
>  gcc/cfghooks.h  | 13 ++---
>  gcc/cfgloopmanip.c  |  9 -
>  gcc/cfgloopmanip.h  |  6 +++---
>  gcc/cfgrtl.c|  2 +-
>  gcc/doc/loop.texi   |  4 ++--
>  gcc/loop-unroll.c   | 27 ---
>  gcc/tree-cfg.c  |  2 +-
>  gcc/tree-ssa-loop-ivcanon.c |  4 ++--
>  gcc/tree-ssa-loop-manip.c   | 22 --
>  gcc/tree-ssa-loop-manip.h   |  7 +++
>  11 files changed, 58 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
> index 50b9b177639..23eb364bee6 100644
> --- a/gcc/cfghooks.c
> +++ b/gcc/cfghooks.c
> @@ -1226,25 +1226,22 @@ lv_flush_pending_stmts (edge e)
>  cfg_hooks->flush_pending_stmts (e);
>  }
>  
> -/* Loop versioning uses the duplicate_loop_to_header_edge to create
> +/* Loop versioning uses the duplicate_loop_body_to_header_edge to create
> a new version of the loop basic-blocks, the parameters here are
> -   exactly the same as in duplicate_loop_to_header_edge or
> -   tree_duplicate_loop_to_header_edge; while in tree-ssa there is
> +   exactly the same as in duplicate_loop_body_to_header_edge or
> +   tree_duplicate_loop_body_to_header_edge; while in tree-ssa there is
> additional work to maintain ssa information that's why there is
> -   a need to call the tree_duplicate_loop_to_header_edge rather
> -   than duplicate_loop_to_header_edge when we are in tree mode.  */
> +   a need to call the tree_duplicate_loop_body_to_header_edge rather
> +   than duplicate_loop_body_to_header_edge when we are in tree mode.  */
>  bool
> -cfg_hook_duplicate_loop_to_header_edge (class loop *loop, edge e,
> - unsigned int ndupl,
> - sbitmap wont_exit, edge orig,
> - vec *to_remove,
> - int flags)
> +cfg_hook_duplicate_loop_body_to_header_edge (class loop *loop, edge e,
> +  unsigned int ndupl,
> +  sbitmap wont_exit, edge orig,
> +  vec *to_remove, int flags)
>  {
> -  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_to_header_edge);
> -  return cfg_hooks->cfg_hook_duplicate_loop_to_header_edge (loop, e,
> - ndupl, wont_exit,
> - orig, to_remove,
> - flags);
> +  gcc_assert (cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge);
> +  return cfg_hooks->cfg_hook_duplicate_loop_body_to_header_edge (
> +loop, e, ndupl, wont_exit, orig, to_remove, flags);
>  }
>  
>  /* Conditional jumps are represented differently in trees and RTL,
> diff --git a/gcc/cfghooks.h b/gcc/cfghooks.h
> index 8645fe5b9e7..29aa2bf0636 100644
> --- a/gcc/cfghooks.h
> +++ b/gcc/cfghooks.h
> @@ -166,7 +166,7 @@ struct cfg_hooks
>  
>/* A hook for duplicating loop in CFG, currently this is used
>   in loop versioning.  */
> -  bool (*cfg_hook_duplicate_loop_to_header_edge) (class loop *, edge,
> +  bool (*cfg_hook_duplicate_loop_body_to_header_edge) (class loop *, edge,
> unsigned, sbitmap,
> edge, vec *,
> int);
> @@ -250,12 +250,11 @@ extern bool block_ends_with_condjump_p 
> 

Re: [RFC] Don't move cold code out of loop by checking bb count

2021-10-29 Thread Richard Biener via Gcc-patches
On Wed, Oct 27, 2021 at 4:40 AM Xionghu Luo  wrote:
>
>
>
> On 2021/10/26 21:20, Richard Biener wrote:
> > On Mon, Oct 18, 2021 at 6:29 AM Xionghu Luo  wrote:
> >>
> >>
> >>
> >> On 2021/10/15 16:11, Richard Biener wrote:
> >>> On Sat, Oct 9, 2021 at 5:45 AM Xionghu Luo  wrote:
> 
>  Hi,
> 
>  On 2021/9/28 20:09, Richard Biener wrote:
> > On Fri, Sep 24, 2021 at 8:29 AM Xionghu Luo  
> > wrote:
> >>
> >> Update the patch to v3, not sure whether you prefer the paste style
> >> and continue to link the previous thread as Segher dislikes this...
> >>
> >>
> >> [PATCH v3] Don't move cold code out of loop by checking bb count
> >>
> >>
> >> Changes:
> >> 1. Handle max_loop in determine_max_movement instead of
> >> outermost_invariant_loop.
> >> 2. Remove unnecessary changes.
> >> 3. Add for_all_locs_in_loop (loop, ref, ref_in_loop_hot_body) in
> >> can_sm_ref_p.
> >> 4. "gsi_next ();" in move_computations_worker is kept, since leaving it
> >> out caused an infinite loop when implementing v1; the iterator update
> >> was actually missing.
> >>
> >> v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576488.html
> >> v2: 
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579086.html
> >>
> >> There was a patch trying to avoid move cold block out of loop:
> >>
> >> https://gcc.gnu.org/pipermail/gcc/2014-November/215551.html
> >>
> >> Richard suggested to "never hoist anything from a bb with lower 
> >> execution
> >> frequency to a bb with higher one in LIM invariantness_dom_walker
> >> before_dom_children".
> >>
> >> In gimple LIM analysis, add find_coldest_out_loop to move invariants to
> >> the expected target loop; if the profile count of the loop bb is colder
> >> than the target loop preheader, the invariant won't be hoisted out of
> >> the loop.  Likewise for store motion: if all locations of the REF in the
> >> loop are cold, don't do store motion of it.
> >>
> >> SPEC2017 performance evaluation shows 1% performance improvement for
> >> intrate GEOMEAN and no obvious regression for others.  Especially,
> >> 500.perlbench_r +7.52% (Perf shows function S_regtry of perlbench is
> >> largely improved.), and 548.exchange2_r+1.98%, 526.blender_r +1.00%
> >> on P8LE.
> >>
> >> gcc/ChangeLog:
> >>
> >> * loop-invariant.c (find_invariants_bb): Check profile count
> >> before motion.
> >> (find_invariants_body): Add argument.
> >> * tree-ssa-loop-im.c (find_coldest_out_loop): New function.
> >> (determine_max_movement): Use find_coldest_out_loop.
> >> (move_computations_worker): Adjust and fix iteration update.
> >> (execute_sm_exit): Check pointer validness.
> >> (class ref_in_loop_hot_body): New functor.
> >> (ref_in_loop_hot_body::operator): New.
> >> (can_sm_ref_p): Use for_all_locs_in_loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> >> * gcc.dg/tree-ssa/ssa-lim-18.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
> >> * gcc.dg/tree-ssa/ssa-lim-20.c: New test.
> >> ---
> >>  gcc/loop-invariant.c   | 10 ++--
> >>  gcc/tree-ssa-loop-im.c | 61 --
> >>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c|  2 +-
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c | 20 +++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 27 ++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 25 +
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c | 28 ++
> >>  7 files changed, 165 insertions(+), 8 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-21.c
> >>
> >> diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
> >> index fca0c2b24be..5c3be7bf0eb 100644
> >> --- a/gcc/loop-invariant.c
> >> +++ b/gcc/loop-invariant.c
> >> @@ -1183,9 +1183,14 @@ find_invariants_insn (rtx_insn *insn, bool 
> >> always_reached, bool always_executed)
> >> call.  */
> >>
> >>  static void
> >> -find_invariants_bb (basic_block bb, bool always_reached, bool 
> >> always_executed)
> >> +find_invariants_bb (class loop *loop, basic_block bb, bool 
> >> always_reached,
> >> +   bool always_executed)
> >>  {
> >>rtx_insn *insn;
> >> +  basic_block preheader = loop_preheader_edge (loop)->src;
> >> +
> >> +  if (preheader->count > bb->count)
> >> +return;
> >>
> >>
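The check added to find_invariants_bb in the hunk quoted above can be sketched in isolation roughly as follows. This is only an illustration under stated assumptions: `struct block` and `hoisting_is_profitable` are hypothetical names, and a plain integer stands in for GCC's profile_count type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a basic block: only the profile count matters
   for this sketch.  */
struct block
{
  uint64_t count;   /* Estimated execution count taken from the profile.  */
};

/* Return true if invariants found in BB may be moved to PREHEADER.
   This mirrors the quoted test "if (preheader->count > bb->count) return;":
   moving code from a cold block into a hotter preheader would make it
   execute more often than before, so such blocks are skipped.  */
static bool
hoisting_is_profitable (const struct block *preheader, const struct block *bb)
{
  assert (preheader != NULL && bb != NULL);
  return preheader->count <= bb->count;
}
```

With a preheader count of 100, a block with count 1000 would still be considered for motion, while a block with count 1 would be left alone.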

Re: [PATCH 2/2]middle-end Add target independent tests for complex numbers vectorization.

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Oct 2021, Tamar Christina wrote:

> Hi All,
> 
> This beefs up the complex numbers vectorization testsuite
> and adds target independent checks next to the target
> dependent ones.
> 
> This allows regressions to the detection code to be found
> when running on any target, not just aarch64.
> 
> Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/102977
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
>   Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
>   Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
>   Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mla-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mla-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mls-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mls-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mul-double.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mul-float.c: Updated.
>   * gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
>   Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
>   Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
>   Updated.
>   * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c:
>   Updated.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: Removed.
>   * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c:
>   Removed.
> 
> --- inline copy of patch -- 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c 
> b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> deleted file mode 100644
> index 
> aadee7f86fa42895ffc6bec481a95a2b185bf86d..
> --- a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
> +++ /dev/null
> @@ -1,12 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-require-effective-target vect_complex_add_byte } */
> -/* { dg-require-effective-target stdint_types } */
> -/* { dg-add-options arm_v8_1m_mve_fp } */
> -
> -#define TYPE int8_t
> -#define N 16
> -#include 
> -#include "complex-add-pattern-template.c"
> -
> -/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 

Re: [PATCH 1/2]middle-end Update the complex numbers auto-vec detection to the new format of the SLP tree.

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Oct 2021, Tamar Christina wrote:

> Hi All,
> 
> The layout of the SLP tree has changed in GCC 12 which
> broke the detection of complex FMA and FMS.
> 
> This patch updates the detection to the new tree shape
> and by necessity merges the complex MUL and FMA detection
> into one.
> 
> This does not yet address the wrong code-gen PR which I
> will fix in a different patch as that needs backporting.
> 
> Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
> 
> Ok for master?

OK.

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/102977
>   * tree-vect-slp-patterns.c (vect_match_call_p): Remove.
>   (vect_detect_pair_op): Add crosslane check.
>   (vect_match_call_complex_mla): Remove.
>   (class complex_mul_pattern): Update comment.
>   (complex_mul_pattern::matches): Update detection.
>   (class complex_fma_pattern): Remove.
>   (complex_fma_pattern::matches): Remove.
>   (complex_fma_pattern::recognize): Remove.
>   (complex_fma_pattern::build): Remove.
>   (class complex_fms_pattern):  Update comment.
>   (complex_fms_pattern::matches): Remove.
>   (complex_operations_pattern::recognize): Remove complex_fma_pattern
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 
> b8d09b7832e29689ede832d555e1b6af2c24ce1e..99dea82aba91a333500bb5ff35bf30b6416c09ca
>  100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -306,24 +306,6 @@ vect_match_expression_p (slp_tree node, tree_code code)
>return true;
>  }
>  
> -/* Checks to see if the expression represented by NODE is a call to the 
> internal
> -   function FN.  */
> -
> -static inline bool
> -vect_match_call_p (slp_tree node, internal_fn fn)
> -{
> -  if (!node
> -  || !SLP_TREE_REPRESENTATIVE (node))
> -return false;
> -
> -  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
> -  if (!expr
> -  || !gimple_call_internal_p (expr, fn))
> -return false;
> -
> -   return true;
> -}
> -
>  /* Check if the given lane permute in PERMUTES matches an alternating 
> sequence
> of {even odd even odd ...}.  This to account for unrolled loops.  Further
> mode there resulting permute must be linear.   */
> @@ -389,6 +371,16 @@ vect_detect_pair_op (slp_tree node1, slp_tree node2, 
> lane_permutation_t ,
>  
>if (result != CMPLX_NONE && ops != NULL)
>  {
> +  if (two_operands)
> + {
> +   auto l0node = SLP_TREE_CHILDREN (node1);
> +   auto l1node = SLP_TREE_CHILDREN (node2);
> +
> +   /* Check if the tree is connected as we expect it.  */
> +   if (!((l0node[0] == l1node[0] && l0node[1] == l1node[1])
> +   || (l0node[0] == l1node[1] && l0node[1] == l1node[0])))
> + return CMPLX_NONE;
> + }
>ops->safe_push (node1);
>ops->safe_push (node2);
>  }
> @@ -717,27 +709,6 @@ complex_add_pattern::recognize 
> (slp_tree_to_load_perm_map_t *perm_cache,
>   * complex_mul_pattern
>   
> **/
>  
> -/* Helper function of that looks for a match in the CHILDth child of NODE.  
> The
> -   child used is stored in RES.
> -
> -   If the match is successful then ARGS will contain the operands matched
> -   and the complex_operation_t type is returned.  If match is not successful
> -   then CMPLX_NONE is returned and ARGS is left unmodified.  */
> -
> -static inline complex_operation_t
> -vect_match_call_complex_mla (slp_tree node, unsigned child,
> -  vec *args = NULL, slp_tree *res = NULL)
> -{
> -  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
> -
> -  slp_tree data = SLP_TREE_CHILDREN (node)[child];
> -
> -  if (res)
> -*res = data;
> -
> -  return vect_detect_pair_op (data, false, args);
> -}
> -
>  /* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the 
> first
> child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
>  
> @@ -945,9 +916,10 @@ class complex_mul_pattern : public complex_pattern
>  
>  };
>  
> -/* Pattern matcher for trying to match complex multiply pattern in SLP tree
> -   If the operation matches then IFN is set to the operation it matched
> -   and the arguments to the two replacement statements are put in m_ops.
> +/* Pattern matcher for trying to match complex multiply and complex multiply
> +   and accumulate pattern in SLP tree.  If the operation matches then IFN
> +   is set to the operation it matched and the arguments to the two
> +   replacement statements are put in m_ops.
>  
> If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
>  
> @@ -972,19 +944,43 @@ complex_mul_pattern::matches (complex_operation_t op,
>if (op != MINUS_PLUS)
>  return IFN_LAST;
>  
> -  slp_tree root = *node;
> -  /* First two nodes must be a multiply.  */
> -  auto_vec muls;
> - 

Re: [PATCH,Fortran 1/2] Add uop/name helpers

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 29, 2021 at 01:52:58AM +0200, Bernhard Reutner-Fischer wrote:
> From: Bernhard Reutner-Fischer 
> 
> Introduce a helper to construct a user operator from a name and the
> reverse operation, i.e. a helper to construct a name from a user
> operator.
> 
> Cc: Jakub Jelinek 
> 
> gcc/fortran/ChangeLog:
> 
> 2017-10-29  Bernhard Reutner-Fischer  
> 
>   * gfortran.h (gfc_get_uop_from_name, gfc_get_name_from_uop): Declare.
>   * symbol.c (gfc_get_uop_from_name, gfc_get_name_from_uop): Define.
>   * module.c (load_omp_udrs): Use them.
> ---
>  gcc/fortran/gfortran.h |  2 ++
>  gcc/fortran/module.c   | 21 +++--
>  gcc/fortran/symbol.c   | 21 +
>  3 files changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> index 9378b4b8a24..afe9f2354ee 100644
> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -3399,6 +3399,8 @@ void gfc_delete_symtree (gfc_symtree **, const char *);
>  gfc_symtree *gfc_get_unique_symtree (gfc_namespace *);
>  gfc_user_op *gfc_get_uop (const char *);
>  gfc_user_op *gfc_find_uop (const char *, gfc_namespace *);
> +const char *gfc_get_uop_from_name (const char*);
> +const char *gfc_get_name_from_uop (const char*);

Formatting, space between char and *.

> --- a/gcc/fortran/symbol.c
> +++ b/gcc/fortran/symbol.c
> @@ -3044,6 +3044,27 @@ gfc_find_uop (const char *name, gfc_namespace *ns)
>return (st == NULL) ? NULL : st->n.uop;
>  }
>  
> +/* Given a name return a string usable as user operator name.  */
> +const char *
> +gfc_get_uop_from_name (const char* name) {

Formatting, space before * rather than after it, { should go on next line.
Similarly later.

But most importantly, I really don't like these helpers at all: they
unnecessarily allocate memory for the remaining duration of compilation,
and the second one even uses the heap for a temporary.

Can't you just fix the real bug and keep the code as it was otherwise
(with XALLOCAVEC etc.)?
And, there should be a testcase...
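
To illustrate the two points above (GNU formatting, and no heap allocation for a temporary), here is a rough sketch. It assumes, purely for illustration, that the temporary holds a user operator name with its surrounding dots removed; the helper name `strip_operator_dots` and its parameters are hypothetical and are not taken from the patch. The caller supplies the buffer, typically a stack array, which plays the role that XALLOCAVEC plays in the existing code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the operator name between the dots of ".NAME." into BUF (of size
   BUFSIZE) and return BUF.  Return NULL if SPELLING is not of that form
   or the name does not fit.  The caller owns BUF, so nothing is allocated
   on the heap and nothing outlives the caller's frame.  */
static const char *
strip_operator_dots (const char *spelling, char *buf, size_t bufsize)
{
  assert (spelling != NULL && buf != NULL);
  size_t len = strlen (spelling);
  if (len < 3 || spelling[0] != '.' || spelling[len - 1] != '.')
    return NULL;
  if (len - 2 >= bufsize)
    return NULL;
  memcpy (buf, spelling + 1, len - 2);
  buf[len - 2] = '\0';
  return buf;
}
```

Note the space between `char` and `*`, and the opening brace on its own line.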

Jakub



[PATCH 2/2]middle-end Add target independent tests for complex numbers vectorization.

2021-10-29 Thread Tamar Christina via Gcc-patches
Hi All,

This beefs up the complex numbers vectorization testsuite
and adds target independent checks next to the target
dependent ones.

This allows regressions to the detection code to be found
when running on any target, not just aarch64.

Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

PR tree-optimization/102977
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c:
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mla-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-double.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mls-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-double.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-float.c: Updated.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-mul-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mla-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mls-half-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-double.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-float.c: Updated.
* gcc.dg/vect/complex/fast-math-complex-mul-half-float.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c:
Updated.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c:
Updated.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: Removed.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c:
Removed.

--- inline copy of patch -- 
diff --git 
a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c 
b/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
deleted file mode 100644
index 
aadee7f86fa42895ffc6bec481a95a2b185bf86d..
--- a/gcc/testsuite/gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c
+++ /dev/null
@@ -1,12 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target vect_complex_add_byte } */
-/* { dg-require-effective-target stdint_types } */
-/* { dg-add-options arm_v8_1m_mve_fp } */
-
-#define TYPE int8_t
-#define N 16
-#include 
-#include "complex-add-pattern-template.c"
-
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT90" 1 "slp1" { 
xfail aarch64_sve2 } } } */
-/* { dg-final { scan-tree-dump-times "stmt.*COMPLEX_ADD_ROT270" 1 "slp1" { 
xfail aarch64_sve2 } } } */
diff --git 

[PATCH 1/2]middle-end Update the complex numbers auto-vec detection to the new format of the SLP tree.

2021-10-29 Thread Tamar Christina via Gcc-patches
Hi All,

The layout of the SLP tree has changed in GCC 12 which
broke the detection of complex FMA and FMS.

This patch updates the detection to the new tree shape
and by necessity merges the complex MUL and FMA detection
into one.

This does not yet address the wrong code-gen PR which I
will fix in a different patch as that needs backporting.

Regtested on aarch64-none-linux-gnu,
x86_64-pc-linux-gnu and no regressions.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/102977
* tree-vect-slp-patterns.c (vect_match_call_p): Remove.
(vect_detect_pair_op): Add crosslane check.
(vect_match_call_complex_mla): Remove.
(class complex_mul_pattern): Update comment.
(complex_mul_pattern::matches): Update detection.
(class complex_fma_pattern): Remove.
(complex_fma_pattern::matches): Remove.
(complex_fma_pattern::recognize): Remove.
(complex_fma_pattern::build): Remove.
(class complex_fms_pattern):  Update comment.
(complex_fms_pattern::matches): Remove.
(complex_operations_pattern::recognize): Remove complex_fma_pattern
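
The crosslane check that the ChangeLog mentions for vect_detect_pair_op boils down to a small connectivity test, sketched here on its own. `struct node` and `children_connected_p` are hypothetical stand-ins for the SLP tree types, assuming exactly two children per node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SLP node with exactly two children.  */
struct node
{
  struct node *child[2];
};

/* Return true if A and B use the same two children, either in the same
   order or swapped.  This is the shape of the condition in the patch:
   if it does not hold, the pair is not treated as a complex operation.  */
static bool
children_connected_p (const struct node *a, const struct node *b)
{
  assert (a != NULL && b != NULL);
  return ((a->child[0] == b->child[0] && a->child[1] == b->child[1])
          || (a->child[0] == b->child[1] && a->child[1] == b->child[0]));
}
```

Two nodes that share only one child fail the test, which is the case the patch rejects with CMPLX_NONE.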

--- inline copy of patch -- 
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 
b8d09b7832e29689ede832d555e1b6af2c24ce1e..99dea82aba91a333500bb5ff35bf30b6416c09ca
 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -306,24 +306,6 @@ vect_match_expression_p (slp_tree node, tree_code code)
   return true;
 }
 
-/* Checks to see if the expression represented by NODE is a call to the 
internal
-   function FN.  */
-
-static inline bool
-vect_match_call_p (slp_tree node, internal_fn fn)
-{
-  if (!node
-  || !SLP_TREE_REPRESENTATIVE (node))
-return false;
-
-  gimple* expr = STMT_VINFO_STMT (SLP_TREE_REPRESENTATIVE (node));
-  if (!expr
-  || !gimple_call_internal_p (expr, fn))
-return false;
-
-   return true;
-}
-
 /* Check if the given lane permute in PERMUTES matches an alternating sequence
of {even odd even odd ...}.  This to account for unrolled loops.  Further
mode there resulting permute must be linear.   */
@@ -389,6 +371,16 @@ vect_detect_pair_op (slp_tree node1, slp_tree node2, 
lane_permutation_t ,
 
   if (result != CMPLX_NONE && ops != NULL)
 {
+  if (two_operands)
+   {
+ auto l0node = SLP_TREE_CHILDREN (node1);
+ auto l1node = SLP_TREE_CHILDREN (node2);
+
+ /* Check if the tree is connected as we expect it.  */
+ if (!((l0node[0] == l1node[0] && l0node[1] == l1node[1])
+ || (l0node[0] == l1node[1] && l0node[1] == l1node[0])))
+   return CMPLX_NONE;
+   }
   ops->safe_push (node1);
   ops->safe_push (node2);
 }
@@ -717,27 +709,6 @@ complex_add_pattern::recognize 
(slp_tree_to_load_perm_map_t *perm_cache,
  * complex_mul_pattern
  
**/
 
-/* Helper function of that looks for a match in the CHILDth child of NODE.  The
-   child used is stored in RES.
-
-   If the match is successful then ARGS will contain the operands matched
-   and the complex_operation_t type is returned.  If match is not successful
-   then CMPLX_NONE is returned and ARGS is left unmodified.  */
-
-static inline complex_operation_t
-vect_match_call_complex_mla (slp_tree node, unsigned child,
-vec *args = NULL, slp_tree *res = NULL)
-{
-  gcc_assert (child < SLP_TREE_CHILDREN (node).length ());
-
-  slp_tree data = SLP_TREE_CHILDREN (node)[child];
-
-  if (res)
-*res = data;
-
-  return vect_detect_pair_op (data, false, args);
-}
-
 /* Check to see if either of the trees in ARGS are a NEGATE_EXPR.  If the first
child (args[0]) is a NEGATE_EXPR then NEG_FIRST_P is set to TRUE.
 
@@ -945,9 +916,10 @@ class complex_mul_pattern : public complex_pattern
 
 };
 
-/* Pattern matcher for trying to match complex multiply pattern in SLP tree
-   If the operation matches then IFN is set to the operation it matched
-   and the arguments to the two replacement statements are put in m_ops.
+/* Pattern matcher for trying to match complex multiply and complex multiply
+   and accumulate pattern in SLP tree.  If the operation matches then IFN
+   is set to the operation it matched and the arguments to the two
+   replacement statements are put in m_ops.
 
If no match is found then IFN is set to IFN_LAST and m_ops is unchanged.
 
@@ -972,19 +944,43 @@ complex_mul_pattern::matches (complex_operation_t op,
   if (op != MINUS_PLUS)
 return IFN_LAST;
 
-  slp_tree root = *node;
-  /* First two nodes must be a multiply.  */
-  auto_vec muls;
-  if (vect_match_call_complex_mla (root, 0) != MULT_MULT
-  || vect_match_call_complex_mla (root, 1, ) != MULT_MULT)
+  auto childs = *ops;
+  auto l0node = SLP_TREE_CHILDREN (childs[0]);
+  auto l1node = SLP_TREE_CHILDREN (childs[1]);
+
+  bool mul0 = vect_match_expression_p (l0node[0], MULT_EXPR);
+  

Re: [Patch] libcpp: Fix _Pragma expansion [PR102409]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 28, 2021 at 05:51:59PM +0200, Tobias Burnus wrote:
> libcpp/ChangeLog:
> 
>   PR c++/102409
>   * directives.c (destringize_and_run): Add PRAGMA_OP to the
>   CPP_PRAGMA token's flags to mark it as coming from _Pragma.
>   * include/cpplib.h (PRAGMA_OP): #define, to be used with token flags.
>   * macro.c (collect_args): Only handle CPP_PRAGMA special if PRAGMA_OP
>   is set.
> 

The patch itself looks reasonable to me, but it should come with
testsuite coverage.  And the testsuite coverage should include both normal
testcases that do use the integrated preprocessor, and the same with
-save-temps to make sure that it works even when preprocessing separately,
too.

Jakub



Re: [Patch] OpenMP: Add strictly nested API call check [PR102972]

2021-10-29 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 29, 2021 at 12:09:55PM +0200, Tobias Burnus wrote:
> The original motivation was to fix the routine part
> of the restriction quoted below, namely that calls to
>   omp_get_num_teams() and omp_get_team_num()
> are the only routine calls permitted in teams when closely nested.
> 
> 
> "Restrictions to the teams construct are as follows:
> ...
> • distribute regions, including any distribute regions arising from composite 
> constructs,
> parallel regions, including any parallel regions arising from combined 
> constructs, loop
> regions, omp_get_num_teams() regions, and omp_get_team_num() regions are the
> only OpenMP regions that may be strictly nested inside the teams region."
> 
> 
> While there, I found one issue in the ancestor check,
> which checked too strictly, and one in the generic check,
> which assumed that the DECL_NAME in Fortran had the '_' suffix
> while only the assembler name has it.
> 
> That worked for the plain '_' variant, as DECL_NAME then matched
> the C name, but for the integer(8) version only ..._8_ was matched
> while DECL_NAME contained only ..._8 without the trailing '_'.
> The assembler name is also needed because in Fortran,
>  module m
>  contains
>subroutine omp_is_initial_device ()
> has an OpenMP API name in DECL_NAME but internally, it is
> something like m_MOD_omp_is_initial_device_ - which is an
> odd user name but is not the API routine name.

I'm afraid using DECL_ASSEMBLER_NAME opens a new can of worms.
For one, it shouldn't be HAS_DECL_ASSEMBLER_NAME_P guarded; we want
to always use one or the other, not randomly pick between them depending
on whether a function already got an assembler name or not.
But, for DECL_ASSEMBLER_NAME, I'm afraid one needs to apply
targetm.strip_name_encoding and also strip user_label_prefix, if any.

At least for C++,
namespace A
{
  int omp_is_initial_device () { return 0; }
};
is meant to be checked by
  || (DECL_CONTEXT (fndecl) != NULL_TREE
  && TREE_CODE (DECL_CONTEXT (fndecl)) != TRANSLATION_UNIT_DECL)
If that doesn't work for Fortran modules, we need to find out something
different, e.g. setjmp_or_longjmp_p also relies on that...

On the other side, when we use DECL_NAME we don't currently differentiate
between:
extern "C" int omp_is_initial_device ();
and say
extern int omp_is_initial_device (double, float);
where the latter is in C++ mangled differently.  Sure, one can't use
the latter together with #include ...

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -3911,7 +3911,7 @@ setjmp_or_longjmp_p (const_tree fndecl)
>  /* Return true if FNDECL is an omp_* runtime API call.  */
>  
>  static bool
> -omp_runtime_api_call (const_tree fndecl)
> +omp_runtime_api_call (tree fndecl, bool permit_num_teams)
>  {
>tree declname = DECL_NAME (fndecl);
>if (!declname
> @@ -3920,6 +3920,8 @@ omp_runtime_api_call (const_tree fndecl)
>|| !TREE_PUBLIC (fndecl))
>  return false;
>  
> +  if (HAS_DECL_ASSEMBLER_NAME_P (fndecl))
> +declname = DECL_ASSEMBLER_NAME (fndecl);
>const char *name = IDENTIFIER_POINTER (declname);
>if (!startswith (name, "omp_"))
>  return false;
> @@ -4029,7 +4031,17 @@ omp_runtime_api_call (const_tree fndecl)
> && (name[4 + len + 1] == '\0'
> || (mode > 1
> && strcmp (name + 4 + len + 1, "8_") == 0)
> - return true;
> + {
> +   /* Only omp_get_num_teams + omp_get_team_num.  */
> +   if (permit_num_teams
> +   && mode == 1
> +   && (strncmp (name + 4, "get_num_teams",
> +strlen ("get_num_teams")) == 0
> +   || strncmp (name + 4, "get_team_num",
> +   strlen ("get_team_num")) == 0))
> + return false;
> +   return true;
> + }
>  }
>return false;
>  }

As mentioned in the PR, I really don't like this permit_num_teams argument;
IMHO it is the caller that should check it, otherwise we end up with
a myriad of future exceptions in the function.
But, if the stripping of the prefixes is non-trivial, perhaps
omp_runtime_api_call shouldn't return bool, but const char *: either NULL
for "this isn't an OpenMP API call", or a pointer to the actual name starting
with "omp_", so that callers can check further.
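
A rough sketch of that interface, under stated assumptions: `omp_api_name`, `allowed_strictly_nested_in_teams_p` and the shortened `api_names` table are hypothetical and only illustrate the split between "is this an API call" and the caller-side policy; the real function works on a tree decl and a much longer table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily shortened table of API names without "omp_".  */
static const char *const api_names[] = {
  "get_num_teams", "get_team_num", "get_thread_num", "is_initial_device"
};

/* Return NAME if it spells one of the API routines above, else NULL.  */
static const char *
omp_api_name (const char *name)
{
  assert (name != NULL);
  if (strncmp (name, "omp_", 4) != 0)
    return NULL;
  for (size_t i = 0; i < sizeof api_names / sizeof api_names[0]; i++)
    if (strcmp (name + 4, api_names[i]) == 0)
      return name;
  return NULL;
}

/* Caller-side policy: inside teams, only these two calls are allowed.  */
static int
allowed_strictly_nested_in_teams_p (const char *api)
{
  assert (api != NULL);
  return (strcmp (api, "omp_get_num_teams") == 0
          || strcmp (api, "omp_get_team_num") == 0);
}
```

The teams check then lives entirely in the caller, and further exceptions do not need new parameters on the lookup function.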

As for tests where you are adding parallel to avoid the new diagnostics,
I'd suggest parallel if(0) instead, no need to create any extra threads...

Jakub



[Patch] OpenMP: Add strictly nested API call check [PR102972]

2021-10-29 Thread Tobias Burnus

The original motivation was to fix the routine part
of the restriction quoted below, namely that calls to
  omp_get_num_teams() and omp_get_team_num()
are the only routine calls permitted in teams when closely nested.


"Restrictions to the teams construct are as follows:
...
• distribute regions, including any distribute regions arising from composite 
constructs,
parallel regions, including any parallel regions arising from combined 
constructs, loop
regions, omp_get_num_teams() regions, and omp_get_team_num() regions are the
only OpenMP regions that may be strictly nested inside the teams region."


While there, I found one issue in the ancestor check,
which checked too strictly, and one in the generic check,
which assumed that the DECL_NAME in Fortran had the '_' suffix
while only the assembler name has it.

That worked for the plain '_' variant, as DECL_NAME then matched
the C name, but for the integer(8) version only ..._8_ was matched
while DECL_NAME contained only ..._8 without the trailing '_'.

The assembler name is also needed because in Fortran,
 module m
 contains
   subroutine omp_is_initial_device ()
has an OpenMP API name in DECL_NAME but internally, it is
something like m_MOD_omp_is_initial_device_ - which is an
odd user name but is not the API routine name.
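
The suffix matching described above can be sketched in isolation as follows. This is a simplified illustration, not the code in omp-low.c: the function name `matches_omp_api_p` and the `allow_8` parameter are hypothetical, and the real omp_runtime_api_call walks a table of names grouped by mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if NAME spells the API routine "omp_" BASE, either exactly
   (the C name), with a trailing '_' (Fortran), or with a trailing "_8_"
   (Fortran integer(8) variant).  ALLOW_8 says whether BASE has such a
   variant.  */
static bool
matches_omp_api_p (const char *name, const char *base, bool allow_8)
{
  assert (name != NULL && base != NULL);
  if (strncmp (name, "omp_", 4) != 0)
    return false;
  size_t len = strlen (base);
  if (strncmp (name + 4, base, len) != 0)
    return false;
  const char *rest = name + 4 + len;
  if (rest[0] == '\0')
    return true;
  if (rest[0] != '_')
    return false;
  return rest[1] == '\0' || (allow_8 && strcmp (rest + 1, "8_") == 0);
}
```

A name ending in "_8" without the trailing '_' is not matched by this scheme, and neither is a module-mangled name such as m_MOD_omp_is_initial_device_, since it does not start with "omp_".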

I hope that no target starts mangling the C name such that
C's DECL_NAME() != the assembler name as then the patch
will break, but I think all targets do permit those simple
names and don't introduce further mangling.


While other testsuites had surprisingly few problems with
this change (most did use omp_get_num_teams() and
omp_get_team_num(), but that's fine), the GCC testsuite did
have many violations. I hope I have fixed them in a
sensible way.


OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registration court:
München, HRB 106955
OpenMP: Add strictly nested API call check [PR102972]

The teams construct only permits omp_get_num_teams and omp_get_team_num
as API calls in strictly nested regions - check for it.

Additionally, for Fortran, DECL_NAME does not show the mangled
name; hence, DECL_ASSEMBLER_NAME has to be used, too.

Finally, 'target device(ancestor:1)' wrongly rejected non-API calls
as well.

	PR middle-end/102972
gcc/ChangeLog:

	* omp-low.c (omp_runtime_api_call): Use DECL_ASSEMBLER_NAME to get
	internal Fortran name; new permit_num_teams arg to permit
	omp_get_num_teams and omp_get_team_num.
	(scan_omp_1_stmt): Update call to it, add missing call for
	reverse offload, and check for strictly nested API calls in teams.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/target-device-ancestor-3.c: Add non-API
	routine test.
	* gfortran.dg/gomp/order-6.f90: Add missing bind(C).
	* c-c++-common/gomp/teams-3.c: New test.
	* gfortran.dg/gomp/teams-3.f90: New test.
	* gfortran.dg/gomp/teams-4.f90: New test.

libgomp/ChangeLog:
	* testsuite/libgomp.c-c++-common/icv-3.c: Nest API calls inside
	parallel construct.
	* testsuite/libgomp.c-c++-common/icv-4.c: Likewise.
	* testsuite/libgomp.c/target-3.c: Likewise.
	* testsuite/libgomp.c/target-5.c: Likewise.
	* testsuite/libgomp.c/target-6.c: Likewise.
	* testsuite/libgomp.c/target-teams-1.c: Likewise.
	* testsuite/libgomp.c/teams-1.c: Likewise.
	* testsuite/libgomp.c/thread-limit-2.c: Likewise.
	* testsuite/libgomp.c/thread-limit-3.c: Likewise.
	* testsuite/libgomp.c/thread-limit-4.c: Likewise.
	* testsuite/libgomp.c/thread-limit-5.c: Likewise.
	* testsuite/libgomp.fortran/icv-3.f90: Likewise.
	* testsuite/libgomp.fortran/icv-4.f90: Likewise.
	* testsuite/libgomp.fortran/teams1.f90: Likewise.

 gcc/omp-low.c  |  30 +-
 .../c-c++-common/gomp/target-device-ancestor-3.c   |   2 +
 gcc/testsuite/c-c++-common/gomp/teams-3.c  |  64 
 gcc/testsuite/gfortran.dg/gomp/order-6.f90 |   2 +-
 gcc/testsuite/gfortran.dg/gomp/teams-3.f90 |  65 
 gcc/testsuite/gfortran.dg/gomp/teams-4.f90 |  47 +
 libgomp/testsuite/libgomp.c-c++-common/icv-3.c |   3 +
 libgomp/testsuite/libgomp.c-c++-common/icv-4.c |   1 +
 libgomp/testsuite/libgomp.c/target-3.c |   6 +-
 libgomp/testsuite/libgomp.c/target-5.c |   1 +
 libgomp/testsuite/libgomp.c/target-6.c |  12 ++-
 libgomp/testsuite/libgomp.c/target-teams-1.c   | 115 +++--
 libgomp/testsuite/libgomp.c/teams-1.c  |   6 +-
 libgomp/testsuite/libgomp.c/thread-limit-2.c   |  21 ++--
 libgomp/testsuite/libgomp.c/thread-limit-3.c   |   1 +
 libgomp/testsuite/libgomp.c/thread-limit-4.c   |  25 +++--
 libgomp/testsuite/libgomp.c/thread-limit-5.c   |   1 +
 libgomp/testsuite/libgomp.fortran/icv-3.f90|   6 ++
 libgomp/testsuite/libgomp.fortran/icv-4.f90|   2 +
 

Re: Handle retslot_flags in ipa-modref and PTA

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Oct 2021, Jan Hubicka wrote:

> Hi,
> this patch extends modref and tree-ssa-structalias to handle retslot flags.
> Since retslot is essentially a hidden argument that is known to be write-only
> we can do pretty much the same stuff as we do for regular parameters.
> I plan to add static chain handling in a similar way.
> 
> We do not handle IPA propagation of retslot flags (where return slot is
> initialized via return slot of other function). For this ipa-prop needs
> to be extended to understand retslot as well.
> 
> Martin, I wonder if we could look into the ipa-prop bits as well as
> dropping ancestor functions and adding jump functions for return
> functions (which initially do not need to be used by ipa-cp, avoiding
> your problem with forming non-trivial SCCs on an acyclic callgraph).
> They would be immediately useful for modref and work on ipa-cp can be
> done incrementally.
> 
> Bootstrapped/regtested x86_64-linux, OK for the gimple bits?

OK.

Thanks,
Richard.

> Honza
> 
> gcc/ChangeLog:
> 
>   * gimple.c (gimple_call_retslot_flags): New function.
>   * gimple.h (gimple_call_retslot_flags): Declare.
>   * ipa-modref.c: Include tree-cfg.h.
>   (struct escape_entry): Turn parm_index to signed.
>   (modref_summary_lto::modref_summary_lto): Add retslot_flags.
>   (modref_summary::modref_summary): Initialize retslot_flags.
>   (struct modref_summary_lto): Likewise.
>   (modref_summary::useful_p): Check retslot_flags.
>   (modref_summary_lto::useful_p): Likewise.
>   (modref_summary::dump): Dump retslot_flags.
>   (modref_summary_lto::dump): Likewise.
>   (struct escape_point): Add hidden_args enum.
>   (analyze_ssa_name_flags): Ignore return slot return;
>   use gimple_call_retslot_flags.
>   (record_escape_points): Break out from ...
>   (analyze_parms): ... here; handle retslot_flags.
>   (modref_summaries::duplicate): Duplicate retslot_flags.
>   (modref_summaries_lto::duplicate): Likewise.
>   (modref_write_escape_summary): Stream parm_index as signed.
>   (modref_read_escape_summary): Likewise.
>   (modref_write): Stream retslot_flags.
>   (read_section): Likewise.
>   (struct escape_map): Fix typo in comment.
>   (update_escape_summary_1): Fix whitespace.
>   (ipa_merge_modref_summary_after_inlining): Drop retslot_flags.
>   (modref_merge_call_site_flags): Merge retslot_flags.
>   * ipa-modref.h (struct modref_summary): Add retslot_flags.
>   * tree-ssa-structalias.c (handle_rhs_call): Handle retslot_flags.
> 
> diff --git a/gcc/gimple.c b/gcc/gimple.c
> index cc7a88e822b..22dd6417d19 100644
> --- a/gcc/gimple.c
> +++ b/gcc/gimple.c
> @@ -1608,6 +1613,40 @@ gimple_call_arg_flags (const gcall *stmt, unsigned arg)
>return flags;
>  }
>  
> +/* Detects argument flags for return slot on call STMT.  */
> +
> +int
> +gimple_call_retslot_flags (const gcall *stmt)
> +{
> +  int flags = EAF_DIRECT | EAF_NOREAD;
> +
> +  tree callee = gimple_call_fndecl (stmt);
> +  if (callee)
> +{
> +  cgraph_node *node = cgraph_node::get (callee);
> +  modref_summary *summary = node ? get_modref_function_summary (node)
> + : NULL;
> +
> +  if (summary)
> + {
> +   int modref_flags = summary->retslot_flags;
> +
> +   /* We have possibly optimized out load.  Be conservative here.  */
> +   if (!node->binds_to_current_def_p ())
> + {
> +   if ((modref_flags & EAF_UNUSED) && !(flags & EAF_UNUSED))
> + {
> +   modref_flags &= ~EAF_UNUSED;
> +   modref_flags |= EAF_NOESCAPE;
> + }
> + }
> +   if (dbg_cnt (ipa_mod_ref_pta))
> + flags |= modref_flags;
> + }
> +}
> +  return flags;
> +}
> +
>  /* Detects return flags for the call STMT.  */
>  
>  int
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index 303623b3ced..23a124ec769 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -1589,6 +1589,7 @@ gimple_seq gimple_seq_copy (gimple_seq);
>  bool gimple_call_same_target_p (const gimple *, const gimple *);
>  int gimple_call_flags (const gimple *);
>  int gimple_call_arg_flags (const gcall *, unsigned);
> +int gimple_call_retslot_flags (const gcall *);
>  int gimple_call_return_flags (const gcall *);
>  bool gimple_call_nonnull_result_p (gcall *);
>  tree gimple_call_nonnull_arg (gcall *);
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index 0bbec8df0a2..4c59194c521 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "stringpool.h"
>  #include "tree-ssanames.h"
>  #include "attribs.h"
> +#include "tree-cfg.h"
>  
>  
>  namespace {
> @@ -133,7 +134,7 @@ static fnspec_summaries_t *fnspec_summaries = NULL;
>  struct escape_entry
>  {
>/* Parameter that escapes at a given call.  */
> -  unsigned int parm_index;
> +  int parm_index;
>/* Argument it escapes to.  */
> 

Re: [PATCH] Add TSVC tests.

2021-10-29 Thread Richard Biener via Gcc-patches
On Tue, Oct 26, 2021 at 5:27 PM Martin Liška  wrote:
>
> On 10/26/21 10:13, Richard Biener wrote:
> > On Tue, Oct 19, 2021 at 8:49 AM Martin Liška  wrote:
> >>
> >> On 10/18/21 12:08, Richard Biener wrote:
> >>> Can you please use a subdirectory for the sources, a "toplevel"
> >>> license.txt doesn't make much sense.  You can simply amend
> >>> vect.exp to process tsvc/*.c as well as sources so no need for an
> >>> extra .exp file.
> >>
> >> Sure, it's a good idea and I've done that.
> >>
> >>>
> >>> Is the license recognized as
> >>> compatible to the GPL as far as source distribution is concerned?
> >>
> >> Yes: https://www.gnu.org/licenses/license-list.html#NCSA
> >>
> >>>
> >>> Did you test the testcases on any non-x86 target?  (power/aarch64/arm)
> >>
> >> Yes, I run the tests also on ppc64le-linux-gnu and aarch64-linux-gnu.
> >>
> >> Thoughts?
> >
>
> Hey.
>
> > The overall setup looks fine to me.  There are quite some testcases
> > where there are no dg-final directives, some indicate in comments
> > that we do not expect vectorization - for those do we want to
> > add scan-tree-dump-not "loop vectorized" or so to make that clear?
>
> In the updated version of the patch I added:
> /* { dg-final { scan-tree-dump-not "vectorized \[1-9\] loops" "vect" } } */
>
> > For others do we want to add XFAILs so we'll notice when we improve
> > on TSVC?
>
> What type of XFAILs do you mean?

Like

/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail *-*-* } } } */

when the testcase looks for vectorization but we don't do that (yet).
For s1113 for example you added a scan-tree-dump-not but the comment
suggests we'd expect vectorization.

> > It looks like for example s124 is looking for IVOPTs rather
> > than vectorization?  There are testcases exercising float compares
> > (s124 is an example), vectorizing those likely requires a subset
> > of fast-math flags to allow if-conversion and masking, plus masking
> > is not available on all targets.  Is the intent to adjust testcase options
> > accordingly?
>
> No, this is out of my scope, it has already taken me some time...

OK.

> >
> > That said, I wonder whether it makes sense to initially only add
> > the parts having dg-final directives (that PASS or XFAIL), just
> > adding testcases for testing compile looks superfluous.
> >
> > All of the testcases are dg-do compile, but vectorizer testcases
> > ideally would come with runtime verification.  I assume the
> > original TSVC provides this and as you include tsvc.h in all
> > tests I suppose including a runtime harness would be possible, no?
>
> All right, I'm also adding run-time checking. It took me some time to make
> array initialization for all tests independent. Plus I reduced the number of
> iterations to 1/10 of the original. That makes the tests quite fast.
>
> What do you think about it now?

It looks nice now, but as said above some of the scan-tree-dump-not
should probably be xfailed scan-tree-dump, I was suggesting the
-not for the cases where vectorizing would be semantically wrong.
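
To illustrate the distinction, here is a small sketch that is not taken
from TSVC; the function names and the array size N are made up. The first
loop has independent iterations, so a test for it would look for
vectorization (xfailed as long as GCC misses it). The second loop carries
a dependence from one iteration to the next, so a straightforward
lane-by-lane vectorization would change the result, and a
scan-tree-dump-not is the natural directive.

```c
#include <assert.h>

#define N 8

/* Independent iterations: a candidate for
   scan-tree-dump "vectorized 1 loops", xfailed while not vectorized.  */
void
add_arrays (float *restrict a, const float *restrict b,
	    const float *restrict c)
{
  for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs the a[i - 1] computed in the
   previous iteration, so computing several lanes at once from the old
   values of a would be wrong.  */
void
prefix_sum (float *a, const float *b)
{
  assert (N > 1);
  for (int i = 1; i < N; i++)
    a[i] = a[i - 1] + b[i];
}
```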

So I'd say OK with that change.

Thanks,
Richard.

> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Thanks,
> >> Martin
> >>
> >>>
> >>> Richard.


Handle retslot_flags in ipa-modref and PTA

2021-10-29 Thread Jan Hubicka via Gcc-patches
Hi,
this patch extends modref and tree-ssa-structalias to handle retslot flags.
Since retslot is essentially a hidden argument that is known to be write-only
we can do pretty much the same stuff as we do for regular parameters.
I plan to add static chain handling in a similar way.
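
For readers less familiar with return slots, the idea can be pictured at
the source level as below. This is only a sketch: struct big and both
function names are invented for the illustration, and whether an
aggregate is really returned through a hidden pointer depends on the
target ABI.

```c
#include <assert.h>

struct big { int v[16]; };

/* Source-level view: the aggregate is returned by value.  */
struct big
make_big (int seed)
{
  struct big r;
  for (int i = 0; i < 16; i++)
    r.v[i] = seed + i;
  return r;
}

/* Roughly the same function with the return slot made explicit: the
   caller passes the address of the result object and the callee only
   writes through it, never reads it.  */
void
make_big_explicit (struct big *retslot, int seed)
{
  assert (retslot != 0);
  for (int i = 0; i < 16; i++)
    retslot->v[i] = seed + i;
}
```

The second form shows why flags such as EAF_NOREAD are a sensible
starting point for the slot.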

We do not handle IPA propagation of retslot flags (where return slot is
initialized via return slot of other function). For this ipa-prop needs
to be extended to understand retslot as well.

Martin, I wonder if we could look into the ipa-prop bits as well as
dropping ancestor functions and adding jump functions for return
functions (which initially do not need to be used by ipa-cp, avoiding
your problem with forming non-trivial SCCs on an acyclic callgraph).
They would be immediately useful for modref and work on ipa-cp can be
done incrementally.

Bootstrapped/regtested x86_64-linux, OK for the gimple bits?

Honza

gcc/ChangeLog:

* gimple.c (gimple_call_retslot_flags): New function.
* gimple.h (gimple_call_retslot_flags): Declare.
* ipa-modref.c: Include tree-cfg.h.
(struct escape_entry): Turn parm_index to signed.
(modref_summary_lto::modref_summary_lto): Add retslot_flags.
(modref_summary::modref_summary): Initialize retslot_flags.
(struct modref_summary_lto): Likewise.
(modref_summary::useful_p): Check retslot_flags.
(modref_summary_lto::useful_p): Likewise.
(modref_summary::dump): Dump retslot_flags.
(modref_summary_lto::dump): Likewise.
(struct escape_point): Add hidden_args enum.
(analyze_ssa_name_flags): Ignore return slot return;
use gimple_call_retslot_flags.
(record_escape_points): Break out from ...
(analyze_parms): ... here; handle retslot_flags.
(modref_summaries::duplicate): Duplicate retslot_flags.
(modref_summaries_lto::duplicate): Likewise.
(modref_write_escape_summary): Stream parm_index as signed.
(modref_read_escape_summary): Likewise.
(modref_write): Stream retslot_flags.
(read_section): Likewise.
(struct escape_map): Fix typo in comment.
(update_escape_summary_1): Fix whitespace.
(ipa_merge_modref_summary_after_inlining): Drop retslot_flags.
(modref_merge_call_site_flags): Merge retslot_flags.
* ipa-modref.h (struct modref_summary): Add retslot_flags.
* tree-ssa-structalias.c (handle_rhs_call): Handle retslot_flags.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index cc7a88e822b..22dd6417d19 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1608,6 +1613,40 @@ gimple_call_arg_flags (const gcall *stmt, unsigned arg)
   return flags;
 }
 
+/* Detects argument flags for return slot on call STMT.  */
+
+int
+gimple_call_retslot_flags (const gcall *stmt)
+{
+  int flags = EAF_DIRECT | EAF_NOREAD;
+
+  tree callee = gimple_call_fndecl (stmt);
+  if (callee)
+{
+  cgraph_node *node = cgraph_node::get (callee);
+  modref_summary *summary = node ? get_modref_function_summary (node)
+   : NULL;
+
+  if (summary)
+   {
+ int modref_flags = summary->retslot_flags;
+
+ /* We have possibly optimized out load.  Be conservative here.  */
+ if (!node->binds_to_current_def_p ())
+   {
+ if ((modref_flags & EAF_UNUSED) && !(flags & EAF_UNUSED))
+   {
+ modref_flags &= ~EAF_UNUSED;
+ modref_flags |= EAF_NOESCAPE;
+   }
+   }
+ if (dbg_cnt (ipa_mod_ref_pta))
+   flags |= modref_flags;
+   }
+}
+  return flags;
+}
+
 /* Detects return flags for the call STMT.  */
 
 int
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 303623b3ced..23a124ec769 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1589,6 +1589,7 @@ gimple_seq gimple_seq_copy (gimple_seq);
 bool gimple_call_same_target_p (const gimple *, const gimple *);
 int gimple_call_flags (const gimple *);
 int gimple_call_arg_flags (const gcall *, unsigned);
+int gimple_call_retslot_flags (const gcall *);
 int gimple_call_return_flags (const gcall *);
 bool gimple_call_nonnull_result_p (gcall *);
 tree gimple_call_nonnull_arg (gcall *);
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 0bbec8df0a2..4c59194c521 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "tree-ssanames.h"
 #include "attribs.h"
+#include "tree-cfg.h"
 
 
 namespace {
@@ -133,7 +134,7 @@ static fnspec_summaries_t *fnspec_summaries = NULL;
 struct escape_entry
 {
   /* Parameter that escapes at a given call.  */
-  unsigned int parm_index;
+  int parm_index;
   /* Argument it escapes to.  */
   unsigned int arg;
   /* Minimal flags known about the argument.  */
@@ -269,7 +270,7 @@ static GTY(()) fast_function_summary 
 /* Summary for a single function which this pass produces.  */
 
 modref_summary::modref_summary ()
-  

Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 29, 2021 at 10:10 AM Richard Biener
 wrote:
>
> On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez  wrote:
> >
> > On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
> >  wrote:
> >
> > > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > > run the passes in not fully resolving modes at -O1.
> >
> > Sorry for the awkward naming.  I couldn't find a better name :-/.
> > Suggestions welcome.
> >
> > The fast mode assumes any unknown ranges on entry to a path to be
> > VARYING, whereas the fully resolving mode will ask the ranger, so the
> > fully resolving mode will indeed be slower.  Though, I haven't
> > measured how much.  However, we are gaining some time in total
> > compilation speed (1.32%) by replacing two threaders with one.
>
> OK.  Just again, -O1 is to favor compile-speed and should crunch through
> those incredibly stupi^Wlarge machine-generated sources without problems.
> But from your comment it doesn't sound like something completely unreasonable
> or slow.

It shouldn't be a problem.  Andrew has worked hard at handling those
large CFGs, and I'm just leveraging his work.  The backward threader
also has a limit of 10 blocks look-back.  But if it becomes a problem,
I'm more than happy to gate the fully resolving threader with
-fexpensive-optimizations, but we will lose threading ability at -O1.
I assume that's OK?

FWIW, Andrew has mentioned providing a fast mode for the ranger for
precisely those huge CFGs.  Perhaps when that's ready, we could use
that mode for -O1.

>
> > >
> > > Btw, there were quite a few big compile-time hogs with the vrp_threader
> > > passes, not sure if this solves those.
> >
> > Sorry for not commenting on your spec ltrans report.  I was waiting
> > until this went in to get a better feel of whether it was the path
> > solver, the forward threader, or something else.  When I commit this
> > patch we'll get the forward threader out of the set of variables to
> > examine.  The forward threader, for instance, has very few knobs
> > limiting its behavior, and coupled with a smarter solver, who knows
> > what's going on.
> >
> > It is possible we may need to add a few knobs (or re-add some of the
> > ones I removed??), since the backward threader can find a whole slew
> > of paths that the forward threader could never find.
>
> Yeah, sure.  I'll wait until this change is in and will re-measure and update
> the PR.

I'm working through a regression on ppc64, but I should be able to
push later today.

Thanks.
Aldy



Re: [PATCH] regcprop: Determine subreg offset depending on endianness [PR101260]

2021-10-29 Thread Stefan Schulze Frielinghaus via Gcc-patches
ping

On Mon, Oct 11, 2021 at 02:14:53PM +0200, Stefan Schulze Frielinghaus wrote:
> On Mon, Oct 11, 2021 at 09:38:36AM +0200, Richard Biener wrote:
> > On Fri, Oct 8, 2021 at 1:31 PM Stefan Schulze Frielinghaus via
> > Gcc-patches  wrote:
> > >
> > > gcc/ChangeLog:
> > >
> > > * regcprop.c (maybe_mode_change): Determine offset relative to
> > > high or low part depending on endianness.
> > >
> > > Bootstrapped and regtested on IBM Z. Ok for mainline and gcc-{11,10,9}?
> > 
> > Is there a testcase to add?
> 
> I've updated the patch and added the testcase from the PR.
> 
> > 
> > > ---
> > >  gcc/regcprop.c | 11 ---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> > > index d2a01130fe1..0e1ac12458a 100644
> > > --- a/gcc/regcprop.c
> > > +++ b/gcc/regcprop.c
> > > @@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, 
> > > machine_mode copy_mode,
> > > copy_nregs, &bytes_per_reg))
> > > return NULL_RTX;
> > >poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs);
> > > -  poly_uint64 offset
> > > -   = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + 
> > > copy_offset,
> > > - GET_MODE_SIZE (orig_mode));
> > > +  poly_uint64 offset =
> > > +#if WORDS_BIG_ENDIAN
> > > +   subreg_size_highpart_offset
> > > +#else
> > > +   subreg_size_lowpart_offset
> > > +#endif
> > > +   (GET_MODE_SIZE (new_mode) + 
> > > copy_offset,
> > > +GET_MODE_SIZE (orig_mode));
> > >regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);
> > >if (targetm.hard_regno_mode_ok (regno, new_mode))
> > > return gen_raw_REG (new_mode, regno);
> > > --
> > > 2.31.1
> > >

> From 299959788321e21c27f0d4a6d437a586c5f6c92e Mon Sep 17 00:00:00 2001
> From: Stefan Schulze Frielinghaus 
> Date: Mon, 4 Oct 2021 09:36:21 +0200
> Subject: [PATCH] regcprop: Determine subreg offset depending on endianness
>  [PR101260]
> 
> gcc/ChangeLog:
> 
>   * regcprop.c (maybe_mode_change): Determine offset relative to
>   high or low part depending on endianness.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101260.c: New test.
> ---
>  gcc/regcprop.c  | 11 ++--
>  gcc/testsuite/gcc.dg/pr101260.c | 49 +
>  2 files changed, 57 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101260.c
> 
> diff --git a/gcc/regcprop.c b/gcc/regcprop.c
> index d2a01130fe1..0e1ac12458a 100644
> --- a/gcc/regcprop.c
> +++ b/gcc/regcprop.c
> @@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
> copy_mode,
>   copy_nregs, &bytes_per_reg))
>   return NULL_RTX;
>poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs);
> -  poly_uint64 offset
> - = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset,
> -   GET_MODE_SIZE (orig_mode));
> +  poly_uint64 offset =
> +#if WORDS_BIG_ENDIAN
> + subreg_size_highpart_offset
> +#else
> + subreg_size_lowpart_offset
> +#endif
> + (GET_MODE_SIZE (new_mode) + copy_offset,
> +  GET_MODE_SIZE (orig_mode));
>regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);
>if (targetm.hard_regno_mode_ok (regno, new_mode))
>   return gen_raw_REG (new_mode, regno);
> diff --git a/gcc/testsuite/gcc.dg/pr101260.c b/gcc/testsuite/gcc.dg/pr101260.c
> new file mode 100644
> index 000..0e9ec4e203a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101260.c
> @@ -0,0 +1,49 @@
> +/* PR rtl-optimization/101260 */
> +/* { dg-do run } */
> +/* { dg-options -O1 } */
> +struct a {
> +  unsigned b : 7;
> +  int c;
> +  int d;
> +  short e;
> +} p, *q = 
> +int f, g, h, i, r, s;
> +static short j[8][1][6] = {0};
> +char k[7];
> +short l, m;
> +int *n;
> +int **o = 
> +void t() {
> +  for (; f;)
> +;
> +}
> +static struct a u(int x) {
> +  struct a a = {4, 8, 5, 4};
> +  for (; i <= 6; i++) {
> +struct a v = {0};
> +for (; l; l++)
> +  h = 0;
> +for (; h >= 0; h--) {
> +  struct a *w;
> +  j[i];
> +  w = 
> +  s = 0;
> +  for (; s < 3; s++) {
> +r ^= x;
> +m = j[i][g][h] == (k[g] = g);
> +*w = v;
> +  }
> +  r = 2;
> +  for (; r; r--)
> +*o = 
> +}
> +  }
> +  t();
> +  return a;
> +}
> +int main() {
> +  *q = u(636);
> +  if (p.b != 4)
> +__builtin_abort ();
> +  return 0;
> +}
> -- 
> 2.31.1
> 
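
For readers following the quoted patch, the two helpers it chooses
between can be pictured as below. This is a simplified model, not the
GCC implementation: it assumes byte and word endianness agree, sizes are
plain integers rather than poly_uint64, and the function names are made
up.

```c
#include <assert.h>

/* Byte offset of the least significant OUTER_BYTES within a value of
   INNER_BYTES (simplified model, consistent endianness assumed).  */
unsigned
lowpart_offset (unsigned outer_bytes, unsigned inner_bytes, int big_endian)
{
  assert (outer_bytes <= inner_bytes);
  return big_endian ? inner_bytes - outer_bytes : 0;
}

/* Byte offset of the most significant OUTER_BYTES within a value of
   INNER_BYTES under the same assumptions.  */
unsigned
highpart_offset (unsigned outer_bytes, unsigned inner_bytes, int big_endian)
{
  assert (outer_bytes <= inner_bytes);
  return big_endian ? 0 : inner_bytes - outer_bytes;
}
```

For a 4-byte piece of an 8-byte value, the low part sits at offset 0 on
little endian and at offset 4 on big endian; the high part is the mirror
image.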



Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, Oct 29, 2021 at 10:06 AM Aldy Hernandez  wrote:
>
> On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
>  wrote:
>
> > Btw, in case the "fully resolving" mode is slower than not fully resolving
> > please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> > run the passes in not fully resolving modes at -O1.
>
> Sorry for the awkward naming.  I couldn't find a better name :-/.
> Suggestions welcome.
>
> The fast mode assumes any unknown ranges on entry to a path to be
> VARYING, whereas the fully resolving mode will ask the ranger, so the
> fully resolving mode will indeed be slower.  Though, I haven't
> measured how much.  However, we are gaining some time in total
> compilation speed (1.32%) by replacing two threaders with one.

OK.  Just again, -O1 is to favor compile-speed and should crunch through
those incredibly stupi^Wlarge machine-generated sources without problems.
But from your comment it doesn't sound like something completely unreasonable
or slow.

> >
> > Btw, there were quite a few big compile-time hogs with the vrp_threader
> > passes, not sure if this solves those.
>
> Sorry for not commenting on your spec ltrans report.  I was waiting
> until this went in to get a better feel of whether it was the path
> solver, the forward threader, or something else.  When I commit this
> patch we'll get the forward threader out of the set of variables to
> examine.  The forward threader, for instance, has very few knobs
> limiting its behavior, and coupled with a smarter solver, who knows
> what's going on.
>
> It is possible we may need to add a few knobs (or re-add some of the
> ones I removed??), since the backward threader can find a whole slew
> of paths that the forward threader could never find.

Yeah, sure.  I'll wait until this change is in and will re-measure and update
the PR.

Richard.

> Aldy
>


Re: [PATCH] Preserve location in gimple_fold_builtin_memset

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, 29 Oct 2021, Jakub Jelinek wrote:

> Hi!
> 
> As mentioned yesterday, gimple_fold_builtin_memset doesn't preserve
> locus which means e.g. the -Wstringop-overflow warnings are emitted as:
> In function 'test_max':
> cc1: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
> The function emits up to 2 new statements, but the latter (asgn) is added
> through gsi_replace and therefore the locus is copied over from the call.
> But store is emitted before the call and optionally the call removed
> afterwards, so locus needs to be copied over manually.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.

Thanks,
Richard.

> 2021-10-29  Jakub Jelinek  
> 
>   * gimple-fold.c (gimple_fold_builtin_memset): Copy over location from
>   call to store.
> 
>   * gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.
> 
> --- gcc/gimple-fold.c.jj  2021-10-12 09:36:57.728450483 +0200
> +++ gcc/gimple-fold.c 2021-10-28 12:42:07.638741783 +0200
> @@ -1505,6 +1505,7 @@ gimple_fold_builtin_memset (gimple_stmt_
>var = fold_build2 (MEM_REF, etype, dest, build_int_cst (ptr_type_node, 0));
>gimple *store = gimple_build_assign (var, build_int_cst_type (etype, 
> cval));
>gimple_move_vops (store, stmt);
> +  gimple_set_location (store, gimple_location (stmt));
>gsi_insert_before (gsi, store, GSI_SAME_STMT);
>if (gimple_call_lhs (stmt))
>  {
> --- gcc/testsuite/gcc.dg/Wstringop-overflow-62.c.jj   2021-10-28 
> 12:24:21.909780099 +0200
> +++ gcc/testsuite/gcc.dg/Wstringop-overflow-62.c  2021-10-28 
> 12:43:52.023273297 +0200
> @@ -223,7 +223,7 @@ void test_max (void)
>  
>  char *q = MAX (pi, pj);
>  
> -memset (q, 0, 1); // { dg-warning "writing 1 byte into a region 
> of size 0 " "" { target *-*-* } 0 }
> +memset (q, 0, 1); // { dg-warning "writing 1 byte into a region 
> of size 0 " }
>  memset (q, 0, 2); // { dg-warning "writing 2 bytes into a region 
> of size 0 " }
>}
>  
> @@ -345,7 +345,7 @@ void test_max (void)
> not reflected in the determaxed offset).  */
>  char *q = MAX (p1, p2);
>  
> -memset (q, 0, 1);
> +memset (q, 0, 1); // { dg-warning "writing 1 byte into a region 
> of size 0 " }
>}
>  
>{
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Aldy Hernandez via Gcc-patches
On Fri, Oct 29, 2021 at 9:30 AM Richard Biener
 wrote:

> Btw, in case the "fully resolving" mode is slower than not fully resolving
> please consider gating it on -fexpensive-optimizations (aka -O2+), thus
> run the passes in not fully resolving modes at -O1.

Sorry for the awkward naming.  I couldn't find a better name :-/.
Suggestions welcome.

The fast mode assumes any unknown ranges on entry to a path to be
VARYING, whereas the fully resolving mode will ask the ranger, so the
fully resolving mode will indeed be slower.  Though, I haven't
measured how much.  However, we are gaining some time in total
compilation speed (1.32%) by replacing two threaders with one.
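
As a reminder of what a threader looks for, here is a tiny made-up
example (the function name classify is hypothetical). Whether a given
pass actually threads this path depends on its heuristics; the sketch
only shows the shape of the opportunity.

```c
#include <assert.h>

int
classify (int x)
{
  int small = 0;
  if (x < 10)
    small = 1;
  /* On the path that went through the assignment above, small is known
     to be 1, so that path can jump straight to "return 1" without
     re-testing the condition below.  The path that skipped the
     assignment can likewise go straight to "return 2".  */
  if (small)
    return 1;
  return 2;
}
```

In this example everything needed is defined on the path itself. The two
modes differ when a condition depends on a name defined before the path
starts: the fast mode treats its range as VARYING, while the fully
resolving mode asks the ranger for it.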

>
> Btw, there were quite a few big compile-time hogs with the vrp_threader
> passes, not sure if this solves those.

Sorry for not commenting on your spec ltrans report.  I was waiting
until this went in to get a better feel of whether it was the path
solver, the forward threader, or something else.  When I commit this
patch we'll get the forward threader out of the set of variables to
examine.  The forward threader, for instance, has very few knobs
limiting its behavior, and coupled with a smarter solver, who knows
what's going on.

It is possible we may need to add a few knobs (or re-add some of the
ones I removed??), since the backward threader can find a whole slew
of paths that the forward threader could never find.

Aldy



[PATCH] Preserve location in gimple_fold_builtin_memset

2021-10-29 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned yesterday, gimple_fold_builtin_memset doesn't preserve
locus which means e.g. the -Wstringop-overflow warnings are emitted as:
In function 'test_max':
cc1: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
The function emits up to 2 new statements, but the latter (asgn) is added
through gsi_replace and therefore the locus is copied over from the call.
But store is emitted before the call and optionally the call removed
afterwards, so locus needs to be copied over manually.
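
As a rough source-level picture of the folding involved (a sketch with
made-up function names, not a reproducer for the warning): a small
constant-size memset can be replaced by a plain store, and that store is
a new statement which needs the location of the original call.

```c
#include <assert.h>
#include <string.h>

/* What the user wrote.  */
void
clear_first_byte_call (char *p)
{
  memset (p, 0, 1);
}

/* Roughly what the folded form corresponds to: a single byte store.
   Any diagnostic about this store should still point at the memset.  */
void
clear_first_byte_store (char *p)
{
  *p = 0;
}
```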

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2021-10-29  Jakub Jelinek  

* gimple-fold.c (gimple_fold_builtin_memset): Copy over location from
call to store.

* gcc.dg/Wstringop-overflow-62.c: Adjust expected diagnostics.

--- gcc/gimple-fold.c.jj  2021-10-12 09:36:57.728450483 +0200
+++ gcc/gimple-fold.c   2021-10-28 12:42:07.638741783 +0200
@@ -1505,6 +1505,7 @@ gimple_fold_builtin_memset (gimple_stmt_
   var = fold_build2 (MEM_REF, etype, dest, build_int_cst (ptr_type_node, 0));
   gimple *store = gimple_build_assign (var, build_int_cst_type (etype, cval));
   gimple_move_vops (store, stmt);
+  gimple_set_location (store, gimple_location (stmt));
   gsi_insert_before (gsi, store, GSI_SAME_STMT);
   if (gimple_call_lhs (stmt))
 {
--- gcc/testsuite/gcc.dg/Wstringop-overflow-62.c.jj  2021-10-28 12:24:21.909780099 +0200
+++ gcc/testsuite/gcc.dg/Wstringop-overflow-62.c  2021-10-28 12:43:52.023273297 +0200
@@ -223,7 +223,7 @@ void test_max (void)
 
 char *q = MAX (pi, pj);
 
-memset (q, 0, 1); // { dg-warning "writing 1 byte into a region of size 0 " "" { target *-*-* } 0 }
+memset (q, 0, 1); // { dg-warning "writing 1 byte into a region of size 0 " }
 memset (q, 0, 2); // { dg-warning "writing 2 bytes into a region of size 0 " }
   }
 
@@ -345,7 +345,7 @@ void test_max (void)
not reflected in the determaxed offset).  */
 char *q = MAX (p1, p2);
 
-memset (q, 0, 1);
+memset (q, 0, 1); // { dg-warning "writing 1 byte into a region of size 0 " }
   }
 
   {

Jakub



Re: [PATCH] vect: Add bias parameter for partial vectorization

2021-10-29 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Hi,
>
> as discussed in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582627.html this
> introduces a bias parameter for the len_load/len_store ifns as well as
> optabs that is meant to distinguish between Power and s390 variants.
> The default is a bias of 0, while in s390's case vll/vstl do not support
> lengths of zero bytes and a bias of -1 should be used.
>
> Bootstrapped and regtested on Power9 (--with-cpu=power9) and s390
> (--with-arch=z15).
>
> The tiny changes in the Power backend I will post separately.
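
One way to picture the bias, as I read the description above, is the
sketch below. It is only a model under stated assumptions: the function
name len_load_model is made up, the "length-style" instruction stands in
for the Power variant and the "index-style" instruction for an s390-like
one whose operand is the index of the last byte, and the operand handed
to the instruction is assumed to be LEN + BIAS.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: copy LEN bytes from SRC to DST the way a target
   with the given BIAS would.  BIAS is 0 or -1.  */
void
len_load_model (unsigned char *dst, const unsigned char *src,
		int len, int bias)
{
  assert (bias == 0 || bias == -1);
  int operand = len + bias;
  assert (operand >= 0);
  if (bias == 0)
    /* Length-style instruction: the operand is a byte count and may
       be zero.  */
    memcpy (dst, src, (size_t) operand);
  else
    /* Index-style instruction: the operand is the index of the last
       byte, so at least one byte is always transferred.  */
    memcpy (dst, src, (size_t) operand + 1);
}
```

In this model both variants transfer the same LEN bytes, but with a bias
of -1 a length of zero cannot be expressed.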

Some comments in addition to what Kewen said:

> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 8312d08aab2..993e32c1854 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -2696,9 +2696,9 @@ expand_call_mem_ref (tree type, gcall *stmt, int index)
>  static void
>  expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, target, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, target, mask, bias;
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> @@ -2727,7 +2727,16 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>TYPE_UNSIGNED (TREE_TYPE (maskt)));
>else
>  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +  if (optab == len_load_optab)
> +{
> +  biast = gimple_call_arg (stmt, 3);
> +  bias = expand_normal (biast);
> +  create_input_operand (&ops[3], bias, QImode);
> +  expand_insn (icode, 4, ops);
> +}
> +  else
> +expand_insn (icode, 3, ops);
> +

The previous “if” is also for len_load_optab, so it seems better to
combine the two.

>if (!rtx_equal_p (target, ops[0].value))
>  emit_move_insn (target, ops[0].value);
>  }
> @@ -2741,9 +2750,9 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  static void
>  expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  {
> -  class expand_operand ops[3];
> -  tree type, lhs, rhs, maskt;
> -  rtx mem, reg, mask;
> +  class expand_operand ops[4];
> +  tree type, lhs, rhs, maskt, biast;
> +  rtx mem, reg, mask, bias;
>insn_code icode;
>  
>maskt = gimple_call_arg (stmt, 2);
> @@ -2770,7 +2779,16 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>TYPE_UNSIGNED (TREE_TYPE (maskt)));
>else
>  create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
> -  expand_insn (icode, 3, ops);
> +
> +  if (optab == len_store_optab)
> +{
> +  biast = gimple_call_arg (stmt, 4);
> +  bias = expand_normal (biast);
> +  create_input_operand (&ops[3], bias, QImode);
> +  expand_insn (icode, 4, ops);
> +}
> +  else
> +expand_insn (icode, 3, ops);

Same idea here.

>  }
>  
>  #define expand_mask_store_optab_fn expand_partial_store_optab_fn
> @@ -4172,6 +4190,30 @@ internal_check_ptrs_fn_supported_p (internal_fn ifn, tree type,
> && insn_operand_matches (icode, 4, GEN_INT (align)));
>  }
>  
> +/* Return the supported bias for the len_load IFN.  For now we support a
> +   default bias of 0 and -1 in case 0 is not an allowable length for len_load.

Neither bias is really the default.  How about just “For now we only
support the biases 0 and -1.”?

> +   If none of these biases match what the backend provides, return
> +   VECT_PARTIAL_BIAS_UNSUPPORTED.  */
> +
> +signed char
> +internal_len_load_bias_supported (internal_fn ifn, machine_mode mode)

Since this is now providing the bias rather than asking about a
given bias (thanks), I think we should drop “_supported”.

> +{
> +  optab optab = direct_internal_fn_optab (ifn);
> +  insn_code icode = direct_optab_handler (optab, mode);
> +
> +  if (icode != CODE_FOR_nothing)
> +{
> +  /* We only support a bias of 0 (default) or -1.  Try both
> +  of them.  */
> +  if (insn_operand_matches (icode, 3, GEN_INT (0)))
> + return 0;
> +  else if (insn_operand_matches (icode, 3, GEN_INT (-1)))

Minor nit, but: redundant else.
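
To spell out the nit: once the "if" branch returns, the "else" adds nothing.
A minimal C sketch of the shape I have in mind follows.  Note that
operand_matches, pick_bias and BIAS_UNSUPPORTED are hypothetical stand-ins
invented for this illustration; they are not insn_operand_matches or
VECT_PARTIAL_BIAS_UNSUPPORTED, and the "target" is just an integer here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for VECT_PARTIAL_BIAS_UNSUPPORTED.  */
#define BIAS_UNSUPPORTED 127

/* Hypothetical stand-in for insn_operand_matches: pretend the target
   accepts exactly one bias value, ACCEPTED_BIAS.  */
static bool
operand_matches (int accepted_bias, int candidate)
{
  return accepted_bias == candidate;
}

/* Return the bias the pretend target accepts, trying 0 and then -1.  */
signed char
pick_bias (int accepted_bias)
{
  if (operand_matches (accepted_bias, 0))
    return 0;
  /* No "else" needed: the branch above has already returned.  */
  if (operand_matches (accepted_bias, -1))
    return -1;
  return BIAS_UNSUPPORTED;
}
```

With this shape, a pretend target that accepts neither 0 nor -1 simply falls
through to the final return.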

> + return -1;
> +}
> +
> +  return VECT_PARTIAL_BIAS_UNSUPPORTED;
> +}
> +
>  /* Expand STMT as though it were a call to internal function FN.  */
>  
>  void
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 19d0f849a5a..af28cf0d566 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -227,6 +227,10 @@ extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
>   tree, tree, int);
>  extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
>   poly_uint64, unsigned int);
> +#define VECT_PARTIAL_BIAS_UNSUPPORTED 127
> +
> +extern signed char internal_len_load_bias_supported (internal_fn 

[PATCH] Force -fexcess-precision=standard for fp-uint64-convert-double-1.c

2021-10-29 Thread Richard Biener via Gcc-patches
This forces -fexcess-precision=standard since the testcase is
otherwise prone to fail with x87 math.

Tested on x86_64-unknown-linux-gnu with {,-m32}, pushed.

2021-10-29  Richard Biener  

* gcc.dg/torture/fp-uint64-convert-double-1.c: Add
-fexcess-precision=standard.
---
 gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c 
b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
index b40a16a2257..fadad8c3198 100644
--- a/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
+++ b/gcc/testsuite/gcc.dg/torture/fp-uint64-convert-double-1.c
@@ -1,7 +1,7 @@
 /* PR84407 */
 /* { dg-do run } */
 /* { dg-require-effective-target fenv } */
-/* { dg-additional-options "-frounding-math" } */
+/* { dg-additional-options "-frounding-math -fexcess-precision=standard" } */
 
 #include 
 #include 
-- 
2.31.1


Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3

2021-10-29 Thread Richard Biener via Gcc-patches
On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
Gcc-patches  wrote:
>
> From: Bernhard Reutner-Fischer 
>
> Bump required DejaGnu version to 1.5.3 (or later).
> Ok for trunk?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * doc/install.texi: Bump required minimum DejaGnu version.
> ---
>  gcc/doc/install.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 36c8280d7da..094469b9a4e 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -452,7 +452,7 @@ Necessary when modifying @command{gperf} input files, e.g.@:
>  @file{gcc/cp/cfns.gperf} to regenerate its associated header file, e.g.@:
>  @file{gcc/cp/cfns.h}.
>
> -@item DejaGnu 1.4.4
> +@item DejaGnu version 1.5.3 (or later)
>  @itemx Expect
>  @itemx Tcl
>  @c Once Tcl 8.5 or higher is required, remove any obsolete
> --
> 2.33.0
>


Re: [PATCH] Remove VRP threader passes in exchange for better threading pre-VRP.

2021-10-29 Thread Richard Biener via Gcc-patches
On Thu, Oct 28, 2021 at 8:34 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 10/28/2021 9:24 AM, Aldy Hernandez wrote:
> > This patch upgrades the pre-VRP threading passes to fully resolving
> > backward threaders, and removes the post-VRP threading passes altogether.
> > With it, we reduce the number of threaders in our pipeline from 9 to 7.
> >
> > This will leave DOM as the only forward threader client.  When the ranger
> > can handle floats, we should be able to upgrade the pre-DOM threaders to
> > fully resolving threaders and kill the embedded DOM threader.
> >
> > The final numbers are:
> >
> >   prev: # threads in backward + vrp-threaders = 92624
> >   now:  # threads in backward threaders = 94275
> >   Gain: +1.78%
> >
> >   prev: # total threads: 189495
> >   now:  # total threads: 193714
> >   Gain: +2.22%
> >
> >   The numbers are not as great as my initial proposal, but I've
> >   recently pushed all the work that got us to this point ;-).
> >
> > And... the total compilation improves by 1.32%!
> >
> > There's a regression on uninit-pred-7_a.c that I've yet to look at.  I
> > want to make sure it's not a missing thread.  If it is, I'll create a PR
> > and own it.
> >
> > Also, the tree-ssa/phi_on_compare-*.c tests have all regressed.  This
> > seems to be some special case the forward threader handles that the
> > backward threader does not (edge_forwards_cmp_to_conditional_jump*).
> > I haven't dug deep to see if this is solvable within our
> > infrastructure, but a cursory look shows that even though the VRP
> > threader threads this, the *.optimized dump ends with more conditional
> > jumps than without the optimization.  I'd like to punt on this for
> > now, because DOM actually catches this through its lone use of the
> > forward threader (I've adjusted the tests).  However, we will need to
> > address this sooner or later, if indeed it's still improving the final
> > assembly.
> >
> > Even though we have been incrementally stressing all the pieces of this
> > intricate puzzle, I do expect fallout.  My plan from here until stage1
> > ends is to stop new development in the threader(s), and focus on bug
> > fixing and improving the developer's debugging experience.
> >
> > OK pending another round of tests on x86-64 and ppc64le Linux?
> >
> > gcc/ChangeLog:
> >
> >   * passes.def: Replace the pass_thread_jumps before VRP* with
> >   pass_thread_jumps_full.  Remove all pass_vrp_threader instances.
> >
> > libgomp/ChangeLog:
> >
> >   * testsuite/libgomp.graphite/force-parallel-4.c: Adjust for threading changes.
> >   * testsuite/libgomp.graphite/force-parallel-8.c: Same.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/loop-unswitch-2.c: Adjust for threading changes.
> >   * gcc.dg/old-style-asm-1.c: Same.
> >   * gcc.dg/tree-ssa/phi_on_compare-1.c: Same.
> >   * gcc.dg/tree-ssa/phi_on_compare-2.c: Same.
> >   * gcc.dg/tree-ssa/phi_on_compare-3.c: Same.
> >   * gcc.dg/tree-ssa/phi_on_compare-4.c: Same.
> >   * gcc.dg/tree-ssa/pr20701.c: Same.
> >   * gcc.dg/tree-ssa/pr21001.c: Same.
> >   * gcc.dg/tree-ssa/pr21294.c: Same.
> >   * gcc.dg/tree-ssa/pr21417.c: Same.
> >   * gcc.dg/tree-ssa/pr21559.c: Same.
> >   * gcc.dg/tree-ssa/pr21563.c: Same.
> >   * gcc.dg/tree-ssa/pr49039.c: Same.
> >   * gcc.dg/tree-ssa/pr59597.c: Same.
> >   * gcc.dg/tree-ssa/pr61839_1.c: Same.
> >   * gcc.dg/tree-ssa/pr61839_3.c: Same.
> >   * gcc.dg/tree-ssa/pr66752-3.c: Same.
> >   * gcc.dg/tree-ssa/pr68198.c: Same.
> >   * gcc.dg/tree-ssa/pr77445-2.c: Same.
> >   * gcc.dg/tree-ssa/pr77445.c: Same.
> >   * gcc.dg/tree-ssa/ranger-threader-1.c: Same.
> >   * gcc.dg/tree-ssa/ranger-threader-2.c: Same.
> >   * gcc.dg/tree-ssa/ranger-threader-4.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-1.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-11.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-12.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-14.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-16.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Same.
> >   * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
> >   * gcc.dg/tree-ssa/ssa-thread-14.c: Same.
> >   * gcc.dg/tree-ssa/ssa-thread-backedge.c: Same.
> >   * gcc.dg/tree-ssa/ssa-vrp-thread-1.c: Same.
> >   * gcc.dg/tree-ssa/vrp02.c: Same.
> >   * gcc.dg/tree-ssa/vrp03.c: Same.
> >   * gcc.dg/tree-ssa/vrp05.c: Same.
> >   * gcc.dg/tree-ssa/vrp06.c: Same.
> >   * gcc.dg/tree-ssa/vrp07.c: Same.
> >   * gcc.dg/tree-ssa/vrp08.c: Same.
> >   * gcc.dg/tree-ssa/vrp09.c: Same.
> >   * gcc.dg/tree-ssa/vrp106.c: Same.
> >   * gcc.dg/tree-ssa/vrp33.c: Same.
> OK.  And yes, there will probably be fallout.  Fully expected and we'll
> deal with it.

Btw, in case the "fully resolving" mode is slower than not fully resolving
please consider gating it on 

Re: [PATCH] configure: Avoid unnecessary constraints on executables for $build.

2021-10-29 Thread Richard Biener via Gcc-patches
On Thu, Oct 28, 2021 at 5:44 PM Iain Sandoe  wrote:
>
> Hi Richard,
>
> > On 8 Sep 2021, at 07:35, Richard Biener  wrote:
> >
> > On Tue, Sep 7, 2021 at 10:11 PM Iain Sandoe  wrote:
> >>
>
> >> So, looking through the various email threads and the PR, I think that
> >> what has happened is :
> >>
> >> As the PR points out, our existing PCH model does not work if the compiler
> >> executable is PIE - which manifests on platforms like Darwin (which is PIE
> >> by default) or Linux when configured --enable-default-pie.
> >>
> >> H.J’s original patch forces no-PIE onto the compiler executables, and
> >> because of shared code on $host also to the driver etc.
>
> >> OK for master, and eventually backports?
> >
> > OK for trunk, I think it warrants quite some soaking time before considering
> > backports.
>
> It’s been on master for quite some time now (and presumably several cycles of
> everyone’s CI) without any reports of problems; it would be good to get this
> at least onto 11 and 10 (since that is the last version we can bootstrap with
> C++98).
>
> OK for backports now?

OK.

> thanks
> Iain
>


Re: [PATCH] path relation oracle: Remove SSA's being killed from the equivalence list.

2021-10-29 Thread Richard Biener via Gcc-patches
On Thu, Oct 28, 2021 at 5:25 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> Same thing as the relational change.  Walk any equivalences that have
> been registered on the path, and remove the name being killed.  The
> only reason we had added the equivalence with itself earlier is so we
> wouldn't search any further in the equivalency list.  So if we are
> removing all references to it, then we no longer need to add a "kill"
> record.
>
> Will push pending tests on x86-64 Linux.
>
> Co-authored-by: Andrew MacLeod 
>
> gcc/ChangeLog:
>
> * value-relation.cc (path_oracle::killing_def): Walk the
> equivalency list and remove SSA from any equivalencies.
> ---
>  gcc/value-relation.cc | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
> index 0ad4f7a9495..512b51ce022 100644
> --- a/gcc/value-relation.cc
> +++ b/gcc/value-relation.cc
> @@ -1298,17 +1298,17 @@ path_oracle::killing_def (tree ssa)
>  }
>
>unsigned v = SSA_NAME_VERSION (ssa);
> -  bitmap b = BITMAP_ALLOC (&m_bitmaps);
> -  bitmap_set_bit (b, v);
> -  equiv_chain *ptr = (equiv_chain *) obstack_alloc (&m_chain_obstack,
> -   sizeof (equiv_chain));
> -  ptr->m_names = b;
> -  ptr->m_bb = NULL;
> -  ptr->m_next = m_equiv.m_next;
> -  m_equiv.m_next = ptr;
> -  bitmap_ior_into (m_equiv.m_names, b);
>
> -  // Walk the relation list an remove SSA from any relations.
> +  // Walk the equivalency list and remove SSA from any equivalencies.
> +  if (bitmap_bit_p (m_equiv.m_names, v))
> +{
> +  bitmap_clear_bit (m_equiv.m_names, v);
> +  for (equiv_chain *ptr = m_equiv.m_next; ptr; ptr = ptr->m_next)
> +   if (bitmap_bit_p (ptr->m_names, v))
> + bitmap_clear_bit (ptr->m_names, v);

What's the reason to do both lookup and clear?  Just bitmap_clear_bit ()
should be good enough.
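
A toy C sketch of why the lookup does not change the result: clearing a bit
that is not set leaves the map as it was.  The 64-bit word and the toy_*
helpers below are hypothetical stand-ins made up for this mail, not GCC's
bitmap API, and they assume bit < 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 64-bit bitmap; a hypothetical stand-in for GCC's bitmap type.  */
static bool
toy_bit_p (uint64_t map, unsigned bit)
{
  return (map >> bit) & 1;
}

static void
toy_clear_bit (uint64_t *map, unsigned bit)
{
  *map &= ~(UINT64_C (1) << bit);
}

/* Test first, then clear, mirroring the shape of the loop in the patch.  */
uint64_t
clear_with_lookup (uint64_t map, unsigned bit)
{
  if (toy_bit_p (map, bit))
    toy_clear_bit (&map, bit);
  return map;
}

/* Clear unconditionally; clearing an unset bit leaves the map unchanged.  */
uint64_t
clear_directly (uint64_t map, unsigned bit)
{
  toy_clear_bit (&map, bit);
  return map;
}
```

Both functions return the same map for every input, so the extra test only
costs a second lookup in this toy model.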

> +}
> +
> +  // Walk the relation list and remove SSA from any relations.
>if (!bitmap_bit_p (m_relations.m_names, v))
>  return;
>
> --
> 2.31.1
>


Re: [Version 2][Patch][PR102281]do not add BUILTIN_CLEAR_PADDING for variables that are gimple registers

2021-10-29 Thread Richard Biener via Gcc-patches
On Thu, 28 Oct 2021, Qing Zhao wrote:

> Ping….
> 
> Hi,
> 
> Based on the previous discussion, I thought we had agreed that the
> proposed patch for this bug is the correct fix.
> And this is an important bug that needs to be fixed.
> 
> I have created a new PR for the other potential issue with padding
> initialization for long double/_Complex long double variables with an explicit
> initializer, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102781; it will be
> fixed separately later if needed.
> 
> Please take a look at the new patch and let me know whether there are any
> more issues with this version, or whether it is okay to commit now.

I think it's reasonable.

Thus OK unless Jakub has comments.

Thanks,
Richard.

> Thanks.
> 
> Qing
> 
> 
> 
> > On Oct 25, 2021, at 9:16 AM, Qing Zhao via Gcc-patches 
> >  wrote:
> > 
> > Ping….
> > 
> > Is this Okay for trunk?
> > 
> >> On Oct 18, 2021, at 2:26 PM, Qing Zhao via Gcc-patches 
> >>  wrote:
> >> 
> >> Hi, Jakub,
> >> 
> >> This is the 2nd version of the patch based on your comment.
> >> 
> >> Bootstrapped on both x86 and aarch64. Regression testings are ongoing.
> > 
> > The regression testing looks good.
> > 
> > Thanks.
> > 
> > Qing
> >> 
> >> Please let me know if this is ready for committing?
> >> 
> >> Thanks a lot.
> >> 
> >> Qing.
> >> 
> >> ==
> >> 
> >> From d6f60370dee69b5deb3d7ef51873a5e986490782 Mon Sep 17 00:00:00 2001
> >> From: Qing Zhao 
> >> Date: Mon, 18 Oct 2021 19:04:39 +
> >> Subject: [PATCH] PR 102281 (-ftrivial-auto-var-init=zero causes ice)
> >> 
> >> Do not add call to __builtin_clear_padding when a variable is a gimple
> >> register or it might not have padding.
> >> 
> >> gcc/ChangeLog:
> >> 
> >> 2021-10-18  qing zhao  
> >> 
> >>* gimplify.c (gimplify_decl_expr): Do not add call to
> >>__builtin_clear_padding when a variable is a gimple register
> >>or it might not have padding.
> >>(gimplify_init_constructor): Likewise.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >> 2021-10-18  qing zhao  
> >> 
> >>* c-c++-common/pr102281.c: New test.
> >>* gcc.target/i386/auto-init-2.c: Adjust testing case.
> >>* gcc.target/i386/auto-init-4.c: Likewise.
> >>* gcc.target/i386/auto-init-6.c: Likewise.
> >>* gcc.target/aarch64/auto-init-6.c: Likewise.
> >> ---
> >> gcc/gimplify.c| 25 ++-
> >> gcc/testsuite/c-c++-common/pr102281.c | 17 +
> >> .../gcc.target/aarch64/auto-init-6.c  |  4 +--
> >> gcc/testsuite/gcc.target/i386/auto-init-2.c   |  2 +-
> >> gcc/testsuite/gcc.target/i386/auto-init-4.c   | 10 +++-
> >> gcc/testsuite/gcc.target/i386/auto-init-6.c   |  7 +++---
> >> 6 files changed, 47 insertions(+), 18 deletions(-)
> >> create mode 100644 gcc/testsuite/c-c++-common/pr102281.c
> >> 
> >> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> >> index d8e4b139349..b27dc0ed308 100644
> >> --- a/gcc/gimplify.c
> >> +++ b/gcc/gimplify.c
> >> @@ -1784,8 +1784,8 @@ gimple_add_init_for_auto_var (tree decl,
> >>   that padding is initialized to zero. So, we always initialize paddings
> >>   to zeroes regardless INIT_TYPE.
> >>   To do the padding initialization, we insert a call to
> >> -   __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true).
> >> -   Note, we add an additional dummy argument for __BUILTIN_CLEAR_PADDING,
> >> +   __builtin_clear_padding (, 0, for_auto_init = true).
> >> +   Note, we add an additional dummy argument for __builtin_clear_padding,
> >>   'for_auto_init' to distinguish whether this call is for automatic
> >>   variable initialization or not.
> >>   */
> >> @@ -1954,8 +1954,14 @@ gimplify_decl_expr (tree *stmt_p, gimple_seq *seq_p)
> >> pattern initialization.
> >> In order to make the paddings as zeroes for pattern init, We
> >> should add a call to __builtin_clear_padding to clear the
> >> -   paddings to zero in compatiple with CLANG.  */
> >> -if (flag_auto_var_init == AUTO_INIT_PATTERN)
> >> +   paddings to zero in compatiple with CLANG.
> >> +   We cannot insert this call if the variable is a gimple register
> >> +   since __builtin_clear_padding will take the address of the
> >> +   variable.  As a result, if a long double/_Complex long double
> >> +   variable will spilled into stack later, its padding is 0XFE.  */
> >> +if (flag_auto_var_init == AUTO_INIT_PATTERN
> >> +&& !is_gimple_reg (decl)
> >> +&& clear_padding_type_may_have_padding_p (TREE_TYPE (decl)))
> >>gimple_add_padding_init_for_auto_var (decl, is_vla, seq_p);
> >>}
> >>}
> >> @@ -5384,12 +5390,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
> >> 
> >>  /* If the user requests to initialize automatic variables, we
> >> should initialize paddings inside the variable.  Add a call to
> >> - __BUILTIN_CLEAR_PADDING (, 0, for_auto_init = true) to
> >> + 

Re: [PATCH][RFC] Map -ftrapv to -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error

2021-10-29 Thread Richard Biener via Gcc-patches
On Thu, 28 Oct 2021, Hans-Peter Nilsson wrote:

> On Wed, 20 Oct 2021, Richard Biener via Gcc-patches wrote:
> 
> > This maps -ftrapv to -fsanitize=signed-integer-overflow
> > -fsanitize-undefined-trap-on-error,
> 
> Isn't that UBSAN target-dependent, i.e. not supported on all
> targets, whereas -ftrapv is just about universally supported?

I think only libubsan from libsanitizer has target dependences,
-fsanitize-undefined-trap-on-error specifically avoids requiring
-lubsan and thus should be fine in that regard.
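
As a rough illustration of "trap inline instead of calling into a runtime
library", here is a hand-written C sketch.  It is only a conceptual
equivalent under my own assumptions, not what the sanitizer actually emits;
the function names are made up for this mail.  It relies only on the
__builtin_add_overflow and __builtin_trap builtins.

```c
#include <limits.h>
#include <stdbool.h>

/* Report whether A + B would overflow int, using the overflow builtin.  */
bool
add_would_overflow (int a, int b)
{
  int unused;
  return __builtin_add_overflow (a, b, &unused);
}

/* Add A and B, trapping inline on signed overflow.  No runtime library
   call is involved, only the compiler's trap builtin.  */
int
trapping_add (int a, int b)
{
  int result;
  if (__builtin_add_overflow (a, b, &result))
    __builtin_trap ();
  return result;
}
```

For example, add_would_overflow (INT_MAX, 1) reports an overflow, and
trapping_add would trap on the same operands rather than wrap.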

> I.e. isn't this patch breaking -ftrapv for some targets?

I don't think so.  As proposed the patch would make -ftrapv
non-functional for non-C/C++ languages.

But Jakub already stated he didn't like this simple approach.

Note that the UBSAN instrumentation in the C/C++ frontends has
the advantage over anything done in the middle-end that it
reliably happens before early folding which might influence
what and how things are instrumented.  The next obvious places
to instrument things would be the gimplifier or the ubsan pass.

Richard.