Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Stephan Bergmann via Gcc-patches

On 19/09/2022 17:33, Thomas Neumann wrote:
Can you try if the patch below fixes the problem? It keeps the data 
structures alive at shutdown, though, which will probably make some leak 
detectors unhappy.


Alternatively we could simply remove the gcc_assert (ob) in line 285 of 
that file. As far as I can see in crt-begin nothing bad happens if we 
return nullptr at shutdown.


Yes, thanks, each of those two alternative approaches would appear to 
fix that LibreOffice build of mine.




Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 3:42 PM Richard Biener
 wrote:
>
> On Mon, Sep 19, 2022 at 2:52 PM Aldy Hernandez  wrote:
> >
> > On Mon, Sep 19, 2022 at 10:14 AM Richard Biener
> >  wrote:
> > >
> > > On Mon, Sep 19, 2022 at 9:59 AM Aldy Hernandez  wrote:
> > > >
> > > > ISTM that a specifically nonnegative range should not contain -NAN,
> > > > otherwise signbit_p() would return false, because we'd be unsure of the
> > > > sign.
> > > >
> > > > Do y'all agree?
> > >
> > > what tree_expr_nonnegative_p actually means isn't 100% clear.  For 
> > > REAL_CST
> > > it actually looks at the sign-bit but we have
> > >
> > > (simplify
> > >  /* copysign(x,y) -> fabs(x) if y is nonnegative.  */
> > >  (COPYSIGN_ALL @0 tree_expr_nonnegative_p@1)
> > >  (abs @0))
> > >
> > > is abs (@0) OK for sNaNs and -NaN/+NaN?
> >
> > At least for real_value's, ABS_EXPR works on NAN's.  There's no
> > special code dealing with them.  We just clear the sign bit:
> >
> > real_arithmetic:
> > case ABS_EXPR:
> >   *r = *op0;
> >   r->sign = 0;
> >   break;
> >
> > >
> > > And we have
> > >
> > > /* Convert abs[u] (X)  where X is nonnegative -> (X).  */
> > > (simplify
> > >  (abs tree_expr_nonnegative_p@0)
> > >  @0)
> > >
> > > where at least sNaN -> qNaN would be dropped?
> > >
> > > And of course
> > >
> > > (simplify
> > >  /* signbit(x) -> 0 if x is nonnegative.  */
> > >  (SIGNBIT tree_expr_nonnegative_p@0)
> > >  { integer_zero_node; })
> > >
> > > that is, is tree_expr_nonnegative_p actually tree_expr_sign or
> > > does tree_expr_nonnegative (x) mean x >= (typeof(X)) 0
> > > or !(x < (typeof(X))0)?
> >
> > I have no idea, but I'm happy to have frange::set_nonnegative() do
> > whatever you agree on.
> >
> > Actually TBH, ranger only uses set_nonnegative for call's, not much else:
> >
> >   if (range_of_builtin_call (r, call, src))
> > ;
> >   else if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
> > r.set_nonnegative (type);
> >   else if (gimple_call_nonnull_result_p (call)
> >|| gimple_call_nonnull_arg (call))
> > r.set_nonzero (type);
> >   else
> > r.set_varying (type);
> >
> > but I guess it's good we do the right thing for correctness sake, and
> > if it ever gets used by someone else.
> >
> > >
> > > That said, 'set_nonnegative' could be interpreted to be without
> > > NaNs?
> >
> > Sounds good to me.  How's this?
>
> Hmm, I merely had lots of questions above so I think to answer them
> we at least should document what 'set_nonnegative' implies in the
> abstract vrange class.
>
> It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
> will return true for x == -NaN but the result will be a NaN.

Come to think of it, perhaps having set_nonnegative in the API is
overkill.  It's either nonsensical or awkward for say pointers or
strings.  There is only one use of it in all of ranger.  It's only
used to set a range for the result from gimple_stmt_nonnegative* on an
unknown call:

 else if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
r.set_nonnegative (type);

I would just remove the API entry point and push whatever we decide
onto the actual use.  No point confusing everyone at large.

So... what does gimple_stmt_nonnegative_warnv_p imply for floats?  My
gut feeling is [+0.0,+INF] U +NAN, but I'm happy to do whatever y'all
agree on.

Aldy



Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 4:04 PM Michael Matz  wrote:
>
> Hello,
>
> On Mon, 19 Sep 2022, Richard Biener via Gcc-patches wrote:
>
> > > but I guess it's good we do the right thing for correctness sake, and
> > > if it ever gets used by someone else.
> > >
> > > >
> > > > That said, 'set_nonnegative' could be interpreted to be without
> > > > NaNs?
> > >
> > > Sounds good to me.  How's this?
> >
> > Hmm, I merely had lots of questions above so I think to answer them
> > we at least should document what 'set_nonnegative' implies in the
> > abstract vrange class.
> >
> > It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
> > will return true for x == -NaN but the result will be a NaN.
>
> FWIW, in IEEE, 'abs' (like 'copy, 'copysign' and 'negate') are not
> arithmetic, they are quiet-computational.  Hence they don't rise
> anything, not even for sNaNs; they copy the input bits and appropriately
> modify the bit pattern according to the specification (i.e. fiddle the
> sign bit).
>
> That also means that a predicate like negative_p(x) that would be
> implemented ala
>
>   copysign(1.0, x) < 0.0

I suppose this means -0.0 is not considered negative, though it has
the signbit set?  FWIW, on real_value's real_isneg() returns true for
-0.0 because it only looks at the sign.

>
> deal with NaNs just fine and is required to correctly capture the sign of
> 'x'.  If frange::set_nonnegative is supposed to be used in such contexts
> (and I think it's a good idea if that were the case), then set_nonnegative
> does _not_ imply no-NaN.
>
> In particular I would assume that, given an VAYRING frange FR, that
> FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan} .

That was my understanding as well, and what my original patch did.
But again, I'm just the messenger.

Aldy



Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 3:45 PM Richard Biener
 wrote:
>
> On Mon, Sep 19, 2022 at 3:04 PM Aldy Hernandez  wrote:
> >
> > On Mon, Sep 19, 2022 at 9:37 AM Richard Biener
> >  wrote:
> > >
> > > On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
> > >  wrote:
> > > >
> > > > Jakub has mentioned that for -funsafe-math-optimizations we may flush
> > > > denormals to zero, in which case we need to be careful to extend the
> > > > ranges to the appropriate zero.  This patch does exactly that.  For a
> > > > range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> > > > we flush to [+0.0, x].
> > > >
> > > > It is unclear whether we should do this for Alpha, since I believe
> > > > flushing to zero is the default, and the port requires -mieee for IEEE
> > > > sanity.  If so, perhaps we should add a target hook so backends are
> > > > free to request flushing to zero.
> > > >
> > > > Thoughts?
> > >
> > > I'm not sure what the intention of this is - it effectively results in
> > > more conservative ranges for -funsafe-math-optimizations.  That is,
> > > if -funsafe-math-optimizations says that denormals don't exist and
> > > are all zero then doesn't that mean we can instead use the smalles
> > > non-denormal number less than (or greater than) zero here?
> > >
> > > That said, the flushing you do is also "valid" for
> > > -fno-unsafe-math-optimizations
> > > in case we don't want to deal with denormals in range boundaries.
> > >
> > > It might also be a correctness requirement in case we don't know how
> > > targets handle denormals (IIRC even OS defaults might matter here) so
> > > we conservatively have to treat them as being flushed to zero.
> >
> > Actually, rth suggested we always flush to zero because we don't know
> > what the target would do.  Again, I'm happy to do whatever you agree
> > on.  I have no opinion.  My main goal here is correctness.
>
> Yes, I think we should do this.

Ok, I'm pushing the attached patch because it's conservatively correct
everywhere.  We can tweak it when y'all reach consensus.

>
> > >
> > > So ... if we want to be on the "safe" side then please always do this.
> > >
> > > At the same point you could do
> > >
> > >  if (!HONOR_SIGNED_ZEROS ())
> > >if (real_iszero (_max))
> > >  {
> > > if (real_iszero (_min))
> > >m.min.sign = 1;
> > > m_max.sign = 1;
> > >  }
> >
> > But wouldn't that set [-0.0, -0.0] when encountering [+0, +0] ??
>
> Yeah, that's my laziness not adding a special case for m_min == m_max.
>
> >
> > >else if (real_iszero (_min))
> > >  m_min.sign = 0;
> >
> > Jakub actually suggested something completely different...just set +0
> > always for !HONOR_SIGNED_ZEROS.
>
> Hmm, but the [-1, -0.] with known sign becomes [-1, +0.] with unknown sign?

Huh.  I guess that's true.  Does that happen often enough in practice
to matter?  I suppose we're going into uncharted territory with folks
asking for no signs on zeros, but them appearing in the source code?
My gut feeling is that we should take whatever signs are in the source
code (similar to what we're doing for NANs), but don't introduce any
additional ones.

Again, no strong opinions.  Feel free to add whatever adjustment you
think is necessary.

Aldy
From 2b61ed838c7f3f4bf54d4530c0f053b420623beb Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Sat, 17 Sep 2022 10:17:13 +0200
Subject: [PATCH] frange: flush denormals to zero

For some architectures (or for -funsafe-math-optimizations) we may
flush denormals to zero, in which case we need to be careful to
extend the ranges to the appropriate zero.  This patch does exactly that.
For a range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
we flush to [+0.0, x].

gcc/ChangeLog:

	* value-range.cc (frange::flush_denormals_to_zero): New.
	(frange::set): Call flush_denormals_to_zero.
	* value-range.h (class frange): Add flush_denormals_to_zero.
---
 gcc/value-range.cc | 23 +++
 gcc/value-range.h  |  1 +
 2 files changed, 24 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 67d5d7fa90f..a8e3bb39bab 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -267,6 +267,26 @@ tree_compare (tree_code code, tree op1, tree op2)
   return !integer_zerop (fold_build2 (code, integer_type_node, op1, op2));
 }
 
+// Flush denormal endpoints to the appropriate 0.0.
+
+void
+frange::flush_denormals_to_zero ()
+{
+  if (undefined_p () || known_isnan ())
+return;
+
+  // Flush [x, -DENORMAL] to [x, -0.0].
+  if (real_isdenormal (_max) && real_isneg (_max))
+{
+  m_max = dconst0;
+  if (HONOR_SIGNED_ZEROS (m_type))
+	m_max.sign = 1;
+}
+  // Flush [+DENORMAL, x] to [+0.0, x].
+  if (real_isdenormal (_min) && !real_isneg (_min))
+m_min = dconst0;
+}
+
 // Setter for franges.
 
 void
@@ -317,6 +337,9 @@ frange::set (tree min, tree max, value_range_kind kind)
   gcc_checking_assert (tree_compare (LE_EXPR, min, max));
 
   

Re: [PATCH] c++: Implement P1467R9 - Extended floating-point types and standard names compiler part except for bfloat16 [PR106652]

2022-09-19 Thread Hongtao Liu via Gcc-patches
On Mon, Sep 12, 2022 at 4:06 PM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following patch implements the compiler part of C++23
> P1467R9 - Extended floating-point types and standard names compiler part
> by introducing _Float{16,32,64,128} as keywords and builtin types
> like they are implemented for C already since GCC 7.
> It doesn't introduce _Float{32,64,128}x for C++, those remain C only
> for now, mainly because 
> https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling
> has mangling for:
> ::= DF  _ # ISO/IEC TS 18661 binary floating point type _FloatN (N 
> bits)
> but doesn't for _FloatNx.  And it doesn't add anything for bfloat16_t
> support, see below.
> Regarding mangling, I think mangling _FloatNx as DF  x _ would be
> possible, but would need to be discussed and voted in.
> As there is no _FloatNx support for C++, I think it is wrong to announce
> it through __FLT{32,64,128}X_*__ predefined macros (so the patch disables
> those for C++; unfortunately g++ 7 to 12 will predefine those and also
> __FLT{32,64,128}_*__ even when _FloatN support isn't implemented).
> The patch wants to keep backwards compatibility with how __float128 has
> been handled in C++ before, both for mangling and behavior in binary
> operations, overload resolution etc.  So, there are some backend changes
> where for C __float128 and _Float128 are the same type (float128_type_node
> and float128t_type_node are the same pointer), but for C++ they are distinct
> types which mangle differently and _Float128 is treated as extended
> floating-point type while __float128 is treated as non-standard floating
> point type.  The various C++23 changes about how floating-point types
> are changed are actually implemented as written in the spec only if at least
> one of the types involved is _Float{16,32,64,128} and kept previous behavior
> otherwise.  For float/double/long double the rules are actually written that
> they behave the same as before.
> There is some backwards incompatibility at least on x86 regarding _Float16,
> because that type was already used by that name and with the DF16_ mangling
> (but only since GCC 12 and I think it isn't that widely used in the wild
> yet).  E.g. config/i386/avx512fp16intrin.h shows the issues, where
> in C or in GCC 12 in C++ one could pass 0.0f to a builtin taking _Float16
> argument, but with the changes that is not possible anymore, one needs
> to either use 0.0f16 or (_Float16) 0.0f.
> We have also a problem with glibc headers, where since glibc 2.27
> math.h and complex.h aren't compilable with these changes.  One gets
> errors like:
> In file included from /usr/include/math.h:43,
>  from abc.c:1:
> /usr/include/bits/floatn.h:86:9: error: multiple types in one declaration
>86 | typedef __float128 _Float128;
>   | ^~
> /usr/include/bits/floatn.h:86:20: error: declaration does not declare 
> anything [-fpermissive]
>86 | typedef __float128 _Float128;
>   |^
> In file included from /usr/include/bits/floatn.h:119:
> /usr/include/bits/floatn-common.h:214:9: error: multiple types in one 
> declaration
>   214 | typedef float _Float32;
>   | ^
> /usr/include/bits/floatn-common.h:214:15: error: declaration does not declare 
> anything [-fpermissive]
>   214 | typedef float _Float32;
>   |   ^~~~
> /usr/include/bits/floatn-common.h:251:9: error: multiple types in one 
> declaration
>   251 | typedef double _Float64;
>   | ^~
> /usr/include/bits/floatn-common.h:251:16: error: declaration does not declare 
> anything [-fpermissive]
>   251 | typedef double _Float64;
>   |^~~~
> This is from snippets like:
> /* The remaining of this file provides support for older compilers.  */
> # if __HAVE_FLOAT128
>
> /* The type _Float128 exists only since GCC 7.0.  */
> #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> typedef __float128 _Float128;
> #  endif
> where it hardcodes that C++ doesn't have _Float{16,32,64,128} support nor
> {f,F}{16,32,64,128} literal suffixes nor _Complex _Float{16,32,64,128}.
> The patch fixincludes this for now and hopefully if this is committed, then
> glibc can change those.  Right now the patch changes those
> #  if !__GNUC_PREREQ (7, 0) || defined __cplusplus
> conditions to
> #  if !__GNUC_PREREQ (7, 0) || (defined __cplusplus && !__GNUC_PREREQ (13, 1) 
> && defined __FLT32X_MANT_DIG__)
> where it relies on __FLT32X_*__ macros no longer being predefined for C++.
> Now, I guess for the fixincludes it could also use
> #  if !__GNUC_PREREQ (7, 0) || (defined __cplusplus && !__GNUC_PREREQ (13, 0))
> where earlier GCC 13 snapshots would not be doing the fixincludes,
> but the question is what to use for upstream glibc, because
> there will be 13.0 snapshots where C++ doesn't support _Float{16,32,64,128}
> and where it is essential to use what glibc has been doing previously
> and using the #else would fail 

Re: [PATCH] Fix incorrect handle in vectorizable_induction for mixed induction type.

2022-09-19 Thread Hongtao Liu via Gcc-patches
On Tue, Sep 20, 2022 at 10:23 AM liuhongt  wrote:
>
> The codes in vectorizable_induction for slp_node assume all phi_info
> have same induction type(vect_step_op_add), but since we support
> nonlinear induction, it could be wrong handled.
> So the patch return false when slp_node has mixed induction type.
>
> Note codes in other place will still vectorize the induction with
> separate iv update and vec_perm. But slp_node handle in
> vectorizable_induction will be more optimal when all induction type
> are the same, it will update ivs with one operation instead of
> separate iv updates and permutation.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR tree-optimization/103144
> * tree-vect-loop.cc (vectorizable_induction): Return false for
> slp_node with mixed induction type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103144-mix-1.c: New test.
> * gcc.target/i386/pr103144-mix-2.c: New test.
> ---
>  .../gcc.target/i386/pr103144-mix-1.c  | 17 +
>  .../gcc.target/i386/pr103144-mix-2.c  | 35 +++
>  gcc/tree-vect-loop.cc | 34 ++
>  3 files changed, 79 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
> new file mode 100644
> index 000..b292d66ef71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "optimized" } } */
> +/* For induction variable with differernt induction type(vect_step_op_add, 
> vect_step_op_neg),
> +   It should't be handled in vectorizable_induction with just 1 single iv 
> update(addition.),
> +   separate iv update and vec_perm are needed.  */
> +int
> +__attribute__((noipa))
> +foo (int* p, int c, int n)
> +{
> +  for (int i = 0; i != n; i++)
> +{
> +  p[2* i]= i;
> +  p[2 * i+1] = c;
> +  c = -c;
> +}
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
> new file mode 100644
> index 000..b7043d59aec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
> @@ -0,0 +1,35 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include 
> +#include "pr103144-mix-1.c"
> +
> +typedef int v8si __attribute__((vector_size(32)));
> +
> +#define N 34
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 8;
> +  v8si init1 = __extension__(v8si) { 0, b, 1, -b, 2, b, 3, -b };
> +  v8si init2 = __extension__(v8si) { 4, b, 5, -b, 6, b, 7, -b };
> +  v8si init3 = __extension__(v8si) { 8, b, 9, -b, 10, b, 11, -b };
> +  v8si init4 = __extension__(v8si) { 12, b, 13, -b, 14, b, 15, -b };
> +  memcpy (epi32_exp, , 32);
> +  memcpy (epi32_exp + 8, , 32);
> +  memcpy (epi32_exp + 16, , 32);
> +  memcpy (epi32_exp + 24, , 32);
> +  epi32_exp[32] = 16;
> +  epi32_exp[33] = b;
> +  foo (epi32_dst, b, N / 2);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +__builtin_abort ();
> +
> +  return;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9c434b66c5b..c7050a47c1c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -9007,14 +9007,34 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>  iv_loop = loop;
>gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
>
> -  if (slp_node && !nunits.is_constant ())
> +  if (slp_node)
>  {
> -  /* The current SLP code creates the step value element-by-element.  */
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"SLP induction not supported for variable-length"
> -" vectors.\n");
> -  return false;
> +  if (!nunits.is_constant ())
> +   {
> + /* The current SLP code creates the step value element-by-element.  
> */
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"SLP induction not supported for variable-length"
> +" vectors.\n");
> + return false;
> +   }
> +
> +  stmt_vec_info phi_info;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, phi_info)
> +   {
> + if 

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-19 Thread Hongtao Liu via Gcc-patches
On Fri, Sep 16, 2022 at 9:38 PM Alexander Monakov via Gcc-patches
 wrote:
>
> On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote:
>
> > On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
> >  wrote:
> > >
> > >
> > > On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> > > > reg + test reg, reg. I don't know exact reason why gcc do this.
> > > >
> > > > For latest x86 processors, ciscization should help processor frontend
> > > > also codesize, for processor backend, they should be the same(has same
> > > > uops).
> > > >
> > > > So the patch deleted the peephole2, and also modify another splitter to
> > > > generate more cmp mem, 0 for 32-bit target.
> > > >
> > > > It will help instruction fetch.
> > > >
> > > > for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan 
> > > > there's no
> > > > comparison to 1 or -1, so adjust the testcase since under 32-bit
> > > > target, we now generate cmp mem, 0 instead of load + test.
> > > >
> > > > Similar for pr78035.c.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > No performance impact for SPEC2017 on ICX/Znver3.
> > > >
> > > It was almost certainly for PPro/P2 given it was rth's work from
> > > 1999.Probably should have been conditionalized on PPro/P2 at the
> > > time.   No worries losing it now...
> >
> > Please add a tune flag in x86-tune.def under "Historical relics" and
> > use it in the relevant peephole2 instead of deleting it.
>
> When the next instruction after 'load mem; test reg, reg' is a conditional
> branch, this disables macro-op fusion because Intel CPUs do not macro-fuse
> 'cmp mem, imm; jcc'.
>
Oh, i didn't realize it, thanks for your reply.
I'll hold on the patch until more investigation.
> It would be nice to rephrase the commit message to acknowledge this (the
> statement 'has same uops' is not always true with this considered).
>
> AMD CPUs can fuse some 'cmp mem, imm; jcc' under some conditions, so this
> should be beneficial for AMD.
>
> Alexander



-- 
BR,
Hongtao


[PATCH] Fix incorrect handle in vectorizable_induction for mixed induction type.

2022-09-19 Thread liuhongt via Gcc-patches
The codes in vectorizable_induction for slp_node assume all phi_info
have same induction type(vect_step_op_add), but since we support
nonlinear induction, it could be wrong handled.
So the patch return false when slp_node has mixed induction type.

Note codes in other place will still vectorize the induction with
separate iv update and vec_perm. But slp_node handle in
vectorizable_induction will be more optimal when all induction type
are the same, it will update ivs with one operation instead of
separate iv updates and permutation.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR tree-optimization/103144
* tree-vect-loop.cc (vectorizable_induction): Return false for
slp_node with mixed induction type.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr103144-mix-1.c: New test.
* gcc.target/i386/pr103144-mix-2.c: New test.
---
 .../gcc.target/i386/pr103144-mix-1.c  | 17 +
 .../gcc.target/i386/pr103144-mix-2.c  | 35 +++
 gcc/tree-vect-loop.cc | 34 ++
 3 files changed, 79 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-2.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
new file mode 100644
index 000..b292d66ef71
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "optimized" } } */
+/* For induction variable with differernt induction type(vect_step_op_add, 
vect_step_op_neg),
+   It should't be handled in vectorizable_induction with just 1 single iv 
update(addition.),
+   separate iv update and vec_perm are needed.  */
+int
+__attribute__((noipa))
+foo (int* p, int c, int n)
+{
+  for (int i = 0; i != n; i++)
+{
+  p[2* i]= i;
+  p[2 * i+1] = c;
+  c = -c;
+}
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
new file mode 100644
index 000..b7043d59aec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
-mprefer-vector-width=256" } */
+/* { dg-require-effective-target avx2 } */
+
+#include "avx2-check.h"
+#include 
+#include "pr103144-mix-1.c"
+
+typedef int v8si __attribute__((vector_size(32)));
+
+#define N 34
+void
+avx2_test (void)
+{
+  int* epi32_exp = (int*) malloc (N * sizeof (int));
+  int* epi32_dst = (int*) malloc (N * sizeof (int));
+
+  __builtin_memset (epi32_exp, 0, N * sizeof (int));
+  int b = 8;
+  v8si init1 = __extension__(v8si) { 0, b, 1, -b, 2, b, 3, -b };
+  v8si init2 = __extension__(v8si) { 4, b, 5, -b, 6, b, 7, -b };
+  v8si init3 = __extension__(v8si) { 8, b, 9, -b, 10, b, 11, -b };
+  v8si init4 = __extension__(v8si) { 12, b, 13, -b, 14, b, 15, -b };
+  memcpy (epi32_exp, , 32);
+  memcpy (epi32_exp + 8, , 32);
+  memcpy (epi32_exp + 16, , 32);
+  memcpy (epi32_exp + 24, , 32);
+  epi32_exp[32] = 16;
+  epi32_exp[33] = b;
+  foo (epi32_dst, b, N / 2);
+  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
+__builtin_abort ();
+
+  return;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9c434b66c5b..c7050a47c1c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9007,14 +9007,34 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
-  if (slp_node && !nunits.is_constant ())
+  if (slp_node)
 {
-  /* The current SLP code creates the step value element-by-element.  */
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"SLP induction not supported for variable-length"
-" vectors.\n");
-  return false;
+  if (!nunits.is_constant ())
+   {
+ /* The current SLP code creates the step value element-by-element.  */
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"SLP induction not supported for variable-length"
+" vectors.\n");
+ return false;
+   }
+
+  stmt_vec_info phi_info;
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, phi_info)
+   {
+ if (STMT_VINFO_LOOP_PHI_EVOLUTION_TYPE (phi_info) != vect_step_op_add)
+   {
+ /* The below SLP code assume all induction type to be the same.
+But slp in other place will still vectorize the loop via 
updating
+iv update separately + vec_perm, but not from below codes.  */
+

Re: [PATCH] Support 64-bit vectorization for single-precision floating rounding operation.

2022-09-19 Thread Hongtao Liu via Gcc-patches
On Tue, Sep 20, 2022 at 10:14 AM liuhongt  wrote:
>
> Here's list the patch supported.
> rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/106910
> * config/i386/mmx.md (nearbyintv2sf2): New expander.
> (rintv2sf2): Ditto.
> (ceilv2sf2): Ditto.
> (lceilv2sfv2si2): Ditto.
> (floorv2sf2): Ditto.
> (lfloorv2sfv2si2): Ditto.
> (btruncv2sf2): Ditto.
> (lrintv2sfv2si2): Ditto.
> (roundv2sf2): Ditto.
> (lroundv2sfv2si2): Ditto.
> (*mmx_roundv2sf2): New define_insn.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106910-1.c: New test.
> ---
>  gcc/config/i386/mmx.md | 154 +
>  gcc/testsuite/gcc.target/i386/pr106910-1.c |  77 +++
>  2 files changed, 231 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106910-1.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index dda4b43f5c1..222a041de58 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1627,6 +1627,160 @@ (define_expand "vec_initv2sfsf"
>DONE;
>  })
>
> +;
> +;;
> +;; Parallel single-precision floating point rounding operations.
> +;;
> +;
> +
> +(define_expand "nearbyintv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "register_operand")
> +  (match_dup 2)]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_MXCSR | ROUND_NO_EXC);")
> +
> +(define_expand "rintv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "register_operand")
> +  (match_dup 2)]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_MXCSR);")
> +
> +(define_expand "ceilv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "register_operand")
> +  (match_dup 2)]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +   && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_CEIL | ROUND_NO_EXC);")
> +
> +(define_expand "lceilv2sfv2si2"
> +  [(match_operand:V2SI 0 "register_operand")
> +   (match_operand:V2SF 1 "register_operand")]
> + "TARGET_SSE4_1 && !flag_trapping_math
> +  && TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_ceilv2sf2 (tmp, operands[1]));
> +  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "floorv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "vector_operand")
> +  (match_dup 2)]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math
> +  && TARGET_MMX_WITH_SSE"
> +  "operands[2] = GEN_INT (ROUND_FLOOR | ROUND_NO_EXC);")
> +
> +(define_expand "lfloorv2sfv2si2"
> +  [(match_operand:V2SI 0 "register_operand")
> +   (match_operand:V2SF 1 "register_operand")]
> + "TARGET_SSE4_1 && !flag_trapping_math
> +  && TARGET_MMX_WITH_SSE"
> +{
> +  rtx tmp = gen_reg_rtx (V2SFmode);
> +  emit_insn (gen_floorv2sf2 (tmp, operands[1]));
> +  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
> +  DONE;
> +})
> +
> +(define_expand "btruncv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "register_operand")
> +  (match_dup 2)]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && !flag_trapping_math"
> +  "operands[2] = GEN_INT (ROUND_TRUNC | ROUND_NO_EXC);")
> +
> +(define_insn "*mmx_roundv2sf2"
> +  [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,v")
> +   (unspec:V2SF
> + [(match_operand:V2SF 1 "register_operand" "Yr,x,v")
> +  (match_operand:SI 2 "const_0_to_15_operand")]
> + UNSPEC_ROUND))]
> +  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
> +  "%vroundps\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "isa" "noavx,noavx,avx")
> +   (set_attr "type" "ssecvt")
> +   (set_attr "prefix_data16" "1,1,*")
> +   (set_attr "prefix_extra" "1")
> +   (set_attr "length_immediate" "1")
> +   (set_attr "prefix" "orig,orig,vex")
> +   (set_attr "mode" "V4SF")])
> +
> +(define_insn "lrintv2sfv2si2"
> +  [(set (match_operand:V2SI 0 "register_operand" "=v")
> +   (unspec:V2SI
> + [(match_operand:V2SF 1 "register_operand" "v")]
> + UNSPEC_FIX_NOTRUNC))]
> +  "TARGET_MMX_WITH_SSE"
> +  "%vcvtps2dq\t{%1, %0|%0, %1}"
> +  [(set_attr "type" "ssecvt")
> +   (set (attr "prefix_data16")
> + (if_then_else
> +   (match_test "TARGET_AVX")
> + (const_string "*")
> + (const_string 

[PATCH] Support 64-bit vectorization for single-precision floating rounding operation.

2022-09-19 Thread liuhongt via Gcc-patches
Here's list the patch supported.
rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

PR target/106910
* config/i386/mmx.md (nearbyintv2sf2): New expander.
(rintv2sf2): Ditto.
(ceilv2sf2): Ditto.
(lceilv2sfv2si2): Ditto.
(floorv2sf2): Ditto.
(lfloorv2sfv2si2): Ditto.
(btruncv2sf2): Ditto.
(lrintv2sfv2si2): Ditto.
(roundv2sf2): Ditto.
(lroundv2sfv2si2): Ditto.
(*mmx_roundv2sf2): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106910-1.c: New test.
---
 gcc/config/i386/mmx.md | 154 +
 gcc/testsuite/gcc.target/i386/pr106910-1.c |  77 +++
 2 files changed, 231 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr106910-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index dda4b43f5c1..222a041de58 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1627,6 +1627,160 @@ (define_expand "vec_initv2sfsf"
   DONE;
 })
 
+;
+;;
+;; Parallel single-precision floating point rounding operations.
+;;
+;
+
+(define_expand "nearbyintv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "register_operand")
+  (match_dup 2)]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_MXCSR | ROUND_NO_EXC);")
+
+(define_expand "rintv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "register_operand")
+  (match_dup 2)]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_MXCSR);")
+
+(define_expand "ceilv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "register_operand")
+  (match_dup 2)]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_CEIL | ROUND_NO_EXC);")
+
+(define_expand "lceilv2sfv2si2"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V2SF 1 "register_operand")]
+ "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_ceilv2sf2 (tmp, operands[1]));
+  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "floorv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "vector_operand")
+  (match_dup 2)]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_FLOOR | ROUND_NO_EXC);")
+
+(define_expand "lfloorv2sfv2si2"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V2SF 1 "register_operand")]
+ "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_floorv2sf2 (tmp, operands[1]));
+  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "btruncv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "register_operand")
+  (match_dup 2)]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math"
+  "operands[2] = GEN_INT (ROUND_TRUNC | ROUND_NO_EXC);")
+
+(define_insn "*mmx_roundv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,v")
+   (unspec:V2SF
+ [(match_operand:V2SF 1 "register_operand" "Yr,x,v")
+  (match_operand:SI 2 "const_0_to_15_operand")]
+ UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "%vroundps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "prefix_data16" "1,1,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "lrintv2sfv2si2"
+  [(set (match_operand:V2SI 0 "register_operand" "=v")
+   (unspec:V2SI
+ [(match_operand:V2SF 1 "register_operand" "v")]
+ UNSPEC_FIX_NOTRUNC))]
+  "TARGET_MMX_WITH_SSE"
+  "%vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set (attr "prefix_data16")
+ (if_then_else
+   (match_test "TARGET_AVX")
+ (const_string "*")
+ (const_string "1")))
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "TI")])
+
+(define_expand "roundv2sf2"
+  [(set (match_dup 3)
+   (plus:V2SF
+ (match_operand:V2SF 1 "register_operand")
+ (match_dup 2)))
+   (set (match_operand:V2SF 0 "register_operand")
+   (unspec:V2SF
+ [(match_dup 3) (match_dup 4)]
+  

[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Eliminate TARGET_CTZ,TARGET_FCTIDZ,FCTIWUZ defines

Hi,
  This is the first of a batch of changes that eliminate a number
of define TARGET_foo entries we have collected over time.

TARGET_CTZ is defined as TARGET_MODULO, and has a low number
of uses.  References to TARGET_CTZ should be safe to replace
with TARGET_MODULO throughout.

TARGET_FCTIDZ is entirely unused, and safe to remove.

TARGET_FCTIWUZ has a low number of uses, and can be directly
replaced with TARGET_POPCNTD.

This eliminates three defines.

There should be no codegen changes, and this has regtested OK.
OK for trunk?
Thanks,

gcc/
* config/rs6000/rs6000.h (TARGET_CTZ): Replace with
TARGET_MODULO.
(TARGET_FCTIDZ): Remove.
(TARGET_FCTIWUZ): Replace with TARGET_POPCNTD.
* config/rs6000/rs6000.cc (TARGET_CTZ): Replace with TARGET_MODULO.
* config/rs6000/rs6000.md (ctz2): Replace TARGET_CTZ
with TARGET_MODULO.
(ctz2_hw): Same.
(fixuns_truncsi2): Replace TARGET_FCTIWUZ
with TARGET_POPCNTD.
(fixuns_truncsi2_stfiwx): Same.
(fctiwz_): Same.

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index fcca062a8709..eea427b1ca51 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -21998,11 +21998,11 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
   if (!TARGET_MODULO && (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
 case CTZ:
-  *total = COSTS_N_INSNS (TARGET_CTZ ? 1 : 4);
+  *total = COSTS_N_INSNS (TARGET_MODULO ? 1 : 4);
   return false;
 
 case FFS:
   *total = COSTS_N_INSNS (4);
   return false;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index eb7b21584970..ee887efd1122 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -456,20 +456,17 @@ extern int rs6000_vector_align[];
 || TARGET_PPC_GPOPT/* 970/power4 */\
 || TARGET_POPCNTB  /* ISA 2.02 */  \
 || TARGET_CMPB /* ISA 2.05 */  \
 || TARGET_POPCNTD) /* ISA 2.06 */
 
-#define TARGET_FCTIDZ  TARGET_FCFID
 #define TARGET_STFIWX  TARGET_PPC_GFXOPT
 #define TARGET_LFIWAX  TARGET_CMPB
 #define TARGET_LFIWZX  TARGET_POPCNTD
 #define TARGET_FCFIDS  TARGET_POPCNTD
 #define TARGET_FCFIDU  TARGET_POPCNTD
 #define TARGET_FCFIDUS TARGET_POPCNTD
 #define TARGET_FCTIDUZ TARGET_POPCNTD
-#define TARGET_FCTIWUZ TARGET_POPCNTD
-#define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI(TARGET_MODULO && TARGET_POWERPC64)
 #define TARGET_MADDLD  TARGET_MODULO
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
@@ -1751,11 +1748,11 @@ typedef struct rs6000_args
 
 /* The CTZ patterns that are implemented in terms of CLZ return -1 for input of
zero.  The hardware instructions added in Power9 and the sequences using
popcount return 32 or 64.  */
 #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
-  (TARGET_CTZ || TARGET_POPCNTD
\
+  (TARGET_MODULO || TARGET_POPCNTD 
\
? ((VALUE) = GET_MODE_BITSIZE (MODE), 2)\
: ((VALUE) = -1, 2))
 
 /* Specify the machine mode that pointers have.
After generation of rtl, the compiler makes no further distinction
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ad5a4cf2ef83..619a87374734 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -2414,11 +2414,11 @@ (define_insn "clz2"
 (define_expand "ctz2"
[(set (match_operand:GPR 0 "gpc_reg_operand")
 (ctz:GPR (match_operand:GPR 1 "gpc_reg_operand")))]
   ""
 {
-  if (TARGET_CTZ)
+  if (TARGET_MODULO)
 {
   emit_insn (gen_ctz2_hw (operands[0], operands[1]));
   DONE;
 }
 
@@ -2445,11 +2445,11 @@ (define_expand "ctz2"
 })
 
 (define_insn "ctz2_hw"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
(ctz:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")))]
-  "TARGET_CTZ"
+  "TARGET_MODULO"
   "cnttz %0,%1"
   [(set_attr "type" "cntlz")])
 
 (define_expand "ffs2"
   [(set (match_operand:GPR 0 "gpc_reg_operand")
@@ -6326,11 +6326,11 @@ (define_insn_and_split 
"*fix_trunc2_mem"
 })
 
 (define_expand "fixuns_truncsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand")
(unsigned_fix:SI (match_operand:SFDF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_FCTIWUZ && TARGET_STFIWX"
+  "TARGET_HARD_FLOAT && TARGET_POPCNTD && TARGET_STFIWX"
 {
   if (!TARGET_P8_VECTOR)
 {
   emit_insn (gen_fixuns_truncsi2_stfiwx (operands[0], operands[1]));
   DONE;
@@ -6339,11 +6339,11 @@ (define_expand 

Re: [PATCH] RISC-V modified add3 for large stack frame optimization [PR105733]

2022-09-19 Thread Kito Cheng via Gcc-patches
Could you provide some data including code size and performance? add is
frequently used patten, so we should more careful when changing that.

Kevin Lee 於 2022年9月19日 週一,18:07寫道:

> Hello GCC,
>  Started from Jim Wilson's patch in
>
> https://github.com/riscv-admin/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc
> for the large stack frame optimization problem, this augmented patch
> generates less instructions for cases such as
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733.
> Original:
> foo:
> li t0,-4096
> addi t0,t0,2016
> li a4,4096
> add sp,sp,t0
> li a5,-4096
> addi a4,a4,-2032
> add a4,a4,a5
> addi a5,sp,16
> add a5,a4,a5
> add a0,a5,a0
> li t0,4096
> sd a5,8(sp)
> sb zero,2032(a0)
> addi t0,t0,-2016
> add sp,sp,t0
> jr ra
> After Patch:
> foo:
> li t0,-4096
> addi t0,t0,2032
> add sp,sp,t0
> addi a5,sp,-2032
> add a0,a5,a0
> li t0,4096
> sb zero,2032(a0)
> addi t0,t0,-2032
> add sp,sp,t0
> jr ra
>
>= Summary of gcc testsuite =
> | # of unexpected case / # of unique unexpected
> case
> |  gcc |  g++ | gfortran |
>  rv64gc/  lp64d/ medlow |4 / 4 |   13 / 4 |0 / 0 |
> No additional failures were created from the testsuite.
>
> gcc/ChangeLog:
>Jim Wilson 
>Michael Collison 
>Kevin Lee 
>
> * config/riscv/predicates.md (const_lui_operand): New predicate.
> (add_operand): Ditto.
> (reg_or_const_int_operand): Ditto.
> * config/riscv/riscv-protos.h (riscv_eliminable_reg): New
> function.
> * config/riscv/riscv.cc (riscv_eliminable_reg): New Function.
> (riscv_adjust_libcall_cfi_prologue): Use gen_rtx_SET and
> gen_rtx_fmt_ee instead of gen_add3_insn.
> (riscv_adjust_libcall_cfi_epilogue): ditto.
> * config/riscv/riscv.md (addsi3): Remove.
> (adddi3): ditto.
> (add3): New instruction for large stack frame optimization.
> (add3_internal): ditto
> (add3_internal2): New instruction for insns generated in
> the prologue and epilogue pass.
> ---
>  gcc/config/riscv/predicates.md  | 13 +
>  gcc/config/riscv/riscv-protos.h |  1 +
>  gcc/config/riscv/riscv.cc   | 20 ++--
>  gcc/config/riscv/riscv.md   | 84 -
>  4 files changed, 101 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/riscv/predicates.md
> b/gcc/config/riscv/predicates.md
> index 862e72b0983..b98bb5a9768 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -35,6 +35,14 @@ (define_predicate "sfb_alu_operand"
>(ior (match_operand 0 "arith_operand")
> (match_operand 0 "lui_operand")))
>
> +(define_predicate "const_lui_operand"
> +  (and (match_code "const_int")
> +   (match_test "(INTVAL (op) & 0xFFF) == 0 && INTVAL (op) != 0")))
> +
> +(define_predicate "add_operand"
> +  (ior (match_operand 0 "arith_operand")
> +   (match_operand 0 "const_lui_operand")))
> +
>  (define_predicate "const_csr_operand"
>(and (match_code "const_int")
> (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
> @@ -59,6 +67,11 @@ (define_predicate "reg_or_0_operand"
>(ior (match_operand 0 "const_0_operand")
> (match_operand 0 "register_operand")))
>
> +;; For use in adds, when adding to an eliminable register.
> +(define_predicate "reg_or_const_int_operand"
> +  (ior (match_code "const_int")
> +   (match_operand 0 "register_operand")))
> +
>  ;; Only use branch-on-bit sequences when the mask is not an ANDI
> immediate.
>  (define_predicate "branch_on_bit_operand"
>(and (match_code "const_int")
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index 649c5c977e1..8f0aa8114be 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -63,6 +63,7 @@ extern void riscv_expand_conditional_move (rtx, rtx, rtx,
> rtx_code, rtx, rtx);
>  extern rtx riscv_legitimize_call_address (rtx);
>  extern void riscv_set_return_address (rtx, rtx);
>  extern bool riscv_expand_block_move (rtx, rtx, rtx);
> +extern bool riscv_eliminable_reg (rtx);
>  extern rtx riscv_return_addr (int, rtx);
>  extern poly_int64 riscv_initial_elimination_offset (int, int);
>  extern void riscv_expand_prologue (void);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 675d92c0961..b5577a4f366 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4320,6 +4320,16 @@ riscv_initial_elimination_offset (int from, int to)
>return src - dest;
>  }
>
> +/* Return true if X is a register that will be eliminated later on.  */
> +bool
> +riscv_eliminable_reg (rtx x)
> +{
> +  return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM
> +   || REGNO (x) == ARG_POINTER_REGNUM
> +   || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
> +   && REGNO (x) <= LAST_VIRTUAL_REGISTER));
> +}
> +
>  /* Implement RETURN_ADDR_RTX.  We do not support moving back to a
> previous frame.  */
>
> @@ -4521,8 +4531,9 

RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator support.

2022-09-19 Thread Eugene Rozenfeld via Gcc-patches
Hi Jason,

Do you have any more feedback for this patch?

Thanks,

Eugene

-Original Message-
From: Eugene Rozenfeld 
Sent: Thursday, September 08, 2022 5:46 PM
To: Jason Merrill ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

Jason,

Thank for your suggestion. The patch is updated (attached).

Thanks,

Eugene

-Original Message-
From: Jason Merrill 
Sent: Thursday, September 08, 2022 10:26 AM
To: Eugene Rozenfeld ; gcc-patches@gcc.gnu.org
Cc: Andi Kleen ; Jan Hubicka 
Subject: Re: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
support.

On 9/1/22 18:22, Eugene Rozenfeld wrote:
> Jason,
> 
> I made another small change in addressing your feedback (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Gcc-patches
>  On Behalf Of 
> Eugene Rozenfeld via Gcc-patches
> Sent: Thursday, September 01, 2022 1:49 PM
> To: Jason Merrill ; gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: RE: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> Jason,
> 
> Thank you for your review. You are correct, we only need to check 
> has_discriminator for the statement that's on the same line.
> I updated the patch (attached).
> 
> Thanks,
> 
> Eugene
> 
> -Original Message-
> From: Jason Merrill 
> Sent: Thursday, August 18, 2022 6:55 PM
> To: Eugene Rozenfeld ; 
> gcc-patches@gcc.gnu.org
> Cc: Andi Kleen ; Jan Hubicka 
> Subject: [EXTERNAL] Re: [PING][PATCH] Add instruction level discriminator 
> support.
> 
> On 8/3/22 17:25, Eugene Rozenfeld wrote:
>> One more ping for this patch
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.htmldata=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3Dreserved=0
>>
>> CC Jason since this changes discriminators emitted in dwarf.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Monday, June 27, 2022 12:45 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: RE: [PING][PATCH] Add instruction level discriminator support.
>>
>> Another ping for 
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.htmldata=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=Kj3YWJrDqjX%2B0Ml3At5P8NRWc1Xs6mbI%2F1vCk9%2BLaQs%3Dreserved=0
>>  .
>>
>> I got a review from Andi 
>> (https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596549.htmldata=05%7C01%7Ceugene.rozenfeld%40microsoft.com%7Cc7c9fab519184eee0bb808da91bf2fea%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637982547579085499%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=NBzFtD0mH7EYKxsVWfqZgSpQmUG18SIt01XRYUlwwW4%3Dreserved=0)
>>  but I also need a review from someone who can approve the changes.
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Eugene Rozenfeld
>> Sent: Friday, June 10, 2022 12:03 PM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: [PING][PATCH] Add instruction level discriminator support.
>>
>> Hello,
>>
>> I'd like to ping this patch:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgcc.
>> gnu.org%2Fpipermail%2Fgcc-patches%2F2022-June%2F596065.htmldata=
>> 0
>> 5%7C01%7Ceugene.rozenfeld%40microsoft.com%7C3e9ebe6dd5b14fe4471808da8
>> 1
>> 85dc68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C63796470932569195
>> 1
>> %7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6
>> I
>> k1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=K%2BMx6jelnED3n%2Be2d
>> T
>> DYAPOqZZ8Zlsd2%2FyPJ0qib5%2FM%3Dreserved=0
>>
>> Thanks,
>>
>> Eugene
>>
>> -Original Message-
>> From: Gcc-patches
>>  On Behalf Of 
>> Eugene Rozenfeld via Gcc-patches
>> Sent: Thursday, June 02, 2022 12:22 AM
>> To: gcc-patches@gcc.gnu.org; Andi Kleen ; Jan 
>> Hubicka 
>> Subject: [EXTERNAL] [PATCH] Add instruction level discriminator support.
>>
>> This is the first in a series of patches to enable discriminator support in 
>> AutoFDO.
>>
>> This patch switches to tracking discriminators per statement/instruction 
>> instead of per basic block. Tracking per basic block was problematic since 
>> not all statements in a basic block needed a 

Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-19 Thread Mikael Morin

Le 19/09/2022 à 21:46, Harald Anlauf a écrit :

Am 18.09.22 um 22:55 schrieb Mikael Morin:

Le 18/09/2022 à 20:32, Harald Anlauf a écrit :


Assumed shape will be on the easy side,
while assumed size likely needs to be excluded for clobbering.


Isn’t it the converse that is true?
Assumed shape can be non-contiguous so have to be excluded, but assumed
size are contiguous, so valid candidates for clobbering. No?


I really was referring here to *dummies*, as in the following example:

program p
   integer :: a(4)
   a = 1
   call sub (a(1), 2)
   print *, a
contains
   subroutine sub (b, k)
     integer, intent(in)  :: k
     integer, intent(out) :: b(*)
!   integer, intent(out) :: b(k)
     if (k > 2) b(k) = k
   end subroutine sub
end program p

Assumed size (*) is just a contiguous hunk of memory of possibly
unknown size, which can be zero.  So you couldn't set a clobber
for the a(1) actual argument.

Couldn't you clobber A entirely?  If no element of B is initialized in 
SUB, well, A has undefined values on return from SUB.  That's how 
INTENT(OUT) works.


Proxy ping [PATCH] Fortran: Fix function attributes [PR100132]

2022-09-19 Thread Harald Anlauf via Gcc-patches
Dear all,

the following patch was submitted by Jose but never reviewed:

https://gcc.gnu.org/pipermail/fortran/2021-April/055946.html

Before, we didn't set function attributes properly when
passing polymorphic pointers, which could lead to
mis-optimization.

The patch is technically fine and regtests ok, although it
can be shortened slightly, which makes it more readable,
see attached.

When testing the suggested testcase I found that it was
accepted (and working fine) with NAG, but it was rejected
by both Intel and Cray.  This troubled me, but I think
it is standard conforming (F2018:15.5.2.7), while the
error messages issued by Intel

PR100132.f90(61): error #8300: If a dummy argument is allocatable or a pointer, 
and the dummy or its associated actual argument is polymorphic, both dummy and 
actual must be polymorphic with the same declared type or both must be 
unlimited polymorphic.   [S]
call set(s)
-^

and a similar one by Cray, suggest that they refer to
F2018:15.5.2.5, which IMHO does not apply here.
(The text in the error message seems very related to
the reasoning in Note 1 of that subsection).

I'd like to hear (read: read) a second opinion on that.

Thanks,
Harald

From 0b19cfc098554572279c8d19997df4823b426191 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 19 Sep 2022 22:00:45 +0200
Subject: [PATCH] Fortran: Fix function attributes [PR100132]

gcc/fortran/ChangeLog:

	PR fortran/100132
	* trans-types.cc (create_fn_spec): Fix function attributes when
	passing polymorphic pointers.

gcc/testsuite/ChangeLog:

	PR fortran/100132
	* gfortran.dg/PR100132.f90: New test.
---
 gcc/fortran/trans-types.cc | 15 +-
 gcc/testsuite/gfortran.dg/PR100132.f90 | 75 ++
 2 files changed, 88 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/PR100132.f90

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 0ea7c74a6f1..c062a5b29d7 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -3054,12 +3054,23 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
   for (f = gfc_sym_get_dummy_args (sym); f; f = f->next)
 if (spec_len < sizeof (spec))
   {
-	if (!f->sym || f->sym->attr.pointer || f->sym->attr.target
+	bool is_class = false;
+	bool is_pointer = false;
+
+	if (f->sym)
+	  {
+	is_class = f->sym->ts.type == BT_CLASS && CLASS_DATA (f->sym)
+	  && f->sym->attr.class_ok;
+	is_pointer = is_class ? CLASS_DATA (f->sym)->attr.class_pointer
+  : f->sym->attr.pointer;
+	  }
+
+	if (f->sym == NULL || is_pointer || f->sym->attr.target
 	|| f->sym->attr.external || f->sym->attr.cray_pointer
 	|| (f->sym->ts.type == BT_DERIVED
 		&& (f->sym->ts.u.derived->attr.proc_pointer_comp
 		|| f->sym->ts.u.derived->attr.pointer_comp))
-	|| (f->sym->ts.type == BT_CLASS
+	|| (is_class
 		&& (CLASS_DATA (f->sym)->ts.u.derived->attr.proc_pointer_comp
 		|| CLASS_DATA (f->sym)->ts.u.derived->attr.pointer_comp))
 	|| (f->sym->ts.type == BT_INTEGER && f->sym->ts.is_c_interop))
diff --git a/gcc/testsuite/gfortran.dg/PR100132.f90 b/gcc/testsuite/gfortran.dg/PR100132.f90
new file mode 100644
index 000..78ae6702810
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100132.f90
@@ -0,0 +1,75 @@
+! { dg-do run }
+!
+! Test the fix for PR100132
+!
+
+module main_m
+  implicit none
+
+  private
+
+  public :: &
+foo_t
+
+  public :: &
+set,&
+get
+
+  type :: foo_t
+integer :: i
+  end type foo_t
+
+  type(foo_t), save, pointer :: data => null()
+
+contains
+
+  subroutine set(this)
+class(foo_t), pointer, intent(in) :: this
+
+if(associated(data)) stop 1
+data => this
+  end subroutine set
+
+  subroutine get(this)
+type(foo_t), pointer, intent(out) :: this
+
+if(.not.associated(data)) stop 4
+this => data
+nullify(data)
+  end subroutine get
+
+end module main_m
+
+program main_p
+
+  use :: main_m, only: &
+foo_t, set, get
+
+  implicit none
+
+  integer, parameter :: n = 1000
+
+  type(foo_t), pointer :: ps
+  type(foo_t),  target :: s
+  integer  :: i, j, yay, nay
+
+  yay = 0
+  nay = 0
+  do i = 1, n
+s%i = i
+call set(s)
+call get(ps)
+if(.not.associated(ps)) stop 13
+j = ps%i
+if(i/=j) stop 14
+if(i/=s%i) stop 15
+if(ps%i/=s%i) stop 16
+if(associated(ps, s))then
+  yay = yay + 1
+else
+  nay = nay + 1
+end if
+  end do
+  if((yay/=n).or.(nay/=0)) stop 17
+
+end program main_p
--
2.35.3



Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-19 Thread Harald Anlauf via Gcc-patches

Am 18.09.22 um 22:55 schrieb Mikael Morin:

Le 18/09/2022 à 20:32, Harald Anlauf a écrit :


Assumed shape will be on the easy side,
while assumed size likely needs to be excluded for clobbering.


Isn’t it the converse that is true?
Assumed shape can be non-contiguous so have to be excluded, but assumed 
size are contiguous, so valid candidates for clobbering. No?


I really was referring here to *dummies*, as in the following example:

program p
  integer :: a(4)
  a = 1
  call sub (a(1), 2)
  print *, a
contains
  subroutine sub (b, k)
integer, intent(in)  :: k
integer, intent(out) :: b(*)
!   integer, intent(out) :: b(k)
if (k > 2) b(k) = k
  end subroutine sub
end program p

Assumed size (*) is just a contiguous hunk of memory of possibly
unknown size, which can be zero.  So you couldn't set a clobber
for the a(1) actual argument.


No way, really, arrays are going to be a maze of complexity.


Agreed.






[r13-2722 Regression] FAIL: gfortran.dg/ieee/modes_1.f90 -Os (test for excess errors) on Linux/x86_64

2022-09-19 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

de40fab2f32b03c3d8f69f72c7f1e38694f93d35 is the first bad commit
commit de40fab2f32b03c3d8f69f72c7f1e38694f93d35
Author: Francois-Xavier Coudert 
Date:   Sun Sep 4 18:24:23 2022 +0200

Fortran: add IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES

caused

FAIL: gfortran.dg/ieee/modes_1.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/ieee/modes_1.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/ieee/modes_1.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/ieee/modes_1.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/ieee/modes_1.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/ieee/modes_1.f90   -Os  (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-2722/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gfortran.dg/ieee/modes_1.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gfortran.dg/ieee/modes_1.f90 --target_board='unix{-m32\ 
-march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gfortran.dg/ieee/modes_1.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="ieee.exp=gfortran.dg/ieee/modes_1.f90 --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH] c: Stray inform note with -Waddress [PR106947]

2022-09-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 19, 2022 at 02:25:59PM -0400, Marek Polacek via Gcc-patches wrote:
> A trivial fix for maybe_warn_for_null_address where we print an
> inform note without first checking the return value of a warning
> call.
> 
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?
> 
>   PR c/106947
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (maybe_warn_for_null_address): Don't emit stray
>   notes.
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/Waddress-7.c: New test.

Ok, thanks.

Jakub



[PATCH] c: Stray inform note with -Waddress [PR106947]

2022-09-19 Thread Marek Polacek via Gcc-patches
A trivial fix for maybe_warn_for_null_address where we print an
inform note without first checking the return value of a warning
call.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c/106947

gcc/c/ChangeLog:

* c-typeck.cc (maybe_warn_for_null_address): Don't emit stray
notes.

gcc/testsuite/ChangeLog:

* c-c++-common/Waddress-7.c: New test.
---
 gcc/c/c-typeck.cc   | 19 ++-
 gcc/testsuite/c-c++-common/Waddress-7.c | 22 ++
 2 files changed, 32 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Waddress-7.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 43a910d5df2..33d1e8439db 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11738,18 +11738,19 @@ maybe_warn_for_null_address (location_t loc, tree op, 
tree_code code)
   || from_macro_expansion_at (loc))
 return;
 
+  bool w;
   if (code == EQ_EXPR)
-warning_at (loc, OPT_Waddress,
-   "the comparison will always evaluate as % "
-   "for the address of %qE will never be NULL",
-   op);
+w = warning_at (loc, OPT_Waddress,
+   "the comparison will always evaluate as % "
+   "for the address of %qE will never be NULL",
+   op);
   else
-warning_at (loc, OPT_Waddress,
-   "the comparison will always evaluate as % "
-   "for the address of %qE will never be NULL",
-   op);
+w = warning_at (loc, OPT_Waddress,
+   "the comparison will always evaluate as % "
+   "for the address of %qE will never be NULL",
+   op);
 
-  if (DECL_P (op))
+  if (w && DECL_P (op))
 inform (DECL_SOURCE_LOCATION (op), "%qD declared here", op);
 }
 
diff --git a/gcc/testsuite/c-c++-common/Waddress-7.c 
b/gcc/testsuite/c-c++-common/Waddress-7.c
new file mode 100644
index 000..179948553c5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Waddress-7.c
@@ -0,0 +1,22 @@
+/* PR c/106947 */
+/* { dg-do compile } */
+/* { dg-options "-Waddress" } */
+
+#ifndef __cplusplus
+# define bool _Bool
+#endif
+
+#pragma GCC diagnostic ignored "-Waddress"
+int s; /* { dg-bogus "declared" } */
+bool e = 
+int
+main ()
+{
+  int error = 0;
+  {
+bool e1 = 
+if (!e1)
+  error = 1;
+  }
+  return error;
+}

base-commit: de40fab2f32b03c3d8f69f72c7f1e38694f93d35
-- 
2.37.3



Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-19 Thread FX via Gcc-patches
Hi,

> Hmm, not really convinced, but that's a possible interpretation, I guess.

It seems to me to be in line with the handling of all other IEEE intrinsics: 
the user is responsible for querying support before calling any relevant 
routine. I admit that there is no explicit language in the case of the rounding 
mode, I will try to find time to ask for an official interpretation.


> My sentence was about IEEE_*S*ET_ROUNDING_MODE actually.
> My point that we previously reported an error (at compile time) to a user 
> that would try to pass a radix 10 while we now silently do... something else 
> that was requested (i.e. set the radix 2 rounding mode).

Oh sorry, I see what you mean and you’re totally right. This is an easy 
improvement, and while I believe it is undefined behaviour in any case, we can 
easily improve that.


> I think it's worth at least documenting that only radix 2 is supported.

That also makes sense. I’ll propose a patch.

Thanks,
FX

Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-19 Thread Mikael Morin

Le 19/09/2022 à 18:17, FX a écrit :

Hi,


2. Add the optional RADIX argument to IEEE_SET_ROUNDING_MODE and 
IEEE_GET_ROUNDING_MODE. It is unused for now, because we do not support 
floating-point radices other than 2.

If we accept the argument, we have to support it somehow.
So for IEEE_GET_ROUNDING_MODE (*, 10), we should return IEEE_OTHER, shouldn't 
we?


I think I disagree. We do not support any real kind with radix 10, so calling 
IEEE_GET_ROUNDING_MODE with RADIX=10 is invalid. Or, in another interpretation, 
the rounding mode is whatever-you-want-to-call it, since you cannot perform 
arithmetic in that radix anyway.


Hmm, not really convinced, but that's a possible interpretation, I guess.



There is no problem for IEEE_SET_ROUNDING_MODE (*, 10) as there is no way this 
to be a valid call if radix 10 is not supported, but changing a compile time 
error to a runtime unexpected behavior seems like a step backwards.


What do you mean by that? The behavior is not unexpected, the value returned by 
IEEE_GET_ROUNDING_MODE for RADIX=10 is irrelevant and cannot be used for 
anything useful.



My sentence was about IEEE_*S*ET_ROUNDING_MODE actually.
My point that we previously reported an error (at compile time) to a 
user that would try to pass a radix 10 while we now silently do... 
something else that was requested (i.e. set the radix 2 rounding mode).


I think it's worth at least documenting that only radix 2 is supported.


[PATCH] testsuite: Do not prefix linker script with "-Wl,"

2022-09-19 Thread Torbjörn SVENSSON via Gcc-patches
The linker script should not be prefixed with "-Wl," - it's not an
input file and does not interfere with the new dump output filename
strategy.

gcc/testsuite/ChangeLog:

* lib/gcc-defs.exp: Do not prefix linker script with "-Wl,".

Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/lib/gcc-defs.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
index 42ef1d85432..2102ed6f7a3 100644
--- a/gcc/testsuite/lib/gcc-defs.exp
+++ b/gcc/testsuite/lib/gcc-defs.exp
@@ -332,7 +332,7 @@ proc gcc_adjust_linker_flags_list { args } {
continue
} elseif { $skip != "" } then {
set skip ""
-   } elseif { $opt == "-Xlinker" } then {
+   } elseif { $opt == "-Xlinker" || $opt == "-T" } then {
set skip $opt
} elseif { ![string match "-*" $opt] \
   && [file isfile $opt] } {
-- 
2.25.1



Re: [PATCH] c++: Implement P1467R9 - Extended floating-point types and standard names compiler part except for bfloat16 [PR106652]

2022-09-19 Thread Jakub Jelinek via Gcc-patches
On Sat, Sep 17, 2022 at 10:58:54AM +0200, Jason Merrill wrote:
> > I thought it is fairly important because __float128 has been around in GCC
> > for 19 years already.  To be precise, I think e.g. for x86_64 GCC 3.4
> > introduced it, but mangling was implemented only in GCC 4.1 (2006), before 
> > we ICEd
> > on those.  Until glibc 2.26 (2017) one had to use libquadmath when
> > math library functions were needed, but since then one can just use libm.
> > __float128 is on some targets (e.g. PA) just another name for long double,
> > not a distinct type.
> 
> I think we certainly want to continue to support __float128, what I'm
> wondering is how much changing it to mean _Float128 will affect existing
> code.  I would guess that a lot of code that just works on __float128 will
> continue to work without modification.  Does anyone know of significant
> existing uses of __float128?

I know boost uses it in its cstdfloat.hpp and stuff that uses it.

> > Another thing are the PowerPC __ieee128 and __ibm128 type, I think for the
> > former we can't make it the same type as _Float128, because e.g. libstdc++
> > code relies on __ieee128 and __ibm128 being long double type of the other
> > ABI, so they should mangle as long double of the other ABI.  But in that
> > case they can't act as distinct types when long double should mangle the
> > same as they do.  And it would be weird if those types in one
> > -mabi=*longdouble mode worked as standard floating-point type and in another
> > as extended floating-point type, rather than just types which are neither
> > standard nor extended as before.
> 
> Absolutely we don't want to mess with __ieee128 and __ibm128.  And I guess
> that means that we need to preserve the non-standard type handling for the
> alternate long double.
> 
> I think we can still change __float128 to be _Float128 on PPC and other
> targets where it's currently an alias for long double.
> 
> It seems to me that it's a question of what provides the better transition
> path for users.  I imagine we'll want to encourage people to replace
> __float128 with std::float128_t everywhere.
> 
> In the existing model, it's not portable whether
> 
> void f(long double) { }
> void f(__float128) { }
> 
> is an overload or an erroneous redefinition.  In the new model, you can
> portably write
> 
> void f(long double) { }
> void f(std::float128_t) { }
> 
> and existing __float128 code will call the second one.  Old code that had
> conditional __float128 overloads when it's different from long double will
> need to change to have unconditional _Float128 overloads.
> 
> If we don't change __float128 to mean _Float128, we require fewer immediate
> changes for a library that does try to support all floating-point types, but
> it will need changes to support _Float128 and will need to keep around
> conditional __float128 overloads indefinitely.

I agree that we should judge on what makes the forward path for users
better.
I just think we serve users better if we keep __float128 as is, it will be
then better consistent with other weird types like __float80 (on x86/ia64,
same representation/mode as long double, but distinct with separate
mangling), __fpreg on ia64 etc.  It is true that whether __float128 is
distinct or same type as other standard or non-standard types right now
differs from target to target (on x86 obviously it is distinct from all
other currently supported types because we didn't have other IEEE quad
type but __float80 was distinct even if we had one, ia64 has distinct
__float128 unless it is HP-UX where it is same type as long double,
on PA it is same as long double if __float128 exists at all, on powerpc
it is a define to __ieee128 right now where __ieee128 is same type as long
double in the -mabi=ieeelongdouble, and distinct type otherwise (but only
with -mvsx, otherwise it isn't supported (on by default for ppc64le but not
the others)).  So, code that wants to overload with __float128 or use
__float128 in template arguments needs to do some ifdefs to find out
what it should do.  But, if we make __float128 the same as _Float128,
I think it will mean people actually need to make the conditions even more
complex, because what __float128 will be and how it will behave will depend
on the compiler version.
I believe libraries with floating point stuff will want to add
std::{float{16,32,64,128},bfloat16}_t overloads eventually, not just
std::float128_t overloads, and it will be better if it uses the standard
f{16,32,64,128} literal suffixes, builtins, library functions etc. rather
than Q suffixes, *q builtins, libquadmath functions etc.
Say on x86 it would be inconsistent if __float80 remains to be some
non-standard type, but __float128 is now extended type.  And when
__float128 and _Float128 is distinct, we can keep clean DF128_ mangling
for the latter.

Anyway, here is an updated patch that adds also _Float{32,64,128}x support
with DF{32,64,128}x mangling and demangling and the conv->bad_p + pedwarn

[PATCH] testsuite: 'b' instruction can't do long enough jumps

2022-09-19 Thread Torbjörn SVENSSON via Gcc-patches
After moving the testglue in commit 9d503515cee, the jump to exit and
abort is too far for the 'b' instruction on Cortex-M0. As most of the
C code would generate a 'bl' instruction instead of a 'b'
instruction, lets do the same for the inline assembler.

The error seen without this patch:

/tmp/cccCRiCl.o: in function `main':
stack-protector-1.c:(.text+0x4e): relocation truncated to fit: R_ARM_THM_JUMP11 
against symbol `__wrap_exit' defined in .text section in gcc_tg.o
stack-protector-1.c:(.text+0x50): relocation truncated to fit: R_ARM_THM_JUMP11 
against symbol `__wrap_abort' defined in .text section in gcc_tg.o
collect2: error: ld returned 1 exit status

gcc/testsuite/ChangeLog:

* gcc/testsuite/gcc.target/arm/stack-protector-1.c: Use 'bl'
instead of 'b' instruction.
* gcc/testsuite/gcc.target/arm/stack-protector-3.c: Likewise.

Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/gcc.target/arm/stack-protector-1.c | 4 ++--
 gcc/testsuite/gcc.target/arm/stack-protector-3.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-1.c 
b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
index 8d28b0a847c..3f0ffc9c3f3 100644
--- a/gcc/testsuite/gcc.target/arm/stack-protector-1.c
+++ b/gcc/testsuite/gcc.target/arm/stack-protector-1.c
@@ -56,8 +56,8 @@ asm (
 "  ldr r1, [sp, #4]\n"
CHECK (r1)
 "  mov r0, #0\n"
-"  b   exit\n"
+"  bl  exit\n"
 "1:\n"
-"  b   abort\n"
+"  bl  abort\n"
 "  .size   main, .-main"
 );
diff --git a/gcc/testsuite/gcc.target/arm/stack-protector-3.c 
b/gcc/testsuite/gcc.target/arm/stack-protector-3.c
index b8f77fa2309..2f710529b8f 100644
--- a/gcc/testsuite/gcc.target/arm/stack-protector-3.c
+++ b/gcc/testsuite/gcc.target/arm/stack-protector-3.c
@@ -26,7 +26,7 @@ asm (
 "  .type   __stack_chk_fail, %function\n"
 "__stack_chk_fail:\n"
 "  movsr0, #0\n"
-"  b   exit\n"
+"  bl  exit\n"
 "  .size   __stack_chk_fail, .-__stack_chk_fail"
 );
 
-- 
2.25.1



[PATCH] Avoid depending on destructor order

2022-09-19 Thread Thomas Neumann via Gcc-patches

In some scenarios (e.g., when mixing gcc and clang code), it can
happen that frames are deregistered after the lookup structure
has already been destroyed. That in itself would be fine, but
it triggers an assert in __deregister_frame_info_bases that
expects to find the frame.

To avoid that, we now remember that the btree as already been
destroyed and disable the assert in that case.

libgcc/ChangeLog:

* unwind-dw2-fde.c: (release_register_frames) Remember
when the btree has been destroyed.
(__deregister_frame_info_bases) Disable the assert when
shutting down.
---
 libgcc/unwind-dw2-fde.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 919abfe0664..d237179f4ea 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -48,6 +48,7 @@ typedef __UINTPTR_TYPE__ uintptr_type;
 #include "unwind-dw2-btree.h"

 static struct btree registered_frames;
+static bool in_shutdown;

 static void
 release_registered_frames (void) __attribute__ ((destructor (110)));
@@ -57,6 +58,7 @@ release_registered_frames (void)
   /* Release the b-tree and all frames. Frame releases that happen 
later are

* silently ignored */
   btree_destroy (_frames);
+  in_shutdown = true;
 }

 static void
@@ -282,7 +284,7 @@ __deregister_frame_info_bases (const void *begin)
   __gthread_mutex_unlock (_mutex);
 #endif

-  gcc_assert (ob);
+  gcc_assert (in_shutdown || ob);
   return (void *) ob;
 }

--
2.34.1



[PATCH] testsuite: Skip intrinsics test if arm

2022-09-19 Thread Torbjörn SVENSSON via Gcc-patches
In the test case, it's clearly written that intrinsics is not
implemented on arm*. A simple xfail does not help since there are
link error and that would cause an UNRESOLVED testcase rather than
XFAIL.
By chaning to dg-skip-if, the entire test case is omitted.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace
dg-xfail-if with gd-skip-if.

Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
index 92a139bc523..f933102be47 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
@@ -1,6 +1,6 @@
 /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
 /* { dg-do run } */
+/* { dg-skip-if "unsupported" { arm*-*-* } } */
 /* { dg-options "-O3" } */
 
 #include 
-- 
2.25.1



Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-19 Thread FX via Gcc-patches
Hi,

>> 2. Add the optional RADIX argument to IEEE_SET_ROUNDING_MODE and 
>> IEEE_GET_ROUNDING_MODE. It is unused for now, because we do not support 
>> floating-point radices other than 2.
> If we accept the argument, we have to support it somehow.
> So for IEEE_GET_ROUNDING_MODE (*, 10), we should return IEEE_OTHER, shouldn't 
> we?

I think I disagree. We do not support any real kind with radix 10, so calling 
IEEE_GET_ROUNDING_MODE with RADIX=10 is invalid. Or, in another interpretation, 
the rounding mode is whatever-you-want-to-call it, since you cannot perform 
arithmetic in that radix anyway.


> There is no problem for IEEE_SET_ROUNDING_MODE (*, 10) as there is no way 
> this to be a valid call if radix 10 is not supported, but changing a compile 
> time error to a runtime unexpected behavior seems like a step backwards.

What do you mean by that? The behavior is not unexpected, the value returned by 
IEEE_GET_ROUNDING_MODE for RADIX=10 is irrelevant and cannot be used for 
anything useful.

FX

[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865] (2/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Split TARGET_POWER8 from TARGET_DIRECT_MOVE [PR101865]

Hi,
  The _ARCH_PWR8 define is conditional on TARGET_DIRECT_MOVE,
and can be disabled by dependent options when it should not be.
This manifests in the issue seen in PR101865 where -mno-vsx
mistakenly disables _ARCH_PWR8.

This change replaces the relevant TARGET_DIRECT_MOVE references
with a TARGET_POWER8 entry so that the direct_move and power8
features can be enabled or disabled independently.

This is done via the OPTION_MASK definitions, so this
means that some references to the OPTION_MASK_DIRECT_MOVE
option are now replaced with OPTION_MASK_POWER8.

The existing (and rather lengthy) commentary for DIRECT_MOVE remains
in place in rs6000-c.cc:rs6000_target_modify_macros().  The
if-defined logic there will now set a __DIRECT_MOVE__ define when
TARGET_DIRECT_MOVE is set, this serves as a placeholder for debug
purposes, but is otherwise unused.  This can be removed in a
subsequent patch, or in an update of this patch, depending on feedback.

This regests cleanly (power8,power9,power10), and resolves
PR 101865 as represented in the tests from (1/2).

OK for trunk?
Thanks,
-Will


gcc/
PR Target/101865
* config/rs6000/rs6000-builtin.cc
(rs6000_builtin_is_supported): Replace TARGET_DIRECT_MOVE
usage with TARGET_POWER8.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros):
Add __DIRECT_MOVE__ define.  Replace _ARCH_PWR8_ define
conditional with OPTION_MASK_POWER8.
* config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER):
Add OPTION_MASK_POWER8 entry.
(POWERPC_MASKS): Same.
* config/rs6000/rs6000.cc (rs6000_option_override_internal):
Replace OPTION_MASK_DIRECT_MOVE usage with OPTION_MASK_POWER8.
(rs6000_opt_masks): Add "power8" entry for new OPTION_MASK_POWER8.
* config/rs6000/rs6000.opt (-mpower8): Add entry for POWER8.
* config/rs6000/vsx.md (vsx_extract_): Replace
TARGET_DIRECT_MOVE usage with TARGET_POWER8.
(define_peephole2): Same.

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 3ce729c1e6de..91a0f39bd796 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -163,11 +163,11 @@ rs6000_builtin_is_supported (enum rs6000_gen_builtins 
fncode)
 case ENB_P7:
   return TARGET_POPCNTD;
 case ENB_P7_64:
   return TARGET_POPCNTD && TARGET_POWERPC64;
 case ENB_P8:
-  return TARGET_DIRECT_MOVE;
+  return TARGET_POWER8;
 case ENB_P8V:
   return TARGET_P8_VECTOR;
 case ENB_P9:
   return TARGET_MODULO;
 case ENB_P9_64:
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index ca9cc42028f7..41d51b039061 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -439,11 +439,13 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
flags)
  turned off in any of the following conditions:
  1. TARGET_HARD_FLOAT, TARGET_ALTIVEC, or TARGET_VSX is explicitly
disabled and OPTION_MASK_DIRECT_MOVE was not explicitly
enabled.
  2. TARGET_VSX is off.  */
-  if ((flags & OPTION_MASK_DIRECT_MOVE) != 0)
+  if ((OPTION_MASK_DIRECT_MOVE) != 0)
+rs6000_define_or_undefine_macro (define_p, "__DIRECT_MOVE__");
+  if ((flags & OPTION_MASK_POWER8) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR8");
   if ((flags & OPTION_MASK_MODULO) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
 rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
diff --git a/gcc/config/rs6000/rs6000-cpus.def 
b/gcc/config/rs6000/rs6000-cpus.def
index c3825bcccd84..c873f6d58989 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -48,10 +48,11 @@
system.  */
 #define ISA_2_7_MASKS_SERVER   (ISA_2_6_MASKS_SERVER   \
 | OPTION_MASK_P8_VECTOR\
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   \
 | OPTION_MASK_EFFICIENT_UNALIGNED_VSX  \
 | OPTION_MASK_QUAD_MEMORY  \
 | OPTION_MASK_QUAD_MEMORY_ATOMIC)
 
 /* ISA masks setting fusion options.  */
@@ -124,10 +125,11 @@
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
 | OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_POWER8   

[PATCH] RISC-V modified add3 for large stack frame optimization [PR105733]

2022-09-19 Thread Kevin Lee
Hello GCC,
 Started from Jim Wilson's patch in
https://github.com/riscv-admin/riscv-code-speed-optimization/blob/main/projects/gcc-optimizations.adoc
for the large stack frame optimization problem, this augmented patch
generates less instructions for cases such as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733.
Original:
foo:
li t0,-4096
addi t0,t0,2016
li a4,4096
add sp,sp,t0
li a5,-4096
addi a4,a4,-2032
add a4,a4,a5
addi a5,sp,16
add a5,a4,a5
add a0,a5,a0
li t0,4096
sd a5,8(sp)
sb zero,2032(a0)
addi t0,t0,-2016
add sp,sp,t0
jr ra
After Patch:
foo:
li t0,-4096
addi t0,t0,2032
add sp,sp,t0
addi a5,sp,-2032
add a0,a5,a0
li t0,4096
sb zero,2032(a0)
addi t0,t0,-2032
add sp,sp,t0
jr ra

   = Summary of gcc testsuite =
| # of unexpected case / # of unique unexpected
case
|  gcc |  g++ | gfortran |
 rv64gc/  lp64d/ medlow |4 / 4 |   13 / 4 |0 / 0 |
No additional failures were created from the testsuite.

gcc/ChangeLog:
   Jim Wilson 
   Michael Collison 
   Kevin Lee 

* config/riscv/predicates.md (const_lui_operand): New predicate.
(add_operand): Ditto.
(reg_or_const_int_operand): Ditto.
* config/riscv/riscv-protos.h (riscv_eliminable_reg): New
function.
* config/riscv/riscv.cc (riscv_eliminable_reg): New Function.
(riscv_adjust_libcall_cfi_prologue): Use gen_rtx_SET and
gen_rtx_fmt_ee instead of gen_add3_insn.
(riscv_adjust_libcall_cfi_epilogue): ditto.
* config/riscv/riscv.md (addsi3): Remove.
(adddi3): ditto.
(add3): New instruction for large stack frame optimization.
(add3_internal): ditto
(add3_internal2): New instruction for insns generated in
the prologue and epilogue pass.
---
 gcc/config/riscv/predicates.md  | 13 +
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc   | 20 ++--
 gcc/config/riscv/riscv.md   | 84 -
 4 files changed, 101 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 862e72b0983..b98bb5a9768 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -35,6 +35,14 @@ (define_predicate "sfb_alu_operand"
   (ior (match_operand 0 "arith_operand")
(match_operand 0 "lui_operand")))

+(define_predicate "const_lui_operand"
+  (and (match_code "const_int")
+   (match_test "(INTVAL (op) & 0xFFF) == 0 && INTVAL (op) != 0")))
+
+(define_predicate "add_operand"
+  (ior (match_operand 0 "arith_operand")
+   (match_operand 0 "const_lui_operand")))
+
 (define_predicate "const_csr_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 31)")))
@@ -59,6 +67,11 @@ (define_predicate "reg_or_0_operand"
   (ior (match_operand 0 "const_0_operand")
(match_operand 0 "register_operand")))

+;; For use in adds, when adding to an eliminable register.
+(define_predicate "reg_or_const_int_operand"
+  (ior (match_code "const_int")
+   (match_operand 0 "register_operand")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/riscv/riscv-protos.h
b/gcc/config/riscv/riscv-protos.h
index 649c5c977e1..8f0aa8114be 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -63,6 +63,7 @@ extern void riscv_expand_conditional_move (rtx, rtx, rtx,
rtx_code, rtx, rtx);
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
+extern bool riscv_eliminable_reg (rtx);
 extern rtx riscv_return_addr (int, rtx);
 extern poly_int64 riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 675d92c0961..b5577a4f366 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4320,6 +4320,16 @@ riscv_initial_elimination_offset (int from, int to)
   return src - dest;
 }

+/* Return true if X is a register that will be eliminated later on.  */
+bool
+riscv_eliminable_reg (rtx x)
+{
+  return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM
+   || REGNO (x) == ARG_POINTER_REGNUM
+   || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
+   && REGNO (x) <= LAST_VIRTUAL_REGISTER));
+}
+
 /* Implement RETURN_ADDR_RTX.  We do not support moving back to a
previous frame.  */

@@ -4521,8 +4531,9 @@ riscv_adjust_libcall_cfi_prologue ()
   }

   /* Debug info for adjust sp.  */
-  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
- stack_pointer_rtx, GEN_INT (-saved_size));
+  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
+gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx),
+  stack_pointer_rtx, GEN_INT (saved_size)));
   dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
   dwarf);
   return dwarf;
@@ 

[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option. (1/2)

2022-09-19 Thread will schmidt via Gcc-patches
[PATCH, rs6000] Tests of ARCH_PWR8 and -mno-vsx option.

Hi,

This adds an assortment of tests to exercise the -mno-vsx option and
confirm the impacts on the ARCH_PWR8 define.

These are based on and inspired by PR 101865, which
reports that _ARCH_PWR8 is disabled when -mno-vsx
is passed on the commandline.

There are a small number of failures introduced by these tests,
those are resolved with the changes in part 2.

OK for trunk?
Thanks,
-Will


gcc/testsuite:
* gcc.target/powerpc/predefine_p7-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-noaltivec-novsx.c: New test.
* gcc.target/powerpc/predefine_p8-novsx.c: New test.
* gcc.target/powerpc/predefine_p9-novsx.c: New test.
* gcc.target/powerpc/predefine_pragma_vsx.c: New test.


diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c 
b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
new file mode 100644
index ..e842025b4d3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p7-novsx.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR7 and ARCH_PWR8 defines gets set
+ * when we specify power7, plus options.
+/* This is a variation of the test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power7 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR7 
1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define _ARCH_PWR8 
1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p7-novsx.i "(^|\\n)#define __VSX__ 
1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p7-novsx.i "(^|\\n)#define __ALTIVEC__ 
1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c 
b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
new file mode 100644
index ..c3b705ca3d48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-noaltivec-novsx.c
@@ -0,0 +1,7 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling both altivec 
and vsx. */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-altivec -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
_ARCH_PWR8 1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
_ARCH_PWR9 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
__VSX__ 1($|\\n)" } } */
+/* { dg-final { scan-file-not predefine_p8-noaltivec-novsx.i "(^|\\n)#define 
__ALTIVEC__ 1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c 
b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
new file mode 100644
index ..8b6c69b20104
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p8-novsx.c
@@ -0,0 +1,8 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power8 -mno-vsx" } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define _ARCH_PWR8 
1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p8-novsx.i "(^|\\n)#define __VSX__ 
1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p8-novsx.i "(^|\\n)#define __ALTIVEC__ 
1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c 
b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
new file mode 100644
index ..eef42c111663
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_p9-novsx.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* Test whether the ARCH_PWR8 define remains set after disabling vsx.
+   This also confirms __ALTIVEC__ remains set when VSX is disabled. */
+/* This is the primary test at issue in GCC PR 101865 */
+/* { dg-options "-dM -E -mdejagnu-cpu=power9 -mno-vsx" } */
+/* {xfail *-*-*} */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR8 
1($|\\n)"  } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define _ARCH_PWR9 
1($|\\n)"  } } */
+/* { dg-final { scan-file-not predefine_p9-novsx.i "(^|\\n)#define __VSX__ 
1($|\\n)" } } */
+/* { dg-final { scan-file predefine_p9-novsx.i "(^|\\n)#define __ALTIVEC__ 
1($|\\n)" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c 
b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
new file mode 100644
index ..b300600af999
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/predefine_pragma_vsx.c
@@ -0,0 +1,83 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2" } */
+
+/* Ensure that if we set a pragma gcc target for an
+   older processor, we do not compile builtins that
+   the older target does not 

Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-19 Thread Mikael Morin

Hello,

I'm coming (late) to this.

Le 31/08/2022 à 20:29, FX via Fortran a écrit :

This adds new F2018 features, that are not really enabled (because their 
runtime support is optional).


(...)


2. Add the optional RADIX argument to IEEE_SET_ROUNDING_MODE and 
IEEE_GET_ROUNDING_MODE. It is unused for now, because we do not support 
floating-point radices other than 2.


If we accept the argument, we have to support it somehow.
So for IEEE_GET_ROUNDING_MODE (*, 10), we should return IEEE_OTHER, 
shouldn't we?
There is no problem for IEEE_SET_ROUNDING_MODE (*, 10) as there is no 
way this to be a valid call if radix 10 is not supported, but changing a 
compile time error to a runtime unexpected behavior seems like a step 
backwards.




Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches



At least when building LibreOffice with Clang (16 trunk) with ASan and 
UBsan enabled against libstdc++ (with --gcc-toolchain and 
LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change 
at build and run-time), at least one of the LibreOffice tests executed 
during the build started to fail with


I could not reproduce the issue when building LibreOffice with gcc, but 
after reading the compiler-rt version of crtbegin.c I think the problem 
is the destruction order in compiler-rt. It calls 
__deregister_frame_info_bases after our lookup structure has already 
been destroyed.


Can you try if the patch below fixes the problem? It keeps the data 
structures alive at shutdown, though, which will probably make some leak 
detectors unhappy.


Alternatively we could simply remove the gcc_assert (ob) in line 285 of 
that file. As far as I can see in crt-begin nothing bad happens if we 
return nullptr at shutdown.


Best

Thomas


diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index 919abfe0664..d427318280c 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -49,16 +49,6 @@ typedef __UINTPTR_TYPE__ uintptr_type;

 static struct btree registered_frames;

-static void
-release_registered_frames (void) __attribute__ ((destructor (110)));
-static void
-release_registered_frames (void)
-{
-  /* Release the b-tree and all frames. Frame releases that happen 
later are

-   * silently ignored */
-  btree_destroy (_frames);
-}
-
 static void
 get_pc_range (const struct object *ob, uintptr_type *range);
 static void




Re: [PATCH][RFH] Wire ranger into FRE

2022-09-19 Thread Andrew MacLeod via Gcc-patches
Yeah, currently the internal cache isnt really wired into the 
fold_using_range as its is suppose to be a consistent internal state.  
so its not currently expecting to work in a situation here what it 
thinks is global state might change.


I figured I better go back and watch your entire VN presentation, I only 
caught the last bit live.


Let me try to understand exactly what you want/need.. and let me use a 
simple example from that talk (yes I just typed it in :-).


  n_2 = 1
  i_4 = 0
  val_5 = 0
:
  # i_1 = PHI 
  #val_2 = PHI 
  val_6 = val_2 + 1;
  i_7 = i_1 + 1
  if (i_7 < n_3)
    goto ;
  else
    goto ;

  _8 = val_6
  return _8


In an ideal world, While you are processing bb 3 the first iteration,  
you want edge 4->3 to be unexecutable as far as ranger is concerned.  
You walk thru the block, and get to the end.  and in this example, you'd 
be done cause any queries you make range would have if (2 < 1),


For the case when we do need to iterate (say n_2 = 3), you need to go 
back and rewalk that block, with that edge no longer marked as 
unexecutable?  And part of the problem would be that ranger's cache for 
bb 3 would have all those values from the first pass. and everything 
explodes.


It looks like you created a fur_source to manually adjust PHIs within 
the fold_stmt query to ignore edges that are not marked executable.


So now let me start with my questions.

In theory, if you could tell it "kill the cache for bb3"  change the 
executable state and re walk it, that would work?  This clearly get s a 
little more complicated as the size of the region that is being iterated 
grows, so you would want to be able to invalidate the cache for a range 
of blocks.


Thats not quite as straightforward as it might seems because the cache 
for values per basic block is indexed by ssa-name..  so you cant simply 
say, "kill bb3s value".. you need to walk all the ssa-names that have a 
cache object, and if they have an entry for bb3, kill that.  But lets 
leave that alone for a minute, because that is a solvable problem.


Ranger also has a globally available flag that it uses to internally 
flag and track executable state of edge:


  auto_edge_flag non_executable_edge_flag;

The gori object which is instantiated with any ranger instance only uses 
that to determine if an edge should be processed for a value  ie


gori_compute::outgoing_edge_range_p (vrange , edge e, tree name, 
range_query )

{
    if ((e->flags & m_not_executable_flag))
    {
  r.set_undefined ();
  return true;

So it seems to me that if you all set/cleared this flag along with 
EDGE_EXECUTABLE  (and it means the opposite by the way.  TRUE means the 
edge is not_executable.. )  That at least all your queries through 
ranger would be picking up the values you are looking for.  including 
those PHI's.


That would then just leave you with the stale cache state to deal 
with?   And if we can resolve that, would all just work?  at least in 
theory?


There is a concept in the cache of staleness which is used when we 
recalculate over back edges, but its really oriented towards a model 
where we only ever move edges from executable to non-executable.    Your 
VN approach moves them in the other way and I have no confidence it 
would work properly in that mode.


Its not super efficient, but as a starting point we might be able to do 
something like
  1 - when you enter a region call a new routine called "Mark state" 
which would  push an empty bitmap over ssa-names onto a stack
  2 - everytime the cache creates an entry, set the bit for the 
ssa-name in the bitmap at the top of the stack
 3 - when you call "reset state", simply kill all the cache entries for 
any ssa name set in the bitmap.  And also kill rangers internal global 
value for the name.   THIs would in essence reset eveyrthing it knows 
about that ssa name.


That would have to drawback of losing any information that had been 
calculated earlier, and possibly trigger some additional calculations as 
that value will now be stale elsewhere in the IL.   We could make it 
more efficient if you also provided a bitmap over the basic blocks in 
the region (or some way of determining that).  Then we would just kill 
the entries in those blocks for those ssa-names.


As long as you weren't making any ranger queries from outside the region 
during this time, that might work.


Before I go any further, are we on the same page, or are you looking for 
something different than this?


You had mentioned the path_ranger model also, which effectively uses 
ranger for the values on entry to the path/region, and then uses 
over-rided API entry points to choose whether to use the local cache 
vector, or go to rangers on-entry values..


 This approach could also work, and one way would be to implement the 
mark/reset API . The new class would maybe instantiate an additional 
local cache from which values are set/picked up first for any queries 
within the region blocks and when 

Re: [patch, avr] Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints.

2022-09-19 Thread Jonathan Wakely via Gcc-patches
On Mon, 19 Sept 2022 at 10:06, Richard Biener wrote:
>
> On Mon, Sep 19, 2022 at 10:57 AM Georg Johann Lay  wrote:
> >
> >
> >
> > Am 19.09.22 um 09:51 schrieb Richard Biener:
> > > On Sun, Sep 18, 2022 at 7:40 PM Georg Johann Lay  wrote:
> > >>
> > >> Hello,
> > >>
> > >> this patch fixed PR target/99184 which incorrectly rounded during 64-bit
> > >> (long) double to 16-bit and 32-bit integers.
> > >>
> > >> The patch just removes the respective roundings from
> > >> libf7-asm.sx::to_integer and ::to_unsigned.  Luckily, LibF7 does nowhere
> > >> use respective functions internally, the only user is in libf7.c::f7_exp
> > >>
> > >> which reads
> > >>
> > >> f7_round (qq, qq);
> > >> int16_t q = f7_get_s16 (qq);
> > >>
> > >> so that f7_get_s16() operates on an already rounded value, and therefore
> > >> this code works unaltered with or without rounding in to_integer.
> > >>
> > >> The patch applies to directory
> > >>
> > >> ./libgcc/config/avr/libf7/
> > >>
> > >> and is the same for all GCC versions v10+.
> > >>
> > >> Please someone with write permissions commit it to trunk and backport to
> > >> v12, v11, and v10 as it is a wrong-code issue.
> > >>
> > >> The patch will fit without problems (except for ChangeLog) because there
> > >> is no traffic on that folder.
> > >
> > > Thanks, I've pushed the change.  Please in future try to send patches
> > > that can be applied with git am, thus use git format-patch
> > >
> > > Richard.
> >
> > Thanks you so much.  The patch I generated with "git diff > file.diff",
> > so that is not correct?
>
> Yes, it's correct, but see below.

I would say it's just wrong.

> > The only change is that I defined extra hunks
> > for asm so that one can see the function like in
> >
> >   @@ -601,9 +601,6 @@ DEFUN to_integer
> >
> > So git is not prepared to such hunks? Would you point me to some
> > documentation on how to do it properly?
>
> The more useful process is that after changing the code you
> do a git commit to your tree and then
>
> git format-patch --stdout HEAD^
>
> to get a git patch.  On that I can do 'git am' and that will produce
> exactly the commit you produced, including the patch description
> and ChangeLog parts.  I had to add that manually.
>
> Indeed https://gcc.gnu.org/contribute.html doesn't mention this,
> we should probably improve that.  Jonathan, maybe the above
> is not the optimal way so can you think of something to add
> to the 'Submitting Patches' section to document this?

I'll try to come up with something.

I agree that format-patch (or just 'git show') is the proper way to
generate a patch for submission to the mailing lists. Just doing 'git
diff' only works if you haven't committed or staged any part of your
work, and frankly if you haven't even got to the point of committing
it locally, it's just a rough draft and probably not ready to submit
to the mailing list.

The current docs do not really represent anything close to best
practice when using Git for version control. But last time I tried to
improve those docs I gave up because people wanted perfection instead
of "good enough" (compared to "quite bad" as we have now).



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Stephan Bergmann via Gcc-patches

On 19/09/2022 15:55, Thomas Neumann wrote:
Apparently a registered frame is not found when deregistering, which 
triggers an assert. I will debug this. Could you send me a script or a 
description on how to reproduce the issue?


Thanks a lot!  I'm in the process of checking whether a more generic 
LibreOffice build scenario will also exhibit this, and will let you know 
about a (hopefully less demanding) reproducer scenario.




Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 19 Sep 2022, Richard Biener via Gcc-patches wrote:

> > but I guess it's good we do the right thing for correctness sake, and
> > if it ever gets used by someone else.
> >
> > >
> > > That said, 'set_nonnegative' could be interpreted to be without
> > > NaNs?
> >
> > Sounds good to me.  How's this?
> 
> Hmm, I merely had lots of questions above so I think to answer them
> we at least should document what 'set_nonnegative' implies in the
> abstract vrange class.
> 
> It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
> will return true for x == -NaN but the result will be a NaN.

FWIW, in IEEE, 'abs' (like 'copy, 'copysign' and 'negate') are not 
arithmetic, they are quiet-computational.  Hence they don't rise 
anything, not even for sNaNs; they copy the input bits and appropriately 
modify the bit pattern according to the specification (i.e. fiddle the 
sign bit).

That also means that a predicate like negative_p(x) that would be 
implemented ala

  copysign(1.0, x) < 0.0

deal with NaNs just fine and is required to correctly capture the sign of 
'x'.  If frange::set_nonnegative is supposed to be used in such contexts 
(and I think it's a good idea if that were the case), then set_nonnegative 
does _not_ imply no-NaN.

In particular I would assume that, given an VAYRING frange FR, that 
FR.set_nonnegative() would result in an frange {[+0.0,+inf],+nan} .


Ciao,
Michael.


Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Thomas Neumann via Gcc-patches
I haven't debugged this in any way, nor checked whether it only impacts 
exactly my below scenario, but noticed the following:


At least when building LibreOffice with Clang (16 trunk) with ASan and 
UBsan enabled against libstdc++ (with --gcc-toolchain and 
LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change 
at build and run-time), at least one of the LibreOffice tests executed 
during the build started to fail with


Apparently a registered frame is not found when deregistering, which 
triggers an assert. I will debug this. Could you send me a script or a 
description on how to reproduce the issue?


Best

Thomas


[PATCH] c++: stream PACK_EXPANSION_EXTRA_ARGS [PR106761]

2022-09-19 Thread Patrick Palka via Gcc-patches
It looks like some xtreme-header-* tests are failing after the libstdc++
change r13-2158-g02f6b405f0e9dc ultimately because we're neglecting
to stream PACK_EXPANSION_EXTRA_ARGS, which leads to false equivalences
of different partial instantiations of _TupleConstraints::__constructible.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR c++/106761

gcc/cp/ChangeLog:

* module.cc (trees_out::type_node) :
Stream PACK_EXPANSION_EXTRA_ARGS.
(trees_in::tree_node) : Likewise.
---
 gcc/cp/module.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 1a1ff5be574..9a9ef4e3332 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8922,6 +8922,7 @@ trees_out::type_node (tree type)
   if (streaming_p ())
u (PACK_EXPANSION_LOCAL_P (type));
   tree_node (PACK_EXPANSION_PARAMETER_PACKS (type));
+  tree_node (PACK_EXPANSION_EXTRA_ARGS (type));
   break;
 
 case TYPENAME_TYPE:
@@ -9455,12 +9456,14 @@ trees_in::tree_node (bool is_use)
{
  bool local = u ();
  tree param_packs = tree_node ();
+ tree extra_args = tree_node ();
  if (!get_overrun ())
{
  tree expn = cxx_make_type (TYPE_PACK_EXPANSION);
  SET_TYPE_STRUCTURAL_EQUALITY (expn);
  PACK_EXPANSION_PATTERN (expn) = res;
  PACK_EXPANSION_PARAMETER_PACKS (expn) = param_packs;
+ PACK_EXPANSION_EXTRA_ARGS (expn) = extra_args;
  PACK_EXPANSION_LOCAL_P (expn) = local;
  res = expn;
}
-- 
2.38.0.rc0



Re: [PATCH v4] eliminate mutex in fast path of __register_frame

2022-09-19 Thread Stephan Bergmann via Gcc-patches

On 16/09/2022 12:19, Thomas Neumann via Gcc-patches wrote:

The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a fast-path that avoided taking the mutex if no
frame was registered at all.

This commit eliminates both the mutex and the sorted list from
the atomic fast path, and replaces it with a btree that uses
optimistic lock coupling during lookup. This allows for fully parallel
unwinding and is essential to scale exception handling to large
core counts.


I haven't debugged this in any way, nor checked whether it only impacts 
exactly my below scenario, but noticed the following:


At least when building LibreOffice with Clang (16 trunk) with ASan and 
UBsan enabled against libstdc++ (with --gcc-toolchain and 
LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change 
at build and run-time), at least one of the LibreOffice tests executed 
during the build started to fail with



Thread 1 "cppunittester" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=, signo=signo@entry=6, 
no_tid=no_tid@entry=0)Downloading 0.00 MB source file 
/usr/src/debug/glibc-2.36-4.fc37.x86_64/nptl/pthread_kill.c
 at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##nptl##pthread_kill.c:44
44return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO 
(ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=, 
signo=signo@entry=6, no_tid=no_tid@entry=0) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##nptl##pthread_kill.c:44
#1  0x76dcdd33 in __pthread_kill_internal (signo=6, threadid=) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##nptl##pthread_kill.c:78
#2  0x76d7daa6 in __GI_raise (sig=sig@entry=6) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##signal##..##sysdeps##posix##raise.c:26
#3  0x76d677fc in __GI_abort () at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##stdlib##abort.c:79
#4  0x76f377e8 in __deregister_frame_info_bases (begin=) 
at ~/gcc/trunk/src/libgcc/unwind-dw2-fde.c:285
#5  __deregister_frame_info_bases (begin=) at 
~/gcc/trunk/src/libgcc/unwind-dw2-fde.c:223
#6  0x7fffc7c3b53f in __do_fini () at 
~/lo/core/instdir/program/libcairo.so.2
#7  0x77fcda9e in _dl_fini () at 
~/.cache/debuginfod_client/653dfb54d6e6d9c27c349f698a8af1ab86d5501d/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##elf##dl-fini.c:142
#8  0x76d7ff35 in __run_exit_handlers (status=0, listp=0x76f13840 
<__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
run_dtors=run_dtors@entry=true) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##stdlib##exit.c:113
#9  0x76d800b0 in __GI_exit (status=) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##stdlib##exit.c:143
#10 0x76d68517 in __libc_start_call_main (main=main@entry=0x556c9ef0 
, argc=argc@entry=24, argv=argv@entry=0x7ffefbf8) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##csu##..##sysdeps##nptl##libc_start_call_main.h:74
#11 0x76d685c9 in __libc_start_main_impl (main=0x556c9ef0 , 
argc=24, argv=0x7ffefbf8, init=, fini=, 
rtld_fini=, stack_end=0x7ffefbe8) at 
~/.cache/debuginfod_client/a6572cd46182057d3dbacf1685a12edab0e2eda1/source##usr##src##debug##glibc-2.36-4.fc37.x86_64##csu##..##csu##libc-start.c:381
#12 0x555f1575 in _start ()


and which went away again when locally reverting this 
 
"eliminate mutex in fast path of __register_frame".




Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

> On Mon, Sep 19, 2022 at 11:25:13AM +0200, Florian Weimer wrote:
>> * Jakub Jelinek:
>> 
>> > The disadvantage of the patch is that touching reg[x].loc and how[x]
>> > now means 2 cachelines rather than one as before, and I admit beyond
>> > bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
>> > you retry whatever you measured to get at the 40% of time spent on the
>> > stack clearing to see how the numbers change?
>> 
>> A benchmark that unwinds through 100 frames containing a std::string
>> variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):
>> 
>> min: 24418 ns
>> 25%: 24740 ns
>> 50%: 24790 ns
>> 75%: 24840 ns
>> 95%: 24937 ns
>> 99%: 26174 ns
>> max: 42530 ns
>> avg:   24826.1 ns
>> 
>> to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):
>> 
>> min: 22307 ns
>> 25%: 22640 ns
>> 50%: 22713 ns
>> 75%: 22787 ns
>> 95%: 22948 ns
>> 99%: 24839 ns
>> max: 52658 ns
>> avg:   22863.4 ns
>> 
>> So 227 ns per frame instead of 248 ns per frame, or ~9% less.
>
> Thanks for doing that.

So it turns out my test program had 100 frames, but not with
std::string.  With std::string objects, the numbers are:

Before:

min: 71236 ns
25%: 71637 ns
50%: 71724 ns
75%: 71857 ns
95%: 73148 ns
99%: 74023 ns
max:120735 ns
avg:   71973.1 ns

After:

min: 69547 ns
25%: 69961 ns
50%: 70034 ns
75%: 70112 ns
95%: 71273 ns
99%: 71511 ns
max: 82691 ns
avg:   70121.3 ns

So slightly less improvement per frame, but it's still there.

>> Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
>> bitfield should avoid zeroing another 8 bytes.  This shaves off another
>> 3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).
>
> Good idea.  Won't help always, on some targets how could have size divisible
> by pointer alignment, but when it is at the end it always increases the
> size by alignment of pointer, while after how array it only does so if
> how is multiple of pointer alignment.

Okay, I'll send a separate patch once yours is in, along with some other
simple changes.

Thanks,
Florian



Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 3:04 PM Aldy Hernandez  wrote:
>
> On Mon, Sep 19, 2022 at 9:37 AM Richard Biener
>  wrote:
> >
> > On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
> >  wrote:
> > >
> > > Jakub has mentioned that for -funsafe-math-optimizations we may flush
> > > denormals to zero, in which case we need to be careful to extend the
> > > ranges to the appropriate zero.  This patch does exactly that.  For a
> > > range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> > > we flush to [+0.0, x].
> > >
> > > It is unclear whether we should do this for Alpha, since I believe
> > > flushing to zero is the default, and the port requires -mieee for IEEE
> > > sanity.  If so, perhaps we should add a target hook so backends are
> > > free to request flushing to zero.
> > >
> > > Thoughts?
> >
> > I'm not sure what the intention of this is - it effectively results in
> > more conservative ranges for -funsafe-math-optimizations.  That is,
> > if -funsafe-math-optimizations says that denormals don't exist and
> > are all zero then doesn't that mean we can instead use the smalles
> > non-denormal number less than (or greater than) zero here?
> >
> > That said, the flushing you do is also "valid" for
> > -fno-unsafe-math-optimizations
> > in case we don't want to deal with denormals in range boundaries.
> >
> > It might also be a correctness requirement in case we don't know how
> > targets handle denormals (IIRC even OS defaults might matter here) so
> > we conservatively have to treat them as being flushed to zero.
>
> Actually, rth suggested we always flush to zero because we don't know
> what the target would do.  Again, I'm happy to do whatever you agree
> on.  I have no opinion.  My main goal here is correctness.

Yes, I think we should do this.

> >
> > So ... if we want to be on the "safe" side then please always do this.
> >
> > At the same point you could do
> >
> >  if (!HONOR_SIGNED_ZEROS ())
> >if (real_iszero (_max))
> >  {
> > if (real_iszero (_min))
> >m.min.sign = 1;
> > m_max.sign = 1;
> >  }
>
> But wouldn't that set [-0.0, -0.0] when encountering [+0, +0] ??

Yeah, that's my laziness not adding a special case for m_min == m_max.

>
> >else if (real_iszero (_min))
> >  m_min.sign = 0;
>
> Jakub actually suggested something completely different...just set +0
> always for !HONOR_SIGNED_ZEROS.

Hmm, but the [-1, -0.] with known sign becomes [-1, +0.] with unknown sign?

Richard.

> Aldy
>
> >
> > so that we canonicalize a zero bound so that the sign is known for a range.
> >
> > Richard.
> >
> > > gcc/ChangeLog:
> > >
> > > * value-range.cc (frange::flush_denormals_to_zero): New.
> > > (frange::set): Call flush_denormals_to_zero.
> > > * value-range.h (class frange): Add flush_denormals_to_zero.
> > > ---
> > >  gcc/value-range.cc | 24 
> > >  gcc/value-range.h  |  1 +
> > >  2 files changed, 25 insertions(+)
> > >
> > > diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> > > index 67d5d7fa90f..f285734f0e0 100644
> > > --- a/gcc/value-range.cc
> > > +++ b/gcc/value-range.cc
> > > @@ -267,6 +267,26 @@ tree_compare (tree_code code, tree op1, tree op2)
> > >return !integer_zerop (fold_build2 (code, integer_type_node, op1, 
> > > op2));
> > >  }
> > >
> > > +// Flush denormal endpoints to the appropriate 0.0.
> > > +
> > > +void
> > > +frange::flush_denormals_to_zero ()
> > > +{
> > > +  if (undefined_p () || known_isnan ())
> > > +return;
> > > +
> > > +  // Flush [x, -DENORMAL] to [x, -0.0].
> > > +  if (real_isdenormal (_max) && real_isneg (_max))
> > > +{
> > > +  m_max = dconst0;
> > > +  if (HONOR_SIGNED_ZEROS (m_type))
> > > +   m_max.sign = 1;
> > > +}
> > > +  // Flush [+DENORMAL, x] to [+0.0, x].
> > > +  if (real_isdenormal (_min) && !real_isneg (_min))
> > > +m_min = dconst0;
> > > +}
> > > +
> > >  // Setter for franges.
> > >
> > >  void
> > > @@ -317,6 +337,10 @@ frange::set (tree min, tree max, value_range_kind 
> > > kind)
> > >gcc_checking_assert (tree_compare (LE_EXPR, min, max));
> > >
> > >normalize_kind ();
> > > +
> > > +  if (flag_unsafe_math_optimizations)
> > > +flush_denormals_to_zero ();
> > > +
> > >if (flag_checking)
> > >  verify_range ();
> > >  }
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 3a401f3e4e2..795b1f00fdc 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -327,6 +327,7 @@ private:
> > >bool union_nans (const frange &);
> > >bool intersect_nans (const frange &);
> > >bool combine_zeros (const frange &, bool union_p);
> > > +  void flush_denormals_to_zero ();
> > >
> > >tree m_type;
> > >REAL_VALUE_TYPE m_min;
> > > --
> > > 2.37.1
> > >
> >
>


Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 2:52 PM Aldy Hernandez  wrote:
>
> On Mon, Sep 19, 2022 at 10:14 AM Richard Biener
>  wrote:
> >
> > On Mon, Sep 19, 2022 at 9:59 AM Aldy Hernandez  wrote:
> > >
> > > ISTM that a specifically nonnegative range should not contain -NAN,
> > > otherwise signbit_p() would return false, because we'd be unsure of the
> > > sign.
> > >
> > > Do y'all agree?
> >
> > what tree_expr_nonnegative_p actually means isn't 100% clear.  For REAL_CST
> > it actually looks at the sign-bit but we have
> >
> > (simplify
> >  /* copysign(x,y) -> fabs(x) if y is nonnegative.  */
> >  (COPYSIGN_ALL @0 tree_expr_nonnegative_p@1)
> >  (abs @0))
> >
> > is abs (@0) OK for sNaNs and -NaN/+NaN?
>
> At least for real_value's, ABS_EXPR works on NAN's.  There's no
> special code dealing with them.  We just clear the sign bit:
>
> real_arithmetic:
> case ABS_EXPR:
>   *r = *op0;
>   r->sign = 0;
>   break;
>
> >
> > And we have
> >
> > /* Convert abs[u] (X)  where X is nonnegative -> (X).  */
> > (simplify
> >  (abs tree_expr_nonnegative_p@0)
> >  @0)
> >
> > where at least sNaN -> qNaN would be dropped?
> >
> > And of course
> >
> > (simplify
> >  /* signbit(x) -> 0 if x is nonnegative.  */
> >  (SIGNBIT tree_expr_nonnegative_p@0)
> >  { integer_zero_node; })
> >
> > that is, is tree_expr_nonnegative_p actually tree_expr_sign or
> > does tree_expr_nonnegative (x) mean x >= (typeof(X)) 0
> > or !(x < (typeof(X))0)?
>
> I have no idea, but I'm happy to have frange::set_nonnegative() do
> whatever you agree on.
>
> Actually TBH, ranger only uses set_nonnegative for call's, not much else:
>
>   if (range_of_builtin_call (r, call, src))
> ;
>   else if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
> r.set_nonnegative (type);
>   else if (gimple_call_nonnull_result_p (call)
>|| gimple_call_nonnull_arg (call))
> r.set_nonzero (type);
>   else
> r.set_varying (type);
>
> but I guess it's good we do the right thing for correctness sake, and
> if it ever gets used by someone else.
>
> >
> > That said, 'set_nonnegative' could be interpreted to be without
> > NaNs?
>
> Sounds good to me.  How's this?

Hmm, I merely had lots of questions above so I think to answer them
we at least should document what 'set_nonnegative' implies in the
abstract vrange class.

It's probably safer to keep NaN in.  For example unfolded copysign (x, 1.)
will return true for x == -NaN but the result will be a NaN.

Richard.

>
> Aldy


Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 3:04 PM Aldy Hernandez  wrote:
>
> On Mon, Sep 19, 2022 at 9:37 AM Richard Biener
>  wrote:
> >
> > On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
> >  wrote:
> > >
> > > Jakub has mentioned that for -funsafe-math-optimizations we may flush
> > > denormals to zero, in which case we need to be careful to extend the
> > > ranges to the appropriate zero.  This patch does exactly that.  For a
> > > range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> > > we flush to [+0.0, x].
> > >
> > > It is unclear whether we should do this for Alpha, since I believe
> > > flushing to zero is the default, and the port requires -mieee for IEEE
> > > sanity.  If so, perhaps we should add a target hook so backends are
> > > free to request flushing to zero.
> > >
> > > Thoughts?
> >
> > I'm not sure what the intention of this is - it effectively results in
> > more conservative ranges for -funsafe-math-optimizations.  That is,
> > if -funsafe-math-optimizations says that denormals don't exist and
> > are all zero then doesn't that mean we can instead use the smalles
> > non-denormal number less than (or greater than) zero here?
> >
> > That said, the flushing you do is also "valid" for
> > -fno-unsafe-math-optimizations
> > in case we don't want to deal with denormals in range boundaries.
> >
> > It might also be a correctness requirement in case we don't know how
> > targets handle denormals (IIRC even OS defaults might matter here) so
> > we conservatively have to treat them as being flushed to zero.
>
> Actually, rth suggested we always flush to zero because we don't know
> what the target would do.  Again, I'm happy to do whatever you agree

More specifically, we don't know what the OS will do, so we either
should have a flag of some kind set to TRUE by default when unsafe
math optimizations, or always assume denormal flushing could happen.

Again, no opinion.  Up to y'all.

Aldy

> on.  I have no opinion.  My main goal here is correctness.
>
> >
> > So ... if we want to be on the "safe" side then please always do this.
> >
> > At the same point you could do
> >
> >  if (!HONOR_SIGNED_ZEROS ())
> >if (real_iszero (_max))
> >  {
> > if (real_iszero (_min))
> >m.min.sign = 1;
> > m_max.sign = 1;
> >  }
>
> But wouldn't that set [-0.0, -0.0] when encountering [+0, +0] ??
>
> >else if (real_iszero (_min))
> >  m_min.sign = 0;
>
> Jakub actually suggested something completely different...just set +0
> always for !HONOR_SIGNED_ZEROS.
>
> Aldy
>
> >
> > so that we canonicalize a zero bound so that the sign is known for a range.
> >
> > Richard.
> >
> > > gcc/ChangeLog:
> > >
> > > * value-range.cc (frange::flush_denormals_to_zero): New.
> > > (frange::set): Call flush_denormals_to_zero.
> > > * value-range.h (class frange): Add flush_denormals_to_zero.
> > > ---
> > >  gcc/value-range.cc | 24 
> > >  gcc/value-range.h  |  1 +
> > >  2 files changed, 25 insertions(+)
> > >
> > > diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> > > index 67d5d7fa90f..f285734f0e0 100644
> > > --- a/gcc/value-range.cc
> > > +++ b/gcc/value-range.cc
> > > @@ -267,6 +267,26 @@ tree_compare (tree_code code, tree op1, tree op2)
> > >return !integer_zerop (fold_build2 (code, integer_type_node, op1, 
> > > op2));
> > >  }
> > >
> > > +// Flush denormal endpoints to the appropriate 0.0.
> > > +
> > > +void
> > > +frange::flush_denormals_to_zero ()
> > > +{
> > > +  if (undefined_p () || known_isnan ())
> > > +return;
> > > +
> > > +  // Flush [x, -DENORMAL] to [x, -0.0].
> > > +  if (real_isdenormal (_max) && real_isneg (_max))
> > > +{
> > > +  m_max = dconst0;
> > > +  if (HONOR_SIGNED_ZEROS (m_type))
> > > +   m_max.sign = 1;
> > > +}
> > > +  // Flush [+DENORMAL, x] to [+0.0, x].
> > > +  if (real_isdenormal (_min) && !real_isneg (_min))
> > > +m_min = dconst0;
> > > +}
> > > +
> > >  // Setter for franges.
> > >
> > >  void
> > > @@ -317,6 +337,10 @@ frange::set (tree min, tree max, value_range_kind 
> > > kind)
> > >gcc_checking_assert (tree_compare (LE_EXPR, min, max));
> > >
> > >normalize_kind ();
> > > +
> > > +  if (flag_unsafe_math_optimizations)
> > > +flush_denormals_to_zero ();
> > > +
> > >if (flag_checking)
> > >  verify_range ();
> > >  }
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 3a401f3e4e2..795b1f00fdc 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -327,6 +327,7 @@ private:
> > >bool union_nans (const frange &);
> > >bool intersect_nans (const frange &);
> > >bool combine_zeros (const frange &, bool union_p);
> > > +  void flush_denormals_to_zero ();
> > >
> > >tree m_type;
> > >REAL_VALUE_TYPE m_min;
> > > --
> > > 2.37.1
> > >
> >



Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 9:37 AM Richard Biener
 wrote:
>
> On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > Jakub has mentioned that for -funsafe-math-optimizations we may flush
> > denormals to zero, in which case we need to be careful to extend the
> > ranges to the appropriate zero.  This patch does exactly that.  For a
> > range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> > we flush to [+0.0, x].
> >
> > It is unclear whether we should do this for Alpha, since I believe
> > flushing to zero is the default, and the port requires -mieee for IEEE
> > sanity.  If so, perhaps we should add a target hook so backends are
> > free to request flushing to zero.
> >
> > Thoughts?
>
> I'm not sure what the intention of this is - it effectively results in
> more conservative ranges for -funsafe-math-optimizations.  That is,
> if -funsafe-math-optimizations says that denormals don't exist and
> are all zero then doesn't that mean we can instead use the smalles
> non-denormal number less than (or greater than) zero here?
>
> That said, the flushing you do is also "valid" for
> -fno-unsafe-math-optimizations
> in case we don't want to deal with denormals in range boundaries.
>
> It might also be a correctness requirement in case we don't know how
> targets handle denormals (IIRC even OS defaults might matter here) so
> we conservatively have to treat them as being flushed to zero.

Actually, rth suggested we always flush to zero because we don't know
what the target would do.  Again, I'm happy to do whatever you agree
on.  I have no opinion.  My main goal here is correctness.

>
> So ... if we want to be on the "safe" side then please always do this.
>
> At the same point you could do
>
>  if (!HONOR_SIGNED_ZEROS ())
>if (real_iszero (_max))
>  {
> if (real_iszero (_min))
>m.min.sign = 1;
> m_max.sign = 1;
>  }

But wouldn't that set [-0.0, -0.0] when encountering [+0, +0] ??

>else if (real_iszero (_min))
>  m_min.sign = 0;

Jakub actually suggested something completely different...just set +0
always for !HONOR_SIGNED_ZEROS.

Aldy

>
> so that we canonicalize a zero bound so that the sign is known for a range.
>
> Richard.
>
> > gcc/ChangeLog:
> >
> > * value-range.cc (frange::flush_denormals_to_zero): New.
> > (frange::set): Call flush_denormals_to_zero.
> > * value-range.h (class frange): Add flush_denormals_to_zero.
> > ---
> >  gcc/value-range.cc | 24 
> >  gcc/value-range.h  |  1 +
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> > index 67d5d7fa90f..f285734f0e0 100644
> > --- a/gcc/value-range.cc
> > +++ b/gcc/value-range.cc
> > @@ -267,6 +267,26 @@ tree_compare (tree_code code, tree op1, tree op2)
> >return !integer_zerop (fold_build2 (code, integer_type_node, op1, op2));
> >  }
> >
> > +// Flush denormal endpoints to the appropriate 0.0.
> > +
> > +void
> > +frange::flush_denormals_to_zero ()
> > +{
> > +  if (undefined_p () || known_isnan ())
> > +return;
> > +
> > +  // Flush [x, -DENORMAL] to [x, -0.0].
> > +  if (real_isdenormal (_max) && real_isneg (_max))
> > +{
> > +  m_max = dconst0;
> > +  if (HONOR_SIGNED_ZEROS (m_type))
> > +   m_max.sign = 1;
> > +}
> > +  // Flush [+DENORMAL, x] to [+0.0, x].
> > +  if (real_isdenormal (_min) && !real_isneg (_min))
> > +m_min = dconst0;
> > +}
> > +
> >  // Setter for franges.
> >
> >  void
> > @@ -317,6 +337,10 @@ frange::set (tree min, tree max, value_range_kind kind)
> >gcc_checking_assert (tree_compare (LE_EXPR, min, max));
> >
> >normalize_kind ();
> > +
> > +  if (flag_unsafe_math_optimizations)
> > +flush_denormals_to_zero ();
> > +
> >if (flag_checking)
> >  verify_range ();
> >  }
> > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > index 3a401f3e4e2..795b1f00fdc 100644
> > --- a/gcc/value-range.h
> > +++ b/gcc/value-range.h
> > @@ -327,6 +327,7 @@ private:
> >bool union_nans (const frange &);
> >bool intersect_nans (const frange &);
> >bool combine_zeros (const frange &, bool union_p);
> > +  void flush_denormals_to_zero ();
> >
> >tree m_type;
> >REAL_VALUE_TYPE m_min;
> > --
> > 2.37.1
> >
>



Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 19, 2022 at 10:14 AM Richard Biener
 wrote:
>
> On Mon, Sep 19, 2022 at 9:59 AM Aldy Hernandez  wrote:
> >
> > ISTM that a specifically nonnegative range should not contain -NAN,
> > otherwise signbit_p() would return false, because we'd be unsure of the
> > sign.
> >
> > Do y'all agree?
>
> what tree_expr_nonnegative_p actually means isn't 100% clear.  For REAL_CST
> it actually looks at the sign-bit but we have
>
> (simplify
>  /* copysign(x,y) -> fabs(x) if y is nonnegative.  */
>  (COPYSIGN_ALL @0 tree_expr_nonnegative_p@1)
>  (abs @0))
>
> is abs (@0) OK for sNaNs and -NaN/+NaN?

At least for real_value's, ABS_EXPR works on NAN's.  There's no
special code dealing with them.  We just clear the sign bit:

real_arithmetic:
case ABS_EXPR:
  *r = *op0;
  r->sign = 0;
  break;

>
> And we have
>
> /* Convert abs[u] (X)  where X is nonnegative -> (X).  */
> (simplify
>  (abs tree_expr_nonnegative_p@0)
>  @0)
>
> where at least sNaN -> qNaN would be dropped?
>
> And of course
>
> (simplify
>  /* signbit(x) -> 0 if x is nonnegative.  */
>  (SIGNBIT tree_expr_nonnegative_p@0)
>  { integer_zero_node; })
>
> that is, is tree_expr_nonnegative_p actually tree_expr_sign or
> does tree_expr_nonnegative (x) mean x >= (typeof(X)) 0
> or !(x < (typeof(X))0)?

I have no idea, but I'm happy to have frange::set_nonnegative() do
whatever you agree on.

Actually TBH, ranger only uses set_nonnegative for call's, not much else:

  if (range_of_builtin_call (r, call, src))
;
  else if (gimple_stmt_nonnegative_warnv_p (call, _overflow_p))
r.set_nonnegative (type);
  else if (gimple_call_nonnull_result_p (call)
   || gimple_call_nonnull_arg (call))
r.set_nonzero (type);
  else
r.set_varying (type);

but I guess it's good we do the right thing for correctness sake, and
if it ever gets used by someone else.

>
> That said, 'set_nonnegative' could be interpreted to be without
> NaNs?

Sounds good to me.  How's this?

Aldy
From 7eeb81a9516663d589e692b278907bb5f3d60ea0 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Sun, 18 Sep 2022 16:24:36 +0200
Subject: [PATCH] [PR68097] Non-negative frange's should not contain NANs.

	PR 68097/tree-optimization

gcc/ChangeLog:

	* value-range.cc (frange::set_nonnegative): Clear NAN.
	(range_tests_signed_zeros): New test.
---
 gcc/value-range.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 67d5d7fa90f..6e703c7eb43 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -752,6 +752,7 @@ void
 frange::set_nonnegative (tree type)
 {
   set (type, dconst0, dconstinf);
+  clear_nan ();
 }
 
 // Here we copy between any two irange's.  The ranges can be legacy or
@@ -3800,6 +3801,10 @@ range_tests_signed_zeros ()
   r1.update_nan ();
   r0.intersect (r1);
   ASSERT_TRUE (r0.known_isnan ());
+
+  // Non-negative ranges should not contain NANs.
+  r0.set_nonnegative (float_type_node);
+  ASSERT_TRUE (!r0.maybe_isnan ());
 }
 
 static void
-- 
2.37.1



Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-19 Thread FX via Gcc-patches
Committed as 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=4637a1d293c978816ad622ba33e3a32a78640edd

FX


> Le 10 sept. 2022 à 12:21, FX  a écrit :
> 
> ping
> (with fix for the typo Bernhard noticed)
> 
> <0001-Fortran-F2018-rounding-modes-changes.patch>
> 
>> Le 31 août 2022 à 20:29, FX  a écrit :
>> 
>> This adds new F2018 features, that are not really enabled (because their 
>> runtime support is optional).
>> 
>> 1. Add the new IEEE_AWAY rounding mode. It is unsupported on all known 
>> targets, but could be supported by glibc and AIX as part of the C2x 
>> proposal. Testing for now is minimal, but once a target supports that 
>> rounding mode, the tests will fail and we can fix them by expanding them.
>> 
>> 2. Add the optional RADIX argument to IEEE_SET_ROUNDING_MODE and 
>> IEEE_GET_ROUNDING_MODE. It is unused for now, because we do not support 
>> floating-point radices other than 2.
>> 
>> 
>> Regression-tested on x86_64-pc-linux-gnu, both 32- and 64-bit.
>> OK to commit?
>> 
>> 
>> <0001-fortran-Fortran-2018-rounding-modes-changes.patch>
> 



[PATCH] c++, v2: Implement C++23 P1169R4 - static operator() [PR106651]

2022-09-19 Thread Jakub Jelinek via Gcc-patches
Hi!

On Sat, Sep 17, 2022 at 01:30:08PM +0200, Jason Merrill wrote:
Below is an updated patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601788.html
patch.

> Ah, OK.  I don't think you need to check for C++23 or CALL_EXPR at all, just
> CONVERSION_RANK of the other candidate conversion.

You mean that for C++20 and earlier without static operator () being
involved it is impossible to see non-standard conversion on the first
argument?
I've instrumented GCC during the weekend to log an cases where
static_1 != static_2 and log also whether the corresponding arg on non-static
has standard-conversion or not and the only cases I saw were with
the standard conversion when this static operator () patch wasn't in,
all similar to:
struct S {
  int foo (long);
  static int foo (long long);
};

void
baz ()
{
  S s;
  s.foo (1);
}
and I can't imagine how there could be something other than standard
conversion for the object on which it is called in those cases.

> And I notice that you have winner = -1 in both cases; the first should be
> winner = 1.

Thanks for noticing that, I actually meant to write winner = -1; in the
first and winner = 1; for the second, but that was only because of
misreading it that -1 is when cand1 is better.

> > > You could use FUNCTION_FIRST_USER_PARM to skip 'this' if present.
> > 
> > Ok, will try.

Done.

> You can use FUNCTION_FIRST_USER_PARMTYPE instead of the ?:.

Done too.

> The reformatting of the next two statements seems unnecessary.

It is unnecessary (and I left it now out), I was just something
that catched my eye that it could be formatted more nicely.

> How about
> 
> void f()
> {
>   auto l = [] (auto x) static { return x; };
>   int (*p)(int) = l;
> }
> 
> ?

You're right.  Except that we ICE on the above for C++14 and 17 in
tsubst_copy_and_build's
20343   else if (identifier_p (templ))
20344 {
20345   /* C++20 P0846: we can encounter an IDENTIFIER_NODE here 
when
20346  name lookup found nothing when parsing the template 
name.  */
20347   gcc_assert (cxx_dialect >= cxx20 || seen_error ());
20348   RETURN (tid);
20349 }
and while for C++20 and 23 we don't, we say
error: cannot convert ‘f()::’ to ‘int (*)(int)’ in 
initialization
so I think it is just that the assertion does nothing for C++20/23 and we
didn't manage to look up templ
 >

Just as a wild guess, I've tried the attached incremental patch on top of
the inline patch below and both the ICEs and errors are gone, but whether
that is correct or not sadly no idea.

2022-09-19  Jakub Jelinek  

PR c++/106651
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Predefine
__cpp_static_call_operator=202207L for C++23.
gcc/cp/
* cp-tree.h (LAMBDA_EXPR_STATIC_P): Implement C++23
P1169R4 - static operator().  Define.
* parser.cc (CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR): Document
that it also allows static.
(cp_parser_lambda_declarator_opt): Handle static lambda specifier.
(cp_parser_decl_specifier_seq): Allow RID_STATIC for
CP_PARSER_FLAGS_ONLY_MUTABLE_OR_CONSTEXPR.
* decl.cc (grok_op_properties): If operator() isn't a method,
use a different error wording, if it is static member function,
allow it (for C++20 and older with a pedwarn unless it is
a lambda function or template instantiation).
* call.cc (joust): Don't ICE if one candidate is static member
function and the other is an indirect call.  If the parameter
conversion on the other candidate is user defined conversion,
ellipsis or bad conversion, make static member function candidate
a winner for that parameter. 
* lambda.cc (maybe_add_lambda_conv_op): Handle static lambdas.
* error.cc (dump_lambda_function): Print static for static lambdas.
gcc/testsuite/
* g++.dg/template/error30.C: Adjust expected diagnostics.
* g++.dg/cpp1z/constexpr-lambda13.C: Likewise.
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_static_call_operator.
* g++.dg/cpp23/static-operator-call1.C: New test.
* g++.dg/cpp23/static-operator-call2.C: New test.
* g++.old-deja/g++.jason/operator.C: Adjust expected diagnostics.

--- gcc/cp/cp-tree.h.jj 2022-09-13 18:57:29.356969560 +0200
+++ gcc/cp/cp-tree.h2022-09-19 11:53:19.780594924 +0200
@@ -504,6 +504,7 @@ extern GTY(()) tree cp_global_trees[CPTI
   OVL_NESTED_P (in OVERLOAD)
   DECL_MODULE_EXPORT_P (in _DECL)
   PACK_EXPANSION_FORCE_EXTRA_ARGS_P (in *_PACK_EXPANSION)
+  LAMBDA_EXPR_STATIC_P (in LAMBDA_EXPR)
4: IDENTIFIER_MARKED (IDENTIFIER_NODEs)
   TREE_HAS_CONSTRUCTOR (in INDIRECT_REF, SAVE_EXPR, CONSTRUCTOR,
  CALL_EXPR, or FIELD_DECL).
@@ -1488,6 +1489,10 @@ enum cp_lambda_default_capture_mode_type
 #define LAMBDA_EXPR_CAPTURE_OPTIMIZED(NODE) \
   TREE_LANG_FLAG_2 (LAMBDA_EXPR_CHECK (NODE))
 

Re: [PATCH] Fortran: add IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES

2022-09-19 Thread FX via Gcc-patches
Hi Mikael,

> Looks good, thanks.

Thank you for your reviews. This patch is committed to trunk: 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=4637a1d293c978816ad622ba33e3a32a78640edd

FX

[PATCH][RFH] Wire ranger into FRE

2022-09-19 Thread Richard Biener via Gcc-patches
The following is an attempt to utilize ranger for simplification
of conditionals during value-numbering.  It fails the added
testcase because it seems the fur source is only involved when
processing the stmt to be folded but not when filling the cache
of ranges on the operand use->def chain.

When not iterating that's not a problem since earlier stmts
have operands eliminated but when not iterating or when
elimination is disabled the IL remains unchanged.

When iterating it will probably do things wrong due to the lack
of pruning the chache for the region we unwind and when doing
region based VN it likely picks up stale EDGE_EXECUTABLE when
leaving the region over the entry because there's no way to stop
"local" cache prefilling when leaving the region (and just using
global ranges from there).

Knowing path-range-query it's probably possible to clone all
the block walking and cache filling, making a special variant
for value-numbering but I wonder whether that's really what's
intended?  I would probably need to implement a special
range_of_expr and range_of_stmt for this and keep my own cache?
Basically a region_range_query with wired in valueization?

* tree-ssa-sccvn.cc (...): ...
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c | 15 
 gcc/tree-ssa-sccvn.cc   | 77 ++---
 gcc/tree-ssa-sccvn.h|  3 +-
 3 files changed, 83 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
new file mode 100644
index 000..f479a322b70
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-100.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int foo (int *p, int *q)
+{
+  if (*p > 0 && *q > 0)
+{
+  int v = *p + *q;
+  if (v > 0)
+return *q;
+}
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "known outgoing edge" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
index 74b8d8d18ef..2753200c2c2 100644
--- a/gcc/tree-ssa-sccvn.cc
+++ b/gcc/tree-ssa-sccvn.cc
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const-call.h"
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
+#include "gimple-range.h"
 #include "tree-ssa-sccvn.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
@@ -360,10 +361,11 @@ static vn_ssa_aux_t last_pushed_avail;
correct.  */
 static vn_tables_t valid_info;
 
+tree (*vn_valueize) (tree);
 
 /* Valueization hook for simplify_replace_tree.  Valueize NAME if it is
an SSA name, otherwise just return it.  */
-tree (*vn_valueize) (tree);
+
 static tree
 vn_valueize_for_srt (tree t, void* context ATTRIBUTE_UNUSED)
 {
@@ -380,6 +382,36 @@ vn_valueize_for_srt (tree t, void* context 
ATTRIBUTE_UNUSED)
   return res;
 }
 
+class vn_fur : public fur_depend
+{
+public:
+  vn_fur (gimple *s, gori_compute *g, range_query *q)
+: fur_depend (s, g, q) {}
+  virtual bool get_operand (vrange &, tree);
+  virtual bool get_phi_operand (vrange &, tree, edge);
+};
+
+bool
+vn_fur::get_operand (vrange , tree op)
+{
+  return fur_depend::get_operand (r, vn_valueize (op));
+}
+
+bool
+vn_fur::get_phi_operand (vrange , tree op, edge e)
+{
+  // ???  Check region boundary, avoid walking ourside of the region
+  // somehow?  Possibly when initializing the ranger object?
+  // or can/should we simply return 'false' when we arrive with
+  // e being the entry edge?  What about for non-PHIs?  Can we
+  // somehow force to only use the global range and stop caching
+  // after that?
+  if (e->flags & EDGE_EXECUTABLE)
+return fur_depend::get_phi_operand (r, vn_valueize (op), e);
+  r.set_undefined ();
+  return true;
+}
+
 
 /* This represents the top of the VN lattice, which is the universal
value.  */
@@ -7292,12 +7324,12 @@ eliminate_with_rpo_vn (bitmap inserted_exprs)
 
 static unsigned
 do_rpo_vn_1 (function *fn, edge entry, bitmap exit_bbs,
-bool iterate, bool eliminate, vn_lookup_kind kind);
+bool iterate, bool eliminate, vn_lookup_kind kind, gimple_ranger 
*);
 
 void
 run_rpo_vn (vn_lookup_kind kind)
 {
-  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind);
+  do_rpo_vn_1 (cfun, NULL, NULL, true, false, kind, NULL);
 
   /* ???  Prune requirement of these.  */
   constant_to_value_id = new hash_table (23);
@@ -7580,7 +7612,8 @@ insert_related_predicates_on_edge (enum tree_code code, 
tree *ops, edge pred_e)
 static unsigned
 process_bb (rpo_elim , basic_block bb,
bool bb_visited, bool iterate_phis, bool iterate, bool eliminate,
-   bool do_region, bitmap exit_bbs, bool skip_phis)
+   bool do_region, bitmap exit_bbs, bool skip_phis,
+   gimple_ranger *ranger)
 {
   unsigned todo = 0;
   edge_iterator ei;
@@ -7766,6 +7799,20 @@ process_bb (rpo_elim , basic_block bb,
 

Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 19, 2022 at 11:25:13AM +0200, Florian Weimer wrote:
> * Jakub Jelinek:
> 
> > The disadvantage of the patch is that touching reg[x].loc and how[x]
> > now means 2 cachelines rather than one as before, and I admit beyond
> > bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
> > you retry whatever you measured to get at the 40% of time spent on the
> > stack clearing to see how the numbers change?
> 
> A benchmark that unwinds through 100 frames containing a std::string
> variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):
> 
> min: 24418 ns
> 25%: 24740 ns
> 50%: 24790 ns
> 75%: 24840 ns
> 95%: 24937 ns
> 99%: 26174 ns
> max: 42530 ns
> avg:   24826.1 ns
> 
> to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):
> 
> min: 22307 ns
> 25%: 22640 ns
> 50%: 22713 ns
> 75%: 22787 ns
> 95%: 22948 ns
> 99%: 24839 ns
> max: 52658 ns
> avg:   22863.4 ns
> 
> So 227 ns per frame instead of 248 ns per frame, or ~9% less.

Thanks for doing that.

> Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
> bitfield should avoid zeroing another 8 bytes.  This shaves off another
> 3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).

Good idea.  Won't help always, on some targets how could have size divisible
by pointer alignment, but when it is at the end it always increases the
size by alignment of pointer, while after how array it only does so if
how is multiple of pointer alignment.

> The REP STOS still dominates uw_frame_state_for execution time, but this
> seems to be a profiling artifact.  Replacing it with PXOR and seven
> MOVUPS instructions makes the hotspot go away, but performance does not
> improve.  Odd.

Jakub



Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

> The disadvantage of the patch is that touching reg[x].loc and how[x]
> now means 2 cachelines rather than one as before, and I admit beyond
> bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
> you retry whatever you measured to get at the 40% of time spent on the
> stack clearing to see how the numbers change?

A benchmark that unwinds through 100 frames containing a std::string
variable goes from (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84):

min: 24418 ns
25%: 24740 ns
50%: 24790 ns
75%: 24840 ns
95%: 24937 ns
99%: 26174 ns
max: 42530 ns
avg:   24826.1 ns

to (0b5b8ac5cb7fe92dd17ae8bd7de84640daa59e84 with this patch):

min: 22307 ns
25%: 22640 ns
50%: 22713 ns
75%: 22787 ns
95%: 22948 ns
99%: 24839 ns
max: 52658 ns
avg:   22863.4 ns

So 227 ns per frame instead of 248 ns per frame, or ~9% less.

Moving cfa_how after how in struct frame_state_reg_info as an 8-bit
bitfield should avoid zeroing another 8 bytes.  This shaves off another
3 ns per frame in my testing (on a Core i9-10900T, so with ERMS).

The REP STOS still dominates uw_frame_state_for execution time, but this
seems to be a profiling artifact.  Replacing it with PXOR and seven
MOVUPS instructions makes the hotspot go away, but performance does not
improve.  Odd.

Thanks,
Florian



Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 19, 2022 at 10:57:25AM +0200, Richard Biener wrote:
> For even more uglification and avoiding of the cache effect mentioned below
> we could squeeze the 4 bits into the union members, at least on 64bit 
> platforms
> (probably not on 32bit ones) ...

Maybe, but it would limit the offsets we can represent at least:
The union is
  union {
_Unwind_Word reg;
_Unwind_Sword offset;
const unsigned char *exp;
  } loc;
I guess reg is always pretty small (< 153 or so on most targets), so we
could squeeze the 4 bits in there, offset would need to be multiplied by 16
because we can't rely on all offsets being multiples of 16, and exp
could be through performing aligned allocations for the expression.
That said, while it could save a few bytes and the cache effects, on the
other side, we'd need to clear more bytes.
On x86_64 the size would be 240 bytes instead of 264 bytes, but we'd need
to clear those 240 bytes instead of 120 (ok, unless we want to store just
the bytes containing the 4 bits but then it would be even more stores).

Jakub



Re: [patch, avr] Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints.

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 10:57 AM Georg Johann Lay  wrote:
>
>
>
> Am 19.09.22 um 09:51 schrieb Richard Biener:
> > On Sun, Sep 18, 2022 at 7:40 PM Georg Johann Lay  wrote:
> >>
> >> Hello,
> >>
> >> this patch fixed PR target/99184 which incorrectly rounded during 64-bit
> >> (long) double to 16-bit and 32-bit integers.
> >>
> >> The patch just removes the respective roundings from
> >> libf7-asm.sx::to_integer and ::to_unsigned.  Luckily, LibF7 does nowhere
> >> use respective functions internally, the only user is in libf7.c::f7_exp
> >>
> >> which reads
> >>
> >> f7_round (qq, qq);
> >> int16_t q = f7_get_s16 (qq);
> >>
> >> so that f7_get_s16() operates on an already rounded value, and therefore
> >> this code works unaltered with or without rounding in to_integer.
> >>
> >> The patch applies to directory
> >>
> >> ./libgcc/config/avr/libf7/
> >>
> >> and is the same for all GCC versions v10+.
> >>
> >> Please someone with write permissions commit it to trunk and backport to
> >> v12, v11, and v10 as it is a wrong-code issue.
> >>
> >> The patch will fit without problems (except for ChangeLog) because there
> >> is no traffic on that folder.
> >
> > Thanks, I've pushed the change.  Please in future try to send patches
> > that can be applied with git am, thus use git format-patch
> >
> > Richard.
>
> Thanks you so much.  The patch I generated with "git diff > file.diff",
> so that is not correct?

Yes, it's correct, but see below.

> The only change is that I defined extra hunks
> for asm so that one can see the function like in
>
>   @@ -601,9 +601,6 @@ DEFUN to_integer
>
> So git is not prepared to such hunks? Would you point me to some
> documentation on how to do it properly?

The more useful process is that after changing the code you
do a git commit to your tree and then

git format-patch --stdout HEAD^

to get a git patch.  On that I can do 'git am' and that will produce
exactly the commit you produced, including the patch description
and ChangeLog parts.  I had to add that manually.

Indeed https://gcc.gnu.org/contribute.html doesn't mention this,
we should probably improve that.  Jonathan, maybe the above
is not the optimal way so can you think of something to add
to the 'Submitting Patches' section to document this?

Thanks,
Richard.

>
> Thanks,
>
> Johann


Re: [PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 9:59 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following patch implements something that has Florian found as
> low hanging fruit in our unwinder and has been discussed in the
> https://gcc.gnu.org/wiki/cauldron2022#cauldron2022talks.inprocess_unwinding_bof
> talk.
> _Unwind_FrameState type seems to be (unlike the pre-GCC 3 frame_state
> which has been part of ABI) private to unwind-dw2.c + unwind.inc it
> includes, it is always defined on the stack of some entrypoints, initialized
> by static uw_frame_state_for and the address of it is also passed to other
> static functions or the static inlines handling machine dependent unwinding,
> but it isn't fortunately passed to any callbacks or public functions, so I
> think we can safely change it any time we want.
> Florian mentioned that the structure is large even on x86_64, 384 bytes
> there, starts with 328 bytes long element with frame_state_reg_info type
> which then starts with an array with __LIBGCC_DWARF_FRAME_REGISTERS__ + 1
> elements, each of them is 16 bytes long, on x86_64
> __LIBGCC_DWARF_FRAME_REGISTERS__ is just 17 but even that is big, on say
> riscv __LIBGCC_DWARF_FRAME_REGISTERS__ is I think 128, on powerpc 111,
> on sh 153 etc.  And, we memset to zero the whole fs variable with the
> _Unwind_FrameState type at the start of the unwinding.
> The reason why each element is 16 byte (on 64-bit arches) is that it
> contains some pointer or pointer sized integer and then an enum (with just
> 7 different enumerators) + padding.
>
> The following patch decreases it by moving the enum into a separate
> array and using just one byte for each register in that second array.
> We could compress it even more, say 4 bits per register, but I don't
> want to uglify the code for it too much and make the accesses slower.

For even more uglification and avoiding of the cache effect mentioned below
we could squeeze the 4 bits into the union members, at least on 64bit platforms
(probably not on 32bit ones) ...

Richard.

> Furthermore, the clearing of the object can clear only thos how array
> and members after it, because REG_UNSAVED enumerator (0) doesn't actually
> need any pointer or pointer sized integer, it is just the other kinds
> that need to have there something.
> By doing this, on x86_64 the above numbers change to _Unwind_FrameState
> type being now 264 bytes long, frame_state_reg_info 208 bytes and we
> don't clear the first 144 bytes of the object, so the memset is 120 bytes,
> so ~ 31% of the old clearing size.  On riscv 64-bit assuming it has same
> structure layout rules for the few types used there that would be
> ~ 2160 bytes of _Unwind_FrameState type before and ~ 1264 bytes after,
> with the memset previously ~ 2160 bytes and after ~ 232 bytes after.
>
> We've also talked about possibly adding a number of initially initialized
> regs and initializing the rest lazily, but at least for x86_64 with
> 18 elements in the array that doesn't seem to be worth it anymore,
> especially because return address column is 16 there and that is usually the
> first thing to be touched.  It might theory help with lots of registers if
> they are usually untouched, but would uglify and complicate any stores to
> how by having to check there for the not initialized yet cases and lazy
> initialization, and similarly for all reads of how to do there if below
> last initialized one, use how, otherwise imply REG_UNSAVED.
>
> The disadvantage of the patch is that touching reg[x].loc and how[x]
> now means 2 cachelines rather than one as before, and I admit beyond
> bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
> you retry whatever you measured to get at the 40% of time spent on the
> stack clearing to see how the numbers change?
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2022-09-19  Jakub Jelinek  
>
> * unwind-dw2.h (REG_UNSAVED, REG_SAVED_OFFSET, REG_SAVED_REG,
> REG_SAVED_EXP, REG_SAVED_VAL_OFFSET, REG_SAVED_VAL_EXP,
> REG_UNDEFINED): New anonymous enum, moved from inside of
> struct frame_state_reg_info.
> (struct frame_state_reg_info): Remove reg[].how element and the
> anonymous enum there.  Add how element.
> * unwind-dw2.c: Include stddef.h.
> (uw_frame_state_for): Don't clear first
> offsetof (_Unwind_FrameState, regs.how[0]) bytes of *fs.
> (execute_cfa_program, __frame_state_for, uw_update_context_1,
> uw_update_context): Use fs->regs.how[X] instead of fs->regs.reg[X].how
> or fs.regs.how[X] instead of fs.regs.reg[X].how.
> * config/sh/linux-unwind.h (sh_fallback_frame_state): Likewise.
> * config/bfin/linux-unwind.h (bfin_fallback_frame_state): Likewise.
> * config/pa/linux-unwind.h (pa32_fallback_frame_state): Likewise.
> * config/pa/hpux-unwind.h (UPDATE_FS_FOR_SAR, UPDATE_FS_FOR_GR,
> UPDATE_FS_FOR_FR, UPDATE_FS_FOR_PC, 

Re: [patch, avr] Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints.

2022-09-19 Thread Georg Johann Lay




Am 19.09.22 um 09:51 schrieb Richard Biener:

On Sun, Sep 18, 2022 at 7:40 PM Georg Johann Lay  wrote:


Hello,

this patch fixed PR target/99184 which incorrectly rounded during 64-bit
(long) double to 16-bit and 32-bit integers.

The patch just removes the respective roundings from
libf7-asm.sx::to_integer and ::to_unsigned.  Luckily, LibF7 does nowhere
use respective functions internally, the only user is in libf7.c::f7_exp

which reads

f7_round (qq, qq);
int16_t q = f7_get_s16 (qq);

so that f7_get_s16() operates on an already rounded value, and therefore
this code works unaltered with or without rounding in to_integer.

The patch applies to directory

./libgcc/config/avr/libf7/

and is the same for all GCC versions v10+.

Please someone with write permissions commit it to trunk and backport to
v12, v11, and v10 as it is a wrong-code issue.

The patch will fit without problems (except for ChangeLog) because there
is no traffic on that folder.


Thanks, I've pushed the change.  Please in future try to send patches
that can be applied with git am, thus use git format-patch

Richard.


Thanks you so much.  The patch I generated with "git diff > file.diff", 
so that is not correct? The only change is that I defined extra hunks 
for asm so that one can see the function like in


 @@ -601,9 +601,6 @@ DEFUN to_integer

So git is not prepared to such hunks? Would you point me to some 
documentation on how to do it properly?


Thanks,

Johann


Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Jakub Jelinek via Gcc-patches
On Mon, Sep 19, 2022 at 09:37:22AM +0200, Richard Biener wrote:
> On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > Jakub has mentioned that for -funsafe-math-optimizations we may flush
> > denormals to zero, in which case we need to be careful to extend the
> > ranges to the appropriate zero.  This patch does exactly that.  For a
> > range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> > we flush to [+0.0, x].
> >
> > It is unclear whether we should do this for Alpha, since I believe
> > flushing to zero is the default, and the port requires -mieee for IEEE
> > sanity.  If so, perhaps we should add a target hook so backends are
> > free to request flushing to zero.
> >
> > Thoughts?
> 
> I'm not sure what the intention of this is - it effectively results in
> more conservative ranges for -funsafe-math-optimizations.  That is,
> if -funsafe-math-optimizations says that denormals don't exist and
> are all zero then doesn't that mean we can instead use the smalles
> non-denormal number less than (or greater than) zero here?

Yes, with -funsafe-math-optimizations we will sometimes have slightly
larger ranges, while when we know or assume we know that denormals
will be honored we can be more precise.
It is similar to the -fno-signed-zeros case, when we are more relaxed
and say we don't care about sign of zeros and optimizations can be
performed to leave the sign of zero in whatever state they like,
we can't have [+0.0, xx] or [xx, -0.0] ranges and need to extend them
to [-0.0, xx] or [xx, +0.0].  The honor denormals case is similar,
if there is the possibility of having denormals flushed to zero,
[+denormal, xx] needs to be extended to [+0.0, xx] and
[xx, -denormal] to [xx, -0.0].  Now, I think we just should add
a separate option whether denormals are honored or not, because
always extending the ranges with denormals on the boundaries might
prevent some useful optimizations.  If we add such an option, it
could default to not honoring them for -funsafe-math-optimizations,
and say on alpha it should default to not honoring them by default
always unless -mieee option is used, but all of this if the user
didn't use the option explicitly, in that case follow what the user
said.  That way if users know they set DTZ flag themselves or use
libraries that do so, they can tell the compiler about it.

> That said, the flushing you do is also "valid" for
> -fno-unsafe-math-optimizations
> in case we don't want to deal with denormals in range boundaries.

It is always valid to extend the range more, but could be harmful to
optimizations.  Say if we have then x > 0.0 test...
Of course, without -ffast-math we won't DCE various floating point
operations because they can raise exceptions, but at least we can DCE
the basic block after it that depends on x > 0.0.

Jakub



Re: [PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 9:59 AM Aldy Hernandez  wrote:
>
> ISTM that a specifically nonnegative range should not contain -NAN,
> otherwise signbit_p() would return false, because we'd be unsure of the
> sign.
>
> Do y'all agree?

what tree_expr_nonnegative_p actually means isn't 100% clear.  For REAL_CST
it actually looks at the sign-bit but we have

(simplify
 /* copysign(x,y) -> fabs(x) if y is nonnegative.  */
 (COPYSIGN_ALL @0 tree_expr_nonnegative_p@1)
 (abs @0))

is abs (@0) OK for sNaNs and -NaN/+NaN?

And we have

/* Convert abs[u] (X)  where X is nonnegative -> (X).  */
(simplify
 (abs tree_expr_nonnegative_p@0)
 @0)

where at least sNaN -> qNaN would be dropped?

And of course

(simplify
 /* signbit(x) -> 0 if x is nonnegative.  */
 (SIGNBIT tree_expr_nonnegative_p@0)
 { integer_zero_node; })

that is, is tree_expr_nonnegative_p actually tree_expr_sign or
does tree_expr_nonnegative (x) mean x >= (typeof(X)) 0
or !(x < (typeof(X))0)?

That said, 'set_nonnegative' could be interpreted to be without
NaNs?

Richard.

> PR 68097/tree-optimization
>
> gcc/ChangeLog:
>
> * value-range.cc (frange::set_nonnegative): Set +NAN.
> (range_tests_signed_zeros): New test.
> * value-range.h (frange::update_nan): New overload to set NAN sign.
> ---
>  gcc/value-range.cc |  9 +
>  gcc/value-range.h  | 14 ++
>  2 files changed, 23 insertions(+)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 67d5d7fa90f..e432ec8b525 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -752,6 +752,10 @@ void
>  frange::set_nonnegative (tree type)
>  {
>set (type, dconst0, dconstinf);
> +
> +  // Set +NAN as the only possibility.
> +  if (HONOR_NANS (type))
> +update_nan (/*sign=*/0);
>  }
>
>  // Here we copy between any two irange's.  The ranges can be legacy or
> @@ -3800,6 +3804,11 @@ range_tests_signed_zeros ()
>r1.update_nan ();
>r0.intersect (r1);
>ASSERT_TRUE (r0.known_isnan ());
> +
> +  r0.set_nonnegative (float_type_node);
> +  ASSERT_TRUE (r0.signbit_p (signbit) && !signbit);
> +  if (HONOR_NANS (float_type_node))
> +ASSERT_TRUE (r0.maybe_isnan ());
>  }
>
>  static void
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 3a401f3e4e2..5b261d4f46a 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -312,6 +312,7 @@ public:
>const REAL_VALUE_TYPE _bound () const;
>const REAL_VALUE_TYPE _bound () const;
>void update_nan ();
> +  void update_nan (bool sign);
>void clear_nan ();
>
>// fpclassify like API
> @@ -1098,6 +1099,19 @@ frange::update_nan ()
>  verify_range ();
>  }
>
> +// Like above, but set the sign of the NAN.
> +
> +inline void
> +frange::update_nan (bool sign)
> +{
> +  gcc_checking_assert (!undefined_p ());
> +  m_pos_nan = !sign;
> +  m_neg_nan = sign;
> +  normalize_kind ();
> +  if (flag_checking)
> +verify_range ();
> +}
> +
>  // Clear the NAN bit and adjust the range.
>
>  inline void
> --
> 2.37.1
>


[PATCH] [PR68097] frange::set_nonnegative should not contain -NAN.

2022-09-19 Thread Aldy Hernandez via Gcc-patches
ISTM that a specifically nonnegative range should not contain -NAN,
otherwise signbit_p() would return false, because we'd be unsure of the
sign.

Do y'all agree?

PR 68097/tree-optimization

gcc/ChangeLog:

* value-range.cc (frange::set_nonnegative): Set +NAN.
(range_tests_signed_zeros): New test.
* value-range.h (frange::update_nan): New overload to set NAN sign.
---
 gcc/value-range.cc |  9 +
 gcc/value-range.h  | 14 ++
 2 files changed, 23 insertions(+)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 67d5d7fa90f..e432ec8b525 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -752,6 +752,10 @@ void
 frange::set_nonnegative (tree type)
 {
   set (type, dconst0, dconstinf);
+
+  // Set +NAN as the only possibility.
+  if (HONOR_NANS (type))
+update_nan (/*sign=*/0);
 }
 
 // Here we copy between any two irange's.  The ranges can be legacy or
@@ -3800,6 +3804,11 @@ range_tests_signed_zeros ()
   r1.update_nan ();
   r0.intersect (r1);
   ASSERT_TRUE (r0.known_isnan ());
+
+  r0.set_nonnegative (float_type_node);
+  ASSERT_TRUE (r0.signbit_p (signbit) && !signbit);
+  if (HONOR_NANS (float_type_node))
+ASSERT_TRUE (r0.maybe_isnan ());
 }
 
 static void
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 3a401f3e4e2..5b261d4f46a 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -312,6 +312,7 @@ public:
   const REAL_VALUE_TYPE _bound () const;
   const REAL_VALUE_TYPE _bound () const;
   void update_nan ();
+  void update_nan (bool sign);
   void clear_nan ();
 
   // fpclassify like API
@@ -1098,6 +1099,19 @@ frange::update_nan ()
 verify_range ();
 }
 
+// Like above, but set the sign of the NAN.
+
+inline void
+frange::update_nan (bool sign)
+{
+  gcc_checking_assert (!undefined_p ());
+  m_pos_nan = !sign;
+  m_neg_nan = sign;
+  normalize_kind ();
+  if (flag_checking)
+verify_range ();
+}
+
 // Clear the NAN bit and adjust the range.
 
 inline void
-- 
2.37.1



Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-19 Thread Richard Biener via Gcc-patches
On Mon, Sep 19, 2022 at 9:31 AM Mikael Morin  wrote:
>
> Le 18/09/2022 à 12:48, Richard Biener a écrit :
> >
> >> Does *([1]) count as a pointer dereference?
> >
> > Yes, technically.
> >
> >>   Even in the original dump it is already simplified to a straight a[1].
> >
> > But this not anymore.  The check can probably be relaxed, it stems from the 
> > dual purpose of CLOBBER.
> >
> So the following makes the frontend-emitted IL valid, by handing the
> simplification over to the middle-end, but I can't help thinking that
> behavior won't be more reliable.

I think that will just delay the folding.  We are basically relying on
the frontends to
restrict what they produce, the *ptr case was specifically for C++
this in CTOR/DTORs
and during inlining we throw away all clobbers that end up "wrong" but
I guess we
might have latent issues when we obfuscate the address in the caller enough and
get undesired simplification ...

>
> diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
> index f8fcd2d97d9..5fb9a3a536d 100644
> --- a/gcc/fortran/trans-expr.cc
> +++ b/gcc/fortran/trans-expr.cc
> @@ -6544,8 +6544,9 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol *
> sym,
>&& !sym->attr.elemental)
>  {
>tree var;
> - var = build_fold_indirect_ref_loc (input_location,
> -parmse.expr);
> + var = build1_loc (input_location, INDIRECT_REF,
> +   TREE_TYPE (TREE_TYPE
> (parmse.expr)),
> +   parmse.expr);
>tree clobber = build_clobber (TREE_TYPE (var));
>gfc_add_modify (, var, clobber);
>  }
>


[PATCH] libgcc: Decrease size of _Unwind_FrameState and even more size of cleared area in uw_frame_state_for

2022-09-19 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements something that has Florian found as
low hanging fruit in our unwinder and has been discussed in the
https://gcc.gnu.org/wiki/cauldron2022#cauldron2022talks.inprocess_unwinding_bof
talk.
_Unwind_FrameState type seems to be (unlike the pre-GCC 3 frame_state
which has been part of ABI) private to unwind-dw2.c + unwind.inc it
includes, it is always defined on the stack of some entrypoints, initialized
by static uw_frame_state_for and the address of it is also passed to other
static functions or the static inlines handling machine dependent unwinding,
but it isn't fortunately passed to any callbacks or public functions, so I
think we can safely change it any time we want.
Florian mentioned that the structure is large even on x86_64, 384 bytes
there, starts with 328 bytes long element with frame_state_reg_info type
which then starts with an array with __LIBGCC_DWARF_FRAME_REGISTERS__ + 1
elements, each of them is 16 bytes long, on x86_64
__LIBGCC_DWARF_FRAME_REGISTERS__ is just 17 but even that is big, on say
riscv __LIBGCC_DWARF_FRAME_REGISTERS__ is I think 128, on powerpc 111,
on sh 153 etc.  And, we memset to zero the whole fs variable with the
_Unwind_FrameState type at the start of the unwinding.
The reason why each element is 16 byte (on 64-bit arches) is that it
contains some pointer or pointer sized integer and then an enum (with just
7 different enumerators) + padding.

The following patch decreases it by moving the enum into a separate
array and using just one byte for each register in that second array.
We could compress it even more, say 4 bits per register, but I don't
want to uglify the code for it too much and make the accesses slower.
Furthermore, the clearing of the object can clear only thos how array
and members after it, because REG_UNSAVED enumerator (0) doesn't actually
need any pointer or pointer sized integer, it is just the other kinds
that need to have there something.
By doing this, on x86_64 the above numbers change to _Unwind_FrameState
type being now 264 bytes long, frame_state_reg_info 208 bytes and we
don't clear the first 144 bytes of the object, so the memset is 120 bytes,
so ~ 31% of the old clearing size.  On riscv 64-bit assuming it has same
structure layout rules for the few types used there that would be
~ 2160 bytes of _Unwind_FrameState type before and ~ 1264 bytes after,
with the memset previously ~ 2160 bytes and after ~ 232 bytes after.

We've also talked about possibly adding a number of initially initialized
regs and initializing the rest lazily, but at least for x86_64 with
18 elements in the array that doesn't seem to be worth it anymore,
especially because return address column is 16 there and that is usually the
first thing to be touched.  It might theory help with lots of registers if
they are usually untouched, but would uglify and complicate any stores to
how by having to check there for the not initialized yet cases and lazy
initialization, and similarly for all reads of how to do there if below
last initialized one, use how, otherwise imply REG_UNSAVED.

The disadvantage of the patch is that touching reg[x].loc and how[x]
now means 2 cachelines rather than one as before, and I admit beyond
bootstrap/regtest I haven't benchmarked it in any way.  Florian, could
you retry whatever you measured to get at the 40% of time spent on the
stack clearing to see how the numbers change?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-09-19  Jakub Jelinek  

* unwind-dw2.h (REG_UNSAVED, REG_SAVED_OFFSET, REG_SAVED_REG,
REG_SAVED_EXP, REG_SAVED_VAL_OFFSET, REG_SAVED_VAL_EXP,
REG_UNDEFINED): New anonymous enum, moved from inside of
struct frame_state_reg_info.
(struct frame_state_reg_info): Remove reg[].how element and the
anonymous enum there.  Add how element.
* unwind-dw2.c: Include stddef.h.
(uw_frame_state_for): Don't clear first
offsetof (_Unwind_FrameState, regs.how[0]) bytes of *fs.
(execute_cfa_program, __frame_state_for, uw_update_context_1,
uw_update_context): Use fs->regs.how[X] instead of fs->regs.reg[X].how
or fs.regs.how[X] instead of fs.regs.reg[X].how.
* config/sh/linux-unwind.h (sh_fallback_frame_state): Likewise.
* config/bfin/linux-unwind.h (bfin_fallback_frame_state): Likewise.
* config/pa/linux-unwind.h (pa32_fallback_frame_state): Likewise.
* config/pa/hpux-unwind.h (UPDATE_FS_FOR_SAR, UPDATE_FS_FOR_GR,
UPDATE_FS_FOR_FR, UPDATE_FS_FOR_PC, pa_fallback_frame_state):
Likewise.
* config/alpha/vms-unwind.h (alpha_vms_fallback_frame_state):
Likewise.
* config/alpha/linux-unwind.h (alpha_fallback_frame_state): Likewise.
* config/arc/linux-unwind.h (arc_fallback_frame_state,
arc_frob_update_context): Likewise.
* config/riscv/linux-unwind.h (riscv_fallback_frame_state): Likewise.
* 

Re: [patch, avr] Fix PR target/99184: Wrong cast from double to 16-bit and 32-bit ints.

2022-09-19 Thread Richard Biener via Gcc-patches
On Sun, Sep 18, 2022 at 7:40 PM Georg Johann Lay  wrote:
>
> Hello,
>
> this patch fixed PR target/99184 which incorrectly rounded during 64-bit
> (long) double to 16-bit and 32-bit integers.
>
> The patch just removes the respective roundings from
> libf7-asm.sx::to_integer and ::to_unsigned.  Luckily, LibF7 does nowhere
> use respective functions internally, the only user is in libf7.c::f7_exp
>
> which reads
>
>f7_round (qq, qq);
>int16_t q = f7_get_s16 (qq);
>
> so that f7_get_s16() operates on an already rounded value, and therefore
> this code works unaltered with or without rounding in to_integer.
>
> The patch applies to directory
>
> ./libgcc/config/avr/libf7/
>
> and is the same for all GCC versions v10+.
>
> Please someone with write permissions commit it to trunk and backport to
> v12, v11, and v10 as it is a wrong-code issue.
>
> The patch will fit without problems (except for ChangeLog) because there
> is no traffic on that folder.

Thanks, I've pushed the change.  Please in future try to send patches
that can be applied with git am, thus use git format-patch

Richard.

>
> Thanks, Johann
>
>
> libgcc/config/avr/libf7/
> PR target/99184
> Remove rounding from double to [u]int16 and [u]int32 casts.
>
> * libf7-asm.sx (to_integer, to_unsigned): Don't round 16-bit
> and 32-bit integers.
>


Re: [PATCH] Improve sorry message for -fzero-call-used-regs

2022-09-19 Thread Richard Biener via Gcc-patches
On Sun, Sep 18, 2022 at 11:09 AM Torbjörn SVENSSON via Gcc-patches
 wrote:
>
> When the -fzero-call-used-regs command line option is used with an
> unsupported value, indicate that it's a value problem instead of an
> option problem.
>
> Without the patch, the error is:
> In file included from gcc/testsuite/c-c++-common/zero-scratch-regs-8.c:5:
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c: In function 'foo':
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c:10:1: sorry, unimplemented: 
> '-fzero-call-used-regs' not supported on this target
>10 | }
>   | ^
>
> With the patch, the error would be like this:
>  In file included from gcc/testsuite/c-c++-common/zero-scratch-regs-8.c:5:
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c: In function 'foo':
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c:10:1: sorry, unimplemented: 
> Argument 'all-arg' is not supported for '-fzero-call-used-regs' on this target

the 'A' in 'Argument' should be lower case.

Otherwise LGTM.

Richard.

>10 | }
>   | ^
>
> 2022-09-18  Torbjörn SVENSSON  
>
> gcc/ChangeLog:
>
> * targhooks.cc (default_zero_call_used_regs): Improve sorry
> message.
>
> Signed-off-by: Torbjörn SVENSSON  
> ---
>  gcc/targhooks.cc | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index b15ae19bcb6..8bfbc1d18f6 100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple.h"
>  #include "cfgloop.h"
>  #include "tree-vectorizer.h"
> +#include "options.h"
>
>  bool
>  default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
> @@ -1181,9 +1182,21 @@ default_zero_call_used_regs (HARD_REG_SET 
> need_zeroed_hardregs)
>static bool issued_error;
>if (!issued_error)
> {
> + const char *name = NULL;
> + for (unsigned int i = 0; zero_call_used_regs_opts[i].name != NULL;
> +  ++i)
> +   if (flag_zero_call_used_regs == zero_call_used_regs_opts[i].flag)
> + {
> +   name = zero_call_used_regs_opts[i].name;
> +   break;
> + }
> +
> + if (!name)
> +   name = "";
> +
>   issued_error = true;
> - sorry ("%qs not supported on this target",
> -"-fzero-call-used-regs");
> + sorry ("Argument %qs is not supported for %qs on this target",
> +name, "-fzero-call-used-regs");
> }
>  }
>
> --
> 2.25.1
>


Re: [PATCH] frange: flush denormals to zero for -funsafe-math-optimizations.

2022-09-19 Thread Richard Biener via Gcc-patches
On Sat, Sep 17, 2022 at 10:25 AM Aldy Hernandez via Gcc-patches
 wrote:
>
> Jakub has mentioned that for -funsafe-math-optimizations we may flush
> denormals to zero, in which case we need to be careful to extend the
> ranges to the appropriate zero.  This patch does exactly that.  For a
> range of [x, -DENORMAL] we flush to [x, -0.0] and for [+DENORMAL, x]
> we flush to [+0.0, x].
>
> It is unclear whether we should do this for Alpha, since I believe
> flushing to zero is the default, and the port requires -mieee for IEEE
> sanity.  If so, perhaps we should add a target hook so backends are
> free to request flushing to zero.
>
> Thoughts?

I'm not sure what the intention of this is - it effectively results in
more conservative ranges for -funsafe-math-optimizations.  That is,
if -funsafe-math-optimizations says that denormals don't exist and
are all zero then doesn't that mean we can instead use the smalles
non-denormal number less than (or greater than) zero here?

That said, the flushing you do is also "valid" for
-fno-unsafe-math-optimizations
in case we don't want to deal with denormals in range boundaries.

It might also be a correctness requirement in case we don't know how
targets handle denormals (IIRC even OS defaults might matter here) so
we conservatively have to treat them as being flushed to zero.

So ... if we want to be on the "safe" side then please always do this.

At the same point you could do

 if (!HONOR_SIGNED_ZEROS ())
   if (real_iszero (_max))
 {
if (real_iszero (_min))
   m.min.sign = 1;
m_max.sign = 1;
 }
   else if (real_iszero (_min))
 m_min.sign = 0;

so that we canonicalize a zero bound so that the sign is known for a range.

Richard.

> gcc/ChangeLog:
>
> * value-range.cc (frange::flush_denormals_to_zero): New.
> (frange::set): Call flush_denormals_to_zero.
> * value-range.h (class frange): Add flush_denormals_to_zero.
> ---
>  gcc/value-range.cc | 24 
>  gcc/value-range.h  |  1 +
>  2 files changed, 25 insertions(+)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 67d5d7fa90f..f285734f0e0 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -267,6 +267,26 @@ tree_compare (tree_code code, tree op1, tree op2)
>return !integer_zerop (fold_build2 (code, integer_type_node, op1, op2));
>  }
>
> +// Flush denormal endpoints to the appropriate 0.0.
> +
> +void
> +frange::flush_denormals_to_zero ()
> +{
> +  if (undefined_p () || known_isnan ())
> +return;
> +
> +  // Flush [x, -DENORMAL] to [x, -0.0].
> +  if (real_isdenormal (_max) && real_isneg (_max))
> +{
> +  m_max = dconst0;
> +  if (HONOR_SIGNED_ZEROS (m_type))
> +   m_max.sign = 1;
> +}
> +  // Flush [+DENORMAL, x] to [+0.0, x].
> +  if (real_isdenormal (_min) && !real_isneg (_min))
> +m_min = dconst0;
> +}
> +
>  // Setter for franges.
>
>  void
> @@ -317,6 +337,10 @@ frange::set (tree min, tree max, value_range_kind kind)
>gcc_checking_assert (tree_compare (LE_EXPR, min, max));
>
>normalize_kind ();
> +
> +  if (flag_unsafe_math_optimizations)
> +flush_denormals_to_zero ();
> +
>if (flag_checking)
>  verify_range ();
>  }
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 3a401f3e4e2..795b1f00fdc 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -327,6 +327,7 @@ private:
>bool union_nans (const frange &);
>bool intersect_nans (const frange &);
>bool combine_zeros (const frange &, bool union_p);
> +  void flush_denormals_to_zero ();
>
>tree m_type;
>REAL_VALUE_TYPE m_min;
> --
> 2.37.1
>


Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-19 Thread Mikael Morin

Le 18/09/2022 à 12:48, Richard Biener a écrit :



Does *([1]) count as a pointer dereference?


Yes, technically.


  Even in the original dump it is already simplified to a straight a[1].


But this not anymore.  The check can probably be relaxed, it stems from the 
dual purpose of CLOBBER.

So the following makes the frontend-emitted IL valid, by handing the 
simplification over to the middle-end, but I can't help thinking that 
behavior won't be more reliable.



diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index f8fcd2d97d9..5fb9a3a536d 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6544,8 +6544,9 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * 
sym,

  && !sym->attr.elemental)
{
  tree var;
- var = build_fold_indirect_ref_loc (input_location,
-parmse.expr);
+ var = build1_loc (input_location, INDIRECT_REF,
+   TREE_TYPE (TREE_TYPE 
(parmse.expr)),

+   parmse.expr);
  tree clobber = build_clobber (TREE_TYPE (var));
  gfc_add_modify (, var, clobber);
}



[PATCH] c++: Improve diagnostics about conflicting specifiers

2022-09-19 Thread Jakub Jelinek via Gcc-patches
Hi!

On Sat, Sep 17, 2022 at 01:23:59AM +0200, Jason Merrill wrote:
> I wonder why we don't give an error when setting the
> conflicting_specifiers_p flag in cp_parser_set_storage_class?  We should be
> able to give a better diagnostic at that point.

I didn't have time to update the whole patch last night, but this part
seems to be independent and I've managed to test it.

The diagnostics then looks like:
a.C:1:9: error: ‘static’ specifier conflicts with ‘typedef’
1 | typedef static int a;
  | ~~~ ^~
a.C:2:8: error: ‘typedef’ specifier conflicts with ‘static’
2 | static typedef int b;
  | ~~ ^~~
a.C:3:8: error: duplicate ‘static’ specifier
3 | static static int c;
  | ~~ ^~
a.C:4:8: error: ‘extern’ specifier conflicts with ‘static’
4 | static extern int d;
  | ~~ ^~

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-09-19  Jakub Jelinek  

gcc/cp/
* parser.cc (cp_parser_lambda_declarator_opt): Don't diagnose
conflicting specifiers here.
(cp_storage_class_name): New variable.
(cp_parser_decl_specifier_seq): When setting conflicting_specifiers_p
for the first time, diagnose which exact specifiers conflict.
(cp_parser_set_storage_class): Likewise.  Move storage_class
computation earlier.
* decl.cc (grokdeclarator): Don't diagnose conflicting specifiers
here, just return error_mark_node.
gcc/testsuite/
* g++.dg/diagnostic/conflicting-specifiers-1.C: Adjust expected
diagnostics.
* g++.dg/parse/typedef8.C: Likewise.
* g++.dg/parse/crash39.C: Likewise.
* g++.dg/other/mult-stor1.C: Likewise.
* g++.dg/cpp2a/constinit3.C: Likewise.

--- gcc/cp/parser.cc.jj 2022-09-16 22:34:28.427708581 +0200
+++ gcc/cp/parser.cc2022-09-19 09:18:49.561145347 +0200
@@ -11718,9 +11718,6 @@ cp_parser_lambda_declarator_opt (cp_pars
 {
   LAMBDA_EXPR_MUTABLE_P (lambda_expr) = 1;
   quals = TYPE_UNQUALIFIED;
-  if (lambda_specs.conflicting_specifiers_p)
-   error_at (lambda_specs.locations[ds_storage_class],
- "duplicate %");
 }
 
   tx_qual = cp_parser_tx_qualifier_opt (parser);
@@ -15720,6 +15717,13 @@ cp_parser_decomposition_declaration (cp_
   return decl;
 }
 
+/* Names of storage classes.  */
+
+static const char *const
+cp_storage_class_name[] = {
+  "", "auto", "register", "static", "extern", "mutable"
+};
+
 /* Parse a decl-specifier-seq.
 
decl-specifier-seq:
@@ -15941,8 +15945,18 @@ cp_parser_decl_specifier_seq (cp_parser*
 may as well commit at this point.  */
  cp_parser_commit_to_tentative_parse (parser);
 
-  if (decl_specs->storage_class != sc_none)
-decl_specs->conflicting_specifiers_p = true;
+ if (decl_specs->storage_class != sc_none)
+   {
+ if (decl_specs->conflicting_specifiers_p)
+   break;
+ gcc_rich_location richloc (token->location);
+ location_t oloc = decl_specs->locations[ds_storage_class];
+ richloc.add_location_if_nearby (oloc);
+ error_at (,
+   "% specifier conflicts with %qs",
+   cp_storage_class_name[decl_specs->storage_class]);
+ decl_specs->conflicting_specifiers_p = true;
+   }
  break;
 
  /* storage-class-specifier:
@@ -32826,26 +32840,6 @@ cp_parser_set_storage_class (cp_parser *
 {
   cp_storage_class storage_class;
 
-  if (parser->in_unbraced_linkage_specification_p)
-{
-  error_at (token->location, "invalid use of %qD in linkage specification",
-   ridpointers[keyword]);
-  return;
-}
-  else if (decl_specs->storage_class != sc_none)
-{
-  decl_specs->conflicting_specifiers_p = true;
-  return;
-}
-
-  if ((keyword == RID_EXTERN || keyword == RID_STATIC)
-  && decl_spec_seq_has_spec_p (decl_specs, ds_thread)
-  && decl_specs->gnu_thread_keyword_p)
-{
-  pedwarn (decl_specs->locations[ds_thread], 0,
-   "%<__thread%> before %qD", ridpointers[keyword]);
-}
-
   switch (keyword)
 {
 case RID_AUTO:
@@ -32866,6 +32860,38 @@ cp_parser_set_storage_class (cp_parser *
 default:
   gcc_unreachable ();
 }
+
+  if (parser->in_unbraced_linkage_specification_p)
+{
+  error_at (token->location, "invalid use of %qD in linkage specification",
+   ridpointers[keyword]);
+  return;
+}
+  else if (decl_specs->storage_class != sc_none)
+{
+  if (decl_specs->conflicting_specifiers_p)
+   return;
+  gcc_rich_location richloc (token->location);
+  richloc.add_location_if_nearby (decl_specs->locations[ds_storage_class]);
+  if (decl_specs->storage_class == storage_class)
+   error_at (, "duplicate %qD specifier", ridpointers[keyword]);
+  else
+   error_at (,
+ "%qD specifier conflicts with 

Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-19 Thread Mikael Morin

Le 18/09/2022 à 22:55, Mikael Morin a écrit :
Assumed shape can be non-contiguous so have to be excluded, 

Thinking about it again, assumed shape are probably acceptable, but 
there should be a check for contiguousness on the actual argument.


[PATCH] genrecog.cc (print_nonbool_test): Fix type error of SUBREG_BYTE

2022-09-19 Thread Jojo R via Gcc-patches
* gcc/genrecog.cc (print_nonbool_test): Fix type error of
SUBREG_BYTE
---
 gcc/genrecog.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/genrecog.cc b/gcc/genrecog.cc
index 77f8fb97853..319e437e334 100644
--- a/gcc/genrecog.cc
+++ b/gcc/genrecog.cc
@@ -4619,6 +4619,7 @@ print_nonbool_test (output_state *os, const rtx_test 
)
   printf ("SUBREG_BYTE (");
   print_test_rtx (os, test);
   printf (")");
+  printf (".to_constant ()");
   break;
 
 case rtx_test::WIDE_INT_FIELD:
-- 
2.24.3 (Apple Git-128)