[committed] wwwdocs: projects/cfg: Update reference to paper

2023-01-15 Thread Gerald Pfeifer
Citeseer no longer has "Hyperblock Performance Optimizations For
ILP Processors; David Isaac August, 1996 (Master Thesis)". Link to
the actual PDF instead.
---
 htdocs/projects/cfg.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/projects/cfg.html b/htdocs/projects/cfg.html
index 464d81dd..b1ee1f34 100644
--- a/htdocs/projects/cfg.html
+++ b/htdocs/projects/cfg.html
@@ -496,7 +496,7 @@ Chang, Scott A. Mahlke, and Wen-mei W. Hwu, 1991
 [8] wwwdocs:
 
 http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.1922;>Hyperblock
+"http://impact.crhc.illinois.edu/shared/Thesis/daugust-thesis.pdf;>Hyperblock
 Performance Optimizations For ILP Processors; David Isaac August,
 1996 (Master Thesis)
 
-- 
2.38.1


Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.

2023-01-15 Thread Aldy Hernandez via Gcc-patches




On 1/16/23 08:19, Richard Biener wrote:

On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod  wrote:



On 1/13/23 16:54, Jakub Jelinek wrote:

On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote:

fold_range() already invokes wi_fold_in_parts to try to get more refined
information. If the subranges are quite small, it will do each individual
calculation and combine the results.

x * y with x = [1,3] and y = [1,3]  is broken down and we calculate each
possibility and we end up with [1,4][6,6][9,9] instead of [1,9]

We limit this as the time is between quadratic to exponential depending on
the number of elements in x and y.

If we also check the relation and determine that x == y, we don't need to
worry about that growth as this process is linear.  The above case will be
broken down to just  1*1, 2*2 and 3*3, resulting in a range of [1,
1][4,4][9,9].
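
To make the difference concrete, here is a minimal sketch that is not taken from the patch. It enumerates concrete values instead of calling wi_fold, and the names `Range`, `to_ranges`, `mul_cross` and `mul_equiv` are hypothetical. It only shows why the full cross product yields [1,4][6,6][9,9] while the x == y case yields [1,1][4,4][9,9]:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Range = std::pair<std::int64_t, std::int64_t>;  // inclusive [lo, hi]

// Collapse a sorted set of values into maximal runs of consecutive integers.
static std::vector<Range> to_ranges(const std::set<std::int64_t>& vals) {
  std::vector<Range> out;
  for (std::int64_t v : vals) {
    if (!out.empty() && out.back().second + 1 == v)
      out.back().second = v;   // extend the current run
    else
      out.push_back({v, v});   // start a new run
  }
  return out;
}

// Full cross product: every a in x is paired with every b in y (quadratic).
std::vector<Range> mul_cross(Range x, Range y) {
  assert(x.first <= x.second && y.first <= y.second);
  std::set<std::int64_t> vals;
  for (std::int64_t a = x.first; a <= x.second; ++a)
    for (std::int64_t b = y.first; b <= y.second; ++b)
      vals.insert(a * b);
  return to_ranges(vals);
}

// Known x == y: only matching pairs a * a are possible (linear).
std::vector<Range> mul_equiv(Range x) {
  assert(x.first <= x.second);
  std::set<std::int64_t> vals;
  for (std::int64_t a = x.first; a <= x.second; ++a)
    vals.insert(a * a);
  return to_ranges(vals);
}
```

Calling `mul_cross({1, 3}, {1, 3})` visits nine pairs, while `mul_equiv({1, 3})` visits three, which is the linear behaviour described above.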

   In the testcase, it happens to be the right_shift operation, but this
solution is generic and applies to all range-op operations. I added a
testcase which checks >>, *, + and %.

I also arbitrarily chose 8 elements as the limit for breaking down
individual operations.  The overall compile time change for this is
negligible.

Although this is a regression fix, it will affect all operations where x ==
y, which is where my initial hesitancy arose.

Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK for
trunk?

Will defer to Aldy, just some nits.

Did you mean Richi?


I suppose Aldy, since it's a regression fix it's OK even during stage4.


I don't have any issues with the patch.  Whatever the release managers 
agree to, I'm fine with.


Aldy



I do have a comment as well though, you do

+  // If op1 and op2 are equivalences, then we don't need a complete cross
+  // product, just pairs of matching elements.
+  if (relation_equiv_p (rel) && lh == rh && num_lh <= 16)
+{
+  int_range_max tmp;
+  r.set_undefined ();
+  for (unsigned x = 0; x < num_lh; ++x)
+   {
+ wide_int lh_lb = lh.lower_bound (x);
+ wide_int lh_ub = lh.upper_bound (x);
+ wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);

and that does

+  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
+widest_int::from (lh_lb, TYPE_SIGN (type)));
+  // if there are 1 to 8 values in the LH range, split them up.
+  r.set_undefined ();
+  if (lh_range >= 0 && lh_range <= 7)
+{
+  for (unsigned x = 0; x <= lh_range; x++)

which in total limits the number of sub-ranges in the output but in an
odd way.  It's also all-or-nothing.  IIRC there's a hard limit on the
number of sub-ranges in the output anyway via int_range<N>, so
why not use that and always do the first loop over the sub-ranges
of the inputs and the second loop over the range members but
stop when we reach N-1 and then use wi_fold on the remainder?

Your above code suggests we go up to 112 sub-ranges and once
we'd reach 113 we'd fold down to a single.

Maybe my "heuristic" wouldn't be much better, but then somehow
breaking the heuristic down to a single magic number would be
better, esp. since .union_ will undo some of the breakup when
reaching N?




+  // if there are 1 to 8 values in the LH range, split them up.
+  r.set_undefined ();
+  if (lh_range >= 0 && lh_range <= 7)
+{
+  unsigned x;
+  for (x = 0; x <= lh_range; x++)

Nothing uses x after the loop, so why not
for (unsigned x = 0; x <= lh_range; x++)
instead?


Just old habits.



@@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type,
 unsigned num_lh = lh.num_pairs ();
 unsigned num_rh = rh.num_pairs ();

+  // If op1 and op2 are equivalences, then we don't need a complete cross
+  // product, just pairs of matching elements.
+  if (relation_equiv_p (rel) && (lh == rh))

The ()s around lh == rh look superfluous to me.

Yeah I just found it marginally more readable, but it is superfluous

+{
+  int_range_max tmp;
+  r.set_undefined ();
+  for (unsigned x = 0; x < num_lh; ++x)

fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something
like that be there for this case too?
I mean, every wi_fold_in_parts_equiv can result in 8 subranges,
but num_lh could be up to 255 here, it is true it is linear and union_
should merge excess ones, but still I wonder if some larger num_lh upper
bound like 20 or 32 wouldn't be useful.  Up to you...

fold_range has the num_lh * num_rh limit because it was
quadratic/exponential and changes rapidly. Since this was linear based
on the number of sub ranges I didn't think it would matter much, but
sure, we can put a similar limit on it.. 16 seems reasonable.

+{
+  wide_int lh_lb = lh.lower_bound (x);
+  wide_int lh_ub = lh.upper_bound (x);
+  wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
+  r.union_ (tmp);
+  if (r.varying_p ())
+break;
+}
+  op1_op2_relation_effect (r, type, lh, rh, rel);
+  

Re: [PR106746] drop cselib addr lookup in debug insn mem

2023-01-15 Thread Richard Biener via Gcc-patches
On Sat, Jan 14, 2023 at 12:26 PM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> The testcase used to get scheduled differently depending on the
> presence of debug insns with MEMs.  It's not clear to me why those
> MEMs affected scheduling, but the cselib pre-canonicalization of the
> MEM address is not used at all when analyzing debug insns, so the
> memory allocation and lookup are pure waste.  Somehow, avoiding that
> waste fixes the problem, or makes it go latent.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

OK.

Richard.

>
> for  gcc/ChangeLog
>
> PR debug/106746
> * sched-deps.cc (sched_analyze_2): Skip cselib address lookup
> within debug insns.
>
> for  gcc/testsuite/ChangeLog
>
> PR debug/106746
> * gcc.target/i386/pr106746.c: New.
> ---
>  gcc/sched-deps.cc|   36 
> +++---
>  gcc/testsuite/gcc.target/i386/pr106746.c |   29 
>  2 files changed, 47 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106746.c
>
> diff --git a/gcc/sched-deps.cc b/gcc/sched-deps.cc
> index f9371b81fb41e..a9214f674329a 100644
> --- a/gcc/sched-deps.cc
> +++ b/gcc/sched-deps.cc
> @@ -2605,26 +2605,26 @@ sched_analyze_2 (class deps_desc *deps, rtx x, rtx_insn *insn)
>
>  case MEM:
>{
> -   /* Reading memory.  */
> -   rtx_insn_list *u;
> -   rtx_insn_list *pending;
> -   rtx_expr_list *pending_mem;
> -   rtx t = x;
> -
> -   if (sched_deps_info->use_cselib)
> - {
> -   machine_mode address_mode = get_address_mode (t);
> -
> -   t = shallow_copy_rtx (t);
> -   cselib_lookup_from_insn (XEXP (t, 0), address_mode, 1,
> -GET_MODE (t), insn);
> -   XEXP (t, 0)
> - = cselib_subst_to_values_from_insn (XEXP (t, 0), GET_MODE (t),
> - insn);
> - }
> -
> if (!DEBUG_INSN_P (insn))
>   {
> +   /* Reading memory.  */
> +   rtx_insn_list *u;
> +   rtx_insn_list *pending;
> +   rtx_expr_list *pending_mem;
> +   rtx t = x;
> +
> +   if (sched_deps_info->use_cselib)
> + {
> +   machine_mode address_mode = get_address_mode (t);
> +
> +   t = shallow_copy_rtx (t);
> +   cselib_lookup_from_insn (XEXP (t, 0), address_mode, 1,
> +GET_MODE (t), insn);
> +   XEXP (t, 0)
> + = cselib_subst_to_values_from_insn (XEXP (t, 0), GET_MODE (t),
> + insn);
> + }
> +
> t = canon_rtx (t);
> pending = deps->pending_read_insns;
> pending_mem = deps->pending_read_mems;
> diff --git a/gcc/testsuite/gcc.target/i386/pr106746.c b/gcc/testsuite/gcc.target/i386/pr106746.c
> new file mode 100644
> index 0..14f7dab71d691
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106746.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fsched2-use-superblocks -fcompare-debug -Wno-psabi" } */
> +
> +typedef char __attribute__((__vector_size__ (64))) U;
> +typedef short __attribute__((__vector_size__ (64))) V;
> +typedef int __attribute__((__vector_size__ (64))) W;
> +
> +char c;
> +U a;
> +U *r;
> +W foo0_v512u32_0;
> +
> +void
> +foo (W)
> +{
> +  U u;
> +  V v;
> +  W w = __builtin_shuffle (foo0_v512u32_0, foo0_v512u32_0);
> +  u =
> +__builtin_shufflevector (a, u, 3, 0, 4, 9, 9, 6, 7, 8, 5,
> +0, 6, 1, 8, 1, 2, 8, 6,
> +1, 8, 4, 9, 3, 8, 4, 6, 0, 9, 0, 1, 8, 2, 3, 3,
> +0, 4, 9, 9, 6, 7, 8, 5,
> +0, 6, 1, 8, 1, 2, 8, 6,
> +1, 8, 4, 9, 3, 8, 4, 6, 0, 9, 0, 1, 8, 2, 3);
> +  v *= c;
> +  w &= c;
> +  *r = (U) v + (U) w;
> +}
>
>
> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [RFC] Introduce -finline-memset-loops

2023-01-15 Thread Richard Biener via Gcc-patches
On Sat, Jan 14, 2023 at 2:55 AM Alexandre Oliva  wrote:
>
> Hello, Richard,
>
> Thank you for the feedback.
>
> On Jan 12, 2023, Richard Biener  wrote:
>
> > On Tue, Dec 27, 2022 at 5:12 AM Alexandre Oliva via Gcc-patches
> >  wrote:
>
> >> This patch extends the memset expansion to start with a loop, so as to
> >> still take advantage of known alignment even with long lengths, but
> >> without necessarily adding store blocks for every power of two.
>
> > I wonder if that isn't better handled by targets via the setmem pattern,
>
> That was indeed where I started, but then I found myself duplicating the
> logic in try_store_by_multiple_pieces on a per-target basis.
>
> Target-specific code is great for tight optimizations, but the main
> purpose of this feature is not an optimization.  AFAICT it actually
> slows things down in general (due to code growth, and to conservative
> assumptions about alignment), except perhaps for some microbenchmarks.
> It's rather a means to avoid depending on the C runtime, particularly
> due to compiler-introduced memset calls.

OK, that's what I guessed but you didn't spell out.  So does it make sense
to mention -ffreestanding in the docs at least?  My fear is that we'd get
complaints that -O3 -finline-memset-loops turns nicely optimized memset
loops into dumb ones (via loop distribution and then stupid re-expansion).
So does it also make sense to turn off -floop-distribute-patterns[-memset]
with -finline-memset-loops?
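
For readers unfamiliar with the expansion under discussion, the following is a minimal sketch of the general shape of a loop-based inline memset: a loop of wide stores followed by at most one store for each smaller power of two. It is not the code the patch generates. The function name `memset_loop_sketch` and the 8-byte block size are assumptions, and the fixed-size `std::memcpy` calls merely stand in for plain stores that a compiler would emit directly:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration only: fill `len` bytes at `dst` with `value`
// without calling memset.
void memset_loop_sketch(unsigned char* dst, unsigned char value, std::size_t len) {
  assert(dst != nullptr || len == 0);

  // Replicate the byte into all eight bytes of a 64-bit pattern.
  const std::uint64_t pattern = 0x0101010101010101ULL * value;

  // Main loop: one 8-byte store per iteration, regardless of total length.
  while (len >= 8) {
    std::memcpy(dst, &pattern, 8);
    dst += 8;
    len -= 8;
  }

  // Tail: len < 8 here, so at most one store each of 4, 2 and 1 bytes.
  for (std::size_t block = 4; block != 0; block /= 2) {
    if (len & block) {
      std::memcpy(dst, &pattern, block);
      dst += block;
    }
  }
}
```

The code size stays bounded by the loop plus three tail stores, whereas a pure store-by-pieces expansion would need one block per power of two up to the maximum length.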

> My initial goal was to be able to show that inline expansion would NOT
> bring about performance improvements, but performance was not the
> concern that led to the request.
>
> If the approach seems generally acceptable, I may even end up extending
> it to other such builtins.  I have a vague recollection that memcmp is
> also an issue for us.

The C/C++ runtime produces at least memmove, memcpy and memcmp as well.
In this respect -finline-memset-loops is too specific, and to avoid an explosion
in the number of command-line options we should try to come up with something
better?  -finline-all-stringops[={memset,memcpy,...}] (just like x86 has
-minline-all-stringops)?

> > like x86 has the stringop inline strategy.  What is considered acceptable
> > in terms of size or performance will vary and I don't think there's much
> > room for improvements on this generic code support?
>
> *nod* x86 is quite finely tuned already; I suppose other targets may
> have some room for additional tuning, both for performance and for code
> size, but we don't have much affordance for avoiding builtin calls to
> the C runtime, which is what this is about.
>
> Sometimes disabling loop distribution is enough to accomplish that, but
> in some cases GNAT itself resorts to builtin memset calls, in ways that
> are not so easy to avoid, and that would ultimately amount to expanding
> memset inline, so I figured we might as well offer that as a general
> feature, for users to whom this matters.
>
> Is (optionally) tending to this (uncommon, I suppose) need (or
> preference?) not something GCC would like to do?

Sure, I think for the specific intended purpose that would be fine.  It should
also only apply to __builtin_memset calls, not to memset calls from user code?

Thanks,
Richard.

> --
> Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.

2023-01-15 Thread Richard Biener via Gcc-patches
On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod  wrote:
>
>
> On 1/13/23 16:54, Jakub Jelinek wrote:
> > On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches 
> > wrote:
> >> fold_range() already invokes wi_fold_in_parts to try to get more refined
> >> information. If the subranges are quite small, it will do each individual
> >> calculation and combine the results.
> >>
> >> x * y with x = [1,3] and y = [1,3]  is broken down and we calculate each
> >> possibility and we end up with [1,4][6,6][9,9] instead of [1,9]
> >>
> >> We limit this as the time is between quadratic to exponential depending on
> >> the number of elements in x and y.
> >>
> >> If we also check the relation and determine that x == y, we don't need to
> >> worry about that growth as this process is linear.  The above case will be
> >> broken down to just  1*1, 2*2 and 3*3, resulting in a range of [1,
> >> 1][4,4][9,9].
> >>
> >>   In the testcase, it happens to be the right_shift operation, but this
> >> solution is generic and applies to all range-op operations. I added a
> >> testcase which checks >>, *, + and %.
> >>
> >> I also arbitrarily chose 8 elements as the limit for breaking down
> >> individual operations.  The overall compile time change for this is
> >> negligible.
> >>
> >> Although this is a regression fix, it will affect all operations where x ==
> >> y, which is where my initial hesitancy arose.
> >>
> >> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions.
> >> OK for trunk?
> > Will defer to Aldy, just some nits.
> Did you mean Richi?

I suppose Aldy, since it's a regression fix it's OK even during stage4.

I do have a comment as well though, you do

+  // If op1 and op2 are equivalences, then we don't need a complete cross
+  // product, just pairs of matching elements.
+  if (relation_equiv_p (rel) && lh == rh && num_lh <= 16)
+{
+  int_range_max tmp;
+  r.set_undefined ();
+  for (unsigned x = 0; x < num_lh; ++x)
+   {
+ wide_int lh_lb = lh.lower_bound (x);
+ wide_int lh_ub = lh.upper_bound (x);
+ wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);

and that does

+  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
+widest_int::from (lh_lb, TYPE_SIGN (type)));
+  // if there are 1 to 8 values in the LH range, split them up.
+  r.set_undefined ();
+  if (lh_range >= 0 && lh_range <= 7)
+{
+  for (unsigned x = 0; x <= lh_range; x++)

which in total limits the number of sub-ranges in the output but in an
odd way.  It's also all-or-nothing.  IIRC there's a hard limit on the
number of sub-ranges in the output anyway via int_range<N>, so
why not use that and always do the first loop over the sub-ranges
of the inputs and the second loop over the range members but
stop when we reach N-1 and then use wi_fold on the remainder?

Your above code suggests we go up to 112 sub-ranges and once
we'd reach 113 we'd fold down to a single.

Maybe my "heuristic" wouldn't be much better, but then somehow
breaking the heuristic down to a single magic number would be
better, esp. since .union_ will undo some of the breakup when
reaching N?
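
One possible reading of that suggestion, as a rough sketch only: walk the input sub-ranges and their members, emit one singleton result per member until a single output slot remains, then fold everything that is left in one step. The names `fold_square` and `fold_equiv_budgeted` are hypothetical, and `fold_square` stands in for wi_fold under the assumption that the operation is x * x on non-negative values, so that the result bounds are simply the squared input bounds:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::uint64_t, std::uint64_t>;  // inclusive [lo, hi]

// Hypothetical stand-in for wi_fold: x * x is monotone for x >= 0.
static Range fold_square(std::uint64_t lo, std::uint64_t hi) {
  return {lo * lo, hi * hi};
}

// `input` must be sorted, disjoint and non-empty; `max_pairs` plays the
// role of N.  Values are assumed small enough not to overflow.
std::vector<Range> fold_equiv_budgeted(const std::vector<Range>& input,
                                       std::size_t max_pairs) {
  assert(max_pairs >= 1 && !input.empty());
  std::vector<Range> out;
  for (const Range& sub : input) {
    std::uint64_t v = sub.first;
    // Emit singletons while more than one output slot is still free.
    while (v <= sub.second && out.size() + 1 < max_pairs) {
      out.push_back(fold_square(v, v));
      ++v;
    }
    if (v <= sub.second) {
      // Budget reached N-1: fold the whole remainder as a single range.
      out.push_back(fold_square(v, input.back().second));
      return out;
    }
  }
  return out;
}
```

With this shape a single number bounds the output size, and precision degrades gradually instead of dropping to one range as soon as a threshold is crossed.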

> >
> >> +  // if there are 1 to 8 values in the LH range, split them up.
> >> +  r.set_undefined ();
> >> +  if (lh_range >= 0 && lh_range <= 7)
> >> +{
> >> +  unsigned x;
> >> +  for (x = 0; x <= lh_range; x++)
> > Nothing uses x after the loop, so why not
> >for (unsigned x = 0; x <= lh_range; x++)
> > instead?
>
> Just old habits.
>
>
> >> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type,
> >> unsigned num_lh = lh.num_pairs ();
> >> unsigned num_rh = rh.num_pairs ();
> >>
> >> +  // If op1 and op2 are equivalences, then we don't need a complete cross
> >> +  // product, just pairs of matching elements.
> >> +  if (relation_equiv_p (rel) && (lh == rh))
> > The ()s around lh == rh look superfluous to me.
> Yeah I just found it marginally more readable, but it is superfluous
> >> +{
> >> +  int_range_max tmp;
> >> +  r.set_undefined ();
> >> +  for (unsigned x = 0; x < num_lh; ++x)
> > fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something
> > like that be there for this case too?
> > I mean, every wi_fold_in_parts_equiv can result in 8 subranges,
> > but num_lh could be up to 255 here, it is true it is linear and union_
> > should merge excess ones, but still I wonder if some larger num_lh upper
> > bound like 20 or 32 wouldn't be useful.  Up to you...
> fold_range has the num_lh * num_rh limit because it was
> quadratic/exponential and changes rapidly. Since this was linear based
> on the number of sub ranges I didn't think it would matter much, but
> sure, we can put a similar limit on it.. 16 seems reasonable.
> >> +{
> >> +  wide_int lh_lb = lh.lower_bound (x);
> >> +  wide_int lh_ub = lh.upper_bound (x);
> >> +  wi_fold_in_parts_equiv (tmp, type, 

[PATCH] AArch64: Gate various crypto intrinsics availability based on features

2023-01-15 Thread Tejas Belagod via Gcc-patches
The 64-bit variants of the PMULL{2} instructions and the AES instructions are
available if FEAT_AES is implemented, according to the Arm ARM [1].  Similarly,
FEAT_SHA1 and FEAT_SHA256 enable the use of the SHA1 and SHA256 instruction variants.
This patch fixes arm_neon.h to correctly reflect the feature availability based
on '+aes' and '+sha2' as opposed to the ambiguous catch-all '+crypto'.

[1] Section D17.2.61, C7.2.215
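
For context, the 64-bit PMULL operation behind vmull_p64 is a carry-less (polynomial over GF(2)) multiplication of two 64-bit operands into a 128-bit result. The following portable reference is only a sketch of that arithmetic; it does not use arm_neon.h, and the names `Poly128` and `clmul64` are hypothetical:

```cpp
#include <cstdint>

// 128-bit result split into two 64-bit halves.
struct Poly128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Carry-less multiply: like schoolbook binary multiplication, but partial
// products are combined with XOR instead of addition.
Poly128 clmul64(std::uint64_t a, std::uint64_t b) {
  Poly128 r{0, 0};
  for (int i = 0; i < 64; ++i) {
    if ((b >> i) & 1) {
      r.lo ^= a << i;
      if (i != 0)
        r.hi ^= a >> (64 - i);  // bits shifted out of the low half
    }
  }
  return r;
}
```

For example, `clmul64(3, 3)` gives 5, since (x + 1)^2 = x^2 + 1 over GF(2).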

2022-01-11  Tejas Belagod  

gcc/
* config/aarch64/arm_neon.h: Gate AES and PMULL64 intrinsics
under target feature +aes as opposed to +crypto. Gate SHA1 and SHA2
intrinsics under +sha2.

testsuite/

* gcc.target/aarch64/acle/pmull64.c: New.
* gcc.target/aarch64/aes-fuse-1.c: Replace '+crypto' with
corresponding feature flag based on the intrinsic.
* gcc.target/aarch64/aes-fuse-2.c: Likewise.
* gcc.target/aarch64/aes_1.c: Likewise.
* gcc.target/aarch64/aes_2.c: Likewise.
* gcc.target/aarch64/aes_xor_combine.c: Likewise.
* gcc.target/aarch64/sha1_1.c: Likewise.
* gcc.target/aarch64/sha256_1.c: Likewise.
* gcc.target/aarch64/target_attr_crypto_ice_1.c: Likewise.
---
 gcc/config/aarch64/arm_neon.h | 35 ++-
 .../gcc.target/aarch64/acle/pmull64.c | 14 
 gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes-fuse-2.c |  4 +--
 gcc/testsuite/gcc.target/aarch64/aes_1.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/aes_2.c  |  4 ++-
 .../gcc.target/aarch64/aes_xor_combine.c  |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha1_1.c |  2 +-
 gcc/testsuite/gcc.target/aarch64/sha256_1.c   |  2 +-
 .../aarch64/target_attr_crypto_ice_1.c|  2 +-
 10 files changed, 44 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pmull64.c

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index cf6af728ca9..a795a387b38 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -7496,7 +7496,7 @@ vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, int32x4_t __c, const int __d)
 #pragma GCC pop_options
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+aes")
 /* vaes  */
 
 __extension__ extern __inline uint8x16_t
@@ -7526,6 +7526,22 @@ vaesimcq_u8 (uint8x16_t data)
 {
   return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
 }
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_p64 (poly64_t __a, poly64_t __b)
+{
+  return
+__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
+}
+
+__extension__ extern __inline poly128_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
+{
+  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
+}
+
 #pragma GCC pop_options
 
 /* vcage  */
@@ -20772,7 +20788,7 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c)
 }
 
 #pragma GCC push_options
-#pragma GCC target ("+nothing+crypto")
+#pragma GCC target ("+nothing+sha2")
 
 /* vsha1  */
 
@@ -20849,21 +20865,6 @@ vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t __w8_11, uint32x4_t __w12_15)
   __w12_15);
 }
 
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_p64 (poly64_t __a, poly64_t __b)
-{
-  return
-__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
-}
-
-__extension__ extern __inline poly128_t
-__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
-{
-  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
-}
-
 #pragma GCC pop_options
 
 /* vshl */
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
new file mode 100644
index 000..6a1e99e2d0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=armv8.2-a" } */
+
+#pragma push_options
+#pragma GCC target ("+aes")
+
+#include "arm_neon.h"
+
+int foo (poly64_t a, poly64_t b)
+{
+  return vgetq_lane_s32 (vreinterpretq_s32_p128 (vmull_p64 (a, b)), 0);
+}
+
+/* { dg-final { scan-assembler "\tpmull\tv" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
index d7b4f89919d..1b4e10f78db 100644
--- a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -mcpu=cortex-a72+crypto -dp" } */
-/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } }*/
+/* { dg-options "-O3 -mcpu=cortex-a72+aes -dp" } */
+/* { dg-additional-options "-march=armv8-a+aes" { target { aarch64*-*-* } } }*/
 
 #include <arm_neon.h>
 

Re: [PATCH][pushed] contrib: add 'contrib' to default dirs in update-copyright.py

2023-01-15 Thread Andrew Pinski via Gcc-patches
On Thu, Jan 5, 2023 at 11:49 PM Martin Liška  wrote:
>
> Hi.
>
> I forgot to include the contrib folder in the default dirs, so the copyright
> notices in that folder haven't been updated by Jakub.
>
> However, I noticed that when I run ./contrib/update-copyright.py --this-year
> I get many more modifications outside the contrib folder:
>
> $ git diff --stat
> ...
>  951 files changed, 976 insertions(+), 979 deletions(-)
>
> where the files that were not updated are in:
> gcc/analyzer/*
> gcc/common/config/*
> gcc/m2/*
> gcc/objc/*

libstdc++-v3 was not updated either. I wonder how many more were missed too ...

Thanks,
Andrew Pinski

>
> Jakub, can you please re-run the script?
>
> Cheers,
> Martin
>
> contrib/ChangeLog:
>
> * update-copyright.py: Add contrib as a default dir.
> ---
>  contrib/update-copyright.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
> index fcae846fb1e..06e6fb61757 100755
> --- a/contrib/update-copyright.py
> +++ b/contrib/update-copyright.py
> @@ -785,6 +785,7 @@ class GCCCmdLine (CmdLine):
>
>  self.default_dirs = [
>  'c++tools',
> +'contrib',
>  'gcc',
>  'include',
>  'libada',
> --
> 2.39.0
>


[PATCH] xtensa: Eliminate the use of callee-saved register that saves and restores only once

2023-01-15 Thread Takayuki 'January June' Suwa via Gcc-patches
In the case of the CALL0 ABI, values that must be retained across
function calls are placed in the callee-saved registers (A12 through
A15) and referenced later.  However, it is often the case that such a
register is written only once and read only once, each time by a simple
register-to-register move.

For example, in the following code, assuming there are no other
occurrences of register A14:

;; before
; prologue {
  ...
s32i.n  a14, sp, 16
  ...
; } prologue
  ...
mov.n   a14, a6
  ...
call0   foo
  ...
mov.n   a8, a14
  ...
; epilogue {
  ...
l32i.n  a14, sp, 16
  ...
; } epilogue

It can be rewritten like this:

;; after
; prologue {
  ...
(deleted)
  ...
; } prologue
  ...
s32i.n  a6, sp, 16
  ...
call0   foo
  ...
l32i.n  a8, sp, 16
  ...
; epilogue {
  ...
(deleted)
  ...
; } epilogue

This patch introduces a new peephole2 pattern that implements the above.
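
The eligibility condition can be sketched on a toy instruction list as follows. This is an illustration under simplifying assumptions and not the RTL logic of the pattern below: `Kind`, `Insn` and `can_use_stack_slot` are hypothetical names, and exactly one prologue save is required here.

```cpp
#include <vector>

enum class Kind { PrologueSave, EpilogueRestore, Move, Other };

// Toy instruction: `dst` and `src` are register numbers, -1 when unused.
struct Insn {
  Kind kind;
  int dst;
  int src;
};

// True if `reg` is saved once in the prologue, written once by a
// register-register move, read once by a register-register move, and
// otherwise only restored in the epilogue.
bool can_use_stack_slot(const std::vector<Insn>& insns, int reg) {
  int saves = 0, moves_in = 0, moves_out = 0;
  for (const Insn& i : insns) {
    const bool defines = i.dst == reg;
    const bool uses = i.src == reg;
    if (!defines && !uses)
      continue;                                  // unrelated instruction
    if (i.kind == Kind::PrologueSave && uses)
      ++saves;                                   // save to the stack slot
    else if (i.kind == Kind::EpilogueRestore && defines)
      continue;                                  // reload from the stack slot
    else if (i.kind == Kind::Move && defines && !uses)
      ++moves_in;                                // e.g. mov.n a14, a6
    else if (i.kind == Kind::Move && uses && !defines)
      ++moves_out;                               // e.g. mov.n a8, a14
    else
      return false;                              // any other def or use
  }
  return saves == 1 && moves_in == 1 && moves_out == 1;
}
```

When the check succeeds, the two moves can become a store to and a load from the register's stack slot, and the prologue save and epilogue restore can be dropped, as in the "after" listing above.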

gcc/ChangeLog:

* config/xtensa/xtensa.md: New peephole2 pattern that eliminates
the use of a callee-saved register that is set and read only once,
each time by a move from or to another register, by using its stack
slot directly.
---
 gcc/config/xtensa/xtensa.md | 58 +
 1 file changed, 58 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 764da63f91c..249147688ac 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -3024,3 +3024,61 @@ FALLTHRU:;
   operands[1] = GEN_INT (imm0);
   operands[2] = GEN_INT (imm1);
 })
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+   (match_operand:SI 1 "reload_operand"))]
+  "!TARGET_WINDOWED_ABI && df
+   && epilogue_contains (insn)
+   && ! call_used_or_fixed_reg_p (REGNO (operands[0]))"
+  [(const_int 0)]
+{
+  rtx reg = operands[0], pattern;
+  rtx_insn *insnP = NULL, *insnS = NULL, *insnR = NULL;
+  df_ref ref;
+  rtx_insn *insn;
+  for (ref = DF_REG_DEF_CHAIN (REGNO (reg));
+   ref; ref = DF_REF_NEXT_REG (ref))
+if (DF_REF_CLASS (ref) != DF_REF_REGULAR)
+  continue;
+else if ((insn = DF_REF_INSN (ref)) == curr_insn)
+  continue;
+else if (GET_CODE (pattern = PATTERN (insn)) == SET
+&& rtx_equal_p (SET_DEST (pattern), reg)
+&& REG_P (SET_SRC (pattern)))
+  {
+   if (insnS)
+ FAIL;
+   insnS = insn;
+   continue;
+  }
+else
+  FAIL;
+  for (ref = DF_REG_USE_CHAIN (REGNO (reg));
+   ref; ref = DF_REF_NEXT_REG (ref))
+if (DF_REF_CLASS (ref) != DF_REF_REGULAR)
+  continue;
+else if (prologue_contains (insn = DF_REF_INSN (ref)))
+  {
+   insnP = insn;
+   continue;
+  }
+else if (GET_CODE (pattern = PATTERN (insn)) == SET
+&& rtx_equal_p (SET_SRC (pattern), reg)
+&& REG_P (SET_DEST (pattern)))
+  {
+   if (insnR)
+ FAIL;
+   insnR = insn;
+   continue;
+  }
+else
+  FAIL;
+  if (!insnP || !insnS || !insnR)
+FAIL;
+  SET_DEST (PATTERN (insnS)) = copy_rtx (operands[1]);
+  df_insn_rescan (insnS);
+  SET_SRC (PATTERN (insnR)) = copy_rtx (operands[1]);
+  df_insn_rescan (insnR);
+  set_insn_deleted (insnP);
+})
-- 
2.30.2


[committed] wwwdocs: gcc-3.4: Switch www.eclipse.org to https

2023-01-15 Thread Gerald Pfeifer


---
 htdocs/gcc-3.4/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-3.4/changes.html b/htdocs/gcc-3.4/changes.html
index aac9a245..d9985673 100644
--- a/htdocs/gcc-3.4/changes.html
+++ b/htdocs/gcc-3.4/changes.html
@@ -738,7 +738,7 @@ and not your code, that is broken.
   href="http://www.gnu.org/software/classpath/;>GNU Classpath.
 Class loading is now much more correct; in particular the
   caller's class loader is now used when that is required.
-http://www.eclipse.org/;>Eclipse 2.x will run
+https://www.eclipse.org;>Eclipse 2.x will run
   out of the box using gij.
 Parts of java.nio have been implemented.
   Direct and indirect buffers work, as do fundamental file and
-- 
2.38.1


[committed] wwwdocs: codingconventions: Adjust Intel BID library link

2023-01-15 Thread Gerald Pfeifer
Following a permanent redirect request from that server.

Gerald
---
 htdocs/codingconventions.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index 348f1e1d..5519d3f9 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -764,7 +764,7 @@ FSF website, or are autogenerated.  These files should not be changed
 without prior permission, if at all.
 
 libgcc/config/libbid: The master sources come from Intel BID library
-https://www.netlib.org/misc/intel/;>Intel BID library. 
+https://netlib.org/misc/intel/;>Intel BID library. 
 Bugs should be reported to
 mailto:marius.cor...@intel.com;>marius.cor...@intel.com
 and
-- 
2.38.1


[committed] wwwdocs: gcc-4.5: Convert www.open-std.org links to https

2023-01-15 Thread Gerald Pfeifer
There are more; this should be the biggest chunk left, though.

Gerald

---
 htdocs/gcc-4.5/changes.html  |   4 +-
 htdocs/gcc-4.5/cxx0x_status.html | 124 +++
 2 files changed, 64 insertions(+), 64 deletions(-)

diff --git a/htdocs/gcc-4.5/changes.html b/htdocs/gcc-4.5/changes.html
index 061dbce4..2e8f56a7 100644
--- a/htdocs/gcc-4.5/changes.html
+++ b/htdocs/gcc-4.5/changes.html
@@ -320,7 +320,7 @@
   template arguments, and in declarations of variables and functions
   with linkage, so long as any such declaration that is used is also
   defined (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#757;
+  href="https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#757;
   >DR 757).
 
 Labels may now have attributes, as has been permitted for a
@@ -331,7 +331,7 @@
 
 
   G++ now implements 
-  http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#176;>DR
+  https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#176;>DR
176.  Previously G++ did not support using the
   injected-class-name of a template base class as a type name, and
   lookup of the name found the declaration of the template in the
diff --git a/htdocs/gcc-4.5/cxx0x_status.html b/htdocs/gcc-4.5/cxx0x_status.html
index 006b3bea..f12071b3 100644
--- a/htdocs/gcc-4.5/cxx0x_status.html
+++ b/htdocs/gcc-4.5/cxx0x_status.html
@@ -18,7 +18,7 @@
 GCC's C++0x mode tracks the C++0x working paper drafts produced by
 the ISO C++ committee, available on the ISO C++ committee's web site
 at http://www.open-std.org/jtc1/sc22/wg21/;>http://www.open-std.org/jtc1/sc22/wg21/.
 Since
+href="https://www.open-std.org/jtc1/sc22/wg21/;>https://www.open-std.org/jtc1/sc22/wg21/.
 Since
 this standard is still being extended and modified, the feature set
 provided by the experimental C++0x mode may vary greatly from one GCC
 version to another. No attempts will be made to preserve backward
@@ -40,236 +40,236 @@ page.
 
 
   Rvalue references
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2118.html;>N2118
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2118.html;>N2118
Yes
 
 
   Rvalue references for *this
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2439.htm;>N2439
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2439.htm;>N2439
   No
 
 
   Initialization of class objects by rvalues
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1610.html;>N1610
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1610.html;>N1610
   Yes
 
 
   Non-static data member initializers
-  http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2756.htm;>N2756
+  https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2008/n2756.htm;>N2756
   No
 
 
   Variadic templates
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2242.pdf;>N2242
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2242.pdf;>N2242
Yes
 
 
   Extending variadic template template 
parameters
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2555.pdf;>N2555
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2555.pdf;>N2555
Yes
 
 
   Initializer lists
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2672.htm;>N2672
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2672.htm;>N2672
Yes
 
 
   Static assertions
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1720.html;>N1720
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1720.html;>N1720
Yes
 
 
   auto-typed variables
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1984.pdf;>N1984
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n1984.pdf;>N1984
Yes
 
 
   Multi-declarator auto
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1737.pdf;>N1737
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1737.pdf;>N1737
Yes
 
 
   Removal of auto as a storage-class 
specifier
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2546.htm;>N2546
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2546.htm;>N2546
Yes
 
 
   New function declarator syntax
-  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2541.htm;>N2541
+  https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2541.htm;>N2541
Yes
 
 
   New wording for C++0x lambdas
-  http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2927.pdf;>N2927
+  https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2009/n2927.pdf;>N2927
   Yes
 
 
   Declared type of an expression
-  

[committed] wwwdocs: faq: Move c-faq.com to https

2023-01-15 Thread Gerald Pfeifer
Pushed.

Gerald
---
 htdocs/faq.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/faq.html b/htdocs/faq.html
index b09e3920..203661dc 100644
--- a/htdocs/faq.html
+++ b/htdocs/faq.html
@@ -13,7 +13,7 @@
 
 This FAQ tries to answer specific questions concerning GCC. For
 general information regarding C, C++, and Fortran respectively, please check
-the http://c-faq.com/;>comp.lang.c FAQ and the
+the https://c-faq.com;>comp.lang.c FAQ and the
 https://isocpp.org/faq/;>C++ FAQ.
 
 
-- 
2.38.1


Re: [PATCH] testsuite: Skip intrinsics test if arm

2023-01-15 Thread Torbjorn SVENSSON via Gcc-patches



On 2023-01-12 16:03, Richard Earnshaw wrote:



On 19/09/2022 17:16, Torbjörn SVENSSON via Gcc-patches wrote:

In the test case, it's clearly written that the intrinsics are not
implemented on arm*. A simple xfail does not help since there are
link errors, and that would cause an UNRESOLVED testcase rather than
XFAIL.
By changing to dg-skip-if, the entire test case is omitted.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Replace
dg-xfail-if with dg-skip-if.


Sorry for the delay reviewing this, I missed it at the time.

My problem with your suggested solution is that if these intrinsics are 
ever added this test will not automatically pick this up as it will have 
been disabled.  I presume from the comment (and the body of the test 
that contains an #ifdef for aarch64) that this is expected to be a 
temporary issue rather than something permanent.


So IMO I think it is correct to leave this as unresolved because the 
test cannot be built due to an issue with the compiler.


This patch has already been merged after Kyrill reviewed it back in 
September.


Without this change, the log would be filled with warnings about missing 
types. Maybe we could add some check that will enable the test only if 
the types are known?

Would that mitigate your concern?

Attached is the log from vld1x2.c on Cortex-A7 with -mfloat-abi=hard 
-mfpu=neon.


When I look at the result of a run, I only look at the test cases that 
are either FAIL (obviously), XPASS and UNRESOLVED. All other test cases 
are in a "good" state from what I can tell. If there are a lot of test 
cases in the UNRESOLVED state, that are not yet implemented year after 
year, it makes it harder to identify those test cases that are of 
interest. Right or wrong, that's why I suggested removing it from the 
list of test cases that should be working.


Let me know what you think.

Kind regards,
Torbjörn



R.



Co-Authored-By: Yvan ROUX  
Signed-off-by: Torbjörn SVENSSON  
---
  gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c

index 92a139bc523..f933102be47 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c
@@ -1,6 +1,6 @@
  /* We haven't implemented these intrinsics for arm yet.  */
-/* { dg-xfail-if "" { arm*-*-* } } */
  /* { dg-do run } */
+/* { dg-skip-if "unsupported" { arm*-*-* } } */
  /* { dg-options "-O3" } */
  #include

Testing advsimd-intrinsics/vld1x2.c,   -O1
doing compile
Executing on host: /build/bin/arm-none-eabi-gcc  
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c  -mthumb 
-march=armv7ve -mcpu=cortex-a7 -mfloat-abi=hard -mfpu=neon   -dumpbase "" 
-fdiagnostics-plain-output-O1  -O3   -Wl,gcc_tg.o -lm -T 
/qemu/qemu-cortex-a7.ld -o ./vld1x2.exe(timeout = 800)
spawn -ignore SIGHUP /build/bin/arm-none-eabi-gcc 
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c -mthumb 
-march=armv7ve -mcpu=cortex-a7 -mfloat-abi=hard -mfpu=neon -dumpbase  
-fdiagnostics-plain-output -O1 -O3 -Wl,gcc_tg.o -lm -T /qemu/qemu-cortex-a7.ld 
-o ./vld1x2.exe
pid is 22433 -22433
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: In function 
'test_vld_u8_x2':
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:21:13: 
warning: implicit declaration of function 'vld1_u8_x2'; did you mean 
'vld1_u32'? [-Wimplicit-function-declaration]
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:32:1: note: 
in expansion of macro 'TESTMETH'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:62:27: note: 
in expansion of macro 'VARIANTS_1'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:66:1: note: 
in expansion of macro 'VARIANTS'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:21:13: error: 
incompatible types when assigning to type 'uint8x8x2_t' from type 'int'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:32:1: note: 
in expansion of macro 'TESTMETH'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:62:27: note: 
in expansion of macro 'VARIANTS_1'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:66:1: note: 
in expansion of macro 'VARIANTS'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: In function 
'test_vld_u16_x2':
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:21:13: 
warning: implicit declaration of function 'vld1_u16_x2'; did you mean 
'vld1_u16'? [-Wimplicit-function-declaration]
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:33:1: note: 
in expansion of macro 'TESTMETH'
/src/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vld1x2.c:62:27: note: 
in expansion of macro 'VARIANTS_1'

Re: [PATCH v3 2/2] aarch64: Fix bit-field alignment in param passing [PR105549]

2023-01-15 Thread Christophe Lyon via Gcc-patches

Hi!


On 1/13/23 16:38, Jakub Jelinek wrote:

On Wed, Jan 11, 2023 at 03:18:06PM +0100, Christophe Lyon via Gcc-patches wrote:

While working on enabling DFP for AArch64, I noticed new failures in
gcc.dg/compat/struct-layout-1.exp (t028) which were not actually
caused by DFP types handling. These tests are generated during 'make
check' and enabling DFP made generation different (not sure if new
non-DFP tests are generated, or if existing ones are generated
differently, the tests in question are huge and difficult to compare).

Anyway, I reduced the problem to what I attach at the end of the new
gcc.target/aarch64/aapcs64/va_arg-17.c test and rewrote it in the same
scheme as other va_arg* AArch64 tests.  Richard Sandiford further
reduced this to a non-vararg function, added as a second testcase.

This is a tough case mixing bit-fields and alignment, where
aarch64_function_arg_alignment did not follow what its descriptive
comment says: we want to use the natural alignment of the bit-field
type only if the user didn't reduce the alignment for the bit-field
itself.

The patch also adds a comment and assert that would help someone who
has to look at this area again.

The fix would be very small, except that this introduces a new ABI
break, and we have to warn about that.  Since this actually fixes a
problem introduced in GCC 9.1, we keep the old computation to detect
when we now behave differently.

This patch adds two new tests (va_arg-17.c and
pr105549.c). va_arg-17.c contains the reduced offending testcase from
struct-layout-1.exp for reference.  We update some tests introduced by
the previous patch, where parameters with bit-fields and packed
attribute now emit a different warning.


I'm seeing
+FAIL: g++.target/aarch64/bitfield-abi-warning-align16-O2.C 
scan-assembler-times and\\tw0, w1, 1 10
+FAIL: g++.target/aarch64/bitfield-abi-warning-align32-O2.C 
scan-assembler-times and\\tw0, w1, 1 10
+FAIL: g++.target/aarch64/bitfield-abi-warning-align8-O2.C scan-assembler-times 
and\\tw0, w0, 1 11
+FAIL: g++.target/aarch64/bitfield-abi-warning-align8-O2.C scan-assembler-times 
and\\tw0, w1, 1 18
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_128.c -march=armv8.2-a+sve (internal 
compiler error: in aarch64_layout_arg, at config/aarch64/aarch64.cc:7696)
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_128.c -march=armv8.2-a+sve (test for 
excess errors)
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_256.c -march=armv8.2-a+sve (internal 
compiler error: in aarch64_layout_arg, at config/aarch64/aarch64.cc:7696)
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_256.c -march=armv8.2-a+sve (test for 
excess errors)
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_512.c -march=armv8.2-a+sve (internal 
compiler error: in aarch64_layout_arg, at config/aarch64/aarch64.cc:7696)
+FAIL: gcc.target/aarch64/sve/pcs/struct_3_512.c -march=armv8.2-a+sve (test for 
excess errors)
regressions with this change.



Really deeply sorry for this :-(



aarch64.cc:7696 is for me the newly added:


+  gcc_assert (alignment <= 16 * BITS_PER_UNIT
+ && (!alignment || abi_break < alignment)
+ && (!abi_break_packed || alignment < abi_break_packed));


assert.
Details in
https://kojipkgs.fedoraproject.org//work/tasks/2857/96062857/build.log
(configure line etc.), plus if you
wget https://kojipkgs.fedoraproject.org//work/tasks/2857/96062857/build.log
sed -n '/^begin /,/^end/p' build.log | uuencode > you get a compressed tarball 
with the testsuite *.log files.


Thanks I managed to download this (you meant uudecode rather than 
uuencode ;-) )


I see the scan-assembler-times are also failing in gcc.target, I guess 
you just forgot to paste them?


From your other message, it seems you are building with stack-protector 
enabled by default, but I can't see that in the configure lines?


Indeed I just checked the generated code with/without 
-fstack-protector-all, and it obviously changes a lot, thus breaking the 
fragile scan-assembler directives. As you said, it's easy to avoid with 
-fno-stack-protector.


I'll check the problem with the assert.

Thanks and sorry,

Christophe



Jakub



Re: [PATCH] [PR107608] [range-ops] Avoid folding into INF when flag_trapping_math.

2023-01-15 Thread Aldy Hernandez via Gcc-patches




On 1/15/23 13:18, Jakub Jelinek wrote:

On Sun, Jan 15, 2023 at 11:32:27AM +0100, Aldy Hernandez wrote:

As discussed in the PR, for trapping math, do not fold overflowing
operations into +-INF as doing so could elide a trap.

There is a minor adjustment to known_isinf() where it was mistakenly
returning true for an [infinity U NAN], whereas it should only return
true when the range is exclusively +INF or -INF.  This is benign, as
there were no users of known_isinf up to now.

I had some testsuite issues with:


FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++14  scan-sarif-file 
"text": "  int u6587u5b57u5316u3051 =
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++17  scan-sarif-file 
"text": "  int u6587u5b57u5316u3051 =
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++20  scan-sarif-file 
"text": "  int u6587u5b57u5316u3051 =
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++98  scan-sarif-file 
"text": "  int u6587u5b57u5316u3051 =
FAIL: g++.dg/pr71488.C   (test for excess errors)
FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 p == 40
FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat   scan-sarif-file 
"text": "  int u6587u5b57u5316u3051 =

< FAIL: g++.dg/pr71488.C   (test for excess errors)
< FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 p == 40

FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
FAIL: go.test/test/fixedbugs/issue27836.dir/Äfoo.go  -O -I. (test for excess 
errors)
FAIL: go.test/test/fixedbugs/issue27836.dir/Ämain.go  -O -I. (test for excess 
errors)


But they seem to be transient issues on my machine, as re-running them
manually doesn't cause any issues.  Also, the tests themselves have
nothing to do with floats so I don't see how they could be related.

Tested on x86-64 Linux.

I also ran the glibc testsuite (git sources) on x86-64 and this patch
fixes:

-FAIL: math/test-double-lgamma
-FAIL: math/test-double-log1p
-FAIL: math/test-float-lgamma
-FAIL: math/test-float-log1p
-FAIL: math/test-float128-catan
-FAIL: math/test-float128-catanh
-FAIL: math/test-float128-lgamma
-FAIL: math/test-float128-log
-FAIL: math/test-float128-log1p
-FAIL: math/test-float128-y0
-FAIL: math/test-float128-y1
-FAIL: math/test-float32-lgamma
-FAIL: math/test-float32-log1p
-FAIL: math/test-float32x-lgamma
-FAIL: math/test-float32x-log1p
-FAIL: math/test-float64-lgamma
-FAIL: math/test-float64-log1p
-FAIL: math/test-float64x-lgamma
-FAIL: math/test-ldouble-lgamma

OK for trunk?

PR tree-optimization/107608

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::fold_range): Avoid
folding into INF when flag_trapping_math.
* value-range.h (frange::known_isinf): Return false for possible NANs.


As a workaround this looks ok to me, but we need to figure out something
better for GCC 14.


Agreed.

I think the underlying problem is that we have little or inconsistent 
support for propagating floats.  It's not a ranger issue, but all the 
levels above it (and even the gimplifier) which seem to do their own 
thing wrt when they propagate or not.


FWIW, I still think the issue is DCE and friends which are removing 
trapping statements, but I'm happy to entertain other solutions.


Aldy



[committed] libstdc++: Remove dg-xfail-run-if in std/time/tzdb_list/1.cc

2023-01-15 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and powerpc-aix. Pushed to trunk.

-- >8 --

Use the global override_used to tell whether the target supports the
override functionality that the test_reload and test_erase functions
rely on.

libstdc++-v3/ChangeLog:

* testsuite/std/time/tzdb_list/1.cc: Remove dg-xfail-run-if
and fail gracefully if defining the weak symbol doesn't work.
---
 libstdc++-v3/testsuite/std/time/tzdb_list/1.cc | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/time/tzdb_list/1.cc 
b/libstdc++-v3/testsuite/std/time/tzdb_list/1.cc
index 2b121ff219d..e52f346d266 100644
--- a/libstdc++-v3/testsuite/std/time/tzdb_list/1.cc
+++ b/libstdc++-v3/testsuite/std/time/tzdb_list/1.cc
@@ -2,13 +2,13 @@
 // { dg-do run { target c++20 } }
 // { dg-require-effective-target tzdb }
 // { dg-require-effective-target cxx11_abi }
-// { dg-xfail-run-if "no weak override on AIX" { powerpc-ibm-aix* } }
 
 #include 
 #include 
+#include 
 #include 
 
-static bool override_used = true;
+static bool override_used = false;
 
 namespace __gnu_cxx
 {
@@ -119,6 +119,12 @@ int main()
   std::ofstream("tzdata.zi") << tzdata_zi;
 
   test_access();
-  test_reload();
-  test_erase();
+
+  if (override_used)
+  {
+test_reload();
+test_erase();
+  }
+  else
+std::puts("__gnu_cxx::zoneinfo_dir_override() doesn't work on this 
target");
 }
-- 
2.39.0



Re: libstdc++: Fix deadlock in debug iterator increment [PR108288]

2023-01-15 Thread François Dumont via Gcc-patches

Committed with the idiomatic approach.

I'll work on this additional check later.

On 12/01/23 22:35, Jonathan Wakely wrote:

On Thu, 12 Jan 2023 at 18:25, François Dumont  wrote:

On 12/01/23 13:00, Jonathan Wakely wrote:

On Thu, 12 Jan 2023 at 05:52, François Dumont wrote:

Small update for an obvious compilation issue and to review the new test
case that could have led to an infinite loop if the increment issue was
not detected.

I also forgot to ask if there is more chance for the instantiation to be
elided when it is implemented like in the _Safe_local_iterator:
return { __cur, this->_M_sequence };

No, that doesn't make any difference.


than in the _Safe_iterator:
return _Safe_iterator(__cur, this->_M_sequence);

In the case where the user code does not use it?

Fully tested now, ok to commit?

François

On 11/01/23 07:03, François Dumont wrote:

Thanks for fixing this.

Here is the extension of the fix to all post-increment/decrement
operators we have on _GLIBCXX_DEBUG iterator.

Thanks, I completely forgot we have other partial specializations, I
just fixed the one that showed a deadlock in the user's example!


I prefer to restore somehow previous implementation to continue to
have _GLIBCXX_DEBUG post operators implemented in terms of normal post
operators.

Why?

Implementing post-increment as:

  auto tmp = *this;
  ++*this;
  return tmp;

is the idiomatic way to write it, and it works fine in this case. I
don't think it performs any more work than your version, does it?
Why not use the idiomatic form?

Is it just so that post-inc of a debug iterator uses post-inc of the
underlying iterator? Why does that matter?
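
A minimal sketch of the idiomatic form discussed above, using a hypothetical
Counter type rather than the real _Safe_iterator: post-increment copies the
object, delegates to pre-increment, and returns the copy, so any check placed
in pre-increment is reused rather than duplicated. The names Counter and
check_post_increment are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for an iterator-like type; not libstdc++ code.
struct Counter
{
  int value = 0;

  // Pre-increment holds the real logic (in a debug iterator this is
  // where the "is it incrementable?" check would live).
  Counter& operator++()
  {
    ++value;
    return *this;
  }

  // Idiomatic post-increment: copy, pre-increment, return the copy.
  Counter operator++(int)
  {
    Counter tmp = *this;
    ++*this;
    return tmp;
  }
};

// Small self-check: c++ yields the old value and advances c.
inline void check_post_increment()
{
  Counter c;
  Counter old = c++;
  assert(old.value == 0);
  assert(c.value == 1);
}
```

In this form the copy is made before the increment is validated, which is
the ordering concern raised in the reply that follows.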


A little yes, but that's a minor reason that is just making me happy.

Main reason is that this form could produce a __msg_init_copy_singular
before the __msg_bad_inc.

Ah yes, I see. That's a shame. I find the idiomatic form much simpler
to read, and it will generate better code (because it just reuses
existing functions, instead of adding new ones).

We could do this though, right?

 _GLIBCXX_DEBUG_VERIFY(this->_M_incrementable(),
   _M_message(__msg_bad_inc)
   ._M_iterator(*this, "this"));
 _Safe_iterator __tmp = *this;
 ++*this;
 return __tmp;

That does the VERIFY check twice though.


And moreover I plan to propose a patch later to skip any check in the
call to _Safe_iterator(__cur, _M_sequence) as we already know that __cur
is ok here like anywhere else in the lib.

There will still be one in the constructor, normally elided unless
-fno-elide-constructors is used, but there is not much I can do about it.

Don't worry about it. Nobody should ever use -fno-elide-constructors
in any real cases (except maybe debugging some very strange corner
cases, and in that case the extra safe iterator checks are not going
to be their biggest problem).

The patch is OK for trunk then.





Re: [PATCH] libatomic: Use config/mingw/lock.c for --enable-threads=single

2023-01-15 Thread Jonathan Yong via Gcc-patches

On 1/14/23 20:39, Jonathan Wakely wrote:

OK for trunk?



Looks OK to me, thanks for the patch.




[committed] libstdc++: Remove unconditional -pthread from test options

2023-01-15 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This shouldn't be in the common options, it's already added for the
relevant targets using dg-additional-options.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/jthread/jthread.cc: Remove -pthread from
dg-options.
---
 libstdc++-v3/testsuite/30_threads/jthread/jthread.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc 
b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
index 82c01caf7ab..849a2781a69 100644
--- a/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
+++ b/libstdc++-v3/testsuite/30_threads/jthread/jthread.cc
@@ -15,7 +15,7 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
-// { dg-options "-std=gnu++2a -pthread" }
+// { dg-options "-std=gnu++2a" }
 // { dg-do run { target c++2a } }
 // { dg-add-options libatomic }
 // { dg-additional-options "-pthread" { target pthread } }
-- 
2.39.0



[committed] config-list.mk: Modernize FreeBSD targets towards version 13

2023-01-15 Thread Gerald Pfeifer
And here is a second set of changes to bring config-list.mk largely in 
line with the current situation on FreeBSD.

(It probably makes sense to switch to, or add, powerpc64le. I'll leave 
that to others closer to that.)

Gerald


2023-01-15  Gerald Pfeifer  

* config-list.mk: Update FreeBSD targets to version 13.
Add aarch64-freebsd13.
---
 contrib/config-list.mk | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/config-list.mk b/contrib/config-list.mk
index 05184eaa701..20b8f4a196f 100644
--- a/contrib/config-list.mk
+++ b/contrib/config-list.mk
@@ -29,7 +29,7 @@ GCC_SRC_DIR=../../gcc
 # > make.out 2>&1 &
 #
 
-LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
+LIST = aarch64-elf aarch64-freebsd13 aarch64-linux-gnu aarch64-rtems \
   alpha-linux-gnu alpha-netbsd alpha-openbsd \
   alpha64-dec-vms alpha-dec-vms \
   amdgcn-amdhsa \
@@ -48,7 +48,7 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   hppa64-hpux11.3 \
   hppa64-hpux11.0OPT-enable-sjlj-exceptions=yes \
   i686-pc-linux-gnu i686-apple-darwin i686-apple-darwin9 i686-apple-darwin10 \
-  i686-freebsd6 i686-kfreebsd-gnu \
+  i686-freebsd13 i686-kfreebsd-gnu \
   i686-netbsdelf9 \
   i686-openbsd i686-elf i686-kopensolaris-gnu i686-symbolics-gnu \
   i686-pc-msdosdjgpp i686-lynxos i686-nto-qnx \
@@ -76,7 +76,7 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   or1k-elf or1k-linux-uclibc or1k-linux-musl or1k-rtems \
   pdp11-aout \
   powerpc-darwin8 \
-  powerpc-darwin7 powerpc64-darwin powerpc-freebsd6 powerpc-netbsd \
+  powerpc-darwin7 powerpc64-darwin powerpc-freebsd13 powerpc-netbsd \
   powerpc-eabisimaltivec powerpc-eabisim ppc-elf \
   powerpc-eabialtivec powerpc-xilinx-eabi powerpc-eabi \
   powerpc-rtems \
@@ -98,7 +98,7 @@ LIST = aarch64-elf aarch64-linux-gnu aarch64-rtems \
   v850e1-elf v850e-elf v850-elf v850-rtems vax-linux-gnu \
   vax-netbsdelf visium-elf x86_64-apple-darwin \
   x86_64-pc-linux-gnuOPT-with-fpmath=avx \
-  x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \
+  x86_64-elfOPT-with-fpmath=sse x86_64-freebsd13 x86_64-netbsd \
   x86_64-w64-mingw32 \
   x86_64-mingw32OPT-enable-sjlj-exceptions=yes x86_64-rtems \
   xstormy16-elf xtensa-elf \
-- 
2.38.1


[committed] libstdc++: Fix narrowing conversion in std/time/clock/utc/io.cc

2023-01-15 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, and verified with -fsigned-char -fshort-wchar
(which makes the underlying type of wchar_t be unsigned short).

Nightstrike also tested on mingw-w64.

Pushed to trunk.

-- >8 --

For a port with signed char and unsigned wchar_t initializing a wchar_t
array with a char is a narrowing conversion. The code is wrong for
assuming that (int)'a' == (int)L'a' anyway, so fix it properly by using
ctype<wchar_t>::widen(char).
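
As a rough illustration of the widening approach described above (not the
testsuite code itself), the sketch below builds a wide format string from a
narrow conversion specifier. The helper name make_wide_spec and the default
use of the classic "C" locale are assumptions for the example.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: build the wide string "{:%c}" for a narrow
// conversion specifier character c.
inline std::wstring
make_wide_spec(char c, const std::locale& loc = std::locale::classic())
{
  // ctype<wchar_t>::widen maps a narrow char to its wide equivalent
  // without assuming (int)'a' == (int)L'a', and it avoids the
  // char -> wchar_t narrowing conversion inside a braced initializer.
  const auto& ct = std::use_facet<std::ctype<wchar_t>>(loc);
  return std::wstring{ L'{', L':', L'%', ct.widen(c), L'}' };
}

// Small self-check using the classic locale.
inline void check_make_wide_spec()
{
  assert(make_wide_spec('Y') == L"{:%Y}");
}
```

Writing the narrow char directly into a wchar_t initializer list is what
triggered the narrowing diagnostic on ports with signed char and unsigned
wchar_t.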

libstdc++-v3/ChangeLog:

* testsuite/std/time/clock/utc/io.cc: Use ctype to widen char.
---
 libstdc++-v3/testsuite/std/time/clock/utc/io.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/std/time/clock/utc/io.cc 
b/libstdc++-v3/testsuite/std/time/clock/utc/io.cc
index b327c7f50c7..933cba65f44 100644
--- a/libstdc++-v3/testsuite/std/time/clock/utc/io.cc
+++ b/libstdc++-v3/testsuite/std/time/clock/utc/io.cc
@@ -46,6 +46,7 @@ test_format()
 
   std::ostringstream ss;
   std::wostringstream wss;
+  const auto& ct = std::use_facet<std::ctype<wchar_t>>(wss.getloc());
 
   for (char c : specs)
   {
@@ -68,7 +69,7 @@ test_format()
"required by the chrono-specs") != s.npos);
 }
 
-wchar_t wfmt[] = { L'{', L':', L'%', c, L'}' };
+wchar_t wfmt[] = { L'{', L':', L'%', ct.widen(c), L'}' };
 try
 {
   wss << std::vformat(std::wstring_view(wfmt, 5),
-- 
2.39.0



Re: [PATCH] [PR107608] [range-ops] Avoid folding into INF when flag_trapping_math.

2023-01-15 Thread Jakub Jelinek via Gcc-patches
On Sun, Jan 15, 2023 at 11:32:27AM +0100, Aldy Hernandez wrote:
> As discussed in the PR, for trapping math, do not fold overflowing
> operations into +-INF as doing so could elide a trap.
> 
> There is a minor adjustment to known_isinf() where it was mistakenly
> returning true for an [infinity U NAN], whereas it should only return
> true when the range is exclusively +INF or -INF.  This is benign, as
> there were no users of known_isinf up to now.
> 
> I had some testsuite issues with:
> 
> > FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++14  
> > scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> > FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++17  
> > scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> > FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++20  
> > scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> > FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++98  
> > scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> > FAIL: g++.dg/pr71488.C   (test for excess errors)
> > FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
> > -flto-partition=none  line 23 p == 40
> > FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat   
> > scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> < FAIL: g++.dg/pr71488.C   (test for excess errors)
> < FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  line 23 p == 40
> > FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
> > FAIL: go.test/test/fixedbugs/issue27836.dir/Äfoo.go  -O -I. (test for 
> > excess errors)
> > FAIL: go.test/test/fixedbugs/issue27836.dir/Ämain.go  -O -I. (test for 
> > excess errors)
> 
> But they seem to be transient issues on my machine, as re-running them
> manually doesn't cause any issues.  Also, the tests themselves have
> nothing to do with floats so I don't see how they could be related.
> 
> Tested on x86-64 Linux.
> 
> I also ran the glibc testsuite (git sources) on x86-64 and this patch
> fixes:
> 
> -FAIL: math/test-double-lgamma
> -FAIL: math/test-double-log1p
> -FAIL: math/test-float-lgamma
> -FAIL: math/test-float-log1p
> -FAIL: math/test-float128-catan
> -FAIL: math/test-float128-catanh
> -FAIL: math/test-float128-lgamma
> -FAIL: math/test-float128-log
> -FAIL: math/test-float128-log1p
> -FAIL: math/test-float128-y0
> -FAIL: math/test-float128-y1
> -FAIL: math/test-float32-lgamma
> -FAIL: math/test-float32-log1p
> -FAIL: math/test-float32x-lgamma
> -FAIL: math/test-float32x-log1p
> -FAIL: math/test-float64-lgamma
> -FAIL: math/test-float64-log1p
> -FAIL: math/test-float64x-lgamma
> -FAIL: math/test-ldouble-lgamma
> 
> OK for trunk?
> 
>   PR tree-optimization/107608
> 
> gcc/ChangeLog:
> 
>   * range-op-float.cc (range_operator_float::fold_range): Avoid
>   folding into INF when flag_trapping_math.
>   * value-range.h (frange::known_isinf): Return false for possible NANs.

As a workaround this looks ok to me, but we need to figure out something
better for GCC 14.

Ok for trunk.

Jakub



[PATCH] [PR107608] [range-ops] Avoid folding into INF when flag_trapping_math.

2023-01-15 Thread Aldy Hernandez via Gcc-patches
As discussed in the PR, for trapping math, do not fold overflowing
operations into +-INF as doing so could elide a trap.
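
To make the concern concrete, here is a small user-level sketch (not GCC
internals) showing that an overflowing multiplication has an observable side
effect at run time, namely the IEEE overflow flag. If a compiler folded the
operation into a constant +INF, that effect (or a trap, where traps are
enabled) would disappear. The helper name multiply_raises_overflow is made up
for the example, and it assumes a target that implements FE_OVERFLOW and a
compiler that performs the volatile-forced multiply at run time.

```cpp
#include <cassert>
#include <cfenv>
#include <limits>

// Hypothetical helper: returns true if multiplying x by y at run time
// raises the IEEE overflow exception flag.
inline bool multiply_raises_overflow(double x, double y)
{
  // volatile keeps the compiler from evaluating the product at
  // compile time, so the flag is set (or not) by the hardware.
  volatile double a = x;
  volatile double b = y;
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double r = a * b;
  (void) r;
  return std::fetestexcept(FE_OVERFLOW) != 0;
}

// Small self-check: DBL_MAX * 2 overflows, 1 * 2 does not.
inline void check_overflow_flag()
{
  assert(multiply_raises_overflow(std::numeric_limits<double>::max(), 2.0));
  assert(!multiply_raises_overflow(1.0, 2.0));
}
```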

There is a minor adjustment to known_isinf() where it was mistakenly
returning true for an [infinity U NAN], whereas it should only return
true when the range is exclusively +INF or -INF.  This is benign, as
there were no users of known_isinf up to now.
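
The intended known_isinf() semantics can be sketched with a toy range type.
This is only a model with made-up names (ToyRange, lo, hi, maybe_nan), not
GCC's frange class: the predicate holds only when both bounds are the same
infinity and NaN is excluded.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy model of a floating-point range; NOT GCC's frange.
struct ToyRange
{
  double lo;
  double hi;
  bool maybe_nan;

  // True only when the range is exactly one infinity and cannot be NaN.
  bool known_isinf() const
  {
    return !maybe_nan && lo == hi && std::isinf(lo);
  }
};

// Small self-check of the three interesting cases.
inline void check_known_isinf()
{
  const double inf = std::numeric_limits<double>::infinity();
  assert((ToyRange{inf, inf, false}.known_isinf()));   // exactly +INF
  assert(!(ToyRange{inf, inf, true}.known_isinf()));   // [+INF U NAN]
  assert(!(ToyRange{1.0, inf, false}.known_isinf()));  // not a singleton
}
```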

I had some testsuite issues with:

> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++14  
> scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++17  
> scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++20  
> scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++98  
> scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: g++.dg/pr71488.C   (test for excess errors)
> FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
> -flto-partition=none  line 23 p == 40
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat   
> scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
< FAIL: g++.dg/pr71488.C   (test for excess errors)
< FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  line 23 p == 40
> FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
> FAIL: go.test/test/fixedbugs/issue27836.dir/Äfoo.go  -O -I. (test for excess 
> errors)
> FAIL: go.test/test/fixedbugs/issue27836.dir/Ämain.go  -O -I. (test for 
> excess errors)

But they seem to be transient issues on my machine, as re-running them
manually doesn't cause any issues.  Also, the tests themselves have
nothing to do with floats so I don't see how they could be related.

Tested on x86-64 Linux.

I also ran the glibc testsuite (git sources) on x86-64 and this patch
fixes:

-FAIL: math/test-double-lgamma
-FAIL: math/test-double-log1p
-FAIL: math/test-float-lgamma
-FAIL: math/test-float-log1p
-FAIL: math/test-float128-catan
-FAIL: math/test-float128-catanh
-FAIL: math/test-float128-lgamma
-FAIL: math/test-float128-log
-FAIL: math/test-float128-log1p
-FAIL: math/test-float128-y0
-FAIL: math/test-float128-y1
-FAIL: math/test-float32-lgamma
-FAIL: math/test-float32-log1p
-FAIL: math/test-float32x-lgamma
-FAIL: math/test-float32x-log1p
-FAIL: math/test-float64-lgamma
-FAIL: math/test-float64-log1p
-FAIL: math/test-float64x-lgamma
-FAIL: math/test-ldouble-lgamma

OK for trunk?

PR tree-optimization/107608

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::fold_range): Avoid
folding into INF when flag_trapping_math.
* value-range.h (frange::known_isinf): Return false for possible NANs.
---
 gcc/range-op-float.cc | 21 +
 gcc/value-range.h |  1 +
 2 files changed, 22 insertions(+)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 986a3896a4f..74ac4658378 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -91,6 +91,27 @@ range_operator_float::fold_range (frange &r, tree type,
   else
 r.clear_nan ();
 
+  // If the result has overflowed and flag_trapping_math, folding this
+  // operation could elide an overflow or division by zero exception.
+  // Avoid returning a singleton +-INF, to keep the propagators (DOM
+  // and substitute_and_fold_engine) from folding.  See PR107608.
+  if (flag_trapping_math
+  && MODE_HAS_INFINITIES (TYPE_MODE (type))
+  && r.known_isinf () && !op1.known_isinf () && !op2.known_isinf ())
+{
+  REAL_VALUE_TYPE inf = r.lower_bound ();
+  if (real_isneg (&inf))
+   {
+ REAL_VALUE_TYPE min = real_min_representable (type);
+ r.set (type, inf, min);
+   }
+  else
+   {
+ REAL_VALUE_TYPE max = real_max_representable (type);
+ r.set (type, max, inf);
+   }
+}
+
   return true;
 }
 
diff --git a/gcc/value-range.h b/gcc/value-range.h
index ea50ed3e64a..f4ac73b499f 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -1300,6 +1300,7 @@ inline bool
 frange::known_isinf () const
 {
   return (m_kind == VR_RANGE
+ && !maybe_isnan ()
  && real_identical (&m_min, &m_max)
  && real_isinf (&m_min));
 }
-- 
2.39.0



[PATCH] [PR107608] [range-ops] Avoid folding into INF when flag_trapping_math.

2023-01-15 Thread Aldy Hernandez via Gcc-patches
As discussed in the PR, for trapping math, do not fold overflowing
operations into +-INF as doing so could elide a trap.
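
To make the concern concrete, here is a minimal standalone sketch (not part
of the patch) of what is lost if the compiler folds an overflowing
operation to +INF at compile time: the overflow exception is never raised
at run time. The helper name multiply_raises_overflow is hypothetical; the
volatile operands are only there to keep the compiler from folding the
product itself.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical helper, for illustration only: multiply A and B at run
// time, store the product in *RESULT, and report whether the IEEE
// overflow flag was raised by the multiplication.
bool
multiply_raises_overflow (double a, double b, double *result)
{
  // volatile forces the operation to happen at run time, so the
  // floating-point environment actually observes it.
  volatile double x = a;
  volatile double y = b;

  std::feclearexcept (FE_ALL_EXCEPT);
  volatile double prod = x * y;
  *result = prod;
  return std::fetestexcept (FE_OVERFLOW) != 0;
}
```

With DBL_MAX * 2.0 the product is +INF and FE_OVERFLOW is set; had the
multiplication been replaced by a constant +INF, the value would be the
same but the flag (or trap, if enabled) would be gone.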

There is a minor adjustment to known_isinf() where it was mistakenly
returning true for an [infinity U NAN], whereas it should only return
true when the range is exclusively +INF or -INF.  This is benign, as
there were no users of known_isinf up to now.
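
As a rough illustration of both changes, here is a toy model written for
this note. ToyRange and avoid_inf_singleton are hypothetical names, plain
doubles stand in for REAL_VALUE_TYPE, and none of this is GCC's frange
API. It only mirrors the logic described above: known_isinf is false when
a NAN is possible, and a singleton infinity produced from non-infinite
operands is widened so that it is no longer a constant.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy stand-in for a floating-point range.  NOT GCC's frange.
struct ToyRange
{
  double lo;
  double hi;
  bool maybe_nan;

  // True only when the range is exactly +INF or exactly -INF and
  // cannot also be a NAN.
  bool known_isinf () const
  {
    return !maybe_nan && lo == hi && std::isinf (lo);
  }
};

// If R is a singleton infinity computed from operands that were not
// themselves known infinities, widen it to [-INF, -MAX] or
// [+MAX, +INF] so a propagator cannot fold the operation away.
ToyRange
avoid_inf_singleton (ToyRange r, const ToyRange &op1, const ToyRange &op2,
                     bool trapping_math)
{
  if (trapping_math
      && r.known_isinf () && !op1.known_isinf () && !op2.known_isinf ())
    {
      const double max = std::numeric_limits<double>::max ();
      if (std::signbit (r.lo))
        r.hi = -max;    // [-INF, -MAX]
      else
        r.lo = max;     // [+MAX, +INF]
    }
  return r;
}
```

The widened range still contains the infinity, so it remains a correct
over-approximation; it just stops being a singleton.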

I had some testsuite issues with:

> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++14  scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++17  scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++20  scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -std=gnu++98  scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
> FAIL: g++.dg/pr71488.C   (test for excess errors)
> FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 23 p == 40
> FAIL: c-c++-common/diagnostic-format-sarif-file-4.c  -Wc++-compat   scan-sarif-file "text": "  int u6587u5b57u5316u3051 =
< FAIL: g++.dg/pr71488.C   (test for excess errors)
< FAIL: g++.dg/guality/pr55665.C   -O2 -flto -fno-use-linker-plugin -flto-partition=none  line 23 p == 40
> FAIL: ./index0-out.go execution,  -O0 -g -fno-var-tracking-assignments
> FAIL: go.test/test/fixedbugs/issue27836.dir/Äfoo.go  -O -I. (test for excess errors)
> FAIL: go.test/test/fixedbugs/issue27836.dir/Ämain.go  -O -I. (test for excess errors)

But they seem to be transient issues on my machine, as re-running them
manually doesn't cause any failures.  Also, the tests themselves have
nothing to do with floats, so I don't see how they could be related.

Tested on x86-64 Linux.

I also ran the glibc testsuite (git sources) on x86-64 and this patch
fixes:

-FAIL: math/test-double-lgamma
-FAIL: math/test-double-log1p
-FAIL: math/test-float-lgamma
-FAIL: math/test-float-log1p
-FAIL: math/test-float128-catan
-FAIL: math/test-float128-catanh
-FAIL: math/test-float128-lgamma
-FAIL: math/test-float128-log
-FAIL: math/test-float128-log1p
-FAIL: math/test-float128-y0
-FAIL: math/test-float128-y1
-FAIL: math/test-float32-lgamma
-FAIL: math/test-float32-log1p
-FAIL: math/test-float32x-lgamma
-FAIL: math/test-float32x-log1p
-FAIL: math/test-float64-lgamma
-FAIL: math/test-float64-log1p
-FAIL: math/test-float64x-lgamma
-FAIL: math/test-ldouble-lgamma

OK for trunk?

PR tree-optimization/107608

gcc/ChangeLog:

* range-op-float.cc (range_operator_float::fold_range): Avoid
folding into INF when flag_trapping_math.
* value-range.h (frange::known_isinf): Return false for possible NANs.
---
 gcc/range-op-float.cc | 21 +++++++++++++++++++++
 gcc/value-range.h     |  1 +
 2 files changed, 22 insertions(+)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 986a3896a4f..74ac4658378 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -91,6 +91,27 @@ range_operator_float::fold_range (frange &r, tree type,
   else
     r.clear_nan ();
 
+  // If the result has overflowed and flag_trapping_math, folding this
+  // operation could elide an overflow or division by zero exception.
+  // Avoid returning a singleton +-INF, to keep the propagators (DOM
+  // and substitute_and_fold_engine) from folding.  See PR107608.
+  if (flag_trapping_math
+      && MODE_HAS_INFINITIES (TYPE_MODE (type))
+      && r.known_isinf () && !op1.known_isinf () && !op2.known_isinf ())
+    {
+      REAL_VALUE_TYPE inf = r.lower_bound ();
+      if (real_isneg (&inf))
+        {
+          REAL_VALUE_TYPE min = real_min_representable (type);
+          r.set (type, inf, min);
+        }
+      else
+        {
+          REAL_VALUE_TYPE max = real_max_representable (type);
+          r.set (type, max, inf);
+        }
+    }
+
   return true;
 }
 
diff --git a/gcc/value-range.h b/gcc/value-range.h
index ea50ed3e64a..f4ac73b499f 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -1300,6 +1300,7 @@ inline bool
 frange::known_isinf () const
 {
   return (m_kind == VR_RANGE
+          && !maybe_isnan ()
           && real_identical (&m_min, &m_max)
           && real_isinf (&m_min));
 }
-- 
2.39.0