date:20220921

Re: Proxy ping [PATCH] Fortran: Fix automatic reallocation inside select rank [PR100103]

2022-09-21 Thread Thomas Koenig via Gcc-patches


Hello Harald,


> the patch for this PR was submitted for review by Jose here:
>
>    https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html
>
> but unfortunately was never reviewed.
>
> I verified that it works on mainline and x86_64-pc-linux-gnu,
> and I think that it is fine.
>
> Although the above mail suggests that there is a dependency
> on the fix for another PR with a rather lengthy patch,
> it appears that this is no longer the case.  It might be
> that the fix for PR100245 (another reallocation issue)
> already did the necessary job.
>
> So OK for mainline?


Looks good to me. Thanks for picking up these patches!

Best regards

Thomas

[PATCH v2] Re: OpenMP: Generate SIMD clones for functions with "declare target"

2022-09-21 Thread Sandra Loosemore


On 9/14/22 12:12, Jakub Jelinek wrote:


> If it is pure optimization thing and purely keyed on the definition,
> all the simd clones should be local to the TU, never exported from it.


OK, here is a revised patch that addresses that.  The x86_64 target also
generates a different set of clones for functions with internal linkage
than for those with external linkage, so I hacked that to treat these
implicit clones in the same way as other internal clones.


There is an existing problem with internal "declare simd" clones: nothing
ever DCEs clones that end up not being useful, and nothing scans the code
in the compilation unit before clone generation to avoid generating
useless clones in the first place.  I haven't tried to solve that problem,
but I did attempt to mitigate it for these implicit "declare target"
clones by tagging the option OPT_LEVELS_2_PLUS_SPEED_ONLY (instead of
enabling it by default all the time), so the clones are not generated by
default at -Os and -Og.  I added a couple of new test cases to check this.


On 9/14/22 15:45, Thomas Schwinge wrote:

> However, OpenACC and OpenMP support may be active at the same time...
>
> > +  if (attr == NULL_TREE
> > +  && flag_openmp_target_simd_clone && !flag_openacc)
>
> ..., so '!flag_openacc' is not the right check here.  Instead you'd do
> '!oacc_get_fn_attrib (DECL_ATTRIBUTES (node->decl))' (untested) or
> similar.


This is fixed now too.

OK to check in?

-Sandra

From dfdb9a2162978b964863f351c814211dca8e9a3f Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Thu, 22 Sep 2022 02:16:42 +
Subject: [PATCH] OpenMP: Generate SIMD clones for functions with "declare
 target"

This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution.  The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled at -O2 and higher.

gcc/ChangeLog:

	* common.opt (fopenmp-target-simd-clone): New option.
	* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
	* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
	* omp-simd-clone.cc (auto_simd_check_stmt): New function.
	(mark_auto_simd_clone): New function.
	(simd_clone_create): Add force_local argument, make the symbol
	have internal linkage if it is true.
	(expand_simd_clones): Also check for cloneable functions with
	"omp declare target".  Pass explicit_p argument to
	simd_clone.compute_vecsize_and_simdlen target hook.
	* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
	Add bool explicit_p argument.
	* doc/tm.texi: Regenerated.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/gcn/gcn.cc
	(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
	* config/i386/i386.cc
	(ix86_simd_clone_compute_vecsize_and_simdlen): Update.

gcc/testsuite/ChangeLog:

	* gcc.dg/gomp/target-simd-clone-1.c: New.
	* gcc.dg/gomp/target-simd-clone-2.c: New.
	* gcc.dg/gomp/target-simd-clone-3.c: New.
	* gcc.dg/gomp/target-simd-clone-4.c: New.
	* gcc.dg/gomp/target-simd-clone-5.c: New.
	* gcc.dg/gomp/target-simd-clone-6.c: New.
---
 gcc/common.opt|   4 +
 gcc/config/aarch64/aarch64.cc |  24 +-
 gcc/config/gcn/gcn.cc |  10 +-
 gcc/config/i386/i386.cc   |  27 +-
 gcc/doc/invoke.texi   |  12 +-
 gcc/doc/tm.texi   |   2 +-
 gcc/omp-simd-clone.cc | 237 --
 gcc/opts.cc   |   1 +
 gcc/target.def|   2 +-
 .../gcc.dg/gomp/target-simd-clone-1.c |  18 ++
 .../gcc.dg/gomp/target-simd-clone-2.c |  18 ++
 .../gcc.dg/gomp/target-simd-clone-3.c |  17 ++
 .../gcc.dg/gomp/target-simd-clone-4.c |  16 ++
 .../gcc.dg/gomp/target-simd-clone-5.c |  13 +
 .../gcc.dg/gomp/target-simd-clone-6.c |  13 +
 15 files changed, 362 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-4.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-5.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/target-simd-clone-6.c

diff --git a/gcc/common.opt b/gcc/common.opt
index fba90ff6dcb..c735c62a8d4 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2217,6 +2217,10 @@ fomit-frame-pointer
 Common Var(flag_omit_frame_pointer) Optimization
 When possible do not generate stack frames.
 
+fopenmp-target-simd-clone
+Common Var(flag_openmp_target_simd_clone) Optimization
+Generate SIMD clones for

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-21 Thread Kewen.Lin via Gcc-patches

on 2022/9/22 05:56, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
>> of smin/max. So the builtins always generate xs[min/max]dp on all
>> platforms.
> 
> But how does this not blow up with -ffast-math?

Indeed.  Since the pattern is guarded with
"TARGET_VSX && !flag_finite_math_only", the bifs seem to cause an ICE
at -ffast-math.

Haochen, could you double check it?

> 
> In the other direction I am worried that the unspecs will degrade
> performance (relative to smin/smax) when -ffast-math *is* active (and
> this new builtin code and pattern doesn't blow up).

For fmin/fmax it would be fine, since they are transformed to
{MAX,MIN}_EXPR in the middle end.  And yes, it can degrade for the bifs,
although IMHO the previous expansion to smin/smax contradicts the bif
names (users expect them to map to xs{min,max}dp rather than anything else).
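
The difference between the two families comes down to NaN handling.  Below is a minimal C sketch of the source-level semantics, assuming the usual descriptions: fmin returns the non-NaN operand when exactly one operand is a NaN, while an ordering-only minimum makes no such promise.  The helper names are made up for illustration and say nothing about how the rs6000 patterns are written.

```c
/* Ordering-only minimum: the result when an operand is a NaN simply
   falls out of the comparison and depends on operand order.  */
double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

/* Minimum with fmin-style NaN handling: if exactly one operand is a
   NaN, return the other one.  */
double
nan_aware_min (double a, double b)
{
  if (a != a)   /* a is a NaN */
    return b;
  if (b != b)   /* b is a NaN */
    return a;
  return a < b ? a : b;
}
```

The two functions agree whenever no NaN is involved, which is why the choice only matters depending on whether the compiler may assume finite math.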

> 
> I still think we should get RTL codes for this, to have access to proper
> floating point min/max semantics always and everywhere.  "fmin" and
> "fmax" seem to be good names :-)

It would be good, especially if we have observed some uses of these bifs
and further opportunities around them.  :)

BR,
Kewen

[PATCH] rs6000: Fix the condition with frame_pointer_needed_indeed [PR96072]

2022-09-21 Thread Kewen.Lin via Gcc-patches

Hi,

As PR96072 shows, the code adding the REG_CFA_DEF_CFA reg note
assumes that an insn restoring the frame pointer has already
been emitted.  That restoring code used to be guarded with the
flag frame_pointer_needed, which was consistent, but since
commit r10-7981 it has been guarded with the flag
frame_pointer_needed_indeed instead.  The mismatch caused an
ICE due to an unexpected NULL insn.  This patch makes the
two conditions consistent.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and
powerpc64le-linux-gnu P9 and P10.

Is it ok for trunk?

BR,
Kewen
-
PR target/96072

gcc/ChangeLog:

* config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): Update the
condition for adding REG_CFA_DEF_CFA reg note with
frame_pointer_needed_indeed.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr96072.c: New test.
---
 gcc/config/rs6000/rs6000-logue.cc  |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr96072.c | 14 ++
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96072.c

diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
index 51f55d1d527..41daf6ee646 100644
--- a/gcc/config/rs6000/rs6000-logue.cc
+++ b/gcc/config/rs6000/rs6000-logue.cc
@@ -4956,7 +4956,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
 a REG_CFA_DEF_CFA note, but that's OK;  A duplicate is
 discarded by dwarf2cfi.cc/dwarf2out.cc, and in any case would
 be harmless if emitted.  */
-  if (frame_pointer_needed)
+  if (frame_pointer_needed_indeed)
{
  insn = get_last_insn ();
  add_reg_note (insn, REG_CFA_DEF_CFA,
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96072.c b/gcc/testsuite/gcc.target/powerpc/pr96072.c
new file mode 100644
index 000..23d1cc74ffd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96072.c
@@ -0,0 +1,14 @@
+/* { dg-options "-O1" } */
+
+/* Verify there is no ICE on 32 bit environment.  */
+
+void
+he (int jn)
+{
+  {
+int bh[jn];
+if (jn != 0)
+  goto wa;
+  }
+wa:;
+}
--
2.27.0

[PATCH] rs6000: Fix condition of define_expand vec_shr_<mode> [PR100645]

2022-09-21 Thread Kewen.Lin via Gcc-patches

Hi,
 
PR100645 exposes a latent bug in define_expand vec_shr_<mode>: the
current condition TARGET_ALTIVEC is too loose.  The mode iterator
VEC_L contains several modes that are not always supported as
vector modes, so VECTOR_UNIT_ALTIVEC_OR_VSX_P should be used
instead, as in some other VEC_L usages.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push it in a week if there are no objections.

BR,
Kewen
-
PR target/100645

gcc/ChangeLog:

* config/rs6000/vector.md (vec_shr_<mode>): Replace condition
TARGET_ALTIVEC with VECTOR_UNIT_ALTIVEC_OR_VSX_P.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr100645.c: New test.
---
 gcc/config/rs6000/vector.md |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr100645.c | 13 +
 2 files changed, 14 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr100645.c

diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index a0d33d2f604..0171705803c 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1475,7 +1475,7 @@ (define_expand "vec_shr_<mode>"
   [(match_operand:VEC_L 0 "vlogical_operand")
(match_operand:VEC_L 1 "vlogical_operand")
(match_operand:QI 2 "reg_or_short_operand")]
-  "TARGET_ALTIVEC"
+  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
 {
   rtx bitshift = operands[2];
   rtx shift;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100645.c b/gcc/testsuite/gcc.target/powerpc/pr100645.c
new file mode 100644
index 000..e221287c0f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100645.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-mdejagnu-cpu=power6 -maltivec" } */
+
+/* It's to verify no ICE here.  */
+
+typedef long long v2di __attribute__ ((vector_size (16)));
+
+v2di
+foo_v2di_l (v2di x)
+{
+  return __builtin_shuffle ((v2di){0, 0}, x, (v2di){3, 0});
+}
+
--
2.27.0

Re: [PATCH] [x86] Fix typo in floorv2sf2, should be register_operand for op1, not vector_operand.

2022-09-21 Thread Hongtao Liu via Gcc-patches

On Thu, Sep 22, 2022 at 9:17 AM liuhongt  wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Verify 526.blend_r can be rebuilt with the fix.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/106994
> * config/i386/mmx.md (floorv2sf2): Fix typo, use
> register_operand instead of vector_operand for operands[1].
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106994.c: New test.
> ---
>  gcc/config/i386/mmx.md   |  2 +-
>  gcc/testsuite/gcc.target/i386/pr106994.c | 24 
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106994.c
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 222a041de58..c359e2dd6de 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1676,7 +1676,7 @@ (define_expand "lceilv2sfv2si2"
>  (define_expand "floorv2sf2"
>[(set (match_operand:V2SF 0 "register_operand")
> (unspec:V2SF
> - [(match_operand:V2SF 1 "vector_operand")
> + [(match_operand:V2SF 1 "register_operand")
>(match_dup 2)]
>   UNSPEC_ROUND))]
>"TARGET_SSE4_1 && !flag_trapping_math
> diff --git a/gcc/testsuite/gcc.target/i386/pr106994.c b/gcc/testsuite/gcc.target/i386/pr106994.c
> new file mode 100644
> index 000..0803311dc75
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106994.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=skylake -Ofast" } */
> +
> +typedef struct {
> +  float ymin, ymax;
> +} rctf;
> +
> +rctf view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked;
> +float view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +
> +void BLI_rctf_translate();
> +void glLoadIdentity();
> +
> +void
> +view2d_map_cur_using_maskUI_view2d_view_ortho() {
> +  BLI_rctf_translate(&view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked);
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin =
> +  __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin) -
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax =
> +  __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax) -
> +  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
> +  glLoadIdentity();
> +}
> --
> 2.27.0
>


-- 
BR,
Hongtao

[PATCH] [x86] Fix typo in floorv2sf2, should be register_operand for op1, not vector_operand.

2022-09-21 Thread liuhongt via Gcc-patches

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Verified that 526.blender_r can be rebuilt with the fix.

Ok for trunk?

gcc/ChangeLog:

PR target/106994
* config/i386/mmx.md (floorv2sf2): Fix typo, use
register_operand instead of vector_operand for operands[1].

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr106994.c: New test.
---
 gcc/config/i386/mmx.md   |  2 +-
 gcc/testsuite/gcc.target/i386/pr106994.c | 24 
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr106994.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 222a041de58..c359e2dd6de 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1676,7 +1676,7 @@ (define_expand "lceilv2sfv2si2"
 (define_expand "floorv2sf2"
   [(set (match_operand:V2SF 0 "register_operand")
(unspec:V2SF
- [(match_operand:V2SF 1 "vector_operand")
+ [(match_operand:V2SF 1 "register_operand")
   (match_dup 2)]
  UNSPEC_ROUND))]
   "TARGET_SSE4_1 && !flag_trapping_math
diff --git a/gcc/testsuite/gcc.target/i386/pr106994.c b/gcc/testsuite/gcc.target/i386/pr106994.c
new file mode 100644
index 000..0803311dc75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106994.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-march=skylake -Ofast" } */
+
+typedef struct {
+  float ymin, ymax;
+} rctf;
+
+rctf view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked;
+float view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
+
+void BLI_rctf_translate();
+void glLoadIdentity();
+
+void
+view2d_map_cur_using_maskUI_view2d_view_ortho() {
+  BLI_rctf_translate(&view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked);
+  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin =
+  __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymin) -
+  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
+  view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax =
+  __builtin_floor(view2d_map_cur_using_maskUI_view2d_view_ortho_curmasked.ymax) -
+  view2d_map_cur_using_maskUI_view2d_view_ortho_yofs;
+  glLoadIdentity();
+}
-- 
2.27.0

Re: [PATCH] Ignore debug insns with CONCAT and CONCATN for insn scheduling

2022-09-21 Thread H.J. Lu via Gcc-patches

On Wed, Sep 7, 2022 at 10:03 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 9/2/2022 8:36 AM, H.J. Lu via Gcc-patches wrote:
> > CONCAT and CONCATN never appear in the insn chain.  They are only used
> > in debug insn.  Ignore debug insns with CONCAT and CONCATN for insn
> > scheduling to avoid different insn orders with and without debug insn.
> >
> > gcc/
> >
> >   PR rtl-optimization/106746
> >   * sched-deps.cc (sched_analyze_2): Ignore debug insns with CONCAT
> >   and CONCATN.
> Shouldn't we be ignoring everything in a debug insn?   I don't see why
> CONCAT/CONCATN are special here.

Debug insns are processed by insn scheduling; I think that is to improve the
debugging experience.  It is just that there are no matching uses of
CONCAT/CONCATN in non-debug insns.

--
H.J.

Re: [PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr

2022-09-21 Thread Noah Goldstein via Gcc-patches

On Sat, Jul 9, 2022 at 8:59 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/21/2022 12:12 PM, Noah Goldstein via Gcc-patches wrote:
> > This patch allows for strchr(x, c) to be replaced with memchr(x, c,
> > strlen(x) + 1) if strlen(x) has already been computed earlier in the
> > tree.
> >
> > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> >
> > Since memchr doesn't need to re-find the null terminator it is faster
> > than strchr.
> >
> > bootstrapped and tested on x86_64-linux.
> >
> >   PR tree-optimization/95821
> >
> > gcc/
> >
> >   * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
> >   memchr instead of strchr if strlen already computed.
> >
> > gcc/testsuite/
> >
> >   * c-c++-common/pr95821-1.c: New test.
> >   * c-c++-common/pr95821-2.c: New test.
> >   * c-c++-common/pr95821-3.c: New test.
> >   * c-c++-common/pr95821-4.c: New test.
> >   * c-c++-common/pr95821-5.c: New test.
> >   * c-c++-common/pr95821-6.c: New test.
> >   * c-c++-common/pr95821-7.c: New test.
> >   * c-c++-common/pr95821-8.c: New test.
> Given Jakub's involvement to-date and the fact this touches
> tree-ssa-strlen.cc I think Jakub should have final ACK/NAK on this.
>
> jeff
>

Ping.
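
For context, the transformation being pinged can be sketched in plain C.  This only illustrates the source-level equivalence described in the quoted patch summary; it is not the tree-ssa-strlen.cc implementation, and the function names are made up for the example.

```c
#include <string.h>

/* Before: strchr has to watch for both C and the terminating NUL,
   even though the length was just computed.  */
const char *
find_before (const char *x, int c, size_t *len_out)
{
  *len_out = strlen (x);
  return strchr (x, c);
}

/* After: the length is already known, so memchr can scan a fixed
   range.  The "+ 1" keeps the terminating NUL inside the range, so
   searching for '\0' still returns a pointer to the terminator, as
   strchr would.  */
const char *
find_after (const char *x, int c, size_t *len_out)
{
  size_t len = strlen (x);
  *len_out = len;
  return memchr (x, c, len + 1);
}
```

Both functions return the same pointer for any character, including the case where the character is absent and the case where it is the terminator itself.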

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-21 Thread Segher Boessenkool

Hi!

On Fri, Jun 24, 2022 at 10:02:19AM +0800, HAO CHEN GUI wrote:
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max. So the builtins always generate xs[min/max]dp on all
> platforms.

But how does this not blow up with -ffast-math?

In the other direction I am worried that the unspecs will degrade
performance (relative to smin/smax) when -ffast-math *is* active (and
this new builtin code and pattern doesn't blow up).

I still think we should get RTL codes for this, to have access to proper
floating point min/max semantics always and everywhere.  "fmin" and
"fmax" seem to be good names :-)

Segher

[OG12][PATCH] OpenMP: Fix ICE with OMP metadirectives

2022-09-21 Thread Paul-Antoine Arras


Hello,

Here is a patch that fixes an ICE in gfortran triggered by an invalid 
end statement at the end of an OMP metadirective:


```
!$OMP metadirective ...
...
!$OMP end ...
```

Does this fix look correct?

Thanks,
--
Paul-Antoine Arras

From 73ecbc2672a5352a08260f7a9d0de6d2c29ea2b6 Mon Sep 17 00:00:00 2001
From: Paul-Antoine Arras 
Date: Wed, 21 Sep 2022 15:52:56 +
Subject: [PATCH] OpenMP: Fix ICE with OMP metadirectives

Problem: ending an OpenMP metadirective block with an OMP end statement
results in an internal compiler error.
Solution: reject invalid end statements and issue a proper diagnostic.

Also add a new test to check this behaviour.

gcc/fortran/ChangeLog:

* parse.cc (parse_omp_metadirective_body): Reject OMP end statements
at the end of an OMP metadirective.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/metadirective-9.f90: New test.
---
 gcc/fortran/ChangeLog.omp |  5 
 gcc/fortran/parse.cc  | 14 +
 gcc/testsuite/ChangeLog.omp   |  4 +++
 .../gfortran.dg/gomp/metadirective-9.f90  | 29 +++
 4 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/metadirective-9.f90

diff --git gcc/fortran/ChangeLog.omp gcc/fortran/ChangeLog.omp
index 8c89cd5bd43..7b253608bf8 100644
--- gcc/fortran/ChangeLog.omp
+++ gcc/fortran/ChangeLog.omp
@@ -1,3 +1,8 @@
+2022-09-21  Paul-Antoine Arras  
+
+* parse.cc (parse_omp_metadirective_body): Reject OMP end statements
+at the end of an OMP metadirective.
+
 2022-09-09  Tobias Burnus  
 
Backport from mainline:
diff --git gcc/fortran/parse.cc gcc/fortran/parse.cc
index b35d76a4f6b..1f1fa0eba0e 100644
--- gcc/fortran/parse.cc
+++ gcc/fortran/parse.cc
@@ -5863,6 +5863,20 @@ parse_omp_metadirective_body (gfc_statement omp_st)
  break;
}
 
+  if (gfc_state_stack->state == COMP_OMP_METADIRECTIVE
+ && startswith (gfc_ascii_statement (st), "!$OMP END "))
+   {
+ for (gfc_state_data *p = gfc_state_stack; p; p = p->previous)
+   if (p->state == COMP_OMP_STRUCTURED_BLOCK)
+ goto finish;
+ gfc_error (
+   "Unexpected %s statement in an OMP METADIRECTIVE block at %C",
+   gfc_ascii_statement (st));
+ reject_statement ();
+ st = next_statement ();
+   }
+finish:
+
   gfc_in_metadirective_body = old_in_metadirective_body;
 
   if (gfc_state_stack->head)
diff --git gcc/testsuite/ChangeLog.omp gcc/testsuite/ChangeLog.omp
index e0c8c138620..f075354af4d 100644
--- gcc/testsuite/ChangeLog.omp
+++ gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,7 @@
+2022-09-21  Paul-Antoine Arras  
+
+* gfortran.dg/gomp/metadirective-9.f90: New test.
+
 2022-09-09  Paul-Antoine Arras  
 
Backport from mainline:
diff --git gcc/testsuite/gfortran.dg/gomp/metadirective-9.f90 gcc/testsuite/gfortran.dg/gomp/metadirective-9.f90
new file mode 100644
index 000..4db37dd0ef9
--- /dev/null
+++ gcc/testsuite/gfortran.dg/gomp/metadirective-9.f90
@@ -0,0 +1,29 @@
+! { dg-do compile }
+
+program OpenMP_Metadirective_WrongEnd_Test
+
+  integer :: &
+iV, jV, kV
+  integer, dimension ( 3 ) :: &
+lV, uV
+  logical :: &
+UseDevice
+
+!$OMP metadirective &
+!$OMP   when ( user = { condition ( UseDevice ) } &
+!$OMP : target teams distribute parallel do simd collapse ( 3 ) &
+!$OMP private ( iaVS ) ) &
+!$OMP   default ( parallel do simd collapse ( 3 ) private ( iaVS ) )
+do kV = lV ( 3 ), uV ( 3 )
+  do jV = lV ( 2 ), uV ( 2 )
+do iV = lV ( 1 ), uV ( 1 )
+
+
+end do
+  end do
+end do
+!$OMP end target teams distribute parallel do simd ! { dg-error "Unexpected !.OMP END TARGET TEAMS DISTRIBUTE PARALLEL DO SIMD statement in an OMP METADIRECTIVE block at .1." }
+
+
+end program
+
-- 
2.31.1

Proxy ping [PATCH] Fortran: Fix automatic reallocation inside select rank [PR100103]

2022-09-21 Thread Harald Anlauf via Gcc-patches

Dear all,

the patch for this PR was submitted for review by Jose here:

  https://gcc.gnu.org/pipermail/fortran/2021-April/055934.html

but unfortunately was never reviewed.

I verified that it works on mainline and x86_64-pc-linux-gnu,
and I think that it is fine.

Although the above mail suggests that there is a dependency
on the fix for another PR with a rather lengthy patch,
it appears that this is no longer the case.  It might be
that the fix for PR100245 (another reallocation issue)
already did the necessary job.

So OK for mainline?

Thanks,
Harald

From 6c93c5058f552f47a3d828d3fb19cca652901299 Mon Sep 17 00:00:00 2001
From: José Rui Faustino de Sousa
Date: Wed, 21 Sep 2022 22:55:02 +0200
Subject: [PATCH] Fortran: Fix automatic reallocation inside select rank
 [PR100103]

gcc/fortran/ChangeLog:

	PR fortran/100103
	* trans-array.cc (gfc_is_reallocatable_lhs): Add select rank
	temporary associate names as possible targets of automatic
	reallocation.

gcc/testsuite/ChangeLog:

	PR fortran/100103
	* gfortran.dg/PR100103.f90: New test.
---
 gcc/fortran/trans-array.cc |  4 +-
 gcc/testsuite/gfortran.dg/PR100103.f90 | 76 ++
 2 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/PR100103.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 05134952db4..795ce14af08 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -10378,7 +10378,7 @@ gfc_is_reallocatable_lhs (gfc_expr *expr)

   /* An allocatable class variable with no reference.  */
   if (sym->ts.type == BT_CLASS
-  && !sym->attr.associate_var
+  && (!sym->attr.associate_var || sym->attr.select_rank_temporary)
   && CLASS_DATA (sym)->attr.allocatable
   && expr->ref
   && ((expr->ref->type == REF_ARRAY && expr->ref->u.ar.type == AR_FULL
@@ -10393,7 +10393,7 @@ gfc_is_reallocatable_lhs (gfc_expr *expr)

   /* An allocatable variable.  */
   if (sym->attr.allocatable
-  && !sym->attr.associate_var
+  && (!sym->attr.associate_var || sym->attr.select_rank_temporary)
   && expr->ref
   && expr->ref->type == REF_ARRAY
   && expr->ref->u.ar.type == AR_FULL)
diff --git a/gcc/testsuite/gfortran.dg/PR100103.f90 b/gcc/testsuite/gfortran.dg/PR100103.f90
new file mode 100644
index 000..21405610a71
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100103.f90
@@ -0,0 +1,76 @@
+! { dg-do run }
+!
+! Test the fix for PR100103
+!
+
+program main_p
+  implicit none
+
+  integer:: i
+  integer, parameter :: n = 11
+
+  type :: foo_t
+integer :: i
+  end type foo_t
+
+  type(foo_t), parameter :: a(*) = [(foo_t(i), i=1,n)]
+
+  type(foo_t),  allocatable :: bar_d(:)
+  class(foo_t), allocatable :: bar_p(:)
+  class(*), allocatable :: bar_u(:)
+
+
+  call foo_d(bar_d)
+  if(.not.allocated(bar_d)) stop 1
+  if(any(bar_d%i/=a%i)) stop 2
+  deallocate(bar_d)
+  call foo_p(bar_p)
+  if(.not.allocated(bar_p)) stop 3
+  if(any(bar_p%i/=a%i)) stop 4
+  deallocate(bar_p)
+  call foo_u(bar_u)
+  if(.not.allocated(bar_u)) stop 5
+  select type(bar_u)
+  type is(foo_t)
+if(any(bar_u%i/=a%i)) stop 6
+  class default
+stop 7
+  end select
+  deallocate(bar_u)
+
+contains
+
+  subroutine foo_d(that)
+type(foo_t), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 8
+end select
+  end subroutine foo_d
+
+  subroutine foo_p(that)
+class(foo_t), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 9
+end select
+  end subroutine foo_p
+
+  subroutine foo_u(that)
+class(*), allocatable, intent(out) :: that(..)
+
+select rank(that)
+rank(1)
+  that = a
+rank default
+  stop 10
+end select
+  end subroutine foo_u
+
+end program main_p
--
2.35.3

Re: [PATCH] Cleanup gdb printers.py

2022-09-21 Thread Jonathan Wakely via Gcc-patches

On Wed, 21 Sept 2022 at 20:57, François Dumont via Libstdc++
 wrote:
>
> I stopped my research to find out if those types ever existed in 2001.
> Clearly they do not exist now.
>
>  libstdc++: Remove useless gdb printer registrations.
>
>  libstdc++-v3/ChangeLog:
>
>  * python/libstdcxx/v6/printers.py: Remove printer
> registration for non-existing
>  types std::__debug::unique_ptr, std::__debug::stack,
> std::__debug::queue,
>  std::__debug::priority_queue.
>
> Ok to commit ?

Oh good catch, please commit, thanks!

[PATCH] x86: Check corrupted return address when unwinding stack

2022-09-21 Thread H.J. Lu via Gcc-patches

If shadow stack is enabled, when unwinding stack, we count how many stack
frames we pop to reach the landing pad and adjust shadow stack by the same
amount.  When counting the stack frame, we compare the return address on
normal stack against the return address on shadow stack.  If they don't
match, return _URC_FATAL_PHASE2_ERROR for the corrupted return address on
normal stack.  Don't check the return address for

1. Non-catchable exception where exception_class == 0.  Process will be
terminated.
2. Zero return address which marks the outermost stack frame.
3. Signal stack frame since kernel puts a restore token on shadow stack.
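
The check can be modelled in ordinary C without touching a real shadow stack.  The sketch below mirrors the shape of the x86-64 macro in the diff (count a frame, then compare the return address with the slot that many entries above the shadow stack pointer), but every name in it is hypothetical, and the arrays merely stand in for the two stacks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the unwinder's view of one frame.  */
struct frame_model
{
  uintptr_t return_address;   /* as read from the normal stack */
  int is_signal_frame;
};

/* Model result codes; the real unwinder uses _Unwind_Reason_Code.  */
enum { MODEL_OK = 0, MODEL_FATAL_PHASE2_ERROR = 2 };

/* Walk NFRAMES frames.  SHADOW[0] models the slot at the current
   shadow stack pointer, so after counting K frames the matching
   return address is SHADOW[K].  The comparison is skipped in the
   three cases listed above.  On success *POPPED receives the number
   of frames counted.  */
int
count_frames_checked (const struct frame_model *frames, size_t nframes,
                      const uintptr_t *shadow, uint64_t exception_class,
                      size_t *popped)
{
  size_t count = 0;
  for (size_t i = 0; i < nframes; i++)
    {
      count++;
      if (exception_class != 0
          && frames[i].return_address != 0
          && !frames[i].is_signal_frame
          && shadow[count] != frames[i].return_address)
        return MODEL_FATAL_PHASE2_ERROR;
    }
  *popped = count;
  return MODEL_OK;
}
```

A mismatch is only fatal for catchable exceptions: with exception_class == 0 the same corrupted stack is walked without complaint, matching case 1.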

* unwind-generic.h (_Unwind_Frames_Increment): Add the EXC
argument.
* unwind.inc (_Unwind_RaiseException_Phase2): Pass EXC to
_Unwind_Frames_Increment.
(_Unwind_ForcedUnwind_Phase2): Likewise.
* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
Take the EXC argument.  Return _URC_FATAL_PHASE2_ERROR if the
return address on normal stack doesn't match the return address
on shadow stack.
---
 libgcc/config/i386/shadow-stack-unwind.h | 51 ++--
 libgcc/unwind-generic.h  |  2 +-
 libgcc/unwind.inc|  4 +-
 3 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/libgcc/config/i386/shadow-stack-unwind.h b/libgcc/config/i386/shadow-stack-unwind.h
index 2b02682bdae..89d44165000 100644
--- a/libgcc/config/i386/shadow-stack-unwind.h
+++ b/libgcc/config/i386/shadow-stack-unwind.h
@@ -54,10 +54,39 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
aligned.  If the original shadow stack is 8 byte aligned, we just
need to pop 2 slots, one restore token, from shadow stack.  Otherwise,
we need to pop 3 slots, one restore token + 4 byte padding, from
-   shadow stack.  */
-#ifndef __x86_64__
+   shadow stack.
+
+   When popping a stack frame, we compare the return address on normal
+   stack against the return address on shadow stack.  If they don't match,
+   return _URC_FATAL_PHASE2_ERROR for the corrupted return address on
+   normal stack.  Don't check the return address for
+   1. Non-catchable exception where exception_class == 0.  Process will
+  be terminated.
+   2. Zero return address which marks the outermost stack frame.
+   3. Signal stack frame since kernel puts a restore token on shadow
+  stack.
+ */
 #undef _Unwind_Frames_Increment
-#define _Unwind_Frames_Increment(context, frames)  \
+#ifdef __x86_64__
+#define _Unwind_Frames_Increment(exc, context, frames) \
+{  \
+  frames++;\
+  if (exc->exception_class != 0\
+ && _Unwind_GetIP (context) != 0   \
+ && !_Unwind_IsSignalFrame (context))  \
+   {   \
+ _Unwind_Word ssp = _get_ssp ();   \
+ if (ssp != 0) \
+   {   \
+ ssp += 8 * frames;\
+ _Unwind_Word ra = *(_Unwind_Word *) ssp;  \
+ if (ra != _Unwind_GetIP (context))\
+   return _URC_FATAL_PHASE2_ERROR; \
+   }   \
+   }   \
+}
+#else
+#define _Unwind_Frames_Increment(exc, context, frames) \
   if (_Unwind_IsSignalFrame (context)) \
 do \
   {\
@@ -83,5 +112,19 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   }\
 while (0); \
   else \
-frames++;
+{  \
+  frames++;\
+  if (exc->exception_class != 0\
+ && _Unwind_GetIP (context) != 0)  \
+   {   \
+ _Unwind_Word ssp = _get_ssp ();   \
+ if (ssp != 0) \
+   {   \
+ ssp += 4 * frames;\
+ _Unwind_Word ra = *(_Unwind_Word *) ssp;  \
+ if (ra != _Unwind_GetIP (context))\
+   return _URC_FATAL_PHASE2_ERROR; \
+   }   \
+   }   \
+}
 #endif
diff --git a/libgcc/unwind-generic.h b/libgcc/unwind-generic.h
index a87c9b3ccf6..bf721282d03 100644
---

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-21 Thread Alexander Monakov via Gcc-patches



Hi.

At a high level, I'd be highly uncomfortable with this. I guess we are in
vague agreement that it cannot be efficiently implemented. It also goes
against the good practice of accelerator programming, which requires queueing
work on the accelerator and letting it run asynchronously with the CPU with high
occupancy.

(I know libgomp still waits for the GPU to finish in each GOMP_offload_run,
but maybe it's better to improve *that* instead of piling on new slowness)

What I said above also applies to MPI+GPU scenarios: a well-designed algorithm
should arrange for MPI communications to happen in parallel with some useful
offloaded calculations. I don't see the value in implementing the ability to
invoke an MPI call from the accelerator in such an inefficient fashion.

(so yes, I disagree with "it is better to provide a feature even if it is slow –
than not providing it at all", when it is advertised as a general-purpose
feature, not a purely debugging helper)


On to the patch itself. IIRC one of the questions was use of CUDA managed
memory. I think it is unsafe because device-issued atomics are not guaranteed
to appear atomic to the host, unless compiling for compute capability 6.0 or
above, and using system-scope atomics ("atom.sys").

And for the non-USM code path you're relying on cudaMemcpy observing device-side
atomics in the right order.

Atomics aside, CUDA pinned memory would be a natural choice for such a tiny
structure. Did you rule it out for some reason?

Some remarks on the diff below, not intended to be a complete review.

Alexander


> --- a/libgomp/config/nvptx/target.c
> +++ b/libgomp/config/nvptx/target.c
> @@ -26,7 +26,29 @@
>  #include "libgomp.h"
>  #include 
>  
> +#define GOMP_REV_OFFLOAD_VAR __gomp_rev_offload_var

Shouldn't this be in a header (it needs to be in sync with the plugin)?

> +
> +/* Reverse offload. Must match version used in plugin/plugin-nvptx.c. */
> +struct rev_offload {
> +  uint64_t fn;
> +  uint64_t mapnum;
> +  uint64_t addrs;
> +  uint64_t sizes;
> +  uint64_t kinds;
> +  int32_t dev_num;
> +  uint32_t lock;
> +};

Likewise.

> +
> +#if (__SIZEOF_SHORT__ != 2 \
> + || __SIZEOF_SIZE_T__ != 8 \
> + || __SIZEOF_POINTER__ != 8)
> +#error "Data-type conversion required for rev_offload"
> +#endif

Huh? This is not a requirement that is new for reverse offload, it has always
been like that for offloading (all ABI rules regarding type sizes, struct
layout, bitfield layout, endianness must match).
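
As a rough illustration of the two remarks above (one shared definition, and layout agreement being a general offloading requirement), the declaration could live in a single header included by both the nvptx target code and the plugin. This is only a sketch: the header guard name is made up, and the asserted size and offset are simply what the struct from the quoted patch works out to with fixed-width types, not values taken from libgomp.

```c
/* Hypothetical shared header; the guard name is an assumption.  */
#ifndef REV_OFFLOAD_SKETCH_H
#define REV_OFFLOAD_SKETCH_H

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields, in the same order, as in the quoted patch.  */
struct rev_offload {
  uint64_t fn;
  uint64_t mapnum;
  uint64_t addrs;
  uint64_t sizes;
  uint64_t kinds;
  int32_t dev_num;
  uint32_t lock;
};

/* Fail the build on either side if the layout ever diverges.  */
_Static_assert (sizeof (struct rev_offload) == 48,
                "rev_offload size must match on host and device");
_Static_assert (offsetof (struct rev_offload, lock) == 44,
                "rev_offload.lock offset must match on host and device");

#endif /* REV_OFFLOAD_SKETCH_H */
```

Because only fixed-width types are used, the checks do not depend on the sizes of short, size_t or pointers.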

> +
> +
>  extern int __gomp_team_num __attribute__((shared));
> +extern volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
> +volatile struct rev_offload *GOMP_REV_OFFLOAD_VAR;
>  
>  bool
>  GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
> @@ -88,16 +110,32 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t 
> mapnum,
>void **hostaddrs, size_t *sizes, unsigned short *kinds,
>unsigned int flags, void **depend, void **args)
>  {
> -  (void) device;
> -  (void) fn;
> -  (void) mapnum;
> -  (void) hostaddrs;
> -  (void) sizes;
> -  (void) kinds;
>(void) flags;
>(void) depend;
>(void) args;
> -  __builtin_unreachable ();
> +
> +  if (device != GOMP_DEVICE_HOST_FALLBACK
> +  || fn == NULL
> +  || GOMP_REV_OFFLOAD_VAR == NULL)
> +return;

Shouldn't this be an 'assert' instead?

> +
> +  while (__sync_lock_test_and_set (&GOMP_REV_OFFLOAD_VAR->lock, (uint8_t) 1))
> +;  /* spin  */
> +
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->mapnum, mapnum, __ATOMIC_SEQ_CST);
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->addrs, hostaddrs, 
> __ATOMIC_SEQ_CST);
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->sizes, sizes, __ATOMIC_SEQ_CST);
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->kinds, kinds, __ATOMIC_SEQ_CST);
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->dev_num,
> + GOMP_ADDITIONAL_ICVS.device_num, __ATOMIC_SEQ_CST);

Looks like all these can be plain stores, you only need ...

> +
> +  /* 'fn' must be last.  */
> +  __atomic_store_n (&GOMP_REV_OFFLOAD_VAR->fn, fn, __ATOMIC_SEQ_CST);

... this to be atomic with 'release' semantics in the usual producer-consumer
pattern.
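
A minimal host-side sketch of that producer-consumer pattern, using the same GCC `__atomic` builtins as the patch. The `struct request` type and the `publish`/`consume` helpers are hypothetical names chosen for illustration. The sketch says nothing about whether device-side stores are observed in order by the host, which is the separate concern raised earlier in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request slot; 'fn' doubles as the "ready" flag.  */
struct request {
  uint64_t mapnum;
  uint64_t addrs;
  uint64_t fn;      /* nonzero means a request is pending */
};

/* Producer: plain stores for the payload, then one release store.
   The release store orders the payload writes before the flag.  */
static void
publish (struct request *r, uint64_t fn, uint64_t mapnum, uint64_t addrs)
{
  r->mapnum = mapnum;
  r->addrs = addrs;
  __atomic_store_n (&r->fn, fn, __ATOMIC_RELEASE);
}

/* Consumer: spin on an acquire load of the flag; once it is nonzero,
   the payload written before the release store is visible.  Clearing
   the flag signals completion back to the producer.  */
static uint64_t
consume (struct request *r, uint64_t *mapnum, uint64_t *addrs)
{
  uint64_t fn;
  while ((fn = __atomic_load_n (&r->fn, __ATOMIC_ACQUIRE)) == 0)
    ;  /* spin */
  *mapnum = r->mapnum;
  *addrs = r->addrs;
  __atomic_store_n (&r->fn, 0, __ATOMIC_RELEASE);
  return fn;
}
```

In a real setup `publish` and `consume` would run on different threads or devices; only the flag needs atomic accesses, which is the point of the remark.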

> +
> +  /* Processed on the host - when done, fn is set to NULL.  */
> +  while (__atomic_load_n (&GOMP_REV_OFFLOAD_VAR->fn, __ATOMIC_SEQ_CST) != 0)
> +;  /* spin  */
> +  __sync_lock_release (&GOMP_REV_OFFLOAD_VAR->lock);
>  }
>  
>  void
> diff --git a/libgomp/libgomp-plugin.c b/libgomp/libgomp-plugin.c
> index 9d4cc62..316de74 100644
> --- a/libgomp/libgomp-plugin.c
> +++ b/libgomp/libgomp-plugin.c
> @@ -78,3 +78,15 @@ GOMP_PLUGIN_fatal (const char *msg, ...)
>gomp_vfatal (msg, ap);
>va_end (ap);
>  }
> +
> +void
> +GOMP_PLUGIN_target_rev (uint64_t fn_ptr, uint64_t mapnum, uint64_t 
> devaddrs_ptr,
> + uint64_t sizes_ptr, uint64_t kinds_ptr, int dev_num,
> + void (*dev_to_host_cpy) (void *, const void *, size_t,
> +  void *),
> +

[PATCH] Cleanup gdb printers.py

2022-09-21 Thread François Dumont via Gcc-patches

I stopped my research to find out if those types ever existed in 2001. 
Clearly they do not exist now.


    libstdc++: Remove useless gdb printer registrations.

    libstdc++-v3/ChangeLog:

    * python/libstdcxx/v6/printers.py: Remove printer registration for
    non-existing types std::__debug::unique_ptr, std::__debug::stack,
    std::__debug::queue, std::__debug::priority_queue.

Ok to commit ?

François
diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index bd4289c1c62..5a3dcbd13f9 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2246,12 +2246,7 @@ def build_libstdcxx_dictionary ():
 libstdcxx_printer.add('std::__debug::map', StdMapPrinter)
 libstdcxx_printer.add('std::__debug::multimap', StdMapPrinter)
 libstdcxx_printer.add('std::__debug::multiset', StdSetPrinter)
-libstdcxx_printer.add('std::__debug::priority_queue',
-  StdStackOrQueuePrinter)
-libstdcxx_printer.add('std::__debug::queue', StdStackOrQueuePrinter)
 libstdcxx_printer.add('std::__debug::set', StdSetPrinter)
-libstdcxx_printer.add('std::__debug::stack', StdStackOrQueuePrinter)
-libstdcxx_printer.add('std::__debug::unique_ptr', UniquePointerPrinter)
 libstdcxx_printer.add('std::__debug::vector', StdVectorPrinter)
 
 # These are the TR1 and C++11 printers.

Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-21 Thread Harald Anlauf via Gcc-patches


Hi Mikael,

On 21.09.22 at 20:56, Mikael Morin wrote:

On 21/09/2022 at 11:57, Thomas Koenig wrote:


Hi Harald,


I think I understand much of what is said, but I feel that I do
not really understand what *clobber* means for the different
beasts we are discussing (although I have an impression of what
it means for a scalar object).




More seriously: My understanding of a clobber is that it is a hint to
the middle end that the value in question will not be used,
and that operations leading to this value can be removed,
unless they are used otherwise.


My understanding is that "clobber" means "overwrite with garbage" for
all the beasts we have been discussing, which translates to nothing in
the final code, but can be used by the optimizers as Thomas said.

This is a bit off-topic but clobbers model registers having their values
changed unpredictably or by ways unknown to the compiler, in the backend
code, or in inline assembly statements.
Here is an excerpt from rtl.texi:

@item (clobber @var{x})
Represents the storing or possible storing of an unpredictable,
undescribed value into @var{x}


ah, I missed that file.  I only found references to assembly,
and references to registers etc. were not really helpful here.

It also says:

> If @var{x} is @code{(mem:BLK (const_int 0))} or
> @code{(mem:BLK (scratch))}, it means that all memory
> locations must be presumed clobbered.  ...

so this goes into the direction I was thinking of.
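
For a source-level feel of the "all memory locations must be presumed clobbered" case, the closest counterpart I know of is the "memory" clobber of a GCC extended asm statement. The following is a small sketch under that assumption; `shared_value` and `read_twice_with_barrier` are made-up names, and the empty asm emits no instructions at all.

```c
#include <assert.h>

static int shared_value;

/* Reads shared_value before and after an empty asm with a "memory"
   clobber.  The clobber tells the compiler that the asm may have
   written any memory, so the second read cannot reuse the value
   loaded for the first one.  */
static int
read_twice_with_barrier (void)
{
  int first = shared_value;
  __asm__ volatile ("" : : : "memory");
  int second = shared_value;
  return first + second;
}
```

As with the clobbers discussed in this thread, nothing is emitted in the final code; the statement only constrains what the optimizers may assume.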


I hope it helps.

Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-21 Thread Mikael Morin


On 21/09/2022 at 11:57, Thomas Koenig wrote:


Hi Harald,


I think I understand much of what is said, but I feel that I do
not really understand what *clobber* means for the different
beasts we are discussing (although I have an impression of what
it means for a scalar object).


Obviously, "clobber" means taking a big stick and hitting the beast
in question over the head with it :-)

More seriously: My understanding of a clobber is that it is a hint to
the middle end that the value in question will not be used,
and that operations leading to this value can be removed,
unless they are used otherwise.

My understanding is that "clobber" means "overwrite with garbage" for 
all the beasts we have been discussing, which translates to nothing in 
the final code, but can be used by the optimizers as Thomas said.


This is a bit off-topic but clobbers model registers having their values 
changed unpredictably or by ways unknown to the compiler, in the backend 
code, or in inline assembly statements.

Here is an excerpt from rtl.texi:

@item (clobber @var{x})
Represents the storing or possible storing of an unpredictable,
undescribed value into @var{x}


I hope it helps.

Re: [PATCH][RFH] Wire ranger into FRE

2022-09-21 Thread Andrew MacLeod via Gcc-patches




On 9/21/22 06:13, Richard Biener wrote:

On Mon, 19 Sep 2022, Andrew MacLeod wrote:



It looks like you created a fur_source to manually adjust PHIs within the
fold_stmt query to ignore edges that are not marked executable.

Yes, and use the current values from the VN lattice when looking at
statement operands.


Yes, that is exactly how it's intended to be used.





That would then just leave you with the stale cache state to deal with?   And
if we can resolve that, would all just work?  at least in theory?

In theory, yes.  Besides that, the use-def walking of the cache is not
wired up with fur_*


Well, yes. hmm, you want to set cache values based on the VN lattice as 
well. yes. OK, let me do a bit of cache explanation since I haven't done 
that yet. It does not need a fur_source of any kind, and I'll explain why.


The cache has 2 primary functions:
  1) maintain the global definition table (used to decide if a name has 
been processed). This is local and not the one the rest of GCC uses; and
  2) maintain the range-on-entry cache and resolve queries to that 
efficiently.


The cache does not actually create any NEW information.  This is one of 
its key features in preventing any kind of cascading cyclic updates.  
All it does is propagate existing information from the definition table, 
with values extracted from the global value table.  So your example is 
not good for this, as there isn't much in the cache for it.  So let's 
tweak it and add another block. Example:


n_2 = 1
  i_4 = 0
  val_5 = 0
:
  # i_1 = PHI 
  #val_2 = PHI 
  val_6 = val_2 + 1;
  i_7 = i_1 + 1
  if (i_7 > 22)
 goto 
  else
 goto 

  if (i_7 < n_3)
    goto ;
  else
    goto ;

  _8 = val_6
  return _8

For the sake of simplicity, let's also assume bb2 and bb3 have been 
looked at and all the ssa-names defined in those blocks have an entry in 
ranger's definition table.


Moving to <bb 7>, if we ask for the range of "if (i_7 < n_3)" to be 
evaluated, it checks that i_7 and n_3 have been evaluated before it 
proceeds.  Both have entries, which means the next task is to get their 
values at this location.  range_of_expr is called on each one, and as 
they are not defined in this block, ranger asks the cache for the value 
of i_7 on entry to bb7 (likewise, when it gets an answer back, it will 
do so for n_3 as well).


The cache walks back the dominators until it finds either:
  a) the block with the definition of i_7, or
  b) a block which has an on-entry cache value for i_7 already set.
During its walk, it tags any block which has i_7 in the export list, 
meaning an outgoing edge from that block may change the value of i_7.


There are additional complexities, but the fundamental operation is to 
now take the value it saw from a) or b) as the starting value, and 
supply that to GORI at every intervening outgoing edge i_7 was exported 
from. Whenever the value changes along the way, we write a cache update 
at the end of the edge to facilitate future queries.  At the end, the 
answer has been calculated and is stored as the on-entry value for this 
block.
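
To make the walk concrete, here is a toy model of it in C. It is not ranger code: `struct range`, `struct block` and `range_on_entry` are invented for this sketch, every block has a single dominator edge that may refine the name, and an on-entry value is cached for every block visited (the real cache only writes an update where the value changes, and deals with arbitrary CFG edges).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct range { long lo, hi; };

/* One basic block, as seen for a single ssa-name.  */
struct block {
  int idom;               /* immediate dominator, -1 for the entry block */
  bool defines;           /* the name is defined in this block */
  struct range def;       /* range of that definition */
  bool exports;           /* the edge to the dominated block refines the name */
  struct range edge;      /* range implied by taking that edge */
  bool cached;            /* on_entry below is valid */
  struct range on_entry;
};

static struct range
intersect (struct range a, struct range b)
{
  struct range r;
  r.lo = a.lo > b.lo ? a.lo : b.lo;
  r.hi = a.hi < b.hi ? a.hi : b.hi;
  return r;
}

/* Walk back through the dominators until a definition (a) or an
   already cached on-entry value (b) is found, then apply the edge
   refinements on the way back down and record the results.  */
static struct range
range_on_entry (struct block *bbs, int bb)
{
  if (bbs[bb].cached)
    return bbs[bb].on_entry;

  struct range val = { LONG_MIN, LONG_MAX };   /* VARYING */
  int dom = bbs[bb].idom;
  if (dom >= 0)
    {
      val = bbs[dom].defines ? bbs[dom].def : range_on_entry (bbs, dom);
      if (bbs[dom].exports)
        val = intersect (val, bbs[dom].edge);
    }
  bbs[bb].cached = true;
  bbs[bb].on_entry = val;
  return val;
}
```

Note that the function only ever narrows a value that came from a definition or an earlier cache entry; it never invents a range of its own.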


So returning to the example, assume i_7 was set to VARYING in bb3; GORI 
would apply !(i_7 > 22) to the value, and we would end up in <bb 7> with 
a range-on-entry of [0, 21] and it would be stored in bb7.


In your example, if you have disabled that back edge, you would have a 
value of [1,1] for i_7.  GORI would not have changed that value since 
it's already < 22, and we would store [1,1] as the range-on-entry to 
<bb 7>.

Likewise, we do something similar for n_3.  The point is, the cache has 
not gone and created any new information.  Its *only* purpose is to 
propagate known values through the CFG, adjusting them for any outgoing 
edges that are encountered.  It uses a temporal marking in an attempt to 
identify when a global value has been changed, meaning it may need to go 
and repopulate something, but the bottom line is that it never contains 
anything beyond "reductions" in the ranges of values in the global table.  
And it only ever works on one name at a time.


The bottom line: Ranger effectively only ever changes values via the 
global table, and the cache simply propagates those values around, 
adjusting them with GORI as appropriate.


So there are multiple approaches.  We could simply kill the global table 
and cache line for any ssa_name we want to change the value of.  That 
gets a little trickier for on-entry values of secondary effects (i.e., 
those used in the calculation of the primary names). It would probably 
work, but something unforeseen could show up.


More advanced would be to "layer" the cache, i.e., we use the cache, and 
at some point you issue a "push".  The push creates a new cache, and 
all queries look first to the new cache and, if they can't be answered 
there, look "down" through the previous caches.   This resolves all queries 
as if the cache layers are all "one" thing. Sets would always go to the 
latest layer.   When we "pop", we delete the latest layer.  the layers

[PATCH] Fortran: fix ICE in generate_coarray_sym_init [PR82868]

2022-09-21 Thread Harald Anlauf via Gcc-patches

Dear all,

I intend to commit the attached, obvious patch for a NULL pointer
dereference by tomorrow unless there are comments or objections.
We had better skip initialization for a symbol which is an associate name.

Regtested on x86_64-pc-linux-gnu.

Thanks,
Harald

From 0259762271b2eb430e058b0bff4d7b11513c48c4 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Wed, 21 Sep 2022 19:55:30 +0200
Subject: [PATCH] Fortran: fix ICE in generate_coarray_sym_init [PR82868]

gcc/fortran/ChangeLog:

	PR fortran/82868
	* trans-decl.cc (generate_coarray_sym_init): Skip symbol
	if attr.associate_var.

gcc/testsuite/ChangeLog:

	PR fortran/82868
	* gfortran.dg/associate_26a.f90: New test.
---
 gcc/fortran/trans-decl.cc   |  1 +
 gcc/testsuite/gfortran.dg/associate_26a.f90 | 15 +++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/associate_26a.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 908a4c6d42e..5d16d640322 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -5529,6 +5529,7 @@ generate_coarray_sym_init (gfc_symbol *sym)

   if (sym->attr.dummy || sym->attr.allocatable || !sym->attr.codimension
   || sym->attr.use_assoc || !sym->attr.referenced
+  || sym->attr.associate_var
   || sym->attr.select_type_temporary)
 return;

diff --git a/gcc/testsuite/gfortran.dg/associate_26a.f90 b/gcc/testsuite/gfortran.dg/associate_26a.f90
new file mode 100644
index 000..85aebebd4d8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/associate_26a.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-fcoarray=lib" }
+!
+! Test the fix for PR78152 and the followup in PR82868
+!
+! Contributed by 
+!
+program co_assoc
+  implicit none
+  integer, parameter :: p = 5
+  real, allocatable :: a(:,:)[:,:]
+  allocate (a(p,p)[2,*])
+  associate (i => a(1:p, 1:p))
+  end associate
+end program co_assoc
--
2.35.3

[PATCH] c++ modules: partial variable template specializations [PR106826]

2022-09-21 Thread Patrick Palka via Gcc-patches

With partial variable template specializations, it looks like we
stream the VAR_DECL (i.e. the DECL_TEMPLATE_RESULT of the corresponding
TEMPLATE_DECL) since process_partial_specialization adds it to the
specializations table, but end up never streaming the corresponding
TEMPLATE_DECL itself that appears only in the primary template's
DECL_TEMPLATE_SPECIALIZATIONS list, which leads to the list being
incomplete on stream-in.

The modules machinery already has special logic for streaming partial
specializations of class templates; this patch generalizes it to handle
those of variable templates as well.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

PR c++/106826

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Use get_template_info in
the MK_partial case.
(trees_out::key_mergeable): Likewise.
(trees_in::key_mergeable): Likewise.
(has_definition): Consider DECL_INITIAL of a partial variable
template specialization.
(depset::hash::make_dependency): Introduce a dependency of
partial variable template specializations too.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-2_a.C: New test.
* g++.dg/modules/partial-2_b.C: New test.
---
 gcc/cp/module.cc   | 32 +---
 gcc/testsuite/g++.dg/modules/partial-2_a.C | 43 ++
 gcc/testsuite/g++.dg/modules/partial-2_b.C | 21 +++
 3 files changed, 82 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9a9ef4e3332..334bde99b0f 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -7789,8 +7789,9 @@ trees_out::decl_value (tree decl, depset *dep)
}
  else
{
- tree_node (CLASSTYPE_TI_TEMPLATE (TREE_TYPE (inner)));
- tree_node (CLASSTYPE_TI_ARGS (TREE_TYPE (inner)));
+ tree ti = get_template_info (inner);
+ tree_node (TI_TEMPLATE (ti));
+ tree_node (TI_ARGS (ti));
}
}
   tree_node (get_constraints (decl));
@@ -10626,8 +10627,9 @@ trees_out::key_mergeable (int tag, merge_kind mk, tree 
decl, tree inner,
case MK_partial:
  {
key.constraints = get_constraints (inner);
-   key.ret = CLASSTYPE_TI_TEMPLATE (TREE_TYPE (inner));
-   key.args = CLASSTYPE_TI_ARGS (TREE_TYPE (inner));
+   tree ti = get_template_info (inner);
+   key.ret = TI_TEMPLATE (ti);
+   key.args = TI_ARGS (ti);
  }
  break;
}
@@ -10866,8 +10868,8 @@ trees_in::key_mergeable (int tag, merge_kind mk, tree 
decl, tree inner,
   spec; spec = TREE_CHAIN (spec))
{
  tree tmpl = TREE_VALUE (spec);
- if (template_args_equal (key.args,
-  CLASSTYPE_TI_ARGS (TREE_TYPE (tmpl)))
+ tree ti = get_template_info (tmpl);
+ if (template_args_equal (key.args, TI_ARGS (ti))
  && cp_tree_equal (key.constraints,
get_constraints
(DECL_TEMPLATE_RESULT (tmpl
@@ -11381,8 +11383,7 @@ has_definition (tree decl)
 
 case VAR_DECL:
   if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl)
- && DECL_USE_TEMPLATE (decl) < 2)
+ && DECL_TEMPLATE_INFO (decl))
return DECL_INITIAL (decl);
   else
{
@@ -12498,11 +12499,14 @@ depset::hash::make_dependency (tree decl, entity_kind 
ek)
 
   if (!dep)
 {
-  if (DECL_IMPLICIT_TYPEDEF_P (decl)
- /* ... not an enum, for instance.  */
- && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
- && TYPE_LANG_SPECIFIC (TREE_TYPE (decl))
- && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+  if ((DECL_IMPLICIT_TYPEDEF_P (decl)
+  /* ... not an enum, for instance.  */
+  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (decl))
+  && TYPE_LANG_SPECIFIC (TREE_TYPE (decl))
+  && CLASSTYPE_USE_TEMPLATE (TREE_TYPE (decl)) == 2)
+ || (VAR_P (decl)
+ && DECL_LANG_SPECIFIC (decl)
+ && DECL_USE_TEMPLATE (decl) == 2))
{
  /* A partial or explicit specialization. Partial
 specializations might not be in the hash table, because
@@ -12515,7 +12519,7 @@ depset::hash::make_dependency (tree decl, entity_kind 
ek)
 dep_hash, and then convert the dep we just found into a
 redirect.  */
 
- tree ti = TYPE_TEMPLATE_INFO (TREE_TYPE (decl));
+ tree ti = get_template_info (decl);
  tree tmpl = TI_TEMPLATE (ti);
  tree partial = NULL_TREE;
  for (tree spec = DECL_TEMPLATE_SPECIALIZATIONS (tmpl);
diff --git a/gcc/testsuite/g++.dg/modules/partial-2_a.C

Re: [PATCH] MIPS: fix building on multiarch platform

2022-09-21 Thread Maciej W. Rozycki

On Wed, 21 Sep 2022, Xi Ruoyao wrote:

> > diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> > index 74b6e11aabb..fe7f5b274b9 100644
> > --- a/gcc/config/mips/mips.h
> > +++ b/gcc/config/mips/mips.h
> > @@ -3427,6 +3427,7 @@ struct GTY(())  machine_function {
> >  
> >  /* If we are *not* using multilibs and the default ABI is not ABI_32
> > we
> >     need to change these from /lib and /usr/lib.  */
> > +#ifndef ENABLE_MULTIARCH
> >  #if MIPS_ABI_DEFAULT == ABI_N32
> >  #define STANDARD_STARTFILE_PREFIX_1 "/lib32/"
> >  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib32/"
> > @@ -3434,6 +3435,7 @@ struct GTY(())  machine_function {
> >  #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
> >  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
> >  #endif
> > +#endif
> 
> Should we just remove STANDARD_STARTFILE_PREFIX_{1,2} unconditionally? 
> I just took a look and the only Linux ports using these macros are MIPS
> and LoongArch (borrowed these macros from MIPS, I guess).  On a non-
> multilib distro /usr/lib is likely used, and on multilib distros the
> macros are not used anyway.

 See  for the 
rationale.  Has glibc switched since?

  Maciej

Re: [PATCH] MIPS: fix building on multiarch platform

2022-09-21 Thread Xi Ruoyao via Gcc-patches

On Wed, 2022-09-21 at 11:31 +, YunQiang Su wrote:
> On platforms that support multiarch, such as Debian,
> the filesystem hierarchy doesn't follow the old Irix style:
> lib & lib/ for native
> lib64 for N64 on N32/O32 systems
> lib32 for N32 on N64/O32 systems
> libo32 for O32 on N64/N32 systems
> 
> Thus we cannot
>  #define STANDARD_STARTFILE_PREFIX_1
>  #define STANDARD_STARTFILE_PREFIX_2
> on N32 or N64 systems, else collect2 won't look for libraries
> on /lib/.
> 
> gcc/ChangeLog:
> * configure.ac: AC_DEFINE(ENABLE_MULTIARCH, 1)
> * configure: Regenerated.
> * config.in: Regenerated.
> * config/mips/mips.h: don't define STANDARD_STARTFILE_PREFIX_1
>   if ENABLE_MULTIARCH is defined.
> * config/mips/t-linux64: define correct multiarch path when
>   multiarch is enabled.
> ---
>  gcc/config.in |  6 ++
>  gcc/config/mips/mips.h    |  2 ++
>  gcc/config/mips/t-linux64 | 21 -
>  gcc/configure |  4 
>  gcc/configure.ac  |  3 +++
>  5 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config.in b/gcc/config.in
> index 6ac17be189e..b2ce6361327 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -2312,6 +2312,12 @@
>  #endif
>  
>  
> +/* Specify if mutliarch is enabled. */
> +#ifndef USED_FOR_TARGET
> +#undef ENABLE_MULTIARCH
> +#endif
> +
> +
>  /* The size of `dev_t', as computed by sizeof. */
>  #ifndef USED_FOR_TARGET
>  #undef SIZEOF_DEV_T
> diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
> index 74b6e11aabb..fe7f5b274b9 100644
> --- a/gcc/config/mips/mips.h
> +++ b/gcc/config/mips/mips.h
> @@ -3427,6 +3427,7 @@ struct GTY(())  machine_function {
>  
>  /* If we are *not* using multilibs and the default ABI is not ABI_32
> we
>     need to change these from /lib and /usr/lib.  */
> +#ifndef ENABLE_MULTIARCH
>  #if MIPS_ABI_DEFAULT == ABI_N32
>  #define STANDARD_STARTFILE_PREFIX_1 "/lib32/"
>  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib32/"
> @@ -3434,6 +3435,7 @@ struct GTY(())  machine_function {
>  #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
>  #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
>  #endif
> +#endif

Should we just remove STANDARD_STARTFILE_PREFIX_{1,2} unconditionally? 
I just took a look and the only Linux ports using these macros are MIPS
and LoongArch (borrowed these macros from MIPS, I guess).  On a non-
multilib distro /usr/lib is likely used, and on multilib distros the
macros are not used anyway.

>  /* Load store bonding is not supported by micromips and fix_24k.  The
>     performance can be degraded for those targets.  Hence, do not bond for
> diff --git a/gcc/config/mips/t-linux64 b/gcc/config/mips/t-linux64
> index 2fdd8e00407..37d176ea309 100644
> --- a/gcc/config/mips/t-linux64
> +++ b/gcc/config/mips/t-linux64
> @@ -20,7 +20,26 @@ MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
>  MULTILIB_DIRNAMES = n32 32 64
>  MIPS_EL = $(if $(filter %el, $(firstword $(subst -, ,$(target,el)
>  MIPS_SOFT = $(if $(strip $(filter MASK_SOFT_FLOAT_ABI, 
> $(target_cpu_default)) $(filter soft, $(with_float))),soft)
> -MULTILIB_OSDIRNAMES = \
> +ifeq (yes,$(enable_multiarch))
> +  ifneq (,$(findstring gnuabi64,$(target)))
> +    MULTILIB_OSDIRNAMES = \
> +   ../lib32$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> +   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> +   ../lib$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> +  else ifneq (,$(findstring gnuabin32,$(target)))
> +    MULTILIB_OSDIRNAMES = \
> +   ../lib$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> +   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> +   ../lib64$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> +  else
> +    MULTILIB_OSDIRNAMES = \
> +   ../lib32$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> +   ../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> +   ../lib64$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> +  endif
> +else
> +  MULTILIB_OSDIRNAMES = \
> ../lib32$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
> ../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
> ../lib64$(call 
> if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
> +endif

Hmm, I don't think we should touch this.  The default setting of
MULTILIB_OSDIRNAMES is simply not designed to suit all distros (at least
for now) and many distros are patching it.  Changing it here won't give
the distro maintainers any benefit, but will force them to rebase the
patch.

If we want to make the distro maintainers' life easier we should make a
global decision (for all ports) and maybe add some configuration options
for MULTILIB_OSDIRNAMES.


-- 
Xi Ruoyao 
School of

[committed] libstdc++: Remove main() from some compile-only tests

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/17_intro/headers/c++1998/all_attributes.cc: Remove
unnecessary main function.
* testsuite/17_intro/headers/c++2011/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2014/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2017/all_attributes.cc:
Likewise.
* testsuite/17_intro/headers/c++2020/all_attributes.cc:
Likewise.
---
 .../testsuite/17_intro/headers/c++1998/all_attributes.cc | 5 -
 .../testsuite/17_intro/headers/c++2011/all_attributes.cc | 5 -
 .../testsuite/17_intro/headers/c++2014/all_attributes.cc | 5 -
 .../testsuite/17_intro/headers/c++2017/all_attributes.cc | 5 -
 .../testsuite/17_intro/headers/c++2020/all_attributes.cc | 5 -
 5 files changed, 25 deletions(-)

diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc 
b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
index 20cda779b03..b8b37473505 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++1998/all_attributes.cc
@@ -38,8 +38,3 @@
 
 #include 
 #include 
-
-int
-main()
-{
-}
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc 
b/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
index c2b4c236da7..222ab786c95 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2011/all_attributes.cc
@@ -38,8 +38,3 @@
 
 #include 
 #include 
-
-int
-main()
-{
-}
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2014/all_attributes.cc 
b/libstdc++-v3/testsuite/17_intro/headers/c++2014/all_attributes.cc
index f6c4251acbc..b31d13f22d0 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++2014/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2014/all_attributes.cc
@@ -38,8 +38,3 @@
 
 #include 
 #include 
-
-int
-main()
-{
-}
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2017/all_attributes.cc 
b/libstdc++-v3/testsuite/17_intro/headers/c++2017/all_attributes.cc
index 170ebef51c6..fd4d7d477f0 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++2017/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2017/all_attributes.cc
@@ -37,8 +37,3 @@
 
 #include 
 #include 
-
-int
-main()
-{
-}
diff --git a/libstdc++-v3/testsuite/17_intro/headers/c++2020/all_attributes.cc 
b/libstdc++-v3/testsuite/17_intro/headers/c++2020/all_attributes.cc
index 1d573a20c10..f700badb304 100644
--- a/libstdc++-v3/testsuite/17_intro/headers/c++2020/all_attributes.cc
+++ b/libstdc++-v3/testsuite/17_intro/headers/c++2020/all_attributes.cc
@@ -36,8 +36,3 @@
 
 #include 
 #include 
-
-int
-main()
-{
-}
-- 
2.37.3

[committed] libstdc++: Fix accidental duplicate test [PR91456]

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux, pushed to trunk.

-- >8 --

It looks like I committed the testcase for std::function twice, instead
of one for std::function and one for std::is_invocable_r. This replaces
the is_invocable_r one with the example from the PR.

libstdc++-v3/ChangeLog:

PR libstdc++/91456
* testsuite/20_util/function/91456.cc: Add comment with PR
number.
* testsuite/20_util/is_invocable/91456.cc: Likewise. Replace
std::function checks with std::is_invocable_r checks.
---
 libstdc++-v3/testsuite/20_util/function/91456.cc |  3 +++
 libstdc++-v3/testsuite/20_util/is_invocable/91456.cc | 10 ++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/function/91456.cc 
b/libstdc++-v3/testsuite/20_util/function/91456.cc
index 6b6631c452d..081bf20e2cf 100644
--- a/libstdc++-v3/testsuite/20_util/function/91456.cc
+++ b/libstdc++-v3/testsuite/20_util/function/91456.cc
@@ -17,6 +17,9 @@
 
 // { dg-do compile { target c++17 } }
 
+// PR 91456
+// std::function and std::is_invocable_r do not understand guaranteed elision
+
 #include 
 
 struct Immovable {
diff --git a/libstdc++-v3/testsuite/20_util/is_invocable/91456.cc 
b/libstdc++-v3/testsuite/20_util/is_invocable/91456.cc
index a946db15c55..976d257ce85 100644
--- a/libstdc++-v3/testsuite/20_util/is_invocable/91456.cc
+++ b/libstdc++-v3/testsuite/20_util/is_invocable/91456.cc
@@ -17,6 +17,9 @@
 
 // { dg-do compile { target c++17 } }
 
+// PR 91456
+// std::function and std::is_invocable_r do not understand guaranteed elision
+
 #include 
 
 #include 
@@ -27,7 +30,6 @@ struct Immovable {
   Immovable& operator=(const Immovable&) = delete;
 };
 
-Immovable get() { return {}; }
-const Immovable i = get();  // OK
-std::function f{};   // fails
-const Immovable i2 = f();
+static_assert(std::is_invocable_r_v);
+static_assert(std::is_invocable_r_v);
+static_assert(std::is_invocable_r_v);
-- 
2.37.3

[committed] libstdc++: Update synopsis test for C++11 and later

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/20_util/headers/memory/synopsis.cc: Add declarations
from C++11 and later.
---
 .../20_util/headers/memory/synopsis.cc| 66 +--
 1 file changed, 59 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc 
b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
index 03e3f80dac5..15437c72ee0 100644
--- a/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
+++ b/libstdc++-v3/testsuite/20_util/headers/memory/synopsis.cc
@@ -26,20 +26,35 @@
 # define NOTHROW
 #endif
 
-namespace std {
+namespace std
+{
+#if __cplusplus >= 201103L
+  template struct pointer_traits;
+  template struct pointer_traits;
+
+  void* align(size_t alignment, size_t size, void*& ptr, size_t& space);
+
+  struct allocator_arg_t;
+  extern const allocator_arg_t allocator_arg;
+
+  template struct uses_allocator;
+
+  template struct allocator_traits;
+#endif // C++11
+
+#if __STDC_HOSTED__
   //  lib.default.allocator, the default allocator:
   template  class allocator;
+#if __cplusplus >= 202002L
+  template 
+  constexpr bool operator==(const allocator&, const allocator&) throw();
+#else
   template <> class allocator;
   template 
-#if __cplusplus > 201703L
-  constexpr
-#endif
   bool operator==(const allocator&, const allocator&) throw();
   template 
-#if __cplusplus > 201703L
-  constexpr
-#endif
   bool operator!=(const allocator&, const allocator&) throw();
+#endif
 
   //  lib.storage.iterator, raw storage iterator:
   template  class raw_storage_iterator;
@@ -49,18 +64,55 @@ namespace std {
   pair get_temporary_buffer(ptrdiff_t n) NOTHROW;
   template 
   void return_temporary_buffer(T* p);
+#endif // HOSTED
 
   //  lib.specialized.algorithms, specialized algorithms:
+#if __cplusplus >= 201703L
+  template  constexpr T* addressof(T&) noexcept;
+#elif __cplusplus >= 201402L
+  template  T* addressof(T&) noexcept;
+#endif
   template 
   ForwardIterator
   uninitialized_copy(InputIterator first, InputIterator last,
 ForwardIterator result);
+#if __cplusplus >= 201103L
+  template 
+  ForwardIterator
+  uninitialized_copy_n(InputIterator first, Size n, ForwardIterator result);
+#endif
   template 
   void uninitialized_fill(ForwardIterator first, ForwardIterator last,
  const T& x);
   template 
   void uninitialized_fill_n(ForwardIterator first, Size n, const T& x);
 
+#if __cplusplus >= 201103L
+  template class default_delete;
+  template class default_delete;
+  template class unique_ptr;
+  template class unique_ptr;
+  template
+void swap(unique_ptr&, unique_ptr&) noexcept;
+#if __cplusplus >= 201402L
+  template unique_ptr make_unique(Args&&...);
+#endif
+
+  class bad_weak_ptr;
+  template class shared_ptr;
+  template shared_ptr make_shared(Args&&... args);
+  template
+  shared_ptr allocate_shared(const A& a, Args&&... args);
+  template void swap(shared_ptr&, shared_ptr&) noexcept;
+  template class weak_ptr;
+  template void swap(weak_ptr&, weak_ptr&) noexcept;
+  template class owner_less;
+  template class enable_shared_from_this;
+
+  template struct hash>;
+  template struct hash>;
+#endif
+
   //  lib.auto.ptr, pointers:
   template class auto_ptr;
 }
-- 
2.37.3

[COMMITTED] [PR106967] Set known NANs to undefined for flag_finite_math_only.

2022-09-21 Thread Aldy Hernandez via Gcc-patches

Richard, this is what you suggested.  Thanks.

Explicit NANs in the IL can be treated as undefined for
flag_finite_math_only.  This causes all the right things to happen wrt
threading, folding, etc.  It also saves us special casing throughout.

It occurs to me that we should do something similar for infinities for
-ffinite-math-only.  That is, drop them to the min/max representable
numbers, and adjust everything (including VARYING endpoints)
accordingly.  Furthermore, we should saturate to min/max representable
in the setter, so (upcoming) binary operators don't have to worry about
going over min/max.
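
To make the saturation idea concrete, here is a minimal C++ sketch.  It
uses a toy range type rather than GCC's frange; the names toy_range,
saturate and make_range are made up for illustration, and the
finite_math_only parameter merely stands in for flag_finite_math_only.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Toy stand-in for a floating-point range.  This is NOT GCC's frange;
// every name in this sketch is hypothetical.
struct toy_range
{
  double lo;
  double hi;
  bool undefined;   // true for the empty set
};

// Clamp an infinity to the largest representable finite magnitude.
double
saturate (double x)
{
  const double max = std::numeric_limits<double>::max ();
  if (x > max)
    return max;
  if (x < -max)
    return -max;
  return x;
}

// Build a range from two endpoints.  With finite_math_only set, a known
// NAN becomes undefined and infinite endpoints are saturated.
toy_range
make_range (double lo, double hi, bool finite_math_only)
{
  // A known NAN must have NAN on both ends.
  assert (std::isnan (lo) == std::isnan (hi));
  if (std::isnan (lo))
    {
      if (finite_math_only)
        return toy_range { 0.0, 0.0, true };
      return toy_range { lo, hi, false };
    }
  assert (lo <= hi);
  if (finite_math_only)
    {
      lo = saturate (lo);
      hi = saturate (hi);
    }
  return toy_range { lo, hi, false };
}
```

With this shape, code consuming the range never sees an infinity or a
NAN when finite_math_only is set, so it needs no special casing.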

Sigh...floating point... the gift that keeps on giving.

gcc/ChangeLog:

* value-range.cc (frange::set): Set known NANs to undefined for
flag_finite_math_only.
---
 gcc/value-range.cc | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 505eb9211a7..7e8028eced2 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -313,8 +313,13 @@ frange::set (tree min, tree max, value_range_kind kind)
   gcc_checking_assert (real_identical (TREE_REAL_CST_PTR (min),
   TREE_REAL_CST_PTR (max)));
   tree type = TREE_TYPE (min);
-  bool sign = real_isneg (TREE_REAL_CST_PTR (min));
-  set_nan (type, sign);
+  if (HONOR_NANS (type))
+   {
+ bool sign = real_isneg (TREE_REAL_CST_PTR (min));
+ set_nan (type, sign);
+   }
+  else
+   set_undefined ();
   return;
 }
 
-- 
2.37.1

[PATCH] MIPS: fix building on multiarch platform

2022-09-21 Thread YunQiang Su

On platforms that support multiarch, such as Debian,
the filesystem hierarchy doesn't follow the old Irix style:
lib & lib/ for native
lib64 for N64 on N32/O32 systems
lib32 for N32 on N64/O32 systems
libo32 for O32 on N64/N32 systems

Thus we cannot
 #define STANDARD_STARTFILE_PREFIX_1
 #define STANDARD_STARTFILE_PREFIX_2
on N32 or N64 systems, else collect2 won't look for libraries
on /lib/.

gcc/ChangeLog:
* configure.ac: AC_DEFINE(ENABLE_MULTIARCH, 1)
* configure: Regenerated.
* config.in: Regenerated.
* config/mips/mips.h: don't define STANDARD_STARTFILE_PREFIX_1
  if ENABLE_MULTIARCH is defined.
* config/mips/t-linux64: define correct multiarch path when
  multiarch is enabled.
---
 gcc/config.in |  6 ++
 gcc/config/mips/mips.h|  2 ++
 gcc/config/mips/t-linux64 | 21 -
 gcc/configure |  4 
 gcc/configure.ac  |  3 +++
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/gcc/config.in b/gcc/config.in
index 6ac17be189e..b2ce6361327 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -2312,6 +2312,12 @@
 #endif
 
 
+/* Specify if multiarch is enabled. */
+#ifndef USED_FOR_TARGET
+#undef ENABLE_MULTIARCH
+#endif
+
+
 /* The size of `dev_t', as computed by sizeof. */
 #ifndef USED_FOR_TARGET
 #undef SIZEOF_DEV_T
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 74b6e11aabb..fe7f5b274b9 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -3427,6 +3427,7 @@ struct GTY(())  machine_function {
 
 /* If we are *not* using multilibs and the default ABI is not ABI_32 we
need to change these from /lib and /usr/lib.  */
+#ifndef ENABLE_MULTIARCH
 #if MIPS_ABI_DEFAULT == ABI_N32
 #define STANDARD_STARTFILE_PREFIX_1 "/lib32/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib32/"
@@ -3434,6 +3435,7 @@ struct GTY(())  machine_function {
 #define STANDARD_STARTFILE_PREFIX_1 "/lib64/"
 #define STANDARD_STARTFILE_PREFIX_2 "/usr/lib64/"
 #endif
+#endif
 
 /* Load store bonding is not supported by micromips and fix_24k.  The
performance can be degraded for those targets.  Hence, do not bond for
diff --git a/gcc/config/mips/t-linux64 b/gcc/config/mips/t-linux64
index 2fdd8e00407..37d176ea309 100644
--- a/gcc/config/mips/t-linux64
+++ b/gcc/config/mips/t-linux64
@@ -20,7 +20,26 @@ MULTILIB_OPTIONS = mabi=n32/mabi=32/mabi=64
 MULTILIB_DIRNAMES = n32 32 64
 MIPS_EL = $(if $(filter %el, $(firstword $(subst -, ,$(target,el)
 MIPS_SOFT = $(if $(strip $(filter MASK_SOFT_FLOAT_ABI, $(target_cpu_default)) $(filter soft, $(with_float))),soft)
-MULTILIB_OSDIRNAMES = \
+ifeq (yes,$(enable_multiarch))
+  ifneq (,$(findstring gnuabi64,$(target)))
+MULTILIB_OSDIRNAMES = \
+   ../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
+   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
+   ../lib$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
+  else ifneq (,$(findstring gnuabin32,$(target)))
+MULTILIB_OSDIRNAMES = \
+   ../lib$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
+   ../libo32$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
+   ../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
+  else
+MULTILIB_OSDIRNAMES = \
+   ../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
+   ../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
+   ../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
+  endif
+else
+  MULTILIB_OSDIRNAMES = \
    ../lib32$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabin32$(MIPS_SOFT)) \
    ../lib$(call if_multiarch,:mips$(MIPS_EL)-linux-gnu$(MIPS_SOFT)) \
    ../lib64$(call if_multiarch,:mips64$(MIPS_EL)-linux-gnuabi64$(MIPS_SOFT))
+endif
diff --git a/gcc/configure b/gcc/configure
index 817d765568e..f9a796d6bb4 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -7841,6 +7841,10 @@ if test x${enable_multiarch} = xauto; then
 enable_multiarch=no
   fi
 fi
+if test x${enable_multiarch} = xyes; then
+  $as_echo "#define ENABLE_MULTIARCH 1" >>confdefs.h
+
+fi
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for multiarch configuration" >&5
 $as_echo_n "checking for multiarch configuration... " >&6; }
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 59f205a1781..44631e23033 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -886,6 +886,9 @@ if test x${enable_multiarch} = xauto; then
 enable_multiarch=no
   fi
 fi
+if test x${enable_multiarch} = xyes; then
+  AC_DEFINE(ENABLE_MULTIARCH, 1)
+fi
 AC_MSG_CHECKING(for multiarch configuration)
 AC_SUBST(enable_multiarch)
 AC_MSG_RESULT($enable_multiarch$ma_msg_suffix)
-- 
2.30.2

[PATCH] tree-optimization/106984 - tsan and COND_EXPR GIMPLE

2022-09-21 Thread Richard Biener via Gcc-patches

The following adjusts a missed spot in TSAN for the RHS COND_EXPR
GIMPLE IL rework.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/106984
* tsan.cc (instrument_builtin_call): Build the COND_EXPR condition in
a separate statement.

* gcc.dg/tsan/pr106984.c: New testcase.
---
 gcc/testsuite/gcc.dg/tsan/pr106984.c |  7 +++
 gcc/tsan.cc  | 13 +++--
 2 files changed, 14 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tsan/pr106984.c

diff --git a/gcc/testsuite/gcc.dg/tsan/pr106984.c b/gcc/testsuite/gcc.dg/tsan/pr106984.c
new file mode 100644
index 000..69cf83d547a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tsan/pr106984.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=thread" } */
+
+int svcsw (int *ptr, int oldval, int newval)
+{
+  return __sync_val_compare_and_swap (ptr, oldval, newval);
+}
diff --git a/gcc/tsan.cc b/gcc/tsan.cc
index 79d4582acd1..2406527c96a 100644
--- a/gcc/tsan.cc
+++ b/gcc/tsan.cc
@@ -620,15 +620,16 @@ instrument_builtin_call (gimple_stmt_iterator *gsi)
maybe_clean_or_replace_eh_stmt (stmt, gsi_stmt (*gsi));
if (tsan_atomic_table[i].action == val_cas && lhs)
  {
-   tree cond;
stmt = gsi_stmt (*gsi);
-   g = gimple_build_assign (make_ssa_name (TREE_TYPE (t)), t);
+   tree t2 = make_ssa_name (TREE_TYPE (t));
+   g = gimple_build_assign (t2, t);
gsi_insert_after (gsi, g, GSI_NEW_STMT);
t = make_ssa_name (TREE_TYPE (TREE_TYPE (decl)), stmt);
-   cond = build2 (NE_EXPR, boolean_type_node, t,
-  build_int_cst (TREE_TYPE (t), 0));
-   g = gimple_build_assign (lhs, COND_EXPR, cond, args[1],
-gimple_assign_lhs (g));
+   tree cond = make_ssa_name (boolean_type_node);
+   g = gimple_build_assign (cond, NE_EXPR,
+t, build_zero_cst (TREE_TYPE (t)));
+   gsi_insert_after (gsi, g, GSI_NEW_STMT);
+   g = gimple_build_assign (lhs, COND_EXPR, cond, args[1], t2);
gimple_call_set_lhs (stmt, t);
update_stmt (stmt);
gsi_insert_after (gsi, g, GSI_NEW_STMT);
-- 
2.35.3

Re: [PATCH][RFH] Wire ranger into FRE

2022-09-21 Thread Richard Biener via Gcc-patches

On Mon, 19 Sep 2022, Andrew MacLeod wrote:

> Yeah, currently the internal cache isn't really wired into the fold_using_range
> as it is supposed to be a consistent internal state, so it's not currently
> expecting to work in a situation where what it thinks is global state might
> change.
> 
> I figured I better go back and watch your entire VN presentation, I only
> caught the last bit live.
> 
> Let me try to understand exactly what you want/need.. and let me use a simple
> example from that talk (yes I just typed it in :-).
> 
>   n_2 = 1
>   i_4 = 0
>   val_5 = 0
> :
>   # i_1 = PHI 
>   #val_2 = PHI 
>   val_6 = val_2 + 1;
>   i_7 = i_1 + 1
>   if (i_7 < n_3)
>     goto ;
>   else
>     goto ;
> 
>   _8 = val_6
>   return _8
> 
> 
> In an ideal world, While you are processing bb 3 the first iteration,  you
> want edge 4->3 to be unexecutable as far as ranger is concerned.  You walk
> thru the block, and get to the end.  and in this example, you'd be done cause
> any queries you make range would have if (2 < 1),
> 
> For the case when we do need to iterate (say n_2 = 3), you need to go back and
> rewalk that block, with that edge no longer marked as unexecutable?  And part
> of the problem would be that ranger's cache for bb 3 would have all those
> values from the first pass. and everything explodes.

Yep.

> It looks like you created a fur_source to manually adjust PHIs within the
> fold_stmt query to ignore edges that are not marked executable.

Yes, and use the current values from the VN lattice when looking at
statement operands.

> So now let me start with my questions.
> 
> In theory, if you could tell it "kill the cache for bb3"  change the
> executable state and re walk it, that would work?  This clearly gets a little
> more complicated as the size of the region that is being iterated grows, so
> you would want to be able to invalidate the cache for a range of blocks.

Yep.  I do have such a range of blocks nicely available.  I would also
know the points when I change executability of an edge (always goes from
not executable to executable) in case that would help more.

> That's not quite as straightforward as it might seem because the cache for
> values per basic block is indexed by ssa-name, so you can't simply say, "kill
> bb3's value": you need to walk all the ssa-names that have a cache object, and
> if they have an entry for bb3, kill that.  But let's leave that alone for a
> minute, because that is a solvable problem.
> 
> Ranger also has a globally available flag that it uses to internally flag and
> track executable state of edge:
> 
>   auto_edge_flag non_executable_edge_flag;
> 
> The gori object which is instantiated with any ranger instance only uses that
> to determine if an edge should be processed for a value  ie
> 
> gori_compute::outgoing_edge_range_p (vrange &r, edge e, tree name,
>                                      range_query &q)
> {
>     if ((e->flags & m_not_executable_flag))
>     {
>   r.set_undefined ();
>   return true;
> 
> So it seems to me that if you all set/cleared this flag along with
> EDGE_EXECUTABLE  (and it means the opposite by the way.  TRUE means the edge
> is not_executable.. )  That at least all your queries through ranger would be
> picking up the values you are looking for.  including those PHI's.

Ah, I see.

> That would then just leave you with the stale cache state to deal with?   And
> if we can resolve that, would all just work?  at least in theory?

In theory, yes.  Besides that, the use-def walking of the cache is not
wired up with fur_*

> There is a concept in the cache of staleness which is used when we recalculate
> over back edges, but it's really oriented towards a model where we only ever
> move edges from executable to non-executable.  Your VN approach moves them
> the other way and I have no confidence it would work properly in that mode.
> 
> It's not super efficient, but as a starting point we might be able to do
> something like
>   1 - when you enter a region call a new routine called "Mark state" which
> would push an empty bitmap over ssa-names onto a stack
>   2 - every time the cache creates an entry, set the bit for the ssa-name in
> the bitmap at the top of the stack
>   3 - when you call "reset state", simply kill all the cache entries for any
> ssa name set in the bitmap.  And also kill ranger's internal global value for
> the name.  This would in essence reset everything it knows about that ssa
> name.

Value-numbering essentially does such things to be able to roll-back,
it records "change objects" that can be used to remove elements added
to hash-tables efficiently for example.
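
A minimal sketch of such a mark/reset scheme, assuming a plain hash map
keyed by SSA name.  The class rollback_cache and its members are
hypothetical; this is neither ranger's nor value-numbering's actual
interface, only an illustration of recording changes so they can be
undone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache that can forget everything touched since a mark.
class rollback_cache
{
public:
  // "Mark state": remember the current position in the change log.
  void mark () { m_marks.push_back (m_log.size ()); }

  // Create or overwrite an entry.  While a mark is active, record the
  // name so the entry can be killed again by reset ().
  void set (const std::string &name, int value)
  {
    if (!m_marks.empty ())
      m_log.push_back (name);
    m_map[name] = value;
  }

  bool has (const std::string &name) const
  { return m_map.count (name) != 0; }

  // "Reset state": kill every entry recorded since the most recent mark.
  void reset ()
  {
    assert (!m_marks.empty ());
    std::size_t start = m_marks.back ();
    m_marks.pop_back ();
    while (m_log.size () > start)
      {
        m_map.erase (m_log.back ());
        m_log.pop_back ();
      }
  }

private:
  std::unordered_map<std::string, int> m_map;
  std::vector<std::string> m_log;      // names touched, oldest first
  std::vector<std::size_t> m_marks;    // log positions of active marks
};
```

Note that an entry which existed before the mark but was overwritten
after it is killed outright by reset (), not restored to its old value.
That matches the "reset everything it knows about that name" behaviour
described above, including its cost of losing earlier information.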

> That would have the drawback of losing any information that had been calculated
> earlier, and possibly trigger some additional calculations as that value will
> now be stale elsewhere in the IL.   We could make it more efficient if you
> also provided a bitmap over the basic blocks in the region (or some way of
> determining that).  Then we would just kill the entries

Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-09-21 Thread Chung-Lin Tang via Gcc-patches





On 2022/9/21 5:01 PM, Jakub Jelinek wrote:

On Wed, Sep 21, 2022 at 03:45:36PM +0800, Chung-Lin Tang via Gcc-patches wrote:

Hi Tom,
I had a patch submitted earlier, where I reported that the current way of
implementing barriers in libgomp on nvptx created a quite significant
performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't accepted well (admittedly, it was kind of a hack).
So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.

Basically, instead of trying to have the GPU do CPU-with-OS-like things that it
isn't suited for, barriers are implemented simplistically with bar.*
synchronization instructions.  Tasks are processed after threads have joined,
and only if team->task_count != 0

(arguably, there might be a little bit of performance forfeited where earlier
arriving threads could've been used to process tasks ahead of other threads.
But that again falls into requiring implementing complex futex-wait/wake like
behavior. Really, that kind of tasking is not what target offloading is
usually used for)


I admit I don't have a good picture if people in real-world actually use
tasking in offloading regions and how much and in what way, but the above
definitely would be a show-stopper for typical tasking workloads, where
one thread (usually from master/masked/single construct's body) creates lots
of tasks and can spend considerable amount of time in those preparations,
while other threads are expected to handle those tasks.


I think the most common use case for target offloading is "parallel for".

Really, not simply removing tasking altogether from target regions in the
specification is just looking for trouble.

If asynchronous offloaded tasks are to be supported, something at the whole GPU
offload region level is much more reasonable, like the async clause
functionality in OpenACC.


Do we have an idea how are other implementations handling this?
I think it should be easily observable with atomics, have
master/masked/single that creates lots of tasks and then spends a long time
doing something, have very small task bodies that just increment some atomic
counter and at the end of the master/masked/single see how many tasks were
already encountered.


This could be an interesting test...
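
A rough host-side sketch of the test described above, assuming plain
OpenMP in C++ built with -fopenmp (without that flag the pragmas are
ignored and everything runs serially).  The names task_probe_result and
probe_task_progress are hypothetical, and an offloading version would
still need the region wrapped in a target construct with suitable
mapping.

```cpp
#include <cassert>

// Hypothetical result type for the probe below.
struct task_probe_result
{
  int seen_at_end_of_single;  // tasks already run when the creator was done
  int total;                  // tasks run once the parallel region ended
};

// One thread creates NTASKS tiny tasks, then stays busy for BUSY_ITERS
// iterations, then samples how many tasks other threads have run so far.
task_probe_result
probe_task_progress (int ntasks, int busy_iters)
{
  assert (ntasks >= 0 && busy_iters >= 0);
  int counter = 0;
  int seen = 0;
#pragma omp parallel shared(counter, seen)
  {
#pragma omp single
    {
      for (int i = 0; i < ntasks; ++i)
        {
#pragma omp task shared(counter)
          {
#pragma omp atomic update
            ++counter;
          }
        }
      // Simulate the creating thread spending a long time elsewhere.
      volatile long sink = 0;
      for (int i = 0; i < busy_iters; ++i)
        sink = sink + i;
#pragma omp atomic read
      seen = counter;
    }
  }
  // All tasks have completed at the implicit barrier ending the region.
  return task_probe_result { seen, counter };
}
```

If seen_at_end_of_single stays near zero while total equals ntasks, the
other threads only started on the tasks once the creating thread reached
the barrier, which is the behaviour being questioned here.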


Note, I don't have any smart ideas how to handle this instead and what
you posted might be ok for what people usually do on offloading targets
in OpenMP if they use tasking at all, just wanted to mention that there
could be workloads where the above is a serious problem.  If there are
say hundreds of threads doing nothing until a single thread reaches a
barrier and there are hundreds of pending tasks...


I think it might still be doable, just not in the very fine "wake one thread"
style that the Linux-based implementation was doing.


E.g. note we have that 64 pending task limit after which we start to
create undeferred tasks, so if we never start handling tasks until
one thread is done with them, that would mean the single thread
would create 64 deferred tasks and then handle all the others itself
making it even longer until the other tasks can deal with it.


Okay, thanks for reminding that.

Chung-Lin

Re: [PATCH 09/10] fortran: Support clobbering of variable subreferences [PR88364]

2022-09-21 Thread Thomas Koenig via Gcc-patches




Hi Harald,


I think I understand much of what is said, but I feel that I do
not really understand what *clobber* means for the different
beasts we are discussing (although I have an impression of what
it means for a scalar object).


Obviously, "clobber" means taking a big stick and hitting the beast
in question over the head with it :-)

More seriously: My understanding is that a clobber is a hint to
the middle end that the value in question will not be used,
and that operations leading to this value can be removed,
unless they are used otherwise.

If I'm wrong or imprecise, I'm sure somebody will correct me :-)

Regards

Thomas

Re: [PATCH] Add __builtin_iseqsig()

2022-09-21 Thread FX via Gcc-patches

ping*2



0001-Add-__builtin_iseqsig.patch
Description: Binary data


> Le 9 sept. 2022 à 19:55, FX  a écrit :
> 
> ping
> 
> 
>> Le 1 sept. 2022 à 23:02, FX  a écrit :
>> 
>> Attached patch adds __builtin_iseqsig() to the middle-end and C family
>> front-ends.
>> Testing does not currently check whether the signaling part works, because
>> with optimisation it actually does not (preexisting compiler bug:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106805)
>> 
>> Bootstrapped and regtested on x86_64-linux.
>> OK to commit?
>> 
>> (I’m not very skilled for middle-end hacking, so I’m sure there will be 
>> modifications to make.)
>> 
>> FX
>> <0001-Add-__builtin_iseqsig.patch>
>

[wwwdocs] Add reference to pp_format to Coding Conventions

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Finding the docs for the GCC-specific diagnostic formats isn't easy.
This might help.

OK for wwwdocs?

---
 htdocs/codingconventions.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index e4d30510..f1d1f165 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -333,7 +333,8 @@ code token.
 Diagnostics using the GCC diagnostic functions should generally
 use the GCC-specific formats such as %qs or
 %< and %> for quoting and
-%m for errno numbers.
+%m for errno numbers. See
+pp_format in pretty-print.cc for supported formats.
 
 Identifiers should generally be formatted with %E or
 %qE; use of identifier_to_locale is needed
-- 
2.37.3

[wwwdocs] Add C++23 library additions to GCC 13 release notes

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Pushed to wwwdocs.

---
 htdocs/gcc-13/changes.html | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index 390193ca..a7d88038 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -158,7 +158,22 @@ a work-in-progress.
   warnings have been extended to warn in more contexts.
 
 
-
+Runtime Library (libstdc++)
+
+  Improved experimental support for C++23, including:
+
+Additions to the ranges header:
+  views::zip, views::zip_transform,
+  views::adjacent, views::adjacent_transform,
+  views::pairwise, views::slide,
+  views::chunk, views::chunk_by.
+
+
+  
+  Support for the experimental/scope header
+  from v3 of the Library Fundamentals Technical Specification.
+  
+
 
 
 
-- 
2.37.3

Re: [COMMITTED] frange::maybe_isnan() should return FALSE for undefined ranges.

2022-09-21 Thread Richard Biener via Gcc-patches

On Wed, Sep 21, 2022 at 9:52 AM Aldy Hernandez  wrote:
>
> The reason the flags were uninitialized was because they were unused,
> similarly for m_type.  But you're right, it is icky and prone to bugs.
> I just thought it was cheap to set_undefined by just flipping
> m_kind=VR_UNDEFINED, but it smells like premature optimization.
>
> How about this?

LGTM

>
> Aldy
>
> On Wed, Sep 21, 2022 at 9:39 AM Richard Biener
>  wrote:
> >
> > On Tue, Sep 20, 2022 at 8:23 PM Aldy Hernandez via Gcc-patches
> >  wrote:
> > >
> > > Undefined ranges have undefined NAN bits.  We can't depend on them,
> > > as they may contain garbage.
> >
> > Ick ;)  Can you add a comment at least?
> >
> > > This patch returns false from
> > > maybe_isnan() for undefined ranges (the empty set).
> > >
> > > gcc/ChangeLog:
> > >
> > > * value-range.h (frange::maybe_isnan): Return false for
> > > undefined ranges.
> > > ---
> > >  gcc/value-range.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > > index 7d5584a9294..325ed08f290 100644
> > > --- a/gcc/value-range.h
> > > +++ b/gcc/value-range.h
> > > @@ -1210,6 +1210,8 @@ frange::known_isinf () const
> > >  inline bool
> > >  frange::maybe_isnan () const
> > >  {
> > > +  if (undefined_p ())
> > > +return false;
> > >return m_pos_nan || m_neg_nan;
> > >  }
> > >
> > > --
> > > 2.37.1
> > >
> >

Re: [PATCH] Remove legacy -gz=zlib-gnu

2022-09-21 Thread Richard Biener via Gcc-patches

On Wed, Sep 21, 2022 at 9:49 AM Martin Liška  wrote:
>
> On 9/21/22 09:36, Richard Biener wrote:
> > If it's all configure time what's the point in
> > "deprecating" it?
>
> Note it's one of our options -gz where 'zlib-gnu' is one of the possible 
> option values.

I see.  Not sure if deprecating is really necessary, you need to keep
recognizing zlib-gnu as no-op anyway.  So I'd just go ahead and remove support
for it.

> Martin

Re: [PATCH v6, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

2022-09-21 Thread Kewen.Lin via Gcc-patches

Hi Haochen,

on 2022/6/24 10:02, HAO CHEN GUI wrote:
> Hi,
>   This patch implements optab f[min/max]_optab by xs[min/max]dp on rs6000.
> Tests show that outputs of xs[min/max]dp are consistent with the standard
> of C99 fmin/max.
> 
>   This patch also binds __builtin_vsx_xs[min/max]dp to fmin/max instead
> of smin/max. So the builtins always generate xs[min/max]dp on all
> platforms.
> 
>   Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-06-24 Haochen Gui 
> 
> gcc/
>   PR target/103605
>   * config/rs6000/rs6000.md (FMINMAX): New.
>   (minmax_op): New.
>   (f<minmax_op><mode>3): New pattern by UNSPEC_FMAX and UNSPEC_FMIN.

Nit: the entries for UNSPEC_FMAX and UNSPEC_FMIN are missing here.

>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_xsmaxdp): Set
>   pattern to fmaxdf3.
>   (__builtin_vsx_xsmindp): Set pattern to fmindf3.
> 
> gcc/testsuite/
>   PR target/103605
>   * gcc.dg/powerpc/pr103605.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index f4a9f24bcc5..8b735493b40 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1613,10 +1613,10 @@
>  XSCVSPDP vsx_xscvspdp {}
> 
>const double __builtin_vsx_xsmaxdp (double, double);
> -XSMAXDP smaxdf3 {}
> +XSMAXDP fmaxdf3 {}
> 
>const double __builtin_vsx_xsmindp (double, double);
> -XSMINDP smindf3 {}
> +XSMINDP fmindf3 {}
> 
>const double __builtin_vsx_xsrdpi (double);
>  XSRDPI vsx_xsrdpi {}
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index bf85baa5370..ae0dd98f0f9 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -158,6 +158,8 @@ (define_c_enum "unspec"
> UNSPEC_HASHCHK
> UNSPEC_XXSPLTIDP_CONST
> UNSPEC_XXSPLTIW_CONST
> +   UNSPEC_FMAX
> +   UNSPEC_FMIN
>])
> 
>  ;;
> @@ -5341,6 +5343,22 @@ (define_insn_and_split "*s3_fpr"
>DONE;
>  })
> 
> +
> +(define_int_iterator FMINMAX [UNSPEC_FMAX UNSPEC_FMIN])
> +
> +(define_int_attr  minmax_op [(UNSPEC_FMAX "max")
> +  (UNSPEC_FMIN "min")])
> +
> +(define_insn "f<minmax_op><mode>3"
> +  [(set (match_operand:SFDF 0 "vsx_register_operand" "=wa")
> + (unspec:SFDF [(match_operand:SFDF 1 "vsx_register_operand" "wa")
> +   (match_operand:SFDF 2 "vsx_register_operand" "wa")]
> +  FMINMAX))]
> +  "TARGET_VSX && !flag_finite_math_only"
> +  "xs<minmax_op>dp %x0,%x1,%x2"
> +  [(set_attr "type" "fp")]
> +)
> +
>  (define_expand "movcc"
> [(set (match_operand:GPR 0 "gpc_reg_operand")
>(if_then_else:GPR (match_operand 1 "comparison_operator")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103605.c b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> new file mode 100644
> index 000..1c938d40e61
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103605.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */

Nit: This dg-do line isn't needed.  OK with or without two nits fixed.  Thanks!

BR,
Kewen

RE: [PATCH] Add attribute hot judgement for INLINE_HINT_known_hot hint.

2022-09-21 Thread Cui, Lili via Gcc-patches

> Thank you.  Can you please also add a testcase that tests for this.
> So you modify imagemagick marking attribute hot on the specific inline?

Thanks Honza. Added the testcase. I didn't modify the source code of
538.imagick_r; the original source code has an attribute like:

#define magick_hot_spot  __attribute__((__hot__))
static Cache *SetPixelCacheNexusPixels( ... ) magick_hot_spot;

> I will try to also look again at your earlier patch - I had very busy summer 
> and
> unfortunately lost track on this one.
>
NP, I guessed you were busy during that time. My earlier patch was partially
duplicated with function "Elimination_by_inlining_prob", except for the
"parameter points to caller local memory" part; maybe we can find a suitable
place to add the local memory part to the IPA.

> Honza

gcc/ChangeLog

  * ipa-inline-analysis.cc (do_estimate_edge_time): Add function attribute
  judgement for INLINE_HINT_known_hot hint.

gcc/testsuite/ChangeLog:

  * gcc.dg/ipa/inlinehint-6.c: New test.
---
 gcc/ipa-inline-analysis.cc  | 13 ---
 gcc/testsuite/gcc.dg/ipa/inlinehint-6.c | 47 +
 2 files changed, 56 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/inlinehint-6.c

diff --git a/gcc/ipa-inline-analysis.cc b/gcc/ipa-inline-analysis.cc
index 1ca685d1b0e..7bd29c36590 100644
--- a/gcc/ipa-inline-analysis.cc
+++ b/gcc/ipa-inline-analysis.cc
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "cfgexpand.h"
 #include "gimplify.h"
+#include "attribs.h"
 
 /* Cached node/edge growths.  */
 fast_call_summary *edge_growth_cache = 
NULL;
@@ -249,15 +250,19 @@ do_estimate_edge_time (struct cgraph_edge *edge, sreal 
*ret_nonspec_time)
   hints = estimates.hints;
 }
 
-  /* When we have profile feedback, we can quite safely identify hot
- edges and for those we disable size limits.  Don't do that when
- probability that caller will call the callee is low however, since it
+  /* When we have profile feedback or function attribute, we can quite safely
+ identify hot edges and for those we disable size limits.  Don't do that
+ when probability that caller will call the callee is low however, since it
  may hurt optimization of the caller's hot path.  */
-  if (edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
+  if ((edge->count.ipa ().initialized_p () && edge->maybe_hot_p ()
   && (edge->count.ipa () * 2
  > (edge->caller->inlined_to
 ? edge->caller->inlined_to->count.ipa ()
 : edge->caller->count.ipa (
+  || (lookup_attribute ("hot", DECL_ATTRIBUTES (edge->caller->decl))
+ != NULL
+&& lookup_attribute ("hot", DECL_ATTRIBUTES (edge->callee->decl))
+ != NULL))
 hints |= INLINE_HINT_known_hot;
 
   gcc_checking_assert (size >= 0);
diff --git a/gcc/testsuite/gcc.dg/ipa/inlinehint-6.c b/gcc/testsuite/gcc.dg/ipa/inlinehint-6.c
new file mode 100644
index 000..1f3be641c6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/inlinehint-6.c
@@ -0,0 +1,47 @@
+/* { dg-options "-O3 -c -fdump-ipa-inline-details -fno-early-inlining -fno-ipa-cp"  } */
+/* { dg-add-options bind_pic_locally } */
+
+#define size_t long long int
+
+struct A
+{
+  size_t f1, f2, f3, f4;
+};
+struct C
+{
+  struct A a;
+  size_t b;
+};
+struct C x;
+
+__attribute__((hot)) struct C callee (struct A *a, struct C *c)
+{
+  c->a=(*a);
+
+  if((c->b + 7) & 17)
+   {
+  c->a.f1 = c->a.f2 + c->a.f1;
+  c->a.f2 = c->a.f3 - c->a.f2;
+  c->a.f3 = c->a.f2 + c->a.f3;
+  c->a.f4 = c->a.f2 - c->a.f4;
+  c->b = c->a.f2;
+
+}
+  return *c;
+}
+
+__attribute__((hot)) struct C caller (size_t d, size_t e, size_t f, size_t g, struct C *c)
+{
+  struct A a;
+  a.f1 = 1 + d;
+  a.f2 = e;
+  a.f3 = 12 + f;
+  a.f4 = 68 + g;
+  if (c->b > 0)
+    return callee (&a, c);
+  else
+return *c;
+}
+
+/* { dg-final { scan-ipa-dump "known_hot"  "inline"  } } */
+
-- 
2.17.1

Thanks,
Lili.



0001-Add-attribute-hot-judgement-for-INLINE_HINT_known_ho.patch
Description: 0001-Add-attribute-hot-judgement-for-INLINE_HINT_known_ho.patch

Re: [PATCH] Fortran 2018 rounding modes changes

2022-09-21 Thread FX via Gcc-patches

Follow-up patch, including a test, committed as attached.

FX



0001-Fortran-handle-RADIX-kind-in-IEEE_SET_ROUNDING_MODE.patch
Description: Binary data

[PATCH] aarch64: Rewrite -march=native to -mcpu if no other -mcpu or -mtune is given

2022-09-21 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

We have received requests to improve the out-of-the box experience and
performance of AArch64 GCC users, particularly those porting software from other
architectures. This has many aspects. One such aspect is apps built natively
with -march=native used as a tuning flag in the Makefile.
On AArch64 this selects the right architecture features on GNU+Linux for the
host system but tunes for the "generic" CPU target.
This patch makes GCC also tune for the host CPU, as well as selecting its
architecture. That is, it translates -march=native into -mcpu=native.
This maintains the documentation that it "causes the compiler to pick the
architecture of the host system" since -mcpu=native does that, but it also
gives a better performance experience for the user.

If the user explicitly asked for a particular CPU tuning through -mcpu or
-mtune then we don't do this rewriting so that the user option is honoured.
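
The decision described above can be sketched roughly as follows. This is only
an illustration in plain C, not the driver spec machinery the patch actually
touches; the function names has_prefix and should_rewrite_march_native are
hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not GCC driver code: true when ARG starts with
   PREFIX.  */
static bool
has_prefix (const char *arg, const char *prefix)
{
  return strncmp (arg, prefix, strlen (prefix)) == 0;
}

/* Returns true when -march=native should be treated as -mcpu=native:
   -march=native is present and the user gave no explicit -mcpu= or
   -mtune= option.  */
bool
should_rewrite_march_native (int argc, const char *const *argv)
{
  bool march_native = false;
  bool explicit_tuning = false;

  for (int i = 0; i < argc; i++)
    {
      if (strcmp (argv[i], "-march=native") == 0)
	march_native = true;
      else if (has_prefix (argv[i], "-mcpu=")
	       || has_prefix (argv[i], "-mtune="))
	explicit_tuning = true;
    }

  return march_native && !explicit_tuning;
}
```

In this sketch an explicit -mcpu= or -mtune= always wins, which matches the
intent stated above that the user's option is honoured.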

This would have been a one-line patch if it wasn't for --with-tune
configure-time arguments. When GCC is configured with --with-tune= the
OPTION_DEFAULT_SPECS will insert an -mtune= in the options if no other
-mcpu or -mtune options were given. This will spook the aforementioned desired
rewriting of -march=native into -mcpu=native, though I'd argue that we want to
do the rewrite even then. Therefore, this patch moves some specs in aarch64.h
around and refactors the --with-tune rewriting into CONFIG_TUNE_SPEC so that
the materialization of the implicit -mtune= does not happen if -march=native
is used.

Bootstrapped and tested on aarch64-none-linux-gnu and checked with the output
of -### from the driver that the option rewriting works as expected on
aarch64-linux-gnu.

Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64.h (HAVE_LOCAL_CPU_DETECT,
EXTRA_SPEC_FUNCTIONS, MCPU_MTUNE_NATIVE_SPECS): Move definitions up before
OPTION_DEFAULT_SPECS.
(MCPU_MTUNE_NATIVE_SPECS): Pass "cpu" to
local_cpu_detect when rewriting -march=native and no -mcpu or -mtune
is given.
(CONFIG_TUNE_SPEC): Define.
(OPTION_DEFAULT_SPECS): Use CONFIG_TUNE_SPEC for "tune".


march-mcpu.patch
Description: march-mcpu.patch

[COMMITTED] [PR106967] frange: revamp relational operators for NANs.

2022-09-21 Thread Aldy Hernandez via Gcc-patches

Since NANs can be inserted by other passes even for -ffinite-math-only,
we can't depend on the flag to determine if a NAN is a possibility.
Instead, we must explicitly check for them.

In the case of -ffinite-math-only, paths leading up to a NAN are
undefined and can be considered unreachable.  I have audited all the
relational code and made sure we're handling the known NAN case before
anything else, setting undefined when appropriate.

In the process, I revamped all the relational code handling NANs to
correctly notice paths that are unreachable.

The basic structure for ordered relational operators (except != of
course) is this:

If either operand is a known NAN, return FALSE.

The true side of a relop when one operand is a NAN is
unreachable.

On the false side of a relop when one operand is a NAN, we
know nothing about the other operand.
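
The rule that a known NAN operand folds an ordered relop to FALSE mirrors how
IEEE 754 comparisons behave at run time. A minimal sketch in plain C (the
helper names are made up for illustration and are not part of
range-op-float.cc; it assumes the code is built without -ffast-math or
-ffinite-math-only):

```c
#include <math.h>
#include <stdbool.h>

/* True when every ordered comparison between A and B, plus ==, is
   false.  With IEEE 754 semantics this holds whenever either operand
   is a NAN.  */
bool
ordered_relops_all_false (double a, double b)
{
  return !(a < b) && !(a <= b) && !(a > b) && !(a >= b) && !(a == b);
}

/* != is the exception mentioned above: it is true when either operand
   is a NAN.  */
bool
not_equal (double a, double b)
{
  return a != b;
}
```

So the "true" edge of, say, x < y can never be taken when x is a NAN, which
is why that side is treated as unreachable.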

Regstrapped on x86-64 and ppc64le Linux.
lapack testing on x86-64 with and without -ffinite-math-only.

p.s. This is in addition to the suggestions by Richi in the PR.  I'll do
this shortly.

PR tree-optimization/106967

gcc/ChangeLog:

* range-op-float.cc (foperator_equal::fold_range): Adjust for NAN.
(foperator_equal::op1_range): Same.
(foperator_not_equal::fold_range): Same.
(foperator_not_equal::op1_range): Same.
(foperator_lt::fold_range): Same.
(foperator_lt::op1_range): Same.
(foperator_lt::op2_range): Same.
(foperator_le::fold_range): Same.
(foperator_le::op1_range): Same.
(foperator_le::op2_range): Same.
(foperator_gt::fold_range): Same.
(foperator_gt::op1_range): Same.
(foperator_gt::op2_range): Same.
(foperator_ge::fold_range): Same.
(foperator_ge::op1_range): Same.
(foperator_ge::op2_range): Same.
(foperator_unordered::op1_range): Same.
(foperator_ordered::fold_range): Same.
(foperator_ordered::op1_range): Same.
(build_le): Assert that we don't have a NAN.
(build_lt): Same.
(build_gt): Same.
(build_ge): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr106967.c: New test.
---
 gcc/range-op-float.cc| 270 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr106967.c |  16 ++
 2 files changed, 193 insertions(+), 93 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr106967.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 1e39a07ab97..2bd3dc9253f 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -230,11 +230,8 @@ frange_add_zeros (frange , tree type)
 static bool
 build_le (frange , tree type, const frange )
 {
-  if (val.known_isnan ())
-{
-  r.set_undefined ();
-  return false;
-}
+  gcc_checking_assert (!val.known_isnan ());
+
   r.set (type, dconstninf, val.upper_bound ());
 
   // Add both zeros if there's the possibility of zero equality.
@@ -248,11 +245,8 @@ build_le (frange , tree type, const frange )
 static bool
 build_lt (frange , tree type, const frange )
 {
-  if (val.known_isnan ())
-{
-  r.set_undefined ();
-  return false;
-}
+  gcc_checking_assert (!val.known_isnan ());
+
   // < -INF is outside the range.
   if (real_isinf (_bound (), 1))
 {
@@ -272,11 +266,8 @@ build_lt (frange , tree type, const frange )
 static bool
 build_ge (frange , tree type, const frange )
 {
-  if (val.known_isnan ())
-{
-  r.set_undefined ();
-  return false;
-}
+  gcc_checking_assert (!val.known_isnan ());
+
   r.set (type, val.lower_bound (), dconstinf);
 
   // Add both zeros if there's the possibility of zero equality.
@@ -290,11 +281,8 @@ build_ge (frange , tree type, const frange )
 static bool
 build_gt (frange , tree type, const frange )
 {
-  if (val.known_isnan ())
-{
-  r.set_undefined ();
-  return false;
-}
+  gcc_checking_assert (!val.known_isnan ());
+
   // > +INF is outside the range.
   if (real_isinf (_bound (), 0))
 {
@@ -365,9 +353,11 @@ foperator_equal::fold_range (irange , tree type,
   if (frelop_early_resolve (r, type, op1, op2, rel, VREL_EQ))
 return true;
 
+  if (op1.known_isnan () || op2.known_isnan ())
+r = range_false (type);
   // We can be sure the values are always equal or not if both ranges
   // consist of a single value, and then compare them.
-  if (op1.singleton_p () && op2.singleton_p ())
+  else if (op1.singleton_p () && op2.singleton_p ())
 {
   if (op1 == op2)
r = range_true (type);
@@ -393,25 +383,33 @@ foperator_equal::fold_range (irange , tree type,
 bool
 foperator_equal::op1_range (frange , tree type,
const irange ,
-   const frange  ATTRIBUTE_UNUSED,
+   const frange ,
relation_kind rel) const
 {
   switch (get_bool_state (r, lhs, type))
 {
 case BRS_TRUE:
-  // If it's true, the result is the same

Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-09-21 Thread Jakub Jelinek via Gcc-patches

On Wed, Sep 21, 2022 at 03:45:36PM +0800, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of
> implementing barriers in libgomp on nvptx created a quite significant
> performance drop on some SPEChpc2021 benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
> 
> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
> 
> Basically, instead of trying to have the GPU do CPU-with-OS-like things
> that it isn't suited for, barriers are implemented simplistically with
> bar.* synchronization instructions.
> Tasks are processed after threads have joined, and only if
> team->task_count != 0
> 
> (arguably, there might be a little bit of performance forfeited where
> earlier arriving threads could've been used to process tasks ahead of
> other threads. But that again falls into requiring implementing complex
> futex-wait/wake like behavior. Really, that kind of tasking is not what
> target offloading is usually used for)

I admit I don't have a good picture if people in real-world actually use
tasking in offloading regions and how much and in what way, but the above
definitely would be a show-stopper for typical tasking workloads, where
one thread (usually from master/masked/single construct's body) creates lots
of tasks and can spend considerable amount of time in those preparations,
while other threads are expected to handle those tasks.

Do we have an idea how other implementations are handling this?
I think it should be easily observable with atomics, have
master/masked/single that creates lots of tasks and then spends a long time
doing something, have very small task bodies that just increment some atomic
counter and at the end of the master/masked/single see how many tasks were
already encountered.
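
A rough host-side sketch of that experiment, assuming an OpenMP-enabled
compiler (-fopenmp). The function name run_task_probe and its parameters are
hypothetical, and the region would still need to be wrapped in a target
construct to exercise the nvptx barrier code:

```c
/* Hypothetical probe: one thread creates NTASKS tiny tasks, then stays
   busy for SPIN_ITERS iterations, then records how many tasks had
   already run.  TOTAL receives the final count after the parallel
   region.  */
void
run_task_probe (int ntasks, long spin_iters, int *seen_in_single, int *total)
{
  int counter = 0;
  int seen = 0;

#pragma omp parallel shared(counter, seen)
  {
#pragma omp single
    {
      for (int i = 0; i < ntasks; i++)
	{
#pragma omp task shared(counter)
	  {
#pragma omp atomic update
	    counter++;
	  }
	}

      /* Stand-in for the "long time doing something".  */
      volatile long sink = 0;
      for (long j = 0; j < spin_iters; j++)
	sink += j;

      /* How many tasks did other threads already run?  */
#pragma omp atomic read
      seen = counter;
    }
    /* Implicit barrier at the end of single: all tasks complete here.  */
  }

  *seen_in_single = seen;
  *total = counter;
}
```

If tasks are only processed once every thread has reached the barrier, the
value stored in seen_in_single would be expected to stay low (only tasks the
creating thread ran itself), whereas an implementation that lets idle threads
pick up tasks early should report a number close to ntasks.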

Note, I don't have any smart ideas how to handle this instead and what
you posted might be ok for what people usually do on offloading targets
in OpenMP if they use tasking at all, just wanted to mention that there
could be workloads where the above is a serious problem.  If there are
say hundreds of threads doing nothing until a single thread reaches a
barrier and there are hundreds of pending tasks...
E.g. note we have that 64 pending task limit after which we start to
create undeferred tasks, so if we never start handling tasks until
one thread is done with them, that would mean the single thread
would create 64 deferred tasks and then handle all the others itself
making it even longer until the other tasks can deal with it.

Jakub

Re: [PATCH] Fortran: add IEEE_MODES_TYPE, IEEE_GET_MODES and IEEE_SET_MODES

2022-09-21 Thread FX via Gcc-patches

I forgot to include the gfortran.map part of the patch, and so the test failed
on platforms that have symbol versioning.
Fix below committed to master.

FX



commit ce8aed75a38b468490ecab4c318e3eb08d468608 (HEAD -> master)
Author: Francois-Xavier Coudert 
Date:   2022-09-21 10:04:22 +0200

Fortran: add symbols in version map for IEEE_GET_MODES and IEEE_SET_MODES

The symbols were forgotten in the patch that added IEEE_GET_MODES
and IEEE_SET_MODES.

2022-09-21  Francois-Xavier Coudert  

libgfortran/

* gfortran.map: Add symbols for IEEE_GET_MODES
and IEEE_SET_MODES.

diff --git a/libgfortran/gfortran.map b/libgfortran/gfortran.map
index e0e795c3d48..db9b86cb183 100644
--- a/libgfortran/gfortran.map
+++ b/libgfortran/gfortran.map
@@ -1759,3 +1759,9 @@ GFORTRAN_12 {
   _gfortran_transfer_real128_write;
 #endif
 } GFORTRAN_10.2;
+
+GFORTRAN_13 {
+  global:
+__ieee_exceptions_MOD_ieee_get_modes;
+__ieee_exceptions_MOD_ieee_set_modes;
+} GFORTRAN_12;

[PING][PATCH 0/15] arm: Enables return address verification and branch target identification on Cortex-M

2022-09-21 Thread Andrea Corallo via Gcc-patches

Hi all,

ping^2 for patches 9/15, 7/15, 11/15, 12/15 and 10/15 V2 of this series.

  Andrea

Re: [COMMITTED] frange::maybe_isnan() should return FALSE for undefined ranges.

2022-09-21 Thread Aldy Hernandez via Gcc-patches

The flags were uninitialized because they were unused,
and similarly for m_type.  But you're right, it is icky and prone to bugs.
I just thought it was cheap to set_undefined by just flipping
m_kind=VR_UNDEFINED, but it smells like premature optimization.

How about this?

Aldy

On Wed, Sep 21, 2022 at 9:39 AM Richard Biener
 wrote:
>
> On Tue, Sep 20, 2022 at 8:23 PM Aldy Hernandez via Gcc-patches
>  wrote:
> >
> > Undefined ranges have undefined NAN bits.  We can't depend on them,
> > as they may contain garbage.
>
> Ick ;)  Can you add a comment at least?
>
> > This patch returns false from
> > maybe_isnan() for undefined ranges (the empty set).
> >
> > gcc/ChangeLog:
> >
> > * value-range.h (frange::maybe_isnan): Return false for
> > undefined ranges.
> > ---
> >  gcc/value-range.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/value-range.h b/gcc/value-range.h
> > index 7d5584a9294..325ed08f290 100644
> > --- a/gcc/value-range.h
> > +++ b/gcc/value-range.h
> > @@ -1210,6 +1210,8 @@ frange::known_isinf () const
> >  inline bool
> >  frange::maybe_isnan () const
> >  {
> > +  if (undefined_p ())
> > +return false;
> >return m_pos_nan || m_neg_nan;
> >  }
> >
> > --
> > 2.37.1
> >
>
From f96eb8f4386f2cec67dbf20ab3650aa756fe218a Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Wed, 21 Sep 2022 09:49:14 +0200
Subject: [PATCH] Clear unused flags in frange for undefined ranges.

gcc/ChangeLog:

	* value-range.cc (frange::combine_zeros): Call set_undefined.
	(frange::intersect_nans): Same.
	(frange::intersect): Same.
	(frange::verify_range): Undefined ranges do not have a type.
	* value-range.h (frange::set_undefined): Clear NAN flags and type.
---
 gcc/value-range.cc | 8 
 gcc/value-range.h  | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index dc42b6d3120..505eb9211a7 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -422,7 +422,7 @@ frange::combine_zeros (const frange , bool union_p)
   if (maybe_isnan ())
 	m_kind = VR_NAN;
   else
-	m_kind = VR_UNDEFINED;
+	set_undefined ();
   changed = true;
 }
   return changed;
@@ -506,7 +506,7 @@ frange::intersect_nans (const frange )
   if (maybe_isnan ())
 m_kind = VR_NAN;
   else
-m_kind = VR_UNDEFINED;
+set_undefined ();
   if (flag_checking)
 verify_range ();
   return true;
@@ -558,7 +558,7 @@ frange::intersect (const vrange )
   if (maybe_isnan ())
 	m_kind = VR_NAN;
   else
-	m_kind = VR_UNDEFINED;
+	set_undefined ();
   if (flag_checking)
 	verify_range ();
   return true;
@@ -696,7 +696,7 @@ frange::verify_range ()
   switch (m_kind)
 {
 case VR_UNDEFINED:
-  // m_type is ignored.
+  gcc_checking_assert (!m_type);
   return;
 case VR_VARYING:
   gcc_checking_assert (m_type);
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 325ed08f290..3668b331187 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -1083,6 +1083,10 @@ inline void
 frange::set_undefined ()
 {
   m_kind = VR_UNDEFINED;
+  m_type = NULL;
+  m_pos_nan = false;
+  m_neg_nan = false;
+  // m_min and m_max are uninitialized as they are REAL_VALUE_TYPE ??.
   if (flag_checking)
 verify_range ();
 }
-- 
2.37.1
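
The effect of the patch above can be illustrated with a toy model in plain C.
This is not the frange class; the toy_* names and the use of a string as a
stand-in for the tree type are assumptions made only for this sketch:

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_kind { TOY_UNDEFINED, TOY_RANGE, TOY_NAN, TOY_VARYING };

/* Toy stand-in for a floating-point range.  */
struct toy_frange
{
  enum toy_kind kind;
  const char *type;	/* Stand-in for the tree type; NULL if undefined.  */
  bool pos_nan;
  bool neg_nan;
};

/* Clear every field a later query might read, instead of only
   flipping the kind.  */
void
toy_set_undefined (struct toy_frange *r)
{
  r->kind = TOY_UNDEFINED;
  r->type = NULL;
  r->pos_nan = false;
  r->neg_nan = false;
}

/* With the flags cleared, no special case for undefined ranges is
   needed here.  */
bool
toy_maybe_isnan (const struct toy_frange *r)
{
  return r->pos_nan || r->neg_nan;
}
```

Clearing the flags when the range becomes undefined means queries such as
maybe_isnan no longer depend on stale bits.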

Re: [PATCH] Remove legacy -gz=zlib-gnu

2022-09-21 Thread Martin Liška

On 9/21/22 09:36, Richard Biener wrote:
> If it's all configure time what's the point in
> "deprecating" it?

Note that -gz is one of our options, and 'zlib-gnu' is one of its possible
values.

Martin

[committed] libstdc++: Qualify std::abort() in test

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

This test includes <cstdlib> so should use std::abort not ::abort.

libstdc++-v3/ChangeLog:

* testsuite/18_support/uncaught_exception/14026.cc: Qualify
call to std::abort.
---
 libstdc++-v3/testsuite/18_support/uncaught_exception/14026.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/18_support/uncaught_exception/14026.cc b/libstdc++-v3/testsuite/18_support/uncaught_exception/14026.cc
index 22d4a90d49e..bd281b97174 100644
--- a/libstdc++-v3/testsuite/18_support/uncaught_exception/14026.cc
+++ b/libstdc++-v3/testsuite/18_support/uncaught_exception/14026.cc
@@ -28,7 +28,7 @@ static void
 no_uncaught ()
 {
   if (std::uncaught_exception())
-abort();
+std::abort();
 }
 
 int
-- 
2.37.3

Re: [PATCH] Remove legacy -gz=zlib-gnu

2022-09-21 Thread Fangrui Song via Gcc-patches

On Wed, Sep 21, 2022 at 12:37 AM Richard Biener
 wrote:
>
> On Tue, Sep 20, 2022 at 2:55 PM Martin Liška  wrote:
> >
> > On 7/1/22 09:20, Fangrui Song via Gcc-patches wrote:
> > > On 2022-07-01, Andrew Pinski wrote:
> > >> On Thu, Jun 30, 2022 at 11:58 PM Fangrui Song via Gcc-patches
> > >>  wrote:
> > >>>
> > >>> From: Fangrui Song 
> > >>>
> > >>> SHF_COMPRESSED style zlib has been supported since binutils 2.26
> > >>> and the legacy zlib-gnu option hasn't gained adoption.
> > >>> According to Debian Code Search (`gz=zlib-gnu`), no project uses
> > >>> -gz=zlib-gnu (valgrind has a configure to use -gz=zlib).
> > >>> Remove support for the legacy zlib-gnu and simplify configure.ac by
> > >>> removing zlib-gnu ld/as check.
> > >>
> > >> A couple of things, you are missing a changelog.
> > >
> > > Sorry.
> > >
> > >> Second, why remove something which is still working?
> >
> > Hi.
> >
> > I do support the option removal, while I would replace the removal with a
> > warning saying no compression will be used.
> >
> > >
> > > It's unused and its existence causes confusion: the paradox of choice.
> > > People may assume the support may be good but newer DWARF consumers may
> > > not support the legacy format.
> >
> > Agree, the compression format is legacy. I verified all openSUSE packages
> > (15k) and there's no project actively using it.
> >
> > >
> > > The other motivation is to clean it up a bit.  I foresee that someone
> > > will add --compress-debug-sections=zstd to binutils and configure.ac and
> > > gcc/gcc.cc would become more messy.
> >
> > The argument makes sense, it will be an even bigger mess.
> >
> > @Richi: Is it something we can deprecate for GCC 13?
>
> What's the practical difference between zlib and zlib-gnu?  Can we just
> map zlib-gnu to zlib?  If it's all configure time what's the point in
> "deprecating" it?

zlib-gnu uses the legacy .zdebug section name with a "ZLIB" magic:
http://www.linker-aliens.org/blogs/ali/entry/elf_section_compression/
https://maskray.me/blog/2022-01-23-compressed-debug-sections has some
history about the zlib-gabi replacement, ELFCOMPRESS_ZLIB.
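
For illustration, here is a small sketch of recognizing the legacy header.
It assumes the commonly described layout of the GNU format, namely the 4-byte
"ZLIB" magic followed by the uncompressed size as an 8-byte big-endian
integer; the function name is hypothetical and this is not code from GCC or
binutils.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Parses the start of a legacy GNU-style compressed debug section.
   Assumed layout: "ZLIB" magic, then the uncompressed size as an
   8-byte big-endian integer, then the zlib stream.  Returns true and
   stores the size on success.  */
bool
parse_zlib_gnu_header (const unsigned char *data, size_t len,
		       uint64_t *uncompressed_size)
{
  if (len < 12 || memcmp (data, "ZLIB", 4) != 0)
    return false;

  uint64_t size = 0;
  for (int i = 0; i < 8; i++)
    size = (size << 8) | data[4 + i];

  *uncompressed_size = size;
  return true;
}
```

The gABI format instead marks the section with SHF_COMPRESSED and starts it
with a compression header whose type is ELFCOMPRESS_ZLIB, so a consumer does
not have to rely on the section name.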

FWIW I removed -gz=zlib-gnu from clang and .zdebug support from
various llvm-project tools.

I cannot really find uses of -gz=zlib-gnu in the wild. Users can
always fall back to -Wa, and -Wl, if their tools are so old that
ELFCOMPRESS_ZLIB is unsupported.

> Richard.
>
> >
> > Martin
> >
> > >
> > >> Third, why not just make gz=zlib-gnu as an alias to gz=zlib instead so
> > >> if someone used it before it will still work. we try not to remove
> > >> options; have them emit a warning and be ignored (or moved over to the
> > >> closed option).
> > >
> > > Changing the semantics of -gz=zlib-gnu would be even more confusing.
> > >
> > >> Thanks,
> > >> Andrew
> > >>
> > >>> ---
> > >>>  gcc/common.opt  |  3 ---
> > >>>  gcc/configure   | 33 ++---
> > >>>  gcc/configure.ac| 29 -
> > >>>  gcc/doc/invoke.texi | 11 +--
> > >>>  gcc/gcc.cc  | 22 ++
> > >>>  5 files changed, 17 insertions(+), 81 deletions(-)
> > >>>
> > >>> diff --git a/gcc/common.opt b/gcc/common.opt
> > >>> index e7a51e882ba..8754d93d545 100644
> > >>> --- a/gcc/common.opt
> > >>> +++ b/gcc/common.opt
> > >>> @@ -3424,9 +3424,6 @@ Enum(compressed_debug_sections) String(none) 
> > >>> Value(0)
> > >>>  EnumValue
> > >>>  Enum(compressed_debug_sections) String(zlib) Value(1)
> > >>>
> > >>> -EnumValue
> > >>> -Enum(compressed_debug_sections) String(zlib-gnu) Value(2)
> > >>> -
> > >>>  gz
> > >>>  Common Driver
> > >>>  Generate compressed debug sections.
> > >>> diff --git a/gcc/configure b/gcc/configure
> > >>> index 62872d132ea..ca87e875e9d 100755
> > >>> --- a/gcc/configure
> > >>> +++ b/gcc/configure
> > >>> @@ -19674,7 +19674,7 @@ else
> > >>>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
> > >>>lt_status=$lt_dlunknown
> > >>>cat > conftest.$ac_ext <<_LT_EOF
> > >>> -#line 19679 "configure"
> > >>> +#line 19677 "configure"
> > >>>  #include "confdefs.h"
> > >>>
> > >>>  #if HAVE_DLFCN_H
> > >>> @@ -19780,7 +19780,7 @@ else
> > >>>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
> > >>>lt_status=$lt_dlunknown
> > >>>cat > conftest.$ac_ext <<_LT_EOF
> > >>> -#line 19785 "configure"
> > >>> +#line 19783 "configure"
> > >>>  #include "confdefs.h"
> > >>>
> > >>>  #if HAVE_DLFCN_H
> > >>> @@ -29711,20 +29711,13 @@ else
> > >>> if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 
> > >>> 2>&1 | grep -i warning > /dev/null
> > >>> then
> > >>>   gcc_cv_as_compress_debug=0
> > >>> -   # Since binutils 2.26, gas supports --compress-debug-sections=type,
> > >>> +   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
> > >>> # defaulting to the ELF gABI format.
> > >>> -   elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o 
> > >>> conftest.s > /dev/null 2>&1
> >

[PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC

2022-09-21 Thread Chung-Lin Tang via Gcc-patches


Hi Tom, following the first patch.

This new barrier implementation I posted in the first patch uses the 'bar.red'
instruction. Usually this could've been easily done with a single line of
inline assembly. However I quickly realized that because the NVPTX GCC port is
implemented with all virtual general registers, we don't have a register
constraint usable to select "predicate registers". Since bar.red uses
predicate typed values, I can't create it directly using inline asm.

So it appears that the simplest way of accessing it is with a target builtin.
The attached patch adds bar.red instructions to the nvptx port, and
__builtin_nvptx_bar_red_* builtins to use it. The code should support all
variations of bar.red (and, or, and popc operations).

(This support was used to implement the first libgomp barrier patch, so must be
approved together)
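
As a host-side illustration of what the three reduction variants compute
(not of the builtins themselves, which only exist on the nvptx target), here
is a small model in plain C. The *_model names are hypothetical, and how the
builtin's complement argument maps onto the flag used here is not something
this sketch tries to capture.

```c
#include <stdbool.h>

/* The predicate each thread contributes, optionally complemented.  */
static bool
effective_pred (int pred, bool complement)
{
  bool p = pred != 0;
  return complement ? !p : p;
}

/* Model of the 'and' reduction: true if every thread's predicate holds.  */
bool
bar_red_and_model (const int *preds, int nthreads, bool complement)
{
  for (int i = 0; i < nthreads; i++)
    if (!effective_pred (preds[i], complement))
      return false;
  return true;
}

/* Model of the 'or' reduction: true if any thread's predicate holds.  */
bool
bar_red_or_model (const int *preds, int nthreads, bool complement)
{
  for (int i = 0; i < nthreads; i++)
    if (effective_pred (preds[i], complement))
      return true;
  return false;
}

/* Model of the 'popc' reduction: number of threads whose predicate
   holds.  */
unsigned
bar_red_popc_model (const int *preds, int nthreads, bool complement)
{
  unsigned count = 0;
  for (int i = 0; i < nthreads; i++)
    if (effective_pred (preds[i], complement))
      count++;
  return count;
}
```

For the libgomp barrier, the 'or' flavour is the interesting one: each thread
contributes a "there are pending tasks" predicate, and every thread learns
whether any of them saw work.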

Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang  

gcc/ChangeLog:

* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p'
case, adjust comments.
(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
(nvptx_expand_bar_red): New function.
(nvptx_init_builtins):
Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.

* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
(BARRED): New int iterator.
(barred_op,barred_mode,barred_ptxtype): New int attrs.
(nvptx_barred_): New define_insn.
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc681..afc3a890 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2879,6 +2879,7 @@ nvptx_mem_maybe_shared_p (const_rtx x)
t -- print a type opcode suffix, promoting QImode to 32 bits
T -- print a type size in bits
u -- print a type opcode suffix without promotions.
+   p -- print a '!' for constant 0.
x -- print a destination operand that may also be a bit bucket.  */
 
 static void
@@ -3012,6 +3013,11 @@ nvptx_print_operand (FILE *file, rtx x, int code)
   fprintf (file, "@!");
   goto common;
 
+case 'p':
+  if (INTVAL (x) == 0)
+   fprintf (file, "!");
+  break;
+
 case 'c':
   mode = GET_MODE (XEXP (x, 0));
   switch (x_code)
@@ -6151,9 +6157,90 @@ enum nvptx_builtins
   NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MEMBAR_GL,
   NVPTX_BUILTIN_MEMBAR_CTA,
+  NVPTX_BUILTIN_BAR_RED_AND,
+  NVPTX_BUILTIN_BAR_RED_OR,
+  NVPTX_BUILTIN_BAR_RED_POPC,
   NVPTX_BUILTIN_MAX
 };
 
+/* Expander for 'bar.red' instruction builtins.  */
+
+static rtx
+nvptx_expand_bar_red (tree exp, rtx target,
+ machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
+{
+  int code = DECL_MD_FUNCTION_CODE (TREE_OPERAND (CALL_EXPR_FN (exp), 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+
+  if (!target)
+target = gen_reg_rtx (mode);
+
+  rtx pred, dst;
+  rtx bar = expand_expr (CALL_EXPR_ARG (exp, 0),
+NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx nthr = expand_expr (CALL_EXPR_ARG (exp, 1),
+ NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx cpl = expand_expr (CALL_EXPR_ARG (exp, 2),
+NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx redop = expand_expr (CALL_EXPR_ARG (exp, 3),
+  NULL_RTX, SImode, EXPAND_NORMAL);
+  if (CONST_INT_P (bar))
+{
+  if (INTVAL (bar) < 0 || INTVAL (bar) > 15)
+   {
+ error_at (EXPR_LOCATION (exp),
+   "barrier value must be within [0,15]");
+ return const0_rtx;
+   }
+}
+  else if (!REG_P (bar))
+bar = copy_to_mode_reg (SImode, bar);
+
+  if (!CONST_INT_P (nthr) && !REG_P (nthr))
+nthr = copy_to_mode_reg (SImode, nthr);
+
+  if (!CONST_INT_P (cpl))
+{
+  error_at (EXPR_LOCATION (exp),
+   "complement argument must be constant");
+  return const0_rtx;
+}
+
+  pred = gen_reg_rtx (BImode);
+  if (!REG_P (redop))
+redop = copy_to_mode_reg (SImode, redop);
+  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, redop, GEN_INT (0))));
+  redop = pred;
+
+  rtx pat;
+  switch (code)
+{
+case NVPTX_BUILTIN_BAR_RED_AND:
+  dst = gen_reg_rtx (BImode);
+  pat = gen_nvptx_barred_and (dst, bar, nthr, cpl, redop);
+  break;
+case NVPTX_BUILTIN_BAR_RED_OR:
+  dst = gen_reg_rtx (BImode);
+  pat = gen_nvptx_barred_or (dst, bar, nthr, cpl, redop);
+  break;
+case NVPTX_BUILTIN_BAR_RED_POPC:
+  dst = gen_reg_rtx (SImode);
+  pat = gen_nvptx_barred_popc (dst, bar, nthr, cpl, redop);
+  break;
+default:
+  gcc_unreachable ();
+}
+  emit_insn (pat);
+  if (GET_MODE (dst) == BImode)
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn (gen_rtx_SET (tmp, gen_rtx_NE

[PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

2022-09-21 Thread Chung-Lin Tang via Gcc-patches


Hi Tom,
I had a patch submitted earlier, where I reported that the current way of
implementing barriers in libgomp on nvptx created a quite significant
performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't accepted well (admittedly, it was kind of a hack).
So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.

Basically, instead of trying to have the GPU do CPU-with-OS-like things that
it isn't suited for, barriers are implemented simplistically with bar.*
synchronization instructions.
Tasks are processed after threads have joined, and only if team->task_count != 0

(arguably, there might be a little bit of performance forfeited where earlier
arriving threads could've been used to process tasks ahead of other threads.
But that again falls into requiring implementing complex futex-wait/wake like
behavior. Really, that kind of tasking is not what target offloading is
usually used for)

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in
   the usual manner)
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() now is implemented using "bar.arrive"

4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This reduces
   across all threads the condition (team->task_count != 0), to enable the task
   processing down below if any thread created a task. (this bar.red usage is
   why the second GCC patch in this series is needed)

This patch has been tested on x86_64/powerpc64le with nvptx offloading, using
the libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions.
I also verified that the SPEChpc 2021 521.miniswp_t and 534.hpgmgfv_t
performance regressions that occurred in the GCC12 cycle are gone, with
performance restored to devel/omp/gcc-11 (OG11) branch levels. Is this okay
for trunk?

(I also suggest backporting to the GCC12 branch, if a performance regression
can be considered a defect)

Thanks,
Chung-Lin

libgomp/ChangeLog:

2022-09-21  Chung-Lin Tang  

* config/nvptx/bar.c (generation_to_barrier): Remove.
(futex_wait,futex_wake,do_spin,do_wait): Remove.
(GOMP_WAIT_H): Remove.
(#include "../linux/bar.c"): Remove.
(gomp_barrier_wait_end): New function.
(gomp_barrier_wait): Likewise.
(gomp_barrier_wait_last): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait): Likewise.
(gomp_team_barrier_wait_final): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
(gomp_team_barrier_wait_cancel): Likewise.
(gomp_team_barrier_cancel): Likewise.
* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
prototype, add new static inline function.
diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..0b958ed 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -30,137 +30,143 @@
 #include 
 #include "libgomp.h"
 
-/* For cpu_relax.  */
-#include "doacross.h"
-
-/* Assuming ADDR is >generation, return bar.  Copied from
-   rtems/bar.c.  */
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+{
+  /* Next time we'll be awaiting TOTAL threads again.  */
+  bar->awaited = bar->total;
+  __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+   MEMMODEL_RELEASE);
+}
+  if (bar->total > 1)
+asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+}
 
-static gomp_barrier_t *
-generation_to_barrier (int *addr)
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
 {
-  char *bar
-= (char *) addr - __builtin_offsetof (gomp_barrier_t, generation);
-  return (gomp_barrier_t *)bar;
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Implement futex_wait-like behaviour to plug into the linux/bar.c
-   implementation.  Assumes ADDR is >generation.   */
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
 
-static inline void
-futex_wait (int *addr, int val)
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
 {
-  gomp_barrier_t *bar = generation_to_barrier (addr);
+  /* The above described behavior matches 'bar.arrive' perfectly.  */
+  if (bar->total > 1)
+asm ("bar.arrive 1, %0;" : : "r" (32 * bar->total));
+}
 
-  if (bar->total < 2)
-/* A barrier with less than two threads, nop.  */
-return;
+void
+gomp_team_barrier_wait_end (gomp_barrier_t *bar,

[committed] libstdc++: Remove trailing whitespace in documentation sources

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/xml/manual/documentation_hacking.xml: Remove trailing
whitespace.
* doc/xml/manual/policy_data_structures.xml: Likewise.
---
 .../doc/xml/manual/documentation_hacking.xml |  4 ++--
 .../doc/xml/manual/policy_data_structures.xml| 12 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
index 03bf1f184d4..776d5e857b5 100644
--- a/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
+++ b/libstdc++-v3/doc/xml/manual/documentation_hacking.xml
@@ -353,7 +353,7 @@
  documentation. Here are some of the obvious errors, and ways
  to fix some common issues that may appear quite cryptic.

-   
+

  First, if using a rule like make pdf, try to
  narrow down the scope of the error to either docbook
@@ -844,7 +844,7 @@ make 
XSL_STYLE_DIR="/usr/share/xml/docbook/stylesheet/nwalsh"
  documentation. Here are some of the obvious errors, and ways
  to fix some common issues that may appear quite cryptic.

-   
+

  First, if using a rule like make pdf, try to
  narrow down the scope of the error to either docbook
diff --git a/libstdc++-v3/doc/xml/manual/policy_data_structures.xml b/libstdc++-v3/doc/xml/manual/policy_data_structures.xml
index 3e598105f7e..305257c7404 100644
--- a/libstdc++-v3/doc/xml/manual/policy_data_structures.xml
+++ b/libstdc++-v3/doc/xml/manual/policy_data_structures.xml
@@ -3003,7 +3003,7 @@

  

-   
+
Let U be a domain (e.g., the integers, or the
strings of 3 characters). A hash-table algorithm needs to map
elements of U "uniformly" into the range [0,..., m -
@@ -3179,7 +3179,7 @@
0t - 1 
si ai mod m
  

-   
+
 
where a is some non-negative integral value. This is
the standard string-hashing function used in SGI's
@@ -3278,7 +3278,7 @@
  

  
-   
+
  If cc_hash_table's
  hash-functor, Hash_Fn is instantiated by 
null_type , then Comb_Hash_Fn is 
taken to be
  a ranged-hash function. The graphic below shows an 
insert sequence
@@ -3298,7 +3298,7 @@
  

  
-   
+

 

@@ -3917,7 +3917,7 @@
  

  
-   
+
  Supporting such trees is difficult for a number of
  reasons:
 
@@ -4819,7 +4819,7 @@
assert(p.top() == 3);
  
 
-   
+
  It should be noted that an alternative design could embed an
  associative container in a priority queue. Could, but most
  probably should not. To begin with, it should be noted that one
-- 
2.37.3

[committed] libstdc++: Add _Exit to <stdlib.h> for freestanding

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

When I added std::_Exit to the freestanding declarations in <cstdlib> I
should also have added it to <stdlib.h>.

libstdc++-v3/ChangeLog:

* include/c_compatibility/stdlib.h [!_GLIBCXX_HOSTED]: Add
using-declaration for _Exit.
---
 libstdc++-v3/include/c_compatibility/stdlib.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libstdc++-v3/include/c_compatibility/stdlib.h 
b/libstdc++-v3/include/c_compatibility/stdlib.h
index 377b9107ded..70fa4c8e503 100644
--- a/libstdc++-v3/include/c_compatibility/stdlib.h
+++ b/libstdc++-v3/include/c_compatibility/stdlib.h
@@ -45,6 +45,9 @@ using std::exit;
 # ifdef _GLIBCXX_HAVE_QUICK_EXIT
   using std::quick_exit;
 # endif
+# if _GLIBCXX_USE_C99_STDLIB
+  using std::_Exit;
+# endif
 #endif
 
 #if _GLIBCXX_HOSTED
-- 
2.37.3

[committed] libstdc++: Add <initializer_list> to ranges_base.h header

2022-09-21 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

The <initializer_list> header should be included explicitly to use std::initializer_list.
With the upcoming changes to make  available for freestanding
this becomes an error, because <initializer_list> is no longer provided
by any of the other headers involved here.

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h: Include <initializer_list>.
---
 libstdc++-v3/include/bits/ranges_base.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index 866d7c56cbc..805f196cc9f 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -33,6 +33,7 @@
 #pragma GCC system_header
 
 #if __cplusplus > 201703L
+#include <initializer_list>
 #include 
 #include 
 #include 
-- 
2.37.3
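
A minimal sketch of the rule the commit message states, not code from the patch: a translation unit that names std::initializer_list includes <initializer_list> itself instead of relying on another header to pull it in. The helper name sum_of is hypothetical.

```cpp
#include <initializer_list>  // included explicitly because the type is named below

// Hypothetical helper: adds up the elements of a braced list.
inline int sum_of(std::initializer_list<int> values)
{
  int total = 0;
  for (int v : values)
    total += v;
  return total;
}
```

A call such as sum_of({1, 2, 3}) then compiles regardless of which other headers happen to be included.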

Re: [PATCH] Don't check can_vec_perm_const_p for nonlinear iv_init when it's constant.

2022-09-21 Thread Hongtao Liu via Gcc-patches

On Wed, Sep 21, 2022 at 3:41 PM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, Sep 21, 2022 at 1:41 AM liuhongt via Gcc-patches
>  wrote:
> >
> > When init_expr is INTEGER_CST or REAL_CST, can_vec_perm_const_p is not
> > necessary since there's no real vec_perm needed, but
> > vect_gen_perm_mask_checked will gcc_assert (can_vec_perm_const_p). So
> > it's better to use vect_gen_perm_mask_any in
> > vect_create_nonlinear_iv_init.
>
> and the VEC_PERM build will fold the permute away?
Yes, it's just a const vector. [ c, -c, c, -c, .. ].
>
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> OK.
>
> Thanks,
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR tree-optimization/106963
> > * tree-vect-loop.cc (vect_create_nonlinear_iv_init): Use
> > vect_gen_perm_mask_any instead of vect_gen_perm_mask_checked.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr106963.c: New test.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr106963.c | 14 ++
> >  gcc/tree-vect-loop.cc|  5 -
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr106963.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr106963.c 
> > b/gcc/testsuite/gcc.target/i386/pr106963.c
> > new file mode 100644
> > index 000..9f2d20e2523
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr106963.c
> > @@ -0,0 +1,14 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mavx -mno-avx2" } */
> > +
> > +void
> > +foo_neg_const (int *a)
> > +{
> > +  int i, b = 1;
> > +
> > +  for (i = 0; i < 1000; i++)
> > +{
> > +  a[i] = b;
> > +  b = -b;
> > +}
> > +}
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 9c434b66c5b..aabdc6f2d81 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -8356,8 +8356,11 @@ vect_create_nonlinear_iv_init (gimple_seq* stmts, 
> > tree init_expr,
> > sel[2 * i + 1] = i + nunits;
> >   }
> > vec_perm_indices indices (sel, 2, nunits);
> > +   /* Don't use vect_gen_perm_mask_checked since can_vec_perm_const_p 
> > may
> > +  fail when vec_init is const vector. In that situation vec_perm 
> > is not
> > +  really needed.  */
> > tree perm_mask_even
> > - = vect_gen_perm_mask_checked (vectype, indices);
> > + = vect_gen_perm_mask_any (vectype, indices);
> > vec_init = gimple_build (stmts, VEC_PERM_EXPR,
> >  vectype,
> >  vec_init, vec_neg,
> > --
> > 2.18.1
> >



-- 
BR,
Hongtao
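
The constant vector mentioned above can be illustrated with a scalar model of the permute. This is only a sketch under assumptions: the function name interleave_low is hypothetical, both inputs are assumed to have the same even length, and the even-lane selector sel[2 * i] = i is assumed (the quoted hunk shows only the odd-lane selector sel[2 * i + 1] = i + nunits).

```cpp
#include <cstddef>
#include <vector>

// Scalar model of a two-input permute: selector values below nunits pick
// from `a`, values from nunits upward pick from `b`.
inline std::vector<int> interleave_low(const std::vector<int>& a,
                                       const std::vector<int>& b)
{
  const std::size_t nunits = a.size();
  std::vector<int> out(nunits);
  for (std::size_t i = 0; i < nunits / 2; ++i)
    {
      const std::size_t sel_even = i;           // assumed: sel[2 * i] = i
      const std::size_t sel_odd = i + nunits;   // from the patch context
      out[2 * i] = a[sel_even];
      out[2 * i + 1] = b[sel_odd - nunits];
    }
  return out;
}
```

When `a` holds c in every lane and `b` holds -c in every lane, the result alternates c and -c, so with a constant init_expr the whole permute is itself a constant.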

Re: [PATCH] Don't check can_vec_perm_const_p for nonlinear iv_init when it's constant.

2022-09-21 Thread Richard Biener via Gcc-patches

On Wed, Sep 21, 2022 at 1:41 AM liuhongt via Gcc-patches
 wrote:
>
> When init_expr is INTEGER_CST or REAL_CST, can_vec_perm_const_p is not
> necessary since there's no real vec_perm needed, but
> vect_gen_perm_mask_checked will gcc_assert (can_vec_perm_const_p). So
> it's better to use vect_gen_perm_mask_any in
> vect_create_nonlinear_iv_init.

and the VEC_PERM build will fold the permute away?

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR tree-optimization/106963
> * tree-vect-loop.cc (vect_create_nonlinear_iv_init): Use
> vect_gen_perm_mask_any instead of vect_gen_perm_mask_checked.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr106963.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr106963.c | 14 ++
>  gcc/tree-vect-loop.cc|  5 -
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr106963.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr106963.c 
> b/gcc/testsuite/gcc.target/i386/pr106963.c
> new file mode 100644
> index 000..9f2d20e2523
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr106963.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -mno-avx2" } */
> +
> +void
> +foo_neg_const (int *a)
> +{
> +  int i, b = 1;
> +
> +  for (i = 0; i < 1000; i++)
> +{
> +  a[i] = b;
> +  b = -b;
> +}
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9c434b66c5b..aabdc6f2d81 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8356,8 +8356,11 @@ vect_create_nonlinear_iv_init (gimple_seq* stmts, tree 
> init_expr,
> sel[2 * i + 1] = i + nunits;
>   }
> vec_perm_indices indices (sel, 2, nunits);
> +   /* Don't use vect_gen_perm_mask_checked since can_vec_perm_const_p may
> +  fail when vec_init is const vector. In that situation vec_perm is 
> not
> +  really needed.  */
> tree perm_mask_even
> - = vect_gen_perm_mask_checked (vectype, indices);
> + = vect_gen_perm_mask_any (vectype, indices);
> vec_init = gimple_build (stmts, VEC_PERM_EXPR,
>  vectype,
>  vec_init, vec_neg,
> --
> 2.18.1
>

Re: [COMMITTED] frange::maybe_isnan() should return FALSE for undefined ranges.

2022-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 20, 2022 at 8:23 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> Undefined ranges have undefined NAN bits.  We can't depend on them,
> as they may contain garbage.

Ick ;)  Can you add a comment at least?

> This patch returns false from
> maybe_isnan() for undefined ranges (the empty set).
>
> gcc/ChangeLog:
>
> * value-range.h (frange::maybe_isnan): Return false for
> undefined ranges.
> ---
>  gcc/value-range.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 7d5584a9294..325ed08f290 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -1210,6 +1210,8 @@ frange::known_isinf () const
>  inline bool
>  frange::maybe_isnan () const
>  {
> +  if (undefined_p ())
> +return false;
>return m_pos_nan || m_neg_nan;
>  }
>
> --
> 2.37.1
>
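
The effect of the two added lines can be sketched with a toy type. This is not GCC's frange: the struct toy_frange and its fields are hypothetical stand-ins, kept only to show that the NaN bits are ignored once the range is undefined.

```cpp
// Toy stand-in for a floating-point range; not GCC's frange.
struct toy_frange
{
  bool undefined = false;  // true for the empty set
  bool pos_nan = false;    // may hold garbage when undefined
  bool neg_nan = false;    // may hold garbage when undefined

  bool undefined_p() const { return undefined; }

  // Mirrors the patched logic: do not consult the NaN bits of an
  // undefined range.
  bool maybe_isnan() const
  {
    if (undefined_p())
      return false;
    return pos_nan || neg_nan;
  }
};
```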

Re: [PATCH] Remove legacy -gz=zlib-gnu

2022-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 20, 2022 at 2:55 PM Martin Liška  wrote:
>
> On 7/1/22 09:20, Fangrui Song via Gcc-patches wrote:
> > On 2022-07-01, Andrew Pinski wrote:
> >> On Thu, Jun 30, 2022 at 11:58 PM Fangrui Song via Gcc-patches
> >>  wrote:
> >>>
> >>> From: Fangrui Song 
> >>>
> >>> SHF_COMPRESSED style zlib has been supported since binutils 2.26
> >>> and the legacy zlib-gnu option hasn't gain adoption.
> >>> According to Debian Code Search (`gz=zlib-gnu`), no project uses
> >>> -gz=zlib-gnu (valgrind has a configure to use -gz=zlib).
> >>> Remove support for the legacy zlib-gnu and simplify configure.ac by
> >>> removing zlib-gnu ld/as check.
> >>
> >> A couple of things, you are missing a changelog.
> >
> > Sorry.
> >
> >> Second, why remove something which is still working?
>
> Hi.
>
> I do support the option removal, while I would replace the removal with a 
> warning
> saying no compression will be used.
>
> >
> > It's unused and its existence causes confusion: the paradox of choice.
> > People may assume the support may be good but newer DWARF consumers may
> > not support the legacy format.
>
> Agree, the compression format is legacy. I verified all openSUSE packages 
> (15k)
> and there's no project actively using it.
>
> >
> > The other motivation is to clean it up a bit.  I foresee that someone
> > will add --compress-debug-sections=zstd to binutils and configure.ac and
> > gcc/gcc.cc would become more messy.
>
> The argument makes sense, it will be even bigger mess.
>
> @Richi: Is it something we can deprecate for GCC 13?

What's the practical difference between zlib and zlib-gnu?  Can we just
map zlib-gnu to zlib?  If it's all configure time what's the point in
"deprecating" it?

Richard.

>
> Martin
>
> >
> >> Third, why not just make gz=zlib-gnu as an alias to gz=zlib instead so
> >> if someone used it before it will still work. we try not to remove
> >> options; have them emit a warning and be ignored (or moved over to the
> >> closest option).
> >
> > Changing the semantics of -gz=zlib-gnu would be even more confusing.
> >
> >> Thanks,
> >> Andrew
> >>
> >>> ---
> >>>  gcc/common.opt  |  3 ---
> >>>  gcc/configure   | 33 ++---
> >>>  gcc/configure.ac| 29 -
> >>>  gcc/doc/invoke.texi | 11 +--
> >>>  gcc/gcc.cc  | 22 ++
> >>>  5 files changed, 17 insertions(+), 81 deletions(-)
> >>>
> >>> diff --git a/gcc/common.opt b/gcc/common.opt
> >>> index e7a51e882ba..8754d93d545 100644
> >>> --- a/gcc/common.opt
> >>> +++ b/gcc/common.opt
> >>> @@ -3424,9 +3424,6 @@ Enum(compressed_debug_sections) String(none) 
> >>> Value(0)
> >>>  EnumValue
> >>>  Enum(compressed_debug_sections) String(zlib) Value(1)
> >>>
> >>> -EnumValue
> >>> -Enum(compressed_debug_sections) String(zlib-gnu) Value(2)
> >>> -
> >>>  gz
> >>>  Common Driver
> >>>  Generate compressed debug sections.
> >>> diff --git a/gcc/configure b/gcc/configure
> >>> index 62872d132ea..ca87e875e9d 100755
> >>> --- a/gcc/configure
> >>> +++ b/gcc/configure
> >>> @@ -19674,7 +19674,7 @@ else
> >>>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
> >>>lt_status=$lt_dlunknown
> >>>cat > conftest.$ac_ext <<_LT_EOF
> >>> -#line 19679 "configure"
> >>> +#line 19677 "configure"
> >>>  #include "confdefs.h"
> >>>
> >>>  #if HAVE_DLFCN_H
> >>> @@ -19780,7 +19780,7 @@ else
> >>>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
> >>>lt_status=$lt_dlunknown
> >>>cat > conftest.$ac_ext <<_LT_EOF
> >>> -#line 19785 "configure"
> >>> +#line 19783 "configure"
> >>>  #include "confdefs.h"
> >>>
> >>>  #if HAVE_DLFCN_H
> >>> @@ -29711,20 +29711,13 @@ else
> >>> if $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s 2>&1 
> >>> | grep -i warning > /dev/null
> >>> then
> >>>   gcc_cv_as_compress_debug=0
> >>> -   # Since binutils 2.26, gas supports --compress-debug-sections=type,
> >>> +   # Since binutils 2.26, gas supports --compress-debug-sections=zlib,
> >>> # defaulting to the ELF gABI format.
> >>> -   elif $gcc_cv_as --compress-debug-sections=zlib-gnu -o conftest.o 
> >>> conftest.s > /dev/null 2>&1
> >>> +   elif $gcc_cv_as --compress-debug-sections=zlib -o conftest.o 
> >>> conftest.s > /dev/null 2>&1
> >>> then
> >>>   gcc_cv_as_compress_debug=2
> >>>   gcc_cv_as_compress_debug_option="--compress-debug-sections"
> >>>   gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
> >>> -   # Before binutils 2.26, gas only supported --compress-debug-options 
> >>> and
> >>> -   # emitted the traditional GNU format.
> >>> -   elif $gcc_cv_as --compress-debug-sections -o conftest.o conftest.s > 
> >>> /dev/null 2>&1
> >>> -   then
> >>> - gcc_cv_as_compress_debug=1
> >>> - gcc_cv_as_compress_debug_option="--compress-debug-sections"
> >>> - gcc_cv_as_no_compress_debug_option="--nocompress-debug-sections"
> >>> else
> >>>   gcc_cv_as_compress_debug=0
> >>> fi
> >>> @@

Re: [PATCH] sched1: Fix -fcompare-debug issue in schedule_region [PR105586]

2022-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 20, 2022 at 9:18 AM Surya Kumari Jangala via Gcc-patches
 wrote:
>
> Hi Jeff, Richard,
> Thank you for reviewing the patch!
> I have committed the patch to the gcc repo.
> Can I backport this patch to prior versions of gcc, as this is an easy patch 
> to backport and the issue exists in prior versions too?

It doesn't seem to be a regression so I'd err on the safe side here.

Richard.

> Regards,
> Surya
>
>
> On 31/08/22 9:09 pm, Jeff Law via Gcc-patches wrote:
> >
> >
> > On 8/23/2022 5:49 AM, Surya Kumari Jangala via Gcc-patches wrote:
> >> sched1: Fix -fcompare-debug issue in schedule_region [PR105586]
> >>
> >> In schedule_region(), a basic block that does not contain any real insns
> >> is not scheduled and the dfa state at the entry of the bb is not copied
> >> to the fallthru basic block. However a DEBUG insn is treated as a real
> >> insn, and if a bb contains non-real insns and a DEBUG insn, its dfa
> >> state is copied to the fallthru bb. This was resulting in
> >> -fcompare-debug failure as the incoming dfa state of the fallthru block
> >> is different with -g. We should always copy the dfa state of a bb to
> >> its fallthru bb even if the bb does not contain real insns.
> >>
> >> 2022-08-22  Surya Kumari Jangala  
> >>
> >> gcc/
> >> PR rtl-optimization/105586
> >> * sched-rgn.cc (schedule_region): Always copy dfa state to
> >> fallthru block.
> >>
> >> gcc/testsuite/
> >> PR rtl-optimization/105586
> >> * gcc.target/powerpc/pr105586.c: New test.
> > Interesting.  We may have stumbled over this bug internally a little 
> > while ago -- not from a compare-debug standpoint, but from a "why isn't the 
> > processor state copied to the fallthru block" point of view.   I had it on 
> > my to investigate list, but hadn't gotten around to it yet.
> >
> > I think there were requests for ChangeLog updates and a function comment 
> > for save_state_for_fallthru_edge.  OK with those updates.
> >
> > jeff
> >

Re: [PATCH] Fix incorrect handle in vectorizable_induction for mixed induction type.

2022-09-21 Thread Richard Biener via Gcc-patches

On Tue, Sep 20, 2022 at 4:24 AM liuhongt via Gcc-patches
 wrote:
>
> The code in vectorizable_induction for slp_node assumes all phi_info
> have the same induction type (vect_step_op_add), but now that we support
> nonlinear induction, that assumption can be wrong and such nodes are
> mishandled. So the patch returns false when slp_node has mixed induction
> types.
>
> Note that code elsewhere will still vectorize the induction with
> separate iv updates and a vec_perm. But the slp_node handling in
> vectorizable_induction is more optimal when all induction types
> are the same: it updates the ivs with one operation instead of
> separate iv updates and a permutation.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
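
The check described above can be sketched outside the vectorizer as follows. This is only an illustration under assumptions: the enum step_kind and the function same_induction_kind are hypothetical names, loosely modelled on the vect_step_op_add and vect_step_op_neg kinds mentioned in the test comment, and are not the data structures the patch uses.

```cpp
#include <vector>

// Hypothetical induction kinds, named after the two mentioned in the thread.
enum class step_kind { add, neg };

// Returns true when every induction in the group has the same kind, which
// is the case where one combined update could serve the whole group.
// An empty group is treated as uniform.
inline bool same_induction_kind(const std::vector<step_kind>& group)
{
  for (step_kind k : group)
    if (k != group.front())
      return false;
  return true;
}
```

A group mixing add and neg would make this return false, which corresponds to the case where the patch bails out of the single-update path.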
> gcc/ChangeLog:
>
> PR tree-optimization/103144
> * tree-vect-loop.cc (vectorizable_induction): Return false for
> slp_node with mixed induction type.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr103144-mix-1.c: New test.
> * gcc.target/i386/pr103144-mix-2.c: New test.
> ---
>  .../gcc.target/i386/pr103144-mix-1.c  | 17 +
>  .../gcc.target/i386/pr103144-mix-2.c  | 35 +++
>  gcc/tree-vect-loop.cc | 34 ++
>  3 files changed, 79 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
> new file mode 100644
> index 000..b292d66ef71
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-1.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 2 "optimized" } } */
> +/* For induction variable with different induction type(vect_step_op_add, 
> vect_step_op_neg),
> +   It shouldn't be handled in vectorizable_induction with just 1 single iv 
> update(addition.),
> +   separate iv update and vec_perm are needed.  */
> +int
> +__attribute__((noipa))
> +foo (int* p, int c, int n)
> +{
> +  for (int i = 0; i != n; i++)
> +{
> +  p[2* i]= i;
> +  p[2 * i+1] = c;
> +  c = -c;
> +}
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c 
> b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
> new file mode 100644
> index 000..b7043d59aec
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr103144-mix-2.c
> @@ -0,0 +1,35 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mavx2 -ftree-vectorize -fvect-cost-model=unlimited 
> -mprefer-vector-width=256" } */
> +/* { dg-require-effective-target avx2 } */
> +
> +#include "avx2-check.h"
> +#include 
> +#include "pr103144-mix-1.c"
> +
> +typedef int v8si __attribute__((vector_size(32)));
> +
> +#define N 34
> +void
> +avx2_test (void)
> +{
> +  int* epi32_exp = (int*) malloc (N * sizeof (int));
> +  int* epi32_dst = (int*) malloc (N * sizeof (int));
> +
> +  __builtin_memset (epi32_exp, 0, N * sizeof (int));
> +  int b = 8;
> +  v8si init1 = __extension__(v8si) { 0, b, 1, -b, 2, b, 3, -b };
> +  v8si init2 = __extension__(v8si) { 4, b, 5, -b, 6, b, 7, -b };
> +  v8si init3 = __extension__(v8si) { 8, b, 9, -b, 10, b, 11, -b };
> +  v8si init4 = __extension__(v8si) { 12, b, 13, -b, 14, b, 15, -b };
> +  memcpy (epi32_exp, &init1, 32);
> +  memcpy (epi32_exp + 8, &init2, 32);
> +  memcpy (epi32_exp + 16, &init3, 32);
> +  memcpy (epi32_exp + 24, &init4, 32);
> +  epi32_exp[32] = 16;
> +  epi32_exp[33] = b;
> +  foo (epi32_dst, b, N / 2);
> +  if (__builtin_memcmp (epi32_dst, epi32_exp, N * sizeof (int)) != 0)
> +__builtin_abort ();
> +
> +  return;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9c434b66c5b..c7050a47c1c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -9007,14 +9007,34 @@ vectorizable_induction (loop_vec_info loop_vinfo,
>  iv_loop = loop;
>gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
>
> -  if (slp_node && !nunits.is_constant ())
> +  if (slp_node)
>  {
> -  /* The current SLP code creates the step value element-by-element.  */
> -  if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"SLP induction not supported for variable-length"
> -" vectors.\n");
> -  return false;
> +  if (!nunits.is_constant ())
> +   {
> + /* The current SLP code creates the step value element-by-element.  
> */
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"SLP induction not supported for variable-length"
> +" vectors.\n");
> + return false;
> +   }
> +
> +  stmt_vec_info phi_info;
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (slp_node), i, phi_info)
> +   {
> + if
