Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-11 Thread Richard Biener via Gcc-patches
On Mon, 10 Oct 2022, Andrew Stubbs wrote:

> On 10/10/2022 12:03, Richard Biener wrote:
> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
> > first order recurrences.  That solves two TSVC missed optimization PRs.
> > 
> > There's a new scalar cycle def kind, vect_first_order_recurrence
> > and its handling of the backedge value vectorization is complicated
> > by the fact that the vectorized value isn't the PHI but instead
> > a (series of) permute(s) shifting in the recurring value from the
> > previous iteration.  I've implemented this by creating both the
> > single vectorized PHI and the series of permutes when vectorizing
> > the scalar PHI but leave the backedge values in both unassigned.
> > The backedge values are (for the testcases) computed by a load
> > which is also the place after which the permutes are inserted.
> > That placement also restricts the cases we can handle (without
> > resorting to code motion).
> > 
> > I added both costing and SLP handling though SLP handling is
> > restricted to the case where a single vectorized PHI is enough.
> > 
> > Missing is epilogue handling - while prologue peeling would
> > be handled transparently by adjusting iv_phi_p the epilogue
> > case doesn't work with just inserting a scalar LC PHI since
> > that a) keeps the scalar load live and b) that load is the
> > wrong one, it has to be the last, much like when we'd vectorize
> > the LC PHI as live operation.  Unfortunately LIVE
> > compute/analysis happens too early before we decide on
> > peeling.  When using fully masked loop vectorization the
> > vect-recurr-6.c works as expected though.
> > 
> > I have tested this on x86_64 for now, but since epilogue
> > handling is missing there's probably no practical cases.
> > My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
> > just fine but I didn't feel like running SPEC within SDE nor
> > is the WHILE_ULT patch complete enough.  Builds of SPEC 2k7
> > with fully masked loops succeed (minus three cases of
> > PR107096, caused by my WHILE_ULT prototype).
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
> > handle epilogues welcome.
> 
> The testcases all produce correct code on GCN and pass the execution tests.
> 
> The code isn't terribly optimal because we don't have a two-input permutation
> instruction, so we permute each half separately and vec_merge the results. In
> this case the first vector is always a no-op permutation so that's wasted
> cycles. We'd really want a vector rotate and write-lane (or the other way
> around). I think the special-case permutations can be recognised and coded
> into the backend, but I don't know if we can easily tell that the first vector
> is just a bunch of duplicates, when it's not constant.

It's only actually a bunch of duplicates in the first iteration; in later
iterations it isn't.
But what you can recognize is that we're only using lane N - 1 of the
first vector, so you could model the permute as extract last
+ shift in scalar (the extracted lane).  IIRC VLA vector targets usually
have something like shift the vector and set the low lane from a
scalar?  The extract lane N - 1 might be more difficult but then
a rotate plus extracting lane 0 might work as well.
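In scalar terms the permute being discussed looks like this (a sketch for
illustration only, N = 4):

/* Only lane N - 1 of the previous iteration's vector is live, so the
   two-input permute {prev[N-1], cur[0], ..., cur[N-2]} can be modelled
   as an extract of the last lane plus a one-lane shift of the current
   vector.  */
enum { N = 4 };

static void
recur_permute (const int prev[N], const int cur[N], int out[N])
{
  int carried = prev[N - 1];	/* extract last lane */
  for (int i = N - 1; i > 0; i--)
    out[i] = cur[i - 1];	/* shift cur up by one lane */
  out[0] = carried;		/* set the low lane from the scalar */
}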

So yes, I think special-casing this constant permutation makes most
sense.

Richard.


Re: [PING 2] [PATCH] libstdc++: basic_filebuf: don't flush more often than necessary.

2022-10-11 Thread Charles-François Natali via Gcc-patches
On Thu, Oct 6, 2022, 17:56 Jonathan Wakely  wrote:

> > I actually just copy-pasted the header from another test, would it be
> > simpler if I just removed it?
>
>
> Yes, that's probably the simplest solution, and then add a
> Signed-off-by: tag in your patch email, to state you're contributing
> it under the DCO terms (assuming of course that you are willing and
> able to certify those terms).
>

I submitted another version of the patch without the header and with the
Signed-off-by tag, see the other thread.

Cheers,

Charles


Re: [patch] configury support for VxWorks shared libraries

2022-10-11 Thread Olivier Hainque via Gcc-patches



> On 10 Oct 2022, at 21:38, Jonathan Wakely  wrote:
> 
> On Mon, 10 Oct 2022 at 19:06, Olivier Hainque via Libstdc++
>  wrote:
>> 
>> Sorry, I forgot to cc libstdc++ on
>> 
>>  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603182.html
>> 
>> which includes a regen of libstdc++-v3/configure after an update to
>> libtool.m4.
> 
> OK,

Great :)

> thanks for the heads up.

You are very welcome. Thanks for your prompt feedback.

Regards,

Olivier



[PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]

2022-10-11 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch modifies the helper function that generates the permute index for
vector byte reversal and makes it generate the permute index directly for
little endian targets. It saves one "xxlnor" instruction on P8 little endian
targets, as the original sequence needs an "xxlnor" to compute the complement
of the index.
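To see why this works (a standalone illustration, not part of the patch):
vperm selects byte perm[i] from the 32-byte concatenation of its two source
registers, and on little endian targets that concatenation is numbered from
the opposite end, so the LE index is the 5-bit complement of the BE one:

/* Check that the 5-bit complement mirrors a vperm byte index:
   ~i & 0x1f == 31 - i for 0 <= i < 32 (0 -> 31, 31 -> 0).  Emitting the
   complemented index up front is what makes the runtime xxlnor
   unnecessary.  */
#include <stdio.h>

int
main (void)
{
  for (unsigned int i = 0; i < 32; i++)
    if ((~i & 0x1f) != 31 - i)
      printf ("mismatch at %u\n", i);
  return 0;
}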

Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-10-11  Haochen Gui 

gcc/
PR target/100866
* config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode):
Generate permute index directly for little endian targets.
* config/rs6000/vsx.md (revb_<mode>): Call vperm directly with
corresponding permute indexes.

gcc/testsuite/
PR target/100866
* gcc.target/powerpc/pr100866.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
index 551968b0995..bad8e9e0e52 100644
--- a/gcc/config/rs6000/rs6000-call.cc
+++ b/gcc/config/rs6000/rs6000-call.cc
@@ -2839,7 +2839,10 @@ swap_endian_selector_for_mode (machine_mode mode)
 }

   for (i = 0; i < 16; ++i)
-perm[i] = GEN_INT (swaparray[i]);
+if (BYTES_BIG_ENDIAN)
+  perm[i] = GEN_INT (swaparray[i]);
+else
+  perm[i] = GEN_INT (~swaparray[i] & 0x001f);

   return force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode,
 gen_rtvec_v (16, perm)));
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..b68eba48d2c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6096,8 +6096,8 @@ (define_expand "revb_<mode>"
 to the endian mode in use, i.e. in LE mode, put elements
 in BE order.  */
   rtx sel = swap_endian_selector_for_mode (<MODE>mode);
-  emit_insn (gen_altivec_vperm_<mode> (operands[0], operands[1],
-  operands[1], sel));
+  emit_insn (gen_altivec_vperm_<mode>_direct (operands[0], operands[1],
+ operands[1], sel));
 }

   DONE;
diff --git a/gcc/testsuite/gcc.target/powerpc/pr100866.c 
b/gcc/testsuite/gcc.target/powerpc/pr100866.c
new file mode 100644
index 000..c708dfd502e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr100866.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-final { scan-assembler-not "xxlnor" } } */
+
+#include <altivec.h>
+
+vector unsigned short revb(vector unsigned short a)
+{
+   return vec_revb(a);
+}



Re: [PATCH v5] c-family: ICE with [[gnu::nocf_check]] [PR106937]

2022-10-11 Thread Andreas Schwab via Gcc-patches
On Oct 10 2022, Marek Polacek via Gcc-patches wrote:

> diff --git a/gcc/testsuite/c-c++-common/pointer-to-fn1.c 
> b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
> new file mode 100644
> index 000..975885462e9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
> @@ -0,0 +1,18 @@
> +/* PR c++/106937 */
> +/* { dg-options "-fcf-protection" } */

FAIL: c-c++-common/pointer-to-fn1.c  -Wc++-compat  (test for excess errors)
Excess errors:
cc1: error: '-fcf-protection=full' is not supported for this target

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread liuhongt via Gcc-patches
For general_reg_operand, it will be split into xor + not.
For mask_reg_operand, it will be split with UNSPEC_MASK_OP just
like we did for other logic operations.

The patch will optimize xor+not to kxnor when possible.
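A minimal source pattern that exercises the new patterns (illustration
only):

/* With the destination in a mask register this becomes a single kxnor;
   in a general register it is split back into xor + not after reload.  */
unsigned int
not_xor (unsigned int a, unsigned int b)
{
  return ~(a ^ b);
}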

Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

* config/i386/i386.md (*notxor_1): New post_reload
define_insn_and_split.
(*notxorqi_1): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr107093.c: New test.
---
 gcc/config/i386/i386.md  | 71 
 gcc/testsuite/gcc.target/i386/pr107093.c | 38 +
 2 files changed, 109 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107093.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 1be9b669909..228edba2b40 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10826,6 +10826,39 @@ (define_insn "*<code><mode>_1"
    (set_attr "type" "alu, alu, msklog")
    (set_attr "mode" "<MODE>")])
 
+(define_insn_and_split "*notxor<mode>_1"
+  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
+	(not:SWI248
+	  (xor:SWI248
+	    (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
+	    (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,k"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (XOR, <MODE>mode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel
+    [(set (match_dup 0)
+	  (xor:SWI248 (match_dup 1) (match_dup 2)))
+     (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 0)
+	(not:SWI248 (match_dup 1)))]
+{
+  if (MASK_REGNO_P (REGNO (operands[0])))
+    {
+      emit_insn (gen_kxnor<mode> (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+}
+  [(set (attr "isa")
+	(cond [(eq_attr "alternative" "2")
+		 (if_then_else (eq_attr "mode" "SI,DI")
+		   (const_string "avx512bw")
+		   (const_string "avx512f"))
+	      ]
+	      (const_string "*")))
+   (set_attr "type" "alu, alu, msklog")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn_and_split "*iordi_1_bts"
   [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
(ior:DI
@@ -10959,6 +10992,44 @@ (define_insn "*<code>qi_1"
 	 (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
       (symbol_ref "true")))])
 
+(define_insn_and_split "*notxorqi_1"
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
+	(not:QI
+	  (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
+		  (match_operand:QI 2 "general_operand" "qn,m,rn,k"))))
+   (clobber (reg:CC FLAGS_REG))]
+  "ix86_binary_operator_ok (XOR, QImode, operands)"
+  "#"
+  "&& reload_completed"
+  [(parallel
+    [(set (match_dup 0)
+	  (xor:QI (match_dup 1) (match_dup 2)))
+     (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 0)
+	(not:QI (match_dup 0)))]
+{
+  if (mask_reg_operand (operands[0], QImode))
+    {
+      emit_insn (gen_kxnorqi (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+}
+  [(set_attr "isa" "*,*,*,avx512f")
+   (set_attr "type" "alu,alu,alu,msklog")
+   (set (attr "mode")
+	(cond [(eq_attr "alternative" "2")
+		 (const_string "SI")
+	       (and (eq_attr "alternative" "3")
+		    (match_test "!TARGET_AVX512DQ"))
+		 (const_string "HI")
+	      ]
+	      (const_string "QI")))
+   ;; Potential partial reg stall on alternative 2.
+   (set (attr "preferred_for_speed")
+     (cond [(eq_attr "alternative" "2")
+	      (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
+	   (symbol_ref "true")))])
+
 ;; Alternative 1 is needed to work around LRA limitation, see PR82524.
 (define_insn_and_split "*<code><mode>_1_slp"
   [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>,&<r>"))
diff --git a/gcc/testsuite/gcc.target/i386/pr107093.c 
b/gcc/testsuite/gcc.target/i386/pr107093.c
new file mode 100644
index 000..23e30cbac0f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107093.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2 -mavx512vl" } */
+/* { dg-final { scan-assembler-times {(?n)kxnor[bwqd]} 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times {(?n)kxnor[bwdq]} 3 { target ia32 } } } */
+
+#include <immintrin.h>
+
+__m512i
+foo (__m512i a, __m512i b, __m512i c, __m512i d)
+{
+  __mmask32 k1 = _mm512_cmp_epi16_mask (a, b, 1);
+  __mmask32 k2 = _mm512_cmp_epi16_mask (c, d, 2);
+  return _mm512_mask_mov_epi16 (a, ~(k1 ^ k2), c);
+}
+
+__m512i
+foo1 (__m512i a, __m512i b, __m512i c, __m512i d)
+{
+  __mmask16 k1 = _mm512_cmp_epi32_mask (a, b, 1);
+  __mmask16 k2 = _mm512_cmp_epi32_mask (c, d, 2);
+  return _mm512_mask_mov_epi32 (a, ~(k1 ^ k2), c);
+}
+
+__m512i
+foo2 (__m512i a, __m512i b, __m512i c, __m512i d)
+{
+  __mmask64 k1 = _mm512_cmp_epi8_mask (a, b, 1);
+  __mmask64 k2 = _mm512_cmp_epi8_mask (c, d, 2);
+  return _mm512_mask_mov_epi8 (a, ~(k1 ^ k2), c);
+}
+
+__m512i
+foo3 (__m512i a, _

[COMMITTED] [PR107195] Set range to zero when nonzero mask is 0.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
When solving 0 = _15 & 1, we calculate _15 as:

[irange] int [-INF, -2][0, +INF] NONZERO 0xfffffffe

The known value of _15 is [0, 1] NONZERO 0x1 which is intersected with
the above, yielding:

[0, 1] NONZERO 0x0

This eventually gets copied to a _Bool [0, 1] NONZERO 0x0.

This is problematic because here we have a bool which is zero, but
returns false for irange::zero_p, since the latter does not look at
nonzero bits.  This causes logical_combine to assume the range is
not-zero, and all hell breaks loose.

I think we should just normalize a nonzero mask of 0 to [0, 0] at
creation, thus avoiding all this.
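Stated as an invariant (a sketch, not the irange API):

#include <assert.h>
#include <stdint.h>

/* A value V is consistent with nonzero mask M iff (V & ~M) == 0, i.e.
   only bits set in M may be set in V.  With M == 0 the only consistent
   value is 0, so [0, 1] NONZERO 0x0 denotes exactly {0} and can be
   normalized to [0, 0].  */
static int
consistent_with_mask (uint64_t v, uint64_t m)
{
  return (v & ~m) == 0;
}

int
main (void)
{
  assert (consistent_with_mask (0, 0x0));
  assert (!consistent_with_mask (1, 0x0));	/* 1 is excluded */
  return 0;
}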

PR tree-optimization/107195

gcc/ChangeLog:

* value-range.cc (irange::set_range_from_nonzero_bits): Set range
to [0,0] when nonzero mask is 0.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107195-1.c: New test.
* gcc.dg/tree-ssa/pr107195-2.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c | 16 
 gcc/value-range.cc |  5 +
 3 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
new file mode 100644
index 000..a0c20dbd4b1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-1.c
@@ -0,0 +1,15 @@
+// { dg-do run }
+// { dg-options "-O1 -fno-tree-ccp" }
+
+int a, b;
+int main() {
+  int c = 0;
+  if (a)
+c = 1;
+  c = 1 & (a && c) && b;
+  if (a) {
+b = c;
+__builtin_abort ();
+  }
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
new file mode 100644
index 000..d447c78bdd3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr107195-2.c
@@ -0,0 +1,16 @@
+// { dg-do run }
+// { dg-options "-O1" }
+
+int a, b;
+int main() {
+  int c = 0;
+  long d;
+  for (; b < 1; b++) {
+(c && d) & 3 || a;
+d = c;
+c = -1;
+if (d)
+  __builtin_abort();
+  }
+  return 0;
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index a14f9bc4394..e07d2aa9a5b 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -2903,6 +2903,11 @@ irange::set_range_from_nonzero_bits ()
}
   return true;
 }
+  else if (popcount == 0)
+{
+  set_zero (type ());
+  return true;
+}
   return false;
 }
 
-- 
2.37.3



Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-11 Thread juzhe.zh...@rivai.ai
Hi, I applied this patch in our RVV downstream and tested it with a lot of
vector benchmarks. Overall it has a great performance gain.
Maybe the last thing needed to merge this patch is to wait for Richard
Sandiford to test it on ARM SVE?

By the way, would you mind sharing the list of cases that GCC fails to
vectorize but Clang succeeds on?
I am familiar with LLVM. I think I can do this job.
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2022-10-10 19:03
To: gcc-patches
CC: richard.sandiford; ams; juzhe.zhong
Subject: [PATCH][RFT] Vectorization of first-order recurrences
The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences.  That solves two TSVC missed optimization PRs.
 
There's a new scalar cycle def kind, vect_first_order_recurrence
and it's handling of the backedge value vectorization is complicated
by the fact that the vectorized value isn't the PHI but instead
a (series of) permute(s) shifting in the recurring value from the
previous iteration.  I've implemented this by creating both the
single vectorized PHI and the series of permutes when vectorizing
the scalar PHI but leave the backedge values in both unassigned.
The backedge values are (for the testcases) computed by a load
which is also the place after which the permutes are inserted.
That placement also restricts the cases we can handle (without
resorting to code motion).
 
I added both costing and SLP handling though SLP handling is
restricted to the case where a single vectorized PHI is enough.
 
Missing is epilogue handling - while prologue peeling would
be handled transparently by adjusting iv_phi_p the epilogue
case doesn't work with just inserting a scalar LC PHI since
that a) keeps the scalar load live and b) that load is the
wrong one, it has to be the last, much like when we'd vectorize
the LC PHI as live operation.  Unfortunately LIVE
compute/analysis happens too early before we decide on
peeling.  When using fully masked loop vectorization the
vect-recurr-6.c works as expected though.
 
I have tested this on x86_64 for now, but since epilogue
handling is missing there's probably no practical cases.
My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
just fine but I didn't feel like running SPEC within SDE nor
is the WHILE_ULT patch complete enough.  Builds of SPEC 2k7
with fully masked loops succeed (minus three cases of
PR107096, caused by my WHILE_ULT prototype).
 
Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
handle epilogues welcome.
 
Thanks,
Richard.
 
PR tree-optimization/99409
PR tree-optimization/99394
* tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add.
(stmt_vec_info_type::recurr_info_type): Likewise.
(vectorizable_recurr): New function.
* tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New
function.
(vect_analyze_scalar_cycles_1): Look for first order
recurrences.
(vect_analyze_loop_operations): Handle them.
(vect_transform_loop): Likewise.
(vectorizable_recurr): New function.
(maybe_set_vectorized_backedge_value): Handle the backedge value
setting in the first order recurrence PHI and the permutes.
* tree-vect-stmts.cc (vect_analyze_stmt): Handle first order
recurrences.
(vect_transform_stmt): Likewise.
(vect_is_simple_use): Likewise.
(vect_is_simple_use): Likewise.
* tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise.
(vect_build_slp_tree_2): Likewise.
(vect_schedule_scc): Handle the backedge value setting in the
first order recurrence PHI and the permutes.
 
* gcc.dg/vect/vect-recurr-1.c: New testcase.
* gcc.dg/vect/vect-recurr-2.c: Likewise.
* gcc.dg/vect/vect-recurr-3.c: Likewise.
* gcc.dg/vect/vect-recurr-4.c: Likewise.
* gcc.dg/vect/vect-recurr-5.c: Likewise.
* gcc.dg/vect/vect-recurr-6.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL.
* gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise.
* gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise.
 
Co-authored-by: Ju-Zhe Zhong 
---
.../gcc.dg/vect/tsvc/vect-tsvc-s252.c |   2 +-
.../gcc.dg/vect/tsvc/vect-tsvc-s254.c |   2 +-
.../gcc.dg/vect/tsvc/vect-tsvc-s291.c |   2 +-
gcc/testsuite/gcc.dg/vect/vect-recurr-1.c |  38 +++
gcc/testsuite/gcc.dg/vect/vect-recurr-2.c |  39 +++
gcc/testsuite/gcc.dg/vect/vect-recurr-3.c |  39 +++
gcc/testsuite/gcc.dg/vect/vect-recurr-4.c |  42 +++
gcc/testsuite/gcc.dg/vect/vect-recurr-5.c |  43 +++
gcc/testsuite/gcc.dg/vect/vect-recurr-6.c |  39 +++
gcc/tree-vect-loop.cc | 286 --
gcc/tree-vect-slp.cc  |  38 ++-
gcc/tree-vect-stmts.cc|  17 +-
gcc/tree-vectorizer.h |   4 +
13 files changed, 563 insertions(+), 28 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-1.c
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-2.c
create mode 100644 gcc/testsuite/gcc.dg/vect/vect-recurr-3.c
create mode 100644 gcc/testsuite/gcc.dg/vect/ve

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-11 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, 10 Oct 2022, Andrew Stubbs wrote:
>> On 10/10/2022 12:03, Richard Biener wrote:
>> > The following picks up the prototype by Ju-Zhe Zhong for vectorizing
>> > first order recurrences.  That solves two TSVC missed optimization PRs.
>> > 
>> > There's a new scalar cycle def kind, vect_first_order_recurrence
>> > and its handling of the backedge value vectorization is complicated
>> > by the fact that the vectorized value isn't the PHI but instead
>> > a (series of) permute(s) shifting in the recurring value from the
>> > previous iteration.  I've implemented this by creating both the
>> > single vectorized PHI and the series of permutes when vectorizing
>> > the scalar PHI but leave the backedge values in both unassigned.
>> > The backedge values are (for the testcases) computed by a load
>> > which is also the place after which the permutes are inserted.
>> > That placement also restricts the cases we can handle (without
>> > resorting to code motion).
>> > 
>> > I added both costing and SLP handling though SLP handling is
>> > restricted to the case where a single vectorized PHI is enough.
>> > 
>> > Missing is epilogue handling - while prologue peeling would
>> > be handled transparently by adjusting iv_phi_p the epilogue
>> > case doesn't work with just inserting a scalar LC PHI since
>> > that a) keeps the scalar load live and b) that load is the
>> > wrong one, it has to be the last, much like when we'd vectorize
>> > the LC PHI as live operation.  Unfortunately LIVE
>> > compute/analysis happens too early before we decide on
>> > peeling.  When using fully masked loop vectorization the
>> > vect-recurr-6.c works as expected though.
>> > 
>> > I have tested this on x86_64 for now, but since epilogue
>> > handling is missing there's probably no practical cases.
>> > My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c
>> > just fine but I didn't feel like running SPEC within SDE nor
>> > is the WHILE_ULT patch complete enough.  Builds of SPEC 2k7
>> > with fully masked loops succeed (minus three cases of
>> > PR107096, caused by my WHILE_ULT prototype).
>> > 
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
>> > 
>> > Testing with SVE, GCN or RVV appreciated, ideas how to cleanly
>> > handle epilogues welcome.
>> 
>> The testcases all produce correct code on GCN and pass the execution tests.
>> 
>> The code isn't terribly optimal because we don't have a two-input permutation
>> instruction, so we permute each half separately and vec_merge the results. In
>> this case the first vector is always a no-op permutation so that's wasted
>> cycles. We'd really want a vector rotate and write-lane (or the other way
>> around). I think the special-case permutations can be recognised and coded
>> into the backend, but I don't know if we can easily tell that the first 
>> vector
>> is just a bunch of duplicates, when it's not constant.
>
> It's only actually a bunch of duplicates in the first iteration; in later
> iterations it isn't.
> But what you can recognize is that we're only using lane N - 1 of the
> first vector, so you could model the permute as extract last
> + shift in scalar (the extracted lane).  IIRC VLA vector targets usually
> have something like shift the vector and set the low lane from a
> scalar?

Yeah.

> The extract lane N - 1 might be more difficult but then
> a rotate plus extracting lane 0 might work as well.

I guess for SVE we should probably use SPLICE, which joins two vectors
and uses a predicate to select the first element that should be extracted.

Unfortunately we don't have a way of representing "last bit set, all other
bits clear" as a constant though, so I guess it'll have to be hidden
behind unspecs.
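At the source level the idea looks roughly like this (an illustrative ACLE
sketch only, not what the vectorizer would emit; the "last lane only"
predicate is built dynamically here precisely because there is no constant
form for it):

#include <arm_sve.h>

/* Sketch: with a predicate whose only active element is the last lane,
   SPLICE copies prev[N-1] followed by cur[0..N-2], which is exactly the
   first-order-recurrence permute.  */
svint32_t
recur_shift_in (svint32_t prev, svint32_t cur)
{
  uint64_t n = svcntw ();	/* number of 32-bit lanes */
  svbool_t all = svptrue_b32 ();
  svbool_t all_but_last = svwhilelt_b32_u64 (0, n - 1);
  svbool_t last_only = svnot_b_z (all, all_but_last);
  return svsplice_s32 (last_only, prev, cur);
}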

I meant to start SVE tests running once I'd finished for the day yesterday,
but forgot, sorry.  Will try to test today.

On the patch:

+  /* This is the second phase of vectorizing first-order recurrences. An
+ overview of the transformation is described below. Suppose we have the
+ following loop.
+
+ int32_t t = 0;
+ for (int i = 0; i < n; ++i)
+   {
+   b[i] = a[i] - t;
+   t = a[i];
+  }
+
+There is a first-order recurrence on "a". For this loop, the shorthand
+scalar IR looks like:
+
+scalar.preheader:
+  init = a[-1]
+  br loop.body
+
+scalar.body:
+  i = PHI <0(scalar.preheader), i+1(scalar.body)>
+  _2 = PHI <init(scalar.preheader), _1(scalar.body)>
+  _1 = a[i]
+  b[i] = _1 - _2
+  br cond, scalar.body, ...
+
+In this example, _2 is a recurrence because its value depends on the
+previous iteration. In the first phase of vectorization, we created a
+temporary value for _2. We now complete the vectorization and produce the
+shorthand vector IR shown below (VF = 4).
+
+vector.preheader:
+  vect_init = vect_cst(..., ..., ..., a[-1])
+  br vector.body
+
+vector.body
+  i = PHI <0(vector.preheader), i+4(vector.body)>
+  vect_1 = PH

Re: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Uros Bizjak via Gcc-patches
On Tue, Oct 11, 2022 at 10:03 AM liuhongt  wrote:
>
> For general_reg_operand, it will be split into xor + not.
> For mask_reg_operand, it will be split with UNSPEC_MASK_OP just
> like we did for other logic operations.
>
> The patch will optimize xor+not to kxnor when possible.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*notxor_1): New post_reload
> define_insn_and_split.
> (*notxorqi_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr107093.c: New test.

OK with a small fix below.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md  | 71 
>  gcc/testsuite/gcc.target/i386/pr107093.c | 38 +
>  2 files changed, 109 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107093.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 1be9b669909..228edba2b40 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10826,6 +10826,39 @@ (define_insn "*<code><mode>_1"
>     (set_attr "type" "alu, alu, msklog")
>     (set_attr "mode" "<MODE>")])
>
> +(define_insn_and_split "*notxor<mode>_1"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
> +	(not:SWI248
> +	  (xor:SWI248
> +	    (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
> +	    (match_operand:SWI248 2 "<general_operand>" "r<i>,<m>,k"))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (XOR, <MODE>mode, operands)"
> +  "#"
> +  "&& reload_completed"
> +  [(parallel
> +    [(set (match_dup 0)
> +	  (xor:SWI248 (match_dup 1) (match_dup 2)))
> +     (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 0)
> +	(not:SWI248 (match_dup 1)))]

(not:SWI248 (match_dup 0))

in the above RTX.

> +{
> +  if (MASK_REGNO_P (REGNO (operands[0])))
> +    {
> +      emit_insn (gen_kxnor<mode> (operands[0], operands[1], operands[2]));
> +      DONE;
> +    }
> +}
> +  [(set (attr "isa")
> +	(cond [(eq_attr "alternative" "2")
> +		 (if_then_else (eq_attr "mode" "SI,DI")
> +		   (const_string "avx512bw")
> +		   (const_string "avx512f"))
> +	      ]
> +	      (const_string "*")))
> +   (set_attr "type" "alu, alu, msklog")
> +   (set_attr "mode" "<MODE>")])
> +
>  (define_insn_and_split "*iordi_1_bts"
>[(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
> (ior:DI
> @@ -10959,6 +10992,44 @@ (define_insn "*<code>qi_1"
>  	 (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
>       (symbol_ref "true")))])
>
> +(define_insn_and_split "*notxorqi_1"
> +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,?k")
> +	(not:QI
> +	  (xor:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
> +		  (match_operand:QI 2 "general_operand" "qn,m,rn,k"))))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (XOR, QImode, operands)"
> +  "#"
> +  "&& reload_completed"
> +  [(parallel
> +    [(set (match_dup 0)
> +	  (xor:QI (match_dup 1) (match_dup 2)))
> +     (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 0)
> +	(not:QI (match_dup 0)))]
> +{
> +  if (mask_reg_operand (operands[0], QImode))
> +    {
> +      emit_insn (gen_kxnorqi (operands[0], operands[1], operands[2]));
> +      DONE;
> +    }
> +}
> +  [(set_attr "isa" "*,*,*,avx512f")
> +   (set_attr "type" "alu,alu,alu,msklog")
> +   (set (attr "mode")
> +	(cond [(eq_attr "alternative" "2")
> +		 (const_string "SI")
> +	       (and (eq_attr "alternative" "3")
> +		    (match_test "!TARGET_AVX512DQ"))
> +		 (const_string "HI")
> +	      ]
> +	      (const_string "QI")))
> +   ;; Potential partial reg stall on alternative 2.
> +   (set (attr "preferred_for_speed")
> +     (cond [(eq_attr "alternative" "2")
> +	      (symbol_ref "!TARGET_PARTIAL_REG_STALL")]
> +	   (symbol_ref "true")))])
> +
>  ;; Alternative 1 is needed to work around LRA limitation, see PR82524.
>  (define_insn_and_split "*<code><mode>_1_slp"
>    [(set (strict_low_part (match_operand:SWI12 0 "register_operand" "+<r>,&<r>"))
> diff --git a/gcc/testsuite/gcc.target/i386/pr107093.c 
> b/gcc/testsuite/gcc.target/i386/pr107093.c
> new file mode 100644
> index 000..23e30cbac0f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr107093.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -O2 -mavx512vl" } */
> +/* { dg-final { scan-assembler-times {(?n)kxnor[bwqd]} 4 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times {(?n)kxnor[bwdq]} 3 { target ia32 } } } */
> +
> +#include <immintrin.h>
> +
> +__m512i
> +foo (__m512i a, __m512i b, __m512i c, __m512i d)
> +{
> +  __mmask32 k1 = _mm512_cmp_epi16_mask (a, b, 1);
> +  __mmask32 k2 = _mm512_cmp_epi16_mask (c, d, 2);
> +  return _mm512_mask_mov_epi16 (a, ~(k1 ^ k2), c);
> +}
> +
> +__m512i
> +foo1 (__m512i a, __m512i b, __m512i c, __m512i

Re: [PATCH v2] LoongArch: Libvtv add loongarch support.

2022-10-11 Thread Xi Ruoyao via Gcc-patches
On Mon, 2022-10-10 at 10:49 -0700, Caroline Tice via Gcc-patches wrote:
> Is "if (VTV_PAGE_SIZE != sysconf (_SC_PAGE_SIZE))" going to fail for
> loongarch?

Because LoongArch systems may have 4KB, 16KB, or 64KB pages.

> If not, why do you need to insert anything here at all?  If so,
> perhaps you could write something similar to sysconf_SC_PAGE_SIZE for
> loongarch (as was done for __CYGWIN__ & __MINGW32__)?

I'd like to ask a question: if we set VTV_PAGE_SIZE to 64KB and make the
special case, will libvtv work for 4KB and 16KB pages?  (If I read code
correctly, setting VTV_PAGE_SIZE to 4KB will obviously break 16KB or
64KB configuration.)

If VTV_PAGE_SIZE == sysconf (_SC_PAGE_SIZE) is strictly required for
libvtv we'll have to keep the check as-is and then we'll only support
16KB page configuration (which is the default in Linux kernel
configuration for LoongArch).
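For concreteness, this is the check in question, sketched with a
hypothetical 16KB setting (illustration only, not the libvtv source):

#include <unistd.h>

#define VTV_PAGE_SIZE 0x4000	/* hypothetical: 16KB, the LoongArch default */

/* A compile-time page size must match the running kernel exactly; a 4KB
   or 64KB kernel would fail this check, which is why a runtime
   sysconf-based definition was suggested above.  */
static int
vtv_page_size_matches (void)
{
  return VTV_PAGE_SIZE == sysconf (_SC_PAGE_SIZE);
}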

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


RE: [PATCH][GCC 12] arm: Fix constant immediates predicates and constraints for some MVE builtins

2022-10-11 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: Monday, October 10, 2022 4:30 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov 
> Subject: Re: [PATCH][GCC 12] arm: Fix constant immediates predicates and
> constraints for some MVE builtins
> 
> ping^2 ?
> 
> 
> On 10/5/22 16:55, Christophe Lyon via Gcc-patches wrote:
> > ping?
> >
> >
> > On 9/12/22 10:13, Christophe Lyon via Gcc-patches wrote:
> >> Hi!
> >>
> >> On 9/9/22 11:33, Christophe Lyon wrote:
> >>> This is a backport from trunk to gcc-12.
> >>>
> >>> Several MVE builtins incorrectly use the same predicate/constraint
> >>> pair for several modes, which does not match the specification.
> >>> This patch uses the appropriate iterator instead.

Ok.
Thanks,
Kyrill

> >>>
> >>> 2022-09-06  Christophe Lyon  
> >>>
> >>> gcc/
> >>> * config/arm/mve.md (mve_vqshluq_n_s<mode>): Use
> >>> MVE_pred/MVE_constraint instead of mve_imm_7/Ra.
> >>> (mve_vqshluq_m_n_s<mode>): Likewise.
> >>> (mve_vqrshrnbq_n_<supf><mode>): Use MVE_pred3/MVE_constraint3
> >>> instead of mve_imm_8/Rb.
> >>> (mve_vqrshrunbq_n_s<mode>): Likewise.
> >>> (mve_vqrshrntq_n_<supf><mode>): Likewise.
> >>> (mve_vqrshruntq_n_s<mode>): Likewise.
> >>> (mve_vrshrnbq_n_<mode>): Likewise.
> >>> (mve_vrshrntq_n_<mode>): Likewise.
> >>> (mve_vqrshrnbq_m_n_<supf><mode>): Likewise.
> >>> (mve_vqrshrntq_m_n_<supf><mode>): Likewise.
> >>> (mve_vrshrnbq_m_n_<mode>): Likewise.
> >>> (mve_vrshrntq_m_n_<mode>): Likewise.
> >>> (mve_vqrshrunbq_m_n_s<mode>): Likewise.
> >>> (mve_vsriq_n_<supf><mode>): Use MVE_pred2/MVE_constraint2 instead
> >>> of mve_imm_selective_upto_8/Rg.
> >>> (mve_vsriq_m_n_<supf><mode>): Likewise.
> >>>
> >>> (cheerry-picked from c3fb6658c7670e446f2fd00984404d971e416b3c)
> >>
> >>
> >> Is this backport OK for gcc-12? (with the "cheerry" typo above fixed)
> >>
> >> Thanks,
> >>
> >> Christophe
> >>
> >>
> >>> ---
> >>>   gcc/config/arm/mve.md | 30 +++---
> >>>   1 file changed, 15 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> >>> index f16991c0a34..469e7e7f8dc 100644
> >>> --- a/gcc/config/arm/mve.md
> >>> +++ b/gcc/config/arm/mve.md
> >>> @@ -1617,7 +1617,7 @@ (define_insn "mve_vqshluq_n_s<mode>"
> >>>     [
> >>>  (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> >>>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "w")
> >>> -   (match_operand:SI 2 "mve_imm_7" "Ra")]
> >>> +   (match_operand:SI 2 "<MVE_pred>" "<MVE_constraint>")]
> >>>    VQSHLUQ_N_S))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -2608,7 +2608,7 @@ (define_insn "mve_vqrshrnbq_n_<supf><mode>"
> >>>  (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
> >>>   (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
> >>>    (match_operand:MVE_5 2 "s_register_operand" "w")
> >>> - (match_operand:SI 3 "mve_imm_8" "Rb")]
> >>> + (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")]
> >>>    VQRSHRNBQ_N))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -2623,7 +2623,7 @@ (define_insn "mve_vqrshrunbq_n_s<mode>"
> >>>  (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
> >>>   (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
> >>>    (match_operand:MVE_5 2 "s_register_operand" "w")
> >>> - (match_operand:SI 3 "mve_imm_8" "Rb")]
> >>> + (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")]
> >>>    VQRSHRUNBQ_N_S))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -3563,7 +3563,7 @@ (define_insn "mve_vsriq_n_<supf><mode>"
> >>>  (set (match_operand:MVE_2 0 "s_register_operand" "=w")
> >>>   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
> >>>  (match_operand:MVE_2 2 "s_register_operand" "w")
> >>> -   (match_operand:SI 3 "mve_imm_selective_upto_8" "Rg")]
> >>> +   (match_operand:SI 3 "<MVE_pred2>" "<MVE_constraint2>")]
> >>>    VSRIQ_N))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -4466,7 +4466,7 @@ (define_insn "mve_vqrshrntq_n_<supf><mode>"
> >>>  (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
> >>>   (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
> >>>  (match_operand:MVE_5 2 "s_register_operand" "w")
> >>> -   (match_operand:SI 3 "mve_imm_8" "Rb")]
> >>> +   (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")]
> >>>    VQRSHRNTQ_N))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -4482,7 +4482,7 @@ (define_insn "mve_vqrshruntq_n_s<mode>"
> >>>  (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
> >>>   (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
> >>>  (match_operand:MVE_5 2 "s_register_operand" "w")
> >>> -   (match_operand:SI 3 "mve_imm_8" "Rb")]
> >>> +   (match_operand:SI 3 "<MVE_pred3>" "<MVE_constraint3>")]
> >>>    VQRSHRUNTQ_N_S))
> >>>     ]
> >>>     "TARGET_HAVE_MVE"
> >>> @@ -4770,7 +4770,7 @@ (define_insn "mve_vrshrnbq_n_<mode>"
> >>>  (set (match_operand:<V_narrow_pack> 0 "s_register_operand" "=w")
> >>>   (unspec:<V_narrow_pack> [(match_operand:<V_narrow_pack> 1 "s_register_operand" "0")
> >>>  (match_opera

Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Oct 07, 2022 at 04:26:58PM +0200, Tobias Burnus wrote:
> libgomp/nvptx: Prepare for reverse-offload callback handling
> 
> This patch adds a stub 'gomp_target_rev' in the host's target.c, which will
> later handle the reverse offload.
> For nvptx, it adds support for forwarding the offload gomp_target_ext call
> to the host by setting values in a struct on the device and querying it on
> the host - invoking gomp_target_rev on the result.
> 
> For host-device consistency guarantee reasons, reverse offload is currently
> limited -march=sm_70 (for libgomp).
> 
> gcc/ChangeLog:
> 
>   * config/nvptx/mkoffload.cc (process): Warn if the linked-in libgomp.a
>   has not been compiled with sm_70 or higher and disable code gen then.
> 
> include/ChangeLog:
> 
>   * cuda/cuda.h (enum CUdevice_attribute): Add
>   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.
>   (CU_MEMHOSTALLOC_DEVICEMAP): Define.
>   (cuMemHostAlloc): Add prototype.
> 
> libgomp/ChangeLog:
> 
>   * config/nvptx/icv-device.c (GOMP_DEVICE_NUM_VAR): Remove
>   'static' for this variable.
>   * config/nvptx/libgomp-nvptx.h: New file.
>   * config/nvptx/target.c: Include it.
>   (GOMP_ADDITIONAL_ICVS): Declare extern var.
>   (GOMP_REV_OFFLOAD_VAR): Declare var.
>   (GOMP_target_ext): Handle reverse offload.
>   * libgomp-plugin.h (GOMP_PLUGIN_target_rev): New prototype.
>   * libgomp-plugin.c (GOMP_PLUGIN_target_rev): New, call ...
>   * target.c (gomp_target_rev): ... this new stub function.
>   * libgomp.h (gomp_target_rev): Declare.
>   * libgomp.map (GOMP_PLUGIN_1.4): New; add GOMP_PLUGIN_target_rev.
>   * plugin/cuda-lib.def (cuMemHostAlloc): Add.
>   * plugin/plugin-nvptx.c: Include libgomp-nvptx.h.
>   (struct ptx_device): Add rev_data member. 
>   (nvptx_open_device): #if 0 unused check; add
>   unified address assert check.
>   (GOMP_OFFLOAD_get_num_devices): Claim unified address
>   support.
>   (GOMP_OFFLOAD_load_image): Free rev_fn_table if no
>   offload functions exist. Make offload var available
>   on host and device.
>   (rev_off_dev_to_host_cpy, rev_off_host_to_dev_cpy): New.
>   (GOMP_OFFLOAD_run): Handle reverse offload.

So, does this mean one has to have gcc configured --with-arch=sm_70
or later to make reverse offloading work (and then on the other
side no support for older PTX arches at all)?
If yes, I was kind of hoping we could arrange for it to be more
user-friendly, build libgomp.a normally (sm_35 or what is the default),
build the single TU in libgomp that needs the sm_70 stuff with -march=sm_70
and arrange for mkoffload to link in the sm_70 stuff only if the user
wants reverse offload (or has requires reverse_offload?).  In that case
ignore sm_60 and older devices, if reverse offload isn't wanted, don't link
in the part that needs sm_70 and make stuff working on sm_35 and later.
Or perhaps have 2 versions of target.o, one sm_35 and one sm_70 and let
mkoffload choose among them.

> +  /* The code for nvptx for GOMP_target_ext in 
> libgomp/config/nvptx/target.c
> +  for < sm_70 exists but is disabled here as it is unclear whether there
> +  is the required consistency between host and device.
> +  See https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602715.html
> +  for details.  */
> +  warning_at (input_location, 0,
> +   "Disabling offload-code generation for this device type: "
> +   "% can only be fulfilled "
> +   "for % or higher");
> +  inform (UNKNOWN_LOCATION,
> +   "Reverse offload requires that GCC is configured with "
> +   "%<--with-arch=sm_70%> or higher and not overridden by a lower "
> +   "value for %<-foffload-options=nvptx-none=-march=%>");

Diagnostics (sure, Fortran FE is an exception) shouldn't start with capital
letters.

> @@ -519,10 +523,20 @@ nvptx_open_device (int n)
> CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR, dev);
>ptx_dev->max_threads_per_multiprocessor = pi;
>  
> +#if 0
> +  int async_engines;
>r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &async_engines,
>CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT, dev);
>if (r != CUDA_SUCCESS)
>  async_engines = 1;
> +#endif

Please avoid #if 0 code.

> +
> +  /* Required below for reverse offload as implemented, but with compute
> + capability >= 2.0 and 64bit device processes, this should be 
> universally be
> + the case; hence, an assert.  */
> +  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
> +  CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING, dev);
> +  assert (r == CUDA_SUCCESS && pi);
>  
>for (int i = 0; i != GOMP_DIM_MAX; i++)
>  ptx_dev->default_dims[i] = 0;
> @@ -1179,8 +1193,10 @@ GOMP_OFFLOAD_get_num_devices (unsigned int 
> omp_requires_mask)
>  {
>int num_devices = nvptx_get_num_devices ();
>/* Return -1 if no omp_requires_mask 

Re: [PATCH 2/2] Split edge when edge locus and dest don't match

2022-10-11 Thread Jørgen Kvalsvik via Gcc-patches
On 07/10/2022 13:45, Jørgen Kvalsvik wrote:
> On 07/10/2022 08:53, Richard Biener wrote:
>> On Thu, Oct 6, 2022 at 4:28 PM Jørgen Kvalsvik
>>  wrote:
>>>
>>> On 06/10/2022 10:12, Richard Biener wrote:
 On Wed, Oct 5, 2022 at 2:49 PM Martin Liška  wrote:
>
> On 10/5/22 14:04, Jørgen Kvalsvik via Gcc-patches wrote:
>> Edges with locus are candidates for splitting so that the edge with
>> locus is the only edge out of a basic block, except when the locuses
>> match. The test checks the last (non-debug) statement in a basic block,
>> but this causes an unnecessary split when the edge locus go to a block
>> which coincidentally has an unrelated label. Comparing the first
>> statement of the destination block is the same check but does not get
>> tripped up by labels.
>>
>> An example with source/edge/dest locus when an edge is split:
>>
>>   1  int fn (int a, int b, int c) {
>>   2int x = 0;
>>   3if (a && b) {
>>   4x = a;
>>   5} else {
>>   6  a_:
>>   7x = (a - b);
>>   8}
>>   9
>>  10return x;
>>  11  }
>>
>> block  file  line  col  stmt
>>
>> src    t.c      3   10  if (a_3(D) != 0)
>> edge   t.c      6    1
>> dest   t.c      6    1  a_:
>>
>> src    t.c      3   13  if (b_4(D) != 0)
>> edge   t.c      6    1
>> dst    t.c      6    1  a_:
>>
>> With label removed:
>>
>>   1  int fn (int a, int b, int c) {
>>   2int x = 0;
>>   3if (a && b) {
>>   4x = a;
>>   5} else {
>>   6  // a_: <- label removed
>>   7x = (a - b);
>>   8}
>>   9
>>  10return x;
>>  11
>>
>> block  file    line  col  stmt
>>
>> src    t.c        3   10  if (a_3(D) != 0)
>> edge   (null)     0    0
>> dest   t.c        6    1  a_:
>>
>> src    t.c        3   13  if (b_4(D) != 0)
>> edge   (null)     0    0
>> dst    t.c        6    1  a_:
>>
>> and this is extract from gcov-4b.c which *should* split:
>>
>> 205  int
>> 206  test_switch (int i, int j)
>> 207  {
>> 208int result = 0;
>> 209
>> 210switch (i)/* branch(80 25) */
>> 211  /* branch(end) */
>> 212  {
>> 213case 1:
>> 214  result = do_something (2);
>> 215  break;
>> 216case 2:
>> 217  result = do_something (1024);
>> 218  break;
>> 219case 3:
>> 220case 4:
>> 221  if (j == 2) /* branch(67) */
>> 222  /* branch(end) */
>> 223return do_something (4);
>> 224  result = do_something (8);
>> 225  break;
>> 226default:
>> 227  result = do_something (32);
>> 228  switch_m++;
>> 229  break;
>> 230  }
>> 231return result;
>> 232  }
>>
>> block  file  line  col  stmt
>>
>> src    4b.c   214   18  result_18 = do_something (2);
>> edge   4b.c   215    9
>> dst    4b.c   231   10  _22 = result_3;
>>
>> src    4b.c   217   18  result_16 = do_something (1024);
>> edge   4b.c   218    9
>> dst    4b.c   231   10  _22 = result_3;
>>
>> src    4b.c   224   18  result_12 = do_something (8);
>> edge   4b.c   225    9
>> dst    4b.c   231   10  _22 = result_3;
>>
>> Note that the behaviour of comparison is preserved for the (switch) edge
>> splitting case. The former case now fails the check (even though
>> e->goto_locus is no longer a reserved location) because the dest matches
>> the e->locus.
>
> It's fine, please install it.
> I verified tramp3d coverage output is the same as before the patch.
>
> Martin
>
>>
>> gcc/ChangeLog:
>>
>> * profile.cc (branch_prob): Compare edge locus to dest, not src.
>> ---
>>  gcc/profile.cc | 18 +-
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/gcc/profile.cc b/gcc/profile.cc
>> index 96121d60711..c13a79a84c2 100644
>> --- a/gcc/profile.cc
>> +++ b/gcc/profile.cc
>> @@ -1208,17 +1208,17 @@ branch_prob (bool thunk)
>> FOR_EACH_EDGE (e, ei, bb->succs)
>>   {
>> gimple_stmt_iterator gsi;
>> -   gimple *last = NULL;
>> +   gimple *dest = NULL;
>>
>> /* It may happen that there are compiler generated statements
>>   

[committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
This patch series adds additional vector sizes for the amdgcn backend.

The hardware supports any arbitrary vector length up to 64-lanes via
masking, but GCC cannot (yet) make full use of them due to middle-end
limitations.  Adding smaller "virtual" vector sizes increases the
complexity of the backend a little, but opens up optimization
opportunities for the current middle-end implementation somewhat. In
particular, it enables many more cases of SLP optimization.

The patchset gives approximately 100 additional test PASSes and a few extra
FAILs.  However, the failures are not new issues, but rather existing
problems that did not show up because the code did not previously
vectorize.  Expanding the testcase to allow 64-lane vectors shows the
same problems there.

I shall backport these patches to the OG12 branch shortly.

Andrew

Andrew Stubbs (6):
  amdgcn: add multiple vector sizes
  amdgcn: Resolve insn conditions at compile time
  amdgcn: Add vec_extract for partial vectors
  amdgcn: vec_init for multiple vector sizes
  amdgcn: Add vector integer negate insn
  amdgcn: vector testsuite tweaks

 gcc/config/gcn/gcn-modes.def  |   82 ++
 gcc/config/gcn/gcn-protos.h   |   24 +-
 gcc/config/gcn/gcn-valu.md|  399 +--
 gcc/config/gcn/gcn.cc | 1063 +++--
 gcc/config/gcn/gcn.h  |   24 +
 gcc/testsuite/gcc.dg/pr104464.c   |2 +
 gcc/testsuite/gcc.dg/signbit-2.c  |5 +-
 gcc/testsuite/gcc.dg/signbit-5.c  |1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c |5 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c |3 +-
 .../gcc.dg/vect/bb-slp-subgroups-3.c  |5 +-
 .../gcc.dg/vect/no-vfa-vect-depend-2.c|3 +-
 gcc/testsuite/gcc.dg/vect/pr33953.c   |3 +-
 gcc/testsuite/gcc.dg/vect/pr65947-12.c|3 +-
 gcc/testsuite/gcc.dg/vect/pr65947-13.c|3 +-
 gcc/testsuite/gcc.dg/vect/pr80631-2.c |3 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c   |3 +-
 .../gcc.dg/vect/trapv-vect-reduc-4.c  |3 +-
 gcc/testsuite/lib/target-supports.exp |3 +-
 19 files changed, 1183 insertions(+), 454 deletions(-)

-- 
2.37.0



[committed 2/6] amdgcn: Resolve insn conditions at compile time

2022-10-11 Thread Andrew Stubbs

GET_MODE_NUNITS isn't a compile time constant, so we end up with many
impossible insns in the machine description.  Adding MODE_VF allows the insns
to be eliminated completely.
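The trick in miniature (hypothetical names, not the GCC macros):

/* A chain of == tests over enumerators folds to an integer constant
   expression, so it can be evaluated while genconditions builds the
   compiler, unlike GET_MODE_NUNITS, which reads a run-time table.  */
enum mode { M_V2SI, M_V4SI, M_V64SI };
#define MODE_LANES(M) \
  ((M) == M_V2SI ? 2 : (M) == M_V4SI ? 4 : 64)

_Static_assert (MODE_LANES (M_V4SI) == 4, "foldable at compile time");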

gcc/ChangeLog:

* config/gcn/gcn-valu.md
(<cvt_name><VCVT_MODE:mode><VCVT_FMODE:mode>2): Use MODE_VF.
(<cvt_name><VCVT_FMODE:mode><VCVT_IMODE:mode>2): Likewise.
* config/gcn/gcn.h (MODE_VF): New macro.
---
 gcc/config/gcn/gcn-valu.md | 10 ++
 gcc/config/gcn/gcn.h   | 24 
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 52d2fcb880a..c7be2361164 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2873,8 +2873,9 @@ (define_insn "<cvt_name><VCVT_MODE:mode><VCVT_FMODE:mode>2"
   [(set (match_operand:VCVT_FMODE 0 "register_operand" "=  v")
 	(cvt_op:VCVT_FMODE
 	  (match_operand:VCVT_MODE 1 "gcn_alu_operand" "vSvB")))]
-  "gcn_valid_cvt_p (<VCVT_MODE:MODE>mode, <VCVT_FMODE:MODE>mode,
-		    <cvt_name>_cvt)"
+  "MODE_VF (<VCVT_MODE:MODE>mode) == MODE_VF (<VCVT_FMODE:MODE>mode)
+   && gcn_valid_cvt_p (<VCVT_MODE:MODE>mode, <VCVT_FMODE:MODE>mode,
+		       <cvt_name>_cvt)"
   "v_cvt<cvt_operands>\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
@@ -2883,8 +2884,9 @@ (define_insn "<cvt_name><VCVT_FMODE:mode><VCVT_IMODE:mode>2"
   [(set (match_operand:VCVT_IMODE 0 "register_operand"  "=  v")
 	(cvt_op:VCVT_IMODE
 	  (match_operand:VCVT_FMODE 1 "gcn_alu_operand" "vSvB")))]
-  "gcn_valid_cvt_p (<VCVT_FMODE:MODE>mode, <VCVT_IMODE:MODE>mode,
-		    <cvt_name>_cvt)"
+  "MODE_VF (<VCVT_FMODE:MODE>mode) == MODE_VF (<VCVT_IMODE:MODE>mode)
+   && gcn_valid_cvt_p (<VCVT_FMODE:MODE>mode, <VCVT_IMODE:MODE>mode,
+		       <cvt_name>_cvt)"
   "v_cvt<cvt_operands>\t%0, %1"
   [(set_attr "type" "vop1")
    (set_attr "length" "8")])
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 318256c4a7a..38f7212db59 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -678,3 +678,27 @@ enum gcn_builtin_codes
 /* Trampolines */
 #define TRAMPOLINE_SIZE 36
 #define TRAMPOLINE_ALIGNMENT 64
+
+/* MD Optimization.
+   The following are intended to be obviously constant at compile time to
+   allow genconditions to eliminate bad patterns at compile time.  */
+#define MODE_VF(M) \
+  ((M == V64QImode || M == V64HImode || M == V64HFmode || M == V64SImode \
+|| M == V64SFmode || M == V64DImode || M == V64DFmode) \
+   ? 64 \
+   : (M == V32QImode || M == V32HImode || M == V32HFmode || M == V32SImode \
+  || M == V32SFmode || M == V32DImode || M == V32DFmode) \
+   ? 32 \
+   : (M == V16QImode || M == V16HImode || M == V16HFmode || M == V16SImode \
+  || M == V16SFmode || M == V16DImode || M == V16DFmode) \
+   ? 16 \
+   : (M == V8QImode || M == V8HImode || M == V8HFmode || M == V8SImode \
+  || M == V8SFmode || M == V8DImode || M == V8DFmode) \
+   ? 8 \
+   : (M == V4QImode || M == V4HImode || M == V4HFmode || M == V4SImode \
+  || M == V4SFmode || M == V4DImode || M == V4DFmode) \
+   ? 4 \
+   : (M == V2QImode || M == V2HImode || M == V2HFmode || M == V2SImode \
+  || M == V2SFmode || M == V2DImode || M == V2DFmode) \
+   ? 2 \
+   : 1)


[committed 3/6] amdgcn: Add vec_extract for partial vectors

2022-10-11 Thread Andrew Stubbs

Add vec_extract expanders for all valid pairs of vector types.
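In scalar terms the new operation is (assumed semantics, illustration only):

/* vec_extract of sub-vector K from a larger vector selects the
   contiguous lane group [K*n, K*n + n); the expander below uses a lane
   permutation (ds_bpermute) for the K != 0 case.  */
static void
extract_subvector (const int *src, int *dst, int k, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[k * n + i];
}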

gcc/ChangeLog:

* config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants.
* config/gcn/gcn-valu.md
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): New define_expand.
* config/gcn/gcn.cc (get_exec): Export the existing function. Add a
new overload variant.
---
 gcc/config/gcn/gcn-protos.h |  2 ++
 gcc/config/gcn/gcn-valu.md  | 34 ++
 gcc/config/gcn/gcn.cc   |  9 -
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
index 6300c1cbd36..f9a1fc00b4f 100644
--- a/gcc/config/gcn/gcn-protos.h
+++ b/gcc/config/gcn/gcn-protos.h
@@ -24,6 +24,8 @@ extern bool gcn_constant64_p (rtx);
 extern bool gcn_constant_p (rtx);
 extern rtx gcn_convert_mask_mode (rtx reg);
 extern unsigned int gcn_dwarf_register_number (unsigned int regno);
+extern rtx get_exec (int64_t);
+extern rtx get_exec (machine_mode mode);
 extern char * gcn_expand_dpp_shr_insn (machine_mode, const char *, int, int);
 extern void gcn_expand_epilogue ();
 extern rtx gcn_expand_scaled_offsets (addr_space_t as, rtx base, rtx offsets,
diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index c7be2361164..9ea60e1174f 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -808,6 +808,40 @@ (define_insn "vec_extract<mode><scalar_mode>"
    (set_attr "exec" "none")
    (set_attr "laneselect" "yes")])
 
+(define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
+  [(set (match_operand:V_ALL_ALT 0 "register_operand")
+	(vec_select:V_ALL_ALT
+	  (match_operand:V_ALL 1 "register_operand")
+	  (parallel [(match_operand 2 "immediate_operand")])))]
+  "MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)
+   && <V_ALL:SCALAR_MODE>mode == <V_ALL_ALT:SCALAR_MODE>mode"
+  {
+    int numlanes = GET_MODE_NUNITS (<V_ALL_ALT:MODE>mode);
+    int firstlane = INTVAL (operands[2]) * numlanes;
+    rtx tmp;
+
+    if (firstlane == 0)
+      {
+	/* A plain move will do.  */
+	tmp = operands[1];
+      } else {
+    /* FIXME: optimize this by using DPP where available.  */
+
+    rtx permutation = gen_reg_rtx (<V_ALL:VnSI>mode);
+	emit_insn (gen_vec_series<V_ALL:vnsi> (permutation,
+	   GEN_INT (firstlane*4),
+	   GEN_INT (4)));
+
+	tmp = gen_reg_rtx (<V_ALL:MODE>mode);
+	emit_insn (gen_ds_bpermute<V_ALL:mode> (tmp, permutation, operands[1],
+		get_exec (<V_ALL:MODE>mode)));
+  }
+
+emit_move_insn (operands[0],
+		gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
+DONE;
+  })
+
 (define_expand "extract_last_<mode>"
   [(match_operand:<SCALAR_MODE> 0 "register_operand")
    (match_operand:DI 1 "gcn_alu_operand")
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e1636f6ddd6..fdcf290ef8b 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -846,7 +846,7 @@ gcn_ira_change_pseudo_allocno_class (int regno, reg_class_t cl,
 /* Create a new DImode pseudo reg and emit an instruction to initialize
it to VAL.  */
 
-static rtx
+rtx
 get_exec (int64_t val)
 {
   rtx reg = gen_reg_rtx (DImode);
@@ -854,6 +854,13 @@ get_exec (int64_t val)
   return reg;
 }
 
+rtx
+get_exec (machine_mode mode)
+{
+  int vf = (VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1);
+  return get_exec (0xffffffffffffffffUL >> (64-vf));
+}
+
 /* }}}  */
 /* {{{ Immediate constants.  */
 


[committed 4/6] amdgcn: vec_init for multiple vector sizes

2022-10-11 Thread Andrew Stubbs

Implements vec_init when the input is a vector of smaller vectors, or of
vector MEM types, or a smaller vector duplicated several times.
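In scalar terms (assumed semantics, illustration only):

/* vec_init from sub-vectors concatenates them into the destination;
   when every element is the same sub-vector the expander can broadcast
   it instead (the "simple_repeat" case in the code below).  */
static void
init_from_subvectors (int *dst, const int *const sub[], int parts, int n)
{
  for (int p = 0; p < parts; p++)
    for (int i = 0; i < n; i++)
      dst[p * n + i] = sub[p][i];
}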

gcc/ChangeLog:

* config/gcn/gcn-valu.md (vec_init<V_ALL:mode><V_ALL_ALT:mode>): New.
* config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3.
(GEN_VNM): Add gathervNm_expr.
(GEN_VN_NOEXEC): Add vec_seriesvNsi.
(gcn_expand_vector_init): Add initialization of vectors from smaller
vectors.
---
 gcc/config/gcn/gcn-valu.md |  10 +++
 gcc/config/gcn/gcn.cc  | 159 +++--
 2 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 9ea60e1174f..f708e587f38 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -893,6 +893,16 @@ (define_expand "vec_init<mode><scalar_mode>"
     DONE;
   })
 
+(define_expand "vec_init<V_ALL:mode><V_ALL_ALT:mode>"
+  [(match_operand:V_ALL 0 "register_operand")
+   (match_operand:V_ALL_ALT 1)]
+  "<V_ALL:SCALAR_MODE>mode == <V_ALL_ALT:SCALAR_MODE>mode
+   && MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)"
+  {
+    gcn_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  })
+
 ;; }}}
 ;; {{{ Scatter / Gather
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index fdcf290ef8b..3dc294c2d2f 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -1365,12 +1365,17 @@ GEN_VN (add,di3_vcc_zext_dup2, A(rtx dest, rtx src1, rtx src2, rtx vcc),
 	A(dest, src1, src2, vcc))
 GEN_VN (addc,si3, A(rtx dest, rtx src1, rtx src2, rtx vccout, rtx vccin),
 	A(dest, src1, src2, vccout, vccin))
+GEN_VN (and,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
 GEN_VN (ashl,si3, A(rtx dest, rtx src, rtx shift), A(dest, src, shift))
 GEN_VNM_NOEXEC (ds_bpermute,, A(rtx dest, rtx addr, rtx src, rtx exec),
 		A(dest, addr, src, exec))
+GEN_VNM (gather,_expr, A(rtx dest, rtx addr, rtx as, rtx vol),
+	 A(dest, addr, as, vol))
 GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src))
 GEN_VN (mul,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
+GEN_VN (sub,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
 GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src))
+GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), A(dest, x, c))
 
 #undef GEN_VNM
 #undef GEN_VN
@@ -1993,44 +1998,146 @@ regno_ok_for_index_p (int regno)
 void
 gcn_expand_vector_init (rtx op0, rtx vec)
 {
-  int64_t initialized_mask = 0;
-  int64_t curr_mask = 1;
+  rtx val[64];
   machine_mode mode = GET_MODE (op0);
   int vf = GET_MODE_NUNITS (mode);
+  machine_mode addrmode = VnMODE (vf, DImode);
+  machine_mode offsetmode = VnMODE (vf, SImode);
 
-  rtx val = XVECEXP (vec, 0, 0);
+  int64_t mem_mask = 0;
+  int64_t item_mask[64];
+  rtx ramp = gen_reg_rtx (offsetmode);
+  rtx addr = gen_reg_rtx (addrmode);
 
-  for (int i = 1; i < vf; i++)
-if (rtx_equal_p (val, XVECEXP (vec, 0, i)))
-  curr_mask |= (int64_t) 1 << i;
+  int unit_size = GET_MODE_SIZE (GET_MODE_INNER (GET_MODE (op0)));
+  emit_insn (gen_mulvNsi3_dup (ramp, gen_rtx_REG (offsetmode, VGPR_REGNO (1)),
+			   GEN_INT (unit_size)));
 
-  if (gcn_constant_p (val))
-emit_move_insn (op0, gcn_vec_constant (mode, val));
-  else
+  bool simple_repeat = true;
+
+  /* Expand nested vectors into one vector.  */
+  int item_count = XVECLEN (vec, 0);
+  for (int i = 0, j = 0; i < item_count; i++)
+{
+  rtx item = XVECEXP (vec, 0, i);
+  machine_mode mode = GET_MODE (item);
+  int units = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1;
+  item_mask[j] = (((uint64_t)-1)>>(64-units)) << j;
+
+  if (simple_repeat && i != 0)
+	simple_repeat = item == XVECEXP (vec, 0, i-1);
+
+  /* If its a vector of values then copy them into the final location.  */
+  if (GET_CODE (item) == CONST_VECTOR)
+	{
+	  for (int k = 0; k < units; k++)
+	val[j++] = XVECEXP (item, 0, k);
+	  continue;
+	}
+  /* Otherwise, we have a scalar or an expression that expands...  */
+
+  if (MEM_P (item))
+	{
+	  rtx base = XEXP (item, 0);
+	  if (MEM_ADDR_SPACE (item) == DEFAULT_ADDR_SPACE
+	  && REG_P (base))
+	{
+	  /* We have a simple vector load.  We can put the addresses in
+		 the vector, combine it with any other such MEMs, and load it
+		 all with a single gather at the end.  */
+	  int64_t mask = ((0xffffffffffffffffUL
+			   >> (64-GET_MODE_NUNITS (mode)))
+			  << j);
+	  rtx exec = get_exec (mask);
+	  emit_insn (gen_subvNsi3
+			 (ramp, ramp,
+			  gcn_vec_constant (offsetmode, j*unit_size),
+			  ramp, exec));
+	  emit_insn (gen_addvNdi3_zext_dup2
+			 (addr, ramp, base,
+			  (mem_mask ? addr : gcn_gen_undef (addrmode)),
+			  exec));
+	  mem_mask |= mask;
+	}
+	  else
+	/* The MEM is non-trivial, so let's load it independently.  */
+	item = force_reg (mode, item);
+	}
+  else if (!CONST_INT_P (item) && !CONST_DOUBLE_P (item))
+	/* The item may be a symbol_ref, or something else non-trivial.  */
+	item = force_reg (mode, item);
+
+  /* Duplicate the vector across e

[committed 1/6] amdgcn: add multiple vector sizes

2022-10-11 Thread Andrew Stubbs

The vector sizes are simulated using implicit masking, but they make life
easier for the autovectorizer and SLP passes.
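
Concretely, an N-lane "virtual" vector executes on the 64-lane hardware
with only the low N bits of EXEC set, i.e. (a stand-alone illustration,
not the backend code itself):

#include <cstdint>

/* EXEC mask selecting the low N of 64 lanes; the same mask arithmetic
   appears in the gcn_expand_vector_init changes in this series.  */
static inline uint64_t
exec_mask_for (int nunits)
{
  return nunits == 64 ? ~UINT64_C (0) : (UINT64_C (1) << nunits) - 1;
}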

gcc/ChangeLog:

* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes
V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF,
V16QI, V16HI, V16SI, V16DI, V16TI, V16HF, V16SF, V16DF,
V8QI, V8HI, V8SI, V8DI, V8TI, V8HF, V8SF, V8DF,
V4QI, V4HI, V4SI, V4DI, V4TI, V4HF, V4SF, V4DF,
V2QI, V2HI, V2SI, V2DI, V2TI, V2HF, V2SF, V2DF.
(ADJUST_ALIGNMENT): Likewise.
* config/gcn/gcn-protos.h (gcn_full_exec): Delete.
(gcn_full_exec_reg): Delete.
(gcn_scalar_exec): Delete.
(gcn_scalar_exec_reg): Delete.
(vgpr_1reg_mode_p): Use inner mode to identify vector registers.
(vgpr_2reg_mode_p): Likewise.
(vgpr_vector_mode_p): Use VECTOR_MODE_P.
* config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI, V_DF,
V_QIHI, V_1REG, V_INT_1REG, V_INT_1REG_ALT, V_FP_1REG, V_2REG, V_noQI,
V_noHI, V_INT_noQI, V_INT_noHI, V_ALL, V_ALL_ALT, V_INT, V_FP):
Add additional vector modes.
(V64_SI, V64_DI, V64_ALL, V64_FP): New iterators.
(scalar_mode, SCALAR_MODE, vnsi, VnSI, vndi, VnDI, sdwa):
Add additional vector mode mappings.
(mov): Implement vector length conversions.
(ldexp3): Use VnSI.
(frexp_exp2): Likewise.
(VCVT_MODE, VCVT_FMODE, VCVT_IMODE): Add additional vector modes.
(reduc__scal_): Use V64_ALL.
(fold_left_plus_): Use V64_FP.
(*_dpp_shr_): Use V64_1REG.
(*_dpp_shr_): Use V64_DI.
(*plus_carry_dpp_shr_): Use V64_INT_1REG.
(*plus_carry_in_dpp_shr_): Use V64_SI.
(*plus_carry_dpp_shr_): Use V64_DI.
(mov_from_lane63_): Use V64_2REG.
* config/gcn/gcn.cc (VnMODE): New function.
(gcn_can_change_mode_class): Support multiple vector sizes.
(gcn_modes_tieable_p): Likewise.
(gcn_operand_part): Likewise.
(gcn_scalar_exec): Delete function.
(gcn_scalar_exec_reg): Delete function.
(gcn_full_exec): Delete function.
(gcn_full_exec_reg): Delete function.
(gcn_inline_fp_constant_p): Support multiple vector sizes.
(gcn_fp_constant_p): Likewise.
(A): New macro.
(GEN_VN_NOEXEC): New macro.
(GEN_VNM_NOEXEC): New macro.
(GEN_VN): New macro.
(GEN_VNM): New macro.
(GET_VN_FN): New macro.
(CODE_FOR): New macro.
(CODE_FOR_OP): New macro.
(gen_mov_with_exec): Delete function.
(gen_duplicate_load): Delete function.
(gcn_expand_vector_init): Support multiple vector sizes.
(strided_constant): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(gcn_expand_scaled_offsets): Likewise.
(gcn_secondary_reload): Likewise.
(gcn_valid_cvt_p): Likewise.
(gcn_expand_builtin_1): Likewise.
(gcn_make_vec_perm_address): Likewise.
(gcn_vectorize_vec_perm_const): Likewise.
(gcn_vector_mode_supported_p): Likewise.
(gcn_autovectorize_vector_modes): New hook.
(gcn_related_vector_mode): Support multiple vector sizes.
(gcn_expand_dpp_shr_insn): Add FIXME comment.
(gcn_md_reorg): Support multiple vector sizes.
(print_reg): Likewise.
(print_operand): Likewise.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): New hook.
---
 gcc/config/gcn/gcn-modes.def |  82 
 gcc/config/gcn/gcn-protos.h  |  22 +-
 gcc/config/gcn/gcn-valu.md   | 332 ++---
 gcc/config/gcn/gcn.cc| 927 ++-
 4 files changed, 938 insertions(+), 425 deletions(-)

diff --git a/gcc/config/gcn/gcn-modes.def b/gcc/config/gcn/gcn-modes.def
index 82585de798b..1b8a3203463 100644
--- a/gcc/config/gcn/gcn-modes.def
+++ b/gcc/config/gcn/gcn-modes.def
@@ -29,6 +29,48 @@ VECTOR_MODE (FLOAT, HF, 64);/*		  V64HF */
 VECTOR_MODE (FLOAT, SF, 64);/*		  V64SF */
 VECTOR_MODE (FLOAT, DF, 64);/*		  V64DF */
 
+/* Artificial vector modes, for when vector masking doesn't work (yet).  */
+VECTOR_MODE (INT, QI, 32);  /*		  V32QI */
+VECTOR_MODE (INT, HI, 32);  /*		  V32HI */
+VECTOR_MODE (INT, SI, 32);  /*		  V32SI */
+VECTOR_MODE (INT, DI, 32);  /*		  V32DI */
+VECTOR_MODE (INT, TI, 32);  /*		  V32TI */
+VECTOR_MODE (FLOAT, HF, 32);/*		  V32HF */
+VECTOR_MODE (FLOAT, SF, 32);/*		  V32SF */
+VECTOR_MODE (FLOAT, DF, 32);/*		  V32DF */
+VECTOR_MODE (INT, QI, 16);  /*		  V16QI */
+VECTOR_MODE (INT, HI, 16);  /*		  V16HI */
+VECTOR_MODE (INT, SI, 16);  /*		  V16SI */
+VECTOR_MODE (INT, DI, 16);  /*		  V16DI */
+VECTOR_MODE (INT, TI, 16);  /*		  V16TI */
+VECTOR_MODE (FLOAT, HF, 16);/*		  V16HF */
+VECTOR_MODE (FLOAT, SF, 16);/*		  V16SF */
+VECTOR_MODE (FLOAT, DF, 16);/*		  V16DF */
+VECTOR_MODE (I

[committed 5/6] amdgcn: Add vector integer negate insn

2022-10-11 Thread Andrew Stubbs

Another example of the vectorizer needing explicit insns where the scalar
expander just works.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (neg2): New define_expand.
---
 gcc/config/gcn/gcn-valu.md | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index f708e587f38..00c0e3be1ea 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -2390,6 +2390,19 @@ (define_insn "3"
   [(set_attr "type" "vop2,ds")
(set_attr "length" "8,8")])
 
+;; }}}
+;; {{{ Int unops
+
+(define_expand "neg2"
+  [(match_operand:V_INT 0 "register_operand")
+   (match_operand:V_INT 1 "register_operand")]
+  ""
+  {
+emit_insn (gen_sub3 (operands[0], gcn_vec_constant (mode, 0),
+			   operands[1]));
+DONE;
+  })
+
 ;; }}}
 ;; {{{ FP binops - special cases
 


[committed 6/6] amdgcn: vector testsuite tweaks

2022-10-11 Thread Andrew Stubbs

The testsuite needs a few tweaks following my patches to add multiple vector
sizes for amdgcn.

gcc/testsuite/ChangeLog:

* gcc.dg/pr104464.c: Xfail on amdgcn.
* gcc.dg/signbit-2.c: Likewise.
* gcc.dg/signbit-5.c: Likewise.
* gcc.dg/vect/bb-slp-68.c: Likewise.
* gcc.dg/vect/bb-slp-cond-1.c: Change expectations on amdgcn.
* gcc.dg/vect/bb-slp-subgroups-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple
vector sizes.
* gcc.dg/vect/pr33953.c: Likewise.
* gcc.dg/vect/pr65947-12.c: Likewise.
* gcc.dg/vect/pr65947-13.c: Likewise.
* gcc.dg/vect/pr80631-2.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/trapv-vect-reduc-4.c: Likewise.
* lib/target-supports.exp (available_vector_sizes): Add more sizes
for amdgcn.
---
 gcc/testsuite/gcc.dg/pr104464.c  | 2 ++
 gcc/testsuite/gcc.dg/signbit-2.c | 5 +++--
 gcc/testsuite/gcc.dg/signbit-5.c | 1 +
 gcc/testsuite/gcc.dg/vect/bb-slp-68.c| 5 +++--
 gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c| 3 ++-
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c   | 5 -
 gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr33953.c  | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr65947-12.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr65947-13.c   | 3 ++-
 gcc/testsuite/gcc.dg/vect/pr80631-2.c| 3 ++-
 gcc/testsuite/gcc.dg/vect/slp-reduc-4.c  | 3 ++-
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c   | 3 ++-
 gcc/testsuite/lib/target-supports.exp| 3 ++-
 14 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr104464.c b/gcc/testsuite/gcc.dg/pr104464.c
index ed6a22c39d5..d36a28678cb 100644
--- a/gcc/testsuite/gcc.dg/pr104464.c
+++ b/gcc/testsuite/gcc.dg/pr104464.c
@@ -9,3 +9,5 @@ foo(void)
 {
   f += (F)(f != (F){}[0]);
 }
+
+/* { dg-xfail-if "-fnon-call-exceptions unsupported" { amdgcn-*-* } } */
diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index 2f2dc448286..99a455bc7d7 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -20,6 +20,7 @@ void fun2(int32_t *x, int n)
   x[i] = (-x[i]) >> 30;
 }
 
-/* { dg-final { scan-tree-dump {\s+>\s+\{ 0(, 0)+ \}} optimized { target vect_int } } } */
+/* Xfail amdgcn where vector truth type is not integer type.  */
+/* { dg-final { scan-tree-dump {\s+>\s+\{ 0(, 0)+ \}} optimized { target vect_int xfail amdgcn-*-* } } } */
 /* { dg-final { scan-tree-dump {\s+>\s+0} optimized { target { ! vect_int } } } } */
-/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
+/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized { xfail amdgcn-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c
index 2b119cdfda7..0fad56c0ea8 100644
--- a/gcc/testsuite/gcc.dg/signbit-5.c
+++ b/gcc/testsuite/gcc.dg/signbit-5.c
@@ -4,6 +4,7 @@
 /* This test does not work when the truth type does not match vector type.  */
 /* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
+/* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
 
 
 #include 
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
index 8718031cc71..e7573a14933 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c
@@ -18,5 +18,6 @@ void foo ()
   x[9] = z[3] + 1.;
 }
 
-/* We want to have the store group split into 4, 2, 4 when using 32byte vectors.  */
-/* { dg-final { scan-tree-dump-not "from scalars" "slp2" } } */
+/* We want to have the store group split into 4, 2, 4 when using 32byte vectors.
+   Unfortunately it does not work when 64-byte vectors are available.  */
+/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
index 4bd286bf08c..1f5c621e5fd 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
@@ -46,5 +46,6 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */
-/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target vect_element_align } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && !amdgcn-*-* } } } } */
+/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target amdgcn-*-* } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c
index 03c062ae6cf..fb71991

Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Oct 2022 at 07:41, Richard Biener  wrote:
>
> On Mon, Oct 10, 2022 at 5:10 PM Jonathan Wakely  wrote:
> >
> > On Mon, 10 Oct 2022 at 12:17, Jonathan Wakely  wrote:
> > >
> > > On Mon, 10 Oct 2022 at 07:18, Richard Biener  
> > > wrote:
> > > >
> > > > On Fri, Oct 7, 2022 at 5:55 PM Jonathan Wakely via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > This needs a little more documentation (see the TODO in the manual),
> > > > > rather than just the comments in the source. This isn't final, but I
> > > > > think it's the direction I want to take.
> > > > >
> > > > > -- >8 --
> > > > >
> > > > > Implement a long-standing request to support tuning the size of the
> > > > > emergency buffer for allocating exceptions after malloc fails, or to
> > > > > disable that buffer entirely.
> > > > >
> > > > > It's now possible to disable the dynamic allocation of the buffer and
> > > > > use a fixed-size static buffer, via --enable-libstdcxx-static-eh-pool.
> > > > > This is a build-time choice that is baked into libstdc++ and so 
> > > > > affects
> > > > > all code linked against that build of libstdc++.
> > > > >
> > > > > The size of the pool can be set by 
> > > > > --with-libstdcxx-eh-pool-obj-count=N
> > > > > which is measured in units of sizeof(void*) not bytes. A given 
> > > > > exception
> > > > > type such as std::system_error depends on the target, so giving a size
> > > > > in bytes wouldn't be portable across 16/32/64-bit targets.
> > > > >
> > > > > When libstdc++ is configured to use a dynamic buffer, the size of that
> > > > > buffer can now be tuned at runtime by setting the GLIBCXX_TUNABLES
> > > > > environment variable (c.f. PR libstdc++/88264). The number of 
> > > > > exceptions
> > > > > to reserve space for is controlled by the "glibcxx.eh_pool.obj_count"
> > > > > and "glibcxx.eh_pool.obj_size" tunables. The pool will be sized to be
> > > > > able to allocate obj_count exceptions of size obj_size*sizeof(void*) 
> > > > > and
> > > > > obj_count "dependent" exceptions rethrown by std::rethrow_exception.
> > > > >
> > > > > With the ability to tune the buffer size, we can reduce the default 
> > > > > pool
> > > > > size. Most users never need to throw 1kB exceptions in parallel from
> > > > > hundreds of threads after malloc is OOM.
> > > >
> > > > But does it hurt?  Back in time when I reworked the allocator to be less
> > > > wasteful the whole point was to allow more exceptions to be in-flight
> > > > during OOM shutdown of a process with many threads.
> > >
> > > It certainly hurts for small systems, but maybe we can keep the large
> > > allocation for 64-bit targets (currently 73kB) and only reduce it for
> > > 32-bit (19kB) and 16-bit (3kB IIRC) targets.
> >
> > Maybe this incremental diff would be an improvement:
> >
> > @@ -90,7 +90,7 @@ using namespace __cxxabiv1;
> > // Assume that the number of concurrent exception objects scales with the
> > // processor word size, i.e., 16-bit systems are not likely to have hundreds
> > // of threads all simultaneously throwing on OOM conditions.
> > -# define EMERGENCY_OBJ_COUNT   (8 * __SIZEOF_POINTER__)
> > +# define EMERGENCY_OBJ_COUNT   (4 * __SIZEOF_POINTER__ * __SIZEOF_POINTER__)
> > # define MAX_OBJ_COUNT  (16 << __SIZEOF_POINTER__)
> > #else
> > # define EMERGENCY_OBJ_COUNT   4
> >
> > This makes it quadratic in the word size, so on 64-bit targets we'd
> > have space for 256 "reasonable size" exceptions (and twice as many
> > single word exceptions like std::bad_alloc), but only 64 on 32-bit
> > targets, and only 16 on 16-bit ones.
>
> So can we then commonize some of the #defines by using sizeof(void *)
> (taking pointer size as word size?)

What did you have in mind? Do you mean use sizeof(void*) instead of
the SIZEOF macro?
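
For reference, the quadratic formula quoted above works out like this
(a stand-alone sketch, not libstdc++ code):

#include <cstdio>

int main ()
{
  const int widths[] = { 2, 4, 8 };   /* 16-, 32- and 64-bit targets */
  for (int w : widths)
    std::printf ("%d-byte pointers -> %d emergency objects\n",
                 w, 4 * w * w);       /* prints 16, 64 and 256 */
  return 0;
}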

MAX_OBJ_COUNT uses the SIZEOF macro so it can be used in a
preprocessor condition:

#ifdef _GLIBCXX_EH_POOL_NOBJS
# if _GLIBCXX_EH_POOL_NOBJS > MAX_OBJ_COUNT



Re: [Patch][v5] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-10-11 Thread Alexander Monakov
On Tue, 11 Oct 2022, Jakub Jelinek wrote:

> So, does this mean one has to have gcc configured --with-arch=sm_70
> or later to make reverse offloading work (and then on the other
> side no support for older PTX arches at all)?
> If yes, I was kind of hoping we could arrange for it to be more
> user-friendly, build libgomp.a normally (sm_35 or what is the default),
> build the single TU in libgomp that needs the sm_70 stuff with -march=sm_70
> and arrange for mkoffload to link in the sm_70 stuff only if the user
> wants reverse offload (or has requires reverse_offload?).  In that case
> ignore sm_60 and older devices, if reverse offload isn't wanted, don't link
> in the part that needs sm_70 and make stuff working on sm_35 and later.
> Or perhaps have 2 versions of target.o, one sm_35 and one sm_70 and let
> mkoffload choose among them.

My understanding is such trickery should not be necessary with
the barrier-based approach, i.e. the sequence of PTX instructions

  st   % plain store
  membar.sys
  st.volatile

should be enough to guarantee that the former store is visible on the host
before the latter, and work all the way back to sm_20.

Alexander


[PATCH] tree-optimization/107212 - SLP reduction of reduction paths

2022-10-11 Thread Richard Biener via Gcc-patches
The following fixes an issue with how we handle epilogue generation
for SLP reductions of reduction paths where the actual live lanes
are not "canonical".  We need to make sure to identify all live
lanes as reductions and thus have to iterate over all participating
SLP lanes when walking the reduction SSA use-def chain.  Also, the
previous attempt to mitigate such issues in
vectorizable_live_operation was misguided and has to be removed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed to trunk 
so far.

PR tree-optimization/107212
* tree-vect-loop.cc (vectorizable_reduction): Make sure to
set STMT_VINFO_REDUC_DEF for all live lanes in a SLP
reduction.
(vectorizable_live_operation): Do not pun to the SLP
node representative for reduction epilogue generation.

* gcc.dg/vect/pr107212-1.c: New testcase.
* gcc.dg/vect/pr107212-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr107212-1.c | 27 ++
 gcc/testsuite/gcc.dg/vect/pr107212-2.c | 23 ++
 gcc/tree-vect-loop.cc  | 20 ---
 3 files changed, 63 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr107212-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr107212-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr107212-1.c 
b/gcc/testsuite/gcc.dg/vect/pr107212-1.c
new file mode 100644
index 000..5343f9b6b23
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107212-1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int main()
+{
+  check_vect ();
+
+  unsigned int tab[6][2] = { {69, 73}, {36, 40}, {24, 16},
+{16, 11}, {4, 5}, {3, 1} };
+
+  int sum_0 = 0;
+  int sum_1 = 0;
+
+  for(int t=0; t<6; t++) {
+  sum_0 += tab[t][0];
+  sum_1 += tab[t][1];
+  }
+
+  int x1 = (sum_0 < 100);
+  int x2 = (sum_0 > 200);
+
+  if (x1 || x2 || sum_1 != 146)
+__builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr107212-2.c 
b/gcc/testsuite/gcc.dg/vect/pr107212-2.c
new file mode 100644
index 000..109c2b991a6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr107212-2.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+int sum_1 = 0;
+
+int main()
+{
+  check_vect ();
+
+  unsigned int tab[6][2] = {{150, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}};
+  
+  int sum_0 = 0;
+  
+  for (int t = 0; t < 6; t++) {
+sum_0 += tab[t][0];
+sum_1 += tab[t][0];
+  }
+  
+  if (sum_0 < 100 || sum_0 > 200)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2536cc3cf49..1996ecfee7a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6822,10 +6822,20 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
}
   if (!REDUC_GROUP_FIRST_ELEMENT (vdef))
only_slp_reduc_chain = false;
-  /* ???  For epilogue generation live members of the chain need
+  /* For epilogue generation live members of the chain need
  to point back to the PHI via their original stmt for
-info_for_reduction to work.  */
-  if (STMT_VINFO_LIVE_P (vdef))
+info_for_reduction to work.  For SLP we need to look at
+all lanes here - even though we only will vectorize from
+the SLP node with live lane zero the other live lanes also
+need to be identified as part of a reduction to be able
+to skip code generation for them.  */
+  if (slp_for_stmt_info)
+   {
+ for (auto s : SLP_TREE_SCALAR_STMTS (slp_for_stmt_info))
+   if (STMT_VINFO_LIVE_P (s))
+ STMT_VINFO_REDUC_DEF (vect_orig_stmt (s)) = phi_info;
+   }
+  else if (STMT_VINFO_LIVE_P (vdef))
STMT_VINFO_REDUC_DEF (def) = phi_info;
   gimple_match_op op;
   if (!gimple_extract_op (vdef->stmt, &op))
@@ -9601,10 +9611,6 @@ vectorizable_live_operation (vec_info *vinfo,
 all involved stmts together.  */
  else if (slp_index != 0)
return true;
- else
-   /* For SLP reductions the meta-info is attached to
-  the representative.  */
-   stmt_info = SLP_TREE_REPRESENTATIVE (slp_node);
}
   stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
   gcc_assert (reduc_info->is_reduc_info);
-- 
2.35.3


Re: [PATCH 2/2] Split edge when edge locus and dest don't match

2022-10-11 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 12:57 PM Jørgen Kvalsvik
 wrote:
>
> On 07/10/2022 13:45, Jørgen Kvalsvik wrote:
> > On 07/10/2022 08:53, Richard Biener wrote:
> >> On Thu, Oct 6, 2022 at 4:28 PM Jørgen Kvalsvik
> >>  wrote:
> >>>
> >>> On 06/10/2022 10:12, Richard Biener wrote:
>  On Wed, Oct 5, 2022 at 2:49 PM Martin Liška  wrote:
> >
> > On 10/5/22 14:04, Jørgen Kvalsvik via Gcc-patches wrote:
> >> Edges with locus are candidates for splitting so that the edge with
> >> locus is the only edge out of a basic block, except when the locuses
> >> match. The test checks the last (non-debug) statement in a basic block,
> >> but this causes an unnecessary split when the edge locus go to a block
> >> which coincidentally has an unrelated label. Comparing the first
> >> statement of the destination block is the same check but does not get
> >> tripped up by labels.
> >>
> >> An example with source/edge/dest locus when an edge is split:
> >>
> >>   1  int fn (int a, int b, int c) {
> >>   2int x = 0;
> >>   3if (a && b) {
> >>   4x = a;
> >>   5} else {
> >>   6  a_:
> >>   7x = (a - b);
> >>   8}
> >>   9
> >>  10return x;
> >>  11  }
> >>
> >> block  file  line   col  stmt
> >>
> >> src t.c 310  if (a_3(D) != 0)
> >> edget.c 6 1
> >> destt.c 6 1  a_:
> >>
> >> src t.c 313  if (b_4(D) != 0)
> >> edget.c 6 1
> >> dst t.c 6 1  a_:
> >>
> >> With label removed:
> >>
> >>   1  int fn (int a, int b, int c) {
> >>   2int x = 0;
> >>   3if (a && b) {
> >>   4x = a;
> >>   5} else {
> >>   6  // a_: <- label removed
> >>   7x = (a - b);
> >>   8}
> >>   9
> >>  10return x;
> >>  11
> >>
> >> block  file  line   col  stmt
> >>
> >> src t.c 310  if (a_3(D) != 0)
> >> edge  (null)0 0
> >> destt.c 6 1  a_:
> >>
> >> src t.c 313  if (b_4(D) != 0)
> >> edge  (null)0 0
> >> dst t.c 6 1  a_:
> >>
> >> and this is extract from gcov-4b.c which *should* split:
> >>
> >> 205  int
> >> 206  test_switch (int i, int j)
> >> 207  {
> >> 208int result = 0;
> >> 209
> >> 210switch (i)/* branch(80 25) */
> >> 211  /* branch(end) */
> >> 212  {
> >> 213case 1:
> >> 214  result = do_something (2);
> >> 215  break;
> >> 216case 2:
> >> 217  result = do_something (1024);
> >> 218  break;
> >> 219case 3:
> >> 220case 4:
> >> 221  if (j == 2) /* branch(67) */
> >> 222  /* branch(end) */
> >> 223return do_something (4);
> >> 224  result = do_something (8);
> >> 225  break;
> >> 226default:
> >> 227  result = do_something (32);
> >> 228  switch_m++;
> >> 229  break;
> >> 230  }
> >> 231return result;
> >> 231  }
> >>
> >> block  file  line   col  stmt
> >>
> >> src4b.c   21418  result_18 = do_something (2);
> >> edge   4b.c   215 9
> >> dst4b.c   23110  _22 = result_3;
> >>
> >> src4b.c   21718  result_16 = do_something (1024);
> >> edge   4b.c   218 9
> >> dst4b.c   23110  _22 = result_3;
> >>
> >> src4b.c   22418  result_12 = do_something (8);
> >> edge   4b.c   225 9
> >> dst4b.c   23110  _22 = result_3;
> >>
> >> Note that the behaviour of comparison is preserved for the (switch) 
> >> edge
> >> splitting case. The former case now fails the check (even though
> >> e->goto_locus is no longer a reserved location) because the dest 
> >> matches
> >> the e->locus.
> >
> > It's fine, please install it.
> > I verified tramp3d coverage output is the same as before the patch.
> >
> > Martin
> >
> >>
> >> gcc/ChangeLog:
> >>
> >> * profile.cc (branch_prob): Compare edge locus to dest, not 
> >> src.
> >> ---
> >>  gcc/profile.cc | 18 +-
> >>  1 file changed, 9 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/gcc/profile.cc b/gcc/profile.cc
> >> index 96121d60711..c13a79a84c2 100644
> >> --- a/gcc/profile.cc
> >> +++ b/gcc/profile

Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs  wrote:
>
> This patch series adds additional vector sizes for the amdgcn backend.
>
> The hardware supports any arbitrary vector length up to 64-lanes via
> masking, but GCC cannot (yet) make full use of them due to middle-end
> limitations.  Adding smaller "virtual" vector sizes increases the
> complexity of the backend a little, but opens up optimization
> opportunities for the current middle-end implementation somewhat. In
> particular, it enables many more cases of SLP optimization.
>
> The patchset gives approximately 100 additional test PASSes and a few extra
> FAILs.  However, the failures are not new issues, but rather existing
> problems that did not show up because the code did not previously
> vectorize.  Expanding the testcase to allow 64-lane vectors shows the
> same problems there.
>
> I shall backport these patches to the OG12 branch shortly.

I suppose until you change the related_vector_mode hook the PR107096 issue
will not hit you but at least it's then latent ...

>
> Andrew
>
> Andrew Stubbs (6):
>   amdgcn: add multiple vector sizes
>   amdgcn: Resolve insn conditions at compile time
>   amdgcn: Add vec_extract for partial vectors
>   amdgcn: vec_init for multiple vector sizes
>   amdgcn: Add vector integer negate insn
>   amdgcn: vector testsuite tweaks
>
>  gcc/config/gcn/gcn-modes.def  |   82 ++
>  gcc/config/gcn/gcn-protos.h   |   24 +-
>  gcc/config/gcn/gcn-valu.md|  399 +--
>  gcc/config/gcn/gcn.cc | 1063 +++--
>  gcc/config/gcn/gcn.h  |   24 +
>  gcc/testsuite/gcc.dg/pr104464.c   |2 +
>  gcc/testsuite/gcc.dg/signbit-2.c  |5 +-
>  gcc/testsuite/gcc.dg/signbit-5.c  |1 +
>  gcc/testsuite/gcc.dg/vect/bb-slp-68.c |5 +-
>  gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c |3 +-
>  .../gcc.dg/vect/bb-slp-subgroups-3.c  |5 +-
>  .../gcc.dg/vect/no-vfa-vect-depend-2.c|3 +-
>  gcc/testsuite/gcc.dg/vect/pr33953.c   |3 +-
>  gcc/testsuite/gcc.dg/vect/pr65947-12.c|3 +-
>  gcc/testsuite/gcc.dg/vect/pr65947-13.c|3 +-
>  gcc/testsuite/gcc.dg/vect/pr80631-2.c |3 +-
>  gcc/testsuite/gcc.dg/vect/slp-reduc-4.c   |3 +-
>  .../gcc.dg/vect/trapv-vect-reduc-4.c  |3 +-
>  gcc/testsuite/lib/target-supports.exp |3 +-
>  19 files changed, 1183 insertions(+), 454 deletions(-)
>
> --
> 2.37.0
>


Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc-patches
On Tue, Oct 04, 2022 at 21:12:03 +0200, Harald Anlauf wrote:
> Am 04.10.22 um 17:12 schrieb Ben Boeckel:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Is there a reason that you are touching so many frontends?

Just those that needed the update for `cpp_finish`. It does align with
those that will (eventually) need this support anyway (AFAIK).

> > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> > index 364bd0d2a85..0b9df9c02cd 100644
> > --- a/gcc/fortran/cpp.cc
> > +++ b/gcc/fortran/cpp.cc
> > @@ -712,7 +712,7 @@ gfc_cpp_done (void)
> >   FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
> >   if (f)
> > {
> > - cpp_finish (cpp_in, f);
> > + cpp_finish (cpp_in, f, NULL);
> >   fclose (f);
> > }
> >   else
> > @@ -721,7 +721,7 @@ gfc_cpp_done (void)
> >  xstrerror (errno));
> > }
> > else
> > -   cpp_finish (cpp_in, stdout);
> > +   cpp_finish (cpp_in, stdout, NULL);
> >   }
> >
> > cpp_undef_all (cpp_in);
> 
> Couldn't you simply default the third argument of cpp_finish() to NULL?

I could do that. Wasn't sure how much that would be acceptable given
that it is a "silent" change, but it would reduce the number of files
touched here.
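
Presumably that would look something like this in cpplib.h (a sketch;
the parameter name is made up here):

extern void cpp_finish (cpp_reader *pfile, FILE *deps_stream,
                        FILE *fdeps_stream = NULL);

With the default in place, the existing two-argument call sites in the
other front ends could stay untouched.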

Thanks,

--Ben


Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 1:06 PM Jonathan Wakely  wrote:
>
> On Tue, 11 Oct 2022 at 07:41, Richard Biener  
> wrote:
> >
> > On Mon, Oct 10, 2022 at 5:10 PM Jonathan Wakely  wrote:
> > >
> > > On Mon, 10 Oct 2022 at 12:17, Jonathan Wakely  wrote:
> > > >
> > > > On Mon, 10 Oct 2022 at 07:18, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Oct 7, 2022 at 5:55 PM Jonathan Wakely via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > This needs a little more documentation (see the TODO in the manual),
> > > > > > rather than just the comments in the source. This isn't final, but I
> > > > > > think it's the direction I want to take.
> > > > > >
> > > > > > -- >8 --
> > > > > >
> > > > > > Implement a long-standing request to support tuning the size of the
> > > > > > emergency buffer for allocating exceptions after malloc fails, or to
> > > > > > disable that buffer entirely.
> > > > > >
> > > > > > It's now possible to disable the dynamic allocation of the buffer 
> > > > > > and
> > > > > > use a fixed-size static buffer, via 
> > > > > > --enable-libstdcxx-static-eh-pool.
> > > > > > This is a build-time choice that is baked into libstdc++ and so 
> > > > > > affects
> > > > > > all code linked against that build of libstdc++.
> > > > > >
> > > > > > The size of the pool can be set by 
> > > > > > --with-libstdcxx-eh-pool-obj-count=N
> > > > > > which is measured in units of sizeof(void*) not bytes. A given 
> > > > > > exception
> > > > > > type such as std::system_error depends on the target, so giving a 
> > > > > > size
> > > > > > in bytes wouldn't be portable across 16/32/64-bit targets.
> > > > > >
> > > > > > When libstdc++ is configured to use a dynamic buffer, the size of 
> > > > > > that
> > > > > > buffer can now be tuned at runtime by setting the GLIBCXX_TUNABLES
> > > > > > environment variable (c.f. PR libstdc++/88264). The number of 
> > > > > > exceptions
> > > > > > to reserve space for is controlled by the 
> > > > > > "glibcxx.eh_pool.obj_count"
> > > > > > and "glibcxx.eh_pool.obj_size" tunables. The pool will be sized to 
> > > > > > be
> > > > > > able to allocate obj_count exceptions of size 
> > > > > > obj_size*sizeof(void*) and
> > > > > > obj_count "dependent" exceptions rethrown by std::rethrow_exception.
> > > > > >
> > > > > > With the ability to tune the buffer size, we can reduce the default 
> > > > > > pool
> > > > > > size. Most users never need to throw 1kB exceptions in parallel from
> > > > > > hundreds of threads after malloc is OOM.
> > > > >
> > > > > But does it hurt?  Back in time when I reworked the allocator to be 
> > > > > less
> > > > > wasteful the whole point was to allow more exceptions to be in-flight
> > > > > during OOM shutdown of a process with many threads.
> > > >
> > > > It certainly hurts for small systems, but maybe we can keep the large
> > > > allocation for 64-bit targets (currently 73kB) and only reduce it for
> > > > 32-bit (19kB) and 16-bit (3kB IIRC) targets.
> > >
> > > Maybe this incremental diff would be an improvement:
> > >
> > > @@ -90,7 +90,7 @@ using namespace __cxxabiv1;
> > > // Assume that the number of concurrent exception objects scales with the
> > > // processor word size, i.e., 16-bit systems are not likely to have 
> > > hundreds
> > > // of threads all simultaneously throwing on OOM conditions.
> > > -# define EMERGENCY_OBJ_COUNT   (8 * __SIZEOF_POINTER__)
> > > +# define EMERGENCY_OBJ_COUNT   (4 * __SIZEOF_POINTER__ * __SIZEOF_POINTER__)
> > > # define MAX_OBJ_COUNT  (16 << __SIZEOF_POINTER__)
> > > #else
> > > # define EMERGENCY_OBJ_COUNT   4
> > >
> > > This makes it quadratic in the word size, so on 64-bit targets we'd
> > > have space for 256 "reasonable size" exceptions (and twice as many
> > > single word exceptions like std::bad_alloc), but only 64 on 32-bit
> > > targets, and only 16 on 16-bit ones.
> >
> > So can we then commonize some of the #defines by using sizeof(void *)
> > (taking pointer size as word size?)
>
> What did you have in mind? Do you mean use sizeof(void*) instead of
> the SIZEOF macro?

I was just confused and didn't see you commonized EMERGENCY_OBJ_SIZE
already, so ignore my comment.

>
> MAX_OBJ_COUNT uses the SIZEOF macro so it can be used in a
> preprocessor condition:
>
> #ifdef _GLIBCXX_EH_POOL_NOBJS
> # if _GLIBCXX_EH_POOL_NOBJS > MAX_OBJ_COUNT
>


libiberty: Demangling 'M' prefixes

2022-10-11 Thread Nathan Sidwell via Gcc-patches


The grammar for a lambda context can include <prefix> 'M', and we
were adding the component that <prefix> generated to the substitution table
twice.  Just ignore the 'M' completely -- we'll already have done the
checks we need when we saw its predecessor.  A prefix cannot be the
last component of a nested name, so we do not need to check for that
case (although we could if we wanted to be more lenient).

nathan

--
Nathan Sidwell

From 0fa35c7e2974a22b2107fa378895c3069fe07ff3 Mon Sep 17 00:00:00 2001
From: Nathan Sidwell 
Date: Fri, 30 Sep 2022 09:43:30 -0700
Subject: [PATCH] libiberty: Demangling 'M' prefixes

The grammar for a lambda context can include <prefix> 'M', and we
were adding the component that <prefix> generated to the substitution table
twice.  Just ignore the 'M' completely -- we'll already have done the
checks we need when we saw its predecessor.  A prefix cannot be the
last component of a nested name, so we do not need to check for that
case (although we could if we wanted to be more lenient).

	libiberty/
	* cp-demangle.c (d_prefix): 'M' components are not
	(re-)added to the substitution table.
	* testsuite/demangle-expected: Add tests.
---
 libiberty/cp-demangle.c   |  8 +++-
 libiberty/testsuite/demangle-expected | 21 +
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 303bfbf709e..4beb4d257bb 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -1609,12 +1609,10 @@ d_prefix (struct d_info *di, int substable)
 	}
   else if (peek == 'M')
 	{
-	  /* Initializer scope for a lambda.  We don't need to represent
-	 this; the normal code will just treat the variable as a type
-	 scope, which gives appropriate output.  */
-	  if (ret == NULL)
-	return NULL;
+	  /* Initializer scope for a lambda.  We already added it as a
+  	 substitution candidate, don't do that again.  */
 	  d_advance (di, 1);
+	  continue;
 	}
   else
 	{
diff --git a/libiberty/testsuite/demangle-expected b/libiberty/testsuite/demangle-expected
index 90dd4a13945..bd92b12076b 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -1581,3 +1581,24 @@ void L1()::{lambda((auto:1)...)#1}::operator()(int, int*) const
 _ZZ2L1vENKUlDpT_E_clIJiPiEEEDaS0_
 auto L1()::{lambda((auto:1)...)#1}::operator()(int, int*) const
 
+_Z7captureIN4gvarMUlvE_EE7WrapperIT_EOS3_
+Wrapper capture(gvar::{lambda()#1}&&)
+
+_ZNK2L2MUlT_T0_E_clIifEEvS_S0_
+void L2::{lambda(auto:1, auto:2)#1}::operator()(L2, int) const
+
+_ZNK1C1fMUlT_E_clIMS_iEEDaS1_
+auto C::f::{lambda(auto:1)#1}::operator()(int C::*) const
+
+_ZNK2L2MUlT_T0_E_clIifEEvS0_S1_
+void L2::{lambda(auto:1, auto:2)#1}::operator()(int, float) const
+
+_ZNK1B2L3MUlT_T0_E_clIjdEEvS1_S2_
+void B::L3::{lambda(auto:1, auto:2)#1}::operator()(unsigned int, double) const
+
+_Z3fooIN1qMUlvE_ENS0_UlvE0_EEiOT_OT0_
+int foo(q::{lambda()#1}&&, q::{lambda()#2}&&)
+
+_ZNK2L1MUlDpT_E_clIJiPiEEEvS1_
+void L1::{lambda((auto:1)...)#1}::operator()(int, int*) const
+
-- 
2.30.2



Re: [PATCH RESEND 1/1] p1689r5: initial support

2022-10-11 Thread Ben Boeckel via Gcc-patches
On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> On 10/4/22 11:12, Ben Boeckel wrote:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Thanks!
> 
> > Support is communicated through the following three new flags:
> > 
> > - `-fdeps-format=` specifies the format for the output. Currently named
> >`p1689r5`.
> > 
> > - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> Do you expect users to want to emit Makefile (-MM) and P1689 
> dependencies from the same compilation?

Yes, the build system wants to know what files affect scanning as well
(e.g., `#include ` is still possible and, if it changes, this
operation should be performed again). The `-M` flags do this quite nicely
already :) .

> > - `-fdep-output=` specifies the `.o` that will be written for the TU
> >that is scanned. This is required so that the build system can
> >correlate the dependency output with the actual compilation that will
> >occur.
> 
> The dependency machinery already needs to be able to figure out the name 
> of the output file, can't we reuse that instead of specifying it on the 
> command line?

This is how it determines the output of the compilation. Because it is
piggy-backing on the `-E` flag, the `-o` flag specifies the output of
the preprocessed source (purely a side-effect right now).

> > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> > index 2db1e9cbdfb..90787230a9e 100644
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
> >   /* Style of header dependencies to generate.  */
> >   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
> >   
> > +/* Format of header dependencies to generate.  */
> > +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
> 
> Why not add this to cpp_deps_style?

It is orthogonal. The `-M` flags and `-fdeps-*` flags are similar, but
`-fdeps-*` dependencies are fundamentally different from `-M`
dependencies. The comment does need updating, though.

`-M` is about discovered dependencies: those that you find out while
doing work. `-fdep-*` is about ordering dependencies: extracting
information from file content in order to be able to order future work
at all. It stems from the loss of embarrassing parallelism when building C++20
code that uses `import`. It's isomorphic to the Fortran module
compilation ordering problem.
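
For instance (file contents invented for illustration), given

// a.cppm
export module A;
export int f () { return 42; }

// b.cpp
import A;
int g () { return f (); }

no build system can schedule the compile of b.cpp until a scan reveals
that it imports A and that a.cppm is what provides A -- exactly the
ordering information the P1689 output conveys.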

> > @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec &vec, 
> > FILE *fp,
> >  .PHONY targets for all the dependencies too.  */
> >   
> >   static void
> > -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int 
> > extra)
> 
> Instead of adding the "extra" parameter...

Hmm. Not sure why I had named this so poorly. Maybe this comes from my
initial stab at this functionality in 2019 (the format has been hammered
out in ISO C++'s SG15 since then).

> >   {
> > const mkdeps *d = pfile->deps;
> >   
> > @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (d->deps.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > -  if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > +  if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > column = make_write_name (d->cmi_name, fp, column, colmax);
> > fputs (":", fp);
> > column++;
> > @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > if (!CPP_OPTION (pfile, deps.modules))
> >   return;
> 
> ...how about checking CPP_OPTION for p1689r5 mode here?

I'll take a look at this.

> >   
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >   {
> > column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > if (d->cmi_name)
> > @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > fputs ("\n", fp);
> >   }
> >   
> > -  if (d->module_name)
> > +  if (extra && d->module_name)
> >   {
> > if (d->cmi_name)
> > {
> > @@ -455,7 +478,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned 
> > int colmax)
> > }
> >   }
> > 
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >   {
> > column = fprintf (fp, "CXX_IMPORTS +=");
> > make_write_vec (d->modules, fp, column, colmax, 0, ".c++m");
> > @@ -468,9 +491,203 @@ make_write (const cpp_reader *pfile, FILE *fp, 
> > unsigned int colmax)
> >   /* Really we should be opening fp here.  */
> >   
> >   void
> > -deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +deps_write (const struct cpp_reader *pfi

Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs

On 11/10/2022 12:29, Richard Biener wrote:

On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs  wrote:


This patch series adds additional vector sizes for the amdgcn backend.

The hardware supports any arbitrary vector length up to 64-lanes via
masking, but GCC cannot (yet) make full use of them due to middle-end
limitations.  Adding smaller "virtual" vector sizes increases the
complexity of the backend a little, but opens up optimization
opportunities for the current middle-end implementation somewhat. In
particular, it enables many more cases of SLP optimization.

The patchset gives approximately 100 additional test PASSes and a few extra
FAILs.  However, the failures are not new issues, but rather existing
problems that did not show up because the code did not previously
vectorize.  Expanding the testcase to allow 64-lane vectors shows the
same problems there.

I shall backport these patches to the OG12 branch shortly.


I suppose until you change the related_vector_mode hook the PR107096 issue
will not hit you but at least it's then latent ...


How do you mean, change it?

static opt_machine_mode
gcn_related_vector_mode (machine_mode vector_mode,
 scalar_mode element_mode, poly_uint64 nunits)
{
  int n = nunits.to_constant ();

  if (n == 0)
n = GET_MODE_NUNITS (vector_mode);

  return VnMODE (n, element_mode);
}


It returns what it's asked for, always matching the number of lanes (not 
the bitsize), which is most likely the most natural for GCN.


Andrew


Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-10-11 Thread Alexandre Oliva via Gcc-patches
On Oct 10, 2022, Richard Biener  wrote:

> As noted in the Cauldron Discussion I think you should do all
> instrumentation post-IPA only to simplify your life not needing to
> handle inlining of instrumentation

I looked a bit into that after the Cauldron, and recalled why I wanted
to instrument before inlining: in the case of internal strub, which
introduces a wrapper, it's desirable to be able to inline the wrapper.
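
For reference, internal strub splits a function roughly like this
(every name below is hypothetical; the real builtins and their
signatures in the patch series may differ):

extern void strub_enter (void **wm);    /* hypothetical entry builtin  */
extern void strub_leave (void **wm);    /* hypothetical scrub builtin  */
extern int f_strubbed (int);            /* original body, moved aside  */

int
f (int x)   /* the wrapper callers see -- the one worth inlining */
{
  void *watermark = 0;
  strub_enter (&watermark);
  int r = f_strubbed (x);
  strub_leave (&watermark);   /* scrub the stack f_strubbed used */
  return r;
}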

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 1:53 PM Andrew Stubbs  wrote:
>
> On 11/10/2022 12:29, Richard Biener wrote:
> > On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs  wrote:
> >>
> >> This patch series adds additional vector sizes for the amdgcn backend.
> >>
> >> The hardware supports any arbitrary vector length up to 64-lanes via
> >> masking, but GCC cannot (yet) make full use of them due to middle-end
> >> limitations.  Adding smaller "virtual" vector sizes increases the
> >> complexity of the backend a little, but opens up optimization
> >> opportunities for the current middle-end implementation somewhat. In
> >> particular, it enables many more cases of SLP optimization.
> >>
> >> The patchset gives approximately 100 additional test PASSes and a few extra
> >> FAILs.  However, the failures are not new issues, but rather existing
> >> problems that did not show up because the code did not previously
> >> vectorize.  Expanding the testcase to allow 64-lane vectors shows the
> >> same problems there.
> >>
> >> I shall backport these patches to the OG12 branch shortly.
> >
> > I suppose until you change the related_vector_mode hook the PR107096 issue
> > will not hit you but at least it's then latent ...
>
> How do you mean, change it?
>
> static opt_machine_mode
> gcn_related_vector_mode (machine_mode vector_mode,
>   scalar_mode element_mode, poly_uint64 nunits)
> {
>int n = nunits.to_constant ();
>
>if (n == 0)
>  n = GET_MODE_NUNITS (vector_mode);
>
>return VnMODE (n, element_mode);
> }
>
>
> It returns what it's asked for, always matching the number of lanes (not
> the bitsize), which is most likely the most natural for GCN.

Yes, change it in any way no longer honoring that.  Or discover the
case (not sure if it actually exists) where the vectorizer itself tricks
you into this by passing down nunits !=0 when vectorizing a loop
(I _think_ that's only done for basic-block vectorization currently).

Richard.

>
> Andrew


Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-10-11 Thread Richard Biener via Gcc-patches
On Tue, Oct 11, 2022 at 1:57 PM Alexandre Oliva  wrote:
>
> On Oct 10, 2022, Richard Biener  wrote:
>
> > As noted in the Cauldron Discussion I think you should do all
> > instrumentation post-IPA only to simplify your life not needing to
> > handle inlining of instrumentation
>
> I looked a bit into that after the Cauldron, and recalled why I wanted
> to instrument before inlining: in the case of internal strub, which
> introduces a wrapper, it's desirable to be able to inline the wrapper.

I think if the wrapper is created at IPA time it is also available for
IPA inlining.

Richard.

>
> --
> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 


Re: [PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0).

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 02:53:16PM +, Hafiz Abid Qadeer wrote:
> Currently we only make use of this directive when it is associated
> with an allocate statement.

Sorry for the delay.

I'll start with a comment that the allocate directive in 5.0/5.1
for Fortran is a complete mess that has been fixed only in 5.2
by splitting the directive into the allocators construct and
allocate directive.
The problem with 5.0/5.1 is that it is just ambiguous whether
!$omp allocate (list) optional-clauses
is associated with an allocate statement or not.
When it is not associated with an allocate statement, it is a declarative
directive that should appear only in the specification part; when it is
associated with an allocate stmt, it should appear only in the executable
part.  And a mess starts when it is on the boundary between the two.
Now, how exactly to differentiate between the 2 I'm afraid depends
on the exact OpenMP version.
1) if we are p->state == ORDER_EXEC already, it must be associated
   with allocate-stmt (and we should error whenever it violates restrictions
   for those)
2) if (list) is missing, it must be associated with allocate-stmt
3) for 5.0 only, if allocator clause isn't specified, it must not be
   associated with allocate-stmt, but in 5.1 the clauses are optional
   also for one associated with it; if align clause is specified, it must be
   5.1
4) all the allocate directives after one that must be associated with
   allocate-stmt must be also associated with allocate-stmt
5) if variables in list are allocatable, it must be associated with
   allocate-stmt, if they aren't allocatable, it must not be associated
   with allocate-stmt

In your patch, you put ST_OMP_ALLOCATE into case_executable define,
I'm afraid due to the above we need to handle ST_OMP_ALLOCATE manually
whenever case_executable/case_omp_decl appear in parse.cc and be prepared
that it could be either a declarative directive or an executable construct
and resolve based on the 1-5 above into which category it belongs
(either during parsing or during resolving).  And certainly have
testsuite coverage for cases like:
  integer :: i, j
  integer, allocatable :: k(:), l(:)
!$omp allocate (i) allocator (alloc1)
!$omp allocate (j) allocator (alloc2)
!$omp allocate (k) allocator (alloc3)
!$omp allocate (l) allocator (alloc4)
  allocate (k(14), l(23))
where I think the first 2 are declarative directives and the last
2 bind to allocate-stmt (etc., cover all the cases mentioned above).

On the other side, 5.1 has:
"An allocate directive that is associated with an allocate-stmt and specifies a 
list must be
preceded by an executable statement or OpenMP construct."
restriction, so if we implement that, the ambiguity decreases.
We wouldn't need to worry about 3) and 5), would decide on 1) and 2) and 4)
only.

> gcc/fortran/ChangeLog:
> 
>   * dump-parse-tree.c (show_omp_node): Handle EXEC_OMP_ALLOCATE.
>   (show_code_node): Likewise.
>   * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
>   (OMP_LIST_ALLOCATOR): New enum value.
>   (enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
>   * match.h (gfc_match_omp_allocate): New function.
>   * openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATOR.
>   (OMP_ALLOCATE_CLAUSES): New define.
>   (gfc_match_omp_allocate): New function.
>   (resolve_omp_clauses): Add ALLOCATOR in clause_names.
>   (omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
>   (EMPTY_VAR_LIST): New define.
>   (check_allocate_directive_restrictions): New function.
>   (gfc_resolve_omp_allocate): Likewise.
>   (gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
>   * parse.c (decode_omp_directive): Handle ST_OMP_ALLOCATE.
>   (next_statement): Likewise.

You didn't change next_statement, but the case_executable macro.
But see above.

>   (gfc_ascii_statement): Likewise.
>   * resolve.c (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
>   * st.c (gfc_free_statement): Likewise.
>   * trans.c (trans_code): Likewise
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/allocate-4.f90: New test.
>   * gfortran.dg/gomp/allocate-5.f90: New test.
> ---

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -921,6 +921,7 @@ enum omp_mask1
>OMP_CLAUSE_FAIL,  /* OpenMP 5.1.  */
>OMP_CLAUSE_WEAK,  /* OpenMP 5.1.  */
>OMP_CLAUSE_NOWAIT,
> +  OMP_CLAUSE_ALLOCATOR,

I don't see how can you add OMP_CLAUSE_ALLOCATOR to enum omp_mask1.
OMP_MASK1_LAST is already 64, so I think the
  gcc_checking_assert (OMP_MASK1_LAST <= 64 && OMP_MASK2_LAST <= 64);
assertion would fail.
OMP_MASK2_LAST is on the other side just 30, and allocate directive
takes just allocator or in 5.1 align clauses, so both should
go to the enum omp_mask2 block.  And for newly added clauses,
we add the /* OpenMP 5.0.  */ etc. comments when the clause
appeared first (5.0 for allocator, 5.1 for align).

>/* This must come last.  */
>OMP_MASK1_LAST
>  };
> @@ -3568,6 +3569,7 @@ cleanup

Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 02:53:17PM +, Hafiz Abid Qadeer wrote:
> gcc/fortran/ChangeLog:
> 
>   * trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
>   (gfc_trans_omp_allocate): New function.
>   (gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.
> 
> gcc/ChangeLog:
> 
>   * tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
>   (dump_generic_node): Handle OMP_ALLOCATE.
>   * tree.def (OMP_ALLOCATE): New.
>   * tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
>   (OMP_ALLOCATE_DECL): Likewise.
>   (OMP_ALLOCATE_ALLOCATOR): Likewise.
>   * tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/allocate-6.f90: New test.

There is another issue besides what I wrote in my last review,
and I'm afraid I don't know what to do about it, hoping Tobias
has some ideas.
The problem is that without the allocate-stmt associated allocate directive,
Fortran allocatables are simply always allocated with malloc and freed with
free.  The deallocation can be implicit through reallocation, or explicit
deallocate statement etc.
But when some allocatables are now allocated with a different
allocator (when allocate-stmt associated allocate directive is used),
some allocatables are allocated with malloc and others with GOMP_alloc
but we need to free them with the corresponding allocator based on how
they were allocated, what has been allocated with malloc should be
deallocated with free, what has been allocated with GOMP_alloc should be
deallocated with GOMP_free.
The deallocation can be done in a completely different TU from where it has
been allocated, in theory it could be also not compiled with -fopenmp, etc.
So, I'm afraid we need to store somewhere whether we used malloc or
GOMP_alloc for the allocation (say somewhere in the array descriptor and for
other stuff somewhere on the side?) and slow down all code that needs
deallocation to check that bit (or say we don't support
deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
on the deallocation TU and only slow down -fopenmp compiled code)?
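
In other words, every deallocation site would have to dispatch along
these lines (the descriptor flag and the GOMP_free signature are
assumptions for illustration only):

struct gfc_desc { void *base_addr; unsigned attr; };
#define GFC_ATTR_OMP_ALLOCATED 1u  /* hypothetical descriptor bit */
extern "C" void GOMP_free (void *, unsigned long);  /* signature assumed */
extern "C" void free (void *);

static void
gfc_deallocate (struct gfc_desc *d)
{
  if (d->attr & GFC_ATTR_OMP_ALLOCATED)
    GOMP_free (d->base_addr, 0);  /* 0: default/recorded allocator */
  else
    free (d->base_addr);
  d->base_addr = 0;
}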

Tobias, thoughts on this?

Jakub



[PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Jørgen Kvalsvik via Gcc-patches
The coverage support will under some conditions decide to split edges to
accurately report coverage. By running the test suite with/without this
edge splitting a small diff shows up, addressed by this patch, which
should catch future regressions.

Removing the edge splitting:

$ diff --git a/gcc/profile.cc b/gcc/profile.cc
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
Don't do that when the locuses match, so
if (blah) goto something;
is not computed twice.  */
- if (last
- && gimple_has_location (last)
- && !RESERVED_LOCATION_P (e->goto_locus)
- && !single_succ_p (bb)
- && (LOCATION_FILE (e->goto_locus)
- != LOCATION_FILE (gimple_location (last))
- || (LOCATION_LINE (e->goto_locus)
- != LOCATION_LINE (gimple_location (last)
-   {
- basic_block new_bb = split_edge (e);
- edge ne = single_succ_edge (new_bb);
- ne->goto_locus = e->goto_locus;
-   }
+
if ((e->flags & (EDGE_ABNORMAL | EDGE_ABNORMAL_CALL))
&& e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
need_exit_edge = 1;

Assuming the .gcov files from make check-gcc RUNTESTFLAGS=gcov.exp are
kept:

$ diff -r no-split-edge with-split-edge | grep -C 2 -E "^[<>]\s\s"
diff -r sans-split-edge/gcc/gcov-4.c.gcov with-split-edge/gcc/gcov-4.c.gcov
228c228
< -:  224:break;
---
> 1:  224:break;
231c231
< -:  227:break;
---
> #:  227:break;
237c237
< -:  233:break;
---
> 2:  233:break;

gcc/testsuite/ChangeLog:

* g++.dg/gcov/gcov-1.C: Add line count check.
* gcc.misc-tests/gcov-4.c: Likewise.
---
 gcc/testsuite/g++.dg/gcov/gcov-1.C| 8 
 gcc/testsuite/gcc.misc-tests/gcov-4.c | 4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/gcov/gcov-1.C 
b/gcc/testsuite/g++.dg/gcov/gcov-1.C
index 9018b9a3a73..ee383b480a8 100644
--- a/gcc/testsuite/g++.dg/gcov/gcov-1.C
+++ b/gcc/testsuite/g++.dg/gcov/gcov-1.C
@@ -257,20 +257,20 @@ test_switch (int i, int j)
   switch (i)   /* count(5) */
/* branch(end) */
 {
-  case 1:
+  case 1:  /* count(1) */
 result = do_something (2); /* count(1) */
-break;
+break; /* count(1) */
   case 2:
 result = do_something (1024);
 break;
-  case 3:
+  case 3:  /* count(3) */
   case 4:
/* branch(67) */
 if (j == 2)/* count(3) */
/* branch(end) */
   return do_something (4); /* count(1) */
 result = do_something (8); /* count(2) */
-break;
+break; /* count(2) */
   default:
result = do_something (32); /* count(1) */
switch_m++; /* count(1) */
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-4.c 
b/gcc/testsuite/gcc.misc-tests/gcov-4.c
index 9d8ab1c1097..498d299b66b 100644
--- a/gcc/testsuite/gcc.misc-tests/gcov-4.c
+++ b/gcc/testsuite/gcc.misc-tests/gcov-4.c
@@ -221,7 +221,7 @@ test_switch (int i, int j)
 {
   case 1:
 result = do_something (2); /* count(1) */
-break;
+break; /* count(1) */
   case 2:
 result = do_something (1024);
 break;
@@ -230,7 +230,7 @@ test_switch (int i, int j)
 if (j == 2)/* count(3) */
   return do_something (4); /* count(1) */
 result = do_something (8); /* count(2) */
-break;
+break; /* count(2) */
   default:
result = do_something (32); /* count(1) */
switch_m++; /* count(1) */
-- 
2.34.0



[PATCH 2/2] gcov: test line count for label in then/else block

2022-10-11 Thread Jørgen Kvalsvik via Gcc-patches
Add a test to catch regression in line counts for labels on top of
then/else blocks. Only the 'goto ' should contribute to the line
counter for the label, not the if.

gcc/testsuite/ChangeLog:

* gcc.misc-tests/gcov-4.c: Add test.
---
 gcc/testsuite/gcc.misc-tests/gcov-4.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.misc-tests/gcov-4.c 
b/gcc/testsuite/gcc.misc-tests/gcov-4.c
index 498d299b66b..da7929ef7fc 100644
--- a/gcc/testsuite/gcc.misc-tests/gcov-4.c
+++ b/gcc/testsuite/gcc.misc-tests/gcov-4.c
@@ -110,6 +110,29 @@ lab2:
   return 8;/* count(1) */
 }
 
+int
+test_goto3 (int i, int j)
+{
+if (j) goto else_; /* count(1) */
+
+top:
+if (i) /* count(1) */
+  {
+   i = do_something (i);
+  }
+else
+  {
+else_: /* count(1) */
+   j = do_something (j);   /* count(2) */
+   if (j)  /* count(2) */
+ {
+   j = 0;  /* count(1) */
+   goto top;   /* count(1) */
+ }
+  }
+return 16;
+}
+
 void
 call_goto ()
 {
@@ -117,6 +140,7 @@ call_goto ()
   goto_val += test_goto1 (1);
   goto_val += test_goto2 (3);
   goto_val += test_goto2 (30);
+  goto_val += test_goto3 (0, 1);
 }
 
 /* Check nested if-then-else statements. */
@@ -260,7 +284,7 @@ main()
   call_unref ();
   if ((for_val1 != 12)
   || (for_val2 != 87)
-  || (goto_val != 15)
+  || (goto_val != 31)
   || (ifelse_val1 != 31)
   || (ifelse_val2 != 23)
   || (ifelse_val3 != 246)
-- 
2.34.0



Re: [PATCH] Avoid calling tracer.trailer() twice.

2022-10-11 Thread Andrew MacLeod via Gcc-patches
It probably should just be changed to a print if it doesn't return...
something like:


if (idx && res)
  {
tracer.print (idx, "logical_combine produced");
r.dump (dump_file);
fputc ('\n', dump_file);
   }

Andrew

On 10/10/22 14:58, Aldy Hernandez wrote:

[Andrew, you OK with this?  I can't tell whether the trailer() call was
actually needed.]

logical_combine is calling tracer.trailer() one too many times causing
the second trailer() call to subtract a 0 indent by 2, yielding an
indent of SOMETHING_REALLY_BIG :).  You'd be surprised how many tools
can't handle incredibly long lines.
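A toy model of the failure mode (made-up names, not the actual tracer
code - just to illustrate the underflow):

  // With an unsigned indent counter, the second trailer () call
  // computes 0 - 2, which wraps around to a huge indentation value.
  static unsigned indent = 2;
  static void trailer () { indent -= 2; }  // 1st call: 2 -> 0; 2nd: wraps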

gcc/ChangeLog:

* gimple-range-gori.cc (gori_compute::logical_combine): Avoid
calling tracer.trailer().
---
  gcc/gimple-range-gori.cc | 10 +---------
  1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index b37d03cddda..469382aa477 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -798,20 +798,12 @@ gori_compute::logical_combine (vrange &r, enum tree_code 
code,
// would be lost.
if (!range_is_either_true_or_false (lhs))
  {
-  bool res;
Value_Range r1 (r);
if (logical_combine (r1, code, m_bool_zero, op1_true, op1_false,
   op2_true, op2_false)
  && logical_combine (r, code, m_bool_one, op1_true, op1_false,
  op2_true, op2_false))
-   {
- r.union_ (r1);
- res = true;
-   }
-  else
-   res = false;
-  if (idx)
-   tracer.trailer (idx, "logical_combine", res, NULL_TREE, r);
+   r.union_ (r1);
  }
  
switch (code)




Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-10-11 Thread Tobias Burnus

Hi Jakub,

On 11.10.22 14:24, Jakub Jelinek wrote:

There is another issue besides what I wrote in my last review,
and I'm afraid I don't know what to do about it, hoping Tobias
has some ideas.
The problem is that without the allocate-stmt associated allocate directive,
Fortran allocatables are easily always allocated with malloc and freed with
free.  The deallocation can be implicit through reallocation, or explicit
deallocate statement etc.
...
But when some allocatables are now allocated with a different
allocator (when allocate-stmt associated allocate directive is used),
some allocatables are allocated with malloc and others with GOMP_alloc
but we need to free them with the corresponding allocator based on how
they were allocated, what has been allocated with malloc should be
deallocated with free, what has been allocated with GOMP_alloc should be
deallocated with GOMP_free.



I think the most common case is:

integer, allocatable :: var(:)
!$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
...
! optionally: deallocate(var)
end ! of scope: block/subroutine/... - automatic deallocation

Those can be easily handled. It gets more complicated with control flow:

if (...) then
 !$omp allocators allocator(...)
 allocate(...)
else
 allocate (...)
endif



However, the problem is really that there is no mandatory
'!$omp deallocators' and also wording like:

"If any operation of the base language causes a reallocation of
an array that is allocated with a memory allocator then that
memory allocator will be used to release the current memory
and to allocate the new memory." (OpenMP 5.0 wording)

There has been some attempt to relax the rules a bit, e.g. by
adding the wording:
"For allocated allocatable components of such variables, the allocator that
will be used for the deallocation and allocation is unspecified."

And some wording change (→issues 3189) to clarify related component issues.

But nonetheless, there is still the issue of:

(a) explicit DEALLOCATE in some other translation unit
(b) some intrinsic operation which reallocates the memory, either via libgomp
or in the source code:
 a = [1,2,3]  ! possibly reallocates
 str = trim(str) ! possibly reallocates
where the first one calls 'realloc' directly in the code and the second one
calls 'libgomp' for that.

* * *

I don't see a good solution – and there is in principle the same issue with
unified-shared memory (USM) on hardware that does not support transparently
accessing all host memory on the device.

Compilers support this case by allocating memory in some special memory,
which is either accessible from both sides ('pinned') or migrates on the
first access from the device side - but remains there until the accessing
device kernel ends ('managed memory').

Newer hardware (+ associated Linux kernel support) permits accessing all
memory in a somewhat fast way, avoiding this issue (and special handling
is then left to the user.) For AMDGCN, my understanding is that all hardware
supported by GCC supports this - but at glacial speed on all but the most
recent architectures. For Nvidia, this is supported since Pascal (I think for
Titan X, P100, i.e. sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.

I mention this because the USM implementation at
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
suffers from this.
And https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
tries to solve the 'trim' example issue above - i.e. the case where
libgomp reallocates pinned/managed (pseudo-)USM memory.

* * *

The deallocation can be done in a completely different TU from where it has
been allocated, in theory it could be also not compiled with -fopenmp, etc.
So, I'm afraid we need to store somewhere whether we used malloc or
GOMP_alloc for the allocation (say somewhere in the array descriptor and for
other stuff somewhere on the side?) and slow down all code that needs
deallocation to check that bit (or say we don't support
deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
on the deallocation TU and only slow down -fopenmp compiled code)?

The problem with storing is that gfortran inserts the malloc/realloc/free calls 
directly, i.e. without library preloading to intercept those libcalls; I do 
not see how it can work at all.

I do not know how to handle the pinned-memory case above correctly, either.

One partial solution would be to require that code using such allocatables 
cannot do any reallocation/deallocation, by only permitting calls to procedures 
whose dummy arguments are not allocatable (such that no reallocation can 
happen) – and to print a 'sorry' for the rest.

Other implementations seem to have a Fortran library call for (re)allocations, 
which makes it possible to swap the allocator from the generic one to 
omp_default_mem_alloc.

* * *

In terms of the array descriptor, we have inside 'struct dtype_type' the 
'signed short attribute', which currently only h

Re: Adding a new thread model to GCC

2022-10-11 Thread LIU Hao via Gcc-patches

在 2022-10-10 23:56, LIU Hao 写道:

在 2022-10-04 20:44, LIU Hao 写道:

Attached are revised patches. These are exported from trunk.



Revised further. The patch for libgfortran has been committed to trunk today, so I include only the 
other two.


   * In the second patch, a space character has been inserted after
     `(int)` for clearness.

   * The macro controlling how to build GCC itself has been renamed to
     `TARGET_USING_MCFGTHREAD` for consistency.

   * Checks against `TARGET_USING_MCFGTHREAD` have been updated in a
     more friendly way.

   * When not using mcfgthread, NTDLL is no longer a default library.
     Although all recent Windows versions are based on the NT kernel,
     there could still be people who want to target 9x or CE; thus
     NTDLL is only added when it is potentially necessary, for example
     when linking against the static libgcc.




Attached is the (previous) third patch, with configure scripts regenerated.


--
Best regards,
LIU Hao

From c32690fa4878d8824a0e05e54f614a8dd9ed68b7 Mon Sep 17 00:00:00 2001
From: LIU Hao 
Date: Sat, 16 Apr 2022 00:46:23 +0800
Subject: [PATCH 2/2] gcc: Add 'mcf' thread model support from mcfgthread

This patch adds the new thread model `mcf`, which implements mutexes
and condition variables with the mcfgthread library.

Source code for mcfgthread is available at 
.

config/ChangeLog:
* gthr.m4 (GCC_AC_THREAD_HEADER): Add new case for `mcf` thread
model

gcc/config/ChangeLog:
* i386/mingw-mcfgthread.h: New file
* i386/mingw32.h: Add builtin macro and default libraries for
mcfgthread when thread model is `mcf`

gcc/ChangeLog:
* config.gcc: Include 'i386/mingw-mcfgthread.h' when thread model
is `mcf`
* configure.ac: Recognize `mcf` as a valid thread model
* configure: Regenerate

libatomic/ChangeLog:
* configure.tgt: Add new case for `mcf` thread model

libgcc/ChangeLog:
* config.host: Add new cases for `mcf` thread model
* config/i386/gthr-mcf.h: New file
* config/i386/t-mingw-mcfgthread: New file
* config/i386/t-slibgcc-cygming: Add mcfgthread for libgcc DLL
* configure: Regenerate

libstdc++-v3/ChangeLog:
* libsupc++/atexit_thread.cc (__cxa_thread_atexit): Use
implementation from mcfgthread if available
* libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_release,
__cxa_guard_abort): Use implementations from mcfgthread if
available
* configure: Regenerate
---
 config/gthr.m4  |  1 +
 gcc/config.gcc  |  3 +++
 gcc/config/i386/mingw-mcfgthread.h  |  1 +
 gcc/config/i386/mingw32.h   | 13 ++-
 gcc/configure   |  2 +-
 gcc/configure.ac|  2 +-
 libatomic/configure.tgt |  2 +-
 libgcc/config.host  |  6 +
 libgcc/config/i386/gthr-mcf.h   |  1 +
 libgcc/config/i386/t-mingw-mcfgthread   |  1 +
 libgcc/config/i386/t-slibgcc-cygming|  6 -
 libgcc/configure|  1 +
 libstdc++-v3/configure  | 13 ++-
 libstdc++-v3/libsupc++/atexit_thread.cc | 20 
 libstdc++-v3/libsupc++/guard.cc | 31 +
 15 files changed, 92 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/i386/mingw-mcfgthread.h
 create mode 100644 libgcc/config/i386/gthr-mcf.h
 create mode 100644 libgcc/config/i386/t-mingw-mcfgthread

diff --git a/config/gthr.m4 b/config/gthr.m4
index 4b937306ad08..11996247f150 100644
--- a/config/gthr.m4
+++ b/config/gthr.m4
@@ -22,6 +22,7 @@ case $1 in
 tpf)   thread_header=config/s390/gthr-tpf.h ;;
 vxworks)   thread_header=config/gthr-vxworks.h ;;
 win32) thread_header=config/i386/gthr-win32.h ;;
+mcf)   thread_header=config/i386/gthr-mcf.h ;;
 esac
 AC_SUBST(thread_header)
 ])
diff --git a/gcc/config.gcc b/gcc/config.gcc
index eec544ff1bac..1f6adea1ab9b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2091,6 +2091,9 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
if test x$enable_threads = xposix ; then
tm_file="${tm_file} i386/mingw-pthread.h"
fi
+   if test x$enable_threads = xmcf ; then
+   tm_file="${tm_file} i386/mingw-mcfgthread.h"
+   fi
tm_file="${tm_file} i386/mingw32.h"
# This makes the logic if mingw's or the w64 feature set has to be used
case ${target} in
diff --git a/gcc/config/i386/mingw-mcfgthread.h 
b/gcc/config/i386/mingw-mcfgthread.h
new file mode 100644
index ..7d4eda3ed494
--- /dev/null
+++ b/gcc/config/i386/mingw-mcfgthread.h
@@ -0,0 +1 @@
+#define TARGET_USING_MCFGTHREAD  1
diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h
index d3ca0cd0279d..b5f31c3da0ac 100644
--- a/gcc/config/i386/mingw32.h
+++ b/gcc/config/i386/mingw32.h
@@ -32

[PATCH] c++: Implement excess precision support for C++ [PR107097, PR323]

2022-10-11 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch implements excess precision support for C++.
Like for C, it uses EXCESS_PRECISION_EXPR tree to say that its operand
is evaluated in excess precision and what the semantic type of the
expression is.
In most places I've followed what the C FE does in similar spots, so
e.g. for binary ops if one or both operands are already
EXCESS_PRECISION_EXPR, strip those away or for operations that might need
excess precision (+, -, *, /) check if the operands should use excess
precision and convert to that type and at the end wrap into
EXCESS_PRECISION_EXPR with the common semantic type.
In general I've tried to follow the C99 handling, C11+ relies on the
C standard saying that in case of integral conversions excess precision
can be used (see PR87390 for more details), but I don't see anything similar
on the C++ standard side.
There are some cases which needed to be handled differently, the C FE can
just strip EXCESS_PRECISION_EXPR (replace it with its operand) when handling
explicit cast, but that IMHO isn't right for C++ - the discovery what exact
conversion should be used (e.g. if user conversion or standard or their
sequence) should be decided based on the semantic type (i.e. type of
EXCESS_PRECISION_EXPR), and that decision continues in convert_like* where
we pick the right user conversion, again, if say some class has ctor
from double and long double and we are on ia32 with standard excess
precision promoting float/double to long double, then we should pick the
ctor from double.  Or when some other class has ctor from just double,
and EXCESS_PRECISION_EXPR semantic type is float, we should choose the
user ctor from double, but actually just convert the long double excess
precision to double and not to float first.  We need to make sure
even identity conversion converts from excess precision to the semantic one
though, but if identity is chained with other conversions, we don't want
the identity next_conversion to drop to semantic precision only to widen
afterwards.
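A sketch of the ctor-selection case just described (hypothetical class,
assuming ia32 with -fexcess-precision=standard):

  struct S
  {
    S (double) {}       // should be chosen: float -> double is a promotion
    S (long double) {}
  };

  float a = 1.0f, b = 2.0f;
  // a + b is computed in long double (excess precision), but its
  // semantic type is float, so overload resolution picks S(double);
  // the excess-precision value is then converted directly to double.
  S s (a + b);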

The existing testcase tweaks were for cases on i686-linux where excess
precision breaks those tests, e.g. if we have
  double d = 4.2;
  if (d == 4.2)
then it does the expected thing only with -fexcess-precision=fast,
because with -fexcess-precision=standard it is actually
  double d = 4.2;
  if ((long double) d == 4.2L)
where 4.2L is different from 4.2.  I've added -fexcess-precision=fast
to some tests and changed other tests to use constants that are exactly
representable and don't suffer from these excess precision issues.
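For example (an illustration, not from the patch), a constant that is
exactly representable behaves the same with and without excess precision:

  double d = 4.25;      // 4.25 is exactly representable in binary FP
  // Under -fexcess-precision=standard this is (long double) d == 4.25L,
  // and 4.25L denotes the same value as 4.25, so the test still holds.
  bool ok = (d == 4.25);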

There is one exception, pr68180.C looks like a bug in the patch which is
also present in the C FE (so I'd like to get it resolved incrementally
in both).  Reduced testcase:
typedef float __attribute__((vector_size (16))) float32x4_t;
float32x4_t foo(float32x4_t x, float y) { return x + y; }
with -m32 -std=c11 -Wno-psabi or -m32 -std=c++17 -Wno-psabi
it is rejected with:
pr68180.c:2:52: error: conversion of scalar ‘long double’ to vector 
‘float32x4_t’ {aka ‘__vector(4) float’} involves truncation
but without excess precision (say just -std=c11 -Wno-psabi or -std=c++17 
-Wno-psabi)
it is accepted.  Perhaps we should pass down the semantic type to
scalar_to_vector and use the semantic type rather than excess precision type
in the diagnostics.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-10-11  Jakub Jelinek  

PR middle-end/323
PR c++/107097
gcc/c-family/
* c-opts.cc (c_common_post_options): Handle flag_excess_precision
in C++ the same as in C.
* c-lex.cc (interpret_float): Set const_type to excess_precision ()
even for C++.
gcc/cp/
* parser.cc (cp_parser_primary_expression): Handle
EXCESS_PRECISION_EXPR with REAL_CST operand the same as REAL_CST.
* cvt.cc (cp_ep_convert_and_check): New function.
* call.cc (build_conditional_expr): Add excess precision support.
(convert_like_internal): Likewise.  Add NESTED_P argument, pass true
to recursive calls to convert_like.
(convert_like): Add NESTED_P argument, pass it through to
convert_like_internal.  For other overload pass false to it.
(convert_like_with_context): Pass false to NESTED_P.
(convert_arg_to_ellipsis): Add excess precision support.
(magic_varargs_p): For __builtin_is{finite,inf,inf_sign,nan,normal}
and __builtin_fpclassify return 2 instead of 1, document what it
means.
(build_over_call): Don't handle former magic 2 which is no longer
used, instead for magic 1 remove EXCESS_PRECISION_EXPR.
(perform_direct_initialization_if_possible): Pass false to NESTED_P
convert_like argument.
* constexpr.cc (cxx_eval_constant_expression): Handle
EXCESS_PRECISION_EXPR.
(potential_constant_expression_1): Likewise.
* pt.cc (tsubst_copy, tsubst_copy_and_build): Likewise.
* cp-tree.h (cp_ep_convert_and_check): Declare.

Re: [PATCH v2 00/10] Introduce strub: machine-independent stack scrubbing

2022-10-11 Thread Alexandre Oliva via Gcc-patches
On Oct 11, 2022, Richard Biener  wrote:

> On Tue, Oct 11, 2022 at 1:57 PM Alexandre Oliva  wrote:
>> 
>> On Oct 10, 2022, Richard Biener  wrote:
>> 
>> > As noted in the Cauldron Discussion I think you should do all
>> > instrumentation post-IPA only to simplify your life not needing to
>> > handle inlining of instrumentation
>> 
>> I looked a bit into that after the Cauldron, and recalled why I wanted
>> to instrument before inlining: in the case of internal strub, that
>> introduces a wrapper, it's desirable to be able to inline the wrapper.

> I think if the wrapper is created at IPA time it is also available for
> IPA inlining.

Yeah, but now I'm not sure what you're suggesting.  The wrapper is
instrumentation, and requires instrumentation of the wrapped
counterpart, so that can't be post-IPA.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] middle-end, v2: IFN_ASSUME support [PR106654]

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Mon, Oct 10, 2022 at 11:19:24PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Mon, Oct 10, 2022 at 05:09:29PM -0400, Jason Merrill wrote:
> > On 10/10/22 04:54, Jakub Jelinek via Gcc-patches wrote:
> > > My earlier patches gimplify the simplest non-side-effects assumptions
> > > into if (cond) ; else __builtin_unreachable (); and throw the rest
> > > on the floor.
> > > The following patch attempts to do something with the rest too.
> > > For -O0, it actually throws even the simplest assumptions on the floor,
> > > we don't expect optimizations and the assumptions are there to allow
> > > optimizations.
> > 
> > I'd think we should trap on failed assume at -O0 (i.e. with
> > -funreachable-traps).
> 
> For the simple conditions?  Perhaps.  But for the side-effects cases
> that doesn't seem to be easily possible.

Here is an updated patch which will trap on failed simple assume.

Bootstrapped/regtested successfully on x86_64-linux and i686-linux, the only
change was moving the !optimize handling from before the
if (cond); else __builtin_unreachable ();
gimplification to right after it.
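So with the patch something like the following (a sketch, assuming
-std=c++23 and -funreachable-traps) will trap at -O0 when the assumption
is violated:

  int
  f (int x)
  {
    [[assume (x > 0)]];  // simple condition: gimplified to
                         // if (x > 0) ; else __builtin_unreachable ();
                         // which -funreachable-traps turns into a trap
    return x + 1;
  }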

2022-10-11  Jakub Jelinek  

PR c++/106654
gcc/
* function.h (struct function): Add assume_function bitfield.
* gimplify.cc (gimplify_call_expr): If the assumption isn't
simple enough, expand it into IFN_ASSUME guarded block or
for -O0 drop it.
* gimple-low.cc (create_assumption_fn): New function.
(struct lower_assumption_data): New type.
(find_assumption_locals_r, assumption_copy_decl,
adjust_assumption_stmt_r, adjust_assumption_stmt_op,
lower_assumption): New functions.
(lower_stmt): Handle IFN_ASSUME guarded block.
* tree-ssa-ccp.cc (pass_fold_builtins::execute): Remove
IFN_ASSUME calls.
* lto-streamer-out.cc (output_struct_function_base): Pack
assume_function bit.
* lto-streamer-in.cc (input_struct_function_base): And unpack it.
* cgraphunit.cc (cgraph_node::expand): Don't verify assume_function
has TREE_ASM_WRITTEN set and don't release its body.
* cfgexpand.cc (pass_expand::execute): Don't expand assume_function
into RTL, just destroy loops and exit.
* internal-fn.cc (expand_ASSUME): Remove gcc_unreachable.
* passes.cc (pass_rest_of_compilation::gate): Return false also for
fun->assume_function.
* tree-vectorizer.cc (pass_vectorize::gate,
pass_slp_vectorize::gate): Likewise.
* ipa-icf.cc (sem_function::parse): Punt for func->assume_function.
gcc/cp/
* parser.cc (cp_parser_omp_assumption_clauses): Wrap IFN_ASSUME
argument with fold_build_cleanup_point_expr.
* cp-gimplify.cc (process_stmt_assume_attribute): Likewise.
* pt.cc (tsubst_copy_and_build): Likewise.
gcc/testsuite/
* g++.dg/cpp23/attr-assume5.C: New test.
* g++.dg/cpp23/attr-assume6.C: New test.
* g++.dg/cpp23/attr-assume7.C: New test.

--- gcc/function.h.jj   2022-10-10 09:31:22.051478926 +0200
+++ gcc/function.h  2022-10-10 09:59:49.283646705 +0200
@@ -438,6 +438,10 @@ struct GTY(()) function {
 
   /* Set if there are any OMP_TARGET regions in the function.  */
   unsigned int has_omp_target : 1;
+
+  /* Set for artificial function created for [[assume (cond)]].
+ These should be GIMPLE optimized, but not expanded to RTL.  */
+  unsigned int assume_function : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
--- gcc/gimplify.cc.jj  2022-10-10 09:31:57.518983613 +0200
+++ gcc/gimplify.cc 2022-10-10 09:59:49.285646677 +0200
@@ -3569,7 +3569,52 @@ gimplify_call_expr (tree *expr_p, gimple
 fndecl, 0));
  return GS_OK;
}
- /* FIXME: Otherwise expand it specially.  */
+ /* If not optimizing, ignore the assumptions.  */
+ if (!optimize)
+   {
+ *expr_p = NULL_TREE;
+ return GS_ALL_DONE;
+   }
+ /* Temporarily, until gimple lowering, transform
+.ASSUME (cond);
+into:
+guard = .ASSUME ();
+if (guard) goto label_true; else label_false;
+label_true:;
+{
+  guard = cond;
+}
+label_false:;
+.ASSUME (guard);
+such that gimple lowering can outline the condition into
+a separate function easily.  */
+ tree guard = create_tmp_var (boolean_type_node);
+ gcall *call = gimple_build_call_internal (ifn, 0);
+ gimple_call_set_nothrow (call, TREE_NOTHROW (*expr_p));
+ gimple_set_location (call, loc);
+ gimple_call_set_lhs (call, guard);
+ gimple_seq_add_stmt (pre_p, call);
+ *expr_p = build2 (MODIFY_EXPR, void_type_node, guard,
+   CALL_EXPR_ARG (*expr_p, 0));
+ *expr_p = build3 (BIND_EXPR, void_type_node, NULL, *expr_p,

[COMMITTED] Move TRUE case first in range-op.cc.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
It's incredibly annoying that some of the BRS_TRUE cases come after
BRS_FALSE, if only because we're not consistent.  Having random
ordering increases the changes of thinkos when adapting the irange
code to floats.

gcc/ChangeLog:

* range-op.cc (operator_equal::op1_range): Move BRS_TRUE case up.
(operator_lt::op2_range): Same.
(operator_le::op2_range): Same.
(operator_gt::op2_range): Same.
(operator_ge::op2_range): Same.
---
 gcc/range-op.cc | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index df0735cb42a..4d5a033dfa5 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -531,6 +531,11 @@ operator_equal::op1_range (irange &r, tree type,
 {
   switch (get_bool_state (r, lhs, type))
 {
+case BRS_TRUE:
+  // If it's true, the result is the same as OP2.
+  r = op2;
+  break;
+
 case BRS_FALSE:
   // If the result is false, the only time we know anything is
   // if OP2 is a constant.
@@ -543,11 +548,6 @@ operator_equal::op1_range (irange &r, tree type,
r.set_varying (type);
   break;
 
-case BRS_TRUE:
-  // If it's true, the result is the same as OP2.
-  r = op2;
-  break;
-
 default:
   break;
 }
@@ -841,14 +841,14 @@ operator_lt::op2_range (irange &r, tree type,
 {
   switch (get_bool_state (r, lhs, type))
 {
-case BRS_FALSE:
-  build_le (r, type, op1.upper_bound ());
-  break;
-
 case BRS_TRUE:
   build_gt (r, type, op1.lower_bound ());
   break;
 
+case BRS_FALSE:
+  build_le (r, type, op1.upper_bound ());
+  break;
+
 default:
   break;
 }
@@ -952,14 +952,14 @@ operator_le::op2_range (irange &r, tree type,
 {
   switch (get_bool_state (r, lhs, type))
 {
-case BRS_FALSE:
-  build_lt (r, type, op1.upper_bound ());
-  break;
-
 case BRS_TRUE:
   build_ge (r, type, op1.lower_bound ());
   break;
 
+case BRS_FALSE:
+  build_lt (r, type, op1.upper_bound ());
+  break;
+
 default:
   break;
 }
@@ -1062,14 +1062,14 @@ operator_gt::op2_range (irange &r, tree type,
 {
   switch (get_bool_state (r, lhs, type))
 {
-case BRS_FALSE:
-  build_ge (r, type, op1.lower_bound ());
-  break;
-
 case BRS_TRUE:
   build_lt (r, type, op1.upper_bound ());
   break;
 
+case BRS_FALSE:
+  build_ge (r, type, op1.lower_bound ());
+  break;
+
 default:
   break;
 }
@@ -1173,14 +1173,14 @@ operator_ge::op2_range (irange &r, tree type,
 {
   switch (get_bool_state (r, lhs, type))
 {
-case BRS_FALSE:
-  build_gt (r, type, op1.lower_bound ());
-  break;
-
 case BRS_TRUE:
   build_le (r, type, op1.upper_bound ());
   break;
 
+case BRS_FALSE:
+  build_gt (r, type, op1.lower_bound ());
+  break;
+
 default:
   break;
 }
-- 
2.37.3



[COMMITTED] Share common ordered comparison code with UN*_EXPR.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
Most unordered comparisons can use the result from the ordered
version, if the operands are known not to be NAN or if the result is
true.
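As a standalone illustration (not part of the patch), UNLT is
"unordered or less than", so:

  #include <cmath>

  // When neither operand can be NAN this is exactly the ordered
  // LT_EXPR result, and whenever the ordered result is true the
  // unordered result is true as well.
  bool
  unlt (double x, double y)
  {
    return std::isunordered (x, y) || x < y;
  }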

gcc/ChangeLog:

* range-op-float.cc (class foperator_unordered_lt): New.
(class foperator_relop_unknown): Remove
(class foperator_unordered_le): New.
(class foperator_unordered_gt): New.
(class foperator_unordered_ge): New.
(class foperator_unordered_equal): New.
(floating_op_table::floating_op_table): Replace all UN_EXPR
entries with their appropriate fop_unordered_* counterpart.
---
 gcc/range-op-float.cc | 140 ++
 1 file changed, 128 insertions(+), 12 deletions(-)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 3cf117d8931..8dd4bcc70c0 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1132,24 +1132,140 @@ foperator_ordered::op1_range (frange &r, tree type,
   return true;
 }
 
-// Placeholder for unimplemented relational operators.
+class foperator_unordered_lt : public range_operator_float
+{
+  using range_operator_float::fold_range;
+public:
+  bool fold_range (irange &r, tree type,
+  const frange &op1, const frange &op2,
+  relation_kind rel) const final override
+  {
+if (op1.known_isnan () || op2.known_isnan ())
+  {
+   r = range_true (type);
+   return true;
+  }
+if (!fop_lt.fold_range (r, type, op1, op2, rel))
+  return false;
+// The result is the same as the ordered version when the
+// comparison is true or when the operands cannot be NANs.
+if (finite_operands_p (op1, op2) || r == range_true (type))
+  return true;
+else
+  {
+   r = range_true_and_false (type);
+   return true;
+  }
+  }
+} fop_unordered_lt;
 
-class foperator_relop_unknown : public range_operator_float
+class foperator_unordered_le : public range_operator_float
 {
   using range_operator_float::fold_range;
+public:
+  bool fold_range (irange &r, tree type,
+  const frange &op1, const frange &op2,
+  relation_kind rel) const final override
+  {
+if (op1.known_isnan () || op2.known_isnan ())
+  {
+   r = range_true (type);
+   return true;
+  }
+if (!fop_le.fold_range (r, type, op1, op2, rel))
+  return false;
+// The result is the same as the ordered version when the
+// comparison is true or when the operands cannot be NANs.
+if (finite_operands_p (op1, op2) || r == range_true (type))
+  return true;
+else
+  {
+   r = range_true_and_false (type);
+   return true;
+  }
+  }
+} fop_unordered_le;
 
+class foperator_unordered_gt : public range_operator_float
+{
+  using range_operator_float::fold_range;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
-  relation_kind) const final override
+  relation_kind rel) const final override
   {
 if (op1.known_isnan () || op2.known_isnan ())
-  r = range_true (type);
+  {
+   r = range_true (type);
+   return true;
+  }
+if (!fop_gt.fold_range (r, type, op1, op2, rel))
+  return false;
+// The result is the same as the ordered version when the
+// comparison is true or when the operands cannot be NANs.
+if (finite_operands_p (op1, op2) || r == range_true (type))
+  return true;
 else
-  r.set_varying (type);
-return true;
+  {
+   r = range_true_and_false (type);
+   return true;
+  }
+  }
+} fop_unordered_gt;
+
+class foperator_unordered_ge : public range_operator_float
+{
+  using range_operator_float::fold_range;
+public:
+  bool fold_range (irange &r, tree type,
+  const frange &op1, const frange &op2,
+  relation_kind rel) const final override
+  {
+if (op1.known_isnan () || op2.known_isnan ())
+  {
+   r = range_true (type);
+   return true;
+  }
+if (!fop_ge.fold_range (r, type, op1, op2, rel))
+  return false;
+// The result is the same as the ordered version when the
+// comparison is true or when the operands cannot be NANs.
+if (finite_operands_p (op1, op2) || r == range_true (type))
+  return true;
+else
+  {
+   r = range_true_and_false (type);
+   return true;
+  }
+  }
+} fop_unordered_ge;
+
+class foperator_unordered_equal : public range_operator_float
+{
+  using range_operator_float::fold_range;
+public:
+  bool fold_range (irange &r, tree type,
+  const frange &op1, const frange &op2,
+  relation_kind rel) const final override
+  {
+if (op1.known_isnan () || op2.known_isnan ())
+  {
+   r = range_true (type);
+   return true;
+  }
+if (!fop_equal.fold_range (r, type, op1, op2, rel))
+  return false;
+// The result is the same as the ordered version when the
+// comparison is true or when the operands cannot be NANs.

[COMMITTED] Implement ABS_EXPR operator for frange.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
Implementing ABS_EXPR allows us to fold certain __builtin_inf calls
since they are expanded into calls to involving ABS_EXPR.

This is an adaptation of the integer version.
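For example (a sketch of the kind of expansion involved; the exact form
GCC emits depends on the mode and flags):

  // __builtin_isinf (x) reduces to a comparison on the absolute value,
  // so knowing a range for ABS_EXPR (x) lets ranger fold the test.
  int
  my_isinf (double x)
  {
    return __builtin_fabs (x) > __DBL_MAX__;
  }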

gcc/ChangeLog:

* range-op-float.cc (class foperator_abs): New.
(floating_op_table::floating_op_table): Add ABS_EXPR entry.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp-float-abs-1.c: New test.
---
 gcc/range-op-float.cc | 91 +++
 .../gcc.dg/tree-ssa/vrp-float-abs-1.c | 17 
 2 files changed, 108 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index ef51b7538e3..283eb134c78 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1132,6 +1132,95 @@ foperator_ordered::op1_range (frange &r, tree type,
   return true;
 }
 
+class foperator_abs : public range_operator_float
+{
+  using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+public:
+  bool fold_range (frange &r, tree type,
+  const frange &op1, const frange &,
+  relation_kind) const final override;
+  bool op1_range (frange &r, tree type,
+ const frange &lhs, const frange &op2,
+ relation_kind rel) const final override;
+} fop_abs;
+
+bool
+foperator_abs::fold_range (frange &r, tree type,
+  const frange &op1, const frange &op2,
+  relation_kind) const
+{
+  if (empty_range_varying (r, type, op1, op2))
+return true;
+  if (op1.known_isnan ())
+{
+  r.set_nan (type, /*sign=*/false);
+  return true;
+}
+
+  const REAL_VALUE_TYPE lh_lb = op1.lower_bound ();
+  const REAL_VALUE_TYPE lh_ub = op1.upper_bound ();
+  // Handle the easy case where everything is positive.
+  if (real_compare (GE_EXPR, &lh_lb, &dconst0)
+  && !real_iszero (&lh_lb, /*sign=*/true)
+  && !op1.maybe_isnan (/*sign=*/true))
+{
+  r = op1;
+  return true;
+}
+
+  REAL_VALUE_TYPE min = real_value_abs (&lh_lb);
+  REAL_VALUE_TYPE max = real_value_abs (&lh_ub);
+  // If the range contains zero then we know that the minimum value in the
+  // range will be zero.
+  if (real_compare (LE_EXPR, &lh_lb, &dconst0)
+  && real_compare (GE_EXPR, &lh_ub, &dconst0))
+{
+  if (real_compare (GT_EXPR, &min, &max))
+   max = min;
+  min = dconst0;
+}
+  else
+{
+  // If the range was reversed, swap MIN and MAX.
+  if (real_compare (GT_EXPR, &min, &max))
+   std::swap (min, max);
+}
+
+  r.set (type, min, max);
+  if (op1.maybe_isnan ())
+r.update_nan (/*sign=*/false);
+  else
+r.clear_nan ();
+  return true;
+}
+
+bool
+foperator_abs::op1_range (frange &r, tree type,
+ const frange &lhs, const frange &op2,
+ relation_kind) const
+{
+  if (empty_range_varying (r, type, lhs, op2))
+return true;
+  if (lhs.known_isnan ())
+{
+  r.set_nan (type);
+  return true;
+}
+
+  // Start with the positives because negatives are an impossible result.
+  frange positives (type, dconst0, frange_val_max (type));
+  positives.update_nan (/*sign=*/false);
+  positives.intersect (lhs);
+  r = positives;
+  // Then add the negative of each pair:
+  // ABS(op1) = [5,20] would yield op1 => [-20,-5][5,20].
+  r.union_ (frange (type,
+   real_value_negate (&positives.upper_bound ()),
+   real_value_negate (&positives.lower_bound (;
+  return true;
+}
+
 class foperator_unordered_lt : public range_operator_float
 {
   using range_operator_float::fold_range;
@@ -1502,6 +1591,8 @@ floating_op_table::floating_op_table ()
   set (UNEQ_EXPR, fop_unordered_equal);
   set (ORDERED_EXPR, fop_ordered);
   set (UNORDERED_EXPR, fop_unordered);
+
+  set (ABS_EXPR, fop_abs);
 }
 
 // Return a pointer to the range_operator_float instance, if there is
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
new file mode 100644
index 000..4b7b75833e0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp-float-abs-1.c
@@ -0,0 +1,17 @@
+// { dg-do compile }
+// { dg-options "-O2 -fno-thread-jumps -fdump-tree-evrp" }
+
+void link_error ();
+
+void
+foo (double x, double y)
+{
+  if (x > y && __builtin_signbit (y) == 0)
+{
+  // y == +INF is impossible.
+  if (__builtin_isinf (y))
+link_error ();
+}
+}
+
+// { dg-final { scan-tree-dump-not "link_error" "evrp" } }
-- 
2.37.3



[COMMITTED] Implement op1_range operators for unordered comparisons.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
gcc/ChangeLog:

* range-op-float.cc (foperator_unordered_le::op1_range): New.
(foperator_unordered_le::op2_range): New.
(foperator_unordered_gt::op1_range): New.
(foperator_unordered_gt::op2_range): New.
(foperator_unordered_ge::op1_range): New.
(foperator_unordered_ge::op2_range): New.
(foperator_unordered_equal::op1_range): New.
---
 gcc/range-op-float.cc | 205 ++
 1 file changed, 205 insertions(+)

diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
index 8dd4bcc70c0..ef51b7538e3 100644
--- a/gcc/range-op-float.cc
+++ b/gcc/range-op-float.cc
@@ -1162,6 +1162,8 @@ public:
 class foperator_unordered_le : public range_operator_float
 {
   using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -1184,11 +1186,65 @@ public:
return true;
   }
   }
+  bool op1_range (frange &r, tree type,
+ const irange &lhs, const frange &op2,
+ relation_kind rel) const final override;
+  bool op2_range (frange &r, tree type,
+ const irange &lhs, const frange &op1,
+ relation_kind rel) const final override;
 } fop_unordered_le;
 
+bool
+foperator_unordered_le::op1_range (frange &r, tree type,
+  const irange &lhs, const frange &op2,
+  relation_kind) const
+{
+  switch (get_bool_state (r, lhs, type))
+{
+case BRS_TRUE:
+  build_le (r, type, op2);
+  break;
+
+case BRS_FALSE:
+  build_gt (r, type, op2);
+  r.clear_nan ();
+  break;
+
+default:
+  break;
+}
+  return true;
+}
+
+bool
+foperator_unordered_le::op2_range (frange &r,
+  tree type,
+  const irange &lhs,
+  const frange &op1,
+  relation_kind) const
+{
+  switch (get_bool_state (r, lhs, type))
+{
+case BRS_TRUE:
+  build_ge (r, type, op1);
+  break;
+
+case BRS_FALSE:
+  build_lt (r, type, op1);
+  r.clear_nan ();
+  break;
+
+default:
+  break;
+}
+  return true;
+}
+
 class foperator_unordered_gt : public range_operator_float
 {
   using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -1211,11 +1267,67 @@ public:
return true;
   }
   }
+  bool op1_range (frange &r, tree type,
+ const irange &lhs, const frange &op2,
+ relation_kind rel) const final override;
+  bool op2_range (frange &r, tree type,
+ const irange &lhs, const frange &op1,
+ relation_kind rel) const final override;
 } fop_unordered_gt;
 
+bool
+foperator_unordered_gt::op1_range (frange &r,
+tree type,
+const irange &lhs,
+const frange &op2,
+relation_kind) const
+{
+  switch (get_bool_state (r, lhs, type))
+{
+case BRS_TRUE:
+  build_gt (r, type, op2);
+  break;
+
+case BRS_FALSE:
+  build_le (r, type, op2);
+  r.clear_nan ();
+  break;
+
+default:
+  break;
+}
+  return true;
+}
+
+bool
+foperator_unordered_gt::op2_range (frange &r,
+  tree type,
+  const irange &lhs,
+  const frange &op1,
+  relation_kind) const
+{
+  switch (get_bool_state (r, lhs, type))
+{
+case BRS_TRUE:
+  build_lt (r, type, op1);
+  break;
+
+case BRS_FALSE:
+  build_ge (r, type, op1);
+  r.clear_nan ();
+  break;
+
+default:
+  break;
+}
+  return true;
+}
+
 class foperator_unordered_ge : public range_operator_float
 {
   using range_operator_float::fold_range;
+  using range_operator_float::op1_range;
+  using range_operator_float::op2_range;
 public:
   bool fold_range (irange &r, tree type,
   const frange &op1, const frange &op2,
@@ -1238,11 +1350,66 @@ public:
return true;
   }
   }
+  bool op1_range (frange &r, tree type,
+ const irange &lhs, const frange &op2,
+ relation_kind rel) const final override;
+  bool op2_range (frange &r, tree type,
+ const irange &lhs, const frange &op1,
+ relation_kind rel) const final override;
 } fop_unordered_ge;
 
+bool
+foperator_unordered_ge::op1_range (frange &r,
+  tree type,
+  const irange &lhs,
+  const frange 

Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik via Gcc-patches wrote:

> The coverage support will under some conditions decide to split edges to
> accurately report coverage. By running the test suite with/without this
> edge splitting a small diff shows up, addressed by this patch, which
> should catch future regressions.
> 
> Removing the edge splitting:
> 
> $ diff --git a/gcc/profile.cc b/gcc/profile.cc
> --- a/gcc/profile.cc
> +++ b/gcc/profile.cc
> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
> Don't do that when the locuses match, so
> if (blah) goto something;
> is not computed twice.  */
> - if (last
> - && gimple_has_location (last)
> - && !RESERVED_LOCATION_P (e->goto_locus)
> - && !single_succ_p (bb)
> - && (LOCATION_FILE (e->goto_locus)
> - != LOCATION_FILE (gimple_location (last))
> - || (LOCATION_LINE (e->goto_locus)
> - != LOCATION_LINE (gimple_location (last)
> -   {
> - basic_block new_bb = split_edge (e);
> - edge ne = single_succ_edge (new_bb);
> - ne->goto_locus = e->goto_locus;
> -   }
> +

Assuming this is correct (I really can't say) then the comment needs 
adjustments.  It specifically talks about this very code you remove.


Ciao,
Michael.


Re: [COMMITTED] Implement op1_range operators for unordered comparisons.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
I forgot to mention.  These were lifted from the integer counterparts.
Most of the code is the same, as the build_{cond} code in the frange
version will add the appropriate NAN (unless -ffinite-math-only), and
all we have to do is clear it on the false edge.

Aldy

On Tue, Oct 11, 2022 at 3:51 PM Aldy Hernandez  wrote:
>
> gcc/ChangeLog:
>
> * range-op-float.cc (foperator_unordered_le::op1_range): New.
> (foperator_unordered_le::op2_range): New.
> (foperator_unordered_gt::op1_range): New.
> (foperator_unordered_gt::op2_range): New.
> (foperator_unordered_ge::op1_range): New.
> (foperator_unordered_ge::op2_range): New.
> (foperator_unordered_equal::op1_range): New.
> ---
>  gcc/range-op-float.cc | 205 ++
>  1 file changed, 205 insertions(+)
>
> diff --git a/gcc/range-op-float.cc b/gcc/range-op-float.cc
> index 8dd4bcc70c0..ef51b7538e3 100644
> --- a/gcc/range-op-float.cc
> +++ b/gcc/range-op-float.cc
> @@ -1162,6 +1162,8 @@ public:
>  class foperator_unordered_le : public range_operator_float
>  {
>using range_operator_float::fold_range;
> +  using range_operator_float::op1_range;
> +  using range_operator_float::op2_range;
>  public:
>bool fold_range (irange &r, tree type,
>const frange &op1, const frange &op2,
> @@ -1184,11 +1186,65 @@ public:
> return true;
>}
>}
> +  bool op1_range (frange &r, tree type,
> + const irange &lhs, const frange &op2,
> + relation_kind rel) const final override;
> +  bool op2_range (frange &r, tree type,
> + const irange &lhs, const frange &op1,
> + relation_kind rel) const final override;
>  } fop_unordered_le;
>
> +bool
> +foperator_unordered_le::op1_range (frange &r, tree type,
> +  const irange &lhs, const frange &op2,
> +  relation_kind) const
> +{
> +  switch (get_bool_state (r, lhs, type))
> +{
> +case BRS_TRUE:
> +  build_le (r, type, op2);
> +  break;
> +
> +case BRS_FALSE:
> +  build_gt (r, type, op2);
> +  r.clear_nan ();
> +  break;
> +
> +default:
> +  break;
> +}
> +  return true;
> +}
> +
> +bool
> +foperator_unordered_le::op2_range (frange &r,
> +  tree type,
> +  const irange &lhs,
> +  const frange &op1,
> +  relation_kind) const
> +{
> +  switch (get_bool_state (r, lhs, type))
> +{
> +case BRS_TRUE:
> +  build_ge (r, type, op1);
> +  break;
> +
> +case BRS_FALSE:
> +  build_lt (r, type, op1);
> +  r.clear_nan ();
> +  break;
> +
> +default:
> +  break;
> +}
> +  return true;
> +}
> +
>  class foperator_unordered_gt : public range_operator_float
>  {
>using range_operator_float::fold_range;
> +  using range_operator_float::op1_range;
> +  using range_operator_float::op2_range;
>  public:
>bool fold_range (irange &r, tree type,
>const frange &op1, const frange &op2,
> @@ -1211,11 +1267,67 @@ public:
> return true;
>}
>}
> +  bool op1_range (frange &r, tree type,
> + const irange &lhs, const frange &op2,
> + relation_kind rel) const final override;
> +  bool op2_range (frange &r, tree type,
> + const irange &lhs, const frange &op1,
> + relation_kind rel) const final override;
>  } fop_unordered_gt;
>
> +bool
> +foperator_unordered_gt::op1_range (frange &r,
> +tree type,
> +const irange &lhs,
> +const frange &op2,
> +relation_kind) const
> +{
> +  switch (get_bool_state (r, lhs, type))
> +{
> +case BRS_TRUE:
> +  build_gt (r, type, op2);
> +  break;
> +
> +case BRS_FALSE:
> +  build_le (r, type, op2);
> +  r.clear_nan ();
> +  break;
> +
> +default:
> +  break;
> +}
> +  return true;
> +}
> +
> +bool
> +foperator_unordered_gt::op2_range (frange &r,
> +  tree type,
> +  const irange &lhs,
> +  const frange &op1,
> +  relation_kind) const
> +{
> +  switch (get_bool_state (r, lhs, type))
> +{
> +case BRS_TRUE:
> +  build_lt (r, type, op1);
> +  break;
> +
> +case BRS_FALSE:
> +  build_ge (r, type, op1);
> +  r.clear_nan ();
> +  break;
> +
> +default:
> +  break;
> +}
> +  return true;
> +}
> +
>  class foperator_unordered_ge : public range_operator_float
>  {
>using range_operator_float::fold_range;
> +  using range_operator_float::op1_range;
> +  using range_operator_float::op2_range;
>  public:
>bool fold_range (irange &r, tree type,
>cons

Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Jørgen Kvalsvik via Gcc-patches
On 11/10/2022 15:55, Michael Matz wrote:
> Hello,
> 
> On Tue, 11 Oct 2022, Jørgen Kvalsvik via Gcc-patches wrote:
> 
>> The coverage support will under some conditions decide to split edges to
>> accurately report coverage. By running the test suite with/without this
>> edge splitting a small diff shows up, addressed by this patch, which
>> should catch future regressions.
>>
>> Removing the edge splitting:
>>
>> $ diff --git a/gcc/profile.cc b/gcc/profile.cc
>> --- a/gcc/profile.cc
>> +++ b/gcc/profile.cc
>> @@ -1244,19 +1244,7 @@ branch_prob (bool thunk)
>> Don't do that when the locuses match, so
>> if (blah) goto something;
>> is not computed twice.  */
>> - if (last
>> - && gimple_has_location (last)
>> - && !RESERVED_LOCATION_P (e->goto_locus)
>> - && !single_succ_p (bb)
>> - && (LOCATION_FILE (e->goto_locus)
>> - != LOCATION_FILE (gimple_location (last))
>> - || (LOCATION_LINE (e->goto_locus)
>> - != LOCATION_LINE (gimple_location (last)
>> -   {
>> - basic_block new_bb = split_edge (e);
>> - edge ne = single_succ_edge (new_bb);
>> - ne->goto_locus = e->goto_locus;
>> -   }
>> +
> 
> Assuming this is correct (I really can't say) then the comment needs 
> adjustments.  It specifically talks about this very code you remove.
> 
> 
> Ciao,
> Michael.

Michael,

I apologise for the confusion. The diff there is not a part of the change itself
(note the indentation) but rather a way to reproduce, or at least understand,
the type of change that would trigger the new test error. If it is too confusing
I can re-write the commit message.

Thanks,
Jørgen


Re: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote:
> gcc/ChangeLog:
> 
>   * config/i386/i386.md (*notxor_1): New post_reload
>   define_insn_and_split.
>   (*notxorqi_1): Ditto.

> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -10826,6 +10826,39 @@ (define_insn "*_1"
> (set_attr "type" "alu, alu, msklog")
> (set_attr "mode" "")])
>  
> +(define_insn_and_split "*notxor_1"
> +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
> + (not:SWI248
> +   (xor:SWI248
> + (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
> + (match_operand:SWI248 2 "" "r,,k"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (XOR, mode, operands)"
> +  "#"
> +  "&& reload_completed"
> +  [(parallel
> +[(set (match_dup 0)
> +   (xor:SWI248 (match_dup 1) (match_dup 2)))
> + (clobber (reg:CC FLAGS_REG))])
> +   (set (match_dup 0)
> + (not:SWI248 (match_dup 1)))]
> +{
> +  if (MASK_REGNO_P (REGNO (operands[0])))

This causes --enable-checking=yes,rtl,extra regression on
gcc.dg/store_merging_13.c test on x86_64-linux:
.../gcc/testsuite/gcc.dg/store_merging_13.c: In function 'f13':
.../gcc/testsuite/gcc.dg/store_merging_13.c:189:1: internal compiler error: RTL 
check: expected code 'reg', have 'mem' in rhs_regno, at rtl.h:1932
0x7b0c8f rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, 
char const*)
../../gcc/rtl.cc:916
0x8e74be rhs_regno
../../gcc/rtl.h:1932
0x9785fd rhs_regno
./genrtl.h:120
0x9785fd gen_split_260(rtx_insn*, rtx_def**)
../../gcc/config/i386/i386.md:10846
0x23596dc split_insns(rtx_def*, rtx_insn*)
../../gcc/config/i386/i386.md:16392
0xfccd5a try_split(rtx_def*, rtx_insn*, int)
../../gcc/emit-rtl.cc:3799
0x132e9d8 split_insn
../../gcc/recog.cc:3384
0x13359d5 split_all_insns()
../../gcc/recog.cc:3488
0x1335ae8 execute
../../gcc/recog.cc:4412
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.

2022-10-11  Jakub Jelinek  

PR target/107185
* config/i386/i386.md (*notxor_1): Use MASK_REG_P (x) instead of
MASK_REGNO_P (REGNO (x)).

--- gcc/config/i386/i386.md.jj  2022-10-11 12:10:42.188891134 +0200
+++ gcc/config/i386/i386.md 2022-10-11 15:47:45.531449089 +0200
@@ -10843,7 +10843,7 @@ (define_insn_and_split "*notxor_1"
(set (match_dup 0)
(not:SWI248 (match_dup 0)))]
 {
-  if (MASK_REGNO_P (REGNO (operands[0])))
+  if (MASK_REG_P (operands[0]))
 {
   emit_insn (gen_kxnor (operands[0], operands[1], operands[2]));
   DONE;


Jakub



Re: [PATCH 1/2] gcov: test switch/break line counts

2022-10-11 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 11 Oct 2022, Jørgen Kvalsvik wrote:

> I apologise for the confusion. The diff there is not a part of the 
> change itself (note the indentation) but rather a way to reproduce,

Ah!  Thanks, that explains it, sorry for adding confusion on top :-)


Ciao,
Michael.


Re: [PATCH] Avoid calling tracer.trailer() twice.

2022-10-11 Thread Aldy Hernandez via Gcc-patches
Sure.

OK?
Aldy

On Tue, Oct 11, 2022 at 3:11 PM Andrew MacLeod  wrote:
>
> It probably should just be changed to a print if it doesn't return..
> something like
>
> if (idx && res)
>{
>  tracer.print (idx, "logical_combine produced");
>  r.dump (dump_file);
>  fputc ('\n', dump_file);
> }
>
> Andrew
>
> On 10/10/22 14:58, Aldy Hernandez wrote:
> > [Andrew, you OK with this?  I can't tell whether the trailer() call was
> > actually needed.]
> >
> > logical_combine is calling tracer.trailer() one too many times causing
> > the second trailer() call to subtract a 0 indent by 2, yielding an
> > indent of SOMETHING_REALLY_BIG :).  You'd be surprised how many tools
> > can't handle incredibly long lines.
> >
> > gcc/ChangeLog:
> >
> >   * gimple-range-gori.cc (gori_compute::logical_combine): Avoid
> >   calling tracer.trailer().
> > ---
> >   gcc/gimple-range-gori.cc | 10 +-
> >   1 file changed, 1 insertion(+), 9 deletions(-)
> >
> > diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
> > index b37d03cddda..469382aa477 100644
> > --- a/gcc/gimple-range-gori.cc
> > +++ b/gcc/gimple-range-gori.cc
> > @@ -798,20 +798,12 @@ gori_compute::logical_combine (vrange &r, enum 
> > tree_code code,
> > // would be lost.
> > if (!range_is_either_true_or_false (lhs))
> >   {
> > -  bool res;
> > Value_Range r1 (r);
> > if (logical_combine (r1, code, m_bool_zero, op1_true, op1_false,
> >  op2_true, op2_false)
> > && logical_combine (r, code, m_bool_one, op1_true, op1_false,
> > op2_true, op2_false))
> > - {
> > -   r.union_ (r1);
> > -   res = true;
> > - }
> > -  else
> > - res = false;
> > -  if (idx)
> > - tracer.trailer (idx, "logical_combine", res, NULL_TREE, r);
> > + r.union_ (r1);
> >   }
> >
> > switch (code)
>
From e7fb4679528d190463cf3d0d4a4daa7333b5a553 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Tue, 11 Oct 2022 16:00:33 +0200
Subject: [PATCH] Avoid calling tracer.trailer() twice.

gcc/ChangeLog:

* gimple-range-gori.cc (gori_compute::logical_combine): Avoid
calling tracer.trailer().
---
 gcc/gimple-range-gori.cc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index b37d03cddda..5ff067cadb5 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -810,8 +810,12 @@ gori_compute::logical_combine (vrange &r, enum tree_code code,
 	}
   else
 	res = false;
-  if (idx)
-	tracer.trailer (idx, "logical_combine", res, NULL_TREE, r);
+  if (idx && res)
+	{
+	  tracer.print (idx, "logical_combine produced ");
+	  r.dump (dump_file);
+	  fputc ('\n', dump_file);
+	}
 }
 
   switch (code)
-- 
2.37.3



Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 11, 2022 at 03:22:02PM +0200, Tobias Burnus wrote:
> Hi Jakub,
> 
> On 11.10.22 14:24, Jakub Jelinek wrote:
> 
> There is another issue besides what I wrote in my last review,
> and I'm afraid I don't know what to do about it, hoping Tobias
> has some ideas.
> The problem is that without the allocate-stmt associated allocate directive,
> Fortran allocatables are easily always allocated with malloc and freed with
> free.  The deallocation can be implicit through reallocation, or explicit
> deallocate statement etc.
> ...
> But when some allocatables are now allocated with a different
> allocator (when allocate-stmt associated allocate directive is used),
> some allocatables are allocated with malloc and others with GOMP_alloc
> but we need to free them with the corresponding allocator based on how
> they were allocated, what has been allocated with malloc should be
> deallocated with free, what has been allocated with GOMP_alloc should be
> deallocated with GOMP_free.
> 
> 
> 
> I think the most common case is:
> 
> integer, allocatable :: var(:)
> !$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
> ...
> ! optionally: deallocate(var)
> end ! of scope: block/subroutine/... - automatic deallocation

So you talk here about the declarative directive the patch does sorry on,
or about the executable one above allocate stmt?

Anyway, even this simple case has the problem that one can have
subroutine foo (var)
  integer, allocatable:: var(:)
  var = [1, 2, 3] ! reallocate
end subroutine
and call foo (var) above.

> Those can be easily handled. It gets more complicated with control flow:
> 
> if (...) then
>  !$omp allocators allocator(...)
>  allocate(...)
> else
>  allocate (...)
> endif
> 
> 
> 
> However, the problem is really that there is is no mandatory
> '!$omp deallocators' and also the wording like:
> 
> "If any operation of the base language causes a reallocation of
> an array that is allocated with a memory allocator then that
> memory allocator will be used to release the current memory
> and to allocate the new memory." (OpenMP 5.0 wording)
> 
> There has been some attempt to relax the rules a bit, e.g. by
> adding the wording:
> "For allocated allocatable components of such variables, the allocator that
> will be used for the deallocation and allocation is unspecified."
> 
> And some wording change (→issues 3189) to clarify related component issues.
> 
> But nonetheless, there is still the issue of:
> 
> (a) explicit DEALLOCATE in some other translation unit
> (b) some intrinsic operation which reallocate the memory, either via libgomp
> or in the source code:
>  a = [1,2,3]  ! possibly reallocates
>  str = trim(str) ! possibly reallocates
> where the first one calls 'realloc' directly in the code and the second one
> calls 'libgomp' for that.
> 
> * * *
> 
> I don't see a good solution – and there is in principle the same issue with
> unified-shared memory (USM) on hardware that does not support transparently
> accessing all host memory on the device.
> 
> Compilers support this case by allocating memory in some special memory,
> which is either accessible from both sides ('pinned') or migrates on the
> first access from the device side - but remains there until the accessing
> device kernel ends ('managed memory').
> 
> Newer hardware (+ associated Linux kernel support) permit accessing all
> memory in a somewhat fast way, avoiding this issue (and special handling
> is then left to the user.) For AMDGCN, my understanding is that all hardware
> supported by GCC supports this - but glacial speed until the last hardware
> architectures. For Nvidia, this is supported since Pascal (I think for Titan 
> X,
> P100, i.e. sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.
> 
> I mention this because the USM implementation at
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
> suffers from this.
> And https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
> tries to solve the the 'trim' example issue above - i.e. the case where
> libgomp reallocates pinned/managed (pseudo-)USM memory.
> 
> * * *
> 
> The deallocation can be done in a completely different TU from where it has
> been allocated, in theory it could be also not compiled with -fopenmp, etc.
> So, I'm afraid we need to store somewhere whether we used malloc or
> GOMP_alloc for the allocation (say somewhere in the array descriptor and for
> other stuff somewhere on the side?) and slow down all code that needs
> deallocation to check that bit (or say we don't support
> deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
> on the deallocation TU and only slow down -fopenmp compiled code)?
> 
> The problem with storing is that gfortran inserts the malloc/realloc/free 
> calls directly, i.e. without library preloading, intercepting those libcalls, 
> I do not see how it can work at all.

Well, it can use a weak symbol, if not linked against libgomp, the bit
that it is OpenMP shouldn't be set and so realloc/free will be used.
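Roughly (a sketch with made-up helper names; the GOMP_free declaration
is simplified here):

  #include <cstdlib>

  // Resolves to a null address when libgomp is not linked in.
  extern "C" void GOMP_free (void *, int) __attribute__ ((weak));

  extern "C" void
  fortran_free_helper (void *p, bool gomp_alloced)  // hypothetical
  {
    if (gomp_alloced && GOMP_free)
      GOMP_free (p, 0);
    else
      std::free (p);
  }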

Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-10-11 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 11, 2022 at 04:15:25PM +0200, Jakub Jelinek wrote:
> Well, it can use a weak symbol, if not linked against libgomp, the bit
> that it is OpenMP shouldn't be set and so realloc/free will be used
> and do
>   if (arrdescr.gomp_alloced_bit)
> GOMP_free (arrdescr.data, 0);
>   else
> free (arrdescr.data);
> and similar.  And I think we can just document that we do this only for
> -fopenmp compiled code.
> But do we have a place to store that bit?  I presume in array descriptors
> there could be some bit for it, but what to do about scalar allocatables,
> or allocatable components etc.?
> In theory we could use ugly stuff like if all the allocations would be
> guaranteed to have at least 2 byte alignment use LSB bit of the pointer
> to mark GOMP_alloc allocated memory for the scalar allocatables etc. but
> then would need in -fopenmp compiled code to strip it away.
> 
> As for pinned memory, if it is allocated through libgomp allocators, that
> should just work if GOMP_free/GOMP_realloc is used, that is why we have
> those extra data in front of the allocations where we store everything we
> need.  But those also make the OpenMP allocations incompatible with
> malloc/free allocations.
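
(To make the quoted scheme concrete, here is a minimal sketch of a
header-in-front allocator; the struct layout and names are invented for
illustration, over-alignment handling is omitted, and libgomp's real
bookkeeping is more involved.)

#include <stdlib.h>

struct alloc_header
{
  void *base;            /* pointer the underlying malloc returned */
  unsigned long alloc;   /* stand-in for the allocator handle/properties */
};

static void *
sketch_omp_alloc (size_t size, unsigned long allocator)
{
  char *base = (char *) malloc (sizeof (struct alloc_header) + size);
  if (base == NULL)
    return NULL;
  struct alloc_header *hdr = (struct alloc_header *) base;
  hdr->base = base;
  hdr->alloc = allocator;
  return base + sizeof (struct alloc_header);   /* user pointer */
}

static void
sketch_omp_free (void *ptr)
{
  if (ptr == NULL)
    return;
  struct alloc_header *hdr = (struct alloc_header *) ptr - 1;
  /* Allocator-specific teardown (unpin etc.) would inspect hdr->alloc;
     note that plain free (ptr) on the user pointer would be wrong.  */
  free (hdr->base);
}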

Yet another option would be to change the way our OpenMP allocators work,
instead of having allocation internal data before the allocated memory
have them somewhere on the side and use some data structures mapping
ranges of virtual memory to the allocation data.
We'd either need to use mmap to have better control on where exactly
we allocate stuff so that the on the side data structures wouldn't need
to be for every allocation, or do those for every allocation perhaps with
merging of adjacent allocations or something similar.
Disadvantage is that it would be slower and might need more locking etc.;
advantage is that it could then be malloc/free compatible: any untracked
address would be forwarded from GOMP_free to free etc.  And we'd not waste
e.g. precious pinned memory, especially when doing allocations with very
high alignment, where the data before the allocation means we can waste up to
max (32, alignment - 1) bytes of extra memory.  And gfortran
inline emitted reallocation/deallocation could just emit GOMP_realloc/free
always for -fopenmp.  The way GOMP_ allocators are currently written, it is
our internal choice if we do it the current way or the on the side way or
some other way, but if we'd guarantee free compatibility we'd make it part
of the ABI.
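
(A rough sketch of that on-the-side variant, with a hypothetical table and
invented names, just to make the malloc/free-compatibility property
concrete; the locking reflects the cost mentioned above.)

#include <cstdlib>
#include <map>
#include <mutex>

struct alloc_data { unsigned long allocator; bool pinned; };

static std::map<void *, alloc_data> side_table;
static std::mutex side_table_lock;

static void
sketch_gomp_free (void *ptr)
{
  std::unique_lock<std::mutex> guard (side_table_lock);
  auto it = side_table.find (ptr);
  if (it == side_table.end ())
    {
      guard.unlock ();
      free (ptr);      /* untracked address: plain malloc/free path */
      return;
    }
  alloc_data data = it->second;
  side_table.erase (it);
  guard.unlock ();
  /* ... unpin / release according to data, then free the block ... */
  free (ptr);
}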

CCing DJ and Carlos if they have thoughts about this.
The OpenMP spec essentially requires that allocations through its allocator
remember somewhere with which allocator (and its exact properties) each
allocation has been done, so that it can be taken into account during
reallocation or freeing.

Jakub



Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-10-11 Thread Tobias Burnus

On 11.10.22 16:15, Jakub Jelinek wrote:

I think the most common case is:

integer, allocatable :: var(:)
!$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
...
! optionally: deallocate(var)
end ! of scope: block/subroutine/... - automatic deallocation



So you talk here about the declarative directive the patch does sorry on,
or about the executable one above allocate stmt?

Here, I was only talking about the most common usage case, with the
assumption that the user code does not cause any reallocation.

I later talked about accepting only code which cannot cause
reallocation (compile-time check of the code contained in the
scope).

Thus, a 'call foo(a)' would be fine, but not for ...


Anyway, even this simple case has the problem that one can have
subroutine foo (var)
 integer, allocatable :: var(:)

i.e. a 'foo' that has the 'allocatable' attribute on its dummy argument.
I think in the common case it does not – such that most code can run w/o
running into this issue.

However, for code like
 type t
   real, allocatable :: x(:), y(:), z(:)
 end type t
 type(t) :: var
 !$omp allocators(my_alloc)
 allocate(var%x(N), var%y(N), var%z(N))

 call bar(var%x)
 call foo(var)

it is more difficult: 'bar' works (if its dummy argument is not 'allocatable'),
but for 'foo' the (re|de)allocation cannot be ruled out.
Thus, we always have to 'sorry' for such code – and I fear it could be
somewhat common.



Well, it can use a weak symbol, if not linked against libgomp, the bit
that it is OpenMP shouldn't be set and so realloc/free will be used
and do
 if (arrdescr.gomp_alloced_bit)
   GOMP_free (arrdescr.data, 0);
 else
   free (arrdescr.data);
and similar.  And I think we can just document that we do this only for
-fopenmp compiled code.
But do we have a place to store that bit?

I presume in array descriptors
there could be some bit for it, but what to do about scalar allocatables,
or allocatable components etc.?

As mentioned, we could use the 'dtype.attribute' field, which is currently not 
really used – and where it is used, only 2 of the 16 bits are taken. But you are 
right that for scalar allocatables, we do not use array descriptors (except with 
BIND(C)). Hmm.

For allocatable components, the same applies: for arrays, there is an array 
descriptor – for scalars, there isn't. (And storing the length of a scalar 
character string with deferred length uses an aux variable + has lots of bugs.)

In theory we could use ugly stuff like if all the allocations would be
guaranteed to have at least 2 byte alignment use LSB bit of the pointer
to mark GOMP_alloc allocated memory for the scalar allocatables etc. but
then would need in -fopenmp compiled code to strip it away.

I think we could do tricks with scalar allocatable variables – but it will be 
more complicated with scalar allocatable components. Hmm.
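
(Spelled out, the tagging trick could look roughly like this; purely
illustrative, with invented helper names, assuming at least 2-byte
alignment and the GOMP_free (ptr, 0) entry point used in the quoted
snippet.)

#include <stdint.h>
#include <stdlib.h>

/* Weak reference: stays unresolved when not linked against libgomp,
   in which case the tag bit is never set either.  */
extern "C" void GOMP_free (void *, uintptr_t) __attribute__ ((weak));

static inline void *tag_gomp (void *p)
{ return (void *) ((uintptr_t) p | 1); }

static inline bool tagged_gomp_p (void *p)
{ return ((uintptr_t) p & 1) != 0; }

static inline void *untag (void *p)
{ return (void *) ((uintptr_t) p & ~(uintptr_t) 1); }

static void
dealloc_scalar (void *p)
{
  if (tagged_gomp_p (p))
    GOMP_free (untag (p), 0);   /* strip the tag before use */
  else
    free (p);
}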

As for pinned memory, if it is allocated through libgomp allocators, that
should just work if GOMP_free/GOMP_realloc is used, that is why we have
those extra data in front of the allocations where we store everything we
need.  But those also make the OpenMP allocations incompatible with
malloc/free allocations.


The problem of making pseudo-USM work is that it has to be applied to all 
(stack, heap) memory – which implies that all code using malloc/free needs to 
either call the GOMP version or the GLIBC version, but must not mix the two. 
– Thus, calling some library or any other file that was not compiled 
with -f... will have issues with malloc/free. Another issue is that variables 
not allocated via GOMP_* will not be accessible on the device in that case.

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] Avoid calling tracer.trailer() twice.

2022-10-11 Thread Andrew MacLeod via Gcc-patches

perfect.

On 10/11/22 10:08, Aldy Hernandez wrote:

Sure.

OK?
Aldy

On Tue, Oct 11, 2022 at 3:11 PM Andrew MacLeod  wrote:

It probably should just be changed to a print if it doesn't return..
something like

if (idx && res)
  {
    tracer.print (idx, "logical_combine produced");
    r.dump (dump_file);
    fputc ('\n', dump_file);
  }

Andrew

On 10/10/22 14:58, Aldy Hernandez wrote:

[Andrew, you OK with this?  I can't tell whether the trailer() call was
actually needed.]

logical_combine is calling tracer.trailer() one too many times, causing
the second trailer() call to subtract 2 from an indent of 0, yielding an
indent of SOMETHING_REALLY_BIG :).  You'd be surprised how many tools
can't handle incredibly long lines.

gcc/ChangeLog:

   * gimple-range-gori.cc (gori_compute::logical_combine): Avoid
   calling tracer.trailer().
---
   gcc/gimple-range-gori.cc | 10 +-
   1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index b37d03cddda..469382aa477 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -798,20 +798,12 @@ gori_compute::logical_combine (vrange &r, enum tree_code code,
 // would be lost.
 if (!range_is_either_true_or_false (lhs))
   {
-  bool res;
 Value_Range r1 (r);
 if (logical_combine (r1, code, m_bool_zero, op1_true, op1_false,
  op2_true, op2_false)
 && logical_combine (r, code, m_bool_one, op1_true, op1_false,
 op2_true, op2_false))
- {
-   r.union_ (r1);
-   res = true;
- }
-  else
- res = false;
-  if (idx)
- tracer.trailer (idx, "logical_combine", res, NULL_TREE, r);
+ r.union_ (r1);
   }

 switch (code)




Re: [PATCH] c++ modules: lazy loading from within template [PR99377]

2022-10-11 Thread Patrick Palka via Gcc-patches
On Mon, 10 Oct 2022, Nathan Sidwell wrote:

> On 10/4/22 13:36, Patrick Palka wrote:
> > Here when lazily loading the binding for f at parse time from the
> > template g, processing_template_decl is set and thus the call to
> > note_vague_linkage_fn from module_state::read_cluster has no effect,
> > and we never push f onto deferred_fns and end up never emitting its
> > definition.
> > 
> > ISTM the behavior of the lazy loading machinery shouldn't be sensitive
> > to whether we're inside a template, and therefore we should probably be
> clearing processing_template_decl somewhere, e.g. in lazy_load_binding.
> > This is sufficient to fix the testcase.
> 
> yeah, I remember hitting issues with this, but thought I'd got rid of the need
> to override processing_template_decl.  Do you also need to override it in
> lazy_load_pendings though? that's a lazy loader and my suspicion is it might
> be susceptible to the same issues.

Hmm yeah, looks like we need to override it in lazy_load_pendings too:
I ran the testsuite with gcc_assert (!processing_template_decl) added to
module_state::read_cluster, and if we don't also override it in
lazy_load_pendings then the assert triggers for pr99425-2_b.X.

> 
> > 
> > But it also seems the processing_template_decl test in
> > note_vague_linkage_fn, added by r8-7539-g977bc3ee11383e for PR84973, is
> > perhaps too strong: if the intent is to avoid deferring output for
> > uninstantiated templates, we should make sure that DECL in question is
> > an uninstantiated template by checking e.g. value_dependent_expression_p.
> > This too is sufficient to fix the testcase (since f isn't a template)
> > and survives bootstrap and regtest.
> 
> I think this is an orthogonal issue -- can we remove it from this patch?

Done.

-- >8 --

Subject: [PATCH] c++ modules: lazy loading from within template [PR99377]

Here when lazily loading the binding for f at parse time from the
template g, processing_template_decl is set and thus the call to
note_vague_linkage_fn from module_state::read_cluster has no effect,
and we never push f onto deferred_fns and end up never emitting its
definition.

ISTM the behavior of the lazy loading machinery shouldn't be sensitive
to whether we're inside a template, and therefore we should be clearing
processing_template_decl in the entrypoints lazy_load_binding and
lazy_load_pendings.

PR c++/99377

gcc/cp/ChangeLog:

* module.cc (lazy_load_binding): Clear processing_template_decl.
(lazy_load_pendings): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99377-2_a.C: New test.
* g++.dg/modules/pr99377-2_b.C: New test.
---
 gcc/cp/module.cc   | 8 
 gcc/testsuite/g++.dg/modules/pr99377-2_a.C | 5 +
 gcc/testsuite/g++.dg/modules/pr99377-2_b.C | 6 ++
 3 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 94fbee85225..7c48602136c 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19081,6 +19081,10 @@ lazy_load_binding (unsigned mod, tree ns, tree id, binding_slot *mslot)
 
   timevar_start (TV_MODULE_IMPORT);
 
+  /* Make sure lazy loading from a template context behaves as if
+ from a non-template context.  */
+  processing_template_decl_sentinel ptds;
+
   /* Stop GC happening, even in outermost loads (because our caller
  could well be building up a lookup set).  */
   function_depth++;
@@ -19129,6 +19133,10 @@ lazy_load_binding (unsigned mod, tree ns, tree id, binding_slot *mslot)
 void
 lazy_load_pendings (tree decl)
 {
+  /* Make sure lazy loading from a template context behaves as if
+ from a non-template context.  */
+  processing_template_decl_sentinel ptds;
+
   tree key_decl;
   pending_key key;
   key.ns = find_pending_key (decl, &key_decl);
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-2_a.C b/gcc/testsuite/g++.dg/modules/pr99377-2_a.C
new file mode 100644
index 000..26e2bccbbbe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-2_a.C
@@ -0,0 +1,5 @@
+// PR c++/99377
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr99377 }
+export module pr99377;
+export inline void f() { }
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-2_b.C b/gcc/testsuite/g++.dg/modules/pr99377-2_b.C
new file mode 100644
index 000..69571952c8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-2_b.C
@@ -0,0 +1,6 @@
+// PR c++/99377
+// { dg-additional-options -fmodules-ts }
+// { dg-do link }
+import pr99377;
+template<typename T> void g() { f(); }
+int main() { g<int>(); }
-- 
2.38.0.15.gbbe21b64a0



Re: [PATCH v3] libstdc++: fix pointer type exception catch [PR105387]

2022-10-11 Thread Jakob Hasse via Gcc-patches
Hello, is there any update regarding the patch PR105387 for bug 105387? We've 
been waiting for some time now, but the bugzilla bug is still open: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105387. If there is any issue with 
the patch (besides the ones we discussed before), please let us know. If 
there's no chance to integrate that patch, we would also like to know, to make 
decisions on how to handle the patch internally.

Thanks, and All the Best,
Jakob Hasse

From: Jonathan Wakely 
Sent: Wednesday, May 25, 2022 5:18 PM
To: Jakob Hasse 
Cc: libstd...@gcc.gnu.org ; gcc-patches 
; Anton Maklakov ; Ivan 
Grokhotkov 
Subject: Re: [PATCH v3] libstdc++: fix pointer type exception catch [PR105387]


On Wed, 25 May 2022 at 03:30, Jakob Hasse via Libstdc++
 wrote:
>
> Hello,
>
> two weeks ago I submitted the second version of the patch PR105387 for the 
> bug 105387. Now I added a pointer-to-member exception test just to make sure 
> that it doesn't break in case RTTI is enabled. The test is disabled if RTTI 
> is disabled. I didn't receive any feedback so far regarding the second 
> version of the patch. Is there any issue preventing acceptance?

Just a lack of time to review it properly.

It's on my list.


>
> I ran the conformance tests on libstdc++v3 by running
> make -j 18 check RUNTESTFLAGS=conformance.exp
>
> Results for the current version (only difference is the added 
> pointer-to-member test):
>
> Without RTTI before applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14560
> # of unexpected failures 5
> # of expected failures 95
> # of unsupported tests 702
>
> Without RTTI after applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14562
> # of unexpected failures 5
> # of expected failures 95
> # of unsupported tests 703
>
> With RTTI before applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14598
> # of unexpected failures 2
> # of expected failures 95
> # of unsupported tests 683
>
> With RTTI after applying patch:
> === libstdc++ Summary ===
>
> # of expected passes 14602
> # of unexpected failures 2
> # of expected failures 95
> # of unsupported tests 683
>
> Given that the pointer-to-member test is disabled when RTTI is disabled, the 
> results look logical to me.
>


[PATCH] c++ modules: ICE with templated friend and std namespace [PR100134]

2022-10-11 Thread Patrick Palka via Gcc-patches
IIUC the function depset::hash::add_binding_entity has an assert
verifying that if a namespace contains an exported entity, then
the namespace must have been opened in the module purview:

  if (data->hash->add_namespace_entities (decl, data->partitions))
{
  /* It contains an exported thing, so it is exported.  */
  gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl));
  DECL_MODULE_EXPORT_P (decl) = true;
}

We're tripping over this assert in the below testcase because by
instantiating and exporting std::A, we end up in turn defining
and exporting the hidden friend std::f without ever having opened
the enclosing namespace std within the module purview and thus
DECL_MODULE_PURVIEW_P (std_node) is false.

Note that it's important that the enclosing namespace is std here: if we
use a different namespace then the ICE disappears.  This probably has
something to do with the fact that we predefine std via push_namespace
from cxx_init_decl_processing (which makes it look like we've opened the
namespace in the TU), whereas with another namespace we would instead
lazily obtain the NAMESPACE_DECL from add_imported_namespace.

Since templated friend functions are special in that they give us a way
to declare a new namespace-scope function without having to explicitly
open the namespace, this patch proposes to fix this issue by propagating
DECL_MODULE_PURVIEW_P from a friend function to the enclosing namespace
when instantiating the friend.

Tested on x86_64-pc-linux-gnu, does this look like the right fix?  Other
solutions that seem to work are to set DECL_MODULE_PURVIEW_P on std_node
after the fact from declare_module, or simply to suppress the assert for
std_node.

PR c++/100134

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Propagate DECL_MODULE_PURVIEW_P
from the new declaration to the enclosing namespace scope.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-8_a.H: New test.
* g++.dg/modules/tpl-friend-8_b.C: New test.
---
 gcc/cp/pt.cc  | 7 +++
 gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H | 9 +
 gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C | 8 
 3 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5b9fc588a21..9e3085f3fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11448,6 +11448,13 @@ tsubst_friend_function (tree decl, tree args)
 by duplicate_decls.  */
  new_friend = old_decl;
}
+
+  /* We've just added a new namespace-scope entity to the purview without
+necessarily having opened the enclosing namespace, so make sure the
+enclosing namespace is in the purview now too.  */
+  if (TREE_CODE (DECL_CONTEXT (new_friend)) == NAMESPACE_DECL)
+   DECL_MODULE_PURVIEW_P (DECL_CONTEXT (new_friend))
+ |= DECL_MODULE_PURVIEW_P (STRIP_TEMPLATE (new_friend));
 }
   else
 {
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
new file mode 100644
index 000..bd2290460b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
@@ -0,0 +1,9 @@
+// PR c++/100134
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+namespace std {
+  template<typename T> struct A {
+    friend void f(A) { }
+  };
+}
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
new file mode 100644
index 000..76d7447c2eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
@@ -0,0 +1,8 @@
+// PR c++/100134
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr100134 }
+export module pr100134;
+
+import "tpl-friend-8_a.H";
+
+export std::A<int> a;
-- 
2.38.0.15.gbbe21b64a0



[PATCH] testsuite: Only run -fcf-protection test on i?86/x86_64 [PR107213]

2022-10-11 Thread Marek Polacek via Gcc-patches
This test fails on non-i?86/x86_64 targets because on those targets
we get

  error: '-fcf-protection=full' is not supported for this target

so this patch limits where the test is run.

Tested on x86_64-pc-linux-gnu, ok for trunk?

gcc/testsuite/ChangeLog:

* c-c++-common/pointer-to-fn1.c: Only run on i?86/x86_64.
---
 gcc/testsuite/c-c++-common/pointer-to-fn1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/pointer-to-fn1.c b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
index 975885462e9..e2f948d824a 100644
--- a/gcc/testsuite/c-c++-common/pointer-to-fn1.c
+++ b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
@@ -1,4 +1,5 @@
 /* PR c++/106937 */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-fcf-protection" } */
 /* { dg-additional-options "-std=c++11 -fpermissive" { target c++ } } */
 /* Test printing a pointer to function with attribute.  */

base-commit: 637e3668fdc17c4e226538fb14f9fab225433d01
-- 
2.37.3



Re: [PATCH] c++ modules: lazy loading from within template [PR99377]

2022-10-11 Thread Nathan Sidwell via Gcc-patches

On 10/11/22 10:58, Patrick Palka wrote:

On Mon, 10 Oct 2022, Nathan Sidwell wrote:


On 10/4/22 13:36, Patrick Palka wrote:

Here when lazily loading the binding for f at parse time from the
template g, processing_template_decl is set and thus the call to
note_vague_linkage_fn from module_state::read_cluster has no effect,
and we never push f onto deferred_fns and end up never emitting its
definition.

ISTM the behavior of the lazy loading machinery shouldn't be sensitive
to whether we're inside a template, and therefore we should probably be
clearing processing_template_decl somewhere, e.g. in lazy_load_binding.
This is sufficient to fix the testcase.


yeah, I remember hitting issues with this, but thought I'd got rid of the need
to override processing_template_decl.  Do you also need to override it in
lazy_load_pendings though? that's a lazy loader and my suspicion is it might
be susceptible to the same issues.


Hmm yeah, looks like we need to override it in lazy_load_pendings too:
I ran the testsuite with gcc_assert (!processing_template_decl) added to
module_state::read_cluster, and if we don't also override it in
lazy_load_pendings then the assert triggers for pr99425-2_b.X.





But it also seems the processing_template_decl test in
note_vague_linkage_fn, added by r8-7539-g977bc3ee11383e for PR84973, is
perhaps too strong: if the intent is to avoid deferring output for
uninstantiated templates, we should make sure that DECL in question is
an uninstantiated template by checking e.g. value_dependent_expression_p.
This too is sufficient to fix the testcase (since f isn't a template)
and survives bootstrap and regtest.


I think this is an orthogonal issue -- can we remove it from this patch?


Done.

-- >8 --

Subject: [PATCH] c++ modules: lazy loading from within template [PR99377]

Here when lazily loading the binding for f at parse time from the
template g, processing_template_decl is set and thus the call to
note_vague_linkage_fn from module_state::read_cluster has no effect,
and we never push f onto deferred_fns and end up never emitting its
definition.

ISTM the behavior of the lazy loading machinery shouldn't be sensitive
to whether we're inside a template, and therefore we should be clearing
processing_template_decl in the entrypoints lazy_load_binding and
lazy_load_pendings.

PR c++/99377

gcc/cp/ChangeLog:

* module.cc (lazy_load_binding): Clear processing_template_decl.
(lazy_load_pendings): Likewise.


ok, thanks



gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99377-2_a.C: New test.
* g++.dg/modules/pr99377-2_b.C: New test.
---
  gcc/cp/module.cc   | 8 
  gcc/testsuite/g++.dg/modules/pr99377-2_a.C | 5 +
  gcc/testsuite/g++.dg/modules/pr99377-2_b.C | 6 ++
  3 files changed, 19 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-2_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/pr99377-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 94fbee85225..7c48602136c 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19081,6 +19081,10 @@ lazy_load_binding (unsigned mod, tree ns, tree id, binding_slot *mslot)
  
timevar_start (TV_MODULE_IMPORT);
  
+  /* Make sure lazy loading from a template context behaves as if
+ from a non-template context.  */
+  processing_template_decl_sentinel ptds;
+
/* Stop GC happening, even in outermost loads (because our caller
   could well be building up a lookup set).  */
function_depth++;
@@ -19129,6 +19133,10 @@ lazy_load_binding (unsigned mod, tree ns, tree id, binding_slot *mslot)
  void
  lazy_load_pendings (tree decl)
  {
+  /* Make sure lazy loading from a template context behaves as if
+ from a non-template context.  */
+  processing_template_decl_sentinel ptds;
+
tree key_decl;
pending_key key;
key.ns = find_pending_key (decl, &key_decl);
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-2_a.C b/gcc/testsuite/g++.dg/modules/pr99377-2_a.C
new file mode 100644
index 000..26e2bccbbbe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-2_a.C
@@ -0,0 +1,5 @@
+// PR c++/99377
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr99377 }
+export module pr99377;
+export inline void f() { }
diff --git a/gcc/testsuite/g++.dg/modules/pr99377-2_b.C b/gcc/testsuite/g++.dg/modules/pr99377-2_b.C
new file mode 100644
index 000..69571952c8a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99377-2_b.C
@@ -0,0 +1,6 @@
+// PR c++/99377
+// { dg-additional-options -fmodules-ts }
+// { dg-do link }
+import pr99377;
+template<typename T> void g() { f(); }
+int main() { g<int>(); }


--
Nathan Sidwell



Re: [PATCH] Use cxx11 abi in versioned namespace

2022-10-11 Thread François Dumont via Gcc-patches

Hi

    Now that the pretty printer is fixed (once that patch is validated) I'd like to 
propose this patch again.


    Note that I'm adding a check on the pretty printer with a std::any on 
a std::wstring. I did so because of the FIXME in printers.py which is 
dealing with 'std::string' explicitly. Looks like in my case, where 
there is no 'std::string' but just a 'std::__8::string', we do not need 
the workaround.


    Once again I am attaching also the versioned namespace bump patch, as 
I think that adopting the cxx11 abi in this mode is a good enough reason 
to bump it. If you agree, let me know if I should squash the commits 
before pushing.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi

    Use cxx11 abi when activating versioned namespace mode.

    libstdcxx-v3/ChangeLog:

    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: Default 
to "new" libstdcxx abi.
    * config/locale/dragonfly/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Define money_base

    members.
    * config/locale/generic/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.
    * config/locale/gnu/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.

    * config/locale/gnu/numeric_members.cc
    [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
    [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11, 
_GLIBCXX_BEGIN_NAMESPACE_CXX11): Define

    empty.
[_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11, 
_GLIBCXX_DEFAULT_ABI_TAG): Likewise.

    * python/libstdcxx/v6/printers.py
    (StdStringPrinter::__init__): Set self.new_string to True 
when std::__8::basic_string type is

    found.
    * src/Makefile.am 
[ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define empty.

    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this, new. Also move several sources 
to...

    (sources): ...this.
    (extra_string_inst_sources): Move several sources to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-stdexcept.cc 
[_GLIBCXX_USE_CXX11_ABI](error_category::_M_message):

    Skip definition.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transaction Memory TS 
definitions.
    * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-wstring-io-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cxx11-ios_failure.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.

    [!_GLIBCXX_USE_DUAL_ABI] (__ios_failure): Remove.
    * src/c++11/cxx11-locale-inst.cc: Cleanup, just include 
locale-inst.cc.
    * src/c++11/cxx11-stdexcept.cc [!_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.

    [!_GLIBCXX_USE_DUAL_ABI](__cow_string): Remove.
    * src/c++11/cxx11-wlocale-inst.cc 
[!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/fstream-inst.cc [!_GLIBCXX_USE_CXX11_ABI]: Skip 
definitions

    * src/c++11/locale-inst-numeric.h
[!_GLIBCXX_USE_DUAL_ABI](std::use_facet>, 
std::use_facet>): Instantiate.
[!_GLIBCXX_USE_DUAL_ABI](std::has_facet>, 
std::has_facet>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_getistreambuf_iterator>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](std::num_putostreambuf_iterator>): Instantiate.
    * src/c++11/locale-inst.cc [!_GLIBCXX_USE_DUAL_ABI]: Build 
only when configured

    _GLIBCXX_USE_CXX11_ABI is equal to currently built abi.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__moneypunct_cache): 
Instantiate.

    [!_GLIBCXX_USE_DUAL_ABI](__numpunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__timepunct_cache): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](time_putostreambuf_iterator>): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](time_put_bynameostreambuf_iterator>): Instantiate.

[!_GLIBCXX_USE_DUAL_ABI](__ctype_abstract_base): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](ctype_byname): Instantiate.
    [!_GLIBCXX_USE_DUAL_ABI](__codecvt_abstract_basembstate_t>)

Re: [PATCH] c++ modules: ICE with templated friend and std namespace [PR100134]

2022-10-11 Thread Nathan Sidwell via Gcc-patches

On 10/11/22 11:35, Patrick Palka wrote:

IIUC the function depset::hash::add_binding_entity has an assert
verifying that if a namespace contains an exported entity, then
the namespace must have been opened in the module purview:

   if (data->hash->add_namespace_entities (decl, data->partitions))
 {
   /* It contains an exported thing, so it is exported.  */
   gcc_checking_assert (DECL_MODULE_PURVIEW_P (decl));
   DECL_MODULE_EXPORT_P (decl) = true;
 }

We're tripping over this assert in the below testcase because by
instantiating and exporting std::A, we end up in turn defining
and exporting the hidden friend std::f without ever having opened
the enclosing namespace std within the module purview and thus
DECL_MODULE_PURVIEW_P (std_node) is false.

Note that it's important that the enclosing namespace is std here: if we
use a different namespace then the ICE disappears.  This probably has
something to do with the fact that we predefine std via push_namespace
from cxx_init_decl_processing (which makes it look like we've opened the
namespace in the TU), whereas with another namespace we would instead
lazily obtain the NAMESPACE_DECL from add_imported_namespace.

Since templated friend functions are special in that they give us a way
to declare a new namespace-scope function without having to explicitly
open the namespace, this patch proposes to fix this issue by propagating
DECL_MODULE_PURVIEW_P from a friend function to the enclosing namespace
when instantiating the friend.


ouch.  This is ok, but I think we have a bug -- what is the module 
ownership of the friend introduced by the instantiation?


Haha, there's a note on 13.7.5/3 -- the attachment is to the same module 
as the befriending class.


That means we end up creating and writing out entities that exist in the 
symbol table (albeit hidden) whose module ownership is neither the 
global module nor the TU's module.  That's not something the module 
machinery anticipates. We'll get the mangling wrong for starters. Hmm.


These are probably rare.  Thinking about the right solution though ...

nathan




Tested on x86_64-pc-linux-gnu, does this look like the right fix?  Other
solutions that seem to work are to set DECL_MODULE_PURVIEW_P on std_node
after the fact from declare_module, or simply to suppress the assert for
std_node.

PR c++/100134

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_function): Propagate DECL_MODULE_PURVIEW_P
from the new declaration to the enclosing namespace scope.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-8_a.H: New test.
* g++.dg/modules/tpl-friend-8_b.C: New test.
---
  gcc/cp/pt.cc  | 7 +++
  gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H | 9 +
  gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C | 8 
  3 files changed, 24 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 5b9fc588a21..9e3085f3fa6 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11448,6 +11448,13 @@ tsubst_friend_function (tree decl, tree args)
 by duplicate_decls.  */
  new_friend = old_decl;
}
+
+  /* We've just added a new namespace-scope entity to the purview without
+necessarily having opened the enclosing namespace, so make sure the
+enclosing namespace is in the purview now too.  */
+  if (TREE_CODE (DECL_CONTEXT (new_friend)) == NAMESPACE_DECL)
+   DECL_MODULE_PURVIEW_P (DECL_CONTEXT (new_friend))
+ |= DECL_MODULE_PURVIEW_P (STRIP_TEMPLATE (new_friend));
  }
else
  {
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
new file mode 100644
index 000..bd2290460b5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_a.H
@@ -0,0 +1,9 @@
+// PR c++/100134
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+
+namespace std {
+  template<typename T> struct A {
+    friend void f(A) { }
+  };
+}
diff --git a/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
new file mode 100644
index 000..76d7447c2eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/tpl-friend-8_b.C
@@ -0,0 +1,8 @@
+// PR c++/100134
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi pr100134 }
+export module pr100134;
+
+import "tpl-friend-8_a.H";
+
+export std::A<int> a;


--
Nathan Sidwell



Re: [PATCH] middle-end IFN_ASSUME support [PR106654]

2022-10-11 Thread Andrew MacLeod via Gcc-patches



On 10/10/22 04:54, Jakub Jelinek via Gcc-patches wrote:

Hi!

My earlier patches gimplify the simplest non-side-effects assumptions
into if (cond) ; else __builtin_unreachable (); and throw the rest
on the floor.
The following patch attempts to do something with the rest too.
For -O0, it actually throws even the simplest assumptions on the floor;
we don't expect optimizations there, and the assumptions exist only to
allow optimizations.  Otherwise, it keeps the handling of the simplest
assumptions as is, and arranges for the remaining assumptions to be
visible in the IL as
   .ASSUME (_Z2f4i._assume.0, i_1(D));
call where there is an artificial function like:
bool _Z2f4i._assume.0 (int i)
{
   bool _2;

[local count: 1073741824]:
   _2 = i_1(D) == 43;
   return _2;

}
with the semantics that there is UB unless the assumption function
would return true.

Aldy, could ranger handle this?  If it sees .ASSUME call,
walk the body of such function from the edge(s) to exit with the
assumption that the function returns true, so above set _2 [true, true]
and from there derive that i_1(D) [43, 43] and then map the argument
in the assumption function to argument passed to IFN_ASSUME (note,
args there are shifted by 1)?



Ranger's GORI component could assume the return value is [1,1] and work 
backwards from there. Single basic blocks would be trivial. The problem 
arises when there are multiple blocks.  The GORI engine has no real 
restriction other than that it works within a single basic block only.


I see no reason we couldn't wire something up that continues propagating 
values out the top of the block, evaluating things for more complicated 
cases.  You would end up with a set of ranges for names which are the 
"maximal" possible range based on the restriction that the return value 
is [1,1].




During gimplification it actually gimplifies it into
   D.2591 = .ASSUME ();
   if (D.2591 != 0) goto ; else goto ;
   :
   {
 i = i + 1;
 D.2591 = i == 44;
   }
   :
   .ASSUME (D.2591);
with the condition wrapped into a GIMPLE_BIND (I admit the above isn't
extra clean but it is just something to hold it from the gimplifier until
the gimple lowering pass; it resembles if (condition_never_true) { cond; };



What we really care about is what the SSA form looks like; that's what 
ranger will deal with.


Is this function inlined?  If it isn't then you'd need LTO/IPA to 
propagate the ranges we calculated above for the function. Or some 
special pass that reads assumes, does the processing you mention above 
and applies it?  Is that what you are thinking?


Looking at assume7.C, I see:

int bar (int x)
{
   [local count: 1073741824]:
  .ASSUME (_Z3bari._assume.0, x_1(D));
  return x_1(D);

And:

bool _Z3bari._assume.0 (int x)
{
  bool _2;

   [local count: 1073741824]:
  _2 = x_1(D) == 42;
  return _2;


Using the above approach, GORI could tell you that if _2 is [1,1] that 
x_1 must be [42,42].


If you are parsing that ASSUME, you could presumably match things up and 
we could make x_1 have a range of [42,42] in bar() at that call.
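
(For readers without the testcase at hand, the source behind that IL is
roughly the following; this is a reconstruction for illustration, not
the actual assume7.C.)

int bar (int x)
{
  [[assume (x == 42)]];   // gimplified to .ASSUME (_Z3bari._assume.0, x_1(D))
  return x;               // with x_1 known to be [42,42], foldable to 42
}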


this would require a bit of processing in fold_using_range for handling 
function calls, checking for this case and so on, but quite doable.


looking at the more complicated case for

bool _Z3bazi._assume.0 (int x)

it seems that the answer is determined without processing most of the 
function, i.e. work from the bottom up:


   [local count: 670631318]:
  _8 = x_3 == 43;   x_3 = [43,43]

   [local count: 1073741824]:
  # _1 = PHI <0(2), _8(5)>  _8 = [1,1]  2->6 can't happen
  return _1;    _1 = [1,1]

You only care about x, so as soon as you find a result like that, you'd 
actually be done.  However, I can imagine cases where you do need to go 
all the way back to the top of the assume function and combine values. E.g.


bool assume (int x, int y)
{
  if (y > 10)
    return x == 2;
  return x > 20;
}

   [local count: 1073741824]:
  if (y_2(D) > 10)
    goto ; [34.00%]
  else
    goto ; [66.00%]

   [local count: 365072224]:
  _5 = x_3(D) == 2;                    x_3 = [2,2]
  goto ; [100.00%]

   [local count: 708669601]:
  _4 = x_3(D) > 20;                    x_3 = [21, +INF]

   [local count: 1073741824]:
  # _1 = PHI <_5(3), _4(4)>      _5 = [1,1], _4 = [1,1]

  return _1;

And we'd have a range of [2,2][21, +INF].
If you wanted to be able to plug values of Y in, things would get more 
complicated, but the framework would all be there.



OK, enough pontificating.  It wouldn't be ranger, but yes, you could 
wire GORI into a pass which evaluates what we think the range of a set 
of inputs would be if we return 1.  I don't think it would be very 
difficult, just a bit of work to work the IL in reverse.  And then you 
should be able to wire in a range update for those parameters in 
fold_using_range (or wherever else I suppose) with a little more work.


It seems to me that if you were to "optimize" the function via this new 
pass,  assumi

Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread David Edelsohn via Gcc-patches
This patch seems to have broken bootstrap on AIX.  It seems to assume
methods that aren't guaranteed to be defined.

Thanks, David

libtool: compile:  /tmp/GCC/./gcc/xgcc -B/tmp/GCC/./gcc/
-B/nasfarm/edelsohn/ins
tall/GCC/powerpc-ibm-aix7.2.5.0/bin/
-B/nasfarm/edelsohn/install/GCC/powerpc-ibm
-aix7.2.5.0/lib/ -isystem
/nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/i
nclude -isystem
/nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/sys-include
 -fno-checking -DHAVE_CONFIG_H -I..
-I/nasfarm/edelsohn/src/src/libstdc++-v3/../
libiberty -I/nasfarm/edelsohn/src/src/libstdc++-v3/../include
-D_GLIBCXX_SHARED
-I/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/powerpc-ibm-aix7.2.5.0
-I
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include
-I/nasfarm/edelsohn/src/src
/libstdc++-v3/libsupc++ -I/nasfarm/edelsohn/install/include
-I/nasfarm/edelsohn/
install/include -g -O2 -DIN_GLIBCPP_V3 -Wno-error -c cp-demangle.c  -fPIC
-DPIC -o cp-demangle.o
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc: In member
function 'void* {anonymous}::pool::allocate(std::size_t)':
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:239:54: error:
no matching function for call to
'__gnu_cxx::__scoped_lock::__scoped_lock(int&)'
  239 |   __gnu_cxx::__scoped_lock sentry(emergency_mutex);
  |  ^
In file included from
/nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:37:
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:14:
note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(__mutex_type&)'
  240 | explicit __scoped_lock(__mutex_type& __name) : _M_device(__name)
  |  ^
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:42:
note:   no known conversion for argument 1 from 'int' to
'__gnu_cxx::__scoped_lock::__mutex_type&'
  240 | explicit __scoped_lock(__mutex_type& __name) : _M_device(__name)
  |~~^~
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:5:
note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(const
__gnu_cxx::__scoped_lock&)'
  236 | __scoped_lock(const __scoped_lock&);
  | ^
/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:19:
note:   no known conversion for argument 1 from 'int' to 'const
__gnu_cxx::__scoped_lock&'
  236 | __scoped_lock(const __scoped_lock&);
  |   ^~~~
make[5]: *** [Makefile:778: eh_alloc.lo] Error 1


[PATCH] Fortran: check types of source expressions before conversion [PR107215]

2022-10-11 Thread Harald Anlauf via Gcc-patches
Dear all,

this PR is an obvious followup to PR107000, where invalid
types appeared in array constructors and lead to an ICE
either in a conversion or reduction of a unary or binary
expression.

The present PR shows that several other conversions need to
be protected by a check of the type of the source expression.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 87dae7eb9d4cc76060d258ba99bc53f62c7130f2 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 11 Oct 2022 20:37:42 +0200
Subject: [PATCH] Fortran: check types of source expressions before conversion
 [PR107215]

gcc/fortran/ChangeLog:

	PR fortran/107215
	* arith.cc (gfc_int2int): Check validity of type of source expr.
	(gfc_int2real): Likewise.
	(gfc_int2complex): Likewise.
	(gfc_real2int): Likewise.
	(gfc_real2real): Likewise.
	(gfc_complex2int): Likewise.
	(gfc_complex2real): Likewise.
	(gfc_complex2complex): Likewise.
	(gfc_log2log): Likewise.
	(gfc_log2int): Likewise.
	(gfc_int2log): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/107215
	* gfortran.dg/pr107215.f90: New test.
---
 gcc/fortran/arith.cc   | 33 ++
 gcc/testsuite/gfortran.dg/pr107215.f90 | 17 +
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr107215.f90

diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
index 086b1f856b1..9e079e42995 100644
--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -2040,6 +2040,9 @@ gfc_int2int (gfc_expr *src, int kind)
   gfc_expr *result;
   arith rc;

+  if (src->ts.type != BT_INTEGER)
+return NULL;
+
   result = gfc_get_constant_expr (BT_INTEGER, kind, &src->where);

   mpz_set (result->value.integer, src->value.integer);
@@ -2085,6 +2088,9 @@ gfc_int2real (gfc_expr *src, int kind)
   gfc_expr *result;
   arith rc;

+  if (src->ts.type != BT_INTEGER)
+return NULL;
+
   result = gfc_get_constant_expr (BT_REAL, kind, &src->where);

   mpfr_set_z (result->value.real, src->value.integer, GFC_RND_MODE);
@@ -2116,6 +2122,9 @@ gfc_int2complex (gfc_expr *src, int kind)
   gfc_expr *result;
   arith rc;

+  if (src->ts.type != BT_INTEGER)
+return NULL;
+
   result = gfc_get_constant_expr (BT_COMPLEX, kind, &src->where);

   mpc_set_z (result->value.complex, src->value.integer, GFC_MPC_RND_MODE);
@@ -2150,6 +2159,9 @@ gfc_real2int (gfc_expr *src, int kind)
   arith rc;
   bool did_warn = false;

+  if (src->ts.type != BT_REAL)
+return NULL;
+
   result = gfc_get_constant_expr (BT_INTEGER, kind, &src->where);

   gfc_mpfr_to_mpz (result->value.integer, src->value.real, &src->where);
@@ -2196,6 +2208,9 @@ gfc_real2real (gfc_expr *src, int kind)
   arith rc;
   bool did_warn = false;

+  if (src->ts.type != BT_REAL)
+return NULL;
+
   result = gfc_get_constant_expr (BT_REAL, kind, &src->where);

   mpfr_set (result->value.real, src->value.real, GFC_RND_MODE);
@@ -2310,6 +2325,9 @@ gfc_complex2int (gfc_expr *src, int kind)
   arith rc;
   bool did_warn = false;

+  if (src->ts.type != BT_COMPLEX)
+return NULL;
+
   result = gfc_get_constant_expr (BT_INTEGER, kind, &src->where);

   gfc_mpfr_to_mpz (result->value.integer, mpc_realref (src->value.complex),
@@ -2372,6 +2390,9 @@ gfc_complex2real (gfc_expr *src, int kind)
   arith rc;
   bool did_warn = false;

+  if (src->ts.type != BT_COMPLEX)
+return NULL;
+
   result = gfc_get_constant_expr (BT_REAL, kind, &src->where);

   mpc_real (result->value.real, src->value.complex, GFC_RND_MODE);
@@ -2439,6 +2460,9 @@ gfc_complex2complex (gfc_expr *src, int kind)
   arith rc;
   bool did_warn = false;

+  if (src->ts.type != BT_COMPLEX)
+return NULL;
+
   result = gfc_get_constant_expr (BT_COMPLEX, kind, &src->where);

   mpc_set (result->value.complex, src->value.complex, GFC_MPC_RND_MODE);
@@ -2504,6 +2528,9 @@ gfc_log2log (gfc_expr *src, int kind)
 {
   gfc_expr *result;

+  if (src->ts.type != BT_LOGICAL)
+return NULL;
+
   result = gfc_get_constant_expr (BT_LOGICAL, kind, &src->where);
   result->value.logical = src->value.logical;

@@ -2518,6 +2545,9 @@ gfc_log2int (gfc_expr *src, int kind)
 {
   gfc_expr *result;

+  if (src->ts.type != BT_LOGICAL)
+return NULL;
+
   result = gfc_get_constant_expr (BT_INTEGER, kind, &src->where);
   mpz_set_si (result->value.integer, src->value.logical);

@@ -2532,6 +2562,9 @@ gfc_int2log (gfc_expr *src, int kind)
 {
   gfc_expr *result;

+  if (src->ts.type != BT_INTEGER)
+return NULL;
+
   result = gfc_get_constant_expr (BT_LOGICAL, kind, &src->where);
   result->value.logical = (mpz_cmp_si (src->value.integer, 0) != 0);

diff --git a/gcc/testsuite/gfortran.dg/pr107215.f90 b/gcc/testsuite/gfortran.dg/pr107215.f90
new file mode 100644
index 000..2c2a0ca7502
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr107215.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! PR fortran/107215 - ICE in gfc_real2real and gfc_complex2complex
+! Contributed by G.Steinmetz
+
+program p
+  double precision, parameter :: z = 1.0

Re: [PATCH v2] btf: Add support to BTF_KIND_ENUM64 type

2022-10-11 Thread Indu Bhagat via Gcc-patches

Hi Guillermo,

On 10/3/22 7:39 AM, Guillermo E. Martinez via Gcc-patches wrote:

diff --git a/gcc/ctfc.cc b/gcc/ctfc.cc
index 9773358a475..253c36b6a0a 100644
--- a/gcc/ctfc.cc
+++ b/gcc/ctfc.cc
@@ -604,6 +604,7 @@ ctf_add_enum (ctf_container_ref ctfc, uint32_t flag, const char * name,

    gcc_assert (size <= CTF_MAX_SIZE);
    dtd->dtd_data.ctti_size = size;
+  dtd->flags = CTF_ENUM_F_NONE;
    ctfc->ctfc_num_stypes++;
@@ -612,7 +613,7 @@ ctf_add_enum (ctf_container_ref ctfc, uint32_t flag, const char * name,

  int
  ctf_add_enumerator (ctf_container_ref ctfc, ctf_id_t enid, const char * name,

-    HOST_WIDE_INT value, dw_die_ref die)
+    HOST_WIDE_INT value, uint32_t flags, dw_die_ref die)
  {
    ctf_dmdef_t * dmd;
    uint32_t kind, vlen, root;
@@ -630,10 +631,12 @@ ctf_add_enumerator (ctf_container_ref ctfc, ctf_id_t enid, const char * name,

    gcc_assert (kind == CTF_K_ENUM && vlen < CTF_MAX_VLEN);
-  /* Enum value is of type HOST_WIDE_INT in the compiler, dmd_value is int32_t
- on the other hand.  Check bounds and skip adding this enum value if out of
- bounds.  */
-  if ((value > INT_MAX) || (value < INT_MIN))
+  /* Enum value is of type HOST_WIDE_INT in the compiler, CTF enumerators
+ values in ctf_enum_t is limited to int32_t, BTF supports signed and
+ unsigned enumerators values of 32 and 64 bits, for both debug formats
+ we use ctf_dmdef_t.dmd_value entry of HOST_WIDE_INT type. So check
+ CTF bounds and skip adding this enum value if out of bounds.  */
+  if (!btf_debuginfo_p() && ((value > INT_MAX) || (value < INT_MIN)))
  {
    /* FIXME - Note this TBD_CTF_REPRESENTATION_LIMIT.  */
    return (1);
@@ -649,6 +652,7 @@ ctf_add_enumerator (ctf_container_ref ctfc, ctf_id_t enid, const char * name,

    dmd->dmd_value = value;
    dtd->dtd_data.ctti_info = CTF_TYPE_INFO (kind, root, vlen + 1);
+  dtd->flags |= flags;
    ctf_dmd_list_append (&dtd->dtd_u.dtu_members, dmd);
    if ((name != NULL) && strcmp (name, ""))
diff --git a/gcc/ctfc.h b/gcc/ctfc.h
index bcf3a43ae1b..a22342b2610 100644
--- a/gcc/ctfc.h
+++ b/gcc/ctfc.h
@@ -125,6 +125,10 @@ typedef struct GTY (()) ctf_itype
  #define CTF_FUNC_VARARG 0x1
+/* Enum specific flags.  */
+#define CTF_ENUM_F_NONE (0)
+#define CTF_ENUM_F_ENUMERATORS_SIGNED   (1 << 0)
+
  /* Struct/union/enum member definition for CTF generation.  */
  typedef struct GTY ((chain_next ("%h.dmd_next"))) ctf_dmdef
@@ -133,7 +137,7 @@ typedef struct GTY ((chain_next ("%h.dmd_next"))) ctf_dmdef

    ctf_id_t dmd_type;    /* Type of this member (for sou).  */
    uint32_t dmd_name_offset;    /* Offset of the name in str table.  */
    uint64_t dmd_offset;    /* Offset of this member in bits (for sou).  */

-  int dmd_value;    /* Value of this member (for enum).  */
+  HOST_WIDE_INT dmd_value;    /* Value of this member (for enum).  */
    struct ctf_dmdef * dmd_next;    /* A list node.  */
  } ctf_dmdef_t;


I am wondering if you considered adding a member here instead - 
something like-


bool dmd_value_signed; /* Signedness for the enumerator.  */.

See comment below.


@@ -162,6 +166,7 @@ struct GTY ((for_user)) ctf_dtdef
    bool from_global_func; /* Whether this type was added from a global
  function.  */
    uint32_t linkage;   /* Used in function types.  0=local, 1=global.  */
+  uint32_t flags; /* Flags to describe specific type's properties.  */

    union GTY ((desc ("ctf_dtu_d_union_selector (&%1)")))
    {
  /* struct, union, or enum.  */


Instead of carrying this information in ctf_dtdef which is the data 
structure for each type in CTF, how about adding a new member in 
struct ctf_dmdef? The "flags" member is meant for only enum types, and 
hence it will be more appropriate to add to ctf_dmdef as say, 
dmd_value_signed.




Yes, `ctf_dtdef' is the structure for each type in CTF (including enumerations),
and `ctf_dmdef' keeps information for an enumerator, not for the 
enumeration type.


Yes, please scrap my earlier suggestion of adding to ctf_dmdef_t.

What do you think about adding something like 'dtd_enum_signedness' to 
ctf_dtdef, instead of the uint32_t 'flags', with two possible values: 0 
(unsigned) and 1 (signed)?


I believe your intention of using the latter is to conserve some memory 
in the long run (by reusing the flags field for other types in future if 
need be)? I do, however, prefer an explicit member like 
dtd_enum_signedness at this time. My reasoning for keeping it explicit 
is that it helps code be more readable/maintainable.
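
(Concretely, the two layouts under discussion differ only in this one
member; the surrounding ctf_dtdef fields are elided and the names are as
proposed in this thread.)

#include <stdint.h>

struct ctf_dtdef_sketch
{
  /* As in v2 of the patch: a generic flag word, reusable for other kinds.  */
  uint32_t flags;                 /* CTF_ENUM_F_* bits, enum-only today.  */

  /* Alternative suggested above: explicit and self-documenting.  */
  uint32_t dtd_enum_signedness;   /* 0 = unsigned, 1 = signed enumerators.  */
};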


Thanks for your patience,
Indu


Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Oct 2022, 19:38 David Edelsohn via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> This patch seems to have broken bootstrap on AIX.  It seems to assume
> methods that aren't guaranteed to be defined.
>


It doesn't use anything that wasn't already used by that file.

I have no idea how it ever compiled if it doesn't now, but I'll take a look.



> Thanks, David
>
> libtool: compile:  /tmp/GCC/./gcc/xgcc -B/tmp/GCC/./gcc/
> -B/nasfarm/edelsohn/ins
> tall/GCC/powerpc-ibm-aix7.2.5.0/bin/
> -B/nasfarm/edelsohn/install/GCC/powerpc-ibm
> -aix7.2.5.0/lib/ -isystem
> /nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/i
> nclude -isystem
> /nasfarm/edelsohn/install/GCC/powerpc-ibm-aix7.2.5.0/sys-include
>  -fno-checking -DHAVE_CONFIG_H -I..
> -I/nasfarm/edelsohn/src/src/libstdc++-v3/../
> libiberty -I/nasfarm/edelsohn/src/src/libstdc++-v3/../include
> -D_GLIBCXX_SHARED
>
> -I/tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/powerpc-ibm-aix7.2.5.0
> -I
> /tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include
> -I/nasfarm/edelsohn/src/src
> /libstdc++-v3/libsupc++ -I/nasfarm/edelsohn/install/include
> -I/nasfarm/edelsohn/
> install/include -g -O2 -DIN_GLIBCPP_V3 -Wno-error -c cp-demangle.c  -fPIC
> -DPIC -o cp-demangle.o
> /nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc: In member
> function 'void* {anonymous}::pool::allocate(std::size_t)':
> /nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:239:54: error:
> no matching function for call to
> '__gnu_cxx::__scoped_lock::__scoped_lock(int&)'
>   239 |   __gnu_cxx::__scoped_lock sentry(emergency_mutex);
>   |  ^
> In file included from
> /nasfarm/edelsohn/src/src/libstdc++-v3/libsupc++/eh_alloc.cc:37:
>
> /tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:14:
> note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(__mutex_type&)'
>   240 | explicit __scoped_lock(__mutex_type& __name) :
> _M_device(__name)
>   |  ^
>
> /tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:240:42:
> note:   no known conversion for argument 1 from 'int' to
> '__gnu_cxx::__scoped_lock::__mutex_type&'
>   240 | explicit __scoped_lock(__mutex_type& __name) :
> _M_device(__name)
>   |~~^~
>
> /tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:5:
> note: candidate: '__gnu_cxx::__scoped_lock::__scoped_lock(const
> __gnu_cxx::__scoped_lock&)'
>   236 | __scoped_lock(const __scoped_lock&);
>   | ^
>
> /tmp/GCC/powerpc-ibm-aix7.2.5.0/libstdc++-v3/include/ext/concurrence.h:236:19:
> note:   no known conversion for argument 1 from 'int' to 'const
> __gnu_cxx::__scoped_lock&'
>   236 | __scoped_lock(const __scoped_lock&);
>   |   ^~~~
> make[5]: *** [Makefile:778: eh_alloc.lo] Error 1
>


Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Vineet Gupta

Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:

This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are very related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
   (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE improvement
(see PR100264):
   https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).

Further, the new CAS code requires cbranch INSN helpers to be present:
   https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html


I was wondering if this patchset is blocked on some technical grounds.

Thx,
-Vineet


Changes for v2:
* Guard LL/SC sequence by compiler barriers ("blockage")
   (suggested by Andrew Waterman)
* Changed commit message for AMOSWAP->STORE change
   (suggested by Andrew Waterman)
* Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
* Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
* Fix small code style issue

Christoph Muellner (10):
   RISC-V: Simplify memory model code [PR 100265]
   RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
   RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
   RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
   RISC-V: Emit fences according to chosen memory model [PR 100265]
   RISC-V: Implement atomic_{load,store} [PR 100265]
   RISC-V: Model INSNs for LR and SC [PR 100266]
   RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
   RISC-V: Provide programmatic implementation of CAS [PR 100266]
   RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

  gcc/config/riscv/riscv-protos.h |   1 +
  gcc/config/riscv/riscv.c| 136 +---
  gcc/config/riscv/sync.md| 216 +---
  3 files changed, 235 insertions(+), 118 deletions(-)





Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Palmer Dabbelt

On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:

Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:

This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are very related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
   (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE improvement
(see PR100264):
   https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).

Further, the new CAS code requires cbranch INSN helpers to be present:
   https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html


I was wondering if this patchset is blocked on some technical grounds.


There's a v3 (though I can't find all of it, so not quite sure what 
happened), but IIUC that still has the same fundamental problems that 
all these have had: changing over to the new fence model may be an ABI 
break and the split CAS implementation doesn't ensure eventual success 
(see Jim's comments).  Not sure if there's other comments floating 
around, though, that's just what I remember.


+Andrea, in case he has time to look at the memory model / ABI issues.  

We'd still need to sort out the CAS issues, though, and it's not 
abundantly clear it's worth the work: we're essentially constrained to 
just emitting those fixed CAS sequences due to the eventual success 
rules, so it's not clear what the benefit of splitting those up is.  
With WRS there are some routines we might want to generate code for 
(cond_read_acquire() in Linux, for example) but we'd really need to dig 
into those to see if it's even sane/fast.
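
To make the constraint concrete, here is a minimal sketch (not from the
patch series) of the kind of fixed sequence GCC is limited to; the LR/SC
loop has to satisfy the ISA's constrained-loop rules to guarantee forward
progress, which is what makes splitting it across insns problematic:

  // C++ sketch; the asm comments show an illustrative RV32/RV64 expansion.
  bool cas_int(int *p, int *expected, int desired)
  {
    // 1: lr.w.aqrl  a3, (a0)      ; load-reserved
    //    bne        a3, a1, 2f    ; compare against *expected
    //    sc.w.rl    a4, a2, (a0)  ; store-conditional
    //    bnez       a4, 1b        ; retry on spurious SC failure
    // 2:
    return __atomic_compare_exchange_n(p, expected, desired, /*weak=*/false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }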


There's another patch set to fix the lack of inline atomic routines 
without breaking stuff, there were some minor comments from Kito and 
IIRC I had some test failures that I needed to chase down as well.  
That's a much safer fix in the short term, we'll need to deal with this 
eventually but at least we can stop the libatomic issues for the distro 
folks.




Thx,
-Vineet


Changes for v2:
* Guard LL/SC sequence by compiler barriers ("blockage")
   (suggested by Andrew Waterman)
* Changed commit message for AMOSWAP->STORE change
   (suggested by Andrew Waterman)
* Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
* Introduce predicate riscv_sync_memory_operand (suggested by Jim Wilson)
* Fix small code style issue

Christoph Muellner (10):
   RISC-V: Simplify memory model code [PR 100265]
   RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]
   RISC-V: Eliminate %F specifier from riscv_print_operand() [PR 100265]
   RISC-V: Use STORE instead of AMOSWAP for atomic stores [PR 100265]
   RISC-V: Emit fences according to chosen memory model [PR 100265]
   RISC-V: Implement atomic_{load,store} [PR 100265]
   RISC-V: Model INSNs for LR and SC [PR 100266]
   RISC-V: Add s.ext-consuming INSNs for LR and SC [PR 100266]
   RISC-V: Provide programmatic implementation of CAS [PR 100266]
   RISC-V: Introduce predicate "riscv_sync_memory_operand" [PR 100266]

  gcc/config/riscv/riscv-protos.h |   1 +
  gcc/config/riscv/riscv.c| 136 +---
  gcc/config/riscv/sync.md| 216 +---
  3 files changed, 235 insertions(+), 118 deletions(-)



Re: [PATCH] libstdc++: Implement ranges::repeat_view from P2474R2

2022-10-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Oct 2022 at 03:51, Patrick Palka via Libstdc++
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk? (The paper
> also makes changes to views::take and views::drop, which will be
> implemented separately.)

OK, thanks.
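
For reference, a minimal usage sketch of the C++23 view the paper
specifies (the values are made up, and this needs a libstdc++ that
includes the patch):

  #include <iostream>
  #include <ranges>

  int main()
  {
    for (int v : std::views::repeat(42, 3))    // bounded: 42 42 42
      std::cout << v << ' ';
    for (char c : std::views::repeat('x')      // unbounded ...
                    | std::views::take(2))     // ... so bound it with take
      std::cout << c;
  }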



Re: [PATCH] libstdc++: Allow emergency EH alloc pool size to be tuned [PR68606]

2022-10-11 Thread Jonathan Wakely via Gcc-patches
On Tue, 11 Oct 2022 at 19:58, Jonathan Wakely wrote:
>
>
>
> On Tue, 11 Oct 2022, 19:38 David Edelsohn via Libstdc++, 
>  wrote:
>>
>> This patch seems to have broken bootstrap on AIX.  It seems to assume
>> methods that aren't guaranteed to be defined.
>
>
>
> It doesn't use anything that wasn't already used by that file.
>
> I have no idea how it ever compiled if it doesn't now, but I'll take a look.

The problem was inconsistent namespace qualification. The
__scoped_lock type is always present on AIX even for the
single-threaded multilib, but the code wasn't referring to the correct
__scoped_lock type consistently.
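
A reduced sketch of the lookup problem (names simplified from the real
code; pool's member may be a dummy fallback on single-threaded builds):

  namespace __gnu_cxx { struct __scoped_lock { __scoped_lock(int&) {} }; }

  struct pool
  {
    // Fallback used when there is no real threading support.
    struct __scoped_lock { __scoped_lock(int&) {} };
    int emergency_mutex = 0;

    void *allocate()
    {
      __scoped_lock sentry(emergency_mutex); // unqualified: pool::__scoped_lock
      // __gnu_cxx::__scoped_lock would name the real lock unconditionally,
      // which is the inconsistency the commit below removes.
      return nullptr;
    }
  };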

Fixed like so. Bootstrapped on x86_64-linux with --disable-threads
(I'm still waiting for the AIX build, but the symptom and cure are the
same as for --disable-threads on other targets).

Pushed to trunk.
commit 23c3cbaed36f6d2f3a7a64f6ebda69329723514b
Author: Jonathan Wakely 
Date:   Tue Oct 11 20:19:08 2022

libstdc++: Fix bootstrap for --disable-threads build [PR107221]

The __scoped_lock type should be used unqualified so that we always
refer to pool::__scoped_lock, which might be the dummy fallback
implementation.

The __mutex and __scoped_lock types in  already work
fine without __GTHREADS being defined, but that header isn't included at
all unless _GLIBCXX_HOSTED != 0. The fallback implementation should be
used for ! _GLIBCXX_HOSTED instead of for !defined __GTHREADS.

libstdc++-v3/ChangeLog:

PR bootstrap/107221
* libsupc++/eh_alloc.cc (pool): Change preprocessor condition
for using __mutex from __GTHREADS to _GLIBCXX_HOSTED.
(pool::allocate): Remove namespace qualification to use
pool::__scoped_lock instead of __gnu_cxx::__scoped_lock.

diff --git a/libstdc++-v3/libsupc++/eh_alloc.cc b/libstdc++-v3/libsupc++/eh_alloc.cc
index 50dc37c0d9c..81b8a1548c6 100644
--- a/libstdc++-v3/libsupc++/eh_alloc.cc
+++ b/libstdc++-v3/libsupc++/eh_alloc.cc
@@ -145,7 +145,7 @@ namespace
char data[] __attribute__((aligned));
   };
 
-#ifdef __GTHREADS
+#if _GLIBCXX_HOSTED
   // A single mutex controlling emergency allocations.
   __gnu_cxx::__mutex emergency_mutex;
   using __scoped_lock = __gnu_cxx::__scoped_lock;
@@ -236,7 +236,7 @@ namespace
 
   void *pool::allocate (std::size_t size) noexcept
 {
-  __gnu_cxx::__scoped_lock sentry(emergency_mutex);
+  __scoped_lock sentry(emergency_mutex);
   // We need an additional size_t member plus the padding to
   // ensure proper alignment of data.
   size += offsetof (allocated_entry, data);


[PATCH] c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

2022-10-11 Thread Marek Polacek via Gcc-patches
Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test.  At this point
start_preparsed_function hasn't yet set current_function_decl.
expand_vec_init_expr then leads to maybe_splice_retval_cleanup which
checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that
c_f_d is non-null first.  It seems correct that c_f_d is null here, so
it seems to me that maybe_splice_retval_cleanup should check c_f_d as
in the following patch.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c++/106925

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Check current_function_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-defarg3.C: New test.
---
 gcc/cp/except.cc  |  3 +++
 gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C | 13 +
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C

diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index b8a85ed0572..9f77289b9ca 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -1327,6 +1327,9 @@ maybe_splice_retval_cleanup (tree compound_stmt)
&& current_binding_level->level_chain->kind == sk_function_parms);
 
   if ((function_body || current_binding_level->kind == sk_try)
+  /* When we're processing a default argument, c_f_d may not have been
+set.  */
+  && current_function_decl
   && !DECL_CONSTRUCTOR_P (current_function_decl)
   && !DECL_DESTRUCTOR_P (current_function_decl)
   && current_retval_sentinel)
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C b/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C
new file mode 100644
index 000..5c3e886b306
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C
@@ -0,0 +1,13 @@
+// PR c++/106925
+// { dg-do compile { target c++11 } }
+
+struct Foo;
+template <int _Nm> struct __array_traits { typedef Foo _Type[_Nm]; };
+template <int _Nm> struct array {
+  typename __array_traits<_Nm>::_Type _M_elems;
+};
+template <int _Nm> struct MyVector { array<_Nm> data{}; };
+struct Foo {
+  float a{0};
+};
+void foo(MyVector<1> = MyVector<1>());

base-commit: 23c3cbaed36f6d2f3a7a64f6ebda69329723514b
-- 
2.37.3



Re: [PATCH v5] c-family: ICE with [[gnu::nocf_check]] [PR106937]

2022-10-11 Thread Marek Polacek via Gcc-patches
On Tue, Oct 11, 2022 at 09:40:45AM +0200, Andreas Schwab via Gcc-patches wrote:
> On Okt 10 2022, Marek Polacek via Gcc-patches wrote:
> 
> > diff --git a/gcc/testsuite/c-c++-common/pointer-to-fn1.c b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
> > new file mode 100644
> > index 000..975885462e9
> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/pointer-to-fn1.c
> > @@ -0,0 +1,18 @@
> > +/* PR c++/106937 */
> > +/* { dg-options "-fcf-protection" } */
> 
> FAIL: c-c++-common/pointer-to-fn1.c  -Wc++-compat  (test for excess errors)
> Excess errors:
> cc1: error: '-fcf-protection=full' is not supported for this target

Thanks, patch posted.

Marek



Re: [PATCH] Fortran: check types of source expressions before conversion [PR107215]

2022-10-11 Thread Mikael Morin

Le 11/10/2022 à 20:47, Harald Anlauf via Fortran a écrit :

Dear all,

this PR is an obvious followup to PR107000, where invalid
types appeared in array constructors and led to an ICE
either in a conversion or reduction of a unary or binary
expression.

The present PR shows that several other conversions need to
be protected by a check of the type of the source expression.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK, thanks.


[PATCH] Fortran: check types of operands of arithmetic binary operations [PR107217]

2022-10-11 Thread Harald Anlauf via Gcc-patches
Dear all,

we need to check that the operands of arithmetic binary operations
are consistent and of numeric type.

The PR reported an issue for multiplication ("*"), but we had better
extend this to the other binary operations.

I chose the following solution:
- consistent types for +,-,*,/, keeping an internal error if any
  unhandled type shows up,
- numeric types for **

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From a95f251504bcb8ba28b7db1d2b7990631c761e9c Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 11 Oct 2022 22:08:48 +0200
Subject: [PATCH] Fortran: check types of operands of arithmetic binary
 operations [PR107217]

gcc/fortran/ChangeLog:

	PR fortran/107217
	* arith.cc (gfc_arith_plus): Compare consistency of types of operands.
	(gfc_arith_minus): Likewise.
	(gfc_arith_times): Likewise.
	(gfc_arith_divide): Likewise.
	(arith_power): Check that both operands are of numeric type.

gcc/testsuite/ChangeLog:

	PR fortran/107217
	* gfortran.dg/pr107217.f90: New test.
---
 gcc/fortran/arith.cc   | 15 +++
 gcc/testsuite/gfortran.dg/pr107217.f90 | 18 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr107217.f90

diff --git a/gcc/fortran/arith.cc b/gcc/fortran/arith.cc
index 9e079e42995..14ba931e37f 100644
--- a/gcc/fortran/arith.cc
+++ b/gcc/fortran/arith.cc
@@ -624,6 +624,9 @@ gfc_arith_plus (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   gfc_expr *result;
   arith rc;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);

   switch (op1->ts.type)
@@ -658,6 +661,9 @@ gfc_arith_minus (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   gfc_expr *result;
   arith rc;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);

   switch (op1->ts.type)
@@ -692,6 +698,9 @@ gfc_arith_times (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   gfc_expr *result;
   arith rc;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);

   switch (op1->ts.type)
@@ -727,6 +736,9 @@ gfc_arith_divide (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   gfc_expr *result;
   arith rc;

+  if (op1->ts.type != op2->ts.type)
+return ARITH_INVALID_TYPE;
+
   rc = ARITH_OK;

   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);
@@ -815,6 +827,9 @@ arith_power (gfc_expr *op1, gfc_expr *op2, gfc_expr **resultp)
   gfc_expr *result;
   arith rc;

+  if (!gfc_numeric_ts (&op1->ts) || !gfc_numeric_ts (&op2->ts))
+return ARITH_INVALID_TYPE;
+
   rc = ARITH_OK;
   result = gfc_get_constant_expr (op1->ts.type, op1->ts.kind, &op1->where);

diff --git a/gcc/testsuite/gfortran.dg/pr107217.f90 b/gcc/testsuite/gfortran.dg/pr107217.f90
new file mode 100644
index 000..9c8492e64f0
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr107217.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! PR fortran/107217 - ICE in gfc_arith_times
+! Contributed by G.Steinmetz
+
+program p
+  print *, [real :: (['1'])] * 2 ! { dg-error "Cannot convert" }
+  print *, 2 * [real :: (['1'])] ! { dg-error "Cannot convert" }
+  print *, [real :: (['1'])] + 2 ! { dg-error "Cannot convert" }
+  print *, [real :: (['1'])] - 2 ! { dg-error "Cannot convert" }
+  print *, [real :: (['1'])] / 2 ! { dg-error "Cannot convert" }
+  print *, 1 / [real :: (['1'])] ! { dg-error "Cannot convert" }
+  print *, [real :: (['1'])] ** 2 ! { dg-error "Cannot convert" }
+  print *, 2 ** [real :: (['1'])] ! { dg-error "Cannot convert" }
+  print *, 2.0 ** [real :: (.true.)] ! { dg-error "Cannot convert" }
+  print *, [real :: (.true.)] ** 2.0 ! { dg-error "Cannot convert" }
+  print *, [complex :: (['1'])] ** (1.0,2.0) ! { dg-error "Cannot convert" }
+  print *, (1.0,2.0) ** [complex :: (['1'])] ! { dg-error "Cannot convert" }
+end
--
2.35.3



Re: [PATCH] c++: ICE with VEC_INIT_EXPR and defarg [PR106925]

2022-10-11 Thread Jason Merrill via Gcc-patches

On 10/11/22 16:00, Marek Polacek wrote:

Since r12-8066, in cxx_eval_vec_init we perform expand_vec_init_expr
while processing the default argument in this test.


Hmm, why are we calling cxx_eval_vec_init during parsing of the default 
argument?  In particular, any expansion that depends on the enclosing 
function context should be deferred until the default arg is used by a call.


But it's certainly true that the "function_body" test is wrong in this 
situation; you might move the c_f_d test into the calculation of that 
variable.  The patch is OK with that change, but please also answer my 
question above.



At this point
start_preparsed_function hasn't yet set current_function_decl.
expand_vec_init_expr then leads to maybe_splice_retval_cleanup which
checks DECL_CONSTRUCTOR_P (current_function_decl) without checking that
c_f_d is non-null first.  It seems correct that c_f_d is null here, so
it seems to me that maybe_splice_retval_cleanup should check c_f_d as
in the following patch.



Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR c++/106925

gcc/cp/ChangeLog:

* except.cc (maybe_splice_retval_cleanup): Check current_function_decl.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/initlist-defarg3.C: New test.
---
  gcc/cp/except.cc  |  3 +++
  gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C | 13 +
  2 files changed, 16 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C

diff --git a/gcc/cp/except.cc b/gcc/cp/except.cc
index b8a85ed0572..9f77289b9ca 100644
--- a/gcc/cp/except.cc
+++ b/gcc/cp/except.cc
@@ -1327,6 +1327,9 @@ maybe_splice_retval_cleanup (tree compound_stmt)
 && current_binding_level->level_chain->kind == sk_function_parms);
  
if ((function_body || current_binding_level->kind == sk_try)

+  /* When we're processing a default argument, c_f_d may not have been
+set.  */
+  && current_function_decl
&& !DECL_CONSTRUCTOR_P (current_function_decl)
&& !DECL_DESTRUCTOR_P (current_function_decl)
&& current_retval_sentinel)
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C b/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C
new file mode 100644
index 000..5c3e886b306
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-defarg3.C
@@ -0,0 +1,13 @@
+// PR c++/106925
+// { dg-do compile { target c++11 } }
+
+struct Foo;
+template <int _Nm> struct __array_traits { typedef Foo _Type[_Nm]; };
+template <int _Nm> struct array {
+  typename __array_traits<_Nm>::_Type _M_elems;
+};
+template <int _Nm> struct MyVector { array<_Nm> data{}; };
+struct Foo {
+  float a{0};
+};
+void foo(MyVector<1> = MyVector<1>());

base-commit: 23c3cbaed36f6d2f3a7a64f6ebda69329723514b




Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Christoph Müllner via Gcc-patches
On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt  wrote:

> On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> > Hi Christoph, Kito,
> >
> > On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
> >> This series provides a cleanup of the current atomics implementation
> >> of RISC-V:
> >>
> >> * PR100265: Use proper fences for atomic load/store
> >> * PR100266: Provide programmatic implementation of CAS
> >>
> >> As both are very related, I merged the patches into one series.
> >>
> >> The first patch could be squashed into the following patches,
> >> but I found it easier to understand the changes with it in place.
> >>
> >> The series has been tested as follows:
> >> * Building and testing a multilib RV32/64 toolchain
> >>(bootstrapped with riscv-gnu-toolchain repo)
> >> * Manual review of generated sequences for GCC's atomic builtins API
> >>
> >> The programmatic re-implementation of CAS benefits from a REE
> improvement
> >> (see PR100264):
> >>https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
> >> If this patch is not in place, then an additional extension instruction
> >> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
> >>
> >> Further, the new CAS code requires cbranch INSN helpers to be present:
> >>https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
> >
> > I was wondering if this patchset is blocked on some technical grounds.
>
> There's a v3 (though I can't find all of it, so not quite sure what
> happened), but IIUC that still has the same fundamental problems that
> all these have had: changing over to the new fence model may be an ABI
> break and the split CAS implementation doesn't ensure eventual success
> (see Jim's comments).  Not sure if there's other comments floating
> around, though, that's just what I remember.
>

v3 was sent on May 27, 2022, when I rebased this on an internal tree:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme register
pressure instead of erroring out) as I thought that this was the blocker
for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU Cauldron
about this series, that the ABI break is the blocker.

My initial understanding was that fixing something broken cannot be an ABI
break, and that the mismatch between the 2021 implementation and the
recommended mappings in the ratified specification from 2019 is something
that is broken. I still don't know the background here, but I guess this
assumption is incorrect from a historical point of view.

However, I'm sure that I am not the only one that assumes the mappings in
the specification to be implemented in compilers and tools. Therefore I
still consider the implementation of the RISC-V atomics in GCC as broken
(at least w.r.t. user expectation from people that lack the historical
background and just read the RISC-V specification).
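
For readers without that background, these are the mappings in question,
as recommended in the ratified specification's porting guidance
(illustrative only -- not necessarily what any particular GCC release
emits):

  #include <atomic>

  std::atomic<int> x;

  // Spec-recommended RISC-V expansions shown as comments:
  int  load_sc()       { return x.load(std::memory_order_seq_cst); }
  //   fence rw,rw ; lw a0, 0(a1) ; fence r,rw
  void store_sc(int v) { x.store(v, std::memory_order_seq_cst); }
  //   fence rw,w  ; sw a0, 0(a1)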



>
> +Andrea, in case he has time to look at the memory model / ABI issues.
>
> We'd still need to sort out the CAS issues, though, and it's not
> abundantly clear it's worth the work: we're essentially constrained to
> just emitting those fixed CAS sequences due to the eventual success
> rules, so it's not clear what the benefit of splitting those up is.
> With WRS there are some routines we might want to generate code for
> (cond_read_acquire() in Linux, for example) but we'd really need to dig
> into those to see if it's even sane/fast.
>
> There's another patch set to fix the lack of inline atomic routines
> without breaking stuff, there were some minor comments from Kito and
> IIRC I had some test failures that I needed to chase down as well.
> That's a much safer fix in the short term, we'll need to deal with this
> eventually but at least we can stop the libatomic issues for the distro
> folks.
>

I expect that the pressure for a proper fix upstream (instead of a backward
compatible compromise) will increase over time (once people start building
big iron based on RISC-V and start hunting performance bottlenecks in
multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI by a
command line switch and promote its use. And at one point in the future (if
there are enough fixes to justify a break) the new ABI can be enabled by
default with a new flag to enable the old ABI.


>
> >
> > Thx,
> > -Vineet
> >
> >> Changes for v2:
> >> * Guard LL/SC sequence by compiler barriers ("blockage")
> >>(suggested by Andrew Waterman)
> >> * Changed commit message for AMOSWAP->STORE change
> >>(suggested by Andrew Waterman)
> >> * Extracted cbranch4 patch from patchset (suggested by Kito Cheng)
> >> * Introduce predicate riscv_sync_memory_operand (suggested by Jim
> Wilson)
> >> * Fix small code style issue
> >>
> >> Christoph Muellner (10):
> >>RISC-V: Simplify memory model code [PR 100265]
> >>RISC-V: Emit proper memory ordering suffixes for AMOs [PR 100265]

Re: [PATCH v2] LoongArch: Libvtv add loongarch support.

2022-10-11 Thread Caroline Tice via Gcc-patches
I think that if VTV_PAGE_SIZE is not set to the actual size being used by
the system,  it could result in some unexpected failures.  I believe the
right thing to do in this case, since the size may vary, is to get the
actual size being used by the system and use that in the definition of
VTV_PAGE_SIZE.  So in include/vtv-permission.h you would have something
like:

+#elif defined(__loongarch_lp64)
+#define VTV_PAGE_SIZE sysconf(_SC_PAGE_SIZE)

Then you would have the accurate, correct size for the current system, and
there would be no need to update the
check in vtv_malloc.cc at all.
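
A minimal sketch of that suggestion (POSIX sysconf; _SC_PAGE_SIZE is a
synonym for _SC_PAGESIZE on common systems):

  #include <unistd.h>
  #include <cstdio>

  int main()
  {
    long page_size = sysconf(_SC_PAGE_SIZE); // page size actually in use
    std::printf("page size: %ld\n", page_size);
  }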

-- Caroline Tice
cmt...@google.com

On Tue, Sep 27, 2022 at 3:04 AM Lulu Cheng  wrote:

>
> v1 -> v2:
>
> 1. When the macro __loongarch_lp64 is defined, the VTV_PAGE_SIZE is set to
> 64K.
> 2. In the __vtv_malloc_init function in vtv_malloc.cc, do not check
>    whether VTV_PAGE_SIZE is equal to the system page size if the macro
>    __loongarch_lp64 is defined.
>
> All regression tests of libvtv passed.
>
> === libvtv Summary ===
>
> # of expected passes            176
>
> But I haven't tested the performance yet.
>
> ---
> Co-Authored-By: qijingwen 
>
> include/ChangeLog:
>
> * vtv-change-permission.h (VTV_PAGE_SIZE): Under the loongarch64
> architecture, set VTV_PAGE_SIZE to 64K.
>
> libvtv/ChangeLog:
>
> * configure.tgt: Add loongarch support.
> * vtv_malloc.cc (__vtv_malloc_init): If macro __loongarch_lp64 is
> defined, then don't check whether VTV_PAGE_SIZE is the
> same as the system page size.
> ---
>  include/vtv-change-permission.h | 4 
>  libvtv/configure.tgt| 3 +++
>  libvtv/vtv_malloc.cc| 5 +
>  3 files changed, 12 insertions(+)
>
> diff --git a/include/vtv-change-permission.h
> b/include/vtv-change-permission.h
> index 70bdad92bca..64e419c29d5 100644
> --- a/include/vtv-change-permission.h
> +++ b/include/vtv-change-permission.h
> @@ -48,6 +48,10 @@ extern void __VLTChangePermission (int);
>  #else
>  #if defined(__sun__) && defined(__svr4__) && defined(__sparc__)
>  #define VTV_PAGE_SIZE 8192
> +/* LoongArch architecture 64-bit system supports 4k,16k and 64k
> +   page size, which is set to the maximum value here.  */
> +#elif defined(__loongarch_lp64)
> +#define VTV_PAGE_SIZE 65536
>  #else
>  #define VTV_PAGE_SIZE 4096
>  #endif
> diff --git a/libvtv/configure.tgt b/libvtv/configure.tgt
> index aa2a3f675b8..6cdd1e97ab1 100644
> --- a/libvtv/configure.tgt
> +++ b/libvtv/configure.tgt
> @@ -50,6 +50,9 @@ case "${target}" in
> ;;
>x86_64-*-darwin[1]* | i?86-*-darwin[1]*)
> ;;
> +  loongarch*-*-linux*)
> +   VTV_SUPPORTED=yes
> +   ;;
>*)
> ;;
>  esac
> diff --git a/libvtv/vtv_malloc.cc b/libvtv/vtv_malloc.cc
> index 67c5de6d4e9..45804b8d7f8 100644
> --- a/libvtv/vtv_malloc.cc
> +++ b/libvtv/vtv_malloc.cc
> @@ -212,6 +212,11 @@ __vtv_malloc_init (void)
>
>  #if defined (__CYGWIN__) || defined (__MINGW32__)
>if (VTV_PAGE_SIZE != sysconf_SC_PAGE_SIZE())
> +#elif defined (__loongarch_lp64)
> +  /* I think that under the LoongArch 64-bit system, VTV_PAGE_SIZE is set
> + to the maximum value of 64K supported by the system, so there is no
> + need to judge here.  */
> +  if (false)
>  #else
>if (VTV_PAGE_SIZE != sysconf (_SC_PAGE_SIZE))
>  #endif
> --
> 2.31.1
>
>


[PATCH v2 1/3] doc: -falign-functions doesn't override the __attribute__((align(N)))

2022-10-11 Thread Palmer Dabbelt
I found this when reading the documentation for Kito's recent patch.
From the discussion it sounds like this is the desired behavior, so
let's document it.
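
A small example of the behavior being documented (my reading of that
discussion; compiled with a hypothetical -falign-functions=64):

  __attribute__((aligned(4)))
  void f() {}   // keeps its explicitly requested 4-byte alignment

  void g() {}   // gets the 64-byte alignment from -falign-functions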

gcc/doc/ChangeLog

* invoke.texi (-falign-functions): Mention __align__
---
 gcc/doc/invoke.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a9ea3455f6..8326a60dcf1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13136,7 +13136,9 @@ effective only in combination with @option{-fstrict-aliasing}.
 Align the start of functions to the next power-of-two greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
-without crossing an @var{n}-byte alignment boundary.
+without crossing an @var{n}-byte alignment boundary.  This does not override
+functions that otherwise specify their own alignment constraints, such as via
+an alignment attribute.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
-- 
2.34.1



[PATCH v2 0/3] doc: -falign-functions improvements

2022-10-11 Thread Palmer Dabbelt
There were some recent discussions about the desired behavior of
-falign-functions, which turned out to be behaving as intended.  This
improves the documentation to make that explicit.

Change since v1 <20221007134901.5078-1-pal...@rivosinc.com>:

* New patch 2 and 3




[PATCH v2 3/3] doc: -falign-functions is ignored for cold/size-optimized functions

2022-10-11 Thread Palmer Dabbelt
gcc/doc/ChangeLog

* invoke.texi (-falign-functions): Mention cold/size-optimized
functions.
---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a24798d5029..6af18ae9bfd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13138,7 +13138,8 @@ equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
 without crossing an @var{n}-byte alignment boundary.  This does not override
 functions that otherwise specify their own alignment constraints, such as via
-an alignment attribute.
+an alignment attribute.  Functions that are optimized for size, for example
+cold functions, are not aligned.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
-- 
2.34.1



Re: [PATCH] doc: -falign-functions doesn't override the __attribute__((align(N)))

2022-10-11 Thread Palmer Dabbelt

On Sun, 09 Oct 2022 23:07:21 PDT (-0700), richard.guent...@gmail.com wrote:

On Fri, Oct 7, 2022 at 3:50 PM Palmer Dabbelt  wrote:


I found this when reading the documentation for Kito's recent patch.
From the discussion it sounds like this is the desired behavior, so
let's document it.


Maybe also mention that the alignment doesn't apply to functions
optimized for size?


Oops, I guess that was the whole point of the discussion ;).  I sent a 
v2, which also mentions -Os, but I'm not sure we need to do that explicitly.





gcc/doc/ChangeLog

* invoke.texi (-falign-functions): Mention __align__
---
 gcc/doc/invoke.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2a9ea3455f6..8326a60dcf1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13136,7 +13136,9 @@ effective only in combination with @option{-fstrict-aliasing}.
 Align the start of functions to the next power-of-two greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
-without crossing an @var{n}-byte alignment boundary.
+without crossing an @var{n}-byte alignment boundary.  This does not override
+functions that otherwise specify their own alignment constraints, such as via
+an alignment attribute.

 If @var{m} is not specified, it defaults to @var{n}.

--
2.34.1



[PATCH v2 2/3] doc: -falign-functions is ignored under -Os

2022-10-11 Thread Palmer Dabbelt
This is implicitly mentioned in the docs, but there were some questions
in a recent patch.  This makes it more explicit that -falign-functions is
meant to be ignored under -Os.

gcc/doc/ChangeLog

* invoke.texi (-falign-functions): Mention -Os
---
 gcc/doc/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8326a60dcf1..a24798d5029 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13164,7 +13164,8 @@ equivalent and mean that functions are not aligned.
 If @var{n} is not specified or is zero, use a machine-dependent default.
 The maximum allowed @var{n} option value is 65536.
 
-Enabled at levels @option{-O2}, @option{-O3}.
+Enabled at levels @option{-O2}, @option{-O3}.  This option has no effect under
+@option{-Os}.
 
 @item -flimit-function-alignment
 If this option is enabled, the compiler tries to avoid unnecessarily
-- 
2.34.1



[PATCH] coroutines: Use cp_build_init_expr consistently.

2022-10-11 Thread Iain Sandoe via Gcc-patches
Tested on x86_64-darwin19, OK for master?
thanks
Iain

-- >8 --

Now we have the TARGET_EXPR_ELIDING_P flag, it is important to ensure it
is set properly on target exprs.  The code here has a mixture of APIs used
to build inits.  This patch changes that to use cp_build_init_expr() where
possible, since that handles setting the flag.
---
 gcc/cp/coroutines.cc | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 01a3e831ee5..20d309445eb 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1025,8 +1025,7 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind)
   else
 {
   e_proxy = get_awaitable_var (suspend_kind, o_type);
-  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
-   tf_warning_or_error);
+  o = cp_build_init_expr (loc, e_proxy, o);
 }
 
  /* I suppose we could check that this is contextually convertible to bool.  */
@@ -2889,8 +2888,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
  gcc_checking_assert (!already_present);
  tree inner = TREE_OPERAND (init, 1);
  gcc_checking_assert (TREE_CODE (inner) != COND_EXPR);
- init = cp_build_modify_expr (input_location, var, INIT_EXPR, init,
-  tf_warning_or_error);
+ init = cp_build_init_expr (var, init);
  /* Simplify for the case that we have an init containing the temp
 alone.  */
  if (t == n->init && n->var == NULL_TREE)
@@ -3656,8 +3654,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
if (TREE_CODE (cond_inner) == CLEANUP_POINT_EXPR)
  cond_inner = TREE_OPERAND (cond_inner, 0);
location_t sloc = EXPR_LOCATION (SWITCH_STMT_COND (sw_stmt));
-   tree new_s = cp_build_init_expr (sloc, newvar,
-cond_inner);
+   tree new_s = cp_build_init_expr (sloc, newvar, cond_inner);
finish_expr_stmt (new_s);
SWITCH_STMT_COND (sw_stmt) = newvar;
/* Now add the switch statement with the condition re-
@@ -4902,9 +4899,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  if (flag_exceptions)
{
  /* This var is now live.  */
- r = build_modify_expr (fn_start, parm.guard_var,
-boolean_type_node, INIT_EXPR, fn_start,
-boolean_true_node, boolean_type_node);
+ r = cp_build_init_expr (fn_start, parm.guard_var,
+ boolean_true_node);
  finish_expr_stmt (r);
}
}
@@ -4948,9 +4944,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   r = coro_build_cvt_void_expr_stmt (r, fn_start);
   finish_expr_stmt (r);
 
-  r = build_modify_expr (fn_start, coro_promise_live, boolean_type_node,
-INIT_EXPR, fn_start, boolean_true_node,
-boolean_type_node);
+  r = cp_build_init_expr (fn_start, coro_promise_live, boolean_true_node);
   finish_expr_stmt (r);
 
   promise_dtor
@@ -5031,8 +5025,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   NULL_TREE);
   add_decl_expr (gro);
   gro_bind_vars = gro;
-  r = cp_build_modify_expr (input_location, gro, INIT_EXPR, get_ro,
-   tf_warning_or_error);
+  r = cp_build_init_expr (gro, get_ro);
   /* The constructed object might require a cleanup.  */
   if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (gro_type))
{
@@ -5053,9 +5046,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  cleanup.  */
   if (gro_ret_dtor)
 {
-   r = build_modify_expr (fn_start, coro_gro_live, boolean_type_node,
- INIT_EXPR, fn_start, boolean_true_node,
- boolean_type_node);
+   r = cp_build_init_expr (fn_start, coro_gro_live, boolean_true_node);
   finish_expr_stmt (r);
 }
   /* Initialize the resume_idx_var to 0, meaning "not started".  */
-- 
2.24.3 (Apple Git-128)



Re: [PATCH v2] c++: parser - Support for target address spaces in C++

2022-10-11 Thread Paul Iannetta via Gcc-patches
Thank you very much for the comments.

On Mon, Oct 10, 2022 at 03:20:13PM -0400, Jason Merrill wrote:
> On 10/9/22 12:12, Paul Iannetta wrote:
> > That's a nice feature! LLVM is using AS to mangle the
> > address-name qualified types but that would have required an update of
> > libiberty's demangler in the binutils as well.
> 
> And they haven't proposed this mangling to
> 
>   https://github.com/itanium-cxx-abi/cxx-abi/
> 
> yet, either.

When looking at clang/lib/AST/{Microsoft,Itanium}Mangle.cpp, the
comments may suggest that they wanted to implement them as extended
qualifiers prefixed by an `U' but that's not what they ended up doing.

> You certainly want some template tests, say
> 
> template <class T>
> int f (T *p) { return *p; }
> __seg_fs int *a;
> int main() { f(a); }
> // test for mangling of f<__seg_fs int>
> 
> -
> 
> template <class T>
> int f (T __seg_gs *p) { return *p; }
> __seg_fs int *a;
> int main() { f(a); } // error, conflicting address spaces

Indeed, I was getting the first one right by a stroke of luck but not
the second.  I've consequently adapted the part which checks and
computes the unification of cv-qualifiers in the presence of address
spaces.  The idea being that a type parameter T can match any address
spaces but an address-space qualified parameter can't accept more
general address spaces than itself.

> > +/* Return true if there is a superset named address space that
> > +   encompasses both given address spaces.  If there is one, return
> > +   it in *COMMON.  */
> > +
> > +bool
> > +addr_space_superset (addr_space_t as1, addr_space_t as2,
> > +addr_space_t * common)
> > +{
> > +  if (as1 == as2)
> > +{
> > +  *common = as1;
> > +  return true;
> > +}
> > +  else if (targetm.addr_space.subset_p (as1, as2))
> > +{
> > +  *common = as2;
> > +  return true;
> > +}
> > +  else if (targetm.addr_space.subset_p (as2, as1))
> > +{
> > +  *common = as1;
> > +  return true;
> > +}
> > +  else
> > +return false;
> 
> So it's not possible for the two spaces to both be subsets of a third?
> 

According to the [N1275, page 38]:
Address spaces may overlap in a nested fashion. For any two address
spaces, either the address spaces must be disjoint, they must be
equivalent, or one must be a subset of the other. [...] There is no
requirement that named address spaces (intrinsic or otherwise) be
subsets of the generic address space.
[N1275]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1275.pdf

Hence, two disjoint address spaces can't be subsets of a third, per
the draft.
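
A tiny self-contained model of that rule, mirroring the
addr_space_superset logic quoted above (the spaces A, B, C are
hypothetical, with A a subset of B and C disjoint from both):

  #include <optional>

  enum addr_space { SPACE_A, SPACE_B, SPACE_C };

  static bool subset_p(addr_space s, addr_space sup)
  { return s == sup || (s == SPACE_A && sup == SPACE_B); }

  static std::optional<addr_space>
  common_superset(addr_space a, addr_space b)
  {
    if (subset_p(a, b)) return b;
    if (subset_p(b, a)) return a;
    return std::nullopt; // disjoint, e.g. SPACE_A and SPACE_C
  }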

> 
> New non-bit-fields should be added before the bit-fields.
> 

Done.

> > diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
> > index eb53e0ebeb4..06625ad9a4b 100644
> > --- a/gcc/cp/mangle.cc
> > +++ b/gcc/cp/mangle.cc
> > @@ -2509,6 +2509,15 @@ write_CV_qualifiers_for_type (const tree type)
> >array.  */
> > cp_cv_quals quals = TYPE_QUALS (type);
> > +  if (DECODE_QUAL_ADDR_SPACE (quals))
> > +{
> > +  addr_space_t as = DECODE_QUAL_ADDR_SPACE (quals);
> 
> This can be
> 
> if (addr_space_t as = DECODE_QUAL_ADDR_SPACE (quals))
> 
> so you don't need to repeat it.

I thought this was c++17 only, but in fact c++17 only broadened the
idiom.  Nice! (It would be even nicer to have this feature in C as
well)
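
(For reference, both forms -- the helper functions here are made up:)

  int  compute();
  bool ok(int);

  void demo()
  {
    if (int x = compute())        // valid since C++98: declaration as condition
      (void)x;
    if (int y = compute(); ok(y)) // C++17: separate init-statement
      (void)y;
  }
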
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 763df6f479b..110ceafc98b 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > [...]
> > @@ -23812,6 +23830,13 @@ cp_parser_cv_qualifier_seq_opt (cp_parser* parser)
> >   break;
> > }
> > +  if (RID_FIRST_ADDR_SPACE <= token->keyword &&
> > + token->keyword <= RID_LAST_ADDR_SPACE)
> > +   {
> > + cv_qualifier =
> > +   ENCODE_QUAL_ADDR_SPACE (token->keyword - RID_FIRST_ADDR_SPACE);
> > +   }
> 
> We usually omit braces around a single statement.
> 
Done.

#  >8 
Add support for custom address spaces in C++

gcc/
* tree.h (ENCODE_QUAL_ADDR_SPACE): Missing parentheses.

gcc/c/
* c-decl.cc: Remove c_register_addr_space.

gcc/c-family/
* c-common.cc (c_register_addr_space): Imported from c-decl.cc
(addr_space_superset): Imported from gcc/c/c-typecheck.cc
* c-common.h: Remove the FIXME.
(addr_space_superset): New declaration.

gcc/cp/
* cp-tree.h (enum cp_decl_spec): Add addr_space support.
(struct cp_decl_specifier_seq): Likewise.
* decl.cc (get_type_quals): Likewise.
(check_tag_decl): Likewise.
* parser.cc (cp_parser_type_specifier): Likewise.
(cp_parser_cv_qualifier_seq_opt): Likewise.
(cp_parser_postfix_expression): Likewise.
(cp_parser_type_specifier): Likewise.
(set_and_check_decl_spec_loc): Likewise.
* typeck.cc (composite_pointer_type): Likewise
(comp_ptr_ttypes_real): Likewise.
* 

Re: [PATCH] coroutines: Use cp_build_init_expr consistently.

2022-10-11 Thread Jason Merrill via Gcc-patches

On 10/11/22 17:58, Iain Sandoe wrote:

Tested on x86_64-darwin19, OK for master?
thanks
Iain

-- >8 --

Now we have the TARGET_EXPR_ELIDING_P flag, it is important to ensure it
is set properly on target exprs.  The code here has a mixture of APIs used
to build inits.  This patch changes that to use cp_build_init_expr() where
possible, since that handles setting the flag.


Hmm, I wouldn't expect this to be necessary, since cp_build_modify_expr 
calls cp_build_init_expr for INIT_EXPR.  Perhaps the similarity of the 
function names is a trap...



---
  gcc/cp/coroutines.cc | 25 -
  1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 01a3e831ee5..20d309445eb 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1025,8 +1025,7 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind)
else
  {
e_proxy = get_awaitable_var (suspend_kind, o_type);
-  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
-   tf_warning_or_error);
+  o = cp_build_init_expr (loc, e_proxy, o);
  }
  
/* I suppose we could check that this is contextually convertible to bool.  */

@@ -2889,8 +2888,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
  gcc_checking_assert (!already_present);
  tree inner = TREE_OPERAND (init, 1);
  gcc_checking_assert (TREE_CODE (inner) != COND_EXPR);
- init = cp_build_modify_expr (input_location, var, INIT_EXPR, init,
-  tf_warning_or_error);
+ init = cp_build_init_expr (var, init);
  /* Simplify for the case that we have an init containing the temp
 alone.  */
  if (t == n->init && n->var == NULL_TREE)
@@ -3656,8 +3654,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
if (TREE_CODE (cond_inner) == CLEANUP_POINT_EXPR)
  cond_inner = TREE_OPERAND (cond_inner, 0);
location_t sloc = EXPR_LOCATION (SWITCH_STMT_COND (sw_stmt));
-   tree new_s = cp_build_init_expr (sloc, newvar,
-cond_inner);
+   tree new_s = cp_build_init_expr (sloc, newvar, cond_inner);
finish_expr_stmt (new_s);
SWITCH_STMT_COND (sw_stmt) = newvar;
/* Now add the switch statement with the condition re-
@@ -4902,9 +4899,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  if (flag_exceptions)
{
  /* This var is now live.  */
- r = build_modify_expr (fn_start, parm.guard_var,
-boolean_type_node, INIT_EXPR, fn_start,
-boolean_true_node, boolean_type_node);
+ r = cp_build_init_expr (fn_start, parm.guard_var,
+ boolean_true_node);
  finish_expr_stmt (r);
}
}
@@ -4948,9 +4944,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
r = coro_build_cvt_void_expr_stmt (r, fn_start);
finish_expr_stmt (r);
  
-  r = build_modify_expr (fn_start, coro_promise_live, boolean_type_node,

-INIT_EXPR, fn_start, boolean_true_node,
-boolean_type_node);
+  r = cp_build_init_expr (fn_start, coro_promise_live, boolean_true_node);
finish_expr_stmt (r);
  
promise_dtor

@@ -5031,8 +5025,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   NULL_TREE);
add_decl_expr (gro);
gro_bind_vars = gro;
-  r = cp_build_modify_expr (input_location, gro, INIT_EXPR, get_ro,
-   tf_warning_or_error);
+  r = cp_build_init_expr (gro, get_ro);
/* The constructed object might require a cleanup.  */
if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (gro_type))
{
@@ -5053,9 +5046,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
   cleanup.  */
if (gro_ret_dtor)
  {
-   r = build_modify_expr (fn_start, coro_gro_live, boolean_type_node,
- INIT_EXPR, fn_start, boolean_true_node,
- boolean_type_node);
+   r = cp_build_init_expr (fn_start, coro_gro_live, boolean_true_node);
finish_expr_stmt (r);
  }
/* Initialize the resume_idx_var to 0, meaning "not started".  */




Re: [PATCH v2 2/3] doc: -falign-functions is ignored under -Os

2022-10-11 Thread Eric Gallager via Gcc-patches
On Tue, Oct 11, 2022 at 5:03 PM Palmer Dabbelt  wrote:
>
> This is implicitly mentioned in the docs, but there were some questions
> in a recent patch.  This makes it more explicit that -falign-functions is
> meant to be ignored under -Os.
>
> gcc/doc/ChangeLog
>
> * invoke.texi (-falign-functions): Mention -Os

Since there's -Oz now, too, should that be mentioned as well?

> ---
>  gcc/doc/invoke.texi | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8326a60dcf1..a24798d5029 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -13164,7 +13164,8 @@ equivalent and mean that functions are not aligned.
>  If @var{n} is not specified or is zero, use a machine-dependent default.
>  The maximum allowed @var{n} option value is 65536.
>
> -Enabled at levels @option{-O2}, @option{-O3}.
> +Enabled at levels @option{-O2}, @option{-O3}.  This option has no effect under
> +@option{-Os}.
>
>  @item -flimit-function-alignment
>  If this option is enabled, the compiler tries to avoid unnecessarily
> --
> 2.34.1
>


Re: [PATCH] coroutines: Use cp_build_init_expr consistently.

2022-10-11 Thread Iain Sandoe
Hi Jason

> On 11 Oct 2022, at 23:06, Jason Merrill  wrote:
> 
> On 10/11/22 17:58, Iain Sandoe wrote:
>> Tested on x86_64-darwin19, OK for master?
>> thanks
>> Iain
>> -- >8 --
>> Now we have the TARGET_EXPR_ELIDING_P flag, it is important to ensure it
>> is set properly on target exprs.  The code here has a mixture of APIs used
>> to build inits.  This patch changes that to use cp_build_init_expr() where
>> possible, since that handles setting the flag.
> 
> Hmm, I wouldn't expect this to be necessary, since cp_build_modify_expr calls 
> cp_build_init_expr for INIT_EXPR.

It seems that path is only taken if the init is a CONSTRUCTOR.

>  Perhaps the similarity of the function names is a trap...

not sure exactly what trap you mean.

It seems that on my local set of additional tests (for some of the open PRs)
there are some progressions from this change; I will have to track down which
change is the cause.

thanks
Iain

> 
>> ---
>>  gcc/cp/coroutines.cc | 25 -
>>  1 file changed, 8 insertions(+), 17 deletions(-)
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index 01a3e831ee5..20d309445eb 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -1025,8 +1025,7 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind)
>>else
>>  {
>>e_proxy = get_awaitable_var (suspend_kind, o_type);
>> -  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
>> -tf_warning_or_error);
>> +  o = cp_build_init_expr (loc, e_proxy, o);
>>  }
>>  /* I suppose we could check that this is contextually convertible to bool.  */
>> @@ -2889,8 +2888,7 @@ flatten_await_stmt (var_nest_node *n, hash_set<tree> *promoted,
>>gcc_checking_assert (!already_present);
>>tree inner = TREE_OPERAND (init, 1);
>>gcc_checking_assert (TREE_CODE (inner) != COND_EXPR);
>> -  init = cp_build_modify_expr (input_location, var, INIT_EXPR, init,
>> -   tf_warning_or_error);
>> +  init = cp_build_init_expr (var, init);
>>/* Simplify for the case that we have an init containing the temp
>>   alone.  */
>>if (t == n->init && n->var == NULL_TREE)
>> @@ -3656,8 +3654,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void *d)
>>  if (TREE_CODE (cond_inner) == CLEANUP_POINT_EXPR)
>>cond_inner = TREE_OPERAND (cond_inner, 0);
>>  location_t sloc = EXPR_LOCATION (SWITCH_STMT_COND (sw_stmt));
>> -tree new_s = cp_build_init_expr (sloc, newvar,
>> - cond_inner);
>> +tree new_s = cp_build_init_expr (sloc, newvar, cond_inner);
>>  finish_expr_stmt (new_s);
>>  SWITCH_STMT_COND (sw_stmt) = newvar;
>>  /* Now add the switch statement with the condition re-
>> @@ -4902,9 +4899,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>>if (flag_exceptions)
>>  {
>>/* This var is now live.  */
>> -  r = build_modify_expr (fn_start, parm.guard_var,
>> - boolean_type_node, INIT_EXPR, fn_start,
>> - boolean_true_node, boolean_type_node);
>> +  r = cp_build_init_expr (fn_start, parm.guard_var,
>> +  boolean_true_node);
>>finish_expr_stmt (r);
>>  }
>>  }
>> @@ -4948,9 +4944,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>>r = coro_build_cvt_void_expr_stmt (r, fn_start);
>>finish_expr_stmt (r);
>>  -  r = build_modify_expr (fn_start, coro_promise_live, 
>> boolean_type_node,
>> - INIT_EXPR, fn_start, boolean_true_node,
>> - boolean_type_node);
>> +  r = cp_build_init_expr (fn_start, coro_promise_live, 
>> boolean_true_node);
>>finish_expr_stmt (r);
>>  promise_dtor
>> @@ -5031,8 +5025,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>> NULL_TREE);
>>add_decl_expr (gro);
>>gro_bind_vars = gro;
>> -  r = cp_build_modify_expr (input_location, gro, INIT_EXPR, get_ro,
>> -tf_warning_or_error);
>> +  r = cp_build_init_expr (gro, get_ro);
>>/* The constructed object might require a cleanup.  */
>>if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (gro_type))
>>  {
>> @@ -5053,9 +5046,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
>>   cleanup.  */
>>if (gro_ret_dtor)
>>  {
>> -   r = build_modify_expr (fn_start, coro_gro_live, boolean_type_node,
>> -  INIT_EXPR, fn_start, boolean_true_node,
>> -  boolean_type_node);
>> +   r = cp_build_init_expr (fn_start, coro_gro_live, boolean_true_node);
>>finish_expr_stmt (r);
>> 

[Ada] Enable support for atomic primitives on SPARC/Linux

2022-10-11 Thread Eric Botcazou via Gcc-patches
The SPARC/Linux port is very similar to the SPARC/Solaris port nowadays so it 
makes sense to copy the setting of the support for atomic primitives.

This fixes the single regression in the gnat.dg testsuite:
FAIL: gnat.dg/prot7.adb (test for excess errors)

Tested on SPARC64/Linux, applied on the mainline.


2022-10-11  Eric Botcazou  

* libgnat/system-linux-sparc.ads (Support_Atomic_Primitives): New
constant set to True.

-- 
Eric Botcazou

diff --git a/gcc/ada/libgnat/system-linux-sparc.ads b/gcc/ada/libgnat/system-linux-sparc.ads
index cc502da3e5b..6d4ee380b2d 100644
--- a/gcc/ada/libgnat/system-linux-sparc.ads
+++ b/gcc/ada/libgnat/system-linux-sparc.ads
@@ -133,6 +133,7 @@ private
Stack_Check_Probes: constant Boolean := True;
Stack_Check_Limits: constant Boolean := False;
Support_Aggregates: constant Boolean := True;
+   Support_Atomic_Primitives : constant Boolean := True;
Support_Composite_Assign  : constant Boolean := True;
Support_Composite_Compare : constant Boolean := True;
Support_Long_Shifts   : constant Boolean := True;


[PATCH] Fix emit_group_store regression on big-endian

2022-10-11 Thread Eric Botcazou via Gcc-patches
Hi,

the recent optimization implemented for complex modes in:
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595865.html
contains an oversight for big-endian platforms in the "interesting corner 
case" mentioned in the message: it uses a lowpart SUBREG when the integer 
modes have different sizes, but this does not match the semantics of the 
PARALLELs which have a bundled byte offset; this offset is always zero in the 
code path and the lowpart is not at offset zero on big-endian platforms.

Calling validate_subreg with this zero offset would fix the regression by 
disabling the optimization on big-endian platforms, so instead the attached 
fix adds the appropriate right shift for them.
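
In scalar terms the fix amounts to the following (a sketch for a 64-bit
source register feeding a 32-bit destination mode):

  #include <cstdint>

  // The PARALLEL's byte offset is 0, i.e. it names the left-justified part
  // of the source, which on big-endian is the most significant half.
  uint32_t left_justified_part(uint64_t src, bool big_endian)
  {
    return big_endian ? uint32_t(src >> 32) // shift the high half down first
                      : uint32_t(src);      // lowpart already sits at offset 0
  }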

This fixes the following regressions in the C testsuite on SPARC64/Linux:
FAIL: gcc.c-torture/execute/20041124-1.c   -O0  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -O1  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -O2  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -O2 -flto -fno-use-linker-plugin -
flto-partition=none  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -O2 -flto -fuse-linker-plugin -fno-
fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/20041124-1.c   -Os  execution test
FAIL: gcc.dg/compat/struct-by-value-11 c_compat_x_tst.o-c_compat_y_tst.o 
execute 
FAIL: gcc.dg/compat/struct-by-value-12 c_compat_x_tst.o-c_compat_y_tst.o 
execute 
FAIL: tmpdir-gcc.dg-struct-layout-1/t027 c_compat_x_tst.o-c_compat_y_tst.o 
execute 

Tested on SPARC64/Linux, OK for the mainline?


2022-10-11  Eric Botcazou  

* expr.cc (emit_group_store): Fix handling of modes of different
sizes for big-endian targets in latest change and add commentary.

-- 
Eric Botcazou

diff --git a/gcc/expr.cc b/gcc/expr.cc
index ba627f176a7..b897b6dc385 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -2813,50 +2813,69 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
   else
 	adj_bytelen = bytelen;
 
+  /* Deal with destination CONCATs by either storing into one of the parts
+	 or doing a copy after storing into a register or stack temporary.  */
   if (GET_CODE (dst) == CONCAT)
 	{
 	  if (known_le (bytepos + adj_bytelen,
			GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)))))
 	dest = XEXP (dst, 0);
+
	  else if (known_ge (bytepos, GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)))))
 	{
 	  bytepos -= GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)));
 	  dest = XEXP (dst, 1);
 	}
+
 	  else
 	{
 	  machine_mode dest_mode = GET_MODE (dest);
 	  machine_mode tmp_mode = GET_MODE (tmps[i]);
-	  scalar_int_mode imode;
+	  scalar_int_mode dest_imode;
 
 	  gcc_assert (known_eq (bytepos, 0) && XVECLEN (src, 0));
 
-	  if (finish == 1
+	  /* If the source is a single scalar integer register, and the
+		 destination has a complex mode for which a same-sized integer
+		 mode exists, then we can take the left-justified part of the
+		 source in the complex mode.  */
+	  if (finish == start + 1
 		  && REG_P (tmps[i])
-		  && COMPLEX_MODE_P (dest_mode)
 		  && SCALAR_INT_MODE_P (tmp_mode)
-		  && int_mode_for_mode (dest_mode).exists (&imode))
+		  && COMPLEX_MODE_P (dest_mode)
+		  && int_mode_for_mode (dest_mode).exists (&dest_imode))
 		{
-		  if (tmp_mode != imode)
+		  const scalar_int_mode tmp_imode
+		    = as_a <scalar_int_mode> (tmp_mode);
+
+		  if (GET_MODE_BITSIZE (dest_imode)
+		  < GET_MODE_BITSIZE (tmp_imode))
 		{
-		  rtx tmp = gen_reg_rtx (imode);
-		  emit_move_insn (tmp, gen_lowpart (imode, tmps[i]));
-		  dst = gen_lowpart (dest_mode, tmp);
+		  dest = gen_reg_rtx (dest_imode);
+		  if (BYTES_BIG_ENDIAN)
+			tmps[i] = expand_shift (RSHIFT_EXPR, tmp_mode, tmps[i],
+		GET_MODE_BITSIZE (tmp_imode)
+		- GET_MODE_BITSIZE (dest_imode),
+		NULL_RTX, 1);
+		  emit_move_insn (dest, gen_lowpart (dest_imode, tmps[i]));
+		  dst = gen_lowpart (dest_mode, dest);
 		}
 		  else
 		dst = gen_lowpart (dest_mode, tmps[i]);
 		}
+
+	  /* Otherwise spill the source onto the stack using the more
+		 aligned of the two modes.  */
 	  else if (GET_MODE_ALIGNMENT (dest_mode)
-		  >= GET_MODE_ALIGNMENT (tmp_mode))
+		   >= GET_MODE_ALIGNMENT (tmp_mode))
 		{
 		  dest = assign_stack_temp (dest_mode,
 	GET_MODE_SIZE (dest_mode));
-		  emit_move_insn (adjust_address (dest,
-		  tmp_mode,
-		  bytepos),
+		  emit_move_insn (adjust_address (dest, tmp_mode, bytepos),
   tmps[i]);
 		  dst = dest;
 		}
+
 	  else
 		{
 		  dest = assign_stack_temp (tmp_mode,
@@ -2864,6 +2883,7 @@ emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
 		  emit_move_insn (dest, tmps[i]);
 		  dst = adjust_address (dest, dest_mode, bytepos);
 		}
+
 	  break;
 	}
 	}


Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Jeff Law via Gcc-patches



On 10/11/22 13:31, Palmer Dabbelt wrote:

On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:

Hi Christoph, Kito,

On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:

This series provides a cleanup of the current atomics implementation
of RISC-V:

* PR100265: Use proper fences for atomic load/store
* PR100266: Provide programmatic implementation of CAS

As both are very related, I merged the patches into one series.

The first patch could be squashed into the following patches,
but I found it easier to understand the changes with it in place.

The series has been tested as follows:
* Building and testing a multilib RV32/64 toolchain
   (bootstrapped with riscv-gnu-toolchain repo)
* Manual review of generated sequences for GCC's atomic builtins API

The programmatic re-implementation of CAS benefits from a REE 
improvement

(see PR100264):
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
If this patch is not in place, then an additional extension instruction
is emitted after the SC.W (in case of RV64 and CAS for uint32_t).

Further, the new CAS code requires cbranch INSN helpers to be present:
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html


I was wondering if this patchset is blocked on some technical grounds.


There's a v3 (though I can't find all of it, so not quite sure what 
happened), but IIUC that still has the same fundamental problems that 
all these have had: changing over to the new fence model may be an ABI 
break and the split CAS implementation doesn't ensure eventual success 
(see Jim's comments).  Not sure if there's other comments floating 
around, though, that's just what I remember.


Do we have a pointer to the ABI discussion?  I've been meaning to 
familiarize myself with the issues in this space and that seems like a 
good place to start given its blocking progress on the atomics.



jeff




Re: [RFC] Add support for vectors in comparisons (like the C++ frontend does)

2022-10-11 Thread Paul Iannetta via Gcc-patches
On Mon, Oct 10, 2022 at 11:07:06PM +, Joseph Myers wrote:
> On Mon, 10 Oct 2022, Paul Iannetta via Gcc-patches wrote:
> 
> > I have a patch to bring this feature to the C front-end as well, and
> > would like to hear your opinion on it, especially since it may affect
> > the feature-set of the objc front-end as well.
> 
> > Currently, this is only a tentative patch and I did not add any tests
> > to the testsuite.
> 
> I think tests (possibly existing C++ tests moved to c-c++-common?) are 
> necessary to judge such a feature; it could better be judged based on 
> tests without implementation than based on implementation without tests.

Currently, this feature has the following tests in g++.dg/ext/
  - vector9.C
  - vector19.C
  - vector21.C
  - vector22.C
  - vector23.C
  - vector27.C
  - vector28.C
provided by Marc Glisse when he implemented the feature for C++.

They are all handled by my mirror implementation (after removing
C++-only features), save for a case in vector19.C ( v ? '1' : '2',
where v is a vector of unsigned char, but '1' and '2' are considered
as int, which results in a type mismatch.)
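
A reduced form of that case (my reduction, not the testsuite file): the
character literals have type int, so an explicit conversion to the
element type makes it through:

  typedef unsigned char v4uc __attribute__((__vector_size__(4)));

  v4uc g (v4uc v)
  {
    /* return v ? '1' : '2';  -- '1'/'2' are int, elements are unsigned char */
    return v ? (unsigned char)'1' : (unsigned char)'2';
  }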

I'll move those tests to c-c++-common tomorrow, but will duplicate
vector19.C and vector23.C which rely on C++-only features.

During my tests, I've been using variations around this:

typedef int v2si __attribute__((__vector_size__ (2 * sizeof(int))));

v2si f (v2si a, v2si b, v2si c)
{
  v2si d = a + !b;
  v2si e = a || b;
  return c ? (a + !b) && (c - e && a) : (!!b ^ c && e);
}

It is already possible to express much of the same thing without the
syntactic sugar (each comparison yields 0 or -1 per lane, so !b becomes
b == 0 and x && y becomes (x != 0) & (y != 0)), but it is barely legible:

typedef int v2si __attribute__((__vector_size__ (2 * sizeof(int))));

v2si f (v2si a, v2si b, v2si c)
{
  v2si d = a + (b == 0);
  v2si e = (a != 0) | (b != 0);
  return (((c != 0) & (((a + (b == 0)) != 0) & (((c - e) != 0) & (a != 0))))
	  | ((c == 0) & (((((b == 0) == 0) ^ c) != 0) & (e != 0))));
}

Paul






Re: [PATCH] coroutines: Use cp_build_init_expr consistently.

2022-10-11 Thread Jason Merrill via Gcc-patches

On 10/11/22 18:17, Iain Sandoe wrote:

Hi Jason


On 11 Oct 2022, at 23:06, Jason Merrill  wrote:

On 10/11/22 17:58, Iain Sandoe wrote:

Tested on x86_64-darwin19, OK for master?
thanks
Iain
-- >8 --
Now that we have the TARGET_EXPR_ELIDING_P flag, it is important to ensure it
is set properly on TARGET_EXPRs.  The code here has a mixture of APIs used
to build inits.  This patch changes that to use cp_build_init_expr() where
possible, since that handles setting the flag.
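
Concretely, the conversion is of this shape (taken from the hunks below):

  -  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
  -                            tf_warning_or_error);
  +  o = cp_build_init_expr (loc, e_proxy, o);

cp_build_init_expr builds the INIT_EXPR directly and marks a TARGET_EXPR
initializer with TARGET_EXPR_ELIDING_P.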


Hmm, I wouldn't expect this to be necessary, since cp_build_modify_expr calls 
cp_build_init_expr for INIT_EXPR.


It seems that path is only taken if the init is a CONSTRUCTOR.


If it's not a CONSTRUCTOR it builds a constructor call, which should end 
up having the same effect.  Certainly cp_build_init_expr is simpler if 
it works, so the patch is OK, but if it's necessary that indicates a bug 
that should be fixed.



  Perhaps the similarity of the function names is a trap...


Not sure exactly what trap you mean.


The names are close, but cp_build_modify_expr is a lot more complicated; 
it handles conversions and overloaded operators.



It seems that on my local set of additional tests (for some of the open
PRs) there are some progressions from this change; I will have to track
down which change is the cause.

thanks
Iain




---
  gcc/cp/coroutines.cc | 25 -
  1 file changed, 8 insertions(+), 17 deletions(-)
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 01a3e831ee5..20d309445eb 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1025,8 +1025,7 @@ build_co_await (location_t loc, tree a, 
suspend_point_kind suspend_kind)
else
  {
e_proxy = get_awaitable_var (suspend_kind, o_type);
-  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
-   tf_warning_or_error);
+  o = cp_build_init_expr (loc, e_proxy, o);
  }
  /* I suppose we could check that this is contextually convertible to 
bool.  */
@@ -2889,8 +2888,7 @@ flatten_await_stmt (var_nest_node *n, hash_set 
*promoted,
  gcc_checking_assert (!already_present);
  tree inner = TREE_OPERAND (init, 1);
  gcc_checking_assert (TREE_CODE (inner) != COND_EXPR);
- init = cp_build_modify_expr (input_location, var, INIT_EXPR, init,
-  tf_warning_or_error);
+ init = cp_build_init_expr (var, init);
  /* Simplify for the case that we have an init containing the temp
 alone.  */
  if (t == n->init && n->var == NULL_TREE)
@@ -3656,8 +3654,7 @@ await_statement_walker (tree *stmt, int *do_subtree, void 
*d)
if (TREE_CODE (cond_inner) == CLEANUP_POINT_EXPR)
  cond_inner = TREE_OPERAND (cond_inner, 0);
location_t sloc = EXPR_LOCATION (SWITCH_STMT_COND (sw_stmt));
-   tree new_s = cp_build_init_expr (sloc, newvar,
-cond_inner);
+   tree new_s = cp_build_init_expr (sloc, newvar, cond_inner);
finish_expr_stmt (new_s);
SWITCH_STMT_COND (sw_stmt) = newvar;
/* Now add the switch statement with the condition re-
@@ -4902,9 +4899,8 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
  if (flag_exceptions)
{
  /* This var is now live.  */
- r = build_modify_expr (fn_start, parm.guard_var,
-boolean_type_node, INIT_EXPR, fn_start,
-boolean_true_node, boolean_type_node);
+ r = cp_build_init_expr (fn_start, parm.guard_var,
+ boolean_true_node);
  finish_expr_stmt (r);
}
}
@@ -4948,9 +4944,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
r = coro_build_cvt_void_expr_stmt (r, fn_start);
finish_expr_stmt (r);
  -  r = build_modify_expr (fn_start, coro_promise_live, boolean_type_node,
-INIT_EXPR, fn_start, boolean_true_node,
-boolean_type_node);
+  r = cp_build_init_expr (fn_start, coro_promise_live, boolean_true_node);
finish_expr_stmt (r);
  promise_dtor
@@ -5031,8 +5025,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   NULL_TREE);
add_decl_expr (gro);
gro_bind_vars = gro;
-  r = cp_build_modify_expr (input_location, gro, INIT_EXPR, get_ro,
-   tf_warning_or_error);
+  r = cp_build_init_expr (gro, get_ro);
/* The constructed object might require a cleanup.  */
if (TYPE_HAS_NONTRIVIAL_DESTRUCTOR (gro_type))
{
@@ -5053,9 +5046,7 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
   cleanup.  */
if (gro_ret_dtor)
  {
-   r = build_modify_expr (fn_start, coro_gro_liv

Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Vineet Gupta




On 10/11/22 13:46, Christoph Müllner wrote:

On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt  wrote:

On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the changes with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>>    (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering if this patchset is blocked on some technical grounds.

There's a v3 (though I can't find all of it, so not quite sure what
happened), but IIUC that still has the same fundamental problems that
all these have had: changing over to the new fence model may be an ABI
break and the split CAS implementation doesn't ensure eventual success
(see Jim's comments).  Not sure if there are other comments floating
around, though; that's just what I remember.


v3 was sent on May 27, 2022, when I rebased this on an internal tree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme 
register pressure instead of erroring out) as I thought that this was 
the blocker for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU 
Cauldron about this series, that the ABI break is the blocker.


Yeah I was confused about the ABI aspect as I didn't see any mention of 
that in the public reviews of v1 and v2.


My initial understanding was that fixing something broken cannot be an 
ABI break, and that the mismatch between the implementation in 2021 and 
the recommended mappings in the ratified specification from 2019 is 
something that is broken.  I still don't know the background here, but 
I guess this assumption is incorrect from a historical point of view.


However, I'm sure that I am not the only one who assumes that the mappings 
in the specification are implemented in compilers and tools. 
Therefore I still consider the implementation of the RISC-V atomics in 
GCC broken (at least w.r.t. the expectations of users who lack the 
historical background and just read the RISC-V specification).
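
For reference, the recommended mappings in question (quoting the spec's 
porting/mapping-guidelines table from memory, so treat this as a sketch 
rather than as what GCC emits today):

  atomic load  (seq_cst):   fence rw,rw ; l{b|h|w|d} ; fence r,rw
  atomic store (seq_cst):   fence rw,w  ; s{b|h|w|d}
  load-acquire:             l{b|h|w|d}  ; fence r,rw
  store-release:            fence rw,w  ; s{b|h|w|d}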



+Andrea, in case he has time to look at the memory model / ABI issues.

We'd still need to sort out the CAS issues, though, and it's not
abundantly clear it's worth the work: we're essentially constrained to
just emitting those fixed CAS sequences due to the eventual success
rules, so it's not clear what the benefit of splitting those up is.
With WRS there are some routines we might want to generate code for
(cond_read_acquire() in Linux, for example) but we'd really need to dig
into those to see if it's even sane/fast.

There's another patch set to fix the lack of inline atomic routines
without breaking stuff; there were some minor comments from Kito, and
IIRC I had some test failures that I needed to chase down as well.
That's a much safer fix in the short term; we'll need to deal with this
eventually, but at least we can stop the libatomic issues for the
distro folks.


I expect that the pressure for a proper fix upstream (instead of a 
backward compatible compromise) will increase over time (once people 
start building big iron based on RISC-V and start hunting performance 
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI 
by a command line switch and promote its use. And at one point in the 
future (if there are enough fixes to justify a break) the new ABI can 
be enabled by default with a new flag to enable the old ABI.


Indeed we are stuck with inefficiencies with the status quo. The new ABI 
option sounds like a reasonable plan going forward.


Also my understanding is that while the considerations are ABI-centric, the 
option to facilitate this need not be tied to canonical -mabi=lp32, lp6

Re: [PATCH v2 00/10] [RISC-V] Atomics improvements [PR100265/PR100266]

2022-10-11 Thread Palmer Dabbelt

On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:



On 10/11/22 13:46, Christoph Müllner wrote:

On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt  wrote:

On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the changes with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>>    (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering if this patchset is blocked on some technical grounds.

There's a v3 (though I can't find all of it, so not quite sure what
happened), but IIUC that still has the same fundamental problems that
all these have had: changing over to the new fence model may be an ABI
break and the split CAS implementation doesn't ensure eventual success
(see Jim's comments).  Not sure if there are other comments floating
around, though; that's just what I remember.


v3 was sent on May 27, 2022, when I rebased this on an internal tree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme 
register pressure instead of erroring out) as I thought that this was 
the blocker for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU 
Cauldron about this series, that the ABI break is the blocker.


Yeah I was confused about the ABI aspect as I didn't see any mention of 
that in the public reviews of v1 and v2.


Sorry, I thought we'd talked about it somewhere, but it must have just 
been in meetings and such.  Patrick was writing a similar patch set 
around the same time, so it probably just got tied up in that; we ended 
up reducing it to just the strong CAS inline stuff because we couldn't 
sort out the correctness of the rest of it.


My initial understanding was that fixing something broken cannot be an 
ABI break, and that the mismatch between the implementation in 2021 and 
the recommended mappings in the ratified specification from 2019 is 
something that is broken.  I still don't know the background here, but 
I guess this assumption is incorrect from a historical point of view.


We agreed that we wouldn't break binaries back when we submitted the 
port.  The ISA has changed many times since then, including adding the 
recommended mappings, but those binaries exist and we can't just 
silently break things for users.


However, I'm sure that I am not the only one who assumes that the mappings 
in the specification are implemented in compilers and tools. 
Therefore I still consider the implementation of the RISC-V atomics in 
GCC broken (at least w.r.t. the expectations of users who lack the 
historical background and just read the RISC-V specification).


You can't just read one of those RISC-V PDFs and assume that 
implementations that match those words will function correctly.  Those 
words regularly change in ways where reasonable readers would end up 
with incompatible implementations due to those differences.  That's why 
we're so explicit about versions and such these days; we're just getting 
burned by these old mappings because they're from back when we thought 
the RISC-V definition of compatibility was going to match the more 
common one and we didn't build in fallbacks.



+Andrea, in case he has time to look at the memory model / ABI issues.

We'd still need to sort out the CAS issues, though, and it's not
abundantly clear it's worth the work: we're essentially constrained to
just emitting those fixed CAS sequences due to the eventual success
rules, so it's not clear what the benefit of splitting those up is.
With WRS there are some routines we might want to generate code for
(cond_read_acquire() in Linux, for example) but we'd really need to dig
into those to see if it's even sane/fast.

There's another patch 

RE: [PATCH] [x86] Add define_insn_and_split to support general version of "kxnor".

2022-10-11 Thread Liu, Hongtao via Gcc-patches



> -Original Message-
> From: Jakub Jelinek 
> Sent: Tuesday, October 11, 2022 9:59 PM
> To: Liu, Hongtao 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] [x86] Add define_insn_and_split to support general
> version of "kxnor".
> 
> On Tue, Oct 11, 2022 at 04:03:16PM +0800, liuhongt via Gcc-patches wrote:
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (*notxor_1): New post_reload
> > define_insn_and_split.
> > (*notxorqi_1): Ditto.
> 
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -10826,6 +10826,39 @@ (define_insn "*_1"
> > (set_attr "type" "alu, alu, msklog")
> > (set_attr "mode" "")])
> >
> > +(define_insn_and_split "*notxor_1"
> > +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,?k")
> > +   (not:SWI248
> > + (xor:SWI248
> > +   (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
> > +   (match_operand:SWI248 2 "" "r,,k"
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "ix86_binary_operator_ok (XOR, mode, operands)"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(parallel
> > +[(set (match_dup 0)
> > + (xor:SWI248 (match_dup 1) (match_dup 2)))
> > + (clobber (reg:CC FLAGS_REG))])
> > +   (set (match_dup 0)
> > +   (not:SWI248 (match_dup 1)))]
> > +{
> > +  if (MASK_REGNO_P (REGNO (operands[0])))
> 
> This causes --enable-checking=yes,rtl,extra regression on
> gcc.dg/store_merging_13.c test on x86_64-linux:
> .../gcc/testsuite/gcc.dg/store_merging_13.c: In function 'f13':
> .../gcc/testsuite/gcc.dg/store_merging_13.c:189:1: internal compiler error: 
> RTL
> check: expected code 'reg', have 'mem' in rhs_regno, at rtl.h:1932 0x7b0c8f
> rtl_check_failed_code1(rtx_def const*, rtx_code, char const*, int, char 
> const*)
> ../../gcc/rtl.cc:916
> 0x8e74be rhs_regno
> ../../gcc/rtl.h:1932
> 0x9785fd rhs_regno
> ./genrtl.h:120
> 0x9785fd gen_split_260(rtx_insn*, rtx_def**)
> ../../gcc/config/i386/i386.md:10846
> 0x23596dc split_insns(rtx_def*, rtx_insn*)
> ../../gcc/config/i386/i386.md:16392
> 0xfccd5a try_split(rtx_def*, rtx_insn*, int)
> ../../gcc/emit-rtl.cc:3799
> 0x132e9d8 split_insn
> ../../gcc/recog.cc:3384
> 0x13359d5 split_all_insns()
> ../../gcc/recog.cc:3488
> 0x1335ae8 execute
> ../../gcc/recog.cc:4412
> Please submit a full bug report, with preprocessed source (by using -freport-
> bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
> 
> Fixed thusly, tested on x86_64-linux, committed to trunk as obvious.
Thanks.
> 
> 2022-10-11  Jakub Jelinek  
> 
>   PR target/107185
>   * config/i386/i386.md (*notxor_1): Use MASK_REG_P (x)
> instead of
>   MASK_REGNO_P (REGNO (x)).
> 
> --- gcc/config/i386/i386.md.jj  2022-10-11 12:10:42.188891134 +0200
> +++ gcc/config/i386/i386.md   2022-10-11 15:47:45.531449089 +0200
> @@ -10843,7 +10843,7 @@ (define_insn_and_split "*notxor_1"
> (set (match_dup 0)
>   (not:SWI248 (match_dup 0)))]
>  {
> -  if (MASK_REGNO_P (REGNO (operands[0])))
> +  if (MASK_REG_P (operands[0]))
>  {
>emit_insn (gen_kxnor (operands[0], operands[1], operands[2]));
>DONE;
> 
> 
>   Jakub
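
For context on why the fix works (assuming the usual shapes of these
macros in i386.h; a sketch, not a quote of the header):

  /* MASK_REGNO_P takes a register number...  */
  #define MASK_REGNO_P(N) IN_RANGE ((N), FIRST_MASK_REG, LAST_MASK_REG)
  /* ...while MASK_REG_P guards with REG_P before taking REGNO, so it is
     safe on the memory alternative of operand 0.  */
  #define MASK_REG_P(X) (REG_P (X) && MASK_REGNO_P (REGNO (X)))

Calling REGNO on a MEM is what trips the RTL checking ICE above.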



[PATCH] RISC-V: Add new line at end of file.

2022-10-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-c.cc: Add new line.
* config/riscv/riscv_vector.h (vwrite_csr): Add new line.

---
 gcc/config/riscv/riscv-c.cc | 2 +-
 gcc/config/riscv/riscv_vector.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 6fe4a8aeacf..57eeaebc582 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -195,4 +195,4 @@ void
 riscv_register_pragmas (void)
 {
   c_register_pragma ("riscv", "intrinsic", riscv_pragma_intrinsic);
-}
\ No newline at end of file
+}
diff --git a/gcc/config/riscv/riscv_vector.h b/gcc/config/riscv/riscv_vector.h
index 85cc656bc41..1efe3f888b5 100644
--- a/gcc/config/riscv/riscv_vector.h
+++ b/gcc/config/riscv/riscv_vector.h
@@ -97,4 +97,4 @@ vwrite_csr(enum RVV_CSR csr, unsigned long value)
 }
 #endif // __cplusplus
 #endif // __riscv_vector
-#endif // __RISCV_VECTOR_H
\ No newline at end of file
+#endif // __RISCV_VECTOR_H
-- 
2.36.1


