date:20240229

[PATCH] c++, v2: Fix up decltype of non-dependent structured binding decl in template [PR92687]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 12:50:47PM +0100, Jakub Jelinek wrote:
> finish_decltype_type uses DECL_HAS_VALUE_EXPR_P (expr) check for
> DECL_DECOMPOSITION_P (expr) to determine if it is
> array/struct/vector/complex etc. subobject proxy case vs. structured
> binding using std::tuple_{size,element}.
> For non-templates or when templates are already instantiated, that works
> correctly, finalized DECL_DECOMPOSITION_P non-base vars indeed have
> DECL_VALUE_EXPR in the former case and don't have it in the latter.
> It works fine for dependent structured bindings as well, cp_finish_decomp in
> that case creates DECLTYPE_TYPE tree and defers the handling until
> instantiation.
> As the testcase shows, this doesn't work for the non-dependent structured
> binding case in templates, because DECL_HAS_VALUE_EXPR_P is set in that case
> always; cp_finish_decomp ends with:
>   if (processing_template_decl)
>     {
>       for (unsigned int i = 0; i < count; i++)
>         if (!DECL_HAS_VALUE_EXPR_P (v[i]))
>           {
>             tree a = build_nt (ARRAY_REF, decl, size_int (i),
>                                NULL_TREE, NULL_TREE);
>             SET_DECL_VALUE_EXPR (v[i], a);
>             DECL_HAS_VALUE_EXPR_P (v[i]) = 1;
>           }
>     }
> and those artificial ARRAY_REFs are used in various places during
> instantiation to find out what base the DECL_DECOMPOSITION_P VAR_DECLs
> have and their positions.
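
For readers less familiar with the two cases distinguished above, here is a minimal sketch (not the PR testcase; `Pair` and `check_binding_types` are hypothetical names) of what decltype yields for a structured binding outside a template. The patch is about getting the same answers when equivalent non-dependent code appears inside a template.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical aggregate, decomposed member by member
// (the "subobject proxy" case).
struct Pair { int first; long second; };

// Returns true when the writes through the bindings landed where expected;
// the type checks themselves are static_asserts.
inline bool check_binding_types() {
  Pair p{1, 2L};
  auto& [a, b] = p;                 // a and b name p.first and p.second
  static_assert(std::is_same_v<decltype(a), int>);
  static_assert(std::is_same_v<decltype(b), long>);

  int storage = 3;
  std::tuple<int&, long> t{storage, 4L};
  auto [c, d] = t;                  // uses std::tuple_size / std::tuple_element
  // In the tuple case decltype gives std::tuple_element<I, E>::type.
  static_assert(std::is_same_v<decltype(c), int&>);
  static_assert(std::is_same_v<decltype(d), long>);

  a = 10;                           // writes p.first
  c = 30;                           // writes storage through the int& element
  (void) b;
  (void) d;
  return p.first == 10 && storage == 30;
}
```

This needs C++17 for the bindings and for std::is_same_v.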

> Another option would be to change
>  tree
>  lookup_decomp_type (tree v)
>  {
> -  return *decomp_type_table->get (v);
> +  if (tree *slot = decomp_type_table->get (v))
> +return *slot;
> +  return NULL_TREE;
>  }
> 
> and in finish_decl_decomp either just in the ptds.saved case or always
> try to lookup_decomp_type, if it returns non-NULL, return what it returned,
> otherwise return unlowered_expr_type (expr).  I guess it would be cleaner,
> I thought it would be more costly due to the hash table lookup, but now that
> I think about it again, DECL_VALUE_EXPR is a hash table lookup as well.
> So maybe then
> +      if (ptds.saved)
> +        {
> +          gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (expr));
> +          /* DECL_HAS_VALUE_EXPR_P is always set if
> +             processing_template_decl.  If lookup_decomp_type
> +             returns non-NULL, it is the tuple case.  */
> +          if (tree ret = lookup_decomp_type (expr))
> +            return ret;
> +        }
>        if (DECL_HAS_VALUE_EXPR_P (expr))
>          /* Expr is an array or struct subobject proxy, handle
>             bit-fields properly.  */
>          return unlowered_expr_type (expr);
>        else
>          /* Expr is a reference variable for the tuple case.  */
>          return lookup_decomp_type (expr);

Here is a variant of the patch which does that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Or the other version, or adding some flag to the DECL_DECOMPOSITION_P
decls?
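
The lookup change can be pictured with a small standalone sketch. It assumes nothing about GCC's hash table API: `TypeTable`, `lookup_type` and `type_of` are hypothetical stand-ins built on std::unordered_map, and only the shape of the logic (return null when there is no entry, let the caller fall back) follows the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a side table keyed by declaration name.
using TypeTable = std::unordered_map<std::string, std::string>;

// Return a pointer to the recorded entry, or nullptr when there is none,
// instead of dereferencing the lookup result unconditionally.
inline const std::string* lookup_type(const TypeTable& table,
                                      const std::string& decl) {
  auto it = table.find(decl);
  return it == table.end() ? nullptr : &it->second;
}

// Caller side: prefer the table entry (the "tuple case" in the mail),
// otherwise use the fallback computed some other way.
inline std::string type_of(const TypeTable& table, const std::string& decl,
                           const std::string& fallback) {
  if (const std::string* found = lookup_type(table, decl))
    return *found;
  return fallback;
}
```

The design point is that a missing entry becomes an ordinary, testable result rather than a null dereference, so the presence of an entry can itself be used to tell the two cases apart.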

2024-03-01  Jakub Jelinek  

PR c++/92687
* decl.cc (lookup_decomp_type): Return NULL_TREE if decomp_type_table
doesn't have entry for V.
* semantics.cc (finish_decltype_type): If ptds.saved, assert
DECL_HAS_VALUE_EXPR_P is true and decide on tuple vs. non-tuple based
on if lookup_decomp_type is NULL or not.

* g++.dg/cpp1z/decomp59.C: New test.

--- gcc/cp/decl.cc.jj   2024-02-28 23:20:01.004751204 +0100
+++ gcc/cp/decl.cc  2024-02-29 20:03:11.087218176 +0100
@@ -9262,7 +9262,9 @@ static GTY((cache)) decl_tree_cache_map
 tree
 lookup_decomp_type (tree v)
 {
-  return *decomp_type_table->get (v);
+  if (tree *slot = decomp_type_table->get (v))
+    return *slot;
+  return NULL_TREE;
 }
 
 /* Mangle a decomposition declaration if needed.  Arguments like
--- gcc/cp/semantics.cc.jj  2024-02-28 22:57:08.101800588 +0100
+++ gcc/cp/semantics.cc 2024-02-29 20:04:51.936880622 +0100
@@ -11804,6 +11804,15 @@ finish_decltype_type (tree expr, bool id
 access expression).  */
   if (DECL_DECOMPOSITION_P (expr))
{
+ if (ptds.saved)
+   {
+ gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (expr));
+ /* DECL_HAS_VALUE_EXPR_P is always set if
+processing_template_decl.  If lookup_decomp_type
+returns non-NULL, it is the tuple case.  */
+ if (tree ret = lookup_decomp_type (expr))
+   return ret;
+   }
  if (DECL_HAS_VALUE_EXPR_P (expr))
/* Expr is an array or struct subobject proxy, handle
   bit-fields properly.  */
--- gcc/testsuite/g++.dg/cpp1z/decomp59.C.jj	2024-02-29 20:02:17.467929327 +0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp59.C	2024-02-29 20:02:17.467929327 +0100
@@ -0,0 +1,63 @@
+// PR c++/92687
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+namespace std {
+  template struct tuple_size;
+  template struct tuple_element;
+}
+
+struct A {
+  int i;
+  template  int& get() {

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Jakub Jelinek

On Fri, Mar 01, 2024 at 01:53:54AM -0300, Alexandre Oliva wrote:
> On Feb 27, 2024, Richard Earnshaw  wrote:
> 
> > This one has been festering for a while; both Alexandre and Torbjorn
> > have attempted to fix it recently, but I'm not sure either is really
> > right...
> 
> *nod* xref https://gcc.gnu.org/pipermail/gcc-patches/2024-March/646926.html
> The patch I proposed was indeed far too limited in scope.
> 
> > On Arm this is causing all anonymous arguments to be passed on the
> > stack, which is incorrect per the ABI.  On a target that uses
> > 'pretend_outgoing_vararg_named', why is it correct to set n_named_args
> > to zero?  Is it enough to guard both the statements you've added with
> > !targetm.calls.pretend_outgoing_args_named?
> 
> ISTM that the change you suggest over Jakub's patch would address the
> inconsistency on ARM.

At least in my understanding, the only part of my patch that was being
discussed was the !strict_argument_naming && !pretend_outgoing_args_named
case with structure_value_addr_parm, I don't see how that would affect
ARM, given that it is a !strict_argument_naming && pretend_outgoing_args_named
target.  In that case with the patch as posted n_named_args will be
structure_value_addr_parm before INIT_CUMULATIVE_ARGS and num_actuals
afterwards, I don't see any disagreement on that.

Jakub

Re:[PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-02-29 Thread 钟居哲

Hi, han. My comment on this patch is the same as the one I gave on:

[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm



--Original--
From: "demin.han"

Re:[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-02-29 Thread 钟居哲

Hi, han. I understand you are trying to optimize vector-splat_vector
operations into vector-scalar ones in the "expand" stage, that is,


vv -> vx or vv -> vf.


This is an issue we have known about for a long time.


This patch transforms vv into vf when the splat vector is duplicated
from a constant (by recognizing that it is a CONST_VECTOR in the expand stage),
but it can't transform vv into vf when the splat vector is duplicated from a
register.


For example, in a[i] = b[i]  x ? c[i] : d[i], x is a register, so this
case cannot be optimized by your patch.
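
To make the two situations concrete, here is a sketch of the kind of source loops in question. The function names are hypothetical, and the `>` operator is an assumption, since the comparison operator did not survive in the example above. Which instructions a given compiler emits for these loops is not claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Comparison against a literal: the splatted operand is a constant.
inline void select_against_constant(std::vector<float>& a,
                                    const std::vector<float>& b,
                                    const std::vector<float>& c,
                                    const std::vector<float>& d) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = b[i] > 2.5f ? c[i] : d[i];
}

// Comparison against a run-time value: the splatted operand is only
// available in a register.
inline void select_against_variable(std::vector<float>& a,
                                    const std::vector<float>& b,
                                    const std::vector<float>& c,
                                    const std::vector<float>& d,
                                    float x) {
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = b[i] > x ? c[i] : d[i];
}
```

All four vectors are assumed to have the same length.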


Actually, we have a solution that does all possible transformations (including the
case I mentioned above) from vv to vx or vf: the late-combine pass
contributed by Richard Sandiford of Arm:
https://patchwork.ozlabs.org/project/gcc/patch/mptr0ljn9eh@arm.com/
You can apply that patch and experiment with it locally.


I believe it will land in GCC 15, so I don't think we need this patch
to do the optimization.


Thanks.

--Original--
From: "demin.han"

Re:[PATCH 4/5] RISC-V: Remove integer vector eqne pattern

2024-02-29 Thread 钟居哲

Hi, han. My review comment on this patch is the same as what I said in:



[PATCH 1/5] RISC-V: Remove float vector eqne pattern



--Original--
From: "demin.han"

Re:[PATCH 2/5] RISC-V: Refactor expand_vec_cmp

2024-02-29 Thread 钟居哲

LGTM, but please add [NFC] to the title of this patch when committing:


RISC-V: Refactor expand_vec_cmp [NFC]


--Original--
From: "demin.han"

Re:[PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-02-29 Thread 钟居哲

Hello, han. Thanks for trying to optimize the code.


But I believe the vector-scalar patterns (eq/ne) you remove in this patch are
necessary.


This is the story:
1. For commutative RTL codes in GCC, such as plus, eq, ne, etc.,
  we know that semantically both (eq: (reg) (vec_duplicate ...)) and
(eq: (vec_duplicate ...) (reg)) are right.
  However, as I remember, GCC prefers this order: (eq:
(vec_duplicate ...) (reg)).


2. Before this patch, the operand order of the comparisons is as follows (taking eq and lt as
an example):
 
  1). (eq: (vec_duplicate ...) (reg)) -- commutative
  2). (lt: (reg) (vec_duplicate ...)) -- non-commutative
 
 The operand orders of these patterns are different.
 
 So, as you see, we have dedicated patterns (which look like duplicates)
for vector-scalar eq/ne, whereas we unify eq/ne with the other comparisons for
vector-vector instructions.
 If we unify eq/ne with the other comparisons for vector-scalar
instructions (as your patch does), we will end up with:
 
 (eq: (reg) (vec_duplicate ...)) [after this patch] instead of (eq:
(vec_duplicate ...) (reg)) [before this patch].


So, I think this patch may not be right.
I may be wrong; Robin/Jeff/kito, feel free to correct me if I am.
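
The ordering argument can be modelled with a toy sketch. Everything in it is hypothetical (the enums, `precedence`, `canonicalize`), and it is not GCC's implementation. It only assumes, as recalled above, that for commutative codes the preferred order puts the vec_duplicate operand first, while non-commutative codes keep their operands as written.

```cpp
#include <cassert>
#include <utility>

enum class Code { EQ, NE, LT, GT };         // hypothetical subset of codes
enum class Operand { Reg, VecDuplicate };   // hypothetical operand kinds

inline bool is_commutative(Code c) {
  return c == Code::EQ || c == Code::NE;
}

// Assumed ranking for this sketch only: vec_duplicate sorts before reg.
inline int precedence(Operand o) {
  return o == Operand::VecDuplicate ? 1 : 0;
}

struct Cmp {
  Code code;
  Operand lhs;
  Operand rhs;
};

// Swap operands only for commutative codes, and only when the
// lower-precedence operand currently comes first.
inline Cmp canonicalize(Cmp c) {
  if (is_commutative(c.code) && precedence(c.lhs) < precedence(c.rhs))
    std::swap(c.lhs, c.rhs);
  return c;
}
```

Under this model an eq written as (reg, vec_duplicate) is reordered to (vec_duplicate, reg), whereas an lt in the same form stays put, which is why a single pattern shape would not cover both.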


--Original--
From: "demin.han"

RE: [PATCH 0/5] RISC-V: Support vf and vx for autovec comparison of

2024-02-29 Thread Demin Han

Sorry for the unexpected truncation.

Hi,
vf and vx are not well supported when comparing a vector with an
immediate in the current autovec.
For example, the following instructions are generated for float types:
flw
vsetvli
vfmv.v.f
...
vmfxx.vv
Two issues:
  1. Additional vsetvl and vfmv instructions
  2. One vector register is occupied, which may result in a smaller lmul

We expect:
flw
...
vmfxx.vf

To simplify supporting vx and vf, two refactorings are done first.
1. Remove the eqne patterns; was there any special case or reason for eqne when it was first added?
2. Refactor duplicate code.

[PATCH 4/5] RISC-V: Remove integer vector eqne pattern

2024-02-29 Thread demin.han

We can unify eqne and other comparison operations.

Tested on RV32 and RV64.

gcc/ChangeLog:

* config/riscv/predicates.md (comparison_except_eqge_operator): Only
  exclude ge
(comparison_except_ge_operator): Ditto
* config/riscv/riscv-string.cc (expand_rawmemchr): Use cmp pattern
(expand_strcmp): Ditto
* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove eqne
  patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto
(*pred_eqne_extended_scalar_merge_tie_mask): Ditto
(*pred_eqne_extended_scalar): Ditto
(*pred_eqne_extended_scalar_narrow): Ditto

Signed-off-by: demin.han 
---
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-string.cc  |   4 +-
 .../riscv/riscv-vector-builtins-bases.cc  |   3 -
 gcc/config/riscv/vector.md| 279 +-
 4 files changed, 15 insertions(+), 275 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 6c87a7bd1f4..7f144551bb2 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -548,8 +548,8 @@ (define_predicate "ltge_operator"
 (define_predicate "comparison_except_ltge_operator"
   (match_code "eq,ne,le,leu,gt,gtu"))
 
-(define_predicate "comparison_except_eqge_operator"
-  (match_code "le,leu,gt,gtu,lt,ltu"))
+(define_predicate "comparison_except_ge_operator"
+  (match_code "eq,ne,le,leu,gt,gtu,lt,ltu"))
 
 (define_predicate "ge_operator"
   (match_code "ge,geu"))
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b09b51d7526..da33bd74ac6 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1074,7 +1074,7 @@ expand_rawmemchr (machine_mode mode, rtx dst, rtx haystack, rtx needle,
   /* Compare needle with haystack and store in a mask.  */
   rtx eq = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, needle), vec);
   rtx vmsops[] = {mask, eq, vec, needle};
-  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+  emit_nonvlmax_insn (code_for_pred_cmp_scalar (vmode),
  riscv_vector::COMPARE_OP, vmsops, cnt);
 
   /* Find the first bit in the mask.  */
@@ -1200,7 +1200,7 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
 = gen_rtx_EQ (mask_mode, gen_const_vec_duplicate (vmode, CONST0_RTX (mode)),
  vec1);
   rtx vmsops1[] = {mask0, eq0, vec1, CONST0_RTX (mode)};
-  emit_nonvlmax_insn (code_for_pred_eqne_scalar (vmode),
+  emit_nonvlmax_insn (code_for_pred_cmp_scalar (vmode),
  riscv_vector::COMPARE_OP, vmsops1, cnt);
 
   /* Look for vec1 != vec2 (includes vec2[i] == 0).  */
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d414721ede8..0cef0b91758 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -718,9 +718,6 @@ public:
  if (CODE == GE || CODE == GEU)
return e.use_compare_insn (CODE, code_for_pred_ge_scalar (
   e.vector_mode ()));
- else if (CODE == EQ || CODE == NE)
-   return e.use_compare_insn (CODE, code_for_pred_eqne_scalar (
-  e.vector_mode ()));
  else
return e.use_compare_insn (CODE, code_for_pred_cmp_scalar (
   e.vector_mode ()));
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 9210d7c28ad..544ca4af938 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -4671,7 +4671,7 @@ (define_expand "@pred_cmp_scalar"
 (match_operand 8 "const_int_operand")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operator: 3 "comparison_except_eqge_operator"
+ (match_operator: 3 "comparison_except_ge_operator"
 [(match_operand:V_VLSI_QHS 4 "register_operand")
  (vec_duplicate:V_VLSI_QHS
(match_operand: 5 "register_operand"))])
@@ -4689,7 +4689,7 @@ (define_insn "*pred_cmp_scalar_merge_tie_mask"
 (match_operand 7 "const_int_operand"  "  i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operator: 2 "comparison_except_eqge_operator"
+ (match_operator: 2 "comparison_except_ge_operator"
 [(match_operand:V_VLSI_QHS 3 "register_operand"   " vr")
  (vec_duplicate:V_VLSI_QHS
(match_operand: 4 "register_operand"  "  r"))])
@@ -4714,7 +4714,7 @@ (define_insn "*pred_cmp_scalar"
 (match_operand 8 "const_int_operand" "i,i,
i,i")

[PATCH 3/5] RISC-V: Support vmfxx.vf for autovec comparison of vec and imm

2024-02-29 Thread demin.han

Currently, the following instructions are generated by autovectorization:
flw
vsetvli
vfmv.v.f
...
vmfxx.vv
Two issues:
  1. Additional vsetvl and vfmv instructions
  2. One vector register is occupied, which may result in a smaller lmul

We expect:
flw
...
vmfxx.vf

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/autovec.md: Accept imm
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select scalar pattern
(expand_vec_cmp): Ditto
* config/riscv/riscv.cc (riscv_const_insns): Exclude float mode

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Add new tests

Signed-off-by: demin.han 
---
 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv-v.cc   | 23 +
 gcc/config/riscv/riscv.cc |  2 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 34 +++
 4 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b32369f68c..6cfb0800c45 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -690,7 +690,7 @@ (define_expand "vec_cmp"
   [(set (match_operand: 0 "register_operand")
(match_operator: 1 "comparison_operator"
  [(match_operand:V_VLSF 2 "register_operand")
-  (match_operand:V_VLSF 3 "register_operand")]))]
+  (match_operand:V_VLSF 3 "nonmemory_operand")]))]
   "TARGET_VECTOR"
   {
 riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 14e75b9a117..2a188ac78e0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2610,9 +2610,15 @@ expand_vec_init (rtx target, rtx vals)
 /* Get insn code for corresponding comparison.  */
 
 static insn_code
-get_cmp_insn_code (rtx_code code, machine_mode mode)
+get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p)
 {
   insn_code icode;
+  if (FLOAT_MODE_P (mode))
+{
+  icode = !scalar_p ? code_for_pred_cmp (mode)
+   : code_for_pred_cmp_scalar (mode);
+  return icode;
+}
   switch (code)
 {
 case EQ:
@@ -2628,10 +2634,7 @@ get_cmp_insn_code (rtx_code code, machine_mode mode)
 case LTU:
 case GE:
 case GEU:
-  if (FLOAT_MODE_P (mode))
-   icode = code_for_pred_cmp (mode);
-  else
-   icode = code_for_pred_ltge (mode);
+  icode = code_for_pred_ltge (mode);
   break;
 default:
   gcc_unreachable ();
@@ -2757,7 +2760,6 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
 {
   machine_mode mask_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
-  insn_code icode = get_cmp_insn_code (code, data_mode);
 
   if (code == LTGT)
 {
@@ -2765,12 +2767,19 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
   rtx gt = gen_reg_rtx (mask_mode);
   expand_vec_cmp (lt, LT, op0, op1, mask, maskoff);
   expand_vec_cmp (gt, GT, op0, op1, mask, maskoff);
-  icode = code_for_pred (IOR, mask_mode);
+  insn_code icode = code_for_pred (IOR, mask_mode);
   rtx ops[] = {target, lt, gt};
   emit_vlmax_insn (icode, BINARY_MASK_OP, ops);
   return;
 }
 
+  rtx elt;
+  machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (op1));
+  bool scalar_p = const_vec_duplicate_p (op1, &elt) && FLOAT_MODE_P (data_mode);
+  if (scalar_p)
+op1 = force_reg (scalar_mode, elt);
+  insn_code icode = get_cmp_insn_code (code, data_mode, scalar_p);
+
   rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
   if (!mask && !maskoff)
 {
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..1ffe4865c19 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1760,7 +1760,7 @@ riscv_const_insns (rtx x)
   register vec_duplicate into vmv.v.x.  */
scalar_mode smode = GET_MODE_INNER (GET_MODE (x));
if (maybe_gt (GET_MODE_SIZE (smode), UNITS_PER_WORD)
-   && !immediate_operand (elt, Pmode))
+   && !FLOAT_MODE_P (smode) && !immediate_operand (elt, Pmode))
  return 0;
/* Constants from -16 to 15 can be loaded with vmv.v.i.
   The Wc0, Wc1 constraints are already covered by the
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
index 99a230d1c8a..7f6738518ee 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
@@ -141,6 +141,34 @@
 TEST_VAR_ALL (DEF_VCOND_VAR)
 TEST_IMM_ALL (DEF_VCOND_IMM)
 
+#define TEST_COND_IMM_FLOAT(T, COND, IMM, SUFFIX)  \
+  T (float, float, COND, IMM, SUFFIX##_float_float)\
+  T (double, double, COND, IMM, SUFFIX##_double_double)

[PATCH 2/5] RISC-V: Refactor expand_vec_cmp

2024-02-29 Thread demin.han

There are two expand_vec_cmp functions.
They have same structure and similar code.
We can use default arguments instead of overloading.

Tested on RV32 and RV64.
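
As an illustration of the refactoring idea only, here is a self-contained sketch with hypothetical names (`Operand`, `emit_compare`); it does not use any GCC types. One function with trailing defaulted parameters takes the place of two overloads, and a null check picks the unmasked or masked form.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an optional operand; nullptr means "not given".
using Operand = const char*;

// A single function with defaulted trailing parameters replaces an
// unmasked overload and a masked overload.
inline std::string emit_compare(Operand op0, Operand op1,
                                Operand mask = nullptr,
                                Operand maskoff = nullptr) {
  std::string insn = std::string("cmp ") + op0 + ", " + op1;
  if (!mask && !maskoff)
    return insn;                               // unmasked form
  return insn + ", mask=" + (mask ? mask : "-")
              + ", maskoff=" + (maskoff ? maskoff : "-");
}
```

Because defaulted parameters have to come last, callers of the former masked overload need their arguments reordered so that the mask operands follow op0 and op1.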

gcc/ChangeLog:

* config/riscv/riscv-protos.h (expand_vec_cmp): Change proto
* config/riscv/riscv-v.cc (expand_vec_cmp): Use default arguments
(expand_vec_cmp_float): Adapt arguments

Signed-off-by: demin.han 
---
 gcc/config/riscv/riscv-protos.h |  2 +-
 gcc/config/riscv/riscv-v.cc | 44 +++--
 2 files changed, 15 insertions(+), 31 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..b8735593805 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -603,7 +603,7 @@ bool simm5_p (rtx);
 bool neg_simm5_p (rtx);
 #ifdef RTX_CODE
 bool has_vi_variant_p (rtx_code, rtx);
-void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
+void expand_vec_cmp (rtx, rtx_code, rtx, rtx, rtx = nullptr, rtx = nullptr);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
 void expand_cond_len_unop (unsigned, rtx *);
 void expand_cond_len_binop (unsigned, rtx *);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 0cfbd21ce6f..14e75b9a117 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2752,7 +2752,8 @@ vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
 /* Expand an RVV comparison.  */
 
 void
-expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
+expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
+   rtx maskoff)
 {
   machine_mode mask_mode = GET_MODE (target);
   machine_mode data_mode = GET_MODE (op0);
@@ -2762,8 +2763,8 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
 {
   rtx lt = gen_reg_rtx (mask_mode);
   rtx gt = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (lt, LT, op0, op1);
-  expand_vec_cmp (gt, GT, op0, op1);
+  expand_vec_cmp (lt, LT, op0, op1, mask, maskoff);
+  expand_vec_cmp (gt, GT, op0, op1, mask, maskoff);
   icode = code_for_pred (IOR, mask_mode);
   rtx ops[] = {target, lt, gt};
   emit_vlmax_insn (icode, BINARY_MASK_OP, ops);
@@ -2771,33 +2772,16 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
 }
 
   rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
-  rtx ops[] = {target, cmp, op0, op1};
-  emit_vlmax_insn (icode, COMPARE_OP, ops);
-}
-
-void
-expand_vec_cmp (rtx target, rtx_code code, rtx mask, rtx maskoff, rtx op0,
-   rtx op1)
-{
-  machine_mode mask_mode = GET_MODE (target);
-  machine_mode data_mode = GET_MODE (op0);
-  insn_code icode = get_cmp_insn_code (code, data_mode);
-
-  if (code == LTGT)
+  if (!mask && !maskoff)
 {
-  rtx lt = gen_reg_rtx (mask_mode);
-  rtx gt = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (lt, LT, mask, maskoff, op0, op1);
-  expand_vec_cmp (gt, GT, mask, maskoff, op0, op1);
-  icode = code_for_pred (IOR, mask_mode);
-  rtx ops[] = {target, lt, gt};
-  emit_vlmax_insn (icode, BINARY_MASK_OP, ops);
-  return;
+  rtx ops[] = {target, cmp, op0, op1};
+  emit_vlmax_insn (icode, COMPARE_OP, ops);
+}
+  else
+{
+  rtx ops[] = {target, mask, maskoff, cmp, op0, op1};
+  emit_vlmax_insn (icode, COMPARE_OP_MU, ops);
 }
-
-  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
-  rtx ops[] = {target, mask, maskoff, cmp, op0, op1};
-  emit_vlmax_insn (icode, COMPARE_OP_MU, ops);
 }
 
 /* Expand an RVV floating-point comparison:
@@ -2875,7 +2859,7 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   else
{
  /* vmfeq.vvv0, vb, vb, v0.t  */
- expand_vec_cmp (eq0, EQ, eq0, eq0, op1, op1);
+ expand_vec_cmp (eq0, EQ, op1, op1, eq0, eq0);
}
   break;
 default:
@@ -2893,7 +2877,7 @@ expand_vec_cmp_float (rtx target, rtx_code code, rtx op0, rtx op1,
   if (code == ORDERED)
 emit_move_insn (target, eq0);
   else
-expand_vec_cmp (eq0, code, eq0, eq0, op0, op1);
+expand_vec_cmp (eq0, code, op0, op1, eq0, eq0);
 
   if (can_invert_p)
 {
-- 
2.43.2

[PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-02-29 Thread demin.han

Similar to previous float change, vmsxx.vx is needed.
1. Only those which can't match vi should use vx.
2. DImode is processed by sew64_scalar_helper.

Tested on RV32 and RV64.
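
The selection logic this patch ends up with can be summarised in a standalone sketch. The boolean parameters are hypothetical stand-ins (`has_vi` for has_vi_variant_p, `is_const_splat` for const_vec_duplicate_p), and the returned strings are just the pattern names as labels; this is a reading aid under those assumptions, not the code in riscv-v.cc.

```cpp
#include <cassert>
#include <string>

enum class Code { EQ, NE, LE, LEU, GT, GTU, LT, LTU, GE, GEU };

// Use the vector-scalar form for a splatted constant when the mode is
// floating point, or when the element is not DImode and no .vi form fits.
inline bool use_scalar_form(bool is_const_splat, bool is_float,
                            bool is_dimode, bool has_vi) {
  return is_const_splat && (is_float || (!is_dimode && !has_vi));
}

// Name of the pattern family chosen for a comparison.
inline std::string pick_pattern(Code code, bool is_float, bool scalar_p) {
  const bool ge = code == Code::GE || code == Code::GEU;
  const bool lt = code == Code::LT || code == Code::LTU;
  if (is_float)
    return scalar_p ? "pred_cmp_scalar" : "pred_cmp";
  if (scalar_p)
    return ge ? "pred_ge_scalar" : "pred_cmp_scalar";
  return (lt || ge) ? "pred_ltge" : "pred_cmp";
}
```

LTGT is left out on purpose, since expand_vec_cmp splits it into LT and GT before any pattern is picked.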

gcc/ChangeLog:

* config/riscv/riscv-v.cc (get_cmp_insn_code): Select scalar pattern
(expand_vec_cmp): Ditto

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Update expect

Signed-off-by: demin.han 
---
 gcc/config/riscv/riscv-v.cc   | 33 ---
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 14 ++--
 2 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2a188ac78e0..9b601a4a8ff 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2619,26 +2619,18 @@ get_cmp_insn_code (rtx_code code, machine_mode mode, bool scalar_p)
: code_for_pred_cmp_scalar (mode);
   return icode;
 }
-  switch (code)
+  if (scalar_p)
 {
-case EQ:
-case NE:
-case LE:
-case LEU:
-case GT:
-case GTU:
-case LTGT:
-  icode = code_for_pred_cmp (mode);
-  break;
-case LT:
-case LTU:
-case GE:
-case GEU:
-  icode = code_for_pred_ltge (mode);
-  break;
-default:
-  gcc_unreachable ();
+  if (code == GE || code == GEU)
+ icode = code_for_pred_ge_scalar (mode);
+  else
+ icode = code_for_pred_cmp_scalar (mode);
+  return icode;
 }
+  if (code == LT || code == LTU || code == GE || code == GEU)
+icode = code_for_pred_ltge (mode);
+  else
+icode = code_for_pred_cmp (mode);
   return icode;
 }
 
@@ -2775,7 +2767,10 @@ expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1, rtx mask,
 
   rtx elt;
   machine_mode scalar_mode = GET_MODE_INNER (GET_MODE (op1));
-  bool scalar_p = const_vec_duplicate_p (op1, &elt) && FLOAT_MODE_P (data_mode);
+  bool scalar_p
+    = const_vec_duplicate_p (op1, &elt)
+      && (FLOAT_MODE_P (data_mode)
+          || (scalar_mode != DImode && !has_vi_variant_p (code, elt)));
   if (scalar_p)
 op1 = force_reg (scalar_mode, elt);
   insn_code icode = get_cmp_insn_code (code, data_mode, scalar_p);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
index 7f6738518ee..e04c2a0cfbd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
@@ -180,9 +180,19 @@ TEST_IMM_FLOAT_ALL (DEF_VCOND_IMM)
 /* { dg-final { scan-assembler-times {\tvmseq} 78 } } */
 /* { dg-final { scan-assembler-times {\tvmsne} 78 } } */
 /* { dg-final { scan-assembler-times {\tvmsgt} 82 } } */
-/* { dg-final { scan-assembler-times {\tvmslt} 38 } } */
-/* { dg-final { scan-assembler-times {\tvmsge} 38 } } */
+/* { dg-final { scan-assembler-times {\tvmslt} 50 } } */
+/* { dg-final { scan-assembler-times {\tvmsge} 26 } } */
 /* { dg-final { scan-assembler-times {\tvmsle} 82 } } */
+/* { dg-final { scan-assembler-times {\tvmseq\.vx} 16 } } */
+/* { dg-final { scan-assembler-times {\tvmsne\.vx} 16 } } */
+/* { dg-final { scan-assembler-times {\tvmsgt\.vx} 4 } } */
+/* { dg-final { scan-assembler-times {\tvmsgtu\.vx} 14 } } */
+/* { dg-final { scan-assembler-times {\tvmslt\.vx} 24 } } */
+/* { dg-final { scan-assembler-times {\tvmsltu\.vx} 0 } } */
+/* { dg-final { scan-assembler-times {\tvmsge\.vx} 0 } } */
+/* { dg-final { scan-assembler-times {\tvmsgeu\.vx} 0 } } */
+/* { dg-final { scan-assembler-times {\tvmsle\.vx} 4 } } */
+/* { dg-final { scan-assembler-times {\tvmsleu\.vx} 14 } } */
 /* { dg-final { scan-assembler-times {\tvmfgt.vf} 6 } } */
 /* { dg-final { scan-assembler-times {\tvmflt.vf} 6 } } */
 /* { dg-final { scan-assembler-times {\tvmfge.vf} 6 } } */
-- 
2.43.2

[PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-02-29 Thread demin.han

We can unify eqne and other comparison operations.

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto

Signed-off-by: demin.han 
---
 .../riscv/riscv-vector-builtins-bases.cc  |  4 -
 gcc/config/riscv/vector.md| 86 ---
 2 files changed, 90 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index b6f6e4ff37e..d414721ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1420,10 +1420,6 @@ public:
 switch (e.op_info->op)
   {
case OP_TYPE_vf: {
- if (CODE == EQ || CODE == NE)
-   return e.use_compare_insn (CODE, code_for_pred_eqne_scalar (
-  e.vector_mode ()));
- else
return e.use_compare_insn (CODE, code_for_pred_cmp_scalar (
   e.vector_mode ()));
}
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ab6e099852d..9210d7c28ad 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7520,92 +7520,6 @@ (define_insn "*pred_cmp_scalar_narrow"
(set_attr "mode" "")
(set_attr "spec_restriction" "none,thv,thv,none,none")])
 
-(define_expand "@pred_eqne_scalar"
-  [(set (match_operand: 0 "register_operand")
-   (if_then_else:
- (unspec:
-   [(match_operand: 1 "vector_mask_operand")
-(match_operand 6 "vector_length_operand")
-(match_operand 7 "const_int_operand")
-(match_operand 8 "const_int_operand")
-(reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operator: 3 "equality_operator"
-[(vec_duplicate:V_VLSF
-   (match_operand: 5 "register_operand"))
- (match_operand:V_VLSF 4 "register_operand")])
- (match_operand: 2 "vector_merge_operand")))]
-  "TARGET_VECTOR"
-  {})
-
-(define_insn "*pred_eqne_scalar_merge_tie_mask"
-  [(set (match_operand: 0 "register_operand"  "=vm")
-   (if_then_else:
- (unspec:
-   [(match_operand: 1 "register_operand" "  0")
-(match_operand 5 "vector_length_operand" " rK")
-(match_operand 6 "const_int_operand" "  i")
-(match_operand 7 "const_int_operand" "  i")
-(reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operator: 2 "equality_operator"
-[(vec_duplicate:V_VLSF
-   (match_operand: 4 "register_operand" "  f"))
- (match_operand:V_VLSF 3 "register_operand"  " vr")])
- (match_dup 1)))]
-  "TARGET_VECTOR"
-  "vmf%B2.vf\t%0,%3,%4,v0.t"
-  [(set_attr "type" "vfcmp")
-   (set_attr "mode" "")
-   (set_attr "merge_op_idx" "1")
-   (set_attr "vl_op_idx" "5")
-   (set (attr "ma") (symbol_ref "riscv_vector::get_ma(operands[6])"))
-   (set (attr "avl_type_idx") (const_int 7))])
-
-;; We don't use early-clobber for LMUL <= 1 to get better codegen.
-(define_insn "*pred_eqne_scalar"
-  [(set (match_operand: 0 "register_operand""=vr,   vr,   
,   ")
-   (if_then_else:
- (unspec:
-   [(match_operand: 1 "vector_mask_operand"  
"vmWc1,vmWc1,vmWc1,vmWc1")
-(match_operand 6 "vector_length_operand" "   rK,   rK,   
rK,   rK")
-(match_operand 7 "const_int_operand" "i,i,
i,i")
-(match_operand 8 "const_int_operand" "i,i,
i,i")
-(reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (match_operator: 3 "equality_operator"
-[(vec_duplicate:V_VLSF
-   (match_operand: 5 "register_operand" "f,f,
f,f"))
- (match_operand:V_VLSF 4 "register_operand"  "   vr,   vr,   
vr,   vr")])
- (match_operand: 2 "vector_merge_operand""   vu,0,
vu,0")))]
-  "TARGET_VECTOR && riscv_vector::cmp_lmul_le_one (mode)"
-  "vmf%B3.vf\t%0,%4,%5%p1"
-  [(set_attr "type" "vfcmp")
-   (set_attr "mode" "")
-   (set_attr "spec_restriction" "thv,thv,rvv,rvv")])
-
-;; We use early-clobber for source LMUL > dest LMUL.
-(define_insn "*pred_eqne_scalar_narrow"
-  [(set (match_operand: 0 "register_operand""=vm,   vr,   
vr,  ,  ")
-   (if_then_else:
- (unspec:
-   [(match_operand: 1 "vector_mask_operand"  "
0,vmWc1,vmWc1,vmWc1,vmWc1")
-(match_operand 6 "vector_length_operand" "   rK,   rK,   
rK,   rK,   rK")
-(match_operand 7

[PATCH 0/5] RISC-V: Support vf and vx for autovec comparison of

2024-02-29 Thread demin.han

We expect:
flw
...
vmfxx.vf

To simplify supporting vx and vf, two refactorings are done first.
1. Remove the eqne patterns; was there any special case or reason for eqne when it was first added?
2. Refactor duplicate code.


demin.han (5):
  RISC-V: Remove float vector eqne pattern
  RISC-V: Refactor expand_vec_cmp
  RISC-V: Support vmfxx.vf for autovec comparison of vec and imm
  RISC-V: Remove integer vector eqne pattern
  RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-protos.h   |   2 +-
 gcc/config/riscv/riscv-string.cc  |   4 +-
 gcc/config/riscv/riscv-v.cc   |  94 ++---
 .../riscv/riscv-vector-builtins-bases.cc  |   7 -
 gcc/config/riscv/riscv.cc |   2 +-
 gcc/config/riscv/vector.md| 365 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   |  48 ++-
 9 files changed, 105 insertions(+), 423 deletions(-)

-- 
2.43.2

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Alexandre Oliva

On Feb 27, 2024, Richard Earnshaw  wrote:

> This one has been festering for a while; both Alexandre and Torbjorn
> have attempted to fix it recently, but I'm not sure either is really
> right...

*nod* xref https://gcc.gnu.org/pipermail/gcc-patches/2024-March/646926.html
The patch I proposed was indeed far too limited in scope.

> On Arm this is causing all anonymous arguments to be passed on the
> stack, which is incorrect per the ABI.  On a target that uses
> 'pretend_outgoing_vararg_named', why is it correct to set n_named_args
> to zero?  Is it enough to guard both the statements you've added with
> !targetm.calls.pretend_outgoing_args_named?

ISTM that the change you suggest over Jakub's patch would address the
inconsistency on ARM.

Matthew suggested a patch along these lines in the other thread, that I
xrefed above, that seems sound to me, but I also suspect it won't fix
the ppc64le issue.  My hunch is that we'll need a combination of both,
possibly with further tweaks to adjust for Jakub's just-added test.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice but
very few check the facts.  Think Assange & Stallman.  The empires strike back

Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-02-29 Thread Alexandre Oliva

Hello, Matthew,

Thanks for the review.

On Feb 26, 2024, Matthew Malcomson  wrote:

> I think you're right that the AAPCS32 requires all arguments to be passed in
> registers for this testcase.
> (Nit on the commit-message: It says that your reading of the AAPCS32
> suggests
> that the *caller* is correct -- I believe based on the change you
> suggested you
> meant *callee* is correct in expecting arguments in registers.)

Ugh, yeah, sorry about the typo.

> The approach you suggest looks OK to me -- I do notice that it doesn't
> fix the
> legacy ABI's of `atpcs` and `apcs` and guess it would be nicer to have them
> working at the same time though would defer to maintainers on how
> important that
> is.
> (For the benefit of others reading) I don't believe there is any ABI concern
> with this since it's fixing something that is currently not working at
> all and
> only applies to c23 (so a change shouldn't have too much of an impact).

> You mention you chose to make the change in the arm backend rather
> than general
> code due to hesitancy to change the generic ABI-affecting code. That makes
> sense to me, certainly at this late stage in the development cycle.

*nod* I wrote the patch in the following context: I hit the problem on
the very first toolchain I started transitioning to gcc-13.  I couldn't
really fathom the notion that this breakage could have survived an
entire release cycle if it affected many targets, and sort of held on to
an assumption that the abi used by our arm-eabi toolchain had to be an
uncommon one.

All of this hypothesizing falls apart given the now-apparent knowledge that
the test is failing elsewhere as well, even on other ARM ABIs; it just
hadn't been addressed yet.  I'm glad we're getting there :-)

> From a quick check on c23-stdarg-4.c it does look like the below
> change ends up
> with the same codegen as your patch (except in the case of those
> legacy ABI's,
> where the below does make the caller and callee ABI match AFAICT):

> ```
>   diff --git a/gcc/calls.cc b/gcc/calls.cc
>   index 01f44734743..0b302f633ed 100644
>   --- a/gcc/calls.cc
>   +++ b/gcc/calls.cc
>   @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int ignore)
>     we do not have any reliable way to pass unnamed args in
>     registers, so we must force them into memory.  */

>   -  if (type_arg_types != 0
>   +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>  && targetm.calls.strict_argument_naming (args_so_far))
>    ;
>  else if (type_arg_types != 0
>          && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>    /* Don't include the last named arg.  */
>    --n_named_args;
>   -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>   +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>   +    && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>    n_named_args = 0;
>  else
>    /* Treat all args as named.  */
> ```

> Do you agree that this makes sense (i.e. is there something I'm
> completely missing)?

Yeah, your argument is quite convincing, and the target knobs are indeed
in line with the change you suggest, whereas the current code seems to
deviate from them.

With my ABI designer hat on, however, I see that there's room for ABIs
to make decisions about 0-args stdargs that go differently from stdargs
with leading named args, from prototyped functions, and even from
prototypeless functions, and we might end up needing more knobs to deal
with such custom decisions.  We can cross that bridge if/when we get to
it, though.
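
The branch order in the change quoted above can be modelled in isolation.  What follows is only a sketch, not the calls.cc code: `CallShape`, `TargetKnobs` and `named_arg_count` are hypothetical names, the two booleans stand in for the `strict_argument_naming` and `pretend_outgoing_varargs_named` hooks, and `initial_named` stands in for whatever `n_named_args` holds before the if-chain is reached.

```cpp
// Simplified model of the n_named_args selection, following the branch
// order of the quoted diff.  All type and function names are hypothetical.
struct CallShape {
  bool has_arg_types;         // models type_arg_types != 0
  bool no_named_args_stdarg;  // models TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
  int initial_named;          // n_named_args before the if-chain
  int num_actuals;            // number of actual arguments at the call
};

struct TargetKnobs {
  bool strict_argument_naming;
  bool pretend_outgoing_varargs_named;
};

constexpr int named_arg_count(CallShape c, TargetKnobs t) {
  if ((c.has_arg_types || c.no_named_args_stdarg) && t.strict_argument_naming)
    return c.initial_named;      // keep the count as already computed
  if (c.has_arg_types && !t.pretend_outgoing_varargs_named)
    return c.initial_named - 1;  // don't include the last named arg
  if (c.no_named_args_stdarg && !t.pretend_outgoing_varargs_named)
    return 0;                    // every argument is anonymous
  return c.num_actuals;          // treat all args as named
}
```

In this model, a `void f(...)` call with three arguments on a target whose `pretend_outgoing_varargs_named` knob is true (and without strict naming) falls through to the last branch, so all three arguments count as named; with the knob false, the count is zero.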

> (lm32 mcore msp430 gcn cris fr30 frv h8300 arm v850 rx pru)

Interesting that ppc64le is not on your list.  There's PR107453 about
that, and another thread is discussing a fix for it that is somewhat
different from what you propose (presumably because the way the problem
manifests on ppc64le is different), but it also tweaks expand_call.

I'll copy you when following up there.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

Re: [PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]

2024-02-29 Thread Jeff Law





On 2/26/24 17:17, Greg McGary wrote:

> The sign-bit-copies of a sign-extending load cannot be known until runtime on
> WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM
> load.  See the fix for PR112758.
>
> 2024-02-22  Greg McGary
>
>         PR rtl-optimization/113010
>         * combine.cc (simplify_comparison): Simplify a SUBREG on
>         WORD_REGISTER_OPERATIONS targets only if it is a zero-extending
>         MEM load.
>
>         * gcc.c-torture/execute/pr113010.c: New test.

I think this is fine for the trunk.  I'll do some final testing on it
tomorrow.


Jeff

Re: [PATCH] LoongArch: Allow s9 as a register alias

2024-02-29 Thread chenglulu




On 2024/2/29 at 3:14 PM, Xi Ruoyao wrote:

The psABI allows using s9 as an alias of r22.

gcc/ChangeLog:

* config/loongarch/loongarch.h (ADDITIONAL_REGISTER_NAMES): Add
s9 as an alias of r22.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


I think a test is needed.

Others LGTM.

Thanks!



  gcc/config/loongarch/loongarch.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index 8b453ab3140..bf2351f0968 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -931,6 +931,7 @@ typedef struct {
{ "t8",   20 + GP_REG_FIRST },\
{ "x",21 + GP_REG_FIRST },\
{ "fp",   22 + GP_REG_FIRST },\
+  { "s9",22 + GP_REG_FIRST },\
{ "s0",   23 + GP_REG_FIRST },\
{ "s1",   24 + GP_REG_FIRST },\
{ "s2",   25 + GP_REG_FIRST },\

libbacktrace patch committed: Read symbol table of debuginfo file

2024-02-29 Thread Ian Lance Taylor

This patch to libbacktrace reads symbol tables from debuginfo files.
These become another symbol table to search.  This is needed if people
use --strip-all rather than --strip-debug when adding a debuglink
section.  This fixes
https://github.com/ianlancetaylor/libbacktrace/issues/113.
Bootstrapped and ran libbacktrace and libgo tests on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

* elf.c (elf_add): Add the symbol table from a debuginfo file.
* Makefile.am (MAKETESTS): Add buildidfull and gnudebuglinkfull
variants of buildid and gnudebuglink tests.
(%_gnudebuglinkfull, %_buildidfull): New patterns.
* Makefile.in: Regenerate.
24810fbf7b0ce274dfa46cc362305ac77ee5a72c
diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index 16a72d2abf7..750ed80ed05 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -257,7 +257,7 @@ b2test_LDFLAGS = -Wl,--build-id
 b2test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b2test
-MAKETESTS += b2test_buildid
+MAKETESTS += b2test_buildid b2test_buildidfull
 
 if HAVE_DWZ
 
@@ -267,7 +267,7 @@ b3test_LDFLAGS = -Wl,--build-id
 b3test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b3test
-MAKETESTS += b3test_dwz_buildid
+MAKETESTS += b3test_dwz_buildid b3test_dwz_buildidfull
 
 endif HAVE_DWZ
 
@@ -443,12 +443,16 @@ endif HAVE_PTHREAD
 
 if HAVE_OBJCOPY_DEBUGLINK
 
-MAKETESTS += btest_gnudebuglink
+MAKETESTS += btest_gnudebuglink btest_gnudebuglinkfull
 
 %_gnudebuglink: %
$(OBJCOPY) --only-keep-debug $< $@.debug
$(OBJCOPY) --strip-debug --add-gnu-debuglink=$@.debug $< $@
 
+%_gnudebuglinkfull: %
+   $(OBJCOPY) --only-keep-debug $< $@.debug
+   $(OBJCOPY) --strip-all --add-gnu-debuglink=$@.debug $< $@
+
 endif HAVE_OBJCOPY_DEBUGLINK
 
 %_buildid: %
@@ -457,6 +461,12 @@ endif HAVE_OBJCOPY_DEBUGLINK
  $<
$(OBJCOPY) --strip-debug $< $@
 
+%_buildidfull: %
+   ./install-debuginfo-for-buildid.sh \
+ "$(TEST_BUILD_ID_DIR)" \
+ $<
+   $(OBJCOPY) --strip-all $< $@
+
 if HAVE_COMPRESSED_DEBUG
 
 ctestg_SOURCES = btest.c testlib.c
diff --git a/libbacktrace/elf.c b/libbacktrace/elf.c
index c506cc29fe1..664937e1438 100644
--- a/libbacktrace/elf.c
+++ b/libbacktrace/elf.c
@@ -6872,7 +6872,7 @@ elf_add (struct backtrace_state *state, const char *filename, int descriptor,
 
   if (symtab_shndx == 0)
 symtab_shndx = dynsym_shndx;
-  if (symtab_shndx != 0 && !debuginfo)
+  if (symtab_shndx != 0)
 {
   const b_elf_shdr *symtab_shdr;
   unsigned int strtab_shndx;

[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-02-29 Thread HAO CHEN GUI

Hi,
  This patch fixes the regressions in gcc.target/powerpc/rlwimi-2.c.  In the
combine pass, an SImode lshiftrt (a subreg of a DImode value) is converted to
a DImode lshiftrt with an outer AND, which matches a DImode rotate and mask
insert on rs6000.

Trying 2 -> 7:
2: r122:DI=r129:DI
  REG_DEAD r129:DI
7: r125:SI=r122:DI#0 0>>0x1f
  REG_DEAD r122:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 125 [ x ]) 0)
(and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])))

This conversion blocks the further combination that would otherwise produce
an SImode rotate and mask insert insn.

Trying 9, 7 -> 10:
9: r127:SI=r130:DI#0&0xfffffffffffffffe
  REG_DEAD r130:DI
7: r125:SI#0=r129:DI 0>>0x1f&0xffffffff
  REG_DEAD r129:DI
   10: r124:SI=r127:SI|r125:SI
  REG_DEAD r125:SI
  REG_DEAD r127:SI
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (zero_extract:DI (reg:DI 129)
(const_int 32 [0x20])
(const_int 1 [0x1])) 0)))
Failed to match this instruction:
(set (reg:SI 124)
(ior:SI (and:SI (subreg:SI (reg:DI 130) 0)
(const_int -2 [0xfffffffffffffffe]))
(subreg:SI (and:DI (lshiftrt:DI (reg:DI 129)
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])) 0)))

  The root cause is the question of whether lshiftrt needs to be widened at
all when the target already has a narrow-mode lshiftrt whose cost is not
high.  My earlier patch tried to fix that, but it has not been accepted yet.
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624852.html

  As it's stage 4 now, I drafted this patch to fix the regression by adding
subreg patterns for the SImode rotate and mask insert.  It effectively does
the reverse: it narrows the mode of the lshiftrt so that it can match the
SImode rotate and mask insert.
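
Why narrowing the shift is safe can be checked with plain integer arithmetic.  The sketch below is not taken from the patch: the function names are hypothetical, and `mask_fits` encodes my assumption that the insert mask only selects bits that the 32-bit shift can still produce, which the pattern's condition is presumably meant to guarantee.

```cpp
#include <cstdint>

// Form produced by combine: 64-bit logical shift, AND with 0xffffffff,
// take the low word, then apply the insert mask m.
constexpr std::uint32_t wide_shift_then_mask(std::uint64_t x, unsigned s,
                                             std::uint32_t m) {
  return static_cast<std::uint32_t>((x >> s) & 0xffffffffu) & m;
}

// Narrowed form: take the low word first, then do a 32-bit logical shift.
constexpr std::uint32_t narrow_shift_then_mask(std::uint64_t x, unsigned s,
                                               std::uint32_t m) {
  return (static_cast<std::uint32_t>(x) >> s) & m;
}

// Assumed precondition: m has no bits at or above position 32 - s, so no
// selected bit originates from the upper half of x.
constexpr bool mask_fits(unsigned s, std::uint32_t m) {
  return s == 0 || (s < 32 && (m >> (32 - s)) == 0);
}
```

For the shift count 31 and mask 1 from the dump above, both forms agree for any input.  With a mask that also selects bit 1, the wide form picks up bit 32 of the source while the narrow form cannot, so the two differ.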

  The case "rlwimi-2.c" is fixed and its instruction counts are restored to
the original ones.  The case "rlwinm-0.c" also changes: 9 "rlwinm" are
replaced with 9 "rldicl" because the combine sequence changed.  That's not
a regression, as the total number of insns is unchanged.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
rs6000: Add subreg patterns for SImode rotate and mask insert

In the combine pass, an SImode (subreg from DImode) lshiftrt is converted to
a DImode lshiftrt with an AND.  The new pattern matches a DImode rotate and
mask insert on rs6000, which blocks it from being further combined into an
SImode rotate and mask insert pattern.  This patch fixes the problem by adding
two subreg patterns for the SImode rotate and mask insert patterns.

gcc/
PR target/93738
* config/rs6000/rs6000.md (*rotlsi3_insert_9): New.
(*rotlsi3_insert_8): New.

gcc/testsuite/
PR target/93738
* gcc.target/powerpc/rlwimi-2.c: Adjust the number of 64bit and 32bit
rotate instructions.
* gcc.target/powerpc/rlwinm-0.c: Likewise.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bc8bc6ab060..b0b40f91e3e 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4253,6 +4253,36 @@ (define_insn "*rotl3_insert"
 ; difference between rlwimi and rldimi.  We also might want dot forms,
 ; but not for rlwimi on POWER4 and similar processors.

+; Subreg pattern of insn "*rotlsi3_insert"
+(define_insn_and_split "*rotlsi3_insert_9"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
+   (ior:SI (and:SI
+(match_operator:SI 8 "lowpart_subreg_operator"
+ [(and:DI (match_operator:DI 4 "rotate_mask_operator"
+   [(match_operand:DI 1 "gpc_reg_operand" "r")
+(match_operand:SI 2 "const_int_operand" "n")])
+  (match_operand:DI 3 "const_int_operand" "n"))])
+(match_operand:SI 5 "const_int_operand" "n"))
+   (and:SI (match_operand:SI 6 "gpc_reg_operand" "0")
+   (match_operand:SI 7 "const_int_operand" "n"))))]
+  "rs6000_is_valid_insert_mask (operands[5], operands[4], SImode)
+   && GET_CODE (operands[4]) == LSHIFTRT
+   && INTVAL (operands[3]) == 0xffffffff
+   && UINTVAL (operands[5]) + UINTVAL (operands[7]) + 1 == 0"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (ior:SI (and:SI (lshiftrt:SI (match_dup 9)
+(match_dup 2))
+   (match_dup 5))
+   (and:SI (match_dup 6)
+   (match_dup 7))))]
+{
+  int offset = BYTES_BIG_ENDIAN ? 4 : 0;
+  operands[9] = gen_rtx_SUBREG (SImode,

Re: [PATCH] RISC-V: Add riscv_vector_cc function attribute

2024-02-29 Thread Li Xu

Ping.



xu...@eswincomputing.com
 
From: Li Xu
Date: 2024-02-27 09:17
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; zhengyu; xuli
Subject: [PATCH] RISC-V: Add riscv_vector_cc function attribute
From: xuli 
 
By default, the standard vector calling convention variant is only enabled
when a function has vector arguments or a vector return value.  However, a
user may also want to call a function without those from inside a vectorized
loop, and that incurs a large performance penalty due to vector register
store/restore.
 
So the user can declare the function with the riscv_vector_cc attribute as
shown below, which forces the function to use the standard vector calling
convention variant.
 
void foo() __attribute__((riscv_vector_cc));
[[riscv::vector_cc]] void foo(); // For C++11 and C23
 
For more details please reference the below link.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
attribute to riscv_attribute_table.
(riscv_vector_cc_function_p): Return true if FUNC is a riscv_vector_cc function.
(riscv_fntype_abi): Add riscv_vector_cc attribute check.
* doc/extend.texi: Add riscv_vector_cc attribute description.
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
---
gcc/config/riscv/riscv.cc |  55 ++--
gcc/doc/extend.texi   |  12 ++
.../base/attribute-riscv_vector_cc-error.C|  22 
.../attribute-riscv_vector_cc-callee-saved.c  | 117 ++
.../base/attribute-riscv_vector_cc-error.c|  11 ++
5 files changed, 209 insertions(+), 8 deletions(-)
create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..7f37f231796 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -537,24 +537,52 @@ static tree riscv_handle_fndecl_attribute (tree *, tree, tree, int, bool *);
static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
/* Defining target-specific uses of __attribute__.  */
-TARGET_GNU_ATTRIBUTES (riscv_attribute_table,
+static const attribute_spec riscv_gnu_attributes[] =
{
   /* Syntax: { name, min_len, max_len, decl_required, type_required,
   function_type_required, affects_type_identity, handler,
   exclude } */
   /* The attribute telling no prologue/epilogue.  */
-  { "naked", 0,  0, true, false, false, false,
-riscv_handle_fndecl_attribute, NULL },
+  {"naked", 0, 0, true, false, false, false, riscv_handle_fndecl_attribute,
+   NULL},
   /* This attribute generates prologue/epilogue for interrupt handlers.  */
-  { "interrupt", 0, 1, false, true, true, false,
-riscv_handle_type_attribute, NULL },
+  {"interrupt", 0, 1, false, true, true, false, riscv_handle_type_attribute,
+   NULL},
   /* The following two are used for the built-in properties of the Vector type
  and are not used externally */
   {"RVV sizeless type", 4, 4, false, true, false, true, NULL, NULL},
-  {"RVV type", 0, 0, false, true, false, true, NULL, NULL}
-});
+  {"RVV type", 0, 0, false, true, false, true, NULL, NULL},
+  /* This attribute is used to declare a function, forcing it to use the
+standard vector calling convention variant. Syntax:
+__attribute__((riscv_vector_cc)). */
+  {"riscv_vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_gnu_attribute_table  =
+{
+  "gnu", {riscv_gnu_attributes}
+};
+
+static const attribute_spec riscv_attributes[] =
+{
+  /* This attribute is used to declare a function, forcing it to use the
+ standard vector calling convention variant. Syntax:
+ [[riscv::vector_cc]]. */
+  {"vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_nongnu_attribute_table =
+{
+  "riscv", {riscv_attributes}
+};
+
+static const scoped_attribute_specs *const riscv_attribute_table[] =
+{
+  &riscv_gnu_attribute_table,
+  &riscv_nongnu_attribute_table
+};
/* Order for the CLOBBERs/USEs of gpr_save.  */
static const unsigned gpr_save_reg_order[] = {
@@ -5425,6 +5453,16 @@ riscv_arguments_is_vector_type_p (const_tree fntype)
   return false;
}
+/* Return true if FUNC is a riscv_vector_cc function.
+   For more details please reference the below link.
+   https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */
+static bool
+riscv_vector_cc_function_p (const_tree fntype)
+{
+  return lookup_attribute ("vector_cc", TYPE_ATTRIBUTES (fntype)) != NULL_TREE
+ || lookup_attribute

Re: [PATCH gcc] Hurd x86_64: add unwind support for signal trampoline code

2024-02-29 Thread Samuel Thibault

Flavio Cruz wrote on Wed, 28 Feb 2024 22:59:09 -0500:
> Tested with some simple toy examples where an exception is thrown in the
> signal handler.
> 
> libgcc/ChangeLog:
>   * config/i386/gnu-unwind.h: Support unwinding x86_64 signal frames.
> 
> Signed-off-by: Flavio Cruz 

Reviewed-by: Samuel Thibault 

Thanks!!

> ---
>  libgcc/config/i386/gnu-unwind.h | 97 -
>  1 file changed, 94 insertions(+), 3 deletions(-)
> 
> diff --git a/libgcc/config/i386/gnu-unwind.h b/libgcc/config/i386/gnu-unwind.h
> index 0751b5593d4..02b060ab4a5 100644
> --- a/libgcc/config/i386/gnu-unwind.h
> +++ b/libgcc/config/i386/gnu-unwind.h
> @@ -32,9 +32,100 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  
>  #ifdef __x86_64__
>  
> -/*
> - * TODO: support for 64 bits needs to be implemented.
> - */
> +#define MD_FALLBACK_FRAME_STATE_FOR x86_gnu_fallback_frame_state
> +
> +static _Unwind_Reason_Code
> +x86_gnu_fallback_frame_state
> +(struct _Unwind_Context *context, _Unwind_FrameState *fs)
> +{
> +  static const unsigned char gnu_sigtramp_code[] =
> +  {
> +    /* rpc_wait_trampoline: */
> +    0x48, 0xc7, 0xc0, 0xe7, 0xff, 0xff, 0xff,    /* mov    $-25,%rax */
> +    0x0f, 0x05,                                  /* syscall */
> +    0x49, 0x89, 0x04, 0x24,                      /* mov    %rax,(%r12) */
> +    0x48, 0x89, 0xdc,                            /* mov    %rbx,%rsp */
> +
> +    /* trampoline: */
> +    0x5f,                                        /* pop    %rdi */
> +    0x5e,                                        /* pop    %rsi */
> +    0x5a,                                        /* pop    %rdx */
> +    0x48, 0x83, 0xc4, 0x08,                      /* add    $0x8,%rsp */
> +    0x41, 0xff, 0xd5,                            /* call   *%r13 */
> +
> +    /* RA HERE */
> +    0x48, 0x8b, 0x7c, 0x24, 0x10,                /* mov    0x10(%rsp),%rdi */
> +    0xc3,                                        /* ret */
> +
> +    /* firewall: */
> +    0xf4,                                        /* hlt */
> +  };
> +
> +  const size_t gnu_sigtramp_len = sizeof gnu_sigtramp_code;
> +  const size_t gnu_sigtramp_tail = 7; /* length of tail after RA */
> +
> +  struct stack_contents {
> +void *sigreturn_addr;
> +void *sigreturn_returns_here;
> +struct sigcontext *return_scp;
> +  } *stack_contents;
> +  struct sigcontext *scp;
> +  unsigned long usp;
> +
> +  unsigned char *adjusted_pc = (unsigned char*)(context->ra) +
> +gnu_sigtramp_tail - gnu_sigtramp_len;
> +  if (memcmp (adjusted_pc, gnu_sigtramp_code, gnu_sigtramp_len))
> +return _URC_END_OF_STACK;
> +
> +  stack_contents = context->cfa;
> +
> +  scp = stack_contents->return_scp;
> +  usp = scp->sc_ursp;
> +
> +  fs->regs.reg[0].loc.offset = (unsigned long)&scp->sc_rax - usp;
> +  fs->regs.reg[1].loc.offset = (unsigned long)&scp->sc_rdx - usp;
> +  fs->regs.reg[2].loc.offset = (unsigned long)&scp->sc_rcx - usp;
> +  fs->regs.reg[3].loc.offset = (unsigned long)&scp->sc_rbx - usp;
> +  fs->regs.reg[4].loc.offset = (unsigned long)&scp->sc_rsi - usp;
> +  fs->regs.reg[5].loc.offset = (unsigned long)&scp->sc_rdi - usp;
> +  fs->regs.reg[6].loc.offset = (unsigned long)&scp->sc_rbp - usp;
> +  fs->regs.reg[8].loc.offset = (unsigned long)&scp->sc_r8 - usp;
> +  fs->regs.reg[9].loc.offset = (unsigned long)&scp->sc_r9 - usp;
> +  fs->regs.reg[10].loc.offset = (unsigned long)&scp->sc_r10 - usp;
> +  fs->regs.reg[11].loc.offset = (unsigned long)&scp->sc_r11 - usp;
> +  fs->regs.reg[12].loc.offset = (unsigned long)&scp->sc_r12 - usp;
> +  fs->regs.reg[13].loc.offset = (unsigned long)&scp->sc_r13 - usp;
> +  fs->regs.reg[14].loc.offset = (unsigned long)&scp->sc_r14 - usp;
> +  fs->regs.reg[15].loc.offset = (unsigned long)&scp->sc_r15 - usp;
> +  fs->regs.reg[16].loc.offset = (unsigned long)&scp->sc_rip - usp;
> +
> +  /* Register 7 is rsp  */
> +  fs->regs.cfa_how = CFA_REG_OFFSET;
> +  fs->regs.cfa_reg = 7;
> +  fs->regs.cfa_offset = usp - (unsigned long) context->cfa;
> +
> +  fs->regs.how[0] = REG_SAVED_OFFSET;
> +  fs->regs.how[1] = REG_SAVED_OFFSET;
> +  fs->regs.how[2] = REG_SAVED_OFFSET;
> +  fs->regs.how[3] = REG_SAVED_OFFSET;
> +  fs->regs.how[4] = REG_SAVED_OFFSET;
> +  fs->regs.how[5] = REG_SAVED_OFFSET;
> +  fs->regs.how[6] = REG_SAVED_OFFSET;
> +  fs->regs.how[8] = REG_SAVED_OFFSET;
> +  fs->regs.how[9] = REG_SAVED_OFFSET;
> +  fs->regs.how[10] = REG_SAVED_OFFSET;
> +  fs->regs.how[11] = REG_SAVED_OFFSET;
> +  fs->regs.how[12] = REG_SAVED_OFFSET;
> +  fs->regs.how[13] = REG_SAVED_OFFSET;
> +  fs->regs.how[14] = REG_SAVED_OFFSET;
> +  fs->regs.how[15] = REG_SAVED_OFFSET;
> +  fs->regs.how[16] = REG_SAVED_OFFSET;
> +
> +  fs->retaddr_column = 16;
> +  fs->signal_frame = 1;
> +
> +  return _URC_NO_REASON;
> +}
>  
>  #else /* ifdef __x86_64__  */
>  
> -- 
> 2.43.0
> 
> 
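
The trampoline check in the quoted patch compares a fixed byte signature against the code around the return address, with the last few signature bytes lying at and after that address.  A minimal sketch of that window arithmetic follows.  `signature_at` and the demo arrays are hypothetical, and unlike the patch (which reads live code through `context->ra`) this version works on an index into a buffer and adds bounds checks.

```cpp
#include <cstddef>

// Returns true when `sig` appears in `text` positioned so that its final
// `tail` bytes start at index `ra` (the return address).  The window start
// is ra + tail - sig_len, the same arithmetic as adjusted_pc in the patch.
constexpr bool signature_at(const unsigned char *text, std::size_t text_len,
                            std::size_t ra,
                            const unsigned char *sig, std::size_t sig_len,
                            std::size_t tail) {
  if (tail > sig_len)
    return false;
  const std::size_t head = sig_len - tail;  // signature bytes before ra
  if (ra < head || ra > text_len || text_len - ra < tail)
    return false;                           // window would leave the buffer
  const std::size_t start = ra - head;
  for (std::size_t i = 0; i != sig_len; ++i)
    if (text[start + i] != sig[i])
      return false;
  return true;
}

// Toy signature: call *%r13, then (after the return address) ret and hlt.
constexpr unsigned char demo_sig[] = {0x41, 0xff, 0xd5, 0xc3, 0xf4};
// Toy code buffer with padding bytes around the signature.
constexpr unsigned char demo_text[] = {0x90, 0x90, 0x41, 0xff,
                                       0xd5, 0xc3, 0xf4, 0x90};
```

Here the return address is index 5 (the byte after the call) and the tail is 2 bytes, so the window starts at index 2.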

-- 
Samuel
---
For an independent, transparent and rigorous evaluation!
I support Inria's Evaluation Committee.

[PATCH] c++/modules: Stream definitions for implicit instantiations [PR114170]

2024-02-29 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

An implicit instantiation has an initializer depending on whether
DECL_INITIALIZED_P is set (like normal VAR_DECLs) which needs to be
written to ensure that consumers of header modules properly emit
definitions for these instantiations. This patch ensures that we
correctly fallback to checking this flag when DECL_INITIAL is not set
for a template instantiation.

As a drive-by fix, also ensures that the count of initializers matches
the actual number of initializers written. This doesn't seem to be
necessary for correctness in the current testsuite, but feels wrong and
makes debugging harder when initializers aren't properly written for
other reasons.
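
The counting mismatch described above can be shown on a toy list.  This is only a sketch with hypothetical names (`Init`, `count_in_loop_header`, `count_when_written`); it models the two loop shapes from the diff, not the module writer itself.

```cpp
#include <cstddef>

struct Init {
  bool selected;  // stands in for TREE_LANG_FLAG_0 (init)
};

// Old shape: count is bumped in the loop header, once per list entry,
// whether or not the entry is written.
template <std::size_t N>
constexpr unsigned count_in_loop_header(const Init (&inits)[N]) {
  unsigned count = 0;
  for (std::size_t i = 0; i != N; ++i, ++count)
    if (inits[i].selected) {
      // a selected entry would be written here
    }
  return count;
}

// New shape: count is bumped only where an entry is actually written.
template <std::size_t N>
constexpr unsigned count_when_written(const Init (&inits)[N]) {
  unsigned count = 0;
  for (std::size_t i = 0; i != N; ++i)
    if (inits[i].selected)
      ++count;
  return count;
}

constexpr Init sample[] = {{true}, {false}, {true}, {false}};
```

With two of four entries selected, the old shape reports four initializers while only two would be written; the new shape reports two.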

PR c++/114170

gcc/cp/ChangeLog:

* module.cc (has_definition): Fall back to DECL_INITIALIZED_P
when DECL_INITIAL is not set on a template.
(module_state::write_inits): Only increment count when
initializers are actually written.

gcc/testsuite/ChangeLog:

* g++.dg/modules/var-tpl-2_a.H: New test.
* g++.dg/modules/var-tpl-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc   |  8 +---
 gcc/testsuite/g++.dg/modules/var-tpl-2_a.H | 10 ++
 gcc/testsuite/g++.dg/modules/var-tpl-2_b.C | 10 ++
 3 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/var-tpl-2_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 1b2ba2e0fa8..09578de41ec 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11586,8 +11586,9 @@ has_definition (tree decl)
 
 case VAR_DECL:
   if (DECL_LANG_SPECIFIC (decl)
- && DECL_TEMPLATE_INFO (decl))
-   return DECL_INITIAL (decl);
+ && DECL_TEMPLATE_INFO (decl)
+ && DECL_INITIAL (decl))
+   return true;
   else
{
  if (!DECL_INITIALIZED_P (decl))
@@ -17528,13 +17529,14 @@ module_state::write_inits (elf_out *to, depset::hash &table, unsigned *crc_ptr)
   tree list = static_aggregates;
   for (int passes = 0; passes != 2; passes++)
 {
-  for (tree init = list; init; init = TREE_CHAIN (init), count++)
+  for (tree init = list; init; init = TREE_CHAIN (init))
if (TREE_LANG_FLAG_0 (init))
  {
tree decl = TREE_VALUE (init);
 
dump ("Initializer:%u for %N", count, decl);
sec.tree_node (decl);
+   ++count;
  }
 
   list = tls_aggregates;
diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
new file mode 100644
index 000..607fc0b808e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_a.H
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+
+inline int f() { return 42; }
+
+template
+inline int v = f();
+
+inline int g() { return v; }
diff --git a/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
new file mode 100644
index 000..6d2ef4004e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/var-tpl-2_b.C
@@ -0,0 +1,10 @@
+// PR c++/114170
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import "var-tpl-2_a.H";
+
+int main() {
+  if (v != 42)
+__builtin_abort();
+}
-- 
2.43.2

Re: [PATCH v3] c++: implement [[gnu::non_owning]] [PR110358]

2024-02-29 Thread Jason Merrill


On 2/29/24 19:12, Marek Polacek wrote:

On Wed, Feb 28, 2024 at 06:03:54PM -0500, Jason Merrill wrote:


Hmm, if we're also going to allow the attribute to be applied to a function,
the name doesn't make so much sense.  For a class, it says that the class
refers to its initializer; for a function, it says that the function return
value *doesn't* refer to its argument.


Yeah, that's a fair point; I guess "non_owning" would be too perplexing.


If we want something that can apply to both classes and functions, we're
probably back to an attribute that just suppresses the warning, with a
different name.

Or I guess we could have two attributes, but that seems like a lot.

WDYT?


I think we don't want two separate attributes, and we do want that one
attribute to apply to both fns and classes.  We could implement something
like

   [[gnu::no_warning("Wdangling-reference")]]
   [[gnu::no_warning("Wdangling-reference", bool)]]

but first, that's a lot of typing, second, it would be confusing because
it wouldn't work for any other warning.  We already have [[unused]] and
[[maybe_unused]] whose effect is to suppress a warning.  I think our
best bet is to do the most straightforward thing: [[gnu::no_dangling]],
which this patch implements.  I didn't call it no_dangling_reference in
the hope that it can, some day, be also used for some -Wdangling-pointer
purposes.


Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Since -Wdangling-reference has false positives that can't be
prevented, we should offer an easy way to suppress the warning.
Currently, that is only possible by using a #pragma, either around the
enclosing class or around the call site.  But #pragma GCC diagnostic tends
to be onerous.  A better solution would be to have an attribute.

To that end, this patch adds a new attribute, [[gnu::no_dangling]].
This attribute takes an optional bool argument to support cases like:

   template <typename T>
   struct [[gnu::no_dangling(std::is_reference_v<T>)]] S {
  // ...
   };
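
For the simpler, argument-less form, a guarded use might look like the sketch below.  It assumes the attribute keeps the `gnu::no_dangling` spelling proposed here and that `__has_cpp_attribute` reports it on compilers that implement it; `NO_DANGLING`, `IntView`, `backing` and `first_of_backing` are hypothetical names for illustration.

```cpp
// Expand to the attribute only where the compiler reports support for it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(gnu::no_dangling)
#    define NO_DANGLING [[gnu::no_dangling]]
#  endif
#endif
#ifndef NO_DANGLING
#  define NO_DANGLING
#endif

// A span-like view: it points into storage owned elsewhere, so a reference
// obtained through a temporary view does not dangle.
struct NO_DANGLING IntView {
  const int *data;
  constexpr const int &front() const { return *data; }
};

constexpr int backing[] = {7, 8, 9};

constexpr int first_of_backing() {
  // The IntView temporary dies at the end of this statement, but r refers
  // to backing[0], which outlives it.
  const int &r = IntView{backing}.front();
  return r;
}
```

On compilers without the attribute the macro expands to nothing, so the code still builds; whether a given compiler version would warn here without the attribute is not something this sketch establishes.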

PR c++/110358
PR c++/109642

gcc/cp/ChangeLog:

* call.cc (no_dangling_p): New.
(reference_like_class_p): Use it.
(do_warn_dangling_reference): Use it.  Don't warn when the function
or its enclosing class has attribute gnu::no_dangling.
* tree.cc (cxx_gnu_attributes): Add gnu::no_dangling.
(handle_no_dangling_attribute): New.

gcc/ChangeLog:

* doc/extend.texi: Document gnu::no_dangling.
* doc/invoke.texi: Mention that gnu::no_dangling disables
-Wdangling-reference.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-no-dangling1.C: New test.
* g++.dg/ext/attr-no-dangling2.C: New test.
* g++.dg/ext/attr-no-dangling3.C: New test.
* g++.dg/ext/attr-no-dangling4.C: New test.
* g++.dg/ext/attr-no-dangling5.C: New test.
* g++.dg/ext/attr-no-dangling6.C: New test.
* g++.dg/ext/attr-no-dangling7.C: New test.
* g++.dg/ext/attr-no-dangling8.C: New test.
* g++.dg/ext/attr-no-dangling9.C: New test.
---
  gcc/cp/call.cc   | 38 ++--
  gcc/cp/tree.cc   | 26 
  gcc/doc/extend.texi  | 21 +++
  gcc/doc/invoke.texi  | 21 +++
  gcc/testsuite/g++.dg/ext/attr-no-dangling1.C | 38 
  gcc/testsuite/g++.dg/ext/attr-no-dangling2.C | 29 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling3.C | 24 
  gcc/testsuite/g++.dg/ext/attr-no-dangling4.C | 14 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling5.C | 31 ++
  gcc/testsuite/g++.dg/ext/attr-no-dangling6.C | 65 
  gcc/testsuite/g++.dg/ext/attr-no-dangling7.C | 31 ++
  gcc/testsuite/g++.dg/ext/attr-no-dangling8.C | 30 +
  gcc/testsuite/g++.dg/ext/attr-no-dangling9.C | 25 
  13 files changed, 387 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling1.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling2.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling3.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling4.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling5.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling6.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling7.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling8.C
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-no-dangling9.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index c40ef2e3028..9e4c8073600 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -14033,11 +14033,7 @@ std_pair_ref_ref_p (tree t)
return true;
  }
  
-/* Return true if a class CTYPE is either std::reference_wrapper or

-   std::ref_view, or a reference wrapper class.  We consider a class
-   a reference wrapper class if it has a reference member.  We no
-   longer check that it has a constructor taking

[PATCH v3] c++: implement [[gnu::non_owning]] [PR110358]

2024-02-29 Thread Marek Polacek

On Wed, Feb 28, 2024 at 06:03:54PM -0500, Jason Merrill wrote:
> On 2/21/24 19:35, Marek Polacek wrote:
> > On Fri, Jan 26, 2024 at 04:04:35PM -0500, Jason Merrill wrote:
> > > On 1/25/24 20:37, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > Since -Wdangling-reference has false positives that can't be
> > > > prevented, we should offer an easy way to suppress the warning.
> > > > Currently, that is only possible by using a #pragma, either around the
> > > > enclosing class or around the call site.  But #pragma GCC diagnostic tend
> > > > to be onerous.  A better solution would be to have an attribute.  Such
> > > > an attribute should not be tied to this particular warning though.  [*]
> > > > 
> > > > The warning bogusly triggers for classes that are like std::span,
> > > > std::reference_wrapper, and std::ranges::ref_view.  The common property
> > > > seems to be that these classes are only wrappers around some data.  So
> > > > I chose the name non_owning, but I'm not attached to it.  I hope that
> > > > in the future the attribute can be used for something other than this
> > > > diagnostic.
> > > 
> > > You decided not to pursue Barry's request for a bool argument to the
> > > attribute?
> > 
> > At first I thought it'd be an unnecessary complication but it was actually
> > pretty easy.  Better to accept the optional argument from the get-go
> > otherwise people would have to add > GCC 14 checks.
> > > Might it be more useful for the attribute to make reference_like_class_p
> > > return true, so that we still warn about a temporary of another type passing
> > > through it?
> > 
> > Good point.  Fixed.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Since -Wdangling-reference has false positives that can't be
> > prevented, we should offer an easy way to suppress the warning.
> > Currently, that is only possible by using a #pragma, either around the
> > enclosing class or around the call site.  But #pragma GCC diagnostic tend
> > to be onerous.  A better solution would be to have an attribute.  Such
> > an attribute should not be tied to this particular warning though.
> > 
> > The warning bogusly triggers for classes that are like std::span,
> > std::reference_wrapper, and std::ranges::ref_view.  The common property
> > seems to be that these classes are only wrappers around some data.  So
> > I chose the name non_owning, but I'm not attached to it.  I hope that
> > in the future the attribute can be used for something other than this
> > diagnostic.
> > 
> > This attribute takes an optional bool argument to support cases like:
> > 
> >template <typename T>
> >struct [[gnu::non_owning(std::is_reference_v<T>)]] S {
> >   // ...
> >};
> > 
> > PR c++/110358
> > PR c++/109642
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (non_owning_p): New.
> > (reference_like_class_p): Use it.
> > (do_warn_dangling_reference): Use it.  Don't warn when the function
> > or its enclosing class has attribute gnu::non_owning.
> > * tree.cc (cxx_gnu_attributes): Add gnu::non_owning.
> > (handle_non_owning_attribute): New.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/extend.texi: Document gnu::non_owning.
> > * doc/invoke.texi: Mention that gnu::non_owning disables
> > -Wdangling-reference.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/ext/attr-non-owning1.C: New test.
> > * g++.dg/ext/attr-non-owning2.C: New test.
> > * g++.dg/ext/attr-non-owning3.C: New test.
> > * g++.dg/ext/attr-non-owning4.C: New test.
> > * g++.dg/ext/attr-non-owning5.C: New test.
> > * g++.dg/ext/attr-non-owning6.C: New test.
> > * g++.dg/ext/attr-non-owning7.C: New test.
> > * g++.dg/ext/attr-non-owning8.C: New test.
> > * g++.dg/ext/attr-non-owning9.C: New test.
> > ---
> >   gcc/cp/call.cc  | 38 ++--
> >   gcc/cp/tree.cc  | 26 +
> >   gcc/doc/extend.texi | 25 
> >   gcc/doc/invoke.texi | 21 +++
> >   gcc/testsuite/g++.dg/ext/attr-non-owning1.C | 38 
> >   gcc/testsuite/g++.dg/ext/attr-non-owning2.C | 29 +
> >   gcc/testsuite/g++.dg/ext/attr-non-owning3.C | 24 
> >   gcc/testsuite/g++.dg/ext/attr-non-owning4.C | 14 +
> >   gcc/testsuite/g++.dg/ext/attr-non-owning5.C | 31 ++
> >   gcc/testsuite/g++.dg/ext/attr-non-owning6.C | 65 +
> >   gcc/testsuite/g++.dg/ext/attr-non-owning7.C | 31 ++
> >   gcc/testsuite/g++.dg/ext/attr-non-owning8.C | 30 ++
> >   gcc/testsuite/g++.dg/ext/attr-non-owning9.C | 25 
> >   13 files changed, 391 insertions(+), 6 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-non-owning1.C
> >   create mode 100644 gcc/testsuite/g++.dg/ext/attr-non-owning2.C
> >   create mode 100644
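
For illustration only, here is a minimal sketch of the situation the commit
message above describes: a class that merely wraps data it does not own, and
the call-site #pragma that the message says is currently the only way to
silence the warning.  IntView and first_element are hypothetical names made
up for this sketch, not code from the patch, and whether GCC actually warns
on this exact snippet depends on the compiler version and its heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning wrapper, loosely modelled on std::span.
// It only points at data owned by someone else.
struct IntView {
  const int *data;
  std::size_t size;

  // Returns a reference into the viewed data, not into the IntView itself.
  const int &front() const { return data[0]; }
};

// Returns the first element of a non-empty vector through a temporary view.
int first_element(const std::vector<int> &v) {
  assert(!v.empty());
// The reference below outlives the temporary IntView but not the vector,
// so it is valid; the pragmas silence a possible -Wdangling-reference
// false positive at this call site.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdangling-reference"
  const int &r = IntView{v.data(), v.size()}.front();
#pragma GCC diagnostic pop
  return r;
}
```

With an attribute on the class, as proposed in this thread, the three pragma
lines would not be needed at each call site.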

[pushed] analyzer: fix ICE in call summarization [PR114159]

2024-02-29 Thread David Malcolm

PR analyzer/114159 reports an ICE inside playback of call summaries
for very low values of --param=analyzer-max-svalue-depth=VAL.

Root cause is that call_summary_edge_info's ctor tries to evaluate
the function ptr of a gimple call stmt and assumes it gets a function *,
but with low values of --param=analyzer-max-svalue-depth=VAL we get
back an UNKNOWN svalue, rather than a pointer to a specific function.

Fix by adding a new call_info ctor that passes a specific
const function & from the call_summary_edge_info, rather than trying
to compute the function.

In doing so, I noticed that the analyzer was using "function *" despite
not modifying functions, and was sloppy about can-be-null versus
must-be-non-null function pointers, so I "constified" the function, and
converted the many places where the function must be non-null to be
"const function &".

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
Successful run of analyzer integration tests on x86_64-pc-linux-gnu.
Pushed to trunk as r14-9245-gc0d8a64e72324d.

gcc/analyzer/ChangeLog:
PR analyzer/114159
* analyzer.cc: Include "tree-dfa.h".
(get_ssa_default_def): New decl.
* analyzer.h (get_ssa_default_def): New.
* call-info.cc (call_info::call_info): New ctor taking an explicit
called_fn.
* call-info.h (call_info::call_info): Likewise.
* call-summary.cc (call_summary_replay::call_summary_replay):
Convert param from function * to const function &.
* call-summary.h (call_summary_replay::call_summary_replay):
Likewise.
* checker-event.h (state_change_event::get_dest_function):
Constify return value.
* engine.cc (point_and_state::validate): Update for conversion to
const function &.
(exploded_node::on_stmt): Likewise.
(call_summary_edge_info::call_summary_edge_info): Likewise.
Pass in called_fn to call_info ctor.
(exploded_node::replay_call_summaries): Update for conversion to
const function &.  Convert per_function_data from * to &.
(exploded_node::replay_call_summary): Update for conversion to
const function &.
(exploded_graph::add_function_entry): Likewise.
(toplevel_function_p): Likewise.
(add_tainted_args_callback): Likewise.
(exploded_graph::build_initial_worklist): Likewise.
(exploded_graph::maybe_create_dynamic_call): Likewise.
(maybe_update_for_edge): Likewise.
(exploded_graph::on_escaped_function): Likewise.
* exploded-graph.h (exploded_node::replay_call_summaries):
Likewise.
(exploded_node::replay_call_summary): Likewise.
(exploded_graph::add_function_entry): Likewise.
* program-point.cc (function_point::from_function_entry):
Likewise.
(program_point::from_function_entry): Likewise.
* program-point.h (function_point::from_function_entry): Likewise.
(program_point::from_function_entry): Likewise.
* program-state.cc (program_state::push_frame): Likewise.
(program_state::get_current_function): Constify return type.
* program-state.h (program_state::push_frame): Update for
conversion to const function &.
(program_state::get_current_function): Likewise.
* region-model-manager.cc
(region_model_manager::get_frame_region): Likewise.
* region-model-manager.h
(region_model_manager::get_frame_region): Likewise.
* region-model.cc (region_model::called_from_main_p): Likewise.
(region_model::update_for_gcall): Likewise.
(region_model::push_frame): Likewise.
(region_model::get_current_function): Constify return type.
(region_model::pop_frame): Update for conversion to
const function &.
(selftest::test_stack_frames): Likewise.
(selftest::test_get_representative_path_var): Likewise.
(selftest::test_state_merging): Likewise.
(selftest::test_alloca): Likewise.
* region-model.h (region_model::push_frame): Likewise.
(region_model::get_current_function): Likewise.
* region.cc (frame_region::dump_to_pp): Likewise.
(frame_region::get_region_for_local): Likewise.
* region.h (class frame_region): Likewise.
* sm-signal.cc (signal_unsafe_call::describe_state_change):
Likewise.
(update_model_for_signal_handler): Likewise.
(signal_delivery_edge_info_t::update_model): Likewise.
(register_signal_handler::impl_transition): Likewise.
* state-purge.cc (class gimple_op_visitor): Likewise.
(state_purge_map::state_purge_map): Likewise.
(state_purge_map::get_or_create_data_for_decl): Likewise.
(state_purge_per_ssa_name::state_purge_per_ssa_name): Likewise.
(state_purge_per_ssa_name::add_to_worklist): Likewise.
(state_purge_per_ssa_name::process_point): Likewise.
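
As a rough sketch of the two ideas in the description above (passing the
called function explicitly instead of recomputing it, and using a const
reference where null is not allowed), consider the following.  All names
here (function_stub, call_info_stub, describe_ptr, describe_ref) are
hypothetical stand-ins and are not the analyzer's real classes.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GCC's "struct function"; not the real type.
struct function_stub {
  std::string name;
};

// Pointer style: the callee has to decide what a null pointer means.
std::string describe_ptr(const function_stub *fn) {
  return fn ? fn->name : "<unknown>";
}

// Reference style: the signature itself says "never null, never modified".
std::string describe_ref(const function_stub &fn) {
  return fn.name;
}

// Sketch of a call-info object that is handed the called function by its
// creator, rather than trying to evaluate a function pointer itself.
class call_info_stub {
public:
  explicit call_info_stub(const function_stub &called_fn)
    : m_called_fn(called_fn) {}

  const function_stub &get_called_fn() const { return m_called_fn; }

private:
  const function_stub &m_called_fn;
};
```

The reference member means the stub must not outlive the function object it
refers to; that is an assumption of this sketch, not a statement about the
analyzer's lifetimes.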

Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-02-29 Thread John David Anglin


On 2024-02-29 6:02 p.m., Thomas Schwinge wrote:

> Hi!
>
> On 2024-02-01T19:20:57+, John David Anglin  wrote:
> > Tested on hppa-unknown-linux-gnu.  Committed to trunk.
> > Set num_threads to 50 on 32-bit hppa in two libgomp loop tests
> >
> > We support a maximum of 50 threads on 32-bit hppa.
>
> What happens if you go higher?  Curious, what/why is that architectural
> limit of 50 threads?

One gets an EAGAIN error at 51.  I don't know why 50 is the architectural limit
on hppa-linux.  I had asked Helge previously but didn't get an answer.  As far
as I can tell, the limit isn't set by glibc.

It seems 64 is supported on all other targets.


> I wonder: shouldn't that cap at 50 threads happen inside libgomp,
> generally, instead of per test case and user code (!)?  Per my
> understanding, OpenMP 'num_threads' specifies a *desired* number of
> threads; the implementation may limit that value.

Sounds like a good suggestion.

Dave

--
John David Anglin  dave.ang...@bell.net

Re: [committed] Set num_threads to 50 on 32-bit hppa in two libgomp loop tests

2024-02-29 Thread Thomas Schwinge

Hi!

On 2024-02-01T19:20:57+, John David Anglin  wrote:
> Tested on hppa-unknown-linux-gnu.  Committed to trunk.

> Set num_threads to 50 on 32-bit hppa in two libgomp loop tests
>
> We support a maximum of 50 threads on 32-bit hppa.

What happens if you go higher?  Curious, what/why is that architectural
limit of 50 threads?

I wonder: shouldn't that cap at 50 threads happen inside libgomp,
generally, instead of per test case and user code (!)?  Per my
understanding, OpenMP 'num_threads' specifies a *desired* number of
threads; the implementation may limit that value.


Grüße
 Thomas


> --- a/libgomp/testsuite/libgomp.c++/loop-3.C
> +++ b/libgomp/testsuite/libgomp.c++/loop-3.C
> @@ -1,3 +1,9 @@
> +#if defined(__hppa__) && !defined(__LP64__)
> +#define NUM_THREADS 50
> +#else
> +#define NUM_THREADS 64
> +#endif
> +
>  extern "C" void abort (void);
>  int a;
>  
> @@ -19,7 +25,7 @@ foo ()
>  int
>  main (void)
>  {
> -#pragma omp parallel num_threads (64)
> +#pragma omp parallel num_threads (NUM_THREADS)
>foo ();
>  
>return 0;

> --- a/libgomp/testsuite/libgomp.c/omp-loop03.c
> +++ b/libgomp/testsuite/libgomp.c/omp-loop03.c
> @@ -1,3 +1,9 @@
> +#if defined(__hppa__) && !defined(__LP64__)
> +#define NUM_THREADS 50
> +#else
> +#define NUM_THREADS 64
> +#endif
> +
>  extern void abort (void);
>  int a;
>  
> @@ -19,7 +25,7 @@ foo ()
>  int
>  main (void)
>  {
> -#pragma omp parallel num_threads (64)
> +#pragma omp parallel num_threads (NUM_THREADS)
>foo ();
>  
>return 0;
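
The suggestion above, treating the requested thread count as a wish that the
implementation may cap, could be sketched roughly as follows.  This is not
libgomp code; effective_threads and its limit parameter are hypothetical, and
the value 50 is simply the 32-bit hppa figure reported in this thread.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: clamp a requested thread count to a per-target
// limit, while always allowing at least one thread.
unsigned effective_threads(unsigned requested, unsigned limit) {
  return std::max(1u, std::min(requested, limit));
}
```

With such a cap inside the runtime, a test asking for 64 threads would get 50
on a target limited to 50, and would not need the per-test NUM_THREADS macro.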

Re: [PATCH 1/3] Change 'v1' float and int code to fall back to v0

2024-02-29 Thread Tom Tromey

> "Jeff" == Jeff Law  writes:

>> I don't know how to fix this.

Jeff> Me neither, but I can suggest a hacky workaround.

FTR, I asked Jakub on irc and he fixed it, so thankfully I didn't have
to resort to the hack :-)

thanks,
Tom

Re: [PATCH] Fortran: improve checks of NULL without MOLD as actual argument [PR104819]

2024-02-29 Thread Jerry D


On 2/29/24 12:56 PM, Harald Anlauf wrote:

> Dear all,
>
> here's a first patch addressing issues with NULL as actual argument:
> if the dummy is assumed-rank or assumed length, MOLD shall be present.
>
> There is also an interp on interoperability of c_sizeof and NULL
> pointers, for which we have a partially incorrect testcase
> (gfortran.dg/pr101329.f90) which gets fixed.
>
> See https://j3-fortran.org/doc/year/22/22-101r1.txt for more.
>
> Furthermore, nested NULL()s are now handled.
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> I consider this part as safe and would like to backport to 13-branch.
> Objections?
>
> Thanks,
> Harald


Looks good to me. I think backport is OK as well.

Jerry -

Re: [PATCH] libgccjit: Add option to allow special characters in function names

2024-02-29 Thread Antoni Boucher


Hi and thanks for the review.
I thought it would be a bit weird to have an option to change which 
characters are allowed, but I can't think of a better solution.

Here's the updated patch that now allows arbitrary characters.

On 2024-02-20 at 16:05, Iain Sandoe wrote:




On 20 Feb 2024, at 20:50, David Malcolm  wrote:

On Thu, 2024-02-15 at 17:08 -0500, Antoni Boucher wrote:

Hi.
This patch adds a new option to allow special characters like . and $
in function names.
This is useful to allow for mangling using those characters.
Thanks for the review.


Thanks for the patch.


diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
index 10a0e50f9f6..4af75ea7418 100644
--- a/gcc/jit/docs/topics/contexts.rst
+++ b/gcc/jit/docs/topics/contexts.rst
@@ -453,6 +453,10 @@ Boolean options
  If true, the :type:`gcc_jit_context` will not clean up intermediate files
  written to the filesystem, and will display their location on stderr.

+  .. macro:: GCC_JIT_BOOL_OPTION_SPECIAL_CHARS_IN_FUNC_NAMES
+
+ If true, allow special characters like . and $ in function names.


The documentation and the comment in libgccjit.h say:
  "allow special characters like . and $ in function names."
and on reading the implementation, the special characters are exactly
'.' and '$'.

The API seems rather arbitrary and inflexible to me; why the choice of
those characters?  Presumably those are the ones that Rust's mangling
scheme uses, but do other mangling schemes require other chars?

How about an API for setting the valid chars, something like:

extern void
gcc_jit_context_set_valid_symbol_chars (gcc_jit_context *ctxt,
const char *chars);

to specify the chars that are valid in addition to underscore and
alphanumeric.

In your case you'd call:

  gcc_jit_context_set_valid_symbol_chars (ctxt, ".$");

Or is that overkill?


If we ever wanted to support objective-c (NeXT runtime) then we’d need to
be able to support +,-,[,] space and : at least.  The interesting thing there is
that most assemblers do not support that either (and the symbols then need
to be quoted into the assembler).

So, it’s not (IMO) overkill considering at least one potential extension.

Iain



Dave
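
To make the "set of extra valid characters" idea discussed above concrete,
here is a small standalone sketch of such a check.  The function
is_valid_symbol_name is hypothetical and is not part of libgccjit; it also
ignores any special rule for the first character of a name, which is an
assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper: accept a name if every character is alphanumeric,
// an underscore, or listed in extra_chars (which may be null).
bool is_valid_symbol_name(const char *name, const char *extra_chars) {
  if (name == nullptr || *name == '\0')
    return false;
  for (const char *p = name; *p != '\0'; ++p) {
    const unsigned char ch = static_cast<unsigned char>(*p);
    const bool allowed =
      std::isalnum(ch) != 0
      || ch == '_'
      || (extra_chars != nullptr
          && std::strchr(extra_chars, ch) != nullptr);
    if (!allowed)
      return false;
  }
  return true;
}
```

Passing ".$" covers the mangling case that motivated the patch, while a
string such as "+-[] :" would cover the Objective-C style names mentioned
above.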

From 1bd8a95ffd8e30bf3b65a2948f9ebba22e691bd6 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 15 Feb 2024 17:03:22 -0500
Subject: [PATCH] libgccjit: Add option to allow special characters in function
 names

gcc/jit/ChangeLog:

	* docs/topics/contexts.rst: Add documentation for new option.
	* jit-recording.cc (recording::context::get_str_option): New
	method.
	* jit-recording.h (get_str_option): New method.
	* libgccjit.cc (gcc_jit_context_new_function): Allow special
	characters in function names.
	* libgccjit.h (enum gcc_jit_str_option): New option.

gcc/testsuite/ChangeLog:

	* jit.dg/test-special-chars.c: New test.
---
 gcc/jit/docs/topics/contexts.rst  |  8 +++--
 gcc/jit/jit-recording.cc  | 17 --
 gcc/jit/jit-recording.h   |  3 ++
 gcc/jit/libgccjit.cc  |  8 +++--
 gcc/jit/libgccjit.h   |  3 ++
 gcc/testsuite/jit.dg/test-special-chars.c | 41 +++
 6 files changed, 74 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-special-chars.c

diff --git a/gcc/jit/docs/topics/contexts.rst b/gcc/jit/docs/topics/contexts.rst
index 10a0e50f9f6..e7950cee961 100644
--- a/gcc/jit/docs/topics/contexts.rst
+++ b/gcc/jit/docs/topics/contexts.rst
@@ -317,13 +317,17 @@ String Options
copy of the underlying string, so it is valid to pass in a pointer to
an on-stack buffer.
 
-   There is just one string option specified this way:
-
.. macro:: GCC_JIT_STR_OPTION_PROGNAME
 
   The name of the program, for use as a prefix when printing error
   messages to stderr.  If `NULL`, or default, "libgccjit.so" is used.
 
+  .. macro:: GCC_JIT_STR_OPTION_SPECIAL_CHARS_IN_FUNC_NAMES
+
+  Special characters to allow in function names.
+  This string contains all characters that should not be rejected by
+  libgccjit. Ex.: ".$"
+
 Boolean options
 ***
 
diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 68a2e860c1f..a656b9a0a98 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -1362,6 +1362,18 @@ recording::context::set_str_option (enum gcc_jit_str_option opt,
   log_str_option (opt);
 }
 
+const char*
+recording::context::get_str_option (enum gcc_jit_str_option opt)
+{
+  if (opt < 0 || opt >= GCC_JIT_NUM_STR_OPTIONS)
+{
+  add_error (NULL,
+		 "unrecognized (enum gcc_jit_str_option) value: %i", opt);
+  return NULL;
+}
+  return m_str_options[opt];
+}
+
 /* Set the given integer option for this context, or add an error if
it's not recognized.
 
@@ -1703,7 +1715,8 @@ recording::context::dump_to_file (const char *path, bool update_locations)
 
 static const char * const

Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-02-29 Thread Segher Boessenkool

Hi!

On Mon, Feb 19, 2024 at 04:24:37PM +0530, Ajit Agarwal wrote:
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-fusion.o"

Line too long.

> +  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
> + instruction.  */
> +  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload);

That is not such a great name.  Any pass name with "analyze" is not so
good -- the pass does much more than just "analyze" things!

> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-fusion.cc
> @@ -0,0 +1,701 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,

The pass filename is not good then, either.

> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.

What in here is from 2020?

Most things will be from 2024, too.  First publication date is what
counts.

> +   Contributed by Ajit Kumar Agarwal .

We don't say such things in the files normally.

> +class rs6000_pair_fusion : public pair_fusion
> +{
> +public:
> +  rs6000_pair_fusion (bb_info *bb) : pair_fusion (bb) {reg_ops = NULL;};
> +  bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p);
> +  bool pair_mem_ok_policy (rtx first_mem, bool load_p, machine_mode mode)
> +  {
> +return !(first_mem || load_p || mode);
> +  }

It is much more natural to write this as
  return !first_mem && !load && !mode;

(_p is wrong, this is not a predicate, it is not a function at all!)

What is "!mode" for here?  How can VOIDmode happen here?  What does it
mean?  This needs to be documented.

> +  bool pair_check_register_operand (bool load_p, rtx reg_op,
> + machine_mode mem_mode)
> +  {
> +if (load_p || reg_op || mem_mode)
> +  return false;
> +else
> +  return false;
> +  }

The compiler will have warned for this.  Please look at all compiler
(and other) warnings that you introduce.

> +rs6000_pair_fusion::is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool 
> load_p)
> +{
> +  return !((reg_op && mem_mode) || load_p);
> +}

For more complex logic, split it up into two or more conditional
returns.

> +// alias_walker that iterates over stores.
> +template
> +class store_walker : public def_walker

That is not a good comment.  You should describe parameters and return
values and that kind of thing.  That it walks over things is bloody
obvious from the name already :-)

> +extern insn_info *
> +find_trailing_add (insn_info *insns[2],
> +const insn_range_info _range,
> +int initial_writeback,
> +rtx *writeback_effect,
> +def_info **add_def,
> +def_info *base_def,
> +poly_int64 initial_offset,
> +unsigned access_size);

That is way, way, way too many parameters.

So:

* Better names please.
* Better documentation, too, including documentations in the code.
Don't describe *what*, anyone can see that anyway, but describe *why*.
* This is way too much for one patch.  Split this into many patches,
properly structured in a patch series.  The design will need some
explanation, but none of the code should need that, ever!

Segher
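
As a quick check of the two rewrites suggested in the review above, the
sketch below compares each posted condition with its rewritten form over all
input combinations.  The real parameters are an rtx, a bool and a
machine_mode; plain bools stand in for "non-null" and "not VOIDmode" here,
and all function names are hypothetical.

```cpp
#include <cassert>

// Condition as posted in the patch.
bool policy_as_posted(bool first_mem, bool load_p, bool mode) {
  return !(first_mem || load_p || mode);
}

// Equivalent form from the review (De Morgan's law).
bool policy_as_suggested(bool first_mem, bool load_p, bool mode) {
  return !first_mem && !load_p && !mode;
}

// Second condition as posted in the patch.
bool fpsimd_as_posted(bool reg_op, bool mem_mode, bool load_p) {
  return !((reg_op && mem_mode) || load_p);
}

// The same logic split into separate conditional returns.
bool fpsimd_split(bool reg_op, bool mem_mode, bool load_p) {
  if (load_p)
    return false;
  if (reg_op && mem_mode)
    return false;
  return true;
}

// Count the inputs on which a rewritten form disagrees with the original.
int count_mismatches() {
  int mismatches = 0;
  for (int bits = 0; bits < 8; ++bits) {
    const bool a = (bits & 1) != 0;
    const bool b = (bits & 2) != 0;
    const bool c = (bits & 4) != 0;
    if (policy_as_posted(a, b, c) != policy_as_suggested(a, b, c))
      ++mismatches;
    if (fpsimd_as_posted(a, b, c) != fpsimd_split(a, b, c))
      ++mismatches;
  }
  return mismatches;
}
```

The split form is longer, but each early return can carry its own comment
explaining why that case is rejected.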

[PATCH] c++: Ensure DECL_CONTEXT is set for temporary vars [PR114005]

2024-02-29 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Alternatively we could update 'DECL_CONTEXT' only for
'make_temporary_var_for_ref_to_temp' in call.cc, as a more targeted
fix, but I felt that this way it'd also fix any other similar issues
that have gone uncaught so far.

-- >8 --

Modules streaming requires DECL_CONTEXT to be set for anything streamed.
This patch ensures that 'create_temporary_var' does set a DECL_CONTEXT
for these variables (such as the backing storage for initializer_lists)
even if not inside a function declaration.

PR c++/114005

gcc/cp/ChangeLog:

* init.cc (create_temporary_var): Set DECL_CONTEXT to
current_namespace if at namespace scope.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr114005_a.C: New test.
* g++.dg/modules/pr114005_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/init.cc| 2 ++
 gcc/testsuite/g++.dg/modules/pr114005_a.C | 8 
 gcc/testsuite/g++.dg/modules/pr114005_b.C | 7 +++
 3 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr114005_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/pr114005_b.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index ac37330527e..e6fca7b3226 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4258,6 +4258,8 @@ create_temporary_var (tree type)
   DECL_ARTIFICIAL (decl) = 1;
   DECL_IGNORED_P (decl) = 1;
   DECL_CONTEXT (decl) = current_function_decl;
+  if (!DECL_CONTEXT (decl))
+DECL_CONTEXT (decl) = current_namespace;
 
   return decl;
 }
diff --git a/gcc/testsuite/g++.dg/modules/pr114005_a.C b/gcc/testsuite/g++.dg/modules/pr114005_a.C
new file mode 100644
index 000..404683484ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr114005_a.C
@@ -0,0 +1,8 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M }
+
+module;
+#include <initializer_list>
+
+export module M;
+export constexpr std::initializer_list<int> foo{ 1, 2, 3 };
diff --git a/gcc/testsuite/g++.dg/modules/pr114005_b.C b/gcc/testsuite/g++.dg/modules/pr114005_b.C
new file mode 100644
index 000..88317ce11f8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr114005_b.C
@@ -0,0 +1,7 @@
+// { dg-additional-options "-fmodules-ts" }
+
+import M;
+
+int main() {
+  return foo.size();
+}
-- 
2.43.2

Re: [PATCH] libgccjit: Add support for creating temporary variables

2024-02-29 Thread Antoni Boucher


Hi and thanks for the review!
Here's the updated patch.

On 2024-01-24 at 09:54, David Malcolm wrote:

On Fri, 2024-01-19 at 16:54 -0500, Antoni Boucher wrote:

Hi.
This patch adds a new way to create a local variable that won't generate
debug info: it is to be used for compiler-generated variables.
Thanks for the review.


Thanks for the patch.


diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index cbf5b414d8c..5d62e264a00 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -390,3 +390,12 @@ on functions and variables:
* :func:`gcc_jit_function_add_string_attribute`
* :func:`gcc_jit_function_add_integer_array_attribute`
* :func:`gcc_jit_lvalue_add_string_attribute`
+
+.. _LIBGCCJIT_ABI_27:
+
+``LIBGCCJIT_ABI_27``
+
+``LIBGCCJIT_ABI_27`` covers the addition of a functions to create a new


"functions" -> "function"


+temporary variable:
+
+  * :func:`gcc_jit_function_new_temp`
diff --git a/gcc/jit/docs/topics/functions.rst b/gcc/jit/docs/topics/functions.rst
index 804605ea939..230caf42466 100644
--- a/gcc/jit/docs/topics/functions.rst
+++ b/gcc/jit/docs/topics/functions.rst
@@ -171,6 +171,26 @@ Functions
 underlying string, so it is valid to pass in a pointer to an on-stack
 buffer.
  
+.. function:: gcc_jit_lvalue *\

+  gcc_jit_function_new_temp (gcc_jit_function *func,\
+ gcc_jit_location *loc,\
+ gcc_jit_type *type)
+
+   Create a new local variable within the function, of the given type.
+   This function is similar to :func:`gcc_jit_function_new_local`, but
+   it is to be used for compiler-generated variables (as opposed to
+   user-defined variables in the language to be compiled) and these
+   variables won't show up in the debug info.
+
+   The parameter ``type`` must be non-`void`.
+
+   This entrypoint was added in :ref:`LIBGCCJIT_ABI_26`; you can test
+   for its presence using


The ABI number is inconsistent here (it's 27 above and in the .map
file), but obviously you can fix this when you eventually commit this
based on what the ABI number actually is.

[...snip...]


diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index 84df6c100e6..cb6b2f66276 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "toplev.h"
  #include "tree-cfg.h"
  #include "convert.h"
+#include "gimple-expr.h"
  #include "stor-layout.h"
  #include "print-tree.h"
  #include "gimplify.h"
@@ -1950,13 +1951,27 @@ new_local (location *loc,
   type *type,
   const char *name,
   const std::vector> )
+  std::string>> ,
+  bool is_temp)
  {
gcc_assert (type);
-  gcc_assert (name);
-  tree inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+  tree inner;
+  if (is_temp)
+  {
+inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+   create_tmp_var_name ("JITTMP"),
+   type->as_tree ());
+DECL_ARTIFICIAL (inner) = 1;
+DECL_IGNORED_P (inner) = 1;
+DECL_NAMELESS (inner) = 1;


We could assert that "name" is null in the is_temp branch.

An alternative approach might be to drop "is_temp", and instead make
"name" being null signify that it's a temporary, if you prefer that
approach.  Would client code ever want to specify a name prefix for a
temporary?


No, I don't think anyone would want a different prefix.





+  }
+  else
+  {
+gcc_assert (name);
+inner = build_decl (UNKNOWN_LOCATION, VAR_DECL,
   get_identifier (name),
   type->as_tree ());
+  }
DECL_CONTEXT (inner) = this->m_inner_fndecl;
  
/* Prepend to BIND_EXPR_VARS: */


[...snip...]

Thanks again for the patch.  Looks good to me as-is (apart from the
grammar and ABI number nits), but what do you think of eliminating
"is_temp" in favor of the "name" ptr being null?  I think it's your
call.

Dave
From 80ea12ce227b2ac5d5dcd99374532e30a775ecbd Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 18 Jan 2024 16:54:59 -0500
Subject: [PATCH] libgccjit: Add support for creating temporary variables

gcc/jit/ChangeLog:

	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_28): New ABI tag.
	* docs/topics/functions.rst: Document gcc_jit_function_new_temp.
	* jit-playback.cc (new_local): Add support for temporary
	variables.
	* jit-recording.cc (recording::function::new_temp): New method.
	(recording::local::write_reproducer): Support temporary
	variables.
	* jit-recording.h (new_temp): New method.
	* libgccjit.cc (gcc_jit_function_new_temp): New function.
	* libgccjit.h (gcc_jit_function_new_temp): New function.
	* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

	* jit.dg/all-non-failing-tests.h: Mention test-temp.c.
	* jit.dg/test-temp.c: New test.
---

[PATCH] c++/modules: depending local enums [PR104919, PR106009]

2024-02-29 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

For local enums defined in a non-template function or a function template
instantiation it seems we neglect to make the function depend on the enum
definition, which ultimately causes streaming to fail due to the enum
definition not being streamed before uses of its enumerators are streamed,
as far as I can tell.

The code responsible for adding such dependencies is

gcc/cp/module.cc
@@ -8784,17 +8784,6 @@ trees_out::decl_node (tree decl, walk_kind ref)
   depset *dep = NULL;
   if (streaming_p ())
 dep = dep_hash->find_dependency (decl);
!  else if (TREE_CODE (ctx) != FUNCTION_DECL
!  || TREE_CODE (decl) == TEMPLATE_DECL
!  || (dep_hash->sneakoscope && DECL_IMPLICIT_TYPEDEF_P (decl))
!  || (DECL_LANG_SPECIFIC (decl)
!  && DECL_MODULE_IMPORT_P (decl)))
!{
!  auto kind = (TREE_CODE (decl) == NAMESPACE_DECL
!  && !DECL_NAMESPACE_ALIAS (decl)
!  ? depset::EK_NAMESPACE : depset::EK_DECL);
!  dep = dep_hash->add_dependency (decl, kind);
!}

   if (!dep)
 {

and the condition there notably excludes local TYPE_DECLs from a
non-template function or a function template instantiation.  (For a
TYPE_DECL from a function template definition, we'll be dealing with the
corresponding TEMPLATE_DECL instead, so we'll add the dependency.)

Local classes seem fine as-is but perhaps by accident: with a local
class we end up depending on the injected-class-name of the local class
since it satisfies the above conditions.  A local enum doesn't have
such a TYPE_DECL member that we can depend on (its CONST_DECLs are
handled earlier as tt_enum_decl tags).

This patch attempts to fix this by keeping the 'sneakoscope' flag set
while walking the definition of a function, so that we add this needed
dependency between a containing function (non-template or specialization)
and its local types.  Currently it's set only when walking the
declaration (presumably to catch local types that escape via a deduced
return type), but it seems to make sense to add a dependency regardless
of whether the type escapes.

This was nearly enough to make things work, except we now ran into
issues with the local TYPE/CONST_DECL copies when streaming the
constexpr version of a function body.  It occurred to me that we don't
need to make copies of local types when copying a constexpr function
body; only VAR_DECLs etc need to be copied for sake of recursive
constexpr calls.  So this patch adjusts copy_fn accordingly.

PR c++/104919
PR c++/106009

gcc/cp/ChangeLog:

* module.cc (depset::hash::find_dependencies): Keep sneakoscope
set when walking the definition.

gcc/ChangeLog:

* tree-inline.cc (remap_decl): Handle copy_decl returning the
original decl.
(remap_decls): Handle remap_decl returning the original decl.
(copy_fn): Adjust copy_decl callback to skip TYPE_DECL and
CONST_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tdef-7.h:
* g++.dg/modules/tdef-7_b.C:
* g++.dg/modules/enum-13_a.C: New test.
* g++.dg/modules/enum-13_b.C: New test.
---
 gcc/cp/module.cc |  2 +-
 gcc/testsuite/g++.dg/modules/enum-13_a.C | 23 +++
 gcc/testsuite/g++.dg/modules/enum-13_b.C |  8 
 gcc/testsuite/g++.dg/modules/tdef-7.h|  2 --
 gcc/testsuite/g++.dg/modules/tdef-7_b.C  |  2 +-
 gcc/tree-inline.cc   | 14 +++---
 6 files changed, 44 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/enum-13_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/enum-13_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 66ef0bcaa94..29e57716297 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -13547,9 +13547,9 @@ depset::hash::find_dependencies (module_state *module)
  /* Turn the Sneakoscope on when depending the decl.  */
  sneakoscope = true;
  walker.decl_value (decl, current);
- sneakoscope = false;
  if (current->has_defn ())
walker.write_definition (decl);
+ sneakoscope = false;
}
  walker.end ();
 
diff --git a/gcc/testsuite/g++.dg/modules/enum-13_a.C b/gcc/testsuite/g++.dg/modules/enum-13_a.C
new file mode 100644
index 000..2e570c6c4fb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/enum-13_a.C
@@ -0,0 +1,23 @@
+// PR c++/104919
+// PR c++/106009
+// { dg-additional-options -fmodules-ts }
+// { dg-module-cmi Enum13 }
+
+export module Enum13;
+
+export
+constexpr int f() {
+  enum E { e = 42 };
+  return e;
+}
+
+template<typename T>
+constexpr int ft(T) {
+  enum blah { e = 43 };
+  return e;
+}
+
+export
+constexpr int g() {
+  return ft(0);
+}
diff --git a/gcc/testsuite/g++.dg/modules/enum-13_b.C b/gcc/testsuite/g++.dg/modules/enum-13_b.C
new

[PATCH] Fortran: improve checks of NULL without MOLD as actual argument [PR104819]

2024-02-29 Thread Harald Anlauf

Dear all,

here's a first patch addressing issues with NULL as actual argument:
if the dummy is assumed-rank or assumed-length, MOLD shall be present.

There is also an interp on interoperability of c_sizeof and NULL
pointers, for which we have a partially incorrect testcase
(gfortran.dg/pr101329.f90) which gets fixed.

See https://j3-fortran.org/doc/year/22/22-101r1.txt for more.

Furthermore, nested NULL()s are now handled.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

I consider this part as safe and would like to backport to 13-branch.
Objections?

Thanks,
Harald

From ce7199b16872b3014be68744329a8f19ddd64b05 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 29 Feb 2024 21:43:53 +0100
Subject: [PATCH] Fortran: improve checks of NULL without MOLD as actual
 argument [PR104819]

gcc/fortran/ChangeLog:

	PR fortran/104819
	* check.cc (gfc_check_null): Handle nested NULL()s.
	(is_c_interoperable): Check for MOLD argument of NULL() as part of
	the interoperability check.
	* interface.cc (gfc_compare_actual_formal): Extend checks for NULL()
	actual arguments for presence of MOLD argument when required by
	Interp J3/22-146.

gcc/testsuite/ChangeLog:

	PR fortran/104819
	* gfortran.dg/pr101329.f90: Adjust testcase to conform to interp.
	* gfortran.dg/null_actual_4.f90: New test.
---
 gcc/fortran/check.cc|  5 ++-
 gcc/fortran/interface.cc| 30 ++
 gcc/testsuite/gfortran.dg/null_actual_4.f90 | 35 +
 gcc/testsuite/gfortran.dg/pr101329.f90  |  4 +--
 4 files changed, 71 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/null_actual_4.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index d661cf37f01..db74dcf3f40 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -4384,6 +4384,9 @@ gfc_check_null (gfc_expr *mold)
   if (mold == NULL)
 return true;

+  if (mold->expr_type == EXPR_NULL)
+return true;
+
   if (!variable_check (mold, 0, true))
 return false;

@@ -5216,7 +5219,7 @@ is_c_interoperable (gfc_expr *expr, const char **msg, bool c_loc, bool c_f_ptr)
 {
   *msg = NULL;

-  if (expr->expr_type == EXPR_NULL)
+  if (expr->expr_type == EXPR_NULL && expr->ts.type == BT_UNKNOWN)
 {
   *msg = "NULL() is not interoperable";
   return false;
diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index 231f2f252af..64b90550be2 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -3296,6 +3296,36 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  && a->expr->ts.type != BT_ASSUMED)
 	gfc_find_vtab (&a->expr->ts);

+  /* Interp J3/22-146:
+	 "If the context of the reference to NULL is an <actual argument>
+	 corresponding to an <assumed-rank> dummy argument, MOLD shall be
+	 present."  */
+  if (a->expr->expr_type == EXPR_NULL
+	  && a->expr->ts.type == BT_UNKNOWN
+	  && f->sym->as
+	  && f->sym->as->type == AS_ASSUMED_RANK)
+	{
+	  gfc_error ("Intrinsic %<NULL()%> without %<MOLD%> argument at %L "
+		 "passed to assumed-rank dummy %qs",
+		 &a->expr->where, f->sym->name);
+	  ok = false;
+	  goto match;
+	}
+
+  if (a->expr->expr_type == EXPR_NULL
+	  && a->expr->ts.type == BT_UNKNOWN
+	  && f->sym->ts.type == BT_CHARACTER
+	  && !f->sym->ts.deferred
+	  && f->sym->ts.u.cl
+	  && f->sym->ts.u.cl->length == NULL)
+	{
+	  gfc_error ("Intrinsic %<NULL()%> without %<MOLD%> argument at %L "
+		 "passed to assumed-length dummy %qs",
+		 &a->expr->where, f->sym->name);
+	  ok = false;
+	  goto match;
+	}
+
   if (a->expr->expr_type == EXPR_NULL
 	  && ((f->sym->ts.type != BT_CLASS && !f->sym->attr.pointer
 	   && (f->sym->attr.allocatable || !f->sym->attr.optional
diff --git a/gcc/testsuite/gfortran.dg/null_actual_4.f90 b/gcc/testsuite/gfortran.dg/null_actual_4.f90
new file mode 100644
index 000..e03d5c8f7de
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/null_actual_4.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! PR fortran/104819
+!
+! Reject NULL without MOLD as actual to an assumed-rank dummy.
+! See also interpretation request at
+! https://j3-fortran.org/doc/year/22/22-101r1.txt
+!
+! Test nested NULL()
+
+program p
+  implicit none
+  integer, pointer :: a, a3(:,:,:)
+  character(10), pointer :: c
+
+  call foo (a)
+  call foo (a3)
+  call foo (null (a))
+  call foo (null (a3))
+  call foo (null (null (a)))  ! Valid: nested NULL()s
+  call foo (null (null (a3))) ! Valid: nested NULL()s
+  call foo (null ())  ! { dg-error "passed to assumed-rank dummy" }
+
+  call str (null (c))
+  call str (null (null (c)))
+  call str (null ())  ! { dg-error "passed to assumed-length dummy" }
+contains
+  subroutine foo (x)
+    integer, pointer, intent(in) :: x(..)
+    print *, rank (x)
+  end
+
+  subroutine str (x)
+    character(len=*), pointer, intent(in) :: x
+  end
+end
diff --git a/gcc/testsuite/gfortran.dg/pr101329.f90 b/gcc/testsuite/gfortran.dg/pr101329.f90
index b82210d4e28..aca171bd4f8 100644
---

[PATCH] libstdc++-v3: Fix cmath math declarations and stub support for hppa64-*-hpux11*

2024-02-29 Thread John David Anglin

This change fixes the C99 math function support in <cmath> on
hppa64-*-hpux11*.

Tested on hppa64-hp-hpux11.11 and x86_64-linux-gnu.  See:
https://gcc.gnu.org/pipermail/gcc-testresults/2024-February/809158.html
https://gcc.gnu.org/pipermail/gcc-testresults/2024-February/809101.html

Okay for trunk?

Dave
---

Fix cmath math declarations and stub support for hppa64-*-hpux11*

This change fixes the following issues:

1) When the target host system doesn't support the full set of C99
functions, the stub replacements are not declared by cmath.  As a
result, stub replacements do not become members of namespace std.

2) Some using statements for float and long double C99 functions
are surrounded by a _GLIBCXX_HAVE_* #ifdef.  For example,
#ifdef _GLIBCXX_HAVE_ACOSF
  using ::acosf;
#endif
As a result, missing float and long double functions never become
a member of std even though there is stub support for all of them.

3) Undefs for acosf, acosl, etc, are missing.  Adding these should
allow PR86553 to be fixed.

4) Added AC_DEFINE statements for HAVE_CBRTF, HAVE_COPYSIGNF,
HAVE_HYPOTF, HAVE_LOG2F and HAVE_NEXTAFTERF to crossconfig.m4
for hpux host.

5) Added additional checks to linkage.m4.

6) Added stubs for missing float, double and long double C99
functions.
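
As a rough illustration of items 1, 2 and 6 (a sketch only, not code from
the patch): the function is declared unconditionally, a stub supplies the
definition when the host lacks it, and the using-declaration is no longer
guarded.  The names DEMO_HAVE_ACOSF, demo_acosf and namespace demo are
hypothetical stand-ins for _GLIBCXX_HAVE_ACOSF, acosf and std.

```cpp
#include <cassert>
#include <cmath>

// DEMO_HAVE_ACOSF is a hypothetical stand-in for a configure result such
// as _GLIBCXX_HAVE_ACOSF.  It is left undefined here to model a host C
// library that lacks the float variant.

// Declared unconditionally, so the name always exists.
extern "C" float demo_acosf(float x);

#ifndef DEMO_HAVE_ACOSF
// Stub: implement the float variant in terms of the double function.
extern "C" float demo_acosf(float x)
{
  return static_cast<float>(std::acos(static_cast<double>(x)));
}
#endif

namespace demo
{
  // Unguarded using-declaration: the name becomes a member of the
  // namespace whether the host or the stub provides the definition.
  using ::demo_acosf;
}
```

With the guard around the using-declaration removed, demo::demo_acosf
resolves even on a host where only the stub exists.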

PR libstdc++/114101

libstdc++-v3/ChangeLog:

* config/os/hpux/os_defines.h (_GLIBCXX_USE_C99_MATH_FUNCS): Define.
(_GLIBCXX_USE_C99_MATH_TR1): Define.
(_GLIBCXX_USE_BUILTIN_FMA): Define if _PA_RISC2_0 host.
(_GLIBCXX_USE_BUILTIN_FMAF): Likewise.
* crossconfig.m4: Add AC_DEFINE statements for HAVE_CBRTF,
HAVE_COPYSIGNF, HAVE_HYPOTF, HAVE_LOG2F and HAVE_NEXTAFTERF.
* include/c_global/cmath: Add #undef statements for acosf,
acosl, etc.  Add declarations for acosf, acosl, etc.  Likewise,
add declarations for acoshf, acoshl, etc, for C++11.
* libstdc++-v3/include/tr1/cmath: Add declarations for acosf,
acosl, etc.
* linkage.m4: Add checks for fma, nexttoward, scalbln, tgamma,
cbrtf, copysignf, expm1f, log2f, nanf, nextafterf, nexttowardf,
expm1l, ilogbl, nanl, nextafterl, nexttowardl, scalblnl,
scalbnl.
* src/c++98/Makefile.am: Add math_stubs_double.cc to sources.
* src/c++98/math_stubs_double.cc: New file.
* src/c++98/math_stubs_float.cc (scalbnf): New stub.
(lgammaf, tgammaf, erff, erfcf, remquof, fdimf, nearbyintf,
exp2f, rintf, lrintf, llrintf, fmaxf, fminf, log1pf, truncf,
asinhf, acoshf, atanhf, scalblnf, lroundf, llroundf, roundf,
remainderf, logbf, ilogbf, expm1f, nextafterf, nexttowardf,
nanf): Likewise.
* src/c++98/math_stubs_long_double.cc (ilogbl): New stub.
(lgammal, log1pl, nanl, nearbyintl, nextafterl, nexttowardl,
scalblnl, scalbnl, tgammal): Likewise.
* configure: Regenerate.
* config.h.in: Regenerate.
* src/c++98/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/config/os/hpux/os_defines.h b/libstdc++-v3/config/os/hpux/os_defines.h
index 38c1c38af0c..9ab1af42bda 100644
--- a/libstdc++-v3/config/os/hpux/os_defines.h
+++ b/libstdc++-v3/config/os/hpux/os_defines.h
@@ -79,6 +79,18 @@ namespace std
 
 #define _GLIBCXX_USE_LONG_LONG 1
 
+// Import C99 functions in <math.h> in <cmath> in namespace std in C++11.
+// Missing functions are handled by stubs.  The fma, nexttoward, scalbln
+// and tgamma are missing in HP-UX 11.  Many float variants are supported.
+#define _GLIBCXX_USE_C99_MATH_FUNCS 1
+#define _GLIBCXX_USE_C99_MATH_TR1 1
+
+#ifdef _PA_RISC2_0
+// Float and double fma are supported directly in hardware.
+#define _GLIBCXX_USE_BUILTIN_FMA 1
+#define _GLIBCXX_USE_BUILTIN_FMAF 1
+#endif
+
 // HPUX on IA64 requires vtable to be 64 bit aligned even at 32 bit
 // mode.  We need to pad the vtable structure to achieve this.
 #if !defined(_LP64) && defined (__ia64__)
diff --git a/libstdc++-v3/crossconfig.m4 b/libstdc++-v3/crossconfig.m4
index b3269cb88e0..c6b08be5df5 100644
--- a/libstdc++-v3/crossconfig.m4
+++ b/libstdc++-v3/crossconfig.m4
@@ -152,14 +152,10 @@ case "${host}" in
 AC_DEFINE(HAVE_ACOSF)
 AC_DEFINE(HAVE_ASINF)
 AC_DEFINE(HAVE_ATANF)
+AC_DEFINE(HAVE_ATAN2F)
 AC_DEFINE(HAVE_COSF)
 AC_DEFINE(HAVE_COSHF)
-AC_DEFINE(HAVE_SINF)
-AC_DEFINE(HAVE_SINHF)
-AC_DEFINE(HAVE_TANF)
-AC_DEFINE(HAVE_TANHF)
 AC_DEFINE(HAVE_EXPF)
-AC_DEFINE(HAVE_ATAN2F)
 AC_DEFINE(HAVE_FABSF)
 AC_DEFINE(HAVE_FMODF)
 AC_DEFINE(HAVE_FREXPF)
@@ -167,7 +163,16 @@ case "${host}" in
 AC_DEFINE(HAVE_LOG10F)
 AC_DEFINE(HAVE_MODF)
 AC_DEFINE(HAVE_POWF)
+AC_DEFINE(HAVE_SINF)
+AC_DEFINE(HAVE_SINHF)
 AC_DEFINE(HAVE_SQRTF)
+AC_DEFINE(HAVE_TANF)
+AC_DEFINE(HAVE_TANHF)
+AC_DEFINE(HAVE_CBRTF)
+AC_DEFINE(HAVE_COPYSIGNF)
+AC_DEFINE(HAVE_HYPOTF)
+AC_DEFINE(HAVE_LOG2F)
+AC_DEFINE(HAVE_NEXTAFTERF)
 
 # GLIBCXX_CHECK_STDLIB_SUPPORT

[PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-29 Thread Evgeny Karpov

Thank you for the initial review for v1!

Work on refactoring, rebasing, and validating 
"[PATCH v2] Add aarch64-w64-mingw32 target" is in progress. 
The v2 x64 mingw target will also be fully tested to avoid
regression due to refactoring.
Please provide feedback if anything is missing.

Changes from v1 to v2:
Adjust the target name to aarch64-*-mingw* to exclude the 
big-endian target from support.
Exclude 64-bit ISA.
Rename enum calling_abi to aarch64_calling_abi.
Move AARCH64 MS ABI definitions FIXED_REGISTERS, 
CALL_REALLY_USED_REGISTERS, and STATIC_CHAIN_REGNUM from 
aarch64.h to aarch64-abi-ms.h.
Rename TARGET_ARM64_MS_ABI to TARGET_AARCH64_MS_ABI.
Exclude TARGET_64BIT from the aarch64 target.
Exclude HAVE_GAS_WEAK.
Set HAVE_GAS_ALIGNED_COMM to 1 by default.
Use a reference from "x86 Windows Options" to "Cygwin and MinGW Options".
Update commit descriptions to follow standard style.


Regards,
Evgeny

Re: [PATCH] c++: auto(x) partial substitution [PR110025, PR114138]

2024-02-29 Thread Patrick Palka

On Wed, 28 Feb 2024, Jason Merrill wrote:

> On 2/27/24 15:48, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > OK for trunk and perhaps 13?
> > 
> > -- >8 --
> > 
> > In r12-6773-g09845ad7569bac we gave CTAD placeholders a level of 0 and
> > ensured we never replaced them via tsubst.  It turns out that autos
> > representing an explicit cast need the same treatment and for the same
> > reason: such autos appear in an expression context and so their level
> > gets easily messed up after partial substitution, leading to premature
> > replacement via an incidental tsubst instead of via do_auto_deduction.
> > 
> > This patch fixes this by extending the r12-6773 approach to auto(x) and
> > auto{x}.
> > 
> > PR c++/110025
> > PR c++/114138
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (make_cast_auto): Declare.
> > * parser.cc (cp_parser_functional_cast): Replace a parsed auto
> > with a level-less one via make_cast_auto.
> > * pt.cc (find_parameter_packs_r): Don't treat level-less auto
> > as a type parameter pack.
> > (tsubst) <case TEMPLATE_TYPE_PARM>: Generalized CTAD placeholder
> > handling to all level-less autos.
> > (make_cast_auto): Define.
> > (do_auto_deduction): Handle deduction of a level-less non-CTAD
> > auto.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp23/auto-fncast16.C: New test.
> > * g++.dg/cpp23/auto-fncast17.C: New test.
> > * g++.dg/cpp23/auto-fncast18.C: New test.
> > ---
> >   gcc/cp/cp-tree.h   |  1 +
> >   gcc/cp/parser.cc   | 11 
> >   gcc/cp/pt.cc   | 31 +-
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast16.C | 12 
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast17.C | 15 +
> >   gcc/testsuite/g++.dg/cpp23/auto-fncast18.C | 71 ++
> >   6 files changed, 138 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast16.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast17.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp23/auto-fncast18.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 04c3aa6cd91..6f1da1c7bad 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -7476,6 +7476,7 @@ extern tree make_decltype_auto
> > (void);
> >   extern tree make_constrained_auto (tree, tree);
> >   extern tree make_constrained_decltype_auto(tree, tree);
> >   extern tree make_template_placeholder (tree);
> > +extern tree make_cast_auto (void);
> >   extern bool template_placeholder_p(tree);
> >   extern bool ctad_template_p   (tree);
> >   extern bool unparenthesized_id_or_class_member_access_p (tree);
> > diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> > index 3ee9d49fb8e..1e518e6ef51 100644
> > --- a/gcc/cp/parser.cc
> > +++ b/gcc/cp/parser.cc
> > @@ -33314,6 +33314,17 @@ cp_parser_functional_cast (cp_parser* parser, tree
> > type)
> > if (!type)
> >   type = error_mark_node;
> >   +  if (TREE_CODE (type) == TYPE_DECL
> > +  && is_auto (TREE_TYPE (type)))
> > +type = TREE_TYPE (type);
> > +
> > +  if (is_auto (type)
> > +  && !AUTO_IS_DECLTYPE (type)
> > +  && !PLACEHOLDER_TYPE_CONSTRAINTS (type)
> > +  && !CLASS_PLACEHOLDER_TEMPLATE (type))
> > +/* auto(x) and auto{x} are represented using a level-less auto.  */
> > +type = make_cast_auto ();
> > +
> > if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE))
> >   {
> > cp_lexer_set_source_position (parser->lexer);
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 2803824d11e..620fe5cdbfa 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -3921,7 +3921,8 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees,
> > void* data)
> >  parameter pack (14.6.3), or the type-specifier-seq of a type-id that
> >  is a pack expansion, the invented template parameter is a template
> >  parameter pack.  */
> > -  if (ppd->type_pack_expansion_p && is_auto (t))
> > +  if (ppd->type_pack_expansion_p && is_auto (t)
> > + && TEMPLATE_TYPE_LEVEL (t) != 0)
> > TEMPLATE_TYPE_PARAMETER_PACK (t) = true;
> > if (TEMPLATE_TYPE_PARAMETER_PACK (t))
> >   parameter_pack_p = true;
> > @@ -16297,9 +16298,14 @@ tsubst (tree t, tree args, tsubst_flags_t complain,
> > tree in_decl)
> > }
> > case TEMPLATE_TYPE_PARM:
> > -  if (template_placeholder_p (t))
> > +  if (TEMPLATE_TYPE_LEVEL (t) == 0)
> > {
> > + /* Level-less auto must be replaced via do_auto_deduction.  */
> 
> This comment could use clarification about the CTAD case.

Fixed.

> 
> > + gcc_checking_assert (is_auto (t));
> >   tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (t);
> > + if (!tmpl)
> > +   return t;
> > +
> >   tmpl = tsubst_expr (tmpl, args, complain, in_decl);
> >   if (TREE_CODE

[PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Evgeny Karpov

Thursday, February 29, 2024 6:56 PM
Andrew Pinski (QUIC) wrote:

> Looking at these results, this port is not in any shape or form to be 
> upstreamed
> right now. Even simple -g will cause failures.
> Note we don't need a clean testsuite run but the patch series is not even
> allowing enabling hello world due to the -g not being able to used.
> 
> Thanks,
> Andrew Pinski

For now, our contribution plan contains 4 patch series.

1. Minimal aarch64-w64-mingw32 C implementation to cross-compile
hello-world with libgcc for Windows Arm64 using MinGW.
2. Extension of the aarch64-w64-mingw32 C implementation to
cross-compile OpenSSL, OpenBLAS, FFmpeg, and libjpeg-turbo. All
packages successfully pass tests.
3. Addition of call stack support for debugging, resolution of
optimization issues in the C compiler, and DLL export/import for the
aarch64-w64-mingw32 target.
4. Unit testing integration for aarch64-w64-mingw32 target

The goal is to prepare the first patch series for upstreaming as
soon as possible. This will enable iterative development and
introduce a new target that can potentially be tested and
improved by the community.

If debugging information is a strong blocker for the first patch
series, its priority can be changed. However, it would be
preferable to include it in the second and third patch series,
as originally planned.

Regards,
Evgeny

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-02-29 Thread Steve Kargl

On Thu, Feb 29, 2024 at 09:36:43AM -0800, Jerry D wrote:
> On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:
> 
> > And, just for my own education, the length limitation of iomsg to 255
> > chars is not backed by the standard AFAICS, right? It's just our
> > STRERR_MAXSZ?
> 
> Yes, it's what we have had for a long, long time. Once you throw an error
> things get very processor dependent. I found MSGLEN set to 100 and IOMSG_len
> to 256. Nothing magic about it.
> 

There is no restriction on the length for the iomsg-variable
that receives the generated error message.  In fact, if the
iomsg-variable has a deferred-length type parameter, then
(re)-allocation to the exact length is expected.

  F2023

  12.11.6 IOMSG= specifier

  If an error, end-of-file, or end-of-record condition occurs during
  execution of an input/output statement, iomsg-variable is assigned
  an explanatory message, as if by intrinsic assignment. If no such
  condition occurs, the definition status and value of iomsg-variable
  are unchanged.

character(len=23) emsg
read(fd,*,iomsg=emsg)

Here, the generated iomsg is either truncated to a length of 23
or padded with blanks to a length of 23.

character(len=:), allocatable :: emsg
read(fd,*,iomsg=emsg)

Here, emsg should have the length of whatever error message was
generated.

HTH

-- 
Steve

Re: [PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-29 Thread NightStrike

On Thu, Feb 29, 2024 at 11:26 AM Evgeny Karpov
 wrote:
>
> Monday, February 26, 2024 2:30 AM
> NightStrike wrote:
>
> > To be clear, because of the refactoring, it will affect x86/x64 Windows 
> > targets.
> > Can you do a testsuite run before and after and see that it doesn't get 
> > worse?
> > The full testsuite for all languages for Windows isn't in great shape, but 
> > it's not
> > awful.  Some languages, like Rust and Fortran, have ~10 FAILs.  C and C++ 
> > have
> > several thousand.
> >
> > In particular, there are quite a few testsuite test FAILs regarding MS ABI 
> > that
> > hopefully do not get worse.
> >
>
> Thank you for bringing it up! Our CI will be extended to test the x64
> mingw target and calculate a delta, starting from patch series v2.

Thanks.  You should probably include x86 also, at least for all the
areas that overlap.  I would like to compare my own test results with
yours when you have that ready.

You can send test results to the gcc mailing list set up for this
purpose: https://gcc.gnu.org/mailman/listinfo/gcc-testresults, and
there are scripts in contrib/ to help automate the process.  I
personally stopped, because the clusters I used had their mail sending
capabilities cut off, but I'm working on fixing that.

> > Lastly, I don't think I see in the current patch series where you add new
> > testsuite coverage for aarch64-specific bits.  I probably missed it, so 
> > feel free to
> > helpfully correct me there :)  I'd be curious to see how the tests were 
> > written to
> > take into account target differences (using for example the dejagnu feature
> > procs) and other nuances.
>
> Tests have not been added yet. This does not mean they do not exist
> or are not used. They are implemented and used in our CI, and will be
> contributed to the aarch64-w64-mingw32 target in the next patch
> series.
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/tree/main/tests

Awesome!  These tests look like they are handled by your own custom
test harness, so hopefully it won't be too difficult to convert it all
to dejagnu.  Honestly, the sooner you do that, the better, because the
task is going to balloon.  You'll find that Deja offers all kinds of
neat and useful features that allow you to test all kinds of things,
so it'll result in better coverage in the end.

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 17:55, Andrew Pinski (QUIC) wrote:
>> -Original Message-
>> From: Maxim Kuvyrkov 
>> Sent: Thursday, February 29, 2024 9:46 AM
>> To: Andrew Pinski (QUIC) 
>> Cc: Evgeny Karpov ; Andrew Pinski
>> ; Richard Sandiford ; gcc-
>> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
>> Walker ; Ron Riddle
>> ; Radek Barton 
>> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
>> environments for AArch64
>>
>> WARNING: This email originated from outside of Qualcomm. Please be wary
>> of any links or attachments, and do not enable macros.
>>
>>> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>>  wrote:
>>>
>>>
>>>
 -Original Message-
 From: Evgeny Karpov 
 Sent: Thursday, February 29, 2024 8:46 AM
 To: Andrew Pinski 
 Cc: Richard Sandiford ; gcc-
 patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
 ; m...@harmstone.com; Zac Walker
 ; Ron Riddle ;
 Radek Barton ; Andrew Pinski (QUIC)
 
 Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
 for AArch64

 Wednesday, February 28, 2024 2:00 AM
 Andrew Pinski wrote:

> What does this mean with respect to C++ exceptions? Or you using
> SJLJ exceptions support or the dwarf unwinding ones without SEH
>> support?
> I am not sure if SJLJ exceptions is well tested any more in GCC either.
>
> Also I have a question if you ran the full GCC/G++ testsuites and
> what were the results?
> If you did run it, did you use a cross compiler or the native
> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
>> though)?

 As mentioned in the cover letter and the thread, the current
 contribution covers only the C scope.
 Exception handling is fully disabled for now.
 There is an experimental build with C++ and SEH, however, it is not
 included in the plan for the current contribution.

 https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
>> build

> If you run using a cross compiler, did you use ssh or some other
> route to run the applications?
>
> Thanks,
> Andrew Pinski

 GitHub Actions are used to cross-compile toolchains, packages and
 tests, and execute tests on Windows Arm64.
>>>
>>> This does not answer my question because what you are running is just
>> simple testcases and not the FULL GCC testsuite.
>>> So again have you ran the GCC testsuite and do you have a dejagnu board to
>> be able to execute the binaries?
>>> I think without the GCC testsuite ran to find all of the known failures, 
>>> you are
>> going to be running into many issues.
>>> The GCC testsuite includes many tests for ABI corner cases and many
>> features that you will most likely not think about testing using your simple
>> testcases.
>>> In fact I suspect there will be some of the aarch64 testcases which will 
>>> need
>> to be modified for the windows ABI which you have not done yet.
>>
>> Hi Andrew,
>>
>> We (Linaro) have a prototype CI loop setup for testing aarch64-w64-
>> mingw32, and we have results for gcc-c and libatomic -- see [1].
>>
>> The results are far from clean, but that's expected.  This patch series aims 
>> at
>> enabling C hello-world only, and subsequent patch series will improve the
>> state of the port.
>>
>> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-
>> build/6/artifact/artifacts/sumfiles/
> 
> Looking at these results, this port is not in any shape or form to be 
> upstreamed right now. Even simple -g will cause failures.
> Note we don't need a clean testsuite run but the patch series is not even 
> allowing enabling hello world due to the -g not being able to be used.
> 

It seemed to me as though the patch was posted for comments, not for immediate 
inclusion.  I agree this isn't ready for committing yet, but neither should the 
submitters wait until it's perfect before posting it.

I think it's gcc-15 material, so now is about the right time to be thinking 
about it.

R.

> Thanks,
> Andrew Pinski
> 
>>
>> Thanks,
>>
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 05:51:03PM +, Richard Earnshaw (lists) wrote:
> Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we 
> never reach the point where you pick zero.  So perhaps I'm worrying about 
> nothing.

If you are worried about the
+  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
+  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
 n_named_args = 0;
case in the patch, we know at that point that the initial n_named_args is
equal to structure_value_addr_parm, so it is either 0 or 1.  If it is 0,
--n_named_args;
would yield the undesirable negative value, so we want 0 instead (for that
case we could as well just have ; in there instead of n_named_args = 0;).
If it is 1, --n_named_args; would turn that into 0 anyway.
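
To make the two cases concrete, here is a small standalone sketch.  It is
a simplified model of just this branch, not the calls.cc code; both
function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical model: what a plain decrement would do in this branch.
int naive_decrement(int structure_value_addr_parm)
{
  int n_named_args = structure_value_addr_parm;  // 0 or 1 at this point
  --n_named_args;                                // -1 when it started at 0
  return n_named_args;
}

// Hypothetical model of the patched branch: assign 0 outright.
int patched_branch(int structure_value_addr_parm)
{
  int n_named_args = structure_value_addr_parm;  // 0 or 1 at this point
  n_named_args = 0;                              // never negative
  return n_named_args;
}
```

For an initial value of 1 both give 0; they differ only for 0, where the
decrement would go negative.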

Jakub

RE: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Andrew Pinski (QUIC)

> -Original Message-
> From: Maxim Kuvyrkov 
> Sent: Thursday, February 29, 2024 9:46 AM
> To: Andrew Pinski (QUIC) 
> Cc: Evgeny Karpov ; Andrew Pinski
> ; Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; 10wa...@gmail.com; m...@harmstone.com; Zac
> Walker ; Ron Riddle
> ; Radek Barton 
> Subject: Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW
> environments for AArch64
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary
> of any links or attachments, and do not enable macros.
> 
> > On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)
>  wrote:
> >
> >
> >
> >> -Original Message-
> >> From: Evgeny Karpov 
> >> Sent: Thursday, February 29, 2024 8:46 AM
> >> To: Andrew Pinski 
> >> Cc: Richard Sandiford ; gcc-
> >> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
> >> ; m...@harmstone.com; Zac Walker
> >> ; Ron Riddle ;
> >> Radek Barton ; Andrew Pinski (QUIC)
> >> 
> >> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
> >> for AArch64
> >>
> >> Wednesday, February 28, 2024 2:00 AM
> >> Andrew Pinski wrote:
> >>
> >>> What does this mean with respect to C++ exceptions? Or you using
> >>> SJLJ exceptions support or the dwarf unwinding ones without SEH
> support?
> >>> I am not sure if SJLJ exceptions is well tested any more in GCC either.
> >>>
> >>> Also I have a question if you ran the full GCC/G++ testsuites and
> >>> what were the results?
> >>> If you did run it, did you use a cross compiler or the native
> >>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions
> though)?
> >>
> >> As mentioned in the cover letter and the thread, the current
> >> contribution covers only the C scope.
> >> Exception handling is fully disabled for now.
> >> There is an experimental build with C++ and SEH, however, it is not
> >> included in the plan for the current contribution.
> >>
> >> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
> build
> >>
> >>> If you run using a cross compiler, did you use ssh or some other
> >>> route to run the applications?
> >>>
> >>> Thanks,
> >>> Andrew Pinski
> >>
> >> GitHub Actions are used to cross-compile toolchains, packages and
> >> tests, and execute tests on Windows Arm64.
> >
> > This does not answer my question because what you are running is just
> simple testcases and not the FULL GCC testsuite.
> > So again have you ran the GCC testsuite and do you have a dejagnu board to
> be able to execute the binaries?
> > I think without the GCC testsuite ran to find all of the known failures, 
> > you are
> going to be running into many issues.
> > The GCC testsuite includes many tests for ABI corner cases and many
> features that you will most likely not think about testing using your simple
> testcases.
> > In fact I suspect there will be some of the aarch64 testcases which will 
> > need
> to be modified for the windows ABI which you have not done yet.
> 
> Hi Andrew,
> 
> We (Linaro) have a prototype CI loop setup for testing aarch64-w64-
> mingw32, and we have results for gcc-c and libatomic -- see [1].
> 
> The results are far from clean, but that's expected.  This patch series aims 
> at
> enabling C hello-world only, and subsequent patch series will improve the
> state of the port.
> 
> [1] https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-
> build/6/artifact/artifacts/sumfiles/

Looking at these results, this port is not in any shape or form to be 
upstreamed right now. Even simple -g will cause failures.
Note we don't need a clean testsuite run but the patch series is not even 
allowing enabling hello world due to the -g not being able to be used.

Thanks,
Andrew Pinski

> 
> Thanks,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org

[committed] libstdc++: Fix std::basic_format_arg::handle for BasicFormatters

2024-02-29 Thread Jonathan Wakely

Tested x86_64-linux. Pushed to trunk.

-- >8 --

std::basic_format_arg::handle is supposed to format its value as const
if that is valid, to reduce the number of instantiations of the
formatter's format function. I made a silly typo so that it checks
formattable_with<T, Context> not formattable_with<const T, Context>,
which breaks support for BasicFormatters i.e. ones that can only format
non-const types.

There's a static_assert in the handle constructor which is supposed to
improve diagnostics for trying to format a const argument with a
formatter that doesn't support it. That condition can't fail, because
the std::basic_format_arg constructor is already constrained to check
that the argument type is formattable. The static_assert can be removed.
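
The effect of the corrected condition can be sketched outside the library
like this.  Everything here (demo_formatter, demo_formattable,
maybe_const_t, ConstOk, NonConstOnly) is a hypothetical stand-in, written
in plain C++11 rather than with the concepts std::format really uses.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a user formatter; only the parameter type of
// format() matters for this sketch.
template<typename T> struct demo_formatter;

struct ConstOk { };
struct NonConstOnly { };

template<> struct demo_formatter<ConstOk>
{ int format(const ConstOk&) const { return 1; } };

// Models a BasicFormatter: can only format a non-const object.
template<> struct demo_formatter<NonConstOnly>
{ int format(NonConstOnly&) const { return 2; } };

// True if the formatter for remove_const<T> accepts an lvalue of type T.
template<typename T, typename = void>
struct demo_formattable : std::false_type { };

template<typename T>
struct demo_formattable<T, decltype(void(
    std::declval<const demo_formatter<typename std::remove_const<T>::type>&>()
      .format(std::declval<T&>())))>
: std::true_type { };

// The condition must ask about const T.  Asking about plain T would
// select const T even for NonConstOnly, which cannot be formatted as const.
template<typename T>
using maybe_const_t = typename std::conditional<
    demo_formattable<const T>::value, const T, T>::type;
```

With this, maybe_const_t<ConstOk> is const ConstOk, while
maybe_const_t<NonConstOnly> stays NonConstOnly.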

libstdc++-v3/ChangeLog:

* include/std/format (basic_format_arg::handle::__maybe_const_t):
Fix condition to check if const type is formattable.
(basic_format_arg::handle::handle(T&)): Remove redundant
static_assert.
* testsuite/std/format/formatter/basic.cc: New test.
---
 libstdc++-v3/include/std/format   |  6 +
 .../testsuite/std/format/formatter/basic.cc   | 24 +++
 2 files changed, 25 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/format/formatter/basic.cc

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 961441e355b..ee189f9086c 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3210,7 +3210,7 @@ namespace __format
// Format as const if possible, to reduce instantiations.
template<typename _Tp>
  using __maybe_const_t
-   = __conditional_t<__formattable<_Tp>, const _Tp, _Tp>;
+   = __conditional_t<__formattable<const _Tp>, const _Tp, _Tp>;
 
template
  static void
@@ -3228,10 +3228,6 @@ namespace __format
  explicit
  handle(_Tp& __val) noexcept
  {
-   if constexpr (!__formattable)
- static_assert(!is_const_v<_Tp>, "std::format argument must be "
- "non-const for this type");
-
this->_M_ptr = __builtin_addressof(__val);
auto __func = _S_format<__maybe_const_t<_Tp>>;
this->_M_func = reinterpret_cast(__func);
diff --git a/libstdc++-v3/testsuite/std/format/formatter/basic.cc 
b/libstdc++-v3/testsuite/std/format/formatter/basic.cc
new file mode 100644
index 000..56c18864135
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/formatter/basic.cc
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++20 } }
+
+// BasicFormatter requirements do not require a const parameter.
+
+#include <format>
+
+struct X { };
+
+template<> struct std::formatter<X>
+{
+  constexpr auto parse(format_parse_context& ctx)
+  { return ctx.begin(); }
+
+  // Takes non-const X&
+  format_context::iterator format(X&, format_context& ctx) const
+  {
+auto out = ctx.out();
+*out++ = 'x';
+return out;
+  }
+};
+
+X x;
+auto s = std::format("{}", x);
-- 
2.43.2

[committed] libstdc++: Fix conditions for using memcmp in std::lexicographical_compare_three_way [PR113960]

2024-02-29 Thread Jonathan Wakely

Tested aarch64-linux, powerpc-linux (power 7 BE), x86_64-linux.
The bug reporter tested it on s390x too.

Pushed to trunk. This should be backported too.

-- >8 --

The change in r11-2981-g2f983fa69005b6 meant that
std::lexicographical_compare_three_way started to use memcmp for
unsigned integers on big endian targets, but for that to be valid we
need the two value types to have the same size and we need to use that
size to compute the length passed to memcmp.

I already defined a __is_memcmp_ordered_with trait that does the right
checks, std::lexicographical_compare_three_way just needs to use it.
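
As a rough illustration of the two conditions named above (same element size, and a memcmp length measured in bytes rather than elements), here is a simplified sketch. It is not the libstdc++ implementation: compare_three_way_sketch and is_big_endian are hypothetical names, the result is a plain int instead of a comparison category, and endianness is probed at run time so the sketch also builds before C++20.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Run-time endianness probe (hypothetical helper for this sketch).
inline bool is_big_endian()
{
  const std::uint16_t probe = 0x0102;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 0x01;
}

// Returns -1, 0 or 1.  Both ranges have the same unsigned element type T,
// so the "same size" condition holds by construction.
template<typename T>
int compare_three_way_sketch(const T* a, std::size_t na,
                             const T* b, std::size_t nb)
{
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned only");
  const std::size_t len = std::min(na, nb);
  if (len != 0 && (sizeof(T) == 1 || is_big_endian()))
    {
      // memcmp takes a length in bytes, not in elements.
      const std::size_t blen = len * sizeof(T);
      const int c = std::memcmp(a, b, blen);
      if (c != 0)
        return c < 0 ? -1 : 1;
    }
  else
    {
      // Little-endian multi-byte values: byte order != numeric order.
      for (std::size_t i = 0; i < len; ++i)
        if (a[i] != b[i])
          return a[i] < b[i] ? -1 : 1;
    }
  return na < nb ? -1 : (na > nb ? 1 : 0);
}
```

Passing the element count straight to memcmp would compare only the first half of a range of 2-byte values, which matches the kind of wrong result the test case for the PR looks for.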

libstdc++-v3/ChangeLog:

PR libstdc++/113960
* include/bits/stl_algobase.h (__is_byte_iter): Replace with ...
(__memcmp_ordered_with): New concept.
(lexicographical_compare_three_way): Use __memcmp_ordered_with
instead of __is_byte_iter. Use correct length for memcmp.
* testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc:
New test.
---
 libstdc++-v3/include/bits/stl_algobase.h  | 41 ++-
 .../113960.cc | 15 +++
 2 files changed, 37 insertions(+), 19 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc

diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index d534e02871f..74ff42d4f39 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -1806,11 +1806,14 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 }
 
 #if __cpp_lib_three_way_comparison
-  // Iter points to a contiguous range of unsigned narrow character type
-  // or std::byte, suitable for comparison by memcmp.
-  template<typename _Iter>
-concept __is_byte_iter = contiguous_iterator<_Iter>
-  && __is_memcmp_ordered<iter_value_t<_Iter>>::__value;
+  // Both iterators refer to contiguous ranges of unsigned narrow characters,
+  // or std::byte, or big-endian unsigned integers, suitable for comparison
+  // using memcmp.
+  template<typename _Iter1, typename _Iter2>
+concept __memcmp_ordered_with
+  = (__is_memcmp_ordered_with<iter_value_t<_Iter1>,
+ iter_value_t<_Iter2>>::__value)
+ && contiguous_iterator<_Iter1> && contiguous_iterator<_Iter2>;
 
   // Return a struct with two members, initialized to the smaller of x and y
   // (or x if they compare equal) and the result of the comparison x <=> y.
@@ -1860,20 +1863,20 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   if (!std::__is_constant_evaluated())
if constexpr (same_as<_Comp, __detail::_Synth3way>
  || same_as<_Comp, compare_three_way>)
- if constexpr (__is_byte_iter<_InputIter1>)
-   if constexpr (__is_byte_iter<_InputIter2>)
- {
-   const auto [__len, __lencmp] = _GLIBCXX_STD_A::
- __min_cmp(__last1 - __first1, __last2 - __first2);
-   if (__len)
- {
-   const auto __c
- = __builtin_memcmp(&*__first1, &*__first2, __len) <=> 0;
-   if (__c != 0)
- return __c;
- }
-   return __lencmp;
- }
+ if constexpr (__memcmp_ordered_with<_InputIter1, _InputIter2>)
+   {
+ const auto [__len, __lencmp] = _GLIBCXX_STD_A::
+   __min_cmp(__last1 - __first1, __last2 - __first2);
+ if (__len)
+   {
+ const auto __blen = __len * sizeof(*__first1);
+ const auto __c
+   = __builtin_memcmp(&*__first1, &*__first2, __blen) <=> 0;
+ if (__c != 0)
+   return __c;
+   }
+ return __lencmp;
+   }
 
   while (__first1 != __last1)
{
diff --git 
a/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc
 
b/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc
new file mode 100644
index 000..d51ae1a3d50
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/25_algorithms/lexicographical_compare_three_way/113960.cc
@@ -0,0 +1,15 @@
+// { dg-do run { target c++20 } }
+
+// PR libstdc++/113960
+// std::map with std::vector as input overwrites itself with c++20, on s390x
+
+#include <algorithm>
+#include <testsuite_hooks.h>
+
+int main()
+{
+  unsigned short a1[] { 1, 2, 3 };
+  unsigned short a2[] { 1, 2, 4 };
+  // Incorrect memcmp comparison for big endian targets.
+  VERIFY( std::lexicographical_compare_three_way(a1, a1+3, a2, a2+3) < 0 );
+}
-- 
2.43.2

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 17:38, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 05:23:25PM +, Richard Earnshaw (lists) wrote:
>> On 29/02/2024 15:55, Jakub Jelinek wrote:
>>> On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
>>>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>>>> including the new testcase you added.
>>>>>
>>>>
>>>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>>>> n_named_args entirely, it doesn't need it (I don't think it even existed
>>>> when the AAPCS code was added).
>>>
>>> So far I've just checked that the new testcase passes not just on
>>> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
>>> with vanilla trunk.
>>> Haven't posted this patch in patch form, plus while I'm not really sure
>>> whether setting n_named_args to 0 or not changing in the
>>> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
>>> correct to me.  If structure_value_addr_parm is 1, the function effectively
>>> has a single named argument and then ... args and if the target wants
>>> n_named_args to be number of named arguments except the last, then that
>>> should be 0 rather than 1.
>>>
>>> Thus, is the following patch ok for trunk then?
>>
>> The comment at the start of the section says
>>
>>   /* Now possibly adjust the number of named args.
>>  Normally, don't include the last named arg if anonymous args follow.
>>  We do include the last named arg if
>>  targetm.calls.strict_argument_naming() returns nonzero.
>>  (If no anonymous args follow, the result of list_length is actually
>>  one too large.  This is harmless.)
>>
>> So in the case of strict_argument_naming perhaps it should return 1, but 0 
>> for other cases.
> 
> The TYPE_NO_NAMED_ARGS_STDARG_P (funtype) case is as if type_arg_types != 0
> and list_length (type_arg_types) == 0, i.e. no user named arguments.
> As list_length (NULL) returns 0, perhaps it could even be handled just
> by changing all the type_arg_types != 0 checks to
> type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> There are just 2 cases I'm worried about.  One is that I think neither the rest of
> calls.cc nor the backends are prepared to see n_named_args -1 after the
> adjustments, so I think it is better to use 0; the other is the question of what
> the !strict_argument_naming && !pretend_outgoing_varargs_named case
> wants to do for the aggregate return.  The patch as posted for
> void foo (...); void bar () { foo (1, 2, 3); }
> will set n_named_args initially to 0 (no named args) and with the
> adjustments for strict_argument_naming 0, otherwise for !pretend
> 0 as well, otherwise 3.
> For
> struct { char buf[4096]; } baz (...); void qux () { baz (1, 2, 3); }
> the patch sets n_named_args initially to 1 (the hidden return) and
> with the arguments for strict keep it at 1, for !pretend 0 and otherwise
> 3.
> 
> So, which case do you think is handled incorrectly with that?

The way I was thinking about it (and testing it on Arm) was to look at 
n_named_args for the cases of a traditional varargs case, then reduce that by 
one (except it can't ever be negative).

So for 

void f(...);
void g(int, ...);
struct S { int a[32]; };

struct S h (...);
struct S i (int, ...);

void a ()
{
  struct S x;
  f(1, 2, 3, 4);
  g(1, 2, 3, 4);
  x = h (1, 2, 3, 4);
  x = i (1, 2, 3, 4);
}

There are various permutations that could lead to answers of 0, 1, 2, 4 and 5 
depending on how those various targets treat each case and how the result 
pointer address is handled.  My suspicion is that for a target that has strict 
argument naming and the result pointer passed as a first argument, the answer 
for the 'h()' call should be 1, not zero.  

Oh, but wait!  Perhaps that now falls into the initial 'if' clause and we never 
reach the point where you pick zero.  So perhaps I'm worrying about nothing.

R.

> 
>   Jakub
>

Re: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Maxim Kuvyrkov

> On Feb 29, 2024, at 21:35, Andrew Pinski (QUIC)  
> wrote:
> 
> 
> 
>> -Original Message-
>> From: Evgeny Karpov 
>> Sent: Thursday, February 29, 2024 8:46 AM
>> To: Andrew Pinski 
>> Cc: Richard Sandiford ; gcc-
>> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
>> ; m...@harmstone.com; Zac Walker
>> ; Ron Riddle ; Radek
>> Barton ; Andrew Pinski (QUIC)
>> 
>> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
>> for AArch64
>> 
>> Wednesday, February 28, 2024 2:00 AM
>> Andrew Pinski wrote:
>> 
>>> What does this mean with respect to C++ exceptions? Are you using SJLJ
>>> exceptions support or the dwarf unwinding ones without SEH support?
>>> I am not sure if SJLJ exceptions are well tested any more in GCC either.
>>> 
>>> Also I have a question if you ran the full GCC/G++ testsuites and what
>>> were the results?
>>> If you did run it, did you use a cross compiler or the native
>>> compiler? Did you do a bootstrap (GCC uses C++ but no exceptions though)?
>> 
>> As mentioned in the cover letter and the thread, the current contribution
>> covers only the C scope.
>> Exception handling is fully disabled for now.
>> There is an experimental build with C++ and SEH, however, it is not included
>> in the plan for the current contribution.
>> 
>> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build
>> 
>>> If you run using a cross compiler, did you use ssh or some other route
>>> to run the applications?
>>> 
>>> Thanks,
>>> Andrew Pinski
>> 
>> GitHub Actions are used to cross-compile toolchains, packages and tests, and
>> execute tests on Windows Arm64.
> 
> This does not answer my question, because what you are running is just simple 
> testcases and not the FULL GCC testsuite.
> So again, have you run the GCC testsuite, and do you have a dejagnu board to be 
> able to execute the binaries?
> I think without running the GCC testsuite to find all of the known failures, you 
> are going to run into many issues.
> The GCC testsuite includes many tests for ABI corner cases and many features 
> that you will most likely not think about testing using your simple testcases.
> In fact I suspect there will be some of the aarch64 testcases which will need 
> to be modified for the windows ABI which you have not done yet.

Hi Andrew,

We (Linaro) have a prototype CI loop setup for testing aarch64-w64-mingw32, and 
we have results for gcc-c and libatomic -- see [1].

The results are far from clean, but that's expected.  This patch series aims at 
enabling C hello-world only, and subsequent patch series will improve the state 
of the port.

[1] 
https://ci.linaro.org/job/tcwg_gnu_mingw_check_gcc--master-woa64-build/6/artifact/artifacts/sumfiles/

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org

[patch,avr,applied] PR target/114132: Sets up a frame without need

2024-02-29 Thread Georg-Johann Lay


The condition when a frame pointer is required because
arguments are passed on the stack was not exact, and
there were situations when a frame was set up without
a need for it.

Johann

--

AVR: target/114132 - Code sets up a frame pointer without need.

The condition CUMULATIVE_ARGS.nregs == 0 in avr_frame_pointer_required_p()
means that no more argument registers are left, but that's not the same
condition that tells whether an argument pointer is required.
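
To see why the two conditions differ, here is a simplified model of the argument bookkeeping. It is only a sketch under stated assumptions: CumArgsSketch, arg_bytes and pass_arg are hypothetical names, the 18 available register bytes come from the non-Tiny initialisation in the patch, and rounding each argument up to an even number of bytes is assumed rather than copied from avr.cc.

```cpp
// Hypothetical, simplified model of the AVR CUMULATIVE_ARGS bookkeeping.
struct CumArgsSketch
{
  int nregs = 18;              // register bytes still available (non-Tiny)
  bool has_stack_args = false; // set once any argument goes to the stack
};

// Assumption: each argument occupies an even number of register bytes.
inline int arg_bytes(int size)
{
  return (size + 1) & ~1;
}

// Returns true if the argument is passed in registers.
inline bool pass_arg(CumArgsSketch& cum, int size)
{
  const int bytes = arg_bytes(size);
  const bool in_regs = cum.nregs != 0 && bytes <= cum.nregs;
  if (!in_regs)
    cum.has_stack_args = true;
  cum.nregs -= bytes;
  if (cum.nregs <= 0)
    cum.nregs = 0;
  return in_regs;
}
```

For a function taking (long long, long long, char), the model uses 8 + 8 + 2 = 18 bytes, so nregs ends at 0 although nothing was passed on the stack. Only a further argument would set has_stack_args, which is the situation in which an argument pointer is actually needed.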

PR target/114132
gcc/
* config/avr/avr.h (CUMULATIVE_ARGS) <has_stack_args>: New field.
* config/avr/avr.cc (avr_init_cumulative_args): Initialize it.
(avr_function_arg): Set it.
(avr_frame_pointer_required_p): Use it instead of .nregs.

gcc/testsuite/
* gcc.target/avr/pr114132-1.c: New test.
* gcc.target/avr/torture/pr114132-2.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 655a8e89fdc..478463b237a 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -3565,6 +3565,7 @@ avr_init_cumulative_args (CUMULATIVE_ARGS *cum, tree fntype, rtx libname,
 {
   cum->nregs = AVR_TINY ? 6 : 18;
   cum->regno = FIRST_CUM_REG;
+  cum->has_stack_args = 0;
   if (!libname && stdarg_p (fntype))
 cum->nregs = 0;
 
@@ -3605,6 +3606,8 @@ avr_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
   if (cum->nregs && bytes <= cum->nregs)
 return gen_rtx_REG (arg.mode, cum->regno - bytes);
 
+  cum->has_stack_args = 1;
+
   return NULL_RTX;
 }
 
@@ -6014,6 +6017,8 @@ out_movhi_mr_r (rtx_insn *insn, rtx op[], int *plen)
   return "";
 }
 
+
+/* Implement `TARGET_FRAME_POINTER_REQUIRED'.  */
 /* Return 1 if frame pointer for current function required.  */
 
 static bool
@@ -6022,7 +6027,7 @@ avr_frame_pointer_required_p (void)
   return (cfun->calls_alloca
 	  || cfun->calls_setjmp
 	  || cfun->has_nonlocal_label
-	  || crtl->args.info.nregs == 0
+	  || crtl->args.info.has_stack_args
 	  || get_frame_size () > 0);
 }
 
diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
index ff2738df78c..56211fa9cd0 100644
--- a/gcc/config/avr/avr.h
+++ b/gcc/config/avr/avr.h
@@ -333,6 +333,10 @@ typedef struct avr_args
 
   /* Next available register number */
   int regno;
+
+  /* Whether some of the arguments are passed on the stack,
+ and hence an arg pointer is needed.  */
+  int has_stack_args;
 } CUMULATIVE_ARGS;
 
 #define INIT_CUMULATIVE_ARGS(CUM, FNTYPE, LIBNAME, FNDECL, N_NAMED_ARGS) \
diff --git a/gcc/testsuite/gcc.target/avr/pr114132-1.c b/gcc/testsuite/gcc.target/avr/pr114132-1.c
new file mode 100644
index 000..209eca823bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/pr114132-1.c
@@ -0,0 +1,15 @@
+/* { dg-additional-options "-Os -std=c99" } */
+
+#ifdef __AVR_TINY__
+int func (int a, int b, char c)
+#else
+int func (long long a, long long b, char c)
+#endif
+{
+(void) a;
+(void) b;
+
+return c;
+}
+
+/* { dg-final { scan-assembler-not "push r28" } } */
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr114132-2.c b/gcc/testsuite/gcc.target/avr/torture/pr114132-2.c
new file mode 100644
index 000..c2bcbacec37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr114132-2.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99" } */
+
+__attribute__((noinline,noclone))
+#ifdef __AVR_TINY__
+int func (int a, int b, char c)
+#else
+int func (long long a, long long b, char c)
+#endif
+{
+(void) a;
+(void) b;
+return 10 + c;
+}
+
+int main (void)
+{
+if (func (0, 0, 91) != 101)
+__builtin_abort();
+return 0;
+}
+

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 05:23:25PM +, Richard Earnshaw (lists) wrote:
> On 29/02/2024 15:55, Jakub Jelinek wrote:
> > On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
> >>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
> >>> including the new testcase you added.
> >>>
> >>
> >> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
> >> n_named_args entirely, it doesn't need it (I don't think it even existed
> >> when the AAPCS code was added).
> > 
> > So far I've just checked that the new testcase passes not just on
> > x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
> > with vanilla trunk.
> > Haven't posted this patch in patch form, plus while I'm not really sure
> > whether setting n_named_args to 0 or not changing in the
> > !pretend_outgoing_varargs_named is right, the setting to 0 feels more
> > correct to me.  If structure_value_addr_parm is 1, the function effectively
> > has a single named argument and then ... args and if the target wants
> > n_named_args to be number of named arguments except the last, then that
> > should be 0 rather than 1.
> > 
> > Thus, is the following patch ok for trunk then?
> 
> The comment at the start of the section says
> 
>   /* Now possibly adjust the number of named args.
>  Normally, don't include the last named arg if anonymous args follow.
>  We do include the last named arg if
>  targetm.calls.strict_argument_naming() returns nonzero.
>  (If no anonymous args follow, the result of list_length is actually
>  one too large.  This is harmless.)
> 
> So in the case of strict_argument_naming perhaps it should return 1, but 0 
> for other cases.

The TYPE_NO_NAMED_ARGS_STDARG_P (funtype) case is as if type_arg_types != 0
and list_length (type_arg_types) == 0, i.e. no user named arguments.
As list_length (NULL) returns 0, perhaps it could even be handled just
by changing all the type_arg_types != 0 checks to
type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
There are just 2 cases I'm worried about.  One is that I think neither the rest of
calls.cc nor the backends are prepared to see n_named_args -1 after the
adjustments, so I think it is better to use 0; the other is the question of what
the !strict_argument_naming && !pretend_outgoing_varargs_named case
wants to do for the aggregate return.  The patch as posted for
void foo (...); void bar () { foo (1, 2, 3); }
will set n_named_args initially to 0 (no named args) and with the
adjustments for strict_argument_naming 0, otherwise for !pretend
0 as well, otherwise 3.
For
struct { char buf[4096]; } baz (...); void qux () { baz (1, 2, 3); }
the patch sets n_named_args initially to 1 (the hidden return) and
with the arguments for strict keep it at 1, for !pretend 0 and otherwise
3.
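
The cases listed above can be checked against a small model of the logic. This is only a sketch of the control flow as described in this thread, not the code in calls.cc: CallInfo, TargetHooks and n_named_args_sketch are hypothetical names, list_length (type_arg_types) is reduced to a plain count, and num_actuals is taken as 3 for both calls, following the description above.

```cpp
// Hypothetical model of the n_named_args computation discussed here.
struct CallInfo
{
  bool has_arg_types;            // models type_arg_types != 0
  bool no_named_args_stdarg;     // models TYPE_NO_NAMED_ARGS_STDARG_P
  int proto_named;               // models list_length (type_arg_types)
  int structure_value_addr_parm; // 1 if a hidden return pointer is a parm
  int num_actuals;               // number of actual arguments
};

struct TargetHooks
{
  bool strict_argument_naming;
  bool pretend_outgoing_varargs_named;
};

inline int n_named_args_sketch(const CallInfo& c, const TargetHooks& t)
{
  int n;
  // Initial value, before INIT_CUMULATIVE_ARGS.
  if (c.has_arg_types)
    n = c.proto_named + c.structure_value_addr_parm;
  else if (c.no_named_args_stdarg)
    n = c.structure_value_addr_parm;
  else
    n = c.num_actuals;

  // Adjustment depending on the target hooks.
  if ((c.has_arg_types || c.no_named_args_stdarg)
      && t.strict_argument_naming)
    ; // keep n as is
  else if (c.has_arg_types && !t.pretend_outgoing_varargs_named)
    --n; // don't include the last named arg
  else if (c.no_named_args_stdarg && !t.pretend_outgoing_varargs_named)
    n = 0;
  else
    n = c.num_actuals; // treat all args as named
  return n;
}
```

In this model foo (1, 2, 3) yields 0, 0 and 3 for the strict, !pretend and remaining cases, and baz (1, 2, 3) yields 1, 0 and 3, matching the numbers given above.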

So, which case do you think is handled incorrectly with that?

Jakub

Re: [patch, libgfortran] Part 2: PR105456 Child I/O does not propage iostat

2024-02-29 Thread Jerry D


On 2/29/24 1:47 AM, Bernhard Reutner-Fischer wrote:
> On Wed, 28 Feb 2024 21:29:06 -0800
> Jerry D  wrote:
>
>> The attached patch adds the error checks similar to the first patch
>> previously committed.
>>
>> I noticed a redundancy in some defines MSGLEN and IOMSG_LEN so I
>> consolidated this to one define in io.h. This is just cleanup stuff.
>>
>> I have added test cases for each of the places where UDTIO is done in
>> the library.
>>
>> Regressions tested on x86_64.
>>
>> OK for trunk?
>
> I think the commit hooks will complain about several missing spaces
> before open brace; See contrib/check_GNU_style.py /tmp/pr105456-3.diff

I was given the OK from git gcc-verify. Regardless, if the hooks fail I just 
fix and try again.

> Would it make sense to introduce and use an internal helper like trim()?
> Or would it be possible to trim the message in generate_error_common()?

I was debating this and what would be the best approach. I was not sure 
where to put it.  I like the idea of doing it in generate_error_common. 
I will try that and see how it plays.

> And, just for my own education, the length limitation of iomsg to 255
> chars is not backed by the standard AFAICS, right? It's just our
> STRERR_MAXSZ?

Yes, it's what we have had for a long, long time. Once you throw an error 
things get very processor dependent. I found MSGLEN set to 100 and 
IOMSG_LEN set to 256. Nothing magic about it.


I appreciate the comments.
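
For the trim() idea discussed above, a helper could be as small as the following sketch. It is only an illustration: trimmed_length is a hypothetical name, not an existing libgfortran function, and it assumes the message is a blank-padded, fixed-length buffer as is usual for Fortran character variables.

```cpp
#include <cstddef>

// Hypothetical helper: length of msg[0..len) without trailing blanks.
inline std::size_t trimmed_length(const char* msg, std::size_t len)
{
  while (len > 0 && msg[len - 1] == ' ')
    --len;
  return len;
}
```

A caller would pass the child's iomsg buffer and its declared length, then copy only the returned number of characters into the parent's message.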

--- snip ---

Jerry -

RE: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Andrew Pinski (QUIC)



> -Original Message-
> From: Evgeny Karpov 
> Sent: Thursday, February 29, 2024 8:46 AM
> To: Andrew Pinski 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org; 10wa...@gmail.com; Maxim Kuvyrkov
> ; m...@harmstone.com; Zac Walker
> ; Ron Riddle ; Radek
> Barton ; Andrew Pinski (QUIC)
> 
> Subject: [PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments
> for AArch64
> 
> Wednesday, February 28, 2024 2:00 AM
> Andrew Pinski wrote:
> 
> > What does this mean with respect to C++ exceptions? Are you using SJLJ
> > exceptions support or the dwarf unwinding ones without SEH support?
> > I am not sure if SJLJ exceptions are well tested any more in GCC either.
> >
> > Also I have a question if you ran the full GCC/G++ testsuites and what
> > were the results?
> > If you did run it, did you use a cross compiler or the native
> > compiler? Did you do a bootstrap (GCC uses C++ but no exceptions though)?
> 
> As mentioned in the cover letter and the thread, the current contribution
> covers only the C scope.
> Exception handling is fully disabled for now.
> There is an experimental build with C++ and SEH, however, it is not included
> in the plan for the current contribution.
> 
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build
> 
> > If you run using a cross compiler, did you use ssh or some other route
> > to run the applications?
> >
> > Thanks,
> > Andrew Pinski
> 
> GitHub Actions are used to cross-compile toolchains, packages and tests, and
> execute tests on Windows Arm64.

This does not answer my question, because what you are running is just simple 
testcases and not the FULL GCC testsuite.
So again, have you run the GCC testsuite, and do you have a dejagnu board to be 
able to execute the binaries?
I think without running the GCC testsuite to find all of the known failures, you 
are going to run into many issues.
The GCC testsuite includes many tests for ABI corner cases and many features 
that you will most likely not think about testing using your simple testcases.
In fact I suspect there will be some of the aarch64 testcases which will need 
to be modified for the windows ABI which you have not done yet.


Thanks,
Andrew Pinski

> 
> https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-
> build/actions/runs/7929205044
> 
> Regards,
> Evgeny

Re: [Patch] OpenMP/C++: Fix (first)private clause with member variables [PR110347] [was: [RFA/RFC] C++/OpenMP: Supporting (first)private for member variables [PR110347] - or VALUE_EXPR and gimplify]

2024-02-29 Thread Jakub Jelinek

On Sat, Feb 17, 2024 at 12:35:48AM +0100, Tobias Burnus wrote:
> Hence, I now use this code, but also pass a flag to distinguish target
> regions (→ map) from shared usage, assuming that it is needed for the
> latter (otherwise, there wouldn't be that code).
> 
> The issue only showed up for a compile-only testcase, which I have now
> turned into a run-time testcase.
> In order to do so, I had to fix a bogus test for is mapped (or at least
> I think it is bogus) - and for sure it didn't handle shared memory.
> 
> I also modified it such that it iterates over devices. Changes to the dump:
> the 'device' clause had to be added (3x) and for the long line: 'this' and
> 'iptr' swapped the order and 'map(from:mapped)' became
> 'firstprivate(mapped)' due to my changes.
> I appended a patch which only shows the test-case differences as "git diff"
> contains all lines as I move it to libgomp/.
> 
> Comments, remarks, suggestions?

As discussed on IRC, I believe not disregarding the capture proxies in
target regions if they shouldn't be shared is always wrong, but also the
gimplify.cc suggestion was incorrect.

The thing is that at the place where the omp_disregard_value_expr call
is currently done for target regions, flags is always in_code ? GOVD_SEEN : 0,
so by testing flags & anything we don't actually differentiate between
privatized vars and mapped vars.  So, it needs to be moved after we
actually compute the flags, similarly to how we do it for non-target.
Now, in the patch I've mentioned on IRC last night I had & GOVD_MAP) != 0
checks, but that breaks e.g. the target-lambda-3.C testcase.  The
problem is that gimplification treats declare target functions as having
an implicit target region around the whole body; GOVD_MAP of course isn't
set for anything at that point, so we treated everything as privatized, and
going from vanilla trunk to the patched one resulted e.g. in the lambda
body
@@ -82,13 +82,11 @@ void run(int)operator()
   int * const data2 [value-expr: __closure->__data2];
   const int val [value-expr: __closure->__val];
 
-  _1 = __closure->__val;
-  _2 = __closure->__data2;
-  _3 = (long unsigned int) i;
-  _4 = _3 * 4;
-  _5 = _2 + _4;
-  _6 = _1 + 1;
-  *_5 = _6;
+  _1 = (long unsigned int) i;
+  _2 = _1 * 4;
+  _3 = data2 + _2;
+  _4 = val + 1;
+  *_3 = _4;
 }
changes, which use uninitialized vars and so overwrite random memory.
The following updated patch checks for non-presence of GOVD_PRIVATE
and GOVD_FIRSTPRIVATE flags rather than presence of GOVD_MAP and worked
on the new testcases from the patch (but haven't tested it further).

>   * testsuite/libgomp.c++/target-lambda-3.C: Moved from
>   gcc/testsuite/g++.dg/gomp/ and fixed is-mapped handling.
>   * testsuite/libgomp.c++/firstprivate-c++-1.C: New test.
>   * testsuite/libgomp.c++/firstprivate-c++-2.C: New test.
>   * testsuite/libgomp.c++/private-c++-1.C: New test.
>   * testsuite/libgomp.c++/private-c++-2.C: New test.
>   * testsuite/libgomp.c++/use_device_ptr-c++-1.C: New test.

As discussed on IRC, please drop the -c++ infixes from the tests
and renumber if there are existing tests with that name already.
This is in libgomp.c++/ directory, all the tests are C++ in there.
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c++/use_device_ptr-c++-1.C
...
> +  omp_target_free (D, dev);}

Please add a newline in between ; and }

The patch below is meant to be used together with the testsuite
updates from your patch, but perhaps we want also some runtime testcase
using
int
foo ()
{
  int var = 42;
  [&] () {
#pragma omp target firstprivate(var)
    {
      var += 26;
      if (var != 42 + 26)
        __builtin_abort ();
    }
  } ();
  return var;
}

int
main ()
{
  if (foo () != 42)
    __builtin_abort ();
}
and
template <typename T>
struct A {
  A () : a(), b()
  {
    [&] ()
    {
#pragma omp target firstprivate (a) map (from: b)
      b = ++a;
    } ();
  }

  T a, b;
};

int
main ()
{
  A<int> x;
  if (x.a != 0 || x.b != 1)
    __builtin_abort ();
}
or so (unless this is already covered somewhere).

--- gcc/gimplify.cc.jj  2024-02-28 22:24:54.859623016 +0100
+++ gcc/gimplify.cc 2024-02-29 18:03:00.744657060 +0100
@@ -8144,13 +8144,6 @@ omp_notice_variable (struct gimplify_omp
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
   if ((ctx->region_type & ORT_TARGET) != 0)
 {
-  if (ctx->region_type & ORT_ACC)
-   /* For OpenACC, as remarked above, defer expansion.  */
-   shared = false;
-  else
-   shared = true;
-
-  ret = lang_hooks.decls.omp_disregard_value_expr (decl, shared);
   if (n == NULL)
{
  unsigned nflags = flags;
@@ -8275,9 +8268,22 @@ omp_notice_variable (struct gimplify_omp
}
found_outer:
  omp_add_variable (ctx, decl, nflags);
+ if (ctx->region_type & ORT_ACC)
+   /* For OpenACC, as remarked above, defer expansion.  */
+   shared = false;
+ else
+   shared = (nflags & (GOVD_PRIVATE |

Re: [PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 29/02/2024 15:55, Jakub Jelinek wrote:
> On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
>>> I tried the above on arm, aarch64 and x86_64 and that seems fine,
>>> including the new testcase you added.
>>>
>>
>> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
>> n_named_args entirely, it doesn't need it (I don't think it even existed
>> when the AAPCS code was added).
> 
> So far I've just checked that the new testcase passes not just on
> x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
> with vanilla trunk.
> Haven't posted this patch in patch form, plus while I'm not really sure
> whether setting n_named_args to 0 or not changing in the
> !pretend_outgoing_varargs_named is right, the setting to 0 feels more
> correct to me.  If structure_value_addr_parm is 1, the function effectively
> has a single named argument and then ... args and if the target wants
> n_named_args to be number of named arguments except the last, then that
> should be 0 rather than 1.
> 
> Thus, is the following patch ok for trunk then?

The comment at the start of the section says

  /* Now possibly adjust the number of named args.
 Normally, don't include the last named arg if anonymous args follow.
 We do include the last named arg if
 targetm.calls.strict_argument_naming() returns nonzero.
 (If no anonymous args follow, the result of list_length is actually
 one too large.  This is harmless.)

So in the case of strict_argument_naming perhaps it should return 1, but 0 for 
other cases.

R.

> 
> 2024-02-29  Jakub Jelinek  
> 
>   PR target/107453
>   * calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
>   n_named_args initially before INIT_CUMULATIVE_ARGS to
>   structure_value_addr_parm rather than 0, after it don't modify
>   it if strict_argument_naming and clear only if
>   !pretend_outgoing_varargs_named.
> 
> --- gcc/calls.cc.jj   2024-01-22 11:48:08.045847508 +0100
> +++ gcc/calls.cc  2024-02-29 16:24:47.799855912 +0100
> @@ -2938,7 +2938,7 @@ expand_call (tree exp, rtx target, int i
>/* Count the struct value address, if it is passed as a parm.  */
>+ structure_value_addr_parm);
>else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> -n_named_args = 0;
> +n_named_args = structure_value_addr_parm;
>else
>  /* If we know nothing, treat all args as named.  */
>  n_named_args = num_actuals;
> @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int i
>   we do not have any reliable way to pass unnamed args in
>   registers, so we must force them into memory.  */
>  
> -  if (type_arg_types != 0
> +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>&& targetm.calls.strict_argument_naming (args_so_far))
>  ;
>else if (type_arg_types != 0
>  && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  /* Don't include the last named arg.  */
>  --n_named_args;
> -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
> +&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>  n_named_args = 0;
>else
>  /* Treat all args as named.  */
> 
>   Jakub
>

Re: [PATCH] lto, Darwin: Fix offload section names.

2024-02-29 Thread Tobias Burnus


Hi Iain, hello world,

Thomas Schwinge wrote:
> On 2024-01-16T15:00:16+, Iain Sandoe  wrote:

...

>> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
>> index a743deb4efb..1cdadf36ec0 100644
>> --- a/gcc/lto-section-names.h
>> +++ b/gcc/lto-section-names.h

...

>> @@ -35,8 +39,14 @@ extern const char *section_name_prefix;
>> 
>> #define LTO_SEGMENT_NAME "__GNU_LTO"
>> 
>> +#if OBJECT_FORMAT_MACHO
>> +#define OFFLOAD_VAR_TABLE_SECTION_NAME "__GNU_OFFLOAD,__vars"
>> +#define OFFLOAD_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__funcs"
>> +#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__ind_fns"
>> +#else
>> #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"

...

> Just to note that, per my understanding, this will require corresponding
> changes elsewhere, once you attempt to actually enable offloading
> compilation for Darwin (which -- ;-) I suspect -- is not on your agenda
> right now):


For instance also in MOLD:

https://github.com/rui314/mold/blob/50bdf39ba57e29386de28bd0c303035e626fa29c/elf/input-files.cc#L244

    if ((shdr.sh_flags & SHF_EXCLUDE) &&
        name.starts_with(".gnu.offload_lto_.symtab.")) {
      this->is_gcc_offload_obj = true;
      continue;
    }

Tobias
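
As a rough illustration of the point above (tools recognise offload objects
by a section-name prefix, so a new Mach-O spelling has to be taught to each
of them), here is a minimal C sketch.  The two prefixes are the ones from the
lto-section-names.h patch in this thread; the helper names has_prefix and
is_offload_lto_section are made up for the example and are not taken from
GCC, mold or lto-plugin.

```c
#include <stdbool.h>
#include <string.h>

/* Prefixes as they appear in the lto-section-names.h patch in this thread.  */
#define ELF_OFFLOAD_PREFIX   ".gnu.offload_lto_"
#define MACHO_OFFLOAD_PREFIX "__GNU_OFFLD_LTO,"

/* Hypothetical helper: true if NAME begins with PREFIX.  */
static bool has_prefix(const char *name, const char *prefix)
{
  return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: accept an offload LTO section under either the
   ELF or the Mach-O naming scheme.  A tool that only checks the ELF
   prefix would miss sections emitted with the Mach-O spelling.  */
static bool is_offload_lto_section(const char *name)
{
  return has_prefix(name, ELF_OFFLOAD_PREFIX)
         || has_prefix(name, MACHO_OFFLOAD_PREFIX);
}
```

A consumer would call is_offload_lto_section() on each section name it scans;
plain LTO sections (".gnu.lto_...") are deliberately not matched.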

[PATCH v1 08/13] aarch64: Add Cygwin and MinGW environments for AArch64

2024-02-29 Thread Evgeny Karpov

Wednesday, February 28, 2024 2:00 AM
Andrew Pinski wrote:

> What does this mean with respect to C++ exceptions? Or are you using SJLJ
> exceptions support or the dwarf unwinding ones without SEH support?
> I am not sure if SJLJ exceptions is well tested any more in GCC either.
> 
> Also I have a question if you ran the full GCC/G++ testsuites and what were
> the results?
> If you did run it, did you use a cross compiler or the native compiler? Did
> you do a bootstrap (GCC uses C++ but no exceptions though)?

As mentioned in the cover letter and the thread, the current
contribution covers only the C scope.
Exception handling is fully disabled for now.
There is an experimental build with C++ and SEH, however, it
is not included in the plan for the current contribution.

https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build

> If you run using a cross compiler, did you use ssh or some other route to run
> the applications?
> 
> Thanks,
> Andrew Pinski

GitHub Actions are used to cross-compile toolchains, packages
and tests, and execute tests on Windows Arm64.

https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/actions/runs/7929205044

Regards,
Evgeny

[PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-29 Thread Evgeny Karpov

Monday, February 26, 2024 2:30 AM 
NightStrike wrote:

> To be clear, because of the refactoring, it will affect x86/x64 Windows
> targets.  Can you do a testsuite run before and after and see that it
> doesn't get worse?  The full testsuite for all languages for Windows isn't
> in great shape, but it's not awful.  Some languages, like Rust and Fortran,
> have ~10 FAILs.  C and C++ have several thousand.
> 
> In particular, there are quite a few testsuite test FAILs regarding MS ABI
> that hopefully do not get worse.
> 

Thank you for bringing it up! Our CI will be extended to test the x64
mingw target and calculate a delta, starting from patch series v2.

> Lastly, I don't think I see in the current patch series where you add new
> testsuite coverage for aarch64-specific bits.  I probably missed it, so feel
> free to helpfully correct me there :)  I'd be curious to see how the tests
> were written to take into account target differences (using for example the
> dejagnu feature procs) and other nuances.

Tests have not been added yet. This does not mean they do not exist
or are not used. They are implemented and used in our CI, and will be
contributed to the aarch64-w64-mingw32 target in the next patch
series.
https://github.com/Windows-on-ARM-Experiments/mingw-woarm64-build/tree/main/tests

Regards,
Evgeny

Re: [PATCH] i386: For noreturn functions save at least the bp register if it is used [PR114116]

2024-02-29 Thread Michael Matz

Hello,

On Tue, 27 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 27, 2024 at 10:13:14AM +0100, Jakub Jelinek wrote:
> > For __libc_start_main, glibc surely just could use no_callee_saved_registers
> > attribute, because that is typically the outermost frame in backtrace,
> > there is no need to save those there.
> > And for kernel if it really wants it and nothing will use the backtraces,
> > perhaps the patch wouldn't need to be reverted completely but just guarded
> > the implicit no_callee_saved_registers treatment of noreturn
> > functions on -mcmodel=kernel or -fno-asynchronous-unwind-tables.
> 
> Guarding on -fno-asynchronous-unwind-tables isn't a good idea,
> with just -g we emit in that case unwind info in .debug_frame section
> and even that shouldn't break, and we shouldn't generate different code for
> -g vs. -g0.
> The problem with the changes is that it breaks the unwinding and debugging
> experience not just in the functions on which the optimization triggers,
> but on all functions in the backtrace as well.
> 
> So, IMHO either revert the changes altogether, or guard on -mcmodel=kernel
> (but talk to kernel people on linux-toolchains if that is what they actually
> want).

What is the underlying real purpose of the changes anyway?  It's a
nano-optimization: for functions to be called multiple times
they must either return or be recursive.  The latter is not very likely,
so a noreturn function is called only once in the vast majority of cases.
Any optimizations that diddle with the frame setup code for functions
called only once seem to be ... well, not so very useful, especially so
when they impact anything that is actually useful, like debugging.

I definitely think this shouldn't be done by default.

Ciao,
Michael.

[PATCH] calls: Further fixes for TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 02:14:05PM +, Richard Earnshaw wrote:
> > I tried the above on arm, aarch64 and x86_64 and that seems fine,
> > including the new testcase you added.
> > 
> 
> I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
> n_named_args entirely, it doesn't need it (I don't think it even existed
> when the AAPCS code was added).

So far I've just checked that the new testcase passes not just on
x86_64/i686-linux, but also on {powerpc64le,s390x,aarch64}-linux
with vanilla trunk.
I haven't posted this change in patch form before.  While I'm not really
sure whether setting n_named_args to 0 or leaving it unchanged in the
!pretend_outgoing_varargs_named case is right, setting it to 0 feels more
correct to me.  If structure_value_addr_parm is 1, the function effectively
has a single named argument followed by ... args, and if the target wants
n_named_args to be the number of named arguments except the last, then that
should be 0 rather than 1.

Thus, is the following patch ok for trunk then?

2024-02-29  Jakub Jelinek  

PR target/107453
* calls.cc (expand_call): For TYPE_NO_NAMED_ARGS_STDARG_P set
n_named_args initially before INIT_CUMULATIVE_ARGS to
structure_value_addr_parm rather than 0, after it don't modify
it if strict_argument_naming and clear only if
!pretend_outgoing_varargs_named.

--- gcc/calls.cc.jj   2024-01-22 11:48:08.045847508 +0100
+++ gcc/calls.cc      2024-02-29 16:24:47.799855912 +0100
@@ -2938,7 +2938,7 @@ expand_call (tree exp, rtx target, int i
          /* Count the struct value address, if it is passed as a parm.  */
          + structure_value_addr_parm);
   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
-    n_named_args = 0;
+    n_named_args = structure_value_addr_parm;
   else
     /* If we know nothing, treat all args as named.  */
     n_named_args = num_actuals;
@@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int i
      we do not have any reliable way to pass unnamed args in
      registers, so we must force them into memory.  */
 
-  if (type_arg_types != 0
+  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
       && targetm.calls.strict_argument_naming (args_so_far))
     ;
   else if (type_arg_types != 0
            && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
     /* Don't include the last named arg.  */
     --n_named_args;
-  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
+  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
+           && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
     n_named_args = 0;
   else
     /* Treat all args as named.  */

Jakub
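
To make the resulting decision table easier to follow, here is a small
standalone C model of how n_named_args would be computed with the patch
applied.  It is only a sketch: struct call_info and its fields are
hypothetical stand-ins for the state that expand_call consults, and the
assignment in the final "treat all args as named" branch is not visible in
the hunk, so it is assumed here to be num_actuals, as in the earlier branch
with the similar comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for what expand_call looks at.  */
struct call_info {
  bool has_arg_types;     /* models type_arg_types != 0 */
  bool no_named_stdarg;   /* models TYPE_NO_NAMED_ARGS_STDARG_P (funtype) */
  int  n_listed_args;     /* models list_length (type_arg_types) */
  int  struct_value_parm; /* models structure_value_addr_parm (0 or 1) */
  int  num_actuals;       /* number of actual arguments in the call */
  bool strict_naming;     /* models targetm.calls.strict_argument_naming */
  bool pretend_named;     /* models ...pretend_outgoing_varargs_named */
};

static int n_named_args_after_patch(const struct call_info *c)
{
  int n;

  /* Initial value, as set before INIT_CUMULATIVE_ARGS.  */
  if (c->has_arg_types)
    n = c->n_listed_args + c->struct_value_parm;
  else if (c->no_named_stdarg)
    n = c->struct_value_parm;
  else
    n = c->num_actuals;

  /* Adjustment made afterwards.  */
  if ((c->has_arg_types || c->no_named_stdarg) && c->strict_naming)
    ;                     /* keep the exact count */
  else if (c->has_arg_types && !c->pretend_named)
    --n;                  /* don't include the last named arg */
  else if (c->no_named_stdarg && !c->pretend_named)
    n = 0;
  else
    n = c->num_actuals;   /* assumed: treat all args as named */

  return n;
}
```

For a no-named-args stdarg function that returns a struct in memory
(struct_value_parm == 1), the model gives 1 under strict argument naming,
0 when the target neither uses strict naming nor pretends outgoing varargs
are named, and num_actuals otherwise.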

Re: [PATCH] lto, Darwin: Fix offload section names.

2024-02-29 Thread Iain Sandoe

Hi Thomas,

> On 29 Feb 2024, at 14:37, Thomas Schwinge  wrote:

> On 2024-01-16T15:00:16+, Iain Sandoe  wrote:
>> Currently, these section names have wrong syntax for Mach-O.
>> Although they were added some time ago; recently added tests are
>> now emitting them leading to new fails on Darwin.
>> 
>> This adds a Mach-O variant for each.
> 
>> gcc/lto-section-names.h | 10 ++
>> 1 file changed, 10 insertions(+)
>> 
>> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
>> index a743deb4efb..1cdadf36ec0 100644
>> --- a/gcc/lto-section-names.h
>> +++ b/gcc/lto-section-names.h
>> @@ -25,7 +25,11 @@ along with GCC; see the file COPYING3.  If not see
>>name for the functions and static_initializers.  For other types of
>>sections a '.' and the section type are appended.  */
>> #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
>> +#if OBJECT_FORMAT_MACHO
>> +#define OFFLOAD_SECTION_NAME_PREFIX "__GNU_OFFLD_LTO,"
>> +#else
>> #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
>> +#endif
>> 
>> /* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
>>compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
>> @@ -35,8 +39,14 @@ extern const char *section_name_prefix;
>> 
>> #define LTO_SEGMENT_NAME "__GNU_LTO"
>> 
>> +#if OBJECT_FORMAT_MACHO
>> +#define OFFLOAD_VAR_TABLE_SECTION_NAME "__GNU_OFFLOAD,__vars"
>> +#define OFFLOAD_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__funcs"
>> +#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__ind_fns"
>> +#else
>> #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
>> #define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
>> #define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
>> +#endif
>> 
>> #endif /* GCC_LTO_SECTION_NAMES_H */
> 
> Just to note that, per my understanding, this will require corresponding
> changes elsewhere, once you attempt to actually enable offloading
> compilation for Darwin (which -- ;-) I suspect -- is not on your agenda
> right now):

It is disappointing, but adding offloading to Darwin seems to be out of reach
at the moment.

AFAIK, we have no support for NVidia after macOS 10.13 and the AMD units
fitted to new(ish) boxes are high-end graphics cards (when last I discussed
with Andrew, we could not conclude whether they would be handled usefully).

Adding arbitrary extension cards is (technically) feasible to some of the
2019-era server-style machines - but that would still need approved and
signed kernel drivers.  I have not looked into whether the “studio” Arm64
machine might support such additions (the constraints on kernel-side
additions would surely be even more strict on the newer OS versions).

So, indeed (for now at least) sadly, this is not even on the distant horizon :-(

Iain

> 
>    $ git grep --cached -F .gnu.offload_
>    gcc/config/gcn/mkoffload.cc:  if (sscanf (buf, " .section .gnu.offload_vars%c", ) > 0)
>    gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_funcs%c", ) > 0)
>    gcc/config/gcn/mkoffload.cc:  /* Likewise for .gnu.offload_vars; used for reverse offload. */
>    gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_ind_funcs%c", ) > 0)
>    ['gcc/lto-section-names.h' adjusted per above.]
>    libgcc/offloadstuff.c:#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
>    libgcc/offloadstuff.c:#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
>    libgcc/offloadstuff.c:#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
>    lto-plugin/lto-plugin.c:  if (startswith (name, ".gnu.offload_lto_.opts"))
> 
> 
> Grüße
> Thomas

Re: [PATCH] libgccjit: Allow comparing array types

2024-02-29 Thread Antoni Boucher

David: Ping.

On Thu, 2024-01-25 at 07:52 -0500, Antoni Boucher wrote:
> Thanks.
> Can we please agree on some wording to use so I know when the patch can
> be pushed. Especially since we're now in stage 4, it would help me if
> you say something like "you can push to master".
> Regards.
> 
> On Wed, 2024-01-24 at 12:14 -0500, David Malcolm wrote:
> > On Fri, 2024-01-19 at 16:55 -0500, Antoni Boucher wrote:
> > > Hi.
> > > This patch allows comparing different instances of array types as
> > > equal.
> > > Thanks for the review.
> > 
> > Thanks; the patch looks good to me.
> > 
> > Dave
> > 
>

[PATCH v1 00/13] Add aarch64-w64-mingw32 target

2024-02-29 Thread Evgeny Karpov

Friday, February 23, 2024 7:00 PM
Richard Sandiford wrote:

> Seconded. :)  Thanks also for the very clear organisation of the series, and
> for commonising code rather than cut-&-pasting it.

Thank you, Richard, for the valuable feedback. It is great to hear
that the series structure is easy to review. That work has been
done before submitting v1.

> FWIW, I agree with all the comments posted so far, and just sent some other
> comments too.  I think my main high-level comments are:
> 
> - Could you double-check that all the code in the common files are
>   used on both aarch64 and x86?  I think it's OK to move code outside
>   of x86 even if aarch64 doesn't want to use it, provided that it makes
>   conceptual target-independent sense.  But it's not clear whether
>   unused code is deliberate or not (e.g. the EXTRA_OS_CPP_BUILTINS
>   thing I mentioned in the part 2 review).

All files from the mingw folder are used by the aarch64 target.
Some of them, such as mingw.cc, are only partially used. As mentioned in
the cover letter, the current contribution covers only the C scope.
EXTRA_OS_CPP_BUILTINS is one example of code that is not used.

> - Could you test with all languages enabled, and say what languages
>   are supported?  Some languages require an existing compiler for
>   the same language and so are more difficult to bootstrap for
>   a new port.  I suppose you'd need a cross-host build first,
>   then use the cross-compiled compilers to bootstrap.
> 
> Thanks,
> Richard

Our CI for the current contribution uses and tests only the C
language for the aarch64-w64-mingw32 target.

Regards,
Evgeny

Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-29 Thread Palmer Dabbelt


On Wed, 28 Feb 2024 02:24:40 PST (-0800), Robin Dapp wrote:
>> I suggest specify -fno-schedule-insns to force tests assembler never
>> change for any scheduling model.
>
> We already do that and that's the point - as I mentioned before, no
> scheduling is worse than default scheduling here (for some definition
> of worse).  The way to reduce the number of vsetvls is to set the
> load latency to a low value.

I think -fno-schedule-insns is a perfectly reasonable way to get rid of
the test failures in the short term.

Using -fno-schedule-insns doesn't really fix the core fragility of the
tests, though: what the pass does depends very much on the order of
instructions it sees, so anything that reorders RTL is going to cause
churn in the tests.  Sure getting rid of scheduling will get rid of a
big cause for reordering, but any pass could reorder RTL and thus change
the expected vsetvl counts.

Maybe the right thing to do here is to rewrite these as RTL tests?  That
way we can very tightly control the input ordering.  It's kind of the
opposite of Jeff's suggestion to add more debug output to the pass, but
I think that wouldn't actually solve the issue: we're not having trouble
matching assembly, the fragility comes from the input side.

That might be a "grass is always greener" thing, though, as I don't
think I've managed to write a useful RTL test yet...

> Regards
>  Robin

Re: [PATCH] libgccjit: Add support for machine-dependent builtins

2024-02-29 Thread Antoni Boucher

David: Ping.

On Thu, 2024-02-15 at 09:32 -0500, Antoni Boucher wrote:
> David: Ping
> 
> On Thu, 2024-02-08 at 08:59 -0500, Antoni Boucher wrote:
> > David: Ping.
> > 
> > On Wed, 2024-01-10 at 18:58 -0500, Antoni Boucher wrote:
> > > Here it is: https://gcc.gnu.org/pipermail/jit/2023q4/001725.html
> > > 
> > > On Wed, 2024-01-10 at 18:44 -0500, David Malcolm wrote:
> > > > On Wed, 2024-01-10 at 18:29 -0500, Antoni Boucher wrote:
> > > > > David: Ping in case you missed this patch.
> > > > 
> > > > For some reason it's not showing up in patchwork (or, at least, I
> > > > can't find it there).  Do you have a URL for it there?
> > > > 
> > > > Sorry about this
> > > > Dave
> > > > 
> > > > > 
> > > > > On Sat, 2023-02-11 at 17:37 -0800, Andrew Pinski wrote:
> > > > > > On Sat, Feb 11, 2023 at 4:31 PM Antoni Boucher via Gcc-patches
> > > > > >  wrote:
> > > > > > > 
> > > > > > > Hi.
> > > > > > > This patch adds support for machine-dependent builtins in
> > > > > > > libgccjit (bug 108762).
> > > > > > > 
> > > > > > > There are two things I don't like in this patch:
> > > > > > > 
> > > > > > >  1. There are a few functions copied from the C frontend
> > > > > > > (common_mark_addressable_vec and a few others).
> > > > > > > 
> > > > > > >  2. Getting a target builtin only works from the second
> > > > > > > compilation since the type information is recorded at the
> > > > > > > first compilation.  I couldn't find a way to get the builtin
> > > > > > > data without using the langhook.  It is necessary to get the
> > > > > > > type information for type checking and introspection.
> > > > > > > Any idea how to fix these issues?
> > > > > > 
> > > > > > Seems like you should do this patch in a few steps; that is
> > > > > > split it up.
> > > > > > Definitely split out GCC_JIT_TYPE_BFLOAT16 support.
> > > > > > I also think the vector support should be in a different patch
> > > > > > too.
> > > > > > 
> > > > > > Splitting out these parts would definitely make it easier for
> > > > > > review and make incremental improvements.
> > > > > > 
> > > > > > Thanks,
> > > > > > Andrew Pinski
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > Thanks for the review.
> > > > > 
> > > > 
> > > 
> > 
>

Re: [PATCH] libgccjit: Add ability to get CPU features

2024-02-29 Thread Antoni Boucher

David: Ping.
Iain: Ping.

On Tue, 2024-02-13 at 13:37 -0500, Antoni Boucher wrote:
> David: Ping.
> 
> On Tue, 2024-02-06 at 07:54 -0500, Antoni Boucher wrote:
> > David: Ping.
> > 
> > On Tue, 2024-01-30 at 10:50 -0500, Antoni Boucher wrote:
> > > David: I'm unsure what to do here. It seems we cannot find a
> > > reviewer.
> > > Would it help if I show you the code in gccrs that is similar?
> > > Would it help if I ask someone from gccrs to review this code?
> > > 
> > > On Sat, 2024-01-20 at 09:50 -0500, Antoni Boucher wrote:
> > > > CC-ing Iain in case they can do the review since it is based on
> > > > how they did it in the D frontend.
> > > > Could you please do the review?
> > > > Thanks!
> > > > 
> > > > On Thu, 2023-11-09 at 18:04 -0500, David Malcolm wrote:
> > > > > On Thu, 2023-11-09 at 17:27 -0500, Antoni Boucher wrote:
> > > > > > Hi.
> > > > > > This patch adds support for getting the CPU features in
> > > > > > libgccjit (bug 112466)
> > > > > > 
> > > > > > There's a TODO in the test:
> > > > > > I'm not sure how to test that gcc_jit_target_info_arch returns
> > > > > > the correct value since it is dependent on the CPU.
> > > > > > Any idea on how to improve this?
> > > > > > 
> > > > > > Also, I created a CStringHash to be able to have a
> > > > > > std::unordered_set. Is there any built-in way of doing this?
> > > > > 
> > > > > Thanks for the patch.
> > > > > 
> > > > > Some high-level questions:
> > > > > 
> > > > > Is this specifically about detecting capabilities of the host
> > > > > that libgccjit is currently running on? or how the target was
> > > > > configured when libgccjit was built?
> > > > > 
> > > > > One of the benefits of libgccjit is that, in theory, we support
> > > > > all of the targets that GCC already supports.  Does this patch
> > > > > change that, or is this more about giving client code the
> > > > > ability to determine capabilities of the specific host being
> > > > > compiled for?
> > > > > 
> > > > > I'm nervous about having per-target jit code.  Presumably
> > > > > there's a reason that we can't reuse existing target logic here
> > > > > - can you please describe what the problem is.  I see that the
> > > > > ChangeLog has:
> > > > > 
> > > > > > * config/i386/i386-jit.cc: New file.
> > > > > 
> > > > > where i386-jit.cc has almost 200 lines of nontrivial code.
> > > > > Where did this come from?  Did you base it on existing code in
> > > > > our source tree, making modifications to fit the new internal
> > > > > API, or did you write it from scratch?  In either case, how
> > > > > onerous would this be for other targets?
> > > > > 
> > > > > I'm not an expert at target hooks (or at the i386 backend), so
> > > > > if we do go with this approach I'd want someone else to review
> > > > > those parts of the patch.
> > > > > 
> > > > > Have you verified that GCC builds with this patch with jit
> > > > > *not* enabled in the enabled languages?
> > > > > 
> > > > > [...snip...]
> > > > > 
> > > > > A nitpick:
> > > > > 
> > > > > > +.. function:: const char * \
> > > > > > +  gcc_jit_target_info_arch (gcc_jit_target_info *info)
> > > > > > +
> > > > > > +   Get the architecture of the currently running CPU.
> > > > > 
> > > > > What does this string look like?
> > > > > How long does the pointer remain valid?
> > > > > 
> > > > > Thanks again; hope the above makes sense
> > > > > Dave
> > > > > 
> > > > 
> > > 
> > 
>

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jan Hubicka

> On Thu, Feb 29, 2024 at 03:15:30PM +0100, Jan Hubicka wrote:
> > I am not wed to the idea (just it appeared to me as an option to
> > disabling this optimization by default). I still think it may make sense.
> 
> Maybe I misunderstood your idea.
> So, you are basically suggesting to go in a completely opposite direction
> from H.J.'s changes, instead of saving less in noreturn prologues, save
> more, most likely not change anything in code generation on the caller's
> side (nothing is kept alive across visible unconditional noreturn calls)
> and maybe after some time use normally caller-saved registers in say call
> frame debug info for possible use in DW_OP_entry_value (but in that case it
> is an ABI change) and improve debuggability of vars which live in normally
> caller-saved registers right before a noreturn call in that frame?
> What about registers in which function arguments are passed?

Hmm, you are right - I got it backwards. 
> 
> If it is done as a change with no changes at all on the caller side and
> just on the callee side, it could again be guarded by some option (not sure
> what the default would be) where the user could increase debuggability of
> the noreturn caller (in that case always necessarily just a single immediate
> one; while not doing the callee saved register saves improves debuggability
> in perhaps multiple routines in the call stack, depending on which registers
> wouldn't be saved and which registers are used in each of the caller frames;
> e.g. noreturn function could save/not save all of %rbx, %r1[2345], one
> caller somewhere use %r12, another %rbx and %r13, yet another one %r14 and
> %r15).

I am not sure how much practical value this would get, but in any case
it is independent of the discussed patch.

Sorry for the confusion,
Honza
> 
>   Jakub
>

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread H.J. Lu

On Thu, Feb 29, 2024 at 7:06 AM Jan Hubicka  wrote:
>
> > > I am worried about scenario where ifunc selector calls function foo
> > > defined locally and foo is also used from other places possibly in hot
> > > loops.
> > > >
> > > > > So it is not really reliable fix (though I guess it will work a lot of
> > > > > common code).  I wonder what would be alternatives.  In GCC generated
> > > > > profling code we use TLS only for undirect call profiling (so there is
> > > > > no need to turn off rest of profiling).  I wonder if there is any 
> > > > > chance
> > > > > to not make it seffault when it is done before TLS is set up?
> > > >
> > > > IFUNC selector should make minimum external calls, none is preferred.
> > >
> > > Edge profiling only inserts (atomic) 64bit increments of counters.
> > > If target supports these operations inline, no external calls will be
> > > done.
> > >
> > > Indirect call profiling inserts the problematic TLS variable (to track
> > > caller-callee pairs). Value profiling also inserts various additional
> > > external calls to counters.
> > >
> > > I am perfectly fine with disabling instrumentation for ifunc selectors
> > > and functions only reachable from them, but I am worried about callees
> > > used also from non-ifunc path.
> >
> > Programmers need to understand not to do it.
>
> It would help to have this documented. Should we warn when ifunc
> resolver calls external function, comdat of function reachable from
> non-ifunc code?

That will be nice.

> >
> > > For example selector implemented in C++ may do some string handling to
> > > match CPU name and propagation will disable profiling for std::string
> >
> > On x86, they should use CPUID, not string functions.
> >
> > > member functions (which may not be effective if comdat section is
> > > prevailed from other translation unit).
> >
> > String functions may lead to external function calls which is dangerous.
> >
> > > > Any external calls may lead to issues at run-time.  It is a very bad
> > > > idea to profile IFUNC selector via external function call.
> > >
> > > Looking at https://sourceware.org/glibc/wiki/GNU_IFUNC
> > > there are other limitations on ifunc except for profiling, such as
> > > -fstack-protector-all.  So perhaps your propagation can be used to
> > > disable those features as well.
> >
> > So, it may not be tree-profile specific.  Where should these 2 bits
> > be added?
>
> If we want to disable other transforms too, then I think having a bit in
> cgraph_node for reachability from ifunc resolver makes sense.
> I would still do the cycle detection using an on-the-side hash_map to avoid
> pollution of the global data structure.
>

I will see what I can do.

Thanks.

> Thanks,
> Honza
> >
> > > "Unfortunately there are actually a lot of restrictions placed on IFUNC
> > > usage which aren't entirely clear and the documentation needs to be
> > > updated." makes me wonder what other transformations are potentially
> > > dangerous.
> > >
> > > Honza
> >
> >
> > --
> > H.J.



-- 
H.J.
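
For readers unfamiliar with the mechanism under discussion, the following
is a minimal sketch of an IFUNC resolver that follows the advice above: it
selects an implementation from CPU feature bits via compiler builtins and
calls no libc string routines.  The function names (sum_ints, sum_generic,
sum_avx2, resolve_sum_ints) are invented for illustration, the "avx2"
variant is only a stand-in with an identical body, and the preprocessor
guard is an assumption about where the ifunc attribute is usable (GNU-style
compilers targeting x86 ELF with glibc); elsewhere the sketch falls back to
a plain function.

```c
#include <stddef.h>
#include <stdlib.h>  /* a libc header, so __GLIBC__ is defined where applicable */

typedef int sum_fn(const int *, size_t);

/* Baseline implementation.  */
static int sum_generic(const int *v, size_t n)
{
  int s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Stand-in for a tuned variant; the body is deliberately the same.  */
static int sum_avx2(const int *v, size_t n)
{
  return sum_generic(v, n);
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__) \
    && (defined(__x86_64__) || defined(__i386__))
/* The resolver runs while the dynamic linker processes relocations, so it
   sticks to compiler builtins for CPU detection.  */
static sum_fn *resolve_sum_ints(void)
{
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx2") ? sum_avx2 : sum_generic;
}

int sum_ints(const int *v, size_t n)
  __attribute__((ifunc("resolve_sum_ints")));
#else
/* Fallback where the ifunc attribute is assumed to be unavailable.  */
int sum_ints(const int *v, size_t n)
{
  return sum_generic(v, n);
}
#endif
```

Callers simply use sum_ints(); the choice of implementation is made once,
when the symbol is resolved.  Instrumenting resolve_sum_ints itself is what
the patch under discussion avoids.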

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 03:15:30PM +0100, Jan Hubicka wrote:
> I am not wed to the idea (just it appeared to me as an option to
> disabling this optimization by default). I still think it may make sense.

Maybe I misunderstood your idea.
So, you are basically suggesting to go in a completely opposite direction
from H.J.'s changes, instead of saving less in noreturn prologues, save
more, most likely not change anything in code generation on the caller's
side (nothing is kept alive across visible unconditional noreturn calls)
and maybe after some time use normally caller-saved registers in say call
frame debug info for possible use in DW_OP_entry_value (but in that case it
is an ABI change) and improve debuggability of vars which live in normally
caller-saved registers right before a noreturn call in that frame?
What about registers in which function arguments are passed?

If it is done as a change with no changes at all on the caller side and
just on the callee side, it could again be guarded by some option (not sure
what the default would be) where the user could increase debuggability of
the noreturn caller (in that case always necessarily just a single immediate
one; while not doing the callee saved register saves improves debuggability
in perhaps multiple routines in the call stack, depending on which registers
wouldn't be saved and which registers are used in each of the caller frames;
e.g. noreturn function could save/not save all of %rbx, %r1[2345], one
caller somewhere use %r12, another %rbx and %r13, yet another one %r14 and
%r15).

Jakub

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread Jan Hubicka

> > I am worried about scenario where ifunc selector calls function foo
> > defined locally and foo is also used from other places possibly in hot
> > loops.
> > >
> > > > So it is not really a reliable fix (though I guess it will work for a
> > > > lot of common code).  I wonder what would be alternatives.  In GCC
> > > > generated profiling code we use TLS only for indirect call profiling
> > > > (so there is no need to turn off rest of profiling).  I wonder if
> > > > there is any chance to not make it segfault when it is done before
> > > > TLS is set up?
> > >
> > > IFUNC selector should make minimum external calls, none is preferred.
> >
> > Edge profiling only inserts (atomic) 64bit increments of counters.
> > If target supports these operations inline, no external calls will be
> > done.
> >
> > Indirect call profiling inserts the problematic TLS variable (to track
> > caller-callee pairs). Value profiling also inserts various additional
> > external calls to counters.
> >
> > I am perfectly fine with disabling instrumentation for ifunc selectors
> > and functions only reachable from them, but I am worried about callees
> > used also from non-ifunc path.
> 
> Programmers need to understand not to do it.

It would help to have this documented. Should we warn when an ifunc
resolver calls an external function, a comdat, or a function reachable from
non-ifunc code?
> 
> > For example selector implemented in C++ may do some string handling to
> > match CPU name and propagation will disable profiling for std::string
> 
> On x86, they should use CPUID, not string functions.
> 
> > member functions (which may not be effective if comdat section is
> > prevailed from other translation unit).
> 
> String functions may lead to external function calls which is dangerous.
> 
> > > Any external calls may lead to issues at run-time.  It is a very bad idea
> > > to profile IFUNC selector via external function call.
> >
> > Looking at https://sourceware.org/glibc/wiki/GNU_IFUNC
> > there are other limitations on ifunc except for profiling, such as
> > -fstack-protector-all.  So perhaps your propagation can be used to
> > disable those features as well.
> 
> So, it may not be tree-profile specific.  Where should these 2 bits
> be added?

If we want to disable other transforms too, then I think having a bit in
cgraph_node for reachability from ifunc resolver makes sense.
I would still do the cycle detection using an on-the-side hash_map to avoid
pollution of the global data structure.

Thanks,
Honza
> 
> > "Unfortunately there are actually a lot of restrictions placed on IFUNC
> > usage which aren't entirely clear and the documentation needs to be
> > updated." makes me wonder what other transformations are potentially
> > dangerous.
> >
> > Honza
> 
> 
> -- 
> H.J.
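
The side-table idea can be sketched outside of GCC as follows.  This is
only a toy model under stated assumptions: struct cg_node is a hypothetical
stand-in for a call graph node, nodes are identified by a small integer
uid, and a plain bool array indexed by uid plays the role of the
on-the-side hash_map, so nothing is added to the node itself.  Marking
proceeds forward from each resolver over its callees; an already-marked
node is not revisited, which is what terminates cycles.

```c
#include <stdbool.h>

#define MAX_CALLEES 4

/* Hypothetical call graph node; the index in the array is its uid.  */
struct cg_node {
  bool is_ifunc_resolver;
  int  n_callees;
  int  callees[MAX_CALLEES];   /* uids of functions this one calls */
};

/* Mark UID and everything it (transitively) calls in the side table.  */
static void mark_from(const struct cg_node *g, int uid, bool *reached)
{
  if (reached[uid])
    return;                    /* seen before: stops recursion on cycles */
  reached[uid] = true;
  for (int i = 0; i < g[uid].n_callees; i++)
    mark_from(g, g[uid].callees[i], reached);
}

/* Fill REACHED[0..n-1]: true for resolvers and for anything reachable
   from a resolver.  */
static void mark_resolver_reachable(const struct cg_node *g, int n,
                                    bool *reached)
{
  for (int i = 0; i < n; i++)
    reached[i] = false;
  for (int i = 0; i < n; i++)
    if (g[i].is_ifunc_resolver)
      mark_from(g, i, reached);
}
```

Note that a function called both from a resolver and from ordinary code
ends up marked as well, which is exactly the case raised above: skipping
instrumentation for it also loses its profile on the non-ifunc path.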

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread H.J. Lu

On Thu, Feb 29, 2024 at 6:34 AM Jan Hubicka  wrote:
>
> > On Thu, Feb 29, 2024 at 5:39 AM Jan Hubicka  wrote:
> > >
> > > > We can't instrument an IFUNC resolver nor its callees as it may require
> > > > TLS which hasn't been set up yet when the dynamic linker is resolving
> > > > IFUNC symbols.  Add an IFUNC resolver caller marker to symtab_node to
> > > > avoid recursive checking.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   PR tree-optimization/114115
> > > >   * cgraph.h (enum ifunc_caller): New.
> > > >   (symtab_node): Add has_ifunc_caller.
> > > Unless we have users outside of tree-profile, I think it is better to
> > > avoid adding extra data to cgraph_node.  One can use a node->get_uid()
> > > indexed hash set to save the two bits needed for propagation.
> > > >   * tree-profile.cc (check_ifunc_resolver): New.
> > > >   (is_caller_ifunc_resolver): Likewise.
> > > >   (tree_profiling): Don't instrument an IFUNC resolver nor its
> > > >   callees.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   PR tree-optimization/114115
> > > >   * gcc.dg/pr114115.c: New test.
> > >
> > > The problem with this approach is that tracking callees of ifunc
> > > resolvers will stop on the translation unit boundary and also with
> > > indirect call.  Also while ifunc resolver itself is called only once,
> > > its callees may also be used from performance critical code.
> >
> > IFUNC selector shouldn't have any external dependencies which
> > can cause issues at run-time.
>
> I am worried about the scenario where the ifunc selector calls a function
> foo defined locally and foo is also used from other places, possibly in
> hot loops.
> >
> > > So it is not really a reliable fix (though I guess it will work for a lot
> > > of common code).  I wonder what the alternatives would be.  In GCC-generated
> > > profiling code we use TLS only for indirect call profiling (so there is
> > > no need to turn off the rest of profiling).  I wonder if there is any chance
> > > to make it not segfault when it is done before TLS is set up?
> >
> > IFUNC selector should make minimum external calls, none is preferred.
>
> Edge profiling only inserts (atomic) 64-bit increments of counters.
> If the target supports these operations inline, no external calls will be
> made.
>
> Indirect call profiling inserts the problematic TLS variable (to track
> caller-callee pairs). Value profiling also inserts various additional
> external calls to counters.
>
> I am perfectly fine with disabling instrumentation for ifunc selectors
> and functions only reachable from them, but I am worried about callees
> also used from non-ifunc paths.

Programmers need to understand not to do it.

> For example selector implemented in C++ may do some string handling to
> match CPU name and propagation will disable profiling for std::string

On x86, they should use CPUID, not string functions.

> member functions (which may not be effective if comdat section is
> prevailed from other translation unit).

String functions may lead to external function calls which is dangerous.

> > Any external calls may lead to issues at run-time.  It is a very bad idea
> > to profile IFUNC selector via external function call.
>
> Looking at https://sourceware.org/glibc/wiki/GNU_IFUNC
> there are other limitations on ifunc except for profiling, such as
> -fstack-protector-all.  So perhaps your propagation can be used to
> disable those features as well.

So, it may not be tree-profile specific.  Where should these 2 bits
be added?

> "Unfortunately there are actually a lot of restrictions placed on IFUNC
> usage which aren't entirely clear and the documentation needs to be
> updated." makes me wonder what other transformations are potentially
> dangerous.
>
> Honza


-- 
H.J.

Re: [PATCH] lto, Darwin: Fix offload section names.

2024-02-29 Thread Thomas Schwinge

Hi Iain!

On 2024-01-16T15:00:16+, Iain Sandoe  wrote:
> Currently, these section names have wrong syntax for Mach-O.
> Although they were added some time ago; recently added tests are
> now emitting them leading to new fails on Darwin.
>
> This adds a Mach-O variant for each.

>  gcc/lto-section-names.h | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
> index a743deb4efb..1cdadf36ec0 100644
> --- a/gcc/lto-section-names.h
> +++ b/gcc/lto-section-names.h
> @@ -25,7 +25,11 @@ along with GCC; see the file COPYING3.  If not see
> name for the functions and static_initializers.  For other types of
> sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
> +#if OBJECT_FORMAT_MACHO
> +#define OFFLOAD_SECTION_NAME_PREFIX "__GNU_OFFLD_LTO,"
> +#else
>  #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
> +#endif
>  
>  /* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
> compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
> @@ -35,8 +39,14 @@ extern const char *section_name_prefix;
>  
>  #define LTO_SEGMENT_NAME "__GNU_LTO"
>  
> +#if OBJECT_FORMAT_MACHO
> +#define OFFLOAD_VAR_TABLE_SECTION_NAME "__GNU_OFFLOAD,__vars"
> +#define OFFLOAD_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__funcs"
> +#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME "__GNU_OFFLOAD,__ind_fns"
> +#else
>  #define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
>  #define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
>  #define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
> +#endif
>  
>  #endif /* GCC_LTO_SECTION_NAMES_H */

Just to note that, per my understanding, this will require corresponding
changes elsewhere, once you attempt to actually enable offloading
compilation for Darwin (which -- ;-) I suspect -- is not on your agenda
right now):

$ git grep --cached -F .gnu.offload_
gcc/config/gcn/mkoffload.cc:  if (sscanf (buf, " .section .gnu.offload_vars%c", ) > 0)
gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_funcs%c", ) > 0)
gcc/config/gcn/mkoffload.cc:  /* Likewise for .gnu.offload_vars; used for reverse offload. */
gcc/config/gcn/mkoffload.cc:  else if (sscanf (buf, " .section .gnu.offload_ind_funcs%c", ) > 0)
['gcc/lto-section-names.h' adjusted per above.]
libgcc/offloadstuff.c:#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
libgcc/offloadstuff.c:#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
libgcc/offloadstuff.c:#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"
lto-plugin/lto-plugin.c:  if (startswith (name, ".gnu.offload_lto_.opts"))


Grüße
 Thomas

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread Jan Hubicka

> On Thu, Feb 29, 2024 at 5:39 AM Jan Hubicka  wrote:
> >
> > > We can't instrument an IFUNC resolver nor its callees as it may require
> > > TLS which hasn't been set up yet when the dynamic linker is resolving
> > > IFUNC symbols.  Add an IFUNC resolver caller marker to symtab_node to
> > > avoid recursive checking.
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR tree-optimization/114115
> > >   * cgraph.h (enum ifunc_caller): New.
> > >   (symtab_node): Add has_ifunc_caller.
> > Unless we have users outside of tree-profile, I think it is better to
> > avoid adding extra data to cgraph_node.  One can use a node->get_uid()
> > indexed hash set to save the two bits needed for propagation.
> > >   * tree-profile.cc (check_ifunc_resolver): New.
> > >   (is_caller_ifunc_resolver): Likewise.
> > >   (tree_profiling): Don't instrument an IFUNC resolver nor its
> > >   callees.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR tree-optimization/114115
> > >   * gcc.dg/pr114115.c: New test.
> >
> > The problem with this approach is that tracking callees of ifunc
> > resolvers will stop on the translation unit boundary and also with
> > indirect call.  Also while ifunc resolver itself is called only once,
> > its callees may also be used from performance critical code.
> 
> IFUNC selector shouldn't have any external dependencies which
> can cause issues at run-time.

I am worried about the scenario where the ifunc selector calls a function
foo defined locally and foo is also used from other places, possibly in
hot loops.
> 
> > So it is not really a reliable fix (though I guess it will work for a lot
> > of common code).  I wonder what the alternatives would be.  In GCC-generated
> > profiling code we use TLS only for indirect call profiling (so there is
> > no need to turn off the rest of profiling).  I wonder if there is any chance
> > to make it not segfault when it is done before TLS is set up?
> 
> IFUNC selector should make minimum external calls, none is preferred.

Edge profiling only inserts (atomic) 64-bit increments of counters.
If the target supports these operations inline, no external calls will be
made.

Indirect call profiling inserts the problematic TLS variable (to track
caller-callee pairs). Value profiling also inserts various additional
external calls to counters.

I am perfectly fine with disabling instrumentation for ifunc selectors
and functions only reachable from them, but I am worried about callees
also used from non-ifunc paths.

For example selector implemented in C++ may do some string handling to
match CPU name and propagation will disable profiling for std::string
member functions (which may not be effective if comdat section is
prevailed from other translation unit).

> Any external calls may lead to issues at run-time.  It is a very bad idea
> to profile IFUNC selector via external function call.

Looking at https://sourceware.org/glibc/wiki/GNU_IFUNC
there are other limitations on ifunc except for profiling, such as
-fstack-protector-all.  So perhaps your propagation can be used to
disable those features as well.

"Unfortunately there are actually a lot of restrictions placed on IFUNC
usage which aren't entirely clear and the documentation needs to be
updated." makes me wonder what other transformations are potentially
dangerous.

Honza
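
To make the distinction above concrete, here is a minimal sketch of what
an edge counter bump amounts to.  It is not the code GCC emits; the
counter array and the names count_edge and abs_value are made up for
illustration.  The point is only that a relaxed atomic add on a 64-bit
counter needs no TLS, and on targets with native 64-bit atomics it is
typically expanded inline without a library call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-edge counters with static storage (zero-initialized).
static std::atomic<std::int64_t> edge_counters[2];

// Bump the counter of one edge.  No thread-local state is involved.
inline void
count_edge (unsigned edge)
{
  edge_counters[edge].fetch_add (1, std::memory_order_relaxed);
}

// Example of an "instrumented" function: one counter per branch arm.
int
abs_value (int x)
{
  if (x < 0)
    {
      count_edge (0);
      return -x;
    }
  count_edge (1);
  return x;
}
```

Indirect call and value profiling are different in kind, since they need
the TLS variable and external helper calls mentioned above.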

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread H.J. Lu

On Thu, Feb 29, 2024 at 6:15 AM Jan Hubicka  wrote:
>
> > On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> > > I agree that debuggability of user core dumps is important here.
> > >
> > > I guess an ideal solution would be to change codegen of noreturn functions
> > > to callee save all registers. Performance of the prologue of a noreturn
> > > function is not too important. Then we can stop caller saving registers
> > > and still get reasonable backtraces.
> >
> > I don't think that is possible.
> > While both C and C++ require that if [[noreturn]] attribute is used on
> > some function declaration, it must be used on the first declaration and
> > also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
> > in all other TUs which declare the same function.
> > But, we have no such requirement for __attribute__((noreturn)), there it
> > is a pure optimization, it can be declared just on the caller side as an
> > optimization hint the function will not return, or just on the callee side
> > where the compiler will actually verify it doesn't return, or both.
> > And, the attribute is not part of function type, so even in standard C/C++,
> > one can use
> > extern void bar ();
> > [[noreturn]] void foo ()
> > {
> >   for (;;) bar ();
> > }
> > void (*fn) () = foo;
> > void baz ()
> > {
> >   fn ();
> > }
> > As you can call the noreturn function directly or indirectly, changing
> > calling conventions based on noreturn vs. no-noreturn is IMHO not possible.
>
> I am not wed to the idea (it just appeared to me as an alternative to
> disabling this optimization by default).  I still think it may make sense.
>
> Making noreturn callees save caller-saved registers is compatible with
> the default ABI.  If noreturn is missing on the caller side, then the caller
> will save registers as usual.  The noreturn callee will save them again,
> which is pointless, but everything should work as usual and the extra cost
> of saving should not matter in practice.  This is also the case for an
> indirect call of a noreturn function, where the annotation is missing on
> the caller side.
>
> If noreturn is missing on the callee side, we will lose information on
> function arguments in the backtrace, but the code will still work
> (especially if we save the BP register to make the code backtraceable).
> This is a scenario that can probably be avoided in practice where it
> matters (such as in glibc abort, whose implementation is annotated).
>
> Noreturn already leads to some information loss in backtraces.  I tend to
> get surprised from time to time to see the wrong call to abort due to tail
> merging.  So it may be acceptable to lose info in a situation where the
> user does a silly thing and only annotates the caller.
>
> Since we auto-detect noreturn, we may need to be extra careful about noreturn
> comdats. Here auto-detection of prevailing def may have different
> outcome than auto-detection of prevailed defs. So we may want to disable
> the optimization for auto-detected comdats.
>

There are 2 kinds of noreturns.  One is abort which may require backtrace.
The other is a normal exit from the previous frame.  The latter case doesn't
require backtrace and can be performance critical.  Which one is more
important for users?

-- 
H.J.

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jan Hubicka

> On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> > I agree that debuggability of user core dumps is important here.
> > 
> > I guess an ideal solution would be to change codegen of noreturn functions
> > to callee save all registers. Performance of the prologue of a noreturn
> > function is not too important. Then we can stop caller saving registers
> > and still get reasonable backtraces.
> 
> I don't think that is possible.
> While both C and C++ require that if [[noreturn]] attribute is used on
> some function declaration, it must be used on the first declaration and
> also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
> in all other TUs which declare the same function.
> But, we have no such requirement for __attribute__((noreturn)), there it
> is a pure optimization, it can be declared just on the caller side as an
> optimization hint the function will not return, or just on the callee side
> where the compiler will actually verify it doesn't return, or both.
> And, the attribute is not part of function type, so even in standard C/C++,
> one can use
> extern void bar ();
> [[noreturn]] void foo ()
> {
>   for (;;) bar ();
> }
> void (*fn) () = foo;
> void baz ()
> {
>   fn ();
> }
> As you can call the noreturn function directly or indirectly, changing
> calling conventions based on noreturn vs. no-noreturn is IMHO not possible.

I am not wed to the idea (it just appeared to me as an alternative to
disabling this optimization by default).  I still think it may make sense.

Making noreturn callees save caller-saved registers is compatible with
the default ABI.  If noreturn is missing on the caller side, then the caller
will save registers as usual.  The noreturn callee will save them again,
which is pointless, but everything should work as usual and the extra cost
of saving should not matter in practice.  This is also the case for an
indirect call of a noreturn function, where the annotation is missing on
the caller side.

If noreturn is missing on the callee side, we will lose information on
function arguments in the backtrace, but the code will still work
(especially if we save the BP register to make the code backtraceable).
This is a scenario that can probably be avoided in practice where it
matters (such as in glibc abort, whose implementation is annotated).

Noreturn already leads to some information loss in backtraces.  I tend to
get surprised from time to time to see the wrong call to abort due to tail
merging.  So it may be acceptable to lose info in a situation where the
user does a silly thing and only annotates the caller.

Since we auto-detect noreturn, we may need to be extra careful about noreturn
comdats. Here auto-detection of prevailing def may have different
outcome than auto-detection of prevailed defs. So we may want to disable
the optimization for auto-detected comdats.

Honza
> 
>   Jakub
>

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw




On 29/02/2024 14:10, Richard Earnshaw (lists) wrote:
> On 27/02/2024 17:25, Jakub Jelinek wrote:
>> On Tue, Feb 27, 2024 at 04:41:32PM +, Richard Earnshaw wrote:
>>>> 2023-01-09  Jakub Jelinek  
>>>>
>>>> PR target/107453
>>>> * calls.cc (expand_call): For calls with
>>>> TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>>> Formatting fix.
>>>
>>> This one has been festering for a while; both Alexandre and Torbjorn have 
>>> attempted to fix it recently, but I'm not sure either is really right...
>>>
>>> On Arm this is causing all anonymous arguments to be passed on the stack,
>>> which is incorrect per the ABI.  On a target that uses
>>> 'pretend_outgoing_varargs_named', why is it correct to set n_named_args to
>>> zero?  Is it enough to guard both the statements you've added with
>>> !targetm.calls.pretend_outgoing_varargs_named?
>>
>> I'm afraid I haven't heard of that target hook before.
>> All I was doing with that change was fixing a regression reported in the PR
>> for ppc64le/sparc/nvptx/loongarch at least.
>>
>> The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
>> have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
>> it, but it should be handled as if it was non-NULL but list length was 0.
>>
>> So, for the
>>   if (type_arg_types != 0)
>> n_named_args
>>   = (list_length (type_arg_types)
>>  /* Count the struct value address, if it is passed as a parm.  */
>>  + structure_value_addr_parm);
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* If we know nothing, treat all args as named.  */
>> n_named_args = num_actuals;
>> case, I think guarding it by any target hooks is wrong, although
>> I guess it should have been
>> n_named_args = structure_value_addr_parm;
>> instead of
>> n_named_args = 0;
>>
>> For the second
>>   if (type_arg_types != 0
>>   && targetm.calls.strict_argument_naming (args_so_far))
>> ;
>>   else if (type_arg_types != 0
>>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>> /* Don't include the last named arg.  */
>> --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>> n_named_args = 0;
>>   else
>> /* Treat all args as named.  */
>> n_named_args = num_actuals;
>> bet (but no testing done, don't even know which targets return what for
>> those hooks) we should treat those as if type_arg_types was non-NULL
>> with 0 elements in the list, except the --n_named_args doesn't make sense
>> because that would decrease it to -1.
>> So perhaps
>>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>>       && targetm.calls.strict_argument_naming (args_so_far))
>>     ;
>>   else if (type_arg_types != 0
>>            && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>>     /* Don't include the last named arg.  */
>>     --n_named_args;
>>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>>            && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>>     ;
>>   else
>>     /* Treat all args as named.  */
>>     n_named_args = num_actuals;
> 
> I tried the above on arm, aarch64 and x86_64 and that seems fine, including 
> the new testcase you added.
> 

I should mention though, that INIT_CUMULATIVE_ARGS on arm ignores
n_named_args entirely; it doesn't need it (I don't think it even existed
when the AAPCS code was added).

R.

> R.
> 
>>
>> (or n_named_args = 0; instead of ; before the final else?  Dunno).
>> I guess we need some testsuite coverage for caller/callee ABI match of
>> struct S { char p[64]; };
>> struct S foo (...);
>>
>>  Jakub
>>
>

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread H.J. Lu

On Thu, Feb 29, 2024 at 5:39 AM Jan Hubicka  wrote:
>
> > We can't instrument an IFUNC resolver nor its callees as it may require
> > TLS which hasn't been set up yet when the dynamic linker is resolving
> > IFUNC symbols.  Add an IFUNC resolver caller marker to symtab_node to
> > avoid recursive checking.
> >
> > gcc/ChangeLog:
> >
> >   PR tree-optimization/114115
> >   * cgraph.h (enum ifunc_caller): New.
> >   (symtab_node): Add has_ifunc_caller.
> Unless we have users outside of tree-profile, I think it is better to
> avoid adding extra data to cgraph_node.  One can use a node->get_uid()
> indexed hash set to save the two bits needed for propagation.
> >   * tree-profile.cc (check_ifunc_resolver): New.
> >   (is_caller_ifunc_resolver): Likewise.
> >   (tree_profiling): Don't instrument an IFUNC resolver nor its
> >   callees.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR tree-optimization/114115
> >   * gcc.dg/pr114115.c: New test.
>
> The problem with this approach is that tracking callees of ifunc
> resolvers will stop on the translation unit boundary and also with
> indirect call.  Also while ifunc resolver itself is called only once,
> its callees may also be used from performance critical code.

IFUNC selector shouldn't have any external dependencies which
can cause issues at run-time.

> So it is not really a reliable fix (though I guess it will work for a lot
> of common code).  I wonder what the alternatives would be.  In GCC-generated
> profiling code we use TLS only for indirect call profiling (so there is
> no need to turn off the rest of profiling).  I wonder if there is any chance
> to make it not segfault when it is done before TLS is set up?

IFUNC selector should make minimum external calls, none is preferred.
Any external calls may lead to issues at run-time.  It is a very bad idea
to profile IFUNC selector via external function call.

> Honza
> > ---
> >  gcc/cgraph.h| 18 +++
> >  gcc/testsuite/gcc.dg/pr114115.c | 24 +
> >  gcc/tree-profile.cc | 92 +
> >  3 files changed, 134 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr114115.c
> >
> > diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> > index 47f35e8078d..ce99f4a5114 100644
> > --- a/gcc/cgraph.h
> > +++ b/gcc/cgraph.h
> > @@ -100,6 +100,21 @@ enum symbol_partitioning_class
> > SYMBOL_DUPLICATE
> >  };
> >
> > +/* Classification whether a function has any IFUNC resolver caller.  */
> > +enum ifunc_caller
> > +{
> > +  /* It is unknown if this function has any IFUNC resolver caller.  */
> > +  IFUNC_CALLER_UNKNOWN,
> > +  /* Work in progress to check if this function has any IFUNC resolver
> > + caller.  */
> > +  IFUNC_CALLER_WIP,
> > +  /* This function has at least an IFUNC resolver caller, including
> > + itself.  */
> > +  IFUNC_CALLER_TRUE,
> > +  /* This function doesn't have any IFUNC resolver caller.  */
> > +  IFUNC_CALLER_FALSE
> > +};
> > +
> >  /* Base of all entries in the symbol table.
> > The symtab_node is inherited by cgraph and varpol nodes.  */
> >  struct GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),
> > @@ -121,6 +136,7 @@ public:
> >used_from_other_partition (false), in_other_partition (false),
> >address_taken (false), in_init_priority_hash (false),
> >need_lto_streaming (false), offloadable (false), ifunc_resolver (false),
> > +  has_ifunc_caller (IFUNC_CALLER_UNKNOWN),
> >order (false), next_sharing_asm_name (NULL),
> >previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list (),
> >alias_target (NULL), lto_file_data (NULL), aux (NULL),
> > @@ -595,6 +611,8 @@ public:
> >/* Set when symbol is an IFUNC resolver.  */
> >unsigned ifunc_resolver : 1;
> >
> > +  /* Classification whether a function has any IFUNC resolver caller.  */
> > +  ENUM_BITFIELD (ifunc_caller) has_ifunc_caller : 2;
> >
> >/* Ordering of all symtab entries.  */
> >int order;
> > diff --git a/gcc/testsuite/gcc.dg/pr114115.c b/gcc/testsuite/gcc.dg/pr114115.c
> > new file mode 100644
> > index 000..2629f591877
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr114115.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O0 -fprofile-generate -fdump-tree-optimized" } */
> > +/* { dg-require-profiling "-fprofile-generate" } */
> > +/* { dg-require-ifunc "" } */
> > +
> > +void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));
> > +
> > +void bar(void)
> > +{
> > +}
> > +
> > +static int f3()
> > +{
> > +  bar ();
> > +  return 5;
> > +}
> > +
> > +void (*foo_resolver(void))(void)
> > +{
> > +  f3();
> > +  return bar;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" "optimized" } } */
> > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > index aed13e2b1bc..46478648b32 100644
> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -738,6

Re: [PATCH] calls: Fix up TYPE_NO_NAMED_ARGS_STDARG_P handling [PR107453]

2024-02-29 Thread Richard Earnshaw (lists)

On 27/02/2024 17:25, Jakub Jelinek wrote:
> On Tue, Feb 27, 2024 at 04:41:32PM +, Richard Earnshaw wrote:
>>> 2023-01-09  Jakub Jelinek  
>>>
>>> PR target/107453
>>> * calls.cc (expand_call): For calls with
>>> TYPE_NO_NAMED_ARGS_STDARG_P (funtype) use zero for n_named_args.
>>> Formatting fix.
>>
>> This one has been festering for a while; both Alexandre and Torbjorn have 
>> attempted to fix it recently, but I'm not sure either is really right...
>>
>> On Arm this is causing all anonymous arguments to be passed on the stack,
>> which is incorrect per the ABI.  On a target that uses
>> 'pretend_outgoing_varargs_named', why is it correct to set n_named_args to
>> zero?  Is it enough to guard both the statements you've added with
>> !targetm.calls.pretend_outgoing_varargs_named?
> 
> I'm afraid I haven't heard of that target hook before.
> All I was doing with that change was fixing a regression reported in the PR
> for ppc64le/sparc/nvptx/loongarch at least.
> 
> The TYPE_NO_NAMED_ARGS_STDARG_P functions (C23 fns like void foo (...) {})
> have NULL type_arg_types, so the list_length (type_arg_types) isn't done for
> it, but it should be handled as if it was non-NULL but list length was 0.
> 
> So, for the
>   if (type_arg_types != 0)
> n_named_args
>   = (list_length (type_arg_types)
>  /* Count the struct value address, if it is passed as a parm.  */
>  + structure_value_addr_parm);
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> n_named_args = 0;
>   else
> /* If we know nothing, treat all args as named.  */
> n_named_args = num_actuals;
> case, I think guarding it by any target hooks is wrong, although
> I guess it should have been
> n_named_args = structure_value_addr_parm;
> instead of
> n_named_args = 0;
> 
> For the second
>   if (type_arg_types != 0
>   && targetm.calls.strict_argument_naming (args_so_far))
> ;
>   else if (type_arg_types != 0
>&& ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
> /* Don't include the last named arg.  */
> --n_named_args;
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
> n_named_args = 0;
>   else
> /* Treat all args as named.  */
> n_named_args = num_actuals;
> bet (but no testing done, don't even know which targets return what for
> those hooks) we should treat those as if type_arg_types was non-NULL
> with 0 elements in the list, except the --n_named_args doesn't make sense
> because that would decrease it to -1.
> So perhaps
>   if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
>       && targetm.calls.strict_argument_naming (args_so_far))
>     ;
>   else if (type_arg_types != 0
>            && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>     /* Don't include the last named arg.  */
>     --n_named_args;
>   else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
>            && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
>     ;
>   else
>     /* Treat all args as named.  */
>     n_named_args = num_actuals;

I tried the above on arm, aarch64 and x86_64 and that seems fine, including the 
new testcase you added.

R.

> 
> (or n_named_args = 0; instead of ; before the final else?  Dunno).
> I guess we need some testsuite coverage for caller/callee ABI match of
> struct S { char p[64]; };
> struct S foo (...);
> 
>   Jakub
>
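
For readers following the two-step logic, the sketch below restates it as
a pure function so the cases can be checked by hand.  It is not code from
calls.cc: CallInfo and compute_n_named_args are hypothetical names, the
target hooks are reduced to plain booleans, and the first step uses the
structure_value_addr_parm variant that Jakub guesses would have been more
correct, so treat it as an illustration of the proposal only.

```cpp
#include <cassert>

// Hypothetical summary of the state expand_call consults.
struct CallInfo
{
  bool has_arg_types;                   // type_arg_types != 0
  int arg_list_length;                  // list_length (type_arg_types)
  bool no_named_args_stdarg;            // TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
  int structure_value_addr_parm;        // 1 if the struct return slot is a parm
  int num_actuals;                      // number of actual arguments
  bool strict_argument_naming;          // result of the target hook
  bool pretend_outgoing_varargs_named;  // result of the target hook
};

int
compute_n_named_args (const CallInfo &c)
{
  int n_named_args;

  // Step 1: initial count from the prototype, if there is one.
  if (c.has_arg_types)
    n_named_args = c.arg_list_length + c.structure_value_addr_parm;
  else if (c.no_named_args_stdarg)
    n_named_args = c.structure_value_addr_parm;
  else
    n_named_args = c.num_actuals;  // Nothing known: all args are named.

  // Step 2: adjust according to the target hooks.
  if ((c.has_arg_types || c.no_named_args_stdarg) && c.strict_argument_naming)
    {
      // Keep the count from step 1.
    }
  else if (c.has_arg_types && !c.pretend_outgoing_varargs_named)
    --n_named_args;  // Don't include the last named arg.
  else if (c.no_named_args_stdarg && !c.pretend_outgoing_varargs_named)
    {
      // Keep the count; there is no last named arg to drop.
    }
  else
    n_named_args = c.num_actuals;  // Treat all args as named.

  return n_named_args;
}
```

With both hooks false, a call to a C23 `void foo (...)` with three
arguments yields 0 named arguments; on a target where
pretend_outgoing_varargs_named is true the same call yields 3, which is
the behaviour the Arm report asks for.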

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 02:31:05PM +0100, Jan Hubicka wrote:
> I agree that debuggability of user core dumps is important here.
> 
> I guess an ideal solution would be to change codegen of noreturn functions
> to callee save all registers. Performance of the prologue of a noreturn
> function is not too important. Then we can stop caller saving registers
> and still get reasonable backtraces.

I don't think that is possible.
While both C and C++ require that if [[noreturn]] attribute is used on
some function declaration, it must be used on the first declaration and
also if some function is [[noreturn]] in one TU, it must be [[noreturn]]
in all other TUs which declare the same function.
But, we have no such requirement for __attribute__((noreturn)), there it
is a pure optimization, it can be declared just on the caller side as an
optimization hint the function will not return, or just on the callee side
where the compiler will actually verify it doesn't return, or both.
And, the attribute is not part of function type, so even in standard C/C++,
one can use
extern void bar ();
[[noreturn]] void foo ()
{
  for (;;) bar ();
}
void (*fn) () = foo;
void baz ()
{
  fn ();
}
As you can call the noreturn function directly or indirectly, changing
calling conventions based on noreturn vs. no-noreturn is IMHO not possible.

Jakub
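
The type-system half of this argument can be checked directly.  The
snippet below is a small C++ variation on the example above, with
std::abort standing in for the endless loop and with the made-up names
plain and calls_noreturn_indirectly added for the check.  It shows that a
[[noreturn]] function and an ordinary one have the same type, so a call
through fn carries no hint about which kind it reaches.

```cpp
#include <cassert>
#include <cstdlib>
#include <type_traits>

[[noreturn]] void foo () { std::abort (); }
void plain () {}

// The attribute is not part of the function type: both are void ().
static_assert (std::is_same<decltype (foo), decltype (plain)>::value,
               "[[noreturn]] does not change the function type");

// One pointer type can therefore hold either function.
void (*fn) () = foo;

// Hypothetical helper: tells whether FN currently targets the noreturn
// function.  A caller doing "fn ()" has no such knowledge at compile time.
bool
calls_noreturn_indirectly ()
{
  return fn == foo;
}
```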

Re: [PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-29 Thread Jan Hubicka

> We can't instrument an IFUNC resolver nor its callees as it may require
> TLS which hasn't been set up yet when the dynamic linker is resolving
> IFUNC symbols.  Add an IFUNC resolver caller marker to symtab_node to
> avoid recursive checking.
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114115
>   * cgraph.h (enum ifunc_caller): New.
>   (symtab_node): Add has_ifunc_caller.
Unless we have users outside of tree-profile, I think it is better to
avoid adding extra data to cgraph_node.  One can use a node->get_uid()
indexed hash set to save the two bits needed for propagation.
>   * tree-profile.cc (check_ifunc_resolver): New.
>   (is_caller_ifunc_resolver): Likewise.
>   (tree_profiling): Don't instrument an IFUNC resolver nor its
>   callees.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114115
>   * gcc.dg/pr114115.c: New test.

The problem with this approach is that tracking callees of ifunc
resolvers will stop on the translation unit boundary and also with
indirect call.  Also while ifunc resolver itself is called only once,
its callees may also be used from performance critical code.

So it is not really a reliable fix (though I guess it will work for a lot
of common code).  I wonder what the alternatives would be.  In GCC-generated
profiling code we use TLS only for indirect call profiling (so there is
no need to turn off the rest of profiling).  I wonder if there is any chance
to make it not segfault when it is done before TLS is set up?

Honza
> ---
>  gcc/cgraph.h| 18 +++
>  gcc/testsuite/gcc.dg/pr114115.c | 24 +
>  gcc/tree-profile.cc | 92 +
>  3 files changed, 134 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr114115.c
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 47f35e8078d..ce99f4a5114 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -100,6 +100,21 @@ enum symbol_partitioning_class
> SYMBOL_DUPLICATE
>  };
>  
> +/* Classification whether a function has any IFUNC resolver caller.  */
> +enum ifunc_caller
> +{
> +  /* It is unknown if this function has any IFUNC resolver caller.  */
> +  IFUNC_CALLER_UNKNOWN,
> +  /* Work in progress to check if this function has any IFUNC resolver
> + caller.  */
> +  IFUNC_CALLER_WIP,
> +  /* This function has at least an IFUNC resolver caller, including
> + itself.  */
> +  IFUNC_CALLER_TRUE,
> +  /* This function doesn't have any IFUNC resolver caller.  */
> +  IFUNC_CALLER_FALSE
> +};
> +
>  /* Base of all entries in the symbol table.
> The symtab_node is inherited by cgraph and varpol nodes.  */
>  struct GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),
> @@ -121,6 +136,7 @@ public:
>used_from_other_partition (false), in_other_partition (false),
>address_taken (false), in_init_priority_hash (false),
>need_lto_streaming (false), offloadable (false), ifunc_resolver 
> (false),
> +  has_ifunc_caller (IFUNC_CALLER_UNKNOWN),
>order (false), next_sharing_asm_name (NULL),
>previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list 
> (),
>alias_target (NULL), lto_file_data (NULL), aux (NULL),
> @@ -595,6 +611,8 @@ public:
>/* Set when symbol is an IFUNC resolver.  */
>unsigned ifunc_resolver : 1;
>  
> +  /* Classification whether a function has any IFUNC resolver caller.  */
> +  ENUM_BITFIELD (ifunc_caller) has_ifunc_caller : 2;
>  
>/* Ordering of all symtab entries.  */
>int order;
> diff --git a/gcc/testsuite/gcc.dg/pr114115.c b/gcc/testsuite/gcc.dg/pr114115.c
> new file mode 100644
> index 000..2629f591877
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr114115.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O0 -fprofile-generate -fdump-tree-optimized" } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-require-ifunc "" } */
> +
> +void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));
> +
> +void bar(void)
> +{
> +}
> +
> +static int f3()
> +{
> +  bar ();
> +  return 5;
> +}
> +
> +void (*foo_resolver(void))(void)
> +{
> +  f3();
> +  return bar;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" "optimized" } } */
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index aed13e2b1bc..46478648b32 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -738,6 +738,72 @@ include_source_file_for_profile (const char *filename)
>return false;
>  }
>  
> +/* Return true and set *DATA to true if NODE is an ifunc resolver.  */
> +
> +static bool
> +check_ifunc_resolver (cgraph_node *node, void *data)
> +{
> +  if (node->ifunc_resolver)
> +{
> +  bool *is_ifunc_resolver = (bool *) data;
> +  *is_ifunc_resolver = true;
> +  return true;
> +}
> +  return false;
> +}
> +
> +/* Return true if any caller of NODE is an ifunc resolver.  */
> +
> +static bool
> +is_caller_ifunc_resolver (cgraph_node *node)
> +{
> +  if

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jan Hubicka

> >
> > The problem is that it doesn't help in this case.
> > If some optimization makes debugging of some function harder, normally it is
> > enough to recompile the translation unit that defines it with -O0/-Og, or
> > add optimize attribute on the function.
> > While in this case, the optimization interferes with debugging of other
> > functions, not necessarily from the same translation unit, not necessarily
> > even from the same library or binary, or even from the same package.
> > As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
> > a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
> > %r15 and this optimization is applied on it.  That means debugging of any
> > application (-O2, -Og or even -O0 compiled) to understand what went wrong
> > and why it aborted will be harder.  Including core file analysis.
> > Recompiling those apps with -O0/-Og will not help.  The only thing that
> > would help is to recompile glibc with -O0/-Og.
> > Doesn't have to be abort, doesn't have to be glibc.  Any library which
> > exports some noreturn APIs may be affected.
> > And there is not even a workaround other than to recompile with -O0/-Og the
> > noreturn functions, no way to disable this optimization.
> >
> > Given that most users just will not be aware of this, even adding the option
> > but defaulting to on would mean a problem for a lot of users.  Most of them
> > will not know the problem is that some noreturn function 10 frames deep in
> > the call stack was optimized this way.
> >
> > If people only call the noreturn functions from within the same package,
> > for some strange reason care about performance of noreturn functions (they
> > don't return, so unless you longjmp out of them or something similar
> > which is costly on its own already, they should be entered exactly once)
> > and are willing to pay the price in worse debugging in that case, let them
> > use the option.  But if they provide libraries that other packages then
> > consume, I'd say it wouldn't be a good idea.
> 
> +1
> 
> I'll definitely patch this by-default behavior out if we as upstream keep it.
> Debugging customer core dumps is more important than optimizing
> glibc abort/assert.
> 
> I do hope such patch will be at least easy, like flipping the default of an
> option.

I agree that debuggability of user core dumps is important here.

I guess an ideal solution would be to change codegen of noreturn functions
to callee-save all registers.  Performance of the prologue of a noreturn
function is not too important.  Then we can stop caller-saving registers
and still get reasonable backtraces.

This is essentially an ABI change (though a fairly conservative one, since
nothing except debugging really depends on it).  Perhaps it would make
sense to make the optimization a non-default option now and also implement
the callee-save logic.  Then we should be able to flip the default a
release or two later.  Maybe synchronization with LLVM would be desirable
here if we decide to go this route.

Honza
> 
> Richard.
> 
> > Jakub
> >

Re: [PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-02-29 Thread Robin Dapp

On 2/29/24 02:38, Li, Pan2 wrote:
>> So it's going to check if V2SF can be tied to DI and V4QI with SI.  I 
>> suspect those are going to fail for RISC-V as those aren't tieable.
> 
> Yes, you are right. Different REG_CLASS are not allowed to be tieable in 
> RISC-V.
> 
> static bool
> riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
> {
>   /* We don't allow different REG_CLASS modes tieable since it
>  will cause ICE in register allocation (RA).
>  E.g. V2SI and DI are not tieable.  */
>   if (riscv_v_ext_mode_p (mode1) != riscv_v_ext_mode_p (mode2))
> return false;
>   return (mode1 == mode2
>   || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
>&& GET_MODE_CLASS (mode2) == MODE_FLOAT));
> }

Yes, but what we set tieable is e.g. V4QI and V2SF.

I suggested a target band-aid before:

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 799d7919a4a..982ca1a4250 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8208,6 +8208,11 @@ riscv_modes_tieable_p (machine_mode mode1, machine_mode mode2)
  E.g. V2SI and DI are not tieable.  */
   if (riscv_v_ext_mode_p (mode1) != riscv_v_ext_mode_p (mode2))
 return false;
+  if (GET_MODE_CLASS (GET_MODE_INNER (mode1)) == MODE_INT
+  && GET_MODE_CLASS (GET_MODE_INNER (mode2)) == MODE_FLOAT
+  && GET_MODE_SIZE (GET_MODE_INNER (mode1))
+   != GET_MODE_SIZE (GET_MODE_INNER (mode2)))
+return false;
   return (mode1 == mode2
  || !(GET_MODE_CLASS (mode1) == MODE_FLOAT
   && GET_MODE_CLASS (mode2) == MODE_FLOAT));

but I don't like that as it just works around something
that I didn't even understand fully...
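
For readers who want to see what the extra check changes, here is a rough
standalone model of the predicate.  It is only a sketch: the Mode struct,
its fields and the listed modes are hypothetical stand-ins for GCC's
machine_mode queries, and V4QI and V2SF are simply assumed to both count as
RVV modes.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for GCC's machine_mode queries.
enum ModeClass { MODE_INT, MODE_FLOAT, MODE_VECTOR_INT, MODE_VECTOR_FLOAT };
enum ModeId { ID_SF, ID_DF, ID_DI, ID_V2SI, ID_V4QI, ID_V2SF };

struct Mode
{
  ModeId id;
  ModeClass mode_class;   // models GET_MODE_CLASS (mode)
  ModeClass inner_class;  // models GET_MODE_CLASS (GET_MODE_INNER (mode))
  unsigned inner_size;    // models GET_MODE_SIZE (GET_MODE_INNER (mode))
  bool rvv;               // models riscv_v_ext_mode_p (mode)
};

constexpr Mode SF   {ID_SF,   MODE_FLOAT,        MODE_FLOAT, 4, false};
constexpr Mode DF   {ID_DF,   MODE_FLOAT,        MODE_FLOAT, 8, false};
constexpr Mode DI   {ID_DI,   MODE_INT,          MODE_INT,   8, false};
constexpr Mode V2SI {ID_V2SI, MODE_VECTOR_INT,   MODE_INT,   4, true};
constexpr Mode V4QI {ID_V4QI, MODE_VECTOR_INT,   MODE_INT,   1, true};
constexpr Mode V2SF {ID_V2SF, MODE_VECTOR_FLOAT, MODE_FLOAT, 4, true};

// Model of riscv_modes_tieable_p; BAND_AID enables the extra check
// from the diff above.
static bool
modes_tieable_p (const Mode &m1, const Mode &m2, bool band_aid)
{
  // Vector and non-vector modes are never tieable.
  if (m1.rvv != m2.rvv)
    return false;
  // Band-aid: integer elements vs. float elements of a different size.
  if (band_aid
      && m1.inner_class == MODE_INT
      && m2.inner_class == MODE_FLOAT
      && m1.inner_size != m2.inner_size)
    return false;
  return (m1.id == m2.id
          || !(m1.mode_class == MODE_FLOAT && m2.mode_class == MODE_FLOAT));
}
```

In this model modes_tieable_p (V4QI, V2SF, false) is true and becomes false
with the band-aid enabled.  The check as written only looks at the
(int, float) argument order, so modes_tieable_p (V2SF, V4QI, true) is still
true here.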

Regards
 Robin

Re: [PATCH] tree-optimization/114151 - handle POLY_INT_CST in get_range_pos_neg

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 02:08:19PM +0100, Richard Biener wrote:
> > So, wouldn't it be better to outline what you have above + POLY_INT_CST
> > handling into a helper function, which similarly to get_range_pos_neg
> > returns a bitmask, but rather than 1 bit for may be [0, max] and another
> > bit for may be [min, -1] you return 3 bits, 1 bit for may be [1, max],
> > another for may be [0, 0] and another for may be [min, -1]?
> > Also, I bet you actually want to handle TREE_UNSIGNED just as [0, 0]
> > and [1, max] ranges unlike get_range_pos_neg.
> 
> I'm just lazy and given TYPE_OVERFLOW_WRAPS (and thus unsigned) doesn't
> ever get here and I special-case integer_zerop it doesn't really matter
> that in these cases get_range_pos_neg isn't exactly what's wanted - I'm
> asking it only for those cases where it works just fine.

Just handling integer_zerop doesn't cover the case where the chrec
operand isn't INTEGER_CST, just includes zero in its range.  And I'd think
that is something quite common (sure, INTEGER_CST chrec operands are likely
more common than that) that we know that something isn't negative, or isn't
positive, or is non-negative, or is non-positive etc.

> > So perhaps
> >   int ret = 7;
> >   if (TYPE_UNSIGNED (TREE_TYPE (arg)))
> > ret = 3;
> >   if (poly_int_tree_p (arg))
> > {
> >   poly_wide_int w = wi::to_poly_wide (arg);
> >   if (known_lt (w, 0))
> > return 4;
> >   else if (known_eq (w, 0))
> > return 2;
> >   else if (known_gt (w, 0))
> > return 1;
> >   else
> > return 7;
> > }
> >   value_range r;
> >   if (!get_range_query (cfun)->range_of_expr (r, arg)
> >   || r.undefined_p ())
> > return ret;
> >   if (r.nonpositive_p ())
> > ret &= ~1;
> >   if (r.nonzero_p ())
> > ret &= ~2;
> >   if (r.nonnegative_p ())
> > ret &= ~4;
> >   return ret;

And the above should be short/simple enough to be added even if it
just has a single user (ok, 2 in the same stmt).
Could be even just a lambda if there are no other uses for it,
so you would need to care less how to name it/where to declare etc.
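
As an illustration of the three-bit idea, here is a standalone sketch
written as a lambda.  It is not the GCC implementation: Interval is a
hypothetical closed range standing in for value_range, and the names
sign_mask and signs_cannot_differ are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical closed interval [lo, hi] standing in for a value_range;
// lo <= hi is assumed.
struct Interval { std::int64_t lo, hi; };

enum : int { MAY_BE_POSITIVE = 1, MAY_BE_ZERO = 2, MAY_BE_NEGATIVE = 4 };

// Three-bit classification: bit 1 for may be [1, max], bit 2 for may be
// [0, 0], bit 4 for may be [min, -1].
static const auto sign_mask = [] (Interval r) -> int
{
  int ret = 0;
  if (r.hi > 0)
    ret |= MAY_BE_POSITIVE;
  if (r.lo <= 0 && r.hi >= 0)
    ret |= MAY_BE_ZERO;
  if (r.lo < 0)
    ret |= MAY_BE_NEGATIVE;
  return ret;
};

// True if L and R can never have strictly opposite signs.  Zero is masked
// out because it is compatible with either sign; this mirrors the
// "& ~2) != 5" test from the quoted suggestion.
static bool
signs_cannot_differ (Interval l, Interval r)
{
  return ((sign_mask (l) | sign_mask (r)) & ~MAY_BE_ZERO)
         != (MAY_BE_POSITIVE | MAY_BE_NEGATIVE);
}
```

With this, [0, 5] and [1, 9] are accepted even though the first range is
not a constant zero, which is the case a plain integer_zerop check misses.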

> > I doubt POLY_INT_CST will appear on what the function is being called on
> > (types with scalar integral modes, mainly in .*_OVERFLOW expansion or say
> > division/modulo expansion, but maybe my imagination is limited);
> > so, if you think this is a good idea and the poly int in that case somehow
> > guarantees the existing behavior (guess for signed it would be at least when
> > not -fwrapv in action UB if the addition of the first POLY_INT_CST coeff
> > and the others multiplied by the runtime value wraps around, but for
> > unsigned is there a guarantee that if all the POLY_INT_CST coefficients
> > don't have msb set that the resulting value will not have msb set either?
> 
> I hope so, but ...

Let's wait for Richard there.
Anyway, if for the chrec case it only uses it on non-wrapping signed,
then the POLY_INT_CST handling is fine in there...

Jakub

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Richard Biener

On Thu, Feb 29, 2024 at 1:47 PM Jakub Jelinek  wrote:
>
> On Thu, Feb 29, 2024 at 04:26:00AM -0800, H.J. Lu wrote:
> > > > Adding Hongtao and Honza into the loop as the ones who acked the
> > > > original patch.
> > > >
> > > > The no_callee_saved_registers by default for noreturn functions change
> > > > can break in-process backtrace(3) or backtraces from debugger or other
> > > > process (quite often, any time the noreturn function decides to use the
> > > > bp register and any of the parent frames uses a frame pointer; the
> > > > unwinder just crashes in the libgcc unwinder case, gdb prints stack
> > > > corrupted message), so I'd like to save bp register in that case:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> > > I think this patch makes sense and LGTM, we save and restore frame
> > > pointer for noreturn.
> > > >
> > > > and additionally the no_callee_saved_registers by default for noreturn
> > > > functions change can make debugging harder, again not localized to the
> > > > noreturn function, but any of its callers.  So, if say glibc abort
> > > > function implementation needs a lot of normally callee-saved registers,
> > > > no matter how users recompile their apps, they will see garbage or
> > > > optimized out vars/parameters in their code unless they rebuild their
> > > > glibc with -O0.
> > > > So, I think we should guard that by a non-default option:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> > > So it turns off the optimization for noreturn functions by default,
> > > I'm not sure about this.
> > > Any comments, H.J?
> >
> > We need BP for backtrace.  I don't think we need to save other
> > registers.  True, GDB may not see function parameters.  But
> > optimization always has this impact.  When I need to debug a
> > program, I always use -O0 or -Og.
>
> The problem is that it doesn't help in this case.
> If some optimization makes debugging of some function harder, normally it is
> enough to recompile the translation unit that defines it with -O0/-Og, or
> add optimize attribute on the function.
> While in this case, the optimization interferes with debugging of other
> functions, not necessarily from the same translation unit, not necessarily
> even from the same library or binary, or even from the same package.
> As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
> a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
> %r15 and this optimization is applied on it.  That means debugging of any
> application (-O2, -Og or even -O0 compiled) to understand what went wrong
> and why it aborted will be harder.  Including core file analysis.
> Recompiling those apps with -O0/-Og will not help.  The only thing that
> would help is to recompile glibc with -O0/-Og.
> Doesn't have to be abort, doesn't have to be glibc.  Any library which
> exports some noreturn APIs may be affected.
> And there is not even a workaround other than to recompile with -O0/-Og the
> noreturn functions, no way to disable this optimization.
>
> Given that most users just will not be aware of this, even adding the option
> but defaulting to on would mean a problem for a lot of users.  Most of them
> will not know the problem is that some noreturn function 10 frames deep in
> the call stack was optimized this way.
>
> If people only call the noreturn functions from within the same package,
> for some strange reason care about performance of noreturn functions (they
> don't return, so unless you longjmp out of them or something similar
> which is costly on its own already, they should be entered exactly once)
> and are willing to pay the price in worse debugging in that case, let them
> use the option.  But if they provide libraries that other packages then
> consume, I'd say it wouldn't be a good idea.

+1

I'll definitely patch this by-default behavior out if we as upstream keep it.
Debugging customer core dumps is more important than optimizing
glibc abort/assert.

I do hope such patch will be at least easy, like flipping the default of an
option.

Richard.

> Jakub
>

Re: [PATCH v3] RISC-V: Introduce gcc option mrvv-vector-bits for RVV

2024-02-29 Thread Robin Dapp

> I think it makes more sense to remove the whole
> --param=riscv-autovec-preference since we should use 
> -fno-tree-vectorize instead of --param=riscv-autovec-preference=none
> which is more reasonable compile option for users.
> 
> --param is just a internal testing option that we added before,
> ideally we should remove them.
Yes, I agree with that.  At least the "none" part doesn't seem
necessary.

Regards
 Robin

Re: [PATCH] tree-optimization/114151 - handle POLY_INT_CST in get_range_pos_neg

2024-02-29 Thread Richard Biener

On Thu, 29 Feb 2024, Jakub Jelinek wrote:

> On Thu, Feb 29, 2024 at 09:21:02AM +0100, Richard Biener wrote:
> > The following switches the logic in chrec_fold_multiply to
> > get_range_pos_neg since handling POLY_INT_CST possibly mixed with
> > non-poly ranges will make the open-coding awkward and while not
> > a perfect fit it should work.
> > 
> > In turn the following makes get_range_pos_neg aware of POLY_INT_CSTs.
> > I couldn't make it work with poly_wide_int since the compares always
> > fail to build but poly_widest_int works fine and it should be
> > semantically the same.  I've also changed get_range_pos_neg to
> > use get_range_query (cfun), problematical passes shouldn't have
> > a range query activated so it shouldn't make a difference there.
> > 
> > This doesn't make a difference for the PR but not considering
> > POLY_INT_CSTs was a mistake.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> > 
> > Thanks,
> > Richard.
> > 
> > PR tree-optimization/114151
> > * tree.cc (get_range_pos_neg): Handle POLY_INT_CST, use
> > the passes range-query if available.
> > * tree-chrec.cc (chrec_fold_multiply): Use get_range_pos_neg
> > to see if both operands have the same range.
> > ---
> >  gcc/tree-chrec.cc | 14 ++
> >  gcc/tree.cc   | 12 +++-
> >  2 files changed, 9 insertions(+), 17 deletions(-)
> > 
> > diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
> > index 2e6c7356d3b..450d018ce6f 100644
> > --- a/gcc/tree-chrec.cc
> > +++ b/gcc/tree-chrec.cc
> > @@ -442,18 +442,8 @@ chrec_fold_multiply (tree type,
> >   if (!ANY_INTEGRAL_TYPE_P (type)
> >   || TYPE_OVERFLOW_WRAPS (type)
> >   || integer_zerop (CHREC_LEFT (op0))
> > - || (TREE_CODE (CHREC_LEFT (op0)) == INTEGER_CST
> > - && TREE_CODE (CHREC_RIGHT (op0)) == INTEGER_CST
> > - && (tree_int_cst_sgn (CHREC_LEFT (op0))
> > - == tree_int_cst_sgn (CHREC_RIGHT (op0
> > - || (get_range_query (cfun)->range_of_expr (rl, CHREC_LEFT (op0))
> > - && !rl.undefined_p ()
> > - && (rl.nonpositive_p () || rl.nonnegative_p ())
> > - && get_range_query (cfun)->range_of_expr (rr,
> > -   CHREC_RIGHT (op0))
> > - && !rr.undefined_p ()
> > - && ((rl.nonpositive_p () && rr.nonpositive_p ())
> > - || (rl.nonnegative_p () && rr.nonnegative_p ()
> > + || (get_range_pos_neg (CHREC_LEFT (op0))
> > + | get_range_pos_neg (CHREC_RIGHT (op0))) != 3)
> > {
> >   tree left = chrec_fold_multiply (type, CHREC_LEFT (op0), op1);
> >   tree right = chrec_fold_multiply (type, CHREC_RIGHT (op0), op1);
> 
> So, wouldn't it be better to outline what you have above + POLY_INT_CST
> handling into a helper function, which similarly to get_range_pos_neg
> returns a bitmask, but rather than 1 bit for may be [0, max] and another
> bit for may be [min, -1] you return 3 bits, 1 bit for may be [1, max],
> another for may be [0, 0] and another for may be [min, -1]?
> Also, I bet you actually want to handle TREE_UNSIGNED just as [0, 0]
> and [1, max] ranges unlike get_range_pos_neg.

I'm just lazy and given TYPE_OVERFLOW_WRAPS (and thus unsigned) doesn't
ever get here and I special-case integer_zerop it doesn't really matter
that in these cases get_range_pos_neg isn't exactly what's wanted - I'm
asking it only for those cases where it works just fine.

> So perhaps
>   int ret = 7;
>   if (TYPE_UNSIGNED (TREE_TYPE (arg)))
> ret = 3;
>   if (poly_int_tree_p (arg))
> {
>   poly_wide_int w = wi::to_poly_wide (arg);
>   if (known_lt (w, 0))
>   return 4;
>   else if (known_eq (w, 0))
>   return 2;
>   else if (known_gt (w, 0))
>   return 1;
>   else
>   return 7;
> }
>   value_range r;
>   if (!get_range_query (cfun)->range_of_expr (r, arg)
>   || r.undefined_p ())
> return ret;
>   if (r.nonpositive_p ())
> ret &= ~1;
>   if (r.nonzero_p ())
> ret &= ~2;
>   if (r.nonnegative_p ())
> ret &= ~4;
>   return ret;
> 
> ?  And then you can use it similarly,
>   ((whatever_fn (CHREC_LEFT (op0))
> | whatever_fn (CHREC_RIGHT (op0))) & ~2) != 5
> 
> Sure, if it is written just for this case and not other uses,
> it could be just 2 bits, can contain [1, max] and can contain [min, -1]
> because you don't care about zero, return 0 for the known_eq (w, 0)
> there...
> 
> Though see below, perhaps it should just handle INTEGER_CSTs and
> is_constant () POLY_INT_CSTs, not really sure what happens if there
> are overflows in the POLY_INT_CST evaluation.

I'm indeed also not sure whether the POLY_INT_CST is behaving
correctly.  I think POLYs are always constrained somehow but
whether known_gt ([__INT_MAX__, 1], 0) computes correctly
(or whether we treat overflow there as undefined?) I don't know.
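
To spell out the concern, here is a small standalone model.  It is only a
sketch under stated assumptions: Poly2 is a hypothetical two-coefficient
value c0 + c1 * x with a runtime x known only to be >= 0, and the known_*
helpers reason in unbounded precision.  It does not claim to match how
poly_int or POLY_INT_CST actually treat overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-coefficient value c0 + c1 * x, where x is a runtime
// quantity known only to be >= 0.
struct Poly2 { std::int64_t c0, c1; };

// "Known" comparisons against zero, reasoning in unbounded precision:
// the relation must hold for every x >= 0.
static bool known_gt_zero (Poly2 p) { return p.c0 > 0 && p.c1 >= 0; }
static bool known_eq_zero (Poly2 p) { return p.c0 == 0 && p.c1 == 0; }
static bool known_lt_zero (Poly2 p) { return p.c0 < 0 && p.c1 <= 0; }

// Evaluate c0 + c1 * x with wraparound to 32 bits and report whether the
// result, read as a signed 32-bit value, has its sign bit set.
static bool
wrapped_i32_is_negative (Poly2 p, std::uint32_t x)
{
  std::uint32_t v = static_cast<std::uint32_t> (p.c0)
                    + static_cast<std::uint32_t> (p.c1) * x;
  return (v >> 31) != 0;
}
```

In this model known_gt_zero holds for {INT32_MAX, 1}, yet the wrapped
32-bit evaluation at x = 1 has the sign bit set.  Whether such an overflow
counts as undefined for a POLY_INT_CST is exactly the open question, and
the sketch does not settle it.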

I was trying to avoid adding

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 04:26:00AM -0800, H.J. Lu wrote:
> > > Adding Hongtao and Honza into the loop as the ones who acked the original
> > > patch.
> > >
> > > The no_callee_saved_registers by default for noreturn functions change can
> > > break in-process backtrace(3) or backtraces from debugger or other process
> > > (quite often, any time the noreturn function decides to use the bp
> > > register and any of the parent frames uses a frame pointer; the unwinder
> > > just crashes in the libgcc unwinder case, gdb prints stack corrupted
> > > message), so I'd like to save bp register in that case:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> > I think this patch makes sense and LGTM, we save and restore frame
> > pointer for noreturn.
> > >
> > > and additionally the no_callee_saved_registers by default for noreturn
> > > functions change can make debugging harder, again not localized to the
> > > noreturn function, but any of its callers.  So, if say glibc abort
> > > function implementation needs a lot of normally callee-saved registers,
> > > no matter how users recompile their apps, they will see garbage or
> > > optimized out vars/parameters in their code unless they rebuild their
> > > glibc with -O0.
> > > So, I think we should guard that by a non-default option:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> > So it turns off the optimization for noreturn functions by default,
> > I'm not sure about this.
> > Any comments, H.J?
> 
> We need BP for backtrace.  I don't think we need to save other
> registers.  True, GDB may not see function parameters.  But
> optimization always has this impact.  When I need to debug a
> program, I always use -O0 or -Og.

The problem is that it doesn't help in this case.
If some optimization makes debugging of some function harder, normally it is
enough to recompile the translation unit that defines it with -O0/-Og, or
add optimize attribute on the function.
While in this case, the optimization interferes with debugging of other
functions, not necessarily from the same translation unit, not necessarily
even from the same library or binary, or even from the same package.
As I tried to explain, supposedly glibc abort is compiled with -O2 and needs
a lot of registers, so say it uses all of %rbx, %rbp, %r12, %r13, %r14,
%r15 and this optimization is applied on it.  That means debugging of any
application (-O2, -Og or even -O0 compiled) to understand what went wrong
and why it aborted will be harder.  Including core file analysis.
Recompiling those apps with -O0/-Og will not help.  The only thing that
would help is to recompile glibc with -O0/-Og.
Doesn't have to be abort, doesn't have to be glibc.  Any library which
exports some noreturn APIs may be affected.
And there is not even a workaround other than to recompile with -O0/-Og the
noreturn functions, no way to disable this optimization.

Given that most users just will not be aware of this, even adding the option
but defaulting to on would mean a problem for a lot of users.  Most of them
will not know the problem is that some noreturn function 10 frames deep in
the call stack was optimized this way.

If people only call the noreturn functions from within the same package,
for some strange reason care about performance of noreturn functions (they
don't return, so unless you longjmp out of them or something similar
which is costly on its own already, they should be entered exactly once)
and are willing to pay the price in worse debugging in that case, let them
use the option.  But if they provide libraries that other packages then
consume, I'd say it wouldn't be a good idea.

Jakub

Re: [PATCH] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-02-29 Thread Stefan Schulze Frielinghaus

On Thu, Feb 29, 2024 at 01:26:54PM +0100, Andreas Schwab wrote:
> On Feb 29 2024, Stefan Schulze Frielinghaus wrote:
> 
> > RTX X must not necessarily be a SYMBOL_REF and may e.g. be an
> 
> False friend: s/must not/need not/

Argh I always fall for this ;-) Thanks for pointing this out.  Changed
for the final commit.

Cheers,
Stefan

[PATCH v2] libstdc++: Add more nodiscard uses in

2024-02-29 Thread Jonathan Wakely

We need to add [[nodiscard]] to the comparison ops in 
too, which I missed in the v1 patch.

Tested aarch64-linux. Pushed to trunk (yesterday).
commit 26d6a714b29eeef77591f136f5162622a549d8fd
Author: Jonathan Wakely 
Date:   Mon Feb 26 13:09:02 2024

libstdc++: Add more nodiscard uses in 

Add [[nodiscard]] to vector::at and to comparison operators.

libstdc++-v3/ChangeLog:

* include/bits/stl_bvector.h (vector::at): Add
nodiscard.
* include/bits/stl_vector.h (vector::at): Likewise.
(operator==, operator<=>, operator<, operator!=, operator>)
(operator<=, operator>=): Likewise.
* include/debug/vector (operator==, operator<=>, operator<)
(operator!=, operator>, operator<=, operator>=): Likewise.
* testsuite/23_containers/vector/nodiscard.cc: New test.
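
For illustration only, the effect of the attribute can be sketched with a
miniature container.  SmallBuf and index_is_valid are hypothetical and are
not libstdc++ code; the point is just that a call whose result is dropped
needs an explicit cast to void to stay warning-free.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical miniature container; not libstdc++ code.
class SmallBuf
{
  int data_[4] = {10, 20, 30, 40};

public:
  // Bounds-checked access.  Ignoring the result is almost always a
  // mistake, so the compiler is asked to warn about it.
  [[nodiscard]] int &
  at (std::size_t n)
  {
    if (n >= 4)
      throw std::out_of_range ("SmallBuf::at");
    return data_[n];
  }

  [[nodiscard]] bool
  operator== (const SmallBuf &other) const
  {
    for (std::size_t i = 0; i < 4; ++i)
      if (data_[i] != other.data_[i])
        return false;
    return true;
  }
};

// Uses at () only for its range check.  The cast to void states that the
// result is dropped on purpose, which keeps the call warning-free.
static bool
index_is_valid (SmallBuf b, std::size_t n)
{
  try
    {
      (void) b.at (n);
      return true;
    }
  catch (const std::out_of_range &)
    {
      return false;
    }
}
```

A bare statement such as b.at (n); or a == b; would be diagnosed by a
compiler that implements [[nodiscard]].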

diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h
index aa5644b4a0e..2c8b892b07a 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -1101,7 +1101,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   }
 
 public:
-  _GLIBCXX20_CONSTEXPR
+  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   reference
   at(size_type __n)
   {
@@ -1109,7 +1109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
return (*this)[__n];
   }
 
-  _GLIBCXX20_CONSTEXPR
+  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   const_reference
   at(size_type __n) const
   {
diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 6a9543eefce..a8d387f40a1 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1172,7 +1172,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  is first checked that it is in the range of the vector.  The
*  function throws out_of_range if the check fails.
*/
-  _GLIBCXX20_CONSTEXPR
+  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   reference
   at(size_type __n)
   {
@@ -1191,7 +1191,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  is first checked that it is in the range of the vector.  The
*  function throws out_of_range if the check fails.
*/
-  _GLIBCXX20_CONSTEXPR
+  _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   const_reference
   at(size_type __n) const
   {
@@ -2042,7 +2042,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  and if corresponding elements compare equal.
   */
   template
-_GLIBCXX20_CONSTEXPR
+_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
 inline bool
 operator==(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return (__x.size() == __y.size()
@@ -2061,7 +2061,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  `<` and `>=` etc.
   */
   template
-_GLIBCXX20_CONSTEXPR
+[[nodiscard]] _GLIBCXX20_CONSTEXPR
 inline __detail::__synth3way_t<_Tp>
 operator<=>(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 {
@@ -2082,32 +2082,32 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
*  See std::lexicographical_compare() for how the determination is made.
   */
   template
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return std::lexicographical_compare(__x.begin(), __x.end(),
  __y.begin(), __y.end()); }
 
   /// Based on operator==
   template
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator!=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return !(__x == __y); }
 
   /// Based on operator<
   template
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return __y < __x; }
 
   /// Based on operator<
   template
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator<=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return !(__y < __x); }
 
   /// Based on operator<
   template
-inline bool
+_GLIBCXX_NODISCARD inline bool
 operator>=(const vector<_Tp, _Alloc>& __x, const vector<_Tp, _Alloc>& __y)
 { return !(__x < __y); }
 #endif // three-way comparison
diff --git a/libstdc++-v3/include/debug/vector b/libstdc++-v3/include/debug/vector
index 5a0fc808651..216822975a2 100644
--- a/libstdc++-v3/include/debug/vector
+++ b/libstdc++-v3/include/debug/vector
@@ -866,7 +866,7 @@ namespace __debug
 };
 
   template
-_GLIBCXX20_CONSTEXPR
+_GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
 inline bool
 operator==(const vector<_Tp, _Alloc>& __lhs,
   const vector<_Tp, _Alloc>& __rhs)
@@ -874,35 +874,41 @@ namespace __debug
 
 #if __cpp_lib_three_way_comparison
   template
+[[nodiscard]]
 constexpr __detail::__synth3way_t<_Tp>
 operator<=>(const vector<_Tp, _Alloc>& __x, const vector<_Tp,

Re: [PATCH] i386: Guard noreturn no-callee-saved-registers optimization with -mnoreturn-no-callee-saved-registers [PR38534]

2024-02-29 Thread H.J. Lu

On Wed, Feb 28, 2024 at 10:20 PM Hongtao Liu  wrote:
>
> On Wed, Feb 28, 2024 at 4:54 PM Jakub Jelinek  wrote:
> >
> > Hi!
> >
> > Adding Hongtao and Honza into the loop as the ones who acked the original
> > patch.
> >
> > The no_callee_saved_registers by default for noreturn functions change can
> > break in-process backtrace(3) or backtraces from debugger or other process
> > (quite often, any time the noreturn function decides to use the bp register
> > and any of the parent frames uses a frame pointer; the unwinder just crashes
> > in the libgcc unwinder case, gdb prints stack corrupted message), so I'd
> > like to save bp register in that case:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646591.html
> I think this patch makes sense and LGTM, we save and restore frame
> pointer for noreturn.
> >
> > and additionally the no_callee_saved_registers by default for noreturn
> > functions change can make debugging harder, again not localized to the
> > noreturn function, but any of its callers.  So, if say glibc abort function
> > implementation needs a lot of normally callee-saved registers, no matter how
> > users recompile their apps, they will see garbage or optimized out
> > vars/parameters in their code unless they rebuild their glibc with -O0.
> > So, I think we should guard that by a non-default option:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646649.html
> So it turns off the optimization for noreturn functions by default,
> I'm not sure about this.
> Any comments, H.J?

We need BP for backtrace.  I don't think we need to save other
registers.  True, GDB may not see function parameters.  But
optimization always has this impact.  When I need to debug a
program, I always use -O0 or -Og.

> >
> > Plus we need to somehow make sure to emit DW_CFA_undefined for the modified
> > but not saved normally callee-saved registers, so that we at least don't get
> > garbage in debug info.  H.J. posted some patches for that, so far I wasn't
> > happy about the implementation but the actual change is desirable.
> >
> > Your thoughts on this?
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao



-- 
H.J.

Re: [PATCH] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-02-29 Thread Andreas Schwab

On Feb 29 2024, Stefan Schulze Frielinghaus wrote:

> RTX X must not necessarily be a SYMBOL_REF and may e.g. be an

False friend: s/must not/need not/

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[PATCH] s390: Fix test vector/long-double-to-i64.c

2024-02-29 Thread Stefan Schulze Frielinghaus

Starting with r14-8319-g86de9b66480b71, fwprop has improved so that vpdi
is no longer required.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-to-i64.c: Fix scan
assembler directive.
---
 .../gcc.target/s390/vector/long-double-to-i64.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
index 2dbbb5d1c03..ed89878e6ee 100644
--- a/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
+++ b/gcc/testsuite/gcc.target/s390/vector/long-double-to-i64.c
@@ -1,19 +1,24 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -march=z14 -mzarch --save-temps" } */
 /* { dg-do run { target { s390_z14_hw } } } */
+/* { dg-final { check-function-bodies "**" "" "" { target { lp64 } } } } */
+
 #include 
 #include 
 
+/*
+** long_double_to_i64:
+** ld  %f0,0\(%r2\)
+** ld  %f2,8\(%r2\)
+** cgxbr   %r2,5,%f0
+** br  %r14
+*/
 __attribute__ ((noipa)) static int64_t
 long_double_to_i64 (long double x)
 {
   return x;
 }
 
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,1\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tvpdi\t%v\d+,%v\d+,%v\d+,5\n} 1 } } */
-/* { dg-final { scan-assembler-times {\n\tcgxbr\t} 1 } } */
-
 int
 main (void)
 {
-- 
2.43.0

[PATCH] s390: Fix tests rosbg_si_srl and rxsbg_si_srl

2024-02-29 Thread Stefan Schulze Frielinghaus

Starting with r14-2047-gd0e891406b16dc two SI mode tests are optimized
into DI mode.  Thus, the scan-assembler directives fail.  For example,
the RTL expression

(ior:SI (subreg:SI (lshiftrt:DI (reg:DI 69)
                                (const_int 2 [0x2])) 4)
        (subreg:SI (reg:DI 68) 4))

is optimized into

(ior:DI (lshiftrt:DI (reg:DI 69)
                     (const_int 2 [0x2]))
        (reg:DI 68))

Fixed by moving operands into memory in order to enforce SI mode
computation.

Furthermore, in r9-6056-g290dfd9bc7bea2 the starting bit position of the
scan-assembler directive for rosbg was incorrectly set to 32 which
actually should be 32+SHIFT_AMOUNT, i.e., in this particular case 34.
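
To see why the start bit has to be 32+SHIFT_AMOUNT, here is a minimal C++
sketch of what rosbg computes.  It assumes a simplified model of the
instruction: rotate the second operand left, then OR the selected bits into
the first operand, with bit 0 being the most significant bit.  The names
rotl64, bit_range_mask and rosbg_model are made up for illustration and are
not part of GCC or the testsuite.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by n bits (n is reduced modulo 64).
static uint64_t rotl64(uint64_t v, unsigned n) {
  n &= 63;
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

// Mask covering bits start..end, where bit 0 is the most significant bit
// and bit 63 the least significant one.  Assumes start <= end <= 63.
static uint64_t bit_range_mask(unsigned start, unsigned end) {
  assert(start <= end && end <= 63);
  uint64_t from_start = ~0ULL >> start;      // bits start..63
  uint64_t up_to_end = ~0ULL << (63 - end);  // bits 0..end
  return from_start & up_to_end;
}

// Simplified model: rotate r2 left by rot, then OR the selected bit range
// into r1.  Bits of r1 outside the range are left unchanged.
static uint64_t rosbg_model(uint64_t r1, uint64_t r2,
                            unsigned start, unsigned end, unsigned rot) {
  return r1 | (rotl64(r2, rot) & bit_range_mask(start, end));
}
```

A rotate left by 62 acts as a rotate right by 2, so bits 32 and 33 of the
rotated value come from the upper half of the source register.  Selecting
bits 34..63 keeps only the 30 bits that an SI mode b >> 2 can produce, while
a start bit of 32 would let those two upper-half bits leak into the result.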

gcc/testsuite/ChangeLog:

* gcc.target/s390/md/rXsbg_mode_sXl.c: Fix tests rosbg_si_srl
and rxsbg_si_srl.
---
 .../gcc.target/s390/md/rXsbg_mode_sXl.c| 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
index ede813818ff..cf454d2783c 100644
--- a/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
+++ b/gcc/testsuite/gcc.target/s390/md/rXsbg_mode_sXl.c
@@ -22,6 +22,8 @@
 { dg-skip-if "" { *-*-* } { "*" } { "-march=*" } }
 */
 
+unsigned int a, b;
+
 __attribute__ ((noinline)) unsigned int
 si_sll (unsigned int x)
 {
@@ -42,11 +44,11 @@ rosbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rosbg_si_srl (unsigned int a, unsigned int b)
+rosbg_si_srl (void)
 {
   return a | (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rosbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
 rxsbg_si_sll (unsigned int a, unsigned int b)
@@ -56,11 +58,11 @@ rxsbg_si_sll (unsigned int a, unsigned int b)
 /* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,62,1" 1 } } */
 
 __attribute__ ((noinline)) unsigned int
-rxsbg_si_srl (unsigned int a, unsigned int b)
+rxsbg_si_srl (void)
 {
   return a ^ (b >> 2);
 }
-/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,32,63,62" 1 } } */
+/* { dg-final { scan-assembler-times "rxsbg\t%r.,%r.,34,63,62" 1 } } */
 
 __attribute__ ((noinline)) unsigned long long
 di_sll (unsigned long long x)
@@ -108,21 +110,21 @@ main (void)
   /* SIMode */
   {
 unsigned int r;
-unsigned int a = 0x12488421u;
-unsigned int b = 0xu;
+a = 0x12488421u;
+b = 0xu;
 unsigned int csll = si_sll (b);
 unsigned int csrl = si_srl (b);
 
 r = rosbg_si_sll (a, b);
 if (r != (a | csll))
   __builtin_abort ();
-r = rosbg_si_srl (a, b);
+r = rosbg_si_srl ();
 if (r != (a | csrl))
   __builtin_abort ();
 r = rxsbg_si_sll (a, b);
 if (r != (a ^ csll))
   __builtin_abort ();
-r = rxsbg_si_srl (a, b);
+r = rxsbg_si_srl ();
 if (r != (a ^ csrl))
   __builtin_abort ();
   }
-- 
2.43.0

[PATCH] s390: Fix TARGET_SECONDARY_RELOAD for non-SYMBOL_REFs

2024-02-29 Thread Stefan Schulze Frielinghaus

RTX X need not necessarily be a SYMBOL_REF and may e.g. be an
UNSPEC_GOTENT for which SYMBOL_FLAG_NOTALIGN2_P fails.
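
The shape of the fix can be sketched in standalone C++ as a toy model.
ToyNode, toy_symbol_notalign2_p and toy_needs_odd_addend_reload are
hypothetical stand-ins, not GCC's rtx API; the only point is that the
kind check has to short-circuit before the kind-specific flag accessor
is evaluated.

```cpp
#include <cassert>

// Toy stand-in for an RTL node: only SymbolRef nodes carry the flag.
enum class NodeKind { SymbolRef, Unspec };

struct ToyNode {
  NodeKind kind;
  bool notalign2;  // meaningful only when kind == NodeKind::SymbolRef
};

// Accessor that is valid only on SymbolRef nodes, mimicking a checked
// flag accessor that rejects any other kind of node.
static bool toy_symbol_notalign2_p(const ToyNode& n) {
  assert(n.kind == NodeKind::SymbolRef);
  return n.notalign2;
}

// Guarded condition: the accessor is reached only for SymbolRef nodes,
// because || short-circuits when the node is of another kind.
static bool toy_needs_odd_addend_reload(const ToyNode& n, long offset) {
  return (n.kind != NodeKind::SymbolRef || !toy_symbol_notalign2_p(n))
         && (offset & 1) == 1;
}
```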

gcc/ChangeLog:

* config/s390/s390.cc (s390_secondary_reload): Guard
SYMBOL_FLAG_NOTALIGN2_P.
---
 gcc/config/s390/s390.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 943fc9bfd72..12430d77786 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -4778,7 +4778,7 @@ s390_secondary_reload (bool in_p, rtx x, reg_class_t rclass_i,
   if (in_p
  && s390_loadrelative_operand_p (x, , )
  && mode == Pmode
- && !SYMBOL_FLAG_NOTALIGN2_P (symref)
+ && (!SYMBOL_REF_P (symref) || !SYMBOL_FLAG_NOTALIGN2_P (symref))
  && (offset & 1) == 1)
sri->icode = ((mode == DImode) ? CODE_FOR_reloaddi_larl_odd_addend_z10
  : CODE_FOR_reloadsi_larl_odd_addend_z10);
-- 
2.43.0

[PATCH] dwarf2out: Don't move variable sized aggregates to comdat [PR114015]

2024-02-29 Thread Jakub Jelinek

Hi!

The following testcase ICEs, because we decide to move that
struct { char a[n]; } DW_TAG_structure_type into .debug_types section
/ DW_UT_type DWARF5 unit, but refer from there to a DW_TAG_variable
(created artificially for the array bounds).
Even with non-bitint, I think it is just wrong to use .debug_types
section / DW_UT_type for something that uses DW_OP_fbreg and similar
in it, things clearly dependent on a particular function.
In most cases, the is_nested_in_subprogram (die) check keeps such
aggregates from being moved, but it does not cover the function
parameter type case.

The following patch fixes it by returning false from should_move_die_to_comdat
for non-constant sized aggregate types, i.e. when either we gave up on
adding DW_AT_byte_size for it because it wasn't expressible, or when
it is something non-constant (location description, reference, ...).
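
As a rough standalone illustration of that rule (a toy model, not the
dwarf2out.cc data structures), the decision can be written as below.
SizeKind, ToyDie and toy_should_move_to_comdat are hypothetical names,
and the other checks the real function performs are reduced to a single
nested_in_subprogram flag.

```cpp
#include <cassert>

// How the byte size of a type is described in this toy model.
enum class SizeKind {
  Missing,     // no byte size attribute could be emitted
  Constant,    // plain unsigned constant
  NonConstant  // location description, reference, ...
};

struct ToyDie {
  bool is_enumeration;
  bool nested_in_subprogram;
  SizeKind byte_size;
};

// Returns true if the type may be moved into a separate type unit.
static bool toy_should_move_to_comdat(const ToyDie& die) {
  if (die.nested_in_subprogram)
    return false;
  // Aggregates whose size depends on a particular function stay put.
  if (!die.is_enumeration && die.byte_size != SizeKind::Constant)
    return false;
  return true;
}
```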

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-29  Jakub Jelinek  

PR debug/114015
* dwarf2out.cc (should_move_die_to_comdat): Return false for
aggregates without DW_AT_byte_size attribute or with non-constant
DW_AT_byte_size.

* gcc.dg/debug/dwarf2/pr114015.c: New test.

--- gcc/dwarf2out.cc.jj 2024-02-17 01:14:48.157790666 +0100
+++ gcc/dwarf2out.cc 2024-02-28 17:11:44.259252850 +0100
@@ -8215,6 +8215,15 @@ should_move_die_to_comdat (dw_die_ref di
   || is_nested_in_subprogram (die)
   || contains_subprogram_definition (die))
return false;
+  if (die->die_tag != DW_TAG_enumeration_type)
+   {
+ /* Don't move non-constant size aggregates.  */
+ dw_attr_node *sz = get_AT (die, DW_AT_byte_size);
+ if (sz == NULL
+ || (AT_class (sz) != dw_val_class_unsigned_const
+ && AT_class (sz) != dw_val_class_unsigned_const_implicit))
+   return false;
+   }
   return true;
 case DW_TAG_array_type:
 case DW_TAG_interface_type:
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr114015.c.jj 2024-02-28 17:22:33.206221495 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr114015.c 2024-02-28 17:21:49.357831730 +0100
@@ -0,0 +1,14 @@
+/* PR debug/114015 */
+/* { dg-do compile { target bitint } } */
+/* { dg-options "-g -fvar-tracking-assignments -fdebug-types-section -w" } */
+
+#if __BITINT_MAXWIDTH__ >= 236
+typedef _BitInt(236) B;
+#else
+typedef _BitInt(63) B;
+#endif
+
+int
+foo (B n, struct { char a[n]; } o)
+{
+}

Jakub

[PATCH] c++: Fix up decltype of non-dependent structured binding decl in template [PR92687]

2024-02-29 Thread Jakub Jelinek

Hi!

finish_decltype_type uses DECL_HAS_VALUE_EXPR_P (expr) check for
DECL_DECOMPOSITION_P (expr) to determine if it is
array/struct/vector/complex etc. subobject proxy case vs. structured
binding using std::tuple_{size,element}.
For non-templates or when templates are already instantiated, that works
correctly, finalized DECL_DECOMPOSITION_P non-base vars indeed have
DECL_VALUE_EXPR in the former case and don't have it in the latter.
It works fine for dependent structured bindings as well, cp_finish_decomp in
that case creates DECLTYPE_TYPE tree and defers the handling until
instantiation.
As the testcase shows, this doesn't work for the non-dependent structured
binding case in templates, because DECL_HAS_VALUE_EXPR_P is set in that case
always; cp_finish_decomp ends with:
  if (processing_template_decl)
    {
      for (unsigned int i = 0; i < count; i++)
        if (!DECL_HAS_VALUE_EXPR_P (v[i]))
          {
            tree a = build_nt (ARRAY_REF, decl, size_int (i),
                               NULL_TREE, NULL_TREE);
            SET_DECL_VALUE_EXPR (v[i], a);
            DECL_HAS_VALUE_EXPR_P (v[i]) = 1;
          }
    }
and those artificial ARRAY_REFs are used in various places during
instantiation to find out what base the DECL_DECOMPOSITION_P VAR_DECLs
have and their positions.
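
For reference, the two kinds of structured binding that
finish_decltype_type has to tell apart can be seen in ordinary
non-template C++17 code.  This is only an illustrative sketch, not the
testcase from the patch; Pixel, Flags and binding_types_demo are made-up
names.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

struct Pixel {
  int x;
  const long y;
};

struct Flags {
  unsigned low : 3;
  unsigned high : 5;
};

// The type checks happen at compile time; the return value only confirms
// that the bindings refer to the expected values.
static bool binding_types_demo() {
  // Struct case: each binding names a subobject of a hidden copy of p,
  // and decltype yields the type of that member.
  Pixel p{1, 2};
  auto [px, py] = p;
  static_assert(std::is_same<decltype(px), int>::value, "member type");
  static_assert(std::is_same<decltype(py), const long>::value,
                "cv-qualified member type");

  // Bit-field members report their declared type, not a narrowed one.
  Flags f{5, 17};
  auto [lo, hi] = f;
  static_assert(std::is_same<decltype(lo), unsigned>::value,
                "declared type of the bit-field");

  // Tuple case: the type comes from std::tuple_element and is not a
  // reference type, even though the binding is introduced via auto&.
  std::tuple<int, long> t{3, 4L};
  auto& [a, b] = t;
  static_assert(
      std::is_same<decltype(a),
                   std::tuple_element<0, decltype(t)>::type>::value,
      "tuple_element type");
  static_assert(std::is_same<decltype(b), long>::value,
                "not a reference type");

  return px == 1 && py == 2 && lo == 5 && hi == 17 && a == 3 && b == 4;
}
```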

The following patch fixes it by remembering from cp_finish_decomp in
the processing_template_decl case whether the structured binding uses
std::tuple_{size,element} or not in a flag, which then finish_decltype_type
can use.
Rather than wasting a lang_decl_base bit on it or growing the size of
lang_decl_decomp for it, I chose to abuse the ARRAY_REF operands;
the ARRAY_REF in that case is completely artificial, will never be emitted
(when the cp_finish_decomp is called on the instantiated version of it
with !processing_template_decl, DECL_VALUE_EXPR/DECL_HAS_VALUE_EXPR_P is
cleared), so I chose to use size_zero_node for the TREE_OPERAND (array_ref,
2) as a flag this structured binding is the tuple case (per ARRAY_REF
documentation the third operand is an optional copy of TYPE_MIN_VALUE of the
index type; and AFAIK everything in the C++ FE uses NULL there, it is mainly
there for Ada (and the 4th argument for arrays of non-constant length
elements), but all these ARRAY_REFs aren't even folded or something similar).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Another option would be to change
 tree
 lookup_decomp_type (tree v)
 {
-  return *decomp_type_table->get (v);
+  if (tree *slot = decomp_type_table->get (v))
+    return *slot;
+  return NULL_TREE;
 }

and in finish_decltype_type either just in the ptds.saved case or always
try to lookup_decomp_type, if it returns non-NULL, return what it returned,
otherwise return unlowered_expr_type (expr).  I guess it would be cleaner,
I thought it would be more costly due to the hash table lookup, but now that
I think about it again, DECL_VALUE_EXPR is a hash table lookup as well.
So maybe then
+         if (ptds.saved)
+           {
+             gcc_checking_assert (DECL_HAS_VALUE_EXPR_P (expr));
+             /* DECL_HAS_VALUE_EXPR_P is always set if
+                processing_template_decl.  If lookup_decomp_type
+                returns non-NULL, it is the tuple case.  */
+             if (tree ret = lookup_decomp_type (expr))
+               return ret;
+           }
          if (DECL_HAS_VALUE_EXPR_P (expr))
            /* Expr is an array or struct subobject proxy, handle
               bit-fields properly.  */
            return unlowered_expr_type (expr);
          else
            /* Expr is a reference variable for the tuple case.  */
            return lookup_decomp_type (expr);
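
The "always try the lookup first" variant of this alternative can be
sketched outside GCC as follows.  This is a toy model under stated
assumptions: ToyDecompTable stands in for decomp_type_table, types are
plain strings, and all toy_* names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for the table that records the tuple_element type of
// tuple-like bindings, keyed by binding name.
using ToyDecompTable = std::unordered_map<std::string, std::string>;

// Returns the recorded type, or nullptr if the binding has no entry.
static const std::string* toy_lookup_decomp_type(
    const ToyDecompTable& table, const std::string& binding) {
  auto it = table.find(binding);
  return it == table.end() ? nullptr : &it->second;
}

// If the binding has a recorded type it is the tuple case and that type
// wins; otherwise fall back to the type of the subobject proxy.
static std::string toy_binding_decltype(const ToyDecompTable& table,
                                        const std::string& binding,
                                        const std::string& proxy_type) {
  if (const std::string* recorded = toy_lookup_decomp_type(table, binding))
    return *recorded;
  return proxy_type;
}
```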

2024-02-29  Jakub Jelinek  

PR c++/92687
* decl.cc (cp_finish_decomp): If processing_template_decl, remember
whether std::tuple_{size,element} will be used or not in third
operand of DECL_VALUE_EXPR ARRAY_REF.
* semantics.cc (finish_decltype_type): Use that if ptds.saved to see
if lookup_decomp_type should be used.

* g++.dg/cpp1z/decomp59.C: New test.

--- gcc/cp/decl.cc.jj   2024-02-28 08:41:18.486493565 +0100
+++ gcc/cp/decl.cc  2024-02-28 15:10:47.555186301 +0100
@@ -9384,6 +9384,7 @@ cp_finish_decomp (tree decl, cp_decomp *
 
   tree eltype = NULL_TREE;
   unsigned HOST_WIDE_INT eltscnt = 0;
+  bool tuple_p = false;
   if (TREE_CODE (type) == ARRAY_TYPE)
 {
   tree nelts;
@@ -9535,6 +9536,7 @@ cp_finish_decomp (tree decl, cp_decomp *
 of the individual variables.  If those will be read, we'll mark
 the underlying decl as read at that point.  */
   DECL_READ_P (decl) = save_read;
+  tuple_p = true;
 }
   else if (TREE_CODE (type) == UNION_TYPE)
 {
@@ -9607,14 +9609,25 @@ cp_finish_decomp (tree decl, cp_decomp *
 }
   if (processing_template_decl)
 {
+  /* For non-dependent structured bindings using std::tuple_size
+and std::tuple_element,

Re: [PATCH v2] testsuite: Add a test case for negating FP vectors containing zeros

2024-02-29 Thread Xi Ruoyao

On Thu, 2024-02-29 at 15:09 +0800, Xi Ruoyao wrote:
> Recently I've fixed two wrong FP vector negate implementations which
> caused wrong sign bits in zeros in targets (r14-8786 and r14-8801).  To
> prevent a similar issue from happening again, add a test case.
> 
> Tested on x86_64 (with SSE2, AVX, AVX2, and AVX512F), AArch64, MIPS
> (with MSA), LoongArch (with LSX and LASX).
> 
> gcc/testsuite:
> 
>   * gcc.dg/vect/vect-neg-zero.c: New test.
> ---
> 
> v1->v2: Remove { dg-do run } which was likely triggering a SIGILL on
> Linaro ARM CI.

Oops, still failing ARM CI.  Not sure why...

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [PATCH] middle-end/114070 - VEC_COND_EXPR folding

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 11:16:54AM +0100, Richard Biener wrote:
> That said, the quick experiment shows this isn't anything for stage4.

The earlier vector lowering is moved in the pass list, the higher the
chance that match.pd or some other optimization reintroduces
unsupportable vector operations into the IL.

Guess your patch looks reasonable.

> > PR middle-end/114070
> > * match.pd ((c ? a : b) op d  -->  c ? (a op d) : (b op d)):
> > Allow the folding if before lowering and the current IL
> > isn't supported with vcond_mask.
> > ---
> >  gcc/match.pd | 18 +++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index f3fffd8dec2..4edba7c84fb 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5153,7 +5153,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > || types_match (type, TREE_TYPE (@1))
> > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > +   || (optimize_vectors_before_lowering_p ()
> > +  /* The following is optimistic on the side of non-support, we are
> > + missing the legacy vcond{,u,eq} cases.  Do this only when
> > + lowering will be able to fixup..  */
> > +  && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > +  TREE_TYPE (@0), ERROR_MARK)))
> > (vec_cond @0 (op! @1 @3) (op! @2 @4
> >  
> >  /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> > @@ -5161,13 +5167,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(op (vec_cond:s @0 @1 @2) @3)
> >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > || types_match (type, TREE_TYPE (@1))
> > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > +   || (optimize_vectors_before_lowering_p ()
> > +  && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > +  TREE_TYPE (@0), ERROR_MARK)))
> > (vec_cond @0 (op! @1 @3) (op! @2 @3
> >   (simplify
> >(op @3 (vec_cond:s @0 @1 @2))
> >(if (TREE_CODE_CLASS (op) != tcc_comparison
> > || types_match (type, TREE_TYPE (@1))
> > -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> > +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> > +   || (optimize_vectors_before_lowering_p ()
> > +  && !expand_vec_cond_expr_p (TREE_TYPE (@1),
> > +  TREE_TYPE (@0), ERROR_MARK)))
> > (vec_cond @0 (op! @3 @1) (op! @3 @2)
> >  
> >  #if GIMPLE
> > 

Jakub

Re: [PATCH] tree-optimization/114151 - handle POLY_INT_CST in get_range_pos_neg

2024-02-29 Thread Jakub Jelinek

On Thu, Feb 29, 2024 at 09:21:02AM +0100, Richard Biener wrote:
> The following switches the logic in chrec_fold_multiply to
> get_range_pos_neg since handling POLY_INT_CST possibly mixed with
> non-poly ranges will make the open-coding awkward and while not
> a perfect fit it should work.
> 
> In turn the following makes get_range_pos_neg aware of POLY_INT_CSTs.
> I couldn't make it work with poly_wide_int since the compares always
> fail to build but poly_widest_int works fine and it should be
> semantically the same.  I've also changed get_range_pos_neg to
> use get_range_query (cfun), problematical passes shouldn't have
> a range query activated so it shouldn't make a difference there.
> 
> This doesn't make a difference for the PR but not considering
> POLY_INT_CSTs was a mistake.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/114151
>   * tree.cc (get_range_pos_neg): Handle POLY_INT_CST, use
>   the pass's range query if available.
>   * tree-chrec.cc (chrec_fold_multiply): Use get_range_pos_neg
>   to see if both operands have the same range.
> ---
>  gcc/tree-chrec.cc | 14 ++
>  gcc/tree.cc   | 12 +++-
>  2 files changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
> index 2e6c7356d3b..450d018ce6f 100644
> --- a/gcc/tree-chrec.cc
> +++ b/gcc/tree-chrec.cc
> @@ -442,18 +442,8 @@ chrec_fold_multiply (tree type,
> if (!ANY_INTEGRAL_TYPE_P (type)
> || TYPE_OVERFLOW_WRAPS (type)
> || integer_zerop (CHREC_LEFT (op0))
> -   || (TREE_CODE (CHREC_LEFT (op0)) == INTEGER_CST
> -   && TREE_CODE (CHREC_RIGHT (op0)) == INTEGER_CST
> -   && (tree_int_cst_sgn (CHREC_LEFT (op0))
> -   == tree_int_cst_sgn (CHREC_RIGHT (op0
> -   || (get_range_query (cfun)->range_of_expr (rl, CHREC_LEFT (op0))
> -   && !rl.undefined_p ()
> -   && (rl.nonpositive_p () || rl.nonnegative_p ())
> -   && get_range_query (cfun)->range_of_expr (rr,
> - CHREC_RIGHT (op0))
> -   && !rr.undefined_p ()
> -   && ((rl.nonpositive_p () && rr.nonpositive_p ())
> -   || (rl.nonnegative_p () && rr.nonnegative_p ()
> +   || (get_range_pos_neg (CHREC_LEFT (op0))
> +   | get_range_pos_neg (CHREC_RIGHT (op0))) != 3)
>   {
> tree left = chrec_fold_multiply (type, CHREC_LEFT (op0), op1);
> tree right = chrec_fold_multiply (type, CHREC_RIGHT (op0), op1);

So, wouldn't it be better to outline what you have above + POLY_INT_CST
handling into a helper function, which similarly to get_range_pos_neg
returns a bitmask, but rather than 1 bit for may be [0, max] and another bit for
may be [min, -1] you return 3 bits, 1 bit for may be [1, max], another for
may be [0, 0] and another for may be [min, -1]?
Also, I bet you actually want to handle TYPE_UNSIGNED just as [0, 0]
and [1, max] ranges unlike get_range_pos_neg.

So perhaps
  int ret = 7;
  if (TYPE_UNSIGNED (TREE_TYPE (arg)))
    ret = 3;
  if (poly_int_tree_p (arg))
    {
      poly_wide_int w = wi::to_poly_wide (arg);
      if (known_lt (w, 0))
        return 4;
      else if (known_eq (w, 0))
        return 2;
      else if (known_gt (w, 0))
        return 1;
      else
        return 7;
    }
  value_range r;
  if (!get_range_query (cfun)->range_of_expr (r, arg)
      || r.undefined_p ())
    return ret;
  if (r.nonpositive_p ())
    ret &= ~1;
  if (r.nonzero_p ())
    ret &= ~2;
  if (r.nonnegative_p ())
    ret &= ~4;
  return ret;

?  And then you can use it similarly,
  ((whatever_fn (CHREC_LEFT (op0))
| whatever_fn (CHREC_RIGHT (op0))) & ~2) != 5

Sure, if it is written just for this case and not other uses,
it could be just 2 bits, can contain [1, max] and can contain [min, -1]
because you don't care about zero, return 0 for the known_eq (w, 0)
there...
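
The three-bit encoding and the combined check can be tried out in
isolation with plain integer intervals in place of GCC's range
machinery.  This is only a sketch: sign_mask and cannot_differ_in_sign
are hypothetical names, and a closed interval [lo, hi] of int64_t stands
in for whatever the range query would return.

```cpp
#include <cassert>
#include <cstdint>

enum : int {
  MAY_BE_POSITIVE = 1,  // value may lie in [1, max]
  MAY_BE_ZERO = 2,      // value may be 0
  MAY_BE_NEGATIVE = 4   // value may lie in [min, -1]
};

// Bitmask of the sign classes a value in the closed interval [lo, hi]
// can fall into.  Assumes lo <= hi.
static int sign_mask(int64_t lo, int64_t hi) {
  assert(lo <= hi);
  int ret = 0;
  if (hi > 0)
    ret |= MAY_BE_POSITIVE;
  if (lo <= 0 && hi >= 0)
    ret |= MAY_BE_ZERO;
  if (lo < 0)
    ret |= MAY_BE_NEGATIVE;
  return ret;
}

// True unless the two operands together may be both positive and
// negative.  Zero is ignored, matching the "& ~2" in the check above.
static bool cannot_differ_in_sign(int left_mask, int right_mask) {
  return ((left_mask | right_mask) & ~MAY_BE_ZERO)
         != (MAY_BE_POSITIVE | MAY_BE_NEGATIVE);
}
```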

Though see below, perhaps it should just handle INTEGER_CSTs and
is_constant () POLY_INT_CSTs, not really sure what happens if there
are overflows in the POLY_INT_CST evaluation.

> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -14408,13 +14408,15 @@ get_range_pos_neg (tree arg)
>  
>int prec = TYPE_PRECISION (TREE_TYPE (arg));
>int cnt = 0;
> -  if (TREE_CODE (arg) == INTEGER_CST)
> +  if (poly_int_tree_p (arg))
>  {
> -  wide_int w = wi::sext (wi::to_wide (arg), prec);
> -  if (wi::neg_p (w))
> +  poly_widest_int w = wi::sext (wi::to_poly_widest (arg), prec);
> +  if (known_lt (w, 0))
>   return 2;
> -  else
> +  else if (known_ge (w, 0))
>   return 1;
> +  else
> + return 3;
>  }
>while (CONVERT_EXPR_P (arg)
>&& INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (arg, 0)))

I doubt POLY_INT_CST will appear on what the function is being called on
(types

Re: [PATCH] middle-end/114070 - VEC_COND_EXPR folding

2024-02-29 Thread Richard Biener

On Thu, 29 Feb 2024, Richard Biener wrote:

> The following amends the PR114070 fix to optimistically allow
> the folding when we cannot expand the current vec_cond using
> vcond_mask and we're still before vector lowering.  This leaves
> a small window between vectorization and lowering where we could
> break vec_conds that can be expanded via vcond{,u,eq}, most
> susceptible is the loop unrolling pass which applies VN and thus
> possibly folding to the unrolled body of a vectorized loop.
> 
> This gets back the folding for targets that cannot do vectorization.
> It doesn't get back the folding for x86 with AVX512 for example
> since that can handle the original IL but not the folded since
> it misses some vcond_mask expanders.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> 
> As said for stage1 I want to move vector lowering before vectorization.
> While I'm not entirely happy with this patch it forces us into the
> correct direction, getting vcond_mask and vcmp{,u,eq} patterns
> implemented.  We could use canonicalize_math_p () to close the
> vectorizer -> vector lowering gap but this only works when that
> pass is run (not with -Og or when disabled).  We could add a new
> PROP_vectorizer_il and disable the folding if the vectorizer ran.
> 
> Or we could simply live with the regression.
> 
> Any preferences?

I've tried moving vector lowering; as a first try I moved it to after
the first forwprop after IPA.  That exposes (at least) invariant motion
creating unsupported COND_EXPRs - we hoist a vector PHI as
_2 ? _3 : _6 and that might lead to unsupported BLKmode moves.

I think there are some latent issues to be fixed in passes.

A more conservative move is to duplicate vector lowering into
the loop/non-loop sections and put it right before vectorization
(but there's invariant motion after it, so the above issue will
prevail).  Since the vectorizer currently cannot handle existing
vector code, "re-vectorization" is best done on lowered code (after
that got some cleanup).  Also SLP can interface with existing
vectors, but currently only non-BLKmode ones, so that would benefit
as well.

That said, the quick experiment shows this isn't anything for stage4.

Richard.

> Thanks,
> Richard.
> 
>   PR middle-end/114070
>   * match.pd ((c ? a : b) op d  -->  c ? (a op d) : (b op d)):
>   Allow the folding if before lowering and the current IL
>   isn't supported with vcond_mask.
> ---
>  gcc/match.pd | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index f3fffd8dec2..4edba7c84fb 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5153,7 +5153,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>(if (TREE_CODE_CLASS (op) != tcc_comparison
> || types_match (type, TREE_TYPE (@1))
> -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +   || (optimize_vectors_before_lowering_p ()
> +/* The following is optimistic on the side of non-support, we are
> +   missing the legacy vcond{,u,eq} cases.  Do this only when
> +   lowering will be able to fixup..  */
> +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +TREE_TYPE (@0), ERROR_MARK)))
> (vec_cond @0 (op! @1 @3) (op! @2 @4
>  
>  /* (c ? a : b) op d  -->  c ? (a op d) : (b op d) */
> @@ -5161,13 +5167,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(op (vec_cond:s @0 @1 @2) @3)
>(if (TREE_CODE_CLASS (op) != tcc_comparison
> || types_match (type, TREE_TYPE (@1))
> -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +   || (optimize_vectors_before_lowering_p ()
> +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +TREE_TYPE (@0), ERROR_MARK)))
> (vec_cond @0 (op! @1 @3) (op! @2 @3
>   (simplify
>(op @3 (vec_cond:s @0 @1 @2))
>(if (TREE_CODE_CLASS (op) != tcc_comparison
> || types_match (type, TREE_TYPE (@1))
> -   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK)
> +   || (optimize_vectors_before_lowering_p ()
> +&& !expand_vec_cond_expr_p (TREE_TYPE (@1),
> +TREE_TYPE (@0), ERROR_MARK)))
> (vec_cond @0 (op! @3 @1) (op! @3 @2)
>  
>  #if GIMPLE
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
