Re: [PATCH] Assert we don't create recursive DW_AT_abstract_origin

2023-11-16 Thread Richard Biener
On Thu, 16 Nov 2023, Jason Merrill wrote:

> On 10/30/23 08:57, Richard Biener wrote:
> > We have a support case that shows GCC 7 sometimes creates a
> > DW_TAG_label referring to itself via a DW_AT_abstract_origin
> > when using LTO.  This for example triggers the sanity check
> > added below during LTO bootstrap.
> > 
> > Making this check cover more than just DW_AT_abstract_origin
> > breaks bootstrap on trunk for
> > 
> >   /* GNU extension: Record what type our vtable lives in.  */
> >   if (TYPE_VFIELD (type))
> >     {
> >       tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));
> > 
> >       gen_type_die (vtype, context_die);
> >       add_AT_die_ref (type_die, DW_AT_containing_type,
> >                       lookup_type_die (vtype));
> > 
> > so the check is for now restricted to DW_AT_abstract_origin.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, OK?
> 
> Let's also check for DW_AT_specification, since that's the other one get_AT
> follows.  OK with that change.

The following is what I applied.

Richard.

From 2c070d92beea9e46947693c623b44551dc18e513 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 30 Oct 2023 13:17:11 +0100
Subject: [PATCH] Assert we don't create recursive
 DW_AT_{abstract_origin,specification}
To: gcc-patches@gcc.gnu.org

We have a support case that shows GCC 7 sometimes creates a
DW_TAG_label referring to itself via a DW_AT_abstract_origin
when using LTO.  This for example triggers the sanity check
added below during LTO bootstrap.

Making this check cover more than just DW_AT_abstract_origin
breaks bootstrap on trunk for

  /* GNU extension: Record what type our vtable lives in.  */
  if (TYPE_VFIELD (type))
    {
      tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));

      gen_type_die (vtype, context_die);
      add_AT_die_ref (type_die, DW_AT_containing_type,
                      lookup_type_die (vtype));

so the check is for now restricted to DW_AT_abstract_origin
and DW_AT_specification, both of which we follow within get_AT.

* dwarf2out.cc (add_AT_die_ref): Assert we do not add
a self-ref DW_AT_abstract_origin or DW_AT_specification.
---
 gcc/dwarf2out.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 9850d094707..d187be9b786 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -4908,6 +4908,9 @@ add_AT_die_ref (dw_die_ref die, enum dwarf_attribute attr_kind, dw_die_ref targ_
 {
   dw_attr_node attr;
   gcc_checking_assert (targ_die != NULL);
+  gcc_assert (targ_die != die
+              || (attr_kind != DW_AT_abstract_origin
+                  && attr_kind != DW_AT_specification));
 
   /* With LTO we can end up trying to reference something we didn't create
  a DIE for.  Avoid crashing later on a NULL referenced DIE.  */
-- 
2.35.3
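
For readers unfamiliar with why a self-referencing DW_AT_abstract_origin is
harmful, the following is a minimal sketch and not GCC code.  Die, Attr,
add_ref and get_value are hypothetical stand-ins for dw_die_ref,
enum dwarf_attribute, add_AT_die_ref and get_AT.  It assumes a lookup that
falls back to the referenced DIE when an attribute is missing, which would
never terminate if a DIE could name itself as its own origin.

```cpp
#include <cassert>
#include <map>

// Hypothetical, simplified attribute kinds (not the real DWARF enum).
enum Attr { ABSTRACT_ORIGIN, SPECIFICATION, CONTAINING_TYPE, NAME };

// Hypothetical, simplified DIE: reference attributes and plain values.
struct Die {
  std::map<Attr, Die*> refs;
  std::map<Attr, int> values;
};

// Same condition as the assertion in the patch: a self-reference is only
// rejected for the two attributes that the lookup below follows.
bool ref_is_allowed(const Die* die, Attr kind, const Die* target) {
  return target != die
         || (kind != ABSTRACT_ORIGIN && kind != SPECIFICATION);
}

// Adds the reference and returns true, or returns false where the real
// code would assert.
bool add_ref(Die* die, Attr kind, Die* target) {
  if (!ref_is_allowed(die, kind, target)) return false;
  die->refs[kind] = target;
  return true;
}

// Looks up a value, following SPECIFICATION / ABSTRACT_ORIGIN links when
// the attribute is not present on the DIE itself.  This loop terminates
// only because add_ref refuses self-references for those two kinds.
const int* get_value(const Die* die, Attr kind) {
  while (die) {
    auto v = die->values.find(kind);
    if (v != die->values.end()) return &v->second;
    auto link = die->refs.find(SPECIFICATION);
    if (link == die->refs.end()) link = die->refs.find(ABSTRACT_ORIGIN);
    die = (link == die->refs.end()) ? nullptr : link->second;
  }
  return nullptr;
}
```

In this sketch a self-referencing CONTAINING_TYPE is still accepted, which
mirrors the reason given above for not widening the check to every
attribute.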



Re: [PATCH] sra: SRA of non-escaped aggregates passed by reference to calls

2023-11-16 Thread Richard Biener
On Thu, 16 Nov 2023, Martin Jambor wrote:

> Hello,
> 
> PR109849 shows that a loop that heavily pushes and pops from a stack
> implemented by a C++ std::vector results in slow code, mainly because the
> vector structure is not split by SRA and so we end up with many loads
> and stores into it.  This is because it is passed by reference
> to (re)allocation methods and so needs to live in memory, even though
> it does not escape from them and so we could SRA it if we
> re-constructed it before the call and then separated it to distinct
> replacements afterwards.
> 
> This patch does exactly that, first relaxing the selection of
> candidates to also include those which are addressable but do not
> escape and then adding code to deal with the calls.  The
> micro-benchmark that is also the (scan-dump) testcase in this patch
> runs twice as fast with it than with current trunk.  Honza measured
> its effect on the libjxl benchmark and it almost closes the
> performance gap between Clang and GCC while not requiring excessive
> inlining and thus code growth.
> 
> The patch disallows creation of replacements for such aggregates which
> are also accessed with a precision smaller than their size because I
> have observed that this led to excessive zero-extending of data
> leading to slow-downs of perlbench (on some CPUs).  Apart from this
> case I have not noticed any regressions, at least not so far.
> 
> Gimple call argument flags can tell if an argument is unused (and then
> we do not need to generate any statements for it) or if it is not
> written to, in which case we do not need to generate statements loading
> replacements from the original aggregate after the call statement.
> Unfortunately, we cannot symmetrically use flags saying that an aggregate
> is not read to avoid re-constructing the aggregate before the call,
> because the flags don't tell which parts of the aggregate were not
> written to, so we load all replacements, and so all of them need to have
> the correct value before the call.
> 
> The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
> x86_64-linux and a very similar patch has also passed bootstrap and
> testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
> two architectures as I'm sending this).  OK for master?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2023-11-16  Martin Jambor  
> 
>   PR middle-end/109849
>   * tree-sra.cc (passed_by_ref_in_call): New.
>   (sra_initialize): Allocate passed_by_ref_in_call.
>   (sra_deinitialize): Free passed_by_ref_in_call.
>   (create_access): Add decl pool candidates only if they are not
>   already candidates.
>   (build_access_from_expr_1): Bail out on ADDR_EXPRs.
>   (build_access_from_call_arg): New function.
>   (asm_visit_addr): Rename to scan_visit_addr, change the
>   disqualification dump message.
>   (scan_function): Check taken addresses for all non-call statements,
>   including phi nodes.  Process all call arguments, including the static
>   chain, with build_access_from_call_arg.
>   (maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
>   non-escaped local variables.
>   (sort_and_splice_var_accesses): Disallow smaller-than-precision
>   replacements for aggregates passed by reference to functions.
>   (sra_modify_expr): Use a separate stmt iterator for adding statements
>   before the processed statement and after it.
>   (sra_modify_call_arg): New function.
>   (sra_modify_assign): Adjust calls to sra_modify_expr.
>   (sra_modify_function_body): Likewise, use sra_modify_call_arg to
>   process call arguments, including the static chain.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-11-03  Martin Jambor  
> 
>   PR middle-end/109849
>   * g++.dg/tree-ssa/pr109849.C: New test.
>   * gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
> ---
>  gcc/testsuite/g++.dg/tree-ssa/pr109849.C |  31 +++
>  gcc/testsuite/gfortran.dg/pr43984.f90|   2 +-
>  gcc/tree-sra.cc  | 244 ++-
>  3 files changed, 231 insertions(+), 46 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> 
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> new file mode 100644
> index 000..cd348c0f590
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sra" } */
> +
> +#include <vector>
> +typedef unsigned int uint32_t;
> +std::pair<uint32_t, uint32_t> pair;
> +void
> +test()
> +{
> +std::vector<std::pair<uint32_t, uint32_t> > stack;
> +stack.push_back (pair);
> +while (!stack.empty()) {
> +std::pair<uint32_t, uint32_t> cur = stack.back();
> +stack.pop_back();
> +if (!cur.first)
> +{
> +cur.second++;
> +stack.push_back (cur);
> +}
> 
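
The transformation described in the cover letter can be pictured with a
hand-written sketch.  This is not the output of the pass; Stack, grow and
push_pop_loop are hypothetical names chosen for illustration.  The idea it
shows: the fields of an aggregate that is passed by reference, but does not
escape, are kept in scalar replacements, the aggregate is re-constructed just
before the call, and the replacements are re-loaded right after it.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical aggregate standing in for the vector's internal state.
struct Stack {
  int* data;
  unsigned size;
  unsigned cap;
};

// Takes the aggregate by reference but does not retain its address.
void grow(Stack& s) {
  s.cap = s.cap ? s.cap * 2 : 4;
  s.data = static_cast<int*>(std::realloc(s.data, s.cap * sizeof(int)));
  if (!s.data) std::abort();
}

// Pushes n values and pops after every odd one; returns the pop count.
unsigned push_pop_loop(unsigned n) {
  Stack s{nullptr, 0, 0};
  // Scalar replacements for the fields of s.
  int* data = s.data;
  unsigned size = s.size;
  unsigned cap = s.cap;
  unsigned popped = 0;
  for (unsigned i = 0; i < n; ++i) {
    if (size == cap) {
      // Re-construct the aggregate before the call ...
      s.data = data;
      s.size = size;
      s.cap = cap;
      grow(s);
      // ... and split it into the replacements again afterwards.
      data = s.data;
      size = s.size;
      cap = s.cap;
    }
    data[size++] = static_cast<int>(i);  // push
    if (i % 2) {                         // pop on odd iterations
      --size;
      ++popped;
    }
  }
  std::free(data);
  return popped;
}
```

In the hot part of the loop only the local scalars are touched; the memory
copy of the aggregate is updated only around the call.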

[PATCH V3] tree-optimization: Add register pressure heuristics and appropriate use of profile data.

2023-11-16 Thread Ajit Agarwal
Hello Richard:

With this patch, decision making in code sinking considers the following
criteria.  A region is treated as a high register pressure region if the
following criteria are satisfied.

a) liveout (early_bb) <= livein (early_bb).
b) liveout (best_bb) <= liveout (early_bb).
c) !(best_bb->count >= early_bb->count).

If the above is true we decide to do code motion (code sinking).

The decision making does not include the sinking threshold and the
multiplication factor of 100.

The decisions above are based on liveness analysis and profile
data.

Consider a stmt a = b + c; where b and c die at the definition of a.

live_out (best_bb) is likely to be greater if there are more GEN
(variables) over all successors of best_bb.  If live_out (best_bb)
is smaller, it means there are more KILL (variables) in the
successors of best_bb.

With the heuristics above, if live_out (best_bb) > live_out (early_bb)
we don't do code motion, as there are chances of more interfering
live ranges.  If liveout (best_bb) <= liveout (early_bb) we do code
motion, as there are more KILL (over all successors of best_bb) and
there is less chance of interfering live ranges.

Moving the above stmt down from early_bb to best_bb increases
live_out (early_bb) by one, while live_out (best_bb) may remain the
same.  If live_out (early_bb) increases by 1 and thereby becomes
greater than live_out (best_bb), we don't do code motion if we have
more GEN (variables) in best_bb; otherwise it is safer to do
code motion.

For the above statement a = b + c, b and c die and a is generated in
early_bb, so liveout (early_bb) increases by 1.  If liveout (best_bb)
is 10 before the move and liveout (early_bb) then becomes > 10, we
don't do code motion; otherwise we do code motion.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit
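
As a reading aid, criteria a) to c) can be written as one predicate.  This is
only a sketch under assumptions: BlockInfo and should_sink are hypothetical
names, the liveness sets are reduced to plain counts, and c) is read as
negating the whole comparison.

```cpp
#include <cassert>

// Hypothetical per-block summary: sizes of the live-in/live-out sets and
// the profile count.  Plain integers stand in for GCC's own types.
struct BlockInfo {
  unsigned live_in;
  unsigned live_out;
  unsigned long long count;
};

// Returns true when all three criteria from the description hold.
bool should_sink(const BlockInfo& early_bb, const BlockInfo& best_bb) {
  const bool a = early_bb.live_out <= early_bb.live_in;
  const bool b = best_bb.live_out <= early_bb.live_out;
  // Written as !(x >= y) to match the description; with plain integers
  // this is the same as x < y.
  const bool c = !(best_bb.count >= early_bb.count);
  return a && b && c;
}
```

With live_out (early_bb) = 5 and live_out (best_bb) = 10, criterion b) fails
and the statement stays in early_bb, which corresponds to the "more
interfering live ranges" case described above.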


tree-optimization: Add register pressure heuristics and appropriate use of 
profile data.

Decision making in code sinking considers the following criteria.
A region is treated as a high register pressure region if the
following criteria are satisfied.

a) liveout (early_bb) <= livein (early_bb).
b) liveout (best_bb) <= liveout (early_bb).
c) !(best_bb->count >= early_bb->count).

If the above is true we decide to do code motion (code sinking).

The decision making does not include the sinking threshold and the
multiplication factor of 100.

The decisions above are based on liveness analysis and profile
data.

Consider a stmt a = b + c; where b and c die at the definition of a.

live_out (best_bb) is likely to be greater if there are more GEN
(variables) over all successors of best_bb.  If live_out (best_bb)
is smaller, it means there are more KILL (variables) in the
successors of best_bb.

With the heuristics above, if live_out (best_bb) > live_out (early_bb)
we don't do code motion, as there are chances of more interfering
live ranges.  If liveout (best_bb) <= liveout (early_bb) we do code
motion, as there are more KILL (over all successors of best_bb) and
there is less chance of interfering live ranges.

Moving the above stmt down from early_bb to best_bb increases
live_out (early_bb) by one, while live_out (best_bb) may remain the
same.  If live_out (early_bb) increases by 1 and thereby becomes
greater than live_out (best_bb), we don't do code motion if we have
more GEN (variables) in best_bb; otherwise it is safer to do
code motion.

For the above statement a = b + c, b and c die and a is generated in
early_bb, so liveout (early_bb) increases by 1.  If liveout (best_bb)
is 10 before the move and liveout (early_bb) then becomes > 10, we
don't do code motion; otherwise we do code motion.

2023-11-17  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
as parameter.
(sink_code_in_bb): Ditto.
(select_best_block): Add register pressure heuristics to select
the best blocks in the immediate dominator for same loop nest depth.
(execute): Add live range analysis.
(additional_var_map): New function.
* tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
tests on ssa_names.
(verify_live_on_entry): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  15 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c |  19 +++
 gcc/tree-ssa-live.cc|  11 +-
 gcc/tree-ssa-sink.cc| 129 +++-
 4 files changed, 142 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void 

[PATCH] RISC-V: Fix bug of tuple move splitter[PR112561]

2023-11-16 Thread Juzhe-Zhong
Fix segmentation fault on tuple move:

bbl loader
z   ra 000102ac sp 003ffaf0 gp 0001c0b8
tp  t0 000104a0 t1 000f t2 
s0  s1  a0 003ffb30 a1 003ffb58
a2  a3  a4  a5 0001c340
a6 0004 a7 0004 s2  s3 
s4  s5  s6  s7 
s8  s9  sA  sB 
t3  t4  t5  t6 
pc 000101aa va/inst 0004 sr 80026620
User store segfault @ 0x0004

PR target/112561

gcc/ChangeLog:

* config/riscv/riscv-v.cc (emit_vlmax_insn_lra): Add VLS optimization.
(expand_tuple_move): Fix bug.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112561.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 21 ---
 .../gcc.target/riscv/rvv/autovec/pr112561.c   | 16 ++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6a2009ffb05..08bbb657a06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -374,10 +374,24 @@ void
 emit_vlmax_insn_lra (unsigned icode, unsigned insn_flags, rtx *ops, rtx vl)
 {
   gcc_assert (!can_create_pseudo_p ());
+  machine_mode mode = GET_MODE (ops[0]);
 
-  insn_expander e (insn_flags, true);
-  e.set_vl (vl);
-  e.emit_insn ((enum insn_code) icode, ops);
+  if (imm_avl_p (mode))
+    {
+      /* Even though VL is a real hardreg already allocated since
+         it is post-RA now, we still benefit from emitting
+         vsetivli zero, imm instead of vsetvli VL, zero, which gives
+         us more flexibility in post-RA instruction scheduling.  */
+      insn_expander e (insn_flags, false);
+      e.set_vl (gen_int_mode (GET_MODE_NUNITS (mode), Pmode));
+      e.emit_insn ((enum insn_code) icode, ops);
+    }
+  else
+    {
+      insn_expander e (insn_flags, true);
+      e.set_vl (vl);
+      e.emit_insn ((enum insn_code) icode, ops);
+    }
 }
 
 /* Emit an RVV insn with a predefined vector length.  Contrary to
@@ -2148,6 +2162,7 @@ expand_tuple_move (rtx *ops)
  offset = ops[2];
}
 
+  emit_vlmax_vsetvl (subpart_mode, ops[4]);
   if (MEM_P (ops[1]))
{
  /* Load operations.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
new file mode 100644
index 000..25e61fa12c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112561.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 -ftree-vectorize --param=riscv-autovec-preference=fixed-vlmax -mcmodel=medlow" } */
+
+int printf(char *, ...);
+int a, b, c, e;
+short d[7][7] = {};
+int main() {
+  short f;
+  c = 0;
+  for (; c <= 6; c++) {
+    e |= d[c][c] & 1;
+    b &= f & 3;
+  }
+  printf("%d\n", a);
+  return 0;
+}
-- 
2.36.3



[PATCH v2] RISC-V: T-HEAD: Add support for the XTheadInt ISA extension

2023-11-16 Thread Jin Ma
The XTheadInt ISA extension provides T-Head-specific instructions that
accelerate interrupt handling:
* th.ipush
* th.ipop

Ref:
https://github.com/T-head-Semi/thead-extension-spec/releases/download/2.3.0/xthead-2023-11-10-2.3.0.pdf

gcc/ChangeLog:

* config/riscv/riscv-protos.h (th_int_get_mask): New prototype.
(th_int_get_save_adjustment): Likewise.
(th_int_adjust_cfi_prologue): Likewise.
* config/riscv/riscv.cc (TH_INT_INTERRUPT): New macro.
(riscv_expand_prologue): Add the processing of XTheadInt.
(riscv_expand_epilogue): Likewise.
* config/riscv/riscv.md: New unspec.
* config/riscv/thead.cc (BITSET_P): New macro.
* config/riscv/thead.md (th_int_push): New pattern.
(th_int_pop): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadint-push-pop.c: New test.
---
 gcc/config/riscv/riscv-protos.h   |  3 +
 gcc/config/riscv/riscv.cc | 61 ++-
 gcc/config/riscv/riscv.h  |  3 +
 gcc/config/riscv/riscv.md |  4 +
 gcc/config/riscv/thead.cc | 77 +++
 gcc/config/riscv/thead.md | 67 
 .../gcc.target/riscv/xtheadint-push-pop.c | 36 +
 7 files changed, 247 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadint-push-pop.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 196b53f10f3..91d1e99f672 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -633,6 +633,9 @@ extern void th_mempair_prepare_save_restore_operands (rtx[4], bool,
  int, HOST_WIDE_INT,
  int, HOST_WIDE_INT);
 extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode);
+extern unsigned int th_int_get_mask (unsigned int);
+extern unsigned int th_int_get_save_adjustment (void);
+extern rtx th_int_adjust_cfi_prologue (unsigned int);
 #ifdef RTX_CODE
 extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c2bd1c2ed29..6ff6f4789a4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -94,15 +94,22 @@ along with GCC; see the file COPYING3.  If not see
 #define UNSPEC_ADDRESS_TYPE(X) \
   ((enum riscv_symbol_type) (XINT (X, 1) - UNSPEC_ADDRESS_FIRST))
 
-/* True if bit BIT is set in VALUE.  */
-#define BITSET_P(VALUE, BIT) (((VALUE) & (1ULL << (BIT))) != 0)
-
 /* Extract the backup dynamic frm rtl.  */
 #define DYNAMIC_FRM_RTL(c) ((c)->machine->mode_sw_info.dynamic_frm)
 
 /* True the mode switching has static frm, or false.  */
 #define STATIC_FRM_P(c) ((c)->machine->mode_sw_info.static_frm_p)
 
+/* True if we can use the instructions in the XTheadInt extension
+   to handle interrupts, or false.  */
+#define TH_INT_INTERRUPT(c)\
+  (TARGET_XTHEADINT\
+   /* The XTheadInt extension only supports rv32.  */  \
+   && !TARGET_64BIT\
+   && (c)->machine->interrupt_handler_p \
+   /* The XTheadInt instructions can only be executed in M-mode.  */   \
+   && (c)->machine->interrupt_mode == MACHINE_MODE)
+
 /* Information about a function's frame layout.  */
 struct GTY(())  riscv_frame_info {
   /* The size of the frame in bytes.  */
@@ -6737,6 +6744,7 @@ riscv_expand_prologue (void)
   unsigned fmask = frame->fmask;
   int spimm, multi_push_additional, stack_adj;
   rtx insn, dwarf = NULL_RTX;
+  unsigned th_int_mask = 0;
 
   if (flag_stack_usage_info)
 current_function_static_stack_size = constant_lower_bound (remaining_size);
@@ -6805,6 +6813,28 @@ riscv_expand_prologue (void)
   REG_NOTES (insn) = dwarf;
 }
 
+  th_int_mask = th_int_get_mask (frame->mask);
+  if (th_int_mask && TH_INT_INTERRUPT (cfun))
+{
+  frame->mask &= ~th_int_mask;
+
+  /* RISCV_PROLOGUE_TEMP may be used to handle some CSR for
+interrupts, such as fcsr.  */
+  if ((TARGET_HARD_FLOAT  && frame->fmask)
+ || (TARGET_ZFINX && frame->mask))
+   frame->mask |= (1 << RISCV_PROLOGUE_TEMP_REGNUM);
+
+  unsigned save_adjustment = th_int_get_save_adjustment ();
+  frame->gp_sp_offset -= save_adjustment;
+  remaining_size -= save_adjustment;
+
+  insn = emit_insn (gen_th_int_push ());
+
+  rtx dwarf = th_int_adjust_cfi_prologue (th_int_mask);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  REG_NOTES (insn) = dwarf;
+}
+
   /* Save the GP, FP registers.  */
   if ((frame->mask | frame->fmask) != 0)
 {
@@ -7033,6 +7063,7 @@ riscv_expand_epilogue (int style)
 = use_multi_pop ? frame->multi_push_adj_base + 

Re: [PATCH V11] : tree-ssa-sink: Improve code sinking pass

2023-11-16 Thread Ajit Agarwal
Hello Richard:

On 16/11/23 3:28 pm, Richard Biener wrote:
> On Mon, Oct 30, 2023 at 1:10 PM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>> Currently, code sinking will sink code to the use points in loops having
>> the same nesting depth. The following patch improves code sinking by
>> placing the sunk code in the immediate dominator with the same loop nest
>> depth.
>>
>> Review comments are incorporated.
>>
>> For example :
>>
>> void bar();
>> int j;
>> void foo(int a, int b, int c, int d, int e, int f)
>> {
>>   int l;
>>   l = a + b + c + d +e + f;
>>   if (a != 5)
>>     {
>>       bar();
>>       j = l;
>>     }
>> }
>>
>> Code Sinking does the following:
>>
>> void bar();
>> int j;
>> void foo(int a, int b, int c, int d, int e, int f)
>> {
>>   int l;
>>
>>   if (a != 5)
>>     {
>>       l = a + b + c + d +e + f;
>>       bar();
>>       j = l;
>>     }
>> }
>>
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>>
>> tree-ssa-sink: Improve code sinking pass
>>
>> Currently, code sinking will sink code to the use points in loops having
>> the same nesting depth. The following patch improves code sinking by
>> placing the sunk code in the immediate dominator with the same loop nest
>> depth.
>>
>> 2023-10-30  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> PR tree-optimization/81953
>> * tree-ssa-sink.cc (statement_sink_location): Move statements with
>> same loop nest depth.
>> (select_best_block): Add heuristics to select the best blocks in the
>> immediate dominator for same loop nest depth.
>>
>> gcc/testsuite/ChangeLog:
>>
>> PR tree-optimization/81953
>> * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>> * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++
>>  gcc/tree-ssa-sink.cc| 21 ++---
>>  3 files changed, 48 insertions(+), 7 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>> new file mode 100644
>> index 000..d3b79ca5803
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>> +void bar();
>> +int j;
>> +void foo(int a, int b, int c, int d, int e, int f)
>> +{
>> +  int l;
>> +  l = a + b + c + d +e + f;
>> +  if (a != 5)
>> +{
>> +  bar();
>> +  j = l;
>> +}
>> +}
>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>> new file mode 100644
>> index 000..84e7938c54f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>> +void bar();
>> +int j, x;
>> +void foo(int a, int b, int c, int d, int e, int f)
>> +{
>> +  int l;
>> +  l = a + b + c + d +e + f;
>> +  if (a != 5)
>> +{
>> +  bar();
>> +  if (b != 3)
>> +x = 3;
>> +  else
>> +x = 5;
>> +  j = l;
>> +}
>> +}
>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>> index a360c5cdd6e..0b823b81309 100644
>> --- a/gcc/tree-ssa-sink.cc
>> +++ b/gcc/tree-ssa-sink.cc
>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>> tree, return the best basic block between them (inclusive) to place
>> statements.
>>
>> +   The best basic block should be an immediate dominator of
>> +   best basic block if we've moved to same loop nest.
>> +
>> We want the most control dependent block in the shallowest loop nest.
>>
>> If the resulting block is in a shallower loop nest, then use it.  Else
>> @@ -201,14 +204,13 @@ select_best_block (basic_block early_bb,
>>  {
>>/* If we've moved into a lower loop nest, then that becomes
>>  our best block.  */
>> -  if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>> +  if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
> 
> This will now only ever sink stmts out of loops but never do traditional
> sinking into conditionally executed blocks within the same loop.
> 
> That's clearly not what the code is intended to do.
> 

Incorporated in V12 of the patch. Ok for trunk?

Thanks & Regards
Ajit
> Richard.
> 
>> best_bb = temp_bb;
>>
>>/* Walk up the dominator tree, hopefully we'll find a shallower
>>  loop nest.  */
>>temp_bb = get_immediate_dominator (CDI_DOMINATORS, 
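
The effect of changing `<` to `<=` in the quoted hunk can be seen on a toy
model.  This is a sketch under assumptions, not GCC code: the candidate
blocks are taken to form a single dominator chain, select_best is a
hypothetical stand-in for the walk in select_best_block, and depth[i] stands
in for bb_loop_depth.

```cpp
#include <cassert>
#include <vector>

// Toy model: index 0 plays the role of early_bb, the last index plays
// late_bb, and the immediate dominator of block i is block i - 1.
// Returns the index of the block chosen by the walk.
int select_best(const std::vector<int>& depth, bool use_less_equal) {
  assert(!depth.empty());
  const int late = static_cast<int>(depth.size()) - 1;
  int best = late;
  // Walk up the dominator chain from late_bb until early_bb is reached.
  for (int temp = late; temp != 0; --temp) {
    const bool moves = use_less_equal ? depth[temp] <= depth[best]
                                      : depth[temp] < depth[best];
    if (moves) best = temp;
  }
  return best;
}
```

With four blocks at the same loop depth, the strict comparison keeps the
statement in the last block (index 3), while `<=` hoists it all the way up to
index 1, the block just below early_bb.  In this model the `<=` variant never
leaves a statement in a deeper conditional block of the same loop.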

[PATCH] [APX PPX] Support Intel APX PPX

2023-11-16 Thread Hongyu Wang
The Intel APX PPX feature has been released in [1].

PPX stands for Push-Pop Acceleration. A PUSH/PUSH2 and its corresponding POP
can be marked with a 1-bit hint to indicate that the POP reads the
value written by the PUSH from the stack. The processor tracks these marked
instructions internally and fast-forwards register data between
matching PUSH and POP instructions, without going through memory or
through the training loop of the Fast Store Forwarding Predictor (FSFP).
This feature can also be applied to PUSH2/POP2.

For GCC, we emit an explicit suffix 'p' (paired) to indicate that the
push/pop pair is marked with the PPX hint. To separate them from the original
push/pop, we use UNSPEC to restrict the PPX-related patterns. So for
pushp/popp, the cfi is manually adjusted for the UNSPEC PPX insns.

In the first implementation we only emit them in the prologue/epilogue
when saving/restoring callee-saved registers to make sure push/pop are
paired. So an extra flag was added to check if PPX insns can be emitted
for those register save/restore interfaces.

The PPX hint is purely a performance hint. If the 'p' suffix is not
emitted for paired push/pop, the PPX optimization will be disabled,
while program semantics will not be affected at all.

Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,}.

Ok for master?

[1].https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.ht
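
A small sketch of how the new feature bit gates the 'p' suffix.  The enum
values are the ones in the i386-opts.h hunk of this patch; ppx_enabled and
push_mnemonic are hypothetical helpers written for illustration and do not
appear in the patch, which instead threads a ppx_p flag through the
save/restore functions.

```cpp
#include <cassert>
#include <string>

// Feature bits as listed in the i386-opts.h hunk of this patch.
enum apx_features {
  apx_egpr = 1 << 0,
  apx_push2pop2 = 1 << 1,
  apx_ndd = 1 << 2,
  apx_ppx = 1 << 3,
  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
};

// Hypothetical helper: true when the PPX bit is set in the feature mask.
bool ppx_enabled(unsigned mask) { return (mask & apx_ppx) != 0; }

// Hypothetical helper: picks the mnemonic for a register save.  `pair`
// says whether two registers are pushed at once; that choice is assumed
// to be made elsewhere.
std::string push_mnemonic(unsigned mask, bool pair) {
  std::string name = pair ? "push2" : "push";
  if (ppx_enabled(mask)) name += "p";  // the hint changes only the suffix
  return name;
}
```

Dropping apx_ppx from the mask only removes the suffix, which matches the
statement above that the hint affects performance and not program semantics.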

gcc/ChangeLog:

* config/i386/i386-opts.h (enum apx_features): Add apx_ppx, add
it to apx_all.
* config/i386/i386.cc (ix86_emit_restore_reg_using_pop): Add
ppx_p parameter for function declaration.
(gen_push2): Add ppx_p parameter, emit push2p if ppx_p is true.
(ix86_emit_restore_reg_using_pop2): Likewise for pop2p.
(gen_pushp): New function to emit pushp and adjust cfi.
(ix86_emit_save_regs): Emit pushp/push2p under TARGET_APX_PPX.
(ix86_emit_restore_reg_using_pop): Add ppx_p, emit popp insn
and adjust cfi when ppx_p is true.
(ix86_emit_restore_reg_using_pop2): Add ppx_p and pass it to its
callee.
(ix86_emit_restore_regs_using_pop2): Likewise.
(ix86_expand_epilogue): Pass TARGET_APX_PPX to
ix86_emit_restore_reg_using_pop.
* config/i386/i386.h (TARGET_APX_PPX): New.
* config/i386/i386.md (UNSPEC_APXPUSHP): New unspec.
(UNSPEC_APXPOPP): Likewise.
(UNSPEC_APXPUSH2P): Likewise.
(UNSPEC_APXPOP2P_LOW): Likewise.
(UNSPEC_APXPOP2P_HIGH): Likewise.
(pushp_di): New define_insn.
(popp_di): Likewise.
(push2p_di): Likewise.
(pop2p_di): Likewise.
* config/i386/i386.opt: Add apx_ppx enum.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-interrupt-1.c: Adjust option to restrict them
under certain subfeatures.
* gcc.target/i386/apx-push2pop2-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_force_drap-1.c: Likewise.
* gcc.target/i386/apx-push2pop2_interrupt-1.c: Likewise.
* gcc.target/i386/apx-ppx-1.c: New test.
---
 gcc/config/i386/i386-opts.h   |   3 +-
 gcc/config/i386/i386.cc   | 113 ++
 gcc/config/i386/i386.h|   1 +
 gcc/config/i386/i386.md   |  47 +++-
 gcc/config/i386/i386.opt  |   3 +
 .../gcc.target/i386/apx-interrupt-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/apx-ppx-1.c |   9 ++
 .../gcc.target/i386/apx-push2pop2-1.c |   2 +-
 .../i386/apx-push2pop2_force_drap-1.c |   2 +-
 .../i386/apx-push2pop2_interrupt-1.c  |   2 +-
 10 files changed, 158 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-ppx-1.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index 2ec76a16bce..4d293edb399 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -139,7 +139,8 @@ enum apx_features {
   apx_egpr = 1 << 0,
   apx_push2pop2 = 1 << 1,
   apx_ndd = 1 << 2,
-  apx_all = apx_egpr | apx_push2pop2 | apx_ndd,
+  apx_ppx = 1 << 3,
+  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
 };
 
 #endif
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 683ac643bc8..df2fc236c0a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -105,7 +105,7 @@ along with GCC; see the file COPYING3.  If not see
 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
-static void ix86_emit_restore_reg_using_pop (rtx);
+static void ix86_emit_restore_reg_using_pop (rtx, bool = false);
 
 
 #ifndef CHECK_STACK_LIMIT
@@ -6512,7 +6512,7 @@ gen_popfl (void)
 
 /* Generate a "push2" pattern for input ARG.  */
 rtx
-gen_push2 (rtx mem, rtx reg1, rtx reg2)
+gen_push2 (rtx mem, rtx reg1, rtx reg2, bool ppx_p = false)
 {
   struct 

[PATCH V12] tree-ssa-sink: Improve code sinking pass

2023-11-16 Thread Ajit Agarwal
Hello Richard:

Currently, code sinking will sink code to the use points in loops having the
same nesting depth. The following patch improves code sinking by placing the
sunk code in the immediate dominator with the same loop nest depth.

Review comments are incorporated.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d +e + f;
      bar();
      j = l;
    }
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code to the use points in loops having the
same nesting depth. The following patch improves code sinking by placing the
sunk code in the immediate dominator with the same loop nest depth.

2023-11-17  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements with
same loop nest depth.
(select_best_block): Add heuristics to select the best blocks in the
immediate dominator for same loop nest depth.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
 gcc/tree-ssa-sink.cc| 29 +
 3 files changed, 58 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..4e6568df4e5 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
tree, return the best basic block between them (inclusive) to place
statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
We want the most control dependent block in the shallowest loop nest.
 
If the resulting block is in a shallower loop nest, then use it.  Else
@@ -209,6 +212,18 @@ select_best_block (basic_block early_bb,
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
+  temp_bb = best_bb;
+  /* If we've moved into a same loop nest, then that becomes
+ our best block.  */
+  while (best_bb == late_bb && temp_bb != early_bb
+&& bb_loop_depth (temp_bb) == bb_loop_depth (best_bb))
+{
+  best_bb = temp_bb;
+  /* Walk up the dominator tree, hopefully we'll find a best
+block to move in same loop nest.  */
+  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+}
+
   /* Placing a statement before a setjmp-like function would be invalid
  (it cannot be reevaluated when execution follows an abnormal edge).
  If we selected a block with abnormal predecessors, just punt.  */
@@ -250,7 +265,13 @@ select_best_block (basic_block early_bb,
   /* If result of comparsion is unknown, prefer EARLY_BB.
 Thus use !(...>=..) rather than (...<...)  */
   && !(best_bb->count * 100 >= early_bb->count * threshold))
-return best_bb;
+{
+  /* Avoid sinking to immediate dominator if the statement to be moved
+has memory operand and same loop nest.  */
+  if (best_bb != late_bb && gimple_vuse (stmt))
+   return late_bb;
+  return best_bb;
+}
 
   /* 

[PATCH] LoongArch: Fix eh_return epilogue for normal returns.

2023-11-16 Thread Yang Yujie
gcc/ChangeLog:

* config/loongarch/loongarch.cc: Do not restore the saved eh_return
data registers ($r4-$r7) for a normal return of a function that calls
__builtin_eh_return elsewhere.
* config/loongarch/loongarch-protos.h: Same.
* config/loongarch/loongarch.md: Same.
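
The restore decision described above can be sketched as a small predicate. This is a simplified model, not the backend code: the `toy_` names are hypothetical, and the register numbers assume that the eh_return data registers are $r4-$r7 as stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Epilogue styles, numbered as in the comment the patch adds to
   loongarch_expand_epilogue.  */
enum toy_epilogue_style
{
  TOY_NORMAL = 0,
  TOY_SIBCALL = 1,
  TOY_EH_RETURN = 2
};

/* Return true if a saved register should be restored in the epilogue.
   The eh_return data registers are skipped unless the epilogue is the
   eh_return one, so a normal return value is not clobbered.  */
bool
toy_should_restore (int regno, bool wrapped_separately,
                    bool calls_eh_return, enum toy_epilogue_style style)
{
  assert (regno >= 0 && regno < 32);
  bool skip_eh_data_regs = calls_eh_return && style != TOY_EH_RETURN;
  bool is_eh_data_reg = regno >= 4 && regno < 8;
  return !(wrapped_separately || (skip_eh_data_regs && is_eh_data_reg));
}
```

For a function that calls __builtin_eh_return, the sketch skips $r4 on a normal return but restores it on the eh_return path; other saved registers are unaffected.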
---
 gcc/config/loongarch/loongarch-protos.h |  2 +-
 gcc/config/loongarch/loongarch.cc   | 41 -
 gcc/config/loongarch/loongarch.md   | 18 +--
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/gcc/config/loongarch/loongarch-protos.h 
b/gcc/config/loongarch/loongarch-protos.h
index cb8fc36b086..af20b5d7132 100644
--- a/gcc/config/loongarch/loongarch-protos.h
+++ b/gcc/config/loongarch/loongarch-protos.h
@@ -60,7 +60,7 @@ enum loongarch_symbol_type {
 extern rtx loongarch_emit_move (rtx, rtx);
 extern HOST_WIDE_INT loongarch_initial_elimination_offset (int, int);
 extern void loongarch_expand_prologue (void);
-extern void loongarch_expand_epilogue (bool);
+extern void loongarch_expand_epilogue (int);
 extern bool loongarch_can_use_return_insn (void);
 
 extern bool loongarch_symbolic_constant_p (rtx, enum loongarch_symbol_type *);
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 738911661d7..7f60a4367d3 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -1011,20 +1011,30 @@ loongarch_save_restore_reg (machine_mode mode, int 
regno, HOST_WIDE_INT offset,
 
 static void
 loongarch_for_each_saved_reg (HOST_WIDE_INT sp_offset,
- loongarch_save_restore_fn fn)
+ loongarch_save_restore_fn fn,
+ bool skip_eh_data_regs_p)
 {
   HOST_WIDE_INT offset;
 
   /* Save the link register and s-registers.  */
   offset = cfun->machine->frame.gp_sp_offset - sp_offset;
   for (int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
-if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
-  {
-   if (!cfun->machine->reg_is_wrapped_separately[regno])
- loongarch_save_restore_reg (word_mode, regno, offset, fn);
+{
+  /* Special care needs to be taken for $r4-$r7 (EH_RETURN_DATA_REGNO)
+when returning normally from a function that calls __builtin_eh_return.
+In this case, these registers are saved but should not be restored,
+or the return value may be clobbered.  */
 
-   offset -= UNITS_PER_WORD;
-  }
+  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
+   {
+ if (!(cfun->machine->reg_is_wrapped_separately[regno]
+   || (skip_eh_data_regs_p
+   && GP_ARG_FIRST <= regno && regno < GP_ARG_FIRST + 4)))
+   loongarch_save_restore_reg (word_mode, regno, offset, fn);
+
+ offset -= UNITS_PER_WORD;
+   }
+}
 
   /* This loop must iterate over the same space as its companion in
  loongarch_compute_frame_info.  */
@@ -1293,7 +1303,7 @@ loongarch_expand_prologue (void)
GEN_INT (-step1));
   RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
   size -= step1;
-  loongarch_for_each_saved_reg (size, loongarch_save_reg);
+  loongarch_for_each_saved_reg (size, loongarch_save_reg, false);
 }
 
   /* Set up the frame pointer, if we're using one.  */
@@ -1378,11 +1388,11 @@ loongarch_can_use_return_insn (void)
   return reload_completed && cfun->machine->frame.total_size == 0;
 }
 
-/* Expand an "epilogue" or "sibcall_epilogue" pattern; SIBCALL_P
-   says which.  */
+/* Expand function epilogue for the following insn patterns:
+   "epilogue" (style == 0) / "sibcall_epilogue" (1) / "eh_return" (2).  */
 
 void
-loongarch_expand_epilogue (bool sibcall_p)
+loongarch_expand_epilogue (int style)
 {
   /* Split the frame into two.  STEP1 is the amount of stack we should
  deallocate before restoring the registers.  STEP2 is the amount we
@@ -1399,7 +1409,8 @@ loongarch_expand_epilogue (bool sibcall_p)
   bool need_barrier_p
 = (get_frame_size () + cfun->machine->frame.arg_pointer_offset) != 0;
 
-  if (!sibcall_p && loongarch_can_use_return_insn ())
+  /* Handle simple returns.  */
+  if (style == 0 && loongarch_can_use_return_insn ())
 {
   emit_jump_insn (gen_return ());
   return;
@@ -1475,7 +1486,8 @@ loongarch_expand_epilogue (bool sibcall_p)
 
   /* Restore the registers.  */
   loongarch_for_each_saved_reg (frame->total_size - step2,
-   loongarch_restore_reg);
+   loongarch_restore_reg,
+   crtl->calls_eh_return && style != 2);
 
   if (need_barrier_p)
 loongarch_emit_stack_tie ();
@@ -1500,7 +1512,8 @@ loongarch_expand_epilogue (bool sibcall_p)
 emit_insn (gen_add3_insn (stack_pointer_rtx, stack_pointer_rtx,
  EH_RETURN_STACKADJ_RTX));
 
-  if (!sibcall_p)
+  /* Emit return unless doing sibcall.  

Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread chenglulu



On 2023/11/17 at 12:55 PM, Xi Ruoyao wrote:

On Fri, 2023-11-17 at 10:41 +0800, chenglulu wrote:

Hi,

Thank you very much for the modification, but I think we need to support
la664 with the configuration items of configure.

I'll add it.


I also defined ISA_BASE_LA64V110 to represent the LoongArch1.1
instruction set, what do you think?

I'll add it too.  I had misread section 1.5 paragraph 1 of the spec so I
didn't consider this a good idea, but after reading it again I think it
should be added.

I have already added these two, but not on the basis of your patch. So...



On 2023/11/16 at 9:18 PM, Xi Ruoyao wrote:

Loongson 3A6000 processor will be shipped to general users this month
and it features 4 cores with the new LA664 micro architecture.  Here are
some changes from LA464:

1. The 32-bit division instruction now ignores the high 32 bits of the
     input registers.  This is enumerated via CPUCFG word 0x2, bit 26.
2. The micro architecture now guarantees two loads on the same memory
     address won't be reordered with each other.  dbar 0x700 is turned
     into nop.
3. The architecture now supports approximate square root instructions
     (FRECIPE and VRSQRTE) on 32-bit or 64-bit floating-point values and
     the vectors of these values.
4. The architecture now supports SC.Q instruction for 128-bit CAS.
5. The architecture now supports LL.ACQ and SC.REL instructions (well, I
     don't really know what they are for).
6. The architecture now supports CAS instructions for 64, 32, 16, or 8-bit
     values.
7. The architecture now supports atomic add and atomic swap instructions
     for 16 or 8-bit values.
8. Some non-zero hint values of DBAR instructions are added.

These features are documented in LoongArch v1.1.  Implementations can
implement any subset of them and enumerate the implemented features via
CPUCFG.  LA664 implements them all.

(8) is already implemented in previous patches because it's completely
backward-compatible.  This series implements (1) and (2) with switches
-mdiv32 and -mld-seq-sa (these names are derived from the names of the
corresponding CPUCFG bits documented in the LoongArch v1.1
specification).

The other features require Binutils support and we are close to the end
of GCC 14 stage 1, so I'm posting this series first now.

With -march=la664, these two options are implicitly enabled but they can
be turned off with -mno-div32 or -mno-ld-seq-sa.

With -march=native, the current CPU is probed via CPUCFG and these
options are implicitly enabled if the CPU supports the corresponding
feature.  They can be turned off with explicit -mno-div32 or
-mno-ld-seq-sa as well.
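
The defaulting rules in the last two paragraphs can be sketched as follows for -mdiv32. This is a hedged model only: the enum and function names are hypothetical, the bit position is the one quoted in item 1 (CPUCFG word 0x2, bit 26), and treating an explicit option as always winning is an assumption, not something read from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of how an option was given on the command line.  */
enum toy_tristate { TOY_UNSET, TOY_ON, TOY_OFF };

/* Hypothetical subset of -march= values.  */
enum toy_march { TOY_MARCH_LA464, TOY_MARCH_LA664, TOY_MARCH_NATIVE };

/* Bit 26 of CPUCFG word 2, as described in item 1 of the cover letter.  */
#define TOY_CPUCFG2_DIV32 (UINT32_C (1) << 26)

/* Decide whether the div32 feature is in effect.  An explicit -mdiv32
   or -mno-div32 is assumed to take priority; otherwise la664 enables it
   and native follows the probed CPUCFG word.  */
bool
toy_div32_enabled (enum toy_march march, uint32_t cpucfg2,
                   enum toy_tristate explicit_flag)
{
  if (explicit_flag != TOY_UNSET)
    return explicit_flag == TOY_ON;

  switch (march)
    {
    case TOY_MARCH_LA664:
      return true;
    case TOY_MARCH_NATIVE:
      return (cpucfg2 & TOY_CPUCFG2_DIV32) != 0;
    default:
      return false;
    }
}
```

The same shape would apply to -mld-seq-sa with its own CPUCFG bit, which is not spelled out in this thread and is therefore left out.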

-mtune=la664 is implemented as a copy of -mtune=la464 and we can adjust
it with benchmark results later.

Bootstrapped and regtested on a LA664 with BOOT_CFLAGS="-march=la664
-O2", a LA464 with BOOT_CFLAGS="-march=native -O2".  And manually
verified -march=native probing on LA664 and LA464.

Xi Ruoyao (5):
    LoongArch: Switch loongarch-def to C++
    LoongArch: genopts: Add infrastructure to generate code for new
  features in ISA evolution
    LoongArch: Take the advantage of -mdiv32 if it's enabled
    LoongArch: Don't emit dbar 0x700 if -mld-seq-sa
    LoongArch: Add -march=la664 and -mtune=la664

   gcc/config/loongarch/genopts/genstr.sh    |  78 ++-
   gcc/config/loongarch/genopts/isa-evolution.in |   2 +
   .../loongarch/genopts/loongarch-strings   |   1 +
   gcc/config/loongarch/genopts/loongarch.opt.in |  10 +
   gcc/config/loongarch/loongarch-cpu.cc |  37 ++--
   gcc/config/loongarch/loongarch-cpucfg-map.h   |  36 +++
   gcc/config/loongarch/loongarch-def-array.h    |  40 
   gcc/config/loongarch/loongarch-def.c  | 205 --
   gcc/config/loongarch/loongarch-def.cc | 193 +
   gcc/config/loongarch/loongarch-def.h  |  67 --
   gcc/config/loongarch/loongarch-opts.h |   9 +-
   gcc/config/loongarch/loongarch-str.h  |   8 +-
   gcc/config/loongarch/loongarch-tune.h | 123 ++-
   gcc/config/loongarch/loongarch.cc |   6 +-
   gcc/config/loongarch/loongarch.md |  31 ++-
   gcc/config/loongarch/loongarch.opt    |  23 +-
   gcc/config/loongarch/t-loongarch  |  25 ++-
   .../gcc.target/loongarch/div-div32.c  |  31 +++
   .../gcc.target/loongarch/div-no-div32.c   |  11 +
   19 files changed, 664 insertions(+), 272 deletions(-)
   create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
   create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h
   create mode 100644 gcc/config/loongarch/loongarch-def-array.h
   delete mode 100644 gcc/config/loongarch/loongarch-def.c
   create mode 100644 gcc/config/loongarch/loongarch-def.cc
   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
   create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c





[PATCH] RISC-V: Implement -mmemcpy-strategy= options[PR112537]

2023-11-16 Thread Li Xu
From: xuli 

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112537

-mmemcpy-strategy=[auto|libcall|scalar|vector]

auto: Current status, use scalar or vector instructions.
libcall: Always use a library call.
scalar: Only use scalar instructions.
vector: Only use vector instructions.
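
How the four values gate the two expanders can be sketched like this. It is a simplified model of the checks the patch adds to riscv_expand_block_move and expand_block_move; the `toy_` names are hypothetical and `target_vector` stands in for TARGET_VECTOR.

```c
#include <stdbool.h>

/* Same four choices as the enum added by the patch, under toy names.  */
enum toy_strategy
{
  TOY_USE_AUTO,
  TOY_USE_LIBCALL,
  TOY_USE_SCALAR,
  TOY_USE_VECTOR
};

/* The scalar expander bails out for "libcall" and "vector".  */
bool
toy_scalar_expansion_allowed (enum toy_strategy s)
{
  return s != TOY_USE_LIBCALL && s != TOY_USE_VECTOR;
}

/* The vector expander needs vector support and bails out for
   "libcall" and "scalar".  */
bool
toy_vector_expansion_allowed (enum toy_strategy s, bool target_vector)
{
  return target_vector && s != TOY_USE_LIBCALL && s != TOY_USE_SCALAR;
}
```

When both predicates are false, as with libcall, neither inline expansion is used and the copy is left to the library call.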

PR target/112537

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_stringop_strategy_enum): 
Strategy enum.
* config/riscv/riscv-string.cc (riscv_expand_block_move): Disabled 
based on options.
(expand_block_move): Ditto.
* config/riscv/riscv.opt: Add -mmemcpy-strategy=.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cpymem-strategy-1.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-2.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-3.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-4.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy-5.c: New test.
* gcc.target/riscv/rvv/base/cpymem-strategy.h: New test.
---
 gcc/config/riscv/riscv-opts.h | 12 +++
 gcc/config/riscv/riscv-string.cc  |  7 ++-
 gcc/config/riscv/riscv.opt| 20 +++
 .../riscv/rvv/base/cpymem-strategy-1.c|  6 ++
 .../riscv/rvv/base/cpymem-strategy-2.c|  6 ++
 .../riscv/rvv/base/cpymem-strategy-3.c|  6 ++
 .../riscv/rvv/base/cpymem-strategy-4.c|  6 ++
 .../riscv/rvv/base/cpymem-strategy-5.c|  6 ++
 .../riscv/rvv/base/cpymem-strategy.h  | 12 +++
 9 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy.h

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 532b1b6b84a..0b242f068e1 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -102,6 +102,18 @@ enum riscv_entity
   MAX_RISCV_ENTITIES
 };
 
+/* RISC-V stringop strategy. */
+enum riscv_stringop_strategy_enum {
+  /* Use scalar or vector instructions. */
+  USE_AUTO,
+  /* Always use a library call. */
+  USE_LIBCALL,
+  /* Only use scalar instructions. */
+  USE_SCALAR,
+  /* Only use vector instructions. */
+  USE_VECTOR
+};
+
 #define TARGET_ZICOND_LIKE (TARGET_ZICOND || (TARGET_XVENTANACONDOPS && 
TARGET_64BIT))
 
 /* Bit of riscv_zvl_flags will set contintuly, N-1 bit will set if N-bit is
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 57e8ad698d7..3b5e05e2c44 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -710,6 +710,10 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned 
HOST_WIDE_INT length,
 bool
 riscv_expand_block_move (rtx dest, rtx src, rtx length)
 {
+  if (riscv_memcpy_strategy == USE_LIBCALL
+  || riscv_memcpy_strategy == USE_VECTOR)
+return false;
+
   if (CONST_INT_P (length))
 {
   unsigned HOST_WIDE_INT hwi_length = UINTVAL (length);
@@ -773,7 +777,8 @@ expand_block_move (rtx dst_in, rtx src_in, rtx length_in)
bnez a2, loop   # Any more?
ret # Return
   */
-  if (!TARGET_VECTOR)
+  if (!TARGET_VECTOR || riscv_memcpy_strategy == USE_LIBCALL
+  || riscv_memcpy_strategy == USE_SCALAR)
 return false;
   HOST_WIDE_INT potential_ew
 = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dst_in)), BITS_PER_WORD)
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 70d78151cee..4f3ce2233b2 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -527,3 +527,23 @@ Target Var(TARGET_ADJUST_LMUL_COST) Init(0)
 Target Undocumented Bool Var(riscv_vector_abi) Init(0)
 Enable the use of vector registers for function arguments and return value.
 This is an experimental switch and may be subject to change in the future.
+
+Enum
+Name(riscv_stringop_strategy) Type(enum riscv_stringop_strategy_enum)
+Valid arguments to -mmemcpy-strategy=:
+
+EnumValue
+Enum(riscv_stringop_strategy) String(auto) Value(USE_AUTO)
+
+EnumValue
+Enum(riscv_stringop_strategy) String(libcall) Value(USE_LIBCALL)
+
+EnumValue
+Enum(riscv_stringop_strategy) String(scalar) Value(USE_SCALAR)
+
+EnumValue
+Enum(riscv_stringop_strategy) String(vector) Value(USE_VECTOR)
+
+mmemcpy-strategy=
+Target RejectNegative Joined Enum(riscv_stringop_strategy) 
Var(riscv_memcpy_strategy) Init(USE_AUTO)
+Specify memcpy expansion strategy.
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/cpymem-strategy-1.c 

[PATCH] Support cbranchm for Vector HI/QImode.

2023-11-16 Thread liuhongt
The missing cbranchv*{hi,qi}4 patterns may be needed by early break
vectorization.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
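
As an illustration of the kind of loop meant here, the following is a search over byte elements with an early exit. The function name is hypothetical, and whether GCC actually vectorizes it with a QImode vector compare-and-branch depends on the compiler version and flags, so treat it as a sketch of the shape only.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first element equal to KEY, or -1.  The
   "return" inside the loop is the early break; on QImode elements a
   vectorized version would compare a whole vector and branch on the
   result.  */
int
toy_find_first (const unsigned char *a, int n, unsigned char key)
{
  assert (a != NULL || n <= 0);
  for (int i = 0; i < n; i++)
    if (a[i] == key)
      return i;
  return -1;
}
```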

gcc/ChangeLog:

* config/i386/sse.md (cbranch4): Extend to Vector
HI/QImode.
---
 gcc/config/i386/sse.md | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d250a6cb802..3659660a616 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -514,6 +514,12 @@ (define_mode_iterator VI_AVX2
(V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX2") V4SI
(V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX2") V2DI])
 
+(define_mode_iterator VI_AVX_AVX512F
+  [(V64QI "TARGET_AVX512F && TARGET_EVEX512") (V32QI "TARGET_AVX") V16QI
+   (V32HI "TARGET_AVX512F && TARGET_EVEX512") (V16HI "TARGET_AVX") V8HI
+   (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX") V4SI
+   (V8DI "TARGET_AVX512F && TARGET_EVEX512") (V4DI "TARGET_AVX") V2DI])
+
 ;; All QImode vector integer modes
 (define_mode_iterator VI1
   [(V32QI "TARGET_AVX") V16QI])
@@ -27868,8 +27874,8 @@ (define_insn "_store_mask"
 
 (define_expand "cbranch4"
   [(set (reg:CC FLAGS_REG)
-   (compare:CC (match_operand:VI48_AVX_AVX512F 1 "register_operand")
-   (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand")))
+   (compare:CC (match_operand:VI_AVX_AVX512F 1 "register_operand")
+   (match_operand:VI_AVX_AVX512F 2 "nonimmediate_operand")))
(set (pc) (if_then_else
   (match_operator 0 "bt_comparison_operator"
[(reg:CC FLAGS_REG) (const_int 0)])
-- 
2.31.1



[PATCH] c++: Introduce the extended attribute for asm declarations

2023-11-16 Thread Julian Waters
Resent as plain text to appear on the patch tracker

Hi all,

This is the beginning of a patch to introduce the extended attribute
for asm declarations proposed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636563.html. I
will need some reviewer help in implementing this patch, as I am not
very familiar with gcc's internals.

The attribute in question looks like this (example):

[[gnu::extended([output] "=r" (output) [otheroutput] "=r" (otheroutput),
  [input] "r" (input) [otherinput] "r" (otherinput),
  "%rcx" "%rdx", label, volatile stack)]]
asm ("");
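
For comparison, the operands that the proposed attribute would carry are written today inside the asm statement itself. A minimal sketch in GNU C, assuming GCC or Clang; `toy_passthrough` is a hypothetical name, and the empty template emits no instructions.

```c
/* Existing GNU extended asm: outputs, inputs and clobbers follow the
   template, separated by colons.  The matching constraint "0" ties the
   input to the same register as output operand 0, so with an empty
   template the output simply holds the input value.  */
int
toy_passthrough (int input)
{
  int output;
  __asm__ volatile (""
                    : [output] "=r" (output)
                    : [input] "0" (input)
                    : "memory");
  return output;
}
```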

I would really appreciate any reviews, as well as help in implementing
this patch

best regards,
Julian

>From a189f2820025315b5574d0e9384b96301c6ba7e8 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Fri, 17 Nov 2023 11:09:50 +0800
Subject: [PATCH] Introduce the extended attribute for asm declarations

---
 gcc/cp/parser.cc | 25 +
 gcc/cp/tree.cc   | 11 +++
 2 files changed, 36 insertions(+)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5116bcb78f6..ecc5f2fabc1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15407,6 +15407,13 @@ cp_parser_block_declaration (cp_parser *parser,
  cp_parser_commit_to_tentative_parse (parser);
   cp_parser_asm_definition (parser);
 }
+  else if ((attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1)) != 1
+   && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx, RID_ASM))
+{
+  if (statement_p)
+cp_parser_commit_to_tentative_parse (parser);
+  cp_parser_asm_definition (parser);
+}
   /* If the next keyword is `namespace', we have a
  namespace-alias-definition.  */
   else if (token1->keyword == RID_NAMESPACE)
@@ -22397,6 +22404,23 @@ cp_parser_asm_definition (cp_parser* parser)
   bool invalid_inputs_p = false;
   bool invalid_outputs_p = false;
   required_token missing = RT_NONE;
+
+  tree attrs = cp_parser_std_attribute_spec_seq (parser);
+  tree extended = error_mark_node;
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+{
+  /* Error during attribute parsing that resulted in skipping
+ to next semicolon.  */
+  cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+  return;
+}
+
+  if (attrs != error_mark_node)
+{
+  extended = lookup_attribute ("gnu", "extended", attrs);
+}
+
   location_t asm_loc = cp_lexer_peek_token (parser->lexer)->location;

   /* Look for the `asm' keyword.  */
@@ -22511,6 +22535,7 @@ cp_parser_asm_definition (cp_parser* parser)
  two `:' tokens.  */
   if (cp_parser_allow_gnu_extensions_p (parser)
   && parser->in_function_body
+  && extended == error_mark_node
   && (cp_lexer_next_token_is (parser->lexer, CPP_COLON)
|| cp_lexer_next_token_is (parser->lexer, CPP_SCOPE)))
 {
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 417c92ba76f..1f081b3dfd8 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -46,6 +46,7 @@ static tree verify_stmt_tree_r (tree *, int *, void *);

 static tree handle_init_priority_attribute (tree *, tree, tree, int, bool *);
 static tree handle_abi_tag_attribute (tree *, tree, tree, int, bool *);
+static tree handle_extended_attribute (tree *, tree, tree, int, bool *);
 static tree handle_contract_attribute (tree *, tree, tree, int, bool *);

 /* If REF is an lvalue, returns the kind of lvalue that REF is.
@@ -5080,6 +5081,8 @@ const struct attribute_spec cxx_attribute_table[] =
 handle_init_priority_attribute, NULL },
   { "abi_tag", 1, -1, false, false, false, true,
 handle_abi_tag_attribute, NULL },
+  { "extended", 0, 5, true, false, false, false,
+handle_extended_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };

@@ -5350,6 +5353,14 @@ handle_abi_tag_attribute (tree* node, tree
name, tree args,
   return NULL_TREE;
 }

+static tree
+handle_extended_attribute (tree *node, tree name, tree args, int
flags, bool *no_add_attrs)
+{
+  /* TODO What could be done here? */
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
 /* Perform checking for contract attributes.  */

 tree
-- 
2.41.0


Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread Xi Ruoyao
On Fri, 2023-11-17 at 10:41 +0800, chenglulu wrote:
> Hi,
> 
> Thank you very much for the modification, but I think we need to support 
> la664 with the configuration items of configure.

I'll add it.

> I also defined ISA_BASE_LA64V110 to represent the LoongArch1.1 
> instruction set, what do you think?

I'll add it too.  I had misread section 1.5 paragraph 1 of the spec so I
didn't consider this a good idea, but after reading it again I think it
should be added.

> On 2023/11/16 at 9:18 PM, Xi Ruoyao wrote:
> > Loongson 3A6000 processor will be shipped to general users this month
> > and it features 4 cores with the new LA664 micro architecture.  Here are
> > some changes from LA464:
> > 
> > 1. The 32-bit division instruction now ignores the high 32 bits of the
> >     input registers.  This is enumerated via CPUCFG word 0x2, bit 26.
> > 2. The micro architecture now guarantees two loads on the same memory
> >     address won't be reordered with each other.  dbar 0x700 is turned
> >     into nop.
> > 3. The architecture now supports approximate square root instructions
> >     (FRECIPE and VRSQRTE) on 32-bit or 64-bit floating-point values and
> >     the vectors of these values.
> > 4. The architecture now supports SC.Q instruction for 128-bit CAS.
> > 5. The architecture now supports LL.ACQ and SC.REL instructions (well, I
> >     don't really know what they are for).
> > 6. The architecture now supports CAS instructions for 64, 32, 16, or 8-bit
> >     values.
> > 7. The architecture now supports atomic add and atomic swap instructions
> >     for 16 or 8-bit values.
> > 8. Some non-zero hint values of DBAR instructions are added.
> > 
> > These features are documented in LoongArch v1.1.  Implementations can
> > implement any subset of them and enumerate the implemented features via
> > CPUCFG.  LA664 implements them all.
> > 
> > (8) is already implemented in previous patches because it's completely
> > backward-compatible.  This series implements (1) and (2) with switches
> > -mdiv32 and -mld-seq-sa (these names are derived from the names of the
> > corresponding CPUCFG bits documented in the LoongArch v1.1
> > specification).
> > 
> > The other features require Binutils support and we are close to the end
> > of GCC 14 stage 1, so I'm posting this series first now.
> > 
> > With -march=la664, these two options are implicitly enabled but they can
> > be turned off with -mno-div32 or -mno-ld-seq-sa.
> > 
> > With -march=native, the current CPU is probed via CPUCFG and these
> > options are implicitly enabled if the CPU supports the corresponding
> > feature.  They can be turned off with explicit -mno-div32 or
> > -mno-ld-seq-sa as well.
> > 
> > -mtune=la664 is implemented as a copy of -mtune=la464 and we can adjust
> > it with benchmark results later.
> > 
> > Bootstrapped and regtested on a LA664 with BOOT_CFLAGS="-march=la664
> > -O2", a LA464 with BOOT_CFLAGS="-march=native -O2".  And manually
> > verified -march=native probing on LA664 and LA464.
> > 
> > Xi Ruoyao (5):
> >    LoongArch: Switch loongarch-def to C++
> >    LoongArch: genopts: Add infrastructure to generate code for new
> >  features in ISA evolution
> >    LoongArch: Take the advantage of -mdiv32 if it's enabled
> >    LoongArch: Don't emit dbar 0x700 if -mld-seq-sa
> >    LoongArch: Add -march=la664 and -mtune=la664
> > 
> >   gcc/config/loongarch/genopts/genstr.sh    |  78 ++-
> >   gcc/config/loongarch/genopts/isa-evolution.in |   2 +
> >   .../loongarch/genopts/loongarch-strings   |   1 +
> >   gcc/config/loongarch/genopts/loongarch.opt.in |  10 +
> >   gcc/config/loongarch/loongarch-cpu.cc |  37 ++--
> >   gcc/config/loongarch/loongarch-cpucfg-map.h   |  36 +++
> >   gcc/config/loongarch/loongarch-def-array.h    |  40 
> >   gcc/config/loongarch/loongarch-def.c  | 205 --
> >   gcc/config/loongarch/loongarch-def.cc | 193 +
> >   gcc/config/loongarch/loongarch-def.h  |  67 --
> >   gcc/config/loongarch/loongarch-opts.h |   9 +-
> >   gcc/config/loongarch/loongarch-str.h  |   8 +-
> >   gcc/config/loongarch/loongarch-tune.h | 123 ++-
> >   gcc/config/loongarch/loongarch.cc |   6 +-
> >   gcc/config/loongarch/loongarch.md |  31 ++-
> >   gcc/config/loongarch/loongarch.opt    |  23 +-
> >   gcc/config/loongarch/t-loongarch  |  25 ++-
> >   .../gcc.target/loongarch/div-div32.c  |  31 +++
> >   .../gcc.target/loongarch/div-no-div32.c   |  11 +
> >   19 files changed, 664 insertions(+), 272 deletions(-)
> >   create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
> >   create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h
> >   create mode 100644 gcc/config/loongarch/loongarch-def-array.h
> >   delete mode 100644 gcc/config/loongarch/loongarch-def.c
> >   create mode 100644 gcc/config/loongarch/loongarch-def.cc
> >   create mode 100644 

Re: [committed] libstdc++: Implement std::out_ptr and std::inout_ptr for C++23 [PR111667]

2023-11-16 Thread Hans-Peter Nilsson
> From: Jonathan Wakely 
> Date: Thu, 16 Nov 2023 08:12:39 +

>   PR libstdc++/111667
>   * include/Makefile.am: Add new header.
>   * include/Makefile.in: Regenerate.
>   * include/bits/out_ptr.h: New file.
>   * include/bits/shared_ptr.h (__is_shared_ptr): Move definition
>   to here ...
>   * include/bits/shared_ptr_atomic.h (__is_shared_ptr): ... from
>   here.
>   * include/bits/shared_ptr_base.h (__shared_count): Declare
>   out_ptr_t as a friend.
>   (_Sp_counted_deleter, __shared_ptr): Likewise.
>   * include/bits/unique_ptr.h (unique_ptr, unique_ptr):
>   Declare out_ptr_t and inout_ptr_t as friends.
>   (__is_unique_ptr): Define new variable template.
>   * include/bits/version.def (out_ptr): Define.
>   * include/bits/version.h: Regenerate.
>   * include/std/memory: Include new header.
>   * testsuite/20_util/smartptr.adapt/inout_ptr/1.cc: New test.
>   * testsuite/20_util/smartptr.adapt/inout_ptr/2.cc: New test.
>   * testsuite/20_util/smartptr.adapt/inout_ptr/shared_ptr_neg.cc:
>   New test.
>   * testsuite/20_util/smartptr.adapt/inout_ptr/void_ptr.cc: New
>   test.
>   * testsuite/20_util/smartptr.adapt/out_ptr/1.cc: New test.
>   * testsuite/20_util/smartptr.adapt/out_ptr/2.cc: New test.
>   * testsuite/20_util/smartptr.adapt/out_ptr/shared_ptr_neg.cc:
>   New test.
>   * testsuite/20_util/smartptr.adapt/out_ptr/void_ptr.cc: New
>   test.

This commit, r14-5524-gc7f6537db94f7c, exposed or caused, for several targets:
FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (internal compiler error: 
tree check: expected class 'type', have 'declaration' (template_decl) in 
get_originating_module_decl, at cp/module.cc:18649)
FAIL: g++.dg/modules/xtreme-header-4_b.C -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (internal compiler error: 
tree check: expected class 'type', have 'declaration' (template_decl) in 
get_originating_module_decl, at cp/module.cc:18649)
FAIL: g++.dg/modules/xtreme-header-5_b.C -std=c++2b (test for excess errors)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (internal compiler error: 
tree check: expected class 'type', have 'declaration' (template_decl) in 
get_originating_module_decl, at cp/module.cc:18649)
FAIL: g++.dg/modules/xtreme-header_b.C -std=c++2b (test for excess errors)

See PR 112580.

(BTW, I can't tell from the archive, whether the
"Linaro-TCWG-CI" tester notified you or just sent its report
to the gcc-regression@ list.)

brgds, H-P


[PATCH] RISC-V: Optimize VLA SLP with duplicate VLA shuffle indice

2023-11-16 Thread Juzhe-Zhong
While evaluating dynamic LMUL, I noticed that we can do better on VLA SLP
with duplicate VLA shuffle indices.

Consider the following case:

void
foo (uint16_t *restrict a, uint16_t *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 6] + 1;
      a[i * 8 + 2] = b[i * 8 + 0] + 1;
      a[i * 8 + 3] = b[i * 8 + 2] + 1;
      a[i * 8 + 4] = b[i * 8 + 1] + 1;
      a[i * 8 + 5] = b[i * 8 + 7] + 1;
      a[i * 8 + 6] = b[i * 8 + 5] + 1;
      a[i * 8 + 7] = b[i * 8 + 4] + 1;
    }
}

We want to generate the following indices:

{ 3, 6, 0, 2, 1, 7, 5, 4, 11, 14, 8, 10, 9, 15, 13, 12, 19, 22, 16, 18, 17, 23, 
21, 20, ... }
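
The index vector above is the lane number plus a per-lane offset that repeats every 8 lanes. A scalar sketch of that computation follows; the names are hypothetical, and the final masking that the generated code applies is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets pattern[k] - k for the repeating 8-element pattern
   { 3, 6, 0, 2, 1, 7, 5, 4 }.  */
static const int toy_delta[8] = { 3, 5, -2, -1, -3, 2, -1, -3 };

/* Fill OUT[0..N-1] with the shuffle indices: lane number plus the
   offset selected by the low three bits of the lane number.  */
void
toy_fill_shuffle_indices (uint16_t *out, size_t n)
{
  assert (out != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    out[i] = (uint16_t) ((long) i + toy_delta[i & 7]);
}
```

In vector terms this corresponds to a vid.v for the lane numbers, a vand.vi with 7 to select the offset, and a vadd.vv to combine them.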

Before this patch, we use pairs of vmseq + vmerge to create such a vector in
the prologue:

https://godbolt.org/z/r919WPabK

foo:
        ble          a2,zero,.L5
        vsetvli      a5,zero,e16,m8,ta,ma
        vid.v        v16
        vand.vi      v24,v16,7
        vmseq.vi     v0,v24,1
        vmseq.vi     v1,v24,2
        vmv.v.i      v8,3
        vmerge.vim   v8,v8,5,v0
        vmv1r.v      v0,v1
        vmseq.vi     v2,v24,3
        vmerge.vim   v8,v8,-2,v0
        vmv1r.v      v0,v2
        vmseq.vi     v1,v24,4
        csrr         a4,vlenb
        vmerge.vim   v8,v8,-1,v0
        csrr         a6,vlenb
        vmv1r.v      v0,v1
        vmseq.vi     v2,v24,5
        slli         a3,a4,2
        vmerge.vim   v8,v8,-3,v0
        slli         a6,a6,2
        vmv1r.v      v0,v2
        vmseq.vi     v1,v24,6
        vmerge.vim   v8,v8,2,v0
        addi         a6,a6,-1
        vmv1r.v      v0,v1
        slli         a2,a2,3
        slli         a4,a4,3
        neg          a7,a3
        vmseq.vi     v2,v24,7
        vmerge.vim   v8,v8,-1,v0
        vmv.v.x      v24,a6
        vmv1r.v      v0,v2
        vmerge.vim   v8,v8,-3,v0
        vadd.vv      v16,v16,v8
        vand.vv      v24,v16,v24
.L3:
        minu         a5,a2,a3
        vsetvli      zero,a5,e16,m8,ta,ma
        mv           a6,a2
        vle16.v      v16,0(a1)
        vrgather.vv  v8,v16,v24
        vadd.vi      v8,v8,1
        vse16.v      v8,0(a0)
        add          a1,a1,a4
        add          a0,a0,a4
        add          a2,a2,a7
        bgtu         a6,a3,.L3
.L5:
        ret

After this patch:

foo:
        ble          a2,zero,.L5
        li           a6,-536875008
        slli         a6,a6,4
        addi         a6,a6,3
        csrr         a4,vlenb
        csrr         a5,vlenb
        li           t1,-536850432
        slli         a6,a6,16
        slli         a3,a4,1
        addi         a6,a6,-3
        slli         a5,a5,1
        slli         t1,t1,4
        vsetvli      t3,zero,e64,m4,ta,ma
        addi         a5,a5,-1
        vmv.v.x      v4,a6
        addi         t1,t1,3
        slli         a2,a2,3
        slli         a4,a4,2
        neg          a7,a3
        vslide1up.vx v16,v4,t1
        vsetvli      a6,zero,e16,m4,ta,ma
        vid.v        v12
        vmv.v.x      v4,a5
        vand.vi      v20,v12,7
        vrgather.vv  v8,v16,v20
        vadd.vv      v12,v12,v8
        vand.vv      v12,v12,v4
.L3:
        minu         a5,a2,a3
        vsetvli      zero,a5,e16,m4,ta,ma
        mv           a6,a2
        vle16.v      v8,0(a1)
        vrgather.vv  v4,v8,v12
        vadd.vi      v4,v4,1
        vse16.v      v4,0(a0)
        add          a1,a1,a4
        add          a0,a0,a4
        add          a2,a2,a7
        bgtu         a6,a3,.L3
.L5:
        ret

Optimization:
1. The prologue executes 9 fewer dynamic instructions.
2. Fewer vector instructions reduce hardware pipeline consumption.
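
The saving comes from building the 8 per-lane offsets as two 64-bit scalars instead of one vmseq + vmerge pair per lane. A sketch of that packing, assuming little-endian lane order; `toy_pack4_u16` is a hypothetical name and not the merge_pattern implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit value, lane 0 in the lowest
   bits, so that a single EEW=64 element carries four EEW=16 elements.  */
uint64_t
toy_pack4_u16 (const int16_t lanes[4])
{
  assert (lanes != 0);
  uint64_t packed = 0;
  for (int i = 0; i < 4; i++)
    packed |= (uint64_t) (uint16_t) lanes[i] << (16 * i);
  return packed;
}
```

Packing the offsets { 3, 5, -2, -1 } and { -3, 2, -1, -3 } gives 0xFFFFFFFE00050003 and 0xFFFDFFFF0002FFFD, which appear to be the two scalars that the li/slli/addi sequences in the listing after the patch build in t1 and a6.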

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::merge_pattern): New function.
(expand_vector_init_trailing_same_elem): Adapt function.
(expand_const_vector): Ditto.
(expand_vec_init): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/slp-1.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 174 +-
 .../costmodel/riscv/rvv/dynamic-lmul4-6.c |   7 +-
 .../costmodel/riscv/rvv/dynamic-lmul4-8.c |   7 +-
 .../gcc.dg/vect/costmodel/riscv/rvv/slp-1.c   |  40 
 4 files changed, 174 insertions(+), 54 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/slp-1.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 000d0c2c721..6a2009ffb05 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -420,6 +420,7 @@ public:
 
   bool single_step_npatterns_p () const;
   bool npatterns_all_equal_p () const;
+  void merge_pattern (rtx);
 
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
@@ -679,6 +680,50 @@ rvv_builder::npatterns_all_equal_p () const
   return true;
 }
 
+/* Merge pattern to reduce the elements we need to process.
+
+   E.g. v = { 0, 1, 2, 3 }, mode = V4SI.
+   Since we can use EEW = 64 RVV instructions.
+
+   Transform it into:
+   v = { 1 << 32, 3 << 32 | 2 }, mode = V2DI.  */
+void
+rvv_builder::merge_pattern (rtx src)
+{
+  if (this->inner_bits_size () 
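
The transformation described in the merge_pattern comment can be sketched
outside of GCC as follows.  This is only an illustration, not the
rvv_builder code: merge_pairs is a hypothetical helper, and it assumes
the lower-indexed element of each pair lands in the low 32 bits, which
matches the { 1 << 32, 3 << 32 | 2 } result given in the comment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of riscv-v.cc: pack each pair of 32-bit
// elements into one 64-bit element, with the lower-indexed element in
// the low 32 bits.  Assumes the input has an even number of elements.
std::vector<std::uint64_t>
merge_pairs (const std::vector<std::uint32_t> &v)
{
  std::vector<std::uint64_t> out;
  out.reserve (v.size () / 2);
  for (std::size_t i = 0; i + 1 < v.size (); i += 2)
    out.push_back ((std::uint64_t (v[i + 1]) << 32) | v[i]);
  return out;
}
```

Halving the element count this way is what lets the builder handle the
same constant with fewer, wider elements.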

[PATCH] C++: Introduce the extended attribute for asm declarations

2023-11-16 Thread Julian Waters
Hi all,

This is the beginning of a patch to introduce the extended attribute
for asm declarations proposed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636563.html. I
will need some reviewer help in implementing this patch, as I am not
very familiar with gcc's internals.

The attribute in question looks like this (example):

[[gnu::extended([output] "=r" (output) [otheroutput] "=r" (otheroutput),
  [input] "r" (input) [otherinput] "r" (otherinput),
  "%rcx" "%rdx", label, volatile stack)]]
asm ("");
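
For comparison, the operands that the proposed attribute would carry are
written today with the colon-separated GNU extended asm syntax.  A
minimal sketch, assuming a GCC-compatible compiler; pass_through is a
made-up name, and the asm template is deliberately empty so the example
is not tied to any architecture:

```cpp
// Hypothetical example using the existing GNU extended asm syntax (not
// the proposed attribute).  The empty template emits no instructions;
// the "+r" constraint makes VALUE a read-write register operand, and
// the "memory" clobber tells the compiler that memory may be modified.
int
pass_through (int value)
{
  asm volatile ("" : [value] "+r" (value) : : "memory");
  return value;
}
```

The attribute form above would move the same operand, clobber and
qualifier information out of the asm statement and into the attribute
argument list.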

I would really appreciate any reviews, as well as help in implementing
this patch.

best regards,
Julian
From a189f2820025315b5574d0e9384b96301c6ba7e8 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Fri, 17 Nov 2023 11:09:50 +0800
Subject: [PATCH] Introduce the extended attribute for asm declarations

---
 gcc/cp/parser.cc | 25 +
 gcc/cp/tree.cc   | 11 +++
 2 files changed, 36 insertions(+)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5116bcb78f6..ecc5f2fabc1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15407,6 +15407,13 @@ cp_parser_block_declaration (cp_parser *parser,
 	cp_parser_commit_to_tentative_parse (parser);
   cp_parser_asm_definition (parser);
 }
+  else if ((attr_idx = cp_parser_skip_std_attribute_spec_seq (parser, 1)) != 1
+   && cp_lexer_nth_token_is_keyword (parser->lexer, attr_idx, RID_ASM))
+{
+  if (statement_p)
+cp_parser_commit_to_tentative_parse (parser);
+  cp_parser_asm_definition (parser);
+}
   /* If the next keyword is `namespace', we have a
  namespace-alias-definition.  */
   else if (token1->keyword == RID_NAMESPACE)
@@ -22397,6 +22404,23 @@ cp_parser_asm_definition (cp_parser* parser)
   bool invalid_inputs_p = false;
   bool invalid_outputs_p = false;
   required_token missing = RT_NONE;
+
+  tree attrs = cp_parser_std_attribute_spec_seq (parser);
+  tree extended = error_mark_node;
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_SEMICOLON))
+{
+  /* Error during attribute parsing that resulted in skipping
+ to next semicolon.  */
+  cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
+  return;
+}
+
+  if (attrs != error_mark_node)
+{
+  extended = lookup_attribute ("gnu", "extended", attrs);
+}
+
   location_t asm_loc = cp_lexer_peek_token (parser->lexer)->location;
 
   /* Look for the `asm' keyword.  */
@@ -22511,6 +22535,7 @@ cp_parser_asm_definition (cp_parser* parser)
  two `:' tokens.  */
   if (cp_parser_allow_gnu_extensions_p (parser)
   && parser->in_function_body
+  && extended == error_mark_node
   && (cp_lexer_next_token_is (parser->lexer, CPP_COLON)
 	  || cp_lexer_next_token_is (parser->lexer, CPP_SCOPE)))
 {
diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 417c92ba76f..1f081b3dfd8 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -46,6 +46,7 @@ static tree verify_stmt_tree_r (tree *, int *, void *);
 
 static tree handle_init_priority_attribute (tree *, tree, tree, int, bool *);
 static tree handle_abi_tag_attribute (tree *, tree, tree, int, bool *);
+static tree handle_extended_attribute (tree *, tree, tree, int, bool *);
 static tree handle_contract_attribute (tree *, tree, tree, int, bool *);
 
 /* If REF is an lvalue, returns the kind of lvalue that REF is.
@@ -5080,6 +5081,8 @@ const struct attribute_spec cxx_attribute_table[] =
 handle_init_priority_attribute, NULL },
   { "abi_tag", 1, -1, false, false, false, true,
 handle_abi_tag_attribute, NULL },
+  { "extended", 0, 5, true, false, false, false,
+handle_extended_attribute, NULL },
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
@@ -5350,6 +5353,14 @@ handle_abi_tag_attribute (tree* node, tree name, tree args,
   return NULL_TREE;
 }
 
+static tree
+handle_extended_attribute (tree *node, tree name, tree args, int flags, bool *no_add_attrs)
+{
+  /* TODO What could be done here? */
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
 /* Perform checking for contract attributes.  */
 
 tree
-- 
2.41.0



Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread Jeff Law




On 11/16/23 18:20, Sam James wrote:

> Jeff, I don't suppose you could dig out the old bugs/commits just out of
> interest?

That work goes back to the early 90s when I was primarily responsible
for the PA platform.  But the core issue hasn't changed in that not
enough context is provided for reload to know how to deal with these
problems.

So, digging out those testcases/codes would be quite difficult; at the
time we didn't have standard procedures where tests were added to the
testsuite for most changes or even discussed.

>> I'm not seeing any obvious problems in the gcc testsuite.  It needs
>> testing on packages that do extensive floating point calculations.
>
> OK, I'll focus on those.

The more likely scenario is xmpy which is used for integer multiply, but
the operands have to be moved into FP registers because the operation
happens in the FPU.

jeff


[PATCH v2 5/5] aarch64: Add function multiversioning support

2023-11-16 Thread Andrew Carlotti
This adds initial support for function multiversioning on aarch64 using
the target_version and target_clones attributes.  This loosely follows
the Beta specification in the ACLE [1], although with some differences
that still need to be resolved (possibly as follow-up patches).

Existing function multiversioning implementations are broken in various
ways when used across translation units.  This includes placing
resolvers in the wrong translation units, and using symbol mangling that
causes callers to unintentionally bypass the resolver in some circumstances.
Fixing these issues for aarch64 will require modifications to our ACLE
specification.  It will also require further adjustments to existing
middle end code, to facilitate different mangling and resolver
placement while preserving existing target behaviours.

The list of function multiversioning features specified in the ACLE is
also inconsistent with the list of features supported in target option
extensions.  I intend to resolve some or all of these inconsistencies at
a later stage.

The target_version attribute is currently only supported in C++, since
this is the only frontend with existing support for multiversioning
using the target attribute.  On the other hand, this patch happens to
enable multiversioning with the target_clones attribute in Ada and D, as
well as the entire C family, using their existing frontend support.

This patch also does not support the following aspects of the Beta
specification:

- The target_clones attribute should allow an implicit unlisted
  "default" version.
- There should be an option to disable function multiversioning at
  compile time.
- Unrecognised target names in a target_clones attribute should be
  ignored (with an optional warning).  This current patch raises an
  error instead.

[1] https://github.com/ARM-software/acle/blob/main/main/acle.md#function-multi-versioning
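
As a rough illustration of what a resolver for such versions has to do,
here is a minimal sketch in plain C++.  It is not the code from this
patch: the version struct, the resolve function, and the rule of
preferring the version with the most required feature bits are
assumptions made only for the example.

```cpp
#include <algorithm>
#include <bitset>
#include <cstdint>
#include <vector>

// Hypothetical model of one function version: the feature bits it
// requires and an identifier standing in for the function to call.
struct version
{
  std::uint64_t required;  // 0 means the default version
  int id;
};

// Pick the version to run on a CPU with feature mask CPU_FEATURES.
// Assumption for this sketch only: versions that require more feature
// bits are preferred.  Returns -1 if nothing matches, which cannot
// happen when a default version (required == 0) is present.
int
resolve (std::vector<version> versions, std::uint64_t cpu_features)
{
  std::stable_sort (versions.begin (), versions.end (),
                    [] (const version &a, const version &b)
                    {
                      return std::bitset<64> (a.required).count ()
                             > std::bitset<64> (b.required).count ();
                    });
  for (const version &v : versions)
    if ((v.required & cpu_features) == v.required)
      return v.id;
  return -1;
}
```

With versions requiring the masks 0, 0x1 and 0x3, a CPU reporting 0x3
selects the last one, while a CPU reporting only unrelated bits falls
back to the default.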

---

I believe the support present in this patch correctly handles function
multiversioning within a single translation unit for all features in the ACLE
specification with option extension support.

Is it ok to push this patch in its current state? I'd then continue working on
incremental improvements to the supported feature extensions and the ABI issues
in followup patches, along with corresponding changes and improvements to
the ACLE specification.


gcc/ChangeLog:

* config/aarch64/aarch64-feature-deps.h (fmv_deps_):
Define aarch64_feature_flags mask for each FMV feature.
* config/aarch64/aarch64-option-extensions.def: Use new macros
to define FMV feature extensions.
* config/aarch64/aarch64.cc (aarch64_option_valid_attribute_p):
Check for target_version attribute after processing target
attribute.
(aarch64_fmv_feature_data): New.
(aarch64_parse_fmv_features): New.
(aarch64_process_target_version_attr): New.
(aarch64_option_valid_version_attribute_p): New.
(get_feature_mask_for_version): New.
(compare_feature_masks): New.
(aarch64_compare_version_priority): New.
(build_ifunc_arg_type): New.
(make_resolver_func): New.
(add_condition_to_bb): New.
(compare_feature_version_info): New.
(dispatch_function_versions): New.
(aarch64_generate_version_dispatcher_body): New.
(aarch64_get_function_versions_dispatcher): New.
(aarch64_common_function_versions): New.
(aarch64_mangle_decl_assembler_name): New.
(TARGET_OPTION_VALID_VERSION_ATTRIBUTE_P): New implementation.
(TARGET_OPTION_EXPANDED_CLONES_ATTRIBUTE): New implementation.
(TARGET_OPTION_FUNCTION_VERSIONS): New implementation.
(TARGET_COMPARE_VERSION_PRIORITY): New implementation.
(TARGET_GENERATE_VERSION_DISPATCHER_BODY): New implementation.
(TARGET_GET_FUNCTION_VERSIONS_DISPATCHER): New implementation.
(TARGET_MANGLE_DECL_ASSEMBLER_NAME): New implementation.
* config/arm/aarch-common.h (enum aarch_parse_opt_result): Add
  new value to report duplicate FMV feature.
* common/config/aarch64/cpuinfo.h: New file.

libgcc/ChangeLog:

* config/aarch64/cpuinfo.c (enum CPUFeatures): Move to shared
  copy in gcc/common

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/options_set_17.c: Reorder expected flags.
* gcc.target/aarch64/cpunative/native_cpu_0.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_18.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_19.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_20.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_21.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_22.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: 

[PATCH v2 4/5] Add support for target_version attribute

2023-11-16 Thread Andrew Carlotti
This patch adds support for the "target_version" attribute to the middle
end and the C++ frontend, which will be used to implement function
multiversioning in the aarch64 backend.

On targets that don't use the "target" attribute for multiversioning,
there is no conflict between the "target" and "target_clones"
attributes.  This patch therefore makes the mutual exclusion in
C-family, D and Ada conditional upon the value of the
expanded_clones_attribute target hook.
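
The effect of making the exclusion conditional can be sketched as
follows.  This is a simplified model rather than the patch itself: the
exclusion struct has a single flag instead of the three in
attribute_spec::exclusions, and the hook value is passed in as a plain
string parameter.

```cpp
#include <cstring>

// Simplified stand-in for attribute_spec::exclusions: NAME is the
// conflicting attribute, ACTIVE says whether the exclusion applies.
struct exclusion
{
  const char *name;
  bool active;
};

// Look up OTHER in a NULL-terminated exclusion table.
bool
is_excluded (const exclusion *table, const char *other)
{
  for (const exclusion *e = table; e->name; ++e)
    if (std::strcmp (e->name, other) == 0)
      return e->active;
  return false;
}

// EXPANDED_CLONES_ATTRIBUTE stands in for the value of the target hook.
// "target" only conflicts with "target_clones" when the clones are
// expanded into "target" attributes.
bool
target_conflicts_with_clones (const char *expanded_clones_attribute)
{
  const bool clones_uses_target
    = std::strcmp (expanded_clones_attribute, "target") == 0;
  const exclusion target_exclusions[] = {
    { "target_clones", clones_uses_target },
    { nullptr, false },
  };
  return is_excluded (target_exclusions, "target_clones");
}
```

A target whose hook returns "target" keeps the existing mutual
exclusion; any other value (for example "target_version", assumed here
for illustration) disables it.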

The "target_version" attribute is only added to C++ in this patch,
because this is currently the only frontend which supports
multiversioning using the "target" attribute.  Support for the
"target_version" attribute will be extended to C at a later date.

Targets that currently use the "target" attribute for function
multiversioning (i.e. i386 and rs6000) are not affected by this patch.

Ok for master?

gcc/ChangeLog:

* attribs.cc (decl_attributes): Pass attribute name to target.
(is_function_default_version): Update comment to specify
incompatibility with target_version attributes.
* cgraphclones.cc (cgraph_node::create_version_clone_with_body):
Call valid_version_attribute_p for target_version attributes.
* target.def (valid_version_attribute_p): New hook.
(expanded_clones_attribute): New hook.
* doc/tm.texi.in: Add new hooks.
* doc/tm.texi: Regenerate.
* multiple_target.cc (create_dispatcher_calls): Remove redundant
is_function_default_version check.
(expand_target_clones): Use target hook for attribute name.
* targhooks.cc (default_target_option_valid_version_attribute_p):
New.
* targhooks.h (default_target_option_valid_version_attribute_p):
New.
* tree.h (DECL_FUNCTION_VERSIONED): Update comment to include
target_version attributes.

gcc/c-family/ChangeLog:

* c-attribs.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto, and add target_version.
(attr_target_version_exclusions): New.
(c_common_attribute_table): Add target_version.
(handle_target_version_attribute): New.

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto.

gcc/d/ChangeLog:

* d-attribs.cc (CLONES_USES_TARGET): New macro.
(attr_target_exclusions): Use new macro.
(attr_target_clones_exclusions): Ditto.

gcc/cp/ChangeLog:

* decl2.cc (check_classfn): Update comment to include
target_version attributes.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index e33a63948cebdeafc3abcdd539a35141969ad978..8850943cb3326568b4679a73405f50487aa1b7c6 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -143,16 +143,21 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   { NULL, false, false, false },
 };
 
+#define CLONES_USES_TARGET \
+  (strcmp (targetm.target_option.expanded_clones_attribute, \
+  "target") == 0)
+
 static const struct attribute_spec::exclusions attr_target_exclusions[] =
 {
-  { "target_clones", true, true, true },
+  { "target_clones", CLONES_USES_TARGET, CLONES_USES_TARGET,
+CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
 static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
 {
   { "always_inline", true, true, true },
-  { "target", true, true, true },
+  { "target", CLONES_USES_TARGET, CLONES_USES_TARGET, CLONES_USES_TARGET },
   { NULL, false, false, false },
 };
 
diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index f9fd258598914ce2112ecaaeaad6c63cd69a44e2..27533023ef5c481ba085c2f0c605dfb992987b3e 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -657,7 +657,8 @@ decl_attributes (tree *node, tree attributes, int flags,
  options to the attribute((target(...))) list.  */
   if (TREE_CODE (*node) == FUNCTION_DECL
   && current_target_pragma
-  && targetm.target_option.valid_attribute_p (*node, NULL_TREE,
+  && targetm.target_option.valid_attribute_p (*node,
+ get_identifier("target"),
  current_target_pragma, 0))
 {
   tree cur_attr = lookup_attribute ("target", attributes);
@@ -1241,8 +1242,9 @@ make_dispatcher_decl (const tree decl)
   return func_decl;  
 }
 
-/* Returns true if decl is multi-versioned and DECL is the default function,
-   that is it is not tagged with target specific optimization.  */
+/* Returns true if DECL is multi-versioned using the target attribute, and this
+   is the default version.  This function can only be used for targets that do
+   not support the "target_version" attribute.  */
 
 bool
 is_function_default_version (const tree decl)
diff --git 

[PATCH v2 3/5] ada: Improve attribute exclusion handling

2023-11-16 Thread Andrew Carlotti
Change the handling of some attribute mutual exclusions to use the
generic attribute exclusion lists, and fix some asymmetric exclusions by
adding the exclusions for always_inline after noinline or target_clones.

Aside from the new always_inline exclusions, the only change in
functionality is the choice of warning message displayed.  All warnings
about attribute mutual exclusions now use the same message.
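
The symmetry property that this patch restores can be checked
mechanically.  A minimal sketch, using a std::map in place of the
attribute_spec tables (exclusion_table and exclusions_symmetric are
names invented for this example):

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Maps an attribute name to the attributes it excludes.
using exclusion_table = std::map<std::string, std::vector<std::string>>;

// Return true if every exclusion A -> B has a matching B -> A entry.
bool
exclusions_symmetric (const exclusion_table &table)
{
  for (const auto &entry : table)
    for (const std::string &other : entry.second)
      {
        auto it = table.find (other);
        if (it == table.end ())
          return false;
        const std::vector<std::string> &back = it->second;
        if (std::find (back.begin (), back.end (), entry.first)
            == back.end ())
          return false;
      }
  return true;
}
```

Filled with the four lists added in this patch, the check passes;
without the always_inline list, the noinline and target_clones entries
have no counterpart and it fails.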

---

I haven't managed to test the Ada frontend, but this patch (and the following
one) contains only minimal changes to functionality, which I have tested by
copying the code to the C++ frontend and verifying the behaviour of equivalent
changes there.  Is this ok to push without further testing?  If not, then could
someone test this series for me?

gcc/ada/ChangeLog:

* gcc-interface/utils.cc (attr_noinline_exclusions): New.
(attr_always_inline_exclusions): Ditto.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(gnat_internal_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.


diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 8b2c7f99ef3060603658e438b71a3bfa3ef7f2ac..e33a63948cebdeafc3abcdd539a35141969ad978 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -130,6 +130,32 @@ static const struct attribute_spec::exclusions attr_stack_protect_exclusions[] =
   { NULL, false, false, false },
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  { "noinline", true, true, true },
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  { "target_clones", true, true, true },
+  { NULL, false, false, false },
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  { "always_inline", true, true, true },
+  { "target", true, true, true },
+  { NULL, false, false, false },
+};
+
 /* Fake handler for attributes we don't properly support, typically because
they'd require dragging a lot of the common-c front-end circuitry.  */
 static tree fake_attribute_handler (tree *, tree, tree, int, bool *);
@@ -165,7 +191,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "strub",   0, 1, false, true, false, true,
 handle_strub_attribute, NULL },
   { "noinline", 0, 0,  true,  false, false, false,
-handle_noinline_attribute, NULL },
+handle_noinline_attribute, attr_noinline_exclusions },
   { "noclone",  0, 0,  true,  false, false, false,
 handle_noclone_attribute, NULL },
   { "no_icf",   0, 0,  true,  false, false, false,
@@ -175,7 +201,7 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "leaf", 0, 0,  true,  false, false, false,
 handle_leaf_attribute, NULL },
   { "always_inline",0, 0,  true,  false, false, false,
-handle_always_inline_attribute, NULL },
+handle_always_inline_attribute, attr_always_inline_exclusions },
   { "malloc",   0, 0,  true,  false, false, false,
 handle_malloc_attribute, NULL },
   { "type generic", 0, 0,  false, true,  true,  false,
@@ -192,9 +218,9 @@ const struct attribute_spec gnat_internal_attribute_table[] =
   { "simd", 0, 1,  true,  false, false, false,
 handle_simd_attribute, NULL },
   { "target",   1, -1, true,  false, false, false,
-handle_target_attribute, NULL },
+handle_target_attribute, attr_target_exclusions },
   { "target_clones",1, -1, true,  false, false, false,
-handle_target_clones_attribute, NULL },
+handle_target_clones_attribute, attr_target_clones_exclusions },
 
   { "vector_size",  1, 1,  false, true,  false, false,
 handle_vector_size_attribute, NULL },
@@ -6742,16 +6768,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -7050,12 +7067,6 @@ handle_target_attribute (tree *node, tree name, tree args, int flags,
   warning (OPT_Wattributes, "%qE attribute ignored", name);
   *no_add_attrs = true;
 }
-  else if (lookup_attribute 

[PATCH v2 2/5] c-family: Simplify attribute exclusion handling

2023-11-16 Thread Andrew Carlotti
This patch changes the handling of mutual exclusions involving the
target and target_clones attributes to use the generic attribute
exclusion lists.  Additionally, the duplicate handling for the
always_inline and noinline attribute exclusion is removed.

The only change in functionality is the choice of warning message
displayed - due to either a change in the wording for mutual exclusion
warnings, or a change in the order in which different checks occur.

Ok for master?

gcc/c-family/ChangeLog:

* c-attribs.cc (attr_always_inline_exclusions): New.
(attr_target_exclusions): Ditto.
(attr_target_clones_exclusions): Ditto.
(c_common_attribute_table): Add new exclusion lists.
(handle_noinline_attribute): Remove custom exclusion handling.
(handle_always_inline_attribute): Ditto.
(handle_target_attribute): Ditto.
(handle_target_clones_attribute): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mvc2.C:
* g++.target/i386/mvc3.C:


diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 461732f60f7c4031cc6692000fbdddb9f726a035..b3b41ef123a0f171f57acb1b7f7fdde716428c00 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -214,6 +214,13 @@ static const struct attribute_spec::exclusions attr_inline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_always_inline_exclusions[] =
+{
+  ATTR_EXCL ("noinline", true, true, true),
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
 {
   ATTR_EXCL ("always_inline", true, true, true),
@@ -221,6 +228,19 @@ static const struct attribute_spec::exclusions attr_noinline_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_target_exclusions[] =
+{
+  ATTR_EXCL ("target_clones", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
+static const struct attribute_spec::exclusions attr_target_clones_exclusions[] =
+{
+  ATTR_EXCL ("always_inline", true, true, true),
+  ATTR_EXCL ("target", true, true, true),
+  ATTR_EXCL (NULL, false, false, false),
+};
+
 extern const struct attribute_spec::exclusions attr_noreturn_exclusions[] =
 {
   ATTR_EXCL ("alloc_align", true, true, true),
@@ -332,7 +352,7 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_leaf_attribute, NULL },
   { "always_inline",  0, 0, true,  false, false, false,
  handle_always_inline_attribute,
- attr_inline_exclusions },
+ attr_always_inline_exclusions },
   { "gnu_inline", 0, 0, true,  false, false, false,
  handle_gnu_inline_attribute,
  attr_inline_exclusions },
@@ -483,9 +503,11 @@ const struct attribute_spec c_common_attribute_table[] =
   { "error", 1, 1, true,  false, false, false,
  handle_error_attribute, NULL },
   { "target", 1, -1, true, false, false, false,
- handle_target_attribute, NULL },
+ handle_target_attribute,
+ attr_target_exclusions },
   { "target_clones",  1, -1, true, false, false, false,
- handle_target_clones_attribute, NULL },
+ handle_target_clones_attribute,
+ attr_target_clones_exclusions },
   { "optimize",   1, -1, true, false, false, false,
  handle_optimize_attribute, NULL },
   /* For internal use only.  The leading '*' both prevents its usage in
@@ -1397,16 +1419,7 @@ handle_noinline_attribute (tree *node, tree name,
   int ARG_UNUSED (flags), bool *no_add_attrs)
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
-{
-  if (lookup_attribute ("always_inline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with attribute %qs", name, "always_inline");
- *no_add_attrs = true;
-   }
-  else
-   DECL_UNINLINABLE (*node) = 1;
-}
+DECL_UNINLINABLE (*node) = 1;
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1487,22 +1500,9 @@ handle_always_inline_attribute (tree *node, tree name,
 {
   if (TREE_CODE (*node) == FUNCTION_DECL)
 {
-  if (lookup_attribute ("noinline", DECL_ATTRIBUTES (*node)))
-   {
- warning (OPT_Wattributes, "%qE attribute ignored due to conflict "
-  "with %qs attribute", name, "noinline");
- *no_add_attrs = true;
-   }
-  else if (lookup_attribute ("target_clones", DECL_ATTRIBUTES (*node)))

Re: [PATCH v2[1/5] aarch64: Add cpu feature detection to libgcc

2023-11-16 Thread Andrew Carlotti
This is added to enable function multiversioning, but can also be used
directly.  The interface is chosen to match that used in LLVM's
compiler-rt, to facilitate cross-compiler compatibility.

The content of the patch is derived almost entirely from Pavel's prior
contributions to compiler-rt/lib/builtins/cpu_model.c. I have made minor
changes to align more closely with GCC coding style, and to exclude any code
from other LLVM contributors, and am adding this to GCC with Pavel's approval.
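
The basic idea of the detection code, translating kernel-reported HWCAP
bits into a feature bitmask, can be sketched as below.  This is a
simplified illustration, not the contents of cpuinfo.c: the enum is a
small local subset with its own numbering, the function names are
invented, and only the HWCAP bit positions are taken from the fallback
defines in the patch.

```cpp
#include <cstdint>

// Illustrative subset of feature indices; the values are local to this
// sketch and are not the enum CPUFeatures values from the patch.
enum sketch_feature { SK_FP, SK_SIMD, SK_AES, SK_PMULL, SK_MAX };

// HWCAP bit positions, as given by the fallback defines in the patch.
constexpr unsigned long SK_HWCAP_FP = 1UL << 0;
constexpr unsigned long SK_HWCAP_ASIMD = 1UL << 1;
constexpr unsigned long SK_HWCAP_AES = 1UL << 3;
constexpr unsigned long SK_HWCAP_PMULL = 1UL << 4;

// Translate a hwcap word into a feature bitmask with one bit per
// sketch_feature value.
std::uint64_t
features_from_hwcap (unsigned long hwcap)
{
  std::uint64_t features = 0;
  if (hwcap & SK_HWCAP_FP)
    features |= std::uint64_t (1) << SK_FP;
  if (hwcap & SK_HWCAP_ASIMD)
    features |= std::uint64_t (1) << SK_SIMD;
  if (hwcap & SK_HWCAP_AES)
    features |= std::uint64_t (1) << SK_AES;
  if (hwcap & SK_HWCAP_PMULL)
    features |= std::uint64_t (1) << SK_PMULL;
  return features;
}

// Test a single feature bit in a mask built by features_from_hwcap.
bool
has_feature (std::uint64_t features, sketch_feature f)
{
  return ((features >> f) & 1) != 0;
}
```

In the patch the resulting mask is stored in
__aarch64_cpu_features.features, so that later checks only need a
single load and a bit test.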

libgcc/ChangeLog:

* config/aarch64/t-aarch64: Include cpuinfo.c.
* config/aarch64/cpuinfo.c: New file.
(__init_cpu_features_constructor): New.
(__init_cpu_features_resolver): New.
(__init_cpu_features): New.

Co-authored-by: Pavel Iliin 


diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
new file mode 100644
index 
..0888ca4ed058430f524b99cb0e204bd996fa0e55
--- /dev/null
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -0,0 +1,502 @@
+/* CPU feature detection for AArch64 architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+  
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#if defined(__has_include)
+#if __has_include()
+#include 
+
+#if __has_include()
+#include 
+#else
+typedef struct __ifunc_arg_t {
+  unsigned long _size;
+  unsigned long _hwcap;
+  unsigned long _hwcap2;
+} __ifunc_arg_t;
+#endif
+
+#if __has_include()
+#include 
+
+/* CPUFeatures must correspond to the same AArch64 features in aarch64.cc  */
+enum CPUFeatures {
+  FEAT_RNG,
+  FEAT_FLAGM,
+  FEAT_FLAGM2,
+  FEAT_FP16FML,
+  FEAT_DOTPROD,
+  FEAT_SM4,
+  FEAT_RDM,
+  FEAT_LSE,
+  FEAT_FP,
+  FEAT_SIMD,
+  FEAT_CRC,
+  FEAT_SHA1,
+  FEAT_SHA2,
+  FEAT_SHA3,
+  FEAT_AES,
+  FEAT_PMULL,
+  FEAT_FP16,
+  FEAT_DIT,
+  FEAT_DPB,
+  FEAT_DPB2,
+  FEAT_JSCVT,
+  FEAT_FCMA,
+  FEAT_RCPC,
+  FEAT_RCPC2,
+  FEAT_FRINTTS,
+  FEAT_DGH,
+  FEAT_I8MM,
+  FEAT_BF16,
+  FEAT_EBF16,
+  FEAT_RPRES,
+  FEAT_SVE,
+  FEAT_SVE_BF16,
+  FEAT_SVE_EBF16,
+  FEAT_SVE_I8MM,
+  FEAT_SVE_F32MM,
+  FEAT_SVE_F64MM,
+  FEAT_SVE2,
+  FEAT_SVE_AES,
+  FEAT_SVE_PMULL128,
+  FEAT_SVE_BITPERM,
+  FEAT_SVE_SHA3,
+  FEAT_SVE_SM4,
+  FEAT_SME,
+  FEAT_MEMTAG,
+  FEAT_MEMTAG2,
+  FEAT_MEMTAG3,
+  FEAT_SB,
+  FEAT_PREDRES,
+  FEAT_SSBS,
+  FEAT_SSBS2,
+  FEAT_BTI,
+  FEAT_LS64,
+  FEAT_LS64_V,
+  FEAT_LS64_ACCDATA,
+  FEAT_WFXT,
+  FEAT_SME_F64,
+  FEAT_SME_I64,
+  FEAT_SME2,
+  FEAT_RCPC3,
+  FEAT_MAX,
+  FEAT_EXT = 62, /* Reserved to indicate presence of additional features field
+   in __aarch64_cpu_features.  */
+  FEAT_INIT  /* Used as flag of features initialization completion.  */
+};
+
+/* Architecture features used in Function Multi Versioning.  */
+struct {
+  unsigned long long features;
+  /* As features grows new fields could be added.  */
+} __aarch64_cpu_features __attribute__((visibility("hidden"), nocommon));
+
+#ifndef _IFUNC_ARG_HWCAP
+#define _IFUNC_ARG_HWCAP (1ULL << 62)
+#endif
+#ifndef AT_HWCAP
+#define AT_HWCAP 16
+#endif
+#ifndef HWCAP_CPUID
+#define HWCAP_CPUID (1 << 11)
+#endif
+#ifndef HWCAP_FP
+#define HWCAP_FP (1 << 0)
+#endif
+#ifndef HWCAP_ASIMD
+#define HWCAP_ASIMD (1 << 1)
+#endif
+#ifndef HWCAP_AES
+#define HWCAP_AES (1 << 3)
+#endif
+#ifndef HWCAP_PMULL
+#define HWCAP_PMULL (1 << 4)
+#endif
+#ifndef HWCAP_SHA1
+#define HWCAP_SHA1 (1 << 5)
+#endif
+#ifndef HWCAP_SHA2
+#define HWCAP_SHA2 (1 << 6)
+#endif
+#ifndef HWCAP_ATOMICS
+#define HWCAP_ATOMICS (1 << 8)
+#endif
+#ifndef HWCAP_FPHP
+#define HWCAP_FPHP (1 << 9)
+#endif
+#ifndef HWCAP_ASIMDHP
+#define HWCAP_ASIMDHP (1 << 10)
+#endif
+#ifndef HWCAP_ASIMDRDM
+#define HWCAP_ASIMDRDM (1 << 12)
+#endif
+#ifndef HWCAP_JSCVT
+#define HWCAP_JSCVT (1 << 13)
+#endif
+#ifndef HWCAP_FCMA
+#define HWCAP_FCMA (1 << 14)
+#endif
+#ifndef HWCAP_LRCPC
+#define HWCAP_LRCPC (1 << 15)
+#endif
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+#ifndef HWCAP_SHA3
+#define HWCAP_SHA3 (1 << 17)
+#endif
+#ifndef HWCAP_SM3
+#define HWCAP_SM3 (1 << 18)
+#endif
+#ifndef HWCAP_SM4

[PATCH v2 0/5] target_version and aarch64 function multiversioning

2023-11-16 Thread Andrew Carlotti
This series adds support for function multiversioning on aarch64.

Patch 1/5 is a repost of my copy of Pavel's aarch64 cpu feature detection code
to libgcc. This is slightly refactored in a later patch, but I've preserved
this patch as-is to make the attribution clearer.

Patches 2/5 and 3/5 are minor cleanups in the c-family and Ada attribute
exclusion handling, to support further tweaks to attribute exclusion handling
for c-family, Ada and D in patch 4.

Patch 4/5 adds support for the target_version attribute to the middle end and
C++ frontend, but should otherwise have no functional changes.

Patch 5/5 uses this support to implement function multiversioning in aarch64.

I plan to improve the existing documentation and tests, including covering the
new functionality, in subsequent commits (perhaps after fixing some of the
current ABI issues).

I'm happy with the state of patches 2-4. Patches 1 and 5 have various
outstanding issues, most of which require fixes to the ACLE as well.  It might
be best to push these patches in something like their current form, and then
push incremental fixes once we've agreed on the relevant specification changes.

The series passes regression testing on both x86 and aarch64 for C and C++. I
haven't got an Ada or D compiler on my build machine, so I haven't tested these
languages; however, I tested using the same code and making equivalent changes
in the C++ frontend, to verify their (minimal) impact upon attribute processing
functionality.

Thanks,
Andrew


Re: [PATCH 0/5] LoongArch: Initial LA664 support

2023-11-16 Thread chenglulu

Hi,

Thank you very much for the changes, but I think we also need to support
la664 in the configure options.


I also defined ISA_BASE_LA64V110 to represent the LoongArch 1.1
instruction set.  What do you think?



On 2023/11/16 at 9:18 PM, Xi Ruoyao wrote:

The Loongson 3A6000 processor will ship to general users this month,
and it features 4 cores with the new LA664 microarchitecture.  Here are
some changes from LA464:

1. The 32-bit division instruction now ignores the high 32 bits of the
input registers.  This is enumerated via CPUCFG word 0x2, bit 26.
2. The micro architecture now guarantees two loads on the same memory
address won't be reordered with each other.  dbar 0x700 is turned
into nop.
3. The architecture now supports approximate reciprocal and reciprocal
square root instructions (FRECIPE and VRSQRTE) on 32-bit or 64-bit
floating-point values and the vectors of these values.
4. The architecture now supports SC.Q instruction for 128-bit CAS.
5. The architecture now supports LL.ACQ and SC.REL instructions (well, I
don't really know what they are for).
6. The architecture now supports CAS instructions for 64, 32, 16, or 8-bit
values.
7. The architecture now supports atomic add and atomic swap instructions
for 16 or 8-bit values.
8. Some non-zero hint values of DBAR instructions are added.

These features are documented in LoongArch v1.1.  Implementations can
implement any subset of them and enumerate the implemented features via
CPUCFG.  LA664 implements them all.

(8) is already implemented in previous patches because it's completely
backward-compatible.  This series implements (1) and (2) with switches
-mdiv32 and -mld-seq-sa (these names are derived from the names of the
corresponding CPUCFG bits documented in the LoongArch v1.1
specification).

The other features require Binutils support and we are close to the end
of GCC 14 stage 1, so I'm posting this series first now.

With -march=la664, these two options are implicitly enabled but they can
be turned off with -mno-div32 or -mno-ld-seq-sa.

With -march=native, the current CPU is probed via CPUCFG and these
options are implicitly enabled if the CPU supports the corresponding
feature.  They can be turned off with explicit -mno-div32 or
-mno-ld-seq-sa as well.
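
The resolution order described above can be sketched roughly as follows.  This is only an illustration, not the code in the series: cpucfg_bit, Tri and resolve_div32 are hypothetical names, and the CPUCFG word is passed in as a plain integer rather than read from hardware.  The only fact taken from the cover letter is that the division feature is enumerated in CPUCFG word 0x2, bit 26.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: test one bit of a CPUCFG word that the caller has
// already obtained (on real hardware, by executing the cpucfg instruction).
constexpr bool cpucfg_bit(std::uint32_t word, unsigned bit)
{
  return ((word >> bit) & 1u) != 0;
}

// From the cover letter: CPUCFG word 0x2, bit 26 enumerates the new
// 32-bit division behaviour.
constexpr unsigned DIV32_BIT = 26;

// Hypothetical three-state model of a command-line switch:
// not given, -mdiv32, or -mno-div32.
enum class Tri { Unset, On, Off };

// An explicit switch wins; otherwise the probed feature bit decides,
// which models the -march=native case.
constexpr bool resolve_div32(Tri explicit_opt, std::uint32_t cpucfg_word2)
{
  return explicit_opt == Tri::On    ? true
         : explicit_opt == Tri::Off ? false
                                    : cpucfg_bit(cpucfg_word2, DIV32_BIT);
}

static_assert(resolve_div32(Tri::Unset, 1u << 26), "probed bit enables it");
static_assert(!resolve_div32(Tri::Off, 1u << 26), "-mno-div32 overrides");
```

The same shape would apply to -mld-seq-sa, with whatever CPUCFG bit the specification assigns to it.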

-mtune=la664 is implemented as a copy of -mtune=la464 and we can adjust
it with benchmark results later.

Bootstrapped and regtested on a LA664 with BOOT_CFLAGS="-march=la664
-O2" and on a LA464 with BOOT_CFLAGS="-march=native -O2".  I also manually
verified -march=native probing on LA664 and LA464.

Xi Ruoyao (5):
   LoongArch: Switch loongarch-def to C++
   LoongArch: genopts: Add infrastructure to generate code for new
 features in ISA evolution
   LoongArch: Take the advantage of -mdiv32 if it's enabled
   LoongArch: Don't emit dbar 0x700 if -mld-seq-sa
   LoongArch: Add -march=la664 and -mtune=la664

  gcc/config/loongarch/genopts/genstr.sh|  78 ++-
  gcc/config/loongarch/genopts/isa-evolution.in |   2 +
  .../loongarch/genopts/loongarch-strings   |   1 +
  gcc/config/loongarch/genopts/loongarch.opt.in |  10 +
  gcc/config/loongarch/loongarch-cpu.cc |  37 ++--
  gcc/config/loongarch/loongarch-cpucfg-map.h   |  36 +++
  gcc/config/loongarch/loongarch-def-array.h|  40 
  gcc/config/loongarch/loongarch-def.c  | 205 --
  gcc/config/loongarch/loongarch-def.cc | 193 +
  gcc/config/loongarch/loongarch-def.h  |  67 --
  gcc/config/loongarch/loongarch-opts.h |   9 +-
  gcc/config/loongarch/loongarch-str.h  |   8 +-
  gcc/config/loongarch/loongarch-tune.h | 123 ++-
  gcc/config/loongarch/loongarch.cc |   6 +-
  gcc/config/loongarch/loongarch.md |  31 ++-
  gcc/config/loongarch/loongarch.opt|  23 +-
  gcc/config/loongarch/t-loongarch  |  25 ++-
  .../gcc.target/loongarch/div-div32.c  |  31 +++
  .../gcc.target/loongarch/div-no-div32.c   |  11 +
  19 files changed, 664 insertions(+), 272 deletions(-)
  create mode 100644 gcc/config/loongarch/genopts/isa-evolution.in
  create mode 100644 gcc/config/loongarch/loongarch-cpucfg-map.h
  create mode 100644 gcc/config/loongarch/loongarch-def-array.h
  delete mode 100644 gcc/config/loongarch/loongarch-def.c
  create mode 100644 gcc/config/loongarch/loongarch-def.cc
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-div32.c
  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-no-div32.c

>From a22073dc47602e4de7922efe66fd83d6196eb5f9 Mon Sep 17 00:00:00 2001
From: Lulu Cheng 
Date: Thu, 16 Nov 2023 20:43:53 +0800
Subject: [PATCH v1 1/2] LoongArch: Add LA664 support.

Define ISA_BASE_LA64V110, which represents the base instruction set defined in LoongArch1.1.
Support the configure setting --with-arch=la664, and support -march=la664 and -mtune=la664.

gcc/ChangeLog:

	* config.gcc: Support LA664.
	* config/loongarch/genopts/loongarch-strings: 

[PATCH v2] LoongArch: Implement C[LT]Z_DEFINED_VALUE_AT_ZERO

2023-11-16 Thread Li Wei
The LoongArch backend already defines ctz and clz, but for GCC to perform
the CTZ transformation in the forwprop2 pass, it needs to know the value of
c[lt]z at zero.  This can be beneficial for some test cases (such as
spec2017 deepsjeng_r).

After implementing the macros, we measured the dynamic instruction count on
deepsjeng_r:
- before 1688423249186
- after  1660311215745 (1.66% reduction)
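
To illustrate why the value at zero matters, here is a small sketch (not part of the patch).  It assumes the table-lookup idiom below, the well-known De Bruijn multiply, is representative of what the transformation recognizes.  The lookup yields 0 for a zero input, whereas a count that is defined as the operand width at zero yields 32, so the two agree once the count is masked with 31.  ctz32_ref and ctz32_table are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Reference count-trailing-zeros that is *defined* at zero: it returns the
// operand width (32), which is the value the new macros report.
unsigned ctz32_ref(std::uint32_t x)
{
  if (x == 0)
    return 32;
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Classic De Bruijn table lookup.  For x == 0 the index is 0, so the result
// is table[0] == 0 rather than 32.
unsigned ctz32_table(std::uint32_t x)
{
  static const unsigned char table[32] = {
    0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
    31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
  };
  // x & -x isolates the lowest set bit; the multiply moves a unique 5-bit
  // pattern into the top bits.
  return table[((x & (0u - x)) * 0x077CB531u) >> 27];
}
```

With a known value at zero, the lookup can be expressed as ctz32_ref (x) & 31 for every input, including zero.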

gcc/ChangeLog:

* config/loongarch/loongarch.h (CLZ_DEFINED_VALUE_AT_ZERO):
  Implement.
(CTZ_DEFINED_VALUE_AT_ZERO): Same.

gcc/testsuite/ChangeLog:

* gcc.dg/pr90838.c: Add clz/ctz test support on LoongArch.
---
 gcc/config/loongarch/loongarch.h | 5 +
 gcc/testsuite/gcc.dg/pr90838.c   | 5 +
 2 files changed, 10 insertions(+)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index ddac8e98ea9..115222e70fd 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1239,3 +1239,8 @@ struct GTY (()) machine_function
 
 #define TARGET_EXPLICIT_RELOCS \
   (la_opt_explicit_relocs == EXPLICIT_RELOCS_ALWAYS)
+
+#define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2)
diff --git a/gcc/testsuite/gcc.dg/pr90838.c b/gcc/testsuite/gcc.dg/pr90838.c
index 759059683a9..40aad70499d 100644
--- a/gcc/testsuite/gcc.dg/pr90838.c
+++ b/gcc/testsuite/gcc.dg/pr90838.c
@@ -83,3 +83,8 @@ int ctz4 (unsigned long x)
 /* { dg-final { scan-assembler-times "ctz\t" 3 { target { rv32 } } } } */
 /* { dg-final { scan-assembler-times "andi\t" 1 { target { rv32 } } } } */
 /* { dg-final { scan-assembler-times "mul\t" 1 { target { rv32 } } } } */
+
+/* { dg-final { scan-tree-dump-times {= \.CTZ} 4 "forwprop2" { target { loongarch64*-*-* } } } } */
+/* { dg-final { scan-assembler-times "ctz.d\t" 1 { target { loongarch64*-*-* } } } } */
+/* { dg-final { scan-assembler-times "ctz.w\t" 3 { target { loongarch64*-*-* } } } } */
+/* { dg-final { scan-assembler-times "andi\t" 4 { target { loongarch64*-*-* } } } } */
-- 
2.31.1



Re: [pushed][PATCH] LoongArch: Fix scan-assembler-times of lasx/lsx test case.

2023-11-16 Thread chenglulu

Pushed to r14-5544

On 2023/11/16 8:31 PM, Jiahao Xu wrote:

These tests have failed since they were first added.  This patch adjusts
the scan-assembler-times counts to fix them.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: Adjust assembler times.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: Ditto.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Ditto.

diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
index ee9cb1a1fa7..57064eac9dc 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-1.c
@@ -52,13 +52,13 @@ TEST_VAR_ALL (DEF_VCOND_VAR)
  
  /* { dg-final { scan-assembler-times {\txvslt\.b} 4 } } */
  /* { dg-final { scan-assembler-times {\txvslt\.h} 4 } } */
-/* { dg-final { scan-assembler-times {\txvslt\.w} 4 } } */
-/* { dg-final { scan-assembler-times {\txvslt\.d} 4 } } */
+/* { dg-final { scan-assembler-times {\txvslt\.w} 8 } } */
+/* { dg-final { scan-assembler-times {\txvslt\.d} 8 } } */
  /* { dg-final { scan-assembler-times {\txvsle\.b} 4 } } */
  /* { dg-final { scan-assembler-times {\txvsle\.h} 4 } } */
-/* { dg-final { scan-assembler-times {\txvsle\.w} 4 } } */
-/* { dg-final { scan-assembler-times {\txvsle\.d} 4 } } */
+/* { dg-final { scan-assembler-times {\txvsle\.w} 8 } } */
+/* { dg-final { scan-assembler-times {\txvsle\.d} 8 } } */
  /* { dg-final { scan-assembler-times {\txvseq\.b} 4 } } */
  /* { dg-final { scan-assembler-times {\txvseq\.h} 4 } } */
-/* { dg-final { scan-assembler-times {\txvseq\.w} 4 } } */
-/* { dg-final { scan-assembler-times {\txvseq\.d} 4 } } */
+/* { dg-final { scan-assembler-times {\txvseq\.w} 8 } } */
+/* { dg-final { scan-assembler-times {\txvseq\.d} 8 } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-2.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-2.c
index 5f40ed44c2d..55d5a084c88 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-2.c
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vcond-2.c
@@ -67,21 +67,21 @@ TEST_CMP (nule)
  TEST_CMP (nuge)
  TEST_CMP (nugt)
  
-/* { dg-final { scan-assembler-times {\txvfcmp\.ceq\.s} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.ceq\.d} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cne\.s} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cne\.d} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.slt\.s} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.slt\.d} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.sle\.s} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.sle\.d} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cor\.s} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cor\.d} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cun\.s} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cun\.d} 2 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cueq\.s} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cueq\.d} 4 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cule\.s} 8 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cule\.d} 8 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cult\.s} 8 } } */
-/* { dg-final { scan-assembler-times {\txvfcmp\.cult\.d} 8 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.ceq\.s} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.ceq\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cne\.s} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cne\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.slt\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.slt\.d} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.sle\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.sle\.d} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cor\.s} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cor\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cun\.s} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cun\.d} 3 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cueq\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cueq\.d} 6 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cule\.s} 12 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cule\.d} 12 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cult\.s} 12 } } */
+/* { dg-final { scan-assembler-times {\txvfcmp\.cult\.d} 12 } } */
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-vcond-1.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-vcond-1.c
index 138adccfaf9..8c69f0d9bdb 100644
--- a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-vcond-1.c
+++ 

Re: [pushed][PATCH] LoongArch: Increase cost of vector aligned store/load.

2023-11-16 Thread chenglulu

Pushed to r14-5545.

On 2023/11/16 4:44 PM, Jiahao Xu wrote:

Based on SPEC2017 performance evaluation results, it's better to make the
cost of vector aligned store/load equal to the cost of unaligned store/load
so as to avoid odd alignment peeling.

gcc/ChangeLog:

* config/loongarch/loongarch.cc
(loongarch_builtin_vectorization_cost): Adjust.

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 738911661d7..d05743bec87 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3893,11 +3893,9 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
case scalar_stmt:
case scalar_load:
case vector_stmt:
-  case vector_load:
case vec_to_scalar:
case scalar_to_vec:
case scalar_store:
-  case vector_store:
return 1;
  
case vec_promote_demote:
@@ -3905,6 +3903,8 @@ loongarch_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
return LASX_SUPPORTED_MODE_P (mode)
  && !LSX_SUPPORTED_MODE_P (mode) ? 2 : 1;
  
+  case vector_load:
+  case vector_store:
case unaligned_load:
case unaligned_store:
return 2;
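
As a rough model of what the two hunks change, the sketch below compares the cost table before and after.  It is an illustration only: StmtKind is a hypothetical stand-in that covers just the statement kinds visible in the hunks (GCC's real vect_cost_for_stmt has more members, and vec_promote_demote is left out because its cost depends on the mode).

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the statement kinds that appear
// in the hunks above.
enum class StmtKind {
  scalar_stmt, scalar_load, scalar_store,
  vector_stmt, vector_load, vector_store,
  vec_to_scalar, scalar_to_vec,
  unaligned_load, unaligned_store
};

// Before the patch: aligned vector accesses share the cost 1 group.
int cost_before(StmtKind kind)
{
  switch (kind) {
  case StmtKind::unaligned_load:
  case StmtKind::unaligned_store:
    return 2;
  default:
    return 1;
  }
}

// After the patch: aligned vector accesses cost the same as unaligned ones,
// so peeling for alignment no longer looks profitable on cost alone.
int cost_after(StmtKind kind)
{
  switch (kind) {
  case StmtKind::vector_load:
  case StmtKind::vector_store:
  case StmtKind::unaligned_load:
  case StmtKind::unaligned_store:
    return 2;
  default:
    return 1;
  }
}
```

In this model the difference between aligned and unaligned vector accesses drops from 1 to 0, which is the property the commit message relies on.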




Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-16 Thread Lehua Ding

Hi Vladimir,

Thank you so much for your review. Based on your comments, I feel like
there are a lot of issues, especially the long compile time issue. So
I'm going to reorganize and refactor the patches so that as many of them
as possible can be reviewed separately. This way there will be fewer
patches to support subreg in the end. I plan to split it into four
separate patches as below. What do you think?


1. live_subreg problem
2. conflict_hard_regs check refactoring
3. use object instead of allocno to create copies
4. support subreg coalesce
   4.1 ira: Apply live_subreg data to ira
   4.2 lra: Apply live_subreg data to lra
   4.3 ira: Support subreg liveness track
   4.4 lra: Support subreg liveness track

So for the two patches about LRA, maybe you can pause the review and wait
for the revised patches.


On 2023/11/17 5:13, Vladimir Makarov wrote:


On 11/12/23 07:08, Lehua Ding wrote:
This patch changes copy creation so that copies are made between
objects instead of between allocnos.


gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
  gcc/ira-build.cc | 154 +++-
  gcc/ira-color.cc | 541 +++
  gcc/ira-conflicts.cc | 173 +++---
  gcc/ira-emit.cc  |  10 +-
  gcc/ira-int.h    |  10 +-
  gcc/ira.cc   |   5 +-
  6 files changed, 646 insertions(+), 247 deletions(-)
The patch is mostly ok for me except that there are the same issues I 
mentioned in my 1st email. Not changing comments for functions with 
changed interface like function arg types and names (e.g. 
find_allocno_copy) is particularly bad.  It makes the comments confusing 
and wrong.  Also using just "adjust" in changelog entries is too brief. 
You should at least mention that function signature is changed.

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see



-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno all thread shared.  */
+  ira_allocno_t first_thread_allocno;
+  /* The offset start relative to the first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belong to the thread.  */
+  bitmap thread_allocnos;


It is better to use bitmap_head instead of bitmap.  It avoids allocating
a bitmap_head for the bitmap.  There are many places in your patches where
bitmap_head would be a better choice than bitmap (it is especially
profitable if there is a significant probability of an empty bitmap).


Of course the patch can be committed when all the patches are approved
and fixed.





--
Best,
Lehua (RiVAI)


[PATCH] libstdc++: Remove UB from operator+ of months and weekdays.

2023-11-16 Thread Cassio Neri
The following functions invoke signed integer overflow (UB) for some extreme
values of days and months [1]:

  weekday operator+(const weekday& x, const days& y); // #1
  month operator+(const month& x, const months& y);   // #2

For #1, the crux of the problem is that, in libstdc++, days::rep is int64_t.
Other implementations use int32_t and cast operands to int64_t and perform
arithmetic operations without fear of overflowing. For instance, #1 evaluates:

  modulo(static_cast<long long>(unsigned{x}._M_wd) + __y.count(), 7);

Sadly, libstdc++ doesn't have this luxury.  For #2, casting to a larger type
could help but all implementations follow the Standard's "Returns clause"
and evaluate:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Hence, overflow occurs when __y.count() is the minimum value of its type.  When
long long is larger than months::rep, this is a fix:

   modulo(static_cast<long long>(unsigned{__x}) + 11 + __y.count(), 12);

Again, this is not possible for libstdc++. To fix these UB, this patch
implements:

  template <unsigned __d, typename _T>
  unsigned __add_modulo(unsigned __x, _T __y);

which returns the remainder of the Euclidean division of __x + __y by __d
without overflowing.  This function replaces

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the remainder but takes the sum __n as its argument, at
which point the overflow might already have occurred.

In addition to solving the UB issues, __add_modulo allows shorter branchless
code on x86-64 and ARM [2].
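
For readers who want to experiment with the contract, here is a minimal sketch.  It is not the implementation in this patch (it is branchy and makes no attempt to match the code in [2]); add_modulo is a hypothetical name, and the sketch assumes a small divisor such as 7 or 12.  It reduces both operands before adding and takes the magnitude of a negative __y in unsigned arithmetic, so no signed overflow can occur even for the minimum value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Remainder of the Euclidean division of (x + y) by D, computed without
// signed overflow.  D is assumed small (e.g. 7 or 12) so that sums of a few
// values below D fit easily in unsigned.
template <unsigned D, typename T>
constexpr unsigned add_modulo(unsigned x, T y)
{
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "y must be a signed integer type");
  using U = typename std::make_unsigned<T>::type;
  const unsigned xr = x % D;
  if (y >= 0)
    return (xr + static_cast<unsigned>(static_cast<U>(y) % D)) % D;
  // |y| in unsigned arithmetic: well defined even when y is the minimum
  // value of T, where -y would overflow.
  const U magnitude = static_cast<U>(U(0) - static_cast<U>(y));
  const unsigned yr = static_cast<unsigned>(magnitude % D);
  // Subtracting yr modulo D is the same as adding D - yr.
  return (xr + D - yr) % D;
}

// January (1) minus one month is December (12).
static_assert(add_modulo<12>(1u + 11u, std::int64_t{-1}) + 1 == 12, "");
// Weekday 6 plus one day wraps to weekday 0.
static_assert(add_modulo<7>(6u, std::int64_t{1}) == 0, "");
// The minimum value is handled: 2^63 is 1 mod 7, so 0 - 2^63 is 6 mod 7.
static_assert(add_modulo<7>(0u, std::numeric_limits<std::int64_t>::min()) == 6,
              "");
```

The month case follows the form quoted in the patch comment, add_modulo<12>(unsigned{x} + 11, y.count()) + 1.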

[1] https://godbolt.org/z/WqvosbrvG
[2] https://godbolt.org/z/o63794GEE

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix operator+ for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.
---

If desirable, I think I'm able to do something similar for operator-(x, y)
(month/weekday x and months/days y) which is specified as:
Returns: x + -y;
All implementations follow the above and -y overflows when y has the minimum
value of its type.

 libstdc++-v3/include/std/chrono  | 61 
 libstdc++-v3/testsuite/std/time/month/1.cc   |  9 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 47 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  8 +++
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 47 +++
 5 files changed, 148 insertions(+), 24 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/month/2.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/weekday/2.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 10bdd1c4ede..02087a9374c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -497,18 +497,38 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Compute the remainder of the Euclidean division of __x + __y divided by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month and an offset in [0, d - 1] and __y is a duration count.
+  // For instance, [time.cal.month.nonmembers] says that given month x and
+  // months y, to get x + y one must calculate:
+  //
+  // modulo(static_cast<long long>(unsigned{x}) + (y.count() - 1), 12) + 1.
+  //
+  // Since y.count() is a 64-bits signed value the subtraction y.count() - 1
+  // or the addition of this value with static_cast<long long>(unsigned{x})
+  // might overflow.  This function can be used to avoid this problem:
+  // __add_modulo<12>(unsigned{x} + 11, y.count()) + 1;
+  // (More details in the implementation of operator+(month, months).)
+  template <unsigned __d, typename _T>
   constexpr unsigned
-  __modulo(long long __n, unsigned __d)
-  {
-   if (__n >= 0)
- return __n % __d;
-   else
- return (__d + (__n % __d)) % __d;
+  __add_modulo(unsigned __x, _T __y)
+  {
+   using _U = make_unsigned_t<_T>;
+   // For __y >= 0, _U(__y) has the same mathematical value as __y and this
+   // function simply returns (__x + _U(__y)) % d.  Typically, this doesn't
+   // overflow since the range of _U contains many more positive values
+   // than _T's.  For __y < 0, _U(__y) has a mathematical value in the
+   // upper-half range of _U so that adding a positive value to it might
+   // overflow.  Moreover, most likely, _U(__y) != __y mod d.  To fix both
+   // issues we "subtract" from _U(__y) an __offset 

Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread Sam James


Sam James  writes:

> John David Anglin  writes:
>
>> On 2023-11-16 4:52 p.m., Jeff Law wrote:
>>>
>>>
>>> On 11/16/23 10:54, John David Anglin wrote:
 Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
 to trunk.

 This patch works around problem compiling python3.11 by improving
 REG+D address handling.  The change results in smaller code and
 reduced register pressure.

 Dave
 ---

 hppa: Revise REG+D address support to allow long displacements before 
 reload

 In analyzing PR rtl-optimization/112415, I realized that restricting
 REG+D offsets to 5-bits before reload results in very poor code and
 complexities in optimizing these instructions after reload.  The
 general problem is long displacements are not allowed for floating
 point accesses when generating PA 1.1 code.  Even with PA 2.0, there
 is a ELF linker bug that prevents using long displacements for
 floating point loads and stores.

 In the past, enabling long displacements before reload caused issues
 in reload.  However, there have been fixes in the handling of reloads
 for floating-point accesses.  This change allows long displacements
 before reload and corrects a couple of issues in the constraint
 handling for integer and floating-point accesses.

 2023-11-16  John David Anglin  

 gcc/ChangeLog:

 PR rtl-optimization/112415
 * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
 displacements before reload.  Simplify logic flow.  Revise
 comments.
 * config/pa/pa.h (TARGET_ELF64): New define.
 (INT14_OK_STRICT): Update define and comment.
 * config/pa/pa64-linux.h (TARGET_ELF64): Define.
 * config/pa/predicates.md (base14_operand): Don't check
 alignment of short displacements.
 (integer_store_memory_operand): Don't return true when
 reload_in_progress is true.  Remove INT_5_BITS check.
 (floating_point_store_memory_operand): Don't return true when
 reload_in_progress is true.  Use INT14_OK_STRICT to check
 whether long displacements are always okay.
>>> I strongly suspect this is going to cause problems in the end.
>>>
>>> I've already done what you're trying to do.  It'll likely look fine
>>> for an extended period of time, but it will almost certainly break
>>> one day.
>
> Jeff, I don't suppose you could dig out the old bugs/commits just out of
> interest?
>
>> It could happen.  If it happens and can't be fixed, it's easy enough to
>> return false in pa_legitimate_address_p before reload.  Maybe we could add
>> an optimization option for this.

I might hack in an option for local testing so I can quickly check
with/without...

>>
>> As it stands, the code improvement for python is significant.  I don't think 
>> f-m-o can fix things after reload.
>>
>> Hopefully, Sam will test the change with various package builds on gentoo.  
>> Debian is still on gcc-13.
>
> Yeah, happy to do that. We haven't got GCC 14 deployed in the wild, but
> we have it available for people who want to test and opt-in to it.
>
> Fingers crossed it's calm. I'll let you know if it isn't ;)
>
>> I'm not seeing any obvious problems in the gcc testsuite.  It needs testing 
>> on packages that do extensive
>> floating point calculations.
>
> OK, I'll focus on those.
>
>>
>> Dave



Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread Sam James


John David Anglin  writes:

> On 2023-11-16 4:52 p.m., Jeff Law wrote:
>>
>>
>> On 11/16/23 10:54, John David Anglin wrote:
>>> Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
>>> to trunk.
>>>
>>> This patch works around problem compiling python3.11 by improving
>>> REG+D address handling.  The change results in smaller code and
>>> reduced register pressure.
>>>
>>> Dave
>>> ---
>>>
>>> hppa: Revise REG+D address support to allow long displacements before reload
>>>
>>> In analyzing PR rtl-optimization/112415, I realized that restricting
>>> REG+D offsets to 5-bits before reload results in very poor code and
>>> complexities in optimizing these instructions after reload.  The
>>> general problem is long displacements are not allowed for floating
>>> point accesses when generating PA 1.1 code.  Even with PA 2.0, there
>>> is a ELF linker bug that prevents using long displacements for
>>> floating point loads and stores.
>>>
>>> In the past, enabling long displacements before reload caused issues
>>> in reload.  However, there have been fixes in the handling of reloads
>>> for floating-point accesses.  This change allows long displacements
>>> before reload and corrects a couple of issues in the constraint
>>> handling for integer and floating-point accesses.
>>>
>>> 2023-11-16  John David Anglin  
>>>
>>> gcc/ChangeLog:
>>>
>>> PR rtl-optimization/112415
>>> * config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
>>> displacements before reload.  Simplify logic flow.  Revise
>>> comments.
>>> * config/pa/pa.h (TARGET_ELF64): New define.
>>> (INT14_OK_STRICT): Update define and comment.
>>> * config/pa/pa64-linux.h (TARGET_ELF64): Define.
>>> * config/pa/predicates.md (base14_operand): Don't check
>>> alignment of short displacements.
>>> (integer_store_memory_operand): Don't return true when
>>> reload_in_progress is true.  Remove INT_5_BITS check.
>>> (floating_point_store_memory_operand): Don't return true when
>>> reload_in_progress is true.  Use INT14_OK_STRICT to check
>>> whether long displacements are always okay.
>> I strongly suspect this is going to cause problems in the end.
>>
>> I've already done what you're trying to do.  It'll likely look fine
>> for an extended period of time, but it will almost certainly break
>> one day.

Jeff, I don't suppose you could dig out the old bugs/commits just out of
interest?

> It could happen.  If it happens and can't be fixed, it's easy enough to
> return false in pa_legitimate_address_p before reload.  Maybe we could add
> an optimization option for this.
>
> As it stands, the code improvement for python is significant.  I don't think 
> f-m-o can fix things after reload.
>
> Hopefully, Sam will test the change with various package builds on gentoo.  
> Debian is still on gcc-13.

Yeah, happy to do that. We haven't got GCC 14 deployed in the wild, but
we have it available for people who want to test and opt-in to it.

Fingers crossed it's calm. I'll let you know if it isn't ;)

> I'm not seeing any obvious problems in the gcc testsuite.  It needs testing 
> on packages that do extensive
> floating point calculations.

OK, I'll focus on those.

>
> Dave



RFC: Problem with UNSPEC/UNSPEC_VOLATILE and modes

2023-11-16 Thread Jeff Law

So I'm looking for thoughts from the community on this one.

Let's take this RTL:


(insn 10 9 11 2 (set (reg:SI 144)
        (unspec_volatile [
                (const_int 0 [0])
            ] UNSPECV_FRFLAGS)) "j.c":11:3 discrim 1 362 {riscv_frflags}
     (nil))
(insn 11 10 55 2 (set (reg:DI 140 [ _12 ])
        (sign_extend:DI (reg:SI 144))) "j.c":11:3 discrim 1 122 {*extendsidi2_internal}
     (expr_list:REG_DEAD (reg:SI 144)
        (nil)))


Assume we have a pass that can look at how (reg:DI 140) is used and 
ultimately determine that bits 32..63 are never read.  So that pass 
turns the SIGN_EXTEND into a lowpart SUBREG:



(insn 10 9 11 2 (set (reg:SI 144)
        (unspec_volatile [
                (const_int 0 [0])
            ] UNSPECV_FRFLAGS)) "j.c":11:3 discrim 1 362 {riscv_frflags}
     (nil))
(insn 11 10 55 2 (set (reg:DI 140 [ _12 ])
        (subreg:DI (reg:SI 144) 0)) "j.c":11:3 discrim 1 206 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:SI 144)
        (nil)))



Combine comes along and tries to substitute the UNSPEC_VOLATILE into the 
use of (reg:SI 144).  We end up calling simplify_subreg with the inner 
mode being VOIDmode (from the UNSPEC_VOLATILE which has no mode).  That 
triggers an assertion failure.



So one obvious fix is to require an UNSPEC_VOLATILE to have a mode when
it is one arm of a SET.  I see ~40 cases where this can happen with a
bit of grepping of the .md files across all the ports.  So it's
probably a tractable problem.  I could easily argue this is the right
fix.  The docs for set indicate that if the destination is a reg, subreg
or mem (which all have a mode) then the source must be valid for the
destination's mode.


Another approach would be to remove the assert from simplify_subreg and 
just return NULL_RTX.  It's a trivial change, though I must admit I 
don't like removing checks, even if it's the easiest way to fix this 
problem.


A third approach would be to adjust combine to avoid the problem, 
perhaps in can_combine_p or somewhere else.  This feels a bit like a 
hack to me and I would expect we can get into the same scenario from 
other optimization passes.


Whatever we do for UNSPEC_VOLATILE we probably should be doing for 
UNSPEC as well.


Again, looking for community input on this one.  If left to my own 
devices I'd probably be looking to add modes to the relevant modeless 
unspecs.


Jeff


Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 7:07 PM Bruno Haible  wrote:

> David Edelsohn wrote:
> > > ibm-clang links against libpthread.a as well:
> > > $ ldd /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig
> > > /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig needs:
> > >  /usr/lib/libpthreads.a(shr_xpg5_64.o)
> > >  /usr/opt/zlibNX/lib/libz.a(libz.so.1)
> > >  /usr/lib/libcurses.a(shr42_64.o)
> > >  /usr/lib/libiconv.a(shr4_64.o)
> > >  /usr/lib/libc++.a(shr_64.o)
> > >  /usr/lib/libc++abi.a(libc++abi.so.1)
> > >  /usr/lib/libc.a(shr_64.o)
> > >  /usr/lib/libpthreads.a(_shr_xpg5_64.o)
> > >  /usr/lib/libc++.a(libc++.so.1)
> > >  /usr/lib/libunwind.a(libunwind.so.1)
> > >  /usr/lib/libc.a(_shr_64.o)
> > >  /unix
> > >  /usr/lib/libcrypt.a(shr_64.o)
> > >
> >
> > I have asked the IBM Clang team why ibm-clang depends on libpthreads.
>
> The reason is that
>   - For a library, it is a normal expectation nowadays that it is
> multithread-safe.
>   - Making a library multithread-safe (without major hacks) means to do
> locking or to call pthread_once / call_once in some places.
>   - The ISO C 11 threading functions in libc have some drawbacks compared
> to the pthread functions. [1] So most developers prefer to rely on the
> POSIX threads API.
>   - Since AIX does not have the POSIX mutex functions in libc and does not
> support weak symbols like in ELF, this means a dependency to
> pthread_mutex_lock or pthread_once.
>   - Accordingly, in the list of libraries above, 3 libraries need pthread*
> symbols:
>
> $ nm -X 64 /usr/lib/libc++abi.a | grep ' U ' | grep pthread_mutex
> pthread_mutex_lock   U   -
> pthread_mutex_unlock U   -
> $ nm -X 64 /usr/lib/libc++.a | grep ' U ' | grep pthread_mutex
> pthread_mutex_destroy U   -
> pthread_mutex_init   U   -
> pthread_mutex_lock   U   -
> pthread_mutex_trylock U   -
> pthread_mutex_unlock U   -
> pthread_mutexattr_destroy U   -
> pthread_mutexattr_init U   -
> pthread_mutexattr_settype U   -
> $ nm -X 64 /usr/opt/zlibNX/lib/libz.a | grep ' U ' | grep pthread_mutex
> pthread_mutex_destroy U   -
> pthread_mutex_init   U   -
> pthread_mutex_lock   U   -
> pthread_mutex_unlock U   -
>

There are ibm_clang and ibm_clang_r (previously xlc and xlc_r) to compile
with and without thread safety.  If the IBM Clang team
chose to provide only a thread-safe version of libc++, okay, but that
doesn't seem like a fundamental requirement.
zlibNX is another can of worms.

David


[PATCH 1/2] Support reduc_{plus, xor, and, ior}_scal_m for vector integer mode.

2023-11-16 Thread liuhongt
BB vectorizer relies on the backend support of
.REDUC_{PLUS,IOR,XOR,AND} to vectorize reduction.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.
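
For illustration only, a minimal sketch of the kind of straight-line
reduction that the BB vectorizer can map to .REDUC_XOR or .REDUC_AND when
the backend provides the expanders.  The function names below are
hypothetical; they only mirror the plus_*/ior_* naming of the new tests
and are not part of the patch.  xor_reduce_halving is a scalar model, under
the assumption that the wide-mode expander folds the high half into the low
half and then recurses on the narrower vector.

```c
#include <assert.h>

/* Hypothetical example: four XORs on adjacent elements, the shape the
   BB vectorizer looks for when forming a .REDUC_XOR.  */
__attribute__ ((noinline)) int
xor_v4si (const int *a)
{
  int r = 0;
  r ^= a[0];
  r ^= a[1];
  r ^= a[2];
  r ^= a[3];
  return r;
}

/* Hypothetical example: eight ANDs on adjacent shorts (.REDUC_AND).  */
__attribute__ ((noinline)) short
and_v8hi (const short *a)
{
  short r = -1;
  r &= a[0];
  r &= a[1];
  r &= a[2];
  r &= a[3];
  r &= a[4];
  r &= a[5];
  r &= a[6];
  r &= a[7];
  return r;
}

/* Scalar model of a halving reduction: combine the high half with the
   low half, then repeat on the result until one element is left.
   N must be a power of two and at most 16 in this sketch.  */
int
xor_reduce_halving (const int *v, int n)
{
  int tmp[16];
  assert (n >= 1 && n <= 16 && (n & (n - 1)) == 0);
  for (int i = 0; i < n; i++)
    tmp[i] = v[i];
  for (int half = n / 2; half >= 1; half /= 2)
    for (int i = 0; i < half; i++)
      tmp[i] ^= tmp[i + half];
  return tmp[0];
}
```

Whether a given function is actually vectorized depends on the options and
the cost model; the slp2 dump used by the new tests is the place to check.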

gcc/ChangeLog:

PR target/112325
* config/i386/sse.md (reduc__scal_): New expander.
(REDUC_ANY_LOGIC_MODE): New iterator.
(REDUC_PLUS_MODE): Extend to VxHI/SI/DImode.
(REDUC_SSE_PLUS_MODE): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112325-1.c: New test.
* gcc.target/i386/pr112325-2.c: New test.
---
 gcc/config/i386/sse.md |  48 -
 gcc/testsuite/gcc.target/i386/pr112325-1.c | 116 +
 gcc/testsuite/gcc.target/i386/pr112325-2.c |  38 +++
 3 files changed, 199 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112325-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr112325-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d250a6cb802..f94a77d0b6d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3417,7 +3417,9 @@ (define_insn "sse3_hv4sf3"
 
 (define_mode_iterator REDUC_SSE_PLUS_MODE
  [(V2DF "TARGET_SSE") (V4SF "TARGET_SSE")
-  (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")])
+  (V8HF "TARGET_AVX512FP16 && TARGET_AVX512VL")
+  (V8HI "TARGET_SSE2") (V4SI "TARGET_SSE2")
+  (V2DI "TARGET_SSE2")])
 
 (define_expand "reduc_plus_scal_"
  [(plus:REDUC_SSE_PLUS_MODE
@@ -3458,8 +3460,12 @@ (define_mode_iterator REDUC_PLUS_MODE
   (V8DF "TARGET_AVX512F && TARGET_EVEX512")
   (V16SF "TARGET_AVX512F && TARGET_EVEX512")
   (V32HF "TARGET_AVX512FP16 && TARGET_AVX512VL && TARGET_EVEX512")
-  (V32QI "TARGET_AVX")
-  (V64QI "TARGET_AVX512F && TARGET_EVEX512")])
+  (V32QI "TARGET_AVX") (V16HI "TARGET_AVX")
+  (V8SI "TARGET_AVX")  (V4DI "TARGET_AVX")
+  (V64QI "TARGET_AVX512F && TARGET_EVEX512")
+  (V32HI "TARGET_AVX512F && TARGET_EVEX512")
+  (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+  (V8DI "TARGET_AVX512F && TARGET_EVEX512")])
 
 (define_expand "reduc_plus_scal_"
  [(plus:REDUC_PLUS_MODE
@@ -3597,6 +3603,42 @@ (define_insn "reduces"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+(define_expand "reduc__scal_"
+ [(any_logic:VI_128
+(match_operand: 0 "register_operand")
+(match_operand:VI_128 1 "register_operand"))]
+ "TARGET_SSE2"
+{
+  rtx tmp = gen_reg_rtx (mode);
+  ix86_expand_reduc (gen_3, tmp, operands[1]);
+  emit_insn (gen_vec_extract (operands[0],
+  tmp, const0_rtx));
+  DONE;
+})
+
+(define_mode_iterator REDUC_ANY_LOGIC_MODE
+ [(V32QI "TARGET_AVX") (V16HI "TARGET_AVX")
+  (V8SI "TARGET_AVX")  (V4DI "TARGET_AVX")
+  (V64QI "TARGET_AVX512F && TARGET_EVEX512")
+  (V32HI "TARGET_AVX512F && TARGET_EVEX512")
+  (V16SI "TARGET_AVX512F && TARGET_EVEX512")
+  (V8DI "TARGET_AVX512F && TARGET_EVEX512")])
+
+(define_expand "reduc__scal_"
+ [(any_logic:REDUC_ANY_LOGIC_MODE
+   (match_operand: 0 "register_operand")
+   (match_operand:REDUC_ANY_LOGIC_MODE 1 "register_operand"))]
+ ""
+{
+  rtx tmp = gen_reg_rtx (mode);
+  emit_insn (gen_vec_extract_hi_ (tmp, operands[1]));
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx tmp3 = gen_lowpart (mode, operands[1]);
+  emit_insn (gen_3 (tmp2, tmp, tmp3));
+  emit_insn (gen_reduc__scal_ (operands[0], tmp2));
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel floating point comparisons
diff --git a/gcc/testsuite/gcc.target/i386/pr112325-1.c b/gcc/testsuite/gcc.target/i386/pr112325-1.c
new file mode 100644
index 000..56e20c156f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112325-1.c
@@ -0,0 +1,116 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bw -O2 -mtune=generic -mprefer-vector-width=512 -fdump-tree-slp2" } */
+/* { dg-final { scan-tree-dump-times ".REDUC_PLUS" 3 "slp2" } } */
+/* { dg-final { scan-tree-dump-times ".REDUC_IOR" 4 "slp2" } } */
+
+int
+__attribute__((noipa))
+plus_v4si (int* a)
+{
+  int sum = 0;
+  sum += a[0];
+  sum += a[1];
+  sum += a[2];
+  sum += a[3];
+  return sum;
+}
+
+short
+__attribute__((noipa))
+plus_v8hi (short* a)
+{
+  short sum = 0;
+  sum += a[0];
+  sum += a[1];
+  sum += a[2];
+  sum += a[3];
+  sum += a[4];
+  sum += a[5];
+  sum += a[6];
+  sum += a[7];
+  return sum;
+}
+
+long long
+__attribute__((noipa))
+plus_v8di (long long* a)
+{
+  long long sum = 0;
+  sum += a[0];
+  sum += a[1];
+  sum += a[2];
+  sum += a[3];
+  sum += a[4];
+  sum += a[5];
+  sum += a[6];
+  sum += a[7];
+  return sum;
+}
+
+int
+__attribute__((noipa))
+ior_v4si (int* a)
+{
+  int sum = 0;
+  sum |= a[0];
+  sum |= a[1];
+  sum |= a[2];
+  sum |= a[3];
+  return sum;
+}
+
+short
+__attribute__((noipa))
+ior_v8hi (short* a)
+{
+  short sum = 0;
+  sum |= a[0];
+  sum |= a[1];
+  sum |= a[2];
+  sum |= a[3];
+  sum |= a[4];
+  sum |= a[5];
+  sum |= a[6];
+  sum |= a[7];
+  return sum;
+}
+
+long long
+__attribute__((noipa))
+ior_v8di (long long* a)

[PATCH 2/2] Add i?86-*-* and x86_64-*-* to vect_logical_reduc

2023-11-16 Thread liuhongt
The x86 backend supports reduc_{and,ior,xor}_scal_m for vector integer
modes.

Ok for trunk?

gcc/testsuite/ChangeLog:

* lib/target-supports.exp (vect_logical_reduc): Add i?86-*-*
and x86_64-*-*.
---
 gcc/testsuite/lib/target-supports.exp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b6a2e4fd096..30dd39508f8 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9257,7 +9257,8 @@ proc check_effective_target_vect_call_roundf { } {
 proc check_effective_target_vect_logical_reduc { } {
 return [expr { [check_effective_target_aarch64_sve]
   || [istarget amdgcn-*-*]
-  || [check_effective_target_riscv_v] }]
+  || [check_effective_target_riscv_v]
+  || [istarget i?86-*-*] || [istarget x86_64-*-*]}]
 }
 
 # Return 1 if the target supports the fold_extract_last optab.
-- 
2.31.1



Re: building GNU gettext on AIX

2023-11-16 Thread Bruno Haible
David Edelsohn wrote:
> > ibm-clang links against libpthread.a as well:
> > $ ldd /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig
> > /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig needs:
> >  /usr/lib/libpthreads.a(shr_xpg5_64.o)
> >  /usr/opt/zlibNX/lib/libz.a(libz.so.1)
> >  /usr/lib/libcurses.a(shr42_64.o)
> >  /usr/lib/libiconv.a(shr4_64.o)
> >  /usr/lib/libc++.a(shr_64.o)
> >  /usr/lib/libc++abi.a(libc++abi.so.1)
> >  /usr/lib/libc.a(shr_64.o)
> >  /usr/lib/libpthreads.a(_shr_xpg5_64.o)
> >  /usr/lib/libc++.a(libc++.so.1)
> >  /usr/lib/libunwind.a(libunwind.so.1)
> >  /usr/lib/libc.a(_shr_64.o)
> >  /unix
> >  /usr/lib/libcrypt.a(shr_64.o)
> >
> 
> I have asked the IBM Clang team why ibm-clang depends on libpthreads.

The reason is that
  - For a library, it is a normal expectation nowadays that it is
multithread-safe.
  - Making a library multithread-safe (without major hacks) means to do
locking or to call pthread_once / call_once in some places.
  - The ISO C 11 threading functions in libc have some drawbacks compared
to the pthread functions. [1] So most developers prefer to rely on the
POSIX threads API.
  - Since AIX does not have the POSIX mutex functions in libc and does not
support weak symbols like in ELF, this means a dependency to
pthread_mutex_lock or pthread_once.
  - Accordingly, in the list of libraries above, 3 libraries need pthread*
symbols:

$ nm -X 64 /usr/lib/libc++abi.a | grep ' U ' | grep pthread_mutex
pthread_mutex_lock   U   -
pthread_mutex_unlock U   -
$ nm -X 64 /usr/lib/libc++.a | grep ' U ' | grep pthread_mutex
pthread_mutex_destroy U   -
pthread_mutex_init   U   -
pthread_mutex_lock   U   -
pthread_mutex_trylock U   -
pthread_mutex_unlock U   -
pthread_mutexattr_destroy U   -
pthread_mutexattr_init U   -
pthread_mutexattr_settype U   -
$ nm -X 64 /usr/opt/zlibNX/lib/libz.a | grep ' U ' | grep pthread_mutex
pthread_mutex_destroy U   -
pthread_mutex_init   U   -
pthread_mutex_lock   U   -
pthread_mutex_unlock U   -

Bruno

[1] Lock initialization is clumsy. The return value of a thread is only an
'int', not a pointer. Etc.
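
As a minimal sketch of the pattern described above (this is not code from
gettext or from any of the libraries listed; lib_init, lib_next_id and
lib_init_calls are hypothetical names), a library can use pthread_once for
one-time setup and a statically initialized mutex around shared state:

```c
#include <pthread.h>

/* Both objects have static initializers, so no explicit setup call is
   needed before first use.  */
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

static int init_calls;   /* how often lib_init ran */
static int next_id;      /* shared state protected by state_lock */

/* One-time initialization; pthread_once guarantees a single call even
   if several threads enter lib_next_id at the same time.  */
static void
lib_init (void)
{
  init_calls++;
  next_id = 100;
}

/* Hypothetical thread-safe entry point of the library.  */
int
lib_next_id (void)
{
  int id;
  pthread_once (&init_once, lib_init);
  pthread_mutex_lock (&state_lock);
  id = next_id++;
  pthread_mutex_unlock (&state_lock);
  return id;
}

/* Test helper: number of times the initializer has run.  */
int
lib_init_calls (void)
{
  return init_calls;
}
```

A library written this way has undefined references to pthread_once and
pthread_mutex_lock, which is the kind of dependency the nm output above
shows.  ISO C11 mtx_t has no static initializer, so the equivalent code
would need a call_once just to run mtx_init.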





Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 5:52 PM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > On Thu, Nov 16, 2023 at 5:22 PM Arsen Arsenović  wrote:
> >
> >>
> >> David Edelsohn  writes:
> >>
> >> > Don't build with the dependent libraries in tree.  Don't build the
> >> > dependent libraries as shared libraries. The libraries are already
> built
> >> > and in /opt/cfarm, as mentioned in the Compile Farm wiki.
> >> >
> >> > AIX is not Solaris and not Linux.  It doesn't use ELF.  AIX shared
> >> > libraries *ARE* shared object files in archives.  Shared object
> >> versioning
> >> > is handled by multiple objects in the same archive.
> >>
> >> Hmm, I see.  I removed all the deps but gettext from the tree.
> >>
> >> This leaves gettext-runtime fulfilling the previous role of intl/.
> >>
> >> However, I'm confused about how this worked before, in that case, since,
> >> IIRC, intl also produced libraries and was also put into host exports.
> >>
> >> Leaving gettext in tree produces:
> >>
> >> Could not load program gawk:
> >> Dependent module
> >> /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8) could
> not be
> >> loaded.
> >> Member libintl.so.8 is not found in archive
> >>
> >> I'll try to see why intl/ didn't cause the same issue soon.
> >>
> >> Thanks, have a lovely evening.
> >>
> >
> > The previous version of "intl" was built as a static library.  Configure
> in
> > the older package had the option --enable-host-shared,
> > which I did not use.  Based on the failure message, the in-tree gettext
> > seems to be built as a shared library.  If you explicitly
> > pass --disable-shared to the in-tree configure, you may get farther.  I'm
> > currently using --disable-shared --disable-threads.
> > As we have discussed, the current gettext will retain some references to
> > pthreads despite the configure option.
>
> Sure, but my patch does insert --disable-shared:
>
> --8<---cut here---start->8---
> host_modules= { module= gettext; bootstrap=true; no_install=true;
> module_srcdir= "gettext/gettext-runtime";
> // We always build gettext with pic, because some packages
> (e.g. gdbserver)
> // need it in some configuratons, which is determined via
> nontrivial tests.
> // Always enabling pic seems to make sense for something
> tied to
> // user-facing output.
> extra_configure_flags='--disable-shared --disable-java
> --disable-csharp --with-pic';
> lib_path=intl/.libs; };
> --8<---cut here---end--->8---
>
> ... and it is applied:
>
> --8<---cut here---start->8---
> -bash-5.1$ ./config.status --config
> --srcdir=../../gcc/gettext/gettext-runtime --cache-file=./config.cache
>   --disable-werror --with-gmp=/opt/cfarm
>   --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
>   --with-included-gettext --program-transform-name=s,y,y,
>   --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
>   --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
>   --disable-intermodule --enable-checking=yes,types,extra
>   --disable-coverage --enable-languages=c,c++
>   --disable-build-format-warnings --disable-shared --disable-java
>   --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
>   host_alias=powerpc-ibm-aix7.3.1.0 target_alias=powerpc-ibm-aix7.3.1.0
>   CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
>   -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
> --8<---cut here---end--->8---
>
> I'm unsure how to tell what the produced binaries are w.r.t static or
> shared, but I only see .o files inside intl/.libs/libintl.a, while I see
> a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)
>

An AIX shared library created by libtool will look like
libfoo.a[libfoo.so.N], where N is the package major version number.
Normally with one file.

An AIX static library will look like libfoo.a[a.o, b.o, c.o]
with multiple object files.

An AIX archive can contain a combination of shared objects and
normal object files.

AIX normally uses the convention shr.o or shr_64.o for the name
of the shared object file.  Hint, hint, an AIX archive can contain
both 32 bit and 64 bit object files or shared objects.

I don't know why the gettext build system would create
/home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8)
if --disable-shared was requested.  That clearly is using the
naming of a libtool AIX shared object and failing due to
the missing shared object.  Although in this case, the problem
seems to be the shared library load path.  AIX uses LIBPATH,
not LD_LIBRARY_PATH.

Also, for me, the out of tree path was

gettext/gettext-runtime/intl/.libs

Is your search path missing a level?

Thanks, David


>
> I do see that the build system adds intl to the LD_LIBRARY_PATH.
>
> I will be testing dropping 

Re: [PATCH] c, c++: Add new value for vector types for __builtin_classify_type (type)

2023-11-16 Thread Joseph Myers
On Thu, 16 Nov 2023, Jason Merrill wrote:

> On 11/11/23 03:22, Jakub Jelinek wrote:
> > Hi!
> > 
> > While filing a clang request to return 18 on _BitInts for
> > __builtin_classify_type instead of -1 they return currently, I've
> > noticed that we return -1 for vector types.  I'm not convinced it is a good
> > idea to change behavior of __builtin_classify_type (vector_expression)
> > after 22 years it behaved one way (returned -1), but the
> > __builtin_classify_type (type) form is a new extension added for GCC 14,
> > so this patch returns 19 for vectors just in that second form.  Many other
> > return values are only accessible from the second form as well (mostly
> > because
> > of argument promotions), so I think it is fine like that.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> The C++ changes are OK (and obvious).  I'm skeptical of the choice to keep
> returning -1 for the expression form, it seems more likely to cause problems
> (due to it disagreeing with the type form) than changing it (due to old code
> somehow relying on -1?).  But people who are more familiar with the use of
> __builtin_classify_type should make the call.

I'm also doubtful of keeping returning -1 for vectors in expression form 
(I'd be surprised if people are actually using __builtin_classify_type 
with vectors).  The C changes are OK (but the front-end changes wouldn't 
be needed at all if the vector and type argument cases aren't 
distinguished).
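
For readers unfamiliar with the builtin, a small sketch of the expression
form follows.  The helper names are hypothetical.  The values 1, 8 and 5
are the long-standing integer, real and pointer classes; what the vector
case returns is exactly what is under discussion here, so the sketch makes
no claim about it.

```c
/* GNU vector extension type: four ints in 16 bytes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Expression form on an int: integer class (1).  */
int
classify_int (void)
{
  int i = 0;
  return __builtin_classify_type (i);
}

/* Expression form on a double: real class (8).  */
int
classify_double (void)
{
  double d = 0.0;
  return __builtin_classify_type (d);
}

/* Expression form on a pointer: pointer class (5).  */
int
classify_pointer (void)
{
  int i = 0;
  int *p = &i;
  return __builtin_classify_type (p);
}

/* Expression form on a vector: the result depends on the compiler
   version and on the outcome of this discussion.  */
int
classify_vector (void)
{
  v4si v = { 0, 0, 0, 0 };
  return __builtin_classify_type (v);
}
```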

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 5:47 PM Bruno Haible  wrote:

> Hi David,
>
> > the default, distributed libintl library will not allow GCC to be built
> > with NLS enabled.
>
> The problem is this configure test from gettext.m4
>
>   checking for GNU gettext in libintl... no
>
> It should say
>
>   checking for GNU gettext in libintl... yes
>
> I reproduce it with simple hello-world package, outside GCC.
>
> It tests whether a program that uses gettext() can be linked with
>   -lintl -liconv
> But now, on AIX, it needs to test whether such a program can be linked with
>   -lintl -liconv -lpthread
>
> > Were you suggesting that --enable-threads=isoc would work now or that it
> > would require further changes for a future release?
>
> It requires a change, effectively to do as if HAVE_PTHREAD_API is undefined
> if --enable-threads=isoc was provided.
>
> I can prepare a new gettext release that has both issues fixed:
>   - gettext.m4 that fixes the configure test and sets the variable LIBINTL
> to "-Lsome/libdir -lintl -liconv -lpthread",
>   - mbrtowc.o and setlocale*.o that use mtx_* locks instead of pthread_*
> mutexes when requested.
>
> But you then need to make up your mind w.r.t. what I wrote in the earlier
> mail.
>
>   * GCC can pass --enable-threads=isoc, to avoid the libpthread dependency
> on AIX ≥ 7.2.
>

I have reached out to the AIX Open Source Tools team for their
perspective.  Normally GCC and other
FOSS packages have not based their support for OS versions on official
vendor support lifecycles,
within reason.  In fact, many users have appreciated longer duration
support.


>
>   * Or GCC can (continue to?) use the variable LIBINTL. This will work on
> AIX 7.1 as well but the programs will then be linked against
> libpthread.
> One additional library.
> $ ldd gcc
> /opt/freeware/bin/gcc needs:
>  /usr/lib/libc.a(shr.o)
>  /opt/freeware/lib/libiconv.a(libiconv.so.2)
>  /usr/lib/libc.a(_shr.o)
>  /unix
>  /usr/lib/libcrypt.a(shr.o)
>  /opt/freeware/lib/libgcc_s.a(shr.o)
> libpthread.a will be added to this list.
>

My builds of GCC only rely on AIX libc.  All other libraries are statically
linked.


>
> ibm-clang links against libpthread.a as well:
> $ ldd /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig
> /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig needs:
>  /usr/lib/libpthreads.a(shr_xpg5_64.o)
>  /usr/opt/zlibNX/lib/libz.a(libz.so.1)
>  /usr/lib/libcurses.a(shr42_64.o)
>  /usr/lib/libiconv.a(shr4_64.o)
>  /usr/lib/libc++.a(shr_64.o)
>  /usr/lib/libc++abi.a(libc++abi.so.1)
>  /usr/lib/libc.a(shr_64.o)
>  /usr/lib/libpthreads.a(_shr_xpg5_64.o)
>  /usr/lib/libc++.a(libc++.so.1)
>  /usr/lib/libunwind.a(libunwind.so.1)
>  /usr/lib/libc.a(_shr_64.o)
>  /unix
>  /usr/lib/libcrypt.a(shr_64.o)
>

I have asked the IBM Clang team why ibm-clang depends on libpthreads.

One option is to add -lpthreads to the link line, even if the tools are not
built
thread-safe.  Only the intl support would invoke the extraneous overhead.
The downside is that other pthreads dependencies could sneak in.

Thanks, David


>
> Bruno
>
>
>
>


Re: building GNU gettext on AIX

2023-11-16 Thread Arsen Arsenović

Bruno Haible  writes:

> Hi David,
>
>> the default, distributed libintl library will not allow GCC to be built
>> with NLS enabled.
>
> The problem is this configure test from gettext.m4
>
>   checking for GNU gettext in libintl... no
>
> It should say
>
>   checking for GNU gettext in libintl... yes
>
> I reproduce it with simple hello-world package, outside GCC.
>
> It tests whether a program that uses gettext() can be linked with
>   -lintl -liconv
> But now, on AIX, it needs to test whether such a program can be linked with
>   -lintl -liconv -lpthread
>
>> Were you suggesting that --enable-threads=isoc would work now or that it
>> would require further changes for a future release?
>
> It requires a change, effectively to do as if HAVE_PTHREAD_API is undefined
> if --enable-threads=isoc was provided.
>
> I can prepare a new gettext release that has both issues fixed:
>   - gettext.m4 that fixes the configure test and sets the variable LIBINTL
> to "-Lsome/libdir -lintl -liconv -lpthread",
>   - mbrtowc.o and setlocale*.o that use mtx_* locks instead of pthread_*
> mutexes when requested.
>
> But you then need to make up your mind w.r.t. what I wrote in the earlier
> mail.
>
>   * GCC can pass --enable-threads=isoc, to avoid the libpthread dependency
> on AIX ≥ 7.2.

Hmm, would that option work everywhere, though?  Or would we have to
wire up configury to detect which flag to use?  If so, what would it
look like.

>   * Or GCC can (continue to?) use the variable LIBINTL. This will work on

If you mean the one generated by gettext.m4/uninstalled-config.sh, it is
utilized today:

LIBS = @LIBS@ libcommon.a $(CPPLIB) $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) \
$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS)

(from gcc/Makefile.in)


> AIX 7.1 as well but the programs will then be linked against libpthread.
> One additional library.
> $ ldd gcc
> /opt/freeware/bin/gcc needs:
>  /usr/lib/libc.a(shr.o)
>  /opt/freeware/lib/libiconv.a(libiconv.so.2)
>  /usr/lib/libc.a(_shr.o)
>  /unix
>  /usr/lib/libcrypt.a(shr.o)
>  /opt/freeware/lib/libgcc_s.a(shr.o)
> libpthread.a will be added to this list.
>
> ibm-clang links against libpthread.a as well:
> $ ldd /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig
> /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig needs:
>  /usr/lib/libpthreads.a(shr_xpg5_64.o)
>  /usr/opt/zlibNX/lib/libz.a(libz.so.1)
>  /usr/lib/libcurses.a(shr42_64.o)
>  /usr/lib/libiconv.a(shr4_64.o)
>  /usr/lib/libc++.a(shr_64.o)
>  /usr/lib/libc++abi.a(libc++abi.so.1)
>  /usr/lib/libc.a(shr_64.o)
>  /usr/lib/libpthreads.a(_shr_xpg5_64.o)
>  /usr/lib/libc++.a(libc++.so.1)
>  /usr/lib/libunwind.a(libunwind.so.1)
>  /usr/lib/libc.a(_shr_64.o)
>  /unix
>  /usr/lib/libcrypt.a(shr_64.o)

David, I'll leave that decision up to you.

> Bruno


-- 
Arsen Arsenović




Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread Arsen Arsenović

David Edelsohn  writes:

> On Thu, Nov 16, 2023 at 5:22 PM Arsen Arsenović  wrote:
>
>>
>> David Edelsohn  writes:
>>
>> > Don't build with the dependent libraries in tree.  Don't build the
>> > dependent libraries as shared libraries. The libraries are already built
>> > and in /opt/cfarm, as mentioned in the Compile Farm wiki.
>> >
>> > AIX is not Solaris and not Linux.  It doesn't use ELF.  AIX shared
>> > libraries *ARE* shared object files in archives.  Shared object
>> versioning
>> > is handled by multiple objects in the same archive.
>>
>> Hmm, I see.  I removed all the deps but gettext from the tree.
>>
>> This leaves gettext-runtime fulfilling the previous role of intl/.
>>
>> However, I'm confused about how this worked before, in that case, since,
>> IIRC, intl also produced libraries and was also put into host exports.
>>
>> Leaving gettext in tree produces:
>>
>> Could not load program gawk:
>> Dependent module
>> /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8) could not be
>> loaded.
>> Member libintl.so.8 is not found in archive
>>
>> I'll try to see why intl/ didn't cause the same issue soon.
>>
>> Thanks, have a lovely evening.
>>
>
> The previous version of "intl" was built as a static library.  Configure in
> the older package had the option --enable-host-shared,
> which I did not use.  Based on the failure message, the in-tree gettext
> seems to be built as a shared library.  If you explicitly
> pass --disable-shared to the in-tree configure, you may get farther.  I'm
> currently using --disable-shared --disable-threads.
> As we have discussed, the current gettext will retain some references to
> pthreads despite the configure option.

Sure, but my patch does insert --disable-shared:

--8<---cut here---start->8---
host_modules= { module= gettext; bootstrap=true; no_install=true;
module_srcdir= "gettext/gettext-runtime";
// We always build gettext with pic, because some packages 
(e.g. gdbserver)
// need it in some configuratons, which is determined via 
nontrivial tests.
// Always enabling pic seems to make sense for something tied to
// user-facing output.
extra_configure_flags='--disable-shared --disable-java 
--disable-csharp --with-pic';
lib_path=intl/.libs; };
--8<---cut here---end--->8---

... and it is applied:

--8<---cut here---start->8---
-bash-5.1$ ./config.status --config
--srcdir=../../gcc/gettext/gettext-runtime --cache-file=./config.cache
  --disable-werror --with-gmp=/opt/cfarm
  --with-libiconv-prefix=/opt/cfarm --disable-libstdcxx-pch
  --with-included-gettext --program-transform-name=s,y,y,
  --disable-option-checking --build=powerpc-ibm-aix7.3.1.0
  --host=powerpc-ibm-aix7.3.1.0 --target=powerpc-ibm-aix7.3.1.0
  --disable-intermodule --enable-checking=yes,types,extra
  --disable-coverage --enable-languages=c,c++
  --disable-build-format-warnings --disable-shared --disable-java
  --disable-csharp --with-pic build_alias=powerpc-ibm-aix7.3.1.0
  host_alias=powerpc-ibm-aix7.3.1.0 target_alias=powerpc-ibm-aix7.3.1.0
  CC=gcc CFLAGS=-g 'LDFLAGS=-static-libstdc++ -static-libgcc
  -Wl,-bbigtoc' 'CXX=g++ -std=c++11' CXXFLAGS=-g
--8<---cut here---end--->8---

I'm unsure how to tell what the produced binaries are w.r.t static or
shared, but I only see .o files inside intl/.libs/libintl.a, while I see
a .so.1 in (e.g.) /lib/libz.a, hinting at it not being shared (?)

I do see that the build system adds intl to the LD_LIBRARY_PATH.

I will be testing dropping lib_path from the module definition above.
It might be superfluous (I think it is only used for LD_LIBRARY_PATH, for
when the libs built by the build system are shared - which they never
are for in-tree gettext).  I'll take the shot to add --disable-threads,
too, for this test.

From 4b75355d5ee9162a922a85517ef3c0a16931544d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Thu, 16 Nov 2023 23:50:30 +0100
Subject: [PATCH] disable threads, lib_path on gettext

---
 Makefile.def |  3 +--
 Makefile.in  | 27 +++
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index 792f81447e1b..78414b4cd89c 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -80,8 +80,7 @@ host_modules= { module= gettext; bootstrap=true; no_install=true;
 		// need it in some configuratons, which is determined via nontrivial tests.
 		// Always enabling pic seems to make sense for something tied to
 		// user-facing output.
-extra_configure_flags='--disable-shared --disable-java --disable-csharp --with-pic';
-lib_path=intl/.libs; };
+extra_configure_flags='--disable-shared --disable-threads --disable-java --disable-csharp --with-pic'; };
 

Re: building GNU gettext on AIX

2023-11-16 Thread Bruno Haible
Hi David,

> the default, distributed libintl library will not allow GCC to be built
> with NLS enabled.

The problem is this configure test from gettext.m4

  checking for GNU gettext in libintl... no

It should say

  checking for GNU gettext in libintl... yes

I can reproduce it with a simple hello-world package, outside GCC.

It tests whether a program that uses gettext() can be linked with
  -lintl -liconv
But now, on AIX, it needs to test whether such a program can be linked with
  -lintl -liconv -lpthread
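
To make the link test concrete, here is a minimal sketch of a program that
uses gettext().  It is not the test program from gettext.m4; the domain
name "linkdemo", the directory "/nonexistent" and the function name are
hypothetical, chosen so that no message catalog is found and the msgid is
returned unchanged.

```c
#include <libintl.h>
#include <locale.h>

/* Look up a message in a (deliberately missing) catalog.  Without a
   catalog, gettext returns its argument unchanged.  */
const char *
demo_message (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain ("linkdemo", "/nonexistent");
  textdomain ("linkdemo");
  return gettext ("link test");
}
```

On a system where gettext lives in libc this links without extra
libraries.  With a separate libintl, the link line needs -lintl -liconv,
and per the above, on AIX it now also needs -lpthread.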

> Were you suggesting that --enable-threads=isoc would work now or that it
> would require further changes for a future release?

It requires a change, effectively to do as if HAVE_PTHREAD_API is undefined
if --enable-threads=isoc was provided.

I can prepare a new gettext release that has both issues fixed:
  - gettext.m4 that fixes the configure test and sets the variable LIBINTL
to "-Lsome/libdir -lintl -liconv -lpthread",
  - mbrtowc.o and setlocale*.o that use mtx_* locks instead of pthread_*
mutexes when requested.

But you then need to make up your mind w.r.t. what I wrote in the earlier
mail.

  * GCC can pass --enable-threads=isoc, to avoid the libpthread dependency
on AIX ≥ 7.2.

  * Or GCC can (continue to?) use the variable LIBINTL. This will work on
AIX 7.1 as well but the programs will then be linked against libpthread.
One additional library.
$ ldd gcc
/opt/freeware/bin/gcc needs:
 /usr/lib/libc.a(shr.o)
 /opt/freeware/lib/libiconv.a(libiconv.so.2)
 /usr/lib/libc.a(_shr.o)
 /unix
 /usr/lib/libcrypt.a(shr.o)
 /opt/freeware/lib/libgcc_s.a(shr.o)
libpthread.a will be added to this list.

ibm-clang links against libpthread.a as well:
$ ldd /opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig
/opt/IBM/openxlC/17.1.1/bin/.ibm-clang.orig needs:
 /usr/lib/libpthreads.a(shr_xpg5_64.o)
 /usr/opt/zlibNX/lib/libz.a(libz.so.1)
 /usr/lib/libcurses.a(shr42_64.o)
 /usr/lib/libiconv.a(shr4_64.o)
 /usr/lib/libc++.a(shr_64.o)
 /usr/lib/libc++abi.a(libc++abi.so.1)
 /usr/lib/libc.a(shr_64.o)
 /usr/lib/libpthreads.a(_shr_xpg5_64.o)
 /usr/lib/libc++.a(libc++.so.1)
 /usr/lib/libunwind.a(libunwind.so.1)
 /usr/lib/libc.a(_shr_64.o)
 /unix
 /usr/lib/libcrypt.a(shr_64.o)

Bruno





[PATCH] libgccjit Fix a RTL bug for libgccjit

2023-11-16 Thread Antoni Boucher
Hi.
This patch fixes an RTL bug when using some target-specific builtins in
libgccjit (bug 112576).

The test uses a function from an unmerged patch:
https://gcc.gnu.org/pipermail/jit/2023q1/001605.html
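
The idea behind the fix can be sketched outside of GCC as follows.  This
is a simplified model and not the emit-rtl.cc code: struct constant, saved,
MAX_SAVED, init_saved_constants and saved_constant are all hypothetical
stand-ins.  The point is that an initializer which may run more than once
keeps the objects it already created, so pointers handed out after the
first run stay valid and comparable.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SAVED 4

/* Stand-in for a shared, cached constant object.  */
struct constant
{
  long value;
};

/* Cache for the values -MAX_SAVED .. MAX_SAVED.  */
static struct constant *saved[2 * MAX_SAVED + 1];

/* May be called several times; only fills the slots that are still
   empty, so existing objects keep their identity.  */
void
init_saved_constants (void)
{
  for (int i = -MAX_SAVED; i <= MAX_SAVED; i++)
    if (saved[i + MAX_SAVED] == NULL)
      {
        struct constant *c = malloc (sizeof *c);
        if (c == NULL)
          abort ();
        c->value = i;
        saved[i + MAX_SAVED] = c;
      }
}

/* Return the cached object for VALUE, which must be in range.  */
const struct constant *
saved_constant (long value)
{
  return saved[value + MAX_SAVED];
}
```

Without the NULL check, a second call would replace every slot, and code
that compares against a pointer obtained earlier would no longer match.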

Thanks for the review!
From 9236998f5ad3156ebe39e97c03d1a28ce80dd95a Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 9 Jun 2022 20:57:41 -0400
Subject: [PATCH] libgccjit Fix a RTL bug for libgccjit

This fixes an 'unrecognizable insn' error when generating some code using
target-specific builtins.

gcc/ChangeLog:
	PR jit/112576
	* emit-rtl.cc (init_emit_once): Do not initialize const_int_rtx
	if already initialized.

gcc/testsuite:
	PR jit/112576
	* jit.dg/all-non-failing-tests.h: Mention test-rtl-bug-target-builtins.c.
	* jit.dg/test-rtl-bug-target-builtins.c: New test.
---
 gcc/emit-rtl.cc   |  9 +-
 gcc/testsuite/jit.dg/all-non-failing-tests.h  |  3 +
 .../jit.dg/test-rtl-bug-target-builtins.c | 87 +++
 3 files changed, 97 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-rtl-bug-target-builtins.c

diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 84b6833225e..a18ac1de98c 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -6216,8 +6216,13 @@ init_emit_once (void)
   /* Don't use gen_rtx_CONST_INT here since gen_rtx_CONST_INT in this case
  tries to use these variables.  */
   for (i = - MAX_SAVED_CONST_INT; i <= MAX_SAVED_CONST_INT; i++)
-const_int_rtx[i + MAX_SAVED_CONST_INT] =
-  gen_rtx_raw_CONST_INT (VOIDmode, (HOST_WIDE_INT) i);
+  {
+// Do not initialize the constants twice because they are used elsewhere
+// and libgccjit executes this function twice.
+if (const_int_rtx[i + MAX_SAVED_CONST_INT] == NULL)
+  const_int_rtx[i + MAX_SAVED_CONST_INT]
+	= gen_rtx_raw_CONST_INT (VOIDmode, (HOST_WIDE_INT) i);
+  }
 
   if (STORE_FLAG_VALUE >= - MAX_SAVED_CONST_INT
   && STORE_FLAG_VALUE <= MAX_SAVED_CONST_INT)
diff --git a/gcc/testsuite/jit.dg/all-non-failing-tests.h b/gcc/testsuite/jit.dg/all-non-failing-tests.h
index e762563f9bd..3da2e285b80 100644
--- a/gcc/testsuite/jit.dg/all-non-failing-tests.h
+++ b/gcc/testsuite/jit.dg/all-non-failing-tests.h
@@ -322,6 +322,9 @@
 /* test-setting-alignment.c: This can't be in the testcases array as it
is target-specific.  */
 
+/* test-rtl-bug-target-builtins.c: This can't be in the testcases array as it
+   is target-specific.  */
+
 /* test-string-literal.c */
 #define create_code create_code_string_literal
 #define verify_code verify_code_string_literal
diff --git a/gcc/testsuite/jit.dg/test-rtl-bug-target-builtins.c b/gcc/testsuite/jit.dg/test-rtl-bug-target-builtins.c
new file mode 100644
index 000..d4a686271f9
--- /dev/null
+++ b/gcc/testsuite/jit.dg/test-rtl-bug-target-builtins.c
@@ -0,0 +1,87 @@
+/* { dg-do compile { target x86_64-*-* } } */
+
+#include 
+#include 
+
+#include "libgccjit.h"
+
+#define TEST_PROVIDES_MAIN
+#include "harness.h"
+
+void
+create_code (gcc_jit_context *ctxt, void *user_data)
+{
+  gcc_jit_context_add_command_line_option (ctxt, "-mavx512vl");
+  gcc_jit_function *builtin =
+gcc_jit_context_get_target_builtin_function (ctxt,
+"__builtin_ia32_cvtpd2udq128_mask");
+
+  gcc_jit_type *u8_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_UINT8_T);
+  gcc_jit_type *double_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_DOUBLE);
+  gcc_jit_type *v2df = gcc_jit_type_get_vector (double_type, 2);
+  gcc_jit_type *int_type =
+gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
+  gcc_jit_type *v4si = gcc_jit_type_get_vector (int_type, 4);
+
+  gcc_jit_function *func =
+gcc_jit_context_new_function (ctxt, NULL,
+  GCC_JIT_FUNCTION_EXPORTED,
+  v4si,
+  "epu32",
+  0, NULL,
+  0);
+  gcc_jit_block *block = gcc_jit_function_new_block (func, NULL);
+  gcc_jit_lvalue *var1 = gcc_jit_function_new_local (func, NULL, v2df, "var1");
+  gcc_jit_lvalue *var2 = gcc_jit_function_new_local (func, NULL, v4si, "var2");
+  gcc_jit_rvalue *args[3] = {
+gcc_jit_lvalue_as_rvalue (var1),
+gcc_jit_lvalue_as_rvalue (var2),
+gcc_jit_context_zero (ctxt, u8_type),
+  };
+  gcc_jit_rvalue *call = gcc_jit_context_new_call (ctxt, NULL, builtin, 3, args);
+  gcc_jit_block_end_with_return (block, NULL, call);
+}
+
+void
+verify_code (gcc_jit_context *ctxt, gcc_jit_result *result)
+{
+  CHECK_NON_NULL (result);
+}
+
+int
+main (int argc, char **argv)
+{
+  /*  This is the same as the main provided by harness.h, but it first creates
+  a dummy context and compiles it in order to add the target builtins to
+  libgccjit's internal state.  */
+  gcc_jit_context *ctxt;
+  ctxt = gcc_jit_context_acquire ();
+  if (!ctxt)
+{
+  fail ("gcc_jit_context_acquire failed");
+  return -1;
+}
+  gcc_jit_result *result;
+  result = gcc_jit_context_compile (ctxt);
+  gcc_jit_result_release (result);
+  gcc_jit_context_release (ctxt);
+
+  int i;
+
+  for (i = 1; i <= 5; i++)
+{
+ 
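
The guard added to init_emit_once in the hunk above can be sketched in isolation. This is a minimal model, not GCC code: `struct node`, `MAX_SAVED`, `init_saved_once` and `saved_node` are hypothetical stand-ins for rtx, MAX_SAVED_CONST_INT and the const_int_rtx table. It assumes the point of the guard is that pointers handed out after the first initialization must keep their identity when the initializer runs a second time in the same process.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SAVED 4            /* hypothetical stand-in for MAX_SAVED_CONST_INT */

struct node { long value; };   /* hypothetical stand-in for an rtx */

static struct node *saved[2 * MAX_SAVED + 1];
static int allocations;        /* how many nodes were ever created */

/* May be called any number of times.  Entries that already exist are left
   alone, so pointers obtained after an earlier call stay valid.  */
void
init_saved_once (void)
{
  for (int i = -MAX_SAVED; i <= MAX_SAVED; i++)
    if (saved[i + MAX_SAVED] == NULL)
      {
        struct node *n = malloc (sizeof *n);
        if (n == NULL)
          abort ();
        n->value = i;
        saved[i + MAX_SAVED] = n;
        allocations++;
      }
}

/* Return the shared node for I, which must lie in [-MAX_SAVED, MAX_SAVED].  */
struct node *
saved_node (long i)
{
  return saved[i + MAX_SAVED];
}
```

Calling `init_saved_once` twice leaves `saved_node (0)` pointing at the same object and performs no further allocations, which is the property the patch relies on.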

Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread John David Anglin

On 2023-11-16 4:52 p.m., Jeff Law wrote:



On 11/16/23 10:54, John David Anglin wrote:

Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk.

This patch works around a problem compiling python3.11 by improving
REG+D address handling.  The change results in smaller code and
reduced register pressure.

Dave
---

hppa: Revise REG+D address support to allow long displacements before reload

In analyzing PR rtl-optimization/112415, I realized that restricting
REG+D offsets to 5-bits before reload results in very poor code and
complexities in optimizing these instructions after reload.  The
general problem is long displacements are not allowed for floating
point accesses when generating PA 1.1 code.  Even with PA 2.0, there
is an ELF linker bug that prevents using long displacements for
floating point loads and stores.

In the past, enabling long displacements before reload caused issues
in reload.  However, there have been fixes in the handling of reloads
for floating-point accesses.  This change allows long displacements
before reload and corrects a couple of issues in the constraint
handling for integer and floating-point accesses.

2023-11-16  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/112415
* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
displacements before reload.  Simplify logic flow.  Revise
comments.
* config/pa/pa.h (TARGET_ELF64): New define.
(INT14_OK_STRICT): Update define and comment.
* config/pa/pa64-linux.h (TARGET_ELF64): Define.
* config/pa/predicates.md (base14_operand): Don't check
alignment of short displacements.
(integer_store_memory_operand): Don't return true when
reload_in_progress is true.  Remove INT_5_BITS check.
(floating_point_store_memory_operand): Don't return true when
reload_in_progress is true.  Use INT14_OK_STRICT to check
whether long displacements are always okay.

I strongly suspect this is going to cause problems in the end.

I've already done what you're trying to do.  It'll likely look fine for an 
extended period of time, but it will almost certainly break one day.

It could happen.  If it happens and can't be fixed, it's easy enough to
return false in pa_legitimate_address_p before reload.  Maybe we could add
an optimization option for this.

As it stands, the code improvement for python is significant.  I don't
think f-m-o can fix things after reload.

Hopefully, Sam will test the change with various package builds on gentoo.
Debian is still on gcc-13.  I'm not seeing any obvious problems in the gcc
testsuite.  It needs testing on packages that do extensive floating point
calculations.

Dave

--
John David Anglin  dave.ang...@bell.net
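
The difference between the 5-bit and 14-bit REG+D displacements discussed in this thread can be made concrete with a small range check. This is a sketch under the assumption that both fields are signed two's-complement values. The function names are hypothetical and are not the predicates used in pa.cc or pa.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if DISP fits in a signed two's-complement field of BITS bits
   (1 <= BITS <= 63), i.e. -2^(BITS-1) <= DISP < 2^(BITS-1).  */
static bool
fits_signed_bits (int64_t disp, unsigned bits)
{
  int64_t half = (int64_t) 1 << (bits - 1);
  return disp >= -half && disp < half;
}

/* Short displacement: -16 .. 15.  */
bool
fits_5_bits (int64_t disp)
{
  return fits_signed_bits (disp, 5);
}

/* Long displacement: -8192 .. 8191.  */
bool
fits_14_bits (int64_t disp)
{
  return fits_signed_bits (disp, 14);
}
```

Under this assumption an offset such as 64 is rejected by the short form and accepted by the long form, so a stack slot or structure field only a few words away from the base register already needs the long form.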



Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 5:22 PM Arsen Arsenović  wrote:

>
> David Edelsohn  writes:
>
> > Don't build with the dependent libraries in tree.  Don't build the
> > dependent libraries as shared libraries. The libraries are already built
> > and in /opt/cfarm, as mentioned in the Compile Farm wiki.
> >
> > AIX is not Solaris and not Linux.  It doesn't use ELF.  AIX shared
> > libraries *ARE* shared object files in archives.  Shared object
> versioning
> > is handled by multiple objects in the same archive.
>
> Hmm, I see.  I removed all the deps but gettext from the tree.
>
> This leaves gettext-runtime fulfilling the previous role of intl/.
>
> However, I'm confused about how this worked before, in that case, since,
> IIRC, intl also produced libraries and was also put into host exports.
>
> Leaving gettext in tree produces:
>
> Could not load program gawk:
> Dependent module
> /home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8) could not be
> loaded.
> Member libintl.so.8 is not found in archive
>
> I'll try to see why intl/ didn't cause the same issue soon.
>
> Thanks, have a lovely evening.
>

The previous version of "intl" was built as a static library.  Configure in
the older package had the option --enable-host-shared,
which I did not use.  Based on the failure message, the in-tree gettext
seems to be built as a shared library.  If you explicitly
pass --disable-shared to the in-tree configure, you may get farther.  I'm
currently using --disable-shared --disable-threads.
As we have discussed, the current gettext will retain some references to
pthreads despite the configure option.

Thanks, David


>
> > Thanks, David
> >
> >
> >
> > On Thu, Nov 16, 2023 at 4:15 PM Arsen Arsenović  wrote:
> >
> >>
> >> Arsen Arsenović  writes:
> >>
> >> > [[PGP Signed Part:Good signature from 52C294301EA2C493 Arsen Arsenović
> >> (Gentoo Developer UID)  (trust ultimate) created at
> >> 2023-11-16T19:47:16+0100 using EDDSA]]
> >> >
> >> > David Edelsohn  writes:
> >> >
> >> >> On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović 
> >> wrote:
> >> >>
> >> >>>
> >> >>> David Edelsohn  writes:
> >> >>>
> >> >>> > GCC had been working on AIX with NLS, using
> >> "--with-included-gettext".
> >> >>> > --disable-nls gets past the breakage, but GCC does not build for
> me
> >> on
> >> >>> AIX
> >> >>> > with NLS enabled.
> >> >>>
> >> >>> That should still work with gettext 0.22+ extracted in-tree (it
> should
> >> >>> be fetched by download_prerequisites).
> >> >>>
> >> >>> > A change in dependencies for GCC should have been announced and
> more
> >> >>> widely
> >> >>> > socialized in the GCC development mailing list, not just GCC
> patches
> >> >>> > mailing list.
> >> >>> >
> >> >>> > I have tried both the AIX Open Source libiconv and libgettext
> >> package,
> >> >>> and
> >> >>> > the ones that I previously built.  Both fail because GCC configure
> >> >>> decides
> >> >>> > to disable NLS, despite being requested, while libcpp is
> satisfied,
> >> so
> >> >>> > tools in the gcc subdirectory don't link against libiconv and the
> >> build
> >> >>> > fails.  With the included gettext, I was able to rely on a
> >> >>> self-consistent
> >> >>> > solution.
> >> >>>
> >> >>> That is interesting.  They should be using the same checks.  I've
> >> >>> checked trunk and regenerated files on it, and saw no significant
> diff
> >> >>> (some whitespace changes only).  Could you post the config.log of
> both?
> >> >>>
> >> >>> I've never used AIX.  Can I reproduce this on one of the cfarm
> machines
> >> >>> to poke around?  I've tried cfarm119, but that one lacked git, and I
> >> >>> haven't poked around much further due to time constraints.
> >> >>>
> >> >>
> >> >> The AIX system in the Compile Farm has a complete complement of Open
> >> Source
> >> >> software installed.
> >> >>
> >> >> Please ensure that /opt/freeware/bin is in your path.  Also, the GCC
> >> Wiki
> >> >> Compile Farm page has build tips that include AIX
> >> >>
> >> >>
> >>
> https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines
> >> >
> >> > Thanks, that got me further.
> >> >
> >> >> that recommended --with-included-gettext configuration option.
> >> >
> >> > This flag should still exist and operate the same if gettext is
> present
> >> > in tree.  I've cloned gcc and downloaded prerequisites (via
> >> > contrib/download_prerequisites) and I am trying to configure it now.
> >>
> >> The build failed.  After gettext/gmp/... (in-tree hostlibs) get built
> >> and added to library paths, further GCC processes fail to run:
> >>
> >> configure:3305: gcc -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc
> >> conftest.c  >&5
> >> Could not load program
> >> /opt/freeware/libexec/gcc/powerpc-ibm-aix7.3.0.0/10/cc1:
> >> Dependent module
> >> /home/arsen/build/./gmp/.libs/libgmp.a(libgmp.so.10) could not be
> loaded.
> >> Member libgmp.so.10 is not found in archive
> >>
> >> This seems odd.  I am not sure 

Re: [PATCH] vect: Use statement vectype for conditional mask.

2023-11-16 Thread Robin Dapp
> For the fortran testcase we don't even run into this but hit an
> internal def and assert on
> 
>   gcc_assert (STMT_VINFO_VEC_STMTS (def_stmt_info).length () == ncopies);
> 
> I think this shows missing handling of .COND_* in the bool pattern recognition
> as we get the 'bool' condition as boolean data vector rather than a mask.  The
> same is true for the testcase with the invariant condition.  This causes us to
> select the wrong vector type here.  The "easiest" might be to look at
> how COND_EXPR is handled in vect_recog_bool_pattern and friends and
> handle .COND_* IFNs the same for the mask operand.

For the first (imagick) testcase adding a bool pattern does not help
because we always pass NULL as vectype to vect_get_vec_defs.
Doing so we will always use get_vectype_for_scalar_type (i.e.
a "full" bool vector) because vectype of the (conditional) stmt
is the lhs type and not the mask's type.
For cond_exprs in vectorizable_condition we directly pass a
comp_vectype instead (truth_type).  Wouldn't that, independently
of the pattern recog, make sense?

Now for the Fortran testcase I'm still a bit lost.  As opposed to
before we now vectorize with a variable VF and hit the problem
in the epilogue with ncopies = 2.

.COND_ADD (_7, __gcov0.__brute_force_MOD_brute_I_lsm.21_67, 1, 
__gcov0.__brute_force_MOD_brute_I_lsm.21_67);
where
_7 = *_6
which is an internal_def.

I played around with doing it analogously to the COND_EXPR
handling, so creating a COND_ADD (_7 != 0) which will require
several fixups in other places because we're not prepared to
handle that.  In the end it seems to only shift the problem
because we will still need the definition of _7.

I guess you're implying that the definition should have already
been handled by pattern recognition so that at the point when
we need it, it has a related pattern stmt with the proper mask
type?

Regards
 Robin
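
For readers unfamiliar with the .COND_* internal functions, the following is a scalar model of what .COND_ADD (mask, a, b, else) computes per element: the sum where the mask is set, the else value otherwise. It is a plain C sketch, not vectorizer code, and the function name `cond_add` is made up for the illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a conditional add over N elements:
   out[i] = mask[i] ? a[i] + b[i] : else_val[i].  */
void
cond_add (size_t n, const bool *mask, const long *a, const long *b,
          const long *else_val, long *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = mask[i] ? a[i] + b[i] : else_val[i];
}
```

In the Fortran case quoted above the first addend and the else value are the same counter and the second addend is 1, so the call amounts to a masked increment. The open question in the thread is which vector type the mask operand should get, which this model does not address.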



[PATCH] libgccjit: Fix ira cost segfault

2023-11-16 Thread Antoni Boucher
Hi.
This patch fixes a segfault that happens when compiling librsvg (more
specifically its dependency aho-corasick) with rustc_codegen_gcc (bug
112575).
I was not able to create a reproducer for this bug: I'm assuming I
might need to concat all the reproducers together in the same file in
order to be able to reproduce the issue.

I'm also not sure I put the cleanup in the correct location.
Is there any finalizer function for target specific code?

Thanks for any help fixing this issue.
From e0f4f51682266bc9f507afdb64908ed3695a2f5e Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 2 Nov 2023 17:18:35 -0400
Subject: [PATCH] libgccjit: Fix ira cost segfault

gcc/ChangeLog:
	PR jit/112575
	* config/i386/i386-options.cc (ix86_option_override_internal):
	Cleanup target_attribute_cache.
---
 gcc/config/i386/i386-options.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index df7d24352d1..f596c0fb53c 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3070,6 +3070,12 @@ ix86_option_override_internal (bool main_args_p,
 	= opts->x_flag_unsafe_math_optimizations;
   target_option_default_node = target_option_current_node
 = build_target_option_node (opts, opts_set);
+  /* TODO: check if this is the correct location.  It should probably be in
+	 some finalizer function, but I don't
+	 know if there's one.  */
+  target_attribute_cache[0] = NULL;
+  target_attribute_cache[1] = NULL;
+  target_attribute_cache[2] = NULL;
 }
 
   if (opts->x_flag_cf_protection != CF_NONE)
-- 
2.42.1
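
The message does not spell out why clearing target_attribute_cache helps. One plausible reading, stated here as an assumption, is that the cache holds pointers into memory owned by a previous in-process compilation. The sketch below models that situation with hypothetical names (`attr_cache`, `begin_run`, `cached_attr`, `end_run`). It is not the i386 backend code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 3

/* Hypothetical cache.  Entries point at memory owned by the current run.  */
static char *attr_cache[CACHE_SLOTS];

/* Start a run: forget whatever an earlier run left behind.  */
void
begin_run (void)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    attr_cache[i] = NULL;
}

/* Return the cached string for SLOT, creating it from TEXT on first use.  */
const char *
cached_attr (int slot, const char *text)
{
  if (attr_cache[slot] == NULL)
    {
      char *copy = malloc (strlen (text) + 1);
      if (copy == NULL)
        abort ();
      strcpy (copy, text);
      attr_cache[slot] = copy;
    }
  return attr_cache[slot];
}

/* End a run: release the run's memory.  The cache is deliberately not
   touched here, so it holds stale pointers until begin_run clears it.  */
void
end_run (void)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    free (attr_cache[i]);
}
```

Without the reset in `begin_run`, a second run would return a pointer freed by the first. Whether the reset belongs at the start of option processing, as in the patch, or in a dedicated finalizer is the question the patch's TODO comment leaves open.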



Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread Arsen Arsenović

David Edelsohn  writes:

> Don't build with the dependent libraries in tree.  Don't build the
> dependent libraries as shared libraries. The libraries are already built
> and in /opt/cfarm, as mentioned in the Compile Farm wiki.
>
> AIX is not Solaris and not Linux.  It doesn't use ELF.  AIX shared
> libraries *ARE* shared object files in archives.  Shared object versioning
> is handled by multiple objects in the same archive.

Hmm, I see.  I removed all the deps but gettext from the tree.

This leaves gettext-runtime fulfilling the previous role of intl/.

However, I'm confused about how this worked before, in that case, since,
IIRC, intl also produced libraries and was also put into host exports.

Leaving gettext in tree produces:

Could not load program gawk:
Dependent module 
/home/arsen/build/./gettext/intl/.libs/libintl.a(libintl.so.8) could not be 
loaded.
Member libintl.so.8 is not found in archive 

I'll try to see why intl/ didn't cause the same issue soon.

Thanks, have a lovely evening.

> Thanks, David
>
>
>
> On Thu, Nov 16, 2023 at 4:15 PM Arsen Arsenović  wrote:
>
>>
>> Arsen Arsenović  writes:
>>
>> > [[PGP Signed Part:Good signature from 52C294301EA2C493 Arsen Arsenović
>> (Gentoo Developer UID)  (trust ultimate) created at
>> 2023-11-16T19:47:16+0100 using EDDSA]]
>> >
>> > David Edelsohn  writes:
>> >
>> >> On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović 
>> wrote:
>> >>
>> >>>
>> >>> David Edelsohn  writes:
>> >>>
>> >>> > GCC had been working on AIX with NLS, using
>> "--with-included-gettext".
>> >>> > --disable-nls gets past the breakage, but GCC does not build for me
>> on
>> >>> AIX
>> >>> > with NLS enabled.
>> >>>
>> >>> That should still work with gettext 0.22+ extracted in-tree (it should
>> >>> be fetched by download_prerequisites).
>> >>>
>> >>> > A change in dependencies for GCC should have been announced and more
>> >>> widely
>> >>> > socialized in the GCC development mailing list, not just GCC patches
>> >>> > mailing list.
>> >>> >
>> >>> > I have tried both the AIX Open Source libiconv and libgettext
>> package,
>> >>> and
>> >>> > the ones that I previously built.  Both fail because GCC configure
>> >>> decides
>> >>> > to disable NLS, despite being requested, while libcpp is satisfied,
>> so
>> >>> > tools in the gcc subdirectory don't link against libiconv and the
>> build
>> >>> > fails.  With the included gettext, I was able to rely on a
>> >>> self-consistent
>> >>> > solution.
>> >>>
>> >>> That is interesting.  They should be using the same checks.  I've
>> >>> checked trunk and regenerated files on it, and saw no significant diff
>> >>> (some whitespace changes only).  Could you post the config.log of both?
>> >>>
>> >>> I've never used AIX.  Can I reproduce this on one of the cfarm machines
>> >>> to poke around?  I've tried cfarm119, but that one lacked git, and I
>> >>> haven't poked around much further due to time constraints.
>> >>>
>> >>
>> >> The AIX system in the Compile Farm has a complete complement of Open
>> Source
>> >> software installed.
>> >>
>> >> Please ensure that /opt/freeware/bin is in your path.  Also, the GCC
>> Wiki
>> >> Compile Farm page has build tips that include AIX
>> >>
>> >>
>> https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines
>> >
>> > Thanks, that got me further.
>> >
>> >> that recommended --with-included-gettext configuration option.
>> >
>> > This flag should still exist and operate the same if gettext is present
>> > in tree.  I've cloned gcc and downloaded prerequisites (via
>> > contrib/download_prerequisites) and I am trying to configure it now.
>>
>> The build failed.  After gettext/gmp/... (in-tree hostlibs) get built
>> and added to library paths, further GCC processes fail to run:
>>
>> configure:3305: gcc -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc
>> conftest.c  >&5
>> Could not load program
>> /opt/freeware/libexec/gcc/powerpc-ibm-aix7.3.0.0/10/cc1:
>> Dependent module
>> /home/arsen/build/./gmp/.libs/libgmp.a(libgmp.so.10) could not be loaded.
>> Member libgmp.so.10 is not found in archive
>>
>> This seems odd.  I am not sure what compels the RTDL (?) to look up .sos
>> in archives, or how it knows about these archives.  I suspect it's
>> getting tripped by something in HOST_EXPORTS.
>>
>> >> Thanks, David
>> >>
>> >>
>> >>>
>> >>> TIA, sorry about the inconvenience.  Have a lovely day.
>> >>>
>> >>> > The current gettext-0.22.3 fails to build for me on AIX.
>> >>> >
>> >>> > libcpp configure believes that NLS functions on AIX, but gcc
>> configure
>> >>> > fails in its tests of gettext functionality, which leads to an
>> >>> inconsistent
>> >>> > configuration and build breakage.
>> >>> >
>> >>> > Thanks, David
>> >>>
>> >>>
>> >>> --
>> >>> Arsen Arsenović
>> >>>
>>
>>
>> --
>> Arsen Arsenović
>>


-- 
Arsen Arsenović




Re: [PATCH] libgccjit: Add support for the type bfloat16

2023-11-16 Thread Antoni Boucher
I forgot to attach the patch.

On Thu, 2023-11-16 at 17:19 -0500, Antoni Boucher wrote:
> Hi.
> This patch adds the support for the type bfloat16 (bug 112574).
> 
> I was asked to split this out of another patch sent here:
> https://gcc.gnu.org/pipermail/jit/2023q1/001607.html
> 
> Thanks for the review.

From 0e57583bba7e9fe5f5ff89559d4f29bf1bd7a240 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 
Date: Thu, 16 Nov 2023 10:59:22 -0500
Subject: [PATCH] libgccjit: Add support for the type bfloat16

gcc/jit/ChangeLog:

	PR jit/112574
	* docs/topics/types.rst: Document GCC_JIT_TYPE_BFLOAT16.
	* jit-common.h: Update NUM_GCC_JIT_TYPES.
	* jit-playback.cc (get_tree_node_for_type): Support bfloat16.
	* jit-recording.cc (recording::memento_of_get_type::get_size,
	recording::memento_of_get_type::dereference,
	recording::memento_of_get_type::is_int,
	recording::memento_of_get_type::is_signed,
	recording::memento_of_get_type::is_float,
	recording::memento_of_get_type::is_bool): Support bfloat16.
	* libgccjit.h (enum gcc_jit_types): Add GCC_JIT_TYPE_BFLOAT16.

gcc/testsuite/ChangeLog:

	PR jit/112574
	* jit.dg/test-types.c: Test GCC_JIT_TYPE_BFLOAT16.
	* jit.dg/test-bfloat16.c: New test.
---
 gcc/jit/docs/topics/types.rst|  2 ++
 gcc/jit/jit-common.h |  2 +-
 gcc/jit/jit-playback.cc  |  2 ++
 gcc/jit/jit-recording.cc | 11 +
 gcc/jit/libgccjit.h  |  4 ++-
 gcc/testsuite/jit.dg/test-bfloat16.c | 37 
 gcc/testsuite/jit.dg/test-types.c|  2 ++
 7 files changed, 58 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/jit.dg/test-bfloat16.c

diff --git a/gcc/jit/docs/topics/types.rst b/gcc/jit/docs/topics/types.rst
index d8c1d15d69d..1ae814a349d 100644
--- a/gcc/jit/docs/topics/types.rst
+++ b/gcc/jit/docs/topics/types.rst
@@ -113,6 +113,8 @@ Standard types
- C99's ``__int128_t``
  * - :c:data:`GCC_JIT_TYPE_FLOAT`
-
+ * - :c:data:`GCC_JIT_TYPE_BFLOAT16`
+   - C's ``__bfloat16``
  * - :c:data:`GCC_JIT_TYPE_DOUBLE`
-
  * - :c:data:`GCC_JIT_TYPE_LONG_DOUBLE`
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index 80c1618da96..983c9190d44 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -36,7 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 #endif
 
-const int NUM_GCC_JIT_TYPES = GCC_JIT_TYPE_INT128_T + 1;
+const int NUM_GCC_JIT_TYPES = GCC_JIT_TYPE_BFLOAT16 + 1;
 
 /* This comment is included by the docs.
 
diff --git a/gcc/jit/jit-playback.cc b/gcc/jit/jit-playback.cc
index 18cc4da25b8..7e1c97a4638 100644
--- a/gcc/jit/jit-playback.cc
+++ b/gcc/jit/jit-playback.cc
@@ -280,6 +280,8 @@ get_tree_node_for_type (enum gcc_jit_types type_)
 
 case GCC_JIT_TYPE_FLOAT:
   return float_type_node;
+case GCC_JIT_TYPE_BFLOAT16:
+  return bfloat16_type_node;
 case GCC_JIT_TYPE_DOUBLE:
   return double_type_node;
 case GCC_JIT_TYPE_LONG_DOUBLE:
diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 9b5b8005ebe..af8b7a421ec 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -2385,6 +2385,10 @@ recording::memento_of_get_type::get_size ()
 case GCC_JIT_TYPE_FLOAT:
   size = FLOAT_TYPE_SIZE;
   break;
+#ifdef HAVE_BFmode
+case GCC_JIT_TYPE_BFLOAT16:
+  return GET_MODE_UNIT_SIZE (BFmode);
+#endif
 case GCC_JIT_TYPE_DOUBLE:
   size = DOUBLE_TYPE_SIZE;
   break;
@@ -2444,6 +2448,7 @@ recording::memento_of_get_type::dereference ()
 case GCC_JIT_TYPE_INT64_T:
 case GCC_JIT_TYPE_INT128_T:
 case GCC_JIT_TYPE_FLOAT:
+case GCC_JIT_TYPE_BFLOAT16:
 case GCC_JIT_TYPE_DOUBLE:
 case GCC_JIT_TYPE_LONG_DOUBLE:
 case GCC_JIT_TYPE_COMPLEX_FLOAT:
@@ -2508,6 +2513,7 @@ recording::memento_of_get_type::is_int () const
   return true;
 
 case GCC_JIT_TYPE_FLOAT:
+case GCC_JIT_TYPE_BFLOAT16:
 case GCC_JIT_TYPE_DOUBLE:
 case GCC_JIT_TYPE_LONG_DOUBLE:
   return false;
@@ -2566,6 +2572,7 @@ recording::memento_of_get_type::is_signed () const
 case GCC_JIT_TYPE_UINT128_T:
 
 case GCC_JIT_TYPE_FLOAT:
+case GCC_JIT_TYPE_BFLOAT16:
 case GCC_JIT_TYPE_DOUBLE:
 case GCC_JIT_TYPE_LONG_DOUBLE:
 
@@ -2625,6 +2632,7 @@ recording::memento_of_get_type::is_float () const
   return false;
 
 case GCC_JIT_TYPE_FLOAT:
+case GCC_JIT_TYPE_BFLOAT16:
 case GCC_JIT_TYPE_DOUBLE:
 case GCC_JIT_TYPE_LONG_DOUBLE:
   return true;
@@ -2688,6 +2696,7 @@ recording::memento_of_get_type::is_bool () const
   return false;
 
 case GCC_JIT_TYPE_FLOAT:
+case GCC_JIT_TYPE_BFLOAT16:
 case GCC_JIT_TYPE_DOUBLE:
 case GCC_JIT_TYPE_LONG_DOUBLE:
   return false;
@@ -2768,6 +2777,7 @@ static const char * const get_type_strings[] = {
   "__int64_t",/* GCC_JIT_TYPE_INT64_T */
   "__int128_t",   /* GCC_JIT_TYPE_INT128_T */
 
+  "bfloat16", /* GCC_JIT_TYPE_BFLOAT16 */
 };
 
 /* Implementation of 

[PATCH] libgccjit: Add support for the type bfloat16

2023-11-16 Thread Antoni Boucher
Hi.
This patch adds the support for the type bfloat16 (bug 112574).

I was asked to split this out of another patch sent here:
https://gcc.gnu.org/pipermail/jit/2023q1/001607.html

Thanks for the review.
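
As background on the type itself: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 fraction bits of an IEEE binary32 value, so it corresponds to the upper 16 bits of a float. The sketch below shows that layout with a truncating conversion. It assumes `float` is IEEE binary32 on the host, the helper names are hypothetical, and it is unrelated to how libgccjit or GCC lower the type (real conversions normally round rather than truncate).

```c
#include <stdint.h>
#include <string.h>

/* Keep the upper 16 bits of the binary32 encoding (truncation, no rounding).  */
uint16_t
float_to_bf16_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Widen back to binary32 by filling the low 16 fraction bits with zeros.  */
float
bf16_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Values such as 1.0f survive the round trip exactly, while 3.14f comes back as 3.125f because only 7 fraction bits remain.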


Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 1:52 PM Bruno Haible  wrote:

> David Edelsohn wrote:
> > I manually commented out HAVE_PTHREAD_API from config.h and produced a
> > libintl.a without references to pthreads.
>
> Good finding!
>
> Commenting out HAVE_PTHREAD_API from config.h is also what makes the
> option --enable-threads=isoc work as expected on AIX 7.3.
>

I reconfigured and built gettext with --enable-threads=isoc .  libintl.a
still contains references to pthread_mutex and friends:

$ nm -BCpg libintl.a  | grep pthread
 - U __n_pthreads
 - U .pthread_mutex_lock
 - U .pthread_mutex_unlock
 - U .pthread_mutex_lock
 - U .pthread_mutex_unlock
 - U __n_pthreads

from files mbrtowc, setlocale_null, and vasnwprintf.

I tested on an AIX 7.2.5 system and confirmed that libc does provide the
mtx_ symbols:

$ nm -BCpg libc.a | grep mtx_
     0 T .mtx_timedlock
   160 T .mtx_unlock
   256 T .mtx_trylock
   416 T .mtx_lock
   512 T .mtx_init
   736 T .mtx_destroy
    80 D mtx_timedlock
    92 D mtx_unlock
   104 D mtx_trylock
   116 D mtx_lock
   128 D mtx_init
   140 D mtx_destroy


Were you suggesting that --enable-threads=isoc would work now or that it
would require further changes for a future release?


At the moment, configuring gettext with --disable-threads and manually
modifying config.h is the only method that produces libintl.a without
references to pthreads, allowing GCC to build on AIX with NLS enabled.


Thanks, David


Re: [PATCH] c, c++: Add new value for vector types for __builtin_classify_type (type)

2023-11-16 Thread Jason Merrill

On 11/11/23 03:22, Jakub Jelinek wrote:

Hi!

While filing a clang request to return 18 on _BitInts for
__builtin_classify_type instead of -1 they return currently, I've
noticed that we return -1 for vector types.  I'm not convinced it is a good
idea to change behavior of __builtin_classify_type (vector_expression)
after 22 years it behaved one way (returned -1), but the
__builtin_classify_type (type) form is a new extension added for GCC 14,
so this patch returns 19 for vectors just in that second form.  Many other
return values are only accessible from the second form as well (mostly because
of argument promotions), so I think it is fine like that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


The C++ changes are OK (and obvious).  I'm skeptical of the choice to 
keep returning -1 for the expression form, it seems more likely to cause 
problems (due to it disagreeing with the type form) than changing it 
(due to old code somehow relying on -1?).  But people who are more 
familiar with the use of __builtin_classify_type should make the call.
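
The disagreement between the two forms can be seen in a small model of the dispatch the patch introduces. This sketch takes the values 18 and 19 from the message above and -1 for no_type_class. The `kind_model` enum and `classify_model` function are hypothetical and stand in for TREE_CODE and type_to_class. It models only the cases discussed here.

```c
#include <stdbool.h>

/* Only the values mentioned in this thread.  */
enum type_class_model
{
  no_type_class = -1,
  bitint_type_class = 18,
  vector_type_class = 19
};

/* Hypothetical stand-in for the tree codes involved.  */
enum kind_model { KIND_BITINT, KIND_VECTOR, KIND_UNHANDLED };

/* FOR_TYPE is true for __builtin_classify_type (type) and false for
   __builtin_classify_type (expression).  */
int
classify_model (enum kind_model kind, bool for_type)
{
  switch (kind)
    {
    case KIND_BITINT:
      return bitint_type_class;
    case KIND_VECTOR:
      return for_type ? vector_type_class : no_type_class;
    default:
      return no_type_class;
    }
}
```

With this dispatch a vector yields 19 through the type form and -1 through the expression form, while a _BitInt yields 18 either way. That asymmetry is the behaviour being questioned in the reply.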



2023-11-11  Jakub Jelinek  

gcc/
* typeclass.h (enum type_class): Add vector_type_class.
* builtins.h (type_to_class): Add FOR_TYPE argument.
* builtins.cc (type_to_class): Add FOR_TYPE argument,
for VECTOR_TYPE return vector_type_class if it is true, no_type_class
otherwise.
(expand_builtin_classify_type, fold_builtin_classify_type): Pass
false to type_to_class second argument.
gcc/c/
* c-parser.cc (c_parser_postfix_expression_after_primary): Pass true
to type_to_class second argument.
gcc/cp/
* parser.cc (cp_parser_postfix_expression): Pass true to type_to_class
second argument.
* pt.cc (tsubst_expr): Likewise.
gcc/testsuite/
* c-c++-common/builtin-classify-type-1.c (main): Add tests for vector
types.

--- gcc/typeclass.h.jj  2023-09-06 17:28:24.238977355 +0200
+++ gcc/typeclass.h 2023-11-10 10:50:59.519007647 +0100
@@ -38,7 +38,7 @@ enum type_class
record_type_class, union_type_class,
array_type_class, string_type_class,
lang_type_class, opaque_type_class,
-  bitint_type_class
+  bitint_type_class, vector_type_class
  };
  
  #endif /* GCC_TYPECLASS_H */

--- gcc/builtins.h.jj   2023-09-29 10:39:37.073836032 +0200
+++ gcc/builtins.h  2023-11-10 11:17:22.196907216 +0100
@@ -156,6 +156,6 @@ extern internal_fn associated_internal_f
  extern internal_fn replacement_internal_fn (gcall *);
  
  extern bool builtin_with_linkage_p (tree);

-extern int type_to_class (tree);
+extern int type_to_class (tree, bool);
  
  #endif /* GCC_BUILTINS_H */

--- gcc/builtins.cc.jj  2023-11-09 09:17:40.230182483 +0100
+++ gcc/builtins.cc 2023-11-10 11:19:29.669129855 +0100
@@ -1833,10 +1833,11 @@ expand_builtin_return (rtx result)
expand_naked_return ();
  }
  
-/* Used by expand_builtin_classify_type and fold_builtin_classify_type.  */

+/* Used by expand_builtin_classify_type and fold_builtin_classify_type.
+   FOR_TYPE is true for __builtin_classify_type (type), false otherwise.  */
  
  int

-type_to_class (tree type)
+type_to_class (tree type, bool for_type)
  {
switch (TREE_CODE (type))
  {
@@ -1859,6 +1860,7 @@ type_to_class (tree type)
  case LANG_TYPE:  return lang_type_class;
  case OPAQUE_TYPE:  return opaque_type_class;
  case BITINT_TYPE:return bitint_type_class;
+case VECTOR_TYPE: return for_type ? vector_type_class : no_type_class;
  default: return no_type_class;
  }
  }
@@ -1869,7 +1871,7 @@ static rtx
  expand_builtin_classify_type (tree exp)
  {
if (call_expr_nargs (exp))
-return GEN_INT (type_to_class (TREE_TYPE (CALL_EXPR_ARG (exp, 0))));
+return GEN_INT (type_to_class (TREE_TYPE (CALL_EXPR_ARG (exp, 0)), false));
return GEN_INT (no_type_class);
  }
  
@@ -8678,7 +8680,8 @@ fold_builtin_classify_type (tree arg)

if (arg == 0)
  return build_int_cst (integer_type_node, no_type_class);
  
-  return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));

+  return build_int_cst (integer_type_node,
+   type_to_class (TREE_TYPE (arg), false));
  }
  
  /* Fold a call EXPR (which may be null) to __builtin_strlen with argument

--- gcc/c/c-parser.cc.jj2023-11-09 09:04:18.473545429 +0100
+++ gcc/c/c-parser.cc   2023-11-10 11:19:57.907735925 +0100
@@ -12249,7 +12249,7 @@ c_parser_postfix_expression_after_primar
   _const_operands);
parens.skip_until_found_close (parser);
expr.value = build_int_cst (integer_type_node,
-   type_to_class (ret.spec));
+   type_to_class (ret.spec, true));
break;
  }
else
--- gcc/cp/parser.cc.jj 2023-11-09 09:04:18.771541207 +0100
+++ gcc/cp/parser.cc2023-11-10 

Re: [committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread Jeff Law




On 11/16/23 10:54, John David Anglin wrote:

Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk.

This patch works around a problem compiling python3.11 by improving
REG+D address handling.  The change results in smaller code and
reduced register pressure.

Dave
---

hppa: Revise REG+D address support to allow long displacements before reload

In analyzing PR rtl-optimization/112415, I realized that restricting
REG+D offsets to 5-bits before reload results in very poor code and
complexities in optimizing these instructions after reload.  The
general problem is long displacements are not allowed for floating
point accesses when generating PA 1.1 code.  Even with PA 2.0, there
is an ELF linker bug that prevents using long displacements for
floating point loads and stores.

In the past, enabling long displacements before reload caused issues
in reload.  However, there have been fixes in the handling of reloads
for floating-point accesses.  This change allows long displacements
before reload and corrects a couple of issues in the constraint
handling for integer and floating-point accesses.

2023-11-16  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/112415
* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
displacements before reload.  Simplify logic flow.  Revise
comments.
* config/pa/pa.h (TARGET_ELF64): New define.
(INT14_OK_STRICT): Update define and comment.
* config/pa/pa64-linux.h (TARGET_ELF64): Define.
* config/pa/predicates.md (base14_operand): Don't check
alignment of short displacements.
(integer_store_memory_operand): Don't return true when
reload_in_progress is true.  Remove INT_5_BITS check.
(floating_point_store_memory_operand): Don't return true when
reload_in_progress is true.  Use INT14_OK_STRICT to check
whether long displacements are always okay.

I strongly suspect this is going to cause problems in the end.

I've already done what you're trying to do.  It'll likely look fine for 
an extended period of time, but it will almost certainly break one day.


Jeff



Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread David Edelsohn
Don't build with the dependent libraries in tree.  Don't build the
dependent libraries as shared libraries. The libraries are already built
and in /opt/cfarm, as mentioned in the Compile Farm wiki.

AIX is not Solaris and not Linux.  It doesn't use ELF.  AIX shared
libraries *ARE* shared object files in archives.  Shared object versioning
is handled by multiple objects in the same archive.

Thanks, David



On Thu, Nov 16, 2023 at 4:15 PM Arsen Arsenović  wrote:

>
> Arsen Arsenović  writes:
>
> > David Edelsohn  writes:
> >
> >> On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović 
> wrote:
> >>
> >>>
> >>> David Edelsohn  writes:
> >>>
> >>> > GCC had been working on AIX with NLS, using
> "--with-included-gettext".
> >>> > --disable-nls gets past the breakage, but GCC does not build for me
> on
> >>> AIX
> >>> > with NLS enabled.
> >>>
> >>> That should still work with gettext 0.22+ extracted in-tree (it should
> >>> be fetched by download_prerequisites).
> >>>
> >>> > A change in dependencies for GCC should have been announced and more
> >>> widely
> >>> > socialized in the GCC development mailing list, not just GCC patches
> >>> > mailing list.
> >>> >
> >>> > I have tried both the AIX Open Source libiconv and libgettext
> package,
> >>> and
> >>> > the ones that I previously built.  Both fail because GCC configure
> >>> decides
> >>> > to disable NLS, despite being requested, while libcpp is satisfied,
> so
> >>> > tools in the gcc subdirectory don't link against libiconv and the
> build
> >>> > fails.  With the included gettext, I was able to rely on a
> >>> self-consistent
> >>> > solution.
> >>>
> >>> That is interesting.  They should be using the same checks.  I've
> >>> checked trunk and regenerated files on it, and saw no significant diff
> >>> (some whitespace changes only).  Could you post the config.log of both?
> >>>
> >>> I've never used AIX.  Can I reproduce this on one of the cfarm machines
> >>> to poke around?  I've tried cfarm119, but that one lacked git, and I
> >>> haven't poked around much further due to time constraints.
> >>>
> >>
> >> The AIX system in the Compile Farm has a complete complement of Open
> Source
> >> software installed.
> >>
> >> Please ensure that /opt/freeware/bin is in your path.  Also, the GCC
> Wiki
> >> Compile Farm page has build tips that include AIX
> >>
> >>
> https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines
> >
> > Thanks, that got me further.
> >
> >> that recommended --with-included-gettext configuration option.
> >
> > This flag should still exist and operate the same if gettext is present
> > in tree.  I've cloned gcc and downloaded prerequisites (via
> > contrib/download_prerequisites) and I am trying to configure it now.
>
> The build failed.  After gettext/gmp/... (in-tree hostlibs) get built
> and added to library paths, further GCC processes fail to run:
>
> configure:3305: gcc -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc
> conftest.c  >&5
> Could not load program
> /opt/freeware/libexec/gcc/powerpc-ibm-aix7.3.0.0/10/cc1:
> Dependent module
> /home/arsen/build/./gmp/.libs/libgmp.a(libgmp.so.10) could not be loaded.
> Member libgmp.so.10 is not found in archive
>
> This seems odd.  I am not sure what compels the RTDL (?) to look up .sos
> in archives, or how it knows about these archives..  I suspect it's
> getting tripped by something in HOST_EXPORTS.
>
> >> Thanks, David
> >>
> >>
> >>>
> >>> TIA, sorry about the inconvenience.  Have a lovely day.
> >>>
> >>> > The current gettext-0.22.3 fails to build for me on AIX.
> >>> >
> >>> > libcpp configure believes that NLS functions on AIX, but gcc
> configure
> >>> > fails in its tests of gettext functionality, which leads to an
> >>> inconsistent
> >>> > configuration and build breakage.
> >>> >
> >>> > Thanks, David
> >>>
> >>>
> >>> --
> >>> Arsen Arsenović
> >>>
>
>
> --
> Arsen Arsenović
>


[PATCH 4/4] c23: construct composite type for tagged types

2023-11-16 Thread Martin Uecker






Support for constructing composite type for structs and unions
in C23.
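
The diff that follows threads a stack-allocated cache through the recursion so that a struct which refers to itself does not recurse forever. The following is a minimal sketch of that idea on a toy node type; Node, Arena and merge are hypothetical names for illustration and are not GCC's tree API.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a type node; a field may point back to its own type.
struct Node
{
  int id;
  std::vector<Node *> fields;
};

// One entry per pair currently being merged.  Entries live on the call
// stack and are chained through NEXT, innermost first.
struct CompositeCache
{
  const Node *t1;
  const Node *t2;
  Node *composite;
  const CompositeCache *next;
};

// Owns every node created, so the sketch needs no manual cleanup.
struct Arena
{
  std::vector<std::unique_ptr<Node>> nodes;

  Node *make (int id)
  {
    nodes.push_back (std::make_unique<Node> ());
    nodes.back ()->id = id;
    return nodes.back ().get ();
  }
};

Node *
merge (const Node *t1, const Node *t2, const CompositeCache *cache,
       Arena &arena)
{
  // If a composite for this pair is already under construction, reuse it.
  for (const CompositeCache *c = cache; c != nullptr; c = c->next)
    if (c->t1 == t1 && c->t2 == t2)
      return c->composite;

  // Otherwise create the node first and register it before recursing.
  Node *n = arena.make (t1->id);
  CompositeCache entry = { t1, t2, n, cache };

  for (std::size_t i = 0; i < t1->fields.size () && i < t2->fields.size (); ++i)
    n->fields.push_back (merge (t1->fields[i], t2->fields[i], &entry, arena));

  return n;
}
```

Merging two self-referential nodes with merge (&a, &b, nullptr, arena) creates exactly one new node whose field points back at itself. Registering the node before visiting its fields is what makes the recursion terminate.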

gcc/c:
* c-typeck.cc (composite_type_internal): Adapted from
composite_type to support structs and unions.
(composite_type): New wrapper function.
(build_conditional_operator): Return composite type.

gcc/testsuite:
* gcc.dg/c23-tag-composite-1.c: New test.
* gcc.dg/c23-tag-composite-2.c: New test.
* gcc.dg/c23-tag-composite-3.c: New test.
* gcc.dg/c23-tag-composite-4.c: New test.
---
 gcc/c/c-typeck.cc  | 114 +
 gcc/testsuite/gcc.dg/c23-tag-composite-1.c |  26 +
 gcc/testsuite/gcc.dg/c23-tag-composite-2.c |  16 +++
 gcc/testsuite/gcc.dg/c23-tag-composite-3.c |  17 +++
 gcc/testsuite/gcc.dg/c23-tag-composite-4.c |  21 
 5 files changed, 176 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-composite-4.c

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 262b04c582f..2255fb66bb2 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -381,8 +381,15 @@ build_functype_attribute_variant (tree ntype, tree otype, 
tree attrs)
nonzero; if that isn't so, this may crash.  In particular, we
assume that qualifiers match.  */
 
+struct composite_cache {
+  tree t1;
+  tree t2;
+  tree composite;
+  struct composite_cache* next;
+};
+
 tree
-composite_type (tree t1, tree t2)
+composite_type_internal (tree t1, tree t2, struct composite_cache* cache)
 {
   enum tree_code code1;
   enum tree_code code2;
@@ -427,7 +434,8 @@ composite_type (tree t1, tree t2)
   {
tree pointed_to_1 = TREE_TYPE (t1);
tree pointed_to_2 = TREE_TYPE (t2);
-   tree target = composite_type (pointed_to_1, pointed_to_2);
+   tree target = composite_type_internal (pointed_to_1,
+  pointed_to_2, cache);
 t1 = build_pointer_type_for_mode (target, TYPE_MODE (t1), false);
t1 = build_type_attribute_variant (t1, attributes);
return qualify_type (t1, t2);
@@ -435,7 +443,8 @@ composite_type (tree t1, tree t2)
 
 case ARRAY_TYPE:
   {
-   tree elt = composite_type (TREE_TYPE (t1), TREE_TYPE (t2));
+   tree elt = composite_type_internal (TREE_TYPE (t1), TREE_TYPE (t2),
+   cache);
int quals;
tree unqual_elt;
tree d1 = TYPE_DOMAIN (t1);
@@ -503,9 +512,61 @@ composite_type (tree t1, tree t2)
return build_type_attribute_variant (t1, attributes);
   }
 
-case ENUMERAL_TYPE:
 case RECORD_TYPE:
 case UNION_TYPE:
+  if (flag_isoc23 && !comptypes_same_p (t1, t2))
+   {
+ gcc_checking_assert (COMPLETE_TYPE_P (t1) && COMPLETE_TYPE_P (t2));
+ gcc_checking_assert (comptypes (t1, t2));
+
+ /* If a composite type for these two types is already under
+construction, return it.  */
+
+ for (struct composite_cache *c = cache; c != NULL; c = c->next)
+   if (c->t1 == t1 && c->t2 == t2)
+  return c->composite;
+
+ /* Otherwise, create a new type node and link it into the cache.  */
+
+ tree n = make_node (code1);
+ struct composite_cache cache2 = { t1, t2, n, cache };
+ cache = &cache2;
+
+ tree f1 = TYPE_FIELDS (t1);
+ tree f2 = TYPE_FIELDS (t2);
+ tree fields = NULL_TREE;
+
+ for (tree a = f1, b = f2; a && b;
+  a = DECL_CHAIN (a), b = DECL_CHAIN (b))
+   {
+ tree ta = TREE_TYPE (a);
+ tree tb = TREE_TYPE (b);
+
+ gcc_assert (DECL_NAME (a) == DECL_NAME (b));
+ gcc_assert (comptypes (ta, tb));
+
+ tree f = build_decl (input_location, FIELD_DECL, DECL_NAME (a),
+  composite_type_internal (ta, tb, cache));
+
+ DECL_FIELD_CONTEXT (f) = n;
+ DECL_CHAIN (f) = fields;
+ fields = f;
+   }
+
+ TYPE_NAME (n) = TYPE_NAME (t1);
+ TYPE_FIELDS (n) = nreverse (fields);
+ TYPE_ATTRIBUTES (n) = attributes;
+ layout_type (n);
+ n = build_type_attribute_variant (n, attributes);
+ n = qualify_type (n, t1);
+
+ gcc_checking_assert (comptypes (n, t1));
+ gcc_checking_assert (comptypes (n, t2));
+
+ return n;
+   }
+  /* FALLTHRU */
+case ENUMERAL_TYPE:
   if (attributes != NULL)
{
  /* Try harder not to create a new aggregate type.  */
@@ -520,7 +581,8 @@ composite_type (tree t1, tree t2)
   /* Function types: prefer the one that specified arg types.
 If both do, merge the arg types.  Also merge the return types.  */
   {
-   tree valtype = composite_type (TREE_TYPE (t1), 

[PATCH] c++: Set DECL_CONTEXT for __cxa_thread_atexit [PR99187]

2023-11-16 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
access.

-- >8 --

Modules streaming requires DECL_CONTEXT to be set on declarations that
are streamed. This ensures that __cxa_thread_atexit is given translation
unit context much like is already done with many other support
functions.

PR c++/99187

gcc/cp/ChangeLog:

* cp-tree.h (enum cp_tree_index): Add CPTI_THREAD_ATEXIT.
(thread_atexit_node): New.
* decl.cc (get_thread_atexit_node): Cache in thread_atexit_node
and set DECL_CONTEXT.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99187.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h   |  4 
 gcc/cp/decl.cc | 15 +--
 gcc/testsuite/g++.dg/modules/pr99187.C | 10 ++
 3 files changed, 27 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99187.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 1fa710d7154..c7a1cf610c8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -231,6 +231,7 @@ enum cp_tree_index
 CPTI_RETHROW_FN,
 CPTI_ATEXIT_FN_PTR_TYPE,
 CPTI_ATEXIT,
+CPTI_THREAD_ATEXIT,
 CPTI_DSO_HANDLE,
 CPTI_DCAST,
 
@@ -375,6 +376,9 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
 /* A pointer to `std::atexit'.  */
 #define atexit_node cp_global_trees[CPTI_ATEXIT]
 
+/* A pointer to `__cxa_thread_atexit'.  */
+#define thread_atexit_node cp_global_trees[CPTI_THREAD_ATEXIT]
+
 /* A pointer to `__dso_handle'.  */
 #define dso_handle_node cp_global_trees[CPTI_DSO_HANDLE]
 
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index d2ed46b1453..6a1f4213c9a 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -9592,6 +9592,9 @@ get_atexit_node (void)
 static tree
 get_thread_atexit_node (void)
 {
+  if (thread_atexit_node)
+return thread_atexit_node;
+
   /* The declaration for `__cxa_thread_atexit' is:
 
  int __cxa_thread_atexit (void (*)(void *), void *, void *) */
@@ -9600,10 +9603,18 @@ get_thread_atexit_node (void)
   ptr_type_node, ptr_type_node,
   NULL_TREE);
 
-  /* Now, build the function declaration.  */
+  /* Now, build the function declaration, as with __cxa_atexit.  */
+  unsigned flags = push_abi_namespace ();
   tree atexit_fndecl = build_library_fn_ptr ("__cxa_thread_atexit", fn_type,
 ECF_LEAF | ECF_NOTHROW);
-  return decay_conversion (atexit_fndecl, tf_warning_or_error);
+  DECL_CONTEXT (atexit_fndecl) = FROB_CONTEXT (current_namespace);
+  DECL_SOURCE_LOCATION (atexit_fndecl) = BUILTINS_LOCATION;
+  atexit_fndecl = pushdecl (atexit_fndecl, /*hiding=*/true);
+  pop_abi_namespace (flags);
+  mark_used (atexit_fndecl);
+  thread_atexit_node = decay_conversion (atexit_fndecl, tf_warning_or_error);
+
+  return thread_atexit_node;
 }
 
 /* Returns the __dso_handle VAR_DECL.  */
diff --git a/gcc/testsuite/g++.dg/modules/pr99187.C 
b/gcc/testsuite/g++.dg/modules/pr99187.C
new file mode 100644
index 000..7f707e0c703
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99187.C
@@ -0,0 +1,10 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi pr99187 }
+
+export module pr99187;
+
+export struct A { ~A() {} };
+
+export inline void f() {
+  static thread_local A a;
+}
-- 
2.42.0



[PATCH 3/4] c23: aliasing of compatible tagged types

2023-11-16 Thread Martin Uecker




Tell the backend which types are equivalent by setting
TYPE_CANONICAL to one struct in the set of equivalent
structs. Structs are considered equivalent by ignoring
all sizes of arrays nested in types below field level.
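
As a rough illustration of the equivalence-class idea, the sketch below picks the first struct seen in each class as the canonical one. It hashes on the tag only, so that equivalent types always land in the same bucket, and leaves the real comparison to the equality predicate. StructType, EquivEq and CanonicalTable are hypothetical names, and the field-list comparison is only a toy stand-in for the compiler's equivalence check.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of a tagged type; CANONICAL plays the role of TYPE_CANONICAL.
struct StructType
{
  std::string tag;
  std::vector<std::string> field_types;
  const StructType *canonical;
};

// Hash on the tag only: types that compare equivalent must hash equal.
struct TagHash
{
  std::size_t operator() (const StructType *t) const
  {
    return std::hash<std::string> {} (t->tag);
  }
};

// Toy equivalence test standing in for the real type comparison.
struct EquivEq
{
  bool operator() (const StructType *a, const StructType *b) const
  {
    return a->tag == b->tag && a->field_types == b->field_types;
  }
};

class CanonicalTable
{
  std::unordered_set<const StructType *, TagHash, EquivEq> table_;

public:
  // Called once a struct is complete: the first member of each
  // equivalence class becomes the canonical type of later members.
  void finish (StructType &t)
  {
    auto result = table_.insert (&t);
    t.canonical = *result.first;
  }
};
```

Two structs with the same tag and the same fields end up sharing one canonical pointer, while a struct with the same tag but different fields gets its own.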

gcc/c:
* c-decl.cc (c_struct_hasher): Hash stable for struct
types.
(c_struct_hasher::hash, c_struct_hasher::equal): New
functions.
(finish_struct): Set TYPE_CANONICAL to first struct in
equivalence class.
* c-objc-common.cc (c_get_alias_set): Let structs or
unions with variable size alias anything.
* c-tree.h (comptypes_equiv): New prototype.
* c-typeck.cc (comptypes_equiv): New function.
(comptypes_internal): Implement equivalence mode.
(tagged_types_tu_compatible): Implement equivalence mode.

gcc/testsuite:
* gcc.dg/c23-tag-2.c: Activate.
* gcc.dg/c23-tag-6.c: Activate.
* gcc.dg/c23-tag-alias-1.c: New test.
* gcc.dg/c23-tag-alias-2.c: New test.
* gcc.dg/c23-tag-alias-3.c: New test.
* gcc.dg/c23-tag-alias-4.c: New test.
* gcc.dg/c23-tag-alias-5.c: New test.
* gcc.dg/c23-tag-alias-6.c: New test.
* gcc.dg/c23-tag-alias-7.c: New test.
* gcc.dg/c23-tag-alias-8.c: New test.
* gcc.dg/gnu23-tag-alias-1.c: New test.
---
 gcc/c/c-decl.cc  | 48 +
 gcc/c/c-objc-common.cc   |  5 ++
 gcc/c/c-tree.h   |  1 +
 gcc/c/c-typeck.cc| 31 
 gcc/testsuite/gcc.dg/c23-tag-2.c |  4 +-
 gcc/testsuite/gcc.dg/c23-tag-5.c |  5 +-
 gcc/testsuite/gcc.dg/c23-tag-alias-1.c   | 48 +
 gcc/testsuite/gcc.dg/c23-tag-alias-2.c   | 73 +++
 gcc/testsuite/gcc.dg/c23-tag-alias-3.c   | 48 +
 gcc/testsuite/gcc.dg/c23-tag-alias-4.c   | 73 +++
 gcc/testsuite/gcc.dg/c23-tag-alias-5.c   | 30 
 gcc/testsuite/gcc.dg/c23-tag-alias-6.c   | 77 
 gcc/testsuite/gcc.dg/c23-tag-alias-7.c   | 86 ++
 gcc/testsuite/gcc.dg/c23-tag-alias-8.c   | 90 
 gcc/testsuite/gcc.dg/gnu23-tag-alias-1.c | 33 +
 15 files changed, 648 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-5.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-6.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-7.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-alias-8.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-alias-1.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index e5d48c3fa56..d0a405087c3 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -634,6 +634,36 @@ public:
   auto_vec<tree> typedefs_seen;
 };
 
+
+/* Hash table for structs and unions.  */
+struct c_struct_hasher : ggc_ptr_hash<tree_node>
+{
+  static hashval_t hash (tree t);
+  static bool equal (tree, tree);
+};
+
+/* Hash an RECORD OR UNION.  */
+hashval_t
+c_struct_hasher::hash (tree type)
+{
+  inchash::hash hstate;
+
+  hstate.add_int (TREE_CODE (type));
+  hstate.add_object (TYPE_NAME (type));
+
+  return hstate.end ();
+}
+
+/* Compare two RECORD or UNION types.  */
+bool
+c_struct_hasher::equal (tree t1,  tree t2)
+{
+  return comptypes_equiv_p (t1, t2);
+}
+
+/* All tagged typed so that TYPE_CANONICAL can be set correctly.  */
+static GTY (()) hash_table<c_struct_hasher> *c_struct_htab;
+
 /* Information for the struct or union currently being parsed, or
NULL if not parsing a struct or union.  */
 static class c_struct_parse_info *struct_parse_info;
@@ -9646,6 +9676,24 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
 
   C_TYPE_BEING_DEFINED (t) = 0;
 
+  /* Set type canonical based on equivalence class.  */
+  if (flag_isoc23)
+{
+  if (NULL == c_struct_htab)
+   c_struct_htab = hash_table<c_struct_hasher>::create_ggc (61);
+
+  hashval_t hash = c_struct_hasher::hash (t);
+
+  tree *e = c_struct_htab->find_slot_with_hash (t, hash, INSERT);
+  if (*e)
+   TYPE_CANONICAL (t) = *e;
+  else
+   {
+ TYPE_CANONICAL (t) = t;
+ *e = t;
+   }
+}
+
   tree incomplete_vars = C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t));
   for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
 {
diff --git a/gcc/c/c-objc-common.cc b/gcc/c/c-objc-common.cc
index c8f49aa2370..738afbad770 100644
--- a/gcc/c/c-objc-common.cc
+++ b/gcc/c/c-objc-common.cc
@@ -389,6 +389,11 @@ c_get_alias_set (tree t)
   if (TREE_CODE (t) == ENUMERAL_TYPE)
 return get_alias_set (ENUM_UNDERLYING_TYPE (t));
 
+  /* Structs with variable size can alias different incompatible
+ structs.  Let them alias anything.   */
+  if (RECORD_OR_UNION_TYPE_P (t) && 

[PATCH 2/4] c23: tag compatibility rules for enums

2023-11-16 Thread Martin Uecker




Allow redefinition of enum types and enumerators.  Diagnose
nested redefinitions including redefinitions in the enum
specifier for enum types with fixed underlying type.
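
The enumerator check in diagnose_mismatched_decls in the diff below boils down to a small decision rule, sketched here on toy data. The Enumerator struct and the function name are hypothetical; in the compiler the inputs are CONST_DECLs and their DECL_CONTEXT.

```cpp
#include <string>

enum class Redecl { Accepted, Conflicting, Rejected };

// Toy model of an enumerator declaration.
struct Enumerator
{
  std::string enum_tag;	// tag of the enclosing enum, empty if anonymous
  int enum_id;		// identifies the enum definition it belongs to
  long value;
};

// Sketch of the rule: in C23 an enumerator may be redeclared by a second
// definition of an enum with the same tag, provided the value is unchanged.
Redecl
check_enumerator_redeclaration (const Enumerator &newdecl,
				const Enumerator &olddecl, bool c23)
{
  if (c23
      && !newdecl.enum_tag.empty ()
      && newdecl.enum_id != olddecl.enum_id
      && newdecl.enum_tag == olddecl.enum_tag)
    return newdecl.value == olddecl.value ? Redecl::Accepted
					   : Redecl::Conflicting;

  // Same definition, anonymous enum, different tag, or pre-C23 mode.
  return Redecl::Rejected;
}
```

Accepted corresponds to no diagnostic, Conflicting to the new "conflicting redeclaration of enumerator" error, and Rejected to the existing "redeclaration of enumerator" error.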

gcc/c:
* c-tree.h (c_parser_enum_specifier): Add parameter.
* c-decl.cc (start_enum): Allow redefinition.
(finish_enum): Diagnose conflicts.
(build_enumerator): Set context.
(diagnose_mismatched_decls): Diagnose conflicting enumerators.
(push_decl): Preserve context for enumerators.
* c-parser.cc (c_parser_enum_specifier): Remember when
seen is from an enum type which is not yet defined.

gcc/testsuite/:
* gcc.dg/c23-tag-enum-1.c: New test.
* gcc.dg/c23-tag-enum-2.c: New test.
* gcc.dg/c23-tag-enum-3.c: New test.
* gcc.dg/c23-tag-enum-4.c: New test.
* gcc.dg/c23-tag-enum-5.c: New test.
---
 gcc/c/c-decl.cc   | 65 +++
 gcc/c/c-parser.cc |  5 ++-
 gcc/c/c-tree.h|  3 +-
 gcc/c/c-typeck.cc |  5 ++-
 gcc/testsuite/gcc.dg/c23-tag-enum-1.c | 56 +++
 gcc/testsuite/gcc.dg/c23-tag-enum-2.c | 23 ++
 gcc/testsuite/gcc.dg/c23-tag-enum-3.c |  7 +++
 gcc/testsuite/gcc.dg/c23-tag-enum-4.c | 22 +
 gcc/testsuite/gcc.dg/c23-tag-enum-5.c | 18 
 9 files changed, 192 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-enum-5.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 194dd595334..e5d48c3fa56 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2114,9 +2114,24 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
  given scope.  */
   if (TREE_CODE (olddecl) == CONST_DECL)
 {
-  auto_diagnostic_group d;
-  error ("redeclaration of enumerator %q+D", newdecl);
-  locate_old_decl (olddecl);
+  if (flag_isoc23
+ && TYPE_NAME (DECL_CONTEXT (newdecl))
+ && DECL_CONTEXT (newdecl) != DECL_CONTEXT (olddecl)
+ && TYPE_NAME (DECL_CONTEXT (newdecl)) == TYPE_NAME (DECL_CONTEXT 
(olddecl)))
+   {
+ if (!simple_cst_equal (DECL_INITIAL (olddecl), DECL_INITIAL 
(newdecl)))
+   {
+ auto_diagnostic_group d;
+ error ("conflicting redeclaration of enumerator %q+D", newdecl);
+ locate_old_decl (olddecl);
+   }
+   }
+  else
+   {
+ auto_diagnostic_group d;
+ error ("redeclaration of enumerator %q+D", newdecl);
+ locate_old_decl (olddecl);
+   }
   return false;
 }
 
@@ -3277,8 +3292,11 @@ pushdecl (tree x)
 
   /* Must set DECL_CONTEXT for everything not at file scope or
  DECL_FILE_SCOPE_P won't work.  Local externs don't count
- unless they have initializers (which generate code).  */
+ unless they have initializers (which generate code).  We
+ also exclude CONST_DECLs because enumerators will get the
+ type of the enum as context.  */
   if (current_function_decl
+  && TREE_CODE (x) != CONST_DECL
   && (!VAR_OR_FUNCTION_DECL_P (x)
  || DECL_INITIAL (x) || !TREE_PUBLIC (x)))
 DECL_CONTEXT (x) = current_function_decl;
@@ -9737,7 +9755,7 @@ layout_array_type (tree t)
 
 tree
 start_enum (location_t loc, struct c_enum_contents *the_enum, tree name,
-   tree fixed_underlying_type)
+   tree fixed_underlying_type, bool potential_nesting_p)
 {
   tree enumtype = NULL_TREE;
   location_t enumloc = UNKNOWN_LOCATION;
@@ -9749,9 +9767,26 @@ start_enum (location_t loc, struct c_enum_contents 
*the_enum, tree name,
   if (name != NULL_TREE)
     enumtype = lookup_tag (ENUMERAL_TYPE, name, true, &enumloc);
 
+  if (enumtype != NULL_TREE && TREE_CODE (enumtype) == ENUMERAL_TYPE)
+{
+  /* If the type is currently being defined or if we have seen an
+incomplete version which is now complete, this is a nested
+redefinition.  The later happens if the redefinition occurs
+inside the enum specifier itself.  */
+  if (C_TYPE_BEING_DEFINED (enumtype)
+ || (potential_nesting_p && TYPE_VALUES (enumtype) != NULL_TREE))
+   error_at (loc, "nested redefinition of %<enum %E%>", name);
+
+ /* For C23 we allow redefinitions.  We set to zero and check for
+   consistency later.  */
+  if (flag_isoc23 && TYPE_VALUES (enumtype) != NULL_TREE)
+   enumtype = NULL_TREE;
+}
+
   if (enumtype == NULL_TREE || TREE_CODE (enumtype) != ENUMERAL_TYPE)
 {
   enumtype = make_node (ENUMERAL_TYPE);
+  TYPE_SIZE (enumtype) = NULL_TREE;
   pushtag (loc, name, enumtype);
   if (fixed_underlying_type != NULL_TREE)
{
@@ -9779,9 +9814,6 @@ start_enum (location_t loc, struct c_enum_contents 
*the_enum, tree name,

[PATCH 1/4] c23: tag compatibility rules for struct and unions

2023-11-16 Thread Martin Uecker



Implement redeclaration and compatibility rules for
structures and unions in C23.

gcc/c/:
* c-decl.cc (previous_tag): New function.
(get_parm_info): Turn off warning for C2X.
(start_struct): Allow redefinitions.
(finish_struct): Diagnose conflicts.
* c-tree.h (comptypes_same_p): Add prototype.
* c-typeck.cc (comptypes_same_p): New function.
(comptypes_internal): Activate comparison of tagged
types.
(convert_for_assignment): Ignore qualifiers.
(digest_init): Add error.
(initialized_elementwise_p): Allow compatible types.

gcc/testsuite/:
* gcc.dg/c23-enum-7.c: Remove warning.
* gcc.dg/c23-tag-1.c: New test.
* gcc.dg/c23-tag-2.c: New deactivated test.
* gcc.dg/c23-tag-3.c: New test.
* gcc.dg/c23-tag-4.c: New test.
* gcc.dg/c23-tag-5.c: New deactivated test.
* gcc.dg/c23-tag-6.c: New test.
* gcc.dg/c23-tag-7.c: New test.
* gcc.dg/c23-tag-8.c: New test.
* gcc.dg/gnu23-tag-1.c: New test.
* gcc.dg/gnu23-tag-2.c: New test.
* gcc.dg/gnu23-tag-3.c: New test.
* gcc.dg/gnu23-tag-4.c: New test.
---
 gcc/c/c-decl.cc| 62 ---
 gcc/c/c-tree.h |  1 +
 gcc/c/c-typeck.cc  | 38 +
 gcc/testsuite/gcc.dg/c23-enum-7.c  |  6 +--
 gcc/testsuite/gcc.dg/c23-tag-1.c   | 67 ++
 gcc/testsuite/gcc.dg/c23-tag-2.c   | 43 +++
 gcc/testsuite/gcc.dg/c23-tag-3.c   | 16 +++
 gcc/testsuite/gcc.dg/c23-tag-4.c   | 26 
 gcc/testsuite/gcc.dg/c23-tag-5.c   | 33 +++
 gcc/testsuite/gcc.dg/c23-tag-6.c   | 25 +++
 gcc/testsuite/gcc.dg/c23-tag-7.c   | 12 ++
 gcc/testsuite/gcc.dg/c23-tag-8.c   | 10 +
 gcc/testsuite/gcc.dg/gnu23-tag-1.c | 10 +
 gcc/testsuite/gcc.dg/gnu23-tag-2.c | 19 +
 gcc/testsuite/gcc.dg/gnu23-tag-3.c | 28 +
 gcc/testsuite/gcc.dg/gnu23-tag-4.c | 31 ++
 16 files changed, 411 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-4.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-5.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-6.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-7.c
 create mode 100644 gcc/testsuite/gcc.dg/c23-tag-8.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-2.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-3.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu23-tag-4.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 64d3a941cb9..194dd595334 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2039,6 +2039,28 @@ locate_old_decl (tree decl)
decl, TREE_TYPE (decl));
 }
 
+
+/* Subroutine of finish_struct.  For a tagged type, it finds the
+   declaration for a visible tag declared in the the same scope
+   if such a declaration exists.  */
+static tree
+previous_tag (tree type)
+{
+  struct c_binding *b = NULL;
+  tree name = TYPE_NAME (type);
+
+  if (name)
+b = I_TAG_BINDING (name);
+
+  if (b)
+b = b->shadowed;
+
+  if (b && B_IN_CURRENT_SCOPE (b))
+return b->decl;
+
+  return NULL_TREE;
+}
+
 /* Subroutine of duplicate_decls.  Compare NEWDECL to OLDDECL.
Returns true if the caller should proceed to merge the two, false
if OLDDECL should simply be discarded.  As a side effect, issues
@@ -8573,11 +8595,14 @@ get_parm_info (bool ellipsis, tree expr)
  if (TREE_CODE (decl) != UNION_TYPE || b->id != NULL_TREE)
{
  if (b->id)
-   /* The %s will be one of 'struct', 'union', or 'enum'.  */
-   warning_at (b->locus, 0,
-   "%<%s %E%> declared inside parameter list"
-   " will not be visible outside of this definition or"
-   " declaration", keyword, b->id);
+   {
+ /* The %s will be one of 'struct', 'union', or 'enum'.  */
+ if (!flag_isoc23)
+   warning_at (b->locus, 0,
+   "%<%s %E%> declared inside parameter list"
+   " will not be visible outside of this 
definition or"
+   " declaration", keyword, b->id);
+   }
  else
/* The %s will be one of 'struct', 'union', or 'enum'.  */
warning_at (b->locus, 0,
@@ -8782,6 +8807,14 @@ start_struct (location_t loc, enum tree_code code, tree 
name,
 
   if (name != NULL_TREE)
 ref = lookup_tag (code, name, true, );
+
+  /* For C23, even if we already have a completed definition,
+ we do not use it. We will check for consistency later.
+ If we are in a nested 

c23 type compatibility rules, v3

2023-11-16 Thread Martin Uecker


Joseph,

this is another revised series for the C23 rules for type
compatibility.

1/4 c23: tag compatibility rules for struct and unions
2/4 c23: tag compatibility rules for enums
3/4 c23: aliasing of compatible tagged types
4/4 c23: construct composite type for tagged types


The first two were revised to address the nesting (and
other) issues you pointed out.

For 3 and 4 I only changed c2x to c23 and moved some
tests around. 3 wasn't reviewed so far and 4 still
needs some more work from my side.


Bootstrapped and regression tested on x86_64.


Martin








Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread Arsen Arsenović

Arsen Arsenović  writes:

> David Edelsohn  writes:
>
>> On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:
>>
>>>
>>> David Edelsohn  writes:
>>>
>>> > GCC had been working on AIX with NLS, using "--with-included-gettext".
>>> > --disable-nls gets past the breakage, but GCC does not build for me on
>>> AIX
>>> > with NLS enabled.
>>>
>>> That should still work with gettext 0.22+ extracted in-tree (it should
>>> be fetched by download_prerequisites).
>>>
>>> > A change in dependencies for GCC should have been announced and more
>>> widely
>>> > socialized in the GCC development mailing list, not just GCC patches
>>> > mailing list.
>>> >
>>> > I have tried both the AIX Open Source libiconv and libgettext package,
>>> and
>>> > the ones that I previously built.  Both fail because GCC configure
>>> decides
>>> > to disable NLS, despite being requested, while libcpp is satisfied, so
>>> > tools in the gcc subdirectory don't link against libiconv and the build
>>> > fails.  With the included gettext, I was able to rely on a
>>> self-consistent
>>> > solution.
>>>
>>> That is interesting.  They should be using the same checks.  I've
>>> checked trunk and regenerated files on it, and saw no significant diff
>>> (some whitespace changes only).  Could you post the config.log of both?
>>>
>>> I've never used AIX.  Can I reproduce this on one of the cfarm machines
>>> to poke around?  I've tried cfarm119, but that one lacked git, and I
>>> haven't poked around much further due to time constraints.
>>>
>>
>> The AIX system in the Compile Farm has a complete complement of Open Source
>> software installed.
>>
>> Please ensure that /opt/freeware/bin is in your path.  Also, the GCC Wiki
>> Compile Farm page has build tips that include AIX
>>
>> https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines
>
> Thanks, that got me further.
>
>> that recommended --with-included-gettext configuration option.
>
> This flag should still exist and operate the same if gettext is present
> in tree.  I've cloned gcc and downloaded prerequisites (via
> contrib/download_prerequisites) and I am trying to configure it now.

The build failed.  After gettext/gmp/... (in-tree hostlibs) get built
and added to library paths, further GCC processes fail to run:

configure:3305: gcc -g  -static-libstdc++ -static-libgcc -Wl,-bbigtoc 
conftest.c  >&5
Could not load program /opt/freeware/libexec/gcc/powerpc-ibm-aix7.3.0.0/10/cc1:
Dependent module /home/arsen/build/./gmp/.libs/libgmp.a(libgmp.so.10) 
could not be loaded.
Member libgmp.so.10 is not found in archive 

This seems odd.  I am not sure what compels the RTDL (?) to look up .sos
in archives, or how it knows about these archives..  I suspect it's
getting tripped by something in HOST_EXPORTS.

>> Thanks, David
>>
>>
>>>
>>> TIA, sorry about the inconvenience.  Have a lovely day.
>>>
>>> > The current gettext-0.22.3 fails to build for me on AIX.
>>> >
>>> > libcpp configure believes that NLS functions on AIX, but gcc configure
>>> > fails in its tests of gettext functionality, which leads to an
>>> inconsistent
>>> > configuration and build breakage.
>>> >
>>> > Thanks, David
>>>
>>>
>>> --
>>> Arsen Arsenović
>>>


-- 
Arsen Arsenović




Re: [PATCH V3 5/7] ira: Add all nregs >= 2 pseudos to tracke subreg list

2023-11-16 Thread Vladimir Makarov



On 11/12/23 07:08, Lehua Ding wrote:

This patch relaxes the subreg tracking capability to cover all subreg registers.

The patch is OK with me once the general issues I mentioned in my first email
and the issue given below are fixed.

gcc/ChangeLog:

* ira-build.cc (get_reg_unit_size): New.
(has_same_nregs): New.
(ira_set_allocno_class): Adjust.


...

+
+/* Return true if TARGET_CLASS_MAX_NREGS and TARGET_HARD_REGNO_NREGS results is
+   same. It should be noted that some targets may not implement these two very
+   uniformly, and need to be debugged step by step. For example, in V3x1DI mode
+   in AArch64, TARGET_CLASS_MAX_NREGS returns 2 but TARGET_HARD_REGNO_NREGS
+   returns 3. They are in conflict and need to be repaired in the Hook of
+   AArch64.  */
+static bool
+has_same_nregs (ira_allocno_t a)
+{
+  for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+if (REGNO_REG_CLASS (i) != NO_REGS
+   && reg_class_subset_p (REGNO_REG_CLASS (i), ALLOCNO_CLASS (a))
+   && ALLOCNO_NREGS (a) != hard_regno_nregs (i, ALLOCNO_MODE (a)))
+  return false;
+  return true;
+}
+


It is better to fix the source of the problem, but sometimes that is hard
to do for all targets.  RA already has analogous code, so this is OK with
me.  The only issue is that running this check for each allocno is too
expensive.  You should implement a cache mapping (class, mode) to the result.
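
One possible shape for such a cache is sketched below, assuming the result depends only on the allocno class and mode. The table sizes, the SameNregsCache name and the callback are hypothetical; in the compiler the table would presumably be indexed by reg_class and machine_mode and reset whenever the target state changes.

```cpp
#include <array>
#include <functional>

// Hypothetical table bounds, standing in for the number of register
// classes and machine modes.
constexpr int kNumClasses = 8;
constexpr int kNumModes = 16;

// Tri-state slot: zero-initialised storage means "not computed yet".
enum class Cached : unsigned char { Unknown, No, Yes };

class SameNregsCache
{
  std::array<std::array<Cached, kNumModes>, kNumClasses> table_ {};

public:
  // Return the cached answer for (CLS, MODE), running the expensive
  // COMPUTE callback only the first time the pair is seen.
  bool lookup (int cls, int mode,
	       const std::function<bool (int, int)> &compute)
  {
    Cached &slot = table_[cls][mode];
    if (slot == Cached::Unknown)
      slot = compute (cls, mode) ? Cached::Yes : Cached::No;
    return slot == Cached::Yes;
  }
};
```

With this in place the loop over hard registers runs at most once per (class, mode) pair instead of once per allocno.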





Re: [PATCH] Assert we don't create recursive DW_AT_abstract_origin

2023-11-16 Thread Jason Merrill

On 10/30/23 08:57, Richard Biener wrote:

We have a support case that shows GCC 7 sometimes creates
DW_TAG_label referring to itself via a DW_AT_abstract_origin
when using LTO.  This for example triggers the sanity check
added below during LTO bootstrap.

Making this check cover more than just DW_AT_abstract_origin
breaks bootstrap on trunk for

   /* GNU extension: Record what type our vtable lives in.  */
   if (TYPE_VFIELD (type))
 {
   tree vtype = DECL_FCONTEXT (TYPE_VFIELD (type));

   gen_type_die (vtype, context_die);
   add_AT_die_ref (type_die, DW_AT_containing_type,
   lookup_type_die (vtype));

so the check is for now restricted to DW_AT_abstract_origin.

Bootstrapped on x86_64-unknown-linux-gnu, OK?


Let's also check for DW_AT_specification, since that's the other one 
get_AT follows.  OK with that change.
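
To illustrate why a self-reference matters only for the attributes that the lookup follows, here is a simplified sketch with a toy DIE type. The Die struct, the attribute enum and both functions are hypothetical and only approximate what add_AT_die_ref and get_AT do in dwarf2out.cc.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>

enum Attr { AT_name, AT_abstract_origin, AT_specification, AT_containing_type };

// Toy debug-info entry: string attributes plus references to other DIEs.
struct Die
{
  std::map<Attr, const char *> strings;
  std::map<Attr, Die *> refs;
};

// True for exactly the case the assertion is meant to reject.
bool
creates_self_reference (const Die *die, Attr kind, const Die *target)
{
  return target == die
	 && (kind == AT_abstract_origin || kind == AT_specification);
}

void
add_die_ref (Die *die, Attr kind, Die *target)
{
  assert (target != nullptr);
  assert (!creates_self_reference (die, kind, target));
  die->refs[kind] = target;
}

// Look up a string attribute, falling back to the DIE named by a
// specification or abstract-origin link.  A DIE that named itself through
// one of these links would make this recursion never terminate.
const char *
get_string_attr (const Die *die, Attr kind)
{
  auto own = die->strings.find (kind);
  if (own != die->strings.end ())
    return own->second;

  for (Attr link : { AT_specification, AT_abstract_origin })
    {
      auto ref = die->refs.find (link);
      if (ref != die->refs.end ())
	return get_string_attr (ref->second, kind);
    }
  return nullptr;
}
```

A self-referencing AT_containing_type is deliberately not rejected in this sketch, since the lookup never follows that attribute.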



My workaround for the GCC 7 problem is

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 5590845d2a4..07185a1a0d3 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -23030,7 +23031,7 @@ gen_label_die (tree decl, dw_die_ref context_die)
   lbl_die = new_die (DW_TAG_label, context_die, decl);
   equate_decl_number_to_die (decl, lbl_die);
 
-  if (origin != NULL)
+  if (origin != NULL && origin != decl)
     add_abstract_origin_attribute (lbl_die, origin);
   else
     add_name_and_src_coords_attributes (lbl_die, decl);

that's not needed on trunk because there we don't end up
with LABEL_DECLs with self-DECL_ABSTRACT_ORIGIN (and not DECL_ABSTRACT).

Thanks,
Richard.

* dwarf2out.cc (add_AT_die_ref): Assert we do not add
a self-ref DW_AT_abstract_origin.
---
  gcc/dwarf2out.cc | 1 +
  1 file changed, 1 insertion(+)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 1e0cec66c5e..0070a9e8412 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -4908,6 +4908,7 @@ add_AT_die_ref (dw_die_ref die, enum dwarf_attribute attr_kind, dw_die_ref targ_
 {
   dw_attr_node attr;
   gcc_checking_assert (targ_die != NULL);
+  gcc_assert (targ_die != die || attr_kind != DW_AT_abstract_origin);
 
   /* With LTO we can end up trying to reference something we didn't create
      a DIE for.  Avoid crashing later on a NULL referenced DIE.  */




Re: [PATCH V3 4/7] ira: Support subreg copy

2023-11-16 Thread Vladimir Makarov



On 11/12/23 07:08, Lehua Ding wrote:

This patch changes the previous way of creating a copy between allocnos to 
objects.

gcc/ChangeLog:

* ira-build.cc (find_allocno_copy): Removed.
(find_object): New.
(ira_create_copy): Adjust.
(add_allocno_copy_to_list): Adjust.
(swap_allocno_copy_ends_if_necessary): Adjust.
(ira_add_allocno_copy): Adjust.
(print_copy): Adjust.
(print_allocno_copies): Adjust.
(ira_flattening): Adjust.
* ira-color.cc (INCLUDE_VECTOR): Include vector.
(struct allocno_color_data): Adjust.
(struct allocno_hard_regs_subnode): Adjust.
(form_allocno_hard_regs_nodes_forest): Adjust.
(update_left_conflict_sizes_p): Adjust.
(struct update_cost_queue_elem): Adjust.
(queue_update_cost): Adjust.
(get_next_update_cost): Adjust.
(update_costs_from_allocno): Adjust.
(update_conflict_hard_regno_costs): Adjust.
(assign_hard_reg): Adjust.
(objects_conflict_by_live_ranges_p): New.
(allocno_thread_conflict_p): Adjust.
(object_thread_conflict_p): Ditto.
(merge_threads): Ditto.
(form_threads_from_copies): Ditto.
(form_threads_from_bucket): Ditto.
(form_threads_from_colorable_allocno): Ditto.
(init_allocno_threads): Ditto.
(add_allocno_to_bucket): Ditto.
(delete_allocno_from_bucket): Ditto.
(allocno_copy_cost_saving): Ditto.
(color_allocnos): Ditto.
(color_pass): Ditto.
(update_curr_costs): Ditto.
(coalesce_allocnos): Ditto.
(ira_reuse_stack_slot): Ditto.
(ira_initiate_assign): Ditto.
(ira_finish_assign): Ditto.
* ira-conflicts.cc (allocnos_conflict_for_copy_p): Ditto.
(REG_SUBREG_P): Ditto.
(subreg_move_p): New.
(regs_non_conflict_for_copy_p): New.
(subreg_reg_align_and_times_p): New.
(process_regs_for_copy): Ditto.
(add_insn_allocno_copies): Ditto.
(propagate_copies): Ditto.
* ira-emit.cc (add_range_and_copies_from_move_list): Ditto.
* ira-int.h (struct ira_allocno_copy): Ditto.
(ira_add_allocno_copy): Ditto.
(find_object): Exported.
(subreg_move_p): Exported.
* ira.cc (print_redundant_copies): Exported.

---
  gcc/ira-build.cc | 154 +++-
  gcc/ira-color.cc | 541 +++
  gcc/ira-conflicts.cc | 173 +++---
  gcc/ira-emit.cc  |  10 +-
  gcc/ira-int.h|  10 +-
  gcc/ira.cc   |   5 +-
  6 files changed, 646 insertions(+), 247 deletions(-)

The patch is mostly ok for me, except that it has the same issues I
mentioned in my first email.  Leaving comments unchanged for functions
whose interface changed (e.g. argument types and names, as in
find_allocno_copy) is particularly bad: it makes the comments confusing
and wrong.  Also, using just "Adjust" in ChangeLog entries is too brief.
You should at least mention that the function signature changed.

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index a32693e69e4..13f0f7336ed 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 8aed25144b9..099312bcdb3 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -20,6 +20,7 @@ along with GCC; see the file COPYING3.  If not see
  


  
-  ira_allocno_t next_thread_allocno;
+  ira_object_t *next_thread_objects;
+  /* The allocno all thread shared.  */
+  ira_allocno_t first_thread_allocno;
+  /* The offset start relative to the first_thread_allocno.  */
+  int first_thread_offset;
+  /* All allocnos belong to the thread.  */
+  bitmap thread_allocnos;


It is better to use bitmap_head instead of bitmap.  This avoids a
separate allocation of the bitmap_head behind each bitmap.  There are
many places in your patches where bitmap_head would be a better choice
than bitmap (it is especially profitable when there is a significant
probability that the bitmap is empty).
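
To illustrate the difference, here is a toy comparison of a pointer member against an embedded header.  The type toy_bitmap_head and both thread_data structs are hypothetical stand-ins, not GCC's bitmap types; the sketch only shows where the extra allocation comes from.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a bitmap header; not GCC's real bitmap_head.  */
struct toy_bitmap_head { unsigned long bits; };

/* Variant 1: pointer member, the header lives in a separate heap block.  */
struct thread_data_ptr { struct toy_bitmap_head *allocnos; };

/* Variant 2: embedded header, storage is part of the enclosing object.  */
struct thread_data_embedded { struct toy_bitmap_head allocnos; };

/* Number of heap allocations done for headers (demonstration only).  */
static int heap_allocs;

static void
init_ptr (struct thread_data_ptr *d)
{
  d->allocnos = calloc (1, sizeof *d->allocnos);
  assert (d->allocnos != NULL);
  heap_allocs++;
}

static void
init_embedded (struct thread_data_embedded *d)
{
  /* No allocation: an empty set costs nothing beyond the member itself.  */
  d->allocnos.bits = 0;
}
```

In the embedded variant an empty set never touches the allocator, which is the case the review comment calls especially profitable.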


Of course, the patch can be committed once all the patches are approved
and fixed.




Re: [PATCH] Fix PR ada/111909 On Darwin, determine filesystem case sensitivity at runtime

2023-11-16 Thread Simon Wright
On 13 Nov 2023, at 16:18, Arnaud Charlet  wrote:
> 
> OK, I thought there would be some defines that we could use for that, too 
> bad if there isn't
> and indeed we might need to perform another runtime check then as 
> suggested by Iain.
 
 I can see a possible interface, operatingSystemVersion in NSProcessInfo.h 
 - Objective C
 needed, I think
>>> 
>>> Some of the NS interfaces are available to regular C (e.g. stuff in 
>>> CoreFoundation), and I am
>>> fairly/very sure that we will be able to find a mechanism that does not 
>>> involve introducing an
>>> ObjC dep.  [I am obviously not in any way against ObjC - since I’m the 
>>> maintainer ;) .. but it
>>> seems heavyweight for solving this issue].
>> 
>> It certainly would be heavyweight, since TargetConditionals.h includes 
>> TARGET_OS_OSX, 
>> which is 1 if we’re compiling for macOS and 0 otherwise (there’s a useful 
>> chart at :83 in the 
>> MacOSX13.1 SDK).
>> 
>> Two ways ahead here:
>> (1) just replace the current __arm__, __arm64__ test with this
> 
> That would be fine here (replace refs to *arm* by TARGET_OS_OSX), since this 
> was my original
> suggestion (copied at the top of this email).
> 
>> (2) as 1, but implement the runtime test for case sensitivity only for macOS
>> 
>> Whether (2) is acceptable depends, I suppose, on what issues Iain 
>> encountered on Darwin 9 
>> & Darwin 17. I’ll be content to go with (1).

I'm not sure whether this should have been a new [PATCH V2] thread?

Also, should the test code below (between %%%) be included in the
testsuite?

--8<--

In gcc/ada/adaint.c(__gnat_get_file_names_case_sensitive), the
current assumption for __APPLE__ is that file names are
case-insensitive unless __arm__ or __arm64__ are defined, in which
case file names are declared case-sensitive.

The associated comment is
  "By default, we suppose filesystems aren't case sensitive on
  Windows and Darwin (but they are on arm-darwin)."

This means that on aarch64-apple-darwin, file names are treated as
case-sensitive, which is not the default case.

Apple provide a header file  which permits a
compile-time check for the compiler target (e.g. OSX vs IOS). At Darwin
10.5 (Xcode 3) iOS wasn't supported, so it was adequate to check
TARGET_OS_MAC; nowadays, that covers many variants including macOS
and iOS, so one needs to check whether TARGET_OS_OSX is defined, and
if so whether it's set.
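
A sketch of the compile-time selection described above, written as a standalone function.  The function name default_file_names_case_sensitive is hypothetical, and the fallback branches for older SDKs and for non-macOS Apple targets follow the ChangeLog wording in this message rather than the patch text, so treat them as an assumption.

```c
#if defined (__APPLE__)
/* Apple SDK header that defines the TARGET_OS_* macros.  */
#include <TargetConditionals.h>
#endif

/* Return 1 if file names are assumed case sensitive by default on the
   target this is compiled for, 0 otherwise.  */
static int
default_file_names_case_sensitive (void)
{
#if defined (WINNT) || defined (__DJGPP__)
  return 0;                     /* Windows or DOS.  */
#elif defined (__APPLE__)
# if defined (TARGET_OS_OSX)    /* Recent SDK.  */
#  if TARGET_OS_OSX
  return 0;                     /* macOS: case-insensitive by default.  */
#  else
  return 1;                     /* iOS and friends: case-sensitive.  */
#  endif
# elif defined (TARGET_OS_MAC) && TARGET_OS_MAC
  return 0;                     /* Older SDK without TARGET_OS_OSX.  */
# else
  return 1;
# endif
#else
  return 1;                     /* Everything else.  */
#endif
}
```

Note that this only picks a default per target; it does not probe the actual volume, which can be formatted either way.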

Bootstrapped on x86_64-apple-darwin with languages c,c++,ada and regression
tested (check-ada).

Likewise bootstrapped on aarch64-apple-darwin from the Github sources
corresponding to GCC 2023-11-05.

__gnat_get_file_names_case_sensitive() isn't exported to user code, so
implemented check code as below: each compiler (x86_64-apple-darwin and
aarch64-apple-darwin) reported that file names were not case sensitive.

%%%
with Ada.Text_IO;
with Interfaces.C;
procedure Check_Case_Sensitivity is
   type C_Boolean is (False, True)
 with Convention => C;
   function Get_File_Names_Case_Sensitive return C_Boolean
   with
 Import,
 Convention => C,
 External_Name => "__gnat_get_file_names_case_sensitive";
begin
   Ada.Text_IO.Put_Line ("GNAT thinks file names are " &
   (case Get_File_Names_Case_Sensitive is
   when False => "not case sensitive",
   when True  => "case sensitive"));
end Check_Case_Sensitivity;
%%%

gcc/ada/Changelog:

2023-11-16 Simon Wright 

  * gcc/ada/adaint.c
  (__gnat_get_file_names_case_sensitive): Split out the __APPLE__
  check and remove the checks for __arm__, __arm64__.
  File names are by default case sensitive unless TARGET_OS_OSX
  (or if this is an older OS release, in which case TARGET_OS_OSX
  is undefined, TARGET_OS_MAC) is set.

Signed-off-by: Simon Wright 
---
 gcc/ada/adaint.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/adaint.c b/gcc/ada/adaint.c
index bb4ed2607e5..1ef529ec20b 100644
--- a/gcc/ada/adaint.c
+++ b/gcc/ada/adaint.c
@@ -84,7 +84,7 @@
 #endif /* VxWorks */
 
 #if defined (__APPLE__)
-#include 
+#include 
 #endif
 
 #if defined (__hpux__)
@@ -613,12 +613,25 @@ __gnat_get_file_names_case_sensitive (void)
   else
{
  /* By default, we suppose filesystems aren't case sensitive on
-Windows and Darwin (but they are on arm-darwin).  */
-#if defined (WINNT) || defined (__DJGPP__) \
-  || (defined (__APPLE__) && !(defined (__arm__) || defined (__arm64__)))
+Windows or DOS.  */
+#if defined (WINNT) || defined (__DJGPP__)
+ file_names_case_sensitive_cache = 0;
+#elif defined (__APPLE__)
+ /* By default, macOS volumes are case-insensitive, iOS
+volumes are case-sensitive.  */
+#if defined (TARGET_OS_OSX)  /* In recent SDK.  */
+#if TARGET_OS_OSX/* macOS.  */   
  file_names_case_sensitive_cache = 0;
 #else
  file_names_case_sensitive_cache = 

[PATCH v5] gcc: Introduce -fhardened

2023-11-16 Thread Marek Polacek
On Wed, Nov 15, 2023 at 01:25:27PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 03, 2023 at 06:51:16PM -0400, Marek Polacek wrote:
> > +  if (flag_hardened)
> > +   {
> > + if (!fortify_seen_p && optimize > 0)
> > +   {
> > + if (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> > +   cpp_define (parse_in, "_FORTIFY_SOURCE=3");
> > + else
> > +   cpp_define (parse_in, "_FORTIFY_SOURCE=2");
> > +   }
> 
> I don't like the above in generic code, the fact that gcc was configured
> against glibc target headers doesn't mean it is targeting glibc.
> E.g. for most *-linux* targets, config/linux.opt provides the
> -mbionic/-mglibc/-muclibc/-mmusl options.
> 
> One ugly way around would be to do
> #ifdef OPTION_GLIBC
>   if (OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35)
> cpp_define (parse_in, "_FORTIFY_SOURCE=3");
>   else
> #endif
> cpp_define (parse_in, "_FORTIFY_SOURCE=2");
> (assuming OPTION_GLIBC at that point is already computed); a cleaner way
> would be to introduce a target hook for that, say
> fortify_source_default_level or something similar, where the default hook
> would return 2 and next to linux_libc_has_function one would override it
> for OPTION_GLIBC && TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 35
> to 3.  That way, in the future other targets (say *BSD) can choose to do
> something similar more easily.

Thanks, that's a good point.  In this version I've added a target hook.

On my system, -D_FORTIFY_SOURCE=3 will be used, and if I remove
linux_fortify_source_default_level it's =2 as expected.
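
The hook idea can be modelled roughly as below.  This is a simplified sketch with a hypothetical struct libc_info and hypothetical function names, not the target.def machinery: a default implementation returns 2 and a glibc-aware override returns 3 for glibc 2.35 or newer.

```c
#include <assert.h>

/* Hypothetical description of the C library the compiler targets.  */
struct libc_info { int is_glibc; int major; int minor; };

/* Type of the hook: map a C library description to a fortify level.  */
typedef int (*fortify_level_hook) (const struct libc_info *);

/* Default hook: always level 2.  */
static int
default_fortify_level (const struct libc_info *libc)
{
  (void) libc;
  return 2;
}

/* Override for glibc targets: level 3 from glibc 2.35 on.  */
static int
glibc_fortify_level (const struct libc_info *libc)
{
  if (libc->is_glibc && libc->major == 2 && libc->minor >= 35)
    return 3;
  return 2;
}
```

Another target could install its own function through the same pointer type without touching the generic caller.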

The only problem was that it doesn't seem to be possible to use
targetm. in opts.cc -- I get an undefined reference.  But since
the opts.cc use is for --help only, it's not a big deal either way.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
In 
I proposed -fhardened, a new umbrella option that enables a reasonable set
of hardening flags.  The read of the room seems to be that the option
would be useful.  So here's a patch implementing that option.

Currently, -fhardened enables:

  -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
  -D_GLIBCXX_ASSERTIONS
  -ftrivial-auto-var-init=zero
  -fPIE  -pie  -Wl,-z,relro,-z,now
  -fstack-protector-strong
  -fstack-clash-protection
  -fcf-protection=full (x86 GNU/Linux only)

-fhardened will not override options that were specified on the command line
(before or after -fhardened).  For example,

 -D_FORTIFY_SOURCE=1 -fhardened

means that _FORTIFY_SOURCE=1 will be used.  Similarly,

  -fhardened -fstack-protector

will not enable -fstack-protector-strong.
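
The "do not override" rule can be sketched as follows, assuming each option remembers whether the user set it.  The struct option_state and the function apply_hardened_default are hypothetical and only illustrate the behaviour; the order of options on the command line does not matter here because the explicit flag is recorded before defaults are applied.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-option record: current value plus whether the user
   gave the option explicitly on the command line.  */
struct option_state { int value; bool set_explicitly; };

/* Apply a hardening default only when the user did not set the option.  */
static void
apply_hardened_default (struct option_state *opt, int hardened_value)
{
  if (!opt->set_explicitly)
    opt->value = hardened_value;
}
```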

Currently, -fhardened is only supported on GNU/Linux.

In DW_AT_producer it is reflected only as -fhardened; it doesn't expand
to anything.  This patch provides -Whardened, enabled by default, which
warns when -fhardened couldn't enable a particular option.  I think most
often it will say that _FORTIFY_SOURCE wasn't enabled because optimizations
were not enabled.

gcc/c-family/ChangeLog:

* c-opts.cc: Include "target.h".
(c_finish_options): Maybe cpp_define _FORTIFY_SOURCE
and _GLIBCXX_ASSERTIONS.

gcc/ChangeLog:

* common.opt (Whardened, fhardened): New options.
* config.in: Regenerate.
* config/bpf/bpf.cc: Include "opts.h".
(bpf_option_override): If flag_stack_protector_set_by_fhardened_p, do
not inform that -fstack-protector does not work.
* config/i386/i386-options.cc (ix86_option_override_internal): When
-fhardened, maybe enable -fcf-protection=full.
* config/linux-protos.h (linux_fortify_source_default_level): Declare.
* config/linux.cc (linux_fortify_source_default_level): New.
* config/linux.h (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Redefine.
* configure: Regenerate.
* configure.ac: Check if the linker supports '-z now' and '-z relro'.
Check if -fhardened is supported on $target_os.
* doc/invoke.texi: Document -fhardened and -Whardened.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_FORTIFY_SOURCE_DEFAULT_LEVEL): Add.
* gcc.cc (driver_handle_option): Remember if any link options or -static
were specified on the command line.
(process_command): When -fhardened, maybe enable -pie and
-Wl,-z,relro,-z,now.
* opts.cc (flag_stack_protector_set_by_fhardened_p): New global.
(finish_options): When -fhardened, enable
-ftrivial-auto-var-init=zero and -fstack-protector-strong.
(print_help_hardened): New.
(print_help): Call it.
* target.def (fortify_source_default_level): New target hook.
* targhooks.cc (default_fortify_source_default_level): New.
* targhooks.h (default_fortify_source_default_level): Declare.
* toplev.cc (process_options): When -fhardened, enable

Re: [PATCH 1/2] libstdc++: Atomic wait/notify ABI stabilization

2023-11-16 Thread Jonathan Wakely
On Thu, 16 Nov 2023 at 13:49, Jonathan Wakely  wrote:
>
> From: Thomas Rodgers 
>
> These two patches were written by Tom earlier this year, before he left
> Red Hat. We should finish reviewing them for GCC 14 (and probably squash
> them into one?)
>
> Tom, you mentioned further work that changes the __platform_wait_t* to
> uintptr_t, is that ready, or likely to be ready very soon?
>
> Tested x86_64-linux, testing underway on powerpc-aix and sparc-solaris.

I'm seeing this on AIX and Solaris:

WARNING: program timed out.
FAIL: 30_threads/semaphore/try_acquire.cc  -std=gnu++20 execution test



Re: [Patch] Fortran: Accept -std=f2023 support, update line-length for Fortran 2023

2023-11-16 Thread Harald Anlauf

Hi Tobias,

On 11/16/23 14:01, Tobias Burnus wrote:

This adds -std=f2023, which is mostly a prep patch for future changes.

However, Fortran 2023, https://j3-fortran.org/doc/year/23/23-007r1.pdf
changes two things, both of which are taken care of in this patch:

(A) In "6.3.2.1 Free form line length":

Fortran 2018: "If a line consists entirely of characters of default kind
(7.4.4), it shall contain at most 132 characters"
Fortran 2023: "A line shall contain at most ten thousand characters."

(B) In "6.3.2.6 Free form statements":
Fortran 2018: "A statement shall not have more than 255 continuation
lines."
Fortran 2023: "A statement shall not have more than one million
characters."


this is really a funny change: we're not really prepared to handle
this.  According to the standard one can have 99 lines with only
"&" and then an ";", but then only 100 lines with 1 characters.

There is a similar wording for fixed-form which you overlooked:

6.3.3.5 Fixed form statements

Fortran 2023: "A statement shall not have more than one million characters"

Please adjust the fixed-form limits in your patch.

If you think that we need testcases for fixed-form, add them,
or forget them.  I don't bother.


I have not added a testcase for exceeding the latter but otherwise there
are new
tests and I had to add a couple of -std=f2018 to existing tests.

Comments, suggestions, approval?


I have the following comments:

- there are existing testcases continuation_5.f, continuation_6.f,
  thus I suggest to rename your new continuation_{5,6}.f90 to
  continuation_17.f90+ .

- I don't understand your new testcase line_length_14.f90 .
  This is supposed to test -std=gnu, but then -std=gnu is not a
  standard but a moving target, which is why you had to adjust
  existing testcases.
  So what does it buy us beyond line_length_1{2,3}.f90 ?


Tobias

PS: I find it funny that -std=c23, -std=c++23 and -std=f2023 will get
added in the same GCC release.


:-)


PPS: I did not bother adding .f23 as file extension; I believe that also
.f18 is unsupported.


I never use extensions other than .f90 for portable code.

With the above fixed, I am fine with your patch.

Thanks,
Harald






Re: building GNU gettext on AIX

2023-11-16 Thread Arsen Arsenović

Bruno Haible  writes:

> Arsen Arsenović wrote:
>> >   * If yes, then the question is how distributors will in general package
>> > libintl on AIX. If it's installed in public locations (such as in
>> > /opt/freeware/{lib,lib64}/libintl.a on gcc119.fsffrance.org), then we
>> > have a problem: It may cause undefined behaviour in multithreaded
>> > packages that use GNU libintl.
>> > If you can guarantee that it will be installed in GCC-private 
>> > directories
>> > (and outside the path where GCC looks for libraries to link with!) then
>> > it would be OK to install such a non-thread-safe libintl.
>> > But if you cannot guarantee that, we are in trouble.
>> 
>> The in-tree configuration already passes --disable-shared, so I imagine
>> passing --disable-threads would be OK too, for the case that it is
>> utilized.  (relevant for the latter case: GCC-private build of libintl)
>
> Yeah, but this affects only those people who use the in-tree build of
> the libraries.
>
> The problem for distributors remains the same: They have a strong tendency
> of building libraries independently, with --enable-shared (so that they can
> easily apply fixes without rebuilding the world). These distributors on AIX
> would notice that the GCC configuration attempts to link with "-lintl"
> but not with "-lintl -pthread" and thus the configuration detects that NLS
> is not usable.
>
> Arsen: Where in the GCC tree is this part of the GCC configuration? Is it
> in some configure.ac owned by GCC, or does it come from gettext.m4 ?

See Makefile.def.  It specifies a host-gettext module that has the extra
flags set.

If the in-tree configuration is used, then the uninstalled-config.sh
gettext generates is used.  See config/gettext-sister.m4.

gettext.m4 is unaltered, but it is essentially only used when the
gettext in-tree source is not present (because, otherwise,
gettext-runtime generates uninstalled-config.sh even if it builds
nothing)

Hope that answers it.

> Bruno


-- 
Arsen Arsenović




Re: building GNU gettext on AIX

2023-11-16 Thread Bruno Haible
Arsen Arsenović wrote:
> >   * If yes, then the question is how distributors will in general package
> > libintl on AIX. If it's installed in public locations (such as in
> > /opt/freeware/{lib,lib64}/libintl.a on gcc119.fsffrance.org), then we
> > have a problem: It may cause undefined behaviour in multithreaded
> > packages that use GNU libintl.
> > If you can guarantee that it will be installed in GCC-private 
> > directories
> > (and outside the path where GCC looks for libraries to link with!) then
> > it would be OK to install such a non-thread-safe libintl.
> > But if you cannot guarantee that, we are in trouble.
> 
> The in-tree configuration already passes --disable-shared, so I imagine
> passing --disable-threads would be OK too, for the case that it is
> utilized.  (relevant for the latter case: GCC-private build of libintl)

Yeah, but this affects only those people who use the in-tree build of
the libraries.

The problem for distributors remains the same: They have a strong tendency
of building libraries independently, with --enable-shared (so that they can
easily apply fixes without rebuilding the world). These distributors on AIX
would notice that the GCC configuration attempts to link with "-lintl"
but not with "-lintl -pthread" and thus the configuration detects that NLS
is not usable.

Arsen: Where in the GCC tree is this part of the GCC configuration? Is it
in some configure.ac owned by GCC, or does it come from gettext.m4 ?

Bruno





Re: building GNU gettext on AIX

2023-11-16 Thread Bruno Haible
David Edelsohn wrote:
> I manually commented out HAVE_PTHREAD_API from config.h and produced a
> libintl.a without references to pthreads.

Good finding!

Commenting out HAVE_PTHREAD_API from config.h is also what makes the
option --enable-threads=isoc work as expected on AIX 7.3.

Bruno





Re: building GNU gettext on AIX

2023-11-16 Thread Arsen Arsenović

Bruno Haible  writes:

> David Edelsohn wrote:
>> > It is great that gettext and libintl can be built thread-safe, but GCC
>> > (cc1, gcov, etc.) are not pthreads applications and are not built with
>> > pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
>> > function in GCC on AIX by default.  
>> ...
>> The latest issue is that a few files in gettext ignore --disable-pthreads
>> and create a dependency on pthread_mutex.
>
> GNU gettext does not have an option '--disable-pthreads'. Instead, it has
> options
>
>   --enable-threads={isoc|posix|isoc+posix|windows}
>   specify multithreading API
>
>   --disable-threads   build without multithread safety
>
>> The issue appears to be that intl/gnulib-lib/{mbrtowc.c,setlocale_null.c}
>> include pthread.h based on HAVE_PTHREAD_API, which is defined as 1 in
>> intl/config.h build directory
>
> Yup, I confirm that the dependency comes from these two object files.
>
> Will the next GCC release support AIX 7.1.x ? Recall that AIX 7.1 went
> end-of-life on 2023-04-30 [1]
>
>   * If no, then the simple solution would be to pass the configure option
>   --enable-threads=isoc
> This should not introduce a link dependency, because the mtx_lock,
> mtx_unlock, and mtx_init functions are in libc in AIX ≥ 7.2. Currently it
> does not work (it still uses pthread_mutex_lock and pthread_mutex_unlock
> despite --enable-threads=isoc). But I could make this work and release
> a gettext 0.22.4 with the fix.
>
>   * If yes, then the question is how distributors will in general package
> libintl on AIX. If it's installed in public locations (such as in
> /opt/freeware/{lib,lib64}/libintl.a on gcc119.fsffrance.org), then we
> have a problem: It may cause undefined behaviour in multithreaded
> packages that use GNU libintl.
> If you can guarantee that it will be installed in GCC-private directories
> (and outside the path where GCC looks for libraries to link with!) then
> it would be OK to install such a non-thread-safe libintl.
> But if you cannot guarantee that, we are in trouble.

The in-tree configuration already passes --disable-shared, so I imagine
passing --disable-threads would be OK too, for the case that it is
utilized.  (relevant for the latter case: GCC-private build of libintl)

> How do other library vendors handle this issue on AIX? Do they ship two
> libraries, one MT-safe and one not? Under different names? Or in different
> library search paths?
>
> Bruno
>
> [1] https://www.ibm.com/support/pages/aix-support-lifecycle-information


-- 
Arsen Arsenović




Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread Arsen Arsenović

Xi Ruoyao  writes:

> On Wed, 2023-11-15 at 15:14 +0100, Arsen Arsenović wrote:
>> That is interesting.  They should be using the same checks.  I've
>> checked trunk and regenerated files on it, and saw no significant diff
>> (some whitespace changes only).  Could you post the config.log of
>> both?
>
> You did not regenerate config.in.  But I've regenerated it in r14-5434
> anyway.
>
> The related changes:
>
> +/* Define to 1 if you have the Mac OS X function
> +   CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
> +#endif
> +
> +
> +/* Define to 1 if you have the Mac OS X function
> CFPreferencesCopyAppValue in
> +   the CoreFoundation framework. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_CFPREFERENCESCOPYAPPVALUE
> +#endif
>
> +/* Define if the GNU dcgettext() function is already present or preinstalled.
> +   */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_DCGETTEXT
> +#endif
>
> +/* Define if the GNU gettext() function is already present or preinstalled. 
> */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_GETTEXT
> +#endif
>
> I don't know if they are related to the issue on AIX though.

Ah, thanks for doing that.
-- 
Arsen Arsenović




Re: building GNU gettext on AIX

2023-11-16 Thread Bruno Haible
David Edelsohn wrote:
> > It is great that gettext and libintl can be built thread-safe, but GCC
> > (cc1, gcov, etc.) are not pthreads applications and are not built with
> > pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
> > function in GCC on AIX by default.  
> ...
> The latest issue is that a few files in gettext ignore --disable-pthreads
> and create a dependency on pthread_mutex.

GNU gettext does not have an option '--disable-pthreads'. Instead, it has
options

  --enable-threads={isoc|posix|isoc+posix|windows}
  specify multithreading API

  --disable-threads   build without multithread safety

> The issue appears to be that intl/gnulib-lib/{mbrtowc.c,setlocale_null.c}
> include pthread.h based on HAVE_PTHREAD_API, which is defined as 1 in
> intl/config.h build directory

Yup, I confirm that the dependency comes from these two object files.

Will the next GCC release support AIX 7.1.x ? Recall that AIX 7.1 went
end-of-life on 2023-04-30 [1]

  * If no, then the simple solution would be to pass the configure option
  --enable-threads=isoc
This should not introduce a link dependency, because the mtx_lock,
mtx_unlock, and mtx_init functions are in libc in AIX ≥ 7.2. Currently it
does not work (it still uses pthread_mutex_lock and pthread_mutex_unlock
despite --enable-threads=isoc). But I could make this work and release
a gettext 0.22.4 with the fix.

  * If yes, then the question is how distributors will in general package
libintl on AIX. If it's installed in public locations (such as in
/opt/freeware/{lib,lib64}/libintl.a on gcc119.fsffrance.org), then we
have a problem: It may cause undefined behaviour in multithreaded
packages that use GNU libintl.
If you can guarantee that it will be installed in GCC-private directories
(and outside the path where GCC looks for libraries to link with!) then
it would be OK to install such a non-thread-safe libintl.
But if you cannot guarantee that, we are in trouble.
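
For reference, locking with the ISO C API that --enable-threads=isoc is meant to select looks roughly like this.  It is a generic sketch, not gettext or gnulib code; the names counter_init and counter_increment are hypothetical, and it assumes a C library that ships <threads.h> with the mtx_* functions.

```c
#include <assert.h>
#include <threads.h>

/* A counter protected by an ISO C mutex instead of a pthread mutex.  */
static mtx_t counter_lock;
static int counter;

/* Initialize the lock; returns nonzero on success.  */
static int
counter_init (void)
{
  return mtx_init (&counter_lock, mtx_plain) == thrd_success;
}

/* Increment the counter under the lock; returns the new value,
   or -1 if the lock could not be taken.  */
static int
counter_increment (void)
{
  if (mtx_lock (&counter_lock) != thrd_success)
    return -1;
  int v = ++counter;
  mtx_unlock (&counter_lock);
  return v;
}
```

Whether this links without an extra threading library depends on where the platform's C library places mtx_init, mtx_lock and mtx_unlock, which is exactly the point raised for AIX above.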

How do other library vendors handle this issue on AIX? Do they ship two
libraries, one MT-safe and one not? Under different names? Or in different
library search paths?

Bruno

[1] https://www.ibm.com/support/pages/aix-support-lifecycle-information





Re: [PATCH v3 0/2] Replace intl/ with out-of-tree GNU gettext

2023-11-16 Thread Arsen Arsenović

David Edelsohn  writes:

> On Wed, Nov 15, 2023 at 9:22 AM Arsen Arsenović  wrote:
>
>>
>> David Edelsohn  writes:
>>
>> > GCC had been working on AIX with NLS, using "--with-included-gettext".
>> > --disable-nls gets past the breakage, but GCC does not build for me on
>> AIX
>> > with NLS enabled.
>>
>> That should still work with gettext 0.22+ extracted in-tree (it should
>> be fetched by download_prerequisites).
>>
>> > A change in dependencies for GCC should have been announced and more
>> widely
>> > socialized in the GCC development mailing list, not just GCC patches
>> > mailing list.
>> >
>> > I have tried both the AIX Open Source libiconv and libgettext package,
>> and
>> > the ones that I previously built.  Both fail because GCC configure
>> decides
>> > to disable NLS, despite being requested, while libcpp is satisfied, so
>> > tools in the gcc subdirectory don't link against libiconv and the build
>> > fails.  With the included gettext, I was able to rely on a
>> self-consistent
>> > solution.
>>
>> That is interesting.  They should be using the same checks.  I've
>> checked trunk and regenerated files on it, and saw no significant diff
>> (some whitespace changes only).  Could you post the config.log of both?
>>
>> I've never used AIX.  Can I reproduce this on one of the cfarm machines
>> to poke around?  I've tried cfarm119, but that one lacked git, and I
>> haven't poked around much further due to time constraints.
>>
>
> The AIX system in the Compile Farm has a complete complement of Open Source
> software installed.
>
> Please ensure that /opt/freeware/bin is in your path.  Also, the GCC Wiki
> Compile Farm page has build tips that include AIX
>
> https://gcc.gnu.org/wiki/CompileFarm#Services_and_software_installed_on_farm_machines

Thanks, that got me further.

> that recommended --with-included-gettext configuration option.

This flag should still exist and operate the same if gettext is present
in tree.  I've cloned gcc and downloaded prerequisites (via
contrib/download_prerequisites) and I am trying to configure it now.

> Thanks, David
>
>
>>
>> TIA, sorry about the inconvenience.  Have a lovely day.
>>
>> > The current gettext-0.22.3 fails to build for me on AIX.
>> >
>> > libcpp configure believes that NLS functions on AIX, but gcc configure
>> > fails in its tests of gettext functionality, which leads to an
>> inconsistent
>> > configuration and build breakage.
>> >
>> > Thanks, David
>>
>>
>> --
>> Arsen Arsenović
>>


-- 
Arsen Arsenović




RE: [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits

2023-11-16 Thread Tamar Christina
> -Original Message-
> From: Tamar Christina 
> Sent: Thursday, November 16, 2023 3:19 PM
> To: Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: RE: [PATCH 7/21]middle-end: update IV update code to support early
> breaks and arbitrary exits
> 
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, November 16, 2023 2:18 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: RE: [PATCH 7/21]middle-end: update IV update code to support
> > early breaks and arbitrary exits
> >
> > On Thu, 16 Nov 2023, Tamar Christina wrote:
> >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, November 16, 2023 1:36 PM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > j...@ventanamicro.com
> > > > Subject: RE: [PATCH 7/21]middle-end: update IV update code to
> > > > support early breaks and arbitrary exits
> > > >
> > > > On Thu, 16 Nov 2023, Tamar Christina wrote:
> > > >
> > > > > > > > > > >
> > > > > > > > > > > Perhaps I'm missing something here?
> > > > > > > > > >
> > > > > > > > > > > OK, so I refreshed my mind of what
> > > > > > > > > > > vect_update_ivs_after_vectorizer does.
> > > > > > > > > > >
> > > > > > > > > > > I still do not understand the (complexity of the) patch.
> > > > > > > > > > > Basically the function computes the new value of the
> > > > > > > > > > > IV "from scratch" based on the number of scalar
> > > > > > > > > > > iterations of the vector loop, the 'niter'
> > > > > > > > > > > argument.  I would have expected that for the early
> > > > > > > > > > > exits we either pass in a different 'niter' or
> > > > > > > > > > > alternatively a 'niter_adjustment'.
> > > > > > > > >
> > > > > > > > > > But for an early exit there's no static value for
> > > > > > > > > > adjusted niter, since you don't know which iteration you
> > > > > > > > > > exited from.
> > > > > > > > > > Unlike the normal exit when you know if you get there
> > > > > > > > > > you've done all possible iterations.
> > > > > > > > > >
> > > > > > > > > > So you must compute the scalar iteration count on the exit
> > > > > > > > > > itself.
> > > > > > > >
> > > > > > > > > ?  You do not need the actual scalar iteration you exited
> > > > > > > > > (you don't compute that either), you need the scalar
> > > > > > > > > iteration the vector iteration started with when it exited
> > > > > > > > > prematurely and that's readily available?
> > > > > > >
> > > > > > > For a normal exit yes, not for an early exit no?
> > > > > > > niters_vector_mult_vf is only valid for the main exit.
> > > > > > >
> > > > > > > > There's the unadjusted scalar count, which is what it's
> > > > > > > > using to adjust it to the final count.  Unless I'm missing
> > > > > > > > something?
> > > > > >
> > > > > > > Ah, of course - niters_vector_mult_vf is for the countable exit.
> > > > > > > For the early exits we can't precompute the scalar iteration value.
> > > > > > > But that then means we should compute the appropriate "continuation"
> > > > > > > as live value of the vectorized IVs even when they were not
> > > > > > > originally used outside of the loop.  I don't see how we can
> > > > > > > express this in terms of the scalar IVs in the (not yet)
> > > > > > > vectorized loop - similar to the reduction case you are going
> > > > > > > to end up with the wrong values here.
> > > > > >
> > > > > > > That said, I've for a long time wanted to preserve the
> > > > > > > original control IV also for the vector code (leaving any
> > > > > > > "optimization" to IVOPTs there), that would enable us to
> > > > > > > compute the correct "niters_vector_mult_vf" based on that IV.
> > > > > > >
> > > > > > > So given we cannot use the scalar IVs you have to handle all
> > > > > > > inductions (besides the main exit control IV) in
> > > > > > > vectorizable_live_operation I think.
> > > > > >
> > > > >
> > > > > That's what I currently do, that's why there was the
> > > > > if (STMT_VINFO_LIVE_P (phi_info))
> > > > >   continue;
> > > >
> > > > Yes, but that only works for the inductions marked so.  We'd need
> > > > to mark the others as well, but only for the early exits.
> > > >
> > > > > > although I don't understand why we use the scalar count,  I
> > > > > > suppose the reasoning is that we don't really want to keep it
> > > > > > around, and referencing it forces it to be kept?
> > > >
> > > > > Referencing it will cause the scalar compute to be retained, but
> > > > > since we do not adjust the scalar compute during vectorization
> > > > > (but expect it to be dead) the scalar compute will compute the
> > > > > wrong thing (as shown by the reduction example - I suspect
> > > > > inductions will suffer from the same problem).
> > > >
> > > > > > At the moment it just does `init + (final - init) * vf` which is
> > > > > > correct no?
> > > >
> > > > 

Re: [Committed] RISC-V: Change unaligned fast/slow/avoid macros to misaligned [PR111557]

2023-11-16 Thread Edwin Lu

Committed!

On 11/15/2023 11:34 PM, Kito Cheng wrote:

ohhh, thanks for fixing that, LGTM!

On Thu, Nov 16, 2023 at 7:31 AM Edwin Lu  wrote:

Fix __riscv_unaligned_fast/slow/avoid macro name to
__riscv_misaligned_fast/slow/avoid to be consistent with the RISC-V API Spec
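
For anyone updating code that tests these predefines, here is a minimal
sketch in C of probing the renamed macros.  The helper name
misaligned_access_policy and the returned strings are hypothetical and are
not part of the patch; on a non-RISC-V target, or with a compiler that
predates the rename, none of the macros is defined and the sketch falls
through to "unknown".

```c
/* Hypothetical helper, not part of the patch: report which of the
   renamed misaligned-access macros the compiler predefined.  */
const char *
misaligned_access_policy (void)
{
#if defined(__riscv_misaligned_fast)
  return "fast";
#elif defined(__riscv_misaligned_slow)
  return "slow";
#elif defined(__riscv_misaligned_avoid)
  return "avoid";
#else
  /* Not a RISC-V target, or a compiler predating the rename.  */
  return "unknown";
#endif
}
```

Code that has to build with both older and newer compilers would probably
need to test the old __riscv_unaligned_* spellings as well.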

gcc/ChangeLog:

 * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): update macro name

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/attribute-1.c: update macro name
 * gcc.target/riscv/attribute-4.c: ditto
 * gcc.target/riscv/attribute-5.c: ditto
 * gcc.target/riscv/predef-align-1.c: ditto
 * gcc.target/riscv/predef-align-2.c: ditto
 * gcc.target/riscv/predef-align-3.c: ditto
 * gcc.target/riscv/predef-align-4.c: ditto
 * gcc.target/riscv/predef-align-5.c: ditto
 * gcc.target/riscv/predef-align-6.c: ditto

Signed-off-by: Edwin Lu 
---
  gcc/config/riscv/riscv-c.cc |  6 +++---
  gcc/testsuite/gcc.target/riscv/attribute-1.c| 10 +-
  gcc/testsuite/gcc.target/riscv/attribute-4.c|  8 
  gcc/testsuite/gcc.target/riscv/attribute-5.c| 10 +-
  gcc/testsuite/gcc.target/riscv/predef-align-1.c | 10 +-
  gcc/testsuite/gcc.target/riscv/predef-align-2.c |  8 
  gcc/testsuite/gcc.target/riscv/predef-align-3.c | 10 +-
  gcc/testsuite/gcc.target/riscv/predef-align-4.c | 10 +-
  gcc/testsuite/gcc.target/riscv/predef-align-5.c |  8 
  gcc/testsuite/gcc.target/riscv/predef-align-6.c | 10 +-
  10 files changed, 45 insertions(+), 45 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index b7f9ba204f7..dd1bd0596fc 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -109,11 +109,11 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
  }

if (riscv_user_wants_strict_align)
-builtin_define_with_int_value ("__riscv_unaligned_avoid", 1);
+builtin_define_with_int_value ("__riscv_misaligned_avoid", 1);
else if (riscv_slow_unaligned_access_p)
-builtin_define_with_int_value ("__riscv_unaligned_slow", 1);
+builtin_define_with_int_value ("__riscv_misaligned_slow", 1);
else
-builtin_define_with_int_value ("__riscv_unaligned_fast", 1);
+builtin_define_with_int_value ("__riscv_misaligned_fast", 1);

if (TARGET_MIN_VLEN != 0)
  builtin_define_with_int_value ("__riscv_v_min_vlen", TARGET_MIN_VLEN);
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-1.c b/gcc/testsuite/gcc.target/riscv/attribute-1.c
index abfb0b498e0..a39efb3e6ff 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-1.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-1.c
@@ -4,13 +4,13 @@ int foo()
  {

  /* In absence of -m[no-]strict-align, default mcpu is currently
-   set to rocket.  rocket has slow_unaligned_access=true.  */
-#if !defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_slow is not set"
+   set to rocket.  rocket has slow_misaligned_access=true.  */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
  #endif

-#if defined(__riscv_unaligned_avoid) || defined(__riscv_unaligned_fast)
-#error "__riscv_unaligned_avoid or __riscv_unaligned_fast is unexpectedly set"
+#if defined(__riscv_misaligned_avoid) || defined(__riscv_misaligned_fast)
+#error "__riscv_misaligned_avoid or __riscv_misaligned_fast is unexpectedly set"
  #endif

  return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-4.c b/gcc/testsuite/gcc.target/riscv/attribute-4.c
index 545f87cb899..a5a95042a31 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-4.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-4.c
@@ -3,12 +3,12 @@
  int foo()
  {

-#if !defined(__riscv_unaligned_avoid)
-#error "__riscv_unaligned_avoid is not set"
+#if !defined(__riscv_misaligned_avoid)
+#error "__riscv_misaligned_avoid is not set"
  #endif

-#if defined(__riscv_unaligned_fast) || defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_fast or __riscv_unaligned_slow is unexpectedly set"
+#if defined(__riscv_misaligned_fast) || defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_fast or __riscv_misaligned_slow is unexpectedly set"
  #endif

return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-5.c b/gcc/testsuite/gcc.target/riscv/attribute-5.c
index 753043c31e9..ad1a1811fa3 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-5.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-5.c
@@ -3,13 +3,13 @@
  int foo()
  {

-/* Default mcpu is rocket which has slow_unaligned_access=true.  */
-#if !defined(__riscv_unaligned_slow)
-#error "__riscv_unaligned_slow is not set"
+/* Default mcpu is rocket which has slow_misaligned_access=true.  */
+#if !defined(__riscv_misaligned_slow)
+#error "__riscv_misaligned_slow is not set"
  #endif

-#if defined(__riscv_unaligned_avoid) || defined(__riscv_unaligned_fast)
-#error "__riscv_unaligned_avoid or __riscv_unaligned_fast is unexpectedly set"
+#if 

Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
I manually commented out HAVE_PTHREAD_API from config.h and produced a
libintl.a without references to pthreads.  Configuring GCC with that custom
libintl.a enables NLS.  I now am building GCC with NLS and we will see how
well it functions.
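
To illustrate why the define matters, here is a rough sketch in C of how a
config.h macro such as HAVE_PTHREAD_API can pull pthread symbols into an
object file.  This is not the gnulib source; the names LOCK, UNLOCK,
guard_lock and guarded_increment are made up for the example.

```c
/* Hypothetical illustration only, not the gnulib code.  When
   HAVE_PTHREAD_API is defined, the object references
   pthread_mutex_lock/unlock; when it is not, the locking compiles
   away and no pthread symbols are needed at link time.  */
#ifdef HAVE_PTHREAD_API
# include <pthread.h>
static pthread_mutex_t guard_lock = PTHREAD_MUTEX_INITIALIZER;
# define LOCK()   ((void) pthread_mutex_lock (&guard_lock))
# define UNLOCK() ((void) pthread_mutex_unlock (&guard_lock))
#else
# define LOCK()   ((void) 0)
# define UNLOCK() ((void) 0)
#endif

/* Increment *COUNTER under the (possibly no-op) lock and return the
   new value.  */
int
guarded_increment (int *counter)
{
  int value;

  LOCK ();
  value = ++*counter;
  UNLOCK ();
  return value;
}
```

Built without the define, this object has no undefined pthread references,
which matches the effect of commenting the macro out of config.h.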

gettext depends on pthreads by default, and so do the distributed versions.

Thanks, David


On Thu, Nov 16, 2023 at 1:01 PM David Edelsohn  wrote:

> Bruno,
>
> The issue appears to be that intl/gnulib-lib/{mbrtowc.c,setlocale_null.c}
> include pthread.h based on HAVE_PTHREAD_API, which is defined as 1 in
> intl/config.h build directory despite requesting --disable-pthreads.
>
> Thanks, David
>
> On Thu, Nov 16, 2023 at 11:35 AM David Edelsohn  wrote:
>
>> I configured gettext with --disable-pthreads and libintl.a still contains
>> references to pthread_mutex_lock and pthread_mutex_unlock, which causes NLS
>> configure to fail on AIX.
>>
>> How can this be corrected?
>>
>> Thanks, David
>>
>> libintl.a[libgnu_la-mbrtowc.o]:
>>
>>  - U __lc_charmap
>>
>>  - U errno
>>
>>  - U .locale_encoding_classification
>>
>>  - U .gl_get_mbtowc_lock
>>
>>  - U .pthread_mutex_lock
>>
>>  - U .mbtowc
>>
>>  - U .pthread_mutex_unlock
>>
>>  - U .abort
>>
>>  0 T ._libintl_mbrtowc
>>
>>   1952 D _libintl_mbrtowc
>>
>> libintl.a[libgnu_la-setlocale_null.o]:
>>
>>  - U .gl_get_setlocale_null_lock
>>
>>  - U .pthread_mutex_lock
>>
>>  - U .setlocale
>>
>>  - U .strlen
>>
>>  - U .memcpy
>>
>>  - U .pthread_mutex_unlock
>>
>>  - U .abort
>>
>>  - U .strcpy
>>
>>336 T ._libintl_setlocale_null_r
>>
>>400 T ._libintl_setlocale_null
>>
>>812 D _libintl_setlocale_null_r
>>
>>824 D _libintl_setlocale_null
>>
>> On Thu, Nov 16, 2023 at 11:00 AM David Edelsohn 
>> wrote:
>>
>>> Bruno,
>>>
>>> I have been able to tweak the environment and build gettext and
>>> libintl.  With the updated libintl and environment, GCC reliably does not
>>> use NLS.
>>>
>>> The issue is that libintl utilizes pthreads.  AIX does not provide no-op
>>> pthread stubs in libc.  pthreads is an explicit multilib on AIX.
>>>
>>> It is great that gettext and libintl can be built thread-safe, but GCC
>>> (cc1, gcov, etc.) are not pthreads applications and are not built with
>>> pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
>>> function in GCC on AIX by default.  The GCC included gettext was built in
>>> the default for GCC libraries, which was not pthreads enabled.
>>>
>>> I can rebuild libintl with --disable-pthreads and I will see if that
>>> works, but the default, distributed libintl library will not allow GCC to
>>> be built with NLS enabled.  And, no, GCC on AIX should not be forced to
>>> build with pthreads.
>>>
>>> This is a regression in NLS support in GCC.
>>>
>>> Thanks, David
>>>
>>>
>>> On Wed, Nov 15, 2023 at 5:39 PM Bruno Haible  wrote:
>>>
 David Edelsohn wrote:
 > I am using my own install of GCC for a reason.

 I have built GNU gettext 0.22.3 in various configurations on the AIX 7.1
 and 7.3 machines in the compilefarm, and haven't encountered issues with
 'max_align_t' nor with 'getpeername'. So, from my point of view, GNU
 gettext
 works fine on AIX with gcc and xlc (but not ibm-clang, which I haven't
 tested).

 You will surely understand that I cannot test a release against a
 compiler
 that exists only on your hard disk.

 The hint I gave you, based on the partial logs that you provided, is to
 look at the configure test for intmax_t first.

 Bruno






[committed] i386: Optimize QImode insn with high input registers

2023-11-16 Thread Uros Bizjak
Sometimes the compiler emits the following code with qi_ext_0:

        shrl    $8, %eax
        addb    %bh, %al

Patch introduces new low part QImode insn patterns with both of
their input arguments extracted from high register.  This invalid
insn is split after reload to a move from the high register
and qi_ext_0 instruction.  The combine pass is able to
convert shift to zero/sign-extract sub-RTX, which we split to the
optimal:

        movzbl  %bh, %edx
        addb    %ah, %dl
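
As a hedged illustration of the kind of source involved: a function that
combines the second-lowest bytes of two values and keeps only the low byte
of the result has both inputs in "high register" position.  The function
below is a guess at that shape, written for this note; it is not copied
from the pr78904-10*.c tests, and whether it produces exactly the code
above depends on register allocation.

```c
/* Hypothetical reproducer: add bits 8..15 of A and B and keep only
   the low byte of the sum.  */
unsigned char
add_high_bytes (unsigned int a, unsigned int b)
{
  return (unsigned char) ((a >> 8) + (b >> 8));
}
```

The subtraction and logic variants would follow the same shape with the
operator swapped.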

PR target/78904

gcc/ChangeLog:

* config/i386/i386.md (*addqi_ext2_0):
New define_insn_and_split pattern.
(*subqi_ext2_0): Ditto.
(*qi_ext2_0): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr78904-10.c: New test.
* gcc.target/i386/pr78904-10a.c: New test.
* gcc.target/i386/pr78904-10b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index f5407ab3054..1b5a794b9e5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -7069,6 +7069,39 @@ (define_insn "*addqi_ext_0"
(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*addqi_ext2_0"
+  [(set (match_operand:QI 0 "register_operand" "=")
+   (plus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+ (subreg:QI
+   (match_operator:SWI248 4 "extract_operator"
+ [(match_operand 2 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (subreg:QI
+ (match_op_dup 4
+   [(match_dup 2) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (match_dup 0)
+  (plus:QI
+(subreg:QI
+  (match_op_dup 3
+[(match_dup 1) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 (define_expand "addqi_ext_1"
   [(parallel
  [(set (zero_extract:HI (match_operand:HI 0 "register_operand")
@@ -7814,6 +7847,39 @@ (define_insn "*subqi_ext_0"
(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*subqi_ext2_0"
+  [(set (match_operand:QI 0 "register_operand" "=")
+   (minus:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+ (subreg:QI
+   (match_operator:SWI248 4 "extract_operator"
+ [(match_operand 2 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (subreg:QI
+ (match_op_dup 3
+   [(match_dup 1) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (match_dup 0)
+  (minus:QI
+(match_dup 0)
+(subreg:QI
+  (match_op_dup 4
+[(match_dup 2) (const_int 8) (const_int 8)]) 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Alternative 1 is needed to work around LRA limitation, see PR82524.
 (define_insn_and_split "*subqi_ext_1"
   [(set (zero_extract:SWI248
@@ -11815,6 +11881,39 @@ (define_insn "*qi_ext_0"
(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+(define_insn_and_split "*qi_ext2_0"
+  [(set (match_operand:QI 0 "register_operand" "=")
+   (any_logic:QI
+ (subreg:QI
+   (match_operator:SWI248 3 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
+ (subreg:QI
+   (match_operator:SWI248 4 "extract_operator"
+ [(match_operand 2 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+   (subreg:QI
+ (match_op_dup 4
+   [(match_dup 2) (const_int 8) (const_int 8)]) 0))
+   (parallel
+ [(set (match_dup 0)
+  (any_logic:QI
+(subreg:QI
+  (match_op_dup 3
+[(match_dup 1) (const_int 8) (const_int 8)]) 0)
+  (match_dup 0)))
+  (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 (define_expand "andqi_ext_1"
   [(parallel
  [(set (zero_extract:HI (match_operand:HI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr78904-10.c b/gcc/testsuite/gcc.target/i386/pr78904-10.c
new file mode 100644
index 

[PATCH 11/11] aarch64: Use individual loads/stores for mem{cpy,set} expansion

2023-11-16 Thread Alex Coplan
This patch adjusts the mem{cpy,set} expansion in the aarch64 backend to use
individual loads/stores instead of ldp/stp at expand time.  The idea is to rely
on the ldp fusion pass to fuse the accesses together later in the RTL pipeline.

The earlier parts of the RTL pipeline should be able to do a better job with the
individual (non-paired) accesses, especially given that an earlier patch in this
series moves the pair representation to use unspecs.
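
As a rough model of what the expansion does after this patch, here is a
sketch in plain C rather than RTL.  The chunk16 type and copy_one_block are
hypothetical names for the illustration, and the 16-byte chunk size is an
assumption (the real code works in whatever mode the caller selected).  The
point is only that the two loads and two stores are emitted separately,
leaving it to the later pass to pair them.

```c
#include <string.h>

/* Hypothetical 16-byte chunk standing in for one register-sized access.  */
typedef struct { unsigned char bytes[16]; } chunk16;

/* Model of copying one 32-byte block: two separate loads and two
   separate stores, then both pointers advance by 32.  */
void
copy_one_block (const unsigned char **src, unsigned char **dst)
{
  chunk16 reg1, reg2;

  memcpy (&reg1, *src, sizeof reg1);       /* load  [src]      */
  memcpy (&reg2, *src + 16, sizeof reg2);  /* load  [src + 16] */
  memcpy (*dst, &reg1, sizeof reg1);       /* store [dst]      */
  memcpy (*dst + 16, &reg2, sizeof reg2);  /* store [dst + 16] */

  *src += 32;
  *dst += 32;
}
```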

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* config/aarch64/aarch64.cc
(aarch64_copy_one_block_and_progress_pointers): Emit individual
accesses instead of load/store pairs.
(aarch64_set_one_block_and_progress_pointer): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1f6094bf1bc..315ba7119c0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25457,9 +25457,12 @@ aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
   /* "Cast" the pointers to the correct mode.  */
   *src = adjust_address (*src, mode, 0);
   *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_insn (aarch64_gen_load_pair (reg1, reg2, *src));
-  emit_insn (aarch64_gen_store_pair (*dst, reg1, reg2));
+  /* Emit the memcpy.  The load/store pair pass should form
+	 a load/store pair from these moves.  */
+  emit_move_insn (reg1, *src);
+  emit_move_insn (reg2, aarch64_progress_pointer (*src));
+  emit_move_insn (*dst, reg1);
+  emit_move_insn (aarch64_progress_pointer (*dst), reg2);
   /* Move the pointers forward.  */
   *src = aarch64_move_pointer (*src, 32);
   *dst = aarch64_move_pointer (*dst, 32);
@@ -25638,7 +25641,8 @@ aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
   /* "Cast" the *dst to the correct mode.  */
   *dst = adjust_address (*dst, mode, 0);
   /* Emit the memset.  */
-  emit_insn (aarch64_gen_store_pair (*dst, src, src));
+  emit_move_insn (*dst, src);
+  emit_move_insn (aarch64_progress_pointer (*dst), src);
 
   /* Move the pointers forward.  */
   *dst = aarch64_move_pointer (*dst, 32);


[PATCH 10/11] aarch64: Add new load/store pair fusion pass.

2023-11-16 Thread Alex Coplan
This is a v3 of the aarch64 load/store pair fusion pass.
v2 was posted here:
 - https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633601.html

The main changes since v2 are as follows:

We now handle writeback opportunities as well.  E.g. for this testcase:

void foo (long *p, long *q, long x, long y)
{
  do {
    *(p++) = x;
    *(p++) = y;
  } while (p < q);
}

with the patch, we generate:

foo:
.LFB0:
.align  3
.L2:
stp x2, x3, [x0], 16
cmp x0, x1
bcc .L2
ret

instead of:

foo:
.LFB0:
.align  3
.L2:
str x2, [x0], 16
str x3, [x0, -8]
cmp x0, x1
bcc .L2
ret

i.e. the pass is now capable of finding load/store pair opportunities even in
the case that one or more of the initial candidate accesses uses writeback
addressing.
We do this by adding a notion of canonicalizing RTL bases.  When we see a
writeback access, we record that the new base def is equivalent to the original
def plus some offset.  When tracking accesses, we then canonicalize to track
each access relative to the earliest equivalent base in the basic block.

This allows us to spot that accesses are adjacent even though they don't share
the same RTL-SSA base def.
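
The bookkeeping can be pictured with a small model in C.  This is only a
sketch of the idea described above, not the data structures used in
aarch64-ldp-fusion.cc; struct base_def and the three functions are
hypothetical names.

```c
#include <stddef.h>

/* Hypothetical model of one definition of a base register.  A def
   created by a writeback access is recorded as an earlier def plus
   a constant offset.  */
struct base_def
{
  const struct base_def *canon;  /* earliest equivalent def, or NULL
                                    if this def is itself canonical */
  long offset;                   /* this def == canonical def + offset */
};

/* Record that NEW_DEF is OLD_DEF plus DELTA (a writeback update).  */
void
record_writeback (struct base_def *new_def, const struct base_def *old_def,
                  long delta)
{
  new_def->canon = old_def->canon ? old_def->canon : old_def;
  new_def->offset = old_def->offset + delta;
}

/* Return the earliest equivalent def for DEF.  */
const struct base_def *
canonical_base (const struct base_def *def)
{
  return def->canon ? def->canon : def;
}

/* Return the offset of an access at DEF + ACCESS_OFFSET, expressed
   relative to the canonical base.  */
long
canonical_offset (const struct base_def *def, long access_offset)
{
  return def->offset + access_offset;
}
```

For the first testcase, "str x2, [x0], 16" accesses offset 0 of the original
x0 def and creates a new def at +16; "str x3, [x0, -8]" then canonicalizes
to offset 8 of the original def, so the two stores are seen as adjacent.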

Furthermore, we also add some extra logic to opportunistically fold in trailing
destructive updates of the base register used for a load/store pair.  E.g. for

void post_add (long *p, long *q, long x, long y)
{
  do {
    p[0] = x;
    p[1] = y;
    p += 2;
  } while (p < q);
}

the auto-inc-dec pass doesn't currently form any writeback accesses, and we
generate:

post_add:
.LFB0:
.align  3
.L2:
add x0, x0, 16
stp x2, x3, [x0, -16]
cmp x0, x1
bcc .L2
ret

but with the updated pass, we now get:

post_add:
.LFB0:
.align  3
.L2:
stp x2, x3, [x0], 16
cmp x0, x1
bcc .L2
ret

Other notable changes to the pass since the last version include:
 - We switch to using the aarch64_gen_{load,store}_pair interface
   for forming the (non-writeback) pairs, allowing use of the new
   load/store pair representation added by the earlier patch.
 - The various updates to the load/store pair patterns mean that
   we no longer need to do mode canonicalization / mode unification
   in the pass, as the patterns allow arbitrary combinations of suitable modes
   of the same size.  So we remove the logic to do this (including the
   param to control the strategy).
 - Fix up classification of zero operands to make sure that these are always
   treated as GPR operands for pair discovery purposes.  This avoids us
   pairing zero operands with FPRs in the pre-RA pass, which used to lead to
   undesirable codegen involving cross-file moves.
 - We also remove the try_adjust_address logic from the previous iteration of
   the pass.  Since we validate all ldp/stp offsets in the pass, this only
   meant that we lost opportunities in the case that a given mem fails to
   adjust in its original mode.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* config.gcc: Add aarch64-ldp-fusion.o to extra_objs for aarch64; add
aarch64-ldp-fusion.cc to target_gtfiles.
* config/aarch64/aarch64-passes.def: Add copies of pass_ldp_fusion
before and after RA.
* config/aarch64/aarch64-protos.h (make_pass_ldp_fusion): Declare.
* config/aarch64/aarch64.opt (-mearly-ldp-fusion): New.
(-mlate-ldp-fusion): New.
(--param=aarch64-ldp-alias-check-limit): New.
(--param=aarch64-ldp-writeback): New.
* config/aarch64/t-aarch64: Add rule for aarch64-ldp-fusion.o.
* config/aarch64/aarch64-ldp-fusion.cc: New file.
---
 gcc/config.gcc   |4 +-
 gcc/config/aarch64/aarch64-ldp-fusion.cc | 2727 ++
 gcc/config/aarch64/aarch64-passes.def|2 +
 gcc/config/aarch64/aarch64-protos.h  |1 +
 gcc/config/aarch64/aarch64.opt   |   23 +
 gcc/config/aarch64/t-aarch64 |7 +
 6 files changed, 2762 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-ldp-fusion.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c1460ca354e..8b7f6b20309 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -349,8 +349,8 @@ aarch64*-*-*)
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
-	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
-	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
+	extra_objs="aarch64-builtins.o aarch-common.o 

[PATCH 09/11] aarch64: Rewrite non-writeback ldp/stp patterns

2023-11-16 Thread Alex Coplan
This patch overhauls the load/store pair patterns with two main goals:

1. Fixing a correctness issue (the current patterns are not RA-friendly).
2. Allowing more flexibility in which operand modes are supported, and which
   combinations of modes are allowed in the two arms of the load/store pair,
   while reducing the number of patterns required both in the source and in
   the generated code.

The correctness issue (1) is due to the fact that the current patterns have
two independent memory operands tied together only by a predicate on the insns.
Since LRA only looks at the constraints, one of the memory operands can get
reloaded without the other one being changed, leading to the insn becoming
unrecognizable after reload.

We fix this issue by changing the patterns such that they only ever have one
memory operand representing the entire pair.  For the store case, we use an
unspec to logically concatenate the register operands before storing them.
For the load case, we use unspecs to extract the "lanes" from the pair mem,
with the second occurrence of the mem matched using a match_dup (such that there
is still really only one memory operand as far as the RA is concerned).

In terms of the modes used for the pair memory operands, we canonicalize
these to V2x4QImode, V2x8QImode, and V2x16QImode.  These modes have not
only the correct size but also correct alignment requirement for a
memory operand representing an entire load/store pair.  Unlike the other
two, V2x4QImode didn't previously exist, so had to be added with the
patch.

As with the previous patch generalizing the writeback patterns, this
patch aims to be flexible in the combinations of modes supported by the
patterns without requiring a large number of generated patterns by using
distinct mode iterators.

The new scheme means we only need a single (generated) pattern for each
load/store operation of a given operand size.  For the 4-byte and 8-byte
operand cases, we use the GPI iterator to synthesize the two patterns.
The 16-byte case is implemented as a separate pattern in the source (due
to only having a single possible alternative).

Since the UNSPEC patterns can't be interpreted by the dwarf2cfi code,
we add REG_CFA_OFFSET notes to the store pair insns emitted by
aarch64_save_callee_saves, so that correct CFI information can still be
generated.  Furthermore, we now unconditionally generate these CFA
notes on frame-related insns emitted by aarch64_save_callee_saves.
This is done in case that the load/store pair pass forms these into
pairs, in which case the CFA notes would be needed.

We also adjust the ldp/stp peepholes to generate the new form.  This is
done by switching the generation to use the
aarch64_gen_{load,store}_pair interface, making it easier to change the
form in the future if needed.  (Likewise, the upcoming aarch64
load/store pair pass also makes use of this interface).

This patch also adds an "ldpstp" attribute to the non-writeback
load/store pair patterns, which is used by the post-RA load/store pair
pass to identify existing patterns and see if they can be promoted to
writeback variants.

One potential concern with using unspecs for the patterns is that it can block
optimization by the generic RTL passes.  This patch series tries to mitigate
this in two ways:
 1. The pre-RA load/store pair pass runs very late in the pre-RA pipeline.
 2. A later patch in the series adjusts the aarch64 mem{cpy,set} expansion to
emit individual loads/stores instead of ldp/stp.  These should then be
formed back into load/store pairs much later in the RTL pipeline by the
new load/store pair pass.

Bootstrapped/regtested on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

gcc/ChangeLog:

* config/aarch64/aarch64-ldpstp.md: Abstract ldp/stp
representation from peepholes, allowing use of new form.
* config/aarch64/aarch64-modes.def (V2x4QImode): Define.
* config/aarch64/aarch64-protos.h
(aarch64_finish_ldpstp_peephole): Declare.
(aarch64_swap_ldrstr_operands): Delete declaration.
(aarch64_gen_load_pair): Declare.
(aarch64_gen_store_pair): Declare.
* config/aarch64/aarch64-simd.md (load_pair):
Delete.
(vec_store_pair): Delete.
(load_pair): Delete.
(vec_store_pair): Delete.
* config/aarch64/aarch64.cc (aarch64_pair_mode_for_mode): New.
(aarch64_gen_store_pair): Adjust to use new unspec form of stp.
Drop second mem from parameters.
(aarch64_gen_load_pair): Likewise.
(aarch64_pair_mem_from_base): New.
(aarch64_save_callee_saves): Emit REG_CFA_OFFSET notes for
frame-related saves.  Adjust call to aarch64_gen_store_pair.
(aarch64_restore_callee_saves): Adjust calls to
aarch64_gen_load_pair to account for change in interface.
(aarch64_process_components): Likewise.
(aarch64_classify_address): Handle 32-byte pair mems in
LDP_STP_N case.

[PATCH 08/11] aarch64: Generalize writeback ldp/stp patterns

2023-11-16 Thread Alex Coplan
Thus far the writeback forms of ldp/stp have been exclusively used in
prologue and epilogue code for saving/restoring of registers to/from the
stack.

As such, forms of ldp/stp that weren't needed for prologue/epilogue code
weren't supported by the aarch64 backend.  This patch generalizes the
load/store pair writeback patterns to allow:

 - Base registers other than the stack pointer.
 - Modes that weren't previously supported.
 - Combinations of distinct modes provided they have the same size.
 - Pre/post variants that weren't previously needed in prologue/epilogue
   code.

We make quite some effort to avoid a combinatorial explosion in the
number of patterns generated (and those in the source) by making
extensive use of special predicates.

An updated version of the upcoming ldp/stp pass can generate the
writeback forms, so this patch is motivated by that.

This patch doesn't add zero-extending or sign-extending forms of the
writeback patterns; that is left for future work.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_ldpstp_operand_mode_p): Declare.
* config/aarch64/aarch64.cc (aarch64_gen_storewb_pair): Build RTL
directly instead of invoking named pattern.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_ldpstp_operand_mode_p): New.
* config/aarch64/aarch64.md (loadwb_pair_): Replace with
...
(*loadwb_post_pair_): ... this. Generalize as described
in cover letter.
(loadwb_pair_): Delete (superseded by the
above).
(*loadwb_post_pair_16): New.
(*loadwb_pre_pair_): New.
(loadwb_pair_): Delete.
(*loadwb_pre_pair_16): New.
(storewb_pair_): Replace with ...
(*storewb_pre_pair_): ... this.  Generalize as
described in cover letter.
(*storewb_pre_pair_16): New.
(storewb_pair_): Delete.
(*storewb_post_pair_): New.
(storewb_pair_): Delete.
(*storewb_post_pair_16): New.
* config/aarch64/predicates.md (aarch64_mem_pair_operator): New.
(pmode_plus_operator): New.
(aarch64_ldp_reg_operand): New.
(aarch64_stp_reg_operand): New.
---
 gcc/config/aarch64/aarch64-protos.h |   1 +
 gcc/config/aarch64/aarch64.cc   |  60 +++---
 gcc/config/aarch64/aarch64.md   | 284 
 gcc/config/aarch64/predicates.md|  38 
 4 files changed, 271 insertions(+), 112 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 36d6c688bc8..e463fd5c817 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1023,6 +1023,7 @@ bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
 bool aarch64_mem_ok_with_ldpstp_policy_model (rtx, bool, machine_mode);
 void aarch64_swap_ldrstr_operands (rtx *, bool);
+bool aarch64_ldpstp_operand_mode_p (machine_mode);
 
 extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
 	  tree, HOST_WIDE_INT);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 4820fac67a1..ccf081d2a16 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8977,23 +8977,15 @@ static rtx
 aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
 			  HOST_WIDE_INT adjustment)
 {
-  switch (mode)
-{
-case E_DImode:
-  return gen_storewb_pairdi_di (base, base, reg, reg2,
-GEN_INT (-adjustment),
-GEN_INT (UNITS_PER_WORD - adjustment));
-case E_DFmode:
-  return gen_storewb_pairdf_di (base, base, reg, reg2,
-GEN_INT (-adjustment),
-GEN_INT (UNITS_PER_WORD - adjustment));
-case E_TFmode:
-  return gen_storewb_pairtf_di (base, base, reg, reg2,
-GEN_INT (-adjustment),
-GEN_INT (UNITS_PER_VREG - adjustment));
-default:
-  gcc_unreachable ();
-}
+  rtx new_base = plus_constant (Pmode, base, -adjustment);
+  rtx mem = gen_frame_mem (mode, new_base);
+  rtx mem2 = adjust_address_nv (mem, mode, GET_MODE_SIZE (mode));
+
+  return gen_rtx_PARALLEL (VOIDmode,
+			   gen_rtvec (3,
+  gen_rtx_SET (base, new_base),
+  gen_rtx_SET (mem, reg),
+  gen_rtx_SET (mem2, reg2)));
 }
 
 /* Push registers numbered REGNO1 and REGNO2 to the stack, adjusting the
@@ -9025,20 +9017,15 @@ static rtx
 aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
 			 HOST_WIDE_INT adjustment)
 {
-  switch (mode)
-{
-case E_DImode:
-  return gen_loadwb_pairdi_di (base, base, reg, reg2, GEN_INT (adjustment),
-   GEN_INT (UNITS_PER_WORD));
-case E_DFmode:
-  return gen_loadwb_pairdf_di (base, base, reg, reg2, GEN_INT (adjustment),
-   GEN_INT (UNITS_PER_WORD));
-case E_TFmode:
-  return gen_loadwb_pairtf_di (base, base, 

[PATCH 07/11] aarch64: Fix up printing of ldp/stp with -msve-vector-bits=128

2023-11-16 Thread Alex Coplan
Later patches allow using SVE modes in ldp/stp with -msve-vector-bits=128,
so we need to make sure that we don't use SVE addressing modes when
printing the address for the ldp/stp.
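
As a rough illustration of the distinction (not the code in this patch), the
sketch below formats the same stack slot either as an SVE "mul vl" address
or as a plain byte offset suitable for ldp/stp.  The names
format_sve_address and addr_query are hypothetical, and the vector length is
assumed fixed at 16 bytes, as with -msve-vector-bits=128.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the address query kind; not GCC's real enum.
enum class addr_query { normal, ldp_stp };

// Format a base+offset address.  BYTE_OFFSET is a constant byte offset and
// VECTOR_BYTES is the fixed SVE vector length in bytes (16 when compiling
// with -msve-vector-bits=128).
std::string
format_sve_address (const std::string &base, long byte_offset,
                    long vector_bytes, addr_query query)
{
  if (query == addr_query::ldp_stp)
    {
      // ldp/stp take a plain byte immediate, never a "mul vl" offset.
      if (byte_offset == 0)
        return "[" + base + "]";
      return "[" + base + ", " + std::to_string (byte_offset) + "]";
    }

  // SVE ldr/str: the offset is expressed as a multiple of the vector length.
  assert (byte_offset % vector_bytes == 0);
  long vnum = byte_offset / vector_bytes;
  if (vnum == 0)
    return "[" + base + "]";
  return "[" + base + ", #" + std::to_string (vnum) + ", mul vl]";
}
```

With a 16-byte vector length, a slot 32 bytes above sp prints as
"[sp, #2, mul vl]" for an SVE str but as "[sp, 32]" for an stp.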

This patch does that.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_print_address_internal): Handle SVE
modes when printing ldp/stp addresses.
---
 gcc/config/aarch64/aarch64.cc | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index abd029887e5..4820fac67a1 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12661,6 +12661,9 @@ aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x,
   return false;
 }
 
+  const bool load_store_pair_p = (type == ADDR_QUERY_LDP_STP
+  || type == ADDR_QUERY_LDP_STP_N);
+
   if (aarch64_classify_address (&addr, x, mode, true, type))
 switch (addr.type)
   {
@@ -12672,7 +12675,15 @@ aarch64_print_address_internal (FILE *f, machine_mode mode, rtx x,
 	  }
 
 	vec_flags = aarch64_classify_vector_mode (mode);
-	if (vec_flags & VEC_ANY_SVE)
+	if ((vec_flags & VEC_ANY_SVE)
+	&& load_store_pair_p
+	&& !addr.const_offset.is_constant ())
+	  {
+	output_operand_lossage ("poly offset in ldp/stp address");
+	return false;
+	  }
+
+	if ((vec_flags & VEC_ANY_SVE) && !load_store_pair_p)
 	  {
 	HOST_WIDE_INT vnum
 	  = exact_div (addr.const_offset,


[PATCH 06/11] aarch64: Fix up aarch64_print_operand xzr/wzr case

2023-11-16 Thread Alex Coplan
This adjusts aarch64_print_operand to recognize zero rtxes in modes other than
VOIDmode.  This allows us to use xzr/wzr for zero vectors, for example.
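
A toy model of why the comparison has to be per-mode is sketched below.  It
assumes, as a simplification, that each mode has one shared zero object that
is compared by address; toy_rtx, toy_CONST0_RTX and the other names are
hypothetical and do not reflect GCC's real data structures.

```cpp
#include <cassert>

// Toy model: every mode has its own canonical zero object, and constants
// are compared by pointer identity.
enum toy_mode { VOID_MODE, V16QI_MODE, V2DF_MODE, NUM_MODES };

struct toy_rtx { toy_mode m; };

static toy_rtx zero_table[NUM_MODES] = {
  { VOID_MODE }, { V16QI_MODE }, { V2DF_MODE }
};

// The single VOIDmode zero, analogous in spirit to const0_rtx.
static toy_rtx *const toy_const0_rtx = &zero_table[VOID_MODE];

// The canonical zero for mode M, analogous in spirit to CONST0_RTX (M).
inline toy_rtx *toy_CONST0_RTX (toy_mode m) { return &zero_table[m]; }
inline toy_mode toy_GET_MODE (const toy_rtx *x) { return x->m; }

// Old-style check: only recognizes the VOIDmode zero.
inline bool is_zero_old (const toy_rtx *x) { return x == toy_const0_rtx; }

// New-style check: recognizes the zero of whatever mode X has.
inline bool
is_zero_new (const toy_rtx *x)
{
  return x == toy_CONST0_RTX (toy_GET_MODE (x));
}
```

In this model a V16QI zero fails the old check but passes the new one, while
the VOIDmode zero passes both.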

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_print_operand): Handle
non-VOIDmode CONST0_RTXes in {x,w}zr cases.
---
 gcc/config/aarch64/aarch64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 800a8b0e110..abd029887e5 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -12387,7 +12387,7 @@ aarch64_print_operand (FILE *f, rtx x, int code)
 
 case 'w':
 case 'x':
-  if (x == const0_rtx
+  if (x == CONST0_RTX (GET_MODE (x))
 	  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
 	{
 	  asm_fprintf (f, "%czr", code);


[PATCH 05/11] aarch64, testsuite: Fix up pr103147-10 tests

2023-11-16 Thread Alex Coplan
For the ret function, allow the loads to be emitted in either order in
the codegen.  The order gets inverted with the new load/store pair pass.

OK for trunk?

gcc/testsuite/ChangeLog:

* g++.target/aarch64/pr103147-10.C (ret): Allow loads in either order.
* gcc.target/aarch64/pr103147-10.c (ret): Likewise.
---
 gcc/testsuite/g++.target/aarch64/pr103147-10.C | 5 +
 gcc/testsuite/gcc.target/aarch64/pr103147-10.c | 5 +
 2 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/g++.target/aarch64/pr103147-10.C b/gcc/testsuite/g++.target/aarch64/pr103147-10.C
index e12771533f7..5a98c30ed3f 100644
--- a/gcc/testsuite/g++.target/aarch64/pr103147-10.C
+++ b/gcc/testsuite/g++.target/aarch64/pr103147-10.C
@@ -62,8 +62,13 @@ ld4 (int32x4x4_t *a, int32_t *b)
 /*
 ** ret:
 **	...
+** (
 **	ldp	q0, q1, \[x0\]
 **	ldr	q2, \[x0, #?32\]
+** |
+**	ldr	q2, \[x0, #?32\]
+**	ldp	q0, q1, \[x0\]
+** )
 **	...
 */
 int32x4x3_t
diff --git a/gcc/testsuite/gcc.target/aarch64/pr103147-10.c b/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
index 57942bfd10a..2609266bc46 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr103147-10.c
@@ -60,8 +60,13 @@ ld4 (int32x4x4_t *a, int32_t *b)
 /*
 ** ret:
 **	...
+** (
 **	ldp	q0, q1, \[x0\]
 **	ldr	q2, \[x0, #?32\]
+** |
+**	ldr	q2, \[x0, #?32\]
+**	ldp	q0, q1, \[x0\]
+** )
 **	...
 */
 int32x4x3_t


[PATCH 04/11] aarch64, testsuite: Allow ldp/stp on SVE regs with -msve-vector-bits=128

2023-11-16 Thread Alex Coplan
Later patches in the series allow ldp and stp to use SVE modes if
-msve-vector-bits=128 is provided.  This patch therefore adjusts tests
that pass -msve-vector-bits=128 to allow ldp/stp to save/restore SVE
registers.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pcs/stack_clash_1_128.c: Allow ldp/stp saves
of SVE registers.
* gcc.target/aarch64/sve/pcs/struct_3_128.c: Likewise.
---
 .../aarch64/sve/pcs/stack_clash_1_128.c   | 32 +++
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c | 29 +
 2 files changed, 61 insertions(+)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
index 404301dc0c1..795429b01cb 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_1_128.c
@@ -19,6 +19,7 @@
 **	str	p13, \[sp, #9, mul vl\]
 **	str	p14, \[sp, #10, mul vl\]
 **	str	p15, \[sp, #11, mul vl\]
+** (
 **	str	z8, \[sp, #2, mul vl\]
 **	str	z9, \[sp, #3, mul vl\]
 **	str	z10, \[sp, #4, mul vl\]
@@ -35,7 +36,18 @@
 **	str	z21, \[sp, #15, mul vl\]
 **	str	z22, \[sp, #16, mul vl\]
 **	str	z23, \[sp, #17, mul vl\]
+** |
+**	stp	q8, q9, \[sp, 32\]
+**	stp	q10, q11, \[sp, 64\]
+**	stp	q12, q13, \[sp, 96\]
+**	stp	q14, q15, \[sp, 128\]
+**	stp	q16, q17, \[sp, 160\]
+**	stp	q18, q19, \[sp, 192\]
+**	stp	q20, q21, \[sp, 224\]
+**	stp	q22, q23, \[sp, 256\]
+** )
 **	ptrue	p0\.b, vl16
+** (
 **	ldr	z8, \[sp, #2, mul vl\]
 **	ldr	z9, \[sp, #3, mul vl\]
 **	ldr	z10, \[sp, #4, mul vl\]
@@ -52,6 +64,16 @@
 **	ldr	z21, \[sp, #15, mul vl\]
 **	ldr	z22, \[sp, #16, mul vl\]
 **	ldr	z23, \[sp, #17, mul vl\]
+** |
+**	ldp	q8, q9, \[sp, 32\]
+**	ldp	q10, q11, \[sp, 64\]
+**	ldp	q12, q13, \[sp, 96\]
+**	ldp	q14, q15, \[sp, 128\]
+**	ldp	q16, q17, \[sp, 160\]
+**	ldp	q18, q19, \[sp, 192\]
+**	ldp	q20, q21, \[sp, 224\]
+**	ldp	q22, q23, \[sp, 256\]
+** )
 **	ldr	p4, \[sp\]
 **	ldr	p5, \[sp, #1, mul vl\]
 **	ldr	p6, \[sp, #2, mul vl\]
@@ -101,16 +123,26 @@ test_2 (void)
 **	str	p5, \[sp\]
 **	str	p6, \[sp, #1, mul vl\]
 **	str	p11, \[sp, #2, mul vl\]
+** (
 **	str	z8, \[sp, #1, mul vl\]
 **	str	z13, \[sp, #2, mul vl\]
 **	str	z19, \[sp, #3, mul vl\]
 **	str	z20, \[sp, #4, mul vl\]
+** |
+**	stp	q8, q13, \[sp, 16\]
+**	stp	q19, q20, \[sp, 48\]
+** )
 **	str	z22, \[sp, #5, mul vl\]
 **	ptrue	p0\.b, vl16
+** (
 **	ldr	z8, \[sp, #1, mul vl\]
 **	ldr	z13, \[sp, #2, mul vl\]
 **	ldr	z19, \[sp, #3, mul vl\]
 **	ldr	z20, \[sp, #4, mul vl\]
+** |
+**	ldp	q8, q13, \[sp, 16\]
+**	ldp	q19, q20, \[sp, 48\]
+** )
 **	ldr	z22, \[sp, #5, mul vl\]
 **	ldr	p5, \[sp\]
 **	ldr	p6, \[sp, #1, mul vl\]
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
index f6d78469aa5..0d330c015b9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/struct_3_128.c
@@ -220,6 +220,7 @@ SEL2 (struct, pst_arr5)
 /*
 ** test_pst_arr5:
 **	sub	sp, sp, #128
+** (
 **	str	z0, \[sp\]
 **	str	z1, \[sp, #1, mul vl\]
 **	str	z2, \[sp, #2, mul vl\]
@@ -228,6 +229,12 @@ SEL2 (struct, pst_arr5)
 **	str	z5, \[sp, #5, mul vl\]
 **	str	z6, \[sp, #6, mul vl\]
 **	str	z7, \[sp, #7, mul vl\]
+** |
+**	stp	q0, q1, \[sp\]
+**	stp	q2, q3, \[sp, 32\]
+**	stp	q4, q5, \[sp, 64\]
+**	stp	q6, q7, \[sp, 96\]
+** )
 **	mov	(x7, sp|w7, wsp)
 **	add	sp, sp, #?128
 **	ret
@@ -374,8 +381,12 @@ SEL2 (struct, pst_uniform1)
 /*
 ** test_pst_uniform1:
 **	sub	sp, sp, #32
+** (
 **	str	z0, \[sp\]
 **	str	z1, \[sp, #1, mul vl\]
+** |
+**	stp	q0, q1, \[sp\]
+** )
 **	mov	(x7, sp|w7, wsp)
 **	add	sp, sp, #?32
 **	ret
@@ -398,8 +409,12 @@ SEL2 (struct, pst_uniform2)
 /*
 ** test_pst_uniform2:
 **	sub	sp, sp, #48
+** (
 **	str	z0, \[sp\]
 **	str	z1, \[sp, #1, mul vl\]
+** |
+**	stp	q0, q1, \[sp\]
+** )
 **	str	z2, \[sp, #2, mul vl\]
 **	mov	(x7, sp|w7, wsp)
 **	add	sp, sp, #?48
@@ -424,10 +439,15 @@ SEL2 (struct, pst_uniform3)
 /*
 ** test_pst_uniform3:
 **	sub	sp, sp, #64
+** (
 **	str	z0, \[sp\]
 **	str	z1, \[sp, #1, mul vl\]
 **	str	z2, \[sp, #2, mul vl\]
 **	str	z3, \[sp, #3, mul vl\]
+** |
+**	stp	q0, q1, \[sp\]
+**	stp	q2, q3, \[sp, 32\]
+** )
 **	mov	(x7, sp|w7, wsp)
 **	add	sp, sp, #?64
 **	ret
@@ -456,8 +476,12 @@ SEL2 (struct, pst_uniform4)
 **	ptrue	(p[0-7])\.b, vl16
 **	st1w	z0\.s, \2, \[x7\]
 **	add	(x[0-9]+), x7, #?32
+** (
 **	str	z1, \[\3\]
 **	str	z2, \[\3, #1, mul vl\]
+** |
+**	stp	q1, q2, \[\3\]
+** )
 **	str	z3, \[\3, #2, mul vl\]
 **	st1w	z4\.s, \2, \[x7, #6, mul vl\]
 **	add	sp, sp, #?144
@@ -542,10 +566,15 @@ SEL2 (struct, pst_mixed2)
 **	str	p2, \[sp, #18, mul vl\]
 **	add	(x[0-9]+), sp, #?38
 **	st1b	z2\.b, \1, \[\4\]
+** (
 **	str	z3, \[sp, #4, mul vl\]
 **	str	z4, \[sp, #5, mul vl\]
 **	str	z5, \[sp, #6, mul vl\]
 **	str	z6, \[sp, #7, mul vl\]
+** |
+**	stp	q3, q4, \[sp, 64\]
+**	stp	q5, q6, \[sp, 96\]
+** )
 **	mov	(x7, sp|w7, 

[PATCH 03/11] aarch64, testsuite: Fix up auto-init-padding tests

2023-11-16 Thread Alex Coplan
The tests currently depend on memcpy lowering forming stps at -O0.  We no
longer want to form stps during memcpy lowering, but instead in the
load/store pair fusion pass.

This patch therefore tweaks affected tests to enable optimizations
(-O1), and adjusts the tests to avoid parts of the structures being
optimized away where necessary.

OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/auto-init-padding-1.c: Add -O to options,
adjust test to work with optimizations enabled.
* gcc.target/aarch64/auto-init-padding-2.c: Add -O to options.
* gcc.target/aarch64/auto-init-padding-3.c: Add -O to options,
adjust test to work with optimizations enabled.
* gcc.target/aarch64/auto-init-padding-4.c: Likewise.
* gcc.target/aarch64/auto-init-padding-9.c: Likewise.
---
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c | 8 +---
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c | 7 ---
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c | 4 ++--
 gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c | 7 ---
 5 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
index c747ebdcdf7..7027454dc74 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-1.c
@@ -1,17 +1,19 @@
 /* Verify zero initialization for structure type automatic variables with
padding.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero" } */
+/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
 
 struct test_aligned {
 int internal1;
 long long internal2;
 } __attribute__ ((aligned(64)));
 
-int foo ()
+void bar (struct test_aligned *);
+
+void foo ()
 {
   struct test_aligned var;
-  return var.internal1;
+  bar (&var);
 }
 
 /* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
index 6e280904da1..d3b6591c9b0 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-2.c
@@ -1,7 +1,7 @@
 /* Verify pattern initialization for structure type automatic variables with
padding.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=pattern" } */
+/* { dg-options "-O -ftrivial-auto-var-init=pattern" } */
 
 struct test_aligned {
 int internal1;
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
index 9ddea58b468..aad4bb8944f 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-3.c
@@ -1,7 +1,7 @@
 /* Verify zero initialization for nested structure type automatic variables with
padding.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero" } */
+/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
 
 struct test_aligned {
 unsigned internal1;
@@ -16,11 +16,12 @@ struct test_big_hole {
 struct test_aligned four;
 } __attribute__ ((aligned(64)));
 
+void bar (struct test_big_hole *);
 
-int foo ()
+void foo ()
 {
   struct test_big_hole var;
-  return var.four.internal1;
+  bar (&var);
 }
 
 /* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
index 75bba82ed34..efd310f054d 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-4.c
@@ -1,7 +1,7 @@
 /* Verify pattern initialization for nested structure type automatic variables with
padding.  */
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=pattern" } */
+/* { dg-options "-O -ftrivial-auto-var-init=pattern" } */
 
 struct test_aligned {
 unsigned internal1;
@@ -23,4 +23,4 @@ int foo ()
   return var.four.internal1;
 }
 
-/* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 5 } } */
+/* { dg-final { scan-assembler-times {stp\tq[0-9]+, q[0-9]+,} 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
index 0f1930f813e..64ed8f11fe6 100644
--- a/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
+++ b/gcc/testsuite/gcc.target/aarch64/auto-init-padding-9.c
@@ -1,7 +1,7 @@
 /* Verify zero initialization for array type with structure element with
padding.  */ 
 /* { dg-do compile } */
-/* { dg-options "-ftrivial-auto-var-init=zero" } */
+/* { dg-options "-O -ftrivial-auto-var-init=zero" } */
 
 struct test_trailing_hole {
 int one;
@@ -11,11 +11,12 @@ struct 

[PATCH 02/11] rtl-ssa: Add some helpers for removing accesses

2023-11-16 Thread Alex Coplan
This adds some helpers to access-utils.h for removing accesses from an
access_array.  This is needed by the upcoming aarch64 load/store pair
fusion pass.
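
The shape of these helpers can be sketched outside GCC as follows.  This is
only an analogy under stated assumptions: std::vector stands in for
access_array, there is no obstack watermark, and toy_access and the toy_*
function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an RTL-SSA access: just a register number.
struct toy_access { unsigned int regno; };

using toy_access_array = std::vector<const toy_access *>;

// Return only those accesses that satisfy PREDICATE, preserving order.
template<typename Predicate>
toy_access_array
toy_filter_accesses (const toy_access_array &accesses, Predicate predicate)
{
  toy_access_array result;
  result.reserve (accesses.size ());
  for (const toy_access *a : accesses)
    if (predicate (a))
      result.push_back (a);
  return result;
}

// Remove any access with register number REGNO.
inline toy_access_array
toy_remove_regno_access (const toy_access_array &accesses,
                         unsigned int regno)
{
  auto pred = [regno] (const toy_access *a) { return a->regno != regno; };
  return toy_filter_accesses (accesses, pred);
}

// As above, but additionally check that something was actually removed.
inline toy_access_array
toy_check_remove_regno_access (const toy_access_array &accesses,
                               unsigned int regno)
{
  auto result = toy_remove_regno_access (accesses, regno);
  assert (result.size () < accesses.size ());
  return result;
}
```

The checking variant is useful when the caller already knows the register is
present and wants a failure if that assumption is ever violated.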

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/access-utils.h (filter_accesses): New.
(remove_regno_access): New.
(check_remove_regno_access): New.
---
 gcc/rtl-ssa/access-utils.h | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index f078625babf..31259d742d9 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -78,6 +78,48 @@ drop_memory_access (T accesses)
   return T (arr.begin (), accesses.size () - 1);
 }
 
+// Filter ACCESSES to return an access_array of only those accesses that
+// satisfy PREDICATE.  Allocate the new array above WATERMARK.
+template<typename T, typename FilterPredicate>
+inline T
+filter_accesses (obstack_watermark &watermark,
+		 T accesses,
+		 FilterPredicate predicate)
+{
+  access_array_builder builder (watermark);
+  builder.reserve (accesses.size ());
+  auto it = accesses.begin ();
+  auto end = accesses.end ();
+  for (; it != end; it++)
+if (predicate (*it))
+  builder.quick_push (*it);
+  return T (builder.finish ());
+}
+
+// Given an array of ACCESSES, remove any access with regno REGNO.
+// Allocate the new access array above WM.
+template<typename T>
+inline T
+remove_regno_access (obstack_watermark &watermark,
+		 T accesses, unsigned int regno)
+{
+  using Access = decltype (accesses[0]);
+  auto pred = [regno](Access a) { return a->regno () != regno; };
+  return filter_accesses (watermark, accesses, pred);
+}
+
+// As above, but additionally check that we actually did remove an access.
+template<typename T>
+inline T
+check_remove_regno_access (obstack_watermark &watermark,
+			   T accesses, unsigned regno)
+{
+  auto orig_size = accesses.size ();
+  auto result = remove_regno_access (watermark, accesses, regno);
+  gcc_assert (result.size () < orig_size);
+  return result;
+}
+
 // If sorted array ACCESSES includes a reference to REGNO, return the
 // access, otherwise return null.
 template


[PATCH 01/11] rtl-ssa: Support for inserting new insns

2023-11-16 Thread Alex Coplan
N.B. this is just a rebased (but otherwise unchanged) version of the
same patch already posted here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633348.html

this is the only unreviewed dependency from the previous series, so it
seemed easier just to re-post it (not least to appease the pre-commit
CI).

-- >8 --

The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass.  We then update all mem uses that require re-parenting in one go
at the end of the pass.

To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.

New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.
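
The back-out idea can be sketched in isolation as below.  This is a minimal
model, assuming a simple bump allocator; toy_arena and toy_watermark are
hypothetical names and are not the RTL-SSA or obstack interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator standing in for an obstack.
class toy_arena
{
public:
  // Allocate one slot holding VALUE and return its index.
  std::size_t alloc (int value)
  {
    m_slots.push_back (value);
    return m_slots.size () - 1;
  }

  std::size_t size () const { return m_slots.size (); }

  // Release every slot at index N or above.
  void truncate (std::size_t n) { m_slots.resize (n); }

private:
  std::vector<int> m_slots;
};

// Records the arena's high-water mark on construction.  Unless keep () is
// called, everything allocated above the mark is released on destruction.
class toy_watermark
{
public:
  explicit toy_watermark (toy_arena &arena)
    : m_arena (arena), m_mark (arena.size ()), m_keep (false) {}

  ~toy_watermark ()
  {
    if (!m_keep)
      m_arena.truncate (m_mark);
  }

  toy_watermark (const toy_watermark &) = delete;
  toy_watermark &operator= (const toy_watermark &) = delete;

  // Commit the temporaries: do not roll back on destruction.
  void keep () { m_keep = true; }

private:
  toy_arena &m_arena;
  std::size_t m_mark;
  bool m_keep;
};
```

A caller that abandons a tentative change simply lets the watermark go out
of scope; one that commits the change calls keep () first.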

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn.  Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.
---
 gcc/rtl-ssa/accesses.cc| 10 ++
 gcc/rtl-ssa/accesses.h |  4 +++
 gcc/rtl-ssa/changes.cc | 74 +++---
 gcc/rtl-ssa/changes.h  |  2 ++
 gcc/rtl-ssa/functions.h| 14 
 gcc/rtl-ssa/insns.cc   |  5 +++
 gcc/rtl-ssa/insns.h|  7 +++-
 gcc/rtl-ssa/internals.inl  |  1 +
 gcc/rtl-ssa/member-fns.inl | 12 +++
 gcc/rtl-ssa/movement.h |  8 -
 10 files changed, 123 insertions(+), 14 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 510545a8bad..76d70fd8bd3 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1456,6 +1456,16 @@ function_info::make_uses_available (obstack_watermark &watermark,
   return use_array (new_uses, num_uses);
 }
 
+set_info *
+function_info::create_set (obstack_watermark &watermark,
+			   insn_info *insn,
+			   resource_info resource)
+{
+  auto set = change_alloc<set_info> (watermark, insn, resource);
+  set->m_is_temp = true;
+  return set;
+}
+
 // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
 // represent ACCESS1.
 static bool
diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index fce31d46717..7e7a90ece97 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -204,6 +204,10 @@ public:
   // in the main instruction pattern.
   bool only_occurs_in_notes () const { return m_only_occurs_in_notes; }
 
+  // Return true if this is a temporary access, e.g. one created for
+  // an insn that is about to be inserted.
+  bool is_temporary () const { return m_is_temp; }
+
 protected:
   access_info (resource_info, access_kind);
 
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index aab532b9f26..da2a61d701a 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -394,14 +394,20 @@ move_insn (insn_change &change, insn_info *after)
   // At the moment we don't support moving instructions between EBBs,
   // but this would be worth adding if it's useful.
   insn_info *insn = change.insn ();
-  gcc_assert (after->ebb () == insn->ebb ());
+
   bb_info *bb = after->bb ();
   basic_block cfg_bb = bb->cfg_bb ();
 
-  if (insn->bb () != bb)
-// Force DF to mark the old block as dirty.
-df_insn_delete (rtl);
-  ::remove_insn (rtl);
+  if (!insn->is_temporary ())
+{
+  gcc_assert (after->ebb () == insn->ebb ());
+
+  if (insn->bb () != bb)
+	// Force DF to mark the old block as dirty.
+	df_insn_delete (rtl);
+  ::remove_insn (rtl);
+}
+
   ::add_insn_after (rtl, after_rtl, cfg_bb);
 }
 
@@ 

[PATCH 00/11] aarch64: Rework ldp/stp patterns, add new ldp/stp pass

2023-11-16 Thread Alex Coplan
Hi,

This patch series reworks the load/store pair representation in the aarch64
backend and adds a new load/store pair fusion pass.

Patch 1/11 is just a rebased version of the patch from the previous version of
the series to add support to RTL-SSA for inserting new insns.

Patch 2/11 adds some RTL-SSA helpers for removing accesses.  Patches 3-5 fix up
the testsuite in light of the codegen changes.  Patches 6-7 make small tweaks to
operand printing in the aarch64 backend.  Patch 8/11 generalizes and reworks the
existing load/store pair writeback patterns (in preparation for use by the
pass).  Patch 9/11 reworks the non-writeback pair patterns, both to fix a
correctness issue and to increase their generality (while reducing the number
of patterns).  Patch 10/11 is a revised version of the load/store pair fusion
pass including writeback support (among other changes).

Finally, patch 11/11 adjusts the mem{cpy,set} expansion to avoid creating
ldp/stp at expand time; instead, we rely on the new pass to do it.

Many thanks to Richard Sandiford for his help in patiently answering my
many questions during the development of the series.

Bootstrapped/regtested as a series on aarch64-linux-gnu.

Thanks,
Alex

Alex Coplan (11):
  rtl-ssa: Support for inserting new insns
  rtl-ssa: Add some helpers for removing accesses
  aarch64, testsuite: Fix up auto-init-padding tests
  aarch64, testsuite: Allow ldp/stp on SVE regs with -msve-vector-bits=128
  aarch64, testsuite: Fix up pr103147-10 tests
  aarch64: Fix up aarch64_print_operand xzr/wzr case
  aarch64: Fix up printing of ldp/stp with -msve-vector-bits=128
  aarch64: Generalize writeback ldp/stp patterns
  aarch64: Rewrite non-writeback ldp/stp patterns
  aarch64: Add new load/store pair fusion pass.
  aarch64: Use individual loads/stores for mem{cpy,set} expansion

 gcc/config.gcc|4 +-
 gcc/config/aarch64/aarch64-ldp-fusion.cc  | 2727 +
 gcc/config/aarch64/aarch64-ldpstp.md  |   66 +-
 gcc/config/aarch64/aarch64-modes.def  |6 +-
 gcc/config/aarch64/aarch64-passes.def |2 +
 gcc/config/aarch64/aarch64-protos.h   |7 +-
 gcc/config/aarch64/aarch64-simd.md|   60 -
 gcc/config/aarch64/aarch64.cc |  338 +-
 gcc/config/aarch64/aarch64.md |  472 +--
 gcc/config/aarch64/aarch64.opt|   23 +
 gcc/config/aarch64/iterators.md   |3 +
 gcc/config/aarch64/predicates.md  |   48 +-
 gcc/config/aarch64/t-aarch64  |7 +
 gcc/rtl-ssa/access-utils.h|   42 +
 gcc/rtl-ssa/accesses.cc   |   10 +
 gcc/rtl-ssa/accesses.h|4 +
 gcc/rtl-ssa/changes.cc|   74 +-
 gcc/rtl-ssa/changes.h |2 +
 gcc/rtl-ssa/functions.h   |   14 +
 gcc/rtl-ssa/insns.cc  |5 +
 gcc/rtl-ssa/insns.h   |7 +-
 gcc/rtl-ssa/internals.inl |1 +
 gcc/rtl-ssa/member-fns.inl|   12 +
 gcc/rtl-ssa/movement.h|8 +-
 .../g++.target/aarch64/pr103147-10.C  |5 +
 .../gcc.target/aarch64/auto-init-padding-1.c  |8 +-
 .../gcc.target/aarch64/auto-init-padding-2.c  |2 +-
 .../gcc.target/aarch64/auto-init-padding-3.c  |7 +-
 .../gcc.target/aarch64/auto-init-padding-4.c  |4 +-
 .../gcc.target/aarch64/auto-init-padding-9.c  |7 +-
 .../gcc.target/aarch64/pr103147-10.c  |5 +
 .../aarch64/sve/pcs/stack_clash_1_128.c   |   32 +
 .../gcc.target/aarch64/sve/pcs/struct_3_128.c |   29 +
 33 files changed, 3573 insertions(+), 468 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-ldp-fusion.cc



Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
Bruno,

The issue appears to be that intl/gnulib-lib/{mbrtowc.c,setlocale_null.c}
include pthread.h based on HAVE_PTHREAD_API, which is defined as 1 in
intl/config.h in the build directory despite requesting --disable-pthreads.

Thanks, David

On Thu, Nov 16, 2023 at 11:35 AM David Edelsohn  wrote:

> I configured gettext with --disable-pthreads and libintl.a still contains
> references to pthread_mutex_lock and pthread_mutex_unlock, which causes NLS
> configure to fail on AIX.
>
> How can this be corrected?
>
> Thanks, David
>
> libintl.a[libgnu_la-mbrtowc.o]:
>      - U __lc_charmap
>      - U errno
>      - U .locale_encoding_classification
>      - U .gl_get_mbtowc_lock
>      - U .pthread_mutex_lock
>      - U .mbtowc
>      - U .pthread_mutex_unlock
>      - U .abort
>      0 T ._libintl_mbrtowc
>   1952 D _libintl_mbrtowc
>
> libintl.a[libgnu_la-setlocale_null.o]:
>      - U .gl_get_setlocale_null_lock
>      - U .pthread_mutex_lock
>      - U .setlocale
>      - U .strlen
>      - U .memcpy
>      - U .pthread_mutex_unlock
>      - U .abort
>      - U .strcpy
>    336 T ._libintl_setlocale_null_r
>    400 T ._libintl_setlocale_null
>    812 D _libintl_setlocale_null_r
>    824 D _libintl_setlocale_null
>
> On Thu, Nov 16, 2023 at 11:00 AM David Edelsohn  wrote:
>
>> Bruno,
>>
>> I have been able to tweak the environment and build gettext and libintl.
>> With the updated libintl and environment, GCC reliably does not use NLS.
>>
>> The issue is that libintl utilizes pthreads.  AIX does not provide no-op
>> pthread stubs in libc.  pthreads is an explicit multilib on AIX.
>>
>> It is great that gettext and libintl can be built thread-safe, but GCC
>> (cc1, gcov, etc.) are not pthreads applications and are not built with
>> pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
>> function in GCC on AIX by default.  The GCC included gettext was built in
>> the default for GCC libraries, which was not pthreads enabled.
>>
>> I can rebuild libintl with --disable-pthreads and I will see if that
>> works, but the default, distributed libintl library will not allow GCC to
>> be built with NLS enabled.  And, no, GCC on AIX should not be forced to
>> build with pthreads.
>>
>> This is a regression in NLS support in GCC.
>>
>> Thanks, David
>>
>>
>> On Wed, Nov 15, 2023 at 5:39 PM Bruno Haible  wrote:
>>
>>> David Edelsohn wrote:
>>> > I am using my own install of GCC for a reason.
>>>
>>> I have built GNU gettext 0.22.3 in various configurations on the AIX 7.1
>>> and 7.3 machines in the compilefarm, and haven't encountered issues with
>>> 'max_align_t' nor with 'getpeername'. So, from my point of view, GNU
>>> gettext works fine on AIX with gcc and xlc (but not ibm-clang, which I
>>> haven't tested).
>>>
>>> You will surely understand that I cannot test a release against a
>>> compiler that exists only on your hard disk.
>>>
>>> The hint I gave you, based on the partial logs that you provided, is to
>>> look at the configure test for intmax_t first.
>>>
>>> Bruno
>>>
>>>
>>>
>>>


[committed] hppa: Revise REG+D address support to allow long displacements before reload

2023-11-16 Thread John David Anglin
Tested on hppa-unknown-linux-gnu and hppa64-hp-hpux11.11.  Committed
to trunk.

This patch works around a problem compiling python3.11 by improving
REG+D address handling.  The change results in smaller code and
reduced register pressure.

Dave
---

hppa: Revise REG+D address support to allow long displacements before reload

In analyzing PR rtl-optimization/112415, I realized that restricting
REG+D offsets to 5 bits before reload results in very poor code and
complexities in optimizing these instructions after reload.  The
general problem is that long displacements are not allowed for floating
point accesses when generating PA 1.1 code.  Even with PA 2.0, there
is an ELF linker bug that prevents using long displacements for
floating point loads and stores.

In the past, enabling long displacements before reload caused issues
in reload.  However, there have been fixes in the handling of reloads
for floating-point accesses.  This change allows long displacements
before reload and corrects a couple of issues in the constraint
handling for integer and floating-point accesses.
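
The revised decision order for REG+D displacements can be sketched roughly
as follows.  This is a simplified model, not the pa.cc code: the names are
hypothetical, int14_ok_strict is passed in as a flag rather than being a
target macro, and the mode-dependent alignment checks done by base14_operand
are omitted.

```cpp
#include <cassert>

// Hypothetical subset of machine modes.
enum toy_mode { QI, HI, SI, DI, SF, DF };

// Signed 5-bit and 14-bit immediate ranges.
inline bool fits_5_bits (long d) { return d >= -16 && d <= 15; }
inline bool fits_14_bits (long d) { return d >= -8192 && d <= 8191; }

// Return true if displacement DISP is acceptable in a REG+D address.
inline bool
toy_reg_plus_d_ok (long disp, toy_mode mode, bool strict,
                   bool int14_ok_strict)
{
  // Short 5-bit displacements are always okay.
  if (fits_5_bits (disp))
    return true;

  // Simplification: the real check also looks at alignment.
  if (!fits_14_bits (disp))
    return false;

  // Long displacements are always okay in these cases.
  if (int14_ok_strict || mode == QI || mode == HI)
    return true;

  // In strict mode the access might be a floating-point one that cannot
  // encode a long displacement.
  if (strict)
    return false;

  // Before reload, allow long displacements for all accesses.
  return true;
}
```

For example, a displacement of 256 on a DF access is accepted before reload
(strict false) but rejected in strict mode unless int14_ok_strict is set.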

2023-11-16  John David Anglin  

gcc/ChangeLog:

PR rtl-optimization/112415
* config/pa/pa.cc (pa_legitimate_address_p): Allow 14-bit
displacements before reload.  Simplify logic flow.  Revise
comments.
* config/pa/pa.h (TARGET_ELF64): New define.
(INT14_OK_STRICT): Update define and comment.
* config/pa/pa64-linux.h (TARGET_ELF64): Define.
* config/pa/predicates.md (base14_operand): Don't check
alignment of short displacements.
(integer_store_memory_operand): Don't return true when
reload_in_progress is true.  Remove INT_5_BITS check.
(floating_point_store_memory_operand): Don't return true when
reload_in_progress is true.  Use INT14_OK_STRICT to check
whether long displacements are always okay.

diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index 218c48b4ae0..565c948a9e6 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -10819,23 +10819,29 @@ pa_legitimate_address_p (machine_mode mode, rtx x, bool strict, code_helper)
 
   if (GET_CODE (index) == CONST_INT)
{
+ /* Short 5-bit displacements always okay.  */
  if (INT_5_BITS (index))
return true;
 
- /* When INT14_OK_STRICT is false, a secondary reload is needed
-to adjust the displacement of SImode and DImode floating point
-instructions but this may fail when the register also needs
-reloading.  So, we return false when STRICT is true.  We
-also reject long displacements for float mode addresses since
-the majority of accesses will use floating point instructions
-that don't support 14-bit offsets.  */
- if (!INT14_OK_STRICT
- && (strict || !(reload_in_progress || reload_completed))
- && mode != QImode
- && mode != HImode)
+ if (!base14_operand (index, mode))
return false;
 
- return base14_operand (index, mode);
+ /* Long 14-bit displacements always okay for these cases.  */
+ if (INT14_OK_STRICT
+ || mode == QImode
+ || mode == HImode)
+   return true;
+
+ /* A secondary reload may be needed to adjust the displacement
+of floating-point accesses when STRICT is nonzero.  */
+ if (strict)
+   return false;
+
+ /* We get significantly better code if we allow long displacements
+before reload for all accesses.  Instructions must satisfy their
+constraints after reload, so we must have an integer access.
+Return true for both cases.  */
+ return true;
}
 
   if (!TARGET_DISABLE_INDEXING
diff --git a/gcc/config/pa/pa.h b/gcc/config/pa/pa.h
index e65af522966..aba2cec7357 100644
--- a/gcc/config/pa/pa.h
+++ b/gcc/config/pa/pa.h
@@ -37,6 +37,11 @@ extern unsigned long total_code_bytes;
 #define TARGET_ELF32 0
 #endif
 
+/* Generate code for ELF64 ABI.  */
+#ifndef TARGET_ELF64
+#define TARGET_ELF64 0
+#endif
+
 /* Generate code for SOM 32bit ABI.  */
 #ifndef TARGET_SOM
 #define TARGET_SOM 0
@@ -823,12 +828,11 @@ extern int may_call_alloca;
 
 /* Nonzero if 14-bit offsets can be used for all loads and stores.
    This is not possible when generating PA 1.x code as floating point
-   loads and stores only support 5-bit offsets.  Note that we do not
-   forbid the use of 14-bit offsets for integer modes.  Instead, we
-   use secondary reloads to fix REG+D memory addresses for integer
-   mode floating-point loads and stores.
+   accesses only support 5-bit offsets.  Note that we do not forbid
+   the use of 14-bit offsets prior to reload.  Instead, we use secondary
+   reloads to fix REG+D memory addresses for floating-point accesses.
 
-   FIXME: the ELF32 linker clobbers the LSB of the FP register number
+   FIXME: the GNU ELF linker 

Re: building GNU gettext on AIX

2023-11-16 Thread David Edelsohn
On Thu, Nov 16, 2023 at 11:58 AM Richard Biener wrote:

>
>
> On 16.11.2023 at 17:00, David Edelsohn wrote:
>
> 
> Bruno,
>
> I have been able to tweak the environment and build gettext and libintl.
> With the updated libintl and environment, GCC reliably does not use NLS.
>
> The issue is that libintl utilizes pthreads.  AIX does not provide no-op
> pthread stubs in libc.  pthreads is an explicit multilib on AIX.
>
> It is great that gettext and libintl can be built thread-safe, but GCC
> (cc1, gcov, etc.) are not pthreads applications and are not built with
> pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
> function in GCC on AIX by default.  The GCC included gettext was built in
> the default for GCC libraries, which was not pthreads enabled.
>
> I can rebuild libintl with --disable-pthreads and I will see if that
> works, but the default, distributed libintl library will not allow GCC to
> be built with NLS enabled.  And, no, GCC on AIX should not be forced to
> build with pthreads.
>
> This is a regression in NLS support in GCC.
>
>
> If that’s for the in-tree libintl we can arrange configure to pass down
> --disable-pthreads like we adjust configure args for gmp and friends as well.
>

The latest issue is that a few files in gettext ignore --disable-pthreads
and create a dependency on pthread_mutex.

David



>
> Richard
>
> Thanks, David
>
>
> On Wed, Nov 15, 2023 at 5:39 PM Bruno Haible  wrote:
>
>> David Edelsohn wrote:
>> > I am using my own install of GCC for a reason.
>>
>> I have built GNU gettext 0.22.3 in various configurations on the AIX 7.1
>> and 7.3 machines in the compilefarm, and haven't encountered issues with
>> 'max_align_t' nor with 'getpeername'. So, from my point of view, GNU gettext
>> works fine on AIX with gcc and xlc (but not ibm-clang, which I haven't
>> tested).
>>
>> You will surely understand that I cannot test a release against a compiler
>> that exists only on your hard disk.
>>
>> The hint I gave you, based on the partial logs that you provided, is to
>> look at the configure test for intmax_t first.
>>
>> Bruno
>>
>>
>>
>>


[Ada] Fix internal error on function returning dynamically-sized type

2023-11-16 Thread Eric Botcazou
This is PR ada/109881, a tree sharing issue for the internal return type 
synthesized for a function returning a dynamically-sized type and taking an 
Out or In/Out parameter passed by copy.

Tested on x86-64/Linux, applied on mainline, 13 and 12 branches.


2023-11-16  Eric Botcazou  

PR ada/109881
* gcc-interface/decl.cc (gnat_to_gnu_subprog_type): Also create a
TYPE_DECL for the return type built for the CI/CO mechanism.


2023-11-16  Eric Botcazou  

* gnat.dg/varsize4.ads, gnat.dg/varsize4.adb: New test.
* gnat.dg/varsize4_pkg.ads: New helper.

-- 
Eric Botcazou
diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 95fa508c559..9c7f6840e21 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -6329,6 +6329,12 @@ gnat_to_gnu_subprog_type (Entity_Id gnat_subprog, bool definition,
 
 	  if (debug_info_p)
 	rest_of_record_type_compilation (gnu_cico_return_type);
+
+	  /* Declare it now since it will never be declared otherwise.  This
+	 is necessary to ensure that its subtrees are properly marked.  */
+	  create_type_decl (TYPE_NAME (gnu_cico_return_type),
+			gnu_cico_return_type,
+			true, debug_info_p, gnat_subprog);
 	}
 
   gnu_return_type = gnu_cico_return_type;
-- { dg-do compile }

package body Varsize4 is

   function Func (bytes_read : out Natural) return Arr is
      Ret : Arr := (others => False);
   begin
      return Ret;
   end;

   function Get return Natural is
      Data  : Arr;
      Bytes : Natural;
   begin
      Data := Func (Bytes);
      return Bytes;
   end;

end Varsize4;
with Varsize4_Pkg;

package Varsize4 is

   type Arr is array (1 .. Varsize4_Pkg.F) of Boolean;

   function Get return Natural;

end Varsize4;
package Varsize4_Pkg is

   function F return Natural;

end Varsize4_Pkg;


[committed] libstdc++: Fix aligned formatting of stacktrace_entry and thread::id [PR112564]

2023-11-16 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The formatter for std::thread::id should default to right-align, and the
formatter for std::stacktrace_entry should not just ignore the
fill-and-align and width from the format-spec!

libstdc++-v3/ChangeLog:

PR libstdc++/112564
* include/std/stacktrace (formatter::format): Format according
to format-spec.
* include/std/thread (formatter::format): Use _Align_right as
default.
* testsuite/19_diagnostics/stacktrace/output.cc: Check
fill-and-align handling. Change compile test to run.
* testsuite/30_threads/thread/id/output.cc: Check fill-and-align
handling.
---
 libstdc++-v3/include/std/stacktrace            |  4 +++-
 libstdc++-v3/include/std/thread                |  3 ++-
 .../19_diagnostics/stacktrace/output.cc        | 18 +++---
 .../testsuite/30_threads/thread/id/output.cc   | 14 --
 4 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/stacktrace b/libstdc++-v3/include/std/stacktrace
index 9d5f6396aed..f570745fe51 100644
--- a/libstdc++-v3/include/std/stacktrace
+++ b/libstdc++-v3/include/std/stacktrace
@@ -740,7 +740,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
         {
           std::ostringstream __os;
           __os << __x;
-          return __format::__write(__fc.out(), __os.view());
+          auto __str = __os.view();
+          return __format::__write_padded_as_spec(__str, __str.size(),
+                                                  __fc, _M_spec);
         }
 
 private:
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 39042d7cdf5..ee3b8b1fcb0 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -335,7 +335,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           __os << __id;
           auto __str = __os.view();
           return __format::__write_padded_as_spec(__str, __str.size(),
-                                                  __fc, _M_spec);
+                                                  __fc, _M_spec,
+                                                  __format::_Align_right);
         }
 
 private:
diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
index 4960ccb85b8..67f1e0cebaf 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc
@@ -1,4 +1,5 @@
-// { dg-do compile { target c++23 } }
+// { dg-options "-lstdc++exp" }
+// { dg-do run { target c++23 } }
 // { dg-require-effective-target stacktrace }
 // { dg-add-options no_pch }
 
@@ -17,7 +18,8 @@ test_to_string()
 {
   auto trace = std::stacktrace::current();
   std::string s1 = std::to_string(trace.at(0));
-  VERIFY( s1.contains("test_to_string():15") );
+  VERIFY( s1.contains("test_to_string()") );
+  VERIFY( s1.contains("output.cc:19") );
   std::string s2 = std::to_string(trace);
   VERIFY( s2.contains(s1) );
 }
@@ -47,7 +49,17 @@ test_format()
   std::stacktrace_entry entry = trace.at(0);
   std::string str = std::to_string(entry);
   VERIFY( std::format("{}", entry) == str );
-  VERIFY( std::format("{0:!<{1}}", entry, str.size() + 3) == (str + "!!!") );
+  auto len = str.size();
+  // with width
+  VERIFY( std::format("{0:{1}}", entry, len + 1) == (str + " ") );
+  // with align + width
+  VERIFY( std::format("{0:<{1}}", entry, len + 2) == (str + "  ") );
+  VERIFY( std::format("{0:^{1}}", entry, len + 3) == (" " + str + "  ") );
+  VERIFY( std::format("{0:>{1}}", entry, len + 4) == ("    " + str) );
+  // with fill-and-align + width
+  VERIFY( std::format("{0:!<{1}}", entry, len + 2) == (str + "!!") );
+  VERIFY( std::format("{0:!^{1}}", entry, len + 3) == ("!" + str + "!!") );
+  VERIFY( std::format("{0:!>{1}}", entry, len + 4) == ("!!!!" + str) );
 }
 
 int main()
diff --git a/libstdc++-v3/testsuite/30_threads/thread/id/output.cc b/libstdc++-v3/testsuite/30_threads/thread/id/output.cc
index 08d8c899fda..3c167202b02 100644
--- a/libstdc++-v3/testsuite/30_threads/thread/id/output.cc
+++ b/libstdc++-v3/testsuite/30_threads/thread/id/output.cc
@@ -80,8 +80,18 @@ test02()
   auto len = s1.size();
   out.str("");
 
-  auto s2 = std::format("{0:x^{1}}", j, len + 4);
-  VERIFY( s2 == ("xx" + s1 + "xx") );
+  std::string s2;
+  // with width
+  s2 = std::format("{0:{1}}", j, len + 2);
+  VERIFY( s2 == ("  " + s1) );
+  // with align + width
+  s2 = std::format("{0:>{1}}", j, len + 2);
+  VERIFY( s2 == ("  " + s1) );
+  s2 = std::format("{0:<{1}}", j, len + 2);
+  VERIFY( s2 == (s1 + "  ") );
+  // with fill-and-align + width
+  s2 = std::format("{0:x^{1}}", j, len + 5);
+  VERIFY( s2 == ("xx" + s1 + "xxx") );
 
 #ifdef _GLIBCXX_USE_WCHAR_T
   static_assert( 
std::is_default_constructible_v> );
-- 
2.41.0



[COMMITTED] Add myself to write after approval

2023-11-16 Thread Michal Jires
ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c43167d9a75..f0112f5d029 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -486,6 +486,7 @@ Fariborz Jahanian   

 Surya Kumari Jangala   
 Haochen Jiang  
 Qian Jianhua   
+Michal Jires   
 Janis Johnson  
 Teresa Johnson 
 Kean Johnston  
@@ -753,6 +754,7 @@ information.
 
 Robin Dapp 
 Robin Dapp 
+Michal Jires   
 Matthias Kretz 
 Tim Lange  
 Jeff Law   
-- 
2.42.1



Re: building GNU gettext on AIX

2023-11-16 Thread Richard Biener
On 16.11.2023 at 17:00, David Edelsohn wrote:

> Bruno,
>
> I have been able to tweak the environment and build gettext and libintl.
> With the updated libintl and environment, GCC reliably does not use NLS.
>
> The issue is that libintl utilizes pthreads.  AIX does not provide no-op
> pthread stubs in libc.  pthreads is an explicit multilib on AIX.
>
> It is great that gettext and libintl can be built thread-safe, but GCC
> (cc1, gcov, etc.) are not pthreads applications and are not built with
> pthreads.  Because libintl defaults to pthreads enabled, NLS cannot
> function in GCC on AIX by default.  The GCC included gettext was built in
> the default for GCC libraries, which was not pthreads enabled.
>
> I can rebuild libintl with --disable-pthreads and I will see if that
> works, but the default, distributed libintl library will not allow GCC to
> be built with NLS enabled.  And, no, GCC on AIX should not be forced to
> build with pthreads.
>
> This is a regression in NLS support in GCC.

If that’s for the in-tree libintl we can arrange configure to pass down
--disable-pthreads like we adjust configure args for gmp and friends as well.

Richard

> Thanks, David
>
> On Wed, Nov 15, 2023 at 5:39 PM Bruno Haible wrote:

David Edelsohn wrote:
> I am using my own install of GCC for a reason.

I have built GNU gettext 0.22.3 in various configurations on the AIX 7.1
and 7.3 machines in the compilefarm, and haven't encountered issues with
'max_align_t' nor with 'getpeername'. So, from my point of view, GNU gettext
works fine on AIX with gcc and xlc (but not ibm-clang, which I haven't
tested).

You will surely understand that I cannot test a release against a compiler
that exists only on your hard disk.

The hint I gave you, based on the partial logs that you provided, is to
look at the configure test for intmax_t first.

Bruno






[PATCH] sra: SRA of non-escaped aggregates passed by reference to calls

2023-11-16 Thread Martin Jambor
Hello,

PR109849 shows that a loop that heavily pushes and pops from a stack
implemented by a C++ std::vector results in slow code, mainly because the
vector structure is not split by SRA and so we end up with many loads
and stores into it.  This is because it is passed by reference
to (re)allocation methods and so needs to live in memory, even though
it does not escape from them and so we could SRA it if we
re-constructed it before the call and then separated it to distinct
replacements afterwards.

This patch does exactly that, first relaxing the selection of
candidates to also include those which are addressable but do not
escape and then adding code to deal with the calls.  The
micro-benchmark that is also the (scan-dump) testcase in this patch
runs twice as fast with it as with current trunk.  Honza measured
its effect on the libjxl benchmark and it almost closes the
performance gap between Clang and GCC while not requiring excessive
inlining and thus code growth.
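
To make the transformation described above concrete, here is a hand-written,
source-level analogy.  It is only a sketch: the Stack type, grow and both
loop functions are made up for illustration, and the real pass works on
GIMPLE inside tree-sra.cc rather than on source code.  The first function
keeps the aggregate in memory because it is passed by reference to a call;
the second keeps its fields in scalar locals, re-constructs the aggregate
right before the call and reloads the scalars right after it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature stack, standing in for the vector's internals.
struct Stack
{
  unsigned *data;
  std::size_t size;
  std::size_t capacity;
};

// Stand-in for a (re)allocation method: it reads and writes the aggregate
// through the reference but does not let its address escape.
void grow (Stack &s)
{
  std::size_t new_capacity = s.capacity ? 2 * s.capacity : 4;
  unsigned *new_data = new unsigned[new_capacity];
  for (std::size_t i = 0; i < s.size; ++i)
    new_data[i] = s.data[i];
  delete[] s.data;
  s.data = new_data;
  s.capacity = new_capacity;
}

// Without SRA: every field access goes through the in-memory aggregate.
unsigned long fill_and_sum_in_memory (unsigned n)
{
  Stack s{nullptr, 0, 0};
  for (unsigned i = 0; i < n; ++i)
    {
      if (s.size == s.capacity)
        grow (s);
      s.data[s.size++] = i;
    }
  unsigned long sum = 0;
  for (std::size_t i = 0; i < s.size; ++i)
    sum += s.data[i];
  delete[] s.data;
  return sum;
}

// With SRA (analogy): fields live in scalars except around the call.
unsigned long fill_and_sum_scalarized (unsigned n)
{
  unsigned *data = nullptr;
  std::size_t size = 0, capacity = 0;
  for (unsigned i = 0; i < n; ++i)
    {
      if (size == capacity)
        {
          Stack tmp{data, size, capacity};  // re-construct before the call
          grow (tmp);
          data = tmp.data;                  // reload replacements after it
          size = tmp.size;
          capacity = tmp.capacity;
        }
      data[size++] = i;
    }
  unsigned long sum = 0;
  for (std::size_t i = 0; i < size; ++i)
    sum += data[i];
  delete[] data;
  return sum;
}

// Both variants must compute the same result.
void check_scalarization_analogy ()
{
  assert (fill_and_sum_in_memory (10) == 45);
  assert (fill_and_sum_scalarized (10) == 45);
  assert (fill_and_sum_scalarized (0) == 0);
}
```

In the second variant the hot path (the size/capacity comparison and the
store) touches only scalars; the aggregate exists only around the rare call.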

The patch disallows creation of replacements for such aggregates which
are also accessed with a precision smaller than their size because I
have observed that this led to excessive zero-extending of data
leading to slow-downs of perlbench (on some CPUs).  Apart from this
case I have not noticed any regressions, at least not so far.

Gimple call argument flags can tell if an argument is unused (and then
we do not need to generate any statements for it) or if it is not
written to and then we do not need to generate statements loading
replacements from the original aggregate after the call statement.
Unfortunately, we cannot symmetrically use flags saying that an aggregate
is not read to avoid re-constructing the aggregate before the call,
because the flags don't tell which parts of the aggregate were not
written to.  So we load all replacements after the call, and therefore
all of them need to have the correct value before the call.

The patch passes bootstrap, lto-bootstrap and profiled-lto-bootstrap on
x86_64-linux and a very similar patch has also passed bootstrap and
testing on Aarch64-linux and ppc64le-linux (I'm re-running both on these
two architectures as I'm sending this).  OK for master?

Thanks,

Martin


gcc/ChangeLog:

2023-11-16  Martin Jambor  

PR middle-end/109849
* tree-sra.cc (passed_by_ref_in_call): New.
(sra_initialize): Allocate passed_by_ref_in_call.
(sra_deinitialize): Free passed_by_ref_in_call.
(create_access): Add decl pool candidates only if they are not
already candidates.
(build_access_from_expr_1): Bail out on ADDR_EXPRs.
(build_access_from_call_arg): New function.
(asm_visit_addr): Rename to scan_visit_addr, change the
disqualification dump message.
(scan_function): Check taken addresses for all non-call statements,
including phi nodes.  Process all call arguments, including the static
chain, with build_access_from_call_arg.
(maybe_add_sra_candidate): Relax need_to_live_in_memory check to allow
non-escaped local variables.
(sort_and_splice_var_accesses): Disallow smaller-than-precision
replacements for aggregates passed by reference to functions.
(sra_modify_expr): Use a separate stmt iterator for adding statements
before the processed statement and after it.
(sra_modify_call_arg): New function.
(sra_modify_assign): Adjust calls to sra_modify_expr.
(sra_modify_function_body): Likewise, use sra_modify_call_arg to
process call arguments, including the static chain.

gcc/testsuite/ChangeLog:

2023-11-03  Martin Jambor  

PR middle-end/109849
* g++.dg/tree-ssa/pr109849.C: New test.
* gfortran.dg/pr43984.f90: Added -fno-tree-sra to dg-options.
---
 gcc/testsuite/g++.dg/tree-ssa/pr109849.C |  31 +++
 gcc/testsuite/gfortran.dg/pr43984.f90    |   2 +-
 gcc/tree-sra.cc                          | 244 ++-
 3 files changed, 231 insertions(+), 46 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr109849.C

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr109849.C b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
new file mode 100644
index 000..cd348c0f590
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr109849.C
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sra" } */
+
+#include <vector>
+typedef unsigned int uint32_t;
+std::pair<uint32_t, uint32_t> pair;
+void
+test()
+{
+  std::vector<std::pair<uint32_t, uint32_t> > stack;
+  stack.push_back (pair);
+  while (!stack.empty()) {
+    std::pair<uint32_t, uint32_t> cur = stack.back();
+    stack.pop_back();
+    if (!cur.first)
+      {
+        cur.second++;
+        stack.push_back (cur);
+      }
+    if (cur.second > 1)
+      break;
+  }
+}
+int
+main()
+{
+for (int i = 0; i < 1; i++)
+  test();
+}
+
+/* { dg-final { scan-tree-dump "Created a 
