Re: [PATCH] Allow mode-switching to introduce internal loops [PR113220]

2024-02-21 Thread Jakub Jelinek
On Wed, Feb 21, 2024 at 10:03:17AM +, Richard Sandiford wrote:
> In this PR, the SME mode-switching code needs to insert a stack-probe
> loop for an alloca.  This patch allows the target to do that.
> 
> There are two parts to it: allowing loops for insertions in blocks,
> and allowing them for insertions on edges.  The former can be handled
> entirely within mode-switching itself, by recording which blocks have
> had new branches inserted.  The latter requires an extension to
> commit_one_edge_insertion.
> 
> I think the extension to commit_one_edge_insertion makes logical sense,
> since it already explicitly allows internal loops during RTL expansion.
> The single-block find_sub_basic_blocks is a relatively recent addition,
> so wouldn't have been available when the code was originally written.
> 
> The patch also has a small and obvious fix to make the aarch64 emit
> hook cope with labels.
> 
> I've added specific -fstack-clash-protection versions of all
> aarch64-sme.exp tests that previously failed because of this bug.
> I've also added -fno-stack-clash-protection to the original versions
> of these tests if they contain scans that assume no protection.
> 
> Tested on aarch64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> gcc/
>   PR target/113220
>   * cfgrtl.cc (commit_one_edge_insertion): Handle sequences that
>   contain jumps even if called after initial RTL expansion.
>   * mode-switching.cc: Include cfgbuild.h.
>   (optimize_mode_switching): Allow the sequence returned by the
>   emit hook to contain internal jumps.  Record which blocks
>   contain such jumps and split the blocks at the end.
>   * config/aarch64/aarch64.cc (aarch64_mode_emit): Check for
>   non-debug insns when scanning the sequence.

LGTM.

Jakub



[PATCH] Allow mode-switching to introduce internal loops [PR113220]

2024-02-21 Thread Richard Sandiford
In this PR, the SME mode-switching code needs to insert a stack-probe
loop for an alloca.  This patch allows the target to do that.

There are two parts to it: allowing loops for insertions in blocks,
and allowing them for insertions on edges.  The former can be handled
entirely within mode-switching itself, by recording which blocks have
had new branches inserted.  The latter requires an extension to
commit_one_edge_insertion.
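
To make the block-splitting step concrete, here is a minimal sketch on a toy
instruction model.  It is not GCC's find_sub_basic_blocks: Kind, Insn,
contains_jump and split_into_subblocks are hypothetical names invented for
illustration, and the only rule modelled is that a label starts a new
sub-block and a jump ends the current one.

```cpp
#include <vector>

// Toy stand-in for an RTL insn (hypothetical; not GCC's rtx_insn).
enum class Kind { Plain, Label, Jump };

struct Insn {
  Kind kind;
  int label;  // Label: its id.  Jump: the target label id.  Plain: unused.
};

// Return true if the sequence contains at least one jump, i.e. if the
// containing block would need to be split after insertion.
bool contains_jump(const std::vector<Insn> &seq) {
  for (const Insn &insn : seq)
    if (insn.kind == Kind::Jump)
      return true;
  return false;
}

// Split a flat sequence into sub-blocks.  A label begins a new sub-block
// and a jump terminates the current one.
std::vector<std::vector<Insn>>
split_into_subblocks(const std::vector<Insn> &seq) {
  std::vector<std::vector<Insn>> blocks;
  std::vector<Insn> current;
  for (const Insn &insn : seq) {
    if (insn.kind == Kind::Label && !current.empty()) {
      blocks.push_back(current);
      current.clear();
    }
    current.push_back(insn);
    if (insn.kind == Kind::Jump) {
      blocks.push_back(current);
      current.clear();
    }
  }
  if (!current.empty())
    blocks.push_back(current);
  return blocks;
}
```

In this model a probe loop of the form "setup; label; probe; jump label; rest"
splits into three sub-blocks, while a jump-free sequence stays as one.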

I think the extension to commit_one_edge_insertion makes logical sense,
since it already explicitly allows internal loops during RTL expansion.
The single-block find_sub_basic_blocks is a relatively recent addition,
so wouldn't have been available when the code was originally written.
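
The "internal loops only" condition can be sketched in the same toy model.
This is an illustration under stated assumptions, not code from the patch:
Kind, Insn and jumps_are_internal are hypothetical names, and the check
simply requires every jump to target a label defined in the same sequence.

```cpp
#include <unordered_set>
#include <vector>

// Toy stand-in for an RTL insn (hypothetical; not GCC's rtx_insn).
enum class Kind { Plain, Label, Jump };

struct Insn {
  Kind kind;
  int label;  // Label: its id.  Jump: the target label id.  Plain: unused.
};

// Return true if every jump in the sequence targets a label that is
// itself part of the sequence, so that control never leaves the
// sequence except by falling off the end.
bool jumps_are_internal(const std::vector<Insn> &seq) {
  std::unordered_set<int> labels;
  for (const Insn &insn : seq)
    if (insn.kind == Kind::Label)
      labels.insert(insn.label);
  for (const Insn &insn : seq)
    if (insn.kind == Kind::Jump && labels.count(insn.label) == 0)
      return false;
  return true;
}
```

A sequence that fails this check would need edges to blocks outside itself,
which is the case the insertion code does not attempt to handle.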

The patch also has a small and obvious fix to make the aarch64 emit
hook cope with labels.

I've added specific -fstack-clash-protection versions of all
aarch64-sme.exp tests that previously failed because of this bug.
I've also added -fno-stack-clash-protection to the original versions
of these tests if they contain scans that assume no protection.

Tested on aarch64-linux-gnu.  OK to install?

Richard


gcc/
	PR target/113220
	* cfgrtl.cc (commit_one_edge_insertion): Handle sequences that
	contain jumps even if called after initial RTL expansion.
	* mode-switching.cc: Include cfgbuild.h.
	(optimize_mode_switching): Allow the sequence returned by the
	emit hook to contain internal jumps.  Record which blocks
	contain such jumps and split the blocks at the end.
	* config/aarch64/aarch64.cc (aarch64_mode_emit): Check for
	non-debug insns when scanning the sequence.

gcc/testsuite/
	PR target/113220
	* gcc.target/aarch64/sme/call_sm_switch_5.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/call_sm_switch_5_scp.c: New test.
	* gcc.target/aarch64/sme/sibcall_6_scp.c: New test.
	* gcc.target/aarch64/sme/za_state_4.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/za_state_4_scp.c: New test.
	* gcc.target/aarch64/sme/za_state_5.c: Add
	-fno-stack-clash-protection.
	* gcc.target/aarch64/sme/za_state_5_scp.c: New test.
---
 gcc/cfgrtl.cc | 27 ++-
 gcc/config/aarch64/aarch64.cc |  2 ++
 gcc/mode-switching.cc | 15 +++
 .../gcc.target/aarch64/sme/call_sm_switch_5.c |  2 +-
 .../aarch64/sme/call_sm_switch_5_scp.c|  3 +++
 .../gcc.target/aarch64/sme/sibcall_6_scp.c|  3 +++
 .../gcc.target/aarch64/sme/za_state_4.c   |  2 +-
 .../gcc.target/aarch64/sme/za_state_4_scp.c   |  3 +++
 .../gcc.target/aarch64/sme/za_state_5.c   |  2 +-
 .../gcc.target/aarch64/sme/za_state_5_scp.c   |  3 +++
 10 files changed, 53 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/call_sm_switch_5_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/sibcall_6_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_4_scp.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_5_scp.c

diff --git a/gcc/cfgrtl.cc b/gcc/cfgrtl.cc
index 15259c5e984..304c429c99b 100644
--- a/gcc/cfgrtl.cc
+++ b/gcc/cfgrtl.cc
@@ -2018,6 +2018,21 @@ commit_one_edge_insertion (edge e)
   insns = e->insns.r;
   e->insns.r = NULL;
 
+  /* Allow the sequence to contain internal jumps, such as a memcpy loop
+     or an allocation loop.  If such a sequence is emitted during RTL
+     expansion, we'll create the appropriate basic blocks later,
+     at the end of the pass.  But if such a sequence is emitted after
+     initial expansion, we'll need to find the subblocks ourselves.  */
+  bool contains_jump = false;
+  if (!currently_expanding_to_rtl)
+    for (rtx_insn *insn = insns; insn; insn = NEXT_INSN (insn))
+      if (JUMP_P (insn))
+        {
+          rebuild_jump_labels_chain (insns);
+          contains_jump = true;
+          break;
+        }
+
   /* Figure out where to put these insns.  If the destination has
      one predecessor, insert there.  Except for the exit block.  */
   if (single_pred_p (e->dest) && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
@@ -2112,13 +2127,13 @@ commit_one_edge_insertion (edge e)
         delete_insn (before);
     }
   else
-    /* Some builtin expanders, such as those for memset and memcpy,
-       may generate loops and conditionals, and those may get emitted
-       into edges.  That's ok while expanding to rtl, basic block
-       boundaries will be identified and split afterwards.  ???  Need
-       we check whether the destination labels of any inserted jumps
-       are also part of the inserted sequence?  */
+    /* Sequences inserted after RTL expansion are expected to be SESE,
+       with only internal branches allowed.  If the sequence jumps outside
+       itself then we do not know how to add the associated edges here.  */
     gcc_assert (!JUMP_P (last) ||