Re: [PATCH v2] MATCH: Look through VIEW_CONVERT when folding VEC_PERM_EXPRs.

2024-05-24 Thread Philipp Tomsich
On Fri, 24 May 2024 at 13:02, Richard Biener  wrote:
>
> On Fri, 24 May 2024, Manolis Tsamis wrote:
>
> > The match.pd patterns to merge two vector permutes into one fail when a
> > potentially no-op view convert expression is between the two permutes.
> > This change lifts this restriction.
>
> OK.

Applied to master, thanks!
--Philipp.


Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696]

2024-05-23 Thread Philipp Tomsich
On Thu, 23 May 2024 at 18:18, Andrew Pinski  wrote:
>
> On Thu, May 23, 2024 at 8:01 AM Manolis Tsamis  
> wrote:
> >
> > This pass detects cases of expensive store forwarding and tries to avoid 
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> >
> >  strb    w2, [x1, 1]
> >  ldr x0, [x1]  # Epxensive store forwarding to larger load.

@Manolis: looks like a typo slipped through: Epxensive -> Expensive

> >
> > To:
> >
> >  ldr x0, [x1]
> >  strb    w2, [x1]
> >  bfi x0, x2, 0, 8
> >
>
> Are you sure this is correct with respect to the C11/C++11 memory
> models? If not then the pass should be gated with
> flag_store_data_races.

This optimization (i.e., the reordering and usage of the
bfi-instruction) should always be safe and not violate the C++11
memory model, as we still perform the same stores (i.e., with the same
width).
Keeping the same stores around (and only reordering them relative to
the loads) ensures that only the bytes containing the adjacent bits
are overwritten.
This pass never tries to merge multiple stores (although later passes
may), but only reorders those relative to a (wider) load we are
forwarding into.

> Also stores like this start a new "alias set" (I can't remember the
> exact term here). So how do you represent the store's aliasing set? Do
> you change it? If not, are you sure that will do the right thing?
>
> You didn't document the new option or the new --param (invoke.texi);
> this is the bare minimum requirement.
> Note you should add documentation for the new pass in the internals
> manual (passes.texi) (note most folks forget to update this when
> adding a new pass).
>
> Thanks,
> Andrew
>
>
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following 
> > speedups
> > have been observed.
> >
> >   Neoverse-N1:  +29.4%
> >   Intel Coffeelake: +13.1%
> >   AMD 5950X:+17.5%
> >
> > PR rtl-optimization/48696
> >
> > gcc/ChangeLog:
> >
> > * Makefile.in: Add avoid-store-forwarding.o.
> > * common.opt: New option -favoid-store-forwarding.
> > * params.opt: New param store-forwarding-max-distance.
> > * passes.def: Schedule a new pass.
> > * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> > * avoid-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/avoid-store-forwarding-1.c: New test.
> > * gcc.dg/avoid-store-forwarding-2.c: New test.
> > * gcc.dg/avoid-store-forwarding-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/Makefile.in   |   1 +
> >  gcc/avoid-store-forwarding.cc | 554 ++
> >  gcc/common.opt|   4 +
> >  gcc/params.opt|   4 +
> >  gcc/passes.def|   1 +
> >  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
> >  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
> >  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
> >  gcc/tree-pass.h   |   1 +
> >  9 files changed, 681 insertions(+)
> >  create mode 100644 gcc/avoid-store-forwarding.cc
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
> >  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
> >
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index a7f15694c34..be969b1ca1d 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1681,6 +1681,7 @@ OBJS = \
> > statistics.o \
> > stmt.o \
> > stor-layout.o \
> > +   avoid-store-forwarding.o \
> > store-motion.o \
> > streamer-hooks.o \
> > stringpool.o \
> > diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> > new file mode 100644
> > index 000..d90627c4872
> > --- /dev/null
> > +++ b/gcc/avoid-store-forwarding.cc
> > @@ -0,0 +1,554 @@
> > +/* Avoid store forwarding optimization pass.
> > +   Copyright (C) 2024 Free Software Foundation, Inc.
> > +   Contributed by VRULL GmbH.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify it
> > +   under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful, but
> > +   WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along 

Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-04-06 Thread Philipp Tomsich
On Sat 6. Apr 2024 at 06:52, Jeff Law  wrote:

>
>
> On 3/27/24 4:55 AM, Philipp Tomsich wrote:
> > Jeff,
> >
> > just a heads-up that trunk (i.e., the soon-to-be GCC14) still
> > generates the suboptimal sequence:
> >https://godbolt.org/z/K9YYEPsvY
> Realistically it's too late to get this into gcc-14.


I didn’t expect this for 14, but wanted to make sure we didn’t forget about
it once the branch for 15 opens up.

Thanks,
Philipp.

>


Re: [PATCH v3] RISC-V: Replace zero_extendsidi2_shifted with generalized split

2024-03-27 Thread Philipp Tomsich
Jeff,

just a heads-up that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.


On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich  wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law  wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >  unsigned long f(unsigned int a, unsigned long b)
> > >  {
> > >  a = a << 1;
> > >  unsigned long c = (unsigned long) a;
> > >  c = b + (c<<4);
> > >  return c;
> > >  }
> > > which will present at combine-time as:
> > >  Trying 7, 8 -> 9:
> > >  7: r78:SI=r81:DI#0<<0x1
> > >REG_DEAD r81:DI
> > >  8: r79:DI=zero_extend(r78:SI)
> > >REG_DEAD r78:SI
> > >  9: r72:DI=r79:DI<<0x4
> > >REG_DEAD r79:DI
> > >  Failed to match this instruction:
> > >  (set (reg:DI 72 [ _1 ])
> > >  (and:DI (ashift:DI (reg:DI 81)
> > >  (const_int 5 [0x5]))
> > >   (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >  f:
> > >   slliw   a5,a0,1
> > >   slli    a5,a5,32
> > >   srli    a5,a5,28
> > >   add a0,a5,a1
> > >   ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xul.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of trailing zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >  f:
> > >   slli    a0,a0,33
> > >   srli    a0,a0,28
> > >   add a0,a0,a1
> > >   ret
> > >
> > > Unfortunately, all of this causes some fallout (especially in how it
> > > interacts with Zb* extensions and zero_extract expressions formed
> > > during combine): this is addressed through additional instruction
> > > splitting and handling of zero_extract.
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> > >  as an and:DI.
> > >   (*andi_add.uw): New pattern.
> > >   (*slli_slli_uw): New pattern.
> > >   (*shift_then_shNadd.uw): New pattern.
> > >   (*slliuw): Rename to riscv_slli_uw.
> > >   (riscv_slli_uw): Renamed from *slliuw.
> > >   (*zeroextract2_highbits): New pattern.
> > >   (*zero_extract): New pattern, which will be split to
> > >   shift-left + shift-right.
> > >   * config/riscv/predicates.md (dimode_shift_operand):
> > >   * config/riscv/riscv.md (*zero_extract_lowbits):
> > >   (zero_extendsidi2_shifted): Rename.
> > >   (*zero_extendsidi2_shifted): Generalize.
> > >   (*shift_truthvalue): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.target/riscv/shift-shift-6.c: New test.
> > >   * gcc.target/riscv/shift-shift-7.c: New test.
> > >   * gcc.target/riscv/shift-shift-8.c: New test.
> > >   * gcc.target/riscv/shift-shift-9.c: New test.
> > >   * gcc.target/riscv/snez.c: New test.
> > >
> > > Commit notes:
> > > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> > >a bit-range or a shifted immediate".  Depending on the order of
> > >applying these, I'll take care to pull that part out of the other
> > >patch if needed.
> > >
> > > Version-changes: 2
> > > - refactor
> > > - optimise for additional corner cases and deal with fallout
> > >
> > > Version-changes: 3
> > > - removed the [WIP] from the commit message (no other changes)
> > >
> > > Signed-off-by: Philipp Tomsich 
> &

Re: [PATCH] aarch64: Check the ldp/stp policy model correctly when mem ops are reversed.

2024-01-29 Thread Philipp Tomsich
Applied to master, thanks!
--Philipp.

On Wed, 24 Jan 2024 at 12:43, Richard Sandiford 
wrote:

> Manos Anagnostakis  writes:
> > The current ldp/stp policy framework implementation was missing cases,
> where
> > the memory operands were reversed. Therefore the call to the framework
> function
> > is moved after the lower mem check with the suitable parameters. Also
> removes
> > the mode of aarch64_operands_ok_for_ldpstp, which becomes unused and
> triggers
> > a warning on bootstrap.
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-ldpstp.md: Remove unused mode.
> > * config/aarch64/aarch64-protos.h
> (aarch64_operands_ok_for_ldpstp):
> >   Likewise.
> > * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> >   Call on framework moved later.
>
> OK, thanks.  The policy infrastructure is new to GCC 14 and so I think
> the change qualifies for stage 4.
>
> Richard
>
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > ---
> >  gcc/config/aarch64/aarch64-ldpstp.md | 22 +++---
> >  gcc/config/aarch64/aarch64-protos.h  |  2 +-
> >  gcc/config/aarch64/aarch64.cc| 18 +-
> >  3 files changed, 21 insertions(+), 21 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-ldpstp.md
> b/gcc/config/aarch64/aarch64-ldpstp.md
> > index b668fa8e2a6..b7c0bf05cd1 100644
> > --- a/gcc/config/aarch64/aarch64-ldpstp.md
> > +++ b/gcc/config/aarch64/aarch64-ldpstp.md
> > @@ -23,7 +23,7 @@
> >   (match_operand:GPI 1 "memory_operand" ""))
> > (set (match_operand:GPI 2 "register_operand" "")
> >   (match_operand:GPI 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -35,7 +35,7 @@
> >   (match_operand:GPI 1 "aarch64_reg_or_zero" ""))
> > (set (match_operand:GPI 2 "memory_operand" "")
> >   (match_operand:GPI 3 "aarch64_reg_or_zero" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -47,7 +47,7 @@
> >   (match_operand:GPF 1 "memory_operand" ""))
> > (set (match_operand:GPF 2 "register_operand" "")
> >   (match_operand:GPF 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -59,7 +59,7 @@
> >   (match_operand:GPF 1 "aarch64_reg_or_fp_zero" ""))
> > (set (match_operand:GPF 2 "memory_operand" "")
> >   (match_operand:GPF 3 "aarch64_reg_or_fp_zero" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -71,7 +71,7 @@
> >   (match_operand:DREG 1 "memory_operand" ""))
> > (set (match_operand:DREG2 2 "register_operand" "")
> >   (match_operand:DREG2 3 "memory_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, true)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, true);
> > @@ -83,7 +83,7 @@
> >   (match_operand:DREG 1 "register_operand" ""))
> > (set (match_operand:DREG2 2 "memory_operand" "")
> >   (match_operand:DREG2 3 "register_operand" ""))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, false, mode)"
> > +  "aarch64_operands_ok_for_ldpstp (operands, false)"
> >[(const_int 0)]
> >  {
> >aarch64_finish_ldpstp_peephole (operands, false);
> > @@ -96,7 +96,7 @@
> > (set (match_operand:VQ2 2 "register_operand" "")
> >   (match_operand:VQ2 3 "memory_operand" ""))]
> >"TARGET_FLOAT
> > -   && aarch64_operands_ok_for_ldpstp (operands, true, mode)
> > +   && aarch64_operands_ok_for_ldpstp (operands, true)
> > && (aarch64_tune_params.extra_tuning_flags
> >   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
> >[(const_int 0)]
> > @@ -111,7 +111,7 @@
> > (set (match_operand:VQ2 2 "memory_operand" "")
> >   (match_operand:VQ2 3 "register_operand" ""))]
> >"TARGET_FLOAT
> > -   && aarch64_operands_ok_for_ldpstp (operands, false, mode)
> > +   && aarch64_operands_ok_for_ldpstp (operands, false)
> > && (aarch64_tune_params.extra_tuning_flags
> >   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS) == 0"
> >[(const_int 0)]
> > @@ -128,7 +128,7 @@
> >   (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
> > (set (match_operand:DI 2 "register_operand" "")
> >   (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
> > -  "aarch64_operands_ok_for_ldpstp (operands, true, SImode)"
> > +  

Re: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.

2023-12-06 Thread Philipp Tomsich
On Wed, 6 Dec 2023 at 23:32, Richard Biener  wrote:
>
> On Wed, Dec 6, 2023 at 2:48 PM Manos Anagnostakis
>  wrote:
> >
> > This is an RTL pass that detects store forwarding from stores to larger 
> > loads (load pairs).
> >
> > This optimization is SPEC2017-driven and was found to be beneficial for 
> > some benchmarks,
> > through testing on ampere1/ampere1a machines.
> >
> > For example, it can transform cases like
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldp  d31, d17, [sp, #312] # Large load from small store
> >
> > to
> >
> > str  d5, [sp, #320]
> > fmul d5, d31, d29
> > ldr  d31, [sp, #312]
> > ldr  d17, [sp, #320]
> >
> > Currently, the pass is disabled by default on all architectures and enabled 
> > by a target-specific option.
> >
> > If deemed beneficial enough for a default, it will be enabled on 
> > ampere1/ampere1a,
> > or other architectures as well, without needing to be turned on by this 
> > option.
>
> What is aarch64-specific about the pass?
>
> I see an increasingly large number of target specific passes pop up (probably
> for the excuse we can generalize them if necessary).  But GCC isn't LLVM
> and this feels like getting out of hand?

We had an OK from Richard Sandiford on the earlier (v5) version with
v6 just fixing an obvious bug... so I was about to merge this earlier
just when you commented.

Given that this had months of test exposure on our end, I would prefer
to move this forward for GCC14 in its current form.
The project of replacing architecture-specific store-forwarding passes
with a generalized infrastructure could then be addressed in the GCC15
timeframe (or beyond)?

--Philipp.

>
> The x86 backend also has its store-forwarding "pass" as part of mdreorg
> in ix86_split_stlf_stall_load.
>
> Richard.
>
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >
> > * config.gcc: Add aarch64-store-forwarding.o to extra_objs.
> > * config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
> > * config/aarch64/aarch64-protos.h 
> > (make_pass_avoid_store_forwarding): Declare.
> > * config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
> > (aarch64-store-forwarding-threshold): New param.
> > * config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
> > * doc/invoke.texi: Document new option and new param.
> > * config/aarch64/aarch64-store-forwarding.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
> > * gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
> > * gcc.target/aarch64/ldp_ssll_overlap.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > Co-Authored-By: Manolis Tsamis 
> > Co-Authored-By: Philipp Tomsich 
> > ---
> > Changes in v6:
> > - An obvious change. insn_cnt was incremented only on
> >   stores and not for every insn in the bb. Now restored.
> >
> >  gcc/config.gcc|   1 +
> >  gcc/config/aarch64/aarch64-passes.def |   1 +
> >  gcc/config/aarch64/aarch64-protos.h   |   1 +
> >  .../aarch64/aarch64-store-forwarding.cc   | 318 ++
> >  gcc/config/aarch64/aarch64.opt|   9 +
> >  gcc/config/aarch64/t-aarch64  |  10 +
> >  gcc/doc/invoke.texi   |  11 +-
> >  .../aarch64/ldp_ssll_no_overlap_address.c |  33 ++
> >  .../aarch64/ldp_ssll_no_overlap_offset.c  |  33 ++
> >  .../gcc.target/aarch64/ldp_ssll_overlap.c |  33 ++
> >  10 files changed, 449 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index 6450448f2f0..7c48429eb82 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -350,6 +350,7 @@ aarch64*-*-*)
> > cxx_target_objs="aarch64-c.o"
> > d_target_objs="aarch64-d.o"
> > extra_objs="aarch64-builtins.o aarch-common.o 
> > aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o 
> > aarch64-sve-builtins-base.o aarch64-sve-builti

Re: [PATCH v2] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-29 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Tue, 28 Nov 2023 at 12:57, Richard Sandiford
 wrote:
>
> Philipp Tomsich  writes:
> > On Tue, 28 Nov 2023 at 12:21, Richard Sandiford
> >  wrote:
> >>
> >> Philipp Tomsich  writes:
> >> > This patch adds initial support for Ampere-1B core.
> >> >
> >> > The Ampere-1B core implements ARMv8.7 with the following (compiler
> >> > visible) extensions:
> >> >  - CSSC (Common Short Sequence Compression instructions),
> >> >  - MTE (Memory Tagging Extension)
> >> >  - SM3/SM4
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
> >> >   * config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
> >> >   * config/aarch64/aarch64-tune.md: Regenerate
> >> >   * config/aarch64/aarch64.cc: Include ampere1b tuning model
> >> >   * doc/invoke.texi: Document -mcpu=ampere1b
> >> >   * config/aarch64/tuning_models/ampere1b.h: New file.
> >>
> >> OK, thanks, but:
> >>
> >> >
> >> > Signed-off-by: Philipp Tomsich 
> >> > ---
> >> >
> >> > Changes in v2:
> >> > - moved ampere1b model to a separated file
> >> > - regenerated aarch64-tune.md after rebase
> >> >
> >> >  gcc/config/aarch64/aarch64-cores.def|   1 +
> >> >  gcc/config/aarch64/aarch64-cost-tables.h| 107 ++
> >> >  gcc/config/aarch64/aarch64-tune.md  |   2 +-
> >> >  gcc/config/aarch64/aarch64.cc   |   1 +
> >> >  gcc/config/aarch64/tuning_models/ampere1b.h | 114 
> >> >  gcc/doc/invoke.texi |   2 +-
> >> >  6 files changed, 225 insertions(+), 2 deletions(-)
> >> >  create mode 100644 gcc/config/aarch64/tuning_models/ampere1b.h
> >> >
> >> > diff --git a/gcc/config/aarch64/aarch64-cores.def 
> >> > b/gcc/config/aarch64/aarch64-cores.def
> >> > index 16752b77f4b..ad896a80f1f 100644
> >> > --- a/gcc/config/aarch64/aarch64-cores.def
> >> > +++ b/gcc/config/aarch64/aarch64-cores.def
> >> > @@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx, 
> >> >  V8A,  (CRC, CRYPTO), thu
> >> >  /* Ampere Computing ('\xC0') cores. */
> >> >  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, 
> >> > SHA3), ampere1, 0xC0, 0xac3, -1)
> >> >  AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, 
> >> > SHA3, SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
> >> > +AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG, AES, 
> >> > SHA3, SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
> >> >  /* Do not swap around "emag" and "xgene1",
> >> > this order is required to handle variant correctly. */
> >> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), 
> >> > emag, 0x50, 0x000, 3)
> >> > diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> >> > b/gcc/config/aarch64/aarch64-cost-tables.h
> >> > index 0cb638f3a13..4c8da7f119b 100644
> >> > --- a/gcc/config/aarch64/aarch64-cost-tables.h
> >> > +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> >> > @@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs =
> >> >}
> >> >  };
> >> >
> >> > +const struct cpu_cost_table ampere1b_extra_costs =
> >> > +{
> >> > +  /* ALU */
> >> > +  {
> >> > +0, /* arith.  */
> >> > +0, /* logical.  */
> >> > +0, /* shift.  */
> >> > +COSTS_N_INSNS (1), /* shift_reg.  */
> >> > +0, /* arith_shift.  */
> >> > +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> >> > +0, /* log_shift.  */
> >> > +COSTS_N_INSNS (1), /* log_shift_reg.  */
> >> > +0, /* extend.  */
> >> > +COSTS_N_INSNS (1), /* extend_arith.  */
> >> > +0, /* bfi.  */
> >> > +0, /* bfx.  */
> >> > +0, /* clz.  */
> >> > +0, /* rev.  */
> >> > +

Re: [RFC PATCH] RISC-V: Remove f{r,s}flags builtins

2023-11-29 Thread Philipp Tomsich
These built-ins are used internally for the
TARGET_ATOMIC_ASSIGN_EXPAND_FENV expansion (and therefore cannot be
removed):

/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV.  */

void
riscv_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
{
  if (!(TARGET_HARD_FLOAT || TARGET_ZFINX))
return;

  tree frflags = GET_BUILTIN_DECL (CODE_FOR_riscv_frflags);
  tree fsflags = GET_BUILTIN_DECL (CODE_FOR_riscv_fsflags);
  tree old_flags = create_tmp_var_raw (RISCV_ATYPE_USI);

  *hold = build4 (TARGET_EXPR, RISCV_ATYPE_USI, old_flags,
  build_call_expr (frflags, 0), NULL_TREE, NULL_TREE);
  *clear = build_call_expr (fsflags, 1, old_flags);
  *update = NULL_TREE;
}


On Wed, 29 Nov 2023 at 20:58, Christoph Müllner
 wrote:
>
> On Wed, Nov 29, 2023 at 8:24 PM Patrick O'Neill  wrote:
> >
> > Hi Christoph,
> >
> > The precommit-ci is seeing a large number of ICE segmentation faults as a 
> > result of this patch:
> > https://github.com/ewlu/gcc-precommit-ci/issues/796#issuecomment-1831853523
> >
> > The failures aren't in riscv.exp testsuite files so that's likely why you 
> > didn't run into them in your testing.
>
> Oh, I see.
> Then keeping things like they are is probably the best idea.
> Sorry for the noise!
>
> BR
> Christoph
>
> >
> > Debug log:
> >
> > /home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.dg/c11-atomic-2.c:110:3:
> >  internal compiler error: Segmentation fault
> > 0x133afb3 crash_signal
> > ../../../gcc/gcc/toplev.cc:316
> > 0x1678d1f contains_struct_check(tree_node*, tree_node_structure_enum, char 
> > const*, int, char const*)
> > ../../../gcc/gcc/tree.h:3747
> > 0x1678d1f build_call_expr_loc_array(unsigned int, tree_node*, int, 
> > tree_node**)
> > ../../../gcc/gcc/tree.cc:10815
> > 0x1679043 build_call_expr(tree_node*, int, ...)
> > ../../../gcc/gcc/tree.cc:10865
> > 0x17f816e riscv_atomic_assign_expand_fenv(tree_node**, tree_node**, 
> > tree_node**)
> > ../../../gcc/gcc/config/riscv/riscv-builtins.cc:420
> > 0xc5209b build_atomic_assign
> > ../../../gcc/gcc/c/c-typeck.cc:4289
> > 0xc60a47 build_modify_expr(unsigned int, tree_node*, tree_node*, tree_code, 
> > unsigned int, tree_node*, tree_node*)
> > ../../../gcc/gcc/c/c-typeck.cc:6406
> > 0xc85a61 c_parser_expr_no_commas
> > ../../../gcc/gcc/c/c-parser.cc:9112
> > 0xc85db1 c_parser_expression
> > ../../../gcc/gcc/c/c-parser.cc:12725
> > 0xc862bb c_parser_expression_conv
> > ../../../gcc/gcc/c/c-parser.cc:12765
> > 0xca3607 c_parser_statement_after_labels
> > ../../../gcc/gcc/c/c-parser.cc:7755
> > 0xc9f27e c_parser_compound_statement_nostart
> > ../../../gcc/gcc/c/c-parser.cc:7242
> > 0xc9f804 c_parser_compound_statement
> > ../../../gcc/gcc/c/c-parser.cc:6527
> > 0xca359c c_parser_statement_after_labels
> > ../../../gcc/gcc/c/c-parser.cc:7590
> > 0xca5713 c_parser_statement
> > ../../../gcc/gcc/c/c-parser.cc:7561
> > 0xca5713 c_parser_c99_block_statement
> > ../../../gcc/gcc/c/c-parser.cc:7820
> > 0xca6a2c c_parser_do_statement
> > ../../../gcc/gcc/c/c-parser.cc:8194
> > 0xca3d51 c_parser_statement_after_labels
> > ../../../gcc/gcc/c/c-parser.cc:7605
> > 0xc9f27e c_parser_compound_statement_nostart
> > ../../../gcc/gcc/c/c-parser.cc:7242
> > 0xc9f804 c_parser_compound_statement
> > ../../../gcc/gcc/c/c-parser.cc:6527
> > Please submit a full bug report, with preprocessed source (by using 
> > -freport-bug).
> > Please include the complete backtrace with any bug report.
> > See  for instructions.
> > compiler exited with status 1
> > FAIL: gcc.dg/c11-atomic-2.c (internal compiler error: Segmentation fault)
> >
> > Let me know if you need any additional info/investigation from me.
> >
> > Thanks,
> > Patrick
> >
> > On 11/29/23 03:49, Christoph Muellner wrote:
> >
> > From: Christoph Müllner 
> >
> > We have two builtins which are undocumented and have no known users.
> > Further, they don't exist in LLVM (so they are not portable).
> > This means they are in an unclear state of being supported or not.
> > Let's remove them to get them out of this undecided state.
> >
> > A discussion about making these builtins available in all
> > compilers was held many years ago with the decision to
> > not document them in the RISC-V C API documentation:
> >   https://github.com/riscv-non-isa/riscv-c-api-doc/pull/3
> >
> > This is an RFC patch as this breaks existing code that uses
> > these builtins, even if we don't know if such code exists.
> >
> > An alternative to this patch would be to document them
> > in gcc/doc/extend.texi (like has been done with __builtin_riscv_pause)
> > and put them into a supported state.
> >
> > This patch removes two tests for these builtins.
> > A test of this patch did not trigger any regressions in riscv.exp.
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-builtins.cc: Remove the builtins
> > __builtin_riscv_frflags and __builtin_riscv_fsflags.
> >
> > gcc/testsuite/ChangeLog:
> 

Re: T-Head Vector for GCC-14? (was Re: RISC-V: Support XTheadVector extensions)

2023-11-28 Thread Philipp Tomsich
On Tue, 28 Nov 2023 at 20:31, Palmer Dabbelt  wrote:
>
> On Wed, 22 Nov 2023 14:27:50 PST (-0800), jeffreya...@gmail.com wrote:
> > ...
>
> [Trimming everything else, as this is a big change.  I'm also making it
> a new subject/thread, so folks can see.]
>
> > More generally, I think I need to soften my prior statement about
> > deferring this to gcc-15.  This code was submitted in time for the
> > gcc-14 deadline, so it should be evaluated just like we do anything else
> > that makes the deadline.  There are various criteria we use to evaluate
> > if something should get integrated and we should just work through this
> > series like we always do and not treat it specially in any way.
>
> We talked about this some in the pachwork meeting today.  There's a lot
> of moving parts here, so here's my best bet at summarizing
>
> It seems like folks broadly agree: I think the only reason everyone was
so quick to defer to 15 was because we thought the Vrull guys even wanted
to, but it sounds like there's some interest in getting this into 14.

Thank you for the follow-up on this, as I had the original
conversation with Jeff in passing.
We (and the Alibaba folks and the BeagleV-AHEAD community) would
prefer to get this into 14.

> That's obviously a risky thing to do given it was sent right at the end
> of the window, but it meets the rules.
>
> Folks in the call seemed generally amenable to at least trying for 14,
> so unless anyone's opposed on the lists it seems like the way to go.
> IIRC we ended up with the following TODO list:
>
> * Make sure this doesn't regress on the targets we already support.
>   From the sounds of things there's been test suite runs that look fine,
>   so hopefully that's all manageable.  Christoph said he'd send
>   something out, we've had a bunch of test skew so there might be a bit
>   lurking but it should be generally manageable.
> * We agree on some sort of support lifecycle.  There seemed to be
>   basically two proposals: merge for 14 with the aim of quickly
>   deperecating it (maybe even for 15), or merge for 14 with the aim of
>   keeping it until it ends up un-tested (ie, requiring test results are
>   published for every release).

We expect real-world users, including the BeagleV-AHEAD community, to
need support for the foreseeable future.
Keeping it until it ends up untested (and test cases are reasonably
clean) sounds like a good threshold to ensure the integrity of the
codebase while giving this a clear path to stay in for its useful
life.

Philipp.

> * We actually find some time to sit down and do the code review.
>   That'll be a chunk of work and time is tight since most of us are
>   focusing on V-1.0, but hopefully we've got time to fit things in.
> * There's some options for testing without hardware: QEMU dropped
>   support for V-0.7.1 a while ago, but there's a patch set that's not
>   yet on the lists to bring that back.
>
> So I think unless anyone's opposed, we can at least start looking into
> getting this into GCC-14 -- there's obviously still a ton of review work
> to do and we might find something problematic, but we won't know until
> we actually sit down and do the reviews.
>
> ---
>
> Then for my opinions:
>
> The only policy worry I have is the support lifecycle: IMO merging
> something we're going to quickly deprecate is going to lead to headaches
> for users, so we should only merge this if we're going to plan on
> supporting it for the life of the hardware.  That's always hard to
> define, but we talked through the option of pushing this onto the users:
> we'd require test results published for every GCC release, and if no
> reasonably cleas test results are published then we'll assume the HW is
> defunct and support for it can be deprecated.  That's sort of patterned
> on how glibc documents deprecating ports.
>
> IIRC we didn't really end up with any deprecation policy when merging
> the other vendor support, so I'd argue we should just make that the
> general plan for supporting vendor extensions.  It pushes a little more
> work to the vendors/users than we have before, but I think it's a good
> balance.  It's also a pretty easy policy for vendors to understand: if
> they want their custom stuff supported, they need to demonstrate it
> works.


Re: [PATCH v2] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-28 Thread Philipp Tomsich
On Tue, 28 Nov 2023 at 12:21, Richard Sandiford
 wrote:
>
> Philipp Tomsich  writes:
> > This patch adds initial support for Ampere-1B core.
> >
> > The Ampere-1B core implements ARMv8.7 with the following (compiler
> > visible) extensions:
> >  - CSSC (Common Short Sequence Compression instructions),
> >  - MTE (Memory Tagging Extension)
> >  - SM3/SM4
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
> >   * config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
> >   * config/aarch64/aarch64-tune.md: Regenerate
> >   * config/aarch64/aarch64.cc: Include ampere1b tuning model
> >   * doc/invoke.texi: Document -mcpu=ampere1b
> >   * config/aarch64/tuning_models/ampere1b.h: New file.
>
> OK, thanks, but:
>
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> > Changes in v2:
> > - moved ampere1b model to a separate file
> > - regenerated aarch64-tune.md after rebase
> >
> >  gcc/config/aarch64/aarch64-cores.def|   1 +
> >  gcc/config/aarch64/aarch64-cost-tables.h| 107 ++
> >  gcc/config/aarch64/aarch64-tune.md  |   2 +-
> >  gcc/config/aarch64/aarch64.cc   |   1 +
> >  gcc/config/aarch64/tuning_models/ampere1b.h | 114 
> >  gcc/doc/invoke.texi |   2 +-
> >  6 files changed, 225 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/config/aarch64/tuning_models/ampere1b.h
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def 
> > b/gcc/config/aarch64/aarch64-cores.def
> > index 16752b77f4b..ad896a80f1f 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  
> > V8A,  (CRC, CRYPTO), thu
> >  /* Ampere Computing ('\xC0') cores. */
> >  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
> > ampere1, 0xC0, 0xac3, -1)
> >  AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
> > SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
> > +AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG, AES, SHA3, 
> > SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
> >  /* Do not swap around "emag" and "xgene1",
> > this order is required to handle variant correctly. */
> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), 
> > emag, 0x50, 0x000, 3)
> > diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
> > b/gcc/config/aarch64/aarch64-cost-tables.h
> > index 0cb638f3a13..4c8da7f119b 100644
> > --- a/gcc/config/aarch64/aarch64-cost-tables.h
> > +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> > @@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs =
> >}
> >  };
> >
> > +const struct cpu_cost_table ampere1b_extra_costs =
> > +{
> > +  /* ALU */
> > +  {
> > +0, /* arith.  */
> > +0, /* logical.  */
> > +0, /* shift.  */
> > +COSTS_N_INSNS (1), /* shift_reg.  */
> > +0, /* arith_shift.  */
> > +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> > +0, /* log_shift.  */
> > +COSTS_N_INSNS (1), /* log_shift_reg.  */
> > +0, /* extend.  */
> > +COSTS_N_INSNS (1), /* extend_arith.  */
> > +0, /* bfi.  */
> > +0, /* bfx.  */
> > +0, /* clz.  */
> > +0, /* rev.  */
> > +0, /* non_exec.  */
> > +true   /* non_exec_costs_exec.  */
> > +  },
> > +  {
> > +/* MULT SImode */
> > +{
> > +  COSTS_N_INSNS (2),   /* simple.  */
> > +  COSTS_N_INSNS (2),   /* flag_setting.  */
> > +  COSTS_N_INSNS (2),   /* extend.  */
> > +  COSTS_N_INSNS (3),   /* add.  */
> > +  COSTS_N_INSNS (3),   /* extend_add.  */
> > +  COSTS_N_INSNS (12)   /* idiv.  */
> > +},
> > +/* MULT DImode */
> > +{
> > +  COSTS_N_INSNS (2),   /* simple.  */
> > +  0,   /* flag_setting (N/A).  */
> > +  COSTS_N_INSNS (2),   /* extend.  */
> > +  COSTS_N_INSNS (3),   /* add.  */
> > +  COSTS_N_INSNS (3),   /* extend_add.  */
> > +  COSTS_N_INSNS (1

Re: [PATCH v2] ifcvt: Remove obsolete code for subreg handling in noce_convert_multiple_sets

2023-11-22 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.


On Thu, 23 Nov 2023 at 04:48, Jeff Law  wrote:
>
>
>
> On 11/21/23 11:04, Manolis Tsamis wrote:
> > This code used to handle SUBREG for register replacement when ifcvt was 
> > doing
> > the replacements manually. This special handling is not needed anymore
> > because simplify_replace_rtx is used for the replacements and it properly
> > handles these cases.
> >
> > gcc/ChangeLog:
> >
> >   * ifcvt.cc (noce_convert_multiple_sets_1): Remove old code.
> OK.
> jeff


[PATCH v2] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-22 Thread Philipp Tomsich
This patch adds initial support for the Ampere-1B core.

The Ampere-1B core implements ARMv8.7 with the following (compiler
visible) extensions:
 - CSSC (Common Short Sequence Compression instructions),
 - MTE (Memory Tagging Extension)
 - SM3/SM4

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
* config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
* config/aarch64/aarch64-tune.md: Regenerate
* config/aarch64/aarch64.cc: Include ampere1b tuning model
* doc/invoke.texi: Document -mcpu=ampere1b
* config/aarch64/tuning_models/ampere1b.h: New file.

Signed-off-by: Philipp Tomsich 
---

Changes in v2:
- moved ampere1b model to a separate file
- regenerated aarch64-tune.md after rebase

 gcc/config/aarch64/aarch64-cores.def|   1 +
 gcc/config/aarch64/aarch64-cost-tables.h| 107 ++
 gcc/config/aarch64/aarch64-tune.md  |   2 +-
 gcc/config/aarch64/aarch64.cc   |   1 +
 gcc/config/aarch64/tuning_models/ampere1b.h | 114 
 gcc/doc/invoke.texi |   2 +-
 6 files changed, 225 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/aarch64/tuning_models/ampere1b.h

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 16752b77f4b..ad896a80f1f 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
 AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
+AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 0cb638f3a13..4c8da7f119b 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1b_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  COSTS_N_INSNS (2),   /* flag_setting.  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (12)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (2), /* load.  */
+COSTS_N_INSNS (2), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (3), /* loadf.  */
+COSTS_N_INSNS (3), /* loadd.  */
+COSTS_N_INSNS (3), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (1), /* storef.  */
+COSTS_N_INSNS (1), /* stored.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+   

Re: RISC-V: Support XTheadVector extensions

2023-11-18 Thread Philipp Tomsich
On Fri, 17 Nov 2023 at 22:47, Jeff Law  wrote:
>
>
>
> On 11/17/23 04:39, juzhe.zh...@rivai.ai wrote:
> > 90% of the theadvector extension reuses current RVV 1.0 instruction patterns:
> > Just change ASM, For example:
> >
> > @@ -2923,7 +2923,7 @@ (define_insn "*pred_mulh_scalar"
> >(match_operand:VFULLI_D 3 "register_operand"  "vr,vr, vr, vr")] 
> > VMULH)
> > (match_operand:VFULLI_D 2 "vector_merge_operand" "vu, 0, vu,  0")))]
> > "TARGET_VECTOR"
> > -  "vmulh.vx\t%0,%3,%z4%p1"
> > +  "%^vmulh.vx\t%0,%3,%z4%p1"
> > [(set_attr "type" "vimul")
> >  (set_attr "mode" "")])
> >
> > +  if (letter == '^')
> > +{
> > +  if (TARGET_XTHEADVECTOR)
> > + fputs ("th.", file);
> > +  return;
> > +}
> I assume this hunk is meant for riscv_output_operand in riscv.cc.  We
> may also need to add '^' to the punct_valid_p hook.  But yes, this is
> the preferred way to go when all we need to do is prefix the instruction
> with "th.".
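
A minimal, self-contained sketch of what this mechanism amounts to
(illustrative only; `target_xtheadvector` and `output_mnemonic` are
hypothetical names, not GCC's code): '%^' in a pattern template expands
to the "th." prefix when the XTheadVector target is active and to
nothing otherwise, so a single template serves both encodings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative model of the '%^' operand modifier discussed above:
   when the XTheadVector target is active, a reused RVV 1.0 pattern is
   emitted with a "th." prefix, so a template like "%^vmulh.vx" prints
   as "th.vmulh.vx" or "vmulh.vx" depending on the target.  */
static int target_xtheadvector;

static void
output_mnemonic (char *buf, size_t len, const char *mnemonic)
{
  /* '%^' expands to "th." under XTheadVector, to nothing otherwise.  */
  snprintf (buf, len, "%s%s", target_xtheadvector ? "th." : "", mnemonic);
}
```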
>
>
> >
> > Btw, stage 1 will close soon.  So I will review this patch for GCC-15 as
> > long as all other RISC-V maintainers agree.
> I *think* it's a gcc-15 issue.  Philipp T. and I briefly spoke about
> this at the RVI summit a couple weeks back and he indicated the thead
> vector work was targeting gcc-15.

To restate the intent clearly:
- Getting this merged into GCC14 would be our most favored outcome, as
boards with XTheadV are quite common in the field: Allwinner D1,
BeagleBoard BeagleV-Ahead, Sophgo Milk-V;
- If that is not possible and we end up with an "ok for 15", we can
still resolve the downstream ecosystem issues (primarily felt by the
BeagleV-Ahead community) gracefully.
From our brief discussion, I understood you thought it more realistic
to land this early into GCC15.

If we end up targeting GCC15, I would still like to achieve an
agreement on design early.  This would allow our team to make any
needed changes and maintain them in a vendor-branch (on the GCC GIT
repository) until GCC15 opens up.

Philipp.


Re: [PATCH] aarch64: costs: update for TARGET_CSSC

2023-11-16 Thread Philipp Tomsich
Thanks for the quick turnaround on the review.
I'll send a v2 after the mcpu=ampere1b change has landed, as the
extra-costs change will have an interaction with that change (due to
the extra fields in the structure).

Philipp.


On Thu, 16 Nov 2023 at 15:12, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Richard Earnshaw 
> > Sent: Thursday, November 16, 2023 8:53 AM
> > To: Philipp Tomsich ; gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov 
> > Subject: Re: [PATCH] aarch64: costs: update for TARGET_CSSC
> >
> >
> >
> > On 16/11/2023 06:15, Philipp Tomsich wrote:
> > > With the addition of CSSC (Common Short Sequence Compression)
> > > instructions, a number of idioms match to single instructions (e.g.,
> > > abs) that previously expanded to multi-instruction sequences.
> > >
> > > This recognizes (some of) those idioms that are now misclassified and
> > > returns a cost of a single instruction.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/aarch64/aarch64.cc (aarch64_rtx_costs): Support
> > > idioms matching to CSSC instructions, if target CSSC is
> > > present
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > >   gcc/config/aarch64/aarch64.cc | 34 --
> > >   1 file changed, 24 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index 800a8b0e110..d89c94519e9 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -14431,10 +14431,17 @@ aarch64_rtx_costs (rtx x, machine_mode
> > mode, int outer ATTRIBUTE_UNUSED,
> > > return false;
> > >
> > >   case CTZ:
> > > -  *cost = COSTS_N_INSNS (2);
> > > +  if (!TARGET_CSSC)
> > > +   {
> > > + /* Will be split to a bit-reversal + clz */
> > > + *cost = COSTS_N_INSNS (2);
> > > +
> > > + if (speed)
> > > +   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> > > +   }
> > > +  else
> > > +   *cost = COSTS_N_INSNS (1);
> >
> > There should be some speed-related extra_cost to add here as well, so
> > that target-specific costing can be taken into account.
>
> And I'd rather have the conditions be not inverted i.e.
> If (TARGET_CSSC)
>  ...
> else
>  ...
>
> Thanks,
> Kyrill
> >
> > >
> > > -  if (speed)
> > > -   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> > > return false;
> > >
> > >   case COMPARE:
> > > @@ -15373,12 +15380,17 @@ cost_plus:
> > > }
> > > else
> > > {
> > > - /* Integer ABS will either be split to
> > > -two arithmetic instructions, or will be an ABS
> > > -(scalar), which we don't model.  */
> > > - *cost = COSTS_N_INSNS (2);
> > > - if (speed)
> > > -   *cost += 2 * extra_cost->alu.arith;
> > > + if (!TARGET_CSSC)
> > > +   {
> > > + /* Integer ABS will either be split to
> > > +two arithmetic instructions, or will be an ABS
> > > +(scalar), which we don't model.  */
> > > + *cost = COSTS_N_INSNS (2);
> > > + if (speed)
> > > +   *cost += 2 * extra_cost->alu.arith;
> > > +   }
> > > + else
> > > +   *cost = COSTS_N_INSNS (1);
> >
> > same here.
> >
> > > }
> > > return false;
> > >
> > > @@ -15388,13 +15400,15 @@ cost_plus:
> > > {
> > >   if (VECTOR_MODE_P (mode))
> > > *cost += extra_cost->vect.alu;
> > > - else
> > > + else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
> > > {
> > >   /* FMAXNM/FMINNM/FMAX/FMIN.
> > >  TODO: This may not be accurate for all implementations, but
> > >  we do not model this in the cost tables.  */
> > >   *cost += extra_cost->fp[mode == DFmode].addsub;
> > > }
> > > + else if (TARGET_CSSC)
> > > +   *cost = COSTS_N_INSNS (1);
> >
> > and here.
> >
> > > }
> > > return false;
> > >
> >
> > R.


[PATCH] aarch64: Add support for Ampere-1B (-mcpu=ampere1b) CPU

2023-11-15 Thread Philipp Tomsich
This patch adds initial support for the Ampere-1B core.

The Ampere-1B core implements ARMv8.7 with the following (compiler
visible) extensions:
 - CSSC (Common Short Sequence Compression instructions),
 - MTE (Memory Tagging Extension)
 - SM3/SM4

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere-1b
* config/aarch64/aarch64-cost-tables.h: Add ampere1b_extra_costs
* config/aarch64/aarch64.cc: Add ampere1b_prefetch_tune and
ampere1b_advsimd_vector_costs
* config/aarch64/aarch64-tune.md: Regenerate
* doc/invoke.texi: Document -mcpu=ampere1b

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64-cores.def |   1 +
 gcc/config/aarch64/aarch64-cost-tables.h | 107 +++
 gcc/config/aarch64/aarch64-tune.md   |   2 +-
 gcc/config/aarch64/aarch64.cc|  89 +++
 gcc/doc/invoke.texi  |   2 +-
 5 files changed, 199 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index eae40b29df6..19dfb133d29 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -74,6 +74,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  
(CRC, CRYPTO), thu
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), 
ampere1, 0xC0, 0xac3, -1)
 AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
+AARCH64_CORE("ampere1b", ampere1b, cortexa57, V8_7A, (F16, RNG, AES, SHA3, 
SM4, MEMTAG, CSSC), ampere1b, 0xC0, 0xac5, -1)
 /* Do not swap around "emag" and "xgene1",
this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 
0x50, 0x000, 3)
diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
b/gcc/config/aarch64/aarch64-cost-tables.h
index 0cb638f3a13..4c8da7f119b 100644
--- a/gcc/config/aarch64/aarch64-cost-tables.h
+++ b/gcc/config/aarch64/aarch64-cost-tables.h
@@ -882,4 +882,111 @@ const struct cpu_cost_table ampere1a_extra_costs =
   }
 };
 
+const struct cpu_cost_table ampere1b_extra_costs =
+{
+  /* ALU */
+  {
+0, /* arith.  */
+0, /* logical.  */
+0, /* shift.  */
+COSTS_N_INSNS (1), /* shift_reg.  */
+0, /* arith_shift.  */
+COSTS_N_INSNS (1), /* arith_shift_reg.  */
+0, /* log_shift.  */
+COSTS_N_INSNS (1), /* log_shift_reg.  */
+0, /* extend.  */
+COSTS_N_INSNS (1), /* extend_arith.  */
+0, /* bfi.  */
+0, /* bfx.  */
+0, /* clz.  */
+0, /* rev.  */
+0, /* non_exec.  */
+true   /* non_exec_costs_exec.  */
+  },
+  {
+/* MULT SImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  COSTS_N_INSNS (2),   /* flag_setting.  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (12)   /* idiv.  */
+},
+/* MULT DImode */
+{
+  COSTS_N_INSNS (2),   /* simple.  */
+  0,   /* flag_setting (N/A).  */
+  COSTS_N_INSNS (2),   /* extend.  */
+  COSTS_N_INSNS (3),   /* add.  */
+  COSTS_N_INSNS (3),   /* extend_add.  */
+  COSTS_N_INSNS (18)   /* idiv.  */
+}
+  },
+  /* LD/ST */
+  {
+COSTS_N_INSNS (2), /* load.  */
+COSTS_N_INSNS (2), /* load_sign_extend.  */
+0, /* ldrd (n/a).  */
+0, /* ldm_1st.  */
+0, /* ldm_regs_per_insn_1st.  */
+0, /* ldm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (3), /* loadf.  */
+COSTS_N_INSNS (3), /* loadd.  */
+COSTS_N_INSNS (3), /* load_unaligned.  */
+0, /* store.  */
+0, /* strd.  */
+0, /* stm_1st.  */
+0, /* stm_regs_per_insn_1st.  */
+0, /* stm_regs_per_insn_subsequent.  */
+COSTS_N_INSNS (1), /* storef.  */
+COSTS_N_INSNS (1), /* stored.  */
+COSTS_N_INSNS (1), /* store_unaligned.  */
+COSTS_N_INSNS (3), /* loadv.  */
+COSTS_N_INSNS (3)  /* storev.  */
+  },
+  {
+/* FP SFmode */
+{
+  COSTS_N_INSNS (18),  /* div.  */
+  COSTS_N_INSNS (3),   /* mult.  */
+  COSTS_N_INSNS (3),   /* mult_addsub.  */
+  COSTS_N_INSNS (3),   /* fma.  */
+  COSTS_N_INSNS (2),   /* addsub.  */
+  COSTS_N_I

[PATCH] aarch64: costs: update for TARGET_CSSC

2023-11-15 Thread Philipp Tomsich
With the addition of CSSC (Common Short Sequence Compression)
instructions, a number of idioms match to single instructions (e.g.,
abs) that previously expanded to multi-instruction sequences.

This recognizes (some of) those idioms that are now misclassified and
returns a cost of a single instruction.
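
For concreteness, these are the kinds of source-level idioms affected:
with CSSC each of the functions below compiles to a single instruction
(ABS, SMAX, SMIN, CTZ), while without CSSC the CTZ case, for example,
splits into a bit-reversal plus CLZ, matching the two-instruction cost
kept for !TARGET_CSSC.  The function names are ours, for illustration
only:

```c
#include <assert.h>

/* Idioms that aarch64_rtx_costs now costs as one instruction when
   CSSC is available (illustrative; names are ours).  */
int iabs (int x)        { return x < 0 ? -x : x; }    /* -> ABS  */
int smax (int a, int b) { return a > b ? a : b; }     /* -> SMAX */
int smin (int a, int b) { return a < b ? a : b; }     /* -> SMIN */
int ctz32 (unsigned x)  { return __builtin_ctz (x); } /* -> CTZ; without
                             CSSC this splits into RBIT + CLZ.  */
```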

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_rtx_costs): Support
idioms matching to CSSC instructions, if target CSSC is
present

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64.cc | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 800a8b0e110..d89c94519e9 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14431,10 +14431,17 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int 
outer ATTRIBUTE_UNUSED,
   return false;
 
 case CTZ:
-  *cost = COSTS_N_INSNS (2);
+  if (!TARGET_CSSC)
+   {
+ /* Will be split to a bit-reversal + clz */
+ *cost = COSTS_N_INSNS (2);
+
+ if (speed)
+   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
+   }
+  else
+   *cost = COSTS_N_INSNS (1);
 
-  if (speed)
-   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
   return false;
 
 case COMPARE:
@@ -15373,12 +15380,17 @@ cost_plus:
}
   else
{
- /* Integer ABS will either be split to
-two arithmetic instructions, or will be an ABS
-(scalar), which we don't model.  */
- *cost = COSTS_N_INSNS (2);
- if (speed)
-   *cost += 2 * extra_cost->alu.arith;
+ if (!TARGET_CSSC)
+   {
+ /* Integer ABS will either be split to
+two arithmetic instructions, or will be an ABS
+(scalar), which we don't model.  */
+ *cost = COSTS_N_INSNS (2);
+ if (speed)
+   *cost += 2 * extra_cost->alu.arith;
+   }
+ else
+   *cost = COSTS_N_INSNS (1);
}
   return false;
 
@@ -15388,13 +15400,15 @@ cost_plus:
{
  if (VECTOR_MODE_P (mode))
*cost += extra_cost->vect.alu;
- else
+ else if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
  /* FMAXNM/FMINNM/FMAX/FMIN.
 TODO: This may not be accurate for all implementations, but
 we do not model this in the cost tables.  */
  *cost += extra_cost->fp[mode == DFmode].addsub;
}
+ else if (TARGET_CSSC)
+   *cost = COSTS_N_INSNS (1);
}
   return false;
 
-- 
2.34.1



Re: [PATCH v2] aarch64: Improve on ldp-stp policies code structure.

2023-09-29 Thread Philipp Tomsich
Applied to master. Thanks!
--Philipp.

On Fri, 29 Sept 2023 at 12:34, Richard Sandiford
 wrote:
>
> Manos Anagnostakis  writes:
> > Improves on: 834fc2bf
> >
> > This improves the code structure of the ldp-stp policies
> > patch introduced in 834fc2bf
> >
> > Bootstrapped and regtested on aarch64-linux.
> >
> > gcc/ChangeLog:
> >   * config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): Removed.
> >   (enum aarch64_ldp_stp_policy): Merged enums aarch64_ldp_policy
> >   and aarch64_stp_policy to aarch64_ldp_stp_policy.
> >   (enum aarch64_stp_policy): Removed.
> >   * config/aarch64/aarch64-protos.h (struct tune_params): Removed
> >   aarch64_ldp_policy_model and aarch64_stp_policy_model enum types
> >   and left only the definitions to the aarch64-opts one.
> >   * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): Removed.
> >   (aarch64_parse_stp_policy): Removed.
> >   (aarch64_override_options_internal): Removed calls to parsing
> >   functions and added obvious direct assignments.
> >   (aarch64_mem_ok_with_ldpstp_policy_model): Improved
> >   code quality based on the new changes.
> >   * config/aarch64/aarch64.opt: Use single enum type
> >   aarch64_ldp_stp_policy for both ldp and stp options.
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/ldp_aligned.c: Splitted into this and
> >   ldp_unaligned.
> >   * gcc.target/aarch64/stp_aligned.c: Splitted into this and
> >   stp_unaligned.
> >   * gcc.target/aarch64/ldp_unaligned.c: New test.
> >   * gcc.target/aarch64/stp_unaligned.c: New test.
>
> Nice!  OK for trunk, thanks.
>
> Sorry again for my mix-up with the original review.
>
> Richard
>
> > Signed-off-by: Manos Anagnostakis 
> > ---
> >  gcc/config/aarch64/aarch64-opts.h |  26 ++-
> >  gcc/config/aarch64/aarch64-protos.h   |  25 +--
> >  gcc/config/aarch64/aarch64.cc | 160 +++---
> >  gcc/config/aarch64/aarch64.opt|  29 +---
> >  .../gcc.target/aarch64/ldp_aligned.c  |  28 ---
> >  .../gcc.target/aarch64/ldp_unaligned.c|  40 +
> >  .../gcc.target/aarch64/stp_aligned.c  |  25 ---
> >  .../gcc.target/aarch64/stp_unaligned.c|  37 
> >  8 files changed, 155 insertions(+), 215 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_unaligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_unaligned.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-opts.h 
> > b/gcc/config/aarch64/aarch64-opts.h
> > index db8348507a3..831e28ab52a 100644
> > --- a/gcc/config/aarch64/aarch64-opts.h
> > +++ b/gcc/config/aarch64/aarch64-opts.h
> > @@ -108,20 +108,18 @@ enum aarch64_key_type {
> >AARCH64_KEY_B
> >  };
> >
> > -/* Load pair policy type.  */
> > -enum aarch64_ldp_policy {
> > -  LDP_POLICY_DEFAULT,
> > -  LDP_POLICY_ALWAYS,
> > -  LDP_POLICY_NEVER,
> > -  LDP_POLICY_ALIGNED
> > -};
> > -
> > -/* Store pair policy type.  */
> > -enum aarch64_stp_policy {
> > -  STP_POLICY_DEFAULT,
> > -  STP_POLICY_ALWAYS,
> > -  STP_POLICY_NEVER,
> > -  STP_POLICY_ALIGNED
> > +/* An enum specifying how to handle load and store pairs using
> > +   a fine-grained policy:
> > +   - LDP_STP_POLICY_DEFAULT: Use the policy defined in the tuning 
> > structure.
> > +   - LDP_STP_POLICY_ALIGNED: Emit ldp/stp if the source pointer is aligned
> > +   to at least double the alignment of the type.
> > +   - LDP_STP_POLICY_ALWAYS: Emit ldp/stp regardless of alignment.
> > +   - LDP_STP_POLICY_NEVER: Do not emit ldp/stp.  */
> > +enum aarch64_ldp_stp_policy {
> > +  AARCH64_LDP_STP_POLICY_DEFAULT,
> > +  AARCH64_LDP_STP_POLICY_ALIGNED,
> > +  AARCH64_LDP_STP_POLICY_ALWAYS,
> > +  AARCH64_LDP_STP_POLICY_NEVER
> >  };
> >
> >  #endif
> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
> > b/gcc/config/aarch64/aarch64-protos.h
> > index 5c6802b4fe8..60a55f4bc19 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -568,30 +568,9 @@ struct tune_params
> >/* Place prefetch struct pointer at the end to enable type checking
> >   errors when tune_params misses elements (e.g., from erroneous 
> > merges).  */
> >const struct cpu_prefetch_tune *prefetch;
> > -/* An enum specifying how to handle load pairs using a fine-grained policy:
> > -   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> > -   to at least double the alignment of the type.
> > -   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> > -   - LDP_POLICY_NEVER: Do not emit ldp.  */
> >
> > -  enum aarch64_ldp_policy_model
> > -  {
> > -LDP_POLICY_ALIGNED,
> > -LDP_POLICY_ALWAYS,
> > -LDP_POLICY_NEVER
> > -  } ldp_policy_model;
> > -/* An enum specifying how to handle store pairs using a fine-grained 
> > policy:
> > -   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
> > -   to at least double the alignment of the type.
> > -   - 

Re: [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-28 Thread Philipp Tomsich
Manos,

Please submit a follow-on patch implementing the requested
improvements to the code structure (as this reduces the maintenance
burden).

Thanks,
Philipp.


On Thu, 28 Sept 2023 at 15:33, Manos Anagnostakis
 wrote:
>
> Hey Richard,
>
> Thanks for taking the time to review this, but it has been committed since
> yesterday after getting reviewed by Kyrill and Tamar.
>
> Discussions:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631285.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631300.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631389.html
>
> Committed version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631484.html
>
> Manos.
>
> On Thu, Sep 28, 2023 at 4:17 PM Richard Sandiford  
> wrote:
>>
>> Thanks for the patch and sorry for the slow review.
>>
>> Manos Anagnostakis  writes:
>> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
>> > to provide the requested behaviour for handling ldp and stp:
>> >
>> >   /* Allow the tuning structure to disable LDP instruction formation
>> >  from combining instructions (e.g., in peephole2).
>> >  TODO: Implement fine-grained tuning control for LDP and STP:
>> >1. control policies for load and store separately;
>> >2. support the following policies:
>> >   - default (use what is in the tuning structure)
>> >   - always
>> >   - never
>> >   - aligned (only if the compiler can prove that the
>> > load will be aligned to 2 * element_size)  */
>> >
>> > It provides two new and concrete command-line options -mldp-policy and 
>> > -mstp-policy
>> > to give the ability to control load and store policies separately as
>> > stated in part 1 of the TODO.
>> >
>> > The accepted values for both options are:
>> > - default: Use the ldp/stp policy defined in the corresponding tuning
>> >   structure.
>> > - always: Emit ldp/stp regardless of alignment.
>> > - never: Do not emit ldp/stp.
>> > - aligned: In order to emit ldp/stp, first check if the load/store will
>> >   be aligned to 2 * element_size.
>> >
>> > gcc/ChangeLog:
>> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> >   appropriate enums for the policies.
>> > * config/aarch64/aarch64-tuning-flags.def
>> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
>> >   options.
>> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
>> >   function to parse ldp-policy option.
>> > (aarch64_parse_stp_policy): New function to parse stp-policy 
>> > option.
>> > (aarch64_override_options_internal): Call parsing functions.
>> > (aarch64_operands_ok_for_ldpstp): Add option-value check and
>> >   alignment check and remove superseded ones
>> > (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
>> >   alignment check and remove superseded ones.
>> > * config/aarch64/aarch64.opt: Add options.
>> >
>> > gcc/testsuite/ChangeLog:
>> > * gcc.target/aarch64/ldp_aligned.c: New test.
>> > * gcc.target/aarch64/ldp_always.c: New test.
>> > * gcc.target/aarch64/ldp_never.c: New test.
>> > * gcc.target/aarch64/stp_aligned.c: New test.
>> > * gcc.target/aarch64/stp_always.c: New test.
>> > * gcc.target/aarch64/stp_never.c: New test.
>> >
>> > Signed-off-by: Manos Anagnostakis 
>> > ---
>> > Changes in v2:
>> > - Fixed committed ldp tests to correctly trigger
>> >   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
>> > - Added "-mcpu=generic" to committed tests to guarantee generic
>> > target code
>> >   generation and not cause the regressions of v1.
>> >
>> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
>> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>> >  gcc/config/aarch64/aarch64.cc | 229 ++
>> >  gcc/config/aarch64/aarch64.opt|   8 +
>> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
>> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
>> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
>> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
>> >  10 files changed, 586 insertions(+), 61 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
>> > 

Re: [PATCH v4] aarch64: Fine-grained policies to control ldp-stp formation.

2023-09-27 Thread Philipp Tomsich
Applied to master (with fixups). Thanks!
Philipp.

On Wed, 27 Sept 2023 at 10:40, Kyrylo Tkachov  wrote:
>
> Hi Manos,
>
> > -Original Message-
> > From: Manos Anagnostakis 
> > Sent: Tuesday, September 26, 2023 2:52 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Tamar Christina
> > ; Philipp Tomsich ;
> > Manos Anagnostakis 
> > Subject: [PATCH v4] aarch64: Fine-grained policies to control ldp-stp
> > formation.
> >
> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> > to provide the requested behaviour for handling ldp and stp:
> >
> >   /* Allow the tuning structure to disable LDP instruction formation
> >  from combining instructions (e.g., in peephole2).
> >  TODO: Implement fine-grained tuning control for LDP and STP:
> >1. control policies for load and store separately;
> >2. support the following policies:
> >   - default (use what is in the tuning structure)
> >   - always
> >   - never
> >   - aligned (only if the compiler can prove that the
> > load will be aligned to 2 * element_size)  */
> >
> > It provides two new and concrete target-specific command-line parameters
> > -param=aarch64-ldp-policy= and -param=aarch64-stp-policy=
> > to give the ability to control load and store policies separately as
> > stated in part 1 of the TODO.
> >
> > The accepted values for both parameters are:
> > - default: Use the policy of the tuning structure (default).
> > - always: Emit ldp/stp regardless of alignment.
> > - never: Do not emit ldp/stp.
> > - aligned: In order to emit ldp/stp, first check if the load/store will
> >   be aligned to 2 * element_size.
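
The semantics of those four values can be sketched as a small,
self-contained C model (illustrative only; `pair_ok` and its parameters
are hypothetical names, while GCC's actual check is the new
aarch64_mem_ok_with_ldpstp_policy_model):

```c
#include <assert.h>

/* Illustrative model of the fine-grained ldp/stp policy check.  */
enum ldp_stp_policy {
  POLICY_DEFAULT,   /* defer to the tuning structure */
  POLICY_ALWAYS,    /* emit ldp/stp regardless of alignment */
  POLICY_NEVER,     /* never emit ldp/stp */
  POLICY_ALIGNED    /* only if aligned to 2 * element_size */
};

static int
pair_ok (enum ldp_stp_policy policy, int tuning_default_ok,
         unsigned align_bytes, unsigned element_size)
{
  switch (policy)
    {
    case POLICY_ALWAYS:  return 1;
    case POLICY_NEVER:   return 0;
    case POLICY_ALIGNED: return align_bytes >= 2 * element_size;
    default:             return tuning_default_ok;
    }
}
```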
> >
> > Bootstrapped and regtested aarch64-linux.
> >
> > gcc/ChangeLog:
> > * config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): New
> >   enum type.
> > (enum aarch64_stp_policy): New enum type.
> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> >   appropriate enums for the policies.
> >   (aarch64_mem_ok_with_ldpstp_policy_model): New declaration.
> > * config/aarch64/aarch64-tuning-flags.def
> >   (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> >   options.
> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> >   function to parse ldp-policy parameter.
> > (aarch64_parse_stp_policy): New function to parse stp-policy 
> > parameter.
> > (aarch64_override_options_internal): Call parsing functions.
> >   (aarch64_mem_ok_with_ldpstp_policy_model): New function.
> > (aarch64_operands_ok_for_ldpstp): Add call to
> >   aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
> >   check and alignment check and remove superseded ones.
> > (aarch64_operands_adjust_ok_for_ldpstp): Add call to
> > aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
> >   check and alignment check and remove superseded ones.
> > * config/aarch64/aarch64.opt: Add parameters.
> >   * doc/invoke.texi: Document the parameters accordingly.
>
> The ChangeLog entry should name the new parameters. For example:
> * config/aarch64/aarch64.opt (aarch64-ldp-policy): New param.
>
> Ok with the fixed ChangeLog.
> Thank you for the work!
> Kyrill
>
> >
> > gcc/testsuite/ChangeLog:
> >   * gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
> > * gcc.target/aarch64/ldp_aligned.c: New test.
> > * gcc.target/aarch64/ldp_always.c: New test.
> > * gcc.target/aarch64/ldp_never.c: New test.
> > * gcc.target/aarch64/stp_aligned.c: New test.
> > * gcc.target/aarch64/stp_always.c: New test.
> > * gcc.target/aarch64/stp_never.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > ---
> > Changes in v4:
> > - Changed the parameters to accept enum instead of an
> >   integer and updated documentation in doc/invoke.texi.
> > - Packed all the new checks in aarch64_operands_ok_for_ldpstp/
> >   aarch64_operands_adjust_ok_for_ldpstp in a new function
> >   called aarch64_mem_ok_with_ldpstp_policy_model.
> >
> >  gcc/config/aarch64/aarch64-opts.h |  16 ++
> >  gcc/config/aarch64/aarch64-protos.h   |  25 +++
> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> >  gcc/config/aarch64/aarch64.cc | 212 

Re: [PATCH v3] aarch64: Fine-grained policies to control ldp-stp formation.

2023-09-25 Thread Philipp Tomsich
On Mon, 25 Sept 2023 at 21:54, Andrew Pinski  wrote:
>
> On Mon, Sep 25, 2023 at 12:50 PM Manos Anagnostakis
>  wrote:
> >
> > This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> > to provide the requested behaviour for handling ldp and stp:
> >
> >   /* Allow the tuning structure to disable LDP instruction formation
> >  from combining instructions (e.g., in peephole2).
> >  TODO: Implement fine-grained tuning control for LDP and STP:
> >1. control policies for load and store separately;
> >2. support the following policies:
> >   - default (use what is in the tuning structure)
> >   - always
> >   - never
> >   - aligned (only if the compiler can prove that the
> > load will be aligned to 2 * element_size)  */
> >
> > It provides two new and concrete target-specific command-line parameters
> > -param=aarch64-ldp-policy= and -param=aarch64-stp-policy=
> > to give the ability to control load and store policies separately as
> > stated in part 1 of the TODO.
> >
> > The accepted values for both parameters are:
> > - 0: Use the policy of the tuning structure (default).
> > - 1: Emit ldp/stp regardless of alignment.
> > - 2: Do not emit ldp/stp.
> > - 3: In order to emit ldp/stp, first check if the load/store will
> >   be aligned to 2 * element_size.
>
> Instead of a number, does it make sense to instead use a string
> (ENUM) for this param.
> Also I think using --param is a bad idea if it is going to be
> documented in the user manual.
> Maybe a -m option should be used instead.

See https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631283.html
for the discussion triggering the change from -m... to --param and the
change to using a number instead of a string.

Thanks,
Philipp.

>
> Thanks,
> Andrew
>
> >
> > gcc/ChangeLog:
> > * config/aarch64/aarch64-protos.h (struct tune_params): Add
> > appropriate enums for the policies.
> > * config/aarch64/aarch64-tuning-flags.def
> > (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> > options.
> > * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> > function to parse ldp-policy parameter.
> > (aarch64_parse_stp_policy): New function to parse stp-policy 
> > parameter.
> > (aarch64_override_options_internal): Call parsing functions.
> > (aarch64_operands_ok_for_ldpstp): Add parameter-value check and
> > alignment check and remove superseded ones.
> > (aarch64_operands_adjust_ok_for_ldpstp): Add parameter-value check 
> > and
> > alignment check and remove superseded ones.
> > * config/aarch64/aarch64.opt: Add options.
> > * doc/invoke.texi: Document the parameters accordingly.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
> > * gcc.target/aarch64/ldp_aligned.c: New test.
> > * gcc.target/aarch64/ldp_always.c: New test.
> > * gcc.target/aarch64/ldp_never.c: New test.
> > * gcc.target/aarch64/stp_aligned.c: New test.
> > * gcc.target/aarch64/stp_always.c: New test.
> > * gcc.target/aarch64/stp_never.c: New test.
> >
> > Signed-off-by: Manos Anagnostakis 
> > ---
> > Changes in v3:
> > - Changed command-line options to target-specific parameters
> >   and documented them accordingly in doc/invoke.texi.
> > - Removed ampere1-no_ldp_combine.c test as superseded.
> >
> >  gcc/config/aarch64/aarch64-protos.h   |  24 ++
> >  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
> >  gcc/config/aarch64/aarch64.cc | 215 +-
> >  gcc/config/aarch64/aarch64.opt|   8 +
> >  gcc/doc/invoke.texi   |  30 +++
> >  .../aarch64/ampere1-no_ldp_combine.c  |  11 -
> >  .../gcc.target/aarch64/ldp_aligned.c  |  66 ++
> >  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 ++
> >  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 ++
> >  .../gcc.target/aarch64/stp_aligned.c  |  60 +
> >  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
> >  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
> >  12 files changed, 600 insertions(+), 74 deletions(-)
> >  delete mode 100644 
> > gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-protos.h 
> > b/gcc/config/aarch64/aarch64-protos.h
> > index 

Re: [PATCH v2 1/2] riscv: Add support for strlen inline expansion

2023-09-12 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.


On Wed, 6 Sept 2023 at 18:07, Christoph Muellner
 wrote:
>
> From: Christoph Müllner 
>
> This patch implements the expansion of the strlen builtin for RV32/RV64
> for xlen-aligned strings if Zbb or XTheadBb instructions are 
> available.
> The inserted sequences are:
>
> rv32gc_zbb (RV64 is similar):
>   add a3,a0,4
>   li  a4,-1
> .L1:  lw  a5,0(a0)
>   add a0,a0,4
>   orc.b   a5,a5
>   beq a5,a4,.L1
>   not a5,a5
>   ctz a5,a5
>   srl a5,a5,0x3
>   add a0,a0,a5
>   sub a0,a0,a3
>
> rv64gc_xtheadbb (RV32 is similar):
>   add   a4,a0,8
> .L2:  ld  a5,0(a0)
>   add   a0,a0,8
>   th.tstnbz a5,a5
>   beqz  a5,.L2
>   th.reva5,a5
>   th.ff1a5,a5
>   srl   a5,a5,0x3
>   add   a0,a0,a5
>   sub   a0,a0,a4
>
> This allows inlining calls to strlen(), with optimized code for
> xlen-aligned strings, resulting in the following benefits over
> a call to libc:
> * no call/ret instructions
> * no stack frame allocation
> * no register saving/restoring
> * no alignment test
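The sequences above check one XLEN-sized word per iteration. A rough portable C sketch of the same idea — with orc.b replaced by the classic has-a-zero-byte bit trick and ctz by a byte scan, both substitutions made for illustration rather than matching the emitted code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time strlen sketch: scan 8 bytes per iteration, stop at
   the first word containing a NUL byte, then locate it with a byte
   loop.  Assumes the buffer is padded to a multiple of 8 bytes,
   mirroring the xlen-aligned precondition of the expansion.  */
static size_t
wide_strlen (const char *s)
{
  const uint64_t ones  = 0x0101010101010101ULL;
  const uint64_t highs = 0x8080808080808080ULL;
  const char *p = s;

  for (;;)
    {
      uint64_t w;
      memcpy (&w, p, 8);                /* one word load per step */
      if ((w - ones) & ~w & highs)      /* word contains a NUL byte? */
        {
          size_t i = 0;
          while (p[i] != '\0')          /* byte scan, stand-in for ctz/8 */
            i++;
          return (size_t) (p - s) + i;
        }
      p += 8;
    }
}
```

The `(w - ones) & ~w & highs` expression is non-zero exactly when `w` contains a zero byte, playing the role that `orc.b`/`th.tstnbz` plus the branch play in the sequences above.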
>
> The inlining mechanism is gated by a new switch ('-minline-strlen')
> and by the variable 'optimize_size'.
>
> Tested using the glibc string tests.
>
> Signed-off-by: Christoph Müllner 
>
> gcc/ChangeLog:
>
> * config.gcc: Add new object riscv-string.o.
> riscv-string.cc.
> * config/riscv/riscv-protos.h (riscv_expand_strlen):
> New function.
> * config/riscv/riscv.md (strlen): New expand INSN.
> * config/riscv/riscv.opt: New flag 'minline-strlen'.
> * config/riscv/t-riscv: Add new object riscv-string.o.
> * config/riscv/thead.md (th_rev2): Export INSN name.
> (th_rev2): Likewise.
> (th_tstnbz2): New INSN.
> * doc/invoke.texi: Document '-minline-strlen'.
> * emit-rtl.cc (emit_likely_jump_insn): New helper function.
> (emit_unlikely_jump_insn): Likewise.
> * rtl.h (emit_likely_jump_insn): New prototype.
> (emit_unlikely_jump_insn): Likewise.
> * config/riscv/riscv-string.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
> * gcc.target/riscv/xtheadbb-strlen.c: New test.
> * gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
> * gcc.target/riscv/zbb-strlen-disabled.c: New test.
> * gcc.target/riscv/zbb-strlen-unaligned.c: New test.
> * gcc.target/riscv/zbb-strlen.c: New test.
> ---
>  gcc/config.gcc|   3 +-
>  gcc/config/riscv/riscv-protos.h   |   3 +
>  gcc/config/riscv/riscv-string.cc  | 183 ++
>  gcc/config/riscv/riscv.md |  28 +++
>  gcc/config/riscv/riscv.opt|   4 +
>  gcc/config/riscv/t-riscv  |   6 +
>  gcc/config/riscv/thead.md |   9 +-
>  gcc/doc/invoke.texi   |  11 +-
>  gcc/emit-rtl.cc   |  24 +++
>  gcc/rtl.h |   2 +
>  .../riscv/xtheadbb-strlen-unaligned.c |  14 ++
>  .../gcc.target/riscv/xtheadbb-strlen.c|  19 ++
>  .../gcc.target/riscv/zbb-strlen-disabled-2.c  |  15 ++
>  .../gcc.target/riscv/zbb-strlen-disabled.c|  15 ++
>  .../gcc.target/riscv/zbb-strlen-unaligned.c   |  14 ++
>  gcc/testsuite/gcc.target/riscv/zbb-strlen.c   |  19 ++
>  16 files changed, 366 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/config/riscv/riscv-string.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-strlen.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-disabled.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen-unaligned.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-strlen.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index b2fe7c7ceef..aff6b6a5601 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -530,7 +530,8 @@ pru-*-*)
> ;;
>  riscv*)
> cpu_type=riscv
> -   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o 
> riscv-vector-costs.o"
> +   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o 
> riscv-shorten-memrefs.o riscv-selftests.o riscv-string.o"
> +   extra_objs="${extra_objs} riscv-v.o riscv-vsetvl.o 
> riscv-vector-costs.o"
> extra_objs="${extra_objs} riscv-vector-builtins.o 
> riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> extra_objs="${extra_objs} thead.o"
> d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 

Re: [PATCH v2 2/2] riscv: Add support for str(n)cmp inline expansion

2023-09-12 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Tue, 12 Sept 2023 at 05:34, Jeff Law  wrote:
>
>
>
> On 9/6/23 10:07, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > This patch implements expansions for the cmpstrsi and cmpstrnsi
> > builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
> > instructions are available.  The expansion basically emits a comparison
> > sequence which compares XLEN bits per step if possible.
> >
> > This allows inlining calls to strcmp() and strncmp() if both strings
> > are xlen-aligned.  For strncmp() the length parameter needs to be known.
> > The benefits over calls to libc are:
> > * no call/ret instructions
> > * no stack frame allocation
> > * no register saving/restoring
> > * no alignment tests
> >
> > The inlining mechanism is gated by new switches ('-minline-strcmp' and
> > '-minline-strncmp') and by the variable 'optimize_size'.
> > The number of emitted unrolled loop iterations can be controlled by the
> > parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
> >
> > The comparison sequence is inspired by the strcmp example
> > in the appendix of the Bitmanip specification (incl. the fast
> > result calculation in case the first word does not contain
> > a NULL byte).  Additional inspiration comes from rs6000-string.c.
> >
> > The emitted sequence is not triggering any readahead pagefault issues,
> > because only aligned strings are accessed by aligned xlen-loads.
> >
> > This patch has been tested using the glibc string tests on QEMU:
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
> > * rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
> > * rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/bitmanip.md (*_not): Export INSN name.
> >   (_not3): Likewise.
> >   * config/riscv/riscv-protos.h (riscv_expand_strcmp): New
> >   prototype.
> >   * config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
> >   macros.
> >   (GEN_EMIT_HELPER2): Likewise.
> >   (emit_strcmp_scalar_compare_byte): New function.
> >   (emit_strcmp_scalar_compare_subword): Likewise.
> >   (emit_strcmp_scalar_compare_word): Likewise.
> >   (emit_strcmp_scalar_load_and_compare): Likewise.
> >   (emit_strcmp_scalar_call_to_libc): Likewise.
> >   (emit_strcmp_scalar_result_calculation_nonul): Likewise.
> >   (emit_strcmp_scalar_result_calculation): Likewise.
> >   (riscv_expand_strcmp_scalar): Likewise.
> >   (riscv_expand_strcmp): Likewise.
> >   * config/riscv/riscv.md (*slt_): Export
> >   INSN name.
> >   (@slt_3): Likewise.
> >   (cmpstrnsi): Invoke expansion function for str(n)cmp.
> >   (cmpstrsi): Likewise.
> >   * config/riscv/riscv.opt: Add new parameter
> >   '-mstring-compare-inline-limit'.
> >   * doc/invoke.texi: Document new parameter
> >   '-mstring-compare-inline-limit'.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadbb-strcmp-unaligned.c: New test.
> >   * gcc.target/riscv/xtheadbb-strcmp.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-disabled.c: New test.
> >   * gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
> >   * gcc.target/riscv/zbb-strcmp.c: New test.
> OK for the trunk.  Thanks for pushing this along.
>
> jeff


Re: [PATCH] riscv: xtheadbb: Fix extendqi insn

2023-09-08 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 8 Sept 2023 at 14:17, Kito Cheng  wrote:

> LGTM
>
> Christoph Muellner wrote on Friday, 8 September 2023 at 14:00:
>
>> From: Christoph Müllner 
>>
>> Recently three SPEC CPU 2017 benchmarks broke when using xtheadbb:
>> * 500.perlbench_r
>> * 525.x264_r
>> * 557.xz_r
>>
>> Tracing the issue down revealed, that we emit a 'th.ext xN,xN,15,0'
>> for a extendqi insn, which is obviously wrong.
>> This patch splits the common 'extend2_th_ext'
>> insn into two 'extendqi' and 'extendhi' insns,
>> which emit the right extension instruction.
>> Additionally, this patch adds test cases for these insns.
>>
>> Signed-off-by: Christoph Müllner 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/thead.md
>> (*extend2_th_ext):
>> Remove broken INSN.
>> (*extendhi2_th_ext): New INSN.
>> (*extendqi2_th_ext): New INSN.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/xtheadbb-ext-2.c: New test.
>> * gcc.target/riscv/xtheadbb-ext-3.c: New test.
>> ---
>>  gcc/config/riscv/thead.md   | 17 ++---
>>  gcc/testsuite/gcc.target/riscv/xtheadbb-ext-2.c | 12 
>>  gcc/testsuite/gcc.target/riscv/xtheadbb-ext-3.c | 12 
>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadbb-ext-3.c
>>
>> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
>> index 29f98dec3a8..05d1b32bd94 100644
>> --- a/gcc/config/riscv/thead.md
>> +++ b/gcc/config/riscv/thead.md
>> @@ -58,14 +58,25 @@ (define_insn "*th_ext4"
>>[(set_attr "type" "bitmanip")
>> (set_attr "mode" "")])
>>
>> -(define_insn "*extend2_th_ext"
>> +(define_insn "*extendhi2_th_ext"
>>[(set (match_operand:SUPERQI 0 "register_operand" "=r,r")
>> (sign_extend:SUPERQI
>> -   (match_operand:SHORT 1 "nonimmediate_operand" "r,m")))]
>> +   (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
>>"TARGET_XTHEADBB"
>>"@
>> th.ext\t%0,%1,15,0
>> -   l\t%0,%1"
>> +   lh\t%0,%1"
>> +  [(set_attr "type" "bitmanip,load")
>> +   (set_attr "mode" "")])
>> +
>> +(define_insn "*extendqi2_th_ext"
>> +  [(set (match_operand:SUPERQI 0 "register_operand" "=r,r")
>> +   (sign_extend:SUPERQI
>> +   (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
>> +  "TARGET_XTHEADBB"
>> +  "@
>> +   th.ext\t%0,%1,7,0
>> +   lb\t%0,%1"
>>[(set_attr "type" "bitmanip,load")
>> (set_attr "mode" "")])
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-2.c
>> b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-2.c
>> new file mode 100644
>> index 000..4645b9c56df
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-2.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc_xtheadbb" { target { rv64 } } } */
>> +/* { dg-options "-march=rv32gc_xtheadbb" { target { rv32 } } } */
>> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
>> +
>> +signed long extqi(signed char i)
>> +{
>> +return --i;
>> +}
>> +
>> +/* { dg-final { scan-assembler "th.ext\ta\[0-9\]+,a\[0-9\]+,7,0" } } */
>> +/* { dg-final { scan-assembler-not "th.ext\ta\[0-9\]+,a\[0-9\]+,15,0" }
>> } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-3.c
>> b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-3.c
>> new file mode 100644
>> index 000..2c9ebbc563a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-ext-3.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-march=rv64gc_xtheadbb" { target { rv64 } } } */
>> +/* { dg-options "-march=rv32gc_xtheadbb" { target { rv32 } } } */
>> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" "-Oz" } } */
>> +
>> +signed long exthi(signed short i)
>> +{
>> +return --i;
>> +}
>> +
>> +/* { dg-final { scan-assembler "th.ext\ta\[0-9\]+,a\[0-9\]+,15,0" } } */
>> +/* { dg-final { scan-assembler-not "th.ext\ta\[0-9\]+,a\[0-9\]+,7,0" } }
>> */
>> --
>> 2.41.0
>>
>>


Re: [PATCH] riscv: thead: Fix mode attribute for extension patterns

2023-09-08 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Fri, 8 Sept 2023 at 10:13, Kito Cheng  wrote:

> LGTM
>
> Christoph Muellner  wrote on Friday, 8 September 2023 at 14:16:
>
>> From: Christoph Müllner 
>>
>> The mode attribute of an extension pattern is usually set to the target
>> type.
>> Let's follow this convention consistently for xtheadbb.
>>
>> Signed-off-by: Christoph Müllner 
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/thead.md: Use more appropriate mode attributes
>> for extensions.
>> ---
>>  gcc/config/riscv/thead.md | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
>> index 05d1b32bd94..2287b752ea1 100644
>> --- a/gcc/config/riscv/thead.md
>> +++ b/gcc/config/riscv/thead.md
>> @@ -101,7 +101,7 @@ (define_insn "*zero_extendsidi2_th_extu"
>> th.extu\t%0,%1,31,0
>> lwu\t%0,%1"
>>[(set_attr "type" "bitmanip,load")
>> -   (set_attr "mode" "SI")])
>> +   (set_attr "mode" "DI")])
>>
>>  (define_insn "*zero_extendhi2_th_extu"
>>[(set (match_operand:GPR 0 "register_operand" "=r,r")
>> @@ -111,7 +111,7 @@ (define_insn "*zero_extendhi2_th_extu"
>> th.extu\t%0,%1,15,0
>> lhu\t%0,%1"
>>[(set_attr "type" "bitmanip,load")
>> -   (set_attr "mode" "HI")])
>> +   (set_attr "mode" "")])
>>
>>  (define_insn "*th_clz2"
>>[(set (match_operand:X 0 "register_operand" "=r")
>> --
>> 2.41.0
>>
>>


Re: [PATCH] riscv: bitmanip: Remove duplicate zero_extendhi2 pattern

2023-09-08 Thread Philipp Tomsich
Committed as 'obvious' to master. Thanks!
Philipp.

On Fri, 8 Sept 2023 at 08:53, Christoph Muellner <
christoph.muell...@vrull.eu> wrote:

> From: Christoph Müllner 
>
> We currently have two identical zero_extendhi2 patterns:
> * '*zero_extendhi2_zbb'
> * '*zero_extendhi2_bitmanip'
>
> This patch removes the *_zbb pattern and ensures that all sign- and
> zero-extensions use the postfix '_bitmanip'.
>
> Signed-off-by: Christoph Müllner 
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md
> (*extend2_zbb):
> Rename postfix to _bitmanip.
> (*extend2_bitmanip): Renamed pattern.
> (*zero_extendhi2_zbb): Remove duplicated pattern.
> ---
>  gcc/config/riscv/bitmanip.md | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 1544ef4e125..431b3292213 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -283,7 +283,7 @@ (define_insn "*zero_extendhi2_bitmanip"
>[(set_attr "type" "bitmanip,load")
> (set_attr "mode" "")])
>
> -(define_insn "*extend2_zbb"
> +(define_insn "*extend2_bitmanip"
>[(set (match_operand:SUPERQI   0 "register_operand" "=r,r")
> (sign_extend:SUPERQI
> (match_operand:SHORT 1 "nonimmediate_operand" " r,m")))]
> @@ -294,17 +294,6 @@ (define_insn "*extend2_zbb"
>[(set_attr "type" "bitmanip,load")
> (set_attr "mode" "")])
>
> -(define_insn "*zero_extendhi2_zbb"
> -  [(set (match_operand:GPR0 "register_operand" "=r,r")
> -   (zero_extend:GPR
> -   (match_operand:HI 1 "nonimmediate_operand" " r,m")))]
> -  "TARGET_ZBB"
> -  "@
> -   zext.h\t%0,%1
> -   lhu\t%0,%1"
> -  [(set_attr "type" "bitmanip,load")
> -   (set_attr "mode" "HI")])
> -
>  (define_expand "rotrdi3"
>[(set (match_operand:DI 0 "register_operand")
> (rotatert:DI (match_operand:DI 1 "register_operand")
> --
> 2.41.0
>
>


Re: [PATCH] riscv: xtheadbb: Fix xtheadbb-li-rotr test for rv32

2023-09-06 Thread Philipp Tomsich
Committed as "obvious" to master.
--Philipp.

On Wed, 6 Sept 2023 at 12:04, Christoph Muellner <
christoph.muell...@vrull.eu> wrote:

> From: Christoph Müllner 
>
> The test was introduced recently and tests an RV64-only feature.
> However, when testing an RV32 compiler, the test gets executed as well
> and fails with "cc1: error: ABI requires '-march=rv32'".
> This patch fixes this by adding '-mabi=lp64' (like it is done for
> other RV64-only tests as well).
>
> Retested with RV32 and RV64 to ensure this won't pop up again.
>
> Signed-off-by: Christoph Müllner 
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadbb-li-rotr.c: Don't run for RV32.
> ---
>  gcc/testsuite/gcc.target/riscv/xtheadbb-li-rotr.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadbb-li-rotr.c
> b/gcc/testsuite/gcc.target/riscv/xtheadbb-li-rotr.c
> index 136dcb01cf4..01f4215179a 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadbb-li-rotr.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadbb-li-rotr.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gc_xtheadbb" } */
> +/* { dg-options "-march=rv64gc_xtheadbb -mabi=lp64" } */
>  /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
>  /* { dg-final { check-function-bodies "**" "" } } */
>
> --
> 2.41.0
>
>


Re: [PATCH] riscv: Synthesize all 11-bit-rotate constants with rori

2023-09-05 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Tue, 5 Sept 2023 at 23:57, Jeff Law  wrote:

>
>
> On 9/5/23 15:15, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > Some constants can be built up using LI+RORI instructions.
> > The current implementation requires one of the upper 32-bits
> > to be a zero bit, which is not necessary.
> > Let's drop this requirement in order to be able to synthesize
> > a constant like 0x00ffL.
> >
> > The tests for LI+RORI are made more strict to detect regression
> > in the calculation of the LI constant and the rotation amount.
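The LI+RORI synthesis idea can be sketched in C: try each rotation amount and test whether un-rotating the target constant yields a value that fits a sign-extended 12-bit LI immediate. This is a deliberate simplification of what riscv_build_integer_1 does (no cost model, no interaction with other synthesis strategies):

```c
#include <stdint.h>

static uint64_t
rotr64 (uint64_t x, unsigned r)
{
  r &= 63;
  return r ? (x >> r) | (x << (64 - r)) : x;
}

/* Search for (LI, ROT) such that rori(LI, ROT) == VALUE, where LI must
   fit a sign-extended 12-bit immediate.  Illustrative only.  */
static int
find_li_rori (uint64_t value, int64_t *li, unsigned *rot)
{
  for (unsigned r = 1; r < 64; r++)
    {
      /* Undo a rotate-right by R: rotate VALUE left by R.  */
      uint64_t cand = rotr64 (value, 64 - r);
      int64_t s = (int64_t) cand;
      if (s >= -2048 && s <= 2047)      /* fits LI's 12-bit immediate */
        {
          *li = s;
          *rot = r;
          return 1;
        }
    }
  return 0;
}
```

For instance, 0x7FF0000000000000 is reachable as `li` of 2047 followed by a rotate-right of 12, and a constant with ones in both the top and bottom bits can be reached via a negative LI value — the case the upper-32-bits restriction used to exclude.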
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_build_integer_1): Don't
> >   require one zero bit in the upper 32 bits for LI+RORI synthesis.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadbb-li-rotr.c: New tests.
> >   * gcc.target/riscv/zbb-li-rotr.c: Likewise.
> OK
> jeff
>


Re: [PATCH] riscv: xtheadbb: Enable constant synthesis with th.srri

2023-09-05 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Tue, 5 Sept 2023 at 18:10, Jeff Law  wrote:

>
>
> On 9/5/23 09:42, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > Some constants can be built up using rotate-right instructions.
> > The code that enables this can be found in riscv_build_integer_1().
> > However, this functionality is only available for Zbb, which
> > includes the rori instruction.  This patch enables this also for
> > XTheadBb, which includes the th.srri instruction.
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_build_integer_1): Enable constant
> >   synthesis with rotate-right for XTheadBb.
> OK
> Jeff
>


Re: [PATCH] riscv: xtheadcondmov: Don't run tests with -Oz

2023-09-05 Thread Philipp Tomsich
Applied to master. Thanks!
Philipp.

On Tue, 5 Sept 2023 at 08:22, Jeff Law  wrote:

>
>
> On 9/1/23 04:20, Christoph Muellner wrote:
> > From: Christoph Müllner 
> >
> > Recently, these xtheadcondmov tests regressed with -Oz:
> > * FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c
> > * FAIL: gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c
> > * FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c
> > * FAIL: gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c
> >
> > As -Oz stands for "Optimize aggressively for size rather than speed.",
> > we need to inspect the generated code, which looks like this:
> >
> >-Oz
> > :
> >   0:   e199                    bnez    a1,6 <.L2>
> >   2:   40100513                li      a0,1025
> >0006 <.L2>:
> >   6:   8082                    ret
> >
> >-O2:
> > :
> >   0:   40100793                li      a5,1025
> >   4:   40b7950b                th.mveqz        a0,a5,a1
> >   8:   8082                    ret
> >
> > As the generated code with -Oz consumes less size, there is nothing
> > wrong in the code generation. Instead, let's not run the xtheadcondmov
> > tests with -Oz.
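For reference, a C function of the shape that plausibly produces the two listings above — the actual test body is not quoted in this message, so the source below is an assumption reconstructed from the disassembly:

```c
/* With XTheadCondMov enabled this compiles to li + th.mveqz at -O2;
   at -Oz a short branch over the li is smaller, which is the
   (correct) code-size behaviour discussed above.  */
long
cond_select (long x, long cond)
{
  return cond ? x : 1025;
}
```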
> >
> > Signed-off-by: Christoph Müllner 
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/xtheadcondmov-mveqz-imm-eqz.c: Disable for -Oz.
> >   * gcc.target/riscv/xtheadcondmov-mveqz-imm-not.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mveqz-reg-eqz.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mveqz-reg-not.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mvnez-imm-cond.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mvnez-imm-nez.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mvnez-reg-cond.c: Likewise.
> >   * gcc.target/riscv/xtheadcondmov-mvnez-reg-nez.c: Likewise.
> OK
> jeff
>


Re: RISC-V: Added support for CRC.

2023-08-16 Thread Philipp Tomsich
On Wed, 16 Aug 2023 at 21:10, Alexander Monakov  wrote:
>
>
> On Tue, 15 Aug 2023, Jeff Law wrote:
>
> > Because if the compiler can optimize it automatically, then the projects 
> > have
> > to do literally nothing to take advantage of it.  They just compile normally
> > and their bitwise CRC gets optimized down to either a table lookup or a 
> > clmul
> > variant.  That's the real goal here.
>
> The only high-profile FOSS project that carries a bitwise CRC implementation
> I'm aware of is the 'xz' compression library. There bitwise CRC is used for
> populating the lookup table under './configure --enable-small':
>
> https://github.com/tukaani-project/xz/blob/2b871f4dbffe3801d0da3f89806b5935f758d5f3/src/liblzma/check/crc64_small.c
>
> It's a well-reasoned choice and your compiler would be undoing it
> (reintroducing the table when the bitwise CRC is employed specifically
> to avoid carrying the table).
>
> > One final note.  Elsewhere in this thread you described performance 
> > concerns.
> > Right now clmuls can be implemented in 4c, fully piped.
>
> Pipelining doesn't matter in the implementation being proposed here, because
> the builtin is expanded to
>
>li  a4,quotient
>li  a5,polynomial
>xor a0,a1,a0
>clmul   a0,a0,a4
>srli    a0,a0,crc_size
>clmul   a0,a0,a5
>slli    a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>srli    a0,a0,GET_MODE_BITSIZE (word_mode) - crc_size
>
> making CLMULs data-dependent, so the second can only be started one cycle
> after the first finishes, and consecutive invocations of __builtin_crc
> are likewise data-dependent (with three cycles between CLMUL). So even
> when you get CLMUL down to 3c latency, you'll have two CLMULs and 10 cycles
> per input block, while state of the art is one widening CLMUL per input block
> (one CLMUL per 32-bit block on a 64-bit CPU) limited by throughput, not 
> latency.
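The table-vs-bitwise trade-off under discussion can be made concrete in C. The example below uses CRC-32 with the reflected polynomial 0xEDB88320 purely for illustration (the xz code in question is CRC-64); the table variant builds its lookup table from the bitwise loop on first use, as crc64_small.c does:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise (table-free) CRC-32: small code, 8 iterations per byte.  */
static uint32_t
crc32_bitwise (const unsigned char *p, size_t n)
{
  uint32_t crc = 0xFFFFFFFFu;
  while (n--)
    {
      crc ^= *p++;
      for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
    }
  return ~crc;
}

/* Table-driven CRC-32: 1 KiB of data, one lookup per byte.  */
static uint32_t
crc32_table (const unsigned char *p, size_t n)
{
  static uint32_t tab[256];
  if (!tab[1])                  /* build the table lazily, once */
    for (uint32_t i = 0; i < 256; i++)
      {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
          c = (c >> 1) ^ ((c & 1) ? 0xEDB88320u : 0);
        tab[i] = c;
      }
  uint32_t crc = 0xFFFFFFFFu;
  while (n--)
    crc = (crc >> 8) ^ tab[(crc ^ *p++) & 0xFF];
  return ~crc;
}
```

An optimizer that rewrites the bitwise loop into the table form would undo exactly the size-for-speed choice the bitwise form was picked for.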
>
> > I fully expect that latency to drop within the next 12-18 months.  In that
> > world, there's not going to be much benefit to using hand-coded libraries vs
> > just letting the compiler do it.

I would also hope that the hand-coded libraries would eventually have
a code path for compilers that support the built-in.
For what it's worth, there now is CRC in Boost:
https://www.boost.org/doc/libs/1_83_0/doc/html/crc.html

Cheers,
philipp.


Re: [RFC PATCH v2 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-16 Thread Philipp Tomsich
On Wed, 16 Aug 2023 at 03:27, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/9/23 20:25, Tsukasa OI wrote:
> > From: Tsukasa OI 
> >
> > The "pause" RISC-V hint instruction requires the 'Zihintpause' extension
> > (in the assembler).  However, GCC emits "pause" unconditionally, making
> > an assembler error while compiling code with __builtin_riscv_pause while
> > the 'Zihintpause' extension disabled.
> >
> > However, the "pause" instruction code (0x010f) is a HINT and emitting
> > its instruction code is safe in any environment.
> >
> > This commit implements handling for the 'Zihintpause' extension and emits
> > ".insn 0x010f" instead of "pause" only if the extension is disabled
> > (making the diagnostics better).
> >
> > gcc/ChangeLog:
> >
> >   * common/config/riscv/riscv-common.cc
> >   (riscv_ext_version_table): Implement the 'Zihintpause' extension,
> >   version 2.0.  (riscv_ext_flag_table) Add 'Zihintpause' handling.
> >   * config/riscv/riscv-builtins.cc: Remove availability predicate
> >   "always" and add "hint_pause" and "hint_pause_pseudo", corresponding
> >   the existence of the 'Zihintpause' extension.
> >   (riscv_builtins) Split builtin implementation depending on the
> >   existence of the 'Zihintpause' extension.
> >   * config/riscv/riscv-opts.h
> >   (MASK_ZIHINTPAUSE, TARGET_ZIHINTPAUSE): New.
> >   * config/riscv/riscv.md (riscv_pause): Make it only available when
> >   the 'Zihintpause' extension is enabled.  (riscv_pause_insn) New
> >   "pause" implementation when the 'Zihintpause' extension is disabled.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/builtin_pause.c: Removed.
> >   * gcc.target/riscv/zihintpause-1.c:
> >   New test when the 'Zihintpause' extension is enabled.
> >   * gcc.target/riscv/zihintpause-2.c: Likewise.
> >   * gcc.target/riscv/zihintpause-noarch.c:
> >   New test when the 'Zihintpause' extension is disabled.
> So the conclusion from today's meeting was to make this available
> irrespective of the extension set.  So I've dropped the alternate patch
> from patchwork.
>
>
> > diff --git a/gcc/config/riscv/riscv-builtins.cc 
> > b/gcc/config/riscv/riscv-builtins.cc
> > index 79681d759628..554fb7f69bb0 100644
> > --- a/gcc/config/riscv/riscv-builtins.cc
> > +++ b/gcc/config/riscv/riscv-builtins.cc
> > @@ -122,7 +122,8 @@ AVAIL (clmul_zbkc32_or_zbc32, (TARGET_ZBKC || 
> > TARGET_ZBC) && !TARGET_64BIT)
> >   AVAIL (clmul_zbkc64_or_zbc64, (TARGET_ZBKC || TARGET_ZBC) && TARGET_64BIT)
> >   AVAIL (clmulr_zbc32, TARGET_ZBC && !TARGET_64BIT)
> >   AVAIL (clmulr_zbc64, TARGET_ZBC && TARGET_64BIT)
> > -AVAIL (always, (!0))
> > +AVAIL (hint_pause, TARGET_ZIHINTPAUSE)
> > +AVAIL (hint_pause_pseudo, !TARGET_ZIHINTPAUSE)
> >
> >   /* Construct a riscv_builtin_description from the given arguments.
> >
> > @@ -179,7 +180,8 @@ static const struct riscv_builtin_description 
> > riscv_builtins[] = {
> >
> > DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
> > DIRECT_NO_TARGET_BUILTIN (fsflags, RISCV_VOID_FTYPE_USI, hard_float),
> > -  DIRECT_NO_TARGET_BUILTIN (pause, RISCV_VOID_FTYPE, always),
> > +  RISCV_BUILTIN (pause, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> > RISCV_VOID_FTYPE, hint_pause),
> > +  RISCV_BUILTIN (pause_insn, "pause", RISCV_BUILTIN_DIRECT_NO_TARGET, 
> > RISCV_VOID_FTYPE, hint_pause_pseudo),
> >   };
> >
> >   /* Index I is the function declaration for riscv_builtins[I], or null if 
> > the
> > diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> > index 28d9b81bd800..a6c3e0c9098f 100644
> > --- a/gcc/config/riscv/riscv-opts.h
> > +++ b/gcc/config/riscv/riscv-opts.h
> > @@ -102,10 +102,12 @@ enum riscv_entity
> >   #define MASK_ZICSR(1 << 0)
> >   #define MASK_ZIFENCEI (1 << 1)
> >   #define MASK_ZIHINTNTL (1 << 2)
> > +#define MASK_ZIHINTPAUSE (1 << 3)
> >
> >   #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
> >   #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> >   #define TARGET_ZIHINTNTL ((riscv_zi_subext & MASK_ZIHINTNTL) != 0)
> > +#define TARGET_ZIHINTPAUSE ((riscv_zi_subext & MASK_ZIHINTPAUSE) != 0)
> >
> >   #define MASK_ZAWRS   (1 << 0)
> >   #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index 688fd697255b..a6cdb32e9408 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -2192,9 +2192,14 @@
> >
> >   (define_insn "riscv_pause"
> > [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
> > -  ""
> > +  "TARGET_ZIHINTPAUSE"
> > "pause")
> >
> > +(define_insn "riscv_pause_insn"
> > +  [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
> > +  ""
> > +  ".insn\t0x0100000f")
> > +
> So I was wondering if we'd be better off always emitting the .insn form
> with a comment on the line indicating it's a pause.  ie something like
>
> .insn\t0x0100000f ;; pause
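Jeff's single-pattern suggestion would look roughly like this in the machine description (a hypothetical sketch for illustration; the patch as posted keeps the two separate patterns quoted above):

```lisp
;; Hypothetical variant of the suggestion above: one pattern that always
;; emits the raw HINT encoding, with a trailing comment naming it so
;; readers of the generated assembly can identify the instruction.
(define_insn "riscv_pause"
  [(unspec_volatile [(const_int 0)] UNSPECV_PAUSE)]
  ""
  ".insn\t0x0100000f\t;; pause")
```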

Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-13 Thread Philipp Tomsich
On Sat, 12 Aug 2023 at 01:31, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/9/23 16:39, Tsukasa OI wrote:
> > On 2023/08/10 5:05, Jeff Law wrote:
>
> >> I'd tend to think we do not want to expose the intrinsic unless the
> >> right extensions are enabled -- even though the encoding is a no-op and
> >> we could emit it as a .insn.
> >
> > I think that makes sense.  The only reason I implemented the
> > no-'Zihintpause' version is because GCC 13 implemented the built-in
> > unconditionally.  If the compatibility breakage is considered minimum (I
> > don't know, though), I'm ready to submit 'Zihintpause'-only version of
> > this patch set.
> While it's a compatibility break I don't think we have a need to
> preserve this kind of compatibility.  I suspect anyone using
> __builtin_riscv_pause was probably already turning on Zihintpause and if
> they weren't they should have been :-0
>
>
> I'm sure we'll kick this around in the Tuesday meeting and hopefully
> make a decision about the desired direction.  You're obviously welcome
> to join if you're inclined.  Let me know if you need an invite.

The original discussion (and I believe that Andrew was the decisive
voice in the end) came to the conclusion that—given that pause is a
true hint—it could always be enabled.
We had originally expected to enable it only if Zihintpause was part
of the target architecture, but viewing it as "just a name for an
already existing pure hint" also made sense.
Note that on systems that don't implement Zihintpause, the hint is
guaranteed to not have an architectural effect.

That said, I don't really have a strong leaning one way or another.
Philipp.


Re: [PATCH] cprop_hardreg: Allow more propagation of the stack pointer.

2023-08-07 Thread Philipp Tomsich
Applied to master, thanks!
--Philipp.


On Mon, 7 Aug 2023 at 19:20, Jeff Law  wrote:
>
>
>
> On 8/7/23 05:31, Manolis Tsamis wrote:
> > The stack pointer propagation fix 736f8fd3 turned out to be more restrictive
> > than needed by rejecting propagation of the stack pointer when REG_POINTER
> > didn't match.
> >
> > This commit removes this check:
> > When the stack pointer is propagated it is fine for this to result in
> > REG_POINTER becoming true from false, which is what the original code 
> > checked.
> >
> > This simplification makes the previously introduced function
> > maybe_copy_reg_attrs obsolete and the logic can be inlined at the call 
> > sites,
> > as it was before 736f8fd3.
> >
> > gcc/ChangeLog:
> >
> >   * regcprop.cc (maybe_copy_reg_attrs): Remove unnecessary function.
> >   (find_oldest_value_reg): Inline stack_pointer_rtx check.
> >   (copyprop_hardreg_forward_1): Inline stack_pointer_rtx check.
> OK
> jeff


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
Very helpful! Looks as if regcprop for stack_pointer is now either too
conservative — or one of our patches is missing in everyone's test
setup; we'll take a closer look.

On Wed, 2 Aug 2023 at 01:03, Vineet Gupta  wrote:
>
>
>
> On 8/1/23 15:07, Philipp Tomsich wrote:
> > +Manolis Tsamis
> >
> > On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >> On 8/1/23 13:14, Vineet Gupta wrote:
> >>
> >>> I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> >>> avoid the Thunderbird mangling the test formatting)
> >> Thanks.  Of particular importance is the leela change.  My recollection
> >> was that the f-m-o work also picked up that case.  But if my memory is
> >> faulty (always a possibility), then that shows a clear case where
> >> Jivan's work picks up a case not handled by Manolis's work.
> > f-m-o originally targeted (and benefited) the leela-case.  I wonder if
> > other optimizations/changes over the last year interfere with this and
> > what needs to be changed to accommodate this... looks like we need to
> > revisit against trunk.
> >
> > Philipp.
> >
> >> And on the other direction we can see that deepsjeng isn't helped by
> >> Jivan's work, but is helped by Manolis's new pass.
> >>
> >> I'd always hoped/expected we'd have cases where one patch clearly helped
> >> over the other.  While the .25% to .37% improvements for the three most
> >> impacted benchmarks doesn't move the needle much across the whole suite
> >> they do add up over time.
> >>
> >> Jeff
>
> I took a quick look at Leela, the significant difference is from
> additional insns with SP not getting propagated.
>
> e.g.
>
> 231b6:  mv      a4,sp
> 231b8:  sh2add  a5,a5,a4
>
> vs.
>
> 1e824:  sh2add  a5,a5,sp
>
> There are 5 such instances which more or less make up for the delta.
>
> -Vineet
>


Re: RISC-V: Folding memory for FP + constant case

2023-08-01 Thread Philipp Tomsich
+Manolis Tsamis

On Tue, 1 Aug 2023 at 23:56, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/1/23 13:14, Vineet Gupta wrote:
>
> >
> > I have some numbers for f-m-o v3 vs this. Attached here (vs. inline to
> > avoid the Thunderbird mangling the test formatting)
> Thanks.  Of particular importance is the leela change.  My recollection
> was that the f-m-o work also picked up that case.  But if my memory is
> faulty (always a possibility), then that shows a clear case where
> Jivan's work picks up a case not handled by Manolis's work.

f-m-o originally targeted (and benefited) the leela-case.  I wonder if
other optimizations/changes over the last year interfere with this and
what needs to be changed to accommodate this... looks like we need to
revisit against trunk.

Philipp.

> And on the other direction we can see that deepsjeng isn't helped by
> Jivan's work, but is helped by Manolis's new pass.
>
> I'd always hoped/expected we'd have cases where one patch clearly helped
> over the other.  While the .25% to .37% improvements for the three most
> impacted benchmarks doesn't move the needle much across the whole suite
> they do add up over time.
>
> Jeff


Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-21 Thread Philipp Tomsich
On Fri, 21 Jul 2023 at 19:56, Vineet Gupta  wrote:
>
> DF +0.0 is bitwise all zeros, so an integer store of x0 to memory can be
> used to optimize it.
>
> void zd(double *d) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd  zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this

typo: "gettinh" -> "getting"

> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.

Should we add a "Fixes: "?

> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:

PR target/110748

>
> * config/riscv/predicates.md (const_0_operand): Add back
>   const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>   patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/predicates.md |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
>  gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
>  3 files changed, 15 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
>  (define_predicate "const_0_operand"
> -  (and (match_code "const_int,const_wide_int,const_vector")
> +  (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>  (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c 
> b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index ..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0;  }
> +void zf(float *f)  { *f = 0.0;  }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
>  /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>  /* { dg-final { scan-assembler "fmv.x.w" } } */
>  /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
Thanks, applied to trunk!

Philipp.

On Wed, 12 Jul 2023 at 16:08, Jeff Law  wrote:

>
>
> On 7/12/23 08:07, Philipp Tomsich wrote:
> >
> >
> > On Wed, 12 Jul 2023 at 16:05, Jeff Law  wrote:
> >
> >
> >
> > On 7/12/23 06:48, Christoph Müllner wrote:
> >  > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:
> >  >>
> >  >>
> >  >>
> >  >> On 7/10/23 22:44, Christoph Muellner wrote:
> >  >>> From: Christoph Müllner 
> >  >>>
> >  >>> Recently, two identical XTheadCondMov tests have been added,
> > which both fail.
> >  >>> Let's fix that by changing the following:
> >  >>> * Merge both files into one (no need for separate tests for
> > rv32 and rv64)
> >  >>> * Drop unrelated attribute check test (we already test for
> > `th.mveqz`
> >  >>> and `th.mvnez` instructions, so there is little additional
> > value)
> >  >>> * Fix the pattern to allow matching
> >  >>>
> >  >>> gcc/testsuite/ChangeLog:
> >  >>>
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved
> > to...
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >  >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c:
> Removed.
> >  >> I thought this stuff got fixed recently.  Certainly happy to see
> the
> >  >> files merged though.  Here's what I got from the July 4 run:
> >  >
> >  > I have the following with a GCC master from today
> >  > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >  >
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> >  > scan-assembler .attribute arch,
> >  >
> >
>  "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >  >
> >  > With this patch the fails are gone.
> > Then it's fine with me :-)
> >
> >
> > For the avoidance of all doubt: could I hear an "OK"?
> OK for the trunk.
> jeff
>


Re: [PATCH] riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64])

2023-07-12 Thread Philipp Tomsich
On Wed, 12 Jul 2023 at 16:05, Jeff Law  wrote:

>
>
> On 7/12/23 06:48, Christoph Müllner wrote:
> > On Wed, Jul 12, 2023 at 4:05 AM Jeff Law  wrote:
> >>
> >>
> >>
> >> On 7/10/23 22:44, Christoph Muellner wrote:
> >>> From: Christoph Müllner 
> >>>
> >>> Recently, two identical XTheadCondMov tests have been added, which
> both fail.
> >>> Let's fix that by changing the following:
> >>> * Merge both files into one (no need for separate tests for rv32 and
> rv64)
> >>> * Drop unrelated attribute check test (we already test for `th.mveqz`
> >>> and `th.mvnez` instructions, so there is little additional value)
> >>> * Fix the pattern to allow matching
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to...
> >>>* gcc.target/riscv/xtheadcondmov-indirect.c: ...here.
> >>>* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed.
> >> I thought this stuff got fixed recently.  Certainly happy to see the
> >> files merged though.  Here's what I got from the July 4 run:
> >
> > I have the following with a GCC master from today
> > (a454325bea77a0dd79415480d48233a7c296bc0a):
> >
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2
> > scan-assembler .attribute arch,
> > "rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> > FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv64.c   -O2
> > scan-assembler .attribute arch,
> > "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_xtheadcondmov1p0"
> >
> > With this patch the fails are gone.
> Then it's fine with me :-)


For the avoidance of all doubt: could I hear an "OK"?

Thanks,
Philipp.


Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Awesome, thanks!

On Wed, 12 Jul 2023 at 09:18, Kito Cheng  wrote:

> Yeah, I've applied patches on my local tree and running the testsuite.
>
> On Wed, Jul 12, 2023 at 3:11 PM Philipp Tomsich
>  wrote:
> >
> > Looks like I missed the OK on this one.
> > I can pick it up today, unless you Kito already has it in flight?
> >
> > Thanks,
> > Philipp.
> >
> > On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:
> >>
> >> Hi Christoph:
> >>
> >> Ooops, I thought Philipp will push those patches, does here any other
> >> patches got approved but not committed? I can help to push those
> >> patches tomorrow.
> >>
> >> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
> >>  wrote:
> >> >
> >> > Hi Cooper,
> >> >
> >> > I addressed this in April this year.
> >> > It even got an "ok", but nobody pushed it:
> >> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >> >
> >> > BR
> >> > Christoph
> >> >
> >> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu <
> cooper...@linux.alibaba.com> wrote:
> >> > >
> >> > > The frame-related load/store instructions should not be
> >> > > scheduled between each other, and the REG_FRAME_RELATED_EXPR
> >> > > expression note should be added to those instructions
> >> > > to prevent this.
> >> > > This bug causes an ICE during GCC bootstrap, and it will also ICE
> >> > > in the simplified case mempair-4.c; compilation fails with:
> >> > > during RTL pass: dwarf2
> >> > > theadmempair-4.c:20:1: internal compiler error: in
> dwarf2out_frame_debug_cfa_offset, at dwarf2cfi.cc:1376
> >> > > 0xa8c017 dwarf2out_frame_debug_cfa_offset
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:1376
> >> > > 0xa8c017 dwarf2out_frame_debug
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2285
> >> > > 0xa8c017 scan_insn_after
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2726
> >> > > 0xa8cc97 scan_trace
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2893
> >> > > 0xa8d84d create_cfi_notes
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:2933
> >> > > 0xa8d84d execute_dwarf2_frame
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:3309
> >> > > 0xa8d84d execute
> >> > > ../../../gcc/gcc/dwarf2cfi.cc:3799
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > > * config/riscv/thead.cc (th_mempair_save_regs): Add
> >> > > REG_FRAME_RELATED_EXPR note for mempair instuctions.
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > > * gcc.target/riscv/xtheadmempair-4.c: New test.
> >> > > ---
> >> > >  gcc/config/riscv/thead.cc |  6 +++--
> >> > >  .../gcc.target/riscv/xtheadmempair-4.c| 26
> +++
> >> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> >> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> >> > >
> >> > > diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> >> > > index 75203805310..2df709226f9 100644
> >> > > --- a/gcc/config/riscv/thead.cc
> >> > > +++ b/gcc/config/riscv/thead.cc
> >> > > @@ -366,10 +366,12 @@ th_mempair_save_regs (rtx operands[4])
> >> > >  {
> >> > >rtx set1 = gen_rtx_SET (operands[0], operands[1]);
> >> > >rtx set2 = gen_rtx_SET (operands[2], operands[3]);
> >> > > +  rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
> >> > >rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> set1, set2)));
> >> > >RTX_FRAME_RELATED_P (insn) = 1;
> >> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
> >> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
> >> > > +  XVECEXP (dwarf, 0, 0) = copy_rtx (set1);
> >> > > +  XVECEXP (dwarf, 0, 1) = copy_rtx (set2);
> >> > > +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> >> > >  }
> >> > >
> >> > >  /* Similar like riscv_restore_reg, but restores two registers from
> memory
> >> > > diff -

Re: [PATCH 1/1] riscv: thead: Fix ICE when enable XTheadMemPair ISA extension.

2023-07-12 Thread Philipp Tomsich
Looks like I missed the OK on this one.
I can pick it up today, unless you Kito already has it in flight?

Thanks,
Philipp.

On Tue, 11 Jul 2023 at 17:51, Kito Cheng  wrote:

> Hi Christoph:
>
> Ooops, I thought Philipp will push those patches, does here any other
> patches got approved but not committed? I can help to push those
> patches tomorrow.
>
> On Tue, Jul 11, 2023 at 11:42 PM Christoph Müllner
>  wrote:
> >
> > Hi Cooper,
> >
> > I addressed this in April this year.
> > It even got an "ok", but nobody pushed it:
> >   https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616972.html
> >
> > BR
> > Christoph
> >
> > On Tue, Jul 11, 2023 at 5:39 PM Xianmiao Qu 
> wrote:
> > >
> > > The frame-related load/store instructions should not be
> > > scheduled between each other, and the REG_FRAME_RELATED_EXPR
> > > expression note should be added to those instructions
> > > to prevent this.
> > > This bug causes an ICE during GCC bootstrap, and it will also ICE
> > > in the simplified case mempair-4.c; compilation fails with:
> > > during RTL pass: dwarf2
> > > theadmempair-4.c:20:1: internal compiler error: in
> dwarf2out_frame_debug_cfa_offset, at dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug_cfa_offset
> > > ../../../gcc/gcc/dwarf2cfi.cc:1376
> > > 0xa8c017 dwarf2out_frame_debug
> > > ../../../gcc/gcc/dwarf2cfi.cc:2285
> > > 0xa8c017 scan_insn_after
> > > ../../../gcc/gcc/dwarf2cfi.cc:2726
> > > 0xa8cc97 scan_trace
> > > ../../../gcc/gcc/dwarf2cfi.cc:2893
> > > 0xa8d84d create_cfi_notes
> > > ../../../gcc/gcc/dwarf2cfi.cc:2933
> > > 0xa8d84d execute_dwarf2_frame
> > > ../../../gcc/gcc/dwarf2cfi.cc:3309
> > > 0xa8d84d execute
> > > ../../../gcc/gcc/dwarf2cfi.cc:3799
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/thead.cc (th_mempair_save_regs): Add
> > > REG_FRAME_RELATED_EXPR note for mempair instructions.
> > >
> > > gcc/testsuite/ChangeLog:
> > > * gcc.target/riscv/xtheadmempair-4.c: New test.
> > > ---
> > >  gcc/config/riscv/thead.cc |  6 +++--
> > >  .../gcc.target/riscv/xtheadmempair-4.c| 26 +++
> > >  2 files changed, 30 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > >
> > > diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
> > > index 75203805310..2df709226f9 100644
> > > --- a/gcc/config/riscv/thead.cc
> > > +++ b/gcc/config/riscv/thead.cc
> > > @@ -366,10 +366,12 @@ th_mempair_save_regs (rtx operands[4])
> > >  {
> > >rtx set1 = gen_rtx_SET (operands[0], operands[1]);
> > >rtx set2 = gen_rtx_SET (operands[2], operands[3]);
> > > +  rtx dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
> > >rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> set1, set2)));
> > >RTX_FRAME_RELATED_P (insn) = 1;
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
> > > -  add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
> > > +  XVECEXP (dwarf, 0, 0) = copy_rtx (set1);
> > > +  XVECEXP (dwarf, 0, 1) = copy_rtx (set2);
> > > +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> > >  }
> > >
> > >  /* Similar like riscv_restore_reg, but restores two registers from
> memory
> > > diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > new file mode 100644
> > > index 000..d653f056ef4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-4.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" }
> } */
> > > +/* { dg-options "-march=rv64gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv64 } } } */
> > > +/* { dg-options "-march=rv32gc_xtheadmempair -O2 -g
> -mtune=thead-c906" { target { rv32 } } } */
> > > +
> > > +void a();
> > > +void b(char *);
> > > +void m_fn1(int);
> > > +int e;
> > > +
> > > +int foo(int ee, int f, int g) {
> > > +  char *h = (char *)__builtin_alloca(1);
> > > +  b(h);
> > > +  b("");
> > > +  int i = ee;
> > > +  e = g;
> > > +  m_fn1(f);
> > > +  a();
> > > +  e = i;
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } }
> } } */
> > > +
> > > +/* { dg-final { scan-assembler-times "th.lwd\t" 3 { target { rv32 } }
> } } */
> > > +/* { dg-final { scan-assembler-times "th.swd\t" 3 { target { rv32 } }
> } } */
> > > --
> > > 2.17.1
> > >
>


Re: [PATCH v2] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-11 Thread Philipp Tomsich
Jakub,

it looks like you did a lot of work on reassoc in the past — could you
have a quick look and comment?

Thanks,
Philipp.


On Tue, 11 Jul 2023 at 04:59, Di Zhao OS  wrote:
>
> Attached is an updated version of the patch.
>
> Based on Philipp's review, some changes:
>
> 1. Defined new enum fma_state to describe the state of FMA candidates
>for a list of operands. (Since the tests seems simple after the
>change, I didn't add predicates on it.)
> 2. Changed return type of convert_mult_to_fma_1 and convert_mult_to_fma
>to tree, to remove the in/out parameter.
> 3. Added description of return value values of rank_ops_for_fma.
>
> ---
> gcc/ChangeLog:
>
> * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Added new parameter
> check_only_p. Changed return type to tree.
> (struct fma_transformation_info): Moved to header.
> (class fma_deferring_state): Moved to header.
> (convert_mult_to_fma): Added new parameter check_only_p. Changed
> return type to tree.
> * tree-ssa-math-opts.h (struct fma_transformation_info): Moved from 
> .cc.
> (class fma_deferring_state): Moved from .cc.
> (convert_mult_to_fma): Add function decl.
> * tree-ssa-reassoc.cc (enum fma_state): Defined new enum to describe
> the state of FMA candidates for a list of operands.
> (rewrite_expr_tree_parallel): Changed boolean parameter to enum type.
> (rank_ops_for_fma): Return enum fma_state.
> (reassociate_bb): Avoid rewriting to parallel if nested FMAs are 
> found.
>
> Thanks,
> Di Zhao
>
>


Re: [PING][PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-07-07 Thread Philipp Tomsich
On Fri, 7 Jul 2023 at 10:28, Di Zhao OS via Gcc-patches
 wrote:
>
> Update the patch so it can apply.
>
> Tested on spec2017 fprate cases again. With option "-funroll-loops -Ofast 
> -flto",
> the improvements of 1-copy run are:
>
> Ampere1:
> 508.namd_r  4.26%
> 510.parest_r2.55%
> Overall 0.54%
> Intel Xeon:
> 503.bwaves_r1.3%
> 508.namd_r  1.58%
> overall 0.42%

This looks like a worthwhile improvement.

From reviewing the patch, a few nit-picks:
- given that 'has_fma' can now take three values { -1, 0, 1 }, an enum
with more descriptive names for these 3 states should be used;
- using "has_fma >= 0" and "fma > 0" tests are hard to read; after
changing this to an enum, you can use macros or helper functions to
test the predicates (i.e., *_P macros or *_p helpers) for readability
- the meaning of the return values of rank_ops_for_fma should be
documented in the comment describing the function
- changing convert_mult_to_fma_1 to return a tree* (i.e., return_lhs
or NULL_TREE) removes the need for an in/out parameter

Thanks,
Philipp.

>
>
> Thanks,
> Di Zhao
>
>
> > -Original Message-
> > From: Di Zhao OS
> > Sent: Friday, June 16, 2023 4:51 PM
> > To: gcc-patches@gcc.gnu.org
> > Subject: [PATCH] tree-optimization/110279- Check for nested FMA chains in
> > reassoc
> >
> > This patch is to fix the regressions found in SPEC2017 fprate cases
> >  on aarch64.
> >
> > 1. Reused code in pass widening_mul to check for nested FMA chains
> >  (those connected by MULT_EXPRs), since re-writing to parallel
> >  generates worse codes.
> >
> > 2. Avoid re-arrange to produce less FMA chains that can be slow.
> >
> > Tested on ampere1 and neoverse-n1, this fixed the regressions in
> > 508.namd_r and 510.parest_r 1 copy run. While I'm still collecting data
> > on x86 machines we have, I'd like to know what do you think of this.
> >
> > (Previously I tried to improve things with FMA by adding a widening_mul
> > pass before reassoc2 for it's easier to recognize different patterns
> > of FMA chains and decide whether to split them. But I suppose handling
> > them all in reassoc pass is more efficient.)
> >
> > Thanks,
> > Di Zhao
> >
> > ---
> > gcc/ChangeLog:
> >
> > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
> > Support new mode that merely do the checking.
> > (struct fma_transformation_info): Moved to header.
> > (class fma_deferring_state): Moved to header.
> > (convert_mult_to_fma): Add new parameter.
> > * tree-ssa-math-opts.h (struct fma_transformation_info):
> > (class fma_deferring_state): Moved from .cc.
> > (convert_mult_to_fma): Add function decl.
> > * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
> > (rank_ops_for_fma): Return -1 if nested FMAs are found.
> > (reassociate_bb): Avoid rewriting to parallel if nested FMAs are
> > found.
>


Re: [PATCH v2] RISC-V: Add support for vector crypto extensions

2023-07-03 Thread Philipp Tomsich
Thanks, applied to master.
--Philipp.

On Mon, 3 Jul 2023 at 15:42, Kito Cheng  wrote:

> Thanks, LGTM :)
>
> On Mon, 3 Jul 2023 at 19:08, Christoph Muellner wrote:
>
>> From: Christoph Müllner 
>>
>> This series adds basic support for the vector crypto extensions:
>> * Zvbb
>> * Zvbc
>> * Zvkg
>> * Zvkned
>> * Zvkhn[a,b]
>> * Zvksed
>> * Zvksh
>> * Zvkn
>> * Zvknc
>> * Zvkng
>> * Zvks
>> * Zvksc
>> * Zvksg
>> * Zvkt
>>
>> This patch is based on the v20230620 version of the Vector Cryptography
>> specification. The specification is frozen and can be found here:
>>   https://github.com/riscv/riscv-crypto/releases/tag/v20230620
>>
>> Binutils support has been merged upstream a few days ago.
>>
>> All extensions come with tests for the feature test macros.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: Add support for zvbb,
>> zvbc, zvkg, zvkned, zvknha, zvknhb, zvksed, zvksh, zvkn,
>> zvknc, zvkng, zvks, zvksc, zvksg, zvkt and the implied subsets.
>> * config/riscv/arch-canonicalize: Add canonicalization info for
>> zvkn, zvknc, zvkng, zvks, zvksc, zvksg.
>> * config/riscv/riscv-opts.h (MASK_ZVBB): New macro.
>> (MASK_ZVBC): Likewise.
>> (TARGET_ZVBB): Likewise.
>> (TARGET_ZVBC): Likewise.
>> (MASK_ZVKG): Likewise.
>> (MASK_ZVKNED): Likewise.
>> (MASK_ZVKNHA): Likewise.
>> (MASK_ZVKNHB): Likewise.
>> (MASK_ZVKSED): Likewise.
>> (MASK_ZVKSH): Likewise.
>> (MASK_ZVKN): Likewise.
>> (MASK_ZVKNC): Likewise.
>> (MASK_ZVKNG): Likewise.
>> (MASK_ZVKS): Likewise.
>> (MASK_ZVKSC): Likewise.
>> (MASK_ZVKSG): Likewise.
>> (MASK_ZVKT): Likewise.
>> (TARGET_ZVKG): Likewise.
>> (TARGET_ZVKNED): Likewise.
>> (TARGET_ZVKNHA): Likewise.
>> (TARGET_ZVKNHB): Likewise.
>> (TARGET_ZVKSED): Likewise.
>> (TARGET_ZVKSH): Likewise.
>> (TARGET_ZVKN): Likewise.
>> (TARGET_ZVKNC): Likewise.
>> (TARGET_ZVKNG): Likewise.
>> (TARGET_ZVKS): Likewise.
>> (TARGET_ZVKSC): Likewise.
>> (TARGET_ZVKSG): Likewise.
>> (TARGET_ZVKT): Likewise.
>> * config/riscv/riscv.opt: Introduction of riscv_zv{b,k}_subext.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zvbb.c: New test.
>> * gcc.target/riscv/zvbc.c: New test.
>> * gcc.target/riscv/zvkg.c: New test.
>> * gcc.target/riscv/zvkn-1.c: New test.
>> * gcc.target/riscv/zvkn.c: New test.
>> * gcc.target/riscv/zvknc-1.c: New test.
>> * gcc.target/riscv/zvknc-2.c: New test.
>> * gcc.target/riscv/zvknc.c: New test.
>> * gcc.target/riscv/zvkned.c: New test.
>> * gcc.target/riscv/zvkng-1.c: New test.
>> * gcc.target/riscv/zvkng-2.c: New test.
>> * gcc.target/riscv/zvkng.c: New test.
>> * gcc.target/riscv/zvknha.c: New test.
>> * gcc.target/riscv/zvknhb.c: New test.
>> * gcc.target/riscv/zvks-1.c: New test.
>> * gcc.target/riscv/zvks.c: New test.
>> * gcc.target/riscv/zvksc-1.c: New test.
>> * gcc.target/riscv/zvksc-2.c: New test.
>> * gcc.target/riscv/zvksc.c: New test.
>> * gcc.target/riscv/zvksed.c: New test.
>> * gcc.target/riscv/zvksg-1.c: New test.
>> * gcc.target/riscv/zvksg-2.c: New test.
>> * gcc.target/riscv/zvksg.c: New test.
>> * gcc.target/riscv/zvksh.c: New test.
>> * gcc.target/riscv/zvkt.c: New test.
>>
>> Signed-off-by: Christoph Müllner 
>> ---
>> Changes for v2:
>> - Update patch for specification version v20230620
>>
>>  gcc/common/config/riscv/riscv-common.cc  | 55 
>>  gcc/config/riscv/arch-canonicalize   |  7 +++
>>  gcc/config/riscv/riscv-opts.h| 34 +++
>>  gcc/config/riscv/riscv.opt   |  6 +++
>>  gcc/testsuite/gcc.target/riscv/zvbb.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvbc.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkg.c| 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 29 +
>>  gcc/testsuite/gcc.target/riscv/zvkn.c| 29 +
>>  gcc/testsuite/gcc.target/riscv/zvknc-1.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknc-2.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknc.c   | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkned.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvkng-1.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkng-2.c | 37 
>>  gcc/testsuite/gcc.target/riscv/zvkng.c   | 37 
>>  gcc/testsuite/gcc.target/riscv/zvknha.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvknhb.c  | 13 ++
>>  gcc/testsuite/gcc.target/riscv/zvks-1.c  | 29 +
>>  gcc/testsuite/gcc.target/riscv/zvks.c| 29 +
>>  gcc/testsuite/gcc.target/riscv/zvksc-1.c | 37 

Re: [PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
Thanks! Applied to master with the requested changes as
417b8379b32945d61f1ce3d8281bee063eea1937.
Note that the final version factors out the duplicated logic, so we
now have a single place to add the comments.

Philipp.


On Sun, 25 Jun 2023 at 06:09, Jeff Law  wrote:
>
>
>
> On 6/22/23 05:11, Philipp Tomsich wrote:
> > From: Manolis Tsamis 
> >
> > Fixes: 6a2e8dcbbd4bab3
> >
> > Propagation for the stack pointer in regcprop was enabled in
> > 6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
> > stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).
> >
> > This fix adds special handling for stack_pointer_rtx in the places
> > where maybe_mode_change is called. This also adds an check in
> > maybe_mode_change to return the stack pointer only when the requested
> > mode matches the mode of stack_pointer_rtx.
> >
> >   PR 110308
> Should be
> PR debug/110308
>
>
> >
> > gcc/ChangeLog:
> >
> >   * regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
> >   (find_oldest_value_reg): Special handling of stack_pointer_rtx.
> >   (copyprop_hardreg_forward_1): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/torture/pr110308.C: New test.
> I don't doubt the need for the special handling of the stack pointer,
> but it's not obvious why it's needed.  So my request is that both hunks
> which specialize handling of ORIGINAL_REGNO, REG_ATTRS & REG_POINTER
> have a comment indicating why we must not adjust those values when
> NEW_RTX is STACK_POINTER_RTX.
>
> OK with that change.
>
> Jeff
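[To make the hazard Jeff asks to be documented concrete: GCC keeps one canonical stack_pointer_rtx object that every reference shares, so copying another register's attributes onto it would silently change all other users. The toy C model below (our names and simplification, not GCC code) mirrors the shape of the guard the patch introduces.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not GCC code) of the attribute-copy guard: one shared
   "stack pointer" object must never be mutated, because every user
   aliases the same singleton.  The guard only permits the replacement
   when no mutation of the shared object would be needed.  */

struct toy_reg
{
  int regno;
  int reg_pointer;     /* stands in for REG_POINTER */
  int original_regno;  /* stands in for ORIGINAL_REGNO */
};

static struct toy_reg toy_sp = { 2, 1, 2 };  /* the shared singleton */

/* Return NEW_REG on success, NULL if the propagation must be rejected.  */
static struct toy_reg *
toy_copy_reg_attrs (struct toy_reg *new_reg, const struct toy_reg *old_reg)
{
  if (new_reg != &toy_sp)
    {
      /* Ordinary register copy: safe to take over the old attributes.  */
      new_reg->original_regno = old_reg->original_regno;
      new_reg->reg_pointer = old_reg->reg_pointer;
      return new_reg;
    }
  /* Shared stack pointer: never mutate it; only allow exact matches.  */
  return new_reg->reg_pointer == old_reg->reg_pointer ? new_reg : NULL;
}
```

In the real pass the rejected case makes copyprop skip the replacement instead of corrupting the shared rtx.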


[COMMITTED, PR 110308] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-28 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds a check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR debug/110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(maybe_copy_reg_attrs): New function.
(find_oldest_value_reg): Use maybe_copy_reg_attrs.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---

 gcc/regcprop.cc | 52 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 29 ++
 2 files changed, 65 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..d28a4d5aca8 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -451,6 +451,31 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
   return NULL_RTX;
 }
 
+/* Helper function to copy attributes when replacing OLD_REG with NEW_REG.
+   If the changes required for NEW_REG are invalid return NULL_RTX, otherwise
+   return NEW_REG.  This is intended to be used with maybe_mode_change.  */
+
+static rtx
+maybe_copy_reg_attrs (rtx new_reg, rtx old_reg)
+{
+  if (new_reg != stack_pointer_rtx)
+{
+  /* NEW_REG is assumed to be a register copy resulting from
+maybe_mode_change.  */
+  ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (old_reg);
+  REG_ATTRS (new_reg) = REG_ATTRS (old_reg);
+  REG_POINTER (new_reg) = REG_POINTER (old_reg);
+}
+  else if (REG_POINTER (new_reg) != REG_POINTER (old_reg))
+{
+  /* Only a single instance of STACK_POINTER_RTX must exist and we cannot
+modify it.  Allow propagation if REG_POINTER for OLD_REG matches and
+don't touch ORIGINAL_REGNO and REG_ATTRS.  */
+  return NULL_RTX;
+}
+  return new_reg;
+}
+
 /* Find the oldest copy of the value contained in REGNO that is in
register class CL and has mode MODE.  If found, return an rtx
of that oldest register, otherwise return NULL.  */
@@ -486,12 +511,7 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
 
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
-   {
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
- return new_rtx;
-   }
+   return maybe_copy_reg_attrs (new_rtx, reg);
 }
 
   return NULL_RTX;
@@ -965,15 +985,15 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
	 if (validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ if (maybe_copy_reg_attrs (new_rtx, src))
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..36c6d382121
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+
+int channelCount, decodeBlock_o

Re: [PATCH] Change fma_reassoc_width tuning for ampere1

2023-06-22 Thread Philipp Tomsich
Richard,

OK for backport to GCC-13?

Thanks,
Philipp.

On Thu, 22 Jun 2023 at 16:18, Richard Sandiford via Gcc-patches
 wrote:
>
> Di Zhao OS via Gcc-patches  writes:
> > This patch enables reassociation of floating-point additions on ampere1.
> > This brings about 1% overall benefit on spec2017 fprate cases. (There
> > are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
> >
> > Thanks,
> > Di Zhao
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1
>
> Thanks, pushed to trunk.
>
> Richard
>
> > ---
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index d16565b5581..301c9f6c0cd 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
> >"32:12",   /* loop_align.  */
> >2, /* int_reassoc_width.  */
> >4, /* fp_reassoc_width.  */
> > -  1, /* fma_reassoc_width.  */
> > +  4, /* fma_reassoc_width.  */
> >2, /* vec_reassoc_width.  */
> >2, /* min_div_recip_mul_sf.  */
> >2, /* min_div_recip_mul_df.  */
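[For context on the tuning knob above: fma_reassoc_width bounds how many independent floating-point accumulation chains the reassociation pass may create. The sketch below is ours, not GCC's implementation; it shows the shape of the transform a width of 4 permits. With small integer-valued doubles both summation orders are exact, so the results compare equal.]

```c
#include <assert.h>

#define N 16

/* Width 1: one serial dependence chain; each FMA waits for the last.  */
static double
dot_serial (const double *a, const double *b)
{
  double acc = 0.0;
  for (int i = 0; i < N; i++)
    acc += a[i] * b[i];
  return acc;
}

/* Width 4: four independent partial sums that can retire in parallel,
   combined at the end -- the shape reassociation can produce when
   fma_reassoc_width is 4.  */
static double
dot_reassoc4 (const double *a, const double *b)
{
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  for (int i = 0; i < N; i += 4)
    {
      s0 += a[i] * b[i];
      s1 += a[i + 1] * b[i + 1];
      s2 += a[i + 2] * b[i + 2];
      s3 += a[i + 3] * b[i + 3];
    }
  return (s0 + s1) + (s2 + s3);
}
```

In general the two orders differ in rounding, which is why the transform only happens under reassociation-permitting flags.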


[PATCH] cprop_hardreg: fix ORIGINAL_REGNO/REG_ATTRS/REG_POINTER handling

2023-06-22 Thread Philipp Tomsich
From: Manolis Tsamis 

Fixes: 6a2e8dcbbd4bab3

Propagation for the stack pointer in regcprop was enabled in
6a2e8dcbbd4bab3, but set ORIGINAL_REGNO/REG_ATTRS/REG_POINTER for
stack_pointer_rtx which caused regression (e.g., PR 110313, PR 110308).

This fix adds special handling for stack_pointer_rtx in the places
where maybe_mode_change is called. This also adds a check in
maybe_mode_change to return the stack pointer only when the requested
mode matches the mode of stack_pointer_rtx.

PR 110308

gcc/ChangeLog:

* regcprop.cc (maybe_mode_change): Check stack_pointer_rtx mode.
(find_oldest_value_reg): Special handling of stack_pointer_rtx.
(copyprop_hardreg_forward_1): Ditto.

gcc/testsuite/ChangeLog:

* g++.dg/torture/pr110308.C: New test.

Signed-off-by: Manolis Tsamis 
Signed-off-by: Philipp Tomsich 

---
This addresses both the PRs (110308 and 110313) and was confirmed to
resolve the AArch64 bootstrap issue reported by Thiago.

OK for trunk?

 gcc/regcprop.cc | 43 +
 gcc/testsuite/g++.dg/torture/pr110308.C | 30 +
 2 files changed, 60 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/torture/pr110308.C

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index 6cbfadb181f..fe75b7f1fa0 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -423,7 +423,7 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
  It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
 {
-  if (orig_mode == new_mode)
+  if (orig_mode == new_mode && new_mode == GET_MODE (stack_pointer_rtx))
return stack_pointer_rtx;
   else
return NULL_RTX;
@@ -487,9 +487,14 @@ find_oldest_value_reg (enum reg_class cl, rtx reg, struct 
value_data *vd)
   new_rtx = maybe_mode_change (oldmode, vd->e[regno].mode, mode, i, regno);
   if (new_rtx)
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
- REG_ATTRS (new_rtx) = REG_ATTRS (reg);
- REG_POINTER (new_rtx) = REG_POINTER (reg);
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (reg);
+ REG_ATTRS (new_rtx) = REG_ATTRS (reg);
+ REG_POINTER (new_rtx) = REG_POINTER (reg);
+   }
+ else if (REG_POINTER (new_rtx) != REG_POINTER (reg))
+   return NULL_RTX;
  return new_rtx;
}
 }
@@ -965,15 +970,27 @@ copyprop_hardreg_forward_1 (basic_block bb, struct 
value_data *vd)
 
	 if (validate_change (insn, &SET_SRC (set), new_rtx, 0))
{
- ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
- REG_ATTRS (new_rtx) = REG_ATTRS (src);
- REG_POINTER (new_rtx) = REG_POINTER (src);
- if (dump_file)
-   fprintf (dump_file,
-"insn %u: replaced reg %u with %u\n",
-INSN_UID (insn), regno, REGNO (new_rtx));
- changed = true;
- goto did_replacement;
+ bool can_change;
+ if (new_rtx != stack_pointer_rtx)
+   {
+ ORIGINAL_REGNO (new_rtx) = ORIGINAL_REGNO (src);
+ REG_ATTRS (new_rtx) = REG_ATTRS (src);
+ REG_POINTER (new_rtx) = REG_POINTER (src);
+ can_change = true;
+   }
+ else
+   can_change
+ = (REG_POINTER (new_rtx) == REG_POINTER (src));
+
+ if (can_change)
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"insn %u: replaced reg %u with %u\n",
+INSN_UID (insn), regno, REGNO (new_rtx));
+ changed = true;
+ goto did_replacement;
+   }
}
  /* We need to re-extract as validate_change clobbers
 recog_data.  */
diff --git a/gcc/testsuite/g++.dg/torture/pr110308.C 
b/gcc/testsuite/g++.dg/torture/pr110308.C
new file mode 100644
index 000..ddd30d4fc3f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr110308.C
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-g2 -O2" } */
+
+int channelCount, decodeBlock_outputLength;
+struct BlockCodec {
+  virtual int decodeBlock(const unsigned char *, short *);
+};
+struct ms_adpcm_state {
+  char predictorIndex;
+  int sample1;
+  ms_adpcm_state();
+};
+bool decodeBlock_ok;
+void encodeBlock() { ms_adpcm_state(); }
+struct MSADPCM : BlockCodec {
+  int decodeBlock(const unsigned char *, short *);
+}

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-22 Thread Philipp Tomsich
This should be covered by PR110308 (proposed fix attached there) and PR110313.
Our bootstrap runs are still in progress to confirm.


On Thu, 22 Jun 2023 at 09:40, Richard Biener  wrote:
>
> On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
>  wrote:
> >
> >
> > Hello,
> >
> > Jeff Law  writes:
> >
> > > On 6/19/23 22:52, Tamar Christina wrote:
> > >
> > >>> It's a bit hackish, but could we reject the stack pointer for operand1 
> > >>> in the
> > >>> stack-tie?  And if we do so, does it help?
> > >> Yeah this one I had to defer until later this week to look at closer 
> > >> because what I'm
> > >> wondering about is whether the optimization should apply to frame related
> > >> RTX as well.
> > >> Looking at the description of RTX_FRAME_RELATED_P, this optimization 
> > >> may
> > >> end up de-optimizing RISC targets by creating an offset that is larger 
> > >> than the offset
> > >> which can be used from the SP, making reload have to spill.  i.e. 
> > >> sometimes the
> > >> move was explicitly done. So perhaps we should not apply it to
> > >> RTX_FRAME_RELATED_P in find_oldest_value_reg and 
> > >> copyprop_hardreg_forward_1?
> > >> Other parts of this pass already seems to bail out in similar 
> > >> situations. So I needed
> > >> to
> > >> write some testcases to check what would happen in these cases hence the 
> > >> deferral.
> > >> to later in the week.
> > > Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
> > > better in general to
> > > me.  The cases where we're looking to clean things up aren't really in the
> > > prologue/epilogue, but instead in the main body after register 
> > > elimination has turned fp
> > > into sp + offset, thus making all kinds of things no longer valid.
> >
> > The problems I reported were fixed by commits:
> >
> > 580b74a79146 "aarch64: Robustify stack tie handling"
> > 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
> >
> > Thanks!
> >
> > But unfortunately I'm still seeing bootstrap failures (ICE segmentation
> > fault) in today's trunk with build config bootstrap-lto in both
> > armv8l-linux-gnueabihf and aarch64-linux-gnu.
>
> If there's not yet a bugreport for this please make sure to open one so
> this issue doesn't get lost.
>
> > If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> > the stack pointer if possible" from trunk then both bootstraps succeed.
> >
> > Here's the command I'm using to build on armv8l:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --with-float=hard \
> > --with-fpu=neon-fp-armv8 \
> > --with-mode=thumb \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > --enable-languages=c,c++,fortran,lto \
> > && make \
> > profiledbootstrap \
> > SHELL=/bin/bash \
> > -w \
> > -j 40 \
> > CFLAGS_FOR_BUILD="-pipe -g -O2" \
> > CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> > LDFLAGS_FOR_BUILD="-static-libgcc" \
> > MAKEINFOFLAGS=--force \
> > BUILD_INFO="" \
> > MAKEINFO=echo
> >
> > And here's the slightly different one for aarch64-linux:
> >
> > ~/src/configure \
> > SHELL=/bin/bash \
> > --with-gnu-as \
> > --with-gnu-ld \
> > --disable-libmudflap \
> > --enable-lto \
> > --enable-shared \
> > --without-included-gettext \
> > --enable-nls \
> > --with-system-zlib \
> > --disable-sjlj-exceptions \
> > --enable-gnu-unique-object \
> > --enable-linker-build-id \
> > --disable-libstdcxx-pch \
> > --enable-c99 \
> > --enable-clocale=gnu \
> > --enable-libstdcxx-debug \
> > --enable-long-long \
> > --with-cloog=no \
> > --with-ppl=no \
> > --with-isl=no \
> > --disable-multilib \
> > --enable-fix-cortex-a53-835769 \
> > --enable-fix-cortex-a53-843419 \
> > --with-arch=armv8-a \
> > --enable-threads=posix \
> > --enable-multiarch \
> > --enable-libstdcxx-time=yes \
> > --enable-gnu-indirect-function \
> > --disable-werror \
> > --enable-checking=yes \
> > --enable-bootstrap \
> > --with-build-config=bootstrap-lto \
> > 

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-15 Thread Philipp Tomsich
Rebased, retested, and applied to trunk.  Thanks!
--Philipp.


On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> > propagation.
> Thanks for the clarification.  This is OK for the trunk.  It looks
> generic enough to have value going forward now rather than waiting.
>
> jeff


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-08 Thread Philipp Tomsich
On Thu 8. Jun 2023 at 16:17, Jeff Law  wrote:

>
>
> On 6/8/23 04:22, Kito Cheng wrote:
>
> >
> >
> > Oh, okay, I get the awkwardness point... I am OK with that on the gcc side, but I
> > would like binutils to support that first, or to remove the extension from the
> > mcpu temporarily until binutils supports it; otherwise it is just broken
> > support for that CPU on trunk gcc.
> I pushed the binutils bits into the repo a couple months ago:
>
> > commit 1656d3f8ef56a16745689c03269412988ebcaa54
> > Author: Philipp Tomsich 
> > Date:   Wed Apr 26 14:09:34 2023 -0600
> >
> > RISC-V: Support XVentanaCondOps extension
> [ ... ]
>
> I'd very much like to see the condops go into GCC as well, but I've been
> hesitant to move it forward myself.  We're still waiting on hardware and
> it wasn't clear to me that we really had consensus agreement to move the
> bits forward based on an announcement vs waiting on actual hardware
> availability (based on the comments from Palmer when I upstreamed the
> binutils bits).


Zicondops will go to ratification in the next couple of weeks, and the plan
is to revise the patches by then.

So I would propose that we move Zicond forward as that happens and (given
how small XVentanaCondOps is on-top of Zicond) we pick it up then.


> IIRC there was general consensus on rewriting the lowest level


That was part of the “moving forward”… this needs a rebase and a major
revision.


> primitives as if-then-else constructs.  Something like this:
>
> > (define_code_iterator eq_or_ne [eq ne])
> > (define_code_attr n [(eq "") (ne "n")])
> > (define_code_attr rev [(eq "n") (ne "")])
> >
> > (define_insn "*vt.maskc"
> >   [(set (match_operand:X 0 "register_operand" "=r")
> > (if_then_else:X
> >  (eq_or_ne (match_operand:X 1 "register_operand" "r")
> >  (const_int 0))
> >  (const_int 0)
> >  (match_operand:X 2 "register_operand" "r")))]
> >   "TARGET_XVENTANACONDOPS"
> >   "vt.maskc\t%0,%2,%1")
> >
> > (define_insn "*vt.maskc_reverse"
> >   [(set (match_operand:X 0 "register_operand" "=r")
> > (if_then_else:X
> >  (eq_or_ne (match_operand:X 1 "register_operand" "r")
> >  (const_int 0))
> >  (match_operand:X 2 "register_operand" "r")
> >  (const_int 0)))]
> >   "TARGET_XVENTANACONDOPS"
> >   "vt.maskc\t%0,%2,%1")
>
> That's what we're using internally these days.  I would expect zicond to
> work in exactly the same manner, but with a different instruction being
> emitted.
>
> We've also got bits here which wire this up in the conditional move
> expander and which adjust the ifcvt.cc bits from VRULL to use the
> if-then-else form.  All this will be useful for zicond as well.
>
> I don't mind letting zicond go first.  It's frozen so it ought to be
> non-controversial.  We can then upstream the various improvements to
> utilize zicond better.  That moves things forward in a meaningful manner
> and buys time to meet the hardware requirement for xventanacondops which
> will be trivial to add if zicond is already supported.
>
>
>
>
> Jeff
>
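[For reference, the if_then_else patterns quoted above encode a conditional-zero operation. A toy C model of our reading of the semantics (not taken from the sources): vt.maskc rd,rs,rc yields rs when rc is non-zero and 0 otherwise, and vt.maskcn the inverse; a full conditional move is then two masks or'd together, which is the expander strategy the thread mentions.]

```c
#include <assert.h>
#include <stdint.h>

/* vt.maskc  rd,rs,rc : rd = (rc != 0) ? rs : 0  (our reading)  */
static uint64_t
maskc (uint64_t rs, uint64_t rc)
{
  return rc != 0 ? rs : 0;
}

/* vt.maskcn rd,rs,rc : rd = (rc == 0) ? rs : 0  (our reading)  */
static uint64_t
maskcn (uint64_t rs, uint64_t rc)
{
  return rc == 0 ? rs : 0;
}

/* A full conditional move built from two masks and an OR, as in the
   expander strategy discussed above.  */
static uint64_t
cond_move (uint64_t rc, uint64_t if_nonzero, uint64_t if_zero)
{
  return maskc (if_nonzero, rc) | maskcn (if_zero, rc);
}
```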


Re: [PATCH] RISC-V: Add Veyron V1 pipeline description

2023-06-08 Thread Philipp Tomsich
On Thu 8. Jun 2023 at 09:35, Kito Cheng via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:

> > diff --git a/gcc/config/riscv/riscv-cores.def
> b/gcc/config/riscv/riscv-cores.def
> > index 7d87ab7ce28..4078439e562 100644
> > --- a/gcc/config/riscv/riscv-cores.def
> > +++ b/gcc/config/riscv/riscv-cores.def
> > @@ -38,6 +38,7 @@ RISCV_TUNE("sifive-3-series", generic,
> rocket_tune_info)
> >  RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
> >  RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
> >  RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
> > +RISCV_TUNE("veyron-v1", veyron_v1, veyron_v1_tune_info)
> >  RISCV_TUNE("size", generic, optimize_size_tune_info)
> >
> >  #undef RISCV_TUNE
> > @@ -77,4 +78,7 @@ RISCV_CORE("thead-c906",
> "rv64imafdc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_"
> >   "xtheadcondmov_xtheadfmemidx_xtheadmac_"
> >   "xtheadmemidx_xtheadmempair_xtheadsync",
> >   "thead-c906")
> > +
> > +RISCV_CORE("veyron-v1",
>  "rv64imafdc_zba_zbb_zbc_zbs_zifencei_xventanacondops",
> > + "veyron-v1")
>
> Seems like xventanacondops is not in trunk yet; I saw Jeff has
> approved it before, but it has not been committed yet


We couldn’t apply back then, as Veyron-V1 had been unannounced.
Can we move this forward now?

Philipp.

>


Re: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

2023-06-01 Thread Philipp Tomsich
On Thu, 1 Jun 2023 at 18:49, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 6/1/23 01:01, juzhe.zh...@rivai.ai wrote:
> > I plan to implement BF16 vector in GCC but still waiting for ISA
> > ratified since GCC policy doesn't allow un-ratified ISA.
> Right.  So those specs need to move along further before we can start
> integrating code.

Doesn't our policy require specs to only pass the FREEZE milestone
(i.e., the requirement for public review) before we can start
integrating them?
This should give us at least a 6 week (minimum 30 days public-review
plus 2 weeks for the TSC vote to send this up for ratification)
headstart on ratification (with the small risk of minor changes
required due to review comments) to start integrating support for new
extensions.

Best,
Philipp.

p.s.: Just for reference, the RISC-V Lifecycle Guide (defining these
milestones in specification development) is linked from
https://wiki.riscv.org/ for details.


> >
> > Currently, we are working on INT8,INT16,INT32,INT64,FP16,FP32,FP64
> > auto-vectorization.
> > It should be very simple to support BF16 in the current vector framework in GCC.
> In prior architectures I've worked on the bulk of BF16 work was just
> adding additional entries to existing iterators.  So I agree, it should
> be very simple :-)
>
> Jeff
>


Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Philipp Tomsich
Assuming a fully pipelined vector unit (and from experience on
AArch64), a u-arch's scalar-to-vector move cost is likely to play a
significant role in whether this will be profitable or not.

--Philipp.

On Wed, 31 May 2023 at 00:10, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/30/23 16:01, 钟居哲 wrote:
> > I agree with Andrew.
> >
> > And I don't think this patch is appropriate, for the following reasons:
> > 1. This patch increases the vector workload in the machine, since
> >   it converts scalar load + vmv.v.x into vmv.v.i + vsll.vi.
> This is probably uarch dependent.  I can probably construct cases where
> the first will be better and I can probably construct cases where the
> latter will be better.  In fact the recommendation from our uarch team
> is to generally do this stuff on the vector side.
>
>
>
> > 2. For multi-issue OoO machine, scalar instructions are very cheap
> >  when they are located in vector codegen. For example a sequence
> >  like this:
> >scalar insn
> >scalar insn
> >vector insn
> >scalar insn
> > vector insn
> >
> >In such a situation, we can issue multiple instructions simultaneously,
> >and the latency of the scalar instructions will be hidden, so the scalar
> > instructions
> >are cheap. Whereas this patch, by increasing the vector pipeline
> > workload, is not
> >friendly to the OoO machines I mentioned above.
> I probably need to be careful what I say here :-)  I'll go with mixing
> vector/scalar code may incur certain penalties on some
> microarchitectures depending on the exact code sequences involved.
>
>
> > 3.   I can imagine the only benefit of this patch is that we can reduce
> > scalar register pressure
> >in some extreme circumstances. However, I don't think this benefit is
> > "real", since GCC should
> >schedule the instruction sequence well once we tune the
> > vector instruction scheduling
> >model and cost model to make such register live ranges very short
> > when the scalar register
> >pressure is very high.
> >
> > Overal, I disagree with this patch.
> What I think this all argues is that it'll likely need to be uarch
> dependent.  I'm not yet sure how to describe the properties of the
> uarch in a concise manner to put into our costing structure yet though.
>
> jeff
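[To make the trade-off under discussion concrete: the two instruction sequences compute the same vector. A toy C rendering (ours, not the patch) of splatting the power-of-two 2^k across the lanes both ways:]

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 8

/* Before: materialise 2^k in a scalar register, then broadcast it.  */
static void
splat_via_scalar (uint64_t *v, unsigned k)
{
  uint64_t scalar = (uint64_t) 1 << k;           /* li   t0, 1<<k   */
  for (int i = 0; i < VLEN; i++)
    v[i] = scalar;                               /* vmv.v.x v, t0   */
}

/* After: splat the immediate 1, then shift every element left by k.  */
static void
splat_via_shift (uint64_t *v, unsigned k)
{
  for (int i = 0; i < VLEN; i++)
    v[i] = 1;                                    /* vmv.v.i v, 1    */
  for (int i = 0; i < VLEN; i++)
    v[i] <<= k;                                  /* vsll.vi v, v, k */
}
```

Which sequence wins depends, as the thread notes, on whether scalar work can issue in parallel with the vector pipeline on the given u-arch.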


Re: [PATCH] RISC-V: Optimize TARGET_XTHEADCONDMOV

2023-05-26 Thread Philipp Tomsich
LGTM.  Happy to move this forward, once it receives an OK from one of you.

--Philipp.

On Fri, 26 May 2023 at 02:53, Die Li  wrote:
>
> This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is 
> enabled.
>
> Provide an example from the existing testcases.
>
> Testcase:
> int ConEmv_imm_imm_reg(int x, int y){
>   if (x == 1000) return 10;
>   return y;
> }
>
> Cflags:
> -O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
>
> before patch:
> ConEmv_imm_imm_reg:
> addia5,a0,-1000
> li  a0,10
> th.mvneza0,zero,a5
> th.mveqza1,zero,a5
> or  a0,a0,a1
> ret
>
> after patch:
> ConEmv_imm_imm_reg:
> addia5,a0,-1000
> li  a0,10
> th.mvneza0,a1,a5
> ret
>
> Signed-off-by: Die Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_expand_conditional_move_onesided): 
> Delete.
> (riscv_expand_conditional_move):  Reuse the TARGET_SFB_ALU expand 
> process for TARGET_XTHEADCONDMOV.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
> * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
> ---
>  gcc/config/riscv/riscv.cc | 44 +++--
>  .../riscv/xtheadcondmov-indirect-rv32.c   | 48 +++
>  .../riscv/xtheadcondmov-indirect-rv64.c   | 48 +++
>  3 files changed, 42 insertions(+), 98 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 09fc9e5d95e..8b8ac9181ba 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3442,37 +3442,6 @@ riscv_expand_conditional_branch (rtx label, rtx_code 
> code, rtx op0, rtx op1)
>emit_jump_insn (gen_condjump (condition, label));
>  }
>
> -/* Helper to emit two one-sided conditional moves for the movecc.  */
> -
> -static void
> -riscv_expand_conditional_move_onesided (rtx dest, rtx cons, rtx alt,
> -   rtx_code code, rtx op0, rtx op1)
> -{
> -  machine_mode mode = GET_MODE (dest);
> -
> -  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
> -  gcc_assert (reg_or_0_operand (cons, mode));
> -  gcc_assert (reg_or_0_operand (alt, mode));
> -
> -  riscv_emit_int_compare (&code, &op0, &op1, true);
> -  rtx cond = gen_rtx_fmt_ee (code, mode, op0, op1);
> -
> -  rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -
> -  emit_insn (gen_rtx_SET (tmp1, gen_rtx_IF_THEN_ELSE (mode, cond,
> - cons, const0_rtx)));
> -
> -  /* We need to expand a sequence for both blocks and we do that such,
> - that the second conditional move will use the inverted condition.
> - We use temporaries that are or'd to the dest register.  */
> -  cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, mode, op0, op1);
> -  emit_insn (gen_rtx_SET (tmp2, gen_rtx_IF_THEN_ELSE (mode, cond,
> - alt, const0_rtx)));
> -
> -  emit_insn (gen_rtx_SET (dest, gen_rtx_IOR (mode, tmp1, tmp2)));
> - }
> -
>  /* Emit a cond move: If OP holds, move CONS to DEST; else move ALT to DEST.
> Return 0 if expansion failed.  */
>
> @@ -3483,6 +3452,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
> cons, rtx alt)
>rtx_code code = GET_CODE (op);
>rtx op0 = XEXP (op, 0);
>rtx op1 = XEXP (op, 1);
> +  bool need_eq_ne_p = false;
>
>if (TARGET_XTHEADCONDMOV
>&& GET_MODE_CLASS (mode) == MODE_INT
> @@ -3492,14 +3462,12 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
> cons, rtx alt)
>&& GET_MODE (op0) == mode
>&& GET_MODE (op1) == mode
>&& (code == EQ || code == NE))
> +need_eq_ne_p = true;
> +
> +  if (need_eq_ne_p || (TARGET_SFB_ALU
> +  && GET_MODE (op0) == word_mode))
>  {
> -  riscv_expand_conditional_move_onesided (dest, cons, alt, code, op0, 
> op1);
> -  return true;
> -}
> -  else if (TARGET_SFB_ALU
> -  && GET_MODE (op0) == word_mode)
> -{
> -  riscv_emit_int_compare (&code, &op0, &op1);
> +  riscv_emit_int_compare (&code, &op0, &op1, need_eq_ne_p);
>rtx cond = gen_rtx_fmt_ee (code, GET_MODE (op0), op0, op1);
>
>/* The expander allows (const_int 0) for CONS for the benefit of
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c 
> b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> index 9afdc2eabfd..e2b135f3d00 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
> @@ -1,15 +1,13 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -march=rv32gc_xtheadcondmov -mabi=ilp32 
> -mriscv-attribute" } */
> -/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" "-Og" } } */
> +/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } 
> */
>  /* { dg-final { check-function-bodies "**" ""  } } */
>
>  /*
>  
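[To check that the transformation in the cover letter preserves semantics, here is a toy C model — our reading of the T-Head conditional-move semantics, not code from the patch — of the before/after sequences: th.mvnez rd,rs,rc overwrites rd with rs when rc is non-zero, th.mveqz when rc is zero, and otherwise leaves rd unchanged.]

```c
#include <assert.h>
#include <stdint.h>

/* th.mvnez rd,rs,rc : rd = (rc != 0) ? rs : rd  (our reading)  */
static int64_t
mvnez (int64_t rd, int64_t rs, int64_t rc)
{
  return rc != 0 ? rs : rd;
}

/* th.mveqz rd,rs,rc : rd = (rc == 0) ? rs : rd  (our reading)  */
static int64_t
mveqz (int64_t rd, int64_t rs, int64_t rc)
{
  return rc == 0 ? rs : rd;
}

/* Before the patch: two one-sided masks or'd together (5 insns).  */
static int64_t
cond_before (int64_t x, int64_t y)
{
  int64_t a5 = x - 1000;          /* addi     a5,a0,-1000   */
  int64_t a0 = 10;                /* li       a0,10         */
  a0 = mvnez (a0, 0, a5);         /* th.mvnez a0,zero,a5    */
  int64_t a1 = mveqz (y, 0, a5);  /* th.mveqz a1,zero,a5    */
  return a0 | a1;                 /* or       a0,a0,a1      */
}

/* After the patch: a single conditional move (3 insns).  */
static int64_t
cond_after (int64_t x, int64_t y)
{
  int64_t a5 = x - 1000;          /* addi     a5,a0,-1000   */
  int64_t a0 = 10;                /* li       a0,10         */
  return mvnez (a0, y, a5);       /* th.mvnez a0,a1,a5      */
}
```

The OR in the before-sequence only works because exactly one of the two masked temporaries is zeroed, which is what makes the single conditional move equivalent.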

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-25 Thread Philipp Tomsich
On Thu, 25 May 2023 at 16:14, Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/25/23 07:50, Richard Biener wrote:
> > On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >  wrote:
> >>
> >>
> >>
> >> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
> >>> wrote:
> 
>  Implementation of the new RISC-V optimization pass for memory offset
>  calculations, documentation and testcases.
> >>>
> >>> Why do fwprop or combine not what you want to do?

At least for stack variables, the virtual-stack-vars register is not resolved
until reload.
So combine will be running much too early to be of any use (and I
haven't recently looked at whether one of the propagation passes runs
after).

Philipp.

> >> I think a lot of them end up coming from register elimination.
> >
> > Why isn't this a problem for other targets then?  Or maybe it is and this
> > shouldn't be a machine specific pass?  Maybe postreload-gcse should
> > perform strength reduction (I can't think of any other post reload pass
> > that would do something even remotely related).
> It is to some degree.  I ran into similar problems at my prior employer.
>   We ended up working around it in the target files in a different way
> -- which didn't work when I quickly tried it on RISC-V.
>
> Seems like it would be worth another investigative step as part of the
> evaluation of this patch.  I wasn't at 100% when I did that poking
> around many months ago.
>
> Jeff


Re: [PATCH] RISC-V: Add rounding mode operand for fixed-point patterns

2023-05-15 Thread Philipp Tomsich
On Mon, 15 May 2023 at 10:18,  wrote:
>
> From: Juzhe-Zhong 
>
> Since we are going to have fixed-point intrinsics that model the rounding
> mode
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222
>
> We should have an operand to specify the rounding mode in fixed-point
> instructions.  We don't support these rounding-mode-modeling intrinsics yet,
> but we will definitely support them later.
>
> This is a preparatory patch for the upcoming intrinsics.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (enum vxrm_field_enum): New enum.
> * config/riscv/riscv-vector-builtins.cc 
> (function_expander::use_exact_insn): Add default rounding mode operand.
> * config/riscv/riscv.cc (riscv_hard_regno_nregs): Add VXRM_RENUM.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_conditional_register_usage): Ditto.
> * config/riscv/riscv.h (DWARF_FRAME_REGNUM): Ditto.
> (VXRM_REG_P): Ditto.
> (RISCV_DWARF_VXRM): Ditto.
> * config/riscv/riscv.md: Ditto.
> * config/riscv/vector.md: Ditto.
>
> ---
>  gcc/config/riscv/riscv-protos.h   |  8 +++
>  gcc/config/riscv/riscv-vector-builtins.cc |  7 +++
>  gcc/config/riscv/riscv.cc |  5 +-
>  gcc/config/riscv/riscv.h  |  5 +-
>  gcc/config/riscv/riscv.md |  1 +
>  gcc/config/riscv/vector.md| 74 +--
>  6 files changed, 77 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index bc71f9cbbba..835bb802fc6 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -223,6 +223,14 @@ machine_mode preferred_simd_mode (scalar_mode);
>  opt_machine_mode get_mask_mode (machine_mode);
>  void expand_vec_series (rtx, rtx, rtx);
>  void expand_vec_init (rtx, rtx);
> +/* Rounding mode bitfield for fixed point VXRM.  */
> +enum vxrm_field_enum
> +{
> +  VXRM_RNU,
> +  VXRM_RNE,
> +  VXRM_RDN,
> +  VXRM_ROD
> +};
>  }
>
>  /* We classify builtin types into two classes:
> diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
> b/gcc/config/riscv/riscv-vector-builtins.cc
> index 0f56f29f7aa..1de075fb90d 100644
> --- a/gcc/config/riscv/riscv-vector-builtins.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins.cc
> @@ -3288,6 +3288,13 @@ function_expander::use_exact_insn (insn_code icode)
>
>if (base->apply_vl_p ())
>  add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
> +
> +  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
> mode.
> + We add default rounding mode for the intrinsics that didn't model 
> rounding
> + mode yet.  */
> +  if (opno != insn_data[icode].n_generator_args)
> +add_input_operand (Pmode, const0_rtx);
> +
>return generate_insn (icode);
>  }
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a770fdfaa0e..c9c8861f84a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -6082,7 +6082,7 @@ riscv_hard_regno_nregs (unsigned int regno, 
> machine_mode mode)
>
>/* mode for VL or VTYPE are just a marker, not holding value,
>   so it always consume one register.  */
> -  if (regno == VTYPE_REGNUM || regno == VL_REGNUM)
> +  if (regno == VTYPE_REGNUM || regno == VL_REGNUM || regno == VXRM_REGNUM)

Shouldn't this be VXRM_REG_P(...), VTYPE_REG_P(...), and VL_REG_P(...)?

>  return 1;
>
>/* Assume every valid non-vector mode fits in one vector register.  */
> @@ -6150,7 +6150,7 @@ riscv_hard_regno_mode_ok (unsigned int regno, 
> machine_mode mode)
>if (lmul != 1)
> return ((regno % lmul) == 0);
>  }
> -  else if (regno == VL_REGNUM || regno == VTYPE_REGNUM)
> +  else if (regno == VL_REGNUM || regno == VTYPE_REGNUM || regno == 
> VXRM_REGNUM)

Ditto.

>  return true;
>else
>  return false;
> @@ -6586,6 +6586,7 @@ riscv_conditional_register_usage (void)
>
>fixed_regs[VTYPE_REGNUM] = call_used_regs[VTYPE_REGNUM] = 1;
>fixed_regs[VL_REGNUM] = call_used_regs[VL_REGNUM] = 1;
> +  fixed_regs[VXRM_REGNUM] = call_used_regs[VXRM_REGNUM] = 1;
>  }
>  }
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 4473115d3a9..f74b70de562 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -121,7 +121,8 @@ ASM_MISA_SPEC
>
>  /* The mapping from gcc register number to DWARF 2 CFA column number.  */
>  #define DWARF_FRAME_REGNUM(REGNO)
>   \
> -  (VL_REG_P (REGNO) ? RISCV_DWARF_VL 
>   \
> +  (VXRM_REG_P (REGNO) ? RISCV_DWARF_VXRM 
>   \
> +   : VL_REG_P (REGNO) ? RISCV_DWARF_VL   
>   \
> : VTYPE_REG_P (REGNO) 
>   \
>   ? RISCV_DWARF_VTYPE 
>   

Re: [PATCH v3] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2023-05-11 Thread Philipp Tomsich
Bootstrapped and reg-tested overnight for x86 and aarch64.
Applied to master, thanks!

Philipp.

On Tue, 9 May 2023 at 09:13, Richard Biener  wrote:
>
> On Tue, Dec 20, 2022 at 1:23 PM Manolis Tsamis  
> wrote:
> >
> > When using SWAR (SIMD in a register) techniques a comparison operation 
> > within
> > such a register can be made by using a combination of shifts, bitwise and 
> > and
> > multiplication. If code using this scheme is vectorized then there is 
> > potential
> > to replace all these operations with a single vector comparison, by 
> > reinterpreting
> > the vector types to match the width of the SWAR register.
> >
> > For example, for the test function packed_cmp_16_32, the original generated 
> > code is:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > ushrv0.4s, v0.4s, 15
> > and v0.16b, v0.16b, v2.16b
> > shl v1.4s, v0.4s, 16
> > sub v0.4s, v1.4s, v0.4s
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > with this pattern the above can be optimized to:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > cmltv0.8h, v0.8h, #0
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > The effect is similar for x86-64.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Simplify vector shift + bit_and + multiply in some 
> > cases.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
>
> OK if it still bootstraps/tests OK.
>
> Thanks,
> Richard.
>
> > Signed-off-by: Manolis Tsamis 
> >
> > ---
> >
> > Changes in v3:
> > - Changed pattern to use vec_cond_expr.
> > - Changed pattern to work with VLA vector.
> > - Added both expand_vec_cmp_expr_p and
> >   expand_vec_cond_expr_p check.
> > - Fixed type compatibility issues.
> >
> >  gcc/match.pd  | 61 
> >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
> >  2 files changed, 133 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67a0a682f31..320437f8aa3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -301,6 +301,67 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (view_convert (bit_and:itype (view_convert @0)
> >  (ne @1 { build_zero_cst (type); })))
> >
> > +/* In SWAR (SIMD within a register) code a signed comparison of packed data
> > +   can be constructed with a particular combination of shift, bitwise and,
> > +   and multiplication by constants.  If that code is vectorized we can
> > +   convert this pattern into a more efficient vector comparison.  */
> > +(simplify
> > + (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
> > +   uniform_integer_cst_p@2)
> > +uniform_integer_cst_p@3)
> > + (with {
> > +   tree rshift_cst = uniform_integer_cst_p (@1);
> > +   tree bit_and_cst = uniform_integer_cst_p (@2);
> > +   tree mult_cst = uniform_integer_cst_p (@3);
> > +  }
> > +  /* Make sure we're working with vectors and uniform vector constants.  */
> > +  (if (VECTOR_TYPE_P (type)
> > +   && tree_fits_uhwi_p (rshift_cst)
> > +   && tree_fits_uhwi_p (mult_cst)
> > +   && tree_fits_uhwi_p (bit_and_cst))
> > +   /* Compute what constants would be needed for this to represent a packed
> > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > +   (with {
> > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
> > + poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
> > + poly_int64 vec_bits = vec_elem_bits * vec_nelts;
> > + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> > + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> > + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> > + mult_i = tree_to_uhwi (mult_cst);
> > + target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
> > + bit_and_i = tree_to_uhwi (bit_and_cst);
> > + target_bit_and_i = 0;
> > +
> > + /* The bit pattern in BIT_AND_I should be a mask for the least
> > +   significant bit of each packed element that is CMP_BITS wide.  */
> > + for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
> > +   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
> > +}
> > +(if ((exact_log2 (cmp_bits_i)) >= 0
> > +&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
> > +&& multiple_p (vec_bits, cmp_bits_i)
> > +&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
> > +&& target_mult_i == mult_i
> > +&& target_bit_and_i == bit_and_i)
> > + /* Compute the vector shape for the comparison and check if the 
> > target is
> > +   able to expand the comparison with that type.  */
> > + (with {
> > +   /* We're doing a signed comparison.  */
> > +   tree cmp_type = 

Re: [RFC PATCH v1 09/10] RISC-V: Recognize xventanacondops extension

2023-04-25 Thread Philipp Tomsich
The binutils support has been lingering on-list since Jan 2022:
   https://sourceware.org/pipermail/binutils/2022-January/119388.html

If we get an OK on that one, we will rebase, retest, and merge it.

Thanks,
Philipp.

On Tue, 25 Apr 2023 at 11:53, Kito Cheng  wrote:

> I am not sure if we should accept this on gcc trunk without binutils
> support?
>
> On Sat, Apr 22, 2023 at 3:58 AM Jeff Law via Gcc-patches
>  wrote:
> >
> >
> >
> > On 2/10/23 15:41, Philipp Tomsich wrote:
> > > This adds the xventanacondops extension to the option parsing and as a
> > > default for the ventana-vt1 core:
> > >
> > > gcc/Changelog:
> > >
> > >   * common/config/riscv/riscv-common.cc: Recognize
> > >"xventanacondops" as part of an architecture string.
> > >   * config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): Define.
> > >   (TARGET_XVENTANACONDOPS): Define.
> > >   * config/riscv/riscv.opt: Add "riscv_xventanacondops".
> > >
> > > Signed-off-by: Philipp Tomsich 
> > OK
> > jeff
>


Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-04-21 Thread Philipp Tomsich
Any guidance on the next steps for this patch?
I believe that we answered all open questions, but may have missed something.

With trunk open for new development, we would like to revise and land this…

Thanks,
Philipp.

On Mon, 20 Mar 2023 at 15:02, Manolis Tsamis  wrote:
>
> On Fri, Mar 17, 2023 at 10:31 AM Richard Biener
>  wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependency chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
>
> Indeed. I thought that without that change the original names wouldn't 
> properly
> describe what the parameter actually does and that's why I've changed it.
> I can undo that in the next revision.
>
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
>
> My reasoning is that zero is treated preferentially in most if not
> all architectures. Some specifically have zero/non-zero comparisons, so
> we get one less instruction. x86 doesn't explicitly have that, but I
> think that test reg, reg may not always be needed depending on the
> rest of the code. Judging by what Andrew mentions below, there may even be
> optimizations for zero at the microarchitecture level.
>
> Because this is still an arch-specific thing, I initially tried to make
> it arch-dependent by invoking the target's cost functions (e.g., if I
> recall correctly, aarch64 will return a lower cost for zero
> comparisons). But the code turned out complicated and messy, so I came
> up with this alternative that just treats zero preferentially.
>
> If you have in mind a way that this can be done in a better way I
> could try to implement it.
>
> > We do have quite some number of bugreports with regards to making VRPs
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there's other side-effects as well.  Maybe ranger 
> > meanwhile
> > can handle the above case?
> >
>
> Answered by Andrew MacLeod.
>
> > What's the overall effect of the change on a larger code base?
> >
>
> I made some quick runs of SPEC2017 and got the following results (# of
> folds of zero comparisons):
>
>  gcc        2586
>  xalancbmk  1456
>  perlbench   375
>  x264        307
>  omnetpp     137
>  leela        24
>  deepsjeng    15
>  exchange2     4
>  xz            4
>
> My test runs on Aarch64 do not show any significant change in runtime.
> In some cases (e.g. gcc) the binary is smaller in size, but that can
> depend on a number of other things.
>
> Thanks,
> Manolis
>
> > Thanks,
> > Richard.
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > > ldr w1, [x0]
> > > add w1, w1, 1
> > > str w1, [x0]
> > > cmp w1, 1
> > > beq .L4
> > > ret
> > >
> > > After
> > > ldr w1, [x0]
> > > add w2, w1, 1
> > > str w2, [x0]
> > > cbz w1, .L4
> > > ret
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > > (forward_propagate_into_comparison_1): Optimize
> > > for zero comparisons.
> > >
> > > Signed-off-by: Manolis Tsamis 
> > > ---
> > >
> > >  gcc/tree-ssa-forwprop.cc | 41 +++-
> > >  1 file changed, 28 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > > index e34f0888954..93d5043821b 100644
> > > --- a/gcc/tree-ssa-forwprop.cc
> > > +++ b/gcc/tree-ssa-forwprop.cc
> > > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> > >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > > the folded result in a form suitable for COND_EXPR_COND or
> > > NULL_TREE, if there is no suitable simplified form.  If
> > > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > > -   considered simplified.  */
> > > +   ALWAYS_COMBINE is false then only combine it the 

Re: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a

2023-04-17 Thread Philipp Tomsich
On Mon, 17 Apr 2023 at 17:07, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Monday, April 17, 2023 11:22 AM
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org; Di Zhao 
> > Subject: Re: [PATCH v2] aarch64: disable LDP via tuning structure for -
> > mcpu=ampere1/1a
> >
> > OK for backport?
> > This will be all the way down to GCC10, as I just realized that we
> > need to backport the entire ampere1/1a support to GCC10 (we stopped at
> > GCC11 for some unexplainable reason)...
>
> Ok, under the principle that we'd already backported the ampere1 support and 
> this is a small and unintrusive change.
> But I suppose the change for the branches shouldn't include the TODO note as 
> we would not be extending the LDP restriction support there.

Thanks for the reminder to drop the TODO.
I would have done a straight cherry-pick without this... and regretted it.

Philipp.

>
> Thanks,
> Kyrill
>
> >
> > Philipp.
> >
> >
> > On Mon, 17 Apr 2023 at 12:20, Philipp Tomsich 
> > wrote:
> > >
> > > Applied to master, thanks!
> > > Philipp.
> > >
> > > On Mon, 17 Apr 2023 at 11:56, Kyrylo Tkachov 
> > wrote:
> > >>
> > >>
> > >>
> > >> > -Original Message-
> > >> > From: Philipp Tomsich 
> > >> > Sent: Friday, April 14, 2023 7:06 PM
> > >> > To: gcc-patches@gcc.gnu.org
> > >> > Cc: Kyrylo Tkachov ; Philipp Tomsich
> > >> > ; Di Zhao 
> > >> > Subject: [PATCH v2] aarch64: disable LDP via tuning structure for -
> > >> > mcpu=ampere1/1a
> > >> >
> > >> > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > >> > Given the chance that this causes instructions to slip into the next
> > >> > decoding cycle and the additional overheads when handling
> > >> > cacheline-crossing LDP instructions, we disable the generation of LDP
> > instructions through the tuning structure from instruction combining
> > >> > (such as in peephole2).
> > >> >
> > >> > Given the code-density benefits in builtins and prologue/epilogue
> > >> > expansion, we allow LDPs there.
> > >> >
> > >> > This commit:
> > >> >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> > >> >  * allows -moverride=tune=... to override this
> > >> >
> > >> > These changes are benchmark-driven, yielding the following changes
> > >> > (with a net-overall improvement):
> > >> >503.bwaves_r    -0.88%
> > >> >507.cactuBSSN_r  0.35%
> > >> >508.namd_r       3.09%
> > >> >510.parest_r    -2.99%
> > >> >511.povray_r     5.54%
> > >> >519.lbm_r       15.83%
> > >> >521.wrf_r        0.56%
> > >> >526.blender_r    2.47%
> > >> >527.cam4_r       0.70%
> > >> >538.imagick_r    0.00%
> > >> >544.nab_r       -0.33%
> > >> >549.fotonik3d_r -0.42%
> > >> >554.roms_r       0.00%
> > >> >----------------------
> > >> >= total          1.79%
> > >> >
> > >> > Signed-off-by: Philipp Tomsich 
> > >> > Co-Authored-By: Di Zhao 
> > >>
> > >> Ok.
> > >> Thanks,
> > >> Kyrill
> > >>
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >   * config/aarch64/aarch64-tuning-flags.def
> > >> > (AARCH64_EXTRA_TUNING_OPTION):
> > >> >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> > >> >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> > >> >   Check for the above tuning option when processing loads.
> > >> >
> > >> > gcc/testsuite/ChangeLog:
> > >> >
> > >> >   * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
> > >> >
> > >> > ---
> > >> >
> > >> > Changes in v2:
> > >> > - apply both to -mcpu=ampere1 and -mcpu=ampere1a
> > >> > - add TODO: tag, per discussions on the mailing list
> > >> > - add testcase
> > >> >
> > >> >  gcc/config/aarch64/

Re: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a

2023-04-17 Thread Philipp Tomsich
OK for backport?
This will be all the way down to GCC10, as I just realized that we
need to backport the entire ampere1/1a support to GCC10 (we stopped at
GCC11 for some unexplainable reason)...

Philipp.


On Mon, 17 Apr 2023 at 12:20, Philipp Tomsich  wrote:
>
> Applied to master, thanks!
> Philipp.
>
> On Mon, 17 Apr 2023 at 11:56, Kyrylo Tkachov  wrote:
>>
>>
>>
>> > -----Original Message-
>> > From: Philipp Tomsich 
>> > Sent: Friday, April 14, 2023 7:06 PM
>> > To: gcc-patches@gcc.gnu.org
>> > Cc: Kyrylo Tkachov ; Philipp Tomsich
>> > ; Di Zhao 
>> > Subject: [PATCH v2] aarch64: disable LDP via tuning structure for -
>> > mcpu=ampere1/1a
>> >
>> > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
>> > Given the chance that this causes instructions to slip into the next
>> > decoding cycle and the additional overheads when handling
>> > cacheline-crossing LDP instructions, we disable the generation of LDP
>> > isntructions through the tuning structure from instruction combining
>> > (such as in peephole2).
>> >
>> > Given the code-density benefits in builtins and prologue/epilogue
>> > expansion, we allow LDPs there.
>> >
>> > This commit:
>> >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
>> >  * allows -moverride=tune=... to override this
>> >
>> > These changes are benchmark-driven, yielding the following changes
>> > (with a net-overall improvement):
>> >503.bwaves_r    -0.88%
>> >507.cactuBSSN_r  0.35%
>> >508.namd_r       3.09%
>> >510.parest_r    -2.99%
>> >511.povray_r     5.54%
>> >519.lbm_r       15.83%
>> >521.wrf_r        0.56%
>> >526.blender_r    2.47%
>> >527.cam4_r       0.70%
>> >538.imagick_r    0.00%
>> >544.nab_r       -0.33%
>> >549.fotonik3d_r -0.42%
>> >554.roms_r       0.00%
>> >----------------------
>> >= total          1.79%
>> >
>> > Signed-off-by: Philipp Tomsich 
>> > Co-Authored-By: Di Zhao 
>>
>> Ok.
>> Thanks,
>> Kyrill
>>
>> >
>> > gcc/ChangeLog:
>> >
>> >   * config/aarch64/aarch64-tuning-flags.def
>> > (AARCH64_EXTRA_TUNING_OPTION):
>> >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
>> >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
>> >   Check for the above tuning option when processing loads.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >   * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
>> >
>> > ---
>> >
>> > Changes in v2:
>> > - apply both to -mcpu=ampere1 and -mcpu=ampere1a
>> > - add TODO: tag, per discussions on the mailing list
>> > - add testcase
>> >
>> >  gcc/config/aarch64/aarch64-tuning-flags.def|  3 +++
>> >  gcc/config/aarch64/aarch64.cc  | 18 --
>> >  .../aarch64/ampere1-no_ldp_combine.c   | 11 +++
>> >  3 files changed, 30 insertions(+), 2 deletions(-)
>> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-
>> > no_ldp_combine.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
>> > b/gcc/config/aarch64/aarch64-tuning-flags.def
>> > index 712895a5263..52112ba7c48 100644
>> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
>> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
>> > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
>> > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
>> >  /* Disallow load/store pair instructions on Q-registers.  */
>> >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
>> > NO_LDP_STP_QREGS)
>> >
>> > +/* Disallow load-pair instructions to be formed in combine/peephole.  */
>> > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
>> > NO_LDP_COMBINE)
>> > +
>> >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
>> > RENAME_LOAD_REGS)
>> >
>> >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
>> > CSE_SVE_VL_CONSTANTS)
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index f4ef22ce02f..0f04ab9fba0 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc

Re: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a

2023-04-17 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Mon, 17 Apr 2023 at 11:56, Kyrylo Tkachov  wrote:

>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Friday, April 14, 2023 7:06 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Philipp Tomsich
> > ; Di Zhao 
> > Subject: [PATCH v2] aarch64: disable LDP via tuning structure for -
> > mcpu=ampere1/1a
> >
> > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > Given the chance that this causes instructions to slip into the next
> > decoding cycle and the additional overheads when handling
> > cacheline-crossing LDP instructions, we disable the generation of LDP
> > isntructions through the tuning structure from instruction combining
> > (such as in peephole2).
> >
> > Given the code-density benefits in builtins and prologue/epilogue
> > expansion, we allow LDPs there.
> >
> > This commit:
> >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> >  * allows -moverride=tune=... to override this
> >
> > These changes are benchmark-driven, yielding the following changes
> > (with a net-overall improvement):
> >503.bwaves_r    -0.88%
> >507.cactuBSSN_r  0.35%
> >508.namd_r       3.09%
> >510.parest_r    -2.99%
> >511.povray_r     5.54%
> >519.lbm_r       15.83%
> >521.wrf_r        0.56%
> >526.blender_r    2.47%
> >527.cam4_r       0.70%
> >538.imagick_r    0.00%
> >544.nab_r       -0.33%
> >549.fotonik3d_r -0.42%
> >554.roms_r       0.00%
> >----------------------
> >= total          1.79%
> >
> > Signed-off-by: Philipp Tomsich 
> > Co-Authored-By: Di Zhao 
>
> Ok.
> Thanks,
> Kyrill
>
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-tuning-flags.def
> > (AARCH64_EXTRA_TUNING_OPTION):
> >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> >   Check for the above tuning option when processing loads.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.
> >
> > ---
> >
> > Changes in v2:
> > - apply both to -mcpu=ampere1 and -mcpu=ampere1a
> > - add TODO: tag, per discussions on the mailing list
> > - add testcase
> >
> >  gcc/config/aarch64/aarch64-tuning-flags.def|  3 +++
> >  gcc/config/aarch64/aarch64.cc  | 18 --
> >  .../aarch64/ampere1-no_ldp_combine.c   | 11 +++
> >  3 files changed, 30 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-
> > no_ldp_combine.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> > b/gcc/config/aarch64/aarch64-tuning-flags.def
> > index 712895a5263..52112ba7c48 100644
> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
> > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> >  /* Disallow load/store pair instructions on Q-registers.  */
> >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> > NO_LDP_STP_QREGS)
> >
> > +/* Disallow load-pair instructions to be formed in combine/peephole.  */
> > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> > NO_LDP_COMBINE)
> > +
> >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> > RENAME_LOAD_REGS)
> >
> >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> > CSE_SVE_VL_CONSTANTS)
> > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > index f4ef22ce02f..0f04ab9fba0 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1933,7 +1933,7 @@ static const struct tune_params ampere1_tunings =
> >2, /* min_div_recip_mul_df.  */
> >0, /* max_case_values.  */
> >tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> > -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> > +  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),   /* tune_flags.  */
> >_prefetch_tune
> >  };
> >
> > @@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings
> > =
> >2, /* min_div_recip_mul_df.  */
> >0, /* max_case_values.  */
> >tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  

[PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a

2023-04-14 Thread Philipp Tomsich
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
instructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r    -0.88%
   507.cactuBSSN_r  0.35%
   508.namd_r       3.09%
   510.parest_r    -2.99%
   511.povray_r     5.54%
   519.lbm_r       15.83%
   521.wrf_r        0.56%
   526.blender_r    2.47%
   527.cam4_r       0.70%
   538.imagick_r    0.00%
   544.nab_r       -0.33%
   549.fotonik3d_r -0.42%
   554.roms_r       0.00%
   ----------------------
   = total          1.79%

Signed-off-by: Philipp Tomsich 
Co-Authored-By: Di Zhao 

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

---

Changes in v2:
- apply both to -mcpu=ampere1 and -mcpu=ampere1a
- add TODO: tag, per discussions on the mailing list
- add testcase

 gcc/config/aarch64/aarch64-tuning-flags.def|  3 +++
 gcc/config/aarch64/aarch64.cc  | 18 --
 .../aarch64/ampere1-no_ldp_combine.c   | 11 +++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index 712895a5263..52112ba7c48 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", 
CHEAP_SHIFT_EXTEND)
 /* Disallow load/store pair instructions on Q-registers.  */
 AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
 
+/* Disallow load-pair instructions to be formed in combine/peephole.  */
+AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
+
 AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
 
 AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f4ef22ce02f..0f04ab9fba0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1933,7 +1933,7 @@ static const struct tune_params ampere1_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
   _prefetch_tune
 };
 
@@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
   _prefetch_tune
 };
 
@@ -26053,6 +26053,20 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool 
load,
   enum reg_class rclass_1, rclass_2;
   rtx mem_1, mem_2, reg_1, reg_2;
 
+  /* Allow the tuning structure to disable LDP instruction formation
+ from combining instructions (e.g., in peephole2).
+ TODO: Implement fine-grained tuning control for LDP and STP:
+  1. control policies for load and store separately;
+  2. support the following policies:
+ - default (use what is in the tuning structure)
+ - always
+ - never
+ - aligned (only if the compiler can prove that the
+   load will be aligned to 2 * element_size)  */
+  if (load && (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
+return false;
+
   if (load)
 {
   mem_1 = operands[1];
diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c 
b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
new file mode 100644
index 000..bc871f4481d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O3 -mtune=ampere1" } */
+
+long
+foo (long a[])
+{
+  return a[0] + a[1];
+}
+
+/* We should see 

Re: [PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-14 Thread Philipp Tomsich
On Fri, 14 Apr 2023 at 13:02, Kyrylo Tkachov  wrote:

> Hi Philipp,
>
> From: Philipp Tomsich 
> Sent: Friday, April 14, 2023 11:26 AM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Di Zhao 
> Subject: Re: [PATCH] aarch64: disable LDP via tuning structure for
> -mcpu=ampere1
>
>
>
> On Fri, 14 Apr 2023 at 11:31, Philipp Tomsich <philipp.toms...@vrull.eu> wrote:
> Kyrylo,
>
> On Fri, 14 Apr 2023 at 11:21, Kyrylo Tkachov <kyrylo.tkac...@arm.com> wrote:
> >
> > Hi Philipp,
> >
> > > -Original Message-
> > > From: Philipp Tomsich <philipp.toms...@vrull.eu>
> > > Sent: Friday, April 14, 2023 12:22 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; Philipp Tomsich
> > > <philipp.toms...@vrull.eu>; Di Zhao <di.z...@amperecomputing.com>
> > > Subject: [PATCH] aarch64: disable LDP via tuning structure for -
> > > mcpu=ampere1
> > >
> > > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > > Given the chance that this causes instructions to slip into the next
> > > decoding cycle and the additional overheads when handling
> > > cacheline-crossing LDP instructions, we disable the generation of LDP
> > > instructions through the tuning structure from instruction combining
> > > (such as in peephole2).
> > >
> > > Given the code-density benefits in builtins and prologue/epilogue
> > > expansion, we allow LDPs there.
> >
> > LDPs are indeed quite an important part of the ISA for code density and
> there are, in principle, second-order benefits from using them, like
> keeping the instruction cache footprint low (which can be important for
> large workloads).
> > Did you gather some benchmarks showing a benefit of disabling them in
> this manner?
>
> >This has been benchmark-driven, but I need to follow up separately (as
> > the final numbers are with the folks that have access to the
> > benchmark machines).
>
> > Here are the numbers for the submitted change for AmpereOne:
> >    503.bwaves_r      -0.88%
> >    507.cactuBSSN_r    0.35%
> >    508.namd_r         3.09%
> >    510.parest_r      -2.99%
> >    511.povray_r       5.54%
> >    519.lbm_r         15.83%
> >    521.wrf_r          0.56%
> >    526.blender_r      2.47%
> >    527.cam4_r         0.70%
> >    538.imagick_r      0.00%
> >    544.nab_r         -0.33%
> >    549.fotonik3d_r   -0.42%
> >    554.roms_r         0.00%
> >    = total            1.79%
>
> Thanks for getting these, the gains are quite significant.
>
> >
> > > This commit:
> > >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> > >  * allows -moverride=tune=... to override this
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
> > > Co-Authored-By: Di Zhao <di.z...@amperecomputing.com>
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-tuning-flags.def
> > > (AARCH64_EXTRA_TUNING_OPTION):
> > >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> > >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> > >   Check for the above tuning option when processing loads.
> > >
> > > ---
> > >
> > >  gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
> > >  gcc/config/aarch64/aarch64.cc   | 8 +++-
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > index 712895a5263..52112ba7c48 100644
> > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
> > > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> > >  /* Disallow load/store pair instructions on Q-registers.  */
> > >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> > > NO_LDP_STP_QREGS)
> > >
> > > +/* Disallow load-pair instructions to be formed in combine/peephole.
> */
> > > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> > > NO_LDP_COMBINE)
> > > +
> > >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> > > RENAME_LOAD_REGS)
> > >
> > >  AARCH64_EXTR

Re: [PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-14 Thread Philipp Tomsich
On Fri, 14 Apr 2023 at 11:31, Philipp Tomsich 
wrote:

> Kyrylo,
>
> On Fri, 14 Apr 2023 at 11:21, Kyrylo Tkachov 
> wrote:
> >
> > Hi Philipp,
> >
> > > -Original Message-
> > > From: Philipp Tomsich 
> > > Sent: Friday, April 14, 2023 12:22 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov ; Philipp Tomsich
> > > ; Di Zhao 
> > > Subject: [PATCH] aarch64: disable LDP via tuning structure for -
> > > mcpu=ampere1
> > >
> > > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > > Given the chance that this causes instructions to slip into the next
> > > decoding cycle and the additional overheads when handling
> > > cacheline-crossing LDP instructions, we disable the generation of LDP
> > > instructions through the tuning structure from instruction combining
> > > (such as in peephole2).
> > >
> > > Given the code-density benefits in builtins and prologue/epilogue
> > > expansion, we allow LDPs there.
> >
> > LDPs are indeed quite an important part of the ISA for code density and
> there are, in principle, second-order benefits from using them, like
> keeping the instruction cache footprint low (which can be important for
> large workloads).
> > Did you gather some benchmarks showing a benefit of disabling them in
> this manner?
>
> This has been benchmark-driven, but I need to follow up separately (as
> the final numbers are with the folks that have access to the
> benchmark machines).
>

Here are the numbers for the submitted change for AmpereOne:
   503.bwaves_r      -0.88%
   507.cactuBSSN_r    0.35%
   508.namd_r         3.09%
   510.parest_r      -2.99%
   511.povray_r       5.54%
   519.lbm_r         15.83%
   521.wrf_r          0.56%
   526.blender_r      2.47%
   527.cam4_r         0.70%
   538.imagick_r      0.00%
   544.nab_r         -0.33%
   549.fotonik3d_r   -0.42%
   554.roms_r         0.00%
   = total            1.79%


> >
> > > This commit:
> > >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> > >  * allows -moverride=tune=... to override this
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > Co-Authored-By: Di Zhao 
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-tuning-flags.def
> > > (AARCH64_EXTRA_TUNING_OPTION):
> > >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> > >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> > >   Check for the above tuning option when processing loads.
> > >
> > > ---
> > >
> > >  gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
> > >  gcc/config/aarch64/aarch64.cc   | 8 +++-
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > index 712895a5263..52112ba7c48 100644
> > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
> > > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> > >  /* Disallow load/store pair instructions on Q-registers.  */
> > >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> > > NO_LDP_STP_QREGS)
> > >
> > > +/* Disallow load-pair instructions to be formed in combine/peephole.
> */
> > > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> > > NO_LDP_COMBINE)
> > > +
> > >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> > > RENAME_LOAD_REGS)
> > >
> > >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> > > CSE_SVE_VL_CONSTANTS)
> > > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > > index f4ef22ce02f..8dc1a9ceb17 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings
> > > =
> > >2, /* min_div_recip_mul_df.  */
> > >0, /* max_case_values.  */
> > >tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> > > -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> > > +  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),   /* tune_flags.  */
> > >_prefetch_tune
> > >  };
>

Re: [PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-14 Thread Philipp Tomsich
For phase 1, we plan to replace this with a feature to allow
finer-grained control over when to use LDP or STP (i.e., control these
independently) with the following scopes and policies:
 - scopes are: { sched-fusion, mem, pro/epilogue, peephole }
 - policies are: { default (from tuning), always, never, aligned (to
2x element size) }
Happy to get this fuller solution already onto the list, if it helps
with forward-progress on the localised change.

The current patch tries to be minimally invasive (i.e., it doesn't touch STP).
It intentionally avoids modifying the sched-fusion logic (which
requires refactoring, as it doesn't differentiate between the load and
store cases), pro/epilogue creation and mem* function expansion.

Philipp.

On Fri, 14 Apr 2023 at 11:31, Philipp Tomsich  wrote:
>
> Kyrylo,
>
> On Fri, 14 Apr 2023 at 11:21, Kyrylo Tkachov  wrote:
> >
> > Hi Philipp,
> >
> > > -Original Message-
> > > From: Philipp Tomsich 
> > > Sent: Friday, April 14, 2023 12:22 AM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Kyrylo Tkachov ; Philipp Tomsich
> > > ; Di Zhao 
> > > Subject: [PATCH] aarch64: disable LDP via tuning structure for -
> > > mcpu=ampere1
> > >
> > > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > > Given the chance that this causes instructions to slip into the next
> > > decoding cycle and the additional overheads when handling
> > > cacheline-crossing LDP instructions, we disable the generation of LDP
> > > isntructions through the tuning structure from instruction combining
> > > (such as in peephole2).
> > >
> > > Given the code-density benefits in builtins and prologue/epilogue
> > > expansion, we allow LDPs there.
> >
> > LDPs are indeed quite an important part of the ISA for code density and 
> > there are, in principle, second-order benefits from using them, like 
> > keeping the instruction cache footprint low (which can be important for 
> > large workloads).
> > Did you gather some benchmarks showing a benefit of disabling them in this 
> > manner?
>
>
> This has been benchmark-driven, but I need to follow up separately (as
> the final numbers are with the folks that have access to the
> benchmark machines).
>
> >
> > > This commit:
> > >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> > >  * allows -moverride=tune=... to override this
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > Co-Authored-By: Di Zhao 
> > >
> > > gcc/ChangeLog:
> > >
> > >   * config/aarch64/aarch64-tuning-flags.def
> > > (AARCH64_EXTRA_TUNING_OPTION):
> > >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> > >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> > >   Check for the above tuning option when processing loads.
> > >
> > > ---
> > >
> > >  gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
> > >  gcc/config/aarch64/aarch64.cc   | 8 +++-
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > index 712895a5263..52112ba7c48 100644
> > > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
> > > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> > >  /* Disallow load/store pair instructions on Q-registers.  */
> > >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> > > NO_LDP_STP_QREGS)
> > >
> > > +/* Disallow load-pair instructions to be formed in combine/peephole.  */
> > > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> > > NO_LDP_COMBINE)
> > > +
> > >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> > > RENAME_LOAD_REGS)
> > >
> > >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> > > CSE_SVE_VL_CONSTANTS)
> > > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > > index f4ef22ce02f..8dc1a9ceb17 100644
> > > --- a/gcc/config/aarch64/aarch64.cc
> > > +++ b/gcc/config/aarch64/aarch64.cc
> > > @@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings
> > > =
> > >2, /* min_div_recip_mul_df.  */
> > >0, /* max_case_values.  */
> > >tune_params::AUTOPREFETCHER_WEA

Re: [PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-14 Thread Philipp Tomsich
Kyrylo,

On Fri, 14 Apr 2023 at 11:21, Kyrylo Tkachov  wrote:
>
> Hi Philipp,
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Friday, April 14, 2023 12:22 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Kyrylo Tkachov ; Philipp Tomsich
> > ; Di Zhao 
> > Subject: [PATCH] aarch64: disable LDP via tuning structure for -
> > mcpu=ampere1
> >
> > AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
> > Given the chance that this causes instructions to slip into the next
> > decoding cycle and the additional overheads when handling
> > cacheline-crossing LDP instructions, we disable the generation of LDP
> > instructions through the tuning structure from instruction combining
> > (such as in peephole2).
> >
> > Given the code-density benefits in builtins and prologue/epilogue
> > expansion, we allow LDPs there.
>
> LDPs are indeed quite an important part of the ISA for code density and there 
> are, in principle, second-order benefits from using them, like keeping the 
> instruction cache footprint low (which can be important for large workloads).
> Did you gather some benchmarks showing a benefit of disabling them in this 
> manner?


This has been benchmark-driven, but I need to follow up separately (as
the final numbers are with the folks that have access to the
benchmark machines).

>
> > This commit:
> >  * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
> >  * allows -moverride=tune=... to override this
> >
> > Signed-off-by: Philipp Tomsich 
> > Co-Authored-By: Di Zhao 
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-tuning-flags.def
> > (AARCH64_EXTRA_TUNING_OPTION):
> >   Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
> >   * config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
> >   Check for the above tuning option when processing loads.
> >
> > ---
> >
> >  gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
> >  gcc/config/aarch64/aarch64.cc   | 8 +++-
> >  2 files changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> > b/gcc/config/aarch64/aarch64-tuning-flags.def
> > index 712895a5263..52112ba7c48 100644
> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > @@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION
> > ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
> >  /* Disallow load/store pair instructions on Q-registers.  */
> >  AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs",
> > NO_LDP_STP_QREGS)
> >
> > +/* Disallow load-pair instructions to be formed in combine/peephole.  */
> > +AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine",
> > NO_LDP_COMBINE)
> > +
> >  AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs",
> > RENAME_LOAD_REGS)
> >
> >  AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants",
> > CSE_SVE_VL_CONSTANTS)
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index f4ef22ce02f..8dc1a9ceb17 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings
> > =
> >2, /* min_div_recip_mul_df.  */
> >0, /* max_case_values.  */
> >tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  */
> > -  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
> > +  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),   /* tune_flags.  */
> >_prefetch_tune
> >  };
> >
> > @@ -26053,6 +26053,12 @@ aarch64_operands_ok_for_ldpstp (rtx
> > *operands, bool load,
> >enum reg_class rclass_1, rclass_2;
> >rtx mem_1, mem_2, reg_1, reg_2;
> >
> > +  /* Allow the tuning structure to disable LDP instruction formation
> > + from combining instructions (e.g., in peephole2).  */
> > +  if (load && (aarch64_tune_params.extra_tuning_flags
> > +& AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
> > +return false;
>
> If we do decide to do this, I think this is not a complete approach. See the 
> similar tuning flag AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS.
> There's various other places in the backend that would need to be adjusted to 
> avoid bringing loads together for the peephole2s to merge (the sched_fusion 
> stuff).
> Plus there's the cpymem expansions that would generate load pairs too...

I have add-on patches for these, but given that I don't have direct
access to the benchmarking machine and the benchmarks have been run
with this functionality only, I didn't submit them for the time being.
Do you see a path to get this in during the current cycle, deferring
the add-on patches (happy to resubmit as a series)?

> We'd want some testcases added to check that LDPs are blocked too...
>
> Thanks,
> Kyrill
>
> > +
> >if (load)
> >  {
> >mem_1 = operands[1];
> > --
> > 2.34.1
>


[PATCH] aarch64: disable LDP via tuning structure for -mcpu=ampere1

2023-04-13 Thread Philipp Tomsich
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
instructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=... to override this

Signed-off-by: Philipp Tomsich 
Co-Authored-By: Di Zhao 

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.

---

 gcc/config/aarch64/aarch64-tuning-flags.def | 3 +++
 gcc/config/aarch64/aarch64.cc   | 8 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 712895a5263..52112ba7c48 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 /* Disallow load/store pair instructions on Q-registers.  */
 AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
 
+/* Disallow load-pair instructions to be formed in combine/peephole.  */
+AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
+
 AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
 
 AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f4ef22ce02f..8dc1a9ceb17 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
   _prefetch_tune
 };
 
@@ -26053,6 +26053,12 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   enum reg_class rclass_1, rclass_2;
   rtx mem_1, mem_2, reg_1, reg_2;
 
+  /* Allow the tuning structure to disable LDP instruction formation
+ from combining instructions (e.g., in peephole2).  */
+  if (load && (aarch64_tune_params.extra_tuning_flags
+  & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
+return false;
+
   if (load)
 {
   mem_1 = operands[1];
-- 
2.34.1



Re: [PATCH] RISC-V: avoid splitting small constant in i_extrabit pattern

2023-04-10 Thread Philipp Tomsich
On Mon, 10 Apr 2023 at 17:57, Jeff Law  wrote:
>
>
>
> On 4/9/23 23:07, Lin Sinan via Gcc-patches wrote:
> > From: Sinan Lin 
> >
> > there is no need to split an xori/ori with a small constant. Take the test
> > case `int foo(int idx) { return idx|3; }` as an example,
> >
> > rv64im_zba generates:
> >  ori a0,a0,3
> >  ret
> > but, rv64im_zba_zbs generates:
> >  ori a0,a0,1
> >  ori a0,a0,2
> >  ret
> >
> > with this change, insn `ori r2,r1,3` will not be split in zbs.
> > ---
> >   gcc/config/riscv/predicates.md |  2 +-
> >   .../gcc.target/riscv/zbs-extra-bit-or-twobits.c| 14 ++
> >   2 files changed, 15 insertions(+), 1 deletion(-)
> >   create mode 100644 
> > gcc/testsuite/gcc.target/riscv/zbs-extra-bit-or-twobits.c
> A minor oversight in the VRULL patches in this space.  This is actually
> a regression as we were previously generating the single [xo]ri.

Thanks for catching this one!

I looked this change over and it looks fine.  I hope this is the last
fallout from this set of changes.

>
> The patch looks fine, though it does need to go through a test cycle.
>
> jeff
>


Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-04-03 Thread Philipp Tomsich
Kyrill,

We reran on GCC12 and GCC11, reproducing the same improvements (e.g.,
on fotonik3d) that prompted the changes.
I'll apply the backports later this week, unless you have any further concerns…

Thanks,
Philipp.


On Mon, 27 Mar 2023 at 11:24, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Philipp Tomsich 
> > Sent: Monday, March 27, 2023 9:50 AM
> > To: Kyrylo Tkachov 
> > Cc: gcc-patches@gcc.gnu.org; Richard Sandiford
> > ; Tamar Christina
> > ; Manolis Tsamis 
> > Subject: Re: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov 
> > wrote:
> > >
> > > Hi Philipp,
> > >
> > > > -----Original Message-
> > > > From: Gcc-patches  > > > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > > > Tomsich
> > > > Sent: Monday, March 27, 2023 8:47 AM
> > > > To: gcc-patches@gcc.gnu.org
> > > > Cc: Richard Sandiford ; Tamar Christina
> > > > ; Philipp Tomsich
> > ;
> > > > Manolis Tsamis 
> > > > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> > > >
> > > > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > > > prior to exhaustive testing of vectorizable workloads against
> > > > hardware.
> > > >
> > > > Adjust the vector costs to achieve the best results and more closely
> > > > match the underlying hardware.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> > > >
> > > > Co-Authored-By: Manolis Tsamis 
> > > >
> > > > Signed-off-by: Philipp Tomsich 
> > > > ---
> > > > We would like to get this into GCC 13 to avoid having to backport at
> > > > the start of the next cycle.
> > > >
> > >
> > > Given this affects only the ampere1 costs that sounds fine to me and 
> > > fairly
> > low risk, you are being trusted that these costs are actually desirable and
> > properly validated on the hardware involved.
> > >
> > > > OK for backports?
> > >
> > > This is ok for trunk (GCC 13). Do you also want to backport this to other
> > branches?
> >
> > Ampere1 (with the older vector costs) are in GCC12 and GCC11.
> > I would like to backport to those as well.
>
> Ok then, though you may want to run the benchmarks on the branches as well to 
> make sure the costs give the expected benefit there as well.
> Thanks,
> Kyrill
>
> >
> > Thanks,
> > Philipp.
> >
> > > Thanks,
> > > Kyrill
> > >
> > > >
> > > >  gcc/config/aarch64/aarch64.cc | 12 ++--
> > > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc
> > > > index b27f4354031..661fff65cea 100644
> > > > --- a/gcc/config/aarch64/aarch64.cc
> > > > +++ b/gcc/config/aarch64/aarch64.cc
> > > > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > > > thunderx3t110_vector_cost =
> > > >
> > > >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> > > >  {
> > > > -  3, /* int_stmt_cost  */
> > > > +  1, /* int_stmt_cost  */
> > > >3, /* fp_stmt_cost  */
> > > >0, /* ld2_st2_permute_cost  */
> > > >0, /* ld3_st3_permute_cost  */
> > > > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > > > ampere1_advsimd_vector_cost =
> > > >8, /* store_elt_extra_cost  */
> > > >6, /* vec_to_scalar_cost  */
> > > >7, /* scalar_to_vec_cost  */
> > > > -  5, /* align_load_cost  */
> > > > -  5, /* unalign_load_cost  */
> > > > -  2, /* unalign_store_cost  */
> > > > -  2  /* store_cost  */
> > > > +  4, /* align_load_cost  */
> > > > +  4, /* unalign_load_cost  */
> > > > +  1, /* unalign_store_cost  */
> > > > +  1  /* store_cost  */
> > > >  };
> > > >
> > > >  /* Ampere-1 costs for vector insn classes.  */
> > > >  static const struct cpu_vector_cost ampere1_vector_cost =
> > > >  {
> > > >1, /* scalar_int_stmt_cost  */
> > > > -  1, /* scalar_fp_stmt_cost  */
> > > > +  3, /* scalar_fp_stmt_cost  */
> > > >4, /* scalar_load_cost  */
> > > >1, /* scalar_store_cost  */
> > > >1, /* cond_taken_branch_cost  */
> > > > --
> > > > 2.34.1
> > >


Re: [PATCH] target/109296 - riscv: Add missing mode specifiers for XTheadMemPair

2023-03-27 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Mon, 27 Mar 2023 at 19:55, Kito Cheng  wrote:
>
> OK for trunk, thanks :)
>
> On Mon, Mar 27, 2023 at 7:04 PM Christoph Muellner 
>  wrote:
>>
>> From: Christoph Müllner 
>>
>> This patch adds missing mode specifiers for XTheadMemPair INSNs.
>>
>> gcc/ChangeLog:
>> PR target/109296
>> * config/riscv/thead.md: Add missing mode specifiers.
>>
>> Signed-off-by: Christoph Müllner 
>> ---
>>  gcc/config/riscv/thead.md | 16 
>>  1 file changed, 8 insertions(+), 8 deletions(-)
>>
>> diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
>> index 63c4af6f77d..0623607d3dc 100644
>> --- a/gcc/config/riscv/thead.md
>> +++ b/gcc/config/riscv/thead.md
>> @@ -321,10 +321,10 @@ (define_insn "*th_mempair_store_2"
>>
>>  ;; MEMPAIR load DI extended signed SI
>>  (define_insn "*th_mempair_load_extendsidi2"
>> -  [(set (match_operand 0 "register_operand" "=r")
>> -   (sign_extend:DI (match_operand 1 "memory_operand" "m")))
>> -   (set (match_operand 2 "register_operand" "=r")
>> -   (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +   (sign_extend:DI (match_operand:SI 1 "memory_operand" "m")))
>> +   (set (match_operand:DI 2 "register_operand" "=r")
>> +   (sign_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
>> && th_mempair_operands_p (operands, true, SImode)"
>>{ return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
>> @@ -334,10 +334,10 @@ (define_insn "*th_mempair_load_extendsidi2"
>>
>>  ;; MEMPAIR load DI extended unsigned SI
>>  (define_insn "*th_mempair_load_zero_extendsidi2"
>> -  [(set (match_operand 0 "register_operand" "=r")
>> -   (zero_extend:DI (match_operand 1 "memory_operand" "m")))
>> -   (set (match_operand 2 "register_operand" "=r")
>> -   (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>> +   (zero_extend:DI (match_operand:SI 1 "memory_operand" "m")))
>> +   (set (match_operand:DI 2 "register_operand" "=r")
>> +   (zero_extend:DI (match_operand:SI 3 "memory_operand" "m")))]
>>"TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
>> && th_mempair_operands_p (operands, true, SImode)"
>>{ return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
>> --
>> 2.39.2
>>


Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov  wrote:
>
> Hi Philipp,
>
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Tamar Christina
> > ; Philipp Tomsich ;
> > Manolis Tsamis 
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis 
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly 
> low risk, you are being trusted that these costs are actually desirable and 
> properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other 
> branches?
> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >3, /* fp_stmt_cost  */
> >0, /* ld2_st2_permute_cost  */
> >0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >8, /* store_elt_extra_cost  */
> >6, /* vec_to_scalar_cost  */
> >7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >4, /* scalar_load_cost  */
> >1, /* scalar_store_cost  */
> >1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>


Re: [PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich
On Mon, 27 Mar 2023 at 16:45, Kyrylo Tkachov  wrote:
>
> Hi Philipp,
>
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Monday, March 27, 2023 8:47 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Tamar Christina
> > ; Philipp Tomsich ;
> > Manolis Tsamis 
> > Subject: [PATCH] aarch64: update ampere1 vectorization cost
> >
> > The original submission of AmpereOne (-mcpu=ampere1) costs occurred
> > prior to exhaustive testing of vectorizable workloads against
> > hardware.
> >
> > Adjust the vector costs to achieve the best results and more closely
> > match the underlying hardware.
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64.cc: Update vector costs for ampere1.
> >
> > Co-Authored-By: Manolis Tsamis 
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> > We would like to get this into GCC 13 to avoid having to backport at
> > the start of the next cycle.
> >
>
> Given this affects only the ampere1 costs that sounds fine to me and fairly 
> low risk, you are being trusted that these costs are actually desirable and 
> properly validated on the hardware involved.
>
> > OK for backports?
>
> This is ok for trunk (GCC 13). Do you also want to backport this to other 
> branches?

Ampere1 (with the older vector costs) are in GCC12 and GCC11.
I would like to backport to those as well.

Thanks,
Philipp.

> Thanks,
> Kyrill
>
> >
> >  gcc/config/aarch64/aarch64.cc | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index b27f4354031..661fff65cea 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost
> > thunderx3t110_vector_cost =
> >
> >  static const advsimd_vec_cost ampere1_advsimd_vector_cost =
> >  {
> > -  3, /* int_stmt_cost  */
> > +  1, /* int_stmt_cost  */
> >3, /* fp_stmt_cost  */
> >0, /* ld2_st2_permute_cost  */
> >0, /* ld3_st3_permute_cost  */
> > @@ -1148,17 +1148,17 @@ static const advsimd_vec_cost
> > ampere1_advsimd_vector_cost =
> >8, /* store_elt_extra_cost  */
> >6, /* vec_to_scalar_cost  */
> >7, /* scalar_to_vec_cost  */
> > -  5, /* align_load_cost  */
> > -  5, /* unalign_load_cost  */
> > -  2, /* unalign_store_cost  */
> > -  2  /* store_cost  */
> > +  4, /* align_load_cost  */
> > +  4, /* unalign_load_cost  */
> > +  1, /* unalign_store_cost  */
> > +  1  /* store_cost  */
> >  };
> >
> >  /* Ampere-1 costs for vector insn classes.  */
> >  static const struct cpu_vector_cost ampere1_vector_cost =
> >  {
> >1, /* scalar_int_stmt_cost  */
> > -  1, /* scalar_fp_stmt_cost  */
> > +  3, /* scalar_fp_stmt_cost  */
> >4, /* scalar_load_cost  */
> >1, /* scalar_store_cost  */
> >1, /* cond_taken_branch_cost  */
> > --
> > 2.34.1
>


[PATCH] aarch64: update ampere1 vectorization cost

2023-03-27 Thread Philipp Tomsich
The original submission of AmpereOne (-mcpu=ampere1) costs occurred
prior to exhaustive testing of vectorizable workloads against
hardware.

Adjust the vector costs to achieve the best results and more closely
match the underlying hardware.

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Update vector costs for ampere1.

Co-Authored-By: Manolis Tsamis 

Signed-off-by: Philipp Tomsich 
---
We would like to get this into GCC 13 to avoid having to backport at
the start of the next cycle.

OK for backports?

 gcc/config/aarch64/aarch64.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index b27f4354031..661fff65cea 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1132,7 +1132,7 @@ static const struct cpu_vector_cost 
thunderx3t110_vector_cost =
 
 static const advsimd_vec_cost ampere1_advsimd_vector_cost =
 {
-  3, /* int_stmt_cost  */
+  1, /* int_stmt_cost  */
   3, /* fp_stmt_cost  */
   0, /* ld2_st2_permute_cost  */
   0, /* ld3_st3_permute_cost  */
@@ -1148,17 +1148,17 @@ static const advsimd_vec_cost 
ampere1_advsimd_vector_cost =
   8, /* store_elt_extra_cost  */
   6, /* vec_to_scalar_cost  */
   7, /* scalar_to_vec_cost  */
-  5, /* align_load_cost  */
-  5, /* unalign_load_cost  */
-  2, /* unalign_store_cost  */
-  2  /* store_cost  */
+  4, /* align_load_cost  */
+  4, /* unalign_load_cost  */
+  1, /* unalign_store_cost  */
+  1  /* store_cost  */
 };
 
 /* Ampere-1 costs for vector insn classes.  */
 static const struct cpu_vector_cost ampere1_vector_cost =
 {
   1, /* scalar_int_stmt_cost  */
-  1, /* scalar_fp_stmt_cost  */
+  3, /* scalar_fp_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
   1, /* cond_taken_branch_cost  */
-- 
2.34.1



Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Philipp Tomsich
On Fri, 17 Mar 2023 at 09:31, Richard Biener  wrote:
>
> On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> wrote:
> >
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> > g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of +1) and on targets with compare zero instructions
> > the generated code is one instruction shorter.
>
> The downside is we now need two registers and their lifetime overlaps.
>
> Your patch mixes changing / inverting a parameter (which seems unneeded
> for the actual change) with preferring compares against zero.
>
> What's the reason to specifically prefer compares against zero?  On x86
> we have add that sets flags, so ++*a == 0 would be preferred, but
> for your sequence we'd need a test reg, reg; branch on zero, so we do
> not save any instruction.

AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
comparing against a constant requires loading any non-zero value into
a register first.
This feels a bit like we need to call into the backend to check
whether comparisons against 0 are cheaper.

Obviously, the underlying issue becomes worse if the immediate cannot
be built up in a single instruction.
Using RISC-V as an example (primarily, as RISC-V makes it particularly
easy to run into multi-instruction sequences for constants), we can
construct the following case:

  void f(unsigned int *a)
  {
if ((*a += 0x900) == 0x900)
   g();
  }

which GCC 12.2.0 (trunk may already be smart enough to reuse the
constant once loaded into a register, but I did not check…) with -O3
turns into:

f:
  lw a4,0(a0)
  li a5,4096
  addiw a5,a5,-1792
  addw a4,a5,a4
  li a5,4096
  sw a4,0(a0)
  addi a5,a5,-1792
  beq a4,a5,.L4
  ret
.L4:
  tail g
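Stepping back, the equivalence the proposed transform relies on, namely that for unsigned wrap-around arithmetic "++*a == 1" holds exactly when the old value of *a was 0, can be sanity-checked with a short C sketch (function names are illustrative, not from the patch):

```c
#include <assert.h>
#include <limits.h>

/* Compare after the increment, as GCC currently emits it.  */
int after_increment (unsigned int *a)
{
  return ++*a == 1;
}

/* Compare the old value against zero, as the patch prefers.  Since
   x -> x + 1 is a bijection on unsigned ints, old == 0 iff new == 1.  */
int before_increment (unsigned int *a)
{
  unsigned int old = (*a)++;
  return old == 0;
}
```

Both variants leave the same value in memory; only the comparison operand changes.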

Thanks,
Philipp.


On Fri, 17 Mar 2023 at 09:31, Richard Biener  wrote:
>
> On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> wrote:
> >
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >   if (++*a == 1)
> > g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of +1) and on targets with compare zero instructions
> > the generated code is one instruction shorter.
>
> The downside is we now need two registers and their lifetime overlaps.
>
> Your patch mixes changing / inverting a parameter (which seems unneeded
> for the actual change) with preferring compares against zero.
>
> What's the reason to specifically prefer compares against zero?  On x86
> we have add that sets flags, so ++*a == 0 would be preferred, but
> for your sequence we'd need a test reg, reg; branch on zero, so we do
> not save any instruction.
>
> We do have quite some number of bugreports with regards to making VRPs
> life harder when splitting things this way.  It's easier for VRP to handle
>
>   _1 = _2 + 1;
>   if (_1 == 1)
>
> than it is
>
>   _1 = _2 + 1;
>   if (_2 == 0)
>
> where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> the life-range issue there's other side-effects as well.  Maybe ranger 
> meanwhile
> can handle the above case?
>
> What's the overall effect of the change on a larger code base?
>
> Thanks,
> Richard.
>
> >
> > Example from Aarch64:
> >
> > Before
> > ldr w1, [x0]
> > add w1, w1, 1
> > str w1, [x0]
> > cmp w1, 1
> > beq .L4
> > ret
> >
> > After
> > ldr w1, [x0]
> > add w2, w1, 1
> > str w2, [x0]
> > cbz w1, .L4
> > ret
> >
> > gcc/ChangeLog:
> >
> > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > (forward_propagate_into_comparison_1): Optimize
> > for zero comparisons.
> >
> > Signed-off-by: Manolis Tsamis 
> > ---
> >
> >  gcc/tree-ssa-forwprop.cc | 41 +++-
> >  1 file changed, 28 insertions(+), 13 deletions(-)
> >
> > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
> > index e34f0888954..93d5043821b 100644
> > --- a/gcc/tree-ssa-forwprop.cc
> > +++ b/gcc/tree-ssa-forwprop.cc
> > @@ -373,12 +373,13 @@ rhs_to_tree (tree type, gimple *stmt)
> >  /* Combine OP0 CODE OP1 in the context of a COND_EXPR.  Returns
> > the folded result in a form suitable for COND_EXPR_COND or
> > NULL_TREE, if there is no suitable simplified form.  If
> > -   INVARIANT_ONLY is true only gimple_min_invariant results are
> > -   considered simplified.  */
> > +   ALWAYS_COMBINE is false then only combine it the resulting
> > +   expression is 

Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-16 Thread Philipp Tomsich
Just to add a bit more color on this one...
It was originally observed (and isolated from)
_ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_ and
reproduces both for AArch64 and RISC-V.

The basic block (annotated with dynamic instructions executed and
percentage of total dynamic instructions) looks as follows:

>   0x00511488 4589868875 0.4638%
> _ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_
>   4518  lw  a4,8(a0)
>   0017029b  addiw   t0,a4,1
>   00552423  sw  t0,8(a0)
>   4685  addia3,zero,1
>   00d28363  beq t0,a3,6 # 0x51149a


This change reduces the instruction count on RISC-V by one compressible
instruction (2 bytes) and on AArch64 by one instruction (4 bytes).
No execution time improvement (measured on Neoverse-N1) — as would be
expected.

--Philipp.


On Thu, 16 Mar 2023 at 17:41, Jeff Law  wrote:

>
>
> On 3/16/23 09:27, Manolis Tsamis wrote:
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >if (++*a == 1)
> >  g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of +1) and on targets with compare zero instructions
> > the generated code is one instruction shorter.
> >
> > Example from Aarch64:
> >
> > Before
> >  ldr w1, [x0]
> >  add w1, w1, 1
> >  str w1, [x0]
> >  cmp w1, 1
> >  beq .L4
> >  ret
> >
> > After
> >  ldr w1, [x0]
> >  add w2, w1, 1
> >  str w2, [x0]
> >  cbz w1, .L4
> >  ret
> >
> > gcc/ChangeLog:
> >
> >  * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> >  (forward_propagate_into_comparison_1): Optimize
> >  for zero comparisons.
> Deferring to gcc-14.  Though I'm generally supportive of normalizing to
> a comparison against zero when we safely can :-)
>
> jeff
>


Re: [wwwdocs] gcc-13: riscv: Document the T-Head CPU support

2023-03-15 Thread Philipp Tomsich
Applied to master, thanks!
Philipp.

On Sun, 5 Mar 2023 at 11:18, Kito Cheng  wrote:

> LGTM :)
>
>
> On Fri, Feb 24, 2023 at 7:19 PM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch documents the new T-Head CPU support for RISC-V.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  htdocs/gcc-13/changes.html | 24 +++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> > index a803f501..ce5ba35c 100644
> > --- a/htdocs/gcc-13/changes.html
> > +++ b/htdocs/gcc-13/changes.html
> > @@ -490,7 +490,29 @@ a work-in-progress.
> >
> >  RISC-V
> >  
> > -New ISA extension support for zawrs.
> > +  New ISA extension support for Zawrs.
> > +  Support for the following vendor extensions has been added:
> > +
> > +  XTheadBa
> > +  XTheadBb
> > +  XTheadBs
> > +  XTheadCmo
> > +  XTheadCondMov
> > +  XTheadFMemIdx
> > +  XTheadFmv
> > +  XTheadInt
> > +  XTheadMac
> > +  XTheadMemIdx
> > +  XTheadMemPair
> > +  XTheadSync
> > +
> > +  
> > +  The following new CPUs are supported through the
> -mcpu
> > +  option (GCC identifiers in parentheses).
> > +
> > +  T-Head's XuanTie C906 (thead-c906).
> > +
> > +  
> >  
> >
> >  
> > --
> > 2.39.2
> >
>


Re: [PATCH v4 0/9] RISC-V: Add XThead* extension support

2023-03-15 Thread Philipp Tomsich
On Sun, 5 Mar 2023 at 11:19, Kito Cheng  wrote:

> LGTM :)
>

Applied to master, thanks!
--Philipp.

On Thu, Mar 2, 2023 at 4:36 PM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This series introduces support for the T-Head specific RISC-V ISA
> extensions
> > which are available e.g. on the T-Head XuanTie C906.
> >
> > The ISA spec can be found here:
> >   https://github.com/T-head-Semi/thead-extension-spec
> >
> > This series adds support for the following XThead* extensions:
> > * XTheadBa
> > * XTheadBb
> > * XTheadBs
> > * XTheadCmo
> > * XTheadCondMov
> > * XTheadFmv
> > * XTheadInt
> > * XTheadMac
> > * XTheadMemPair
> > * XTheadSync
> >
> > All extensions are properly integrated and the included tests
> > demonstrate the improvements of the generated code.
> >
> > The series also introduces support for "-mcpu=thead-c906", which also
> > enables all available XThead* ISA extensions of the T-Head C906.
> >
> > All patches have been tested and don't introduce regressions for RV32 or
> RV64.
> > The patches have also been tested with SPEC CPU2017 on QEMU and real HW
> > (D1 board).
> >
> > Support patches for these extensions for Binutils, QEMU, and LLVM have
> > already been merged in the corresponding upstream projects.
> >
> > Patches 1-8 from this series (everything except the last one) got an ACK
> > by Kito. However, since there were a few comments after the ACK, I
> > decided to send out a v4, so that reviewers can verify that their
> > comments have been addressed properly.
> >
> > Note, that there was a concern raised by Andrew Pinski (on CC), which
> > might not be resolved with this series (I could not reproduce the issue,
> > but I might have misunderstood something).
> >
> > Changes in v4:
> > - Drop XTheadMemIdx and XTheadFMemIdx (will be a follow-up series)
> > - Replace 'immediate_operand' by 'const_int_operand' in many patterns
> > - Small cleanups in XTheadBb
> > - Factor out C code into thead.cc (XTheadMemPair) to minimize changes in
> >   riscv.cc
> >
> > Changes in v3:
> > - Bugfix in XTheadBa
> > - Rewrite of XTheadMemPair
> > - Inclusion of XTheadMemIdx and XTheadFMemIdx
> >
> > Christoph Müllner (9):
> >   riscv: Add basic XThead* vendor extension support
> >   riscv: riscv-cores.def: Add T-Head XuanTie C906
> >   riscv: thead: Add support for the XTheadBa ISA extension
> >   riscv: thead: Add support for the XTheadBs ISA extension
> >   riscv: thead: Add support for the XTheadBb ISA extension
> >   riscv: thead: Add support for the XTheadCondMov ISA extensions
> >   riscv: thead: Add support for the XTheadMac ISA extension
> >   riscv: thead: Add support for the XTheadFmv ISA extension
> >   riscv: thead: Add support for the XTheadMemPair ISA extension
> >
> >  gcc/common/config/riscv/riscv-common.cc   |  26 ++
> >  gcc/config.gcc|   1 +
> >  gcc/config/riscv/bitmanip.md  |  52 ++-
> >  gcc/config/riscv/constraints.md   |   8 +
> >  gcc/config/riscv/iterators.md |   4 +
> >  gcc/config/riscv/peephole.md  |  56 +++
> >  gcc/config/riscv/riscv-cores.def  |   4 +
> >  gcc/config/riscv/riscv-opts.h |  26 ++
> >  gcc/config/riscv/riscv-protos.h   |  16 +-
> >  gcc/config/riscv/riscv.cc | 226 +++--
> >  gcc/config/riscv/riscv.md |  67 ++-
> >  gcc/config/riscv/riscv.opt|   3 +
> >  gcc/config/riscv/t-riscv  |   4 +
> >  gcc/config/riscv/thead.cc | 427 ++
> >  gcc/config/riscv/thead.md | 346 ++
> >  .../gcc.target/riscv/mcpu-thead-c906.c|  28 ++
> >  .../gcc.target/riscv/xtheadba-addsl.c |  55 +++
> >  gcc/testsuite/gcc.target/riscv/xtheadba.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-ext.c |  20 +
> >  .../gcc.target/riscv/xtheadbb-extu-2.c|  22 +
> >  .../gcc.target/riscv/xtheadbb-extu.c  |  22 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-ff1.c |  18 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb-rev.c |  45 ++
> >  .../gcc.target/riscv/xtheadbb-srri.c  |  25 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbb.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbs-tst.c |  13 +
> >  gcc/testsuite/gcc.target/riscv/xtheadbs.c |  14 +
> >  gcc/testsuite/gcc.target/riscv/xtheadcmo.c|  14 +
> >  .../riscv/xtheadcondmov-mveqz-imm-eqz.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-imm-not.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-reg-eqz.c   |  38 ++
> >  .../riscv/xtheadcondmov-mveqz-reg-not.c   |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-imm-cond.c  |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-imm-nez.c   |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-reg-cond.c  |  38 ++
> >  .../riscv/xtheadcondmov-mvnez-reg-nez.c   |  38 ++
> >  .../gcc.target/riscv/xtheadcondmov.c  |  14 

Re: [PATCH] RISC-V: costs: miscomputed shiftadd_cost triggering synth_mult [PR/108987]

2023-03-01 Thread Philipp Tomsich
On Wed, 1 Mar 2023 at 20:53, Vineet Gupta  wrote:
>
> This showed up as dynamic icount regression in SPEC 531.deepsjeng with 
> upstream
> gcc (vs. gcc 12.2). gcc was resorting to synthetic multiply using shift+add(s)
> even when multiply had clear cost benefit.
>
> |000133b8  .constprop.0]+0x382>:
> |   133b8:  srl a3,a1,s6
> |   133bc:  and a3,a3,s5
> |   133c0:  sllia4,a3,0x9
> |   133c4:  add a4,a4,a3
> |   133c6:  sllia4,a4,0x9
> |   133c8:  add a4,a4,a3
> |   133ca:  sllia3,a4,0x1b
> |   133ce:  add a4,a4,a3
>
> vs. gcc 12 doing something like below.
>
> |000131c4  .constprop.0]+0x35c>:
> |   131c4:  ld  s1,8(sp)
> |   131c6:  srl a3,a1,s4
> |   131ca:  and a3,a3,s11
> |   131ce:  mul a3,a3,s1
>
> Bisected this to f90cb39235c4 ("RISC-V: costs: support shift-and-add in
> strength-reduction"). The intent was to optimize cost for
> shift-add-pow2-{1,2,3} corresponding to bitmanip insns SH*ADD, but ended
> up doing that for all shift values which seems to favor synthezing
> multiply among others.
>
> The bug itself is trivial, IN_RANGE() calling pow2p_hwi() which returns bool
> vs. exact_log2() returning power of 2.
>
> This fix also requires update to the test introduced by the same commit
> which now generates MUL vs. synthesizing it.
>
> gcc/Changelog:
>
> * config/riscv/riscv.cc (riscv_rtx_costs): Fixed IN_RANGE() to
>   use exact_log2().
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zba-shNadd-07.c: f2(i*783) now generates MUL vs.
>   5 insn sh1add+slli+add+slli+sub.
> * gcc.target/riscv/pr108987.c: New test.
>
> Signed-off-by: Vineet Gupta 
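The type mismatch at the heart of the fix can be illustrated outside of GCC with minimal stand-ins for the two helpers (these reimplementations are sketches for illustration, not GCC's actual code):

```c
#include <assert.h>
#include <stdbool.h>

#define IN_RANGE(x, lo, hi) ((x) >= (lo) && (x) <= (hi))

/* Stand-in for pow2p_hwi: returns a bool, i.e. only ever 0 or 1.  */
bool my_pow2p (unsigned long v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* Stand-in for exact_log2: the shift amount, or -1 if V is not a
   power of 2.  */
int my_exact_log2 (unsigned long v)
{
  if (!(v != 0 && (v & (v - 1)) == 0))
    return -1;
  int n = 0;
  while (v >>= 1)
    n++;
  return n;
}
```

With the bool-returning helper, IN_RANGE (my_pow2p (v), 1, 3) is true for every power of two, so the shift-and-add discount applied far beyond the SH1ADD/SH2ADD/SH3ADD cases; the exact_log2-based check only accepts shift amounts 1..3.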

Reviewed-by: Philipp Tomsich 


Re: [PATCH] RISC-V: Fix wrong partial subreg check for bsetidisi

2023-02-28 Thread Philipp Tomsich
On Tue, 28 Feb 2023 at 06:00, Lin Sinan  wrote:
>
> From: Lin Sinan 
>
> The partial subreg check should be for the subreg operand (operand 1) instead of
> the immediate operand (operand 2). This change also fixes pr68648.c in zbs.

Good catch.
Reviewed-by: 


[RFC PATCH v1 10/10] RISC-V: Support XVentanaCondOps extension

2023-02-10 Thread Philipp Tomsich
The vendor-defined XVentanaCondOps extension adds two instructions
with semantics identical to Zicond.

This plugs the 2 new instructions in using the canonical RTX, which
also matches the combiner-input for noce_try_store_flag_mask and
noce_try_store_flag, defined for conditional-zero.

For documentation on XVentanaCondOps, refer to:
  
https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.1/ventana-custom-extensions-v1.0.1.pdf
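Since XVentanaCondOps and Zicond share semantics, a small C sketch can pin those semantics down (function names are illustrative, not from GCC):

```c
#include <assert.h>

/* vt.maskc:  rd = (rc != 0) ? rs : 0   (matches czero.eqz)  */
long vt_maskc (long rc, long rs)
{
  return rc != 0 ? rs : 0;
}

/* vt.maskcn: rd = (rc == 0) ? rs : 0   (matches czero.nez)  */
long vt_maskcn (long rc, long rs)
{
  return rc == 0 ? rs : 0;
}

/* The canonical RTX (and (neg (ne rc 0)) rs): (rc != 0) is 0 or 1,
   its negation 0 or all-ones, which then masks rs.  */
long canonical_form (long rc, long rs)
{
  return -(long) (rc != 0) & rs;
}
```

The canonical form computes the same value as vt.maskc for every input, which is why a single pattern in the backend covers both the combiner output and the if-conversion result.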

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize idiom
for conditional zero as a single instruction for TARGET_XVENTANACONDOPS.
* config/riscv/riscv.md: Include xventanacondops.md.
* config/riscv/zicond.md: Enable splitters for TARGET_XVENTANACONDOPS.
* config/riscv/xventanacondops.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-and-01.c: New test.
* gcc.target/riscv/xventanacondops-and-02.c: New test.
* gcc.target/riscv/xventanacondops-eq-01.c: New test.
* gcc.target/riscv/xventanacondops-eq-02.c: New test.
* gcc.target/riscv/xventanacondops-ifconv-imm.c: New test.
* gcc.target/riscv/xventanacondops-le-01.c: New test.
* gcc.target/riscv/xventanacondops-le-02.c: New test.
* gcc.target/riscv/xventanacondops-lt-01.c: New test.
* gcc.target/riscv/xventanacondops-lt-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-01.c: New test.
* gcc.target/riscv/xventanacondops-ne-03.c: New test.
* gcc.target/riscv/xventanacondops-ne-04.c: New test.
* gcc.target/riscv/xventanacondops-xor-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.cc |  4 +--
 gcc/config/riscv/riscv.md |  5 ++--
 gcc/config/riscv/xventanacondops.md   | 29 +++
 gcc/config/riscv/zicond.md| 15 +-
 .../gcc.target/riscv/xventanacondops-and-01.c | 16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c | 15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  | 11 +++
 .../gcc.target/riscv/xventanacondops-eq-02.c  | 14 +
 .../riscv/xventanacondops-ifconv-imm.c| 19 
 .../gcc.target/riscv/xventanacondops-le-01.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-le-02.c  | 11 +++
 .../gcc.target/riscv/xventanacondops-lt-01.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-lt-03.c  | 16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  | 10 +++
 .../gcc.target/riscv/xventanacondops-ne-03.c  | 13 +
 .../gcc.target/riscv/xventanacondops-ne-04.c  | 13 +
 .../gcc.target/riscv/xventanacondops-xor-01.c | 14 +
 17 files changed, 226 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 7e69a652fc5..94ac8f350e6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2331,8 +2331,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case AND:
-  /* czero.eqz/nez */
-  if ((TARGET_ZICOND)
+  /* czero.eqz/nez or vt.maskc/vt.maskcn */
+  if ((TARGET_ZICOND || TARGET_XVENTANACONDOPS)
  && mode == word_mode
  && GET_CODE (XEXP (x, 0)) == NEG)
{
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6f255a80379..e6b73c316cb 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2673,7 +2673,7 @@ (define_split
(match_operator:GPR 1 "anyle_operator"
   [(match_operand:X 2 "register_operand")
(match_operand:X 3 "register_operand")]))]
-  "TARGET_ZICOND"
+  "TARGET_ZICOND || TARGET_XVENTANACONDOPS"
   [(set (match_dup 0) (match_dup 4))
(set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
  {

[RFC PATCH v1 07/10] RISC-V: Recognize bexti in negated if-conversion

2023-02-10 Thread Philipp Tomsich
While the positive case "if ((bits >> SHAMT) & 1)" for SHAMT 0..10 can
trigger conversion into efficient branchless sequences
  - with Zbs (bexti + neg + and)
  - with Zicond (andi + czero.nez)
the inverted/negated case results in
  andi a5,a0,1024
  seqz a5,a5
  neg a5,a5
  and a5,a5,a1
due to how the sequence presents to the combine pass.

This adds an additional splitter to reassociate the polarity-reversed
case into bexti + addi, if Zbs is present.
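The reassociation rests on a small identity: for a single extracted bit b in {0, 1}, the mask -(b == 0) equals b - 1. A quick C sketch with SHAMT fixed to 10 (names are illustrative):

```c
#include <assert.h>

/* andi + seqz + neg: build the all-ones/all-zeros mask the long way.  */
long mask_via_seqz_neg (long bits)
{
  long b = (bits >> 10) & 1;
  return -(long) (b == 0);
}

/* bexti + addi: extract the bit, then add -1.  */
long mask_via_bexti_addi (long bits)
{
  long b = (bits >> 10) & 1;
  return b - 1;
}
```

Both produce 0 when the bit is set and all-ones when it is clear, so the splitter can substitute the two-instruction sequence without changing semantics.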

gcc/ChangeLog:

* config/riscv/zicond.md: Add split to reassociate
"andi + seqz + neg" into "bexti + addi".

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 15fdaa539f1..0aad61c7009 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -143,3 +143,13 @@ (define_split
 {
   operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
 })
+
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (neg:X (eq:X (zero_extract:X (match_operand:X 1 "register_operand")
+(const_int 1)
+(match_operand 2 "immediate_operand"))
(const_int 0))))]
+  "!TARGET_ZICOND && TARGET_ZBS"
+  [(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (plus:X (match_dup 0) (const_int -1)))])
-- 
2.34.1



[RFC PATCH v1 08/10] ifcvt: add if-conversion to conditional-zero instructions

2023-02-10 Thread Philipp Tomsich
Some architectures, as is the case on RISC-V with the proposed
ZiCondOps and the vendor-defined XVentanaCondOps, define a
conditional-zero instruction that is equivalent to:
 - the positive form:  rd = (rc != 0) ? rs : 0
 - the negated form:   rd = (rc == 0) ? rs : 0

While noce_try_store_flag_mask will somewhat work for this case, it
will generate a number of atomic RTX that will misdirect the cost
calculation and may be too long (i.e., 4 RTX and more) to successfully
merge at combine-time.

Instead, we add two new transforms that attempt to build up what we
define as the canonical form of a conditional-zero expression:

  (set (match_operand 0 "register_operand" "=r")
       (and (neg (eq_or_ne (match_operand 1 "register_operand" "r")
                           (const_int 0)))
            (match_operand 2 "register_operand" "r")))

Architectures that provide a conditional-zero are thus expected to
define an instruction matching this pattern in their backend.

Based on this, we support the following cases:
 - noce_try_condzero:
  a ? a : b
  a ? b : 0  (and then/else swapped)
 !a ? b : 0  (and then/else swapped)
 - noce_try_condzero_arith:
 conditional-plus, conditional-minus, conditional-and,
 conditional-or, conditional-xor, conditional-shift,
 conditional-and

Given that this is hooked into the CE passes, it is less powerful than
a tree pass (e.g., it cannot transform cases where an extension, such
as for uint16_t operations, is in either the then- or else-branch
together with the arithmetic) but already covers a good array of cases
and triggers across SPEC CPU 2017.

Adding transformations in a tree pass should come in a future
improvement.

gcc/ChangeLog:

* ifcvt.cc (noce_emit_insn): Add prototype.
(noce_emit_condzero): Helper for noce_try_condzero and
noce_try_condzero_arith transforms.
(noce_try_condzero): New transform.
(noce_try_condzero_arith): New transform for conditional
arithmetic that can be built up by exploiting that the
conditional-zero instruction will inject 0, which acts
as the neutral element for operations.
(noce_process_if_block): Call noce_try_condzero and
noce_try_condzero_arith.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xventanacondops-and-01.c: New test.
* gcc.target/riscv/xventanacondops-and-02.c: New test.
* gcc.target/riscv/xventanacondops-eq-01.c: New test.
* gcc.target/riscv/xventanacondops-eq-02.c: New test.
* gcc.target/riscv/xventanacondops-lt-01.c: New test.
* gcc.target/riscv/xventanacondops-ne-01.c: New test.
* gcc.target/riscv/xventanacondops-xor-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/ifcvt.cc  | 216 ++
 .../gcc.target/riscv/zicond-and-01.c  |  16 ++
 .../gcc.target/riscv/zicond-and-02.c  |  15 ++
 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c |  14 ++
 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c |  10 +
 .../gcc.target/riscv/zicond-xor-01.c  |  14 ++
 8 files changed, 312 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-xor-01.c

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index 008796838f7..7ac3bd8f18e 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -97,6 +97,7 @@ static int find_if_case_2 (basic_block, edge, edge);
 static int dead_or_predicable (basic_block, basic_block, basic_block,
   edge, int);
 static void noce_emit_move_insn (rtx, rtx);
+static rtx_insn *noce_emit_insn (rtx);
 static rtx_insn *block_has_only_trap (basic_block);
 static void need_cmov_or_rewire (basic_block, hash_set *,
 hash_map *);
@@ -787,6 +788,9 @@ static rtx noce_get_alt_condition (struct noce_if_info *, 
rtx, rtx_insn **);
 static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
+static rtx noce_emit_condzero (struct noce_if_info *, rtx, bool = false);
+static int noce_try_condzero (struct noce_if_info *);
+static int noce_try_condzero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
or UNKNOWN if reversing the condition is not possible.  */
@@ -1664,6 +1668,214 @@ noce_try_addcc (struct noce_if_info *if_info)
   return FALSE;
 }
 
+

[RFC PATCH v1 06/10] RISC-V: Recognize sign-extract + and cases for czero.eqz/nez

2023-02-10 Thread Philipp Tomsich
Users might use explicit arithmetic operations to create a mask and
then AND with it, in a sequence like
cond = (bits >> SHIFT) & 1;
mask = ~(cond - 1);
val &= mask;
which will present as a single-bit sign-extract.

Depending on what combination of XVentanaCondOps and Zbs are
available, this will map to the following sequences:
 - bexti + czero, if both Zbs and XVentanaCondOps are present
 - andi + czero,  if only XVentanaCondOps is available and the
  sign-extract is operating on bits 10:0 (bit 11
  can't be reached, as the immediate is
  sign-extended)
 - slli + srli + and, otherwise.
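Written out as a complete function (with SHIFT fixed to 10 for illustration), the source-level idiom the splitter recognizes looks like:

```c
#include <assert.h>

/* Keep VAL if bit 10 of BITS is set, else 0; the three statements
   mirror the cond/mask/val sequence from the commit message.  */
unsigned long keep_if_bit10 (unsigned long bits, unsigned long val)
{
  unsigned long cond = (bits >> 10) & 1;
  unsigned long mask = ~(cond - 1);  /* all-ones if the bit is set, else 0 */
  return val & mask;
}
```

The combiner sees the cond/mask pair as a single-bit sign-extract (-1 or 0) ANDed with VAL, which is what the new splitters match.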

gcc/ChangeLog:

* config/riscv/zicond.md: Recognize SIGN_EXTRACT of a
single-bit followed by AND for Zicond.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-le-01.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md| 45 +++
 gcc/testsuite/gcc.target/riscv/zicond-le-01.c | 16 +++
 2 files changed, 61 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-le-01.c

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 9d1ce067150..15fdaa539f1 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -98,3 +98,48 @@ (define_split
   operands[6] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == LE ? GT : GTU,
mode, operands[2], operands[3]);
 })
+
+;; Users might use explicit arithmetic operations to create a mask and
+;; then and it, in a sequence like
+;;cond = (bits >> SHIFT) & 1;
+;;mask = ~(cond - 1);
+;;val &= mask;
+;; which will present as a single-bit sign-extract in the combiner.
+;;
+;; This will give rise to any of the following cases:
+;; - with Zbs and XVentanaCondOps: bexti + vt.maskc
+;; - with XVentanaCondOps (but w/o Zbs):
+;;   - andi + vt.maskc, if the mask is representable in the immediate
+;;  (which requires extra care due to the immediate
+;;   being sign-extended)
+;;   - slli + srli + and
+;; - otherwise: slli + srli + and
+
+;; With Zbs, we have bexti for all possible bits...
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_ZICOND && TARGET_ZBS"
+  [(set (match_dup 4) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))])
+
+;; ...whereas RV64I only allows us access to bits 0..10 in a single andi.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (sign_extract:X (match_operand:X 1 "register_operand")
+  (const_int 1)
+  (match_operand 2 "immediate_operand"))
+  (match_operand:X 3 "register_operand")))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_ZICOND && !TARGET_ZBS && (UINTVAL (operands[2]) < 11)"
+  [(set (match_dup 4) (and:X (match_dup 1) (match_dup 2)))
+   (set (match_dup 0) (and:X (neg:X (ne:X (match_dup 4) (const_int 0)))
+(match_dup 3)))]
+{
+  operands[2] = GEN_INT(1 << UINTVAL(operands[2]));
+})
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-le-01.c 
b/gcc/testsuite/gcc.target/riscv/zicond-le-01.c
new file mode 100644
index 000..e5902d1ca5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-le-01.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64 -mbranch-cost=4" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz"  } } */
+
+long long sink (long long);
+
+long long le1 (long long a, long long b)
+{
+  if (a <= b)
+b = 0;
+
+  return sink(b);
+}
+
+/* { dg-final { scan-assembler-times "sgt\t" 1 } } */
+/* { dg-final { scan-assembler-times "czero.eqz\t" 1 } } */
-- 
2.34.1



[RFC PATCH v1 04/10] RISC-V: Support immediates in Zicond

2023-02-10 Thread Philipp Tomsich
When if-conversion encounters sequences using immediates, the
sequences can't trivially map back onto czero.eqz/czero.nez (even if
beneficial), as czero.eqz/czero.nez do not have immediate forms.

This adds a splitter to rewrite opportunities for Zicond that operate
on an immediate by first putting the immediate into a register to
enable the non-immediate czero.eqz/czero.nez instructions to operate
on the value.

Consider code, such as

  long func2 (long a, long c)
  {
if (c)
  a = 2;
else
  a = 5;
return a;
  }

which will be converted to

  func2:
    seqz    a0,a2
    neg     a0,a0
    andi    a0,a0,3
    addi    a0,a0,2
    ret

Following this change, we generate

    li        a0,3
    czero.nez a0,a0,a2
    addi      a0,a0,2
    ret

This commit also introduces a simple unit test for if-conversion with
immediate (literal) values as the sources for simple sets in the THEN
and ELSE blocks. The test checks that the conditional-zero instruction
(czero.eqz/nez) is emitted as part of the resulting branchless
instruction sequence.

gcc/ChangeLog:

* config/riscv/zicond.md: Support immediates for
czero.eqz/czero.nez through a splitter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ifconv-imm.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/zicond.md| 20 +++
 .../gcc.target/riscv/zicond-ifconv-imm.c  | 19 ++
 2 files changed, 39 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c

diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 278e3a67802..19d0b35585b 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -28,3 +28,23 @@ (define_insn "*czero.<eqz>"
(match_operand:DI 2 "register_operand" "r")))]
   "TARGET_ZICOND"
   "czero.\t%0,%2,%1")
+
+;; Zicond does not have immediate forms, so we need to do extra work
+;; to support these: if we encounter a vt.maskc/n with an immediate,
+;; we split this into a load-immediate followed by a czero.eqz/nez.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+   (and:DI (neg:DI (match_operator:DI 1 "equality_operator"
+  [(match_operand:DI 2 "register_operand")
+   (const_int 0)]))
+   (match_operand:DI 3 "immediate_operand")))
+   (clobber (match_operand:DI 4 "register_operand"))]
+  "TARGET_ZICOND"
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (and:DI (neg:DI (match_dup 1))
+ (match_dup 4)))]
+{
+  /* Eliminate the clobber/temporary, if it is not needed. */
+  if (!rtx_equal_p (operands[0], operands[2]))
+ operands[4] = operands[0];
+})
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c
new file mode 100644
index 000..f410537a4f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
+
+/* Each function below should emit a czero.nez instruction */
+
+long
+foo0 (long a, long b, long c)
+{
+  if (c)
+a = 0;
+  else
+a = 5;
+  return a;
+}
+
+/* { dg-final { scan-assembler-times "czero.nez\t" 1 } } */
+/* { dg-final { scan-assembler-not "beqz\t" } } */
+/* { dg-final { scan-assembler-not "bnez\t" } } */
-- 
2.34.1



[RFC PATCH v1 05/10] RISC-V: Support noce_try_store_flag_mask as czero.eqz/czero.nez

2023-02-10 Thread Philipp Tomsich
When if-conversion in noce_try_store_flag_mask starts the sequence off
with an order-operator, our patterns for czero.eqz/nez will receive
the result of the order-operator as a register argument; consequently,
they can't know that the result will be either 1 or 0.

To convey this information (and make czero.eqz/nez applicable), we
wrap the result of the order-operator in an eq/ne against (const_int 0).
This commit adds the split pattern to handle these cases.

During if-conversion, if noce_try_store_flag_mask succeeds, we may see
  if (cur < next) {
    next = 0;
  }
transformed into
   27: r82:SI=ltu(r76:DI,r75:DI)
  REG_DEAD r76:DI
   28: r81:SI=r82:SI^0x1
  REG_DEAD r82:SI
   29: r80:DI=zero_extend(r81:SI)
  REG_DEAD r81:SI

This currently escapes the combiner, as RISC-V does not have a pattern
to apply the 'slt' instruction to 'geu' verbs.  By adding a pattern in
this commit, we match such cases.
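The splitters lean on the usual ordering identities, which can be checked in plain C (an illustrative sketch with our own helper names, not part of the patch):

```c
#include <assert.h>

/* Illustrative: RISC-V only provides slt/sltu, so the splitters
   rewrite "a <= b" as "!(b < a)" and "a >= b" as "!(a < b)",
   leaving an eq-against-zero that the czero.eqz/nez patterns can
   then consume.  */
static int le_via_lt (long a, long b) { return !(b < a); }
static int ge_via_lt (long a, long b) { return !(a < b); }
```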

gcc/ChangeLog:

* config/riscv/predicates.md (anyge_operator): Define.
(anygt_operator): Same.
(anyle_operator): Same.
(anylt_operator): Same.
* config/riscv/riscv.md: Helpers for ge(u) & le(u).
* config/riscv/zicond.md: Add split to wrap an an
order-operator suitably for generating czero.eqz/nez

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-le-02.c: New test.
* gcc.target/riscv/zicond-lt-03.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/predicates.md| 12 +
 gcc/config/riscv/riscv.md | 26 ++
 gcc/config/riscv/zicond.md| 50 +++
 gcc/testsuite/gcc.target/riscv/zicond-le-02.c | 11 
 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c | 16 ++
 5 files changed, 115 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 034d088c656..6b6f867824e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -204,6 +204,18 @@ (define_predicate "modular_operator"
 (define_predicate "equality_operator"
   (match_code "eq,ne"))
 
+(define_predicate "anyge_operator"
+  (match_code "ge,geu"))
+
+(define_predicate "anygt_operator"
+  (match_code "gt,gtu"))
+
+(define_predicate "anyle_operator"
+  (match_code "le,leu"))
+
+(define_predicate "anylt_operator"
+  (match_code "lt,ltu"))
+
 (define_predicate "order_operator"
   (match_code "eq,ne,lt,ltu,le,leu,ge,geu,gt,gtu"))
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7c632bb4d65..6f255a80379 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2668,6 +2668,19 @@ (define_insn "*sge<u>_<X:mode><GPR:mode>"
   [(set_attr "type" "slt")
    (set_attr "mode" "<X:MODE>")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyle_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_ZICOND"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+ {
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == LE ? LT : LTU,
+   <GPR:MODE>mode, operands[3], operands[2]);
+ })
+
 (define_insn "*slt<u>_<X:mode><GPR:mode>"
   [(set (match_operand:GPR   0 "register_operand" "= r")
(any_lt:GPR (match_operand:X 1 "register_operand" "  r")
@@ -2689,6 +2702,19 @@ (define_insn "*sle<u>_<X:mode><GPR:mode>"
   [(set_attr "type" "slt")
    (set_attr "mode" "<X:MODE>")])
 
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+   (match_operator:GPR 1 "anyge_operator"
+  [(match_operand:X 2 "register_operand")
+   (match_operand:X 3 "register_operand")]))]
+  "TARGET_ZICOND"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 0) (eq:GPR (match_dup 0) (const_int 0)))]
+{
+  operands[4] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == GE ? LT : LTU,
+   <GPR:MODE>mode, operands[2], operands[3]);
+})
+
 ;;
 ;;  
 ;;
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 19d0b35585b..9d1ce067150 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -48,3 +48,53 @@ (define_split
   if (!rtx_equal_p (operands[0], operands[2]))
  operands[4] = operands[0];
 })
+
+;; Make order operators digestible to the vt.maskc logic by
+;; wrapping their result in a comparison against (const_int 0).
+
+;; "a >= b" is "!(

[RFC PATCH v1 09/10] RISC-V: Recognize xventanacondops extension

2023-02-10 Thread Philipp Tomsich
This adds the xventanacondops extension to the option parsing and as a
default for the ventana-vt1 core:

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize
  "xventanacondops" as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_XVENTANACONDOPS): Define.
(TARGET_XVENTANACONDOPS): Define.
* config/riscv/riscv.opt: Add "riscv_xventanacondops".

Signed-off-by: Philipp Tomsich 
---

 gcc/common/config/riscv/riscv-common.cc | 2 ++
 gcc/config/riscv/riscv-opts.h   | 3 +++
 gcc/config/riscv/riscv.opt  | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 999e1926db1..a77a04d68d9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1250,6 +1250,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"svinval", &gcc_options::x_riscv_sv_subext, MASK_SVINVAL},
   {"svnapot", &gcc_options::x_riscv_sv_subext, MASK_SVNAPOT},
 
+  {"xventanacondops", &gcc_options::x_riscv_xventanacondops, MASK_XVENTANACONDOPS},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 61d5212da20..d80e81c6c28 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -191,4 +191,7 @@ enum stack_protector_guard {
? 0 \
: 32 << (__builtin_popcount (riscv_zvl_flags) - 1))
 
+#define MASK_XVENTANACONDOPS (1 << 0)
+#define TARGET_XVENTANACONDOPS ((riscv_xventanacondops & MASK_XVENTANACONDOPS) 
!= 0)
+
 #endif /* ! GCC_RISCV_OPTS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index e78c99382cd..6ebaad43d0e 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -233,6 +233,9 @@ int riscv_zm_subext
 TargetVariable
 int riscv_sv_subext
 
+TargetVariable
+int riscv_xventanacondops = 0
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
-- 
2.34.1



[RFC PATCH v1 00/10] RISC-V: Support the Zicond (conditional-operations) extension

2023-02-10 Thread Philipp Tomsich


The (proposed, but about to be frozen) Zicond extension adds 2
unconditional R-type instructions that can be used to build branchless
sequences that have conditional-arithmetic/bitwise/select semantics
and integrate well with the RISC-V architecture.

See the Zicond specification for details:
  
https://github.com/riscv/riscv-zicond/releases/download/v1.0-draft-20230207/riscv-zicond_1.0-draft-20230207.pdf

The Zicond extension defines a conditional-zero(-or-value)
instruction, which is similar to the following C construct:
  rd = rc ? rs : 0
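In C terms, the two new instructions behave roughly as follows (a sketch with our own helper names, not intrinsics):

```c
#include <assert.h>
#include <stdint.h>

/* Rough C model of the two Zicond instructions (the helper names are
   ours): the destination receives the source value or zero, depending
   on whether the condition register is zero.  */
static uint64_t czero_eqz (uint64_t rs, uint64_t rc)
{
  return (rc != 0) ? rs : 0;  /* zero the result when rc == 0 */
}

static uint64_t czero_nez (uint64_t rs, uint64_t rc)
{
  return (rc != 0) ? 0 : rs;  /* zero the result when rc != 0 */
}
```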

This functionality can be tied back into if-conversion and also
matches some typical programming idioms.  This series includes backend
support for Zicond both to handle conditional-zero constructions and
if-conversion.  We also change the previously submitted
XVentanaCondops support to use the Zicond infrastructure.

Tested against SPEC CPU 2017.



Philipp Tomsich (10):
  docs: Document a canonical RTL for a conditional-zero insns
  RISC-V: Recognize Zicond (conditional operations) extension
  RISC-V: Generate czero.eqz/nez on noce_try_store_flag_mask
if-conversion
  RISC-V: Support immediates in Zicond
  RISC-V: Support noce_try_store_flag_mask as czero.eqz/czero.nez
  RISC-V: Recognize sign-extract + and cases for czero.eqz/nez
  RISC-V: Recognize bexti in negated if-conversion
  ifcvt: add if-conversion to conditional-zero instructions
  RISC-V: Recognize xventanacondops extension
  RISC-V: Support XVentanaCondOps extension

 gcc/common/config/riscv/riscv-common.cc   |   5 +
 gcc/config/riscv/predicates.md|  12 +
 gcc/config/riscv/riscv-opts.h |   5 +
 gcc/config/riscv/riscv.cc |  15 ++
 gcc/config/riscv/riscv.md |  28 +++
 gcc/config/riscv/riscv.opt|   3 +
 gcc/config/riscv/xventanacondops.md   |  29 +++
 gcc/config/riscv/zicond.md| 156 +
 gcc/doc/md.texi   |  17 ++
 gcc/ifcvt.cc  | 216 ++
 .../gcc.target/riscv/xventanacondops-and-01.c |  16 ++
 .../gcc.target/riscv/xventanacondops-and-02.c |  15 ++
 .../gcc.target/riscv/xventanacondops-eq-01.c  |  11 +
 .../gcc.target/riscv/xventanacondops-eq-02.c  |  14 ++
 .../riscv/xventanacondops-ifconv-imm.c|  19 ++
 .../gcc.target/riscv/xventanacondops-le-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-le-02.c  |  11 +
 .../gcc.target/riscv/xventanacondops-lt-01.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-lt-03.c  |  16 ++
 .../gcc.target/riscv/xventanacondops-ne-01.c  |  10 +
 .../gcc.target/riscv/xventanacondops-ne-03.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-ne-04.c  |  13 ++
 .../gcc.target/riscv/xventanacondops-xor-01.c |  14 ++
 .../gcc.target/riscv/zicond-and-01.c  |  16 ++
 .../gcc.target/riscv/zicond-and-02.c  |  15 ++
 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c |  14 ++
 .../gcc.target/riscv/zicond-ifconv-imm.c  |  19 ++
 gcc/testsuite/gcc.target/riscv/zicond-le-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-le-02.c |  11 +
 gcc/testsuite/gcc.target/riscv/zicond-lt-01.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-lt-03.c |  16 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-01.c |  10 +
 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c |  13 ++
 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c |  13 ++
 .../gcc.target/riscv/zicond-xor-01.c  |  14 ++
 36 files changed, 854 insertions(+)
 create mode 100644 gcc/config/riscv/xventanacondops.md
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ifconv-imm.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-le-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-lt-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-ne-04.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/xventanacondops-xor-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-and-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-01.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-eq-02.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ifconv-imm.c

[RFC PATCH v1 03/10] RISC-V: Generate czero.eqz/nez on noce_try_store_flag_mask if-conversion

2023-02-10 Thread Philipp Tomsich
Adds a pattern to map the output of noce_try_store_flag_mask
if-conversion in the combiner onto vt.maskc; the input patterns
supported are similar to the following:
  (set (reg/v/f:DI 75 [  ])
   (and:DI (neg:DI (ne:DI (reg:DI 82)
   (const_int 0 [0])))
   (reg/v/f:DI 75 [  ])))

To ensure that the combine-pass doesn't get confused about
profitability, we recognize the idiom as requiring a single
instruction when the Zicond extension is present.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize the idiom
for conditional-zero as a single instruction for TARGET_ZICOND.
* config/riscv/riscv.md: Include zicond.md.
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ne-03.c: New test.
* gcc.target/riscv/zicond-ne-04.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/riscv/riscv.cc | 15 ++
 gcc/config/riscv/riscv.md |  1 +
 gcc/config/riscv/zicond.md| 30 +++
 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c | 13 
 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c | 13 
 5 files changed, 72 insertions(+)
 create mode 100644 gcc/config/riscv/zicond.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-ne-04.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4d3e1a3229..7e69a652fc5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2331,6 +2331,21 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   return false;
 
 case AND:
+  /* czero.eqz/nez */
+  if ((TARGET_ZICOND)
+ && mode == word_mode
+ && GET_CODE (XEXP (x, 0)) == NEG)
+   {
+ rtx inner = XEXP (XEXP (x, 0), 0);
+
+ if ((GET_CODE (inner) == EQ || GET_CODE (inner) == NE)
+ && CONST_INT_P (XEXP (inner, 1))
+ && INTVAL (XEXP (inner, 1)) == 0)
+   {
+ *total = COSTS_N_INSNS (1);
+ return true;
+   }
+   }
   /* slli.uw pattern for zba.  */
   if (TARGET_ZBA && TARGET_64BIT && mode == DImode
  && GET_CODE (XEXP (x, 0)) == ASHIFT)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e8b5fc6644d..7c632bb4d65 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3228,3 +3228,4 @@ (define_insn "riscv_prefetchi_<mode>"
 (include "generic.md")
 (include "sifive-7.md")
 (include "vector.md")
+(include "zicond.md")
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
new file mode 100644
index 000..278e3a67802
--- /dev/null
+++ b/gcc/config/riscv/zicond.md
@@ -0,0 +1,30 @@
+;; Machine description for the RISC-V Zicond extension
+;; Copyright (C) 2022-23 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_code_iterator eq_or_ne [eq ne])
+(define_code_attr eqz [(eq "nez") (ne "eqz")])
+
+(define_insn "*czero.<eqz>"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI (neg:DI (eq_or_ne:DI
+   (match_operand:DI 1 "register_operand" "r")
+   (const_int 0)))
+   (match_operand:DI 2 "register_operand" "r")))]
+  "TARGET_ZICOND"
+  "czero.<eqz>\t%0,%2,%1")
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c 
b/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
new file mode 100644
index 000..887b1273ce7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond-ne-03.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64 -mtune=thead-c906" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" "-Os" "-Oz" } } */
+
+long long ne3(long long a, long long b)
+{
+  if (a != 0)
+return b;
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "czero.eqz" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zicond-ne

[RFC PATCH v1 01/10] docs: Document a canonical RTL for a conditional-zero insns

2023-02-10 Thread Philipp Tomsich
On RISC-V, conditional-zero (i.e., move a register value or zero to a
destination register) instructions are part if the Zicond extension.
To support architectures that have similar constructs, we define a
canonical RTL representation that can be used in if-conversion.
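Written out in C, the canonical (and (neg (ne cond 0)) value) form computes exactly the conditional-zero result, since -(cond != 0) is either all-ones or zero (an illustrative sketch, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative: the canonical RTL (and:M (neg:M (ne:M rc 0)) rs)
   spelled out in C.  -(rc != 0) is all-ones when rc is nonzero and
   zero otherwise, so the AND yields rs or 0.  */
static uint64_t cond_zero (uint64_t rs, uint64_t rc)
{
  uint64_t m = -(uint64_t) (rc != 0);  /* all-ones or all-zeros */
  return m & rs;
}
```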

Signed-off-by: Philipp Tomsich 
---

 gcc/doc/md.texi | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 7235d34c4b3..579462ea67f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8347,6 +8347,23 @@ operand of @code{mult} is also a shift, then that is 
extended also.
 This transformation is only applied when it can be proven that the
 original operation had sufficient precision to prevent overflow.
 
+@cindex @code{conditional-zero}, canonicalization of
+@item
+A machine that has an instruction that performs a conditional-zero
+operation (i.e., an instruction that moves either a register value or
+zero into the destination register, depending on a condition) should
+specify the pattern for that instruction as:
+@smallexample
+(define_insn ""
+  [(set (match_operand:@var{m} 0 @dots{})
+(and:@var{m}
+  (neg:@var{m} (@var{eq_or_ne} (match_operand:@var{m} 1 @dots{})
+   (const_int 0)))
+  (match_operand:@var{m} 2 @dots{})))]
+  "@dots{}"
+  "@dots{}")
+@end smallexample
+
 @end itemize
 
 Further canonicalization rules are defined in the function
-- 
2.34.1



[RFC PATCH v1 02/10] RISC-V: Recognize Zicond (conditional operations) extension

2023-02-10 Thread Philipp Tomsich
This adds the RISC-V Zicond extension to the option parsing.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize "zicond"
as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_ZICOND): Define.
(TARGET_ZICOND): Define.

Signed-off-by: Philipp Tomsich 
---

 gcc/common/config/riscv/riscv-common.cc | 3 +++
 gcc/config/riscv/riscv-opts.h   | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 787674003cb..999e1926db1 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -163,6 +163,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"zifencei", ISA_SPEC_CLASS_20191213, 2, 0},
   {"zifencei", ISA_SPEC_CLASS_20190608, 2, 0},
 
+  {"zicond", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1181,6 +1183,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zicsr",    &gcc_options::x_riscv_zi_subext, MASK_ZICSR},
   {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
+  {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
 
   {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
 
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..61d5212da20 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -69,9 +69,11 @@ enum stack_protector_guard {
 
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
+#define MASK_ZICOND   (1 << 2)
 
 #define TARGET_ZICSR    ((riscv_zi_subext & MASK_ZICSR) != 0)
 #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
+#define TARGET_ZICOND   ((riscv_zi_subext & MASK_ZICOND) != 0)
 
 #define MASK_ZAWRS   (1 << 0)
 #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
-- 
2.34.1



Re: [PATCH V1 1/1] UNRATIFIED RISC-V: Add 'ZiCond' extension

2023-02-09 Thread Philipp Tomsich
Just a quick heads-up to avoid duplication of work: we have a series
queued up for later this week (right now, SPEC2017 is still running
for QA purposes) that adds if-conversion support and converts that
into Zicond operations.
It doesn't have much overlap (except handling the "zicond" flag), as
we don't use builtins but a new canonical pattern.

Philipp.


On Thu, 9 Feb 2023 at 12:06,  wrote:
>
> From: yulong 
>
> [DO NOT MERGE]
> Until 'ZiCond' extension is frozen/ratified and final version number is
> determined, this patch should not be merged upstream.  This commit uses
> version 1.0 as in the documentation.
>
> This commit adds support for the latest draft of RISC-V Integer Conditional
> (ZiCond) extension consisting of 2 new instructions.
>
> This is based on the early draft of ZiCond on GitHub:
> 
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zicond ext.
> * config/riscv/riscv-builtins.cc (RISCV_FTYPE_NAME2): New.
> (AVAIL): New.
> (RISCV_FTYPE_ATYPES2): New.
> * config/riscv/riscv-ftypes.def (2): New.
> * config/riscv/riscv-opts.h (MASK_ZICOND): New.
> (TARGET_ZICOND): New.
> * config/riscv/riscv.md (riscv_eqz_<mode>): Add new mode.
> (riscv_nez_<mode>): Add new mode.
> * config/riscv/riscv.opt: New.
> * config/riscv/riscv-zicond.def: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zicond-1.c: New test.
> * gcc.target/riscv/zicond-2.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc   |  4 
>  gcc/config/riscv/riscv-builtins.cc|  8 
>  gcc/config/riscv/riscv-ftypes.def |  2 ++
>  gcc/config/riscv/riscv-opts.h |  3 +++
>  gcc/config/riscv/riscv-zicond.def |  5 +
>  gcc/config/riscv/riscv.md | 22 ++
>  gcc/config/riscv/riscv.opt|  3 +++
>  gcc/testsuite/gcc.target/riscv/zicond-1.c | 15 +++
>  gcc/testsuite/gcc.target/riscv/zicond-2.c | 15 +++
>  9 files changed, 77 insertions(+)
>  create mode 100644 gcc/config/riscv/riscv-zicond.def
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zicond-2.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 787674003cb..5a8b1278ac8 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -190,6 +190,8 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
>
> +  {"zicond",ISA_SPEC_CLASS_NONE, 1, 0},
> +
>{"zk",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zkn",   ISA_SPEC_CLASS_NONE, 1, 0},
>{"zks",   ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1209,6 +1211,8 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>{"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
>{"zicbop", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOP},
>
> +  {"zicond", &gcc_options::x_riscv_zicond_subext, MASK_ZICOND},
> +
>{"zve32x",   &gcc_options::x_target_flags, MASK_VECTOR},
>{"zve32f",   &gcc_options::x_target_flags, MASK_VECTOR},
>{"zve64x",   &gcc_options::x_target_flags, MASK_VECTOR},
> diff --git a/gcc/config/riscv/riscv-builtins.cc 
> b/gcc/config/riscv/riscv-builtins.cc
> index 25ca407f9a9..66a8126b2b4 100644
> --- a/gcc/config/riscv/riscv-builtins.cc
> +++ b/gcc/config/riscv/riscv-builtins.cc
> @@ -42,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>  /* Macros to create an enumeration identifier for a function prototype.  */
>  #define RISCV_FTYPE_NAME0(A) RISCV_##A##_FTYPE
>  #define RISCV_FTYPE_NAME1(A, B) RISCV_##A##_FTYPE_##B
> +#define RISCV_FTYPE_NAME2(A, B, C) RISCV_##A##_FTYPE_##B##_##C
>
>  /* Classifies the prototype of a built-in function.  */
>  enum riscv_function_type {
> @@ -99,6 +100,10 @@ AVAIL (zero64,  TARGET_ZICBOZ && TARGET_64BIT)
>  AVAIL (prefetchi32, TARGET_ZICBOP && !TARGET_64BIT)
>  AVAIL (prefetchi64, TARGET_ZICBOP && TARGET_64BIT)
>  AVAIL (always, (!0))
> +AVAIL (nez32, TARGET_ZICOND && !TARGET_64BIT)
> +AVAIL (nez64, TARGET_ZICOND && TARGET_64BIT)
> +AVAIL (eqz32, TARGET_ZICOND && !TARGET_64BIT)
> +AVAIL (eqz64, TARGET_ZICOND && TARGET_64BIT)
>
>  /* Construct a riscv_builtin_description from the given arguments.
>
> @@ -142,9 +147,12 @@ AVAIL (always, (!0))
>RISCV_ATYPE_##A
>  #define RISCV_FTYPE_ATYPES1(A, B) \
>RISCV_ATYPE_##A, RISCV_ATYPE_##B
> +#define RISCV_FTYPE_ATYPES2(A, B, C) \
> +  RISCV_ATYPE_##A, RISCV_ATYPE_##B, RISCV_ATYPE_##C
>
>  static const struct riscv_builtin_description riscv_builtins[] = {
>#include "riscv-cmo.def"
> +  #include "riscv-zicond.def"
>
>DIRECT_BUILTIN (frflags, RISCV_USI_FTYPE, hard_float),
>DIRECT_NO_TARGET_BUILTIN (fsflags, 

[PATCH, COMMITTED] PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1

2023-01-31 Thread Philipp Tomsich
This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1.  The reported ICE is only
observable with checking=rtl.

Bootstrapped/regtested aarch64-linux, committed.

PR target/108589

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr108589.c: New test.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64.cc   |  1 +
 gcc/testsuite/gcc.target/aarch64/pr108589.c | 15 +++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr108589.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 17c1e23e5b5..acc0cfe5f94 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25704,6 +25704,7 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
  && CONST_INT_P (XEXP (curr_src, 1))
  && INTVAL (XEXP (curr_src, 1)) == polarity
  && REG_P (XEXP (curr_src, 0))
+ && REG_P (SET_DEST (prev_set))
  && REGNO (SET_DEST (prev_set)) == REGNO (XEXP (curr_src, 0)))
return true;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr108589.c 
b/gcc/testsuite/gcc.target/aarch64/pr108589.c
new file mode 100644
index 000..e9c5bc608af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr108589.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mtune=ampere1a -fno-split-wide-types" } */
+
+int i;
+__int128 j;
+short s;
+
+void
+foo (void)
+{
+  j -= i;
+  int l = i - __builtin_sub_overflow_p (0, 61680, s);
+  j -= __builtin_mul_overflow_p (i, l, 0);
+}
-- 
2.34.1



Re: [PATCH] aarch64: Update Ampere-1A (-mcpu=ampere1a) to include SM4

2023-01-30 Thread Philipp Tomsich
On Mon, 30 Jan 2023 at 15:18, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Gcc-patches  > bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Philipp
> > Tomsich
> > Sent: Saturday, January 28, 2023 11:12 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Manolis Tsamis ; Richard Sandiford
> > ; Tamar Christina
> > ; Philipp Tomsich 
> > Subject: [PATCH] aarch64: Update Ampere-1A (-mcpu=ampere1a) to include
> > SM4
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
> > ampere1a to include SM4.
>
> Ok, this looks consistent with what recently went in to LLVM.
> Thanks,
> Kyrill

Thanks, applied to master!
Philipp.

>
> >
> > Signed-off-by: Philipp Tomsich 
> > ---
> >
> >  gcc/config/aarch64/aarch64-cores.def | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-cores.def
> > b/gcc/config/aarch64/aarch64-cores.def
> > index 2a0f52e1dd9..85fdfd8bf74 100644
> > --- a/gcc/config/aarch64/aarch64-cores.def
> > +++ b/gcc/config/aarch64/aarch64-cores.def
> > @@ -70,7 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,
> > thunderx,  V8A,  (CRC, CRYPTO), thu
> >
> >  /* Ampere Computing ('\xC0') cores. */
> >  AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES,
> > SHA3), ampere1, 0xC0, 0xac3, -1)
> > -AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES,
> > SHA3, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
> > +AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES,
> > SHA3, SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
> >  /* Do not swap around "emag" and "xgene1",
> > this order is required to handle variant correctly. */
> >  AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), 
> > emag,
> > 0x50, 0x000, 3)
> > --
> > 2.34.1
>


[PATCH] aarch64: Update Ampere-1A (-mcpu=ampere1a) to include SM4

2023-01-28 Thread Philipp Tomsich
gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
ampere1a to include SM4.

Signed-off-by: Philipp Tomsich 
---

 gcc/config/aarch64/aarch64-cores.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 2a0f52e1dd9..85fdfd8bf74 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -70,7 +70,7 @@ AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  V8A,  (CRC, CRYPTO), thu
 
 /* Ampere Computing ('\xC0') cores. */
 AARCH64_CORE("ampere1", ampere1, cortexa57, V8_6A, (F16, RNG, AES, SHA3), ampere1, 0xC0, 0xac3, -1)
-AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
+AARCH64_CORE("ampere1a", ampere1a, cortexa57, V8_6A, (F16, RNG, AES, SHA3, SM4, MEMTAG), ampere1a, 0xC0, 0xac4, -1)
 /* Do not swap around "emag" and "xgene1",
    this order is required to handle variant correctly. */
 AARCH64_CORE("emag",emag,  xgene1,V8A,  (CRC, CRYPTO), emag, 0x50, 0x000, 3)
-- 
2.34.1



Re: [PATCH] RISC-V: Optimize min/max with SImode sources on 64-bit

2022-12-29 Thread Philipp Tomsich
On Wed, 28 Dec 2022 at 19:18, Raphael Moreira Zinsly <
rzin...@ventanamicro.com> wrote:

> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> use a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander.  The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md
> (<bitmanip_optab><mode>3): Divide pattern into
> <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> (<bitmanip_optab>si3): Handle SImode sources on
> TARGET_64BIT.
>
> gcc/testsuite:
>
> * gcc.target/riscv/zbb-abs.c: New test.
> * gcc.target/riscv/zbb-min-max-02.c: Adapt the
> expected output.
> ---
>  gcc/config/riscv/bitmanip.md  | 38 ---
>  gcc/testsuite/gcc.target/riscv/zbb-abs.c  | 18 +
>  .../gcc.target/riscv/zbb-min-max-02.c |  2 +-
>  3 files changed, 52 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index d17133d58c1..abf08a29e89 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -360,14 +360,42 @@
>DONE;
>  })
>
> -(define_insn "<bitmanip_optab><mode>3"
> -  [(set (match_operand:X 0 "register_operand" "=r")
> -(bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> -  (match_operand:X 2 "register_operand" "r")))]
> -  "TARGET_ZBB"
> +(define_insn "<bitmanip_optab>si3_insn"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +(match_operand:SI 2 "register_operand" "r")))]
> +  "!TARGET_64BIT && TARGET_ZBB"
>"<bitmanip_insn>\t%0,%1,%2"
>[(set_attr "type" "bitmanip")])
>
> +(define_insn "<bitmanip_optab>di3"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +(bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
> +(match_operand:DI 2 "register_operand" "r")))]
> +  "TARGET_64BIT && TARGET_ZBB"
> +  "<bitmanip_insn>\t%0,%1,%2"
> +  [(set_attr "type" "bitmanip")])
> +
> +(define_expand "<bitmanip_optab>si3"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +(match_operand:SI 2 "register_operand" "r")))]
> +  "TARGET_ZBB"
> +  "
> +{
> +  if (TARGET_64BIT)
> +{
> +  rtx op1_x = gen_reg_rtx (DImode);
> +  emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
> +  rtx op2_x = gen_reg_rtx (DImode);
> +  emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
> +  rtx dst_x = gen_reg_rtx (DImode);
> +  emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
> +  emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
> +  DONE;
> +}
> +}")
>

We have two issues around min/max here:
1. That it doesn't apply to the SImode abs case (which is due to
expand_abs_nojump() blindly testing for the current mode in smax_optab).
2. That we have to reduce the number of extensions to the least amount.

The above addresses expand_abs_nojump(), but makes the general solution
harder as the middle-end needs to know there is no native SImode min/max
available.
We still plan (proof-of-concept works, but a final patch will likely not be
ready before very late in January) to submit a patch to improve the
expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and
value-ranges to not even create the sign-extensions in the first place.  If
we do the above, the middle-end will blindly emit this sequence with the 2
sign-extensions — which may or may not be eliminated later by combining
with a w-form.
I'll also add an enhancement to expand_abs_nojump() to our list of changes
for the min/max enhancement during the lowering.

Note that, if we decide to go ahead with using this as a temporary solution
until our change is ready, you'll also need to add a cost for the SImode
max.

Philipp.


> +
>  ;; Optimize the common case of a SImode min/max against a constant
>  ;; that is safe both for sign- and zero-extension.
>  (define_insn_and_split "*minmax"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> new file mode 100644
> index 000..6ef7efdbd49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +#define ABS(x) (((x) >= 0) ? (x) : -(x))
> +
> +int
> +foo (int x)
> +{
> +  return ABS(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "neg" 1 } } */
> +/* { dg-final { 

Re: [PATCH v2 02/11] riscv: Restructure callee-saved register save/restore code

2022-12-27 Thread Philipp Tomsich
Applied to master (with the change from the reviews), thanks!

Philipp.

On Mon, 19 Dec 2022 at 07:30, Kito Cheng  wrote:

> just one more nit: Use INVALID_REGNUM as sentinel value for
> riscv_next_saved_reg, otherwise LGTM, and feel free to commit that
> separately :)
>
> On Mon, Dec 19, 2022 at 9:08 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > This patch restructures the loop over the GP registers
> > which saves/restores then as part of the prologue/epilogue.
> > No functional change is intended by this patch, but it
> > offers the possibility to use load-pair/store-pair instructions.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_next_saved_reg): New function.
> > (riscv_is_eh_return_data_register): New function.
> > (riscv_for_each_saved_reg): Restructure loop.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/riscv.cc | 94 +++
> >  1 file changed, 66 insertions(+), 28 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 6dd2ab2d11e..a8d5e1dac7f 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -4835,6 +4835,49 @@ riscv_save_restore_reg (machine_mode mode, int
> regno,
> >fn (gen_rtx_REG (mode, regno), mem);
> >  }
> >
> > +/* Return the next register up from REGNO up to LIMIT for the callee
> > +   to save or restore.  OFFSET will be adjusted accordingly.
> > +   If INC is set, then REGNO will be incremented first.  */
> > +
> > +static unsigned int
> > +riscv_next_saved_reg (unsigned int regno, unsigned int limit,
> > + HOST_WIDE_INT *offset, bool inc = true)
> > +{
> > +  if (inc)
> > +regno++;
> > +
> > +  while (regno <= limit)
> > +{
> > +  if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> > +   {
> > + *offset = *offset - UNITS_PER_WORD;
> > + break;
> > +   }
> > +
> > +  regno++;
> > +}
> > +  return regno;
> > +}
> > +
> > +/* Return TRUE if provided REGNO is eh return data register.  */
> > +
> > +static bool
> > +riscv_is_eh_return_data_register (unsigned int regno)
> > +{
> > +  unsigned int i, regnum;
> > +
> > +  if (!crtl->calls_eh_return)
> > +return false;
> > +
> > +  for (i = 0; (regnum = EH_RETURN_DATA_REGNO (i)) != INVALID_REGNUM;
> i++)
> > +if (regno == regnum)
> > +  {
> > +   return true;
> > +  }
> > +
> > +  return false;
> > +}
> > +
> >  /* Call FN for each register that is saved by the current function.
> > SP_OFFSET is the offset of the current stack pointer from the start
> > of the frame.  */
> > @@ -4844,36 +4887,31 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
> riscv_save_restore_fn fn,
> >   bool epilogue, bool maybe_eh_return)
> >  {
> >HOST_WIDE_INT offset;
> > +  unsigned int regno;
> > +  unsigned int start = GP_REG_FIRST;
> > +  unsigned int limit = GP_REG_LAST;
> >
> >/* Save the link register and s-registers. */
> > -  offset = (cfun->machine->frame.gp_sp_offset - sp_offset).to_constant
> ();
> > -  for (unsigned int regno = GP_REG_FIRST; regno <= GP_REG_LAST; regno++)
> > -if (BITSET_P (cfun->machine->frame.mask, regno - GP_REG_FIRST))
> > -  {
> > -   bool handle_reg =
> !cfun->machine->reg_is_wrapped_separately[regno];
> > -
> > -   /* If this is a normal return in a function that calls the
> eh_return
> > -  builtin, then do not restore the eh return data registers as
> that
> > -  would clobber the return value.  But we do still need to save
> them
> > -  in the prologue, and restore them for an exception return, so
> we
> > -  need special handling here.  */
> > -   if (epilogue && !maybe_eh_return && crtl->calls_eh_return)
> > - {
> > -   unsigned int i, regnum;
> > -
> > -   for (i = 0; (regnum = EH_RETURN_DATA_REGNO (i)) !=
> INVALID_REGNUM;
> > -i++)
> > - if (regno == regnum)
> > -   {
> > - handle_reg = FALSE;
> > - break;
> > -   }
> > - }
> > -
> > -   if (handle_reg)
> > - riscv_save_restore_reg (word_mode, regno, offset, fn);
> > -   offset -= UNITS_PER_WORD;
> > -  }
> > +  offset = (cfun->machine->frame.gp_sp_offset - sp_offset).to_constant
> ()
> > +  + UNITS_PER_WORD;
> > +  for (regno = riscv_next_saved_reg (start, limit, &offset, false);
> > +   regno <= limit;
> > +   regno = riscv_next_saved_reg (regno, limit, &offset))
> > +{
> > +  if (cfun->machine->reg_is_wrapped_separately[regno])
> > +   continue;
> > +
> > +  /* If this is a normal return in a function that calls the
> eh_return
> > +builtin, then do not restore the eh return data registers as
> that
> > +would clobber the return value.  But we do still need to save
> them
> > +in the prologue, and 

Re: [PATCH v2 01/11] riscv: attr: Synchronize comments with code

2022-12-27 Thread Philipp Tomsich
Applied to master, thanks!

Philipp.

On Mon, 19 Dec 2022 at 03:49, Kito Cheng  wrote:

> LGTM, you can commit this separately if you want :)
>
> On Mon, Dec 19, 2022 at 9:09 AM Christoph Muellner
>  wrote:
> >
> > From: Christoph Müllner 
> >
> > The comment above the enumeration of existing attributes got out of
> > order and a few entries were forgotten.
> > This patch synchronizes the comments according to the list.
> > This commit does not include any functional change.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.md: Sync comments with code.
> >
> > Signed-off-by: Christoph Müllner 
> > ---
> >  gcc/config/riscv/riscv.md | 5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> > index df57e2b0b4a..a8bb331f25c 100644
> > --- a/gcc/config/riscv/riscv.md
> > +++ b/gcc/config/riscv/riscv.md
> > @@ -220,7 +220,6 @@ (define_attr "enabled" "no,yes"
> >  ;; mfc transfer from coprocessor
> >  ;; const   load constant
> >  ;; arith   integer arithmetic instructions
> > -;; auipc   integer addition to PC
> >  ;; logical  integer logical instructions
> >  ;; shift   integer shift instructions
> >  ;; slt set less than instructions
> > @@ -236,9 +235,13 @@ (define_attr "enabled" "no,yes"
> >  ;; fcvtfloating point convert
> >  ;; fsqrt   floating point square root
> >  ;; multi   multiword sequence (or user asm statements)
> > +;; auipc   integer addition to PC
> > +;; sfb_alu  SFB ALU instruction
> >  ;; nop no operation
> >  ;; ghost   an instruction that produces no real code
> >  ;; bitmanipbit manipulation instructions
> > +;; rotate   rotation instructions
> > +;; atomic   atomic instructions
> >  ;; Classification of RVV instructions which will be added to each RVV
> .md pattern and used by scheduler.
> >  ;; rdvlenb vector byte length vlenb csrr read
> >  ;; rdvlvector length vl csrr read
> > --
> > 2.38.1
> >
>


Re: [RFC PATCH] RISC-V: Add support for vector crypto extensions

2022-12-27 Thread Philipp Tomsich
On Tue, 27 Dec 2022 at 19:58, Palmer Dabbelt  wrote:
>
> On Tue, 27 Dec 2022 09:35:55 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >
> >
> > On 12/21/22 11:31, Christoph Muellner wrote:
> >> From: Christoph Müllner 
> >>
> >> This series adds basic support for the vector crypto extensions:
> >> * Zvkb
> >> * Zvkg
> >> * Zvkh[a,b]
> >> * Zvkn
> >> * Zvksed
> >> * Zvksh
> >>
> >> The implementation follows the version 20221220 of the specification,
> >> which can be found here:
> >>https://github.com/riscv/riscv-crypto/releases/tag/v20221220
> >>
> >> Note, that this specification is not frozen yet, meaning that
> >> incompatible changes are possible.
> >> Therefore, this patchset is marked as RFC and should not be considered
> >> for upstream inclusion.
> >>
> >> All extensions come with (passing) tests for the feature test macros.
> >>
> >> A Binutils patch series for vector crypto support can be found here:
> >>https://sourceware.org/pipermail/binutils/2022-December/125272.html
> >>
> >> Signed-off-by: Christoph Müllner 
> >> ---
> >>   gcc/common/config/riscv/riscv-common.cc | 16 
> >>   gcc/config/riscv/riscv-opts.h   | 16 
> >>   gcc/config/riscv/riscv.opt  |  3 +++
> >>   gcc/testsuite/gcc.target/riscv/zvkb.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkg.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkha.c  | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkhb.c  | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvkn.c   | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvksed.c | 13 +
> >>   gcc/testsuite/gcc.target/riscv/zvksh.c  | 13 +
> >>   10 files changed, 126 insertions(+)
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkg.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkha.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkhb.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvkn.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvksed.c
> >>   create mode 100644 gcc/testsuite/gcc.target/riscv/zvksh.c
> > I don't see anything objectionable in here.  I'd guess that most (but
> > perhaps not all) of these will wire up as builtins at some point in the
> > not too distant future.
>
> These allow things like `-march=rv64gc_zvksh`, it's not really clear
> what the intended behavior is there -- specifically, does that
> implicitly enable some base vector extension?
>
> I've just skimmed the ISA manual here, but all I can find is a bit
> ambiguous
>
> With the exception of Zvknhb, each of these Vector Crypto Extensions
> can be built on any base Vector Extension, embedded (Zve*) or
> application ("V"). Zvknhb requires ELEN=64 and therefore cannot be
> implemented on a Zve32* base.
>
> I doubt it really matters which way we pick, but it is something we're
> going to need to keep consistent moving forwards as otherwise users
> might get some surprising behavior.  This has come up a bunch of times,
> but there's slightly different wording each time in the specs and I'm
> never really sure what to read of it.
>
> I don't think that alone would be enough to delay this for gcc-14, but
> as far as I can tell binutils is branching very soon for a target
> release in the middle of January.  I'm guessing these extensions will
> not be frozen by then, which would be a blocker.
>
> I'm not sure if anyone has a pressing need for these?  If not, I think
> it's best to delay them until binutils-2.41 (and presumably then
> gcc-14).

Given that the encodings last changed on Dec 21st, I would also prefer
if we could hold off until after binutils-2.40 has been released.

Philipp.


Re: [PATCH v2 05/11] riscv: thead: Add support for the XTheadBa ISA extension

2022-12-19 Thread Philipp Tomsich
On Mon, 19 Dec 2022 at 05:20, Kito Cheng  wrote:
>
> LGTM with a nit:
>
> ...
> > +  "TARGET_XTHEADBA
> > +   && (INTVAL (operands[2]) >= 0) && (INTVAL (operands[2]) <= 3)"
>
> IN_RANGE(INTVAL(operands[2]), 0, 3)
>
> and I am a little bit surprised that it can be zero

So was I, when reading the specification — and I reconfirmed that bit
by checking with the folks at T-Head.

We discussed this internally before submitting: while this case should
never occur (as other pieces in the compiler are smart enough to
simplify the RTX), we decided to include the 0 as it is an accurate
reflection of the instruction semantics.

Philipp.

>
> > +  "th.addsl\t%0,%1,%3,%2"
> > +  [(set_attr "type" "bitmanip")
> > +   (set_attr "mode" "")])

