RE: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Liu, Hongtao via Gcc-patches


> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, June 14, 2023 10:30 PM
> To: Andrew Stubbs 
> Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com; Jan Hubicka
> ; Liu, Hongtao ;
> kirill.yuk...@gmail.com
> Subject: Re: [PATCH 3/3] AVX512 fully masked vectorization
> 
> 
> 
> > On 14.06.2023 at 16:27, Andrew Stubbs wrote:
> >
> > On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> >> This implements fully masked vectorization or a masked epilog for
> >> AVX512 style masks, which single themselves out by representing each
> >> lane with a single bit and by using integer modes for the mask (both
> >> much like GCN).
> >> AVX512 is also special in that it doesn't have any instruction to
> >> compute the mask from a scalar IV like SVE has with while_ult.
> >> Instead the masks are produced by vector compares and the loop
> >> control retains the scalar IV (mainly to avoid dependences on mask
> >> generation, a suitable mask test instruction is available).
> >
> > This also sounds like GCN. We currently use WHILE_ULT in the middle end
> which expands to a vector compare against a vector of stepped values. This
> requires an additional instruction to prepare the comparison vector
> (compared to SVE), but the "while_ultv64sidi" pattern (for example) returns
> the DImode bitmask, so it works reasonably well.
> >
> >> Like RVV, code generation prefers a decrementing IV, though IVOPTs
> >> messes things up in some cases, removing that IV and replacing it
> >> with an incrementing one used for address generation.
> >> One of the motivating testcases is from PR108410 which in turn is
> >> extracted from x264 where large size vectorization shows issues with
> >> small trip loops.  Execution time there improves compared to classic
> >> AVX512 with AVX2 epilogues for the cases of less than 32 iterations.
> >> size   scalar     128     256     512    512e    512f
> >>    1     9.42   11.32    9.35   11.17   15.13   16.89
> >>    2     5.72    6.53    6.66    6.66    7.62    8.56
> >>    3     4.49    5.10    5.10    5.74    5.08    5.73
> >>    4     4.10    4.33    4.29    5.21    3.79    4.25
> >>    6     3.78    3.85    3.86    4.76    2.54    2.85
> >>    8     3.64    1.89    3.76    4.50    1.92    2.16
> >>   12     3.56    2.21    3.75    4.26    1.26    1.42
> >>   16     3.36    0.83    1.06    4.16    0.95    1.07
> >>   20     3.39    1.42    1.33    4.07    0.75    0.85
> >>   24     3.23    0.66    1.72    4.22    0.62    0.70
> >>   28     3.18    1.09    2.04    4.20    0.54    0.61
> >>   32     3.16    0.47    0.41    0.41    0.47    0.53
> >>   34     3.16    0.67    0.61    0.56    0.44    0.50
> >>   38     3.19    0.95    0.95    0.82    0.40    0.45
> >>   42     3.09    0.58    1.21    1.13    0.36    0.40
> >> 'size' specifies the number of actual iterations, 512e is for a
> >> masked epilog and 512f for the fully masked loop.  From
> >> 4 scalar iterations on, the AVX512 masked-epilog code is clearly the
> >> winner; the fully masked variant is clearly worse and its size
> >> benefit is also tiny.
> >
> > Let me check I understand correctly. In the fully masked case, there is a
> single loop in which a new mask is generated at the start of each iteration.
> In the masked epilogue case, the main loop uses no masking whatsoever, thus
> avoiding the need for generating a mask, carrying the mask, inserting
> vec_merge operations, etc., and then the epilogue looks much like the fully
> masked case, but unlike smaller-mode epilogues there is no loop because the
> epilogue vector size is the same. Is that right?
> 
> Yes.
What about the vectorizer and unrolling? When the vector size is the same
but the unroll factor is N, there are at most N - 1 iterations for the
epilogue loop; will there still be a loop?
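To make the two schemes concrete for readers, here is a scalar C++ model (an illustrative sketch only, not the vectorizer's actual output; the lane count and all names are invented) of a fully masked loop versus an unmasked main loop with a single masked epilogue step, using an AVX512-style one-bit-per-lane mask derived from the scalar IV:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kLanes = 16;  // one zmm worth of 32-bit lanes

// AVX512-style mask: one bit per lane, produced by comparing lane
// indices against the remaining trip count (no while_ult instruction).
static uint16_t make_mask(int64_t iv, int64_t n) {
  uint16_t m = 0;
  for (int l = 0; l < kLanes; ++l)
    if (iv + l < n) m |= uint16_t(1) << l;
  return m;
}

// Fully masked: every iteration computes and applies a fresh mask.
void add_fully_masked(std::vector<int>& a, const std::vector<int>& b) {
  int64_t n = (int64_t) a.size();
  for (int64_t iv = 0; iv < n; iv += kLanes) {
    uint16_t m = make_mask(iv, n);
    for (int l = 0; l < kLanes; ++l)
      if (m & (uint16_t(1) << l)) a[iv + l] += b[iv + l];
  }
}

// Masked epilogue: unmasked main loop, then at most one masked step
// (no epilogue loop, since the epilogue vector size is the same).
void add_masked_epilogue(std::vector<int>& a, const std::vector<int>& b) {
  int64_t n = (int64_t) a.size();
  int64_t iv = 0;
  for (; iv + kLanes <= n; iv += kLanes)   // hot path: no masks at all
    for (int l = 0; l < kLanes; ++l) a[iv + l] += b[iv + l];
  if (iv < n) {                            // single masked tail step
    uint16_t m = make_mask(iv, n);
    for (int l = 0; l < kLanes; ++l)
      if (m & (uint16_t(1) << l)) a[iv + l] += b[iv + l];
  }
}
```

The point of the second variant is that the hot path touches no masks at all, which is what the benchmark table above rewards from 4 iterations on.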
> > This scheme seems like it might also benefit GCN, insomuch as it
> > simplifies the hot code path.
> >
> > GCN does not actually have smaller vector sizes, so there's no analogue to
> AVX2 (we pretend we have some smaller sizes, but that's because the
> middle end can't do masking everywhere yet, and it helps make some vector
> constants smaller, perhaps).
> >
> >> This patch does not enable using fully masked loops or masked
> >> epilogues by default.  More work on cost modeling and vectorization
> >> kind selection on x86_64 is necessary for this.
> >> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> >> which could be exploited further to unify some of the flags we have
> >> right now but there didn't seem to be many easy things to merge, so
> >> I'm leaving this for followups.
> >> Mask requirements as registered by vect_record_loop_mask are kept in
> >> their original form and recorded in a hash_set now instead of being
> >> processed to a vector of rgroup_controls.  Instead that's now left to
> >> the final analysis phase which tries forming the rgroup_controls
> >> vector using while_ult and if that fails now tries AVX512 style which
> 

[PATCH V1] RISC-V:Add float16 tuple type support

2023-06-14 Thread shiyulong
From: yulong 

This patch adds support for the float16 tuple type.

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (valid_type): Enable FP16 tuple.
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(ADJUST_ALIGNMENT): Ditto.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
(ADJUST_NUNITS): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t): New 
types.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): New macro.
(vfloat16mf4x3_t): Ditto.
(vfloat16mf4x4_t): Ditto.
(vfloat16mf4x5_t): Ditto.
(vfloat16mf4x6_t): Ditto.
(vfloat16mf4x7_t): Ditto.
(vfloat16mf4x8_t): Ditto.
(vfloat16mf2x2_t): Ditto.
(vfloat16mf2x3_t): Ditto.
(vfloat16mf2x4_t): Ditto.
(vfloat16mf2x5_t): Ditto.
(vfloat16mf2x6_t): Ditto.
(vfloat16mf2x7_t): Ditto.
(vfloat16mf2x8_t): Ditto.
(vfloat16m1x2_t): Ditto.
(vfloat16m1x3_t): Ditto.
(vfloat16m1x4_t): Ditto.
(vfloat16m1x5_t): Ditto.
(vfloat16m1x6_t): Ditto.
(vfloat16m1x7_t): Ditto.
(vfloat16m1x8_t): Ditto.
(vfloat16m2x2_t): Ditto.
(vfloat16m2x3_t): Ditto.
(vfloat16m2x4_t): Ditto.
(vfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): New.
* config/riscv/riscv.md: New.
* config/riscv/vector-iterators.md: New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/tuple-28.c: New test.
* gcc.target/riscv/rvv/base/tuple-29.c: New test.
* gcc.target/riscv/rvv/base/tuple-30.c: New test.
* gcc.target/riscv/rvv/base/tuple-31.c: New test.
* gcc.target/riscv/rvv/base/tuple-32.c: New test.

---
 gcc/config/riscv/genrvv-type-indexer.cc   |  3 -
 gcc/config/riscv/riscv-modes.def  | 15 +
 .../riscv/riscv-vector-builtins-types.def | 25 
 gcc/config/riscv/riscv-vector-builtins.def| 30 ++
 gcc/config/riscv/riscv-vector-switch.def  | 32 ++
 gcc/config/riscv/riscv.md |  5 ++
 gcc/config/riscv/vector-iterators.md  | 37 
 .../gcc.target/riscv/rvv/base/tuple-28.c  | 59 +++
 .../gcc.target/riscv/rvv/base/tuple-29.c  | 59 +++
 .../gcc.target/riscv/rvv/base/tuple-30.c  | 58 ++
 .../gcc.target/riscv/rvv/base/tuple-31.c  | 30 ++
 .../gcc.target/riscv/rvv/base/tuple-32.c  | 16 +
 12 files changed, 366 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-28.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-29.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-30.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-31.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/tuple-32.c

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 8fc93ceaab4..a332a6a3334 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -73,9 +73,6 @@ valid_type (unsigned sew, int lmul_log2, unsigned nf, bool float_p)
   if (nf > 8 || nf < 1)
 return false;
 
-  if (sew == 16 && nf != 1 && float_p) // Disable FP16 tuple in temporarily.
-return false;
-
   switch (lmul_log2)
 {
 case 1:
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 19a4f9fb3db..1d152709ddc 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -220,6 +220,7 @@ ADJUST_ALIGNMENT (VNx1QI, 1);
 #define RVV_TUPLE_MODES(NBYTES, NSUBPARTS, VB, VH, VS, VD)              \
   VECTOR_MODE_WITH_PREFIX (VNx##NSUBPARTS##x, INT, QI, NBYTES, 1);      \
   VECTOR_MODE_WITH_PREFIX (VNx##NSUBPARTS##x, INT, HI, NBYTES / 2, 1);  \
+  VECTOR_MODE_WITH_PREFIX (VNx##NSUBPARTS##x, FLOAT, HF, NBYTES / 2, 1); \
   VECTOR_MODE_WITH_PREFIX (VNx##NSUBPARTS##x, INT, SI, NBYTES / 4, 1);  \
   VECTOR_MODE_WITH_PREFIX (VNx##NSUBPARTS##x, FLOAT, SF, NBYTES / 4, 1);

Re: [PATCH] x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 5:32 PM Jan Beulich  wrote:
>
> On 14.06.2023 10:10, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 1:59 PM Jan Beulich via Gcc-patches
> >  wrote:
> >>
> >> There's no reason to constrain this to AVX512VL, as the wider operation
> >> is not usable for more narrow operands only when the possible memory
> > But this may require more resources (on the AMD znver4 processor a zmm
> > instruction will also be split into 2 uops, right?), and on some Intel
> > processors (SKX/CLX) there will be frequency reduction.
>
> I'm afraid I don't follow: Largely the same AVX512 code would be
> generated when passing -mavx512vl, so how can power/performance
> considerations matter here? All I'm doing here (and in a few more
Yes, for -march=*** it is ok since AVX512VL is included.
What your patch improves is -mavx512f -mno-avx512vl, but for specific
option combinations like -mavx512f -mprefer-vector-width=256
-mno-avx512vl, your patch will produce zmm instructions, which is not
expected.
> patches I'm still in the process of testing) is relax when AVX512
> insns can actually be used (reducing the copying between registers
> and/or the number of insns needed). My understanding on the Intel
> side is that it only matters whether AVX512 insns are used, not
No, vector length matters: ymm/xmm EVEX insns are ok to use, but zmm
insns will cause frequency reduction.
> what vector length they are. You may be right about znver4, though.
>
> Nevertheless I agree ...
>
> > If it needs to be done, it is better guarded with
> > !TARGET_PREFER_AVX256: at least when the micro-architecture is
> > AVX256_OPTIMAL or users explicitly use -mprefer-vector-width=256, we
> > don't want to produce any zmm instruction by surprise.  (Although
> > -mprefer-vector-width=256 is supposed to be for the auto-vectorizer,
> > backend codegen also uses it in such cases, i.e. *movsf_internal
> > alternative 5 uses zmm only with TARGET_AVX512F && !TARGET_PREFER_AVX256.)
>
> ... that respecting such overrides is probably desirable, so I'll
> adjust.
>
> Jan
>
> >> source is a non-broadcast one. This way even the scalar copysign3
> >> can benefit from the operation being a single-insn one (leaving aside
> >> moves which the compiler decides to insert for unclear reasons, and
> >> leaving aside the fact that bcst_mem_operand() is too restrictive for
> >> broadcast to be embedded right into VPTERNLOG*).
> >>
> >> Along with this also request value duplication in
> >> ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
> >> excess space allocation in .rodata.*, filled with zeros which are never
> >> read.
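For readers, copysign is a pure three-operand bitwise function, exactly the kind of three-input boolean that VPTERNLOG evaluates in one instruction. A scalar C++ model of the bit selection (an illustrative sketch only; GCC's actual expansion lives in ix86_expand_copysign and the sse.md patterns quoted below):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// copysign(a, b): magnitude bits from a, the sign bit from b.
// VPTERNLOG can compute this three-operand bitwise select in one insn;
// here the same select is modeled with scalar integer operations.
double copysign_bits(double a, double b) {
  uint64_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  const uint64_t sign = uint64_t(1) << 63;   // the sign-bit mask
  uint64_t ur = (ub & sign) | (ua & ~sign);  // bit-select on the mask
  double r;
  std::memcpy(&r, &ur, sizeof r);
  return r;
}
```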
> >>
> >> gcc/
> >>
> >> * config/i386/i386-expand.cc (ix86_expand_copysign): Request
> >> value duplication by ix86_build_signbit_mask() when AVX512F and
> >> not HFmode.
> >> * config/i386/sse.md (*_vternlog_all): Convert to
> >> 2-alternative form. Adjust "mode" attribute. Add "enabled"
> >> attribute.
> >> (*_vpternlog_1): Relax to just TARGET_AVX512F.
> >> (*_vpternlog_2): Likewise.
> >> (*_vpternlog_3): Likewise.
>


-- 
BR,
Hongtao


Re: [PATCH] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-14 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 14, 2023 at 5:03 PM Jan Beulich  wrote:
>
> On 14.06.2023 09:41, Hongtao Liu wrote:
> > On Wed, Jun 14, 2023 at 1:58 PM Jan Beulich via Gcc-patches
> >  wrote:
> >>
> >> ... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are
> >> never longer (yet sometimes shorter) than the corresponding VSHUFPS /
> >> VPSHUFD, due to the immediate operand of the shuffle insns balancing the
> >> need for VEX3 in the broadcast ones. When EVEX encoding is required the
> >> broadcast insns are always shorter.
> >>
> >> Add two new alternatives each, one covering the AVX2 case and one
> >> covering AVX512.
> > I think you can just change assemble output for this first alternative
> > when TARGET_AVX2, use vbroadcastss, else use vshufps since
> > vbroadcastss only accept register operand when TARGET_AVX2. And no
> > need to support 2 extra alternatives which doesn't make sense just
> > make RA more confused about the same meaning of different
> > alternatives.
>
> You mean by switching from "@ ..." to C code using "switch
> (which_alternative)"? I can do that, sure. Yet that'll make for a
> more complicated "length_immediate" attribute then. Would be nice
Yes, you can also do something like
   (set (attr "length_immediate")
     (cond [(eq_attr "alternative" "0")
              (if_then_else (match_test "TARGET_AVX2")
                (const_string "")
                (const_string "1"))
            ...]

> if you could confirm that this is what you want, as I may well
> have misunderstood you.
>
> But that'll be for vec_dupv4sf only, as vec_dupv4si is subtly
> different.
Yes, but can we use vpbroadcastd for vec_dupv4si similarly?
>
> >> ---
> >> I'm working from the assumption that the isa attributes to the original
> >> 1st and 2nd alternatives don't need further restricting (to sse2_noavx2
> >> or avx_noavx2 as applicable), as the new earlier alternatives cover all
> >> operand forms already when at least AVX2 is enabled.
> >>
> >> Isn't prefix_extra use bogus here? What extra prefix does vbroadcastss
> >> use? (Same further down in *vec_dupv4si and avx2_vbroadcasti128_
> >> and elsewhere.)
> > Not sure about this part. I grep prefix_extra, seems only used by
> > znver.md/znver4.md for schedule, and only for comi instructions(?the
> > reservation name seems so).
>
> define_attr "length_vex" and define_attr "length" use it, too.
> Otherwise I would have asked whether the attribute couldn't be
> purged from most insns.
>
> My present understanding is that the attribute is wrong on
> vec_dupv4sf (and hence wants dropping from there altogether), and it
> should be "prefix_data16" instead on *vec_dupv4si, evaluating to 1
> only for the non-AVX pshufd case. I suspect at least the latter
> would be going to far for doing it "while here" right in this patch.
> Plus I think I have seen various other questionable uses of that
> attribute.
>
> >> Is use of Yv for the source operand really necessary in *vec_dupv4si?
> >> I.e. would scalar integer values be put in XMM{16...31} when AVX512VL
> > Yes, You can look at ix86_hard_regno_mode_ok, EXT_REX_SSE_REGNO is
> > allowed for scalar mode, but not for 128/256-bit vector modes.
> >
> > 20204  if (TARGET_AVX512F
> > 20205  && (VALID_AVX512F_REG_OR_XI_MODE (mode)
> > 20206  || VALID_AVX512F_SCALAR_MODE (mode)))
> > 20207return true;
>
> Okay, so I need to switch input constraints for relevant new
> alternatives to Yv (I actually wonder why I did use v in
> vec_dupv4sf, as it was clear to me that SFmode can be in the high
> 16 xmm registers with just AVX512F).
>
> >> isn't enabled? If so (*movsi_internal / *movdi_internal suggest they
> >> might), wouldn't *vec_dupv2di need to use Yv as well in its 3rd
> >> alternative (or just m, as Yv is already covered by the 2nd one)?
> > I guess xm is more suitable since we still want to allocate
> > operands[1] to a register when sse3_noavx.
> > It didn't hit any error since for avx and above, alternative 1 (the
> > 2nd one) is always matched before alternative 2.
>
> I'm afraid I don't follow: With just -mavx512f the source operand
> can be in, say, %xmm16 (as per your clarification above). This
> would not match Yv, but it would match vm. And hence wrongly
> create an AVX512VL form of vmovddup. I didn't try it out earlier,
> because unlike for SFmode / DFmode I thought it's not really clear
> how to get the compiler to reliably put a DImode variable in an xmm
> reg, but it just occurred to me that this can be done the same way
> there. And voila,
>
> typedef long long __attribute__((vector_size(16))) v2di;
>
> v2di bcst(long long ll) {
> register long long x asm("xmm16") = ll;
>
> asm("nop %%esp" : "+v" (x));
> return (v2di){x, x};
> }
>
> compiled with just -mavx512f (and -O2) produces an AVX512VL insn.
Ah, I see, indeed it's a potential bug for -mavx512f -mno-avx512vl.
I meant with -mavx512vl,
_vec_dup_gpr will be matched
instead of vec_dupv2di since it's put before 

[PATCH] Reimplement __gnu_cxx::__ops operators

2023-06-14 Thread François Dumont via Gcc-patches
I think we all agree that __gnu_cxx::__ops needed to be reimplemented, 
here it is.


Note that I kept the usage of std::ref in ,  and .

    libstdc++: Reimplement __gnu_cxx::__ops operators

    Replace functors taking iterators as input with functors matching
    the same Standard expectations as the ones imposed on predicates
    used in predicate-aware algos. Doing so we need far fewer functors.
    It imposes that iterators are dereferenced at the algo level and
    not in the functors anymore.

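For readers, the shape of the change can be illustrated with a minimal before/after pair of functors and a toy algorithm (simplified, invented names; not the patch's actual code):

```cpp
#include <cassert>
#include <vector>

// Old style: the functor receives iterators and dereferences them itself.
struct IterLessIter {
  template <typename It1, typename It2>
  bool operator()(It1 it1, It2 it2) const { return *it1 < *it2; }
};

// New style: the functor compares values, matching the Standard
// expectations placed on user-supplied predicates; the algorithm
// does the dereferencing.
struct Less {
  template <typename L, typename R>
  bool operator()(const L& lhs, const R& rhs) const { return lhs < rhs; }
};

template <typename It, typename Cmp>
It min_element_new(It first, It last, Cmp comp) {
  It best = first;
  for (++first; first != last; ++first)
    if (comp(*first, *best))   // dereference at the algorithm level
      best = first;
  return best;
}
```

One value-comparing functor now serves iterator/iterator, iterator/value and value/iterator uses, which is why far fewer functors are needed.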
    libstdc++-v3/ChangeLog:

    * include/std/functional (_Not_fn): Move to...
    * include/bits/predefined_ops.h: ...here, and expose a version
    in pre-C++14 mode.
    (__not_fn): New, use latter.
    (_Iter_less_iter, _Iter_less_val, _Val_less_iter, 
_Iter_equal_to_iter)
    (_Iter_equal_to_val, _Iter_comp_iter, _Iter_comp_val, 
_Val_comp_iter)
    (_Iter_equals_val, _Iter_equals_iter, _Iter_pred, 
_Iter_comp_val)

    (_Iter_comp_to_val, _Iter_comp_to_iter, _Iter_negate): Remove.
    (__iter_less_iter, __iter_less_val, __iter_comp_val, 
__val_less_iter)
    (__val_comp_iter, __iter_equal_to_iter, 
__iter_equal_to_val, __iter_comp_iter)
    (__val_comp_iter, __iter_equals_val, __iter_comp_iter, 
__pred_iter): Remove.

    (_Less, _Equal_to, _Equal_to_val, _Comp_val): New.
    (__less, __equal_to, __comp_val): New.
    * include/bits/stl_algo.h: Adapt all algos to use new 
__gnu_cxx::__ops operators.
    When possible use std::move to pass predicates between 
routines.

    * include/bits/stl_algobase.h: Likewise.
    * include/bits/stl_heap.h: Likewise.
    * include/std/deque: Cleanup usage of __gnu_cxx::__ops 
operators.

    * include/std/string: Likewise.
    * include/std/vector: Likewise.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/bits/predefined_ops.h b/libstdc++-v3/include/bits/predefined_ops.h
index e9933373ed9..dc8920ed5f8 100644
--- a/libstdc++-v3/include/bits/predefined_ops.h
+++ b/libstdc++-v3/include/bits/predefined_ops.h
@@ -32,376 +32,170 @@
 
 #include 
 
+#if __cplusplus >= 201103L
+# include 
+#endif
+
 namespace __gnu_cxx
 {
 namespace __ops
 {
-  struct _Iter_less_iter
+  struct _Less
   {
-template
+template
   _GLIBCXX14_CONSTEXPR
   bool
-  operator()(_Iterator1 __it1, _Iterator2 __it2) const
-  { return *__it1 < *__it2; }
+  operator()(const _Lhs& __lhs, const _Rhs& __rhs) const
+  { return __lhs < __rhs; }
   };
 
   _GLIBCXX14_CONSTEXPR
-  inline _Iter_less_iter
-  __iter_less_iter()
-  { return _Iter_less_iter(); }
-
-  struct _Iter_less_val
-  {
-#if __cplusplus >= 201103L
-constexpr _Iter_less_val() = default;
-#else
-_Iter_less_val() { }
-#endif
-
-_GLIBCXX20_CONSTEXPR
-explicit
-_Iter_less_val(_Iter_less_iter) { }
-
-template
-  _GLIBCXX20_CONSTEXPR
-  bool
-  operator()(_Iterator __it, _Value& __val) const
-  { return *__it < __val; }
-  };
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Iter_less_val
-  __iter_less_val()
-  { return _Iter_less_val(); }
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Iter_less_val
-  __iter_comp_val(_Iter_less_iter)
-  { return _Iter_less_val(); }
-
-  struct _Val_less_iter
-  {
-#if __cplusplus >= 201103L
-constexpr _Val_less_iter() = default;
-#else
-_Val_less_iter() { }
-#endif
-
-_GLIBCXX20_CONSTEXPR
-explicit
-_Val_less_iter(_Iter_less_iter) { }
-
-template
-  _GLIBCXX20_CONSTEXPR
-  bool
-  operator()(_Value& __val, _Iterator __it) const
-  { return __val < *__it; }
-  };
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Val_less_iter
-  __val_less_iter()
-  { return _Val_less_iter(); }
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Val_less_iter
-  __val_comp_iter(_Iter_less_iter)
-  { return _Val_less_iter(); }
-
-  struct _Iter_equal_to_iter
-  {
-template
-  _GLIBCXX20_CONSTEXPR
-  bool
-  operator()(_Iterator1 __it1, _Iterator2 __it2) const
-  { return *__it1 == *__it2; }
-  };
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Iter_equal_to_iter
-  __iter_equal_to_iter()
-  { return _Iter_equal_to_iter(); }
+  inline _Less
+  __less()
+  { return _Less(); }
 
-  struct _Iter_equal_to_val
+  struct _Equal_to
   {
-template
+template
   _GLIBCXX20_CONSTEXPR
   bool
-  operator()(_Iterator __it, _Value& __val) const
-  { return *__it == __val; }
+  operator()(const _Lhs& __lhs, const _Rhs& __rhs) const
+  { return __lhs == __rhs; }
   };
 
   _GLIBCXX20_CONSTEXPR
-  inline _Iter_equal_to_val
-  __iter_equal_to_val()
-  { return _Iter_equal_to_val(); }
-
-  _GLIBCXX20_CONSTEXPR
-  inline _Iter_equal_to_val
-  __iter_comp_val(_Iter_equal_to_iter)
-  { return _Iter_equal_to_val(); }
-
-  template
-struct _Iter_comp_iter
-{
-  _Compare _M_comp;
-
-  explicit 

Re: [PATCHv3, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-06-14 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

on 2023/5/26 10:49, HAO CHEN GUI wrote:
> Hi,
>   This patch adds a new insn for vector splat with small V2DI constants on
> P8. If the value of the constant is in RANGE (-16, 15) and not 0 or -1, it
> can be loaded with vspltisw and vupkhsw on P8. It should be more efficient
> than loading the vector from memory.
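The two-instruction synthesis can be modeled in scalar C++ (a sketch of the instruction semantics only, not the rs6000 implementation; big-endian word numbering is assumed for vupkhsw, though it does not matter here since all words are identical after the splat):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of the proposed synthesis for small V2DI constants.
std::array<int64_t, 2> splat_v2di(int imm /* must be in [-16, 15] */) {
  assert(imm >= -16 && imm <= 15);
  int32_t w[4];
  for (int i = 0; i < 4; ++i)
    w[i] = imm;                       // vspltisw: splat the 5-bit SIMM
  // vupkhsw: sign-extend the high words into two 64-bit doublewords
  return { int64_t(w[0]), int64_t(w[1]) };
}
```

Every representable constant in [-16, 15] thus falls out of two cheap register-only instructions instead of a load.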
> 
>   Compared to the last version, the main change is to set a default value
> for the third parameter of vspltisw_vupkhsw_constant_p and call the
> function with 2 arguments when the third one doesn't matter.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> 2023-05-26  Haochen Gui 
> 
> gcc/
>   PR target/104124
>   * config/rs6000/altivec.md (*altivec_vupkhs_direct): Rename
>   to...
>   (altivec_vupkhs_direct): ...this.
>   * config/rs6000/constraints.md (wT constraint): New constant for a
>   vector constraint that can be loaded with vspltisw and vupkhsw.
>   * config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
>   predicate for wT constraint.
>   (easy_vector_constant): Call vspltisw_vupkhsw_constant_p to check if
>   a vector constant can be synthesized with a vspltisw and a vupkhsw.
>   * config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p): Declare.
>   * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New function
>   to return true if OP mode is V2DI and can be synthesized with vupkhsw
>   and vspltisw.
>   * config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
>   constants with vspltisw and vupkhsw.
> 
> gcc/testsuite/
>   PR target/104124
>   * gcc.target/powerpc/pr104124.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 49b0c964f4d..2c932854c33 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs"
>  }
>[(set_attr "type" "vecperm")])
> 
> -(define_insn "*altivec_vupkhs_direct"
> +(define_insn "altivec_vupkhs_direct"
>[(set (match_operand:VP 0 "register_operand" "=v")
>   (unspec:VP [(match_operand: 1 "register_operand" "v")]
>UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
> diff --git a/gcc/config/rs6000/constraints.md 
> b/gcc/config/rs6000/constraints.md
> index c4a6ccf4efb..e7f185660c0 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -144,6 +144,10 @@ (define_constraint "wS"
>"@internal Vector constant that can be loaded with XXSPLTIB & sign 
> extension."
>(match_test "xxspltib_constant_split (op, mode)"))
> 
> +(define_constraint "wT"
> +  "@internal Vector constant that can be loaded with vspltisw & vupkhsw."
> +  (match_test "vspltisw_vupkhsw_constant_split (op, mode)"))

Could we avoid adding this new constraint?  Instead, put the check
vspltisw_vupkhsw_constant_split (op, mode) into the condition of the
define_insn_and_split "*vspltisw_v2di_split", and update the constraint
with an existing constraint which stands for a superset of vspltisw &
vupkhsw constants, such as: W?

> +
>  ;; ISA 3.0 DS-form instruction that has the bottom 2 bits 0 and no update 
> form.
>  ;; Used by LXSD/STXSD/LXSSP/STXSSP.  In contrast to "Y", the multiple-of-four
>  ;; offset is enforced for 32-bit too.
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 52c65534e51..1ed770bffa6 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -694,6 +694,14 @@ (define_predicate "xxspltib_constant_split"
>return num_insns > 1;
>  })
> 
> +;; Return true if the operand is a constant that can be loaded with a 
> vspltisw
> +;; instruction and then a vupkhsw instruction.
> +
> +(define_predicate "vspltisw_vupkhsw_constant_split"
> +  (match_code "const_vector")
> +{
> +  return vspltisw_vupkhsw_constant_p (op, mode);
> +})

Maybe simpler with:

   (and (match_code "const_vector")
        (match_test "vspltisw_vupkhsw_constant_p (op, mode)"))

> 
>  ;; Return 1 if the operand is constant that can loaded directly with a 
> XXSPLTIB
>  ;; instruction.
> @@ -742,6 +750,11 @@ (define_predicate "easy_vector_constant"
>&& xxspltib_constant_p (op, mode, _insns, ))
>   return true;
> 
> +  /* V2DI constant within RANGE (-16, 15) can be synthesized with a
> +  vspltisw and a vupkhsw.  */
> +  if (vspltisw_vupkhsw_constant_p (op, mode, ))
> + return true;
> +
>return easy_altivec_constant (op, mode);
>  }
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index 1a4fc1df668..00cb2d82953 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, 
> rtx, int, int, int,
> 
>  extern int easy_altivec_constant (rtx, machine_mode);

Re: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-14 Thread Tejas Belagod via Gcc-patches



From: Kyrylo Tkachov 
Date: Wednesday, June 14, 2023 at 10:11 PM
To: Prathamesh Kulkarni , Tejas Belagod 

Cc: Richard Sandiford , gcc-patches@gcc.gnu.org 

Subject: RE: [PATCH v2] [PR96339] Optimise svlast[ab]


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Prathamesh
> Kulkarni via Gcc-patches
> Sent: Wednesday, June 14, 2023 8:13 AM
> To: Tejas Belagod 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
>
> On Tue, 13 Jun 2023 at 12:38, Tejas Belagod via Gcc-patches
>  wrote:
> >
> >
> >
> > From: Richard Sandiford 
> > Date: Monday, June 12, 2023 at 2:15 PM
> > To: Tejas Belagod 
> > Cc: gcc-patches@gcc.gnu.org , Tejas Belagod
> 
> > Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
> > Tejas Belagod  writes:
> > > From: Tejas Belagod 
> > >
> > >   This PR optimizes an SVE intrinsics sequence where
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   selects a scalar based on a constant predicate and a variable vector.
> > >   This sequence is optimized to return the corresponding element of a
> > >   NEON vector.  For example,
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > > umov    w0, v0.b[1]
> > >   Likewise,
> > > svlastb (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > > umov    w0, v0.b[0]
> > >   This optimization only works provided the constant predicate maps to a
> > >   range that is within the bounds of a 128-bit NEON register.
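The semantics being folded can be modeled in a few lines. This is an illustrative Python sketch, not the GCC fold; the wrap-around behavior of svlasta is an assumption based on the LASTA instruction.

```python
# Model of svlasta/svlastb with a constant predicate.  Illustrative
# sketch only, not the actual SVE implementation.

def svlastb(pred, x):
    """Element of x at the last active lane of pred."""
    last = max(i for i, p in enumerate(pred) if p)
    return x[last]

def svlasta(pred, x):
    """Element of x just after the last active lane, wrapping to
    lane 0 past the end of the vector."""
    last = max(i for i, p in enumerate(pred) if p)
    return x[(last + 1) % len(x)]

x = list(range(100, 116))        # a 16-lane byte vector
vl1 = [True] + [False] * 15      # svptrue_pat_b8 (SV_VL1): lane 0 active

print(svlasta(vl1, x))   # 101 -> lane 1, i.e. "umov w0, v0.b[1]"
print(svlastb(vl1, x))   # 100 -> lane 0, i.e. "umov w0, v0.b[0]"
```

With a constant predicate the lane index is known at compile time, which is why the fold can emit a plain lane extract.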
> > >
> > > gcc/ChangeLog:
> > >
> > >PR target/96339
> > >* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): 
> > > Fold
> sve
> > >calls that have a constant input predicate vector.
> > >(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
> > >(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
> > >(svlast_impl::vect_all_same): Check if all vector elements are 
> > > equal.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >PR target/96339
> > >* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
> > >* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
> > >to expect optimized code for function body.
> > >* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): 
> > > Likewise.
> >
> > OK, thanks.
> >
> > Applied on master, thanks.
> Hi Tejas,
> This seems to break aarch64 bootstrap build with following error due
> to -Wsign-compare diagnostic:
> 00:18:19 /home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/aarch64-sve-builtins-base.cc:1133:35:
> error: comparison of integer expressions of different signedness:
> ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
> 00:18:19  1133 | for (i = npats; i < enelts; i += step_1)
> 00:18:19  | ~~^~~~
> 00:30:46 abe-debug-build: cc1plus: all warnings being treated as errors
> 00:30:46 abe-debug-build: make[3]: ***
> [/home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/t-aarch64:96:
> aarch64-sve-builtins-base.o] Error 1

Fixed thusly in trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold):
Fix signed comparison warning in loop from npats to enelts.
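For context, the hazard that -Wsign-compare guards against can be sketched outside of C. This is a Python model of C's usual arithmetic conversions; the 64-bit width and the values are hypothetical.

```python
# Model of why -Werror=sign-compare flags `int < long unsigned int`:
# C converts the signed operand to the unsigned type before comparing.
# Illustrative sketch; the width and values are hypothetical.

ULONG_BITS = 64

def c_compare_lt(signed_i, unsigned_n, bits=ULONG_BITS):
    """Model `i < n` with int i and unsigned n: i is reduced modulo
    2**bits first, as C's usual arithmetic conversions require."""
    return (signed_i % (1 << bits)) < unsigned_n

# Non-negative counters (as in `for (i = npats; i < enelts; ...)`)
# compare as expected:
print(c_compare_lt(5, 16))    # True

# A negative value silently becomes huge and the loop test flips:
print(c_compare_lt(-1, 16))   # False: -1 converts to 2**64 - 1
```

The fix simply gives the loop counter a type matching the bound, so no implicit conversion happens.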


Ah, sorry for breaking bootstrap, and thanks Kyrill for the fix.

Tejas.

>
> Thanks,
> Prathamesh
> >
> > Tejas.
> >
> >
> > Richard


Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread juzhe.zh...@rivai.ai
After several considerations, I think we may need to add VF_AUTO iterators
(with predicate TARGET_ZVFH for the vector HF modes) for FP autovec.
Also add testcases for these unary operations with -march=rv64gc_zvfhmin to
make sure they don't cause any ICEs or unexpected vectorization.

like https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621322.html
this patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-14 23:31
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovec FP unary operations.
Hi,
 
this patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.  vfrsqrt7 will be added at a later time.
 
Similar to the binop tests, there are flavors for zvfh now.  Prerequisites
as before.
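The elementwise semantics of the three new expanders can be sketched as follows. This is a Python model only; the md patterns emit the vector instructions rather than looping.

```python
import math

# Elementwise semantics of the new expanders (vfneg.v, vfabs.v,
# vfsqrt.v), modeled on plain Python lists.  Illustrative sketch.

def vfneg(v):
    return [-e for e in v]

def vfabs(v):
    return [abs(e) for e in v]

def vfsqrt(v):
    return [math.sqrt(e) for e in v]

v = [4.0, -9.0, 16.0]
print(vfneg(v))           # [-4.0, 9.0, -16.0]
print(vfabs(v))           # [4.0, 9.0, 16.0]
print(vfsqrt(vfabs(v)))   # [2.0, 3.0, 4.0]
```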
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (<optab><mode>2): Add unop expanders.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
gcc/config/riscv/autovec.md   | 36 ++-
.../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
.../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
.../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
.../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
.../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
.../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
.../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
.../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
.../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
.../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
.../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
.../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
.../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
.../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
.../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
16 files changed, 241 insertions(+), 11 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1c6d793cae0..72154400f1f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -498,7 +498,7 @@ (define_expand "<optab><mode>2"
})
;; 
---
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
;; 
---
(define_expand "abs2"
@@ -517,6 +517,40 @@ (define_expand "abs2"
   DONE;
})
+;; 
---
+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop_nofrm:VF
+ (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF 0 

Re: [PATCH v3] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread juzhe.zh...@rivai.ai
LGTM, thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-15 10:18
To: gcc-patches
CC: juzhe.zhong; palmer; rdapp.gcc; jeffreyalaw; kito.cheng
Subject: [PATCH v3] RISC-V: Use merge approach to optimize vector permutation
From: Juzhe-Zhong 
 
This patch optimizes the permutation cases that are suitable for the
merge approach.
 
Consider this following case:
typedef int8_t vnx16qi __attribute__((vector_size (16)));
 
void __attribute__ ((noipa))
merge0 (vnx16qi x, vnx16qi y, vnx16qi *out)
{
  vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16);
  *(vnx16qi*)out = v;
}
 
The gimple IR:
v_3 = VEC_PERM_EXPR <x, y, selector>;
 
Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 2, nunits + 3, 4, nunits + 5, ...  }
 
For this selector, we can use vmsltu + vmerge to optimize the codegen.
 
Before this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v3,0(a5)
        li      a5,31
        vsetivli        zero,16,e8,m1,ta,mu
        vmv.v.x v2,a5
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vl1re8.v        v1,0(a5)
        vl1re8.v        v4,0(sp)
        vand.vv v1,v1,v2
        vmsgeu.vi       v0,v1,16
        vrgather.vv     v2,v4,v1
        vadd.vi v1,v1,-16
        vrgather.vv     v2,v3,v1,v0.t
        vs1r.v  v2,0(a0)
        ret
 
After this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v1,0(a5)
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vsetivli        zero,16,e8,m1,ta,ma
        vl1re8.v        v0,0(a5)
        vl1re8.v        v2,0(sp)
        vmsltu.vi       v0,v0,16
        vmerge.vvm      v1,v1,v2,v0
        vs1r.v  v1,0(a0)
        ret
 
The key of this optimization is that:
1. mask = vmsltu (selector, nunits)
2. result = vmerge (op0, op1, mask)
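These two steps can be sketched end to end. This is a Python model of the selector check and the vmsltu + vmerge expansion, not the riscv-v.cc implementation.

```python
# Sketch of the vmsltu + vmerge strategy for a two-source permutation.
# Illustrative model, not the GCC implementation in riscv-v.cc.

def is_merge_selector(sel, nunits):
    """Each lane i must pick either op0[i] (value i) or op1[i]
    (value nunits + i)."""
    return all(s == i or s == nunits + i for i, s in enumerate(sel))

def expand_merge(op0, op1, sel):
    nunits = len(op0)
    assert is_merge_selector(sel, nunits)
    # Step 1: mask = vmsltu (selector, nunits) -- lane-wise "sel < nunits".
    mask = [s < nunits for s in sel]
    # Step 2: result = vmerge (op0, op1, mask) -- take op0 where the mask
    # is set, op1 elsewhere.
    return [a if m else b for a, b, m in zip(op0, op1, mask)]

x = list(range(16))                # op0: 0..15
y = list(range(100, 116))          # op1: 100..115
sel = [0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31]

print(expand_merge(x, y, sel))
# [0, 101, 2, 103, 4, 105, 6, 107, 8, 9, 10, 111, 12, 113, 14, 115]
```

Because each lane only ever chooses between the two sources at the same index, no gather is needed, which is what makes vmerge cheaper than the vrgather sequence above.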
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern.
(expand_vec_perm_const_1): Add merge optimization.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
---
gcc/config/riscv/riscv-v.cc   |  53 +
.../riscv/rvv/autovec/vls-vlmax/merge-1.c | 101 +
.../riscv/rvv/autovec/vls-vlmax/merge-2.c | 103 +
.../riscv/rvv/autovec/vls-vlmax/merge-3.c | 109 +
.../riscv/rvv/autovec/vls-vlmax/merge-4.c | 122 ++
.../riscv/rvv/autovec/vls-vlmax/merge-5.c |  76 +++
.../riscv/rvv/autovec/vls-vlmax/merge-6.c |  51 +
.../riscv/rvv/autovec/vls-vlmax/merge-7.c |  25 +++
.../riscv/rvv/autovec/vls-vlmax/merge_run-1.c | 119 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-2.c | 121 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-3.c | 150 +
.../riscv/rvv/autovec/vls-vlmax/merge_run-4.c | 210 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  89 
.../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  59 +
.../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  29 +++
15 files changed, 1417 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c
create mode 100644 

[PATCH v3] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Pan Li via Gcc-patches
From: Juzhe-Zhong 

This patch optimizes the permutation cases that are suitable for the
merge approach.

Consider this following case:
typedef int8_t vnx16qi __attribute__((vector_size (16)));

void __attribute__ ((noipa))
merge0 (vnx16qi x, vnx16qi y, vnx16qi *out)
{
  vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16);
  *(vnx16qi*)out = v;
}

The gimple IR:
v_3 = VEC_PERM_EXPR <x, y, selector>;

Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 2, nunits + 3, 4, nunits + 5, ...  }

For this selector, we can use vmsltu + vmerge to optimize the codegen.

Before this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v3,0(a5)
        li      a5,31
        vsetivli        zero,16,e8,m1,ta,mu
        vmv.v.x v2,a5
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vl1re8.v        v1,0(a5)
        vl1re8.v        v4,0(sp)
        vand.vv v1,v1,v2
        vmsgeu.vi       v0,v1,16
        vrgather.vv     v2,v4,v1
        vadd.vi v1,v1,-16
        vrgather.vv     v2,v3,v1,v0.t
        vs1r.v  v2,0(a0)
        ret

After this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v1,0(a5)
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vsetivli        zero,16,e8,m1,ta,ma
        vl1re8.v        v0,0(a5)
        vl1re8.v        v2,0(sp)
        vmsltu.vi       v0,v0,16
        vmerge.vvm      v1,v1,v2,v0
        vs1r.v  v1,0(a0)
        ret

The key of this optimization is that:
1. mask = vmsltu (selector, nunits)
2. result = vmerge (op0, op1, mask)

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern.
(expand_vec_perm_const_1): Add merge optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
---
 gcc/config/riscv/riscv-v.cc   |  53 +
 .../riscv/rvv/autovec/vls-vlmax/merge-1.c | 101 +
 .../riscv/rvv/autovec/vls-vlmax/merge-2.c | 103 +
 .../riscv/rvv/autovec/vls-vlmax/merge-3.c | 109 +
 .../riscv/rvv/autovec/vls-vlmax/merge-4.c | 122 ++
 .../riscv/rvv/autovec/vls-vlmax/merge-5.c |  76 +++
 .../riscv/rvv/autovec/vls-vlmax/merge-6.c |  51 +
 .../riscv/rvv/autovec/vls-vlmax/merge-7.c |  25 +++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-1.c | 119 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-2.c | 121 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-3.c | 150 +
 .../riscv/rvv/autovec/vls-vlmax/merge_run-4.c | 210 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  89 
 .../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  59 +
 .../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  29 +++
 15 files changed, 1417 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c

diff --git 

Re: [PATCH v2] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread juzhe.zh...@rivai.ai
+  for (int i = n_patterns; i < n_patterns * 2; i++)
+    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
+        && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
+      return false;

As Robin suggested, add a comment here:
/* Check the pattern is monotonic here, otherwise, return false.  */

I will send a V3 with more comments added and merge it, thanks.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-15 09:52
To: gcc-patches
CC: juzhe.zhong; palmer; rdapp.gcc; jeffreyalaw; kito.cheng
Subject: [PATCH v2] RISC-V: Use merge approach to optimize vector permutation
From: Juzhe-Zhong 
 
This patch optimizes the permutation cases that are suitable for the
merge approach.
 
Consider this following case:
typedef int8_t vnx16qi __attribute__((vector_size (16)));
 
void __attribute__ ((noipa))
merge0 (vnx16qi x, vnx16qi y, vnx16qi *out)
{
  vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16);
  *(vnx16qi*)out = v;
}
 
The gimple IR:
v_3 = VEC_PERM_EXPR <x, y, selector>;
 
Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 2, nunits + 3, 4, nunits + 5, ...  }
 
For this selector, we can use vmsltu + vmerge to optimize the codegen.
 
Before this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v3,0(a5)
        li      a5,31
        vsetivli        zero,16,e8,m1,ta,mu
        vmv.v.x v2,a5
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vl1re8.v        v1,0(a5)
        vl1re8.v        v4,0(sp)
        vand.vv v1,v1,v2
        vmsgeu.vi       v0,v1,16
        vrgather.vv     v2,v4,v1
        vadd.vi v1,v1,-16
        vrgather.vv     v2,v3,v1,v0.t
        vs1r.v  v2,0(a0)
        ret
 
After this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v1,0(a5)
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vsetivli        zero,16,e8,m1,ta,ma
        vl1re8.v        v0,0(a5)
        vl1re8.v        v2,0(sp)
        vmsltu.vi       v0,v0,16
        vmerge.vvm      v1,v1,v2,v0
        vs1r.v  v1,0(a0)
        ret
 
The key of this optimization is that:
1. mask = vmsltu (selector, nunits)
2. result = vmerge (op0, op1, mask)
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern.
(expand_vec_perm_const_1): Add merge optimization.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
---
gcc/config/riscv/riscv-v.cc   |  52 +
.../riscv/rvv/autovec/vls-vlmax/merge-1.c | 101 +
.../riscv/rvv/autovec/vls-vlmax/merge-2.c | 103 +
.../riscv/rvv/autovec/vls-vlmax/merge-3.c | 109 +
.../riscv/rvv/autovec/vls-vlmax/merge-4.c | 122 ++
.../riscv/rvv/autovec/vls-vlmax/merge-5.c |  76 +++
.../riscv/rvv/autovec/vls-vlmax/merge-6.c |  51 +
.../riscv/rvv/autovec/vls-vlmax/merge-7.c |  25 +++
.../riscv/rvv/autovec/vls-vlmax/merge_run-1.c | 119 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-2.c | 121 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-3.c | 150 +
.../riscv/rvv/autovec/vls-vlmax/merge_run-4.c | 210 ++
.../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  89 
.../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  59 +
.../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  29 +++
15 files changed, 1416 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c
create mode 100644 

RE: [PATCH] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Li, Pan2 via Gcc-patches
Addressed the comments in PATCH v2 as below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621789.html

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Thursday, June 15, 2023 3:11 AM
To: Robin Dapp ; juzhe.zh...@rivai.ai; 
gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com
Subject: Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation



On 6/14/23 09:00, Robin Dapp wrote:
> Hi Juzhe,
> 
> the general method seems sane and useful (it's not very complicated).
> I was just distracted by
> 
>> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
>> common expression:
>> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }
>>
>> For this selector, we can use vmsltu + vmerge to optimize the codegen.
> 
> because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
> { 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.
> 
> Because of the ascending/monotonic? selector structure we can use vmerge
> instead of vrgather.
> 
>> +/* Recognize the patterns that we can use merge operation to shuffle the
>> +   vectors.  The value of each element (index i) in the selector can only be
>> +   either i or nunits + i.
>> +
>> +   E.g.
>> +   v = VEC_PERM_EXPR (v0, v1, selector),
>> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }
> 
> Same.
> 
>> +
>> +   We can transform such pattern into:
>> +
>> +   v = vcond_mask (v0, v1, mask),
>> +   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
>> +
>> +static bool
>> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
>> +{
>> +  machine_mode vmode = d->vmode;
>> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
>> +  int n_patterns = d->perm.encoding ().npatterns ();
>> +  poly_int64 vec_len = d->perm.length ();
>> +
>> +  for (int i = 0; i < n_patterns; ++i)
>> +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
>> +  return false;
>> +
>> +  for (int i = n_patterns; i < n_patterns * 2; i++)
>> +if (!d->perm.series_p (i, n_patterns, i, n_patterns)
>> +&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
>> +  return false;
> 
> Maybe add a comment that we check that the pattern is actually monotonic
> or however you prefet to call it?
> 
> I didn't go through all tests in detail but skimmed several.  All in all
> looks good to me.
So I think that means we want a V2 for the comment updates.  But I think 
we can go ahead and consider V2 pre-approved.

jeff
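The monotonicity check discussed in the review can be modeled roughly as below. This is a simplified Python sketch of the series_p test; it checks every pattern from lane 0, whereas the patch checks lanes n_patterns..2*n_patterns-1 of the encoding, and vec_perm_indices::series_p itself is more general.

```python
# Simplified model of the series_p check used by shuffle_merge_patterns:
# within each of the n_patterns interleaved sequences, the selector must
# be the identity series (i, i + n, i + 2n, ...) or that series offset
# by vec_len.  Illustrative sketch only.

def series_p(perm, base, step, initial, series_step):
    """True if perm[base], perm[base+step], ... form the arithmetic
    series initial, initial + series_step, ..."""
    return all(perm[j] == initial + k * series_step
               for k, j in enumerate(range(base, len(perm), step)))

def monotonic_merge_ok(perm, n_patterns, vec_len):
    return all(series_p(perm, i, n_patterns, i, n_patterns)
               or series_p(perm, i, n_patterns, vec_len + i, n_patterns)
               for i in range(n_patterns))

# vec_len == 4, two patterns: lanes {0, 2} identity, lanes {1, 3} op1.
print(monotonic_merge_ok([0, 5, 2, 7], 2, 4))   # True
print(monotonic_merge_ok([0, 5, 2, 3], 2, 4))   # False: lane 3 breaks it
```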


[PATCH v2] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Pan Li via Gcc-patches
From: Juzhe-Zhong 

This patch optimizes the permutation cases that are suitable for the
merge approach.

Consider this following case:
typedef int8_t vnx16qi __attribute__((vector_size (16)));

void __attribute__ ((noipa))
merge0 (vnx16qi x, vnx16qi y, vnx16qi *out)
{
  vnx16qi v = __builtin_shufflevector ((vnx16qi) x, (vnx16qi) y, MASK_16);
  *(vnx16qi*)out = v;
}

The gimple IR:
v_3 = VEC_PERM_EXPR <x, y, selector>;

Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 2, nunits + 3, 4, nunits + 5, ...  }

For this selector, we can use vmsltu + vmerge to optimize the codegen.

Before this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v3,0(a5)
        li      a5,31
        vsetivli        zero,16,e8,m1,ta,mu
        vmv.v.x v2,a5
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vl1re8.v        v1,0(a5)
        vl1re8.v        v4,0(sp)
        vand.vv v1,v1,v2
        vmsgeu.vi       v0,v1,16
        vrgather.vv     v2,v4,v1
        vadd.vi v1,v1,-16
        vrgather.vv     v2,v3,v1,v0.t
        vs1r.v  v2,0(a0)
        ret

After this patch:
merge0:
        addi    a5,sp,16
        vl1re8.v        v1,0(a5)
        lui     a5,%hi(.LANCHOR0)
        addi    a5,a5,%lo(.LANCHOR0)
        vsetivli        zero,16,e8,m1,ta,ma
        vl1re8.v        v0,0(a5)
        vl1re8.v        v2,0(sp)
        vmsltu.vi       v0,v0,16
        vmerge.vvm      v1,v1,v2,v0
        vs1r.v  v1,0(a0)
        ret

The key of this optimization is that:
1. mask = vmsltu (selector, nunits)
2. result = vmerge (op0, op1, mask)

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_merge_patterns): New pattern.
(expand_vec_perm_const_1): Add merge optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: New test.
---
 gcc/config/riscv/riscv-v.cc   |  52 +
 .../riscv/rvv/autovec/vls-vlmax/merge-1.c | 101 +
 .../riscv/rvv/autovec/vls-vlmax/merge-2.c | 103 +
 .../riscv/rvv/autovec/vls-vlmax/merge-3.c | 109 +
 .../riscv/rvv/autovec/vls-vlmax/merge-4.c | 122 ++
 .../riscv/rvv/autovec/vls-vlmax/merge-5.c |  76 +++
 .../riscv/rvv/autovec/vls-vlmax/merge-6.c |  51 +
 .../riscv/rvv/autovec/vls-vlmax/merge-7.c |  25 +++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-1.c | 119 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-2.c | 121 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-3.c | 150 +
 .../riscv/rvv/autovec/vls-vlmax/merge_run-4.c | 210 ++
 .../riscv/rvv/autovec/vls-vlmax/merge_run-5.c |  89 
 .../riscv/rvv/autovec/vls-vlmax/merge_run-6.c |  59 +
 .../riscv/rvv/autovec/vls-vlmax/merge_run-7.c |  29 +++
 15 files changed, 1416 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c

diff --git 

RE: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Li, Pan2 via Gcc-patches
Committed with the comment update, thanks Jeff and Juzhe.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Thursday, June 15, 2023 3:08 AM
To: Lehua Ding ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; rdapp@gamil.com; jeffreya...@gamil.com; 
pal...@rivosinc.com
Subject: Re: [PATCH V2] RISC-V: Ensure vector args and return use function 
stack to pass [PR110119]



On 6/14/23 05:56, Lehua Ding wrote:
> The V2 patch addresses comments from Juzhe, thanks.
> 
> Hi,
>   
> The reason for this bug is that in the case where the vector register 
> is set to a fixed length (with 
> `--param=riscv-autovec-preference=fixed-vlmax` option), 
> TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be 
> passed through two scalar registers, but when GCC calls FUNCTION_VALUE 
> (call function riscv_get_arg_info inside) it returns NULL_RTX. These 
> two functions are not unified. The current treatment is to pass all 
> vector arguments and returns through the function stack, and a new calling 
> convention for vector registers will be added in the future.
>   
> Best,
> Lehua
> 
>  PR target/110119
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
> vector mode.
>  (riscv_pass_by_reference): Return true for vector mode.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/base/pr110119-1.c: New test.
>  * gcc.target/riscv/rvv/base/pr110119-2.c: New test.
And just to be clear, I've asked for a minor comment update.  The usual 
procedure is to go ahead and post a V3.  In this case I'll also give that V3 
pre-approval.  So no need to wait for additional acks.  Post it and it can be 
committed immediately.

jeff


[PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-14 Thread Lulu Cheng
The micro-architecture unconditionally treats a "jr $ra" as "return from
subroutine", hence using "jr $ra" for other purposes would interfere with both
subroutine return prediction and the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in the
branch misprediction rate and affect performance.  The same problem exists
with "indirect_jump".

gcc/ChangeLog:

* config/loongarch/loongarch.md: Modify the register constraints for the
"tablejump" and "indirect_jump" templates from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
  1. Modify the description.
  2. Modify the register constraints of the template "indirect_jump".
v2 -> v3:
  1. Modify the description.
---
 gcc/config/loongarch/loongarch.md | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 816a943d155..b37e070660f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
 }
   [(set_attr "type" "branch")])
 
+;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
+;; non-returning indirect jumps through $ra would interfere with both subroutine
+;; return prediction and the more general indirect branch prediction.
+
 (define_expand "indirect_jump"
   [(set (pc) (match_operand 0 "register_operand"))]
   ""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
 })
 
 (define_insn "@indirect_jump"
-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
   ""
   "jr\t%0"
   [(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
 
 (define_insn "@tablejump"
   [(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
(use (label_ref (match_operand 1 "" "")))]
   ""
   "jr\t%0"
-- 
2.31.1



[committed] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Pan Li via Gcc-patches
From: Lehua Ding 

The V2 patch addresses comments from Juzhe, thanks.

Hi,

The reason for this bug is that in the case where the vector register is set
to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not
unified. The current treatment is to pass all vector arguments and returns
through the function stack, and a new calling convention for vector registers
will be added in the future.

https://github.com/riscv-non-isa/riscv-elf-psabi-doc/
https://github.com/palmer-dabbelt/riscv-elf-psabi-doc/commit/126fa719972ff998a8a239c47d506c7809aea363

Best,
Lehua

gcc/ChangeLog:
PR target/110119
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
vector mode.
(riscv_pass_by_reference): Return true for vector mode.

gcc/testsuite/ChangeLog:
PR target/110119
* gcc.target/riscv/rvv/base/pr110119-1.c: New test.
* gcc.target/riscv/rvv/base/pr110119-2.c: New test.
---
 gcc/config/riscv/riscv.cc | 17 
 .../gcc.target/riscv/rvv/base/pr110119-1.c| 26 +++
 .../gcc.target/riscv/rvv/base/pr110119-2.c| 26 +++
 3 files changed, 64 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2..e5ae4e81b7a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
 
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
   if (riscv_v_ext_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const function_arg_info &arg)
return false;
 }
 
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
new file mode 100644
index 000..f16502bcfee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ ((noipa)) vint32m1_t
+f_vint32m1 (int8_t *a, int8_t *out)
+{
+  vint32m1_t v = *(vint32m1_t *) a;
+  return v;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
new file mode 100644
index 000..b233ff1e904
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr110119-2.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gczve32x --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+#include "riscv_vector.h"
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo1 (int32_t *in, int vl)
+{
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
+
+__attribute__ ((noipa)) void
+foo2 (vint32m1x3_t a, int32_t *out, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+}
+
+__attribute__ ((noipa)) vint32m1x3_t
+foo3 (vint32m1x3_t a, int32_t *out, int32_t *in, int vl)
+{
+  __riscv_vsseg3e32_v_i32m1x3 (out, a, vl);
+  vint32m1x3_t v = __riscv_vlseg3e32_v_i32m1x3 (in, vl);
+  return v;
+}
-- 
2.34.1



Re: [PATCH] LoongArch: Set default alignment for functions and labels with -mtune

2023-06-14 Thread Lulu Cheng

LGTM! Thanks!

On 2023/6/14 8:43 AM, Xi Ruoyao wrote:

The LA464 micro-architecture is sensitive to alignment of code.  The
Loongson team has benchmarked various combinations of function and label
alignments; the results [1] show that 16-byte label alignment together
with 32-byte function alignment gives the best results in terms of SPEC
score.

Add an mtune-based, table-driven mechanism to set the defaults of
-falign-{functions,labels}.  As LA464 is the first (and for now the only)
uarch supported by GCC, the same setting is also used for the "generic"
-mtune=loongarch64.  In the future we may use different settings for
LA{2,3,6}64 once we add support for them.

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

gcc/ChangeLog:

* config/loongarch/loongarch-tune.h (loongarch_align): New
struct.
* config/loongarch/loongarch-def.h (loongarch_cpu_align): New
array.
* config/loongarch/loongarch-def.c (loongarch_cpu_align): Define
the array.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Set the value of
-falign-functions= if -falign-functions is enabled but no value
is given.  Likewise for -falign-labels=.
---
  gcc/config/loongarch/loongarch-def.c  | 12 
  gcc/config/loongarch/loongarch-def.h  |  1 +
  gcc/config/loongarch/loongarch-tune.h |  8 
  gcc/config/loongarch/loongarch.cc |  6 ++
  4 files changed, 27 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-def.c 
b/gcc/config/loongarch/loongarch-def.c
index fc4ebbefede..6729c857f7c 100644
--- a/gcc/config/loongarch/loongarch-def.c
+++ b/gcc/config/loongarch/loongarch-def.c
@@ -72,6 +72,18 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
},
  };
  
+struct loongarch_align
+loongarch_cpu_align[N_TUNE_TYPES] = {
+  [CPU_LOONGARCH64] = {
+.function = "32",
+.label = "16",
+  },
+  [CPU_LA464] = {
+.function = "32",
+.label = "16",
+  },
+};
+
  /* The following properties cannot be looked up directly using "cpucfg".
   So it is necessary to provide a default value for "unknown native"
   tune targets (i.e. -mtune=native while PRID does not correspond to
diff --git a/gcc/config/loongarch/loongarch-def.h 
b/gcc/config/loongarch/loongarch-def.h
index 778b1409956..fb8bb88eb52 100644
--- a/gcc/config/loongarch/loongarch-def.h
+++ b/gcc/config/loongarch/loongarch-def.h
@@ -144,6 +144,7 @@ extern int loongarch_cpu_issue_rate[];
  extern int loongarch_cpu_multipass_dfa_lookahead[];
  
  extern struct loongarch_cache loongarch_cpu_cache[];
+extern struct loongarch_align loongarch_cpu_align[];
  extern struct loongarch_rtx_cost_data loongarch_cpu_rtx_cost_data[];
  
  #ifdef __cplusplus

diff --git a/gcc/config/loongarch/loongarch-tune.h 
b/gcc/config/loongarch/loongarch-tune.h
index ba31c4f08c3..5c03262daff 100644
--- a/gcc/config/loongarch/loongarch-tune.h
+++ b/gcc/config/loongarch/loongarch-tune.h
@@ -48,4 +48,12 @@ struct loongarch_cache {
  int simultaneous_prefetches; /* number of parallel prefetch */
  };
  
+/* Alignment for functions and labels for best performance.  For new uarchs
+   the value should be measured via benchmarking.  See the documentation for
+   -falign-functions and -falign-labels in invoke.texi for the format.  */
+struct loongarch_align {
+  const char *function;/* default value for -falign-functions */
+  const char *label;   /* default value for -falign-labels */
+};
+
  #endif /* LOONGARCH_TUNE_H */
diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index eb73d11b869..5b8b93eb24b 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -6249,6 +6249,12 @@ loongarch_option_override_internal (struct gcc_options *opts)
&& !opts->x_optimize_size)
  opts->x_flag_prefetch_loop_arrays = 1;
  
+  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
+    opts->x_str_align_functions = loongarch_cpu_align[LARCH_ACTUAL_TUNE].function;
+
+  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
+    opts->x_str_align_labels = loongarch_cpu_align[LARCH_ACTUAL_TUNE].label;
+
if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
  error ("%qs cannot be used for compiling a shared library",
   "-mdirect-extern-access");




RE: [PATCH v1] RISC-V: Align the predictor style for define_insn_and_split

2023-06-14 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff and Juzhe, sorry for misleading.

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, June 15, 2023 2:51 AM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Align the predictor style for 
define_insn_and_split



On 6/13/23 20:31, juzhe.zh...@rivai.ai wrote:
> LGTM.
Similarly.  If I've interpreted the thread correctly, there aren't any issues 
created by this patch, though there are some existing issues that need to be 
addressed independently.  The patch itself is definitely the right thing to be 
doing.

I'd suggest going forward with the commit whenever it's convenient Pan.

Thanks,
Jeff


RE: [PATCH v3] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff and Juzhe.

Pan

-Original Message-
From: Jeff Law  
Sent: Thursday, June 15, 2023 2:56 AM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v3] RISC-V: Bugfix for vec_init repeating auto 
vectorization in RV32



On 6/14/23 03:01, juzhe.zh...@rivai.ai wrote:
> LGTM
Agreed.  Commit when convenient.

jeff


[r14-1805 Regression] FAIL: c-c++-common/Wfree-nonheap-object-3.c -std=gnu++98 (test for warnings, line 45) on Linux/x86_64

2023-06-14 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

9c03391ba447ff86038d6a34c90ae737c3915b5f is the first bad commit
commit 9c03391ba447ff86038d6a34c90ae737c3915b5f
Author: Thomas Schwinge 
Date:   Wed Jun 7 16:24:26 2023 +0200

Tighten 'dg-warning' alternatives in 'c-c++-common/Wfree-nonheap-object{,-2,-3}.c'

caused

FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++14 (test for excess 
errors)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++14  (test for warnings, 
line 45)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++17 (test for excess 
errors)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++17  (test for warnings, 
line 45)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++20 (test for excess 
errors)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++20  (test for warnings, 
line 45)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++98 (test for excess 
errors)
FAIL: c-c++-common/Wfree-nonheap-object-3.c  -std=gnu++98  (test for warnings, 
line 45)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-1805/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=c-c++-common/Wfree-nonheap-object-3.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=c-c++-common/Wfree-nonheap-object-3.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=c-c++-common/Wfree-nonheap-object-3.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="dg.exp=c-c++-common/Wfree-nonheap-object-3.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-14 Thread Sam James via Gcc-patches

Eric Gallager via Gcc-patches  writes:

> On Wed, Jun 14, 2023 at 8:29 PM David Malcolm via Gcc-patches
>  wrote:
>>
>> PR c++/110164 notes that in cases where we have a forward decl
>> of a std library type such as:
>>
>> std::array x;
>>
>> we omit this diagnostic:
>>
>> error: aggregate ‘std::array x’ has incomplete type and cannot be 
>> defined
>>
>> This patch adds this hint to the diagnostic:
>>
>> note: ‘std::array’ is defined in header ‘’; this is probably fixable 
>> by adding ‘#include ’
>>
>
> ..."probably"?
>

Right now, our fixit says:
```
/tmp/foo.c:1:1: note: ‘time_t’ is defined in header ‘’; did you forget 
to ‘#include ’?
```

We should probably use the same phrasing for consistency?

>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>> OK for trunk?
>>
>> gcc/cp/ChangeLog:
>> PR c++/110164
>> * cp-name-hint.h (maybe_suggest_missing_header): New decl.
>> * decl.cc: Define INCLUDE_MEMORY.  Add include of
>> "cp/cp-name-hint.h".
>> (start_decl_1): Call maybe_suggest_missing_header.
>> * name-lookup.cc (maybe_suggest_missing_header): Remove "static".
>>
>> gcc/testsuite/ChangeLog:
>> PR c++/110164
>> * g++.dg/missing-header-pr110164.C: New test.
>> ---
>>  gcc/cp/cp-name-hint.h  |  3 +++
>>  gcc/cp/decl.cc | 10 ++
>>  gcc/cp/name-lookup.cc  |  2 +-
>>  gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
>>  4 files changed, 24 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C
>>
>> diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
>> index bfa7c53c8f6..e2387e23d1f 100644
>> --- a/gcc/cp/cp-name-hint.h
>> +++ b/gcc/cp/cp-name-hint.h
>> @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>>
>>  extern name_hint suggest_alternatives_for (location_t, tree, bool);
>>  extern name_hint suggest_alternatives_in_other_namespaces (location_t, 
>> tree);
>> +extern name_hint maybe_suggest_missing_header (location_t location,
>> +  tree name,
>> +  tree scope);
>>  extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
>> tree);
>>  extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
>>
>> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
>> index a672e4844f1..504b08ec250 100644
>> --- a/gcc/cp/decl.cc
>> +++ b/gcc/cp/decl.cc
>> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
>> line numbers.  For example, the CONST_DECLs for enum values.  */
>>
>>  #include "config.h"
>> +#define INCLUDE_MEMORY
>>  #include "system.h"
>>  #include "coretypes.h"
>>  #include "target.h"
>> @@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "c-family/c-objc.h"
>>  #include "c-family/c-pragma.h"
>>  #include "c-family/c-ubsan.h"
>> +#include "cp/cp-name-hint.h"
>>  #include "debug.h"
>>  #include "plugin.h"
>>  #include "builtins.h"
>> @@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
>> ;   /* An auto type is ok.  */
>>else if (TREE_CODE (type) != ARRAY_TYPE)
>> {
>> + auto_diagnostic_group d;
>>   error ("variable %q#D has initializer but incomplete type", decl);
>> + maybe_suggest_missing_header (input_location,
>> +   TYPE_IDENTIFIER (type),
>> +   TYPE_CONTEXT (type));
>>   type = TREE_TYPE (decl) = error_mark_node;
>> }
>>else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type
>> @@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
>> gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
>>else
>> {
>> + auto_diagnostic_group d;
>>   error ("aggregate %q#D has incomplete type and cannot be defined",
>>  decl);
>> + maybe_suggest_missing_header (input_location,
>> +   TYPE_IDENTIFIER (type),
>> +   TYPE_CONTEXT (type));
>>   /* Change the type so that assemble_variable will give
>>  DECL an rtl we can live with: (mem (const_int 0)).  */
>>   type = TREE_TYPE (decl) = error_mark_node;
>> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
>> index 6ac58a35b56..917b481c163 100644
>> --- a/gcc/cp/name-lookup.cc
>> +++ b/gcc/cp/name-lookup.cc
>> @@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
>> tree name)
>> for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
>> applicable.  */
>>
>> -static name_hint
>> +name_hint
>>  maybe_suggest_missing_header (location_t location, tree name, tree scope)
>>  {
>>if (scope == NULL_TREE)
>> diff --git 

Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-14 Thread Eric Gallager via Gcc-patches
On Wed, Jun 14, 2023 at 8:29 PM David Malcolm via Gcc-patches
 wrote:
>
> PR c++/110164 notes that in cases where we have a forward decl
> of a std library type such as:
>
> std::array x;
>
> we omit this diagnostic:
>
> error: aggregate ‘std::array x’ has incomplete type and cannot be 
> defined
>
> This patch adds this hint to the diagnostic:
>
> note: ‘std::array’ is defined in header ‘’; this is probably fixable 
> by adding ‘#include ’
>

..."probably"?

> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> OK for trunk?
>
> gcc/cp/ChangeLog:
> PR c++/110164
> * cp-name-hint.h (maybe_suggest_missing_header): New decl.
> * decl.cc: Define INCLUDE_MEMORY.  Add include of
> "cp/cp-name-hint.h".
> (start_decl_1): Call maybe_suggest_missing_header.
> * name-lookup.cc (maybe_suggest_missing_header): Remove "static".
>
> gcc/testsuite/ChangeLog:
> PR c++/110164
> * g++.dg/missing-header-pr110164.C: New test.
> ---
>  gcc/cp/cp-name-hint.h  |  3 +++
>  gcc/cp/decl.cc | 10 ++
>  gcc/cp/name-lookup.cc  |  2 +-
>  gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
>  4 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C
>
> diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
> index bfa7c53c8f6..e2387e23d1f 100644
> --- a/gcc/cp/cp-name-hint.h
> +++ b/gcc/cp/cp-name-hint.h
> @@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
>
>  extern name_hint suggest_alternatives_for (location_t, tree, bool);
>  extern name_hint suggest_alternatives_in_other_namespaces (location_t, tree);
> +extern name_hint maybe_suggest_missing_header (location_t location,
> +  tree name,
> +  tree scope);
>  extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
> tree);
>  extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
>
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index a672e4844f1..504b08ec250 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
> line numbers.  For example, the CONST_DECLs for enum values.  */
>
>  #include "config.h"
> +#define INCLUDE_MEMORY
>  #include "system.h"
>  #include "coretypes.h"
>  #include "target.h"
> @@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-objc.h"
>  #include "c-family/c-pragma.h"
>  #include "c-family/c-ubsan.h"
> +#include "cp/cp-name-hint.h"
>  #include "debug.h"
>  #include "plugin.h"
>  #include "builtins.h"
> @@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
> ;   /* An auto type is ok.  */
>else if (TREE_CODE (type) != ARRAY_TYPE)
> {
> + auto_diagnostic_group d;
>   error ("variable %q#D has initializer but incomplete type", decl);
> + maybe_suggest_missing_header (input_location,
> +   TYPE_IDENTIFIER (type),
> +   TYPE_CONTEXT (type));
>   type = TREE_TYPE (decl) = error_mark_node;
> }
>else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type
> @@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
> gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
>else
> {
> + auto_diagnostic_group d;
>   error ("aggregate %q#D has incomplete type and cannot be defined",
>  decl);
> + maybe_suggest_missing_header (input_location,
> +   TYPE_IDENTIFIER (type),
> +   TYPE_CONTEXT (type));
>   /* Change the type so that assemble_variable will give
>  DECL an rtl we can live with: (mem (const_int 0)).  */
>   type = TREE_TYPE (decl) = error_mark_node;
> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> index 6ac58a35b56..917b481c163 100644
> --- a/gcc/cp/name-lookup.cc
> +++ b/gcc/cp/name-lookup.cc
> @@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
> tree name)
> for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
> applicable.  */
>
> -static name_hint
> +name_hint
>  maybe_suggest_missing_header (location_t location, tree name, tree scope)
>  {
>if (scope == NULL_TREE)
> diff --git a/gcc/testsuite/g++.dg/missing-header-pr110164.C 
> b/gcc/testsuite/g++.dg/missing-header-pr110164.C
> new file mode 100644
> index 000..15980071c38
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/missing-header-pr110164.C
> @@ -0,0 +1,10 @@
> +// { dg-require-effective-target c++11 }
> +
> +#include 
> +
> +std::array a1; /* { dg-error "incomplete type" } */
> +/* { dg-message 

[PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-14 Thread David Malcolm via Gcc-patches
PR c++/110164 notes that in cases where we have a forward decl
of a std library type such as:

std::array x;

we omit this diagnostic:

error: aggregate ‘std::array x’ has incomplete type and cannot be 
defined

This patch adds this hint to the diagnostic:

note: ‘std::array’ is defined in header ‘’; this is probably fixable by 
adding ‘#include ’

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
OK for trunk?

gcc/cp/ChangeLog:
PR c++/110164
* cp-name-hint.h (maybe_suggest_missing_header): New decl.
* decl.cc: Define INCLUDE_MEMORY.  Add include of
"cp/cp-name-hint.h".
(start_decl_1): Call maybe_suggest_missing_header.
* name-lookup.cc (maybe_suggest_missing_header): Remove "static".

gcc/testsuite/ChangeLog:
PR c++/110164
* g++.dg/missing-header-pr110164.C: New test.
---
 gcc/cp/cp-name-hint.h  |  3 +++
 gcc/cp/decl.cc | 10 ++
 gcc/cp/name-lookup.cc  |  2 +-
 gcc/testsuite/g++.dg/missing-header-pr110164.C | 10 ++
 4 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/missing-header-pr110164.C

diff --git a/gcc/cp/cp-name-hint.h b/gcc/cp/cp-name-hint.h
index bfa7c53c8f6..e2387e23d1f 100644
--- a/gcc/cp/cp-name-hint.h
+++ b/gcc/cp/cp-name-hint.h
@@ -32,6 +32,9 @@ along with GCC; see the file COPYING3.  If not see
 
 extern name_hint suggest_alternatives_for (location_t, tree, bool);
 extern name_hint suggest_alternatives_in_other_namespaces (location_t, tree);
+extern name_hint maybe_suggest_missing_header (location_t location,
+  tree name,
+  tree scope);
 extern name_hint suggest_alternative_in_explicit_scope (location_t, tree, 
tree);
 extern name_hint suggest_alternative_in_scoped_enum (tree, tree);
 
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index a672e4844f1..504b08ec250 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
line numbers.  For example, the CONST_DECLs for enum values.  */
 
 #include "config.h"
+#define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
 #include "target.h"
@@ -46,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-objc.h"
 #include "c-family/c-pragma.h"
 #include "c-family/c-ubsan.h"
+#include "cp/cp-name-hint.h"
 #include "debug.h"
 #include "plugin.h"
 #include "builtins.h"
@@ -5995,7 +5997,11 @@ start_decl_1 (tree decl, bool initialized)
;   /* An auto type is ok.  */
   else if (TREE_CODE (type) != ARRAY_TYPE)
{
+ auto_diagnostic_group d;
  error ("variable %q#D has initializer but incomplete type", decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   TYPE_CONTEXT (type));
  type = TREE_TYPE (decl) = error_mark_node;
}
   else if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (type
@@ -6011,8 +6017,12 @@ start_decl_1 (tree decl, bool initialized)
gcc_assert (CLASS_PLACEHOLDER_TEMPLATE (type));
   else
{
+ auto_diagnostic_group d;
  error ("aggregate %q#D has incomplete type and cannot be defined",
 decl);
+ maybe_suggest_missing_header (input_location,
+   TYPE_IDENTIFIER (type),
+   TYPE_CONTEXT (type));
  /* Change the type so that assemble_variable will give
 DECL an rtl we can live with: (mem (const_int 0)).  */
  type = TREE_TYPE (decl) = error_mark_node;
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 6ac58a35b56..917b481c163 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6796,7 +6796,7 @@ maybe_suggest_missing_std_header (location_t location, 
tree name)
for NAME within SCOPE at LOCATION, or an empty name_hint if this isn't
applicable.  */
 
-static name_hint
+name_hint
 maybe_suggest_missing_header (location_t location, tree name, tree scope)
 {
   if (scope == NULL_TREE)
diff --git a/gcc/testsuite/g++.dg/missing-header-pr110164.C 
b/gcc/testsuite/g++.dg/missing-header-pr110164.C
new file mode 100644
index 000..15980071c38
--- /dev/null
+++ b/gcc/testsuite/g++.dg/missing-header-pr110164.C
@@ -0,0 +1,10 @@
+// { dg-require-effective-target c++11 }
+
+#include 
+
+std::array a1; /* { dg-error "incomplete type" } */
+/* { dg-message "'std::array' is defined in header ''; this is probably fixable by adding '#include '" "hint" { target *-*-* } .-1 } */
+
+std::array a2 {5}; /* { dg-error "incomplete type" } */
+/* { dg-message "'std::array' is defined in header ''; this is probably fixable by adding '#include '" "hint" { target *-*-* } .-1 } */
+
-- 
2.26.3


[libstdc++] [testsuite] expect zero entropy matching implementation

2023-06-14 Thread Alexandre Oliva via Gcc-patches


random_device::entropy() returns 0.0 when _GLIBCXX_USE_DEV_RANDOM
is not defined, but the test expects otherwise.  Adjust.

Regstrapped on x86_64-linux-gnu, also tested on aarch64-rtems6.  Ok to
install?


for  libstdc++-v3/ChangeLog

* testsuite/26_numerics/random/random_device/entropy.cc:
Expect get_entropy to return zero when _GLIBCXX_USE_DEV_RANDOM
is not defined.
---
 .../26_numerics/random/random_device/entropy.cc|8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc 
b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
index 9f529f5d81410..3e6872c8a613f 100644
--- a/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
+++ b/libstdc++-v3/testsuite/26_numerics/random/random_device/entropy.cc
@@ -13,7 +13,13 @@ test01()
   VERIFY( std::random_device(token).entropy() == 0.0 );
 
   using result_type = std::random_device::result_type;
+#ifdef _GLIBCXX_USE_DEV_RANDOM
   const double max = std::numeric_limits::digits;
+#else
+  // random_device::entropy() always returns 0.0 when
+  // _GLIBCXX_USE_DEV_RANDOM is not defined.
+  const double max = 0.0;
+#endif
 
   for (auto token : { "/dev/random", "/dev/urandom" })
 if (__gnu_test::random_device_available(token))
@@ -30,7 +36,7 @@ test01()
   VERIFY( entropy == max );
 }
 
-for (auto token : { "getentropy", "arc4random" })
+  for (auto token : { "getentropy", "arc4random" })
 if (__gnu_test::random_device_available(token))
 {
   const double entropy = std::random_device(token).entropy();

-- 
Alexandre Oliva, happy hacker        https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[libstdc++] [testsuite] xfail dbl from_chars for aarch64 rtems ldbl

2023-06-14 Thread Alexandre Oliva via Gcc-patches


RTEMS, like VxWorks, uses the fast-float double implementation of
from_chars even for long double, so it loses precision; therefore expect
the long double bits to fail on aarch64.

Regstrapped on x86_64-linux-gnu, also tested on aarch64-rtems6.  Ok to
install?


for  libstdc++-v3/ChangeLog

* testsuite/20_util/from_chars/4.cc: Skip long double on
aarch64-rtems.
---
 libstdc++-v3/testsuite/20_util/from_chars/4.cc |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc 
b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
index 206e18daeb229..76e07df9d2bf3 100644
--- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
+++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
@@ -18,7 +18,7 @@
 //  is supported in C++14 as a GNU extension
 // { dg-do run { target c++14 } }
 // { dg-add-options ieee }
-// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* 
x86_64-*-vxworks* } }
+// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-rtems* 
aarch64-*-vxworks* x86_64-*-vxworks* } }
 
 #include 
 #include 

-- 
Alexandre Oliva, happy hacker        https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


libstdc++-v3: do not duplicate some math functions when using newlib

2023-06-14 Thread Alexandre Oliva via Gcc-patches


Contributing a patch by Joel Brobecker.
Regstrapped on x86_64-linux-gnu just to be sure, also tested with
aarch64-rtems6.  I'm going to put this in later this week if there
aren't any objections.


When running the libstdc++ testsuite on AArch64 RTEMS, we noticed
that about 25 tests are failing during the link, due to the "sqrtl"
function being defined twice:
  - once inside RTEMS' libm;
  - once inside our libstdc++.

One test that fails, for instance, would be 26_numerics/complex/13450.cc.

In comparing libm and libstdc++, we found that libstdc++ also
duplicates "hypotf" and "hypotl".

For "sqrtl" and "hypotl", the symbols come from a unit called
math_stubs_long_double.cc, while "hypotf" comes from the equivalent
unit for the float version, called math_stubs_float.cc.  Those units
are always compiled into libstdc++ and provide our own version of
various math routines when those are missing from the target system.
The definition of those symbols is predicated on the existence of
various macros provided by c++config.h, which themselves are
predicated on the corresponding HAVE_xxx macros in config.h.

One key element behind what's happening, here, is that the target
uses newlib, and therefore GCC was configured --with-newlib.
The section of libstdc++v3's configure script that handles which math
functions are available has a newlib-specific section, and that
section provides a hardcoded list of symbols.

For "hypotf", this commit fixes the issue by doing the same
as for the other routines already declared in that section.
I verified by inspection of the newlib code that this function
should always be present, so hardcoding it in our configure
script should not be an issue.

For the math routines handling long doubles ("sqrtl" and "hypotl"),
however, I do not believe we can assume that newlib's libm
will always provide them.  Therefore, this commit fixes that
part of the issue by adding a compile-check for "sqrtl" and "hypotl".
And while at it, we also include checks for all the other math
functions that math_stubs_long_double.cc re-implements, allowing
us to be resilient to future newlib enhancements adding support
for more functions.

libstdc++-v3/ChangeLog:

* configure.ac ["x${with_newlib}" = "xyes"]: Define
HAVE_HYPOTF.  Add compile-checks for various long double
math functions as well.
* configure: Regenerate.
---
 libstdc++-v3/configure| 1179 +
 libstdc++-v3/configure.ac |9 
 2 files changed, 1188 insertions(+)

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 354c566b0055c..bda8053ecc279 100755
[omitted]
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 0abe54e7b9a21..9770c1787679f 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -349,6 +349,7 @@ else
 AC_DEFINE(HAVE_FLOORF)
 AC_DEFINE(HAVE_FMODF)
 AC_DEFINE(HAVE_FREXPF)
+AC_DEFINE(HAVE_HYPOTF)
 AC_DEFINE(HAVE_LDEXPF)
 AC_DEFINE(HAVE_LOG10F)
 AC_DEFINE(HAVE_LOGF)
@@ -360,6 +361,14 @@ else
 AC_DEFINE(HAVE_TANF)
 AC_DEFINE(HAVE_TANHF)
 
+dnl # Support for the long version of some math libraries depends on
+dnl # architecture and newlib version.  So test for their availability
+dnl # rather than hardcoding that information.
+GLIBCXX_CHECK_MATH_DECLS([
+  acosl asinl atan2l atanl ceill coshl cosl expl fabsl floorl fmodl
+  frexpl hypotl ldexpl log10l logl modfl powl sinhl sinl sqrtl
+  tanhl tanl])
+
 AC_DEFINE(HAVE_ICONV)
 AC_DEFINE(HAVE_MEMALIGN)
 



Re: [PATCH 1/3] OpenMP: C support for imperfectly-nested loops

2023-06-14 Thread Sandra Loosemore

High-order bit:  I've just committed OG13 version of these patches that is 
integrated with Frederik's previous loop transformation patches that are 
already on that branch.  The OG13 version incorporates many of the suggestions 
from this initial review plus a few bug fixes.  I've also made corresponding 
fixes to the mainline version but I've still got a lot of unfinished items, 
mostly related to additional tests for corner cases.

On 5/25/23 04:00, Jakub Jelinek wrote:

On Fri, Apr 28, 2023 at 05:22:52PM -0600, Sandra Loosemore wrote:

OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(c_parser_compound_statement_nostart): Special-case nested
OMP loops and blocks in intervening code.
(c_parser_while_statement): Reject in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Reject calls to OMP
runtime routines in intervening code.
(c_parser_pragma): Reject OMP pragmas in intervening code.
(c_parser_omp_loop_nest): New, split from c_parser_omp_for_loop.
(c_parser_omp_for_loop): Rewrite to use recursive descent and
generalize handling for intervening code.

gcc/ChangeLog
* omp-api.h: New file.


Why?  Just add those to omp-general.h.


This is for the Fortran front end, which needs this stuff without everything 
else omp-general.h sucks in.  I remember that I initially did try to put it in 
omp-general.h but split it out when I ran into some trouble with that, and I 
thought it was an abstraction violation in the Fortran front end.  I didn't 
touch this for now; is it important enough that I should spend more time on it?


* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Adjust expected error messages.
* c-c++-common/goacc/tile-2.c: Likewise.
* c-c++-common/gomp/imperfect1.c: New.
* c-c++-common/gomp/imperfect2.c: New.
* c-c++-common/gomp/imperfect3.c: New.
* c-c++-common/gomp/imperfect4.c: New.
* c-c++-common/gomp/imperfect5.c: New.
* gcc.dg/gomp/collapse-1.c: Adjust expected error messages.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/imperfect1.c: New.
* testsuite/libgomp.c-c++-common/imperfect2.c: New.
* testsuite/libgomp.c-c++-common/imperfect3.c: New.
* testsuite/libgomp.c-c++-common/imperfect4.c: New.
* testsuite/libgomp.c-c++-common/imperfect5.c: New.
* testsuite/libgomp.c-c++-common/imperfect6.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect1.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect2.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect3.c: New.
* testsuite/libgomp.c-c++-common/offload-imperfect4.c: New.


If the 3 patches are going to be committed separately (which I think is a
good idea), then the *c-c++-common* tests are a problem, because the tests
will then fail after the C FE part is committed before the C++ FE part is
committed.
For the new tests there are 2 options, one is commit them in the C patch
with /* { dg-do run { target c } } */ instead of just
/* { dg-do run } */ etc. and then in the second patch remove those
" { target c }" parts, or commit them in the second patch only.
For the existing tests with adjustments, do the { target c } vs.
{ target c++ } games and tweak in the second patch.


OK, I've split the new c-c++-common tests into a separate commit, and done the 
other rigamarole to adjust the other test cases incrementally with each part of 
the series.


The offload-imperfect* tests should be called target-imperfect* I think,
for consistency with other tests.


Done.


In the gcc/testsuite/c-c++-common/gomp/ tests I miss some coverage for
the boundary cases of what is and isn't intervening code.
Before your changes, we were allowing multiple levels of {}s,
so
#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
   {
 {
   {
 for (int j = 0; j < 64; j++)
   ;
   }
 }
   }
which is valid in 5.0 (but should be tested in the testsuite), but also
empty statements, which when reading the 

[OG13 6/6] OpenMP: Fortran support for imperfectly nested loops

2023-06-14 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.

gcc/fortran/ChangeLog
* openmp.cc: Include omp-api.h.
(resolve_omp_clauses): Consolidate inscan reduction clause conflict
checking here.
(scan_for_next_loop_in_chain): New.
(scan_for_next_loop_in_block): New.
(gfc_resolve_omp_do_blocks): Set omp_current_do_collapse properly.
Handle imperfectly-nested loops when looking for nested omp scan.
Refactor to move inscan reduction clause conflict checking to
resolve_omp_clauses.
(gfc_resolve_do_iterator): Handle imperfectly-nested loops.
(struct icode_error_state): New.
(icode_code_error_callback): New.
(icode_expr_error_callback): New.
(diagnose_intervening_code_errors_1): New.
(diagnose_intervening_code_errors): New.
(restructure_intervening_code): New.
(resolve_nested_loops): Update error handling, and extend to
detect imperfect nesting errors and check validity of
intervening code.  Call restructure_intervening_code if needed.
(resolve_omp_do): Rename collapse -> count.

gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected errors.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/imperfect1.f90: New.
* gfortran.dg/gomp/imperfect2.f90: New.
* gfortran.dg/gomp/imperfect3.f90: New.
* gfortran.dg/gomp/imperfect4.f90: New.
* gfortran.dg/gomp/imperfect5.f90: New.
* gfortran.dg/gomp/loop-transforms/tile-1.f90: Adjust expected errors.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: Likewise.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: Likewise.

libgomp/ChangeLog
* testsuite/libgomp.fortran/imperfect-destructor.f90: New.
* testsuite/libgomp.fortran/imperfect-transform-1.f90: New.
* testsuite/libgomp.fortran/imperfect-transform-2.f90: New.
* testsuite/libgomp.fortran/imperfect1.f90: New.
* testsuite/libgomp.fortran/imperfect2.f90: New.
* testsuite/libgomp.fortran/imperfect3.f90: New.
* testsuite/libgomp.fortran/imperfect4.f90: New.
* testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New.
* testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New.
* testsuite/libgomp.fortran/target-imperfect1.f90: New.
* testsuite/libgomp.fortran/target-imperfect2.f90: New.
* testsuite/libgomp.fortran/target-imperfect3.f90: New.
* testsuite/libgomp.fortran/target-imperfect4.f90: New.
---
 gcc/fortran/ChangeLog.omp |  23 +
 gcc/fortran/openmp.cc | 668 +++---
 gcc/testsuite/ChangeLog.omp   |  13 +
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |   2 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |  10 +-
 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90 |  39 +
 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90 |  56 ++
 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90 |  29 +
 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90 |  36 +
 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90 |  67 ++
 .../gomp/loop-transforms/tile-1.f90   |  12 +-
 .../gomp/loop-transforms/tile-2.f90   |   2 +-
 .../loop-transforms/tile-imperfect-nest.f90   |  16 +-
 libgomp/ChangeLog.omp |  16 +
 .../libgomp.fortran/imperfect-destructor.f90  | 142 
 .../libgomp.fortran/imperfect-transform-1.f90 |  70 ++
 .../libgomp.fortran/imperfect-transform-2.f90 |  70 ++
 .../testsuite/libgomp.fortran/imperfect1.f90  |  67 ++
 .../testsuite/libgomp.fortran/imperfect2.f90  | 102 +++
 .../testsuite/libgomp.fortran/imperfect3.f90  | 110 +++
 .../testsuite/libgomp.fortran/imperfect4.f90  | 121 
 .../target-imperfect-transform-1.f90  |  73 ++
 .../target-imperfect-transform-2.f90  |  73 ++
 .../libgomp.fortran/target-imperfect1.f90 |  72 ++
 .../libgomp.fortran/target-imperfect2.f90 | 110 +++
 .../libgomp.fortran/target-imperfect3.f90 | 116 +++
 .../libgomp.fortran/target-imperfect4.f90 | 126 
 27 files changed, 2125 insertions(+), 116 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90
 create mode 100644 

[OG13 5/6] OpenMP: Refactor and tidy Fortran front-end code for loop transformations

2023-06-14 Thread Sandra Loosemore
This patch rearranges some code previously added to support loop
transformations to simplify merging support for imperfectly-nested loops
in a subsequent patch.  There is no new functionality added here.

gcc/fortran/ChangeLog
* openmp.cc (find_nested_loop_in_chain): Move up in file.
(find_nested_loop_in_block): Likewise.
(resolve_nested_loops): New helper function to consolidate code
from...
(resolve_omp_do, resolve_omp_tile): ...these functions.  Also,
remove the redundant call to resolve_nested_loop_transforms, and
use uniform error message wording.

gcc/testsuite/ChangeLog
* gfortran.dg/gomp/collapse1.f90: Adjust expected error message.
* gfortran.dg/gomp/collapse2.f90: Likewise.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: Likewise.
---
 gcc/fortran/ChangeLog.omp |  10 +
 gcc/fortran/openmp.cc | 447 +++---
 gcc/testsuite/ChangeLog.omp   |   6 +
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |   2 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |   4 +-
 .../gomp/loop-transforms/tile-2.f90   |   2 +-
 6 files changed, 204 insertions(+), 267 deletions(-)

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 3791eddc6c5..04ed7f88175 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,13 @@
+2023-06-13  Sandra Loosemore  
+
+   * openmp.cc (find_nested_loop_in_chain): Move up in file.
+   (find_nested_loop_in_block): Likewise.
+   (resolve_nested_loops): New helper function to consolidate code
+   from...
+   (resolve_omp_do, resolve_omp_tile): ...these functions.  Also,
+   remove the redundant call to resolve_nested_loop_transforms, and
+   use uniform error message wording.
+
 2023-06-12  Tobias Burnus  
 
* trans-openmp.cc (gfc_omp_deep_map_kind_p): Fix conditions for
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index ca9a8e665d1..5ab64b5231f 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -10045,6 +10045,52 @@ static struct fortran_omp_context
 static gfc_code *omp_current_do_code;
 static int omp_current_do_collapse;
 
+/* Forward declaration for mutually recursive functions.  */
+static gfc_code *
+find_nested_loop_in_block (gfc_code *block);
+
+/* Return the first nested DO loop in CHAIN, or NULL if there
+   isn't one.  Does no error checking on intervening code.  */
+
+static gfc_code *
+find_nested_loop_in_chain (gfc_code *chain)
+{
+  gfc_code *code;
+
+  if (!chain)
+return NULL;
+
+  for (code = chain; code; code = code->next)
+{
+  if (code->op == EXEC_DO)
+   return code;
+  else if (loop_transform_p (code->op) && code->block)
+   {
+ code = code->block;
+ continue;
+   }
+  else if (code->op == EXEC_BLOCK)
+   {
+ gfc_code *c = find_nested_loop_in_block (code);
+ if (c)
+   return c;
+   }
+}
+  return NULL;
+}
+
+/* Return the first nested DO loop in BLOCK, or NULL if there
+   isn't one.  Does no error checking on intervening code.  */
+static gfc_code *
+find_nested_loop_in_block (gfc_code *block)
+{
+  gfc_namespace *ns;
+  gcc_assert (block->op == EXEC_BLOCK);
+  ns = block->ext.block.ns;
+  gcc_assert (ns);
+  return find_nested_loop_in_chain (ns->code);
+}
+
 void
 gfc_resolve_omp_do_blocks (gfc_code *code, gfc_namespace *ns)
 {
@@ -10282,51 +10328,6 @@ gfc_resolve_omp_local_vars (gfc_namespace *ns)
 }
 
 
-/* Forward declaration for mutually recursive functions.  */
-static gfc_code *
-find_nested_loop_in_block (gfc_code *block);
-
-/* Return the first nested DO loop in CHAIN, or NULL if there
-   isn't one.  Does no error checking on intervening code.  */
-
-static gfc_code *
-find_nested_loop_in_chain (gfc_code *chain)
-{
-  gfc_code *code;
-
-  if (!chain)
-return NULL;
-
-  for (code = chain; code; code = code->next)
-{
-  if (code->op == EXEC_DO)
-   return code;
-  else if (loop_transform_p (code->op) && code->block)
-   {
- code = code->block;
- continue;
-   }
-  else if (code->op == EXEC_BLOCK)
-   {
- gfc_code *c = find_nested_loop_in_block (code);
- if (c)
-   return c;
-   }
-}
-  return NULL;
-}
-
-/* Return the first nested DO loop in BLOCK, or NULL if there
-   isn't one.  Does no error checking on intervening code.  */
-static gfc_code *
-find_nested_loop_in_block (gfc_code *block)
-{
-  gfc_namespace *ns;
-  gcc_assert (block->op == EXEC_BLOCK);
-  ns = block->ext.block.ns;
-  gcc_assert (ns);
-  return find_nested_loop_in_chain (ns->code);
-}
 /* CODE is an OMP loop construct.  Return true if VAR matches an iteration
variable outer to level DEPTH.  */
 static bool
@@ -10547,13 +10548,140 @@ resolve_omp_unroll (gfc_code *code)
 descr, loc);
 }
 
+/* Shared helper function for resolve_omp_do and 

[OG13 3/6] OpenMP: C++ support for imperfectly-nested loops

2023-06-14 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C++ front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an
iterative approach, in order to preserve proper nesting of compound
statements.  Preserving cleanups (destructors) for class objects
declared in intervening code and loop initializers complicates moving
the former into the body of the loop; this is handled by parsing the
entire construct before reassembling any of it.

New common C/C++ testcases are in a separate patch.

gcc/cp/ChangeLog
* cp-tree.h (cp_convert_omp_range_for): Adjust declaration.
* parser.cc (struct omp_for_parse_data): New.
(cp_parser_postfix_expression): Diagnose calls to OpenMP runtime
in intervening code.
(check_omp_intervening_code): New.
(cp_parser_statement_seq_opt): Special-case nested OMP loops and
blocks in intervening code.
(cp_parser_iteration_statement): Reject loops in intervening code.
(cp_parser_omp_for_loop_init): Expand comments and tweak the
interface slightly to better distinguish input/output parameters.
(cp_parser_omp_range_for): Likewise.
(cp_convert_omp_range_for): Likewise.
(cp_parser_see_omp_loop_nest): New.
(cp_parser_omp_loop_nest): New, split from cp_parser_omp_for_loop
and largely rewritten.  Add more comments.
(struct sit_data, substitute_in_tree_walker, substitute_in_tree):
New.
(fixup_blocks_walker): New.
(cp_parser_omp_for_loop): Rewrite to use recursive descent instead
of a loop.  Add logic to reshuffle the bits of code collected
during parsing so intervening code gets moved to the loop body.
(cp_parser_omp_loop): Remove call to finish_omp_for_block, which
is now redundant.
(cp_parser_omp_simd): Likewise.
(cp_parser_omp_for): Likewise.
(cp_parser_omp_distribute): Likewise.
(cp_parser_oacc_loop): Likewise.
(cp_parser_omp_taskloop): Likewise.
(cp_parser_pragma): Reject OpenMP pragmas in intervening code.
* parser.h (struct cp_parser): Add omp_for_parse_state field.
* pt.cc (tsubst_omp_for_iterator): Adjust call to
cp_convert_omp_range_for.
* semantics.cc (struct fofb_data, finish_omp_for_block_walker): New.
(finish_omp_for_block): Allow variables to be bound in a BIND_EXPR
nested inside BIND instead of directly in BIND itself.

gcc/testsuite/ChangeLog

* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
* c-c++-common/gomp/loop-transforms/imperfect-loop-nest: Likewise.
* c-c++-common/gomp/loop-transforms/tile-1.c: Likewise.
* c-c++-common/gomp/loop-transforms/tile-2.c: Likewise.
* c-c++-common/gomp/loop-transforms/tile-3.c: Likewise.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: Likewise.
* g++.dg/gomp/attrs-4.C: Likewise.
* g++.dg/gomp/for-1.C: Likewise.
* g++.dg/gomp/pr41967.C: Likewise.
* g++.dg/gomp/pr94512.C: Likewise.

libgomp/ChangeLog

* testsuite/libgomp.c++/imperfect-class-1.C: New.
* testsuite/libgomp.c++/imperfect-class-2.C: New.
* testsuite/libgomp.c++/imperfect-class-3.C: New.
* testsuite/libgomp.c++/imperfect-destructor.C: New.
* testsuite/libgomp.c++/imperfect-template-1.C: New.
* testsuite/libgomp.c++/imperfect-template-2.C: New.
* testsuite/libgomp.c++/imperfect-template-3.C: New.
---
 gcc/cp/ChangeLog.omp  |   38 +
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1331 +++--
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |3 +-
 gcc/cp/semantics.cc   |   80 +-
 gcc/testsuite/ChangeLog.omp   |   13 +
 gcc/testsuite/c-c++-common/goacc/tile-2.c |4 +-
 .../loop-transforms/imperfect-loop-nest.c |5 +-
 .../gomp/loop-transforms/tile-1.c |   10 +-
 .../gomp/loop-transforms/tile-2.c |   10 +-
 .../gomp/loop-transforms/tile-3.c |   16 +-
 .../gomp/loop-transforms/unroll-inner-2.c |5 +-
 gcc/testsuite/g++.dg/gomp/attrs-4.C   |2 +-
 gcc/testsuite/g++.dg/gomp/for-1.C |2 +-
 gcc/testsuite/g++.dg/gomp/pr41967.C   |2 +-
 gcc/testsuite/g++.dg/gomp/pr94512.C   |2 +-
 libgomp/ChangeLog.omp |   10 +
 .../testsuite/libgomp.c++/imperfect-class-1.C |  169 +++
 .../testsuite/libgomp.c++/imperfect-class-2.C |  167 +++
 .../testsuite/libgomp.c++/imperfect-class-3.C |  167 +++
 

[OG13 4/6] OpenMP: New c/c++ testcases for imperfectly-nested loops

2023-06-14 Thread Sandra Loosemore
gcc/testsuite/ChangeLog
* c-c++-common/gomp/imperfect1.c: New.
* c-c++-common/gomp/imperfect2.c: New.
* c-c++-common/gomp/imperfect3.c: New.
* c-c++-common/gomp/imperfect4.c: New.
* c-c++-common/gomp/imperfect5.c: New.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New.
* testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New.
* testsuite/libgomp.c-c++-common/imperfect1.c: New.
* testsuite/libgomp.c-c++-common/imperfect2.c: New.
* testsuite/libgomp.c-c++-common/imperfect3.c: New.
* testsuite/libgomp.c-c++-common/imperfect4.c: New.
* testsuite/libgomp.c-c++-common/imperfect5.c: New.
* testsuite/libgomp.c-c++-common/imperfect6.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect1.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect2.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect3.c: New.
* testsuite/libgomp.c-c++-common/target-imperfect4.c: New.
---
 gcc/testsuite/ChangeLog.omp   |   8 ++
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |  38 ++
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |  34 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |  33 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |  33 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |  57 
 libgomp/ChangeLog.omp |  17 +++
 .../imperfect-transform-1.c   |  79 +++
 .../imperfect-transform-2.c   |  79 +++
 .../libgomp.c-c++-common/imperfect1.c |  76 +++
 .../libgomp.c-c++-common/imperfect2.c | 114 
 .../libgomp.c-c++-common/imperfect3.c | 119 +
 .../libgomp.c-c++-common/imperfect4.c | 117 
 .../libgomp.c-c++-common/imperfect5.c |  49 +++
 .../libgomp.c-c++-common/imperfect6.c | 115 
 .../target-imperfect-transform-1.c|  82 
 .../target-imperfect-transform-2.c|  82 
 .../libgomp.c-c++-common/target-imperfect1.c  |  81 
 .../libgomp.c-c++-common/target-imperfect2.c  | 122 +
 .../libgomp.c-c++-common/target-imperfect3.c  | 125 ++
 .../libgomp.c-c++-common/target-imperfect4.c  | 122 +
 21 files changed, 1582 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect-transform-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect-transform-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect5.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/imperfect6.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-imperfect4.c

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index d42813684e2..72d7b52256a 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,11 @@
+2023-06-13  Sandra Loosemore  
+
+   * c-c++-common/gomp/imperfect1.c: New.
+   * c-c++-common/gomp/imperfect2.c: New.
+   * c-c++-common/gomp/imperfect3.c: New.
+   * c-c++-common/gomp/imperfect4.c: New.
+   * c-c++-common/gomp/imperfect5.c: New.
+
 2023-06-13  Sandra Loosemore  
 
* c-c++-common/goacc/tile-2.c: Adjust expected error patterns.
diff --git a/gcc/testsuite/c-c++-common/gomp/imperfect1.c b/gcc/testsuite/c-c++-common/gomp/imperfect1.c
new file mode 100644
index 000..705626ad169
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/imperfect1.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+
+/* This test case is expected to fail due to errors.  */
+
+int f1 (int depth, int iter);
+int f2 (int depth, int iter);
+
+void 

[OG13 2/6] OpenMP: C support for imperfectly-nested loops

2023-06-14 Thread Sandra Loosemore
OpenMP 5.0 removed the restriction that multiple collapsed loops must
be perfectly nested, allowing "intervening code" (including nested
BLOCKs) before or after each nested loop.  In GCC this code is moved
into the inner loop body by the respective front ends.

This patch changes the C front end to use recursive descent parsing
on nested loops within an "omp for" construct, rather than an iterative
approach, in order to preserve proper nesting of compound statements.

New common C/C++ testcases are in a separate patch.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add omp_for_parse_state field.
(struct omp_for_parse_data): New.
(check_omp_intervening_code): New.
(c_parser_compound_statement_nostart): Recognize intervening code
and nested loops in OpenMP loop constructs, and handle each
appropriately.
(c_parser_while_statement): Error on loop in intervening code.
(c_parser_do_statement): Likewise.
(c_parser_for_statement): Likewise.
(c_parser_postfix_expression_after_primary): Error on calls to
the OpenMP runtime in intervening code.
(c_parser_pragma): Error on OpenMP pragmas in intervening code.
(c_parser_see_omp_loop_nest): New.
(c_parser_omp_loop_nest): New.
(c_parser_omp_for_loop): Rewrite to use recursive descent, calling
c_parser_omp_loop_nest to do the heavy lifting.

gcc/ChangeLog
* omp-api.h: New.
* omp-general.cc (omp_runtime_api_procname): New.
(omp_runtime_api_call): Moved here from omp-low.cc, and make
non-static.
* omp-general.h: Include omp-api.h.
* omp-low.cc (omp_runtime_api_call): Delete this copy.

gcc/testsuite/ChangeLog
* c-c++-common/goacc/collapse-1.c: Update for new C error behavior.
* c-c++-common/goacc/tile-2.c: Likewise.
* c-c++-common/gomp/loop-transforms/imperfect-loop-nest.c: Likewise.
* c-c++-common/gomp/loop-transforms/tile-1.c: Likewise.
* c-c++-common/gomp/loop-transforms/tile-2.c: Likewise.
* c-c++-common/gomp/loop-transforms/tile-3.c: Likewise.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: Likewise.
* c-c++-common/gomp/metadirective-1.c: Likewise.
* gcc.dg/gomp/collapse-1.c: Likewise.
* gcc.dg/gomp/for-1.c: Likewise.
* gcc.dg/gomp/for-11.c: Likewise.
---
 gcc/ChangeLog.omp |   9 +
 gcc/c/ChangeLog.omp   |  19 +
 gcc/c/c-parser.cc | 833 --
 gcc/omp-api.h |  32 +
 gcc/omp-general.cc| 134 +++
 gcc/omp-general.h |   1 +
 gcc/omp-low.cc| 129 ---
 gcc/testsuite/ChangeLog.omp   |  14 +
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |  16 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |   4 +-
 .../loop-transforms/imperfect-loop-nest.c |   4 +-
 .../gomp/loop-transforms/tile-1.c |  12 +-
 .../gomp/loop-transforms/tile-2.c |  12 +-
 .../gomp/loop-transforms/tile-3.c |  24 +-
 .../gomp/loop-transforms/unroll-inner-2.c |   3 +-
 .../c-c++-common/gomp/metadirective-1.c   |   2 +-
 gcc/testsuite/gcc.dg/gomp/collapse-1.c|  10 +-
 gcc/testsuite/gcc.dg/gomp/for-1.c |   2 +-
 gcc/testsuite/gcc.dg/gomp/for-11.c|   2 +-
 19 files changed, 812 insertions(+), 450 deletions(-)
 create mode 100644 gcc/omp-api.h

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index d77d01076c2..78c655618ee 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,12 @@
+2023-06-13  Sandra Loosemore  
+
+   * omp-api.h: New.
+   * omp-general.cc (omp_runtime_api_procname): New.
+   (omp_runtime_api_call): Moved here from omp-low.cc, and make
+   non-static.
+   * omp-general.h: Include omp-api.h.
+   * omp-low.cc (omp_runtime_api_call): Delete this copy.
+
 2023-06-13  Sandra Loosemore  
Frederik Harwath 
 
diff --git a/gcc/c/ChangeLog.omp b/gcc/c/ChangeLog.omp
index ec4c53b165d..48cf1edd443 100644
--- a/gcc/c/ChangeLog.omp
+++ b/gcc/c/ChangeLog.omp
@@ -1,3 +1,22 @@
+2023-06-13  Sandra Loosemore  
+
+   * c-parser.cc (struct c_parser): Add omp_for_parse_state field.
+   (struct omp_for_parse_data): New.
+   (check_omp_intervening_code): New.
+   (c_parser_compound_statement_nostart): Recognize intervening code
+   and nested loops in OpenMP loop constructs, and handle each
+   appropriately.
+   (c_parser_while_statement): Error on loop in intervening code.
+   (c_parser_do_statement): Likewise.
+   (c_parser_for_statement): Likewise.
+   (c_parser_postfix_expression_after_primary): Error on calls to
+   the OpenMP runtime in intervening code.
+   (c_parser_pragma): Error on OpenMP pragmas in intervening code.
+   

[OG13 0/6] OpenMP: Support for imperfectly-nested loops

2023-06-14 Thread Sandra Loosemore
I have pushed this set of patches to the OG13 development branch.  The
major functional change compared to the mainline version I previously
posted on April 28 is that this version is integrated with Frederik's
loop transformation patches that were previously committed to this
branch.  I've also incorporated several cleanups suggested in review
of the mainline version, along with a few bug fixes.

Sandra Loosemore (6):
  OpenMP: Handle loop transformation clauses in nested functions
  OpenMP: C support for imperfectly-nested loops
  OpenMP: C++ support for imperfectly-nested loops
  OpenMP: New c/c++ testcases for imperfectly-nested loops
  OpenMP: Refactor and tidy Fortran front-end code for loop
transformations
  OpenMP: Fortran support for imperfectly nested loops

 gcc/ChangeLog.omp |   16 +
 gcc/c/ChangeLog.omp   |   19 +
 gcc/c/c-parser.cc |  833 +++
 gcc/cp/ChangeLog.omp  |   38 +
 gcc/cp/cp-tree.h  |2 +-
 gcc/cp/parser.cc  | 1331 +++--
 gcc/cp/parser.h   |3 +
 gcc/cp/pt.cc  |3 +-
 gcc/cp/semantics.cc   |   80 +-
 gcc/fortran/ChangeLog.omp |   33 +
 gcc/fortran/openmp.cc | 1063 +
 gcc/omp-api.h |   32 +
 gcc/omp-general.cc|  134 ++
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|  129 --
 gcc/testsuite/ChangeLog.omp   |   54 +
 gcc/testsuite/c-c++-common/goacc/collapse-1.c |   16 +-
 gcc/testsuite/c-c++-common/goacc/tile-2.c |4 +-
 gcc/testsuite/c-c++-common/gomp/imperfect1.c  |   38 +
 gcc/testsuite/c-c++-common/gomp/imperfect2.c  |   34 +
 gcc/testsuite/c-c++-common/gomp/imperfect3.c  |   33 +
 gcc/testsuite/c-c++-common/gomp/imperfect4.c  |   33 +
 gcc/testsuite/c-c++-common/gomp/imperfect5.c  |   57 +
 .../loop-transforms/imperfect-loop-nest.c |5 +-
 .../gomp/loop-transforms/tile-1.c |   16 +-
 .../gomp/loop-transforms/tile-2.c |   16 +-
 .../gomp/loop-transforms/tile-3.c |   26 +-
 .../gomp/loop-transforms/unroll-inner-2.c |6 +-
 .../c-c++-common/gomp/metadirective-1.c   |2 +-
 gcc/testsuite/g++.dg/gomp/attrs-4.C   |2 +-
 gcc/testsuite/g++.dg/gomp/for-1.C |2 +-
 gcc/testsuite/g++.dg/gomp/pr41967.C   |2 +-
 gcc/testsuite/g++.dg/gomp/pr94512.C   |2 +-
 gcc/testsuite/gcc.dg/gomp/collapse-1.c|   10 +-
 gcc/testsuite/gcc.dg/gomp/for-1.c |2 +-
 gcc/testsuite/gcc.dg/gomp/for-11.c|2 +-
 gcc/testsuite/gfortran.dg/gomp/collapse1.f90  |4 +-
 gcc/testsuite/gfortran.dg/gomp/collapse2.f90  |   10 +-
 gcc/testsuite/gfortran.dg/gomp/imperfect1.f90 |   39 +
 gcc/testsuite/gfortran.dg/gomp/imperfect2.f90 |   56 +
 gcc/testsuite/gfortran.dg/gomp/imperfect3.f90 |   29 +
 gcc/testsuite/gfortran.dg/gomp/imperfect4.f90 |   36 +
 gcc/testsuite/gfortran.dg/gomp/imperfect5.f90 |   67 +
 .../gomp/loop-transforms/tile-1.f90   |   12 +-
 .../gomp/loop-transforms/tile-2.f90   |2 +-
 .../loop-transforms/tile-imperfect-nest.f90   |   16 +-
 gcc/tree-nested.cc|   14 +
 libgomp/ChangeLog.omp |   48 +
 .../testsuite/libgomp.c++/imperfect-class-1.C |  169 +++
 .../testsuite/libgomp.c++/imperfect-class-2.C |  167 +++
 .../testsuite/libgomp.c++/imperfect-class-3.C |  167 +++
 .../libgomp.c++/imperfect-destructor.C|  135 ++
 .../libgomp.c++/imperfect-template-1.C|  172 +++
 .../libgomp.c++/imperfect-template-2.C|  170 +++
 .../libgomp.c++/imperfect-template-3.C|  170 +++
 .../imperfect-transform-1.c   |   79 +
 .../imperfect-transform-2.c   |   79 +
 .../libgomp.c-c++-common/imperfect1.c |   76 +
 .../libgomp.c-c++-common/imperfect2.c |  114 ++
 .../libgomp.c-c++-common/imperfect3.c |  119 ++
 .../libgomp.c-c++-common/imperfect4.c |  117 ++
 .../libgomp.c-c++-common/imperfect5.c |   49 +
 .../libgomp.c-c++-common/imperfect6.c |  115 ++
 .../target-imperfect-transform-1.c|   82 +
 .../target-imperfect-transform-2.c|   82 +
 .../libgomp.c-c++-common/target-imperfect1.c  |   81 +
 .../libgomp.c-c++-common/target-imperfect2.c  |  122 ++
 .../libgomp.c-c++-common/target-imperfect3.c  |  125 ++
 .../libgomp.c-c++-common/target-imperfect4.c  |  122 ++
 .../libgomp.fortran/imperfect-destructor.f90  |  142 ++
 .../libgomp.fortran/imperfect-transform-1.f90 |   70 +
 .../libgomp.fortran/imperfect-transform-2.f90 |   70 +
 .../testsuite/libgomp.fortran/imperfect1.f90  |   67 +
 

[OG13 1/6] OpenMP: Handle loop transformation clauses in nested functions

2023-06-14 Thread Sandra Loosemore
The new internal clauses introduced for loop transformations were missing
from the big switch statements over all clauses in these functions.

gcc/ChangeLog:
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle loop
transformation clauses.
(convert_local_omp_clauses): Likewise.

libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/nested-fn.f90: New test.

Co-Authored-By: Frederik Harwath 
---
 gcc/ChangeLog.omp |  7 +++
 gcc/tree-nested.cc| 14 ++
 libgomp/ChangeLog.omp |  5 +
 .../loop-transforms/nested-fn.f90 | 19 +++
 4 files changed, 45 insertions(+)
 create mode 100644 
libgomp/testsuite/libgomp.fortran/loop-transforms/nested-fn.f90

diff --git a/gcc/ChangeLog.omp b/gcc/ChangeLog.omp
index b4ebf6c0dea..d77d01076c2 100644
--- a/gcc/ChangeLog.omp
+++ b/gcc/ChangeLog.omp
@@ -1,3 +1,10 @@
+2023-06-13  Sandra Loosemore  
+   Frederik Harwath 
+
+   * tree-nested.cc (convert_nonlocal_omp_clauses): Handle loop
+   transformation clauses.
+   (convert_local_omp_clauses): Likewise.
+
 2023-06-12  Tobias Burnus  
 
Backported from mainline:
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 04651d86608..51c69dd3c10 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -1494,6 +1494,13 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE__OMPACC_:
  break;
 
+ /* Clauses related to loop transforms.  */
+   case OMP_CLAUSE_TILE:
+   case OMP_CLAUSE_UNROLL_FULL:
+   case OMP_CLAUSE_UNROLL_PARTIAL:
+   case OMP_CLAUSE_UNROLL_NONE:
+ break;
+
  /* The following clause belongs to the OpenACC cache directive, which
 is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
@@ -2291,6 +2298,13 @@ convert_local_omp_clauses (tree *pclauses, struct 
walk_stmt_info *wi)
case OMP_CLAUSE__OMPACC_:
  break;
 
+ /* Clauses related to loop transforms.  */
+   case OMP_CLAUSE_TILE:
+   case OMP_CLAUSE_UNROLL_FULL:
+   case OMP_CLAUSE_UNROLL_PARTIAL:
+   case OMP_CLAUSE_UNROLL_NONE:
+ break;
+
  /* The following clause belongs to the OpenACC cache directive, which
 is discarded during gimplification.  */
case OMP_CLAUSE__CACHE_:
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 60dc6c1f7c2..5ce5052a8dc 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2023-06-13  Sandra Loosemore  
+   Frederik Harwath 
+
+   * testsuite/libgomp.fortran/loop-transforms/nested-fn.f90: New test.
+
 2023-06-14  Tobias Burnus  
 
Backported from mainline:
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/nested-fn.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/nested-fn.f90
new file mode 100644
index 000..dc70c9228fd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/nested-fn.f90
@@ -0,0 +1,19 @@
+! { dg-do run }
+
+program foo
+  integer :: count
+contains
+
+subroutine s1 ()
+  integer :: i, count
+
+  count = 0
+
+  !$omp target parallel do
+  !$omp unroll partial
+  do i = 1, 100
+  end do
+
+end subroutine
+
+end program
-- 
2.31.1



Re: Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread 钟居哲
After several tries:

(define_mode_iterator VF_AUTO [
  (VNx1HF "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
  (VNx2HF "TARGET_ZVFH")
  (VNx4HF "TARGET_ZVFH")
  (VNx8HF "TARGET_ZVFH")
  (VNx16HF "TARGET_ZVFH")
  (VNx32HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
  (VNx64HF "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")

  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
  (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx8DF "TARGET_VECTOR_ELEN_FP_64")
  (VNx16DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN >= 128")
])


I think we should add this VF_AUTO iterator and change the expanders to use
it, so that the FP16 modes require TARGET_ZVFH.  Then it also works:
-march=zvfhmin gives no auto-vectorization, while -march=zvfh enables
auto-vectorization.

Feel free to comment with more solutions.

Thanks.


juzhe.zh...@rivai.ai
 
From: 钟居哲
Date: 2023-06-15 05:15
To: Jeff Law; rdapp.gcc; gcc-patches; palmer; kito.cheng
Subject: Re: Re: [PATCH] RISC-V: Add autovec FP unary operations.

Re: Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread 钟居哲
Hi, Jeff.  Thanks for quick approval.

When I reviewed the patch:
(define_expand "2"
  [(set (match_operand:VF 0 "register_operand")
(any_float_unop_nofrm:VF
 (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR"
{
  insn_code icode = code_for_pred (, mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})

There could be an issue here with FP16 vectors.
Let's look at the VF iterator:
(define_mode_iterator VF [
  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")


You can see that for all FP16 modes we use the predicate "TARGET_VECTOR_ELEN_FP_16",
which is true when either TARGET_ZVFH or TARGET_ZVFHMIN is enabled.
We do that because most floating-point instructions use the same
iterators, so we can't naively add TARGET_ZVFHMIN or TARGET_ZVFH.
Some instruction patterns using VF, for example vle16.v, should
be enabled as long as TARGET_ZVFHMIN is, whereas
instructions like vfneg.v need TARGET_ZVFH.

So I ran this experiment:
void
f (_Float16 *restrict a, _Float16 *restrict b)
{
  for (int i = 0; i < 100; ++i)
{
  a[i] = -b[i];
}
}

with compile option:
-march=rv64gcv_zvfhmin --param=riscv-autovec-preference=fixed-vlmax -O3

ICE happens:
auto.c:26:1: error: unable to generate reloads for:
(insn 8 7 9 2 (set (reg:VNx8HF 186 [ vect__6.7 ])
(if_then_else:VNx8HF (unspec:VNx8BI [
(const_vector:VNx8BI [
(const_int 1 [0x1]) repeated x8
])
(const_int 8 [0x8])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(neg:VNx8HF (reg:VNx8HF 134 [ vect__4.6 ]))
(unspec:VNx8HF [
(reg:SI 0 zero)
] UNSPEC_VUNDEF))) "auto.c":24:14 6631 {pred_negvnx8hf}
 (expr_list:REG_DEAD (reg:VNx8HF 134 [ vect__4.6 ])
(nil)))

The reason for the ICE is that we have enabled the auto-vectorization pattern
for vfneg.v under TARGET_ZVFHMIN according to the VF iterator, but
the instruction pattern for vfneg.v is correctly disabled and only enabled under
TARGET_ZVFH, since we have this attribute for each
RVV instruction pattern:
(define_attr "fp_vector_disabled" "no,yes"
  (cond [
(and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
vfwalu,vfwmul,vfmuladd,vfwmuladd,
vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
vfclass,vfmerge,
vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
vfredo,vfredu,vfwredo,vfwredu,
vfslide1up,vfslide1down")
   (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
(match_test "!TARGET_ZVFH")))
(const_string "yes")

;; The mode records as QI for the FP16 <=> INT8 instruction.
(and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
   (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
(match_test "!TARGET_ZVFH")))
(const_string "yes")
  ]
  (const_string "no")))

When I slightly change the pattern as follows:
(define_expand "2"
  [(set (match_operand:VF 0 "register_operand")
(any_float_unop_nofrm:VF
 (match_operand:VF 1 "register_operand")))]
  "TARGET_VECTOR && !(GET_MODE_INNER (mode) == HFmode && !TARGET_ZVFH)"
{
  insn_code icode = code_for_pred (, mode);
  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
  DONE;
})

Add && !(GET_MODE_INNER (mode) == HFmode && !TARGET_ZVFH)
to the condition.

It works for both TARGET_ZVFH and TARGET_ZVFHMIN
-march=rv64gcv_zvfhmin:
f:
	li	a4,2147450880
	li	a5,-2147450880
	addi	a4,a4,-1
	addi	a5,a5,1
	slli	a3,a5,32
	slli	a2,a4,32
	mv	a5,a4
	li	a4,-2147450880
	addi	a6,a1,200
	add	a3,a3,a4
	add	a2,a2,a5
.L2:
	ld	a5,0(a1)
	addi	a0,a0,8
	addi	a1,a1,8
	not	a4,a5
	and	a5,a5,a2
	and	a4,a4,a3
	sub	a5,a3,a5
	xor	a5,a4,a5
	sd	a5,-8(a0)
	bne	a1,a6,.L2
	ret

-march=rv64gcv_zvfh:
f:
	vsetivli	zero,8,e16,m1,ta,ma
	addi	a4,a1,16
	addi	a5,a0,16
	vle16.v	v1,0(a1)
	vfneg.v	v1,v1
	vse16.v	v1,0(a0)
	addi	a2,a1,32
	addi	a3,a0,32
	vle16.v	v1,0(a4)
	vfneg.v	v1,v1
	vse16.v	v1,0(a5)
	addi	a4,a1,48
	addi	a5,a0,48
	vle16.v	v1,0(a2)
	vfneg.v	v1,v1
	vse16.v	v1,0(a3)
	addi	a2,a1,64
	addi	a3,a0,64
	vle16.v	v1,0(a4)
	vfneg.v	v1,v1
vse16.v 

[wwwdocs] Broken URL to README in st/cli-be project

2023-06-14 Thread Jivan Hakobyan via Gcc-patches
In the CLI project, the link to the README is broken.  This patch fixes that.
Discussed in PR110250.


-- 
With the best regards
Jivan Hakobyan
diff --git a/htdocs/projects/cli.html b/htdocs/projects/cli.html
index 380fb031..394832b6 100644
--- a/htdocs/projects/cli.html
+++ b/htdocs/projects/cli.html
@@ -145,7 +145,7 @@ are followed.
 
 
 There is a small
-https://gcc.gnu.org/svn/gcc/branches/st/README?view=markup;>README
+https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=README;hb=refs/vendors/st/heads/README;>README
  file that explains how to build and install the GCC CLI back end and
 front end and the CLI binutils (both Mono based and DotGnu based) .
 


[PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 4: added missing cases for the new xxexpqp, xsxexpdp and xsxsigqp
built-ins to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3: I was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch add three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl



rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

These builtins were requested since there is no clean and performant way to
transfer a value between a vector type and a scalar type, despite the fact
that both reside in vector registers.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xxsiexpqp_kf to xsexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add else if for MODE_VECTOR_INT. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (VSEEQP_DI, VSESQP_TI): New mode iterators.
(VSEEQP_DI_base): New mode attribute definition.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
(xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn for
new builtins.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-1.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-1.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-1.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 21 +++--
 gcc/config/rs6000/rs6000-builtins.def | 15 ++-
 gcc/config/rs6000/rs6000-c.cc | 10 +-
 gcc/config/rs6000/rs6000-overload.def | 10 ++
 gcc/config/rs6000/vsx.md  | 26 +++--
 gcc/doc/extend.texi   | 21 -
 .../gcc.target/powerpc/bfp/extract-exp-1.c| 53 +++
 .../gcc.target/powerpc/bfp/extract-sig-1.c| 60 
 .../gcc.target/powerpc/bfp/insert-exp-1.c | 94 +++
 9 files changed, 284 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-1.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 534698e7d3e..a8f291c6a72 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ 

Re: [PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 14 Jun 2023, Richard Sandiford wrote:
>
>> Richard Biener via Gcc-patches  writes:
>> > The function is only meaningful for LOOP_VINFO_MASKS processing so
>> > inline it into the single use.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>> >
>> >* tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
>> >into ...
>> >(vect_verify_full_masking): ... this.
>> 
>> I think we did have a use for the separate function internally,
>> but obviously it was never submitted.  Personally I'd prefer
>> to keep things as they are though.
>
> OK - after 3/3 it's no longer "generic" (it wasn't before,
> it doesn't inspect the _len groups either), it's only meaningful
> for WHILE_ULT style analysis.

Ah, yeah, that's fair.  Sorry, I hadn't seen the rgc_vec/rgc_map
thing when I wrote the above.

So yeah, please go ahead.

Thanks,
Richard

>
>> 
>> 
>> > ---
>> >  gcc/tree-vect-loop.cc | 22 ++
>> >  1 file changed, 6 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
>> > index ace9e759f5b..a9695e5b25d 100644
>> > --- a/gcc/tree-vect-loop.cc
>> > +++ b/gcc/tree-vect-loop.cc
>> > @@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info 
>> > loop_vinfo, tree cmp_type)
>> >return true;
>> >  }
>> >  
>> > -/* Calculate the maximum number of scalars per iteration for every
>> > -   rgroup in LOOP_VINFO.  */
>> > -
>> > -static unsigned int
>> > -vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
>> > -{
>> > -  unsigned int res = 1;
>> > -  unsigned int i;
>> > -  rgroup_controls *rgm;
>> > -  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
>> > -res = MAX (res, rgm->max_nscalars_per_iter);
>> > -  return res;
>> > -}
>> > -
>> >  /* Calculate the minimum precision necessary to represent:
>> >  
>> >MAX_NITERS * FACTOR
>> > @@ -1210,8 +1196,6 @@ static bool
>> >  vect_verify_full_masking (loop_vec_info loop_vinfo)
>> >  {
>> >unsigned int min_ni_width;
>> > -  unsigned int max_nscalars_per_iter
>> > -= vect_get_max_nscalars_per_iter (loop_vinfo);
>> >  
>> >/* Use a normal loop if there are no statements that need masking.
>> >   This only happens in rare degenerate cases: it means that the loop
>> > @@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
>> >if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
>> >  return false;
>> >  
>> > +  /* Calculate the maximum number of scalars per iteration for every 
>> > rgroup.  */
>> > +  unsigned int max_nscalars_per_iter = 1;
>> > +  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
>> > +max_nscalars_per_iter
>> > +  = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
>> > +
>> >/* Work out how many bits we need to represent the limit.  */
>> >min_ni_width
>> >  = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);
>> 


Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 09:31, Robin Dapp wrote:

Hi,

this patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.  vfrsqrt7 will be added at a later time.
So with vrsqrt7 I think the question turns into whether we will be able to use 
it effectively. With its limited initial accuracy, we'll be stuck with 
another round of Newton-Raphson or Goldschmidt, so we're not likely 
going to beat the latency of a standard vsqrt.  We can use it to improve 
throughput though since it does pipeline (using the fmacs of course, so 
there's a definite trade-off if the fmacs are already saturated).





Similarly to the binop tests, there are flavors for zvfh now.  Prerequisites
as before.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
LGTM.  So if Juzhe is happy with it, then it's good to go once 
dependencies are resolved.


jeff



Re: [PATCH v2] machine descriptor: New compact syntax for insn and insn_split in Machine Descriptions.

2023-06-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> +The syntax rules are as follows:
> +@itemize @bullet
> +@item
> +Templates must start with @samp{@{@@} to use the new syntax.
> +
> +@item
> +@samp{@{@@} is followed by a layout in parentheses which is @samp{cons:}

s/parentheses/square brackets/

> +followed by a comma-separated list of 
> @code{match_operand}/@code{match_scratch}
> +operand numbers, then a semicolon, followed by the same for attributes
> +(@samp{attrs:}).  Operand modifiers can be placed in this section group as 
> well.

How about:

  Operand modifiers like @code{=} and @code{+} can be placed before
  an operand number.

> +Both sections are optional (so you can use only @samp{cons}, or only
> +@samp{attrs}, or both), and @samp{cons} must come before @samp{attrs} if
> +present.
> +
> +@item
> +Each alternative begins with any amount of whitespace.
> +
> +@item
> +Following the whitespace is a comma-separated list of "constraints" and/or
> +"attributes" within brackets @code{[]}, with sections separated by a 
> semicolon.
> +
> +@item
> +Should you want to copy the previous asm line, the symbol @code{^} can be 
> used.
> +This allows less copy pasting between alternative and reduces the number of
> +lines to update on changes.
> +
> +@item
> +When using C functions for output, the idiom @samp{* return @var{function};}
> +can be replaced with the shorthand @samp{<< @var{function};}.
> +
> +@item
> +Following the closing @samp{]} is any amount of whitespace, and then the 
> actual
> +asm output.
> +
> +@item
> +Spaces are allowed in the list (they will simply be removed).
> +
> +@item
> +All constraint alternatives should be specified.  For example, a list of
> +of three blank alternatives should be written @samp{[,,]} rather than
> +@samp{[]}.
> +
> +@item
> +All attribute alternatives should be non-empty, with @samp{*}
> +representing the default attribute value.  For example, a list of three
> +default attribute values should be written @samp{[*,*,*]} rather than
> +@samp{[]}.
> +
> +

Nit: too many blank lines.

> +@item
> +Within an @samp{@{@@} block both multiline and singleline C comments are
> +allowed, but when used outside of a C block they must be the only 
> non-whitespace
> +blocks on the line.
> +
> +@item
> +Within an @samp{@{@@} block, any iterators that do not get expanded will 
> result
> +in an error.  If for some reason it is required to have @code{<} or @code{>} 
> in
> +the output then these must be escaped using @backslashchar{}.
> +
> +@item
> +It is possible to use the @samp{attrs} list to specify some attributes and to
> +use the normal @code{set_attr} syntax to specify other attributes.  There 
> must
> +not be any overlap between the two lists.
> +
> +In other words, the following is valid:
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> +   (match_operand:SI 1 "aarch64_mov_operand"))]
> +  ""
> +  @{@@ [cons: 0, 1; attrs: type, arch, length]@}
> +  @dots{}
> +  [(set_attr "foo" "mov_imm")]
> +)
> +@end group
> +@end smallexample
> +
> +but this is not valid:
> +@smallexample
> +@group
> +(define_insn_and_split ""
> +  [(set (match_operand:SI 0 "nonimmediate_operand")
> +   (match_operand:SI 1 "aarch64_mov_operand"))]
> +  ""
> +  @{@@ [cons: 0, 1; attrs: type, arch, length]@}
> +  @dots{}
> +  [(set_attr "arch" "bar")
> +   (set_attr "foo" "mov_imm")]
> +)
> +@end group
> +@end smallexample
> +
> +because you can't mix and match new and old syntax.

Maybe “because it specifies @code{arch} twice”?  I'm suggesting that because
“new” and “old” tend not to age well.

> +/* Add constraints to an rtx.  This function is similar to 
> remove_constraints.
> +   Errors if adding the constraints would overwrite existing constraints.  */
> +
> +static void
> +add_constraints (rtx part, file_location loc, vec_conlist )
> +{
> +  const char *format_ptr;
> +
> +  if (part == NULL_RTX)
> +return;
> +
> +  /* If match_op or match_scr, check if we have the right one, and if so, 
> copy
> + over the constraint list.  */
> +  if (GET_CODE (part) == MATCH_OPERAND || GET_CODE (part) == MATCH_SCRATCH)
> +{
> +  int field = GET_CODE (part) == MATCH_OPERAND ? 2 : 1;
> +  unsigned id = XINT (part, 0);
> +
> +  if (id >= cons.size ())
> +   fatal_at (loc, "could not find match_operand/scratch with id %d", id);

Is this an error?  I thought it should be treated like...

> +
> +  if (cons[id].idx == -1)
> +   return;

...cons[id].idx == -1 is here.  I.e. I think they could be combined to:

  if (ids >= cons.size () || cons[id].idx == -1)
return;

> +
> +  if (XSTR (part, field)[0] != '\0')
> +   {
> + error_at (loc, "can't mix normal and compact constraint syntax");
> + return;
> +   }
> +  XSTR (part, field) = cons[id].out ();
> +  cons[id].idx = -1;
> +}
> +
> +  format_ptr = GET_RTX_FORMAT (GET_CODE (part));
> +
> +  /* Recursively search the rtx.  */
> +  for 

Re: [i386 PATCH] A minor code clean-up: Use NULL_RTX instead of nullptr

2023-06-14 Thread Bernhard Reutner-Fischer via Gcc-patches
plonk.

On 26 May 2023 10:31:51 CEST, Bernhard Reutner-Fischer  
wrote:
>On Thu, 25 May 2023 18:58:04 +0200
>Bernhard Reutner-Fischer  wrote:
>
>> On Wed, 24 May 2023 18:54:06 +0100
>> "Roger Sayle"  wrote:
>> 
>> > My understanding is that GCC's preferred null value for rtx is NULL_RTX
>> > (and for tree is NULL_TREE), and by being typed allows strict type 
>> > checking,
>> > and use with function polymorphism and template instantiation.
>> > C++'s nullptr is preferred over NULL and 0 for pointer types that don't
>> > have a defined null of the correct type.
>> > 
>> > This minor clean-up uses NULL_RTX consistently in i386-expand.cc.  
>> 
>> Oh. Well, i can't resist cleanups :)
>
>> (and handle nullptr too, and the same game for tree)
>
>so like the attached. And
>sed -e 's/RTX/TREE/g' -e 's/rtx/tree/g' \
>  < ~/coccinelle/gcc-rtx-null.0.cocci \
>  > ~/coccinelle/gcc-tree-null.0.cocci
>
>I do not know if we want to shorten explicit NULL comparisons.
> foo == NULL => !foo and foo != NULL => foo
>Left them alone in the form they were written.
>
>See the attached result of the rtx hunks, someone would have to build

I've bootstrapped and regtested the hunks for rtx as cited up-thread without 
regressions (as expected).

I know everybody is busy, but I'd like to know if I should swap these out
completely, or postpone this until the start of stage 3 or the next stage 1 or
something.
I can easily keep these local to my personal pre-configure stage for my own 
amusement.

thanks,

>it and hack git-commit-mklog.py --changelog 'Use NULL_RTX.'
>to print("{}.".format(random.choice(['Ditto', 'Same', 'Likewise']))) ;)
>
>> 
>> Just a thought..
>
>cheers,



Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 09:00, Robin Dapp wrote:

Hi Juzhe,

the general method seems sane and useful (it's not very complicated).
I was just distracted by


Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
common expression:
{ 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

For this selector, we can use vmsltu + vmerge to optimize the codegen.


because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.


+/* Recognize the patterns that we can use merge operation to shuffle the
+   vectors. The value of Each element (index i) in selector can only be
+   either i or nunits + i.
+
+   E.g.
+   v = VEC_PERM_EXPR (v0, v1, selector),
+   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }


Same.


+
+   We can transform such pattern into:
+
+   v = vcond_mask (v0, v1, mask),
+   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
+
+static bool
+shuffle_merge_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
+  int n_patterns = d->perm.encoding ().npatterns ();
+  poly_int64 vec_len = d->perm.length ();
+
+  for (int i = 0; i < n_patterns; ++i)
+    if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
+      return false;
+
+  for (int i = n_patterns; i < n_patterns * 2; i++)
+    if (!d->perm.series_p (i, n_patterns, i, n_patterns)
+	&& !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
+      return false;


Maybe add a comment that we check that the pattern is actually monotonic
or however you prefer to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.
So I think that means we want a V2 for the comment updates.  But I think 
we can go ahead and consider V2 pre-approved.


jeff


Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 05:56, Lehua Ding wrote:

The V2 patch address comments from Juzhe, thanks.

Hi,
  
The reason for this bug is that in the case where the vector register is set
to a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option),
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not
unified. The current treatment is to pass all vector arguments and returns
through the function stack, and a new calling convention for vector registers
will be added in the future.
  
Best,

Lehua

 PR target/110119

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for
	vector mode.
	(riscv_pass_by_reference): Return true for vector mode.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/pr110119-1.c: New test.
 * gcc.target/riscv/rvv/base/pr110119-2.c: New test.
And just to be clear, I've asked for a minor comment update.  The usual 
procedure is to go ahead and post a V3.  In this case I'll also give 
that V3 pre-approval.  So no need to wait for additional acks.  Post it 
and it can be committed immediately.


jeff


Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 06:05, Robin Dapp via Gcc-patches wrote:

Oh. I see Robin's email is also wrong. CC Robin too for you


It still arrived via the mailing list ;)


Good to see a Fix patch of the ICE before Vector ABI patch.
Let's wait for more comments.


LGTM, this way I don't even need to rewrite my tests.
I think Palmer wanted to include a pointer to the psabi MR, so we should 
probably include that in a comment.  So OK with that in a comment.


I think there was talk of having this all be hidden behind a flag, but 
given it's an ICE on vector types, I don't mind just defining something 
for now to fix the ICE and give psabi time to finalize that spec.


This was also a good reminder that the vector work can't really be 
complete until we have the psabi updates in place and implemented.   The 
efforts can obviously continue in parallel, but it's a dependency worth 
noting in the RISE context.


Jeff


Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 06:01, juzhe.zh...@rivai.ai wrote:

LGTM now. Thanks for fixing it.

Good to see a Fix patch of the ICE before Vector ABI patch.
Let's wait for more comments.

Lehua Ding takes care of Vector ABI implementation and hopefully will 
send it soon.


It seems the email of Jeff is wrong. CC Jeff .for you.
The gmail address is fine.  I tend to use that for most of my upstream 
email interactions so that my work inbox is marginally decluttered.  And 
I'm also on gcc-patches, so I would have received it through that route 
as well.


jeff


Re: [PATCH v3] RISC-V: Bugfix for vec_init repeating auto vectorization in RV32

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 03:01, juzhe.zh...@rivai.ai wrote:

LGTM

Agreed.  Commit when convenient.

jeff


Re: [PATCH] RISC-V: Add (u)int8_t to binop tests.

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 01:23, juzhe.zh...@rivai.ai wrote:

LGTM

Likewise.
jeff


Re: [PATCH v1] RISC-V: Align the predictor style for define_insn_and_split

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/13/23 20:31, juzhe.zh...@rivai.ai wrote:

LGTM.
Similarly.  If I've interpreted the thread correctly, there aren't any 
issues created by this patch, though there are some existing issues that 
need to be addressed independently.  The patch itself is definitely the 
right thing to be doing.


I'd suggest going forward with the commit whenever it's convenient Pan.

Thanks,
Jeff


Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> This implements fully masked vectorization or a masked epilog for
> AVX512 style masks which single themselves out by representing
> each lane with a single bit and by using integer modes for the mask
> (both are much like GCN).
>
> AVX512 is also special in that it doesn't have any instruction
> to compute the mask from a scalar IV like SVE has with while_ult.
> Instead the masks are produced by vector compares and the loop
> control retains the scalar IV (mainly to avoid dependences on
> mask generation, a suitable mask test instruction is available).
>
> Like RVV code generation prefers a decrementing IV though IVOPTs
> messes things up in some cases removing that IV to eliminate
> it with an incrementing one used for address generation.
>
> One of the motivating testcases is from PR108410 which in turn
> is extracted from x264 where large size vectorization shows
> issues with small trip loops.  Execution time there improves
> compared to classic AVX512 with AVX2 epilogues for the cases
> of less than 32 iterations.
>
> size  scalar    128    256    512   512e   512f
>    1    9.42  11.32   9.35  11.17  15.13  16.89
>    2    5.72   6.53   6.66   6.66   7.62   8.56
>    3    4.49   5.10   5.10   5.74   5.08   5.73
>    4    4.10   4.33   4.29   5.21   3.79   4.25
>    6    3.78   3.85   3.86   4.76   2.54   2.85
>    8    3.64   1.89   3.76   4.50   1.92   2.16
>   12    3.56   2.21   3.75   4.26   1.26   1.42
>   16    3.36   0.83   1.06   4.16   0.95   1.07
>   20    3.39   1.42   1.33   4.07   0.75   0.85
>   24    3.23   0.66   1.72   4.22   0.62   0.70
>   28    3.18   1.09   2.04   4.20   0.54   0.61
>   32    3.16   0.47   0.41   0.41   0.47   0.53
>   34    3.16   0.67   0.61   0.56   0.44   0.50
>   38    3.19   0.95   0.95   0.82   0.40   0.45
>   42    3.09   0.58   1.21   1.13   0.36   0.40
>
> 'size' specifies the number of actual iterations, 512e is for
> a masked epilog and 512f for the fully masked loop.  From
> 4 scalar iterations on the AVX512 masked epilog code is clearly
> the winner, the fully masked variant is clearly worse and
> its size benefit is also tiny.
>
> This patch does not enable using fully masked loops or
> masked epilogues by default.  More work on cost modeling
> and vectorization kind selection on x86_64 is necessary
> for this.
>
> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> which could be exploited further to unify some of the flags
> we have right now but there didn't seem to be many easy things
> to merge, so I'm leaving this for followups.
>
> Mask requirements as registered by vect_record_loop_mask are kept in their
> original form and recorded in a hash_set now instead of being
> processed to a vector of rgroup_controls.  Instead that's now
> left to the final analysis phase which tries forming the rgroup_controls
> vector using while_ult and if that fails now tries AVX512 style
> which needs a different organization and instead fills a hash_map
> with the relevant info.  vect_get_loop_mask now has two implementations,
> one for the two mask styles we then have.
>
> I have decided against interweaving vect_set_loop_condition_partial_vectors
> with conditions to do AVX512 style masking and instead opted to
> "duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
> Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
>
> I was split between making 'vec_loop_masks' a class with methods,
> possibly merging in the _len stuff into a single registry.  It
> seemed to be too many changes for the purpose of getting AVX512
> working.  I'm going to play wait and see what happens with RISC-V
> here since they are going to get both masks and lengths registered
> I think.
>
> The vect_prepare_for_masked_peels hunk might run into issues with
> SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
> looked odd.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
> the testsuite with --param vect-partial-vector-usage=2 with and
> without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
> and one latent wrong-code (PR110237).
>
> There's followup work to be done to try enabling masked epilogues
> for x86-64 by default (when AVX512 is enabled, possibly only when
> -mprefer-vector-width=512).  Getting cost modeling and decision
> right is going to be challenging.
>
> Any comments?
>
> OK?

Some comments below, but otherwise LGTM FWIW.

> Btw, testing on GCN would be welcome - the _avx512 paths could
> work for it so in case the while_ult path fails (not sure if
> it ever does) it could get _avx512 style masking.  Likewise
> testing on ARM just to see I didn't break anything here.
> I don't have SVE hardware so testing is probably meaningless.
>
> Thanks,
> Richard.
>
>   * tree-vectorizer.h (enum vect_partial_vector_style): New.
>   

Re: Remove MFWRAP_SPEC remnant

2023-06-14 Thread Jeff Law via Gcc-patches




On 6/14/23 03:14, Jivan Hakobyan via Gcc-patches wrote:

This patch removes a remnant of mudflap.

gcc/ChangeLog:
	* config/moxie/uclinux.h (MFWRAP_SPEC): Remove.

Thanks.  I pushed this to the trunk.
jeff


[COMMITED] MAINTAINERS: Add myself to write after approval

2023-06-14 Thread Filip Kastl via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself to write after approval
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c8b787b6e1e..4a9a656647e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -484,6 +484,7 @@ Kean Johnston 
 Phillip Jordan 
 Tim Josling 
 Victor Kaplansky 
+Filip Kastl 
 Geoffrey Keating 
 Brendan Kehoe 
 Andi Kleen 
-- 
2.40.1


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
On Wed, Jun 14, 2023 at 06:25:10PM +0200, Richard Biener wrote:
> > Form rs6000.md:
> > ; This is to explain that changes to the stack pointer should
> > ; not be moved over loads from or stores to stack memory.
> > (define_insn "stack_tie"
> 
> That suggests it's the hard register value that's protected, not the memory
> pointed to.  I suppose that means an unspec volatile with the reg as input
> would serve the same?

No?  It says what it says.  That is pretty vague language, of course,
not entirely by accident no doubt.

> Or maybe that’s not the whole story.
> 
> 
> > and from rs6000-logue.cc:
> > /* This ties together stack memory (MEM with an alias set of 
> > frame_alias_set)
> >   and the change to the stack pointer.  */
> > static void
> > rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
> 
> I cannot make sense of that comment, but not sure if I really want to know …

It really is the same thing: this is a bloody heavy hammer keeping the
change to the stack pointer (or "hard" frame pointer) in place wrt any
accesses to the stack memory.

If there was a nice portable way to avoid needing this we haven't found
it yet -- or a non-portable way even, and it doesn't have to be all that
nice either come to think of it :-)


Segher


RE: [PATCH v2] [PR96339] Optimise svlast[ab]

2023-06-14 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Gcc-patches  bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Prathamesh
> Kulkarni via Gcc-patches
> Sent: Wednesday, June 14, 2023 8:13 AM
> To: Tejas Belagod 
> Cc: Richard Sandiford ; gcc-
> patc...@gcc.gnu.org
> Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
> 
> On Tue, 13 Jun 2023 at 12:38, Tejas Belagod via Gcc-patches
>  wrote:
> >
> >
> >
> > From: Richard Sandiford 
> > Date: Monday, June 12, 2023 at 2:15 PM
> > To: Tejas Belagod 
> > Cc: gcc-patches@gcc.gnu.org , Tejas Belagod
> 
> > Subject: Re: [PATCH v2] [PR96339] Optimise svlast[ab]
> > Tejas Belagod  writes:
> > > From: Tejas Belagod 
> > >
> > >   This PR optimizes an SVE intrinsics sequence such as
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   where a scalar is selected based on a constant predicate and a
> > >   variable vector.  This sequence is optimized to return the
> > >   corresponding element of a NEON vector.  For example,
> > > svlasta (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > > umov  w0, v0.b[1]
> > >   Likewise,
> > > svlastb (svptrue_pat_b8 (SV_VL1), x)
> > >   returns
> > > umov  w0, v0.b[0]
> > >   This optimization only works provided the constant predicate maps to
> > >   a range that is within the bounds of a 128-bit NEON register.
> > >
> > > gcc/ChangeLog:
> > >
> > >PR target/96339
> > >	* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold):
> > >	Fold sve calls that have a constant input predicate vector.
> > >(svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
> > >(svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
> > >(svlast_impl::vect_all_same): Check if all vector elements are 
> > > equal.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >PR target/96339
> > >* gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
> > >* gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
> > >* gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
> > >to expect optimized code for function body.
> > >* gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): 
> > > Likewise.
> > >* gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): 
> > > Likewise.
> >
> > OK, thanks.
> >
> > Applied on master, thanks.
> Hi Tejas,
> This seems to break aarch64 bootstrap build with following error due
> to -Wsign-compare diagnostic:
> 00:18:19 /home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/aarch64-sve-builtins-base.cc:1133:35:
> error: comparison of integer expressions of different signedness:
> ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
> 00:18:19  1133 | for (i = npats; i < enelts; i += step_1)
> 00:18:19  | ~~^~~~
> 00:30:46 abe-debug-build: cc1plus: all warnings being treated as errors
> 00:30:46 abe-debug-build: make[3]: ***
> [/home/tcwg-
> buildslave/workspace/tcwg_gnu_6/abe/snapshots/gcc.git~master/gcc/config/
> aarch64/t-aarch64:96:
> aarch64-sve-builtins-base.o] Error 1

Fixed thusly in trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold):
Fix signed comparison warning in loop from npats to enelts.

> 
> Thanks,
> Prathamesh
> >
> > Tejas.
> >
> >
> > Richard


(attachment: boot.patch)


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 10:04:20AM +0100, Richard Sandiford wrote:
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.

Yes, it is a set with an unspecified RHS.

> I think the main difference between:
> 
>   (clobber (mem:BLK …))
> 
> and
> 
>   (set (mem:BLK …) (unspec:BLK …))
> 
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

No, the latter can be removed as well, under exactly the same
conditions: if no code after it reads what was written.  This happens in
branches marked dead.


Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
On Wed, Jun 14, 2023 at 09:22:09AM +, Richard Biener wrote:
> How can a clobber be validly dropped?

Same as any other set: if no code executed after it can read whatever is
written.  This typically means a stack frame goes away, or simply no
more code is executed *at all* after this.

> For the case of stack
> memory if there's no stack use after it it could be elided
> and I suppose the clobber itself can be moved.  But then
> the function return is a stack use as well.

A function return does not access the stack at all on most
architectures, including PowerPC.  Some epilogue insns can do, of
course, but we expand to separate insns during expand already.


Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Richard Biener via Gcc-patches



> Am 14.06.2023 um 17:41 schrieb Segher Boessenkool 
> :
> 
> Hi!
> 
>> On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote:
>>> On Wed, 14 Jun 2023, Jiufu Guo wrote:
>>> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
>>> UNSPEC_TIE".
>>>   This avoids using BLK on unspec, but using DI.
>> 
>> That gives the MEM a size which means we can interpret the (set ..)
>> as killing a specific area of memory, enabling DSE of earlier
>> stores.
> 
> Or DSE can delete this tie even, if it can see some later store to the
> same location without anything in between that can read what the tie
> stores.
> 
> BLKmode avoids all of this.  You can call that elegant, you can call it
> cheating, you can call it many things -- but it *works*.
> 
>> AFAIU this special instruction is only supposed to prevent
>> code motion (of stack memory accesses?) across this instruction?
> 
> Form rs6000.md:
> ; This is to explain that changes to the stack pointer should
> ; not be moved over loads from or stores to stack memory.
> (define_insn "stack_tie"

That suggests it's the hard register value that's protected, not the memory
pointed to.  I suppose that means an unspec volatile with the reg as input
would serve the same?

Or maybe that’s not the whole story.


> and from rs6000-logue.cc:
> /* This ties together stack memory (MEM with an alias set of frame_alias_set)
>   and the change to the stack pointer.  */
> static void
> rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)

I cannot make sense of that comment, but not sure if I really want to know …

> A big reason this is needed is because of all the hard frame pointer
> stuff, which the generic parts of GCC require, but there is no register
> for that in the Power architecture.  Nothing is an issue here in most
> cases, but sometimes we need to do unusual things to the stack, say for
> alloca.
> 
>> I'd say a
>> 
>>  (may_clobber (mem:BLK (reg:DI 1 1)))
> 
> "clobber" always means "may clobber".  (clobber X) means X is written
> with some unspecified value, which may well be whatever value it
> currently holds.  Via some magical means or whatever, there is no
> mechanism specified, just the effects :-)
> 
>> might be more to the point?  I've used "may_clobber" which doesn't
>> exist since I'm not sure whether a clobber is considered a kill.
>> The docs say "Represents the storing or possible storing of an 
>> unpredictable..." - what is it?  Storing or possible storing?
> 
> It is the same thing.  "clobber" means the same thing as "set", except
> the value that is written is not specified.
> 
>> I suppose stack_tie should be less strict than the documented
>> (clobber (mem:BLK (const_int 0))) (clobber all memory).
> 
> "clobber" is nicer than the set to (const_int 0).  Does it work though?
> All this code is always fragile :-/  I'm all for this change, don't get
> me wrong, but preferably things stay in working order.
> 
> We use "stack_tie" as a last resort heavy hammer anyway, in all normal
> cases we explain the actual data flow explicitly and correctly, also
> between the various registers used in the *logues.
> 
> 
> Segher


Re: [PATCH ver 3] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches
On Tue, 2023-06-13 at 11:10 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/8 23:21, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 3, was able to get the overloaded version of scalar_insert_exp
> > to work and the change to the xsxexpqp_f128_ define instruction to
> > work with the suggestions from Kewen.
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitiions.
> > 
> > The patch add three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> > 
> >Carl
> > 
> > ---
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int
> >  __builtin_scalar_extract_exp_to_vec (__ieee128);
> >  __vector unsigned __int128
> >  __builtin_scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);

Fixed commit log, removed __builtin_ from the names per comments from
Kewen below.
> > 
> > These builtins were requested since there is no clean and performant
> > way to transfer a value between a vector type and a scalar type,
> > despite the fact that they both reside in vector registers.
> > 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf to xsxexpqp_kf_di.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > * config/vsx.md (VSEEQP_DI): New mode iterator.
> > Rename define_insn xsxexpqp_ to
> > sxexpqp__.
> > (xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn
> > for
> > new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig): Add documentation for new builtins.
> > (scalar_insert_exp): Add new overloaded builtin definition.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
> > ---
> >  gcc/config/rs6000/rs6000-builtin.cc   |  4 +-
> >  gcc/config/rs6000/rs6000-builtins.def | 11 ++-
> >  gcc/config/rs6000/rs6000-c.cc | 10 +-
> >  gcc/config/rs6000/rs6000-overload.def |  2 +
> >  gcc/config/rs6000/vsx.md  | 28 +-
> >  gcc/doc/extend.texi   |  9 ++
> >  .../powerpc/bfp/extract-exp-ieee128.c | 50 ++
> >  .../powerpc/bfp/extract-sig-ieee128.c | 57 
> >  .../powerpc/bfp/insert-exp-ieee128.c  | 91
> > +++
> >  9 files changed, 253 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
> > exp-ieee128.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
> > sig-ieee128.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-
> > exp-ieee128.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> > b/gcc/config/rs6000/rs6000-builtin.cc
> > index 534698e7d3e..d99f0ae5dda 100644
> > --- a/gcc/config/rs6000/rs6000-builtin.cc
> > +++ b/gcc/config/rs6000/rs6000-builtin.cc
> > @@ -3326,8 +3326,8 @@ rs6000_expand_builtin (tree exp, rtx target,
> > rtx /* subtarget */,
> >case CODE_FOR_fmakf4_odd:
> > icode = CODE_FOR_fmatf4_odd;
> > break;
> > -  case CODE_FOR_xsxexpqp_kf:
> > -   icode = CODE_FOR_xsxexpqp_tf;
> > +  

[PATCH v2] Add MinGW option -mcrtdll= for choosing C RunTime DLL library

2023-06-14 Thread Pali Rohár via Gcc-patches
It adjusts the preprocess, compile and link flags, which allows the default
-lmsvcrt library to be replaced by another one provided by the MinGW runtime.

gcc/
 * config/i386/mingw-w64.h (CPP_SPEC): Adjust for -mcrtdll=.
 (REAL_LIBGCC_SPEC): New define.
 * config/i386/mingw.opt: Add mcrtdll=
 * config/i386/mingw32.h (CPP_SPEC): Adjust for -mcrtdll=.
 (REAL_LIBGCC_SPEC): Adjust for -mcrtdll=.
 (STARTFILE_SPEC): Adjust for -mcrtdll=.
 * doc/invoke.texi: Add mcrtdll= documentation.
---
Changes in v2:
* Fixed doc/invoke.texi documentation
---
 gcc/config/i386/mingw-w64.h | 22 +-
 gcc/config/i386/mingw.opt   |  4 
 gcc/config/i386/mingw32.h   | 28 
 gcc/doc/invoke.texi | 24 +++-
 4 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mingw-w64.h b/gcc/config/i386/mingw-w64.h
index 3a21cec3f8cd..0146ed4f793e 100644
--- a/gcc/config/i386/mingw-w64.h
+++ b/gcc/config/i386/mingw-w64.h
@@ -25,7 +25,27 @@ along with GCC; see the file COPYING3.  If not see
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 "%{municode:-DUNICODE} " \
 "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
-"%{" SPEC_PTHREAD2 ":-U_REENTRANT} "
+"%{" SPEC_PTHREAD2 ":-U_REENTRANT} " \
+"%{mcrtdll=crtdll*:-U__MSVCRT__ -D__CRTDLL__} " \
+"%{mcrtdll=msvcrt10*:-D__MSVCRT_VERSION__=0x100} " \
+"%{mcrtdll=msvcrt20*:-D__MSVCRT_VERSION__=0x200} " \
+"%{mcrtdll=msvcrt40*:-D__MSVCRT_VERSION__=0x400} " \
+"%{mcrtdll=msvcrt-os*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr70*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr71*:-D__MSVCRT_VERSION__=0x701} " \
+"%{mcrtdll=msvcr80*:-D__MSVCRT_VERSION__=0x800} " \
+"%{mcrtdll=msvcr90*:-D__MSVCRT_VERSION__=0x900} " \
+"%{mcrtdll=msvcr100*:-D__MSVCRT_VERSION__=0xA00} " \
+"%{mcrtdll=msvcr110*:-D__MSVCRT_VERSION__=0xB00} " \
+"%{mcrtdll=msvcr120*:-D__MSVCRT_VERSION__=0xC00} " \
+"%{mcrtdll=ucrt*:-D_UCRT} "
+
+#undef REAL_LIBGCC_SPEC
+#define REAL_LIBGCC_SPEC \
+  "%{mthreads:-lmingwthrd} -lmingw32 \
+   " SHARED_LIBGCC_SPEC " \
+   -lmingwex %{!mcrtdll=*:-lmsvcrt} %{mcrtdll=*:-l%*} \
+   -lkernel32 " MCFGTHREAD_SPEC
 
 #undef STARTFILE_SPEC
 #define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
diff --git a/gcc/config/i386/mingw.opt b/gcc/config/i386/mingw.opt
index 0ae026a66bd6..dd66a50aec00 100644
--- a/gcc/config/i386/mingw.opt
+++ b/gcc/config/i386/mingw.opt
@@ -18,6 +18,10 @@
 ; along with GCC; see the file COPYING3.  If not see
 ; .
 
+mcrtdll=
+Target RejectNegative Joined
+Preprocess, compile or link with specified C RunTime DLL library.
+
 pthread
 Driver
 
diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h
index 6a55baaa4587..a1ee001983a7 100644
--- a/gcc/config/i386/mingw32.h
+++ b/gcc/config/i386/mingw32.h
@@ -89,7 +89,20 @@ along with GCC; see the file COPYING3.  If not see
 #undef CPP_SPEC
 #define CPP_SPEC "%{posix:-D_POSIX_SOURCE} %{mthreads:-D_MT} " \
 "%{" SPEC_PTHREAD1 ":-D_REENTRANT} " \
-"%{" SPEC_PTHREAD2 ": } "
+"%{" SPEC_PTHREAD2 ": } " \
+"%{mcrtdll=crtdll*:-U__MSVCRT__ -D__CRTDLL__} " \
+"%{mcrtdll=msvcrt10*:-D__MSVCRT_VERSION__=0x100} " \
+"%{mcrtdll=msvcrt20*:-D__MSVCRT_VERSION__=0x200} " \
+"%{mcrtdll=msvcrt40*:-D__MSVCRT_VERSION__=0x400} " \
+"%{mcrtdll=msvcrt-os*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr70*:-D__MSVCRT_VERSION__=0x700} " \
+"%{mcrtdll=msvcr71*:-D__MSVCRT_VERSION__=0x701} " \
+"%{mcrtdll=msvcr80*:-D__MSVCRT_VERSION__=0x800} " \
+"%{mcrtdll=msvcr90*:-D__MSVCRT_VERSION__=0x900} " \
+"%{mcrtdll=msvcr100*:-D__MSVCRT_VERSION__=0xA00} " \
+"%{mcrtdll=msvcr110*:-D__MSVCRT_VERSION__=0xB00} " \
+"%{mcrtdll=msvcr120*:-D__MSVCRT_VERSION__=0xC00} " \
+"%{mcrtdll=ucrt*:-D_UCRT} "
 
 /* For Windows applications, include more libraries, but always include
kernel32.  */
@@ -184,11 +197,18 @@ along with GCC; see the file COPYING3.  If not see
 #define REAL_LIBGCC_SPEC \
   "%{mthreads:-lmingwthrd} -lmingw32 \
" SHARED_LIBGCC_SPEC " \
-   -lmoldname -lmingwex -lmsvcrt -lkernel32 " MCFGTHREAD_SPEC
+   %{mcrtdll=crtdll*:-lcoldname} %{!mcrtdll=crtdll*:-lmoldname} \
+   -lmingwex %{!mcrtdll=*:-lmsvcrt} %{mcrtdll=*:-l%*} \
+   -lkernel32 " MCFGTHREAD_SPEC
 
 #undef STARTFILE_SPEC
-#define STARTFILE_SPEC "%{shared|mdll:dllcrt2%O%s} \
-  %{!shared:%{!mdll:crt2%O%s}} %{pg:gcrt2%O%s} \
+#define STARTFILE_SPEC " \
+  %{shared|mdll:%{mcrtdll=crtdll*:dllcrt1%O%s}} \
+  

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 09:52:37AM +, Richard Biener wrote:
> I see.  So
> 
> (parallel
>  (unspec stack_tie)
>  (clobber (mem:BLK ...)))

Written like this, without a "set", *every* unspec has to be an
unspec_volatile, for the same reason as all inline asms without outputs
always are considered volatile asm.  The "unspec" arm of the parallel
can be omitted, and if that is valid RTL (possibly after other changes,
like omitting the parallel, replacing it by its one remaining arm), this
is a perfectly valid transformation.

> I suppose it needs to be an unspec_volatile?  It feels like
> the stack_ties are a delicate hack preventing enough but not too
> much optimization ...

Yes.  So let's please not disturb it :-)

It isn't a "delicate" hack I would say, but its effects are needed in
some places, and messing it up leads to hard to debug problems.  Which
had happened time and time again over the years.

It just is hard to deal with variable sized stack adjustments and the
like.  As long as we only use stack ties in such unusual cases, all is
fine.  There are worse things, like what we have the
frame_pointer_needed_indeed thing for :-)


Segher


Re: [RFC] Add stdckdint.h header for C23

2023-06-14 Thread Zack Weinberg via Gcc-patches
On Wed, Jun 14, 2023, at 10:52 AM, Joseph Myers wrote:
> On Tue, 13 Jun 2023, Paul Eggert wrote:
>
>> > There is always the possibility to have the header co-owned by both
>> > the compiler and C library, limits.h style. Just
> > #if __has_include_next(<stdckdint.h>)
> > # include_next <stdckdint.h>
> > #endif
>>
>> I don't see how you could implement
>> __has_include_next(<stdckdint.h>) for arbitrary non-GCC compilers,
>> which is what we'd need for glibc users. For glibc internals we can
>> use "#include_next" more readily, since we assume a new-enough GCC.
>> I.e. we could do something like this:
>
> Given the possibility of library functions being included in
>  in future standard versions, it seems important to look
> at ways of splitting responsibility for the header between the
> compiler and library, whether with __has_include_next, or compiler
> version conditionals, or some other such variation.

limits.h is a horrible mess, with both the compiler and the library
trying to get the last word, and I don't think we should take it as a
model. I suggest a better model is GCC 12's stdint.h, where the compiler
provides "stdint-gcc.h" that's used *only* in freestanding mode, and a
wrapper header that defers to the C library (using #include_next, which
is safe in GCC-provided headers) when __STDC_HOSTED__ is defined;
meanwhile, glibc's stdint.h doesn't look for compiler-provided headers
at all.

In this case it sounds like glibc needs the compiler to provide some of
the definitions, and I suggest this should be done via private predefined
macros, in the mode of __INTx_TYPE__ and friends.

zw


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 05:26:52PM +0800, Jiufu Guo wrote:
> Richard Biener  writes:
> >> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> >> UNSPEC_TIE".
> >>This avoids using BLK on unspec, but using DI.
> >
> > That gives the MEM a size which means we can interpret the (set ..)
> > as killing a specific area of memory, enabling DSE of earlier
> > stores.
> 
> Oh, thanks!
> While with 'unspec:DI', I'm wondering if it means this 'set' would
> do some special things other than pure 'set' to the memory. 

No, that is not what unspec means.  It just means "some DImode value I'm
not telling you anything about".  If to get that value there is some
important work done (that should not be optimised away, say) you need
unspec_volatile, which means just that: there is an unspecified side
effect done by that insn, so it has to be done on the real machine
exactly like on the abstract C machine, so the insn has big restrictions
on being moved and removed etc.

We can replace the RHS of (almost) *every* set with an unspec, and the
compiler would still work, just would generate pretty lousy code.  But
at least CSE and DSE (and everything else purely dataflow) would still
work :-)


Segher


[patch] libgomp: Extend OMP_ALLOCATOR, add affinity env var doc (was: [Patch] libgomp.texi: Document allocator + affininity env vars)

2023-06-14 Thread Tobias Burnus

On 14.06.23 12:34, Tobias Burnus wrote:

Comments on the wording and/or the content?

This remains.  However, the attached patch now additionally lists the
predefined allocators, fixes one awkward wording of mine, and it
documents the OpenMP 5.1 syntax of the OMP_ALLOCATOR environment variable.

Plus: it actually implements the latter, i.e. besides predefined
allocators, predefined memory spaces optionally followed by traits
can now be specified in the env var.

Comments are highly welcome!

Tobias
libgomp: Extend OMP_ALLOCATOR, add affinity env var doc

Support OpenMP 5.1's syntax for OMP_ALLOCATOR as well, which,
besides predefined allocators, also permits predefined
memspaces optionally followed by traits.

Additionally, this commit adds the previously lacking
documentation for OMP_ALLOCATOR, OMP_AFFINITY_FORMAT
and OMP_DISPLAY_AFFINITY.

libgomp/ChangeLog:

	* env.c (gomp_def_allocator_envvar): New var.
	(parse_allocator): Handle OpenMP 5.1 syntax.
	(cleanup_env): New.
	(omp_display_env): Output gomp_def_allocator_envvar
	for an allocator with traits.
	* libgomp.texi (OMP_ALLOCATOR, OMP_AFFINITY_FORMAT,
	OMP_DISPLAY_AFFINITY): New.
	* testsuite/libgomp.c/allocator-1.c: New test.
	* testsuite/libgomp.c/allocator-2.c: New test.
	* testsuite/libgomp.c/allocator-3.c: New test.
	* testsuite/libgomp.c/allocator-4.c: New test.
	* testsuite/libgomp.c/allocator-5.c: New test.
	* testsuite/libgomp.c/allocator-6.c: New test.

 libgomp/env.c | 188 +++---
 libgomp/libgomp.texi  | 142 ++
 libgomp/testsuite/libgomp.c/allocator-1.c |  15 +++
 libgomp/testsuite/libgomp.c/allocator-2.c |  17 +++
 libgomp/testsuite/libgomp.c/allocator-3.c |  27 +
 libgomp/testsuite/libgomp.c/allocator-4.c |  15 +++
 libgomp/testsuite/libgomp.c/allocator-5.c |  15 +++
 libgomp/testsuite/libgomp.c/allocator-6.c |  15 +++
 8 files changed, 420 insertions(+), 14 deletions(-)

diff --git a/libgomp/env.c b/libgomp/env.c
index 25c0211dda1..f24484d7f70 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -112,6 +112,7 @@ unsigned long gomp_bind_var_list_len;
 void **gomp_places_list;
 unsigned long gomp_places_list_len;
 uintptr_t gomp_def_allocator = omp_default_mem_alloc;
+char *gomp_def_allocator_envvar = NULL;
 int gomp_debug_var;
 unsigned int gomp_num_teams_var;
 int gomp_nteams_var;
@@ -1233,8 +1234,12 @@ parse_affinity (bool ignore)
 static bool
 parse_allocator (const char *env, const char *val, void *const params[])
 {
+  const char *orig_val = val;
   uintptr_t *ret = (uintptr_t *) params[0];
   *ret = omp_default_mem_alloc;
+  bool memspace = false;
+  size_t ntraits = 0;
+  omp_alloctrait_t *traits;
 
   if (val == NULL)
 return false;
@@ -1243,28 +1248,169 @@ parse_allocator (const char *env, const char *val, void *const params[])
 ++val;
   if (0)
 ;
-#define C(v) \
+#define C(v, m) \
   else if (strncasecmp (val, #v, sizeof (#v) - 1) == 0)	\
 {			\
   *ret = v;		\
   val += sizeof (#v) - 1;\
-}
-  C (omp_default_mem_alloc)
-  C (omp_large_cap_mem_alloc)
-  C (omp_const_mem_alloc)
-  C (omp_high_bw_mem_alloc)
-  C (omp_low_lat_mem_alloc)
-  C (omp_cgroup_mem_alloc)
-  C (omp_pteam_mem_alloc)
-  C (omp_thread_mem_alloc)
+  memspace = m;	\
+}
+  C (omp_default_mem_alloc, false)
+  C (omp_large_cap_mem_alloc, false)
+  C (omp_const_mem_alloc, false)
+  C (omp_high_bw_mem_alloc, false)
+  C (omp_low_lat_mem_alloc, false)
+  C (omp_cgroup_mem_alloc, false)
+  C (omp_pteam_mem_alloc, false)
+  C (omp_thread_mem_alloc, false)
+  C (omp_default_mem_space, true)
+  C (omp_large_cap_mem_space, true)
+  C (omp_const_mem_space, true)
+  C (omp_high_bw_mem_space, true)
+  C (omp_low_lat_mem_space, true)
 #undef C
   else
-val = "X";
+goto invalid;
+  if (memspace && *val == ':')
+{
+  ++val;
+  const char *cp = val;
+  while (*cp != '\0')
+	{
+	  if (*cp == '=')
+	++ntraits;
+	  ++cp;
+	}
+  traits = gomp_alloca (ntraits * sizeof (omp_alloctrait_t));
+  size_t n = 0;
+  while (*val != '\0')
+	{
+#define C(v) \
+	  else if (strncasecmp (val, #v "=", sizeof (#v)) == 0)	\
+	{			\
+	  val += sizeof (#v);\
+	  traits[n].key = omp_atk_ ## v;
+#define V(v) \
+	else if (strncasecmp (val, #v, sizeof (#v) - 1) == 0)	\
+	  {\
+		val += sizeof (#v) - 1;	\
+		traits[n].value = omp_atv_ ## v;			\
+	  }
+	  if (0)
+	;
+	  C (sync_hint)
+	  if (0)
+		;
+	  V (contended)
+	  V (uncontended)
+	  V (serialized)
+	  V (private)
+	  else
+		goto invalid;
+	}
+	  C (alignment)
+	  char *end;
+	  errno = 0;
+	  traits[n].value = strtol 

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote:
> On Wed, 14 Jun 2023, Jiufu Guo wrote:
> > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> > UNSPEC_TIE".
> >This avoids using BLK on unspec, but using DI.
> 
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Or DSE can even delete this tie, if it can see some later store to the
same location without anything in between that can read what the tie
stores.

BLKmode avoids all of this.  You can call that elegant, you can call it
cheating, you can call it many things -- but it *works*.

> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?

From rs6000.md:
; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"

and from rs6000-logue.cc:
/* This ties together stack memory (MEM with an alias set of frame_alias_set)
   and the change to the stack pointer.  */
static void
rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)

A big reason this is needed is because of all the hard frame pointer
stuff, which the generic parts of GCC require, but there is no register
for that in the Power architecture.  Nothing is an issue here in most
cases, but sometimes we need to do unusual things to the stack, say for
alloca.

> I'd say a
> 
>   (may_clobber (mem:BLK (reg:DI 1 1)))

"clobber" always means "may clobber".  (clobber X) means X is written
with some unspecified value, which may well be whatever value it
currently holds.  Via some magical means or whatever, there is no
mechanism specified, just the effects :-)

> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?

It is the same thing.  "clobber" means the same thing as "set", except
the value that is written is not specified.

> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).

"clobber" is nicer than the set to (const_int 0).  Does it work though?
All this code is always fragile :-/  I'm all for this change, don't get
me wrong, but preferably things stay in working order.

We use "stack_tie" as a last resort heavy hammer anyway, in all normal
cases we explain the actual data flow explicitly and correctly, also
between the various registers used in the *logues.


Segher


[PATCH] RISC-V: Add autovec FP unary operations.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.  vfrsqrt7 will be added at a later time.

Similarly to the binop tests, there are flavors for zvfh now.  Prerequisites
as before.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 ++-
 .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
 .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
 .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
 .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
 .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
 .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
 .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
 16 files changed, 241 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1c6d793cae0..72154400f1f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -498,7 +498,7 @@ (define_expand "2"
 })
 
 ;; -------------------------------------------------------------------------
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
 ;; -------------------------------------------------------------------------
 
 (define_expand "abs2"
@@ -517,6 +517,40 @@ (define_expand "abs2"
   DONE;
 })
 
+;; -------------------------------------------------------------------------
+;;  [FP] Unary operations
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; -------------------------------------------------------------------------
+(define_expand "2"
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop_nofrm:VF
+ (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; -------------------------------------------------------------------------
+;; - [FP] Square root
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfsqrt.v
+;; -------------------------------------------------------------------------
+(define_expand "2"
+  [(set (match_operand:VF 0 "register_operand")
+(any_float_unop:VF
+ (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_fp_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =
 ;; == Ternary arithmetic
 ;; 

[PATCH] RISC-V: Add autovec FP binary operations.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are amended and split up into non-_Float16
and _Float16 flavors as we cannot rely on the zvfh extension
being present.

As long as we do not have full middle-end support -ffast-math
is required for the tests.

In order to allow proper _Float16 support we need to disable
promotion to float.  This patch handles that similarly to
TARGET_ZFH and TARGET_ZINX which is not strictly accurate.
The zvfh extension only requires zfhmin on the scalar side
i.e. just conversion to float and no actual operations.

The *run tests rely on the testsuite changes sent earlier.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md (3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 +
 gcc/config/riscv/riscv-protos.h   |  5 +-
 gcc/config/riscv/riscv-v.cc   | 76 ++-
 gcc/config/riscv/riscv.cc | 27 +--
 .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
 .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
 .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 +
 .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
 .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 +
 .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
 .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
 .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
 .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 +
 .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
 

[PATCH 1/2] Missed opportunity to use [SU]ABD

2023-06-14 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This adds a recognition pattern for the non-widening
absolute difference (ABD).

gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs,
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 233 +-
 4 files changed, 217 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
6a435eb44610960513e9739ac9ac1e8a27182c10..e11b10d2fca11016232921bc85e47975f700e6c6
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5787,6 +5787,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = op1 > op2 ? op1 - op2 : op2 - op1;
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
3ac9d82aace322bd8ef108596e5583daa18c76e3..116965f4830cec8f60642ff011a86b6562e2c509
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -191,6 +191,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
6c064ff4993620067d38742a0bfe0a3efb511069..35b835a6ac56d72417dac8ddfd77a8a7e2475e65
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
"mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
dc102c919352a0328cf86eabceb3a38c41a7e4fd..e2392113bff4065c909aefc760b4c48978b73a5a
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -782,6 +782,83 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+   X = x[i]
+   Y = y[i]
+   DIFF = X - Y
+   DAD = ABS_EXPR <DIFF>
+
+   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
+   HALF_TYPE and UNPROM will be set should the statement be found to
+   be a widened operation.
+   DIFF_STMT will be set to the MINUS_EXPR
+   statement that precedes the ABS_STMT unless vect_widened_op_tree
+   succeeds.
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+   tree *half_type,
+   vect_unpromoted_value unprom[2],
+   gassign **diff_stmt)
+{
+  if (!abs_stmt)
+return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+ inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  if (!abs_oprnd)
+    return false;
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (!ANY_INTEGRAL_TYPE_P (abs_type)
+  || TYPE_OVERFLOW_WRAPS (abs_type)
+  || TYPE_UNSIGNED (abs_type))
+return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+ changes (e.g. from an unsigned subtraction to a signed ABS input)
+ or signed promotion, but it can't include unsigned promotion.
+ (Note that ABS of an unsigned promotion should have been folded
+ away before now anyway.)  */
+  vect_unpromoted_value 

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:45:48PM +0200, Uros Bizjak wrote:
> +;; Helper peephole2 for the addcarry and subborrow
> +;; peephole2s, to optimize away nop which resulted from uaddc/usubc
> +;; expansion optimization.
> +(define_peephole2
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (match_operand:SWI48 1 "memory_operand"))
> +   (const_int 0)]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))])
> 
> Is this (const_int 0) from a recent patch from Roger that introduced:

The first one I see is the one immediately above that:
;; Pre-reload splitter to optimize
;; *setcc_qi followed by *addqi3_cconly_overflow_1 with the same QI
;; operand and no intervening flags modifications into nothing.
(define_insn_and_split "*setcc_qi_addqi3_cconly_overflow_1_"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC (neg:QI (geu:QI (reg:CC_CCC FLAGS_REG) (const_int 0)))
		     (ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(const_int 0)])

And you're right, the following incremental patch (I'd integrate it
into the full patch with
(*setcc_qi_addqi3_cconly_overflow_1_, *setccc,
*setcc_qi_negqi_ccc_1_, *setcc_qi_negqi_ccc_2_): Split
into NOTE_INSN_DELETED note rather than nop instruction.
added to ChangeLog) passes all the new tests as well:

--- gcc/config/i386/i386.md 2023-06-14 12:21:38.668657604 +0200
+++ gcc/config/i386/i386.md 2023-06-14 17:12:31.742625193 +0200
@@ -7990,16 +7990,6 @@
(set_attr "pent_pair" "pu")
(set_attr "mode" "")])
 
-;; Helper peephole2 for the addcarry and subborrow
-;; peephole2s, to optimize away nop which resulted from uaddc/usubc
-;; expansion optimization.
-(define_peephole2
-  [(set (match_operand:SWI48 0 "general_reg_operand")
-   (match_operand:SWI48 1 "memory_operand"))
-   (const_int 0)]
-  ""
-  [(set (match_dup 0) (match_dup 1))])
-
 (define_peephole2
   [(parallel [(set (reg:CCC FLAGS_REG)
   (compare:CCC
@@ -8641,7 +8631,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setccc"
@@ -8650,7 +8641,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setcc_qi_negqi_ccc_1_"
@@ -8659,7 +8651,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Set the carry flag from the carry flag.
 (define_insn_and_split "*setcc_qi_negqi_ccc_2_"
@@ -8669,7 +8662,8 @@
   "ix86_pre_reload_split ()"
   "#"
   "&& 1"
-  [(const_int 0)])
+  [(const_int 0)]
+  "emit_note (NOTE_INSN_DELETED); DONE;")
 
 ;; Overflow setting add instructions
 

Jakub



Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> I'm also thinking about other solutions:
> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>   This is the existing pattern.  It may be read as an action
>   to clean an unknown-size memory block.

Including a size zero memory block, yes.  BLKmode was originally to do
things like bcopy (before modern names like memcpy were more usually
used), and those very much need size zero as well.

> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> UNSPEC_TIE".
>   Current patch is using this one.

What would be the semantics of that?  Just the same as the current stuff
I'd say, or less?  It cannot be more!

> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> UNSPEC_TIE".
>This avoids using BLK on unspec, but using DI.

And is incorrect because of that.

> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
> UNSPEC_TIE"
>There is still a mode for the unspec.

It has VOIDmode here, which is incorrect.

> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE
> >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
> >
> > This makes it required that the operand of an UNSPEC_TIE unspec is a
> > const_int 0.  This should be documented somewhere.  Ideally you would
> > want no operand at all here, but every unspec has an operand.
> 
> Right!  Since we checked UNSPEC_TIE already, we may not need to check
> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".

Yes.  But we should write down somewhere (in a comment near the unspec
constant def for example) what the operand is -- so, "operand is usually
(const_int 0) because we have to put *something* there" or such.  The
clumsiness of this is enough for me to prefer some other solution
already ;-)


Segher


Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool
Hi!

On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote:
> The generic issue here is to fix (not "papering over") the signed
> overflow, we need to perform the addition in a target machine mode.  We
> may always use Pmode (IIRC const_anchor was introduced for optimizing
> some constant addresses), but can we do better?

The main issue is that the machine description generated target code to
compute some constants, but the sanitizer treats it as if it was user
code that might do wrong things.

> Should we try addition in both DImode and SImode for a 64-bit capable
> machine?

Why?  At least on PowerPC there is only one insn, and it is 64 bits.
The SImode version just ignores all bits other than the low 32 bits, in
both inputs and output.

> Or should we even try more operations than addition (for eg bit
> operations like xor or shift)?  Doing so will need to create a new
> target hook for const anchoring, this is the "complete rework" I meant.

This might make const anchor useful for way more targets maybe,
including rs6000, yes :-)


Segher


Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:56 PM Jakub Jelinek  wrote:
>
> On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> > LGTM for the x86 part. I did my best, but those peephole2 patterns are
> > real PITA to be reviewed thoroughly.
> >
> > Maybe split out peephole2 pack to a separate patch, followed by a
> > testcase patch. This way, bisection would be able to point out if a
> > generic part or target-dependent part caused eventual regression.
>
> Ok.  Guess if it helps for bisection, I could even split the peephole2s
> to one peephole2 addition per commit and then the final patch would add the
> expanders and the generic code.

I don't think it is necessary to split it too much. Peephole2s can be
tricky, but if there is something wrong, it is easy to figure out
which one is problematic.

Uros.


Re: [PATCH] RISC-V: Use merge approach to optimize vector permutation

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

the general method seems sane and useful (it's not very complicated).
I was just distracted by

> Selector = { 0, 17, 2, 19, 4, 21, 6, 23, 8, 9, 10, 27, 12, 29, 14, 31 }, the 
> common expression:
> { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }
> 
> For this selector, we can use vmsltu + vmerge to optimize the codegen.

because it's actually { 0, nunits + 1, 2, nunits + 3, ... } or maybe
{ 0, nunits, 0, nunits, ... } + { 0, 1, 2, 3, ..., nunits - 1 }.

Because of the ascending/monotonic? selector structure we can use vmerge
instead of vrgather.

> +/* Recognize the patterns that we can use merge operation to shuffle the
> +   vectors. The value of Each element (index i) in selector can only be
> +   either i or nunits + i.
> +
> +   E.g.
> +   v = VEC_PERM_EXPR (v0, v1, selector),
> +   selector = { 0, nunits + 1, 1, nunits + 2, 2, nunits + 3, ...  }

Same.

> +
> +   We can transform such pattern into:
> +
> +   v = vcond_mask (v0, v1, mask),
> +   mask = { 0, 1, 0, 1, 0, 1, ... }.  */
> +
> +static bool
> +shuffle_merge_patterns (struct expand_vec_perm_d *d)
> +{
> +  machine_mode vmode = d->vmode;
> +  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
> +  int n_patterns = d->perm.encoding ().npatterns ();
> +  poly_int64 vec_len = d->perm.length ();
> +
> +  for (int i = 0; i < n_patterns; ++i)
> +if (!known_eq (d->perm[i], i) && !known_eq (d->perm[i], vec_len + i))
> +  return false;
> +
> +  for (int i = n_patterns; i < n_patterns * 2; i++)
> +if (!d->perm.series_p (i, n_patterns, i, n_patterns)
> + && !d->perm.series_p (i, n_patterns, vec_len + i, n_patterns))
> +  return false;

Maybe add a comment that we check that the pattern is actually monotonic,
or however you prefer to call it?

I didn't go through all tests in detail but skimmed several.  All in all
looks good to me.

Regards
 Robin



Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> LGTM for the x86 part. I did my best, but those peephole2 patterns are
> real PITA to be reviewed thoroughly.
> 
> Maybe split out peephole2 pack to a separate patch, followed by a
> testcase patch. This way, bisection would be able to point out if a
> generic part or target-dependent part caused eventual regression.

Ok.  Guess if it helps for bisection, I could even split the peephole2s
to one peephole2 addition per commit and then the final patch would add the
expanders and the generic code.

Jakub



Re: [RFC] Add stdckdint.h header for C23

2023-06-14 Thread Joseph Myers
On Tue, 13 Jun 2023, Paul Eggert wrote:

> > There is always the possibility to have the header co-owned by both
> > the compiler and C library, limits.h style.
> > Just
> > #if __has_include_next()
> > # include_next 
> > #endif
> 
> I don't see how you could implement __has_include_next() for
> arbitrary non-GCC compilers, which is what we'd need for glibc users. For
> glibc internals we can use "#include_next" more readily, since we assume a
> new-enough GCC. I.e. we could do something like this:

Given the possibility of library functions being included in  
in future standard versions, it seems important to look at ways of 
splitting responsibility for the header between the compiler and library, 
whether with __has_include_next, or compiler version conditionals, or some 
other such variation.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now?  The
> > split out functions help somewhat but the code is far from obvious :/
> >
> > Maybe I'm confused by the loops and instead of those sth like
> >
> >  if (match_x_y_z (op0)
> >  || match_x_y_z (op1))
> >...
> >
> > would be easier to follow with the loop bodies split out?
> > Maybe put just put them in lambdas even?
> >
> > I guess you'll be around as long as myself so we can go with
> > this code under the premise you're going to maintain it - it's
> > not that I'm writing trivially to understand code myself ...
>
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
>
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
>
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
>
> Ok for trunk like this if it passes bootstrap/regtest?
>
> 2023-06-13  Jakub Jelinek  
>
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * fold-const-call.cc (fold_const_call): Likewise.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc5, usubc5): Document new named
> patterns.
> * config/i386/i386.md (subborrow): Add alternative with
> memory destination.
> (uaddc5, usubc5): New define_expand patterns.
> (*sub_3, @add3_carry, addcarry, @sub3_carry,
> subborrow, *add3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.

+;; Helper peephole2 for the addcarry and subborrow
+;; peephole2s, to optimize away nop which resulted from uaddc/usubc
+;; expansion optimization.
+(define_peephole2
+  [(set (match_operand:SWI48 0 "general_reg_operand")
+   (match_operand:SWI48 1 "memory_operand"))
+   (const_int 0)]
+  ""
+  [(set (match_dup 0) (match_dup 1))])

Is this (const_int 0) from a recent patch from Roger that introduced:

+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setccc"
+  [(set (reg:CCC FLAGS_REG)
+ (reg:CCC FLAGS_REG))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_1_"
+  [(set (reg:CCC FLAGS_REG)
+ (ltu:CCC (reg:CC_CCC FLAGS_REG) (const_int 0)))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
+;; Set the carry flag from the carry flag.
+(define_insn_and_split "*setcc_qi_negqi_ccc_2_"
+  [(set (reg:CCC FLAGS_REG)
+ (unspec:CCC [(ltu:QI (reg:CC_CCC FLAGS_REG) (const_int 0))
+ (const_int 0)] UNSPEC_CC_NE))]
+  "ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)])

If this interferes with RTL stream, then instead of emitting
(const_int 0), the above patterns should simply emit:

{
  emit_note (NOTE_INSN_DELETED);
  DONE;
}

And there will be no (const_int 0) in the RTL stream.

Uros.
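Concretely, the suggestion above would turn the first of those splitters into something like this (an untested sketch of the idea, not the committed pattern):

```lisp
;; Split to nothing without leaving a (const_int 0) in the RTL stream:
;; the preparation statement emits a deleted-insn note and DONEs, so the
;; replacement pattern vector is never used.
(define_insn_and_split "*setccc"
  [(set (reg:CCC FLAGS_REG)
	(reg:CCC FLAGS_REG))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(const_int 0)]
{
  emit_note (NOTE_INSN_DELETED);
  DONE;
})
```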


Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now?  The
> > split out functions help somewhat but the code is far from obvious :/
> >
> > Maybe I'm confused by the loops and instead of those sth like
> >
> >  if (match_x_y_z (op0)
> >  || match_x_y_z (op1))
> >...
> >
> > would be easier to follow with the loop bodies split out?
> > Maybe put just put them in lambdas even?
> >
> > I guess you'll be around as long as myself so we can go with
> > this code under the premise you're going to maintain it - it's
> > not that I'm writing trivially to understand code myself ...
>
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
>
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
>
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
>
> Ok for trunk like this if it passes bootstrap/regtest?
>
> 2023-06-13  Jakub Jelinek  
>
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * fold-const-call.cc (fold_const_call): Likewise.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc5, usubc5): Document new named
> patterns.
> * config/i386/i386.md (subborrow): Add alternative with
> memory destination.
> (uaddc5, usubc5): New define_expand patterns.
> (*sub_3, @add3_carry, addcarry, @sub3_carry,
> subborrow, *add3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.

LGTM for the x86 part. I did my best, but those peephole2 patterns are
real PITA to be reviewed thoroughly.

Maybe split out peephole2 pack to a separate patch, followed by a
testcase patch. This way, bisection would be able to point out if a
generic part or target-dependent part caused eventual regression.

Thanks,
Uros.

>
> --- gcc/internal-fn.def.jj  2023-06-13 18:23:37.208793152 +0200
> +++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
>  DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj   2023-06-13 18:23:37.206793179 +0200
> +++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
>expand_arith_overflow (MULT_EXPR, stmt);
>  }
>
> +/* Expand UADDC STMT.  */
> +
> +static void
> +expand_UADDC 

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Richard Biener via Gcc-patches



> Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
> 
> On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
>> This implements fully masked vectorization or a masked epilog for
>> AVX512 style masks which single themselves out by representing
>> each lane with a single bit and by using integer modes for the mask
>> (both is much like GCN).
>> AVX512 is also special in that it doesn't have any instruction
>> to compute the mask from a scalar IV like SVE has with while_ult.
>> Instead the masks are produced by vector compares and the loop
>> control retains the scalar IV (mainly to avoid dependences on
>> mask generation, a suitable mask test instruction is available).
> 
> This also sounds like GCN. We currently use WHILE_ULT in the middle end 
> which expands to a vector compare against a vector of stepped values. This 
> requires an additional instruction to prepare the comparison vector (compared 
> to SVE), but the "while_ultv64sidi" pattern (for example) returns the DImode 
> bitmask, so it works reasonably well.
> 
>> Like RVV code generation prefers a decrementing IV though IVOPTs
>> messes things up in some cases removing that IV to eliminate
>> it with an incrementing one used for address generation.
>> One of the motivating testcases is from PR108410 which in turn
>> is extracted from x264 where large size vectorization shows
>> issues with small trip loops.  Execution time there improves
>> compared to classic AVX512 with AVX2 epilogues for the cases
>> of less than 32 iterations.
>> size   scalar    128    256    512   512e   512f
>>  1      9.42   11.32   9.35  11.17  15.13  16.89
>>  2      5.72    6.53   6.66   6.66   7.62   8.56
>>  3      4.49    5.10   5.10   5.74   5.08   5.73
>>  4      4.10    4.33   4.29   5.21   3.79   4.25
>>  6      3.78    3.85   3.86   4.76   2.54   2.85
>>  8      3.64    1.89   3.76   4.50   1.92   2.16
>> 12      3.56    2.21   3.75   4.26   1.26   1.42
>> 16      3.36    0.83   1.06   4.16   0.95   1.07
>> 20      3.39    1.42   1.33   4.07   0.75   0.85
>> 24      3.23    0.66   1.72   4.22   0.62   0.70
>> 28      3.18    1.09   2.04   4.20   0.54   0.61
>> 32      3.16    0.47   0.41   0.41   0.47   0.53
>> 34      3.16    0.67   0.61   0.56   0.44   0.50
>> 38      3.19    0.95   0.95   0.82   0.40   0.45
>> 42      3.09    0.58   1.21   1.13   0.36   0.40
>> 'size' specifies the number of actual iterations, 512e is for
>> a masked epilog and 512f for the fully masked loop.  From
>> 4 scalar iterations on the AVX512 masked epilog code is clearly
>> the winner, the fully masked variant is clearly worse and
>> its size benefit is also tiny.
> 
> Let me check I understand correctly. In the fully masked case, there is a 
> single loop in which a new mask is generated at the start of each iteration. 
> In the masked epilogue case, the main loop uses no masking whatsoever, thus 
> avoiding the need for generating a mask, carrying the mask, inserting 
> vec_merge operations, etc, and then the epilogue looks much like the fully 
> masked case, but unlike smaller mode epilogues there is no loop because the 
> epilogue vector size is the same. Is that right?

Yes.

> This scheme seems like it might also benefit GCN, in so much as it simplifies 
> the hot code path.
> 
> GCN does not actually have smaller vector sizes, so there's no analogue to 
> AVX2 (we pretend we have some smaller sizes, but that's because the middle 
> end can't do masking everywhere yet, and it helps make some vector constants 
> smaller, perhaps).
> 
>> This patch does not enable using fully masked loops or
>> masked epilogues by default.  More work on cost modeling
>> and vectorization kind selection on x86_64 is necessary
>> for this.
>> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
>> which could be exploited further to unify some of the flags
>> we have right now but there didn't seem to be many easy things
>> to merge, so I'm leaving this for followups.
>> Mask requirements as registered by vect_record_loop_mask are kept in their
>> original form and recorded in a hash_set now instead of being
>> processed to a vector of rgroup_controls.  Instead that's now
>> left to the final analysis phase which tries forming the rgroup_controls
>> vector using while_ult and if that fails now tries AVX512 style
>> which needs a different organization and instead fills a hash_map
>> with the relevant info.  vect_get_loop_mask now has two implementations,
>> one for the two mask styles we then have.
>> I have decided against interweaving vect_set_loop_condition_partial_vectors
>> with conditions to do AVX512 style masking and instead opted to
>> "duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
>> Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
>> I was split between making 'vec_loop_masks' a class with methods,
>> possibly merging in the _len 

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches



> Am 14.06.2023 um 16:00 schrieb Jakub Jelinek :
> 
> Hi!
> 
>> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
>> At this point two pages of code without a comment - can you introduce
>> some vertical spacing and comments as to what is matched now?  The
>> split out functions help somewhat but the code is far from obvious :/
>> 
>> Maybe I'm confused by the loops and instead of those sth like
>> 
>> if (match_x_y_z (op0)
>> || match_x_y_z (op1))
>>   ...
>> 
>> would be easier to follow with the loop bodies split out?
>> Maybe put just put them in lambdas even?
>> 
>> I guess you'll be around as long as myself so we can go with
>> this code under the premise you're going to maintain it - it's
>> not that I'm writing trivially to understand code myself ...
> 
> As I said on IRC, I don't really know how to split that into further
> functions, the problem is that we need to pattern match a lot of
> statements and it is hard to come up with names for each of them.
> And we need quite a lot of variables for checking their interactions.
> 
> The code isn't that much different from say match_arith_overflow or
> optimize_spaceship or other larger pattern recognizers.  And the
> intent is that all the code paths in the recognizer are actually covered
> by the testcases in the testsuite.
> 
> That said, I've added 18 new comments to the function, and rebased it
> on top of the
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
> patch with all constant arguments handling moved to fold-const-call.cc
> even for the new ifns.
> 
> Ok for trunk like this if it passes bootstrap/regtest?

Ok.

Thanks,
Richard 

> 2023-06-13  Jakub Jelinek  
> 
>PR middle-end/79173
>* internal-fn.def (UADDC, USUBC): New internal functions.
>* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
>(commutative_ternary_fn_p): Return true also for IFN_UADDC.
>* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
>* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
>match_uaddc_usubc): New functions.
>(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
>for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
>other optimizations have been successful for those.
>* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
>* fold-const-call.cc (fold_const_call): Likewise.
>* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
>* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
>* doc/md.texi (uaddc5, usubc5): Document new named
>patterns.
>* config/i386/i386.md (subborrow): Add alternative with
>memory destination.
>(uaddc5, usubc5): New define_expand patterns.
>(*sub_3, @add3_carry, addcarry, @sub3_carry,
>subborrow, *add3_cc_overflow_1): Add define_peephole2
>TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
>destination in these patterns.
> 
>* gcc.target/i386/pr79173-1.c: New test.
>* gcc.target/i386/pr79173-2.c: New test.
>* gcc.target/i386/pr79173-3.c: New test.
>* gcc.target/i386/pr79173-4.c: New test.
>* gcc.target/i386/pr79173-5.c: New test.
>* gcc.target/i386/pr79173-6.c: New test.
>* gcc.target/i386/pr79173-7.c: New test.
>* gcc.target/i386/pr79173-8.c: New test.
>* gcc.target/i386/pr79173-9.c: New test.
>* gcc.target/i386/pr79173-10.c: New test.
> 
> --- gcc/internal-fn.def.jj2023-06-13 18:23:37.208793152 +0200
> +++ gcc/internal-fn.def2023-06-14 12:21:38.650657857 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
> DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj2023-06-13 18:23:37.206793179 +0200
> +++ gcc/internal-fn.cc2023-06-14 12:21:38.652657829 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
>   expand_arith_overflow (MULT_EXPR, stmt);
> }
> 
> +/* Expand UADDC STMT.  */
> +
> +static void
> +expand_UADDC (internal_fn ifn, gcall *stmt)
> +{
> +  tree lhs = gimple_call_lhs (stmt);
> +  tree arg1 = gimple_call_arg (stmt, 0);
> +  tree arg2 = gimple_call_arg (stmt, 1);
> +  tree arg3 = gimple_call_arg (stmt, 2);
> +  tree type = TREE_TYPE (arg1);
> +  machine_mode mode = TYPE_MODE (type);
> +  insn_code icode = optab_handler (ifn == IFN_UADDC
> +   ? uaddc5_optab : usubc5_optab, mode);
> +  rtx op1 = expand_normal (arg1);
> +  rtx op2 = expand_normal (arg2);
> +  rtx op3 = 

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Andrew Stubbs

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle 
end which expands to a vector compare against a vector of stepped 
values. This requires an additional instruction to prepare the 
comparison vector (compared to SVE), but the "while_ultv64sidi" pattern 
(for example) returns the DImode bitmask, so it works reasonably well.



Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size   scalar    128    256    512   512e   512f
 1      9.42   11.32   9.35  11.17  15.13  16.89
 2      5.72    6.53   6.66   6.66   7.62   8.56
 3      4.49    5.10   5.10   5.74   5.08   5.73
 4      4.10    4.33   4.29   5.21   3.79   4.25
 6      3.78    3.85   3.86   4.76   2.54   2.85
 8      3.64    1.89   3.76   4.50   1.92   2.16
12      3.56    2.21   3.75   4.26   1.26   1.42
16      3.36    0.83   1.06   4.16   0.95   1.07
20      3.39    1.42   1.33   4.07   0.75   0.85
24      3.23    0.66   1.72   4.22   0.62   0.70
28      3.18    1.09   2.04   4.20   0.54   0.61
32      3.16    0.47   0.41   0.41   0.47   0.53
34      3.16    0.67   0.61   0.56   0.44   0.50
38      3.19    0.95   0.95   0.82   0.40   0.45
42      3.09    0.58   1.21   1.13   0.36   0.40

'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on the AVX512 masked epilog code is clearly
the winner, the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is 
a single loop in which a new mask is generated at the start of each 
iteration. In the masked epilogue case, the main loop uses no masking 
whatsoever, thus avoiding the need for generating a mask, carrying the 
mask, inserting vec_merge operations, etc, and then the epilogue looks 
much like the fully masked case, but unlike smaller mode epilogues there 
is no loop because the epilogue vector size is the same. Is that right?


This scheme seems like it might also benefit GCN, in so much as it 
simplifies the hot code path.


GCN does not actually have smaller vector sizes, so there's no analogue 
to AVX2 (we pretend we have some smaller sizes, but that's because the 
middle end can't do masking everywhere yet, and it helps make some 
vector constants smaller, perhaps).



This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

I was split between making 'vec_loop_masks' a class with methods,
possibly merging in the _len stuff into a single registry.  It
seemed to be too many changes for the purpose of getting AVX512
working.  I'm going to play wait and see what happens with RISC-V
here since they are going to get both masks and lengths registered
I think.

The 

[PATCH] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-06-14 Thread Robin Dapp via Gcc-patches
Hi,

this introduces new checks for run tests.  Currently we have
riscv_vector as well as rv32 and rv64 which all check if GCC (with the
current configuration) can build (not execute) the respective tests.

Many tests specify e.g. a different -march for vector, though.  So the
check fails even though we could build as well as run the tests (i.e.
when qemu and binfmt are set up properly).

The new vector_hw now tries to compile, link and execute a simple
vector example.  If this succeeds the respective test can run.

Similarly we introduce a zvfh_hw check which will be used in the
upcoming floating-point unop/binop tests as well as rv32_hw and
rv64_hw checks that are currently unused.

I have requested feedback from some of you individually already and
would kindly ask for feedback if this works for folks (or already does
without doing anything?).
With my current gcc configuration (e.g. --target=riscv64-unknown-linux-gnu
--with-sysroot) the riscv_vector check fails and consequently
no vector test is run (UNSUPPORTED).  With the new riscv_vector_hw
check everything runs on my machine.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Use
riscv_vector_hw.
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: Dito.
* gcc.target/riscv/rvv/autovec/series_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Dito.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/full-vec-move1-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c:
Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Dito.
* 

Re: [PATCH] libstdc++: Clarify manual demangle doc

2023-06-14 Thread Jonathan Wakely via Gcc-patches
On Sat, 10 Jun 2023 at 23:04, Jonny Grant wrote:
>
> libstdc++-v3/ChangeLog:
>
> * doc/xml/manual/extensions.xml: Remove demangle exception 
> description and include.

Thanks, pushed to trunk.

>
> ---
>  libstdc++-v3/doc/xml/manual/extensions.xml | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/doc/xml/manual/extensions.xml 
> b/libstdc++-v3/doc/xml/manual/extensions.xml
> index daa98f5cba7..d4fe2f509d4 100644
> --- a/libstdc++-v3/doc/xml/manual/extensions.xml
> +++ b/libstdc++-v3/doc/xml/manual/extensions.xml
> @@ -514,12 +514,10 @@ get_temporary_buffer(5, (int*)0);
>  you won't notice.)
>
>
> -Probably the only times you'll be interested in demangling at runtime
> -are when you're seeing typeid strings in RTTI, or when
> -you're handling the runtime-support exception classes.  For example:
> +Probably the only time you'll be interested in demangling at runtime
> +is when you're seeing typeid strings in RTTI.  For example:
>
> 
> -#include <exception>
>  #include <iostream>
>  #include <cstdlib>
>  #include <cxxabi.h>
> --
> 2.37.2
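Beyond the doc change, the runtime demangling the section describes can be exercised with a short sketch; the wrapper name `demangle` is purely illustrative, only `abi::__cxa_demangle` itself comes from the library:

```cpp
#include <cstdlib>
#include <string>
#include <typeinfo>
#include <cxxabi.h>

// Minimal wrapper around abi::__cxa_demangle: returns the demangled
// name on success, or the mangled input unchanged on failure.
static std::string demangle(const char *mangled)
{
  int status = 0;
  char *p = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || p == nullptr)
    return mangled;
  std::string result(p);
  std::free(p);   // __cxa_demangle malloc()s the output buffer
  return result;
}
```

For example, `demangle(typeid(obj).name())` turns an RTTI string such as `3Foo` back into `Foo`.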


[PATCH] c++: tweak c++17 ctor/conversion tiebreaker [DR2327]

2023-06-14 Thread Jason Merrill via Gcc-patches
In discussion of this issue CWG decided that the change of behavior on
well-formed code like overload-conv-4.C is undesirable.  In further
discussion of possible resolutions, we discovered that we can avoid that
change while still getting the desired behavior on overload-conv-3.C by
making this a tiebreaker after comparing conversions, rather than before.
This also simplifies the implementation.

The issue resolution has not yet been finalized, but this seems like a clear
improvement.
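The situation at issue can be sketched as follows; this is a hedged reduction in the spirit of the overload-conv testcases, not the exact testsuite code:

```cpp
// Direct-initializing Cat from Dog: formally only constructors are
// candidates, but Dog::operator Cat() returns a prvalue that the copy
// constructor would merely copy, so preferring the conversion lets the
// copy be elided.  Under DR2327 this preference becomes a tiebreaker
// applied after comparing conversions.
static int conversions = 0;

struct Cat { Cat() = default; };

struct Dog {
  operator Cat() { ++conversions; return Cat{}; }
};

Cat make_cat()
{
  Dog d;
  return Cat(d);   // direct-initialization; DR2327 territory
}
```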

DR 2327

gcc/cp/ChangeLog:

* call.cc (joust_maybe_elide_copy): Don't change cand.
(joust): Move the elided tiebreaker later.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/overload-conv-4.C: Remove warnings.
* g++.dg/cpp1z/elide7.C: New test.
---
 gcc/cp/call.cc   | 56 
 gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C |  5 +-
 gcc/testsuite/g++.dg/cpp1z/elide7.C  | 14 +
 3 files changed, 39 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/elide7.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 68cf878308e..15a3d6f2a1f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -12560,11 +12560,11 @@ add_warning (struct z_candidate *winner, struct 
z_candidate *loser)
 }
 
 /* CAND is a constructor candidate in joust in C++17 and up.  If it copies a
-   prvalue returned from a conversion function, replace CAND with the candidate
-   for the conversion and return true.  Otherwise, return false.  */
+   prvalue returned from a conversion function, return true.  Otherwise, return
+   false.  */
 
 static bool
-joust_maybe_elide_copy (z_candidate *&cand)
+joust_maybe_elide_copy (z_candidate *cand)
 {
   tree fn = cand->fn;
   if (!DECL_COPY_CONSTRUCTOR_P (fn) && !DECL_MOVE_CONSTRUCTOR_P (fn))
@@ -12580,10 +12580,7 @@ joust_maybe_elide_copy (z_candidate *&cand)
   (conv->type, DECL_CONTEXT (fn)));
   z_candidate *uc = conv->cand;
   if (DECL_CONV_FN_P (uc->fn))
-   {
- cand = uc;
- return true;
-   }
+   return true;
 }
   return false;
 }
@@ -12735,27 +12732,6 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
}
 }
 
-  /* Handle C++17 copy elision in [over.match.ctor] (direct-init) context.  The
- standard currently says that only constructors are candidates, but if one
- copies a prvalue returned by a conversion function we want to treat the
- conversion as the candidate instead.
-
- Clang does something similar, as discussed at
- http://lists.isocpp.org/core/2017/10/3166.php
- http://lists.isocpp.org/core/2019/03/5721.php  */
-  int elided_tiebreaker = 0;
-  if (len == 1 && cxx_dialect >= cxx17
-  && DECL_P (cand1->fn)
-  && DECL_COMPLETE_CONSTRUCTOR_P (cand1->fn)
-  && !(cand1->flags & LOOKUP_ONLYCONVERTING))
-{
-  bool elided1 = joust_maybe_elide_copy (cand1);
-  bool elided2 = joust_maybe_elide_copy (cand2);
-  /* As a tiebreaker below we will prefer a constructor to a conversion
-operator exposed this way.  */
-  elided_tiebreaker = elided2 - elided1;
-}
-
   for (i = 0; i < len; ++i)
 {
   conversion *t1 = cand1->convs[i + off1];
@@ -12917,11 +12893,6 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
   if (winner)
 return winner;
 
-  /* Put this tiebreaker first, so that we don't try to look at second_conv of
- a constructor candidate that doesn't have one.  */
-  if (elided_tiebreaker)
-return elided_tiebreaker;
-
   /* DR 495 moved this tiebreaker above the template ones.  */
   /* or, if not that,
  the  context  is  an  initialization by user-defined conversion (see
@@ -12958,6 +12929,25 @@ joust (struct z_candidate *cand1, struct z_candidate 
*cand2, bool warn,
   }
   }
 
+  /* DR2327: C++17 copy elision in [over.match.ctor] (direct-init) context.
+ The standard currently says that only constructors are candidates, but if
+ one copies a prvalue returned by a conversion function we prefer that.
+
+ Clang does something similar, as discussed at
+ http://lists.isocpp.org/core/2017/10/3166.php
+ http://lists.isocpp.org/core/2019/03/5721.php  */
+  if (len == 1 && cxx_dialect >= cxx17
+  && DECL_P (cand1->fn)
+  && DECL_COMPLETE_CONSTRUCTOR_P (cand1->fn)
+  && !(cand1->flags & LOOKUP_ONLYCONVERTING))
+{
+  bool elided1 = joust_maybe_elide_copy (cand1);
+  bool elided2 = joust_maybe_elide_copy (cand2);
+  winner = elided1 - elided2;
+  if (winner)
+   return winner;
+}
+
   /* or, if not that,
  F1 is a non-template function and F2 is a template function
  specialization.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C 
b/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
index 6fcdbbaa6a4..d2663e6cb20 100644
--- a/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
+++ b/gcc/testsuite/g++.dg/cpp0x/overload-conv-4.C
@@ -17,7 

[PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> At this point two pages of code without a comment - can you introduce
> some vertical spacing and comments as to what is matched now?  The
> split out functions help somewhat but the code is far from obvious :/
> 
> Maybe I'm confused by the loops and instead of those sth like
> 
>  if (match_x_y_z (op0)
>  || match_x_y_z (op1))
>...
> 
> would be easier to follow with the loop bodies split out?
> Maybe just put them in lambdas even?
> 
> I guess you'll be around as long as myself so we can go with
> this code under the premise you're going to maintain it - it's
> not that I'm writing trivially to understand code myself ...

As I said on IRC, I don't really know how to split that into further
functions; the problem is that we need to pattern match a lot of
statements and it is hard to come up with names for each of them.
And we need quite a lot of variables for checking their interactions.

The code isn't that much different from say match_arith_overflow or
optimize_spaceship or other larger pattern recognizers.  And the
intent is that all the code paths in the recognizer are actually covered
by the testcases in the testsuite.

That said, I've added 18 new comments to the function, and rebased it
on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621717.html
patch with all constant arguments handling moved to fold-const-call.cc
even for the new ifns.

Ok for trunk like this if it passes bootstrap/regtest?

2023-06-13  Jakub Jelinek  

PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* fold-const-call.cc (fold_const_call): Likewise.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (subborrow<mode>): Add alternative with
memory destination.
(uaddc<mode>5, usubc<mode>5): New define_expand patterns.
(*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.

* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
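For reference, the source-level shape the new recognizer is meant to catch, as in the pr79173 tests, looks roughly like the following sketch (the helper name is illustrative):

```cpp
#include <cstdint>

// One limb of a wide addition: r = x + y + carry_in, with the outgoing
// carry stored in *carry_out.  The pair of __builtin_add_overflow calls
// with their carries summed is roughly the pattern match_uaddc_usubc is
// meant to recognize and turn into a single IFN_UADDC call.
static uint64_t add_carry(uint64_t x, uint64_t y,
                          uint64_t carry_in, uint64_t *carry_out)
{
  uint64_t r;
  uint64_t c1 = __builtin_add_overflow(x, y, &r);
  uint64_t c2 = __builtin_add_overflow(r, carry_in, &r);
  *carry_out = c1 + c2;   // at most one of the two additions can carry
  return r;
}
```

On x86 the goal is for such a chain to compile down to an add followed by adc instructions rather than separate compare-and-add sequences.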

--- gcc/internal-fn.def.jj  2023-06-13 18:23:37.208793152 +0200
+++ gcc/internal-fn.def 2023-06-14 12:21:38.650657857 +0200
@@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
 DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj   2023-06-13 18:23:37.206793179 +0200
+++ gcc/internal-fn.cc  2023-06-14 12:21:38.652657829 +0200
@@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
   expand_arith_overflow (MULT_EXPR, stmt);
 }
 
+/* Expand UADDC STMT.  */
+
+static void
+expand_UADDC (internal_fn ifn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg1 = gimple_call_arg (stmt, 0);
+  tree arg2 = gimple_call_arg (stmt, 1);
+  tree arg3 = gimple_call_arg (stmt, 2);
+  tree type = TREE_TYPE (arg1);
+  machine_mode mode = TYPE_MODE (type);
+  insn_code icode = optab_handler (ifn == IFN_UADDC
+  ? uaddc5_optab : usubc5_optab, mode);
+  rtx op1 = expand_normal (arg1);
+  rtx op2 = expand_normal (arg2);
+  rtx op3 = expand_normal (arg3);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, 

Re: [PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Jakub Jelinek wrote:

> On Wed, Jun 14, 2023 at 12:25:46PM +, Richard Biener wrote:
> > I think that's still very much desirable so this followup looks OK.
> > Maybe you can re-base it as a prerequisite though?
> 
> Rebased then (of course with the UADDC/USUBC handling removed from this
> first patch, will be added in the second one).
> 
> Ok for trunk if it passes bootstrap/regtest?

OK.

Thanks,
Richard.

> 2023-06-14  Jakub Jelinek  
> 
>   * gimple-fold.cc (gimple_fold_call): Move handling of arg0
>   as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
>   and .{ADD,SUB,MUL}_OVERFLOW calls from here...
>   * fold-const-call.cc (fold_const_call): ... here.
> 
> --- gcc/gimple-fold.cc.jj 2023-06-13 18:23:37.199793275 +0200
> +++ gcc/gimple-fold.cc2023-06-14 15:41:51.090987708 +0200
> @@ -5702,22 +5702,6 @@ gimple_fold_call (gimple_stmt_iterator *
>   result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
>   result = arg1;
> -   else if (TREE_CODE (arg0) == INTEGER_CST
> -&& TREE_CODE (arg1) == INTEGER_CST)
> - {
> -   if (cplx_result)
> - result = int_const_binop (subcode, fold_convert (type, arg0),
> -   fold_convert (type, arg1));
> -   else
> - result = int_const_binop (subcode, arg0, arg1);
> -   if (result && arith_overflowed_p (subcode, type, arg0, arg1))
> - {
> -   if (cplx_result)
> - overflow = build_one_cst (type);
> -   else
> - result = NULL_TREE;
> - }
> - }
> if (result)
>   {
> if (result == integer_zero_node)
> --- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
> +++ gcc/fold-const-call.cc2023-06-14 15:40:34.388064498 +0200
> @@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
>  {
>const char *p0, *p1;
>char c;
> +  tree_code subcode;
>switch (fn)
>  {
>  case CFN_BUILT_IN_STRSPN:
> @@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
>  case CFN_FOLD_LEFT_PLUS:
>return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
>  
> +case CFN_UBSAN_CHECK_ADD:
> +case CFN_ADD_OVERFLOW:
> +  subcode = PLUS_EXPR;
> +  goto arith_overflow;
> +
> +case CFN_UBSAN_CHECK_SUB:
> +case CFN_SUB_OVERFLOW:
> +  subcode = MINUS_EXPR;
> +  goto arith_overflow;
> +
> +case CFN_UBSAN_CHECK_MUL:
> +case CFN_MUL_OVERFLOW:
> +  subcode = MULT_EXPR;
> +  goto arith_overflow;
> +
> +arith_overflow:
> +  if (integer_cst_p (arg0) && integer_cst_p (arg1))
> + {
> +   tree itype
> + = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
> +   bool ovf = false;
> +   tree r = int_const_binop (subcode, fold_convert (itype, arg0),
> + fold_convert (itype, arg1));
> +   if (!r || TREE_CODE (r) != INTEGER_CST)
> + return NULL_TREE;
> +   if (arith_overflowed_p (subcode, itype, arg0, arg1))
> + ovf = true;
> +   if (TREE_OVERFLOW (r))
> + r = drop_tree_overflow (r);
> +   if (itype == type)
> + {
> +   if (ovf)
> + return NULL_TREE;
> +   return r;
> + }
> +   else
> + return build_complex (type, r, build_int_cst (itype, ovf));
> + }
> +  return NULL_TREE;
> +
>  default:
>return fold_const_call_1 (fn, type, arg0, arg1);
>  }
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH] middle-end: Move constant args folding of .UBSAN_CHECK_* and .*_OVERFLOW into fold-const-call.cc

2023-06-14 Thread Jakub Jelinek via Gcc-patches
On Wed, Jun 14, 2023 at 12:25:46PM +, Richard Biener wrote:
> I think that's still very much desirable so this followup looks OK.
> Maybe you can re-base it as a prerequisite though?

Rebased then (of course with the UADDC/USUBC handling removed from this
first patch, will be added in the second one).

Ok for trunk if it passes bootstrap/regtest?

2023-06-14  Jakub Jelinek  

* gimple-fold.cc (gimple_fold_call): Move handling of arg0
as well as arg1 INTEGER_CSTs for .UBSAN_CHECK_{ADD,SUB,MUL}
and .{ADD,SUB,MUL}_OVERFLOW calls from here...
* fold-const-call.cc (fold_const_call): ... here.

--- gcc/gimple-fold.cc.jj   2023-06-13 18:23:37.199793275 +0200
+++ gcc/gimple-fold.cc  2023-06-14 15:41:51.090987708 +0200
@@ -5702,22 +5702,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
  else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- else if (TREE_CODE (arg0) == INTEGER_CST
-  && TREE_CODE (arg1) == INTEGER_CST)
-   {
- if (cplx_result)
-   result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
-   result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
-   {
- if (cplx_result)
-   overflow = build_one_cst (type);
- else
-   result = NULL_TREE;
-   }
-   }
  if (result)
{
  if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj   2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc  2023-06-14 15:40:34.388064498 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
 {
   const char *p0, *p1;
   char c;
+  tree_code subcode;
   switch (fn)
 {
 case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
 case CFN_FOLD_LEFT_PLUS:
   return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
 
+case CFN_UBSAN_CHECK_ADD:
+case CFN_ADD_OVERFLOW:
+  subcode = PLUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_SUB:
+case CFN_SUB_OVERFLOW:
+  subcode = MINUS_EXPR;
+  goto arith_overflow;
+
+case CFN_UBSAN_CHECK_MUL:
+case CFN_MUL_OVERFLOW:
+  subcode = MULT_EXPR;
+  goto arith_overflow;
+
+arith_overflow:
+  if (integer_cst_p (arg0) && integer_cst_p (arg1))
+   {
+ tree itype
+   = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
+ bool ovf = false;
+ tree r = int_const_binop (subcode, fold_convert (itype, arg0),
+   fold_convert (itype, arg1));
+ if (!r || TREE_CODE (r) != INTEGER_CST)
+   return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, arg0, arg1))
+   ovf = true;
+ if (TREE_OVERFLOW (r))
+   r = drop_tree_overflow (r);
+ if (itype == type)
+   {
+ if (ovf)
+   return NULL_TREE;
+ return r;
+   }
+ else
+   return build_complex (type, r, build_int_cst (itype, ovf));
+   }
+  return NULL_TREE;
+
 default:
   return fold_const_call_1 (fn, type, arg0, arg1);
 }


Jakub
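In source terms, the folding moved here is what lets constant calls collapse at compile time. The complex value built by the patch pairs the wrapped result with an overflow flag; for a `uint8_t` addition that is equivalent to the following sketch (the function name is an illustration, not GCC code):

```cpp
#include <cstdint>
#include <utility>

// Semantics of .ADD_OVERFLOW's (result, overflow) complex return value
// for uint8_t operands: the wrapped sum plus a flag saying whether the
// mathematical result exceeded the type's range.
static std::pair<uint8_t, bool> add_overflow_u8(uint8_t a, uint8_t b)
{
  uint8_t r;
  bool ovf = __builtin_add_overflow(a, b, &r);   // lowers to .ADD_OVERFLOW
  return {r, ovf};
}
```

With both arguments constant, fold_const_call can now compute this pair directly instead of leaving it to gimple_fold_call.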



Re: [PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > The function is only meaningful for LOOP_VINFO_MASKS processing so
> > inline it into the single use.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> >
> > * tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
> > into ...
> > (vect_verify_full_masking): ... this.
> 
> I think we did have a use for the separate function internally,
> but obviously it was never submitted.  Personally I'd prefer
> to keep things as they are though.

OK - after 3/3 it's no longer "generic" (it wasn't before,
it doesn't inspect the _len groups either), it's only meaningful
for WHILE_ULT style analysis.

> 
> 
> > ---
> >  gcc/tree-vect-loop.cc | 22 ++
> >  1 file changed, 6 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index ace9e759f5b..a9695e5b25d 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info 
> > loop_vinfo, tree cmp_type)
> >return true;
> >  }
> >  
> > -/* Calculate the maximum number of scalars per iteration for every
> > -   rgroup in LOOP_VINFO.  */
> > -
> > -static unsigned int
> > -vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
> > -{
> > -  unsigned int res = 1;
> > -  unsigned int i;
> > -  rgroup_controls *rgm;
> > -  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
> > -res = MAX (res, rgm->max_nscalars_per_iter);
> > -  return res;
> > -}
> > -
> >  /* Calculate the minimum precision necessary to represent:
> >  
> >MAX_NITERS * FACTOR
> > @@ -1210,8 +1196,6 @@ static bool
> >  vect_verify_full_masking (loop_vec_info loop_vinfo)
> >  {
> >unsigned int min_ni_width;
> > -  unsigned int max_nscalars_per_iter
> > -= vect_get_max_nscalars_per_iter (loop_vinfo);
> >  
> >/* Use a normal loop if there are no statements that need masking.
> >   This only happens in rare degenerate cases: it means that the loop
> > @@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
> >if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
> >  return false;
> >  
> > +  /* Calculate the maximum number of scalars per iteration for every 
> > rgroup.  */
> > +  unsigned int max_nscalars_per_iter = 1;
> > +  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
> > +max_nscalars_per_iter
> > +  = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
> > +
> >/* Work out how many bits we need to represent the limit.  */
> >min_ni_width
> >  = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > Currently vect_determine_partial_vectors_and_peeling will decide
> > to apply fully masking to the main loop despite
> > --param vect-partial-vector-usage=1 when the currently analyzed
> > vector mode results in a vectorization factor that's bigger
> > than the number of scalar iterations.  That's undesirable for
> > targets where a vector mode can handle both partial vector and
> > non-partial vector vectorization.  I understand that for AARCH64
> > we have SVE and NEON but SVE can only do partial vector and
> > NEON only non-partial vector vectorization, plus the target
> > chooses to let cost comparison decide the vector mode to use.
> 
> SVE can do both (and does non-partial for things that can't yet be
> predicated, like reversing loads).  But yeah, NEON can only do
> non-partial.
> 
> > For x86 and the upcoming AVX512 partial vector support the
> > story is different, the target chooses the first (and largest)
> vector mode that can successfully be used for vectorization.  But
> > that means with --param vect-partial-vector-usage=1 we will
> > always choose AVX512 with partial vectors for the main loop
> > even if, for example, V4SI would be a perfect fit with full
> > vectors and no required epilog!
> 
> Sounds like a good candidate for VECT_COMPARE_COSTS.  Did you
> try using that?

Yeah, I didn't try that because we've never done that and I expect
unrelated "effects" ...

> > The following tries to find the appropriate condition for
> > this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
> > on the main loop when --param vect-partial-vector-usage=1 will
> > hurt AARCH64?
> 
> Yeah, I'd expect so.
> 
> > Incidentially looking up the docs for
> > vect-partial-vector-usage suggests that it's not supposed to
> > control epilog vectorization but instead
> > "1 allows partial vector loads and stores if vectorization removes the
> need for the code to iterate".  That's probably OK in the end,
> but if there's a fixed-size vector mode that allows the same thing
> without using masking, that would be better.
> >
> > I wonder if we should special-case known niter (bounds) somehow
> > when analyzing the vector modes and override the targets sorting?
> >
> > Maybe we want a new --param in addition to vect-epilogues-nomask
> > and vect-partial-vector-usage to say we want masked epilogues?
> >
> > * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
> > For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
> > as only applying to epilogues.
> > ---
> >  gcc/tree-vect-loop.cc | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 9be66b8fbc5..9323aa572d4 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling 
> > (loop_vec_info loop_vinfo,
> >   && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> >   && !vect_known_niters_smaller_than_vf (loop_vinfo))
> > LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> > -  else
> > +  /* Avoid using a large fixed size vectorization mode with masking
> > +for the main loop when we were asked to only use masking for
> > +the epilog.
> > +???  Ideally we'd start analysis with a better sized mode,
> > +the param_vect_partial_vector_usage == 2 case suffers from
> > +this as well.  But there's a catch-22.  */
> > +  else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> > +&& param_vect_partial_vector_usage == 1
> > +&& LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
> 
> I don't think is_constant is a good thing to test here.  The way things
> work for SVE is essentially the same for VL-agnostic and VL-specific.
> 
> Also, I think this hard-codes the assumption that the smallest mode
> isn't maskable.  Wouldn't it spuriously fail vectorisation if there
> was no smaller mode available?
> 
> Similarly, it looks like it could fail for AVX512 if processing 511
> characters, whereas I'd have expected AVX512 to still be the right
> choice there.

Possibly, yes.

> If VECT_COMPARE_COSTS seems too expensive, we could try to look
> for cases where a vector mode later in the list gives a VF that
> is exactly equal to the number of scalar iterations.  (Exactly
> *divides* the number of scalar iterations would be less clear-cut IMO.)
> 
> But converting a vector mode into a VF isn't trivial with our
> current vectoriser structures, so I'm not sure how much of a
> saving it would be over VECT_COMPARE_COSTS.  And it would be much
> more special-purpose than VECT_COMPARE_COSTS.

It occurred to me that if NITER is constant or we have a constant bound
on it we could set max_vf to the next higher power-of-two
(and min_vf to the next lower power-of-two?) when doing the
"autodetect" run.  Unfortunately we 

[PATCH v2] c++: Accept elaborated-enum-base in system headers

2023-06-14 Thread Alex Coplan via Gcc-patches
Hi,

This is a v2 patch addressing feedback for:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621050.html

macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:

typedef enum T : BaseType T;

i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.

This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.

The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html

GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.

With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
OK for trunk?

Thanks,
Alex

gcc/c-family/ChangeLog:

* c.opt (Welaborated-enum-base): New.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_enum_specifier): Don't reject
elaborated-type-specifier with enum-base, instead emit new
Welaborated-enum-base warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/elab-enum-base.C: New test.
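The distinction the new warning draws can be sketched as follows, mirroring the shape of the new testcase (the enum name here is hypothetical):

```cpp
// A standalone opaque-enum-declaration with a fixed underlying type is
// valid C++11; the enum is complete from this point on.
enum Status : long;

// What the macOS CF_ENUM expansion produces is the *elaborated* form
// with an additional enum-base, which the standard does not allow:
//
//   typedef enum Status2 : long Status2;   // pedwarn with this patch
```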
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cead1995561..f935665d629 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1488,6 +1488,13 @@ Wsubobject-linkage
 C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1)
 Warn if a class type has a base or a field whose type uses the anonymous 
namespace or depends on a type with no linkage.
 
+Welaborated-enum-base
+C++ ObjC++ Var(warn_elaborated_enum_base) Warning Init(1)
+Warn if an additional enum-base is used in an elaborated-type-specifier.
+That is, if an enum with given underlying type and no enumerator list
+is used in a declaration other than just a standalone declaration of the
+enum.
+
 Wduplicate-decl-specifier
 C ObjC Var(warn_duplicate_decl_specifier) Warning LangEnabledBy(C ObjC,Wall)
 Warn when a declaration has duplicate const, volatile, restrict or _Atomic 
specifier.
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d77fbd20e56..4dd290717de 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -21024,11 +21024,13 @@ cp_parser_enum_specifier (cp_parser* parser)
 
   /* Check for the `:' that denotes a specified underlying type in C++0x.
  Note that a ':' could also indicate a bitfield width, however.  */
+  location_t colon_loc = UNKNOWN_LOCATION;
   if (cp_lexer_next_token_is (parser->lexer, CPP_COLON))
 {
   cp_decl_specifier_seq type_specifiers;
 
   /* Consume the `:'.  */
+  colon_loc = cp_lexer_peek_token (parser->lexer)->location;
   cp_lexer_consume_token (parser->lexer);
 
   auto tdf
@@ -21077,10 +21079,13 @@ cp_parser_enum_specifier (cp_parser* parser)
  && cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
{
  if (has_underlying_type)
-   cp_parser_commit_to_tentative_parse (parser);
- cp_parser_error (parser, "expected %<;%> or %<{%>");
- if (has_underlying_type)
-   return error_mark_node;
+   pedwarn (colon_loc,
+OPT_Welaborated_enum_base,
+"declaration of enumeration with "
+"fixed underlying type and no enumerator list is "
+"only permitted as a standalone declaration");
+ else
+   cp_parser_error (parser, "expected %<;%> or %<{%>");
}
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C 
b/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C
new file mode 100644
index 000..57141f013bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/elab-enum-base.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+// Empty dg-options to override -pedantic-errors.
+
+typedef long CFIndex;
+typedef enum CFComparisonResult : CFIndex CFComparisonResult;
+// { dg-warning "declaration of enumeration with fixed underlying type" "" { 
target *-*-* } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/enum40.C 
b/gcc/testsuite/g++.dg/cpp0x/enum40.C
index cfdf2a4a18a..d3ffeb62d70 100644
--- a/gcc/testsuite/g++.dg/cpp0x/enum40.C
+++ b/gcc/testsuite/g++.dg/cpp0x/enum40.C
@@ -4,23 +4,25 @@
 void
 foo ()
 {
-  enum : int a alignas;// { dg-error "expected" }
+  enum : int a alignas;// { dg-error "declaration of enum" }
+  // { dg-error {expected '\(' before ';'} "" { target *-*-* } .-1 }
 }
 
 void
 bar ()
 {
-  enum : int a;   

Re: [PATCH 1/3] Inline vect_get_max_nscalars_per_iter

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> The function is only meaningful for LOOP_VINFO_MASKS processing so
> inline it into the single use.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
>   * tree-vect-loop.cc (vect_get_max_nscalars_per_iter): Inline
>   into ...
>   (vect_verify_full_masking): ... this.

I think we did have a use for the separate function internally,
but obviously it was never submitted.  Personally I'd prefer
to keep things as they are though.



> ---
>  gcc/tree-vect-loop.cc | 22 ++
>  1 file changed, 6 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ace9e759f5b..a9695e5b25d 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1117,20 +1117,6 @@ can_produce_all_loop_masks_p (loop_vec_info 
> loop_vinfo, tree cmp_type)
>return true;
>  }
>  
> -/* Calculate the maximum number of scalars per iteration for every
> -   rgroup in LOOP_VINFO.  */
> -
> -static unsigned int
> -vect_get_max_nscalars_per_iter (loop_vec_info loop_vinfo)
> -{
> -  unsigned int res = 1;
> -  unsigned int i;
> -  rgroup_controls *rgm;
> -  FOR_EACH_VEC_ELT (LOOP_VINFO_MASKS (loop_vinfo), i, rgm)
> -res = MAX (res, rgm->max_nscalars_per_iter);
> -  return res;
> -}
> -
>  /* Calculate the minimum precision necessary to represent:
>  
>MAX_NITERS * FACTOR
> @@ -1210,8 +1196,6 @@ static bool
>  vect_verify_full_masking (loop_vec_info loop_vinfo)
>  {
>unsigned int min_ni_width;
> -  unsigned int max_nscalars_per_iter
> -= vect_get_max_nscalars_per_iter (loop_vinfo);
>  
>/* Use a normal loop if there are no statements that need masking.
>   This only happens in rare degenerate cases: it means that the loop
> @@ -1219,6 +1203,12 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
>if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
>  return false;
>  
> +  /* Calculate the maximum number of scalars per iteration for every rgroup. 
>  */
> +  unsigned int max_nscalars_per_iter = 1;
> +  for (auto rgm : LOOP_VINFO_MASKS (loop_vinfo))
> +max_nscalars_per_iter
> +  = MAX (max_nscalars_per_iter, rgm.max_nscalars_per_iter);
> +
>/* Work out how many bits we need to represent the limit.  */
>min_ni_width
>  = vect_min_prec_for_max_niters (loop_vinfo, max_nscalars_per_iter);


Re: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1

2023-06-14 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> Currently vect_determine_partial_vectors_and_peeling will decide
> to apply fully masking to the main loop despite
> --param vect-partial-vector-usage=1 when the currently analyzed
> vector mode results in a vectorization factor that's bigger
> than the number of scalar iterations.  That's undesirable for
> targets where a vector mode can handle both partial vector and
> non-partial vector vectorization.  I understand that for AARCH64
> we have SVE and NEON but SVE can only do partial vector and
> NEON only non-partial vector vectorization, plus the target
> chooses to let cost comparison decide the vector mode to use.

SVE can do both (and does non-partial for things that can't yet be
predicated, like reversing loads).  But yeah, NEON can only do
non-partial.

> For x86 and the upcoming AVX512 partial vector support the
> story is different, the target chooses the first (and largest)
> vector mode that can successfully used for vectorization.  But
> that means with --param vect-partial-vector-usage=1 we will
> always choose AVX512 with partial vectors for the main loop
> even if, for example, V4SI would be a perfect fit with full
> vectors and no required epilog!

Sounds like a good candidate for VECT_COMPARE_COSTS.  Did you
try using that?

> The following tries to find the appropriate condition for
> this - I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P
> on the main loop when --param vect-partial-vector-usage=1 will
> hurt AARCH64?

Yeah, I'd expect so.

> Incidentially looking up the docs for
> vect-partial-vector-usage suggests that it's not supposed to
> control epilog vectorization but instead
> "1 allows partial vector loads and stores if vectorization removes the
> need for the code to iterate".  That's probably OK in the end
> but if there's a fixed size vector mode that allows the same thing
> without using masking that would be better.
>
> I wonder if we should special-case known niter (bounds) somehow
> when analyzing the vector modes and override the targets sorting?
>
> Maybe we want a new --param in addition to vect-epilogues-nomask
> and vect-partial-vector-usage to say we want masked epilogues?
>
>   * tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
>   For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
>   as only applying to epilogues.
> ---
>  gcc/tree-vect-loop.cc | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 9be66b8fbc5..9323aa572d4 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
> && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> && !vect_known_niters_smaller_than_vf (loop_vinfo))
>   LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
> -  else
> +  /* Avoid using a large fixed size vectorization mode with masking
> +  for the main loop when we were asked to only use masking for
> +  the epilog.
> +  ???  Ideally we'd start analysis with a better sized mode,
> +  the param_vect_partial_vector_usage == 2 case suffers from
> +  this as well.  But there's a catch-22.  */
> +  else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
> +  && param_vect_partial_vector_usage == 1
> +  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))

I don't think is_constant is a good thing to test here.  The way things
work for SVE is essentially the same for VL-agnostic and VL-specific.

Also, I think this hard-codes the assumption that the smallest mode
isn't maskable.  Wouldn't it spuriously fail vectorisation if there
was no smaller mode available?

Similarly, it looks like it could fail for AVX512 if processing 511
characters, whereas I'd have expected AVX512 to still be the right
choice there.

If VECT_COMPARE_COSTS seems too expensive, we could try to look
for cases where a vector mode later in the list gives a VF that
is exactly equal to the number of scalar iterations.  (Exactly
*divides* the number of scalar iterations would be less clear-cut IMO.)

But converting a vector mode into a VF isn't trivial with our
current vectoriser structures, so I'm not sure how much of a
saving it would be over VECT_COMPARE_COSTS.  And it would be much
more special-purpose than VECT_COMPARE_COSTS.

Thanks,
Richard


>   LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
>  }


Re:RE: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Lehua Ding
Nit for test.
 +/* { dg-options "-march=rv64gczve32x --param=riscv-autovec-preference=fixed-vlmax" } */
To
 +/* { dg-options "-march=rv64gc_zve32x --param=riscv-autovec-preference=fixed-vlmax" } */
Fixed in the V2 patch 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621698.html), thank you.


Best,
Lehua


Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches
On Tue, 13 Jun 2023, Jakub Jelinek wrote:

> On Tue, Jun 13, 2023 at 08:40:36AM +, Richard Biener wrote:
> > I suspect re-association can wreck things even more here.  I have
> > to say the matching code is very hard to follow, not sure if
> > splitting out a function matching
> > 
> >_22 = .{ADD,SUB}_OVERFLOW (_6, _5);
> >_23 = REALPART_EXPR <_22>;
> >_24 = IMAGPART_EXPR <_22>;
> > 
> > from _23 and _24 would help?
> 
> I've outlined 3 most often used sequences of statements or checks
> into 3 helper functions, hope that helps.
> 
> > > +  while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > > + {
> > > +   gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > > +   if (has_single_use (rhs[0])
> > > +   && is_gimple_assign (g)
> > > +   && (gimple_assign_rhs_code (g) == code
> > > +   || (code == MINUS_EXPR
> > > +   && gimple_assign_rhs_code (g) == PLUS_EXPR
> > > +   && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > > + {
> > > +   rhs[0] = gimple_assign_rhs1 (g);
> > > +   tree &r = rhs[2] ? rhs[3] : rhs[2];
> > > +   r = gimple_assign_rhs2 (g);
> > > +   if (gimple_assign_rhs_code (g) != code)
> > > + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
> > 
> > Can you use const_unop here?  In fact both will not reliably
> > negate all constants (ick), so maybe we want a force_const_negate ()?
> 
> It is unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should
> work.  That said, changed it to const_unop and am just giving up on it
> as if it wasn't a PLUS_EXPR with INTEGER_CST addend if const_unop doesn't
> simplify.
> 
> > > +   else if (addc_subc)
> > > + {
> > > +   if (!integer_zerop (arg2))
> > > + ;
> > > +   /* x = y + 0 + 0; x = y - 0 - 0; */
> > > +   else if (integer_zerop (arg1))
> > > + result = arg0;
> > > +   /* x = 0 + y + 0; */
> > > +   else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > +   /* x = y - y - 0; */
> > > +   else if (subcode == MINUS_EXPR
> > > +&& operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> > 
> > So this all performs simplifications but also constant folding.  In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> > 
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
> 
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> there 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.
> 
> > > +@cindex @code{addc@var{m}5} instruction pattern
> > > +@item @samp{addc@var{m}5}
> > > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > > +only values 0 or 1) together, sets operand 0 to the result of the
> > > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > > +overflow on the unsigned additions, and to 0 otherwise.  So, it is
> > > +an addition with carry in (operand 4) and carry out (operand 1).
> > > +All operands have the same mode.
> > 
> > operand 1 set to 1 for no overflow sounds weird when specifying it
> > as carry out - can you double check?
> 
> Fixed.
> 
> > > +@cindex @code{subc@var{m}5} instruction pattern
> > > +@item @samp{subc@var{m}5}
> > > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > > +from operand 2 instead of adding them.  So, it is
> > > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > > +(operand 1).  All operands have the same mode.
> > > +
> > 
> > I wonder if we want to name them uaddc and usubc?  Or is this supposed
> > to be simply the twos-complement "carry"?  I think the docs should
> > say so then (note we do have uaddv and addv).
> 
> Makes sense, I've actually renamed even the internal functions etc.
> 
> Here is only lightly tested patch with everything but gimple-fold.cc
> changed.
> 
> 2023-06-13  Jakub Jelinek  
> 
>   PR middle-end/79173
>   * internal-fn.def (UADDC, USUBC): New internal functions.
>   * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
>   (commutative_ternary_fn_p): Return true also for IFN_UADDC.
>   * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
>   * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
>   match_uaddc_usubc): New 

Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Jakub Jelinek wrote:

> On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > > + else if (addc_subc)
> > > > +   {
> > > > + if (!integer_zerop (arg2))
> > > > +   ;
> > > > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > > > + else if (integer_zerop (arg1))
> > > > +   result = arg0;
> > > > + /* x = 0 + y + 0; */
> > > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > > +   result = arg1;
> > > > + /* x = y - y - 0; */
> > > > + else if (subcode == MINUS_EXPR
> > > > +  && operand_equal_p (arg0, arg1, 0))
> > > > +   result = integer_zero_node;
> > > > +   }
> > > 
> > > So this all performs simplifications but also constant folding.  In
> > > particular the match.pd re-simplification will invoke fold_const_call
> > > on all-constant argument function calls but does not do extra folding
> > > on partially constant arg cases but instead relies on patterns here.
> > > 
> > > Can you add all-constant arg handling to fold_const_call and
> > > consider moving cases like y + 0 + 0 to match.pd?
> > 
> > The reason I've done this here is that this is the spot where all other
> > similar internal functions are handled, be it the ubsan ones
> > - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> > - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones.  The code handles
> > there 2 constant arguments as well as various patterns that can be
> > simplified and has code to clean it up later, build a COMPLEX_CST,
> > or COMPLEX_EXPR etc. as needed.  So, I think we want to handle those
> > elsewhere, we should do it for all of those functions, but then
> > probably incrementally.
> 
> The patch I've posted yesterday now fully tested on x86_64-linux and
> i686-linux.
> 
> Here is an untested incremental patch to handle constant folding of these
> in fold-const-call.cc rather than gimple-fold.cc.
> Not really sure if that is the way to go because it is replacing 28
> lines of former code with 65 of new code, for the overall benefit that say
> int
> foo (long long *p)
> {
>   int one = 1;
>   long long max = __LONG_LONG_MAX__;
>   return __builtin_add_overflow (one, max, p);
> }
> can be now fully folded already in ccp1 pass while before it was only
> cleaned up in forwprop1 pass right after it.

I think that's still very much desirable so this followup looks OK.
Maybe you can re-base it as a prerequisite though?

> As for doing some stuff in match.pd, I'm afraid it would result in even more
> significant growth, the advantage of gimple-fold.cc doing all of these in
> one place is that the needed infrastructure can be shared.

Yes, I saw that.

Richard.

> 
> --- gcc/gimple-fold.cc.jj 2023-06-14 12:21:38.657657759 +0200
> +++ gcc/gimple-fold.cc 2023-06-14 12:52:04.335054958 +0200
> @@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
>   result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
>   result = arg1;
> -   if (type
> -   && result == NULL_TREE
> -   && TREE_CODE (arg0) == INTEGER_CST
> -   && TREE_CODE (arg1) == INTEGER_CST
> -   && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
> - {
> -   if (cplx_result)
> - result = int_const_binop (subcode, fold_convert (type, arg0),
> -   fold_convert (type, arg1));
> -   else
> - result = int_const_binop (subcode, arg0, arg1);
> -   if (result && arith_overflowed_p (subcode, type, arg0, arg1))
> - {
> -   if (cplx_result)
> - overflow = build_one_cst (type);
> -   else
> - result = NULL_TREE;
> - }
> -   if (uaddc_usubc && result)
> - {
> -   tree r = int_const_binop (subcode, result,
> - fold_convert (type, arg2));
> -   if (r == NULL_TREE)
> - result = NULL_TREE;
> -   else if (arith_overflowed_p (subcode, type, result, arg2))
> - overflow = build_one_cst (type);
> - }
> - }
> if (result)
>   {
> if (result == integer_zero_node)
> --- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
> +++ gcc/fold-const-call.cc 2023-06-14 12:56:08.195631214 +0200
> @@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
>  {
>const char *p0, *p1;
>char c;
> +  tree_code subcode;
>switch (fn)
>  {
>  case CFN_BUILT_IN_STRSPN:
> @@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
>  case CFN_FOLD_LEFT_PLUS:
>return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
>  
> +case CFN_UBSAN_CHECK_ADD:
> +case CFN_ADD_OVERFLOW:
> +  

RE: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Li, Pan2 via Gcc-patches
Nit for test.

+/* { dg-options "-march=rv64gczve32x --param=riscv-autovec-preference=fixed-vlmax" } */

To

+/* { dg-options "-march=rv64gc_zve32x --param=riscv-autovec-preference=fixed-vlmax" } */

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of juzhe.zh...@rivai.ai
Sent: Wednesday, June 14, 2023 7:21 PM
To: 丁乐华 ; gcc-patches 
Cc: jeffreyalaw ; Robin Dapp ; 
palmer 
Subject: Reply: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

Also
p110119-1.c
change the name of the test to
pr110119-1.c


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Sent: 2023-06-14 19:17
To: 丁乐华; gcc-patches
Cc: jeffreyalaw; Robin Dapp; palmer
Subject: Re: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

Oh. I see.

Change  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))

into 

if (riscv_v_ext_mode_p (arg.mode))

since riscv_v_ext_mode_p (arg.mode) includes both riscv_v_ext_vector_mode_p (arg.mode) and riscv_v_ext_tuple_mode_p (arg.mode),

there is no need for riscv_v_ext_tuple_mode_p.


juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-06-14 19:03
To: gcc-patches; juzhe.zhong
Subject: [PATCH] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

Hi,
 
The reason for this bug is that in the case where the vector register is set to 
a fixed length (with `--param=riscv-autovec-preference=fixed-vlmax` option), 
TARGET_PASS_BY_REFERENCE thinks that variables of type vint32m1 can be passed 
through two scalar registers, but when GCC calls FUNCTION_VALUE (call function 
riscv_get_arg_info inside) it returns NULL_RTX. These two functions are not 
unified. The current treatment is to pass all vector arguments and returns 
through the function stack, and a new calling convention for vector registers 
will be added in the future.
 
Best,
Lehua
 
  PR target/110119
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_get_arg_info): Return NULL_RTX for vector mode.
(riscv_pass_by_reference): Return true for vector mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/p110119-1.c: New test.
* gcc.target/riscv/rvv/base/p110119-2.c: New test.
 
---
gcc/config/riscv/riscv.cc | 19 +-
.../gcc.target/riscv/rvv/base/p110119-1.c | 26 +++
.../gcc.target/riscv/rvv/base/p110119-2.c | 26 +++
3 files changed, 65 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/p110119-2.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dd5361c2bd2a..be868c7b6127 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3915,13 +3915,13 @@ riscv_get_arg_info (struct riscv_arg_info *info, const CUMULATIVE_ARGS *cum,
   riscv_pass_in_vector_p (type);
 }
-  /* TODO: Currently, it will cause an ICE for --param
- riscv-autovec-preference=fixed-vlmax. So, we just return NULL_RTX here
- let GCC generate loads/stores. Ideally, we should either warn the user not
- to use an RVV vector type as function argument or support the calling
- convention directly.  */
-  if (riscv_v_ext_mode_p (mode))
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (mode) || riscv_v_ext_tuple_mode_p (mode))
 return NULL_RTX;
+
   if (named)
 {
   riscv_aggregate_field fields[2];
@@ -4106,6 +4106,13 @@ riscv_pass_by_reference (cumulative_args_t cum_v, const function_arg_info &arg)
    return false;
 }
+  /* All current vector arguments and return values are passed through the
+ function stack. Ideally, we should either warn the user not to use an RVV
+ vector type as function argument or support a calling convention
+ with better performance.  */
+  if (riscv_v_ext_mode_p (arg.mode) || riscv_v_ext_tuple_mode_p (arg.mode))
+return true;
+
   /* Pass by reference if the data do not fit in two integer registers.  */
   return !IN_RANGE (size, 0, 2 * UNITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
new file mode 100644
index ..0edbb0626299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/p110119-1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv --param=riscv-autovec-preference=fixed-vlmax" } */
+
+#include "riscv_vector.h"
+
+typedef int8_t vnx2qi __attribute__ ((vector_size (2)));
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi (int8_t a, int8_t b, int8_t *out)
+{
+  vnx2qi v = {a, b};
+  return v;
+}
+
+__attribute__ ((noipa)) vnx2qi
+f_vnx2qi_2 (vnx2qi a, int8_t *out)
+{
+  return a;
+}
+
+__attribute__ 

Re: [PATCH] Implement ipa_vr hashing.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Sat, Jun 10, 2023 at 10:30 PM Aldy Hernandez  wrote:
>
>
>
> On 5/29/23 16:51, Martin Jambor wrote:
> > Hi,
> >
> > On Mon, May 22 2023, Aldy Hernandez via Gcc-patches wrote:
> >> Implement hashing for ipa_vr.  When all is said and done, all these
> >> patches incurr a 7.64% slowdown for ipa-cp, with is entirely covered by
> >> the similar 7% increase in this area last week.  So we get type agnostic
> >> ranges with "infinite" range precision close to free.
> >
> > Do you know why/where this slow-down happens?  Do we perhaps want to
> > limit the "infiniteness" a little somehow?
>
> I addressed the slow down in another mail.
>
> >
> > Also, jump functions live for a long time, have you looked at how memory
> > hungry they become?  I hope that the hashing would be good at preventing
> > any issues.
>
> On a side-note, the caching does help.  On a (mistaken) hunch, I had
> played around with removing caching for everything but UNDEFINED/VARYING
> and zero/nonzero to simplify things, but the cache hit ratio was still
> surprisingly high (+80%).  So good job there :-).
>
> >
> > Generally, I think I OK with the patches if the impact on memory is not
> > too bad, though I guess they depend on the one I looked at last week, so
> > we may focus on that one first.
>
> I'm not sure whether this was an OK for the other patches, given you
> approved the first patch, so I'll hold off until you give the go-ahead.
>
> Thanks.
> Aldy



Re: [PATCH] Convert remaining uses of value_range in ipa-*.cc to Value_Range.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
>
> Minor cleanups to get rid of value_range in IPA.  There's only one left,
> but it's in the switch code which is integer specific.
>
> OK?
>
> gcc/ChangeLog:
>
> * ipa-cp.cc (decide_whether_version_node): Adjust comment.
> * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Adjust
> for Value_Range.
> (set_switch_stmt_execution_predicate): Same.
> * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Same.
> ---
>  gcc/ipa-cp.cc|  3 +--
>  gcc/ipa-fnsummary.cc | 22 ++
>  gcc/ipa-prop.cc  |  9 +++--
>  3 files changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 03273666ea2..2e64415096e 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6287,8 +6287,7 @@ decide_whether_version_node (struct cgraph_node *node)
> {
>   /* If some values generated for self-recursive calls with
>  arithmetic jump functions fall outside of the known
> -value_range for the parameter, we can skip them.  VR 
> interface
> -supports this only for integers now.  */
> +range for the parameter, we can skip them.  */
>   if (TREE_CODE (val->value) == INTEGER_CST
>   && !plats->m_value_range.bottom_p ()
>   && !ipa_range_contains_p (plats->m_value_range.m_vr,
> diff --git a/gcc/ipa-fnsummary.cc b/gcc/ipa-fnsummary.cc
> index 0474af8991e..1ce8501fe85 100644
> --- a/gcc/ipa-fnsummary.cc
> +++ b/gcc/ipa-fnsummary.cc
> @@ -488,19 +488,20 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
>   if (vr.varying_p () || vr.undefined_p ())
> break;
>
> - value_range res;
> + Value_Range res (op->type);
>   if (!op->val[0])
> {
> + Value_Range varying (op->type);
> + varying.set_varying (op->type);
>   range_op_handler handler (op->code, op->type);
>   if (!handler
>   || !res.supports_type_p (op->type)
> - || !handler.fold_range (res, op->type, vr,
> - value_range (op->type)))
> + || !handler.fold_range (res, op->type, vr, varying))
> res.set_varying (op->type);
> }
>   else if (!op->val[1])
> {
> - value_range op0;
> + Value_Range op0 (op->type);
>   range_op_handler handler (op->code, op->type);
>
>   ipa_range_set_and_normalize (op0, op->val[0]);
> @@ -518,14 +519,14 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
> }
>   if (!vr.varying_p () && !vr.undefined_p ())
> {
> - value_range res;
> - value_range val_vr;
> + int_range<2> res;
> + Value_Range val_vr (TREE_TYPE (c->val));
>   range_op_handler handler (c->code, boolean_type_node);
>
>   ipa_range_set_and_normalize (val_vr, c->val);
>
>   if (!handler
> - || !res.supports_type_p (boolean_type_node)
> + || !val_vr.supports_type_p (TREE_TYPE (c->val))
>   || !handler.fold_range (res, boolean_type_node, vr, val_vr))
> res.set_varying (boolean_type_node);
>
> @@ -1687,12 +1688,17 @@ set_switch_stmt_execution_predicate (struct ipa_func_body_info *fbi,
>int bound_limit = opt_for_fn (fbi->node->decl,
> param_ipa_max_switch_predicate_bounds);
>int bound_count = 0;
> -  value_range vr;
> +  // This can safely be an integer range, as switches can only hold
> +  // integers.
> +  int_range<2> vr;
>
>get_range_query (cfun)->range_of_expr (vr, op);
>if (vr.undefined_p ())
>  vr.set_varying (TREE_TYPE (op));
>tree vr_min, vr_max;
> +  // ?? This entire function could use a rewrite to use the irange
> +  // API, instead of trying to recreate its intersection/union logic.
> +  // Any use of get_legacy_range() is a serious code smell.
>value_range_kind vr_type = get_legacy_range (vr, vr_min, vr_max);
>wide_int vr_wmin = wi::to_wide (vr_min);
>wide_int vr_wmax = wi::to_wide (vr_max);
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 6383bc11e0a..5f9e6dbbff2 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2348,7 +2346,6 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
>gcall *call = cs->call_stmt;
>int n, arg_num = gimple_call_num_args (call);
>bool useful_context = false;
> -  value_range vr;
>
>if (arg_num == 0 || 

Re: [PATCH] Convert ipa_jump_func to use ipa_vr instead of a value_range.

2023-06-14 Thread Aldy Hernandez via Gcc-patches
PING

On Mon, May 22, 2023 at 8:56 PM Aldy Hernandez  wrote:
>
> This patch converts the ipa_jump_func code to use the type agnostic
> ipa_vr suitable for GC instead of value_range which is integer specific.
>
> I've disabled the range cacheing to simplify the patch for review, but
> it is handled in the next patch in the series.
>
> OK?
>
> gcc/ChangeLog:
>
> * ipa-cp.cc (ipa_vr_operation_and_type_effects): New.
> * ipa-prop.cc (ipa_get_value_range): Adjust for ipa_vr.
> (ipa_set_jfunc_vr): Take a range.
> (ipa_compute_jump_functions_for_edge): Pass range to
> ipa_set_jfunc_vr.
> (ipa_write_jump_function): Call streamer write helper.
> (ipa_read_jump_function): Call streamer read helper.
> * ipa-prop.h (class ipa_vr): Change m_vr to an ipa_vr.
> ---
>  gcc/ipa-cp.cc   | 15 +++
>  gcc/ipa-prop.cc | 70 ++---
>  gcc/ipa-prop.h  |  5 +++-
>  3 files changed, 44 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index bdbc2184b5f..03273666ea2 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -1928,6 +1928,21 @@ ipa_vr_operation_and_type_effects (vrange &dst_vr,
>   && !dst_vr.undefined_p ());
>  }
>
> +/* Same as above, but the SRC_VR argument is an IPA_VR which must
> +   first be extracted onto a vrange.  */
> +
> +static bool
> +ipa_vr_operation_and_type_effects (vrange &dst_vr,
> +  const ipa_vr &src_vr,
> +  enum tree_code operation,
> +  tree dst_type, tree src_type)
> +{
> +  Value_Range tmp;
> +  src_vr.get_vrange (tmp);
> +  return ipa_vr_operation_and_type_effects (dst_vr, tmp, operation,
> +   dst_type, src_type);
> +}
> +
>  /* Determine range of JFUNC given that INFO describes the caller node or
> the one it is inlined to, CS is the call graph edge corresponding to JFUNC
> and PARM_TYPE of the parameter.  */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index bbfe0f8aa45..c46a89f1b49 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2287,9 +2287,10 @@ ipa_set_jfunc_bits (ipa_jump_func *jf, const widest_int &value,
>  /* Return a pointer to a value_range just like *TMP, but either find it in
> ipa_vr_hash_table or allocate it in GC memory.  TMP->equiv must be NULL.  */
>
> -static value_range *
> -ipa_get_value_range (value_range *tmp)
> +static ipa_vr *
> +ipa_get_value_range (const vrange &tmp)
>  {
> +  /* FIXME: Add hashing support.
>value_range **slot = ipa_vr_hash_table->find_slot (tmp, INSERT);
>if (*slot)
>  return *slot;
> @@ -2297,40 +2298,27 @@ ipa_get_value_range (value_range *tmp)
>value_range *vr = new (ggc_alloc<value_range> ()) value_range;
>*vr = *tmp;
>*slot = vr;
> +  */
> +  ipa_vr *vr = new (ggc_alloc<ipa_vr> ()) ipa_vr (tmp);
>
>return vr;
>  }
>
> -/* Return a pointer to a value range consisting of TYPE, MIN, MAX and an empty
> -   equiv set. Use hash table in order to avoid creating multiple same copies of
> -   value_ranges.  */
> -
> -static value_range *
> -ipa_get_value_range (enum value_range_kind kind, tree min, tree max)
> -{
> -  value_range tmp (TREE_TYPE (min),
> -  wi::to_wide (min), wi::to_wide (max), kind);
> -  return ipa_get_value_range (&tmp);
> -}
> -
> -/* Assign to JF a pointer to a value_range structure with TYPE, MIN and MAX and
> -   a NULL equiv bitmap.  Use hash table in order to avoid creating multiple
> -   same value_range structures.  */
> +/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> +   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
>
>  static void
> -ipa_set_jfunc_vr (ipa_jump_func *jf, enum value_range_kind type,
> - tree min, tree max)
> +ipa_set_jfunc_vr (ipa_jump_func *jf, const vrange &tmp)
>  {
> -  jf->m_vr = ipa_get_value_range (type, min, max);
> +  jf->m_vr = ipa_get_value_range (tmp);
>  }
>
> -/* Assign to JF a pointer to a value_range just like TMP but either fetch a
> -   copy from ipa_vr_hash_table or allocate a new on in GC memory.  */
> -
>  static void
> -ipa_set_jfunc_vr (ipa_jump_func *jf, value_range *tmp)
> +ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
>  {
> -  jf->m_vr = ipa_get_value_range (tmp);
> +  Value_Range tmp;
> +  vr.get_vrange (tmp);
> +  ipa_set_jfunc_vr (jf, tmp);
>  }
>
>  /* Compute jump function for all arguments of callsite CS and insert the
> @@ -2392,8 +2380,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
>
>   if (addr_nonzero)
> {
> - tree z = build_int_cst (TREE_TYPE (arg), 0);
> - ipa_set_jfunc_vr (jfunc, VR_ANTI_RANGE, z, z);
> + vr.set_nonzero (TREE_TYPE (arg));
> + ipa_set_jfunc_vr (jfunc, vr);
> }
>   else
> gcc_assert (!jfunc->m_vr);
> @@ -2412,7 +2400,7 @@ 

[wwwdocs] Broken URL to README.Portability

2023-06-14 Thread Jivan Hakobyan via Gcc-patches
This patch fixes the broken link to README.Portability on the "GCC Coding Conventions" page.


-- 
With the best regards
Jivan Hakobyan
diff --git a/htdocs/codingconventions.html b/htdocs/codingconventions.html
index 9b6d243d..f5a356a8 100644
--- a/htdocs/codingconventions.html
+++ b/htdocs/codingconventions.html
@@ -252,7 +252,7 @@ and require at least an ANSI C89 or ISO C90 host compiler.
 C code should avoid pre-standard style function definitions, unnecessary
 function prototypes and use of the now deprecated PARAMS macro.
 See <a
-href="https://gcc.gnu.org/svn/gcc/trunk/gcc/README.Portability">README.Portability</a>
+href="https://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=gcc/README.Portability">README.Portability</a>
 for details of some of the portability problems that may arise.  Some
 of these problems are warned about by gcc -Wtraditional,
 which is included in the default warning options in a bootstrap.


Re: [PATCH V2] RISC-V: Ensure vector args and return use function stack to pass [PR110119]

2023-06-14 Thread Robin Dapp via Gcc-patches
> Oh. I see Robin's email is also wrong. CC Robin too for you 

It still arrived via the mailing list ;)

> Good to see a Fix patch of the ICE before Vector ABI patch.
> Let's wait for more comments.

LGTM, this way I don't even need to rewrite my tests.

Regards
 Robin

