Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Jason Merrill via Gcc-patches
On Fri, Jun 16, 2023 at 3:49 PM Ben Boeckel  wrote:
>
> On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> > This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> > please investigate?  Please let me know if you need any help in reproducing
> > these.
>
> Finally back at it. I tried on aarch64, but wasn't able to reproduce the
> errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
> there hardware I can access to try this out on the same target triple?
>
> Alternatively, a backtrace may be able to help pinpoint it enough if you
> have the cycles.

I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g.

FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors)
Excess errors:
/home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal
compiler error: Segmentation fault
0x19e2f3c crash_signal
/home/jason/gt/gcc/toplev.cc:314
0x340f3f8 mkdeps::vec::size() const
/home/jason/gt/libcpp/mkdeps.cc:57
0x340dc1f apply_vpath
/home/jason/gt/libcpp/mkdeps.cc:194
0x340e08e deps_add_dep(mkdeps*, char const*)
/home/jason/gt/libcpp/mkdeps.cc:318
0xea7b51 module_client::open_module_client(unsigned int, char const*,
mkdeps*, void (*)(char const*), char const*)
/home/jason/gt/gcc/cp/mapper-client.cc:291
0xef2ba8 make_mapper
/home/jason/gt/gcc/cp/module.cc:14042
0xf0896c get_mapper(unsigned int, mkdeps*)
/home/jason/gt/gcc/cp/module.cc:3977
0xf032ac name_pending_imports
/home/jason/gt/gcc/cp/module.cc:19623
0xf03a7d preprocessed_module(cpp_reader*)
/home/jason/gt/gcc/cp/module.cc:19817
0xe85104 module_token_cdtor(cpp_reader*, unsigned long)
/home/jason/gt/gcc/cp/lex.cc:548
0xf467b2 cp_lexer_new_main
/home/jason/gt/gcc/cp/parser.cc:756
0xfc1e3a c_parse_file()
/home/jason/gt/gcc/cp/parser.cc:49725
0x11c5bf5 c_common_parse_file()
/home/jason/gt/gcc/c-family/c-opts.cc:1268



Re: [PATCH] simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 03:06, Kyrylo Tkachov via Gcc-patches wrote:

Hi all,

In the testcase for this patch we try to vec_concat the lowpart and highpart of 
a vector, but the lowpart is expressed as a subreg.
simplify-rtx.cc does not recognise this and combine ends up trying to match:
Trying 7 -> 8:
 7: r93:V2SI=vec_select(r95:V4SI,parallel)
 8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI)
   REG_DEAD r95:V4SI
   REG_DEAD r93:V2SI
Failed to match this instruction:
(set (reg:V4SI 97)
 (vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0)
 (vec_select:V2SI (reg/v:V4SI 95 [ a ])
 (parallel:V4SI [
 (const_int 2 [0x2])
 (const_int 3 [0x3])
 ]

This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a 
simplification.
The testcase is a bit artificial, but I do have other aarch64-specific patterns 
that I want to optimise later
that rely on this simplification happening.

Without this patch for the testcase we generate:
foo:
 dup d31, v0.d[1]
 ins v0.d[1], v31.d[0]
 ret

whereas we should just not generate anything as the operation is ultimately a 
no-op.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Ok for trunk?
Thanks,
Kyrill

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify vec_concat of lowpart subreg and high part vec_select.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/low-high-combine_1.c: New test.

OK.

Jeff
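
The simplification discussed above can be illustrated with a small scalar model in C. This is only a sketch under stated assumptions: the v4si/v2si structs and the helper names are hypothetical stand-ins for the RTL modes and operations, and the low half is taken to be elements 0 and 1, as on a little-endian target. It shows that concatenating the low half and the high half of the same vector gives back the original vector, which is why the whole expression can fold to a plain register copy.

```c
#include <string.h>

/* Hypothetical scalar model of the RTL modes: V4SI is four ints,
   V2SI is two.  */
typedef struct { int e[4]; } v4si;
typedef struct { int e[2]; } v2si;

/* Low half: elements 0 and 1 (what the lowpart subreg denotes in
   this little-endian model).  */
static v2si
low_half (v4si a)
{
  v2si r = { { a.e[0], a.e[1] } };
  return r;
}

/* High half: the vec_select with parallel [2, 3].  */
static v2si
high_half (v4si a)
{
  v2si r = { { a.e[2], a.e[3] } };
  return r;
}

/* vec_concat of two V2SI values into one V4SI.  */
static v4si
vec_concat (v2si lo, v2si hi)
{
  v4si r = { { lo.e[0], lo.e[1], hi.e[0], hi.e[1] } };
  return r;
}

/* Returns 1 when vec_concat (low_half (a), high_half (a)) == a.  */
int
concat_of_halves_is_identity (int a0, int a1, int a2, int a3)
{
  v4si a = { { a0, a1, a2, a3 } };
  v4si r = vec_concat (low_half (a), high_half (a));
  return memcmp (&r, &a, sizeof a) == 0;
}
```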


Re: [PATCH v3] RISC-V: Add autovec FP binary operations.

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 07:43, juzhe.zhong wrote:

lgtm

ACK for the trunk.
jeff


Re: [PATCH v3] RISC-V: Add autovec FP unary operations.

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 07:44, juzhe.zhong wrote:

lgtm

Which is good enough for me.  Ok for the trunk.
jeff


Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/14/23 15:15, 钟居哲 wrote:

Hi, Jeff.  Thanks for quick approval.

When I reviewed the patch:
(define_expand "<optab><mode>2"
   [(set (match_operand:VF 0 "register_operand")
     (any_float_unop_nofrm:VF
      (match_operand:VF 1 "register_operand")))]
"TARGET_VECTOR"
{
   insn_code icode = code_for_pred (<CODE>, <MODE>mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
})

There could be an issue here with FP16 vectors.
Let's look at the VF iterator:
(define_mode_iterator VF [
   (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
   (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
   (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")


You can see that for all FP16 modes we use the predicate
"TARGET_VECTOR_ELEN_FP_16", which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN is enabled.
We do that because most floating-point instructions share the same
iterators, so we can't add TARGET_ZVFHMIN or TARGET_ZVFH in a naive way.
Some instruction patterns use VF, for example vle16.v, which should be
enabled as long as TARGET_ZVFHMIN is enabled, whereas
instructions like vfneg.v need TARGET_ZVFH.

So I ran an experiment:
void
f (_Float16 *restrict a, _Float16 *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i] = -b[i];
    }
}

with the compile options:
-march=rv64gcv_zvfhmin --param=riscv-autovec-preference=fixed-vlmax -O3

An ICE happens:
auto.c:26:1: error: unable to generate reloads for:
(insn 8 7 9 2 (set (reg:VNx8HF 186 [ vect__6.7 ])
         (if_then_else:VNx8HF (unspec:VNx8BI [
                     (const_vector:VNx8BI [
                             (const_int 1 [0x1]) repeated x8
                         ])
                     (const_int 8 [0x8])
                     (const_int 2 [0x2]) repeated x2
                     (const_int 0 [0])
                     (reg:SI 66 vl)
                     (reg:SI 67 vtype)
                 ] UNSPEC_VPREDICATE)
             (neg:VNx8HF (reg:VNx8HF 134 [ vect__4.6 ]))
             (unspec:VNx8HF [
                     (reg:SI 0 zero)
                 ] UNSPEC_VUNDEF))) "auto.c":24:14 6631 {pred_negvnx8hf}
      (expr_list:REG_DEAD (reg:VNx8HF 134 [ vect__4.6 ])
         (nil)))

The reason for the ICE is that we have enabled the auto-vectorization
pattern of vfneg.v when TARGET_ZVFHMIN according to the VF iterator, but
the instruction pattern of vfneg.v is correctly disabled and only
enabled when TARGET_ZVFH, since we have this attribute for each
RVV instruction pattern:
(define_attr "fp_vector_disabled" "no,yes"
   (cond [
     (and (eq_attr "type" "vfmov,vfalu,vfmul,vfdiv,
         vfwalu,vfwmul,vfmuladd,vfwmuladd,
         vfsqrt,vfrecp,vfminmax,vfsgnj,vfcmp,
         vfclass,vfmerge,
         vfncvtitof,vfwcvtftoi,vfcvtftoi,vfcvtitof,
         vfredo,vfredu,vfwredo,vfwredu,
         vfslide1up,vfslide1down")
    (and (eq_attr "mode" "VNx1HF,VNx2HF,VNx4HF,VNx8HF,VNx16HF,VNx32HF,VNx64HF")
         (match_test "!TARGET_ZVFH")))
     (const_string "yes")

;; The mode records as QI for the FP16 <=> INT8 instruction.
     (and (eq_attr "type" "vfncvtftoi,vfwcvtitof")
    (and (eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI")
         (match_test "!TARGET_ZVFH")))
     (const_string "yes")
   ]
   (const_string "no")))

When I slightly change the pattern as follows:
(define_expand "<optab><mode>2"
   [(set (match_operand:VF 0 "register_operand")
     (any_float_unop_nofrm:VF
      (match_operand:VF 1 "register_operand")))]
"TARGET_VECTOR && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)"
{
   insn_code icode = code_for_pred (<CODE>, <MODE>mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
})

That is, add && !(GET_MODE_INNER (<MODE>mode) == HFmode && !TARGET_ZVFH)
to the condition.

It works for both TARGET_ZVFH and TARGET_ZVFHMIN:
-march=rv64gcv_zvfhmin:
f:
         li      a4,2147450880
         li      a5,-2147450880
         addi    a4,a4,-1
         addi    a5,a5,1
         slli    a3,a5,32
         slli    a2,a4,32
         mv      a5,a4
         li      a4,-2147450880
         addi    a6,a1,200
         add     a3,a3,a4
         add     a2,a2,a5
.L2:
         ld      a5,0(a1)
         addi    a0,a0,8
         addi    a1,a1,8
         not     a4,a5
         and     a5,a5,a2
         and     a4,a4,a3
         sub     a5,a3,a5
         xor     a5,a4,a5
         sd      a5,-8(a0)
         bne     a1,a6,.L2
         ret

-march=rv64gcv_zvfh:
f:
         vsetivli        zero,8,e16,m1,ta,ma
         addi    a4,a1,16
         addi    a5,a0,16
         vle16.v v1,0(a1)
         vfneg.v v1,v1
         vse16.v v1,0(a0)
         addi    a2,a1,32
         addi    a3,a0,32
         vle16.v v1,0(a4)
         vfneg.v v1,v1
         vse16.v v1,0(a5)
         addi    a4,a1,48
         addi    a5,a0,48
         vle16.v v1,0(a2)
         vfneg.v v1,v1
         vse16.v v1,0(a3)
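
The effect of the extra condition can be summarised as a small truth function. The sketch below is only a model, not GCC code: the function name and its integer flags are hypothetical stand-ins for TARGET_VECTOR, TARGET_ZVFH and the test on the inner mode. It shows why the expander is rejected for FP16 element types when only the zvfhmin extension is available, which matches the scalar fallback seen in the first listing above.

```c
/* Hypothetical model of the proposed expander condition:
   TARGET_VECTOR && !(inner mode is HFmode && !TARGET_ZVFH).
   Each argument is 0 or 1.  */
int
unop_expander_enabled (int target_vector, int target_zvfh,
                       int inner_mode_is_hf)
{
  return target_vector && !(inner_mode_is_hf && !target_zvfh);
}
```

With zvfhmin only (target_zvfh == 0), the function returns 0 for FP16 element types and 1 for other floating-point element types; with zvfh it returns 1 in both cases.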
  

Re: [PATCH v2] RISC-V: Implement vec_set and vec_extract.

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 07:55, 钟居哲 wrote:

LGTM

OK for the trunk.  Sorry for the delays.
jeff


RE: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Friday, June 16, 2023 11:56 PM
To: juzhe.zh...@rivai.ai; Li, Pan2 ; gcc-patches 

Cc: Robin Dapp ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.



On 6/16/23 02:10, juzhe.zh...@rivai.ai wrote:
> LGTM. Thanks for fixing this bug.
> Let's wait for Jeff's final approval.
OK.

jeff


RE: [PATCH] RISC-V: Fix VL operand bug in VSETVL PASS[PR110264]

2023-06-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Friday, June 16, 2023 11:53 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix VL operand bug in VSETVL PASS[PR110264]



On 6/16/23 02:02, Juzhe-Zhong wrote:
> This patch fixes an issue that happens on both GCC-13 and GCC-14:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264
> 
> The testcase is too big and I failed to reduce it, so I didn't add
> a test to this patch.
> 
> This patch should not only land in GCC-14 but also be backported to
> GCC-13.
> 
>   PR target/110264
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.
OK.

Note, I've been swamped this week.  So things are moving a bit slower 
than I'd like on the review side.

jeff


Re: [PATCH] tree-optimization/110278 - uns < (typeof uns)(uns != 0) is always false

2023-06-16 Thread Andrew Pinski via Gcc-patches
On Fri, Jun 16, 2023 at 4:14 PM Andrew Pinski  wrote:
>
> On Fri, Jun 16, 2023 at 4:46 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > The following adds two patterns simplifying comparisons,
> > uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0)
> > is always true.
>
> A few more that should be done (I will file a bug in a few minutes):
> `x == (typeof x)(x == 0)` is always false.
> `x == (typeof x)(x != 0)` is `(unsigned_type)x <= 1`
> `x != (typeof x)(x != 0)` is `(unsigned_type)x > 1`
> `uns <= (typeof uns)(uns != 0)` -> `uns <= 1`
> `uns > (typeof uns)(uns != 0)` is `uns > 1`
> `uns >= (typeof uns)(uns != 0)` is always true
>
> That should be all of them, I think, and I think I did it correctly.

Filed as PR 110293.

Thanks,
Andrew

>
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
> >
> > PR tree-optimization/110278
> > * match.pd (uns < (typeof uns)(uns != 0) -> false): New.
> > (x != (typeof x)(x == 0) -> true): Likewise.
> > ---
> >  gcc/match.pd | 11 +++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 264f9cb8a40..48b76e6a051 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -6410,6 +6410,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (if (cmp == GT_EXPR)
> >   (lt (view_convert:st @0) { build_zero_cst (st); })))
> >
> > +/* unsigned < (typeof unsigned)(unsigned != 0) is always false.  */
> > +(simplify
> > + (lt:c @0 (convert (ne @0 integer_zerop)))
> > + (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> > +  { constant_boolean_node (false, type); }))
> > +
> > +/* x != (typeof x)(x == 0) is always true.  */
> > +(simplify
> > + (ne:c @0 (convert (eq @0 integer_zerop)))
> > + { constant_boolean_node (true, type); })
> > +
> >  (for cmp (unordered ordered unlt unle ungt unge uneq ltgt)
> >   /* If the second operand is NaN, the result is constant.  */
> >   (simplify
> > --
> > 2.35.3
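
The identities discussed in this thread (the two added by the patch and the six follow-ups) can be checked exhaustively for 8-bit types. The following is a minimal sketch, not part of the patch: the function name is made up for illustration, and it only covers signed char and unsigned char, relying on the usual integer promotions for the comparisons.

```c
#include <limits.h>

/* Returns the number of (value, identity) pairs for which an identity
   does not hold; 0 means every identity holds for every 8-bit value.  */
int
count_identity_failures (void)
{
  int failures = 0;

  /* Identities that hold for any integer x (checked on signed char).  */
  for (int v = SCHAR_MIN; v <= SCHAR_MAX; v++)
    {
      signed char x = (signed char) v;
      signed char eq0 = (signed char) (x == 0);
      signed char ne0 = (signed char) (x != 0);
      unsigned char ux = (unsigned char) x;

      failures += (x != eq0) != 1;          /* always true */
      failures += (x == eq0) != 0;          /* always false */
      failures += (x == ne0) != (ux <= 1);  /* (unsigned_type)x <= 1 */
      failures += (x != ne0) != (ux > 1);   /* (unsigned_type)x > 1 */
    }

  /* Identities that need an unsigned operand.  */
  for (int v = 0; v <= UCHAR_MAX; v++)
    {
      unsigned char u = (unsigned char) v;
      unsigned char ne0 = (unsigned char) (u != 0);

      failures += (u < ne0) != 0;           /* always false */
      failures += (u <= ne0) != (u <= 1);   /* uns <= 1 */
      failures += (u > ne0) != (u > 1);     /* uns > 1 */
      failures += (u >= ne0) != 1;          /* always true */
    }

  return failures;
}
```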


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc-patches
On Fri, Jun 16, 2023 at 15:48:59 -0400, Ben Boeckel wrote:
> On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> > This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> > please investigate?  Please let me know if you need any help in reproducing
> > these.
> 
> Finally back at it. I tried on aarch64, but wasn't able to reproduce the
> errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
> there hardware I can access to try this out on the same target triple?

Trying inside of an i386 container also came up with nothing…I'll try
qemu.

--Ben


Re: [PATCH] tree-optimization/110278 - uns < (typeof uns)(uns != 0) is always false

2023-06-16 Thread Andrew Pinski via Gcc-patches
On Fri, Jun 16, 2023 at 4:46 AM Richard Biener via Gcc-patches
 wrote:
>
> The following adds two patterns simplifying comparisons,
> uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0)
> is always true.

A few more that should be done (I will file a bug in a few minutes):
`x == (typeof x)(x == 0)` is always false.
`x == (typeof x)(x != 0)` is `(unsigned_type)x <= 1`
`x != (typeof x)(x != 0)` is `(unsigned_type)x > 1`
`uns <= (typeof uns)(uns != 0)` -> `uns <= 1`
`uns > (typeof uns)(uns != 0)` is `uns > 1`
`uns >= (typeof uns)(uns != 0)` is always true

That should be all of them, I think, and I think I did it correctly.

>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
> PR tree-optimization/110278
> * match.pd (uns < (typeof uns)(uns != 0) -> false): New.
> (x != (typeof x)(x == 0) -> true): Likewise.
> ---
>  gcc/match.pd | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 264f9cb8a40..48b76e6a051 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6410,6 +6410,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (cmp == GT_EXPR)
>   (lt (view_convert:st @0) { build_zero_cst (st); })))
>
> +/* unsigned < (typeof unsigned)(unsigned != 0) is always false.  */
> +(simplify
> + (lt:c @0 (convert (ne @0 integer_zerop)))
> + (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  { constant_boolean_node (false, type); }))
> +
> +/* x != (typeof x)(x == 0) is always true.  */
> +(simplify
> + (ne:c @0 (convert (eq @0 integer_zerop)))
> + { constant_boolean_node (true, type); })
> +
>  (for cmp (unordered ordered unlt unle ungt unge uneq ltgt)
>   /* If the second operand is NaN, the result is constant.  */
>   (simplify
> --
> 2.35.3


Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Joseph Myers
On Fri, 16 Jun 2023, Qing Zhao via Gcc-patches wrote:

> > So for 
> > 
> > struct foo { int c; int buf[(struct { int d; }){ .d = .c }]; };
> > 
> > one knows during parsing that the .d is a designator
> > and that .c is not.
> 
> Therefore, the above should be invalid based on this rule since .c is 
> not a member in the current structure.

What do you mean by "current structure"?  I think two different concepts 
are being conflated: the structure *being initialized* (what the C 
standard calls the "current object" for a brace-enclosed initializer 
list), and the structure *being defined*.  The former is what's relevant 
for designators.  The latter is what's relevant for the suggested new 
syntax.  And .c *is* a member of the structure being defined in this 
example.

Those two structure types are always different, except for corner cases 
with C2x tag compatibility (where an object of structure type might be 
initialized in the middle of a redefinition of that type).

-- 
Joseph S. Myers
jos...@codesourcery.com
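
For reference, the existing designator rule can be seen in plain C99, without any of the proposed syntax. In the sketch below (struct and function names are made up for illustration), each brace-enclosed list has its own current object: inside the list for `in`, only members of struct inner can be designated, while `.c` belongs to the enclosing list for struct outer.

```c
struct inner { int d; };
struct outer { int c; struct inner in; };

/* The outer braces initialize a struct outer, so .c and .in are valid
   designators there.  The inner braces initialize a struct inner, so
   only .d is valid inside them.  */
int
sum_designated (void)
{
  struct outer o = { .c = 4, .in = { .d = 2 } };
  return o.c + o.in.d;
}
```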


Re: [committed] libgomp: Fix OMP_TARGET_OFFLOAD=mandatory

2023-06-16 Thread Thomas Schwinge
Hi Tobias!

On 2023-06-16T17:57:10+0200, Tobias Burnus  wrote:
> Found a problem caused by my r14-1801-g18c8b56c7d67a9 due to
> ordering issues related to the offloading initialization
> (gomp_init_targets_once).
>
> The testsuite did test various ways, but only such code paths that
> initialized the library before ...
>
> Committed as Rev. r14-1893-g8216ca85037be9.

> commit 8216ca85037be9f4d5c20540522a22a4a93b660e
> Author: Tobias Burnus 
> Date:   Fri Jun 16 17:21:59 2023 +0200
>
> libgomp: Fix OMP_TARGET_OFFLOAD=mandatory
>
> It turned out that gomp_init_targets_once() was not run when directly
> calling 'omp target' or 'omp target (enter/exit) data' causing an
> abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no
> device is available. It was called a tiny bit later but a few lines too
> late for updating the default-device-var.
>
> libgomp/ChangeLog:
>
> * target.c (resolve_device): Call gomp_get_num_devices early to ensure
> gomp_init_targets_once was called before using default-device-var.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -138,6 +138,10 @@ gomp_get_num_devices (void)
>  static struct gomp_device_descr *
>  resolve_device (int device_id, bool remapped)
>  {
> +  /* Get number of devices and thus ensure that 'gomp_init_targets_once' was
> + called, which must be done before using default_device_var.  */
> +  int num_devices = gomp_get_num_devices ();
> +
>if (remapped && device_id == GOMP_DEVICE_ICV)
>  {
>struct gomp_task_icv *icv = gomp_icv (false);
> @@ -151,7 +155,7 @@ resolve_device (int device_id, bool remapped)
>: omp_initial_device))
>   return NULL;
>if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
> -   && gomp_get_num_devices () == 0)
> +   && num_devices == 0)
>   gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
>   "but only the host device is available");
>else if (device_id == omp_invalid_device)
> @@ -162,10 +166,10 @@ resolve_device (int device_id, bool remapped)
>
>return NULL;
>  }
> -  else if (device_id >= gomp_get_num_devices ())
> +  else if (device_id >= num_devices)
>  {
>if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
> -   && device_id != num_devices_openmp)
> +   && device_id != num_devices)
>   gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
>   "but device not found");

I see the new tests PASS, but with offloading enabled (nvptx) also see:

PASS: libgomp.c/target-51.c (test for excess errors)
PASS: libgomp.c/target-51.c execution test
[-PASS:-]{+FAIL:+} libgomp.c/target-51.c output pattern test

... due to:

Output was:

libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device cannot be used 
for offloading

Should match:
.*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but device not found.*


Grüße
 Thomas


> diff --git a/libgomp/testsuite/libgomp.c/target-55.c 
> b/libgomp/testsuite/libgomp.c/target-55.c
> new file mode 100644
> index 000..1314b3c6963
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-55.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run { target { offload_device } } } */
> +/* { dg-set-target-env-var OMP_TARGET_OFFLOAD "mandatory" } */
> +
> +/* Should pass - see target-55a.c for !offload_device */
> +
> +/* Check OMP_TARGET_OFFLOAD - it shall run on systems with offloading
> +   devices available and fail otherwise.  Note that this did always
> +   fail - as the device handling wasn't initialized before doing the
> +   mandatory checking.  */
> +
> +int
> +main ()
> +{
> +  int x = 1;
> +  #pragma omp target map(tofrom: x)
> +x = 5;
> +  if (x != 5)
> +__builtin_abort ();
> +  return 0;
> +}
> diff --git a/libgomp/testsuite/libgomp.c/target-55a.c 
> b/libgomp/testsuite/libgomp.c/target-55a.c
> new file mode 100644
> index 000..53978c3f405
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-55a.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run { target { ! offload_device } } } */
> +/* { dg-set-target-env-var OMP_TARGET_OFFLOAD "mandatory" } */
> +
> +/* Should fail - see target-55a.c for offload_device */
> +
> +/* { dg-shouldfail "omp_invalid_device" } */
> +/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only 
> the host device is available.*" } */
> +
> +/* Check OMP_TARGET_OFFLOAD - it shall run on systems with offloading
> +   devices available and fail otherwise.  Note that this did always
> +   fail - as the device handling wasn't initialized before doing the
> +   mandatory checking.  */
> +
> +int
> +main ()
> +{
> +  int x = 1;
> +  #pragma omp target map(tofrom: x)
> +x = 5;
> +  if (x != 5)
> +__builtin_abort ();
> +  return 0;
> +}
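
The ordering problem fixed by this commit can be reduced to a toy model of lazy initialization. This is only a sketch under stated assumptions: none of the names below are libgomp's, and the "two devices found" result is invented. It shows that a value updated by the one-time initialization must not be read before the call that triggers that initialization.

```c
/* All names here are hypothetical stand-ins, not libgomp symbols.  */
static int initialized;
static int num_devices;
static int default_device = -1;   /* stale until initialization runs */

/* Test helper: return to the uninitialized state.  */
void
reset_state (void)
{
  initialized = 0;
  num_devices = 0;
  default_device = -1;
}

/* One-time initialization; it also updates the default device.  */
static void
init_targets_once (void)
{
  if (initialized)
    return;
  initialized = 1;
  num_devices = 2;      /* pretend two offload devices were found */
  default_device = 0;
}

static int
get_num_devices (void)
{
  init_targets_once ();
  return num_devices;
}

/* Buggy order: the default device is read before initialization.  */
int
resolve_default_buggy (void)
{
  int dev = default_device;
  (void) get_num_devices ();
  return dev;
}

/* Fixed order: force initialization first, then read the default.  */
int
resolve_default_fixed (void)
{
  (void) get_num_devices ();
  return default_device;
}
```

Starting from the uninitialized state, the buggy variant returns the stale value -1 while the fixed variant returns 0.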

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Qing Zhao via Gcc-patches



> On Jun 16, 2023, at 1:07 PM, Martin Uecker  wrote:
> 
> Am Freitag, dem 16.06.2023 um 16:21 + schrieb Joseph Myers:
>> On Fri, 16 Jun 2023, Martin Uecker via Gcc-patches wrote:
>> 
>>>> Note that no expressions can start with the '.' token at present.  As soon
>>>> as you invent a new kind of expression that can start with that token, you
>>>> have syntactic ambiguity.
>>>>
>>>> struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; };
>>>>
>>>> Is ".c=.c" a use of the existing syntax for designated initializers, with
>>>> the first ".c" being a designator and the second being a use of the new
>>>> kind of expression, or is it an assignment expression, where both the LHS
>>>> and the RHS of the assignment use the new kind of expression?  And do
>>>> those .c, when they use the new kind of expression, refer to the inner or
>>>> outer struct definition?
>>> 
>>> I would treat this as one integrated feature. Essentially .c is
>>> something like this->c for the current struct for designated
>>> initializers *and* size expressions because it is semantically
>>> so close. In the initializer I would allow only
>>> the current use for designated initialization for all names of
>>> members of the currently initialized struct, so .c = .c would
>>> be invalid.   It should never refer to the outer struct if there
>> 
>> I'm not clear on what the intended disambiguation rule here is, when "." 
>> is seen in initializer list context - does this rule depend on whether the 
>> following identifier is a member of the struct being initialized, so 
>> ".c=.c" would be OK above if the initialized struct didn't have a member 
>> called c but the outer struct definition did? 
> 
> When initializers are parsed it is already clear what
> the names of the members of the inner struct are, so
> one can differentiate between designated initializers 
> and potential other uses in an expression. 
> 
> So the main rule is: if you parse .something in a context
> where a designator is allowed and "something" is a member
> of the current struct, then it is a designator.

So, limiting .something to ONLY the CURRENT structure/union might be a
simple and clean rule.

And I guess that this is also the rule for the current designated initializer
syntax in C99?

> 
> So for 
> 
> struct foo { int c; int buf[(struct { int d; }){ .d = .c }]; };
> 
> one knows during parsing that the .d is a designator
> and that .c is not.

Therefore, the above should be invalid based on this rule since .c is not a 
member in the current structure.


> For
> 
> struct foo { int c; int buf[(struct { int d; }){ .c = .c }]; };
> 
> one knows that both uses of .c are not.

And this is also invalid since .c is not a member of the current structure.

> 
> Whether these different use cases should be allowed or not
> is a different question, but my point is that there does
> not seem to be a problem directly identifying the uses 
> as a designator as usual. To me, this seems to imply that
> it is safe to use the same syntax.
> 
>> That seems like a rather 
>> messy rule.  And does "would allow only" apply other than in the ambiguous 
>> context?  That seems to be implied by ".c=.c" being invalid above, because 
>> to make it invalid you need to disallow the new construct being used for 
>> the second .c, not just make the first .c interpreted as a designator.
> 
> Yes. 
>> 
>> Again, this sort of thing needs a detailed written specification, with 
>> multiple iterations discussed among different implementations. 
> 
> Oh, I agree with this.
> 
>> The above 
>> paragraph doesn't make clear to me any of: the disambiguation rules; what 
>> is allowed in what context; how name lookup works (consider tricky cases 
>> such as a reference to an identifier declared *later* in the same struct, 
>> possibly in the context of C2x tag compatibility where a previous 
>> definition of the struct is visible); when these expressions get 
>> evaluated; what the underlying principles are behind those choices.
> 
> I also agree that all this needs careful consideration and written
> rules.  My point is merely that there does not seem to be a
> fundamental issue differentiating the new feature from 
> designators during parsing, so there may not be a risk using 
> the same syntax.

Yes, I agree on this. 
Extending the existing designated initializer syntax, .member, for the purpose 
of the argument of the new attribute seems very natural. 
If we can use this syntax in the argument of this new attribute, 
1. It will be easy to extend the argument of this attribute to an expression.
2. It will also be easy to use this syntax later if we accept the following:

struct foo {
...
unsigned int count;
...
int data[.count];
};


thanks.

Qing
> 
>> Using a token (existing or new) other than '.' - one that doesn't 
>> introduce ambiguity in any context where expressions can be used - would 
>> help significantly, although some of the 

[r14-1873 Regression] FAIL: 25_algorithms/set_union/constrained.cc (test for excess errors) on Linux/x86_64

2023-06-16 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

6a2e8dcbbd4bab374b27abea375bf7a921047800 is the first bad commit
commit 6a2e8dcbbd4bab374b27abea375bf7a921047800
Author: Manolis Tsamis 
Date:   Thu May 25 13:44:41 2023 +0200

cprop_hardreg: Enable propagation of the stack pointer if possible

caused

FAIL: 25_algorithms/set_union/constrained.cc (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-1873/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libstdc++-v3/testsuite && make check 
RUNTESTFLAGS="conformance.exp=25_algorithms/set_union/constrained.cc 
--target_board='unix{-m64}'"

(Please do not reply to this email. For questions about this report, contact me
at haochen dot jiang at intel.com.)


Re: [PATCH v6 0/4] P1689R5 support

2023-06-16 Thread Ben Boeckel via Gcc-patches
On Thu, Jun 08, 2023 at 21:59:13 +0400, Maxim Kuvyrkov wrote:
> This patch series causes ICEs on arm-linux-gnueabihf.  Would you
> please investigate?  Please let me know if you need any help in reproducing
> these.

Finally back at it. I tried on aarch64, but wasn't able to reproduce the
errors (alas, it is probably a 32bit thing…let me try with `-m32`). Is
there hardware I can access to try this out on the same target triple?

Alternatively, a backtrace may be able to help pinpoint it enough if you
have the cycles.

Thanks,

--Ben


Re: [PATCH] ipa-sra: Disable candidates with no known callers (PR 110276)

2023-06-16 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> In IPA-SRA we use can_be_local_p () predicate rather than just plain
> local call graph flag in order to figure out whether the node is a
> part of an external API that we cannot change.  Although there are
> cases where this can allow more transformations, it also means we can
> analyze functions which have no callers at all, which is pointless.

Do we also have some cost model so that we do not privatize very large
comdats?
> 
> Moreover, it makes an assert of hint propagation trigger, which checks
> that we have looked at callers before processing hints that come from
> them.  This has been reported as PR 110276.
> 
> This patch simply adds a check that a node has at least one caller
> into the early checks and makes the node a non-candidate for any
> transformation if it does not.
> 
> Bootstrapped and tested on x86_64-linux, LTO bootstrap is still
> underway.  OK if it passes too?

OK. (It is run during WPA, not during early compilation, right?)

Honza
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2023-06-16  Martin Jambor  
> 
>   PR ipa/110276
>   * ipa-sra.cc (struct caller_issues): New field there_is_one.
>   (check_for_caller_issues): Set it.
>   (check_all_callers_for_issues): Check it.
> 
> gcc/testsuite/ChangeLog:
> 
> 2023-06-16  Martin Jambor  
> 
>   PR ipa/110276
>   * gcc.dg/ipa/pr110276.c: New test.
> ---
>  gcc/ipa-sra.cc  | 11 +++
>  gcc/testsuite/gcc.dg/ipa/pr110276.c | 15 +++
>  2 files changed, 26 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110276.c
> 
> diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
> index 3fee8fb22ce..21d281a9756 100644
> --- a/gcc/ipa-sra.cc
> +++ b/gcc/ipa-sra.cc
> @@ -3074,6 +3074,8 @@ struct caller_issues
>cgraph_node *candidate;
>/* There is a thunk among callers.  */
>bool thunk;
> +  /* Set if there is at least one caller that is OK.  */
> +  bool there_is_one;
>/* Call site with no available information.  */
>bool unknown_callsite;
>/* Call from outside the candidate's comdat group.  */
> @@ -3116,6 +3118,8 @@ check_for_caller_issues (struct cgraph_node *node, void 
> *data)
>  
>if (csum->m_bit_aligned_arg)
>   issues->bit_aligned_aggregate_argument = true;
> +
> +  issues->there_is_one = true;
>  }
>return false;
>  }
> @@ -3170,6 +3174,13 @@ check_all_callers_for_issues (cgraph_node *node)
>for (unsigned i = 0; i < param_count; i++)
>   (*ifs->m_parameters)[i].split_candidate = false;
>  }
> +  if (!issues.there_is_one)
> +{
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "There is no call to %s that we can modify.  "
> +  "Disabling all modifications.\n", node->dump_name ());
> +  return true;
> +}
>return false;
>  }
>  
> diff --git a/gcc/testsuite/gcc.dg/ipa/pr110276.c 
> b/gcc/testsuite/gcc.dg/ipa/pr110276.c
> new file mode 100644
> index 000..5a1e2f3fb1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/ipa/pr110276.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef long (*EFI_PCI_IO_PROTOCOL_CONFIG)();
> +typedef struct {
> +  EFI_PCI_IO_PROTOCOL_CONFIG Read;
> +} EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS;
> +typedef struct {
> +  EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS Pci;
> +} EFI_PCI_IO_PROTOCOL;
> +int init_regs_0;
> +static void __attribute__((constructor)) init(EFI_PCI_IO_PROTOCOL *pci_io) {
> +  if (init_regs_0)
> +pci_io->Pci.Read();
> +}
> -- 
> 2.40.1
> 
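
The idea of the patch can be modelled in a few lines of C. This is a simplified sketch, not the ipa-sra.cc code: the caller record, the single "thunk" issue and the function name are assumptions made for illustration. It shows how a flag set while walking the callers lets the final check reject a node that has no usable caller at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified caller record.  */
struct caller
{
  bool is_thunk;
};

/* Simplified counterpart of the issues structure in the patch.  */
struct caller_issues
{
  bool thunk;
  bool there_is_one;   /* at least one caller that is OK */
};

/* Returns true when the node must not be a candidate for any
   transformation.  */
bool
callers_have_issues (const struct caller *callers, size_t n)
{
  struct caller_issues issues = { false, false };

  for (size_t i = 0; i < n; i++)
    {
      if (callers[i].is_thunk)
        issues.thunk = true;
      else
        issues.there_is_one = true;
    }

  if (issues.thunk)
    return true;
  /* New check: a node with no usable caller is not worth analyzing.  */
  if (!issues.there_is_one)
    return true;
  return false;
}
```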


Re: [PATCH zero-call-used-regs] Add leafy mode for zero-call-used-regs

2023-06-16 Thread Qing Zhao via Gcc-patches
Hi, Alexandre,

> On Jun 16, 2023, at 3:26 AM, Alexandre Oliva  wrote:
> 
> Hello, Qing,
> 
> On Oct 27, 2022, Qing Zhao  wrote:
> 
> 
>> On Oct 26, 2022, at 5:29 PM, Alexandre Oliva  wrote:
>>> I'm sure there are other scenarios in which keeping at least the
>>> possibility of 'all' is useful.
>> Okay.
> 
> 
>> i.e., instead of introducing a new MODE “LEAFY_MODE” and a new user
>> sub-option, for LEAF functions, only
>> clear their used registers even for “ALL”.
> 
>> However, since there is a need to clear the unused registers for leaf
>> functions, it looks like it is necessary to provide
>> this new sub-option to users.
> 
>> Is this clear this time?
> 
> Yeah, I guess I understand what you mean.  But since there are cases in
> which clearing all (call-clobbered) registers in a leaf function is
> useful, I suppose it makes sense to offer both possibilities.
agreed.
> 
> If there was a default operation mode for -fzero-call-used-regs, I guess
> it would make sense to consider leafy the default, rather than all, but
> since there isn't, and it always has to be specified explicitly, that's
> not something to be considered.
> 
> So the available choices are:
> 
> 1. introduce 'leafy' as a separate mode, leaving 'all' alone
> 
> 2. change the behavior of 'all' to that of the proposed 'leafy', and either
> 
> 2.a) add another mode that retains the currently-useful behavior of 'all',
>   or
> 
> 2.b) make the current behavior of 'all' no longer available
> 
> Personally, I find 1. the least disruptive to existing users of
> -fzero-call-used-regs.  If we were introducing the option now, maybe 2.a
> would be more sensible, but at this point, changing the meaning of 'all'
> seems to be a disservice to security-sensitive users.
> 
> Those who would prefer the leaner operation on leaf functions can then
> switch to 'leafy' mode, but that's better than finding carefully-crafted
> code relying on the current behavior of 'all' for security suddenly
> changes from under them, isn't it?

Yes, I agree.
> 
> 
> That said, I'm willing to implement the alternate change, if changing
> the expected behavior is preferred over offering a different choice, if
> needed to get closure on this feature.
> 
> For now, I'm just pinging the refreshed and retested patch.

As I mentioned in the previous round of review, I think the documentation
needs more detail on what the LEAFY mode is, what its purpose is, and
how to use it, so that end users have enough information.
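
To make the request concrete, here is a minimal sketch of the selection
that 'leafy' is described as making in this thread ('used' in a leaf
function, 'all' in a nonleaf one).  The enum and function names below
are hypothetical and only model the described behavior; this is not the
code in function.cc.

```c
#include <stdbool.h>

/* Hypothetical model of three of the user-visible modes discussed in
   this thread; the names are made up for illustration.  */
enum zero_mode { ZERO_USED, ZERO_ALL, ZERO_LEAFY };

/* Return the mode that effectively applies to one function.
   'leafy' resolves to 'used' for a leaf function and to 'all'
   otherwise; every other mode is returned unchanged.  */
static enum zero_mode
effective_zero_mode (enum zero_mode requested, bool is_leaf)
{
  if (requested != ZERO_LEAFY)
    return requested;
  return is_leaf ? ZERO_USED : ZERO_ALL;
}
```

Under this model, code that relies on today's 'all' is unaffected, and
only functions that explicitly ask for 'leafy' get the leaner zeroing.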


> Ok to install?
> 
> 
> Add leafy mode for zero-call-used-regs
> 
> Introduce 'leafy' to auto-select between 'used' and 'all' for leaf and
> nonleaf functions, respectively.
> 
> 
> for  gcc/ChangeLog
> 
>   * doc/extend.texi (zero-call-used-regs): Document leafy and
>   variants thereof.
>   * flag-types.h (zero_regs_flags): Add LEAFY_MODE, as well as
>   LEAFY and variants.
>   * function.cc (gen_call_used_regs_seq): Set only_used for leaf
>   functions in leafy mode.
>   * opts.cc (zero_call_used_regs_opts): Add leafy and variants.
> 
> for  gcc/testsuite/ChangeLog
> 
>   * c-c++-common/zero-scratch-regs-leafy-1.c: New.
>   * c-c++-common/zero-scratch-regs-leafy-2.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-1.c: New.
>   * gcc.target/i386/zero-scratch-regs-leafy-2.c: New.
> ---
> gcc/doc/extend.texi|   22 ++--
> gcc/flag-types.h   |5 +
> gcc/function.cc|3 +++
> gcc/opts.cc|4 
> .../c-c++-common/zero-scratch-regs-leafy-1.c   |   15 ++
> .../c-c++-common/zero-scratch-regs-leafy-2.c   |   21 +++
> .../gcc.target/i386/zero-scratch-regs-leafy-1.c|   12 +++
> .../gcc.target/i386/zero-scratch-regs-leafy-2.c|   16 +++
> 8 files changed, 96 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-leafy-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-leafy-2.c
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 7b5592502734e..f8b0bb53ef5d4 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

I think that in the documentation of zero_call_used_regs, the description of
the new value “leafy” needs to be added right after the description of the
three basic values “skip”, “used”, and “all”, in addition to the doc change
below.

The rest LGTM.

Thanks.

Qing

> @@ -4412,10 +4412,28 @@ zeros all call-used registers that pass arguments.
> @item all-gpr-arg
> zeros all call-used general purpose registers that pass
> arguments.
> +
> +@item leafy
> +Same as @samp{used} in a leaf 

libgo patch committed: Add benchmarks and examples to test list

2023-06-16 Thread Ian Lance Taylor via Gcc-patches
In https://go.dev/cl/384695
(https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590289.html)
I simplified the code that built lists of benchmarks, examples, and
fuzz tests, and managed to break it. This patch corrects the code to
once again make the benchmarks available, and to run the examples with
output and the fuzz targets.

Doing this revealed a test failure in internal/fuzz on 32-bit x86: a
signalling NaN is turned into a quiet NaN on the 387 floating-point
stack that GCC uses by default. This CL skips the test.
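
For readers unfamiliar with the distinction, here is a rough sketch in C
of the bit-level difference, assuming the usual x86 binary64 encoding in
which the top fraction bit is the quiet bit.  The helper names and the
example bit patterns are mine, not taken from the test; the last helper
only models the effect on the bits and does not touch the FPU.

```c
#include <stdbool.h>
#include <stdint.h>

#define F64_EXP_MASK  UINT64_C (0x7FF0000000000000) /* exponent field */
#define F64_FRAC_MASK UINT64_C (0x000FFFFFFFFFFFFF) /* fraction field */
#define F64_QUIET_BIT UINT64_C (0x0008000000000000) /* top fraction bit */

/* A NaN has an all-ones exponent and a nonzero fraction.  */
static bool
f64_bits_is_nan (uint64_t bits)
{
  return (bits & F64_EXP_MASK) == F64_EXP_MASK
	 && (bits & F64_FRAC_MASK) != 0;
}

/* A signaling NaN is a NaN whose quiet bit is clear.  */
static bool
f64_bits_is_signaling_nan (uint64_t bits)
{
  return f64_bits_is_nan (bits) && (bits & F64_QUIET_BIT) == 0;
}

/* Model of the quieting described above: a NaN comes back with the
   quiet bit set, any other value comes back unchanged.  */
static uint64_t
f64_bits_quieted (uint64_t bits)
{
  return f64_bits_is_nan (bits) ? (bits | F64_QUIET_BIT) : bits;
}
```

A round-trip test that compares bit patterns therefore cannot pass for
a signaling NaN input once the value has been quieted.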

This fixes https://go.dev/issue/60826.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
bc6bd0d608da1609c1caeb04ab795a83720add55
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 702257009d2..1191a8d663d 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-737de90a63002d4872b19772a7116404ee5815b4
+a3a3c3a2d1bc6a8ca51b302d08c94ef27cdd8f0f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/internal/fuzz/encoding_test.go 
b/libgo/go/internal/fuzz/encoding_test.go
index 8e3800eb77f..53fc5b8dc71 100644
--- a/libgo/go/internal/fuzz/encoding_test.go
+++ b/libgo/go/internal/fuzz/encoding_test.go
@@ -6,6 +6,7 @@ package fuzz
 
 import (
"math"
+   "runtime"
"strconv"
"testing"
"unicode"
@@ -330,6 +331,14 @@ func FuzzFloat64RoundTrip(f *testing.F) {
f.Add(math.Float64bits(math.Inf(-1)))
 
f.Fuzz(func(t *testing.T, u1 uint64) {
+   // The signaling NaN test fails on 32-bit x86 with gccgo,
+   // which uses the 387 floating-point stack by default.
+   // Converting a signaling NaN in and out of the stack
+   // changes the NaN to a quiet NaN.
+   if runtime.GOARCH == "386" && u1 == 0x7FF1 {
+   t.Skip("skipping signalling NaN test on 386 with gccgo")
+   }
+
x1 := math.Float64frombits(u1)
 
b := marshalCorpusFile(x1)
diff --git a/libgo/testsuite/gotest b/libgo/testsuite/gotest
index 0a0a7e14d74..33c98d804d6 100755
--- a/libgo/testsuite/gotest
+++ b/libgo/testsuite/gotest
@@ -577,13 +577,13 @@ symtogo() {
 # Find Go benchmark/fuzz/example functions.
 # The argument is the function name prefix.
 findfuncs() {
-   pattern='$1([^a-z].*)?'
+   pattern="$1([^a-z].*)?"
syms=$($NM -p -v _gotest_.o | egrep " $text .*\."$pattern'$' | fgrep -v 
' __go_' | egrep -v '\.\.\w+$' | sed 's/.* //')
if $havex; then
xsyms=$($NM -p -v $xofile | egrep " $text .*\."$pattern'$' | fgrep 
-v ' __go_' | egrep -v '\.\.\w+$' | sed 's/.* //')
syms="$syms $xsyms"
fi
-$(symtogo "$benchmarksyms")
+symtogo "$syms"
 }
 
 # Takes an example name and puts any output into the file example.txt.
@@ -643,11 +643,13 @@ exampleoutput() {
fi
if $havex; then
needxtest=false
-   if test -n "$testxsyms" -o -n "$benchmarkxsyms"; then
+   if test -n "$testxsyms"; then
+   needxtest=true
+   elif echo "$benchmarks" | grep '_test\.' >/dev/null; then
needxtest=true
else
# Check whether any example has output.
-   for i in $(symtogo "$examplexsyms"); do
+   for i in $(echo "$examples" | grep '_test\.'); do
exampleoutput $i
if test -f example.txt; then
rm -f example.txt


[PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches
Kewen, GCC maintainers:

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (V2DI_DI): New mode iterator.
Rename xsxexpqp_ to xsxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/scalar-extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/scalar-insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 .../powerpc/bfp/scalar-extract-exp-8.c|  58 ++
 .../powerpc/bfp/scalar-extract-sig-8.c|  65 +++
 .../powerpc/bfp/scalar-insert-exp-16.c| 103 ++
 9 files changed, 307 insertions(+), 26 deletions(-)
 create mode 100644 

Re: [PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches
On Thu, 2023-06-15 at 14:23 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/15 04:37, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 4, added missing cases for new xxexpqp, xsxexpdp and
> > xsxsigqp
> > cases to rs6000_expand_builtin.  Merged the new define_insn
> > definitions
> > with the existing definitions.  Renamed the builtins by removing
> > the
> > __builtin_ prefix from the names.  Fixed the documentation for the
> > builtins.  Updated the test files to check the desired instructions
> > were generated.  Retested patch on Power 10 with no regressions.
> > 
> > Version 3, was able to get the overloaded version of
> > scalar_insert_exp
> > to work and the change to xsxexpqp_f128_ define instruction
> > to
> > work with the suggestions from Kewen.  
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitiions.
> > 
> > The patch add three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> 
> I'd suggest you to test this on P9 BE as well to ensure the test case
> to work well on BE too.

Tested on P9 BE.  Updated test cases for the correct expected BE and LE
results.

> 
> >Carl
> > 
> > 
> > 
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int scalar_extract_exp_to_vec
> > (__ieee128);
> >  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);
> > 
> > These builtins were requesed since there is no clean and performant
> > way to
> 
> s/requesed/requested/

Fixed.

> 
> > transfer a value from a vector type and scalar type, despite the
> > fact
> 
> Describe it oppositely?  As the related existing bifs returns scalar
> type,
> the users want them in vector type, so it's "from scalar type to
> vector
> type"?

Updated the description.

> 
> > that they both reside in vector registers.
> 
> the fact is the related hardware insns have vsx registers
> destination.
> 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > Rename CCDE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
> > CODE_FOR_xsiexpqp_kf_v2di   ): Add case statements.
> 
> unnecessary tab.

Fixed.

> 
> > * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf, xsxsigqp_kf, xxsiexpqp_kf to xsexpqp_kf_di,
> 
> typo, xxsiexpqp_kf => xsiexpqp_kf
> 
> > xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> 
> May be better with "Update RS6000_OVLD_VEC_VSIE handling for
> MODE_VECTOR_INT
> which is used for newly added overloaded instance"?

Changed.

> 
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > odverloaded definitions.
> 
> s/odverloaded/overloaded/

Fixed.

> 
> > * config/vsx.md (VSEEQP_DI, VSESQP_TI): New mode iterators.
> > (VSEEQP_DI_base): New mode attribute definition.
> > Rename xsxexpqp_ to
> > sxexpqp__.
> > Rename xsxsigqp_ to
> > xsxsigqp__.
> > Rename xsiexpqp_ to
> > xsiexpqp__.
> > (xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn
> > for
> > new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig): Add documentation for new builtins.
> > (scalar_insert_exp): Add new overloaded builtin definition.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-1.c: New test case.

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Martin Uecker via Gcc-patches
Am Freitag, dem 16.06.2023 um 16:21 + schrieb Joseph Myers:
> On Fri, 16 Jun 2023, Martin Uecker via Gcc-patches wrote:
> 
> > > Note that no expressions can start with the '.' token at present.  As 
> > > soon 
> > > as you invent a new kind of expression that can start with that token, 
> > > you 
> > > have syntactic ambiguity.
> > > 
> > > struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; 
> > > };
> > > 
> > > Is ".c=.c" a use of the existing syntax for designated initializers, with 
> > > the first ".c" being a designator and the second being a use of the new 
> > > kind of expression, or is it an assignment expression, where both the LHS 
> > > and the RHS of the assignment use the new kind of expression?  And do 
> > > those .c, when the use the new kind of expression, refer to the inner or 
> > > outer struct definition?
> > 
> > I would treat this as one integrated feature. Essentially .c is
> > something like this->c for the current struct for designated
> > initializer *and* size expressions because it is semantically 
> > so close.  In the initializer I would allow only 
> > the current use for designated initialization for all names of
> > members of the currently initialized struct,  so .c = .c would 
> > be invalid.   It should never refer to the outer struct if there
> 
> I'm not clear on what the intended disambiguation rule here is, when "." 
> is seen in initializer list context - does this rule depend on whether the 
> following identifier is a member of the struct being initialized, so 
> ".c=.c" would be OK above if the initialized struct didn't have a member 
> called c but the outer struct definition did? 

When initializers are parsed it is already clear what
the names of the members of the inner struct are, so
one can differentiate between designated initializers 
and potential other uses in an expression. 

So the main rule is: if you parse .something in a context
where a designator is allowed and "something" is a member
of the current struct, then it is a designator.

So for 

struct foo { int c; int buf[(struct { int d; }){ .d = .c }]; };

one knows during parsing that the .d is a designator
and that .c is not. For

struct foo { int c; int buf[(struct { int d; }){ .c = .c }]; };

one knows that both uses of .c are not.

Whether these different use cases should be allowed or not
is a different question, but my point is that there does
not seem to be a problem directly identifying the uses 
as a designator as usual. To me, this seems to imply that
it is safe to use the same syntax.
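
As a rough sketch of the main rule above (not of any real parser), the
decision for a ".name" token could be modeled as below.  All names are
hypothetical, and "members" stands for the member list of the struct
currently being initialized, which is assumed to be known at that point.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a ".name" token.  */
enum dot_use { DOT_DESIGNATOR, DOT_MEMBER_REFERENCE };

/* Return true if NAME is one of the N entries of MEMBERS.  */
static bool
has_member (const char *const *members, size_t n, const char *name)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (members[i], name) == 0)
      return true;
  return false;
}

/* ".name" is a designator only where a designator is allowed and NAME
   is a member of the struct currently being initialized; otherwise it
   is treated as the new kind of member reference.  */
static enum dot_use
classify_dot_name (bool designator_allowed,
		   const char *const *members, size_t n_members,
		   const char *name)
{
  if (designator_allowed && has_member (members, n_members, name))
    return DOT_DESIGNATOR;
  return DOT_MEMBER_REFERENCE;
}
```

With members = { "d" }, ".d" at the start of an initializer element is
classified as a designator and ".c" is not, matching the two examples
above.  Whether the non-designator uses should be accepted at all is a
separate question that this sketch does not answer.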

>  That seems like a rather 
> messy rule.  And does "would allow only" apply other than in the ambiguous 
> context?  That seems to be implied by ".c=.c" being invalid above, because 
> to make it invalid you need to disallow the new construct being used for 
> the second .c, not just make the first .c interpreted as a designator.

Yes. 
> 
> Again, this sort of thing needs a detailed written specification, with 
> multiple iterations discussed among different implementations. 

Oh, I agree with this.

>  The above 
> paragraph doesn't make clear to me any of: the disambiguation rules; what 
> is allowed in what context; how name lookup works (consider tricky cases 
> such as a reference to an identifier declared *later* in the same struct, 
> possibly in the context of C2x tag compatibility where a previous 
> definition of the struct is visible); when these expressions get 
> evaluated; what the underlying principles are behind those choices.

I also agree that all this needs careful consideration and written
rules.  My point is merely that there does not seem to be a
fundamental issue differentiating the new feature from 
designators during parsing, so there may not be a risk using 
the same syntax.

> Using a token (existing or new) other than '.' - one that doesn't 
> introduce ambiguity in any context where expressions can be used - would 
> help significantly, although some of the issues would still apply.

The cost of using a new symbol is that one has two different
syntaxes for something which is semantically equivalent, i.e.
a notation to refer to a member of the current struct.

Martin

> 




Re: [PATCH] builtins: Add support for clang compatible __builtin_{add, sub}c{, l, ll} [PR79173]

2023-06-16 Thread Richard Biener via Gcc-patches



> Am 16.06.2023 um 16:34 schrieb Jakub Jelinek :
> 
> Hi!
> 
> While the design of these builtins in clang is questionable,
> rather than being say
> unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
> so that it is clear they add two [0, 0x] range numbers
> plus one [0, 1] range carry in and give [0, 0x] range
> return plus [0, 1] range carry out, they actually instead
> add 3 [0, 0x] values together but the carry out
> isn't then the expected [0, 2] value because
> 0xULL + 0x + 0x is 0x2fffd,
> but just [0, 1] whether there was any overflow at all.
> 
> It is something used in the wild and shorter to write than the
> corresponding
> #define __builtin_addc(a,b,carry_in,carry_out) \
>  ({ unsigned _s; \
> unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
> unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
> *(carry_out) = (_c1 | _c2); \
> _s; })
> and so a canned builtin for something people could often use.
> It isn't that hard to maintain on the GCC side, as we just lower
> it to two .ADD_OVERFLOW calls early, and the already committed
> pattern recognition code can then make .UADDC/.USUBC calls out of
> that if the carry in is in [0, 1] range and the corresponding
> optab is supported by the target.
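
For reference, the macro quoted above can also be written as a plain
function.  This is only a sketch of the described semantics for the
unsigned int variant; the name addc_sketch is made up so that it cannot
clash with the real builtin, and __builtin_uadd_overflow is the existing
overflow builtin.

```c
/* Add A, B and CARRY_IN; store in *CARRY_OUT whether either of the
   two additions wrapped around, and return the wrapped sum.  */
static unsigned
addc_sketch (unsigned a, unsigned b, unsigned carry_in,
	     unsigned *carry_out)
{
  unsigned sum;
  unsigned c1 = __builtin_uadd_overflow (a, b, &sum);
  unsigned c2 = __builtin_uadd_overflow (sum, carry_in, &sum);
  /* The carry out is 0 or 1 even when both additions wrapped.  */
  *carry_out = c1 | c2;
  return sum;
}
```

Passing an all-ones carry_in together with two all-ones operands makes
both additions wrap, yet the carry out is still 1, which is the design
quirk described at the top of this mail.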
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2023-06-16  Jakub Jelinek  
> 
>PR middle-end/79173
>* builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
>BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
>BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
>types.
>* builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
>BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
>* builtins.cc (fold_builtin_addc_subc): New function.
>(fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
>* doc/extend.texi (__builtin_addc, __builtin_subc): Document.
> 
>* gcc.target/i386/pr79173-11.c: New test.
>* gcc.dg/builtin-addc-1.c: New test.
> 
> --- gcc/builtin-types.def.jj2023-06-16 12:01:09.622759288 +0200
> +++ gcc/builtin-types.def2023-06-16 12:04:20.277086893 +0200
> @@ -842,10 +842,17 @@ DEF_FUNCTION_TYPE_4 (BT_FN_PTR_PTR_INT_S
> BT_PTR, BT_PTR, BT_INT, BT_SIZE, BT_SIZE)
> DEF_FUNCTION_TYPE_4 (BT_FN_UINT_UINT_UINT_UINT_UINT,
> BT_UINT, BT_UINT, BT_UINT, BT_UINT, BT_UINT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
> + BT_UINT, BT_UINT, BT_UINT, BT_UINT, BT_PTR_UINT)
> DEF_FUNCTION_TYPE_4 (BT_FN_UINT_FLOAT_FLOAT_FLOAT_FLOAT,
> BT_UINT, BT_FLOAT, BT_FLOAT, BT_FLOAT, BT_FLOAT)
> DEF_FUNCTION_TYPE_4 (BT_FN_ULONG_ULONG_ULONG_UINT_UINT,
> BT_ULONG, BT_ULONG, BT_ULONG, BT_UINT, BT_UINT)
> +DEF_FUNCTION_TYPE_4 (BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
> + BT_ULONG, BT_ULONG, BT_ULONG, BT_ULONG, BT_PTR_ULONG)
> +DEF_FUNCTION_TYPE_4 
> (BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR,
> + BT_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG,
> + BT_PTR_ULONGLONG)
> DEF_FUNCTION_TYPE_4 (BT_FN_STRING_STRING_CONST_STRING_SIZE_SIZE,
> BT_STRING, BT_STRING, BT_CONST_STRING, BT_SIZE, BT_SIZE)
> DEF_FUNCTION_TYPE_4 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VALIST_ARG,
> --- gcc/builtins.def.jj2023-06-16 12:01:09.622759288 +0200
> +++ gcc/builtins.def2023-06-16 12:04:20.278086879 +0200
> @@ -934,6 +934,12 @@ DEF_GCC_BUILTIN(BUILT_IN_USUBLL_
> DEF_GCC_BUILTIN(BUILT_IN_UMUL_OVERFLOW, "umul_overflow", 
> BT_FN_BOOL_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> DEF_GCC_BUILTIN(BUILT_IN_UMULL_OVERFLOW, "umull_overflow", 
> BT_FN_BOOL_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> DEF_GCC_BUILTIN(BUILT_IN_UMULLL_OVERFLOW, "umulll_overflow", 
> BT_FN_BOOL_ULONGLONG_ULONGLONG_ULONGLONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_ADDC, "addc", 
> BT_FN_UINT_UINT_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_ADDCL, "addcl", 
> BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_ADDCLL, "addcll", 
> BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR, 
> ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_SUBC, "subc", 
> BT_FN_UINT_UINT_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_SUBCL, "subcl", 
> BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
> +DEF_GCC_BUILTIN(BUILT_IN_SUBCLL, "subcll", 
> BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR, 
> ATTR_NOTHROW_NONNULL_LEAF_LIST)
> 
> /* Category: miscellaneous builtins.  */
> DEF_LIB_BUILTIN(BUILT_IN_ABORT, "abort", BT_FN_VOID, 
> ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST)
> --- gcc/builtins.cc.jj2023-06-13 18:23:37.141794072 +0200
> +++ 

Re: [PATCH] tree-ssa-math-opts: Fix up uaddc/usubc pattern matching [PR110271]

2023-06-16 Thread Richard Biener via Gcc-patches



> Am 16.06.2023 um 16:23 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The following testcase ICEs, because I misremembered what the return value
> from match_arith_overflow is.  It isn't true if __builtin_*_overflow was
> matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.
> 
> So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
> be stmt and match_uaddc_usubc will be confused and can ICE.
> 
> The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
> in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
> pattern match it further as UADDC/USUBC.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2023-06-16  Jakub Jelinek  
> 
>PR tree-optimization/110271
>* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
>: Ignore return value from match_arith_overflow,
>instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.
> 
>* gcc.c-torture/compile/pr110271.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj2023-06-15 09:12:28.777829348 +0200
> +++ gcc/tree-ssa-math-opts.cc2023-06-16 10:44:31.231798664 +0200
> @@ -5558,9 +5558,12 @@ math_opts_dom_walker::after_dom_children
> 
>case PLUS_EXPR:
>case MINUS_EXPR:
> -  if (!convert_plusminus_to_widen (&gsi, stmt, code)
> -  && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
> -match_uaddc_usubc (&gsi, stmt, code);
> +  if (!convert_plusminus_to_widen (&gsi, stmt, code))
> +{
> +  match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> +  if (gsi_stmt (gsi) == stmt)
> +match_uaddc_usubc (&gsi, stmt, code);
> +}
>  break;
> 
>case BIT_NOT_EXPR:
> --- gcc/testsuite/gcc.c-torture/compile/pr110271.c.jj2023-06-16 
> 10:57:32.757621687 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr110271.c2023-06-16 
> 10:57:15.298871335 +0200
> @@ -0,0 +1,24 @@
> +/* PR tree-optimization/110271 */
> +
> +unsigned a, b, c, d, e;
> +
> +void
> +foo (unsigned *x, int y, unsigned int *z)
> +{
> +  for (int i = 0; i < y; i++)
> +{
> +  b += d;
> +  a += b < d;
> +  a += c = (__PTRDIFF_TYPE__) x > 3;
> +  d = z[1] + (a < c);
> +  a += e;
> +  d += a < e;
> +}
> +}
> +
> +void
> +bar (unsigned int *z)
> +{
> +  unsigned *x = x;
> +  foo (x, 9, z);
> +}
> 
>Jakub
> 


Re: [PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-16 Thread Ian Lance Taylor via Gcc-patches
On Fri, Jun 16, 2023 at 9:00 AM Paul E. Murphy via Gcc-patches
 wrote:
>
> TARGET_AIX is defined to a non-zero value on linux and maybe other
> powerpc64le targets.  This leads to unexpected behavior such as
> dropping the .go_export section when linking a shared library
> on linux/powerpc64le.
>
> Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
>
> Fixes golang/go#60798.
>
> gcc/go/ChangeLog:
>
> * go-backend.cc [TARGET_AIX]: Rename and update usage to
> TARGET_AIX_OS.
> * go-lang.cc: Likewise.

This is OK.

Thanks.

Ian


[x86_64 PATCH] Two minor tweaks to ix86_expand_move.

2023-06-16 Thread Roger Sayle

This patch splits out two (independent) minor changes to i386-expand.cc's
ix86_expand_move from a larger patch, given that it's better to review
and commit these independent pieces separately from a more complex patch.

The first change is to test for CONST_WIDE_INT_P before calling
ix86_convert_const_wide_int_to_broadcast.  Whilst stepping through
this function in gdb, I was surprised that the code was continually
jumping into this function with operands that obviously weren't
appropriate.

The second change is to generalize the optimization for efficiently
moving a TImode value to V1TImode (via V2DImode), to cover all 128-bit
vector modes.

Hence for the test case:

typedef unsigned long uv2di __attribute__ ((__vector_size__ (16)));
uv2di foo2(__int128 x) { return (uv2di)x; }

we'd previously move via memory with:

foo2:   movq    %rdi, -24(%rsp)
        movq    %rsi, -16(%rsp)
        movdqa  -24(%rsp), %xmm0
        ret

with this patch we now generate with -O2 (the same as V1TImode):

foo2:   movq    %rdi, %xmm0
        movq    %rsi, %xmm1
        punpcklqdq  %xmm1, %xmm0
        ret

and with -O2 -msse4 the even better:

foo2:   movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
        ret

The new test case is unimaginatively called sse2-v1ti-mov-2.c given
the original test case just for V1TI mode was called sse2-v1ti-mov-1.c.
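
For anyone wanting to convince themselves that the cast in foo2 is a
pure reinterpretation of the 128 bits (so only the move sequence
changes, not the value), a small self-check along these lines may
help.  The typedef and function names are made up for this sketch, and
it assumes a compiler with __int128 and GNU vector extensions.

```c
#include <string.h>

/* Two 64-bit lanes in one 128-bit vector; unsigned long long is used
   here so that the lanes are 64 bits wide on any host.  */
typedef unsigned long long uv2di_sketch
  __attribute__ ((__vector_size__ (16)));

/* Same shape as foo2 in the test case above.  */
static uv2di_sketch
int128_to_v2di (__int128 x)
{
  return (uv2di_sketch) x;
}

/* Return nonzero if the vector holds exactly the bytes of X.  */
static int
same_bits (__int128 x)
{
  uv2di_sketch v = int128_to_v2di (x);
  return memcmp (&v, &x, sizeof v) == 0;
}
```

Comparing the raw bytes keeps the check independent of lane order.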

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-16  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Check that OP1 is
CONST_WIDE_INT_P before calling ix86_convert_const_wide_int_to_broadcast.
Generalize special case for converting TImode to V1TImode to handle
all 128-bit vector conversions.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-mov-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index def060a..6a28b67 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -521,7 +521,8 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  return;
}
}
- else if (GET_MODE_SIZE (mode) >= 16)
+ else if (CONST_WIDE_INT_P (op1)
+  && GET_MODE_SIZE (mode) >= 16)
{
  rtx tmp = ix86_convert_const_wide_int_to_broadcast
(GET_MODE (op0), op1);
@@ -696,8 +697,9 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   return;
 }
 
-  /* Special case TImode to V1TImode conversions, via V2DI.  */
-  if (mode == V1TImode
+  /* Special case TImode to 128-bit vector conversions via V2DI.  */
+  if (VECTOR_MODE_P (mode)
+  && GET_MODE_SIZE (mode) == 16
   && SUBREG_P (op1)
   && GET_MODE (SUBREG_REG (op1)) == TImode
   && TARGET_64BIT && TARGET_SSE
@@ -709,7 +711,7 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
   emit_move_insn (lo, gen_lowpart (DImode, SUBREG_REG (op1)));
   emit_move_insn (hi, gen_highpart (DImode, SUBREG_REG (op1)));
   emit_insn (gen_vec_concatv2di (tmp, lo, hi));
-  emit_move_insn (op0, gen_lowpart (V1TImode, tmp));
+  emit_move_insn (op0, gen_lowpart (mode, tmp));
   return;
 }
 
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-2.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-2.c
new file mode 100644
index 000..7e89085
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-mov-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+typedef unsigned long uv2di __attribute__ ((__vector_size__ (16)));
+typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));
+typedef unsigned short uv8hi __attribute__ ((__vector_size__ (16)));
+typedef unsigned char uv16qi __attribute__ ((__vector_size__ (16)));
+
+uv1ti foo1(__int128 x) { return (uv1ti)x; }
+uv2di foo2(__int128 x) { return (uv2di)x; }
+uv4si foo4(__int128 x) { return (uv4si)x; }
+uv8hi foo8(__int128 x) { return (uv8hi)x; }
+uv16qi foo16(__int128 x) { return (uv16qi)x; }
+
+/* { dg-final { scan-assembler-not "%\[er\]sp" } } */


Re: [wwwdocs] gcc-14/changes.htm - Offloading: -lm/-lgfortran is autolinked

2023-06-16 Thread Gerald Pfeifer
On Fri, 16 Jun 2023, Tobias Burnus wrote:
> Thomas recently improved the offload experience by avoiding to use, e.g.
> 
>   gfortran -O3 -fopenmp qcd.f90 -lblas -foffload-options="-lgfortran -lm"
> 
> as libm and libgfortran now automatically get linked as 'gfortran' links
> -lgfortran and -lm on the host (only those libraries, not others). Thus,
> the commandline now looks much more natural:
> 
>   gfortran -O3 -fopenmp qcd.f90 -lblas

Nice!

> Attached patch documents it in the release notes.
> I loved to hear comments, suggestions, improvements (or even appraisals).

Looks good to me. (Personally I would have written "the math and 
Fortran runtime libraries", which is shorter, but pretty much a matter 
of preference. IOW, keep it as is unless you like it better, too. :-)

One idea might be to show the two invocations - before and after - in the 
release notes as well, at the end of that new entry. Totally up to you, 
too.


For the avoidance of doubt: Okay, thank you!

Gerald


Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Joseph Myers
On Fri, 16 Jun 2023, Martin Uecker via Gcc-patches wrote:

> > Note that no expressions can start with the '.' token at present.  As soon 
> > as you invent a new kind of expression that can start with that token, you 
> > have syntactic ambiguity.
> > 
> > struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; };
> > 
> > Is ".c=.c" a use of the existing syntax for designated initializers, with 
> > the first ".c" being a designator and the second being a use of the new 
> > kind of expression, or is it an assignment expression, where both the LHS 
> > and the RHS of the assignment use the new kind of expression?  And do 
> > those .c, when they use the new kind of expression, refer to the inner or 
> > outer struct definition?
> 
> I would treat this as one integrated feature. Essentially .c is
> something like this->c for the current struct for designated
> initializer *and* size expressions because it is semantically 
> so close.  In the initializer I would allow only 
> the current use for designated initialization for all names of
> members of the currently initialized struct, so .c = .c would 
> be invalid.  It should never refer to the outer struct if there

I'm not clear on what the intended disambiguation rule here is, when "." 
is seen in initializer list context - does this rule depend on whether the 
following identifier is a member of the struct being initialized, so 
".c=.c" would be OK above if the initialized struct didn't have a member 
called c but the outer struct definition did?  That seems like a rather 
messy rule.  And does "would allow only" apply other than in the ambiguous 
context?  That seems to be implied by ".c=.c" being invalid above, because 
to make it invalid you need to disallow the new construct being used for 
the second .c, not just make the first .c interpreted as a designator.

Again, this sort of thing needs a detailed written specification, with 
multiple iterations discussed among different implementations.  The above 
paragraph doesn't make clear to me any of: the disambiguation rules; what 
is allowed in what context; how name lookup works (consider tricky cases 
such as a reference to an identifier declared *later* in the same struct, 
possibly in the context of C2x tag compatibility where a previous 
definition of the struct is visible); when these expressions get 
evaluated; what the underlying principles are behind those choices.

Using a token (existing or new) other than '.' - one that doesn't 
introduce ambiguity in any context where expressions can be used - would 
help significantly, although some of the issues would still apply.
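
As a point of reference, here is a minimal sketch of the two places where '.'
followed by an identifier is already valid C today: as a designator inside an
initializer, and as member access after a postfix expression.  The name
designated_c is hypothetical.  The proposed '.c' size expression is
deliberately not shown, since it is only a proposal in this thread.

```c
#include <assert.h>

struct s2 { int c; char b[4]; };

/* Inside the braces, ".c" is a designator naming the member to
   initialize.  After the closing brace, ".c" is ordinary member access
   on the compound literal.  Neither use is an expression that *starts*
   with the '.' token.  */
static int
designated_c (int value)
{
  return (struct s2) { .c = value }.c;
}
```

If an expression could begin with '.', the text between '=' and '}' in such
an initializer is exactly where the two readings would collide.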

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] ipa-sra: Disable candidates with no known callers (PR 110276)

2023-06-16 Thread Martin Jambor
Hi,

In IPA-SRA we use the can_be_local_p () predicate rather than just the plain
local call graph flag in order to figure out whether the node is a
part of an external API that we cannot change.  Although there are
cases where this can allow more transformations, it also means we can
analyze functions which have no callers at all, which is pointless.

Moreover, it makes an assert of hint propagation trigger, which checks
that we have looked at callers before processing hints that come from
them.  This has been reported as PR 110276.

This patch simply adds a check that a node has at least one caller
into the early checks and makes the node a non-candidate for any
transformation if it does not.
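
The idea can be sketched outside of GCC as follows.  This is only an
illustrative model: struct caller, struct node and has_caller_issues are
hypothetical stand-ins, not the real call graph types or the real
check_all_callers_for_issues, and the handling of thunks is simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the call graph; not GCC's real types.  */
struct caller { bool is_thunk; };
struct node { const struct caller *callers; size_t n_callers; };

struct issues { bool thunk; bool there_is_one; };

/* Return true when the node must not be a candidate for any
   transformation.  */
static bool
has_caller_issues (const struct node *n)
{
  struct issues is = { false, false };
  for (size_t i = 0; i < n->n_callers; i++)
    {
      if (n->callers[i].is_thunk)
        is.thunk = true;
      else
        is.there_is_one = true;   /* at least one caller we can modify */
    }
  if (is.thunk)
    return true;
  /* The new early check: without any usable caller there is nothing
     to transform, so give up on the node.  */
  return !is.there_is_one;
}
```

A node with an empty caller list is rejected here even though none of the
other issue flags were set, which mirrors the intent of the new field.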

Bootstrapped and tested on x86_64-linux, LTO bootstrap is still
underway.  OK if it passes too?

Thanks,

Martin


gcc/ChangeLog:

2023-06-16  Martin Jambor  

PR ipa/110276
* ipa-sra.cc (struct caller_issues): New field there_is_one.
(check_for_caller_issues): Set it.
(check_all_callers_for_issues): Check it.

gcc/testsuite/ChangeLog:

2023-06-16  Martin Jambor  

PR ipa/110276
* gcc.dg/ipa/pr110276.c: New test.
---
 gcc/ipa-sra.cc  | 11 +++
 gcc/testsuite/gcc.dg/ipa/pr110276.c | 15 +++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr110276.c

diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
index 3fee8fb22ce..21d281a9756 100644
--- a/gcc/ipa-sra.cc
+++ b/gcc/ipa-sra.cc
@@ -3074,6 +3074,8 @@ struct caller_issues
   cgraph_node *candidate;
   /* There is a thunk among callers.  */
   bool thunk;
+  /* Set if there is at least one caller that is OK.  */
+  bool there_is_one;
   /* Call site with no available information.  */
   bool unknown_callsite;
   /* Call from outside the candidate's comdat group.  */
@@ -3116,6 +3118,8 @@ check_for_caller_issues (struct cgraph_node *node, void *data)
 
   if (csum->m_bit_aligned_arg)
issues->bit_aligned_aggregate_argument = true;
+
+  issues->there_is_one = true;
 }
   return false;
 }
@@ -3170,6 +3174,13 @@ check_all_callers_for_issues (cgraph_node *node)
   for (unsigned i = 0; i < param_count; i++)
(*ifs->m_parameters)[i].split_candidate = false;
 }
+  if (!issues.there_is_one)
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "There is no call to %s that we can modify.  "
+"Disabling all modifications.\n", node->dump_name ());
+  return true;
+}
   return false;
 }
 
diff --git a/gcc/testsuite/gcc.dg/ipa/pr110276.c b/gcc/testsuite/gcc.dg/ipa/pr110276.c
new file mode 100644
index 000..5a1e2f3fb1c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr110276.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long (*EFI_PCI_IO_PROTOCOL_CONFIG)();
+typedef struct {
+  EFI_PCI_IO_PROTOCOL_CONFIG Read;
+} EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS;
+typedef struct {
+  EFI_PCI_IO_PROTOCOL_CONFIG_ACCESS Pci;
+} EFI_PCI_IO_PROTOCOL;
+int init_regs_0;
+static void __attribute__((constructor)) init(EFI_PCI_IO_PROTOCOL *pci_io) {
+  if (init_regs_0)
+pci_io->Pci.Read();
+}
-- 
2.40.1



[patch] OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270]

2023-06-16 Thread Tobias Burnus

This fixes an issue related to OpenMP C/C++'s default mapping of pointer
variables.  (That's 'defaultmap(default:pointer)', which is, possibly
surprisingly, *not* the same as 'defaultmap(firstprivate:pointer)'.)

Namely, OpenMP supports the following:

int *ptr = malloc(sizeof(int)*5);
#pragma omp target enter data map(ptr[:5])

#pragma omp target
  ptr[2] = 5;

which matches 'firstprivate(ptr)' + attaching the device address of 'ptr[:0]'
(a zero-sized array), the latter making it possible to use 'ptr' automatically
without the need to add any map clauses, at least as long as *ptr has been
mapped before.

However, for

 int *ptr = omp_target_alloc (sizeof(int)*5, dev_num);
 #pragma omp target
   ptr[2] = 5;
or for
  #pragma omp requires unified_shared_memory
  int pa = [0];
  #pragma omp target
pa[0] = 6;

it failed before because neither 'ptr' nor 'pa' was mapped. The solution for
the user was to add either a (default)map clause (with a map type other than
'default'), a firstprivate clause, or an is_device_ptr clause.
The problem was that with default mapping, ptr and pa had the value NULL on the
device in the examples above (as required by OpenMP 5.0/5.1). With the commit,
they retain the original value, avoiding surprises for the code above.
(See the PR for references to the relevant sections of the OpenMP 5.{0,1,2}
specifications.)
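
For completeness, here is a self-contained variant of the first (already
working) case as a minimal sketch.  The function name mapped_pointer_demo is
hypothetical.  The code is written so that it also runs when compiled without
OpenMP support or without an offload device, in which case the pragmas are
ignored or fall back to the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value the host sees in ptr[2] after the target region,
   or -1 if the allocation failed.  */
static int
mapped_pointer_demo (void)
{
  int *ptr = calloc (5, sizeof (int));
  if (!ptr)
    return -1;

  /* Map the pointed-to storage; the pointer itself is not listed.  */
  #pragma omp target enter data map(to: ptr[:5])

  /* 'ptr' is implicitly mapped: it is firstprivatized and attached to
     the device copy of the storage mapped above.  */
  #pragma omp target
  ptr[2] = 5;

  /* Copy the data back to the host and drop the mapping.  */
  #pragma omp target exit data map(from: ptr[:5])

  int result = ptr[2];
  free (ptr);
  return result;
}
```

The cases this patch is about differ only in that no 'enter data' mapping
exists for the storage, so the pointer value itself has to survive the
transfer to the device.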


I would love it if someone would review it, albeit the actual code change in
libgomp/target.c is just changing a single enum value.

If there are no comments, I intend to push it next week ...

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
OpenMP (C/C++): Keep pointer value of unmapped ptr with default mapping [PR110270]

For C/C++ pointers, default implicit mapping firstprivatizes the pointer
but if the memory it points to is mapped, then it is updated to point to
the device memory (by attaching a zero sized array section of the pointed-to
storage).

However, if the pointed-to storage wasn't mapped, the pointer was set to
NULL on the device side (OpenMP 5.0/5.1 semantic). With this commit, the
pointer retains the on-host address in that case (OpenMP 5.2 semantic).

The new semantic avoids an explicit map/firstprivate/is_device_ptr in the
following sensible cases: Special values (e.g. pointer or 0x1, 0x2 etc.),
explicitly device allocated memory (e.g. omp_target_alloc), and with
(unified) shared memory.
(Note: With (U)SM, mappings still must be tracked, at least when
omp_target_associate_ptr does not fail when passing in two distinct pointers.)
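
The change in semantics can be modelled in a few lines of plain C.  This is
a toy model under stated assumptions, not libgomp code: struct mapping, enum
semantic and device_pointer_value are hypothetical names, and a single mapping
entry stands in for the runtime's real lookup structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-entry mapping table: host range and device base.  */
struct mapping { uintptr_t host_start, host_end, dev_start; };

enum semantic { OMP_51_NULL_IF_UNMAPPED, OMP_52_KEEP_HOST_VALUE };

/* Value an implicitly mapped pointer has on the device.  */
static uintptr_t
device_pointer_value (uintptr_t host_ptr, const struct mapping *m,
                      enum semantic sem)
{
  /* Pointed-to storage is mapped: translate to the device address.  */
  if (m && host_ptr >= m->host_start && host_ptr < m->host_end)
    return m->dev_start + (host_ptr - m->host_start);

  /* Not mapped: NULL under the old rule, unchanged under the new one.  */
  return sem == OMP_51_NULL_IF_UNMAPPED ? 0 : host_ptr;
}
```

Under the new rule a special value such as 0x1, or an address obtained from
omp_target_alloc, passes through untouched because no mapping covers it.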

libgomp/

	PR middle-end/110270
	* target.c (gomp_map_vars_internal): Copy host value instead of NULL
	for  GOMP_MAP_ZERO_LEN_ARRAY_SECTION if not mapped.
	* libgomp.texi (OpenMP 5.2 Impl.): Mark as 'Y'.
	* testsuite/libgomp.c/target-19.c: Update expected value.
	* testsuite/libgomp.c++/target-18.C: Likewise.
	* testsuite/libgomp.c++/target-19.C: Likewise.
	* testsuite/libgomp.c-c++-common/requires-unified-addr-2.c: New test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
	* testsuite/libgomp.c-c++-common/target-implicit-map-4.c: New test.

 libgomp/libgomp.texi   |   2 +-
 libgomp/target.c   |   2 +-
 libgomp/testsuite/libgomp.c++/target-18.C  |  21 ++-
 libgomp/testsuite/libgomp.c++/target-19.C  |  13 +-
 .../libgomp.c-c++-common/requires-unified-addr-2.c |  85 +++
 .../libgomp.c-c++-common/target-implicit-map-3.c   | 105 ++
 .../libgomp.c-c++-common/target-implicit-map-4.c   | 159 +
 libgomp/testsuite/libgomp.c/target-19.c|  21 ++-
 8 files changed, 392 insertions(+), 16 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1c57f5aa261..db8b1f1427e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -384,7 +384,7 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
 @item @code{declare mapper} with iterator and @code{present} modifiers
   @tab N @tab
 @item If a matching mapped list item is not found in the data environment, the
-  pointer retains its original value @tab N @tab
+  pointer retains its original value @tab Y @tab
 @item New @code{enter} clause as alias for @code{to} on declare target directive
   @tab Y @tab
 @item Deprecation of @code{to} clause on declare target directive @tab N @tab
diff --git a/libgomp/target.c b/libgomp/target.c
index e39ef8f6e82..aa2410c0f16 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1149,7 +1149,7 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 	  if (!n)
 	{
 	  tgt->list[i].key = NULL;
-	  tgt->list[i].offset = OFFSET_POINTER;
+	  tgt->list[i].offset = OFFSET_INLINED;
 	  continue;
 	}
 	}
diff --git 

[PATCH] Regenerate some autotools generated files (Was: Re: [PATCH v3] configure: Implement --enable-host-pie)

2023-06-16 Thread Martin Jambor
On Fri, Jun 16 2023, Marek Polacek wrote:
> On Fri, Jun 16, 2023 at 12:26:23PM +0200, Martin Jambor wrote:
>> Hello,
>> 
>> On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
>> > On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
>> >> 
>> >> 
>> >> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
>> >> > Ping.  Anyone have any further comments?
>> >> Given this was approved before, but got reverted due to issues (which have
>> >> since been addressed) -- I think you might as well go forward and sooner
>> >> rather than later so that we can catch fallout earlier.
>> >
>> > Thanks, pushed now, after rebasing, adjusting the patch for
>> > r14-1385, and testing with and without --enable-host-pie on
>> > both Debian and Fedora.
>> >
>> > If something comes up and I can't fix it quickly enough, I'll
>> > have to revert the patch.  We'll see.
>> >
>> 
>> The script that regularly checks that the checked-in autotools-generated
>> files are in sync now complains about the following diff.  Unless someone
>> stops me because I overlooked something or for some other reason, I will
>> commit it later on as obvious.
>
> Please, go ahead.
>  
>> I wonder where the "line" differences come from, perhaps you added a
>> comment after running autoconf/automake/...?  The zlib/Makefile.in hunks
>
> Arg, I think I must've messed up the #lines when rebasing though I don't
> know what went wrong with zlib/Makefile.in.  But I don't think the latter
> will actually make any difference.
>
>> like something we should have, though, even if I did not check whether
>> it makes any difference in practice.  And I want the checking script to
>> shut up too ;-)
>
> Thanks and sorry.
>

No worries, I have committed the following.

Thanks and have a nice weekend,

Martin



As discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621976.html this
should put the autotools generated files in sync with what they were
generated from (and make an automated checker happy).

Tested by bootstrapping on top of a revision from only a few commits ago.

zlib/ChangeLog:

2023-06-16  Martin Jambor  

* Makefile.in: Regenerate.
* configure: Likewise.

gcc/ChangeLog:

2023-06-16  Martin Jambor  

* configure: Regenerate.
---
 gcc/configure| 4 ++--
 zlib/Makefile.in | 2 ++
 zlib/configure   | 4 ++--
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/configure b/gcc/configure
index a4563a9cade..f7b4b283ca2 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -19847,7 +19847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19848 "configure"
+#line 19850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19953,7 +19953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19954 "configure"
+#line 19956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/zlib/Makefile.in b/zlib/Makefile.in
index 3f5102d1b87..80fe3b69116 100644
--- a/zlib/Makefile.in
+++ b/zlib/Makefile.in
@@ -353,6 +353,8 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_pie = @enable_host_pie@
+enable_host_shared = @enable_host_shared@
 exec_prefix = @exec_prefix@
 host = @host@
 host_alias = @host_alias@
diff --git a/zlib/configure b/zlib/configure
index 77be6c284e3..9308866a636 100755
--- a/zlib/configure
+++ b/zlib/configure
@@ -10763,7 +10763,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10778 "configure"
+#line 10766 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -10869,7 +10869,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10884 "configure"
+#line 10872 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
-- 
2.40.1





[PATCH 1/2] go: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-16 Thread Paul E. Murphy via Gcc-patches
TARGET_AIX is defined to a non-zero value on linux and maybe other
powerpc64le targets.  This leads to unexpected behavior such as
dropping the .go_export section when linking a shared library
on linux/powerpc64le.

Instead, use TARGET_AIX_OS to toggle AIX specific behavior.
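
The pattern the front ends rely on can be sketched in isolation as follows.
The DEMO_SECTION_* values and export_section_flags are hypothetical and only
stand in for GCC's real section flags and for go_write_export_data.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; GCC's real ones differ.  */
#define DEMO_SECTION_DEBUG   0x1u
#define DEMO_SECTION_EXCLUDE 0x2u

/* A front end cannot rely on every target defining the macro, so it
   falls back to 0 when the target headers do not provide it.  */
#ifndef TARGET_AIX_OS
#define TARGET_AIX_OS 0
#endif

/* Pick the flags for the export section based on the target OS.  */
static unsigned int
export_section_flags (void)
{
  return TARGET_AIX_OS ? DEMO_SECTION_EXCLUDE : DEMO_SECTION_DEBUG;
}
```

The fallback only helps if the macro being tested really means "the OS is
AIX"; a macro that is non-zero on other targets selects the wrong branch,
which is the behavior described above.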

Fixes golang/go#60798.

gcc/go/ChangeLog:

* go-backend.cc [TARGET_AIX]: Rename and update usage to
TARGET_AIX_OS.
* go-lang.cc: Likewise.
---
 gcc/go/go-backend.cc | 6 +++---
 gcc/go/go-lang.cc| 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/go/go-backend.cc b/gcc/go/go-backend.cc
index c6a1a2b7c18..6e2c919e829 100644
--- a/gcc/go/go-backend.cc
+++ b/gcc/go/go-backend.cc
@@ -45,8 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #define GO_EXPORT_SECTION_NAME ".go_export"
 #endif
 
-#ifndef TARGET_AIX
-#define TARGET_AIX 0
+#ifndef TARGET_AIX_OS
+#define TARGET_AIX_OS 0
 #endif
 
 /* This file holds all the cases where the Go frontend needs
@@ -107,7 +107,7 @@ go_write_export_data (const char *bytes, unsigned int size)
 {
   gcc_assert (targetm_common.have_named_sections);
   sec = get_section (GO_EXPORT_SECTION_NAME,
-TARGET_AIX ? SECTION_EXCLUDE : SECTION_DEBUG,
+TARGET_AIX_OS ? SECTION_EXCLUDE : SECTION_DEBUG,
 NULL);
 }
 
diff --git a/gcc/go/go-lang.cc b/gcc/go/go-lang.cc
index b6e8c37bf22..c6c147b20a5 100644
--- a/gcc/go/go-lang.cc
+++ b/gcc/go/go-lang.cc
@@ -39,8 +39,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "go-c.h"
 #include "go-gcc.h"
 
-#ifndef TARGET_AIX
-#define TARGET_AIX 0
+#ifndef TARGET_AIX_OS
+#define TARGET_AIX_OS 0
 #endif
 
 /* Language-dependent contents of a type.  */
@@ -116,9 +116,9 @@ go_langhook_init (void)
   args.compiling_runtime = go_compiling_runtime;
   args.debug_escape_level = go_debug_escape_level;
   args.debug_escape_hash = go_debug_escape_hash;
-  args.nil_check_size_threshold = TARGET_AIX ? -1 : 4096;
+  args.nil_check_size_threshold = TARGET_AIX_OS ? -1 : 4096;
   args.debug_optimization = go_debug_optimization;
-  args.need_eqtype = TARGET_AIX ? true : false;
+  args.need_eqtype = TARGET_AIX_OS ? true : false;
   args.linemap = go_get_linemap();
   args.backend = go_get_backend();
   go_create_gogo ();
-- 
2.31.1



[PATCH 2/2] rust: update usage of TARGET_AIX to TARGET_AIX_OS

2023-06-16 Thread Paul E. Murphy via Gcc-patches
This was noticed when fixing the gccgo usage of the macro, the
rust usage is very similar.

TARGET_AIX is defined as a non-zero value on linux/powerpc64le
which may cause unexpected behavior.  TARGET_AIX_OS should be
used to toggle AIX specific behavior.

gcc/rust/ChangeLog:

* rust-object-export.cc [TARGET_AIX]: Rename and update
usage to TARGET_AIX_OS.
---
 gcc/rust/rust-object-export.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/rust/rust-object-export.cc b/gcc/rust/rust-object-export.cc
index 1143c767784..f9a395f6964 100644
--- a/gcc/rust/rust-object-export.cc
+++ b/gcc/rust/rust-object-export.cc
@@ -46,8 +46,8 @@
 #define RUST_EXPORT_SECTION_NAME ".rust_export"
 #endif
 
-#ifndef TARGET_AIX
-#define TARGET_AIX 0
+#ifndef TARGET_AIX_OS
+#define TARGET_AIX_OS 0
 #endif
 
 /* Return whether or not GCC has reported any errors.  */
@@ -91,7 +91,7 @@ rust_write_export_data (const char *bytes, unsigned int size)
 {
   gcc_assert (targetm_common.have_named_sections);
   sec = get_section (RUST_EXPORT_SECTION_NAME,
-TARGET_AIX ? SECTION_EXCLUDE : SECTION_DEBUG, NULL);
+TARGET_AIX_OS ? SECTION_EXCLUDE : SECTION_DEBUG, NULL);
 }
 
   switch_to_section (sec);
-- 
2.31.1



[committed] libgomp: Fix OMP_TARGET_OFFLOAD=mandatory

2023-06-16 Thread Tobias Burnus

Found a problem caused by my r14-1801-g18c8b56c7d67a9 due to
ordering issues related to the offloading initialization
(gomp_init_targets_once).

The testsuite did test various ways, but only such code paths that
initialized the library before ...

Committed as Rev. r14-1893-g8216ca85037be9.

Tobias
commit 8216ca85037be9f4d5c20540522a22a4a93b660e
Author: Tobias Burnus 
Date:   Fri Jun 16 17:21:59 2023 +0200

libgomp: Fix OMP_TARGET_OFFLOAD=mandatory

It turned out that gomp_init_targets_once() was not run when directly
calling 'omp target' or 'omp target (enter/exit) data' causing an
abort with OMP_TARGET_OFFLOAD=mandatory wrongly claiming that no
device is available. It was called a tiny bit later but a few lines too
late for updating the default-device-var.
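
The ordering problem can be reproduced in miniature.  This is a sketch with
hypothetical names and made-up values (two devices, default device 0); it is
not the libgomp implementation, it only shows why the device count has to be
queried before the default device is read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the runtime state; names are illustrative.  */
static bool targets_initialized;
static int num_devices;
static int default_device = -1;   /* not valid until initialization ran */

static void
init_targets_once (void)
{
  if (targets_initialized)
    return;
  targets_initialized = true;
  num_devices = 2;      /* pretend two offload devices were found */
  default_device = 0;   /* initialization also fixes up the default */
}

static int
get_num_devices (void)
{
  init_targets_once ();
  return num_devices;
}

/* Buggy order: reads default_device before anything forced the init.  */
static int
resolve_default_buggy (void)
{
  int dev = default_device;
  (void) get_num_devices ();
  return dev;
}

/* Fixed order, in the spirit of the patch: query the count first.  */
static int
resolve_default_fixed (void)
{
  (void) get_num_devices ();
  return default_device;
}
```

The buggy variant only misbehaves when it is the first entry point into the
runtime, which is consistent with the testsuite not having caught it.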

libgomp/ChangeLog:

* target.c (resolve_device): Call gomp_get_num_devices early to ensure
gomp_init_targets_once was called before using default-device-var.
* testsuite/libgomp.c/target-55.c: New test.
* testsuite/libgomp.c/target-55a.c: New test.
---
 libgomp/target.c | 10 +++---
 libgomp/testsuite/libgomp.c/target-55.c  | 20 
 libgomp/testsuite/libgomp.c/target-55a.c | 23 +++
 3 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index e39ef8f6e82..b6a7214ab4f 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -138,6 +138,10 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device_id, bool remapped)
 {
+  /* Get number of devices and thus ensure that 'gomp_init_targets_once' was
+ called, which must be done before using default_device_var.  */
+  int num_devices = gomp_get_num_devices ();
+
   if (remapped && device_id == GOMP_DEVICE_ICV)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -151,7 +155,7 @@ resolve_device (int device_id, bool remapped)
  : omp_initial_device))
 	return NULL;
   if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
-	  && gomp_get_num_devices () == 0)
+	  && num_devices == 0)
 	gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
 		"but only the host device is available");
   else if (device_id == omp_invalid_device)
@@ -162,10 +166,10 @@ resolve_device (int device_id, bool remapped)
 
   return NULL;
 }
-  else if (device_id >= gomp_get_num_devices ())
+  else if (device_id >= num_devices)
 {
   if (gomp_target_offload_var == GOMP_TARGET_OFFLOAD_MANDATORY
-	  && device_id != num_devices_openmp)
+	  && device_id != num_devices)
 	gomp_fatal ("OMP_TARGET_OFFLOAD is set to MANDATORY, "
 		"but device not found");
 
diff --git a/libgomp/testsuite/libgomp.c/target-55.c b/libgomp/testsuite/libgomp.c/target-55.c
new file mode 100644
index 000..1314b3c6963
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-55.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target { offload_device } } } */
+/* { dg-set-target-env-var OMP_TARGET_OFFLOAD "mandatory" } */
+
+/* Should pass - see target-55a.c for !offload_device */
+
+/* Check OMP_TARGET_OFFLOAD - it shall run on systems with offloading
+   devices available and fail otherwise.  Note that this did always
+   fail - as the device handling wasn't initialized before doing the
+   mandatory checking.  */
+
+int
+main ()
+{
+  int x = 1;
+  #pragma omp target map(tofrom: x)
+x = 5;
+  if (x != 5)
+__builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c/target-55a.c b/libgomp/testsuite/libgomp.c/target-55a.c
new file mode 100644
index 000..53978c3f405
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-55a.c
@@ -0,0 +1,23 @@
+/* { dg-do run { target { ! offload_device } } } */
+/* { dg-set-target-env-var OMP_TARGET_OFFLOAD "mandatory" } */
+
+/* Should fail - see target-55.c for offload_device */
+
+/* { dg-shouldfail "omp_invalid_device" } */
+/* { dg-output ".*libgomp: OMP_TARGET_OFFLOAD is set to MANDATORY, but only the host device is available.*" } */
+
+/* Check OMP_TARGET_OFFLOAD - it shall run on systems with offloading
+   devices available and fail otherwise.  Note that this did always
+   fail - as the device handling wasn't initialized before doing the
+   mandatory checking.  */
+
+int
+main ()
+{
+  int x = 1;
+  #pragma omp target map(tofrom: x)
+x = 5;
+  if (x != 5)
+__builtin_abort ();
+  return 0;
+}


Re: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 02:10, juzhe.zh...@rivai.ai wrote:

LGTM. Thanks for fixing this bug.
Let's wait for Jeff's final approval.

OK.

jeff


Re: [PATCH] RISC-V: Fix VL operand bug in VSETVL PASS[PR110264]

2023-06-16 Thread Jeff Law via Gcc-patches




On 6/16/23 02:02, Juzhe-Zhong wrote:

This patch fixes the following issue, which happens on both GCC-13 and GCC-14:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

The testcase is too big and I failed to reduce it, so I didn't add a
test to this patch.

This patch should not only land in GCC-14 but also be backported to GCC-13.

PR target/110264

gcc/ChangeLog:

 * config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.

OK.

Note, I've been swamped this week.  So things are moving a bit slower 
than I'd like on the review side.


jeff


[pushed] [RA] [PR110215] Ignore conflicts for some pseudos from insns throwing a final exception

2023-06-16 Thread Vladimir Makarov via Gcc-patches

The following patch solves

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215

The patch was successfully tested and bootstrapped on x86-64, aarch64, 
and ppc64le.


It is difficult to make a stable test for the PR.  So there is no test 
in the patch.


commit 154c69039571c66b3a6d16ecfa9e6ff22942f59f
Author: Vladimir N. Makarov 
Date:   Fri Jun 16 11:12:32 2023 -0400

RA: Ignore conflicts for some pseudos from insns throwing a final exception

IRA adds conflicts to the pseudos from insns that can throw exceptions
internally even if the exception code is final for the function and
the pseudo value is not used in the exception code.  This results in
spilling a pseudo in a loop (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110215).

The following patch fixes the problem.
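
Stripped of the RTL details, the new condition amounts to the predicate
below.  This is a simplified sketch: enum region_type and
needs_call_clobber_conflicts are hypothetical, and the NULL checks on the
region, landing pad and label that the real code performs are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplification of an EH region kind; not GCC's type.  */
enum region_type { REGION_CLEANUP, REGION_TRY, REGION_ALLOWED };

/* Decide whether the pseudo must get the extra hard register conflicts
   for an insn that may throw.  */
static bool
needs_call_clobber_conflicts (bool can_throw_internal,
                              enum region_type type,
                              bool live_at_landing_pad)
{
  if (!can_throw_internal)
    return false;
  /* For a cleanup region the conflicts only matter when the pseudo is
     still live on entry to the landing pad.  */
  return type != REGION_CLEANUP || live_at_landing_pad;
}
```

The only combination that changes relative to the old behavior is a cleanup
region whose landing pad does not use the pseudo.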

PR rtl-optimization/110215

gcc/ChangeLog:

* ira-lives.cc: Include except.h.
(process_bb_node_lives): Ignore conflicts from cleanup exceptions
when the pseudo does not live at the exception landing pad.

diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 6a3901ee234..bc8493856a4 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ira-int.h"
 #include "sparseset.h"
 #include "function-abi.h"
+#include "except.h"
 
 /* The code in this file is similar to one in global but the code
works on the allocno basis and creates live ranges instead of
@@ -1383,14 +1384,24 @@ process_bb_node_lives (ira_loop_tree_node_t loop_tree_node)
 		  SET_HARD_REG_SET (OBJECT_CONFLICT_HARD_REGS (obj));
 		  SET_HARD_REG_SET (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
 		}
-		  if (can_throw_internal (insn))
+		  eh_region r;
+		  eh_landing_pad lp;
+		  rtx_code_label *landing_label;
+		  basic_block landing_bb;
+		  if (can_throw_internal (insn)
+		  && (r = get_eh_region_from_rtx (insn)) != NULL
+		  && (lp = gen_eh_landing_pad (r)) != NULL
+		  && (landing_label = lp->landing_pad) != NULL
+		  && (landing_bb = BLOCK_FOR_INSN (landing_label)) != NULL
+		  && (r->type != ERT_CLEANUP
+			  || bitmap_bit_p (df_get_live_in (landing_bb),
+	   ALLOCNO_REGNO (a))))
 		{
-		  OBJECT_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
-		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)
-			|= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  HARD_REG_SET new_conflict_regs
+			= callee_abi.mode_clobbers (ALLOCNO_MODE (a));
+		  OBJECT_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
+		  OBJECT_TOTAL_CONFLICT_HARD_REGS (obj) |= new_conflict_regs;
 		}
-
 		  if (sparseset_bit_p (allocnos_processed, num))
 		continue;
 		  sparseset_set_bit (allocnos_processed, num);


Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Qing Zhao via Gcc-patches


> On Jun 16, 2023, at 3:21 AM, Martin Uecker  wrote:
> 
> Am Donnerstag, dem 15.06.2023 um 16:55 + schrieb Joseph Myers:
>> On Thu, 15 Jun 2023, Qing Zhao via Gcc-patches wrote:
>> 
> ...
>>> 1. Update the routine “c_parser_postfix_expression” (is this the right 
>>> place? ) to accept the new designator syntax.
>> 
>> Any design that might work with an expression is the sort of thing that 
>> would likely involve many iterations on the specification (i.e. proposed 
>> wording changes to the C standard) for the interpretation of the new kinds 
>> of expressions, including how to resolve syntactic ambiguities and how 
>> name lookup works, before it could be considered ready to implement, and 
>> then a lot more work on the specification based on implementation 
>> experience.
>> 
>> Note that no expressions can start with the '.' token at present.  As soon 
>> as you invent a new kind of expression that can start with that token, you 
>> have syntactic ambiguity.
>> 
>> struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; };
>> 
>> Is ".c=.c" a use of the existing syntax for designated initializers, with 
>> the first ".c" being a designator and the second being a use of the new 
>> kind of expression, or is it an assignment expression, where both the LHS 
>> and the RHS of the assignment use the new kind of expression?  And do 
>> those .c, when they use the new kind of expression, refer to the inner or 
>> outer struct definition?
> 
> I would treat this as one integrated feature. Essentially .c is
> something like this->c for the current struct for designated
> initializer *and* size expressions because it is semantically 
> so close.  

Yes, I think this is reasonable. (Is “this” the immediate containing structure?)

>  In the initializer I would allow only 
> the current use for designated initialization for all names of
> members of the currently initialized struct, so .c = .c would 
> be invalid.

Given that “.c” basically is “this->c”, .c = .c should be considered as
this->c = this->c. Is such self-initialization allowed in C?

>   It should never refer to the outer struct if there
> is a member with the same name in the inner struct, i.e. the
> outside member is then hidden. 

Does the above mean that if there is NO name conflict, it could refer to a 
field of an outer struct?

Why is this allowed? Why not just disallow all references to fields of the outer 
structure, since .c basically is this->c?
> 
> So this would be ok:
> 
> struct s1 { int d; char a[(struct s2 { int c; char b[.c]; }) {.c=.d}.c]; };
> 
> Here the use of .d would be ok because it is not from the struct
> currently initialized, but from an outside scope.

I think that the above .c=.d should report an error, since .d does not exist in 
the containing structure.

Am I missing anything here?

thanks.

Qing
> 
> Martin
> 
> 
> 
> 



Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-16 Thread Qing Zhao via Gcc-patches


> On Jun 15, 2023, at 6:48 PM, Joseph Myers  wrote:
> 
> On Thu, 15 Jun 2023, Qing Zhao via Gcc-patches wrote:
> 
>> B. The argument of the new attribute “counted_by” is an identifier that can
>> be accepted by “c_parser_attribute_arguments”:
>> 
>> struct trailing_array_B {
>> int count;
>> int array_B[] __attribute ((counted_by (count))); 
>> };
>> 
>> 
>> From my current very limited understanding of the C FE source code, it’s 
>> not easy to extend the argument to an expression later for the above. Is 
>> this understanding right?
> 
> It wouldn't be entirely compatible: if you change to interpreting the 
> argument as an expression, then the above would suggest a global variable 
> count is used (as opposed to some other syntax for referring to an element 
> of the containing structure).

Yeah, that’s the reason I tried to introduce the new “.count” syntax for the
argument of the new attribute at the very beginning, in order to avoid such an
incompatibility later.  -:)
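
To make the identifier form (B) concrete, here is a minimal sketch of how such a struct might be declared and allocated. The attribute was still a proposal in this thread, so the COUNTED_BY macro below (a name made up for this sketch) expands to nothing unless the compiler reports support for it; alloc_B is likewise a hypothetical helper.

```c
#include <stdlib.h>

/* Use the attribute only if the compiler says it knows it;
   otherwise COUNTED_BY expands to nothing.  */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define COUNTED_BY(member) __attribute__((counted_by(member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct trailing_array_B {
    int count;
    int array_B[] COUNTED_BY(count);
};

/* Hypothetical helper: allocate room for n elements and record n in
   the field that the attribute names.  */
static struct trailing_array_B *alloc_B(int n)
{
    struct trailing_array_B *p;

    if (n < 0)
        return NULL;
    p = malloc(sizeof *p + (size_t)n * sizeof p->array_B[0]);
    if (p)
        p->count = n;
    return p;
}
```

Here "count" inside the attribute is looked up as a member of the containing struct, which is exactly what would change meaning if the argument were later parsed as a general expression.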
> 
> So an attribute that takes an element name might best be a *different* 
> attribute from any potential future one taking an expression (with some 
> new syntax to refer to an element).

So, we add this “counted_by (identifier)” attribute now, and later add another
new attribute “new_counted_by (expression)” if needed?

Kees, what’s your opinion on this?

thanks.

Qing
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



[PATCH] builtins: Add support for clang compatible __builtin_{add,sub}c{,l,ll} [PR79173]

2023-06-16 Thread Jakub Jelinek via Gcc-patches
Hi!

While the design of these builtins in clang is questionable,
rather than being say
unsigned __builtin_addc (unsigned, unsigned, bool, bool *)
so that it is clear they add two [0, 0xffffffff] range numbers
plus one [0, 1] range carry in and give [0, 0xffffffff] range
return plus [0, 1] range carry out, they actually instead
add 3 [0, 0xffffffff] values together but the carry out
isn't then the expected [0, 2] value because
0xffffffffULL + 0xffffffff + 0xffffffff is 0x2fffffffd,
but just [0, 1] whether there was any overflow at all.

It is something used in the wild and shorter to write than the
corresponding
#define __builtin_addc(a,b,carry_in,carry_out) \
  ({ unsigned _s; \
 unsigned _c1 = __builtin_uadd_overflow (a, b, &_s); \
 unsigned _c2 = __builtin_uadd_overflow (_s, carry_in, &_s); \
 *(carry_out) = (_c1 | _c2); \
 _s; })
and so a canned builtin for something people could often use.
It isn't that hard to maintain on the GCC side, as we just lower
it to two .ADD_OVERFLOW calls early, and the already committed
pattern recognition code can then make .UADDC/.USUBC calls out of
that if the carry in is in [0, 1] range and the corresponding
optab is supported by the target.
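
As a rough sketch of the same lowering written as a plain function rather than a macro (the name addc_sketch is made up here; this is not the code in builtins.cc), the unsigned int variant amounts to two overflow-checked additions whose carry flags are ORed:

```c
/* Illustration only: mirrors the macro above for unsigned int.
   __builtin_uadd_overflow stores the wrapped sum through its third
   argument and returns whether the addition overflowed.  */
static unsigned addc_sketch(unsigned a, unsigned b, unsigned carry_in,
                            unsigned *carry_out)
{
    unsigned sum;
    unsigned c1 = __builtin_uadd_overflow(a, b, &sum);
    unsigned c2 = __builtin_uadd_overflow(sum, carry_in, &sum);

    /* Carry out is only "did anything overflow", so it stays in
       [0, 1] even when both additions overflow.  */
    *carry_out = c1 | c2;
    return sum;
}
```

With all three inputs set to ~0u both additions overflow, yet the carry out is still 1, which is the questionable part of the design mentioned above.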

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-16  Jakub Jelinek  

PR middle-end/79173
* builtin-types.def (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR): New
types.
* builtins.def (BUILT_IN_ADDC, BUILT_IN_ADDCL, BUILT_IN_ADDCLL,
BUILT_IN_SUBC, BUILT_IN_SUBCL, BUILT_IN_SUBCLL): New builtins.
* builtins.cc (fold_builtin_addc_subc): New function.
(fold_builtin_varargs): Handle BUILT_IN_{ADD,SUB}C{,L,LL}.
* doc/extend.texi (__builtin_addc, __builtin_subc): Document.

* gcc.target/i386/pr79173-11.c: New test.
* gcc.dg/builtin-addc-1.c: New test.

--- gcc/builtin-types.def.jj2023-06-16 12:01:09.622759288 +0200
+++ gcc/builtin-types.def   2023-06-16 12:04:20.277086893 +0200
@@ -842,10 +842,17 @@ DEF_FUNCTION_TYPE_4 (BT_FN_PTR_PTR_INT_S
 BT_PTR, BT_PTR, BT_INT, BT_SIZE, BT_SIZE)
 DEF_FUNCTION_TYPE_4 (BT_FN_UINT_UINT_UINT_UINT_UINT,
 BT_UINT, BT_UINT, BT_UINT, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_4 (BT_FN_UINT_UINT_UINT_UINT_UINTPTR,
+BT_UINT, BT_UINT, BT_UINT, BT_UINT, BT_PTR_UINT)
 DEF_FUNCTION_TYPE_4 (BT_FN_UINT_FLOAT_FLOAT_FLOAT_FLOAT,
 BT_UINT, BT_FLOAT, BT_FLOAT, BT_FLOAT, BT_FLOAT)
 DEF_FUNCTION_TYPE_4 (BT_FN_ULONG_ULONG_ULONG_UINT_UINT,
 BT_ULONG, BT_ULONG, BT_ULONG, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_4 (BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR,
+BT_ULONG, BT_ULONG, BT_ULONG, BT_ULONG, BT_PTR_ULONG)
+DEF_FUNCTION_TYPE_4 
(BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR,
+BT_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG,
+BT_PTR_ULONGLONG)
 DEF_FUNCTION_TYPE_4 (BT_FN_STRING_STRING_CONST_STRING_SIZE_SIZE,
 BT_STRING, BT_STRING, BT_CONST_STRING, BT_SIZE, BT_SIZE)
 DEF_FUNCTION_TYPE_4 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VALIST_ARG,
--- gcc/builtins.def.jj 2023-06-16 12:01:09.622759288 +0200
+++ gcc/builtins.def2023-06-16 12:04:20.278086879 +0200
@@ -934,6 +934,12 @@ DEF_GCC_BUILTIN(BUILT_IN_USUBLL_
 DEF_GCC_BUILTIN(BUILT_IN_UMUL_OVERFLOW, "umul_overflow", 
BT_FN_BOOL_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_UMULL_OVERFLOW, "umull_overflow", 
BT_FN_BOOL_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_UMULLL_OVERFLOW, "umulll_overflow", 
BT_FN_BOOL_ULONGLONG_ULONGLONG_ULONGLONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_ADDC, "addc", 
BT_FN_UINT_UINT_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_ADDCL, "addcl", 
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_ADDCLL, "addcll", 
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR, 
ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_SUBC, "subc", 
BT_FN_UINT_UINT_UINT_UINT_UINTPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_SUBCL, "subcl", 
BT_FN_ULONG_ULONG_ULONG_ULONG_ULONGPTR, ATTR_NOTHROW_NONNULL_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_SUBCLL, "subcll", 
BT_FN_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONG_ULONGLONGPTR, 
ATTR_NOTHROW_NONNULL_LEAF_LIST)
 
 /* Category: miscellaneous builtins.  */
 DEF_LIB_BUILTIN(BUILT_IN_ABORT, "abort", BT_FN_VOID, 
ATTR_TMPURE_NORETURN_NOTHROW_LEAF_COLD_LIST)
--- gcc/builtins.cc.jj  2023-06-13 18:23:37.141794072 +0200
+++ gcc/builtins.cc 2023-06-16 13:11:25.094406298 +0200
@@ -9555,6 +9555,51 @@ fold_builtin_arith_overflow (location_t
   

Re: [PATCH v3] c++: Accept elaborated-enum-base with pedwarn

2023-06-16 Thread Alex Coplan via Gcc-patches
On 16/06/2023 09:07, Jason Merrill wrote:
> On 6/16/23 07:58, Alex Coplan wrote:
> > Hi,
> > 
> > This is a v3 patch addressing feedback for:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621714.html
> > 
> > The only change since the previous version is that the new option is
> > documented in invoke.texi (and the description in c.opt was shortened as
> > requested).
> > 
> > --
> > 
> > macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
> > of the form:
> > 
> > typedef enum T : BaseType T;
> > 
> > i.e. an elaborated-type-specifier with an additional enum-base.
> > Upstream LLVM can be made to accept the above construct with
> > -Wno-error=elaborated-enum-base.
> > 
> > This patch adds the -Welaborated-enum-base warning to GCC and adjusts
> > the C++ parser to emit this warning instead of rejecting this code
> > outright.
> > 
> > The macro expansion in the macOS headers occurs in the case that the
> > compiler declares support for enums with underlying type using
> > __has_feature, see
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html
> > 
> > GCC rejecting this construct outright means that GCC fails to bootstrap
> > on Darwin in the case that it (correctly) implements __has_feature and
> > declares support for C++ enums with underlying type.
> > 
> > With this patch, GCC can bootstrap on Darwin in combination with the
> > (WIP) __has_feature patch posted at:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
> > 
> > Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
> > OK for trunk?
> 
> OK, thanks.

Thanks for the reviews, pushed as
g:b106f11dc6adb8df15cc5c268896d314c76ca35f.

> 
> > Thanks,
> > Alex
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c.opt (Welaborated-enum-base): New.
> > 
> > gcc/ChangeLog:
> > 
> > * doc/invoke.texi: Document -Welaborated-enum-base.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (cp_parser_enum_specifier): Don't reject
> > elaborated-type-specifier with enum-base, instead emit new
> > Welaborated-enum-base warning.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
> > * g++.dg/cpp0x/forw_enum6.C: Likewise.
> > * g++.dg/cpp0x/elab-enum-base.C: New test.
> 


[PATCH] tree-ssa-math-opts: Fix up uaddc/usubc pattern matching [PR110271]

2023-06-16 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs, because I misremembered what the return value
from match_arith_overflow is.  It isn't true if __builtin_*_overflow was
matched, but it is true only in the BIT_NOT_EXPR case if stmt was removed.

So, if match_arith_overflow matches something, gsi_stmt (gsi) will not
be stmt and match_uaddc_usubc will be confused and can ICE.

The following patch fixes it by checking if gsi_stmt (gsi) == stmt,
in that case we know it is still a PLUS_EXPR/MINUS_EXPR and we can try to
pattern match it further as UADDC/USUBC.
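
For readers not familiar with the idiom in the testcase (b += d; a += b < d;), here is a small standalone sketch of the kind of source that produces such PLUS_EXPR chains. The function name add_2limb is made up, and whether any particular function ends up as .UADDC depends on the target and on the rest of the pass; this only shows the source shape.

```c
/* Illustration only: two-limb unsigned addition, least significant
   limb first.  After an unsigned add, "sum < addend" is 1 exactly
   when the addition wrapped, so it acts as the carry.  */
static void add_2limb(unsigned r[2], const unsigned x[2],
                      const unsigned y[2])
{
    unsigned lo = x[0] + y[0];
    unsigned carry = lo < x[0];

    r[0] = lo;
    r[1] = x[1] + y[1] + carry;
}
```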

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-06-16  Jakub Jelinek  

PR tree-optimization/110271
* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children)
: Ignore return value from match_arith_overflow,
instead call match_uaddc_usubc only if gsi_stmt (gsi) is still stmt.

* gcc.c-torture/compile/pr110271.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj2023-06-15 09:12:28.777829348 +0200
+++ gcc/tree-ssa-math-opts.cc   2023-06-16 10:44:31.231798664 +0200
@@ -5558,9 +5558,12 @@ math_opts_dom_walker::after_dom_children
 
case PLUS_EXPR:
case MINUS_EXPR:
- if (!convert_plusminus_to_widen (&gsi, stmt, code)
- && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
-   match_uaddc_usubc (&gsi, stmt, code);
+ if (!convert_plusminus_to_widen (&gsi, stmt, code))
+   {
+ match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
+ if (gsi_stmt (gsi) == stmt)
+   match_uaddc_usubc (&gsi, stmt, code);
+   }
  break;
 
case BIT_NOT_EXPR:
--- gcc/testsuite/gcc.c-torture/compile/pr110271.c.jj   2023-06-16 
10:57:32.757621687 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr110271.c  2023-06-16 
10:57:15.298871335 +0200
@@ -0,0 +1,24 @@
+/* PR tree-optimization/110271 */
+
+unsigned a, b, c, d, e;
+
+void
+foo (unsigned *x, int y, unsigned int *z)
+{
+  for (int i = 0; i < y; i++)
+{
+  b += d;
+  a += b < d;
+  a += c = (__PTRDIFF_TYPE__) x > 3;
+  d = z[1] + (a < c);
+  a += e;
+  d += a < e;
+}
+}
+
+void
+bar (unsigned int *z)
+{
+  unsigned *x = x;
+  foo (x, 9, z);
+}

Jakub



Re: [PATCH v2] RISC-V: Implement vec_set and vec_extract.

2023-06-16 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-16 21:41
To: Jeff Law; gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
CC: rdapp.gcc
Subject: [PATCH v2] RISC-V: Implement vec_set and vec_extract.
Hi,
 
with the recent changes that we also pass the return value via
stack this can go forward now.
 
Changes in V2:
- Remove redundant force_reg.
- Change target selectors to those introduced in the binop patch.
 
Regards
Robin
 
 
This implements the vec_set and vec_extract patterns for integer and
floating-point data types.  For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.
 
vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.
 
The patch does not include vector-vector extraction which
will be done at a later time.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (vec_set): Implement.
(vec_extract): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
New test.
---
gcc/config/riscv/autovec.md   |  79 ++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   |  50 +++-
.../rvv/autovec/vls-vlmax/vec_extract-1.c |  57 +
.../rvv/autovec/vls-vlmax/vec_extract-2.c |  68 +
.../rvv/autovec/vls-vlmax/vec_extract-3.c |  69 +
.../rvv/autovec/vls-vlmax/vec_extract-4.c |  72 ++
.../rvv/autovec/vls-vlmax/vec_extract-run.c   | 239 +
.../autovec/vls-vlmax/vec_extract-zvfh-run.c  |  77 ++
.../riscv/rvv/autovec/vls-vlmax/vec_set-1.c   |  62 +
.../riscv/rvv/autovec/vls-vlmax/vec_set-2.c   |  74 ++
.../riscv/rvv/autovec/vls-vlmax/vec_set-3.c   |  76 ++
.../riscv/rvv/autovec/vls-vlmax/vec_set-4.c   |  79 ++
.../riscv/rvv/autovec/vls-vlmax/vec_set-run.c | 240 ++
.../rvv/autovec/vls-vlmax/vec_set-zvfh-run.c  |  78 ++
15 files changed, 1323 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c23a625afe1..9569b420d45 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -655,3 +655,82 @@ (define_expand "select_vl"
   riscv_vector::expand_select_vl (operands);
   DONE;
})
+
+;; -
+;;  [INT,FP] Insert a vector element.
+;; -
+
+(define_expand "vec_set"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand 2 "immediate_operand")]
+  "TARGET_VECTOR"
+{
+  /* If we set the first element, emit an v(f)mv.s.[xf].  */
+  if (operands[2] == const0_rtx)
+{
+  rtx ops[] = {operands[0], 

Re: [PATCH] c++: diagnostic ICE b/c of empty TPARMS_PRIMARY_TEMPLATE [PR109655]

2023-06-16 Thread Jason Merrill via Gcc-patches

On 6/9/23 11:19, Patrick Palka wrote:

When defining a previously declared class template, we neglect to set
TPARMS_PRIMARY_TEMPLATE for the in-scope template parameters, which the
class members go on to inherit, and so the members' DECL_TEMPLATE_PARMS
will have empty TPARMS_PRIMARY_TEMPLATE at those levels as well.  This
causes us to crash when diagnosing a constraint mismatch for an
out-of-line declaration of a member of a constrained class template.

This patch fixes this by walking the context to get at the corresponding
primary template instead.  I spent a while trying to get us to set
TPARMS_PRIMARY_TEMPLATE for templated class definitions that are
redeclarations, but it proved to be hairy in particular for partial
specializations and nested templates.


Maybe set it a bit earlier in push_template_decl, if you're going to 
figure out here what it should be anyway?



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/109655

gcc/cp/ChangeLog:

* pt.cc (push_template_decl): Handle TPARMS_PRIMARY_TEMPLATE
being empty when diagnosing a constraint mismatch for an
enclosing template scope.  Don't bother checking constraints
if DECL_PARMS and SCOPE_PARMS are the same.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-class6.C: New test.
* g++.dg/cpp2a/concepts-class6a.C: New test.
---
  gcc/cp/pt.cc  | 19 +++--
  gcc/testsuite/g++.dg/cpp2a/concepts-class6.C  | 30 ++
  gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C | 40 +++
  3 files changed, 86 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 17bf4d24151..f913b248345 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6155,12 +6155,25 @@ push_template_decl (tree decl, bool is_friend)
  decl_parms = TREE_CHAIN (decl_parms);
  scope_parms = TREE_CHAIN (scope_parms);
}
- while (decl_parms)
+ while (decl_parms && decl_parms != scope_parms)
{
  if (!template_requirements_equivalent_p (decl_parms, scope_parms))
{
- error ("redeclaration of %qD with different constraints",
-TPARMS_PRIMARY_TEMPLATE (TREE_VALUE (decl_parms)));
+ tree td = TPARMS_PRIMARY_TEMPLATE (TREE_VALUE (decl_parms));
+ if (!td)
+   {
+ /* FIXME: TPARMS_PRIMARY_TEMPLATE doesn't always get
+set for enclosing template scopes.  Work around
+this by walking the context to obtain the relevant
+(primary) template whose constraints we mismatch.  */
+ int level = TMPL_PARMS_DEPTH (decl_parms);
+ td = TYPE_TI_TEMPLATE (ctx);
+ while (!PRIMARY_TEMPLATE_P (td)
+|| (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (td))
+!= level))
+   td = TYPE_TI_TEMPLATE (DECL_CONTEXT (td));
+   }
+ error ("redeclaration of %qD with different constraints", td);
  break;
}
  decl_parms = TREE_CHAIN (decl_parms);
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
new file mode 100644
index 000..dcef6a2c9d4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-class6.C
@@ -0,0 +1,30 @@
+// PR c++/109655
+// { dg-do compile { target c++20 } }
+
+class C {
+  template
+  requires true
+  friend class D;
+
+  template
+  requires true
+  class E;
+};
+
+template
+requires true
+class D {
+  void f();
+};
+
+template
+void D::f() { } // { dg-error "class D' with different constraints" }
+
+template
+requires true
+class C::E {
+  void f();
+};
+
+template
+void C::E::f() { } // { dg-error "class C::E' with different constraints" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C
new file mode 100644
index 000..751d13cdf6c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-class6a.C
@@ -0,0 +1,40 @@
+// PR c++/109655
+// { dg-do compile { target c++20 } }
+
+template
+requires true
+class C {
+  class D;
+
+  template
+  requires (!!true)
+  class E;
+};
+
+template
+requires true
+class C::D {
+  void f();
+};
+
+template  // missing "requires true"
+void C::D::f() { } // { dg-error "class C' with different constraints" }
+
+template
+requires true
+template
+requires (!!true)
+class C::E {
+  void f();
+  void g();
+};
+
+template
+requires true
+template
+void C::E::f() { } // { dg-error "class C::E' with different 
constraints" }
+
+template
+template
+requires (!!true)
+void C::E::g() { } // { dg-error 

[wwwdocs] gcc-14/changes.htm - Offloading: -lm/-lgfortran is autolinked

2023-06-16 Thread Tobias Burnus

Thomas recently improved the offload experience by removing the need to use, e.g.

  gfortran -O3 -fopenmp qcd.f90 -lblas -foffload-options="-lgfortran -lm"

as libm and libgfortran now automatically get linked as 'gfortran' links
-lgfortran and -lm on the host (only those libraries, not others). Thus,
the commandline now looks much more natural:

  gfortran -O3 -fopenmp qcd.f90 -lblas

→ https://gcc.gnu.org/r14-1807-g4bcb46b3ade179 for the code change.

Attached patch documents it in the release notes.
I would love to hear comments, suggestions, improvements (or even appraisals).

If not, I will just commit it as is eventually, and it will have to be improved
later on ...

Tobias
gcc-14/changes.htm - Offloading: -lm/-lgfortran is autolinked

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index c403c94f..96653f05 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -51,6 +51,15 @@ a work-in-progress.
   was extended.
 
   
+  
+  For offload-device code generated via OpenMP and OpenACC, the math
+  library and the Fortran runtime library will now automatically be linked,
+  when the user or compiler links them on the host side. Thus, it is no
+  longer required to explicitly pass -lm and/or
+  -lgfortran to the offload-device linker using the https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options;
+  >-foffload-options= flag.
+  
 
 
 New Languages and Language specific improvements


[PATCH v2] RISC-V: Implement vec_set and vec_extract.

2023-06-16 Thread Robin Dapp via Gcc-patches
Hi,

with the recent changes that we also pass the return value via
stack this can go forward now.

Changes in V2:
 - Remove redundant force_reg.
 - Change target selectors to those introduced in the binop patch.

Regards
 Robin


This implements the vec_set and vec_extract patterns for integer and
floating-point data types.  For vec_set we broadcast the insert value to
a vector register and then perform a vslideup with effective length 1 to
the requested index.

vec_extract is done by sliding down the requested element to index 0
and v(f)mv.[xf].s to a scalar register.

The patch does not include vector-vector extraction which
will be done at a later time.
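
As a rough illustration of source that exercises these two patterns, here is a sketch using GCC's generic vector extension. The typedef and function names are made up and are not taken from the new tests; on a non-RISC-V target the same code simply goes through that target's own element insert/extract support.

```c
/* Illustration only: a 16-byte vector of int (four elements where
   int is 4 bytes), using the GCC vector_size extension.  */
typedef int v4si __attribute__((vector_size(16)));

/* Element insert at a constant index: the operation vec_set covers.  */
static v4si set_elem2(v4si v, int x)
{
    v[2] = x;
    return v;
}

/* Element read at a constant index: the operation vec_extract covers.  */
static int get_elem2(v4si v)
{
    return v[2];
}
```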

gcc/ChangeLog:

* config/riscv/autovec.md (vec_set): Implement.
(vec_extract): Implement.
* config/riscv/riscv-protos.h (enum insn_type): Add slide insn.
(emit_vlmax_slide_insn): Declare.
(emit_nonvlmax_slide_tu_insn): Declare.
(emit_scalar_move_insn): Export.
(emit_nonvlmax_integer_move_insn): Export.
* config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function.
(emit_nonvlmax_slide_tu_insn): New function.
(emit_vlmax_masked_mu_insn): No change.
(emit_vlmax_integer_move_insn): Export.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c:
New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c:
New test.
---
 gcc/config/riscv/autovec.md   |  79 ++
 gcc/config/riscv/riscv-protos.h   |   5 +
 gcc/config/riscv/riscv-v.cc   |  50 +++-
 .../rvv/autovec/vls-vlmax/vec_extract-1.c |  57 +
 .../rvv/autovec/vls-vlmax/vec_extract-2.c |  68 +
 .../rvv/autovec/vls-vlmax/vec_extract-3.c |  69 +
 .../rvv/autovec/vls-vlmax/vec_extract-4.c |  72 ++
 .../rvv/autovec/vls-vlmax/vec_extract-run.c   | 239 +
 .../autovec/vls-vlmax/vec_extract-zvfh-run.c  |  77 ++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-1.c   |  62 +
 .../riscv/rvv/autovec/vls-vlmax/vec_set-2.c   |  74 ++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-3.c   |  76 ++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-4.c   |  79 ++
 .../riscv/rvv/autovec/vls-vlmax/vec_set-run.c | 240 ++
 .../rvv/autovec/vls-vlmax/vec_set-zvfh-run.c  |  78 ++
 15 files changed, 1323 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index c23a625afe1..9569b420d45 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -655,3 +655,82 @@ (define_expand "select_vl"
   riscv_vector::expand_select_vl (operands);
   DONE;
 })
+
+;; -
+;;  [INT,FP] Insert a vector element.
+;; -
+
+(define_expand "vec_set"
+  [(match_operand:V0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand  2 "immediate_operand")]
+  "TARGET_VECTOR"
+{
+  /* If we set the first element, emit an v(f)mv.s.[xf].  */
+  if (operands[2] == const0_rtx)
+{
+  rtx ops[] = {operands[0], 

[PATCH v3] RISC-V: Add autovec FP unary operations.

2023-06-16 Thread Robin Dapp via Gcc-patches
Hi,

changes from V2:
 - No longer dependent on testsuite changes.
 - Add zvfhmin-1.c unary test cases.

Regards
 Robin

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similarly to the binop tests, there are flavors for zvfh now.
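
For illustration, loops of the following shape are the kind of input these expanders are aimed at. This is a hand-written sketch with made-up function names, not a copy of the test templates, and whether a given loop is actually vectorized depends on the flags and the target.

```c
/* Illustration only: element-wise negation, a candidate for vfneg.v.  */
static void neg_f32(float *dst, const float *a, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = -a[i];
}

/* Illustration only: element-wise absolute value, a candidate for
   vfabs.v.  */
static void abs_f32(float *dst, const float *a, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = __builtin_fabsf(a[i]);
}
```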

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add unops.
---
 gcc/config/riscv/autovec.md   | 36 ++-
 .../riscv/rvv/autovec/unop/abs-run.c  | 46 ++-
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-template.h | 17 +--
 .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
 .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 30 
 .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 12 +
 .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 12 +
 .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 +
 .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 33 +
 .../riscv/rvv/autovec/unop/vneg-run.c |  8 ++--
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-template.h|  5 +-
 .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 +++
 .../gcc.target/riscv/rvv/autovec/zvfhmin-1.c  | 16 ++-
 17 files changed, 284 insertions(+), 35 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "2"
 })
 
 ;; 
---
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
 ;; 
---
 
 (define_expand "abs2"
@@ -532,6 +532,40 @@ (define_expand "abs2"
   DONE;
 })
 
+;; 
---
+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_fp_insn (icode, riscv_vector::RVV_UNOP, operands);
+  

[PATCH v3] RISC-V: Add autovec FP binary operations.

2023-06-16 Thread Robin Dapp via Gcc-patches
Hi,

changes in v3:
 - No longer "dependent" on testsuite changes.  Just the zvfh
run testcases use riscv_zvfh_hw, i.e. require that we can compile,
link the code as well as execute the resulting binary.  
 - Renamed rounding modes (floating_point_rounding_mode feels a
bit long-winded but well...)

Regards
 Robin


This implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.
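
As a sketch of what such a binop loop looks like at the source level (made-up names, not the actual templates), max and min can be written with a plain comparison. Whether the vectorizer treats the comparison form as a max/min operation depends on the flags in use, -ffast-math being what the tests rely on for now.

```c
/* Illustration only: element-wise maximum, a candidate for vfmax.  */
static void max_f32(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] > b[i] ? a[i] : b[i];
}

/* Illustration only: element-wise minimum, a candidate for vfmin.  */
static void min_f32(float *dst, const float *a, const float *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] < b[i] ? a[i] : b[i];
}
```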

In order to allow proper _Float16 support we need to disable
promotion to float.  This patch handles that similarly to
TARGET_ZFH and TARGET_ZINX.  This is not strictly accurate,
as the zvfh extension only requires zfhmin, i.e. just
conversion to float and no actual operations.

gcc/ChangeLog:

* config/riscv/autovec.md (3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
* lib/target-supports.exp: Add riscv_vector_hw and riscv_zvfh_hw
target selectors.
---
 gcc/config/riscv/autovec.md   | 36 +
 gcc/config/riscv/riscv-protos.h   |  8 +-
 gcc/config/riscv/riscv-v.cc   | 74 ++-
 gcc/config/riscv/riscv.cc | 27 +--
 gcc/config/riscv/vector-iterators.md  | 28 +++
 .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
 .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
 .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 ++
 .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
 .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 ++
 .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
 .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
 

RE: [x86 PATCH] Convert ptestz of pandn into ptestc.

2023-06-16 Thread Roger Sayle

Hi Uros,
Here's an updated version of this patch incorporating your comments.
It uses emit_insn (target, const1_rtx), bt_comparison operator to
combine the sete/setne to setc/setnc, and je/jne to jc/jnc patterns,
uses scan-assembler-times in the test cases, and cleans up the silly
cut'n'paste issue that mangled strict_low_part/subreg of a register
that was already QImode.  I tried, but the strict_low_part variant
really is required (some of the new test cases fail without it), but
things are much neater now, with fewer patterns than the original.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?
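
As a rough sanity check of the identity behind the transformation described in the quoted message below, here is a scalar sketch.  It uses uint64_t as a stand-in for the 128-bit vector, and the function names are invented for this example; it models only the flag results, not the actual builtins.

```c
#include <stdint.h>

/* Scalar stand-ins for the flag results of ptest.  uint64_t replaces the
   128-bit vector purely for illustration.  */

/* ZF result: set when (a & b) is all zeros.  */
int
testz (uint64_t a, uint64_t b)
{
  return (a & b) == 0;
}

/* CF result: set when (~a & b) is all zeros.  */
int
testc (uint64_t a, uint64_t b)
{
  return (~a & b) == 0;
}

/* Form before the transformation: t = x & ~y; ptestz (t, t).  */
int
ptestz_of_andn (uint64_t x, uint64_t y)
{
  uint64_t t = x & ~y;
  return testz (t, t);
}

/* Form after the transformation: ptestc (y, x).  */
int
ptestc_swapped (uint64_t x, uint64_t y)
{
  return testc (y, x);
}
```

Both forms test whether x & ~y is zero, which is why only the consumer of the flag (ZF versus CF) has to change.  The same model shows that testc (x, x) is always 1, since ~x & x is zero.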


2023-06-16  Roger Sayle  
Uros Bizjak  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
expansion of ptestc with equal operands as producing const1_rtx.
* config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
estimates of UNSPEC_PTEST, where the ptest performs the PAND
or PANDN of its operands.
* config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
of reg_equal_p operands into an x86_stc instruction.
(define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
(define_split): Similar to above for strict_low_part destinations.
(define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.

gcc/testsuite/ChangeLog
* gcc.target/i386/avx-vptest-4.c: New test case.
* gcc.target/i386/avx-vptest-5.c: Likewise.
* gcc.target/i386/avx-vptest-6.c: Likewise.
* gcc.target/i386/pr109973-1.c: Update test case.
* gcc.target/i386/pr109973-2.c: Likewise.
* gcc.target/i386/sse4_1-ptest-4.c: New test case.
* gcc.target/i386/sse4_1-ptest-5.c: Likewise.
* gcc.target/i386/sse4_1-ptest-6.c: Likewise.


Thanks,
Roger
--

> -Original Message-
> From: Uros Bizjak 
> Sent: 14 June 2023 09:31
> To: Roger Sayle 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Convert ptestz of pandn into ptestc.
> 
> On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle 
> wrote:
> >
> >
> > This patch is the next instalment in a set of backend patches around
> > improvements to ptest/vptest.  A previous patch optimized the sequence
> > t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
> > property that ZF is set to (X&Y) == 0.  This patch performs a similar
> > transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
> > equivalent ptestc(y,x), using the property that the CF flag is set to
> > (~X&Y) == 0.  The tricky bit is that this sets the CF flag instead of
> > the ZF flag, so we can only perform this transformation when we can
> > also convert the flags' consumer, as well as the producer.
> >
> > For the test case:
> >
> > int foo (__m128i x, __m128i y)
> > {
> >   __m128i a = x & ~y;
> >   return __builtin_ia32_ptestz128 (a, a);
> > }
> >
> > With -O2 -msse4.1 we previously generated:
> >
> > foo:    pandn   %xmm0, %xmm1
> >         xorl    %eax, %eax
> >         ptest   %xmm1, %xmm1
> >         sete    %al
> >         ret
> >
> > with this patch we now generate:
> >
> > foo:    xorl    %eax, %eax
> >         ptest   %xmm0, %xmm1
> >         setc    %al
> >         ret
> >
> > At the same time, this patch also provides alternative fixes for PR
> > target/109973 and PR target/110118, by recognizing that ptestc(x,x)
> > always sets the carry flag (X&~X is always zero).  This is achieved
> > both by recognizing the special case in ix86_expand_sse_ptest and with
> > a splitter to convert an eligible ptest into an stc.
> >
> > The next piece is, of course, STV of "if (x & ~y)..."
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> > 2023-06-13  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
> > expansion of ptestc with equal operands as returning const1_rtx.
> > * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
> > estimates of UNSPEC_PTEST, where the ptest performs the PAND
> > or PAND of its operands.
> > * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
> > of reg_equal_p operands into an x86_stc instruction.
> > (define_split): Split pandn/ptestz/setne into ptestc/setnc.
> > (define_split): Split pandn/ptestz/sete into ptestc/setc.
> > (define_split): Split pandn/ptestz/je into ptestc/jc.
> > (define_split): Split pandn/ptestz/jne into ptestc/jnc.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/avx-vptest-4.c: New test case.
> > * gcc.target/i386/avx-vptest-5.c: Likewise.
> > * gcc.target/i386/avx-vptest-6.c: Likewise.
> > * 

Re: [PATCH v2] Add MinGW option -mcrtdll= for choosing C RunTime DLL library

2023-06-16 Thread Jonathan Yong via Gcc-patches

On 6/14/23 16:09, Pali Rohár wrote:

It adjusts the preprocess, compile and link flags, which allows replacing
the default -lmsvcrt library with another one provided by the MinGW runtime.

gcc/
  * config/i386/mingw-w64.h (CPP_SPEC): Adjust for -mcrtdll=.
  (REAL_LIBGCC_SPEC): New define.
  * config/i386/mingw.opt: Add mcrtdll=
  * config/i386/mingw32.h (CPP_SPEC): Adjust for -mcrtdll=.
  (REAL_LIBGCC_SPEC): Adjust for -mcrtdll=.
  (STARTFILE_SPEC): Adjust for -mcrtdll=.
  * doc/invoke.texi: Add mcrtdll= documentation.
---
Changes in v2:
* Fixed doc/invoke.texi documentation
---
  gcc/config/i386/mingw-w64.h | 22 +-
  gcc/config/i386/mingw.opt   |  4 
  gcc/config/i386/mingw32.h   | 28 
  gcc/doc/invoke.texi | 24 +++-
  4 files changed, 72 insertions(+), 6 deletions(-)



Thanks, pushed to master branch.




[PATCH][4/5] aarch64: [US]Q(R)SHR(U)N2 refactoring

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
This patch is large in lines of code, but it is a fairly regular
extension of the first patch as it converts the high-half patterns
to standard RTL codes in the same fashion as the first patch did for the
low-half ones.
This now allows us to remove the unspec codes for these instructions as
there are no more uses of them left.

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (shrn2): Rename builtins to...
(shrn2_n): ... This.
(rshrn2): Rename builtins to...
(rshrn2_n): ... This.
* config/aarch64/arm_neon.h (vrshrn_high_n_s16): Adjust for the above.
(vrshrn_high_n_s32): Likewise.
(vrshrn_high_n_s64): Likewise.
(vrshrn_high_n_u16): Likewise.
(vrshrn_high_n_u32): Likewise.
(vrshrn_high_n_u64): Likewise.
(vshrn_high_n_s16): Likewise.
(vshrn_high_n_s32): Likewise.
(vshrn_high_n_s64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_shrn2_vect_le):
Delete.
(*aarch64_shrn2_vect_be): Likewise.
(aarch64_shrn2_insn_le): Likewise.
(aarch64_shrn2_insn_be): Likewise.
(aarch64_shrn2): Likewise.
(aarch64_rshrn2_insn_le): Likewise.
(aarch64_rshrn2_insn_be): Likewise.
(aarch64_rshrn2): Likewise.
(aarch64_qshrn2_n_insn_le): Likewise.
(aarch64_shrn2_n_insn_le): New define_insn.
(aarch64_qshrn2_n_insn_be): Delete.
(aarch64_shrn2_n_insn_be): New define_insn.
(aarch64_qshrn2_n): Delete.
(aarch64_shrn2_n): New define_expand.
(aarch64_rshrn2_n_insn_le): New define_insn.
(aarch64_rshrn2_n_insn_be): New define_insn.
(aarch64_rshrn2_n): New define_expand.
(aarch64_sqshrun2_n_insn_le): New define_insn.
(aarch64_sqshrun2_n_insn_be): New define_insn.
(aarch64_sqshrun2_n): New define_expand.
(aarch64_sqrshrun2_n_insn_le): New define_insn.
(aarch64_sqrshrun2_n_insn_be): New define_insn.
(aarch64_sqrshrun2_n): New define_expand.
* config/aarch64/iterators.md (UNSPEC_SQSHRUN, UNSPEC_SQRSHRUN,
UNSPEC_SQSHRN, UNSPEC_UQSHRN, UNSPEC_SQRSHRN, UNSPEC_UQRSHRN):
Delete unspec values.
(VQSHRN_N): Delete int iterator.


s4.patch
Description: s4.patch


Re: [PATCH v3] c++: Accept elaborated-enum-base with pedwarn

2023-06-16 Thread Jason Merrill via Gcc-patches

On 6/16/23 07:58, Alex Coplan wrote:

Hi,

This is a v3 patch addressing feedback for:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621714.html

The only change since the previous version is that the new option is
documented in invoke.texi (and the description in c.opt was shortened as
requested).

--

macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:

typedef enum T : BaseType T;

i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.

This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.

The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html

GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.

With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
OK for trunk?


OK, thanks.


Thanks,
Alex

gcc/c-family/ChangeLog:

* c.opt (Welaborated-enum-base): New.

gcc/ChangeLog:

* doc/invoke.texi: Document -Welaborated-enum-base.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_enum_specifier): Don't reject
elaborated-type-specifier with enum-base, instead emit new
Welaborated-enum-base warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/elab-enum-base.C: New test.




[PATCH][0/5][committed] aarch64: Reimplement [US]Q(R)SHR(U)N(2) patterns with standard RTL codes

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch series reimplements the MD patterns for the instructions that
perform narrowing right shifts with optional rounding and saturation
using standard RTL codes rather than unspecs.  This includes the scalar
forms and the *2 forms that write to the high half of the result vector.
This allows us to get rid of a number of unspecs and should significantly
improve the simplification capabilities around these instructions.
I attempted to compress as many forms as possible with iterators and the
end result looks reasonably orthogonal with a few small exceptions described
in the individual patches.

The semantics are pretty well exercised by tests in advsimd-intrinsics.exp and
in many of those tests the intrinsics involved are now entirely evaluated at
compile-time and disappear from the output at optimisation levels. The validation
against the reference numbers still passes (though I got many failures during
development as I was getting little things wrong, so the tests are working as
intended!).

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill


[PATCH][2/5] aarch64: [US]Q(R)SHR(U)N scalar forms refactoring

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
Some instructions from the previous patch have scalar forms:
SQSHRN,SQRSHRN,UQSHRN,UQRSHRN,SQSHRUN,SQRSHRUN.
This patch converts the patterns for these to use standard RTL codes.
Their MD patterns deviate slightly from the vector forms mostly due to
things like operands being scalar rather than vectors.
One nuance is in the SQSHRUN,SQRSHRUN patterns. These end in a truncate
to the scalar narrow mode e.g. SI -> QI.  This gets simplified by the
RTL passes to a subreg rather than keeping it as a truncate.
So we end up representing these without the truncate and in the expander
read the narrow subreg in order to comply with the expected width of the
intrinsic.

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_qshrn_n):
Rename to...
(aarch64_shrn_n): ... This.  Reimplement with RTL codes.
(*aarch64_rshrn_n_insn): New define_insn.
(aarch64_sqrshrun_n_insn): Likewise.
(aarch64_sqshrun_n_insn): Likewise.
(aarch64_rshrn_n): New define_expand.
(aarch64_sqshrun_n): Likewise.
(aarch64_sqrshrun_n): Likewise.
* config/aarch64/iterators.md (V2XWIDE): Add HI and SI modes.


s2.patch
Description: s2.patch


[PATCH][5/5] aarch64: Handle ASHIFTRT in patterns for shrn2

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
Similar to the low-half patterns, we want to match both ashiftrt and
lshiftrt with the truncate for SHRN2.  We reuse the SHIFTRT iterator
and the AARCH64_VALID_SHRN_OP check to help, but because we expand the
high-half patterns by their gen_* names we need to disambiguate all the
different trunc+shift combinations in the pattern name, which leads to a
slight renaming of the builtins.  The AARCH64_VALID_SHRN_OP check on the
expander and the define_insns ensures that no invalid combination ends
up getting matched.

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (shrn2_n): Rename builtins to...
(ushrn2_n): ... This.
(sqshrn2_n): Rename builtins to...
(ssqshrn2_n): ... This.
(uqshrn2_n): Rename builtins to...
(uqushrn2_n): ... This.
* config/aarch64/arm_neon.h (vqshrn_high_n_s16): Adjust for the above.
(vqshrn_high_n_s32): Likewise.
(vqshrn_high_n_s64): Likewise.
(vqshrn_high_n_u16): Likewise.
(vqshrn_high_n_u32): Likewise.
(vqshrn_high_n_u64): Likewise.
(vshrn_high_n_s16): Likewise.
(vshrn_high_n_s32): Likewise.
(vshrn_high_n_s64): Likewise.
(vshrn_high_n_u16): Likewise.
(vshrn_high_n_u32): Likewise.
(vshrn_high_n_u64): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_shrn2_n_insn_le):
Rename to...
(aarch64_shrn2_n_insn_le): ... This.
Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check.
(aarch64_shrn2_n_insn_be): Rename to...
(aarch64_shrn2_n_insn_be): ... This.
Use SHIFTRT iterator and AARCH64_VALID_SHRN_OP check.
(aarch64_shrn2_n): Rename to...
(aarch64_shrn2_n): ... This.
Update expander for the above.


s5.patch
Description: s5.patch


[PATCH][3/5] aarch64: Add ASHIFTRT handling for shrn pattern

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
The first patch in the series has some fallout in the testsuite,
particularly gcc.target/aarch64/shrn-combine-2.c.
Our previous patterns for SHRN matched both
(truncate (ashiftrt (x) (N))) and (truncate (lshiftrt (x) (N)))
as these are equivalent for the shift amounts involved.
In our refactoring, however, we mapped shrn to truncate+lshiftrt.
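
The equivalence claimed above can be checked with a small element-level model.  This sketch assumes a 16-bit to 8-bit narrowing with shift amounts 1..8; the helper names are made up for the example.

```c
#include <stdint.h>

/* Portable arithmetic shift right of a 16-bit value (n in 0..15).  */
static int16_t
asr16 (int16_t x, unsigned n)
{
  return x < 0 ? (int16_t) ~(~x >> n) : (int16_t) (x >> n);
}

/* (truncate (ashiftrt x n)) for one 16-bit element.  */
uint8_t
narrow_ashiftrt (int16_t x, unsigned n)
{
  return (uint8_t) asr16 (x, n);
}

/* (truncate (lshiftrt x n)) for one 16-bit element.  */
uint8_t
narrow_lshiftrt (int16_t x, unsigned n)
{
  return (uint8_t) ((uint16_t) x >> n);
}

/* Return 1 if both forms agree for every 16-bit input at shift N.  */
int
shifts_agree (unsigned n)
{
  for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
    if (narrow_ashiftrt ((int16_t) v, n) != narrow_lshiftrt ((int16_t) v, n))
      return 0;
  return 1;
}
```

For shift amounts up to the narrow width, the bits kept by the truncation all come from the original value, so the sign fill of the arithmetic shift is never observed.  A shift of 9, which is outside the assumed range, makes the two forms differ for negative inputs.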

The fix here is to iterate over ashiftrt,lshiftrt in the pattern for it.
However, we don't want to allow ashiftrt for us_truncate or lshiftrt for
ss_truncate from the ALL_TRUNC iterator.

This patch adds an AARCH64_VALID_SHRN_OP helper to gate the valid
combinations of truncations and shifts.
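
The gating described here can be modelled as a small predicate.  This is only a sketch of the rule as stated in this message, with invented enum and function names; it is not the definition of AARCH64_VALID_SHRN_OP from the patch.

```c
/* Hypothetical stand-ins for the RTL codes involved.  */
enum trunc_code { TRUNC_PLAIN, TRUNC_SS, TRUNC_US };
enum shift_code { SHIFT_ASHIFTRT, SHIFT_LSHIFTRT };

/* Return nonzero if the truncation/shift pair is a combination that
   should be allowed to match.  */
int
valid_shrn_op (enum trunc_code t, enum shift_code s)
{
  switch (t)
    {
    case TRUNC_PLAIN:
      return 1;                     /* truncate: either shift is fine */
    case TRUNC_SS:
      return s == SHIFT_ASHIFTRT;   /* ss_truncate: arithmetic only */
    case TRUNC_US:
      return s == SHIFT_LSHIFTRT;   /* us_truncate: logical only */
    }
  return 0;
}
```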

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.

gcc/ChangeLog:

* config/aarch64/aarch64.h (AARCH64_VALID_SHRN_OP): Define.
* config/aarch64/aarch64-simd.md
(*aarch64_shrn_n_insn): Rename to...
(*aarch64_shrn_n_insn): ... This.
Use SHIFTRT iterator and add AARCH64_VALID_SHRN_OP to condition.
* config/aarch64/iterators.md (shrn_s): New code attribute.


s3.patch
Description: s3.patch


[PATCH][1/5] aarch64: Reimplement [US]Q(R)SHR(U)N patterns with RTL codes

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
This patch reimplements the MD patterns for the instructions that
perform narrowing right shifts with optional rounding and saturation
using standard RTL codes rather than unspecs.

There are four groups of patterns involved:

* Simple narrowing shifts with optional signed or unsigned truncation:
SHRN, SQSHRN, UQSHRN.  These are expressed as a truncation operation of
a right shift.  The matrix of valid combinations looks like this:

            |   ashiftrt   |   lshiftrt   |
-------------------------------------------
ss_truncate |    SQSHRN    |      X       |
us_truncate |      X       |    UQSHRN    |
truncate    |      X       |     SHRN     |
-------------------------------------------

* Narrowing shifts with rounding with optional signed or unsigned
truncation: RSHRN, SQRSHRN, UQRSHRN.  These follow the same
combinations of truncation and shift codes as above, but also perform
intermediate widening of the results in order to represent the addition
of the rounding constant.  This group also corrects an existing
inaccuracy for RSHRN where we don't currently model the intermediate
widening for rounding.

* The somewhat special "Signed saturating Shift Right Unsigned Narrow":
SQSHRUN.  Similar to the SQXTUN instructions, these perform a
saturating truncation that isn't represented by US_TRUNCATE or
SS_TRUNCATE but needs to use a clamping operation followed by a
TRUNCATE.

* The rounding version of the above: SQRSHRUN.  It needs the special
clamping truncate representation but with an intermediate widening and
rounding addition.

Besides using standard RTL codes for all of the above instructions, this
patch allows us to get rid of the explicit define_insns and
define_expands for SHRN and RSHRN.
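
To make the four groups concrete, here is a per-element model for a 16-bit to 8-bit narrowing.  It is a sketch under the assumption that the shift amount n is in 1..8; the function names simply mirror the mnemonics and are not part of the patch.

```c
#include <stdint.h>

/* Arithmetic shift right on a widened value, written portably.  */
static int32_t
asr32 (int32_t x, unsigned n)
{
  return x < 0 ? ~(~x >> n) : x >> n;
}

/* Saturate V to the inclusive range [LO, HI].  */
static int32_t
clamp (int32_t v, int32_t lo, int32_t hi)
{
  return v < lo ? lo : v > hi ? hi : v;
}

/* SHRN: shift, then plain truncation.  */
uint8_t
shrn (uint16_t x, unsigned n)
{
  return (uint8_t) (x >> n);
}

/* RSHRN: add the rounding constant in a wider type, shift, truncate.  */
uint8_t
rshrn (uint16_t x, unsigned n)
{
  return (uint8_t) (((uint32_t) x + (1u << (n - 1))) >> n);
}

/* SQSHRN: arithmetic shift, then signed saturation.  */
int8_t
sqshrn (int16_t x, unsigned n)
{
  return (int8_t) clamp (asr32 (x, n), INT8_MIN, INT8_MAX);
}

/* UQSHRN: logical shift, then unsigned saturation.  */
uint8_t
uqshrn (uint16_t x, unsigned n)
{
  return (uint8_t) clamp (x >> n, 0, UINT8_MAX);
}

/* SQSHRUN: signed input, clamp to the unsigned range, then truncate.  */
uint8_t
sqshrun (int16_t x, unsigned n)
{
  return (uint8_t) clamp (asr32 (x, n), 0, UINT8_MAX);
}

/* SQRSHRUN: as SQSHRUN, with the widened rounding addition first.  */
uint8_t
sqrshrun (int16_t x, unsigned n)
{
  return (uint8_t) clamp (asr32 ((int32_t) x + (1 << (n - 1)), n),
                          0, UINT8_MAX);
}
```

The clamp-then-truncate shape of the last two functions corresponds to the "clamping operation followed by a TRUNCATE" mentioned for SQSHRUN, and the casts to a 32-bit type stand in for the intermediate widening used by the rounding forms.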

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.  We've got pretty thorough execute tests in
advsimd-intrinsics.exp that exercise these, and many instances of these
instructions get constant-folded away during optimisation while the
validation still passes (during development, while I was figuring out the
details of the semantics, the tests were catching failures), so I'm fairly
confident in the representation.

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (shrn): Rename builtins to...
(shrn_n): ... This.
(rshrn): Rename builtins to...
(rshrn_n): ... This.
* config/aarch64/arm_neon.h (vshrn_n_s16): Adjust for the above.
(vshrn_n_s32): Likewise.
(vshrn_n_s64): Likewise.
(vshrn_n_u16): Likewise.
(vshrn_n_u32): Likewise.
(vshrn_n_u64): Likewise.
(vrshrn_n_s16): Likewise.
(vrshrn_n_s32): Likewise.
(vrshrn_n_s64): Likewise.
(vrshrn_n_u16): Likewise.
(vrshrn_n_u32): Likewise.
(vrshrn_n_u64): Likewise.
* config/aarch64/aarch64-simd.md
(*aarch64_shrn): Delete.
(aarch64_shrn): Likewise.
(aarch64_rshrn_insn): Likewise.
(aarch64_rshrn): Likewise.
(aarch64_qshrn_n_insn): Likewise.
(aarch64_qshrn_n): Likewise.
(*aarch64_shrn_n_insn): New define_insn.
(*aarch64_rshrn_n_insn): Likewise.
(*aarch64_sqshrun_n_insn): Likewise.
(*aarch64_sqrshrun_n_insn): Likewise.
(aarch64_shrn_n): New define_expand.
(aarch64_rshrn_n): Likewise.
(aarch64_sqshrun_n): Likewise.
(aarch64_sqrshrun_n): Likewise.
* config/aarch64/iterators.md (ALL_TRUNC): New code iterator.
(TRUNCEXTEND): New code attribute.
(TRUNC_SHIFT): Likewise.
(shrn_op): Likewise.
* config/aarch64/predicates.md (aarch64_simd_umax_quarter_mode):
New predicate.


s1.patch
Description: s1.patch


Re: [PATCH v2] RISC-V: Add autovec FP binary operations.

2023-06-16 Thread Robin Dapp via Gcc-patches
> Why do we need '-ffast-math' with the tests?

Normally we would use the COND_ADD to mask out possibly trapping
vector elements and the like, but COND_ADD works with normal
vector masking.  What we currently have is no masking but the
LEN_LOAD/LEN_STORE machinery i.e. length-controlled loops.
There is no LEN_MASK_COND_ADD yet, but Juzhe is working to upstream
it.  Once this is in place we don't need -ffast-math anymore.
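
To illustrate what "length-controlled" means here, a scalar sketch follows.  VL and the function name are assumptions for the example; the inner loop stands in for a single vector operation that acts on the first len elements, with no per-element mask.

```c
#include <stddef.h>

#define VL 4  /* hypothetical number of elements per vector */

void
add_len_controlled (float *dst, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i += VL)
    {
      /* Active length for this iteration: a full vector, or the tail.  */
      size_t len = n - i < VL ? n - i : VL;

      /* Stands in for one vector add on LEN elements.  */
      for (size_t j = 0; j < len; j++)
        dst[i + j] = a[i + j] + b[i + j];
    }
}
```

Elements past len are simply not processed, which is different from a mask that selects arbitrary lanes as COND_ADD does.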

Regards
 Robin



[PATCH] tree-optimization/110243 - kill off IVOPTs split_offset

2023-06-16 Thread Richard Biener via Gcc-patches
IVOPTs has strip_offset which suffers from the same issues regarding
integer overflow that split_constant_offset did but the latter was
fixed quite some time ago.  The following implements strip_offset
in terms of split_constant_offset, removing the redundant and
incorrect implementation.

The implementations are not exactly the same: strip_offset relies
on ptrdiff_tree_p to fend off too large offsets, while split_constant_offset
simply assumes those do not happen and truncates them.  By
the same means strip_offset also handles POLY_INT_CSTs, but
split_constant_offset does not.  Massaging the latter to
behave like strip_offset in those cases might be the way to go?
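
As a generic illustration of the class of overflow problem mentioned above (not the specific case from PR110243), the sketch below shows why a constant cannot always be split out of an addition done in a narrower type.  The function names are invented for the example.

```c
#include <stdint.h>

/* Offset as written: the addition wraps in 32 bits before widening.  */
uint64_t
offset_as_written (uint32_t i)
{
  return (uint64_t) (i + 1u);
}

/* Offset after splitting the constant out of the narrow addition:
   the addition now happens in 64 bits and cannot wrap at 2^32.  */
uint64_t
offset_after_split (uint32_t i)
{
  return (uint64_t) i + 1u;
}
```

The two agree for most inputs but differ when the narrow addition wraps, so a transformation from the first form to the second is only valid if overflow can be ruled out.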

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Comments?

Thanks,
Richard.

PR tree-optimization/110243
* tree-ssa-loop-ivopts.cc (strip_offset_1): Remove.
(strip_offset): Make it a wrapper around split_constant_offset.

* gcc.dg/torture/pr110243.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110243.c |  22 +++
 gcc/tree-ssa-loop-ivopts.cc | 182 ++--
 2 files changed, 32 insertions(+), 172 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110243.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110243.c b/gcc/testsuite/gcc.dg/torture/pr110243.c
new file mode 100644
index 000..07dffd95d4d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110243.c
@@ -0,0 +1,22 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+
+#define X 11
+unsigned char a;
+long b = X;
+int c[9][1];
+unsigned d;
+static long *e = &b, *f = &b;
+int g() {
+  if (a && a <= '9')
+return '0';
+  if (a)
+return 10;
+  return -1;
+}
+int main() {
+  d = 0;
+  for (; (int)*f -(X-1) + d < 9; d++)
+c[g() + (int)*f + ((int)*e - X) -(X-1) + d]
+ [0] = 0;
+}
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 6fbd2d59318..a03764072a4 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -2772,183 +2772,21 @@ find_interesting_uses (struct ivopts_data *data, basic_block *body)
 }
 }
 
-/* Strips constant offsets from EXPR and stores them to OFFSET.  If INSIDE_ADDR
-   is true, assume we are inside an address.  If TOP_COMPREF is true, assume
-   we are at the top-level of the processed address.  */
-
-static tree
-strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
-   poly_int64 *offset)
-{
-  tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step;
-  enum tree_code code;
-  tree type, orig_type = TREE_TYPE (expr);
-  poly_int64 off0, off1;
-  HOST_WIDE_INT st;
-  tree orig_expr = expr;
-
-  STRIP_NOPS (expr);
-
-  type = TREE_TYPE (expr);
-  code = TREE_CODE (expr);
-  *offset = 0;
-
-  switch (code)
-{
-case POINTER_PLUS_EXPR:
-case PLUS_EXPR:
-case MINUS_EXPR:
-  op0 = TREE_OPERAND (expr, 0);
-  op1 = TREE_OPERAND (expr, 1);
-
-      op0 = strip_offset_1 (op0, false, false, &off0);
-      op1 = strip_offset_1 (op1, false, false, &off1);
-
-  *offset = (code == MINUS_EXPR ? off0 - off1 : off0 + off1);
-  if (op0 == TREE_OPERAND (expr, 0)
- && op1 == TREE_OPERAND (expr, 1))
-   return orig_expr;
-
-  if (integer_zerop (op1))
-   expr = op0;
-  else if (integer_zerop (op0))
-   {
- if (code == MINUS_EXPR)
-   expr = fold_build1 (NEGATE_EXPR, type, op1);
- else
-   expr = op1;
-   }
-  else
-   expr = fold_build2 (code, type, op0, op1);
-
-  return fold_convert (orig_type, expr);
-
-case MULT_EXPR:
-  op1 = TREE_OPERAND (expr, 1);
-  if (!cst_and_fits_in_hwi (op1))
-   return orig_expr;
-
-  op0 = TREE_OPERAND (expr, 0);
-      op0 = strip_offset_1 (op0, false, false, &off0);
-  if (op0 == TREE_OPERAND (expr, 0))
-   return orig_expr;
-
-  *offset = off0 * int_cst_value (op1);
-  if (integer_zerop (op0))
-   expr = op0;
-  else
-   expr = fold_build2 (MULT_EXPR, type, op0, op1);
-
-  return fold_convert (orig_type, expr);
-
-case ARRAY_REF:
-case ARRAY_RANGE_REF:
-  if (!inside_addr)
-   return orig_expr;
-
-  step = array_ref_element_size (expr);
-  if (!cst_and_fits_in_hwi (step))
-   break;
-
-  st = int_cst_value (step);
-  op1 = TREE_OPERAND (expr, 1);
-      op1 = strip_offset_1 (op1, false, false, &off1);
-  *offset = off1 * st;
-
-  if (top_compref
- && integer_zerop (op1))
-   {
- /* Strip the component reference completely.  */
- op0 = TREE_OPERAND (expr, 0);
- op0 = strip_offset_1 (op0, inside_addr, top_compref, &off0);
- *offset += off0;
- return op0;
-   }
-  break;
-
-case COMPONENT_REF:
-  {
-   tree field;
-
-   if (!inside_addr)
- return orig_expr;
-
-   tmp = component_ref_field_offset (expr);
-   field = TREE_OPERAND (expr, 1);
-   if (top_compref
-   && cst_and_fits_in_hwi (tmp)
-   && 

Re: [PATCH v3] configure: Implement --enable-host-pie

2023-06-16 Thread Marek Polacek via Gcc-patches
On Fri, Jun 16, 2023 at 12:26:23PM +0200, Martin Jambor wrote:
> Hello,
> 
> On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
> > On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
> >> 
> >> 
> >> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
> >> > Ping.  Anyone have any further comments?
> >> Given this was approved before, but got reverted due to issues (which have
> >> since been addressed) -- I think you might as well go forward and sooner
> >> rather than later so that we can catch fallout earlier.
> >
> > Thanks, pushed now, after rebasing, adjusting the patch for
> > r14-1385, and testing with and without --enable-host-pie on
> > both Debian and Fedora.
> >
> > If something comes up and I can't fix it quickly enough, I'll
> > have to revert the patch.  We'll see.
> >
> 
> The script that regularly checks that the checked-in autotools-generated
> files are in sync now complain about the following diff.  Unless someone
> stops me because I overlooked something or for some other reason, I will
> commit it later on as obvious.

Please, go ahead.
 
> I wonder where the "line" differences come from, perhaps you added a
> comment after running autoconf/automake/...?  The zlib/Makefile.in hunks

Arg, I think I must've messed up the #lines when rebasing though I don't
know what went wrong with zlib/Makefile.in.  But I don't think the latter
will actually make any difference.

> like something we should have, though, even if I did not check whether
> it makes any difference in practice.  And I want the checking script to
> shut up too ;-)

Thanks and sorry.

> Thanks,
> 
> Martin
> 
> 
> diff --git a/gcc/configure b/gcc/configure
> index a4563a9cade..f7b4b283ca2 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -19847,7 +19847,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19848 "configure"
> +#line 19850 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -19953,7 +19953,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 19954 "configure"
> +#line 19956 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> diff --git a/zlib/Makefile.in b/zlib/Makefile.in
> index 3f5102d1b87..80fe3b69116 100644
> --- a/zlib/Makefile.in
> +++ b/zlib/Makefile.in
> @@ -353,6 +353,8 @@ datadir = @datadir@
>  datarootdir = @datarootdir@
>  docdir = @docdir@
>  dvidir = @dvidir@
> +enable_host_pie = @enable_host_pie@
> +enable_host_shared = @enable_host_shared@
>  exec_prefix = @exec_prefix@
>  host = @host@
>  host_alias = @host_alias@
> diff --git a/zlib/configure b/zlib/configure
> index 77be6c284e3..9308866a636 100755
> --- a/zlib/configure
> +++ b/zlib/configure
> @@ -10763,7 +10763,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 10778 "configure"
> +#line 10766 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> @@ -10869,7 +10869,7 @@ else
>lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
>lt_status=$lt_dlunknown
>cat > conftest.$ac_ext <<_LT_EOF
> -#line 10884 "configure"
> +#line 10872 "configure"
>  #include "confdefs.h"
>  
>  #if HAVE_DLFCN_H
> 

Marek



Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-06-16 Thread Wilco Dijkstra via Gcc-patches

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches 
Cc: Richard Sandiford ; Kyrylo Tkachov 

Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]
 

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not supported.
This results in an implicit store which is invisible to software as long as the given
address is writeable (which will be true when using atomics in actual code).
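
The idea of a load that carries an implicit store can be sketched with C11 atomics.  This is only an analogy on a 64-bit object using compare-exchange; the patch itself implements the 128-bit case in assembly with an exclusive load/store pair, and the function name below is made up.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Read *P by writing the observed value back atomically.  The object is
   only ever overwritten with the value it already holds, so other
   threads cannot observe the store, but the memory must be writeable.  */
uint64_t
load_with_store_back (_Atomic uint64_t *p)
{
  uint64_t v = atomic_load_explicit (p, memory_order_relaxed);

  /* On failure V is refreshed with the current contents and we retry.  */
  while (!atomic_compare_exchange_weak_explicit (p, &v, v,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
    ;
  return v;
}
```

Calling this on an object in read-only memory would fault, which is the caveat behind the "as long as the given address is writeable" remark.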

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.

Passes regress, OK for commit?

libatomic/
    PR target/110061
    config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
    config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
    State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S b/libatomic/config/linux/aarch64/atomic_16.S
index 05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use an exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat_<op>_16 entry points are ARMv8.0.
+   The libat_<op>_16_i1 entry points are used when LSE2 is available.  */
+
+
 .arch   armv8-a+lse
 
 #define ENTRY(name) \
@@ -37,6 +52,10 @@ name:    \
 .cfi_endproc;   \
 .size name, .-name;
 
+#define ALIAS(alias,name)  \
+   .global alias;  \
+   .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:    \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+   mov x5, x0
+   cbnz    w1, 2f
+
+   /* RELAXED.  */
+1: ldxp    res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 1b
+   ret
+
+   /* ACQUIRE/CONSUME/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 2b
+   ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
 cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+   cbnz    w4, 2f
+
+   /* RELAXED.  */
+1: ldxp    xzr, tmp0, [x0]
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+2: ldxp    xzr, tmp0, [x0]
+   stlxp   w4, in0, in1, [x0]
+   cbnz    w4, 2b
+   ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
 cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
 ret
 
 /* RELEASE/SEQ_CST.  */
-1: ldaxp   xzr, tmp0, [x0]
+1: ldxp    xzr, tmp0, [x0]
 stlxp   w4, in0, in1, [x0]
 cbnz    w4, 1b
 ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
 mov x5, x0
 cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
 stxp    w4, in0, in1, [x5]
 cbnz    w4, 3b
 ret
-4:
-   cmp w4, RELEASE
-   b.ne    6f
 
-   /* RELEASE.  */
-5: ldxp    res0, res1, [x5]
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   res0, res1, [x5]
 stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 5b
+   cbnz    w4, 4b
 ret
+END (libat_exchange_16)
 
-   /* ACQ_REL/SEQ_CST.  */
-6: ldaxp   res0, res1, [x5]
-   stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+   ldp exp0, exp1, [x1]
+   cbz w4, 3f
+   cmp w4, RELEASE
+   b.hs    4f
+
+   /* ACQUIRE/CONSUME.  */
+1: ldaxp   tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2f
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   mov x0, 1
 ret
-END (libat_exchange_16_i1)
+
+2: stp tmp0, tmp1, [x1]
+   mov x0, 0
+   ret
+
+   /* RELAXED.  */
+3: ldxp    tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2b
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 3b
+   mov x0, 1
+   ret
+
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   tmp0, tmp1, 

[PATCH] [contrib] validate_failures.py: Don't consider summary line in wrong place

2023-06-16 Thread Thiago Jung Bauermann via Gcc-patches
When parsing a summary or manifest file, if we're not either after a tool
line (e.g. "=== gdb tests ===") or before a summary line (e.g.,
"=== gdb Summary ===") then the current line can't be a valid result line
so ignore it.
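
The rule described above can be sketched in a few lines of Python.  This is
a simplified illustration, not the script's code: RESULT_RE, TOOL_RE and
SUMMARY_RE are hypothetical stand-ins for the real regexes (including
_VALID_TEST_RESULTS_REX), and interesting_results is an invented helper.

```python
import re

# Hypothetical, simplified patterns; the real script's regexes differ.
RESULT_RE = re.compile(r'^(FAIL|XPASS|UNRESOLVED|ERROR): ')
TOOL_RE = re.compile(r'^\s*=== (\S+) tests ===\s*$')
SUMMARY_RE = re.compile(r'^\s*=== (\S+) Summary ===\s*$')


def interesting_results(lines):
    """Keep result lines only while a tool section is open."""
    current_tool = None
    kept = []
    for line in lines:
        line = line.rstrip('\n')
        tool = TOOL_RE.match(line)
        if tool:
            current_tool = tool.group(1)   # results section starts
            continue
        if SUMMARY_RE.match(line):
            current_tool = None            # results section is over
            continue
        # A line that looks like a result but sits outside any tool
        # section (e.g. trailing Tcl errors) is ignored.
        if RESULT_RE.match(line) and current_tool is not None:
            kept.append((current_tool, line))
    return kept
```

With this shape, an "ERROR: ---" line printed after the summary header never
reaches the code that builds a test result.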

This addresses a problem we're seeing when running the GDB testsuite in
our CI environment where it produces a valid summary file, but then after
the "=== gdb Summary ===" section it outputs a series of Tcl errors that
match _VALID_TEST_RESULTS_REX and thus confuse the parsing logic:

05: 14:32 .sum file seems to be broken: tool="None", exp="None", summary_line="ERROR: ---"
05: 14:32 Traceback (most recent call last):
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 706, in <module>
05: 14:32     retval = Main(sys.argv)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 697, in Main
05: 14:32     retval = CheckExpectedResults()
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 572, in CheckExpectedResults
05: 14:32     actual = GetResults(sum_files)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 447, in GetResults
05: 14:32     build_results.update(ParseSummary(sum_fname))
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 389, in ParseSummary
05: 14:32     result = result_set.MakeTestResult(line, ordinal)
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 236, in MakeTestResult
05: 14:32     return TestResult(summary_line, ordinal,
05: 14:32   File "/path/to/gcc/contrib/testsuite-management/validate_failures.py", line 148, in __init__
05: 14:32     raise

contrib/ChangeLog:

* testsuite-management/validate_failures.py (IsInterestingResult):
Add result_set argument and use it.  Adjust callers.
---
 .../testsuite-management/validate_failures.py  | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/contrib/testsuite-management/validate_failures.py b/contrib/testsuite-management/validate_failures.py
index 4dfd9cda4e24..11bb6f7e9c7c 100755
--- a/contrib/testsuite-management/validate_failures.py
+++ b/contrib/testsuite-management/validate_failures.py
@@ -295,10 +295,20 @@ def SplitAttributesFromSummaryLine(line):
   return (attrs, line)
 
 
-def IsInterestingResult(line):
+def IsInterestingResult(result_set, line):
   """Return True if line is one of the summary lines we care about."""
   (_, line) = SplitAttributesFromSummaryLine(line)
-  return bool(_VALID_TEST_RESULTS_REX.match(line))
+  valid_result = bool(_VALID_TEST_RESULTS_REX.match(line))
+
+  # If there's no tool defined it means that either the results section hasn't
+  # started yet, or it is already over.
+  if valid_result and result_set.current_tool is None:
+    if _OPTIONS.verbosity >= 3:
+      print(f'WARNING: Result "{line}" found outside sum file boundaries.',
+            file=sys.stderr)
+    return False
+
+  return valid_result
 
 
 def IsToolLine(line):
@@ -354,7 +364,7 @@ def ParseManifestWorker(result_set, manifest_path):
   result_set.remove(result_set.MakeTestResult(GetNegativeResult(line)))
 elif IsInclude(line):
   ParseManifestWorker(result_set, GetIncludeFile(line, manifest_path))
-elif IsInterestingResult(line):
+elif IsInterestingResult(result_set, line):
   result = result_set.MakeTestResult(line)
   if result.HasExpired():
 # Ignore expired manifest entries.
@@ -391,7 +401,7 @@ def ParseSummary(sum_fname):
   ordinal=0
   sum_file = open(sum_fname, encoding='latin-1', mode='r')
   for line in sum_file:
-if IsInterestingResult(line):
+if IsInterestingResult(result_set, line):
   result = result_set.MakeTestResult(line, ordinal)
   ordinal += 1
   if result.HasExpired():


[PATCH v3] c++: Accept elaborated-enum-base with pedwarn

2023-06-16 Thread Alex Coplan via Gcc-patches
Hi,

This is a v3 patch addressing feedback for:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621714.html

The only change since the previous version is that the new option is
documented in invoke.texi (and the description in c.opt was shortened as
requested).

--

macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:

typedef enum T : BaseType T;

i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.

This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.

The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html

GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.

With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
OK for trunk?

Thanks,
Alex

gcc/c-family/ChangeLog:

* c.opt (Welaborated-enum-base): New.

gcc/ChangeLog:

* doc/invoke.texi: Document -Welaborated-enum-base.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_enum_specifier): Don't reject
elaborated-type-specifier with enum-base, instead emit new
Welaborated-enum-base warning.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
* g++.dg/cpp0x/forw_enum6.C: Likewise.
* g++.dg/cpp0x/elab-enum-base.C: New test.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cead1995561..0930a3c0422 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1488,6 +1488,10 @@ Wsubobject-linkage
 C++ ObjC++ Var(warn_subobject_linkage) Warning Init(1)
 Warn if a class type has a base or a field whose type uses the anonymous namespace or depends on a type with no linkage.
 
+Welaborated-enum-base
+C++ ObjC++ Var(warn_elaborated_enum_base) Warning Init(1)
+Warn if an additional enum-base is used in an elaborated-type-specifier.
+
 Wduplicate-decl-specifier
 C ObjC Var(warn_duplicate_decl_specifier) Warning LangEnabledBy(C ObjC,Wall)
 Warn when a declaration has duplicate const, volatile, restrict or _Atomic specifier.
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d77fbd20e56..4dd290717de 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -21024,11 +21024,13 @@ cp_parser_enum_specifier (cp_parser* parser)
 
   /* Check for the `:' that denotes a specified underlying type in C++0x.
  Note that a ':' could also indicate a bitfield width, however.  */
+  location_t colon_loc = UNKNOWN_LOCATION;
   if (cp_lexer_next_token_is (parser->lexer, CPP_COLON))
 {
   cp_decl_specifier_seq type_specifiers;
 
   /* Consume the `:'.  */
+  colon_loc = cp_lexer_peek_token (parser->lexer)->location;
   cp_lexer_consume_token (parser->lexer);
 
   auto tdf
@@ -21077,10 +21079,13 @@ cp_parser_enum_specifier (cp_parser* parser)
  && cp_lexer_next_token_is_not (parser->lexer, CPP_SEMICOLON))
{
  if (has_underlying_type)
-   cp_parser_commit_to_tentative_parse (parser);
- cp_parser_error (parser, "expected %<;%> or %<{%>");
- if (has_underlying_type)
-   return error_mark_node;
+   pedwarn (colon_loc,
+OPT_Welaborated_enum_base,
+"declaration of enumeration with "
+"fixed underlying type and no enumerator list is "
+"only permitted as a standalone declaration");
+ else
+   cp_parser_error (parser, "expected %<;%> or %<{%>");
}
 }
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6d08229ce40..8ee5ba15709 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -254,7 +254,8 @@ in the following sections.
 -Wdelete-non-virtual-dtor  -Wno-deprecated-array-compare
 -Wdeprecated-copy -Wdeprecated-copy-dtor
 -Wno-deprecated-enum-enum-conversion -Wno-deprecated-enum-float-conversion
--Weffc++  -Wno-exceptions -Wextra-semi  -Wno-inaccessible-base
+-Weffc++ -Wno-elaborated-enum-base
+-Wno-exceptions -Wextra-semi  -Wno-inaccessible-base
 -Wno-inherited-variadic-ctor  -Wno-init-list-lifetime
 -Winvalid-constexpr -Winvalid-imported-macros
 -Wno-invalid-offsetof  -Wno-literal-suffix
@@ -3846,6 +3847,15 @@ bool b = e <= 3.7;
 @option{-std=c++20}.  In pre-C++20 dialects, this warning can be enabled
 by @option{-Wenum-conversion}.
 
+@opindex Welaborated-enum-base
+@opindex Wno-elaborated-enum-base
+@item -Wno-elaborated-enum-base
+For C++11 and above, warn if an 

RE: [PATCH v1] RISC-V: Fix one warning of maybe-uninitialized in riscv-vsetvl.cc

2023-06-16 Thread Li, Pan2 via Gcc-patches
Committed, thanks Robin.

Pan

-----Original Message-----
From: Robin Dapp
Sent: Friday, June 16, 2023 7:41 PM
To: Li, Pan2; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; jeffreya...@gmail.com; Wang, Yanzhang; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Fix one warning of maybe-uninitialized in riscv-vsetvl.cc

> This patch would like to fix one maybe-uninitialized warning. Aka:
> 
> riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]
> 
> Signed-off-by: Pan Li 

IMHO obvious enough that it doesn't need a maintainer's OK, so go
ahead.

We should make sure to find such nits before Andreas does in the
future, though, as we don't want to waste his time.
This means either proper bootstrapping or always configuring
with -Werror=all (or similar).  The latter should catch most and
is less intrusive.

Regards
 Robin



[PATCH] tree-optimization/110278 - uns < (typeof uns)(uns != 0) is always false

2023-06-16 Thread Richard Biener via Gcc-patches
The following adds two patterns simplifying comparisons,
uns < (typeof uns)(uns != 0) is always false and x != (typeof x)(x == 0)
is always true.
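
A quick brute-force check of both identities, written as a small Python
sketch.  It models an N-bit type by enumerating its value range; the function
name check_identities and the 8-bit default are assumptions made for the
illustration and have nothing to do with how match.pd evaluates the patterns.

```python
def check_identities(bits=8):
    """Exhaustively check both folds for a small integer width."""
    # uns < (typeof uns)(uns != 0): the right-hand side is 0 when uns is 0
    # and 1 otherwise, so an unsigned value is never smaller than it.
    for uns in range(1 << bits):
        rhs = int(uns != 0)
        if uns < rhs:
            return False

    # x != (typeof x)(x == 0): the right-hand side is 1 when x is 0 and
    # 0 otherwise, so it never equals x.  Signedness does not matter here.
    for x in range(-(1 << (bits - 1)), 1 << (bits - 1)):
        rhs = int(x == 0)
        if not (x != rhs):
            return False

    return True


# The first fold needs the TYPE_UNSIGNED guard: for a signed x = -1 the
# comparison -1 < 1 is true, so it is not "always false".
signed_counterexample = (-1 < int(-1 != 0))
```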

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110278
* match.pd (uns < (typeof uns)(uns != 0) -> false): New.
(x != (typeof x)(x == 0) -> true): Likewise.
---
 gcc/match.pd | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 264f9cb8a40..48b76e6a051 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6410,6 +6410,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (cmp == GT_EXPR)
  (lt (view_convert:st @0) { build_zero_cst (st); })))
 
+/* unsigned < (typeof unsigned)(unsigned != 0) is always false.  */
+(simplify
+ (lt:c @0 (convert (ne @0 integer_zerop)))
+ (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+  { constant_boolean_node (false, type); }))
+
+/* x != (typeof x)(x == 0) is always true.  */
+(simplify
+ (ne:c @0 (convert (eq @0 integer_zerop)))
+ { constant_boolean_node (true, type); })
+
 (for cmp (unordered ordered unlt unle ungt unge uneq ltgt)
  /* If the second operand is NaN, the result is constant.  */
  (simplify
-- 
2.35.3


Re: [PATCH v1] RISC-V: Fix one warning of maybe-uninitialized in riscv-vsetvl.cc

2023-06-16 Thread Robin Dapp via Gcc-patches
> This patch would like to fix one maybe-uninitialized warning. Aka:
> 
> riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]
> 
> Signed-off-by: Pan Li 

IMHO obvious enough that it doesn't need a maintainer's OK, so go
ahead.

We should make sure to find such nits before Andreas does in the
future, though, as we don't want to waste his time.
This means either proper bootstrapping or always configuring
with -Werror=all (or similar).  The latter should catch most and
is less intrusive.

Regards
 Robin



RE: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-16 Thread Li, Pan2 via Gcc-patches
Sorry for the inconvenience; I have filed a patch for this, see below.

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621980.html

Pan

-----Original Message-----
From: Andreas Schwab
Sent: Friday, June 16, 2023 6:55 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2
Subject: Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

Why didn't you test that??

../../gcc/config/riscv/riscv-vsetvl.cc: In member function 'bool pass_vsetvl::global_eliminate_vsetvl_insn(const rtl_ssa::bb_info*) const':
../../gcc/config/riscv/riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]
 4354 |   if (!vsetvl_rinsn)
  |   ^~
../../gcc/config/riscv/riscv-vsetvl.cc:4343:13: note: 'vsetvl_rinsn' was declared here
 4343 |   rtx_insn *vsetvl_rinsn;
  | ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/config/riscv/t-riscv:66: riscv-vsetvl.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH v1] RISC-V: Fix one warning of maybe-uninitialized in riscv-vsetvl.cc

2023-06-16 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch fixes a maybe-uninitialized warning:

riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pass_vsetvl::global_eliminate_vsetvl_insn): Initialize var by NULL.
---
 gcc/config/riscv/riscv-vsetvl.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index cae9be0d928..7066dea3d14 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -4340,7 +4340,7 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb)
 bool
 pass_vsetvl::global_eliminate_vsetvl_insn (const bb_info *bb) const
 {
-  rtx_insn *vsetvl_rinsn;
+  rtx_insn *vsetvl_rinsn = NULL;
   vector_insn_info dem = vector_insn_info ();
   const auto &block_info = get_block_info (bb);
   basic_block cfg_bb = bb->cfg_bb ();
-- 
2.34.1



Re: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

2023-06-16 Thread Andreas Schwab
Why didn't you test that??

../../gcc/config/riscv/riscv-vsetvl.cc: In member function 'bool pass_vsetvl::global_eliminate_vsetvl_insn(const rtl_ssa::bb_info*) const':
../../gcc/config/riscv/riscv-vsetvl.cc:4354:3: error: 'vsetvl_rinsn' may be used uninitialized [-Werror=maybe-uninitialized]
 4354 |   if (!vsetvl_rinsn)
  |   ^~
../../gcc/config/riscv/riscv-vsetvl.cc:4343:13: note: 'vsetvl_rinsn' was declared here
 4343 |   rtx_insn *vsetvl_rinsn;
  | ^~~~
cc1plus: all warnings being treated as errors
make[3]: *** [../../gcc/config/riscv/t-riscv:66: riscv-vsetvl.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread juzhe.zh...@rivai.ai
Address comments and send V6.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-16 18:10
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; rguenther; richard.sandiford
Subject: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
> <= (operand 2 + operand 4) are used."
 
Sorry it's really minor (and my mistake) but it should be < and
not <=, right?  Mask index 0 is inactive when the length is 0.
 
> +Perform a masked store (operand 2 + operand 4)
 
Even more minor but as mentioned the "of" is still missing ;)
Same with the <=.
 
Regards
Robin
 


[PATCH V6] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch passes bootstrap on x86. OK for trunk?

According to comments from Richi, the first patch is split so that it only
adds the ifn && optabs of LEN_MASK_{LOAD,STORE}; they are not applied in the
vectorizer in this patch. It also adds a BIAS argument for possible future
use by s390.

The description of the patterns in the docs comes from Robin.

After this patch is approved, I will send the second patch to apply the
len_mask_* patterns in the vectorizer.

Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) cannot use this approach in
auto-vectorization since RVV uses length in loop control.

This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use length in loop control.
Loads/stores are normalized into LEN_MASK_{LOAD,STORE} as long as either the
length or the mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of a comparison.

The LEN_MASK_{LOAD,STORE} format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
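
To make the intended semantics concrete, here is a small Python model, offered
as a sketch only.  It assumes an element is active when its index is below the
length and its mask bit is set, ignores the align and BIAS arguments, and uses
None for the undefined inactive elements.  The helper names and the explicit
base offset are hypothetical, chosen for the illustration.

```python
def len_mask_load(mem, base, length, mask):
    """Load element k only if k < length and mask[k] is set."""
    return [mem[base + k] if (k < length and mask[k]) else None
            for k in range(len(mask))]


def len_mask_store(mem, base, length, mask, vec):
    """Store element k only if k < length and mask[k] is set."""
    for k in range(len(mask)):
        if k < length and mask[k]:
            mem[base + k] = vec[k]


def masked_add(a, b, c, cond, vf=4):
    """Model of: for (i < n) if (cond[i]) a[i] = b[i] + c[i];"""
    n = len(a)
    for base in range(0, n, vf):
        loop_len = min(vf, n - base)   # stands in for SELECT_VL / MIN_EXPR
        # The mask is the outcome of the comparison; lanes past loop_len
        # are never read thanks to the short-circuit.
        mask = [k < loop_len and bool(cond[base + k]) for k in range(vf)]
        v1 = len_mask_load(b, base, loop_len, mask)
        v2 = len_mask_load(c, base, loop_len, mask)
        v3 = [x + y if x is not None and y is not None else None
              for x, y in zip(v1, v2)]
        len_mask_store(a, base, loop_len, mask, v3)
    return a
```

Running masked_add on six elements with vf=4 exercises both controls at once:
the second iteration has a length of 2, and the mask skips the lanes where
cond is false.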

Consider these 4 following cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    a[i] = b[i] + c[i];
IR (Does not use LEN_MASK_*):
  v1 = MEM (...)
  v2 = MEM (...)
  v3 = v1 + v2
  MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR (LEN_MASK_* with length = VF, mask = comparison):
  mask = comparison
  v1 = LEN_MASK_LOAD (length = VF, mask)
  v2 = LEN_MASK_LOAD (length = VF, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:
  for (int i = 0; i < n; i++)
    a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  mask = comparison
  v1 = LEN_MASK_LOAD (length = loop_len, mask)
  v2 = LEN_MASK_LOAD (length = loop_len, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask, v3)

Co-authored-by: Robin Dapp 

gcc/ChangeLog:

* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 53 -
 gcc/genopinit.cc|  6 +++--
 gcc/internal-fn.cc  | 43 
 gcc/internal-fn.def |  4 
 gcc/optabs.def  |  2 ++
 5 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a43fd65a2b2..6a52adba9c0 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,7 +5094,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_load_@var{m}} instruction pattern
 @item @samp{len_load_@var{m}}
-Load (operand 2 - operand 3) elements from vector memory operand 1
+Load (operand 2 - operand 3) elements from memory operand 1
 into vector register operand 0, setting the other elements of
 operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
 which must be a vector mode.  Operand 2 has whichever integer mode the
@@ -5136,6 +5136,57 @@ of @code{QI} elements.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load of (operand 2 + operand 4) elements from vector memory

Re: [PATCH v3] configure: Implement --enable-host-pie

2023-06-16 Thread Martin Jambor
Hello,

On Thu, Jun 15 2023, Marek Polacek via Gcc-patches wrote:
> On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
>> 
>> 
>> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
>> > Ping.  Anyone have any further comments?
>> Given this was approved before, but got reverted due to issues (which have
>> since been addressed) -- I think you might as well go forward and sooner
>> rather than later so that we can catch fallout earlier.
>
> Thanks, pushed now, after rebasing, adjusting the patch for
> r14-1385, and testing with and without --enable-host-pie on
> both Debian and Fedora.
>
> If something comes up and I can't fix it quickly enough, I'll
> have to revert the patch.  We'll see.
>

The script that regularly checks that the checked-in autotools-generated
files are in sync now complains about the following diff.  Unless someone
stops me because I overlooked something or for some other reason, I will
commit it later on as obvious.

I wonder where the "line" differences come from, perhaps you added a
comment after running autoconf/automake/...?  The zlib/Makefile.in hunks
look like something we should have, though, even if I did not check whether
it makes any difference in practice.  And I want the checking script to
shut up too ;-)

Thanks,

Martin


diff --git a/gcc/configure b/gcc/configure
index a4563a9cade..f7b4b283ca2 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -19847,7 +19847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19848 "configure"
+#line 19850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -19953,7 +19953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 19954 "configure"
+#line 19956 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
diff --git a/zlib/Makefile.in b/zlib/Makefile.in
index 3f5102d1b87..80fe3b69116 100644
--- a/zlib/Makefile.in
+++ b/zlib/Makefile.in
@@ -353,6 +353,8 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_pie = @enable_host_pie@
+enable_host_shared = @enable_host_shared@
 exec_prefix = @exec_prefix@
 host = @host@
 host_alias = @host_alias@
diff --git a/zlib/configure b/zlib/configure
index 77be6c284e3..9308866a636 100755
--- a/zlib/configure
+++ b/zlib/configure
@@ -10763,7 +10763,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10778 "configure"
+#line 10766 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -10869,7 +10869,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 10884 "configure"
+#line 10872 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H


[Ping] Re: [PATCH] avr: Set param_min_pagesize to 0 [PR105523]

2023-06-16 Thread SenthilKumar.Selvaraj--- via Gcc-patches
On Fri, 2023-06-02 at 12:32 +0530, Senthil Kumar Selvaraj wrote:
> On Mon, 2023-05-22 at 14:05 +0200, Richard Biener wrote:
> > On Fri, May 19, 2023 at 7:58 AM  wrote:
> > > On 26/04/23, 5:51 PM, "Richard Biener" wrote:
> > > > On Wed, Apr 26, 2023 at 12:56 PM  wrote:
> > > > > On Wed, Apr 26, 2023 at 3:15 PM Richard Biener via Gcc-patches wrote:
> > > > > > On Wed, Apr 26, 2023 at 11:42 AM Richard Biener wrote:
> > > > > > > On Wed, Apr 26, 2023 at 11:01 AM SenthilKumar.Selvaraj--- via Gcc-patches wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > This patch fixes PR 105523 by setting param_min_pagesize to 0 for the
> > > > > > > > avr target. For this target, zero and offsets from zero are perfectly
> > > > > > > > valid addresses, and the default value of param_min_pagesize ends up
> > > > > > > > triggering warnings on valid memory accesses.
> > > > > > > 
> > > > > > > I think the proper configuration is to have
> > > > > > > DEFAULT_ADDR_SPACE_ZERO_ADDRESS_VALID
> > > > > > 
> > > > > > Err, TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
> > > > > 
> > > > > That worked. Ok for trunk and backporting to 13 and 12 branches
> > > > > (pending regression testing)?
> > > > 
> > > > OK, but please let Denis time to comment.
> > > 
> > > Didn't hear from Denis. When running regression tests with this patch,
> > > I found that some tests with -fdelete-null-pointer-checks were
> > > failing. Commit 19416210b37db0584cd0b3f3b3961324b8973d25 made
> > > -fdelete-null-pointer-checks false by default, while still allowing it
> > > to be overridden from the command line (it was previously
> > > unconditionally false).
> > > 
> > > To keep the same behavior, I modified the hook to report zero
> > > addresses as valid only if -fdelete-null-pointer-checks is not set.
> > > With this change, all regression tests pass.
> > > 
> > > Ok for trunk and backporting to 13 and 12 branches?
> > 
> > I think that's bit backwards - this hook conveys more precise information
> > (it's address-space specific) and it is also more specific.  Instead I'd
> > suggest to set the flag to zero in the target like nios2 or msp430 do.
> > In fact we should probably initialize it using this hook (and using the
> > default address space).
> 
> Does the below patch work? The hook impl reports that zero address is
> valid, and flag_delete_null_pointer_checks is set to zero if the
> hook says zero is a valid address.
> 
> As flag_delete_null_pointer_checks is now always disabled for avr, I
> removed the resetting code in avr-common.cc that disables it for
> OPT_LEVELS_ALL by default, and added avr as a target that always keeps
> null pointer checks in testsuite/lib/target-supports.exp.
> 
> I also removed ATTRIBUTE_UNUSED and the parameter name in the target
> hook to address 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619014.html. 
> 
> PR 105523
> 
> gcc/ChangeLog:
> 
>   * common/config/avr/avr-common.cc: Remove setting
>   of OPT_fdelete_null_pointer_checks.
>   * config/avr/avr.cc (avr_option_override): Clear
>   flag_delete_null_pointer_checks if zero_address_valid.
>   (avr_addr_space_zero_address_valid): New function.
>   (TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Provide target
>   hook.
> 
> gcc/testsuite/ChangeLog:
> 
>   * lib/target-supports.exp
>   (check_effective_target_keeps_null_pointer_checks): Add
>   avr.
>   * gcc.target/avr/pr105523.c: New test.
> 
> diff --git gcc/common/config/avr/avr-common.cc gcc/common/config/avr/avr-common.cc
> index 2ad0244..2f874c5 100644
> --- gcc/common/config/avr/avr-common.cc
> +++ gcc/common/config/avr/avr-common.cc
> @@ -29,12 +29,6 @@
>  /* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
>  static const struct default_options avr_option_optimization_table[] =
>{
> -// With -fdelete-null-pointer-checks option, the compiler assumes
> -// that dereferencing of a null pointer would halt the program.
> -// For AVR this assumption is not true and a program can safely
> -// dereference null pointers.  Changes made by this option may not
> -// work properly for AVR.  So disable this option.
> -{ OPT_LEVELS_ALL, OPT_fdelete_null_pointer_checks, NULL, 0 },
>  // The only effect of -fcaller-saves might be that it triggers
>  // a frame without need when it tries to be smart around calls.
>  { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
> diff --git gcc/config/avr/avr.cc gcc/config/avr/avr.cc
> index a90cade..b987837 100644
> --- gcc/config/avr/avr.cc
> +++ gcc/config/avr/avr.cc
> @@ -756,6 +756,10 @@ 

Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-16 Thread Segher Boessenkool
Hi!

On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
> +/* Check if value C can be built by 2 instructions: one is 'li', another is
> +   rotldi.
> +
> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> +   is set to -1, and return true.  Return false otherwise.  */

Don't say "is set to -1", the point of having this is so you say "is set
to the "li" value".  Just like you describe what SHIFT is for.
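
For readers following along, the idea of building a constant with 'li'
followed by a rotate can be sketched in Python.  This is a brute-force
illustration under stated assumptions, not the patch's algorithm: it tries all
64 rotations instead of using can_be_rotated_to_lowbits, and the names rotl64
and li_then_rotldi are made up for the example.

```python
MASK64 = (1 << 64) - 1


def rotl64(value, shift):
    """Rotate a 64-bit value left by shift bits."""
    value &= MASK64
    shift %= 64
    if shift == 0:
        return value
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def li_then_rotldi(c):
    """Return (imm, shift) with rotl64(imm, shift) == c, or None.

    imm must fit the signed 16-bit immediate that 'li' can load.
    """
    c &= MASK64
    for shift in range(64):
        imm = rotl64(c, 64 - shift)               # undo a left-rotate by shift
        signed = imm - (1 << 64) if imm >> 63 else imm
        if -0x8000 <= signed <= 0x7FFF:           # sign-extended 16-bit value
            return signed, shift
    return None
```

A constant qualifies exactly when some rotation of it is a sign-extended
16-bit value, which covers both the positive and the negative 'li' cases in
one test.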

> +static bool
> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +HOST_WIDE_INT *mask)
> +{
> +  int n;

Put this later, like:

> +  /* Check if C can be rotated to a positive or negative value
> +  which 'li' instruction is able to load.  */
  int n;
> +  if (can_be_rotated_to_lowbits (c, 15, &n)
> +  || can_be_rotated_to_lowbits (~c, 15, &n))
> +{
> +  *mask = HOST_WIDE_INT_M1;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}

It is tricky to see ~c will always work, since what is really done is -c
instead.  Can you just use that here?
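
As a rough illustration of the idea under review (not the patch's implementation), the check can be sketched as a brute-force search: try every rotation of C and see whether the result fits the signed 16-bit immediate that 'li' can load. The names rotl64, fits_li and li_rotldi_sketch are hypothetical and exist only for this sketch; the real code uses can_be_rotated_to_lowbits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits.  */
static uint64_t
rotl64 (uint64_t v, unsigned n)
{
  n &= 63;
  return n ? (v << n) | (v >> (64 - n)) : v;
}

/* True if V, viewed as a signed value, fits the 16-bit signed
   immediate of 'li'.  */
static bool
fits_li (uint64_t v)
{
  int64_t s = (int64_t) v;
  return s >= -32768 && s <= 32767;
}

/* Hypothetical helper: if C equals rotl64 (imm, shift) for some
   li-loadable IMM, store both and return true.  */
static bool
li_rotldi_sketch (uint64_t c, int64_t *imm, unsigned *shift)
{
  for (unsigned n = 0; n < 64; n++)
    {
      /* Rotating C left by n undoes a rotate-left by 64 - n.  */
      uint64_t r = rotl64 (c, n);
      if (fits_li (r))
        {
          *imm = (int64_t) r;
          *shift = (64 - n) & 63;
          return true;
        }
    }
  return false;
}
```

Because the search tests the signed range directly, it does not need a separate case for negated or complemented values; the patch's two-call form is a faster way of asking a similar question.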

> @@ -10266,15 +10291,14 @@ static void
>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
> +  int shift;
> +  HOST_WIDE_INT mask;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>  
>ud1 = c & 0xffff;
> -  c = c >> 16;
> -  ud2 = c & 0xffff;
> -  c = c >> 16;
> -  ud3 = c & 0xffff;
> -  c = c >> 16;
> -  ud4 = c & 0xffff;
> +  ud2 = (c >> 16) & 0xffff;
> +  ud3 = (c >> 32) & 0xffff;
> +  ud4 = (c >> 48) & 0xffff;
>  
>if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>GEN_INT ((ud2 ^ 0xffff) << 16)));
>  }
> +  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +
> +  emit_move_insn (temp, GEN_INT (imm));
> +  if (shift != 0)
> + temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +  emit_move_insn (dest, temp);
> +}

If you would rewrite so it isn't such a run-on thing with "else if",
instead using early outs, or even some factoring, you could declare the
variable used only in a tiny scope in that tiny scope instead.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -0,0 +1,54 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */

Please put a tiny comment here saying what this test is *for*?  The file
name is a bit of hint already, but you can indicate much more in one or
two lines :-)

With those adjustments, okay for trunk.  Thanks!

(If -c doesn't work, it needs more explanation).


Segher


Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread Robin Dapp via Gcc-patches
> <= (operand 2 + operand 4) are used."

Sorry it's really minor (and my mistake) but it should be < and
not <=, right?  Mask index 0 is inactive when the length is 0.

> +Perform a masked store (operand 2 + operand 4)

Even more minor but as mentioned the "of" is still missing ;)
Same with the <=.

Regards
 Robin


Re: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread juzhe.zh...@rivai.ai
Thanks Robin. 

I have sent V5 for future merge convenience.
I didn't change len_load/len_store description since I think it should be 
another separate patch.
This patch is adding len_maskload/len_maskstore.

I will wait for Richard S.'s final comments.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-16 17:21
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; rguenther; richard.sandiford
Subject: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
Hi Juzhe,
 
> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskload@var{m}@var{n}}
> +Perform a masked load (operand 2 - operand 4) elements from vector memory
> +operand 1 into vector register operand 0, setting the other elements of
> +operand 0 to undefined values.  This is a combination of len_load and 
> maskload. 
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
> +has whichever integer mode the target prefers.  A secondary mask is 
> specified in
> +operand 3 which must be of type @var{n}.  Operand 4 conceptually has mode 
> @code{QI}.
> +
> +Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
> +constant bias: it is either a constant 0 or a constant -1.  The predicate on
> +operand 4 must only accept the bias values that the target actually supports.
> +GCC handles a bias of 0 more efficiently than a bias of -1.
> +
> +If (operand 2 - operand 4) exceeds the number of elements in mode
> +@var{m}, the behavior is undefined.
> +
> +If the target prefers the length to be measured in bytes
> +rather than elements, it should only implement this pattern for vectors
> +of @code{QI} elements.
> +
> +This pattern is not allowed to @code{FAIL}.
 
Please still change
"Perform a masked load (operand 2 - operand 4) elements"
to
"Perform a masked load of (operand 2 + operand 4) elements".
 
"vector memory operand" -> "memory operand"
 
As Richi has mentioned we are adding the negative bias not subtracting a 
positive
one.  You could also change the len_load and len_store comments while at it so
as to not introduce more confusion.
 
The "secondary" can also be omitted now because we don't have a primary mask
somewhere.  Maybe, for clarification, even if it's implicit:
"A mask is specified in operand 3 which must... The mask has lower precedence
than the length and is itself subject to length masking, i.e. only mask indices
<= (operand 2 + operand 4) are used."
 
> +
> +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskstore@var{m}@var{n}}
> +Perform a masked store (operand 2 - operand 4) vector elements from vector 
> register
> +operand 1 into memory operand 0, leaving the other elements of operand 0 
> unchanged.
> +This is a combination of len_store and maskstore.
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 
> has whichever
> +integer mode the target prefers.  A secondary mask is specified in operand 3 
> which must be
> +of type @var{n}.  Operand 4 conceptually has mode @code{QI}.
 
Same thing applies here: "store of (operand 2 + operand 4) vector elements", as
well as the "secondary".
 
Thanks.  No V5 necessary IMHO for those but let's see what Richard says.
 
Regards
Robin
 
 


[PATCH V5] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch passes bootstrap on x86. OK for trunk?

According to comments from Richi, I split the first patch so that it only adds
the ifn && optabs of LEN_MASK_{LOAD,STORE}; we don't apply them in the
vectorizer in this patch. It also adds a BIAS argument for possible future use
by s390.

The descriptions of the patterns in the doc come from Robin.

After this patch is approved, I will send the second patch to apply the
len_mask_* patterns in the vectorizer.

Targets like ARM SVE in GCC have an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)
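
The mask combination above can be modelled in scalar C, one bit per lane. This is only a sketch of the semantics: while_ult_model, compare_mask_model and control_mask_model are hypothetical names, and the 32-lane limit is an arbitrary choice for the illustration.

```c
#include <stdint.h>

/* Model of WHILE_ULT: bit i is set iff start + i < limit.  */
static uint32_t
while_ult_model (uint64_t start, uint64_t limit, unsigned nlanes)
{
  uint32_t m = 0;
  for (unsigned i = 0; i < nlanes && i < 32; i++)
    if (start + i < limit)
      m |= (uint32_t) 1 << i;
  return m;
}

/* Model of the flow-control comparison: bit i is set iff cond[i] != 0.  */
static uint32_t
compare_mask_model (const int *cond, unsigned nlanes)
{
  uint32_t m = 0;
  for (unsigned i = 0; i < nlanes && i < 32; i++)
    if (cond[i])
      m |= (uint32_t) 1 << i;
  return m;
}

/* The control mask is the AND of loop control and flow control.  */
static uint32_t
control_mask_model (uint32_t loop_control_mask, uint32_t flow_control_mask)
{
  return loop_control_mask & flow_control_mask;
}
```

With 4 lanes, start = 4 and limit = 7, the loop mask enables lanes 0..2; ANDing it with a comparison mask then leaves only lanes that are both in range and selected by the condition.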

However, targets like RVV (RISC-V Vector) cannot use this approach in
auto-vectorization since RVV uses length in loop control.

This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use length in loop control.
Loads and stores are normalized into LEN_MASK_{LOAD,STORE} as long as either
the length or the mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of a comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).
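
A scalar reference model may help pin down the intended semantics. The sketch below assumes the reading discussed in this thread: lane i is active only if i < length + bias and mask[i] is set. The function names and the int32_t element type are hypothetical, the align argument is omitted, and inactive lanes of a load (undefined in the pattern description) are simply left untouched here.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of LEN_MASK_LOAD: copy the active lanes from SRC to DST.  */
static void
len_mask_load_model (int32_t *dst, const int32_t *src, size_t nlanes,
                     ptrdiff_t len, int bias, const uint8_t *mask)
{
  ptrdiff_t active = len + bias;   /* bias is 0 or -1 */
  for (size_t i = 0; i < nlanes; i++)
    if ((ptrdiff_t) i < active && mask[i])
      dst[i] = src[i];
}

/* Model of LEN_MASK_STORE: store the active lanes of VEC to MEM and
   leave the other memory elements unchanged.  */
static void
len_mask_store_model (int32_t *mem, const int32_t *vec, size_t nlanes,
                      ptrdiff_t len, int bias, const uint8_t *mask)
{
  ptrdiff_t active = len + bias;
  for (size_t i = 0; i < nlanes; i++)
    if ((ptrdiff_t) i < active && mask[i])
      mem[i] = vec[i];
}
```

Note that with length 0 and bias 0 no lane is active, which is why the index test is a strict "less than".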

Consider the following 4 cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
Code:                                 v1 = MEM (...)
  for (int i = 0; i < 4; i++)         v2 = MEM (...)
    a[i] = b[i] + c[i];               v3 = v1 + v2
                                      MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = comparison):
Code:                                 mask = comparison
  for (int i = 0; i < 4; i++)         v1 = LEN_MASK_LOAD (length = VF, mask)
    if (cond[i])                      v2 = LEN_MASK_LOAD (length = VF, mask)
      a[i] = b[i] + c[i];             v3 = v1 + v2
                                      LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:                                 loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)         v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    a[i] = b[i] + c[i];               v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
                                      v3 = v1 + v2
                                      LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:                                 loop_len = SELECT_VL or MIN
  for (int i = 0; i < n; i++)         mask = comparison
    if (cond[i])                      v1 = LEN_MASK_LOAD (length = loop_len, mask)
      a[i] = b[i] + c[i];             v2 = LEN_MASK_LOAD (length = loop_len, mask)
                                      v3 = v1 + v2
                                      LEN_MASK_STORE (length = loop_len, mask, v3)
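
Case 4 can be written out as a strip-mined scalar loop to show how the length and the mask interact. This is a sketch under assumptions: VF = 4 is an arbitrary illustrative vectorization factor, loop_len is computed with MIN rather than SELECT_VL, and cond_add_model is a hypothetical name.

```c
#include <stddef.h>

enum { VF = 4 };   /* illustrative vectorization factor */

/* Scalar model of the Case 4 loop: each outer iteration stands for one
   vector iteration that processes at most VF elements.  */
static void
cond_add_model (int *a, const int *b, const int *c, const int *cond,
                size_t n)
{
  for (size_t i = 0; i < n; )
    {
      /* loop_len = MIN (remaining, VF): the length control.  */
      size_t remaining = n - i;
      size_t loop_len = remaining < (size_t) VF ? remaining : (size_t) VF;

      for (size_t lane = 0; lane < (size_t) VF; lane++)
        /* A lane is active only if it is below the length AND the
           comparison mask is set; cond[] is not read past loop_len.  */
        if (lane < loop_len && cond[i + lane])
          a[i + lane] = b[i + lane] + c[i + lane];

      i += loop_len;
    }
}
```

Case 3 corresponds to the same loop with cond[] all ones, and Case 2 to a single iteration with loop_len equal to VF.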

Co-authored-by: Robin Dapp 

gcc/ChangeLog:

* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 53 -
 gcc/genopinit.cc|  6 +++--
 gcc/internal-fn.cc  | 43 
 gcc/internal-fn.def |  4 
 gcc/optabs.def  |  2 ++
 5 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a43fd65a2b2..b5a9f937be2 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,7 +5094,7 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_load_@var{m}} instruction pattern
 @item @samp{len_load_@var{m}}
-Load (operand 2 - operand 3) elements from vector memory operand 1
+Load (operand 2 - operand 3) elements from memory operand 1
 into vector register operand 0, setting the other elements of
 operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
 which must be a vector mode.  Operand 2 has whichever integer mode the
@@ -5136,6 +5136,57 @@ of @code{QI} elements.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load (operand 2 + operand 4) elements from vector memory
+operand 1 

Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskload@var{m}@var{n}}
> +Perform a masked load (operand 2 - operand 4) elements from vector memory
> +operand 1 into vector register operand 0, setting the other elements of
> +operand 0 to undefined values.  This is a combination of len_load and 
> maskload. 
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
> +has whichever integer mode the target prefers.  A secondary mask is 
> specified in
> +operand 3 which must be of type @var{n}.  Operand 4 conceptually has mode 
> @code{QI}.
> +
> +Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
> +constant bias: it is either a constant 0 or a constant -1.  The predicate on
> +operand 4 must only accept the bias values that the target actually supports.
> +GCC handles a bias of 0 more efficiently than a bias of -1.
> +
> +If (operand 2 - operand 4) exceeds the number of elements in mode
> +@var{m}, the behavior is undefined.
> +
> +If the target prefers the length to be measured in bytes
> +rather than elements, it should only implement this pattern for vectors
> +of @code{QI} elements.
> +
> +This pattern is not allowed to @code{FAIL}.

Please still change
"Perform a masked load (operand 2 - operand 4) elements"
to
"Perform a masked load of (operand 2 + operand 4) elements".

"vector memory operand" -> "memory operand"

As Richi has mentioned we are adding the negative bias not subtracting a 
positive
one.  You could also change the len_load and len_store comments while at it so
as to not introduce more confusion.

The "secondary" can also be omitted now because we don't have a primary mask
somewhere.  Maybe, for clarification, even if it's implicit:
"A mask is specified in operand 3 which must... The mask has lower precedence
than the length and is itself subject to length masking, i.e. only mask indices
<= (operand 2 + operand 4) are used."

> +
> +@cindex @code{len_maskstore@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskstore@var{m}@var{n}}
> +Perform a masked store (operand 2 - operand 4) vector elements from vector 
> register
> +operand 1 into memory operand 0, leaving the other elements of operand 0 
> unchanged.
> +This is a combination of len_store and maskstore.
> +Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2 
> has whichever
> +integer mode the target prefers.  A secondary mask is specified in operand 3 
> which must be
> +of type @var{n}.  Operand 4 conceptually has mode @code{QI}.

Same thing applies here: "store of (operand 2 + operand 4) vector elements", as
well as the "secondary".

Thanks.  No V5 necessary IMHO for those but let's see what Richard says.

Regards
 Robin



Re: [PATCH, OpenACC 2.7] Implement host_data must have use_device clause requirement

2023-06-16 Thread Thomas Schwinge
Hi Chung-Lin!

On 2023-06-06T23:10:37+0800, Chung-Lin Tang  wrote:
> this patch implements the OpenACC 2.7 change requiring the host_data construct
> to have at least one use_device clause.

Thanks!

> This patch started out with a simple check during gimplify (much smaller 
> patch),

Heh, thanks for the explanation -- would've been my first question
otherwise.  ;-)

> but turned out that front-ends removed use_device clauses when they have 
> error,
> and the gimplify check started to echo a "no use_device clause" message in 
> such
> cases, which seem confusing for the user. So ended up adding the check in each
> front-end instead.

I presume that's also the reason why you're doing this check before
'c_finish_omp_clauses' etc.?

I'll clarify with the OpenACC Technical Committee whether really those
diagnostics are intended as "error" or should instead just be "warning".
After all, there's no actual problem with an OpenACC 'host_data' without
'use_device' clause (or no data clause on OpenACC 'data', 'enter data',
'exit data', 'update', etc.) -- it's just likely that the user missed
something.  That is, the OpenACC 2.7: "At least one 'use_device' clause
must appear" is addressing the user, not at the implementation (in my
current interpretation).  Depending on the outcome of that, we can easily
adjust GCC.

Note for later, independently of your work here:
'c_parser_oacc_enter_exit_data' etc. for its corresponding "has no data
movement clause" diagnostic actually does 'c_finish_omp_clauses' etc.
first -- maybe that should be changed accordingly.  (Actually, I note
that it's only OpenACC 3.0 that "Required at least one data clause on a
'data' construct, an 'enter data' directive, or an 'exit data'
directive", heh...  Per his internal 2014-10-17 email, Cesar implemented
the code of 'c_parser_oacc_enter_exit_data' etc. "similar to that of acc
update", which indeed already back then did require "At least one 'self',
'host', or 'device' clause".  Fortran does have the diagnostic for
OpenACC 'update', but it's missing for OpenACC 'enter data', 'exit data'
without data clause (have not checked other constructs with similar
requirements).)

> Tested on powerpc64le-linux/nvptx, x86_64-linux/amdgcn tests in progress 
> (expect
> no surprises). Is this okay for trunk?

OK with one small change, please -- unless there's a reason for doing it
this way:

> --- a/gcc/fortran/trans-openmp.cc
> +++ b/gcc/fortran/trans-openmp.cc
> @@ -4677,6 +4677,12 @@ gfc_trans_oacc_construct (gfc_code *code)
>   break;
>case EXEC_OACC_HOST_DATA:
>   construct_code = OACC_HOST_DATA;
> + if (code->ext.omp_clauses->lists[OMP_LIST_USE_DEVICE] == NULL)
> +   {
> + error_at (gfc_get_location (&code->loc),
> +   "%<host_data%> construct requires %<use_device%> clause");
> + return NULL_TREE;
> +   }
>   break;
>default:
>   gcc_unreachable ();

The OpenMP "must contain at least one [...] clause" checks are done in
'gcc/fortran/openmp.cc:resolve_omp_clauses'.  For consistency (or, to let
'gcc/fortran/trans-openmp.cc' continue to just deal with "directive
translation"), do similar for OpenACC 'host_data'?  (..., and we later
accordingly adjust 'gcc/fortran/openmp.cc:gfc_match_oacc_update', too?)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread juzhe.zh...@rivai.ai
Thanks a lot! I will wait for Richard's final approval.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-16 17:04
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; rdapp.gcc
Subject: Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs
On Thu, 15 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> This patch bootstrap pass on X86, ok for trunk ?
 
OK with me, please give Richard S. a chance to comment before pushing.
 
Thanks,
Richard.
 
> According to comments from Richi, split the first patch to add ifn && optabs
> of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this
> patch. And also add BIAS argument for possible s390's future use.
> 
> The description of the patterns in doc are coming Robin.
> 
> After this patch is approved, will send the second patch to apply len_mask_*
> patterns into vectorizer.
> 
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
> 
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
> 
> However, targets like RVV (RISC-V Vector) can not use this approach in
> auto-vectorization since RVV use length in loop control.
> 
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that uses length in loop control.
> Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
> or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
> Mask is the outcome of comparison.
> 
> LEN_MASK_ LOAD/STORE format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
> 
> Consider these 4 following cases:
> 
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
> 
> Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
> Code: v1 = MEM (...)
>   for (int i = 0; i < 4; i++)   v2 = MEM (...)
> a[i] = b[i] + c[i]; v3 = v1 + v2 
> MEM[...] = v3
> 
> Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = 
> comparison):
> Code:   mask = comparison
>   for (int i = 0; i < 4; i++)   v1 = LEN_MASK_LOAD (length = VF, mask)
> if (cond[i])v2 = LEN_MASK_LOAD (length = VF, 
> mask) 
>   a[i] = b[i] + c[i];   v3 = v1 + v2
> LEN_MASK_STORE (length = VF, mask, v3)
>
> Case 3 (VLA):
> Code:   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)   v1 = LEN_MASK_LOAD (length = 
> loop_len, mask = {-1,-1,...})
>   a[i] = b[i] + c[i];   v2 = LEN_MASK_LOAD (length = 
> loop_len, mask = {-1,-1,...})
> v3 = v1 + v2  
>   
> LEN_MASK_STORE (length = loop_len, 
> mask = {-1,-1,...}, v3)
> 
> Case 4 (VLA):
> Code:   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)   mask = comparison
>   if (cond[i])  v1 = LEN_MASK_LOAD (length = 
> loop_len, mask)
>   a[i] = b[i] + c[i];   v2 = LEN_MASK_LOAD (length = 
> loop_len, mask)
> v3 = v1 + v2  
>   
> LEN_MASK_STORE (length = loop_len, 
> mask, v3)
> 
> Co-authored-by: Robin Dapp 
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add len_mask{load,store}.
> * genopinit.cc (main): Ditto.
> (CMP_NAME): Ditto.
> * internal-fn.cc (len_maskload_direct): Ditto.
> (len_maskstore_direct): Ditto.
> (expand_call_mem_ref): Ditto.
> (expand_partial_load_optab_fn): Ditto.
> (expand_len_maskload_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> (expand_len_maskstore_optab_fn): Ditto.
> (direct_len_maskload_optab_supported_p): Ditto.
> (direct_len_maskstore_optab_supported_p): Ditto.
> * internal-fn.def (LEN_MASK_LOAD): Ditto.
> (LEN_MASK_STORE): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 46 +
>  gcc/genopinit.cc|  6 --
>  gcc/internal-fn.cc  | 43 ++
>  gcc/internal-fn.def |  4 
>  gcc/optabs.def  |  2 ++
>  5 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index a43fd65a2b2..af23ec938d6 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5136,6 +5136,52 @@ of @code{QI} elements.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> 

[PATCH] simplify-rtx: Simplify VEC_CONCAT of SUBREG and VEC_CONCAT from same vector

2023-06-16 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

In the testcase for this patch we try to vec_concat the lowpart and highpart of 
a vector, but the lowpart is expressed as a subreg.
simplify-rtx.cc does not recognise this and combine ends up trying to match:
Trying 7 -> 8:
7: r93:V2SI=vec_select(r95:V4SI,parallel)
8: r97:V4SI=vec_concat(r95:V4SI#0,r93:V2SI)
  REG_DEAD r95:V4SI
  REG_DEAD r93:V2SI
Failed to match this instruction:
(set (reg:V4SI 97)
(vec_concat:V4SI (subreg:V2SI (reg/v:V4SI 95 [ a ]) 0)
(vec_select:V2SI (reg/v:V4SI 95 [ a ])
(parallel:V4SI [
(const_int 2 [0x2])
(const_int 3 [0x3])
]

This should be just (set (reg:V4SI 97) (reg:V4SI 95)). This patch adds such a 
simplification.
The testcase is a bit artificial, but I do have other aarch64-specific patterns 
that I want to optimise later
that rely on this simplification happening.
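
The identity behind the simplification can be checked with a small scalar model. This is only a sketch: the v4si and v2si structs and the helper names are hypothetical stand-ins for the RTL modes and operations, and the model ignores the endianness details that the real subreg lowpart handling has to respect.

```c
#include <stdint.h>
#include <string.h>

typedef struct { int32_t e[4]; } v4si;
typedef struct { int32_t e[2]; } v2si;

/* Models (subreg:V2SI a 0), the low half of the vector.  */
static v2si
low_half (v4si a)
{
  v2si r = { { a.e[0], a.e[1] } };
  return r;
}

/* Models (vec_select:V2SI a (parallel [2 3])), the high half.  */
static v2si
high_half (v4si a)
{
  v2si r = { { a.e[2], a.e[3] } };
  return r;
}

/* Models (vec_concat:V4SI lo hi).  */
static v4si
concat (v2si lo, v2si hi)
{
  v4si r = { { lo.e[0], lo.e[1], hi.e[0], hi.e[1] } };
  return r;
}

/* Element-wise equality of two V4SI values.  */
static int
v4si_equal (v4si x, v4si y)
{
  return memcmp (x.e, y.e, sizeof x.e) == 0;
}
```

For any input a, concat (low_half (a), high_half (a)) gives back a, which is why the whole expression can be replaced by the source register.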

Without this patch for the testcase we generate:
foo:
dup d31, v0.d[1]
ins v0.d[1], v31.d[0]
ret

whereas we should just not generate anything as the operation is ultimately a 
no-op.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Ok for trunk?
Thanks,
Kyrill

gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
Simplify vec_concat of lowpart subreg and high part vec_select.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/low-high-combine_1.c: New test.


concat-subreg.patch
Description: concat-subreg.patch


Re: [PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-16 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> This patch bootstrap pass on X86, ok for trunk ?

OK with me, please give Richard S. a chance to comment before pushing.

Thanks,
Richard.

> According to comments from Richi, split the first patch to add ifn && optabs
> of LEN_MASK_{LOAD,STORE} only, we don't apply them into vectorizer in this
> patch. And also add BIAS argument for possible s390's future use.
> 
> The description of the patterns in doc are coming Robin.
> 
> After this patch is approved, will send the second patch to apply len_mask_*
> patterns into vectorizer.
> 
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
> 
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
> 
> However, targets like RVV (RISC-V Vector) can not use this approach in
> auto-vectorization since RVV use length in loop control.
> 
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that uses length in loop control.
> Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
> or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
> Mask is the outcome of comparison.
> 
> LEN_MASK_ LOAD/STORE format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
> 
> Consider these 4 following cases:
> 
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
> 
> Case 1 (VLS): -mrvv-vector-bits=128   IR (Does not use LEN_MASK_*):
> Code: v1 = MEM (...)
>   for (int i = 0; i < 4; i++)   v2 = MEM (...)
> a[i] = b[i] + c[i]; v3 = v1 + v2 
> MEM[...] = v3
> 
> Case 2 (VLS): -mrvv-vector-bits=128   IR (LEN_MASK_* with length = VF, mask = 
> comparison):
> Code:   mask = comparison
>   for (int i = 0; i < 4; i++)   v1 = LEN_MASK_LOAD (length = VF, mask)
> if (cond[i])v2 = LEN_MASK_LOAD (length = VF, 
> mask) 
>   a[i] = b[i] + c[i];   v3 = v1 + v2
> LEN_MASK_STORE (length = VF, mask, v3)
>
> Case 3 (VLA):
> Code:   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)   v1 = LEN_MASK_LOAD (length = 
> loop_len, mask = {-1,-1,...})
>   a[i] = b[i] + c[i];   v2 = LEN_MASK_LOAD (length = 
> loop_len, mask = {-1,-1,...})
> v3 = v1 + v2  
>   
> LEN_MASK_STORE (length = loop_len, 
> mask = {-1,-1,...}, v3)
> 
> Case 4 (VLA):
> Code:   loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)   mask = comparison
>   if (cond[i])  v1 = LEN_MASK_LOAD (length = 
> loop_len, mask)
>   a[i] = b[i] + c[i];   v2 = LEN_MASK_LOAD (length = 
> loop_len, mask)
> v3 = v1 + v2  
>   
> LEN_MASK_STORE (length = loop_len, 
> mask, v3)
> 
> Co-authored-by: Robin Dapp 
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add len_mask{load,store}.
> * genopinit.cc (main): Ditto.
> (CMP_NAME): Ditto.
> * internal-fn.cc (len_maskload_direct): Ditto.
> (len_maskstore_direct): Ditto.
> (expand_call_mem_ref): Ditto.
> (expand_partial_load_optab_fn): Ditto.
> (expand_len_maskload_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> (expand_len_maskstore_optab_fn): Ditto.
> (direct_len_maskload_optab_supported_p): Ditto.
> (direct_len_maskstore_optab_supported_p): Ditto.
> * internal-fn.def (LEN_MASK_LOAD): Ditto.
> (LEN_MASK_STORE): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 46 +
>  gcc/genopinit.cc|  6 --
>  gcc/internal-fn.cc  | 43 ++
>  gcc/internal-fn.def |  4 
>  gcc/optabs.def  |  2 ++
>  5 files changed, 95 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index a43fd65a2b2..af23ec938d6 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5136,6 +5136,52 @@ of @code{QI} elements.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
> +@item @samp{len_maskload@var{m}@var{n}}
> +Perform a masked load (operand 2 - operand 4) elements from vector memory
> +operand 1 into vector register operand 0, setting the other elements of
> +operand 0 to undefined values.  This 

[PATCH 2/2][v2] AVX512 fully masked vectorization

2023-06-16 Thread Richard Biener via Gcc-patches


Compared to v1 this drops the first patch of the series which
inlined vect_get_max_nscalars_per_iter.  I was able to simplify
this final patch to no longer require a hash-map of rgroup_controls
but can re-use the existing vector which makes the function
meaningful (although unused) for AVX512 as well.

Otherwise I've incorporated review comments and fixed the
missing conversion of LOOP_VINFO_MASK_SKIP_NITERS in
vect_set_loop_condition_partial_vectors.

I've re-bootstrapped and tested the series on x86_64-unknown-linux-gnu
and plan to push it later today if I hear no objections.

Thanks,
Richard.


--

This implements fully masked vectorization or a masked epilog for
AVX512 style masks, which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both are much like GCN).

AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).
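
A scalar sketch of that mask generation, assuming 16 lanes and a hypothetical helper name lane_mask_model: the mask is an integer with one bit per lane, obtained by comparing each lane index against the number of scalar iterations still to run.

```c
#include <stdint.h>

/* Models a vector compare { 0, 1, ..., 15 } < splat (remaining) whose
   result is a 16-bit integer mask with one bit per lane.  */
static uint16_t
lane_mask_model (uint64_t remaining)
{
  uint16_t m = 0;
  for (unsigned lane = 0; lane < 16; lane++)
    if (lane < remaining)
      m |= (uint16_t) (1u << lane);
  return m;
}
```

The scalar IV keeps driving the loop exit test, so the exit condition does not have to wait for the mask to be computed.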

Like RVV, code generation prefers a decrementing IV, though IVOPTs
messes things up in some cases by removing that IV and replacing
it with an incrementing one used for address generation.

One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.

size   scalar    128    256    512   512e   512f
   1     9.42  11.32   9.35  11.17  15.13  16.89
   2     5.72   6.53   6.66   6.66   7.62   8.56
   3     4.49   5.10   5.10   5.74   5.08   5.73
   4     4.10   4.33   4.29   5.21   3.79   4.25
   6     3.78   3.85   3.86   4.76   2.54   2.85
   8     3.64   1.89   3.76   4.50   1.92   2.16
  12     3.56   2.21   3.75   4.26   1.26   1.42
  16     3.36   0.83   1.06   4.16   0.95   1.07
  20     3.39   1.42   1.33   4.07   0.75   0.85
  24     3.23   0.66   1.72   4.22   0.62   0.70
  28     3.18   1.09   2.04   4.20   0.54   0.61
  32     3.16   0.47   0.41   0.41   0.47   0.53
  34     3.16   0.67   0.61   0.56   0.44   0.50
  38     3.19   0.95   0.95   0.82   0.40   0.45
  42     3.09   0.58   1.21   1.13   0.36   0.40

'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.

This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.

Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.

Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for each of the two mask styles we then have.

I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.

The vect_prepare_for_masked_peels hunk might run into issues with
SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
looked odd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
the testsuite with --param vect-partial-vector-usage=2 with and
without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
and one latent wrong-code (PR110237).

* tree-vectorizer.h (enum vect_partial_vector_style): New.
(_loop_vec_info::partial_vector_style): Likewise.
(LOOP_VINFO_PARTIAL_VECTORS_STYLE): Likewise.
(rgroup_controls::compare_type): Add.
(vec_loop_masks): Change from a typedef to auto_vec<>
to a structure.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors):
Adjust.  Convert niters_skip to compare_type.
(vect_set_loop_condition_partial_vectors_avx512): New function
implementing the AVX512 partial vector codegen.
(vect_set_loop_condition): Dispatch to the correct

[PATCH] tree-optimization/110279- Check for nested FMA chains in reassoc

2023-06-16 Thread Di Zhao OS via Gcc-patches
This patch fixes the regressions found in SPEC2017 fprate cases
 on aarch64.

1. Reuse code in the widening_mul pass to check for nested FMA chains
 (those connected by MULT_EXPRs), since rewriting them to parallel
 generates worse code.

2. Avoid rearrangements that produce fewer FMA chains, as these can be slow.

Tested on ampere1 and neoverse-n1; this fixes the regressions in the
508.namd_r and 510.parest_r 1-copy runs. While I'm still collecting data
on the x86 machines we have, I'd like to know what you think of this.

(Previously I tried to improve things with FMA by adding a widening_mul
pass before reassoc2, since that makes it easier to recognize different
patterns of FMA chains and decide whether to split them. But I suppose
handling them all in the reassoc pass is more efficient.)

Thanks,
Di Zhao

---
gcc/ChangeLog:

* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Add new parameter.
Support a new mode that merely does the checking.
(struct fma_transformation_info): Moved to header.
(class fma_deferring_state): Moved to header.
(convert_mult_to_fma): Add new parameter.
* tree-ssa-math-opts.h (struct fma_transformation_info):
(class fma_deferring_state): Moved from .cc.
(convert_mult_to_fma): Add function decl.
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel):
(rank_ops_for_fma): Return -1 if nested FMAs are found.
(reassociate_bb): Avoid rewriting to parallel if nested FMAs are found.



pr110279-Check-for-nested-FMA-chains-in-reassoc.diff


Re: [PATCH] Check SCALAR_INT_MODE_P in try_const_anchors

2023-06-16 Thread Richard Biener via Gcc-patches
On Fri, 16 Jun 2023, Jiufu Guo wrote:

> Hi,
> 
> The const_anchor in cse.cc supports integer constants only.
> There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
> try_const_anchors.
> 
> In the latest code, some non-integer modes are used with const int.
> For example:
> "(set (mem/c:BLK (xx)) (const_int 0 [0]))" occurs in the md files of
> rs6000, i386, arm, and pa. For this, the mode may be BLKmode.
> The pattern "(set (strict_low_part (xx)) (const_int xx))" could
> be generated in a few ports. For this, the mode may be VOIDmode.
> 
> So try_const_anchors needs to avoid modes other than
> SCALAR_INT_MODE.
> 
> Some discussions in the previous thread:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html
> 
> Bootstrap passes on ppc64{,le} and x86_64.
> Is this ok for trunk?

OK.

Richard.

> 
> BR,
> Jeff (Jiufu Guo)
> 
> gcc/ChangeLog:
> 
>   * cse.cc (try_const_anchors): Check SCALAR_INT_MODE.
> 
> ---
>  gcc/cse.cc | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index 2bb63ac4105..ddb76fd281d 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -1312,11 +1312,10 @@ try_const_anchors (rtx src_const, machine_mode mode)
>rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
>unsigned lower_old, upper_old;
>  
> -  /* CONST_INT is used for CC modes, but we should leave those alone.  */
> -  if (GET_MODE_CLASS (mode) == MODE_CC)
> +  /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
> +  if (!SCALAR_INT_MODE_P (mode))
>  return NULL_RTX;
>  
> -  gcc_assert (SCALAR_INT_MODE_P (mode));
>if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
> &upper_base, &upper_offs))
>  return NULL_RTX;
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH v5] MIPS: Add speculation_barrier support

2023-06-16 Thread Xi Ruoyao via Gcc-patches
On Fri, 2023-06-16 at 15:53 +0800, YunQiang Su wrote:
> Ohh, sorry. I forget it. I commented there.
> I have no permission to close this bug report. Can you help to close
> it?

Modify the email address of your Bugzilla account to your @gcc.gnu.org
address, then you should be able to close it.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-16 Thread Jiufu Guo via Gcc-patches
Hi,

If a constant can be rotated to or from a positive or negative value
that "li" can load, then "li;rotldi" can be used to build the constant.

Compared with the previous version, the one-line abstraction helpers are
removed.
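
As an illustration of the idea, here is a small standalone C sketch.  It
assumes that "li" loads a sign-extended 16-bit immediate and simply tries
all 64 rotations; the patch itself relies on can_be_rotated_to_lowbits
instead, so the brute-force search and the names rotl64, fits_li and
li_rotldi_candidate are hypothetical and only meant to show the property
being tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits, 0 <= n < 64.  */
static uint64_t rotl64(uint64_t v, unsigned n)
{
    assert(n < 64);
    return n == 0 ? v : (v << n) | (v >> (64 - n));
}

/* True if v, read as a signed 64-bit value, is a sign-extended 16-bit
   immediate, i.e. something a single 'li' is assumed to load.  */
static bool fits_li(uint64_t v)
{
    int64_t s = (int64_t)v;
    return s >= -32768 && s <= 32767;
}

/* Search for IMM and ROT such that rotating IMM left by ROT gives C and
   IMM is loadable by 'li'.  Returns false if no rotation works.  */
bool li_rotldi_candidate(uint64_t c, int64_t *imm, unsigned *rot)
{
    for (unsigned n = 0; n < 64; n++) {
        uint64_t v = rotl64(c, n);      /* candidate 'li' value */
        if (fits_li(v)) {
            *imm = (int64_t)v;
            *rot = (64 - n) % 64;       /* rotation that restores c */
            return true;
        }
    }
    return false;
}
```

A constant whose set bits (or clear bits) all fit in one cyclic window of
15 positions passes this check; a constant such as 0x0123456789ABCDEF does
not and needs a longer sequence.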
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621001.html

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/ChangeLog:

* config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function.
(rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build.c: New test.
---
 gcc/config/rs6000/rs6000.cc   | 47 +---
 .../gcc.target/powerpc/const-build.c  | 54 +++
 2 files changed, 95 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/const-build.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 42f49e4a56b..13aafd1360a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10258,6 +10258,31 @@ rs6000_emit_set_const (rtx dest, rtx source)
   return true;
 }
 
+/* Check if value C can be built by 2 instructions: one is 'li', another is
+   rotldi.
+
+   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
+   is set to -1, and return true.  Return false otherwise.  */
+
+static bool
+can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
+  HOST_WIDE_INT *mask)
+{
+  int n;
+
+  /* Check if C can be rotated to a positive or negative value
+  which 'li' instruction is able to load.  */
+  if (can_be_rotated_to_lowbits (c, 15, &n)
+  || can_be_rotated_to_lowbits (~c, 15, &n))
+{
+  *mask = HOST_WIDE_INT_M1;
+  *shift = HOST_BITS_PER_WIDE_INT - n;
+  return true;
+}
+
+  return false;
+}
+
 /* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
Output insns to set DEST equal to the constant C as a series of
lis, ori and shl instructions.  */
@@ -10266,15 +10291,14 @@ static void
 rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
 {
   rtx temp;
+  int shift;
+  HOST_WIDE_INT mask;
   HOST_WIDE_INT ud1, ud2, ud3, ud4;
 
   ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
+  ud2 = (c >> 16) & 0xffff;
+  ud3 = (c >> 32) & 0xffff;
+  ud4 = (c >> 48) & 0xffff;
 
   if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
   || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
@@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
   emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
 GEN_INT ((ud2 ^ 0xffff) << 16)));
 }
+  else if (can_be_built_by_li_and_rotldi (c, &shift, &mask))
+{
+  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
+  unsigned HOST_WIDE_INT imm = (c | ~mask);
+  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
+
+  emit_move_insn (temp, GEN_INT (imm));
+  if (shift != 0)
+   temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
+  emit_move_insn (dest, temp);
+}
   else if (ud3 == 0 && ud4 == 0)
 {
   temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
diff --git a/gcc/testsuite/gcc.target/powerpc/const-build.c 
b/gcc/testsuite/gcc.target/powerpc/const-build.c
new file mode 100644
index 000..70f095f6bf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
@@ -0,0 +1,54 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+#define NOIPA __attribute__ ((noipa))
+
+struct fun
+{
+  long long (*f) (void);
+  long long val;
+};
+
+long long NOIPA
+li_rotldi_1 (void)
+{
+  return 0x75310LL;
+}
+
+long long NOIPA
+li_rotldi_2 (void)
+{
+  return 0x2164LL;
+}
+
+long long NOIPA
+li_rotldi_3 (void)
+{
+  return 0x8531LL;
+}
+
+long long NOIPA
+li_rotldi_4 (void)
+{
+  return 0x2194LL;
+}
+
+struct fun arr[] = {
+  {li_rotldi_1, 0x75310LL},
+  {li_rotldi_2, 0x2164LL},
+  {li_rotldi_3, 0x8531LL},
+  {li_rotldi_4, 0x2194LL},
+};
+
+/* { dg-final { scan-assembler-times {\mrotldi\M} 4 } } */
+
+int
+main ()
+{
+  for (int i = 0; i < sizeof (arr) / sizeof (arr[0]); i++)
+if ((*arr[i].f) () != arr[i].val)
+  __builtin_abort ();
+
+  return 0;
+}
-- 
2.39.3



[PATCH] tree-optimization/110269 - restore missed condition folding

2023-06-16 Thread Richard Biener via Gcc-patches
The following makes sure we optimize x != 0 using range info
via tree_expr_nonzero_p via match.pd.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

This causes

FAIL: gcc.dg/tree-ssa/pr103257-1.c scan-tree-dump-times optimized 
"link_error" 0

because we now improve early folding the following way

-  if (a ((unsigned int) ((0, 1) && b != 0) > b, (int) (short int) c) != 
0)
+  if (a ((unsigned int) (b != 0) > b, (int) (short int) c) != 0)

and that causes some of the initial CFG to be turned into straight-line
code, which we do not handle as expected.  I'm going to file a bug
for this but go ahead and push this patch.

Richard.


PR tree-optimization/110269
* fold-const.cc (fold_binary_loc): Merge x != 0 folding
with tree_expr_nonzero_p ...
* match.pd (cmp (convert? addr@0) integer_zerop): With this
pattern.

* gcc.dg/tree-ssa/pr110269.c: New testcase.
---
 gcc/fold-const.cc|  7 -
 gcc/match.pd |  4 +--
 gcc/testsuite/gcc.dg/tree-ssa/pr110269.c | 34 
 3 files changed, 36 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr110269.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 34c5f192a1d..7589498dbd0 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -12298,13 +12298,6 @@ fold_binary_loc (location_t loc, enum tree_code code, 
tree type,
  tem, build_int_cst (TREE_TYPE (tem), 0));
}
 
-  if (integer_zerop (arg1)
- && tree_expr_nonzero_p (arg0))
-{
- tree res = constant_boolean_node (code==NE_EXPR, type);
- return omit_one_operand_loc (loc, type, res, arg0);
-   }
-
   if (TREE_CODE (arg0) == BIT_XOR_EXPR
  && TREE_CODE (arg1) == BIT_XOR_EXPR)
{
diff --git a/gcc/match.pd b/gcc/match.pd
index 0fcf91f0eeb..264f9cb8a40 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6120,8 +6120,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(cmp @0 (bit_xor @1 (convert @2)
 
  (simplify
-  (cmp (convert? addr@0) integer_zerop)
-  (if (tree_single_nonzero_warnv_p (@0, NULL))
+  (cmp (nop_convert? @0) integer_zerop)
+  (if (tree_expr_nonzero_p (@0))
{ constant_boolean_node (cmp == NE_EXPR, type); }))
 
  /* (X & C) op (Y & C) into (X ^ Y) & C op 0.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110269.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr110269.c
new file mode 100644
index 000..c68a6f91604
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110269.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp2 -fdump-tree-optimized" } */
+
+void foo(void);
+static int a = 1, c;
+static int *b = 
+static int **d = 
+static int ***e = 
+void __assert_fail() __attribute__((__noreturn__));
+static int f() {
+if (a) return a;
+for (; c;) *e = 0;
+if (b) __assert_fail();
+return 6;
+}
+int main() {
+if (f()) {
+*d = 0;
+if (b == 0)
+;
+else {
+__builtin_unreachable();
+__assert_fail();
+}
+}
+if (b == 0)
+;
+else
+foo();
+;
+}
+
+/* { dg-final { scan-tree-dump-times "Folding predicate" 2 "ccp2" } } */
+/* { dg-final { scan-tree-dump-not "foo" "optimized" } } */
-- 
2.35.3


RE: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread Li, Pan2 via Gcc-patches
Thanks Juzhe for reviewing, will take care of the FP and widen part soon.

Pan

From: juzhe.zh...@rivai.ai
Sent: Friday, June 16, 2023 4:11 PM
To: Li, Pan2; gcc-patches
Cc: Robin Dapp; jeffreyalaw; Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

LGTM. Thanks for fixing this bug.
Let's wait for Jeff's final approval.

Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-16 16:09
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.
From: Pan Li <pan2...@intel.com>

The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}

This is a problem. For example, with zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.

This patch merges the 3 patterns and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32 will be
code_for_reduc (max, VNx1QI, VNx8QI), so the correct code for ZVE32
will be returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

PR 110265

gcc/ChangeLog:
PR target/110265
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md.
(@pred_reduc_): Removed.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): New pattern.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.

gcc/testsuite/ChangeLog:
PR target/110265
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  13 +-
gcc/config/riscv/vector-iterators.md  |  61 +
gcc/config/riscv/vector.md| 208 +-
.../gcc.target/riscv/rvv/base/pr110265-1.c|  13 ++
.../gcc.target/riscv/rvv/base/pr110265-1.h|  65 ++
.../gcc.target/riscv/rvv/base/pr110265-2.c|  14 ++
.../gcc.target/riscv/rvv/base/pr110265-2.h|  57 +
.../gcc.target/riscv/rvv/base/pr110265-3.c|  14 ++
8 files changed, 385 insertions(+), 60 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-3.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 87a684dd127..53bd0ed2534 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,8 +1396,17 @@ public:
   rtx expand (function_expander &e) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+machine_mode mode = e.vector_mode ();
+machine_mode ret_mode = e.ret_mode ();
+
+/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
+if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
+   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+else
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..e2c8ade98eb 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ 

[PATCH] Check SCALAR_INT_MODE_P in try_const_anchors

2023-06-16 Thread Jiufu Guo via Gcc-patches
Hi,

The const_anchor in cse.cc supports integer constants only.
There is a "gcc_assert (SCALAR_INT_MODE_P (mode))" in
try_const_anchors.

In the latest code, some non-integer modes are used with const int.
For example:
"(set (mem/c:BLK (xx)) (const_int 0 [0]))" occurs in the md files of
rs6000, i386, arm, and pa. For this, the mode may be BLKmode.
The pattern "(set (strict_low_part (xx)) (const_int xx))" could
be generated in a few ports. For this, the mode may be VOIDmode.

So try_const_anchors needs to avoid modes other than
SCALAR_INT_MODE.
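
The effect of swapping the guard can be modelled with a toy C sketch.
It assumes, as in GCC's machmode definitions to my understanding, that
BLKmode and VOIDmode belong to the "random" mode class and that
SCALAR_INT_MODE_P accepts only the integer and partial-integer classes.
The enums and the functions old_guard and new_guard are hypothetical
stand-ins, not code from cse.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for mode classes; not the real GCC enum.  */
enum toy_mode_class {
    TOY_MODE_RANDOM,        /* assumed class of BLKmode and VOIDmode */
    TOY_MODE_CC,
    TOY_MODE_INT,
    TOY_MODE_PARTIAL_INT
};

enum guard_result { BAIL_OUT, PROCEED, ASSERT_FAILS };

/* Models SCALAR_INT_MODE_P on the toy classes.  */
static bool toy_scalar_int_mode_p(enum toy_mode_class c)
{
    assert(c <= TOY_MODE_PARTIAL_INT);
    return c == TOY_MODE_INT || c == TOY_MODE_PARTIAL_INT;
}

/* Old shape: skip CC modes, then assert a scalar integer mode.  */
enum guard_result old_guard(enum toy_mode_class c)
{
    if (c == TOY_MODE_CC)
        return BAIL_OUT;
    if (!toy_scalar_int_mode_p(c))
        return ASSERT_FAILS;    /* where gcc_assert would fire */
    return PROCEED;
}

/* New shape: anything that is not a scalar integer mode bails out.  */
enum guard_result new_guard(enum toy_mode_class c)
{
    return toy_scalar_int_mode_p(c) ? PROCEED : BAIL_OUT;
}
```

With this model a BLKmode or VOIDmode operand reaches the assertion under
the old guard but is quietly rejected under the new one, while CC modes
are rejected in both.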

Some discussions in the previous thread:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621097.html

Bootstrap passes on ppc64{,le} and x86_64.
Is this ok for trunk?


BR,
Jeff (Jiufu Guo)

gcc/ChangeLog:

* cse.cc (try_const_anchors): Check SCALAR_INT_MODE.

---
 gcc/cse.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index 2bb63ac4105..ddb76fd281d 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -1312,11 +1312,10 @@ try_const_anchors (rtx src_const, machine_mode mode)
   rtx lower_exp = NULL_RTX, upper_exp = NULL_RTX;
   unsigned lower_old, upper_old;
 
-  /* CONST_INT is used for CC modes, but we should leave those alone.  */
-  if (GET_MODE_CLASS (mode) == MODE_CC)
+  /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
+  if (!SCALAR_INT_MODE_P (mode))
 return NULL_RTX;
 
-  gcc_assert (SCALAR_INT_MODE_P (mode));
   if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
                               &upper_base, &upper_offs))
 return NULL_RTX;
-- 
2.39.3



Re: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread juzhe.zh...@rivai.ai
LGTM. Thanks for fixing this bug.
Let's wait for Jeff's final approval.

Thanks.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-16 16:09
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.
From: Pan Li 
 
The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following.
 
code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+
 
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64
 
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}
 
This is a problem. For example, with zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.
 
This patch merges the 3 patterns and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32 will be
code_for_reduc (max, VNx1QI, VNx8QI), so the correct code for ZVE32
will be returned as expected.
 
Please note both GCC 13 and 14 are impacted by this issue.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
PR 110265
 
gcc/ChangeLog:
PR target/110265
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md.
(@pred_reduc_): Removed.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): New pattern.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
 
gcc/testsuite/ChangeLog:
PR target/110265
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  13 +-
gcc/config/riscv/vector-iterators.md  |  61 +
gcc/config/riscv/vector.md| 208 +-
.../gcc.target/riscv/rvv/base/pr110265-1.c|  13 ++
.../gcc.target/riscv/rvv/base/pr110265-1.h|  65 ++
.../gcc.target/riscv/rvv/base/pr110265-2.c|  14 ++
.../gcc.target/riscv/rvv/base/pr110265-2.h|  57 +
.../gcc.target/riscv/rvv/base/pr110265-3.c|  14 ++
8 files changed, 385 insertions(+), 60 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-3.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 87a684dd127..53bd0ed2534 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,8 +1396,17 @@ public:
   rtx expand (function_expander &e) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+machine_mode mode = e.vector_mode ();
+machine_mode ret_mode = e.ret_mode ();
+
+/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
+if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
+   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+else
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..e2c8ade98eb 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -929,6 +929,67 @@ (define_mode_iterator V64T [
   (VNx2x64QI "TARGET_MIN_VLEN >= 128")
])
+(define_mode_iterator VQI [
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  VNx2QI
+  VNx4QI
+  VNx8QI
+  VNx16QI
+  VNx32QI
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx128QI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VHI [
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  VNx2HI
+  VNx4HI
+  VNx8HI
+  VNx16HI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VSI [
+  (VNx1SI "TARGET_MIN_VLEN < 128")
+  VNx2SI
+  VNx4SI
+  VNx8SI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
+])
+

[PATCH v2] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread Pan Li via Gcc-patches
From: Pan Li 

The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like the following.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}

This is a problem. For example, with zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which will return the code of
the ZVE128+ pattern instead of the ZVE32 one.

This patch merges the 3 patterns and passes both the
input_vector and the ret_vector to code_for_reduc. For example, ZVE32 will be
code_for_reduc (max, VNx1QI, VNx8QI), so the correct code for ZVE32
will be returned as expected.
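
The first-match problem can be reproduced outside the compiler with a
small C sketch.  The table rows reuse the pattern names from the
pseudo-code in this message; the struct, the string-keyed lookup and the
function names are hypothetical and only model how a generated
code_for_* style dispatcher behaves when several rows share the same key.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct reduc_entry {
    const char *in_mode;   /* mode of the vector being reduced */
    const char *ret_mode;  /* mode of the result vector */
    const char *insn;      /* pattern name from the pseudo-code */
};

static const struct reduc_entry table[] = {
    { "VNx1QI", "VNx16QI", "pred_reduc_maxvnx1qivnx16qi" }, /* ZVE128+ */
    { "VNx1QI", "VNx8QI",  "pred_reduc_maxvnx1qivnx8qi"  }, /* ZVE64 */
    { "VNx1QI", "VNx4QI",  "pred_reduc_maxvnx1qivnx4qi"  }, /* ZVE32 */
};

/* Key ignores ret_mode: every row matches, so the first row wins.  */
const char *lookup_by_input_only(const char *in_mode)
{
    assert(in_mode);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].in_mode, in_mode) == 0)
            return table[i].insn;
    return NULL;
}

/* Key includes ret_mode: each row is now distinguishable.  */
const char *lookup_by_both(const char *in_mode, const char *ret_mode)
{
    assert(in_mode && ret_mode);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].in_mode, in_mode) == 0
            && strcmp(table[i].ret_mode, ret_mode) == 0)
            return table[i].insn;
    return NULL;
}
```

Looking up "VNx1QI" alone always yields the ZVE128+ entry, which is the
symptom described above; adding the result mode to the key selects the
intended row.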

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 

PR 110265

gcc/ChangeLog:
PR target/110265
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md.
(@pred_reduc_): Removed.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): New pattern.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.

gcc/testsuite/ChangeLog:
PR target/110265
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  13 +-
 gcc/config/riscv/vector-iterators.md  |  61 +
 gcc/config/riscv/vector.md| 208 +-
 .../gcc.target/riscv/rvv/base/pr110265-1.c|  13 ++
 .../gcc.target/riscv/rvv/base/pr110265-1.h|  65 ++
 .../gcc.target/riscv/rvv/base/pr110265-2.c|  14 ++
 .../gcc.target/riscv/rvv/base/pr110265-2.h|  57 +
 .../gcc.target/riscv/rvv/base/pr110265-3.c|  14 ++
 8 files changed, 385 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-3.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 87a684dd127..53bd0ed2534 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,8 +1396,17 @@ public:
 
   rtx expand (function_expander &e) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+machine_mode mode = e.vector_mode ();
+machine_mode ret_mode = e.ret_mode ();
+
+/* TODO: we will use ret_mode after all types of PR110265 are addressed.  
*/
+if ((GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
+   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+  return e.use_exact_insn (
+   code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+else
+  return e.use_exact_insn (
+   code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
 };
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..e2c8ade98eb 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -929,6 +929,67 @@ (define_mode_iterator V64T [
   (VNx2x64QI "TARGET_MIN_VLEN >= 128")
 ])
 
+(define_mode_iterator VQI [
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  VNx2QI
+  VNx4QI
+  VNx8QI
+  VNx16QI
+  VNx32QI
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx128QI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VHI [
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  VNx2HI
+  VNx4HI
+  VNx8HI
+  VNx16HI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VSI [
+  (VNx1SI "TARGET_MIN_VLEN < 128")
+  VNx2SI
+  VNx4SI
+  VNx8SI
+  (VNx16SI "TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VDI [
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
+  (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx4DI 

Re: [PATCH] mips: Fix overaligned function arguments [PR109435]

2023-06-16 Thread YunQiang Su via Gcc-patches
On Wed, Jun 7, 2023 at 18:29, Jovan Dmitrovic wrote:
>
> I see what you mean now, so I've made adjustment in order for testcase to work
> on assembly. Following is the updated patch.
>
> Regards,
> Jovan
>
> From 2744357b5232c61bf1f780c4915d47b19d71f993 Mon Sep 17 00:00:00 2001
> From: Jovan Dmitrovic 
> Date: Fri, 19 May 2023 12:36:55 +0200
> Subject: [PATCH] mips: Fix overaligned function arguments [PR109435]
>
> This patch changes alignment for typedef types when passed as
> arguments, making the alignment equal to the alignment of
> original (aliased) types.
>
> This change makes it impossible for a typedef type to have
> alignment that is less than its size.
>
> Signed-off-by: Jovan Dmitrovic 
>
> gcc/ChangeLog:
> PR target/109435
> * config/mips/mips.cc (mips_function_arg_alignment): Returns
> the alignment of function argument. In case of typedef type,
> it returns the alignment of the aliased type.
> (mips_function_arg_boundary): Relocated calculation of the
> alignment of function arguments.
>

Please refer to
https://gcc.gnu.org/contribute.html
for how to write the ChangeLog.

> gcc/testsuite/ChangeLog:
> PR target/109435
> * gcc.target/mips/align-1.c: New test.
> ---
>  gcc/config/mips/mips.cc | 19 -
>  gcc/testsuite/gcc.target/mips/align-1.c | 38 +
>  2 files changed, 56 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/align-1.c
>
> diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> index c1d1691306e..20ba35f754c 100644
> --- a/gcc/config/mips/mips.cc
> +++ b/gcc/config/mips/mips.cc
> @@ -6190,6 +6190,23 @@ mips_arg_partial_bytes (cumulative_args_t cum, const 
> function_arg_info )
>return info.stack_words > 0 ? info.reg_words * UNITS_PER_WORD : 0;
>  }
>
> +/* Given MODE and TYPE of a function argument, return the alignment in
> +   bits.
> +   In case of typedef, alignment of its original type is
> +   used.  */
> +
> +static unsigned int
> +mips_function_arg_alignment (machine_mode mode, const_tree type)
> +{
> +  if (!type)
> +return GET_MODE_ALIGNMENT (mode);
> +
> +  if (is_typedef_decl (TYPE_NAME (type)))
> +type = DECL_ORIGINAL_TYPE (TYPE_NAME (type));
> +
> +  return TYPE_ALIGN (type);
> +}
> +
>  /* Implement TARGET_FUNCTION_ARG_BOUNDARY.  Every parameter gets at
> least PARM_BOUNDARY bits of alignment, but will be given anything up
> to STACK_BOUNDARY bits if the type requires it.  */
> @@ -6198,8 +6215,8 @@ static unsigned int
>  mips_function_arg_boundary (machine_mode mode, const_tree type)
>  {
>unsigned int alignment;
> +  alignment = mips_function_arg_alignment (mode, type);
>
> -  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
>if (alignment < PARM_BOUNDARY)
>  alignment = PARM_BOUNDARY;
>if (alignment > STACK_BOUNDARY)
> diff --git a/gcc/testsuite/gcc.target/mips/align-1.c 
> b/gcc/testsuite/gcc.target/mips/align-1.c
> new file mode 100644
> index 000..5c639bee274
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/align-1.c
> @@ -0,0 +1,38 @@
> +/* Check that typedef alignment does not affect passing of function
> +   parameters. */
> +/* { dg-do compile { target { "mips*-*-linux*" } } } */

mips* may be OK, since this test looks reasonable for bare metal platforms.

> +/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */

Is `-flto` required here?

> +
> +#include <assert.h>
> +
> +typedef struct ui8
> +{
> +  unsigned v[8];
> +} uint8 __attribute__ ((aligned(64)));
> +
> +unsigned
> +callee (int x, uint8 a)
> +{
> +  return a.v[0];
> +}
> +
> +uint8
> +identity (uint8 in)
> +{
> +  return in;
> +}
> +
> +int
> +main (void)
> +{
> +  uint8 vec = {{1, 2, 3, 4, 5, 6, 7, 8}};
> +  uint8 temp = identity (vec);
> +  unsigned temp2 = callee (1, identity (vec));
> +  assert (callee (1, temp) == 1);
> +  assert (temp2 == 1);
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler "\tsd\t\\\$5,0\\(\\\$\[0-9\]\\)" } } */
> +/* { dg-final { scan-assembler "\tsd\t\\\$6,8\\(\\\$\[0-9\]\\)" } } */
> +/* { dg-final { scan-assembler "\tsd\t\\\$7,16\\(\\\$\[0-9\]\\)" } } */

I guess this test may fail for mips32 targets?
Maybe we can add 2 tests: one for O32, and one for N32/N64.
Add the `-mabi=32`/`-mabi=n32` option to the `dg-do compile` line.

> --
> 2.34.1
>
>
>
>
> --
> YunQiang Su



-- 
YunQiang Su


Re: [PATCH] Introduce hardbool attribute for C

2023-06-16 Thread Alexandre Oliva via Gcc-patches
On Jun 16, 2023, Thomas Koenig  wrote:

> So, such a type would be incompatible with vanilla LOGICAL variables
> and with C interop logical variables.

Yeah, it would.  It's something else, and if you choose to use such a
type in an interface, it would need to be handled as such.  Presumably,
absent direct support in the desired language, using the underlying type
and the explicitly chosen constants would work.
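
A minimal sketch of that workaround, written in C for concreteness: the
other side of the interface declares the underlying type and the two
chosen constants by hand.  The type name hb_raw, the encodings 0x5a and
0xa5, and the helper names are all hypothetical; a real interface would
use whatever representation the hardbool declaration picked.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char hb_raw;      /* assumed underlying type */
#define HB_FALSE ((hb_raw)0x5a)    /* hypothetical encoding of false */
#define HB_TRUE  ((hb_raw)0xa5)    /* hypothetical encoding of true */

/* True if V is one of the two agreed representations.  */
bool hb_is_valid(hb_raw v)
{
    return v == HB_FALSE || v == HB_TRUE;
}

/* Encode an ordinary boolean for passing across the interface.  */
hb_raw hb_from_bool(bool b)
{
    return b ? HB_TRUE : HB_FALSE;
}

/* Decode a value received across the interface; any other bit pattern
   is treated as corruption and aborts via assert.  */
bool hb_to_bool(hb_raw v)
{
    assert(hb_is_valid(v));
    return v == HB_TRUE;
}
```

Note that plain 0 and 1 are not valid representations in this scheme,
which is exactly why such a type does not interoperate with ordinary
logical or boolean variables.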

This is nothing to call home about.  It's not unusual for languages to
support features that are not directly representable in other languages.
And this is one that isn't even hard to work around.

But I'd first doubt the wisdom of whoever adds such a type to a
cross-language interface.

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] RISC-V: Fix VL operand bug in VSETVL PASS[PR110264]

2023-06-16 Thread Juzhe-Zhong
This patch fixes an issue that occurs on both GCC-13 and GCC-14:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

The testcase is too big and I failed to reduce it, so I did not add
a test to this patch.

This patch should land in GCC-14 and should also be backported to GCC-13.

PR target/110264

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.

---
 gcc/config/riscv/riscv-vsetvl.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index cae9be0d928..42cc6f29f26 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -744,7 +744,10 @@ insert_vsetvl (enum emit_type emit_type, rtx_insn *rinsn,
   if (vlmax_avl_p (info.get_avl ()))
 {
   gcc_assert (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn));
-  rtx vl_op = info.get_avl_reg_rtx ();
+  /* For user vsetvli a5, zero, we should use get_vl to get the VL
+operand "a5".  */
+  rtx vl_op
+   = vsetvl_insn_p (rinsn) ? get_vl (rinsn) : info.get_avl_reg_rtx ();
   gcc_assert (!vlmax_avl_p (vl_op));
   emit_vsetvl_insn (VSETVL_NORMAL, emit_type, info, vl_op, rinsn);
   return VSETVL_NORMAL;
-- 
2.36.3



[PATCH] RISC-V: Fix PR 110264

2023-06-16 Thread Juzhe-Zhong
This patch fixes an issue that occurs on both GCC-13 and GCC-14:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

The testcase is too big and I failed to reduce it, so I did not add
a test to this patch.

This patch should land in GCC-14 and should also be backported to GCC-13.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (insert_vsetvl): Fix bug.

---
 gcc/config/riscv/riscv-vsetvl.cc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index cae9be0d928..42cc6f29f26 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -744,7 +744,10 @@ insert_vsetvl (enum emit_type emit_type, rtx_insn *rinsn,
   if (vlmax_avl_p (info.get_avl ()))
 {
   gcc_assert (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn));
-  rtx vl_op = info.get_avl_reg_rtx ();
+  /* For user vsetvli a5, zero, we should use get_vl to get the VL
+operand "a5".  */
+  rtx vl_op
+   = vsetvl_insn_p (rinsn) ? get_vl (rinsn) : info.get_avl_reg_rtx ();
   gcc_assert (!vlmax_avl_p (vl_op));
   emit_vsetvl_insn (VSETVL_NORMAL, emit_type, info, vl_op, rinsn);
   return VSETVL_NORMAL;
-- 
2.36.3



RE: [PATCH v1] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread Li, Pan2 via Gcc-patches
VECTOR_FLOAT_MODE_P is referenced from expand. I will remove it, since that
use will go away shortly.

Pan

From: juzhe.zh...@rivai.ai
Sent: Friday, June 16, 2023 3:48 PM
To: Li, Pan2; gcc-patches
Cc: Robin Dapp; jeffreyalaw; Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.


+/* Nonzero if MODE is a vector float mode.  */
+#define VECTOR_FLOAT_MODE_P(MODE)   \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
Why did you add this?

Remove it. Otherwise, LGTM.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-16 15:28
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.
From: Pan Li <pan2...@intel.com>

The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}

This causes a problem. For example, on zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which returns the code of
the ZVE128+ pattern instead of the ZVE32 one.

This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example, ZVE32
will use code_for_reduc (max, VNx1QI, VNx4QI), so the correct code for ZVE32
is returned as expected.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

PR 110265

gcc/ChangeLog:
PR target/110265
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md.
(@pred_reduc_): Removed.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): New pattern.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
* machmode.h (VECTOR_FLOAT_MODE_P): New macro.

gcc/testsuite/ChangeLog:
PR target/110265
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  13 +-
gcc/config/riscv/vector-iterators.md  |  61 +
gcc/config/riscv/vector.md| 208 +-
gcc/machmode.h|   4 +
.../gcc.target/riscv/rvv/base/pr110265-1.c|  13 ++
.../gcc.target/riscv/rvv/base/pr110265-1.h|  65 ++
.../gcc.target/riscv/rvv/base/pr110265-2.c|  14 ++
.../gcc.target/riscv/rvv/base/pr110265-2.h|  57 +
.../gcc.target/riscv/rvv/base/pr110265-3.c|  14 ++
9 files changed, 389 insertions(+), 60 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-3.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 87a684dd127..a77933d60d5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,8 +1396,17 @@ public:
   rtx expand (function_expander &e) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+machine_mode mode = e.vector_mode ();
+machine_mode ret_mode = e.ret_mode ();
+
+/* TODO: we will use ret_mode after all types of PR110265 are addressed.  */
+if (VECTOR_FLOAT_MODE_P (mode)
+   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+else
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git 

Re: [PATCH v5] MIPS: Add speculation_barrier support

2023-06-16 Thread YunQiang Su
Richard Earnshaw (lists) via Gcc-patches wrote on Thu, Jun 8, 2023 at 20:36:
>
>
> On 01/06/2023 05:26, YunQiang Su wrote:
> > speculation_barrier for MIPS needs sync+jr.hb (r2+),
> > so we implement __speculation_barrier in libgcc, like arm32 does.
> >
> > gcc/ChangeLog:
> >   * config/mips/mips-protos.h (mips_emit_speculation_barrier): New
> >  prototype.
> >   * config/mips/mips.cc (speculation_barrier_libfunc): New static
> >  variable.
> >   (mips_init_libfuncs): Initialize it.
> >   (mips_emit_speculation_barrier): New function.
> >   * config/mips/mips.md (speculation_barrier): Call
> >  mips_emit_speculation_barrier.
> >
> > libgcc/ChangeLog:
> >   * config/mips/lib1funcs.S: New file.
> >   define __speculation_barrier and include mips16.S.
> >   * config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
> >   define LIB1ASMFUNCS as _speculation_barrier.
> >   set version info for __speculation_barrier.
> >   * config/mips/libgcc-mips.ver: New file.
> >   * config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
> >   included in lib1funcs.S now.
> > ---
>
> Please remember to cite PR86793 when committing this fix.
>

Oh, sorry, I forgot it. I have commented there.
I don't have permission to close this bug report. Could you help close it?

> R.
>
> >   gcc/config/mips/mips-protos.h  |  2 +
> >   gcc/config/mips/mips.cc| 12 ++
> >   gcc/config/mips/mips.md| 12 ++
> >   libgcc/config/mips/lib1funcs.S | 65 ++
> >   libgcc/config/mips/libgcc-mips.ver | 21 ++
> >   libgcc/config/mips/t-mips  |  7 
> >   libgcc/config/mips/t-mips16|  3 +-
> >   7 files changed, 120 insertions(+), 2 deletions(-)
> >   create mode 100644 libgcc/config/mips/lib1funcs.S
> >   create mode 100644 libgcc/config/mips/libgcc-mips.ver
> >
> > diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
> > index 20483469105..da7902c235b 100644
> > --- a/gcc/config/mips/mips-protos.h
> > +++ b/gcc/config/mips/mips-protos.h
> > @@ -388,4 +388,6 @@ extern void mips_register_frame_header_opt (void);
> >   extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
> >   extern void mips_expand_vec_cmp_expr (rtx *);
> >
> > +extern void mips_emit_speculation_barrier_function (void);
> > +
> >   #endif /* ! GCC_MIPS_PROTOS_H */
> > diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> > index ca491b981a3..c1d1691306e 100644
> > --- a/gcc/config/mips/mips.cc
> > +++ b/gcc/config/mips/mips.cc
> > @@ -13611,6 +13611,9 @@ mips_autovectorize_vector_modes (vector_modes 
> > *modes, bool)
> > return 0;
> >   }
> >
> > +
> > +static GTY (()) rtx speculation_barrier_libfunc;
> > +
> >   /* Implement TARGET_INIT_LIBFUNCS.  */
> >
> >   static void
> > @@ -13680,6 +13683,7 @@ mips_init_libfuncs (void)
> > synchronize_libfunc = init_one_libfunc ("__sync_synchronize");
> > init_sync_libfuncs (UNITS_PER_WORD);
> >   }
> > +  speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
> >   }
> >
> >   /* Build up a multi-insn sequence that loads label TARGET into $AT.  */
> > @@ -19092,6 +19096,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn 
> > *insn, int *hilo_delay,
> > }
> >   }
> >
> > +/* Emit a speculation barrier.
> > +   JR.HB is needed, so we put speculation_barrier_libfunc in libgcc.  */
> > +void
> > +mips_emit_speculation_barrier_function ()
> > +{
> > +  emit_library_call (speculation_barrier_libfunc, LCT_NORMAL, VOIDmode);
> > +}
> > +
> >   /* A SEQUENCE is breakable iff the branch inside it has a compact form
> >  and the target has compact branches.  */
> >
> > diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
> > index ac1d77afc7d..5d04ac566dd 100644
> > --- a/gcc/config/mips/mips.md
> > +++ b/gcc/config/mips/mips.md
> > @@ -160,6 +160,8 @@
> > ;; The `.insn' pseudo-op.
> > UNSPEC_INSN_PSEUDO
> > UNSPEC_JRHB
> > +
> > +  VUNSPEC_SPECULATION_BARRIER
> >   ])
> >
> >   (define_constants
> > @@ -7455,6 +7457,16 @@
> > mips_expand_conditional_move (operands);
> > DONE;
> >   })
> > +
> > +(define_expand "speculation_barrier"
> > +  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
> > +  ""
> > +  "
> > +  mips_emit_speculation_barrier_function ();
> > +  DONE;
> > +  "
> > +)
> > +
> >
> >   ;;
> >   ;;  
> > diff --git a/libgcc/config/mips/lib1funcs.S b/libgcc/config/mips/lib1funcs.S
> > new file mode 100644
> > index 000..97a3655e8ab
> > --- /dev/null
> > +++ b/libgcc/config/mips/lib1funcs.S
> > @@ -0,0 +1,65 @@
> > +/* Copyright (C) 2023 Free Software Foundation, Inc.
> > +
> > +This file is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.

Re: [PATCH v1] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.

2023-06-16 Thread juzhe.zh...@rivai.ai
+/* Nonzero if MODE is a vector float mode.  */
+#define VECTOR_FLOAT_MODE_P(MODE)  \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT) 
Why did you add this?

Remove it. Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-16 15:28
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Bugfix for RVV integer reduction in ZVE32/64.
From: Pan Li 
 
The RVV integer reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1, mode2).
The implementation of code_for_reduc may look like below.

code_for_reduc (code, mode1, mode2)
{
  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx16qi; // ZVE128+

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx8qi;  // ZVE64

  if (code == max && mode1 == VNx1QI && mode2 == VNx1QI)
    return CODE_FOR_pred_reduc_maxvnx1qivnx4qi;  // ZVE32
}

This causes a problem. For example, on zve32 we will have
code_for_reduc (max, VNx1QI, VNx1QI), which returns the code of
the ZVE128+ pattern instead of the ZVE32 one.

This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example, ZVE32
will use code_for_reduc (max, VNx1QI, VNx4QI), so the correct code for ZVE32
is returned as expected.
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
PR 110265
 
gcc/ChangeLog:
PR target/110265
* config/riscv/riscv-vector-builtins-bases.cc: Add ret_mode for
integer reduction expand.
* config/riscv/vector-iterators.md: Add VQI, VHI, VSI and VDI,
and the LMUL1 attr respectively.
* config/riscv/vector.md.
(@pred_reduc_): Removed.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): New pattern.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
(@pred_reduc_): Likewise.
* machmode.h (VECTOR_FLOAT_MODE_P): New macro.
 
gcc/testsuite/ChangeLog:
PR target/110265
* gcc.target/riscv/rvv/base/pr110265-1.c: New test.
* gcc.target/riscv/rvv/base/pr110265-1.h: New test.
* gcc.target/riscv/rvv/base/pr110265-2.c: New test.
* gcc.target/riscv/rvv/base/pr110265-2.h: New test.
* gcc.target/riscv/rvv/base/pr110265-3.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  13 +-
gcc/config/riscv/vector-iterators.md  |  61 +
gcc/config/riscv/vector.md| 208 +-
gcc/machmode.h|   4 +
.../gcc.target/riscv/rvv/base/pr110265-1.c|  13 ++
.../gcc.target/riscv/rvv/base/pr110265-1.h|  65 ++
.../gcc.target/riscv/rvv/base/pr110265-2.c|  14 ++
.../gcc.target/riscv/rvv/base/pr110265-2.h|  57 +
.../gcc.target/riscv/rvv/base/pr110265-3.c|  14 ++
9 files changed, 389 insertions(+), 60 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-1.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-2.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr110265-3.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 87a684dd127..a77933d60d5 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1396,8 +1396,17 @@ public:
   rtx expand (function_expander &e) const override
   {
-return e.use_exact_insn (
-  code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+machine_mode mode = e.vector_mode ();
+machine_mode ret_mode = e.ret_mode ();
+
+/* TODO: we will use ret_mode after all types of PR110265 are addressed.  */
+if (VECTOR_FLOAT_MODE_P (mode)
+   || GET_MODE_INNER (mode) != GET_MODE_INNER (ret_mode))
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.vector_mode ()));
+else
+  return e.use_exact_insn (
+ code_for_pred_reduc (CODE, e.vector_mode (), e.ret_mode ()));
   }
};
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 8c71c9e22cc..e2c8ade98eb 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -929,6 +929,67 @@ (define_mode_iterator V64T [
   (VNx2x64QI "TARGET_MIN_VLEN >= 128")
])
+(define_mode_iterator VQI [
+  (VNx1QI "TARGET_MIN_VLEN < 128")
+  VNx2QI
+  VNx4QI
+  VNx8QI
+  VNx16QI
+  VNx32QI
+  (VNx64QI "TARGET_MIN_VLEN > 32")
+  (VNx128QI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VHI [
+  (VNx1HI "TARGET_MIN_VLEN < 128")
+  VNx2HI
+  VNx4HI
+  VNx8HI
+  VNx16HI
+  (VNx32HI "TARGET_MIN_VLEN > 32")
+  (VNx64HI "TARGET_MIN_VLEN >= 128")
+])
+
+(define_mode_iterator VSI [
+  (VNx1SI 

Re: [PATCH 2/2] Refined 256/512-bit vpacksswb/vpackssdw patterns.

2023-06-16 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 16, 2023 at 4:12 AM liuhongt  wrote:
>
> The packing in vpacksswb/vpackssdw is not a simple concat, it's an
> interweave from src1 and src2 for every 128 bits (or 64 bits for the
> ss_truncate result).
>
> i.e.
>
> dst[192-255] = ss_truncate (src2[128-255])
> dst[128-191] = ss_truncate (src1[128-255])
> dst[64-127] = ss_truncate (src2[0-127])
> dst[0-63] = ss_truncate (src1[0-127])
>
> The patch refines those patterns with an extra vec_select for the
> interweave.
>
> The patch fixes the testcase below, which started failing after
> g:921b841350c4fc298d09f6c5674663e0f4208610 added constant-folding for
> SS_TRUNCATE:
> FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/110235
> * config/i386/sse.md (_packsswb): Split
> to below 3 new define_insns.
> (sse2_packsswb): New define_insn.
> (avx2_packsswb): Ditto.
> (avx512bw_packsswb): Ditto.
> (_packssdw): Split to below 3 new define_insns.
> (sse2_packssdw): New define_insn.
> (avx2_packssdw): Ditto.
> (avx512bw_packssdw): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512bw-vpackssdw-3.c: New test.
> * gcc.target/i386/avx512bw-vpacksswb-3.c: New test.

Please proofread and fix ChangeLog entry, in the same way as your
previous patch.

Otherwise LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md| 165 --
>  .../gcc.target/i386/avx512bw-vpackssdw-3.c|  55 ++
>  .../gcc.target/i386/avx512bw-vpacksswb-3.c|  50 ++
>  3 files changed, 252 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vpackssdw-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vpacksswb-3.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 83e3f534fd2..cc4e4620257 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17762,14 +17762,14 @@ (define_expand "vec_pack_sbool_trunc_qi"
>DONE;
>  })
>
> -(define_insn "_packsswb"
> -  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,")
> -   (vec_concat:VI1_AVX512
> - (ss_truncate:
> -   (match_operand: 1 "register_operand" "0,"))
> - (ss_truncate:
> -   (match_operand: 2 "vector_operand" 
> "xBm,m"]
> -  "TARGET_SSE2 &&  && "
> +(define_insn "sse2_packsswb"
> +  [(set (match_operand:V16QI 0 "register_operand" "=x,Yw")
> +   (vec_concat:V16QI
> + (ss_truncate:V8QI
> +   (match_operand:V8HI 1 "register_operand" "0,Yw"))
> + (ss_truncate:V8QI
> +   (match_operand:V8HI 2 "vector_operand" "xBm,Ywm"]
> +  "TARGET_SSE2 &&  && "
>"@
> packsswb\t{%2, %0|%0, %2}
> vpacksswb\t{%2, %1, %0|%0, %1, %2}"
> @@ -1,16 +1,93 @@ (define_insn "_packsswb"
> (set_attr "type" "sselog")
> (set_attr "prefix_data16" "1,*")
> (set_attr "prefix" "orig,")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "TI")])
>
> -(define_insn "_packssdw"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,")
> -   (vec_concat:VI2_AVX2
> - (ss_truncate:
> -   (match_operand: 1 "register_operand" "0,"))
> - (ss_truncate:
> -   (match_operand: 2 "vector_operand" 
> "xBm,m"]
> -  "TARGET_SSE2 &&  && "
> +(define_insn "avx2_packsswb"
> +  [(set (match_operand:V32QI 0 "register_operand" "=Yw")
> +   (vec_select:V32QI
> + (vec_concat:V32QI
> +   (ss_truncate:V16QI
> + (match_operand:V16HI 1 "register_operand" "Yw"))
> +   (ss_truncate:V16QI
> + (match_operand:V16HI 2 "vector_operand" "Ywm")))
> + (parallel [(const_int 0)  (const_int 1)
> +(const_int 2)  (const_int 3)
> +(const_int 4)  (const_int 5)
> +(const_int 6)  (const_int 7)
> +(const_int 16) (const_int 17)
> +(const_int 18) (const_int 19)
> +(const_int 20) (const_int 21)
> +(const_int 22) (const_int 23)
> +(const_int 8)  (const_int 9)
> +(const_int 10) (const_int 11)
> +(const_int 12) (const_int 13)
> +(const_int 14) (const_int 15)
> +(const_int 24) (const_int 25)
> +(const_int 26) (const_int 27)
> +(const_int 28) (const_int 29)
> +(const_int 30) (const_int 31)])))]
> +  "TARGET_AVX2 &&  && "
> +  "vpacksswb\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "sselog")
> +   (set_attr "prefix" "")
> +   (set_attr "mode" "OI")])
> +
> +(define_insn "avx512bw_packsswb"
> +  [(set (match_operand:V64QI 0 "register_operand" "=v")
> +   (vec_select:V64QI
> + (vec_concat:V64QI
> +   (ss_truncate:V32QI
> + (match_operand:V32HI 1 
