Re: [PATCH] Introduce hardbool attribute for C

2023-06-15 Thread Alexandre Oliva via Gcc-patches
On Aug  9, 2022, Alexandre Oliva  wrote:

> Ping?

Ping?  Refreshed, added setting of ENUM_UNDERLYING_TYPE, retested on
x86_64-linux-gnu (also on gcc-13).


Introduce hardbool attribute for C

This patch introduces hardened booleans in C.  The hardbool attribute,
when attached to an integral type, turns it into an enumerated type
with boolean semantics, using the named or implied constants as
representations for false and true.

Expressions of such types decay to _Bool, trapping if the value is
neither true nor false, and _Bool can convert implicitly back to them.
Other conversions go through _Bool first.
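The decay/convert semantics above can be sketched in portable C.  The representations below (0x5a for false, 0xa5 for true) are arbitrary stand-ins for the named or implied constants the attribute takes; this is an illustration of the checking, not the patch's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical hardbool representations; the patch derives these from
   the attribute's named or implied constants.  */
#define HB_FALSE 0x5a
#define HB_TRUE  0xa5

/* Decay to _Bool: trap (here, abort) if the stored byte is neither the
   false nor the true representation.  */
static bool hb_decay (unsigned char v)
{
  if (v == HB_FALSE) return false;
  if (v == HB_TRUE)  return true;
  abort ();  /* hardened check: a corrupted boolean traps */
}

/* Implicit conversion back from _Bool picks the canonical value.  */
static unsigned char hb_encode (bool b)
{
  return b ? HB_TRUE : HB_FALSE;
}
```

Other conversions going "through _Bool first" then amount to `hb_encode (x != 0)` for a scalar `x`.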


for  gcc/c-family/ChangeLog

* c-attribs.cc (c_common_attribute_table): Add hardbool.
(handle_hardbool_attribute): New.
(type_valid_for_vector_size): Reject hardbool.
* c-common.cc (convert_and_check): Skip warnings for convert
and check for hardbool.
(c_hardbool_type_attr_1): New.
* c-common.h (c_hardbool_type_attr): New.

for  gcc/c/ChangeLog

* c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools.
* c-convert.cc (convert): Convert to hardbool through
truthvalue.
* c-decl.cc (check_bitfield_type_and_width): Skip enumeral
truncation warnings for hardbool.
(finish_struct): Propagate hardbool attribute to bitfield
types.
(digest_init): Convert to hardbool.

for  gcc/ChangeLog

* doc/extend.texi (hardbool): New type attribute.

for  gcc/testsuite/ChangeLog

* gcc.dg/hardbool-err.c: New.
* gcc.dg/hardbool-trap.c: New.
* gcc.dg/hardbool.c: New.
* gcc.dg/hardbool-s.c: New.
* gcc.dg/hardbool-us.c: New.
* gcc.dg/hardbool-i.c: New.
* gcc.dg/hardbool-ul.c: New.
* gcc.dg/hardbool-ll.c: New.
* gcc.dg/hardbool-5a.c: New.
* gcc.dg/hardbool-s-5a.c: New.
* gcc.dg/hardbool-us-5a.c: New.
* gcc.dg/hardbool-i-5a.c: New.
* gcc.dg/hardbool-ul-5a.c: New.
* gcc.dg/hardbool-ll-5a.c: New.
---
 gcc/c-family/c-attribs.cc |   98 -
 gcc/c-family/c-common.cc  |   21 
 gcc/c-family/c-common.h   |   18 
 gcc/c/c-convert.cc|   14 +++
 gcc/c/c-decl.cc   |   10 ++
 gcc/c/c-typeck.cc |   31 ++-
 gcc/doc/extend.texi   |   52 +++
 gcc/testsuite/gcc.dg/hardbool-err.c   |   28 ++
 gcc/testsuite/gcc.dg/hardbool-trap.c  |   13 +++
 gcc/testsuite/gcc.dg/torture/hardbool-5a.c|6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c  |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i.c |5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c  |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-s.c |5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-us.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool.c   |  118 +
 21 files changed, 460 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-err.c
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-trap.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index c12211cb4d499..365319e642b1a 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -176,6 +176,7 @@ static tree handle_objc_root_class_attribute (tree *, tree, 
tree, int, bool *);
 static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool 
*);
 static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int,
bool *);
+static tree handle_hardbool_attribute (tree *, tree, tree, int, bool *);
 static tree handle_retain_attribute (tree *, tree, tree, int, bool *);
 static tree handle_fd_arg_attribute (tree *, tree, tree, int, bool *);
 
@@ -293,6 +294,8 @@ const struct attribute_spec 

[PATCH] Add bfloat16_t support for riscv

2023-06-15 Thread Liao Shihua
x86_64/i686/AArch64 have had working std::bfloat16_t support
for a few months; __bf16 there is no longer a storage-only type, but
can be used for arithmetic and is supported in libgcc and libstdc++.
This patch adds similar support for RISC-V. __bf16 has been merged
into the psABI. The compiler handles all operations on __bf16 by
converting to SFmode.
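The conversion through SFmode has a simple shape, since bfloat16 is the high 16 bits of an IEEE binary32.  A hedged scalar sketch (narrowing by plain truncation below ignores rounding and NaN quieting, which the real libgcc routines handle):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern to float: shifting left by 16 places
   the sign/exponent/mantissa bits where binary32 expects them.  */
static float bf16_to_float (uint16_t b)
{
  uint32_t u = (uint32_t) b << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Narrow by truncation: keep only the high 16 bits of the binary32
   representation.  (Illustrative only; no round-to-nearest-even.)  */
static uint16_t float_to_bf16_trunc (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return (uint16_t) (u >> 16);
}
```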

gcc/ChangeLog:

* config/riscv/iterators.md (ld): Add BFmode in iterators.
(sd): Ditto.
* config/riscv/riscv-builtins.cc (riscv_init_builtin_types): Add
bfloat16_type_node in riscv.
* config/riscv/riscv-modes.def (FLOAT_MODE): Add BFmode in FLOAT_MODE.
(ADJUST_FLOAT_FORMAT): Ditto.
* config/riscv/riscv.cc (riscv_mangle_type): Add DF16b in mangle.
(riscv_scalar_mode_supported_p): Add BFmode in scalar_float_mode.
(riscv_libgcc_floating_mode_supported_p): Support BFmode in libgcc.
* config/riscv/riscv.md (mode): Support BFmode in machine description.
(movbf): Support BFmode in softfloat.
(*movbf_softfloat): Ditto.

libgcc/ChangeLog:

* config/riscv/sfp-machine.h (_FP_NANFRAC_B): Define.
(_FP_NANSIGN_B): Ditto.
* config/riscv/t-softfp32: Add trunc{tfbf dfbf sfbf hfbf}, extendbfsf,
floatdibf, floatundibf.
* config/riscv/t-softfp64: Add floattibf, floatuntibf.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/__bf16-soft.c: New test.

---
 gcc/config/riscv/iterators.md|  4 ++--
 gcc/config/riscv/riscv-builtins.cc   | 16 +++
 gcc/config/riscv/riscv-modes.def |  2 ++
 gcc/config/riscv/riscv.cc| 12 ---
 gcc/config/riscv/riscv.md| 21 +++-
 gcc/testsuite/gcc.target/riscv/__bf16-soft.c | 12 +++
 libgcc/config/riscv/sfp-machine.h|  3 +++
 libgcc/config/riscv/t-softfp32   |  7 ---
 libgcc/config/riscv/t-softfp64   |  2 +-
 9 files changed, 69 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/__bf16-soft.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d374a10810c..c9148028ea3 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -87,13 +87,13 @@
 (define_mode_attr default_load [(QI "lbu") (HI "lhu") (SI "lw") (DI "ld")])
 
 ;; Mode attribute for FP loads into integer registers.
-(define_mode_attr softload [(HF "lh") (SF "lw") (DF "ld")])
+(define_mode_attr softload [(BF "lh") (HF "lh") (SF "lw") (DF "ld")])
 
 ;; Instruction names for stores.
 (define_mode_attr store [(QI "sb") (HI "sh") (SI "sw") (DI "sd") (HF "fsh") 
(SF "fsw") (DF "fsd")])
 
 ;; Instruction names for FP stores from integer registers.
-(define_mode_attr softstore [(HF "sh") (SF "sw") (DF "sd")])
+(define_mode_attr softstore [(BF "sh") (HF "sh") (SF "sw") (DF "sd")])
 
 ;; This attribute gives the best constraint to use for registers of
 ;; a given mode.
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 79681d75962..398247a0ccb 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -194,6 +194,7 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES];
   riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]]
 
 tree riscv_float16_type_node = NULL_TREE;
+tree riscv_bfloat16_type_node = NULL_TREE;
 
 /* Return the function type associated with function prototype TYPE.  */
 
@@ -237,6 +238,21 @@ riscv_init_builtin_types (void)
   if (!maybe_get_identifier ("_Float16"))
 lang_hooks.types.register_builtin_type (riscv_float16_type_node,
"_Float16");
+
+  /* Provide the __bf16 type and bfloat16_type_node if needed.  */
+  if (!bfloat16_type_node)
+{
+  riscv_bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (riscv_bfloat16_type_node) = 16;
+  SET_TYPE_MODE (riscv_bfloat16_type_node, BFmode);
+  layout_type (riscv_bfloat16_type_node);
+}
+  else
+riscv_bfloat16_type_node = bfloat16_type_node;
+
+  if (!maybe_get_identifier ("__bf16"))
+lang_hooks.types.register_builtin_type (riscv_bfloat16_type_node,
+   "__bf16");
 }
 
 /* Implement TARGET_INIT_BUILTINS.  */
diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 19a4f9fb3db..4bb03307840 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 
 FLOAT_MODE (HF, 2, ieee_half_format);
 FLOAT_MODE (TF, 16, ieee_quad_format);
+FLOAT_MODE (BF, 2, 0);
+ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
 
 /* Vector modes.  */
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e5ae4e81b7a..d5b1350d4bf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7155,8 +7155,14 @@ static const char *
 riscv_mangle_type 

Re: [PATCH] rs6000: Change bitwise xor to inequality operator [PR106907]

2023-06-15 Thread Peter Bergner via Gcc-patches
On 6/12/23 6:18 AM, P Jeevitha wrote:
> Bitwise xor performed on bool
> is similar to checking inequality. So changed to inequality
> operator (!=) instead of bitwise xor (^).
[snip]
> -   if (swapped ^ !BYTES_BIG_ENDIAN
[snip]
> +   if (swapped != !BYTES_BIG_ENDIAN

I know Andreas mentioned using "swapped != !BYTES_BIG_ENDIAN" in
the bugzilla, but that's the same as "swapped == BYTES_BIG_ENDIAN",
which doesn't contain a double negative and seems a little clearer.
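For genuine boolean operands the three spellings agree, which is easy to check exhaustively (a minimal sketch of the equivalence, not the rs6000 code itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Three equivalent ways to write the condition, for bool operands.  */
static bool via_xor (bool swapped, bool big_endian)
{ return swapped ^ !big_endian; }      /* original form */

static bool via_neq (bool swapped, bool big_endian)
{ return swapped != !big_endian; }     /* patch's form */

static bool via_eq (bool swapped, bool big_endian)
{ return swapped == big_endian; }      /* double-negative-free form */
```

Note the equivalence only holds when both operands are normalized to 0/1, which is the case here since `swapped` is a bool and `!BYTES_BIG_ENDIAN` is a logical negation.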

It's up to Segher though...and if we go with this, then the ChangeLog
entry needs to be updated slightly since we're no longer testing for
inequality.

Peter



Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Jiufu Guo via Gcc-patches


Hi,

Segher Boessenkool  writes:

> On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote:
>> >>   This is the existing pattern.  It may be read as an action
>> >>   to clean an unknown-size memory block.
>> >
>> > Including a size zero memory block, yes.  BLKmode was originally to do
>> > things like bcopy (before modern names like memcpy were more usually
>> > used), and those very much need size zero as well.
>> 
>> The size can indeed be zero.  No asm code needs to
>> be generated for "set 'const_int 0' to a zero-size memory block".
>> stack_tie does not generate any real code.  It seems ok :)
>> 
>> However, the mem may not be zero-size.  This may be a concern.
>> This is one reason that I would like to have an unspec_tie.
>
> It very much *can* be a zero size mem; that is perfectly fine for
> mem:BLK.

There is still one concern: how to distinguish stack_tie
from other insns.
For example, consider the fake pattern below:
(define_insn "xx_cleanmem"
  [(parallel: [(set (mem:BLK (xxx)) (const_int 0))
   (XXX/use "const_int_operand" "n")])]...

To avoid this pattern being recognized as 'stack_tie',
'unspec_tie' came to mind.

>
>> Another reason is that unspec:blk is used by various ports :) 
>
> unspec:BLK is undefined.  BLKmode is allowed on mem only.
>
>> >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
>> >> UNSPEC_TIE".
>> >>   Current patch is using this one.
>> >
>> > What would be the semantics of that?  Just the same as the current stuff
>> > I'd say, or less?  It cannot be more!
>> 
>> The semantic that I trying to achieve is "this is a special
>> insn, not only a normal set to unknown size mem".
>
> What does that *mean*?  "Special instruction"?  What would what code do
> for that?  What would the RTL mean?
>
>> As you explained before on 'unspec:DI', the unspec would
>> just decorate the set_src part: something DI value with
>> machine-specific operation.
>
> An unspec is an operation on its operands, giving some (in this case)
> DImode value.  There is nothing special about that operation, it can be
> optimised like any other, it's just not specified what exactly that
> value is (to the generic compiler, the backend itself can very much
> optimise stuff with it).
>
>> But, since 'tie_operand' is checked for this insn:
>> if 'tie_operand' checks UNSPEC_TIE, then the insn
>> with UNSPEC_TIE is 'a special insn'.  Or interpret
>> the semantics of this insn as: this insn stack_tie
>> indicates "set/operate on a zero-size block".
>
> tie_operand is a predicate.  The predicate of an insn has to return 1,
> or the insn is not recognised.  You can do the same in insn conditions
> always (in principle anyway).

Thank you very much for your detailed and patient explanation!

BR,
Jeff (Jiufu Guo)

>
>
> Segher


[PATCH 2/2] Refined 256/512-bit vpacksswb/vpackssdw patterns.

2023-06-15 Thread liuhongt via Gcc-patches
The packing in vpacksswb/vpackssdw is not a simple concat; it's an
interleave of src1 and src2 for every 128 bits (or 64 bits for the
ss_truncate result).

i.e.

dst[192-255] = ss_truncate (src2[128-255])
dst[128-191] = ss_truncate (src1[128-255])
dst[64-127] = ss_truncate (src2[0-127])
dst[0-63] = ss_truncate (src1[0-127])

The patch refines those patterns with an extra vec_select for the
interleave.
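The lane structure above can be sketched in scalar C for the 256-bit case (an illustration of the interleave, not the compiler code):

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 16-bit value into the signed 8-bit range.  */
static int8_t ss_trunc8 (int16_t v)
{
  return v > 127 ? 127 : v < -128 ? -128 : (int8_t) v;
}

/* 256-bit vpacksswb: within each 128-bit lane, the low 8 bytes come
   from src1 and the high 8 bytes from src2 -- not a plain concat of
   the two saturated sources.  */
static void packsswb_256 (int8_t dst[32], const int16_t src1[16],
                          const int16_t src2[16])
{
  for (int lane = 0; lane < 2; lane++)
    for (int i = 0; i < 8; i++)
      {
        dst[lane * 16 + i]     = ss_trunc8 (src1[lane * 8 + i]);
        dst[lane * 16 + 8 + i] = ss_trunc8 (src2[lane * 8 + i]);
      }
}
```

This is why a plain `vec_concat` of two `ss_truncate`s misdescribes the instruction: byte 16 of the result comes from src1's second lane, not from src2.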

The patch fixes the testcase below, which started failing after
g:921b841350c4fc298d09f6c5674663e0f4208610 added constant folding for
SS_TRUNCATE:
FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test.

Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

PR target/110235
* config/i386/sse.md (_packsswb): Split
to below 3 new define_insns.
(sse2_packsswb): New define_insn.
(avx2_packsswb): Ditto.
(avx512bw_packsswb): Ditto.
(_packssdw): Split to below 3 new define_insns.
(sse2_packssdw): New define_insn.
(avx2_packssdw): Ditto.
(avx512bw_packssdw): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-vpackssdw-3.c: New test.
* gcc.target/i386/avx512bw-vpacksswb-3.c: New test.
---
 gcc/config/i386/sse.md| 165 --
 .../gcc.target/i386/avx512bw-vpackssdw-3.c|  55 ++
 .../gcc.target/i386/avx512bw-vpacksswb-3.c|  50 ++
 3 files changed, 252 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vpackssdw-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-vpacksswb-3.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 83e3f534fd2..cc4e4620257 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17762,14 +17762,14 @@ (define_expand "vec_pack_sbool_trunc_qi"
   DONE;
 })
 
-(define_insn "_packsswb"
-  [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,")
-   (vec_concat:VI1_AVX512
- (ss_truncate:
-   (match_operand: 1 "register_operand" "0,"))
- (ss_truncate:
-   (match_operand: 2 "vector_operand" "xBm,m"]
-  "TARGET_SSE2 &&  && "
+(define_insn "sse2_packsswb"
+  [(set (match_operand:V16QI 0 "register_operand" "=x,Yw")
+   (vec_concat:V16QI
+ (ss_truncate:V8QI
+   (match_operand:V8HI 1 "register_operand" "0,Yw"))
+ (ss_truncate:V8QI
+   (match_operand:V8HI 2 "vector_operand" "xBm,Ywm"]
+  "TARGET_SSE2 &&  && "
   "@
packsswb\t{%2, %0|%0, %2}
vpacksswb\t{%2, %1, %0|%0, %1, %2}"
@@ -1,16 +1,93 @@ (define_insn "_packsswb"
(set_attr "type" "sselog")
(set_attr "prefix_data16" "1,*")
(set_attr "prefix" "orig,")
-   (set_attr "mode" "")])
+   (set_attr "mode" "TI")])
 
-(define_insn "_packssdw"
-  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,")
-   (vec_concat:VI2_AVX2
- (ss_truncate:
-   (match_operand: 1 "register_operand" "0,"))
- (ss_truncate:
-   (match_operand: 2 "vector_operand" "xBm,m"]
-  "TARGET_SSE2 &&  && "
+(define_insn "avx2_packsswb"
+  [(set (match_operand:V32QI 0 "register_operand" "=Yw")
+   (vec_select:V32QI
+ (vec_concat:V32QI
+   (ss_truncate:V16QI
+ (match_operand:V16HI 1 "register_operand" "Yw"))
+   (ss_truncate:V16QI
+ (match_operand:V16HI 2 "vector_operand" "Ywm")))
+ (parallel [(const_int 0)  (const_int 1)
+(const_int 2)  (const_int 3)
+(const_int 4)  (const_int 5)
+(const_int 6)  (const_int 7)
+(const_int 16) (const_int 17)
+(const_int 18) (const_int 19)
+(const_int 20) (const_int 21)
+(const_int 22) (const_int 23)
+(const_int 8)  (const_int 9)
+(const_int 10) (const_int 11)
+(const_int 12) (const_int 13)
+(const_int 14) (const_int 15)
+(const_int 24) (const_int 25)
+(const_int 26) (const_int 27)
+(const_int 28) (const_int 29)
+(const_int 30) (const_int 31)])))]
+  "TARGET_AVX2 &&  && "
+  "vpacksswb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "")
+   (set_attr "mode" "OI")])
+
+(define_insn "avx512bw_packsswb"
+  [(set (match_operand:V64QI 0 "register_operand" "=v")
+   (vec_select:V64QI
+ (vec_concat:V64QI
+   (ss_truncate:V32QI
+ (match_operand:V32HI 1 "register_operand" "v"))
+   (ss_truncate:V32QI
+ (match_operand:V32HI 2 "vector_operand" "vm")))
+ (parallel [(const_int 0)  (const_int 1)
+(const_int 2)  (const_int 3)
+(const_int 4)  (const_int 5)
+(const_int 6)  (const_int 7)
+(const_int 32) (const_int 33)
+(const_int 34) 

[PATCH 1/2] Reimplement packuswb/packusdw with UNSPEC_US_TRUNCATE instead of original us_truncate.

2023-06-15 Thread liuhongt via Gcc-patches
packuswb/packusdw do unsigned saturation of a signed source, but RTL
us_truncate does unsigned saturation of an unsigned source.
So for the value -1, packuswb produces 0, but us_truncate produces
255.  The patch reimplements the related patterns and functions with
UNSPEC_US_TRUNCATE instead of us_truncate.
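The difference can be sketched per element in portable C (an illustration of the two semantics, not the folding code):

```c
#include <assert.h>
#include <stdint.h>

/* Hardware packuswb element: unsigned saturation of a *signed*
   source, so negative inputs clamp to 0.  */
static uint8_t packus_elem (int16_t v)
{
  return v < 0 ? 0 : v > 255 ? 255 : (uint8_t) v;
}

/* RTL us_truncate element: unsigned saturation of an *unsigned*
   source, so the bit pattern of -1 (0xffff) saturates to 255.  */
static uint8_t us_truncate_elem (uint16_t v)
{
  return v > 255 ? 255 : (uint8_t) v;
}
```

The two disagree exactly on inputs whose sign bit is set, which is why constant folding through us_truncate broke the packuswb tests.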

The patch fixes the testcases below, which started failing after
g:921b841350c4fc298d09f6c5674663e0f4208610 added constant folding for
US_TRUNCATE:

FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test
FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test
FAIL: gcc.target/i386/sse2-packuswb-1.c execution test

Bootstrapped and regtested on x86_64-pc-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

PR target/110235
* config/i386/i386-expand.cc (ix86_split_mmx_pack): Use
UNSPEC_US_TRUNCATE instead of original us_truncate for
packusdw/packuswb.
* config/i386/mmx.md (mmx_packswb): Split into the
2 new patterns below.
(mmx_packsswb): New reload_completed define_insn_and_split.
(mmx_packuswb): Ditto.
(mmx_packusdw): Use UNSPEC_US_TRUNCATE instead of original
us_truncate.
(s_trunsuffix): Removed.
(any_s_truncate): Removed.
* config/i386/sse.md (_packuswb): Use
UNSPEC_US_TRUNCATE instead of original us_truncate.
(_packusdw): Ditto.
* config/i386/i386.md (UNSPEC_US_TRUNCATE): New unspec_c_enum.
---
 gcc/config/i386/i386-expand.cc | 20 
 gcc/config/i386/i386.md|  4 
 gcc/config/i386/mmx.md | 43 ++
 gcc/config/i386/sse.md | 20 
 4 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index def060ab562..35e2740f9b6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1019,6 +1019,7 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   rtx op0 = operands[0];
   rtx op1 = operands[1];
   rtx op2 = operands[2];
+  rtx src;
 
   machine_mode dmode = GET_MODE (op0);
   machine_mode smode = GET_MODE (op1);
@@ -1042,11 +1043,20 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   op1 = lowpart_subreg (sse_smode, op1, GET_MODE (op1));
   op2 = lowpart_subreg (sse_smode, op2, GET_MODE (op2));
 
-  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
-  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
-  rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode,
-   op1, op2));
-  emit_insn (insn);
+  /* For packusdw/packuswb, it does unsigned saturation for
+ signed source, which is different from rtl US_TRUNCATE.  */
+  if (code == US_TRUNCATE)
+src = gen_rtx_UNSPEC (sse_dmode,
+ gen_rtvec (2, op1, op2),
+ UNSPEC_US_TRUNCATE);
+  else
+{
+  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
+  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
+  src = gen_rtx_VEC_CONCAT (sse_dmode, op1, op2);
+}
+
+  emit_move_insn (dest, src);
 
   ix86_move_vector_high_sse_to_mmx (op0);
 }
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0929115ed4d..070a84d8af9 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -129,6 +129,10 @@ (define_c_enum "unspec" [
   UNSPEC_RSQRT
   UNSPEC_PSADBW
 
+  ;; US_TRUNCATE is different from rtl us_truncate:
+  ;; it does unsigned saturation for a signed source.
+  UNSPEC_US_TRUNCATE
+
   ;; For AVX/AVX512F support
   UNSPEC_SCALEF
   UNSPEC_PCMP
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 6fbe3909c8b..315eb4193c4 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3337,27 +3337,41 @@ (define_split
 ;;
 ;
 
-;; Used in signed and unsigned truncations with saturation.
-(define_code_iterator any_s_truncate [ss_truncate us_truncate])
-;; Instruction suffix for truncations with saturation.
-(define_code_attr s_trunsuffix [(ss_truncate "s") (us_truncate "u")])
-
-(define_insn_and_split "mmx_packswb"
+(define_insn_and_split "mmx_packsswb"
   [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yw")
(vec_concat:V8QI
- (any_s_truncate:V4QI
+ (ss_truncate:V4QI
(match_operand:V4HI 1 "register_operand" "0,0,Yw"))
- (any_s_truncate:V4QI
+ (ss_truncate:V4QI
(match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,Yw"]
   "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "@
-   packswb\t{%2, %0|%0, %2}
+   packsswb\t{%2, %0|%0, %2}
+   #
+   #"
+  "&& reload_completed
+   && SSE_REGNO_P (REGNO (operands[0]))"
+  [(const_int 0)]
+  "ix86_split_mmx_pack (operands, SS_TRUNCATE); DONE;"
+  [(set_attr "mmx_isa" "native,sse_noavx,avx")
+   (set_attr "type" "mmxshft,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
+

[pushed] c: add name hints to c_parser_declspecs [PR107583]

2023-06-15 Thread David Malcolm via Gcc-patches
PR c/107583 notes that we weren't issuing a hint for

  struct foo {
time_t mytime; /* missing  include should trigger fixit */
  };

in the C frontend.

The root cause is that one of the "unknown type name" diagnostics
was missing logic to emit hints, which this patch fixes.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-1876-g57446d1bc9757e.

gcc/c/ChangeLog:
PR c/107583
* c-parser.cc (c_parser_declspecs): Add hints to "unknown type
name" error.

gcc/testsuite/ChangeLog:
PR c/107583
* c-c++-common/spellcheck-pr107583.c: New test.
---
 gcc/c/c-parser.cc| 14 +-
 gcc/testsuite/c-c++-common/spellcheck-pr107583.c | 10 ++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/spellcheck-pr107583.c

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 5baa501dbee..f8b14e4c688 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -3182,7 +3182,19 @@ c_parser_declspecs (c_parser *parser, struct c_declspecs 
*specs,
  attrs_ok = true;
  if (kind == C_ID_ID)
{
- error_at (loc, "unknown type name %qE", value);
+ auto_diagnostic_group d;
+ name_hint hint = lookup_name_fuzzy (value, FUZZY_LOOKUP_TYPENAME,
+ loc);
+ if (const char *suggestion = hint.suggestion ())
+   {
+ gcc_rich_location richloc (loc);
+ richloc.add_fixit_replace (suggestion);
+ error_at (&richloc,
+   "unknown type name %qE; did you mean %qs?",
+   value, suggestion);
+   }
+ else
+   error_at (loc, "unknown type name %qE", value);
  t.kind = ctsk_typedef;
  t.spec = error_mark_node;
}
diff --git a/gcc/testsuite/c-c++-common/spellcheck-pr107583.c 
b/gcc/testsuite/c-c++-common/spellcheck-pr107583.c
new file mode 100644
index 000..86a9e7dbcb6
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/spellcheck-pr107583.c
@@ -0,0 +1,10 @@
+struct s1 {
+  time_t mytime; /* { dg-error "unknown type name 'time_t'" "c error" { target 
c } } */
+  /* { dg-error "'time_t' does not name a type" "c++ error" { target c++ } .-1 
} */
+  /* { dg-message "'time_t' is defined in header" "hint" { target *-*-* } .-2 
} */
+};
+
+struct s2 {
+  unsinged i; /* { dg-error "unknown type name 'unsinged'; did you mean 
'unsigned'." "c error" { target c } } */
+  /* { dg-error "'unsinged' does not name a type; did you mean 'unsigned'." 
"c++ error" { target c++ } .-1 } */
+};
-- 
2.26.3



Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-15 Thread Joseph Myers
On Thu, 15 Jun 2023, Qing Zhao via Gcc-patches wrote:

> B. The argument of the new attribute “counted_by” is an identifier that can be
> accepted by “c_parser_attribute_arguments”:
> 
> struct trailing_array_B {
>  int count;
>  int array_B[] __attribute ((counted_by (count))); 
> };
> 
> 
> From my current very limited understanding of the C FE source code, it’s 
> not easy to extend the argument to an expression later for the above. Is 
> this understanding right?

It wouldn't be entirely compatible: if you change to interpreting the 
argument as an expression, then the above would suggest a global variable 
count is used (as opposed to some other syntax for referring to an element 
of the containing structure).

So an attribute that takes an element name might best be a *different* 
attribute from any potential future one taking an expression (with some 
new syntax to refer to an element).

-- 
Joseph S. Myers
jos...@codesourcery.com


RE: [x86 PATCH] PR target/31985: Improve memory operand use with doubleword add.

2023-06-15 Thread Roger Sayle

Hi Uros,

> On the 7th June 2023, Uros Bizjak wrote:
> The register allocator considers the instruction-to-be-split as one 
> instruction, so it
> can allocate output register to match an input register (or a register that 
> forms an
> input address), So, you have to either add an early clobber to the output, or
> somehow prevent output to clobber registers in the second pattern.

This implements your suggestion of adding an early clobber to the output, a
one character ('&') change from the previous version of this patch.  Retested
with make bootstrap and make -k check, with and without -m32, to confirm
there are no issues, and this still fixes the pr31985.c test case.
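The doubleword addition being lowered has the familiar add/add-with-carry shape that the define_insn_and_split expands to after reload, sketched here in portable C (an illustration, not the i386 backend code):

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword (64-bit) addition lowered to two 32-bit word operations:
   an add that produces the carry, then an add-with-carry.  */
static uint64_t add_doubleword (uint32_t a_hi, uint32_t a_lo,
                                uint32_t b_hi, uint32_t b_lo)
{
  uint32_t lo = a_lo + b_lo;          /* addl: low words, sets CF */
  uint32_t carry = lo < a_lo;         /* CF == unsigned overflow */
  uint32_t hi = a_hi + b_hi + carry;  /* adcl: high words + carry */
  return ((uint64_t) hi << 32) | lo;
}
```

The point of the combined pattern is that one of the high words arrives via a zero_extend/ashift concat from memory, so keeping the concat fused with the add lets reload use a memory operand directly instead of materializing the doubleword first.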

As you've suggested, I'm also working on improving STV in this area.

Ok for mainline?


2023-06-15  Roger Sayle  
Uros Bizjak  

gcc/ChangeLog
PR target/31985
* config/i386/i386.md (*add3_doubleword_concat): New
define_insn_and_split combine *add3_doubleword with a
*concat3 for more efficient lowering after reload.

gcc/testsuite/ChangeLog
PR target/31985
* gcc.target/i386/pr31985.c: New test case.

Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e6ebc46..42c302d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6124,6 +6124,36 @@
  (clobber (reg:CC FLAGS_REG))])]
  "split_double_mode (mode, [0], 2, [0], [3]);")
 
+(define_insn_and_split "*add3_doubleword_concat"
+  [(set (match_operand: 0 "register_operand" "=")
+   (plus:
+ (any_or_plus:
+   (ashift:
+ (zero_extend:
+   (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand: 3 "const_int_operand"))
+   (zero_extend:
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (match_operand: 1 "register_operand" "0")))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[3]) ==  * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (reg:CCC FLAGS_REG)
+  (compare:CCC
+(plus:DWIH (match_dup 1) (match_dup 4))
+(match_dup 1)))
+ (set (match_dup 0)
+  (plus:DWIH (match_dup 1) (match_dup 4)))])
+   (parallel [(set (match_dup 5)
+  (plus:DWIH
+(plus:DWIH
+  (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+  (match_dup 6))
+(match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (mode, [0], 2, [0], [5]);")
+
 (define_insn "*add_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
(plus:SWI48
diff --git a/gcc/testsuite/gcc.target/i386/pr31985.c 
b/gcc/testsuite/gcc.target/i386/pr31985.c
new file mode 100644
index 000..a6de1b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr31985.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+void test_c (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
+{
+  volatile unsigned int x, y;
+  unsigned long long __a = b | ((unsigned long long)a << 32);
+  unsigned long long __b = d | ((unsigned long long)c << 32);
+  unsigned long long __c = __a + __b;
+  x = (unsigned int)(__c & 0x);
+  y = (unsigned int)(__c >> 32);
+}
+
+/* { dg-final { scan-assembler-times "movl" 4 } } */


Re: [PATCH v3] configure: Implement --enable-host-pie

2023-06-15 Thread Marek Polacek via Gcc-patches
On Mon, Jun 05, 2023 at 09:06:43PM -0600, Jeff Law wrote:
> 
> 
> On 6/5/23 10:18, Marek Polacek via Gcc-patches wrote:
> > Ping.  Anyone have any further comments?
> Given this was approved before, but got reverted due to issues (which have
> since been addressed) -- I think you might as well go forward and sooner
> rather than later so that we can catch fallout earlier.

Thanks, pushed now, after rebasing, adjusting the patch for
r14-1385, and testing with and without --enable-host-pie on
both Debian and Fedora.

If something comes up and I can't fix it quickly enough, I'll
have to revert the patch.  We'll see.

Marek



Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-06-15 Thread Philipp Tomsich
Rebased, retested, and applied to trunk.  Thanks!
--Philipp.


On Thu, 8 Jun 2023 at 00:18, Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >  * regcprop.cc (maybe_mode_change): Enable stack pointer 
> > propagation.
> Thanks for the clarification.  This is OK for the trunk.  It looks
> generic enough to have value going forward now rather than waiting.
>
> jeff


Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-15 Thread Qing Zhao via Gcc-patches


> On Jun 15, 2023, at 12:55 PM, Joseph Myers  wrote:
> 
> On Thu, 15 Jun 2023, Qing Zhao via Gcc-patches wrote:
> 
>> Comparing B with A, I don’t see much benefit, either from a
>> user-interface point of view or from an implementation point of view.
>> 
>> For implementation, both A and B need to search the fields of the 
>> containing structure by the name of the field “count”.
>> 
>> For user interface, I think that A and B are similar.
> 
> But as a language design matter, there are no standard C interfaces that 
> interpret a string as an identifier, so doing so does not fit well with 
> the language.

Okay, makes sense.  So I will choose B over A. -:) 
> 
>> 1. Update the routine “c_parser_postfix_expression” (is this the right
>> place?) to accept the new designator syntax.
> 
> Any design that might work with an expression is the sort of thing that 
> would likely involve many iterations on the specification (i.e. proposed 
> wording changes to the C standard) for the interpretation of the new kinds 
> of expressions, including how to resolve syntactic ambiguities and how 
> name lookup works, before it could be considered ready to implement, and 
> then a lot more work on the specification based on implementation 
> experience.

Okay, I see the complications of getting such new syntax into the C standard…

> 
> Note that no expressions can start with the '.' token at present.  As soon 
> as you invent a new kind of expression that can start with that token, you 
> have syntactic ambiguity.
> 
> struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; };
> 
> Is ".c=.c" a use of the existing syntax for designated initializers, with 
> the first ".c" being a designator and the second being a use of the new 
> kind of expression, or is it an assignment expression, where both the LHS 
> and the RHS of the assignment use the new kind of expression?  And do 
> those .c, when the use the new kind of expression, refer to the inner or 
> outer struct definition?

Okay, I see. Yes, this will be really confusing. 

> 
> There are obvious advantages to using tokens that don't introduce such an 
> ambiguity with designators (i.e., not '.' as the token to start the new 
> kind of expression, but something that cannot start a designator), if such 
> tokens can be found.  But you still have the name lookup question when 
> there are multiple nested structure definitions.  And the question of when 
> expressions are considered to be evaluated, if they have side effects such 
> as ".c=.c" does.
> 
> "Whatever falls out of the implementation" is not a good approach for 
> language design here.  If you want a new kind of expressions here, you 
> need a careful multi-implementation design phase that produces a proper 
> specification and has good reasons for the particular choices made in 
> cases of ambiguity.

Thanks a lot for your detailed explanation on the language design concerns. 
For this new attribute, I am now convinced that it might not be worth the effort to 
introduce new syntax at this stage.

Another question: is it possible to extend such an attribute later to 
accept an expression as its argument if we take approach B?

B. The argument of the new attribute “counted_by” is an identifier that can be
accepted by “c_parser_attribute_arguments”:

struct trailing_array_B {
 int count;
 int array_B[] __attribute ((counted_by (count))); 
};


From my current very limited understanding of the C FE source code, it’s not 
easy to extend the argument to an expression later for the above.
Is this understanding right?

(The motivation for accepting an expression as the argument for the new attribute 
“counted_by” comes from the proposal for LLVM: 
https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854:

• __counted_by(N) : The pointer points to memory that contains N 
elements of pointee type. N is an expression of integer type which can be a 
simple reference to declaration, a constant including calls to constant 
functions, or an arithmetic expression that does not have side effect. The 
annotation cannot apply to pointers to incomplete types or types without size 
such as void *.
)
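A minimal compilable sketch of what such an annotation could look like in use (an assumption for illustration only: the `__counted_by` macro below is an empty stub standing in for the proposed attribute, which no released compiler implements; a bounds-safety compiler would enforce that `data` has `count` elements):

```c
#include <stdlib.h>

/* Stub standing in for the proposed __counted_by attribute; it expands
   to nothing, so this compiles with any C compiler.  */
#define __counted_by(N)

struct buffer {
    int count;
    int *data __counted_by(count);  /* data points to count ints */
};

/* Allocate a buffer whose data length matches its count field.  */
struct buffer *make_buffer(int n)
{
    struct buffer *b = malloc(sizeof *b);
    b->count = n;
    b->data = malloc(n * sizeof *b->data);
    return b;
}
```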
 
thanks.
Qing

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



[PATCH] Add another testcase for PR 110266

2023-06-15 Thread Andrew Pinski via Gcc-patches
Since the combining of sin/cos into cexpi is dependent
on the target, this adds another testcase which had failed (earlier in
evrp rather than vrp2) and that will fail on all targets rather than
only ones which have sincos or C99 math functions.

Committed as obvious after a quick test.

gcc/testsuite/ChangeLog:

PR tree-optimization/110266
* gcc.c-torture/compile/pr110266.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/pr110266.c | 9 +
 1 file changed, 9 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110266.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110266.c 
b/gcc/testsuite/gcc.c-torture/compile/pr110266.c
new file mode 100644
index 000..92af0c51efc
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110266.c
@@ -0,0 +1,9 @@
+double PsyBufferUpdate(int n)
+{
+  if (n == 4)
+{
+  _Complex double t = __builtin_cexpi(n);
+  return __real t * __imag t;
+}
+  return 0;
+}
-- 
2.31.1



Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-06-15 Thread Manolis Tsamis
The new target-independent implementation of the fold-mem-offsets pass can
be found in the list (link is
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621920.html)

Aside from now being target independent, I have fixed a number of new
bugs that emerged when running this on other targets and a minor
memory leak.
I have also improved the propagation logic in fold_offsets to work
with more patterns found in other targets (e.g. LEA instructions in
x86).
Finally I improved the naming of things (e.g. replaced uses of
'delete'/'remove' with 'fold', made bitmap names more meaningful) and
reduced unnecessary verbosity in some comments.

Thanks,
Manolis

On Tue, Jun 13, 2023 at 12:58 AM Jeff Law  wrote:
>
>
>
> On 6/12/23 01:32, Manolis Tsamis wrote:
>
> >>
> >>> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> >>> +{
> >>> +  /* Problem getting some definition for this instruction.  */
> >>> +  if (ref_link->ref == NULL)
> >>> + return NULL;
> >>> +  if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> >>> + return NULL;
> >>> +  if (global_regs[REGNO (reg)]
> >>> +   && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> >>> + return NULL;
> >>> +}
> >> That last condition feels a bit odd.  It would seem that you wanted an
> >> OR boolean rather than AND.
> >>
> >
> > Most of this function I didn't write by myself, I used existing code
> > to get definitions taken from REE's get_defs.
> > In the original there's a comment about this line that
> > explains it:
> >
> >As global regs are assumed to be defined at each function call
> >dataflow can report a call_insn as being a definition of REG.
> >But we can't do anything with that in this pass so proceed only
> >if the instruction really sets REG in a way that can be deduced
> >from the RTL structure.
> >
> > This function is the only one I copied without changing much (because
> > I didn't quite understand it), so I don't know if that condition is
> > of any use for f-m-o.
> > Also the code duplication here is a bit unfortunate, maybe it would be
> > preferred to create a generic version that can be used in both?
> Ah.  So I think the code is meant to filter out things that DF will say
> are set vs those which are actually exposed explicitly in the RTL (and
> which REE might be able to modify).  So we're probably good.
>
> Those routines are pretty close to each other in implementation.  I
> bet we could take everything up to the loop over the ref links and
> factor that into a common function.   Both your function and get_defs
> would be able to use that and then do bit of processing afterwards.
>
>
> >
> >>
> >>> +
> >>> +  unsigned int dest_regno = REGNO (dest);
> >>> +
> >>> +  /* We don't want to fold offsets from instructions that change some
> >>> + particular registers with potentially global side effects.  */
> >>> +  if (!GP_REG_P (dest_regno)
> >>> +  || dest_regno == STACK_POINTER_REGNUM
> >>> +  || (frame_pointer_needed && dest_regno == 
> >>> HARD_FRAME_POINTER_REGNUM)
> >>> +  || dest_regno == GP_REGNUM
> >>> +  || dest_regno == THREAD_POINTER_REGNUM
> >>> +  || dest_regno == RETURN_ADDR_REGNUM)
> >>> +return 0;
> >> I'd think most of this would be captured by testing fixed_registers
> >> rather than trying to list each register individually.  In fact, if we
> >> need to generalize this to work on other targets we almost certainly
> >> want a more general test.
> >>
> >
> > Thanks, I knew there would be some proper way to test this but wasn't
> > aware which is the correct one.
> > Should this look like below? Or is the GP_REG_P redundant and just
> > fixed_regs will do?
> If you want to verify it's a general register, then you have to ask if
> the regno is in the GENERAL_REGS class.  Something like:
>
> TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno)
>
> GP_REG_P is a risc-v specific macro, so we can't use it here.
>
> So something like
>if (fixed_regs[dest_regno]
>|| !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
>
>
>
> >> The debugging message is a bit misleading.  Yea, we always delete
> >> something here, but in one case we end up emitting a copy.
> >>
> >
> > Indeed. Maybe "Instruction reduced to move: ..."?
> Works for me.
>
> >
> >>
> >>
> >>> +
> >>> +   /* Temporarily change the offset in MEM to test whether
> >>> +  it results in a valid instruction.  */
> >>> +   machine_mode mode = GET_MODE (mem_addr);
> >>> +   XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> >>> +
> >>> +   bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> >>> +
> >>> +   /* Restore the instruction.  */
> >>> +   XEXP (mem, 0) = mem_addr;
> >> You need to reset the INSN_CODE after restoring the instruction.  That's
> >> generally a bad thing to do, but I've seen it done enough (and been
> >> guilty myself in the past) that we should just assume some 

[PATCH v2] Implement new RTL optimizations pass: fold-mem-offsets.

2023-06-15 Thread Manolis Tsamis
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
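A C-level sketch (an assumption for illustration, not taken from the patch) of code whose address arithmetic produces such foldable sequences — the constant part of the addressing computation can migrate into the displacement field of the memory access:

```c
/* The +4 element offset is first materialized as an add-immediate on
   the pointer; the pass can instead fold the corresponding byte
   offset into the displacement of the load itself.  */
long read_with_offset(long *base, long i)
{
    long *p = base + 4;   /* addi-style pointer adjustment */
    return p[i];          /* load whose displacement can absorb it */
}
```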

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v2:
- Made the pass target-independent instead of RISC-V specific.
- Fixed a number of bugs.
- Add code to handle more ADD patterns as found
  in other targets (x86, aarch64).
- Improved naming and comments.
- Fixed bitmap memory leak.

 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |   8 +
 gcc/fold-mem-offsets.cc   | 630 ++
 gcc/passes.def|   1 +
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 gcc/tree-pass.h   |   1 +
 9 files changed, 702 insertions(+)
 create mode 100644 gcc/fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4be82e83b9e..98a59e0d207 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1423,6 +1423,7 @@ OBJS = \
fixed-value.o \
fold-const.o \
fold-const-call.o \
+   fold-mem-offsets.o \
function.o \
function-abi.o \
function-tests.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index a28ca13385a..5a793de34fa 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1248,6 +1248,10 @@ fcprop-registers
 Common Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
 
+ffold-mem-offsets
+Target Bool Var(flag_fold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets to the memory access instruction 
if possible.
+
 fcrossjumping
 Common Var(flag_crossjumping) Optimization
 Perform cross-jumping optimization.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9ecbd32a228..b1dba4df536 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -537,6 +537,7 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec  -fbranch-probabilities
 -fcaller-saves
 -fcombine-stack-adjustments  -fconserve-stack
+-ffold-mem-offsets
 -fcompare-elim  -fcprop-registers  -fcrossjumping
 -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
 -fcx-limited-range
@@ -14230,6 +14231,13 @@ the comparison operation before register allocation is 
complete.
 
 Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
 
+@opindex ffold-mem-offsets
+@item -ffold-mem-offsets
+@itemx -fno-fold-mem-offsets
+Try to eliminate add instructions by folding them in memory loads/stores.
+
+Enabled at levels @option{-O2}, @option{-O3}.
+
 @opindex fcprop-registers
 @item -fcprop-registers
 After register allocation and post-register allocation instruction splitting,
diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
new file mode 100644
index 000..8ef0f438191
--- /dev/null
+++ b/gcc/fold-mem-offsets.cc
@@ -0,0 +1,630 @@
+/* Late RTL pass to fold memory offsets.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see

Re: [PATCH v2] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread 钟居哲
Add unary testcases for fp16 to zvfhmin-1.c (make sure they don't ICE and are not 
vectorized when -march=rv64gc_zvfhmin).
Make sure we don't do the wrong thing in the ZVFHMIN case.

/* We can't enable FP16 NEG/PLUS/MINUS/MULT/DIV auto-vectorization when 
-march="*zvfhmin*".  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 0 "vect" 
} } */

Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-15 23:12
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH v2] RISC-V: Add autovec FP unary operations.
Hi,
 
changes from V1:
  - Use VF_AUTO iterator.
  - Don't mention vfsqrt7.
 
This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.
 
Similarly to the binop tests, there are flavors for zvfh now.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): Add unop expanders.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
gcc/config/riscv/autovec.md   | 36 ++-
.../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
.../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
.../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
.../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
.../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
.../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
.../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
.../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
.../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
.../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
.../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
.../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
.../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
.../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
.../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
16 files changed, 241 insertions(+), 11 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "2"
})
;; 
---
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
;; 
---
(define_expand "abs2"
@@ -532,6 +532,40 @@ (define_expand "abs2"
   DONE;
})
+;; 
---
+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set 

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-15 Thread Joseph Myers
On Thu, 15 Jun 2023, Qing Zhao via Gcc-patches wrote:

> Comparing B with A, I don’t see too much benefit, either from 
> user-interface point of view, or from implementation point of view.
> 
> For implementation, both A and B need to search the fields of the 
> containing structure by the name of the field “count”.
> 
> For user interface, I think that A and B are similar.

But as a language design matter, there are no standard C interfaces that 
interpret a string as an identifier, so doing so does not fit well with 
the language.

> 1. Update the routine “c_parser_postfix_expression” (is this the right 
> place? ) to accept the new designator syntax.

Any design that might work with an expression is the sort of thing that 
would likely involve many iterations on the specification (i.e. proposed 
wording changes to the C standard) for the interpretation of the new kinds 
of expressions, including how to resolve syntactic ambiguities and how 
name lookup works, before it could be considered ready to implement, and 
then a lot more work on the specification based on implementation 
experience.

Note that no expressions can start with the '.' token at present.  As soon 
as you invent a new kind of expression that can start with that token, you 
have syntactic ambiguity.

struct s1 { int c; char a[(struct s2 { int c; char b[.c]; }) {.c=.c}.c]; };

Is ".c=.c" a use of the existing syntax for designated initializers, with 
the first ".c" being a designator and the second being a use of the new 
kind of expression, or is it an assignment expression, where both the LHS 
and the RHS of the assignment use the new kind of expression?  And do 
those .c, when they use the new kind of expression, refer to the inner or 
outer struct definition?

There are obvious advantages to using tokens that don't introduce such an 
ambiguity with designators (i.e., not '.' as the token to start the new 
kind of expression, but something that cannot start a designator), if such 
tokens can be found.  But you still have the name lookup question when 
there are multiple nested structure definitions.  And the question of when 
expressions are considered to be evaluated, if they have side effects such 
as ".c=.c" does.

"Whatever falls out of the implementation" is not a good approach for 
language design here.  If you want a new kind of expressions here, you 
need a careful multi-implementation design phase that produces a proper 
specification and has good reasons for the particular choices made in 
cases of ambiguity.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v2] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Michael Collison

Hi Robin,

Looks good to me, except note that this seems to depend on a new 
function, emit_vlmax_fp_insn, which appears to be part of your autovec FP 
binary operation patch. So that patch would need to be merged first from what 
I can see.


On 6/15/23 11:12, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
   - Use VF_AUTO iterator.
   - Don't mention vfsqrt7.

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similarly to the binop tests, there are flavors for zvfh now.

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 ++-
  .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
  .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
  .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
  .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
  .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
  .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
  .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
  .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
  .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
  16 files changed, 241 insertions(+), 11 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "2"
  })
  
  ;; ---

-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
  ;; 
---
  
  (define_expand "abs2"

@@ -532,6 +532,40 @@ (define_expand "abs2"
DONE;
  })
  
+;; ---

+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop:VF_AUTO
+ (match_operand:VF_AUTO 

Re: [PATCH v2] RISC-V: Add autovec FP binary operations.

2023-06-15 Thread Michael Collison

Robin,

Why do we need '-ffast-math' with the tests?

On 6/15/23 11:10, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
  - Add VF_AUTO iterator and use it.
  - Ensured we don't ICE with -march=rv64gcv_zfhmin.

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.

gcc/ChangeLog:

* config/riscv/autovec.md (3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 +
  gcc/config/riscv/riscv-protos.h   |  5 +-
  gcc/config/riscv/riscv-v.cc   | 74 ++-
  gcc/config/riscv/riscv.cc | 27 +--
  gcc/config/riscv/vector-iterators.md  | 28 +++
  .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
  .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
  .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 ++
  .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
  .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
  .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
  .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
  .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
  

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Segher Boessenkool
On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote:
> >>   This is the existing pattern.  It may be read as an action
> >>   to clean an unknown-size memory block.
> >
> > Including a size zero memory block, yes.  BLKmode was originally to do
> > things like bcopy (before modern names like memcpy were more usually
> > used), and those very much need size zero as well.
> 
> The size may be zero.  No asm code needs to
> be generated for "set 'const_int 0' to zero-size memory".
> stack_tie does not generate any real code.  It seems ok :)
> 
> However, it may not be a zero-size mem.  This may be a concern.
> This is one reason that I would like to have an unspec_tie.

It very much *can* be a zero-size mem; that is perfectly fine for
mem:BLK.

> Another reason is that unspec:BLK is used by various ports :)

unspec:BLK is undefined.  BLKmode is allowed on mem only.

> >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> >> UNSPEC_TIE".
> >>   Current patch is using this one.
> >
> > What would be the semantics of that?  Just the same as the current stuff
> > I'd say, or less?  It cannot be more!
> 
> The semantics that I am trying to achieve are "this is a special
> insn, not only a normal set to an unknown-size mem".

What does that *mean*?  "Special instruction"?  What would what code do
for that?  What would the RTL mean?

> As you explained before on 'unspec:DI', the unspec would
> just decorate the set_src part: some DI value produced by a
> machine-specific operation.

An unspec is an operation on its operands, giving some (in this case)
DImode value.  There is nothing special about that operation, it can be
optimised like any other, it's just not specified what exactly that
value is (to the generic compiler, the backend itself can very much
optimise stuff with it).

> But 'tie_operand' is checked for this insn.
> If 'tie_operand' checks UNSPEC_TIE, then the insn
> with UNSPEC_TIE is 'a special insn'.  Or interpret
> the semantics of this insn as: this stack_tie insn
> indicates "set/operate on a zero-size block".

tie_operand is a predicate.  The predicate of an insn has to return 1,
or the insn is not recognised.  You can do the same in insn conditions
always (in principle anyway).


Segher


[PATCH] PR tree-optimization/110266 - Check for integer only complex

2023-06-15 Thread Andrew MacLeod via Gcc-patches
With the expanded capabilities of range-op dispatch, floating-point 
complex objects can appear when folding, which they couldn't before.  In 
the processing for extracting integers from complex ints, make sure it 
actually is an integer complex type.


Bootstraps on x86_64-pc-linux-gnu.  Regtesting currently under way.  
Assuming there are no issues, I will push this.


Andrew

From 2ba20a9e7b41fbcf1f03d5447e14b9b7b174fead Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Thu, 15 Jun 2023 11:59:55 -0400
Subject: [PATCH] Check for integer only complex.

With the expanded capabilities of range-op dispatch, floating-point
complex objects can appear when folding, which they couldn't before.
In the processing for extracting integers from complex ints, make sure it
is an integer complex.

	PR tree-optimization/110266
	gcc/
	* gimple-range-fold.cc (adjust_imagpart_expr): Check for integer
	complex type.
	(adjust_realpart_expr): Ditto.

	gcc/testsuite/
	* gcc.dg/pr110266.c: New.
---
 gcc/gimple-range-fold.cc|  6 --
 gcc/testsuite/gcc.dg/pr110266.c | 20 
 2 files changed, 24 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110266.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 173d9f386c5..b4018d08d2b 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -506,7 +506,8 @@ adjust_imagpart_expr (vrange &res, const gimple *stmt)
   && gimple_assign_rhs_code (def_stmt) == COMPLEX_CST)
 {
   tree cst = gimple_assign_rhs1 (def_stmt);
-  if (TREE_CODE (cst) == COMPLEX_CST)
+  if (TREE_CODE (cst) == COMPLEX_CST
+	  && TREE_CODE (TREE_TYPE (TREE_TYPE (cst))) == INTEGER_TYPE)
 	{
 	  wide_int w = wi::to_wide (TREE_IMAGPART (cst));
 	  int_range<1> imag (TREE_TYPE (TREE_IMAGPART (cst)), w, w);
@@ -533,7 +534,8 @@ adjust_realpart_expr (vrange &res, const gimple *stmt)
   && gimple_assign_rhs_code (def_stmt) == COMPLEX_CST)
 {
   tree cst = gimple_assign_rhs1 (def_stmt);
-  if (TREE_CODE (cst) == COMPLEX_CST)
+  if (TREE_CODE (cst) == COMPLEX_CST
+	  && TREE_CODE (TREE_TYPE (TREE_TYPE (cst))) == INTEGER_TYPE)
 	{
 	  wide_int imag = wi::to_wide (TREE_REALPART (cst));
 	  int_range<2> tmp (TREE_TYPE (TREE_REALPART (cst)), imag, imag);
diff --git a/gcc/testsuite/gcc.dg/pr110266.c b/gcc/testsuite/gcc.dg/pr110266.c
new file mode 100644
index 000..0b2acb5a791
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110266.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#include 
+
+int Hann_i, PsyBufferUpdate_psyInfo_0, PsyBufferUpdate_i;
+double *mdct_data;
+double PsyBufferUpdate_sfreq;
+void PsyBufferUpdate() {
+  if (PsyBufferUpdate_psyInfo_0 == 4)
+for (; Hann_i;)
+  ;
+  {
+double xr_0 = cos(PsyBufferUpdate_psyInfo_0);
+PsyBufferUpdate_sfreq = sin(PsyBufferUpdate_psyInfo_0);
+for (; PsyBufferUpdate_psyInfo_0; PsyBufferUpdate_i++)
+  mdct_data[PsyBufferUpdate_i] = xr_0 * PsyBufferUpdate_sfreq;
+  }
+}
+
-- 
2.40.1



Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches



> Am 15.06.2023 um 16:04 schrieb Andrew Stubbs :
> 
> On 15/06/2023 15:00, Richard Biener wrote:
>>> On Thu, 15 Jun 2023, Andrew Stubbs wrote:
>>> On 15/06/2023 14:34, Richard Biener wrote:
 On Thu, 15 Jun 2023, Andrew Stubbs wrote:
 
> On 15/06/2023 12:06, Richard Biener wrote:
>> On Thu, 15 Jun 2023, Andrew Stubbs wrote:
>> 
>>> On 15/06/2023 10:58, Richard Biener wrote:
 On Thu, 15 Jun 2023, Andrew Stubbs wrote:
 
> On 14/06/2023 15:29, Richard Biener wrote:
>> 
>> 
>>> Am 14.06.2023 um 16:27 schrieb Andrew Stubbs 
>>> :
>>> 
>>> On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
 This implemens fully masked vectorization or a masked epilog for
 AVX512 style masks which single themselves out by representing
 each lane with a single bit and by using integer modes for the mask
 (both is much like GCN).
 AVX512 is also special in that it doesn't have any instruction
 to compute the mask from a scalar IV like SVE has with while_ult.
 Instead the masks are produced by vector compares and the loop
 control retains the scalar IV (mainly to avoid dependences on
 mask generation, a suitable mask test instruction is available).
>>> 
>>> This is also sounds like GCN. We currently use WHILE_ULT in the
>>> middle
>>> end
>>> which expands to a vector compare against a vector of stepped 
>>> values.
>>> This
>>> requires an additional instruction to prepare the comparison vector
>>> (compared to SVE), but the "while_ultv64sidi" pattern (for example)
>>> returns
>>> the DImode bitmask, so it works reasonably well.
>>> 
 Like RVV code generation prefers a decrementing IV though IVOPTs
 messes things up in some cases removing that IV to eliminate
 it with an incrementing one used for address generation.
 One of the motivating testcases is from PR108410 which in turn
 is extracted from x264 where large size vectorization shows
 issues with small trip loops.  Execution time there improves
 compared to classic AVX512 with AVX2 epilogues for the cases
 of less than 32 iterations.
 size   scalar    128    256    512   512e   512f
    1     9.42  11.32   9.35  11.17  15.13  16.89
    2     5.72   6.53   6.66   6.66   7.62   8.56
    3     4.49   5.10   5.10   5.74   5.08   5.73
    4     4.10   4.33   4.29   5.21   3.79   4.25
    6     3.78   3.85   3.86   4.76   2.54   2.85
    8     3.64   1.89   3.76   4.50   1.92   2.16
   12     3.56   2.21   3.75   4.26   1.26   1.42
   16     3.36   0.83   1.06   4.16   0.95   1.07
   20     3.39   1.42   1.33   4.07   0.75   0.85
   24     3.23   0.66   1.72   4.22   0.62   0.70
   28     3.18   1.09   2.04   4.20   0.54   0.61
   32     3.16   0.47   0.41   0.41   0.47   0.53
   34     3.16   0.67   0.61   0.56   0.44   0.50
   38     3.19   0.95   0.95   0.82   0.40   0.45
   42     3.09   0.58   1.21   1.13   0.36   0.40
 'size' specifies the number of actual iterations, 512e is for
 a masked epilog and 512f for the fully masked loop.  From
 4 scalar iterations on the AVX512 masked epilog code is clearly
 the winner, the fully masked variant is clearly worse and
 it's size benefit is also tiny.
>>> 
>>> Let me check I understand correctly. In the fully masked case, there
>>> is
>>> a
>>> single loop in which a new mask is generated at the start of each
>>> iteration. In the masked epilogue case, the main loop uses no 
>>> masking
>>> whatsoever, thus avoiding the need for generating a mask, carrying
>>> the
>>> mask, inserting vec_merge operations, etc, and then the epilogue
>>> looks
>>> much
>>> like the fully masked case, but unlike smaller mode epilogues there
>>> is
>>> no
>>> loop because the eplogue vector size is the same. Is that right?
>> 
>> Yes.
>> 
>>> This scheme seems like it might also benefit GCN, in so much as it
>>> simplifies the hot code path.
>>> 
>>> GCN does not actually have smaller vector sizes, so there's no
>>> analogue
>>> to
>>> AVX2 (we pretend we have some smaller sizes, but that's because the
>>> middle
>>> end can't do masking everywhere 

Skip a number of C++ 'g++.dg/tree-prof/' test cases for '-fno-exceptions' testing (was: Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-f

2023-06-15 Thread Thomas Schwinge
Hi!

On 2023-06-15T17:15:54+0200, I wrote:
> On 2023-06-06T20:31:21+0100, Jonathan Wakely  wrote:
>> On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge  wrote:
>>> This issue comes up in context of me working on C++ support for GCN and
>>> nvptx target.  Those targets shall default to '-fno-exceptions' -- or,
>>> "in other words", '-fexceptions' is not supported.  (Details omitted
>>> here.)
>>>
>>> It did seem clear to me that with such a configuration it'll be hard to
>>> get clean test results.  Then I found code in
>>> 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
>>>
>>> # If exceptions are disabled, mark tests expecting exceptions to be 
>>> enabled
>>> # as unsupported.
>>> if { ![check_effective_target_exceptions_enabled] } {
>>> if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" 
>>> $text] {
>>> return "::unsupported::exception handling disabled"
>>> }
>>>
>>> ..., which, in a way, sounds as if the test suite generally is meant to
>>> produce useful results for '-fno-exceptions', nice surprise!
>>>
>>> Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:
>>>
>>> RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'
>>>
>>> ..., I find that indeed this does work for a lot of test cases, where we
>>> then get (random example):
>>>
>>>  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
>>> -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
>>> +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling disabled
>>>
>>> ..., due to:
>>>
>>>  [...]/g++.dg/coroutines/pr99710.C: In function 'task my_coro()':
>>> +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception handling
>>> disabled, use '-fexceptions' to enable
>>>  [...]/g++.dg/coroutines/pr99710.C:23:7: error: await expressions are
>>> not permitted in handlers
>>>  compiler exited with status 1
>>>
>>> But, we're nowhere near clean test results: PASS -> FAIL as well as
>>> XFAIL -> XPASS regressions, due to 'error: exception handling disabled'
>>> precluding other diagnostics seems to be one major issue.
>>>
>>> Is there interest in me producing the obvious (?) changes to those test
>>> cases, such that compiler g++ as well as target library libstdc++ test
>>> results are reasonably clean?  (If you think that's all "wasted effort",
>>> then I suppose I'll just locally ignore any FAILs/XPASSes/UNRESOLVEDs
>>> that appear in combination with
>>> 'UNSUPPORTED: [...]: exception handling disabled'.)
>>
>> I would welcome that for libstdc++. [...]

> Not having heard anything contrary regarding the compiler side of things,
> I've now been working on that, see below.

>>> Otherwise, a number of test cases need DejaGnu directives
>>> conditionalized on 'target exceptions_enabled'.
>
> Before I get to such things, even simpler: OK to push the attached
> "Skip a number of C++ test cases for '-fno-exceptions' testing"?

Similarly, OK to push the attached
"Skip a number of C++ 'g++.dg/tree-prof/' test cases for '-fno-exceptions' 
testing"?


Grüße
 Thomas


>>> (Or,
>>> 'error: exception handling disabled' made a "really late" diagnostic, so
>>> that it doesn't preclude other diagnostics?  I'll have a look.  Well,
>>> maybe something like: in fact do not default to '-fno-exceptions', but
>>> instead emit 'error: exception handling disabled' only if in a "really
>>> late" pass we run into exceptions-related constructs that we cannot
>>> support.  That'd also avoid PASS -> UNSUPPORTED "regressions" when
>>> exception handling in fact gets optimized away, for example.  I like that
>>> idea, conceptually -- but is it feasible to implement..?)
>>
>> IMHO just [...] using [an effective target keyword] in test
>> selectors seems simpler, and doesn't require changes to the compiler, just
>> the tests.
>
> I still like the idea, but yes, I've mentally put it on file "for later"
> (ha, ha, ha...) -- it doesn't seem obvious to implement.
>
>
> Grüße
>  Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 9d9c1430c569e661913a3f5dc59fceaa03cc935d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 14 Jun 2023 22:39:01 +0200
Subject: [PATCH] Skip a number of C++ 'g++.dg/tree-prof/' test cases for
 '-fno-exceptions' testing

Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
  return "::unsupported::exception handling 

Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches
On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:41, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and paste error.  The documentation was fixed
> > in:
> > 
> > 
> >commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:26:44 2022 -0600
> > 
> >rs6000: Clean up ISA 3.1 documentation [PR100808]
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> >2022-02-04  Bill Schmidt  
> > 
> >gcc/
> >PR target/100808
> >* doc/extend.texi (Basic PowerPC Built-in Functions
> > Available on ISA
> >3.1): Provide consistent type names.  Remove
> > unnecessary semicolons.
> >Fix bad line breaks.
> > 
> 
> Wrong referred commit, should be
> ed3fea09b18f67e757b5768b42cb6e816626f1db.
> The above commit used the wrong commit log.

Fixed the commit reference as noted.

> 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned builtin arguments
> > 
> > The first argument of the vec_replace_unaligned builtin should
> > always be
> > unsinged char, as specified in gcc/doc/extend.texi.
> 
> s/unsinged/unsigned/

Fixed.

> 
> > This patch fixes the buitin definitions and updates the testcases
> > to use
> 
> s/buitin/builtin/

Fixed.

> 
> > the correct arguments.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> > Fix first argument type.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/powerpc/ver-replace-word-runnable.c
> > (vec_replace_unaligned) Fix first argument type.
> > (vresult_uchar): Fix expected   results.
> 
> Nit: unexpected tab.

Fixed.

> 
> > (vec_replace_unaligned): Update for loop to check uchar
> > results.
> > Remove extra spaces in if statements.
> > Insert missing spaces in for statements.
> > (dg-final): Update expected instruction counts.
> > ---
> >  gcc/config/rs6000/rs6000-overload.def |  12 +-
> >  .../powerpc/vec-replace-word-runnable.c   | 157 ++--
> > --
> >  2 files changed, 92 insertions(+), 77 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..26dc662b8fb 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -3059,17 +3059,17 @@
> >  VREPLACE_ELT_V2DF
> >  
> >  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> > -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> >  VREPLACE_UN_UV4SI
> > -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> >  VREPLACE_UN_V4SI
> > -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
> > int);
> >  VREPLACE_UN_UV2DI
> > -  vuc __builtin_vec_replace_un (vsll, signed long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> >  VREPLACE_UN_V2DI
> > -  vuc __builtin_vec_replace_un (vf, float, const int);
> > +  vuc __builtin_vec_replace_un (vuc, float, const int);
> >  VREPLACE_UN_V4SF
> > -  vuc __builtin_vec_replace_un (vd, double, const int);
> > +  vuc __builtin_vec_replace_un (vuc, double, const int);
> >  VREPLACE_UN_V2DF
> 
> Looks good: since the given element can be replaced without being
> aligned, the given vector type doesn't need to match the given element,
> with the potential implication that it can be misaligned.
> 
> >  
> >  [VEC_REVB, vec_revb, __builtin_vec_revb]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c
> > index 27318822871..66b0ef58996 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> > +++ 

[PATCH ver 2] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches
GCC maintainers:

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output vectors as 
having
   type vector unsigned char, to emphasize that elements are potentially
   misaligned by this built-in function.  This patch corrects the
   misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned builtin arguments

The first argument of the vec_replace_unaligned builtin should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the testcases to use
the correct arguments.  The expected instruction counts for the testcase
are updated.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c
(vec_replace_unaligned): Fix first argument type.
(vresult_uchar): Fix expected results.
(vec_replace_unaligned): Update for loop to check uchar results.
Remove extra spaces in if statements.
Insert missing spaces in for statements.
(dg-final): Update expected instruction counts.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 .../powerpc/vec-replace-word-runnable.c   | 157 ++
 2 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..26dc662b8fb 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
 VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
 VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vuc, float, const int);
 VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vuc, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
index 27318822871..66b0ef58996 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -20,6 +20,9 @@ main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@ main (int argc, char *argv [])
 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
 printf("ERROR, 

Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-06-15 Thread Jeff Law via Gcc-patches




On 6/15/23 09:30, Manolis Tsamis wrote:

On Thu, Jun 15, 2023 at 6:04 PM Jeff Law  wrote:




On 5/25/23 07:42, Jeff Law wrote:


Thanks Manolis.  Do you happen to know if this includes the fixes I
passed along to Philipp a few months back?  My recollection is one fixed
stale DF data which prevented an ICE during bootstrapping, the other
needed to ignore debug insns in one or two places so that the behavior
didn't change based on the existence of debug insns.

So we stumbled over another relatively minor issue in this code this
week that I'm sure you'll want to fix for a V2.

Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
rather than an "int".  Inside the ASHIFT handling you need to change the
type of shift_scale to a HOST_WIDE_INT as well and potentially the
actual computation of shift_scale.

The problem is if you have a compile-time constant address on rv64, it
might be constructed with code like this:





(insn 282 47 283 6 (set (reg:DI 14 a4 [267])
 (const_int 348160 [0x55000])) 
"test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
  (nil))
(insn 283 282 284 6 (set (reg:DI 14 a4 [267])
 (plus:DI (reg:DI 14 a4 [267])
 (const_int 1365 [0x555]))) 
"test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
  (expr_list:REG_EQUAL (const_int 349525 [0x55555])
 (nil)))
(insn 284 283 285 6 (set (reg:DI 13 a3 [268])
 (const_int 1431662592 [0x55557000])) 
"test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
  (nil))
(insn 285 284 215 6 (set (reg:DI 13 a3 [268])
 (plus:DI (reg:DI 13 a3 [268])
 (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 
{riscv_adddi3}
  (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
 (nil)))
(insn 215 285 216 6 (set (reg:DI 14 a4 [271])
 (ashift:DI (reg:DI 14 a4 [267])
 (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 
204 {ashldi3}
  (nil))
(insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
 (plus:DI (reg:DI 14 a4 [271])
 (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 
{riscv_adddi3}
  (expr_list:REG_DEAD (reg:DI 13 a3 [268])
 (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
 (nil




Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
a 32bit integer type, then it's going to shift off the end of the type.


Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

But I'm still wondering about the type of the offset folding
calculation and whether it could overflow in a bad way:
Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?
I think HOST_WIDE_INT is going to be OK.  If we overflow a H_W_I, then 
there's bigger problems elsewhere.


jeff


Skip a number of C++ "split files" test cases for '-fno-exceptions' testing (was: Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-fno-exc

2023-06-15 Thread Thomas Schwinge
Hi!

On 2023-06-15T17:15:54+0200, I wrote:
> On 2023-06-06T20:31:21+0100, Jonathan Wakely  wrote:
>> On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge  wrote:
>>> This issue comes up in context of me working on C++ support for GCN and
>>> nvptx target.  Those targets shall default to '-fno-exceptions' -- or,
>>> "in other words", '-fexceptions' is not supported.  (Details omitted
>>> here.)
>>>
>>> It did seem clear to me that with such a configuration it'll be hard to
>>> get clean test results.  Then I found code in
>>> 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
>>>
>>> # If exceptions are disabled, mark tests expecting exceptions to be 
>>> enabled
>>> # as unsupported.
>>> if { ![check_effective_target_exceptions_enabled] } {
>>> if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" 
>>> $text] {
>>> return "::unsupported::exception handling disabled"
>>> }
>>>
>>> ..., which, in a way, sounds as if the test suite generally is meant to
>>> produce useful results for '-fno-exceptions', nice surprise!
>>>
>>> Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:
>>>
>>> RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'
>>>
>>> ..., I find that indeed this does work for a lot of test cases, where we
>>> then get (random example):
>>>
>>>  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
>>> -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
>>> +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling disabled
>>>
>>> ..., due to:
>>>
>>>  [...]/g++.dg/coroutines/pr99710.C: In function 'task my_coro()':
>>> +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception handling
>>> disabled, use '-fexceptions' to enable
>>>  [...]/g++.dg/coroutines/pr99710.C:23:7: error: await expressions are
>>> not permitted in handlers
>>>  compiler exited with status 1
>>>
>>> But, we're nowhere near clean test results: PASS -> FAIL as well as
>>> XFAIL -> XPASS regressions, due to 'error: exception handling disabled'
>>> precluding other diagnostics seems to be one major issue.
>>>
>>> Is there interest in me producing the obvious (?) changes to those test
>>> cases, such that compiler g++ as well as target library libstdc++ test
>>> results are reasonably clean?  (If you think that's all "wasted effort",
>>> then I suppose I'll just locally ignore any FAILs/XPASSes/UNRESOLVEDs
>>> that appear in combination with
>>> 'UNSUPPORTED: [...]: exception handling disabled'.)
>>
>> I would welcome that for libstdc++. [...]

> Not having heard anything contrary regarding the compiler side of things,
> I've now been working on that, see below.

>>> Otherwise, a number of test cases need DejaGnu directives
>>> conditionalized on 'target exceptions_enabled'.
>
> Before I get to such things, even simpler: OK to push the attached
> "Skip a number of C++ test cases for '-fno-exceptions' testing"?

Similarly, OK to push the attached
"Skip a number of C++ "split files" test cases for '-fno-exceptions' testing"?


Grüße
 Thomas


>>> (Or,
>>> 'error: exception handling disabled' made a "really late" diagnostic, so
>>> that it doesn't preclude other diagnostics?  I'll have a look.  Well,
>>> maybe something like: in fact do not default to '-fno-exceptions', but
>>> instead emit 'error: exception handling disabled' only if in a "really
>>> late" pass we run into exceptions-related constructs that we cannot
>>> support.  That'd also avoid PASS -> UNSUPPORTED "regressions" when
>>> exception handling in fact gets optimized away, for example.  I like that
>>> idea, conceptually -- but is it feasible to implement..?)
>>
>> IMHO just [...] using [an effective target keyword] in test
>> selectors seems simpler, and doesn't require changes to the compiler, just
>> the tests.
>
> I still like the idea, but yes, I've mentally put it on file "for later"
> (ha, ha, ha...) -- it doesn't seem obvious to implement.
>
>
> Grüße
>  Thomas


>From ff03a142dcdb8e2225a57f62bbc7679c384e88e5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 7 Jun 2023 16:11:11 +0200
Subject: [PATCH] Skip a number of C++ "split files" test cases for
 '-fno-exceptions' testing

Running 'make check' with: 'RUNTESTFLAGS=--target_board=unix/-fno-exceptions',
'error: exception handling disabled' is triggered for C++ 'throw' etc. usage,
and per 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':

# If exceptions are disabled, mark tests expecting exceptions to be enabled
# as unsupported.
if { ![check_effective_target_exceptions_enabled] } {
if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
  return "::unsupported::exception handling disabled"

Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-06-15 Thread Manolis Tsamis
On Thu, Jun 15, 2023 at 6:04 PM Jeff Law  wrote:
>
>
>
> On 5/25/23 07:42, Jeff Law wrote:
>
> > Thanks Manolis.  Do you happen to know if this includes the fixes I
> > passed along to Philipp a few months back?  My recollection is one fixed
> > stale DF data which prevented an ICE during bootstrapping, the other
> > needed to ignore debug insns in one or two places so that the behavior
> > didn't change based on the existence of debug insns.
> So we stumbled over another relatively minor issue in this code this
> week that I'm sure you'll want to fix for a V2.
>
> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
> rather than an "int".  Inside the ASHIFT handling you need to change the
> type of shift_scale to a HOST_WIDE_INT as well and potentially the
> actual computation of shift_scale.
>
> The problem is if you have a compile-time constant address on rv64, it
> might be constructed with code like this:
>
>
>
>
> > (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
> > (const_int 348160 [0x55000])) 
> > "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >  (nil))
> > (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
> > (plus:DI (reg:DI 14 a4 [267])
> > (const_int 1365 [0x555]))) 
> > "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >  (expr_list:REG_EQUAL (const_int 349525 [0x55555])
> > (nil)))
> > (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
> > (const_int 1431662592 [0x55557000])) 
> > "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >  (nil))
> > (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
> > (plus:DI (reg:DI 13 a3 [268])
> > (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 
> > 5 {riscv_adddi3}
> >  (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
> > (nil)))
> > (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
> > (ashift:DI (reg:DI 14 a4 [267])
> > (const_int 32 [0x20]))) 
> > "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
> >  (nil))
> > (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
> > (plus:DI (reg:DI 14 a4 [271])
> > (reg:DI 13 a3 [268]))) 
> > "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >  (expr_list:REG_DEAD (reg:DI 13 a3 [268])
> > (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
> > (nil
>
>
>
> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
> a 32bit integer type, then it's going to shift off the end of the type.
>
Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

But I'm still wondering about the type of the offset folding
calculation and whether it could overflow in a bad way:
Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?

Manolis

>
> Jeff


Skip a number of C++ test cases for '-fno-exceptions' testing (was: Support in the GCC(/C++) test suites for '-fno-exceptions')

2023-06-15 Thread Thomas Schwinge
Hi!

On 2023-06-06T20:31:21+0100, Jonathan Wakely  wrote:
> On Tue, 6 Jun 2023 at 20:14, Thomas Schwinge  wrote:
>> This issue comes up in context of me working on C++ support for GCN and
>> nvptx target.  Those targets shall default to '-fno-exceptions' -- or,
>> "in other words", '-fexceptions' is not supported.  (Details omitted
>> here.)
>>
>> It did seem clear to me that with such a configuration it'll be hard to
>> get clean test results.  Then I found code in
>> 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune':
>>
>> # If exceptions are disabled, mark tests expecting exceptions to be enabled
>> # as unsupported.
>> if { ![check_effective_target_exceptions_enabled] } {
>> if [regexp "(^|\n)\[^\n\]*: error: exception handling disabled" $text] {
>> return "::unsupported::exception handling disabled"
>> }
>>
>> ..., which, in a way, sounds as if the test suite generally is meant to
>> produce useful results for '-fno-exceptions', nice surprise!
>>
>> Running x86_64-pc-linux-gnu (not yet GCN, nvptx) 'make check' with:
>>
>> RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}'
>>
>> ..., I find that indeed this does work for a lot of test cases, where we
>> then get (random example):
>>
>>  PASS: g++.dg/coroutines/pr99710.C  (test for errors, line 23)
>> -PASS: g++.dg/coroutines/pr99710.C (test for excess errors)
>> +UNSUPPORTED: g++.dg/coroutines/pr99710.C: exception handling disabled
>>
>> ..., due to:
>>
>>  [...]/g++.dg/coroutines/pr99710.C: In function 'task my_coro()':
>> +[...]/g++.dg/coroutines/pr99710.C:18:10: error: exception handling
>> disabled, use '-fexceptions' to enable
>>  [...]/g++.dg/coroutines/pr99710.C:23:7: error: await expressions are
>> not permitted in handlers
>>  compiler exited with status 1
>>
>> But, we're nowhere near clean test results: PASS -> FAIL as well as
>> XFAIL -> XPASS regressions, due to 'error: exception handling disabled'
>> precluding other diagnostics seems to be one major issue.
>>
>> Is there interest in me producing the obvious (?) changes to those test
>> cases, such that compiler g++ as well as target library libstdc++ test
>> results are reasonably clean?  (If you think that's all "wasted effort",
>> then I suppose I'll just locally ignore any FAILs/XPASSes/UNRESOLVEDs
>> that appear in combination with
>> 'UNSUPPORTED: [...]: exception handling disabled'.)
>
> I would welcome that for libstdc++. I do sometimes run the libstdc++ tests
> with "unusual" options, like -fno-exceptions and -fno-rtti (e.g. today I've
> been fixing FAILs that only happen with -fexcess-precision=standard). I
> just manually ignore the tests that fail for -fno-exceptions, but it would
> be great if they were automatically skipped as UNSUPPORTED.

Per your and my changes a few days ago, we've already got libstdc++
covered, with the sole exception of:

PASS: 27_io/basic_ostream/inserters_arithmetic/pod/23875.cc (test for excess errors)
[-PASS:-]{+FAIL:+} 27_io/basic_ostream/inserters_arithmetic/pod/23875.cc execution test

terminate called after throwing an instance of 'std::bad_cast'
  what():  std::bad_cast

(Low priority for me.)

Not having heard anything contrary regarding the compiler side of things,
I've now been working on that, see below.

> We already have a handful of tests that use #if __cpp_exceptions to make
> those parts conditional on exception support.

Yes, that's an option not for all but certainly for some test cases.
(I'm not looking into that now -- but this may in fact be a good
beginner-level task, will add to ).

>> Otherwise, a number of test cases need DejaGnu directives
>> conditionalized on 'target exceptions_enabled'.

Before I get to such things, even simpler: OK to push the attached
"Skip a number of C++ test cases for '-fno-exceptions' testing"?
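To make the conditionalization concrete, a reduced run test could gate itself on the effective-target keyword like this; this is only a sketch of the pattern, not one of the attached test cases:

```cpp
// { dg-do run }
// { dg-require-effective-target exceptions_enabled }
// With '-fno-exceptions' the directive above turns the test into
// UNSUPPORTED instead of a FAIL on the "exception handling disabled" error.

int throws_and_catches()
{
  try {
    throw 42;              // requires -fexceptions to compile
  } catch (int e) {
    return e;
  }
  return 0;
}
```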


>> (Or,
>> 'error: exception handling disabled' made a "really late" diagnostic, so
>> that it doesn't preclude other diagnostics?  I'll have a look.  Well,
>> maybe something like: in fact do not default to '-fno-exceptions', but
>> instead emit 'error: exception handling disabled' only if in a "really
>> late" pass we run into exceptions-related constructs that we cannot
>> support.  That'd also avoid PASS -> UNSUPPORTED "regressions" when
>> exception handling in fact gets optimized away, for example.  I like that
>> idea, conceptually -- but is it feasible to implement..?)
>
> IMHO just [...] using [an effective target keyword] in test
> selectors seems simpler, and doesn't require changes to the compiler, just
> the tests.

I still like the idea, but yes, I've mentally put it on file "for later"
(ha, ha, ha...) -- it doesn't seem obvious to implement.


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz 

[PATCH v2] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Robin Dapp via Gcc-patches
Hi,

changes from V1:
  - Use VF_AUTO iterator.
  - Don't mention vfsqrt7.

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similarly to the binop tests, there are zvfh flavors now.

gcc/ChangeLog:

* config/riscv/autovec.md (2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 ++-
 .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
 .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
 .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
 .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
 .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
 .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
 .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
 .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
 .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
 .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
 16 files changed, 241 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "2"
 })
 
 ;; 
---
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
 ;; 
---
 
 (define_expand "abs2"
@@ -532,6 +532,40 @@ (define_expand "abs2"
   DONE;
 })
 
+;; 
---
+;;  [FP] Unary operations
+;; 
---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; 
---
+;; - [FP] Square root
+;; 
---
+;; Includes:
+;; - vfsqrt.v
+;; 
---
+(define_expand "2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_fp_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
 ;; =
 ;; == Ternary arithmetic
 ;; 

[PATCH v2] RISC-V: Add autovec FP binary operations.

2023-06-15 Thread Robin Dapp via Gcc-patches
Hi,

changes from V1:
 - Add VF_AUTO iterator and use it.
 - Ensured we don't ICE with -march=rv64gcv_zfhmin.

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.

gcc/ChangeLog:

* config/riscv/autovec.md (3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
 gcc/config/riscv/autovec.md   | 36 +
 gcc/config/riscv/riscv-protos.h   |  5 +-
 gcc/config/riscv/riscv-v.cc   | 74 ++-
 gcc/config/riscv/riscv.cc | 27 +--
 gcc/config/riscv/vector-iterators.md  | 28 +++
 .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
 .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
 .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 ++
 .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
 .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 ++
 .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
 .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
 .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
 .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
 .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
 .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 ++
 .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
 .../riscv/rvv/autovec/binop/vmul-rv32gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vmul-rv64gcv.c|  7 +-
 .../riscv/rvv/autovec/binop/vmul-template.h   |  8 

Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-15 Thread Qing Zhao via Gcc-patches
Hi, Joseph,

I studied c_parser_attribute_arguments and PLACEHOLDER_EXPR, along with
their related source code.

Right now, I still cannot decide what the best user interface would be for
the argument of the new attribute "element_count" (or the new attribute
name "counted_by", as suggested by Kees).

There are 3 possible interfaces for the argument of the new attribute 
“counted_by”:

The first 2 interfaces are the following A and B:

A. the argument of the new attribute “counted_by” is a string as the current 
patch:

struct trailing_array_A {
  int count;
  int array_A[] __attribute__ ((counted_by ("count")));
};

B. The argument of the new attribute “counted_by” is an identifier that can be
 accepted by “c_parser_attribute_arguments”:

struct trailing_array_B {
  int count;
  int array_B[] __attribute__ ((counted_by (count)));
};

To implement this interface, we need to adjust "attribute_takes_identifier_p"
to accept the new attribute "counted_by" and then interpret this new
identifier "count" as a field of the containing structure by looking it up
by name.  (Otherwise, the identifier "count" would be treated as an
identifier in the current scope, which has not been declared yet.)

Comparing B with A, I don't see much benefit, either from a user-interface
point of view or from an implementation point of view.

For the implementation, both A and B need to search the fields of the
containing structure by the name of the field "count".

For user interface, I think that A and B are similar.

In addition to the user interface and implementation, another concern is
the possibility of extending the argument of this new attribute to an
expression in the future, for example:

struct trailing_array_F {
  int count;
  int array_F[] __attribute__ ((counted_by (count * 4)));
};

In the above struct “trailing_array_F”, the argument of the attribute 
“counted_by” is “count * 4”, which
is an expression.  

If we plan to extend the argument of this new attribute to an expression, then 
neither A nor B is
good, right?

For this purpose, it might be cleaner to introduce a new syntax similar to
the designator mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108896, as follows, i.e.,
approach C:

C. The argument of the new attribute "counted_by" is a new designator
identifier that will be parsed as the field of the containing structure:

struct trailing_array_C {
  int count;
  int array_C[] __attribute__ ((counted_by (.count)));
};

I think that once the C FE accepts this new designator syntax as the
argument of the attribute, it will be very easy to extend the argument to
an arbitrary expression later.

For the implementation of this approach, my current thinking is: 

1. Update the routine “c_parser_postfix_expression” (is this the right place? ) 
to accept the new designator syntax.
2. Use “PLACEHOLDER_EXPR” to represent the containing structure, and build a 
COMPONENT_REF to hold
the argument of the attribute in the IR.
3. When using this attribute in middle-end or sanitizer, use 
SUBSTITUTE_PLACEHOLDER_IN_EXPR(EXP, OBJ)
to get the size info in IR. 
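Whatever the final syntax, the semantics the attribute is meant to provide can be emulated by hand today.  The following sketch (hypothetical helper names, no attribute involved) shows the size computation the compiler would derive from the count field:

```c
#include <stdlib.h>

/* Hand-written model of the "counted_by" semantics: the usable size of
   the flexible array member is derived from the "count" field.  The
   attribute itself is a proposal; these helpers are illustrative only. */
struct trailing_array {
  int count;
  int array[];   /* would carry the counted_by attribute */
};

struct trailing_array *alloc_trailing(int n)
{
  struct trailing_array *p = malloc(sizeof *p + n * sizeof p->array[0]);
  if (p)
    p->count = n;
  return p;
}

/* The byte size the attribute would let __builtin_dynamic_object_size
   and the sanitizers deduce for p->array.  */
size_t usable_size(const struct trailing_array *p)
{
  return (size_t) p->count * sizeof p->array[0];
}
```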

So, right now, I think that we might need to take the approach C?

What’s your opinion and suggestions?

Thanks a lot for your help.

Qing


> On Jun 7, 2023, at 6:05 PM, Joseph Myers  wrote:
> 
> On Wed, 7 Jun 2023, Qing Zhao via Gcc-patches wrote:
> 
>> Are you suggesting to use identifier directly as the argument of the 
>> attribute?
>> I tried this in the beginning, however, the current parser for the attribute 
>> argument can not identify that this identifier is a field identifier inside 
>> the same structure. 
>> 
>> For example:
>> 
>> int count;
>> struct trailing_array_7 {
>>  int count;
>>  int array_7[] __attribute__ ((element_count (count)));
>> };
>> 
>> The identifier “count” inside the attribute will refer to the variable 
>> “int count” outside of the structure.
> 
> c_parser_attribute_arguments is supposed to allow an identifier as an 
> attribute argument - and not look it up (the user of the attribute would 
> later need to look it up in the context of the containing structure).  
> Callers use attribute_takes_identifier_p to determine which attributes 
> take identifiers (versus expressions) as arguments, which would need 
> updating to cover the new attribute.
> 
> There is a ??? comment about the case where the identifier is declared as 
> a type name.  That would simply be one of the cases carried over from the 
> old Bison parser, and it would seem reasonable to remove that 
> special-casing so that the attribute works even when the identifier is 
> declared as a typedef name as an ordinary identifier, since it's fine for 
> structure members to have the same name as a typedef name.
> 
> Certainly taking an identifier directly seems like cleaner syntax than 
> taking a string that then needs reinterpreting as an identifier.
> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com



[PATCH v2] RISC-V: testsuite: Add vector_hw and zvfh_hw checks.

2023-06-15 Thread Robin Dapp via Gcc-patches
Hi,

Changes from v1:
 - Revamped the target selectors again.
 - Fixed some syntax as well as caching errors that were still present.
 - Adjusted some test cases I missed.

The current situation with target selectors is improvable at best.
We definitely need to discern between being able to build a
test with the current configuration and running the test on the
current target which this patch attempts to do.  There might
be a need for more fine-grained checks in the future that could
also go into our target-specific riscv.exp in the subdirectories
but for now I think we're good.

A bit more detail is in the patch description below.  The testsuite
is as clean as before for the configurations I tried: default, rv64gcv,
rv64gcv_zfhmin, rv64gc, rv64gc_zfh, rv64gc_zfhmin.  I hope I didn't
overlook tests that appear unsupported now but shouldn't be.

@Pan: No need to check the old version anymore, thanks.  This patch
is preferred.

Regards
 Robin


This introduces new checks for run tests.  Currently we have
riscv_vector as well as rv32 and rv64 which all check if GCC (with the
current configuration) can build the respective tests.

Many tests specify e.g. a different -march for vector which
makes the check fail even though we could build as well as run
those tests.

vector_hw now tries to compile, link and execute a simple vector example
file.  If this succeeds the respective test can run.

Similarly we introduce a zvfh_hw check which will be used in the
upcoming floating-point unop/binop tests as well as rv32_hw and
rv64_hw checks that are currently unused.

To conclude:
 - If we want a testcase to only compile when the current configuration
   has vector support we use {riscv_vector}.
 - If we want a testcase to run when the current target supports
   executing vector instructions we use {riscv_vector_hw}.
   It still needs to be ensured that we can actually build the test
   which can be achieved by either
   (1) compiling with e.g. -march=rv64gcv or
   (2) only enabling the test when the current configuration supports
 vector via {riscv_vector}.

The same principle applies for zfh, zfhmin and zvfh but we do not yet
have all target selectors.  In the meantime we need to make sure to
specify the proper -march flags like in (1).
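The probe file a *_hw check compiles, links and executes can be as small as the following sketch (the exact file in the patch may differ; the vsetivli use is just one way to touch a vector instruction):

```c
/* Minimal probe for a "can we execute vector instructions" check: it
   only needs to run one vector instruction and exit 0.  Guarded so it
   still compiles where the V extension is absent. */
int vector_probe(void)
{
#ifdef __riscv_vector
  __asm__ volatile ("vsetivli zero,4,e32,m1,ta,ma" ::: "memory");
#endif
  return 0;
}
```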

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Use
riscv_vector_hw.
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vrem-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Dito.
* gcc.target/riscv/rvv/autovec/binop/vxor-run.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c:
Dito.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-1.c:
Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: Dito.
* gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: Dito.
* gcc.target/riscv/rvv/autovec/series_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Dito.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Dito.
   

Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.

2023-06-15 Thread Jeff Law via Gcc-patches




On 5/25/23 07:42, Jeff Law wrote:

Thanks Manolis.  Do you happen to know if this includes the fixes I 
passed along to Philipp a few months back?  My recollection is one fixed 
stale DF data which prevented an ICE during bootstrapping, the other 
needed to ignore debug insns in one or two places so that the behavior 
didn't change based on the existence of debug insns.
So we stumbled over another relatively minor issue in this code this 
week that I'm sure you'll want to fix for a V2.


Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT 
rather than an "int".  Inside the ASHIFT handling you need to change the 
type of shift_scale to a HOST_WIDE_INT as well and potentially the 
actual computation of shift_scale.


The problem is if you have a compile-time constant address on rv64, it 
might be constructed with code like this:






(insn 282 47 283 6 (set (reg:DI 14 a4 [267])
(const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 
179 {*movdi_64bit}
 (nil))
(insn 283 282 284 6 (set (reg:DI 14 a4 [267])
(plus:DI (reg:DI 14 a4 [267])
(const_int 1365 [0x555]))) 
"test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
 (expr_list:REG_EQUAL (const_int 349525 [0x5])
(nil)))
(insn 284 283 285 6 (set (reg:DI 13 a3 [268])
(const_int 1431662592 [0x7000])) 
"test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
 (nil))
(insn 285 284 215 6 (set (reg:DI 13 a3 [268])
(plus:DI (reg:DI 13 a3 [268])
(const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 
{riscv_adddi3}
 (expr_list:REG_EQUAL (const_int 1431662596 [0x7004])
(nil)))
(insn 215 285 216 6 (set (reg:DI 14 a4 [271])
(ashift:DI (reg:DI 14 a4 [267]) 
(const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
 (nil)) 
(insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
(plus:DI (reg:DI 14 a4 [271]) 
(reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}

 (expr_list:REG_DEAD (reg:DI 13 a3 [268])
(expr_list:REG_EQUIV (const_int 1501199875796996 [0x57004])
(nil




Note that 32bit ASHIFT in insn 215.  If you're doing that computation in 
a 32bit integer type, then it's going to shift off the end of the type.
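The point can be seen in isolation: the (1 << 32) scaling from insn 215 is representable in a 64-bit HOST_WIDE_INT but not in a 32-bit int.  A standalone sketch of just the arithmetic (this is not the pass's fold_offset):

```c
#include <stdint.h>

/* Scaling an offset by a shifted amount, done in a 64-bit type as a
   HOST_WIDE_INT would be.  With a 32-bit "int" scale, a shift count of
   32 is undefined behavior and the scale is lost. */
int64_t fold_shift_64(int64_t offset, int shift_amount)
{
  /* Well defined for 0 <= shift_amount < 64.  */
  return offset << shift_amount;
}
```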



Jeff


Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Robin Dapp via Gcc-patches
> Btw. I'm currently running the testsuite with rv64gcv_zfhmin
> default march and see some additional FAILs.  Will report back.

Reporting back - the FAILs are a combination of an older qemu
version and not fully comprehensive target selectors.  I'm going
to send a V2 for the testsuite patch as well.

Regards
 Robin


Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs

On 15/06/2023 15:00, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 14:34, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 12:06, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 10:58, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 14/06/2023 15:29, Richard Biener wrote:




Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle end
which expands to a vector compare against a vector of stepped values.
This requires an additional instruction to prepare the comparison vector
(compared to SVE), but the "while_ultv64sidi" pattern (for example)
returns the DImode bitmask, so it works reasonably well.


Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size   scalar     128     256     512    512e    512f
   1     9.42   11.32    9.35   11.17   15.13   16.89
   2     5.72    6.53    6.66    6.66    7.62    8.56
   3     4.49    5.10    5.10    5.74    5.08    5.73
   4     4.10    4.33    4.29    5.21    3.79    4.25
   6     3.78    3.85    3.86    4.76    2.54    2.85
   8     3.64    1.89    3.76    4.50    1.92    2.16
  12     3.56    2.21    3.75    4.26    1.26    1.42
  16     3.36    0.83    1.06    4.16    0.95    1.07
  20     3.39    1.42    1.33    4.07    0.75    0.85
  24     3.23    0.66    1.72    4.22    0.62    0.70
  28     3.18    1.09    2.04    4.20    0.54    0.61
  32     3.16    0.47    0.41    0.41    0.47    0.53
  34     3.16    0.67    0.61    0.56    0.44    0.50
  38     3.19    0.95    0.95    0.82    0.40    0.45
  42     3.09    0.58    1.21    1.13    0.36    0.40
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on the AVX512 masked epilog code is clearly
the winner, the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is
a single loop in which a new mask is generated at the start of each
iteration. In the masked epilogue case, the main loop uses no masking
whatsoever, thus avoiding the need for generating a mask, carrying the
mask, inserting vec_merge operations, etc, and then the epilogue looks
much like the fully masked case, but unlike smaller mode epilogues there
is no loop because the epilogue vector size is the same. Is that right?
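The contrast being described can be sketched in scalar C as a conceptual model (16 stands in for the vector length; this is not GCC-generated code):

```c
/* Fully masked: one loop, a fresh mask every iteration; the inner
   bound i + l < n plays the role of the per-lane mask. */
void fully_masked(int *a, int n)
{
  for (int i = 0; i < n; i += 16)
    for (int l = 0; l < 16 && i + l < n; l++)
      a[i + l] += 1;
}

/* Masked epilogue: an unmasked hot loop, then a single masked step
   (no epilogue loop, since the epilogue vector size is the same). */
void masked_epilogue(int *a, int n)
{
  int i = 0;
  for (; i + 16 <= n; i += 16)       /* no masking here */
    for (int l = 0; l < 16; l++)
      a[i + l] += 1;
  for (int l = 0; i + l < n; l++)    /* one masked "iteration" */
    a[i + l] += 1;
}
```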


Yes.


This scheme seems like it might also benefit GCN, in so much as it
simplifies the hot code path.

GCN does not actually have smaller vector sizes, so there's no analogue
to AVX2 (we pretend we have some smaller sizes, but that's because the
middle end can't do masking everywhere yet, and it helps make some
vector constants smaller, perhaps).


This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in
their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the
rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two
implementations,
one for the two mask styles we then have.
I have decided against interweaving
vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Andrew Stubbs wrote:

> On 15/06/2023 14:34, Richard Biener wrote:
> > On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> > 
> >> On 15/06/2023 12:06, Richard Biener wrote:
> >>> On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> >>>
>  On 15/06/2023 10:58, Richard Biener wrote:
> > On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> >
> >> On 14/06/2023 15:29, Richard Biener wrote:
> >>>
> >>>
>  Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
> 
>  On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> > This implements fully masked vectorization or a masked epilog for
> > AVX512 style masks which single themselves out by representing
> > each lane with a single bit and by using integer modes for the mask
> > (both is much like GCN).
> > AVX512 is also special in that it doesn't have any instruction
> > to compute the mask from a scalar IV like SVE has with while_ult.
> > Instead the masks are produced by vector compares and the loop
> > control retains the scalar IV (mainly to avoid dependences on
> > mask generation, a suitable mask test instruction is available).
> 
>  This also sounds like GCN. We currently use WHILE_ULT in the middle end
>  which expands to a vector compare against a vector of stepped values.
>  This requires an additional instruction to prepare the comparison vector
>  (compared to SVE), but the "while_ultv64sidi" pattern (for example)
>  returns the DImode bitmask, so it works reasonably well.
> 
> > Like RVV code generation prefers a decrementing IV though IVOPTs
> > messes things up in some cases removing that IV to eliminate
> > it with an incrementing one used for address generation.
> > One of the motivating testcases is from PR108410 which in turn
> > is extracted from x264 where large size vectorization shows
> > issues with small trip loops.  Execution time there improves
> > compared to classic AVX512 with AVX2 epilogues for the cases
> > of less than 32 iterations.
> > size  scalar     128     256     512    512e    512f
> >    1    9.42   11.32    9.35   11.17   15.13   16.89
> >    2    5.72    6.53    6.66    6.66    7.62    8.56
> >    3    4.49    5.10    5.10    5.74    5.08    5.73
> >    4    4.10    4.33    4.29    5.21    3.79    4.25
> >    6    3.78    3.85    3.86    4.76    2.54    2.85
> >    8    3.64    1.89    3.76    4.50    1.92    2.16
> >   12    3.56    2.21    3.75    4.26    1.26    1.42
> >   16    3.36    0.83    1.06    4.16    0.95    1.07
> >   20    3.39    1.42    1.33    4.07    0.75    0.85
> >   24    3.23    0.66    1.72    4.22    0.62    0.70
> >   28    3.18    1.09    2.04    4.20    0.54    0.61
> >   32    3.16    0.47    0.41    0.41    0.47    0.53
> >   34    3.16    0.67    0.61    0.56    0.44    0.50
> >   38    3.19    0.95    0.95    0.82    0.40    0.45
> >   42    3.09    0.58    1.21    1.13    0.36    0.40
> > 'size' specifies the number of actual iterations, 512e is for
> > a masked epilog and 512f for the fully masked loop.  From
> > 4 scalar iterations on, the AVX512 masked epilog code is clearly
> > the winner; the fully masked variant is clearly worse and
> > its size benefit is also tiny.
> 
>  Let me check I understand correctly. In the fully masked case, there is a
>  single loop in which a new mask is generated at the start of each
>  iteration. In the masked epilogue case, the main loop uses no masking
>  whatsoever, thus avoiding the need for generating a mask, carrying the
>  mask, inserting vec_merge operations, etc, and then the epilogue looks
>  much like the fully masked case, but unlike smaller mode epilogues there
>  is no loop because the epilogue vector size is the same. Is that right?
> >>>
> >>> Yes.
> >>>
>  This scheme seems like it might also benefit GCN, in so much as it
>  simplifies the hot code path.
> 
>  GCN does not actually have smaller vector sizes, so there's no analogue
>  to AVX2 (we pretend we have some smaller sizes, but that's because the
>  middle end can't do masking everywhere yet, and it helps make some
>  vector constants smaller, perhaps).
> 
> > This patch does not enable using fully masked loops or
> > masked epilogues by default.  More work on cost modeling
> > and 

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs

On 15/06/2023 14:34, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 12:06, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 10:58, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 14/06/2023 15:29, Richard Biener wrote:




On 14.06.2023 at 16:27, Andrew Stubbs wrote:

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle end,
which expands to a vector compare against a vector of stepped values. This
requires an additional instruction to prepare the comparison vector
(compared to SVE), but the "while_ultv64sidi" pattern (for example)
returns the DImode bitmask, so it works reasonably well.
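
[Editorial aside: the WHILE_ULT lowering described here -- a vector compare
of stepped values against a broadcast limit, yielding a bitmask -- can be
modelled in scalar C.  Names and the NLANES constant are illustrative, not
GCC internals; 64 lanes stands in for while_ultv64sidi.]

```c
#include <stdint.h>

/* Scalar model of a while_ult-style mask: bit 'lane' of the result is
   set when base + lane < limit, i.e. that lane still has work to do.  */
#define NLANES 64

uint64_t
while_ult_mask (uint64_t base, uint64_t limit)
{
  uint64_t mask = 0;
  for (int lane = 0; lane < NLANES; lane++)
    /* Models the compare of the stepped vector {base, base+1, ...}
       against the broadcast limit.  */
    if (base + (uint64_t) lane < limit)
      mask |= (uint64_t) 1 << lane;
  return mask;
}
```

The mask is all-ones while at least NLANES iterations remain, partial on
the final tail, and zero once base reaches limit.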


Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size  scalar     128     256     512    512e    512f
   1    9.42   11.32    9.35   11.17   15.13   16.89
   2    5.72    6.53    6.66    6.66    7.62    8.56
   3    4.49    5.10    5.10    5.74    5.08    5.73
   4    4.10    4.33    4.29    5.21    3.79    4.25
   6    3.78    3.85    3.86    4.76    2.54    2.85
   8    3.64    1.89    3.76    4.50    1.92    2.16
  12    3.56    2.21    3.75    4.26    1.26    1.42
  16    3.36    0.83    1.06    4.16    0.95    1.07
  20    3.39    1.42    1.33    4.07    0.75    0.85
  24    3.23    0.66    1.72    4.22    0.62    0.70
  28    3.18    1.09    2.04    4.20    0.54    0.61
  32    3.16    0.47    0.41    0.41    0.47    0.53
  34    3.16    0.67    0.61    0.56    0.44    0.50
  38    3.19    0.95    0.95    0.82    0.40    0.45
  42    3.09    0.58    1.21    1.13    0.36    0.40
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is a
single loop in which a new mask is generated at the start of each
iteration. In the masked epilogue case, the main loop uses no masking
whatsoever, thus avoiding the need for generating a mask, carrying the
mask, inserting vec_merge operations, etc, and then the epilogue looks
much like the fully masked case, but unlike smaller mode epilogues there
is no loop because the epilogue vector size is the same. Is that right?


Yes.


This scheme seems like it might also benefit GCN, in so much as it
simplifies the hot code path.

GCN does not actually have smaller vector sizes, so there's no analogue
to AVX2 (we pretend we have some smaller sizes, but that's because the
middle end can't do masking everywhere yet, and it helps make some
vector constants smaller, perhaps).


This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
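
[Editorial illustration, not code from the patch: the masked-epilogue
scheme under discussion can be modelled in scalar C, with a 16-bit
integer standing in for an AVX512 kmask and VF = 16.  The main loop runs
unmasked; only the single final pass computes and applies a mask.]

```c
#include <stdint.h>

/* Scalar model of a masked-epilogue vectorization with VF = 16.  */
void
add_arrays (int *a, const int *b, int n)
{
  int i = 0;
  /* Unmasked main loop; the inner loop stands in for one vector op.  */
  for (; i + 16 <= n; i += 16)
    for (int lane = 0; lane < 16; lane++)
      a[i + lane] += b[i + lane];

  /* Masked epilogue: 0 <= n - i < 16 elements remain, handled in one
     pass under a mask, with no further loop.  */
  uint16_t mask = (uint16_t) ((1u << (n - i)) - 1);
  for (int lane = 0; lane < 16; lane++)
    if (mask & (1u << lane))
      a[i + lane] += b[i + lane];
}
```

A fully masked loop would instead compute such a mask at the top of every
iteration, which is the overhead the measurements above show.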
Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.
I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Andrew Stubbs wrote:

> On 15/06/2023 12:06, Richard Biener wrote:
> > On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> > 
> >> On 15/06/2023 10:58, Richard Biener wrote:
> >>> On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> >>>
>  On 14/06/2023 15:29, Richard Biener wrote:
> >
> >
> >> On 14.06.2023 at 16:27, Andrew Stubbs wrote:
> >>
> >> On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> >>> This implements fully masked vectorization or a masked epilog for
> >>> AVX512 style masks which single themselves out by representing
> >>> each lane with a single bit and by using integer modes for the mask
> >>> (both is much like GCN).
> >>> AVX512 is also special in that it doesn't have any instruction
> >>> to compute the mask from a scalar IV like SVE has with while_ult.
> >>> Instead the masks are produced by vector compares and the loop
> >>> control retains the scalar IV (mainly to avoid dependences on
> >>> mask generation, a suitable mask test instruction is available).
> >>
> >> This also sounds like GCN. We currently use WHILE_ULT in the middle end,
> >> which expands to a vector compare against a vector of stepped values. This
> >> requires an additional instruction to prepare the comparison vector
> >> (compared to SVE), but the "while_ultv64sidi" pattern (for example)
> >> returns the DImode bitmask, so it works reasonably well.
> >>
> >>> Like RVV code generation prefers a decrementing IV though IVOPTs
> >>> messes things up in some cases removing that IV to eliminate
> >>> it with an incrementing one used for address generation.
> >>> One of the motivating testcases is from PR108410 which in turn
> >>> is extracted from x264 where large size vectorization shows
> >>> issues with small trip loops.  Execution time there improves
> >>> compared to classic AVX512 with AVX2 epilogues for the cases
> >>> of less than 32 iterations.
> >>> size  scalar     128     256     512    512e    512f
> >>>    1    9.42   11.32    9.35   11.17   15.13   16.89
> >>>    2    5.72    6.53    6.66    6.66    7.62    8.56
> >>>    3    4.49    5.10    5.10    5.74    5.08    5.73
> >>>    4    4.10    4.33    4.29    5.21    3.79    4.25
> >>>    6    3.78    3.85    3.86    4.76    2.54    2.85
> >>>    8    3.64    1.89    3.76    4.50    1.92    2.16
> >>>   12    3.56    2.21    3.75    4.26    1.26    1.42
> >>>   16    3.36    0.83    1.06    4.16    0.95    1.07
> >>>   20    3.39    1.42    1.33    4.07    0.75    0.85
> >>>   24    3.23    0.66    1.72    4.22    0.62    0.70
> >>>   28    3.18    1.09    2.04    4.20    0.54    0.61
> >>>   32    3.16    0.47    0.41    0.41    0.47    0.53
> >>>   34    3.16    0.67    0.61    0.56    0.44    0.50
> >>>   38    3.19    0.95    0.95    0.82    0.40    0.45
> >>>   42    3.09    0.58    1.21    1.13    0.36    0.40
> >>> 'size' specifies the number of actual iterations, 512e is for
> >>> a masked epilog and 512f for the fully masked loop.  From
> >>> 4 scalar iterations on, the AVX512 masked epilog code is clearly
> >>> the winner; the fully masked variant is clearly worse and
> >>> its size benefit is also tiny.
> >>
> >> Let me check I understand correctly. In the fully masked case, there is a
> >> single loop in which a new mask is generated at the start of each
> >> iteration. In the masked epilogue case, the main loop uses no masking
> >> whatsoever, thus avoiding the need for generating a mask, carrying the
> >> mask, inserting vec_merge operations, etc, and then the epilogue looks
> >> much like the fully masked case, but unlike smaller mode epilogues there
> >> is no loop because the epilogue vector size is the same. Is that right?
> >
> > Yes.
> >
> >> This scheme seems like it might also benefit GCN, in so much as it
> >> simplifies the hot code path.
> >>
> >> GCN does not actually have smaller vector sizes, so there's no analogue
> >> to AVX2 (we pretend we have some smaller sizes, but that's because the
> >> middle end can't do masking everywhere yet, and it helps make some
> >> vector constants smaller, perhaps).
> >>
> >>> This patch does not enable using fully masked loops or
> >>> masked epilogues by default.  More work on cost modeling
> >>> and vectorization kind selection on x86_64 is necessary
> >>> for this.
> >>> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> >>> which could be exploited further to unify some of the flags
> >>> we have right now but there didn't seem to be many easy things
> >>> to merge, so I'm leaving this for followups.
> >>> 

[PATCH V4] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-15 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch passes bootstrap on x86; OK for trunk?

According to comments from Richi, the first patch is split out to add only
the ifns && optabs of LEN_MASK_{LOAD,STORE}; we don't apply them in the
vectorizer in this patch.  A BIAS argument is also added for possible
future use by s390.

The descriptions of the patterns in the documentation come from Robin.

After this patch is approved, I will send the second patch to apply the
len_mask_* patterns in the vectorizer.

Target like ARM SVE in GCC has an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) cannot use this approach in
auto-vectorization, since RVV uses a length in loop control.

This patch adds LEN_MASK_{LOAD,STORE} to support flow control for targets
like RISC-V that use a length in loop control.
Loads/stores are normalized into LEN_MASK_{LOAD,STORE} as long as either
the length or the mask is valid.  The length is the outcome of SELECT_VL
or MIN_EXPR; the mask is the outcome of a comparison.

The LEN_MASK_{LOAD,STORE} format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).

Consider these 4 following cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128

  Code:
    for (int i = 0; i < 4; i++)
      a[i] = b[i] + c[i];

  IR (does not use LEN_MASK_*):
    v1 = MEM (...)
    v2 = MEM (...)
    v3 = v1 + v2
    MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128

  Code:
    for (int i = 0; i < 4; i++)
      if (cond[i])
        a[i] = b[i] + c[i];

  IR (LEN_MASK_* with length = VF, mask = comparison):
    mask = comparison
    v1 = LEN_MASK_LOAD (length = VF, mask)
    v2 = LEN_MASK_LOAD (length = VF, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):

  Code:
    for (int i = 0; i < n; i++)
      a[i] = b[i] + c[i];

  IR:
    loop_len = SELECT_VL or MIN
    v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):

  Code:
    for (int i = 0; i < n; i++)
      if (cond[i])
        a[i] = b[i] + c[i];

  IR:
    loop_len = SELECT_VL or MIN
    mask = comparison
    v1 = LEN_MASK_LOAD (length = loop_len, mask)
    v2 = LEN_MASK_LOAD (length = loop_len, mask)
    v3 = v1 + v2
    LEN_MASK_STORE (length = loop_len, mask, v3)
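
[Editorial sketch of the common semantics behind all four cases, as I read
the proposed ifns -- an illustrative scalar model, not the patch's
implementation: lane i participates in the load/store iff i < length and
mask[i] is set.  The zeroing of inactive load lanes is an arbitrary choice
here for determinism; the real ifn leaves them undefined.]

```c
#include <stddef.h>

/* Scalar model of LEN_MASK_LOAD: lane i is active iff i < len && mask[i].
   Inactive lanes are zeroed here; the real ifn leaves them undefined.  */
void
len_mask_load (int *dest, const int *ptr, size_t vf,
               size_t len, const _Bool *mask)
{
  for (size_t i = 0; i < vf; i++)
    dest[i] = (i < len && mask[i]) ? ptr[i] : 0;
}

/* Scalar model of LEN_MASK_STORE: only active lanes touch memory.  */
void
len_mask_store (int *ptr, const int *vec, size_t vf,
                size_t len, const _Bool *mask)
{
  for (size_t i = 0; i < vf; i++)
    if (i < len && mask[i])
      ptr[i] = vec[i];
}
```

Case 2 then corresponds to len = VF, and Case 3 to an all-true mask.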

Co-authored-by: Robin Dapp 

gcc/ChangeLog:

* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 46 +
 gcc/genopinit.cc|  6 --
 gcc/internal-fn.cc  | 43 ++
 gcc/internal-fn.def |  4 
 gcc/optabs.def  |  2 ++
 5 files changed, 95 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a43fd65a2b2..af23ec938d6 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5136,6 +5136,52 @@ of @code{QI} elements.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load of (operand 2 - operand 4) elements from vector memory
+operand 1 into vector register operand 0, setting the other elements of
+operand 0 to undefined values.  This is a combination of len_load and
+maskload.
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
+has whichever integer mode the target prefers.  A secondary mask is specified
+in operand 3 which must be of type @var{n}.  Operand 4 conceptually has mode
+@code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
+constant

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs

On 15/06/2023 12:06, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 15/06/2023 10:58, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 14/06/2023 15:29, Richard Biener wrote:




On 14.06.2023 at 16:27, Andrew Stubbs wrote:

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle end,
which expands to a vector compare against a vector of stepped values. This
requires an additional instruction to prepare the comparison vector
(compared to SVE), but the "while_ultv64sidi" pattern (for example)
returns the DImode bitmask, so it works reasonably well.


Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size  scalar     128     256     512    512e    512f
   1    9.42   11.32    9.35   11.17   15.13   16.89
   2    5.72    6.53    6.66    6.66    7.62    8.56
   3    4.49    5.10    5.10    5.74    5.08    5.73
   4    4.10    4.33    4.29    5.21    3.79    4.25
   6    3.78    3.85    3.86    4.76    2.54    2.85
   8    3.64    1.89    3.76    4.50    1.92    2.16
  12    3.56    2.21    3.75    4.26    1.26    1.42
  16    3.36    0.83    1.06    4.16    0.95    1.07
  20    3.39    1.42    1.33    4.07    0.75    0.85
  24    3.23    0.66    1.72    4.22    0.62    0.70
  28    3.18    1.09    2.04    4.20    0.54    0.61
  32    3.16    0.47    0.41    0.41    0.47    0.53
  34    3.16    0.67    0.61    0.56    0.44    0.50
  38    3.19    0.95    0.95    0.82    0.40    0.45
  42    3.09    0.58    1.21    1.13    0.36    0.40
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is a
single loop in which a new mask is generated at the start of each
iteration. In the masked epilogue case, the main loop uses no masking
whatsoever, thus avoiding the need for generating a mask, carrying the
mask, inserting vec_merge operations, etc, and then the epilogue looks
much like the fully masked case, but unlike smaller mode epilogues there
is no loop because the epilogue vector size is the same. Is that right?


Yes.


This scheme seems like it might also benefit GCN, in so much as it
simplifies the hot code path.

GCN does not actually have smaller vector sizes, so there's no analogue
to AVX2 (we pretend we have some smaller sizes, but that's because the
middle end can't do masking everywhere yet, and it helps make some
vector constants smaller, perhaps).


This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.
I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
I was split between making 'vec_loop_masks' a class with methods,

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Richard Biener wrote:

> On Wed, 14 Jun 2023, Richard Sandiford wrote:
> 
> > Richard Biener via Gcc-patches  writes:
> > > This implements fully masked vectorization or a masked epilog for
> > > AVX512 style masks which single themselves out by representing
> > > each lane with a single bit and by using integer modes for the mask
> > > (both is much like GCN).
> > >
> > > AVX512 is also special in that it doesn't have any instruction
> > > to compute the mask from a scalar IV like SVE has with while_ult.
> > > Instead the masks are produced by vector compares and the loop
> > > control retains the scalar IV (mainly to avoid dependences on
> > > mask generation, a suitable mask test instruction is available).
> > >
> > > Like RVV code generation prefers a decrementing IV though IVOPTs
> > > messes things up in some cases removing that IV to eliminate
> > > it with an incrementing one used for address generation.
> > >
> > > One of the motivating testcases is from PR108410 which in turn
> > > is extracted from x264 where large size vectorization shows
> > > issues with small trip loops.  Execution time there improves
> > > compared to classic AVX512 with AVX2 epilogues for the cases
> > > of less than 32 iterations.
> > >
> > > size  scalar     128     256     512    512e    512f
> > >    1    9.42   11.32    9.35   11.17   15.13   16.89
> > >    2    5.72    6.53    6.66    6.66    7.62    8.56
> > >    3    4.49    5.10    5.10    5.74    5.08    5.73
> > >    4    4.10    4.33    4.29    5.21    3.79    4.25
> > >    6    3.78    3.85    3.86    4.76    2.54    2.85
> > >    8    3.64    1.89    3.76    4.50    1.92    2.16
> > >   12    3.56    2.21    3.75    4.26    1.26    1.42
> > >   16    3.36    0.83    1.06    4.16    0.95    1.07
> > >   20    3.39    1.42    1.33    4.07    0.75    0.85
> > >   24    3.23    0.66    1.72    4.22    0.62    0.70
> > >   28    3.18    1.09    2.04    4.20    0.54    0.61
> > >   32    3.16    0.47    0.41    0.41    0.47    0.53
> > >   34    3.16    0.67    0.61    0.56    0.44    0.50
> > >   38    3.19    0.95    0.95    0.82    0.40    0.45
> > >   42    3.09    0.58    1.21    1.13    0.36    0.40
> > >
> > > 'size' specifies the number of actual iterations, 512e is for
> > > a masked epilog and 512f for the fully masked loop.  From
> > > 4 scalar iterations on, the AVX512 masked epilog code is clearly
> > > the winner; the fully masked variant is clearly worse and
> > > its size benefit is also tiny.
> > >
> > > This patch does not enable using fully masked loops or
> > > masked epilogues by default.  More work on cost modeling
> > > and vectorization kind selection on x86_64 is necessary
> > > for this.
> > >
> > > Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> > > which could be exploited further to unify some of the flags
> > > we have right now but there didn't seem to be many easy things
> > > to merge, so I'm leaving this for followups.
> > >
> > > Mask requirements as registered by vect_record_loop_mask are kept in their
> > > original form and recorded in a hash_set now instead of being
> > > processed to a vector of rgroup_controls.  Instead that's now
> > > left to the final analysis phase which tries forming the rgroup_controls
> > > vector using while_ult and if that fails now tries AVX512 style
> > > which needs a different organization and instead fills a hash_map
> > > with the relevant info.  vect_get_loop_mask now has two implementations,
> > > one for the two mask styles we then have.
> > >
> > > I have decided against interweaving vect_set_loop_condition_partial_vectors
> > > with conditions to do AVX512 style masking and instead opted to
> > > "duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
> > > Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
> > >
> > > I was split between making 'vec_loop_masks' a class with methods,
> > > possibly merging in the _len stuff into a single registry.  It
> > > seemed to be too many changes for the purpose of getting AVX512
> > > working.  I'm going to play wait and see what happens with RISC-V
> > > here since they are going to get both masks and lengths registered
> > > I think.
> > >
> > > The vect_prepare_for_masked_peels hunk might run into issues with
> > > SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
> > > looked odd.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
> > > the testsuite with --param vect-partial-vector-usage=2 with and
> > > without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
> > > and one latent wrong-code (PR110237).
> > >
> > > There's followup work to be done to try enabling masked epilogues
> > > for x86-64 by default (when AVX512 is enabled, possibly only when
> > > -mprefer-vector-width=512).  Getting cost modeling and decision
> > > right is going to be challenging.
> > >
> > > Any comments?
> > >
> > 

[PATCH 2/2] libstdc++: use new built-in trait __remove_pointer

2023-06-15 Thread Ken Matsui via Gcc-patches
This patch lets libstdc++ use the new built-in trait __remove_pointer.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_pointer): Use __remove_pointer 
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..81497e2f3e1 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2023,6 +2023,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // Pointer modifications.
 
+  /// remove_pointer
+#if __has_builtin(__remove_pointer)
+  template
+struct remove_pointer
+{ using type = __remove_pointer(_Tp); };
+#else
   template
 struct __remove_pointer_helper
 { using type = _Tp; };
@@ -2031,11 +2037,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __remove_pointer_helper<_Tp, _Up*>
 { using type = _Up; };
 
-  /// remove_pointer
   template
 struct remove_pointer
 : public __remove_pointer_helper<_Tp, __remove_cv_t<_Tp>>
 { };
+#endif
 
   template
 struct __add_pointer_helper
-- 
2.41.0
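
[For context, an editorial snippet showing the standard trait this
built-in accelerates.  It uses only standard C++ and compiles the same
way with or without the new built-in; the behavior shown matches the
test cases in patch 1/2.]

```cpp
#include <type_traits>

// std::remove_pointer strips one level of (possibly cv-qualified)
// pointer; non-pointers, references and arrays pass through unchanged.
static_assert(std::is_same<std::remove_pointer<int*>::type, int>::value, "");
static_assert(std::is_same<std::remove_pointer<int**>::type, int*>::value, "");
static_assert(std::is_same<std::remove_pointer<int* const>::type, int>::value, "");
static_assert(std::is_same<std::remove_pointer<const int*>::type, const int>::value, "");
static_assert(std::is_same<std::remove_pointer<int>::type, int>::value, "");
static_assert(std::is_same<std::remove_pointer<int&>::type, int&>::value, "");
```

With the built-in, the library no longer needs to instantiate the
__remove_pointer_helper partial specializations for each use.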



[PATCH 1/2] c++: implement __remove_pointer built-in trait

2023-06-15 Thread Ken Matsui via Gcc-patches
This patch implements a built-in trait for std::remove_pointer.

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_POINTER.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_pointer.
* g++.dg/ext/remove_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 gcc/testsuite/g++.dg/ext/remove_pointer.C | 51 +++
 4 files changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_pointer.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..07823e55579 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -90,6 +90,7 @@ DEFTRAIT_EXPR (IS_DEDUCIBLE, "__is_deducible ", 2)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_REFERENCE, "__remove_reference", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
+DEFTRAIT_TYPE (REMOVE_POINTER, "__remove_pointer", 1)
 DEFTRAIT_TYPE (UNDERLYING_TYPE,  "__underlying_type", 1)
 DEFTRAIT_TYPE (TYPE_PACK_ELEMENT, "__type_pack_element", -1)
 
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..885c7a6fb64 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12373,6 +12373,10 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree type2,
   if (TYPE_REF_P (type1))
type1 = TREE_TYPE (type1);
   return cv_unqualified (type1);
+case CPTK_REMOVE_POINTER:
+  if (TYPE_PTR_P (type1))
+type1 = TREE_TYPE (type1);
+  return type1;
 
 case CPTK_TYPE_PACK_ELEMENT:
   return finish_type_pack_element (type1, type2, complain);
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..e21e0a95509 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__remove_pointer)
+# error "__has_builtin (__remove_pointer) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_pointer.C b/gcc/testsuite/g++.dg/ext/remove_pointer.C
new file mode 100644
index 000..7b13db93950
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_pointer.C
@@ -0,0 +1,51 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(__is_same(__remove_pointer(int), int));
+SA(__is_same(__remove_pointer(int*), int));
+SA(__is_same(__remove_pointer(int**), int*));
+
+SA(__is_same(__remove_pointer(const int*), const int));
+SA(__is_same(__remove_pointer(const int**), const int*));
+SA(__is_same(__remove_pointer(int* const), int));
+SA(__is_same(__remove_pointer(int** const), int*));
+SA(__is_same(__remove_pointer(int* const* const), int* const));
+
+SA(__is_same(__remove_pointer(volatile int*), volatile int));
+SA(__is_same(__remove_pointer(volatile int**), volatile int*));
+SA(__is_same(__remove_pointer(int* volatile), int));
+SA(__is_same(__remove_pointer(int** volatile), int*));
+SA(__is_same(__remove_pointer(int* volatile* volatile), int* volatile));
+
+SA(__is_same(__remove_pointer(const volatile int*), const volatile int));
+SA(__is_same(__remove_pointer(const volatile int**), const volatile int*));
+SA(__is_same(__remove_pointer(const int* volatile), const int));
+SA(__is_same(__remove_pointer(volatile int* const), volatile int));
+SA(__is_same(__remove_pointer(int* const volatile), int));
+SA(__is_same(__remove_pointer(const int** volatile), const int*));
+SA(__is_same(__remove_pointer(volatile int** const), volatile int*));
+SA(__is_same(__remove_pointer(int** const volatile), int*));
+SA(__is_same(__remove_pointer(int* const* const volatile), int* const));
+SA(__is_same(__remove_pointer(int* volatile* const volatile), int* volatile));
+SA(__is_same(__remove_pointer(int* const volatile* const volatile), int* const volatile));
+
+SA(__is_same(__remove_pointer(int&), int&));
+SA(__is_same(__remove_pointer(const int&), const int&));
+SA(__is_same(__remove_pointer(volatile int&), volatile int&));
+SA(__is_same(__remove_pointer(const volatile int&), const volatile int&));
+
+SA(__is_same(__remove_pointer(int&&), int&&));
+SA(__is_same(__remove_pointer(const int&&), const int&&));
+SA(__is_same(__remove_pointer(volatile int&&), volatile int&&));
+SA(__is_same(__remove_pointer(const volatile int&&), const volatile int&&));
+
+SA(__is_same(__remove_pointer(int[3]), int[3]));
+SA(__is_same(__remove_pointer(const int[3]), const int[3]));
+SA(__is_same(__remove_pointer(volatile int[3]), volatile int[3]));
+SA(__is_same(__remove_pointer(const volatile int[3]), const volatile int[3]));
+
+SA(__is_same(__remove_pointer(int(int)), int(int)));
+SA(__is_same(__remove_pointer(int(*const)(int)), int(int)));

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Wed, 14 Jun 2023, Richard Sandiford wrote:

> Richard Biener via Gcc-patches  writes:
> > This implements fully masked vectorization or a masked epilog for
> > AVX512 style masks which single themselves out by representing
> > each lane with a single bit and by using integer modes for the mask
> > (both is much like GCN).
> >
> > AVX512 is also special in that it doesn't have any instruction
> > to compute the mask from a scalar IV like SVE has with while_ult.
> > Instead the masks are produced by vector compares and the loop
> > control retains the scalar IV (mainly to avoid dependences on
> > mask generation, a suitable mask test instruction is available).
> >
> > Like RVV code generation prefers a decrementing IV though IVOPTs
> > messes things up in some cases removing that IV to eliminate
> > it with an incrementing one used for address generation.
> >
> > One of the motivating testcases is from PR108410 which in turn
> > is extracted from x264 where large size vectorization shows
> > issues with small trip loops.  Execution time there improves
> > compared to classic AVX512 with AVX2 epilogues for the cases
> > of less than 32 iterations.
> >
> > size   scalar    128    256    512   512e   512f
> >    1     9.42  11.32   9.35  11.17  15.13  16.89
> >    2     5.72   6.53   6.66   6.66   7.62   8.56
> >    3     4.49   5.10   5.10   5.74   5.08   5.73
> >    4     4.10   4.33   4.29   5.21   3.79   4.25
> >    6     3.78   3.85   3.86   4.76   2.54   2.85
> >    8     3.64   1.89   3.76   4.50   1.92   2.16
> >   12     3.56   2.21   3.75   4.26   1.26   1.42
> >   16     3.36   0.83   1.06   4.16   0.95   1.07
> >   20     3.39   1.42   1.33   4.07   0.75   0.85
> >   24     3.23   0.66   1.72   4.22   0.62   0.70
> >   28     3.18   1.09   2.04   4.20   0.54   0.61
> >   32     3.16   0.47   0.41   0.41   0.47   0.53
> >   34     3.16   0.67   0.61   0.56   0.44   0.50
> >   38     3.19   0.95   0.95   0.82   0.40   0.45
> >   42     3.09   0.58   1.21   1.13   0.36   0.40
> >
> > 'size' specifies the number of actual iterations, 512e is for
> > a masked epilog and 512f for the fully masked loop.  From
> > 4 scalar iterations on the AVX512 masked epilog code is clearly
> > the winner; the fully masked variant is clearly worse, and
> > its size benefit is also tiny.
> >
> > This patch does not enable using fully masked loops or
> > masked epilogues by default.  More work on cost modeling
> > and vectorization kind selection on x86_64 is necessary
> > for this.
> >
> > Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> > which could be exploited further to unify some of the flags
> > we have right now but there didn't seem to be many easy things
> > to merge, so I'm leaving this for followups.
> >
> > Mask requirements as registered by vect_record_loop_mask are kept in their
> > original form and recorded in a hash_set now instead of being
> > processed to a vector of rgroup_controls.  Instead that's now
> > left to the final analysis phase which tries forming the rgroup_controls
> > vector using while_ult and if that fails now tries AVX512 style
> > which needs a different organization and instead fills a hash_map
> > with the relevant info.  vect_get_loop_mask now has two implementations,
> > one for the two mask styles we then have.
> >
> > I have decided against interweaving vect_set_loop_condition_partial_vectors
> > with conditions to do AVX512 style masking and instead opted to
> > "duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
> > Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
> >
> > I was split between making 'vec_loop_masks' a class with methods,
> > possibly merging in the _len stuff into a single registry.  It
> > seemed to be too many changes for the purpose of getting AVX512
> > working.  I'm going to play wait and see what happens with RISC-V
> > here since they are going to get both masks and lengths registered
> > I think.
> >
> > The vect_prepare_for_masked_peels hunk might run into issues with
> > SVE, I didn't check yet but using LOOP_VINFO_RGROUP_COMPARE_TYPE
> > looked odd.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.  I've run
> > the testsuite with --param vect-partial-vector-usage=2 with and
> > without -fno-vect-cost-model and filed two bugs, one ICE (PR110221)
> > and one latent wrong-code (PR110237).
> >
> > There's followup work to be done to try enabling masked epilogues
> > for x86-64 by default (when AVX512 is enabled, possibly only when
> > -mprefer-vector-width=512).  Getting cost modeling and decision
> > right is going to be challenging.
> >
> > Any comments?
> >
> > OK?
> 
> Some comments below, but otherwise LGTM FWIW.
> 
> > Btw, testing on GCN would be welcome - the _avx512 paths could
> > work for it so in case the while_ult path fails (not sure if
> > it ever does) it could get _avx512 style 

Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-15 Thread Sam James via Gcc-patches



> On 15 Jun 2023, at 12:54, David Malcolm  wrote:
> 
> On Thu, 2023-06-15 at 01:43 +0100, Sam James wrote:
>> 
>> Eric Gallager via Gcc-patches  writes:
>> 
>>> On Wed, Jun 14, 2023 at 8:29 PM David Malcolm via Gcc-patches
>>>  wrote:
 
 PR c++/110164 notes that in cases where we have a forward decl
 of a std library type such as:
 
 std::array x;
 
 we omit this diagnostic:
 
 error: aggregate ‘std::array x’ has incomplete type and
 cannot be defined
 
 This patch adds this hint to the diagnostic:
 
 note: ‘std::array’ is defined in header ‘<array>’; this is
 probably fixable by adding ‘#include <array>’
 
>>> 
>>> ..."probably"?
>>> 
>> 
>> Right now, our fixit says:
>> ```
>> /tmp/foo.c:1:1: note: ‘time_t’ is defined in header ‘<time.h>’; did
>> you forget to ‘#include <time.h>’?
>> ```
>> 
>> We should probably use the same phrasing for consistency?
> 
> It's using the same phrasing (it's calling the same function); I
> changed the wording recently, in r14-1798-g7474c46cf2d371:
>  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621607.html
> 
> I used "probably" because there's no guarantee it will fix things (e.g.
> if the user has non-standard headers).

Ah, sorry Dave, shame on me for not git pulling first :)

No objection then - I like the new phrasing, was just worried about consistency.

> 
> Dave
> 

Thank you!

[PATCH] gcc-ar: Remove code duplication.

2023-06-15 Thread Costas Argyris via Gcc-patches
Some refactoring I thought would be useful while looking at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77576

I think some duplicated code can go away by doing this,
while also saving a bit of memory.
From c3f3b2fd53291805b5d0be19df6d1a348c5889ec Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Thu, 15 Jun 2023 12:37:35 +0100
Subject: [PATCH] gcc-ar: Remove code duplication.

Preparatory refactoring that simplifies by eliminating
some duplicated code, before trying to fix 77576.
I believe this stands on its own regardless of the PR.
It also saves a nargv element when we have a plugin and
three when not.

Signed-off-by: Costas Argyris 
---
 gcc/gcc-ar.cc | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/gcc-ar.cc b/gcc/gcc-ar.cc
index 5e5b63e1988..4e4c525927d 100644
--- a/gcc/gcc-ar.cc
+++ b/gcc/gcc-ar.cc
@@ -128,6 +128,9 @@ main (int ac, char **av)
   const char *exe_name;
 #if HAVE_LTO_PLUGIN > 0
   char *plugin;
+  const int j = 2; /* Two extra args, --plugin   */
+#else
+  const int j = 0; /* No extra args.  */
 #endif
   int k, status, err;
   const char *err_msg;
@@ -206,25 +209,21 @@ main (int ac, char **av)
 	}
 }
 
+  /* Prepend - if necessary.  */
+  if (is_ar && av[1] && av[1][0] != '-')
+av[1] = concat ("-", av[1], NULL);
+  
   /* Create new command line with plugin - if we have one, otherwise just
  copy the command through.  */
-  nargv = XCNEWVEC (const char *, ac + 4);
+  nargv = XCNEWVEC (const char *, ac + j + 1); /* +j plugin args +1 for NULL.  */
   nargv[0] = exe_name;
 #if HAVE_LTO_PLUGIN > 0
   nargv[1] = "--plugin";
   nargv[2] = plugin;
-  if (is_ar && av[1] && av[1][0] != '-')
-av[1] = concat ("-", av[1], NULL);
-  for (k = 1; k < ac; k++)
-nargv[2 + k] = av[k];
-  nargv[2 + k] = NULL;
-#else
-  if (is_ar && av[1] && av[1][0] != '-')
-av[1] = concat ("-", av[1], NULL);
-  for (k = 1; k < ac; k++)
-nargv[k] = av[k];
-  nargv[k] = NULL;
 #endif
+  for (k = 1; k < ac; k++)
+nargv[j + k] = av[k];
+  nargv[j + k] = NULL;
 
   /* Run utility */
   /* ??? the const is misplaced in pex_one's argv? */
-- 
2.30.2



Re: [PATCH] c++: provide #include hint for missing includes [PR110164]

2023-06-15 Thread David Malcolm via Gcc-patches
On Thu, 2023-06-15 at 01:43 +0100, Sam James wrote:
> 
> Eric Gallager via Gcc-patches  writes:
> 
> > On Wed, Jun 14, 2023 at 8:29 PM David Malcolm via Gcc-patches
> >  wrote:
> > > 
> > > PR c++/110164 notes that in cases where we have a forward decl
> > > of a std library type such as:
> > > 
> > > std::array x;
> > > 
> > > we omit this diagnostic:
> > > 
> > > error: aggregate ‘std::array x’ has incomplete type and
> > > cannot be defined
> > > 
> > > This patch adds this hint to the diagnostic:
> > > 
> > > note: ‘std::array’ is defined in header ‘<array>’; this is
> > > probably fixable by adding ‘#include <array>’
> > > 
> > 
> > ..."probably"?
> > 
> 
> Right now, our fixit says:
> ```
> /tmp/foo.c:1:1: note: ‘time_t’ is defined in header ‘<time.h>’; did
> you forget to ‘#include <time.h>’?
> ```
> 
> We should probably use the same phrasing for consistency?

It's using the same phrasing (it's calling the same function); I
changed the wording recently, in r14-1798-g7474c46cf2d371:
  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621607.html

I used "probably" because there's no guarantee it will fix things (e.g.
if the user has non-standard headers).

Dave



Re: [PATCH] libcpp: Diagnose #include after failed __has_include [PR80753]

2023-06-15 Thread Jason Merrill via Gcc-patches

On 6/13/23 12:29, Jakub Jelinek wrote:

Hi!

As can be seen in the testcase, we don't diagnose #include/#include_next
of a non-existent header if __has_include/__has_include_next is done for
that header first.
The problem is that we normally error the first time some header is not
found, but in the _cpp_FFK_HAS_INCLUDE case obviously don't want to diagnose
it, just expand it to 0.  And libcpp caches both successful includes and
unsuccessful ones.

The following patch fixes that by remembering that we haven't diagnosed
error when using __has_include* on it, and diagnosing it when using the
cache entry in normal mode the first time.

I think _cpp_FFK_NORMAL is the only mode in which we normally diagnose
errors, for _cpp_FFK_PRE_INCLUDE that open_file_failed isn't reached
and for _cpp_FFK_FAKE neither.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
after a while for backports?


OK.


2023-06-13  Jakub Jelinek  

PR preprocessor/80753
libcpp/
* files.cc (struct _cpp_file): Add deferred_error bitfield.
(_cpp_find_file): When finding a file in cache with deferred_error
set in _cpp_FFK_NORMAL mode, call open_file_failed and clear the flag.
Set deferred_error in _cpp_FFK_HAS_INCLUDE mode if open_file_failed
hasn't been called.
gcc/testsuite/
* c-c++-common/missing-header-5.c: New test.

--- libcpp/files.cc.jj  2023-01-16 11:52:16.326730483 +0100
+++ libcpp/files.cc 2023-06-13 11:27:59.867465878 +0200
@@ -109,6 +109,10 @@ struct _cpp_file
/* If this file is implicitly preincluded.  */
bool implicit_preinclude : 1;
  
+  /* Set if a header wasn't found with __has_include or __has_include_next
+     and error should be emitted if it is included normally.  */
+  bool deferred_error : 1;
+
/* > 0: Known C++ Module header unit, <0: known not.  ==0, unknown  */
int header_unit : 2;
  };
@@ -523,7 +527,14 @@ _cpp_find_file (cpp_reader *pfile, const
cpp_file_hash_entry *entry
  = search_cache ((struct cpp_file_hash_entry *) *hash_slot, start_dir);
if (entry)
-return entry->u.file;
+{
+  if (entry->u.file->deferred_error && kind == _cpp_FFK_NORMAL)
+   {
+ open_file_failed (pfile, entry->u.file, angle_brackets, loc);
+ entry->u.file->deferred_error = false;
+   }
+  return entry->u.file;
+}
  
_cpp_file *file = make_cpp_file (start_dir, fname);

file->implicit_preinclude
@@ -589,6 +600,8 @@ _cpp_find_file (cpp_reader *pfile, const
  
  	if (kind != _cpp_FFK_HAS_INCLUDE)

  open_file_failed (pfile, file, angle_brackets, loc);
+   else
+ file->deferred_error = true;
break;
  }
  
--- gcc/testsuite/c-c++-common/missing-header-5.c.jj	2023-06-13 11:29:49.345931030 +0200
+++ gcc/testsuite/c-c++-common/missing-header-5.c	2023-06-13 11:25:34.952497526 +0200
@@ -0,0 +1,15 @@
+/* PR preprocessor/80753 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+#if __has_include("nonexistent.h")
+# error
+#endif
+
+#include "nonexistent.h"
+
+/* { dg-message "nonexistent.h" "nonexistent.h" { target *-*-* } 0 } */
+/* { dg-message "terminated" "terminated" { target *-*-* } 0 } */
+
+/* This declaration should not receive any diagnostic.  */
+foo bar;

Jakub





[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-06-15 Thread Stamatis Markianos-Wright via Gcc-patches

    Hi all,

    This is the 2/2 patch that contains the functional changes needed
    for MVE Tail Predicated Low Overhead Loops.  See my previous email
    for a general introduction of MVE LOLs.

    This support is added through the already existing loop-doloop
    mechanisms that are used for non-MVE dls/le looping.

    Mid-end changes are:

    1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other that -1 and for `count` to be an
   rtx containing a simple REG (which in this case will contain
   the number of elements to be processed), rather
   than an expression for calculating the number of iterations.
    2) Added a new df utility function: `df_bb_regno_only_def_find` that
   will return the DEF of a REG only if it is DEF-ed once within the
   basic block.

    And many things in the backend to implement the above optimisation:

    3)  Implement the `arm_predict_doloop_p` target hook to instruct the
    mid-end about Low Overhead Loops (MVE or not), as well as
    `arm_loop_unroll_adjust` which will prevent unrolling of any loops
    that are valid for becoming MVE Tail_Predicated Low Overhead Loops
    (unrolling can transform a loop in ways that invalidate the dlstp/
    letp transformation logic and the benefit of the dlstp/letp loop
    would be considerably higher than that of unrolling)
    4)  Appropriate changes to the define_expand of doloop_end, new
    patterns for dlstp and letp, new iterators,  unspecs, etc.
    5) `arm_mve_loop_valid_for_dlstp` and a number of checking functions:
   * `arm_mve_dlstp_check_dec_counter`
   * `arm_mve_dlstp_check_inc_counter`
   * `arm_mve_check_reg_origin_is_num_elems`
   * `arm_mve_check_df_chain_back_for_implic_predic`
   * `arm_mve_check_df_chain_fwd_for_implic_predic_impact`
   These all, in some way or another, run checks on the loop
   structure in order to determine if the loop is valid for dlstp/letp
   transformation.
    6) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function re-checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
    7) Various utility functions:
   *`arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.
   * `arm_mve_get_loop_vctp`
   * `arm_mve_get_vctp_lanes`
   * `arm_emit_mve_unpredicated_insn_to_seq`
   * `arm_get_required_vpr_reg`
   * `arm_get_required_vpr_reg_param`
   * `arm_get_required_vpr_reg_ret_val`
   * `arm_mve_vec_insn_is_predicated_with_this_predicate`
   * `arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate`

    No regressions on arm-none-eabi with various targets and on
    aarch64-none-elf. Thoughts on getting this into trunk?

    Thank you,
    Stam Markianos-Wright

    gcc/ChangeLog:

    * config/arm/arm-protos.h (arm_target_insn_ok_for_lob): 
Rename to...

    (arm_target_bb_ok_for_lob): ...this
    (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_LOOP_UNROLL_ADJUST): New.
    (TARGET_PREDICT_DOLOOP_P): New.
    (arm_block_set_vect):
    (arm_target_bb_ok_for_lob): Rename from
arm_target_insn_ok_for_lob.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_get_required_vpr_reg_param): New.
    (arm_get_required_vpr_reg_ret_val): New.
    (arm_mve_get_loop_vctp): New.
(arm_mve_vec_insn_is_unpredicated_or_uses_other_predicate): New.
    (arm_mve_vec_insn_is_predicated_with_this_predicate): New.
    (arm_mve_check_df_chain_back_for_implic_predic): New.
    (arm_mve_check_df_chain_fwd_for_implic_predic_impact): New.
    (arm_mve_check_reg_origin_is_num_elems): New.
    (arm_mve_dlstp_check_inc_counter): New.
    (arm_mve_dlstp_check_dec_counter): New.
    (arm_mve_loop_valid_for_dlstp): New.
    (arm_predict_doloop_p): New.
    (arm_loop_unroll_adjust): New.
    (arm_emit_mve_unpredicated_insn_to_seq): New.
    (arm_attempt_dlstp_transform): New.
    * config/arm/iterators.md (DLSTP): New.
    (mode1): Add DLSTP mappings.
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * df-core.cc (df_bb_regno_only_def_find): New.
    * df.h (df_bb_regno_only_def_find): New.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.

    gcc/testsuite/ChangeLog:

    * 

Re: [PATCH v2] c++: Accept elaborated-enum-base in system headers

2023-06-15 Thread Jason Merrill via Gcc-patches

On 6/14/23 09:31, Alex Coplan wrote:

Hi,

This is a v2 patch addressing feedback for:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621050.html

macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
of the form:

typedef enum T : BaseType T;

i.e. an elaborated-type-specifier with an additional enum-base.
Upstream LLVM can be made to accept the above construct with
-Wno-error=elaborated-enum-base.

This patch adds the -Welaborated-enum-base warning to GCC and adjusts
the C++ parser to emit this warning instead of rejecting this code
outright.

The macro expansion in the macOS headers occurs in the case that the
compiler declares support for enums with underlying type using
__has_feature, see
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html

GCC rejecting this construct outright means that GCC fails to bootstrap
on Darwin in the case that it (correctly) implements __has_feature and
declares support for C++ enums with underlying type.

With this patch, GCC can bootstrap on Darwin in combination with the
(WIP) __has_feature patch posted at:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html

Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
OK for trunk?

Thanks,
Alex

gcc/c-family/ChangeLog:

 * c.opt (Welaborated-enum-base): New.



+Welaborated-enum-base
+C++ ObjC++ Var(warn_elaborated_enum_base) Warning Init(1)
+Warn if an additional enum-base is used in an elaborated-type-specifier.
+That is, if an enum with given underlying type and no enumerator list
+is used in a declaration other than just a standalone declaration of the
+enum.


Just the first line of description here; the rest should go in 
doc/invoke.texi.


Jason



Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Robin Dapp wrote:

> > the minus in 'operand 2 - operand 3' should be a plus if the
> > bias is really zero or -1.  I suppose
> 
> Yes, that somehow got lost from when the bias was still +1.  Maybe
> Juzhe can fix this in the course of his patch.
> 
> > that's quite conservative.  I think you can do better when the
> > loads are aligned, reading an extra byte when ignoring the bias
> > is OK and you at least know the very first element is used.
> > For stores you would need to emit compare for all but
> > the first store of a group though ...
> 
> The implementation is a first shot and yes we could do a bit
> better but limiting to a single rgroup is IMHO the more severe
> restriction.  The pattern wasn't hit very often across SPEC
> either way.  I think overall proper masking is  more important for
> fixed-length vectors while length control might be more useful
> for variable-length vectors.  Just my gut feeling though, you're
> the expert there.
> 
> > That said, I'm still not seeing where you actually apply the bias.
> 
> We do
> 
> +
> +  int partial_load_bias = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +  if (partial_load_bias != 0)
> +{
> +  tree adjusted_len = rgc->bias_adjusted_ctrl;
> +  gassign *minus = gimple_build_assign (adjusted_len, PLUS_EXPR,
> +   rgc->controls[0],
> +   build_int_cst
> +   (TREE_TYPE (rgc->controls[0]),
> +partial_load_bias));
> +  gimple_seq_add_stmt (header_seq, minus);
> +}
> +
> 
> as well as
> 
> + if (use_bias_adjusted_len)
> +   {
> + gcc_assert (i == 0);
> + tree adjusted_len =
> +   make_temp_ssa_name (len_type, NULL, "adjusted_loop_len");
> + SSA_NAME_DEF_STMT (adjusted_len) = gimple_build_nop ();
> + rgl->bias_adjusted_ctrl = adjusted_len;
> +   }

Ah, OK.  It's a bit odd to have predicates on define_expand.  The
define_expand pattern is expected to only match either literal 0
or literal -1 (and consistently so for all len_ optabs) and thus
operand 2, the length, needs to be adjusted by the middle-end
to match up with the pattern supplied operand 3.

Richard.


Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Andrew Stubbs wrote:

> On 15/06/2023 10:58, Richard Biener wrote:
> > On Thu, 15 Jun 2023, Andrew Stubbs wrote:
> > 
> >> On 14/06/2023 15:29, Richard Biener wrote:
> >>>
> >>>
>  Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
> 
>  On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> > This implements fully masked vectorization or a masked epilog for
> > AVX512 style masks which single themselves out by representing
> > each lane with a single bit and by using integer modes for the mask
> > (both much like GCN).
> > AVX512 is also special in that it doesn't have any instruction
> > to compute the mask from a scalar IV like SVE has with while_ult.
> > Instead the masks are produced by vector compares and the loop
> > control retains the scalar IV (mainly to avoid dependences on
> > mask generation, a suitable mask test instruction is available).
> 
>  This also sounds like GCN. We currently use WHILE_ULT in the middle
>  end
>  which expands to a vector compare against a vector of stepped values.
>  This
>  requires an additional instruction to prepare the comparison vector
>  (compared to SVE), but the "while_ultv64sidi" pattern (for example)
>  returns
>  the DImode bitmask, so it works reasonably well.
> 
> > Like RVV, code generation prefers a decrementing IV, though IVOPTs
> > messes things up in some cases, removing that IV to eliminate
> > it with an incrementing one used for address generation.
> > One of the motivating testcases is from PR108410 which in turn
> > is extracted from x264 where large size vectorization shows
> > issues with small trip loops.  Execution time there improves
> > compared to classic AVX512 with AVX2 epilogues for the cases
> > of less than 32 iterations.
> > size   scalar    128    256    512   512e   512f
> >    1     9.42  11.32   9.35  11.17  15.13  16.89
> >    2     5.72   6.53   6.66   6.66   7.62   8.56
> >    3     4.49   5.10   5.10   5.74   5.08   5.73
> >    4     4.10   4.33   4.29   5.21   3.79   4.25
> >    6     3.78   3.85   3.86   4.76   2.54   2.85
> >    8     3.64   1.89   3.76   4.50   1.92   2.16
> >   12     3.56   2.21   3.75   4.26   1.26   1.42
> >   16     3.36   0.83   1.06   4.16   0.95   1.07
> >   20     3.39   1.42   1.33   4.07   0.75   0.85
> >   24     3.23   0.66   1.72   4.22   0.62   0.70
> >   28     3.18   1.09   2.04   4.20   0.54   0.61
> >   32     3.16   0.47   0.41   0.41   0.47   0.53
> >   34     3.16   0.67   0.61   0.56   0.44   0.50
> >   38     3.19   0.95   0.95   0.82   0.40   0.45
> >   42     3.09   0.58   1.21   1.13   0.36   0.40
> > 'size' specifies the number of actual iterations, 512e is for
> > a masked epilog and 512f for the fully masked loop.  From
> > 4 scalar iterations on the AVX512 masked epilog code is clearly
> > the winner; the fully masked variant is clearly worse, and
> > its size benefit is also tiny.
> 
>  Let me check I understand correctly. In the fully masked case, there is a
>  single loop in which a new mask is generated at the start of each
>  iteration. In the masked epilogue case, the main loop uses no masking
>  whatsoever, thus avoiding the need for generating a mask, carrying the
>  mask, inserting vec_merge operations, etc, and then the epilogue looks
>  much
>  like the fully masked case, but unlike smaller mode epilogues there is no
>  loop because the epilogue vector size is the same. Is that right?
> >>>
> >>> Yes.
> >>>
>  This scheme seems like it might also benefit GCN, in so much as it
>  simplifies the hot code path.
> 
>  GCN does not actually have smaller vector sizes, so there's no analogue
>  to
>  AVX2 (we pretend we have some smaller sizes, but that's because the
>  middle
>  end can't do masking everywhere yet, and it helps make some vector
>  constants smaller, perhaps).
> 
> > This patch does not enable using fully masked loops or
> > masked epilogues by default.  More work on cost modeling
> > and vectorization kind selection on x86_64 is necessary
> > for this.
> > Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> > which could be exploited further to unify some of the flags
> > we have right now but there didn't seem to be many easy things
> > to merge, so I'm leaving this for followups.
> > Mask requirements as registered by vect_record_loop_mask are kept in
> > their
> > original form and recorded in a hash_set now instead of being
> > processed to a vector of rgroup_controls.  Instead that's now
> > left to the final analysis phase which tries forming the rgroup_controls
> 

Re: [PATCH v7 0/6] c++, libstdc++: get std::is_object to dispatch to new built-in traits

2023-06-15 Thread Ken Matsui via Gcc-patches
Hi,

For those curious about the performance improvements of this patch, I
conducted a benchmark that instantiates 256k specializations of
is_object_v based on Patrick's code. You can find the benchmark code
at this link:

https://github.com/ken-matsui/gcc-benches/blob/main/is_object_benchmark.cc

On my computer, using the gcc HEAD of this patch for a release build,
the patch with -DUSE_BUILTIN took 64% less time and used 44-47% less
memory compared to not using it.

Sincerely,
Ken Matsui

On Mon, Jun 12, 2023 at 3:49 PM Ken Matsui  wrote:
>
> Hi,
>
> This patch series gets std::is_object to dispatch to built-in traits and
> implements the following built-in traits, on which std::object depends.
>
> * __is_reference
> * __is_function
> * __is_void
>
> std::is_object was depending on them with disjunction and negation.
>
> __not_<__or_<is_function<_Tp>, is_reference<_Tp>, is_void<_Tp>>>::type
>
> Therefore, this patch uses them directly instead of implementing an additional
> built-in trait __is_object, which makes the compiler slightly bigger and
> slower.
>
> __bool_constant<!(__is_reference(_Tp) || __is_function(_Tp) || __is_void(_Tp))>
>
> This would instantiate only __bool_constant<true> and __bool_constant<false>,
> which can be mostly shared. That is, the purpose of built-in traits is
> considered as achieved.
>
> Changes in v7
>
> * Removed an unnecessary new line.
>
> Ken Matsui (6):
>   c++: implement __is_reference built-in trait
>   libstdc++: use new built-in trait __is_reference for std::is_reference
>   c++: implement __is_function built-in trait
>   libstdc++: use new built-in trait __is_function for std::is_function
>   c++, libstdc++: implement __is_void built-in trait
>   libstdc++: make std::is_object dispatch to new built-in traits
>
>  gcc/cp/constraint.cc  |  9 +++
>  gcc/cp/cp-trait.def   |  3 +
>  gcc/cp/semantics.cc   | 12 
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  9 +++
>  gcc/testsuite/g++.dg/ext/is_function.C| 58 +++
>  gcc/testsuite/g++.dg/ext/is_reference.C   | 34 +++
>  gcc/testsuite/g++.dg/ext/is_void.C| 35 +++
>  gcc/testsuite/g++.dg/tm/pr46567.C |  6 +-
>  libstdc++-v3/include/bits/cpp_type_traits.h   | 15 -
>  libstdc++-v3/include/debug/helper_functions.h |  5 +-
>  libstdc++-v3/include/std/type_traits  | 51 
>  11 files changed, 216 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_reference.C
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_void.C
>
> --
> 2.41.0
>


Re: [libstdc++] [testsuite] xfail dbl from_chars for aarch64 rtems ldbl

2023-06-15 Thread Jonathan Wakely via Gcc-patches
On Thu, 15 Jun 2023, 01:49 Alexandre Oliva via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

>
> rtems, like vxworks, uses fast-float doubles for from_chars even for
> long double, so it loses precision, so expect the long double bits to
> fail on aarch64.
>
> Regstrapped on x86_64-linux-gnu, also tested on aarch64-rtems6.  Ok to
> install?
>

OK, thanks



>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/20_util/from_chars/4.cc: Skip long double on
> aarch64-rtems.
> ---
>  libstdc++-v3/testsuite/20_util/from_chars/4.cc |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> index 206e18daeb229..76e07df9d2bf3 100644
> --- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> +++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> @@ -18,7 +18,7 @@
>  //  is supported in C++14 as a GNU extension
>  // { dg-do run { target c++14 } }
>  // { dg-add-options ieee }
> -// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target
> aarch64-*-vxworks* x86_64-*-vxworks* } }
> +// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-rtems*
> aarch64-*-vxworks* x86_64-*-vxworks* } }
>
>  #include 
>  #include 
>
> --
> Alexandre Oliva, happy hacker       https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
>


Re: libstdc++-v3: do not duplicate some math functions when using newlib

2023-06-15 Thread Jonathan Wakely via Gcc-patches
On Thu, 15 Jun 2023, 01:46 Alexandre Oliva via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

>
> Contributing a patch by Joel Brobecker .
> Regstrapped on x86_64-linux-gnu just to be sure, also tested with
> aarch64-rtems6.  I'm going to put this in later this week if there
> aren't any objections.
>
>
> When running the libstdc++ testsuite on AArch64 RTEMS, we noticed
> that about 25 tests are failing during the link, due to the "sqrtl"
> function being defined twice:
>   - once inside RTEMS' libm;
>   - once inside our libstdc++.
>
> One test that fails, for instance, would be 26_numerics/complex/13450.cc.
>
> In comparing libm and libstdc++, we found that libstdc++ also
> duplicates "hypotf", and "hypotl".
>
> For "sqrtl" and "hypotl", the symbosl come a unit called
> from math_stubs_long_double.cc, while "hypotf" comes from
> the equivalent unit for the float version, called math_stubs_float.cc.
> Those units are always compiled in libstdc++ and provide our own
> version of various math routines when those are missing from
> the target system. The definition of those symbols is predicated
> on the existence of various macros provided by c++config.h, which
> themselves are predicated by the corresponding HAVE_xxx macros
> in config.h.
>
> One key element behind what's happening, here, is that the target
> uses newlib, and therefore GCC was configured --with-newlib.
> The section of libstdc++v3's configure script that handles which math
> functions are available has a newlib-specific section, and that
> section provides a hardcoded list of symbols.
>
> For "hypotf", this commit fixes the issue by doing the same
> as for the other routines already declared in that section.
> I verified by inspection in the newlib code that this function
> should always be present, so hardcoding it in our configure
> script should not be an issue.
>
> For the math routines handling doubles ("sqrtl" and "hypotl"),
> however, I do not believe we can assume that newlib's libm
> will always provide them. Therefore, this commit fixes that
> part of the issue by adding a compile-check for "sqrtl" and "hypotl".
> And while at it, we also include checks for all the other math
> functions that math_stubs_long_double.cc re-implements, allowing
> us to be resilient to future newlib enhancements adding support
> for more functions.
>

Excellent, I've been looking at this area of our configury and the math
stubs recently and this is a nice improvement.

OK for trunk, thanks.


> libstdc++-v3/ChangeLog:
>
> * configure.ac ["x${with_newlib}" = "xyes"]: Define
> HAVE_HYPOTF.  Add compile-checks for various long double
> math functions as well.
> * configure: Regenerate.
> ---
>  libstdc++-v3/configure| 1179
> +
>  libstdc++-v3/configure.ac |9
>  2 files changed, 1188 insertions(+)
>
> diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
> index 354c566b0055c..bda8053ecc279 100755
> [omitted]
> diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
> index 0abe54e7b9a21..9770c1787679f 100644
> --- a/libstdc++-v3/configure.ac
> +++ b/libstdc++-v3/configure.ac
> @@ -349,6 +349,7 @@ else
>  AC_DEFINE(HAVE_FLOORF)
>  AC_DEFINE(HAVE_FMODF)
>  AC_DEFINE(HAVE_FREXPF)
> +AC_DEFINE(HAVE_HYPOTF)
>  AC_DEFINE(HAVE_LDEXPF)
>  AC_DEFINE(HAVE_LOG10F)
>  AC_DEFINE(HAVE_LOGF)
> @@ -360,6 +361,14 @@ else
>  AC_DEFINE(HAVE_TANF)
>  AC_DEFINE(HAVE_TANHF)
>
> +dnl # Support for the long version of some math libraries depends on
> +dnl # architecture and newlib version.  So test for their availability
> +dnl # rather than hardcoding that information.
> +GLIBCXX_CHECK_MATH_DECLS([
> +  acosl asinl atan2l atanl ceill coshl cosl expl fabsl floorl fmodl
> +  frexpl hypotl ldexpl log10l logl modfl powl sinhl sinl sqrtl
> +  tanhl tanl])
> +
>  AC_DEFINE(HAVE_ICONV)
>  AC_DEFINE(HAVE_MEMALIGN)
>
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
>


Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs

On 15/06/2023 10:58, Richard Biener wrote:

On Thu, 15 Jun 2023, Andrew Stubbs wrote:


On 14/06/2023 15:29, Richard Biener wrote:




Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle end
which expands to a vector compare against a vector of stepped values. This
requires an additional instruction to prepare the comparison vector
(compared to SVE), but the "while_ultv64sidi" pattern (for example) returns
the DImode bitmask, so it works reasonably well.


Like RVV code generation prefers a decrementing IV though IVOPTs
messes things up in some cases removing that IV to eliminate
it with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size  scalar   128   256   512  512e  512f
   1    9.42 11.32  9.35 11.17 15.13 16.89
   2    5.72  6.53  6.66  6.66  7.62  8.56
   3    4.49  5.10  5.10  5.74  5.08  5.73
   4    4.10  4.33  4.29  5.21  3.79  4.25
   6    3.78  3.85  3.86  4.76  2.54  2.85
   8    3.64  1.89  3.76  4.50  1.92  2.16
  12    3.56  2.21  3.75  4.26  1.26  1.42
  16    3.36  0.83  1.06  4.16  0.95  1.07
  20    3.39  1.42  1.33  4.07  0.75  0.85
  24    3.23  0.66  1.72  4.22  0.62  0.70
  28    3.18  1.09  2.04  4.20  0.54  0.61
  32    3.16  0.47  0.41  0.41  0.47  0.53
  34    3.16  0.67  0.61  0.56  0.44  0.50
  38    3.19  0.95  0.95  0.82  0.40  0.45
  42    3.09  0.58  1.21  1.13  0.36  0.40
'size' specifies the number of actual iterations, 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on the AVX512 masked epilog code is clearly
the winner, the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is a
single loop in which a new mask is generated at the start of each
iteration. In the masked epilogue case, the main loop uses no masking
whatsoever, thus avoiding the need for generating a mask, carrying the
mask, inserting vec_merge operations, etc, and then the epilogue looks much
like the fully masked case, but unlike smaller mode epilogues there is no
loop because the epilogue vector size is the same. Is that right?


Yes.


This scheme seems like it might also benefit GCN, in so much as it
simplifies the hot code path.

GCN does not actually have smaller vector sizes, so there's no analogue to
AVX2 (we pretend we have some smaller sizes, but that's because the middle
end can't do masking everywhere yet, and it helps make some vector
constants smaller, perhaps).


This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.
I have decided against interweaving
vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
I was split between making 'vec_loop_masks' a class with methods,
possibly merging in the _len stuff into a single registry.  It
seemed to be too many changes for the purpose 

Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Robin Dapp via Gcc-patches
> the minus in 'operand 2 - operand 3' should be a plus if the
> bias is really zero or -1.  I suppose

Yes, that somehow got lost from when the bias was still +1.  Maybe
Juzhe can fix this in the course of his patch.

> that's quite conservative.  I think you can do better when the
> loads are aligned, reading an extra byte when ignoring the bias
> is OK and you at least know the very first element is used.
> For stores you would need to emit compare for all but
> the first store of a group though ...

The implementation is a first shot and yes we could do a bit
better but limiting to a single rgroup is IMHO the more severe
restriction.  The pattern wasn't hit very often across SPEC
either way.  I think overall proper masking is  more important for
fixed-length vectors while length control might be more useful
for variable-length vectors.  Just my gut feeling though, you're
the expert there.

> That said, I'm still not seeing where you actually apply the bias.

We do

+
+  int partial_load_bias = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+  if (partial_load_bias != 0)
+{
+  tree adjusted_len = rgc->bias_adjusted_ctrl;
+  gassign *minus = gimple_build_assign (adjusted_len, PLUS_EXPR,
+   rgc->controls[0],
+   build_int_cst
+   (TREE_TYPE (rgc->controls[0]),
+partial_load_bias));
+  gimple_seq_add_stmt (header_seq, minus);
+}
+

as well as

+ if (use_bias_adjusted_len)
+   {
+ gcc_assert (i == 0);
+ tree adjusted_len =
+   make_temp_ssa_name (len_type, NULL, "adjusted_loop_len");
+ SSA_NAME_DEF_STMT (adjusted_len) = gimple_build_nop ();
+ rgl->bias_adjusted_ctrl = adjusted_len;
+   }

Regards
 Robin


Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread juzhe.zh...@rivai.ai
Hi, Richi. I have sent the first split patch (adding only the ifn and optabs) as 
you suggested.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621874.html 
Could you take a look at it?
After this patch is approved, I will send the second patch (supporting them in 
the vectorizer) next.

Thanks!


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-15 17:52
To: Robin Dapp
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford; krebbel; uweigand; 
linkw
Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for length loop control
On Thu, 15 Jun 2023, Robin Dapp wrote:
 
> > Meh, PoP is now behind a paywall, trying to get through ... I wonder
> > if there's a nice online html documenting the s390 len_load/store
> > instructions to better understand the need for the bias.
> 
> https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> 
> Look for vector load with length (store).  The length operand specifies
> the highest bytes to load instead of the actual length.
 
Hmm.  It indeed cannot represent len == 0, so you are making sure
that never happens?  Because when it is actually zero you are
going to get -1 here?  At least I don't see the bias operand used at
all:
 
; Implement len_load/len_store optabs with vll/vstl.
(define_expand "len_load_v16qi"
  [(match_operand:V16QI 0 "register_operand")
   (match_operand:V16QI 1 "memory_operand")
   (match_operand:QI 2 "register_operand")
   (match_operand:QI 3 "vll_bias_operand")
  ]
  "TARGET_VX && TARGET_64BIT"
{
  rtx mem = adjust_address (operands[1], BLKmode, 0);
 
  rtx len = gen_reg_rtx (SImode);
  emit_move_insn (len, gen_rtx_ZERO_EXTEND (SImode, operands[2]));
  emit_insn (gen_vllv16qi (operands[0], len, mem));
  DONE;
})
 
the docs of len_load say
 
"
@cindex @code{len_load_@var{m}} instruction pattern
@item @samp{len_load_@var{m}}
Load (operand 2 - operand 3) elements from vector memory operand 1
into vector register operand 0, setting the other elements of
operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
which must be a vector mode.  Operand 2 has whichever integer mode the
target prefers.  Operand 3 conceptually has mode @code{QI}. 
 
Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
constant bias: it is either a constant 0 or a constant -1.  The predicate 
on
operand 3 must only accept the bias values that the target actually 
supports.
GCC handles a bias of 0 more efficiently than a bias of -1.
 
If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.
 
If the target prefers the length to be measured in bytes rather than
elements, it should only implement this pattern for vectors of @code{QI}
elements."
 
the minus in 'operand 2 - operand 3' should be a plus if the
bias is really zero or -1.  I suppose
 
'If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.'
 
means that the vectorizer has to make sure the biased element
count never underflows?
 
That is, for a loop like
 
void foo (double *x, float *y, int n)
{
  for (int i = 0; i < n; ++i)
y[i] = x[i];
}
 
you should get
 
   x1 = len_load (...);
   x2 = len_load (...);
   y = VEC_PACK_TRUNC_EXPR <x1, x2>;
   len_store (..., y);
 
but then the x2 load can end up with a len of zero and thus
trap (since you will load either a full vector or the first
byte of it).  I see you do
 
  /* If the backend requires a bias of -1 for LEN_LOAD, we must not emit
 len_loads with a length of zero.  In order to avoid that we prohibit
 more than one loop length here.  */
  if (partial_load_bias == -1
  && LOOP_VINFO_LENS (loop_vinfo).length () > 1)
return false;
 
that's quite conservative.  I think you can do better when the
loads are aligned, reading an extra byte when ignoring the bias
is OK and you at least know the very first element is used.
For stores you would need to emit compare for all but
the first store of a group though ...
 
That said, I'm still not seeing where you actually apply the bias.
 
Richard.
 


[PATCH V3] VECT: Support LEN_MASK_{LOAD,STORE} ifn && optabs

2023-06-15 Thread juzhe . zhong
From: Ju-Zhe Zhong 

According to comments from Richi, this splits out the first patch to add the
ifn && optabs of LEN_MASK_{LOAD,STORE} only; we don't apply them in the
vectorizer in this patch.  It also adds a BIAS argument for possible future
use by s390.

The descriptions of the patterns in the docs are coming from Robin.

After this patch is approved, I will send the second patch applying the
len_mask_* patterns in the vectorizer.

Target like ARM SVE in GCC has an elegant way to handle both loop control
and flow control simultaneously:

loop_control_mask = WHILE_ULT
flow_control_mask = comparison
control_mask = loop_control_mask & flow_control_mask;
MASK_LOAD (control_mask)
MASK_STORE (control_mask)

However, targets like RVV (RISC-V Vector) can not use this approach in
auto-vectorization since RVV use length in loop control.

This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
like RISC-V that uses length in loop control.
Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
Mask is the outcome of comparison.

LEN_MASK_ LOAD/STORE format is defined as follows:
1). LEN_MASK_LOAD (ptr, align, length, mask).
2). LEN_MASK_STORE (ptr, align, length, mask, vec).

Consider these 4 following cases:

VLA: Variable-length auto-vectorization
VLS: Specific-length auto-vectorization

Case 1 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    a[i] = b[i] + c[i];
IR (does not use LEN_MASK_*):
  v1 = MEM (...)
  v2 = MEM (...)
  v3 = v1 + v2
  MEM[...] = v3

Case 2 (VLS): -mrvv-vector-bits=128
Code:
  for (int i = 0; i < 4; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR (LEN_MASK_* with length = VF, mask = comparison):
  mask = comparison
  v1 = LEN_MASK_LOAD (length = VF, mask)
  v2 = LEN_MASK_LOAD (length = VF, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = VF, mask, v3)

Case 3 (VLA):
Code:
  for (int i = 0; i < n; i++)
    a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)

Case 4 (VLA):
Code:
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i] + c[i];
IR:
  loop_len = SELECT_VL or MIN
  mask = comparison
  v1 = LEN_MASK_LOAD (length = loop_len, mask)
  v2 = LEN_MASK_LOAD (length = loop_len, mask)
  v3 = v1 + v2
  LEN_MASK_STORE (length = loop_len, mask, v3)

Co-authored-by: Robin Dapp 

gcc/ChangeLog:

* doc/md.texi: Add len_mask{load,store}.
* genopinit.cc (main): Ditto.
(CMP_NAME): Ditto.
* internal-fn.cc (len_maskload_direct): Ditto.
(len_maskstore_direct): Ditto.
(expand_call_mem_ref): Ditto.
(expand_partial_load_optab_fn): Ditto.
(expand_len_maskload_optab_fn): Ditto.
(expand_partial_store_optab_fn): Ditto.
(expand_len_maskstore_optab_fn): Ditto.
(direct_len_maskload_optab_supported_p): Ditto.
(direct_len_maskstore_optab_supported_p): Ditto.
* internal-fn.def (LEN_MASK_LOAD): Ditto.
(LEN_MASK_STORE): Ditto.
* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi | 46 +
 gcc/genopinit.cc|  6 --
 gcc/internal-fn.cc  | 39 --
 gcc/internal-fn.def |  4 
 gcc/optabs.def  |  2 ++
 5 files changed, 93 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index a43fd65a2b2..af23ec938d6 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5136,6 +5136,52 @@ of @code{QI} elements.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{len_maskload@var{m}@var{n}} instruction pattern
+@item @samp{len_maskload@var{m}@var{n}}
+Perform a masked load (operand 2 - operand 4) elements from vector memory
+operand 1 into vector register operand 0, setting the other elements of
+operand 0 to undefined values.  This is a combination of len_load and 
maskload. 
+Operands 0 and 1 have mode @var{m}, which must be a vector mode.  Operand 2
+has whichever integer mode the target prefers.  A secondary mask is specified 
in
+operand 3 which must be of type @var{n}.  Operand 4 conceptually has mode 
@code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 4 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The 

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Sandiford via Gcc-patches
Andrew Stubbs  writes:
> One 
> comment: building a vector constant {0, 1, 2, 3, , 63} results in a 
> very large entry in the constant pool and an unnecessary memory load (it 
> literally has to use this sequence to generate the addresses to load the 
> constant!) Generating the sequence via VEC_SERIES would be a no-op, for 
> GCN, because we have an ABI-mandated register that already holds that 
> value. (Perhaps I have another piece missing here, IDK?)

A constant like that should become a CONST_VECTOR in RTL, so I think
the way to handle it would be to treat such a CONST_VECTOR as a valid
immediate operand, including providing an alternative for it in the
move patterns.  const_vec_series_p provides a quick way to test.

Thanks,
Richard


Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Andrew Stubbs wrote:

> On 14/06/2023 15:29, Richard Biener wrote:
> > 
> > 
> >> Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :
> >>
> >> On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
> >>> This implements fully masked vectorization or a masked epilog for
> >>> AVX512 style masks which single themselves out by representing
> >>> each lane with a single bit and by using integer modes for the mask
> >>> (both is much like GCN).
> >>> AVX512 is also special in that it doesn't have any instruction
> >>> to compute the mask from a scalar IV like SVE has with while_ult.
> >>> Instead the masks are produced by vector compares and the loop
> >>> control retains the scalar IV (mainly to avoid dependences on
> >>> mask generation, a suitable mask test instruction is available).
> >>
> >> This also sounds like GCN. We currently use WHILE_ULT in the middle end
> >> which expands to a vector compare against a vector of stepped values. This
> >> requires an additional instruction to prepare the comparison vector
> >> (compared to SVE), but the "while_ultv64sidi" pattern (for example) returns
> >> the DImode bitmask, so it works reasonably well.
> >>
> >>> Like RVV code generation prefers a decrementing IV though IVOPTs
> >>> messes things up in some cases removing that IV to eliminate
> >>> it with an incrementing one used for address generation.
> >>> One of the motivating testcases is from PR108410 which in turn
> >>> is extracted from x264 where large size vectorization shows
> >>> issues with small trip loops.  Execution time there improves
> >>> compared to classic AVX512 with AVX2 epilogues for the cases
> >>> of less than 32 iterations.
> >>> size  scalar   128   256   512  512e  512f
> >>>    1    9.42 11.32  9.35 11.17 15.13 16.89
> >>>    2    5.72  6.53  6.66  6.66  7.62  8.56
> >>>    3    4.49  5.10  5.10  5.74  5.08  5.73
> >>>    4    4.10  4.33  4.29  5.21  3.79  4.25
> >>>    6    3.78  3.85  3.86  4.76  2.54  2.85
> >>>    8    3.64  1.89  3.76  4.50  1.92  2.16
> >>>   12    3.56  2.21  3.75  4.26  1.26  1.42
> >>>   16    3.36  0.83  1.06  4.16  0.95  1.07
> >>>   20    3.39  1.42  1.33  4.07  0.75  0.85
> >>>   24    3.23  0.66  1.72  4.22  0.62  0.70
> >>>   28    3.18  1.09  2.04  4.20  0.54  0.61
> >>>   32    3.16  0.47  0.41  0.41  0.47  0.53
> >>>   34    3.16  0.67  0.61  0.56  0.44  0.50
> >>>   38    3.19  0.95  0.95  0.82  0.40  0.45
> >>>   42    3.09  0.58  1.21  1.13  0.36  0.40
> >>> 'size' specifies the number of actual iterations, 512e is for
> >>> a masked epilog and 512f for the fully masked loop.  From
> >>> 4 scalar iterations on the AVX512 masked epilog code is clearly
> >>> the winner, the fully masked variant is clearly worse and
> >>> its size benefit is also tiny.
> >>
> >> Let me check I understand correctly. In the fully masked case, there is a
> >> single loop in which a new mask is generated at the start of each
> >> iteration. In the masked epilogue case, the main loop uses no masking
> >> whatsoever, thus avoiding the need for generating a mask, carrying the
> >> mask, inserting vec_merge operations, etc, and then the epilogue looks much
> >> like the fully masked case, but unlike smaller mode epilogues there is no
> >> loop because the epilogue vector size is the same. Is that right?
> > 
> > Yes.
> > 
> >> This scheme seems like it might also benefit GCN, in so much as it
> >> simplifies the hot code path.
> >>
> >> GCN does not actually have smaller vector sizes, so there's no analogue to
> >> AVX2 (we pretend we have some smaller sizes, but that's because the middle
> >> end can't do masking everywhere yet, and it helps make some vector
> >> constants smaller, perhaps).
> >>
> >>> This patch does not enable using fully masked loops or
> >>> masked epilogues by default.  More work on cost modeling
> >>> and vectorization kind selection on x86_64 is necessary
> >>> for this.
> >>> Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
> >>> which could be exploited further to unify some of the flags
> >>> we have right now but there didn't seem to be many easy things
> >>> to merge, so I'm leaving this for followups.
> >>> Mask requirements as registered by vect_record_loop_mask are kept in their
> >>> original form and recorded in a hash_set now instead of being
> >>> processed to a vector of rgroup_controls.  Instead that's now
> >>> left to the final analysis phase which tries forming the rgroup_controls
> >>> vector using while_ult and if that fails now tries AVX512 style
> >>> which needs a different organization and instead fills a hash_map
> >>> with the relevant info.  vect_get_loop_mask now has two implementations,
> >>> one for the two mask styles we then have.
> >>> I have decided against interweaving
> >>> 

Re: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

2023-06-15 Thread Jin Ma via Gcc-patches
> diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> index 5b70ab20758..6349f032bc8 100644
> --- a/gcc/config/riscv/iterators.md
> +++ b/gcc/config/riscv/iterators.md
> @@ -61,10 +61,15 @@
>  ;; Iterator for hardware-supported floating-point modes.
>  (define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT || TARGET_ZFINX")
> (DF "TARGET_DOUBLE_FLOAT || TARGET_ZDINX")
> -   (HF "TARGET_ZFH || TARGET_ZHINX")])
> +   (HF "TARGET_ZFH || TARGET_ZHINX") 
> +(BF "TARGET_ZFBFMIN")])
> +
> +;; Iterator for HImode constant generation.
> +(define_mode_iterator BFHF [BF HF])
>  
>  ;; Iterator for floating-point modes that can be loaded into X registers.
> -(define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")])
> +(define_mode_iterator SOFTF [SF (DF "TARGET_64BIT") (HF "TARGET_ZFHMIN")
> +(BF "TARGET_ZFBFMIN")])
>  
>  
>  ;; ---
> @@ -76,27 +81,27 @@
>  (define_mode_attr size [(QI "b") (HI "h")])
>  
>  ;; Mode attributes for loads.
> -(define_mode_attr load [(QI "lb") (HI "lh") (SI "lw") (DI "ld") (HF "flh") 
> (SF "flw") (DF "fld")])
> +(define_mode_attr load [(QI "lb") (HI "lh") (SI "lw") (DI "ld") (BF "flh") 
> (HF "flh") (SF "flw") (DF "fld")])
>  
>  ;; Instruction names for integer loads that aren't explicitly sign or zero
>  ;; extended.  See riscv_output_move and LOAD_EXTEND_OP.
>  (define_mode_attr default_load [(QI "lbu") (HI "lhu") (SI "lw") (DI "ld")])
>  
>  ;; Mode attribute for FP loads into integer registers.
> -(define_mode_attr softload [(HF "lh") (SF "lw") (DF "ld")])
> +(define_mode_attr softload [(BF "lh") (HF "lh") (SF "lw") (DF "ld")])
>  
>  ;; Instruction names for stores.
> -(define_mode_attr store [(QI "sb") (HI "sh") (SI "sw") (DI "sd") (HF "fsh") 
> (SF "fsw") (DF "fsd")])
> +(define_mode_attr store [(QI "sb") (HI "sh") (SI "sw") (DI "sd") (BF "fsh") 
> (HF "fsh") (SF "fsw") (DF "fsd")])
>  
>  ;; Instruction names for FP stores from integer registers.
> -(define_mode_attr softstore [(HF "sh") (SF "sw") (DF "sd")])
> +(define_mode_attr softstore [(BF "sh") (HF "sh") (SF "sw") (DF "sd")])
>  
>  ;; This attribute gives the best constraint to use for registers of
>  ;; a given mode.
>  (define_mode_attr reg [(SI "d") (DI "d") (CC "d")])
>  
>  ;; This attribute gives the format suffix for floating-point operations.
> -(define_mode_attr fmt [(HF "h") (SF "s") (DF "d")])
> +(define_mode_attr fmt [(BF "h") (HF "h") (SF "s") (DF "d")])
>  
>  ;; This attribute gives the integer suffix for floating-point conversions.
>  (define_mode_attr ifmt [(SI "w") (DI "l")])
> @@ -106,7 +111,7 @@
>  
>  ;; This attribute gives the upper-case mode name for one unit of a
>  ;; floating-point mode.
> -(define_mode_attr UNITMODE [(HF "HF") (SF "SF") (DF "DF")])
> +(define_mode_attr UNITMODE [(BF "BF") (HF "HF") (SF "SF") (DF "DF")])
>  

There are also some problems here, which cannot simply be handled like HF. 
Many instructions support HF but do not support BF. For example, fadd.h
can be used for HF but cannot be used for BF. 

I guess the value may need to be converted to SF first, then use fadd.s, and
finally be converted back to BF.  I'm not so sure.

Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Robin Dapp wrote:

> > Meh, PoP is now behind a paywall, trying to get through ... I wonder
> > if there's a nice online html documenting the s390 len_load/store
> > instructions to better understand the need for the bias.
> 
> https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
> 
> Look for vector load with length (store).  The length operand specifies
> the highest bytes to load instead of the actual length.

Hmm.  It indeed cannot represent len == 0, so you are making sure
that never happens?  Because when it is actually zero you are
going to get -1 here?  At least I don't see the bias operand used at
all:

; Implement len_load/len_store optabs with vll/vstl.
(define_expand "len_load_v16qi"
  [(match_operand:V16QI 0 "register_operand")
   (match_operand:V16QI 1 "memory_operand")
   (match_operand:QI 2 "register_operand")
   (match_operand:QI 3 "vll_bias_operand")
  ]
  "TARGET_VX && TARGET_64BIT"
{
  rtx mem = adjust_address (operands[1], BLKmode, 0);

  rtx len = gen_reg_rtx (SImode);
  emit_move_insn (len, gen_rtx_ZERO_EXTEND (SImode, operands[2]));
  emit_insn (gen_vllv16qi (operands[0], len, mem));
  DONE;
})

the docs of len_load say

"
@cindex @code{len_load_@var{m}} instruction pattern
@item @samp{len_load_@var{m}}
Load (operand 2 - operand 3) elements from vector memory operand 1
into vector register operand 0, setting the other elements of
operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
which must be a vector mode.  Operand 2 has whichever integer mode the
target prefers.  Operand 3 conceptually has mode @code{QI}. 

Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
constant bias: it is either a constant 0 or a constant -1.  The predicate 
on
operand 3 must only accept the bias values that the target actually 
supports.
GCC handles a bias of 0 more efficiently than a bias of -1.

If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.

If the target prefers the length to be measured in bytes rather than
elements, it should only implement this pattern for vectors of @code{QI}
elements."

the minus in 'operand 2 - operand 3' should be a plus if the
bias is really zero or -1.  I suppose

'If (operand 2 - operand 3) exceeds the number of elements in mode
@var{m}, the behavior is undefined.'

means that the vectorizer has to make sure the biased element
count never underflows?

That is, for a loop like

void foo (double *x, float *y, int n)
{
  for (int i = 0; i < n; ++i)
y[i] = x[i];
}

you should get

   x1 = len_load (...);
   x2 = len_load (...);
   y = VEC_PACK_TRUNC_EXPR <x1, x2>;
   len_store (..., y);

but then the x2 load can end up with a len of zero and thus
trap (since you will load either a full vector or the first
byte of it).  I see you do

  /* If the backend requires a bias of -1 for LEN_LOAD, we must not emit
 len_loads with a length of zero.  In order to avoid that we prohibit
 more than one loop length here.  */
  if (partial_load_bias == -1
  && LOOP_VINFO_LENS (loop_vinfo).length () > 1)
return false;

that's quite conservative.  I think you can do better when the
loads are aligned, reading an extra byte when ignoring the bias
is OK and you at least know the very first element is used.
For stores you would need to emit compare for all but
the first store of a group though ...

That said, I'm still not seeing where you actually apply the bias.

Richard.


[PATCH] value-prof.cc: Correct edge prob calculation.

2023-06-15 Thread Filip Kastl via Gcc-patches
The mod-subtract optimization with ncounts == 1 produced incorrect edge
probabilities: the probability of the second branch was computed against
all executions instead of conditionally on the first branch not being
taken. This patch fixes the calculation.

gcc/ChangeLog:

* value-prof.cc (gimple_mod_subtract_transform): Correct edge
  prob calculation.

Signed-off-by: Filip Kastl 
---
 gcc/value-prof.cc | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/value-prof.cc b/gcc/value-prof.cc
index f40e58ac4f2..580d6dd648d 100644
--- a/gcc/value-prof.cc
+++ b/gcc/value-prof.cc
@@ -1186,7 +1186,11 @@ gimple_mod_subtract_transform (gimple_stmt_iterator *si)
   if (all > 0)
 {
   prob1 = profile_probability::probability_in_gcov_type (count1, all);
-  prob2 = profile_probability::probability_in_gcov_type (count2, all);
+  if (all == count1)
+   prob2 = profile_probability::even ();
+  else
+   prob2 = profile_probability::probability_in_gcov_type (count2, all -
+  count1);
 }
   else
 {
-- 
2.40.1



Re: [PATCH v2] c++: Accept elaborated-enum-base in system headers

2023-06-15 Thread Iain Sandoe
Hi Alex

> On 14 Jun 2023, at 14:31, Alex Coplan  wrote:
> 
> Hi,
> 
> This is a v2 patch addressing feedback for:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621050.html
> 
> macOS SDK headers using the CF_ENUM macro can expand to invalid C++ code
> of the form:
> 
> typedef enum T : BaseType T;
> 
> i.e. an elaborated-type-specifier with an additional enum-base.
> Upstream LLVM can be made to accept the above construct with
> -Wno-error=elaborated-enum-base.
> 
> This patch adds the -Welaborated-enum-base warning to GCC and adjusts
> the C++ parser to emit this warning instead of rejecting this code
> outright.
> 
> The macro expansion in the macOS headers occurs in the case that the
> compiler declares support for enums with underlying type using
> __has_feature, see
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618450.html
> 
> GCC rejecting this construct outright means that GCC fails to bootstrap
> on Darwin in the case that it (correctly) implements __has_feature and
> declares support for C++ enums with underlying type.
> 
> With this patch, GCC can bootstrap on Darwin in combination with the
> (WIP) __has_feature patch posted at:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
> 
> Bootstrapped/regtested on aarch64-linux-gnu and x86_64-apple-darwin.
> OK for trunk?

For the record, I have tried this on Darwin21 and Darwin17 together with
my patches for __attribute__((availability(….))) which is also instrumental
in exposing more of the SDK content than we have previously been able
to parse on GCC.  Bootstrap succeeded there, and I will try on some older
versions as time permits (likely after WG21 meeting is done).

thanks
Iain

> 
> Thanks,
> Alex
> 
> gcc/c-family/ChangeLog:
> 
>* c.opt (Welaborated-enum-base): New.
> 
> gcc/cp/ChangeLog:
> 
>* parser.cc (cp_parser_enum_specifier): Don't reject
>elaborated-type-specifier with enum-base, instead emit new
>Welaborated-enum-base warning.
> 
> gcc/testsuite/ChangeLog:
> 
>* g++.dg/cpp0x/enum40.C: Adjust expected diagnostics.
>* g++.dg/cpp0x/forw_enum6.C: Likewise.
>* g++.dg/cpp0x/elab-enum-base.C: New test.
> 



Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs

On 14/06/2023 15:29, Richard Biener wrote:




Am 14.06.2023 um 16:27 schrieb Andrew Stubbs :

On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:

This implements fully masked vectorization or a masked epilog for
AVX512-style masks, which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both much like GCN).
AVX512 is also special in that it doesn't have any instruction
to compute the mask from a scalar IV like SVE has with while_ult.
Instead the masks are produced by vector compares and the loop
control retains the scalar IV (mainly to avoid dependences on
mask generation, a suitable mask test instruction is available).


This also sounds like GCN. We currently use WHILE_ULT in the middle end, which expands 
to a vector compare against a vector of stepped values. This requires an additional 
instruction to prepare the comparison vector (compared to SVE), but the 
"while_ultv64sidi" pattern (for example) returns the DImode bitmask, so it 
works reasonably well.
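A scalar reference model of that WHILE_ULT expansion may make this concrete (the names here are illustrative, not GCC internals): lane i of the mask is set iff base + i is below the limit, which on GCN is realized as a vector compare of a stepped vector against the broadcast limit, yielding an integer bitmask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of WHILE_ULT for a small vector: lane i
   of the result mask is set iff base + i < limit.  On GCN this is a
   vector compare of a stepped vector {base+0, base+1, ...} against the
   broadcast limit, producing one bit per lane in an integer mode.  */
static uint64_t
while_ult (uint64_t base, uint64_t limit, unsigned nlanes)
{
  uint64_t mask = 0;
  for (unsigned i = 0; i < nlanes; i++)
    if (base + i < limit)            /* the per-lane compare */
      mask |= UINT64_C (1) << i;     /* one bit per lane */
  return mask;
}
```

For a loop of 10 iterations with 8 lanes, the first pass gets an all-ones mask and the second pass a mask with only the two remaining lanes set.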


Like RVV, code generation prefers a decrementing IV, though IVOPTs
messes things up in some cases, removing that IV to eliminate
it with an incrementing one used for address generation.
One of the motivating testcases is from PR108410 which in turn
is extracted from x264 where large size vectorization shows
issues with small trip loops.  Execution time there improves
compared to classic AVX512 with AVX2 epilogues for the cases
of less than 32 iterations.
size   scalar    128    256    512   512e   512f
   1     9.42  11.32   9.35  11.17  15.13  16.89
   2     5.72   6.53   6.66   6.66   7.62   8.56
   3     4.49   5.10   5.10   5.74   5.08   5.73
   4     4.10   4.33   4.29   5.21   3.79   4.25
   6     3.78   3.85   3.86   4.76   2.54   2.85
   8     3.64   1.89   3.76   4.50   1.92   2.16
  12     3.56   2.21   3.75   4.26   1.26   1.42
  16     3.36   0.83   1.06   4.16   0.95   1.07
  20     3.39   1.42   1.33   4.07   0.75   0.85
  24     3.23   0.66   1.72   4.22   0.62   0.70
  28     3.18   1.09   2.04   4.20   0.54   0.61
  32     3.16   0.47   0.41   0.41   0.47   0.53
  34     3.16   0.67   0.61   0.56   0.44   0.50
  38     3.19   0.95   0.95   0.82   0.40   0.45
  42     3.09   0.58   1.21   1.13   0.36   0.40
'size' specifies the number of actual iterations; 512e is for
a masked epilog and 512f for the fully masked loop.  From
4 scalar iterations on, the AVX512 masked epilog code is clearly
the winner; the fully masked variant is clearly worse and
its size benefit is also tiny.


Let me check I understand correctly. In the fully masked case, there is a 
single loop in which a new mask is generated at the start of each iteration. In 
the masked epilogue case, the main loop uses no masking whatsoever, thus 
avoiding the need for generating a mask, carrying the mask, inserting vec_merge 
operations, etc, and then the epilogue looks much like the fully masked case, 
but unlike smaller mode epilogues there is no loop because the epilogue vector 
size is the same. Is that right?


Yes.
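The confirmed scheme can be sketched in scalar C, assuming an 8-lane vector factor (illustrative only, not the vectorizer's actual output): the main loop runs on full chunks with no masking, and the remainder is handled by exactly one masked pass, with no epilogue loop since the epilogue vector size equals the main one.

```c
#include <assert.h>
#include <stdint.h>

#define VF 8  /* illustrative vector factor */

/* Scalar model of "unmasked main loop + single masked epilogue".  */
static void
add_arrays (int *a, const int *b, int n)
{
  int i = 0;
  for (; i + VF <= n; i += VF)        /* unmasked main loop */
    for (int l = 0; l < VF; l++)
      a[i + l] += b[i + l];

  if (i < n)                          /* one masked epilogue pass */
    {
      uint8_t mask = (uint8_t) ((1u << (n - i)) - 1);
      for (int l = 0; l < VF; l++)
        if (mask & (1u << l))         /* lane active iff i + l < n */
          a[i + l] += b[i + l];
    }
}
```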


This scheme seems like it might also benefit GCN, in so much as it simplifies 
the hot code path.

GCN does not actually have smaller vector sizes, so there's no analogue to AVX2 
(we pretend we have some smaller sizes, but that's because the middle end can't 
do masking everywhere yet, and it helps make some vector constants smaller, 
perhaps).


This patch does not enable using fully masked loops or
masked epilogues by default.  More work on cost modeling
and vectorization kind selection on x86_64 is necessary
for this.
Implementation wise this introduces LOOP_VINFO_PARTIAL_VECTORS_STYLE
which could be exploited further to unify some of the flags
we have right now but there didn't seem to be many easy things
to merge, so I'm leaving this for followups.
Mask requirements as registered by vect_record_loop_mask are kept in their
original form and recorded in a hash_set now instead of being
processed to a vector of rgroup_controls.  Instead that's now
left to the final analysis phase which tries forming the rgroup_controls
vector using while_ult and if that fails now tries AVX512 style
which needs a different organization and instead fills a hash_map
with the relevant info.  vect_get_loop_mask now has two implementations,
one for the two mask styles we then have.
I have decided against interweaving vect_set_loop_condition_partial_vectors
with conditions to do AVX512 style masking and instead opted to
"duplicate" this to vect_set_loop_condition_partial_vectors_avx512.
Likewise for vect_verify_full_masking vs vect_verify_full_masking_avx512.
I was split between making 'vec_loop_masks' a class with methods,
possibly merging in the _len stuff into a single registry.  It
seemed to be too many changes for the purpose of getting AVX512
working.  I'm going to play wait and see what happens with RISC-V
here 

Re: [PATCH] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2023-06-15 Thread Bernhard Reutner-Fischer via Gcc-patches
On 13 June 2023 17:11:13 CEST, Martin Jambor  wrote:
>Ping.

s/funtction/function/
s/runing/running/
>
>Thanks,


Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Robin Dapp via Gcc-patches
On 6/15/23 11:18, Robin Dapp wrote:
>> Meh, PoP is now behind a paywall, trying to get through ... I wonder
>> if there's a nice online html documenting the s390 len_load/store
>> instructions to better understand the need for the bias.

This is z16, but obviously no changes for vll/vstl:
https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf

Regards
 Robin


Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Robin Dapp via Gcc-patches
> Meh, PoP is now behind a paywall, trying to get through ... I wonder
> if there's a nice online html documenting the s390 len_load/store
> instructions to better understand the need for the bias.

https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf

Look for vector load with length (store).  The length operand specifies
the highest byte index to load rather than the actual length.
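That semantic difference is exactly where the bias comes from; a toy model (function names are placeholders, not actual GCC or s390 interfaces) might look like this:

```c
#include <assert.h>
#include <string.h>

/* Illustrative model of the s390 "vector load with length" (vll)
   semantics described above: the operand is the highest byte index to
   load, so the instruction always touches bytes 0..idx and cannot
   express a zero-length load.  A generic len_load of LEN bytes is thus
   emitted as vll (LEN - 1), folding in a bias of -1.  */
static void
vll_model (unsigned char *dst, const unsigned char *src, long highest_idx)
{
  memcpy (dst, src, (size_t) (highest_idx + 1));  /* bytes 0..highest_idx */
}

static void
len_load_model (unsigned char *dst, const unsigned char *src,
                long len, long bias)
{
  /* The vectorizer folds the bias into the length before emitting vll.  */
  vll_model (dst, src, len + bias);
}
```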

Regards
 Robin


Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Richard Biener via Gcc-patches
On Thu, 15 Jun 2023, Robin Dapp wrote:

> >>> Can you try using the same wording for length and mask operands
> >>> as for len_load and maskload?  Also len_load has the "bias"
> >>> operand which you omit here - IIRC that was added for s390 which
> >>> for unknown reason behaves a little different than power.  If
> >>> len support for s390 ever extends to other ops or power or s390
> >>> gain mask support for conditional code we'd likely have to adjust
> >>> each optab you add.  Maybe it's better to add the bias operand
> >>> now.
> > 
> > I don't know BIAS well, and it seems to be a Power target-dependent feature.
> > I think len_mask_* in general should only need length and mask operands.
> > Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.
> > 
> > Could I just keep the current format (without the BIAS argument) and
> > extend it with BIAS if PowerPC wants to use LEN_MASK_*?
> 
> FYI: The only proper user of bias is s390 because the insns cannot handle a
> zero length.  Power doesn't actually need it.  What we do is just subtract the
> bias (== 1) from the length in case of n_rgroups == 1 and nothing for
> bias == 0, so the actual bias support code needed is small.

Meh, PoP is now behind a paywall, trying to get through ... I wonder
if there's a nice online html documenting the s390 len_load/store
instructions to better understand the need for the bias.

Richard.


Re: [PATCH 4/4] rs6000: build constant via li/lis;rldic

2023-06-15 Thread guojiufu via Gcc-patches

On 2023-06-13 17:18, Jiufu Guo via Gcc-patches wrote:

Hi David,

Thanks for your valuable comments!

David Edelsohn  writes:



...
Do you have any measurement of how expensive it is to test all of
these additional methods to generate a constant?  How much does this
affect the compile time?


Yes, thanks for this very good question!
This patch mostly uses bitwise operations and if-conditions,
so it is not expected to be expensive.

Testcases were checked.  For example:
A case with ~1000 constants: most of them hit this feature.
With this feature, the compile time is slightly faster.

0m1.985s (without patch) vs. 0m1.874s (with patch)
(note: rs6000_emit_set_long_const does not occur among the hot
functions in the perf profile, so the small time saving may not be
directly caused by this feature.)

A case with ~1000 constants (most of which are not hit by this feature):
0m2.493s (without patch) vs. 0m2.558s (with patch).


Typo, this should be:
0m2.493s (with patch) vs. 0m2.558s (without patch).

It is also faster with the patch :)

BR,
Jeff (Jiufu Guo)



For run time, with the patch, there seems to be no visible
improvement in SPEC2017.  Still, I feel this patch is
doing the right thing: using fewer instructions to build the constant.

BR,
Jeff (Jiufu Guo)



Thanks, David





Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-15 Thread Xi Ruoyao via Gcc-patches
Xuerui: I guess this makes it sensible to show "ret" instead of "jirl
$zero, $ra, 0" in objdump -d output, but I don't know how to implement
it.  Do you have any ideas?

On Thu, 2023-06-15 at 16:27 +0800, Lulu Cheng wrote:
> Pushed to trunk and gcc-12 gcc-13.
> r14-1866
> r13-7448
> r12-9698
> 
> On 2023/6/15 at 9:30 AM, Lulu Cheng wrote:
> > Micro-architecture unconditionally treats a "jr $ra" as "return from
> > subroutine",
> > hence doing "jr $ra" would interfere with both subroutine return
> > prediction and
> > the more general indirect branch prediction.
> > 
> > Therefore, a problem like PR110136 can cause a significant increase
> > in the branch misprediction rate and affect performance.  The same
> > problem exists with "indirect_jump".
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch.md: Modify the register
> > constraints for template
> > "jumptable" and "indirect_jump" from "r" to "e".
> > 
> > Co-authored-by: Andrew Pinski 
> > ---
> > v1 -> v2:
> >    1. Modify the description.
> >    2. Modify the register constraints of the template
> > "indirect_jump".
> > v2 -> v3:
> >    1. Modify the description.
> > ---
> >   gcc/config/loongarch/loongarch.md | 8 ++--
> >   1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/gcc/config/loongarch/loongarch.md
> > b/gcc/config/loongarch/loongarch.md
> > index 816a943d155..b37e070660f 100644
> > --- a/gcc/config/loongarch/loongarch.md
> > +++ b/gcc/config/loongarch/loongarch.md
> > @@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
> >   }
> >     [(set_attr "type" "branch")])
> >   
> > +;; Micro-architecture unconditionally treats a "jr $ra" as "return
> > from subroutine",
> > +;; non-returning indirect jumps through $ra would interfere with
> > both subroutine
> > +;; return prediction and the more general indirect branch
> > prediction.
> > +
> >   (define_expand "indirect_jump"
> >     [(set (pc) (match_operand 0 "register_operand"))]
> >     ""
> > @@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
> >   })
> >   
> >   (define_insn "@indirect_jump"
> > -  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
> > +  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
> >     ""
> >     "jr\t%0"
> >     [(set_attr "type" "jump")
> > @@ -2928,7 +2932,7 @@ (define_expand "tablejump"
> >   
> >   (define_insn "@tablejump"
> >     [(set (pc)
> > -   (match_operand:P 0 "register_operand" "r"))
> > +   (match_operand:P 0 "register_operand" "e"))
> >  (use (label_ref (match_operand 1 "" "")))]
> >     ""
> >     "jr\t%0"
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread juzhe.zh...@rivai.ai
OK. I will add the BIAS argument too, since s390 may need it,
even though we (RVV) don't really need it.

Thanks. Will send a small patch V3 soon.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-15 16:58
To: juzhe.zh...@rivai.ai; rguenther
CC: rdapp.gcc; gcc-patches; richard.sandiford; krebbel; uweigand; linkw
Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for length loop control
>>> Can you try using the same wording for length and mask operands
>>> as for len_load and maskload?  Also len_load has the "bias"
>>> operand which you omit here - IIRC that was added for s390 which
>>> for unknown reason behaves a little different than power.  If
>>> len support for s390 ever extends to other ops or power or s390
>>> gain mask support for conditional code we'd likely have to adjust
>>> each optab you add.  Maybe it's better to add the bias operand
>>> now.
> 
> I don't know BIAS well, and it seems to be a Power target-dependent feature.
> I think len_mask_* in general should only need length and mask operands.
> Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.
> 
> Could I just keep the current format (without the BIAS argument) and
> extend it with BIAS if PowerPC wants to use LEN_MASK_*?
 
FYI: The only proper user of bias is s390 because the insns cannot handle a
zero length.  Power doesn't actually need it.  What we do is just subtract the
bias (== 1) from the length in case of n_rgroups == 1 and nothing for
bias == 0, so the actual bias support code needed is small.
 
Regards
Robin
 


Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Robin Dapp via Gcc-patches
>>> Can you try using the same wording for length and mask operands
>>> as for len_load and maskload?  Also len_load has the "bias"
>>> operand which you omit here - IIRC that was added for s390 which
>>> for unknown reason behaves a little different than power.  If
>>> len support for s390 ever extends to other ops or power or s390
>>> gain mask support for conditional code we'd likely have to adjust
>>> each optab you add.  Maybe it's better to add the bias operand
>>> now.
> 
> I don't know BIAS well, and it seems to be a Power target-dependent feature.
> I think len_mask_* in general should only need length and mask operands.
> Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.
> 
> Could I just keep the current format (without the BIAS argument) and
> extend it with BIAS if PowerPC wants to use LEN_MASK_*?

FYI: The only proper user of bias is s390 because the insns cannot handle a
zero length.  Power doesn't actually need it.  What we do is just subtract the
bias (== 1) from the length in case of n_rgroups == 1 and nothing for
bias == 0, so the actual bias support code needed is small.

Regards
 Robin


Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread juzhe.zh...@rivai.ai
>> I don't know BIAS well, and it seems to be a Power target-dependent feature.
>> I think len_mask_* in general should only need length and mask operands.
>> Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.

>> Could I just keep the current format (without the BIAS argument) and
>> extend it with BIAS if PowerPC wants to use LEN_MASK_*?

Oh, sorry, I forgot about that information.  I will add the BIAS argument too.
Thanks for the comments.
Will send a split patch with only the IFN and optab parts for review.

Thanks a lot.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-06-15 16:47
To: rguenther
CC: gcc-patches; richard.sandiford; krebbel; uweigand
Subject: Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for length loop control
Hi, Richard.

Thanks for comments.

>>Can you try using the same wording for length and mask operands
>>as for len_load and maskload?  Also len_load has the "bias"
>>operand which you omit here - IIRC that was added for s390 which
>>for unknown reason behaves a little different than power.  If
>>len support for s390 ever extends to other ops or power or s390
>>gain mask support for conditional code we'd likely have to adjust
>>each optab you add.  Maybe it's better to add the bias operand
>>now.

I don't know BIAS well, and it seems to be a Power target-dependent feature.
I think len_mask_* in general should only need length and mask operands.
Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.

Could I just keep the current format (without the BIAS argument) and
extend it with BIAS if PowerPC wants to use LEN_MASK_*?


>> Can you please split the patch?  Have 1/n add the optab and ifn
>>plus adjust the generic ifn predicates.  Have 2/n adjust the vectorizer
>>parts and 3/n optional things such as DSE.
Ok.

>>Can you instead add a len_p argument to the function and do

>>if (len_p)
>>  check len is full
>>if (mask_p)
>>  check mask is full

>>?


>>Use internal_fn_stored_value_index and internal_fn_mask_index,
>>possibly add internal_fn_len_index?

>>why deviate from can_vec_mask_load_store_p and not pass in the
>>mask_mode?  In fact I wonder why this function differs from
>>can_vec_mask_load_store_p besides using other optabs?  Couldn't
>>we simply add a bool with_len argument to can_vec_mask_load_store_p?

>>But if-conversion only needs the conditional masking, not _len.  I
>>don't think we need to check this at all?  In fact
>>can_vec_mask_load_store_p should probably return true when
>>LEN_MASKLOAD is available since we can always use the full vector
>>len as len argument?  The suggested bool argument with_len
>>could also become an enum { WITH_LEN, WITHOUT_LEN, EITHER },
>>but not sure if that's really necessary.

Ok.


>>Are you going to provide len_load, maskload and len_maskload
>>patterns or just len_maskload?  (I hope the last)

I just want to enable only len_maskload in the RISC-V port (no len_load
or maskload).
As I said in the commit log:

for (int i = 0; i < n; i++)
  a[i] = a[i] + b[i];

has length, no mask ==> len_maskload/len_maskstore (length, mask = {1,1,1,1,...})

for (int i = 0; i < 4; i++)
  if (cond[i])
    a[i] = a[i] + b[i];

no length, has mask ==> len_maskload/len_maskstore (length = vf, mask = comparison)
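The semantics described above can be sketched as a scalar reference model (illustrative only; LEN_MASK_LOAD/STORE are GCC internal functions, not this C API): an element is active iff its index is below the length AND its mask bit is set, so length-only control passes an all-ones mask and mask-only control passes length == vf.

```c
#include <assert.h>

#define VF 4  /* illustrative vector factor */

/* Reference model of a LEN_MASK store: element i is written iff
   i < len && mask[i].  */
static void
len_mask_store (int *dst, const int *src, int len, const int *mask)
{
  for (int i = 0; i < VF; i++)
    if (i < len && mask[i])
      dst[i] = src[i];
}
```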


>> common with IFN_MASK_STORE by using internal_fn_stored_value_index
Ok.

>> internal_fn_stored_value_index
ok

>> That's too simple, the code using the info is not prepared for
>> LEN_MASK since it handles maskload and len_load separately.
>> I suggest to drop this, picking it up separately.
ok

>> I think you need to verify the length operand is the full vector, note
>> this is for if-conversion which could care less for _LEN, but if we
>> insist on using _LEN (I didn't see you changing if-conversion that
>> way?!) then we need to put in some value even for the "scalar"
>> placeholder.  I'd suggest to simply use IFN_MASK_{LOAD,STORE} in
>> if-conversion but vectorize that as LEN_ with full length if
>> plain MASK_LOAD/STORE isn't available.  Which means these changes
>> are not necessary at all.

ok

>>as with can_mask_* we don't really care if masking is supported or
>>not so I suggest to amend get_len_load_store_mode to also check the
>>len_mask{load,store} optabs?  Given that the function
>>should possibly return the corresponding IFN as well.
ok

>>use the proper ifn index compute fn
ok

>>so this answers my question - you just have len_mask{load,store}?
Yes.


>>I think we really want to common this somehow, having
>>if (loop_lens) do the final_len compute and then afterwards
>>select the IFN to create, filling required default args of
>>final_mask and final_len if not computed.
ok.

>>and split this out to a helper function
ok.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-15 16:06
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; krebbel; uweigand
Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for 

Re: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread juzhe.zh...@rivai.ai
Hi, Richard.

Thanks for comments.

>>Can you try using the same wording for length and mask operands
>>as for len_load and maskload?  Also len_load has the "bias"
>>operand which you omit here - IIRC that was added for s390 which
>>for unknown reason behaves a little different than power.  If
>>len support for s390 ever extends to other ops or power or s390
>>gain mask support for conditional code we'd likely have to adjust
>>each optab you add.  Maybe it's better to add the bias operand
>>now.

I don't know BIAS well, and it seems to be a Power target-dependent feature.
I think len_mask_* in general should only need length and mask operands.
Actually, the function arguments are exactly the same as vp_load/vp_store in LLVM.

Could I just keep the current format (without the BIAS argument) and
extend it with BIAS if PowerPC wants to use LEN_MASK_*?


>> Can you please split the patch?  Have 1/n add the optab and ifn
>>plus adjust the generic ifn predicates.  Have 2/n adjust the vectorizer
>>parts and 3/n optional things such as DSE.
Ok.

>>Can you instead add a len_p argument to the function and do

>>if (len_p)
>>  check len is full
>>if (mask_p)
>>  check mask is full

>>?


>>Use internal_fn_stored_value_index and internal_fn_mask_index,
>>possibly add internal_fn_len_index?

>>why deviate from can_vec_mask_load_store_p and not pass in the
>>mask_mode?  In fact I wonder why this function differs from
>>can_vec_mask_load_store_p besides using other optabs?  Couldn't
>>we simply add a bool with_len argument to can_vec_mask_load_store_p?

>>But if-conversion only needs the conditional masking, not _len.  I
>>don't think we need to check this at all?  In fact
>>can_vec_mask_load_store_p should probably return true when
>>LEN_MASKLOAD is available since we can always use the full vector
>>len as len argument?  The suggested bool argument with_len
>>could also become an enum { WITH_LEN, WITHOUT_LEN, EITHER },
>>but not sure if that's really necessary.

Ok.


>>Are you going to provide len_load, maskload and len_maskload
>>patterns or just len_maskload?  (I hope the last)

I just want to enable only len_maskload in the RISC-V port (no len_load
or maskload).
As I said in the commit log:

for (int i = 0; i < n; i++)
  a[i] = a[i] + b[i];

has length, no mask ==> len_maskload/len_maskstore (length, mask = {1,1,1,1,...})

for (int i = 0; i < 4; i++)
  if (cond[i])
    a[i] = a[i] + b[i];

no length, has mask ==> len_maskload/len_maskstore (length = vf, mask = comparison)


>> common with IFN_MASK_STORE by using internal_fn_stored_value_index
Ok.

>> internal_fn_stored_value_index
ok

>> That's too simple, the code using the info is not prepared for
>> LEN_MASK since it handles maskload and len_load separately.
>> I suggest to drop this, picking it up separately.
ok

>> I think you need to verify the length operand is the full vector, note
>> this is for if-conversion which could care less for _LEN, but if we
>> insist on using _LEN (I didn't see you changing if-conversion that
>> way?!) then we need to put in some value even for the "scalar"
>> placeholder.  I'd suggest to simply use IFN_MASK_{LOAD,STORE} in
>> if-conversion but vectorize that as LEN_ with full length if
>> plain MASK_LOAD/STORE isn't available.  Which means these changes
>> are not necessary at all.

ok

>>as with can_mask_* we don't really care if masking is supported or
>>not so I suggest to amend get_len_load_store_mode to also check the
>>len_mask{load,store} optabs?  Given that the function
>>should possibly return the corresponding IFN as well.
ok

>>use the proper ifn index compute fn
ok

>>so this answers my question - you just have len_mask{load,store}?
Yes.


>>I think we really want to common this somehow, having
>>if (loop_lens) do the final_len compute and then afterwards
>>select the IFN to create, filling required default args of
>>final_mask and final_len if not computed.
ok.

>>and split this out to a helper function
ok.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-15 16:06
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; krebbel; uweigand
Subject: Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow 
control for length loop control
On Mon, 12 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Targets like ARM SVE in GCC have an elegant way to handle both loop control
> and flow control simultaneously:
> 
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
> 
> However, targets like RVV (RISC-V Vector) cannot use this approach in
> auto-vectorization since RVV uses a length for loop control.
> 
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that use a length for loop control.
> Normalize load/store into LEN_MASK_ LOAD/STORE as long as either length
> or mask is valid. Length is the outcome of SELECT_VL or MIN_EXPR.
> 
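A scalar sketch of the length-based control the patch describes, with MIN_EXPR standing in for SELECT_VL (illustrative only; the real code operates on vectors, not this lane loop):

```c
#include <assert.h>

#define VF 4  /* illustrative vector factor */

/* RVV-style control for "if (cond[i]) a[i] = b[i]": each pass is
   bounded by a length (MIN_EXPR of remaining iterations and VF,
   standing in for SELECT_VL) while the flow-control mask from the
   comparison stays a separate operand, as in LEN_MASK_LOAD/STORE.  */
static void
rvv_style (int *a, const int *b, const int *cond, int n)
{
  for (int i = 0; i < n; )
    {
      int len = (n - i < VF) ? n - i : VF;   /* MIN_EXPR stand-in */
      for (int l = 0; l < len; l++)          /* length bounds the pass */
        if (cond[i + l])                     /* flow-control mask */
          a[i + l] = b[i + l];
      i += len;
    }
}
```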

Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 15, 2023 at 10:15 AM Jan Beulich  wrote:
>
> On 15.06.2023 09:45, Hongtao Liu wrote:
> > On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches
> >  wrote:
> >> On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches
> >>  wrote:
> >>> +case 3:
> >>> +  return "%vmovddup\t{%1, %0|%0, %1}";
> >>> +case 4:
> >>> +  return "movlhps\t%0, %0";
> >>> +default:
> >>> +  gcc_unreachable ();
> >>> +}
> >>> +}
> >>> +  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
> >>> +   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
> >>> +   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
> >>> +   (set_attr "mode" "TI,TI,TI,DF,V4SF")
> > alternative 2 should be XImode when !TARGET_AVX512VL.
>
> This gives me a chance to actually raise a related question I stumbled
> across several times: Which operand does the mode attribute actually
> describe? I've seen places where it's the source, but I've also seen
> places where it's the destination. Because of this mix I wasn't really
> sure that getting this attribute entirely correct is actually
> necessary, and hence I hoped it would be okay to not further complicate
> the attribute here.

It should be the mode the insn is operating in.  So, a zero-extended
SImode add is still operating in SImode, even if its output is DImode,
and TARGET_MMX_WITH_SSE insns are V4SFmode, even if their operands are
all V2SFmode.

Uros.


Pushed: [PATCH] LoongArch: Set default alignment for functions and labels with -mtune

2023-06-15 Thread Xi Ruoyao via Gcc-patches
Pushed r14-1839.

On Thu, 2023-06-15 at 09:12 +0800, Lulu Cheng wrote:
> LGTM! Thanks!
> 
> > On 2023/6/14 at 8:43 AM, Xi Ruoyao wrote:
> > The LA464 micro-architecture is sensitive to code alignment.  The
> > Loongson team has benchmarked various combinations of function and
> > label alignment; the results [1] show that 16-byte label alignment
> > together with 32-byte function alignment gives the best results in
> > terms of SPEC score.
> > 
> > Add an mtune-based table-driven mechanism to set the defaults of
> > -falign-{functions,labels}.  As LA464 is the first (and for now the
> > only) uarch supported by GCC, the same setting is also used for
> > the "generic" -mtune=loongarch64.  In the future we may set
> > different settings for LA{2,3,6}64 once we add support for them.
> > 
> > Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
> > 
> > gcc/ChangeLog:
> > 
> > * config/loongarch/loongarch-tune.h (loongarch_align): New
> > struct.
> > * config/loongarch/loongarch-def.h (loongarch_cpu_align):
> > New
> > array.
> > * config/loongarch/loongarch-def.c (loongarch_cpu_align):
> > Define
> > the array.
> > * config/loongarch/loongarch.cc
> > (loongarch_option_override_internal): Set the value of
> > -falign-functions= if -falign-functions is enabled but no
> > value
> > is given.  Likewise for -falign-labels=.
> > ---
> >   gcc/config/loongarch/loongarch-def.c  | 12 
> >   gcc/config/loongarch/loongarch-def.h  |  1 +
> >   gcc/config/loongarch/loongarch-tune.h |  8 
> >   gcc/config/loongarch/loongarch.cc |  6 ++
> >   4 files changed, 27 insertions(+)
> > 
> > diff --git a/gcc/config/loongarch/loongarch-def.c
> > b/gcc/config/loongarch/loongarch-def.c
> > index fc4ebbefede..6729c857f7c 100644
> > --- a/gcc/config/loongarch/loongarch-def.c
> > +++ b/gcc/config/loongarch/loongarch-def.c
> > @@ -72,6 +72,18 @@ loongarch_cpu_cache[N_TUNE_TYPES] = {
> >     },
> >   };
> >   
> > +struct loongarch_align
> > +loongarch_cpu_align[N_TUNE_TYPES] = {
> > +  [CPU_LOONGARCH64] = {
> > +    .function = "32",
> > +    .label = "16",
> > +  },
> > +  [CPU_LA464] = {
> > +    .function = "32",
> > +    .label = "16",
> > +  },
> > +};
> > +
> >   /* The following properties cannot be looked up directly using
> > "cpucfg".
> >    So it is necessary to provide a default value for "unknown
> > native"
> >    tune targets (i.e. -mtune=native while PRID does not correspond
> > to
> > diff --git a/gcc/config/loongarch/loongarch-def.h
> > b/gcc/config/loongarch/loongarch-def.h
> > index 778b1409956..fb8bb88eb52 100644
> > --- a/gcc/config/loongarch/loongarch-def.h
> > +++ b/gcc/config/loongarch/loongarch-def.h
> > @@ -144,6 +144,7 @@ extern int loongarch_cpu_issue_rate[];
> >   extern int loongarch_cpu_multipass_dfa_lookahead[];
> >   
> >   extern struct loongarch_cache loongarch_cpu_cache[];
> > +extern struct loongarch_align loongarch_cpu_align[];
> >   extern struct loongarch_rtx_cost_data
> > loongarch_cpu_rtx_cost_data[];
> >   
> >   #ifdef __cplusplus
> > diff --git a/gcc/config/loongarch/loongarch-tune.h
> > b/gcc/config/loongarch/loongarch-tune.h
> > index ba31c4f08c3..5c03262daff 100644
> > --- a/gcc/config/loongarch/loongarch-tune.h
> > +++ b/gcc/config/loongarch/loongarch-tune.h
> > @@ -48,4 +48,12 @@ struct loongarch_cache {
> >   int simultaneous_prefetches; /* number of parallel prefetch */
> >   };
> >   
> > +/* Alignment for functions and labels for best performance.  For
> > new uarchs
> > +   the value should be measured via benchmarking.  See the
> > documentation for
> > +   -falign-functions and -falign-labels in invoke.texi for the
> > format.  */
> > +struct loongarch_align {
> > +  const char *function;/* default value for -falign-
> > functions */
> > +  const char *label;   /* default value for -falign-labels */
> > +};
> > +
> >   #endif /* LOONGARCH_TUNE_H */
> > diff --git a/gcc/config/loongarch/loongarch.cc
> > b/gcc/config/loongarch/loongarch.cc
> > index eb73d11b869..5b8b93eb24b 100644
> > --- a/gcc/config/loongarch/loongarch.cc
> > +++ b/gcc/config/loongarch/loongarch.cc
> > @@ -6249,6 +6249,12 @@ loongarch_option_override_internal (struct
> > gcc_options *opts)
> >     && !opts->x_optimize_size)
> >   opts->x_flag_prefetch_loop_arrays = 1;
> >   
> > +  if (opts->x_flag_align_functions && !opts->x_str_align_functions)
> > +    opts->x_str_align_functions =
> > loongarch_cpu_align[LARCH_ACTUAL_TUNE].function;
> > +
> > +  if (opts->x_flag_align_labels && !opts->x_str_align_labels)
> > +    opts->x_str_align_labels =
> > loongarch_cpu_align[LARCH_ACTUAL_TUNE].label;
> > +
> >     if (TARGET_DIRECT_EXTERN_ACCESS && flag_shlib)
> >   error ("%qs cannot be used for compiling a shared library",
> >    "-mdirect-extern-access");
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [pushed][PATCH v3] LoongArch: Avoid non-returning indirect jumps through $ra [PR110136]

2023-06-15 Thread Lulu Cheng

Pushed to trunk and gcc-12 gcc-13.
r14-1866
r13-7448
r12-9698

On 2023/6/15 at 9:30 AM, Lulu Cheng wrote:

Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
hence doing "jr $ra" would interfere with both subroutine return prediction and
the more general indirect branch prediction.

Therefore, a problem like PR110136 can cause a significant increase in the
branch misprediction rate and affect performance.  The same problem exists
with "indirect_jump".

gcc/ChangeLog:

* config/loongarch/loongarch.md: Modify the register constraints for
template "jumptable" and "indirect_jump" from "r" to "e".

Co-authored-by: Andrew Pinski 
---
v1 -> v2:
   1. Modify the description.
   2. Modify the register constraints of the template "indirect_jump".
v2 -> v3:
   1. Modify the description.
---
  gcc/config/loongarch/loongarch.md | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 816a943d155..b37e070660f 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2895,6 +2895,10 @@ (define_insn "*jump_pic"
  }
[(set_attr "type" "branch")])
  
+;; Micro-architecture unconditionally treats a "jr $ra" as "return from subroutine",
+;; non-returning indirect jumps through $ra would interfere with both subroutine
+;; return prediction and the more general indirect branch prediction.
+
  (define_expand "indirect_jump"
[(set (pc) (match_operand 0 "register_operand"))]
""
@@ -2905,7 +2909,7 @@ (define_expand "indirect_jump"
  })
  
  (define_insn "@indirect_jump"

-  [(set (pc) (match_operand:P 0 "register_operand" "r"))]
+  [(set (pc) (match_operand:P 0 "register_operand" "e"))]
""
"jr\t%0"
[(set_attr "type" "jump")
@@ -2928,7 +2932,7 @@ (define_expand "tablejump"
  
  (define_insn "@tablejump"

[(set (pc)
-   (match_operand:P 0 "register_operand" "r"))
+   (match_operand:P 0 "register_operand" "e"))
 (use (label_ref (match_operand 1 "" "")))]
""
"jr\t%0"




Re: [PATCH] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I like the iterator solution better,  I added it to the
binops V2 patch with a comment and will post it in a while.
Also realized there is already a testcase and the "enabled"
attribute is set properly now but I hadn't rebased to the
current master branch in a while...

Btw. I'm currently running the testsuite with rv64gcv_zfhmin as the
default march and see some additional FAILs.  Will report back.

Regards
 Robin


Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Jan Beulich via Gcc-patches
On 15.06.2023 09:45, Hongtao Liu wrote:
> On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches
>  wrote:
>> On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches
>>  wrote:
>>> +case 3:
>>> +  return "%vmovddup\t{%1, %0|%0, %1}";
>>> +case 4:
>>> +  return "movlhps\t%0, %0";
>>> +default:
>>> +  gcc_unreachable ();
>>> +}
>>> +}
>>> +  [(set_attr "isa" "sse2_noavx,avx,avx512f,sse3,noavx")
>>> +   (set_attr "type" "sselog1,sselog1,ssemov,sselog1,ssemov")
>>> +   (set_attr "prefix" "orig,maybe_evex,evex,maybe_vex,orig")
>>> +   (set_attr "mode" "TI,TI,TI,DF,V4SF")
> alternative 2 should be XImode when !TARGET_AVX512VL.

This gives me a chance to actually raise a related question I stumbled
across several times: Which operand does the mode attribute actually
describe? I've seen places where it's the source, but I've also seen
places where it's the destination. Because of this mix I wasn't really
sure that getting this attribute entirely correct is actually
necessary, and hence I hoped it would be okay to not further complicate
the attribute here.

Jan


Re: [PATCH V2] VECT: Support LEN_MASK_ LOAD/STORE to support flow control for length loop control

2023-06-15 Thread Richard Biener via Gcc-patches
On Mon, 12 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Target like ARM SVE in GCC has an elegant way to handle both loop control
> and flow control simultaneously:
> 
> loop_control_mask = WHILE_ULT
> flow_control_mask = comparison
> control_mask = loop_control_mask & flow_control_mask;
> MASK_LOAD (control_mask)
> MASK_STORE (control_mask)
> 
> However, targets like RVV (RISC-V Vector) cannot use this approach in
> auto-vectorization since RVV uses a length in loop control.
> 
> This patch adds LEN_MASK_ LOAD/STORE to support flow control for targets
> like RISC-V that use a length in loop control.
> Loads/stores are normalized into LEN_MASK_ LOAD/STORE as long as either
> the length or the mask is valid.  The length is the outcome of SELECT_VL
> or MIN_EXPR.  The mask is the outcome of a comparison.
> 
> LEN_MASK_ LOAD/STORE format is defined as follows:
> 1). LEN_MASK_LOAD (ptr, align, length, mask).
> 2). LEN_MASK_STORE (ptr, align, length, mask, vec).
> 
> Consider the following 4 cases:
> 
> VLA: Variable-length auto-vectorization
> VLS: Specific-length auto-vectorization
> 
> Case 1 (VLS): -mrvv-vector-bits=128       IR (Does not use LEN_MASK_*):
> Code:                                       v1 = MEM (...)
>   for (int i = 0; i < 4; i++)               v2 = MEM (...)
>     a[i] = b[i] + c[i];                     v3 = v1 + v2
>                                             MEM[...] = v3
> 
> Case 2 (VLS): -mrvv-vector-bits=128       IR (LEN_MASK_* with length = VF, mask = comparison):
> Code:                                       mask = comparison
>   for (int i = 0; i < 4; i++)               v1 = LEN_MASK_LOAD (length = VF, mask)
>     if (cond[i])                            v2 = LEN_MASK_LOAD (length = VF, mask)
>       a[i] = b[i] + c[i];                   v3 = v1 + v2
>                                             LEN_MASK_STORE (length = VF, mask, v3)
> 
> Case 3 (VLA):
> Code:                                       loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)               v1 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>     a[i] = b[i] + c[i];                     v2 = LEN_MASK_LOAD (length = loop_len, mask = {-1,-1,...})
>                                             v3 = v1 + v2
>                                             LEN_MASK_STORE (length = loop_len, mask = {-1,-1,...}, v3)
> 
> Case 4 (VLA):
> Code:                                       loop_len = SELECT_VL or MIN
>   for (int i = 0; i < n; i++)               mask = comparison
>     if (cond[i])                            v1 = LEN_MASK_LOAD (length = loop_len, mask)
>       a[i] = b[i] + c[i];                   v2 = LEN_MASK_LOAD (length = loop_len, mask)
>                                             v3 = v1 + v2
>                                             LEN_MASK_STORE (length = loop_len, mask, v3)
> 
> More features:
> 1. Support simplify gimple fold for LEN_MASK_ LOAD/STORE:
>LEN_MASK_STORE (length = vf, mask = {-1,-1,...}, v) ===> MEM [...] = V
> 2. Allow DSE for LEN_MASK_* LOAD/STORE.
> 
> Bootstrap && Regression on X86 with no surprise difference.
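As a sanity check on the semantics described above, here is a scalar reference model. This is my sketch, not part of the patch; the function names, the int-element specialization, and the fixed vectorization factor are all assumptions. The rule it encodes: element i participates only when i < length AND its mask bit is set.

```c
#include <stdbool.h>

/* Scalar sketch (assumptions: names, int elements, VF = 4) of the
   LEN_MASK_LOAD/LEN_MASK_STORE semantics: element i is active only
   when i < length and mask[i] is set.  */
enum { VF = 4 };

static void len_mask_load (int dest[VF], const int *ptr,
                           int length, const bool mask[VF])
{
  for (int i = 0; i < VF; i++)
    /* Inactive elements are unspecified in the IR; zero them here.  */
    dest[i] = (i < length && mask[i]) ? ptr[i] : 0;
}

static void len_mask_store (int *ptr, int length, const bool mask[VF],
                            const int vec[VF])
{
  for (int i = 0; i < VF; i++)
    if (i < length && mask[i])
      ptr[i] = vec[i];
}
```

With length = VF and an all-ones mask this degenerates to a plain load/store, which matches the gimple fold listed under "More features".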

Can you please split the patch?  Have 1/n add the optab and ifn
plus adjust the generic ifn predicates.  Have 2/n adjust the vectorizer
parts and 3/n optional things such as DSE.

Some comments below.

> gcc/ChangeLog:
> 
> * doc/md.texi: Add LEN_MASK_ LOAD/STORE.
> * genopinit.cc (main): Ditto.
> (CMP_NAME): Ditto.
> * gimple-fold.cc (arith_overflowed_p): Ditto.
> (gimple_fold_partial_load_store_mem_ref): Ditto.
> (gimple_fold_partial_store): Ditto.
> (gimple_fold_call): Ditto.
> * internal-fn.cc (len_maskload_direct): Ditto.
> (len_maskstore_direct): Ditto.
> (expand_partial_load_optab_fn): Ditto.
> (expand_len_maskload_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> (expand_len_maskstore_optab_fn): Ditto.
> (direct_len_maskload_optab_supported_p): Ditto.
> (direct_len_maskstore_optab_supported_p): Ditto.
> (internal_load_fn_p): Ditto.
> (internal_store_fn_p): Ditto.
> (internal_fn_mask_index): Ditto.
> (internal_fn_stored_value_index): Ditto.
> * internal-fn.def (LEN_MASK_LOAD): Ditto.
> (LEN_MASK_STORE): Ditto.
> * optabs-query.cc (can_vec_len_mask_load_store_p): Ditto.
> * optabs-query.h (can_vec_len_mask_load_store_p): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> * tree-data-ref.cc (get_references_in_stmt): Ditto.
> * tree-if-conv.cc (ifcvt_can_use_mask_load_store): Ditto.
> * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Ditto.
> (call_may_clobber_ref_p_1): Ditto.
> * tree-ssa-dse.cc (initialize_ao_ref_for_dse): Ditto.
> (dse_optimize_stmt): Ditto.
> * tree-ssa-loop-ivopts.cc (get_mem_type_for_internal_fn): Ditto.
> (get_alias_ptr_type_for_ptr_address): Ditto.
> * 

[COMMITTED] ada: Fix wrong finalization for double subtype of bounded vector

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The special handling of temporaries created for return values and subject
to a renaming needs to be restricted to the top level, where it is needed
to prevent dangling references to the frame of the elaboration routine from
being created, because, at a lower level, the front-end may create implicit
renamings of objects as these temporaries, so a copy is not allowed.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Restrict
the special handling of temporaries created for return values and
subject to a renaming to the top level.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index e5e04ddad93..b2b77787bc0 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -1076,9 +1076,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree 
gnu_expr, bool definition)
|| EXPRESSION_CLASS_P (inner)
/* We need to detect the case where a temporary is created to
   hold the return value, since we cannot safely rename it at
-  top level as it lives only in the elaboration routine.  */
+  top level because it lives only in the elaboration routine.
+  But, at a lower level, an object initialized by a function
+  call may be (implicitly) renamed as this temporary by the
+  front-end and, in this case, we cannot make a copy.  */
|| (VAR_P (inner)
-   && DECL_RETURN_VALUE_P (inner))
+   && DECL_RETURN_VALUE_P (inner)
+   && global_bindings_p ())
/* We also need to detect the case where the front-end creates
   a dangling 'reference to a function call at top level and
   substitutes it in the renaming, for example:
@@ -1092,12 +1096,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree 
gnu_expr, bool definition)
 q__b : boolean renames q__R1s.all.e (1);
 
   We cannot safely rename the rewritten expression since the
-  underlying object lives only in the elaboration routine.  */
+  underlying object lives only in the elaboration routine but,
+  as above, this cannot be done at a lower level.  */
|| (INDIRECT_REF_P (inner)
&& (inner
= remove_conversions (TREE_OPERAND (inner, 0), true))
&& VAR_P (inner)
-   && DECL_RETURN_VALUE_P (inner)))
+   && DECL_RETURN_VALUE_P (inner)
+   && global_bindings_p ()))
  ;
 
/* Otherwise, this is an lvalue being renamed, so it needs to be
-- 
2.40.0



[COMMITTED] ada: Make minor improvements to user's guide

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

gcc/ada/

* doc/gnat_ugn/about_this_guide.rst: Fix typo. Uniformize punctuation.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Uniformize punctuation.
Fix capitalization. Fix indentation of code block. Fix RST formatting
syntax errors.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/doc/gnat_ugn/about_this_guide.rst |  8 ++--
 .../gnat_ugn/the_gnat_compilation_model.rst   | 24 +--
 gcc/ada/gnat_ugn.texi | 41 +--
 3 files changed, 36 insertions(+), 37 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/about_this_guide.rst 
b/gcc/ada/doc/gnat_ugn/about_this_guide.rst
index 33476264231..18cfb0291b6 100644
--- a/gcc/ada/doc/gnat_ugn/about_this_guide.rst
+++ b/gcc/ada/doc/gnat_ugn/about_this_guide.rst
@@ -38,17 +38,17 @@ This guide contains the following chapters:
   using the GNU make utility with GNAT.
 
 * :ref:`GNAT_Utility_Programs` explains the various utility programs that
-  are included in the GNAT environment
+  are included in the GNAT environment.
 
 * :ref:`GNAT_and_Program_Execution` covers a number of topics related to
-  running, debugging, and tuning the performace of programs developed
-  with GNAT
+  running, debugging, and tuning the performance of programs developed
+  with GNAT.
 
 Appendices cover several additional topics:
 
 * :ref:`Platform_Specific_Information` describes the different run-time
   library implementations and also presents information on how to use
-  GNAT on several specific platforms
+  GNAT on several specific platforms.
 
 * :ref:`Example_of_Binder_Output_File` shows the source code for the binder
   output file for a sample program.
diff --git a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst 
b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
index 77a2055f642..c7f15b4612d 100644
--- a/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
+++ b/gcc/ada/doc/gnat_ugn/the_gnat_compilation_model.rst
@@ -168,7 +168,7 @@ GNAT also supports several other 8-bit coding schemes:
 
 *ISO 8859-15 (Latin-9)*
   ISO 8859-15 (Latin-9) letters allowed in identifiers, with uppercase and
-  lowercase equivalence
+  lowercase equivalence.
 
 .. index:: code page 437 (IBM PC)
 
@@ -1778,8 +1778,8 @@ default, that contains calls to the elaboration 
procedures of those
 compilation unit that require them, followed by
 a call to the main program. This Ada program is compiled to generate the
 object file for the main program. The name of
-the Ada file is :file:`b~xxx`.adb` (with the corresponding spec
-:file:`b~xxx`.ads`) where ``xxx`` is the name of the
+the Ada file is :file:`b~xxx.adb` (with the corresponding spec
+:file:`b~xxx.ads`) where ``xxx`` is the name of the
 main program unit.
 
 Finally, the linker is used to build the resulting executable program,
@@ -3590,7 +3590,7 @@ Convention identifiers are recognized by GNAT:
   Ada compiler for further details on elaboration.
 
   However, it is not possible to mix the tasking run time of GNAT and
-  HP Ada 83, All the tasking operations must either be entirely within
+  HP Ada 83, all the tasking operations must either be entirely within
   GNAT compiled sections of the program, or entirely within HP Ada 83
   compiled sections of the program.
 
@@ -3715,14 +3715,14 @@ Convention identifiers are recognized by GNAT:
 to perform dimensional checks:
 
 
-  .. code-block:: ada
+.. code-block:: ada
 
-  type Distance is new Long_Float;
-  type Time is new Long_Float;
-  type Velocity is new Long_Float;
-  function "/" (D : Distance; T : Time)
-return Velocity;
-  pragma Import (Intrinsic, "/");
+type Distance is new Long_Float;
+type Time is new Long_Float;
+type Velocity is new Long_Float;
+function "/" (D : Distance; T : Time)
+  return Velocity;
+pragma Import (Intrinsic, "/");
 
 This common idiom is often programmed with a generic definition and an
 explicit body. The pragma makes it simpler to introduce such declarations.
@@ -3858,7 +3858,7 @@ considered:
 
 
 * Using GNAT and G++ from two different GCC installations: If both
-  compilers are on the :envvar`PATH`, the previous method may be used. It is
+  compilers are on the :envvar:`PATH`, the previous method may be used. It is
   important to note that environment variables such as
   :envvar:`C_INCLUDE_PATH`, :envvar:`GCC_EXEC_PREFIX`,
   :envvar:`BINUTILS_ROOT`, and
diff --git a/gcc/ada/gnat_ugn.texi b/gcc/ada/gnat_ugn.texi
index f119e9fa743..88123df4332 100644
--- a/gcc/ada/gnat_ugn.texi
+++ b/gcc/ada/gnat_ugn.texi
@@ -19,7 +19,7 @@
 
 @copying
 @quotation
-GNAT User's Guide for Native Platforms , May 09, 2023
+GNAT User's Guide for Native Platforms , Jun 01, 2023
 
 AdaCore
 
@@ -590,12 +590,12 @@ using the GNU make utility with GNAT.
 
 @item 
 @ref{b,,GNAT Utility Programs} explains the various utility 

[COMMITTED] ada: Fix wrong code for ACATS cd1c03i on Morello target

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/utils2.cc (build_binary_op) : Do not
remove a VIEW_CONVERT_EXPR on the LHS if it is also on the RHS.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/utils2.cc | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index e1737724b65..95bbce2f1b4 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -878,7 +878,8 @@ build_binary_op (enum tree_code op_code, tree result_type,
 them; we'll be putting them back below if needed.  Likewise for
 conversions between record types, except for justified modular types.
 But don't do this if the right operand is not BLKmode (for packed
-arrays) unless we are not changing the mode.  */
+arrays) unless we are not changing the mode, or if both operands
+are view conversions to the same type.  */
   while ((CONVERT_EXPR_P (left_operand)
  || TREE_CODE (left_operand) == VIEW_CONVERT_EXPR)
 && (((INTEGRAL_TYPE_P (left_type)
@@ -890,7 +891,10 @@ build_binary_op (enum tree_code op_code, tree result_type,
 && TREE_CODE (operand_type (left_operand)) == RECORD_TYPE
 && (TYPE_MODE (right_type) == BLKmode
 || TYPE_MODE (left_type)
-   == TYPE_MODE (operand_type (left_operand))
+   == TYPE_MODE (operand_type (left_operand)))
+&& !(TREE_CODE (left_operand) == VIEW_CONVERT_EXPR
+ && TREE_CODE (right_operand) == VIEW_CONVERT_EXPR
+ && left_type == right_type
{
  left_operand = TREE_OPERAND (left_operand, 0);
  left_type = TREE_TYPE (left_operand);
-- 
2.40.0



[COMMITTED] ada: Reject Loop_Entry inside prefix of Loop_Entry

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Yannick Moy 

This rule was incompletely stated in SPARK RM and not checked.
This is now fixed.

gcc/ada/

* sem_attr.adb (Analyze_Attribute): Reject case of Loop_Entry
inside the prefix of Loop_Entry, as per SPARK RM 5.5.3.1(4,8).

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_attr.adb | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
index dc06435e7b0..7a47abdb625 100644
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -4784,8 +4784,9 @@ package body Sem_Attr is
 Loop_Decl : constant Node_Id := Label_Construct (Parent (Loop_Id));
 
 function Check_Reference (Nod : Node_Id) return Traverse_Result;
---  Determine whether a reference mentions an entity declared
---  within the related loop.
+--  Detect attribute 'Loop_Entry in prefix P and determine whether
+--  a reference mentions an entity declared within the related
+--  loop.
 
 function Declared_Within (Nod : Node_Id) return Boolean;
 --  Determine whether Nod appears in the subtree of Loop_Decl but
@@ -4796,8 +4797,22 @@ package body Sem_Attr is
 -
 
 function Check_Reference (Nod : Node_Id) return Traverse_Result is
+   Orig_Nod : constant Node_Id := Original_Node (Nod);
+   --  Check presence of Loop_Entry in the prefix P by looking at
+   --  the original node for Nod, as it will have been rewritten
+   --  into its own prefix if the assertion is ignored (see code
+   --  below).
+
 begin
-   if Nkind (Nod) = N_Identifier
+   if Is_Attribute_Loop_Entry (Orig_Nod) then
+  Error_Msg_Name_1 := Name_Loop_Entry;
+  Error_Msg_Name_2 := Name_Loop_Entry;
+  Error_Msg_N
+("attribute % cannot appear in the prefix of attribute %",
+ Nod);
+  return Abandon;
+
+   elsif Nkind (Nod) = N_Identifier
  and then Present (Entity (Nod))
  and then Declared_Within (Declaration_Node (Entity (Nod)))
then
-- 
2.40.0



[COMMITTED] ada: Remove unused files

2023-06-15 Thread Marc Poulhiès via Gcc-patches
gcc/ada/ChangeLog:

* vxworks7-cert-rtp-base-link.spec: Removed.
* vxworks7-cert-rtp-base-link__ppc64.spec: Removed.
* vxworks7-cert-rtp-base-link__x86.spec: Removed.
* vxworks7-cert-rtp-base-link__x86_64.spec: Removed.
* vxworks7-cert-rtp-link.spec: Removed.
* vxworks7-cert-rtp-link__ppcXX.spec: Removed.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/vxworks7-cert-rtp-base-link.spec |  2 --
 gcc/ada/vxworks7-cert-rtp-base-link__ppc64.spec  |  2 --
 gcc/ada/vxworks7-cert-rtp-base-link__x86.spec|  2 --
 gcc/ada/vxworks7-cert-rtp-base-link__x86_64.spec |  2 --
 gcc/ada/vxworks7-cert-rtp-link.spec  | 10 --
 gcc/ada/vxworks7-cert-rtp-link__ppcXX.spec   | 10 --
 6 files changed, 28 deletions(-)
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-base-link.spec
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-base-link__ppc64.spec
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-base-link__x86.spec
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-base-link__x86_64.spec
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-link.spec
 delete mode 100644 gcc/ada/vxworks7-cert-rtp-link__ppcXX.spec

diff --git a/gcc/ada/vxworks7-cert-rtp-base-link.spec 
b/gcc/ada/vxworks7-cert-rtp-base-link.spec
deleted file mode 100644
index 1d6ee49b128..000
--- a/gcc/ada/vxworks7-cert-rtp-base-link.spec
+++ /dev/null
@@ -1,2 +0,0 @@
-*base_link:
---defsym=__wrs_rtp_base=0x8000
diff --git a/gcc/ada/vxworks7-cert-rtp-base-link__ppc64.spec 
b/gcc/ada/vxworks7-cert-rtp-base-link__ppc64.spec
deleted file mode 100644
index 97332b85440..000
--- a/gcc/ada/vxworks7-cert-rtp-base-link__ppc64.spec
+++ /dev/null
@@ -1,2 +0,0 @@
-*base_link:
---defsym=__wrs_rtp_base=0x4000
diff --git a/gcc/ada/vxworks7-cert-rtp-base-link__x86.spec 
b/gcc/ada/vxworks7-cert-rtp-base-link__x86.spec
deleted file mode 100644
index eafb5828b37..000
--- a/gcc/ada/vxworks7-cert-rtp-base-link__x86.spec
+++ /dev/null
@@ -1,2 +0,0 @@
-*base_link:
---defsym=__wrs_rtp_base=0x40
diff --git a/gcc/ada/vxworks7-cert-rtp-base-link__x86_64.spec 
b/gcc/ada/vxworks7-cert-rtp-base-link__x86_64.spec
deleted file mode 100644
index dd288690ab9..000
--- a/gcc/ada/vxworks7-cert-rtp-base-link__x86_64.spec
+++ /dev/null
@@ -1,2 +0,0 @@
-*base_link:
---defsym=__wrs_rtp_base=0x20
diff --git a/gcc/ada/vxworks7-cert-rtp-link.spec 
b/gcc/ada/vxworks7-cert-rtp-link.spec
deleted file mode 100644
index 9923c58defa..000
--- a/gcc/ada/vxworks7-cert-rtp-link.spec
+++ /dev/null
@@ -1,10 +0,0 @@
-*self_spec:
-+ %{!nostdlib:-nodefaultlibs -nostartfiles}
-
-*link:
-+ %{!nostdlib:%{mrtp:%{!shared: \
- %(base_link) \
- -l:certRtp.o \
- -L%:getenv(VSB_DIR /usr/lib/common/objcert) \
- -T%:getenv(VSB_DIR /usr/ldscripts/rtp.ld) \
-   }}}
diff --git a/gcc/ada/vxworks7-cert-rtp-link__ppcXX.spec 
b/gcc/ada/vxworks7-cert-rtp-link__ppcXX.spec
deleted file mode 100644
index 8671cea7410..000
--- a/gcc/ada/vxworks7-cert-rtp-link__ppcXX.spec
+++ /dev/null
@@ -1,10 +0,0 @@
-*self_spec:
-+ %{!nostdlib:-nodefaultlibs -nostartfiles}
-
-*link:
-+ %{!nostdlib:%{mrtp:%{!shared: \
- %(base_link) \
- -lcert -lgnu \
- -L%:getenv(VSB_DIR /usr/lib/common/objcert) \
- -T%:getenv(VSB_DIR /usr/ldscripts/rtp.ld) \
-   }}}
-- 
2.40.0



[COMMITTED] ada: Fix internal error on loop iterator filter with -gnatVa

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The problem is that the condition of the iterator filter is expanded early,
before it is integrated into an if statement of the loop body, so there is
no place to attach the actions generated by this expansion.

This happens only for simple loops, i.e. with a parameter specification, so
the fix uses the same approach for them as for loops based on iterators.

gcc/ada/

* sinfo.ads (Iterator_Filter): Document field.
* sem_ch5.adb (Analyze_Iterator_Specification): Move comment around.
(Analyze_Loop_Parameter_Specification): Only preanalyze the iterator
filter, if any.
* exp_ch5.adb (Expand_N_Loop_Statement): Analyze the new list built
when an iterator filter is present.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch5.adb |  1 +
 gcc/ada/sem_ch5.adb | 11 +++
 gcc/ada/sinfo.ads   |  5 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
index a4c7db9f365..258459b393d 100644
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -5678,6 +5678,7 @@ package body Exp_Ch5 is
   New_List (Make_If_Statement (Loc,
 Condition => Iterator_Filter (LPS),
 Then_Statements => Stats)));
+   Analyze_List (Statements (N));
 end if;
 
 --  Deal with loop over predicates
diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
index f9174869a26..fa36a5a0741 100644
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -2705,10 +2705,10 @@ package body Sem_Ch5 is
  end if;
   end if;
 
-  if Present (Iterator_Filter (N)) then
- --  Preanalyze the filter. Expansion will take place when enclosing
- --  loop is expanded.
+  --  Preanalyze the filter. Expansion will take place when enclosing
+  --  loop is expanded.
 
+  if Present (Iterator_Filter (N)) then
  Preanalyze_And_Resolve (Iterator_Filter (N), Standard_Boolean);
   end if;
end Analyze_Iterator_Specification;
@@ -3424,8 +3424,11 @@ package body Sem_Ch5 is
  end;
   end if;
 
+  --  Preanalyze the filter. Expansion will take place when enclosing
+  --  loop is expanded.
+
   if Present (Iterator_Filter (N)) then
- Analyze_And_Resolve (Iterator_Filter (N), Standard_Boolean);
+ Preanalyze_And_Resolve (Iterator_Filter (N), Standard_Boolean);
   end if;
 
   --  A loop parameter cannot be effectively volatile (SPARK RM 7.1.3(4)).
diff --git a/gcc/ada/sinfo.ads b/gcc/ada/sinfo.ads
index 0efbd4479e0..8040a59e175 100644
--- a/gcc/ada/sinfo.ads
+++ b/gcc/ada/sinfo.ads
@@ -1909,6 +1909,11 @@ package Sinfo is
--Present in variable reference markers. Set when the original variable
--reference constitutes a write of the variable.
 
+   --  Iterator_Filter
+   --Present in N_Loop_Parameter_Specification and N_Iterator_Specification
+   --nodes for Ada 2022. It is used to store the condition present in the
+   --eponymous Ada 2022 construct.
+
--  Itype
--Used in N_Itype_Reference node to reference an itype for which it is
--important to ensure that it is defined. See description of this node
-- 
2.40.0



[COMMITTED] ada: Remove dead code in Expand_Iterator_Loop_Over_Container

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The Condition_Actions field can only be populated for while loops.

gcc/ada/

* exp_ch5.adb (Expand_Iterator_Loop_Over_Container): Do not insert
an always empty list. Remove unused parameter Isc.
(Expand_Iterator_Loop): Adjust call to above procedure.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch5.adb | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/exp_ch5.adb b/gcc/ada/exp_ch5.adb
index d8214bd6ce2..a4c7db9f365 100644
--- a/gcc/ada/exp_ch5.adb
+++ b/gcc/ada/exp_ch5.adb
@@ -181,14 +181,13 @@ package body Exp_Ch5 is
 
procedure Expand_Iterator_Loop_Over_Container
  (N : Node_Id;
-  Isc   : Node_Id;
   I_Spec: Node_Id;
   Container : Node_Id;
   Container_Typ : Entity_Id);
--  Expand loop over containers that uses the form "for X of C" with an
-   --  optional subtype mark, or "for Y in C". Isc is the iteration scheme.
-   --  I_Spec is the iterator specification and Container is either the
-   --  Container (for OF) or the iterator (for IN).
+   --  optional subtype mark, or "for Y in C". I_Spec is the iterator
+   --  specification and Container is either the Container (for OF) or the
+   --  iterator (for IN).
 
procedure Expand_Predicated_Loop (N : Node_Id);
--  Expand for loop over predicated subtype
@@ -4836,7 +4835,7 @@ package body Exp_Ch5 is
 
   else
  Expand_Iterator_Loop_Over_Container
-   (N, Isc, I_Spec, Container, Container_Typ);
+   (N, I_Spec, Container, Container_Typ);
   end if;
end Expand_Iterator_Loop;
 
@@ -5133,7 +5132,6 @@ package body Exp_Ch5 is
 
procedure Expand_Iterator_Loop_Over_Container
  (N : Node_Id;
-  Isc   : Node_Id;
   I_Spec: Node_Id;
   Container : Node_Id;
   Container_Typ : Entity_Id)
@@ -5606,13 +5604,6 @@ package body Exp_Ch5 is
  Mutate_Ekind (Cursor, Id_Kind);
   end;
 
-  --  If the range of iteration is given by a function call that returns
-  --  a container, the finalization actions have been saved in the
-  --  Condition_Actions of the iterator. Insert them now at the head of
-  --  the loop.
-
-  Insert_List_Before (N, Condition_Actions (Isc));
-
   Rewrite (N, New_Loop);
   Analyze (N);
end Expand_Iterator_Loop_Over_Container;
-- 
2.40.0



[COMMITTED] ada: Fix too small secondary stack allocation for returned conversion

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The previous fix did not address a latent issue whereby the allocation
would be made using the (static) subtype of the conversion instead of
the (dynamic) subtype of the return object, so this change rewrites the
code responsible for determining the type used for the allocation, and
also contains a small improvement to the Has_Tag_Of_Type predicate.

gcc/ada/

* exp_ch3.adb (Make_Allocator_For_Return): Rewrite the logic that
determines the type used for the allocation and add assertions.
* exp_util.adb (Has_Tag_Of_Type): Also return true for extension
aggregates.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb  | 92 +---
 gcc/ada/exp_util.adb |  1 +
 2 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 778eed7f16e..7ac4680b395 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -7114,8 +7114,64 @@ package body Exp_Ch3 is
   function Make_Allocator_For_Return (Expr : Node_Id) return Node_Id is
  Alloc  : Node_Id;
  Alloc_Expr : Entity_Id;
+ Alloc_Typ  : Entity_Id;
 
   begin
+ --  If the return object's declaration does not include an expression,
+ --  then we use its subtype for the allocation. Likewise in the case
+ --  of a degenerate expression like a raise expression.
+
+ if No (Expr)
+   or else Nkind (Original_Node (Expr)) = N_Raise_Expression
+ then
+Alloc_Typ := Typ;
+
+ --  If the return object's declaration includes an expression, then
+ --  there are two cases: either the nominal subtype of the object is
+ --  definite and we can use it for the allocation directly, or it is
+ --  not and Analyze_Object_Declaration should have built an actual
+ --  subtype from the expression.
+
+ --  However, there are exceptions in the latter case for interfaces
+ --  (see Analyze_Object_Declaration), as well as class-wide types and
+ --  types with unknown discriminants if they are additionally limited
+ --  (see Expand_Subtype_From_Expr), so we must cope with them.
+
+ elsif Is_Interface (Typ) then
+pragma Assert (Is_Class_Wide_Type (Typ));
+
+--  For interfaces, we use the type of the expression, except if
+--  we need to put back a conversion that we have removed earlier
+--  in the processing.
+
+if Is_Class_Wide_Type (Etype (Expr)) then
+   Alloc_Typ := Typ;
+else
+   Alloc_Typ := Etype (Expr);
+end if;
+
+ elsif Is_Class_Wide_Type (Typ) then
+
+--  For class-wide types, we have to make sure that we use the
+--  dynamic type of the expression for the allocation, either by
+--  means of its (static) subtype or through the actual subtype.
+
+if Has_Tag_Of_Type (Expr) then
+   Alloc_Typ := Etype (Expr);
+
+else pragma Assert (Ekind (Typ) = E_Class_Wide_Subtype
+  and then Present (Equivalent_Type (Typ)));
+
+   Alloc_Typ := Typ;
+end if;
+
+ else pragma Assert (Is_Definite_Subtype (Typ)
+   or else (Has_Unknown_Discriminants (Typ)
+ and then Is_Limited_View (Typ)));
+
+Alloc_Typ := Typ;
+ end if;
+
  --  If the return object's declaration includes an expression and the
  --  declaration isn't marked as No_Initialization, then we generate an
  --  allocator with a qualified expression. Although this is necessary
@@ -7141,46 +7197,22 @@ package body Exp_Ch3 is
 
 Alloc_Expr := New_Copy_Tree (Expr);
 
---  In the constrained array case, deal with a potential sliding.
---  In the interface case, put back a conversion that we may have
---  removed earlier in the processing.
-
-if (Ekind (Typ) = E_Array_Subtype
- or else (Is_Interface (Typ)
-   and then Is_Class_Wide_Type (Etype (Alloc_Expr
-  and then Typ /= Etype (Alloc_Expr)
-then
-   Alloc_Expr := Convert_To (Typ, Alloc_Expr);
+if Etype (Alloc_Expr) /= Alloc_Typ then
+   Alloc_Expr := Convert_To (Alloc_Typ, Alloc_Expr);
 end if;
 
---  We always use the type of the expression for the qualified
---  expression, rather than the return object's type. We cannot
---  always use the return object's type because the expression
---  might be of a specific type and the return object might not.
-
 Alloc :=
   Make_Allocator (Loc,
 Expression =>
   Make_Qualified_Expression (Loc,
 Subtype_Mark =>
- 

[COMMITTED] ada: Adjust comments in targparm.ads

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

This patch removes a few dangling references to the late front-end
implementation of exceptions from the comments of targparm.ads, and
also fixes a thinko there.

gcc/ada/

* targparm.ads: Remove references to front-end-based exceptions. Fix
thinko.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/targparm.ads | 25 +
 1 file changed, 5 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/targparm.ads b/gcc/ada/targparm.ads
index aa91ee60e82..01ee492e015 100644
--- a/gcc/ada/targparm.ads
+++ b/gcc/ada/targparm.ads
@@ -213,22 +213,7 @@ package Targparm is
-- Control of Exception Handling --
---
 
-   --  GNAT implements three methods of implementing exceptions:
-
-   --Front-End Longjmp/Setjmp Exceptions
-
-   --  This approach uses longjmp/setjmp to handle exceptions. It
-   --  uses less storage, and can often propagate exceptions faster,
-   --  at the expense of (sometimes considerable) overhead in setting
-   --  up an exception handler.
-
-   --  The generation of the setjmp and longjmp calls is handled by
-   --  the front end of the compiler (this includes gigi in the case
-   --  of the standard GCC back end). It does not use any back end
-   --  support (such as the GCC3 exception handling mechanism). When
-   --  this approach is used, the compiler generates special exception
-   --  handlers for handling cleanups (AT-END actions) when an exception
-   --  is raised.
+   --  GNAT provides two methods of implementing exceptions:
 
--Back-End Zero Cost Exceptions
 
@@ -254,10 +239,10 @@ package Targparm is
 
--Control of Available Methods and Defaults
 
-   --  The following switches specify whether we're using a front-end or a
-   --  back-end mechanism and whether this is a zero-cost or a sjlj scheme.
+   --  The following switch specifies whether this is a zero-cost or a sjlj
+   --  scheme.
 
-   --  The per-switch default values correspond to the default value of
+   --  The default value corresponds to the default value of
--  Opt.Exception_Mechanism.
 
ZCX_By_Default_On_Target : Boolean := False;
@@ -408,7 +393,7 @@ package Targparm is
-- Control of Stack Checking --
---
 
-   --  GNAT provides three methods of implementing exceptions:
+   --  GNAT provides three methods of implementing stack checking:
 
--GCC Probing Mechanism
 
-- 
2.40.0



[COMMITTED] ada: Revert latest change to Find_Hook_Context

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The issue is that, if an aggregate is both below a conditional expression
and above another conditional expression in the tree, we have currently no
place to put the finalization actions generated by the innermost expression
in the context of the aggregate before it is expanded, so they end up being
placed after the outermost expression.

But it is not clear whether that's really problematic because this does not
seem to happen for array aggregates with multiple or others choices: in this
case the aggregate is expanded first and the code path is not taken.

gcc/ada/

* exp_util.adb (Find_Hook_Context): Revert latest change.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index a4aa5f64447..91959793638 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -6500,16 +6500,6 @@ package body Exp_Util is
 then
Top := Par;
 
---  Stop at contexts where temporaries may be contained
-
-elsif Nkind (Par) in N_Aggregate
-   | N_Delta_Aggregate
-   | N_Extension_Aggregate
-   | N_Block_Statement
-   | N_Loop_Statement
-then
-   exit;
-
 --  Prevent the search from going too far
 
 elsif Is_Body_Or_Package_Declaration (Par) then
-- 
2.40.0



[COMMITTED] ada: Adjust QNX Ada priorities to match QNX system priorities

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Johannes Kliemann 

The Ada priority range of the QNX runtime started from 0, differing from
the QNX system priorities range starting from 1. As this may cause
confusion, especially if used in a mixed language environment, the Ada
priority range now starts at 1.

The default priority of Ada tasks as mandated is the middle of the
priority range. On QNX this means the default priority of Ada tasks is
30. This is much higher than the default QNX priority of 10 and may
cause unexpected system interruptions when Ada tasks take a lot of CPU time.
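
As an illustrative sketch (not part of the patch): with the 1-based range, an Ada priority now passes through to QNX unchanged, so a task can be pinned to the QNX default priority directly. The task name below is hypothetical.

```ada
--  Hypothetical sketch: under the new mapping, Ada priority N is QNX
--  system priority N, so To_Target_Priority no longer adds an offset.
procedure Priority_Demo is
   --  Runs at QNX system priority 10, the QNX default
   task Worker with Priority => 10;

   task body Worker is
   begin
      null;
   end Worker;
begin
   null;
end Priority_Demo;
```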

gcc/ada/

* libgnarl/s-osinte__qnx.adb: Adjust priority conversion function.
* libgnat/system-qnx-arm.ads: Adjust priority range and default
priority.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnarl/s-osinte__qnx.adb |  2 +-
 gcc/ada/libgnat/system-qnx-arm.ads | 14 ++
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/libgnarl/s-osinte__qnx.adb 
b/gcc/ada/libgnarl/s-osinte__qnx.adb
index bf08ecbf4dd..127d1795a35 100644
--- a/gcc/ada/libgnarl/s-osinte__qnx.adb
+++ b/gcc/ada/libgnarl/s-osinte__qnx.adb
@@ -87,7 +87,7 @@ package body System.OS_Interface is
  (Prio : System.Any_Priority) return Interfaces.C.int
is
begin
-  return Interfaces.C.int (Prio) + 1;
+  return Interfaces.C.int (Prio);
end To_Target_Priority;
 
-
diff --git a/gcc/ada/libgnat/system-qnx-arm.ads 
b/gcc/ada/libgnat/system-qnx-arm.ads
index 344bd6168f3..1dd1a2228e9 100644
--- a/gcc/ada/libgnat/system-qnx-arm.ads
+++ b/gcc/ada/libgnat/system-qnx-arm.ads
@@ -95,22 +95,20 @@ package System is
 
--  Priority-related Declarations (RM D.1)
 
-   --  System priority is Ada priority + 1, so lies in the range 1 .. 63.
-   --
--  If the scheduling policy is SCHED_FIFO or SCHED_RR the runtime makes use
--  of the entire range provided by the system.
--
--  If the scheduling policy is SCHED_OTHER the only valid system priority
--  is 1 and other values are simply ignored.
 
-   Max_Priority   : constant Positive := 61;
-   Max_Interrupt_Priority : constant Positive := 62;
+   Max_Priority   : constant Positive := 62;
+   Max_Interrupt_Priority : constant Positive := 63;
 
-   subtype Any_Priority   is Integer  range  0 .. 62;
-   subtype Priority   is Any_Priority range  0 .. 61;
-   subtype Interrupt_Priority is Any_Priority range 62 .. 62;
+   subtype Any_Priority   is Integer  range  1 .. 63;
+   subtype Priority   is Any_Priority range  1 .. 62;
+   subtype Interrupt_Priority is Any_Priority range 63 .. 63;
 
-   Default_Priority : constant Priority := 30;
+   Default_Priority : constant Priority := 10;
 
 private
 
-- 
2.40.0



[COMMITTED] ada: Fix too small secondary stack allocation for returned aggregate

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This restores the specific treatment of aggregates that are returned through
an extended return statement in a function returning a class-wide type, and
which was incorrectly dropped in an earlier change.
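
A minimal sketch of the pattern this fix addresses (type and function names are hypothetical): the secondary-stack allocator for the return object must be sized from the aggregate's definite type, not from the class-wide result type.

```ada
--  Hypothetical sketch: the allocator generated for Result must use
--  the definite type Rec of the aggregate, not Rec'Class, whose size
--  is not known statically.
package Demo is
   type Rec is tagged record
      I : Integer;
   end record;

   function Make return Rec'Class;
end Demo;

package body Demo is
   function Make return Rec'Class is
   begin
      return Result : Rec'Class := Rec'(I => 1) do
         null;
      end return;
   end Make;
end Demo;
```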

gcc/ada/

* exp_ch3.adb (Make_Allocator_For_Return): Deal again specifically
with an aggregate returned through an object of a class-wide type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index fbedc16ddd0..778eed7f16e 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -7167,9 +7167,20 @@ package body Exp_Ch3 is
 Expression   => Alloc_Expr));
 
  else
-Alloc :=
-  Make_Allocator (Loc,
-Expression => New_Occurrence_Of (Typ, Loc));
+--  If the return object is of a class-wide type, we cannot use
+--  its type for the allocator. Instead we use the type of the
+--  expression, which must be an aggregate of a definite type.
+
+if Is_Class_Wide_Type (Typ) then
+   Alloc :=
+ Make_Allocator (Loc,
+   Expression => New_Occurrence_Of (Etype (Expr), Loc));
+
+else
+   Alloc :=
+ Make_Allocator (Loc,
+   Expression => New_Occurrence_Of (Typ, Loc));
+end if;
 
 --  If the return object requires default initialization, then it
 --  will happen later following the elaboration of the renaming.
-- 
2.40.0



[COMMITTED] ada: Accept aspect Always_Terminates on entries

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

The recently added aspect Always_Terminates is allowed on both
procedures and entries.
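
A hedged sketch of what is now accepted (the protected type is hypothetical): the aspect can appear on an entry declaration just as on a procedure.

```ada
--  Hypothetical sketch: Always_Terminates is accepted on the entry
--  as well as on the procedure after this change.
protected type Guard is
   entry Wait with Always_Terminates => True;
   procedure Signal with Always_Terminates => True;
end Guard;
```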

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Accept pragma Always_Terminates when
it applies to an entry.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 0febc445b35..b1e4439b9f2 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -13370,6 +13370,11 @@ package body Sem_Prag is
   return;
end if;
 
+--  Entry
+
+elsif Nkind (Subp_Decl) = N_Entry_Declaration then
+   null;
+
 else
Pragma_Misplaced;
 end if;
-- 
2.40.0



[COMMITTED] ada: Fix missing finalization for aggregates nested in conditional expressions

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The finalization actions for the components of the aggregates are blocked
by Expand_Ctrl_Function_Call, which sets Is_Ignored_Transient on all the
temporaries generated from within a conditional expression whatever the
intermediate constructs.  Now aggregates and their expansion in the form
of block and loop statements are "impenetrable" as far as temporaries are
concerned, i.e. the lifetime of temporaries generated within them does
not extend beyond them, so their finalization must not be blocked there.
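
A minimal sketch of the shape being fixed, under the assumption of a hypothetical type Wrapper with a controlled component and a function Make_Ctrl returning a controlled value: the temporary from the inner conditional expression must be finalized within the aggregate, not after the outermost expression.

```ada
--  Hypothetical sketch: the controlled temporary built by Make_Ctrl
--  inside the inner if-expression lives within the aggregate, so its
--  finalization must be attached there rather than deferred past the
--  outer conditional expression.
Obj : constant Wrapper :=
  (if Cond
   then Wrapper'(Item => (if Flag then Make_Ctrl (1) else Make_Ctrl (2)))
   else Wrapper'(Item => Make_Ctrl (3)));
```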

gcc/ada/

* exp_util.ads (Within_Case_Or_If_Expression): Adjust description.
* exp_util.adb (Find_Hook_Context): Stop the search for the topmost
conditional expression, if within one, at contexts where temporaries
may be contained.
(Within_Case_Or_If_Expression): Return false upon first encountering
contexts where temporaries may be contained.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 20 
 gcc/ada/exp_util.ads |  4 +++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index b032336523d..a4aa5f64447 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -6500,6 +6500,16 @@ package body Exp_Util is
 then
Top := Par;
 
+--  Stop at contexts where temporaries may be contained
+
+elsif Nkind (Par) in N_Aggregate
+   | N_Delta_Aggregate
+   | N_Extension_Aggregate
+   | N_Block_Statement
+   | N_Loop_Statement
+then
+   exit;
+
 --  Prevent the search from going too far
 
 elsif Is_Body_Or_Package_Declaration (Par) then
@@ -14222,6 +14232,16 @@ package body Exp_Util is
  then
 return True;
 
+ --  Stop at contexts where temporaries may be contained
+
+ elsif Nkind (Par) in N_Aggregate
+| N_Delta_Aggregate
+| N_Extension_Aggregate
+| N_Block_Statement
+| N_Loop_Statement
+ then
+return False;
+
  --  Prevent the search from going too far
 
  elsif Is_Body_Or_Package_Declaration (Par) then
diff --git a/gcc/ada/exp_util.ads b/gcc/ada/exp_util.ads
index 66c4dc6be4c..24065b6f7b6 100644
--- a/gcc/ada/exp_util.ads
+++ b/gcc/ada/exp_util.ads
@@ -1240,7 +1240,9 @@ package Exp_Util is
--  extension to verify legality rules on inherited conditions.
 
function Within_Case_Or_If_Expression (N : Node_Id) return Boolean;
-   --  Determine whether arbitrary node N is within a case or an if expression
+   --  Determine whether arbitrary node N is immediately within a case or an if
+   --  expression. The criterion is whether temporaries created by the actions
+   --  attached to N need to outlive an enclosing case or if expression.
 
 private
pragma Inline (Duplicate_Subexpr);
-- 
2.40.0



[COMMITTED] ada: Add escape hatch to configurable run-time

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

Before this patch, the fact that Restrictions pragmas had to fit on
a single line in system.ads was difficult to reconcile with the
80-character line limit that is enforced in that file.

The special rules for pragmas in system.ads made it impossible to use
the Style_Checks pragma to allow long Restrictions pragmas. This patch
relaxes those rules so the Style_Checks pragma can be used in
system.ads.
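
A sketch of the new escape hatch as it could appear in system.ads (the restriction name is illustrative, not taken from the patch): style checks are switched off around a Restrictions pragma that would otherwise exceed the line-length limit.

```ada
--  Hypothetical sketch: disable line-length checking just around a
--  single long Restrictions pragma, then restore it.
pragma Style_Checks (Off);
pragma Restrictions (No_Dynamic_Accessibility_Checks);
pragma Style_Checks (On);
```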

gcc/ada/

* targparm.adb: Allow pragma Style_Checks in some forms.
* targparm.ads: Document new pragma permission.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/targparm.adb | 8 
 gcc/ada/targparm.ads | 4 
 2 files changed, 12 insertions(+)

diff --git a/gcc/ada/targparm.adb b/gcc/ada/targparm.adb
index 6e753ea6aaa..d4701453995 100644
--- a/gcc/ada/targparm.adb
+++ b/gcc/ada/targparm.adb
@@ -660,6 +660,14 @@ package body Targparm is
 Opt.Task_Dispatching_Policy_Sloc := System_Location;
 goto Line_Loop_Continue;
 
+ --  Allow "pragma Style_Checks (On);" and "pragma Style_Checks (Off);"
+ --  to make it possible to have long "pragma Restrictions" line.
+
+ elsif Looking_At_Skip ("pragma Style_Checks (On);") or else
+   Looking_At_Skip ("pragma Style_Checks (Off);")
+ then
+goto Line_Loop_Continue;
+
  --  No other configuration pragmas are permitted
 
  elsif Looking_At ("pragma ") then
diff --git a/gcc/ada/targparm.ads b/gcc/ada/targparm.ads
index 01ee492e015..212725219d7 100644
--- a/gcc/ada/targparm.ads
+++ b/gcc/ada/targparm.ads
@@ -110,6 +110,10 @@ package Targparm is
--  If a pragma Profile with a valid profile argument appears, then
--  the appropriate restrictions and policy flags are set.
 
+   --  pragma Style_Checks is allowed with "On" or "Off" as an argument, in
+   --  order to make the conditions on pragma Restrictions documented in the
+   --  next paragraph easier to manage.
+
--  The only other pragma allowed is a pragma Restrictions that specifies
--  a restriction that will be imposed on all units in the partition. Note
--  that in this context, only one restriction can be specified in a single
-- 
2.40.0



[COMMITTED] ada: Reject aspect Always_Terminates on functions and generic functions

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

The recently added aspect Always_Terminates is only allowed on
procedures.
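
A hedged sketch of the two declarations that are now rejected (names are hypothetical):

```ada
--  Hypothetical sketch: both of these are rejected after this change.
function F (X : Integer) return Integer
  with Always_Terminates;   --  error: pragma cannot apply to function

generic
function G (X : Integer) return Integer
  with Always_Terminates;   --  error: cannot apply to generic function
```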

gcc/ada/

* sem_prag.adb (Analyze_Pragma): Reject pragma Always_Terminates when
it applies to a function or generic function.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_prag.adb | 13 +
 1 file changed, 13 insertions(+)

diff --git a/gcc/ada/sem_prag.adb b/gcc/ada/sem_prag.adb
index 1fa946439ee..0febc445b35 100644
--- a/gcc/ada/sem_prag.adb
+++ b/gcc/ada/sem_prag.adb
@@ -13376,6 +13376,19 @@ package body Sem_Prag is
 
 Spec_Id := Unique_Defining_Entity (Subp_Decl);
 
+--  Pragma Always_Terminates is not allowed on functions
+
+if Ekind (Spec_Id) = E_Function then
+   Error_Msg_N (Fix_Error
+ ("pragma % cannot apply to function"), N);
+  return;
+
+elsif Ekind (Spec_Id) = E_Generic_Function then
+   Error_Msg_N (Fix_Error
+ ("pragma % cannot apply to generic function"), N);
+   return;
+end if;
+
 --  A pragma that applies to a Ghost entity becomes Ghost for the
 --  purposes of legality checks and removal of ignored Ghost code.
 
-- 
2.40.0



[COMMITTED] ada: Fix missing error on function call returning incomplete view

2023-06-15 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

Testing for the presence of Non_Limited_View is not sufficient to detect
whether the nonlimited view has been analyzed because Build_Limited_Views
always sets the field on the limited view.  Instead the discriminant is
whether this nonlimited view is itself an incomplete type.
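
A minimal sketch of the situation, assuming hypothetical packages P and Q: through the limited with, Q.T is visible only as an incomplete view, so a call to F must be rejected when the nonlimited view has not been analyzed.

```ada
--  Hypothetical sketch: Q.T is an incomplete view here, so using the
--  result of F must draw an error outside the thunk case.
limited with Q;

package P is
   function F return Q.T;
   procedure Use_F;
end P;

package body P is
   procedure Use_F is
      X : Q.T := F;   --  error: premature use of incomplete view
   begin
      null;
   end Use_F;
end P;
```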

gcc/ada/

* sem_ch4.adb (Analyze_Call): Adjust the test to detect the presence
of an incomplete view of a type on a function call.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch4.adb | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
index 02c59284994..b4b158a3ff4 100644
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -1540,8 +1540,14 @@ package body Sem_Ch4 is
 
Set_Etype (N, Full_View (Etype (N)));
 
+--  If the call is within a thunk, the nonlimited view should be
+--  analyzed eventually (see also Analyze_Return_Type).
+
 elsif From_Limited_With (Etype (N))
   and then Present (Non_Limited_View (Etype (N)))
+  and then
+(Ekind (Non_Limited_View (Etype (N))) /= E_Incomplete_Type
+  or else Is_Thunk (Current_Scope))
 then
Set_Etype (N, Non_Limited_View (Etype (N)));
 
-- 
2.40.0


