date:20230519

Re: [PATCH] [RISC-V] Fix riscv_expand_conditional_move.

2023-05-19 Thread Jeff Law via Gcc-patches





On 4/27/23 20:21, Die Li wrote:

Two issues have been observed in the current riscv_expand_conditional_move
implementation.
1. Before the introduction of TARGET_XTHEADCONDMOV, op0 of the comparison
expression was used for the mode comparison with word_mode, but after
TARGET_XTHEADCONDMOV was merged with TARGET_SFB_ALU, the dest of the
if-then-else is used for the mode comparison with word_mode, and from the md
file the mode of dest is DI or SI, which can be different from word_mode on
RV64.

2. TARGET_XTHEADCONDMOV cannot be generated when the mode of the comparison is
E_VOIDmode.

This patch solves the issues above.

Here is an example from the newly added test case.

Testcase:
int ConNmv_reg_reg_reg(int x, int y, int z, int n){
   if (x != y) return z;
   return n;
}

Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d

before patch:
ConNmv_reg_reg_reg:
bne a0,a1,.L23
mv  a2,a3
.L23:
mv  a0,a2
ret

after patch:
ConNmv_reg_reg_reg:
sub a1,a0,a1
th.mveqz a2,zero,a1
th.mvnez a3,zero,a1
or  a0,a2,a3
ret
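
The "after patch" sequence can be read as a branchless select. The following is a minimal C++ sketch of that reading, not code from the patch. It assumes the usual XTheadCondMov semantics (th.mveqz writes rs1 to rd when rs2 is zero, th.mvnez writes rs1 to rd when rs2 is nonzero); the helper names th_mveqz, th_mvnez and ConNmv_reg_reg_reg_branchless are hypothetical.

```cpp
#include <cstdint>

// Hypothetical models of the two conditional-move instructions
// (assumed semantics: rd is left unchanged when the condition fails).
static inline int64_t th_mveqz(int64_t rd, int64_t rs1, int64_t rs2) {
  return rs2 == 0 ? rs1 : rd;
}
static inline int64_t th_mvnez(int64_t rd, int64_t rs1, int64_t rs2) {
  return rs2 != 0 ? rs1 : rd;
}

// Register-by-register model of the "after patch" assembly.
int ConNmv_reg_reg_reg_branchless(int x, int y, int z, int n) {
  int64_t a0 = x, a1 = y, a2 = z, a3 = n;
  a1 = a0 - a1;              // sub      a1,a0,a1   (zero iff x == y)
  a2 = th_mveqz(a2, 0, a1);  // th.mveqz a2,zero,a1 (clear z when x == y)
  a3 = th_mvnez(a3, 0, a1);  // th.mvnez a3,zero,a1 (clear n when x != y)
  a0 = a2 | a3;              // or       a0,a2,a3   (exactly one side survives)
  return static_cast<int>(a0);
}
```

Exactly one of a2 and a3 is cleared, so the final OR returns z when x != y and n otherwise, which matches the testcase.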

Co-Authored by: Fei Gao 
Signed-off-by: Die Li 

gcc/ChangeLog:

 * config/riscv/riscv.cc (riscv_expand_conditional_move): Fix mode 
checking.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: New test.
 * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: New test.
---
  gcc/config/riscv/riscv.cc |   4 +-
  .../riscv/xtheadcondmov-indirect-rv32.c   | 116 ++
  .../riscv/xtheadcondmov-indirect-rv64.c   | 116 ++
  3 files changed, 234 insertions(+), 2 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
  create mode 100644 
gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1529855a2b4..30ace45dc5f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3411,7 +3411,7 @@ riscv_expand_conditional_move (rtx dest, rtx op, rtx 
cons, rtx alt)
&& GET_MODE_CLASS (mode) == MODE_INT
&& reg_or_0_operand (cons, mode)
&& reg_or_0_operand (alt, mode)
-  && GET_MODE (op) == mode
+  && (GET_MODE (op) == mode || GET_MODE (op) == E_VOIDmode)

So I nearly suggested we just drop this check.  In general comparisons 
don't have modes.  But I don't think it's going to hurt and it lines up 
with the predicates that test for conditions.


Note that some of the new tests are still failing (though they certainly 
do much better after your patch).

  FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O1   check-function-bodies ConNmv_imm_imm_reg
  FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2   check-function-bodies ConNmv_imm_imm_reg
  FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   check-function-bodies ConNmv_imm_imm_reg
  FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   check-function-bodies ConNmv_imm_imm_reg
  FAIL: gcc.target/riscv/xtheadcondmov-indirect-rv32.c   -O3 -g   check-function-bodies ConNmv_imm_imm_reg



[ ... and a few more instances omitted ... ]

I went ahead and pushed the patch, but you might want to double-check 
the state of those failing tests.


Jeff

Re: [PATCH 7/7] Expand directly for single bit test

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Instead of creating trees for the expansion,
just expand directly, which makes the code a little simpler
and also reduces how much GC memory is used during the expansion.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Rename to ...
(expand_single_bit_test): This and expand directly.
(do_store_flag): Update for the renamed function.

OK.

jeff

[PATCH] RISC-V: Add RVV comparison autovectorization

2023-05-19 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch enables RVV auto-vectorization of comparisons, including
floating-point unordered and ordered comparisons.

The test cases are adapted from Richard's,
so Richard is included as co-author.
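
As a rough illustration of what "compare and select" means here, the loops below are a minimal C++ sketch of scalar code that a vectorizer could map onto vcond-style patterns. The function names select_lt and select_unlt are hypothetical and are not taken from the new test cases; they only contrast an ordered comparison with an unordered one.

```cpp
#include <cmath>
#include <cstddef>

// Ordered less-than: the condition is false whenever either input is NaN.
void select_lt(float *r, const float *a, const float *b,
               const float *x, const float *y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    r[i] = a[i] < b[i] ? x[i] : y[i];
}

// Unordered-or-less-than: the condition is also true when either input is NaN.
void select_unlt(float *r, const float *a, const float *b,
                 const float *x, const float *y, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    r[i] = (std::isunordered(a[i], b[i]) || a[i] < b[i]) ? x[i] : y[i];
}
```

The two loops differ only in the lanes where an input is NaN, which is the case an unordered comparison has to handle separately.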

Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* config/riscv/autovec.md (vcond): New pattern.
(vcondu): Ditto.
(vcond): Ditto.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond_mask_): Ditto.
* config/riscv/riscv-protos.h (expand_vec_cmp_int): New function.
(expand_vec_cmp_float): New function.
(expand_vcond): New function.
(emit_merge_op): Adapt function.
* config/riscv/riscv-v.cc (emit_pred_op): Ditto.
(emit_pred_binop): Ditto.
(emit_pred_unop): New function.
(emit_len_binop): Adapt function.
(emit_len_unop): New function.
(emit_index_op): Adapt function.
(emit_merge_op): Ditto.
(expand_vcond): New function.
(emit_pred_cmp): Ditto.
(emit_len_cmp): Ditto.
(expand_vec_cmp_int): Ditto.
(expand_vec_cmp_float): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   | 141 +
 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-v.cc   | 482 --
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 ++
 .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 +++
 .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
 .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 ++
 .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 +++
 .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 970 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ce0b46537ad..5d8ba66f0c3 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -180,3 +180,144 @@
NULL_RTX, mode);
   DONE;
 })
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Compare and select
+;; -
+;; The patterns in this section are synthetic.
+;; -
+
+;; Integer (signed) vcond.  Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
+(define_expand "vcond"
+  [(set (match_operand:V 0 "register_operand")
+   (if_then_else:V
+ (match_operator 3 "comparison_operator"
+   [(match_operand:VI 4 "register_operand")
+(match_operand:VI 5 "nonmemory_operand")])
+ (match_operand:V 1 "nonmemory_operand")
+ (match_operand:V 2 "nonmemory_operand")))]
+  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
+   GET_MODE_NUNITS (mode))"
+  {
+riscv_vector::expand_vcond (mode, operands);
+DONE;
+  }
+)
+
+;; Integer vcondu.  Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
+(define_expand "vcondu"
+  [(set (match_operand:V 0 "register_operand")
+   (if_then_else:V
+ (match_operator 3 "comparison_operator"
+   [(match_operand:VI 4 "register_operand")
+(match_operand:VI 5 "nonmemory_operand")])
+ (match_operand:V 1 "nonmemory_operand")
+ (match_operand:V 2 "nonmemory_operand")))]
+  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
+   GET_MODE_NUNITS (mode))"
+  {
+riscv_vector::expand_vcond (mode, operands);
+DONE;
+  }
+)
+
+;; Floating-point vcond.  Don't enforce an immediate range here, since it
+;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
+(define_expand "vcond"
+  [(set

Re: [PATCH 6/7] Use BIT_FIELD_REF inside fold_single_bit_test

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Instead of depending on combine to do the extraction,
let's create a tree which will expand directly into
the extraction. This improves code generation on some
targets.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Use BIT_FIELD_REF
instead of shift/and.

OK.
jeff

Re: [PATCH 5/7] Simplify fold_single_bit_test with respect to code

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Since we know that fold_single_bit_test is now only passed
NE_EXPR or EQ_EXPR, we can simplify it and just use a gcc_assert
to check that one of those codes is being passed.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Add an assert
and simplify based on code being NE_EXPR or EQ_EXPR.

OK.
jeff

Re: [PATCH 4/7] Simplify fold_single_bit_test slightly

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Now that the only use of fold_single_bit_test is in do_store_flag,
we can change it to take the inner arg and bitnum
instead of building a tree. There are no code generation changes
due to this change, only a decrease in the GC memory that is produced
during expansion.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Take inner and bitnum
instead of arg0 and arg1. Update the code.
(do_store_flag): Don't create a tree when calling
fold_single_bit_test instead just call it with the bitnum
and the inner tree.

OK.
jeff

Re: [PATCH 3/7] Use get_def_for_expr in fold_single_bit_test

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

The code in fold_single_bit_test checks whether
the inner was a right shift and improves the bitnum
based on that. But since the inner will always be an
SSA_NAME at this point, the code is dead. Change it
to use the helper function get_def_for_expr instead.
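
The adjustment described above can be sketched at the value level. This is a minimal C++ model under stated assumptions, not the GCC code: it works on a fixed 32-bit unsigned value, requires shift < 32 and bitnum < 32, and the names BitTest, look_through_shift and bit_is_set are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A single-bit test: "is bit BITNUM of INNER set?"
struct BitTest {
  uint32_t inner;
  unsigned bitnum;
};

// If the tested value is (x >> shift), test bit (bitnum + shift) of x
// directly, provided the adjusted bit number stays within the precision.
BitTest look_through_shift(uint32_t x, unsigned shift, unsigned bitnum) {
  const unsigned precision = 32;
  assert(shift < precision && bitnum < precision);
  if (shift < precision - bitnum)
    return {x, bitnum + shift};   // shift folded into the bit number
  return {x >> shift, bitnum};    // out of range: keep the shift
}

bool bit_is_set(BitTest t) { return ((t.inner >> t.bitnum) & 1u) != 0; }
```

Both return paths describe the same test; the first simply avoids materializing the shift.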

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Use get_def_for_expr
instead of checking the inner's code.

OK.
jeff

Re: [PATCH 2/7] Inline and simplify fold_single_bit_test_into_sign_test into fold_single_bit_test

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Since the last use of fold_single_bit_test_into_sign_test is fold_single_bit_test,
we can inline it and even simplify the inlined version. This has
no behavior change.
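
The sign-bit special case that gets inlined rests on a simple identity: testing the top bit of a value is the same as asking whether its signed reinterpretation is negative. The following is a minimal C++ sketch of that identity for 32-bit values. The function names are hypothetical, and the unsigned-to-signed conversion is assumed to wrap (guaranteed since C++20, and what GCC does in earlier modes).

```cpp
#include <cstdint>

// (A & C) != 0 where C is the sign bit of a 32-bit A.
bool top_bit_set_mask(uint32_t a) { return (a & 0x80000000u) != 0; }

// The equivalent sign test: A < 0 after converting to the signed type.
bool top_bit_set_sign(uint32_t a) { return static_cast<int32_t>(a) < 0; }
```

The == 0 form maps to A >= 0 in the same way.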

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test_into_sign_test): Inline into ...
(fold_single_bit_test): This and simplify.

Just to be clear, based on the NFC assumption, this is OK for the trunk.
jeff

Re: [PATCH 2/7] Inline and simplify fold_single_bit_test_into_sign_test into fold_single_bit_test

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

Since the last use of fold_single_bit_test_into_sign_test is fold_single_bit_test,
we can inline it and even simplify the inlined version. This has
no behavior change.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test_into_sign_test): Inline into ...
(fold_single_bit_test): This and simplify.

Going to trust the inlining and simplification is really NFC.  It's not 
really obvious from the patch.


jeff

Re: [PATCH 1/7] Move fold_single_bit_test to expr.cc from fold-const.cc

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 20:14, Andrew Pinski via Gcc-patches wrote:

This is part 1 of an N-patch set that will change the expansion
of `(A & C) != 0` from using trees to expanding directly, so that later
on we can do some cost analysis.

Since the only user of fold_single_bit_test is now
expand, move it there.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* fold-const.cc (fold_single_bit_test_into_sign_test): Move to
expr.cc.
(fold_single_bit_test): Likewise.
* expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc
(fold_single_bit_test): Likewise and make static.
* fold-const.h (fold_single_bit_test): Remove declaration.

I'm assuming this is purely moving the bits around.

OK.

jeff

Re: [PATCH] Mode-Switching: Fix local array maybe uninitialized warning

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 17:56, pan2...@intel.com wrote:

From: Pan Li 

There are two local arrays in function optimize_mode_switching. They are
initialized conditionally at the beginning but then always consumed in
another loop. This may trigger the maybe-uninitialized warning, and may
result in a build failure when -Werror is enabled (warnings as errors).

This patch initializes the local arrays to zero explicitly at
declaration.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* mode-switching.cc (entity_map): Initialize the array to zero.
(bb_info): Ditto.

OK.
jeff

Re: [PATCH v2] RISC-V: Add bext pattern for ZBS

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/8/23 08:11, Raphael Moreira Zinsly wrote:

Changes since v1:
 - Removed name clash change.
 - Fix new pattern indentation.

-- >8 --

When (a & (1 << bit_no)) is tested inside an IF we can use a bit extract.
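
The equivalence behind this can be sketched in C++. The model of bext below assumes the Zbs definition on RV64 (shift rs1 right by the low six bits of rs2, then keep bit 0); the function names are hypothetical and bit_no is assumed to be in the range 0..63.

```cpp
#include <cstdint>

// Assumed model of the Zbs bext instruction on RV64.
uint64_t bext(uint64_t rs1, uint64_t rs2) { return (rs1 >> (rs2 & 63)) & 1; }

// The source-level form: test (a & (1 << bit_no)) inside an if.
int test_with_mask(uint64_t a, unsigned bit_no) {
  if (a & (UINT64_C(1) << bit_no))
    return 1;
  return 0;
}

// The same decision taken on the single extracted bit.
int test_with_bext(uint64_t a, unsigned bit_no) {
  if (bext(a, bit_no))
    return 1;
  return 0;
}
```

Both functions branch on the same condition; the second needs no separate shift to build the mask.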

gcc/ChangeLog:

* config/riscv/bitmanip.md
(branch_bext): New split pattern.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbs-bext-02.c: New test.

I went ahead and pushed this.

jeff

Re: [PATCH v2] RISC-V: Fix CTZ unnecessary sign extension [PR #106888]

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/8/23 08:12, Raphael Moreira Zinsly wrote:

Changes since v1:
- Remove subreg from operand 1.

-- >8 --

We were not able to match the CTZ sign extend pattern on RISC-V
because it gets optimized to zero extend and/or to ANDI patterns.
For the ANDI case, combine scrambles the RTL and generates the
extension by using subregs.
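
One way to see why the sign extension can turn into a zero extension: a 32-bit count of trailing zeros is never negative, so both extensions produce the same 64-bit value. The following is a minimal C++ sketch of that observation only, not of the new patterns. ctz32 is a hypothetical portable helper that returns 32 for a zero input, which is assumed here to match the ctzw behaviour.

```cpp
#include <cstdint>

// Hypothetical portable count of trailing zeros; yields 32 for x == 0.
int32_t ctz32(uint32_t x) {
  if (x == 0)
    return 32;
  int32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Sign-extend the 32-bit result to 64 bits.
int64_t ctz_sign_extended(uint32_t x) {
  return static_cast<int64_t>(ctz32(x));
}

// Zero-extend the 32-bit result to 64 bits.
uint64_t ctz_zero_extended(uint32_t x) {
  return static_cast<uint64_t>(static_cast<uint32_t>(ctz32(x)));
}
```

Because the result always lies in 0..32, the two extensions agree for every input.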

gcc/ChangeLog:
PR target/106888
* config/riscv/bitmanip.md
(disi2): Match with any_extend.
(disi2_sext): New pattern to match
with sign extend using an ANDI instruction.

gcc/testsuite/ChangeLog:
PR target/106888
* gcc.target/riscv/pr106888.c: New test.
* gcc.target/riscv/zbbw.c: Check for ANDI.

Thanks.  I went ahead and retested this against the trunk and pushed it.

jeff

[PATCH 5/7] Simplify fold_single_bit_test with respect to code

2023-05-19 Thread Andrew Pinski via Gcc-patches

Since we know that fold_single_bit_test is now only passed
NE_EXPR or EQ_EXPR, we can simplify it and just use a gcc_assert
to check that one of those codes is being passed.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Add an assert
and simplify based on code being NE_EXPR or EQ_EXPR.
---
 gcc/expr.cc | 108 ++--
 1 file changed, 53 insertions(+), 55 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 67a9f82ca17..b5bc3fabb7e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12909,72 +12909,70 @@ fold_single_bit_test (location_t loc, enum tree_code 
code,
  tree inner, int bitnum,
  tree result_type)
 {
-  if ((code == NE_EXPR || code == EQ_EXPR))
-{
-  tree type = TREE_TYPE (inner);
-  scalar_int_mode operand_mode = SCALAR_INT_TYPE_MODE (type);
-  int ops_unsigned;
-  tree signed_type, unsigned_type, intermediate_type;
-  tree one;
-  gimple *inner_def;
+  gcc_assert (code == NE_EXPR || code == EQ_EXPR);
 
-  /* First, see if we can fold the single bit test into a sign-bit
-test.  */
-  if (bitnum == TYPE_PRECISION (type) - 1
- && type_has_mode_precision_p (type))
-   {
- tree stype = signed_type_for (type);
- return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
- result_type,
- fold_convert_loc (loc, stype, inner),
- build_int_cst (stype, 0));
-   }
+  tree type = TREE_TYPE (inner);
+  scalar_int_mode operand_mode = SCALAR_INT_TYPE_MODE (type);
+  int ops_unsigned;
+  tree signed_type, unsigned_type, intermediate_type;
+  tree one;
+  gimple *inner_def;
 
-  /* Otherwise we have (A & C) != 0 where C is a single bit,
-convert that into ((A >> C2) & 1).  Where C2 = log2(C).
-Similarly for (A & C) == 0.  */
+  /* First, see if we can fold the single bit test into a sign-bit
+ test.  */
+  if (bitnum == TYPE_PRECISION (type) - 1
+  && type_has_mode_precision_p (type))
+{
+  tree stype = signed_type_for (type);
+  return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
+ result_type,
+ fold_convert_loc (loc, stype, inner),
+ build_int_cst (stype, 0));
+}
 
-  /* If INNER is a right shift of a constant and it plus BITNUM does
-not overflow, adjust BITNUM and INNER.  */
-  if ((inner_def = get_def_for_expr (inner, RSHIFT_EXPR))
- && TREE_CODE (gimple_assign_rhs2 (inner_def)) == INTEGER_CST
- && bitnum < TYPE_PRECISION (type)
- && wi::ltu_p (wi::to_wide (gimple_assign_rhs2 (inner_def)),
-   TYPE_PRECISION (type) - bitnum))
-   {
- bitnum += tree_to_uhwi (gimple_assign_rhs2 (inner_def));
- inner = gimple_assign_rhs1 (inner_def);
-   }
+  /* Otherwise we have (A & C) != 0 where C is a single bit,
+ convert that into ((A >> C2) & 1).  Where C2 = log2(C).
+ Similarly for (A & C) == 0.  */
 
-  /* If we are going to be able to omit the AND below, we must do our
-operations as unsigned.  If we must use the AND, we have a choice.
-Normally unsigned is faster, but for some machines signed is.  */
-  ops_unsigned = (load_extend_op (operand_mode) == SIGN_EXTEND
- && !flag_syntax_only) ? 0 : 1;
+  /* If INNER is a right shift of a constant and it plus BITNUM does
+ not overflow, adjust BITNUM and INNER.  */
+  if ((inner_def = get_def_for_expr (inner, RSHIFT_EXPR))
+   && TREE_CODE (gimple_assign_rhs2 (inner_def)) == INTEGER_CST
+   && bitnum < TYPE_PRECISION (type)
+   && wi::ltu_p (wi::to_wide (gimple_assign_rhs2 (inner_def)),
+TYPE_PRECISION (type) - bitnum))
+{
+  bitnum += tree_to_uhwi (gimple_assign_rhs2 (inner_def));
+  inner = gimple_assign_rhs1 (inner_def);
+}
 
-  signed_type = lang_hooks.types.type_for_mode (operand_mode, 0);
-  unsigned_type = lang_hooks.types.type_for_mode (operand_mode, 1);
-  intermediate_type = ops_unsigned ? unsigned_type : signed_type;
-  inner = fold_convert_loc (loc, intermediate_type, inner);
+  /* If we are going to be able to omit the AND below, we must do our
+ operations as unsigned.  If we must use the AND, we have a choice.
+ Normally unsigned is faster, but for some machines signed is.  */
+  ops_unsigned = (load_extend_op (operand_mode) == SIGN_EXTEND
+ && !flag_syntax_only) ? 0 : 1;
 
-  if (bitnum != 0)
-   inner = build2 (RSHIFT_EXPR, intermediate_type,
-   inner, size_int (bitnum));
+  signed_type = lang_hooks.types.type_for_mode (operand_mode, 0);
+  unsigned_type = lang_hooks.types.type_for_mode (operand_mode, 1);
+  intermediate_type =

[PATCH 7/7] Expand directly for single bit test

2023-05-19 Thread Andrew Pinski via Gcc-patches

Instead of creating trees for the expansion,
just expand directly, which makes the code a little simpler
and also reduces how much GC memory is used during the expansion.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Rename to ...
(expand_single_bit_test): This and expand directly.
(do_store_flag): Update for the renamed function.
---
 gcc/expr.cc | 63 -
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d04e8ed0204..6849c9627d0 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12899,15 +12899,14 @@ maybe_optimize_sub_cmp_0 (enum tree_code code, tree 
*arg0, tree *arg1)
 }
 
 
-/* If CODE with arguments INNER & (1<

[PATCH 4/7] Simplify fold_single_bit_test slightly

2023-05-19 Thread Andrew Pinski via Gcc-patches

Now that the only use of fold_single_bit_test is in do_store_flag,
we can change it to take the inner arg and bitnum
instead of building a tree. There are no code generation changes
due to this change, only a decrease in the GC memory that is produced
during expansion.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Take inner and bitnum
instead of arg0 and arg1. Update the code.
(do_store_flag): Don't create a tree when calling
fold_single_bit_test instead just call it with the bitnum
and the inner tree.
---
 gcc/expr.cc | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index a61772b6808..67a9f82ca17 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12899,23 +12899,19 @@ maybe_optimize_sub_cmp_0 (enum tree_code code, tree 
*arg0, tree *arg1)
 }
 
 
-/* If CODE with arguments ARG0 and ARG1 represents a single bit
+/* If CODE with arguments INNER & (1<

[PATCH 6/7] Use BIT_FIELD_REF inside fold_single_bit_test

2023-05-19 Thread Andrew Pinski via Gcc-patches

Instead of depending on combine to do the extraction,
let's create a tree which will expand directly into
the extraction. This improves code generation on some
targets.
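
At the value level, the change amounts to extracting a one-bit field and, for the == 0 form, flipping it. The following is a minimal C++ sketch of that idea, not the tree-building code itself. The names bit_field_pos and single_bit_test are hypothetical, the operand is fixed at 32 bits, and bitnum is assumed to be in 0..31.

```cpp
#include <cstdint>

// Position of the bit as counted from the start of the operand; mirrors the
// BYTES_BIG_ENDIAN adjustment, where the numbering runs from the other end.
int bit_field_pos(int bitnum, int mode_bits, bool bytes_big_endian) {
  return bytes_big_endian ? mode_bits - 1 - bitnum : bitnum;
}

// Value of the test: the extracted bit for (A & C) != 0,
// and the extracted bit XOR 1 for (A & C) == 0.
unsigned single_bit_test(uint32_t a, int bitnum, bool is_eq) {
  unsigned bit = (a >> bitnum) & 1u;  // the one-bit field
  return is_eq ? (bit ^ 1u) : bit;
}
```

The one-bit result already has the right value, so no trailing AND with 1 is needed.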

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Use BIT_FIELD_REF
instead of shift/and.
---
 gcc/expr.cc | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index b5bc3fabb7e..d04e8ed0204 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12957,22 +12957,21 @@ fold_single_bit_test (location_t loc, enum tree_code 
code,
   intermediate_type = ops_unsigned ? unsigned_type : signed_type;
   inner = fold_convert_loc (loc, intermediate_type, inner);
 
-  if (bitnum != 0)
-inner = build2 (RSHIFT_EXPR, intermediate_type,
-   inner, size_int (bitnum));
+  tree bftype = build_nonstandard_integer_type (1, 1);
+  int bitpos = bitnum;
 
-  one = build_int_cst (intermediate_type, 1);
+  if (BYTES_BIG_ENDIAN)
+bitpos = GET_MODE_BITSIZE (operand_mode) - 1 - bitpos;
 
-  if (code == EQ_EXPR)
-inner = fold_build2_loc (loc, BIT_XOR_EXPR, intermediate_type, inner, one);
+  inner = build3_loc (loc, BIT_FIELD_REF, bftype, inner,
+ bitsize_int (1), bitsize_int (bitpos));
 
-  /* Put the AND last so it can combine with more things.  */
-  inner = build2 (BIT_AND_EXPR, intermediate_type, inner, one);
+  one = build_int_cst (bftype, 1);
 
-  /* Make sure to return the proper type.  */
-  inner = fold_convert_loc (loc, result_type, inner);
+  if (code == EQ_EXPR)
+inner = fold_build2_loc (loc, BIT_XOR_EXPR, bftype, inner, one);
 
-  return inner;
+  return fold_convert_loc (loc, result_type, inner);
 }
 
 /* Generate code to calculate OPS, and exploded expression
-- 
2.17.1

[PATCH 3/7] Use get_def_for_expr in fold_single_bit_test

2023-05-19 Thread Andrew Pinski via Gcc-patches

The code in fold_single_bit_test checks whether
the inner was a right shift and improves the bitnum
based on that. But since the inner will always be an
SSA_NAME at this point, the code is dead. Change it
to use the helper function get_def_for_expr instead.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test): Use get_def_for_expr
instead of checking the inner's code.
---
 gcc/expr.cc | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 6221b6991c5..a61772b6808 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12920,6 +12920,7 @@ fold_single_bit_test (location_t loc, enum tree_code 
code,
   int ops_unsigned;
   tree signed_type, unsigned_type, intermediate_type;
   tree one;
+  gimple *inner_def;
 
   /* First, see if we can fold the single bit test into a sign-bit
 test.  */
@@ -12939,14 +12940,14 @@ fold_single_bit_test (location_t loc, enum tree_code 
code,
 
   /* If INNER is a right shift of a constant and it plus BITNUM does
 not overflow, adjust BITNUM and INNER.  */
-  if (TREE_CODE (inner) == RSHIFT_EXPR
- && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
+  if ((inner_def = get_def_for_expr (inner, RSHIFT_EXPR))
+ && TREE_CODE (gimple_assign_rhs2 (inner_def)) == INTEGER_CST
  && bitnum < TYPE_PRECISION (type)
- && wi::ltu_p (wi::to_wide (TREE_OPERAND (inner, 1)),
+ && wi::ltu_p (wi::to_wide (gimple_assign_rhs2 (inner_def)),
TYPE_PRECISION (type) - bitnum))
{
- bitnum += tree_to_uhwi (TREE_OPERAND (inner, 1));
- inner = TREE_OPERAND (inner, 0);
+ bitnum += tree_to_uhwi (gimple_assign_rhs2 (inner_def));
+ inner = gimple_assign_rhs1 (inner_def);
}
 
   /* If we are going to be able to omit the AND below, we must do our
-- 
2.17.1

[PATCH 2/7] Inline and simplify fold_single_bit_test_into_sign_test into fold_single_bit_test

2023-05-19 Thread Andrew Pinski via Gcc-patches

Since the last use of fold_single_bit_test_into_sign_test is fold_single_bit_test,
we can inline it and even simplify the inlined version. This has
no behavior change.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* expr.cc (fold_single_bit_test_into_sign_test): Inline into ...
(fold_single_bit_test): This and simplify.
---
 gcc/expr.cc | 51 ++-
 1 file changed, 10 insertions(+), 41 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index f999f81af4a..6221b6991c5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12899,42 +12899,6 @@ maybe_optimize_sub_cmp_0 (enum tree_code code, tree 
*arg0, tree *arg1)
 }
 
 
-
-/* If CODE with arguments ARG0 and ARG1 represents a single bit
-   equality/inequality test, then return a simplified form of the test
-   using a sign testing.  Otherwise return NULL.  TYPE is the desired
-   result type.  */
-
-static tree
-fold_single_bit_test_into_sign_test (location_t loc,
-enum tree_code code, tree arg0, tree arg1,
-tree result_type)
-{
-  /* If this is testing a single bit, we can optimize the test.  */
-  if ((code == NE_EXPR || code == EQ_EXPR)
-  && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1)
-  && integer_pow2p (TREE_OPERAND (arg0, 1)))
-{
-  /* If we have (A & C) != 0 where C is the sign bit of A, convert
-this into A < 0.  Similarly for (A & C) == 0 into A >= 0.  */
-  tree arg00 = sign_bit_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg0, 1));
-
-  if (arg00 != NULL_TREE
- /* This is only a win if casting to a signed type is cheap,
-i.e. when arg00's type is not a partial mode.  */
- && type_has_mode_precision_p (TREE_TYPE (arg00)))
-   {
- tree stype = signed_type_for (TREE_TYPE (arg00));
- return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
- result_type,
- fold_convert_loc (loc, stype, arg00),
- build_int_cst (stype, 0));
-   }
-}
-
-  return NULL_TREE;
-}
-
 /* If CODE with arguments ARG0 and ARG1 represents a single bit
equality/inequality test, then return a simplified form of
the test using shifts and logical operations.  Otherwise return
@@ -12955,14 +12919,19 @@ fold_single_bit_test (location_t loc, enum tree_code 
code,
   scalar_int_mode operand_mode = SCALAR_INT_TYPE_MODE (type);
   int ops_unsigned;
   tree signed_type, unsigned_type, intermediate_type;
-  tree tem, one;
+  tree one;
 
   /* First, see if we can fold the single bit test into a sign-bit
 test.  */
-  tem = fold_single_bit_test_into_sign_test (loc, code, arg0, arg1,
-result_type);
-  if (tem)
-   return tem;
+  if (bitnum == TYPE_PRECISION (type) - 1
+ && type_has_mode_precision_p (type))
+   {
+ tree stype = signed_type_for (type);
+ return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
+ result_type,
+ fold_convert_loc (loc, stype, inner),
+ build_int_cst (stype, 0));
+   }
 
   /* Otherwise we have (A & C) != 0 where C is a single bit,
 convert that into ((A >> C2) & 1).  Where C2 = log2(C).
-- 
2.17.1

[PATCH 1/7] Move fold_single_bit_test to expr.cc from fold-const.cc

2023-05-19 Thread Andrew Pinski via Gcc-patches

This is part 1 of an N-patch set that will change the expansion
of `(A & C) != 0` from using trees to expanding directly, so that later
on we can do some cost analysis.

Since the only user of fold_single_bit_test is now
expand, move it there.

OK? Bootstrapped and tested on x86_64-linux.

gcc/ChangeLog:

* fold-const.cc (fold_single_bit_test_into_sign_test): Move to
expr.cc.
(fold_single_bit_test): Likewise.
* expr.cc (fold_single_bit_test_into_sign_test): Move from fold-const.cc
(fold_single_bit_test): Likewise and make static.
* fold-const.h (fold_single_bit_test): Remove declaration.
---
 gcc/expr.cc   | 113 ++
 gcc/fold-const.cc | 112 -
 gcc/fold-const.h  |   1 -
 3 files changed, 113 insertions(+), 113 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 5ede094e705..f999f81af4a 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -12898,6 +12898,119 @@ maybe_optimize_sub_cmp_0 (enum tree_code code, tree 
*arg0, tree *arg1)
   *arg1 = treeop1;
 }
 
+
+
+/* If CODE with arguments ARG0 and ARG1 represents a single bit
+   equality/inequality test, then return a simplified form of the test
+   using a sign testing.  Otherwise return NULL.  TYPE is the desired
+   result type.  */
+
+static tree
+fold_single_bit_test_into_sign_test (location_t loc,
+enum tree_code code, tree arg0, tree arg1,
+tree result_type)
+{
+  /* If this is testing a single bit, we can optimize the test.  */
+  if ((code == NE_EXPR || code == EQ_EXPR)
+  && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1)
+  && integer_pow2p (TREE_OPERAND (arg0, 1)))
+{
+  /* If we have (A & C) != 0 where C is the sign bit of A, convert
+this into A < 0.  Similarly for (A & C) == 0 into A >= 0.  */
+  tree arg00 = sign_bit_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg0, 1));
+
+  if (arg00 != NULL_TREE
+ /* This is only a win if casting to a signed type is cheap,
+i.e. when arg00's type is not a partial mode.  */
+ && type_has_mode_precision_p (TREE_TYPE (arg00)))
+   {
+ tree stype = signed_type_for (TREE_TYPE (arg00));
+ return fold_build2_loc (loc, code == EQ_EXPR ? GE_EXPR : LT_EXPR,
+ result_type,
+ fold_convert_loc (loc, stype, arg00),
+ build_int_cst (stype, 0));
+   }
+}
+
+  return NULL_TREE;
+}
+
+/* If CODE with arguments ARG0 and ARG1 represents a single bit
+   equality/inequality test, then return a simplified form of
+   the test using shifts and logical operations.  Otherwise return
+   NULL.  TYPE is the desired result type.  */
+
+static tree
+fold_single_bit_test (location_t loc, enum tree_code code,
+ tree arg0, tree arg1, tree result_type)
+{
+  /* If this is testing a single bit, we can optimize the test.  */
+  if ((code == NE_EXPR || code == EQ_EXPR)
+  && TREE_CODE (arg0) == BIT_AND_EXPR && integer_zerop (arg1)
+  && integer_pow2p (TREE_OPERAND (arg0, 1)))
+{
+  tree inner = TREE_OPERAND (arg0, 0);
+  tree type = TREE_TYPE (arg0);
+  int bitnum = tree_log2 (TREE_OPERAND (arg0, 1));
+  scalar_int_mode operand_mode = SCALAR_INT_TYPE_MODE (type);
+  int ops_unsigned;
+  tree signed_type, unsigned_type, intermediate_type;
+  tree tem, one;
+
+  /* First, see if we can fold the single bit test into a sign-bit
+test.  */
+  tem = fold_single_bit_test_into_sign_test (loc, code, arg0, arg1,
+result_type);
+  if (tem)
+   return tem;
+
+  /* Otherwise we have (A & C) != 0 where C is a single bit,
+convert that into ((A >> C2) & 1).  Where C2 = log2(C).
+Similarly for (A & C) == 0.  */
+
+  /* If INNER is a right shift of a constant and it plus BITNUM does
+not overflow, adjust BITNUM and INNER.  */
+  if (TREE_CODE (inner) == RSHIFT_EXPR
+ && TREE_CODE (TREE_OPERAND (inner, 1)) == INTEGER_CST
+ && bitnum < TYPE_PRECISION (type)
+ && wi::ltu_p (wi::to_wide (TREE_OPERAND (inner, 1)),
+   TYPE_PRECISION (type) - bitnum))
+   {
+ bitnum += tree_to_uhwi (TREE_OPERAND (inner, 1));
+ inner = TREE_OPERAND (inner, 0);
+   }
+
+  /* If we are going to be able to omit the AND below, we must do our
+operations as unsigned.  If we must use the AND, we have a choice.
+Normally unsigned is faster, but for some machines signed is.  */
+  ops_unsigned = (load_extend_op (operand_mode) == SIGN_EXTEND
+ && !flag_syntax_only) ? 0 : 1;
+
+  signed_type = lang_hooks.types.type_for_mode (operand_mode, 0);
+  unsigned_type = lang_hooks.types.type_for_mode (operand_mode, 1);
+

[PATCH 0/7] Improve do_store_flag

2023-05-19 Thread Andrew Pinski via Gcc-patches

This patch set improves do_store_flag for the single-bit case.
We go back to expanding the code directly rather than building
trees first. In addition, instead of using shift+and we use bit-field
extraction directly; this improves code generation on avr.
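
As a rough illustration of the single-bit case, here is a minimal C sketch of
the two equivalent source-level forms. The function names are made up for this
example and are not part of the patch set; the sketch assumes a 32-bit operand
and bitnum < 32.

```c
#include <stdint.h>

/* Straightforward form: mask the bit, then compare against zero.  */
unsigned
bit_test_compare (uint32_t a, unsigned bitnum)
{
  return (a & (UINT32_C (1) << bitnum)) != 0;
}

/* Shift+and form: move the bit down to position 0 and mask it.
   A target with a bit-field extract instruction could do this in
   a single operation instead.  */
unsigned
bit_test_shift_and (uint32_t a, unsigned bitnum)
{
  return (a >> bitnum) & 1u;
}
```

Both functions return the same value for every input; which form is cheaper
depends on the target, for example when the cost of a shift grows with the
shift count.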

Andrew Pinski (7):
  Move fold_single_bit_test to expr.cc from fold-const.cc
  Inline and simplify fold_single_bit_test_into_sign_test into
fold_single_bit_test
  Use get_def_for_expr in fold_single_bit_test
  Simplify fold_single_bit_test slightly
  Simplify fold_single_bit_test with respect to code
  Use BIT_FIELD_REF inside fold_single_bit_test
  Expand directly for single bit test

 gcc/expr.cc   |  91 -
 gcc/fold-const.cc | 112 --
 gcc/fold-const.h  |   1 -
 3 files changed, 81 insertions(+), 123 deletions(-)

-- 
2.17.1

[PATCH] Mode-Switching: Fix local array maybe uninitialized warning

2023-05-19 Thread Pan Li via Gcc-patches

From: Pan Li 

There are 2 local arrays in the function optimize_mode_switching. They are
initialized conditionally at the beginning but then always consumed in
another loop. This may trigger the maybe-uninitialized warning, and may
result in a build failure when -Werror (warnings as errors) is enabled.

This patch initializes the local arrays to zero explicitly at
declaration.
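
The pattern can be sketched in plain C as follows. This is only an
illustration with made-up names (sum_entity_map, a fixed N_ENTITIES of 4), not
the code in mode-switching.cc. Note that "= {}" is valid in C++ and C23,
while older C dialects need "= {0}".

```c
#define N_ENTITIES 4

/* Only the first n_active slots are written, but every slot is read
   afterwards, so the array is zero-initialized at its declaration.  */
int
sum_entity_map (int n_active)
{
  int entity_map[N_ENTITIES] = {0};	/* "= {}" in C++ and C23.  */
  int total = 0;

  /* Conditional initialization: may leave some slots untouched.  */
  for (int i = 0; i < n_active && i < N_ENTITIES; i++)
    entity_map[i] = i + 1;

  /* Unconditional consumption of every slot.  */
  for (int i = 0; i < N_ENTITIES; i++)
    total += entity_map[i];

  return total;
}
```

Without the initializer, the second loop would read indeterminate values
whenever n_active is smaller than N_ENTITIES, which is what the warning is
about.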

Signed-off-by: Pan Li 

gcc/ChangeLog:

* mode-switching.cc (entity_map): Initialize the array to zero.
(bb_info): Ditto.
---
 gcc/mode-switching.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 2d2818f5674..64ae2bc29c3 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -499,8 +499,8 @@ optimize_mode_switching (void)
   bool need_commit = false;
   static const int num_modes[] = NUM_MODES_FOR_MODE_SWITCHING;
 #define N_ENTITIES ARRAY_SIZE (num_modes)
-  int entity_map[N_ENTITIES];
-  struct bb_info *bb_info[N_ENTITIES];
+  int entity_map[N_ENTITIES] = {};
+  struct bb_info *bb_info[N_ENTITIES] = {};
   int i, j;
   int n_entities = 0;
   int max_num_modes = 0;
-- 
2.34.1

Re: [V7][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-05-19 Thread Bernhard Reutner-Fischer via Gcc-patches

On Fri, 19 May 2023 20:49:47 +
Qing Zhao via Gcc-patches  wrote:

> GCC extension accepts the case when a struct with a flexible array member
> is embedded into another struct or union (possibly recursively).

Do you mean TYPE_TRAILING_FLEXARRAY()?

> diff --git a/gcc/tree.h b/gcc/tree.h
> index 0b72663e6a1..237644e788e 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -786,7 +786,12 @@ extern void omp_clause_range_check_failed (const_tree, 
> const char *, int,
> (...) prototype, where arguments can be accessed with va_start and
> va_arg), as opposed to an unprototyped function.  */
>  #define TYPE_NO_NAMED_ARGS_STDARG_P(NODE) \
> -  (TYPE_CHECK (NODE)->type_common.no_named_args_stdarg_p)
> +  (FUNC_OR_METHOD_CHECK (NODE)->type_common.no_named_args_stdarg_p)
> +
> +/* True if this RECORD_TYPE or UNION_TYPE includes a flexible array member
> +   at the last field recursively.  */
> +#define TYPE_INCLUDE_FLEXARRAY(NODE) \
> +  (RECORD_OR_UNION_CHECK (NODE)->type_common.no_named_args_stdarg_p)

Until i read the description above i read TYPE_INCLUDE_FLEXARRAY as an
option to include or not include something. The description hints more
at TYPE_INCLUDES_FLEXARRAY (with an S) to be a type which has at least
one member which has a trailing flexible array or which itself has a
trailing flexible array.

>  
>  /* In an IDENTIFIER_NODE, this means that assemble_name was called with
> this string as an argument.  */

Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0

2023-05-19 Thread Andrew Pinski via Gcc-patches

On Fri, May 19, 2023 at 9:40 AM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:
> > > While working on something else, I noticed we could improve
> > > code generation for the following function:
> > ```
> > unsigned f(unsigned t)
> > {
> >if (t & ~(1<<30)) __builtin_unreachable();
> >return t != 0;
> > }
> > ```
> > > Right now we just emit a comparison against 0 instead
> > of just a shift right by 30.
> > There is code in do_store_flag which already optimizes
> > `(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
> > extends it to handle the case where we know t has a
> > nonzero of just one bit set.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> >
> > gcc/ChangeLog:
> >
> >   * expr.cc (do_store_flag): Extend the one bit checking case
> >   to handle the case where we don't have an and but rather still
> >   one bit is known to be non-zero.
> So as we touched on in IRC, the concern is targets where the cost of the
> shift depends on the number of bits shifted.  Can we look at costing
> here to determine the initial RTL generation approach?
>
> Another approach that would work for some targets is a single bit
> extract.  In theory we should be discovering the extract idiom from the
> shift+and form, but I'm always concerned that it's going to be missed
> for one or more oddball reasons.

I now have a patch set which does the extraction directly rather than having
combine try to combine it later on. This actually fixes an issue with the avr
target, which expands the shift out as a loop. Since we are using
extract_bit_field, if a target does not have an extract pattern, it will
expand using the shift+and form instead.
I will resubmit this and the other patch after this new patch set is completed.
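
For reference, the known-bits case from the original patch can be sketched in
C like this. The function names are invented for the example, and the
precondition (t is either 0 or 1u << 30) is an assumption stated in the
comments rather than something the code checks.

```c
#include <stdint.h>

/* Precondition (assumed, not checked): t is either 0 or 1u << 30,
   i.e. bit 30 is the only bit that can be nonzero.  */

/* What the source says: compare against zero.  */
unsigned
nonzero_by_compare (uint32_t t)
{
  return t != 0;
}

/* What can be emitted instead under the precondition: because only
   bit 30 may be set, shifting it down yields 0 or 1 directly.  */
unsigned
nonzero_by_shift (uint32_t t)
{
  return t >> 30;
}
```

The two functions agree only for inputs that satisfy the precondition; for
arbitrary t the shift form can return values other than 0 and 1.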

Thanks,
Andrew Pinski

>
> jeff
>

Re: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic.

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/6/23 10:04, Roger Sayle wrote:
  


This patch adds support for (a pair of) bit reversal intrinsics
__builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
and 64-bit bit reversal (using nvptx's brev instruction) matching
the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?


2023-05-06  Roger Sayle

gcc/ChangeLog
 * config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
 builtin for bit reversal using brev instruction.
 (enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
 NVPTX_BUILTIN_BREVLL.
 (nvptx_init_builtins): Define "brev" and "brevll".
 (nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
 NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
 * doc/extend.texi (Nvidia PTX Built-in Functions): New
 section, document __builtin_nvptx_brev{,ll}.

gcc/testsuite/ChangeLog
 * gcc.target/nvptx/brev-1.c: New 32-bit test case.
 * gcc.target/nvptx/brev-2.c: Likewise.
 * gcc.target/nvptx/brevll-1.c: New 64-bit test case.
 * gcc.target/nvptx/brevll-2.c: Likewise.
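
For readers unfamiliar with the operation, the following portable C sketch
shows what 32-bit and 64-bit bit reversal compute. It is a reference model
only, with made-up names (brev32_reference, brev64_reference); it is not the
nvptx expansion, which uses the brev instruction, and it assumes uint32_t is
not narrower than int.

```c
#include <stdint.h>

/* Reverse the bits of X by swapping progressively larger groups:
   adjacent bits, pairs, nibbles, bytes, then the two halves.  */
uint32_t
brev32_reference (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
  x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
  return (x >> 16) | (x << 16);
}

/* 64-bit reversal: reverse each 32-bit half and swap the halves.  */
uint64_t
brev64_reference (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) brev32_reference (lo) << 32) | brev32_reference (hi);
}
```

Reversal is its own inverse, so applying either function twice returns the
original value.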

OK
jeff

Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/17/23 00:57, liuhongt via Gcc-patches wrote:

r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
calculation when the preferred register class is not known yet.
It regressed powerpc PR109610 and PR109858; it looks too aggressive to use
NO_REGS when the mode can be allocated with GENERAL_REGS.
The patch takes a step back: still use GENERAL_REGS when
hard_regno_mode_ok for mode and GENERAL_REGS, otherwise use NO_REGS.
Kewen confirmed the patch fixed PR109858, and I verified it also fixed PR109610.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big performance impact for SPEC2017 on icelake server.
Ok for trunk?

gcc/ChangeLog:

* ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
calculation when !hard_regno_mode_ok for GENERAL_REGS and
mode, otherwise still use GENERAL_REGS.

BTW, Vlad is on PTO right now.  I'm sure he'll handle this after he 
returns and starts digging out of all the stuff that's piled up.


jeff

Re: [PATCH] configure: Implement --enable-host-bind-now

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/16/23 09:37, Marek Polacek via Gcc-patches wrote:

As promised in the --enable-host-pie patch, this patch adds another
configure option, --enable-host-bind-now, which adds -z now when linking
the compiler executables in order to extend hardening.  BIND_NOW with RELRO
allows the GOT to be marked RO; this prevents GOT modification attacks.

This option does not affect linking of target libraries; you can use
LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.

With this patch:
$ readelf -Wd cc1{,plus} | grep FLAGS
  0x001e (FLAGS)  BIND_NOW
  0x6ffb (FLAGS_1)Flags: NOW PIE
  0x001e (FLAGS)  BIND_NOW
  0x6ffb (FLAGS_1)Flags: NOW PIE

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

c++tools/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.
* configure: Regenerate.

gcc/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.  Add
-Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
* configure: Regenerate.
* doc/install.texi: Document --enable-host-bind-now.

lto-plugin/ChangeLog:

* configure.ac (--enable-host-bind-now): New check.  Link with
-z,now.
* configure: Regenerate.

OK
jeff

Re: [V7][PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-19 Thread Joseph Myers

On Fri, 19 May 2023, Qing Zhao via Gcc-patches wrote:

> +GCC extension accepts a structure containing an ISO C99 @dfn{flexible array

"The GCC extension" or "A GCC extension".

> +@item
> +A structure containing a C99 flexible array member, or a union containing
> +such a structure, is the middle field of another structure, for example:

There might be more than one middle field, and I think this case also 
includes where it's the *first* field - any field other than the last.

> +@smallexample
> +struct flex  @{ int length; char data[]; @};
> +
> +struct mid_flex @{ int m; struct flex flex_data; int n; @};
> +@end smallexample
> +
> +In the above, @code{mid_flex.flex_data.data[]} has undefined behavior.

And it's not literally mid_flex.flex_data.data[] that has undefined 
behavior, but trying to access a member of that array.

> +Compilers do not handle such case consistently, Any code relying on

"such a case", and "," should be "." at the end of a sentence.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [C PATCH] Remove dead code related to type compatibility across TUs.

2023-05-19 Thread Joseph Myers

On Fri, 19 May 2023, Martin Uecker via Gcc-patches wrote:

> Repost for stage 1.
> 
> 
> C: Remove dead code related to type compatibility across TUs.
> 
> Code to detect struct/unions across the same TU is not needed
> anymore. Code for determining compatibility of tagged types is
> preserved as it will be used for C2X. Some errors in the unused
> code are fixed.
> 
> Bootstrapped with no regressions for x86_64-pc-linux-gnu.
> 
> gcc/c/
> * c-decl.cc (set_type_context): Remove.
> (pop_scope, diagnose_mismatched_decls, pushdecl):
> Remove dead code.
> * c-typeck.cc (comptypes_internal): Remove dead code.
> (same_translation_unit_p): Remove.
> (tagged_types_tu_compatible_p): Some fixes.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

[V7][PATCH 2/2] Update documentation to clarify a GCC extension [PR77650]

2023-05-19 Thread Qing Zhao via Gcc-patches

on a structure with a C99 flexible array member being nested in
another structure.

"GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively)
to be a member of a structure.

 There are two situations:

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is the last field of another structure,
 for example:

  struct flex  { int length; char data[]; };
  union union_flex { int others; struct flex f; };

  struct out_flex_struct { int m; struct flex flex_data; };
  struct out_flex_union { int n; union union_flex flex_data; };

 In the above, both 'out_flex_struct.flex_data.data[]' and
 'out_flex_union.flex_data.f.data[]' are considered as flexible
 arrays too.

   * A structure containing a C99 flexible array member, or a union
 containing such a structure, is the middle field of another structure,
 for example:

  struct flex  { int length; char data[]; };

  struct mid_flex { int m; struct flex flex_data; int n; };

 In the above, 'mid_flex.flex_data.data[]' has undefined behavior.
 Compilers do not handle such case consistently, Any code relying on
 such case should be modified to ensure that flexible array members
 only end up at the ends of structures.

 Please use warning option '-Wflex-array-member-not-at-end' to
 identify all such cases in the source code and modify them.  This
 warning will be on by default starting from GCC 15.
"

gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
 gcc/c-family/c.opt|  5 +++
 gcc/c/c-decl.cc   |  9 
 gcc/doc/extend.texi   | 45 ++-
 .../gcc.dg/variable-sized-type-flex-array.c   | 31 +
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index cddeece..c26d9801b63 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -737,6 +737,11 @@ Wformat-truncation=
 C ObjC C++ LTO ObjC++ Joined RejectNegative UInteger Var(warn_format_trunc) 
Warning LangEnabledBy(C ObjC C++ LTO ObjC++,Wformat=, warn_format >= 1, 0) 
IntegerRange(0, 2)
 Warn about calls to snprintf and similar functions that truncate output.
 
+Wflex-array-member-not-at-end
+C C++ Var(warn_flex_array_member_not_at_end) Warning
+Warn when a structure containing a C99 flexible array member as the last
+field is not at the end of another structure.
+
 Wif-not-aligned
 C ObjC C++ ObjC++ Var(warn_if_not_aligned) Init(1) Warning
 Warn when the field in a struct is not aligned.
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 2c620b681d9..9a48f28788d 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9293,6 +9293,15 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
TYPE_INCLUDE_FLEXARRAY (t)
  = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));
 
+  if (warn_flex_array_member_not_at_end
+ && !is_last_field
+ && RECORD_OR_UNION_TYPE_P (TREE_TYPE (x))
+ && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x)))
+   warning_at (DECL_SOURCE_LOCATION (x),
+   OPT_Wflex_array_member_not_at_end,
+   "structure containing a flexible array member"
+   " is not at the end of another structure");
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ed8b9c8a87b..6425ba57e88 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1751,7 +1751,50 @@ Flexible array members may only appear as the last 
member of a
 A structure containing a flexible array member, or a union containing
 such a structure (possibly recursively), may not be a member of a
 structure or an element of an array.  (However, these uses are
-permitted by GCC as extensions.)
+permitted by GCC as extensions, see details below.)
+@end itemize
+
+GCC extension accepts a structure containing an ISO C99 @dfn{flexible array
+member}, or a union containing such a structure (possibly recursively)
+to be a member of a structure.
+
+There are two situations:
+
+@itemize @bullet
+@item
+A structure containing a C99 flexible array member, or a union containing
+such a structure, is the last field of another structure, for example:
+
+@smallexample
+struct flex  @{ int length; char data[];

Re: [C PATCH v2] Fix ICEs related to VM types in C [PR106465, PR107557, PR108423, PR109450]

2023-05-19 Thread Joseph Myers

On Fri, 19 May 2023, Martin Uecker via Gcc-patches wrote:

> Thanks Joseph! 
> 
> Revised version attached. Ok?

The C front-end changes and tests are OK.

> But I wonder whether we generally need to do something 
> about
> 
>   sizeof *x
> 
> when x is NULL or not initialized. This is quite commonly
> used in C code and if the type is not of variable size,
> it is also unproblematic.  So the UB for variable size is
> unfortunate and certainly also affects existing code in
> the wild.  In practice it does not seem to cause
> problems because there is no lvalue conversion and this
> then seems to work.  Maybe we document this as an 
> extension?  (and make sure in the C FE that it
> works)  This would also make this idiom valid:

There's certainly a tricky question of what exactly it means to evaluate 
*x as far as producing an lvalue but without converting it to an rvalue - 
but right now the C standard wording on unary '*' is clear that "if it 
points to an object, the result is an lvalue designating the object" and 
"If an invalid value has been assigned to the pointer, the behavior of the 
unary * operator is undefined.", i.e. it's the evaluation as far as 
producing an lvalue that produces undefined behavior, rather than the 
lvalue conversion (that doesn't happen in sizeof) that does so.  And 
indeed we probably would be able to define semantics that avoid UB if 
desired.
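
To make the distinction concrete, here is a small C sketch. The names
(struct node, node_new, row_bytes) are invented for this example. The first
function uses the common idiom with a fixed-size type, where the operand of
sizeof is not evaluated. The second uses a variably modified type, where the
operand is evaluated, and therefore takes care that the pointer refers to a
real object. It assumes a compiler with VLA support and n > 0.

```c
#include <stddef.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

/* Fixed-size type: "sizeof *n" is not evaluated, so it does not matter
   that n has not been assigned yet when the size is computed.  */
struct node *
node_new (int value)
{
  struct node *n;
  n = malloc (sizeof *n);
  if (n == NULL)
    return NULL;
  n->value = value;
  n->next = NULL;
  return n;
}

/* Variable-size type: "*x" has a VLA type, so the operand of sizeof is
   evaluated.  Here x points to a real object, which keeps this well
   defined under the current wording.  Requires n > 0.  */
size_t
row_bytes (size_t n)
{
  int row[n];
  int (*x)[n] = &row;
  return sizeof *x;
}
```

The case discussed above is the second function with x left uninitialized or
null, which is what the standard currently leaves undefined.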

-- 
Joseph S. Myers
jos...@codesourcery.com

[V7][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-05-19 Thread Qing Zhao via Gcc-patches

GCC extension accepts the case when a struct with a flexible array member
is embedded into another struct or union (possibly recursively).
__builtin_object_size should treat such struct as flexible size.
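
As a sketch of why such a struct has flexible size, consider the last-field
nesting below. The struct names follow the documentation example in patch 2/2;
the helper functions (out_flex_alloc_size, out_flex_new) are made up for this
illustration, and the nesting itself relies on the GCC extension, so it may be
diagnosed under -Wpedantic.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct flex { int length; char data[]; };

/* GNU extension: a struct with a flexible array member as the last
   field of another struct.  */
struct out_flex_struct { int m; struct flex flex_data; };

/* Bytes needed for an object whose trailing array holds LENGTH chars.
   This is larger than sizeof (struct out_flex_struct) whenever
   LENGTH > 0, which is why the type cannot be treated as fixed size.  */
size_t
out_flex_alloc_size (size_t length)
{
  return sizeof (struct out_flex_struct) + length;
}

/* Allocate and fill such an object; returns NULL on failure.  */
struct out_flex_struct *
out_flex_new (int m, const char *bytes, int length)
{
  if (length < 0)
    return NULL;

  struct out_flex_struct *p = malloc (out_flex_alloc_size ((size_t) length));
  if (p == NULL)
    return NULL;

  p->m = m;
  p->flex_data.length = length;
  if (length > 0)
    memcpy (p->flex_data.data, bytes, (size_t) length);
  return p;
}
```

An object created this way extends past sizeof (struct out_flex_struct), so a
size query on p->flex_data that assumed a fixed-size struct would
under-report the accessible bytes.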

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDE_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro TYPE_INCLUDE_FLEXARRAY.

gcc/testsuite/ChangeLog:

PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
---
 gcc/c/c-decl.cc   |  11 ++
 gcc/lto/lto-common.cc |   5 +-
 gcc/print-tree.cc |   5 +
 .../gcc.dg/builtin-object-size-pr101832.c | 134 ++
 gcc/tree-core.h   |   2 +
 gcc/tree-object-size.cc   |  23 ++-
 gcc/tree-streamer-in.cc   |   5 +-
 gcc/tree-streamer-out.cc  |   5 +-
 gcc/tree.h|   7 +-
 9 files changed, 192 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index b5b491cf2da..2c620b681d9 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9282,6 +9282,17 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
   /* Set DECL_NOT_FLEXARRAY flag for FIELD_DECL x.  */
   DECL_NOT_FLEXARRAY (x) = !is_flexible_array_member_p (is_last_field, x);
 
+  /* Set TYPE_INCLUDE_FLEXARRAY for the context of x, t.
+when x is an array and is the last field.  */
+  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE)
+   TYPE_INCLUDE_FLEXARRAY (t)
+ = is_last_field && flexible_array_member_type_p (TREE_TYPE (x));
+  /* Recursively set TYPE_INCLUDE_FLEXARRAY for the context of x, t
+when x is an union or record and is the last field.  */
+  else if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
+   TYPE_INCLUDE_FLEXARRAY (t)
+ = is_last_field && TYPE_INCLUDE_FLEXARRAY (TREE_TYPE (x));
+
   if (DECL_NAME (x)
  || RECORD_OR_UNION_TYPE_P (TREE_TYPE (x)))
saw_named_field = true;
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 537570204b3..35827aab075 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1275,7 +1275,10 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
   if (AGGREGATE_TYPE_P (t1))
compare_values (TYPE_TYPELESS_STORAGE);
   compare_values (TYPE_EMPTY_P);
-  compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (FUNC_OR_METHOD_TYPE_P (t1))
+   compare_values (TYPE_NO_NAMED_ARGS_STDARG_P);
+  if (RECORD_OR_UNION_TYPE_P (t1))
+   compare_values (TYPE_INCLUDE_FLEXARRAY);
   compare_values (TYPE_PACKED);
   compare_values (TYPE_RESTRICT);
   compare_values (TYPE_USER_ALIGN);
diff --git a/gcc/print-tree.cc b/gcc/print-tree.cc
index ccecd3dc6a7..aaded53b1b1 100644
--- a/gcc/print-tree.cc
+++ b/gcc/print-tree.cc
@@ -632,6 +632,11 @@ print_node (FILE *file, const char *prefix, tree node, int 
indent,
  && TYPE_CXX_ODR_P (node))
fputs (" cxx-odr-p", file);
 
+  if ((code == RECORD_TYPE
+  || code == UNION_TYPE)
+ && TYPE_INCLUDE_FLEXARRAY (node))
+   fputs (" include-flexarray", file);
+
   /* The transparent-union flag is used for different things in
 different nodes.  */
   if ((code == UNION_TYPE || code == RECORD_TYPE)
diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
new file mode 100644
index 000..60078e11634
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
@@ -0,0 +1,134 @@
+/* PR 101832: 
+   GCC extension accepts the case when a struct with a C99 flexible array
+   member is embedded into another struct (possibly recursively).
+   __builtin_object_size will treat such struct as flexible size.
+   However, when a structure with

[V7][PATCH 0/2]Accept and Handle the case when a structure including a FAM nested in another structure

2023-05-19 Thread Qing Zhao via Gcc-patches

Hi,

This is the 7th version of the patch, rebased on the latest trunk.
This is an important patch needed by the Linux Kernel security project.

We have already had an extensive discussion on this issue, and I have gone
through 6 revisions of the patches based on that discussion and resolved
all the comments and suggestions raised during it.

Compared to the 6th version, the major changes are:

1. update the documentation to replace the mention of GCC14 with
GCC15.
2. update the documentation to replace the following wording:
"A structure or a union with a C99 flexible array member"
with:
"A structure containing a C99 flexible array member, or a union containing
such a structure,"

Everything else is the same as in the 6th version.

The 6th version is here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616312.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616313.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616314.html

Kees has tested the 6th version of the patch with the Linux kernel, and everything
is good. It resolved many false positives for bounds checking.

Notes for the review history of these patches (2 patches)
1.The patch 1/2: Handle component_ref to a structre/union field including
  flexible array member [PR101832]

   The C front-end part has been approved by Joseph.
   For the middle-end, most of the change has been reviewed by Richard
   (and modified based on his comments and suggestions), except the change
   in tree-object-size.cc.
  
2.The patch 2/2: Update documentation to clarify a GCC extension

   This is basically a C FE and documentation change, I have updated it based
   on previous comments and suggestions.
   Joseph, could you review it to see whether this version is ready to go?

bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks a lot.

Qing

(for more details on the review history, I listed other important notes
below:


A. Richard Biener has reviewed the middle-end part of the first patch 
and raised some comments for the 4th version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613643.html

I updated it with his suggestion and Sandra’s comments as 5th version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614100.html

B. The comments for the 5th version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614511.html
(In this one, Joseph approved the C FE change of the first patch).
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614514.html
(In this one, Joseph raised two comments on the documentation wordings
 for the 2nd patch. And I updated  based on his comment in the 6th version)
)

[C PATCH] Remove dead code related to type compatibility across TUs.

2023-05-19 Thread Martin Uecker via Gcc-patches



Repost for stage 1.


C: Remove dead code related to type compatibility across TUs.

Code to detect struct/unions across the same TU is not needed
anymore. Code for determining compatibility of tagged types is
preserved as it will be used for C2X. Some errors in the unused
code are fixed.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c/
* c-decl.cc (set_type_context): Remove.
(pop_scope, diagnose_mismatched_decls, pushdecl):
Remove dead code.
* c-typeck.cc (comptypes_internal): Remove dead code.
(same_translation_unit_p): Remove.
(tagged_types_tu_compatible_p): Some fixes.

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index f63c1108ab5..70345b4b019 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -1155,16 +1155,6 @@ update_label_decls (struct c_scope *scope)
 }
 }
 
-/* Set the TYPE_CONTEXT of all of TYPE's variants to CONTEXT.  */
-
-static void
-set_type_context (tree type, tree context)
-{
-  for (type = TYPE_MAIN_VARIANT (type); type;
-   type = TYPE_NEXT_VARIANT (type))
-TYPE_CONTEXT (type) = context;
-}
-
 /* Exit a scope.  Restore the state of the identifier-decl mappings
that were in effect when this scope was entered.  Return a BLOCK
node containing all the DECLs in this scope that are of interest
@@ -1253,7 +1243,6 @@ pop_scope (void)
case ENUMERAL_TYPE:
case UNION_TYPE:
case RECORD_TYPE:
- set_type_context (p, context);
 
  /* Types may not have tag-names, in which case the type
 appears in the bindings list with b->id NULL.  */
@@ -1364,12 +1353,7 @@ pop_scope (void)
 the TRANSLATION_UNIT_DECL.  This makes same_translation_unit_p
 work.  */
  if (scope == file_scope)
-   {
  DECL_CONTEXT (p) = context;
- if (TREE_CODE (p) == TYPE_DECL
- && TREE_TYPE (p) != error_mark_node)
-   set_type_context (TREE_TYPE (p), context);
-   }
 
  gcc_fallthrough ();
  /* Parameters go in DECL_ARGUMENTS, not BLOCK_VARS, and have
@@ -2318,21 +2302,18 @@ diagnose_mismatched_decls (tree newdecl, tree olddecl,
{
  if (DECL_INITIAL (olddecl))
{
- /* If both decls are in the same TU and the new declaration
-isn't overriding an extern inline reject the new decl.
-In c99, no overriding is allowed in the same translation
-unit.  */
- if ((!DECL_EXTERN_INLINE (olddecl)
-  || DECL_EXTERN_INLINE (newdecl)
-  || (!flag_gnu89_inline
-  && (!DECL_DECLARED_INLINE_P (olddecl)
-  || !lookup_attribute ("gnu_inline",
-DECL_ATTRIBUTES (olddecl)))
-  && (!DECL_DECLARED_INLINE_P (newdecl)
-  || !lookup_attribute ("gnu_inline",
-DECL_ATTRIBUTES (newdecl
- )
- && same_translation_unit_p (newdecl, olddecl))
+ /* If the new declaration isn't overriding an extern inline
+reject the new decl. In c99, no overriding is allowed
+in the same translation unit.  */
+ if (!DECL_EXTERN_INLINE (olddecl)
+ || DECL_EXTERN_INLINE (newdecl)
+ || (!flag_gnu89_inline
+ && (!DECL_DECLARED_INLINE_P (olddecl)
+ || !lookup_attribute ("gnu_inline",
+   DECL_ATTRIBUTES (olddecl)))
+ && (!DECL_DECLARED_INLINE_P (newdecl)
+ || !lookup_attribute ("gnu_inline",
+   DECL_ATTRIBUTES (newdecl)
{
  auto_diagnostic_group d;
  error ("redefinition of %q+D", newdecl);
@@ -3360,18 +3341,11 @@ pushdecl (tree x)
 type to the composite of all the types of that declaration.
 After the consistency checks, it will be reset to the
 composite of the visible types only.  */
-  if (b && (TREE_PUBLIC (x) || same_translation_unit_p (x, b->decl))
- && b->u.type)
+  if (b && b->u.type)
TREE_TYPE (b->decl) = b->u.type;
 
-  /* The point of the same_translation_unit_p check here is,
-we want to detect a duplicate decl for a construct like
-foo() { extern bar(); } ... static bar();  but not if
-they are in different translation units.  In any case,
-the static does not go in the externals scope.  */
-  if (b
- && (TREE_PUBLIC (x) || same_translation_unit_p (x, b->decl))
- && duplicate_decls (x, b->decl))
+  /* the static does not go in the externals scope.  */
+  if (b && duplicate_decls (x, b->decl))

Re: [PATCH v4 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/16/23 06:35, Ajit Agarwal wrote:



On 29/04/23 5:03 am, Jeff Law wrote:



On 4/28/23 16:42, Hans-Peter Nilsson wrote:

On Sat, 22 Apr 2023, Ajit Agarwal via Gcc-patches wrote:


Hello All:

This new version of patch 4 improves the ree pass for the rs6000 target using
defined ABI interfaces.
Bootstrapped and regtested on power64-linux-gnu.

Thanks & Regards
Ajit


 ree: Improve ree pass for rs6000 target using defined abi interfaces

  For the rs6000 target we see redundant zero and sign
  extensions.  Improve the ree pass to eliminate
  such redundant zero and sign extensions using defined
  ABI interfaces.

  2023-04-22  Ajit Kumar Agarwal  

gcc/ChangeLog:

  * ree.cc (combine_reaching_defs): Add zero_extend
  using defined abi interfaces.
  (add_removable_extension): use of defined abi interfaces
  for no reaching defs.
  (abi_extension_candidate_return_reg_p): New defined ABI function.
  (abi_extension_candidate_p): New defined ABI function.
  (abi_extension_candidate_argno_p): New defined ABI function.
  (abi_handle_regs_without_defs_p): New defined ABI function.

gcc/testsuite/ChangeLog:

  * g++.target/powerpc/zext-elim-3.C
---
   gcc/ree.cc    | 176 +++---
   .../g++.target/powerpc/zext-elim-3.C  |  16 ++
   2 files changed, 162 insertions(+), 30 deletions(-)
   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index 413aec7c8eb..0de96b1ece1 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -473,7 +473,8 @@ get_defs (rtx_insn *insn, rtx reg, vec *dest)
   break;
   }
   -  gcc_assert (use != NULL);
+  if (use == NULL)
+    return NULL;
       ref_chain = DF_REF_CHAIN (use);
   @@ -514,7 +515,8 @@ get_uses (rtx_insn *insn, rtx reg)
   if (REGNO (DF_REF_REG (def)) == REGNO (reg))
     break;
   -  gcc_assert (def != NULL);
+  if (def == NULL)
+    return NULL;
       ref_chain = DF_REF_CHAIN (def);
   @@ -750,6 +752,103 @@ get_extended_src_reg (rtx src)
     return src;
   }
   +/* Return TRUE if the candidate insn is zero extend and regno is
+   an return  registers.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+    return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+    return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+    = (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+    return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is zero extend and regno is
+   an argument registers.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+    return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+    return true;
+
+  return false;
+}


I don't see anything in those functions that checks if
ZERO_EXTEND is actually a feature of the ABI, e.g. as opposed to
no extension or SIGN_EXTEND.  Do I miss something?

I don't think you missed anything.  That was one of the points I was making 
last week.  Somewhere, somehow we need to describe what the ABI mandates and 
guarantees.

So while what Ajit has done is a step forward, at some point the actual details 
of the ABI need to be described in a way that can be checked and consumed by 
REE.



The ABI information the ree pass needs is the set of argument registers and 
return registers. Based on that I have described the interfaces that we need. 
Other than that we don't need any other ABI hooks. I have used the 
FUNCTION_VALUE_REGNO_P and FUNCTION_ARG_REGNO_P ABI hooks.
You're working with one of many ABIs, some of which have useful 
properties, some of which do not.


Simply testing FUNCTION_VALUE_REGNO_P/FUNCTION_ARG_REGNO_P is not 
sufficient.  You need to be able to query the ABI properties.


jeff
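
As a rough illustration of the point above, the sketch below shows what a queryable ABI description might look like. Everything in it is hypothetical: abi_properties, abi_extension_kind, the toy register range and zero_extend_of_arg_is_redundant are invented for this example and are not existing GCC hooks. The only idea it tries to capture is that a pass would ask what extension the ABI guarantees, instead of inferring it from a register test alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: what an ABI promises about the upper bits of a narrow
   value that lives in a wider register.  */
enum abi_extension_kind
{
  ABI_EXT_NONE,   /* Upper bits are unspecified.  */
  ABI_EXT_ZERO,   /* Value is guaranteed to be zero-extended.  */
  ABI_EXT_SIGN    /* Value is guaranteed to be sign-extended.  */
};

/* Hypothetical per-ABI description; not a real GCC data structure.  */
struct abi_properties
{
  enum abi_extension_kind arg_extension;
  enum abi_extension_kind return_extension;
  bool (*arg_regno_p) (int regno);
};

/* Toy register file: registers 3..10 carry arguments (an assumption
   made only for this example).  */
static bool
toy_arg_regno_p (int regno)
{
  return regno >= 3 && regno <= 10;
}

const struct abi_properties toy_zero_ext_abi
  = { ABI_EXT_ZERO, ABI_EXT_ZERO, toy_arg_regno_p };
const struct abi_properties toy_no_ext_abi
  = { ABI_EXT_NONE, ABI_EXT_NONE, toy_arg_regno_p };

/* A zero_extend of an incoming argument is redundant only if REGNO is
   an argument register AND the ABI guarantees zero extension.  */
bool
zero_extend_of_arg_is_redundant (const struct abi_properties *abi, int regno)
{
  assert (abi != NULL && abi->arg_regno_p != NULL);
  return abi->arg_regno_p (regno) && abi->arg_extension == ABI_EXT_ZERO;
}
```

With the same register test, the two toy ABIs give different answers, which is the distinction a plain FUNCTION_ARG_REGNO_P check cannot express.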

Re: [PATCH] MIPS: don't expand large block move

2023-05-19 Thread Maciej W. Rozycki

On Fri, 19 May 2023, Jeff Law wrote:

> > diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> > index ca491b981a3..00f26d5e923 100644
> > --- a/gcc/config/mips/mips.cc
> > +++ b/gcc/config/mips/mips.cc
> > @@ -8313,6 +8313,12 @@ mips_expand_block_move (rtx dest, rtx src, rtx
> > length)
> > }
> > else if (optimize)
> > {
> > + /* When the length is big enough, the lib call has better performace
> > +than load/store insns.
> > +In most platform, the value is about 64-128.
> > +And in fact lib call may be optimized with SIMD */
> > + if (INTVAL(length) >= 64)
> > +   return false;
> Just a formatting nit.  Space between INTVAL and the open paren for its
> argument list.

 This is oddly wrapped too.  I'd move "performace" (typo there!) to the 
second line, to align better with the rest of the text.

 Plus s/platform/platforms/ and there's a full stop missing along with two 
spaces at the end.  Also there's inconsistent style around <= and >=; the 
GNU Coding Standards ask for spaces around binary operators.  And "don't" 
in the change heading ought to be capitalised.

 In fact, I'd justify the whole paragraph as each sentence doesn't have to 
start on a new line, and the commit description could benefit from some 
reformatting too, as it's now odd to read.

> OK with that change.

 I think the conditional would be more readable if it were flattened 
though:

  if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_STRAIGHT)
...
  else if (INTVAL (length) >= 64)
...
  else if (optimize)
...

or even:

  if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_STRAIGHT)
...
  else if (INTVAL (length) < 64 && optimize)
...

One just wouldn't write it as proposed if creating the whole piece from 
scratch rather than retrofitting this extra conditional.

 Ultimately it may have to be tunable as LWL/LWR, etc. may be subject to 
fusion and may be faster after all.

  Maciej
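
To make the comparison concrete, here is a small standalone sketch of the nested form from the patch next to the second flattened form suggested above. It is a model only: MAX_MOVE_BYTES_STRAIGHT stands in for MIPS_MAX_MOVE_BYTES_STRAIGHT with an assumed value of 32, and the move_strategy enum and function names are hypothetical, not part of mips.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in for MIPS_MAX_MOVE_BYTES_STRAIGHT.  */
#define MAX_MOVE_BYTES_STRAIGHT 32
/* Threshold from the patch under review.  */
#define LIBCALL_THRESHOLD 64

enum move_strategy { MOVE_STRAIGHT, MOVE_LOOP, MOVE_LIBCALL };

/* Nested form, mirroring the shape of the proposed patch.  */
enum move_strategy
choose_nested (long length, bool optimize)
{
  assert (length >= 0);
  if (length <= MAX_MOVE_BYTES_STRAIGHT)
    return MOVE_STRAIGHT;
  else if (optimize)
    {
      if (length >= LIBCALL_THRESHOLD)
        return MOVE_LIBCALL;
      return MOVE_LOOP;
    }
  return MOVE_LIBCALL;
}

/* Flattened form: one condition per strategy.  */
enum move_strategy
choose_flat (long length, bool optimize)
{
  assert (length >= 0);
  if (length <= MAX_MOVE_BYTES_STRAIGHT)
    return MOVE_STRAIGHT;
  else if (length < LIBCALL_THRESHOLD && optimize)
    return MOVE_LOOP;
  return MOVE_LIBCALL;
}

/* Return true if both forms pick the same strategy for every length
   in [0, LIMIT], with and without optimization.  */
bool
forms_agree (long limit)
{
  for (long len = 0; len <= limit; len++)
    for (int opt = 0; opt <= 1; opt++)
      if (choose_nested (len, opt != 0) != choose_flat (len, opt != 0))
        return false;
  return true;
}
```

Under these assumptions the two forms select the same strategy for every input, so the choice between them is purely about readability.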

Re: [PATCH 08/14] fortran: use _P() defines from tree.h

2023-05-19 Thread Bernhard Reutner-Fischer via Gcc-patches

On Thu, 18 May 2023 21:20:41 +0200
Mikael Morin  wrote:

> On 18/05/2023 at 17:18, Bernhard Reutner-Fischer wrote:

> > I've fed gfortran.h into the script and found some CLASS_DATA spots,
> > see attached bootstrapped and tested patch.
> > Do we want to have that?  
> Some of it makes sense, but not all of it.
> 
> It is a macro to access the _data component of a class container.
> So for class-related stuff it makes sense to use CLASS_DATA, and 
> typically there will be a check that the type is BT_CLASS before.
> But for cases where we loop over all of the components of a type that is 
> not necessarily a class container, it doesn't make sense to use CLASS_DATA.
> 
> So I suggest to only keep the following hunks.
[]
> OK for those hunks.

Pushed those as r14-1001-g05b7cc7daac8b3
Many thanks!

PS: I'm attaching the fugly script I used to do these macro
replacements FYA.


use-defines.1.awk
Description: application/awk

Re: [PATCH] c++: mangle noexcept-expr [PR70790]

2023-05-19 Thread Patrick Palka via Gcc-patches

On Fri, 19 May 2023, Patrick Palka wrote:

> This implements noexcept-expr mangling (and demangling) as per the
> Itanium ABI.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
> look OK for trunk?
> 
>   PR c++/70790
> 
> gcc/cp/ChangeLog:
> 
>   * mangle.cc (write_expression): Handle NOEXCEPT_EXPR.
> 
> libiberty/ChangeLog:
> 
>   * cp-demangle.c (cplus_demangle_operators): Add the noexcept
>   operator.

Oops, we should also make sure we print parens around the operand of
noexcept.  Otherwise we'd demangle the mangling of e.g.

  void f(A)

instead as

  void f(A)

Fixed in the following patch:

-- >8 --

Subject: [PATCH] c++: mangle noexcept-expr [PR70790]

This implements noexcept-expr mangling (and demangling) as per the
Itanium ABI.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk?

PR c++/70790

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Handle NOEXCEPT_EXPR.

libiberty/ChangeLog:

* cp-demangle.c (cplus_demangle_operators): Add the noexcept
operator.
(d_print_comp_inner) : Always
print parens around the operand of noexcept too.
* testsuite/demangle-expected: Test noexcept operator
demangling.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle78.C: New test.
---
 gcc/cp/mangle.cc  |  5 +
 gcc/testsuite/g++.dg/abi/mangle78.C   | 14 ++
 libiberty/cp-demangle.c   |  5 +++--
 libiberty/testsuite/demangle-expected |  3 +++
 4 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle78.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 826c5e76c1d..7dab4e62bc9 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3402,6 +3402,11 @@ write_expression (tree expr)
   else
write_string ("tr");
 }
+  else if (code == NOEXCEPT_EXPR)
+{
+  write_string ("nx");
+  write_expression (TREE_OPERAND (expr, 0));
+}
   else if (code == CONSTRUCTOR)
 {
   bool braced_init = BRACE_ENCLOSED_INITIALIZER_P (expr);
diff --git a/gcc/testsuite/g++.dg/abi/mangle78.C 
b/gcc/testsuite/g++.dg/abi/mangle78.C
new file mode 100644
index 000..63c4d779e9f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/mangle78.C
@@ -0,0 +1,14 @@
+// PR c++/70790
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void f(A);
+
+int main() {
+  f({});
+}
+
+// { dg-final { scan-assembler "_Z1fIiEv1AIXnxtlT_EEE" } }
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index f2b36bcad68..efada1c322b 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -1947,6 +1947,7 @@ const struct demangle_operator_info 
cplus_demangle_operators[] =
   { "ng", NL ("-"), 1 },
   { "nt", NL ("!"), 1 },
   { "nw", NL ("new"),   3 },
+  { "nx", NL ("noexcept"),  1 },
   { "oR", NL ("|="),2 },
   { "oo", NL ("||"),2 },
   { "or", NL ("|"), 2 },
@@ -5836,8 +5837,8 @@ d_print_comp_inner (struct d_print_info *dpi, int options,
if (code && !strcmp (code, "gs"))
  /* Avoid parens after '::'.  */
  d_print_comp (dpi, options, operand);
-   else if (code && !strcmp (code, "st"))
- /* Always print parens for sizeof (type).  */
+   else if (code && (!strcmp (code, "st") || !strcmp (code, "nx")))
+ /* Always print parens for sizeof (type) or noexcept(expr).  */
  {
d_append_char (dpi, '(');
d_print_comp (dpi, options, operand);
diff --git a/libiberty/testsuite/demangle-expected 
b/libiberty/testsuite/demangle-expected
index d9bc7ed4b1f..52dff883a18 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -1659,3 +1659,6 @@ auto f()::{lambda(X<$T0>*, 
X*)#1}::operator()(X*,
 
 _ZZN1XIiE1FEvENKUliE_clEi
 X::F()::{lambda(int)#1}::operator()(int) const
+
+_Z1fIiEv1AIXnxtlT_EEE
+void f(A)
-- 
2.41.0.rc0.4.g004e0f790f
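
The parenthesization rule that the updated patch adds to d_print_comp_inner can be modelled in a few lines. This is a simplified sketch and not the libiberty code: format_unary and its operand_is_simple flag are hypothetical stand-ins for the demangler's component tree, and only the "always print parens for st and nx" decision is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format a unary operator expression into OUT.  CODE is the two-letter
   mangling code and NAME the spelling of the operator.
   OPERAND_IS_SIMPLE is a hypothetical flag meaning the operand would
   normally be printed without surrounding parentheses.  Returns the
   snprintf result.  */
int
format_unary (char *out, size_t size, const char *code, const char *name,
              const char *operand, bool operand_is_simple)
{
  assert (out != NULL && code != NULL && name != NULL && operand != NULL);

  /* Operators whose operand is always parenthesized, per the patch.  */
  bool always_parens = !strcmp (code, "st") || !strcmp (code, "nx");

  if (always_parens || !operand_is_simple)
    return snprintf (out, size, "%s(%s)", name, operand);
  return snprintf (out, size, "%s%s", name, operand);
}
```

In this model a simple operand after "nx" still gets its parentheses, while an ordinary unary operator such as "nt" does not.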

>   * testsuite/demangle-expected: Test noexcept operator
>   demangling.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/abi/mangle78.C: New test.
> ---
>  gcc/cp/mangle.cc  |  5 +
>  gcc/testsuite/g++.dg/abi/mangle78.C   | 14 ++
>  libiberty/cp-demangle.c   |  1 +
>  libiberty/testsuite/demangle-expected |  3 +++
>  4 files changed, 23 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/abi/mangle78.C
> 
> diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
> index 826c5e76c1d..7dab4e62bc9 100644
> --- a/gcc/cp/mangle.cc
> +++ b/gcc/cp/mangle.cc
> @@ -3402,6 +3402,11 @@ write_expression (tree expr)
>else
>   write_string ("tr");
>  }
> +  else if (code == NOEXCEPT_EXPR)
> +{
> +  write_string ("nx");
> +  write_expression (TREE_OPERAND (expr, 0));
> +}
>else if (code == CONSTRUCTOR)
>  {
>bool braced_init = BRACE_ENCLOSED_INITIALIZER_P (expr);
> diff --git

[PATCH v2] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-19 Thread Thomas Neumann via Gcc-patches


On 19.05.23 at 19:26, Jeff Law wrote:

See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html

I think this needs an update given the other changes in this space.

jeff


I have included the updated patch below.



The atomic fastpath bypasses the code that releases the sort
array which was lazily allocated during unwinding. We now
check after deregistering if there is an array to free.

libgcc/ChangeLog:
* unwind-dw2-fde.c: Free sort array in atomic fast path.
---
 libgcc/unwind-dw2-fde.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libgcc/unwind-dw2-fde.c b/libgcc/unwind-dw2-fde.c
index a5786bf729c..32b9e64a1c8 100644
--- a/libgcc/unwind-dw2-fde.c
+++ b/libgcc/unwind-dw2-fde.c
@@ -241,6 +241,12 @@ __deregister_frame_info_bases (const void *begin)
   // And remove
   ob = btree_remove (_frames, range[0]);
   bool empty_table = (range[1] - range[0]) == 0;
+
+  // Deallocate the sort array if any.
+  if (ob && ob->s.b.sorted)
+{
+  free (ob->u.sort);
+}
 #else
   init_object_mutex_once ();
   __gthread_mutex_lock (_mutex);
--
2.39.2
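
The leak pattern described above, a buffer allocated lazily on one path and released only on another, can be shown in miniature. The sketch below is not the libgcc code: frame_object, its fields and the allocation counter are hypothetical and exist only to show that deregistration has to release whatever a lazy sort allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature of a registered unwind object.  */
struct frame_object
{
  bool sorted;      /* Set once the sort array has been built.  */
  int *sort_array;  /* Lazily allocated; NULL until the first lookup.  */
  size_t count;     /* Number of entries to sort.  */
};

/* Number of sort arrays currently allocated (demonstration only).  */
static int live_allocations;

/* Build the sort array on first use, as a lookup would.  */
bool
frame_object_sort (struct frame_object *ob)
{
  assert (ob != NULL);
  if (ob->sorted)
    return true;
  ob->sort_array = calloc (ob->count ? ob->count : 1, sizeof (int));
  if (ob->sort_array == NULL)
    return false;
  live_allocations++;
  ob->sorted = true;
  return true;
}

/* Deregister OB and release the sort array if one was built.  */
void
frame_object_deregister (struct frame_object *ob)
{
  assert (ob != NULL);
  if (ob->sorted)
    {
      free (ob->sort_array);
      ob->sort_array = NULL;
      ob->sorted = false;
      live_allocations--;
    }
}

int
frame_object_live_allocations (void)
{
  return live_allocations;
}
```

Dropping the block inside frame_object_deregister reproduces the shape of the bug: the counter stays at one after deregistration.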

[PATCH] c++: mangle noexcept-expr [PR70790]

2023-05-19 Thread Patrick Palka via Gcc-patches

This implements noexcept-expr mangling (and demangling) as per the
Itanium ABI.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this
look OK for trunk?

PR c++/70790

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Handle NOEXCEPT_EXPR.

libiberty/ChangeLog:

* cp-demangle.c (cplus_demangle_operators): Add the noexcept
operator.
* testsuite/demangle-expected: Test noexcept operator
demangling.

gcc/testsuite/ChangeLog:

* g++.dg/abi/mangle78.C: New test.
---
 gcc/cp/mangle.cc  |  5 +
 gcc/testsuite/g++.dg/abi/mangle78.C   | 14 ++
 libiberty/cp-demangle.c   |  1 +
 libiberty/testsuite/demangle-expected |  3 +++
 4 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/abi/mangle78.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 826c5e76c1d..7dab4e62bc9 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3402,6 +3402,11 @@ write_expression (tree expr)
   else
write_string ("tr");
 }
+  else if (code == NOEXCEPT_EXPR)
+{
+  write_string ("nx");
+  write_expression (TREE_OPERAND (expr, 0));
+}
   else if (code == CONSTRUCTOR)
 {
   bool braced_init = BRACE_ENCLOSED_INITIALIZER_P (expr);
diff --git a/gcc/testsuite/g++.dg/abi/mangle78.C 
b/gcc/testsuite/g++.dg/abi/mangle78.C
new file mode 100644
index 000..a3647711604
--- /dev/null
+++ b/gcc/testsuite/g++.dg/abi/mangle78.C
@@ -0,0 +1,14 @@
+// PR c++/70790
+// { dg-do compile { target c++11 } }
+
+template
+struct A { };
+
+template
+void f(A);
+
+int main() {
+  f({});
+}
+
+// { dg-final { scan-assembler "_Z1fIiEv1AIXnxcvT__EEE" } }
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index f2b36bcad68..341c66db919 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -1947,6 +1947,7 @@ const struct demangle_operator_info 
cplus_demangle_operators[] =
   { "ng", NL ("-"), 1 },
   { "nt", NL ("!"), 1 },
   { "nw", NL ("new"),   3 },
+  { "nx", NL ("noexcept"),  1 },
   { "oR", NL ("|="),2 },
   { "oo", NL ("||"),2 },
   { "or", NL ("|"), 2 },
diff --git a/libiberty/testsuite/demangle-expected 
b/libiberty/testsuite/demangle-expected
index d9bc7ed4b1f..7195cc39c19 100644
--- a/libiberty/testsuite/demangle-expected
+++ b/libiberty/testsuite/demangle-expected
@@ -1659,3 +1659,6 @@ auto f()::{lambda(X<$T0>*, 
X*)#1}::operator()(X*,
 
 _ZZN1XIiE1FEvENKUliE_clEi
 X::F()::{lambda(int)#1}::operator()(int) const
+
+_Z1fIiEv1AIXnxcvT__EEE
+void f(A)
-- 
2.41.0.rc0.4.g004e0f790f

Re: [patch] Allow plugin-specific dumps

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/17/23 17:38, Nathan Sidwell via Gcc-patches wrote:
PR 99451 is about the inability to name tree and rtl dumps by plugin 
name.  And includes a patch.  But then I worked around the problem and 
forgot about it. Here it is again, retested against trunk.


ok?

nathan
--
Nathan Sidwell

0001-Allow-plugin-dumps.patch

 From e54518bc5e59ef5cdc21c652ceac41bd0c0f436c Mon Sep 17 00:00:00 2001
From: Nathan Sidwell
Date: Wed, 17 May 2023 19:27:13 -0400
Subject: [PATCH] Allow plugin dumps

Defer dump option parsing until plugins are initialized.  This allows one to
use plugin names for dumps.

PR other/99451
gcc/
* opts.h (handle_deferred_dump_options): Declare.
* opts-global.cc (handle_common_deferred_options): Do not handle
dump options here.
(handle_deferred_dump_options): New.
* toplev.cc (toplev::main): Call it after plugin init.

OK.
jeff

Re: [Patch] libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]

2023-05-19 Thread Tobias Burnus


I managed to attach an outdated patch.

Namely: After I tested it, I realized that GCC's testsuite setup already
marks all testcases that use dg-set-target-env-var as UNSUPPORTED if
remote testing is done → see gcc/testsuite/lib/gcc-dg.exp.

Thus, instead of checking getenv, I can directly use #define to state
which environment variable is set, making the testcase more reliable and
avoiding some function calls. Otherwise, unchanged.

Tobias

On 19.05.23 19:18, Tobias Burnus wrote:

I intend to commit this patch early next week — any comments, questions,
concerns?

* * *

I stumbled over this issue when looking at sollve_vv's pull requests
for  omp_set_num_teams and omp_get_max_teams testcase (#729 + #728).

While the num_teams clause was honored everywhere, the nteams-var ICV
did not set an upper limit on the implementation-defined number of teams.

That's fixed by the attached patch. Testing with my device setup showed
120 teams with GCN and 240 teams with nvptx – i.e. plenty of values to
choose from to reduce the #teams via nteams-var.


Spec wording for OpenMP 5.1: See num_teams description in the 2nd and
3rd paragraph of "Description" at
https://www.openmp.org/spec-html/5.1/openmpse15.html


Tested on x86-64 without offloading (and working setenv support) and
with gcn and nvptx offload (running libgomp w/o setenv support and
manually also with setting the env vars).

Tobias

PS: The omp_get_max_teams routine is a bit odd; the return value is
described both as being nteams-var (which can be 0, which is actually
the default) and as returning the number (or upper bound?) of the number
of teams used. → OpenMP spec issue #3619.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße
201, 80634 München; Gesellschaft mit beschränkter Haftung;
Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft:
München; Registergericht München, HRB 106955

libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]

The nteams-var ICV exists per device and can be set either via the routine
omp_set_num_teams or as environment variable (OMP_NUM_TEAMS with optional
_ALL/_DEV/_DEV_ suffix); it is default-initialized to zero. The number
of teams created is described under the num_teams clause. If the clause is
absent, the number of teams is implementation defined but at least
one team must exist and, if nteams-var is positive, at most nteams-var
teams may exist.

The latter condition was not honored in a target region before this
commit, such that too many teams were created.

Also before this commit, the num_teams([lower:]upper) was properly
honored and the nteams-var ICV was honored for the host, overriding
the default of 3. For host fallback without clause, the default is one
such that it was and is valid for any ICV value.
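
The selection rule can be written down as a small pure function. The sketch below models the GCN variant of the change with plain unsigned parameters; gcn_effective_num_teams is a hypothetical name, and the real code reads the ICV from GOMP_ADDITIONAL_ICVS.nteams rather than taking it as an argument.

```c
#include <assert.h>

/* NUM_TEAMS_UPPER is the upper bound from the num_teams clause, or 0 if
   the clause is absent.  NUM_WORKGROUPS is what the device offers.
   NTEAMS_ICV is the nteams-var ICV, where 0 means "not set".  */
unsigned int
gcn_effective_num_teams (unsigned int num_teams_upper,
                         unsigned int num_workgroups,
                         unsigned int nteams_icv)
{
  assert (num_workgroups > 0);
  if (num_teams_upper == 0 || num_teams_upper >= num_workgroups)
    /* No usable clause value: take the device limit, but let a
       positive ICV cap it.  */
    return (nteams_icv > 0 && num_workgroups > nteams_icv
            ? nteams_icv : num_workgroups);
  /* A clause value below the device limit is used as is.  */
  return num_teams_upper;
}
```

For example, with 120 workgroups available and no clause, an ICV of 4 yields 4 teams, while an unset ICV or one above 120 leaves the count at 120.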

	PR libgomp/109875

libgomp/ChangeLog:

	* config/gcn/target.c (GOMP_teams4): Honor nteams-var ICV.
	* config/nvptx/target.c (GOMP_teams4): Likewise.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-2.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-3.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-4.c: New test.

 libgomp/config/gcn/target.c|   4 +-
 libgomp/config/nvptx/target.c  |   4 +-
 .../libgomp.c-c++-common/teams-nteams-icv-1.c  | 198 +
 .../libgomp.c-c++-common/teams-nteams-icv-2.c  |   8 +
 .../libgomp.c-c++-common/teams-nteams-icv-3.c  |   8 +
 .../libgomp.c-c++-common/teams-nteams-icv-4.c  |  14 ++
 6 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index c6691fde3c6..ea5eb1ff5ed 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -48,7 +48,9 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
  multiple times at least for some workgroups.  */
   (void) num_teams_lower;
   if (!num_teams_upper || num_teams_upper >= num_workgroups)
-num_teams_upper = num_workgroups;
+num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
+			&& num_workgroups > GOMP_ADDITIONAL_ICVS.nteams)
+		   ? GOMP_ADDITIONAL_ICVS.nteams : num_workgroups);
   else if (workgroup_id >= num_teams_upper)
 return false;
   gomp_num_teams_var = num_teams_upper - 1;
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index f102d7d02d9..125d92a2ea9 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -55,7 +55,9 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 	= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }

Re: [PATCH] Fix driver/33980: Precompiled header file not removed on error

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 08:48, Andrew Pinski via Gcc-patches wrote:

So the problem here is that in the spec files, we were not marking the pch
output file to be removed on error.
The way to fix this is to mark the --output-pch argument as the output
file argument.
For the C++ specs file, we had to move around where the %V was located
such that it would be after the %w marker as %V marker clears the outputfiles.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/cp/ChangeLog:

PR driver/33980
* lang-specs.h ("@c++-header"): Add %w after
the --output-pch.
("@c++-system-header"): Likewise.
("@c++-user-header"): Likewise.

gcc/ChangeLog:

PR driver/33980
* gcc.cc (default_compilers["@c-header"]): Add %w
after the --output-pch.

OK
jeff

Re: PING: [PATCH] release the sorted FDE array when deregistering a frame [PR109685]

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/12/23 09:19, Thomas Neumann via Gcc-patches wrote:
Summary: The old linear scan logic called free while searching the list 
of frames. The atomic fast path finds the frame quickly, but forgot the 
free call. This patch adds the missing free. Bugzilla #109685.


See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html

I think this needs an update given the other changes in this space.

jeff

Re: [PATCH 13-backport] riscv/linux: Don't add -latomic with -pthread

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/17/23 03:22, Bo YU wrote:

Hi,

I just want to backport the commit to gcc-13 branch:

commit 203f3060dd363361b172f7295f42bb6bf5ac0b3b
Author: Andreas Schwab 
Date:   Sat Apr 23 15:48:42 2022 +0200

     riscv/linux: Don't add -latomic with -pthread

     Now that we have support for inline subword atomic operations, it is no
     longer necessary to link against libatomic.  This also fixes testsuite
     failures because the framework does not properly set up the linker flags
     for finding libatomic.
     The use of atomic operations is also independent of the use of libpthread.


     gcc/
     * config/riscv/linux.h (LIB_SPEC): Don't redefine.

The discussion is here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338#c20

Thanks.  I've backported this to gcc-13.
jeff

[Committed] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Vineet Gupta


On 5/19/23 09:33, Jeff Law wrote:



On 5/18/23 14:57, Vineet Gupta wrote:

[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants which currently generates less than
ideal code. This fix improves codegen for large constants which have
same low and hi parts: e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
 li  a5,0x101
 addi    a5,a5,0x101
 mv  a0,a5
 slli    a5,a5,32
 add a0,a5,a0
 ret

With patch
li    a5,0x101
addi    a5,a5,0x101
slli    a0,a5,32
add    a0,a0,a5
ret

This is testsuite clean.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
  to hival, ASHIFT the corresponding regs.
LGTM.  Please install.  Thanks for taking care of this!  The updated 
sequence looks good.


Jeff
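
The arithmetic behind the shorter sequence is just "shift the half up and add it back". A minimal sketch, assuming the 32-bit half has bit 31 clear so that the low and high parts are the same value; replicate_half is a hypothetical helper and not part of riscv.cc:

```c
#include <assert.h>
#include <stdint.h>

/* Build a 64-bit constant whose low and high 32-bit parts are both
   HALF, using one shift and one add.  */
uint64_t
replicate_half (uint32_t half)
{
  /* Assumption for this sketch: bit 31 is clear.  */
  assert ((half & 0x80000000u) == 0);

  uint64_t reg = half;           /* Materialize the 32-bit half once.  */
  uint64_t shifted = reg << 32;  /* Corresponds to the slli by 32.  */
  return shifted + reg;          /* Corresponds to the final add.  */
}
```

The half is materialized once and reused for both parts, which is why the extra register copy in the "Before" sequence is no longer needed.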

Re: [PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Vineet Gupta




On 5/19/23 09:36, Palmer Dabbelt wrote:
Works for me.  Did you start that performance backports branch?  
Either way, I think this should go on it. 


Please note that there is a bit of dependency chain. Assuming the 
aforementioned branch is gcc 13.1 based, this change also needs my 
splitter relaxation fix g0530254413f8 to ensure incremental improvements 
to large const codegen.

[Patch] libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]

2023-05-19 Thread Tobias Burnus


I intend to commit this patch early next week — any comments, questions,
concerns?

* * *

I stumbled over this issue when looking at sollve_vv's pull requests
for  omp_set_num_teams and omp_get_max_teams testcase (#729 + #728).

While the num_teams clause was honored everywhere, the nteams-var ICV
did not set an upper limit on the implementation-defined number of teams.

That's fixed by the attached patch. Testing with my device setup showed
120 teams with GCN and 240 teams with nvptx – i.e. plenty of values to
choose from to reduce the #teams via nteams-var.


Spec wording for OpenMP 5.1: See num_teams description in the 2nd and
3rd paragraph of "Description" at
https://www.openmp.org/spec-html/5.1/openmpse15.html


Tested on x86-64 without offloading (and working setenv support) and
with gcn and nvptx offload (running libgomp w/o setenv support and
manually also with setting the env vars).

Tobias

PS: The omp_get_max_teams routine is a bit odd; the return value is
described both as being nteams-var (which can be 0, which is actually
the default) and as returning the number (or upper bound?) of the number
of teams used. → OpenMP spec issue #3619.
-
libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]

The nteams-var ICV exists per device and can be set either via the routine
omp_set_num_teams or as environment variable (OMP_NUM_TEAMS with optional
_ALL/_DEV/_DEV_ suffix); it is default-initialized to zero. The number
of teams created is described under the num_teams clause. If the clause is
absent, the number of teams is implementation defined but at least
one team must exist and, if nteams-var is positive, at most nteams-var
teams may exist.

The latter condition was not honored in a target region before this
commit, such that too many teams were created.

Also before this commit, the num_teams([lower:]upper) was properly
honored and the nteams-var ICV was honored for the host, overriding
the default of 3. For host fallback without clause, the default is one
such that it was and is valid for any ICV value.

	PR libgomp/109875

libgomp/ChangeLog:

	* config/gcn/target.c (GOMP_teams4): Honor nteams-var ICV.
	* config/nvptx/target.c (GOMP_teams4): Likewise.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-2.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-3.c: New test.
	* testsuite/libgomp.c-c++-common/teams-nteams-icv-4.c: New test.

 libgomp/config/gcn/target.c|   4 +-
 libgomp/config/nvptx/target.c  |   4 +-
 .../libgomp.c-c++-common/teams-nteams-icv-1.c  | 201 +
 .../libgomp.c-c++-common/teams-nteams-icv-2.c  |   5 +
 .../libgomp.c-c++-common/teams-nteams-icv-3.c  |   5 +
 .../libgomp.c-c++-common/teams-nteams-icv-4.c  |   8 +
 6 files changed, 225 insertions(+), 2 deletions(-)

diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index c6691fde3c6..ea5eb1ff5ed 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -48,7 +48,9 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
  multiple times at least for some workgroups.  */
   (void) num_teams_lower;
   if (!num_teams_upper || num_teams_upper >= num_workgroups)
-num_teams_upper = num_workgroups;
+num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
+			&& num_workgroups > GOMP_ADDITIONAL_ICVS.nteams)
+		   ? GOMP_ADDITIONAL_ICVS.nteams : num_workgroups);
   else if (workgroup_id >= num_teams_upper)
 return false;
   gomp_num_teams_var = num_teams_upper - 1;
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index f102d7d02d9..125d92a2ea9 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -55,7 +55,9 @@ GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 	= thread_limit > INT_MAX ? UINT_MAX : thread_limit;
 }
   if (!num_teams_upper)
-num_teams_upper = num_blocks;
+num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
+			&& num_blocks > GOMP_ADDITIONAL_ICVS.nteams)
+		   ? GOMP_ADDITIONAL_ICVS.nteams : num_blocks);
   else if (num_blocks < num_teams_lower)
 num_teams_upper = num_teams_lower;
   else if (num_blocks < num_teams_upper)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c b/libgomp/testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c
new file mode 100644
index 000..fb562a77ef8
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c
@@ -0,0 +1,201 @@
+/* Check that the nteams ICV is honored. */
+/* PR libgomp/109875  */
+
+/*  This base version of testcases is supposed to

Re: [patch,avr] PR105753: Fix ICE in add_clobbers.

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/16/23 02:56, Georg-Johann Lay wrote:

This patch removes the superfluous parallel in [u]divmod patterns
in the AVR backend.  The effect of the extra parallel is that add_clobbers
reaches gcc_unreachable() because the clobbers for [u]divmod are
missing.  The parallel around the parts of an insn pattern is
implicit if it has multiple parts like clobbers, so the extra parallel
should be removed.

Ok to apply?

Johann

--

gcc/
 PR target/105753
 * config/avr/avr.md (divmodpsi, udivmodpsi, divmodsi, udivmodsi):
 Remove superfluous "parallel" in insn pattern.
 ([u]divmod4): Tidy code.  Use gcc_unreachable() instead of
 printing error text to assembly.

gcc/testsuite/
 PR target/105753
 * gcc.target/avr/torture/pr105753.c: New test.

OK
jeff

Re: [PATCH] MIPS: don't expand large block move

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 00:11, YunQiang Su wrote:

On platforms with LWL/LWR, mips_block_move_loop is always used,
which expands __builtin_memcpy/strcpy to a loop of lwl/lwr/swl/swr etc.

For short lengths (normally <=64), it has better performance,
but when the src/dest are long, using a memcpy/strcpy lib call may have
better performance.

At the same time, the lib call may be optimized with SIMD, so,
on platforms with SIMD, the lib call may have much better performance.

gcc/ChangeLog:
* config/mips/mips.cc (mips_expand_block_move): don't expand
  if length>=64.

gcc/testsuite/ChangeLog:
* gcc.target/mips/expand-block-move-large.c: New test.
---
  gcc/config/mips/mips.cc |  6 ++
  .../gcc.target/mips/expand-block-move-large.c   | 17 +
  2 files changed, 23 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/mips/expand-block-move-large.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca491b981a3..00f26d5e923 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -8313,6 +8313,12 @@ mips_expand_block_move (rtx dest, rtx src, rtx length)
}
else if (optimize)
{
+ /* When the length is big enough, the lib call has better performace
+than load/store insns.
+In most platform, the value is about 64-128.
+And in fact lib call may be optimized with SIMD */
+ if (INTVAL(length) >= 64)
+   return false;
Just a formatting nit.  Space between INTVAL and the open paren for its 
argument list.


OK with that change.

jeff

Re: [PATCH] avr: Set param_min_pagesize to 0 [PR105523]

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/19/23 08:02, Bernhard Reutner-Fischer via Gcc-patches wrote:

On 19 May 2023 07:58:48 CEST, "SenthilKumar.Selvaraj--- via Gcc-patches" 
 wrote:

Just a nit:


+static bool
+avr_addr_space_zero_address_valid (addr_space_t as ATTRIBUTE_UNUSED)
+{
+  return flag_delete_null_pointer_checks == 0;
+}


Since we are C++ nowadays, you can omit the parameter name for unused 
arguments. I.e.:

static bool
avr_addr_space_zero_address_valid (addr_space_t)
{

Right.  And I strongly prefer that over ATTRIBUTE_UNUSED.

So OK for the trunk with that change.

jeff

Re: [PATCH 2/2] Improve do_store_flag for comparing single bit against that bit

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:

This is a case which I noticed while working on the previous patch.
Sometimes we end up with `a == CST` instead of comparing against 0.
This happens in the following code:
```
unsigned f(unsigned t)
{
   if (t & ~(1<<30)) __builtin_unreachable();
   t ^= (1<<30);
   return t != 0;
}
```

We should handle the case where the nonzero bits are the same as the
comparison operand.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (do_store_flag): Improve for single bit testing
not against zero but against that single bit.
This looks like it can/should go forward independently of 1/2 and 
touches on my earlier comment about using bit extractions  ;-)


So OK by me.

jeff
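
The equivalence this patch relies on can be checked by hand, since only two input values are possible once all bits other than bit 30 are known to be zero. The sketch below compares the source-level form with one branch-free bit test; the function names are hypothetical and the second form is only one possible lowering, not necessarily the RTL that do_store_flag emits.

```c
#include <assert.h>

#define SINGLE_BIT 30

/* Source-level form: flip the only possibly-set bit, then compare
   against zero.  */
unsigned
flag_by_xor_compare (unsigned t)
{
  /* Precondition from the example: no bit other than bit 30 is set.  */
  assert ((t & ~(1u << SINGLE_BIT)) == 0);
  t ^= 1u << SINGLE_BIT;
  return t != 0;
}

/* Bit-test form: extract bit 30 and invert it.  */
unsigned
flag_by_bit_test (unsigned t)
{
  assert ((t & ~(1u << SINGLE_BIT)) == 0);
  return ((t >> SINGLE_BIT) & 1u) ^ 1u;
}
```

Both functions return 1 for t == 0 and 0 for t == 1<<30, which covers every value the precondition allows.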

Re: [PATCH 1/2] Improve do_store_flag for single bit comparison against 0

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/18/23 20:14, Andrew Pinski via Gcc-patches wrote:

While working on something else, I noticed we could improve
the code generation for the following function:
```
unsigned f(unsigned t)
{
   if (t & ~(1<<30)) __builtin_unreachable();
   return t != 0;
}
```
Right now we just emit a comparison against 0 instead
of just a shift right by 30.
There is code in do_store_flag which already optimizes
`(t & 1<<30) != 0` to `(t >> 30) & 1`. This patch
extends it to handle the case where we know t has
nonzero bits with just one bit set.
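
A small sketch of the three forms under discussion, assuming (as the test case guarantees) that `t` is either 0 or `1u << 30`. The function names are illustrative only and do not correspond to anything in expr.cc.

```cpp
// Precondition (assumed): only bit 30 of t can be nonzero.

// The comparison form that is currently emitted.
unsigned
nonzero_by_compare (unsigned t)
{
  return t != 0;
}

// The plain shift: valid only because no other bit can be set.
unsigned
nonzero_by_shift (unsigned t)
{
  return t >> 30;
}

// The shift+and form, which some targets may match as a single bit extract.
unsigned
nonzero_by_extract (unsigned t)
{
  return (t >> 30) & 1u;
}
```

Which of these is cheapest depends on the target, for example when the cost of a shift grows with the shift count.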

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* expr.cc (do_store_flag): Extend the one bit checking case
to handle the case where we don't have an and but rather still
one bit is known to be non-zero.
So as we touched on in IRC, the concern is targets where the cost of the 
shift depends on the number of bits shifted.  Can we look at costing 
here to determine the initial RTL generation approach?


Another approach that would work for some targets is a single bit 
extract.  In theory we should be discovering the extract idiom from the 
shift+and form, but I'm always concerned that it's going to be missed 
for one or more oddball reasons.


jeff

Re: [PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Palmer Dabbelt


On Fri, 19 May 2023 09:33:34 PDT (-0700), jeffreya...@gmail.com wrote:



On 5/18/23 14:57, Vineet Gupta wrote:

[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants, which currently generate less than
ideal code. This fix improves codegen for large constants whose low and
high 32-bit parts are the same, e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
 li      a5,0x101
 addi    a5,a5,0x101
 mv      a0,a5
 slli    a5,a5,32
 add     a0,a5,a0
 ret

With patch
 li      a5,0x101
 addi    a5,a5,0x101
 slli    a0,a5,32
 add     a0,a0,a5
 ret

This is testsuite clean.
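
The arithmetic behind the shorter sequence can be sketched as follows. This is only an illustration: `halves_match` and `combine_same_halves` are hypothetical names, and the sketch works on an already zero-extended 32-bit half without modelling how the real splitter deals with sign extension.

```cpp
#include <cstdint>

// True when the upper and lower 32-bit halves of VALUE are identical.
bool
halves_match (uint64_t value)
{
  return (value >> 32) == (value & 0xffffffffu);
}

// Materialize the half once, then shift it into the upper half (slli)
// and add the original back in (add).
uint64_t
combine_same_halves (uint64_t half)
{
  uint64_t high = half << 32;
  return high + half;
}
```

For the constant in the example, the half is 0x01010101 and combining it with itself yields 0x0101010101010101.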

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
  to hival, ASHIFT the corresponding regs.

LGTM.  Please install.  Thanks for taking care of this!  The updated
sequence looks good.


Works for me.  Did you start that performance backports branch?  Either 
way, I think this should go on it.

Re: [PATCH] RISC-V: improve codegen for large constants with same 32-bit lo and hi parts [2]

2023-05-19 Thread Jeff Law via Gcc-patches





On 5/18/23 14:57, Vineet Gupta wrote:

[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants, which currently generate less than
ideal code. This fix improves codegen for large constants whose low and
high 32-bit parts are the same, e.g.

long long f(void) { return 0x0101010101010101ull; }

Before
 li      a5,0x101
 addi    a5,a5,0x101
 mv      a0,a5
 slli    a5,a5,32
 add     a0,a5,a0
 ret

With patch
 li      a5,0x101
 addi    a5,a5,0x101
 slli    a0,a5,32
 add     a0,a0,a5
 ret

This is testsuite clean.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
  to hival, ASHIFT the corresponding regs.
LGTM.  Please install.  Thanks for taking care of this!  The updated 
sequence looks good.


Jeff

[PATCH] Fix driver/33980: Precompiled header file not removed on error

2023-05-19 Thread Andrew Pinski via Gcc-patches

So the problem here is that in the spec files, we were not marking the pch
output file to be removed on error.
The way to fix this is to mark the --output-pch argument as the output
file argument.
For the C++ specs file, we had to move the %V so that it comes after
the %w marker, because the %V marker clears the outputfiles.
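
To make the ordering constraint concrete, here is a deliberately simplified model. It is an assumption-laden sketch, not the driver's real data structures: `DriverModel`, `mark_output` and `no_output` are invented names that stand in for the effect of `%w` (mark the argument as the output file, which also queues it for removal on error) and `%V` (this compilation produces no output file).

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct DriverModel
{
  std::string outfile;                   // output handed to later stages; empty means none
  std::vector<std::string> fail_delete;  // files to remove if the compile fails

  // Model of "%w": remember the argument as the output file and queue it
  // for deletion on failure.
  void mark_output (const std::string &path)
  {
    outfile = path;
    fail_delete.push_back (path);
  }

  // Model of "%V": forget the recorded output file.
  void no_output () { outfile.clear (); }

  bool deleted_on_failure (const std::string &path) const
  {
    return std::find (fail_delete.begin (), fail_delete.end (), path)
	   != fail_delete.end ();
  }
};
```

In this model, calling `mark_output` and then `no_output` leaves the PCH queued for removal on error while no output file is recorded; calling them in the opposite order leaves the PCH recorded as the output file.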

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/cp/ChangeLog:

PR driver/33980
* lang-specs.h ("@c++-header"): Add %w after
the --output-pch.
("@c++-system-header"): Likewise.
("@c++-user-header"): Likewise.

gcc/ChangeLog:

PR driver/33980
* gcc.cc (default_compilers["@c-header"]): Add %w
after the --output-pch.
---
 gcc/cp/lang-specs.h | 12 ++--
 gcc/gcc.cc  |  8 
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/lang-specs.h b/gcc/cp/lang-specs.h
index c591d155cc1..94bdd4dcc4a 100644
--- a/gcc/cp/lang-specs.h
+++ b/gcc/cp/lang-specs.h
@@ -53,9 +53,9 @@ along with GCC; see the file COPYING3.  If not see
   "  %{fmodules-ts:-fmodule-header %{fpreprocessed:-fdirectives-only}}"
   "  %(cc1_options) %2"
   "  %{!fsyntax-only:"
-  "%{!S:-o %g.s%V}"
+  "%{!S:-o %g.s}"
   "%{!fmodule-*:%{!fmodules-*:%{!fdump-ada-spec*:"
-  " %{!o*:--output-pch %i.gch}%W{o*:--output-pch %*}"
+  " %{!o*:--output-pch %w%i.gch}%W{o*:--output-pch %w%*%{!S:%V}}"
   "}}}",
  CPLUSPLUS_CPP_SPEC, 0, 0},
   {"@c++-system-header",
@@ -74,9 +74,9 @@ along with GCC; see the file COPYING3.  If not see
   "%{fpreprocessed:-fdirectives-only}}"
   "  %(cc1_options) %2"
   "  %{!fsyntax-only:"
-  "%{!S:-o %g.s%V}"
+  "%{!S:-o %g.s}"
   "%{!fmodule-*:%{!fmodules-*:%{!fdump-ada-spec*:"
-  " %{!o*:--output-pch %i.gch}%W{o*:--output-pch %*}"
+  " %{!o*:--output-pch %w%i.gch}%W{o*:--output-pch %w%*%{!S:%V}}"
   "}}}",
  CPLUSPLUS_CPP_SPEC, 0, 0},
   {"@c++-user-header",
@@ -94,9 +94,9 @@ along with GCC; see the file COPYING3.  If not see
   "  %{fmodules-ts:-fmodule-header=user 
%{fpreprocessed:-fdirectives-only}}"
   "  %(cc1_options) %2"
   "  %{!fsyntax-only:"
-  "%{!S:-o %g.s%V}"
+  "%{!S:-o %g.s}"
   "%{!fmodule-*:%{!fmodules-*:%{!fdump-ada-spec*:"
-  " %{!o*:--output-pch %i.gch}%W{o*:--output-pch %*}"
+  " %{!o*:--output-pch %w%i.gch}%W{o*:--output-pch %w%*%{!S:%V}}"
   "}}}",
  CPLUSPLUS_CPP_SPEC, 0, 0},
   {"@c++",
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 39a44fa486d..2ccca00d603 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -1454,13 +1454,13 @@ static const struct compiler default_compilers[] =
cc1 -fpreprocessed %{save-temps*:%b.i} %{!save-temps*:%g.i} \
%(cc1_options)\
%{!fsyntax-only:%{!S:-o %g.s} \
-   %{!fdump-ada-spec*:%{!o*:--output-pch %i.gch}\
-  %W{o*:--output-pch %*}}%V}}\
+   %{!fdump-ada-spec*:%{!o*:--output-pch %w%i.gch}\
+  %W{o*:--output-pch %w%*}}%{!S:%V}}}\
  %{!save-temps*:%{!traditional-cpp:%{!no-integrated-cpp:\
cc1 %(cpp_unique_options) %(cc1_options)\
%{!fsyntax-only:%{!S:-o %g.s} \
-   %{!fdump-ada-spec*:%{!o*:--output-pch %i.gch}\
-  %W{o*:--output-pch %*}}%V}}}", 
0, 0, 0},
+   %{!fdump-ada-spec*:%{!o*:--output-pch %w%i.gch}\
+  %W{o*:--output-pch 
%w%*}}%{!S:%V", 0, 0, 0},
   {".i", "@cpp-output", 0, 0, 0},
   {"@cpp-output",
"%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options) 
%{!fsyntax-only:%(invoke_as)", 0, 0, 0},
-- 
2.31.1

Re: [PATCH] avr: Set param_min_pagesize to 0 [PR105523]

2023-05-19 Thread Bernhard Reutner-Fischer via Gcc-patches

On 19 May 2023 07:58:48 CEST, "SenthilKumar.Selvaraj--- via Gcc-patches" 
 wrote:

Just a nit:

>+static bool
>+avr_addr_space_zero_address_valid (addr_space_t as ATTRIBUTE_UNUSED)
>+{
>+  return flag_delete_null_pointer_checks == 0;
>+}

Since we are c++ nowadays, you can omit the parameter name for unused 
arguments. I.e.:

static bool
avr_addr_space_zero_address_valid (addr_space_t)
{
  ...

[pushed] wwwdocs: preprocess: Check whether input files exist

2023-05-19 Thread Gerald Pfeifer

This has not come up in all those years since the preprocess script
usually is invoked from other scripts, notably post commit hooks. It
can, however, be invoked manually, and error handling is generally a
good thing.

Instead of
   cat: foo/bar/index.html: No such file or directory
   New file /www/gcc/htdocs/foo/bar/index.html
and an empty output file, we now get
   Input file foo/bar/index.html not found.
when invoking `preprocess foo/bar/index.html`.

Pushed.
Gerald
---
 bin/preprocess | 5 +
 1 file changed, 5 insertions(+)

diff --git a/bin/preprocess b/bin/preprocess
index c62ba457..c6d34c4b 100755
--- a/bin/preprocess
+++ b/bin/preprocess
@@ -155,6 +155,11 @@ process_file()
 # Strip possibly leading "./".
 f=`echo $1 | sed -e 's#^\./##'`
 
+if [ ! -f "$SOURCETREE/$f" ]; then
+echo "Input file $f not found."
+return
+fi
+
 if [ ! -d "$DESTTREE/`dirname $f`" ]; then
 echo "Creating new directory `dirname $f`."
 mkdir -p $DESTTREE/`dirname $f`
-- 
2.40.1

Re: [PATCH] add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

2023-05-19 Thread Mikael Pettersson via Gcc-patches

On Fri, May 19, 2023 at 2:06 PM Maciej W. Rozycki  wrote:
>
> On Sat, 29 Apr 2023, Jeff Law via Gcc-patches wrote:
>
> > > PR target/105525 is a build regression for the vax and lm32 linux
> > > targets present in gcc-12/13/head, where the builds fail due to
> > > unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
> > > caused by these two targets failing to provide glibc-stdint.h.
> > >
> > > Fixed thusly, tested by building crosses, which now succeeds.
> > >
> > > Ok for trunk? (Note I don't have commit rights.)
> > >
> > > 2023-04-28  Mikael Pettersson
> > >
> > > PR target/105525
> > > * config.gcc (vax-*-linux*): Add glibc-stdint.h.
> > > (lm32-*-uclinux*): Likewise.
> > Thanks.  I've pushed this to the trunk.
>
>  Hmm, I find it quite interesting and indeed encouraging that someone
> actually verifies our VAX/Linux target.
>
>  Mikael, how do you actually verify it however?

My vax builds are only cross-compilers without kernel headers or libc.

The background is that I maintain a script to build GCC-based crosses to
as many targets as I can; currently it supports 78 distinct processors and
82 triplets (four processors have multiple triplets). I only check that I can
build the toolchains (full linux-gnu ones where possible).

/Mikael

>  I'm asking because while I did a glibc port for VAX/Linux (including VAX
> floating-point format support), it was many years ago and for LinuxThreads
> configuration only (hence glibc 2.4 only), which I suspect may not be
> supported by GCC anymore.  And it has never made its way upstream, because
> we'd have to land Linux kernel bits there first and that effort has
> stalled.
>
>  I can still boot that old stuff on my VAX machine, but the userland is
> minimal and somewhat unstable as things sometimes crash or otherwise
> behave in a weird way.  I do have working `bash' and more importantly
> `gdbserver' binaries though.
>
>   Maciej

Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-19 Thread Andreas Schwab

This is built with --disable-werror, so it doesn't fail, but the warning
is there:

https://build.opensuse.org/package/live_build_log/devel:gcc:next/gcc14/openSUSE_Factory_RISCV/riscv64

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

[PATCH, OpenMP, nvptx] Improving OpenMP offloading by OpenACC

2023-05-19 Thread chunglin.tang--- via Gcc-patches

ut when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.

* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.

* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.

libgomp/ChangeLog:

* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.


ompacc-20230519-2115.patch
Description: ompacc-20230519-2115.patch

RE: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-19 Thread Li, Pan2 via Gcc-patches

Sorry to bother; I just tried the build below for RISC-V but failed to 
reproduce the problem...

../configure \
  --target=riscv64-unknown-elf \
  --prefix=${INSTALL_DIR} \
  --disable-shared \
  --enable-threads \
  --enable-tls \
  --enable-languages=c,c++ \
  --with-system-zlib \
  --with-newlib \
  --disable-libmudflap \
  --disable-libssp \
  --disable-libquadmath \
  --disable-libgomp \
  --enable-nls \
  --disable-tm-clone-registry \
  --enable-multilib \
  --src=`pwd`/../ \
  --with-abi=lp64d \
  --with-arch=rv64imafdcv \
  --with-tune=rocket \
  --with-isa-spec=20191213 \
  --enable-bootstrap
make -j $(nproc) all-gcc && make install-gcc

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Friday, May 19, 2023 8:29 PM
To: Andreas Schwab ; juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; kito.ch...@sifive.com; 
pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: RE: [PATCH] RISC-V: Add mode switching target hook to insert rounding 
mode config for fixed-point instructions

Hi Andreas,

Could you please share more information about how to trigger this 
error? As you didn't mention it, I assume the error below comes from an X86 build. I 
used the configuration below but failed to reproduce it.

mkdir __BUILD_X86 && cd __BUILD_X86
../configure --enable-language=c,c++   --enable-bootstrap   --disable-multilib 
--prefix=`pwd`/../__INSTALL_X86

make -j $(nproc) && make install

Pan


-Original Message-
From: Gcc-patches  On Behalf 
Of Andreas Schwab
Sent: Friday, May 19, 2023 6:41 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; kito.ch...@sifive.com; 
pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding 
mode config for fixed-point instructions

In function 'int optimize_mode_switching()',
inlined from 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)' at 
../../gcc/mode-switching.cc:909:31:
../../gcc/mode-switching.cc:608:29: error: 'bb_info$' may be used uninitialized 
[-Werror=maybe-uninitialized]
  608 | add_seginfo (info + bb->index, ptr);
  | ^~~
../../gcc/mode-switching.cc: In member function 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)':
../../gcc/mode-switching.cc:503:19: note: 'bb_info$' was declared here
  503 |   struct bb_info *bb_info[N_ENTITIES];
  |   ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1174: mode-switching.o] Error 1

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1 "And 
now for something completely different."

RE: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-19 Thread Li, Pan2 via Gcc-patches

Hi Andreas,

Could you please share more information about how to trigger this 
error? As you didn't mention it, I assume the error below comes from an X86 build. I 
used the configuration below but failed to reproduce it.

mkdir __BUILD_X86 && cd __BUILD_X86
../configure --enable-language=c,c++   --enable-bootstrap   --disable-multilib 
--prefix=`pwd`/../__INSTALL_X86

make -j $(nproc) && make install

Pan


-Original Message-
From: Gcc-patches  On Behalf 
Of Andreas Schwab
Sent: Friday, May 19, 2023 6:41 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; kito.ch...@sifive.com; 
pal...@dabbelt.com; pal...@rivosinc.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding 
mode config for fixed-point instructions

In function 'int optimize_mode_switching()',
inlined from 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)' at 
../../gcc/mode-switching.cc:909:31:
../../gcc/mode-switching.cc:608:29: error: 'bb_info$' may be used uninitialized 
[-Werror=maybe-uninitialized]
  608 | add_seginfo (info + bb->index, ptr);
  | ^~~
../../gcc/mode-switching.cc: In member function 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)':
../../gcc/mode-switching.cc:503:19: note: 'bb_info$' was declared here
  503 |   struct bb_info *bb_info[N_ENTITIES];
  |   ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1174: mode-switching.o] Error 1

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1 "And 
now for something completely different."

Re: Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-19 Thread 钟居哲

>> What about the rest of the changes? It's not all typos but I tried
>> to unify the mask/policy handling a bit.
Oh, I see.  You renamed get_prefer to get_preferred.
This makes perfect sense to me.




juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-19 20:07
To: 钟居哲; gcc-patches; kito.cheng; palmer; Michael Collison; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.
>>> +  TAIL_UNDEFINED = -1,
>>> +  MASK_UNDEFINED = -1,
> Why did you add this?
> 
>>> +  void add_policy_operands (enum tail_policy vta = TAIL_UNDEFINED,
>>> + enum mask_policy vma = MASK_UNDEFINED)
> No, you should just specify this as TAIL_ANY or MASK_ANY as default value.
 
That's the value I intended for "unspecified" i.e. the caller
didn't specify and then set it to the default.  _ANY can work as
well I guess.
 
> 
>>>const_vlmax_p (machine_mode mode)
>>>{
>>>-  poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>>+  poly_uint64 nunits = GET_MODE_NUNITS (mode);
>>>-  return nuints.is_constant ()
>>>+  return nunits.is_constant ()
>>> /* The vsetivli can only hold register 0~31.  */
>>>-? (IN_RANGE (nuints.to_constant (), 0, 31))
>>>+? (IN_RANGE (nunits.to_constant (), 0, 31))
>>> /* Only allowed in VLS-VLMAX mode.  */
>>> : false;
>>>}
> Meaningless change ?
 
Typo.
 
> 
>>>/* For the instruction that doesn't require TA, we still need a default 
>>> value
>>>  to emit vsetvl. We pick up the default value according to prefer 
>>> policy. */
>>>-  return (bool) (get_prefer_tail_policy () & 0x1
>>>- || (get_prefer_tail_policy () >> 1 & 0x1));
>>>+  return (bool) (get_preferred_tail_policy () & 0x1
>>>+ || (get_preferred_tail_policy () >> 1 & 0x1));
>>>}
>>>/* Get default mask policy.  */
>>>@@ -576,8 +576,8 @@ get_default_ma ()
>>>{
>>>   /* For the instruction that doesn't require MA, we still need a 
>>> default value
>>>  to emit vsetvl. We pick up the default value according to prefer 
>>> policy. */
>>>-  return (bool) (get_prefer_mask_policy () & 0x1
>>>- || (get_prefer_mask_policy () >> 1 & 0x1));
>>>+  return (bool) (get_preferred_mask_policy () & 0x1
>>>+ || (get_preferred_mask_policy () >> 1 & 0x1));
> Why did you change it?
 
Typo/grammar imho.
 
What about the rest of the changes? It's not all typos but I tried
to unify the mask/policy handling a bit. 
 
> You are using a comparison helper; I added a similar one downstream 
> while working on the comparison autovec patterns:
> 
> I think you can normalize my code with yours:
 
I wasn't aware that I'm only using one of several helpers, just refactored
what is upstream.  Yes, your code looks reasonable and it surely works
with the patch without much rework. 
 
> I am almost done with all comparison autovec patterns and will send them soon, after 
> testing.
 
Good, looking forward to it.
 
Regards
Robin

Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-19 Thread Robin Dapp via Gcc-patches

>>> +  TAIL_UNDEFINED = -1,
>>> +  MASK_UNDEFINED = -1,
> Why did you add this?
> 
>>> +  void add_policy_operands (enum tail_policy vta = TAIL_UNDEFINED,
>>> +     enum mask_policy vma = MASK_UNDEFINED)
> No, you should just specify this as TAIL_ANY or MASK_ANY as default value.

That's the value I intended for "unspecified" i.e. the caller
didn't specify and then set it to the default.  _ANY can work as
well I guess.
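
The "unspecified means use the default" idea can be sketched like this. The numeric values of the three real policies and the helper `resolve_tail_policy` are assumptions made for illustration; only the -1 sentinel and the bit test come from the quoted patch.

```cpp
// Values for the three real policies are assumed for this sketch; the -1
// sentinel is the one proposed in the quoted patch.
enum tail_policy
{
  TAIL_UNDEFINED = -1,
  TAIL_UNDISTURBED = 0,
  TAIL_AGNOSTIC = 1,
  TAIL_ANY = 2
};

// Hypothetical helper: replace the sentinel with the preferred policy,
// and leave an explicitly requested policy untouched.
tail_policy
resolve_tail_policy (tail_policy requested, tail_policy preferred)
{
  return requested == TAIL_UNDEFINED ? preferred : requested;
}

// Same bit test as in the quoted get_default_ta: with the values above it
// is true for AGNOSTIC and ANY, and false for UNDISTURBED.
bool
tail_agnostic_p (tail_policy policy)
{
  return (policy & 0x1) || ((policy >> 1) & 0x1);
}
```

Using TAIL_ANY as the default argument instead, as suggested in the review, would make the sentinel and the resolving step unnecessary.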

> 
>>>const_vlmax_p (machine_mode mode)
>>>{
>>>-  poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>>+  poly_uint64 nunits = GET_MODE_NUNITS (mode);
>>>-  return nuints.is_constant ()
>>>+  return nunits.is_constant ()
>>> /* The vsetivli can only hold register 0~31.  */
>>>-    ? (IN_RANGE (nuints.to_constant (), 0, 31))
>>>+    ? (IN_RANGE (nunits.to_constant (), 0, 31))
>>> /* Only allowed in VLS-VLMAX mode.  */
>>> : false;
>>>}
> Meaningless change ?

Typo.

> 
>>>    /* For the instruction that doesn't require TA, we still need a default 
>>>value
>>>      to emit vsetvl. We pick up the default value according to prefer 
>>>policy. */
>>>    -  return (bool) (get_prefer_tail_policy () & 0x1
>>>    - || (get_prefer_tail_policy () >> 1 & 0x1));
>>>    +  return (bool) (get_preferred_tail_policy () & 0x1
>>>    + || (get_preferred_tail_policy () >> 1 & 0x1));
>>>    }
>>>    /* Get default mask policy.  */
>>>    @@ -576,8 +576,8 @@ get_default_ma ()
>>>    {
>>>   /* For the instruction that doesn't require MA, we still need a 
>>>default value
>>>      to emit vsetvl. We pick up the default value according to prefer 
>>>policy. */
>>>    -  return (bool) (get_prefer_mask_policy () & 0x1
>>>    - || (get_prefer_mask_policy () >> 1 & 0x1));
>>>    +  return (bool) (get_preferred_mask_policy () & 0x1
>>>    + || (get_preferred_mask_policy () >> 1 & 0x1));
> Why did you change it?

Typo/grammar imho.

What about the rest of the changes? It's not all typos but I tried
to unify the mask/policy handling a bit. 

> You are using a comparison helper; I added a similar one downstream 
> while working on the comparison autovec patterns:
> 
> I think you can normalize my code with yours:

I wasn't aware that I'm only using one of several helpers, just refactored
what is upstream.  Yes, your code looks reasonable and it surely works
with the patch without much rework. 

> I am almost done with all comparison autovec patterns and will send them soon, after 
> testing.

Good, looking forward to it.

Regards
 Robin

Re: [PATCH] add glibc-stdint.h to vax and lm32 linux target (PR target/105525)

2023-05-19 Thread Maciej W. Rozycki

On Sat, 29 Apr 2023, Jeff Law via Gcc-patches wrote:

> > PR target/105525 is a build regression for the vax and lm32 linux
> > targets present in gcc-12/13/head, where the builds fail due to
> > unsatisfied references to __INTPTR_TYPE__ and __UINTPTR_TYPE__,
> > caused by these two targets failing to provide glibc-stdint.h.
> > 
> > Fixed thusly, tested by building crosses, which now succeeds.
> > 
> > Ok for trunk? (Note I don't have commit rights.)
> > 
> > 2023-04-28  Mikael Pettersson
> > 
> > PR target/105525
> > * config.gcc (vax-*-linux*): Add glibc-stdint.h.
> > (lm32-*-uclinux*): Likewise.
> Thanks.  I've pushed this to the trunk.

 Hmm, I find it quite interesting and indeed encouraging that someone 
actually verifies our VAX/Linux target.

 Mikael, how do you actually verify it however?

 I'm asking because while I did a glibc port for VAX/Linux (including VAX 
floating-point format support), it was many years ago and for LinuxThreads 
configuration only (hence glibc 2.4 only), which I suspect may not be 
supported by GCC anymore.  And it has never made its way upstream, because 
we'd have to land Linux kernel bits there first and that effort has 
stalled.

 I can still boot that old stuff on my VAX machine, but the userland is 
minimal and somewhat unstable as things sometimes crash or otherwise 
behave in a weird way.  I do have working `bash' and more importantly 
`gdbserver' binaries though.

  Maciej

Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-19 Thread 钟居哲

>> +  TAIL_UNDEFINED = -1,
>> +  MASK_UNDEFINED = -1,
Why did you add this?

>> +  void add_policy_operands (enum tail_policy vta = TAIL_UNDEFINED,
>> + enum mask_policy vma = MASK_UNDEFINED)
No, you should just specify this as TAIL_ANY or MASK_ANY as default value.

>>const_vlmax_p (machine_mode mode)
>>{
>>-  poly_uint64 nuints = GET_MODE_NUNITS (mode);
>>+  poly_uint64 nunits = GET_MODE_NUNITS (mode);
>>-  return nuints.is_constant ()
>>+  return nunits.is_constant ()
>> /* The vsetivli can only hold register 0~31.  */
>>-? (IN_RANGE (nuints.to_constant (), 0, 31))
>>+? (IN_RANGE (nunits.to_constant (), 0, 31))
>> /* Only allowed in VLS-VLMAX mode.  */
>> : false;
>>}
Meaningless change ?

>>/* For the instruction that doesn't require TA, we still need a default 
>> value
>>  to emit vsetvl. We pick up the default value according to prefer 
>> policy. */
>>-  return (bool) (get_prefer_tail_policy () & 0x1
>>- || (get_prefer_tail_policy () >> 1 & 0x1));
>>+  return (bool) (get_preferred_tail_policy () & 0x1
>>+ || (get_preferred_tail_policy () >> 1 & 0x1));
>>}
>>/* Get default mask policy.  */
>>@@ -576,8 +576,8 @@ get_default_ma ()
>>{
>>   /* For the instruction that doesn't require MA, we still need a 
>> default value
>>  to emit vsetvl. We pick up the default value according to prefer 
>> policy. */
>>-  return (bool) (get_prefer_mask_policy () & 0x1
>>- || (get_prefer_mask_policy () >> 1 & 0x1));
>>+  return (bool) (get_preferred_mask_policy () & 0x1
>>+ || (get_preferred_mask_policy () >> 1 & 0x1));
Why did you change it?

>>   +/* Emit an RVV comparison.  */
>>   +static void
>>   +emit_pred_cmp (unsigned icode, rtx mask, rtx dest, rtx cmp,
>>   +rtx src1, rtx src2,
>>   +rtx len, machine_mode mask_mode)
>>   +{
>>   +  insn_expander<9> e;
>>   +
>>   +  e.set_dest_and_mask (dest, mask, mask_mode);
>>   +
>>   +  e.add_input_operand (cmp, GET_MODE (cmp));
>>   +
>>   +  e.add_source_operand (src1, GET_MODE (src1));
>>   +  e.add_source_operand (src2, GET_MODE (src2));

You are using a comparison helper; I added a similar one downstream 
while working on the comparison autovec patterns:

I think you can normalize my code with yours:

/* Emit an RVV comparison.  If one of SRC1 and SRC2 is a scalar operand, its
   data_mode is specified using SCALAR_MODE.  */
static void
emit_pred_comparison (unsigned icode, rtx_code rcode, rtx mask, rtx dest,
  rtx src1, rtx src2, rtx len, machine_mode mask_mode,
  machine_mode scalar_mode = VOIDmode)
{
  insn_expander<9> e;
  e.set_dest_and_mask (mask, dest, mask_mode);
  machine_mode data_mode = GET_MODE (src1);

  gcc_assert (VECTOR_MODE_P (GET_MODE (src1))
|| VECTOR_MODE_P (GET_MODE (src2)));

  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 1, src1))
src1 = force_reg (data_mode, src1);
  if (!insn_operand_matches ((enum insn_code) icode, e.opno () + 2, src2))
{
  if (VECTOR_MODE_P (GET_MODE (src2)))
  src2 = force_reg (data_mode, src2);
  else
  src2 = force_reg (scalar_mode, src2);
}
  rtx comparison = gen_rtx_fmt_ee (rcode, mask_mode, src1, src2);
  if (!VECTOR_MODE_P (GET_MODE (src2)))
comparison = gen_rtx_fmt_ee (rcode, mask_mode, src1,
 gen_rtx_VEC_DUPLICATE (data_mode, src2));
  e.add_fixed_operand (comparison);

  e.add_fixed_operand (src1);
  if (CONST_INT_P (src2))
e.add_integer_operand (src2);
  else
e.add_fixed_operand (src2);

  e.set_len_and_policy (len, true, false, true);

  e.expand ((enum insn_code) icode, false);
}

static void
emit_len_comparison (unsigned icode, rtx_code rcode, rtx dest, rtx src1,
 rtx src2, rtx len, machine_mode mask_mode,
 machine_mode scalar_mode)
{
  emit_pred_comparison (icode, rcode, NULL_RTX, dest, src1, src2, len,
  mask_mode, scalar_mode);
}

/* Expand an RVV integer comparison using the RVV equivalent of:

 (set TARGET (CODE OP0 OP1)).  */

void
expand_vec_cmp_int (rtx target, rtx_code code, rtx op0, rtx op1)
{
  machine_mode mask_mode = GET_MODE (target);
  machine_mode data_mode = GET_MODE (op0);
  insn_code icode;
  bool scalar_p = false;

  if (CONST_VECTOR_P (op1))
{
  rtx elt;
      if (const_vec_duplicate_p (op1, &elt))
  op1 = elt;
  scalar_p = true;
}

  switch (code)
{
case LE:
case LEU:
case GT:
case GTU:
  if (scalar_p)
  icode = code_for_pred_cmp_scalar (data_mode);
  else
  icode = code_for_pred_cmp (data_mode);
  break;
case EQ:
case NE:
  if (scalar_p)
  icode = code_for_pred_eqne_scalar (data_mode);
  else
  icode = code_for_pred_cmp (data_mode);
  break;
case LT:
case LTU:
  if (scalar_p)
  icode = code_for_pred_cmp_scalar (data_mode);
  else
  icode = code_for_pred_ltge (data_mode);
  break;
case GE:
case GEU:
  if (scalar_p)
  icode = code_for_pred_ge_scalar (data_mode);
  else
  icode =

[C PATCH v2] Fix ICEs related to VM types in C [PR106465, PR107557, PR108423, PR109450]

2023-05-19 Thread Martin Uecker via Gcc-patches



Thanks Joseph! 

Revised version attached. Ok?


But I wonder whether we generally need to do something 
about

  sizeof *x

when x is NULL or not initialized. This is quite commonly
used in C code and if the type is not of variable size,
it is also unproblematic.  So the UB for variable size is
unfortunate and certainly also affects existing code in
the wild.  In practice it does not seem to cause
problems because there is no lvalue conversion and this
then seems to work.  Maybe we document this as an 
extension?  (and make sure in the C FE that it
works)  This would also make this idiom valid:

char (*buf)[n] = malloc(sizeof *buf);

Or if we do not want to do this, then I think we should
add some warnings (and UBSan check for null pointer)
which currently do not exist:

https://godbolt.org/z/fhWMKvYc8

Martin




Am Donnerstag, dem 18.05.2023 um 21:46 + schrieb Joseph Myers:
> On Thu, 18 May 2023, Martin Uecker via Gcc-patches wrote:
> 
> > +  /* we still have to evaluate size expressions */
> 
> Comments should start with a capital letter and end with ".  ".
> 
> > diff --git a/gcc/testsuite/gcc.dg/nested-vla-1.c 
> > b/gcc/testsuite/gcc.dg/nested-vla-1.c
> > new file mode 100644
> > index 000..408a68524d8
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/nested-vla-1.c
> > @@ -0,0 +1,37 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-std=gnu99" } */
> 
> I'm concerned with various undefined behavior in this and other tests; 
> they look very fragile, relying on some optimizations and not others 
> taking place.  I think they should be adjusted to avoid undefined behavior 
> if all the evaluations from the abstract machine (in particular, of sizeof 
> operands with variable size) take place, and other undefined behavior from 
> calling functions through function pointers with incompatible type.
> 
> > +   struct bar { char x[++n]; } (*bar2)(void) = bar;/* { dg-warning 
> > "incompatible pointer type" } */
> > +
> > +   if (2 != n)
> > +   __builtin_abort();
> > +
> > +   if (2 != sizeof((*bar2)()))
> > +   __builtin_abort();
> 
> You're relying on the compiler not noticing that a function is being 
> called through an incompatible type and thus not turning the call (which 
> should be evaluated, because the operand of sizeof has a type with 
> variable size) into a call to abort.
> 
> > diff --git a/gcc/testsuite/gcc.dg/nested-vla-2.c 
> > b/gcc/testsuite/gcc.dg/nested-vla-2.c
> > new file mode 100644
> > index 000..504eec48c80
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/nested-vla-2.c
> > @@ -0,0 +1,33 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-std=gnu99" } */
> > +
> > +
> > +int main()
> > +{
> > +   int n = 1;
> > +
> > +   typeof(char (*)[++n]) bar(void) { }
> > +
> > +   if (2 != n)
> > +   __builtin_abort();
> > +
> > +   if (2 != sizeof(*bar()))
> > +   __builtin_abort();
> 
> In this test, *bar() is evaluated, i.e. an undefined pointer is 
> dereferenced; it would be better to return a valid pointer to a 
> sufficiently large array to avoid that undefined behavior.
> 
> > diff --git a/gcc/testsuite/gcc.dg/pr106465.c 
> > b/gcc/testsuite/gcc.dg/pr106465.c
> > new file mode 100644
> > index 000..b03e2442f12
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr106465.c
> > @@ -0,0 +1,86 @@
> > +/* PR c/106465
> > + * { dg-do run }
> > + * { dg-options "-std=gnu99" }
> > + * */
> > +
> > +int main()
> > +{
> > +   int n = 3;
> > +   
> > +   void g1(int m, struct { char p[++m]; }* b)  /* { dg-warning 
> > "anonymous struct" } */
> > +   {
> > +   if (3 != m)
> > +   __builtin_abort();
> > +
> > +   if (3 != sizeof(b->p))
> > +   __builtin_abort();
> > +   }
> 
> > +   g1(2, (void*)0);
> 
> Similarly, this is dereferencing a null pointer in the evaluated operand 
> of sizeof.
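
The review comments above rest on the C99 rule that the operand of sizeof is
evaluated only when its type is a variable length array type.  A small sketch
of that rule in isolation (the function names are hypothetical and not taken
from the tests):

```c
#include <assert.h>
#include <stddef.h>

/* The operand names a variable length array type, so it is evaluated:
   *n is incremented and the result is the new value of *n.
   Assumes *n + 1 is positive.  */
static size_t
sizeof_vla_operand (int *n)
{
  return sizeof (char[++*n]);
}

/* The operand has the fixed-size type int, so it is not evaluated:
   *n keeps its old value.  */
static size_t
sizeof_fixed_operand (int *n)
{
  return sizeof ((*n)++);
}
```

Some compilers may warn about the side effect in the second function, which is
exactly the point: it never happens.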

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 90d7cd27cd5..f63c1108ab5 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5378,7 +5378,8 @@ start_decl (struct c_declarator *declarator, struct 
c_declspecs *declspecs,
 if (lastdecl != error_mark_node)
   *lastloc = DECL_SOURCE_LOCATION (lastdecl);
 
-  if (expr)
+  /* Make sure the size expression is evaluated at this point.  */
+  if (expr && !current_scope->parm_flag)
 add_stmt (fold_convert (void_type_node, expr));
 
   if (TREE_CODE (decl) != FUNCTION_DECL && MAIN_NAME_P (DECL_NAME (decl))
@@ -7510,7 +7511,8 @@ grokdeclarator (const struct c_declarator *declarator,
&& c_type_variably_modified_p (type))
  {
tree bind = NULL_TREE;
-   if (decl_context == TYPENAME || decl_context == PARM)
+   if (decl_context == TYPENAME || decl_context == PARM
+   || decl_context == FIELD)
  {
bind = build3 (BIND_EXPR, void_type_node, NULL_TREE,
   NULL_TREE, NULL_TREE);
@@ -7519,10

Re: [committed] Enable LRA on several ports

2023-05-19 Thread Maciej W. Rozycki

On Tue, 2 May 2023, Jeff Law via Gcc-patches wrote:

> Well, I'd say that my plan would be to deprecate any target that is not
> converted by the end of this development cycle.  So the change keeps cris from
> falling into that bucket.

 As I noted in the other thread it is highly unlikely I will make it with 
the VAX target in this release cycle, owing to the catastrophic breakage 
of the exception unwinder, recently discovered, which I consider higher 
priority as a show-stopper for important software such as current GDB.  I 
will appreciate your taking this into consideration.

 That said, the VAX target does build its target libraries with `-mlra', 
but there are ICE regressions in the test suite and the code produced 
overall is of brown-paper-bag quality.  Removing `-mno-lra' before that has 
been sorted out will make it much harder to bring LRA up to old reload quality.

  Maciej

Re: [PATCH] RISC-V: Allow more loading of const vectors.

2023-05-19 Thread Kito Cheng via Gcc-patches

LGTM

Robin Dapp via Gcc-patches wrote on Fri, 19 May 2023 at 19:07:

> Hi,
>
> this fixes a rebase oversight regarding the loading
> of vector constants.  Added another test to properly
> catch that in the future.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_const_insns): Remove else.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: New test.
> * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: New test.
> ---
>  gcc/config/riscv/riscv.cc   | 2 +-
>  .../gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c   | 6 ++
>  .../gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c   | 6 ++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 0d1b83f4315..0e874f0604d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1295,7 +1295,7 @@ riscv_const_insns (rtx x)
>The Wc0, Wc1 constraints are already covered by the
>vi constraint so we do not need to check them here
>separately.  */
> -   else if (TARGET_VECTOR && satisfies_constraint_vi (x))
> +   if (TARGET_VECTOR && satisfies_constraint_vi (x))
>   return 1;
>
> /* TODO: We may support more const vector in the future.  */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
> new file mode 100644
> index 000..631ea3bf268
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d
> -fno-vect-cost-model --param=riscv-autovec-preference=fixed-vlmax
> -fno-builtin" } */
> +
> +#include "vmv-imm-template.h"
> +
> +/* { dg-final { scan-assembler-times "vmv.v.i" 32 } } */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
> new file mode 100644
> index 000..7ded6cc18d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -march=rv64gcv -mabi=lp64d
> -fno-vect-cost-model --param=riscv-autovec-preference=fixed-vlmax
> -fno-builtin" } */
> +
> +#include "vmv-imm-template.h"
> +
> +/* { dg-final { scan-assembler-times "vmv.v.i" 32 } } */
> --
> 2.40.1
>

Re: [PATCH] RISC-V: testsuite: Remove empty *-run-template.h.

2023-05-19 Thread Kito Cheng via Gcc-patches

LGTM

Robin Dapp via Gcc-patches wrote on Fri, 19 May 2023 at 19:10:

> Hi,
>
> this obvious patch removes empty run template files and one redundant
> stdio.h include.
>
> Regards
>  Robin
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/binop/shift-run.c: Do not include
> <stdio.h>.
> * gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vand-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vor-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h: Removed.
> * gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h: Removed.
> ---
>  .../gcc.target/riscv/rvv/autovec/binop/shift-run-template.h  | 0
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c | 1 -
>  .../gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vand-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vor-run-template.h| 0
>  .../gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h   | 0
>  .../gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h   | 0
>  12 files changed, 1 deletion(-)
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vor-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h
>  delete mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h
>
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
> index 159478c6947..ff3633b530a 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
> @@ -3,7 +3,6 @@
>
>  #include "shift-template.h"
>
> -#include <stdio.h>
>  #include 
>
>  #define SZ 512
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h
> deleted file mode 100644
> index e69de29bb2d..000
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h
> deleted file mode 100644

[PATCH] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-19 Thread Robin Dapp via Gcc-patches

Hi,

this patch implements autovec expanders of abs<mode>2, vneg<mode>2 and
vnot<mode>2 for integers.  I also tried to refactor the helper code
in riscv-v.cc a bit.  Guess it's not enough to warrant a separate patch
though.
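
For orientation, these are scalar loops of the kind such expanders are meant
to let the vectorizer handle.  This is only a sketch: the function names are
hypothetical and are not taken from the test templates in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element-wise absolute value.  Assumes no element is INT32_MIN,
   whose negation would overflow.  */
static void
vabs_i32 (int32_t *restrict dst, const int32_t *restrict a, size_t n)
{
  assert (dst != NULL && a != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] < 0 ? -a[i] : a[i];
}

/* Element-wise negation, with the same INT32_MIN caveat.  */
static void
vneg_i32 (int32_t *restrict dst, const int32_t *restrict a, size_t n)
{
  assert (dst != NULL && a != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = -a[i];
}

/* Element-wise bitwise complement.  */
static void
vnot_i32 (int32_t *restrict dst, const int32_t *restrict a, size_t n)
{
  assert (dst != NULL && a != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = ~a[i];
}
```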

Regards
 Robin 

gcc/ChangeLog:

* config/riscv/autovec.md (2): Fix typo.
(abs2): New expander.
* config/riscv/riscv-protos.h (emit_len_masked_op): Declare.
(emit_len_cmp): Declare.
(enum tail_policy): Add undefined.
(enum mask_policy): Add undefined.
* config/riscv/riscv-v.cc (const_vlmax_p): Fix typo.
(emit_pred_op): Swap mask and dest.
(emit_pred_binop): Ditto.
(emit_pred_cmp_op): Ditto.
(emit_vlmax_reg_op): Ditto.
(emit_len_masked_op): New function.
(emit_len_cmp): New function.
(emit_index_op): Use helper function.
(get_prefer_tail_policy): Rename.
(get_preferred_tail_policy): To this.
(get_prefer_mask_policy): Rename.
(get_preferred_mask_policy): To this.
(slide1_sew64_helper): Ditto.
* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Ditto.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vsetvl.cc (get_default_ta): Ditto.
(get_default_ma): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-template.h: New test.
---
 gcc/config/riscv/autovec.md   |  51 +-
 gcc/config/riscv/riscv-protos.h   |   8 +-
 gcc/config/riscv/riscv-v.cc   | 146 ++
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vsetvl.cc  |   8 +-
 .../riscv/rvv/autovec/unop/abs-run.c  |  29 
 .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |   7 +
 .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |   7 +
 .../riscv/rvv/autovec/unop/abs-template.h |  26 
 .../riscv/rvv/autovec/unop/vneg-run.c |  29 
 .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |   6 +
 .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |   6 +
 .../riscv/rvv/autovec/unop/vneg-template.h|  17 ++
 .../riscv/rvv/autovec/unop/vnot-run.c |  33 
 .../riscv/rvv/autovec/unop/vnot-rv32gcv.c |   6 +
 .../riscv/rvv/autovec/unop/vnot-rv64gcv.c |   6 +
 .../riscv/rvv/autovec/unop/vnot-template.h|  21 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 18 files changed, 377 insertions(+), 35 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-template.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vnot-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ce0b46537ad..8060a5cdf90 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -161,7 +161,7 @@ (define_expand "3"
 })
 
 ;; -
-;;  [INT] Binary shifts by scalar.
+;;  [INT] Binary shifts by vector.
 ;; -
 ;; Includes:
 ;; - vsll.vv/vsra.vv/vsrl.vv
@@ -180,3 +180,52 @@ (define_expand "v3"
NULL_RTX, mode);
   DONE;
 })
+
+;; =
+;; == Unary arithmetic
+;;

Re: [PATCH v2 0/9] MIPS: Add MIPS16e2 ASE instrucions.

2023-05-19 Thread Maciej W. Rozycki

Hi Jie,

 Thank you for your submission.

 Since I was a member of the team that developed this ASE in cooperation 
with the hardware group, I did the binutils part, and it was even myself 
who came up with the name for the ASE in an internal discussion, I feel 
somewhat responsible for this feature and therefore I'll review this patch 
series.  I can't formally approve it as I'm not a nominated maintainer, 
but once you have addressed my concerns I expect this to be a formality.

 It may take a couple of days though as this patchset is moderately sized 
and I'm time-constrained.

> The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
> which includes all MIPS16e instructions, with some additions.
> 
> This series of patches adds all instructions of MIPS16E2 ASE.

 NB please always document changes between revisions of patchsets sent, in 
the comment section of each patch submitted.

 Also you haven't mentioned how you verified your changes.  Please always 
state that when submitting patches, e.g. in the cover letter.

  Maciej

[PATCH] RISC-V: testsuite: Remove empty *-run-template.h.

2023-05-19 Thread Robin Dapp via Gcc-patches

Hi,

this obvious patch removes empty run template files and one redundant
stdio.h include.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/shift-run.c: Do not include
<stdio.h>.
* gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vand-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vor-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h: Removed.
* gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h: Removed.
---
 .../gcc.target/riscv/rvv/autovec/binop/shift-run-template.h  | 0
 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c | 1 -
 .../gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vand-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vor-run-template.h| 0
 .../gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h   | 0
 .../gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h   | 0
 12 files changed, 1 deletion(-)
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vor-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrem-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run-template.h
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vxor-run-template.h

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
index 159478c6947..ff3633b530a 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/shift-run.c
@@ -3,7 +3,6 @@
 
 #include "shift-template.h"
 
-#include <stdio.h>
 #include 
 
 #define SZ 512
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vand-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vmul-run-template.h
deleted file mode 100644
index e69de29bb2d..000
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vor-run-template.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vor-run-template.h
deleted file mode 100644
index

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. Thanks for the comments.
>
> Would you mind telling me whether decrementing IV support can be accepted 
> into the GCC middle-end?
>
> If yes, could you tell me what I should do next with the patches?  I am 
> confused, because it seems that the implementation in this patch should be 
> abandoned entirely and the whole thing rewritten.

No, I haven't said that.  Like I say, I haven't had time to review the
decrementing IV part of the patch yet.  But the change I mentioned
earlier seemed like an unrelated fix that should go in first.

I was hoping to partially unblock your work by reviewing that part in
isolation rather than waiting until I had time to review the whole patch.
But I guess that's just created confusion rather than been helpful, sorry.

In other words: the decrementing IV patch should (I hope) be an
optimisation.  It shouldn't be needed for correctness.  The current
incrementing IVs should work for LOAD_LEN, but perhaps inefficiently.
Is that right?

In contrast, the change to vect_get_loop_len is a correctness fix
and I can't see how RVV would work without it.

Thanks,
Richard

[PATCH] RISC-V: Allow more loading of const vectors.

2023-05-19 Thread Robin Dapp via Gcc-patches

Hi,

this fixes a rebase oversight regarding the loading
of vector constants.  Added another test to properly
catch that in the future.
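
For orientation, a minimal sketch of the kind of constant this is about, under
the assumption (mine, not stated in the patch) that the vi constraint accepts a
constant vector whose elements all have the same value and that this value fits
the 5-bit signed immediate of vmv.v.i.  The helper names are hypothetical and
do not exist in GCC:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if VALUE fits a 5-bit signed immediate, i.e. lies in [-16, 15].  */
static bool
fits_simm5 (int64_t value)
{
  return value >= -16 && value <= 15;
}

/* True if all N elements are equal and the common value fits simm5,
   so that a single splat-immediate instruction could load the vector.  */
static bool
splat_simm5_p (const int64_t *elts, size_t n)
{
  assert (elts != NULL && n > 0);
  for (size_t i = 1; i < n; i++)
    if (elts[i] != elts[0])
      return false;
  return fits_simm5 (elts[0]);
}
```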

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Remove else.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: New test.
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: New test.
---
 gcc/config/riscv/riscv.cc   | 2 +-
 .../gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c   | 6 ++
 .../gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c   | 6 ++
 3 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1b83f4315..0e874f0604d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1295,7 +1295,7 @@ riscv_const_insns (rtx x)
   The Wc0, Wc1 constraints are already covered by the
   vi constraint so we do not need to check them here
   separately.  */
-   else if (TARGET_VECTOR && satisfies_constraint_vi (x))
+   if (TARGET_VECTOR && satisfies_constraint_vi (x))
  return 1;
 
/* TODO: We may support more const vector in the future.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
new file mode 100644
index 000..631ea3bf268
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d 
-fno-vect-cost-model --param=riscv-autovec-preference=fixed-vlmax -fno-builtin" 
} */
+
+#include "vmv-imm-template.h"
+
+/* { dg-final { scan-assembler-times "vmv.v.i" 32 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
new file mode 100644
index 000..7ded6cc18d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv -mabi=lp64d 
-fno-vect-cost-model --param=riscv-autovec-preference=fixed-vlmax -fno-builtin" 
} */
+
+#include "vmv-imm-template.h"
+
+/* { dg-final { scan-assembler-times "vmv.v.i" 32 } } */
-- 
2.40.1

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
>>> I don't think this is a property of decrementing IVs.  IIUC it's really
>>> a property of rgl->factor == 1 && factor == 1, where factor would need
>>> to be passed in by the caller.  Because of that, it should probably be
>>> a separate patch.
> Is it right that I should post just this part of the code as a separate patch and then merge it?

No, not in its current form.  Like I say, the test should be based on
factors rather than TYPE_VECTOR_SUBPARTS.  But a fix for this problem
should come before the changes to IVs.

>>> That is, current LOAD_LEN targets have two properties (IIRC):
>>> (1) all vectors used in a given piece of vector code have the same byte size
>>> (2) lengths are measured in bytes rather than elements
>>> For all cases, including SVE, the number of controls needed for a scalar
>>> statement is equal to the number of vectors needed for that scalar
>>> statement.
>>> Because of (1), on current LOAD_LEN targets, the number of controls
>>> needed for a scalar statement is also proportional to the total number
>>> of bytes occupied by the vectors generated for that scalar statement.
>>> And because of (2), the total number of bytes is the only thing that
>>> matters, so all users of a particular control can use the same control
>>> value.
>>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
>>> control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
>>> means 16 elements.  V16QI's nscalars_per_iter would therefore be double
>>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
>>> even out.
>>> The code structurally supports targets that count in elements rather
>>> than bytes, so that factor==1 for all element types.  See the
>>> "rgl->factor == 1 && factor == 1" case in:
>>>   if (rgl->max_nscalars_per_iter < nscalars_per_iter)
>>>     {
>>>       /* For now, we only support cases in which all loads and stores
>>>          fall back to VnQI or none do.  */
>>>       gcc_assert (!rgl->max_nscalars_per_iter
>>>                   || (rgl->factor == 1 && factor == 1)
>>>                   || (rgl->max_nscalars_per_iter * rgl->factor
>>>                       == nscalars_per_iter * factor));
>>>       rgl->max_nscalars_per_iter = nscalars_per_iter;
>>>       rgl->type = vectype;
>>>       rgl->factor = factor;
>>>     }
>>> But it hasn't been tested, since no current target uses it.
>>> I think the above part of the patch shows that the current "factor is
>>> always 1" path is in fact broken, and the patch is a correctness fix on
>>> targets that measure in elements rather than bytes.
>>> So I think the above part of the patch should go in ahead of the IV changes.
>>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
> Length control measured in bytes instead of elements is not
> appropriate for RVV.  Do you mean that I can't support RVV auto-vectorization
> in the upstream GCC middle-end and can only support it downstream?  Is
> that right?

No.  I haven't said in this or previous reviews that something cannot be
supported in upstream GCC.

I'm saying that the code in theory supports counting in bytes *or*
counting in elements.  But only the first one has actually been tested.
And so, perhaps not surprisingly, the support for counting elements
needs a fix.

The fix in your patch looks like it's on the right lines, but it should be
based on factor rather than TYPE_VECTOR_SUBPARTS.

See get_len_load_store_mode for how this selection happens:

(1) IFN_LOAD_LEN itself always counts in elements rather than bytes.

(2) If a target has instructions that count in elements, it should
define load_len patterns for all vector modes that it supports.

(3) If a target has instructions that count in bytes, it should define
load_len patterns only for byte modes.  The vectoriser will then
use byte loads for all vector types (even things like V8HI).

For (2), the loop controls will always have a factor of 1.
For (3), the loop controls will have a factor equal to the element
size in bytes.  See:

  machine_mode vmode;
  if (get_len_load_store_mode (vecmode, is_load).exists (&vmode))
    {
      nvectors = group_memory_nvectors (group_size * vf, nunits);
      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
      unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode);
      vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor);
      using_partial_vectors_p = true;
    }
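
As a rough illustration of the two counting schemes, here is a toy model on
plain integers.  It is a sketch only: struct vec_desc and both functions are
hypothetical and are not vectorizer code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a vector type: number of elements and
   size of one element in bytes.  */
struct vec_desc
{
  unsigned nunits;
  unsigned elt_bytes;
};

/* Factor in the sense used above: 1 when the target counts in elements,
   otherwise the element size in bytes.  */
static unsigned
control_factor (struct vec_desc v, bool counts_elements)
{
  return counts_elements ? 1 : v.elt_bytes;
}

/* Units covered per iteration by NVECTORS vectors of type V, in whatever
   unit the control counts (elements or bytes).  */
static unsigned
control_units_per_iter (unsigned nvectors, struct vec_desc v,
                        bool counts_elements)
{
  assert (nvectors > 0 && v.nunits > 0 && v.elt_bytes > 0);
  unsigned nscalars_per_iter = nvectors * v.nunits;
  return nscalars_per_iter * control_factor (v, counts_elements);
}
```

With byte counting, 2xV16QI and 2xV8HI both come out at 32 units, so they can
share a control.  With element counting they come out at 32 and 16, which is
why a shared control would need adjusting.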

This part should work correctly for RVV and any future targets that
measure in elements rather than bytes.  The problem is here:

tree final_len
  = vect_get_loop_len (loop_vinfo, loop_lens,
   vec_num * ncopies,
   vec_num * j + i);
tree ptr = build_int_cst (ref_type,
  align * BITS_PER_UNIT);

Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread juzhe.zh...@rivai.ai

Hi, Richard. Thanks for the comments.

Would you mind telling me whether decrementing IV support can be accepted 
into the GCC middle-end?

If yes, could you tell me what I should do next with the patches?  I am 
confused, because it seems that the implementation in this patch should be 
abandoned entirely and the whole thing rewritten.

Would you mind giving me more information?

Thanks. 

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-19 18:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).

juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
>  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>/* Populate the rgroup's len array, if this is the first time we've
>   used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +  return loop_len;
> +}

I don't think this is a property of decrementing IVs.  IIUC it's really
a property of rgl->factor == 1 && factor == 1, where factor would need
to be passed in by the caller.  Because of that, it should probably be
a separate patch.

That is, current LOAD_LEN targets have two properties (IIRC):

(1) all vectors used in a given piece of vector code have the same byte size
(2) lengths are measured in bytes rather than elements

For all cases, including SVE, the number of controls needed for a scalar
statement is equal to the number of vectors needed for that scalar
statement.

Because of (1), on current LOAD_LEN targets, the number of controls
needed for a scalar statement is also proportional to the total number
of bytes occupied by the vectors generated for that scalar statement.
And because of (2), the total number of bytes is the only thing that
matters, so all users of a particular control can use the same control
value.

E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
means 16 elements.  V16QI's nscalars_per_iter would therefore be double
V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
even out.

The code structurally supports targets that count in elements rather
than bytes, so that factor==1 for all element types.  See the
"rgl->factor == 1 && factor == 1" case in:

  if (rgl->max_nscalars_per_iter < nscalars_per_iter)
    {
      /* For now, we only support cases in which all loads and stores fall back
         to VnQI or none do.  */
      gcc_assert (!rgl->max_nscalars_per_iter
                  || (rgl->factor == 1 && factor == 1)
                  || (rgl->max_nscalars_per_iter * rgl->factor
                      == nscalars_per_iter * factor));
      rgl->max_nscalars_per_iter = nscalars_per_iter;
      rgl->type = vectype;
      rgl->factor = factor;
    }

But it hasn't been tested, since no current target uses it.

I think the above part of the patch shows that the current "factor is
always 1" path is in fact broken, and the patch is a correctness fix on
targets that measure in elements rather than bytes.

So I think the above part of the patch should go in ahead of the IV changes.
But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.

Thanks,
Richard

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-19 Thread Prathamesh Kulkarni via Gcc-patches

On Thu, 18 May 2023 at 22:04, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Thu, 18 May 2023 at 13:37, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Tue, 16 May 2023 at 00:29, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > Hi Richard,
> >> >> > After committing the interleave+zip1 patch for vector initialization,
> >> >> > it seems to regress the s32 case for this patch:
> >> >> >
> >> >> > int32x4_t f_s32(int32_t x)
> >> >> > {
> >> >> >   return (int32x4_t) { x, x, x, 1 };
> >> >> > }
> >> >> >
> >> >> > code-gen:
> >> >> > f_s32:
> >> >> > movi v30.2s, 0x1
> >> >> > fmov s31, w0
> >> >> > dup v0.2s, v31.s[0]
> >> >> > ins v30.s[0], v31.s[0]
> >> >> > zip1 v0.4s, v0.4s, v30.4s
> >> >> > ret
> >> >> >
> >> >> > instead of expected code-gen:
> >> >> > f_s32:
> >> >> > movi v31.2s, 0x1
> >> >> > dup v0.4s, w0
> >> >> > ins v0.s[3], v31.s[0]
> >> >> > ret
> >> >> >
> >> >> > Cost for fallback sequence: 16
> >> >> > Cost for interleave and zip sequence: 12
> >> >> >
> >> >> > For the above case, the cost for interleave+zip1 sequence is computed 
> >> >> > as:
> >> >> > halves[0]:
> >> >> > (set (reg:V2SI 96)
> >> >> > (vec_duplicate:V2SI (reg/v:SI 93 [ x ])))
> >> >> > cost = 8
> >> >> >
> >> >> > halves[1]:
> >> >> > (set (reg:V2SI 97)
> >> >> > (const_vector:V2SI [
> >> >> > (const_int 1 [0x1]) repeated x2
> >> >> > ]))
> >> >> > (set (reg:V2SI 97)
> >> >> > (vec_merge:V2SI (vec_duplicate:V2SI (reg/v:SI 93 [ x ]))
> >> >> > (reg:V2SI 97)
> >> >> > (const_int 1 [0x1])))
> >> >> > cost = 8
> >> >> >
> >> >> > followed by:
> >> >> > (set (reg:V4SI 95)
> >> >> > (unspec:V4SI [
> >> >> > (subreg:V4SI (reg:V2SI 96) 0)
> >> >> > (subreg:V4SI (reg:V2SI 97) 0)
> >> >> > ] UNSPEC_ZIP1))
> >> >> > cost = 4
> >> >> >
> >> >> > So the total cost becomes
> >> >> > max(costs[0], costs[1]) + zip1_insn_cost
> >> >> > = max(8, 8) + 4
> >> >> > = 12
> >> >> >
> >> >> > While the fallback rtl sequence is:
> >> >> > (set (reg:V4SI 95)
> >> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> >> >> > cost = 8
> >> >> > (set (reg:SI 98)
> >> >> > (const_int 1 [0x1]))
> >> >> > cost = 4
> >> >> > (set (reg:V4SI 95)
> >> >> > (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 98))
> >> >> > (reg:V4SI 95)
> >> >> > (const_int 8 [0x8])))
> >> >> > cost = 4
> >> >> >
> >> >> > So total cost = 8 + 4 + 4 = 16, and we choose the interleave+zip1 
> >> >> > sequence.
> >> >> >
> >> >> > I think the issue is probably that for the interleave+zip1 sequence 
> >> >> > we take
> >> >> > max(costs[0], costs[1]) to reflect that both halves are interleaved,
> >> >> > but for the fallback seq we use seq_cost, which assumes serial 
> >> >> > execution
> >> >> > of insns in the sequence.
> >> >> > For above fallback sequence,
> >> >> > set (reg:V4SI 95)
> >> >> > (vec_duplicate:V4SI (reg/v:SI 93 [ x ])))
> >> >> > and
> >> >> > (set (reg:SI 98)
> >> >> > (const_int 1 [0x1]))
> >> >> > could be executed in parallel, which would make its cost max(8, 4) + 
> >> >> > 4 = 12.
> >> >>
> >> >> Agreed.
> >> >>
> >> >> A good-enough substitute for this might be to ignore scalar moves
> >> >> (for both alternatives) when costing for speed.
> >> > Thanks for the suggestions. Just wondering for aarch64, if there's an 
> >> > easy
> >> > way we can check if insn is a scalar move, similar to riscv's 
> >> > scalar_move_insn_p
> >> > that checks if get_attr_type(insn) is TYPE_VIMOVXV or TYPE_VFMOVFV ?
> >>
> >> It should be enough to check that the pattern is a SET:
> >>
> >> (a) whose SET_DEST has a scalar mode and
> >> (b) whose SET_SRC is an aarch64_mov_operand
> > Hi Richard,
> > Thanks for the suggestions, the attached patch calls seq_cost to compute
> > cost for sequence and then subtracts cost of each scalar move insn from it.
> > Does that look OK ?
> > The patch is under bootstrap+test on aarch64-linux-gnu.
>
> Yeah, the patch looks reasonable (some comments below).  The testing
> for this kind of patch is more than a formality though, so it would
> be good to wait to see if the tests pass.
>
> > [...]
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 29dbacfa917..7efd896d364 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -22332,6 +22332,32 @@ aarch64_unzip_vector_init (machine_mode mode, rtx 
> > vals, bool even_p)
> >return gen_rtx_PARALLEL (new_mode, vec);
> >  }
> >
> > +/* Return true if INSN is a scalar move.  */
> > +
> > +static bool
> > +scalar_move_insn_p (rtx_insn *insn)
> > +{
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +return false;
> > +  rtx src = SET_SRC (set);
> > +  rtx dest = SET_DEST (set);
> > +  return is_a <scalar_mode> (GET_MODE (dest))

Re: [PATCH] tree-ssa-math-opts: Pattern recognize some further hand written forms of signed __builtin_mul_overflow{, _p} [PR105776]

2023-05-19 Thread Richard Biener via Gcc-patches




> On 19.05.2023 at 10:06, Jakub Jelinek wrote:
> 
> Hi!
> 
> In the pattern recognition of signed __builtin_mul_overflow{,_p} we
> check for result of unsigned division (which follows unsigned
> multiplication) being equality compared against one of the multiplication's
> argument (the one not used in the division) and check for the comparison
> to be done against same precision cast of the argument (because
> division's result is unsigned and the argument is signed).
> But as shown in this PR, one can write it equally as comparison done in
> the signed type, i.e. compare division's result cast to corresponding
> signed type against the argument.
> 
> The following patch handles even those cases.
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
> powerpc64le-linux, ok for trunk?

Ok.

Richard 

> 2023-05-19  Jakub Jelinek  
> 
>PR tree-optimization/105776
>* tree-ssa-math-opts.cc (arith_overflow_check_p): If cast_stmt is
>non-NULL, allow division statement to have a cast as single imm use
>rather than comparison/condition.
>(match_arith_overflow): In that case remove the cast stmt in addition
>to the division statement.
> 
>* gcc.target/i386/pr105776.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj2023-05-18 14:57:13.216409685 +0200
> +++ gcc/tree-ssa-math-opts.cc2023-05-18 15:45:34.077177053 +0200
> @@ -3802,6 +3802,21 @@ arith_overflow_check_p (gimple *stmt, gi
>   use_operand_p use;
>   if (!single_imm_use (divlhs, &use, &cur_use_stmt))
>return 0;
> +  if (cast_stmt && gimple_assign_cast_p (cur_use_stmt))
> +{
> +  tree cast_lhs = gimple_assign_lhs (cur_use_stmt);
> +  if (INTEGRAL_TYPE_P (TREE_TYPE (cast_lhs))
> +  && TYPE_UNSIGNED (TREE_TYPE (cast_lhs))
> +  && (TYPE_PRECISION (TREE_TYPE (cast_lhs))
> +  == TYPE_PRECISION (TREE_TYPE (divlhs)))
> +  && single_imm_use (cast_lhs, &use, &cur_use_stmt))
> +{
> +  cast_stmt = NULL;
> +  divlhs = cast_lhs;
> +}
> +  else
> +return 0;
> +}
> }
>   if (gimple_code (cur_use_stmt) == GIMPLE_COND)
> {
> @@ -4390,6 +4405,16 @@ match_arith_overflow (gimple_stmt_iterat
>  gimple_stmt_iterator gsi2 = gsi_for_stmt (orig_use_stmt);
>  maybe_optimize_guarding_check (mul_stmts, use_stmt, orig_use_stmt,
> cfg_changed);
> +  use_operand_p use;
> +  gimple *cast_stmt;
> +  if (single_imm_use (gimple_assign_lhs (orig_use_stmt), &use,
> +  &cast_stmt)
> +  && gimple_assign_cast_p (cast_stmt))
> +{
> +  gimple_stmt_iterator gsi3 = gsi_for_stmt (cast_stmt);
> +  gsi_remove (&gsi3, true);
> +  release_ssa_name (gimple_assign_lhs (cast_stmt));
> +}
>  gsi_remove (&gsi2, true);
>  release_ssa_name (gimple_assign_lhs (orig_use_stmt));
>}
> --- gcc/testsuite/gcc.target/i386/pr105776.c.jj2023-05-18 
> 15:57:15.570218802 +0200
> +++ gcc/testsuite/gcc.target/i386/pr105776.c2023-05-18 15:56:55.273506918 
> +0200
> @@ -0,0 +1,43 @@
> +/* PR tree-optimization/105776 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -masm=att" } */
> +/* { dg-final { scan-tree-dump-times " = \.MUL_OVERFLOW " 5 "optimized" } } 
> */
> +/* { dg-final { scan-assembler-times "\timull\t" 5 } } */
> +/* { dg-final { scan-assembler-times "\tsetno\t" 5 } } */
> +
> +int
> +foo (unsigned x, unsigned y)
> +{
> +  unsigned int r = x * y;
> +  return !x || ((int) r / (int) x) == (int) y;
> +}
> +
> +int
> +bar (unsigned x, unsigned y)
> +{
> +  return !x || ((int) (x * y) / (int) x) == (int) y;
> +}
> +
> +int
> +baz (unsigned x, unsigned y)
> +{
> +  if (x == 0)
> +return 1;
> +  return ((int) (x * y) / (int) x) == y;
> +}
> +
> +int
> +qux (unsigned x, unsigned y, unsigned *z)
> +{
> +  unsigned int r = x * y;
> +  *z = r;
> +  return !x || ((int) r / (int) x) == (int) y;
> +}
> +
> +int
> +corge (unsigned x, unsigned y, unsigned *z)
> +{
> +  unsigned int r = x * y;
> +  *z = r;
> +  return !x || ((int) r / (int) x) == y;
> +}
> 
>Jakub
>
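
As a rough illustration of the equivalence the quoted patch relies on, here
is a minimal sketch that puts the hand-written division check next to the
overflow builtin.  The function names are hypothetical, a 32-bit int is
assumed, and the inputs must avoid the INT_MIN / -1 division, which is
undefined in the hand-written form.

```c
#include <stdbool.h>

/* Hand-written check in the style of the quoted testcase: returns true
   when the signed product of X and Y does not overflow.  The unsigned
   multiplication wraps, so it is well defined on its own.  */
bool
mul_fits_by_division (unsigned x, unsigned y)
{
  unsigned r = x * y;
  return !x || ((int) r / (int) x) == (int) y;
}

/* The same question asked through the GCC/Clang builtin, which is what
   the pattern recognition is meant to produce internally.  */
bool
mul_fits_by_builtin (unsigned x, unsigned y)
{
  int res;
  return !__builtin_mul_overflow ((int) x, (int) y, &res);
}
```

Both functions compare the division result in the signed type, which is the
form the patch newly recognizes.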

Re: [PATCH] tree-ssa-math-opts: Pattern recognize hand written __builtin_mul_overflow_p with same unsigned types even when target just has highpart umul [PR101856]

2023-05-19 Thread Richard Biener via Gcc-patches




> On 19.05.2023 at 10:00, Jakub Jelinek wrote:
> 
> Hi!
> 
> As can be seen on the following testcase, we pattern recognize it on
> i?86/x86_64 as return __builtin_mul_overflow_p (x, y, 0UL) and avoid
> that way the extra division, but don't do it e.g. on aarch64 or ppc64le,
> even when return __builtin_mul_overflow_p (x, y, 0UL); actually produces
> there better code.  The reason for testing the presence of the optab
> handler is to make sure the generated code for it is short to ensure
> we don't actually pessimize code instead of optimizing it.
> But, we have one case that the internal-fn.cc .MUL_OVERFLOW expansion
> handles nicely, and that is when arguments/result is the same mode
> TYPE_UNSIGNED type, we only use IMAGPART_EXPR of it (i.e.
> __builtin_mul_overflow_p rather than __builtin_mul_overflow) and
> umul_highpart_optab supports the particular mode, in that case
> we emit comparison of the highpart umul result against zero.
> 
> So, the following patch matches what we do in internal-fn.cc and
> also pattern matches __builtin_mul_overflow_p if
> 1) we only need the flag whether it overflowed (i.e. !use_seen)
> 2) it is unsigned (i.e. !cast_stmt)
> 3) umul_highpart is supported for the mode
> 
> Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
> powerpc64le-linux, ok for trunk?

Ok.

Richard 

> 2023-05-19  Jakub Jelinek  
> 
>PR tree-optimization/101856
>* tree-ssa-math-opts.cc (match_arith_overflow): Pattern detect
>unsigned __builtin_mul_overflow_p even when umulv4_optab doesn't
>support it but umul_highpart_optab does.
> 
>* gcc.dg/tree-ssa/pr101856.c: New test.
> 
> --- gcc/tree-ssa-math-opts.cc.jj2023-05-17 20:57:59.537914382 +0200
> +++ gcc/tree-ssa-math-opts.cc2023-05-18 12:04:09.332336899 +0200
> @@ -4074,7 +4074,10 @@ match_arith_overflow (gimple_stmt_iterat
>TYPE_MODE (type)) == CODE_FOR_nothing)
>   || (code == MULT_EXPR
>  && optab_handler (cast_stmt ? mulv4_optab : umulv4_optab,
> -TYPE_MODE (type)) == CODE_FOR_nothing))
> +TYPE_MODE (type)) == CODE_FOR_nothing
> +  && (use_seen
> +  || cast_stmt
> +  || !can_mult_highpart_p (TYPE_MODE (type), true))))
> {
>   if (code != PLUS_EXPR)
>return false;
> --- gcc/testsuite/gcc.dg/tree-ssa/pr101856.c.jj2023-05-18 
> 11:57:17.681206745 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr101856.c2023-05-18 11:56:51.662577752 
> +0200
> @@ -0,0 +1,11 @@
> +/* PR tree-optimization/101856 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump " .MUL_OVERFLOW " "optimized" { target 
> i?86-*-* x86_64-*-* aarch64*-*-* powerpc64le-*-* } } } */
> +
> +int
> +foo (unsigned long x, unsigned long y)
> +{
> +  unsigned long z = x * y;
> +  return z / y != x;
> +}
> 
>Jakub
>
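
To make the highpart argument from the quoted description concrete, here is
a small sketch using 32-bit operands, so that the full product fits in a
64-bit integer.  The function names are hypothetical; the division form
assumes a nonzero y, as the testcase does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hand-written check, as in the quoted testcase (Y must be nonzero).  */
bool
umul_overflows_by_division (uint32_t x, uint32_t y)
{
  uint32_t z = x * y;
  return z / y != x;
}

/* What a highpart multiply provides: the unsigned product overflows
   exactly when the upper half of the full-width product is nonzero.  */
bool
umul_overflows_by_highpart (uint32_t x, uint32_t y)
{
  uint64_t full = (uint64_t) x * y;
  return (uint32_t) (full >> 32) != 0;
}

/* The builtin form; only the overflow flag is used, the result is
   discarded.  */
bool
umul_overflows_by_builtin (uint32_t x, uint32_t y)
{
  uint32_t res;
  return __builtin_mul_overflow (x, y, &res);
}
```

The highpart version avoids the division entirely, which is why matching the
pattern can still pay off on targets that only provide umul_highpart.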

Re: [PATCH] RISC-V: Add mode switching target hook to insert rounding mode config for fixed-point instructions

2023-05-19 Thread Andreas Schwab

In function 'int optimize_mode_switching()',
inlined from 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)' at 
../../gcc/mode-switching.cc:909:31:
../../gcc/mode-switching.cc:608:29: error: 'bb_info$' may be used uninitialized 
[-Werror=maybe-uninitialized]
  608 | add_seginfo (info + bb->index, ptr);
  | ^~~
../../gcc/mode-switching.cc: In member function 'virtual unsigned int 
{anonymous}::pass_mode_switching::execute(function*)':
../../gcc/mode-switching.cc:503:19: note: 'bb_info$' was declared here
  503 |   struct bb_info *bb_info[N_ENTITIES];
  |   ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1174: mode-switching.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread juzhe.zh...@rivai.ai

>> I don't think this is a property of decrementing IVs.  IIUC it's really
>> a property of rgl->factor == 1 && factor == 1, where factor would need
>> to be passed in by the caller.  Because of that, it should probably be
>> a separate patch.
Is it right that I should just post this part of the code as a separate patch and then merge it?

>> That is, current LOAD_LEN targets have two properties (IIRC):
>> (1) all vectors used in a given piece of vector code have the same byte size
>> (2) lengths are measured in bytes rather than elements
>> For all cases, including SVE, the number of controls needed for a scalar
>> statement is equal to the number of vectors needed for that scalar
>> statement.
>> Because of (1), on current LOAD_LEN targets, the number of controls
>> needed for a scalar statement is also proportional to the total number
>> of bytes occupied by the vectors generated for that scalar statement.
>> And because of (2), the total number of bytes is the only thing that
>> matters, so all users of a particular control can use the same control
>> value.
>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
>> control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
>> means 16 elements.  V16QI's nscalars_per_iter would therefore be double
>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
>> even out.
>> The code structurally supports targets that count in elements rather
>> than bytes, so that factor==1 for all element types.  See the
>> "rgl->factor == 1 && factor == 1" case in:
>>   if (rgl->max_nscalars_per_iter < nscalars_per_iter)
>>     {
>>       /* For now, we only support cases in which all loads and stores fall back
>>          to VnQI or none do.  */
>>       gcc_assert (!rgl->max_nscalars_per_iter
>>                   || (rgl->factor == 1 && factor == 1)
>>                   || (rgl->max_nscalars_per_iter * rgl->factor
>>                       == nscalars_per_iter * factor));
>>       rgl->max_nscalars_per_iter = nscalars_per_iter;
>>       rgl->type = vectype;
>>       rgl->factor = factor;
>>     }
>> But it hasn't been tested, since no current target uses it.
>> I think the above part of the patch shows that the current "factor is
>> always 1" path is in fact broken, and the patch is a correctness fix on
>> targets that measure in elements rather than bytes.
>> So I think the above part of the patch should go in ahead of the IV changes.
>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
Length control measured in bytes rather than elements is not appropriate for
RVV.  Do you mean that I can't support RVV auto-vectorization in the upstream
GCC middle-end and can only support it downstream?  Is that right?


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-19 18:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).
 
juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
>  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>/* Populate the rgroup's len array, if this is the first time we've
>   used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi,

Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread Richard Sandiford via Gcc-patches

Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).

juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
>  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>/* Populate the rgroup's len array, if this is the first time we've
>   used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +  return loop_len;
> +}

I don't think this is a property of decrementing IVs.  IIUC it's really
a property of rgl->factor == 1 && factor == 1, where factor would need
to be passed in by the caller.  Because of that, it should probably be
a separate patch.

That is, current LOAD_LEN targets have two properties (IIRC):

(1) all vectors used in a given piece of vector code have the same byte size
(2) lengths are measured in bytes rather than elements

For all cases, including SVE, the number of controls needed for a scalar
statement is equal to the number of vectors needed for that scalar
statement.

Because of (1), on current LOAD_LEN targets, the number of controls
needed for a scalar statement is also proportional to the total number
of bytes occupied by the vectors generated for that scalar statement.
And because of (2), the total number of bytes is the only thing that
matters, so all users of a particular control can use the same control
value.

E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
means 16 elements.  V16QI's nscalars_per_iter would therefore be double
V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
even out.
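
The arithmetic can be written out as a small sketch.  The struct and function
names below are hypothetical, not GCC's rgroup_controls, and the model is
simplified: it uses the total element count directly, whereas GCC's
nscalars_per_iter is that count divided by the vectorization factor (the
ratio between two users is the same).

```c
#include <stdbool.h>

/* Hypothetical model of one user of a length control.  */
struct len_user
{
  unsigned nvectors; /* vectors per scalar statement, e.g. 2 */
  unsigned nunits;   /* elements per vector, e.g. 16 for V16QI */
  unsigned factor;   /* bytes per element when lengths count bytes,
                        1 when lengths count elements */
};

/* Total number of elements covered by the user's vectors.  */
unsigned
total_elements (const struct len_user *u)
{
  return u->nvectors * u->nunits;
}

/* Mirrors the last arm of the gcc_assert: two users can share one
   control value without adjustment when elements * factor matches.  */
bool
can_share_control (const struct len_user *a, const struct len_user *b)
{
  return total_elements (a) * a->factor == total_elements (b) * b->factor;
}
```

With byte counting, 2xV16QI gives 32 * 1 and 2xV8HI gives 16 * 2, so the
check passes.  With factor 1 for both, the products are 32 and 16, so the
shared value would need adjusting.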

The code structurally supports targets that count in elements rather
than bytes, so that factor==1 for all element types.  See the
"rgl->factor == 1 && factor == 1" case in:

  if (rgl->max_nscalars_per_iter < nscalars_per_iter)
    {
      /* For now, we only support cases in which all loads and stores fall back
         to VnQI or none do.  */
      gcc_assert (!rgl->max_nscalars_per_iter
                  || (rgl->factor == 1 && factor == 1)
                  || (rgl->max_nscalars_per_iter * rgl->factor
                      == nscalars_per_iter * factor));
      rgl->max_nscalars_per_iter = nscalars_per_iter;
      rgl->type = vectype;
      rgl->factor = factor;
    }

But it hasn't been tested, since no current target uses it.

I think the above part of the patch shows that the current "factor is
always 1" path is in fact broken, and the patch is a correctness fix on
targets that measure in elements rather than bytes.

So I think the above part of the patch should go in ahead of the IV changes.
But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
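
For reference, the adjustment made by the quoted hunk amounts to the
following.  This is only a sketch of the arithmetic with a hypothetical
helper, not a GCC function; it mirrors the hunk's TYPE_VECTOR_SUBPARTS-based
computation rather than the factor-based test suggested above, and it assumes
that NUNITS_CTRL is a nonzero multiple of NUNITS_USER and that the length
divides evenly.

```c
/* On a target that counts lengths in elements, rescale a length LEN
   that was recorded for a vector type with NUNITS_CTRL elements so
   that it can be used with a vector type that has NUNITS_USER
   elements, each NUNITS_CTRL / NUNITS_USER times wider.  */
unsigned
scale_len_for_user (unsigned len, unsigned nunits_ctrl, unsigned nunits_user)
{
  unsigned n = nunits_ctrl / nunits_user;
  return len / n;
}
```

For example, a length of 16 recorded for a 16-element vector becomes 4 for a
4-element vector of four-times-wider elements.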

Thanks,
Richard

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-19 Thread Richard Sandiford via Gcc-patches

Tejas Belagod  writes:
> Am I correct to understand that we still need to check for the case when
> there's a repeating non-zero elements in the case of NELTS_PER_PATTERN == 2?
> eg. { 0, 0, 1, 1, 1, 1,} which should be encoded as {0, 0, 1, 1} with
> NPATTERNS = 2 ?

Yeah, that's right.  The current handling for NPATTERNS==2 looked
good to me.  It was the other two cases that I was worried about.

Thanks,
Richard

[PATCH v2] tree-ssa-sink: Improve code sinking pass

2023-05-19 Thread Ajit Agarwal via Gcc-patches

Hello All:

This patch improves the code sinking pass so that it sinks statements before
calls, which reduces register pressure.
Review comments have been incorporated.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d +e + f;
      bar();
      j = l;
    }
}

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Code sinking currently sinks statements into blocks after a call.  This
increases register pressure for callee-saved registers.  This patch improves
code sinking by placing statements before the call, in the use blocks or the
immediate dominator of the use blocks.

2023-05-18  Ajit Kumar Agarwal  

gcc/ChangeLog:

* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(block_call_p): New function.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  15 ++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  19 +++
 gcc/tree-ssa-sink.cc| 160 ++--
 3 files changed, 183 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..69fa6d32e7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-optimized -fdump-tree-sink-stats" } */
+
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..b34959c8a4d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..091aa90d289 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -171,6 +171,71 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return TRUE if immediate uses of the defs in
+   STMT occur in the same block as STMT, FALSE otherwise.  */
+
+bool
+def_use_same_block (gimple *stmt)
+{
+  use_operand_p use;
+  def_operand_p def;
+  imm_use_iterator imm_iter;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
+{
+  FOR_EACH_IMM_USE_FAST (use, imm_iter, DEF_FROM_PTR (def))
+   {
+ if (is_gimple_debug (USE_STMT (use)))
+   continue;
+
+ if (use && (gimple_bb (USE_STMT (use)) == gimple_bb (stmt)))
+   return true;
+   }
+ }
+  return false;
+}
+
+/* Return TRUE if the block has only one call statement, FALSE otherwise. */
+
+bool
+block_call_p (basic_block bb)
+{
+  int i = 0;
+  bool is_call = false;
+  gimple_stmt_iterator gsi = gsi_last_bb (bb);
+  gimple *last_stmt = gsi_stmt (gsi);
+
+  if (last_stmt && gimple_code (last_stmt) == GIMPLE_COND)
+{
+  if (!gsi_end_p (gsi))
+   gsi_prev (&gsi);
+
+   for (; !gsi_end_p (gsi);)
+{
+  gimple *stmt = gsi_stmt (gsi);
+
+  /* We have already seen a call.  */
+  if (is_call)
+return false;
+
+  if (is_gimple_call (stmt))
+is_call = true;
+  else
+return false;
+
+  if (!gsi_end_p (gsi))
+gsi_prev (&gsi);
+
+   ++i;
+   }
+ }
+  if (is_call && i == 1)
+return true;
+
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
statements.
@@ -190,7 +255,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool 
*debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-19 Thread Tejas Belagod via Gcc-patches




From: Richard Sandiford 
Date: Tuesday, May 16, 2023 at 5:36 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]
Tejas Belagod  writes:
>>> +   {
>>> + b = build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val,
>>> + bitsize_int (step * BITS_PER_UNIT),
>>> + bitsize_int ((16 - step) * BITS_PER_UNIT));
>>> +
>>> + return gimple_build_assign (f.lhs, b);
>>> +   }
>>> +
>>> + /* If VECTOR_CST_NELTS_PER_PATTERN (pred) == 2 and every multiple of
>>> +'step_1' in
>>> +[VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1]
>>> +is zero, then we can treat the vector as VECTOR_CST_NPATTERNS
>>> +elements followed by all inactive elements.  */
>>> + if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) == 2)
>>
>> Following on from the above, maybe use:
>>
>>   !VECTOR_CST_NELTS (pred).is_constant ()
>>
>> instead of !const_vl here.
>>
>> I have a horrible suspicion that I'm contradicting our earlier discussion
>> here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != 2.
>>
>>
>>
>> IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the encoded
>> constant. This means the repetition occurs if NELTS_PER_PATTERN == 2, IOW the
>> base1 repeats in the encoding. This loop is checking this condition and looks
>> for a 1 in the repeated part of the NELTS_PER_PATTERN == 2 in a VL vector.
>> Please correct me if I’m misunderstanding here.
>
> NELTS_PER_PATTERN == 1 is also a repeating pattern: it means that the
> entire sequence is repeated to fill a vector.  So if an NELTS_PER_PATTERN
> == 1 constant has elements {0, 1, 0, 0}, the vector is:
>
>{0, 1, 0, 0, 0, 1, 0, 0, ...}
>
>
> Wouldn’t the vect_all_same(pred, step) cover this case for a given value of
> step?
>
>
> and the optimisation can't handle that.  NELTS_PER_PATTERN == 3 isn't
> likely to occur for predicates, but in principle it has the same problem.
>
>
>
> OK, I had misunderstood the encoding to always make base1 the repeating value
> by adjusting the NPATTERNS accordingly – I didn’t know you could also have the
> base2 value and beyond encoding the repeat value. In this case could I just
> remove NELTS_PER_PATTERN == 2 condition and the enclosed loop would check for a
> repeating ‘1’ in the repeated part of the encoded pattern?

But for NELTS_PER_PATTERN==1, the whole encoded sequence repeats.
So you would have to start the check at element 0 rather than
NPATTERNS.  And then (for NELTS_PER_PATTERN==1) the loop would reject
any constant that has a nonzero element.  But all valid zero-vector
cases have been handled by this point, so the effect wouldn't be useful.

It should never be the case that all elements from NPATTERNS
onwards are zero for NELTS_PER_PATTERN==3; that case should be
canonicalised to NELTS_PER_PATTERN==2 instead.

So in practice it's simpler and more obviously correct to punt
when NELTS_PER_PATTERN != 2.

Thanks for the clarification.
I understand all the points about punting when NELTS_PER_PATTERN != 2, except one.

Am I correct to understand that we still need to check for the case where
there are repeating non-zero elements when NELTS_PER_PATTERN == 2?
E.g. { 0, 0, 1, 1, 1, 1,}, which should be encoded as {0, 0, 1, 1} with
NPATTERNS = 2?

Thanks,
Tejas.


Thanks,
Richard
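
To make the encoding discussed above concrete, here is a minimal sketch in plain C of how an encoded constant expands into vector elements.  It is not GCC code: encoded_elt and its parameters are hypothetical names, and the elements are plain longs rather than trees.  It assumes the usual layout of NPATTERNS interleaved patterns with NELTS_PER_PATTERN encoded elements each.

```c
#include <stddef.h>

/* Hypothetical helper, not GCC code: return element I of the vector
   described by ENCODED, which holds NPATTERNS interleaved patterns of
   NELTS_PER_PATTERN (1, 2 or 3) encoded elements each.  */
long
encoded_elt (const long *encoded, size_t npatterns,
             size_t nelts_per_pattern, size_t i)
{
  size_t pattern = i % npatterns;   /* which interleaved pattern */
  size_t index = i / npatterns;     /* position within that pattern */

  /* Elements that are present in the encoding itself.  */
  if (index < nelts_per_pattern)
    return encoded[index * npatterns + pattern];

  /* One element per pattern: the whole encoded sequence repeats.  */
  if (nelts_per_pattern == 1)
    return encoded[pattern];

  /* Two elements per pattern: the second element repeats forever.  */
  long second = encoded[npatterns + pattern];
  if (nelts_per_pattern == 2)
    return second;

  /* Three elements per pattern: a linear series that continues with
     the step between the second and third encoded elements.  */
  long third = encoded[2 * npatterns + pattern];
  return second + (long) (index - 1) * (third - second);
}
```

With the encoded sequence {0, 1, 0, 0}, NPATTERNS = 4 and NELTS_PER_PATTERN = 1, this yields {0, 1, 0, 0, 0, 1, 0, 0, ...}, whereas {0, 0, 1, 1} with NPATTERNS = 2 and NELTS_PER_PATTERN = 2 yields {0, 0, 1, 1, 1, 1, ...}.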

Re: [patch,avr] Fix PR109650 wrong code

2023-05-19 Thread Georg-Johann Lay


...Ok, and now with the patch attached...

Here is a revised version of the patch.  The difference to the
previous one is that it adds some combine patterns for *cbranch
insns that were lost in the PR92729 transition.  The post-reload
parts of the patterns were still there.  The new patterns are
slightly more general in that they also handle fixed-point modes.

Apart from that, the patch behaves the same:

On 15.05.23 at 20:05, Georg-Johann Lay wrote:

This patch fixes a wrong-code bug in the wake of PR92729, the transition
that turned the AVR backend from cc0 to CCmode.  In cc0, the insn that
uses cc0, like a conditional branch, always follows the cc0 setter, which
is no longer the case with CCmode, where set and use of REG_CC might be in
different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg entirely.

It is replaced by a new, AVR specific mini-pass that runs prior to
split2. Canonicalization of comparisons away from the "difficult"
codes GT[U] and LE[U] is now mostly performed by implementing
TARGET_CANONICALIZE_COMPARISON.
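
As a rough illustration of what canonicalizing away from GT and LE means, here is a sketch in plain C.  It is not the code from the patch: the enum, the struct and the function name are hypothetical, and it only covers signed comparisons against an int constant.

```c
#include <limits.h>

/* Hypothetical comparison codes, loosely modelled on GT/GE/LE/LT.  */
enum cmp_code { CMP_GT, CMP_GE, CMP_LE, CMP_LT };

struct comparison
{
  enum cmp_code code;
  int rhs;   /* constant right-hand operand */
};

/* Rewrite "x > C" as "x >= C + 1" and "x <= C" as "x < C + 1" when
   C + 1 is representable.  Return 1 if CMP was rewritten, else 0.  */
int
canonicalize_comparison (struct comparison *cmp)
{
  if (cmp->rhs == INT_MAX)
    return 0;   /* C + 1 would overflow; leave the comparison alone.  */

  switch (cmp->code)
    {
    case CMP_GT:
      cmp->code = CMP_GE;
      break;
    case CMP_LE:
      cmp->code = CMP_LT;
      break;
    default:
      return 0;   /* Already one of the "easy" codes.  */
    }

  cmp->rhs += 1;
  return 1;
}
```

For example, "x > 5" becomes "x >= 6", while "x > INT_MAX" is left unchanged.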

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as
needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as
needed.

* Conditional branches no longer clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in
use. This also needs "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

Finally, it fixes some of the many indentation glitches left over from
PR92729.

Ok?

I'd also backport this one because all of v12+ is affected by the wrong 
code.


Johann

--

gcc/
PR target/109650
PR target/92729

* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]
[ADJUST_LEN_CMP_SEXT]: Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.

* avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New protos.

* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0, *cbranchhi.zero-extend.1)
(*swapped_tst, *add.for.eqne.): New insns.
(*cbranch4): Rename to cbranch4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_peephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst.
Rework signtest-and-branch peepholes for *sbrx_branch.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst, *reversed_tst)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.

* config/avr/avr-dimode.md (cbranch4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.

gcc/testsuite/
PR target/109650
* config/avr/torture/pr109650-1.c: New test.
* config/avr/torture/pr109650-2.c: New test.

diff --git a/gcc/config/avr/avr-dimode.md b/gcc/config/avr/avr-dimode.md
index c0bb04ff9e0..91f0d395761 100644
--- a/gcc/config/avr/avr-dimode.md
+++ b/gcc/config/avr/avr-dimode.md
@@ -455,12 +455,18 @@ (define_expand "conditional_jump"
 (define_expand "cbranch4"
   [(set (pc)
 (if_then_else (match_operator 0 "ordered_comparison_operator"
-[(match_operand:ALL8 1 "register_operand"  "")
- (match_operand:ALL8 2 "nonmemory_operand" "")])
- (label_ref (match_operand 3 "" ""))
- (pc)))]
+[(match_operand:ALL8 1 "register_operand")
+ (match_operand:ALL8 2 "nonmemory_operand")])
+  (label_ref (match_operand 3))
+  (pc)))]
   "avr_have_dimode"
{
+int icode = (int) GET_CODE (operands[0]);
+
+targetm.canonicalize_comparison (&icode, &operands[1], &operands[2], false);
+

Re: [RFC V2] RISC-V : Support rv64 ilp32

2023-05-19 Thread Liao Shihua


Thanks for your advice, Kito.

On 2023/5/19 15:35, Kito Cheng wrote:

I am concerned that we didn't define POINTERS_EXTEND_UNSIGNED here, and
I am also concerned about the code model.  I know Guo-Ren's current
implementation relies on an MMU trick, but I am not sure whether it is
also applicable to embedded applications.




OK, we will verify this in the future.




-  /* We do not yet support ILP32 on RV64.  */
-  if (BITS_PER_WORD != POINTER_SIZE)
-error ("ABI requires %<-march=rv%d%>", POINTER_SIZE);


It seems to also make -march=rv32g -mabi=lp64 become acceptable?


Oh, I was negligent and will make improvements in the next patch.

Best Regards
Liao Shihua

[committed] libgomp: Fix up -static -fopenmp linking [PR109904]

2023-05-19 Thread Jakub Jelinek via Gcc-patches

Hi!

When an OpenMP program with target regions is linked statically,
it fails to link on various arches (doesn't when using recent glibc
because it has libdl stuff in libc), because libgomp.a(target.o) uses
dlopen/dlsym/dlclose, but we aren't linking against -ldl (unless
user asked for that).  We already have libgomp.spec so that we
can supply extra libraries to link against in the -static case,
this patch adds -ldl to that if plugins are supported.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2023-05-19  Jakub Jelinek  

PR libgomp/109904
* configure.ac (link_gomp): Include also $DL_LIBS.
* configure: Regenerated.

--- libgomp/configure.ac.jj 2023-05-15 19:12:35.138624638 +0200
+++ libgomp/configure.ac 2023-05-18 20:41:58.512501769 +0200
@@ -398,9 +398,9 @@ fi
 # which will force linkage against -lpthread (or equivalent for the system).
 # That's not 100% ideal, but about the best we can do easily.
 if test $enable_shared = yes; then
-  link_gomp="-lgomp %{static: $LIBS}"
+  link_gomp="-lgomp %{static: $LIBS${DL_LIBS:+ $DL_LIBS}}"
 else
-  link_gomp="-lgomp $LIBS"
+  link_gomp="-lgomp $LIBS${DL_LIBS:+ $DL_LIBS}"
 fi
 AC_SUBST(link_gomp)
 
--- libgomp/configure.jj 2023-05-15 19:12:35.138624638 +0200
+++ libgomp/configure   2023-05-18 20:42:12.703299052 +0200
@@ -16788,9 +16788,9 @@ fi
 # which will force linkage against -lpthread (or equivalent for the system).
 # That's not 100% ideal, but about the best we can do easily.
 if test $enable_shared = yes; then
-  link_gomp="-lgomp %{static: $LIBS}"
+  link_gomp="-lgomp %{static: $LIBS${DL_LIBS:+ $DL_LIBS}}"
 else
-  link_gomp="-lgomp $LIBS"
+  link_gomp="-lgomp $LIBS${DL_LIBS:+ $DL_LIBS}"
 fi
 
 

Jakub

Re: [patch,avr] Fix PR109650 wrong code

2023-05-19 Thread Georg-Johann Lay


Here is a revised version of the patch.  The difference to the
previous one is that it adds some combine patterns for *cbranch
insns that were lost in the PR92729 transition.  The post-reload
parts of the patterns were still there.  The new patterns are
slightly more general in that they also handle fixed-point modes.

Apart from that, the patch behaves the same:

On 15.05.23 at 20:05, Georg-Johann Lay wrote:

This patch fixes a wrong-code bug in the wake of PR92729, the transition
that turned the AVR backend from cc0 to CCmode.  In cc0, the insn that
uses cc0, like a conditional branch, always follows the cc0 setter, which
is no longer the case with CCmode, where set and use of REG_CC might be in
different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg entirely.

It is replaced by a new, AVR specific mini-pass that runs prior to
split2. Canonicalization of comparisons away from the "difficult"
codes GT[U] and LE[U] is now mostly performed by implementing
TARGET_CANONICALIZE_COMPARISON.

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as
needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as
needed.

* Conditional branches no longer clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in
use. This also needs "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

Finally, it fixes some of the many indentation glitches left over from
PR92729.

Ok?

I'd also backport this one because all of v12+ is affected by the wrong 
code.


Johann

--

gcc/
PR target/109650
PR target/92729

* config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
* config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
(avr_pass_data_ifelse): New pass_data for it.
(make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
(avr_canonicalize_comparison, avr_out_plus_set_ZN)
(avr_out_cmp_ext): New functions.
(compare_condition): Make sure REG_CC dies in the branch insn.
(avr_rtx_costs_1): Add computation of cbranch costs.
(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]
[ADJUST_LEN_CMP_SEXT]: Handle them.
(TARGET_CANONICALIZE_COMPARISON): New define.
(avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
(avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
(TARGET_MACHINE_DEPENDENT_REORG): Remove define.

* avr-protos.h (avr_simplify_comparison_p): Remove proto.
(make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
(avr_out_cmp_zext): New protos.

* config/avr/avr.md (branch, difficult_branch): Don't split insns.
(*cbranchhi.zero-extend.0, *cbranchhi.zero-extend.1)
(*swapped_tst, *add.for.eqne.): New insns.
(*cbranch4): Rename to cbranch4_insn.
(define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
(define_peephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
Add new RTL peepholes for decrement-and-branch and *swapped_tst.
Rework signtest-and-branch peepholes for *sbrx_branch.
(adjust_len) [add_set_ZN, cmp_zext]: New.
(QIPSI): New mode iterator.
(ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
(gelt): New code iterator.
(gelt_eqne): New code attribute.
(rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
(branch_unspec, *negated_tst, *reversed_tst)
(*cmpqi_sign_extend): Remove insns.
(define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.

* config/avr/avr-dimode.md (cbranch4): Canonicalize comparisons.
* config/avr/predicates.md (scratch_or_d_register_operand): New.
* config/avr/constraints.md (Yxx): New constraint.

gcc/testsuite/
PR target/109650
* config/avr/torture/pr109650-1.c: New test.
* config/avr/torture/pr109650-2.c: New test.

[pushed] libstdc++: Move lafstern.org reference to https

2023-05-19 Thread Gerald Pfeifer

Pushed.

Gerald


libstdc++-v3/ChangeLog:

* doc/xml/manual/strings.xml: Move lafstern.org reference to https.
* doc/html/manual/strings.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/strings.html | 2 +-
 libstdc++-v3/doc/xml/manual/strings.xml   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/strings.html 
b/libstdc++-v3/doc/html/manual/strings.html
index 3441119e926..ceb09f97eac 100644
--- a/libstdc++-v3/doc/html/manual/strings.html
+++ b/libstdc++-v3/doc/html/manual/strings.html
@@ -111,7 +111,7 @@
book Exceptional C++ and on his 
website as http://www.gotw.ca/gotw/029.htm; 
target="_top">GotW 29.
See?  Told you it was easy!
  Added June 2000: The May 2000 
issue of C++
- Report contains a fascinating http://lafstern.org/matt/col2_new.pdf; target="_top"> article by
+ Report contains a fascinating https://lafstern.org/matt/col2_new.pdf; target="_top"> article by
  Matt Austern (yes, the Matt 
Austern) on why
  case-insensitive comparisons are not as easy as they seem, and
  why creating a class is the wrong 
way to go
diff --git a/libstdc++-v3/doc/xml/manual/strings.xml 
b/libstdc++-v3/doc/xml/manual/strings.xml
index e9d4c8ce347..b0dab645a2d 100644
--- a/libstdc++-v3/doc/xml/manual/strings.xml
+++ b/libstdc++-v3/doc/xml/manual/strings.xml
@@ -145,7 +145,7 @@
See?  Told you it was easy!

  Added June 2000: The May 2000 issue of C++
- Report contains a fascinating http://www.w3.org/1999/xlink; 
xlink:href="http://lafstern.org/matt/col2_new.pdf;> article by
+ Report contains a fascinating http://www.w3.org/1999/xlink; 
xlink:href="https://lafstern.org/matt/col2_new.pdf;> article by
  Matt Austern (yes, the Matt Austern) on why
  case-insensitive comparisons are not as easy as they seem, and
  why creating a class is the wrong way to go
-- 
2.40.1

Re: [pushed] wwwdocs: onlinedocs/13.1.0: Remove last trace of XHTML

2023-05-19 Thread Jakub Jelinek via Gcc-patches

On Fri, May 19, 2023 at 10:08:28AM +0200, Gerald Pfeifer wrote:
> This is how I actually noticed the situation in gcc-13/buildstat.html
> (and then I mixed the two up).
> 
> Jakub, do you have some old templates somewhere maybe?

Usually I git diff last year's changes and apply that after adjusting the
versions, which indeed has the problem of bringing back old style stuff;
but copying latest files doesn't work in many cases either because it
contains lots of changes that I'd have to undo.
But sure, for the onlinedocs I guess copying latest with adjustments is better.

> diff --git a/htdocs/onlinedocs/13.1.0/index.html 
> b/htdocs/onlinedocs/13.1.0/index.html
> index 7b8c3d38..2abc06ac 100644
> --- a/htdocs/onlinedocs/13.1.0/index.html
> +++ b/htdocs/onlinedocs/13.1.0/index.html
> @@ -4,7 +4,7 @@
>  
>  
>  GCC 13.1 manuals
> -https://gcc.gnu.org/gcc.css; />
> +https://gcc.gnu.org/gcc.css;>
>  
>  
>  
> -- 
> 2.40.1

Jakub

[pushed] Darwin, libgcc : Adjust min version supported for the OS.

2023-05-19 Thread Iain Sandoe via Gcc-patches

Tested across the Darwin range (this patch has been on the WIP branches for
some time) and on x86_64-linux-gnu, for reference.
pushed to trunk, thanks
Iain

--- 8< ---

Tools from later versions of the OS deprecate or fail to support
earlier OS revisions.

Signed-off-by: Iain Sandoe 

libgcc/ChangeLog:

* config.host: Arrange to set min Darwin OS versions from
the configured host version.
* config/darwin10-unwind-find-enc-func.c: Do not use current
headers, but declare the necessary structures locally to the
versions in use for Mac OSX 10.6.
* config/t-darwin: Amend to handle configured min OS
versions.
* config/t-darwin-min-1: New.
* config/t-darwin-min-5: New.
* config/t-darwin-min-8: New.
---
 libgcc/config.host| 18 ++
 libgcc/config/darwin10-unwind-find-enc-func.c | 34 ---
 libgcc/config/t-darwin| 10 +++---
 libgcc/config/t-darwin-min-1  |  3 ++
 libgcc/config/t-darwin-min-5  |  3 ++
 libgcc/config/t-darwin-min-8  |  3 ++
 6 files changed, 63 insertions(+), 8 deletions(-)
 create mode 100644 libgcc/config/t-darwin-min-1
 create mode 100644 libgcc/config/t-darwin-min-5
 create mode 100644 libgcc/config/t-darwin-min-8

diff --git a/libgcc/config.host b/libgcc/config.host
index b9975de9023..9d7212028d0 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -233,6 +233,24 @@ case ${host} in
   ;;
   esac
   tmake_file="$tmake_file t-slibgcc-darwin"
+  # newer toolsets produce warnings when building for unsupported versions.
+  case ${host} in
+*-*-darwin1[89]* | *-*-darwin2* )
+  tmake_file="t-darwin-min-8 $tmake_file"
+  ;;
+*-*-darwin9* | *-*-darwin1[0-7]*)
+  tmake_file="t-darwin-min-5 $tmake_file"
+  ;;
+*-*-darwin[4-8]*)
+  tmake_file="t-darwin-min-1 $tmake_file"
+  ;;
+*)
+  # Fall back to configuring for the oldest system known to work with
+  # all archs and the current sources.
+  tmake_file="t-darwin-min-5 $tmake_file"
+  echo "Warning: libgcc configured to support macOS 10.5" 1>&2
+  ;;
+  esac
   extra_parts="crt3.o libd10-uwfef.a crttms.o crttme.o libemutls_w.a"
   ;;
 *-*-dragonfly*)
diff --git a/libgcc/config/darwin10-unwind-find-enc-func.c 
b/libgcc/config/darwin10-unwind-find-enc-func.c
index 882ec3a2372..b08396c5f1b 100644
--- a/libgcc/config/darwin10-unwind-find-enc-func.c
+++ b/libgcc/config/darwin10-unwind-find-enc-func.c
@@ -1,8 +1,34 @@
-#include "tconfig.h"
-#include "tsystem.h"
-#include "unwind-dw2-fde.h"
 #include "libgcc_tm.h"
 
+/* This shim is special, it needs to be built for Mac OSX 10.6
+   regardless of the current system version.
+   We must also build it to use the unwinder layout that was
+   present for 10.6 (and not update that).
+   So we copy the referenced structures from unwind-dw2-fde.h
+   to avoid pulling in newer system headers and/or changed
+   layouts.  */
+struct dwarf_eh_bases
+{
+  void *tbase;
+  void *dbase;
+  void *func;
+};
+
+typedef  int  sword __attribute__ ((mode (SI)));
+typedef unsigned int  uword __attribute__ ((mode (SI)));
+
+/* The first few fields of an FDE.  */
+struct dwarf_fde
+{
+  uword length;
+  sword CIE_delta;
+  unsigned char pc_begin[];
+} __attribute__ ((packed, aligned (__alignof__ (void *))));
+
+typedef struct dwarf_fde fde;
+
+extern const fde * _Unwind_Find_FDE (void *, struct dwarf_eh_bases *);
+
 void *
 _darwin10_Unwind_FindEnclosingFunction (void *pc)
 {
@@ -10,5 +36,5 @@ _darwin10_Unwind_FindEnclosingFunction (void *pc)
   const struct dwarf_fde *fde = _Unwind_Find_FDE (pc-1, &bases);
   if (fde)
 return bases.func;
-  return NULL;
+  return (void *) 0;
 }
diff --git a/libgcc/config/t-darwin b/libgcc/config/t-darwin
index 299d26c2c96..a3bb70c6a0a 100644
--- a/libgcc/config/t-darwin
+++ b/libgcc/config/t-darwin
@@ -1,15 +1,15 @@
 # Set this as a minimum (unless overriden by arch t-files) since it's a
 # reasonable lowest common denominator that works for all our archs.
-HOST_LIBGCC2_CFLAGS += -mmacosx-version-min=10.4
+HOST_LIBGCC2_CFLAGS += $(DARWIN_MIN_LIB_VERSION)
 
 crt3.o: $(srcdir)/config/darwin-crt3.c
-   $(crt_compile) -mmacosx-version-min=10.4 -c $<
+   $(crt_compile) $(DARWIN_MIN_CRT_VERSION) -c $<
 
 crttms.o: $(srcdir)/config/darwin-crt-tm.c
-   $(crt_compile) -mmacosx-version-min=10.4 -DSTART -c $<
+   $(crt_compile) $(DARWIN_MIN_CRT_VERSION) -DSTART -c $<
 
 crttme.o: $(srcdir)/config/darwin-crt-tm.c
-   $(crt_compile) -mmacosx-version-min=10.4 -DEND -c $<
+   $(crt_compile) $(DARWIN_MIN_CRT_VERSION) -DEND -c $<
 
 # Make emutls weak so that we can deal with -static-libgcc, override the
 # hidden visibility when this is present in libgcc_eh.
@@ -25,6 +25,8 @@ libemutls_w.a: emutls_s.o
$(RANLIB_FOR_TARGET) $@
 
 # Patch to __Unwind_Find_Enclosing_Function for Darwin10.
+# This needs to be built for

[pushed] wwwdocs: onlinedocs/13.1.0: Remove last trace of XHTML

2023-05-19 Thread Gerald Pfeifer

This is how I actually noticed the situation in gcc-13/buildstat.html
(and then I mixed the two up).

Jakub, do you have some old templates somewhere maybe?

Gerald

Pushed:
---
 htdocs/onlinedocs/13.1.0/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/onlinedocs/13.1.0/index.html 
b/htdocs/onlinedocs/13.1.0/index.html
index 7b8c3d38..2abc06ac 100644
--- a/htdocs/onlinedocs/13.1.0/index.html
+++ b/htdocs/onlinedocs/13.1.0/index.html
@@ -4,7 +4,7 @@
 
 
 GCC 13.1 manuals
-https://gcc.gnu.org/gcc.css; />
+https://gcc.gnu.org/gcc.css;>
 
 
 
-- 
2.40.1

[PATCH v2] MIPS16: Implement `code_readable` function attribute.

2023-05-19 Thread Jie Mei

From: Simon Dardis 

Support for __attribute__ ((code_readable)).  It takes at most one argument,
which must be "yes", "no" or "pcrel", and changes the code readability setting
for just that function.  If no argument is supplied, the setting is "yes".
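
The argument rules described above can be sketched in plain C as follows.  This is only an illustration under assumptions, not the handler from the patch: the enum and parse_code_readable are hypothetical names, and a NULL pointer stands in for "no argument supplied".

```c
#include <string.h>

/* Hypothetical result type for this sketch.  */
enum readable_setting
{
  READABLE_NO,
  READABLE_PCREL,
  READABLE_YES,
  READABLE_INVALID
};

/* Map the optional attribute argument ARG to a setting.  A missing
   argument (NULL) means "yes"; any string other than "no", "pcrel"
   or "yes" is rejected.  */
enum readable_setting
parse_code_readable (const char *arg)
{
  if (arg == NULL)
    return READABLE_YES;
  if (strcmp (arg, "no") == 0)
    return READABLE_NO;
  if (strcmp (arg, "pcrel") == 0)
    return READABLE_PCREL;
  if (strcmp (arg, "yes") == 0)
    return READABLE_YES;
  return READABLE_INVALID;
}
```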

gcc/ChangeLog:

* config/mips/mips.cc (enum mips_code_readable_setting): New enum.
(mips_handle_code_readable_attr): New static function.
(mips_get_code_readable_attr): New static enum function.
(mips_set_current_function): Set the code_readable mode.
(mips_option_override): Same as above.
* doc/extend.texi: Document code_readable.

gcc/testsuite/ChangeLog:

* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: New test.
* gcc.target/mips/code-readable-attr-3.c: New test.
* gcc.target/mips/code-readable-attr-4.c: New test.
* gcc.target/mips/code-readable-attr-5.c: New test.
---
 gcc/config/mips/mips.cc   | 97 ++-
 gcc/doc/extend.texi   | 17 
 .../gcc.target/mips/code-readable-attr-1.c| 51 ++
 .../gcc.target/mips/code-readable-attr-2.c| 49 ++
 .../gcc.target/mips/code-readable-attr-3.c| 50 ++
 .../gcc.target/mips/code-readable-attr-4.c| 51 ++
 .../gcc.target/mips/code-readable-attr-5.c|  5 +
 7 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-5.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..97f45e67529 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -498,6 +498,9 @@ static int mips_base_target_flags;
 /* The default compression mode.  */
 unsigned int mips_base_compression_flags;
 
+/* The default code readable setting.  */
+enum mips_code_readable_setting mips_base_code_readable;
+
 /* The ambient values of other global variables.  */
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
@@ -602,6 +605,7 @@ const enum reg_class 
mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   ALL_REGS,ALL_REGS,   ALL_REGS,   ALL_REGS
 };
 
+static tree mips_handle_code_readable_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_interrupt_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_use_shadow_register_set_attr (tree *, tree, tree, int,
  bool *);
@@ -623,6 +627,8 @@ static const struct attribute_spec mips_attribute_table[] = 
{
   { "micromips",   0, 0, true,  false, false, false, NULL, NULL },
   { "nomicromips", 0, 0, true,  false, false, false, NULL, NULL },
   { "nocompression", 0, 0, true,  false, false, false, NULL, NULL },
+  { "code_readable", 0, 1, true,  false, false, false,
+mips_handle_code_readable_attr, NULL },
   /* Allow functions to be specified as interrupt handlers */
   { "interrupt",   0, 1, false, true,  true, false, mips_handle_interrupt_attr,
 NULL },
@@ -1310,6 +1316,81 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+
+/* Verify the arguments to a code_readable attribute.  */
+
+static tree
+mips_handle_code_readable_attr (tree *node ATTRIBUTE_UNUSED, tree name,
+   tree args, int flags ATTRIBUTE_UNUSED,
+   bool *no_add_attrs)
+{
+  if (!is_attribute_p ("code_readable", name) || args == NULL)
+return NULL_TREE;
+
+  if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute requires a string argument", name);
+  *no_add_attrs = true;
+}
+  else if (strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "no") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "pcrel") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "yes") != 0)
+{
+  warning (OPT_Wattributes,
+  "argument to %qE attribute is neither no, pcrel nor yes", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Determine the code_readable setting for a function if it has one.  Set
+   *valid to true if we have a properly formed argument and
+   return the result. If there's no argument, return GCC's default.
+   Otherwise, leave valid false and return mips_base_code_readable.  In
+   that case the result should be unused anyway.  */
+
+static enum mips_code_readable_setting
+mips_get_code_readable_attr (tree decl)
+{
+  tree attr;
+
+  if (decl == NULL)
+return mips_base_code_readable;
+
+  attr =

[PATCH] tree-ssa-math-opts: Pattern recognize some further hand written forms of signed __builtin_mul_overflow{,_p} [PR105776]

2023-05-19 Thread Jakub Jelinek via Gcc-patches

Hi!

In the pattern recognition of signed __builtin_mul_overflow{,_p} we
check for result of unsigned division (which follows unsigned
multiplication) being equality compared against one of the multiplication's
arguments (the one not used in the division) and check for the comparison
to be done against same precision cast of the argument (because
division's result is unsigned and the argument is signed).
But as shown in this PR, one can write it equally as comparison done in
the signed type, i.e. compare division's result cast to corresponding
signed type against the argument.

The following patch handles even those cases.

Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
powerpc64le-linux, ok for trunk?

2023-05-19  Jakub Jelinek  

PR tree-optimization/105776
* tree-ssa-math-opts.cc (arith_overflow_check_p): If cast_stmt is
non-NULL, allow division statement to have a cast as single imm use
rather than comparison/condition.
(match_arith_overflow): In that case remove the cast stmt in addition
to the division statement.

* gcc.target/i386/pr105776.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj 2023-05-18 14:57:13.216409685 +0200
+++ gcc/tree-ssa-math-opts.cc   2023-05-18 15:45:34.077177053 +0200
@@ -3802,6 +3802,21 @@ arith_overflow_check_p (gimple *stmt, gi
   use_operand_p use;
   if (!single_imm_use (divlhs, &use, &cur_use_stmt))
return 0;
+  if (cast_stmt && gimple_assign_cast_p (cur_use_stmt))
+   {
+ tree cast_lhs = gimple_assign_lhs (cur_use_stmt);
+ if (INTEGRAL_TYPE_P (TREE_TYPE (cast_lhs))
+ && TYPE_UNSIGNED (TREE_TYPE (cast_lhs))
+ && (TYPE_PRECISION (TREE_TYPE (cast_lhs))
+ == TYPE_PRECISION (TREE_TYPE (divlhs)))
+ && single_imm_use (cast_lhs, &use, &cur_use_stmt))
+   {
+ cast_stmt = NULL;
+ divlhs = cast_lhs;
+   }
+ else
+   return 0;
+   }
 }
   if (gimple_code (cur_use_stmt) == GIMPLE_COND)
 {
@@ -4390,6 +4405,16 @@ match_arith_overflow (gimple_stmt_iterat
  gimple_stmt_iterator gsi2 = gsi_for_stmt (orig_use_stmt);
  maybe_optimize_guarding_check (mul_stmts, use_stmt, orig_use_stmt,
 cfg_changed);
+ use_operand_p use;
+ gimple *cast_stmt;
+ if (single_imm_use (gimple_assign_lhs (orig_use_stmt), &use,
+ &cast_stmt)
+ && gimple_assign_cast_p (cast_stmt))
+   {
+ gimple_stmt_iterator gsi3 = gsi_for_stmt (cast_stmt);
+ gsi_remove (&gsi3, true);
+ release_ssa_name (gimple_assign_lhs (cast_stmt));
+   }
  gsi_remove (&gsi2, true);
  release_ssa_name (gimple_assign_lhs (orig_use_stmt));
}
--- gcc/testsuite/gcc.target/i386/pr105776.c.jj 2023-05-18 15:57:15.570218802 +0200
+++ gcc/testsuite/gcc.target/i386/pr105776.c 2023-05-18 15:56:55.273506918 +0200
@@ -0,0 +1,43 @@
+/* PR tree-optimization/105776 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -masm=att" } */
+/* { dg-final { scan-tree-dump-times " = \.MUL_OVERFLOW " 5 "optimized" } } */
+/* { dg-final { scan-assembler-times "\timull\t" 5 } } */
+/* { dg-final { scan-assembler-times "\tsetno\t" 5 } } */
+
+int
+foo (unsigned x, unsigned y)
+{
+  unsigned int r = x * y;
+  return !x || ((int) r / (int) x) == (int) y;
+}
+
+int
+bar (unsigned x, unsigned y)
+{
+  return !x || ((int) (x * y) / (int) x) == (int) y;
+}
+
+int
+baz (unsigned x, unsigned y)
+{
+  if (x == 0)
+return 1;
+  return ((int) (x * y) / (int) x) == y;
+}
+
+int
+qux (unsigned x, unsigned y, unsigned *z)
+{
+  unsigned int r = x * y;
+  *z = r;
+  return !x || ((int) r / (int) x) == (int) y;
+}
+
+int
+corge (unsigned x, unsigned y, unsigned *z)
+{
+  unsigned int r = x * y;
+  *z = r;
+  return !x || ((int) r / (int) x) == y;
+}

Jakub
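
For readers who want to try the equivalence outside the testsuite, here is a small sketch that puts the hand-written signed check from the testcase next to a builtin-based one.  The function names are made up for this example, and it uses __builtin_mul_overflow (available in GCC and Clang) rather than the _p variant.  It assumes the usual modular unsigned-to-int conversion, and callers must avoid the INT_MIN / -1 case, which is undefined in the hand-written form.

```c
/* Hand-written form, as in foo () from the testcase: multiply as
   unsigned, then divide and compare in the signed type.  Returns 1
   when the signed multiplication does not overflow.  */
int
no_overflow_by_hand (unsigned x, unsigned y)
{
  unsigned r = x * y;
  return !x || ((int) r / (int) x) == (int) y;
}

/* The same question asked through the overflow builtin.  */
int
no_overflow_builtin (unsigned x, unsigned y)
{
  int res;
  return !__builtin_mul_overflow ((int) x, (int) y, &res);
}
```

Both functions agree on inputs such as (3, 4), (65536, 65536) and (0xfffffffe, 5).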

[PATCH] tree-ssa-math-opts: Pattern recognize hand written __builtin_mul_overflow_p with same unsigned types even when target just has highpart umul [PR101856]

2023-05-19 Thread Jakub Jelinek via Gcc-patches

Hi!

As can be seen on the following testcase, we pattern recognize it on
i?86/x86_64 as return __builtin_mul_overflow_p (x, y, 0UL) and thereby avoid
the extra division, but don't do it e.g. on aarch64 or ppc64le,
even when return __builtin_mul_overflow_p (x, y, 0UL); actually produces
better code there.  The reason for testing the presence of the optab
handler is to make sure the generated code for it is short to ensure
we don't actually pessimize code instead of optimizing it.
But, we have one case that the internal-fn.cc .MUL_OVERFLOW expansion
handles nicely, and that is when arguments/result is the same mode
TYPE_UNSIGNED type, we only use IMAGPART_EXPR of it (i.e.
__builtin_mul_overflow_p rather than __builtin_mul_overflow) and
umul_highpart_optab supports the particular mode, in that case
we emit comparison of the highpart umul result against zero.

So, the following patch matches what we do in internal-fn.cc and
also pattern matches __builtin_mul_overflow_p if
1) we only need the flag whether it overflowed (i.e. !use_seen)
2) it is unsigned (i.e. !cast_stmt)
3) umul_highpart is supported for the mode

Bootstrapped/regtested on x86_64-linux, i686-linux, aarch64-linux and
powerpc64le-linux, ok for trunk?

2023-05-19  Jakub Jelinek  

PR tree-optimization/101856
* tree-ssa-math-opts.cc (match_arith_overflow): Pattern detect
unsigned __builtin_mul_overflow_p even when umulv4_optab doesn't
support it but umul_highpart_optab does.

* gcc.dg/tree-ssa/pr101856.c: New test.

--- gcc/tree-ssa-math-opts.cc.jj   2023-05-17 20:57:59.537914382 +0200
+++ gcc/tree-ssa-math-opts.cc   2023-05-18 12:04:09.332336899 +0200
@@ -4074,7 +4074,10 @@ match_arith_overflow (gimple_stmt_iterat
TYPE_MODE (type)) == CODE_FOR_nothing)
   || (code == MULT_EXPR
  && optab_handler (cast_stmt ? mulv4_optab : umulv4_optab,
-   TYPE_MODE (type)) == CODE_FOR_nothing))
+   TYPE_MODE (type)) == CODE_FOR_nothing
+ && (use_seen
+ || cast_stmt
+ || !can_mult_highpart_p (TYPE_MODE (type), true))))
 {
   if (code != PLUS_EXPR)
return false;
--- gcc/testsuite/gcc.dg/tree-ssa/pr101856.c.jj 2023-05-18 11:57:17.681206745 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/pr101856.c    2023-05-18 11:56:51.662577752 +0200
@@ -0,0 +1,11 @@
+/* PR tree-optimization/101856 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump " .MUL_OVERFLOW " "optimized" { target i?86-*-* x86_64-*-* aarch64*-*-* powerpc64le-*-* } } } */
+
+int
+foo (unsigned long x, unsigned long y)
+{
+  unsigned long z = x * y;
+  return z / y != x;
+}

Jakub

[PATCH] MIPS16: Implement `code_readable` function attribute.

2023-05-19 Thread Jie Mei

Add support for __attribute__ ((code_readable)).  The attribute takes at most
one argument, which must be "yes", "no" or "pcrel", and changes the code
readability setting for just that function.  If no argument is supplied, the
setting is "yes".
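
A minimal usage sketch of the attribute described above, not taken from the
patch or its tests.  The CODE_READABLE macros and the function names are
hypothetical; the guard makes the attribute expand to nothing on compilers
that do not know it, so the sketch also builds on non-MIPS targets.

```c
/* Use the attribute only where the compiler reports support for it.  */
#if defined(__mips__) && defined(__has_attribute)
# if __has_attribute(code_readable)
#  define CODE_READABLE(arg) __attribute__ ((code_readable (arg)))
#  define CODE_READABLE_DEFAULT __attribute__ ((code_readable))
# endif
#endif
#ifndef CODE_READABLE
# define CODE_READABLE(arg)
# define CODE_READABLE_DEFAULT
#endif

/* Per-function setting "no".  */
CODE_READABLE ("no") int add_one (int v) { return v + 1; }

/* Per-function setting "pcrel".  */
CODE_READABLE ("pcrel") int add_two (int v) { return v + 2; }

/* No argument: per the description above, this means "yes".  */
CODE_READABLE_DEFAULT int add_three (int v) { return v + 3; }
```

Any other argument string, or a non-string argument, is expected to draw a
-Wattributes warning from the handler in the patch below.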

gcc/ChangeLog:

* config/mips/mips.cc (enum mips_code_readable_setting): New enum.
(mips_handle_code_readable_attr): New static function.
(mips_get_code_readable_attr): New static enum function.
(mips_set_current_function): Set the code_readable mode.
(mips_option_override): Same as above.
* doc/extend.texi: Document code_readable.

gcc/testsuite/ChangeLog:

* gcc.target/mips/code-readable-attr-1.c: New test.
* gcc.target/mips/code-readable-attr-2.c: New test.
* gcc.target/mips/code-readable-attr-3.c: New test.
* gcc.target/mips/code-readable-attr-4.c: New test.
* gcc.target/mips/code-readable-attr-5.c: New test.
---
 gcc/config/mips/mips.cc   | 97 ++-
 gcc/doc/extend.texi   | 17 
 .../gcc.target/mips/code-readable-attr-1.c| 51 ++
 .../gcc.target/mips/code-readable-attr-2.c| 49 ++
 .../gcc.target/mips/code-readable-attr-3.c| 50 ++
 .../gcc.target/mips/code-readable-attr-4.c| 51 ++
 .../gcc.target/mips/code-readable-attr-5.c|  5 +
 7 files changed, 319 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/mips/code-readable-attr-5.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..97f45e67529 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -498,6 +498,9 @@ static int mips_base_target_flags;
 /* The default compression mode.  */
 unsigned int mips_base_compression_flags;
 
+/* The default code readable setting.  */
+enum mips_code_readable_setting mips_base_code_readable;
+
 /* The ambient values of other global variables.  */
 static int mips_base_schedule_insns; /* flag_schedule_insns */
 static int mips_base_reorder_blocks_and_partition; /* flag_reorder... */
@@ -602,6 +605,7 @@ const enum reg_class mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   ALL_REGS,ALL_REGS,   ALL_REGS,   ALL_REGS
 };
 
+static tree mips_handle_code_readable_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_interrupt_attr (tree *, tree, tree, int, bool *);
 static tree mips_handle_use_shadow_register_set_attr (tree *, tree, tree, int,
  bool *);
@@ -623,6 +627,8 @@ static const struct attribute_spec mips_attribute_table[] = {
   { "micromips",   0, 0, true,  false, false, false, NULL, NULL },
   { "nomicromips", 0, 0, true,  false, false, false, NULL, NULL },
   { "nocompression", 0, 0, true,  false, false, false, NULL, NULL },
+  { "code_readable", 0, 1, true,  false, false, false,
+mips_handle_code_readable_attr, NULL },
   /* Allow functions to be specified as interrupt handlers */
   { "interrupt",   0, 1, false, true,  true, false, mips_handle_interrupt_attr,
 NULL },
@@ -1310,6 +1316,81 @@ mips_use_debug_exception_return_p (tree type)
   TYPE_ATTRIBUTES (type)) != NULL;
 }
 
+
+/* Verify the arguments to a code_readable attribute.  */
+
+static tree
+mips_handle_code_readable_attr (tree *node ATTRIBUTE_UNUSED, tree name,
+   tree args, int flags ATTRIBUTE_UNUSED,
+   bool *no_add_attrs)
+{
+  if (!is_attribute_p ("code_readable", name) || args == NULL)
+return NULL_TREE;
+
+  if (TREE_CODE (TREE_VALUE (args)) != STRING_CST)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute requires a string argument", name);
+  *no_add_attrs = true;
+}
+  else if (strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "no") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "pcrel") != 0
+  && strcmp (TREE_STRING_POINTER (TREE_VALUE (args)), "yes") != 0)
+{
+  warning (OPT_Wattributes,
+  "argument to %qE attribute is neither no, pcrel nor yes", name);
+  *no_add_attrs = true;
+}
+
+  return NULL_TREE;
+}
+
+/* Determine the code_readable setting for a function if it has one.  Set
+   *valid to true if we have a properly formed argument and
+   return the result. If there's no argument, return GCC's default.
+   Otherwise, leave valid false and return mips_base_code_readable.  In
+   that case the result should be unused anyway.  */
+
+static enum mips_code_readable_setting
+mips_get_code_readable_attr (tree decl)
+{
+  tree attr;
+
+  if (decl == NULL)
+return mips_base_code_readable;
+
+  attr = lookup_attribute

[PATCH v1] rs6000: Update powerpc test fold-vec-extract-int.p8.c

2023-05-19 Thread Ajit Agarwal via Gcc-patches

Hello All:

Update the powerpc test for both le and be targets to account for the extra
zero extension and sign extension removal done by the default ree pass for
the rs6000 target.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: Update powerpc test fold-vec-extract-int.p8.c

Update the powerpc test for the extra zero_extend removal done by the default ree pass.

2023-05-19  Ajit Kumar Agarwal  

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/fold-vec-extract-int.p8.c: Update test.
---
 gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
index 75eaf25943b..f5f953320d8 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
@@ -13,8 +13,8 @@
 
 /* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 2 { target { lp64 && be } } } } */
 /* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */
 /* { dg-final { scan-assembler-times {\msldi\M} 3  { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
-- 
2.31.1
