[PATCH v1] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic
From: Pan Li The frm_mode attr has some assumptions for each define insn as below. 1. The define insn has at least 9 operands. 2. The operands[9] must be frm reg. 3. The operands[9] must be const int. Actually, the frm operand can be operands[8], operands[9] or operands[10], and not all the define insn has frm operands. This patch would like to refactor frm and eliminate the above assumptions, as well as unblock the underlying rounding mode intrinsic API support. After refactor, the default frm will be none, and the selected insn type will be dyn. For the floating point which honors the frm, we will set the frm_mode attr explicitly in define_insn. Passed both the riscv.exp and rvv.exp for rv32/rv64 tests. Signed-off-by: Pan Li gcc/ChangeLog: * config/riscv/riscv-protos.h (get_frm_mode): Remove operand assumptions. * config/riscv/riscv-v.cc (get_frm_mode): New function. * config/riscv/riscv-vector-builtins.cc (function_expander::use_ternop_insn): * config/riscv/vector.md: Set frm_mode attr explicitly. --- gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv-v.cc | 28 gcc/config/riscv/riscv-vector-builtins.cc | 22 ++- gcc/config/riscv/vector.md| 170 ++ 4 files changed, 159 insertions(+), 62 deletions(-) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 324991e2619..33f7cb1d670 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -236,6 +236,7 @@ bool check_builtin_call (location_t, vec, unsigned int, tree, unsigned int, tree *); bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT); bool legitimize_move (rtx, rtx); +int get_frm_mode (rtx); void emit_vlmax_vsetvl (machine_mode, rtx); void emit_hard_vlmax_vsetvl (machine_mode, rtx); void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0); diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 278452b9e05..d5fb8611d6e 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -1513,6 +1513,34 @@ expand_const_vector (rtx target, rtx src) gcc_unreachable (); } +/* Get the frm mode with given CONST_INT rtx, the default mode is + FRM_MODE_DYN. */ +int +get_frm_mode (rtx operand) +{ + gcc_assert (CONST_INT_P (operand)); + + switch (INTVAL (operand)) +{ +case FRM_RNE: + return FRM_MODE_RNE; +case FRM_RTZ: + return FRM_MODE_RTZ; +case FRM_RDN: + return FRM_MODE_RDN; +case FRM_RUP: + return FRM_MODE_RUP; +case FRM_RMM: + return FRM_MODE_RMM; +case FRM_DYN: + return FRM_MODE_DYN; +default: + return FRM_MODE_DYN; +} + + gcc_unreachable (); +} + /* Expand a pre-RA RVV data move from SRC to DEST. It expands move for RVV fractional vector modes. */ bool diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc index 528dca7ae85..abab06c00ed 100644 --- a/gcc/config/riscv/riscv-vector-builtins.cc +++ b/gcc/config/riscv/riscv-vector-builtins.cc @@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool vd_accum_p, insn_code icode) } for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++) -add_input_operand (argno); +{ + if (base->has_rounding_mode_operand_p () + && argno == call_expr_nargs (exp) - 2) + { + /* Since the rounding mode argument position is not consistent with +the instruction pattern, we need to skip rounding mode argument +here. */ + continue; + } + add_input_operand (argno); +} add_input_operand (Pmode, get_tail_policy_for_pred (pred)); add_input_operand (Pmode, get_mask_policy_for_pred (pred)); add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX)); - /* TODO: Currently, we don't support intrinsic that is modeling rounding mode. - We add default rounding mode for the intrinsics that didn't model rounding - mode yet. */ + if (base->has_rounding_mode_operand_p ()) +add_input_operand (call_expr_nargs (exp) - 2); + + /* The RVV floating-point only support dynamic rounding mode in the + FRM register. */ if (opno != insn_data[icode].n_generator_args) -add_input_operand (Pmode, const0_rtx); +add_input_operand (Pmode, gen_int_mode (riscv_vector::FRM_DYN, Pmode)); return generate_insn (icode); } diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 750b2de8df9..db3ee105ef4 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -867,26 +867,8 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none" ;; Defines rounding mode of an floating-point operation. (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none" (cond [(eq_attr "type" "vfalu,vfwalu,vfmul,vfdiv,vfwmul") - (cond - [(match_test "INTVAL (ope
Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls
On Sun, 2023-08-06 at 02:28 +0530, Prathamesh Kulkarni via Gcc-patches wrote: > On Fri, 4 Aug 2023 at 23:28, Bradley Lucier via Gcc-patches > wrote: Hi Bradley and Prathamesh... > > > > The patch at the end adds a warning when a tail/sibling call cannot > > be > > optimized for various reasons. > > > > I built and tested GCC with and without the patch with > > configuration > > > > Configured with: ../../gcc-mainline/configure --enable-languages=c > > --disable-multilib --prefix=/pkgs/gcc-mainline --disable-werror > > > > There were some changes in the test results, but I can't say that > > they > > look substantive: > > [...] > > > > to test the new warning. The warnings are of the form, e.g., > > > > ../../../gcc-mainline/gcc/tree-vect-stmts.cc:11990:44: warning: > > cannot > > apply sibling-call optimization: callee required more stack slots > > than > > the caller [-Wdisabled-optimization] > > > > These are the number of times this warning was triggered building > > stage1: > > > > grep warning: build.log | grep sibling | sed 's/^.*://' | sort | > > uniq -c > > 259 callee required more stack slots than the caller > > [-Wdisabled-optimization] > > 43 callee returns a structure [-Wdisabled-optimization] > > > > If this patch is OK, someone else will need to commit it for me. > > > > Brad > > > > gcc/Changelog > > > > * calls.cc (maybe_complain_about_tail_call) Add warning > > when > > tail or sibling call cannot be optimized. > Hi Bradley, > I don't have comments on the patch, but a new warning will also > require a corresponding entry in doc/invoke.texi. To nitpick, this isn't a new warning; the patch is extending an existing warning. Looking at the existing entry for that warning I see: @opindex Wdisabled-optimization @opindex Wno-disabled-optimization @item -Wdisabled-optimization Warn if a requested optimization pass is disabled. This warning does not generally indicate that there is anything wrong with your code; it merely indicates that GCC's optimizers are unable to handle the code effectively. Often, the problem is that your code is too big or too complex; GCC refuses to optimize programs when the optimization itself is likely to take inordinate amounts of time. ...which arguably fits the new functionality. Though I don't know how the optimizer maintainers feel about it. Also, as we add more stuff to this warning, would users need more fine-grained control over which things for the optimizer to complain about? I'm not sure. > > Thanks, > Prathamesh > > > > diff --git a/gcc/calls.cc b/gcc/calls.cc > > index 1f3a6d5c450..b95c876fda8 100644 > > --- a/gcc/calls.cc > > +++ b/gcc/calls.cc > > @@ -1242,10 +1242,12 @@ void > > maybe_complain_about_tail_call (tree call_expr, const char > > *reason) > > { > > gcc_assert (TREE_CODE (call_expr) == CALL_EXPR); > > - if (!CALL_EXPR_MUST_TAIL_CALL (call_expr)) > > - return; > > - > > - error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", > > reason); > > + if (CALL_EXPR_MUST_TAIL_CALL (call_expr)) > > + error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", > > reason); The existing code use error_at, passing it the location of the call_expr... > > + else if (flag_optimize_sibling_calls) > > + warning (OPT_Wdisabled_optimization, > > + "cannot apply sibling-call optimization: %s", > > reason); ...but the warning branch uses "warning", which implicitly uses the input_location global variable. Is the warning reported at the correct place? It's better to use warning_at and pass it the location at which the warning should be emitted. The patch doesn't add any test cases, but I imagine any such cases would be very target-dependent (did I add any to my libgccjit version of this way back when?) Thanks for the patch; hope this is constructive Dave > > + return; > > } > > > > /* Fill in ARGS_SIZE and ARGS array based on the parameters found > > in > > > > >
Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls
On Sun, 6 Aug 2023 at 03:07, Bradley Lucier wrote: > > On 8/5/23 4:58 PM, Prathamesh Kulkarni wrote: > > I don't have comments on the patch, but a new warning will also > > require a corresponding entry in doc/invoke.texi. > > Thank you for your comment. > > -Wdisabled-optimization is an established warning, it's just that I'd > like it to apply in another circumstance. Maybe that doesn't need new > documentation. Oops I misread your patch as adding a new warning :/ Sorry for the noise. Best Regards, Prathamesh > > Brad Lucier
Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls
On 8/5/23 4:58 PM, Prathamesh Kulkarni wrote: I don't have comments on the patch, but a new warning will also require a corresponding entry in doc/invoke.texi. Thank you for your comment. -Wdisabled-optimization is an established warning, it's just that I'd like it to apply in another circumstance. Maybe that doesn't need new documentation. Brad Lucier
Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls
On Fri, 4 Aug 2023 at 23:28, Bradley Lucier via Gcc-patches wrote: > > The patch at the end adds a warning when a tail/sibling call cannot be > optimized for various reasons. > > I built and tested GCC with and without the patch with configuration > > Configured with: ../../gcc-mainline/configure --enable-languages=c > --disable-multilib --prefix=/pkgs/gcc-mainline --disable-werror > > There were some changes in the test results, but I can't say that they > look substantive: > > diff -C 2 summary.log ../gcc-mainline > *** summary.log Thu Aug 3 22:56:13 2023 > --- ../gcc-mainline/summary.log Thu Aug 3 19:42:33 2023 > *** > *** 14,22 > === g++ Summary === > > ! # of expected passes 239234 ># of unexpected failures 5 ># of expected failures 2087 > ! # of unsupported tests10566 > ! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xg++ version > 14.0.0 20230802 (experimental) (GCC) > > === gcc tests === > --- 14,22 > === g++ Summary === > > ! # of expected passes 239262 ># of unexpected failures 5 ># of expected failures 2087 > ! # of unsupported tests10562 > ! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xg++ version > 14.0.0 20230802 (experimental) (GCC) > > === gcc tests === > *** > *** 155,164 > === gcc Summary === > > ! # of expected passes 192553 ># of unexpected failures 109 ># of unexpected successes19 ># of expected failures 1506 > ! # of unsupported tests2623 > ! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xgcc version > 14.0.0 20230802 (experimental) (GCC) > > === libatomic tests === > --- 155,164 > === gcc Summary === > > ! # of expected passes 192563 ># of unexpected failures 109 ># of unexpected successes19 ># of expected failures 1506 > ! # of unsupported tests2619 > ! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xgcc version > 14.0.0 20230802 (experimental) (GCC) > > === libatomic tests === > > I then configured and built GCC with > > ../../gcc-mainline/configure CXX="/pkgs/gcc-mainline-new/bin/g++ > -Wdisabled-optimization" --enable-languages=c --disable-multilib > --prefix=/pkgs/gcc-mainline-test --disable-werror --disable-bootstrap > > to test the new warning. The warnings are of the form, e.g., > > ../../../gcc-mainline/gcc/tree-vect-stmts.cc:11990:44: warning: cannot > apply sibling-call optimization: callee required more stack slots than > the caller [-Wdisabled-optimization] > > These are the number of times this warning was triggered building stage1: > > grep warning: build.log | grep sibling | sed 's/^.*://' | sort | uniq -c > 259 callee required more stack slots than the caller > [-Wdisabled-optimization] > 43 callee returns a structure [-Wdisabled-optimization] > > If this patch is OK, someone else will need to commit it for me. > > Brad > > gcc/Changelog > > * calls.cc (maybe_complain_about_tail_call) Add warning when > tail or sibling call cannot be optimized. Hi Bradley, I don't have comments on the patch, but a new warning will also require a corresponding entry in doc/invoke.texi. Thanks, Prathamesh > > diff --git a/gcc/calls.cc b/gcc/calls.cc > index 1f3a6d5c450..b95c876fda8 100644 > --- a/gcc/calls.cc > +++ b/gcc/calls.cc > @@ -1242,10 +1242,12 @@ void > maybe_complain_about_tail_call (tree call_expr, const char *reason) > { > gcc_assert (TREE_CODE (call_expr) == CALL_EXPR); > - if (!CALL_EXPR_MUST_TAIL_CALL (call_expr)) > -return; > - > - error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason); > + if (CALL_EXPR_MUST_TAIL_CALL (call_expr)) > +error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason); > + else if (flag_optimize_sibling_calls) > +warning (OPT_Wdisabled_optimization, > + "cannot apply sibling-call optimization: %s", reason); > + return; > } > > /* Fill in ARGS_SIZE and ARGS array based on the parameters found in > >
[PATCH] MATCH: Extend min_value/max_value to pointer types
Since we already had the infrastructure to optimize `(x == 0) && (x > y)` to false for integer types, this extends the same to pointer types as indirectly requested by PR 96695. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/96695 * match.pd (min_value, max_value): Extend to pointer types too. gcc/testsuite/ChangeLog: PR tree-optimization/96695 * gcc.dg/pr96695-1.c: New test. * gcc.dg/pr96695-10.c: New test. * gcc.dg/pr96695-11.c: New test. * gcc.dg/pr96695-12.c: New test. * gcc.dg/pr96695-2.c: New test. * gcc.dg/pr96695-3.c: New test. * gcc.dg/pr96695-4.c: New test. * gcc.dg/pr96695-5.c: New test. * gcc.dg/pr96695-6.c: New test. * gcc.dg/pr96695-7.c: New test. * gcc.dg/pr96695-8.c: New test. * gcc.dg/pr96695-9.c: New test. --- gcc/match.pd | 6 -- gcc/testsuite/gcc.dg/pr96695-1.c | 18 ++ gcc/testsuite/gcc.dg/pr96695-10.c | 20 gcc/testsuite/gcc.dg/pr96695-11.c | 18 ++ gcc/testsuite/gcc.dg/pr96695-12.c | 18 ++ gcc/testsuite/gcc.dg/pr96695-2.c | 18 ++ gcc/testsuite/gcc.dg/pr96695-3.c | 20 gcc/testsuite/gcc.dg/pr96695-4.c | 21 + gcc/testsuite/gcc.dg/pr96695-5.c | 19 +++ gcc/testsuite/gcc.dg/pr96695-6.c | 20 gcc/testsuite/gcc.dg/pr96695-7.c | 19 +++ gcc/testsuite/gcc.dg/pr96695-8.c | 19 +++ gcc/testsuite/gcc.dg/pr96695-9.c | 20 13 files changed, 234 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr96695-1.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-10.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-11.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-12.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-2.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-3.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-4.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-5.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-6.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-7.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-8.c create mode 100644 gcc/testsuite/gcc.dg/pr96695-9.c diff --git a/gcc/match.pd b/gcc/match.pd index 2278029d608..de54b17abba 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2733,12 +2733,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (match min_value INTEGER_CST - (if (INTEGRAL_TYPE_P (type) + (if ((INTEGRAL_TYPE_P (type) + || POINTER_TYPE_P(type)) && wi::eq_p (wi::to_wide (t), wi::min_value (type) (match max_value INTEGER_CST - (if (INTEGRAL_TYPE_P (type) + (if ((INTEGRAL_TYPE_P (type) + || POINTER_TYPE_P(type)) && wi::eq_p (wi::to_wide (t), wi::max_value (type) /* x > y && x != XXX_MIN --> x > y diff --git a/gcc/testsuite/gcc.dg/pr96695-1.c b/gcc/testsuite/gcc.dg/pr96695-1.c new file mode 100644 index 000..d4287ab4c8c --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr96695-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ifcombine" } */ + +#include + +_Bool and1(unsigned *x, unsigned *y) +{ + /* x > y && x != 0 --> x > y */ + return x > y && x != 0; +} + +_Bool and2(unsigned *x, unsigned *y) +{ + /* x < y && x != -1 --> x < y */ + return x < y && x != (unsigned*)-1; +} + +/* { dg-final { scan-tree-dump-not " != " "ifcombine" } } */ diff --git a/gcc/testsuite/gcc.dg/pr96695-10.c b/gcc/testsuite/gcc.dg/pr96695-10.c new file mode 100644 index 000..dfe752526f0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr96695-10.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +_Bool or1(unsigned *x, unsigned *y) +{ + /* x <= y || x != 0 --> true */ + return x <= y || x != 0; +} + +_Bool or2(unsigned *x, unsigned *y) +{ + /* x >= y || x != -1 --> true */ + return x >= y || x != (unsigned*)-1; +} + +/* { dg-final { scan-tree-dump-not " != " "optimized" } } */ +/* { dg-final { scan-tree-dump-not " <= " "optimized" } } */ +/* { dg-final { scan-tree-dump-not " >= " "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/pr96695-11.c b/gcc/testsuite/gcc.dg/pr96695-11.c new file mode 100644 index 000..d3c36168b98 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr96695-11.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ifcombine" } */ + +#include + +_Bool or1(unsigned *x, unsigned *y) +{ + /* x <= y || x == 0 --> x <= y */ + return x <= y || x == 0; +} + +_Bool or2(unsigned *x, unsigned *y) +{ + /* x >= y || x == -1 --> x >= y */ + return x >= y || x == (unsigned*)-1; +} + +/* { dg-final { scan-tree-dump-not " == " "ifcombine" } } */ diff --git a/gcc/testsuite/gcc.dg/pr96695-12.c b/gcc/testsuite/gcc.dg/pr96695-12.c new file mode 100644 index
[C PATCH] Support typename as selector in _Generic
Clang now has an extension which accepts a typename for _Generic. This is simple to implement and is useful. Do we want this? Clang calls it a "Clang extension" in the pedantic warning. I changed it to "an extension" I am not sure what the policy is. Do we need an extra warning option? Clang has one. No documentation so far. Bootstrapped and regression tested on x86_64-pc-linux-gnu. Martin c: Support typename as selector in _Generic Support typenames as first argument to _Generic which is an extension supported by Clang. It makes it easier to test for types with qualifiers in combination with typeof. gcc/c/: * c-parser.cc (c_parser_generic_selection): Support typename in _Generic selector. gcc/testsuite/: * gnu2x-generic.c: New test. diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 57a01dc2fa3..9aea2425294 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -9312,30 +9312,51 @@ c_parser_generic_selection (c_parser *parser) if (!parens.require_open (parser)) return error_expr; - c_inhibit_evaluation_warnings++; selector_loc = c_parser_peek_token (parser)->location; - selector = c_parser_expr_no_commas (parser, NULL); - selector = default_function_array_conversion (selector_loc, selector); - c_inhibit_evaluation_warnings--; - if (selector.value == error_mark_node) + if (c_token_starts_typename (c_parser_peek_token (parser))) { - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL); - return selector; -} - mark_exp_read (selector.value); - selector_type = TREE_TYPE (selector.value); - /* In ISO C terms, rvalues (including the controlling expression of - _Generic) do not have qualified types. */ - if (TREE_CODE (selector_type) != ARRAY_TYPE) -selector_type = TYPE_MAIN_VARIANT (selector_type); - /* In ISO C terms, _Noreturn is not part of the type of expressions - such as &abort, but in GCC it is represented internally as a type - qualifier. */ - if (FUNCTION_POINTER_TYPE_P (selector_type) - && TYPE_QUALS (TREE_TYPE (selector_type)) != TYPE_UNQUALIFIED) -selector_type - = build_pointer_type (TYPE_MAIN_VARIANT (TREE_TYPE (selector_type))); + /* Language extension introduced by Clang. */ + pedwarn (selector_loc, OPT_Wpedantic, "passing a type argument as " + "first argument to %<_Generic%> is an extension"); + struct c_type_name *type_name; + c_inhibit_evaluation_warnings++; + type_name = c_parser_type_name (parser); + c_inhibit_evaluation_warnings--; + if (NULL == type_name) + { + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL); + return error_expr; + } + /* Qualifiers are preserved. */ + selector_type = groktypename (type_name, NULL, NULL); +} + else +{ + c_inhibit_evaluation_warnings++; + selector = c_parser_expr_no_commas (parser, NULL); + selector = default_function_array_conversion (selector_loc, selector); + c_inhibit_evaluation_warnings--; + + if (selector.value == error_mark_node) + { + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL); + return selector; + } + mark_exp_read (selector.value); + selector_type = TREE_TYPE (selector.value); + /* In ISO C terms, rvalues (including the controlling expression of +_Generic) do not have qualified types. */ + if (TREE_CODE (selector_type) != ARRAY_TYPE) + selector_type = TYPE_MAIN_VARIANT (selector_type); + /* In ISO C terms, _Noreturn is not part of the type of expressions +such as &abort, but in GCC it is represented internally as a type +qualifier. */ + if (FUNCTION_POINTER_TYPE_P (selector_type) + && TYPE_QUALS (TREE_TYPE (selector_type)) != TYPE_UNQUALIFIED) + selector_type + = build_pointer_type (TYPE_MAIN_VARIANT (TREE_TYPE (selector_type))); +} if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>")) { @@ -9401,7 +9422,7 @@ c_parser_generic_selection (c_parser *parser) assoc.expression = c_parser_expr_no_commas (parser, NULL); if (!match) - c_inhibit_evaluation_warnings--; + c_inhibit_evaluation_warnings--; if (assoc.expression.value == error_mark_node) { diff --git a/gcc/testsuite/gcc.dg/gnu2x-generic.c b/gcc/testsuite/gcc.dg/gnu2x-generic.c new file mode 100644 index 000..82b09578072 --- /dev/null +++ b/gcc/testsuite/gcc.dg/gnu2x-generic.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ + +_Static_assert(_Generic(const int, const int: 1, int: 0), ""); +_Static_assert(_Generic( int, const int: 0, int: 1), ""); +_Static_assert(_Generic(int[4], int[4]: 1), ""); +_Static_assert(_Generic(typeof(int[4]), int[4]: 1), ""); + +void foo(int n) +{ + _Static_assert(_Generic(int[n++], int[4]: 1), ""); +} + +#pragma GCC diagnostic warning "-Wpedantic" +_Static_a
[committed] c: Less warnings for parameters declared as arrays [PR98536]
I splitted up the patch into two parts and committed only the FE parts which were already approved and the tests. This solves one of the two issues. Bootstrapped and regression tested on x86_64-pc-linux-gnu. Less warnings for parameters declared as arrays [PR98536] To avoid false positivies, tune the warnings for parameters declared as arrays with size expressions. Do not warn when more bounds are specified in the declaration than before. PR c/98536 c-family/ * c-warn.cc (warn_parm_array_mismatch): Do not warn if more bounds are specified. gcc/testsuite: * gcc.dg/Wvla-parameter-4.c: Adapt test. * gcc.dg/attr-access-2.c: Adapt test. diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc index d4d62c48b20..b7c5d7c01a2 100644 --- a/gcc/c-family/c-warn.cc +++ b/gcc/c-family/c-warn.cc @@ -3599,23 +3599,13 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms) continue; } - if (newunspec != curunspec) + if (newunspec > curunspec) { location_t warnloc = newloc, noteloc = origloc; const char *warnparmstr = newparmstr.c_str (); const char *noteparmstr = curparmstr.c_str (); unsigned warnunspec = newunspec, noteunspec = curunspec; - if (newunspec < curunspec) - { - /* If the new declaration has fewer unspecified bounds -point the warning to the previous declaration to make -it clear that that's the one to change. Otherwise, -point it to the new decl. */ - std::swap (warnloc, noteloc); - std::swap (warnparmstr, noteparmstr); - std::swap (warnunspec, noteunspec); - } if (warning_n (warnloc, OPT_Wvla_parameter, warnunspec, "argument %u of type %s declared with " "%u unspecified variable bound", @@ -3643,14 +3633,10 @@ warn_parm_array_mismatch (location_t origloc, tree fndecl, tree newparms) } /* Iterate over the lists of VLA variable bounds, comparing each -pair for equality, and diagnosing mismatches. The case of -the lists having different lengths is handled above so at -this point they do . */ - for (tree newvbl = newa->size, curvbl = cura->size; newvbl; +pair for equality, and diagnosing mismatches. */ + for (tree newvbl = newa->size, curvbl = cura->size; newvbl && curvbl; newvbl = TREE_CHAIN (newvbl), curvbl = TREE_CHAIN (curvbl)) { - gcc_assert (curvbl); - tree newpos = TREE_PURPOSE (newvbl); tree curpos = TREE_PURPOSE (curvbl); diff --git a/gcc/testsuite/gcc.dg/Wvla-parameter-4.c b/gcc/testsuite/gcc.dg/Wvla-parameter-4.c index 599ad19a3e4..f35faea361a 100644 --- a/gcc/testsuite/gcc.dg/Wvla-parameter-4.c +++ b/gcc/testsuite/gcc.dg/Wvla-parameter-4.c @@ -12,11 +12,6 @@ typedef int IA3[3]; /* Verify the warning points to the declaration with more unspecified bounds, guiding the user to specify them rather than making them all unspecified. */ -void* f_pIA3ax (IA3 *x[*]); // { dg-warning "argument 1 of type 'int \\\(\\\*\\\[\\\*]\\\)\\\[3]' .aka '\[^\n\r\}\]+'. declared with 1 unspecified variable bound" } -void* f_pIA3ax (IA3 *x[*]); -void* f_pIA3ax (IA3 *x[n]); // { dg-message "subsequently declared as 'int \\\(\\\*\\\[n]\\\)\\\[3]' with 0 unspecified variable bounds" "note" } -void* f_pIA3ax (IA3 *x[n]) { return x; } - void* f_pIA3an (IA3 *x[n]); // { dg-message "previously declared as 'int \\\(\\\*\\\[n]\\\)\\\[3]' with 0 unspecified variable bounds" "note" } void* f_pIA3an (IA3 *x[n]); diff --git a/gcc/testsuite/gcc.dg/attr-access-2.c b/gcc/testsuite/gcc.dg/attr-access-2.c index 76baddffc9f..616b7a9527c 100644 --- a/gcc/testsuite/gcc.dg/attr-access-2.c +++ b/gcc/testsuite/gcc.dg/attr-access-2.c @@ -60,16 +60,6 @@ RW (2, 1) void f10 (int n, char a[n]) // { dg-warning "attribute 'access *\\\( // { dg-warning "argument 2 of type 'char\\\[n]' declared as a variable length array" "" { target *-*-* } .-1 } { (void)&n; (void)&a; } - -/* The following is diagnosed to point out declarations with the T[*] - form in headers where specifying the bound is just as important as - in the definition (to detect bugs). */ - void f11 (int, char[*]); // { dg-warning "argument 2 of type 'char\\\[\\\*\\\]' declared with 1 unspecified variable bound" } - void f11 (int m, char a[m]); // { dg-message "subsequently declared as 'char\\\[m]' with 0 unspecified variable bounds" "note" } -RW (2, 1) void f11 (int n, char arr[n]) // { dg-message "subsequently declared as 'char\\\[n]' with 0 unspecifi
[no subject]
>From 104178c3912314f41a61a316e7c3bc0487ea8f3c Mon Sep 17 00:00:00 2001 From: K1ZeKaTo Date: Sat, 5 Aug 2023 14:50:19 + Subject: [PATCH] Make the rvalue work fine. It seems not considering the rvalue in the last version. --- libstdc++-v3/include/bits/iterator_concepts.h | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/libstdc++-v3/include/bits/iterator_concepts.h b/libstdc++-v3/include/bits/iterator_concepts.h index e32e94dc9fc..8bb7c0fb8c0 100644 --- a/libstdc++-v3/include/bits/iterator_concepts.h +++ b/libstdc++-v3/include/bits/iterator_concepts.h @@ -86,14 +86,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept __can_reference = requires { typename __with_ref<_Tp>; }; template - concept __dereferenceable = requires(_Tp& __t) + concept __dereferenceable = requires(_Tp&& __t) { - { *__t } -> __can_reference; + { *static_cast<_Tp&&>(__t) } -> __can_reference; }; } // namespace __detail template<__detail::__dereferenceable _Tp> -using iter_reference_t = decltype(*std::declval<_Tp&>()); +using iter_reference_t = decltype(*std::declval<_Tp&&>()); namespace ranges { @@ -116,7 +116,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template requires __adl_imove<_Tp> struct __result<_Tp> - { using type = decltype(iter_move(std::declval<_Tp>())); }; + { using type = decltype(iter_move(std::declval<_Tp&&>())); }; template requires (!__adl_imove<_Tp>) @@ -129,9 +129,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _S_noexcept() { if constexpr (__adl_imove<_Tp>) - return noexcept(iter_move(std::declval<_Tp>())); + return noexcept(iter_move(std::declval<_Tp&&>())); else - return noexcept(*std::declval<_Tp>()); + return noexcept(*std::declval<_Tp&&>()); } public: @@ -148,9 +148,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if constexpr (__adl_imove<_Tp>) return iter_move(static_cast<_Tp&&>(__e)); else if constexpr (is_lvalue_reference_v>) - return static_cast<__type<_Tp>>(*__e); + return static_cast<__type<_Tp>>(*static_cast<_Tp&&>(__e)); else - return *__e; + return *static_cast<_Tp&&>(__e); } }; } // namespace __cust_imove -- 2.41.0
Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
On 8/5/23 01:46, Xiao Zeng wrote: The operands to the comparison need to be in DImode for rv64 and SImode for rv32. That's the X iterator. After analyzing the rtl log, I can't agree more with this sentence. Note the mode of the comparison operands may be different than the mode of the destination. ie, we might have a 64bit comparison and produce a 32bit sign extended result much like the setcc insns support. This patch changes the 6 zicond patterns to use the X iterator on the comparison inputs. That at least makes the patterns correct and fixes this particular testcase. There's a few other lurking problems that I'll address in additional patches. Committed to the trunk, Jeff 1 In any case, jeff's analysis is convincing, here I will add a little bit of my own analysis. 2 First, for the test cases: foo(long long int x, long long int y) { if (((int)x | (int)y) != 0) return 6; return x + y; } look directly at the compared assembly code. This allows people to quickly realize where the error occurred. X_mode.s(right) ANYI_mode.S(error) 10 foo: 10 foo: 11 or a5,a0,a1 11 or a5,a0,a1 12 sext.w a5,a5 12 li a4,6 13 addw a0,a0,a1 13 addw a0,a0,a1 14 li a4,6 15 czero.nez a1,a0,a5 14 czero.nez a1,a0,a5 16 czero.eqz a0,a4,a5 15 czero.eqz a0,a4,a5 17 or a0,a0,a1 16 or a0,a0,a1 18 ret 17 ret 3 You will find that the correct assembly is just one more assembly instruction: sext.w a5,a5, the rest of the two are exactly the same. 4 From the perspective of assembly instructions, the a5 value obtained by sext.w a5, a5 may be different from the original a5 value, which leads to errors in the test case. Exactly. 5 However, it is difficult to directly see that an error has occurred from the rtl log of gcc's passe. Yes. It's pretty subtle, but par for the course once subreg expressions are involved. subregs are a major wart in the design/implementation of RTL due to them having fairly obscure semantics that are dependent on whether they are widening or narrowing subregs, if the inner object is a mem or something else, etc and how they interact with WORD_REGISTER_OPERATIONS. 6 I'm wondering about transforms like this: In test.c.c.301r.ira (insn 36 34 42 2 (set (reg:DI 153) (if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0) (const_int 0 [0])) (reg:DI 149) (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi} (expr_list:REG_DEAD (reg:DI 149) (nil))) But it's already broken in this form. There was a sign extension in the IL up through the combine pass. In fact, I nearly made the argument that the bug was in combine's simplification of (sign_extend (subreg ...)) to (subreg ...). But ultimately the hardware semantics of risc-v are that comparisons look at the full register, plain and simple. So a narrowing subreg to a sub-word mode is simply wrong i this context. This is also consistent with the implementation of conditional branches and scc insns in the risc-v backend. In test.c.c.302r.reload it becomes (insn 36 34 42 2 (set (reg:DI 11 a1 [153]) (if_then_else:DI (eq:DI (reg:SI 15 a5 [145]) (const_int 0 [0])) (reg:DI 10 a0 [149]) (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi} (nil)) Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145]) after passing through reload pass. This conversion is wrong, why did gcc not warn? The only reason subregs exist is because pseudo registers have a single mode -- which derives from the fact that there is precisely once instance of any given pseudo register. See RTL sharing section of the internals manual. A subreg expression allows us to look at a pseudo in a mode other than its natural mode. Essentially its the same bits reinterpreted in another mode. I'm over-simplifying a lot. The full details of subreg semantics are documented in the RTL section of the internals manual. WORD_REGISTER_OPERATIONS is a property of the target that (again over-simplifying) says that an operation such as an add may have a given size (say SImode) in the RTL, but the hardware performs the operation in its native word size (DImode for rv64). While WORD_REGISTER_OPERATIONS has been around since the 90s, its semantics have always been a bit vague. It's
Re: [PATCH] core: Support heap-based trampolines
Hi Richard, Thanks for your feedback. Here is an amended version of the patch, taking into consideration your requests and the following discussion. There is no configure option for the libgcc part, and the documentation is amended. The patch is split into three commits for core, target and libgcc. Currently regtesting on x86_64 linux and darwin (it was fine before I split up into three commits, so I’m re-testing to make sure I didn’t screw anything up). OK to commit? FX 0001-core-Support-heap-based-trampolines.patch Description: Binary data 0002-target-Support-heap-based-trampolines.patch Description: Binary data 0003-libgcc-support-heap-based-trampolines.patch Description: Binary data
Re: [RFC] light expander sra for parameters and returns
Hi, Richard Biener writes: > On Thu, 3 Aug 2023, Jiufu Guo wrote: > >> >> Hi Richard, >> >> Richard Biener writes: >> >> > On Tue, 1 Aug 2023, Jiufu Guo wrote: >> > >> >> >> >> Hi, >> >> >> >> Richard Biener writes: >> >> >> >> > On Mon, 24 Jul 2023, Jiufu Guo wrote: >> >> > >> >> >> >> >> >> Hi Martin, >> >> >> >> >> >> Not sure about your current option about re-using the ipa-sra code >> >> >> in the light-expander-sra. And if anything I could input please >> >> >> let me know. >> >> >> >> >> >> And I'm thinking about the difference between the expander-sra, ipa-sra >> >> >> and tree-sra. 1. For stmts walking, expander-sra has special behavior >> >> >> for return-stmt, and also a little special on assign-stmt. And phi >> >> >> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure, >> >> >> I'm also thinking if we need a tree structure; it would be useful when >> >> >> checking overlaps, it was not used now in the expander-sra. >> >> >> >> >> >> For ipa-sra and tree-sra, I notice that there is some similar code, >> >> >> but of cause there are differences. While it seems the difference >> >> >> is 'intended', for example: 1. when creating and accessing, >> >> >> 'size != max_size' is acceptable in tree-sra but not for ipa-sra. >> >> >> 2. 'AGGREGATE_TYPE_P' for ipa-sra is accepted for some cases, but >> >> >> not ok for tree-ipa. >> >> >> I'm wondering if those slight difference blocks re-use the code >> >> >> between ipa-sra and tree-sra. >> >> >> >> >> >> The expander-sra may be more light, for example, maybe we can use >> >> >> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, and not >> >> >> need to walk all the stmts. >> >> > >> >> > What I was hoping for is shared stmt-level analysis and a shared >> >> > data structure for the "access"(es) a stmt performs. Because that >> >> > can come up handy in multiple places. The existing SRA data >> >> > structures could easily embed that subset for example if sharing >> >> > the whole data structure of [IPA] SRA seems too unwieldly. >> >> >> >> Understand. >> >> The stmt-level analysis and "access" data structure are similar >> >> between ipa-sra/tree-sra and the expander-sra. >> >> >> >> I just update the patch, this version does not change the behaviors of >> >> the previous version. It is just cleaning/merging some functions only. >> >> The patch is attached. >> >> >> >> This version (and tree-sra/ipa-sra) is still using the similar >> >> "stmt analyze" and "access struct"". This could be extracted as >> >> shared code. >> >> I'm thinking to update the code to use the same "base_access" and >> >> "walk function". >> >> >> >> > >> >> > With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be >> >> > possible (though RTL expansion pre-walks all stmts anyway). >> >> >> >> Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough. >> >> For struct parameters, walking stmt is needed. >> > >> > I think I mentioned this before, RTL expansion already >> > pre-walks the whole function looking for variables it has to >> > expand to the stack in discover_nonconstant_array_refs (which is >> > now badly named), I'd appreciate if the "SRA" walk would piggy-back >> > on that existing walk. >> >> I may misunderstand your meaning about 'stmt-level analysis' and >> 'pre-walk'. >> I understand it as 'walk to analyze each stmt in each bb'. >> In 'discover_nonconstant_array_refs', there is a 'walk_gimple_op': >> >> FOR_EACH_BB_FN (bb, cfun) >> for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>{ >> gimple *stmt = gsi_stmt (gsi); >> if (!is_gimple_debug (stmt)) >> { >> walk_gimple_op (stmt, discover_nonconstant_array_refs_r, &wi); >> >> Maybe, this 'walk_gimple_op' is what you mentioned as 'stmt-level >> analysis' and the 'pre-walk'. Is this right? >> >> Here, 'discover_nonconstant_array_refs_r' is the callback of >> 'walk_gimple_op'. >> This callback analyses if an array is accessed through a variant index, >> if so, the array must be put into the stack. >> >> While it seems 'walk_gimple_op' is not ok for SRA analysis, >> because, in the callback, it is hard to analyze an access is >> 'read' or 'write' to an struct object. But the 'read/write' info >> is needed for SRA. >> >> 'walk_stmt_load_store_addr_ops' is another 'walk on stmt ops'. >> This 'walk' is not used to analyze SRA access, the primary reason >> would be: the load/store/addr parameter of the callback is a >> DECL, the 'size' and 'offset' are stripped. >> >> Currently, in tree-sra/ipa-sra/expand-sra, when analyzing stmt >> for SRA access, stmt code is checked for 'return/assign/call/asm'. >> And sub-expressions of these stmt(s) are analyzed. > > I realize the stmt walks cannot be shared but the per-stmt > walk of the SRA analysis should still be done there simply for > cache locality reasons (thus compile-time). Thanks Richard! For the meaning of per-stmt walk, I guess, it may look like below fak
Re: Re: [PATCH v3] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0
On Sat, Aug 05, 2023 at 05:31:00 AM Jeff Law wrote: > >On 8/1/23 19:38, Xiao Zeng wrote: >> This patch recognizes Zicond patterns when the select pattern >> with condition eq or neq to 0 (using eq as an example), namely: >> >> 1 rd = (rs2 == 0) ? non-imm : 0 >> 2 rd = (rs2 == 0) ? non-imm : non-imm >> 3 rd = (rs2 == 0) ? reg : non-imm >> 4 rd = (rs2 == 0) ? reg : reg >> >> gcc/ChangeLog: >> >> * config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize >> Zicond patterns >> * config/riscv/riscv.md: Recognize Zicond patterns through >>movcc >So I've made minor adjustments to the remaining three cases. First we >need to check the code before optimizing the cases were one of the arms >of the conditional move matches op0. > >I slightly adjusted the case for out of range constants. Its better to >check SMALL_OPERAND rather than testing for specific constants. And >when that triggers, we can just force the value into a register and >continue as-is rather than recursing. > These changes make the code more concise and readable. Thumbs up! >The patch I'm committing fixes one comment typo (whitespace) and a bit >of accidentally duplicated code I added in a prior commit. > >Next up Raphael's patches to handle nontrival conditionals by emiting an >scc insn :-) It would be great to see other implementations for conditional execution. :-) > >Jeff > >ps. I'm deferrring the testsuite bits until we sort out the costing >problems. THey're definitely not forgotten and I still use them in my >local tree. Thank you Jeff, I look forward to a unified, complete and concise implementation method for cost calculation as soon as possible. Thanks Xiao Zeng
Re: RISC-V: Folding memory for FP + constant case
Hi Jeff, Thanks for all the info! Then I'll prepare/test a patch that removes this regcprop limitation and send it out. I have already tested that this change is enough to make fmo optimize leela as well. Manolis On Fri, Aug 4, 2023 at 7:23 PM Jeff Law wrote: > > > > On 8/4/23 03:52, Manolis Tsamis wrote: > > Hi all, > > > > It is true that regcprop currently does not propagate sp and hence > > leela is not optimized, but from what I see this should be something > > we can address. > > > > The reason that the propagation fails is this check that I have added > > when I introduced maybe_copy_reg_attrs: > > > > else if (REG_POINTER (new_reg) != REG_POINTER (old_reg)) > >{ > > /* Only a single instance of STACK_POINTER_RTX must exist and we cannot > > modify it. Allow propagation if REG_POINTER for OLD_REG matches and > > don't touch ORIGINAL_REGNO and REG_ATTRS. */ > > return NULL_RTX; > >} > > > > To be honest I did add this back then just to be on the safe side of > > whether a mismatch in REG_POINTER after propagation would be an issue > > (since the original regcprop had caused enough problems). > No worries. Obviously not propagating is the safe thing to do and if we > find notable missed cases we can always refine the existing code like > we're considering now. > > > > > I see two ways to solve this and make fmo able to optimize leela as well: > > 1) Remove the REG_POINTER check in regcprop if that is safe. My > > understanding is that REG_POINTER is used as a hint and there would be > > no correctness issues. > REG_POINTER is meant to be conservatively correct. If REG_POINTER is > set, then the register must be a pointer to a valid memory location. If > REG_POINTER is not set, then it may or may not point to a valid memory > location. sp, fp, ap are by definition pointers and thus have > REG_POINTER set. > > With that definition we can safely copy propagate away an sp->gpr copy > from a REG_POINTER standpoint. > > > > > > > > > > > > 2) Mark the corresponding registers with REG_POINTER. I'm not sure > > where that is supposed to happen. > > > > Since the instructions look like this: > >(insn 113 11 16 2 (set (reg:DI 15 a5 [226]) > >(reg/f:DI 2 sp)) 179 {*movdi_64bit} > > (nil)) > > > > I assume that we'd want to mark a5 as REG_POINTER anyway (which is > > not), and in that case propagation would work. > I'd strongly recommend against that. The problem is the destination > register might have been used in another context as an index register. > We've fixed a few bugs in this space through the years I suspect there > may be others lurking. You can't directly set that up in C code, but > once you get down to RTL it can happen. > > > > On the other hand if there's no correctness issue w.r.t. REG_POINTER > > and regcprop then removing the additional check would increase > > propagation opportunities in general which is also good. > I think you can safely remove this limitation. > > jeff
Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond
On Wed, Aug 02, 2023 at 01:41:00 PM Jeff Law wrote: > >c-torture/execute/pr59014-2.c fails with the Zicond work on rv64. We >miscompile the "foo" routine because we have eliminated a required sign >extension. > >The key routine looks like this: > >foo (long long int x, long long int y) >{ > if (((int) x | (int) y) != 0) > return 6; > return x + y; >} > >So we kindof do the expected thing at expansion time. We IOR X and Y, >sign extend the result from 32 to 64 bits (note how the values in the >source are casted from long long to ints), then emit a suitable >conditional branch. ie: > >> (insn 10 4 12 2 (set (reg:DI 142) >> (ior:DI (reg/v:DI 138 [ x ]) >> (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3} >> (nil)) >> (insn 12 10 13 2 (set (reg:DI 144) >> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 >>{extendsidi2} >> (nil)) >> (jump_insn 13 12 14 2 (set (pc) >> (if_then_else (ne (reg:DI 144) >> (const_int 0 [0])) >> (label_ref:DI 27) >> (pc))) "j.c":6:6 243 {*branchdi} >> (expr_list:REG_DEAD (reg:DI 144) >> (int_list:REG_BR_PROB 233216732 (nil))) > >When we if-convert that we generate this sequence: > >> (insn 10 4 12 2 (set (reg:DI 142) >> (ior:DI (reg/v:DI 138 [ x ]) >> (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3} >> (nil)) >> (insn 12 10 30 2 (set (reg:DI 144) >> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 >>{extendsidi2} >> (nil)) >> (insn 30 12 31 2 (set (reg:DI 147) >> (const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit} >> (nil)) >> (insn 31 30 33 2 (set (reg:DI 146) >> (plus:DI (reg/v:DI 138 [ x ]) >> (reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3} >> (nil)) >> (insn 33 31 34 2 (set (reg:DI 149) >> (if_then_else:DI (ne:DI (reg:DI 144) >> (const_int 0 [0])) >> (const_int 0 [0]) >> (reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi} >> (nil)) >> (insn 34 33 35 2 (set (reg:DI 148) >> (if_then_else:DI (eq:DI (reg:DI 144) >> (const_int 0 [0])) >> (const_int 0 [0]) >> (reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi} >> (nil)) >> (insn 35 34 21 2 (set (reg:DI 137 [ ]) >> (ior:DI (reg:DI 148) >> (reg:DI 149))) "j.c":8:12 99 {iordi3} >> (nil)) > >Which looks basically OK. The sign extended subreg is a bit worrisome >though. And sure enough when we get into combine: > >> Failed to match this instruction: >> (parallel [ >> (set (reg:DI 149) >> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0) >> (const_int 0 [0])) >> (reg:DI 146) >> (const_int 0 [0]))) >> (set (reg:DI 144) >> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) >> ]) >> Successfully matched this instruction: >> (set (reg:DI 144) >> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) >> Successfully matched this instruction: >> (set (reg:DI 149) >> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0) >> (const_int 0 [0])) >> (reg:DI 146) >> (const_int 0 [0]))) >> allowing combination of insns 12 and 33 > >Since we need the side effect we first try the PARALLEL with two sets. >That, as expected, fails. Generic combine code then tries to pull apart >the two sets as distinct insns resulting in this conditional move: > >> (insn 33 31 34 2 (set (reg:DI 149) >> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0) >> (const_int 0 [0])) >> (reg:DI 146) >> (const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi} >> (expr_list:REG_DEAD (reg:DI 146) >> (nil))) > >Bzzt. We can't actually implement this RTL in the hardware. Basically >it's asking to do 32bit comparison on rv64, ignoring the upper 32 bits >of the input register. That's not actually how zicond works. > >The operands to the comparison need to be in DImode for rv64 and SImode >for rv32. That's the X iterator. After analyzing the rtl log, I can't agree more with this sentence. >Note the mode of the comparison >operands may be different than the mode of the destination. ie, we >might have a 64bit comparison and produce a 32bit sign extended result >much like the setcc insns support. > >This patch changes the 6 zicond patterns to use the X iterator on the >comparison inputs. That at least makes the patterns correct and fixes >this particular testcase. There's a few other lurking problems that >I'll address in additional patches. > >Committed to the trunk, >Jeff 1 In any case, jeff's analysis is convincing, here I will add a little bit of my own analysis. 2 First, for the test cases: foo(long long int x, long long int y) { if (((int)x | (int)y) != 0) return 6; return x + y; } look directly at the compared assembly code. This allows people to quickly realize wh