[PATCH] rs6000: Split movsf_from_si from high word before reload[PR89310]
For extracting high part element from DImode register like: {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} split it before reload with "and mask" to avoid generating shift right 32 bit then shift left 32 bit. srdi 3,3,32 sldi 9,3,32 mtvsrd 1,9 xscvspdpn 1,1 => rldicr 3,3,0,31 mtvsrd 1,3 xscvspdpn 1,1 gcc/ChangeLog: 2020-07-03 Xionghu Luo PR rtl-optimization/89310 * config/rs6000/rs6000.md (movsf_from_si2): New define_insn_and_split. gcc/testsuite/ChangeLog: 2020-07-03 Xionghu Luo PR rtl-optimization/89310 * gcc.target/powerpc/pr89310.c: New test. --- gcc/config/rs6000/rs6000.md| 63 ++ gcc/testsuite/gcc.target/powerpc/pr89310.c | 17 ++ 2 files changed, 80 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr89310.c diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 4fcd6a94022..8d51de07594 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7593,6 +7593,69 @@ (define_insn_and_split "movsf_from_si" "*, *, p9v, p8v, *, *, p8v,p8v, p8v, *")]) +;; For extracting high part element from DImode register like: +;; {%1:SF=unspec[r122:DI>>0x20#0] 86;clobber scratch;} +;; split it before reload with "and mask" to avoid generating shift right +;; 32 bit then shift left 32 bit. 
+(define_insn_and_split "movsf_from_si2" + [(set (match_operand:SF 0 "nonimmediate_operand" + "=!r, f, v, wa,m, Z, +Z, wa,?r,!r") + (unspec:SF [ +(subreg:SI (ashiftrt:DI + (match_operand:DI 1 "input_operand" + "m, m, wY,Z, r, f, + wa,r, wa,r") + (const_int 32)) 0)] + UNSPEC_SF_FROM_SI)) + (clobber (match_scratch:DI 2 + "=X,X, X, X, X, X, + X, r, X, X"))] + "TARGET_NO_SF_SUBREG + && (register_operand (operands[0], SFmode) + && register_operand (operands[1], DImode))" + "@ + lwz%U1%X1 %0,%1 + lfs%U1%X1 %0,%1 + lxssp %0,%1 + lxsspx %x0,%y1 + stw%U0%X0 %1,%0 + stfiwx %1,%y0 + stxsiwx %x1,%y0 + # + mfvsrwz %0,%x1 + mr %0,%1" + + "&& !reload_completed + && vsx_reg_sfsubreg_ok (operands[0], SFmode)" + [(const_int 0)] +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + rtx tmp = gen_reg_rtx (DImode); + + /* Avoid split {r155:SI#0=unspec[r133:DI>>0x20#0] 86;clobber scratch;} from PR42745. */ + if (!SUBREG_P (operands[0])) +{ + rtx mask = GEN_INT (HOST_WIDE_INT_M1U << 32); + emit_insn (gen_anddi3 (tmp, op1, mask)); + emit_insn (gen_p8_mtvsrd_sf (op0, tmp)); + emit_insn (gen_vsx_xscvspdpn_directmove (op0, op0)); + DONE; +} + else +FAIL; +} + [(set_attr "length" + "*, *, *, *, *, *, +*, 12,*, *") + (set_attr "type" + "load, fpload,fpload,fpload,store, fpstore, +fpstore,vecfloat, mffgpr,*") + (set_attr "isa" + "*, *, p9v, p8v, *, *, +p8v,p8v, p8v, *")]) + ;; Move 64-bit binary/decimal floating point (define_expand "mov" diff --git a/gcc/testsuite/gcc.target/powerpc/pr89310.c b/gcc/testsuite/gcc.target/powerpc/pr89310.c new file mode 100644 index 000..15e78509246 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr89310.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +struct s { + int i; + float f; +}; + +float +foo (struct s arg) +{ + return arg.f; +} + +/* { dg-final { scan-assembler-not {\msrdi\M} } } */ +/* { dg-final { scan-assembler-not {\msldi\M} } } */ +/* { dg-final { scan-assembler-times {\mrldicr\M} 1 } } */ -- 2.21.0.777.g83232e3864
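As a side note on why the split is safe: the "and mask" form computes the same value as the shift pair it replaces. A minimal C sketch of that equivalence follows; the function names are made up for illustration and are not part of the patch, and the comments map the C expressions only loosely onto the instructions quoted in the commit message.

```c
#include <stdint.h>

/* Hypothetical helper: shift right 32 bits, then left 32 bits.
   This corresponds to the srdi/sldi pair in the "before" sequence.  */
static uint64_t
high_word_by_shifts (uint64_t x)
{
  return (x >> 32) << 32;
}

/* Hypothetical helper: a single AND with a mask covering the upper
   32 bits, the same constant the patch builds with
   HOST_WIDE_INT_M1U << 32.  This corresponds to the single rldicr.  */
static uint64_t
high_word_by_mask (uint64_t x)
{
  const uint64_t mask = ~UINT64_C (0) << 32; /* 0xffffffff00000000 */
  return x & mask;
}
```

Both helpers clear the low word and keep the high word in place, which is the form mtvsrd followed by xscvspdpn expects for the SF value.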
RFC: make combine do as advertised (cheaper-than)?
Most comments, including the second sentence in the head comment of combine_validate_cost, the main decision-maker of the combine pass, refer to the function as returning true if the new insn(s) are *cheaper* than the old insns, when in fact the function also returns true if the cost is the same. Returning true only for cheaper also seems saner than as-cheap-as, considering the need to avoid oscillation between same-cost combinations. Also, having combine make more complex combinations of the same cost makes the job of later passes harder.

Right, you can affect this with your target's TARGET_RTX_COSTS and TARGET_INSN_COST hooks, but only for trivial cases, and you have increasingly more complex combinations (many-to-many combinations) where you have to twist simple logic to appease combine (stop it from combining) or give up. Main-interest ports are unsurprisingly pretty tied to this effect.

I'd love to install the following patch, adjusting the function and the two opposing comments. But... it causes hundreds of regressions for each of x86_64-linux and aarch64-linux (tens for ppc64le-linux), so I'm just not up to the task, at least not without previous buy-in from reviewers. It would need those targets to have their TARGET_INSN_COST and/or TARGET_RTX_COSTS functions adjusted.

Alternatives off the top of my head, one of:
- With buy-in from global reviewers, installing this patch on a development branch and letting all target maintainers adjust their target test-cases and cost-functions there, for merge when first-class targets are done. (I'm a dreamer.)
- A target combine hook for the decision (passing for inspection tuples of from-insns and to-insns and costs) and just falling back to the current addition of rtx costs.
- A simpler target combine decision hook that says which one of "cheaper" or "as-cheap-as".
- Adjusting documentation and comments that are currently untruthful about the cost decision to say (to the effect of) "as cheap as" instead of "cheaper".
So, WDYT? (Tested as above, causing massive pattern-match regressions.) gcc: * combine.c (combine_validate_cost): Reject unless the new total cost is cheaper than the original. Adjust the minority of comments that don't say "cheaper": --- gcc/combine.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index f69413a..7da144e 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -846,8 +846,8 @@ do_SUBST_LINK (struct insn_link **into, struct insn_link *newval) than the original sequence I0, I1, I2, I3 and undobuf.other_insn. Note that I0, I1 and/or NEWI2PAT may be NULL_RTX. Similarly, NEWOTHERPAT and undobuf.other_insn may also both be NULL_RTX. Return false if the cost - of all the instructions can be estimated and the replacements are more - expensive than the original sequence. */ + of all the instructions can be estimated and the replacements are not + cheaper than the original sequence. */ static bool combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, @@ -938,8 +938,8 @@ combine_validate_cost (rtx_insn *i0, rtx_insn *i1, rtx_insn *i2, rtx_insn *i3, } /* Disallow this combination if both new_cost and old_cost are greater than - zero, and new_cost is greater than old cost. */ - int reject = old_cost > 0 && new_cost > old_cost; + zero, and new_cost is greater than or equal to the old cost. */ + int reject = old_cost > 0 && new_cost >= old_cost; if (dump_file) { -- 2.11.0
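To make the one-character change concrete, here is a small stand-alone C sketch of the two predicates. The function names are hypothetical; only the two expressions are taken from the hunk above, and everything else in combine_validate_cost (cost estimation, dumping) is left out.

```c
#include <stdbool.h>

/* Behaviour before the patch: reject only when the replacement is
   strictly more expensive, so equal-cost combinations are accepted.  */
static bool
reject_as_cheap_as (int old_cost, int new_cost)
{
  return old_cost > 0 && new_cost > old_cost;
}

/* Behaviour with the patch: reject unless the replacement is strictly
   cheaper, so equal-cost combinations are refused too.  */
static bool
reject_unless_cheaper (int old_cost, int new_cost)
{
  return old_cost > 0 && new_cost >= old_cost;
}
```

The two predicates differ only when new_cost == old_cost with a known (non-zero) old cost, which is exactly the case the RFC is about.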
[PATCH] Enable GCC support for AMX
Hi:

This patch adds support for Intel Advanced Matrix Extensions (AMX), which will be enabled in GLC. AMX is a new 64-bit programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and an accelerator able to operate on tiles.

Supported instructions are
AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

The intrinsics take constant tile register numbers as their input parameters. For detailed information, please refer to https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Bootstrap ok, regression test on i386/x86 backend is ok. OK for master?

gcc/ChangeLog * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET, OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET, OPTION_MASK_ISA2_AMX_TILE_UNSET, OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET): New macros. (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16. * common/config/i386/i386-cpuinfo.h (processor_types): Add FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16. * common/config/i386/cpuinfo.h (XSTATE_TILECFG, XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro. (get_available_features): Enable AMX features only if their states are supported by OSXSAVE. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-tile, amx-int8, amx-bf16. * config.gcc: Add amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h to extra headers. * config/i386/amxbf16intrin.h: New file. * config/i386/amxint8intrin.h: Ditto. * config/i386/amxtileintrin.h: Ditto. * config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8): New macro. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
* config/i386/i386-options.c (ix86_target_string): Add -mamx-tile, -mamx-int8, -mamx-bf16. (ix86_option_override_internal): Handle AMX-TILE, AMX-INT8, AMX-BF16. * config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P, TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P, PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros. * config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16. * config/i386/immintrin.h: Include amxtileintrin.h, amxint8intrin.h, amxbf16intrin.h. * doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16. * doc/extend.texi: Document amx-tile, amx-int8, amx-bf16. * doc/sourcebuild.texi ((Effective-Target Keywords, Other hardware attributes): Document amx_int8, amx_tile, amx_bf16. gcc/testsuite/ChangeLog * lib/target-supports.exp (check_effective_target_amx_tile, check_effective_target_amx_int8, check_effective_target_amx_bf16): New proc. * g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16. * g++.dg/other/i386-3.C: Ditto. * gcc.target/i386/sse-12.c: Ditto. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/amxbf16-asmatt-1.c: New test. * gcc.target/i386/amxint8-asmatt-1.c: Ditto. * gcc.target/i386/amxtile-asmatt-1.c: Ditto. * gcc.target/i386/amxbf16-asmintel-1.c: Ditto. * gcc.target/i386/amxint8-asmintel-1.c: Ditto. * gcc.target/i386/amxtile-asmintel-1.c: Ditto. * gcc.target/i386/amxbf16-asmatt-2.c: Ditto. * gcc.target/i386/amxint8-asmatt-2.c: Ditto. * gcc.target/i386/amxtile-asmatt-2.c: Ditto. * gcc.target/i386/amxbf16-asmintel-2.c: Ditto. * gcc.target/i386/amxint8-asmintel-2.c: Ditto. * gcc.target/i386/amxtile-asmintel-2.c: Ditto. From 88a81d93c9d896cf67869f450905c2ea2b08be74 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Thu, 25 Jul 2019 16:49:36 +0800 Subject: [PATCH] Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16. 
AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud AMX-BF16:tdpbf16ps gcc/ChangeLog * common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET, OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET, OPTION_MASK_ISA2_AMX_TILE_UNSET, OPTION_MASK_ISA2_AMX_INT8_UNSET, OPTION_MASK_ISA2_AMX_BF16_UNSET): New macros. (ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16. * common/config/i386/i386-cpuinfo.h (processor_types): Add FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16. * common/config/i386/cpuinfo.h (XSTATE_TILECFG, XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macro. (get_available_features): Enable AMX features only if their states are supported by OSXSAVE. * common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for amx-tile, amx-int8, amx-bf16. *
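The ChangeLog entry for get_available_features says the AMX features are only enabled when the OS has enabled the tile state. A rough, portable C sketch of that gating is below. It is not the code from the patch: the SKETCH_ names are made up here, and the bit positions (CPUID leaf 7 subleaf 0 EDX bits 22/24/25, XCR0 bits 17/18) are my reading of Intel's programming reference, so please check them against the document linked above. The function takes the raw register values as arguments instead of executing cpuid/xgetbv, so it compiles on any host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CPUID.(EAX=7,ECX=0):EDX feature bits.  */
#define SKETCH_BIT_AMX_BF16 (UINT32_C (1) << 22)
#define SKETCH_BIT_AMX_TILE (UINT32_C (1) << 24)
#define SKETCH_BIT_AMX_INT8 (UINT32_C (1) << 25)

/* Assumed XCR0 state-component bits for tile config and tile data.  */
#define SKETCH_XSTATE_TILECFG  (UINT64_C (1) << 17)
#define SKETCH_XSTATE_TILEDATA (UINT64_C (1) << 18)
#define SKETCH_XCR_AMX_ENABLED_MASK \
  (SKETCH_XSTATE_TILECFG | SKETCH_XSTATE_TILEDATA)

struct sketch_amx_features
{
  bool tile;
  bool int8;
  bool bf16;
};

/* Report an AMX feature only if the CPU advertises it AND the OS has
   enabled both tile state components in XCR0.  */
static struct sketch_amx_features
sketch_amx_decode (uint32_t cpuid7_edx, uint64_t xcr0)
{
  struct sketch_amx_features f = { false, false, false };
  bool os_enabled = (xcr0 & SKETCH_XCR_AMX_ENABLED_MASK)
		    == SKETCH_XCR_AMX_ENABLED_MASK;

  if (os_enabled)
    {
      f.tile = (cpuid7_edx & SKETCH_BIT_AMX_TILE) != 0;
      f.int8 = (cpuid7_edx & SKETCH_BIT_AMX_INT8) != 0;
      f.bf16 = (cpuid7_edx & SKETCH_BIT_AMX_BF16) != 0;
    }
  return f;
}
```

On real hardware the two arguments would come from cpuid and xgetbv; that part is deliberately left out of the sketch.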
Re: [PATCH] analyzer: Fix -Wanalyzer-possible-null-argument warning
On Wed, 2020-07-01 at 18:29 +0100, Jonathan Wakely wrote: > On 30/06/20 17:43 +0100, Jonathan Wakely wrote: > > gcc/testsuite/ChangeLog: > > > > * g++.dg/analyzer/pr94028.C: Make operator new non-throwing so > > that the compiler doesn't implicitly mark it as returning > > non-null. > > > > Fixes these: > > > > FAIL: g++.dg/analyzer/pr94028.C -std=c++98 (test for excess > > errors) > > FAIL: g++.dg/analyzer/pr94028.C -std=c++14 (test for excess > > errors) > > FAIL: g++.dg/analyzer/pr94028.C -std=c++17 (test for excess > > errors) > > FAIL: g++.dg/analyzer/pr94028.C -std=c++2a (test for excess > > errors) > > Updated to add PR 96014 to the commit log. > > OK for master? Sorry for not responding to this earlier. My knowledge of C++ exceptions is a little rusty; I found the addition of "throw()" to mark the decl as non-throwing to be confusing. Looking in my copy of Stroustrup 4th edition (C++11) p367 it says this is an empty exception specification, and is equivalent to "noexcept", and Stroustrup recommends using the latter instead. Did you use this syntax for backwards compat with C++98, or is "noexcept" available in the earlier C++ dialects? Thanks Dave
Re: [PATCH] Map filename from print in gfortran with -ffile-prefix-map (PR96069)
> I think this remapping should happen with `file-prefix-map` but
> shouldn't with `debug-prefix-map` (though if it happens for both it's
> also not too bad) and I believe this patch is the minimum change to
> achieve that. I think it makes sense to make this follow
> `macro-prefix-map` although I'm not sure if this is a macro... (OTOH,
> __builtin_FILE isn't a macro either so maybe it's fine?). I haven't
> figured out how I can allow the option in gfortran or how to document
> this new behavior though (e.g. I actually don't know what this is
> called in fortran...)

And here's a version that makes -fmacro-prefix-map a common option.

--- diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 9b6300f330f..6d105e24f16 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -40,7 +40,6 @@ along with GCC; see the file COPYING3. If not see #include "plugin.h"/* For PLUGIN_INCLUDE_FILE event. */ #include "mkdeps.h" #include "dumpfile.h" -#include "file-prefix-map.h"/* add_*_prefix_map() */ #ifndef DOLLARS_IN_IDENTIFIERS # define DOLLARS_IN_IDENTIFIERS true @@ -443,10 +442,6 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value , cpp_opts->dollars_in_ident = value; break; -case OPT_fmacro_prefix_map_: - add_macro_prefix_map (arg); - break; - case OPT_ffreestanding: value = !value; /* Fall through. */ diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 89a58282b3f..bf9899d1aef 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -1517,10 +1517,6 @@ fdollars-in-identifiers C ObjC C++ ObjC++ Permit '$' as an identifier character. -fmacro-prefix-map= -C ObjC C++ ObjC++ Joined RejectNegative --fmacro-prefix-map== Map one directory name to another in __FILE__, __BASE_FILE__, and __builtin_FILE(). - fdump-ada-spec C ObjC C++ ObjC++ RejectNegative Var(flag_dump_ada_spec) Write all declarations as Ada code transitively. 
diff --git a/gcc/common.opt b/gcc/common.opt index df8af365d1b..e018716af89 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1217,6 +1217,10 @@ fdebug-prefix-map= Common Joined RejectNegative Var(common_deferred_options) Defer -fdebug-prefix-map== Map one directory name to another in debug information. +fmacro-prefix-map= +Common Joined RejectNegative Var(common_deferred_options) Defer +-fmacro-prefix-map== Map one directory name to another in __FILE__, __BASE_FILE__, and __builtin_FILE(). + ffile-prefix-map= Common Joined RejectNegative Var(common_deferred_options) Defer -ffile-prefix-map== Map one directory name to another in compilation result. diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c index 21bdd5ef0d8..4d406493603 100644 --- a/gcc/fortran/trans-io.c +++ b/gcc/fortran/trans-io.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "trans-types.h" #include "trans-const.h" #include "options.h" +#include "file-prefix-map.h" /* remap_macro_filename() */ /* Members of the ioparm structure. 
*/ @@ -1026,7 +1027,7 @@ set_error_locus (stmtblock_t * block, tree var, locus * where) TREE_TYPE (p->field), locus_file, p->field, NULL_TREE); f = where->lb->file; - str = gfc_build_cstring_const (f->filename); + str = gfc_build_cstring_const (remap_macro_filename(f->filename)); str = gfc_build_addr_expr (pchar_type_node, str); gfc_add_modify (block, locus_file, str); diff --git a/gcc/opts-global.c b/gcc/opts-global.c index b1a8429dc3c..574db430430 100644 --- a/gcc/opts-global.c +++ b/gcc/opts-global.c @@ -380,6 +380,10 @@ handle_common_deferred_options (void) add_debug_prefix_map (opt->arg); break; + case OPT_fmacro_prefix_map_: + add_macro_prefix_map (opt->arg); + break; + case OPT_ffile_prefix_map_: add_file_prefix_map (opt->arg); break; diff --git a/gcc/testsuite/gfortran.dg/pr96069.f90 b/gcc/testsuite/gfortran.dg/pr96069.f90 new file mode 100644 index 000..d7fed59a150 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr96069.f90 @@ -0,0 +1,11 @@ +! { dg-do compile } +! { dg-options "-fmacro-prefix-map==MACRO-PREFIX" } + +subroutine f(name) + implicit none + character*(*) name + print *,name + return +end subroutine f + +! { dg-final { scan-assembler ".string\t\"MACRO-PREFIX" } } > > --- > gcc/fortran/trans-io.c| 3 ++- > gcc/testsuite/gfortran.dg/pr96069.f90 | 11 +++ > 2 files changed, 13 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gfortran.dg/pr96069.f90 > > diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c > index 21bdd5ef0d8..4d406493603 100644 > --- a/gcc/fortran/trans-io.c > +++ b/gcc/fortran/trans-io.c > @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see > #include "trans-types.h" > #include "trans-const.h" > #include "options.h" > +#include "file-prefix-map.h" /* remap_macro_filename() */ > > /* Members of the
RFA: Fix combine.c combining a move and a non-move into two non-moves, PR93372
TL;DR: fixing a misdetection of what is a "simple move".

Looking into performance degradation after de-cc0 for CRIS, I noticed combine behaving badly; it changed a move and a right-shift into two right-shifts, where the "combined" move was not eliminated in later passes, and where the deficiency caused an extra insn in a hot loop: crcu16 (and crcu32) in coremark.

Before de-cc0, the insns input to combine looked like:
   33: r58:SI=r56:SI 0>>r48:SI
      REG_DEAD r56:SI
   35: r37:HI=r58:SI#0
and after:
   33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_DEAD r56:SI
      REG_UNUSED dccr:CC
   35: {r37:HI=r58:SI#0;clobber dccr:CC;}
      REG_UNUSED dccr:CC
That is, there's always a parallel with a clobber of the condition-codes register. Being a parallel, it's not an is_just_move, but e.g. a single_set.

For the de-cc0:ed "combination", it ended up as
   33: {r58:SI=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_UNUSED dccr:CC
   35: {r37:HI#0=r56:SI 0>>r48:SI;clobber dccr:CC;}
      REG_DEAD r56:SI
      REG_UNUSED dccr:CC
That is, a move and a shift turned into two shifts; the extra shift is not eliminated by later passes, while the move was (with cc0, and "will be again"), leading to redundant instructions.

At first I thought this was due to parallel-ignorant old code but the "guilty" change is actually pretty recent. Regarding a parallel with a clobber not being "just" a move, there are only the two adjacent callers seen in the patch (obviously with the rename), and their use exactly matches checking that the argument is a single_set which is a move. It's always applied to an rtx_insn, so I changed the type and name to avoid the "just" issue. I had to adjust the type when calling single_set.

I checked the original commit, c4c5ad1d6d1e1e a.k.a r263067 and it seems parallels-as-sets were just overlooked and that this patch appears to agree with the intent and the comment at the use of i2_was_move and i3_was_move, which has a clause saying "Also do this if we started with two insns neither of which was a simple move". 
With this correction in place, the performance degradation related to de-cc0 of the CRIS port as measured by coremark is gone and turned into a small win. N.B.: there certainly is a code difference in other hot functions, and the swing between different functions is larger than this difference; to be dealt with separately.

Tested cris-elf, x86_64-linux, powerpc64le-linux, 2/3 through aarch64-linux (unexpectedly slow). Ok to commit?

2020-07-06 Hans-Peter Nilsson PR target/93372 * gcc/combine.c (is_move): Rename from is_just_move. Use single_set instead of peeking directly into the PATTERN.

--- gcc/combine.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/gcc/combine.c b/gcc/combine.c index 7da144e..ed90b16 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -2624,13 +2624,15 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n) return true; } -/* Return whether X is just a single set, with the source - a general_operand. */ +/* Return whether INSN is just a single set, with the source + a general_operand. INSN must be an insn, not stripped to its PATTERN. */ static bool -is_just_move (rtx x) +is_move (const rtx_insn *insn) { - if (INSN_P (x)) -x = PATTERN (x); + rtx x = single_set (insn); + + if (x == NULL_RTX) +return false; return (GET_CODE (x) == SET && general_operand (SET_SRC (x), VOIDmode)); } @@ -3103,8 +3105,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, } /* Record whether i2 and i3 are trivial moves. */ - i2_was_move = is_just_move (i2); - i3_was_move = is_just_move (i3); + i2_was_move = is_move (i2); + i3_was_move = is_move (i3); /* Record whether I2DEST is used in I2SRC and similarly for the other cases. Knowing this will help in register status updating below. */ -- 2.11.0
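For readers not fluent in RTL, the misdetection can be illustrated with a toy model in plain C. None of the types or functions below are GCC's; the toy_ names are invented for the illustration, and toy_single_set is a much-simplified stand-in for the real single_set (which, among other things, also looks at REG_UNUSED notes).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the rtx codes involved.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_PARALLEL };

struct toy_pat
{
  enum toy_code code;
  bool src_is_general_operand;	  /* Only meaningful for TOY_SET.  */
  const struct toy_pat *elem[2];  /* Only meaningful for TOY_PARALLEL.  */
  int n_elem;
};

/* Old check: looks only at the top-level pattern, so a PARALLEL of a
   SET and a CLOBBER is never considered a move.  */
static bool
toy_is_just_move (const struct toy_pat *p)
{
  return p->code == TOY_SET && p->src_is_general_operand;
}

/* Simplified single_set: a bare SET, or a PARALLEL holding exactly one
   SET with everything else being CLOBBERs.  */
static const struct toy_pat *
toy_single_set (const struct toy_pat *p)
{
  const struct toy_pat *set = NULL;

  if (p->code == TOY_SET)
    return p;
  if (p->code != TOY_PARALLEL)
    return NULL;
  for (int i = 0; i < p->n_elem; i++)
    {
      const struct toy_pat *e = p->elem[i];
      if (e->code == TOY_SET)
	{
	  if (set != NULL)
	    return NULL;	/* More than one SET.  */
	  set = e;
	}
      else if (e->code != TOY_CLOBBER)
	return NULL;
    }
  return set;
}

/* New check: go through the single-set view first.  */
static bool
toy_is_move (const struct toy_pat *p)
{
  const struct toy_pat *set = toy_single_set (p);
  return set != NULL && set->src_is_general_operand;
}
```

With a move wrapped in a parallel with a clobber, as in insn 35 after de-cc0, toy_is_just_move says "no" while toy_is_move says "yes"; for the shift in insn 33 both say "no".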
[PATCH] Map filename from print in gfortran with -ffile-prefix-map (PR96069)
Sorry I think I accidentally had rich text mode on and also forgot the `[PATCH]` in the title in the previous email... Try again... Currently this is using the macro prefix map without allowing the -fmacro-prefix-map argument, which is arguably pretty weird... but I don't know what would be a better way. I think this remapping should happen with `file-prefix-map` but shouldn't with `debug-prefix-map` (though if it happens for both it's also not too bad) and I believe this patch is the minimum change to achieve that. I think it makes sense to make this follow `macro-prefix-map` although I'm not sure if this is a macro... (OTOH, __builtin_FILE isn't a macro either so maybe it's fine?). I haven't figured out how I can allow the option in gfortran or how to document this new behavior though (e.g. I actually don't know what this is called in fortran...) --- gcc/fortran/trans-io.c| 3 ++- gcc/testsuite/gfortran.dg/pr96069.f90 | 11 +++ 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/pr96069.f90 diff --git a/gcc/fortran/trans-io.c b/gcc/fortran/trans-io.c index 21bdd5ef0d8..4d406493603 100644 --- a/gcc/fortran/trans-io.c +++ b/gcc/fortran/trans-io.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. If not see #include "trans-types.h" #include "trans-const.h" #include "options.h" +#include "file-prefix-map.h" /* remap_macro_filename() */ /* Members of the ioparm structure. 
*/ @@ -1026,7 +1027,7 @@ set_error_locus (stmtblock_t * block, tree var, locus * where) TREE_TYPE (p->field), locus_file, p->field, NULL_TREE); f = where->lb->file; - str = gfc_build_cstring_const (f->filename); + str = gfc_build_cstring_const (remap_macro_filename(f->filename)); str = gfc_build_addr_expr (pchar_type_node, str); gfc_add_modify (block, locus_file, str); diff --git a/gcc/testsuite/gfortran.dg/pr96069.f90 b/gcc/testsuite/gfortran.dg/pr96069.f90 new file mode 100644 index 000..de8bd3a14de --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr96069.f90 @@ -0,0 +1,11 @@ +! { dg-do compile } +! { dg-options "-ffile-prefix-map==MACRO-PREFIX" } + +subroutine f(name) + implicit none + character*(*) name + print *,name + return +end subroutine f + +! { dg-final { scan-assembler ".string\t\"MACRO-PREFIX" } } -- 2.27.0
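For anyone unfamiliar with what the prefix maps do to a filename, here is a small self-contained C sketch of the idea. It is not GCC's remap_macro_filename; sketch_remap_prefix and its buffer-based interface are invented for the illustration, and it handles a single old=new pair only.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if FILENAME starts with OLD_PREFIX, write
   NEW_PREFIX followed by the rest of FILENAME into OUT; otherwise copy
   FILENAME unchanged.  Returns OUT.  Output longer than OUT_LEN - 1
   characters is truncated by snprintf.  */
static const char *
sketch_remap_prefix (const char *filename, const char *old_prefix,
		     const char *new_prefix, char *out, size_t out_len)
{
  size_t old_len = strlen (old_prefix);

  if (strncmp (filename, old_prefix, old_len) == 0)
    snprintf (out, out_len, "%s%s", new_prefix, filename + old_len);
  else
    snprintf (out, out_len, "%s", filename);
  return out;
}
```

Note that an empty old prefix matches every filename in this sketch, which is how I read the "==MACRO-PREFIX" option in the test: the old part is empty, so the new prefix is simply prepended and the scan-assembler pattern can look for a string starting with MACRO-PREFIX.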
committed: cris: New peephole2 movulsr + test-case.
(The previous patch was also committed, FWIW, I just forgot to mention it.) Combine likes to change a zero-extension / and + shift as seen in the test-case source to a logical shift followed by an and of the shifted mask, like: lsrq 1,r0 and.d 0x7f,r0 This was observed in the hot loop of coremark crcu16 and crcu32, when doing other changes affecting instruction selection. While fixable by other means (like instruction costs or combine patches), I wanted to break this out from those "other means". The similarity to extant peephole optimizations is not deliberate. I noticed some paths to other peephole2 test-cases have changed due to moves and renaming, so I updated them. gcc: * config/cris/cris.md (movulsr): New peephole2. gcc/testsuite: * gcc.target/cris/peep2-movulsr.c: New test. --- gcc/config/cris/cris.md | 45 --- gcc/testsuite/gcc.target/cris/peep2-movulsr.c | 19 +++ 2 files changed, 60 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/cris/peep2-movulsr.c diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md index ae6a27f5f2c..c36a5402be3 100644 --- a/gcc/config/cris/cris.md +++ b/gcc/config/cris/cris.md @@ -2515,8 +2515,45 @@ (define_expand "casesi" ;; We have trouble with and:s and shifts. Maybe something is broken in ;; gcc? Or it could just be that bit-field insn expansion is a bit -;; suboptimal when not having extzv insns. -;; Testcase for the following four peepholes: gcc.dg/cris-peep2-xsrand.c +;; suboptimal when not having extzv insns. Or combine being over-eager +;; to canonicalize to "and", and ignorant on the benefits of the right +;; mixture of "and" and "zero-extend". + +;; Testcase for the following peephole: gcc.target/cris/peep2-movulsr.c + +;; Where equivalent and where the "and" argument doesn't fit "andq" but +;; is 16 bits or smaller, replace the "and" with a zero-extend preceding +;; the shift. A zero-extend is shorter and faster than "and" with a +;; 32-bit argument. 
+ +(define_peephole2 ; movulsr + [(parallel +[(set (match_operand:SI 0 "register_operand") + (lshiftrt:SI (match_dup 0) + (match_operand:SI 1 "const_int_operand"))) + (clobber (reg:CC CRIS_CC0_REGNUM))]) + (parallel +[(set (match_dup 0) + (and:SI (match_dup 0) + (match_operand 2 "const_int_operand"))) + (clobber (reg:CC CRIS_CC0_REGNUM))])] + "INTVAL (operands[2]) > 31 && INTVAL (operands[2]) <= 0xffff + && (((INTVAL (operands[2]) <= 0xff ? 0xff : 0xffff) >> INTVAL (operands[1])) + == INTVAL (operands[2]))" + [(parallel +;; The zero-extend is expressed as an "and", only because that's easier +;; than messing with zero-extend of a subreg. +[(set (match_dup 0) (and:SI (match_dup 0) (match_dup 3))) + (clobber (reg:CC CRIS_CC0_REGNUM))]) + (parallel +[(set (match_dup 0) (lshiftrt:SI (match_dup 0) (match_dup 1))) + (clobber (reg:CC CRIS_CC0_REGNUM))])] +{ + operands[3] += INTVAL (operands[2]) <= 0xff ? GEN_INT (0xff) : GEN_INT (0xffff); +}) + +;; Testcase for the following four peepholes: gcc.target/cris/peep2-xsrand.c (define_peephole2 ; asrandb [(parallel @@ -2635,7 +2672,7 @@ (define_peephole2 ; lsrandw ;; move.d reg_or_mem,reg_32 ;; and.d const_32__65535,reg_32 ;; Fix it with these two peephole2's. -;; Testcases: gcc.dg/cris-peep2-andu1.c gcc.dg/cris-peep2-andu2.c +;; Testcases: gcc.target/cris/peep2-andu1.c gcc.target/cris/peep2-andu2.c (define_peephole2 ; andu [(parallel @@ -2679,7 +2716,7 @@ (define_peephole2 ; andu ? QImode : amode))); }) -;; Since r186861, gcc.target/cris/peep2-andu2.c trigs this pattern, with which +;; Since r186861, gcc.target/cris/peep2-andu2.c trigs this pattern, with which ;; we fix up e.g.: ;; movu.b 254,$r9. 
;; and.d $r10,$r9 diff --git a/gcc/testsuite/gcc.target/cris/peep2-movulsr.c b/gcc/testsuite/gcc.target/cris/peep2-movulsr.c new file mode 100644 index 000..a19afce3982 --- /dev/null +++ b/gcc/testsuite/gcc.target/cris/peep2-movulsr.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-final { scan-assembler "movu.w " } } */ +/* { dg-final { scan-assembler "movu.b " } } */ +/* { dg-final { scan-assembler-not "and.. " } } */ +/* { dg-options "-O2" } */ + +/* Test the "movulsrb", "movulsrw" peephole2:s trivially. */ + +unsigned int +movulsrb (unsigned int x) +{ + return (x & 255) >> 1; +} + +unsigned int +movulsrw (unsigned int x) +{ + return (x & 65535) >> 4; +} -- 2.11.0
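The arithmetic behind the peephole2 can be checked in plain C. The sketch below is an illustration only: the function names are invented, and the 0xff / 0xffff constants are assumed from the comment ("16 bits or smaller") and from the test-case masks 255 and 65535. It models the condition under which "shift, then and with the shifted mask" may be replaced by "zero-extend, then shift".

```c
#include <stdbool.h>
#include <stdint.h>

/* Pick the byte or word zero-extension mask for a given "and" mask.  */
static uint32_t
sketch_extend_mask (uint32_t and_mask)
{
  return and_mask <= 0xff ? 0xff : 0xffff;
}

/* Model of the peephole2 condition: the "and" mask is too large for a
   5-bit immediate, fits in 16 bits, and equals the extension mask
   shifted right by the same amount.  The shift < 32 guard is only
   there to keep the C shift well-defined.  */
static bool
sketch_movulsr_applies (unsigned int shift, uint32_t and_mask)
{
  if (shift >= 32 || and_mask <= 31 || and_mask > 0xffff)
    return false;
  return (sketch_extend_mask (and_mask) >> shift) == and_mask;
}

/* What combine produces: logical shift right, then "and".  */
static uint32_t
sketch_shift_then_and (uint32_t x, unsigned int shift, uint32_t and_mask)
{
  return (x >> shift) & and_mask;
}

/* What the peephole2 emits instead: zero-extend, then shift.  */
static uint32_t
sketch_extend_then_shift (uint32_t x, unsigned int shift, uint32_t and_mask)
{
  return (x & sketch_extend_mask (and_mask)) >> shift;
}
```

For the example in the message, shift 1 with mask 0x7f satisfies the condition, and both forms then give the same result for any input.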
cris: Correct gcc_assert for atomic_fetch_op pattern
Yet another misnumbering of operands: the asserted non-overlap would be the only benign operand overlap. "Suddenly" exposed by g++.dg/cpp0x/pr81325.C when testing unrelated changes affecting register allocation. To wit, operands 2 and 1 are the only ones that are safe to overlap; it's just that it doesn't seem to make much sense to write the address of the atomic data as the atomic data. gcc: * config/cris/sync.md ("cris_atomic_fetch__1"): Correct gcc_assert of overlapping operands. --- gcc/config/cris/sync.md | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/config/cris/sync.md b/gcc/config/cris/sync.md index 30b5ea075af..70640dbd55b 100644 --- a/gcc/config/cris/sync.md +++ b/gcc/config/cris/sync.md @@ -128,7 +128,11 @@ (define_insn "cris_atomic_fetch__1" "mode == QImode || !TARGET_ATOMICS_MAY_CALL_LIBFUNCS" { /* Can't be too sure; better ICE if this happens. */ - gcc_assert (!reg_overlap_mentioned_p (operands[2], operands[1])); + gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[1]) + && !reg_overlap_mentioned_p (operands[0], operands[2]) + && !reg_overlap_mentioned_p (operands[0], operands[3]) + && !reg_overlap_mentioned_p (operands[1], operands[3]) + && !reg_overlap_mentioned_p (operands[2], operands[3])); if (cris_cpu_version == 10) return -- 2.11.0
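As a quick way to see which operand pairs the corrected assert covers, here is a toy C model. It is an illustration only: each operand is reduced to a single register number, which is a simplification of what reg_overlap_mentioned_p actually checks, and the function name is invented.

```c
#include <stdbool.h>

/* Toy model of the corrected assert: OP holds one register number per
   operand.  Every pair must differ except operands 1 and 2, which is
   the only overlap described as benign in the message.  */
static bool
sketch_fetch_op_operands_ok (const int op[4])
{
  return op[0] != op[1]
	 && op[0] != op[2]
	 && op[0] != op[3]
	 && op[1] != op[3]
	 && op[2] != op[3];
}
```

The old assert checked exactly the one pair that is left out here, which is the misnumbering the patch corrects.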
committed: cris: update recent patterns. Simplify cris_select_cc_mode.
The code in cris_select_cc_mode for selecting CC_NZmode was partly
inconsistent with the comment and partly seemed ambiguous. I couldn't
find a reason why I qualified selection of CC_NZmode on the setting
operation once a matching user was spotted, so I just removed that.

The cris.c update was due to observing the new test-case failing; the
CC_NZmode compare wasn't eliminated.

The recently re-instated adds/addu/subs/subu/bound patterns are rewritten
to replace the use of match_operator with iterators.

gcc:
	* config/cris/cris.c (cris_select_cc_mode): Always return
	CC_NZmode for matching comparisons. Clarify comments.
	* config/cris/cris-modes.def: Clarify mode comment.
	* config/cris/cris.md (plusminus, plusminusumin, plusumin): New
	code iterators.
	(addsub, addsubbo, nd): New code iterator attributes.
	("*qihi"): Rename from "*extopqihi". Use code iterator
	constructs instead of match_operator constructs.
	("*si"): Similar from "*extopsi".
	("*addqihi_swap"): Similar from "*addxqihi_swap".
	("*si_swap"): Similar from "*extopsi_swap".

gcc/testsuite:
	* gcc.target/cris/pr93372-39.c: New test.
---
 gcc/config/cris/cris-modes.def | 17 +++--
 gcc/config/cris/cris.c | 16 +
 gcc/config/cris/cris.md| 105 -
 gcc/testsuite/gcc.target/cris/pr93372-39.c | 19 ++
 4 files changed, 90 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-39.c

diff --git a/gcc/config/cris/cris-modes.def b/gcc/config/cris/cris-modes.def
index 1aaf12a0f5b..874e4c19657 100644
--- a/gcc/config/cris/cris-modes.def
+++ b/gcc/config/cris/cris-modes.def
@@ -25,9 +25,10 @@ along with GCC; see the file COPYING3. If not see
    have ordinary compares and incidental condition-code settings from
    preceding instructions, setting a subset of N, Z, V and C to usable
    values, from the perspective of comparing the result against zero
-   (fpcraz). The two subsets meaningful to gcc are all of N, Z, V, C
-   versus just N, Z; some CC-users care only about N and/or Z and some
-   that care about at least one of those flags together with V and/or C.
+   (referred to below as "fpcraz"). The two subsets meaningful to gcc are
+   all of N, Z, V, C versus just N, Z; some CC-users care only about N
+   and/or Z and some that care about at least one of those flags together
+   with V and/or C.
 
    The plain "CC_MODE (CC)" (which is always present in gcc), is used to
    reflect the "unoptimized" state, where the CC-setter is a compare
@@ -37,9 +38,13 @@ along with GCC; see the file COPYING3. If not see
    or if optimization of CC-setter and CC-users, when CCmode setters can be
    changed or replaced by either CC_NZmode or CC_NZVCmode. To wit, all
    users that require CC_NZVCmode must match only that mode at any time.
-   All other users must match all CCmodes. All setters that set only
-   CC_NZmode must set only that mode. All other setters must match
-   setting all CCmodes. */
+   All other users must match all of CCmode, CC_NZmode, and CC_NZVCmode.
+   All setters that set only CC_NZmode must match setting only that mode.
+   All other setters must match setting all of CCmode, CC_NZmode, and
+   CC_NZVCmode.
+
+   There's also other modes (i.e. CC_ZnNmode) with a separate set of
+   setters and users not matched by the others. */
 
 /* Z and N flags only. For a condition-code setter: only the Z and N
    flags are set to usable values, fpcraz. For a condition-code user: the
diff --git a/gcc/config/cris/cris.c b/gcc/config/cris/cris.c
index 2bad9393c6e..b26b9f2e883 100644
--- a/gcc/config/cris/cris.c
+++ b/gcc/config/cris/cris.c
@@ -1530,21 +1530,11 @@ cris_select_cc_mode (enum rtx_code op, rtx x, rtx y)
   if (GET_MODE_CLASS (GET_MODE (x)) != MODE_INT || y != const0_rtx)
     return CCmode;
 
-  /* If we have a comparison that doesn't have to look at V or C, check
-     operand x; if it's a valid operator, return CC_NZmode, else CCmode,
-     so we only use CC_NZmode for the cases where we don't actually have
-     both V and C valid. */
+  /* If we have a comparison that doesn't have to look at V or C, return
+     CC_NZmode. */
   if (op == EQ || op == NE || op == GTU || op == LEU
       || op == LT || op == GE)
-    {
-      enum rtx_code e = GET_CODE (x);
-
-      /* Mentioning the rtx_code here is required but not sufficient: the
-         insn also needs to be decorated with (and the
-         anonymization prefix for a named pattern). */
-      return e == PLUS || e == MINUS || e == MULT || e == NOT || e == NEG
-        ? CC_NZmode : CCmode;
-    }
+    return CC_NZmode;
 
   /* We should only get here for comparison operators. */
   gcc_assert (op == GEU || op == LTU || op == GT || op == LE);
diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index e700819d510..ae6a27f5f2c 100644
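
For readers without the tree at hand, here is a rough standalone sketch
of the selection logic as it stands after this change. It is not the
cris.c code: the enum names and the int_compare_with_zero flag are
stand-ins invented here for the rtx codes and for the MODE_INT /
const0_rtx test. The mode returned for GEU/LTU/GT/LE is not visible in
the hunk above; the sketch assumes it is the all-flags mode.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the comparison rtx codes.  */
enum cmp_code
{
  CMP_EQ, CMP_NE, CMP_GTU, CMP_LEU, CMP_LT, CMP_GE,
  CMP_GEU, CMP_LTU, CMP_GT, CMP_LE
};

/* Hypothetical stand-ins for CCmode, CC_NZmode and CC_NZVCmode.  */
enum cc_mode { MODE_CC, MODE_CC_NZ, MODE_CC_NZVC };

/* INT_COMPARE_WITH_ZERO models "x has an integer mode and y is zero".  */
enum cc_mode select_cc_mode (enum cmp_code op, bool int_compare_with_zero)
{
  if (!int_compare_with_zero)
    return MODE_CC;

  switch (op)
    {
    /* Comparisons that never need to look at V or C.  */
    case CMP_EQ: case CMP_NE: case CMP_GTU:
    case CMP_LEU: case CMP_LT: case CMP_GE:
      return MODE_CC_NZ;

    /* GEU, LTU, GT, LE: assumed here to need all of N, Z, V, C.  */
    default:
      return MODE_CC_NZVC;
    }
}
```

The point of the simplification is visible in the switch: the choice no
longer depends on which operation set the flags, only on the comparison
operator.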
committed: cris.md: Reinstate add/sub with extend
When cleaning out the multitude of patterns with unknown coverage, this
one went the way of the bathwater. Its use is barely common enough to
mark when diffing libgcc, and has a minimal impact on
performance-testsuites. Anyway, reinstated with a couple of test-cases.
It's suboptimal of gcc-core not to make use of the SImode pattern when
performing HImode; see the FIXME (which is actually also reinstated).

This version uses match_operator, for continuity, but will be replaced
with a version making use of iterators (like it does for the mode).

gcc:
	* config/cris/cris.md ("*extopqihi", "*extopsi_swap")
	("*extopsi", "*addxqihi_swap"): Reinstate.

gcc/testsuite:
	* gcc.target/cris/pr93372-36.c, gcc.target/cris/pr93372-37.c,
	gcc.target/cris/pr93372-38.c: New tests.
---
 gcc/config/cris/cris.md| 83 ++
 gcc/testsuite/gcc.target/cris/pr93372-36.c | 37 +
 gcc/testsuite/gcc.target/cris/pr93372-37.c | 26 ++
 gcc/testsuite/gcc.target/cris/pr93372-38.c | 30 +++
 4 files changed, 176 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-36.c
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-37.c
 create mode 100644 gcc/testsuite/gcc.target/cris/pr93372-38.c

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 081041fa245..e700819d510 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -1108,6 +1108,89 @@ (define_insn "*sub3"
   [(set_attr "slottable" "yes,yes,yes,yes,no,no")
    (set_attr "cc" "normal,normal,clobber,clobber,normal,normal")])
 
+;; Extend versions (zero/sign) of normal add/sub (no side-effects).
+
+;; QImode to HImode
+;; FIXME: GCC should widen.
+
+(define_insn "*extopqihi"
+  [(set (match_operand:HI 0 "register_operand" "=r,r,r,r")
+        (match_operator:HI
+         3 "cris_additive_operand_extend_operator"
+         [(match_operand:HI 1 "register_operand" "0,0,0,r")
+          (match_operator:HI
+           4 "cris_extend_operator"
+           [(match_operand:QI 2 "nonimmediate_operand" "r,Q>,m,!To")])]))
+   (clobber (reg:CC CRIS_CC0_REGNUM))]
+  "GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD
+   && (operands[1] != frame_pointer_rtx || GET_CODE (operands[3]) != PLUS)"
+  "@
+   %x3%E4.%m4 %2,%0
+   %x3%E4.%m4 %2,%0
+   %x3%E4.%m4 %2,%0
+   %x3%E4.%m4 %2,%1,%0"
+  [(set_attr "slottable" "yes,yes,no,no")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*extopsi"
+  [(set (match_operand:SI 0 "register_operand" "=r,r,r,r")
+        (match_operator:SI
+         3 "cris_operand_extend_operator"
+         [(match_operand:SI 1 "register_operand" "0,0,0,r")
+          (match_operator:SI
+           4 "cris_extend_operator"
+           [(match_operand:BW 2 "nonimmediate_operand" "r,Q>,m,!To")])]))
+   (clobber (reg:CC CRIS_CC0_REGNUM))]
+  "(GET_CODE (operands[3]) != UMIN || GET_CODE (operands[4]) == ZERO_EXTEND)
+   && GET_MODE_SIZE (GET_MODE (operands[0])) <= UNITS_PER_WORD
+   && (operands[1] != frame_pointer_rtx || GET_CODE (operands[3]) != PLUS)"
+  "@
+   %x3%E4 %2,%0
+   %x3%E4 %2,%0
+   %x3%E4 %2,%0
+   %x3%E4 %2,%1,%0"
+  [(set_attr "slottable" "yes,yes,no,no")])
+
+;; We may have swapped operands for add or bound.
+;; For commutative operands, these are the canonical forms.
+
+;; QImode to HImode
+
+(define_insn "*addxqihi_swap"
+  [(set (match_operand:HI 0 "register_operand" "=r,r,r,r")
+        (plus:HI
+         (match_operator:HI
+          3 "cris_extend_operator"
+          [(match_operand:QI 2 "nonimmediate_operand" "r,Q>,m,!To")])
+         (match_operand:HI 1 "register_operand" "0,0,0,r")))
+   (clobber (reg:CC CRIS_CC0_REGNUM))]
+  "operands[1] != frame_pointer_rtx"
+  "@
+   add%e3.b %2,%0
+   add%e3.b %2,%0
+   add%e3.b %2,%0
+   add%e3.b %2,%1,%0"
+  [(set_attr "slottable" "yes,yes,no,no")
+   (set_attr "cc" "clobber")])
+
+(define_insn "*extopsi_swap"
+  [(set (match_operand:SI 0 "register_operand" "=r,r,r,r")
+        (match_operator:SI
+         4 "cris_plus_or_bound_operator"
+         [(match_operator:SI
+           3 "cris_extend_operator"
+           [(match_operand:BW 2 "nonimmediate_operand" "r,Q>,m,!To")])
+          (match_operand:SI 1 "register_operand" "0,0,0,r")]))
+   (clobber (reg:CC CRIS_CC0_REGNUM))]
+  "(GET_CODE (operands[4]) != UMIN || GET_CODE (operands[3]) == ZERO_EXTEND)
+   && operands[1] != frame_pointer_rtx"
+  "@
+   %x4%E3 %2,%0
+   %x4%E3 %2,%0
+   %x4%E3 %2,%0
+   %x4%E3 %2,%1,%0"
+  [(set_attr "slottable" "yes,yes,no,no")])
+
 ;; This is the special case when we use what corresponds to the
 ;; instruction above in "casesi". Do *not* change it to use the generic
 ;; pattern and "REG 15" as pc; I did that and it led to madness and
diff --git a/gcc/testsuite/gcc.target/cris/pr93372-36.c b/gcc/testsuite/gcc.target/cris/pr93372-36.c
new file mode 100644
index 000..84fbdb7091d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/cris/pr93372-36.c
@@ -0,0 +1,37 @@
+/* Check that we produce sign-
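
As a rough idea of what these patterns are for, the following C
functions have the shape "register op extended narrower operand" that
add/sub-with-extend and bound describe. This is only a sketch with
made-up function names; it is not taken from the new test-cases (their
contents are cut off above), and whether a given compiler version
actually selects these patterns for it is an assumption, not something
shown in this mail.

```c
/* a + sign_extend (byte): the shape of the sign-extending add.  */
int add_sext_byte (int a, const signed char *p)
{
  return a + *p;
}

/* a - zero_extend (halfword): the shape of the zero-extending sub.  */
unsigned int sub_zext_word (unsigned int a, const unsigned short *p)
{
  return a - *p;
}

/* umin (a, zero_extend (byte)): the shape of bound.  Note that the
   insn conditions only accept UMIN together with ZERO_EXTEND.  */
unsigned int bound_zext_byte (unsigned int a, const unsigned char *p)
{
  unsigned int b = *p;
  return a < b ? a : b;
}
```

Compiling something like this at -O2 for cris-elf and inspecting the
assembly is one way to check whether the reinstated patterns are used.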
Re: [PATCH] PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485
Early ping.

> Sent: Monday, 29 June 2020 at 22:58
> From: "Harald Anlauf"
> To: "fortran", "gcc-patches"
> Subject: [PATCH] PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485
>
> Dear all,
>
> here's a couple of NULL pointer dereferences on invalid code.
>
> Regtested on x86_64-pc-linux-gnu.
>
> OK for master?
>
> Thanks,
> Harald
>
>
> PR fortran/95980 - ICE on using sync images with -fcheck=bounds
>
> In SELECT TYPE, the argument may be an incorrectly specified unlimited
> polymorphic variable. Avoid a NULL pointer dereference for clean error
> recovery.
>
> gcc/fortran/
> 	PR fortran/95980
> 	* match.c (copy_ts_from_selector_to_associate, build_class_sym):
> 	Distinguish between unlimited polymorphic and ordinary variables
> 	to avoid NULL pointer dereference.
> 	* resolve.c (resolve_select_type):
> 	Distinguish between unlimited polymorphic and ordinary variables
> 	to avoid NULL pointer dereference.
Re: [wwwdocs] Document new G++ features
On Fri, 3 Jul 2020, Marek Polacek via Gcc-patches wrote:
> Pushed.

Nice. And thanks for doing this along the way. That's beneficial for
users/testers of GCC 11 as it evolves, and also helps not forget things
during the release process.

Gerald
Re: *ping* [patch, fortran] PR 27318, warn if interfaces do not match
Hi Paul and Dominique,

> The patch looks fine to me. If Dominique has nothing to report then it
> is OK for trunk.

committed. Thanks!

Regards

Thomas
Re: [PATCH] nvptx: Add support for vadd.add and vsub.add instructions
[ fixed $subject ]

On 7/3/20 7:20 PM, Roger Sayle wrote:
>
> The following patch adds support for three-input addition instructions
> to the nvptx backend.
> The PTX ISA's "vadd.u32.u32.u32.add d, a, b, c" instruction effectively
> implements 32-bit d = a+b+c,
> and the "vsub.u32.u32.u32 d,a,b,c" instruction that provides 32-bit
> d = (a-b)+c. The hope is that
> these mnemonics help ptxas generate the low-level hardware's IADD3
> instruction.
>
> Tested by "make" and "make -k check" on --build=nvptx-none hosted on
> x86_64-pc-linux-gnu
> with no new regressions.
>
> [PATCH] nvptx: Add support for vadd.add and vsub.add instructions
>
> 2020-07-03 Roger Sayle
>
> gcc/ChangeLog:
> 	* config/nvptx/nvptx.md (vadd_addsi4): New instruction.
> 	(vsub_addsi4): New instruction.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.target/nvptx/vadd_add.c: New test.
> 	* gcc.target/nvptx/vsub_add.c: New test.
>
>
> Hopefully, I've got the patch/diff file format correct this time.
> Ok for mainline?
>

Hi Roger,

the patch looks fine, please apply.

I wonder though, AFAIU the define_insn names are not standard names, so
could they be defined with the '*' prefix? If so, you could add that as
well.

Thanks,
- Tom
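
For reference, the arithmetic described in the quoted mail can be
written down in plain C as below. This is a sketch of the stated
semantics only (32-bit d = a+b+c and d = (a-b)+c), assuming ordinary
wrap-around unsigned arithmetic; the function names are made up for this
note and are neither the define_insn names nor the contents of the new
tests.

```c
#include <stdint.h>

/* d = a + b + c, modulo 2^32: the shape the vadd pattern is described
   as covering.  */
uint32_t vadd_add_model (uint32_t a, uint32_t b, uint32_t c)
{
  return a + b + c;
}

/* d = (a - b) + c, modulo 2^32: the shape the vsub pattern is described
   as covering.  */
uint32_t vsub_add_model (uint32_t a, uint32_t b, uint32_t c)
{
  return (a - b) + c;
}
```

Source of this shape, compiled for nvptx-none, is presumably what the
new patterns are meant to match.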