[x86 SSE] Improve handling of ternlog instructions in i386/sse.md

2024-05-12 Thread Roger Sayle
than it might have been. I propose to remove the vestigial patterns in a follow-up patch, once this approach has baked (proven to be stable) on mainline. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new f

[gcc r15-390] arm: Use utxb rN, rM, ror #8 to implement zero_extract on armv6.

2024-05-12 Thread Roger Sayle via Gcc-cvs
https://gcc.gnu.org/g:46077992180d6d86c86544df5e8cb943492d3b01 commit r15-390-g46077992180d6d86c86544df5e8cb943492d3b01 Author: Roger Sayle Date: Sun May 12 16:27:22 2024 +0100 arm: Use utxb rN, rM, ror #8 to implement zero_extract on armv6. Examining the code generated

[gcc r15-366] i386: Improve V[48]QI shifts on AVX512/SSE4.1

2024-05-10 Thread Roger Sayle via Gcc-cvs
https://gcc.gnu.org/g:f5a8cdc1ef5d6aa2de60849c23658ac5298df7bb commit r15-366-gf5a8cdc1ef5d6aa2de60849c23658ac5298df7bb Author: Roger Sayle Date: Fri May 10 20:26:40 2024 +0100 i386: Improve V[48]QI shifts on AVX512/SSE4.1 The following one line patch improves the code generated

Re: [x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-10 Thread Roger Sayle
d. Thanks again, Roger > From: Hongtao Liu > On Fri, May 10, 2024 at 6:26 AM Roger Sayle > wrote: > > > > > > The following one line patch improves the code generated for V8QI and > > V4QI shifts when AVX512BW and AVX512VL functionality is available. > +

[x86 PATCH] Improve V[48]QI shifts on AVX512

2024-05-09 Thread Roger Sayle
ch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-05-09 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial): Don't a

[gcc r15-352] Constant fold {-1,-1} << 1 in simplify-rtx.cc

2024-05-09 Thread Roger Sayle via Gcc-cvs
https://gcc.gnu.org/g:f2449b55fb2d32fc4200667ba79847db31f6530d commit r15-352-gf2449b55fb2d32fc4200667ba79847db31f6530d Author: Roger Sayle Date: Thu May 9 22:45:54 2024 +0100 Constant fold {-1,-1} << 1 in simplify-rtx.cc This patch addresses a missed optimization oppor

[gcc r15-222] PR target/106060: Improved SSE vector constant materialization on x86.

2024-05-07 Thread Roger Sayle via Gcc-cvs
https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417 commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417 Author: Roger Sayle Date: Tue May 7 07:14:40 2024 +0100 PR target/106060: Improved SSE vector constant materialization on x86. This patch resolves PR target

RE: [PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-05-02 Thread Roger Sayle
> From: Richard Biener > On Thu, May 2, 2024 at 11:34 AM Roger Sayle > wrote: > > > > > > > From: Richard Biener On Fri, Apr 26, > > > 2024 at 10:19 AM Roger Sayle > > > wrote: > > > > > > > > This patch address

RE: [PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-05-02 Thread Roger Sayle
> From: Richard Biener > On Fri, Apr 26, 2024 at 10:19 AM Roger Sayle > wrote: > > > > This patch addresses PR middle-end/111701 where optimization of > > signbit(x*x) using tree_nonnegative_p incorrectly eliminates a > > floating point multiplication

RE: [C PATCH] PR c/109618: ICE-after-error from error_mark_node.

2024-04-30 Thread Roger Sayle
> On Tue, Apr 30, 2024 at 10:23 AM Roger Sayle > wrote: > > Hi Richard, > > Thanks for looking into this. > > > > It’s not the call to size_binop_loc (for CEIL_DIV_EXPR) that's > > problematic, but the call to fold_convert_loc (loc, size_type_node, value

RE: [C PATCH] PR c/109618: ICE-after-error from error_mark_node.

2024-04-30 Thread Roger Sayle
aversal checking error_operand_p within the unary and binary operators of an expression tree. Please let me know what you think/recommend. Best regards, Roger -- > -Original Message----- > From: Richard Biener > Sent: 30 April 2024 08:38 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org > Su

[C PATCH] PR c/109618: ICE-after-error from error_mark_node.

2024-04-29 Thread Roger Sayle
) a CEIL_DIV_EXPR in the common case that "char" is a single-byte. The current code relies on the middle-end's tree folding to recognize that CEIL_DIV_EXPR of integer_one_node is a no-op, that can be optimized away. Ok for mainline? 2024-04-30 Roger Sayle gcc/c-family/ChangeLog P

[PATCH] PR tree-opt/113673: Avoid load merging from potentially trapping additions.

2024-04-28 Thread Roger Sayle
a part of the compiler that I'm less familiar with. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-04-28 Roger Sayle gcc/ChangeLog PR tree-optimi

[PATCH] PR middle-end/111701: signbit(x*x) vs -fsignaling-nans

2024-04-26 Thread Roger Sayle
with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-04-26 Roger Sayle gcc/ChangeLog PR middle-end/111701 * fold-const.cc (tree_binary_nonnegative_warnv_p) : Split handling of floating poi

[PATCH] PR target/114187: Fix ?Fmode SUBREG simplification in simplify_subreg.

2024-03-03 Thread Roger Sayle
ed to this lapse. Using lowpart_subreg should avoid/reduce confusion in future. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-03-03 Roger Sayle gcc/ChangeL

[x86_64 PATCH] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

2024-02-04 Thread Roger Sayle
with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-02-05 Roger Sayle gcc/ChangeLog PR target/113690 * config/i386/i386-features.cc (timode_convert_cst): New helper function to convert

[tree-ssa PATCH] PR target/113560: Enhance is_widening_mult_rhs_p.

2024-01-29 Thread Roger Sayle
rap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-01-30 Roger Sayle gcc/ChangeLog PR target/113560 * tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use range information via tree_non_zero_bits to check

[libatomic PATCH] PR other/113336: Fix libatomic testsuite regressions on ARM.

2024-01-28 Thread Roger Sayle
This patch is a revised version of the fix for PR other/113336. This patch has been tested on arm-linux-gnueabihf with --with-arch=armv6 with make bootstrap and make -k check where it fixes all of the FAILs in libatomic. Ok for mainline? 2024-01-28 Roger Sayle Victor Do

[middle-end PATCH] Constant fold {-1,-1} << 1 in simplify-rtx.cc

2024-01-26 Thread Roger Sayle
checks that VEC_SELECT or some funky (future) rtx_code doesn't cause problems. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline (in stage 1)? 2024-01-26 Roger Sayle gcc

RE: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-25 Thread Roger Sayle
in stage 1)? 2024-01-25 Roger Sayle Hongtao Liu gcc/ChangeLog PR target/106060 * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. (struct ix86_vec_bcast_map_simode_t): New type for table below. (ix86_vec_bcast_map_simode): Table of SImode

RE: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-19 Thread Roger Sayle
l might lead to a code quality regression, if RTL expansion doesn't know to lower it back to use PLUS on those targets with lea but without rotate. > From: Richard Biener > Sent: 19 January 2024 11:04 > On Thu, Jan 18, 2024 at 8:55 PM Roger Sayle > wrote: > > > > This patch tweak

[middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-18 Thread Roger Sayle
r1,r2,r1 j_s.d [blink] add2 r0,r3,r0 This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-01-18 Roger Sayle gcc/ChangeLog * expmed.

[x86 PATCH] PR target/106060: Improved SSE vector constant materialization.

2024-01-16 Thread Roger Sayle
ootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-01-16 Roger Sayle gcc/ChangeLog PR target/106060 * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. (struct ix86_vec_bcast_map_simode_t)

[PATCH] PR rtl-optimization/111267: Improved forward propagation.

2024-01-15 Thread Roger Sayle
t .L6: xorl %eax, %eax ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Additionally, it also resolves the FAIL for gcc.target/i386/pr82580.c. Ok for mainline? 2024-01-1

[PATCH/RFC] Add --with-dwarf4 configure option.

2024-01-14 Thread Roger Sayle
originally misread the documentation and assumed --with-dwarf4 was already supported. 2024-01-14 Roger Sayle gcc/ChangeLog * configure.ac: Add a --with-dwarf4 option. * configure: Regenerate. * config/tm-dwarf4.h: New target file to define

RE: [libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-11 Thread Roger Sayle
-threaded) run-time test to search for race conditions, and confirm its implementations are correctly serializing. Please let me know what you think. Best regards, Roger -- > -Original Message- > From: Richard Earnshaw > Sent: 10 January 2024 15:34 > To: Roger Sayle ; gcc-patches

[libatomic PATCH] Fix testsuite regressions on ARM [raspberry pi].

2024-01-08 Thread Roger Sayle
testcases]. If this looks like the correct fix, I'm not confident with rebuilding Makefile.in with the correct version of automake, so I'd very much appreciate it if someone/the reviewer/maintainer could please check this in for me. Thanks in advance. 2024-01-08 Roger Sayle libatomic/ChangeLog

RE: [x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2024-01-06 Thread Roger Sayle
. * gcc.target/i386/pr90773-17.c: Likewise. Thanks in advance. Roger -- > -Original Message- > From: Hongtao Liu > Sent: 02 January 2024 05:40 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org; Uros Bizjak > Subject: Re: [x86_64 PATCH] PR target/112992: Optimize mode for broadc

[x86 PATCH] PR target/113231: Improved costs in Scalar-To-Vector (STV) pass.

2024-01-06 Thread Roger Sayle
pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2024-01-06 Roger Sayle gcc/ChangeLog PR target/113231 * config/i386/i386-features.cc (compute_convert_gain): Include the overhead of e

[middle-end PATCH take #2] Only call targetm.truly_noop_truncation for truncations.

2023-12-31 Thread Roger Sayle
ted on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? Hopefully this revision tests cleanly on the linaro.org CI pipeline. 2023-12-31 Roger Sayle gcc/ChangeLog * combine.cc (make_extrac

RE: [x86_PATCH] peephole2 to resolve failure of gcc.target/i386/pr43644-2.c

2023-12-31 Thread Roger Sayle
Hi Uros, > From: Uros Bizjak > Sent: 28 December 2023 10:33 > On Fri, Dec 22, 2023 at 11:14 AM Roger Sayle > wrote: > > > > This patch resolves the failure of pr43644-2.c in the testsuite, a > > code quality test I added back in July, that started failing as the

RE: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread Roger Sayle
Hi Jeff, Thanks for the speedy review. > On 12/28/23 07:59, Roger Sayle wrote: > > This patch fixes PR rtl-optimization/104914 by tweaking/improving the > > way that fields are written into a pseudo register that needs to be > > kept sign extended. > Well, I think "

[PATCH] MIPS: Implement TARGET_INSN_COSTS

2023-12-28 Thread Roger Sayle
The current (default) behavior is that when the target doesn't define TARGET_INSN_COST the middle-end uses the backend's TARGET_RTX_COSTS, so multiplications are slower than additions, but about the same size when optimizing for size (with -Os or -Oz). All of this gets disabled with your

[middle-end PATCH] Only call targetm.truly_noop_truncation for truncations.

2023-12-28 Thread Roger Sayle
d that rely on the default behaviour of silently returning true for any (invalid) input. These are fixed below. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023

[PATCH] Improved RTL expansion of field assignments into promoted registers.

2023-12-28 Thread Roger Sayle
s been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. The cc1 from a cross-compiler to mips64 appears to generate much better code for the above test case. Ok for mainline? 2023-12-28 Roger Sayle gcc/Cha

RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread Roger Sayle
> > > What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't > > > actually a truncation! The output precision is first, the input > > > precision is second. The docs explicitly state the output precision > > > should be smaller than the input precision (which makes sense for > > >

RE: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-24 Thread Roger Sayle
> What's exceedingly weird is T_N_T_M_P (DImode, SImode) isn't actually a > truncation! The output precision is first, the input precision is second. > The docs > explicitly state the output precision should be smaller than the input > precision > (which makes sense for truncation). > >

RE: Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-23 Thread Roger Sayle
> There's a PR in Bugzilla around this representational issue on MIPS, but I can't find > it straight away. Found it. It's PR rtl-optimization/104914, where we've already discussed this in comments #15 and #16. > -Original Message- > From: Roger Sayle > Sent: 24 Dece

Re: [PATCH v3] EXPR: Emit an truncate if 31+ bits polluted for SImode

2023-12-23 Thread Roger Sayle
Hi YunQiang (and Jeff), > MIPS claims TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true > based on that the hard register is always sign-extended, but here > the hard register is polluted by zero_extract. I suspect that the bug here is that the MIPS backend shouldn't be returning true

[ARC PATCH] Table-driven ashlsi implementation for better code/rtx_costs.

2023-12-23 Thread Roger Sayle
j_s [blink] Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's and/or Jeff's testing? [Thanks again to Jeff for finding the typo in my last ARC patch] 2023-12-23 Roger Sayle gcc

[x86_64 PATCH] PR target/112992: Optimize mode for broadcast of constants.

2023-12-22 Thread Roger Sayle
check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-12-21 Roger Sayle gcc/ChangeLog PR target/112992 * config/i386/i386-expand.cc (ix86_convert_const_wide_int_to_broadcast): Allow call

[x86_PATCH] peephole2 to resolve failure of gcc.target/i386/pr43644-2.c

2023-12-22 Thread Roger Sayle
, %rdx ret which I believe is optimal. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-12-21 Roger Sayle gcc/ChangeLog PR target/43644

[x86 PATCH] Improved TImode (128-bit) integer constants on x86_64.

2023-12-18 Thread Roger Sayle
-m32}, and with/without -march=cascadelake with no new failures. Ok for mainline? 2023-12-18 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_convert_const_wide_int_to_broadcast): Remove static. (ix86_expand_move): Don't attempt to convert wide

[PING] PR112380: Defend against CLOBBERs in RTX expressions in combine.cc

2023-12-10 Thread Roger Sayle
ications would also lead to better code generation, but I've not been able to find any examples on x86_64. This patch has been retested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-11-12 Roger

RE: [ARC PATCH] Add *extvsi_n_0 define_insn_and_split for PR 110717.

2023-12-07 Thread Roger Sayle
(normally) like turning two instructions into three. Fingers-crossed the attached patch works better on the nightly testers. Thanks in advance, Roger -- > -Original Message- > From: Jeff Law > Sent: 07 December 2023 14:47 > To: Roger Sayle ; gcc-patches@gcc.gnu.org > Cc: 'Clau

[ARC PATCH] Add *extvsi_n_0 define_insn_and_split for PR 110717.

2023-12-05 Thread Roger Sayle
r mainline if this passes Claudiu's nightly testing? 2023-12-05 Roger Sayle gcc/ChangeLog * config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to implement SImode sign extract using an AND, XOR and MINUS sequence. gcc/testsuite/ChangeLog * gcc.t
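
The AND, XOR and MINUS sequence mentioned in this ChangeLog entry is presumably the familiar sign-extension identity for a field that starts at bit 0. A minimal C sketch follows, assuming a field width n between 1 and 32; the helper name sext_low_bits is hypothetical and does not come from the patch, and the final cast relies on GCC's wrapping conversion of out-of-range unsigned values.

```c
#include <stdint.h>

/* Sign-extend the low n bits of x (1 <= n <= 32) using AND, XOR and MINUS.
   sext_low_bits is a hypothetical name used only for this illustration. */
static int32_t sext_low_bits(uint32_t x, unsigned n)
{
    uint32_t sign = (uint32_t)1 << (n - 1);   /* sign bit of the n-bit field */
    uint32_t mask = sign | (sign - 1);        /* low n bits set */
    uint32_t t = ((x & mask) ^ sign) - sign;  /* AND, then XOR, then MINUS */
    /* Converting values above INT32_MAX is implementation-defined in C;
       GCC wraps modulo 2^32, which is the behaviour assumed here. */
    return (int32_t)t;
}
```

The XOR flips the field's sign bit and the subtraction undoes the flip while borrowing through the upper bits, so no shifts are needed.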

[PATCH] Workaround array_slice constructor portability issues (with older g++).

2023-12-03 Thread Roger Sayle
proaches are investigated. For example, an ARRAY_SLICE(table) macro might be appropriate if there isn't an easy/portable template resolution solution. Thoughts? 2023-12-03 Roger Sayle gcc/c-family/ChangeLog * c-attribs.cc (c_common_gnu_attribute_table): Use an explicit arr

[RISC-V PATCH] Improve style to work around PR 60994 in host compiler.

2023-12-01 Thread Roger Sayle
using g++ 4.8.5 as the host compiler. Ok for mainline? 2023-12-01 Roger Sayle gcc/ChangeLog * config/riscv/riscv-vsetvl.cc (csetvl_info::parse_insn): Rename local variable from demand_flags to dflags, to avoid conflicting with (enumeration) type of the same name. Thanks

[PATCH] PR112380: Defend against CLOBBERs in RTX expressions in combine.cc

2023-11-12 Thread Roger Sayle
h the fall-out sufficient for x86_64 to bootstrap and regression test without new failures. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-11-12 Roger Sa

[x86 PATCH] Improve reg pressure of double-word right-shift then truncate.

2023-11-12 Thread Roger Sayle
no new failures. Ok for mainline? 2023-11-12 Roger Sayle gcc/ChangeLog * config/i386/i386.md (3_doubleword_lowpart): New define_insn_and_split to optimize register usage of doubleword right shifts followed by truncation. Thanks in advance, Roger -- diff --git a/gcc/

[ARC PATCH] Consistent use of whitespace in assembler templates.

2023-11-06 Thread Roger Sayle
-assembler needed to be updated to use \s+ instead of testing for a TAB or a space explicitly. Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's nightly testing? 2023-11-06 Roger Sayle gcc

[ARC PATCH] Improved DImode rotates and right shifts by one bit.

2023-11-06 Thread Roger Sayle
etter. Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's nightly testing? 2023-11-06 Roger Sayle gcc/ChangeLog * config/arc/arc.md (UNSPEC_ARC_CC_NEZ): New UNSPEC that

[ARC PATCH] Provide a TARGET_FOLD_BUILTIN target hook.

2023-11-03 Thread Roger Sayle
ly testing? 2023-11-03 Roger Sayle gcc/ChangeLog * config/arc/arc.cc (TARGET_FOLD_BUILTIN): Define to arc_fold_builtin. (arc_fold_builtin): New function. Convert ARC_BUILTIN_SWAP into a rotate. Evaluate ARC_BUILTIN_NORM and ARC_BUILTIN_NORMW of constant

[AVR PATCH] Improvements to SImode and PSImode shifts by constants.

2023-11-02 Thread Roger Sayle
ulator, where the compile-only tests in the gcc testsuite show no regressions. If someone could test this more thoroughly that would be great. 2023-11-02 Roger Sayle gcc/ChangeLog * config/avr/avr.cc (ashlqi3_out): Fix indentation whitespace. (ashlhi3_out): Li

[AVR PATCH] Optimize (X>>C)&1 for C in [1, 4, 8, 16, 24] in *insv.any_shift..

2023-11-02 Thread Roger Sayle
on x86_64, without a simulator, where the compile-only tests in the gcc testsuite show no regressions. If someone could test this more thoroughly that would be great. 2023-11-02 Roger Sayle gcc/ChangeLog * config/avr/avr.md (*insv.any_shift.): Optimize special cases

RE: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Roger Sayle
Hi Uros, > From: Uros Bizjak > Sent: 01 November 2023 10:05 > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation > using peephole2. > > On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle > wrote: > > > > > > This patch is a follow-u

[x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-10-30 Thread Roger Sayle
rget_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-30 Roger Sayle gcc/ChangeLog PR target/110551 * config/i386/i386.md (*bmi2_umul3_1): Tidy condition as operands[2] with predicate register_operand must be !MEM_P. (peephole2): Optimize a mulx fo

RE: [ARC PATCH] Improve DImode left shift by a single bit.

2023-10-30 Thread Roger Sayle
Hi Jeff, > From: Jeff Law > Sent: 30 October 2023 15:09 > Subject: Re: [ARC PATCH] Improve DImode left shift by a single bit. > > On 10/28/23 07:05, Roger Sayle wrote: > > > > This patch improves the code generated for X << 1 (and for X + X) when > >

[ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.

2023-10-29 Thread Roger Sayle
horter shifts by 3 and sign extension. Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's nightly testing? 2023-10-29 Roger Sayle gcc/ChangeLog * config/arc/arc.cc (arc_rtx_co

[ARC PATCH] Convert (signed<<31)>>31 to -(signed&1) without barrel shifter.

2023-10-28 Thread Roger Sayle
check. Ok for mainline if this passes Claudiu's nightly testing? 2023-10-28 Roger Sayle gcc/ChangeLog PR middle-end/101955 * config/arc/arc.md (*extvsi_1_0): New define_insn_and_split to convert sign extract of the least significant bit into an AND $1
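
The identity named in the subject line, (x<<31)>>31 equals -(x&1), can be checked with a small C sketch. The function names below are hypothetical, and the code assumes GCC's behaviour for two implementation-defined operations: arithmetic right shift of negative values and wrapping unsigned-to-signed conversion.

```c
#include <stdint.h>

/* Sign-extract of bit 0 written as a pair of shifts.  The left shift is
   done on an unsigned value to avoid undefined behaviour on overflow. */
static int32_t lsb_sext_shifts(int32_t x)
{
    int32_t t = (int32_t)((uint32_t)x << 31); /* bit 0 moved to bit 31 */
    return t >> 31;                           /* arithmetic shift assumed */
}

/* The same result without any shift: AND with 1, then negate. */
static int32_t lsb_sext_neg_and(int32_t x)
{
    return -(x & 1);
}
```

Both forms yield 0 for even x and -1 for odd x; the second needs no barrel shifter.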

[ARC PATCH] Improve DImode left shift by a single bit.

2023-10-28 Thread Roger Sayle
generates 16 instructions (plus an rts) for foo above.] Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's nightly testing? 2023-10-28 Roger Sayle gcc/ChangeLog * config/arc/arc.md (

[wwwdocs] Get newlib via git in simtest-howto.html

2023-10-27 Thread Roger Sayle
A minor tweak to the documentation, to use git rather than cvs to obtain the latest version of newlib. Ok for mainline? 2023-10-27 Roger Sayle * htdocs/simtest-howto.html: Use git to obtain newlib. Cheers, Roger -- diff --git a/htdocs/simtest-howto.html b/htdocs/simtest

[ARC PATCH] Improved SImode shifts and rotates with -mswap.

2023-10-27 Thread Roger Sayle
takes ~22 cycles, and replacement ~4 cycles. Tested with a cross-compiler to arc-linux hosted on x86_64, with no new (compile-only) regressions from make -k check. Ok for mainline if this passes Claudiu's nightly testing? 2023-10-27 Roger Sayle gcc/ChangeLog * config/arc/

RE: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-25 Thread Roger Sayle
e), as some passes before reload check both predicates and constraints. My original patch fixes PR 110511, using the same peephole2 idiom as already used elsewhere in i386.md. Ok for mainline? > -Original Message- > From: Uros Bizjak > Sent: 19 October 2023 18:02 > To: Roger

[NVPTX] Patch pings...

2023-10-25 Thread Roger Sayle
Random fact: there have been no changes to nvptx.md in 2023 apart from Jakub's tree-wide update to the copyright years in early January. Please can I ping two of my pending Nvidia nvptx patches: "Correct pattern for popcountdi2 insn in nvptx.md" from January

[PATCH v2] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-25 Thread Roger Sayle
-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-25 Roger Sayle Richard Biener gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoi

[x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-23 Thread Roger Sayle
piling with -Os. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-23 Roger Sayle gcc/ChangeLog * config/i386/i386-features.cc (compute_conver

RE: [Patch] nvptx: Use fatal_error when -march= is missing not an assert [PR111093]

2023-10-18 Thread Roger Sayle
> -Original Message- > From: Thomas Schwinge > Sent: 18 October 2023 11:16 > To: Tobias Burnus > Cc: gcc-patches@gcc.gnu.org; Tom de Vries ; Roger Sayle > > Subject: Re: [Patch] nvptx: Use fatal_error when -march= is missing not an > assert > [PR111093] > > Hi To

RE: [x86 PATCH] PR target/110551: Fix reg allocation for widening multiplications.

2023-10-18 Thread Roger Sayle
> From: Roger Sayle > Sent: 17 October 2023 20:06 > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Uros Bizjak' > Subject: [x86 PATCH] PR target/110511: Fix reg allocation for widening > multiplications. > > > This patch contains clean-ups of the widening multiplication patterns in i3

[x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-17 Thread Roger Sayle
mainline? 2023-10-17 Roger Sayle gcc/ChangeLog PR target/110511 * config/i386/i386.md (mul3): Make operands 1 and 2 take "register_operand" and "nonimmediate_operand" respectively. (mulqihi3): Likewise. (*bmi2_umul3_1): Operand 2 needs to

RE: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Roger Sayle
Hi Uros, Thanks for the speedy review. > From: Uros Bizjak > Sent: 17 October 2023 17:38 > > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle > wrote: > > > > > > This patch is the backend piece of a solution to PRs 101955 and > > 106245, that adds a de

[x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Roger Sayle
ntel and AMD; Intel sees only a 2% improvement (perhaps just a size effect), but AMD sees a 7% win. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-17 Roger

RE: [PATCH] Support g++ 4.8 as a host compiler.

2023-10-15 Thread Roger Sayle
need updating, if my fix isn't considered acceptable? Why this patch is a trigger issue (that requires significant discussion and deliberation) is somewhat of a mystery. Thanks in advance. Roger > -Original Message- > From: Jeff Law > Sent: 07 October 2023 17:20 > To: Roger

RE: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-15 Thread Roger Sayle
Hi Jeff, Thanks for the speedy review(s). > From: Jeff Law > Sent: 15 October 2023 00:03 > To: Roger Sayle ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in > make_compound_operation. > > On 10/14/23 16:14, Roger Sayle wrot

RE: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<

2023-10-15 Thread Roger Sayle
I've done it again. ENOPATCH. From: Roger Sayle Sent: 15 October 2023 09:13 To: 'gcc-patches@gcc.gnu.org' Cc: 'Claudiu Zissulescu' Subject: [ARC PATCH] Split asl dst,1,src into bset dst,0,src to implement 1<mailto:ro...@nextmovesoftware.com> > gcc/ChangeLog * config/a

[ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<

2023-10-15 Thread Roger Sayle
This patch adds a pre-reload splitter to arc.md, to use the bset (set specific bit instruction) to implement 1< gcc/ChangeLog * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to use bset dst,0,src to implement 1<

[PATCH] Improved RTL expansion of 1LL << x.

2023-10-14 Thread Roger Sayle
ith and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-15 Roger Sayle gcc/ChangeLog * optabs.cc (expand_subword_shift): Call simplify_expand_binop instead of expand_binop. Optimize cases (i.e. avoid generating RTL) when CARRIES or INTO_I

[PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-14 Thread Roger Sayle
check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-14 Roger Sayle gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/Cha

[PATCH] Optimize (ne:SI (subreg:QI (ashift:SI x 7) 0) 0) as (and:SI x 1).

2023-10-10 Thread Roger Sayle
been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check with no new failures. Ok for mainline? 2023-10-10 Roger Sayle gcc/ChangeLog PR middle-end/101955 PR tree-optimization/106245 * simplify-rtx.cc (simplify_relational_operation_1): Simplify
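
The RTL in the subject line tests whether the low byte of x << 7 is nonzero. Only bit 7 of that byte can be set, and it is a copy of bit 0 of x, so the comparison reduces to x & 1. A rough C analogue, with hypothetical function names that are not part of the patch:

```c
#include <stdint.h>

/* C analogue of (ne:SI (subreg:QI (ashift:SI x 7) 0) 0):
   is the low byte of x << 7 nonzero? */
static int low_byte_of_shift_nonzero(uint32_t x)
{
    return (uint8_t)(x << 7) != 0;
}

/* C analogue of the simplified form (and:SI x 1). */
static int low_bit(uint32_t x)
{
    return (int)(x & 1u);
}
```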

[ARC PATCH] Improved SImode shifts and rotates on !TARGET_BARREL_SHIFTER.

2023-10-08 Thread Roger Sayle
n loop j_s [blink] This patch has been tested with a cross-compiler to arc-linux hosted on x86_64-pc-linux-gnu and (partially) tested with the compile-only portions of the testsuite with no regressions. Ok for mainline, if your own testing shows no issues? 2023-10-07 Roger Sayl

RE: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-06 Thread Roger Sayle
Grr! I've done it again. ENOPATCH. > -Original Message- > From: Roger Sayle > Sent: 06 October 2023 14:58 > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Uros Bizjak' > Subject: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr. > > > This

[X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-06 Thread Roger Sayle
} with no new failures. And to provide additional testing, I've also bootstrapped and regression tested a version of this patch where the RCR is always generated (independent of the -march target) again with no regressions. Ok for mainline? 2023-10-06 Roger Sayle gcc/ChangeLog * con
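
The shr+rcr idea in the subject line can be modelled in portable C: shift the high word right by one, then rotate the bit that fell out into the top of the low word. This is only a sketch of the logical (unsigned) case, using 32-bit halves of a 64-bit value for illustration; the function name is hypothetical and the patch itself operates on RTL, not on C source.

```c
#include <stdint.h>

/* Logical right shift of a doubleword by one bit, one word at a time. */
static uint64_t lshr1_by_halves(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t carry = hi & 1u;        /* bit shifted out of the high word (shr) */
    hi >>= 1;
    lo = (lo >> 1) | (carry << 31);  /* rcr: carry enters the top of the low word */
    return ((uint64_t)hi << 32) | lo;
}
```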

RE: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Roger Sayle
Hi Uros, Very many thanks for the speedy reviews. Uros Bizjak wrote: > On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle > wrote: > > > > > > This patch avoids long lea instructions for performing x<<2 and x<<3 > > by splitting them into shorter sal an

RE: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Roger Sayle
Doh! ENOPATCH. > -Original Message- > From: Roger Sayle > Sent: 05 October 2023 12:44 > To: 'gcc-patches@gcc.gnu.org' > Cc: 'Uros Bizjak' > Subject: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc. > > > This patch tweaks the i386

[X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Roger Sayle
_board=unix{-m32} with no new failures. Ok for mainline? 2023-10-05 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by one into add3_cc_overflow_1 followed by add3_carry. * config/i386/i386.md (@add3_cc_overflow_1): Rename
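
The add+adc split described in this ChangeLog entry corresponds to doubling each half of the doubleword and propagating the carry from the low half. A minimal C model, again using 32-bit halves of a 64-bit value purely for illustration (the function name is hypothetical):

```c
#include <stdint.h>

/* Left shift of a doubleword by one bit as add (low) followed by
   add-with-carry (high). */
static uint64_t shl1_by_halves(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);
    uint32_t new_lo = lo + lo;          /* add: may wrap and produce a carry */
    uint32_t carry = new_lo < lo;       /* 1 exactly when the add wrapped */
    uint32_t new_hi = hi + hi + carry;  /* adc: the carry becomes bit 0 */
    return ((uint64_t)new_hi << 32) | new_lo;
}
```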

[X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Roger Sayle
ures. Additional testing was performed by repeating these steps after removing the "optimize_size > 1" condition, so that suitable lea instructions were always split [-Oz is not heavily tested, so this invoked the new code during the bootstrap and regression testing], again with no regr

[PATCH] Support g++ 4.8 as a host compiler.

2023-10-04 Thread Roger Sayle
compiler. Ok for mainline? 2023-10-04 Roger Sayle gcc/ChangeLog * rtl.h (rtx_def::u): Add explicit constructor to workaround issue using g++ 4.8 as a host compiler. diff --git a/gcc/rtl.h b/gcc/rtl.h index 6850281..a7667f5 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -451,6 +451,9

PING: PR rtl-optimization/110701

2023-10-03 Thread Roger Sayle
There are a small handful of middle-end maintainers/reviewers that understand and appreciate the difference between the RTL statements: (set (subreg:HI (reg:SI x)) (reg:HI y)) and (set (strict_lowpart:HI (reg:SI x)) (reg:HI y)) If one (or more) of them could please take a look at

RE: [ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-10-03 Thread Roger Sayle
-- > -Original Message- > From: Claudiu Zissulescu > Sent: 03 October 2023 15:26 > To: Roger Sayle ; gcc-patches@gcc.gnu.org > Subject: RE: [ARC PATCH] Split SImode shifts pre-reload on > !TARGET_BARREL_SHIFTER. > > Hi Roger, > > It was nice to meet you too. > >

RE: [ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 1 : 0)

2023-09-29 Thread Roger Sayle
one could double check there are no issues on real hardware that would be great. I'm not sure if ARC is one of the targets covered by Jeff Law's compile farm? > -Original Message- > From: Roger Sayle > Sent: Friday, September 29, 2023 6:54 PM > To: gcc-patches@gcc.gnu.or

[ARC PATCH] Use rlc r0, 0 to implement scc_ltu (i.e. carry_flag ? 1 : 0)

2023-09-29 Thread Roger Sayle
ed on a cross-compiler to arc-linux (hosted on x86_64-pc-linux-gnu), and a partial tool chain, where the new case passes and there are no new regressions. Ok for mainline? 2023-09-29 Roger Sayle gcc/ChangeLog * config/arc/arc.md (CC_ltu): New mode iterator for CC and CC_C. (s
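
A small C model may help show why a rotate-left-through-carry of zero yields `carry_flag ? 1 : 0`. It assumes `rlc` computes `(src << 1) | carry`, and that an unsigned "less than" comparison leaves its result in the carry flag; the function names are hypothetical and this is an illustration only, not the pattern from arc.md.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of rlc: shift left by one, carry flag enters at bit 0. */
static uint32_t rlc_model(uint32_t src, uint32_t carry_in)
{
    return (src << 1) | (carry_in & 1u);
}

/* scc_ltu: materialize (a < b) for unsigned operands as 0 or 1. */
static uint32_t scc_ltu_model(uint32_t a, uint32_t b)
{
    uint32_t carry = a < b;          /* stands in for the carry set by the compare */
    return rlc_model(0u, carry);     /* rlc r0, 0: zero rotated left, so only the carry remains */
}

/* Compare against the plain C comparison. */
static void check_scc_ltu(uint32_t a, uint32_t b)
{
    assert(scc_ltu_model(a, b) == (uint32_t)(a < b));
}
```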

RE: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-09-29 Thread Roger Sayle
upta > Sent: 28 September 2023 22:44 > To: gcc-patches@gcc.gnu.org; Robin Dapp > Cc: kito.ch...@gmail.com; Jeff Law ; Palmer Dabbelt > ; gnu-toolch...@rivosinc.com; Roger Sayle > ; Jakub Jelinek ; Jivan > Hakobyan ; Vineet Gupta > Subject: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P

[ARC PATCH] Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.

2023-09-28 Thread Roger Sayle
loop j_s.d[blink] or_s r0,r0,r1 Thanks in advance, Roger 2023-09-28 Roger Sayle gcc/ChangeLog * config/arc/arc-protos.h (emit_shift): Delete prototype. (arc_pre_reload_split): New function prototype. * config/arc/arc.cc (emit_shift): Delete function.

RE: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-09-01 Thread Roger Sayle
ere before. As you/clang show, we could do better. Thanks again, and sorry for any inconvenience. Best regards, Roger -- > -Original Message- > From: Manolis Tsamis > Sent: 01 September 2023 11:45 > To: Uros Bizjak > Cc: Roger Sayle ; gcc-patches@gcc.gnu.org > Subject

PR target/107671: Make more use of btl/btq on x86_64.

2023-08-07 Thread Roger Sayle
this independent backend piece, and gain/bank the improvements on x86_64. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-08-07 Roger Sayle Uros Bizjak gcc

[Committed] Avoid FAIL of gcc.target/i386/pr110792.c

2023-08-06 Thread Roger Sayle
is tested by the 32-bit test case. Committed to mainline as obvious. Sorry for the inconvenience. 2023-08-06 Roger Sayle gcc/testsuite/ChangeLog PR target/110792 * gcc.target/i386/pr110792.c: Remove dg-final scan-assembler-not. diff --git a/gcc/testsuite/gcc.target/i386/pr110792.c

[PATCH] Specify signed/unsigned/dontcare in calls to extract_bit_field_1.

2023-08-03 Thread Roger Sayle
rdx sarq $59, %rdx ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-08-03 Roger Sayle gcc/ChangeLog * expmed.cc (extract_bit_fiel

[x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.

2023-08-03 Thread Roger Sayle
failures. Ok for mainline? 2023-08-03 Roger Sayle gcc/ChangeLog * config/i386/sse.md (define_split): Convert highpart:DF extract from V2DFmode register into a sse2_storehpd instruction. (define_split): Likewise, convert lowpart:DF extract from V2DF register

[x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Roger Sayle
dress), but this fix is a minimal "safe" solution, that should hopefully be suitable for backporting. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-0

[Committed] PR target/110843: Check TARGET_AVX512VL for V2DI rotates in STV.

2023-07-31 Thread Roger Sayle
with and without --target_board=unix{-m32} with no new failures. Committed to mainline as obvious. 2023-07-31 Roger Sayle gcc/ChangeLog PR target/110843 * config/i386/i386-features.cc (compute_convert_gain): Check TARGET_AVX512VL (not TARGET_AVX512F) when considering V2DImode
