Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 15, 2023 at 10:15 AM Jan Beulich wrote: > > On 15.06.2023 09:45, Hongtao Liu wrote: > > On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches > > wrote: > >> On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches > >> wrote: >

Re: [PATCH] x86: correct and improve "*vec_dupv2di"

2023-06-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches wrote: > > The input constraint for the %vmovddup alternative was wrong, as the > upper 16 XMM registers require AVX512VL to be used with this insn. To > compensate, introduce a new alternative permitting all 32 registers, by >

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:56 PM Jakub Jelinek wrote: > > On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote: > > LGTM for the x86 part. I did my best, but those peephole2 patterns are > > real PITA to be reviewed thoroughly. > > > > Maybe split out peep

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek wrote: > > Hi! > > On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote: > > At this point two pages of code without a comment - can you introduce > > some vertical spacing and comments as to what is matched now? The > > split out functions

Re: [PATCH] middle-end, i386, v3: Pattern recognize add/subtract with carry [PR79173]

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek wrote: > > Hi! > > On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote: > > At this point two pages of code without a comment - can you introduce > > some vertical spacing and comments as to what is matched now? The > > split out functions

[PATCH] RTL: Merge rtx_equal_p and hash_rtx functions with their callback variants

2023-06-14 Thread Uros Bizjak via Gcc-patches
Use default argument when callback function is not required to merge rtx_equal_p and hash_rtx functions with their callback variants. gcc/ChangeLog: * cse.cc (hash_rtx_cb): Rename to hash_rtx. (hash_rtx): Remove. * early-remat.cc (remat_candidate_hasher::equal): Update to call
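
The merge described above relies on a C++ default argument: one function serves callers with and without a callback. A minimal sketch of that idea follows; it does not reproduce the GCC functions, and seq_equal_p, canon_fn and drop_sign are hypothetical names chosen for illustration only.

```cpp
#include <cstddef>

// Hypothetical callback type: may canonicalize an element before comparison.
using canon_fn = int (*)(int);

// One function replaces a plain variant and a callback variant: callers
// that need no callback simply omit the last argument.
bool seq_equal_p(const int *x, const int *y, std::size_t n,
                 canon_fn cb = nullptr)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      int a = cb ? cb(x[i]) : x[i];
      int b = cb ? cb(y[i]) : y[i];
      if (a != b)
        return false;
    }
  return true;
}

// Example callback (hypothetical): compare values up to sign.
int drop_sign(int v) { return v < 0 ? -v : v; }
```

Existing call sites of the plain variant keep compiling unchanged, which is presumably why the ChangeLog mostly lists renames.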

Re: [x86 PATCH] Convert ptestz of pandn into ptestc.

2023-06-14 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle wrote: > > > This patch is the next instalment in a set of backend patches around > improvements to ptest/vptest. A previous patch optimized the sequence > t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the > property that ZF is set to
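
The two rewrites named in this thread (ptestz of pand into ptestz, and ptestz of pandn into ptestc) can be checked on a scalar model. The sketch below assumes the usual PTEST flag definitions (ZF set when a & b is zero, CF set when ~a & b is zero) and uses one 64-bit word in place of a vector register; all function names are hypothetical and no intrinsics are involved.

```cpp
#include <cstdint>

// Scalar stand-in for a vector register.
using vec = std::uint64_t;

vec pand_model(vec x, vec y)  { return x & y; }
vec pandn_model(vec x, vec y) { return ~x & y; }  // PANDN inverts its first operand

// PTEST flag results: ZF is set when (a & b) == 0, CF when (~a & b) == 0.
bool ptestz_model(vec a, vec b) { return (a & b) == 0; }
bool ptestc_model(vec a, vec b) { return (~a & b) == 0; }

// True when both rewrites give the same answer as the original sequence.
bool rewrites_hold(vec x, vec y)
{
  vec t = pand_model(x, y);
  vec u = pandn_model(x, y);
  return ptestz_model(t, t) == ptestz_model(x, y)
         && ptestz_model(u, u) == ptestc_model(x, y);
}
```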

Re: Patch ping (Re: [PATCH] middle-end, i386: Pattern recognize add/subtract with carry [PR79173])

2023-06-13 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 13, 2023 at 9:06 AM Jakub Jelinek wrote: > > Hi! > > On Tue, Jun 06, 2023 at 11:42:07PM +0200, Jakub Jelinek via Gcc-patches wrote: > > The following patch introduces {add,sub}c5_optab and pattern recognizes > > various forms of add with carry and subtract with carry/borrow, see > >

Re: [PATCH] New finish_compare_by_pieces target hook (for x86).

2023-06-12 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 12, 2023 at 4:03 PM Roger Sayle wrote: > > > The following simple test case, from PR 104610, shows that memcmp () == 0 > can result in some bizarre code sequences on x86. > > int foo(char *a) > { > static const char t[] = "0123456789012345678901234567890"; > return

Re: [x86 PATCH] PR target/31985: Improve memory operand use with doubleword add.

2023-06-07 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 7, 2023 at 8:32 AM Uros Bizjak wrote: > > On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle wrote: > > > > > > This patch addresses the last remaining issue with PR target/31985, that > > GCC could make better use of memory addressing modes when impleme

Re: [x86 PATCH] PR target/31985: Improve memory operand use with doubleword add.

2023-06-07 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle wrote: > > > This patch addresses the last remaining issue with PR target/31985, that > GCC could make better use of memory addressing modes when implementing > double word addition. This is achieved by adding a define_insn_and_split > that combines an

Re: [x86 PATCH] Add support for stc, clc and cmc instructions in i386.md

2023-06-07 Thread Uros Bizjak via Gcc-patches
> > Thanks in advance. > Roger > -- > > -Original Message- > From: Uros Bizjak > Sent: 06 June 2023 18:34 > To: Roger Sayle > Cc: gcc-patches@gcc.gnu.org > Subject: Re: [x86 PATCH] Add support for stc, clc and cmc instructions in > i386.md > > O

Re: [x86 PATCH] Add support for stc, clc and cmc instructions in i386.md

2023-06-06 Thread Uros Bizjak via Gcc-patches
c-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > 2022-06-06 Roger Sayle > Uros Bizjak > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_builtin) : >

[COMMITTED] reload1: Change return type of predicate function from int to bool

2023-06-06 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * rtl.h (function_invariant_p): Change return type from int to bool. * reload1.cc (function_invariant_p): Change return type from int to bool and adjust function body accordingly. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 6, 2023 at 1:42 PM Hongtao Liu wrote: > > On Tue, Jun 6, 2023 at 5:11 PM Uros Bizjak wrote: > > > > On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches > > wrote: > > > > > > r14-1145 fold the intrinsics into gimple ABS_EXPR which has U

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches wrote: > > r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for > TYPE_MIN, but PABSB will store unsigned result into dst. The patch > uses ABSU_EXPR + VCE instead of ABS_EXPR. > > Also don't fold _mm_abs_{pi8,pi16,pi32} w/o
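
The difference between ABS_EXPR and ABSU_EXPR + VCE matters only for the most negative element value. The following is a scalar sketch of the byte case, assuming two's-complement int8_t; absu8 and abs8_like_pabsb are hypothetical names that model the GIMPLE operations, not GCC internals.

```cpp
#include <cstdint>
#include <cstring>

// ABSU_EXPR model: the absolute value is computed in the unsigned type,
// so the most negative input has a well-defined result (128 for int8_t).
std::uint8_t absu8(std::int8_t x)
{
  unsigned wide = static_cast<unsigned>(static_cast<int>(x));
  return static_cast<std::uint8_t>(x < 0 ? 0u - wide : wide);
}

// VIEW_CONVERT_EXPR model: reinterpret the same bits as the signed type.
std::int8_t abs8_like_pabsb(std::int8_t x)
{
  std::uint8_t u = absu8(x);
  std::int8_t r;
  std::memcpy(&r, &u, sizeof r);
  return r;
}
```

For an input of -128 the signed view of the result is again -128 (bit pattern 0x80), with no undefined behaviour along the way.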

Re: [PATCH] Fold _mm{, 256, 512}_abs_{epi8, epi16, epi32, epi64} into gimple ABSU_EXPR + VCE.

2023-06-06 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches wrote: > > r14-1145 fold the intrinsics into gimple ABS_EXPR which has UB for > TYPE_MIN, but PABSB will store unsigned result into dst. The patch > uses ABSU_EXPR + VCE instead of ABS_EXPR. > > Also don't fold _mm_abs_{pi8,pi16,pi32} w/o

[COMMITTED] print-rtl: Change return type of two print functions from int to void

2023-06-05 Thread Uros Bizjak via Gcc-patches
Also change one internal variable to bool. gcc/ChangeLog: * rtl.h (print_rtl_single): Change return type from int to void. (print_rtl_single_with_indent): Ditto. * print-rtl.h (class rtx_writer): Ditto. Change m_sawclose to bool. * print-rtl.cc (rtx_writer::rtx_writer): Update

[COMMITTED] reginfo: Change return type of predicate functions from int to bool

2023-06-05 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * rtl.h (reg_classes_intersect_p): Change return type from int to bool. (reg_class_subset_p): Ditto. * reginfo.cc (reg_classes_intersect_p): Ditto. (reg_class_subset_p): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros diff --git

Re: [x86 PATCH] Add support for stc, clc and cmc instructions in i386.md

2023-06-04 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 4, 2023 at 12:45 AM Roger Sayle wrote: > > > This patch is the latest revision of my patch to add support for the > STC (set carry flag), CLC (clear carry flag) and CMC (complement > carry flag) instructions to the i386 backend, incorporating Uros' > previous feedback. The

Re: [x86_64 PATCH] PR target/110083: Fix-up REG_EQUAL notes on COMPARE in STV.

2023-06-04 Thread Uros Bizjak via Gcc-patches
On Sat, Jun 3, 2023 at 7:31 PM Roger Sayle wrote: > > > This patch fixes PR target/110083, an ICE-on-valid regression exposed by > my recent PTEST improvements (to address PR target/109973). The latent > bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update > or delete

[COMMITTED] reg-stack: Change return type of predicate functions from int to bool

2023-06-02 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool and recode handling of boolean variables to not use bitwise or. gcc/ChangeLog: * rtl.h (stack_regs_mentioned): Change return type from int to bool. * reg-stack.cc (struct_block_info_def): Change "done" to bool. (stack_regs_mentioned_p):

Re: [PATCH] i386: Add missing vector truncate patterns [PR92658].

2023-06-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 2, 2023 at 2:49 AM liuhongt wrote: > > Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector > truncate. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/92658 > * config/i386/mmx.md

[COMMITTED] cse: Change return type of predicate functions from int to bool

2023-06-01 Thread Uros Bizjak via Gcc-patches
Also change some function arguments to bool and remove one instance of always zero function argument. gcc/ChangeLog: * rtl.h (exp_equiv_p): Change return type from int to bool. * cse.cc (mention_regs): Change return type from int to bool and adjust function body accordingly.

[PATCH] emit-rtl: Change return type of predicate functions from int to bool

2023-05-31 Thread Uros Bizjak via Gcc-patches
Also fix some stale comments. gcc/ChangeLog: * rtl.h (subreg_lowpart_p): Change return type from int to bool. (active_insn_p): Ditto. (in_sequence_p): Ditto. (unshare_all_rtl): Change return type from int to void. * emit-rtl.h (mem_expr_equal_p): Change return type from int

[PATCH] alias: Change return type of predicate functions from int to bool

2023-05-31 Thread Uros Bizjak via Gcc-patches
Also remove a bunch of unneeded forward declarations. gcc/ChangeLog: * rtl.h (true_dependence): Change return type from int to bool. (canon_true_dependence): Ditto. (read_dependence): Ditto. (anti_dependence): Ditto. (canon_anti_dependence): Ditto. (output_dependence):

Re: [PATCH] libgcc: Use initarray section type for .init_stack

2023-05-31 Thread Uros Bizjak via Gcc-patches
On Wed, May 31, 2023 at 9:40 AM Kewen.Lin wrote: > > Hi Andreas, > > on 2023/5/25 15:25, Andreas Krebbel wrote: > > On 3/20/23 07:33, Kewen.Lin wrote: > >> Hi, > >> > >> One of my workmates found there is a warning like: > >> > >> libgcc/config/rs6000/morestack.S:402: Warning: ignoring > >>

Re: [PATCH] jump: Change return type of predicate functions from int to bool

2023-05-31 Thread Uros Bizjak via Gcc-patches
On Wed, May 31, 2023 at 9:17 AM Richard Biener wrote: > > On Tue, May 30, 2023 at 9:01 PM Jeff Law via Gcc-patches > wrote: > > > > > > > > On 5/30/23 08:36, Uros Bizjak via Gcc-patches wrote: > > > gcc/ChangeLog: > > > > > > *

[PATCH] jump: Change return type of predicate functions from int to bool

2023-05-30 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * rtl.h (comparison_dominates_p): Change return type from int to bool. (condjump_p): Ditto. (any_condjump_p): Ditto. (any_uncondjump_p): Ditto. (simplejump_p): Ditto. (returnjump_p): Ditto. (eh_returnjump_p): Ditto. (onlyjump_p): Ditto.

Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-05-30 Thread Uros Bizjak via Gcc-patches
On Tue, May 30, 2023 at 9:39 AM Uros Bizjak wrote: > > On Mon, May 29, 2023 at 8:17 PM Roger Sayle > wrote: > > > > > > This is my proposed minimal fix for PR target/109973 (hopefully suitable > > for backporting) that follows Jakub Jelinek's suggestion that we

Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 29, 2023 at 8:17 PM Roger Sayle wrote: > > > This is my proposed minimal fix for PR target/109973 (hopefully suitable > for backporting) that follows Jakub Jelinek's suggestion that we introduce > CCZmode and CCCmode variants of ptest and vptest, so that the i386 > backend treats

[PATCH] rtlanal: Change return type of predicate functions from int to bool

2023-05-29 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * rtl.h (rtx_addr_can_trap_p): Change return type from int to bool. (rtx_unstable_p): Ditto. (reg_mentioned_p): Ditto. (reg_referenced_p): Ditto. (reg_used_between_p): Ditto. (reg_set_between_p): Ditto. (modified_between_p): Ditto.

[COMMITTED] i386: Also require TARGET_AVX512BW to generate truncv16hiv16qi2 [PR110021]

2023-05-29 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: PR target/110021 * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require TARGET_AVX512BW to generate truncv16hiv16qi2. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i386-expand.cc

Re: [PATCH] Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors.

2023-05-26 Thread Uros Bizjak via Gcc-patches
On Fri, May 26, 2023 at 4:46 AM liuhongt wrote: > > lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since > icelake. At least for icelake and later intel Core processors, the > errata tune is not needed. And the tune isn't needed for ATOM either. > > Bootstrapped and regtested on

Re: [COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-26 Thread Uros Bizjak via Gcc-patches
On Fri, May 26, 2023 at 4:12 AM Jiang, Haochen wrote: > > > gcc/ChangeLog: > > > > * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): > > Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode) > > instructions when available. Emulate truncation via > >

[COMMITTED] i386: Use 2x-wider modes when emulating QImode vector instructions

2023-05-25 Thread Uros Bizjak via Gcc-patches
Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode) instructions when available. Currently, the compiler generates the following assembly for V16QImode multiplication (-mavx2): vpunpcklbw %xmm0, %xmm0, %xmm3 vpunpcklbw %xmm1, %xmm1, %xmm2 vpunpckhbw

[COMMITTED] i386: Add vv4qi3 expander

2023-05-24 Thread Uros Bizjak via Gcc-patches
Also, move vv8qi3 expander to a better place and enable it with TARGET_MMX_WITH_SSE. Remove handling of V8QImode from ix86_expand_vecop_qihi2 since all partial QI->HI vector modes expand via ix86_expand_vecop_qihi_partial. gcc/ChangeLog: * config/i386/i386-expand.cc

Re: [PATCH] target/109944 - avoid STLF fail for V16QImode CTOR expansion

2023-05-24 Thread Uros Bizjak via Gcc-patches
On Wed, May 24, 2023 at 12:13 PM Richard Biener wrote: > > The following dispatches to V2DImode CTOR expansion instead of > using sets of (subreg:DI (reg:V16QI 146) [08]) which causes > LRA to spill DImode and reload V16QImode. The same applies for > V8QImode or V4HImode construction from SImode

Re: [PATCH] [testsuite] [x86] cope with --enable-frame-pointer

2023-05-24 Thread Uros Bizjak via Gcc-patches
On Wed, May 24, 2023 at 7:48 AM Alexandre Oliva wrote: > > > Various x86 tests fail if the toolchain is configured with > --enable-frame-pointer, because the unexpected extra insns mess with > the expected asm counts. Add -fomit-frame-pointer so that they can > still pass. > > Bootstrapped on

[COMMITTED] i386: Add V8QI and V4QImode partial vector shift operations

2023-05-23 Thread Uros Bizjak via Gcc-patches
Add V8QImode and V4QImode vector shift patterns that call into ix86_expand_vecop_qihi_partial. Generate special sequences for constant count operands. The patch regresses g++.dg/pr91838.C - as explained in PR91838, the test returns different results, depending on whether V8QImode shift pattern

Re: [PATCH] Account for vector splat GPR->XMM move cost

2023-05-23 Thread Uros Bizjak via Gcc-patches
On Tue, May 23, 2023 at 5:18 PM Richard Biener wrote: > > The following also accounts for a GPR->XMM move cost for splat > operations and properly guards eliding the cost when moving from > memory only for SSE4.1 or HImode or larger operands. This > doesn't fix the PR fully yet. > > Bootstrapped

[COMMITTED] i386: Adjust emulated integer vector mode shift costs

2023-05-22 Thread Uros Bizjak via Gcc-patches
Returned integer vector mode costs of emulated instructions in ix86_shift_rotate_cost are wrong and do not reflect generated instruction sequences. Rewrite handling of different integer vector modes and different target ABIs to return real instruction counts in order to calculate better costs of

[COMMITTED] i386: Account for the memory read in V*QImode multiplication sequences

2023-05-22 Thread Uros Bizjak via Gcc-patches
Add the cost of a memory read to the cost of V*QImode vector mult sequences. gcc/ChangeLog: * config/i386/i386.cc (ix86_multiplication_cost): Add the cost of a memory read to the cost of V?QImode sequences. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Uros. diff

[COMMITTED] i386: Add infrastructure for QImode partial vector mult and shift operations

2023-05-18 Thread Uros Bizjak via Gcc-patches
QImode partial vector multiplications and shifts can be implemented using their HImode counterparts. Add infrastructure to handle V8QImode and V4QImode vectors by extending (interleaving) their input operands to V8HImode, performing V8HImode operation and truncating output back to the original
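
The extend, operate, truncate scheme described above can be modelled with plain arrays. This is a sketch only: the type aliases borrow GCC mode names for readability, extend_model, mul16_model, truncate_model and mulv8qi_model are hypothetical helpers, and the real expander works on RTL rather than on loops.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using v8qi = std::array<std::uint8_t, 8>;
using v8hi = std::array<std::uint16_t, 8>;

// Widen each 8-bit element to 16 bits.
v8hi extend_model(const v8qi &v)
{
  v8hi r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = v[i];
  return r;
}

// Element-wise 16-bit multiplication, keeping the low 16 bits.
v8hi mul16_model(const v8hi &a, const v8hi &b)
{
  v8hi r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = static_cast<std::uint16_t>(a[i] * b[i]);
  return r;
}

// Truncate each 16-bit element back to 8 bits.
v8qi truncate_model(const v8hi &v)
{
  v8qi r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = static_cast<std::uint8_t>(v[i]);
  return r;
}

// 8-bit multiplication emulated through its 16-bit counterpart.
v8qi mulv8qi_model(const v8qi &a, const v8qi &b)
{
  return truncate_model(mul16_model(extend_model(a), extend_model(b)));
}
```

Because only the low 8 bits of each product are kept, the result matches a wrapping 8-bit multiply regardless of how the upper bits were extended.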

Re: [PATCH] i386: Fix up types in __builtin_{inf, huge_val, nan{, s}, fabs, copysign}q builtins [PR109884]

2023-05-17 Thread Uros Bizjak via Gcc-patches
On Wed, May 17, 2023 at 8:08 PM Jakub Jelinek wrote: > > Hi! > > When _Float128 support has been added to C++ for 13.1, float128t_type_node > tree has been added - in C float128_type_node and float128t_type_node is > the same and represents both _Float128 and __float128, but in C++ they > are

[COMMITTED] i386: Adjust emulated integer vector mode multiplication costs

2023-05-17 Thread Uros Bizjak via Gcc-patches
Returned integer vector mode costs of emulated modes in ix86_multiplication_cost are wrong and do not reflect generated instruction sequences. Rewrite handling of different integer vector modes and different target ABIs to return real instruction counts in order to calculate better costs of

[PATCH] i386: Handle unsupported modes from ix86_widen_mult_cost [PR109807]

2023-05-14 Thread Uros Bizjak via Gcc-patches
Revert my previous change that faked handling of V4HI and V2SImodes in ix86_widen_mult_cost and rather return an arbitrarily high value for unsupported modes. This should prevent the cost estimator from selecting a non-existent vector widen multiply operation. gcc/ChangeLog: PR target/109807 *

[PATCH] i386: Cleanup ix86_expand_vecop_qihi{,2}

2023-05-12 Thread Uros Bizjak via Gcc-patches
Some cleanups while looking at these two functions. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also reject ymm instructions for TARGET_PREFER_AVX128. Use generic gen_extend_insn to generate zero/sign extension instructions. Fix comments.

Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__

2023-05-12 Thread Uros Bizjak via Gcc-patches
On Fri, May 12, 2023 at 4:07 PM Ard Biesheuvel wrote: > > > > > Note that the GOT reference in question is in fact a data reference: > > > > > we > > > > > explicitly load the address of __fentry__ from the GOT, which amounts > > > > > to > > > > > eager binding, rather than emitting a PLT

[PATCH] i386: Remove mulv2si emulated sequence for TARGET_SSE2 [PR109797]

2023-05-12 Thread Uros Bizjak via Gcc-patches
Remove mulv2si emulated sequence for TARGET_SSE2 and enable only native PMULLD instruction for TARGET_SSE4_1. Ideally, the vectorization for TARGET_SSE2 should depend on more precise cost estimation (the PR contains patch for ix86_multiplication_cost), but even with patched cost function the

Re: [x86_64 PATCH] PR middle-end/109766: Prevent cprop_hardreg bloating code with -Os.

2023-05-12 Thread Uros Bizjak via Gcc-patches
On Thu, May 11, 2023 at 4:21 PM Roger Sayle wrote: > > > PR 109766 is an interesting case of large code being generated on x86_64, > caused by an interaction/conflict between register allocation and hardreg > cprop, that's tricky to fix/resolve within the middle-end. > > The task/challenge is to

[PATCH] i386: Handle V4HI and V2SImode in ix86_widen_mult_cost [PR109807]

2023-05-11 Thread Uros Bizjak via Gcc-patches
Do not crash when asking ix86_widen_mult_cost for the cost of a widening mul operation to V4HI or V2SImode. gcc/ChangeLog: PR target/109807 * config/i386/i386.cc (ix86_widen_mult_cost): Handle V4HImode and V2SImode. gcc/testsuite/ChangeLog: PR target/109807 *

Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__

2023-05-11 Thread Uros Bizjak via Gcc-patches
On Thu, May 11, 2023 at 12:04 AM H.J. Lu wrote: > > On Wed, May 10, 2023 at 2:17 AM Uros Bizjak wrote: > > > > On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel wrote: > > > > > > The small and medium PIC code models generate profiling calls that > > >

[PATCH] i386: Add missing vector extend patterns [PR92658]

2023-05-10 Thread Uros Bizjak via Gcc-patches
Add missing insn pattern for v2qi -> v2si vector extend and named expanders to activate generation of vector extends to 8-byte and 4-byte vectors. gcc/ChangeLog: PR target/92658 * config/i386/mmx.md (sse4_1_v2qiv2si2): New insn pattern. (v4qiv4hi2): New expander. (v2hiv2si2):

Re: [x86_64 PATCH] Use [(const_int 0)] idiom consistently in i386.md

2023-05-10 Thread Uros Bizjak via Gcc-patches
On Wed, May 10, 2023 at 9:20 PM Roger Sayle wrote: > > > Hi Uros, > This cleans up the use of [(clobber (const_int 0))] in the i386 backend. > My apologies I must have copied this idiom from one of the other targets: > aarch64.md, arm.md, thumb1.md, avr.md, or sparc.md. > > This patch has been

Re: [PATCH] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-05-10 Thread Uros Bizjak via Gcc-patches
On Fri, Apr 28, 2023 at 2:47 AM Fangrui Song wrote: > > When using -mcmodel=medium, large data is placed into .l* sections. GNU ld > places .l* sections into separate output sections. If small and medium > code model object files are mixed, the .l* sections won't cause > relocation overflow

Re: [PATCH] i386: Honour -mdirect-extern-access when calling __fentry__

2023-05-10 Thread Uros Bizjak via Gcc-patches
/i386.cc (x86_function_profiler): Take > ix86_direct_extern_access into account when generating calls > to __fentry__() HJ, is the patch OK with you? Uros. > > Cc: H.J. Lu > Cc: Jakub Jelinek > Cc: Richard Biener > Cc: Uros Bizjak > Cc: Hou Wenlong

Re: [x86_64 PATCH] Introduce insvti_highpart define_insn_and_split.

2023-05-07 Thread Uros Bizjak via Gcc-patches
On Sat, May 6, 2023 at 4:00 PM Roger Sayle wrote: > > > Hi Uros, > This is a repost/respin of a patch that was conditionally approved: > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html > > This patch adds a convenient post-reload splitter for setting/updating > the highpart of

[PATCH] i386: Rename index_register_operand predicate to register_no_SP_operand

2023-05-05 Thread Uros Bizjak via Gcc-patches
Rename index_register_operand predicate to what it really does. No functional change. gcc/ChangeLog: * config/i386/predicates.md (register_no_SP_operand): Rename from index_register_operand. (call_register_operand): Update for rename. * config/i386/i386.md (*lea_general_[1234]):

[PATCH] i386: Introduce mulv2si3 instruction

2023-05-05 Thread Uros Bizjak via Gcc-patches
For SSE2 targets the expander unpacks input elements into the correct position in the V4SI vector and emits PMULUDQ instruction. The output elements are then shuffled back to their positions in the V2SI vector. For SSE4 targets PMULLD instruction is emitted directly. gcc/ChangeLog: *
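
The SSE2 sequence described above (unpack, PMULUDQ, shuffle back) can be followed on a scalar model. The sketch assumes PMULUDQ multiplies the even 32-bit elements of its operands into 64-bit products; pmuludq_model and mulv2si_model are hypothetical names, and the code is not the expander itself.

```cpp
#include <array>
#include <cstdint>

using v2si = std::array<std::uint32_t, 2>;
using v4si = std::array<std::uint32_t, 4>;
using v2di = std::array<std::uint64_t, 2>;

// PMULUDQ model: multiply elements 0 and 2, giving two 64-bit products.
v2di pmuludq_model(const v4si &a, const v4si &b)
{
  return {{ std::uint64_t(a[0]) * b[0], std::uint64_t(a[2]) * b[2] }};
}

v2si mulv2si_model(const v2si &x, const v2si &y)
{
  // Unpack: place the two input elements in positions 0 and 2.
  v4si a = {{ x[0], 0, x[1], 0 }};
  v4si b = {{ y[0], 0, y[1], 0 }};
  v2di p = pmuludq_model(a, b);
  // Shuffle back: keep only the low 32 bits of each product.
  return {{ std::uint32_t(p[0]), std::uint32_t(p[1]) }};
}
```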

[PATCH] i386: Tighten ashift to lea splitter operand predicates [PR109733]

2023-05-04 Thread Uros Bizjak via Gcc-patches
The predicates of ashift to lea post-reload splitter were too broad so the splitter tried to convert the mask shift instruction. Tighten operand predicates to match only general registers. gcc/ChangeLog: PR target/109733 * config/i386/predicates.md (index_reg_operand): New predicate.

[PATCH] i386: Improve index_register_operand predicate

2023-05-04 Thread Uros Bizjak via Gcc-patches
Use the same approach as in register_no_elim_operand predicate, but also reject stack_pointer_rtx operands. gcc/ChangeLog: * config/i386/predicates.md (index_register_operand): Reject arg_pointer_rtx, frame_pointer_rtx, stack_pointer_rtx and VIRTUAL_REGISTER_P operands. Allow

Re: [PATCH] Turn on LRA on all targets

2023-04-24 Thread Uros Bizjak via Gcc-patches
On Mon, Apr 24, 2023 at 11:19 AM Segher Boessenkool wrote: > > On Sun, Apr 23, 2023 at 11:06:41PM +0200, Uros Bizjak wrote: > > > I send this patch now so that people can start testing. I don't plan to > > > commit this for another week at least, for a week after GCC

Re: [PATCH] Turn on LRA on all targets

2023-04-23 Thread Uros Bizjak via Gcc-patches
On Sun, Apr 23, 2023 at 6:48 PM Segher Boessenkool wrote: > > This minimal patch enables LRA for all targets. It does not clean up > the target code, nor does it do anything to generic code: it just > deletes all target definitions of TARGET_LRA_P. > > There are three kinds of changes: > > 1)

[PATCH] i386: Remove REG_OK_FOR_INDEX/REG_OK_FOR_BASE and their derivatives

2023-04-21 Thread Uros Bizjak via Gcc-patches
x86 was converted to TARGET_LEGITIMATE_ADDRESS_P long ago. Remove remnants of the conversion. Also, clean up the remaining macros a bit by introducing the INDEX_REGNO_P macro. No functional change. gcc/ChangeLog: 2023-04-21 Uroš Bizjak * config/i386/i386.h (REG_OK_FOR_INDEX_P,

[PATCH] arch: Use VIRTUAL_REGISTER_P predicate.

2023-04-20 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * config/arm/arm.cc (thumb1_legitimate_address_p): Use VIRTUAL_REGISTER_P predicate. (arm_eliminable_register): Ditto. * config/avr/avr.md (push_1): Ditto. * config/bfin/predicates.md (register_no_elim_operand): Ditto. * config/h8300/predicates.md

i386: Handle sign-extract for QImode operations with high registers [PR78952]

2023-04-20 Thread Uros Bizjak via Gcc-patches
Introduce extract_operator predicate to handle both zero-extract and sign-extract operations with expressions like: (subreg:QI (zero_extract:SWI248 (match_operand 1 "int248_register_operand" "0") (const_int 8) (const_int 8)) 0) As shown in the testcase, this

[PATCH] i386: Emit compares between high registers and memory

2023-04-19 Thread Uros Bizjak via Gcc-patches
The following code: typedef __SIZE_TYPE__ size_t; struct S1s { char pad1; char val; short pad2; }; extern char ts[256]; _Bool foo (struct S1s a, size_t i) { return (ts[i] > a.val); } compiles with -O2 to: movl %edi, %eax movsbl %ah, %edi cmpb %dil, ts(%rsi)

Re: [PATCH] i386: Add new pattern for zero-extend cmov

2023-04-19 Thread Uros Bizjak via Gcc-patches
On Wed, Apr 19, 2023 at 1:33 AM Andrew Pinski via Gcc-patches wrote: > > After a phiopt change, I got a failure of cmov9.c. > The RTL IR has zero_extend on the outside of > the if_then_else rather than on the side. Both > ways are considered canonical as mentioned in > PR 66588. > > This fixes

Re: [PATCH] Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates

2023-04-19 Thread Uros Bizjak via Gcc-patches
On Tue, Apr 18, 2023 at 7:20 PM Jakub Jelinek wrote: > > On Mon, Apr 17, 2023 at 11:27:28PM +0200, Uros Bizjak via Gcc-patches wrote: > > --- a/gcc/rtl.h > > +++ b/gcc/rtl.h > > @@ -1972,6 +1972,13 @@ set_regno_raw (rtx x, unsigned int regno, unsigned > > int

[PATCH] i386: Improve permutations with INSERTPS instruction [PR94908]

2023-04-18 Thread Uros Bizjak via Gcc-patches
INSERTPS can select any element from src and insert into any place of the dest. For SSE4.1 targets, compiler can generate e.g. insertps $64, %xmm0, %xmm1 to insert element 1 from %xmm1 to element 0 of %xmm0. gcc/ChangeLog: PR target/94908 * config/i386/i386-builtin.def
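
The immediate $64 in the example above can be decoded by hand. The sketch below assumes the register-form INSERTPS immediate layout (bits 7:6 select the source element, bits 5:4 the destination position, bits 3:0 are a zero mask); insertps_imm, decode_insertps and insertps_model are hypothetical names used only for this illustration.

```cpp
#include <array>
#include <cstdint>

using v4sf = std::array<float, 4>;

struct insertps_imm
{
  unsigned src_elt;    // bits 7:6, element taken from the source
  unsigned dst_elt;    // bits 5:4, position written in the destination
  unsigned zero_mask;  // bits 3:0, destination elements forced to zero
};

insertps_imm decode_insertps(std::uint8_t imm)
{
  return { (imm >> 6) & 3u, (imm >> 4) & 3u, imm & 0xFu };
}

// Scalar model of the instruction: insert one element, then apply the mask.
v4sf insertps_model(v4sf dst, const v4sf &src, std::uint8_t imm)
{
  insertps_imm f = decode_insertps(imm);
  dst[f.dst_elt] = src[f.src_elt];
  for (unsigned i = 0; i < 4; ++i)
    if (f.zero_mask & (1u << i))
      dst[i] = 0.0f;
  return dst;
}
```

With imm = 64 the decoder yields source element 1, destination element 0 and an empty zero mask, which matches the element numbers given in the message.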

[PATCH] Introduce VIRTUAL_REGISTER_P and VIRTUAL_REGISTER_NUM_P predicates

2023-04-17 Thread Uros Bizjak via Gcc-patches
These two predicates are similar to existing HARD_REGISTER_P and HARD_REGISTER_NUM_P predicates and return 1 if the given register corresponds to a virtual register. gcc/ChangeLog: * rtl.h (VIRTUAL_REGISTER_P): New predicate. (VIRTUAL_REGISTER_NUM_P): Ditto. (REGNO_PTR_FRAME_P): Use

Re: [PATCH] i386: Fix up z operand modifier diagnostics on inline-asm [PR109458]

2023-04-12 Thread Uros Bizjak via Gcc-patches
On Wed, Apr 12, 2023 at 4:28 PM Jakub Jelinek wrote: > > Hi! > > On the following testcase, we emit weird diagnostics. > User used the z modifier, but diagnostics talks about Z instead. > This is because z is implemented by doing some stuff and then falling > through into the Z case. > > The

Re: [PATCH] Adjust memory_move_cost for MASK_REGS when MODE_SIZE > 8.

2023-03-30 Thread Uros Bizjak via Gcc-patches
On Fri, Mar 31, 2023 at 7:11 AM liuhongt wrote: > > RA sometimes will use lowest the cost of the mode with all different > regclasses > w/o check if it's hard_regno_mode_ok. > It's impossible to put modes whose size > 8 into MASK_REGS, adjust the cost to > avoid potential performance issue. I

Re: [PATCH V2] Rename ufix_trunc/ufloat* patterns to fixuns_trunc/floatuns* to align with standard pattern name.

2023-03-30 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 30, 2023 at 1:43 PM liuhongt wrote: > > > > Just rename the instruction and fix all its call sites. The name of > > > the insn pattern is internal to the compiler and can be renamed at > > > will. > > > > Ideally, we should standardize all the names to a standard name, so > > e.g.

Re: [PATCH] Support vector conversion for AVX512 vcvtudq2pd/vcvttps2udq/vcvttpd2udq.

2023-03-30 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 30, 2023 at 8:17 AM Uros Bizjak wrote: > > On Thu, Mar 30, 2023 at 3:47 AM liuhongt wrote: > > > > There's some typo for the standard pattern name for unsigned_{float,fix}, > > it should be floatunsmn2/fixuns_truncmn2, not ufloatmn2/ufix_truncmn2 > > i

Re: [PATCH] Support vector conversion for AVX512 vcvtudq2pd/vcvttps2udq/vcvttpd2udq.

2023-03-30 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 30, 2023 at 3:47 AM liuhongt wrote: > > There's some typo for the standard pattern name for unsigned_{float,fix}, > it should be floatunsmn2/fixuns_truncmn2, not ufloatmn2/ufix_truncmn2 > in current trunk, the patch fix the typo. > > Also vcvttps2udq is available under AVX512VL, so it

Re: [PATCH] Generate vpblendd instead of vpblendw for V4SI under AVX2.

2023-03-29 Thread Uros Bizjak via Gcc-patches
On Wed, Mar 29, 2023 at 9:21 AM liuhongt wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > Ok for GCC14 stage-1(or maybe trunk)? > > gcc/ChangeLog: > > * config/i386/i386-expand.cc (expand_vec_perm_blend): Generate > vpblendd instead of vpblendw for V4SI under

Re: [PATCH] i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276]

2023-03-28 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 28, 2023 at 10:11 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs since r11-2259 because assign_386_stack_local > -> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment > for DImode temporaries rather than 32-bit as before. > Most of the spots in the

Re: [PATCH] Remove TARGET_GEN_MEMSET_SCRATCH_RTX since it's not used anymore.

2023-03-22 Thread Uros Bizjak via Gcc-patches
On Wed, Mar 22, 2023 at 3:59 AM liuhongt wrote: > > The target hook is only used by i386, and the current definition is > same as default gen_reg_rtx. So there's no need for this target hook. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk(or GCC14)? > >

[PATCH] i386: Robustify vec perm blend functions for TARGET_MMX_WITH_SSE

2023-03-16 Thread Uros Bizjak via Gcc-patches
8-byte modes should be processed only for TARGET_MMX_WITH_SSE.
gcc/ChangeLog:
        * config/i386/i386-expand.cc (expand_vec_perm_pblendv): Handle 8-byte modes only with TARGET_MMX_WITH_SSE.
        (expand_vec_perm_2perm_pblendv): Ditto.
Bootstrapped and regression tested on x86_64-linux-gnu

[PATCH] i386: Fix blend vector permutation for 8-byte modes

2023-03-15 Thread Uros Bizjak via Gcc-patches
8-byte modes should be processed only for TARGET_MMX_WITH_SSE. Handle V2SFmode and fix V2HImode handling. The resulting BLEND instructions are always faster than MOVSS/MOVSD, so prioritize them w.r.t MOVSS/MOVSD for TARGET_SSE4_1. gcc/ChangeLog: * config/i386/i386-expand.cc

[PATCH] i386: Use movss to implement V2SImode VEC_PERM.

2023-03-14 Thread Uros Bizjak via Gcc-patches
Perform V2SI vector permutation in the same way as existing V2SF for TARGET_MMX_WITH_SSE targets. The testcase:
    typedef unsigned int v2si __attribute__((vector_size(8)));
    v2si foo(v2si x, v2si y)
    {
      return (v2si){y[0], x[1]};
    }
is currently compiled to (-O2):
    foo:
            movdqa  %xmm0, %xmm2
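For readers who want to try the quoted testcase outside the testsuite, here is a minimal sketch that wraps it in a self-check. It assumes a compiler with the GCC vector extensions (GCC or Clang); `check_foo` is a hypothetical helper and not part of the patch. It only verifies which lanes the permutation selects and says nothing about the instructions the compiler emits.

```c
#include <assert.h>

/* Two-lane unsigned int vector, declared as in the quoted testcase
   (GCC/Clang vector extension). */
typedef unsigned int v2si __attribute__((vector_size(8)));

/* The permutation from the testcase: lane 0 comes from y, lane 1 from x. */
v2si foo(v2si x, v2si y)
{
    return (v2si){y[0], x[1]};
}

/* Hypothetical self-check, not part of the patch: confirms the lane
   selection on distinct values so that a swapped lane would be noticed. */
void check_foo(void)
{
    v2si x = {10u, 11u};
    v2si y = {20u, 21u};
    v2si r = foo(x, y);

    assert(r[0] == 20u);  /* taken from y[0] */
    assert(r[1] == 11u);  /* taken from x[1] */
}
```

Compiling this with `-O2 -S` on an x86-64 target is one way to inspect the generated sequence for `foo`.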

Re: [PATCH] i386: Fix up split_double_concat [PR109109]

2023-03-14 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 14, 2023 at 5:09 PM Jakub Jelinek wrote: > > Hi! > > In my PR107627 change I've missed one important case, which causes > miscompilation of f4 and f6 in the following tests. > > Combine matches there *concatsidi3_3 define_insn_and_split (as with all > other f* functions in those

Re: [PATCH] i386:Add missing OPTION_MASK_ISA_AVX512VL in i386-builtin.def for VAES builtins

2023-03-14 Thread Uros Bizjak via Gcc-patches
On Tue, Mar 14, 2023 at 7:27 AM Hu, Lin1 wrote: > > The implementation of these builtins requires support for both AVX512VL and > VAES. However, the builtins didn't request AVX512VL. As a result, compiling > pr109117-1.c with the options -mvaes -mno-avx512vl caused an ICE. > > This patch aims to

Re: Patch ping: Re: [PATCH] libgcc, i386, optabs, v2: Add __float{, un}tibf to libgcc and expand BF -> integral through SF intermediate [PR107703]

2023-03-11 Thread Uros Bizjak via Gcc-patches
On Fri, Mar 10, 2023 at 7:11 PM Ian Lance Taylor wrote: > > Jakub Jelinek writes: > > > On Wed, Mar 01, 2023 at 01:32:43PM +0100, Jakub Jelinek via Gcc-patches > > wrote: > >> On Wed, Nov 16, 2022 at 12:51:14PM +0100, Jakub Jelinek via Gcc-patches > >> wrote: > >> > On Wed, Nov 16, 2022 at

Re: [PATCH] target/108738 - limit STV chain discovery

2023-03-02 Thread Uros Bizjak via Gcc-patches
On Thu, Mar 2, 2023 at 2:28 PM Richard Biener wrote: > > The following puts a hard limit on the inherently quadratic STV chain > discovery. Without a limit for the compiler.i testcase in PR26854 > we see at -O2 > > machine dep reorg : 574.45 ( 53%) > > with release checking

[PATCH] i386: Do not constrain fmod and remainder patterns with flag_finite_math_only [PR108922]

2023-02-27 Thread Uros Bizjak via Gcc-patches
According to Intel ISA manual, fprem and fprem1 return NaN when invalid arithmetic exception is generated. This is documented in Table 8-10 of the ISA manual and makes these two instructions fully IEEE compatible. The reverted patch was based on the data from table 3-30 and 3-31 of the Intel ISA

[PATCH] i386: Introduce general_x64constmem_operand predicate

2023-02-20 Thread Uros Bizjak via Gcc-patches
Instructions that use high-part QImode registers can not be encoded with REX prefix. To avoid REX prefix, operand constraints allow only legacy QImode registers, immediates and constant memory operands. The patch introduces matching predicate, so invalid operands are not combined into instruction

Re: [PATCH] i386: Fix up replacement of registers in certain peephole2s [PR108832]

2023-02-18 Thread Uros Bizjak via Gcc-patches
On Sat, Feb 18, 2023 at 11:35 AM Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, replace_rtx has 2 modes, one that only replaces > x == from with to, the other which i386.md uses which also replaces > REGNO (x) == REGNO (from) with to if both are REGs, but assert they have > the same

[PATCH] i386: Generate QImode binary ops with high-part input register [PR108831]

2023-02-17 Thread Uros Bizjak via Gcc-patches
Following testcase:
--cut here--
    struct S {
      unsigned char pad1;
      unsigned char val;
      unsigned short pad2;
    };
    unsigned char test_add (unsigned char a, struct S b)
    {
      a += b.val;
      return a;
    }
--cut here--
should be compiled to something like:
    addb %dh, %al
but is currently
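The testcase above can be checked for behaviour independently of the code the compiler generates. The sketch below reproduces it and adds a hypothetical `check_test_add` helper (not part of the patch). The `offsetof` check records why a high-part register is relevant: `val` is the second byte of the struct, so when the struct is held in a register on a little-endian target it would sit in bits 8..15. That register placement is an assumption about the calling convention, not something this code verifies.

```c
#include <assert.h>
#include <stddef.h>

/* Struct from the quoted testcase: val is the second byte. */
struct S {
    unsigned char pad1;
    unsigned char val;
    unsigned short pad2;
};

/* Function from the quoted testcase: 8-bit add of a struct member. */
unsigned char test_add(unsigned char a, struct S b)
{
    a += b.val;
    return a;
}

/* Hypothetical self-check, not part of the patch. */
void check_test_add(void)
{
    struct S b = { 0, 200, 0 };

    /* val sits at byte offset 1 of the struct. */
    assert(offsetof(struct S, val) == 1);

    /* The addition wraps modulo 256: 100 + 200 = 300, and 300 - 256 = 44. */
    assert(test_add(100, b) == 44);

    /* No wrap-around case. */
    b.val = 5;
    assert(test_add(7, b) == 12);
}
```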

Re: [PATCH] simplify-rtx: Fix VOIDmode operand handling in simplify_subreg [PR108805]

2023-02-17 Thread Uros Bizjak via Gcc-patches
On Fri, Feb 17, 2023 at 12:31 PM Richard Biener wrote: > > On Fri, 17 Feb 2023, Uros Bizjak wrote: > > > On Fri, Feb 17, 2023 at 8:38 AM Richard Biener wrote: > > > > > > On Thu, 16 Feb 2023, Uros Bizjak wrote: > > > > > > >

Re: [PATCH] simplify-rtx: Fix VOIDmode operand handling in simplify_subreg [PR108805]

2023-02-17 Thread Uros Bizjak via Gcc-patches
On Fri, Feb 17, 2023 at 8:38 AM Richard Biener wrote: > > On Thu, 16 Feb 2023, Uros Bizjak wrote: > > > simplify_subreg can return VOIDmode const_int operand and will > > cause ICE in simplify_gen_subreg when this operand is passed to it. > > > > The patch

[PATCH] simplify-rtx: Fix VOIDmode operand handling in simplify_subreg [PR108805]

2023-02-16 Thread Uros Bizjak via Gcc-patches
simplify_subreg can return VOIDmode const_int operand and will cause ICE in simplify_gen_subreg when this operand is passed to it. The patch prevents VOIDmode temporary from entering simplify_gen_subreg. We can't process const_int operand any further, since outermode is not an integer mode here.

[PATCH] i386: Relax extract location operand mode requirements

2023-02-15 Thread Uros Bizjak via Gcc-patches
There is no requirement on the mode of the location operand, so any supported integer mode is valid. We can relax extract location operand mode requirement of other patterns involving zero_extract RTX. 2023-02-15 Uroš Bizjak gcc/ChangeLog: * config/i386/i386.md (*cmpqi_ext_1): Use

[PATCH] testsuite/i386: Cleanup target selectors in i386 target directory.

2023-02-15 Thread Uros Bizjak via Gcc-patches
gcc/testsuite/ChangeLog:
2023-02-15 Uroš Bizjak
        * g++.target/i386/empty-class2.C (dg-additional-options): Remove.
        * gcc.target/i386/avx512fp16-reduce-op-2.c: Ditto.
        * gcc.target/i386/pr99464.c: Ditto.
        * gcc.target/i386/pr103541.c (dg-do): Compile for !ia32 target.
        *

[PATCH] i386: Rename extr_register_operand to int248_register_operand

2023-02-15 Thread Uros Bizjak via Gcc-patches
No functional changes.
gcc/ChangeLog:
2023-02-15 Uroš Bizjak
        * config/i386/predicates.md (int248_register_operand): Rename from extr_register_operand.
        * config/i386/i386.md (*extv): Update for renamed predicate.
        (*extzx): Ditto.
        (*ashl3_doubleword_mask): Use

Re: [PATCH] target/108738 - STV bitmap operations compile-time hog

2023-02-14 Thread Uros Bizjak via Gcc-patches
On Thu, Feb 9, 2023 at 3:25 PM Richard Biener via Gcc-patches wrote: > > When the set of candidates becomes very large then repeated > bit checks on it during the build of an actual chain can become > slow because of the O(n) nature of bitmap tests. The following > switches the candidates

Re: [PATCH] target/108738 - optimize bit operations in STV

2023-02-14 Thread Uros Bizjak via Gcc-patches
On Thu, Feb 9, 2023 at 3:25 PM Richard Biener via Gcc-patches wrote: > > The following does low-hanging optimizations, combining bitmap > test and set and removing redundant operations. > > This shaves off half of the testcase compile time. > > Bootstrapped and tested on x86_64-unknown-linux-gnu,

[PATCH] i386: Relax extract location operand mode requirements [PR108516]

2023-02-13 Thread Uros Bizjak via Gcc-patches
Combine pass simplifies zero-extend of a zero-extract to:
    Trying 16 -> 6:
       16: r86:QI#0=zero_extract(r87:HI,0x8,0x8)
          REG_DEAD r87:HI
        6: r84:SI=zero_extend(r86:QI)
          REG_DEAD r86:QI
    Failed to match this instruction:
    (set (reg:SI 84 [ s.e2 ])
        (zero_extract:SI (reg:HI 87)
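As a rough illustration of what the two insns in the dump compute, the following sketch expresses them in C. The original source that produced the dump is not shown in the snippet, so the function name and types here are hypothetical. It also assumes the x86 convention that the zero_extract position counts from the least significant bit, so size 8 at position 8 means bits 8..15 of the 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C equivalent of the two insns in the dump:
   insn 16 extracts 8 bits starting at bit 8 of a 16-bit value,
   insn 6 zero-extends that byte to 32 bits. */
uint32_t extract_high_byte(uint16_t x)
{
    uint8_t byte = (uint8_t)((x >> 8) & 0xff);  /* zero_extract: size 8, pos 8 */
    return (uint32_t)byte;                      /* zero_extend: QI -> SI */
}

/* Hypothetical self-check for the sketch above. */
void check_extract_high_byte(void)
{
    assert(extract_high_byte(0xABCD) == 0xABu);
    assert(extract_high_byte(0x00FF) == 0x00u);
    assert(extract_high_byte(0xFF00) == 0xFFu);
}
```

Because the extracted field is already zero-filled above bit 7, the combined operation is a single 32-bit zero_extract of the same field, which is the form combine tries to match.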
