[PATCH] i386: Fix ICE in ix86_print_operand_address [PR 102761]

2021-10-18 Thread Uros Bizjak via Gcc-patches
2021-10-18 Uroš Bizjak PR target/102761 gcc/ChangeLog: * config/i386/i386.c (ix86_print_operand_address): Error out for non-address_operand asm operands. gcc/testsuite/ChangeLog: * gcc.target/i386/pr102761.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu

Re: [PATCH] Allow early sets of SSE hard registers from standard_sse_constant_p

2021-10-15 Thread Uros Bizjak via Gcc-patches
On Fri, Oct 15, 2021 at 2:15 PM Roger Sayle wrote: > > > My previous patch, which was intended to reduce the differences seen by > the combination of -march=cascadelake and -m32, has additionally found > some more instances where this combination behaves differently to regular >

Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.

2021-10-13 Thread Uros Bizjak via Gcc-patches
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle wrote: > > > Good catch. I agree with Hongtao that although my testing revealed > no problems with the previous version of this patch, it makes sense to > call gen_reg_rtx to generate a pseudo intermediate instead of attempting > to reuse the

[PATCH] i386: Improve workaround for PR82524 LRA limitation [PR85730]

2021-10-12 Thread Uros Bizjak via Gcc-patches
As explained in PR82524, LRA is not able to reload strict_low_part inout operand with matched input operand. The patch introduces a workaround, where we allow LRA to generate an instruction with non-matched input operand which is split post reload to an instruction that inserts non-matched input

Re: [PATCH][i386] Support reduc_{plus, smax, smin, umax, umin}_scal_v4qi.

2021-10-11 Thread Uros Bizjak via Gcc-patches
On Mon, Oct 11, 2021 at 8:26 AM liuhongt wrote: > > After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi, > performance is a little bit faster than before for reduce operations > w/ options -O2 -march=haswell, -O2 -march=skylake-avx512 > and -Ofast -march=skylake-avx512. > >

[PATCH] i386: Eliminate sign extension after logic operation [PR89954]

2021-09-30 Thread Uros Bizjak via Gcc-patches
Convert (sign_extend:WIDE (any_logic:NARROW (memory, immediate))) to (any_logic:WIDE (sign_extend (memory)), (sign_extend (immediate))). This eliminates sign extension after logic operation. 2021-09-30 Uroš Bizjak gcc/ PR target/89954 * config/i386/i386.md (sign_extend:WIDE
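
[Editor's sketch, not part of the patch] The transformation above relies on sign extension distributing over bitwise logic. A minimal C illustration for the AND case, assuming NARROW is 32 bits and WIDE is 64 bits; MASK_IMM and the function names are hypothetical and chosen only for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical immediate, used only for illustration. */
#define MASK_IMM (-256)

/* Narrow form: AND at 32 bits, then sign-extend the result to 64 bits. */
static int64_t and_narrow_then_extend(int32_t mem_value)
{
    int32_t narrow = mem_value & MASK_IMM;
    return (int64_t)narrow;
}

/* Wide form: sign-extend both inputs first, then AND at 64 bits. */
static int64_t extend_then_and_wide(int32_t mem_value)
{
    return (int64_t)mem_value & (int64_t)MASK_IMM;
}

/* Both forms must agree for every input; spot-check a few values. */
static void check_equivalence(void)
{
    static const int32_t samples[] = { 0, 1, -1, 0x1234, INT32_MIN, INT32_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(and_narrow_then_extend(samples[i])
               == extend_then_and_wide(samples[i]));
}
```

The same identity holds for OR and XOR, because the upper bits of a sign-extended value are copies of its sign bit.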

Re: [PATCH] i386: Don't emit fldpi etc. if -frounding-math [PR102498]

2021-09-28 Thread Uros Bizjak via Gcc-patches
On Tue, Sep 28, 2021 at 11:33 AM Jakub Jelinek wrote: > > Hi! > > i387 has instructions to store some transcendental numbers into the top of > stack. The problem is that what exact bit in the last place one gets for > those depends on the current rounding mode, the CPU knows the number with >

Re: [PATCH] [i386] Support reduc_{plus, smax, smin, umax, umin}_scal_v4hi.

2021-09-28 Thread Uros Bizjak via Gcc-patches
On Tue, Sep 28, 2021 at 8:42 AM liuhongt wrote: > > Hi: > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/102494 > * config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode. > * config/i386/mmx.md

Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-28 Thread Uros Bizjak via Gcc-patches
x-gnu{-m32,} and sde. > > OK for master with the updated one? I'd put this new pattern in mmx.md to keep 64bit/32bit modes in mmx.md, similar to e.g. FMA patterns among others. OK with the eventual above change. Thanks, Uros. > > Uros Bizjak via Gcc-patches 于2021

Re: [PATCH] AVX512FP16:support basic 64/32bit vector type and operation.

2021-09-27 Thread Uros Bizjak via Gcc-patches
On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang wrote: > > Hi Uros, > > This patch intends to support V4HF/V2HF vector type and basic operations. > > For 32bit target, V4HF vector is parsed same as __m64 type, V2HF > is parsed by stack and returned from GPR since it is not specified > by ABI. > > We

Re: [PATCH] [GIMPLE] Simplify (_Float16) ceil ((double) x) to .CEIL (x) when available.

2021-09-24 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 24, 2021 at 1:26 PM liuhongt wrote: > > Hi: > Related discussion in [1] and PR. > > Bootstrapped and regtest on x86_64-linux-gnu{-m32,}. > Ok for trunk? > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html > > gcc/ChangeLog: > > PR target/102464 >

Re: [PATCH] Support 64bit fma/fms/fnma/fnms under avx512vl.

2021-09-22 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 22, 2021 at 7:09 AM liuhongt wrote: > > Hi: > fma/fms/fnma/fnmsv2sf4 are defined only under (TARGET_FMA || TARGET_FMA4). > The patch extends the expanders to TARGET_AVX512VL. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > >

Re: [PATCH] x86-64: Remove HAVE_LD_PIE_COPYRELOC

2021-09-21 Thread Uros Bizjak via Gcc-patches
On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches wrote: > > PING^5 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html > > On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng wrote: > > > > PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html > > > > One major

Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-17 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 17, 2021 at 5:15 AM Cui, Lili wrote: > > > > -Original Message- > > From: Uros Bizjak > > Sent: Thursday, September 16, 2021 2:28 PM > > To: Cui, Lili > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu > > > > Subject:

[PATCH] [i386] Change ix86_decompose_address return type to bool.

2021-09-16 Thread Uros Bizjak via Gcc-patches
After a recent change only a boolean value is returned. 2021-09-16 Uroš Bizjak gcc/ * config/i386/i386-protos.h (ix86_decompose_address): Change return type to bool. * config/i386/i386.c (ix86_decompose_address): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu

Re: [PATCH 2/4] [PATCH 2/4] x86: Update memcpy/memset inline strategies for -mtune=tremont

2021-09-16 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 15, 2021 at 10:10 AM wrote: > > From: "H.J. Lu" > > Simplify memcpy and memset inline strategies to avoid branches for > -mtune=tremont: > > 1. Create Tremont cost model from generic cost model. > 2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector >load and store

Re: [PATCH 1/4] [PATCH 1/4] x86: Update -mtune=tremont

2021-09-16 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 15, 2021 at 10:09 AM wrote: > > From: "H.J. Lu" > > Initial -mtune=tremont update > > 1. Use Haswell scheduling model. > 2. Assume that stack engine allows to execute push instructions in > parallel. > 3. Prepare for scheduling pass as -mtune=generic. > 4. Use the same issue rate as

Re: [PATCH 4/4] [PATCH 4/4] x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCY

2021-09-16 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 15, 2021 at 10:10 AM wrote: > > From: "H.J. Lu" > > 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with > TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters. > 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with > TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT

Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTS

2021-09-16 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 15, 2021 at 10:10 AM wrote: > > From: "H.J. Lu" > > Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when > handling avx_partial_xmm_update attribute. Don't convert AVX partial > XMM register update if vector packed SSE conversion should be used. > > gcc/ > >

Re: [PATCH] i386: support micro-levels in target{, _clone} attrs [PR101696]

2021-09-13 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 12, 2021 at 5:32 PM Martin Liška wrote: > > On 8/12/21 5:26 PM, H.J. Lu wrote: > > Will it hurt if they have proper feature_priorities you added? > > No. They are unused, but we should use the proper priorities. gcc/ChangeLog: * common/config/i386/cpuinfo.h (cpu_indicator_init): Add

[PATCH] [i386] Call force_reg unconditionally.

2021-08-26 Thread Uros Bizjak via Gcc-patches
There is no point in checking RTXes before calling force_reg; force_reg checks for REG RTX by itself. 2021-08-26 Uroš Bizjak gcc/ * config/i386/i386.md (*btr_1): Call force_reg unconditionally. (conditional moves with memory inputs splitters): Ditto. * config/i386/sse.md (one_cmpl2):

[PATCH] [i386] Set all_regs to true in the call to replace_rtx [PR102057]

2021-08-26 Thread Uros Bizjak via Gcc-patches
We want to replace all REGs equal to FROM. 2021-08-26 Uroš Bizjak gcc/ PR target/102057 * config/i386/i386.md (cmove reg-reg move elimination peephole2s): Set all_regs to true in the call to replace_rtx. I was not able to create a testcase without warnings. Bootstrapped and

Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-25 Thread Uros Bizjak via Gcc-patches
condition for decompose. (ix86_rtx_costs): Adjust cost for lea with non-canonical zero-extend. OK. Thanks, Uros. > Uros Bizjak wrote on Mon, Aug 16, 2021 at 5:26 PM: > > > > > On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang wrote: > > > > > > > So, the question is if

Re: [GCC-11] [PATCH 0/5] Finish and general-regs-only

2021-08-25 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu wrote: > > On Sun, Aug 15, 2021 at 11:11 PM Richard Biener > wrote: > > > > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu wrote: > > > > > > and target("general-regs-only") function attribute > > > were added to GCC 11. But their implementations are

Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang wrote: > > > So, the question is if the combine pass really needs to zero-extend > > with 0xfffe, the left shift << 1 guarantees zero in the LSB, so > > 0x should be better and in line with canonical zero-extension > > RTX. > > The shift

Re: [PATCH] [i386] Fix ICE.

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 16, 2021 at 11:19 AM liuhongt wrote: > > Hi: > avx512f_scalef2 only accepts register_operand for operands[1], > force it to reg in ldexp3. > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > Ok for trunk. > > gcc/ChangeLog: > > PR target/101930 > *

Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-16 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak wrote: > > On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang wrote: > > > > Hi, > > > > For lea + zero_extendsidi insns, if dest of lea and src of zext are the > > same, combine them with single leal under 6

Re: [PATCH] i386: Add peephole for lea and zero extend [PR 101716]

2021-08-13 Thread Uros Bizjak via Gcc-patches
On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang wrote: > > Hi, > > For lea + zero_extendsidi insns, if dest of lea and src of zext are the > same, combine them with single leal under 64bit target since 32bit > register will be automatically zero-extended. > > Bootstrapped and regtested on
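
[Editor's sketch, not part of the patch] The kind of source that can lead to an lea followed by a zero extension looks roughly like the C below: an address-style expression computed at 32 bits and then widened to 64 bits. The function name and the constants are hypothetical; whether a given GCC version emits a single leal for it depends on the compiler and options, so no particular assembly output is claimed here.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit "base + index * 4 + 8" computation, widened to 64 bits.
   The 32-bit sum wraps modulo 2^32 before the widening, which is the
   zero extension the thread discusses: on x86-64, writing a 32-bit
   register already clears the upper 32 bits of the full register. */
static uint64_t scaled_index_zext(uint32_t base, uint32_t index)
{
    uint32_t sum = base + index * 4u + 8u;
    return (uint64_t)sum;
}

/* The upper half of the result is always zero, even when the sum wraps. */
static void check_zext(void)
{
    assert(scaled_index_zext(1u, 2u) == 17u);
    assert(scaled_index_zext(0xFFFFFFFFu, 0u) == 7u);
    assert((scaled_index_zext(0xFFFFFFFFu, 0xFFFFFFFFu) >> 32) == 0u);
}
```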

Re: [PATCH] Extend ldexp{s, d}f3 to vscalefs{s, d} when TARGET_AVX512F and TARGET_SSE_MATH.

2021-08-12 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 12, 2021 at 6:40 AM Hongtao Liu wrote: > > > > Hi: > > > > AVX512F supported vscalefs{s,d} which is the same as ldexp except the > > > > second operand should be floating point. > > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > > > > > > > gcc/ChangeLog: > > > >

Re: [PATCH] Extend ldexp{s, d}f3 to vscalefs{s, d} when TARGET_AVX512F and TARGET_SSE_MATH.

2021-08-11 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 11, 2021 at 8:36 AM Uros Bizjak wrote: > > On Tue, Aug 10, 2021 at 2:13 PM liuhongt wrote: > > > > Hi: > > AVX512F supported vscalefs{s,d} which is the same as ldexp except the > > second operand should be floating point. > > Bootstrapped an

Re: [PATCH] Extend ldexp{s, d}f3 to vscalefs{s, d} when TARGET_AVX512F and TARGET_SSE_MATH.

2021-08-11 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 10, 2021 at 2:13 PM liuhongt wrote: > > Hi: > AVX512F supported vscalefs{s,d} which is the same as ldexp except the > second operand should be floating point. > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. > > gcc/ChangeLog: > > PR target/98309 > *

Re: [PATCH v3] x86: Optimize load of const all 1s FP vectors

2021-08-09 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 9, 2021 at 7:47 PM H.J. Lu wrote: > > On Mon, Aug 9, 2021 at 8:27 AM Uros Bizjak wrote: > > > > On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu wrote: > > > > > > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak wrote: > > > > >

Re: [PATCH v2] x86: Optimize load of const all 1s float vectors

2021-08-09 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu wrote: > > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak wrote: > > > > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu wrote: > > > > > > Update vector_all_ones_operand to return true for const all 1s float > > > vec

[PATCH] i386: Name V2SF logic insns [PR101812]

2021-08-09 Thread Uros Bizjak via Gcc-patches
Name V2SF logic insns, so expand_simple_binop works with V2SF modes. 2021-08-09 Uroš Bizjak gcc/ PR target/101812 * config/i386/mmx.md (v2sf3): Rename from *mmx_v2sf3 gcc/testsuite/ PR target/101812 * gcc.target/i386/pr101812.c: New test. Bootstrapped and regression

Re: [PATCH] x86: Optimize load of const all 1s float vectors

2021-08-08 Thread Uros Bizjak via Gcc-patches
On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu wrote: > > Update vector_all_ones_operand to return true for const all 1s float > vectors. > > gcc/ > > PR target/101804 > * config/i386/predicates.md (vector_all_ones_operand): Return > true for const all 1s float vectors. > >

[PATCH] i386: Fix conditional move reg-to-reg move elimination peepholes [PR101797]

2021-08-06 Thread Uros Bizjak via Gcc-patches
Add missing operand predicate, otherwise any RTX will match. 2021-08-06 Uroš Bizjak gcc/ PR target/101797 * config/i386/i386.md (cmove reg-to-reg move elimination peephole2s): Add general_gr_operand predicate to operand 3. gcc/testsuite/ PR target/101797 *

Re: [PATCH v2] x86: Update STORE_MAX_PIECES

2021-08-04 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 4, 2021 at 3:34 PM H.J. Lu wrote: > > On Tue, Aug 3, 2021 at 6:56 AM H.J. Lu wrote: > > > > 1. Update x86 STORE_MAX_PIECES to use OImode and XImode only if inter-unit > > move is enabled since x86 uses vec_duplicate, which is enabled only when > > inter-unit move is enabled, to

Re: [PATCH] x86: Avoid stack realignment when copying data with SSE register

2021-08-04 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 4, 2021 at 3:20 PM H.J. Lu wrote: > > To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a > scratch SSE register to copy data with an SSE register from one > memory location to another. > > gcc/ > > PR target/101772 > * config/i386/i386-expand.c

Re: [PATCH 5/6] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

2021-08-04 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 2, 2021 at 8:44 AM liuhongt wrote: > > From: "Guo, Xuepeng" > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Detect FEATURE_AVX512FP16. > * common/config/i386/i386-common.c > (OPTION_MASK_ISA_AVX512FP16_SET, >

Re: [PATCH] [i386] Refine predicate of peephole2 to general_reg_operand. [PR target/101743]

2021-08-04 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 4, 2021 at 5:33 AM liuhongt wrote: > > Hi: > The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc > should only work on general registers, considering that x86 also > supports mov instructions between gpr, sse reg, mask reg, limiting the > peephole2 predicate to

Re: [PATCH] x86: Use XMM31 for scratch SSE register

2021-08-03 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 3, 2021 at 10:15 AM Hongtao Liu wrote: > > On Tue, Aug 3, 2021 at 4:03 PM Uros Bizjak via Gcc-patches > wrote: > > > > On Mon, Aug 2, 2021 at 7:47 PM H.J. Lu wrote: > > > > > > In 64-bit mode, use XMM31 for scratch SSE register to avoid vzerou

Re: [PATCH] x86: Use XMM31 for scratch SSE register

2021-08-03 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 2, 2021 at 7:47 PM H.J. Lu wrote: > > In 64-bit mode, use XMM31 for scratch SSE register to avoid vzeroupper > if possible. > > gcc/ > > * config/i386/i386.c (ix86_gen_scratch_sse_rtx): In 64-bit mode, > try XMM31 to avoid vzeroupper. > > gcc/testsuite/ > > *

Re: [PATCH v7 03/10] x86: Update piecewise move and store

2021-08-02 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 2, 2021 at 4:57 PM H.J. Lu wrote: > > On Mon, Aug 2, 2021 at 4:20 AM Uros Bizjak wrote: > > > > On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu wrote: > > > > > > We can use TImode/OImode/XImode integers for piecewise move and store. > > &

Re: [PATCH v6 03/10] x86: Update piecewise move and store

2021-08-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu wrote: > > We can use TImode/OImode/XImode integers for piecewise move and store. > > 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of > bytes that a single instruction can move quickly between memory and > registers or between two

Re: [PATCH] i386: Improve SImode constant - __builtin_clzll for -mno-lzcnt

2021-08-01 Thread Uros Bizjak via Gcc-patches
On Sun, Aug 1, 2021 at 7:12 PM H.J. Lu wrote: > > On Sat, Jul 31, 2021 at 12:53:44PM -0700, H.J. Lu wrote: > > On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches > > wrote: > > > > > > On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote: >

Re: [PATCH] x86: Don't enable LZCNT/POPCNT if disabled explicitly

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 30, 2021 at 3:04 PM H.J. Lu wrote: > > gcc/ > > PR target/101685 > * config/i386/i386-options.c (ix86_option_override_internal): > Don't enable LZCNT/POPCNT if they have been disabled explicitly. > > gcc/testsuite/ > > PR target/101685 > *

Re: [PATCH] i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 28, 2021 at 10:36 AM Jakub Jelinek wrote: > > Hi! > > This patch improves emitted code for the non-TARGET_LZCNT case. > As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT > CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we > can take advantage of that and
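
[Editor's sketch, not part of the patch] A "constant - __builtin_clz" expression typically computes the index of the highest set bit. The C below assumes a 32-bit unsigned int and a nonzero argument (a zero argument is undefined, as the message notes); the function names are hypothetical. Because a leading-zero count n lies in 0..31, 31 - n and 31 ^ n are the same value, which is the sort of identity a backend can exploit when only a bit-scan instruction is available.

```c
#include <assert.h>

/* Index of the highest set bit, written as a subtraction.
   x must be nonzero: __builtin_clz (0) is undefined. */
static int top_bit_sub(unsigned int x)
{
    return 31 - __builtin_clz(x);
}

/* The same value written with XOR; valid because the count is in 0..31. */
static int top_bit_xor(unsigned int x)
{
    return 31 ^ __builtin_clz(x);
}

/* Check both forms against every single-bit input. */
static void check_top_bit(void)
{
    for (int shift = 0; shift < 32; shift++) {
        unsigned int x = 1u << shift;
        assert(top_bit_sub(x) == shift);
        assert(top_bit_xor(x) == shift);
    }
}
```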

Re: PING^1 [PATCH v2] x86: Check AVX512 without mask instructions

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 26, 2021 at 5:33 AM Hongtao Liu wrote: > > On Wed, Jul 14, 2021 at 8:27 PM H.J. Lu wrote: > > > > On Fri, Jun 25, 2021 at 5:39 AM H.J. Lu wrote: > > > > > > On Fri, Jun 25, 2021 at 12:50 AM Uros Bizjak wrote: > > > > > >

Re: [x86_64 PATCH] Decrement followed by cmov improvements.

2021-07-30 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 26, 2021 at 1:27 PM Roger Sayle wrote: > > > The following patch to the x86_64 backend improves the code generated > for a decrement followed by a conditional move. The primary change is > to recognize that after subtracting one, checking the result is -1 (or > equivalently that the

Re: [PATCH 04/10] AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16 instructions.

2021-07-22 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 21, 2021 at 9:44 AM liuhongt wrote: > > From: "Guo, Xuepeng" > > gcc/ChangeLog: > > * common/config/i386/cpuinfo.h (get_available_features): > Detect FEATURE_AVX512FP16. > * common/config/i386/i386-common.c > (OPTION_MASK_ISA_AVX512FP16_SET, >

Re: [PATCH] x86: Remove OPTION_MASK_ISA_SSE4_2 from CRC32 _builtin functions

2021-07-21 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 21, 2021 at 14:23, H.J. Lu wrote: > Since > > commit 39671f87b2df6a1894cc11a161e4a7949d1ddccd > Author: H.J. Lu > Date: Thu Apr 15 05:59:48 2021 -0700 > > x86: Use crc32 target option for CRC32 intrinsics > > enabled OPTION_MASK_ISA_CRC32 for -msse4 and removed

Re: [PATCH 03/10] [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

2021-07-21 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 21, 2021 at 9:43 AM liuhongt wrote: > > gcc/ChangeLog: > > * optabs-query.c (get_best_extraction_insn): Use word_mode for > HF field. > > libgcc/ChangeLog: > > * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro. > * config/i386/64/sfp-machine.h

Re: [PATCH 02/10] [i386] Enable _Float16 type for TARGET_SSE2 and above.

2021-07-21 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 21, 2021 at 9:43 AM liuhongt wrote: > > gcc/ChangeLog: > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. > * config/i386/i386.c (enum x86_64_reg_class): Add > X86_64_SSEHF_CLASS. > (merge_classes): Handle X86_64_SSEHF_CLASS. >

Re: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-21 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu wrote: > > On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak wrote: > > > > On Tue, Jul 20, 2021 at 2:33 PM liuhongt wrote: > > > > > > Hi: > > > As mention in > > > https://gcc.gnu.org/pipermail

Re: [PATCH] Support logic shift left/right for avx512 mask type.

2021-07-20 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 20, 2021 at 2:33 PM liuhongt wrote: > > Hi: > As mentioned in > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html > > cut start- > > note for the lowpart we can just view-convert away the excess bits, > > fully re-using the mask. We generate surprisingly "good"

[PATCH] i386: Remove atomic_storedi_fpu and atomic_loaddi_fpu peepholes [PR100182]

2021-07-19 Thread Uros Bizjak via Gcc-patches
These patterns result in non-atomic sequence. 2021-07-21 Uroš Bizjak gcc/ PR target/100182 * config/i386/sync.md (define_peephole2 atomic_storedi_fpu): Remove. (define_peephole2 atomic_loaddi_fpu): Ditto. gcc/testsuite/ PR target/100182 * gcc.target/i386/pr71245-1.c:

Re: [PATCH] ix86: Enable the GPR only instructions for -mgeneral-regs-only

2021-07-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 18, 2021 at 3:40 AM H.J. Lu wrote: > > For -mgeneral-regs-only, enable the GPR only instructions which are > enabled implicitly by SSE ISAs unless they have been disabled explicitly. > > gcc/ > > PR target/101492 > * common/config/i386/i386-common.c

Re: [PATCH] x86: Don't issue vzeroupper if callee returns AVX register

2021-07-18 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 18, 2021 at 6:47 PM H.J. Lu wrote: > > Don't issue vzeroupper before function call if callee returns AVX > register since callee must be compiled with AVX. > > gcc/ > > PR target/101495 > * config/i386/i386.c (ix86_check_avx_upper_stores): Moved before >

[PATCH] i386: Fix ix86_hard_regno_mode_ok for TDmode on 32bit targets [PR101346]

2021-07-15 Thread Uros Bizjak via Gcc-patches
General regs on 32bit targets do not support 128bit modes, including TDmode. gcc/ 2021-07-15 Uroš Bizjak PR target/101346 * config/i386/i386.h (VALID_SSE_REG_MODE): Add TDmode. (VALID_INT_MODE_P): Add SDmode and DDmode. Add TDmode for TARGET_64BIT. (VALID_DFP_MODE_P):

Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 15, 2021 at 10:49, Kewen.Lin wrote: > on 2021/7/15 4:23 PM, Uros Bizjak wrote: > > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin wrote: > >> > >> Hi Uros, > >> > >> on 2021/7/15 3:17 PM, Uros Bizjak wrote: > >

Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin wrote: > > Hi Uros, > > on 2021/7/15 3:17 PM, Uros Bizjak wrote: > > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote: > >> > >> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote: > >>> on 2021/7/14 下午2

Re: [PATCH v3] vect: Recog mul_highpart pattern

2021-07-15 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote: > > on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote: > > on 2021/7/14 2:38 PM, Richard Biener wrote: > >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin wrote: > >>> > >>> on 2021/7/13 8:42 PM, Richard Biener wrote: > On Tue, Jul 13, 2021 at

Re: [PATCH v3] x86: Don't enable UINTR in 32-bit mode

2021-07-13 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 13, 2021 at 8:59 PM Jakub Jelinek wrote: > > On Tue, Jul 13, 2021 at 09:35:18AM -0700, H.J. Lu wrote: > > Here is the v3 patch. OK for master? > > From my POV LGTM, but please give Uros a chance to chime in. > > > From ceab81ef97ab102c410830c41ba7fea911170d1a Mon Sep 17 00:00:00

[PATCH] i386: Fix vec_set expanders [PR101424]

2021-07-12 Thread Uros Bizjak via Gcc-patches
AVX does not support 32-byte integer compares, required by ix86_expand_vector_set_var. The following patch fixes vec_set expanders by introducing new vec_setm_avx2_operand predicate for AVX vector modes. gcc/ 2021-07-12 Uroš Bizjak PR target/101424 * config/i386/predicates.md

[PATCH] Change the type of memory classification functions to bool

2021-07-09 Thread Uros Bizjak via Gcc-patches
2021-07-09 Uroš Bizjak gcc/ * recog.c (memory_address_addr_space_p): Change the type to bool. Return true/false instead of 1/0. (offsettable_memref_p): Ditto. (offsettable_nonstrict_memref_p): Ditto. (offsettable_address_addr_space_p): Ditto. Change the type of addressp

[PATCH] i386: Fix *udivmodsi4_pow2_zext_? patterns

2021-07-09 Thread Uros Bizjak via Gcc-patches
In addition to the obvious cut-n-pasto where *udivmodsi4_pow2_zext_2 never matches, limit the range of the immediate operand to prevent out of range immediate operand of AND instruction. Found by inspection, the patterns rarely match (if at all), since tree optimizers do the transformation before

Re: Ping: [PATCH] Darwin, X86: Adjust call clobbers to allow for lazy-binding [PR100152].

2021-07-09 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 9, 2021 at 10:25 AM Iain Sandoe wrote: > > (early) ping; > if possible I’d like to get this onto master in time to back-port for 11.2. > > > On 4 Jul 2021, at 21:08, Iain Sandoe wrote: > > > > Hi, > > > > (I’m not going to defend the status quo here, it seems a bit prone > > to

Re: [x86_64 PATCH]: Improvement to signed division of integer constant.

2021-07-08 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle wrote: > > > This patch tweaks the way GCC handles 32-bit integer division on > x86_64, when the numerator is constant. Currently the function > > int foo (int x) { > return 100/x; > } > > generates the code: > foo: movl $100, %eax >

[PATCH] i386: Add pack/unpack patterns for 32bit vectors [PR100637]

2021-07-08 Thread Uros Bizjak via Gcc-patches
V1SI mode shift is needed to shift 32bit operands and consequently we need to implement V1SI moves and pushes. 2021-07-08 Uroš Bizjak gcc/ PR target/100637 * config/i386/i386-expand.c (ix86_expand_sse_unpack): Handle V4QI mode. * config/i386/mmx.md (V_32): New mode iterator.

[PATCH] i386: Add variable vec_set for 32bit vectors [PR97194]

2021-07-06 Thread Uros Bizjak via Gcc-patches
To generate sane code, an SSE4.1 variable PBLENDV instruction is needed. Also enable variable vec_set through vec_setm_operand predicate for TARGET_SSE4_1 instead of TARGET_AVX2. ix86_expand_vector_init_duplicate is able to emulate vpbroadcast{b,w} with pxor/pshufb. 2021-07-06 Uroš Bizjak

[PATCH] i386: Implement 4-byte vector (V4QI/V2HI) constant permutations [PR100637]

2021-07-05 Thread Uros Bizjak via Gcc-patches
2021-07-05 Uroš Bizjak gcc/ PR target/100637 * config/i386/i386-expand.c (ix86_split_mmx_punpck): Handle V4QI and V2HI modes. (expand_vec_perm_blend): Allow 4-byte vector modes with TARGET_SSE4_1. Handle V4QI mode. Emit mmx_pblendvb32 for 4-byte modes.

Re: [PATCH] [i386] Remove rex64suffix for v?cvtt?(ss|sd)*2si

2021-07-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 2, 2021 at 12:48 PM Hongyu Wang wrote: > > > > > On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > For instructions like cvtss2si, there is no need to output the 'l' > > > or 'q' suffixes just like cvtss2usi, since the output operand is always > > >

Re: [PATCH] [i386] Remove rex64suffix for v?cvtt?(ss|sd)*2si

2021-07-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote: > > Hi, > > For instructions like cvtss2si, there is no need to output the 'l' > or 'q' suffixes just like cvtss2usi, since the output operand is always > register and those suffixes are only used to distinguish ambiguous > memory operands. > >

Re: [PATCH] i386: Disable param ira-consider-dup-in-all-alts [PR100328]

2021-07-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 2, 2021 at 4:28 AM Kewen.Lin wrote: > > Hi, > > With Hongtao's help (thanks), we got the SPEC2017 performance > evaluation result on x86_64 (see [1]), this new parameter > ira-consider-dup-in-all-alts has negative effects on i386. > Since we observed it can benefit ports aarch64 and

Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-02 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu wrote: > > > AVX512FP16 is disclosed, refer to [1]. > > > There're 100+ instructions for AVX512FP16, 67 gcc patches, for the > > > convenience of review, we divide the 67 patches into 2 major parts. > > > The first part is 2 patches containing

[PATCH] i386: Return true/false instead of 1/0 from predicates.

2021-07-01 Thread Uros Bizjak via Gcc-patches
No functional changes. 2021-07-01 Uroš Bizjak gcc/ * config/i386/predicates.md (ix86_endbr_immediate_operand): Return true/false instead of 1/0. (movq_parallel): Ditto. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git

[PATCH] Return true/false instead of 1/0 from generic predicates.

2021-07-01 Thread Uros Bizjak via Gcc-patches
No functional changes. 2021-07-01 Uroš Bizjak gcc/ * recog.c (general_operand): Return true/false instead of 1/0. (register_operand): Ditto. (immediate_operand): Ditto. (const_int_operand): Ditto. (const_scalar_int_operand): Ditto. (const_double_operand): Ditto.

Re: [PATCH v6 1/2] x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast

2021-07-01 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 1, 2021 at 2:42 PM H.J. Lu wrote: > > Hi Uros, > > On Thu, Jul 1, 2021 at 1:32 AM Hongtao Liu wrote: > > > > On Tue, Jun 29, 2021 at 6:16 AM H.J. Lu wrote: > > > > > > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR > > > operands to vector broadcast from an

[PATCH] Change the type of predicates to bool.

2021-07-01 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 30, 2021 at 12:50 PM Richard Biener wrote: > > On Wed, Jun 30, 2021 at 10:47 AM Uros Bizjak via Gcc-patches > wrote: > > > > This RFC patch changes the type of predicates to bool. However, some > > of the targets (e.g. x86) use indirect functions to

Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-01 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 1, 2021 at 2:40 PM H.J. Lu wrote: > > On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak wrote: > > > > [Sorry for double post, gcc-patches address was wrong in original post] > > > > On Thu, Jul 1, 2021 at 7:48 AM liuhongt wrote: > > > > > >

Re: [PATCH 0/2] Initial support for AVX512FP16

2021-07-01 Thread Uros Bizjak via Gcc-patches
[Sorry for double post, gcc-patches address was wrong in original post] On Thu, Jul 1, 2021 at 7:48 AM liuhongt wrote: > > Hi: > AVX512FP16 is disclosed, refer to [1]. > There're 100+ instructions for AVX512FP16, 67 gcc patches, for the > convenience of review, we divide the 67 patches into

[PATCH] i386: Add integer nabs instructions [PR101044]

2021-07-01 Thread Uros Bizjak via Gcc-patches
The patch adds integer nabs "(NEG (ABS (...)))" instructions, adds STV conversion and adjusts STV cost calculations accordingly. When CMOV instruction is used to implement abs, the sign is determined from the preceding operand negation, and CMOVS is used to select between negated and non-negated
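
[Editor's sketch, not part of the patch] "nabs" is the negated absolute value, (NEG (ABS x)). A minimal C version is shown below; the function name is hypothetical. Writing it as a select between x and -x, rather than as -abs(x), keeps it well defined for INT_MIN, since the negation is only applied to non-negative inputs.

```c
#include <assert.h>
#include <limits.h>

/* Negated absolute value: always <= 0.  Selects between the value
   and its negation, which is the shape a conditional move can
   implement. */
static int nabs_int(int x)
{
    return x < 0 ? x : -x;
}

/* Spot checks, including the INT_MIN case where abs () would overflow. */
static void check_nabs(void)
{
    assert(nabs_int(5) == -5);
    assert(nabs_int(-5) == -5);
    assert(nabs_int(0) == 0);
    assert(nabs_int(INT_MIN) == INT_MIN);
}
```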

[RFC PATCH] Change the type of predicates to bool.

2021-06-30 Thread Uros Bizjak via Gcc-patches
This RFC patch changes the type of predicates to bool. However, some of the targets (e.g. x86) use indirect functions to call the predicates, so without the local change, the build fails. Putting the patch through CI bots should weed out the problems, but I have no infrastructure to do it myself.

[PATCH] i386: Add V2SFmode vec_addsub pattern [PR95046]

2021-06-29 Thread Uros Bizjak via Gcc-patches
gcc/ 2021-06-21 Uroš Bizjak PR target/95046 * config/i386/mmx.md (vec_addsubv2sf3): New insn pattern. gcc/testsuite/ 2021-06-21 Uroš Bizjak PR target/95046 * gcc.target/i386/pr95046-9.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-25 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 25, 2021 at 8:48 AM Richard Biener wrote: > > On Thu, 24 Jun 2021, Uros Bizjak wrote: > > > On Thu, Jun 24, 2021 at 1:07 PM Richard Biener wrote: > > > > > This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1 > > > instructi

Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-25 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu wrote: > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches > wrote: > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu wrote: > > > > > > CPUID functions are used to detect CPU features. If vector I

Re: [PATCH] x86: Compile CPUID functions with -mgeneral-regs-only

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 2:12 PM H.J. Lu wrote: > > CPUID functions are used to detect CPU features. If vector ISAs > are enabled, compiler is free to use them in these functions. Add > __attribute__ ((target("general-regs-only"))) to CPUID functions > to avoid vector instructions. These

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 1:07 PM Richard Biener wrote: > This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1 > instructions which compute { v0[0] - v1[0], v0[1] + v1[1], ... } > thus subtract, add alternating on lanes, starting with subtract. > > It adds a corresponding
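The lane semantics quoted above (subtract on even lanes, add on odd lanes, starting with subtract) can be written as a scalar reference loop. This is only an illustrative sketch: the name `addsub_ref` and its signature are assumptions, not part of the patch under discussion, and it models the arithmetic rather than the SLP pattern matcher.

```c
#include <stddef.h>

/* Hypothetical scalar reference for the addsub lane pattern:
   out[i] = v0[i] - v1[i] for even i, v0[i] + v1[i] for odd i.  */
void addsub_ref(const double *v0, const double *v1, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (i % 2 == 0) ? v0[i] - v1[i] : v0[i] + v1[i];
}
```

A loop of this shape is the kind of input a vectorizer would have to recognize in order to use an addsub-style instruction.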

[RFC PATCH] i386: Add pack/unpack patterns for 64bit vectors [PR89021]

2021-06-24 Thread Uros Bizjak via Gcc-patches
2021-06-24 Uroš Bizjak gcc/ PR target/89021 * config/i386/i386-expand.c (ix86_expand_sse_unpack): Handle V8QI and V4HI modes. * config/i386/mmx.md (sse4_1_v4qiv4hi2): New insn pattern. (sse4_1_v4qiv4hi2): Ditto. (mmxpackmode): New mode attribute.

Re: [PATCH] [i386] Revert x86_order_regs_for_local_alloc changes in r12-1669.

2021-06-24 Thread Uros Bizjak via Gcc-patches
On Thu, Jun 24, 2021 at 10:44 AM liuhongt wrote: > > Still put general regs as first alloca order. > > This should fix 2 failures introduced by r12-1669, also add xfail to new > failed testcases to temporarily avoid regression, eventually xfail should > be removed. > > compare_test log on

[PATCH] i386: Add PPERM two-operand 64bit vector permutation [PR89021]

2021-06-23 Thread Uros Bizjak via Gcc-patches
Add emulation of V8QI PPERM permutations for TARGET_XOP target. Similar to PSHUFB, the permutation is performed with V16QI PPERM instruction, where selector is defined in V16QI mode with inactive elements set to 0x80. Specific to two operand permutations is the remapping of elements from the

[PATCH] i386: Prevent unwanted combine from LZCNT to BSR [PR101175]

2021-06-23 Thread Uros Bizjak via Gcc-patches
The current RTX pattern for BSR allows the combine pass to convert an LZCNT insn to BSR. Note that LZCNT has defined behavior, returning the operand size when the operand is zero, whereas BSR does not. Add a BSR specific setting of zero-flag to RTX pattern of BSR insn in order to avoid matching unwanted
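The difference described above can be illustrated with scalar reference models for a 32-bit operand. This is a sketch under stated assumptions: the names `lzcnt32_ref` and `bsr32_ref` are hypothetical, and the functions model the documented instruction results, not the RTX patterns in the patch.

```c
#include <stdint.h>

/* Hypothetical model of LZCNT: counts leading zero bits and returns
   the operand size (32) when x is zero.  */
unsigned lzcnt32_ref(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t bit = UINT32_C(1) << 31; bit != 0 && (x & bit) == 0; bit >>= 1)
        n++;
    return n;
}

/* Hypothetical model of BSR: for nonzero x, stores the index of the
   highest set bit in *index and returns 1.  For x == 0 the instruction
   has no defined result, so *index is left untouched and 0 is returned
   (modelling the zero flag being set).  */
int bsr32_ref(uint32_t x, unsigned *index)
{
    if (x == 0)
        return 0;
    *index = 31 - lzcnt32_ref(x);
    return 1;
}
```

For nonzero inputs the two are related by `index == 31 - lzcnt`, which is why a transformation between them is tempting; the zero case is where they diverge.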

Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 23, 2021 at 11:41 AM Uros Bizjak wrote: > > On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu wrote: > > > > > > > > Also when allocno cost of GENERAL_REGS is same as MASK_REGS, > > > > > > > allocate > > >

Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 23, 2021 at 11:32 AM Hongtao Liu wrote: > > > > > > Also when allocno cost of GENERAL_REGS is same as MASK_REGS, > > > > > > allocate > > > > > > MASK_REGS first since it has already been disparaged. > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > PR

Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-23 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 21, 2021 at 10:08 AM Hongtao Liu wrote: > > On Mon, Jun 21, 2021 at 3:28 PM Uros Bizjak via Gcc-patches > wrote: > > > > On Mon, Jun 21, 2021 at 6:56 AM liuhongt wrote: > > > > > > The avx512 supports bitwise operations with mask regis

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 22, 2021 at 12:34 PM Richard Biener wrote: > > On Tue, 22 Jun 2021, Uros Bizjak wrote: > > > On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford > > wrote: > > > > > >> Well, the pattern is called addsub in the x86 world because highpart >

Re: [PATCH][RFC] Add x86 subadd SLP pattern

2021-06-22 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 22, 2021 at 11:42 AM Richard Sandiford wrote: > >> Well, the pattern is called addsub in the x86 world because highpart > >> does add and lowpart does sub. In left-to-right writing systems > >> highpart comes before lowpart, so you have addsub. > > > > The other targets mentioned do

Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 21, 2021 at 12:28 PM Jakub Jelinek wrote: > > On Mon, Jun 21, 2021 at 12:14:09PM +0200, Richard Biener wrote: > > > But we could do what I've done in > > > r11-7694-gd55ce33a34a8e33d17285228b32cf1e564241a70 > > > - have int ix86_last_zero_store_uid; > > > set to INSN_UID of the last

Re: [PATCH] Disparage slightly the mask register alternative for bitwise operations. [PR target/101142]

2021-06-21 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 21, 2021 at 6:56 AM liuhongt wrote: > > AVX512 supports bitwise operations with mask registers, but the > throughput of those instructions is much lower than that of the > corresponding gpr version, so we additionally disparage > slightly the mask register alternative for

Re: [x86_64 PATCH] PR target/11877: Use xor to write zero to memory with -Os

2021-06-21 Thread Uros Bizjak via Gcc-patches
On Sun, Jun 20, 2021 at 5:37 PM Roger Sayle wrote: > > > The following patch attempts to resolve PR target/11877 (without > triggering PR/23102). On x86_64, writing an SImode or DImode zero > to memory uses an instruction encoding that is larger than first > clearing a register (using xor) then
