On Sat, Nov 13, 2021 at 3:34 AM Hongyu Wang wrote:
>
> Hi,
>
> From the CPU's point of view, getting a cache line for writing is more
> expensive than reading. See Appendix A.2 Spinlock in:
>
> https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/
> xeon-lock-scaling-analysis
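For illustration only, the read-before-write idea behind that spinlock remark can be sketched in C11 as a test-and-test-and-set lock. This is not code from the patch or from the Intel paper; the names ttas_lock, ttas_init, ttas_acquire, ttas_try_acquire and ttas_release are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type, used only for this sketch. */
typedef struct { atomic_bool locked; } ttas_lock;

void ttas_init(ttas_lock *l) { atomic_init(&l->locked, false); }

/* One attempt: read first, and write only if the lock looked free. */
bool ttas_try_acquire(ttas_lock *l)
{
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;   /* held: no write request for the line was issued */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Spin with plain loads; the exchange is attempted only after a free read. */
void ttas_acquire(ttas_lock *l)
{
    while (!ttas_try_acquire(l))
        ;
}

void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

While the lock is held, waiters only read, so they do not request the cache line for writing until the lock looks free.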
On Wed, Nov 10, 2021 at 10:09 AM Cui,Lili wrote:
>
> Hi Uros,
>
> This patch is to update mtune for alderlake.
>
> Bootstrap is ok, and no regressions for i386/x86-64 testsuite.
>
> OK for master?
>
> Update mtune for alderlake, Alder Lake Intel Hybrid Technology will not
> support
> Intel® AVX-5
On Tue, Oct 5, 2021 at 5:50 PM Eric Botcazou via Gcc-patches
wrote:
>
> Hi,
>
> the first issue is that the !gotoff_operand path of legitimize_pic_address in
> large PIC model does not make use of REG when it is available, which breaks
> for thunks because new pseudo-registers can no longer be cre
On Thu, Nov 4, 2021 at 3:44 PM H.J. Lu via Gcc-patches
wrote:
>
> Check leal and addl for x32 to fix:
>
> FAIL: gcc.target/i386/amxtile-3.c scan-assembler addq[ \\t]+\\$12
> FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+4
> FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+
On Tue, Nov 2, 2021 at 9:41 AM Jakub Jelinek wrote:
>
> Hi!
>
> As discussed in the PR, TImode isn't supported for -m32 on x86 (for the same
> reason as on most 32-bit targets, no support for > 2 * BITS_PER_WORD
> precision integers), but since PR32280 V1TImode is allowed with -msse in SSE
> regs,
On Mon, Nov 1, 2021 at 5:45 PM Roger Sayle wrote:
>
>
> This simple patch improves the implementation of 128-bit (TImode)
> rotations on x86_64 (a missed optimization opportunity spotted
> during the recent V1TImode improvements).
>
> Currently, the function:
>
> unsigned __int128 rotrti3(unsigned
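The function quoted above is cut off in this listing, so the following is only a generic sketch of a 128-bit rotate right in C, not a reconstruction of it. The name rotr128 and the typedef u128 are hypothetical, and unsigned __int128 is a GCC extension assumed to be available on the target.

```c
/* GCC extension; assumed available (e.g. on x86_64). */
typedef unsigned __int128 u128;

/* Rotate x right by n bits. The count is reduced modulo 128, and a
   count of zero is handled separately to avoid a shift by 128. */
u128 rotr128(u128 x, unsigned n)
{
    n &= 127;
    if (n == 0)
        return x;
    return (x >> n) | (x << (128 - n));
}
```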
On Mon, Nov 1, 2021 at 9:43 AM Jakub Jelinek wrote:
>
> On Mon, Nov 01, 2021 at 08:27:12AM +0100, Uros Bizjak wrote:
> > > Also, I wonder for all these patterns (previously and now added),
> > > shouldn't
> > > they have && TARGET_64BIT in conditions? I mean, we don't really support
> > > scalar
On Sun, Oct 31, 2021 at 11:02 AM Roger Sayle wrote:
>
>
> Very many thanks to Jakub for proof-reading my patch, catching my silly
> GNU-style
> mistakes and making excellent suggestions. This revised patch incorporates
> all of
> his feedback, and has been tested on x86_64-pc-linux-gnu with make
On Mon, Oct 25, 2021 at 4:16 PM Roger Sayle wrote:
>
>
> Hi Uros,
> I believe the proposed sequences should be dramatically faster than LLVM's
> implementation(s), due to the large latencies required to move values between
> the vector and scalar parts on modern x86_64 microarchitectures. All of
On Sun, Oct 24, 2021 at 6:34 PM Roger Sayle wrote:
>
>
> This patch provides RTL expanders to implement logical shifts and
> rotates of 128-bit values (stored in vector integer registers) by
> constant bit counts. Previously, GCC would transfer these values
> to a pair of scalar registers (TImode
On Fri, Oct 22, 2021 at 9:19 AM Roger Sayle wrote:
>
>
> On x86_64, V1TI mode holds a 128-bit integer value in a (vector) SSE
> register (where regular TI mode uses a pair of 64-bit general purpose
> scalar registers). This patch improves the implementation of AND, IOR,
> XOR and NOT on these val
On Thu, Oct 21, 2021 at 6:47 PM H.J. Lu wrote:
>
> PR target/98667
> * doc/invoke.texi: Document -fcf-protection requires i686 or
> new.
Obvious patch?
Uros.
> ---
> gcc/doc/invoke.texi | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/i
On Thu, Oct 21, 2021 at 6:50 PM H.J. Lu wrote:
>
> On Tue, Oct 19, 2021 at 11:42 PM Uros Bizjak wrote:
> >
> > On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu wrote:
> > >
> > > commit 247c407c83f0015f4b92d5f71e45b63192f6757e
> > > Author: Roger Sayle
> > > Date: Mon Oct 18 12:15:40 2021 +0100
> > >
On Tue, Oct 19, 2021 at 8:23 PM H.J. Lu wrote:
>
> commit 247c407c83f0015f4b92d5f71e45b63192f6757e
> Author: Roger Sayle
> Date: Mon Oct 18 12:15:40 2021 +0100
>
> Try placing RTL folded constants in the constant pool.
>
> My recent attempts to come up with a testcase for my patch to ev
On Mon, Oct 18, 2021 at 1:23 PM Martin Liška wrote:
>
> On 10/11/21 13:17, Martin Liška wrote:
> > On 10/4/21 23:02, Andrew Pinski wrote:
> >> It might be useful to skip tabs for the same reason as spaces really.
> >
> > Sure, be my guest.
> >
> > Martin
>
> May I please ping this i386-specific pa
2021-10-18 Uroš Bizjak
PR target/102761
gcc/ChangeLog:
* config/i386/i386.c (ix86_print_operand_address):
Error out for non-address_operand asm operands.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr102761.c: New test.
Bootstrapped and regression tested on x86_64-linux-gnu {
On Fri, Oct 15, 2021 at 2:15 PM Roger Sayle wrote:
>
>
> My previous patch, which was intended to reduce the differences seen by
> the combination of -march=cascadelake and -m32, has additionally found
> some more instances where this combination behaves differently to regular
> x86_64-pc-linux-gn
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle wrote:
>
>
> Good catch. I agree with Hongtao that although my testing revealed
> no problems with the previous version of this patch, it makes sense to
> call gen_reg_rtx to generate a pseudo intermediate instead of attempting
> to reuse the existing
As explained in PR82524, LRA is not able to reload strict_low_part inout
operand with matched input operand. The patch introduces a workaround,
where we allow LRA to generate an instruction with non-matched input operand
which is split post reload to an instruction that inserts non-matched input
op
On Mon, Oct 11, 2021 at 8:26 AM liuhongt wrote:
>
> After providing expanders for reduc_umin/umax/smin/smax_scal_v4qi,
> performance is a little bit faster than before for reduce operations
> w/ options -O2 -march=haswell, -O2 -march=skylake-avx512
> and -Ofast -march=skylake-avx512.
>
> gcc/Cha
Convert (sign_extend:WIDE (any_logic:NARROW (memory, immediate)))
to (any_logic:WIDE (sign_extend (memory)), (sign_extend (immediate))).
This eliminates sign extension after logic operation.
2021-09-30 Uroš Bizjak
gcc/
PR target/89954
* config/i386/i386.md
(sign_extend:WIDE (any_lo
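The transformation described above relies on sign extension distributing over bitwise logic. A minimal C sketch of that identity follows, assuming a 32-bit NARROW and a 64-bit WIDE mode; the function names are hypothetical and nothing here is taken from i386.md.

```c
#include <stdint.h>

/* Logic in the narrow type, then sign-extend the result. */
int64_t and_then_extend(int32_t m, int32_t imm) { return (int64_t)(int32_t)(m & imm); }
int64_t ior_then_extend(int32_t m, int32_t imm) { return (int64_t)(int32_t)(m | imm); }
int64_t xor_then_extend(int32_t m, int32_t imm) { return (int64_t)(int32_t)(m ^ imm); }

/* Sign-extend both inputs first, then do the logic in the wide type. */
int64_t extend_then_and(int32_t m, int32_t imm) { return (int64_t)m & (int64_t)imm; }
int64_t extend_then_ior(int32_t m, int32_t imm) { return (int64_t)m | (int64_t)imm; }
int64_t extend_then_xor(int32_t m, int32_t imm) { return (int64_t)m ^ (int64_t)imm; }
```

Each pair returns the same value for every input, which is why no separate sign extension is needed after the logic operation.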
On Tue, Sep 28, 2021 at 11:33 AM Jakub Jelinek wrote:
>
> Hi!
>
> i387 has instructions to store some transcendental numbers into the top of
> stack. The problem is that what exact bit in the last place one gets for
> those depends on the current rounding mode, the CPU knows the number with
> slig
On Tue, Sep 28, 2021 at 8:42 AM liuhongt wrote:
>
> Hi:
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/102494
> * config/i386/i386-expand.c (emit_reduc_half): Handle V4HImode.
> * config/i386/mmx.md (reduc_pl
x-gnu{-m32,} and sde.
>
> OK for master with the updated one?
I'd put this new pattern in mmx.md to keep 64bit/32bit modes in
mmx.md, similar to e.g. FMA patterns among others.
OK with the eventual above change.
Thanks,
Uros.
>
> Uros Bizjak via Gcc-patches wrote on Mon, Sep 27, 2021, PM 7:
On Mon, Sep 27, 2021 at 12:42 PM Hongyu Wang wrote:
>
> Hi Uros,
>
> This patch intends to support V4HF/V2HF vector type and basic operations.
>
> For 32bit target, V4HF vector is parsed same as __m64 type, V2HF
> is parsed by stack and returned from GPR since it is not specified
> by ABI.
>
> We
On Fri, Sep 24, 2021 at 1:26 PM liuhongt wrote:
>
> Hi:
> Related discussion in [1] and PR.
>
> Bootstrapped and regtest on x86_64-linux-gnu{-m32,}.
> Ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574330.html
>
> gcc/ChangeLog:
>
> PR target/102464
>
On Wed, Sep 22, 2021 at 7:09 AM liuhongt wrote:
>
> Hi:
> fma/fms/fnma/fnmsv2sf4 are defined only under (TARGET_FMA || TARGET_FMA4).
> The patch extends the expanders to TARGET_AVX512VL.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>
On Mon, Sep 20, 2021 at 8:20 PM Fāng-ruì Sòng via Gcc-patches
wrote:
>
> PING^5 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
>
> On Sat, Sep 4, 2021 at 12:11 PM Fāng-ruì Sòng wrote:
> >
> > PING^4 https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570139.html
> >
> > One major d
On Fri, Sep 17, 2021 at 5:15 AM Cui, Lili wrote:
>
>
> > -Original Message-
> > From: Uros Bizjak
> > Sent: Thursday, September 16, 2021 2:28 PM
> > To: Cui, Lili
> > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; H. J. Lu
> >
> > Subject: Re: [PATCH 3/4] [PATCH 3/4] x86: Properly handle
>
After a recent change only a boolean value is returned.
2021-09-16 Uroš Bizjak
gcc/
* config/i386/i386-protos.h (ix86_decompose_address):
Change return type to bool.
* config/i386/i386.c (ix86_decompose_address): Ditto.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32
On Wed, Sep 15, 2021 at 10:10 AM wrote:
>
> From: "H.J. Lu"
>
> Simplify memcpy and memset inline strategies to avoid branches for
> -mtune=tremont:
>
> 1. Create Tremont cost model from generic cost model.
> 2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> load and store
On Wed, Sep 15, 2021 at 10:09 AM wrote:
>
> From: "H.J. Lu"
>
> Initial -mtune=tremont update
>
> 1. Use Haswell scheduling model.
> 2. Assume that stack engine allows to execute push&pop instructions in
> parallel.
> 3. Prepare for scheduling pass as -mtune=generic.
> 4. Use the same issue rate as
On Wed, Sep 15, 2021 at 10:10 AM wrote:
>
> From: "H.J. Lu"
>
> 1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters.
> 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with
> TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT
On Wed, Sep 15, 2021 at 10:10 AM wrote:
>
> From: "H.J. Lu"
>
> Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when
> handling avx_partial_xmm_update attribute. Don't convert AVX partial
> XMM register update if vector packed SSE conversion should be used.
>
> gcc/
>
>
On Thu, Aug 12, 2021 at 5:32 PM Martin Liška wrote:
>
> On 8/12/21 5:26 PM, H.J. Lu wrote:
> > Will it hurt if they have proper feature_priorities you added?
>
> No. They are unused, but we should use the proper priorities.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (cpu_indicator_init): Add s
There is no point to check RTXes before calling force_reg,
force_reg checks for REG RTX by itself.
2021-08-26 Uroš Bizjak
gcc/
* config/i386/i386.md (*btr_1): Call force_reg unconditionally.
(conditional moves with memory inputs splitters): Ditto.
* config/i386/sse.md (one_cmpl2):
We want to replace all REGs equal to FROM.
2021-08-26 Uroš Bizjak
gcc/
PR target/102057
* config/i386/i386.md (cmove reg-reg move elimination peephole2s):
Set all_regs to true in the call to replace_rtx.
I was not able to create a testcase without warnings.
Bootstrapped and regre
On Tue, Aug 24, 2021 at 5:22 PM Hongyu Wang wrote:
>
> Hi Uros,
>
> Sorry for the late update. I have tried adjusting the combine pass but
> found it is not easy to modify shift const, so I came up with an
> alternative solution with your patch. It matches the non-canonical
> zero-extend in ix86_d
On Tue, Aug 24, 2021 at 4:57 PM H.J. Lu wrote:
>
> On Sun, Aug 15, 2021 at 11:11 PM Richard Biener
> wrote:
> >
> > On Fri, Aug 13, 2021 at 3:51 PM H.J. Lu wrote:
> > >
> > > and target("general-regs-only") function attribute
> > > were added to GCC 11. But their implementations are incomplete
On Mon, Aug 16, 2021 at 11:18 AM Hongyu Wang wrote:
>
> > So, the question is if the combine pass really needs to zero-extend
> > with 0xfffe, the left shift << 1 guarantees zero in the LSB, so
> > 0x should be better and in line with canonical zero-extension
> > RTX.
>
> The shift mas
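The point in the quoted question, that a left shift by one already clears the least significant bit, can be checked with a small C sketch. The mask constants in the quoted text are cut off in this listing, so the 32-bit values used here are assumptions, and the function names are hypothetical.

```c
#include <stdint.h>

/* Zero-extend (x << 1) using a mask that also clears bit 0 (assumed constant). */
uint64_t shl1_masked_lsb(uint32_t x)
{
    return (uint64_t)((x << 1) & 0xfffffffeu);
}

/* Zero-extend (x << 1) using the full 32-bit mask (assumed constant). */
uint64_t shl1_masked_full(uint32_t x)
{
    return (uint64_t)((x << 1) & 0xffffffffu);
}
```

Both functions return the same value for every input, because bit 0 of x << 1 is always zero.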
On Mon, Aug 16, 2021 at 11:19 AM liuhongt wrote:
>
> Hi:
> avx512f_scalef2 only accepts register_operand for operands[1],
> force it to reg in ldexp3.
>
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> Ok for trunk.
>
> gcc/ChangeLog:
>
> PR target/101930
> * config/
On Fri, Aug 13, 2021 at 9:21 AM Uros Bizjak wrote:
>
> On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang wrote:
> >
> > Hi,
> >
> > For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> > same, combine them with single leal under 64bit target since 32bit
> > register will be automat
On Fri, Aug 13, 2021 at 2:48 AM Hongyu Wang wrote:
>
> Hi,
>
> For lea + zero_extendsidi insns, if dest of lea and src of zext are the
> same, combine them with single leal under 64bit target since 32bit
> register will be automatically zero-extended.
>
> Bootstrapped and regtested on x86_64-linux
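As a rough illustration of the source-level shape this affects, the C function below computes an address-like expression in 32 bits and then zero-extends it to 64 bits. The function name and constants are hypothetical, and whether a given GCC version emits a single leal for it depends on the compiler and options.

```c
#include <stdint.h>

/* 32-bit arithmetic that wraps modulo 2^32, followed by zero extension.
   On x86-64, writing a 32-bit register clears the upper 32 bits, so the
   explicit extension needs no extra instruction in principle. */
uint64_t scaled_index(uint32_t base, uint32_t idx)
{
    uint32_t sum = base + idx * 4u + 8u;
    return (uint64_t)sum;
}
```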
On Thu, Aug 12, 2021 at 6:40 AM Hongtao Liu wrote:
> > > > Hi:
> > > > AVX512F supported vscalefs{s,d} which is the same as ldexp except the
> > > > second operand should be floating point.
> > > > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
> > > >
> > > > gcc/ChangeLog:
> > > >
On Wed, Aug 11, 2021 at 8:36 AM Uros Bizjak wrote:
>
> On Tue, Aug 10, 2021 at 2:13 PM liuhongt wrote:
> >
> > Hi:
> > AVX512F supported vscalefs{s,d} which is the same as ldexp except the
> > second operand should be floating point.
> > Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
On Tue, Aug 10, 2021 at 2:13 PM liuhongt wrote:
>
> Hi:
> AVX512F supported vscalefs{s,d} which is the same as ldexp except the
> second operand should be floating point.
> Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> gcc/ChangeLog:
>
> PR target/98309
> * config
On Mon, Aug 9, 2021 at 7:47 PM H.J. Lu wrote:
>
> On Mon, Aug 9, 2021 at 8:27 AM Uros Bizjak wrote:
> >
> > On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu wrote:
> > >
> > > On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak wrote:
> > > >
> > > > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu wrote:
> > > > >
> > >
On Mon, Aug 9, 2021 at 5:24 PM H.J. Lu wrote:
>
> On Sun, Aug 8, 2021 at 1:23 PM Uros Bizjak wrote:
> >
> > On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu wrote:
> > >
> > > Update vector_all_ones_operand to return true for const all 1s float
> > > vectors.
> > >
> > > gcc/
> > >
> > > PR target
Name V2SF logic insns, so expand_simple_binop works with V2SF modes.
2021-08-09 Uroš Bizjak
gcc/
PR target/101812
* config/i386/mmx.md (v2sf3):
Rename from *mmx_v2sf3
gcc/testsuite/
PR target/101812
* gcc.target/i386/pr101812.c: New test.
Bootstrapped and regression teste
On Sat, Aug 7, 2021 at 4:41 PM H.J. Lu wrote:
>
> Update vector_all_ones_operand to return true for const all 1s float
> vectors.
>
> gcc/
>
> PR target/101804
> * config/i386/predicates.md (vector_all_ones_operand): Return
> true for const all 1s float vectors.
>
> gcc/tes
Add missing operand predicate, otherwise any RTX will match.
2021-08-06 Uroš Bizjak
gcc/
PR target/101797
* config/i386/i386.md (cmove reg-to-reg move elimination peephole2s):
Add general_gr_operand predicate to operand 3.
gcc/testsuite/
PR target/101797
* gcc.target/i386/
On Wed, Aug 4, 2021 at 3:34 PM H.J. Lu wrote:
>
> On Tue, Aug 3, 2021 at 6:56 AM H.J. Lu wrote:
> >
> > 1. Update x86 STORE_MAX_PIECES to use OImode and XImode only if inter-unit
> > move is enabled since x86 uses vec_duplicate, which is enabled only when
> > inter-unit move is enabled, to implem
On Wed, Aug 4, 2021 at 3:20 PM H.J. Lu wrote:
>
> To avoid stack realignment, call ix86_gen_scratch_sse_rtx to get a
> scratch SSE register to copy data with SSE register from one
> memory location to another.
>
> gcc/
>
> PR target/101772
> * config/i386/i386-expand.c (ix86_e
On Mon, Aug 2, 2021 at 8:44 AM liuhongt wrote:
>
> From: "Guo, Xuepeng"
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect FEATURE_AVX512FP16.
> * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_AVX512FP16_SET,
> OP
On Wed, Aug 4, 2021 at 5:33 AM liuhongt wrote:
>
> Hi:
> The define_peephole2 which is added by r12-2640-gf7bf03cf69ccb7dc
> should only work on general registers, considering that x86 also
> supports mov instructions between gpr, sse reg, mask reg, limiting the
> peephole2 predicate to general_
On Tue, Aug 3, 2021 at 10:15 AM Hongtao Liu wrote:
>
> On Tue, Aug 3, 2021 at 4:03 PM Uros Bizjak via Gcc-patches
> wrote:
> >
> > On Mon, Aug 2, 2021 at 7:47 PM H.J. Lu wrote:
> > >
> > > In 64-bit mode, use XMM31 for scratch SSE register to avoid vzerou
On Mon, Aug 2, 2021 at 7:47 PM H.J. Lu wrote:
>
> In 64-bit mode, use XMM31 for scratch SSE register to avoid vzeroupper
> if possible.
>
> gcc/
>
> * config/i386/i386.c (ix86_gen_scratch_sse_rtx): In 64-bit mode,
> try XMM31 to avoid vzeroupper.
>
> gcc/testsuite/
>
> * gc
On Mon, Aug 2, 2021 at 4:57 PM H.J. Lu wrote:
>
> On Mon, Aug 2, 2021 at 4:20 AM Uros Bizjak wrote:
> >
> > On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu wrote:
> > >
> > > We can use TImode/OImode/XImode integers for piecewise move and store.
> > >
> > > 1. Define MAX_MOVE_MAX to 64, which is the co
On Fri, Jul 30, 2021 at 11:32 PM H.J. Lu wrote:
>
> We can use TImode/OImode/XImode integers for piecewise move and store.
>
> 1. Define MAX_MOVE_MAX to 64, which is the constant maximum number of
> bytes that a single instruction can move quickly between memory and
> registers or between two memo
On Sun, Aug 1, 2021 at 7:12 PM H.J. Lu wrote:
>
> On Sat, Jul 31, 2021 at 12:53:44PM -0700, H.J. Lu wrote:
> > On Fri, Jul 30, 2021 at 6:27 AM Jakub Jelinek via Gcc-patches
> > wrote:
> > >
> > > On Fri, Jul 30, 2021 at 12:27:39PM +0200, Uros Bizjak wrote:
> > > > Please put some space here, e.g.
On Fri, Jul 30, 2021 at 3:04 PM H.J. Lu wrote:
>
> gcc/
>
> PR target/101685
> * config/i386/i386-options.c (ix86_option_override_internal):
> Don't enable LZCNT/POPCNT if they have been disabled explicitly.
>
> gcc/testsuite/
>
> PR target/101685
> * gcc.ta
On Wed, Jul 28, 2021 at 10:36 AM Jakub Jelinek wrote:
>
> Hi!
>
> This patch improves emitted code for the non-TARGET_LZCNT case.
> As __builtin_clz* is UB on 0 argument and for !TARGET_LZCNT
> CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we
> can take advantage of that and ass
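Because __builtin_clz is undefined for a zero argument, callers that may pass zero need their own guard. A minimal sketch of such a wrapper follows; the name clz32_or_32 is hypothetical, and the code assumes a 32-bit unsigned int.

```c
/* Returns the number of leading zero bits, defining the result for 0 as 32.
   Assumes unsigned int is 32 bits wide. */
int clz32_or_32(unsigned int x)
{
    return x ? __builtin_clz(x) : 32;
}
```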
> > On Fri, Jun 25, 2021 at 4:51 AM Hongtao Liu wrote:
> > > > >
> > > > > On Fri, Jun 25, 2021 at 12:13 AM Uros Bizjak via Gcc-patches
> > > > > wrote:
> > > > > >
> > > > > > On Thu, Jun 24, 2021 at 2:12 PM H.J. L
On Mon, Jul 26, 2021 at 1:27 PM Roger Sayle wrote:
>
>
> The following patch to the x86_64 backend improves the code generated
> for a decrement followed by a conditional move. The primary change is
> to recognize that after subtracting one, checking the result is -1 (or
> equivalently that the o
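The equivalence mentioned above (the decremented value is -1 exactly when the value before the decrement was zero) can be written out in C. This is a sketch with hypothetical names, not the testcase from the patch; inputs equal to INT_MIN are excluded because x - 1 would overflow.

```c
/* Test the result of the decrement against -1. */
int dec_or_default(int x, int dflt)
{
    int y = x - 1;              /* caller must not pass INT_MIN */
    return y == -1 ? dflt : y;
}

/* Same result, testing the value before the decrement against 0. */
int dec_or_default_alt(int x, int dflt)
{
    return x == 0 ? dflt : x - 1;
}
```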
On Wed, Jul 21, 2021 at 9:44 AM liuhongt wrote:
>
> From: "Guo, Xuepeng"
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_available_features):
> Detect FEATURE_AVX512FP16.
> * common/config/i386/i386-common.c
> (OPTION_MASK_ISA_AVX512FP16_SET,
> O
On Wed, Jul 21, 2021 at 14:23, H.J. Lu wrote:
> Since
>
> commit 39671f87b2df6a1894cc11a161e4a7949d1ddccd
> Author: H.J. Lu
> Date: Thu Apr 15 05:59:48 2021 -0700
>
> x86: Use crc32 target option for CRC32 intrinsics
>
> enabled OPTION_MASK_ISA_CRC32 for -msse4 and removed TARGET_
On Wed, Jul 21, 2021 at 9:43 AM liuhongt wrote:
>
> gcc/ChangeLog:
>
> * optabs-query.c (get_best_extraction_insn): Use word_mode for
> HF field.
>
> libgcc/ChangeLog:
>
> * config/i386/32/sfp-machine.h (_FP_NANFRAC_H): New macro.
> * config/i386/64/sfp-machine.h (_
On Wed, Jul 21, 2021 at 9:43 AM liuhongt wrote:
>
> gcc/ChangeLog:
>
> * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode.
> * config/i386/i386.c (enum x86_64_reg_class): Add
> X86_64_SSEHF_CLASS.
> (merge_classes): Handle X86_64_SSEHF_CLASS.
> (e
On Wed, Jul 21, 2021 at 5:05 AM Hongtao Liu wrote:
>
> On Tue, Jul 20, 2021 at 9:41 PM Uros Bizjak wrote:
> >
> > On Tue, Jul 20, 2021 at 2:33 PM liuhongt wrote:
> > >
> > > Hi:
> > > As mentioned in
> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
> > >
> > > cut start
On Tue, Jul 20, 2021 at 2:33 PM liuhongt wrote:
>
> Hi:
> As mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575420.html
>
> cut start-
> > note for the lowpart we can just view-convert away the excess bits,
> > fully re-using the mask. We generate surprisingly "good"
These patterns result in a non-atomic sequence.
2021-07-21 Uroš Bizjak
gcc/
PR target/100182
* config/i386/sync.md (define_peephole2 atomic_storedi_fpu):
Remove.
(define_peephole2 atomic_loaddi_fpu): Ditto.
gcc/testsuite/
PR target/100182
* gcc.target/i386/pr71245-1.c: R
On Sun, Jul 18, 2021 at 3:40 AM H.J. Lu wrote:
>
> For -mgeneral-regs-only, enable the GPR only instructions which are
> enabled implicitly by SSE ISAs unless they have been disabled explicitly.
>
> gcc/
>
> PR target/101492
> * common/config/i386/i386-common.c (ix86_handle_option)
On Sun, Jul 18, 2021 at 6:47 PM H.J. Lu wrote:
>
> Don't issue vzeroupper before function call if callee returns AVX
> register since callee must be compiled with AVX.
>
> gcc/
>
> PR target/101495
> * config/i386/i386.c (ix86_check_avx_upper_stores): Moved before
> ix86_av
General regs on 32bit targets do not support 128bit modes,
including TDmode.
gcc/
2021-07-15 Uroš Bizjak
PR target/101346
* config/i386/i386.h (VALID_SSE_REG_MODE): Add TDmode.
(VALID_INT_MODE_P): Add SDmode and DDmode.
Add TDmode for TARGET_64BIT.
(VALID_DFP_MODE_P): Remo
On Thu, Jul 15, 2021 at 10:49, Kewen.Lin wrote:
> on 2021/7/15 4:23 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin wrote:
> >>
> >> Hi Uros,
> >>
> >> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> >>> On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote:
>
> on 2
On Thu, Jul 15, 2021 at 10:04 AM Kewen.Lin wrote:
>
> Hi Uros,
>
> on 2021/7/15 3:17 PM, Uros Bizjak wrote:
> > On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote:
> >>
> >> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> >>> on 2021/7/14 2:38 PM, Richard Biener wrote:
> On Tue, Jul 13, 2
On Thu, Jul 15, 2021 at 9:07 AM Kewen.Lin wrote:
>
> on 2021/7/14 3:45 PM, Kewen.Lin via Gcc-patches wrote:
> > on 2021/7/14 2:38 PM, Richard Biener wrote:
> >> On Tue, Jul 13, 2021 at 4:59 PM Kewen.Lin wrote:
> >>>
> >>> on 2021/7/13 8:42 PM, Richard Biener wrote:
> On Tue, Jul 13, 2021 at 12:
On Tue, Jul 13, 2021 at 8:59 PM Jakub Jelinek wrote:
>
> On Tue, Jul 13, 2021 at 09:35:18AM -0700, H.J. Lu wrote:
> > Here is the v3 patch. OK for master?
>
> From my POV LGTM, but please give Uros a chance to chime in.
>
> > From ceab81ef97ab102c410830c41ba7fea911170d1a Mon Sep 17 00:00:00 2001
AVX does not support 32-byte integer compares, required by
ix86_expand_vector_set_var. The following patch fixes vec_set
expanders by introducing new vec_setm_avx2_operand predicate for AVX
vector modes.
gcc/
2021-07-12 Uroš Bizjak
PR target/101424
* config/i386/predicates.md (vec_se
2021-07-09 Uroš Bizjak
gcc/
* recog.c (memory_address_addr_space_p): Change the type to bool.
Return true/false instead of 1/0.
(offsettable_memref_p): Ditto.
(offsettable_nonstrict_memref_p): Ditto.
(offsettable_address_addr_space_p): Ditto.
Change the type of addressp
In addition to the obvious cut-n-pasto where *udivmodsi4_pow2_zext_2
never matches, limit the range of the immediate operand to prevent
out of range immediate operand of AND instruction.
Found by inspection, the patterns rarely match (if at all), since
tree optimizers do the transformation before
On Fri, Jul 9, 2021 at 10:25 AM Iain Sandoe wrote:
>
> (early) ping;
> if possible I’d like to get this onto master in time to back-port for 11.2.
>
> > On 4 Jul 2021, at 21:08, Iain Sandoe wrote:
> >
> > Hi,
> >
> > (I’m not going to defend the status quo here, it seems a bit prone
> > to confus
On Thu, Jul 8, 2021 at 10:25 AM Roger Sayle wrote:
>
>
> This patch tweaks the way GCC handles 32-bit integer division on
> x86_64, when the numerator is constant. Currently the function
>
> int foo (int x) {
> return 100/x;
> }
>
> generates the code:
> foo:    movl    $100, %eax
> clt
V1SI mode shift is needed to shift 32bit operands and consequently we
need to implement V1SI moves and pushes.
2021-07-08 Uroš Bizjak
gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_expand_sse_unpack):
Handle V4QI mode.
* config/i386/mmx.md (V_32): New mode iterator.
To generate sane code a SSE4.1 variable PBLENDV instruction is needed.
Also enable variable vec_set through vec_setm_operand predicate
for TARGET_SSE4_1 instead of TARGET_AVX2. ix86_expand_vector_init_duplicate
is able to emulate vpbroadcast{b,w} with pxor/pshufb.
2021-07-06 Uroš Bizjak
gcc/
2021-07-05 Uroš Bizjak
gcc/
PR target/100637
* config/i386/i386-expand.c (ix86_split_mmx_punpck):
Handle V4QI and V2HI modes.
(expand_vec_perm_blend): Allow 4-byte vector modes with TARGET_SSE4_1.
Handle V4QI mode. Emit mmx_pblendvb32 for 4-byte modes.
(expand_vec_perm_p
On Fri, Jul 2, 2021 at 12:48 PM Hongyu Wang wrote:
>
> >
> > On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote:
> > >
> > > Hi,
> > >
> > > For instructions like cvtss2si, there is no need to output the 'l'
> > > or 'q' suffixes just like cvtss2usi, since the output operand is always
> > > regist
On Fri, Jul 2, 2021 at 10:30 AM Hongyu Wang wrote:
>
> Hi,
>
> For instructions like cvtss2si, there is no need to output the 'l'
> or 'q' suffixes just like cvtss2usi, since the output operand is always
> register and those suffixes are only used to distinguish ambiguous
> memory operands.
>
> Bo
On Fri, Jul 2, 2021 at 4:28 AM Kewen.Lin wrote:
>
> Hi,
>
> With Hongtao's help (thanks), we got the SPEC2017 performance
> evaluation result on x86_64 (see [1]), this new parameter
> ira-consider-dup-in-all-alts has negative effects on i386.
> Since we observed it can benefit ports aarch64 and rs
On Fri, Jul 2, 2021 at 8:25 AM Hongtao Liu wrote:
> > > AVX512FP16 is disclosed, refer to [1].
> > > There're 100+ instructions for AVX512FP16, 67 gcc patches, for the
> > > convenience of review, we divide the 67 patches into 2 major parts.
> > > The first part is 2 patches containing bas
No functional changes.
2021-07-01 Uroš Bizjak
gcc/
* config/i386/predicates.md (ix86_endbr_immediate_operand):
Return true/false instead of 1/0.
(movq_parallel): Ditto.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Pushed to master.
Uros.
diff --git a/gcc/confi
No functional changes.
2021-07-01 Uroš Bizjak
gcc/
* recog.c (general_operand): Return true/false instead of 1/0.
(register_operand): Ditto.
(immediate_operand): Ditto.
(const_int_operand): Ditto.
(const_scalar_int_operand): Ditto.
(const_double_operand): Ditto.
(pu
On Thu, Jul 1, 2021 at 2:42 PM H.J. Lu wrote:
>
> Hi Uros,
>
> On Thu, Jul 1, 2021 at 1:32 AM Hongtao Liu wrote:
> >
> > On Tue, Jun 29, 2021 at 6:16 AM H.J. Lu wrote:
> > >
> > > 1. Update move expanders to convert the CONST_WIDE_INT and CONST_VECTOR
> > > operands to vector broadcast from an i
On Wed, Jun 30, 2021 at 12:50 PM Richard Biener
wrote:
>
> On Wed, Jun 30, 2021 at 10:47 AM Uros Bizjak via Gcc-patches
> wrote:
> >
> > This RFC patch changes the type of predicates to bool. However, some
> > of the targets (e.g. x86) use indirect functions to
On Thu, Jul 1, 2021 at 2:40 PM H.J. Lu wrote:
>
> On Thu, Jul 1, 2021 at 4:10 AM Uros Bizjak wrote:
> >
> > [Sorry for double post, gcc-patches address was wrong in original post]
> >
> > On Thu, Jul 1, 2021 at 7:48 AM liuhongt wrote:
> > >
> > > Hi:
> > > AVX512FP16 is disclosed, refer to [1]
[Sorry for double post, gcc-patches address was wrong in original post]
On Thu, Jul 1, 2021 at 7:48 AM liuhongt wrote:
>
> Hi:
> AVX512FP16 is disclosed, refer to [1].
> There're 100+ instructions for AVX512FP16, 67 gcc patches, for the
> convenience of review, we divide the 67 patches into
The patch adds integer nabs "(NEG (ABS (...)))" instructions, adds STV
conversion and adjusts STV cost calculations accordingly. When CMOV
instruction is used to implement abs, the sign is determined from the
preceding operand negation, and CMOVS is used to select between
negated and non-negated v
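To illustrate what the "nabs" operation computes at the source level, here is a minimal C sketch; the name nabs_int is hypothetical and this is not the RTL pattern from the patch. Unlike abs, the negated absolute value is representable for every int, including INT_MIN.

```c
#include <limits.h>

/* Negated absolute value: -|x|. For x >= 0 the negation cannot overflow,
   and negative inputs (including INT_MIN) are returned unchanged. */
int nabs_int(int x)
{
    return x < 0 ? x : -x;
}
```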
This RFC patch changes the type of predicates to bool. However, some
of the targets (e.g. x86) use indirect functions to call the
predicates, so without the local change, the build fails. Putting the
patch through CI bots should weed out the problems, but I have no
infrastructure to do it myself.
gcc/
2021-06-21 Uroš Bizjak
PR target/95046
* config/i386/mmx.md (vec_addsubv2sf3): New insn pattern.
gcc/testsuite/
2021-06-21 Uroš Bizjak
PR target/95046
* gcc.target/i386/pr95046-9.c: New test.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Pushed to
On Fri, Jun 25, 2021 at 8:48 AM Richard Biener wrote:
>
> On Thu, 24 Jun 2021, Uros Bizjak wrote:
>
> > On Thu, Jun 24, 2021 at 1:07 PM Richard Biener wrote:
> >
> > > This adds SLP pattern recognition for the SSE3/AVX [v]addsubp{ds} v0, v1
> > > instructions which compute { v0[0] - v1[0], v0[1]
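The quoted lane expression is cut off in this listing. As a scalar reference for the operation, the sketch below follows the documented behavior of the addsub instructions: even-indexed lanes are subtracted and odd-indexed lanes are added. The function name addsub_f32 is hypothetical.

```c
#include <stddef.h>

/* Scalar model of an addsub operation over n lanes:
   out[i] = v0[i] - v1[i] for even i, v0[i] + v1[i] for odd i. */
void addsub_f32(const float *v0, const float *v1, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (i % 2 == 0) ? v0[i] - v1[i] : v0[i] + v1[i];
}
```

A loop of this shape, or its unrolled equivalent, is the kind of code an SLP vectorizer would have to recognize before it could emit a single addsub instruction.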