On Thu, Jun 15, 2023 at 10:15 AM Jan Beulich wrote:
>
> On 15.06.2023 09:45, Hongtao Liu wrote:
> > On Thu, Jun 15, 2023 at 3:07 PM Uros Bizjak via Gcc-patches
> > wrote:
> >> On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches
> >> wrote:
>
On Thu, Jun 15, 2023 at 8:03 AM Jan Beulich via Gcc-patches
wrote:
>
> The input constraint for the %vmovddup alternative was wrong, as the
> upper 16 XMM registers require AVX512VL to be used with this insn. To
> compensate, introduce a new alternative permitting all 32 registers, by
>
On Wed, Jun 14, 2023 at 4:56 PM Jakub Jelinek wrote:
>
> On Wed, Jun 14, 2023 at 04:34:27PM +0200, Uros Bizjak wrote:
> > LGTM for the x86 part. I did my best, but those peephole2 patterns are
> > real PITA to be reviewed thoroughly.
> >
> > Maybe split out peep
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now? The
> > split out functions
On Wed, Jun 14, 2023 at 4:00 PM Jakub Jelinek wrote:
>
> Hi!
>
> On Wed, Jun 14, 2023 at 12:35:42PM +, Richard Biener wrote:
> > At this point two pages of code without a comment - can you introduce
> > some vertical spacing and comments as to what is matched now? The
> > split out functions
Use default argument when callback function is not required to merge
rtx_equal_p and hash_rtx functions with their callback variants.
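The merge described above can be illustrated with a small, self-contained sketch. It does not reproduce GCC's code: `node`, `equal_callback`, `nodes_equal_p` and `same_code_p` are hypothetical names chosen for illustration. It only shows how a defaulted callback parameter lets one function replace a plain function plus its callback variant.

```cpp
#include <cassert>

// Hypothetical stand-in for an rtx-like node; not GCC's real data structure.
struct node
{
  int code;
  int value;
};

// Optional hook type, playing the role of the callback taken by a "_cb" variant.
typedef bool (*equal_callback) (const node *, const node *);

// One function serves both uses: callers that need no hook simply omit it.
bool
nodes_equal_p (const node *x, const node *y, equal_callback cb = nullptr)
{
  if (x == y)
    return true;
  if (x == nullptr || y == nullptr)
    return false;
  // The hook, when supplied, may declare two nodes equal early.
  if (cb != nullptr && cb (x, y))
    return true;
  return x->code == y->code && x->value == y->value;
}

// Example hook: treat any two nodes with the same code as equal.
bool
same_code_p (const node *x, const node *y)
{
  return x->code == y->code;
}

void
demo ()
{
  node a = {1, 2};
  node b = {1, 3};
  assert (!nodes_equal_p (&a, &b));              // no hook: values differ
  assert (nodes_equal_p (&a, &b, same_code_p));  // hook accepts the pair
}
```

Existing callers that never passed a callback keep compiling unchanged, which is the main appeal of a default argument over two separately named functions.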
gcc/ChangeLog:
* cse.cc (hash_rtx_cb): Rename to hash_rtx.
(hash_rtx): Remove.
* early-remat.cc (remat_candidate_hasher::equal): Update
to call
On Tue, Jun 13, 2023 at 6:03 PM Roger Sayle wrote:
>
>
> This patch is the next instalment in a set of backend patches around
> improvements to ptest/vptest. A previous patch optimized the sequence
> t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
> property that ZF is set to
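The equivalence mentioned in this snippet can be checked with a scalar model. This is a sketch under stated assumptions: `vec128`, `pand`, `ptestz` and `forms_agree` are illustrative names, and the only instruction property relied on is that PTEST sets ZF when the bitwise AND of its two operands is all zeros.

```cpp
#include <cassert>
#include <cstdint>

// Scalar stand-in for a 128-bit vector register (two 64-bit halves).
struct vec128
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// Bitwise AND of two 128-bit values.
vec128
pand (vec128 x, vec128 y)
{
  vec128 r = {x.lo & y.lo, x.hi & y.hi};
  return r;
}

// Models the ZF result of PTEST: true when (x AND y) is all zeros.
bool
ptestz (vec128 x, vec128 y)
{
  return ((x.lo & y.lo) | (x.hi & y.hi)) == 0;
}

// t = pand (x, y); ptestz (t, t) tests (t AND t) == t, i.e. x AND y.
bool
forms_agree (vec128 x, vec128 y)
{
  vec128 t = pand (x, y);
  return ptestz (t, t) == ptestz (x, y);
}

void
demo ()
{
  vec128 a = {0x0f, 0};
  vec128 b = {0xf0, 0};
  vec128 c = {0x18, 0};
  assert (ptestz (a, b));    // disjoint bits: ZF set
  assert (!ptestz (a, c));   // bit 3 overlaps: ZF clear
  assert (forms_agree (a, b) && forms_agree (a, c));
}
```

Because `t AND t` equals `t`, the explicit `pand` is redundant before the test, which is why the shorter sequence is equivalent.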
On Tue, Jun 13, 2023 at 9:06 AM Jakub Jelinek wrote:
>
> Hi!
>
> On Tue, Jun 06, 2023 at 11:42:07PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > The following patch introduces {add,sub}c5_optab and pattern recognizes
> > various forms of add with carry and subtract with carry/borrow, see
> >
On Mon, Jun 12, 2023 at 4:03 PM Roger Sayle wrote:
>
>
> The following simple test case, from PR 104610, shows that memcmp () == 0
> can result in some bizarre code sequences on x86.
>
> int foo(char *a)
> {
> static const char t[] = "0123456789012345678901234567890";
> return
On Wed, Jun 7, 2023 at 8:32 AM Uros Bizjak wrote:
>
> On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle wrote:
> >
> >
> > This patch addresses the last remaining issue with PR target/31985, that
> > GCC could make better use of memory addressing modes when impleme
On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle wrote:
>
>
> This patch addresses the last remaining issue with PR target/31985, that
> GCC could make better use of memory addressing modes when implementing
> double word addition. This is achieved by adding a define_insn_and_split
> that combines an
> Thanks in advance.
> Roger
> --
>
> -Original Message-
> From: Uros Bizjak
> Sent: 06 June 2023 18:34
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Add support for stc, clc and cmc instructions in
> i386.md
>
> O
c-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
> 2022-06-06 Roger Sayle
> Uros Bizjak
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_builtin) :
>
gcc/ChangeLog:
* rtl.h (function_invariant_p): Change return type from int to bool.
* reload1.cc (function_invariant_p): Change return type from
int to bool and adjust function body accordingly.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Uros.
diff --git
On Tue, Jun 6, 2023 at 1:42 PM Hongtao Liu wrote:
>
> On Tue, Jun 6, 2023 at 5:11 PM Uros Bizjak wrote:
> >
> > On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches
> > wrote:
> > >
> > > > r14-1145 folds the intrinsics into gimple ABS_EXPR which has U
On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches
wrote:
>
> r14-1145 folds the intrinsics into gimple ABS_EXPR, which has UB for
> TYPE_MIN, but PABSB stores an unsigned result into dst. The patch
> uses ABSU_EXPR + VCE instead of ABS_EXPR.
>
> Also don't fold _mm_abs_{pi8,pi16,pi32} w/o
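The difference between a signed and an unsigned absolute value can be seen in a scalar sketch for one byte lane. The helper names `absu8` and `view_as_signed` are hypothetical; the sketch only illustrates why an unsigned result type avoids the overflow at the minimum value, and how a bitwise reinterpretation (a view-convert in spirit) yields the signed type again.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Unsigned absolute value of a signed byte: defined for every input,
// including INT8_MIN, because the result type can represent 128.
std::uint8_t
absu8 (std::int8_t x)
{
  int wide = x;  // widen first so the negation cannot overflow
  return static_cast<std::uint8_t> (wide < 0 ? -wide : wide);
}

// Reinterpret the unsigned bits as a signed byte without changing them.
std::int8_t
view_as_signed (std::uint8_t u)
{
  std::int8_t s;
  std::memcpy (&s, &u, sizeof s);
  return s;
}

void
demo ()
{
  assert (absu8 (-5) == 5);
  assert (absu8 (INT8_MIN) == 128);                       // no overflow
  assert (view_as_signed (absu8 (INT8_MIN)) == INT8_MIN);  // bits are 0x80
}
```

A signed absolute value of `INT8_MIN` has no representable result in the same type, whereas the unsigned form simply produces the bit pattern 0x80.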
On Tue, Jun 6, 2023 at 6:33 AM liuhongt via Gcc-patches
wrote:
>
> r14-1145 folds the intrinsics into gimple ABS_EXPR, which has UB for
> TYPE_MIN, but PABSB stores an unsigned result into dst. The patch
> uses ABSU_EXPR + VCE instead of ABS_EXPR.
>
> Also don't fold _mm_abs_{pi8,pi16,pi32} w/o
Also change one internal variable to bool.
gcc/ChangeLog:
* rtl.h (print_rtl_single): Change return type from int to void.
(print_rtl_single_with_indent): Ditto.
* print-rtl.h (class rtx_writer): Ditto. Change m_sawclose to bool.
* print-rtl.cc (rtx_writer::rtx_writer): Update
gcc/ChangeLog:
* rtl.h (reg_classes_intersect_p): Change return type from int to bool.
(reg_class_subset_p): Ditto.
* reginfo.cc (reg_classes_intersect_p): Ditto.
(reg_class_subset_p): Ditto.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Uros
diff --git
On Sun, Jun 4, 2023 at 12:45 AM Roger Sayle wrote:
>
>
> This patch is the latest revision of my patch to add support for the
> STC (set carry flag), CLC (clear carry flag) and CMC (complement
> carry flag) instructions to the i386 backend, incorporating Uros'
> previous feedback. The
On Sat, Jun 3, 2023 at 7:31 PM Roger Sayle wrote:
>
>
> This patch fixes PR target/110083, an ICE-on-valid regression exposed by
> my recent PTEST improvements (to address PR target/109973). The latent
> bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
> or delete
Also change some internal variables to bool and recode handling of
boolean variables to not use bitwise or.
gcc/ChangeLog:
* rtl.h (stack_regs_mentioned): Change return type from int to bool.
* reg-stack.cc (struct_block_info_def): Change "done" to bool.
(stack_regs_mentioned_p):
On Fri, Jun 2, 2023 at 2:49 AM liuhongt wrote:
>
> Add missing insn patterns for v2si -> v2hi/v2qi and v2hi -> v2qi vector
> truncate.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/92658
> * config/i386/mmx.md
Also change some function arguments to bool and remove one instance
of always zero function argument.
gcc/ChangeLog:
* rtl.h (exp_equiv_p): Change return type from int to bool.
* cse.cc (mention_regs): Change return type from int to bool
and adjust function body accordingly.
Also fix some stale comments.
gcc/ChangeLog:
* rtl.h (subreg_lowpart_p): Change return type from int to bool.
(active_insn_p): Ditto.
(in_sequence_p): Ditto.
(unshare_all_rtl): Change return type from int to void.
* emit-rtl.h (mem_expr_equal_p): Change return type from int
Also remove a bunch of unneeded forward declarations.
gcc/ChangeLog:
* rtl.h (true_dependence): Change return type from int to bool.
(canon_true_dependence): Ditto.
(read_dependence): Ditto.
(anti_dependence): Ditto.
(canon_anti_dependence): Ditto.
(output_dependence):
On Wed, May 31, 2023 at 9:40 AM Kewen.Lin wrote:
>
> Hi Andreas,
>
> on 2023/5/25 15:25, Andreas Krebbel wrote:
> > On 3/20/23 07:33, Kewen.Lin wrote:
> >> Hi,
> >>
> >> One of my workmates found there is a warning like:
> >>
> >> libgcc/config/rs6000/morestack.S:402: Warning: ignoring
> >>
On Wed, May 31, 2023 at 9:17 AM Richard Biener
wrote:
>
> On Tue, May 30, 2023 at 9:01 PM Jeff Law via Gcc-patches
> wrote:
> >
> >
> >
> > On 5/30/23 08:36, Uros Bizjak via Gcc-patches wrote:
> > > gcc/ChangeLog:
> > >
> > > *
gcc/ChangeLog:
* rtl.h (comparison_dominates_p): Change return type from int to bool.
(condjump_p): Ditto.
(any_condjump_p): Ditto.
(any_uncondjump_p): Ditto.
(simplejump_p): Ditto.
(returnjump_p): Ditto.
(eh_returnjump_p): Ditto.
(onlyjump_p): Ditto.
On Tue, May 30, 2023 at 9:39 AM Uros Bizjak wrote:
>
> On Mon, May 29, 2023 at 8:17 PM Roger Sayle
> wrote:
> >
> >
> > This is my proposed minimal fix for PR target/109973 (hopefully suitable
> > for backporting) that follows Jakub Jelinek's suggestion that we
On Mon, May 29, 2023 at 8:17 PM Roger Sayle wrote:
>
>
> This is my proposed minimal fix for PR target/109973 (hopefully suitable
> for backporting) that follows Jakub Jelinek's suggestion that we introduce
> CCZmode and CCCmode variants of ptest and vptest, so that the i386
> backend treats
gcc/ChangeLog:
* rtl.h (rtx_addr_can_trap_p): Change return type from int to bool.
(rtx_unstable_p): Ditto.
(reg_mentioned_p): Ditto.
(reg_referenced_p): Ditto.
(reg_used_between_p): Ditto.
(reg_set_between_p): Ditto.
(modified_between_p): Ditto.
gcc/ChangeLog:
PR target/110021
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require
TARGET_AVX512BW to generate truncv16hiv16qi2.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Uros.
diff --git a/gcc/config/i386/i386-expand.cc
On Fri, May 26, 2023 at 4:46 AM liuhongt wrote:
>
> lzcnt/tzcnt has been fixed since skylake, popcnt has been fixed since
> icelake. At least for icelake and later intel Core processors, the
> errata tune is not needed. And the tune isn't needed for ATOM either.
>
> Bootstrapped and regtested on
On Fri, May 26, 2023 at 4:12 AM Jiang, Haochen wrote:
>
> > gcc/ChangeLog:
> >
> > * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> > Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
> > instructions when available. Emulate truncation via
> >
Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Currently, the compiler generates the following
assembly for V16QImode multiplication (-mavx2):
vpunpcklbw %xmm0, %xmm0, %xmm3
vpunpcklbw %xmm1, %xmm1, %xmm2
vpunpckhbw
Also, move vv8qi3 expander to a better place and enable
it with TARGET_MMX_WITH_SSE. Remove handling of V8QImode from
ix86_expand_vecop_qihi2 since all partial QI->HI vector modes expand
via ix86_expand_vecop_qihi_partial.
gcc/ChangeLog:
* config/i386/i386-expand.cc
On Wed, May 24, 2023 at 12:13 PM Richard Biener wrote:
>
> The following dispatches to V2DImode CTOR expansion instead of
> using sets of (subreg:DI (reg:V16QI 146) [08]) which causes
> LRA to spill DImode and reload V16QImode. The same applies for
> V8QImode or V4HImode construction from SImode
On Wed, May 24, 2023 at 7:48 AM Alexandre Oliva wrote:
>
>
> Various x86 tests fail if the toolchain is configured with
> --enable-frame-pointer, because the unexpected extra insns mess with
> the expected asm counts. Add -fomit-frame-pointer so that they can
> still pass.
>
> Bootstrapped on
Add V8QImode and V4QImode vector shift patterns that call into
ix86_expand_vecop_qihi_partial. Generate special sequences
for constant count operands.
The patch regresses g++.dg/pr91838.C - as explained in PR91838, the
test returns different results, depending on whether V8QImode shift
pattern
On Tue, May 23, 2023 at 5:18 PM Richard Biener wrote:
>
> The following also accounts for a GPR->XMM move cost for splat
> operations and properly guards eliding the cost when moving from
> memory only for SSE4.1 or HImode or larger operands. This
> doesn't fix the PR fully yet.
>
> Bootstrapped
Returned integer vector mode costs of emulated instructions in
ix86_shift_rotate_cost are wrong and do not reflect generated
instruction sequences. Rewrite handling of different integer vector
modes and different target ABIs to return real instruction
counts in order to calculate better costs of
Add the cost of a memory read to the cost of V*QImode vector mult sequences.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_multiplication_cost): Add
the cost of a memory read to the cost of V?QImode sequences.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Uros.
diff
QImode partial vector multiplications and shifts can be implemented using
their HImode counterparts. Add infrastructure to handle V8QImode and
V4QImode vectors by extending (interleaving) their input operands to
V8HImode, performing V8HImode operation and truncating output back to
the original
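The extend, operate, truncate scheme described in this entry can be modelled lane by lane. This is a simplified sketch, not the expander's actual sequence: it uses plain zero-extension where the entry mentions interleaving, and the type and function names (`v8qi`, `v8hi`, `extend`, `mul_hi`, `truncate_lanes`, `mul_qi`) are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using v8qi = std::array<std::uint8_t, 8>;
using v8hi = std::array<std::uint16_t, 8>;

// Widen every 8-bit lane to 16 bits (zero-extension, as a simplification).
v8hi
extend (const v8qi &v)
{
  v8hi r{};
  for (std::size_t i = 0; i < v.size (); ++i)
    r[i] = v[i];
  return r;
}

// 16-bit lane-wise multiply, keeping the low 16 bits of each product.
v8hi
mul_hi (const v8hi &a, const v8hi &b)
{
  v8hi r{};
  for (std::size_t i = 0; i < a.size (); ++i)
    r[i] = static_cast<std::uint16_t> (static_cast<unsigned> (a[i]) * b[i]);
  return r;
}

// Narrow every 16-bit lane back to 8 bits by dropping the high byte.
v8qi
truncate_lanes (const v8hi &v)
{
  v8qi r{};
  for (std::size_t i = 0; i < v.size (); ++i)
    r[i] = static_cast<std::uint8_t> (v[i]);
  return r;
}

// 8-bit multiply implemented through the 16-bit operation.
v8qi
mul_qi (const v8qi &a, const v8qi &b)
{
  return truncate_lanes (mul_hi (extend (a), extend (b)));
}

void
demo ()
{
  v8qi a = {{1, 2, 3, 200, 255, 16, 17, 0}};
  v8qi b = {{9, 8, 7, 3, 255, 16, 15, 42}};
  v8qi r = mul_qi (a, b);
  for (std::size_t i = 0; i < r.size (); ++i)
    assert (r[i] == static_cast<std::uint8_t> (a[i] * b[i]));
}
```

The low 8 bits of a product depend only on the low 8 bits of the inputs, so truncating the wider result gives the same answer as a native byte multiply would.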
On Wed, May 17, 2023 at 8:08 PM Jakub Jelinek wrote:
>
> Hi!
>
> When _Float128 support has been added to C++ for 13.1, float128t_type_node
> tree has been added - in C float128_type_node and float128t_type_node is
> the same and represents both _Float128 and __float128, but in C++ they
> are
Returned integer vector mode costs of emulated modes in
ix86_multiplication_cost are wrong and do not reflect generated
instruction sequences. Rewrite handling of different integer vector
modes and different target ABIs to return real instruction
counts in order to calculate better costs of
Revert my previous change that faked handling of V4HI and V2SImodes
in ix86_widen_mult_cost and rather return an arbitrarily high value
for unsupported modes. This should prevent the cost estimator from
selecting non-existent vector widen multiply operation.
gcc/ChangeLog:
PR target/109807
*
Some cleanups while looking at these two functions.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also
reject ymm instructions for TARGET_PREFER_AVX128. Use generic
gen_extend_insn to generate zero/sign extension instructions.
Fix comments.
On Fri, May 12, 2023 at 4:07 PM Ard Biesheuvel wrote:
> > > > > Note that the GOT reference in question is in fact a data reference:
> > > > > we
> > > > > explicitly load the address of __fentry__ from the GOT, which amounts
> > > > > to
> > > > > eager binding, rather than emitting a PLT
Remove mulv2si emulated sequence for TARGET_SSE2 and enable
only native PMULLD instruction for TARGET_SSE4_1. Ideally, the
vectorization for TARGET_SSE2 should depend on more precise cost
estimation (the PR contains patch for ix86_multiplication_cost),
but even with patched cost function the
On Thu, May 11, 2023 at 4:21 PM Roger Sayle wrote:
>
>
> PR 109766 is an interesting case of large code being generated on x86_64,
> caused by an interaction/conflict between register allocation and hardreg
> cprop, that's tricky to fix/resolve within the middle-end.
>
> The task/challenge is to
Do not crash when asking ix86_widen_mult_cost for the cost of
a widening mul operation to V4HI or V2SImode.
gcc/ChangeLog:
PR target/109807
* config/i386/i386.cc (ix86_widen_mult_cost):
Handle V4HImode and V2SImode.
gcc/testsuite/ChangeLog:
PR target/109807
*
On Thu, May 11, 2023 at 12:04 AM H.J. Lu wrote:
>
> On Wed, May 10, 2023 at 2:17 AM Uros Bizjak wrote:
> >
> > On Tue, May 9, 2023 at 10:58 AM Ard Biesheuvel wrote:
> > >
> > > The small and medium PIC code models generate profiling calls that
> > >
Add missing insn pattern for v2qi -> v2si vector extend and named
expanders to activate generation of vector extends to 8-byte and 4-byte
vectors.
gcc/ChangeLog:
PR target/92658
* config/i386/mmx.md (sse4_1_v2qiv2si2): New insn pattern.
(v4qiv4hi2): New expander.
(v2hiv2si2):
On Wed, May 10, 2023 at 9:20 PM Roger Sayle wrote:
>
>
> Hi Uros,
> This cleans up the use of [(clobber (const_int 0))] in the i386 backend.
> My apologies I must have copied this idiom from one of the other targets:
> aarch64.md, arm.md, thumb1.md, avr.md, or sparc.md.
>
> This patch has been
On Fri, Apr 28, 2023 at 2:47 AM Fangrui Song wrote:
>
> When using -mcmodel=medium, large data is placed into .l* sections. GNU ld
> places .l* sections into separate output sections. If small and medium
> code model object files are mixed, the .l* sections won't cause
> relocation overflow
/i386.cc (x86_function_profiler): Take
> ix86_direct_extern_access into account when generating calls
> to __fentry__()
HJ, is the patch OK with you?
Uros.
>
> Cc: H.J. Lu
> Cc: Jakub Jelinek
> Cc: Richard Biener
> Cc: Uros Bizjak
> Cc: Hou Wenlong
On Sat, May 6, 2023 at 4:00 PM Roger Sayle wrote:
>
>
> Hi Uros,
> This is a repost/respin of a patch that was conditionally approved:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
>
> This patch adds a convenient post-reload splitter for setting/updating
> the highpart of
Rename index_register_operand predicate to what it really does.
No functional change.
gcc/ChangeLog:
* config/i386/predicates.md (register_no_SP_operand):
Rename from index_register_operand.
(call_register_operand): Update for rename.
* config/i386/i386.md (*lea_general_[1234]):
For SSE2 targets the expander unpacks input elements into the correct
position in the V4SI vector and emits PMULUDQ instruction. The output
elements are then shuffled back to their positions in the V2SI vector.
For SSE4 targets PMULLD instruction is emitted directly.
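The SSE2 path described above can be sketched with scalar arrays. The sketch assumes the two V2SI elements are placed in the even positions of the V4SI vector and relies on PMULUDQ multiplying the even 32-bit elements into 64-bit products; the exact unpack and shuffle instructions used by the expander are not shown, and all names here (`unpack_even`, `pmuludq`, `pack_low`, `mul_v2si`) are illustrative.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using v2si = std::array<std::uint32_t, 2>;
using v4si = std::array<std::uint32_t, 4>;
using v2di = std::array<std::uint64_t, 2>;

// Place the two input elements into the even lanes of a 4-element vector.
// The odd lanes are ignored by the multiply, so their content is irrelevant.
v4si
unpack_even (const v2si &v)
{
  v4si r = {{v[0], 0u, v[1], 0u}};
  return r;
}

// Model of PMULUDQ: multiply even 32-bit lanes, giving two 64-bit products.
v2di
pmuludq (const v4si &a, const v4si &b)
{
  v2di r = {{static_cast<std::uint64_t> (a[0]) * b[0],
             static_cast<std::uint64_t> (a[2]) * b[2]}};
  return r;
}

// Move the low 32 bits of each product back into a 2-element vector.
v2si
pack_low (const v2di &p)
{
  v2si r = {{static_cast<std::uint32_t> (p[0]),
             static_cast<std::uint32_t> (p[1])}};
  return r;
}

v2si
mul_v2si (const v2si &x, const v2si &y)
{
  return pack_low (pmuludq (unpack_even (x), unpack_even (y)));
}

void
demo ()
{
  v2si x = {{3u, 0xffffffffu}};
  v2si y = {{5u, 2u}};
  v2si r = mul_v2si (x, y);
  assert (r[0] == 15u);
  assert (r[1] == 0xfffffffeu);  // low half of 0x1fffffffe
}
```

On targets with PMULLD the whole sequence collapses into a single lane-wise 32-bit multiply, which is the SSE4 case mentioned above.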
gcc/ChangeLog:
*
The predicates of ashift to lea post-reload splitter were too broad
so the splitter tried to convert the mask shift instruction. Tighten
operand predicates to match only general registers.
gcc/ChangeLog:
PR target/109733
* config/i386/predicates.md (index_reg_operand): New predicate.
Use the same approach as in register_no_elim_operand predicate, but also
reject stack_pointer_rtx operands.
gcc/ChangeLog:
* config/i386/predicates.md (index_register_operand): Reject
arg_pointer_rtx, frame_pointer_rtx, stack_pointer_rtx and
VIRTUAL_REGISTER_P operands. Allow
On Mon, Apr 24, 2023 at 11:19 AM Segher Boessenkool
wrote:
>
> On Sun, Apr 23, 2023 at 11:06:41PM +0200, Uros Bizjak wrote:
> > > I send this patch now so that people can start testing. I don't plan to
> > > commit this for another week at least, for a week after GCC
On Sun, Apr 23, 2023 at 6:48 PM Segher Boessenkool
wrote:
>
> This minimal patch enables LRA for all targets. It does not clean up
> the target code, nor does it do anything to generic code: it just
> deletes all target definitions of TARGET_LRA_P.
>
> There are three kinds of changes:
>
> 1)
x86 was converted to TARGET_LEGITIMATE_ADDRESS_P long ago. Remove
remnants of the conversion. Also, clean up the remaining macros a bit
by introducing INDEX_REGNO_P macro.
No functional change.
gcc/ChangeLog:
2023-04-21 Uroš Bizjak
* config/i386/i386.h (REG_OK_FOR_INDEX_P,
gcc/ChangeLog:
* config/arm/arm.cc (thumb1_legitimate_address_p):
Use VIRTUAL_REGISTER_P predicate.
(arm_eliminable_register): Ditto.
* config/avr/avr.md (push_1): Ditto.
* config/bfin/predicates.md (register_no_elim_operand): Ditto.
* config/h8300/predicates.md
Introduce extract_operator predicate to handle both zero-extract and
sign-extract operations with expressions like:
(subreg:QI
(zero_extract:SWI248
(match_operand 1 "int248_register_operand" "0")
(const_int 8)
(const_int 8)) 0)
As shown in the testcase, this
Following code:
typedef __SIZE_TYPE__ size_t;
struct S1s
{
char pad1;
char val;
short pad2;
};
extern char ts[256];
_Bool foo (struct S1s a, size_t i)
{
return (ts[i] > a.val);
}
compiles with -O2 to:
movl %edi, %eax
movsbl %ah, %edi
cmpb %dil, ts(%rsi)
On Wed, Apr 19, 2023 at 1:33 AM Andrew Pinski via Gcc-patches
wrote:
>
> After a phiopt change, I got a failure of cmov9.c.
> The RTL IR has zero_extend on the outside of
> the if_then_else rather than on the side. Both
> ways are considered canonical as mentioned in
> PR 66588.
>
> This fixes
On Tue, Apr 18, 2023 at 7:20 PM Jakub Jelinek wrote:
>
> On Mon, Apr 17, 2023 at 11:27:28PM +0200, Uros Bizjak via Gcc-patches wrote:
> > --- a/gcc/rtl.h
> > +++ b/gcc/rtl.h
> > @@ -1972,6 +1972,13 @@ set_regno_raw (rtx x, unsigned int regno, unsigned
> > int
INSERTPS can select any element from src and insert into any place
of the dest. For SSE4.1 targets, compiler can generate e.g.
insertps $64, %xmm0, %xmm1
to insert element 1 from %xmm1 to element 0 of %xmm0.
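The immediate in the example above can be decoded with a small model of the register form of INSERTPS. In the instruction's imm8, bits 7:6 select the source element, bits 5:4 select the destination element, and bits 3:0 are a zero mask applied afterwards. The function name `insertps` and the `v4sf` alias below are illustrative, not a compiler builtin.

```cpp
#include <array>
#include <cassert>

using v4sf = std::array<float, 4>;

// Scalar model of INSERTPS with a register source.
v4sf
insertps (v4sf dst, const v4sf &src, unsigned imm8)
{
  unsigned count_s = (imm8 >> 6) & 3;  // source element index
  unsigned count_d = (imm8 >> 4) & 3;  // destination element index
  unsigned zmask = imm8 & 0xf;         // lanes to clear afterwards

  dst[count_d] = src[count_s];
  for (unsigned i = 0; i < 4; ++i)
    if (zmask & (1u << i))
      dst[i] = 0.0f;
  return dst;
}

void
demo ()
{
  v4sf dst = {{10.0f, 11.0f, 12.0f, 13.0f}};
  v4sf src = {{20.0f, 21.0f, 22.0f, 23.0f}};
  // 64 == 0b01000000: source element 1, destination element 0, no zeroing.
  v4sf r = insertps (dst, src, 64);
  assert (r[0] == 21.0f && r[1] == 11.0f && r[2] == 12.0f && r[3] == 13.0f);
}
```

With an immediate of 64 only destination element 0 changes, matching the "element 1 to element 0" insertion described in the text.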
gcc/ChangeLog:
PR target/94908
* config/i386/i386-builtin.def
These two predicates are similar to existing HARD_REGISTER_P and
HARD_REGISTER_NUM_P predicates and return 1 if the given register
corresponds to a virtual register.
gcc/ChangeLog:
* rtl.h (VIRTUAL_REGISTER_P): New predicate.
(VIRTUAL_REGISTER_NUM_P): Ditto.
(REGNO_PTR_FRAME_P): Use
On Wed, Apr 12, 2023 at 4:28 PM Jakub Jelinek wrote:
>
> Hi!
>
> On the following testcase, we emit weird diagnostics.
> User used the z modifier, but diagnostics talks about Z instead.
> This is because z is implemented by doing some stuff and then falling
> through into the Z case.
>
> The
On Fri, Mar 31, 2023 at 7:11 AM liuhongt wrote:
>
> RA will sometimes use the lowest cost of the mode across all the different
> regclasses
> w/o checking whether it's hard_regno_mode_ok.
> It's impossible to put modes whose size > 8 into MASK_REGS, so adjust the cost to
> avoid a potential performance issue.
I
On Thu, Mar 30, 2023 at 1:43 PM liuhongt wrote:
>
> > > Just rename the instruction and fix all its call sites. The name of
> > > the insn pattern is internal to the compiler and can be renamed at
> > > will.
> >
> > Ideally, we should standardize all the names to a standard name, so
> > e.g.
On Thu, Mar 30, 2023 at 8:17 AM Uros Bizjak wrote:
>
> On Thu, Mar 30, 2023 at 3:47 AM liuhongt wrote:
> >
> > There's a typo in the standard pattern names for unsigned_{float,fix},
> > it should be floatunsmn2/fixuns_truncmn2, not ufloatmn2/ufix_truncmn2
> > i
On Thu, Mar 30, 2023 at 3:47 AM liuhongt wrote:
>
> There's a typo in the standard pattern names for unsigned_{float,fix},
> it should be floatunsmn2/fixuns_truncmn2, not ufloatmn2/ufix_truncmn2
> as in current trunk; the patch fixes the typo.
>
> Also vcvttps2udq is available under AVX512VL, so it
On Wed, Mar 29, 2023 at 9:21 AM liuhongt wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> Ok for GCC14 stage-1(or maybe trunk)?
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (expand_vec_perm_blend): Generate
> vpblendd instead of vpblendw for V4SI under
On Tue, Mar 28, 2023 at 10:11 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs since r11-2259 because assign_386_stack_local
> -> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment
> for DImode temporaries rather than 32-bit as before.
> Most of the spots in the
On Wed, Mar 22, 2023 at 3:59 AM liuhongt wrote:
>
> The target hook is only used by i386, and the current definition is
> same as default gen_reg_rtx. So there's no need for this target hook.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk(or GCC14)?
>
>
8-byte modes should be processed only for TARGET_MMX_WITH_SSE.
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_pblendv):
Handle 8-byte modes only with TARGET_MMX_WITH_SSE.
(expand_vec_perm_2perm_pblendv): Ditto.
Bootstrapped and regression tested on x86_64-linux-gnu
8-byte modes should be processed only for TARGET_MMX_WITH_SSE. Handle
V2SFmode and fix V2HImode handling. The resulting BLEND instructions
are always faster than MOVSS/MOVSD, so prioritize them w.r.t MOVSS/MOVSD
for TARGET_SSE4_1.
gcc/ChangeLog:
* config/i386/i386-expand.cc
Perform V2SI vector permutation in the same way as existing V2SF for
TARGET_MMX_WITH_SSE targets. The testcase:
typedef unsigned int v2si __attribute__((vector_size(8)));
v2si foo(v2si x, v2si y) { return (v2si){y[0], x[1]}; }
is currently compiled to (-O2):
foo:
movdqa %xmm0, %xmm2
On Tue, Mar 14, 2023 at 5:09 PM Jakub Jelinek wrote:
>
> Hi!
>
> In my PR107627 change I've missed one important case, which causes
> miscompilation of f4 and f6 in the following tests.
>
> Combine matches there *concatsidi3_3 define_insn_and_split (as with all
> other f* functions in those
On Tue, Mar 14, 2023 at 7:27 AM Hu, Lin1 wrote:
>
> The implementation of these builtins requires support for both AVX512VL and
> VAES. However, the builtins didn't request AVX512VL. As a result, compiling
> pr109117-1.c with the options -mvaes -mno-avx512vl caused an ICE.
>
> This patch aims to
On Fri, Mar 10, 2023 at 7:11 PM Ian Lance Taylor wrote:
>
> Jakub Jelinek writes:
>
> > On Wed, Mar 01, 2023 at 01:32:43PM +0100, Jakub Jelinek via Gcc-patches
> > wrote:
> >> On Wed, Nov 16, 2022 at 12:51:14PM +0100, Jakub Jelinek via Gcc-patches
> >> wrote:
> >> > On Wed, Nov 16, 2022 at
On Thu, Mar 2, 2023 at 2:28 PM Richard Biener wrote:
>
> The following puts a hard limit on the inherently quadratic STV chain
> discovery. Without a limit for the compiler.i testcase in PR26854
> we see at -O2
>
> machine dep reorg : 574.45 ( 53%)
>
> with release checking
According to the Intel ISA manual, fprem and fprem1 return NaN when an invalid
arithmetic exception is generated. This is documented in Table 8-10 of the
ISA manual and makes these two instructions fully IEEE compatible.
The reverted patch was based on the data from table 3-30 and 3-31 of the
Intel ISA
Instructions that use high-part QImode registers cannot be encoded
with a REX prefix. To avoid the REX prefix, operand constraints allow
only legacy QImode registers, immediates and constant memory operands.
The patch introduces a matching predicate, so invalid operands are not
combined into instruction
On Sat, Feb 18, 2023 at 11:35 AM Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, replace_rtx has 2 modes, one that only replaces
> x == from with to, the other which i386.md uses which also replaces
> REGNO (x) == REGNO (from) with to if both are REGs, but assert they have
> the same
Following testcase:
--cut here--
struct S
{
unsigned char pad1;
unsigned char val;
unsigned short pad2;
};
unsigned char
test_add (unsigned char a, struct S b)
{
a += b.val;
return a;
}
--cut here--
should be compiled to something like:
addb %dh, %al
but is currently
On Fri, Feb 17, 2023 at 12:31 PM Richard Biener wrote:
>
> On Fri, 17 Feb 2023, Uros Bizjak wrote:
>
> > On Fri, Feb 17, 2023 at 8:38 AM Richard Biener wrote:
> > >
> > > On Thu, 16 Feb 2023, Uros Bizjak wrote:
> > >
> > > >
On Fri, Feb 17, 2023 at 8:38 AM Richard Biener wrote:
>
> On Thu, 16 Feb 2023, Uros Bizjak wrote:
>
> > simplify_subreg can return VOIDmode const_int operand and will
> > cause ICE in simplify_gen_subreg when this operand is passed to it.
> >
> > The patch
simplify_subreg can return VOIDmode const_int operand and will
cause ICE in simplify_gen_subreg when this operand is passed to it.
The patch prevents VOIDmode temporary from entering simplify_gen_subreg.
We can't process const_int operand any further, since outermode
is not an integer mode here.
There is no requirement on the mode of the location operand, so any
supported integer mode is valid. We can relax extract location
operand mode requirement of other patterns involving zero_extract RTX.
2023-02-15 Uroš Bizjak
gcc/ChangeLog:
* config/i386/i386.md (*cmpqi_ext_1): Use
gcc/testsuite/ChangeLog:
2023-02-15 Uroš Bizjak
* g++.target/i386/empty-class2.C (dg-additional-options): Remove.
* gcc.target/i386/avx512fp16-reduce-op-2.c: Ditto.
* gcc.target/i386/pr99464.c: Ditto.
* gcc.target/i386/pr103541.c (dg-do): Compile for !ia32 target.
*
No functional changes.
gcc/ChangeLog:
2023-02-15 Uroš Bizjak
* config/i386/predicates.md (int248_register_operand):
Rename from extr_register_operand.
* config/i386/i386.md (*extv): Update for renamed predicate.
(*extzx): Ditto.
(*ashl3_doubleword_mask): Use
On Thu, Feb 9, 2023 at 3:25 PM Richard Biener via Gcc-patches
wrote:
>
> When the set of candidates becomes very large then repeated
> bit checks on it during the build of an actual chain can become
> slow because of the O(n) nature of bitmap tests. The following
> switches the candidates
On Thu, Feb 9, 2023 at 3:25 PM Richard Biener via Gcc-patches
wrote:
>
> The following does low-hanging optimizations, combining bitmap
> test and set and removing redundant operations.
>
> This shaves off half of the testcase compile time.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu,
Combine pass simplifies zero-extend of a zero-extract to:
Trying 16 -> 6:
16: r86:QI#0=zero_extract(r87:HI,0x8,0x8)
REG_DEAD r87:HI
6: r84:SI=zero_extend(r86:QI)
REG_DEAD r86:QI
Failed to match this instruction:
(set (reg:SI 84 [ s.e2 ])
(zero_extract:SI (reg:HI 87)