Prathamesh Kulkarni writes:
> On Thu, 10 Aug 2023 at 21:27, Richard Sandiford
> wrote:
>>
>> Prathamesh Kulkarni writes:
>> >> static bool
>> >> is_simple_vla_size (poly_uint64 size)
>> >> {
>> >> if (size.is_constant ())
>> >> return false;
>> >> for (int i = 1; i < ARRAY_SIZE (size.coe
Thanks for the clean-ups. But...
"Kewen.Lin" writes:
> Hi,
>
> Following Richi's suggestion [1], this patch is to move the
> handlings on VMAT_GATHER_SCATTER in the final loop nest
> of function vectorizable_load to its own loop. Basically
> it duplicates the final loop nest, clean up some usel
Juzhe-Zhong writes:
> Hi, there is a genrecog issue in the RISC-V backend.
>
> This is the ICE info:
>
> 0xfa3ba4 poly_int_pod<2u, unsigned short>::to_constant() const
> ../../../riscv-gcc/gcc/poly-int.h:504
> 0x28eaa91 recog_5
> ../../../riscv-gcc/gcc/config/riscv/bitmanip.md:31
Richard Biener writes:
> On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, Richi.
>>
>> > 1. Target is using loop MASK as the partial vector loop control.
>> >> I don't think it checks for this?
>>
>> I am not sure whether I understand EXTRACT_LAST correctly.
>> But if target doesn't use
Richard Biener writes:
> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero. But that doesn't correctly preserve
> the sign of zero. The following patch disables partial vector
> support i
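
To illustrate the sign-of-zero problem described above, here is a minimal scalar sketch. It is not vectorizer code: `fold_left_padded` and `is_negative_zero` are hypothetical helpers that model an in-order reduction in which inactive lanes are replaced by a filler value, and the sketch assumes the default round-to-nearest mode.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical model of a fold-left (in-order) sum over `lanes` slots.
// Only the first `active` slots hold real data; the remaining slots are
// replaced by `filler`, mimicking the vector conditional that forces
// excess elements to a fixed value.
double fold_left_padded (double init, const double *data,
                         std::size_t active, std::size_t lanes,
                         double filler)
{
  double acc = init;
  for (std::size_t i = 0; i < lanes; ++i)
    acc += (i < active ? data[i] : filler);
  return acc;
}

// True only for -0.0 (zero value with the sign bit set).
bool is_negative_zero (double v)
{
  return v == 0.0 && std::signbit (v);
}
```

With an accumulator of -0.0 and a single data element of -0.0, the unpadded sum stays -0.0, but padding with +0.0 turns it into +0.0, because -0.0 + +0.0 is +0.0 under round-to-nearest. Padding with -0.0 would keep the sign in this model; that is only an observation about the arithmetic, not a claim about what the patch does.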
Siddhesh Poyarekar writes:
> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
>>> Do you have a suggestion for the language to address libgcc,
>>> libstdc++, etc. and libiberty, libbacktrace, etc.?
>>
>> I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for di
Prathamesh Kulkarni writes:
>> static bool
>> is_simple_vla_size (poly_uint64 size)
>> {
>> if (size.is_constant ())
>> return false;
>> for (int i = 1; i < ARRAY_SIZE (size.coeffs); ++i)
>> if (size[i] != (i <= 1 ? size[0] : 0))
> Just wondering if this should be (i == 1 ? size[0] : 0
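
For what it's worth, the two spellings of the condition can be compared with a small sketch. This is not GCC code: `toy_poly` is a hypothetical stand-in for `poly_uint64` with an assumed two coefficients, and the helper names are made up for illustration.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for poly_uint64: a plain array of coefficients.
// Assumption: two coefficients (one runtime indeterminate).
constexpr int kNumCoeffs = 2;
using toy_poly = std::array<std::uint64_t, kNumCoeffs>;

// The condition as written in the quoted code: (i <= 1 ? size[0] : 0).
bool mismatch_le (const toy_poly &size, int i)
{
  return size[i] != (i <= 1 ? size[0] : 0);
}

// The condition as asked about in the reply: (i == 1 ? size[0] : 0).
bool mismatch_eq (const toy_poly &size, int i)
{
  return size[i] != (i == 1 ? size[0] : 0);
}

// The quoted loop starts at i = 1, so `i <= 1` and `i == 1` pick the
// same branch for every index the loop actually visits.
bool forms_agree (const toy_poly &size)
{
  for (int i = 1; i < kNumCoeffs; ++i)
    if (mismatch_le (size, i) != mismatch_eq (size, i))
      return false;
  return true;
}
```

Since the loop never visits index 0, the two forms give the same answer for every index it does visit, so the choice between them looks like a readability question rather than a behavioural one.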
Richard Biener writes:
> On Thu, Aug 10, 2023 at 3:44 PM Richard Sandiford
> wrote:
>>
>> Richard Biener via Gcc-patches writes:
>> > On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
>> > wrote:
>> >>
>> >> If `A` has a range of `[0,0][100,INF]` and the comparison
>> >> of `A < 50`.
Richard Biener via Gcc-patches writes:
> On Wed, Aug 9, 2023 at 6:16 PM Andrew Pinski via Gcc-patches
> wrote:
>>
>> If `A` has a range of `[0,0][100,INF]` and the comparison
>> of `A < 50`. This should be optimized to `A <= 0` (which then
>> will be optimized to just `A == 0`).
>> This patch imp
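
The equivalence in the example above can be checked on concrete values. The sketch below is only an illustration on plain `int` values, not the patch's range machinery; `in_range`, `comparisons_agree` and `check_samples` are hypothetical names, and `INT_MAX` stands in for INF.

```cpp
#include <climits>

// Membership test for the range [0,0][100,INF] from the example,
// modelled on int with INT_MAX standing in for INF (an assumption).
bool in_range (int a)
{
  return a == 0 || a >= 100;
}

// True if `a < 50`, `a <= 0` and `a == 0` all give the same answer.
bool comparisons_agree (int a)
{
  bool lt50 = a < 50;
  bool le0 = a <= 0;
  bool eq0 = a == 0;
  return lt50 == le0 && le0 == eq0;
}

// Checks a few representative members of the range.
bool check_samples ()
{
  const int samples[] = { 0, 100, 101, 1000, INT_MAX };
  for (int a : samples)
    if (!in_range (a) || !comparisons_agree (a))
      return false;
  return true;
}
```

A value outside the range, such as 25, shows why the range information is needed: there `a < 50` is true while `a == 0` is false.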
Jakub Jelinek writes:
> On Wed, Aug 09, 2023 at 06:27:20PM +0100, Richard Sandiford wrote:
>> Jakub Jelinek writes:
>> > On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> >> Jakub: do you remember what the reason was? I don't mind dropping
>> >> "function", but it feels weird
Jakub Jelinek writes:
> On Wed, Aug 09, 2023 at 05:55:28PM +0100, Richard Sandiford wrote:
>> Jakub: do you remember what the reason was? I don't mind dropping
>> "function", but it feels weird to drop the quotes around "simd".
>> Seems like, if we do that, there'll one day be a patch to add
>> t
"Andre Vieira (lists)" writes:
> Here is my new version, see inline response to your comments.
>
> New cover letter:
>
> This patch enables the use of mixed-types for simd clones for AArch64,
> adds aarch64 as a target_vect_simd_clones and corrects the way the
> simdlen is chosen for non-specifi
"juzhe.zh...@rivai.ai" writes:
> Hi, Richi.
>
>>> that should be
>
>>> || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
>>> && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
>
>>> I think. It seems to imply that SLP isn't supported with
>>> masking/lengthing.
>
> Oh, yes. At first glance, the
Richard Ball writes:
> ACLE has added intrinsics to bridge between SVE and Neon.
>
> The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
> SVE vectors.
>
> This patch adds support to GCC for the following 3 intrinsics:
> svset_neonq, svget_neonq and svdup_neonq
>
> gcc/Chan
Richard Ball writes:
> This patch adds support for the Cortex-A520 CPU to GCC.
>
> No regressions on aarch64-none-elf.
>
> Ok for master?
>
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add
> Cortex-A520 CPU.
> * config/aarch64/aarch64-tune.md: Regene
"Andre Vieira (lists)" writes:
> Hi,
>
> This patch enables the use of mixed-types for simd clones for AArch64
> and adds aarch64 as a target_vect_simd_clones.
>
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (currentl
Prathamesh Kulkarni writes:
> On Fri, 4 Aug 2023 at 20:36, Richard Sandiford
> wrote:
>>
>> Full review this time, sorry for skipping the tests earlier.
> Thanks for the detailed review! Please find my responses inline below.
>>
>> Prathamesh Kulkarni writes:
>> > diff --git a/gcc/fold-const
Full review this time, sorry for skipping the tests earlier.
Prathamesh Kulkarni writes:
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 7e5494dfd39..680d0e54fd4 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -85,6 +85,10 @@ along with GCC; see the file COPYING3.
Richard Biener writes:
> The following fixes a problem with my last attempt of avoiding
> out-of-bound shift values for vectorized right shifts of widened
> operands. Instead of truncating the shift amount with a bitwise
> and we actually need to saturate it to the target precision.
>
> The follo
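
The difference between truncating the shift amount with a bitwise and and saturating it can be seen in a scalar model. This is a sketch under assumptions, not the vectorizer's code: it uses a 16-bit narrow type and a 32-bit wide type, all function names are hypothetical, and it assumes arithmetic right shift of negative values (guaranteed since C++20, and what GCC does in earlier modes).

```cpp
#include <algorithm>
#include <cstdint>

// Reference: shift in the wide (32-bit) type, then truncate to 16 bits.
std::int16_t wide_shift (std::int16_t x, unsigned amount)
{
  return static_cast<std::int16_t> (static_cast<std::int32_t> (x) >> amount);
}

// Model of a 16-bit shift; the caller must ensure amount < 16.
std::int16_t narrow_shift (std::int16_t x, unsigned amount)
{
  return static_cast<std::int16_t> (x >> amount);
}

// Shift amount truncated with a bitwise and.
std::int16_t narrow_shift_masked (std::int16_t x, unsigned amount)
{
  return narrow_shift (x, amount & 15u);
}

// Shift amount saturated to the narrow precision minus one.
std::int16_t narrow_shift_saturated (std::int16_t x, unsigned amount)
{
  return narrow_shift (x, std::min (amount, 15u));
}

// Exhaustively compares the saturated form with the wide reference.
bool saturated_matches_everywhere ()
{
  for (int x = INT16_MIN; x <= INT16_MAX; ++x)
    for (unsigned s = 0; s < 32; ++s)
      {
        std::int16_t v = static_cast<std::int16_t> (x);
        if (narrow_shift_saturated (v, s) != wide_shift (v, s))
          return false;
      }
  return true;
}
```

In this model a shift of 256 by 16 gives 0 in the wide type and with saturation, but the masked form shifts by 0 and returns 256.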
YunQiang Su writes:
> PR #104914
>
> On TRULY_NOOP_TRUNCATION_MODES_P (DImode, SImode)) == true platforms,
> zero_extract (SI, SI) can be sign-extended. So, if a zero_extract (DI,
> DI) following with an sign_extend(SI, DI) can be merged to a single
> zero_extract (SI, SI).
>
> gcc/ChangeLog:
>
Tamar Christina writes:
>> >> Do you see vect_constant_defs in practice, or is this just for
>> >> completeness?
>> >> I would expect any constants to appear as direct operands. I don't
>> >> mind keeping it if it's just a belt-and-braces thing though.
>> >
>> > In the latency case where I had a
Richard Sandiford writes:
> Prathamesh Kulkarni writes:
>> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
>> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the rework and sorry for the slow review.
>> Hi Richard,
>> Thanks for the suggestions! Please find my responses inline below.
>>>
>>> Prathamesh K
Prathamesh Kulkarni writes:
> On Tue, 25 Jul 2023 at 18:25, Richard Sandiford
> wrote:
>>
>> Hi,
>>
>> Thanks for the rework and sorry for the slow review.
> Hi Richard,
> Thanks for the suggestions! Please find my responses inline below.
>>
>> Prathamesh Kulkarni writes:
>> > Hi Richard,
>> >
Tamar Christina writes:
>> > +
>> > +(define_constraint "D3"
>> > + "@internal
>> > + A constraint that matches vector of immediates that is with 0 to
>> > +(bits(mode)/2)-1."
>> > + (and (match_code "const,const_vector")
>> > + (match_test "aarch64_const_vec_all_same_in_range_p (op, 0,
>> >
can_div_trunc_p (a, b, &Q, &r) tries to compute a Q and r that
satisfy the usual conditions for truncating division:
(1) a = b * Q + r
(2) |b * Q| <= |a|
(3) |r| < |b|
We can compute Q using the constant component (the case when
all indeterminates are zero). Since |r| < |b| for th
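
For plain integers, the three conditions can be checked directly. The sketch below covers only the constant case with ordinary 64-bit values, not `poly_int`; `div_result`, `trunc_div` and `satisfies_conditions` are hypothetical names. It relies on the fact that C++ `/` and `%` truncate toward zero.

```cpp
#include <cstdint>
#include <cstdlib>

struct div_result { std::int64_t q, r; };

// Built-in integer division and remainder truncate toward zero.
div_result trunc_div (std::int64_t a, std::int64_t b)
{
  return { a / b, a % b };
}

// Checks conditions (1)-(3) for the truncating quotient and remainder.
// Assumes b != 0 and that b * q does not overflow.
bool satisfies_conditions (std::int64_t a, std::int64_t b)
{
  div_result d = trunc_div (a, b);
  bool c1 = a == b * d.q + d.r;                       // (1) a = b * Q + r
  bool c2 = std::llabs (b * d.q) <= std::llabs (a);   // (2) |b * Q| <= |a|
  bool c3 = std::llabs (d.r) < std::llabs (b);        // (3) |r| < |b|
  return c1 && c2 && c3;
}
```

For a = -7 and b = 2 this gives Q = -3 and r = -1, which satisfies all three conditions; a flooring quotient of -4 would violate (2), since |2 * -4| = 8 > 7.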
Hao Liu OS writes:
> Hi Richard,
>
> Updated the patch with a simple case (see the case and comments below). It shows
> that a live stmt may not have a reduction def, which introduces the ICE.
>
> Is it OK for trunk?
OK, thanks.
Richard
>
> Fix the assertion failure on empty reduction define in info_
Richard Biener writes:
> [...]
>> >> in vect_determine_precisions_from_range. Maybe we should drop
>> >> the shift handling from there and instead rely on
>> >> vect_determine_precisions_from_users, extending:
>> >>
>> >> if (TREE_CODE (shift) != INTEGER_CST
>> >> || !wi::ltu_p (wi::to_w
Tamar Christina writes:
> Hi All,
>
> Currently we segfault when len == 0 for an attribute list.
>
> essentially [cons: =0, 1, 2, 3; attrs: ] segfaults but should be equivalent to
> [cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:]. This fixes it by just
> returning early and leaving it to the
Tamar Christina writes:
> Hi All,
>
> In GCC 11 we implemented the vectorizer optab for widening left shifts,
> however this optab is only supported for uniform shift constants.
>
> At the moment GCC still has two loop vectorization strategy (classical loop
> and
> SLP based loop vec) and the opt
Richard Biener writes:
> On Tue, 1 Aug 2023, Richard Sandiford wrote:
>
>> Richard Sandiford writes:
>> > Richard Biener via Gcc-patches writes:
>> >> The following makes sure to limit the shift operand when vectorizing
>> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>>
Tamar Christina writes:
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > When determining issue rates we currently discount non-constant MLA
>> > accumulators for Advanced SIMD but don't do it for the latency.
>> >
>> > This means the costs for Advanced SIMD with a constant accumulator are
>> >
Tamar Christina writes:
> Hi All,
>
> boolean comparisons have different cost depending on the mode. e.g.
> a && b when predicated doesn't require an addition instruction, the AND is
> free
Nit (for the commit msg): additional
Maybe:
for SVE, a && b doesn't require an additional instruction
Tamar Christina writes:
> Hi All,
>
> When determining issue rates we currently discount non-constant MLA
> accumulators
> for Advanced SIMD but don't do it for the latency.
>
> This means the costs for Advanced SIMD with a constant accumulator are wrong
> and
> results in us costing SVE and Adv
Jeff Law via Gcc-patches writes:
> On 8/1/23 05:18, Richard Sandiford wrote:
>>
>> Where were you seeing the requirement for pointer equality? genrecog.cc
>> at least uses rtx_equal_p, and I think it has to. E.g. some patterns
>> use (match_dup ...) to match output and input mems, and mem rtxes
Jeff Law via Gcc-patches writes:
> On 7/19/23 04:11, Xiao Zeng wrote:
>> This patch completes the recognition of the basic semantics
>> defined in the spec, namely:
>>
>> Conditional zero, if condition is equal to zero
>>rd = (rs2 == 0) ? 0 : rs1
>> Conditional zero, if condition is non zero
Richard Sandiford writes:
> Richard Biener via Gcc-patches writes:
>> The following makes sure to limit the shift operand when vectorizing
>> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> operand otherwise invokes undefined behavior. When we determine
>> whether we can d
Richard Biener via Gcc-patches writes:
> The following makes sure to limit the shift operand when vectorizing
> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> operand otherwise invokes undefined behavior. When we determine
> whether we can demote the operand we know we at m
Juzhe-Zhong writes:
> Hi, Richard and Richi.
>
> Based on the suggestions from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625396.html
>
> This patch chooses approach (1) that Richard provided, meaning:
>
> RVV implements cond_* optabs as expanders. RVV therefore supports
> both
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
>
> Based on previous discussions, we should make COND_* and COND_LEN_*
> consistent.
>
> So, this patch defines these internal functions together using these 2
> wrappers:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERN
Richard Ball writes:
> Add POLY_INT_CST support to code within
> fold_ctor_reference. This code previously
> only supported INTEGER_CST which caused a
> bug when using VEC_PERM_EXPR with SVE vectors.
Just to add for others: this is a prerequisite for a follow-on patch,
so the change will be teste
Hao Liu OS writes:
>> Which test case do you see this for? The two tests in the patch still
>> seem to report correct latencies for me if I make the change above.
>
> Not the newly added tests. It is still the existing case causing the
> previous ICE (i.e. assertion problem): gcc.target/aarch64
Sorry for the slow response.
Hao Liu OS writes:
>> Ah, thanks. In that case, Hao, I think we can avoid the ICE by changing:
>>
>> if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
>> && vect_is_reduction (stmt_info))
>>
>> to:
>>
>> if ((kind == scalar_stmt || k
Richard Biener writes:
> On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> wrote:
>>
>> Richard Biener writes:
>> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
>> > wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that
>> >> > we're not pape
Richard Biener writes:
> On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
> wrote:
>>
>> > When was STMT_VINFO_REDUC_DEF empty? I just want to make sure that we're
>> > not papering over an issue elsewhere.
>>
>> Yes, I also wonder if this is an issue in vectorizable_reduction. Below
Was leaving a bit of time in case Richi had any comments, but:
Matthew Malcomson writes:
> Our checks for whether the vectorization of a given loop would make an
> out of bounds access miss the case when the vector we load is so large
> as to span multiple iterations worth of data (while only bei
Hi,
Thanks for the rework and sorry for the slow review.
Prathamesh Kulkarni writes:
> Hi Richard,
> This is reworking of patch to extend fold_vec_perm to handle VLA vectors.
> The attached patch unifies handling of VLS and VLA vector_csts, while
> using fallback code
> for ctors.
>
> For VLS ve
"juzhe.zh...@rivai.ai" writes:
> Hi, Richard.
>>> I think we should have an internal-fn helper that returns IFN_COND_LEN_*
>>> for a given IFN_COND_*. It could handle IFN_MASK_LOAD -> IFN_MASK_LEN_LOAD
>>> etc. too.
> Could you name this helper function for me? Does it call
> "get_conditional_le
"juzhe.zh...@rivai.ai" writes:
> Thanks Richard.
>
> Do you suggest we should add a macro like this first:
>
> #ifndef DEF_INTERNAL_COND_FN
> #define DEF_INTERNAL_COND_FN(NAME, FLAGS, OPTAB, TYPE) \
> DEF_INTERNAL_OPTAB_FN (COND_##NAME, FLAGS, cond_##optab, cond_##TYPE)
> DEF_INTERNAL_OPTAB_FN
Hao Liu OS writes:
> Hi,
>
> Thanks for the suggestion. I tested it and found a gcc_assert failure:
> gcc.target/aarch64/sve/cost_model_13.c (internal compiler error: in
> info_for_reduction, at tree-vect-loop.cc:5473)
>
> It is caused by empty STMT_VINFO_REDUC_DEF.
When was STMT_VINFO_REDU
钟居哲 writes:
> Hi, Richi. Thank you so much for review.
>
>>> This function doesn't seem to care about conditional vectorization
>>> support, so why are you changing it?
>
> I debug and analyze the code here:
>
> Breakpoint 1, vectorizable_call (vinfo=0x3d358d0, stmt_info=0x3dcc820,
> gsi=0x0, vec
Andrew Pinski via Gcc-patches writes:
> The problem is that -fasynchronous-unwind-tables is on by default for aarch64.
> We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> to .eh_frame data from crtbeginT.o instead of the user-defined object
> during static linking.
Could you
Hao Liu OS writes:
> This only affects the new costs in aarch64 backend. Currently, the reduction
> latency of vector body is too large as it is multiplied by stmt count. As the
> scalar reduction latency is small, the new costs model may think "scalar code
> would issue more quickly" and increa
Richard Biener writes:
> The following unifies SLP_TREE_VEC_STMTS into SLP_TREE_VEC_DEFS
> which can handle all cases we need.
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
Nice! Just curious...
> @@ -149,6 +147,20 @@ _slp_tree::~_slp_tree ()
> free (failed);
> }
>
> +/*
Jan Hubicka writes:
> Avoid scaling flat loop profiles of vectorized loops
>
> As discussed, when vectorizing loop with static profile, it is not always
> good idea
> to divide the header frequency by vectorization factor because the profile may
> not realistically represent the expected number o
Richard Biener writes:
>> On 20.07.2023 at 18:59, Richard Sandiford wrote:
>>
>> Richard Biener writes:
>>>> On 20.07.2023 at 16:09, Richard Sandiford wrote:
>>>>
>>>> Richard Biener via Gcc-patches writes:
>>>>> When we materialize a layout we push edge permutes to constant/exte
Richard Biener writes:
>> On 20.07.2023 at 16:09, Richard Sandiford wrote:
>>
>> Richard Biener via Gcc-patches writes:
>>> When we materialize a layout we push edge permutes to constant/external
>>> defs without checking we can actually do so. For externals defined
>>> by vector stmts rathe
Richard Biener via Gcc-patches writes:
> When we materialize a layout we push edge permutes to constant/external
> defs without checking we can actually do so. For externals defined
> by vector stmts rather than scalar components we can't.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
"Kewen.Lin" writes:
> Hi,
>
> Commit r14-2267-gb8806f6ffbe72e adjusts the arguments order
> of LEN_STORE from {len,vector,bias} to {len,bias,vector},
> in order to make them consistent with LEN_MASK_STORE and
> MASK_STORE. But it missed to update the related handlings
> in tree-ssa-sccvn.cc, it c
"Kewen.Lin" writes:
> Hi,
>
> As PR110729 reported, there was one issue for .section
> __patchable_function_entries with -ffunction-sections, that
> is we put the same symbol as link_to section symbol for all
> functions wrongly. The commit r13-4294 for PR99889 has
> fixed this with the correspon
Richard Biener writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Jeff Law via Gcc-patches writes:
>> > On 7/19/23 04:25, Richard Biener wrote:
>> >> On Wed, 19 Jul 2023, YunQiang Su wrote:
>> >>
>> >>> Eric Botcazou wrote on Wed, 19 Jul 2023 at 17:45:
>>
>> > I don't see that. That's d
Jan Hubicka writes:
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after recent changes
>> > because the whilelo outside the loop
Richard Biener writes:
> On Thu, 20 Jul 2023, Richard Sandiford wrote:
>
>> Tamar Christina writes:
>> > Hi All,
>> >
>> > The resulting predicate register of a whilelo is not
>> > restricted to the lower half of the predicate register file.
>> >
>> > As such these tests started failing after rec
Jeff Law via Gcc-patches writes:
> On 7/19/23 04:25, Richard Biener wrote:
>> On Wed, 19 Jul 2023, YunQiang Su wrote:
>>
>>> Eric Botcazou wrote on Wed, 19 Jul 2023 at 17:45:
> I don't see that. That's definitely not what GCC expects here,
> the left-most word of the doubleword should be unchan
Andrew Carlotti writes:
> Updated patch to fix the fp16 intrinsic pragmas, and pushed to master.
> OK to backport to GCC 13?
OK, thanks.
Richard
> Many intrinsics currently depend on both an architecture version and a
> feature, despite the corresponding instructions being available within
> GC
Tamar Christina writes:
> Hi All,
>
> The resulting predicate register of a whilelo is not
> restricted to the lower half of the predicate register file.
>
> As such these tests started failing after recent changes
> because the whilelo outside the loop is getting assigned p15.
It's the whilelo i
Manolis Tsamis writes:
> On Tue, Jul 18, 2023 at 1:12 AM Richard Sandiford
> wrote:
>>
>> Manolis Tsamis writes:
>> > noce_convert_multiple_sets has been introduced and extended over time to
>> > handle
>> > if conversion for blocks with multiple sets. Currently this is focused on
>> > register
Michael Matz via Gcc-patches writes:
> Hello,
>
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore). This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, eve
Manolis Tsamis writes:
> noce_convert_multiple_sets has been introduced and extended over time to
> handle
> if conversion for blocks with multiple sets. Currently this is focused on
> register moves and rejects any sort of arithmetic operations.
>
> This series is an extension to allow more sequ
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard.
>
> The RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc).
> They share the same REG_CLASS with the VLA modes (VNx16QI,VNx32QI,...etc).
>
> When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS
>
Juzhe-Zhong writes:
> Hi, Richard.
>
> The RISC-V port needs to add a bunch of VLS modes (V16QI,V32QI,V64QI,...etc).
> They share the same REG_CLASS with the VLA modes (VNx16QI,VNx32QI,...etc).
>
> When I am adding those VLS modes, the RTL_SSA initialization in VSETVL PASS
> (inserted after RA) ICE:
> rvv.
Jason Merrill writes:
> On Sun, Jul 16, 2023 at 6:50 AM Richard Sandiford
> wrote:
>
>> Jakub Jelinek writes:
>> > On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via
>> Gcc-patches wrote:
>> >> Summary: We'd like to be able to speci
Richard Biener writes:
> On Fri, Jul 14, 2023 at 5:58 PM Richard Sandiford via Gcc-patches
> wrote:
>>
>> Summary: We'd like to be able to specify some attributes using
>> keywords, rather than the traditional __attribute__ or [[...]]
>> syntax. Wou
Jakub Jelinek writes:
> On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via Gcc-patches
> wrote:
>> Summary: We'd like to be able to specify some attributes using
>> keywords, rather than the traditional __attribute__ or [[...]]
>> syntax. Would that
Thanks for the feedback.
Nathan Sidwell writes:
> On 7/14/23 11:56, Richard Sandiford wrote:
>> Summary: We'd like to be able to specify some attributes using
>> keywords, rather than the traditional __attribute__ or [[...]]
>> syntax. Would that be OK?
>>
>> In more detail:
>>
>> We'd like to
Summary: We'd like to be able to specify some attributes using
keywords, rather than the traditional __attribute__ or [[...]]
syntax. Would that be OK?
In more detail:
We'd like to add some new target-specific attributes for Arm SME.
These attributes affect semantics and code generation and so t
Vladimir Makarov writes:
> On 7/12/23 06:07, Richard Sandiford wrote:
>> Vladimir Makarov via Gcc-patches writes:
>>> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
>>> index 73fbef29912..2f95121df06 100644
>>> --- a/gcc/lra-assigns.cc
>>> +++ b/gcc/lra-assigns.cc
>>> @@ -1443,10 +1443,11 @
Richard Biener writes:
> The PRs ask for optimizing of
>
> _1 = BIT_FIELD_REF ;
> result_4 = BIT_INSERT_EXPR ;
>
> to a vector permutation. The following implements this as
> match.pd pattern, improving code generation on x86_64.
>
> On the RTL level we face the issue that backend patterns in
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* to the following situation:
>
> Support for the situation that in "vectorizable_operation":
> /* If ope
Richard Biener writes:
> On Wed, Jul 12, 2023 at 1:05 PM Uros Bizjak wrote:
>>
>> On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak wrote:
>> >
>> > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford
>> > wrote:
>> > >
>> > > Richard Biener via Gcc-patches writes:
>> > > > On Mon, Jul 10, 2023 at 1
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* to the following situation:
>
> Support for the situation that in "vectorizable_operation":
> /* If ope
Richard Biener via Gcc-patches writes:
> On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak wrote:
>>
>> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener
>> wrote:
>> >
>> > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak wrote:
>> > >
>> > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener
>> > > wrote:
Vladimir Makarov via Gcc-patches writes:
> The following patch solves
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110372
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> commit 1f7e5a7b91862b999aab88ee0319052aaf00f0f1
> Author: Vladimir N. Makarov
> Date: Fri Jul 7 09:
Richard Biener writes:
> On Wed, 12 Jul 2023, juzhe.zh...@rivai.ai wrote:
>
>> Thanks Richard.
>>
>> Is it correct that the better way is to add optabs
>> (len_strided_load/len_strided_store),
>> then expand LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE to
>> len_strided_load/len_strided_store op
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
> As we discussed before, COND_LEN_* patterns were added for multiple
> situations.
> This patch applies COND_LEN_* to the following situation:
>
> Support for the situation that in "vectorizable_operation":
> /* If ope
Robin Dapp writes:
> Ok so the consensus seems to rather stay with 32 bits and only
> change the shift to 10/20?
Yeah. The check would then be:
if (NUM_OPTABS > 0xfff || NUM_MACHINE_MODES > 0x3ff)
fatal ("genopinit range assumptions invalid");
> As MACHINE_MODE_BITSIZE is already
> 16 we
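
The 0xfff / 0x3ff limits in the check above correspond to a 12 + 10 + 10 bit split of a 32-bit value with shifts of 20 and 10. A minimal sketch of such a packing follows; the field order and all names here are assumptions for illustration, not the actual genopinit code.

```cpp
#include <cstdint>

// Assumed layout of a 32-bit key: 12 bits of optab number, then two
// 10-bit machine modes. The limits mirror the quoted 0xfff / 0x3ff check.
constexpr std::uint32_t kMaxOptab = 0xfff;
constexpr std::uint32_t kMaxMode = 0x3ff;

// True if all three fields fit in their assumed bit widths.
constexpr bool fits (std::uint32_t optab, std::uint32_t m1, std::uint32_t m2)
{
  return optab <= kMaxOptab && m1 <= kMaxMode && m2 <= kMaxMode;
}

// Pack with shifts of 20 and 10; callers should check fits () first.
constexpr std::uint32_t pack_key (std::uint32_t optab, std::uint32_t m1,
                                  std::uint32_t m2)
{
  return (optab << 20) | (m1 << 10) | m2;
}

constexpr std::uint32_t key_optab (std::uint32_t key) { return key >> 20; }
constexpr std::uint32_t key_mode1 (std::uint32_t key) { return (key >> 10) & kMaxMode; }
constexpr std::uint32_t key_mode2 (std::uint32_t key) { return key & kMaxMode; }
```

With the maximum value in every field the key is exactly 0xffffffff, so the three fields use all 32 bits and nothing is left over.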
Richard Sandiford writes:
> Robin Dapp via Gcc-patches writes:
>> Hi,
>>
>> upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
>> helper functions in gen* rely on the opcode as well as two modes fitting
>> into an unsigned int (a signed int even if we consider the qsort defa
Robin Dapp via Gcc-patches writes:
> Hi,
>
> upcoming changes for RISC-V will have us exceed 256 modes or 8 bits. The
> helper functions in gen* rely on the opcode as well as two modes fitting
> into an unsigned int (a signed int even if we consider the qsort default
> comparison function). This
Richard Biener writes:
>> On 06.07.2023 at 19:50, Richard Sandiford wrote:
>>
>> Richard Biener via Gcc-patches writes:
>>> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
>>> wrote:
>>>> Hi,
>>>> If a loop is unrolled by n times during vectorization, two steps are use
Richard Biener via Gcc-patches writes:
> On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
> wrote:
>>
>> Hi,
>>
>> If a loop is unrolled by n times during vectorization, two steps are used to
>> calculate the induction variable:
>> - The small step for the unrolled ith-copy: vec_1 = vec
Hao Liu OS via Gcc-patches writes:
> Hi,
>
> If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
> the VFs of both main and epilog loop are enlarged. The epilog vect loop is
> specific for a loop with small iteration counts, so a large VF may hurt
> performance.
>
> Thi
Richard Biener writes:
> On Tue, 4 Jul 2023, Richard Sandiford wrote:
>
>> Richard Biener writes:
>> > On Thu, 29 Jun 2023, Richard Biener wrote:
>> >
>> >> On Thu, 29 Jun 2023, Richard Sandiford wrote:
>> >>
>> >> > Richard Biener writes:
>> >> > > With applying loop masking to epilogues on x8
Richard Biener via Gcc-patches writes:
> The following adjusts the tree.def documentation about VEC_PERM_EXPR
> which wasn't adjusted when the restrictions of permutes with constant
> mask were relaxed.
I was going to complain about having two copies of the documentation,
but then I realised that
Richard Biener writes:
> On Thu, 29 Jun 2023, Richard Biener wrote:
>
>> On Thu, 29 Jun 2023, Richard Sandiford wrote:
>>
>> > Richard Biener writes:
>> > > With applying loop masking to epilogues on x86_64 AVX512 we see
>> > > some significant performance regressions when evaluating SPEC CPU 20
Richard Biener writes:
> On Tue, 4 Jul 2023, Richard Biener wrote:
>
>> On Mon, 3 Jul 2023, Richard Sandiford wrote:
>>
>> > Richard Biener writes:
>> > > The following removes late deciding to elide vectorized epilogues to
>> > > the analysis phase and also avoids altering the epilogues niter.
Richard Biener writes:
> The following removes late deciding to elide vectorized epilogues to
> the analysis phase and also avoids altering the epilogues niter.
> The costing part from vect_determine_partial_vectors_and_peeling is
> moved to vect_analyze_loop_costing where we use the main loop
> a
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richi and Richard.
>
> Based on the review comments from Richard:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
>
> I change len_mask_gather_load/len_mask_scatter_store order into:
> {len,bias,mask}
>
> We adjust adding
Richard Biener via Gcc-patches writes:
> When trying to associate (v + INT_MAX) + INT_MAX we are using
> the TREE_OVERFLOW bit to check for correctness. That isn't
> working for VECTOR_CSTs and it can't in general when one considers
> VL vectors. It looks like it should work for COMPLEX_CSTs but
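
As a scalar analogue of the association problem above: rewriting (v + c1) + c2 as v + (c1 + c2) is only safe if c1 + c2 is representable. The sketch below shows one way to test that for `int` constants; `combine_constants` is a hypothetical helper, not the TREE_OVERFLOW mechanism, and it assumes `long long` is wider than `int` and a C++17 compiler for `std::optional`.

```cpp
#include <climits>
#include <optional>

// Tries to fold two int constants into one. Returns std::nullopt when
// the sum is not representable in int, in which case the reassociated
// form v + (c1 + c2) should not be used.
// Assumption: long long is wider than int on the host.
std::optional<int> combine_constants (int c1, int c2)
{
  long long wide = static_cast<long long> (c1) + c2;
  if (wide < INT_MIN || wide > INT_MAX)
    return std::nullopt;
  return static_cast<int> (wide);
}
```

For the INT_MAX + INT_MAX case from the description, the helper reports that the combined constant does not fit.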
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard. I fixed the order as you suggested.
>
> Before this patch, the order is {len,mask,bias}.
>
> Now, after this patch, the order becomes {len,bias,mask}.
>
> Since you said we should not need 'internal_fn_bias_index', the bias index
> s
The documentation says:
-
@cindex @code{vec_extract@var{m}@var{n}} instruction pattern
@item @samp{vec_extract@var{m}@var{n}}
Extract given field from the vector value. [...] The
@var{n} mode is the mode of the field or vect
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong
>
> Hi, Richard and Richi.
>
> According to Richard's review comments:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623405.html
>
> current len, bias and mask order is not reasonable.
>
> Change {len,mask,bias} into {len,bias,mask}.
>
> Th