LGTM.
--
Regards
Robin
LGTM (in case you haven't committed it yet).
--
Regards
Robin
Hi,
this fixes asm-scan fallout from r15-3712-g5e3a4a01785e2d where we allow
SLP with SELECT_VL.
Assisted by sed and regtested on rv64gcv_zvfh_zvbb.
Rather lengthy but obvious, so going to commit after a while if the CI is
happy. I think those tests don't really need to check for vsetvl anyway,
Hi Dusan,
sorry for the late reply.
> This patch addresses a missed opportunity to fuse vsetvl_infos.
> Instead of checking whether demands for merging configurations of
> vsetvl_info are all met, the demands are checked individually.
>
> The case in question occurs because of the conditional
> On Tue, 17 Sep 2024, Richard Biener wrote:
>
> > The following restores the use of .SELECT_VL for testcases where it
> > is safe to use even when using SLP. I've for now restricted it
> > to single-lane SLP plus optimistically allow store-lane nodes
> > and assume single-lane roots are not widen
> This patch fixes the dump check counts of vector SAT_ADD. The
> middle-end change raises the match count from 2 to 4.
>
> The below test suites are passed for this patch.
> * The rv64gcv full regression test.
That's OK. And I think testsuite fixup patches like this you can consid
> The following simply removes a seemingly bogus guard.
>
> * tree-vect-loop.cc (vect_analyze_loop_1): Remove SLP guard
> from .SELECT_VL disabling.
> ---
> gcc/tree-vect-loop.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-v
Hi,
PR112694 shows that we try to create sub-vectors of single-element
vectors because can_duplicate_and_interleave_p returns true.
The problem resurfaced in PR116611.
This patch makes can_duplicate_and_interleave_p return false
if count / nvectors > 0 and removes the corresponding check in the r
> In the process of DF to SI, we generally use "unsigned_fix" rather than
> "truncate" for conversion. Although this has no effect in general,
> unexpected ICE often occurs when precise semantic analysis is required,
> such as analysis in function "simplify_const_unary_operation" in
> simplify-rtx.
Hi,
this adds a V16SI -> V4SI (and related), i.e. "quartering", vector-vector
extract expander for VLS modes. It helps with unnecessary spills in
x264.
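As a rough scalar model of what such a "quartering" extract computes (not the expander itself), the sketch below picks one of the four V4SI-sized quarters of a V16SI value. The helper name extract_quarter and the index convention (0 = lowest elements) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy quarter IDX (0..3) of a 16-element vector into a 4-element vector.
   Hypothetical helper: it models the semantics only, not the RVV code.  */
void
extract_quarter (int32_t dst[4], const int32_t src[16], unsigned idx)
{
  assert (idx < 4);
  memcpy (dst, src + 4 * idx, 4 * sizeof (int32_t));
}
```

With src holding 0..15, extract_quarter (dst, src, 2) leaves elements 8..11 in dst.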
Regtested on rv64gcv_zvfh_zvbb.
Regards
Robin
gcc/ChangeLog:
* config/riscv/autovec.md (vec_extract):
Add quarter vec-vec extrac
> > So we only found two instances of this problem and both were related to
> > _Bools. In case you have more cases, it would be greatly appreciated
> > to verify the series with them. If you don't mind, would it be possible
> > to comment out the zeroing, re-run the testsuite and check for FAILs
> There were absolutely problems without this. It's a while ago now, so I'm
> struggling with the details, but as GCC only applies the mask to selected
> operations there were all sorts of issues that crept in. Zeroing the
> undefined lanes seemed to match the middle end assumptions (or, at least i
> > +(define_predicate "maskload_else_operand"
> > + (and (match_code "const_int,const_vector")
> > + (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
> This forces maskload and mask_gather_load to only accept zero here, but
> in fact the hardware would allow us to accept any value (incl
Hi,
I messed up the return value in check_effective_target_rvv_zvl256b_ok and
check_effective_target_rvv_zvl512b_ok. This fixes it and also just uses
the current march for the check.
Going to commit as obvious.
Regards
Robin
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Fix eff
> On Wed, Aug 28, 2024 at 3:21 PM Robin Dapp wrote:
> >
> > > Hmm - but how can you call this ambiguous? VLEN and LMUL is a runtime
> > > property(?), so unknown to the compiler(?) - as you do below the only
> > > way to code generate would be an agnostic way
> Hmm - but how can you call this ambiguous? VLEN and LMUL is a runtime
> property(?), so unknown to the compiler(?) - as you do below the only
> way to code generate would be an agnostic way such as with a slide-down.
> But can't you always do this, for all subregs of this sort (even with offset)?
> +(define_mode_iterator V_HAS_HALF [
> + V2QI V4QI V8QI V16QI V32QI V64QI V128QI V256QI V512QI V1024QI V2048QI
> V4096QI
> + V2HI V4HI V8HI V16HI V32HI V64HI V128HI V256HI V512HI V1024HI V2048HI
> + V2SI V4SI V8SI V16SI V32SI V64SI V128SI V256SI V512SI V1024SI
> + V2DI V4DI V8DI V16DI V32DI V
You don't need an OK of course but LGTM.
When I found another instance of this I was thinking about having
exhaustive self tests for those attributes. Maybe a good learning
exercise?
--
Regards
Robin
Hi,
this is a hopefully better way to solve the "subreg problem" by first,
in the generic case, having the RA go via memory and second, providing a
vector-vector extract that deals with it in an optimized way.
When the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=1
> + /* Constants in range -16 ~ 15 integer or 0.0 floating-point
> +can be emitted using vmv.v.i. */
> + if (satisfies_constraint_vi (x) || satisfies_constraint_Wc0 (x))
> return 1;
Just a nit but while you're at it, don't you want to split
Before looking at the rest (tomorrow) - this is OK.
--
Regards
Robin
Hi,
standard abs synthesis during expand is max (a, -a). This
expansion has the advantage of avoiding masking and is thus potentially
faster than the a < 0 ? -a : a synthesis.
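The two syntheses mentioned above can be compared in scalar form. This is only a sketch with made-up function names; it says nothing about the code the backend actually emits. INT32_MIN is excluded because negating it overflows in C.

```c
#include <assert.h>
#include <stdint.h>

/* abs as max (a, -a): one negate and one max, no mask needed.  */
int32_t
abs_via_max (int32_t a)
{
  assert (a != INT32_MIN);	/* -INT32_MIN would overflow.  */
  int32_t neg = -a;
  return a > neg ? a : neg;
}

/* abs as a < 0 ? -a : a: in vector form the compare produces a mask
   that then controls the negate.  */
int32_t
abs_via_select (int32_t a)
{
  assert (a != INT32_MIN);
  return a < 0 ? -a : a;
}
```

Both functions return the same value for every input other than INT32_MIN.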
Regtested on rv64gcv_zvfh_zvbb.
Regards
Robin
gcc/ChangeLog:
* config/riscv/autovec.md (abs2): Expand via ma
> Why's the include needed? .ccs ought to include coretypes.h directly
> (and get machmode.h that way, since coretypes.h includes machmode.h).
Ugh, that was not intentional, sometimes my auto-complete inserts
such includes for no reason. I really need to disable that, thanks for
pointing that out
> Indeed though that might be a larger change.
I have tested the attached now, aarch64 is still running but
x86 and power10 are bootstrapped and regtested, riscv regtested.
Hope I didn't miss any target-specific code that I haven't tested.
As the issue is only latent I verified by calling
get_b
LGTM.
--
Regards
Robin
LGTM.
--
Regards
Robin
> And we fail to fold vect_patt_384.36_436 | { 1, ... } to { 1, ... }?
> Or is the issue that vector masks contain padding and with
> non-zero masking we'd have garbage in the padding and that leaks
> here? That is, _47 ? 1 : iftmp.0_113 -> _47 | iftmp.0_113 assumes
> there's exactly one bit in a
> > > > _Bool iftmp.0_113;
> > > > _Bool iftmp.0_114;
> > > > iftmp.0_113 = .MASK_LOAD (_170, 8B, _169, _171(D));
> > > > iftmp.0_114 = _47 | iftmp.0_113;
> > _BoolD.2746 _47;
> > iftmp.0_114 = _47 ? 1 : iftmp.0_113;
> > which is folded into
> > iftmp.0_114 = _47 | iftmp.0_113;
>
>
> > > Why? I don't think the vectorizer relies on a particular else
> > > value? I'd say it would be appropriate for if-conversion to
> > > use "ANY" and for the vectorizer to then pick a supported
> > > version and/or enforce the else value it needs via a blend?
> >
> > In PR115336 we have some
> > > When predicating a load we implicitly assume that the else value is
> > > zero. In order to formalize this, this patch queries the target for
> > > its supported else operand and uses that for the maskload call.
> > > Subsequently, if the else operand is nonzero, a cond_expr enforcing
> > > a
Hi,
in get_best_extraction_insn we use smallest_int_mode_for_size with
struct_bits as size argument. In PR115495 struct_bits = 256 and we
don't have a mode for that. This patch just bails for such cases.
This does not happen on the current trunk anymore (so the test passes
unpatched) but we've
This patch adds else operands to masked loads. Currently the default
else operand predicate accepts "undefined" (i.e. SCRATCH) as well as
all-ones values.
Note that this series introduces a large number of new RVV FAILs for
riscv. All of them are due to us not being able to elide redundant
vec_c
This patch adds a zero else operand to masked loads, in particular the
masked gather load builtins that are used for gather vectorization.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
Add else-operand handling.
(ix86_expand_builtin): Ditt
This patch adds a zero else operand to the masked loads.
gcc/ChangeLog:
* config/gcn/predicates.md (maskload_else_operand): New
predicate.
* config/gcn/gcn-valu.md: Use new predicate.
---
gcc/config/gcn/gcn-valu.md | 6 --
gcc/config/gcn/predicates.md | 3 +++
2 fil
When predicating a load we implicitly assume that the else value is
zero. In order to formalize this, this patch queries the target for
its supported else operand and uses that for the maskload call.
Subsequently, if the else operand is nonzero, a cond_expr enforcing
a zero else value is emitted.
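A scalar model may help picture the scheme described above: a masked load whose inactive lanes receive whatever else value the target supports, followed by a select that forces those lanes to zero when needed. The function names and the plain-array representation are assumptions for illustration and do not correspond to the if-conversion code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a masked load: active lanes read memory, inactive lanes
   receive ELSE_VAL.  */
void
masked_load (int32_t *dst, const int32_t *src, const uint8_t *mask,
	     int32_t else_val, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? src[i] : else_val;
}

/* Load with the target's else value, then enforce zero in the inactive
   lanes if that value is not already zero.  */
void
masked_load_zero_else (int32_t *dst, const int32_t *src,
		       const uint8_t *mask, int32_t target_else, size_t n)
{
  masked_load (dst, src, mask, target_else, n);
  if (target_else != 0)
    for (size_t i = 0; i < n; i++)
      dst[i] = mask[i] ? dst[i] : 0;
}

/* Small self-check of the model.  */
void
check_zero_else (void)
{
  const int32_t src[4] = { 1, 2, 3, 4 };
  const uint8_t mask[4] = { 1, 0, 1, 0 };
  int32_t dst[4];
  masked_load_zero_else (dst, src, mask, -1, 4);
  assert (dst[0] == 1 && dst[1] == 0 && dst[2] == 3 && dst[3] == 0);
}
```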
This patch adds an else operand to vectorized masked load calls.
The current implementation adds else-value arguments to the respective
target-querying functions that are used to supply the vectorizer with the
proper else value.
Right now, the only spot where a zero else value is actually enforced
This adds zero else operands to masked loads and their intrinsics.
I needed to adjust more than initially thought because we rely on
combine for several instructions and a change in a "base" pattern
needs to propagate to all those.
For the lack of a better idea I used a function call property to s
This patch adds else-operand handling to the internal functions.
gcc/ChangeLog:
* internal-fn.cc (add_mask_and_len_args): Rename...
(add_mask_else_and_len_args): ...to this and add else handling.
(expand_partial_load_optab_fn): Use adjusted function.
(expand_partia
This patch amends the documentation for masked loads (maskload,
vec_mask_load_lanes, and mask_gather_load as well as their len
counterparts) with an else operand.
gcc/ChangeLog:
* doc/md.texi: Document masked load else operand.
---
gcc/doc/md.texi | 60 +++
st suite results
are, however, unchanged.
Robin Dapp (8):
docs: Document maskload else operand and behavior.
ifn: Add else-operand handling.
tree-ifcvt: Enforce zero else value after maskload.
vect: Add maskload else value support.
aarch64: Add masked-load else operands.
gcn: Add else ope
A bit of bikeshedding:
While it's obviously a bug, I'm not really sure it's useful to truncate before
emitting the widening shift. Do we save an instruction vs. the regular
non-widening shift by doing so?
I think my original (failed) idea was this pattern to be an intermediate/bridge
pattern tha
> When compiling an interface for rounding of type 'vfloat16*' without using
> zvfh or zvfhmin, it is not enough to use FLOAT_MODE_P because the type does
> not support it. Although the subsequent riscv_validate_vector_type checks
> will still fail and throw exceptions, I don't think we shoul
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> index d150f20b5d9..02814183dbb 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr116202-run
Hi,
in PR116149 we choose a wrong vector length which causes wrong values in
a reduction. The problem happens in avlprop where we choose the
number of units in the instruction's mode as vector length. For the
non-scalar variants the respective operand has the correct non-widened
mode. For the s
> > Like aarch64 we set REGMODE_NATURAL_SIZE for fixed-size modes to
> > UNITS_PER_WORD. Isn't that part of the problem?
> >
> > In extract_bit_field_as_subreg we check lowpart_bit_field_p (= true because
> > 128 is a multiple of UNITS_PER_WORD). This leads to the subreg expression.
> >
> > If I
> > IMO, what ought to happen here is that the RA should spill
> > the inner register to memory and load the V4SI back from there.
> > (Or vice versa, for an lvalue.) Obviously that's not very efficient,
> > and so a patch like the above might be useful as an optimisation.[*]
> > But it shouldn't
OK.
--
Regards
Robin
Hi,
when the source mode is potentially larger than one vector (e.g. an
LMUL2 mode for VLEN=128) we don't know which vector the subreg actually
refers to. For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI))
could actually be a full (high) vector register of a two-register
group (at
Hi,
an unquoted apostrophe slipped through when testing the recent
V/M extension patch. This, again, re-words the message to
"Currently the 'V' implementation requires the 'M' extension".
Going to commit as obvious after testing.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv.cc (
Hi,
In preparation for the maskload else operand I split off this patch. The patch
looks through SSA names for the conditions passed to inverse_conditions_p which
helps match.pd recognize more redundant vec_cond expressions. It also adds
VCOND_MASK to the respective iterators in match.pd.
Is th
> That phrasing makes sense to me. It's consistent with the -mbig-endian
> sorry message:
>
> https://godbolt.org/z/oWMeorEeM
I seem to remember that explicitly mentioning GCC in an error message like
that was discouraged but I might be confusing things.
So probably
"GCC's current 'V' implementa
> It's really GCC's implementation of the V extension that requires M, not
> the actual ISA V extension. So I think the wording could be a little
> confusing for users here, but no big deal either way on my end so
>
> Reviewed-by: Palmer Dabbelt
Hmm, fair. How about just "the 'V' implementatio
Hi,
now with proper diff...
For calculating the value of a poly_int at runtime we use a
multiplication instruction that requires the M extension.
Instead of just asserting and ICEing this patch emits an early
error at option-parsing time.
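To illustrate why a multiply shows up, here is a minimal model of a two-coefficient polynomial value c0 + c1 * x whose indeterminate x is only known at run time. The struct poly2 and poly2_eval are hypothetical stand-ins, not GCC's poly_int class, and the indeterminate is assumed to be nonnegative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-coefficient polynomial c0 + c1 * x.  */
struct poly2
{
  int64_t c0;
  int64_t c1;
};

/* Evaluate P once the runtime indeterminate X is known.  The product
   c1 * x is the step that needs a multiplication.  */
int64_t
poly2_eval (struct poly2 p, int64_t x)
{
  assert (x >= 0);	/* Assumed nonnegative in this sketch.  */
  return p.c0 + p.c1 * x;
}
```

For example, a value of 16 + 16 * x evaluates to 16 for x = 0 and to 64 for x = 3.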
We have several tests that use only "i" (without "m") and
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 826d552a6fd..eb6c033535c 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -5049,7 +5049,8 @@ internal_len_load_store_bias (internal_fn ifn,
> machine_mode mode)
> }
>
> /* Return true if the given ELS_VALUE is supp
Hi,
for calculating the value of a poly_int at runtime we use a multiplication
instruction that requires the M extension. Instead of just asserting and
ICEing this patch emits an early error at option-parsing time.
We have several tests that use only "i" (without "m") and I adjusted all of
them
> Thanks for the explanation! I have a few clarification questions about this.
> If I understand correctly, B would represent the number of elements the
> vector can have (for 128b vector operating on 32b elements, B == 4, but if
> operating on 64b elements B == 2); however, I'm not too sure what A
LGTM.
--
Regards
Robin
> I have a test.
> The backend can't see -0.0; it becomes 0.0 when translated to GIMPLE.
I don't think it should except when specifying -ffast-math or similar.
But we don't have a shortcut to load a negative zero, just the positive
one.
--
Regards
Robin
> -(match_operand:V_VLSF 3 "register_operand")]))]
> +(match_operand:V_VLSF 3 "nonmemory_operand")]))]
Even though the integer compares have nonmemory operands here, their respective
insn patterns don't (but constrain properly).
I guess what's happening with register operand and a c
Hi Demin,
> + void add_integer_operand (rtx x)
> + {
> +create_integer_operand (&m_ops[m_opno++], INTVAL (x));
> +gcc_assert (m_opno <= MAX_OPERANDS);
> + }
Can that be folded into add_input_operand somehow?
>void add_input_operand (rtx x, machine_mode mode)
>{
> create_i
> Yeah, I think so. I guess for RVV there's a choice between:
>
> (1) making the insn predicate accept all else values and making
> the insn emit an explicit blend between the loaded result
> and the else value
>
> (2) making the insn predicate only accept “undefined” (SCRATCH in
> r
> To me this looks like mis-applying of match.pd:6083?
>
> Applying pattern match.pd:6083, gimple-match-1.cc:45749
> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
> new phi replacement stmt
> iftmp.0_62 = iftmp.0_61 | _219;
>
> so originally it wasn't
>
> iftmp.0_61 = .MASK_LOAD (_260,
> FTR, my concern & suggestion was:
>
> I suppose the difficulty is that we might make:
>
> MASK_LOAD (mask, ptr, some-arbitrary-else-value)
>
> seem as cheap as:
>
> MASK_LOAD (mask, ptr, { 0, 0, ..., 0 })
>
> which definitely isn't the case for SVE (and I'm guessing also
> for
Hi,
in PR115336 we have the following
vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
vect_iftmp.44 = vect_patt_391 | { 1, ... };
.MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
which assumes that a maskload sets the maske
Hi,
this patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy. With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is spec
OK. Thanks for adding the test.
Regards
Robin
OK.
Regards
Robin
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 6a2eabbd854..29916adb62b 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -1517,8 +1517,7 @@ (define_insn_and_split "*vwsll_zext1_scalar_"
>"&& 1"
>[(const_int
> I did a test run without the subreg condition and it also appears to
> work when running on rv32gcv and rv64gcv newlib. Would it be better
> to remove the subreg?
Yep, if it works, i.e. all tests still pass then let's get rid of it.
Regards
Robin
> Hmm, ok. The bit that confused me most was:
>
> if (last_needs_comparison != -1)
> {
> end_sequence ();
> start_sequence ();
> ...
> }
>
> which implied that the second attempt was made conditionally.
> It seems like it's always used and is an inherent part of the
>
Hi Edwin,
this is OK but did you check if we can get rid of the subreg
condition now that we have gen_lowpart?
Regards
Robin
From: Robin Dapp
Date: Fri, 31 May 2024 14:51:17 +0200
Subject: [PATCH] RISC-V: Use descriptive errors instead of asserts.
In emit_insn we forestall possible ICEs in maybe_legitimize_operand by
asserting. This patch replaces the assertions by more descriptive
internal errors.
> I was looking at the code in more detail and just wanted to check.
> We have:
>
> int last_needs_comparison = -1;
>
> bool ok = noce_convert_multiple_sets_1
> (if_info, &need_no_cmov, &rewired_src, &targets, &temporaries,
> &unmodified_insns, &last_needs_comparison);
> if (!ok)
>
The attached v3 tracks the use of cond_earliest as you suggested
and adds its cost in default_noce_conversion_profitable_p.
Bootstrapped and regtested on x86 and p10, aarch64 still
running. Regtested on riscv64.
Regards
Robin
Before noce_find_if_block processes a block it sets up an if_info
st
Thanks, the patch is OK then.
Regards
Robin
Hi Pan,
in general LGTM. Would you mind adding the coremark-pro
testcase, which should be working now and was the original
reason for doing this?
I believe the following should do:
extern int wsize;
typedef unsigned short Posf;
#define NIL 0
void foo (Posf *p)
{
register unsigned n, m;
d
> Actually, as Richard mentioned in the PR, it would probably be better
> to use prepare_vec_mask instead. It should work in this context too
> and would avoid redundant double masking.
Attached is v2 that uses prepare_vec_mask.
Regtested on riscv64 and armv8.8-a+sve via qemu.
Bootstrap and regt
Just realized I missed the PR115382 tag in the patch...
Regards
Robin
Hi,
despite looking good on cfarm185 and Linaro's pre-commit CI,
gcc-15-638-g7ca35f2e430 now appears to have caused several
regressions on arm-eabi cortex-m55 as found by Linaro's CI:
https://linaro.atlassian.net/browse/GNU-1252
I'm assuming this target is not tested as regularly and thus
the fai
> But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?
>
> target = (a == b) ? x : y
> target = (a != b) ? y : x
>
> Are equivalent, even for IEEE IIRC.
Yes, that should be fine. My concern was not that we do a
canonicalization but that we might not do it for some of the
vecto
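The EQ/NE equivalence quoted above can be checked with a small scalar sketch (function names are made up; default IEEE semantics are assumed, i.e. no -ffast-math). For contrast it also shows that the analogous rewrite of an ordered comparison is not safe in the presence of NaN.

```c
#include <assert.h>
#include <math.h>

double
select_eq (double a, double b, double x, double y)
{
  return a == b ? x : y;
}

/* Inverted comparison with swapped arms.  */
double
select_ne_swapped (double a, double b, double x, double y)
{
  return a != b ? y : x;
}

double
select_lt (double a, double b, double x, double y)
{
  return a < b ? x : y;
}

/* "Inverse" of LT with swapped arms; differs from select_lt for NaN.  */
double
select_ge_swapped (double a, double b, double x, double y)
{
  return a >= b ? y : x;
}

void
check_select_forms (void)
{
  /* EQ vs. NE with swapped arms agree for NaN and for +-0.0.  */
  assert (select_eq (NAN, 1.0, 10.0, 20.0)
	  == select_ne_swapped (NAN, 1.0, 10.0, 20.0));
  assert (select_eq (0.0, -0.0, 10.0, 20.0)
	  == select_ne_swapped (0.0, -0.0, 10.0, 20.0));
  /* LT vs. GE with swapped arms disagree once a NaN is involved.  */
  assert (select_lt (NAN, 1.0, 10.0, 20.0)
	  != select_ge_swapped (NAN, 1.0, 10.0, 20.0));
}
```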
Hi,
currently we discard the cond-op mask when the loop is fully masked
which causes wrong code in
gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
when compiled with
-O3 -march=cascadelake --param vect-partial-vector-usage=2.
This patch ANDs both masks instead.
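A scalar sketch of the idea, assuming a hypothetical in-order floating-point sum: a lane contributes only if both the loop mask and the condition mask are set. The names and the array-based masks are illustrative and do not reflect the vectorizer's data structures.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* In-order sum of the lanes that are active in BOTH masks.  Skipped
   lanes leave the accumulator untouched, so a -0.0 start value
   survives when nothing is added.  */
double
fold_left_add_masked (double init, const double *a,
		      const uint8_t *cond_mask, const uint8_t *loop_mask,
		      size_t n)
{
  double acc = init;
  for (size_t i = 0; i < n; i++)
    if (cond_mask[i] & loop_mask[i])
      acc += a[i];
  return acc;
}

void
check_fold_left_add_masked (void)
{
  const double a[4] = { 1.0, 100.0, 2.0, 50.0 };
  const uint8_t cond[4] = { 1, 0, 1, 1 };
  const uint8_t loop[4] = { 1, 1, 1, 0 };
  assert (fold_left_add_masked (0.0, a, cond, loop, 4) == 3.0);
}
```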
Bootstrapped and regtested on
LGTM.
Let's keep in mind that min/max will save us two insns(?)
and a conditional move would save us one.
Regards
Robin
>> When you say other variants are still to be implemented
>> does that also include variants for zbb with min/max
>> or zicond?
>
> No, I mean some other forms like branch need the improvement from the
> middle end(aka widen_mul).
Ah, I see, thanks. Those can save one instruction and we want th
Hi Pan,
> + /* Step-2: lt = x < y */
> + riscv_emit_binary (LTU, pmode_lt, pmode_x, pmode_y);
> +
> + /* Step-3: lt = -lt */
> + riscv_emit_unary (NEG, pmode_lt, pmode_lt);
> +
> + /* Step-4: lt = ~lt */
> + riscv_emit_unary (NOT, pmode_lt, pmode_lt);
Can we replace step 3 and 4 with sub
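For reference, in two's complement arithmetic negating and then inverting a value equals subtracting one from it, i.e. ~(-lt) == lt - 1. Whether that is the replacement meant here is an assumption, since the message is cut off. A small sketch with made-up names:

```c
#include <assert.h>
#include <stdint.h>

/* Steps 3 and 4 as quoted: negate, then bitwise NOT.  */
uint64_t
neg_then_not (uint64_t lt)
{
  return ~(UINT64_C (0) - lt);
}

/* Single-operation equivalent.  */
uint64_t
minus_one (uint64_t lt)
{
  return lt - 1;
}

void
check_neg_not_identity (void)
{
  /* lt is the result of a comparison, so 0 and 1 are the cases of
     interest: 1 -> all zeros, 0 -> all ones.  */
  assert (neg_then_not (1) == 0 && minus_one (1) == 0);
  assert (neg_then_not (0) == UINT64_MAX && minus_one (0) == UINT64_MAX);
}
```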
> Is there any way we can avoid using pattern_cost here? Using it means
> that we can make use of targetm.insn_cost for the jump but circumvent
> it for the condition, giving a bit of a mixed metric.
>
> (I realise there are existing calls to pattern_cost in ifcvt.cc,
> but if possible I think we
Hi,
I wasn't aware that I needed to regenerate the opt urls when
adding an option. For this patch I did it now.
I suppose this doesn't require an extra OK but I'm going to
wait some minutes before applying still.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv.opt.urls: Regenerate.
Hi,
this silences some warnings when using check_GNU_style.
I didn't expect this to have any bootstrap or regtest impact
but I still ran it on x86 - no change.
Regards
Robin
contrib/ChangeLog:
* check_GNU_style_lib.py: Use raw strings for regexps.
---
contrib/check_GNU_style_lib.py |
Hi,
ifcvt likes to emit
(set
  (if_then_else
    (ge (reg 1) (reg 2))
    (reg 1)
    (reg 2)))
which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.
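In scalar C terms the shape above is simply a maximum, and the mirrored comparison gives a minimum. The sketch below uses made-up function names and only demonstrates the equivalence; it is not the test added by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* a >= b ? a : b is the signed maximum.  */
int64_t
select_ge (int64_t a, int64_t b)
{
  return a >= b ? a : b;
}

/* a <= b ? a : b is the signed minimum.  */
int64_t
select_le (int64_t a, int64_t b)
{
  return a <= b ? a : b;
}

void
check_minmax_forms (void)
{
  assert (select_ge (3, 7) == 7);
  assert (select_ge (-1, -5) == -1);
  assert (select_le (3, 7) == 3);
  assert (select_le (-1, -5) == -5);
}
```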
This depends on the generic ifcvt change.
Regtested on rv64gcv
Hi,
before noce_find_if_block processes a block it sets up an if_info
structure that holds the original costs. At that point the costs of
the then/else blocks have not been added so we only care about the
"if" cost.
The code originally used BRANCH_COST for that but was then changed
to COST_N_INS
On 5/28/24 23:55, Patrick O'Neill wrote:
> From: Greg McGary
>
> Add option -m(no-)autovec-segment to enable/disable autovectorizer
> from emitting vector segment load/store instructions. This is useful for
> performance experiments.
I think the question was raised before but does a vector tune
Hi,
this patch disables movmisalign by default and introduces
the -mno-vector-strict-align option to override it and re-enable
movmisalign. For now, generic-ooo is the only uarch that supports
misaligned vector access.
The patch also adds a check_effective_target_riscv_v_misalign_ok to
the tests
>> + /* By default, when -mno-vector-strict-align is not specified, do not
>> allow
>> + unaligned vector memory accesses except if -mtune's setting explicitly
>> + allows it. */
>> + riscv_vector_unaligned_access_p = rvv_vector_strict_align == 0 ||
>
> opts->x_rvv_vector_strict_align
Attached is v3 with the discussed changes. It now has
-mscalar-strict-align which is an alias to -mstrict-align as well
as -mvector-strict-align.
Testsuite shows no new regressions on rv64gcv_zvfh_zvbb.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv-opts.h (TARGET_VECTOR_MISALIGN_S
> * -mstrict-align: Both scalar and vector misaligned accesses are
> unsupported (-mrvv-allow-misalign doesn't matter). I'm not sure if
> there's hardware there, but given we have systems that don't support
> scalar misaligned accesses it seems reasonable to assume they'll also
> not support vecto
> We should have something in doc/invoke too, this one is going to be
> tricky for users. We'll also have to define how this interacts with
> the existing -mstrict-align.
Addressed the rest in the attached v2 which also fixes tests.
I'm really not sure about -mstrict-align. I would have hoped th
Hi,
this patch changes the default from always enabling movmisalign to
disabling it. It adds an option to override the default and adds
generic-ooo to the uarchs that support misaligned vector access.
It also adds a check_effective_target_riscv_v_misalign_ok to the
testsuite which enables or dis
The patch is OK from the riscv side. generic-ooo includes fast unaligned
access.
Regards
Robin
Hi Pan,
all in all LGTM. Just insignificant nits.
> +void
> +expand_vec_usadd (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
> +{
> + emit_vec_saddu (op_0, op_1, op_2, BINARY_OP, vec_mode);
> +}
> +
Do we really need this function? Or do you want it to be a dispatcher
for later? If it
Hi,
with the introduction of shuffle_series_patterns the explicit handler
code for a perm series is dead. This patch removes it and also adds
a function-level comment to shuffle_series_patterns.
Regtested on rv64gcv_zvfh_zvbb.
Regards
Robin
gcc/ChangeLog:
* config/riscv/riscv-v.cc (e
Hi,
this patch adds the zvbb vcpop, vclz and vctz to the autovec machinery
as well as tests for them. It also changes several non-VLS iterators
to V_VLS iterators for consistency.
Regtested on rv64gcv_zvfh_zvbb.
Regards
Robin
gcc/ChangeLog:
* config/riscv/autovec.md (ctz2): New expan