> However, when I built the toolchain with -march=rv64gcv_zfh_zvfh.
> Then link fail.
Hmm, is it about zvfh or why does linking fail?
Regards
Robin
LGTM.
Regards
Robin
Hi Juzhe,
I guess that's OK but what's the problem here? Are the default options
wrong so we need to overwrite them instead of adding some?
Regards
Robin
Hi Vineet,
I was thinking of two things while skimming the code:
- Couldn't we do this in the expanders directly? Or is the
subreg-promoted info gone until we reach that?
- Should some common-code part be more suited to handle that?
We already elide redundant sign-zero extensions for
ell as internal_fn_else_index.
Testsuite on riscv is unchanged, bootstrap and testsuite on power10 done,
aarch64 and x86 still running.
Regards
Robin
>From e11ac2b5889558c58ce711d8119ebcd78173ac6c Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v6] ifcv
cond_mask_len
that results from a simplification with all-one or all-zero mask we
could allow constant immediate vectors and expand them to simple
len moves in the backend.
Regards
Robin
>From bc72e9b2f3ee46508404ee7723ca78790fa96b6b Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Fri, 13 Oct 2
> Sorry for the slow review. TBH I was hoping someone else would pick
> it up, since (a) I'm not very familiar with this code, and (b) I don't
> really agree with the way that the current code works. I'm not sure the
> current dependency checking is safe, so I'm nervous about adding even
> more
Ugh, I didn't push yet because with a rebased trunk I am
seeing different behavior for some riscv testcases.
A reduction is not recognized because there is yet another
"double use" occurrence in check_reduction_path. I guess it's
reasonable to loosen the restriction for conditional operations
Hi Juzhe,
as discussed off-list this approach generally makes sense to me so
the patch LGTM once the vsetvl rework is upstream and settled.
Independently, we still need to understand why the more complex
broadcast pattern is not hoisted out of the loop.
Regards
Robin
Hi Lehua,
thanks for the extensive rework. I'm going to let Juzhe handle the review
since it's his pass and he knows it best. Delegated it to him in patchwork.
Regards
Robin
> Could you by the way add this mention this PR:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
> Add the test of this PR ?
Commented in that PR. This patch does not help there.
Regards
Robin
Hi,
even though there was no full conclusion yet I took the liberty of
just posting this as a patch in case of further discussion.
In PR/111794 we miss a vectorization because on riscv type precision and
mode precision differ for mask types. We can still vectorize when
allowing assignments with
I didn't push this yet because it would have introduced an UNRESOLVED that
my summary script didn't catch. Normally I go with just contrib/test_summary
but that only filters out FAIL and XPASS. I should really be using
compare_testsuite_log.py from riscv-gnu-toolchain/scripts.
It was caused by
> I saw you didn't extend VI -> V_VLSI. I guess will failed SLP on popcount.
Added VLS modes and your test in v2.
Testsuite looks unchanged on my side (vect, dg, rvv).
Regards
Robin
Subject: [PATCH v2] RISC-V: Add popcount fallback expander.
I didn't manage to get back to the generic
> I saw you didn't extend VI -> V_VLSI. I guess will failed SLP on
> popcount.
Hehe, right, I just copied and pasted the expander from my old
patch. Will adjust it and add the test.
Regards
Robin
Hi,
as I didn't manage to get back to the generic vectorizer fallback for
popcount in time (still the generic costing problem) I figured I'd
rather implement the popcount fallback in the riscv backend.
It uses the WWG algorithm from libgcc.
rvv.exp is unchanged, vect and dg.exp testsuites are
LGTM.
Regards
Robin
Thank you for the explanation.
So, assuming I added an IFN_VCOND_MASK and IFN_VCOND_MASK_LEN along
with the respective helper and expand functions, what would be the
way forward?
Generate an IFN_VCOND_MASK(_LEN) here instead of a VEC_COND_EXPR?
How would I make sure all of match.pd's vec_cond
>> I don't know much about valueisation either :) But it does feel
>> like we're working around the lack of a LEN form of COND_EXPR.
>> In other words, it seems odd that we can do:
>>
>> IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>
>> but we can't do:
>>
>> IFN_COND_LEN (mask, a, b, len,
> Natively, things seem fine, but for cross, I get failures on a few
> targets (hppa2.0-unknown-linux-gnu, hppa64-unknown-linux-gnu).
>
> With ./configure --host=x86_64-pc-linux-gnu
> --target=hppa2.0-unknown-linux-gnu --build=x86_64-pc-linux-gnu && make
> -j$(nproc), I get a bunch of stuff like:
> + if (live_range && flow_bb_inside_loop_p (loop, e->src))
> + {
Doesn't this match several cases more than before i.e set the range
start to zero fairly often? I mean if it works fine with me and
the code is easier to read.
Please split off the search for the
Hi Juzhe,
> +/* Get STORE value. */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> + if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +{
> + if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> + return gimple_call_arg (stmt, 3);
> + else
> +
not help
but also doesn't slow anything down. insn-emit.cc isn't very large
to begin with on s390.
Regards
Robin
>From 34d05113a4e3c7e83a4731020307e26c1144af69 Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Thu, 12 Oct 2023 11:23:26 +0200
Subject: [PATCH v2] genemit: Split insn-emit.cc into seve
Hi Juzhe,
this LGTM. I was first concerned whether we would want to
stop e.g. at LMUL = 1 and only continue with a specific flag but
actually this should be done via the costs. If an implementation
wants to penalize or incentivize some behavior it can always
adjust the costs which should be
> Why are the contents of this if statement wrong for COND_LEN?
> If the "else" value doesn't matter, then the masked form can use
> the "then" value for all elements. I would have expected the same
> thing to be true of COND_LEN.
Right, that one was overly pessimistic. Removed.
> But isn't
> Hmm why? The same callback you use to consume the listed arguments
> can be used to consume the list can it not? I may be wrong, but from
> what I remember the callback is called when main can't consume an
> argv value and it's allowed to eat all remaining input?
Ah, I see. If that's
> Testsuite is unchanged on all but x86 where, strangely, I saw several
> illegal instructions in the pch tests. Those were not reproducible
> in a second manual test suite run. I'm just running another full
> bootstrap and testsuite cycle with the latest trunk.
Follow-up on the pch tests. The
Hi,
on riscv insn-emit.cc has grown to over 1.2 mio lines of code and
compiling it takes considerable time.
Therefore, this patch adjust genemit to create ten files insn-emit-1.cc
to insn-emit-10.cc. In order to do so it first counts the number of
available patterns, calculates the number of
8e50859 Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Wed, 13 Sep 2023 22:19:35 +0200
Subject: [PATCH v4] ifcvt/vect: Emit COND_ADD for conditional scalar
reduction.
As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions. This makes it difficu
LGTM, thanks.
Regards
Robin
Hi Juzhe,
good that you noticed it now, I should have caught that
in the review back then...
One thing, though:
> + if (inner_offsize < GET_MODE_BITSIZE (GET_MODE (ptr)).to_constant ())
Shouldn't ptr always be Pmode i.e. the bitsize == XLEN?
Rest LGTM.
Regards
Robin
LGTM FWIW.
Regards
Robin
Hi Juzhe,
seems OK to me. We don't support most of the patterns directly
but as we can and want to vectorize them it makes sens to enable
the tests.
Regards
Robin
Thanks, for now this LGTM.
Regards
Robin
> Hmm, the function is called at transform time so this shouldn't help
> avoiding the ICE. I expected we refuse to vectorize _any_ reduction
> when sign dependent rounding is in effect? OTOH maybe sign-dependent
> rounding is OK but only when we use a unconditional fold-left
> (so a loop mask
> It'd be good to expand on this comment a bit. What kind of COND are you
> anticipating? A COND with the neutral op as the else value, so that the
> PLUS_EXPR (or whatever) can remain unconditional? If so, it would be
> good to sketch briefly how that happens, and why it's better than using
>
On 10/9/23 09:32, Andreas Schwab wrote:
> On Okt 09 2023, juzhe.zh...@rivai.ai wrote:
>
>> Turns out COND(_LEN)?_ADD can't work.
>
> It should work though. Tcl regexps are a superset of POSIX EREs.
>
The problem is that COND(_LEN)?_ADD matches two times against
COND_LEN_ADD and a
Hi Juzhe,
I think an extra param might be too intrusive. I would expect normal
hardware implementations to support unaligned accesses (but they might
be slow which should be covered by costs) and only rarely have hardware
that doesn't support it and raises exceptions.
Therefore I would suggest
> Maybe I should pretend RVV support vect_pack/vect_unpack and enable
> all the tests in target-supports.exp?
The problem is that vect_pack/unpack is an overloaded term in the
moment meaning "vector conversion" (promotion/demotion) or so. This
test does not require pack/unpack for successful
> So if you think you got everything correct the patch is OK as-is,
> I just wasn't sure - maybe the neutral_element change deserves
> a comment as to how MINUS_EXPR is handled.
Heh, I never think I got everything correct ;)
Added this now:
static bool
fold_left_reduction_fn (code_helper
> We might need a similar assert
>
> gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
> && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?
erm, obviously not that exact assert but more something like
if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
{
> ... here we probably get PLUS_EXPR for MINUS_EXPR above but IIRC
> for MINUS_EXPR the !as_initial case should return positive zero.
>
> Can you double-check?
You're referring to the canonicalization from a - CST to a + -CST so
that the neutral op would need to change with it? Argh, good
> Your suggested code seems work fine, let me run more test and send
> v2, I guess I just don’t know how to explain why it work in comment
> :p
If it's too convoluted maybe we should rather not use it :D
The idea is for
factor % (vlenb / potential_div) == 0
we're actually looking for the
Hi Tamar,
> The only comment I have is whether you actually need this helper
> function? It looks like all the uses of it are in cases you have, or
> will call conditional_internal_fn_code directly.
removed the cond_fn_p entirely in the attached v3.
Bootstrapped and regtested on x86_64, aarch64
> So I think Kenner's code is trying to prevent having a value in a
> SUBREG that is inconsistent with the SUBREG_PROMOTED* flag bits. But
> I think it's been unnecessary since Matz's rewrite in 2009.
I couldn't really tell what the rewrite does entirely so I tried creating
a case where we would
Ah, sorry, read your remark incorrectly. Will try again.
Regards
Robin
Hi Tamar,
> So in the
>
> if (slp_node)
> {
>
> Add something like:
>
> If (is_cond_op)
> {
> if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>"left fold reduction on SLP not supported.\n");
> return false;
>
>> I think the "max poly value" is the LMUL 1 mode coeffs[1]
>>
>> See int vlenb = BYTES_PER_RISCV_VECTOR.coeffs[1];
>>
>> So I think bump max_power to exact_log2 (64); is not enough.
>> since we adjust the LMUL 1 mode size according to TARGET_MIN_VLEN.
>>
>> I suspect the testcase you append in
I'm currently in the process of removing some unused @s.
This is OK.
Regards
Robin
> + gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB);
I forgot to add the other IFN_CONDs here before sending. So with
- gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB);
+ gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
+ || code ==
Ping^2.
I realize it's not very elegant as of now. If there's a better/shorter way
to solve this feel free to suggest :)
Regards
Robin
Hi Tamar,
> I can't approve but hope you don't mind the review,
Not at all, greatly appreciated.
I incorporated all your remarks apart from this:
> Isn't vec_opmask NULL for SLP? You probably need to read it from
> vec_defs for the COND_EXPR
Above that I gcc_assert (!slp_node) for the
Hi Richard,
cool, thanks. I just gave it a try with my test cases and it does what
it is supposed to do, at least if I disable the register pressure check :)
A cursory look over the test suite showed no major regressions and just
some overly specific tests.
My test case only works before split,
> Conceptually the rounding mode is just a property. The call, in
> effect, should demand a "normal" rounding mode and set the rounding
> mode to unknown if I understand how this is supposed to work. If my
> understanding is wrong, then maybe that's where we should start --
> with a good
LGTM.
Regards
Robin
Hi Juzhe,
with the middle-end changes that's a nice improvement. LGTM.
Regards
Robin
Hi Lehua,
I once had different comments for those but either I never pushed them
or they got buried in the process of refactoring. The explanatory
comment explaining vlmax is also in "nowhere land" below autovec_use_vlmax_p.
(it says vsetvli instead of vsetvl as well...) It would be useful
to
OK.
This is also the approach I took locally to fix a Fortran ICE
but forgot to send/push it.
Regards
Robin
Hi Lehua,
> V3 Change: Back to the original method.
Was there an original method even before the first patch?
Anyway, I prefer this v3 over the others even though the large
pattern is not exactly pretty :)
What about the VLS changes? Are they necessary for the patterns/tests?
I mean they are
Hi,
as described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions. This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD during ifcvt and
adjusting some vectorizer code to
> So, IMHO, a complicate pattern which combine initial 0 value + extension +
> reduction + vmerge may be more reasonable.
If that works I would also prefer that.
Regards
Robin
o leave
the decision to you, either one is OK.
Regards
Robin
>From 3be4cf4403a584d560c3923207a9c4da8dafee49 Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Wed, 20 Sep 2023 10:15:36 +0200
Subject: [PATCH] lehua
---
gcc/config/riscv/autovec-opt.md | 52 -
gcc/
Hi Lehua,
this LGTM.
Regards
Robin
Hi Lehua,
thanks for the explanation.
> My current method is still to keep the operand 2 of vcond_mask as a
> register, but the pattern of mov_vec_const_0 is simplified, so that
> the corresponding combine pattern can be more simple. That's the only
> reason I split the vcond_mask into three
Hi Patrick,
thanks for reporting. Before seeing your message here I already opened a PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488
Regards
Robin
Hi Lehua,
>> Would it hurt to allow any nonmemory operand here and just force the
>> "unsupported" constants into a register?
>
> Are you talking about why operand 2 doesn't use nonmemory_operand
> predicate? If so, I think this is because our vmerge.v[vxi]m insns
> only supports that operand 1
Hi Juzhe,
I'd agree that punting is reasonable for now, therefore LGTM.
Regards
Robin
Hi Lehua,
> +(define_expand "vcond_mask_"
> + [(set (match_operand:V_VLS 0 "register_operand")
> +(if_then_else:V_VLS
> + (match_operand: 3 "register_operand")
> + (match_operand:V_VLS 1 "nonmemory_operand")
> + (match_operand:V_VLS 2
LGTM.
Regards
Robin
Ping.
Regards
Robin
> I must be missing something. Doesn't insn 10 broadcast the immediate
> 0x2 to both elements of r142?!? What am I missing?
It is indeed a bit misleading. The difference is in the mask which
is not displayed in the short form. So we actually use a vec_dup
for a single-element move, essentially
> You mean this patch is ok?
I thought about it a bit more. From my point of view the patch is OK
for now in order to get the bug out of the way.
In the longer term I would really prefer a more "regular" solution
(i.e. via hard_regno_mode_ok) and related. I can take care of that
once I have a
Hi Thomas, Jakub,
is there anything we can do to assist from the riscv side in order to help
with this? I haven't really been involved with it but was wondering
what's missing. If I understand correctly Thomas has a major cleanup
operation in plan but might not get to it soon. The fix he
> I am thinking what we are doing is something like we are allowing
> scalar mode within the vector register, so...not sure should we try to
> implement that within the mov pattern?
>
> I guess we need some inputs from Jeff.
Sorry for the late response. I have also been thinking about this and
> Yes. We need the additional helper function since I will cal emit_insn
> (gen_vec_extract (mode, mode)
> in the following patch which fixes PR111391 ICE.
OK.
Regards
Robin
> -(define_expand "vec_extract"
> +(define_expand "@vec_extract"
Do we need the additional helper function? If not let's rather not
add them for build-time reasons. The rest is OK, no need for v2.
Regards
Robin
> Most (all?) of those are due to:
> f951: Warning: command-line option '-Wno-psabi' is valid for
> C/C++/D/LTO/ObjC/ObjC++ but not for Fortran
> so no real bug.
When pushing this, I'd take the liberty of enabling the recently merged vector
ABI so we don't require -Wno-psabi anymore. All
The current status (for rv64gcv) is:
=== gcc tests ===
Running target unix/-march=rv64gcv
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "optimized: basic block" 2
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2
The PR thing needs to be moved but I can commit it.
Regards
Robin
Maybe you want to add PR target/111337 to the changelog?
The rest LGTM.
Regards
Robin
LGTM. We should just keep in mind the restrictions discussed in the
other thread.
Regards
Robin
> This is first version of dynamic LMUL.
> I didn't test it with full GCC testsuite.
>
> My plan is to first pass all GCC testsuite (including vect.exp) with default
> LMUL = M1.
> Then enable dynamic LMUL to test it.
>
> Maybe we could tolerate this ICE issue for now. Then we can test it
>
> Is calculix big ?
It's 7 nested for loops IIRC and, when unrolling, can get pretty nasty.
I tested with -Ofast -funroll-loops. I think wrf is even larger, maybe I
can run a full comparison test tonight to have good coverage.
> Could you give me the testcase to reproduce it?
OK, I will try to
I did some benchmarks and, at least for calculix the differences are
miniscule. I'd say we can stick with the current approach and improve
as needed.
However, I noticed ICEs here:
+ gcc_assert (biggest_size >= mode_size);
and here:
+ mode = TYPE_MODE (TREE_TYPE (lhs));
when compiling
Hi Juzhe,
> +max_number_of_live_regs (const basic_block bb,
> + const hash_map _ranges,
> + unsigned int max_point, machine_mode biggest_mode,
> + int lmul)
> +{
> + unsigned int max_nregs = 0;
> + unsigned int i;
> + unsigned int
Hi,
as Juzhe noticed in gcc.dg/pr92301.c there was still something missing in
the last patch. The attached v2 makes sure we always have a COND_LEN operation
before returning true and initializes len and bias even if they are unused.
Bootstrapped and regtested on aarch64 and x86.
Regards
Robin
Hi Juzhe,
glad that we can use the dominator info directly. Could we move the
calculation of the info to the beginning (if it's not available)? That
makes it clearer that it's a prerequisite. Function comments look
good now.
Some general remarks kind of similar to v1:
- I would prefer a
Hi,
on riscv gcc.dg/pr70252.c ICEs at gimple-isel.cc:283. This is because
we created the gimple statement
mask__7.36_170 = VEC_COND_EXPR ;
during vrp2.
What happens is that, starting with
maskdest = (vec_cond mask1 1 0) >= (vec_cond mask2 1 0)
we fold to
maskdest = mask1 >= (vec_cond
Hi,
found in slp-reduc-7.c, this patch prevents optimizing e.g.
COND_LEN_ADD ({-1, ... }, a, 0, c, len, bias)
unconditionally into just "a".
Currently, we assume that COND_LEN operations can be optimized similarly
to COND operations. As the length is part of the mask (and usually not
Thanks for looking at it in detail.
> Yeah, I think this is potentially a blocker for propagating A into B
> when A is used elsewhere. Combine is able to combine A and B while
> keeping A in parallel with the result. I think either fwprop would
> need to try that too, or it would need to be
I have an almost identical patch locally that passed testing as well
but didn't get around to posting it yet. Therefore LGTM.
Regards
Robin
in
>From d3f87e4de7d7d05a2fcf8c948097b14eadf08c90 Mon Sep 17 00:00:00 2001
From: Robin Dapp
Date: Mon, 24 Jul 2023 16:25:38 +0200
Subject: [PATCH] gcse: Extract reg pressure handling into separate file.
This patch extracts the hoist-pressure handling from gcse and puts it
into a separate fi
LGTM.
Regards
Robin
OK.
Regards
Robin
Hi Lehua,
> May I ask if the compiler options "-march=rv64gcv_zvfh -mabi=lp64d"
> should be removed? Because we don't specify -march and -mabi when we
> run testcase (so but why we need to specify the -march and -mabi in
> this target check?), we run it with the default values. Assuming that
>
Hi Juzhe,
general remark upfront: Please add function-level comments for all
functions. This makes reading and reviewing much easier. I had to sweep
back and forth quite a bit.
> +
> +static int
> +get_last_live_range (const vec _ranges, tree var)
> +{
> + unsigned int ix;
> +
Hi Richard,
I did some testing with the attached v2 that does not restrict to UNARY
anymore. As feared ;) there is some more fallout that I'm detailing below.
On Power there is one guality fail (pr43051-1.c) that I would take
the liberty of ignoring for now.
On x86 there are four fails:
-
Hi Juzhe,
I think the general approach makes sense and it doesn't need to be perfect
from the beginning as we can always iterate on it. Before continuing with a
more detailed review (hopefully tomorrow) some high-level questions upfront.
It would help to document some of these choices so it's
> I imagine doing it in reverse postorder would still make sense.
>
> But my point was that, for the current fwprop limitation of substituting
> into exactly one use of a register, we can check whether that use is
> the *only* use of register.
>
> I.e. if we substitute:
>
> A: (set (reg R1)
> So I don't think I have a good feel for the advantages and disadvantages
> of doing this. Robin's analysis of the aarch64 changes was nice and
> detailed though. I think the one that worries me most is the addressing
> mode one. fwprop is probably the first chance we get to propagate adds
>
301 - 400 of 1012 matches
Mail list logo