https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
--- Comment #2 from Robin Dapp ---
Thanks, I was just about to open a PR.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #18 from Robin Dapp ---
A bit of a follow-up: I'm working on a patch for reassociation that can handle
the mentioned cases and some more, but it will still require a bit of time to
get everything regression-free and correct. What
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196
--- Comment #7 from Robin Dapp ---
I can barely build a compiler on gcc185 due to disk space. I'm going to set up
a cross toolchain (that I need for other purposes as well) in order to test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #10 from Robin Dapp ---
Yes, it helps. Of course get_gimple_for_ssa_name sits right below
get_rtx_for_ssa_name, which I stepped through several times while debugging
without realizing the connection, *g*.
But thanks! Good thing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #8 from Robin Dapp ---
Created attachment 58037
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58037&action=edit
Expand dump
Dump attached. Insn 209 is the problematic one.
The change from _911 to 1078 happens in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
Robin Dapp changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #5 from Robin Dapp ---
What happens is that code sinking does:
Sinking # VUSE <.MEM_1235>
vect__173.251_1238 = .MASK_LEN_LOAD (_911, 32B, { -1, -1, -1, -1 },
loop_len_1064, 0);
from bb 3 to bb 4
so we have
vect__173.251_1238 =
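For readers unfamiliar with the internal function being sunk, the semantics of .MASK_LEN_LOAD can be sketched in Python. This is a simplified model, not GCC's definition: the function name, lane count, and the `else_val` default are illustrative, and the real IFN leaves inactive lanes undefined rather than zeroing them.

```python
def mask_len_load(mem, base, mask, length, else_val=0):
    """Simplified model of .MASK_LEN_LOAD: lane i is loaded only if
    i < length and mask[i] is set; other lanes get else_val here
    (the real internal function leaves them undefined)."""
    return [mem[base + i] if (i < length and mask[i]) else else_val
            for i in range(len(mask))]

mem = list(range(100))
# All-ones mask, length 2: only the first two lanes are active.
print(mask_len_load(mem, 10, [1, 1, 1, 1], 2))  # [10, 11, 0, 0]
```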
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #4 from Robin Dapp ---
Ok, it looks like we do 5 iterations with the last one being length-masked to
length 2 and then in the "live extraction" phase use "iteration 6".
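The iteration shape described above can be sketched with a small Python model of a length-controlled loop. The element count 34 and vector factor 8 below are made-up numbers, chosen only to reproduce the "five iterations, last one length-masked to 2" pattern:

```python
def loop_lengths(n, vf):
    """Per-iteration active lengths of a length-controlled vector loop:
    full vectors until fewer than vf elements remain, then one short
    final iteration (a simplified SELECT_VL-style model)."""
    lens = []
    remaining = n
    while remaining > 0:
        length = min(vf, remaining)
        lens.append(length)
        remaining -= length
    return lens

# 34 elements, VF 8: five iterations, the last with length 2.
print(loop_lengths(34, 8))  # [8, 8, 8, 8, 2]
```

The reported failure then corresponds to the live-extraction phase indexing one entry past the end of this list ("iteration 6").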
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #3 from Robin Dapp ---
> probably -fwhole-program is enough, -flto not needed(?)
Yes, -fwhole-program is sufficient.
>
> # vectp_g.248_1401 = PHI
> ...
> _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]);
> ..
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #1 from Robin Dapp ---
Confirmed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114733
--- Comment #1 from Robin Dapp ---
Confirmed, also shows up here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #5 from Robin Dapp ---
Weird, I tried your exact qemu version and still can't reproduce the problem.
My results are always FFB5.
Binutils difference? Very unlikely. Could you post your QEMU_CPU settings
just to be sure?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686
--- Comment #3 from Robin Dapp ---
I think we have always maintained that this can definitely be a per-uarch
default but shouldn't be a generic default.
> I don't see any reason why this wouldn't be the case for the vast majority of
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668
--- Comment #2 from Robin Dapp ---
This, again, seems to be a problem with bit extraction from masks.
For some reason I didn't add the VLS modes to the corresponding vec_extract
patterns. With those in place the problem is gone because we go
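As a rough model of what such a vec_extract on a mask mode boils down to, here is a hedged Python sketch; the packed bit-vector representation of the mask is an assumption for illustration only:

```python
def mask_bit(mask_bits: int, i: int) -> int:
    """Extract lane i of a vector mask stored as a packed bit-vector,
    i.e. the operation a vec_extract on a mask mode performs."""
    return (mask_bits >> i) & 1

m = 0b1010  # lanes 1 and 3 are set
print([mask_bit(m, i) for i in range(4)])  # [0, 1, 0, 1]
```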
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #2 from Robin Dapp ---
Checked with the latest commit on a different machine but still cannot
reproduce the error; PR114668 I can reproduce. Maybe a copy-and-paste
problem?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #1 from Robin Dapp ---
Hmm, my local version is a bit older and seems to give the same result for both
-O2 and -O3. At least a good starting point for bisection then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247
--- Comment #6 from Robin Dapp ---
Testsuite looks unchanged on rv64gcv.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247
--- Comment #5 from Robin Dapp ---
This fixes the test case for me locally, thanks.
I can run the testsuite with it later if you'd like.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476
--- Comment #8 from Robin Dapp ---
I tried a few approaches (for the related bug without -fwrapv), then got busy
with other work. I'm going to have another look later this week.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515
Robin Dapp changed:
What|Removed |Added
CC||ewlu at rivosinc dot com,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485
--- Comment #4 from Robin Dapp ---
Yes, the vectorization looks ok. The extracted live values are not used
afterwards and therefore the whole vectorized loop is being thrown away.
Then we do one iteration of the epilogue loop, inverting the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476
--- Comment #5 from Robin Dapp ---
So the result is -9 instead of 9 (or vice versa), and this happens only with
vectorization. We only vectorize with -fwrapv.
From a first quick look, the following is what we have before vect:
(loop)
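For context, -fwrapv makes signed overflow wrap in two's complement instead of being undefined behaviour, which is what allows the vectorizer to reason about this loop at all. A minimal Python model of that guarantee (the INT_MIN example is illustrative, not taken from this PR):

```python
def wrap32(x: int) -> int:
    """Reduce x to 32-bit two's complement, i.e. the arithmetic that
    -fwrapv guarantees for signed int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

# With -fwrapv, negating INT_MIN wraps back to INT_MIN instead of
# being undefined, so sign-based reasoning can legitimately flip.
print(wrap32(-(-2**31)))  # -2147483648
```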
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #8 from Robin Dapp ---
No fallout on x86 or aarch64.
Of course using false instead of TYPE_SIGN (utype) is also possible and maybe
clearer?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #7 from Robin Dapp ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..f8f7ba0ccc1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #3 from Robin Dapp ---
-O3 -mavx2 -fno-vect-cost-model -fwrapv seems to be sufficient.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
Robin Dapp changed:
What|Removed |Added
Target|riscv*-*-* |x86_64-*-* riscv*-*-*
--- Comment #2 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #29 from Robin Dapp ---
Yes, that also appears to work here. There was no lto involved this time?
Now we need to figure out what's different with SPEC.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #27 from Robin Dapp ---
Can you try it with a simpler (non SPEC) test? Maybe there is still something
weird happening with SPEC's scripting.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #24 from Robin Dapp ---
I rebuilt GCC from scratch with your options but still have the same problem.
Could our sources differ? My SPEC version might not be the most recent but I'm
not aware that mcf changed at some point.
Just
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #22 from Robin Dapp ---
Still the same problem unfortunately.
I'm a bit out of ideas - maybe your compiler executables could help?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #20 from Robin Dapp ---
No change with -std=gnu99 unfortunately.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #18 from Robin Dapp ---
Hmm, doesn't help unfortunately. A full command line for me looks like:
x86_64-pc-linux-gnu-gcc -c -o pbeampp.o -DSPEC_CPU -DNDEBUG -DWANT_STDC_PROTO
-Ofast -march=znver4 -mtune=znver4 -flto=32 -g
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #16 from Robin Dapp ---
Thank you!
I'm having a problem with the data, though.
Compiling with -Ofast -march=znver4 -mtune=znver4 -flto -fprofile-use=/tmp.
Would you mind showing your exact final options for compilation of e.g.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #10 from Robin Dapp ---
(In reply to Sam James from comment #9)
> (In reply to Filip Kastl from comment #8)
> > I'd like to help but I'm afraid I cannot send you the SPEC binaries with PGO
> > applied since SPEC is licensed nor can
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #7 from Robin Dapp ---
I built executables with and without the commit (-Ofast -march=znver4 -flto).
There is no difference so it must really be something that happens with PGO.
I'd really need access to a zen4 box or the pgo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114202
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200
--- Comment #3 from Robin Dapp ---
*** Bug 114202 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196
Robin Dapp changed:
What|Removed |Added
See Also||https://gcc.gnu.org/bugzill
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200
--- Comment #1 from Robin Dapp ---
Took me a while to analyze this... needed more time than I'd like to admit to
make sense of the somewhat weird code created by fully unrolling and peeling.
I believe the problem is that we reload the output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #6 from Robin Dapp ---
Honestly, I don't know how to analyze/debug this without a zen4, in particular
as it only seems to happen with PGO. I tried locally but of course the
execution time doesn't change (same as with zen3 according
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109
--- Comment #4 from Robin Dapp ---
Yes, as mentioned, vectorization of the first loop is debatable.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109
--- Comment #2 from Robin Dapp ---
It is vectorized with a higher zvl, e.g. zvl512b, refer
https://godbolt.org/z/vbfjYn5Kd.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109
Bug ID: 114109
Summary: x264 satd vectorization vs LLVM
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114028
--- Comment #2 from Robin Dapp ---
This is a target issue. It looks like we try to construct a "superword"
sequence when the element size is already == Pmode. Testing a patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027
--- Comment #9 from Robin Dapp ---
Argh, I actually just did a gcc -O3 -march=native pr114027.c
-fno-vect-cost-model on cfarm188 with a recent-ish GCC but realized that I used
my slightly modified version and not the original test case.
long
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114027
Robin Dapp changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org
Last
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #4 from Robin Dapp ---
Judging by the graph it looks like it was slow before, then got faster and now
slower again. Is there some more info on why it got faster in the first place?
Did the patch reverse something or is it rather a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
--- Comment #1 from Robin Dapp ---
x86 (-march=native -O3 on an i7 12th gen) looks pretty similar:
.L3:
movq(%rdi), %rax
vmovups (%rax), %xmm1
vdivps %xmm0, %xmm1, %xmm1
vmovups %xmm1, (%rax)
addq
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113827
Bug ID: 113827
Summary: MrBayes benchmark redundant load
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #23 from Robin Dapp ---
> this is:
>
> _429 = mask_patt_205.47_276[i] ? vect_cst__262[i] : (vect_cst__262 <<
> {0,..})[i];
> vect_iftmp.55_287 = mask_patt_209.54_286[i] ? _429 [i] : vect_cst__262[i]
But isn't it rather
_429 =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #19 from Robin Dapp ---
What seems odd to me is that in fre5 we simplify
_429 = .COND_SHL (mask_patt_205.47_276, vect_cst__262, vect_cst__262, { 0,
... });
vect_prephitmp_129.51_282 = _429;
vect_iftmp.55_287 = VEC_COND_EXPR ;
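The elementwise semantics of the two statements above can be modeled in Python. Since the operand order of .COND_SHL is exactly what is being debated in this thread, the model below just picks one reading (mask, value, shift amount, else); the mask lanes are made up, while the value/shift both being vect_cst__262 and the { 0, ... } else operand follow the fre5 dump:

```python
def cond_shl(mask, a, b, else_vals):
    """One reading of .COND_SHL (mask, a, b, else): lanes where the
    mask is set compute a << b, other lanes take the else operand."""
    return [(x << s) if m else e
            for m, x, s, e in zip(mask, a, b, else_vals)]

def vec_cond(mask, then_vals, else_vals):
    """Elementwise model of VEC_COND_EXPR <mask, then, else>."""
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

cst = [1, 2, 3, 4]  # stands in for vect_cst__262
inner = cond_shl([1, 0, 1, 0], cst, cst, [0, 0, 0, 0])
print(inner)  # [2, 0, 24, 0]
```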
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #18 from Robin Dapp ---
Hehe no it doesn't make sense... I wrongly read a v2 as a v1. Please
disregard the last message.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #17 from Robin Dapp ---
Grasping for straws by blaming qemu ;)
At some point we do the vector shift
vsll.vv v1,v2,v2,v0.t
but the mask v0 is all zeros:
gdb:
b = {0 }
According to the mask-undisturbed policy set before
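The mask-undisturbed behaviour being checked here can be modeled in Python. This is a deliberately simplified sketch of the RVV semantics (vl handling and element types are idealized), but it captures why an all-zero v0 means the destination must come out unchanged:

```python
def vsll_vv_masked(vd, vs2, vs1, v0, vl):
    """Model of a masked vsll.vv under the mask-undisturbed policy:
    active lanes (i < vl and v0[i] set) compute vs2 << vs1, inactive
    lanes keep vd's old contents."""
    return [(vs2[i] << vs1[i]) if (i < vl and v0[i]) else vd[i]
            for i in range(len(vd))]

old = [9, 9, 9, 9]
# All-zero mask, as observed in gdb: the destination is untouched.
print(vsll_vv_masked(old, [1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 0, 0], 4))
```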
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #16 from Robin Dapp ---
Disabling vec_extract makes us operate on non-partial vectors, though, so
there are a lot of differences in codegen. I'm going to have a look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #9 from Robin Dapp ---
(In reply to rguent...@suse.de from comment #6)
> t.c:47:21: missed: the size of the group of accesses is not a power of 2
> or not equal to 3
> t.c:47:21: missed: not falling back to elementwise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #10 from Robin Dapp ---
The compile farm machine I'm using doesn't have SVE.
Compiling with -march=armv8-a -O3 pr113607.c -fno-vect-cost-model and running
it returns 0 (i.e. ok).
pr113607.c:35:5: note: vectorized 3 loops in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #7 from Robin Dapp ---
Yep, that one fails for me now, thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113607
--- Comment #4 from Robin Dapp ---
I cannot reproduce it either, tried with -ftree-vectorize as well as
-fno-vect-cost-model.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575
--- Comment #14 from Robin Dapp ---
Ok, running tests with the adjusted version and going to post a patch
afterwards.
However, during a recent run compiling insn-recog took 2G and insn-emit-7 as
well as insn-emit-10 required > 1.5G each.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575
--- Comment #12 from Robin Dapp ---
Created attachment 57209
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57209&action=edit
Tentative
I tested the attached "fix". On my machine with 13.2 host compiler it reduced
the build time for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #2 from Robin Dapp ---
> It's interesting, for Clang only RISC-V can vectorize it.
The full loop can be vectorized on clang x86 as well when I remove the first
conditional (which is not in the snippet I posted above). So that's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
Bug ID: 113583
Summary: Main loop in 519.lbm not vectorized.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575
--- Comment #7 from Robin Dapp ---
Ok, I'm going to check.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113575
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113570
--- Comment #2 from Robin Dapp ---
I'm pretty certain this is "works as intended" and -Ofast causes the precision
to be different from what -O3 gives (and dependent on the target). See also:
It has been reported that with gfortran -Ofast
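A tiny Python illustration of why relaxed floating-point rules change results: summing the same values in a different association order gives a different rounded answer. The values below are made up; -Ofast permits GCC to reassociate like the second expression, -O3 does not.

```python
# Left-to-right evaluation loses the small addends against 1e16,
# while the reassociated grouping cancels the large terms first.
xs = [1e16, 1.0, -1e16, 1.0]
left_to_right = ((xs[0] + xs[1]) + xs[2]) + xs[3]
reassociated = (xs[0] + xs[2]) + (xs[1] + xs[3])
print(left_to_right, reassociated)  # 1.0 2.0
```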
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113558
--- Comment #2 from Robin Dapp ---
Created attachment 57195
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57195&action=edit
Tentative patch
Ah, it looks like nothing is being vectorized at all and the second check just
happened to match as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087
--- Comment #38 from Robin Dapp ---
deepsjeng also looks ok here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087
--- Comment #37 from Robin Dapp ---
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113206#c9
> Using 4a0a8dc1b88408222b88e10278017189f6144602, the spec run failed on:
> zvl128b (All runtime fails):
> 527.cam4 (Runtime)
> 531.deepsjeng (Runtime)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113495
--- Comment #27 from Robin Dapp ---
Following up on this:
I'm seeing the same thing Patrick does. We create a lot of large non-sparse
sbitmaps that amount to around 33G in total.
I did local experiments replacing all sbitmaps that are not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113474
--- Comment #1 from Robin Dapp ---
Good catch. Looks like the ifn expander always forces into a register. That's
probably necessary on all targets except riscv.
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247
--- Comment #9 from Robin Dapp ---
I also noticed this (likely unwanted) vector snippet and wondered where it is
being created. First I thought it's a vec_extract but doesn't look like it.
I'm going to check why we create this.
Pan, the test
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #22 from Robin Dapp ---
Yes, going to the thread soon.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249
--- Comment #4 from Robin Dapp ---
> One of the reasons I've been testing things with generic-ooo is because
> generic-ooo had initial vector pipelines defined. For cleaning up the
> scheduler, I copied over the generic-ooo pipelines into
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247
--- Comment #4 from Robin Dapp ---
The other option is to assert that all tune models have at least a vector cost
model rather than NULL... But not falling back to the builtin costs still
makes sense.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247
--- Comment #3 from Robin Dapp ---
Yes, sure and I gave a bit of detail why the values chosen there (same as
aarch64) make sense to me.
Using this generic vector cost model by default without adjusting the latencies
is possible. I would be OK
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113247
--- Comment #1 from Robin Dapp ---
Hmm, so I tried reproducing this and without a vector cost model we indeed
vectorize. My qemu dynamic instruction count results are not as abysmal as
yours but still bad enough (20-30% increase in dynamic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281
--- Comment #2 from Robin Dapp ---
Confirmed. Funny, we shouldn't vectorize that but really optimize to "return
0". Costing might be questionable but we also haven't optimized away the loop
when comparing costs.
Disregarding that, of course
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113249
--- Comment #1 from Robin Dapp ---
Yes, several (most?) of those are expected because the tests rely on the
default latency model. One option is to hard code the tune in those tests.
On the other hand the dump tests checking for a more or less
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #16 from Robin Dapp ---
I'd hope it was not fixed by this but just latent because we chose a VLS-mode
vectorization instead. Hopefully we're better off with the fix than without :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #4 from Robin Dapp ---
Richard has posted it and asked for reviews. I have tested it and we have
several testsuite regressions with it but no severe ones. Most or all of them
are dump fails because we combine into vx variants that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #2 from Robin Dapp ---
Yes, that's right.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
--- Comment #1 from Robin Dapp ---
What actually gets in the way of vec_extract here is changing to a "better"
vector mode (which is RVVMF4QI here). If we tried to extract from the mask
directly, everything would just work.
I have a patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
Bug ID: 112999
Summary: riscv: Infinite loop with mask extraction
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #8 from Robin Dapp ---
Yes, can confirm that this helps.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #5 from Robin Dapp ---
Yes that's what I just tried. No infinite loop anymore then. But that's not a
new simplification and looks reasonable so there must be something special for
our backend.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #3 from Robin Dapp ---
In match.pd we do something like this:
;; Function e (e, funcdef_no=0, decl_uid=2751, cgraph_uid=1, symbol_order=4)
Pass statistics of "forwprop":
Matching expression match.pd:2771,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #2 from Robin Dapp ---
It doesn't look like the same issue to me. The other bug is related to TImode
handling in combination with mask registers. I will also have a look at this
one.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #15 from Robin Dapp ---
I think we need to make sure that we're not writing out of bounds. In that
case anything might happen and if we just don't happen to overwrite this
variable we might hit another one but the test can still
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #10 from Robin Dapp ---
I just realized that I forgot to post the comparison recently. With the patch
now upstream I don't see any differences for zvl128b and different vlens
anymore. What I haven't fully tested yet is zvl256b or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #13 from Robin Dapp ---
I just built from the most recent commit and it still fails for me.
Could there be a difference in qemu? I'm on qemu-riscv64 version 8.1.91 but
yours is even newer so that might not explain it.
You could
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #9 from Robin Dapp ---
In the good version the length is 32 here because directly before the vsetvl we
have:
li a4,32
That seems to get lost somehow.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #7 from Robin Dapp ---
Here
0x105c6 vse8.v v8,(a5)
is where we overwrite m. The vl is 128 but the preceding vsetvl gets a4 =
46912504507016 as AVL which seems already broken.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #6 from Robin Dapp ---
This seems to be gone when simple vsetvl (instead of lazy) is used or with
-fno-schedule-insns which might indicate a vsetvl pass problem.
We might have a few more of those. Maybe it would make sense to run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #8 from Robin Dapp ---
With Juzhe's latest fix that disables VLS modes >= 128 bit for zvl128b x264
runs without issues here and some of the additional execution failures are
gone.
Will post the current comparison later.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112872
--- Comment #2 from Robin Dapp ---
Thanks. Yes that's similar and also looks fixed by the introduction of the
vec_init expander. Added this test case to the patch and will push it soon.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #7 from Robin Dapp ---
Ah, forgot three tests:
FAIL: gcc.dg/vect/bb-slp-cond-1.c execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
On
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #6 from Robin Dapp ---
I indeed see more failures with _zvl128b, vlen=256 (than with _zvl128b,
vlen=128):
FAIL: gcc.dg/vect/pr66251.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr66251.c execution test
FAIL: