https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116571
--- Comment #6 from Andrew Stubbs ---
(In reply to Richard Biener from comment #5)
> (In reply to Thomas Schwinge from comment #4)
> > The GCN target FAILs that I originally had reported here:
> >
> > > [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116104
Andrew Stubbs changed:
What|Removed |Added
Resolution|FIXED |---
Status|RESOLVED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116103
--- Comment #8 from Andrew Stubbs ---
(In reply to Thomas Schwinge from comment #4)
> (In reply to Richard Biener from comment #2)
> > if (VECTOR_BOOLEAN_TYPE_P (type)
> > && SCALAR_INT_MODE_P (TYPE_MODE (type)))
> > return true;
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116104
--- Comment #4 from Andrew Stubbs ---
The problem insn is this:
(insn 31 30 32 2 (set (reg:V2SI 711)
(ashift:V2SI (reg:V2SI 161 v1)
(const_vector:V2SI [
(const_int 3 [0x3]) repeated x2
]))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116104
--- Comment #3 from Andrew Stubbs ---
(In reply to Jeffrey A. Law from comment #1)
> So, how am I supposed to reproduce this? I don't have an assembler/binutils
> for amdgcn and thus libgcc won't configure. Thus I can't extract a testcase.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #18 from Andrew Stubbs ---
That should fix the broken validation check. All V32 permutations should work
now on RDNA GPUs, I think. V16 and smaller were already working fine.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #16 from Andrew Stubbs ---
On 26/06/2024 14:41, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
>
> --- Comment #15 from rguenther at suse dot de ---
>>> Btw, the above looks quite odd for nelt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #14 from Andrew Stubbs ---
On 26/06/2024 13:34, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
>
> --- Comment #13 from Richard Biener ---
> (In reply to Richard Biener from comment #12)
>>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #10 from Andrew Stubbs ---
On 26/06/2024 12:05, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
>
> --- Comment #8 from Richard Biener ---
> (In reply to Richard Biener from comment #7)
>> I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
--- Comment #3 from Andrew Stubbs ---
(In reply to Richard Biener from comment #2)
> If you force GCN to use fixed length vectors (how?), does it work? How's
> it behaving on aarch64 with SVE? (the CI was happy, but maybe doesn't
> enable SVE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115631
--- Comment #1 from Andrew Stubbs ---
It was writing 0 to s12 (scalar register) and then moving the zero to lane zero
of v0 (vector register).
Now it's writing the 0 directly to v0, of which all but lane zero is masked.
These should be identic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
--- Comment #11 from Andrew Stubbs ---
(In reply to rguent...@suse.de from comment #10)
> On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
> >
> > --- Comme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
--- Comment #9 from Andrew Stubbs ---
(In reply to Richard Biener from comment #6)
> The best strategy for GCN would be to gather V4QImode aka SImode into the
> V64QImode (or V16SImode) vector. For pix2 we have a gap of 28 elements,
> doing co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114717
--- Comment #3 from Andrew Stubbs ---
Can this be filtered (safely) in mkoffload? That tool is
offload-target-specific, so no problem with "if offload target were to support
it".
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114302
--- Comment #4 from Andrew Stubbs ---
Yes, that's what the simd-math-3* tests do.
The simd-math-5* tests are explicitly supposed to be doing this in the context
of the autovectorizer.
If these tests are being compiled as (newly) intended then
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114302
--- Comment #2 from Andrew Stubbs ---
The execution test checks that each of the libgcc routines work correctly, and
the scan assembler tests make sure that we're getting coverage of all of them.
In this case, the failure indicates that we're n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085
--- Comment #8 from Andrew Stubbs ---
(In reply to seurer from comment #7)
> On the BE machine:
>
> seurer@nilram:~/gcc/git/build/gcc-test$ ulimit -a
> real-time non-blocking time (microseconds, -R) unlimited
> ...
> max locked memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085
--- Comment #6 from Andrew Stubbs ---
(In reply to seurer from comment #5)
> I should note that pinned-2 also fails on powerpc64 LE.
>
> make -k check-target-libgomp RUNTESTFLAGS="c.exp=libgomp.c/alloc-pinned-*"
> FAIL: libgomp.c/alloc-pinned-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113615
--- Comment #3 from Andrew Stubbs ---
I did see these, but I hadn't had time to chase them up.
The proposed patch is exactly the sort of solution I was expecting to find,
short term. Have you confirmed that it fixes all the cases?
A proper sol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113199
--- Comment #5 from Andrew Stubbs ---
I can confirm that I can now build the amdgcn toolchain once more. :-)
Thanks.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163
Andrew Stubbs changed:
What|Removed |Added
CC||ams at gcc dot gnu.org
--- Comment #11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085
--- Comment #4 from Andrew Stubbs ---
It's going to be difficult to make this test work when only one page of locked
memory is available. :-(
I will look at making it "unsupported".
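
A minimal sketch of the kind of check that could decide whether a pinned-memory test should be reported as "unsupported", assuming a POSIX system. It is not the actual libgomp testsuite code: the helper names locked_memory_sufficient and can_lock_pages are hypothetical. Only getrlimit, RLIMIT_MEMLOCK and sysconf are real interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: true if a soft RLIMIT_MEMLOCK value allows
   locking at least `needed` bytes.  */
static bool locked_memory_sufficient(rlim_t soft_limit, size_t needed)
{
    if (soft_limit == RLIM_INFINITY)
        return true;
    return soft_limit >= (rlim_t)needed;
}

/* Hypothetical helper: returns 1 if `pages` pages fit within the current
   locked-memory limit, 0 if they do not, and -1 if the limit or the page
   size cannot be queried.  */
static int can_lock_pages(size_t pages)
{
    struct rlimit lim;
    long page_size = sysconf(_SC_PAGESIZE);

    if (page_size <= 0 || getrlimit(RLIMIT_MEMLOCK, &lim) != 0)
        return -1;
    return locked_memory_sufficient(lim.rlim_cur, pages * (size_t)page_size);
}
```

A test that needs more than one locked page could call can_lock_pages with its requirement up front and skip itself when the result is not 1, rather than failing later for a reason unrelated to the code under test.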
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113085
--- Comment #1 from Andrew Stubbs ---
That is a typo.
I don't want to make it pass on machines that have insufficient memory
configured because it will mask the case where it fails for another reason.
However, the testcase was originally suppo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113022
--- Comment #1 from Andrew Stubbs ---
This is what I get for trying to get this done before vacation. :(
Yes, there's probably something in mkoffload that has to match the default
change from -mxnack=any to -mxnack=off on the older ISAs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112937
--- Comment #2 from Andrew Stubbs ---
Flat addressing *should* be the safe option that always works (although using
"global" address space permits slightly more efficient offset options).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112481
Andrew Stubbs changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112481
--- Comment #7 from Andrew Stubbs ---
Simply changing to OPTAB_WIDEN solves the ICE, but I don't know if it does so
in a sensible way for RISC-V.
@@ -7489,7 +7489,7 @@ store_constructor (tree exp, rtx target, int cleared,
poly_int64 size,
||2023-11-13
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
--- Comment #4 from Andrew Stubbs ---
It fails because optab_handler fails to find an instruction for "and_optab" in
SImode.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112308
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
|RESOLVED
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
--- Comment #2 from Andrew Stubbs ---
This is now fixed.
||2023-11-09
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112088
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
|1
Last reconfirmed||2023-10-27
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
--- Comment #1 from Andrew Stubbs ---
I'm testing a fix for this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110313
--- Comment #5 from Andrew Stubbs ---
One thing that is unusual about the GCN stack pointer is that it's actually two
registers. Could this be breaking some cprop assumptions?
GCN can't fit an address in one (SImode) register so all (DImode) po
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110313
--- Comment #3 from Andrew Stubbs ---
It's curious that this affects the Fiji target only, and not the newer targets
at all.
There are some additional register options for multiply instructions, some
differences to atomics, but mostly the diffe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110313
--- Comment #1 from Andrew Stubbs ---
This ICE also affects the following standalone test failures (raw amdgcn, no
offloading):
gfortran.dg/assumed_rank_21.f90
gfortran.dg/finalize_38.f90
gfortran.dg/finalize_38a.f90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108898
--- Comment #4 from Andrew Stubbs ---
I did not know there was a way to do that! I'll add this to my to-do list.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108898
--- Comment #1 from Andrew Stubbs ---
I tested it on i686-pc-linux-gnu before I posted the patch, and it was working
then. Can you be more specific about what configuration you were testing, please?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107510
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89863
Bug 89863 depends on bug 107510, which changed state.
Bug 107510 Summary: gcc/config/gcn/gcn.cc:4930:9: style: Same expression on
both sides of '||'. [duplicateExpression]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107510
What|R
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107510
Andrew Stubbs changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096
--- Comment #4 from Andrew Stubbs ---
I don't understand rgroups, but I can say that GCN masks are very simply
one-bit-one-lane. There are always 64 lanes, regardless of the type, so V64QI
mode has fewer bytes and bits than V64DImode (when writt
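
The one-bit-one-lane layout described above can be sketched in plain C as follows. This is an illustration only, not GCN back-end code: the names GCN_LANES, lane_active, active_lanes and first_n_lanes are hypothetical, and the mapping of lane i to bit i is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 64 lanes, one mask bit per lane,
   independent of the element type.  */
#define GCN_LANES 64

/* True if bit `lane` of the mask is set (lane i maps to bit i here).  */
static bool lane_active(uint64_t mask, unsigned lane)
{
    return lane < GCN_LANES && ((mask >> lane) & 1u) != 0;
}

/* Count the enabled lanes by clearing the lowest set bit repeatedly.  */
static unsigned active_lanes(uint64_t mask)
{
    unsigned count = 0;

    while (mask != 0) {
        mask &= mask - 1;
        count++;
    }
    return count;
}

/* Mask that enables the first `n` lanes, saturating at all 64.  */
static uint64_t first_n_lanes(unsigned n)
{
    return n >= GCN_LANES ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```

Under these assumptions the mask is the same 64-bit value whether the vector holds bytes or 64-bit elements; only the amount of data behind each lane changes.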
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107088
--- Comment #9 from Andrew Stubbs ---
I can confirm that the patch fixes the amdgcn build.
-*-*
CC||ams at gcc dot gnu.org
--- Comment #7 from Andrew Stubbs ---
I get the same failure on amdgcn building newlib/libm/math/kf_rem_pio2.c
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ams at gcc dot gnu.org
CC: rguenther at suse dot de
Target Milestone: ---
Target: amdgcn-amdhsa
Commit 8f4d9c1deda "amdgcn: 64-bit not" exposed an ICE in tree-vect_stmts.cc
when
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105873
--- Comment #4 from Andrew Stubbs ---
I think unused threads should be given a no-op function to run, not a null
pointer. The GCN implementation cannot tell the difference between a null
pointer and an unset pointer (which is what happens when t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105246
--- Comment #2 from Andrew Stubbs ---
When we first coded this we only had the GCN3 ISA manual, which says nothing
about the accuracy.
Now I look in the Vega manual (GCN5) I see:
Square root with perhaps not the accuracy you were hoping for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100181
--- Comment #13 from Andrew Stubbs ---
I've updated the LLVM version documentation at
https://gcc.gnu.org/wiki/Offloading#For_AMD_GCN:
It's LLVM 9 or 13.0.1 now (nothing in between), and will be 13.0.1+ for the
next release (dropping LLVM 9 bec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104026
Andrew Stubbs changed:
What|Removed |Added
CC||ams at gcc dot gnu.org
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103396
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
|UNCONFIRMED |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
Ever confirmed|0 |1
--- Comment #4 from Andrew Stubbs ---
I think I have a fix for this. It happens when the link register has to be
saved because it is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103201
--- Comment #3 from Andrew Stubbs ---
I did some preliminary testing on your patch: the libgomp.c/target-teams-1.c
testcase runs fine on amdgcn. I presume that that covers most of the existing
features of those runtime calls?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102544
--- Comment #8 from Andrew Stubbs ---
Did you get the C version to return anything other than "-1"? (The expected
result is "2".)
I'm still trying to determine if the device is compatible, but the mapping
problem looks like a different issue.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102544
--- Comment #5 from Andrew Stubbs ---
Sorry, I should have said to compile with -fopenacc.
If you did do that, please post the GCN_DEBUG output.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102544
--- Comment #3 from Andrew Stubbs ---
That output shows that we have the correct libgomp and rocm is installed and
working. Libgomp initialized the GCN plugin, but did not attempt to initialize
the device (the next message in the output should h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102544
--- Comment #1 from Andrew Stubbs ---
Please set "export GCN_DEBUG=1", try it again, and post the output.
||2021-09-09
Ever confirmed|0 |1
Assignee|unassigned at gcc dot gnu.org |ams at gcc dot gnu.org
--- Comment #1 from Andrew Stubbs ---
In addition to changing the amdgcn_target syntax in LLVM 13, the LLVM GCN guys
have also renamed the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101544
--- Comment #5 from Andrew Stubbs ---
[Note: all of my comments refer to the amdgcn case. nvptx has somewhat
different support in this area.]
(In reply to Jonathan Wakely from comment #4)
> But it's a waste of space in the .so to build lots of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100208
Andrew Stubbs changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101544
--- Comment #3 from Andrew Stubbs ---
The standalone amdgcn configuration does not support C++. There are a number of
technical reasons why it doesn't Just Work, but basically it comes down to
no-one ever working on it. Our customers were primar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101484
Andrew Stubbs changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97827
Andrew Stubbs changed:
What|Removed |Added
CC||xw111luoye at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95023
Andrew Stubbs changed:
What|Removed |Added
CC||ams at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100418
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100418
--- Comment #13 from Andrew Stubbs ---
I found a lot more ICEs when testing my patch. They look to be unrelated
(TImode comes back to haunt us), but it makes it hard to be sure.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100418
--- Comment #9 from Andrew Stubbs ---
I found a couple of other places to put force_operand and the full case works
now.
Running more tests
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100418
Andrew Stubbs changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100418
--- Comment #4 from Andrew Stubbs ---
Alexandre's patch has this:
emit_move_insn (rem, plus_constant (ptr_mode, rem, -blksize));
Is that generally a valid thing to do? It seems like other places do similar
things...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100208
--- Comment #1 from Andrew Stubbs ---
LLVM changed the default parameters, so we either have to change the
expectations in the ".amdgcn_target" string (which is basically an assert), or
set the attributes we want explicitly on the assembler comm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521
--- Comment #22 from Andrew Stubbs ---
(In reply to Andrew Stubbs from comment #21)
> (In reply to Richard Biener from comment #19)
> > GCN also uses MODE_INT for the mask mode and thus may be similarly affected.
> > Andrew - are the bits in the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97521
--- Comment #21 from Andrew Stubbs ---
(In reply to Richard Biener from comment #19)
> GCN also uses MODE_INT for the mask mode and thus may be similarly affected.
> Andrew - are the bits in the mask dense? Thus for a V4SImode compare
> would th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84958
--- Comment #6 from Andrew Stubbs ---
(In reply to Tom de Vries from comment #5)
> I've removed the xfail for nvptx.
>
> The only remaining xfail is for gcn. Is that one still necessary?
The test still fails for gcn.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97332
Andrew Stubbs changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96306
--- Comment #8 from Andrew Stubbs ---
I'm loath to enable TImode if it's going to ICE all over the place, and I can't
just drop everything else and implement working TImode unless there's an easy
solution. It's always been on the nice-to-have lis
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95730
--- Comment #4 from Andrew Stubbs ---
In fact default_scalar_mode_supported_p does return *false* for TImode (because
LONG_LONG_TYPE_SIZE == 64, and BITS_PER_WORD == 32).
Therefore int128_t does not exist, as far as users are concerned. I'm not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96306
--- Comment #5 from Andrew Stubbs ---
GCC will automatically generate libgcc calls for types up to 2*BITS_PER_WORD,
but no further. Since BITS_PER_WORD is 32 on GCN this means no automatic TImode
support for anything that would go that route (suc
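
The size limit stated above reduces to simple arithmetic, sketched here in C. This is not GCC source: GCN_BITS_PER_WORD and libgcc_call_automatic are hypothetical names used only to restate the 2*BITS_PER_WORD rule from the comment.

```c
#include <stdbool.h>

/* Value taken from the comment above: BITS_PER_WORD is 32 on GCN.  */
#define GCN_BITS_PER_WORD 32u

/* Hypothetical predicate restating the rule: libgcc calls are generated
   automatically only for types up to twice the word size.  */
static bool libgcc_call_automatic(unsigned mode_bits, unsigned bits_per_word)
{
    return mode_bits <= 2u * bits_per_word;
}
```

With a 32-bit word, a 64-bit DImode operation falls inside the limit while a 128-bit TImode operation does not, which matches the lack of automatic TImode support described in the comment.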
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96306
--- Comment #3 from Andrew Stubbs ---
TImode was added for use by a few instructions that take two 64-bit values in
consecutive registers. It's also useful for the SLP fake vectorization stuff.
It wasn't intended for use with user types; I proba
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95864
--- Comment #1 from Andrew Stubbs ---
I'm aware of these issues.
I fixed all the test failures that were definitely bugs in the HSACOv3
implementation, and the ones that remain appear to be either latent bugs
uncovered by the new driver configur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95730
--- Comment #3 from Andrew Stubbs ---
The GCN port does not define a scalar_mode_supported, and I think the default
definition is allowing TImode (as long long int). As I said, the SLP
fake-vector load/store use it fine as a substitute for V4SI o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95730
--- Comment #1 from Andrew Stubbs ---
GCN uses TImode for a few special purposes, but lacks real TImode support.
(Basically, it allows TImode loads and stores for the SLP fake vectorization,
and there's one instruction that needs two DImode valu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93488
Andrew Stubbs changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94725
Andrew Stubbs changed:
What|Removed |Added
CC||ams at gcc dot gnu.org
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629
--- Comment #23 from Andrew Stubbs ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Andrew Stubbs from comment #11)
> > (In reply to Jakub Jelinek from comment #10)
> > > or if instead we should drop the "status = " for the cases w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94282
--- Comment #6 from Andrew Stubbs ---
I think we've decided to go with Thomas's approach.
Thomas, please go ahead and commit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94278
--- Comment #4 from Andrew Stubbs ---
Almost all the tests listed in pr81430 pass for me (and the exception I found
is a link error).
I don't understand what's happening with your build, but from my point of view
the patch fixes an issue that do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94248
--- Comment #7 from Andrew Stubbs ---
I'd rather remove the whole if branch, but given you've tested this already
then it's probably the best short term fix. Please go ahead.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94278
--- Comment #2 from Andrew Stubbs ---
Well, it works for me:
PASS: libgomp.c/examples-4/async_target-2.c (test for excess errors)
PASS: libgomp.c/examples-4/async_target-2.c execution test
That's with an unmodified LLVM 9 we built ourselves.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94248
--- Comment #5 from Andrew Stubbs ---
(In reply to Thomas Schwinge from comment #4)
> (In reply to Andrew Stubbs from comment #3)
> > Actually, I think that recent changes to the register alignment mean that
> > this can't happen any more, so the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94629
--- Comment #11 from Andrew Stubbs ---
(In reply to Jakub Jelinek from comment #10)
> or if instead we should drop the "status = " for the cases where nothing
> checks it. Andrew?
I think checking the status is probably good practice, even thoug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94282
--- Comment #3 from Andrew Stubbs ---
(In reply to Andrew Pinski from comment #2)
> (In reply to Tobias Burnus from comment #1)
> > The symbol __gxx_personality_v0 is part of libsupc++ – which I believe is
> > not built due to lacking/restricted C
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94248
--- Comment #3 from Andrew Stubbs ---
Actually, I think that recent changes to the register alignment mean that this
can't happen any more, so the whole check is probably obsolete.
I thought that --enable-checking=yes was already covering this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93409
Andrew Stubbs changed:
What|Removed |Added
CC||ams at gcc dot gnu.org
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772
Andrew Stubbs changed:
What|Removed |Added
Priority|P3 |P5
Severity|critical
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772
--- Comment #6 from Andrew Stubbs ---
(In reply to Richard Biener from comment #4)
> Btw, isn't the issue that the reduction looks at all lanes? That is,
> I think the code simply assumes that for fully masked loops at least
> one iteration is p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92772
--- Comment #3 from Andrew Stubbs ---
The GCN architecture can handle the masking, but I don't know how we'd
represent or apply that in the middle end?
I can probably implement extract_last, and that might be more efficient, but I
don't see how
-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ams at gcc dot gnu.org
Target Milestone: ---
The testcase pr65947-10.c fails on amdgcn because there are more vector lanes
than there is data, and the algorithm created doesn't allow for this. (Actually
there
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91198
--- Comment #1 from Andrew Stubbs ---
I don't believe GCC detects that operation automatically.
It does support the instruction via intrinsics (builtin functions that
correspond to low-level machine features). You should investigate
"__builtin_i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90779
--- Comment #14 from Andrew Stubbs ---
(In reply to Jakub Jelinek from comment #7)
> if I compile just the first TU without the foo () call in there, and
> .global .align 4 .u32 var$lto_priv$1[1] = { 5 };
> .global .align 4 .u32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90779
--- Comment #8 from Andrew Stubbs ---
On GCN I get the lto_priv names, but not the globalization. I think that shows
what the expected behaviour is, thanks ... I just need to find that magic.
That being so, I think I can confirm that your origin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90779
--- Comment #6 from Andrew Stubbs ---
There's no observable difference. I don't quite follow what the patch is
trying to achieve, but it seems like adding the variable to the offload variables
does not address the issue here.
I've added a hack to