Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Andrew Stubbs
On 10/09/2024 10:43, Andrew Stubbs wrote: On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools.  In case you have more cases, it would be greatly appreciated to verify the series with them.  If you don't mind, would

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-10 Thread Andrew Stubbs
On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools. In case you have more cases, it would be greatly appreciated to verify the series with them. If you don't mind, would it be possible to comment out the zeroing, re-run the test

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Andrew Stubbs
On 06/09/2024 08:06, Robin Dapp wrote: There were absolutely problems without this. It's a while ago now, so I'm struggling with the details, but as GCC only applies the mask to selected operations there were all sorts of issues that crept in. Zeroing the undefined lanes seemed to match the middl

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
On Thu, 5 Sept 2024, 21:10 Robin Dapp, wrote: > > > +(define_predicate "maskload_else_operand" > > > + (and (match_code "const_int,const_vector") > > > + (match_test "op == CONST0_RTX (GET_MODE (op))"))) > > > > This forces maskload and mask_gather_load to only accept zero here, but > > in

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
(Sorry, I missed this because I was on vacation.) On 11/08/2024 22:00, Robin Dapp wrote: This patch adds a zero else operand to the masked loads. The patch is OK, but I have a question below. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate.

[committed, wwwdocs] gcc-15: Fiji gfx803 device support removed

2024-09-02 Thread Andrew Stubbs
--- htdocs/gcc-15/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index edce138e..7c372688 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -123,6 +123,13 @@ a work-in-progress. +AMD Ra

[committed] amdgcn: remove gfx803 "Fiji" support

2024-09-02 Thread Andrew Stubbs
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is

[committed] amdgcn: Remove TARGET_GCN5_PLUS

2024-09-02 Thread Andrew Stubbs
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also redundant and c

[committed] amdgcn: Remove TARGET_GCN3

2024-09-02 Thread Andrew Stubbs
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Andrew Stubbs
On 22/08/2024 19:26, Tobias Burnus wrote: This patch adds OpenMP's interop support to the libgomp plugins (nvptx: cuda, cuda_driver, hip; gcn: hip, hsa).* [The idea is that the user can ask OpenMP to return a foreign-runtime handle (CUdevice, hipCtx_t, …) for to a specified OpenMP device numbe

Re: [commit] amdgcn: Re-enable trampolines

2024-08-09 Thread Andrew Stubbs
On 09/08/2024 07:53, Thomas Schwinge wrote: Hi Andrew! On 2024-08-08T13:50:17+, Andrew Stubbs wrote: Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse

[commit] amdgcn: Add padding to trampoline

2024-08-09 Thread Andrew Stubbs
This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align. * config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40. --- gcc/config/gcn/gcn.cc | 1 + gcc/config/gcn/gcn.h | 2 +- 2 files changed, 2 ins

[committed] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs
The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artificial limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers rema
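
The 252 figure above follows from rounding the register file down to a whole number of 12-register blocks. A minimal sketch of that arithmetic, assuming 256 addressable VGPRs (that total is an assumption for illustration and is not stated in the message; the function name is hypothetical and is not taken from gcn.cc):

```c
#include <assert.h>

/* Illustration only, not the code from the patch.
   TOTAL_VGPRS is an assumed figure; VGPR_BLOCK is the block size
   quoted in the message.  */
#define TOTAL_VGPRS 256u
#define VGPR_BLOCK 12u

/* Round a register budget down to a whole number of allocation blocks.  */
unsigned
max_usable_vgprs (unsigned total, unsigned block)
{
  return (total / block) * block;
}

/* Self-check: 256 / 12 = 21 whole blocks, and 21 * 12 = 252.  */
void
check_vgpr_limit (void)
{
  assert (max_usable_vgprs (TOTAL_VGPRS, VGPR_BLOCK) == 252);
}
```

Under that assumption the last four registers can never be covered by a complete block, which would explain why the compiler must stay at or below 252.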

[commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs
Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse-offload features because we switched from using the "private" memory space to using a regular memory allocation. The

Re: [Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Andrew Stubbs
On 23/07/2024 11:05, Tobias Burnus wrote: Hi Andrew, hi all, to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I suggest the attached patch that also suggests Thomas' Newlib commit (April 4, 2024) ed50a50b9   amdgcn: Implement proper locks: Fix 'newlib/libc/sys/amdgcn/inclu

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 16:36, Thomas Schwinge wrote: Hi! On 2024-07-15T12:16:30+0100, Andrew Stubbs wrote: On 15/07/2024 10:29, Thomas Schwinge wrote: On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Andrew Stubbs
On 15/07/2024 10:29, Thomas Schwinge wrote: Hi! On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote: And finally here is a third version, [...] ... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788 "libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound". Attached h

[committed] amdgcn: invent target feature flags

2024-07-02 Thread Andrew Stubbs
This is a first step towards having a device table so we can add new devices more easily. It'll also make it easier to remove the deprecated GCN3 bits. The patch should not change the behaviour of anything. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New. (

[PATCH v2 8/8] libgomp: Map omp_default_mem_space to USM

2024-06-28 Thread Andrew Stubbs
When unified shared memory is required, the default memory space should also be unified. libgomp/ChangeLog: * config/linux/allocator.c (linux_memspace_alloc): Check omp_requires_mask. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linu

[PATCH v2 6/8] amdgcn: libgomp plugin USM implementation

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Implement the Unified Shared Memory API calls in the GCN plugin. The AMD equivalent of "Managed Memory" means registering previously allocated host memory as "coarse-grained" (whereas allocating coarse-grained memory via hsa_allocate_memory allocate

[PATCH v2 7/8] openmp, libgomp: Handle unified shared memory in omp_target_is_accessible

2024-06-28 Thread Andrew Stubbs
From: Marcel Vollweiler This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine omp_target_is_accessible. libgomp/ChangeLog: * target.c (omp_target_is_accessible): Handle unified shared memory. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updat

[PATCH v2 5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure

[PATCH v2 4/8] openmp: Use libgomp memory allocation functions with unified shared memory.

2024-06-28 Thread Andrew Stubbs
++.dg/gomp/usm-5.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc| 184 ++ gcc/passes.def| 1 + gcc/tes

[PATCH v2 2/8] openmp, nvptx: ompx_gnu_unified_shared_mem_alloc

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_gnu_unified_shared_mem_alloc and ompx_gnu_host_mem_a

[PATCH v2 3/8] openmp: Enable -foffload-memory=unified

2024-06-28 Thread Andrew Stubbs
From: Andrew Stubbs Ensure that "requires unified_shared_memory" plays nicely with the -foffload-memory options, and that enabling the option has the same effect as enabling USM in the code. Also adds some testcases. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_ta

[PATCH v2 0/8] OpenMP: Unified Shared Memory via Managed Memory

2024-06-28 Thread Andrew Stubbs
approve the amdgcn patches myself, but comments are welcome. OK for mainline? (Once the pinned memory dependencies are committed.) Thanks Andrew P.S. This series includes contributions from (at least) Thomas Schwinge, Marcel Vollweiler, Kwok Cheung Yeung, and Abid Qadeer.

[PATCH v2 1/8] libgomp: Disentangle shared memory from managed

2024-06-28 Thread Andrew Stubbs
Some GPU compute systems allow the GPU to access host memory without much prior setup, but that's not necessarily the fast way to do it. For shared memory APUs this is almost certainly the correct choice, but for AMD there is the difference between "fine-grained" and "coarse-grained" memory, and f

Re: [Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Andrew Stubbs
On 21/06/2024 16:30, Tobias Burnus wrote: [I messed up copying from the build system, picking up an old version. Changes to v1 (bottom of the diff): fopen is no longer required.] Tobias Burnus wrote: mkoffload's generated .c file looks much nicer with '#embed'. This patch depends on Jakub's #e

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Andrew Stubbs
On 14/06/2024 11:31, Richard Biener wrote: The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these patt

[PATCH v5 6/6] libgomp: fine-grained pinned memory allocator

2024-06-12 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot share

[PATCH v5 4/6] openmp: -foffload-memory=pinned

2024-06-12 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mloc

[PATCH v5 5/6] libgomp, nvptx: Cuda pinned memory

2024-06-12 Thread Andrew Stubbs
This patch was already approved, in the v3 posting by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. -- Use Cuda to pin memory, instead of Lin

[PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Andrew Stubbs
Compared to the previous v4 (1/5) posting of this patch: - The enumeration of the ompx allocators have been moved (again) to 200 (as 100 is already in use by another toolchain vendor and this seems like a possible source of confusion). - The "ompx" has also been changed to "ompx_gnu" to highlig

[PATCH v5 3/6] openmp: Add -foffload-memory

2024-06-12 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v5 0/6] libgomp: OpenMP pinned memory for omp_alloc

2024-06-12 Thread Andrew Stubbs
the new testcases included in the rest of the series. Otherwise, I've address comments regarding the enum values, naming, and implemented previously missed cases in the environment variables and parsers. OK for mainline? Andrew Andrew Stubbs (6): libgomp: change alloc-pinned tests failure m

[PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Andrew Stubbs
The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLo

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Andrew Stubbs
On 03/06/2024 21:40, Tobias Burnus wrote: Andrew Stubbs wrote: On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly. */ +    if (omp_requires_mask

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly.  */ +    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_ME

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 28/05/2024 23:33, Tobias Burnus wrote: While most of the nvptx systems I have access to don't have the support for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one has: Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support this feature. And with that

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 04:01, Kewen Lin wrote: This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE defines in gcn port. gcc/ChangeLog: * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove. (DOUBLE_TYPE_SIZE): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. Assuming that this does n

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-31 Thread Andrew Stubbs
On 29/05/2024 13:15, Tobias Burnus wrote: This patch depends (on the libgomp/target.c parts) of the patch "[patch] libgomp: Enable USM for some nvptx devices", https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html AMD GPUs that are either APU devices or MI200 [or MI300X] (with HSA_XNACK

[PATCH v4 5/5] libgomp: fine-grained pinned memory allocator

2024-05-31 Thread Andrew Stubbs
This patch was already approved, by Tobias Burnus, in the v3 posting, but I've not yet committed it because there are some textual dependencies on the yet-to-be-approved patches. - This patch introduces a new custom memory allocator for use with pinned memory (in the case where

[PATCH v4 2/5] openmp: Add -foffload-memory

2024-05-31 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 +++

[PATCH v4 4/5] libgomp, nvptx: Cuda pinned memory

2024-05-31 Thread Andrew Stubbs
From: Thomas Schwinge This patch was already approved, by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. Use Cuda to pin me

[PATCH v4 3/5] openmp: -foffload-memory=pinned

2024-05-31 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mloc

[PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-05-31 Thread Andrew Stubbs
Compared to the previous v3 posting of this patch, the enumeration of the "ompx" allocators have been moved to start at "100". - This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consi

[PATCH v4 0/5] libgomp: OpenMP pinned memory for omp_alloc

2024-05-31 Thread Andrew Stubbs
ns to-do. Besides rebase and retest, I've addressed the review comments regarding the enum assignments. OK for mainline? Andrew Andrew Stubbs (4): libgomp, openmp: Add ompx_pinned_mem_alloc openmp: Add -foffload-memory openmp: -foffload-memory=pinned libgomp: fine-grained pinned mem

[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx90c support

2024-04-26 Thread Andrew Stubbs
I will push this shortly. I think the gfx90c patch just made the cut for the GCC-14 branch! Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index fce0fb44..47fef32d 100644 --

Re: [PATCH] amdgcn: Add gfx90c target

2024-04-26 Thread Andrew Stubbs
On 25/04/2024 19:37, Frederik Harwath wrote: Hi Andrew, this patch adds support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html) lists those devices as unsupported by rocm-amdhsa. As we have discussed elsewhere, I have tested

Re: [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Andrew Stubbs
On 25/04/2024 11:51, Tobias Burnus wrote: Motivated by a surprise of a colleague that with -m32, no offload dumps were created; that's because mkoffload does not process host binaries when they are 32-bit (i.e. ilp32). Internally, that is done as follows: The host compiler passes to 'mkoffload' the u

Re: GCN: Enable effective-target 'vect_long_long'

2024-04-17 Thread Andrew Stubbs
On 16/04/2024 20:01, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_long'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) I think if there are still missing int64 vector operations then they're ex

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 13:00, Richard Biener wrote: On Mon, Apr 15, 2024 at 12:04 PM Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To t

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 11:03, Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To this change or in general?) Current version: https://gcc

Re: GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'

2024-04-08 Thread Andrew Stubbs
On 08/04/2024 11:45, Thomas Schwinge wrote: Hi! On 2024-03-28T08:00:50+0100, I wrote: On 2024-03-22T15:54:48+, Andrew Stubbs wrote: This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used

Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:27, Jakub Jelinek wrote: On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote: @@ -3954,8 +3956,8 @@ on the GPU. To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with @option{--with-arch=@code{fiji}} or @option{--with-multilib-list=@code

Re: [Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:05, Tobias Burnus wrote: This patch handles --with-arch= in GCN's mkoffload.cc While mkoffload mostly does not know this and passes it through to the GCN lto1 compiler, it writes an .o file with debug information - and here the -march= in the ELF flags must agree with the one

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Andrew Stubbs
On 25/03/2024 11:27, Richard Biener wrote: Add support for the gfx1036 RDNA2 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. x86 host bootstrap/regtest running, target-libgomp testing for the offload produce

Re: GCN: Enable effective-target 'vect_long_mult'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew

Re: GCN: Enable effective-target 'vect_hw_misalign'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_hw_misalign'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew.

[wwwdocs, committed] gcc-14: amdgcn: Add gfx1103

2024-03-22 Thread Andrew Stubbs
I added a note about gfx1103 to the existing text for gfx1100. Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index d88fbc96..880b9195 100644 --- a/htdocs/gcc-14/changes.htm

[committed] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-03-22 Thread Andrew Stubbs
The RDNA devices have different cache architectures to the CDNA devices, and the differences go deeper than just the assembler mnemonics, so we probably need to generate different code to maintain coherency across the whole device. I believe this patch is correct according to the documentation in

[committed] amdgcn: Prefer V32 on RDNA devices

2024-03-22 Thread Andrew Stubbs
This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used where they are hard-coded (such as function prologues). We run these devices in wavefrontsize64 for compatibility, but they actually only have

[committed] amdgcn: Add gfx1103 target

2024-03-22 Thread Andrew Stubbs
This patch adds support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. This device should be considered "Experimental" at this point, although so far it seems to be at least as functiona

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 08:43, Richard Biener wrote: I'll note that we don't pass 'val' there and 'val' is unfortunately not documented - what's it supposed to be? I think I placed the original fix in do_compare_and_jump because we have the full info available there. So what's the do_compare_rtx_and_j

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 11:56, Thomas Schwinge wrote: Hi Andrew! On 2024-03-21T13:39:53+, Andrew Stubbs wrote: CUmode "on" is the setting for compatibility with GCN and CDNA devices. --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -107,6 +107,7 @@ extern un

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 15:18, Richard Biener wrote: On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs wrote: My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works

[PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works too. These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when configured to use

[committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-21 Thread Andrew Stubbs
CUmode "on" is the setting for compatibility with GCN and CDNA devices. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode. --- gcc/config/gcn/gcn-hsa.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gc

[committed] amdgcn: Comment correction

2024-03-21 Thread Andrew Stubbs
The location of the marker was changed, but the comment wasn't updated. Fixed now. Committed to mainline gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction. --- gcc/config/gcn/gcn.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/c

[committed] amdgcn: Clean up device memory in gcn-run

2024-03-21 Thread Andrew Stubbs
There are some stability issues in the ROC runtime or drivers when we run too many tests in quick succession. I was hoping this patch might fix it, but no; still good to fix the omissions though. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-run.cc (main): Add an hsa_memory_fre

Re: GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! On 2024-01-12T15:02:35+0100, I wrote: OK to push the attached "GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"? Ping. (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) So

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 13:56, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: This is more-or-less what I was planning to do myself, but as I want to include all the other features that get parametrized in gcn.cc, gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. Unfortunate

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 12:21, Tobias Burnus wrote: Given the large number of AMD GPU ISAs and the number of files which have to be adapted, I wonder whether it makes sense to consolidate this a bit, especially in the light that we may want to support more in the future. Besides using some macros, I al

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 07:35, Richard Biener wrote: On Fri, Mar 15, 2024 at 4:35 AM Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code d

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 03:45, Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in

[PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Andrew Stubbs
Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in vect/tsvc/vect-tsvc-s278.c and vect/tsvc/vect-tsvc-s279.c if I set the preferred vector size to V32 (dow
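
The problem described above can be illustrated outside the compiler. This is a minimal sketch, assuming a vector mask of `nunits` lanes stored in the low bits of a 64-bit integer; the helper names are hypothetical and this is not the code from the patch. A plain bitwise NOT also sets the unused high bits, whereas an XOR against a constant with only the valid lane bits set leaves them clear:

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only; helper names are hypothetical.  */

/* All-ones in the low NUNITS bits, zero elsewhere (NUNITS in 1..64).  */
uint64_t
lane_bits (unsigned nunits)
{
  return nunits >= 64 ? UINT64_MAX : (((uint64_t) 1 << nunits) - 1);
}

/* Naive inversion: flips every bit of the container, so lanes that do
   not exist become "enabled".  */
uint64_t
invert_with_not (uint64_t mask)
{
  return ~mask;
}

/* Inversion by XOR with the valid-lane constant: only real lanes flip.  */
uint64_t
invert_with_xor (uint64_t mask, unsigned nunits)
{
  return mask ^ lane_bits (nunits);
}

/* Self-check with a 32-lane mask held in a 64-bit integer.  */
void
check_mask_inversion (void)
{
  uint64_t mask = 0x0000000FULL;  /* lanes 0..3 active */
  assert (invert_with_not (mask) == 0xFFFFFFFFFFFFFFF0ULL);
  assert (invert_with_xor (mask, 32) == 0x00000000FFFFFFF0ULL);
}
```

The XOR form costs one extra constant but keeps the invariant that bits beyond the vector length are always zero, which is what later consumers of the mask would rely on.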

Re: GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)

2024-03-08 Thread Andrew Stubbs
On 08/03/2024 10:16, Thomas Schwinge wrote: Hi! So, attached here is now a different patch "GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)", that takes a different approach re clarifying the two orthogonal aspects that the 'GCN_SUPPRESS_HOS

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 13:37, Thomas Schwinge wrote: Hi Andrew! On 2024-03-07T11:38:27+, Andrew Stubbs wrote: On 07/03/2024 11:29, Thomas Schwinge wrote: On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits to

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 11:29, Thomas Schwinge wrote: Hi! On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits to go with it. An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is differen

Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 13:49, Thomas Schwinge wrote: Hi! On 2024-01-24T12:43:04+, Andrew Stubbs wrote: This [...] ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08 "amdgcn: additional gfx1030/gfx1100 support", and included the following: --- a/gcc/config/gcn/gcn-valu.md

Re: Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 12:09, Thomas Schwinge wrote: Hi! On 2024-02-21T17:32:13+0100, Richard Biener wrote: Am 21.02.2024 um 13:34 schrieb Thomas Schwinge : [...] per my work on "libgomp make check time is excessive", all execution testing in libgomp is serialized in 'lib

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-03-04 Thread Andrew Stubbs
On 23/02/2024 15:13, Richard Biener wrote: On Fri, 23 Feb 2024, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 02:22:19PM +, Andrew Stubbs wrote: On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set

[PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set to zero, resulting in wrong-code. The mask was broken when nunits==32. The patched version will probably be
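
A generic sketch of this class of bug follows; it is not the GCC code, and the function names are hypothetical. In C, shifting a 32-bit value by 32 is undefined behaviour; many CPUs use only the low five bits of the shift count, so the sketch emulates that explicitly to stay well-defined. With `nunits == 32` the narrow computation then yields zero instead of 32 set bits, while doing the shift in a wider type gives the intended mask:

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only; not taken from the vectorizer.  */

/* Emulates "(1 << nunits) - 1" in 32-bit arithmetic on hardware that
   masks the shift count to 5 bits.  The "& 31" stands in for what is
   undefined behaviour in real C when nunits == 32.  */
uint32_t
mask_narrow_emulated (unsigned nunits)
{
  return ((uint32_t) 1 << (nunits & 31)) - 1;
}

/* Same calculation in a 64-bit type, with the 64-lane case handled
   separately so the shift count is always in range.  */
uint64_t
mask_wide (unsigned nunits)
{
  return nunits >= 64 ? UINT64_MAX : (((uint64_t) 1 << nunits) - 1);
}

/* Self-check: the narrow form is fine below 32 lanes and wrong at 32.  */
void
check_mask_overflow (void)
{
  assert (mask_narrow_emulated (16) == 0xFFFFu);
  assert (mask_narrow_emulated (32) == 0u);
  assert (mask_wide (32) == 0xFFFFFFFFULL);
}
```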

Re: GCN: Conditionalize 'define_expand "reduc__scal_"' on '!TARGET_RDNA2_PLUS' [PR113615]

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 14:34, Thomas Schwinge wrote: Hi! On 2024-01-29T11:34:05+0100, Tobias Burnus wrote: Andrew wrote off list: "Vector reductions don't work on RDNA, as is, but they're supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about vect

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 12:26, Richard Biener wrote: On Fri, 16 Feb 2024, Andrew Stubbs wrote: On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as c

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL", which

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:23, Thomas Schwinge wrote: Hi! On 2024-02-15T08:49:17+0100, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:21, Richard Biener wrote: [snip] I suppose if RDNA really only has 32 lane vectors (it sounds like it, even if it can "simulate" 64 lane ones?) then it might make sense to vectorize for 32 lanes? That said, with variable-length it likely doesn't matter but I'd not expose fixed-si

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 07:49, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as c

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL". The RD

Re: [PATCH] libgomp: testsuite: Don't XPASS libgomp.c/alloc-pinned-1.c etc. on non-Linux targets [PR113448]

2024-02-12 Thread Andrew Stubbs
On 05/02/2024 13:04, Rainer Orth wrote: Two libgomp tests XPASS on Solaris (any non-Linux target actually) since their introduction: XPASS: libgomp.c/alloc-pinned-1.c execution test XPASS: libgomp.c/alloc-pinned-2.c execution test The problem is that the test just prints OS unsupported and ex

Re: GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 13:49, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.c +void +gcn_hsa_declare_function_name (FILE *file, const char *name, tree

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 11:36, Thomas Schwinge wrote: Hi! On 2024-01-31T11:31:00+, Andrew Stubbs wrote: On 31/01/2024 10:36, Thomas Schwinge wrote: OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are

Re: GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:21, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:23+, Andrew Stubbs wrote: This patch contains the machine description portion of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.md +;; {{{ Constants and enums + +; Named registers +(define_constants

Re: GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:12, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.h +#define FIRST_SGPR_REG 0 +#define SGPR_REGNO(N) ((N)+FIRST_SGPR_REG

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 10:36, Thomas Schwinge wrote: Hi! OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like: Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode
