Re: [Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Andrew Stubbs
On 21/06/2024 16:30, Tobias Burnus wrote: [I messed up copying from the build system, picking up an old version. Changes to v1 (bottom of the diff): fopen is no longer required.] Tobias Burnus wrote: mkoffload's generated .c file looks much nicer with '#embed'. This patch depends on Jakub's

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-17 Thread Andrew Stubbs
On 14/06/2024 11:31, Richard Biener wrote: The following retires vcond{,u,eq} optabs by stopping to use them from the middle-end. Targets instead (should) implement vcond_mask and vec_cmp{,u,eq} optabs. The PR this change refers to lists possibly affected targets - those implementing these

[PATCH v5 6/6] libgomp: fine-grained pinned memory allocator

2024-06-12 Thread Andrew Stubbs
This patch introduces a new custom memory allocator for use with pinned memory (in the case where the Cuda allocator isn't available). In future, this allocator will also be used for Unified Shared Memory. Both memories are incompatible with the system malloc because allocated memory cannot

[PATCH v5 4/6] openmp: -foffload-memory=pinned

2024-06-12 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls

[PATCH v5 5/6] libgomp, nvptx: Cuda pinned memory

2024-06-12 Thread Andrew Stubbs
This patch was already approved, in the v3 posting by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. -- Use Cuda to pin memory, instead of

[PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Andrew Stubbs
Compared to the previous v4 (1/5) posting of this patch: - The enumeration of the ompx allocators has been moved (again) to 200 (as 100 is already in use by another toolchain vendor and this seems like a possible source of confusion). - The "ompx" has also been changed to "ompx_gnu" to

[PATCH v5 3/6] openmp: Add -foffload-memory

2024-06-12 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16

[PATCH v5 0/6] libgomp: OpenMP pinned memory for omp_alloc

2024-06-12 Thread Andrew Stubbs
in the rest of the series. Otherwise, I've addressed comments regarding the enum values, naming, and implemented previously missed cases in the environment variables and parsers. OK for mainline? Andrew Andrew Stubbs (6): libgomp: change alloc-pinned tests failure mode libgomp, openmp: Add

[PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Andrew Stubbs
The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why.
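As a rough illustration of the "insufficient lockable memory" condition mentioned above (not the testsuite's actual code), a Linux program can compare the POSIX `RLIMIT_MEMLOCK` soft limit against the amount it intends to pin and print an explanatory message when the limit is too low. The helper name `lockable_memory_sufficient` and the message text are assumptions made for this sketch.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper, not taken from libgomp or its testsuite.
   Returns 1 if at least NEED bytes may be locked according to the
   RLIMIT_MEMLOCK soft limit, 0 if the limit is too low (a message is
   printed to stderr), and -1 if the limit cannot be read.  */
static int
lockable_memory_sufficient (size_t need)
{
  struct rlimit limit;

  if (getrlimit (RLIMIT_MEMLOCK, &limit) != 0)
    return -1;

  if (limit.rlim_cur == RLIM_INFINITY || limit.rlim_cur >= need)
    return 1;

  /* Explain the failure instead of failing silently.  */
  fprintf (stderr,
           "insufficient lockable memory: need %zu bytes, limit is %llu\n",
           need, (unsigned long long) limit.rlim_cur);
  return 0;
}
```

A test could call this before attempting to pin memory and fail with the printed explanation when it returns 0.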

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Andrew Stubbs
On 03/06/2024 21:40, Tobias Burnus wrote: Andrew Stubbs wrote: On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly. */ +    if (omp_requires_mask

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 17:46, Tobias Burnus wrote: Andrew Stubbs wrote: +    /* If USM has been requested and is supported by all devices +   of this type, set the capability accordingly.  */ +    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_ME

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Andrew Stubbs
On 28/05/2024 23:33, Tobias Burnus wrote: While most of the nvptx systems I have access to don't have the support for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one has: Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support this feature. And with

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Andrew Stubbs
On 03/06/2024 04:01, Kewen Lin wrote: This is to remove macros {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE defines in gcn port. gcc/ChangeLog: * config/gcn/gcn.h (FLOAT_TYPE_SIZE): Remove. (DOUBLE_TYPE_SIZE): Likewise. (LONG_DOUBLE_TYPE_SIZE): Likewise. Assuming that this does

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-31 Thread Andrew Stubbs
On 29/05/2024 13:15, Tobias Burnus wrote: This patch depends (for the libgomp/target.c parts) on the patch "[patch] libgomp: Enable USM for some nvptx devices", https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html AMD GPUs that are either APU devices or MI200 [or MI300X] (with

[PATCH v4 5/5] libgomp: fine-grained pinned memory allocator

2024-05-31 Thread Andrew Stubbs
This patch was already approved, by Tobias Burnus, in the v3 posting, but I've not yet committed it because there are some textual dependencies on the yet-to-be-approved patches. - This patch introduces a new custom memory allocator for use with pinned memory (in the case

[PATCH v4 2/5] openmp: Add -foffload-memory

2024-05-31 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16

[PATCH v4 4/5] libgomp, nvptx: Cuda pinned memory

2024-05-31 Thread Andrew Stubbs
From: Thomas Schwinge This patch was already approved, by Tobias Burnus (with one caveat about initialization location), but wasn't committed at that time as I didn't want to disentangle it from the textual dependencies on the other patches in the series. Use Cuda to pin

[PATCH v4 3/5] openmp: -foffload-memory=pinned

2024-05-31 Thread Andrew Stubbs
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls

[PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-05-31 Thread Andrew Stubbs
Compared to the previous v3 posting of this patch, the enumeration of the "ompx" allocators has been moved to start at "100". - This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be

[PATCH v4 0/5] libgomp: OpenMP pinned memory for omp_alloc

2024-05-31 Thread Andrew Stubbs
and retest, I've addressed the review comments regarding the enum assignments. OK for mainline? Andrew Andrew Stubbs (4): libgomp, openmp: Add ompx_pinned_mem_alloc openmp: Add -foffload-memory openmp: -foffload-memory=pinned libgomp: fine-grained pinned memory allocator Thomas Schwinge (1

[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx90c support

2024-04-26 Thread Andrew Stubbs
I will push this shortly. I think the gfx90c patch just made the cut for the GCC-14 branch! Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index fce0fb44..47fef32d 100644

Re: [PATCH] amdgcn: Add gfx90c target

2024-04-26 Thread Andrew Stubbs
On 25/04/2024 19:37, Frederik Harwath wrote: Hi Andrew, this patch adds support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html) lists those devices as unsupported by rocm-amdhsa. As we have discussed elsewhere, I have tested

Re: [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Andrew Stubbs
On 25/04/2024 11:51, Tobias Burnus wrote: Motivated by a surprise of a colleague that with -m32, no offload dumps were created; that's because mkoffload does not process host binaries when they are 32bit (i.e. ilp32). Internally, that is done as follows: The host compiler passes to 'mkoffload' the

Re: GCN: Enable effective-target 'vect_long_long'

2024-04-17 Thread Andrew Stubbs
On 16/04/2024 20:01, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_long'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) I think if there are still missing int64 vector operations then they're

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 13:00, Richard Biener wrote: On Mon, Apr 15, 2024 at 12:04 PM Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Andrew Stubbs
On 15/04/2024 11:03, Tobias Burnus wrote: I experimented with some variants to make clearer that each of RDNA2 and RDNA3 applies to two card types, but at the end I settled on the fewest-word version. Comments, remarks, suggestions? (To this change or in general?) Current version:

Re: GCN: '--param=gcn-preferred-vector-lane-width=[default,32,64]'

2024-04-08 Thread Andrew Stubbs
On 08/04/2024 11:45, Thomas Schwinge wrote: Hi! On 2024-03-28T08:00:50+0100, I wrote: On 2024-03-22T15:54:48+, Andrew Stubbs wrote: This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used

Re: [Patch] GCN: install.texi update for Newlib change and LLVM 18 release

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:27, Jakub Jelinek wrote: On Wed, Apr 03, 2024 at 11:09:19AM +0200, Tobias Burnus wrote: @@ -3954,8 +3956,8 @@ on the GPU. To enable support for GCN3 Fiji devices (gfx803), GCC has to be configured with @option{--with-arch=@code{fiji}} or

Re: [Patch] GCN: Fix --with-arch= handling in mkoffload [PR111966]

2024-04-03 Thread Andrew Stubbs
On 03/04/2024 10:05, Tobias Burnus wrote: This patch handles --with-arch= in GCN's mkoffload.cc While mkoffload mostly does not know this and passes it through to the GCN lto1 compiler, it writes an .o file with debug information - and here the -march= in the ELF flags must agree with the one

Re: [PATCH] amdgcn: Add gfx1036 target

2024-03-25 Thread Andrew Stubbs
On 25/03/2024 11:27, Richard Biener wrote: Add support for the gfx1036 RDNA2 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. x86 host bootstrap/regtest running, target-libgomp testing for the offload

Re: GCN: Enable effective-target 'vect_long_mult'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_long_mult'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew

Re: GCN: Enable effective-target 'vect_hw_misalign'

2024-03-25 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! OK to push the attached "GCN: Enable effective-target 'vect_hw_misalign'"? (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...) OK. Andrew.

[wwwdocs, committed] gcc-14: amdgcn: Add gfx1103

2024-03-22 Thread Andrew Stubbs
I added a note about gfx1103 to the existing text for gfx1100. Andrew --- htdocs/gcc-14/changes.html | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index d88fbc96..880b9195 100644 ---

[committed] amdgcn: Adjust GFX10/GFX11 cache coherency

2024-03-22 Thread Andrew Stubbs
The RDNA devices have different cache architectures to the CDNA devices, and the differences go deeper than just the assembler mnemonics, so we probably need to generate different code to maintain coherency across the whole device. I believe this patch is correct according to the documentation in

[committed] amdgcn: Prefer V32 on RDNA devices

2024-03-22 Thread Andrew Stubbs
This patch alters the default (preferred) vector size to 32 on RDNA devices to better match the actual hardware. 64-lane vectors will continue to be used where they are hard-coded (such as function prologues). We run these devices in wavefrontsize64 for compatibility, but they actually only have

[committed] amdgcn: Add gfx1103 target

2024-03-22 Thread Andrew Stubbs
This patch adds support for the gfx1103 RDNA3 APU integrated graphics devices. The ROCm documentation warns that these may not be supported, but it seems to work at least partially. This device should be considered "Experimental" at this point, although so far it seems to be at least as

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 08:43, Richard Biener wrote: I'll note that we don't pass 'val' there and 'val' is unfortunately not documented - what's it supposed to be? I think I placed the original fix in do_compare_and_jump because we have the full info available there. So what's the

Re: [committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-22 Thread Andrew Stubbs
On 22/03/2024 11:56, Thomas Schwinge wrote: Hi Andrew! On 2024-03-21T13:39:53+, Andrew Stubbs wrote: CUmode "on" is the setting for compatibility with GCN and CDNA devices. --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -107,6 +107,7 @@ extern un

Re: [PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 15:18, Richard Biener wrote: On Thu, Mar 21, 2024 at 3:23 PM Andrew Stubbs wrote: My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works

[PATCH] vect: more oversized bitmask fixups

2024-03-21 Thread Andrew Stubbs
My previous patch to fix this problem with xor was rejected because we want to fix these issues only at the point of use. That patch produced slightly better code, in this example, but this works too. These patches fix up a failure in testcase vect/tsvc/vect-tsvc-s278.c when configured to use

[committed] amdgcn: Ensure gfx11 is running in cumode

2024-03-21 Thread Andrew Stubbs
CUmode "on" is the setting for compatibility with GCN and CDNA devices. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-hsa.h (ASM_SPEC): Pass -mattr=+cumode. --- gcc/config/gcn/gcn-hsa.h | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/config/gcn/gcn-hsa.h

[committed] amdgcn: Comment correction

2024-03-21 Thread Andrew Stubbs
The location of the marker was changed, but the comment wasn't updated. Fixed now. Committed to mainline gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_builtin_1): Comment correction. --- gcc/config/gcn/gcn.cc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git

[committed] amdgcn: Clean up device memory in gcn-run

2024-03-21 Thread Andrew Stubbs
There are some stability issues in the ROC runtime or drivers when we run too many tests in quick succession. I was hoping this patch might fix it, but no; still good to fix the omissions though. Committed to mainline. gcc/ChangeLog: * config/gcn/gcn-run.cc (main): Add an

Re: GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'

2024-03-21 Thread Andrew Stubbs
On 21/03/2024 10:41, Thomas Schwinge wrote: Hi! On 2024-01-12T15:02:35+0100, I wrote: OK to push the attached "GCN: Enable effective-target 'vect_early_break', 'vect_early_break_hw'"? Ping. (Or is that not what you'd expect to see for GCN? I haven't checked the actual back end code...)

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 13:56, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: This is more-or-less what I was planning to do myself, but as I want to include all the other features that get parametrized in gcn.cc, gcn.h, gcn-hsa.h, gcn-opts.h, I hadn't got around to it yet. Unfortunately, I

Re: [Patch][RFC] GCN: Define ISA archs in gcn-devices.def and use it

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 12:21, Tobias Burnus wrote: Given the large number of AMD GPU ISAs and the number of files which have to be adapted, I wonder whether it makes sense to consolidate this a bit, especially in the light that we may want to support more in the future. Besides using some macros, I

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 07:35, Richard Biener wrote: On Fri, Mar 15, 2024 at 4:35 AM Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due

Re: [PATCH] vect: Use xor to invert oversized vector masks

2024-03-15 Thread Andrew Stubbs
On 15/03/2024 03:45, Hongtao Liu wrote: On Thu, Mar 14, 2024 at 11:42 PM Andrew Stubbs wrote: Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in vect

[PATCH] vect: Use xor to invert oversized vector masks

2024-03-14 Thread Andrew Stubbs
Don't enable excess lanes when inverting vector bit-masks smaller than the integer mode. This is yet another case of wrong-code due to mishandling of oversized bitmasks. This issue shows up in vect/tsvc/vect-tsvc-s278.c and vect/tsvc/vect-tsvc-s279.c if I set the preferred vector size to V32
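The problem described above can be pictured with a small sketch. It is not the patch itself, only an illustration under the assumption that a mask for `nunits` lanes is stored in a wider 64-bit integer: a plain bitwise NOT also sets the unused high bits, whereas XOR against a mask of the valid lanes leaves them clear. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: all-ones in the low NUNITS bits of a 64-bit
   integer.  NUNITS must be in the range 1..64.  */
static uint64_t
valid_lanes (unsigned nunits)
{
  assert (nunits >= 1 && nunits <= 64);
  return nunits == 64 ? ~UINT64_C (0) : (UINT64_C (1) << nunits) - 1;
}

/* Naive inversion: also flips the bits above the last real lane.  */
static uint64_t
invert_mask_not (uint64_t mask)
{
  return ~mask;
}

/* Inversion by XOR: only the NUNITS valid lanes are flipped, so the
   excess bits stay zero.  */
static uint64_t
invert_mask_xor (uint64_t mask, unsigned nunits)
{
  return mask ^ valid_lanes (nunits);
}
```

For a 32-lane mask `0xf` held in a 64-bit integer, `invert_mask_not` yields `0xfffffffffffffff0`, while `invert_mask_xor (0xf, 32)` yields `0x00000000fffffff0`.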

Re: GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)

2024-03-08 Thread Andrew Stubbs
On 08/03/2024 10:16, Thomas Schwinge wrote: Hi! So, attached here is now a different patch "GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)", that takes a different approach re clarifying the two orthogonal aspects that the

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 13:37, Thomas Schwinge wrote: Hi Andrew! On 2024-03-07T11:38:27+, Andrew Stubbs wrote: On 07/03/2024 11:29, Thomas Schwinge wrote: On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits

Re: GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal

2024-03-07 Thread Andrew Stubbs
On 07/03/2024 11:29, Thomas Schwinge wrote: Hi! On 2019-11-12T13:29:16+, Andrew Stubbs wrote: This patch contributes the GCN libgomp plugin, with the various configure and make bits to go with it. An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is different from

Re: amdgcn: additional gfx1030/gfx1100 support: adjust test cases

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 13:49, Thomas Schwinge wrote: Hi! On 2024-01-24T12:43:04+, Andrew Stubbs wrote: This [...] ... became commit 99890e15527f1f04caef95ecdd135c9f1a077f08 "amdgcn: additional gfx1030/gfx1100 support", and included the following: --- a/gcc/config/gcn/gcn-valu.md

Re: Stabilize flaky GCN target/offloading testing

2024-03-06 Thread Andrew Stubbs
On 06/03/2024 12:09, Thomas Schwinge wrote: Hi! On 2024-02-21T17:32:13+0100, Richard Biener wrote: Am 21.02.2024 um 13:34 schrieb Thomas Schwinge : [...] per my work on "libgomp make check time is excessive", all execution testing in libgomp is serialized in

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-03-04 Thread Andrew Stubbs
On 23/02/2024 15:13, Richard Biener wrote: On Fri, 23 Feb 2024, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 02:22:19PM +, Andrew Stubbs wrote: On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous

Re: [PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
On 23/02/2024 13:02, Jakub Jelinek wrote: On Fri, Feb 23, 2024 at 12:58:53PM +, Andrew Stubbs wrote: This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set

[PATCH] vect: Fix integer overflow calculating mask

2024-02-23 Thread Andrew Stubbs
This is a follow-up to the previous patch to ensure that integer vector bit-masks do not have excess bits set. It fixes a bug, observed on amdgcn, in which the mask could be incorrectly set to zero, resulting in wrong-code. The mask was broken when nunits==32. The patched version will probably be
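One way such a mask calculation can go wrong at `nunits == 32` is sketched below. This is an assumption about the general class of bug, not a copy of the GCC code: in C, shifting a 32-bit value by 32 is undefined, so an expression such as `(1u << nunits) - 1` is not guaranteed to produce an all-ones mask for 32 lanes. Doing the shift in a wider type avoids the problem. The helper name `lane_mask32` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low NUNITS bits set, for NUNITS
   in the range 1..32.  The shift is done in a 64-bit type so that
   NUNITS == 32 is well defined and gives 0xffffffff.  */
static uint32_t
lane_mask32 (unsigned nunits)
{
  assert (nunits >= 1 && nunits <= 32);
  return (uint32_t) ((UINT64_C (1) << nunits) - 1);
}
```

With this form, `lane_mask32 (32)` is `0xffffffff` and `lane_mask32 (16)` is `0xffff`.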

Re: GCN: Conditionalize 'define_expand "reduc__scal_"' on '!TARGET_RDNA2_PLUS' [PR113615]

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 14:34, Thomas Schwinge wrote: Hi! On 2024-01-29T11:34:05+0100, Tobias Burnus wrote: Andrew wrote off list: "Vector reductions don't work on RDNA, as is, but they're supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 12:26, Richard Biener wrote: On Fri, 16 Feb 2024, Andrew Stubbs wrote: On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit

Re: GCN RDNA2+ vs. GCC SLP vectorizer

2024-02-16 Thread Andrew Stubbs
On 16/02/2024 10:17, Richard Biener wrote: On Fri, 16 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the l

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:23, Thomas Schwinge wrote: Hi! On 2024-02-15T08:49:17+0100, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 10:21, Richard Biener wrote: [snip] I suppose if RDNA really only has 32 lane vectors (it sounds like it, even if it can "simulate" 64 lane ones?) then it might make sense to vectorize for 32 lanes? That said, with variable-length it likely doesn't matter but I'd not expose

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-15 Thread Andrew Stubbs
On 15/02/2024 07:49, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:43, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 14/02/2024 13:27, Richard Biener wrote: On Wed, 14 Feb 2024, Andrew Stubbs wrote: On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit

Re: GCN RDNA2+ vs. GCC vectorizer "Reduce using vector shifts"

2024-02-14 Thread Andrew Stubbs
On 13/02/2024 08:26, Richard Biener wrote: On Mon, 12 Feb 2024, Thomas Schwinge wrote: Hi! On 2023-10-20T12:51:03+0100, Andrew Stubbs wrote: I've committed this patch ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691 "amdgcn: add -march=gfx1030 EXPERIMENTAL". The RDNA2 I

Re: [PATCH] libgomp: testsuite: Don't XPASS libgomp.c/alloc-pinned-1.c etc. on non-Linux targets [PR113448]

2024-02-12 Thread Andrew Stubbs
On 05/02/2024 13:04, Rainer Orth wrote: Two libgomp tests XPASS on Solaris (any non-Linux target actually) since their introduction: XPASS: libgomp.c/alloc-pinned-1.c execution test XPASS: libgomp.c/alloc-pinned-2.c execution test The problem is that the test just prints OS unsupported and

Re: GCN: Don't hard-code number of SGPR/VGPR/AVGPR registers

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 13:49, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.c +void +gcn_hsa_declare_function_name (FILE *file, const char *name, tree

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-02-01 Thread Andrew Stubbs
On 01/02/2024 11:36, Thomas Schwinge wrote: Hi! On 2024-01-31T11:31:00+, Andrew Stubbs wrote: On 31/01/2024 10:36, Thomas Schwinge wrote: OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for

Re: GCN: Remove 'FIRST_{SGPR,VGPR,AVGPR}_REG', 'LAST_{SGPR,VGPR,AVGPR}_REG' from machine description

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:21, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:23+, Andrew Stubbs wrote: This patch contains the machine description portion of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.md +;; {{{ Constants and enums + +; Named registers +(define_constants

Re: GCN: Remove 'SGPR_OR_VGPR_REGNO_P' definition

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 17:12, Thomas Schwinge wrote: Hi! On 2018-12-12T11:52:52+, Andrew Stubbs wrote: This patch contains the major part of the GCN back-end. [...] --- /dev/null +++ b/gcc/config/gcn/gcn.h +#define FIRST_SGPR_REG 0 +#define SGPR_REGNO(N) ((N)+FIRST_SGPR_REG

Re: GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'

2024-01-31 Thread Andrew Stubbs
On 31/01/2024 10:36, Thomas Schwinge wrote: Hi! OK to push "GCN, RDNA 3: Adjust 'sync_compare_and_swap_lds_insn'", see attached? In pre-RDNA 3 ISA manuals, there are notes for 'DS_CMPST_[...]', like: Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 12:50, Tobias Burnus wrote: Andrew Stubbs wrote: /tmp/ccrsHfVQ.mkoffload.2.s:788736:27: error: value out of range .amdhsa_next_free_vgpr 516 ^~~ [Obviously, likewise for libgomp.c++/.. Hmm, supposedly there are 768

Re: [patch] gcn/gcn-valu.md: Disable fold_left_plus for TARGET_RDNA2_PLUS [PR113615]

2024-01-29 Thread Andrew Stubbs
On 29/01/2024 10:34, Tobias Burnus wrote: Andrew wrote off list:   "Vector reductions don't work on RDNA, as is, but they're    supposed to be disabled by the insn condition" This patch disables "fold_left_plus_", which is about vectorization and in the code path shown in the backtrace. I can

Re: [wwwdocs][patch] gcc-14/changes.html (amdgcn): Update for gfx1030/gfx1100

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 17:06, Tobias Burnus wrote: Mention that gfx1030/gfx1100 are now supported. As noted in another thread, LLVM 15's assembler is now required, before LLVM 13.0.1 would do. (Alternatively, disabling gfx1100 support would do.) Hence, the added link to the install documentation.

Re: [patch] install.texi: For gcn, recommend LLVM 15, unless gfx1100 is disabled

2024-01-29 Thread Andrew Stubbs
On 26/01/2024 16:45, Tobias Burnus wrote: Hi, Thomas Schwinge wrote: amdgcn: config.gcc - enable gfx1030 and gfx1100 multilib; add them to the docs ... Further down in that file, we state: @anchor{amdgcn-x-amdhsa} @heading amdgcn-*-amdhsa AMD GCN GPU target. Instead

Re: [patch][v2] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-29 Thread Andrew Stubbs
On 25/01/2024 15:11, Tobias Burnus wrote: Updated patch enclosed. Tobias Burnus wrote: I have now run the attached script and the result running yesterday's build with both my patch and your patch applied. (And the now committed gcn-hsa.h patch) Now the result with the testscript is: *

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:21, Richard Biener wrote: On Fri, 26 Jan 2024, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 03:04:11PM +0100, Richard Biener wrote: Otherwise it looks reasoanble to me, but let's see what Andrew thinks. 'n' before 'a', please. ;-) ?! I've misspelled a word. @@ -1443,6

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 14:04, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones

Re: [PATCH] Avoid registering unsupported OMP offload devices

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 12:06, Jakub Jelinek wrote: On Fri, Jan 26, 2024 at 01:00:28PM +0100, Richard Biener wrote: The following avoids registering unsupported GCN offload devices when iterating over available ones. With a Zen4 desktop CPU you will have an IGPU (unsupported) which will otherwise be made

Re: [PATCH] Fix architecture support in OMP_OFFLOAD_init_device for gcn

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 11:42, Richard Biener wrote: The following makes the existing architecture support check work instead of being optimized away (enum vs. -1). This avoids later asserts when we assume such devices are never actually used. Tested as previously, now the error is libgomp: GCN fatal

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:39, Tobias Burnus wrote: Hi all, Andrew Stubbs wrote: On 26/01/2024 07:29, Richard Biener wrote: If you link against prebuilt objects with COV 5 it seems there's no way to override the COV version GCC uses?  That is, do we want to add a -mcode-object-version=... option

Re: [PATCH] Avoid using an unsupported agent when offloading to GCN

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:40, Richard Biener wrote: The following avoids selecting an unsupported agent early, avoiding later asserts when we rely on it being supported. tested on x86_64-unknown-linux-gnu -> amdhsa-gcn on gfx1060 that's the alternative to the other patch. I do indeed seem to get the

Re: [PATCH] Avoid assert for unknown device ISAs in GCN libgomp plugin

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:30, Richard Biener wrote: When the agent reports a device ISA we don't support avoid hitting an assert, instead report the raw integers as error. I'm not sure whether -1 is special as I didn't figure where that field is initialized. But I guess since agents are not rejected

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 10:22, Richard Biener wrote: On Fri, 26 Jan 2024, Andrew Stubbs wrote: On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697

Re: [PATCH] amdgcn: additional gfx1100 support

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 09:45, Richard Biener wrote: On Fri, 26 Jan 2024, Richard Biener wrote: === libgomp Summary === # of expected passes 29126 # of unexpected failures 697 # of unexpected successes 1 # of expected failures 703 # of unresolved

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 26/01/2024 07:29, Richard Biener wrote: On Fri, Jan 26, 2024 at 12:04 AM Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which

Re: [patch] gcn/gcn-hsa.h: Always pass --amdhsa-code-object-version= in ASM_SPEC

2024-01-26 Thread Andrew Stubbs
On 25/01/2024 23:03, Tobias Burnus wrote: When targeting AMD GPUs, the LLVM assembler (and linker) are used. Two days ago LLVM changed the default for the AMDHSA code object version (COV) from 4 to 5. In principle, we do not care which COV is used as long as it works; unfortunately,

Re: [patch] gcn: Add missing space to ASM_SPEC in gcn-hsa.h

2024-01-25 Thread Andrew Stubbs
On 25/01/2024 12:44, Tobias Burnus wrote: This patch avoids assembler warnings for gfx908 and gfx90a such as '-xnack-mattr=-sramecc' is not a recognized feature for this target (ignoring feature) as we pass -mattr=-xnack-mattr=-sramecc to the llvm-mc assembler. Solution: Add a space

Re: [patch] gcn/mkoffload.cc: Fix SRAM_ECC and XNACK handling [PR111966]

2024-01-25 Thread Andrew Stubbs
On 24/01/2024 22:12, Tobias Burnus wrote: This patch fixes "-g" debug compilation for gfx1100 and gfx1030, which fail to link when "-g" is specified. The reason is: When using gfx1100 and compiling with '-g' I was running into an error because the eflags used for the debugger file has

[PATCH] amdgcn: additional gfx1100 support

2024-01-24 Thread Andrew Stubbs
(RTC_TICKS): Configure RDNA3. (omp_get_wtime): Add RDNA3-compatible variant. * plugin/plugin-gcn.c (max_isa_vgprs): Tune for gfx1030 and gfx1100. Signed-off-by: Andrew Stubbs --- gcc/config/gcn/gcn-opts.h | 2 +- gcc/config/gcn/gcn-valu.md

[PATCH] Update my email in MAINTAINERS

2024-01-23 Thread Andrew Stubbs
I've moved to BayLibre and don't have access to my codesourcery.com address, at least for a while. ChangeLog: * MAINTAINERS: Update Signed-off-by: Andrew Stubbs --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index

Re: [PATCH] gcn: Fix a warning

2024-01-23 Thread Andrew Stubbs
On Tue, 23 Jan 2024 at 10:01, Jakub Jelinek wrote: > Hi! > > I see > ../../gcc/config/gcn/gcn.cc: In function ‘void > gcn_hsa_declare_function_name(FILE*, const char*, tree)’: > ../../gcc/config/gcn/gcn.cc:6568:67: warning: unused parameter ‘decl’ > [-Wunused-parameter] > 6568 |

Re: [Patch] xfail libgomp.c/declare-variant-4-{fiji,gfx803}.c

2024-01-22 Thread Andrew Stubbs
On Fri, 19 Jan 2024 at 18:27, Tobias Burnus wrote: > The problem is as described at > https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa > > "Note that support for Fiji devices has been removed in ROCm 4.0 and > support in LLVM is deprecated and will be removed in LLVM 18." > > Therefore,

Re: [Patch] GCN: Add pre-initial support for gfx1100

2024-01-08 Thread Andrew Stubbs
On 07/01/2024 19:20, Tobias Burnus wrote: ROCm meanwhile supports also some consumer cards; besides the semi-new gfx1030, support for gfx1100 was added more recently (in ROCm 5.7.1 for "Ubuntu 22.04 only" and without parenthesis since ROCm 6.0.0). GCC has already very limited support for

[committed] amdgcn: Match new XNACK defaults in mkoffload

2024-01-08 Thread Andrew Stubbs
This patch fixes build failures with the offload toolchain since my recent XNACK patch. The problem was simply that mkoffload made out-of-date assumptions about the -mxnack defaults. This patch fixes the mismatch. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices

[committed] amdgcn: Don't double-count AVGPRs

2024-01-08 Thread Andrew Stubbs
This patch fixes a runtime error with offload kernels that use a lot of registers, such as libgomp.fortran/target1.f90. Committed to mainline. Andrew amdgcn: Don't double-count AVGPRs CDNA2 devices have VGPRs and AVGPRs combined into a single hardware register file (they're separate in CDNA1).

Re: [Patch] gcn.h: Add builtin_define ("__gfx1030")

2024-01-08 Thread Andrew Stubbs
On 06/01/2024 21:20, Tobias Burnus wrote: Hi Andrew, I just spotted that this define was missing. OK for mainline? OK. Andrew

[committed] amdgcn: XNACK support

2023-12-13 Thread Andrew Stubbs
Some AMD GCN devices support an "XNACK" mode in which the device can handle page-misses (and maybe other traps in memory instructions), but it's not completely invisible to software. We need this now to support OpenMP Unified Shared Memory (I plan to post updated patches for that in January),
