[PATCH] docs: Add NoOffload option flag to the internals manual

2025-07-31 Thread Andrew Stubbs
The NoOffload flag was introduced recently (commit "Don't pass vector params through to offload targets"). gcc/ChangeLog: * doc/options.texi: Document NoOffload. --- gcc/doc/options.texi | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi

Re: [PATCH] Don't pass vector params through to offload targets

2025-07-31 Thread Andrew Stubbs
On 30/07/2025 18:13, Thomas Schwinge wrote: Hi Andrew! On 2025-07-25T14:44:06+, Andrew Stubbs wrote: The optimization options are deliberately passed through to the LTO compiler, but when the same mechanism is reused for offloading it ends up forcing the host compiler settings onto the

Re: [PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 14:57, Richard Biener wrote: On Mon, Jul 28, 2025 at 3:50 PM Andrew Stubbs wrote: On 28/07/2025 10:46, Richard Biener wrote: On Fri, Jul 25, 2025 at 6:37 PM Andrew Stubbs wrote: On 23/07/2025 16:23, Richard Biener wrote: That said, the hook is a bit black/white - whether

Re: [PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 10:46, Richard Biener wrote: On Fri, Jul 25, 2025 at 6:37 PM Andrew Stubbs wrote: On 23/07/2025 16:23, Richard Biener wrote: That said, the hook is a bit black/white - whether the target prefers a gather/scatter over N piecewise operations with equal stride depends at least on

Re: [Patch] gcn: Fix CDNA3 atomics' buffer invalidation

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 14:36, Tobias Burnus wrote: When initially adding MI300 support, the buffer invalidation before atomics was messed up - it should have been buffer_wbl2 (wbl2 = write back L2). With this patch in place, most test cases work on MI300A :-) Without this change, there were several multi

Re: [Patch] gcn: Add more s_nop for MI300

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 13:54, Tobias Burnus wrote: Hi Andrew, thanks for all the suggestions and the review! Andrew Stubbs wrote: ... +  /* CDNA3: VALU writes VGPR/VCC: v_readlane, v_readfirstlane, v_cmp, +         v_add_*i/u, v_sub_*i/u, v_div_*scale - followed by: +         - VALU reads SGPR as

Re: [Patch] gcn: Add more s_nop for MI300

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 12:06, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: +static bool +gcn_v_cmp_insn_p (attr_type type) +{ +  return type == TYPE_VOPC || type == TYPE_VOP3A;  } There are many vop3a encoded instructions. I don't understand how this uniquely identifies v_cmp instruc

Re: [Patch] gcn: Add 'nops' insn, extend comments

2025-07-28 Thread Andrew Stubbs
On 28/07/2025 10:45, Tobias Burnus wrote: Tiny cleanup patch - as fallout of trying to understand MI300 and its fails better (A) Replace 's_nop 0x0; s_nop 0x0; s_nop 0x0' by 's_nop 0x2'. Advantage: fewer instructions - this helps on the hardware side by permitting more follow-up instructions in

Re: [PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-25 Thread Andrew Stubbs
On 23/07/2025 16:23, Richard Biener wrote: That said, the hook is a bit black/white - whether the target prefers a gather/scatter over N piecewise operations with equal stride depends at least on the vector mode. On x86_64 for V2DImode definitely no gather, for V16SFmode it probably depends (V64

[PATCH] Don't pass vector params through to offload targets

2025-07-25 Thread Andrew Stubbs
Hi all, The optimization options are deliberately passed through to the LTO compiler, but when the same mechanism is reused for offloading it ends up forcing the host compiler settings onto the device compiler. Maybe this should be removed completely, but this patch just fixes a few of them. In

Re: [Patch] gcn: Add more s_nop for MI300

2025-07-25 Thread Andrew Stubbs
On 25/07/2025 11:54, Tobias Burnus wrote: There are still issues with MI300, some which get resolved by adding s_nop. One case where it is exactly known where the s_nop fixes a fail is for libgomp.c-c++-common/task-detach-10.c, where libgomp/single.c's GOMP_single_start() never returns 1, such t

Re: [Patch] gcn: Add "s_nop"s for MI300

2025-07-25 Thread Andrew Stubbs
On 24/07/2025 20:13, Tobias Burnus wrote: Andrew Stubbs wrote: On 24/07/2025 16:49, Tobias Burnus wrote: Andrew Stubbs wrote: On 24/07/2025 14:25, Tobias Burnus wrote: +/* Device requires CDNA1-style manually inserted wait states for AVGPRs.  */ +#define TARGET_AVGPR_CDNA3_NOPS TARGET_CDNA3

Re: [Patch] gcn: Add "s_nop"s for MI300

2025-07-24 Thread Andrew Stubbs
On 24/07/2025 16:49, Tobias Burnus wrote: Andrew Stubbs wrote: On 24/07/2025 14:25, Tobias Burnus wrote: +/* Device requires CDNA1-style manually inserted wait states for AVGPRs.  */ +#define TARGET_AVGPR_CDNA3_NOPS TARGET_CDNA3 This is not for CDNA1, and not for AVGPRS. I have deleted it

Re: [Patch] gcn: Add "s_nop"s for MI300

2025-07-24 Thread Andrew Stubbs
On 24/07/2025 14:25, Tobias Burnus wrote: Hello Andrew, hello world, some instructions take a bit longer but permit that the next instruction is processed while finishing the work. That works well - and speeds up the calculation - unless the next instruction relies on those being available. Whic

Re: [PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-23 Thread Andrew Stubbs
On 23/07/2025 14:52, Richard Biener wrote: On Wed, Jul 23, 2025 at 3:24 PM Andrew Stubbs wrote: On 23/07/2025 13:24, Richard Biener wrote: On Wed, Jul 23, 2025 at 1:51 PM Andrew Stubbs wrote: From: Julian Brown This patch was originally written by Julian in 2021 for the OG10 branch, but

Re: [PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-23 Thread Andrew Stubbs
On 23/07/2025 13:24, Richard Biener wrote: On Wed, Jul 23, 2025 at 1:51 PM Andrew Stubbs wrote: From: Julian Brown This patch was originally written by Julian in 2021 for the OG10 branch, but does not appear to have been proposed for upstream at that time, or since. I've now forward p

[PATCH] vect: Add target hook to prefer gather/scatter instructions

2025-07-23 Thread Andrew Stubbs
From: Julian Brown This patch was originally written by Julian in 2021 for the OG10 branch, but does not appear to have been proposed for upstream at that time, or since. I've now forward ported it and retested it. Thomas reported test regressions with this patch on the OG14 branch, but I think

[committed 2/3] amdgcn: Add ashlvNm, mulvNm macros

2025-07-21 Thread Andrew Stubbs
I need some extra shift varieties in the mode-independent code, but the macros don't permit insns that don't have QI/HI variants. This fixes the problem, and adds the new functions for the follow-up patch to use. gcc/ChangeLog: * config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF. (

[committed 3/3] amdgcn: add DImode offsets for gather/scatter

2025-07-21 Thread Andrew Stubbs
Add new variant of the gather_load and scatter_store instructions that take the offsets in DImode. This is not the natural width for offsets in the instruction set, but we can use them to compute a vector of absolute addresses, which does work. This enables the autovectorizer to use gather/scatter

[committed 1/3] amdgcn: add more insn patterns using vec_duplicate

2025-07-21 Thread Andrew Stubbs
These new insns allow more efficient use of scalar inputs to 64-bit vector add and mul. Also, the patch adjusts the existing mul.._dup because it was actually a dup2 (the vec_duplicate is on the second input), and that was inconveniently inconsistent. The patterns are generally useful, but will b

[committed 0/3] amdgcn: Add support for DImode offsets to gather/scatter

2025-07-21 Thread Andrew Stubbs
SImode offsets then that would be generally better, but allowing DImode is better than allowing the vectorizer to simply fail when it encounters complex access patterns. When combined with vect_partial_vector_usage=1, this patch gives a good speed-up on the SPEC HPC lbm benchmark. Andrew Stubbs (3

[committed 1/7] amdgcn, libgomp: Remove unused variable (PR121156)

2025-07-18 Thread Andrew Stubbs
There's a new compiler warning breaking the build. This fixes it. The variable appears to be genuinely vestigial. libgomp/ChangeLog: PR target/121156 * config/gcn/bar.c (gomp_team_barrier_wait_end): Remove unused "generation" variable. (gomp_team_barrier_wait_can

[committed] amdgcn: Fix various unrecognized pattern issues with add3_vcc_dup

2025-07-16 Thread Andrew Stubbs
The patterns did not accept inline immediate constants, even though the hardware instructions do, which has led to some errors in some patches I'm working on. Also the VCC update RTL was using the wrong operands in the wrong places. This appears to have been harmless(?) but is definitely not int

[committed] amdgcn: fix vec_ucmp infinite recursion

2025-07-14 Thread Andrew Stubbs
I suppose this pattern doesn't get used much! The unsigned compare was meant to be defined using the signed compare pattern, but actually ended up trying to recursively call itself. This patch fixes the issue in the obvious way. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_cmpudi_exec):

[committed] amdgcn: Don't clobber VCC if we don't need to

2025-07-14 Thread Andrew Stubbs
This is a hold-over from GCN3 where v_add always wrote to the condition register, whether you wanted it or not. This hasn't been true since GCN5, and we dropped support for GCN3 a little while ago, so let's fix it. There was actually a latent bug here because some other post-reload splitters were

Re: [Patch] gcn: Fix glc vs. sc0 handling for scalar memory access

2025-06-24 Thread Andrew Stubbs
On 23/06/2025 22:39, Tobias Burnus wrote: This is more based on documentation reading than on testing as still only limited MI300 testing has been done and seemingly this code does not usually get touched. MI300's "9.1.10 Memory Scope and Temporal Control" distinguishes between scalar memory (9.

[committed] amdgcn: allow SImode in VCC_HI [PR120722]

2025-06-20 Thread Andrew Stubbs
This patch isn't fully tested yet, but it fixes the build failure, so that will do for now. SImode was not allowed in VCC_HI because there were issues, way back before the port went upstream, so it's possible we'll find out what those issues were again soon. gcc/ChangeLog: PR target/1207

Re: [Patch] [+wwwdocs] gcn: Add experimental MI300 (gfx942) support

2025-06-10 Thread Andrew Stubbs
On 10/06/2025 09:49, Tobias Burnus wrote: This adds experimental support for AMD Instinct MI300. It has been tested to support hello world, but not yet much beyond (to come). OK for mainline? I forgot that %G would have to have an operand number. That's not ideal, but using a punctuation mark

Re: [Patch] gcn: Update --with-arch= for newer archs

2025-06-05 Thread Andrew Stubbs
On 05/06/2025 08:48, Tobias Burnus wrote: As a user reported, --with-arch= did not support the newer devices, as we forgot to update the list. While we still have lists to update, this one can be replaced by checking directly against the .def file. There was another list that we didn't update -

Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Thread Andrew Stubbs
On 02/06/2025 15:40, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: The hsa_memory_copy API is known to be slow, so for smaller data sizes it's probably better to have one hsa_memory_copy replace the whole memset than use three API calls, even with setting up some host-side memo

Re: [Patch] libgomp: Add OpenMP's omp_target_memset/omp_target_memset_async [PR120444]

2025-06-02 Thread Andrew Stubbs
On 30/05/2025 23:36, Tobias Burnus wrote: Attached patch adds omp_target_memset and omp_target_memset_async permitting to set (potentially large) data on the device to a certain value - in particular to '\0'. It uses 'memset' on the host (and for shared memory, e.g. via requires unified_shared_m

[PATCH] OpenMP, GCN: Add interop-hsa testcase

2025-04-25 Thread Andrew Stubbs
This testcase ensures that the interop HSA support is sufficient to run a kernel manually on the same device. It reuses an OpenMP kernel in order to avoid all the complication of compiling a custom kernel in Dejagnu (although, this does mean matching the OpenMP runtime environment, which might be

Re: [PATCH] GCN, nvptx offloading: Host/device compatibility: Itanium C++ ABI, DSO Object Destruction API [PR119853, PR119854]

2025-04-24 Thread Andrew Stubbs
On 23/04/2025 20:49, Thomas Schwinge wrote: '__dso_handle' for '__cxa_atexit', '__cxa_finalize'. See . PR target/119853 PR target/119854 libgcc/ * config/gcn/crt0.c (_fini_array): Call '__GCC_of

Re: [PATCH] GCN: Properly switch sections in 'gcn_hsa_declare_function_name' [PR119737]

2025-04-23 Thread Andrew Stubbs
On 22/04/2025 21:41, Thomas Schwinge wrote: From: Andrew Pinski There are GCN/C++ target as well as offloading codes, where the hard-coded section names in 'gcn_hsa_declare_function_name' do not fit, and assembly thus fails: LLVM ERROR: Size expression must be absolute. This commit progr

Re: [wwwdocs][Patch] gcc-15/changes: Fortran + offload (C++) update | project/gomp: GCC 15 update

2025-04-17 Thread Andrew Stubbs
On 17/04/2025 15:10, Tobias Burnus wrote: Hi all, @Fortraners: Comments to the added 'do concurrent' item? @Thomas: Are you fine with this C++ wording? @Andrew: Likewise for C++ and ROCm bump? This part is fine with me. Andrew Anyone: comments are welcome. Affected pages: * https://gcc.

Re: [PATCH] testsuite: force AMDGCN test for vect-early-break_18.c to consistent architecture [PR119286]

2025-04-16 Thread Andrew Stubbs
On 16/04/2025 08:57, Tamar Christina wrote: Hi All, The given test is intended to test vectorization of a strided access done by having a step of > 1. GCN target doesn't support load lanes, so the testcase is expected to fail, other targets create a permuted load here which we then reject.

Re: GCN, nvptx libstdc++: Force use of '__atomic' builtins [PR119645]

2025-04-07 Thread Andrew Stubbs
On 07/04/2025 09:07, Thomas Schwinge wrote: Hi! On 2025-03-14T11:39:20+0100, I wrote: As the first of a few patches to enable libstdc++ for GCN, nvptx targets, [...] some more fine-tuning is to follow later on.) Any comments before I push the attached "GCN, nvptx libstdc++: Force use of '_

Re: [PATCH 2/2] GCN: Don't emit weak undefined symbols [PR119369]

2025-04-04 Thread Andrew Stubbs
On 31/03/2025 10:48, Thomas Schwinge wrote: This resolves all instances of PR119369 "GCN: weak undefined symbols -> execution test FAIL, 'HSA_STATUS_ERROR_VARIABLE_UNDEFINED'"; for all affected test cases, the execution test status progresses FAIL -> PASS. This however also causes a small numbe

Re: [Patch] install.texi: gcn - suggest to use Newlib with simd math fix [PR119325]

2025-03-25 Thread Andrew Stubbs
On 25/03/2025 12:05, Tobias Burnus wrote: A GCC 15 regression turned out to be a bug in Newlib related to undefined behavior that just started to trigger in some cases. As it is now fixed, it makes IMHO sense to mention that Newlib commit in GCC's install documentation for AMD GPUs. Comments, s

Re: [PATCH] Fix GCN SIMD libm bug

2025-03-20 Thread Andrew Stubbs
I meant to send this to the newlib list. Apparently yesterday was a bad day for sending emails correctly. :( Andrew On 19/03/2025 15:04, Andrew Stubbs wrote: Since January, GCC has been miscompiling Newlib libm on AMD GCN due to undefined behaviour in the RESIZE_VECTOR macro. It was "wo

[PATCH] Fix GCN SIMD libm bug

2025-03-19 Thread Andrew Stubbs
Since January, GCC has been miscompiling Newlib libm on AMD GCN due to undefined behaviour in the RESIZE_VECTOR macro. It was "working" but expanding the size of a vector would no longer zero the additional lanes, as it expected. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119325 --- newlib

Re: GCN, nvptx: Allow for "hosted" libstdc++ build

2025-03-14 Thread Andrew Stubbs
On 14/03/2025 10:39, Thomas Schwinge wrote: Hi! As the first of a few patches to enable libstdc++ for GCN, nvptx targets, and eventually for OpenACC, OpenMP offloading use, I intend to push the attached 'GCN, nvptx: Allow for "hosted" libstdc++ build'. Any objections? It's not exactly pretty,

Re: [Patch] libgomp/plugin: Add initial interop support to nvptx + gcn

2025-03-11 Thread Andrew Stubbs
On 10/03/2025 21:48, Tobias Burnus wrote: This patch requires the to be submitted GOMP_interop patch, which handles the generic libgomp parts. But once it is available, this patch adds support for the foreign runtimes cuda/cuda_driver/hip for nvptx and hip/hsa for gcn. The patch is based on my o

Re: [wwwdocs] gcc-15/changes.html: Update AMD GPU (GCN) section for new gfx*

2025-02-14 Thread Andrew Stubbs
On 14/02/2025 09:02, Tobias Burnus wrote: Update https://gcc.gnu.org/gcc-15/changes.html#amdgcn for the newly added generic support and the GPUs compatible with the generic devices. OK? To have clickable links: In the patch, both https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html and ht

Re: [Patch] [gcn] install.texi: Update for new ISA targets and their requirements

2025-02-10 Thread Andrew Stubbs
On 10/02/2025 15:50, Tobias Burnus wrote: Update the GCN install documentation for added ISAs, especially as no longer all supported ISA are enabled by default.  And for the (ROCm wise: upcoming) generic support. OK for mainline? +By default, multilib support is build for @code{gfx900}, @code

Re: [Patch] [gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has

2025-02-10 Thread Andrew Stubbs
On 10/02/2025 15:24, Tobias Burnus wrote: Hi all, Andrew and I discussed about the following: Andrew Stubbs wrote: This business of changing the -march flag from what the user specified is also questionable. Result: the new patch (attached) no longer automatically chooses the associated

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 12:53, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: I think the correct place for this whole concept might be in the MULTILIB_MATCHES configuration option, not in mkoffload. In any case, mkoffload needs to know about this; if only the driver ('gcc') knows ab

Re: [Patch][v2] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 11:44, Tobias Burnus wrote: Andrew Stubbs wrote: On 07/02/2025 09:40, Tobias Burnus wrote: This patch permits loading generic ISA code objects - by just trying whether the runtime accepts it.  If not, it fails with an error. - The error messages should be a bit more helpful in

Re: [Patch] [gcn] mkoffload.cc: Automatically use gfx*-generic if -march has no multilib but generic has

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 10:17, Tobias Burnus wrote: This patch is part of the following series (not yet in mainline); this patch depends on the first one, but only makes sense if both are in: * "[gcn] Add gfx9-generic and generic-associated gfx*"   (email subject: "Re: [Patch] [GCN] Handle generic ISA na

Re: [Patch] [gcn] Fix the output amdhsa.version

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 11:16, Tobias Burnus wrote: Andrew Stubbs wrote: Otherwise, this patch seems fine (I have not reviewed the new magic numbers and settings.) As Andrew mentioned via chat, we also have to update the 'amdhsa.version'. Well, that's what the attached patch does.

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 09:40, Tobias Burnus wrote: This patch is part of the following series (all unreviewed so far) but can be independently applied: * [Patch] [gcn] Fix gfx906's sramecc setting,   https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675251.html * "[gcn] Add gfx9-generic and gener

Re: [Patch, v2] [gcn] Add gfx9-generic and generic-associated gfx*

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 10:37, Tobias Burnus wrote: Andrew Stubbs wrote: The attached patch now adds gfx9-generic - alongside the existing gfx{10-3,1}-generic and all gfx* that are enabled by those. What happened to the documentation patch with the "Experimental" markers? I'm still unc

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-07 Thread Andrew Stubbs
On 07/02/2025 00:25, Tobias Burnus wrote: After spending some time with the debugger, I am now convinced that ROCm 6.3.2 does not yet support generic. The amd-staging branch at https://github.com/ROCm/ROCR-Runtime/ support does, albeit only after the tag rocm-6.3.2. However, the released ROCm 6.3

Re: [Patch] [gcn] Fix gfx906's sramecc setting

2025-02-07 Thread Andrew Stubbs
On 06/02/2025 22:09, Tobias Burnus wrote: ROCm 6.3.2 does not like my patch for reasons that I do not understand; https://gcc.gnu.org/pipermail/gcc-patches/2025-February/675200.html Until that's sorted, I decided to split off two obvious fixes; I might suggest some further changes, but the full

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-05 Thread Andrew Stubbs
On 05/02/2025 12:51, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: On 05/02/2025 11:14, Tobias Burnus wrote: Therefore, the following GPUs are now supported in addition: gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152

Re: [Patch] [GCN] Handle generic ISA names in libgomp's plugin-gcn.c

2025-02-05 Thread Andrew Stubbs
On 05/02/2025 11:14, Tobias Burnus wrote: The number of AMD GPUs is huge - and, unfortunately, every GPU device is potentially slightly different, requiring different code generation either in some dusty corner case or for standard code. As for several GPUs identical code can run (either all or

[committed][OG14] openmp: Fix error reporting in parsing of C++ OpenMP to/from clause

2024-12-06 Thread Andrew Stubbs
From: Kwok Cheung Yeung The final 'else' when checking the motion modifiers is nested one level too deep. This patch should be folded into "OpenMP: Enable 'declare mapper' mappers for 'target update' directives" when merging to mainline. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_cla

Re: GCN: Fix 'real_from_integer' usage

2024-12-06 Thread Andrew Stubbs
On 12/6/24 13:56, Thomas Schwinge wrote: Hi Andrew! On 2024-12-05T15:14:45+0100, I wrote: On 2020-01-31T11:20:14+, Andrew Stubbs wrote: This is one of those things I don't know why we didn't notice sooner. ..., and here's another thing I don't know why we didn&

Re: [PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-20 Thread Andrew Stubbs
On 11/7/24 18:02, Andrew Stubbs wrote: On 07/11/2024 17:57, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new

Re: [Patch] libgomp/plugin/plugin-gcn.c: async-queue init - fix function-return type and fail fatally

2024-11-18 Thread Andrew Stubbs
On 11/18/24 13:23, Tobias Burnus wrote: This fixes a C23 error, causing a build fail: 'false' should have been 'NULL'. The NULL value is not really handled as the code calling maybe_init_omp_async assumes that agent->omp_async_queue can be dereferenced. Hence, besides fixing the false/NULL issu

Re: [Patch] libgomp/plugin/plugin-gcn.c: Show device number in ISA error

2024-11-11 Thread Andrew Stubbs
On 11/11/2024 09:42, Tobias Burnus wrote: Currently, for GCN, only one offload ISA is supported; this might lead to errors when multiple different AMD GPUs are installed on the same system, at least when using the "wrong" device/device number. In case of the testsuite, this occurs for instance w

Re: [PATCH] Add push/pop_function_decl

2024-11-08 Thread Andrew Stubbs
On 08/11/2024 12:25, Richard Sandiford wrote: For the aarch64 simd clones patches, it would be useful to be able to push a function declaration onto the cfun stack, even though it has no function body associated with it. That is, we want cfun to be null, current_function_decl to be the decl itse

Re: [PATCH v4 6/8] gcn: Add else operand to masked loads.

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 17:57, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new predicate. --- gcc/config/gcn/gc

Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 11:07, Jakub Jelinek wrote: On Thu, Nov 07, 2024 at 10:54:40AM +, Andrew Stubbs wrote: On 07/11/2024 00:37, haochen.jiang wrote: d334f729e53867b838e867375b3f475ba793d96e is the first bad commit commit d334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed

Re: [r15-4988 Regression] FAIL: gcc.dg/gomp/max_vf-1.c scan-tree-dump-times ompexp "__builtin_GOMP_parallel_loop_nonmonotonic_dynamic \\(.*, 16, 0\\);" 1 on Linux/x86_64

2024-11-07 Thread Andrew Stubbs
On 07/11/2024 00:37, haochen.jiang wrote: On Linux/x86_64, d334f729e53867b838e867375b3f475ba793d96e is the first bad commit commit d334f729e53867b838e867375b3f475ba793d96e Author: Andrew Stubbs Date: Wed Nov 6 12:26:08 2024 + openmp: Add testcases for omp_max_vf caused FAIL

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 17:59, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 05:53:53PM +, Andrew Stubbs wrote: I'm not sure why I didn't see this. Was it bootstrap tested or just built without bootstrap + tested? Otherwise it is just a warning. Apparently I forgot to rerun the boots

Re: [PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
uses a bootstrap failure for me (and others) on x86_64-linux-gnu: I'm not sure why I didn't see this. I'm testing the attached patch. Andrew From 345eb9b795d9728733bd0e472529e259ce796ff6 Mon Sep 17 00:00:00 2001 From: Andrew Stubbs Date: Wed, 6 Nov 2024 17:50:00 + Subject: [PAT

Re: [PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs
On 06/11/2024 15:41, Jakub Jelinek wrote: On Wed, Nov 06, 2024 at 03:27:22PM +, Andrew Stubbs wrote: Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. This requires en

[PATCH 0/4] openmp: Fix omp_max_vf in offload contexts

2024-11-06 Thread Andrew Stubbs
letely sure if the IFN is overkill. I'm aware that Prathamesh is also working on code in this area. His RFC patch doesn't work for my use-case, and seems to have other issues. This patch conflicts, but hopefully it's not unresolvable. OK for mainline? Andrew Andrew Stubbs (4): ope

[PATCH 1/4] openmp: Tune omp_max_vf for offload targets

2024-11-06 Thread Andrew Stubbs
If requested, return the vectorization factor appropriate for the offload device, if any. This change gives a significant speedup in the BabelStream "dot" benchmark on amdgcn. The omp_adjust_chunk_size usecase is set "false", for now, but I intend to change that in a follow-up patch. Note that N

[PATCH 3/4] openmp: Add IFN_GOMP_MAX_VF

2024-11-06 Thread Andrew Stubbs
Delay omp_max_vf call until after the host and device compilers have diverged so that the max_vf value can be tuned exactly right on both variants. This change means that the ompdevlow pass must be enabled for functions that use OpenMP directives with both "simd" and "schedule" enabled. gcc/Chang

[PATCH 4/4] openmp: Add testcases for omp_max_vf

2024-11-06 Thread Andrew Stubbs
Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when offloading is enabled ("target" directives are present), and is inactive otherwise. This requires enabling the offload-dump scanning features previously only used in the libgomp testsuite. The automake scheme used there

[PATCH 2/4] openmp: use offload max_vf for chunk_size

2024-11-06 Thread Andrew Stubbs
The chunk size for SIMD loops should be right for the current device; too big allocates too much memory, too small is inefficient. Getting it wrong doesn't actually break anything though. This patch attempts to choose the optimal setting based on the context. Both host-fallback and device will g

Re: [PATCH v3 6/8] gcn: Add else operand to masked loads.

2024-11-04 Thread Andrew Stubbs
On 02/11/2024 12:58, Robin Dapp wrote: From: Robin Dapp This patch adds an undefined else operand to the masked loads. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate. * config/gcn/gcn-valu.md: Use new predicate. --- gcc/config/gcn/gc

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-31 Thread Andrew Stubbs
On 30/10/2024 16:06, Andi Kleen wrote: On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote: On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen wrote: From: Andi Kleen Retrieving sys/user time in timevars is quite expensive because it always needs a system call. Only getting the wall time

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 09:39, Andrew Stubbs wrote: On 28/10/2024 20:03, Robin Dapp wrote: I'm not sure how this is different to just deleting the zero-initializer, which is what I already tested and found some random behaviour? The difference is in the else-operand predicate.  So unless there are

Re: [Patch] AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 12:10, Tobias Burnus wrote: Hi Andrew, Am 29.10.24 um 13:07 schrieb Andrew Stubbs: On 29/10/2024 11:44, Tobias Burnus wrote: This somewhat matches what is done in OG13 and in Andrew's patch at https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html albeit the co

Re: [Patch] AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

2024-10-29 Thread Andrew Stubbs
On 29/10/2024 11:44, Tobias Burnus wrote: While users can set HSA_XNACK themselves, it is much more convenient if the compiler sets it for them (at least if it is overridable). Some systems don't have XNACK, but for those that have it, the somewhat newisher object code versions support three mo

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-29 Thread Andrew Stubbs
On 28/10/2024 20:03, Robin Dapp wrote: I'm not sure how this is different to just deleting the zero-initializer, which is what I already tested and found some random behaviour? The difference is in the else-operand predicate. So unless there are more bugs we should only have added VCOND_EXPRs

Re: [Patch][v2] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 22/10/2024 17:29, Tobias Burnus wrote: Andrew Stubbs wrote: I'm going to push the base patch shortly. … which happened in commit r15-4540-ga6b26e5ea09779. Updated patch attached. Some more testing showed that there was an issue with the builtin defines, which has been fixed and

Re: [PATCH v2 6/8] gcn: Add else operand to masked loads.

2024-10-22 Thread Andrew Stubbs
On 18/10/2024 15:22, Robin Dapp wrote: This patch adds an undefined else operand to the masked loads. @@ -4027,7 +4025,8 @@ (define_expand "mask_gather_load" (match_operand: 2 "register_operand") (match_operand 3 "immediate_operand") (match_operand:SI 4 "gcn_alu_operand") - (

Re: [Patch] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 22/10/2024 11:04, Tobias Burnus wrote: Hi Andrew, Andrew Stubbs wrote: On 21/10/2024 20:49, Tobias Burnus wrote: GCN_DEVICE field descriptions: -  0  "name"  (text, external) +  0 Generic flag/version (0 = non-generic, 1 to 255 = generic version, +    external)

Re: [Patch] GCN: Initial generic-target handling

2024-10-22 Thread Andrew Stubbs
On 21/10/2024 20:49, Tobias Burnus wrote: I have now attached a proper version of my patch, which is relative to your patch. OK once your patch is in? GCN_DEVICE field descriptions: - 0 "name" (text, external) + 0 Generic flag/version (0 = non-generic, 1 to 255 = generic ve

[PATCH] amdgcn: Refactor device settings into a def file

2024-10-21 Thread Andrew Stubbs
I'm going to commit this soon, but I'd appreciate if anybody could have a quick look and let me know if anything is obviously broken or doing things the hard way, or something. Thanks! Andrew -- Almost all device-specific settings are now centralised into gcn-devices.def for the

[committed] amdgcn: silence warning

2024-10-21 Thread Andrew Stubbs
FIRST_SGPR_REG is register zero so the compiler always claims this comparison is redundant. It's right, of course, but I'd have preferred to keep the comparison for completeness. Probably the "correct" solution is to use an enum for these values. gcc/ChangeLog: * config/gcn/gcn.h (SGPR_

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-11 Thread Andrew Stubbs
On 10/09/2024 10:43, Andrew Stubbs wrote: On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools.  In case you have more cases, it would be greatly appreciated to verify the series with them.  If you don't mind, would

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-10 Thread Andrew Stubbs
On 06/09/2024 09:47, Robin Dapp wrote: So we only found two instances of this problem and both were related to _Bools. In case you have more cases, it would be greatly appreciated to verify the series with them. If you don't mind, would it be possible to comment out the zeroing, re-run the test

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-06 Thread Andrew Stubbs
On 06/09/2024 08:06, Robin Dapp wrote: There were absolutely problems without this. It's a while ago now, so I'm struggling with the details, but as GCC only applies the mask to selected operations there were all sorts of issues that crept in. Zeroing the undefined lanes seemed to match the middl

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
On Thu, 5 Sept 2024, 21:10 Robin Dapp, wrote: > > > +(define_predicate "maskload_else_operand" > > > + (and (match_code "const_int,const_vector") > > > + (match_test "op == CONST0_RTX (GET_MODE (op))"))) > > > > This forces maskload and mask_gather_load to only accept zero here, but > > in

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs
(Sorry, I missed this because I was on vacation.) On 11/08/2024 22:00, Robin Dapp wrote: This patch adds a zero else operand to the masked loads. The patch is OK, but I have a question below. gcc/ChangeLog: * config/gcn/predicates.md (maskload_else_operand): New predicate.

[committed, wwwdocs] gcc-15: Fiji gfx803 device support removed

2024-09-02 Thread Andrew Stubbs
--- htdocs/gcc-15/changes.html | 7 +++ 1 file changed, 7 insertions(+) diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html index edce138e..7c372688 100644 --- a/htdocs/gcc-15/changes.html +++ b/htdocs/gcc-15/changes.html @@ -123,6 +123,13 @@ a work-in-progress. +AMD Ra

[committed] amdgcn: remove gfx803 "Fiji" support

2024-09-02 Thread Andrew Stubbs
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is

[committed] amdgcn: Remove TARGET_GCN5_PLUS

2024-09-02 Thread Andrew Stubbs
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also redundant and c

[committed] amdgcn: Remove TARGET_GCN3

2024-09-02 Thread Andrew Stubbs
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS

Re: [patch][rfc] libgomp: Add OpenMP interop support to nvptx + gcn plugin

2024-08-27 Thread Andrew Stubbs
On 22/08/2024 19:26, Tobias Burnus wrote: This patch adds OpenMP's interop support to the libgomp plugins (nvptx: cuda, cuda_driver, hip; gcn: hip, hsa).* [The idea is that the user can ask OpenMP to return a foreign-runtime handle (CUdevice, hipCtx_t, …) for a specified OpenMP device numbe

Re: [commit] amdgcn: Re-enable trampolines

2024-08-09 Thread Andrew Stubbs
On 09/08/2024 07:53, Thomas Schwinge wrote: Hi Andrew! On 2024-08-08T13:50:17+, Andrew Stubbs wrote: Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse

[commit] amdgcn: Add padding to trampoline

2024-08-09 Thread Andrew Stubbs
This avoids a -Wpadded warning (testcase gcc.dg/20050607-1.c). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_asm_trampoline_template): Add .align. * config/gcn/gcn.h (TRAMPOLINE_SIZE): Increase to 40. --- gcc/config/gcn/gcn.cc | 1 + gcc/config/gcn/gcn.h | 2 +- 2 files changed, 2 ins

[committed] amdgcn: Fix VGPR max count

2024-08-08 Thread Andrew Stubbs
The metadata for RDNA3 kernels allocates VGPRs in blocks of 12, which means the maximum usable number of registers is 252. This patch prevents the compiler from exceeding this artificial limit. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_conditional_register_usage): Fix registers rema
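
The 252 figure in the message above follows from rounding down to a whole number of allocation blocks. A minimal sketch of that arithmetic, assuming a register file of 256 VGPRs (that total is an assumption, not stated in the message); the helper name is hypothetical and is not taken from gcn.cc:

```c
#include <assert.h>

/* Hypothetical helper, not part of the GCC sources: return the largest
   register count that is a whole number of allocation blocks and does
   not exceed TOTAL_REGS.  */
static unsigned
max_usable_vgprs (unsigned total_regs, unsigned block_size)
{
  /* A zero block size would divide by zero below.  */
  assert (block_size > 0);

  /* Integer division discards the partial block at the top of the file.  */
  return (total_regs / block_size) * block_size;
}
```

Under the assumed total, max_usable_vgprs (256, 12) gives 21 whole blocks, i.e. 252 registers; the remaining four registers cannot be described by the metadata and so are left unused.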

[commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Andrew Stubbs
Previously, trampolines worked on GCN3 devices, but the newer GCN5 devices had different permissions on the stack memory space we were using. That changed when we added the reverse-offload features because we switched from using the "private" memory space to using a regular memory allocation. The

Re: [Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Andrew Stubbs
On 23/07/2024 11:05, Tobias Burnus wrote: Hi Andrew, hi all, to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I suggest the attached patch, which also suggests Thomas' Newlib commit (April 4, 2024) ed50a50b9   amdgcn: Implement proper locks: Fix 'newlib/libc/sys/amdgcn/inclu
