[committed] amdgcn: Switch to HSACO v3 binary format

2020-06-17 Thread Andrew Stubbs
. This move makes the binaries compatible with the new rocgdb from ROCm 3.5. 2020-06-17 Andrew Stubbs gcc/ * config/gcn/gcn-hsa.h (TEXT_SECTION_ASM_OP): Use ".text". (BSS_SECTION_ASM_OP): Use ".bss". (ASM_SPEC): Remove "-mattr=-code-object-v3". (LINK_SPEC): Add

[committed][OG10] amdgcn: Switch to HSACO v3 binary format

2020-06-18 Thread Andrew Stubbs
This patch is now backported to the devel/omp/gcc-10 branch. Andrew On 17/06/2020 10:13, Andrew Stubbs wrote: This upgrades the compiler to emit HSA Code Object v3 binaries.  This means changing the assembler directives, and linker command line options. The gcn-run and libgomp loaders need

Re: [Patch] amdgcn: Silence compile warnings

2020-06-19 Thread Andrew Stubbs
On 19/06/2020 17:00, Tobias Burnus wrote: OK for mainline? OK, thank you. Andrew

[committed] amdgcn: Pass vector parameters in memory

2020-06-22 Thread Andrew Stubbs
This patch ensures that programs using vector extensions to pass vectors to functions pass the vectors in memory. Even though we could technically do this in registers, the ABI would have to be reworked to do so, and there's no call for that yet (maybe if we want vector libgcc/libm). This ch

Re: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC

2020-06-23 Thread Andrew Stubbs
On 23/06/2020 16:21, Tobias Burnus wrote: If the offloading code is (only) in a library, one can come up with the idea to build those parts as shared library – and link it to the nonoffloading code.(*) Currently, this fails as the mkoffload calls the nonoffloading compiler without the -fpic/-fPI

Re: [Patch][gcn, nvptx, offloading] mkoffload – handle -fpic/-fPIC

2020-06-23 Thread Andrew Stubbs
On 23/06/2020 20:36, Thomas Schwinge wrote: Eventually (not now...), instead of special-casing more and more options (I somehow doubt that '-fpic', '-fPIC' are the only ones?), shouldn't we solve this in some more generic way, like re-invoking the host compiler exactly as invoked before (if that

[committed] amdgcn: Support basic DWARF

2020-06-29 Thread Andrew Stubbs
This patch configures the DWARF debug output to match the proposed DWARF specification from AMD. This is already implemented in LLVM and rocgdb (out of tree). This makes no attempt to support CFI, yet, and has some issues with vector registers. GCC will need to support some DWARF extensions t

Re: [PATCH] [og10] amdgcn: Add waitcnt after LDS write instructions

2020-06-29 Thread Andrew Stubbs
On 29/06/2020 21:16, Julian Brown wrote: Data-share write (ds_write) instructions do not necessarily complete the write to LDS immediately. When a write completes, LGKM_CNT is decremented. For now, we wait until LGKM_CNT reaches zero after each ds_write instruction. This fixes a race condition i

[PATCH] OpenMP: Disable GPU threads when only teams are used

2020-07-02 Thread Andrew Stubbs
. Co-Authored-By: Andrew Stubbs diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index 0f07e51f7e8..6afe18d5ee0 100644 --- a/gcc/omp-expand.c +++ b/gcc/omp-expand.c @@ -8461,10 +8461,22 @@ push_target_argument_according_to_value (gimple_stmt_iterator *gsi, int device, } } +static bool

Re: [PATCH] OpenMP: Disable GPU threads when only teams are used

2020-07-02 Thread Andrew Stubbs
On 02/07/2020 18:00, Jakub Jelinek wrote: On Thu, Jul 02, 2020 at 05:15:20PM +0100, Andrew Stubbs wrote: This patch, originally by Kwok, auto-adjusts the default OpenMP target arguments to set num_threads(1) when there are no parallel regions. There may still be multiple teams in this case

[committed] amdgcn: Add fold_left_plus vector reductions

2020-07-03 Thread Andrew Stubbs
This patch implements a floating-point fold_left_plus vector pattern, which gives a significant speed-up in the BabelStream "dot" benchmark. The GCN architecture can't actually do an in-order vector reduction any more efficiently than that equivalent scalar algorithm, so this is a bit of a che
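A minimal sketch of why an in-order reduction cannot simply be replaced by a faster tree-shaped one: floating-point addition is not associative, so the two orders can give different results. The function names below are hypothetical illustrations and are not taken from the patch.

```c
#include <stddef.h>

/* In-order (fold-left) sum: ((v[0] + v[1]) + v[2]) + ...
   The volatile accumulator forces each partial sum to be rounded to
   float, as a scalar loop storing to memory would.  */
float
fold_left_sum (const float *v, size_t n)
{
  volatile float acc = 0.0f;
  for (size_t i = 0; i < n; i++)
    acc = acc + v[i];
  return acc;
}

/* Pairwise (tree) sum, the shape a parallel reduction would normally
   take.  Requires n >= 1.  */
float
pairwise_sum (const float *v, size_t n)
{
  if (n == 1)
    return v[0];
  volatile float lo = pairwise_sum (v, n / 2);
  volatile float hi = pairwise_sum (v + n / 2, n - n / 2);
  volatile float sum = lo + hi;
  return sum;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f} the in-order sum is 1.0f while the pairwise sum is 0.0f, because 1.0f is lost when added to a value of magnitude 1e8 in single precision.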

[committed][OG10] amdgcn: Add fold_left_plus vector reductions

2020-07-03 Thread Andrew Stubbs
Now backported to OG10. Andrew On 03/07/2020 11:11, Andrew Stubbs wrote: This patch implements a floating-point fold_left_plus vector pattern, which gives a significant speed-up in the BabelStream "dot" benchmark. The GCN architecture can't actually do an in-order vector redu

[committed] amdgcn: Tweak plugin-gcn defines

2020-11-24 Thread Andrew Stubbs
This tiny patch just cleans up some defines in the libgomp GCN plugin. The code wouldn't compile, as it was, if elf.h is updated to support GCN relocations. This should fix that. The other user, mkoffload.c, was already fixed. Andrew Tweak plugin-gcn.c defines Ensure the code will continue t

Re: [PATCH] middle-end/97579 - lower VECTOR_BOOLEAN_TYPE_P VEC_COND_EXPRs

2020-11-25 Thread Andrew Stubbs
On 25/11/2020 11:36, Richard Biener wrote: This makes sure to lower VECTOR_BOOLEAN_TYPE_P typed VEC_COND_EXPRs so we don't try to use vcond to expand those. That's especially important for x86 integer mode boolean vectors but eventually as well for aarch64 / gcn VnBImode ones. GCN does not hav

[committed] Fix atomic_capture-1.f90 testcase

2020-11-25 Thread Andrew Stubbs
This libgomp OpenACC testcase makes assumptions about the order in which loop iterations will run that are invalid on amdgcn. Apparently nvptx does work that way, but I find that surprising in itself. For example, this patch ensures that where a test expects one bit left set, or unset, then it

[committed] amdgcn: Fix early-debug relocations

2020-11-26 Thread Andrew Stubbs
This patch fixes an error in GCN mkoffload that corrupted relocations in the early-debug info. The code now updates the relocation code without zeroing the symbol index. Andrew Fix early-debug relocations The relocation symbols were inadvertently wiped when the type was set in mkoffload. gcc/

Re: [PATCH] configury : Fix LEB128 support for non-GNU assemblers.

2020-11-26 Thread Andrew Stubbs
On 26/11/2020 14:48, Iain Sandoe wrote: Rainer Orth wrote: unfortunately, Solaris/SPARC results are miserable: So without further investigation, we cannot use the leb128 directives with Solaris/SPARC as. I think Andrew was running GCN (not sure of the results there) - but, I suppose tha

Re: [PATCH] amdgcn: Add builtins for vectorized native versions of abs, floorf and floor

2022-11-08 Thread Andrew Stubbs
On 08/11/2022 14:35, Kwok Cheung Yeung wrote: Hello This patch adds three extra builtins for the vectorized forms of the abs, floorf and floor math functions, which are implemented by native GCN instructions. I have also added a test to check that they generate the expected assembler instruct

Re: [patch] gcn: Add __builtin_gcn_kernarg_ptr

2022-11-16 Thread Andrew Stubbs
On 16/11/2022 11:42, Tobias Burnus wrote: This is a part of a patch by Andrew (hi!) - namely that part that only adds the __builtin_gcn_kernarg_ptr. More is planned, see below. The short term benefit of this patch is to permit replacing hardcoded numbers by a builtin – like in libgomp (see pa

[committed] amdgcn: update target-supports.exp

2023-04-20 Thread Andrew Stubbs
Recent patches have enabled new capabilities on AMD GCN, but not all the testsuite features were enabled. The hardfp divide patch actually had a test regression because the expected results were too conservative. This patch corrects both issues. Andrew amdgcn: update target-supports.exp The b

[committed] amdgcn: bug fix ldexp insn

2023-04-20 Thread Andrew Stubbs
The hardfp division patch exposed a flaw in the ldexp pattern at -O0; the compiler was trying to use out-of-range immediates on VOP3 instruction encodings. This patch changes the constraints appropriately, and also takes the opportunity to combine the two patterns into one using the newly ava

[committed][OG10] amdgcn, openmp: Fix concurrency in low-latency allocator

2023-04-20 Thread Andrew Stubbs
I've committed this to the devel/omp/gcc-12 branch. The patch fixes a concurrency issue where the spin-locks didn't work well if many GPU threads tried to free low-latency memory all at once. Adding a short sleep instruction is enough for the hardware thread to yield and allow another to proc
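A minimal host-side sketch of the idea described above: a spin-lock whose wait loop yields instead of spinning tightly, so that a contending thread gives the lock holder a chance to make progress. This is an assumption-laden illustration, not the libgomp allocator code; the `sketch_lock` names are hypothetical, and `sched_yield` stands in for the short GPU sleep instruction the message mentions.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration only.  */
typedef struct
{
  atomic_flag held;
} sketch_lock;

static void
sketch_lock_init (sketch_lock *l)
{
  atomic_flag_clear (&l->held);
}

/* Try once; return 1 if the lock was acquired, 0 if it was already held.  */
static int
sketch_lock_try (sketch_lock *l)
{
  return !atomic_flag_test_and_set_explicit (&l->held, memory_order_acquire);
}

/* Spin until acquired, yielding between attempts so other threads
   (including the current holder) can run.  */
static void
sketch_lock_acquire (sketch_lock *l)
{
  while (!sketch_lock_try (l))
    sched_yield ();
}

static void
sketch_lock_release (sketch_lock *l)
{
  atomic_flag_clear_explicit (&l->held, memory_order_release);
}
```

On a CPU the yield is mostly a politeness; the message suggests that on the GPU a short sleep is what lets the hardware thread yield and allow another to proceed.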

[committed] amdgcn: Fix addsub bug

2023-04-27 Thread Andrew Stubbs
I've committed this patch to fix a couple of bugs introduced in the recent CMul patch. First, the fmsubadd insn was accidentally all adds and no subtracts. Second, there were input dependencies on the undefined output register which caused the compiler to reserve unnecessary slots in the stac

Re: [Patch] GCN: Silence unused-variable warning

2023-05-05 Thread Andrew Stubbs
On 05/05/2023 12:10, Tobias Burnus wrote: Probably added for symmetry with out_mode/out_n but at the end not used. That function was added in commit   r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math functions Tested the removal by building with that patch applied. OK for mainl

Re: [PATCH] amdgcn: Add instruction pattern for conditional shift operations

2023-02-02 Thread Andrew Stubbs
On 01/02/2023 15:35, Paul-Antoine Arras wrote: This patch introduces an instruction pattern for conditional shift operations (cond_{ashl|ashr|lshr}) in the GCN machine description. Tested on GCN3 Fiji gfx803. OK to commit? The changelog will need to be wrapped to 80 columns. OK otherwise. A

[committed] amdgcn, libgomp: Manually allocated stacks

2023-02-02 Thread Andrew Stubbs
I've committed this patch to change the ways stacks are initialized on amdgcn. The patch only touches GCN files, or the GCN-only portions of libgomp files, so I'm allowing it despite stage 4 because I want the ABI change done for GCC 13, and because it enables Tobias's reverse offload-patch tha

Re: [Patch] libgomp: enable reverse offload for AMDGCN

2023-02-02 Thread Andrew Stubbs
On 02/02/2023 14:59, Tobias Burnus wrote: Maybe it becomes better reviewable with an attached patch ... On 02.02.23 15:31, Tobias Burnus wrote: Now that the stack handling has been changed for AMDGCN, this patch enables reverse offload. (cf. today's "[committed] amdgcn, libgomp: Manually alloca

[committed] amdgcn: Pass -mstack-size through to runtime

2023-02-06 Thread Andrew Stubbs
The -mstack-size option has been marked obsolete in favour of setting an environment variable at runtime ("GCN_STACK_SIZE"), but some testcases still need the option set or they have stack overflow. I could change them to use the envvar, but my testing setup uses remote execute which doesn't su

Re: [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc

2023-02-10 Thread Andrew Stubbs
On 10/02/2023 14:21, Thomas Schwinge wrote: Is the correct fix the following (conceptually like 'linux_memspace_alloc' cited above), or is there something that I fail to understand? static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) {

Re: [PATCH] libgomp, openmp: pinned memory

2023-02-10 Thread Andrew Stubbs
On 10/02/2023 15:11, Thomas Schwinge wrote: Hi! Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc': On 2022-01-13T13:53:03+0000, Andrew Stubbs wrote: On 05/01/2022 17:07, Andrew Stubbs wrote: [...], I'm working on an implementation using mmap ins

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support Attached is an attempt to update invoke.texi I've deliberately avoided the MI100 and MI200 names because they're really not that simple. MI100 is gfx908, but MI150 is

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 25/05/2022 12:16, Tobias Burnus wrote: On 25.05.22 11:18, Andrew Stubbs wrote: On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support I've deliberately avoided the MI100 and MI200 names because they're really not that simple

Re: [committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-06-06 Thread Andrew Stubbs
On 27/05/2022 20:16, Thomas Schwinge wrote: Hi Andrew! On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote: I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
't know how that'll handle heterogeneous systems, but those ought to be rare. I don't think libmemkind will resolve this performance issue, although certainly it can be used for host implementations of low-latency memories, etc. Andrew On 13/01/2022 13:53, Andrew Stubbs wrote:

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
On 07/06/2022 13:10, Jakub Jelinek wrote: On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote: Following some feedback from users of the OG11 branch I think I need to withdraw this patch, for now. The memory pinned via the mlock call does not give the expected performance boost. I

[committed] amdgcn: remove obsolete assembler workarounds

2022-06-27 Thread Andrew Stubbs
This patch removed some workarounds that were required for old versions of the LLVM assembler. The minimum supported version is now 13.0.1 so the workarounds are no longer needed. Andrew amdgcn: remove obsolete assembler workarounds This nonsense is no longer required, now that the minimum sup

[committed] amdgcn: test global constructors

2022-06-27 Thread Andrew Stubbs
This setting is way out of date; global constructors have worked on GCN for a while now. Andrew amdgcn: test global constructors The tests are disabled for historical reasons only. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_global_constructor): R

[committed][OG11] amdgcn, openmp: Unified Shared Memory

2022-06-27 Thread Andrew Stubbs
I've pushed these three patches to the devel/omp/gcc-11 branch ("OG11"). I'll be submitting mainline versions soonish. The patches add a means to track "requires unified_shared_memory" from the frontend, through the backend compiler, and on to the runtime, plus all the bits needed to implement

Re: [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

2022-06-28 Thread Andrew Stubbs
On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote: + switch (memspace) +{ +case omp_high_bw_mem_space: +#ifdef LIBGOMP_USE_MEMKIND + struct gomp_memkind_data *memkind_data; + memkind_data = gomp_get_memkind (); + if (data.partition == omp_atv_interleaved + &

Re: [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

2022-06-29 Thread Andrew Stubbs
On 29/06/2022 11:45, Jakub Jelinek wrote: And omp_init_allocator needs to decide what to do if one asks for features that need memkind as well as for features that need whatever you/Abid have been working on. A possible resolution is punt (return omp_null_allocator), or prefer one feature over t

[committed] amdgcn: Silence warnings in gcn.c

2021-03-18 Thread Andrew Stubbs
This patch has no functional changes; it merely cleans up some warning messages. Thanks to Jan-Benedict for pointing them out, off-list. Andrew amdgcn: Silence warnings in gcn.c This fixes a few cases of "unquoted identifier or keyword", one "spurious trailing punctuation sequence", and a "m

Re: [committed] amdgcn: Silence warnings in gcn.c

2021-03-19 Thread Andrew Stubbs
This follow-up fixes a typo in the placement of the close quote. Thanks to Tobias for pointing it out. Andrew On 18/03/2021 17:41, Andrew Stubbs wrote: This patch has no functional changes; it merely cleans up some warning messages. Thanks to Jan-Benedict for pointing them out, off-list

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-16 Thread Andrew Stubbs
On 15/04/2021 18:26, Thomas Schwinge wrote: and optimisation, since shared memory might be faster than the main memory on a GPU. Do we potentially have a problem that making more use of (scarce) gang-private memory may negatively affect performance, because potentially fewer OpenACC gangs may th

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-18 Thread Andrew Stubbs
On 16/04/2021 18:30, Thomas Schwinge wrote: Hi! On 2021-04-16T17:05:24+0100, Andrew Stubbs wrote: On 15/04/2021 18:26, Thomas Schwinge wrote: and optimisation, since shared memory might be faster than the main memory on a GPU. Do we potentially have a problem that making more use of

Re: [PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-14 Thread Andrew Stubbs
On 14/06/2021 13:36, Julian Brown wrote: On Wed, 9 Jun 2021 16:47:21 +0200 Marcel Vollweiler wrote: This patch fixes an issue with global_load assembler functions leading to an "invalid operand for instruction" error since in different LLVM versions those functions use either one or two registe

Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders

2021-06-18 Thread Andrew Stubbs
s lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.md (mulsi

Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation

2021-06-18 Thread Andrew Stubbs
o fix up the result afterwards. These patterns are lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Ju

Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN

2021-06-18 Thread Andrew Stubbs
h alternatives for all operations that might be needed). Those gaps are filled in by this patch, or by the preceding patches in the series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/

Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return

2021-06-18 Thread Andrew Stubbs
On 18/06/2021 15:19, Julian Brown wrote: This patch changes the argument and return types for the libgcc __udivsi3 and __umodsi3 helper functions for GCN to USItype instead of SItype. This is probably just cosmetic in practice. I can probably self-approve this, but I'll give Andrew Stu

Re: [PATCH 1/3] [amdgcn] Update CFI configuration

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: Currently we don't get any call frame information for the amdgcn target. This patch makes necessary adjustments to generate CFI that can work with ROCGDB (ROCm 3.8+). gcc/ * config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for

Re: [PATCH 2/3] [amdgcn] Use frame pointer for CFA expressions.

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: As size of address is bigger than registers in amdgcn, we are forced to use DW_CFA_def_cfa_expression to make an expression that concatenates multiple registers for the value of the CFA. This then prohibits us from using many of the dwarf ops which e

Re: [PATCH 3/3] [amdgcn] Add hook for DWARF address spaces.

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: Map GCN address spaces to the proposed DWARF address spaces defined by AMD at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table gcc/ * config/gcn/gcn.c: Include dwarf2.h. (gcn_addr_space_debug): New func

Re: [wwwdocs] gcc-12/changes.html: OpenMP + GCN update

2021-06-23 Thread Andrew Stubbs
On 23/06/2021 10:53, Tobias Burnus wrote: + additionally the following features which were available in C and C++ + before: depobj, mutexinoutset and I realise that you did not invent this awkward wording, but I'd prefer ... "the following features that were previously only availabl

Re: [PATCH] [GCN] Fix handling of VCC_CONDITIONAL_REG

2019-11-14 Thread Andrew Stubbs
On 14/11/2019 12:43, Kwok Cheung Yeung wrote: Hello This patch fixes an issue seen in the following test cases on AMD GCN: libgomp.oacc-fortran/gemm.f90 libgomp.oacc-fortran/gemm-2.f90 libgomp.c/for-5-test_ttdpfs_ds128_auto.c libgomp.c/for-5-test_ttdpfs_ds128_guided32.c libgomp.c/for-5-test_ttd

[patch, libgomp] Enable OpenACC GCN testing

2019-11-14 Thread Andrew Stubbs
Hi, This patch adds some necessary bits to enable OpenACC testing for amdgcn offloading. The two "check_effective" procedures are not actually needed yet, but later patches to test cases will use them. OK to commit? Thanks Andrew Enable OpenACC GCN testing. 2019-11-14 And

[patch, libgomp] Add tests for print from offload target

2019-11-14 Thread Andrew Stubbs
om offload kernels is not recommended in production, but can be useful in development. OK to commit? Thanks Andrew Add tests for print from offload target. 2019-11-14 Andrew Stubbs libgomp/ * testsuite/libgomp.c/target-print-1.c: New file. * testsuite/libgomp.fortran/target-print-1.f9

Re: [patch, libgomp] Add tests for print from offload target

2019-11-14 Thread Andrew Stubbs
On 14/11/2019 17:05, Jakub Jelinek wrote: On Thu, Nov 14, 2019 at 04:47:49PM +0000, Andrew Stubbs wrote: This patch adds new libgomp tests to ensure that C "printf" and Fortran "write" work correctly within offload kernels. Both should work for amdgcn, but nvptx uses the

Re: [PATCH 1/5] [amdgcn] Use first lane of v1 for zero constant

2019-11-15 Thread Andrew Stubbs
On 14/11/2019 15:30, Kwok Cheung Yeung wrote: GCN 5 has commonly-used global memory instructions that specify the address as [SGPR address] + [VGPR offset] + [constant offset], and we often want the VGPR offset to be zero, so v0 is currently reserved for that purpose. However, v1 contains [0,

Re: [PATCH 2/5] [amdgcn] Reinitialize registers for every function

2019-11-15 Thread Andrew Stubbs
On 14/11/2019 15:30, Kwok Cheung Yeung wrote: The set of fixed registers is adjusted by the TARGET_CONDITIONAL_REGISTER_USAGE hook, but this needs to be done on a per-function basis, whereas the hook is normally called once during GCC initialization before any functions have been processed (whi

Re: [PATCH 3/5] [amdgcn] Restrict register usage in non-kernel functions

2019-11-15 Thread Andrew Stubbs
On 14/11/2019 15:32, Kwok Cheung Yeung wrote: This patch restricts non-kernel functions to using a maximum of 64 SGPRs and 24 VGPRs. Kernels can request various pieces of information from the HSA runtime, and these will be loaded into the registers consecutively before the kernel executes. Th

Re: [PATCH 4/5] [amdgcn] Update lower limits requested by non-leaf kernels

2019-11-15 Thread Andrew Stubbs
On 14/11/2019 15:33, Kwok Cheung Yeung wrote: The kernel attributes are changed to request at least 64 SGPRs and 24 VGPRs (i.e. the non-kernel maximum, otherwise the callees may not have enough registers to run in) for non-leaf kernels to take advantage of the reduced number of registers used i

Re: [PATCH 5/5] [amdgcn] Unfix frame pointer

2019-11-15 Thread Andrew Stubbs
On 14/11/2019 15:34, Kwok Cheung Yeung wrote: This patch unfixes the registers for the hard frame pointer so that they can be used for other purposes if the frame pointer is not in use. This patch is dependent on the commit 'Support using multiple registers to hold the frame pointer' (r277895)

Re: [patch, libgomp] Enable OpenACC GCN testing

2019-11-15 Thread Andrew Stubbs
On 15/11/2019 12:21, Jakub Jelinek wrote: On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote: This patch adds some necessary bits to enable OpenACC testing for amdgcn offloading. The two "check_effective" procedures are not actually needed yet, but later patches to test

Re: [patch, libgomp] Enable OpenACC GCN testing

2019-11-15 Thread Andrew Stubbs
On 15/11/2019 12:43, Jakub Jelinek wrote: APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have their own memory. A DGPU can access host memory, provided that it has been set up just so, but that is very slow, and I don't know of a way to do that without still having to copy

Re: [PATCH 4/5] [amdgcn] Update lower limits requested by non-leaf kernels

2019-11-15 Thread Andrew Stubbs
On 15/11/2019 15:51, Kwok Cheung Yeung wrote: On 15/11/2019 11:32 am, Andrew Stubbs wrote: On 14/11/2019 15:33, Kwok Cheung Yeung wrote: The kernel attributes are changed to request at least 64 SGPRs and 24 VGPRs (i.e. the non-kernel maximum, otherwise the callees may not have enough

Re: [PATCH 08/13] Fix host-to-device copies from rodata for AMD GCN

2019-11-18 Thread Andrew Stubbs
On 15/11/2019 21:44, Julian Brown wrote: +static void +hsa_memory_copy_wrapper (void *dst, const void *src, size_t len) +{ + hsa_status_t status = hsa_fns.hsa_memory_copy_fn (dst, src, len); + + if (status == HSA_STATUS_SUCCESS) +return; + + /* It appears that the copy fails if the source

Re: [PATCH 09/13] AMD GCN libgomp plugin queue-full condition locking fix

2019-11-18 Thread Andrew Stubbs
On 15/11/2019 21:44, Julian Brown wrote: @@ -2732,13 +2732,9 @@ wait_for_queue_nonfull (struct goacc_asyncqueue *aq) { if (aq->queue_n == ASYNC_QUEUE_SIZE) { - pthread_mutex_lock (&aq->mutex); - /* Queue is full. Wait for it to not be full. */ while (aq->queue_n

Re: [PATCH 11/13] AMD GCN symbol output with null cfun

2019-11-18 Thread Andrew Stubbs
On 15/11/2019 21:44, Julian Brown wrote: This patch checks that cfun is valid in the gcn_asm_output_symbol_ref function. This prevents a crash when that function is called with NULL cfun, i.e. when outputting debug symbols. OK? OK, although that FIXME still baffles me. Andrew

Re: [PATCH 13/13] Enable worker partitioning for AMD GCN

2019-11-18 Thread Andrew Stubbs
On 15/11/2019 21:44, Julian Brown wrote: This patch flips the switch to enable worker partitioning on AMD GCN. OK? This is OK, although I think we could just remove that flag now. Andrew

[patch, openacc] Adjust tests for amdgcn offloading

2019-11-19 Thread Andrew Stubbs
This patch adds GCN special casing for most of the OpenACC libgomp tests that require it. It also disables one testcase that explicitly uses CUDA. OK to commit? Andrew Update OpenACC tests for amdgcn 2019-11-19 Andrew Stubbs libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1

[committed] Update loop-1.c test for amdgcn

2019-11-19 Thread Andrew Stubbs
sn't match. The code is still correct for the purpose of the testcase either way, however, so I'm removing the over-fussy match. Andrew Update loop-1.c test for amdgcn 2019-11-19 Andrew Stubbs gcc/testsuite/ * gcc.dg/tree-ssa/loop-1.c: Change amdgcn assembler scan. diff --git a/

[patch, openacc] Fix ICE verifying gimple

2019-11-22 Thread Andrew Stubbs
The attached patch assigns the "(int) x" to a temporary and passes that to the function instead. OK to commit? -- Andrew Stubbs CodeSourcery / Mentor Graphics Normalize GOACC_parallel_keyed async parameter. 2019-11-22 Andrew Stubbs gcc/ * omp-expand.c (expand_omp_target): Pass s

[committed, amdgcn] Use GFX9 granulated sgprs count correctly

2019-11-22 Thread Andrew Stubbs
I've committed the attached. The patch adjusts the GCN kernel metadata so that it is correct for GFX9 devices. The existing implementation was correct for GFX8, and seems to work on GFX9, but wasn't technically correct. -- Andrew Stubbs CodeSourcery / Mentor Graphics Use GFX9

[committed, amdgcn] Limit LDS usage

2019-11-22 Thread Andrew Stubbs
ocation remains unchanged for non-offload compiles (this is only really used for running the testsuite). -- Andrew Stubbs CodeSourcery / Mentor Graphics Limit LDS usage. 2019-11-22 Andrew Stubbs gcc/ * config/gcn/gcn.c (OMP_LDS_SIZE): Define. (ACC_LDS_SIZE): Define. (OTHER_LDS_SIZE): Def

Re: [Patch][amdgcn] Silence warnings + add gcc_unreachable()

2019-11-25 Thread Andrew Stubbs
On 25/11/2019 11:14, Tobias Burnus wrote: This patch adds "gcc_unreachable ();" as suggested by me (cf. below). It also silences the -Wunused-variable + 'no return statement' warnings. OK for the trunk? OK. Thanks, Tobias. Andrew

Re: [Patch] config/gcn/mkoffload.c – remove unused static vars

2019-11-25 Thread Andrew Stubbs
On 25/11/2019 14:17, Tobias Burnus wrote: The compiler warns that funcs_tail and vars_tails are unused – they, funcs_ids/var_ids and struct id_map seem to be copy-n-paste leftovers from gcc/config/nvptx/mkoffload.c. Additionally, COMMENT_PREFIX does not seem to be used anywhere. (In the who

Re: Host/device shared memory

2019-12-02 Thread Andrew Stubbs
On 02/12/2019 14:23, Thomas Schwinge wrote: Hi! On 2019-11-15T13:43:04+0100, Jakub Jelinek wrote: On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote: On 15/11/2019 12:21, Jakub Jelinek wrote: I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory offlo

Re: [patch, libgomp] Enable OpenACC GCN testing

2019-12-03 Thread Andrew Stubbs
wed-by tag, sorry). Andrew Enable OpenACC GCN testing. 2019-12-03 Andrew Stubbs libgomp/ * testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type): Recognize amdgcn. (check_effective_target_openacc_amdgcn_accel_present): New proc. (check_effective_target_openacc_amdgcn_acce

Re: [PATCH 4/7 libgomp,amdgcn] GCN libgomp port

2019-12-03 Thread Andrew Stubbs
On 02/12/2019 14:43, Thomas Schwinge wrote: Hi! On 2019-11-12T13:29:13+0000, Andrew Stubbs wrote: --- a/include/gomp-constants.h +++ b/include/gomp-constants.h @@ -174,6 +174,7 @@ enum gomp_map_kind #define GOMP_DEVICE_NVIDIA_PTX 5 #define GOMP_DEVICE_INTEL_MIC 6

[amdgcn] Add missing vcondu patterns

2019-12-03 Thread Andrew Stubbs
tcase now compiles, although not quite correctly, but that's another issue (pr92772). Andrew Add missing amdgcn vcondu patterns 2019-12-03 Andrew Stubbs gcc/ * config/gcn/gcn-valu.md: Change "vcondu" patterns to use VEC_1REG_MODE for the data mode. diff --git a/gcc/config/g

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Andrew Stubbs
On 18/11/2022 17:20, Tobias Burnus wrote: This patch adds two builtins (getting end-of-stack pointer and a Boolean answer whether it was the first call to the builtin on this thread). The idea is to replace some hard-coded values in newlib, permitting to move later to a manually allocated stac

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Andrew Stubbs
On 18/11/2022 17:41, Tobias Burnus wrote: Attached is the updated/rediffed version, which now uses the builtin instead of the 'asm("s8"). The code in principle works; that is: If no private stack variables are copied, it works. Or in other words: reverse-offload target regions that don't use

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-19 Thread Andrew Stubbs
On 19/11/2022 10:46, Tobias Burnus wrote: On 18.11.22 18:49, Andrew Stubbs wrote: On 18/11/2022 17:20, Tobias Burnus wrote: This looks wrong: +    /* stackbase = (stack_segment_decr & 0x) +    + stack_wave_offset); +   seg_size = dispatch_ptr->private_segme

Re: [Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)

2022-11-21 Thread Andrew Stubbs
On 21/11/2022 13:40, Tobias Burnus wrote: Working on the builtins, I realized that I mixed up (again) bits and bytes. While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of 128 bytes. Thus, there is sufficient space for 16 pointer-size/uint64_t values but I only need 6. T
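The bits-versus-bytes arithmetic in the message above can be checked directly with `sizeof`. This is a small sketch; the helper names are hypothetical and exist only to make the three quantities explicit.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bits of 'uint64_t var[2]'.  */
static size_t
bits_of_u64_pair (void)
{
  return sizeof (uint64_t[2]) * CHAR_BIT;
}

/* Size in bytes of 'char var[128]'.  */
static size_t
bytes_of_char_128 (void)
{
  return sizeof (char[128]);
}

/* Number of uint64_t (pointer-size on a 64-bit target) values that fit
   in 'char var[128]'.  */
static size_t
u64_slots_in_char_128 (void)
{
  return sizeof (char[128]) / sizeof (uint64_t);
}
```

On a platform with 8-bit bytes these evaluate to 128 bits, 128 bytes, and 16 slots respectively, matching the figures quoted in the message.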

Re: [Patch] gcn: Fix __builtin_gcn_first_call_this_thread_p

2022-11-28 Thread Andrew Stubbs
On 28/11/2022 07:40, Tobias Burnus wrote: It turned out that cprop cleverly propagated the unspec_volatile to the preceding (pseudo)register, permitting to remove the 'set (s0) (pseudoregister)' at -O2.  Unfortunately, it does matter whether the assignment is done to 's2' (previously: pseudoregis

Re: [PATCH] amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors

2022-11-29 Thread Andrew Stubbs
On 29/11/2022 15:56, Paul-Antoine Arras wrote: Hi all, This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, so as to be consistent with LLVM. It also adds test cases checking all supported AMD ISAs are properly recognised when used in a 'declare variant' co

Re: [Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)

2022-11-30 Thread Andrew Stubbs
On 29/11/2022 18:26, Tobias Burnus wrote: Hi PA, hi Andrew, hi Jakub, hi all, On 29.11.22 16:56, Paul-Antoine Arras wrote: This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, [...] I think this should be documented somewhere. We have https://gcc.gnu.org/on

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-11-30 Thread Andrew Stubbs
On 09/09/2022 15:31, Jakub Jelinek wrote: --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) tree fndecl = gimple_call_fndecl (stmt); if (fndecl) { + /* We can vectorize some builtins and

Re: [PATCH][OG12] amdgcn: Support AMD-specific 'isa' and 'arch' traits in OpenMP context selectors

2022-12-01 Thread Andrew Stubbs
On 01/12/2022 11:10, Paul-Antoine Arras wrote: + if (TARGET_FIJI) \ + builtin_define ("__FIJI__"); \ + else if (TARGET_VEGA10) \ +

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-12-01 Thread Andrew Stubbs
On 30/11/2022 15:37, Jakub Jelinek wrote: On Wed, Nov 30, 2022 at 03:17:30PM +, Andrew Stubbs wrote: --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -

Re: [PATCH] amdgcn: Add preprocessor builtins for every processor type

2022-12-01 Thread Andrew Stubbs
On 01/12/2022 14:35, Paul-Antoine Arras wrote: I believe this patch addresses your comments regarding the GCN bits. The new builtins are consistent with the LLVM naming convention (lower case, canonical name). For gfx803, I also kept '__fiji__' to be consistent with -march=fiji. Is it OK for
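
As a rough illustration of how such a preprocessor builtin might be consumed, the sketch below tests the '__fiji__' macro named in the message. The function name is hypothetical, and no other processor macros are shown because their exact spellings are not given in this excerpt.

```c
/* Hypothetical helper: report which processor this translation unit was
   compiled for, based on the '__fiji__' builtin mentioned in the message.
   On any other target (including the host) it falls through to "unknown".  */
const char *
gcn_processor_name (void)
{
#if defined (__fiji__)
  return "fiji";
#else
  return "unknown";
#endif
}
```

The same pattern would extend to the other per-processor macros once their names are known.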

Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Andrew Stubbs
On 08/12/2022 12:11, Jakub Jelinek wrote: On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when

Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Andrew Stubbs
On 08/12/2022 14:02, Tobias Burnus wrote: On 08.12.22 13:51, Andrew Stubbs wrote: On 08/12/2022 12:11, Jakub Jelinek wrote: On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall.  Pinned allocations are

[PATCH] OpenMP front-end: allow requires dynamic_allocators

2021-12-20 Thread Andrew Stubbs
Hi all, This patch removes the "sorry" message for the OpenMP "requires dynamic_allocators" feature in C, C++ and Fortran. The clause is supposed to state that the user code will not work without the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and these things *are* prese

[PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2021-12-20 Thread Andrew Stubbs
This patch is submitted now for review and so I can commit a backport of it to the OG11 branch, but isn't suitable for mainline until stage 1. The patch implements support for omp_low_lat_mem_space and omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, omp_cgroup_mem_alloc a

[PATCH] nvptx: bump default to PTX 4.1

2021-12-21 Thread Andrew Stubbs
On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out, privately, that the default version is both documented and encoded in the -mptx

[OG11][PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-22 Thread Andrew Stubbs
This is now backported to the devel/omp/gcc-11 branch (OG11). Andrew On 09/12/2021 11:41, Andrew Stubbs wrote: On 02/12/2021 16:43, Jakub Jelinek wrote: On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote: On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub

[PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs
This patch implements the OpenMP pinned memory trait for Linux hosts. On other hosts and on devices the trait becomes a no-op (instead of being rejected). The memory is locked via the mlock syscall, which is both the "correct" way to do it on Linux, and a problem because the default ulimit for
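
The general technique described here, pinning memory with mlock on an mmap-backed allocation (the related thread notes that mmap rather than malloc is used so the memory can be unpinned safely), can be sketched as follows. This is a minimal illustration with hypothetical helper names, not the libgomp implementation.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map anonymous memory and try to lock it into RAM.
   Returns NULL if the mapping or the lock fails (for example, when the
   request exceeds the process's locked-memory limit).  */
void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Could not pin; release the mapping rather than return
         unpinned memory.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Hypothetical helper: unmapping the region also removes its lock.  */
void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

Because the lock applies to whole pages, giving each pinned allocation its own mapping avoids unlocking pages that another allocation still relies on.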

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs
On 04/01/2022 15:55, Jakub Jelinek wrote: The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but instead add libgomp/config/linux/allocator.c that includes some headers, defines some macros and then includes the generic allocator.c. OK, good point, I can do that. I think

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Andrew Stubbs
On 05/01/2022 10:24, Tom de Vries wrote: On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out
