I've committed the following 5-patch series to the amonakov/gomp-nvptx git branch. The first two patches are unrelated fixes to previously landed code. Patches 3-5 reorganize the way initial soft-stack setup is done.

Previously, soft stacks were allocated by libgomp/config/nvptx/team.c in a function that wrapped gomp_nvptx_main. However:

- the default device heap is only 8 MB, which is not enough for multiple teams with 128 KiB per-warp stacks; the libgomp plugin would need to increase the heap size;
- the device heap persists between launches, so it's possible to leak soft-stack allocations if a team exits without cleaning up;
- device malloc is rather slow, so I'd like to eliminate or reuse device allocations as much as possible; it's easier to arrange reuse of soft-stack storage from the host side;
- there's a chicken-and-egg problem with setting up soft stacks from C code.

So the above motivates a transition to a scheme where the libgomp core is oblivious to soft-stack setup, and instead the storage is allocated from the libgomp plugin (via cuMemAlloc) and passed to the compiler-emitted entry function as the 2nd (base pointer) and 3rd (per-warp size) arguments.

This obviously addresses bullets 1-2 above; bullet 4 is addressed since the entry code is emitted in assembly from the backend; and bullet 3 is left to a followup change: cuMemAlloc is roughly as slow on the host as malloc is on the device, but we should be able to reuse allocations on the host.

This changes the binary interface between the libgomp plugin (GOMP_OFFLOAD_run) and compiler-emitted kernel entry functions for OpenMP target regions. For now, I am free to do that on the branch without worries, but if a similar change is required in the future after a release, the libgomp plugin should be able to detect which arguments the entry expects. Assuming the argument list is only appended to, the libgomp plugin only needs to know the argument count. So a possible solution is to invent a tagging mechanism when the change needs to be made, and provide the default 3 arguments to untagged entries.
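
To make the tagging idea above concrete, here is a minimal sketch of how a plugin might pick the argument count per entry. Nothing like this exists on the branch: the struct, the macro, and the function names are all hypothetical, and the only assumptions taken from the text are that the argument list is append-only and that untagged entries get the default 3 arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: argument count for entries that carry no tag
   (the default 3 arguments mentioned in the text).  */
#define DEFAULT_ENTRY_ARGC 3

/* Hypothetical per-entry descriptor; argc_tag == 0 means "untagged".  */
struct entry_desc
{
  const char *name;
  unsigned argc_tag;
};

/* Return how many arguments the plugin should pass to entry E.  */
static unsigned
entry_argc (const struct entry_desc *e)
{
  return e->argc_tag ? e->argc_tag : DEFAULT_ENTRY_ARGC;
}

/* Copy the first entry_argc (E) pointers from ALL_ARGS into ARGS, laid out
   like the kernelParams array given to cuLaunchKernel.  N_KNOWN is the
   number of arguments this plugin knows how to provide.  Return the count
   copied, or 0 if the entry expects more than the plugin can supply.  */
static unsigned
build_entry_args (const struct entry_desc *e, void **all_args,
                  unsigned n_known, void **args)
{
  assert (e != NULL && all_args != NULL && args != NULL);
  unsigned n = entry_argc (e);
  if (n > n_known)
    return 0;
  for (unsigned i = 0; i < n; i++)
    args[i] = all_args[i];
  return n;
}
```

Because the list only grows at the end, a newer plugin can serve older entries by truncating its argument array, and it can refuse to launch an entry whose tag exceeds what it knows how to provide.
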
Old libgomp plugins unaware of the change should be able to detect failure to provide sufficient arguments to entries emitted from new compiler from the failure of cuLaunchKernel.

Alexander Monakov (5):
  libgomp plugin: correct types
  Revert "nvptx plugin: bump heap size to 1GB"
  nvptx backend: set up stacks in entry code
  libgomp: remove __nvptx_stacks setup code
  libgomp plugin: manage soft-stack storage

 gcc/ChangeLog.gomp-nvptx      |  6 +++++
 gcc/config/nvptx/nvptx.c      | 57 ++++++++++++++++++++++++++++++------------
 libgomp/ChangeLog.gomp-nvptx  | 26 +++++++++++++++++++
 libgomp/config/nvptx/team.c   | 31 ++++-------------------
 libgomp/plugin/plugin-nvptx.c | 58 +++++++++++++++++++++++++++++++++++--------
 5 files changed, 126 insertions(+), 52 deletions(-)