From: Cesar Philippidis <ce...@codesourcery.com>

This patch teaches nvptx_single to always use barrier '0' for CTA
synchronization. This started off as a cosmetic change, but later on
each large vector (i.e. one that is larger than a PTX warp) will need
to use its own unique thread barrier to avoid thread divergence.
Consequently, this patch begins the process of teaching the nvptx
state propagator how to use a common thread barrier for each
propagation level.

2018-XX-YY  Cesar Philippidis  <ce...@codesourcery.com>

	gcc/
	* config/nvptx/nvptx.c (nvptx_single): Always pass false to
	nvptx_cta_sync.
	(nvptx_process_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit ac0a55b8e72363a09f7968474744c51c1fa7720a)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4d46d89..1f954a6 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4374,7 +4374,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
           /* This barrier is needed to avoid worker zero clobbering
              the broadcast buffer before all the other workers have
              had a chance to read this instance of it.  */
-          emit_insn_before (nvptx_cta_sync (true), tail);
+          emit_insn_before (nvptx_cta_sync (false), tail);
         }
 
       extract_insn (tail);
@@ -4501,7 +4501,7 @@ nvptx_process_pars (parallel *par)
         {
           /* Insert begin and end synchronizations.  */
           emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
-          emit_insn_before (nvptx_cta_sync (true), par->join_insn);
+          emit_insn_before (nvptx_cta_sync (false), par->join_insn);
         }
     }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
-- 
2.7.4