From: Cesar Philippidis <ce...@codesourcery.com>

This patch teaches nvptx_single to always use barrier '0' for CTA
synchronization. This started off as a cosmetic change, but later on
each large vector (i.e. one that is larger than a PTX warp) will need to
use its own unique thread barrier to avoid thread divergence.
Consequently, this patch begins the process of teaching the nvptx
state propagator how to use a common thread barrier for each
propagation level.
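
For readers unfamiliar with the helper, the effect of the argument
change can be sketched roughly as below. This is an illustrative model
only, not code from nvptx.c: it assumes that nvptx_cta_sync forwards
its boolean argument as the numeric operand of a PTX bar.sync
instruction, and the names barrier_operand and format_bar_sync are
hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: assume the boolean passed to nvptx_cta_sync
   becomes the barrier number, so false selects barrier 0 and true
   selects barrier 1.  */
static int
barrier_operand (bool after)
{
  return after ? 1 : 0;
}

/* Format the PTX instruction this model would emit into BUF.
   Returns the value of snprintf.  */
static int
format_bar_sync (char *buf, size_t len, bool after)
{
  return snprintf (buf, len, "bar.sync %d;", barrier_operand (after));
}
```

Under that assumption, passing false at every call site means all of
these synchronization points share barrier 0, which leaves the other
barrier numbers free for later use.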

2018-XX-YY  Cesar Philippidis  <ce...@codesourcery.com>

        gcc/
        * config/nvptx/nvptx.c (nvptx_single): Always pass false to
        nvptx_cta_sync.
        (nvptx_process_pars): Likewise.

(cherry picked from openacc-gcc-7-branch commit
ac0a55b8e72363a09f7968474744c51c1fa7720a)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 4d46d89..1f954a6 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4374,7 +4374,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
          /* This barrier is needed to avoid worker zero clobbering
             the broadcast buffer before all the other workers have
             had a chance to read this instance of it.  */
-         emit_insn_before (nvptx_cta_sync (true), tail);
+         emit_insn_before (nvptx_cta_sync (false), tail);
        }
 
       extract_insn (tail);
@@ -4501,7 +4501,7 @@ nvptx_process_pars (parallel *par)
        {
          /* Insert begin and end synchronizations.  */
          emit_insn_before (nvptx_cta_sync (false), par->forked_insn);
-         emit_insn_before (nvptx_cta_sync (true), par->join_insn);
+         emit_insn_before (nvptx_cta_sync (false), par->join_insn);
        }
     }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))
-- 
2.7.4
