Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled

2017-10-31 Thread Iago Toral
On Mon, 2017-10-30 at 11:34 -0700, Jason Ekstrand wrote:
> On Mon, Oct 30, 2017 at 12:33 AM, Iago Toral 
> wrote:
> > On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote:
> > > On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral 
> > > wrote:
> > > > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> > > > 
> > > > > Previously, brw_nir_lower_intrinsics added the param and then
> > > > emitted
> > > > 
> > > > > a
> > > > 
> > > > > load_uniform intrinsic to load it directly.  This commit
> > > > switches
> > > > 
> > > > > things
> > > > 
> > > > > over to use a specific NIR intrinsic for the thread id.  The
> > > > one
> > > > 
> > > > > thing I
> > > > 
> > > > > don't like about this approach is that we have to copy
> > > > 
> > > > > thread_local_id
> > > > 
> > > > > over to the new visitor in import_uniforms.
> > > > 
> > > > 
> > > > 
> > > > It is not clear to me why you are doing this... why do you like
> > > > this
> > > > 
> > > > better?
> > > > 
> > > 
> > > For compute shaders, the SPIR-V subgroups stuff has a
> > > gl_subgroupId system value which subgroup in the dispatch you
> > > are.  That information is basically the same as the
> > > thread_local_id only off by a factor of the SIMD size.  It's
> > > fairly arbitrary, but I figured we might as well switch over to
> > > pushing the value that's defined in SPIR-V.
> > 
> > Oh, my question was not about pushing the subgroup id instead of
> > the thread local id (that is actually done in a later patch, not
> > here) it is about using a system value and changing the place where
> > we push that last uniform, which is what you change here. The
> > implementation seems exactly equivalent to what we had prior to
> > this patch, so I was wondering if there is any practical advantage
> > in doing it like this.
> 
> Not really.  It just seemed like, if we have a nir_load_* system
> value intrinsic, we may as well treat it as a system value like
> everything else.  Assuming it doesn't cause too much pain, I think
> I'd be ok with dropping this if you really want.

Not at all, I was just curious if there was another  reason for this
that I was missing. I am fine with keeping this.
Iago
> > Iago
> > > > > ---
> > > > 
> > > > >  src/compiler/nir/nir_intrinsics.h|  3 ++
> > > > 
> > > > >  src/intel/compiler/brw_fs.cpp|  4 +-
> > > > 
> > > > >  src/intel/compiler/brw_fs.h  |  1 +
> > > > 
> > > > >  src/intel/compiler/brw_fs_nir.cpp| 14
> > > > +++
> > > > 
> > > > >  src/intel/compiler/brw_nir.h |  3 +-
> > > > 
> > > > >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +-
> > > > 
> > > > 
> > > > > --
> > > > 
> > > > >  6 files changed, 32 insertions(+), 46 deletions(-)
> > > > 
> > > > >
> > > > 
> > > > > diff --git a/src/compiler/nir/nir_intrinsics.h
> > > > 
> > > > > b/src/compiler/nir/nir_intrinsics.h
> > > > 
> > > > > index cefd18b..47022dd 100644
> > > > 
> > > > > --- a/src/compiler/nir/nir_intrinsics.h
> > > > 
> > > > > +++ b/src/compiler/nir/nir_intrinsics.h
> > > > 
> > > > > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float,
> > > > 1, 0, xx,
> > > > 
> > > > > xx, xx)
> > > > 
> > > > >  SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx,
> > > > xx)
> > > > 
> > > > >  SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx,
> > > > xx)
> > > > 
> > > > >  
> > > > 
> > > > > +/* Intel specific system values */
> > > > 
> > > > > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> > > > 
> > > > > +
> > > > 
> > > > >  /**
> > > > 
> > > > >   * Barycentric coordinate intrinsics.
> > > > 
> > > > >   *
> > > > 
> > > > > diff --git a/src/intel/compiler/brw_fs.cpp
> > > > 
> > > > > b/src/intel/compiler/brw_fs.cpp
> > > > 
> > > > > index 2acd838..c0d4c05 100644
> > > > 
> > > > > --- a/src/intel/compiler/brw_fs.cpp
> > > > 
> > > > > +++ b/src/intel/compiler/brw_fs.cpp
> > > > 
> > > > > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor
> > > > *v)
> > > > 
> > > > > this->push_constant_loc = v->push_constant_loc;
> > > > 
> > > > > this->pull_constant_loc = v->pull_constant_loc;
> > > > 
> > > > > this->uniforms = v->uniforms;
> > > > 
> > > > > +   this->thread_local_id = v->thread_local_id;
> > > > 
> > > > >  }
> > > > 
> > > > >  
> > > > 
> > > > >  void
> > > > 
> > > > > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct
> > > > brw_compiler
> > > > 
> > > > > *compiler, void *log_data,
> > > > 
> > > > >  {
> > > > 
> > > > > nir_shader *shader = nir_shader_clone(mem_ctx,
> > > > src_shader);
> > > > 
> > > > > shader = brw_nir_apply_sampler_key(shader, compiler,
> > > > &key->tex,
> > > > 
> > > > > true);
> > > > 
> > > > > -
> > > > 
> > > > > -   brw_nir_lower_cs_intrinsics(shader, prog_data);
> > > > 
> > > > > +   brw_nir_lower_cs_intrinsics(shader);
> > > > 
> > > > > shader = brw_postprocess_nir(shader, compiler, true);
> > > > 
> 

Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled

2017-10-30 Thread Jason Ekstrand
On Mon, Oct 30, 2017 at 12:33 AM, Iago Toral  wrote:

> On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote:
>
> On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral  wrote:
>
> On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> > Previously, brw_nir_lower_intrinsics added the param and then emitted
> > a
> > load_uniform intrinsic to load it directly.  This commit switches
> > things
> > over to use a specific NIR intrinsic for the thread id.  The one
> > thing I
> > don't like about this approach is that we have to copy
> > thread_local_id
> > over to the new visitor in import_uniforms.
>
> It is not clear to me why you are doing this... why do you like this
> better?
>
>
> For compute shaders, the SPIR-V subgroups stuff has a gl_subgroupId system
> value which subgroup in the dispatch you are.  That information is
> basically the same as the thread_local_id only off by a factor of the SIMD
> size.  It's fairly arbitrary, but I figured we might as well switch over to
> pushing the value that's defined in SPIR-V.
>
>
> Oh, my question was not about pushing the subgroup id instead of the
> thread local id (that is actually done in a later patch, not here) it is
> about using a system value and changing the place where we push that last
> uniform, which is what you change here. The implementation seems exactly
> equivalent to what we had prior to this patch, so I was wondering if there
> is any practical advantage in doing it like this.
>

Not really.  It just seemed like, if we have a nir_load_* system value
intrinsic, we may as well treat it as a system value like everything else.
Assuming it doesn't cause too much pain, I think I'd be ok with dropping
this if you really want.


> Iago
>
> > ---
> >  src/compiler/nir/nir_intrinsics.h|  3 ++
> >  src/intel/compiler/brw_fs.cpp|  4 +-
> >  src/intel/compiler/brw_fs.h  |  1 +
> >  src/intel/compiler/brw_fs_nir.cpp| 14 +++
> >  src/intel/compiler/brw_nir.h |  3 +-
> >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +-
> > --
> >  6 files changed, 32 insertions(+), 46 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_intrinsics.h
> > b/src/compiler/nir/nir_intrinsics.h
> > index cefd18b..47022dd 100644
> > --- a/src/compiler/nir/nir_intrinsics.h
> > +++ b/src/compiler/nir/nir_intrinsics.h
> > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float, 1, 0, xx,
> > xx, xx)
> >  SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx, xx)
> >  SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx, xx)
> >
> > +/* Intel specific system values */
> > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> > +
> >  /**
> >   * Barycentric coordinate intrinsics.
> >   *
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > index 2acd838..c0d4c05 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor *v)
> > this->push_constant_loc = v->push_constant_loc;
> > this->pull_constant_loc = v->pull_constant_loc;
> > this->uniforms = v->uniforms;
> > +   this->thread_local_id = v->thread_local_id;
> >  }
> >
> >  void
> > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler
> > *compiler, void *log_data,
> >  {
> > nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex,
> > true);
> > -
> > -   brw_nir_lower_cs_intrinsics(shader, prog_data);
> > +   brw_nir_lower_cs_intrinsics(shader);
> > shader = brw_postprocess_nir(shader, compiler, true);
> >
> > prog_data->local_size[0] = shader->info.cs.local_size[0];
> > diff --git a/src/intel/compiler/brw_fs.h
> > b/src/intel/compiler/brw_fs.h
> > index da32593..f51a4d8 100644
> > --- a/src/intel/compiler/brw_fs.h
> > +++ b/src/intel/compiler/brw_fs.h
> > @@ -315,6 +315,7 @@ public:
> >  */
> > int *push_constant_loc;
> >
> > +   fs_reg thread_local_id;
> > fs_reg frag_depth;
> > fs_reg frag_stencil;
> > fs_reg sample_mask;
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 05efee3..fdc6fc6 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms()
> > }
> >
> > uniforms = nir->num_uniforms / 4;
> > +
> > +   if (stage == MESA_SHADER_COMPUTE) {
> > +  /* Add a uniform for the thread local id.  It must be the last
> > uniform
> > +   * on the list.
> > +   */
> > +  assert(uniforms == prog_data->nr_params);
> > +  uint32_t *param = brw_stage_prog_data_add_params(prog_data,
> > 1);
> > +  *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID;
> > +  thread_local_id = fs_reg(UNIFORM, uniforms++,
> > BRW_REGISTER_TYPE_UD);
> > +   }
> >  }
> >
> >  static bool
> > @@ -3409,6 +3419

Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled

2017-10-30 Thread Iago Toral
On Fri, 2017-10-27 at 12:37 -0700, Jason Ekstrand wrote:
> On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral 
> wrote:
> > On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> > 
> > > Previously, brw_nir_lower_intrinsics added the param and then
> > emitted
> > 
> > > a
> > 
> > > load_uniform intrinsic to load it directly.  This commit switches
> > 
> > > things
> > 
> > > over to use a specific NIR intrinsic for the thread id.  The one
> > 
> > > thing I
> > 
> > > don't like about this approach is that we have to copy
> > 
> > > thread_local_id
> > 
> > > over to the new visitor in import_uniforms.
> > 
> > 
> > 
> > It is not clear to me why you are doing this... why do you like
> > this
> > 
> > better?
> > 
> 
> For compute shaders, the SPIR-V subgroups stuff has a gl_subgroupId
> system value which subgroup in the dispatch you are.  That
> information is basically the same as the thread_local_id only off by
> a factor of the SIMD size.  It's fairly arbitrary, but I figured we
> might as well switch over to pushing the value that's defined in
> SPIR-V.

Oh, my question was not about pushing the subgroup id instead of the
thread local id (that is actually done in a later patch, not here) it
is about using a system value and changing the place where we push that
last uniform, which is what you change here. The implementation seems
exactly equivalent to what we had prior to this patch, so I was
wondering if there is any practical advantage in doing it like this.
Iago
> > > ---
> > 
> > >  src/compiler/nir/nir_intrinsics.h|  3 ++
> > 
> > >  src/intel/compiler/brw_fs.cpp|  4 +-
> > 
> > >  src/intel/compiler/brw_fs.h  |  1 +
> > 
> > >  src/intel/compiler/brw_fs_nir.cpp| 14 +++
> > 
> > >  src/intel/compiler/brw_nir.h |  3 +-
> > 
> > >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +-
> > 
> > 
> > > --
> > 
> > >  6 files changed, 32 insertions(+), 46 deletions(-)
> > 
> > >
> > 
> > > diff --git a/src/compiler/nir/nir_intrinsics.h
> > 
> > > b/src/compiler/nir/nir_intrinsics.h
> > 
> > > index cefd18b..47022dd 100644
> > 
> > > --- a/src/compiler/nir/nir_intrinsics.h
> > 
> > > +++ b/src/compiler/nir/nir_intrinsics.h
> > 
> > > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float, 1, 0,
> > xx,
> > 
> > > xx, xx)
> > 
> > >  SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx, xx)
> > 
> > >  SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx, xx)
> > 
> > >  
> > 
> > > +/* Intel specific system values */
> > 
> > > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> > 
> > > +
> > 
> > >  /**
> > 
> > >   * Barycentric coordinate intrinsics.
> > 
> > >   *
> > 
> > > diff --git a/src/intel/compiler/brw_fs.cpp
> > 
> > > b/src/intel/compiler/brw_fs.cpp
> > 
> > > index 2acd838..c0d4c05 100644
> > 
> > > --- a/src/intel/compiler/brw_fs.cpp
> > 
> > > +++ b/src/intel/compiler/brw_fs.cpp
> > 
> > > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor *v)
> > 
> > > this->push_constant_loc = v->push_constant_loc;
> > 
> > > this->pull_constant_loc = v->pull_constant_loc;
> > 
> > > this->uniforms = v->uniforms;
> > 
> > > +   this->thread_local_id = v->thread_local_id;
> > 
> > >  }
> > 
> > >  
> > 
> > >  void
> > 
> > > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler
> > 
> > > *compiler, void *log_data,
> > 
> > >  {
> > 
> > > nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > 
> > > shader = brw_nir_apply_sampler_key(shader, compiler, &key-
> > >tex,
> > 
> > > true);
> > 
> > > -
> > 
> > > -   brw_nir_lower_cs_intrinsics(shader, prog_data);
> > 
> > > +   brw_nir_lower_cs_intrinsics(shader);
> > 
> > > shader = brw_postprocess_nir(shader, compiler, true);
> > 
> > >  
> > 
> > > prog_data->local_size[0] = shader->info.cs.local_size[0];
> > 
> > > diff --git a/src/intel/compiler/brw_fs.h
> > 
> > > b/src/intel/compiler/brw_fs.h
> > 
> > > index da32593..f51a4d8 100644
> > 
> > > --- a/src/intel/compiler/brw_fs.h
> > 
> > > +++ b/src/intel/compiler/brw_fs.h
> > 
> > > @@ -315,6 +315,7 @@ public:
> > 
> > >  */
> > 
> > > int *push_constant_loc;
> > 
> > >  
> > 
> > > +   fs_reg thread_local_id;
> > 
> > > fs_reg frag_depth;
> > 
> > > fs_reg frag_stencil;
> > 
> > > fs_reg sample_mask;
> > 
> > > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > 
> > > b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > > index 05efee3..fdc6fc6 100644
> > 
> > > --- a/src/intel/compiler/brw_fs_nir.cpp
> > 
> > > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > > @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms()
> > 
> > > }
> > 
> > >  
> > 
> > > uniforms = nir->num_uniforms / 4;
> > 
> > > +
> > 
> > > +   if (stage == MESA_SHADER_COMPUTE) {
> > 
> > > +  /* Add a uniform for the thread local id.  It must be the
> > last
> > 
> > > uniform
> > 
> > > +   * on the list.

Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled

2017-10-27 Thread Jason Ekstrand
On Fri, Oct 27, 2017 at 2:11 AM, Iago Toral  wrote:

> On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> > Previously, brw_nir_lower_intrinsics added the param and then emitted
> > a
> > load_uniform intrinsic to load it directly.  This commit switches
> > things
> > over to use a specific NIR intrinsic for the thread id.  The one
> > thing I
> > don't like about this approach is that we have to copy
> > thread_local_id
> > over to the new visitor in import_uniforms.
>
> It is not clear to me why you are doing this... why do you like this
> better?
>

For compute shaders, the SPIR-V subgroups stuff has a gl_subgroupId system
value which subgroup in the dispatch you are.  That information is
basically the same as the thread_local_id only off by a factor of the SIMD
size.  It's fairly arbitrary, but I figured we might as well switch over to
pushing the value that's defined in SPIR-V.


> > ---
> >  src/compiler/nir/nir_intrinsics.h|  3 ++
> >  src/intel/compiler/brw_fs.cpp|  4 +-
> >  src/intel/compiler/brw_fs.h  |  1 +
> >  src/intel/compiler/brw_fs_nir.cpp| 14 +++
> >  src/intel/compiler/brw_nir.h |  3 +-
> >  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +-
> > --
> >  6 files changed, 32 insertions(+), 46 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_intrinsics.h
> > b/src/compiler/nir/nir_intrinsics.h
> > index cefd18b..47022dd 100644
> > --- a/src/compiler/nir/nir_intrinsics.h
> > +++ b/src/compiler/nir/nir_intrinsics.h
> > @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float, 1, 0, xx,
> > xx, xx)
> >  SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx, xx)
> >  SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx, xx)
> >
> > +/* Intel specific system values */
> > +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> > +
> >  /**
> >   * Barycentric coordinate intrinsics.
> >   *
> > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > index 2acd838..c0d4c05 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor *v)
> > this->push_constant_loc = v->push_constant_loc;
> > this->pull_constant_loc = v->pull_constant_loc;
> > this->uniforms = v->uniforms;
> > +   this->thread_local_id = v->thread_local_id;
> >  }
> >
> >  void
> > @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler
> > *compiler, void *log_data,
> >  {
> > nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> > shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex,
> > true);
> > -
> > -   brw_nir_lower_cs_intrinsics(shader, prog_data);
> > +   brw_nir_lower_cs_intrinsics(shader);
> > shader = brw_postprocess_nir(shader, compiler, true);
> >
> > prog_data->local_size[0] = shader->info.cs.local_size[0];
> > diff --git a/src/intel/compiler/brw_fs.h
> > b/src/intel/compiler/brw_fs.h
> > index da32593..f51a4d8 100644
> > --- a/src/intel/compiler/brw_fs.h
> > +++ b/src/intel/compiler/brw_fs.h
> > @@ -315,6 +315,7 @@ public:
> >  */
> > int *push_constant_loc;
> >
> > +   fs_reg thread_local_id;
> > fs_reg frag_depth;
> > fs_reg frag_stencil;
> > fs_reg sample_mask;
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 05efee3..fdc6fc6 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms()
> > }
> >
> > uniforms = nir->num_uniforms / 4;
> > +
> > +   if (stage == MESA_SHADER_COMPUTE) {
> > +  /* Add a uniform for the thread local id.  It must be the last
> > uniform
> > +   * on the list.
> > +   */
> > +  assert(uniforms == prog_data->nr_params);
> > +  uint32_t *param = brw_stage_prog_data_add_params(prog_data,
> > 1);
> > +  *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID;
> > +  thread_local_id = fs_reg(UNIFORM, uniforms++,
> > BRW_REGISTER_TYPE_UD);
> > +   }
> >  }
> >
> >  static bool
> > @@ -3409,6 +3419,10 @@ fs_visitor::nir_emit_cs_intrinsic(const
> > fs_builder &bld,
> >cs_prog_data->uses_barrier = true;
> >break;
> >
> > +   case nir_intrinsic_load_intel_thread_local_id:
> > +  bld.MOV(retype(dest, BRW_REGISTER_TYPE_UD), thread_local_id);
> > +  break;
> > +
> > case nir_intrinsic_load_local_invocation_id:
> > case nir_intrinsic_load_work_group_id: {
> >gl_system_value sv = nir_system_value_from_intrinsic(instr-
> > >intrinsic);
> > diff --git a/src/intel/compiler/brw_nir.h
> > b/src/intel/compiler/brw_nir.h
> > index 1493b74..3e40712 100644
> > --- a/src/intel/compiler/brw_nir.h
> > +++ b/src/intel/compiler/brw_nir.h
> > @@ -95,8 +95,7 @@ void brw_nir_analyze_boolean_resolves(nir_shader
> > *nir);
> >  nir_shader *brw_preprocess_nir(const struct brw_compiler 

Re: [Mesa-dev] [PATCH v3 29/48] intel/cs: Rework the way thread local ID is handled

2017-10-27 Thread Iago Toral
On Wed, 2017-10-25 at 16:26 -0700, Jason Ekstrand wrote:
> Previously, brw_nir_lower_intrinsics added the param and then emitted
> a
> load_uniform intrinsic to load it directly.  This commit switches
> things
> over to use a specific NIR intrinsic for the thread id.  The one
> thing I
> don't like about this approach is that we have to copy
> thread_local_id
> over to the new visitor in import_uniforms.

It is not clear to me why you are doing this... why do you like this
better?

> ---
>  src/compiler/nir/nir_intrinsics.h|  3 ++
>  src/intel/compiler/brw_fs.cpp|  4 +-
>  src/intel/compiler/brw_fs.h  |  1 +
>  src/intel/compiler/brw_fs_nir.cpp| 14 +++
>  src/intel/compiler/brw_nir.h |  3 +-
>  src/intel/compiler/brw_nir_lower_cs_intrinsics.c | 53 +-
> --
>  6 files changed, 32 insertions(+), 46 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_intrinsics.h
> b/src/compiler/nir/nir_intrinsics.h
> index cefd18b..47022dd 100644
> --- a/src/compiler/nir/nir_intrinsics.h
> +++ b/src/compiler/nir/nir_intrinsics.h
> @@ -364,6 +364,9 @@ SYSTEM_VALUE(blend_const_color_a_float, 1, 0, xx,
> xx, xx)
>  SYSTEM_VALUE(blend_const_color_rgba_unorm, 1, 0, xx, xx, xx)
>  SYSTEM_VALUE(blend_const_color__unorm, 1, 0, xx, xx, xx)
>  
> +/* Intel specific system values */
> +SYSTEM_VALUE(intel_thread_local_id, 1, 0, xx, xx, xx)
> +
>  /**
>   * Barycentric coordinate intrinsics.
>   *
> diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> index 2acd838..c0d4c05 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -996,6 +996,7 @@ fs_visitor::import_uniforms(fs_visitor *v)
> this->push_constant_loc = v->push_constant_loc;
> this->pull_constant_loc = v->pull_constant_loc;
> this->uniforms = v->uniforms;
> +   this->thread_local_id = v->thread_local_id;
>  }
>  
>  void
> @@ -6781,8 +6782,7 @@ brw_compile_cs(const struct brw_compiler
> *compiler, void *log_data,
>  {
> nir_shader *shader = nir_shader_clone(mem_ctx, src_shader);
> shader = brw_nir_apply_sampler_key(shader, compiler, &key->tex,
> true);
> -
> -   brw_nir_lower_cs_intrinsics(shader, prog_data);
> +   brw_nir_lower_cs_intrinsics(shader);
> shader = brw_postprocess_nir(shader, compiler, true);
>  
> prog_data->local_size[0] = shader->info.cs.local_size[0];
> diff --git a/src/intel/compiler/brw_fs.h
> b/src/intel/compiler/brw_fs.h
> index da32593..f51a4d8 100644
> --- a/src/intel/compiler/brw_fs.h
> +++ b/src/intel/compiler/brw_fs.h
> @@ -315,6 +315,7 @@ public:
>  */
> int *push_constant_loc;
>  
> +   fs_reg thread_local_id;
> fs_reg frag_depth;
> fs_reg frag_stencil;
> fs_reg sample_mask;
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 05efee3..fdc6fc6 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -88,6 +88,16 @@ fs_visitor::nir_setup_uniforms()
> }
>  
> uniforms = nir->num_uniforms / 4;
> +
> +   if (stage == MESA_SHADER_COMPUTE) {
> +  /* Add a uniform for the thread local id.  It must be the last
> uniform
> +   * on the list.
> +   */
> +  assert(uniforms == prog_data->nr_params);
> +  uint32_t *param = brw_stage_prog_data_add_params(prog_data,
> 1);
> +  *param = BRW_PARAM_BUILTIN_THREAD_LOCAL_ID;
> +  thread_local_id = fs_reg(UNIFORM, uniforms++,
> BRW_REGISTER_TYPE_UD);
> +   }
>  }
>  
>  static bool
> @@ -3409,6 +3419,10 @@ fs_visitor::nir_emit_cs_intrinsic(const
> fs_builder &bld,
>    cs_prog_data->uses_barrier = true;
>    break;
>  
> +   case nir_intrinsic_load_intel_thread_local_id:
> +  bld.MOV(retype(dest, BRW_REGISTER_TYPE_UD), thread_local_id);
> +  break;
> +
> case nir_intrinsic_load_local_invocation_id:
> case nir_intrinsic_load_work_group_id: {
>    gl_system_value sv = nir_system_value_from_intrinsic(instr-
> >intrinsic);
> diff --git a/src/intel/compiler/brw_nir.h
> b/src/intel/compiler/brw_nir.h
> index 1493b74..3e40712 100644
> --- a/src/intel/compiler/brw_nir.h
> +++ b/src/intel/compiler/brw_nir.h
> @@ -95,8 +95,7 @@ void brw_nir_analyze_boolean_resolves(nir_shader
> *nir);
>  nir_shader *brw_preprocess_nir(const struct brw_compiler *compiler,
> nir_shader *nir);
>  
> -bool brw_nir_lower_cs_intrinsics(nir_shader *nir,
> - struct brw_cs_prog_data
> *prog_data);
> +bool brw_nir_lower_cs_intrinsics(nir_shader *nir);
>  void brw_nir_lower_vs_inputs(nir_shader *nir,
>   bool use_legacy_snorm_formula,
>   const uint8_t *vs_attrib_wa_flags);
> diff --git a/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> b/src/intel/compiler/brw_nir_lower_cs_intrinsics.c
> index d277276..07d2dcc 100644
> --- a/src/intel/compiler/brw_nir_lower_cs_in