from:"Iago Toral"

Re: [Mesa-dev] [PATCH] i965/gen6/xfb: handle case where transform feedback is not active

2018-08-17 Thread Iago Toral

Hi Andrey,
Thanks for the report and all the analysis work on your side. I am on
holidays at the moment and from tomorrow onwards I won't have reliable
internet access, but Samuel will be back from his vacation next week and
he might be able to have a look at the problem and your patch.
Thanks,
Iago
On Wed, 2018-08-15 at 18:26 +0300, andrey simiklit wrote:
> Hi all,
> 
> This workaround just helps me to avoid the graphical corruption on
> SNB, but I am not sure whether it is a good idea.
> Regards,
> Andrii.
> 
> On Wed, Aug 15, 2018 at 6:20 PM,   wrote:
> > From: Andrii Simiklit 
> > 
> > When the SVBI Payload Enable is false I guess the register R1.4,
> > which contains the Maximum Streamed Vertex Buffer Index, is filled
> > with zero and the GS stops writing transform feedback when transform
> > feedback is not active.
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
> > Signed-off-by: Andrii Simiklit 
> > ---
> >  src/mesa/drivers/dri/i965/genX_state_upload.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c
> > b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > index ea5ad55..0f82500 100644
> > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > @@ -2806,7 +2806,7 @@ genX(upload_gs_state)(struct brw_context
> > *brw)
> >  #if GEN_GEN < 7
> >   gs.SOStatisticsEnable = true;
> >   if (gs_prog->info.has_transform_feedback_varyings)
> > -gs.SVBIPayloadEnable = true;
> > +gs.SVBIPayloadEnable =
> > _mesa_is_xfb_active_and_unpaused(ctx);
> > 
> >   /* GEN6_GS_SPF_MODE and GEN6_GS_VECTOR_MASK_ENABLE are
> > enabled as it
> >* was previously done for gen6.
> > -- 
> > 2.7.4
> > 
> > ___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
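
The condition in the quoted patch can be sketched in isolation. This is
only an illustration under assumptions: the structs and function names
below are hypothetical stand-ins, not the real Mesa context or the real
_mesa_is_xfb_active_and_unpaused() helper, and the sketch only mirrors
what the helper's name suggests (feedback has begun and is not paused).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the transform feedback object state. */
struct xfb_object {
   bool active;   /* feedback has been started and not yet ended */
   bool paused;   /* feedback is started but temporarily paused */
};

/* Hypothetical stand-in for the packed GS state; one field only. */
struct gs_state {
   bool svbi_payload_enable;
};

/* Assumed meaning of "active and unpaused", taken from the helper name. */
static bool
xfb_active_and_unpaused(const struct xfb_object *obj)
{
   return obj->active && !obj->paused;
}

/* Sketch of the patched logic: the SVBI payload is requested only when
 * the program has transform feedback varyings and feedback is running. */
static void
upload_gs_state(struct gs_state *gs, bool has_xfb_varyings,
                const struct xfb_object *obj)
{
   gs->svbi_payload_enable = false;
   if (has_xfb_varyings)
      gs->svbi_payload_enable = xfb_active_and_unpaused(obj);
}
```

With this shape, a program that declares feedback varyings but runs
while feedback is paused or inactive no longer requests the payload,
which is the case the workaround targets.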

Re: [Mesa-dev] [PATCH] intel: Switch the order of the 2x MSAA sample positions

2018-08-09 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga 

On Wed, 2018-08-08 at 11:30 -0700, Jason Ekstrand wrote:
> The Vulkan 1.1.82 spec flipped the order to better match D3D.
> 
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/blorp/blorp_blit.c                       | 11 ++++++++++-
>  src/intel/common/gen_sample_positions.h            |  8 ++++----
>  src/mesa/drivers/dri/i965/brw_multisample_state.h  |  8 ++++----
>  src/mesa/drivers/dri/i965/gen6_multisample_state.c |  4 ++--
>  4 files changed, 20 insertions(+), 11 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp_blit.c
> b/src/intel/blorp/blorp_blit.c
> index 561897894c3..013f7a14fa2 100644
> --- a/src/intel/blorp/blorp_blit.c
> +++ b/src/intel/blorp/blorp_blit.c
> @@ -776,6 +776,13 @@ blorp_nir_manual_blend_bilinear(nir_builder *b,
> nir_ssa_def *pos,
> * grid of samples with in a pixel. Sample number layout shows
> the
> * rectangular grid of samples roughly corresponding to the
> real sample
> * locations with in a pixel.
> +   *
> +   * In the case of 2x MSAA, the layout of sample indices is
> reversed from
> +   * the layout of sample numbers:
> +   *   ---------
> +   *   | 1 | 0 |
> +   *   ---------
> +   *
> * In case of 4x MSAA, layout of sample indices matches the
> layout of
> * sample numbers:
> *   ---------
> @@ -819,7 +826,9 @@ blorp_nir_manual_blend_bilinear(nir_builder *b,
> nir_ssa_def *pos,
>  key->x_scale * key-
> >y_scale));
>sample = nir_f2i32(b, sample);
>  
> -  if (tex_samples == 8) {
> +  if (tex_samples == 2) {
> + sample = nir_isub(b, nir_imm_int(b, 1), sample);
> +  } else if (tex_samples == 8) {
>   sample = nir_iand(b, nir_ishr(b, nir_imm_int(b,
> 0x64210573),
> nir_ishl(b, sample,
> nir_imm_int(b, 2))),
> nir_imm_int(b, 0xf));
> diff --git a/src/intel/common/gen_sample_positions.h
> b/src/intel/common/gen_sample_positions.h
> index f0ce95dd1fb..da48dcb5ed0 100644
> --- a/src/intel/common/gen_sample_positions.h
> +++ b/src/intel/common/gen_sample_positions.h
> @@ -42,10 +42,10 @@ prefix##0YOffset   = 0.5;
>   * c   1
>   */
>  #define GEN_SAMPLE_POS_2X(prefix) \
> -prefix##0XOffset   = 0.25; \
> -prefix##0YOffset   = 0.25; \
> -prefix##1XOffset   = 0.75; \
> -prefix##1YOffset   = 0.75;
> +prefix##0XOffset   = 0.75; \
> +prefix##0YOffset   = 0.75; \
> +prefix##1XOffset   = 0.25; \
> +prefix##1YOffset   = 0.25;
>  
>  /**
>   * Sample positions:
> diff --git a/src/mesa/drivers/dri/i965/brw_multisample_state.h
> b/src/mesa/drivers/dri/i965/brw_multisample_state.h
> index 6cf324e561c..2142a17a484 100644
> --- a/src/mesa/drivers/dri/i965/brw_multisample_state.h
> +++ b/src/mesa/drivers/dri/i965/brw_multisample_state.h
> @@ -38,13 +38,13 @@
>  /**
>   * 1x MSAA has a single sample at the center: (0.5, 0.5) -> (0x8,
> 0x8).
>   *
> - * 2x MSAA sample positions are (0.25, 0.25) and (0.75, 0.75):
> + * 2x MSAA sample positions are (0.75, 0.75) and (0.25, 0.25):
>   *   4 c
> - * 4 0
> - * c   1
> + * 4 1
> + * c   0
>   */
>  static const uint32_t
> -brw_multisample_positions_1x_2x = 0x0088cc44;
> +brw_multisample_positions_1x_2x = 0x008844cc;
>  
>  /**
>   * Sample positions:
> diff --git a/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> index bfa84fb9b77..78ff3942075 100644
> --- a/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> @@ -70,7 +70,7 @@ gen6_get_sample_position(struct gl_context *ctx,
>   *
>   * 2X MSAA sample index / number layout
>   *   ---------
> - *   | 0 | 1 |
> + *   | 1 | 0 |
>   *   ---------
>   *
>   * 4X MSAA sample index / number layout
> @@ -107,7 +107,7 @@ gen6_get_sample_position(struct gl_context *ctx,
>  void
>  gen6_set_sample_maps(struct gl_context *ctx)
>  {
> -   uint8_t map_2x[2] = {0, 1};
> +   uint8_t map_2x[2] = {1, 0};
> uint8_t map_4x[4] = {0, 1, 2, 3};
> uint8_t map_8x[8] = {3, 7, 5, 0, 1, 2, 4, 6};
> uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13,
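
The packed constant changed in the quoted patch can be decoded with a
few lines of C. This is a sketch under an assumed layout, inferred only
from the comment in brw_multisample_state.h: one byte per sample
starting at the least significant byte, X offset in the high nibble and
Y offset in the low nibble, both in 1/16ths of a pixel. The function and
struct names are hypothetical.

```c
#include <stdint.h>

struct sample_pos {
   float x;
   float y;
};

/* Decode one sample position from a packed word. The nibble order is an
 * assumption; for the 1x/2x constant X and Y are equal, so it does not
 * change the result here. */
static struct sample_pos
decode_sample_pos(uint32_t packed, unsigned byte_index)
{
   const uint8_t b = (packed >> (8 * byte_index)) & 0xff;
   const struct sample_pos pos = { (b >> 4) / 16.0f, (b & 0xf) / 16.0f };
   return pos;
}

/* The blorp hunk remaps the sample index for 2x surfaces with
 * "1 - sample"; this is the same arithmetic on a plain int. */
static int
remap_2x_sample(int sample)
{
   return 1 - sample;
}
```

Under that assumption, byte 0 of the new value 0x008844cc decodes to
(0.75, 0.75) and byte 1 to (0.25, 0.25), while the old value 0x0088cc44
gives the two positions in the opposite order.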

Re: [Mesa-dev] [PATCH] glsl: handle error case with ast_post_inc, ast_post_dec

2018-08-07 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga 

On Tue, 2018-08-07 at 08:20 +0300, Tapani Pälli wrote:
> Return ir_rvalue::error_value with ast_post_inc, ast_post_dec if
> parser error was emitted previously. This way process_array_size
> won't see bogus IR generated like with commit 9c676a64273.
> 
> Signed-off-by: Tapani Pälli 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98699
> ---
>  src/compiler/glsl/ast_to_hir.cpp | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/src/compiler/glsl/ast_to_hir.cpp
> b/src/compiler/glsl/ast_to_hir.cpp
> index 74160ec142b..5d3f10b6823 100644
> --- a/src/compiler/glsl/ast_to_hir.cpp
> +++ b/src/compiler/glsl/ast_to_hir.cpp
> @@ -1928,6 +1928,11 @@ ast_expression::do_hir(exec_list
> *instructions,
>  
>error_emitted = op[0]->type->is_error() || op[1]->type-
> >is_error();
>  
> +  if (error_emitted) {
> + result = ir_rvalue::error_value(ctx);
> + break;
> +  }
> +
>type = arithmetic_result_type(op[0], op[1], false, state, &
> loc);
>  
>ir_rvalue *temp_rhs;

Re: [Mesa-dev] [PATCH] intel/compiler: fix lower conversions to account for predication

2018-07-27 Thread Iago Toral

On Thu, 2018-07-26 at 11:30 +0200, Chema Casanova wrote:
> Please include:
> 
> Fixes: 5a12bdac09496e00 "i965/compiler: handle conversion to smaller
>  type in the lowering pass for that"

This does not specifically fix that commit; the problem was already
there before it.

Iago

> Reviewed-by: Jose Maria Casanova Crespo 
> 
> El 17/07/18 a las 11:10, Iago Toral Quiroga escribió:
> > The pass can create a temporary result for the instruction and then
> > move from it to the original destination. However, if the original
> > instruction was predicated, the MOV has to be predicated as well.
> > ---
> >  src/intel/compiler/brw_fs_lower_conversions.cpp | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp
> > b/src/intel/compiler/brw_fs_lower_conversions.cpp
> > index e27e2402746..145fb55f995 100644
> > --- a/src/intel/compiler/brw_fs_lower_conversions.cpp
> > +++ b/src/intel/compiler/brw_fs_lower_conversions.cpp
> > @@ -98,7 +98,10 @@ fs_visitor::lower_conversions()
> >   * size_written accordingly.
> >   */
> >  inst->size_written = inst->dst.component_size(inst-
> > >exec_size);
> > -ibld.at(block, inst->next).MOV(dst, strided_temp)-
> > >saturate = saturate;
> > +
> > +fs_inst *mov = ibld.at(block, inst->next).MOV(dst,
> > strided_temp);
> > +mov->saturate = saturate;
> > +mov->predicate = inst->predicate;
> >  
> >  progress = true;
> >   }
> > 
> 
> 

Re: [Mesa-dev] [PATCH] intel/compiler: fix lower conversions to account for predication

2018-07-23 Thread Iago Toral

This is still pending review, any takers?

Iago

On Tue, 2018-07-17 at 11:10 +0200, Iago Toral Quiroga wrote:
> The pass can create a temporary result for the instruction and then
> move from it to the original destination. However, if the original
> instruction was predicated, the MOV has to be predicated as well.
> ---
>  src/intel/compiler/brw_fs_lower_conversions.cpp | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp
> b/src/intel/compiler/brw_fs_lower_conversions.cpp
> index e27e2402746..145fb55f995 100644
> --- a/src/intel/compiler/brw_fs_lower_conversions.cpp
> +++ b/src/intel/compiler/brw_fs_lower_conversions.cpp
> @@ -98,7 +98,10 @@ fs_visitor::lower_conversions()
>   * size_written accordingly.
>   */
>  inst->size_written = inst->dst.component_size(inst-
> >exec_size);
> -ibld.at(block, inst->next).MOV(dst, strided_temp)-
> >saturate = saturate;
> +
> +fs_inst *mov = ibld.at(block, inst->next).MOV(dst,
> strided_temp);
> +mov->saturate = saturate;
> +mov->predicate = inst->predicate;
>  
>  progress = true;
>   }

[Mesa-dev] [PATCH] intel/compiler: fix lower conversions to account for predication

2018-07-17 Thread Iago Toral Quiroga

The pass can create a temporary result for the instruction and then
move from it to the original destination. However, if the original
instruction was predicated, the MOV has to be predicated as well.
---
 src/intel/compiler/brw_fs_lower_conversions.cpp | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp 
b/src/intel/compiler/brw_fs_lower_conversions.cpp
index e27e2402746..145fb55f995 100644
--- a/src/intel/compiler/brw_fs_lower_conversions.cpp
+++ b/src/intel/compiler/brw_fs_lower_conversions.cpp
@@ -98,7 +98,10 @@ fs_visitor::lower_conversions()
  * size_written accordingly.
  */
 inst->size_written = inst->dst.component_size(inst->exec_size);
-ibld.at(block, inst->next).MOV(dst, strided_temp)->saturate = 
saturate;
+
+fs_inst *mov = ibld.at(block, inst->next).MOV(dst, strided_temp);
+mov->saturate = saturate;
+mov->predicate = inst->predicate;
 
 progress = true;
  }
-- 
2.17.1
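
Why the MOV needs the predicate can be seen with a toy model of a
predicated SIMD write. This is a sketch, not the i965 back-end: the
8-channel width, the function names, and the mov_inherits_predicate
switch are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHANNELS 8

/* Toy predicated MOV: a channel is written only when the instruction is
 * unpredicated or the channel's predicate bit is set. */
static void
simd_mov(int32_t *dst, const int32_t *src, bool predicated, uint8_t pred_mask)
{
   for (unsigned ch = 0; ch < CHANNELS; ch++) {
      if (!predicated || (pred_mask & (1u << ch)))
         dst[ch] = src[ch];
   }
}

/* Lowered form of a predicated conversion: the conversion writes a
 * temporary under the predicate, then a MOV copies the temporary to the
 * original destination. mov_inherits_predicate selects between the
 * behavior before (false) and after (true) the fix. */
static void
lowered_convert(int32_t *dst, const int32_t *src, uint8_t pred_mask,
                bool mov_inherits_predicate)
{
   /* Zero stands in for whatever stale data the disabled channels of
    * the temporary would hold. */
   int32_t temp[CHANNELS] = { 0 };

   simd_mov(temp, src, true, pred_mask);
   simd_mov(dst, temp, mov_inherits_predicate, pred_mask);
}
```

With mask 0x0f and the unpredicated MOV, channels 4 to 7 of the
destination are overwritten with the temporary's stale contents even
though the original instruction would have left them alone. Copying the
predicate onto the MOV keeps those channels intact.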


[Mesa-dev] [PATCH] intel/compiler: remove unused function

2018-07-09 Thread Iago Toral Quiroga

---
 src/intel/compiler/brw_fs.cpp | 27 ---------------------------
 src/intel/compiler/brw_fs.h   |  4 ----
 2 files changed, 31 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 38a8621f2c..99b21f6d89 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -6444,33 +6444,6 @@ fs_visitor::dump_instruction(backend_instruction 
*be_inst, FILE *file)
fprintf(file, "\n");
 }
 
-/**
- * Possibly returns an instruction that set up @param reg.
- *
- * Sometimes we want to take the result of some expression/variable
- * dereference tree and rewrite the instruction generating the result
- * of the tree.  When processing the tree, we know that the
- * instructions generated are all writing temporaries that are dead
- * outside of this tree.  So, if we have some instructions that write
- * a temporary, we're free to point that temp write somewhere else.
- *
- * Note that this doesn't guarantee that the instruction generated
- * only reg -- it might be the size=4 destination of a texture instruction.
- */
-fs_inst *
-fs_visitor::get_instruction_generating_reg(fs_inst *start,
-  fs_inst *end,
-  const fs_reg &reg)
-{
-   if (end == start ||
-   end->is_partial_write() ||
-   !reg.equals(end->dst)) {
-  return NULL;
-   } else {
-  return end;
-   }
-}
-
 void
 fs_visitor::setup_fs_payload_gen6()
 {
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 25c433e44f..c09f0ccdd3 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -83,10 +83,6 @@ public:
void setup_uniform_clipplane_values();
void compute_clip_distance();
 
-   fs_inst *get_instruction_generating_reg(fs_inst *start,
-  fs_inst *end,
-  const fs_reg &reg);
-
void VARYING_PULL_CONSTANT_LOAD(const brw::fs_builder ,
const fs_reg ,
const fs_reg _index,
-- 
2.14.1


Re: [Mesa-dev] [PATCH] intel/compiler: emit actual barriers for working-group level barriers

2018-07-09 Thread Iago Toral

Any feedback about this? We need this to fix some new CTS tests.

On Thu, 2018-06-21 at 13:52 +0200, Iago Toral Quiroga wrote:
> Until now we have assumed that we could skip emitting these barriers
> in the general case, based on empirical testing and a few assumptions
> detailed in a comment in the driver code. However, recent CTS tests
> have shown that we actually need them to produce correct behavior.
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 25 ++-----------------------
>  1 file changed, 2 insertions(+), 23 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index 0abb4798e70..d0648c89865 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -3884,6 +3884,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
> , nir_intrinsic_instr *instr
>break;
> }
>  
> +   case nir_intrinsic_group_memory_barrier:
> +   case nir_intrinsic_memory_barrier_shared:
> case nir_intrinsic_memory_barrier_atomic_counter:
> case nir_intrinsic_memory_barrier_buffer:
> case nir_intrinsic_memory_barrier_image:
> @@ -3895,29 +3897,6 @@ fs_visitor::nir_emit_intrinsic(const
> fs_builder , nir_intrinsic_instr *instr
>break;
> }
>  
> -   case nir_intrinsic_group_memory_barrier:
> -   case nir_intrinsic_memory_barrier_shared:
> -  /* We treat these workgroup-level barriers as no-ops.  This
> should be
> -   * safe at present and as long as:
> -   *
> -   *  - Memory access instructions are not subsequently
> reordered by the
> -   *compiler back-end.
> -   *
> -   *  - All threads from a given compute shader workgroup fit
> within a
> -   *single subslice and therefore talk to the same HDC
> shared unit
> -   *what supposedly guarantees ordering and coherency
> between threads
> -   *from the same workgroup.  This may change in the future
> when we
> -   *start splitting workgroups across multiple subslices.
> -   *
> -   *  - The context is not in fault-and-stream mode, which could
> cause
> -   *memory transactions (including to SLM) prior to the
> barrier to be
> -   *replayed after the barrier if a pagefault occurs.  This
> shouldn't
> -   *be a problem up to and including SKL because fault-and-
> stream is
> -   *not usable due to hardware issues, but that's likely to
> change in
> -   *the future.
> -   */
> -  break;
> -
> case nir_intrinsic_shader_clock: {
>/* We cannot do anything if there is an event, so ignore it
> for now */
>const fs_reg shader_clock = get_timestamp(bld);
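
The effect of the quoted change on intrinsic dispatch can be sketched as
a plain switch. The enum below is a hypothetical subset named after the
NIR intrinsics in the patch, and the function only reports whether the
shared barrier case is reached; what that case actually emits is not
shown in the quoted hunks and is not modeled here.

```c
#include <stdbool.h>

/* Hypothetical subset of intrinsics, named after the ones in the patch. */
enum intrinsic {
   INTRINSIC_GROUP_MEMORY_BARRIER,
   INTRINSIC_MEMORY_BARRIER_SHARED,
   INTRINSIC_MEMORY_BARRIER_ATOMIC_COUNTER,
   INTRINSIC_MEMORY_BARRIER_BUFFER,
   INTRINSIC_MEMORY_BARRIER_IMAGE,
   INTRINSIC_SHADER_CLOCK,
};

/* Returns true when the intrinsic reaches the barrier-emitting case.
 * After the patch the two workgroup-level barriers share that case
 * instead of being treated as no-ops in a case of their own. */
static bool
emits_memory_barrier(enum intrinsic op)
{
   switch (op) {
   case INTRINSIC_GROUP_MEMORY_BARRIER:
   case INTRINSIC_MEMORY_BARRIER_SHARED:
   case INTRINSIC_MEMORY_BARRIER_ATOMIC_COUNTER:
   case INTRINSIC_MEMORY_BARRIER_BUFFER:
   case INTRINSIC_MEMORY_BARRIER_IMAGE:
      return true;
   default:
      return false;
   }
}
```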

Re: [Mesa-dev] [PATCH] anv/pipeline: honor the pipeline_cache_enabled run-time flag

2018-07-09 Thread Iago Toral

On Mon, 2018-07-09 at 08:22 +0200, Iago Toral wrote:
> On Fri, 2018-07-06 at 15:50 +0100, Lionel Landwerlin wrote:
> > On 04/07/18 09:44, Iago Toral Quiroga wrote:
> > > ---
> > > >   src/intel/vulkan/anv_pipeline_cache.c | 37 ++++++++++++++++++++-----------------
> > >   1 file changed, 20 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/src/intel/vulkan/anv_pipeline_cache.c
> > > b/src/intel/vulkan/anv_pipeline_cache.c
> > > index d4c7262dc0..5825bf9f01 100644
> > > --- a/src/intel/vulkan/anv_pipeline_cache.c
> > > +++ b/src/intel/vulkan/anv_pipeline_cache.c
> > > @@ -570,23 +570,26 @@ anv_device_search_for_kernel(struct
> > > anv_device *device,
> > >  }
> > >   
> > 
> > I guess you could do :
> > 
> > if (disk_cache && device->instance->pipeline_cache_enabled) {
> > ...
> > 
> > to minimize the diff.
> 
> Sure, will do that.
> 
> > Do we still want to save stuff in the cache
> > (anv_device_upload_kernel) 
> > when cache is disabled?
> 
> Good question... looking at the implementation, it looks like the
> intent behind the flag was to also control the in-memory cache, so it
> is probably a good idea. I'll send a v2 disabling that too.


Actually, we are already disabling that, we have this code in
anv_pipeline_init:

   /* Use the default pipeline cache if none is specified */
   if (cache == NULL && device->instance->pipeline_cache_enabled)
      cache = &device->default_pipeline_cache;

So when pipeline_cache_enabled is false we don't have an in-memory cache
unless one is provided by the application, in which case we would want
to use it.

I'll push the v1 then.

Iago

> Iago
> 
> > Regardless :
> > 
> > Reviewed-by: Lionel Landwerlin 
> > 
> > Thanks!
> > 
> > >   #ifdef ENABLE_SHADER_CACHE
> > > -   struct disk_cache *disk_cache = device->instance-
> > > > >physicalDevice.disk_cache;
> > > -   if (disk_cache) {
> > > -  cache_key cache_key;
> > > -  disk_cache_compute_key(disk_cache, key_data, key_size,
> > > cache_key);
> > > -
> > > -  size_t buffer_size;
> > > -  uint8_t *buffer = disk_cache_get(disk_cache, cache_key,
> > > _size);
> > > -  if (buffer) {
> > > - struct blob_reader blob;
> > > - blob_reader_init(, buffer, buffer_size);
> > > - bin = anv_shader_bin_create_from_blob(device, );
> > > - free(buffer);
> > > -
> > > - if (bin) {
> > > -if (cache)
> > > -   anv_pipeline_cache_add_shader_bin(cache, bin);
> > > -return bin;
> > > +   if (device->instance->pipeline_cache_enabled) {
> > > +  struct disk_cache *disk_cache =
> > > + device->instance->physicalDevice.disk_cache;
> > > +  if (disk_cache) {
> > > + cache_key cache_key;
> > > + disk_cache_compute_key(disk_cache, key_data, key_size,
> > > cache_key);
> > > +
> > > + size_t buffer_size;
> > > + uint8_t *buffer = disk_cache_get(disk_cache, cache_key,
> > > _size);
> > > + if (buffer) {
> > > +struct blob_reader blob;
> > > +blob_reader_init(, buffer, buffer_size);
> > > +bin = anv_shader_bin_create_from_blob(device,
> > > );
> > > +free(buffer);
> > > +
> > > +if (bin) {
> > > +   if (cache)
> > > +  anv_pipeline_cache_add_shader_bin(cache, bin);
> > > +   return bin;
> > > +}
> > >}
> > > }
> > >  }
> > 
> > 

Re: [Mesa-dev] [PATCH] anv/pipeline: honor the pipeline_cache_enabled run-time flag

2018-07-09 Thread Iago Toral

On Fri, 2018-07-06 at 15:50 +0100, Lionel Landwerlin wrote:
> On 04/07/18 09:44, Iago Toral Quiroga wrote:
> > ---
> >   src/intel/vulkan/anv_pipeline_cache.c | 37 ++++++++++++++++++++-----------------
> >   1 file changed, 20 insertions(+), 17 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_pipeline_cache.c
> > b/src/intel/vulkan/anv_pipeline_cache.c
> > index d4c7262dc0..5825bf9f01 100644
> > --- a/src/intel/vulkan/anv_pipeline_cache.c
> > +++ b/src/intel/vulkan/anv_pipeline_cache.c
> > @@ -570,23 +570,26 @@ anv_device_search_for_kernel(struct
> > anv_device *device,
> >  }
> >   
> 
> I guess you could do :
> 
> if (disk_cache && device->instance->pipeline_cache_enabled) {
> ...
> 
> to minimize the diff.

Sure, will do that.

> Do we still want to save stuff in the cache
> (anv_device_upload_kernel) 
> when cache is disabled?

Good question... looking at the implementation, it looks like the intent
behind the flag was to also control the in-memory cache, so it is
probably a good idea. I'll send a v2 disabling that too.

Iago

> Regardless :
> 
> Reviewed-by: Lionel Landwerlin 
> 
> Thanks!
> 
> >   #ifdef ENABLE_SHADER_CACHE
> > -   struct disk_cache *disk_cache = device->instance-
> > >physicalDevice.disk_cache;
> > -   if (disk_cache) {
> > -  cache_key cache_key;
> > -  disk_cache_compute_key(disk_cache, key_data, key_size,
> > cache_key);
> > -
> > -  size_t buffer_size;
> > -  uint8_t *buffer = disk_cache_get(disk_cache, cache_key,
> > _size);
> > -  if (buffer) {
> > - struct blob_reader blob;
> > - blob_reader_init(, buffer, buffer_size);
> > - bin = anv_shader_bin_create_from_blob(device, );
> > - free(buffer);
> > -
> > - if (bin) {
> > -if (cache)
> > -   anv_pipeline_cache_add_shader_bin(cache, bin);
> > -return bin;
> > +   if (device->instance->pipeline_cache_enabled) {
> > +  struct disk_cache *disk_cache =
> > + device->instance->physicalDevice.disk_cache;
> > +  if (disk_cache) {
> > + cache_key cache_key;
> > + disk_cache_compute_key(disk_cache, key_data, key_size,
> > cache_key);
> > +
> > + size_t buffer_size;
> > + uint8_t *buffer = disk_cache_get(disk_cache, cache_key,
> > _size);
> > + if (buffer) {
> > +struct blob_reader blob;
> > +blob_reader_init(, buffer, buffer_size);
> > +bin = anv_shader_bin_create_from_blob(device, );
> > +free(buffer);
> > +
> > +if (bin) {
> > +   if (cache)
> > +  anv_pipeline_cache_add_shader_bin(cache, bin);
> > +   return bin;
> > +}
> >}
> > }
> >  }
> 
> 
> 

Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-06 Thread Iago Toral

On Thu, 2018-07-05 at 15:47 -0700, Caio Marcelo de Oliveira Filho
wrote:
> (I had to stop reading to go home last Tuesday, so here are the
> remaining comments.)
> 
> 
> On Tue, May 15, 2018 at 01:05:21PM +0200, Iago Toral Quiroga wrote:
> > NIR assumes that all booleans are 32-bit but Intel hardware
> > produces
> > booleans of the same size as the operands to the CMP instruction,
> > so we
> > can actually have 8-bit and 16-bit booleans. To work around this
> > mismatch between NIR and the hardware, we emit boolean conversions
> > to
> > 32-bit right after emitting the CMP instruction during the NIR->FS
> > pass, which makes interfacing with NIR a lot easier, but can leave
> > unnecessary boolean conversions in the shader code.
> 
> Question: have you explored handling this at the NIR->FS conversion?
> I.e. instead of generate the cmp + mov and then look for the cmp +
> mov
> to fix up; when generating a cmp, perform those checks (at nir level)
> and then pick the right bitsize.

It is not that easy. The problem is that NIR will continue to come at
us with 32-bit boolean instructions after the CMP+MOV, so instead of
propagating forward for every conversion, for every bool we find in
the IR we'd need to go back in the FS program to check whether it is a
real 32-bit boolean or not to decide what to emit. I don't see any
benefit, plus we would be coupling all this with the translation
implementation, which I think is less nice than having it be a
completely separate thing.

Anyway, there is a major issue with the current patch that I have found
this week thanks to some new CTS tests: when we propagate the bitsize
of a logical instruction to its destination, that affects all its
consumers even outside the current block, so we need to handle
propagation across blocks, which adds a few more problems, so I still
need to figure out how to deal with that properly and whether that is
something we want to do (there is a reason why no other opt/lowering
passes do cross-block changes...).

Iago

> 
> > +/**
> > + * Propagates the bit-size of the destination of a boolean
> > instruction to
> > + * all its consumers. If propagate_from_source is True, then the
> > producer
> > + * is a conversion MOV from a low bit-size boolean to 32-bit, and
> > in that
> > + * case the propagation happens from the source of the instruction
> > instead
> > + * of its destination.
> > + */
> > +static bool
> > +propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
> > +{
> > +   assert(!propagate_from_source || inst->opcode ==
> > BRW_OPCODE_MOV);
> > +
> > +   bool progress = false;
> > +
> > +   const unsigned bit_size = 8 * (propagate_from_source ?
> > +  type_sz(inst->src[0].type) : type_sz(inst->dst.type));
> > +
> > +   /* Look for any follow-up instructions that sources from the
> > boolean
> > +* result of the producer instruction and rewrite them to use
> > the correct
> > +* bit-size.
> > +*/
> > +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst)
> > {
> > +  if (!inst_supports_boolean(fixup_inst))
> > + continue;
> 
> Should we care about other instruction clobbering the contents of
> inst->dst, or at this point of the optimization we can count on it
> not
> being?
> 
> 
> > + /* If it is a plain boolean conversion to 32-bit, then
> > look for any
> > +  * follow-up instructions that source from the 32-bit
> > boolean and
> > +  * rewrite them to source from the output of the CMP
> > (which is the
> > +  * source of the conversion instruction) directly if
> > possible.
> > +  */
> > + progress = propagate_bool_bit_size(conv_inst, true) ||
> > progress;
> > +  }
> > +#if 0
> > +   else if (inst_supports_boolean(inst) && inst->sources > 1)
> > {
> 
> If you end up enabling this section, I suggest move the
> inst_supports_boolean() check to the beginning of the for-loop, as an
> early return. Makes the condition for the cases we are handling
> cleaner.
> 
> 
> 
> > + /* For all logical instructions that can take more than
> > one operand
> > +  * we need to ensure that all of them have matching bit-
> > sizes. If they
> > +  * don't, it means that the original shader code is
> > operating boolean
> > +  * expressions with different native bit-sizes and we
> > need to choose
> > +  * a canonical boolean form for all the operands, which
> > requires to
> > +  *

[Mesa-dev] [PATCH] anv/pipeline: honor the pipeline_cache_enabled run-time flag

2018-07-04 Thread Iago Toral Quiroga

---
 src/intel/vulkan/anv_pipeline_cache.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline_cache.c 
b/src/intel/vulkan/anv_pipeline_cache.c
index d4c7262dc0..5825bf9f01 100644
--- a/src/intel/vulkan/anv_pipeline_cache.c
+++ b/src/intel/vulkan/anv_pipeline_cache.c
@@ -570,23 +570,26 @@ anv_device_search_for_kernel(struct anv_device *device,
}
 
 #ifdef ENABLE_SHADER_CACHE
-   struct disk_cache *disk_cache = device->instance->physicalDevice.disk_cache;
-   if (disk_cache) {
-  cache_key cache_key;
-  disk_cache_compute_key(disk_cache, key_data, key_size, cache_key);
-
-  size_t buffer_size;
-  uint8_t *buffer = disk_cache_get(disk_cache, cache_key, &buffer_size);
-  if (buffer) {
- struct blob_reader blob;
- blob_reader_init(&blob, buffer, buffer_size);
- bin = anv_shader_bin_create_from_blob(device, &blob);
- free(buffer);
-
- if (bin) {
-if (cache)
-   anv_pipeline_cache_add_shader_bin(cache, bin);
-return bin;
+   if (device->instance->pipeline_cache_enabled) {
+  struct disk_cache *disk_cache =
+ device->instance->physicalDevice.disk_cache;
+  if (disk_cache) {
+ cache_key cache_key;
+ disk_cache_compute_key(disk_cache, key_data, key_size, cache_key);
+
+ size_t buffer_size;
+ uint8_t *buffer = disk_cache_get(disk_cache, cache_key, &buffer_size);
+ if (buffer) {
+struct blob_reader blob;
+blob_reader_init(&blob, buffer, buffer_size);
+bin = anv_shader_bin_create_from_blob(device, &blob);
+free(buffer);
+
+if (bin) {
+   if (cache)
+  anv_pipeline_cache_add_shader_bin(cache, bin);
+   return bin;
+}
  }
   }
}
-- 
2.14.1


Re: [Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-07-04 Thread Iago Toral

On Tue, 2018-07-03 at 18:45 -0700, Caio Marcelo de Oliveira Filho
wrote:
> Hi,
> 
> 
> > +   /* Look for any follow-up instructions that sources from the
> > boolean
> > +* result of the producer instruction and rewrite them to use
> > the correct
> > +* bit-size.
> > +*/
> > +   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst)
> > {
> > +  if (!inst_supports_boolean(fixup_inst))
> > + continue;
> > +
> > +  /* For MOV instructions we can always rewrite the boolean
> > source
> > +   * if the instruction reads the same region we produced in
> > the
> > +   * 32-bit conversion.
> > +   */
> > +  if (fixup_inst->opcode == BRW_OPCODE_MOV &&
> > +  region_match(inst->dst, inst->size_written,
> > +   fixup_inst->src[0], fixup_inst-
> > >size_read(0))) {
> > + if (propagate_from_source) {
> > +fixup_inst->src[0].file = inst->src[0].file;
> > +fixup_inst->src[0].nr = inst->src[0].nr;
> > + }
> > + fixup_inst->src[0] =
> > +fix_bool_reg_bit_size(fixup_inst->src[0], bit_size);
> > + progress = true;
> > + continue;
> > +  }
> 
> It seems the rest of the code assumes that the instruction is not a
> MOV, so you would need to ensure continue is called regardless of the
> region match.

Right, although if the region doesn't match the rest of the code won't
do anything anyway.

> Idea: it seems we could just remove this section above (handling
> MOV),
> and slightly change the section below so that MOV can be dealt with
> it
> too.
> 
> - Drop the section above;
> - Rename progress_logical to local_progress;
> - Add a "fixup_inst->opcode == BRW_OPCODE_MOV" to the

The recursive call executes for logical instructions, not for MOV, so
this should be !=.

>   condition that controls the recursive call;
> - Update comments accordingly.

Sounds like a good idea, thanks for the feedback.

Iago

> 
> > +
> > +  /* For logical instructions we have the same restriction as
> > for MOVs,
> > +   * and we also need to:
> > +   *
> > +   * 1. Propagate the bit-size to the boolean destination of
> > the
> > +   *instruction.
> > +   * 2. Rewrite any instruction that reads the destination to
> > use
> > +   *the new bit-size.
> > +   *
> > +   * However, we can only do these if we can rewrite all the
> > operands
> > +   * to use the same bit-size.
> > +   */
> > +  bool progress_logical = false;
> > +  bool same_bit_size = true;
> > +  for (unsigned i = 0; i < fixup_inst->sources; i++) {
> > + if (region_match(inst->dst, inst->size_written,
> > +  fixup_inst->src[i], fixup_inst-
> > >size_read(i))) {
> > +if (propagate_from_source) {
> > +   fixup_inst->src[i].file = inst->src[0].file;
> > +   fixup_inst->src[i].nr = inst->src[0].nr;
> > +}
> > +fixup_inst->src[i] =
> > +   fix_bool_reg_bit_size(fixup_inst->src[i],
> > bit_size);
> > +progress_logical = true;
> > +progress = true;
> > + }
> > +
> > + if (i > 0 &&
> > + type_sz(fixup_inst->src[i].type) !=
> > + type_sz(fixup_inst->src[i - 1].type)) {
> > +same_bit_size = false;
> > + }
> > +  }
> > +
> > +  /* If we have successfully rewritten a logical instruction
> > operand
> > +   * to use a smaller bit-size boolean and all the operands in
> > the
> > +   * instruction have the same small bit-size, then propagate
> > the
> > +   * new bit-size to the destination boolean and do the same
> > for all
> > +   * follow-up instructions that read from it.
> > +   */
> > +  if (progress_logical && same_bit_size) {
> > + fixup_inst->dst = retype(fixup_inst->dst, fixup_inst-
> > >src[0].type);
> > + propagate_bool_bit_size(fixup_inst, false);
> > +  }
> > +   }
> > +
> > +   return progress;
> > +}
> 
> 
> 
> 
> Thanks,
> Caio
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 4/4] anv/cmd_buffer: only emit state base address if the address changes

2018-07-03 Thread Iago Toral

On Mon, 2018-07-02 at 08:23 -0500, Jason Ekstrand wrote:
> On July 2, 2018 01:09:38 Iago Toral  wrote:
> 
> > On Sun, 2018-07-01 at 18:30 -0500, Jason Ekstrand wrote:
> > > On June 29, 2018 03:11:00 Iago Toral Quiroga 
> > > wrote:
> > > 
> > > > ---
> > > > src/intel/vulkan/anv_private.h |  5 +
> > > > src/intel/vulkan/genX_cmd_buffer.c | 12 +++-
> > > > 2 files changed, 12 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/src/intel/vulkan/anv_private.h
> > > > b/src/intel/vulkan/anv_private.h
> > > > index 510471da602..1a9ab7013f2 100644
> > > > --- a/src/intel/vulkan/anv_private.h
> > > > +++ b/src/intel/vulkan/anv_private.h
> > > > @@ -1989,6 +1989,11 @@ struct anv_cmd_state {
> > > > * is one of the states in render_pass_states.
> > > > */
> > > > struct
> > > > anv_state null_surface_state;
> > > > +
> > > > +   /**
> > > > +* Current state base address.
> > > > +*/
> > > > +   struct
> > > > anv_address   base_state_address;
> > > > };
> > > > 
> > > > struct anv_cmd_pool {
> > > > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > > > b/src/intel/vulkan/genX_cmd_buffer.c
> > > > index 611311904e6..2847e0b30c9 100644
> > > > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > > > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > > > @@ -67,6 +67,12 @@
> > > > genX(cmd_buffer_emit_state_base_address)(struct
> > > > anv_cmd_buffer *cmd_buffer)
> > > > {
> > > > struct anv_device *device = cmd_buffer->device;
> > > > 
> > > > +   struct anv_address new_base_address =
> > > > +  anv_cmd_buffer_surface_base_address(cmd_buffer);
> > > > +   if (new_base_address.bo == cmd_buffer-
> > > > > state.base_state_address.bo &&
> > > > 
> > > > +   new_base_address.offset == cmd_buffer-
> > > > > state.base_state_address.offset)
> > > > 
> > > > +  return;
> > > > +
> > > > /* If we are emitting a new state base address we probably need
> > > > to re-emit
> > > > * binding tables.
> > > > */
> > > > @@ -90,8 +96,7 @@
> > > > genX(cmd_buffer_emit_state_base_address)(struct
> > > > anv_cmd_buffer *cmd_buffer)
> > > > sba.GeneralStateMemoryObjectControlState = GENX(MOCS);
> > > > sba.GeneralStateBaseAddressModifyEnable = true;
> > > > 
> > > > -  sba.SurfaceStateBaseAddress =
> > > > - anv_cmd_buffer_surface_base_address(cmd_buffer);
> > > > +  sba.SurfaceStateBaseAddress = new_base_address;
> > > > sba.SurfaceStateMemoryObjectControlState = GENX(MOCS);
> > > > sba.SurfaceStateBaseAddressModifyEnable = true;
> > > > 
> > > > @@ -1521,9 +1526,6 @@ genX(CmdExecuteCommands)(
> > > > /* Each of the secondary command buffers will use its own state
> > > > base
> > > > * address.  We need to re-emit state base address for the
> > > > primary after
> > > > * all of the secondaries are done.
> > > > -*
> > > > -* TODO: Maybe we want to make this a dirty bit to avoid
> > > > extra
> > > > state base
> > > > -* address calls?
> > > 
> > > I don't think this is correct.  When a secondary executes, we
> > > have
> > > to
> > > reemit STATE_BASE_ADDRESS because the secondary used it's own and
> > > we
> > > need
> > > to set it back for the primary. The comment above was saying that
> > > we
> > > can
> > > probably avoid it if we have a bunch of ExecuteCommands calls
> > > back to
> > > back
> > > or if the last thing in the batch is a call out to a secondary.
> > > As is, I
> > > think this patch will cause problems in the case where the client
> > > uses a
> > > secondary followed by rendering in the primary.  Have I missed
> > > something?
> > 
> > I shouldn't remove the comment since this patche doesn't address
> > that
> > TODO, we still emit the state base address for the primary below,
> > the
> > only change in here is that if the base state address of the
> > primary is
> > the same as the one for the secondaries we won't actually emit the
> > state packet, but that should be fine. Maybe you thought I was
> > removing
> > the line below?
> 
> The problem is that it never will be the same.  The secondary always 
> allocate a new binding table pool and re-emit STATE_BASE_ADDRESS so I
> don't 
> think we're actually saving anything.

Ok, in that case I agree it won't help anything, thanks for clarifying.
Let's drop this patch then.

Iago
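
A simplified model of why the early return saves nothing, assuming (as Jason
describes) that every command buffer has its own binding table pool. The
types and the idea of tracking the last programmed address in one place are
assumptions made for this sketch, not the layout used in anv:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address type: a buffer object pointer plus an offset. */
struct sketch_address {
   const void *bo;
   uint32_t offset;
};

/* Hypothetical command buffer: only the surface state base matters here. */
struct sketch_cmd_buffer {
   struct sketch_address surface_base;
};

/* Hypothetical record of the last address that was programmed. */
struct sketch_hw {
   struct sketch_address current;
   unsigned emits;
};

/* Emit only when the address differs from the last one programmed.
 * Returns true if a packet would have been emitted.
 */
static bool
sketch_emit_sba(struct sketch_hw *hw, const struct sketch_cmd_buffer *cb)
{
   if (hw->current.bo == cb->surface_base.bo &&
       hw->current.offset == cb->surface_base.offset)
      return false;

   hw->current = cb->surface_base;
   hw->emits++;
   return true;
}
```

In a primary, secondary, primary sequence the address changes at every step,
so the check never skips an emit. It would only help for back-to-back calls
with the same command buffer.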



Re: [Mesa-dev] [PATCH v2 0/9] anv, nir: Move large constants to a UBO

2018-07-02 Thread Iago Toral

For the series:

Reviewed-by: Iago Toral Quiroga 

On Fri, 2018-06-29 at 17:13 -0700, Jason Ekstrand wrote:
> This little series adds an optimization pass to NIR and wires it up
> in anv
> that moves large constant variables to a UBO.  This fixes a fairly
> common
> case in some filter or ambient occlusion shaders where they put some
> sort
> of look-up table in the shader itself.  This series takes Skyrim
> Special
> Edition running under DXVK from a slide show to a smooth and very
> playable
> framerate on my SKL desktop.
> 
> The first part of the series adds a concept of constant data that can
> be
> associated with a NIR shader and adds an optimization pass to move
> large
> constant variables into this constant data section.  It's left up to
> the
> driver to figure out how to get this constant data into the
> shader.  The
> last three patches wire things up in ANV to put this data into an
> implicit
> UBO and enables the optimization.
> 
> v2 (Jason Ekstrand):
>  - Take anholt's feedback and make it more clear that the units on
> the
>number of constants is in bytes by calling it constant_data_size.
>  - Break some of the deref to offset code out into helpers
>  - Add new size/align helpers for types to ensure that we get
> alignments
>right when setting up constants.  This hasn't usually been a
> problem in
>the past because we align most things to a dword and 64-bit values
>aren't common.  We should start being more careful.
> 
> Jason Ekstrand (9):
>   util/macros: Import ALIGN_POT from ralloc.c
>   nir: Add a deref_instr_has_indirect helper
>   nir/types: Add a natural size and alignment helper
>   nir/deref: Add helpers for getting offsets
>   nir: Add a concept of constant data associated with a shader
>   nir: Add a large constants optimization pass
>   anv: Add support for shader constant data to the pipeline cache
>   anv: Add state setup support for shader constants
>   anv,intel: Enable nir_opt_large_constants for Vulkan
> 
>  src/compiler/Makefile.sources |   1 +
>  src/compiler/nir/meson.build  |   1 +
>  src/compiler/nir/nir.h|  14 +
>  src/compiler/nir/nir_clone.c  |   6 +
>  src/compiler/nir/nir_deref.c  | 109 +++
>  src/compiler/nir/nir_deref.h  |   6 +
>  src/compiler/nir/nir_intrinsics.py|   2 +
>  src/compiler/nir/nir_opt_large_constants.c| 301
> ++
>  src/compiler/nir/nir_serialize.c  |  12 +
>  src/compiler/nir/nir_sweep.c  |   2 +
>  src/compiler/nir_types.cpp|  56 
>  src/compiler/nir_types.h  |   6 +
>  src/intel/compiler/brw_compiler.h |   6 +
>  src/intel/compiler/brw_nir.c  |   7 +
>  src/intel/vulkan/anv_blorp.c  |   1 +
>  src/intel/vulkan/anv_device.c |   1 +
>  .../vulkan/anv_nir_apply_pipeline_layout.c|  47 +++
>  src/intel/vulkan/anv_pipeline.c   |  16 +
>  src/intel/vulkan/anv_pipeline_cache.c |  27 ++
>  src/intel/vulkan/anv_private.h|   7 +
>  src/intel/vulkan/genX_cmd_buffer.c|  72 +++--
>  src/util/macros.h |   3 +
>  src/util/ralloc.c |   2 -
>  23 files changed, 684 insertions(+), 21 deletions(-)
>  create mode 100644 src/compiler/nir/nir_opt_large_constants.c
> 
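
The alignment concern in the v2 notes can be illustrated with a small sketch
of packing constants into a byte-sized data section. The macro body is the
conventional power-of-two round-up idiom and is an assumption here, not a
copy of what the series puts in src/util/macros.h; the function name is
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of pot, which must be a power of two.
 * Assumed idiom, named differently from the real ALIGN_POT on purpose.
 */
#define SKETCH_ALIGN_POT(x, pot) (((x) + (pot) - 1) & ~((pot) - 1))

/* Place one constant of the given size and alignment (both in bytes) at
 * the next suitably aligned offset and advance the cursor past it.
 * Returns the offset assigned to the constant.
 */
static size_t
sketch_place_constant(size_t *cursor, size_t size, size_t align)
{
   assert(align != 0 && (align & (align - 1)) == 0);

   size_t offset = SKETCH_ALIGN_POT(*cursor, align);
   *cursor = offset + size;
   return offset;
}
```

A 64-bit value placed after a single dword lands at offset 8 rather than 4,
which is the case the cover letter says has rarely mattered so far.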

Re: [Mesa-dev] [PATCH 1/4] anv/cmd_buffer: never shrink the push constant buffer size

2018-07-02 Thread Iago Toral

On Sun, 2018-07-01 at 17:33 +0100, Lionel Landwerlin wrote:
> I reread the discussion you had with Jason in order to figure out
> why 
> this change is required.
> Maybe adding a comment at the top of the function would be a good bit
> of 
> documentation for future developers ;)

Sure, I will add that.

> Regardless this series is :
> 
> Reviewed-by: Lionel Landwerlin 

Thanks, I'll push the first 3 for now.

> Thanks!
> 
> On 29/06/18 09:10, Iago Toral Quiroga wrote:
> > If we have to re-emit push constant data, we need to re-emit all
> > of it.
> > ---
> >   src/intel/vulkan/anv_cmd_buffer.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/vulkan/anv_cmd_buffer.c
> > b/src/intel/vulkan/anv_cmd_buffer.c
> > index 33687920a38..3e9f000f7b8 100644
> > --- a/src/intel/vulkan/anv_cmd_buffer.c
> > +++ b/src/intel/vulkan/anv_cmd_buffer.c
> > @@ -166,6 +166,7 @@
> > anv_cmd_buffer_ensure_push_constants_size(struct anv_cmd_buffer
> > *cmd_buffer,
> >anv_batch_set_error(_buffer->batch,
> > VK_ERROR_OUT_OF_HOST_MEMORY);
> >return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
> > }
> > +  (*ptr)->size = size;
> >  } else if ((*ptr)->size < size) {
> > *ptr = vk_realloc(_buffer->pool->alloc, *ptr, size, 8,
> >VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
> > @@ -173,8 +174,8 @@
> > anv_cmd_buffer_ensure_push_constants_size(struct anv_cmd_buffer
> > *cmd_buffer,
> >anv_batch_set_error(_buffer->batch,
> > VK_ERROR_OUT_OF_HOST_MEMORY);
> >return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
> > }
> > +  (*ptr)->size = size;
> >  }
> > -   (*ptr)->size = size;
> >   
> >  return VK_SUCCESS;
> >   }
> 
> 
> 

Re: [Mesa-dev] [PATCH 3/4] anv/cmd_buffer: make descriptors dirty when emitting base state address

2018-07-02 Thread Iago Toral

On Sun, 2018-07-01 at 18:32 -0500, Jason Ekstrand wrote:
> 1-3 are R-b me. Should we cc stable?

Yes, I think these should go to stable.

> On June 29, 2018 03:11:00 Iago Toral Quiroga 
> wrote:
> 
> > Every time we emit a new state base address we will need to re-emit 
> > our
> > binding tables, since they might have been emitted with a different
> > base
> > state adress.
> > ---
> > src/intel/vulkan/genX_cmd_buffer.c | 5 +
> > 1 file changed, 5 insertions(+)
> > 
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > b/src/intel/vulkan/genX_cmd_buffer.c
> > index 66d1ef7d786..611311904e6 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -67,6 +67,11 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> > anv_cmd_buffer *cmd_buffer)
> > {
> >struct anv_device *device = cmd_buffer->device;
> > 
> > +   /* If we are emitting a new state base address we probably need
> > to re-emit
> > +* binding tables.
> > +*/
> > +   cmd_buffer->state.descriptors_dirty |= ~0;
> > +
> >/* Emit a render target cache flush.
> > *
> > * This isn't documented anywhere in the PRM.  However, it seems
> > to be
> > --
> > 2.14.1
> 
> 
> 
> 

Re: [Mesa-dev] [PATCH 4/4] anv/cmd_buffer: only emit state base address if the address changes

2018-07-02 Thread Iago Toral

On Sun, 2018-07-01 at 18:30 -0500, Jason Ekstrand wrote:
> On June 29, 2018 03:11:00 Iago Toral Quiroga 
> wrote:
> 
> > ---
> > src/intel/vulkan/anv_private.h |  5 +
> > src/intel/vulkan/genX_cmd_buffer.c | 12 +++-
> > 2 files changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_private.h
> > b/src/intel/vulkan/anv_private.h
> > index 510471da602..1a9ab7013f2 100644
> > --- a/src/intel/vulkan/anv_private.h
> > +++ b/src/intel/vulkan/anv_private.h
> > @@ -1989,6 +1989,11 @@ struct anv_cmd_state {
> > * is one of the states in render_pass_states.
> > */
> >struct anv_state null_surface_state;
> > +
> > +   /**
> > +* Current state base address.
> > +*/
> > +   struct
> > anv_address   base_state_address;
> > };
> > 
> > struct anv_cmd_pool {
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> > b/src/intel/vulkan/genX_cmd_buffer.c
> > index 611311904e6..2847e0b30c9 100644
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > +++ b/src/intel/vulkan/genX_cmd_buffer.c
> > @@ -67,6 +67,12 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> > anv_cmd_buffer *cmd_buffer)
> > {
> >struct anv_device *device = cmd_buffer->device;
> > 
> > +   struct anv_address new_base_address =
> > +  anv_cmd_buffer_surface_base_address(cmd_buffer);
> > +   if (new_base_address.bo == cmd_buffer-
> > >state.base_state_address.bo &&
> > +   new_base_address.offset == cmd_buffer-
> > >state.base_state_address.offset)
> > +  return;
> > +
> >/* If we are emitting a new state base address we probably need
> > to re-emit
> > * binding tables.
> > */
> > @@ -90,8 +96,7 @@ genX(cmd_buffer_emit_state_base_address)(struct 
> > anv_cmd_buffer *cmd_buffer)
> >   sba.GeneralStateMemoryObjectControlState = GENX(MOCS);
> >   sba.GeneralStateBaseAddressModifyEnable = true;
> > 
> > -  sba.SurfaceStateBaseAddress =
> > - anv_cmd_buffer_surface_base_address(cmd_buffer);
> > +  sba.SurfaceStateBaseAddress = new_base_address;
> >   sba.SurfaceStateMemoryObjectControlState = GENX(MOCS);
> >   sba.SurfaceStateBaseAddressModifyEnable = true;
> > 
> > @@ -1521,9 +1526,6 @@ genX(CmdExecuteCommands)(
> >/* Each of the secondary command buffers will use its own state
> > base
> > * address.  We need to re-emit state base address for the
> > primary after
> > * all of the secondaries are done.
> > -*
> > -* TODO: Maybe we want to make this a dirty bit to avoid extra
> > state base
> > -* address calls?
> 
> I don't think this is correct.  When a secondary executes, we have
> to 
> reemit STATE_BASE_ADDRESS because the secondary used it's own and we
> need 
> to set it back for the primary. The comment above was saying that we
> can 
> probably avoid it if we have a bunch of ExecuteCommands calls back to
> back 
> or if the last thing in the batch is a call out to a secondary.
>   As is, I 
> think this patch will cause problems in the case where the client
> uses a 
> secondary followed by rendering in the primary.  Have I missed
> something?

I shouldn't remove the comment, since this patch doesn't address that
TODO. We still emit the state base address for the primary below; the
only change here is that if the base state address of the primary is
the same as the one for the secondaries we won't actually emit the
state packet, but that should be fine. Maybe you thought I was removing
the line below?

> 
> > */
> >genX(cmd_buffer_emit_state_base_address)(primary);
> > }
> > --
> > 2.14.1
> 
> 
> 
> 

[Mesa-dev] [PATCH 2/4] anv/cmd_buffer: clean dirty push constants flag after emitting push constants

2018-06-29 Thread Iago Toral Quiroga

---
 src/intel/vulkan/genX_cmd_buffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 97b321ccaeb..66d1ef7d786 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -3008,6 +3008,8 @@ genX(cmd_buffer_flush_compute_state)(struct 
anv_cmd_buffer *cmd_buffer)
 curbe.CURBEDataStartAddress   = push_state.offset;
  }
   }
+
+  cmd_buffer->state.push_constants_dirty &= ~VK_SHADER_STAGE_COMPUTE_BIT;
}
 
cmd_buffer->state.compute.pipeline_dirty = false;
-- 
2.14.1


[Mesa-dev] [PATCH 3/4] anv/cmd_buffer: make descriptors dirty when emitting base state address

2018-06-29 Thread Iago Toral Quiroga

Every time we emit a new state base address we will need to re-emit our
binding tables, since they might have been emitted with a different base
state address.
---
 src/intel/vulkan/genX_cmd_buffer.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 66d1ef7d786..611311904e6 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -67,6 +67,11 @@ genX(cmd_buffer_emit_state_base_address)(struct 
anv_cmd_buffer *cmd_buffer)
 {
struct anv_device *device = cmd_buffer->device;
 
+   /* If we are emitting a new state base address we probably need to re-emit
+* binding tables.
+*/
+   cmd_buffer->state.descriptors_dirty |= ~0;
+
/* Emit a render target cache flush.
 *
 * This isn't documented anywhere in the PRM.  However, it seems to be
-- 
2.14.1


[Mesa-dev] [PATCH 4/4] anv/cmd_buffer: only emit state base address if the address changes

2018-06-29 Thread Iago Toral Quiroga

---
 src/intel/vulkan/anv_private.h |  5 +
 src/intel/vulkan/genX_cmd_buffer.c | 12 +++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 510471da602..1a9ab7013f2 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1989,6 +1989,11 @@ struct anv_cmd_state {
 * is one of the states in render_pass_states.
 */
struct anv_state null_surface_state;
+
+   /**
+* Current state base address.
+*/
+   struct anv_address   base_state_address;
 };
 
 struct anv_cmd_pool {
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 611311904e6..2847e0b30c9 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -67,6 +67,12 @@ genX(cmd_buffer_emit_state_base_address)(struct 
anv_cmd_buffer *cmd_buffer)
 {
struct anv_device *device = cmd_buffer->device;
 
+   struct anv_address new_base_address =
+  anv_cmd_buffer_surface_base_address(cmd_buffer);
+   if (new_base_address.bo == cmd_buffer->state.base_state_address.bo &&
+   new_base_address.offset == cmd_buffer->state.base_state_address.offset)
+  return;
+
/* If we are emitting a new state base address we probably need to re-emit
 * binding tables.
 */
@@ -90,8 +96,7 @@ genX(cmd_buffer_emit_state_base_address)(struct 
anv_cmd_buffer *cmd_buffer)
   sba.GeneralStateMemoryObjectControlState = GENX(MOCS);
   sba.GeneralStateBaseAddressModifyEnable = true;
 
-  sba.SurfaceStateBaseAddress =
- anv_cmd_buffer_surface_base_address(cmd_buffer);
+  sba.SurfaceStateBaseAddress = new_base_address;
   sba.SurfaceStateMemoryObjectControlState = GENX(MOCS);
   sba.SurfaceStateBaseAddressModifyEnable = true;
 
@@ -1521,9 +1526,6 @@ genX(CmdExecuteCommands)(
/* Each of the secondary command buffers will use its own state base
 * address.  We need to re-emit state base address for the primary after
 * all of the secondaries are done.
-*
-* TODO: Maybe we want to make this a dirty bit to avoid extra state base
-* address calls?
 */
genX(cmd_buffer_emit_state_base_address)(primary);
 }
-- 
2.14.1


[Mesa-dev] [PATCH 1/4] anv/cmd_buffer: never shrink the push constant buffer size

2018-06-29 Thread Iago Toral Quiroga

If we have to re-emit push constant data, we need to re-emit all
of it.
---
 src/intel/vulkan/anv_cmd_buffer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_cmd_buffer.c 
b/src/intel/vulkan/anv_cmd_buffer.c
index 33687920a38..3e9f000f7b8 100644
--- a/src/intel/vulkan/anv_cmd_buffer.c
+++ b/src/intel/vulkan/anv_cmd_buffer.c
@@ -166,6 +166,7 @@ anv_cmd_buffer_ensure_push_constants_size(struct 
anv_cmd_buffer *cmd_buffer,
  anv_batch_set_error(_buffer->batch, VK_ERROR_OUT_OF_HOST_MEMORY);
  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
   }
+  (*ptr)->size = size;
} else if ((*ptr)->size < size) {
   *ptr = vk_realloc(_buffer->pool->alloc, *ptr, size, 8,
  VK_SYSTEM_ALLOCATION_SCOPE_OBJECT);
@@ -173,8 +174,8 @@ anv_cmd_buffer_ensure_push_constants_size(struct 
anv_cmd_buffer *cmd_buffer,
  anv_batch_set_error(_buffer->batch, VK_ERROR_OUT_OF_HOST_MEMORY);
  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
   }
+  (*ptr)->size = size;
}
-   (*ptr)->size = size;
 
return VK_SUCCESS;
 }
-- 
2.14.1


Re: [Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty

2018-06-28 Thread Iago Toral

On Thu, 2018-06-28 at 08:47 +0200, Iago Toral wrote:
> On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote:
> > On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral 
> > wrote:
> > > On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:
> > > > On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga wrote:
> > > > > Storage images require to patch push constant state to work,
> > > > > which happens during
> > > > > 
> > > > > binding table emission. In the scenario where our pipeline and
> > > > > descriptors are
> > > > > 
> > > > > not dirty, we don't re-emit the binding table, however, if
> > > > > our push constant
> > > > > 
> > > > > state is dirty, we will re-emit the push constant state,
> > > > > trashing storage
> > > > > 
> > > > > image setup.
> > > > > 
> > > > > 
> > > > > 
> > > > > While that scenario is probably not very likely to happen in
> > > > > practice, there
> > > > > 
> > > > > are some CTS tests that trigger this by clearing storage
> > > > > images and buffers
> > > > > 
> > > > > and dispatching a compute shader in a loop. The clearing of
> > > > > the images
> > > > > 
> > > > > and buffers will trigger a blorp execution which will dirty
> > > > > our push constant
> > > > > 
> > > > > state, however, because  we don't alter the descriptors or
> > > > > the compute dispatch
> > > > > 
> > > > > at all in the loop (we are basically execution the same
> > > > > program in a loop),
> > > > > 
> > > > > our pipeline and descriptor state is not dirty. If the shader
> > > > > uses a storage
> > > > > 
> > > > > image, then any iteration after the first will re-emit push
> > > > > constant state
> > > > > 
> > > > > without re-emitting binding tables and the storage image will
> > > > > not be properly
> > > > > 
> > > > > setup any more.
> > > > 
> > > > I don't see why that is a problem.  The only thing
> > > > flush_descriptor_sets does is fill out the binding and sampler
> > > > tables and fill in the push constant data for storage
> > > > images/buffers.  The actual HW packets are filled out by
> > > > flush_push_constants and emit_descriptor_pointers.  Yes, blorp
> > > > trashes our descriptor pointers but the descriptor sets should
> > > > be fine.  For push constants, it does emit 3DSTATE_CONSTANT_*
> > > > but it doesn't actually modify anv_cmd_state::push_constants.
> > > > 
> > > > Are secondary command buffers involved?  I could see something
> > > > funny going on with those.
> > > 
> > > No, no secondaries are involved. I did some more investigation
> > > and I think my explanation of the problem was not good, this is
> > > what is really happening:
> > > 
> > > First, I found the problem in the compute pipeline and I only
> > > extended the fix to the graphics pipeline because it looked like
> > > the same rationale would apply, so I'll explain what happens in
> > > compute and then we can discuss whether the same problem applies
> > > to graphics.
> > > 
> > > The test does something like this:
> > > 
> > > for (...) {
> > >clear ssbos / storage images
> > >dispatch compute
> > > }
> > > 
> > > The first iteration of this loop will find that the compute
> > > pipeline and descriptors are dirty and proceed to emit binding
> > > tables. We have storage images, so during that process the push
> > > constant buffer is amended to include storage images.
> > > Specifically, we call anv_cmd_buffer_ensure_push_constants_size()
> > > for the images field. This gives us a size of 624.
> > > 
> > > We move on to the second iteration of the loop. When we clear
> > > images and ssbos via blorp, we again mark the push constant
> > > buffer as dirty. Now we execute the compute dispatch and the
> > > first thing we do there is anv_cmd_buffer_push_base_group_id()
> > > which calls anv_cmd_buffer_ensure_push_constants_size() for the
> > > base group id

Re: [Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty

2018-06-28 Thread Iago Toral

On Wed, 2018-06-27 at 09:13 -0700, Jason Ekstrand wrote:
> On Wed, Jun 27, 2018 at 2:25 AM, Iago Toral 
> wrote:
> > On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:
> > > On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga wrote:
> > > > Storage images require to patch push constant state to work,
> > > > which happens during
> > > > 
> > > > binding table emission. In the scenario where our pipeline and
> > > > descriptors are
> > > > 
> > > > not dirty, we don't re-emit the binding table, however, if our
> > > > push constant
> > > > 
> > > > state is dirty, we will re-emit the push constant state,
> > > > trashing storage
> > > > 
> > > > image setup.
> > > > 
> > > > 
> > > > 
> > > > While that scenario is probably not very likely to happen in
> > > > practice, there
> > > > 
> > > > are some CTS tests that trigger this by clearing storage images
> > > > and buffers
> > > > 
> > > > and dispatching a compute shader in a loop. The clearing of the
> > > > images
> > > > 
> > > > and buffers will trigger a blorp execution which will dirty our
> > > > push constant
> > > > 
> > > > state, however, because  we don't alter the descriptors or the
> > > > compute dispatch
> > > > 
> > > > at all in the loop (we are basically execution the same program
> > > > in a loop),
> > > > 
> > > > our pipeline and descriptor state is not dirty. If the shader
> > > > uses a storage
> > > > 
> > > > image, then any iteration after the first will re-emit push
> > > > constant state
> > > > 
> > > > without re-emitting binding tables and the storage image will
> > > > not be properly
> > > > 
> > > > setup any more.
> > > 
> > > I don't see why that is a problem.  The only thing
> > > flush_descriptor_sets does is fill out the binding and sampler
> > > tables and fill in the push constant data for storage
> > > images/buffers.  The actual HW packets are filled out by
> > > flush_push_constants and emit_descriptor_pointers.  Yes, blorp
> > > trashes our descriptor pointers but the descriptor sets should be
> > > fine.  For push constants, it does emit 3DSTATE_CONSTANT_* but it
> > > doesn't actually modify anv_cmd_state::push_constants.
> > > 
> > > Are secondary command buffers involved?  I could see something
> > > funny going on with those.
> > 
> > No, no secondaries are involved. I did some more investigation and
> > I think my explanation of the problem was not good, this is what is
> > really happening:
> > 
> > First, I found the problem in the compute pipeline and I only
> > extended the fix to the graphics pipeline because it looked like
> > the same rationale would apply, so I'll explain what happens in
> > compute and then we can discuss whether the same problem applies to
> > graphics.
> > 
> > The test does something like this:
> > 
> > for (...) {
> >clear ssbos / storage images
> >dispatch compute
> > }
> > 
> > The first iteration of this loop will find that the compute
> > pipeline and descriptors are dirty and proceed to emit binding
> > tables. We have storage images, so during that process the push
> > constant buffer is amended to include storage images. Specifically,
> > we call anv_cmd_buffer_ensure_push_constants_size() for the images
> > field. This gives us a size of 624.
> > 
> > We move on to the second iteration of the loop. When we clear
> > images and ssbos via blorp, we again mark the push constant buffer
> > as dirty. Now we execute the compute dispatch and the first thing
> > we do there is anv_cmd_buffer_push_base_group_id() which calls
> > anv_cmd_buffer_ensure_push_constants_size() for the base group id,
> > which gives as a size of 144. This is smaller than what we computed
> > in the previous iteration, because we haven't called the same
> > function for the images field yet. Unfortunately, we will never
> > call that again, because we only do that during binding table
> > emission and we only do that if the compute pipeline is dirty (it
> > is not) or our descriptors are  dirty (they are not). So we don't
> > re-emit binding table and we don't ensure push constant space for
>

Re: [Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty

2018-06-27 Thread Iago Toral

On Tue, 2018-06-26 at 10:59 -0700, Jason Ekstrand wrote:
> On Tue, Jun 26, 2018 at 4:08 AM, Iago Toral Quiroga wrote:
> > Storage images require to patch push constant state to work, which
> > happens during
> > 
> > binding table emission. In the scenario where our pipeline and
> > descriptors are
> > 
> > not dirty, we don't re-emit the binding table, however, if our push
> > constant
> > 
> > state is dirty, we will re-emit the push constant state, trashing
> > storage
> > 
> > image setup.
> > 
> > 
> > 
> > While that scenario is probably not very likely to happen in
> > practice, there
> > 
> > are some CTS tests that trigger this by clearing storage images and
> > buffers
> > 
> > and dispatching a compute shader in a loop. The clearing of the
> > images
> > 
> > and buffers will trigger a blorp execution which will dirty our
> > push constant
> > 
> > state, however, because  we don't alter the descriptors or the
> > compute dispatch
> > 
> > at all in the loop (we are basically execution the same program in
> > a loop),
> > 
> > our pipeline and descriptor state is not dirty. If the shader uses
> > a storage
> > 
> > image, then any iteration after the first will re-emit push
> > constant state
> > 
> > without re-emitting binding tables and the storage image will not
> > be properly
> > 
> > setup any more.
> 
> I don't see why that is a problem.  The only thing
> flush_descriptor_sets does is fill out the binding and sampler tables
> and fill in the push constant data for storage images/buffers.  The
> actual HW packets are filled out by flush_push_constants and
> emit_descriptor_pointers.  Yes, blorp trashes our descriptor pointers
> but the descriptor sets should be fine.  For push constants, it does
> emit 3DSTATE_CONSTANT_* but it doesn't actually modify
> anv_cmd_state::push_constants.
> 
> Are secondary command buffers involved?  I could see something funny
> going on with those.

No, no secondaries are involved. I did some more investigation and I
think my explanation of the problem was not good, this is what is
really happening:
First, I found the problem in the compute pipeline and I only extended
the fix to the graphics pipeline because it looked like the same
rationale would apply, so I'll explain what happens in compute and then
we can discuss whether the same problem applies to graphics.
The test does something like this:
for (...) {   clear ssbos / storage images   dispatch compute}
The first iteration of this loop will find that the compute pipeline
and descriptors are dirty and proceed to emit binding tables. We have
storage images, so during that process the push constant buffer is
amended to include storage images. Specifically, we call
anv_cmd_buffer_ensure_push_constants_size() for the images field. This
gives us a size of 624.

We move on to the second iteration of the loop. When we clear images
and ssbos via blorp, we again mark the push constant buffer as dirty.
Now we execute the compute dispatch and the first thing we do there is
anv_cmd_buffer_push_base_group_id() which calls
anv_cmd_buffer_ensure_push_constants_size() for the base group id,
which gives us a size of 144. This is smaller than what we computed in
the previous iteration, because we haven't called the same function for
the images field yet. Unfortunately, we will never call that again,
because we only do that during binding table emission and we only do
that if the compute pipeline is dirty (it is not) or our descriptors
are dirty (they are not). So we don't re-emit the binding table and we
don't ensure push constant space for the image data, but because we
come from a blorp execution our push constant dirty flag is true, so we
re-emit push constant data, only that this time we won't emit the push
constant data we need for the storage images, which leads to the
problem.
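
The sequence above can be pictured with a small toy model. This is only a sketch under stated assumptions: the toy_* names are hypothetical and do not exist in anv, the dirty flags are reduced to plain booleans, and the sizes 624 and 144 are simply the two values quoted above. The with_fix switch models the idea of the patch (also refreshing the image push constant data when only the push constants are dirty).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in anv. */
enum {
   TOY_BASE_GROUP_ID_SIZE = 144, /* size reported for the base group id */
   TOY_IMAGES_SIZE = 624,        /* size reported for the images field */
};

struct toy_cmd_state {
   size_t push_size;           /* push constant size that would be emitted */
   bool pipeline_dirty;
   bool descriptors_dirty;
   bool push_constants_dirty;
};

/* Models the observed behaviour: the size is overwritten on every call. */
static void toy_ensure_size(struct toy_cmd_state *s, size_t size)
{
   s->push_size = size;
}

/* Stand-in for the blorp clear: it only dirties the push constants. */
static void toy_clear(struct toy_cmd_state *s)
{
   s->push_constants_dirty = true;
}

/* Returns the push constant size emitted by this dispatch (0 if none). */
static size_t toy_dispatch(struct toy_cmd_state *s, bool with_fix)
{
   size_t emitted = 0;

   /* The base group id is pushed first on every dispatch. */
   toy_ensure_size(s, TOY_BASE_GROUP_ID_SIZE);

   /* "Binding table emission": the only place the images are accounted for. */
   if (s->pipeline_dirty || s->descriptors_dirty ||
       (with_fix && s->push_constants_dirty)) {
      toy_ensure_size(s, TOY_IMAGES_SIZE);
      s->pipeline_dirty = false;
      s->descriptors_dirty = false;
   }

   if (s->push_constants_dirty) {
      emitted = s->push_size;
      s->push_constants_dirty = false;
   }
   return emitted;
}

/* Runs "clear + dispatch" in a loop and returns the size emitted last. */
static size_t toy_run(int iterations, bool with_fix)
{
   struct toy_cmd_state s = { 0, true, true, false };
   size_t emitted = 0;

   assert(iterations > 0);
   for (int i = 0; i < iterations; i++) {
      toy_clear(&s);
      emitted = toy_dispatch(&s, with_fix);
   }
   return emitted;
}
```

In this model the first iteration emits the full size, while the second one only emits the smaller size unless with_fix is set, which matches the behaviour described above.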

I thought that maybe making anv_cmd_buffer_ensure_push_constants_size()
only update the size if we alloc or realloc would fix this, but that
can cause GPU hangs in some cases when I run multiple tests in
parallel, so I guess it isn't that simple.

I hope I am making more sense now.
Iago
> --Jason
>  
> > Fixes multiple failures in some new CTS tests.
> > 
> > ---
> > 
> >  src/intel/vulkan/genX_cmd_buffer.c | 9 -
> > 
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> > b/src/intel/vulkan/genX_cmd_buffer.c
> > 
> > index 97b321ccaeb..6e48aaedb9b 100644
> > 
> > --- a/src/intel/vulkan/genX_cmd_buffer.c
> > 
> > +++ b/src/intel/vulkan/gen

[Mesa-dev] [PATCH] anv/cmd_buffer: emit binding tables always if push constants are dirty

2018-06-26 Thread Iago Toral Quiroga

Storage images require patching push constant state to work, which happens
during binding table emission. In the scenario where our pipeline and
descriptors are not dirty, we don't re-emit the binding table. However, if our
push constant state is dirty, we will re-emit the push constant state, trashing
the storage image setup.

While that scenario is probably not very likely to happen in practice, there
are some CTS tests that trigger this by clearing storage images and buffers
and dispatching a compute shader in a loop. The clearing of the images
and buffers will trigger a blorp execution which will dirty our push constant
state. However, because we don't alter the descriptors or the compute dispatch
at all in the loop (we are basically executing the same program in a loop),
our pipeline and descriptor state is not dirty. If the shader uses a storage
image, then any iteration after the first will re-emit push constant state
without re-emitting binding tables and the storage image will not be properly
set up any more.

Fixes multiple failures in some new CTS tests.
---
 src/intel/vulkan/genX_cmd_buffer.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 97b321ccaeb..6e48aaedb9b 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2554,7 +2554,8 @@ genX(cmd_buffer_flush_state)(struct anv_cmd_buffer 
*cmd_buffer)
 * 3DSTATE_BINDING_TABLE_POINTER_* for the push constants to take effect.
 */
uint32_t dirty = 0;
-   if (cmd_buffer->state.descriptors_dirty)
+   if (cmd_buffer->state.descriptors_dirty ||
+   cmd_buffer->state.push_constants_dirty)
   dirty = flush_descriptor_sets(cmd_buffer);
 
if (dirty || cmd_buffer->state.push_constants_dirty) {
@@ -2988,7 +2989,13 @@ genX(cmd_buffer_flush_compute_state)(struct 
anv_cmd_buffer *cmd_buffer)
   anv_batch_emit_batch(&cmd_buffer->batch, &pipeline->batch);
}
 
+   /* Storage images require push constant data, which is set up during the
+* binding table emission. If we have dirty push constants, we need to
+* re-emit the binding table so we get the push constant storage image setup
+* done, otherwise we trash it when we emit push constants below.
+*/
if ((cmd_buffer->state.descriptors_dirty & VK_SHADER_STAGE_COMPUTE_BIT) ||
+   (cmd_buffer->state.push_constants_dirty & VK_SHADER_STAGE_COMPUTE_BIT) 
||
cmd_buffer->state.compute.pipeline_dirty) {
   /* FIXME: figure out descriptors for gen7 */
   result = flush_compute_descriptor_set(cmd_buffer);
-- 
2.14.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

2018-06-25 Thread Iago Toral

Thanks for testing Mark.

Andrii, I'll add my Reviewed-by and push the patch to master later
today (I'll also queue it for the next stable release).

Thanks for fixing this!

Iago

On Fri, 2018-06-22 at 13:18 -0700, Mark Janes wrote:
> Tested-by: Mark Janes 
> 
> Iago Toral  writes:
> 
> > Thanks Andrii, this version looks good to me.
> > 
> > Mark: this change fixes a GPU hang in sandy bridge with geometry
> > shaders (the change itself affects a path in the driver that is
> > only
> > executed in SNB with GS, so nothing else is affected). While I
> > think
> > the change in here is correct according to the PRMs and in fact it
> > seems to fix the GPU hang reported in Bugzilla, since this is
> > tinkering
> > with the way in which we allocate and free VUEs for SNB GS I
> > believe
> > that if this breaks anything it might produce a GPU hang and in
> > that
> > case I would rather not hang Jenkins for everyone else until you
> > have a
> > chance to restore it, so in order to minimize that risk, could you
> > run
> > this through Jenkins when you are available? If that is
> > inconvenient
> > for you just let me know and I will send it myself late in my day
> > on
> > Monday to minimize the risk.
> > 
> > Thanks,
> > Iago
> > 
> > On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote:
> > > We can not use the VUE Dereference flags combination for EOT
> > > message under ILK and SNB because the threads are not initialized
> > > there with initial VUE handle unlike Pre-IL.
> > > So to avoid GPU hangs on SNB and ILK we need
> > > to avoid usage of the VUE Dereference flags combination.
> > > (Was tested only on SNB but according to the specification
> > > SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> > > the ILK must behave itself in the similar way)
> > > 
> > > v2: Approach to fix this issue was changed.
> > > Instead of different EOT flags in the program end
> > > we will create VUE every time even if GS produces no output.
> > > 
> > > v3: Clean up the patch.
> > > Signed-off-by: Andrii Simiklit 
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> > > 
> > > ---
> > >  src/intel/compiler/gen6_gs_visitor.cpp | 42 +---
> > > 
> > > --
> > >  1 file changed, 21 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> > > b/src/intel/compiler/gen6_gs_visitor.cpp
> > > index 66c69fb..c9571cc 100644
> > > --- a/src/intel/compiler/gen6_gs_visitor.cpp
> > > +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> > > @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end()
> > > int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
> > >  
> > > /* Issue the FF_SYNC message and obtain the initial VUE
> > > handle.
> > > */
> > > +   this->current_annotation = "gen6 thread end: ff_sync";
> > > +
> > > +   vec4_instruction *inst = NULL;
> > > +   if (prog->info.has_transform_feedback_varyings) {
> > > +  src_reg sol_temp(this, glsl_type::uvec4_type);
> > > +  emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> > > +   dst_reg(this->svbi),
> > > +   this->vertex_count,
> > > +   this->prim_count,
> > > +   sol_temp);
> > > +  inst = emit(GS_OPCODE_FF_SYNC,
> > > +  dst_reg(this->temp), this->prim_count, this-
> > > > svbi);
> > > 
> > > +   } else {
> > > +  inst = emit(GS_OPCODE_FF_SYNC,
> > > +  dst_reg(this->temp), this->prim_count,
> > > brw_imm_ud(0u));
> > > +   }
> > > +   inst->base_mrf = base_mrf;
> > > +
> > > emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> > > BRW_CONDITIONAL_G));
> > > emit(IF(BRW_PREDICATE_NORMAL));
> > > {
> > > -  this->current_annotation = "gen6 thread end: ff_sync";
> > > -
> > > -  vec4_instruction *inst;
> > > -  if (prog->info.has_transform_feedback_varyings) {
> > > - src_reg sol_temp(this, glsl_type::uvec4_type);
> > > - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> > > -  dst_reg(this->svbi),
> > > -  this->vertex_count,
> > > -  this->prim_count,
> > > -

Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

2018-06-22 Thread Iago Toral

Thanks Andrii, this version looks good to me.

Mark: this change fixes a GPU hang on Sandy Bridge with geometry
shaders (the change itself affects a path in the driver that is only
executed on SNB with GS, so nothing else is affected). I think the
change is correct according to the PRMs, and it does seem to fix the
GPU hang reported in Bugzilla. However, since it tinkers with the way
we allocate and free VUEs for SNB GS, if it breaks anything it might
produce a GPU hang, and in that case I would rather not hang Jenkins
for everyone else until you have a chance to restore it. To minimize
that risk, could you run this through
Jenkins when you are available? If that is inconvenient
for you just let me know and I will send it myself late in my day on
Monday to minimize the risk.
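
For context, the allocate/free behaviour this patch touches can be pictured with a toy model. This is only a sketch under stated assumptions: the toy_* names are hypothetical and are neither driver nor hardware API, a "GPU hang" is modelled as a function returning false, and the only rules encoded are the ones cited in the commit message (on SNB a thread gets its first VUE handle from FF_SYNC, and an EOT with Complete set and Used clear dereferences a handle).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not driver or hardware API. */
struct toy_gs_thread {
   bool has_vue_handle;
};

/* FF_SYNC stand-in: on SNB the thread obtains its first handle here. */
static void toy_ff_sync(struct toy_gs_thread *t)
{
   t->has_vue_handle = true;
}

/* URB write with allocate: hands off the written VUE, gets a fresh handle. */
static void toy_urb_write_allocate(struct toy_gs_thread *t)
{
   assert(t->has_vue_handle);
   t->has_vue_handle = true;
}

/* EOT with Complete=1, Used=0: dereferences the handle the thread holds.
 * Returns false when there is no handle to dereference, which is the case
 * this model treats as the hang.
 */
static bool toy_eot_complete_unused(struct toy_gs_thread *t)
{
   if (!t->has_vue_handle)
      return false;
   t->has_vue_handle = false;
   return true;
}

/* ff_sync_always = false models the old code (FF_SYNC inside the
 * "vertex_count > 0" block), true models the patched code.
 */
static bool toy_thread_end(unsigned vertex_count, bool ff_sync_always)
{
   struct toy_gs_thread t = { false };

   if (ff_sync_always)
      toy_ff_sync(&t);

   if (vertex_count > 0) {
      if (!ff_sync_always)
         toy_ff_sync(&t);
      for (unsigned i = 0; i < vertex_count; i++)
         toy_urb_write_allocate(&t);
   }

   return toy_eot_complete_unused(&t);
}
```

In this model only the combination "no vertices emitted, FF_SYNC inside the conditional" ends the thread without a handle to dereference.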

Thanks,
Iago

On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> v2: Approach to fix this issue was changed.
> Instead of different EOT flags in the program end
> we will create VUE every time even if GS produces no output.
> 
> v3: Clean up the patch.
> Signed-off-by: Andrii Simiklit 
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> 
> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 42 +---
> --
>  1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index 66c69fb..c9571cc 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end()
> int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
>  
> /* Issue the FF_SYNC message and obtain the initial VUE handle.
> */
> +   this->current_annotation = "gen6 thread end: ff_sync";
> +
> +   vec4_instruction *inst = NULL;
> +   if (prog->info.has_transform_feedback_varyings) {
> +  src_reg sol_temp(this, glsl_type::uvec4_type);
> +  emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> +   dst_reg(this->svbi),
> +   this->vertex_count,
> +   this->prim_count,
> +   sol_temp);
> +  inst = emit(GS_OPCODE_FF_SYNC,
> +  dst_reg(this->temp), this->prim_count, this-
> >svbi);
> +   } else {
> +  inst = emit(GS_OPCODE_FF_SYNC,
> +  dst_reg(this->temp), this->prim_count,
> brw_imm_ud(0u));
> +   }
> +   inst->base_mrf = base_mrf;
> +
> emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> BRW_CONDITIONAL_G));
> emit(IF(BRW_PREDICATE_NORMAL));
> {
> -  this->current_annotation = "gen6 thread end: ff_sync";
> -
> -  vec4_instruction *inst;
> -  if (prog->info.has_transform_feedback_varyings) {
> - src_reg sol_temp(this, glsl_type::uvec4_type);
> - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,
> -  dst_reg(this->svbi),
> -  this->vertex_count,
> -  this->prim_count,
> -  sol_temp);
> - inst = emit(GS_OPCODE_FF_SYNC,
> - dst_reg(this->temp), this->prim_count, this-
> >svbi);
> -  } else {
> - inst = emit(GS_OPCODE_FF_SYNC,
> - dst_reg(this->temp), this->prim_count,
> brw_imm_ud(0u));
> -  }
> -  inst->base_mrf = base_mrf;
> -
>/* Loop over all buffered vertices and emit URB write messages
> */
>this->current_annotation = "gen6 thread end: urb writes init";
>src_reg vertex(this, glsl_type::uint_type);
> @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end()
> dst_reg reg = dst_reg(MRF, mrf);
> reg.type = output_reg[varying][0].type;
> data.type = reg.type;
> -   vec4_instruction *inst = emit(MOV(reg, data));
> +   inst = emit(MOV(reg, data));
> inst->force_writemask_all = true;
>  
> mrf++;
> @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end()
>  *
>  * However, this would lead us to end the program with an ENDIF
> opcode,
>  * which we want to avoid, so what we do is that we always
> request a new
> -* VUE handle every time we do a URB WRITE, even for the last
> vertex we emit.
> +* VUE handle every time, even if GS produces no output.
>  * With this we make sure that whether we have emitted at least
> one vertex
>  * or none at all, we have to finish the thread without writing
> to the URB,
>  * which works for both cases by setting the COMPLETE and UNUSED
> flags in
> @@ -476,7 +476,7 @@

Re: [Mesa-dev] [PATCH v2] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

2018-06-22 Thread Iago Toral

Hi Andrii,


thanks for verifying my suggestion and sending the new patch.

However, this patch is a diff against your previous patch. Please
merge both into a single patch so we get all the changes against
current master. Once we have that I'll run the
resulting patch through Intel's CI system to ensure that this doesn't
break anything in our test suites.

I have also added a few additional comments below:

On Thu, 2018-06-21 at 16:40 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> v2: Approach to fix this issue was changed.
> Instead of different EOT flags in the program end
> we will create VUE every time even if GS produces no output.
> 
> Signed-off-by: Andrii Simiklit 
> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 88 +---
> --
>  1 file changed, 23 insertions(+), 65 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index ac3ba55..b831d33 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,11 +300,10 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> complete, int base_mrf,
>/* Otherwise we always request to allocate a new VUE handle.
> If this is
> * the last write before the EOT message and the new handle
> never gets
> * used it will be dereferenced when we send the EOT message.
> This is
> -   * necessary to avoid different setups (under Pre-IL only) for
> the EOT message (one for the
> +   * necessary to avoid different setups for the EOT message
> (one for the
> * case when there is no output and another for the case when
> there is)
> * which would require to end the program with an
> IF/ELSE/ENDIF block,
> -   * something we do not want. 
> -   * But for ILK and SNB we can not avoid the end the program
> with an IF/ELSE/ENDIF block.
> +   * something we do not want.
> */
>inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -351,27 +350,27 @@ gen6_gs_visitor::emit_thread_end()
> int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen);
>  
> /* Issue the FF_SYNC message and obtain the initial VUE handle.
> */
> -   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> BRW_CONDITIONAL_G));
> -   emit(IF(BRW_PREDICATE_NORMAL));
> -   {
> -  this->current_annotation = "gen6 thread end: ff_sync";
> +   this->current_annotation = "gen6 thread end: ff_sync";
>  
> -  vec4_instruction *inst;
> -  if (prog->info.has_transform_feedback_varyings) {
> +   vec4_instruction *inst = NULL;
> +   if (prog->info.has_transform_feedback_varyings) {
>   src_reg sol_temp(this, glsl_type::uvec4_type);
>   emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES,

Since you are removing one level of indentation here, please re-indent
the code inside the if accordingly (the coding style is 3 blank spaces
for each indentation level).

> 
> -  dst_reg(this->svbi),
> -  this->vertex_count,
> -  this->prim_count,
> -  sol_temp);
> +   dst_reg(this->svbi),
> +   this->vertex_count,
> +   this->prim_count,
> +   sol_temp);
>   inst = emit(GS_OPCODE_FF_SYNC,
>   dst_reg(this->temp), this->prim_count, this-
> >svbi);
> -  } else {
> +   } else {
>   inst = emit(GS_OPCODE_FF_SYNC,
>   dst_reg(this->temp), this->prim_count,
> brw_imm_ud(0u));

Same inside the else.

> -  }
> -  inst->base_mrf = base_mrf;
> +   }
> +   inst->base_mrf = base_mrf;
>  
> +   emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u),
> BRW_CONDITIONAL_G));
> +   emit(IF(BRW_PREDICATE_NORMAL));
> +   {
>/* Loop over all buffered vertices and emit URB write messages
> */
>this->current_annotation = "gen6 thread end: urb writes init";
>src_reg vertex(this, glsl_type::uint_type);
> @@ -415,7 +414,7 @@ gen6_gs_visitor::emit_thread_end()
> dst_reg reg = dst_reg(MRF, mrf);
> reg.type = output_reg[varying][0].type;
> data.type = reg.type;
> -   vec4_instruction *inst = emit(MOV(reg, data));
> +   inst = emit(MOV(reg, data));
> inst->force_writemask_all = true;
>  
> mrf++;
> @@ -450,11 +449,8 @@ gen6_gs_visitor::emit_thread_end()
>if (prog->info.has_transform_feedback_varyings)
>   xfb_write();
> }
> -   const bool

[Mesa-dev] [PATCH] intel/compiler: emit actual barriers for working-group level barriers

2018-06-21 Thread Iago Toral Quiroga

Until now we have assumed that we could skip emitting these barriers
in the general case based on empirical testing and a few assumptions
detailed in a comment in the driver code. However, recent CTS tests
have shown that we actually need them to produce correct behavior.
---
 src/intel/compiler/brw_fs_nir.cpp | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 0abb4798e70..d0648c89865 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -3884,6 +3884,8 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_group_memory_barrier:
+   case nir_intrinsic_memory_barrier_shared:
case nir_intrinsic_memory_barrier_atomic_counter:
case nir_intrinsic_memory_barrier_buffer:
case nir_intrinsic_memory_barrier_image:
@@ -3895,29 +3897,6 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   break;
}
 
-   case nir_intrinsic_group_memory_barrier:
-   case nir_intrinsic_memory_barrier_shared:
-  /* We treat these workgroup-level barriers as no-ops.  This should be
-   * safe at present and as long as:
-   *
-   *  - Memory access instructions are not subsequently reordered by the
-   *compiler back-end.
-   *
-   *  - All threads from a given compute shader workgroup fit within a
-   *single subslice and therefore talk to the same HDC shared unit
-   *what supposedly guarantees ordering and coherency between threads
-   *from the same workgroup.  This may change in the future when we
-   *start splitting workgroups across multiple subslices.
-   *
-   *  - The context is not in fault-and-stream mode, which could cause
-   *memory transactions (including to SLM) prior to the barrier to be
-   *replayed after the barrier if a pagefault occurs.  This shouldn't
-   *be a problem up to and including SKL because fault-and-stream is
-   *not usable due to hardware issues, but that's likely to change in
-   *the future.
-   */
-  break;
-
case nir_intrinsic_shader_clock: {
   /* We cannot do anything if there is an event, so ignore it for now */
   const fs_reg shader_clock = get_timestamp(bld);
-- 
2.14.1


Re: [Mesa-dev] [PATCH] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

2018-06-21 Thread Iago Toral

ung until now. According to Vol4, part2, section 2.4.4.1
FF_SYNC Message Header, it is valid to specify 0 vertices/primitives
for the FF_SYNC message, so that should in theory work.
Could you see if that fixes the problem?
Iago
> > Then in 2.4.2 Message Descriptor (vol4, part2), it says:
> > 
> > " Used: 
> >   If set, this signals that the URB entry(s) referenced by
> >   the handle(s) are valid outputs of the thread.  In 
> >   all likelihood this means that that entry(s) contains
> >   complete & valid data to be subject to further 
> >   processing by the pipeline.   
> >   If clear, this signals that the URB entry(s) referenced by
> >   the handle(s) are not valid outputs of the thread.  
> >   Use of this setting will result in the handle(s) 
> >   being immediately dereferenced by the owning FF unit.  
> >   This setting is to be used by GS or CLIP threads to 
> >   dereference handles it obtained (either in the initial 
> >   thread payload or subsequent allocation writebacks) 
> >   but subsequently determined were not required  (e.g.,
> >   the object was completely clipped out)."
> > 
> > Again, there is no mention of this being Pre-ILK only and on top of
> > that, the text explicitly states that this combination is used to
> > dereference handles obtained either in the initial thread payload or
> > subsequent allocation writebacks.
> 
> So, according to section 1.6.5.5 VUE Dereference (GS) (vol2, part1)
> this combination is the Dereference operation and we shouldn't use it
> in case GS produces no output (see my first comment above).  
> 
> > And finally, it also says the following:
> > 
> > "Complete: (...)
> > Programming Notes: 
> > The following message descriptor fields are only valid when 
> > Complete is set:  Used"
> > 
> > Which I understand means that 'Used' is only applicable when
> > Complete is set, or in other words, that the only possible
> > combinations where Used is accounted for are those in which we also
> > have Complete set.
> 
> According to 
> Section 1.6.5.6 Thread Termination (vol2, part1)
>  " All threads must explicitly terminate 
>by executing a SEND instruction 
>with the EOT bit set.  (See EU chapters).  
>When a thread spawned by a 3D FF unit terminates, 
>the spawning FF unit detects 
>this termination as a part of Thread Management.  
>This allows the FF units to manage the number of 
>concurrent threads it has spawned and also manage 
>the resources (e.g., scratch space) allocated to those threads. 
> 
>Programming Note: [Pre-DevIL] GS and Clip threads must terminate 
>by sending a URB_WRITE message (with EOT set) with the Complete
> bit also
>set (therein returning a URB handle marked as either used or un-
> used). "
> 
> Only Pre-DevIL architectures must specify the Complete=1.
> 
> And finally according to all comments above we shouldn't use 
> the Dereference operation for the no output case and we know 
> that we are able to set Complete=0 because the Complete=1 value is
> mandatory only for Pre-DevIL.
> 
> I hope, that I was understandable) Could you please let me know if
> you agree with me)
> Regards,
> Andrii.
> 
> 
> 
> On 20.06.18 15:19, Iago Toral wrote:
> 
> > On Tue, 2018-06-19 at 17:06 +0300, Andrii Simiklit wrote:
> > > We can not use the VUE Dereference flags combination for EOT
> > > message under ILK and SNB because the threads are not initialized
> > > there with initial VUE handle unlike Pre-IL.
> > > So to avoid GPU hangs on SNB and ILK we need
> > > to avoid usage of the VUE Dereference flags combination.
> > > (Was tested only on SNB but according to the specification
> > > SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> > > the ILK must behave itself in the similar way)
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> > > 
> > > Signed-off-by: Andrii Simiklit 
> > > ---
> > >  src/intel/compiler/gen6_gs_visitor.cpp | 56
> > > +-
> > >  1 file changed, 49 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> > > b/src/intel/compiler/gen6_gs_visitor.cpp
> > > index 66c69fb..ac3ba55 100644
> > > --- a/src/intel/compiler/gen6_gs_visitor.cpp
> > > +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> > > @@ -300,10 +300,11 @@ gen6_gs_visit

Re: [Mesa-dev] [PATCH] i965/gen6/gs: Handle case where a GS doesn't allocate VUE

2018-06-20 Thread Iago Toral

On Tue, 2018-06-19 at 17:06 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to the specification
> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399
> 
> Signed-off-by: Andrii Simiklit 
> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 56
> +-
>  1 file changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index 66c69fb..ac3ba55 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,10 +300,11 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> complete, int base_mrf,
>/* Otherwise we always request to allocate a new VUE handle.
> If this is
> * the last write before the EOT message and the new handle
> never gets
> * used it will be dereferenced when we send the EOT message.
> This is
> -   * necessary to avoid different setups for the EOT message
> (one for the
> +   * necessary to avoid different setups (under Pre-IL only) for
> the EOT message (one for the
> * case when there is no output and another for the case when
> there is)
> * which would require to end the program with an
> IF/ELSE/ENDIF block,
> -   * something we do not want.
> +   * something we do not want. 
> +   * But for ILK and SNB we can not avoid the end the program
> with an IF/ELSE/ENDIF block.
> */
>inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -449,8 +450,11 @@ gen6_gs_visitor::emit_thread_end()
>if (prog->info.has_transform_feedback_varyings)
>   xfb_write();
> }
> -   emit(BRW_OPCODE_ENDIF);
> -
> +   const bool common_eot_approach_can_be_used = (devinfo->gen < 5);

We don't implement GS before gen6, and I don't think there are plans
for it at this point, so I think we can just simplify the patch by
assuming that devinfo->gen is always going to be 6 here (later gens use
a different implementation of GS).

> +   if(common_eot_approach_can_be_used)
> +   {
> +  emit(BRW_OPCODE_ENDIF);  
> +   }
> /* Finally, emit EOT message.
>  *
>  * In gen6 we need to end the thread differently depending on
> whether we have
> @@ -463,8 +467,32 @@ gen6_gs_visitor::emit_thread_end()
>  * VUE handle every time we do a URB WRITE, even for the last
> vertex we emit.
>  * With this we make sure that whether we have emitted at least
> one vertex
>  * or none at all, we have to finish the thread without writing
> to the URB,
> -* which works for both cases by setting the COMPLETE and UNUSED
> flags in
> -* the EOT message.
> +* which works for both cases (but only under Pre-IL) by setting 
> +* the COMPLETE and UNUSED flags in the EOT message.
> +* 
> +* But under ILK or SNB we must not use combination COMPLETE and
> UNUSED 
> +* because this combination could be used only for already
> allocated VUE. 
> +* But unlike Pre-IL in the ILK and SNB 
> +* the initial VUE is not passed to threads. 
> +* This behaver mentioned in specification: 
> +* SNB Volume 2 Part 1:
> +*  "1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]"
> +*  "1.6.5.4 VUE Allocation (GS) [DevSNB+]"
> +* "The threads are not passed an initial handle.  
> +* Instead, they request a first handle (if any) 
> +* via the URB shared function’s FF_SYNC message (see Shared
> Functions). 
> +* If additional handles are required, 
> +* the URB_WRITE allocate mechanism (mentioned above) is
> used."
> +* 
> +* So for ILK and for SNB we must use only UNUSED flag.
> +* This is accepteble combination according to:
> +*SNB Volume 4 Part 2:
> +*   "2.4.2 Message Descriptor"
> +*  "Table lists the valid and invalid combinations of 
> +*   the Complete, Used, Allocate and EOT bits"
> +*  "Thread terminate non-write of URB"
> +*SNB Volume 2 Part 1:
> +*   "1.6.5.6 Thread Termination"
>  */

I am not sure why you conclude all this from the PRM. This is what I
see:

Section 1.6.5.5 VUE Dereference (GS) (vol2, part1) says:

"It is possible and legal for a thread to produce no output
 or subsequently allocate a destination VUE that
 was not required (e.g., the thread allocated ahead).
 Therefore, there is a mechanism by which a thread can “give back”
 (dereference) an allocated VUE. This mechanism must be used if
 the VUE is not written before the thread terminates. A kernel
Re: [Mesa-dev] [PATCH] Fix 105399 bug GPU hang on SNB using geometry shader. The end of thread (EOT) message with flags Complete=1 and Used=0 will leads to GPU hang on SNB and ILK when GS does not all

2018-06-19 Thread Iago Toral

Hi Andrii,

thanks for the fix!

Kenneth, this patch makes it so that we end the GS program with an
ENDIF. I remember that back in the day when I wrote this code you had
concerns about that (that's why I added that comment), but that was a
long time ago so maybe things have changed. Do you know if this is
still something that we should avoid?

Andrii: please keep the subject line (the part of the e-mail subject
after "[PATCH]") short enough to fit in 80 characters. I did a quick
review of the patch inline below and will do a more thorough review
tomorrow.

On Tue, 2018-06-19 at 11:31 +0300, Andrii Simiklit wrote:
> We can not use the VUE Dereference flags combination for EOT
> message under ILK and SNB because the threads are not initialized
> there with initial VUE handle unlike Pre-IL.
> So to avoid GPU hangs on SNB and ILK we need
> to avoid usage of the VUE Dereference flags combination.
> (Was tested only on SNB but according to specification
> https://01.org/sites/default/files/documentation/snb_ihd_os_vol2_part
> 1_0.pdf
> sections: 1.6.5.3, 1.6.5.6
> the ILK must behave itself in the similar way)
> 
> Signed-off-by: Andrii Simiklit 

Add:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399

> ---
>  src/intel/compiler/gen6_gs_visitor.cpp | 53
> ++
>  1 file changed, 47 insertions(+), 6 deletions(-)
> 
> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp
> b/src/intel/compiler/gen6_gs_visitor.cpp
> index 66c69fb..2143fd2 100644
> --- a/src/intel/compiler/gen6_gs_visitor.cpp
> +++ b/src/intel/compiler/gen6_gs_visitor.cpp
> @@ -300,10 +300,11 @@ gen6_gs_visitor::emit_urb_write_opcode(bool
> complete, int base_mrf,
>/* Otherwise we always request to allocate a new VUE handle.
> If this is
> * the last write before the EOT message and the new handle
> never gets
> * used it will be dereferenced when we send the EOT message.
> This is
> -   * necessary to avoid different setups for the EOT message
> (one for the
> +   * necessary to avoid different setups (under Pre-IL only) for
> the EOT message (one for the
> * case when there is no output and another for the case when
> there is)
> * which would require to end the program with an
> IF/ELSE/ENDIF block,
> -   * something we do not want.
> +   * something we do not want. 
> +   * But for ILK and SNB we can not avoid the end the program
> with an IF/ELSE/ENDIF block.
> */

Limit lines to 80 characters long.

>inst = emit(GS_OPCODE_URB_WRITE_ALLOCATE);
>inst->urb_write_flags = BRW_URB_WRITE_COMPLETE;
> @@ -449,8 +450,12 @@ gen6_gs_visitor::emit_thread_end()
>if (prog->info.has_transform_feedback_varyings)
>   xfb_write();
> }
> -   emit(BRW_OPCODE_ENDIF);
> -
> +   enum { GEN5_ILK = 5 };
> +   const bool common_eot_approach_can_be_used = (devinfo->gen <
> GEN5_ILK);

devinfo->gen < 5 is fine, we do that everywhere in the driver.

> +   if(common_eot_approach_can_be_used)
> +   {
> +  emit(BRW_OPCODE_ENDIF);  
> +   }
> /* Finally, emit EOT message.
>  *
>  * In gen6 we need to end the thread differently depending on
> whether we have
> @@ -463,8 +468,30 @@ gen6_gs_visitor::emit_thread_end()
>  * VUE handle every time we do a URB WRITE, even for the last
> vertex we emit.
>  * With this we make sure that whether we have emitted at least
> one vertex
>  * or none at all, we have to finish the thread without writing
> to the URB,
> -* which works for both cases by setting the COMPLETE and UNUSED
> flags in
> +* which works for both cases (but only under Pre-IL) by setting
> the COMPLETE and UNUSED flags in
>  * the EOT message.
> +* 
> +* But under ILK or SNB we must not use combination COMPLETE and
> UNUSED 
> +* because this combination could be used only for already
> allocated VUE. 
> +* But unlike Pre-IL in the ILK and SNB the initial VUE is not
> passed to threads. 
> +* This behaver mentioned in specification: 
> +* SNB (gen6) Spec:
> https://01.org/sites/default/files/documentation/snb_ihd_os_vol2_part
> 1_0.pdf

I think you can drop the URL, mentioning the specific section of the
PRM with the relevant text is sufficient.

> +*1.6.5.3 VUE Allocation (GS, CLIP) [DevIL]
> +*1.6.5.4 VUE Allocation (GS) [DevSNB+]

We usually write PRM citations with quotes.

> +*   The threads are not passed an initial handle.  
> +*   Instead, they request a first handle (if any) 
> +*   via the URB shared function’s FF_SYNC message (see
> Shared Functions). 
> +*   If additional handles are required, 
> +*   the URB_WRITE allocate mechanism (mentioned above) is
> used. 
> +* 
> +* So for ILK and for SNB we must use only UNUSED flag.
> +* This is accepteble combination according to:
> +*SNB (gen6) Spec: https://01.org/sites/default/files/documen
>

Re: [Mesa-dev] [PATCH mesa] mesa: add missing return in error path

2018-06-18 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga 

On Mon, 2018-06-18 at 11:40 +0100, Eric Engestrom wrote:
> Fixes: 67f40dadaadacd90 "mesa: add support for
> ARB_sample_locations"
> Cc: Rhys Perry 
> Cc: Brian Paul 
> Signed-off-by: Eric Engestrom 
> ---
>  src/mesa/main/fbobject.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/fbobject.c b/src/mesa/main/fbobject.c
> index 5d7e5d29847dcfdbb23e..fa7a9361dfcfeaa105aa 100644
> --- a/src/mesa/main/fbobject.c
> +++ b/src/mesa/main/fbobject.c
> @@ -4695,9 +4695,11 @@ sample_locations(struct gl_context *ctx,
> struct gl_framebuffer *fb,
> if (!fb->SampleLocationTable) {
>size_t size = MAX_SAMPLE_LOCATION_TABLE_SIZE * 2 *
> sizeof(GLfloat);
>fb->SampleLocationTable = malloc(size);
> -  if (!fb->SampleLocationTable)
> +  if (!fb->SampleLocationTable) {
>   _mesa_error(ctx, GL_OUT_OF_MEMORY,
>   "Cannot allocate sample location table");
> + return;
> +  }
>for (i = 0; i < MAX_SAMPLE_LOCATION_TABLE_SIZE * 2; i++)
>   fb->SampleLocationTable[i] = 0.5f;
> }
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] mesa: add better GLSL override support for compat profile

2018-06-18 Thread Iago Toral

On Mon, 2018-06-18 at 12:51 +1000, Timothy Arceri wrote:
> ---
>  src/mesa/main/version.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/main/version.c b/src/mesa/main/version.c
> index 1bdccf4a1df..0d2597a61f4 100644
> --- a/src/mesa/main/version.c
> +++ b/src/mesa/main/version.c
> @@ -228,6 +228,7 @@ _mesa_override_glsl_version(struct gl_constants
> *consts)
> }
>  
> n = sscanf(version, "%u", &consts->GLSLVersion);
> +   consts->GLSLVersionCompat = consts->GLSLVersion;
> if (n != 1) {
>fprintf(stderr, "error: invalid value for %s: %s\n", env_var,
> version);
>return;
> @@ -624,16 +625,21 @@ _mesa_compute_version(struct gl_context *ctx)
>switch (ctx->Version) {
>case 30:
>   ctx->Const.GLSLVersion = 130;
> + ctx->Const.GLSLVersionCompat = 130;
>   break;
>case 31:
>   ctx->Const.GLSLVersion = 140;
> + ctx->Const.GLSLVersionCompat = 140;
>   break;
>case 32:
>   ctx->Const.GLSLVersion = 150;
> + ctx->Const.GLSLVersionCompat = 150;
>   break;
>default:
> - if (ctx->Version >= 33)
> + if (ctx->Version >= 33) {
>  ctx->Const.GLSLVersion = ctx->Version * 10;
> +ctx->Const.GLSLVersionCompat = ctx->Version * 10;
> + }
>   break;

Looks like we should be able to just do this after the switch right?:

ctx->Const.GLSLVersionCompat = ctx->Const.GLSLVersion;

I'd prefer this unless there is something I am missing.
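
As a rough sketch of that suggestion, the per-case assignments collapse into a single statement after the switch. The `Constants` struct and `compute_glsl_version` function below are toy stand-ins invented for illustration, not the real gl_constants or _mesa_compute_version code:

```cpp
// Hypothetical stand-in for the two fields of interest; not Mesa's struct.
struct Constants {
   unsigned GLSLVersion = 0;
   unsigned GLSLVersionCompat = 0;
};

// Maps a GL version (30, 31, 32, 33, 45, ...) to a GLSL version.
void compute_glsl_version(unsigned gl_version, Constants &c)
{
   switch (gl_version) {
   case 30:
      c.GLSLVersion = 130;
      break;
   case 31:
      c.GLSLVersion = 140;
      break;
   case 32:
      c.GLSLVersion = 150;
      break;
   default:
      if (gl_version >= 33)
         c.GLSLVersion = gl_version * 10;
      break;
   }

   // One assignment after the switch instead of one per case.
   c.GLSLVersionCompat = c.GLSLVersion;
}
```

One difference to keep in mind: for versions that fall through the default case without matching, the single assignment copies whatever GLSLVersion already held, whereas the per-case version leaves GLSLVersionCompat untouched. Whether that matters depends on the surrounding code.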

With that:
Reviewed-by: Iago Toral Quiroga 

>}
> }

Re: [Mesa-dev] [PATCH] mesa: add ff fragment shader support for geom and tess shaders

2018-06-18 Thread Iago Toral

On Mon, 2018-06-18 at 10:45 +0200, Gustaw Smolarczyk wrote:
> 2018-06-18 10:39 GMT+02:00 Iago Toral :
> > On Mon, 2018-06-18 at 09:43 +0200, Gustaw Smolarczyk wrote:
> > > 2018-06-18 4:39 GMT+02:00 Timothy Arceri :
> > > > This is required for compatibility profile support. 
> > > > ---
> > > >  src/mesa/main/ff_fragment_shader.cpp | 6 +-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/src/mesa/main/ff_fragment_shader.cpp
> > > > b/src/mesa/main/ff_fragment_shader.cpp
> > > > index a698931d99e..935a21624af 100644
> > > > --- a/src/mesa/main/ff_fragment_shader.cpp
> > > > +++ b/src/mesa/main/ff_fragment_shader.cpp
> > > > @@ -229,7 +229,11 @@ static GLbitfield filter_fp_input_mask(
> > > > GLbitfield fp_inputs,
> > > >  * since vertex shader state validation comes after
> > > > fragment state
> > > >  * validation (see additional comments in state.c).
> > > >  */
> > > > -   if (vertexShader)
> > > > +   if (ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] !=
> > > > NULL)
> > > > +  vprog = ctx->_Shader-
> > > > >CurrentProgram[MESA_SHADER_GEOMETRY];
> > > > +   else if (ctx->_Shader-
> > > > >CurrentProgram[MESA_SHADER_TESS_EVAL] != NULL)
> > > > +  vprog = ctx->_Shader-
> > > > >CurrentProgram[MESA_SHADER_TESS_EVAL];
> > > > +   else if (vertexShader)
> > > 
> > > 
> > > Shouldn't you also update the if condition on line 178?
> > > Otherwise, you won't reach the if tree you change when the vertex
> > > shader is missing (unless that was intended - I am not really
> > > familiar with how fixed function shaders work alongside new
> > > features).
> > 
> > You don't have Tesselation / Geometry with fixed function GL, so I
> > think this should be fine.
> > 
> 
> Well, this whole file implements fixed function fragment shader, so
> it will only be reached when the fragment shader is missing. Unless
> you meant that tessellation/geometry shaders cannot be combined with
> fixed function vertex shader but fixed function fragment shader is
> fine.

Yes, that is what I was thinking. The OpenGL 4.5 spec with
compatibility profile states in 'Chapter 12: Fixed function vertex
processing':

"When programmable vertex processing (see chapter 11) is not being
performed, the fixed-function operations described in this chapter are
performed instead. Vertices are first transformed as described in section 12.1,
followed by lighting and coloring described in section 12.2. The
resulting transformed vertices are then processed as described in chapter 13."

And 'Chapter 13: Fixed function vertex post-processing' doesn't
include geometry or tessellation shading; it seems to cover only the
fixed function stages that start after the geometry stage.

Iago


> Regards,
> Gustaw Smolarczyk

Re: [Mesa-dev] [PATCH] mesa: add ff fragment shader support for geom and tess shaders

2018-06-18 Thread Iago Toral

On Mon, 2018-06-18 at 09:43 +0200, Gustaw Smolarczyk wrote:
> 2018-06-18 4:39 GMT+02:00 Timothy Arceri :
> > This is required for compatibility profile support. 
> > 
> > ---
> > 
> >  src/mesa/main/ff_fragment_shader.cpp | 6 +-
> > 
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/mesa/main/ff_fragment_shader.cpp
> > b/src/mesa/main/ff_fragment_shader.cpp
> > 
> > index a698931d99e..935a21624af 100644
> > 
> > --- a/src/mesa/main/ff_fragment_shader.cpp
> > 
> > +++ b/src/mesa/main/ff_fragment_shader.cpp
> > 
> > @@ -229,7 +229,11 @@ static GLbitfield filter_fp_input_mask(
> > GLbitfield fp_inputs,
> > 
> >  * since vertex shader state validation comes after fragment
> > state
> > 
> >  * validation (see additional comments in state.c).
> > 
> >  */
> > 
> > -   if (vertexShader)
> > 
> > +   if (ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] != NULL)
> > 
> > +  vprog = ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY];
> > 
> > +   else if (ctx->_Shader->CurrentProgram[MESA_SHADER_TESS_EVAL] !=
> > NULL)
> > 
> > +  vprog = ctx->_Shader->CurrentProgram[MESA_SHADER_TESS_EVAL];
> > 
> > +   else if (vertexShader)
> 
> 
> 
> Shouldn't you also update the if condition on line 178? Otherwise,
> you won't reach the if tree you change when the vertex shader is
> missing (unless that was intended - I am not really familiar with how
> fixed function shaders work alongside new features).

You don't have tessellation / geometry with fixed function GL, so I
think this should be fine.
Reviewed-by: Iago Toral Quiroga 
> You could also move or update the comment that is just above your
> change.
> 
> 
> Regards,
> Gustaw Smolarczyk
> >vprog = ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
> > 
> > else
> > 
> >vprog = ctx->VertexProgram.Current;
> > 
> > -- 
> > 
> > 2.17.1
> > 
> > 
> > 
> > ___
> > 
> > mesa-dev mailing list
> > 
> > mesa-dev@lists.freedesktop.org
> > 
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 1/6] i965/fs: Optimize OR with 0 into a MOV

2018-06-15 Thread Iago Toral

I dropped a suggestion in patch 1 that also applies to patch 3 (feel
free to take it or not), and I pointed out a small issue in patch 6
that I think should be fixed.
Otherwise, the series is:

Reviewed-by: Iago Toral Quiroga 

On Thu, 2018-06-14 at 17:43 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> fs_visitor::set_gs_stream_control_data_bits generates some code like
> "control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as
> part of EmitVertex.  The first time this (dynamically) occurs in the
> shader, control_data_bits is zero.  Many times we can determine this
> statically and various optimizations will collaborate to make one of
> the
> OR operands literal zero.
> 
> Converting the OR to a MOV usually allows it to be copy-propagated
> away.
> However, this does not happen in at least some shaders (in the
> assembly
> output of
> shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test,
> search for shl).
> 
> All of the affected shaders are geometry shaders.
> 
> Broadwell and Skylake had similar results. (Skylake shown)
> total instructions in shared programs: 14375452 -> 14375413 (<.01%)
> instructions in affected programs: 6422 -> 6383 (-0.61%)
> helped: 39
> HURT: 0
> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
> 95% mean confidence interval for instructions value: -1.00 -1.00
> 95% mean confidence interval for instructions %-change: -2.26% -1.57%
> Instructions are helped.
> 
> total cycles in shared programs: 531981179 -> 531980555 (<.01%)
> cycles in affected programs: 27493 -> 26869 (-2.27%)
> helped: 39
> HURT: 0
> helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
> helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
> 95% mean confidence interval for cycles value: -16.00 -16.00
> 95% mean confidence interval for cycles %-change: -6.98% -4.90%
> Cycles are helped.
> 
> No changes on earlier platforms.
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/intel/compiler/brw_fs.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> index d67c0a41922..d836b268629 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2448,7 +2448,8 @@ fs_visitor::opt_algebraic()
>   }
>   break;
>case BRW_OPCODE_OR:
> - if (inst->src[0].equals(inst->src[1])) {
> + if (inst->src[0].equals(inst->src[1]) ||
> + inst->src[1].is_zero()) {
>  inst->opcode = BRW_OPCODE_MOV;
>  inst->src[1] = reg_undef;
>  progress = true;

Re: [Mesa-dev] [PATCH 6/6] i965/fs: Propagate conditional modifiers from not instructions

2018-06-15 Thread Iago Toral

On Thu, 2018-06-14 at 17:43 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Skylake
> total instructions in shared programs: 14399081 -> 14399010 (<.01%)
> instructions in affected programs: 26961 -> 26890 (-0.26%)
> helped: 57
> HURT: 0
> helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1
> helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18%
> 95% mean confidence interval for instructions value: -1.50 -0.99
> 95% mean confidence interval for instructions %-change: -0.35% -0.25%
> Instructions are helped.
> 
> total cycles in shared programs: 532978307 -> 532976050 (<.01%)
> cycles in affected programs: 468629 -> 466372 (-0.48%)
> helped: 33
> HURT: 20
> helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98
> helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27%
> HURT stats (abs)   min: 2 max: 172 x̄: 79.40 x̃: 43
> HURT stats (rel)   min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44%
> 95% mean confidence interval for cycles value: -81.29 -3.88
> 95% mean confidence interval for cycles %-change: -1.07% 0.12%
> Inconclusive result (%-change mean confidence interval includes 0).
> 
> All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell
> shown)
> total instructions in shared programs: 12973897 -> 12973838 (<.01%)
> instructions in affected programs: 25970 -> 25911 (-0.23%)
> helped: 55
> HURT: 0
> helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
> helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18%
> 95% mean confidence interval for instructions value: -1.14 -1.00
> 95% mean confidence interval for instructions %-change: -0.32% -0.24%
> Instructions are helped.
> 
> total cycles in shared programs: 410355841 -> 410352067 (<.01%)
> cycles in affected programs: 578454 -> 574680 (-0.65%)
> helped: 47
> HURT: 5
> helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18
> helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38%
> HURT stats (abs)   min: 2 max: 242 x̄: 51.20 x̃: 4
> HURT stats (rel)   min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11%
> 95% mean confidence interval for cycles value: -104.89 -40.27
> 95% mean confidence interval for cycles %-change: -1.45% -0.66%
> Cycles are helped.
> 
> Ivy Bridge
> total instructions in shared programs: 11679351 -> 11679301 (<.01%)
> instructions in affected programs: 28208 -> 28158 (-0.18%)
> helped: 50
> HURT: 0
> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16%
> 95% mean confidence interval for instructions value: -1.00 -1.00
> 95% mean confidence interval for instructions %-change: -0.27% -0.19%
> Instructions are helped.
> 
> total cycles in shared programs: 257445362 -> 257444662 (<.01%)
> cycles in affected programs: 419338 -> 418638 (-0.17%)
> helped: 40
> HURT: 3
> helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24
> helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41%
> HURT stats (abs)   min: 2 max: 1588 x̄: 634.00 x̃: 312
> HURT stats (rel)   min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62%
> 95% mean confidence interval for cycles value: -97.96 65.41
> 95% mean confidence interval for cycles %-change: -1.56% -0.62%
> Inconclusive result (value mean confidence interval includes 0).
> 
> No changes on Iron Lake or GM45.
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/intel/compiler/brw_fs_cmod_propagation.cpp | 63
> +-
>  1 file changed, 62 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> index b4f05613e98..c935cc66c81 100644
> --- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> @@ -111,6 +111,61 @@ cmod_propagate_cmp_to_add(const gen_device_info
> *devinfo, bblock_t *block,
> return false;
>  }
>  
> +/**
> + * Propagate conditional modifiers from NOT instructions
> + *
> + * Attempt to convert sequences like
> + *
> + *    or(8)           g78<8,8,1>      g76<8,8,1>UD    g77<8,8,1>UD
> + *    ...
> + *    not.nz.f0(8)    null            g78<8,8,1>UD
> + *
> + * into
> + *
> + *    or.z.f0(8)      g78<8,8,1>      g76<8,8,1>UD    g77<8,8,1>UD
> + */
> +static bool
> +cmod_propagate_not(const gen_device_info *devinfo, bblock_t *block,
> +   fs_inst *inst)
> +{
> +   const enum brw_conditional_mod cond = brw_negate_cmod(inst-
> >conditional_mod);
> +   bool read_flag = false;
> +
> +   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst,
> inst) {
> +  if (regions_overlap(scan_inst->dst, scan_inst->size_written,
> +  inst->src[0], inst->size_read(0))) {
> + if (cond != BRW_CONDITIONAL_Z &&
> + cond != BRW_CONDITIONAL_NZ)
> +break;

Looks like we can do this before the loop.
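
To illustrate the point: the conditional modifier is computed once before the scan and never changes, so the Z/NZ test is loop-invariant and can become an early exit. This is a minimal sketch with made-up types (`CondMod`, `Inst`, `find_cmod_candidate` are hypothetical and much simpler than the real fs_inst code):

```cpp
#include <vector>

enum CondMod { COND_NONE, COND_Z, COND_NZ, COND_G, COND_L };

// Hypothetical, simplified instruction: only what the sketch needs.
struct Inst {
   int dst;        // register written by the instruction
   bool is_logic;  // true for OR/AND
};

// Scans the block backwards for the instruction that writes src_reg.
// Returns its index if it could take the conditional modifier, else -1.
int find_cmod_candidate(const std::vector<Inst> &block, int src_reg,
                        CondMod cond)
{
   // Loop-invariant check: cond does not change during the scan, so it is
   // tested once here rather than on every overlapping instruction.
   if (cond != COND_Z && cond != COND_NZ)
      return -1;

   for (int i = static_cast<int>(block.size()) - 1; i >= 0; i--) {
      if (block[i].dst == src_reg)
         return block[i].is_logic ? i : -1;
   }
   return -1;
}
```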

> +
> + if (scan_inst->opcode != BRW_OPCODE_OR &&
> + scan_inst->opcode != BRW_OPCODE_AND)
> +break;
> +
> + if (scan_inst->is_partial_write() ||
> + scan_inst->dst.offset !=

Re: [Mesa-dev] [PATCH 1/6] i965/fs: Optimize OR with 0 into a MOV

2018-06-15 Thread Iago Toral

On Thu, 2018-06-14 at 17:43 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> fs_visitor::set_gs_stream_control_data_bits generates some code like
> "control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as
> part of EmitVertex.  The first time this (dynamically) occurs in the
> shader, control_data_bits is zero.  Many times we can determine this
> statically and various optimizations will collaborate to make one of
> the
> OR operands literal zero.
> 
> Converting the OR to a MOV usually allows it to be copy-propagated
> away.
> However, this does not happen in at least some shaders (in the
> assembly
> output of
> shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test,
> search for shl).
> 
> All of the affected shaders are geometry shaders.
> 
> Broadwell and Skylake had similar results. (Skylake shown)
> total instructions in shared programs: 14375452 -> 14375413 (<.01%)
> instructions in affected programs: 6422 -> 6383 (-0.61%)
> helped: 39
> HURT: 0
> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
> 95% mean confidence interval for instructions value: -1.00 -1.00
> 95% mean confidence interval for instructions %-change: -2.26% -1.57%
> Instructions are helped.
> 
> total cycles in shared programs: 531981179 -> 531980555 (<.01%)
> cycles in affected programs: 27493 -> 26869 (-2.27%)
> helped: 39
> HURT: 0
> helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
> helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
> 95% mean confidence interval for cycles value: -16.00 -16.00
> 95% mean confidence interval for cycles %-change: -6.98% -4.90%
> Cycles are helped.
> 
> No changes on earlier platforms.
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/intel/compiler/brw_fs.cpp | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> index d67c0a41922..d836b268629 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2448,7 +2448,8 @@ fs_visitor::opt_algebraic()
>   }
>   break;
>case BRW_OPCODE_OR:
> - if (inst->src[0].equals(inst->src[1])) {
> + if (inst->src[0].equals(inst->src[1]) ||
> + inst->src[1].is_zero()) {
>  inst->opcode = BRW_OPCODE_MOV;
>  inst->src[1] = reg_undef;
>  progress = true;

While we are at this, shouldn't we also handle this as a MOV (from
src[1]) when src[0].is_zero() is true?
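
A sketch of what the symmetric case could look like, using toy types rather than the real fs_inst / fs_reg classes (`Src`, `Inst` and `opt_or` are hypothetical names for illustration only):

```cpp
enum Opcode { OP_OR, OP_MOV };

// Hypothetical toy operand: either an immediate or a register number.
struct Src {
   bool is_imm;
   unsigned value;  // immediate value or register number

   bool is_zero() const { return is_imm && value == 0; }
   bool equals(const Src &o) const
   {
      return is_imm == o.is_imm && value == o.value;
   }
};

struct Inst {
   Opcode opcode;
   Src src[2];
};

// Turns a trivial OR into a MOV of src[0]. Returns true on progress.
bool opt_or(Inst &inst)
{
   if (inst.opcode != OP_OR)
      return false;

   if (inst.src[0].equals(inst.src[1]) || inst.src[1].is_zero()) {
      // x | x == x and x | 0 == x: src[0] is already the surviving operand.
      inst.opcode = OP_MOV;
      return true;
   }

   if (inst.src[0].is_zero()) {
      // 0 | x == x: the surviving operand is src[1], so move it to slot 0.
      inst.src[0] = inst.src[1];
      inst.opcode = OP_MOV;
      return true;
   }

   return false;
}
```

The real pass also resets src[1] to reg_undef after the conversion; the toy version leaves it alone for brevity.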

Iago

Re: [Mesa-dev] [PATCH 07/14] intel/compiler: shuffle_from_32bit_read for 64-bit do_untyped_vector_read

2018-06-14 Thread Iago Toral

On Fri, 2018-06-15 at 00:20 +0200, Chema Casanova wrote:
> 
> On 14/06/18 03:26, Jason Ekstrand wrote:
> > On Sat, Jun 9, 2018 at 4:13 AM, Jose Maria Casanova Crespo
> > <jmcasan...@igalia.com> wrote:
> > 
> > do_untyped_vector_read is used at load_ssbo and load_shared.
> > 
> > The previous MOVs are removed because shuffle_from_32bit_read
> > can handle storing the shuffle results in the expected
> > destination
> > just using the proper offset.
> > ---
> >  src/intel/compiler/brw_fs_nir.cpp | 12 ++--
> >  1 file changed, 2 insertions(+), 10 deletions(-)
> > 
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > index 7e738ade82e..780a9e228de 100644
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > @@ -2434,16 +2434,8 @@ do_untyped_vector_read(const fs_builder
> > &bld,
> >
> >  BRW_PREDICATE_NONE);
> > 
> >   /* Shuffle the 32-bit load result into valid 64-bit
> > data */
> > - const fs_reg packed_result = bld.vgrf(dest.type,
> > iter_components);
> > - shuffle_32bit_load_result_to_64bit_data(
> > -bld, packed_result, read_result, iter_components);
> > -
> > - /* Move each component to its destination */
> > - read_result = retype(read_result,
> > BRW_REGISTER_TYPE_DF);
> > - for (int c = 0; c < iter_components; c++) {
> > -bld.MOV(offset(dest, bld, it * 2 + c),
> > -offset(packed_result, bld, c));
> > - }
> > 
> > 
> > I really don't know why we needed this extra set of MOVs.  They
> > seem
> > pretty pointless to me.  Maybe history?  In any case, this looks
> > good.
> 
> I've just checked and there is not much history as the 64-bit code of
> this function hasn't been changed since they landed. I think that the
> logic was first shuffle and then move to the proper destination
> instead
> of just shuffling to the final destination directly.

Could it be related to non-uniform control flow? Does the function
below disable channel masks to shuffle? If it does, then I think we
need to do the shuffle to a temporary and the move from there to its
original destination with channel masking enabled.
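
To make the concern concrete, here is a toy model of an 8-lane register and an execution mask. It only shows why an unmasked write straight into the destination would be a problem under non-uniform control flow; whether the shuffle helper really disables channel masks is exactly the open question above, and all names here (`Reg`, `write_nomask`, `mov_masked`) are invented for the sketch:

```cpp
#include <array>
#include <cstdint>

constexpr int LANES = 8;
using Reg = std::array<uint32_t, LANES>;

// Models an instruction emitted with channel masking disabled:
// every lane of dst is overwritten, active or not.
void write_nomask(Reg &dst, const Reg &src)
{
   dst = src;
}

// Models a regular MOV: only lanes enabled in exec_mask are written.
void mov_masked(Reg &dst, const Reg &src, uint8_t exec_mask)
{
   for (int lane = 0; lane < LANES; lane++) {
      if (exec_mask & (1u << lane))
         dst[lane] = src[lane];
   }
}
```

With a temporary, the unmasked write lands in scratch space and the masked MOV then copies only the active lanes, so inactive lanes of the real destination keep their previous values. Writing unmasked directly into the destination would clobber them.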

Iago

> So maybe Iago remembers if there was any reason why...
> 
> > Reviewed-by: Jason Ekstrand
> >  
> > 
> > + shuffle_from_32bit_read(bld, offset(dest, bld, it *
> > 2),
> > + read_result, 0,
> > iter_components);
> > 
> >   bld.ADD(read_offset, read_offset, brw_imm_ud(16));
> >}
> > -- 
> > 2.17.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> > 
> > 
> > 
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 
> 
> 

Re: [Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

2018-06-05 Thread Iago Toral

This isn't reviewed yet, any feedback?

Iago

On Tue, 2018-05-15 at 13:05 +0200, Iago Toral Quiroga wrote:
> NIR assumes that all booleans are 32-bit, so drivers need to produce
> 32-bit
> booleans even if they can produce native booleans of a different bit-
> size, like
> Intel does. This means that if we have a 16-bit CMP instruction, we
> generate a
> 16-bit boolean that we immediately convert to 32-bit, since that is
> the bit-size
> expected by NIR for all consumers of the boolean.
> 
> This backend optimization pass identifies these cases after we are
> done
> translating from NIR to FS IR, and propagates the lower bit-size
> booleans
> to allow DCE to remove the 32-bit conversions. The pass should run
> early
> after translating from NIR, since it assumes that boolean conversions
> to
> 32-bit take place immediately after the corresponding CMP
> instructions.
> 
> This has been tested with existing and work-in-progress CTS tests as
> well
> as some ad-hoc VkRunner tests I wrote.
> 
> For more context you can read this discussion:
> https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html
> 
> One point raised by Jason during the discussion linked above was that
> we might
> need to canonicalize booleans of different native bit-sizes when they
> are
> combined in boolean expressions. However, as indicated in the commit
> log for the
> last patch in the series, my interpretation of the PRM is that the
> hardware can
> handle this situation without us having to do anything about it. The
> last patch
> contains canonicalization code under a disabled #if guard anyway,
> just in case
> reviewers think this is needed in the end and want to have a look at
> what it
> could look like.
> 
> Alternatively to what is being done here, we could also change the
> way
> we construct CMP instructions to take advantage of the PRM
> documentation that
> says that CMP instructions can mix and match *B, *W and *D for their
> source
> and destination arguments since gen5 to always produce canonical 32-
> bit bools
> like NIR expects. However, since all hardware gens still produce 16-
> bit booleans
> for half-float, we would still need to handle that case specially
> with a similar
> pass, so we would not be gaining much from that. Also, in that case we
> would always
> operate with 32-bit booleans, losing the possibility to emit native
> 16-bit
> boolean instructions where possible.
> 
> Iago Toral Quiroga (3):
>   intel/compiler: make brw_reg_type_from_bit_size usable from other
> places
>   intel/compiler: add a region_match() helper
>   intel/compiler: add an optimization pass for booleans
> 
>  src/intel/compiler/brw_fs.cpp | 291
> ++
>  src/intel/compiler/brw_fs.h   |   5 +
>  src/intel/compiler/brw_fs_nir.cpp |  59 
>  src/intel/compiler/brw_ir_fs.h|  13 ++
>  4 files changed, 309 insertions(+), 59 deletions(-)
> 

Re: [Mesa-dev] [PATCH] glsl: Add ir_binop_vector_extract in NIR

2018-06-01 Thread Iago Toral

On Fri, 2018-06-01 at 12:26 +0200, Iago Toral wrote:
> On Fri, 2018-06-01 at 11:25 +0200, Juan A. Suarez Romero wrote:
> > On Wed, 2018-05-30 at 15:10 -0700, Eric Anholt wrote:
> > > "Juan A. Suarez Romero"  writes:
> > > 
> > > > Implement ir_binop_vector_extract using NIR operations. Based
> > > > on
> > > > SPIR-V
> > > > to NIR approach.
> > > > 
> > > > This fixes:
> > > > dEQP-
> > > > GLES3.functional.shaders.indexing.moredynamic.with_value_from_i
> > > > nd
> > > > exing_expression_fragment
> > > > Piglit's glsl-fs-vec4-indexing-8.shader_test
> > > > 
> > > > Signed-off-by: Juan A. Suarez Romero 
> > > > ---
> > > > 
> > > > Pending to verify that this also fixes
> > > > https://bugs.freedesktop.org/show_bug.cgi?id=105438
> > > > 
> > > >  src/compiler/glsl/glsl_to_nir.cpp | 11 +++
> > > >  1 file changed, 11 insertions(+)
> > > > 
> > > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> > > > b/src/compiler/glsl/glsl_to_nir.cpp
> > > > index 8e5e9c34912..5fc420d856f 100644
> > > > --- a/src/compiler/glsl/glsl_to_nir.cpp
> > > > +++ b/src/compiler/glsl/glsl_to_nir.cpp
> > > > @@ -1928,6 +1928,17 @@ nir_visitor::visit(ir_expression *ir)
> > > >  unreachable("not reached");
> > > >}
> > > >break;
> > > > +   case ir_binop_vector_extract: {
> > > > +  unsigned swiz[4] = { 0 };
> > > > +  result = nir_swizzle(&b, srcs[0], swiz, 1, true);
> > > > +  for (unsigned i = 1; i < ir->operands[0]->type-
> > > > > vector_elements; i++) {
> > > > 
> > > > + swiz[0] = i;
> > > > + nir_ssa_def *swizzled = nir_swizzle(&b, srcs[0],
> > > > swiz,
> > > > 1, true);
> > > 
> > > You could use nir_channel(&b, srcs[0], i) here and above to
> > > simplify
> > > (which I think gets the use_fmov argument right, as well).  Other
> > > than
> > > that,
> > > 
> > 
> > I'm checking nir_channel(), and it sets use_fmov to false, rather
> > than true.
> 
> I don't think that is a problem, right? does that make the test fail?

I just checked and it seems to work just fine, as expected, feel free
to add my Rb as well to the patch with Eric's suggestions.

Iago

> 
> > 
> > J.A.
> > 
> > > Reviewed-by: Eric Anholt 
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH] glsl: Add ir_binop_vector_extract in NIR

2018-06-01 Thread Iago Toral

On Fri, 2018-06-01 at 11:25 +0200, Juan A. Suarez Romero wrote:
> On Wed, 2018-05-30 at 15:10 -0700, Eric Anholt wrote:
> > "Juan A. Suarez Romero"  writes:
> > 
> > > Implement ir_binop_vector_extract using NIR operations. Based on
> > > SPIR-V
> > > to NIR approach.
> > > 
> > > This fixes:
> > > dEQP-
> > > GLES3.functional.shaders.indexing.moredynamic.with_value_from_ind
> > > exing_expression_fragment
> > > Piglit's glsl-fs-vec4-indexing-8.shader_test
> > > 
> > > Signed-off-by: Juan A. Suarez Romero 
> > > ---
> > > 
> > > Pending to verify that this also fixes
> > > https://bugs.freedesktop.org/show_bug.cgi?id=105438
> > > 
> > >  src/compiler/glsl/glsl_to_nir.cpp | 11 +++
> > >  1 file changed, 11 insertions(+)
> > > 
> > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> > > b/src/compiler/glsl/glsl_to_nir.cpp
> > > index 8e5e9c34912..5fc420d856f 100644
> > > --- a/src/compiler/glsl/glsl_to_nir.cpp
> > > +++ b/src/compiler/glsl/glsl_to_nir.cpp
> > > @@ -1928,6 +1928,17 @@ nir_visitor::visit(ir_expression *ir)
> > >  unreachable("not reached");
> > >}
> > >break;
> > > +   case ir_binop_vector_extract: {
> > > +  unsigned swiz[4] = { 0 };
> > > +  result = nir_swizzle(&b, srcs[0], swiz, 1, true);
> > > +  for (unsigned i = 1; i < ir->operands[0]->type-
> > > >vector_elements; i++) {
> > > + swiz[0] = i;
> > > + nir_ssa_def *swizzled = nir_swizzle(&b, srcs[0], swiz,
> > > 1, true);
> > 
> > You could use nir_channel(&b, srcs[0], i) here and above to
> > simplify
> > (which I think gets the use_fmov argument right, as well).  Other
> > than
> > that,
> > 
> 
> I'm checking nir_channel(), and it sets use_fmov to false, rather
> than true.

I don't think that is a problem, right? Does that make the test fail?

> 
>   J.A.
> 
> > Reviewed-by: Eric Anholt 
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/2] nir: Lower !f2b(x) to x == 0.0

2018-06-01 Thread Iago Toral

Reviewed-by: Iago Toral Quiroga 

On Thu, 2018-05-31 at 17:21 -0700, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Some trivial help now, but it also prevents ~40 regressions caused by
> Samuel's "nir: implement the GLSL equivalent of if simplication in
> nir_opt_if" patch.
> 
> All Gen4+ platforms had similar results. (Skylake shown)
> total instructions in shared programs: 14369557 -> 14369555 (<.01%)
> instructions in affected programs: 442 -> 440 (-0.45%)
> helped: 2
> HURT: 0
> 
> total cycles in shared programs: 532425772 -> 532425743 (<.01%)
> cycles in affected programs: 6086 -> 6057 (-0.48%)
> helped: 2
> HURT: 0
> 
> Signed-off-by: Ian Romanick 
> Cc: Samuel Pitoiset 
> Cc: Timothy Arceri 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index f153570105b..fdfb0250b0b 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -441,6 +441,7 @@ optimizations = [
> (('i2b', ('iabs', a)), ('i2b', a)),
> (('fabs', ('b2f', a)), ('b2f', a)),
> (('iabs', ('b2i', a)), ('b2i', a)),
> +   (('inot', ('f2b', a)), ('feq', a, 0.0)),
>  
> # Packing and then unpacking does nothing
> (('unpack_64_2x32_split_x', ('pack_64_2x32_split', a, b)), a),

Re: [Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-22 Thread Iago Toral

On Mon, 2018-05-21 at 13:49 +0300, Eero Tamminen wrote:
> Hi,
> 
> On 21.05.2018 10:42, Iago Toral wrote:
> > On Fri, 2018-05-18 at 12:08 +0300, Eero Tamminen wrote:
> > > On 17.05.2018 14:25, Eero Tamminen wrote:
> > > > On 17.05.2018 11:46, Iago Toral Quiroga wrote:
> > > > > The PRM for MAD states that F, DF and HF are supported,
> > > > > however,
> > > > > then
> > > > > it requires that the instruction includes a 2-bit mask
> > > > > specifying
> > > > > the types of each operand like this:
> > > > 
> > > >   >
> > > > > 00: 32-bit float
> > > > > 01: 32-bit signed integer
> > > > > 10: 32-bit unsigned integer
> > > > > 11: 64-bit float
> > > > 
> > > > > Where was this?
> > 
> > This is in the description of the MAD instruction here (for SKL):
> > 
> > https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc
> > -skl
> > -vol02a-commandreference-instructions.pdf
> > 
> > I guess the contents for this were copy & pasted from previous
> > PRMs
> > that didn't support HF...
> 
> Ouch.  That looks pretty different from what's in here:
> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-s
> kl-vol07-3d_media_gpgpu.pdf
> 
> > > > In
> > > > https://01.org/sites/default/files/documentation/intel-gfx-bspe
> > > > c-os
> > > > rc-chv-bsw-vol07-3d-media-gpgpu-engine.pdf
> 
> I'll ask around who's currently maintaining the 01.org docs, and
> whether she/he could update the opcode docs. All of them from BSW to
> KBL seem to have the old information, while the Media GPGPU docs have
> newer info.

Thanks Eero!

> 
> Btw.  Did you have test-cases utilizing mad() instructions, and did
> they work OK with your patchset?

Yes, but they only worked because I was dropping the MAD and open
coding it as MUL+ADD for HF operands.

> (If yes, better test-cases may be required.)

The existing tests catch the problem if I don't attempt to lower the
MAD for HF, so they are good enough for this.

BTW, I implemented the solution using the Src1Type and Src2Type bits
and it seems to work fine in BDW as well.

Iago

> 
> > > > Section "EU Changes by Processor Generation" states:
> > > > -
> > > > These features or behaviors are added for CHV, BSW, continuing
> > > > to
> > > > later generations:
> > > > ...
> > > > In the 3-source instruction format, widen the SrcType and
> > > > DstType
> > > > fields
> > > > and add an encoding for the HF (Half Float) type.
> > > > -
> > > > 
> > > > (I.e. it applies to GEN9+ [1] and on GEN8 for BSW/CHV.)
> > 
> > Actually, I have just verified that the BDW PRMs have the exact
> > same
> > thing, but stating BDW instead of BSW/CHV, so I guess BDW should be
> > supported too.
> 
> Yes, right.
> 
> 
> > > > In section "GEN Instruction Format – 3-src" table, both "Dst
> > > > Type"
> > > > and
> > > > "Src Type" fields are 3 bits, and there's additional 1 bit
> > > > "Src1
> > > > Type"
> > > > and "Src2 Type" fields to differentiate formats for src1 &
> > > > src2.
> > > 
> > >   >
> > > > Then, when looking at "Source or Destination Operand Fields
> > > > (Alphabetically by Short Name)" section:
> > > > ---
> > > > DstType:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > > 010b = :ud. Unsigned Doubleword integer.
> > > > 011b = :df. Double precision Float (64-bit).
> > > > 100b = :hf. Half precision Float (16-bit).
> > > > 101b - 111b. Reserved.
> > > > 
> > > > ...
> > > > 
> > > > SrcType:
> > > > 
> > > > Three source instructions use one SrcType field for all source
> > > > operands,
> > > > with a 3-bit encoding that allows fewer data types:
> > > > 
> > > > Encoding for three source instructions:
> > > > 000b = :f. Single precision Float (32-bit).
> > > > 001b = :d. Signed Doubleword integer.
> > > >

Re: [Mesa-dev] [PATCH 03/22] compiler/spirv: fix SpvOpIsInf for 16-bit float

2018-05-21 Thread Iago Toral

On Thu, 2018-05-17 at 06:59 -0700, Jason Ekstrand wrote:
> 
> On May 17, 2018 01:47:11 Iago Toral Quiroga <ito...@igalia.com>
> wrote:
> 
> > ---
> > src/compiler/spirv/vtn_alu.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/compiler/spirv/vtn_alu.c
> > b/src/compiler/spirv/vtn_alu.c
> > index 5f9cc97fdfb..62a5149797a 100644
> > --- a/src/compiler/spirv/vtn_alu.c
> > +++ b/src/compiler/spirv/vtn_alu.c
> > @@ -578,7 +578,9 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp
> > opcode,
> >   break;
> > 
> >case SpvOpIsInf: {
> > -  nir_ssa_def *inf = nir_imm_floatN_t(&b->nb, INFINITY,
> > src[0]->bit_size);
> > +  nir_ssa_def *inf = src[0]->bit_size > 16 ?
> > + nir_imm_floatN_t(&b->nb, INFINITY, src[0]->bit_size) :
> > + nir_imm_intN_t(&b->nb, 0x7c00, 16);
> 
> We should just make nir_imm_floatN_t handle 16-bit floats with 
> _mesa_float_to_half().

Right, I did that in a later patch and forgot to come back here and fix
 this. Will do that now, thanks!

> >   val->ssa->def = nir_ieq(&b->nb, nir_fabs(&b->nb, src[0]),
> > inf);
> >   break;
> >}
> > --
> > 2.14.1
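
For illustration, the half-float infinity test discussed above can be sketched in standalone C. This is a minimal sketch on raw 16-bit patterns, not Mesa code: the helper name half_is_inf is hypothetical, and it assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits), in which +infinity encodes as 0x7c00.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the binary16 bit pattern encodes +/-inf.
 * Clearing the sign bit plays the role of fabs; the remaining bits must
 * then equal 0x7c00 (all exponent bits set, zero mantissa).
 */
static bool half_is_inf(uint16_t bits)
{
   return (uint16_t)(bits & 0x7fffu) == 0x7c00u;
}
```

A NaN such as 0x7c01 has a non-zero mantissa, so it compares unequal. That is why an integer comparison against the bit pattern is enough here.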

[Mesa-dev] [PATCH 19/22] intel/compiler: lower 16-bit fmod

2018-05-17 Thread Iago Toral Quiroga

---
 src/intel/compiler/brw_compiler.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/compiler/brw_compiler.c 
b/src/intel/compiler/brw_compiler.c
index 6480dbefbf6..36a870ece0d 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -33,6 +33,7 @@
.lower_sub = true, \
.lower_fdiv = true,\
.lower_scmp = true,\
+   .lower_fmod16 = true,  \
.lower_fmod32 = true,  \
.lower_fmod64 = false, \
.lower_bitfield_extract = true,\
-- 
2.14.1


[Mesa-dev] [PATCH 21/22] intel/compiler: lower 16-bit flrp

2018-05-17 Thread Iago Toral Quiroga

---
 src/intel/compiler/brw_compiler.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/compiler/brw_compiler.c 
b/src/intel/compiler/brw_compiler.c
index 36a870ece0d..250e4695ded 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -33,6 +33,7 @@
.lower_sub = true, \
.lower_fdiv = true,\
.lower_scmp = true,\
+   .lower_flrp16 = true,  \
.lower_fmod16 = true,  \
.lower_fmod32 = true,  \
.lower_fmod64 = false, \
-- 
2.14.1


[Mesa-dev] [PATCH 20/22] compiler/nir: add lowering for 16-bit flrp

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/nir/nir.h| 1 +
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 59c84bde268..7e4c78cc1b7 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1871,6 +1871,7 @@ typedef struct nir_shader_compiler_options {
bool lower_fdiv;
bool lower_ffma;
bool fuse_ffma;
+   bool lower_flrp16;
bool lower_flrp32;
/** Lowers flrp when it does not support doubles */
bool lower_flrp64;
diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 1033a42a06c..75e71efcc6b 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -113,6 +113,7 @@ optimizations = [
(('~flrp', 0.0, a, b), ('fmul', a, b)),
(('~flrp', a, b, ('b2f', c)), ('bcsel', c, b, a), 'options->lower_flrp32'),
(('~flrp', a, 0.0, c), ('fadd', ('fmul', ('fneg', a), c), a)),
+   (('flrp@16', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp16'),
(('flrp@32', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp32'),
(('flrp@64', a, b, c), ('fadd', ('fmul', c, ('fsub', b, a)), a), 
'options->lower_flrp64'),
(('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
-- 
2.14.1
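
For reference, the replacement expression used by the flrp rules can be written as a scalar C function. This is only a sketch of the arithmetic in 32-bit floats; the function name flrp_lowered is hypothetical and not part of NIR.

```c
/* Scalar model of the lowering flrp(a, b, c) -> fadd(fmul(c, fsub(b, a)), a).
 * c = 0 selects a, c = 1 selects b, values in between interpolate linearly.
 */
static float flrp_lowered(float a, float b, float c)
{
   return c * (b - a) + a;
}
```

The 16-bit rule has the same shape as the 32-bit and 64-bit ones and differs only in the bit size it matches and the option that enables it.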


[Mesa-dev] [PATCH 22/22] intel/compiler: Extended Math is limited to SIMD8 on half-float

2018-05-17 Thread Iago Toral Quiroga

From the Skylake PRM, Extended Math Function:

  "The execution size must be no more than 8 when half-floats
   are used in source or destination operand."

Earlier generations do not support Extended Math with half-float.
---
 src/intel/compiler/brw_fs.cpp | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index b21996c1682..dcba4ee8068 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -5199,18 +5199,34 @@ get_lowered_simd_width(const struct gen_device_info 
*devinfo,
case SHADER_OPCODE_EXP2:
case SHADER_OPCODE_LOG2:
case SHADER_OPCODE_SIN:
-   case SHADER_OPCODE_COS:
+   case SHADER_OPCODE_COS: {
   /* Unary extended math instructions are limited to SIMD8 on Gen4 and
* Gen6.
*/
-  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
-  devinfo->gen == 5 || devinfo->is_g4x ? MIN2(16, inst->exec_size) 
:
-  MIN2(8, inst->exec_size));
+  unsigned max_width =
+ (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
+  devinfo->gen == 5 || devinfo->is_g4x ? MIN2(16, inst->exec_size) :
+  MIN2(8, inst->exec_size));
 
-   case SHADER_OPCODE_POW:
+  /* Extended Math Function is limited to SIMD8 with half-float */
+  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
+ max_width = MIN2(max_width, 8);
+
+  return max_width;
+   }
+
+   case SHADER_OPCODE_POW: {
   /* SIMD16 is only allowed on Gen7+. */
-  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
-  MIN2(8, inst->exec_size));
+  unsigned max_width =
+  (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
+   MIN2(8, inst->exec_size));
+
+  /* Extended Math Function is limited to SIMD8 with half-float */
+  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
+ max_width = MIN2(max_width, 8);
+
+  return max_width;
+   }
 
case SHADER_OPCODE_INT_QUOTIENT:
case SHADER_OPCODE_INT_REMAINDER:
-- 
2.14.1
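
The width selection for the unary extended math opcodes can be sketched outside the driver as follows. This is an illustrative model only: struct dev_info, min_u and unary_math_simd_width are hypothetical stand-ins for gen_device_info, MIN2 and the relevant case in get_lowered_simd_width.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device information used by the driver. */
struct dev_info {
   int gen;
   bool is_g4x;
};

static unsigned min_u(unsigned a, unsigned b)
{
   return a < b ? a : b;
}

/* Maximum SIMD width for a unary extended math instruction. */
static unsigned
unary_math_simd_width(const struct dev_info *dev, unsigned exec_size,
                      bool dst_is_half_float)
{
   /* SIMD16 on Gen7+, Gen5 and G4x; SIMD8 otherwise (Gen4 and Gen6). */
   unsigned max_width = (dev->gen >= 7 || dev->gen == 5 || dev->is_g4x)
                           ? min_u(16, exec_size)
                           : min_u(8, exec_size);

   /* Half-float operands limit extended math to SIMD8. */
   if (dst_is_half_float)
      max_width = min_u(max_width, 8);

   return max_width;
}
```

With these assumptions, a SIMD16 half-float instruction on Gen9 is split to SIMD8, while the same instruction with a 32-bit float destination stays at SIMD16.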


[Mesa-dev] [PATCH 17/22] compiler/spirv: implement 16-bit frexp

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 48 ++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 738f1ea93f1..88d2dcfb0fd 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -408,6 +408,45 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
 nir_fneg(b, arc), arc);
 }
 
+static nir_ssa_def *
+build_frexp16(nir_builder *b, nir_ssa_def *x, nir_ssa_def **exponent)
+{
+   assert(x->bit_size == 16);
+
+   nir_ssa_def *abs_x = nir_fabs(b, x);
+   nir_ssa_def *zero = nir_imm_floatN_t(b, 0, 16);
+
+   /* Half-precision floating-point values are stored as
+*   1 sign bit;
+*   5 exponent bits;
+*   10 mantissa bits.
+*
+* An exponent shift of 10 will shift the mantissa out, leaving only the
+* exponent and sign bit (which itself may be zero, if the absolute value
+* was taken before the bitcast and shift.
+*/
+   nir_ssa_def *exponent_shift = nir_imm_int(b, 10);
+   nir_ssa_def *exponent_bias = nir_imm_intN_t(b, -14, 16);
+
+   nir_ssa_def *sign_mantissa_mask = nir_imm_intN_t(b, 0x83ffu, 16);
+
+   /* Exponent of floating-point values in the range [0.5, 1.0). */
+   nir_ssa_def *exponent_value = nir_imm_intN_t(b, 0x3c00u, 16);
+
+   nir_ssa_def *is_not_zero = nir_fne(b, abs_x, zero);
+
+   /* Significand return must be of the same type as the input, but the
+* exponent must be a 32-bit integer.
+*/
+   *exponent =
+  nir_i2i32(b,
+nir_iadd(b, nir_ushr(b, abs_x, exponent_shift),
+nir_bcsel(b, is_not_zero, exponent_bias, zero)));
+
+   return nir_ior(b, nir_iand(b, x, sign_mantissa_mask),
+ nir_bcsel(b, is_not_zero, exponent_value, zero));
+}
+
 static nir_ssa_def *
 build_frexp32(nir_builder *b, nir_ssa_def *x, nir_ssa_def **exponent)
 {
@@ -791,8 +830,10 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
   nir_ssa_def *exponent;
   if (src[0]->bit_size == 64)
  val->ssa->def = build_frexp64(nb, src[0], &exponent);
-  else
+  else if (src[0]->bit_size == 32)
  val->ssa->def = build_frexp32(nb, src[0], &exponent);
+  else
+ val->ssa->def = build_frexp16(nb, src[0], &exponent);
   nir_store_deref_var(nb, vtn_nir_deref(b, w[6]), exponent, 0xf);
   return;
}
@@ -802,9 +843,12 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
   if (src[0]->bit_size == 64)
  val->ssa->elems[0]->def = build_frexp64(nb, src[0],
  &val->ssa->elems[1]->def);
-  else
+  else if (src[0]->bit_size == 32)
  val->ssa->elems[0]->def = build_frexp32(nb, src[0],
  &val->ssa->elems[1]->def);
+  else
+ val->ssa->elems[0]->def = build_frexp16(nb, src[0],
+ &val->ssa->elems[1]->def);
   return;
}
 
-- 
2.14.1
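
The bit manipulation behind a half-precision frexp can be modelled on raw 16-bit patterns in plain C. This is a sketch under stated assumptions, not the patch itself: the helper name half_frexp_bits is hypothetical, and it uses 0x3800 for the forced exponent field because that is the binary16 encoding of 0.5 (0x3c00 encodes 1.0), so the returned significand lands in [0.5, 1.0).

```c
#include <stdint.h>

/* Hypothetical helper operating on raw binary16 bit patterns.
 * Returns the significand bits and stores the exponent in *exponent, so
 * that x == significand * 2^exponent with |significand| in [0.5, 1.0).
 * Denormals, infinities and NaNs are not special-cased in this sketch.
 */
static uint16_t half_frexp_bits(uint16_t x, int32_t *exponent)
{
   const uint16_t abs_x = x & 0x7fffu;   /* drop the sign bit */
   const int is_not_zero = abs_x != 0;

   /* Biased exponent minus 14: the bias is 15, and one more is added
    * because the significand is normalized to [0.5, 1.0).
    */
   *exponent = (int32_t)(abs_x >> 10) + (is_not_zero ? -14 : 0);

   /* Keep sign and mantissa, then force the exponent field to that of 0.5. */
   return (uint16_t)((x & 0x83ffu) | (is_not_zero ? 0x3800u : 0u));
}
```

For example, 1.0 (0x3c00) splits into 0.5 (0x3800) with exponent 1, and -6.0 (0xc600) splits into -0.75 (0xba00) with exponent 3.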


[Mesa-dev] [PATCH 18/22] compiler/nir: add lowering option for 16-bit fmod

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/nir/nir.h| 1 +
 src/compiler/nir/nir_opt_algebraic.py | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index a379928cdcd..59c84bde268 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1877,6 +1877,7 @@ typedef struct nir_shader_compiler_options {
bool lower_fpow;
bool lower_fsat;
bool lower_fsqrt;
+   bool lower_fmod16;
bool lower_fmod32;
bool lower_fmod64;
bool lower_bitfield_extract;
diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index 96232f0e549..1033a42a06c 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -481,6 +481,7 @@ optimizations = [
(('bcsel', ('ine', a, -1), ('ifind_msb', a), -1), ('ifind_msb', a)),
 
# Misc. lowering
+   (('fmod@16', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 
'options->lower_fmod16'),
(('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 
'options->lower_fmod32'),
(('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b)))), 
'options->lower_fmod64'),
(('frem', a, b), ('fsub', a, ('fmul', b, ('ftrunc', ('fdiv', a, b)))), 
'options->lower_fmod32'),
-- 
2.14.1


[Mesa-dev] [PATCH 16/22] compiler/spirv: implement 16-bit hyperbolic trigonometric functions

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 29 +++--
 1 file changed, 19 insertions(+), 10 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 324e8b5874a..738f1ea93f1 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -712,7 +712,7 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
case GLSLstd450Sinh:
   /* 0.5 * (e^x - e^(-x)) */
   val->ssa->def =
- nir_fmul(nb, nir_imm_float(nb, 0.5f),
+ nir_fmul(nb, nir_imm_floatN_t(nb, 0.5f, src[0]->bit_size),
   nir_fsub(nb, build_exp(nb, src[0]),
build_exp(nb, nir_fneg(nb, src[0]))));
   return;
@@ -720,7 +720,7 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
case GLSLstd450Cosh:
   /* 0.5 * (e^x + e^(-x)) */
   val->ssa->def =
- nir_fmul(nb, nir_imm_float(nb, 0.5f),
+ nir_fmul(nb, nir_imm_floatN_t(nb, 0.5f, src[0]->bit_size),
   nir_fadd(nb, build_exp(nb, src[0]),
build_exp(nb, nir_fneg(nb, src[0]))));
   return;
@@ -733,11 +733,20 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
* We clamp x to (-inf, +10] to avoid precision problems.  When x > 10,
* e^2x is so much larger than 1.0 that 1.0 gets flushed to zero in the
* computation e^2x +/- 1 so it can be ignored.
+   *
+   * For 16-bit precision we clamp x to (-inf, +4.2] since the maximum
+   * representable number is only 65,504 and e^(2*6) exceeds that. Also,
+   * if x > 4.2, tanh(x) will return 1.0 in fp16.
*/
-  nir_ssa_def *x = nir_fmin(nb, src[0], nir_imm_float(nb, 10));
-  nir_ssa_def *exp2x = build_exp(nb, nir_fmul(nb, x, nir_imm_float(nb, 
2)));
-  val->ssa->def = nir_fdiv(nb, nir_fsub(nb, exp2x, nir_imm_float(nb, 1)),
-   nir_fadd(nb, exp2x, nir_imm_float(nb, 1)));
+  const uint32_t bit_size = src[0]->bit_size;
+  const double clamped_x = bit_size > 16 ? 10.0 : 4.2;
+  nir_ssa_def *x = nir_fmin(nb, src[0],
+nir_imm_floatN_t(nb, clamped_x, bit_size));
+  nir_ssa_def *one = nir_imm_floatN_t(nb, 1.0, bit_size);
+  nir_ssa_def *two = nir_imm_floatN_t(nb, 2.0, bit_size);
+  nir_ssa_def *exp2x = build_exp(nb, nir_fmul(nb, x, two));
+  val->ssa->def = nir_fdiv(nb, nir_fsub(nb, exp2x, one),
+   nir_fadd(nb, exp2x, one));
   return;
}
 
@@ -745,16 +754,16 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
   val->ssa->def = nir_fmul(nb, nir_fsign(nb, src[0]),
  build_log(nb, nir_fadd(nb, nir_fabs(nb, src[0]),
nir_fsqrt(nb, nir_fadd(nb, nir_fmul(nb, src[0], src[0]),
-  nir_imm_float(nb, 1.0f))))));
+  nir_imm_floatN_t(nb, 1.0f, 
src[0]->bit_size))))));
   return;
case GLSLstd450Acosh:
   val->ssa->def = build_log(nb, nir_fadd(nb, src[0],
  nir_fsqrt(nb, nir_fsub(nb, nir_fmul(nb, src[0], src[0]),
-nir_imm_float(nb, 1.0f)))));
+nir_imm_floatN_t(nb, 1.0f, 
src[0]->bit_size)))));
   return;
case GLSLstd450Atanh: {
-  nir_ssa_def *one = nir_imm_float(nb, 1.0);
-  val->ssa->def = nir_fmul(nb, nir_imm_float(nb, 0.5f),
+  nir_ssa_def *one = nir_imm_floatN_t(nb, 1.0, src[0]->bit_size);
+  val->ssa->def = nir_fmul(nb, nir_imm_floatN_t(nb, 0.5f, 
src[0]->bit_size),
  build_log(nb, nir_fdiv(nb, nir_fadd(nb, one, src[0]),
 nir_fsub(nb, one, src[0]))));
   return;
-- 
2.14.1


[Mesa-dev] [PATCH 14/22] compiler/spirv: implement 16-bit atan2

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 9e565ef9e5a..70e3eb80c4c 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -322,8 +322,11 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
 static nir_ssa_def *
 build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
 {
-   nir_ssa_def *zero = nir_imm_float(b, 0);
-   nir_ssa_def *one = nir_imm_float(b, 1);
+   assert(y->bit_size == x->bit_size);
+   const uint32_t bit_size = x->bit_size;
+
+   nir_ssa_def *zero = nir_imm_floatN_t(b, 0, bit_size);
+   nir_ssa_def *one = nir_imm_floatN_t(b, 1, bit_size);
 
/* If we're on the left half-plane rotate the coordinates π/2 clock-wise
 * for the y=0 discontinuity to end up aligned with the vertical
@@ -353,9 +356,10 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
 * floating point representations with at least the dynamic range of ATI's
 * 24-bit representation.
 */
-   nir_ssa_def *huge = nir_imm_float(b, 1e18f);
+   const double huge_val = bit_size >= 32 ? 1e18 : 1e14;
+   nir_ssa_def *huge = nir_imm_floatN_t(b,  huge_val, bit_size);
nir_ssa_def *scale = nir_bcsel(b, nir_fge(b, nir_fabs(b, t), huge),
-  nir_imm_float(b, 0.25), one);
+  nir_imm_floatN_t(b, 0.25, bit_size), one);
nir_ssa_def *rcp_scaled_t = nir_frcp(b, nir_fmul(b, t, scale));
nir_ssa_def *s_over_t = nir_fmul(b, nir_fmul(b, s, scale), rcp_scaled_t);
 
@@ -382,9 +386,13 @@ build_atan2(nir_builder *b, nir_ssa_def *y, nir_ssa_def *x)
/* Calculate the arctangent and fix up the result if we had flipped the
 * coordinate system.
 */
-   nir_ssa_def *arc = nir_fadd(b, nir_fmul(b, nir_b2f(b, flip),
-   nir_imm_float(b, M_PI_2f)),
-   build_atan(b, tan));
+   nir_ssa_def *b2f_flip = nir_b2f(b, flip);
+   if (bit_size == 16)
+  b2f_flip = nir_f2f16_undef(b, b2f_flip);
+   nir_ssa_def *arc =
+  nir_fadd(b, nir_fmul(b, b2f_flip,
+  nir_imm_floatN_t(b, M_PI_2f, bit_size)),
+  build_atan(b, tan));
 
/* Rather convoluted calculation of the sign of the result.  When x < 0 we
 * cannot use fsign because we need to be able to distinguish between
-- 
2.14.1


[Mesa-dev] [PATCH 13/22] compiler/spirv: implement 16-bit atan

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 37 +
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 8cbdaad3998..9e565ef9e5a 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -255,8 +255,10 @@ build_fsum(nir_builder *b, nir_ssa_def **xs, int terms)
 static nir_ssa_def *
 build_atan(nir_builder *b, nir_ssa_def *y_over_x)
 {
+   const uint32_t bit_size = y_over_x->bit_size;
+
nir_ssa_def *abs_y_over_x = nir_fabs(b, y_over_x);
-   nir_ssa_def *one = nir_imm_float(b, 1.0f);
+   nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, bit_size);
 
/*
 * range-reduction, first step:
@@ -282,25 +284,36 @@ build_atan(nir_builder *b, nir_ssa_def *y_over_x)
nir_ssa_def *x_9  = nir_fmul(b, x_7, x_2);
nir_ssa_def *x_11 = nir_fmul(b, x_9, x_2);
 
+   const float coef[] = {
+   0.9999793128310355f,
+  -0.3326756418091246f,
+   0.1938924977115610f,
+  -0.1173503194786851f,
+   0.0536813784310406f,
+  -0.0121323213173444f,
+   };
+
nir_ssa_def *polynomial_terms[] = {
-  nir_fmul(b, x,nir_imm_float(b,  0.9999793128310355f)),
-  nir_fmul(b, x_3,  nir_imm_float(b, -0.3326756418091246f)),
-  nir_fmul(b, x_5,  nir_imm_float(b,  0.1938924977115610f)),
-  nir_fmul(b, x_7,  nir_imm_float(b, -0.1173503194786851f)),
-  nir_fmul(b, x_9,  nir_imm_float(b,  0.0536813784310406f)),
-  nir_fmul(b, x_11, nir_imm_float(b, -0.0121323213173444f)),
+  nir_fmul(b, x,nir_imm_floatN_t(b, coef[0], bit_size)),
+  nir_fmul(b, x_3,  nir_imm_floatN_t(b, coef[1], bit_size)),
+  nir_fmul(b, x_5,  nir_imm_floatN_t(b, coef[2], bit_size)),
+  nir_fmul(b, x_7,  nir_imm_floatN_t(b, coef[3], bit_size)),
+  nir_fmul(b, x_9,  nir_imm_floatN_t(b, coef[4], bit_size)),
+  nir_fmul(b, x_11, nir_imm_floatN_t(b, coef[5], bit_size)),
};
 
nir_ssa_def *tmp =
   build_fsum(b, polynomial_terms, ARRAY_SIZE(polynomial_terms));
 
/* range-reduction fixup */
+   nir_ssa_def *minus_2 = nir_imm_floatN_t(b, -2.0f, bit_size);
+   nir_ssa_def *m_pi_2 = nir_imm_floatN_t(b, M_PI_2f, bit_size);
+   nir_ssa_def *b2f = nir_b2f(b, nir_flt(b, one, abs_y_over_x));
+   if (bit_size == 16)
+  b2f = nir_f2f16_undef(b, b2f);
tmp = nir_fadd(b, tmp,
-  nir_fmul(b,
-   nir_b2f(b, nir_flt(b, one, abs_y_over_x)),
-   nir_fadd(b, nir_fmul(b, tmp,
-nir_imm_float(b, -2.0f)),
-   nir_imm_float(b, M_PI_2f))));
+  nir_fmul(b, b2f,
+  nir_fadd(b, nir_fmul(b, tmp, minus_2), m_pi_2)));
 
/* sign fixup */
return nir_fmul(b, tmp, nir_fsign(b, y_over_x));
-- 
2.14.1
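
The arctangent approximation can be followed more easily in a scalar C model. This is a sketch, not the NIR code: approx_atan is a hypothetical name, the polynomial is evaluated with Horner's rule instead of summing separate terms, the first coefficient is assumed to be 0.9999793128310355, and the range reduction x = min(|v|, 1) / max(|v|, 1) is taken from the "range-reduction, first step" comment in build_atan.

```c
/* Hypothetical scalar model of the polynomial arctangent. */
static float approx_atan(float y_over_x)
{
   const float abs_v = y_over_x < 0.0f ? -y_over_x : y_over_x;

   /* Range reduction: x = min(|v|, 1) / max(|v|, 1) lies in [0, 1]. */
   const float x = (abs_v < 1.0f ? abs_v : 1.0f) /
                   (abs_v > 1.0f ? abs_v : 1.0f);
   const float x2 = x * x;

   /* Odd polynomial in x (terms x, x^3, ..., x^11), Horner form. */
   float p = -0.0121323213173444f;
   p = p * x2 + 0.0536813784310406f;
   p = p * x2 - 0.1173503194786851f;
   p = p * x2 + 0.1938924977115610f;
   p = p * x2 - 0.3326756418091246f;
   p = p * x2 + 0.9999793128310355f;
   float r = p * x;

   /* Range-reduction fixup: atan(v) = pi/2 - atan(1/v) for |v| > 1. */
   if (abs_v > 1.0f)
      r = 1.57079632679489661923f - r;

   /* Sign fixup. */
   return y_over_x < 0.0f ? -r : r;
}
```

The fixup in the patch, tmp + b2f * (-2 * tmp + pi/2), evaluates to pi/2 - tmp when the comparison is true and to tmp otherwise, which is what the branch above expresses.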


[Mesa-dev] [PATCH 15/22] compiler/spirv: implement 16-bit exp and log

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 70e3eb80c4c..324e8b5874a 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -194,7 +194,7 @@ build_fclamp(nir_builder *b,
 static nir_ssa_def *
 build_exp(nir_builder *b, nir_ssa_def *x)
 {
-   return nir_fexp2(b, nir_fmul(b, x, nir_imm_float(b, M_LOG2E)));
+   return nir_fexp2(b, nir_fmul(b, x, nir_imm_floatN_t(b, M_LOG2E, 
x->bit_size)));
 }
 
 /**
@@ -203,7 +203,8 @@ build_exp(nir_builder *b, nir_ssa_def *x)
 static nir_ssa_def *
 build_log(nir_builder *b, nir_ssa_def *x)
 {
-   return nir_fmul(b, nir_flog2(b, x), nir_imm_float(b, 1.0 / M_LOG2E));
+   return nir_fmul(b, nir_flog2(b, x),
+  nir_imm_floatN_t(b, 1.0 / M_LOG2E, x->bit_size));
 }
 
 /**
-- 
2.14.1


[Mesa-dev] [PATCH 12/22] compiler/spirv: implement 16-bit acos

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 845e5a9e517..8cbdaad3998 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -743,8 +743,9 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
   return;
 
case GLSLstd450Acos:
-  val->ssa->def = nir_fsub(nb, nir_imm_float(nb, M_PI_2f),
-   build_asin(nb, src[0], 0.08132463, 
-0.02363318));
+  val->ssa->def =
+ nir_fsub(nb, nir_imm_floatN_t(nb, M_PI_2f, src[0]->bit_size),
+  build_asin(nb, src[0], 0.08132463, -0.02363318));
   return;
 
case GLSLstd450Atan:
-- 
2.14.1


[Mesa-dev] [PATCH 10/22] intel/compiler: allow extended math functions with HF operands

2018-05-17 Thread Iago Toral Quiroga

The PRM states that half-float operands are supported since gen9.
---
 src/intel/compiler/brw_eu_emit.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index ee5a048bcaa..20c3f9fa933 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -1933,8 +1933,10 @@ void gen6_math(struct brw_codegen *p,
   assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
  (devinfo->gen >= 8 && src1.file == BRW_IMMEDIATE_VALUE));
} else {
-  assert(src0.type == BRW_REGISTER_TYPE_F);
-  assert(src1.type == BRW_REGISTER_TYPE_F);
+  assert(src0.type == BRW_REGISTER_TYPE_F ||
+ (src0.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
+  assert(src1.type == BRW_REGISTER_TYPE_F ||
+ (src1.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
}
 
/* Source modifiers are ignored for extended math instructions on Gen6. */
-- 
2.14.1


[Mesa-dev] [PATCH 11/22] compiler/spirv: implement 16-bit asin

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index ffe12a71818..845e5a9e517 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -217,19 +217,25 @@ build_log(nir_builder *b, nir_ssa_def *x)
  * in each case.
  */
 static nir_ssa_def *
-build_asin(nir_builder *b, nir_ssa_def *x, float p0, float p1)
+build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1)
 {
+   nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size);
+   nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size);
+   nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size);
+   nir_ssa_def *m_pi_2 = nir_imm_floatN_t(b, M_PI_2f, x->bit_size);
+   nir_ssa_def *m_pi_4_minus_one =
+  nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size);
nir_ssa_def *abs_x = nir_fabs(b, x);
return nir_fmul(b, nir_fsign(b, x),
-   nir_fsub(b, nir_imm_float(b, M_PI_2f),
-nir_fmul(b, nir_fsqrt(b, nir_fsub(b, 
nir_imm_float(b, 1.0f), abs_x)),
- nir_fadd(b, nir_imm_float(b, M_PI_2f),
+   nir_fsub(b, m_pi_2,
+nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one, abs_x)),
+ nir_fadd(b, m_pi_2,
   nir_fmul(b, abs_x,
-   nir_fadd(b, 
nir_imm_float(b, M_PI_4f - 1.0f),
+   nir_fadd(b, 
m_pi_4_minus_one,
 nir_fmul(b, 
abs_x,
- 
nir_fadd(b, nir_imm_float(b, p0),
+ 
nir_fadd(b, p0,

   nir_fmul(b, abs_x,
-   
nir_imm_float(b, p1))))))))));
+   
p1)))))))));
 }
 
 /**
-- 
2.14.1


[Mesa-dev] [PATCH 09/22] intel/compiler: implement 16-bit multiply-add

2018-05-17 Thread Iago Toral Quiroga

The PRM for MAD states that F, DF and HF are supported, however, then
it requires that the instruction includes a 2-bit mask specifying
the types of each operand like this:

00: 32-bit float
01: 32-bit signed integer
10: 32-bit unsigned integer
11: 64-bit float

So 16-bit float would not be supported. The driver also asserts that
the types involved in ALIGN16 3-src operations are one of these
(MAD is always emitted as an align16 instruction prior to gen10).
---
 src/intel/compiler/brw_fs_nir.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 91283ab4911..58ddc456bae 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1525,7 +1525,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   break;
 
case nir_op_ffma:
-  inst = bld.MAD(result, op[2], op[1], op[0]);
+  /* 3-src MAD doesn't support 16-bit operands */
+  if (nir_dest_bit_size(instr->dest.dest) >= 32) {
+ inst = bld.MAD(result, op[2], op[1], op[0]);
+  } else {
+ fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_HF);
+ bld.MUL(tmp, op[1], op[0]);
+ inst = bld.ADD(result, tmp, op[2]);
+  }
   inst->saturate = instr->dest.saturate;
   break;
 
-- 
2.14.1


[Mesa-dev] [PATCH 08/22] intel/compiler: implement 16-bit fsign

2018-05-17 Thread Iago Toral Quiroga

---
 src/intel/compiler/brw_fs_nir.cpp | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index fb5ad7a614a..91283ab4911 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -868,14 +868,29 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   * Predicated OR ORs 1.0 (0x3f800000) with the sign bit if val is not
   * zero.
   */
- bld.CMP(bld.null_reg_f(), op[0], brw_imm_f(0.0f), BRW_CONDITIONAL_NZ);
+ fs_reg zero, one_mask, sign_mask;
+ brw_reg_type reg_type;
+ if (type_sz(op[0].type) == 4) {
+zero = brw_imm_f(0.0f);
+one_mask = brw_imm_ud(0x3f800000);
+sign_mask = brw_imm_ud(0x80000000);
+reg_type = BRW_REGISTER_TYPE_UD;
+ } else {
+assert(type_sz(op[0].type) == 2);
+zero = retype(brw_imm_uw(0), BRW_REGISTER_TYPE_HF);
+one_mask = brw_imm_uw(0x3c00);
+sign_mask = brw_imm_uw(0x8000);
+reg_type = BRW_REGISTER_TYPE_UW;
+ }
+
+ bld.CMP(bld.null_reg_f(), op[0], zero, BRW_CONDITIONAL_NZ);
 
- fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UD);
- op[0].type = BRW_REGISTER_TYPE_UD;
- result.type = BRW_REGISTER_TYPE_UD;
- bld.AND(result_int, op[0], brw_imm_ud(0x8000u));
+ fs_reg result_int = retype(result, reg_type);
+ op[0].type = reg_type;
+ result.type = reg_type;
+ bld.AND(result_int, op[0], sign_mask);
 
- inst = bld.OR(result_int, result_int, brw_imm_ud(0x3f80u));
+ inst = bld.OR(result_int, result_int, one_mask);
  inst->predicate = BRW_PREDICATE_NORMAL;
  if (instr->dest.saturate) {
 inst = bld.MOV(result, result);
-- 
2.14.1
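
For readers who want to check the masks, the sign trick above can be modelled
on the CPU. This is only a rough stand-alone C sketch, not the FS IR the
backend emits: fsign32 and fsign16_bits are hypothetical names, the 16-bit
variant works on raw half-float bit patterns, and NaN inputs simply follow C's
"!=" semantics, which says nothing about what the hardware does for NaN.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit model: AND with the sign mask, then OR in the bits of 1.0f,
 * but only when the input is non-zero (the predicated OR in the patch).
 */
static float fsign32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    uint32_t result = bits & 0x80000000u;   /* sign_mask */
    if (x != 0.0f)                          /* models CMP.nz + predicate */
        result |= 0x3f800000u;              /* one_mask: 1.0f */

    float out;
    memcpy(&out, &result, sizeof out);
    return out;
}

/* 16-bit model on raw half-float bits: same steps with the half masks. */
static uint16_t fsign16_bits(uint16_t h)
{
    uint16_t result = (uint16_t)(h & 0x8000u);   /* sign_mask */
    if ((h & 0x7fffu) != 0)                      /* +0.0 and -0.0 are "zero" */
        result = (uint16_t)(result | 0x3c00u);   /* one_mask: 1.0 in half */
    return result;
}
```

With this model, the half value -2.0 (0xc000) maps to -1.0 (0xbc00) and both
zeros keep only their sign bit, which matches what the AND/OR sequence in the
patch is meant to produce.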


[Mesa-dev] [PATCH 06/22] compiler/nir: support 16-bit float in nir_imm_floatN_t

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/nir/nir_builder.h | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 02a9dbfb040..198c42dd823 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -25,6 +25,7 @@
 #define NIR_BUILDER_H
 
 #include "nir_control_flow.h"
+#include "util/half_float.h"
 
 struct exec_list;
 
@@ -227,19 +228,6 @@ nir_imm_double(nir_builder *build, double x)
    return nir_build_imm(build, 1, 64, v);
 }
 
-static inline nir_ssa_def *
-nir_imm_floatN_t(nir_builder *build, double x, unsigned bit_size)
-{
-   switch (bit_size) {
-   case 32:
-      return nir_imm_float(build, x);
-   case 64:
-      return nir_imm_double(build, x);
-   }
-
-   unreachable("unknown float immediate bit size");
-}
-
 static inline nir_ssa_def *
 nir_imm_vec4(nir_builder *build, float x, float y, float z, float w)
 {
@@ -288,6 +276,21 @@ nir_imm_intN_t(nir_builder *build, uint64_t x, unsigned 
bit_size)
    return nir_build_imm(build, 1, bit_size, v);
 }
 
+static inline nir_ssa_def *
+nir_imm_floatN_t(nir_builder *build, double x, unsigned bit_size)
+{
+   switch (bit_size) {
+   case 16:
+      return nir_imm_intN_t(build, _mesa_float_to_half((float)x), 16);
+   case 32:
+      return nir_imm_float(build, x);
+   case 64:
+      return nir_imm_double(build, x);
+   }
+
+   unreachable("unknown float immediate bit size");
+}
+
 static inline nir_ssa_def *
 nir_imm_ivec4(nir_builder *build, int x, int y, int z, int w)
 {
-- 
2.14.1
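
The new 16-bit case stores the half-float bit pattern as a 16-bit integer
immediate. As a rough illustration of what such a conversion has to do, here
is a small stand-alone C sketch of a float32 to float16 conversion with
round-to-nearest-even. float_to_half_bits is a hypothetical helper written for
this note; it is not Mesa's _mesa_float_to_half and may differ from it in
details such as NaN payloads.

```c
#include <stdint.h>
#include <string.h>

/* Convert a float32 value to IEEE 754 binary16 bits, rounding to nearest even. */
static uint16_t float_to_half_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t mag = x & 0x7fffffffu;               /* magnitude bits */

    if (mag > 0x7f800000u)                        /* NaN -> a quiet NaN */
        return (uint16_t)(sign | 0x7e00u);
    if (mag >= 0x477ff000u)                       /* >= 65520, incl. inf -> inf */
        return (uint16_t)(sign | 0x7c00u);

    if (mag >= 0x38800000u) {                     /* normal half, >= 2^-14 */
        uint32_t v = mag - 0x38000000u;           /* rebias exponent 127 -> 15 */
        v += 0x0fffu + ((v >> 13) & 1u);          /* round to nearest even */
        return (uint16_t)(sign | (v >> 13));
    }

    if (mag <= 0x33000000u)                       /* <= 2^-25 rounds to zero */
        return sign;

    /* Subnormal half: result is round(value / 2^-24). */
    uint32_t shift = 126u - (mag >> 23);          /* 14..24 */
    uint32_t man = (mag & 0x007fffffu) | 0x00800000u;
    uint32_t q = man >> shift;
    uint32_t rem = man & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1u);
    if (rem > half || (rem == half && (q & 1u)))
        q++;
    return (uint16_t)(sign | q);
}
```

Under this sketch 1.0f becomes 0x3c00, which is the kind of value the patch
hands to nir_imm_intN_t() for a 16-bit request.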


[Mesa-dev] [PATCH 07/22] compiler/spirv: handle 16-bit float in radians() and degrees()

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_glsl450.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c
index 6fa759b1bba..ffe12a71818 100644
--- a/src/compiler/spirv/vtn_glsl450.c
+++ b/src/compiler/spirv/vtn_glsl450.c
@@ -540,11 +540,17 @@ handle_glsl450_alu(struct vtn_builder *b, enum GLSLstd450 
entrypoint,
 
    switch (entrypoint) {
    case GLSLstd450Radians:
-      val->ssa->def = nir_fmul(nb, src[0], nir_imm_float(nb, 0.01745329251));
+      val->ssa->def = nir_fmul(nb, src[0],
+                               nir_imm_floatN_t(nb, 0.01745329251,
+                                                src[0]->bit_size));
       return;
+
    case GLSLstd450Degrees:
-      val->ssa->def = nir_fmul(nb, src[0], nir_imm_float(nb, 57.2957795131));
+      val->ssa->def = nir_fmul(nb, src[0],
+                               nir_imm_floatN_t(nb, 57.2957795131,
+                                                src[0]->bit_size));
       return;
+
    case GLSLstd450Tan:
       val->ssa->def = nir_fdiv(nb, nir_fsin(nb, src[0]),
                                nir_fcos(nb, src[0]));
-- 
2.14.1


[Mesa-dev] [PATCH 04/22] intel/compiler: lower some 16-bit float operations to 32-bit

2018-05-17 Thread Iago Toral Quiroga

The hardware doesn't support half-float for these.
---
 src/intel/compiler/brw_nir.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index dfeea73b06a..ff245b59b81 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -605,6 +605,11 @@ lower_bit_size_callback(const nir_alu_instr *alu, void 
*data)
    case nir_op_irem:
    case nir_op_udiv:
    case nir_op_umod:
+   case nir_op_fceil:
+   case nir_op_ffloor:
+   case nir_op_ffract:
+   case nir_op_fround_even:
+   case nir_op_ftrunc:
       return 32;
    default:
       return 0;
-- 
2.14.1


[Mesa-dev] [PATCH 03/22] compiler/spirv: fix SpvOpIsInf for 16-bit float

2018-05-17 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_alu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
index 5f9cc97fdfb..62a5149797a 100644
--- a/src/compiler/spirv/vtn_alu.c
+++ b/src/compiler/spirv/vtn_alu.c
@@ -578,7 +578,9 @@ vtn_handle_alu(struct vtn_builder *b, SpvOp opcode,
       break;
 
    case SpvOpIsInf: {
-      nir_ssa_def *inf = nir_imm_floatN_t(&b->nb, INFINITY, src[0]->bit_size);
+      nir_ssa_def *inf = src[0]->bit_size > 16 ?
+         nir_imm_floatN_t(&b->nb, INFINITY, src[0]->bit_size) :
+         nir_imm_intN_t(&b->nb, 0x7c00, 16);
       val->ssa->def = nir_ieq(&b->nb, nir_fabs(&b->nb, src[0]), inf);
       break;
    }
-- 
2.14.1
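
The 0x7c00 immediate is the half-float bit pattern for +infinity (exponent all
ones, mantissa zero). The check itself is fabs followed by an integer compare,
which can be sketched in plain C on raw half bits as below. half_is_inf is a
hypothetical name used only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the IsInf lowering for 16-bit floats, on raw half bits. */
static bool half_is_inf(uint16_t h)
{
    uint16_t abs_bits = (uint16_t)(h & 0x7fffu);  /* fabs: clear the sign bit */
    return abs_bits == 0x7c00u;                   /* ieq against +inf */
}
```

Both 0x7c00 (+inf) and 0xfc00 (-inf) pass, while NaN patterns such as 0x7e00
do not, because their mantissa is non-zero.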


[Mesa-dev] [PATCH 05/22] intel/compiler: lower 16-bit extended math to 32-bit prior to gen9

2018-05-17 Thread Iago Toral Quiroga

Extended math doesn't support half-float on these generations.
---
 src/intel/compiler/brw_nir.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index ff245b59b81..8337da57585 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -599,6 +599,8 @@ lower_bit_size_callback(const nir_alu_instr *alu, void 
*data)
    if (alu->dest.dest.ssa.bit_size != 16)
       return 0;
 
+   const struct brw_compiler *compiler = (const struct brw_compiler *) data;
+
    switch (alu->op) {
    case nir_op_idiv:
    case nir_op_imod:
@@ -611,6 +613,15 @@ lower_bit_size_callback(const nir_alu_instr *alu, void 
*data)
    case nir_op_fround_even:
    case nir_op_ftrunc:
       return 32;
+   case nir_op_frcp:
+   case nir_op_frsq:
+   case nir_op_fsqrt:
+   case nir_op_fpow:
+   case nir_op_fexp2:
+   case nir_op_flog2:
+   case nir_op_fsin:
+   case nir_op_fcos:
+      return compiler->devinfo->gen < 9 ? 32 : 0;
    default:
       return 0;
    }
@@ -669,7 +680,7 @@ brw_preprocess_nir(const struct brw_compiler *compiler, 
nir_shader *nir)
 
    nir = brw_nir_optimize(nir, compiler, is_scalar);
 
-   nir_lower_bit_size(nir, lower_bit_size_callback, NULL);
+   nir_lower_bit_size(nir, lower_bit_size_callback, (void *)compiler);
 
    if (is_scalar) {
       OPT(nir_lower_load_const_to_scalar);
-- 
2.14.1
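
The callback contract used here is: return the bit size an ALU instruction
should be lowered to, or 0 to leave it alone. A reduced stand-alone C model of
the decision, covering this patch and the previous one, might look like the
sketch below. The enum and function names are hypothetical and only a subset
of opcodes is listed; the real callback takes a nir_alu_instr and reads the
generation from compiler->devinfo.

```c
/* Hypothetical, reduced opcode set for illustration only. */
enum alu_op_model {
    OP_IDIV,     /* integer division: always lowered for 16-bit */
    OP_FCEIL,    /* rounding ops: no half-float support */
    OP_FSQRT,    /* extended math: half-float only on gen9+ */
    OP_FADD      /* anything else: left at 16-bit */
};

/* Return the bit size to lower to, or 0 to keep the instruction as is. */
static unsigned lower_bit_size_model(enum alu_op_model op,
                                     unsigned dest_bit_size, int gen)
{
    if (dest_bit_size != 16)
        return 0;

    switch (op) {
    case OP_IDIV:
    case OP_FCEIL:
        return 32;
    case OP_FSQRT:
        return gen < 9 ? 32 : 0;
    default:
        return 0;
    }
}
```

So a 16-bit fsqrt would be widened to 32-bit on gen8 but kept as half-float on
gen9, while a 16-bit fceil is widened everywhere.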


[Mesa-dev] [PATCH 02/22] i965/fs: Implement float64 to float16 conversion

2018-05-17 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsálvez 

It is not supported directly in the HW, so we need to convert to float32
first as an intermediate step.

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/intel/compiler/brw_fs_nir.cpp | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index dd8e5191f4e..fb5ad7a614a 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -755,6 +755,23 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
        */
 
    case nir_op_f2f16_undef:
+      /* BDW PRM, vol02, Command Reference Instructions, mov - MOVE:
+       *
+       *   "There is no direct conversion from HF to DF or DF to HF.
+       *    Use two instructions and F (Float) as an intermediate type.
+       *
+       *    There is no direct conversion from HF to Q/UQ or Q/UQ to HF.
+       *    Use two instructions and F (Float) or a word integer type
+       *    or a DWord integer type as an intermediate type."
+       */
+      if (nir_src_bit_size(instr->src[0].src) == 64) {
+         fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 1);
+         inst = bld.MOV(tmp, op[0]);
+         inst->saturate = instr->dest.saturate;
+         inst = bld.MOV(result, tmp);
+         inst->saturate = instr->dest.saturate;
+         break;
+      }
       inst = bld.MOV(result, op[0]);
       inst->saturate = instr->dest.saturate;
       break;
-- 
2.14.1


[Mesa-dev] [PATCH 01/22] i965/fs: implement conversions from float16 to 64 bits data types

2018-05-17 Thread Iago Toral Quiroga

From: Samuel Iglesias Gonsálvez 

Signed-off-by: Samuel Iglesias Gonsálvez 
---
 src/intel/compiler/brw_fs_nir.cpp | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 1ce89520bf1..dd8e5191f4e 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -762,6 +762,38 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
    case nir_op_f2f64:
    case nir_op_f2i64:
    case nir_op_f2u64:
+      /* BDW PRM, vol02, Command Reference Instructions, mov - MOVE:
+       *
+       *   "There is no direct conversion from HF to DF or DF to HF.
+       *    Use two instructions and F (Float) as an intermediate type.
+       *
+       *    There is no direct conversion from HF to Q/UQ or Q/UQ to HF.
+       *    Use two instructions and F (Float) or a word integer type
+       *    or a DWord integer type as an intermediate type."
+       */
+      if (nir_src_bit_size(instr->src[0].src) == 16) {
+         brw_reg_type type;
+         switch (instr->op) {
+         case nir_op_f2f64:
+            type = BRW_REGISTER_TYPE_F;
+            break;
+         case nir_op_f2i64:
+            type = BRW_REGISTER_TYPE_D;
+            break;
+         case nir_op_f2u64:
+            type = BRW_REGISTER_TYPE_UD;
+            break;
+         default:
+            unreachable("Not supported");
+         }
+         fs_reg tmp = bld.vgrf(type, 1);
+         inst = bld.MOV(tmp, op[0]);
+         inst->saturate = instr->dest.saturate;
+         inst = bld.MOV(result, tmp);
+         inst->saturate = instr->dest.saturate;
+         break;
+      }
+      /* fallthrough */
    case nir_op_i2f64:
    case nir_op_i2i64:
    case nir_op_u2f64:
-- 
2.14.1
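
The two-MOV sequence above goes through a 32-bit intermediate: F for f2f64 and
D/UD for the integer conversions. A CPU-side model of the same two-step path
is sketched below in plain C. All names are hypothetical, the half value is
passed as raw bits, and the integer path is only meaningful for finite inputs
(C leaves out-of-range float-to-int conversions undefined, and this sketch
makes no claim about what the hardware does there). The unsigned case would be
analogous with a uint32_t intermediate.

```c
#include <stdint.h>
#include <string.h>

/* Decode IEEE 754 binary16 bits into a float32 (exact, no rounding needed). */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;

    if (exp == 0x1fu) {
        bits = sign | 0x7f800000u | (man << 13);           /* inf or NaN */
    } else if (exp != 0) {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  /* rebias 15 -> 127 */
    } else {
        float sub = (float)man * 0x1p-24f;                 /* zero or subnormal */
        memcpy(&bits, &sub, sizeof bits);
        bits |= sign;
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* f2f64 from half: HF -> F -> DF. */
static double f16_to_f64(uint16_t h)
{
    float tmp = half_bits_to_float(h);   /* first MOV: HF -> F */
    return (double)tmp;                  /* second MOV: F -> DF */
}

/* f2i64 from half: HF -> D -> Q (finite inputs only). */
static int64_t f16_to_i64(uint16_t h)
{
    int32_t tmp = (int32_t)half_bits_to_float(h);  /* first MOV: HF -> D */
    return (int64_t)tmp;                           /* second MOV: D -> Q */
}
```

Since every finite half value fits in both float32 and int32, the intermediate
step in this model loses nothing beyond the usual truncation of the fractional
part in the integer case.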


Re: [Mesa-dev] [PATCH 6/9] intel/blorp: Make blorp_ccs_ambiguate just an internal helper

2018-05-17 Thread Iago Toral

On Wed, 2018-05-16 at 08:44 -0700, Jason Ekstrand wrote:
> On Wed, May 16, 2018 at 4:00 AM, Iago Toral <ito...@igalia.com>
> wrote:
> > On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> > 
> > > Now that anv uses blorp_ccs_op for everything, we no longer need
> > to
> > 
> > > expose the ccs_ambiguate function directly.  It's much better
> > tucked
> > 
> > > away as an implementation detail.
> > 
> > > ---
> > 
> > >  src/intel/blorp/blorp.h   |  5 -
> > 
> > >  src/intel/blorp/blorp_clear.c | 21 ++---
> > 
> > >  2 files changed, 10 insertions(+), 16 deletions(-)
> > 
> > > 
> > 
> > > diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> > 
> > > index 8c775bf..e27ea7e 100644
> > 
> > > --- a/src/intel/blorp/blorp.h
> > 
> > > +++ b/src/intel/blorp/blorp.h
> > 
> > > @@ -208,11 +208,6 @@ blorp_ccs_op(struct blorp_batch *batch,
> > 
> > >   enum isl_aux_op ccs_op);
> > 
> > >  
> > 
> > >  void
> > 
> > > -blorp_ccs_ambiguate(struct blorp_batch *batch,
> > 
> > > -struct blorp_surf *surf,
> > 
> > > -uint32_t level, uint32_t layer);
> > 
> > > -
> > 
> > > -void
> > 
> > >  blorp_mcs_partial_resolve(struct blorp_batch *batch,
> > 
> > >struct blorp_surf *surf,
> > 
> > >enum isl_format format,
> > 
> > > diff --git a/src/intel/blorp/blorp_clear.c
> > 
> > > b/src/intel/blorp/blorp_clear.c
> > 
> > > index 6f5549f..39bc0c6 100644
> > 
> > > --- a/src/intel/blorp/blorp_clear.c
> > 
> > > +++ b/src/intel/blorp/blorp_clear.c
> > 
> > > @@ -814,6 +814,11 @@ blorp_clear_attachments(struct blorp_batch
> > 
> > > *batch,
> > 
> > >     batch->blorp->exec(batch, &params);
> > 
> > >  }
> > 
> > >  
> > 
> > > +static void
> > 
> > > +blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
> > 
> > > +   struct blorp_surf *surf,
> > 
> > > +   uint32_t level, uint32_t layer);
> > 
> > > +
> > 
> > >  void
> > 
> > >  blorp_ccs_op(struct blorp_batch *batch,
> > 
> > >   struct blorp_surf *surf, uint32_t level,
> > 
> > > @@ -835,7 +840,7 @@ blorp_ccs_op(struct blorp_batch *batch,
> > 
> > > * mess to another function.
> > 
> > > */
> > 
> > >for (uint32_t a = 0; a < num_layers; a++)
> > 
> > > - blorp_ccs_ambiguate(batch, surf, level, start_layer +
> > a);
> > 
> > > + blorp_legacy_ccs_ambiguate(batch, surf, level,
> > start_layer
> > 
> > > + a);
> > 
> > >return;
> > 
> > > }
> > 
> > >  
> > 
> > > @@ -1022,17 +1027,11 @@ blorp_mcs_partial_resolve(struct
> > blorp_batch
> > 
> > > *batch,
> > 
> > >   * for a given layer/level of a surface to 0x0 which is the
> > 
> > > "uncompressed"
> > 
> > >   * state which tells the sampler to go look at the main surface.
> > 
> > >   */
> > 
> > > -void
> > 
> > > -blorp_ccs_ambiguate(struct blorp_batch *batch,
> > 
> > > -struct blorp_surf *surf,
> > 
> > > -uint32_t level, uint32_t layer)
> > 
> > > +static void
> > 
> > > +blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
> > 
> > > +   struct blorp_surf *surf,
> > 
> > > +   uint32_t level, uint32_t layer)
> > 
> > >  {
> > 
> > > -   if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 10) {
> > 
> > > -  /* On gen10 and above, we have a hardware resolve op for
> > this
> > 
> > > */
> > 
> > > -  return blorp_ccs_op(batch, surf, level, layer, 1,
> > 
> > > -  surf->surf->format,
> > ISL_AUX_OP_AMBIGUATE);
> > 
> > > -   }
> > 
> > > -
> > 
> > 
> > 
> > Since we don't want to call this in gen10+, would it make sense to add
> > an assert for gen < 10?
> 
> It does work on gen10 and 11 (and we used it on gen 10 for a while). 
> I'll make it gen < 12.
> 

My point was that even if it works, we don't want this to be called for
these generations... maybe an assert would be too much for this?

Iago

[Mesa-dev] [PATCH 00/22] spirv/intel: half-float compiler enablement

2018-05-17 Thread Iago Toral Quiroga

Most of our compiler was already 16-bit aware thanks to previous work on
VK_KHR_16bit_storage and shaderInt16, but specifically for 16-bit floating
point, we were missing a few things such as some lowerings that depend on the
specific bit representation of the float or some hardware restrictions which
we address here.

The series contains two patches from Samuel that handle float64/16 conversions,
which had been posted in the mailing list some time ago and that are also
relevant to this.

Iago Toral Quiroga (20):
  compiler/spirv: fix SpvOpIsInf for 16-bit float
  intel/compiler: lower some 16-bit float operations to 32-bit
  intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
  compiler/nir: support 16-bit float in nir_imm_floatN_t
  compiler/spirv: handle 16-bit float in radians() and degrees()
  intel/compiler: implement 16-bit fsign
  intel/compiler: implement 16-bit multiply-add
  intel/compiler: allow extended math functions with HF operands
  compiler/spirv: implement 16-bit asin
  compiler/spirv: implement 16-bit acos
  compiler/spirv: implement 16-bit atan
  compiler/spirv: implement 16-bit atan2
  compiler/spirv: implement 16-bit exp and log
  compiler/spirv: implement 16-bit hyperbolic trigonometric functions
  compiler/spirv: implement 16-bit frexp
  compiler/nir: add lowering option for 16-bit fmod
  intel/compiler: lower 16-bit fmod
  compiler/nir: add lowering for 16-bit flrp
  intel/compiler: lower 16-bit flrp
  intel/compiler: Extended Math is limited to SIMD8 on half-float

Samuel Iglesias Gonsálvez (2):
  i965/fs: implement conversions from float16 to 64 bits data types
  i965/fs: Implement float64 to float16 conversion

 src/compiler/nir/nir.h|   2 +
 src/compiler/nir/nir_builder.h|  29 +++---
 src/compiler/nir/nir_opt_algebraic.py |   2 +
 src/compiler/spirv/vtn_alu.c  |   4 +-
 src/compiler/spirv/vtn_glsl450.c  | 176 +-
 src/intel/compiler/brw_compiler.c |   2 +
 src/intel/compiler/brw_eu_emit.c  |   6 +-
 src/intel/compiler/brw_fs.cpp |  30 --
 src/intel/compiler/brw_fs_nir.cpp |  85 ++--
 src/intel/compiler/brw_nir.c  |  18 +++-
 10 files changed, 279 insertions(+), 75 deletions(-)

-- 
2.14.1


Re: [Mesa-dev] [PATCH 0/9] intel/blorp: Refactors, cleanups, and fixes

2018-05-16 Thread Iago Toral

I skipped the first two patches in the series. I believe that there is
probably someone else more suitable than me to review or verify patch 1
and I don't think we have public gen10 PRMs available yet for patch 2.

I dropped a minor comment on patch 6 which you can take or leave. In
any case, patches 3-9 are:

Reviewed-by: Iago Toral Quiroga <ito...@igalia.com>

Iago

On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> This little series makes a bunch of mostly small changes to
> blorp.  The end
> objective is to get to the point where you just call blorp_ccs_op and
> hand
> it an isl_aux_op instead of having different entrypoints for
> everything.
> This is similar to what we do for HiZ.  For MCS, we still have two
> functions: blorp_mcs_clear and blorp_mcs_partial_resolve.  Since
> those are
> the only two MCS operations you can do (and partial resolve isn't an
> actual
> hardware op), that seemed ok.
> 
> The difficult patch in here is the first one.  I fairly firmly
> believe it
> to be correct but it's a deviation of the docs so it's a bit hard to
> say.
> Unfortunately, it's one of the worst bits of documentation we have
> for our
> GPUs and, as the giant comment explains, it's actually self-
> contradictory
> once you start doing the math.
> 
> Jason Ekstrand (9):
>   intel/blorp: Only double the fast-clear rect alignment on HSW
>   intel/blorp: Use the hardware op for CCS ambiguate on gen10+
>   intel/blorp: Rename blorp_ccs_resolve to blorp_ccs_op
>   intel/blorp: Simplify asserts in blorp_ccs_op
>   anv/blorp: Use blorp_ccs_op for everything
>   intel/blorp: Make blorp_ccs_ambiguate just an internal helper
>   i965: Use blorp_ccs_op for CCS fast-clears
>   intel/blorp: Handle fast-clear directly in blorp_ccs_op
>   intel/blorp: Refactor MCS clears
> 
>  src/intel/blorp/blorp.h   |  24 ++-
>  src/intel/blorp/blorp_clear.c | 327 ++
> 
>  src/intel/blorp/blorp_genX_exec.h |   6 +
>  src/intel/vulkan/anv_blorp.c  |  34 +---
>  src/mesa/drivers/dri/i965/brw_blorp.c |  18 +-
>  5 files changed, 203 insertions(+), 206 deletions(-)
> 

Re: [Mesa-dev] [PATCH 8/9] intel/blorp: Handle fast-clear directly in blorp_ccs_op

2018-05-16 Thread Iago Toral

On Wed, 2018-05-16 at 13:34 +0200, Iago Toral wrote:
> On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> > ---
> >  src/intel/blorp/blorp_clear.c | 199 +++---
> > --
> > --
> >  1 file changed, 88 insertions(+), 111 deletions(-)
> > 
> > diff --git a/src/intel/blorp/blorp_clear.c
> > b/src/intel/blorp/blorp_clear.c
> > index 39bc0c6..5625221 100644
> > --- a/src/intel/blorp/blorp_clear.c
> > +++ b/src/intel/blorp/blorp_clear.c
> > @@ -193,104 +193,7 @@ get_fast_clear_rect(const struct isl_device
> > *dev,
> >  
> > /* Only single sampled surfaces need to (and actually can) be
> > resolved. */
> > if (aux_surf->usage == ISL_SURF_USAGE_CCS_BIT) {
> > -  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for
> > Render
> > -   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> > -   *
> > -   * Clear pass must have a clear rectangle that must
> > follow
> > -   * alignment rules in terms of pixels and lines as shown
> > in the
> > -   * table below. Further, the clear-rectangle height and
> > width
> > -   * must be multiple of the following dimensions. If the
> > height
> > -   * and width of the render target being cleared do not
> > meet these
> > -   * requirements, an MCS buffer can be created such that
> > it
> > -   * follows the requirement and covers the RT.
> > -   *
> > -   * The alignment size in the table that follows is related
> > to
> > the
> > -   * alignment size that is baked into the CCS surface format
> > but with X
> > -   * alignment multiplied by 16 and Y alignment multiplied by
> > 32.
> > -   */
> > -  x_align = isl_format_get_layout(aux_surf->format)->bw;
> > -  y_align = isl_format_get_layout(aux_surf->format)->bh;
> > -
> > -  x_align *= 16;
> > -
> > -  /* SKL+ line alignment requirement for Y-tiled are half
> > those
> > of the prior
> > -   * generations.
> > -   */
> > -  if (dev->info->gen >= 9)
> > - y_align *= 16;
> > -  else
> > - y_align *= 32;
> > -
> > -  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for
> > Render
> > -   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> > -   *
> > -   * In order to optimize the performance MCS buffer (when
> > bound to
> > -   * 1X RT) clear similarly to MCS buffer clear for MSRT
> > case,
> > -   * clear rect is required to be scaled by the following
> > factors
> > -   * in the horizontal and vertical directions:
> > -   *
> > -   * The X and Y scale down factors in the table that follows
> > are each
> > -   * equal to half the alignment value computed above.
> > -   */
> > -  x_scaledown = x_align / 2;
> > -  y_scaledown = y_align / 2;
> > -
> > -  if (ISL_DEV_IS_HASWELL(dev)) {
> > - /* The following text was added in the Haswell PRM, "3D
> > Media GPGPU
> > -  * Engine" >> "MCS Buffer for Render Target(s)" >> Table
> > "Color Clear
> > -  * of Non-MultiSampler Render Target Restrictions":
> > -  *
> > -  *"Clear rectangle must be aligned to two times the
> > number of
> > -  *pixels in the table shown below due to 16X16
> > hashing
> > across the
> > -  *slice."
> > -  *
> > -  * It has persisted in the documentation for all
> > platforms
> > up until
> > -  * Cannonlake and possibly even beyond.  However, we
> > believe that it
> > -  * is only needed on Haswell.
> > -  *
> > -  * There are a couple possible explanations for this
> > restriction:
> > -  *
> > -  * 1) If you assume that the hardware is writing to the
> > CCS
> > as
> > -  *bytes, then the x/y_align computed above gives you
> > an
> > alignment
> > -  *in the CCS of 8x8 bytes and, if 16x16 is needed for
> > hashing, we
> > -  *need to multiply by 2.
> > -  *
> > -  * 2) Haswell is a bit unique in that it's CCS tiling
> > does
> > not line
> > -  *up with Y-til

Re: [Mesa-dev] [PATCH 8/9] intel/blorp: Handle fast-clear directly in blorp_ccs_op

2018-05-16 Thread Iago Toral

On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> ---
>  src/intel/blorp/blorp_clear.c | 199 +++-
> --
>  1 file changed, 88 insertions(+), 111 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp_clear.c
> b/src/intel/blorp/blorp_clear.c
> index 39bc0c6..5625221 100644
> --- a/src/intel/blorp/blorp_clear.c
> +++ b/src/intel/blorp/blorp_clear.c
> @@ -193,104 +193,7 @@ get_fast_clear_rect(const struct isl_device
> *dev,
>  
> /* Only single sampled surfaces need to (and actually can) be
> resolved. */
> if (aux_surf->usage == ISL_SURF_USAGE_CCS_BIT) {
> -  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for
> Render
> -   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> -   *
> -   * Clear pass must have a clear rectangle that must follow
> -   * alignment rules in terms of pixels and lines as shown
> in the
> -   * table below. Further, the clear-rectangle height and
> width
> -   * must be multiple of the following dimensions. If the
> height
> -   * and width of the render target being cleared do not
> meet these
> -   * requirements, an MCS buffer can be created such that it
> -   * follows the requirement and covers the RT.
> -   *
> -   * The alignment size in the table that follows is related to
> the
> -   * alignment size that is baked into the CCS surface format
> but with X
> -   * alignment multiplied by 16 and Y alignment multiplied by
> 32.
> -   */
> -  x_align = isl_format_get_layout(aux_surf->format)->bw;
> -  y_align = isl_format_get_layout(aux_surf->format)->bh;
> -
> -  x_align *= 16;
> -
> -  /* SKL+ line alignment requirement for Y-tiled are half those
> of the prior
> -   * generations.
> -   */
> -  if (dev->info->gen >= 9)
> - y_align *= 16;
> -  else
> - y_align *= 32;
> -
> -  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for
> Render
> -   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
> -   *
> -   * In order to optimize the performance MCS buffer (when
> bound to
> -   * 1X RT) clear similarly to MCS buffer clear for MSRT
> case,
> -   * clear rect is required to be scaled by the following
> factors
> -   * in the horizontal and vertical directions:
> -   *
> -   * The X and Y scale down factors in the table that follows
> are each
> -   * equal to half the alignment value computed above.
> -   */
> -  x_scaledown = x_align / 2;
> -  y_scaledown = y_align / 2;
> -
> -  if (ISL_DEV_IS_HASWELL(dev)) {
> - /* The following text was added in the Haswell PRM, "3D
> Media GPGPU
> -  * Engine" >> "MCS Buffer for Render Target(s)" >> Table
> "Color Clear
> -  * of Non-MultiSampler Render Target Restrictions":
> -  *
> -  *"Clear rectangle must be aligned to two times the
> number of
> -  *pixels in the table shown below due to 16X16 hashing
> across the
> -  *slice."
> -  *
> -  * It has persisted in the documentation for all platforms
> up until
> -  * Cannonlake and possibly even beyond.  However, we
> believe that it
> -  * is only needed on Haswell.
> -  *
> -  * There are a couple possible explanations for this
> restriction:
> -  *
> -  * 1) If you assume that the hardware is writing to the CCS
> as
> -  *bytes, then the x/y_align computed above gives you an
> alignment
> -  *in the CCS of 8x8 bytes and, if 16x16 is needed for
> hashing, we
> -  *need to multiply by 2.
> -  *
> -  * 2) Haswell is a bit unique in that it's CCS tiling does
> not line
> -  *up with Y-tiling on a cache-line
> granularity.  Instead, it has
> -  *an extra bit of swizzling in bit 9.  Also, bit 6
> swizzling
> -  *applies to the CCS on Haswell.  This means that
> Haswell CTS
> -  *does not match on a cache-line granularity but it
> does match on
> -  *a 2x2 cache line granularity.
> -  *
> -  * Clearly, the first explanation seems to follow
> documentation the
> -  * best but they may be related.  In any case, empirical
> evidence
> -  * seems to confirm that it is, indeed required on Haswell.
> -  *
> -  * On Broadwell things get a bit stickier.  Broadwell adds
> support
> -  * for mip-mapped CCS with an alignment in the CCS of
> 256x128.  For a
> -  * 32bpb main surface, the above computation will yield a
> x/y_align
> -  * of 128x128 for a Y-tiled main surface and 256x64 for X-
> tiled.  In
> -  * either case, if we double the alignment, we will get an
> alignment
> -  * bigger than horizontal and vertical alignment of the CCS
> and fast
> -  * clears of one LOD may

Re: [Mesa-dev] [PATCH 7/9] i965: Use blorp_ccs_op for CCS fast-clears

2018-05-16 Thread Iago Toral

On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_blorp.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> index dab04f2..b6097f5 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -1260,9 +1260,15 @@ do_single_blorp_clear(struct brw_context *brw,
> struct gl_framebuffer *fb,
>  
>        struct blorp_batch batch;
>        blorp_batch_init(&brw->blorp, &batch, brw, 0);
> -      blorp_fast_clear(&batch, &surf, isl_format,
> -                       level, irb->mt_layer, num_layers,
> -                       x0, y0, x1, y1);
> +      if (surf.aux_usage == ISL_AUX_USAGE_CCS_E ||
> +          surf.aux_usage == ISL_AUX_USAGE_CCS_D) {
> +         blorp_ccs_op(&batch, &surf, level, irb->mt_layer, num_layers,
> +                      isl_format, ISL_AUX_OP_FAST_CLEAR);
> +      } else {
> +         blorp_fast_clear(&batch, &surf, isl_format,
> +                          level, irb->mt_layer, num_layers,
> +                          x0, y0, x1, y1);
> +      }

On its own, this looks a bit weird, but reading further into the series
this seems to be preparation for the last patch in the series.

>        blorp_batch_finish(&batch);
>  
>brw_emit_end_of_pipe_sync(brw,
> PIPE_CONTROL_RENDER_TARGET_FLUSH);

Re: [Mesa-dev] [PATCH 6/9] intel/blorp: Make blorp_ccs_ambiguate just an internal helper

2018-05-16 Thread Iago Toral

On Tue, 2018-05-15 at 15:28 -0700, Jason Ekstrand wrote:
> Now that anv uses blorp_ccs_op for everything, we no longer need to
> expose the ccs_ambiguate function directly.  It's much better tucked
> away as an implementation detail.
> ---
>  src/intel/blorp/blorp.h   |  5 -
>  src/intel/blorp/blorp_clear.c | 21 ++---
>  2 files changed, 10 insertions(+), 16 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> index 8c775bf..e27ea7e 100644
> --- a/src/intel/blorp/blorp.h
> +++ b/src/intel/blorp/blorp.h
> @@ -208,11 +208,6 @@ blorp_ccs_op(struct blorp_batch *batch,
>   enum isl_aux_op ccs_op);
>  
>  void
> -blorp_ccs_ambiguate(struct blorp_batch *batch,
> -struct blorp_surf *surf,
> -uint32_t level, uint32_t layer);
> -
> -void
>  blorp_mcs_partial_resolve(struct blorp_batch *batch,
>struct blorp_surf *surf,
>enum isl_format format,
> diff --git a/src/intel/blorp/blorp_clear.c
> b/src/intel/blorp/blorp_clear.c
> index 6f5549f..39bc0c6 100644
> --- a/src/intel/blorp/blorp_clear.c
> +++ b/src/intel/blorp/blorp_clear.c
> @@ -814,6 +814,11 @@ blorp_clear_attachments(struct blorp_batch
> *batch,
>     batch->blorp->exec(batch, &params);
>  }
>  
> +static void
> +blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
> +   struct blorp_surf *surf,
> +   uint32_t level, uint32_t layer);
> +
>  void
>  blorp_ccs_op(struct blorp_batch *batch,
>   struct blorp_surf *surf, uint32_t level,
> @@ -835,7 +840,7 @@ blorp_ccs_op(struct blorp_batch *batch,
> * mess to another function.
> */
>for (uint32_t a = 0; a < num_layers; a++)
> - blorp_ccs_ambiguate(batch, surf, level, start_layer + a);
> + blorp_legacy_ccs_ambiguate(batch, surf, level, start_layer
> + a);
>return;
> }
>  
> @@ -1022,17 +1027,11 @@ blorp_mcs_partial_resolve(struct blorp_batch
> *batch,
>   * for a given layer/level of a surface to 0x0 which is the
> "uncompressed"
>   * state which tells the sampler to go look at the main surface.
>   */
> -void
> -blorp_ccs_ambiguate(struct blorp_batch *batch,
> -struct blorp_surf *surf,
> -uint32_t level, uint32_t layer)
> +static void
> +blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
> +   struct blorp_surf *surf,
> +   uint32_t level, uint32_t layer)
>  {
> -   if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 10) {
> -  /* On gen10 and above, we have a hardware resolve op for this
> */
> -  return blorp_ccs_op(batch, surf, level, layer, 1,
> -  surf->surf->format, ISL_AUX_OP_AMBIGUATE);
> -   }
> -

Since we don't want to call this in gen10+, would it make sense to add an
assert for gen < 10?

Iago

>     struct blorp_params params;
>     blorp_params_init(&params);
>  

[Mesa-dev] [PATCH 2/3] intel/compiler: add a region_match() helper

2018-05-15 Thread Iago Toral Quiroga

This checks whether two register regions are an exact match.
---
 src/intel/compiler/brw_ir_fs.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index f06a33c516d..cad333b6b6d 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -239,6 +239,19 @@ region_contained_in(const fs_reg , unsigned dr, const 
fs_reg , unsigned ds)
   reg_offset(r) + dr <= reg_offset(s) + ds;
 }
 
+/**
+ * Check that the register region given by r [r.offset, r.offset + dr[
+ * is exactly the same as the register region given by s
+ * [s.offset, s.offset + ds[
+ */
+static inline bool
+region_match(const fs_reg , unsigned dr, const fs_reg , unsigned ds)
+{
+   return reg_space(r) == reg_space(s) &&
+  reg_offset(r) == reg_offset(s) &&
+  reg_offset(r) + dr == reg_offset(s) + ds;
+}
+
 /**
  * Return whether the given register region is n-periodic, i.e. whether the
  * original region remains invariant after shifting it by \p n scalar
-- 
2.14.1


[Mesa-dev] [PATCH 0/3] intel: implement an optimization pass to clean-up boolean conversions

2018-05-15 Thread Iago Toral Quiroga

NIR assumes that all booleans are 32-bit, so drivers need to produce 32-bit
booleans even if they can produce native booleans of a different bit-size, like
Intel does. This means that if we have a 16-bit CMP instruction, we generate a
16-bit boolean that we immediately convert to 32-bit, since that is the bit-size
expected by NIR for all consumers of the boolean.
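
To illustrate why the widening has to be a signed conversion, here is a small
standalone sketch. It assumes that a true boolean is represented as all ones
at its native bit-size, and the helper names are made up for the example:

```cpp
#include <cstdint>

// Assumed model of a 16-bit comparison: true is all ones (0xFFFF),
// false is zero.
static inline uint16_t cmp_lt_16(int16_t a, int16_t b)
{
   return a < b ? 0xFFFF : 0x0000;
}

// Widening through a signed type sign-extends, so an all-ones 16-bit
// boolean becomes an all-ones 32-bit boolean.
static inline uint32_t bool16_to_bool32(uint16_t b)
{
   return static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int16_t>(b)));
}

// Widening through an unsigned type zero-extends instead, which leaves
// the upper 16 bits clear and does not give a canonical 32-bit true.
static inline uint32_t bool16_to_bool32_zero_extended(uint16_t b)
{
   return static_cast<uint32_t>(b);
}
```

Under that assumption, the signed path maps 0xFFFF to 0xFFFFFFFF while the
unsigned path would produce 0x0000FFFF.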

This backend optimization pass identifies these cases after we are done
translating from NIR to FS IR, and propagates the lower bit-size booleans
to allow DCE to remove the 32-bit conversions. The pass should run early
after translating from NIR, since it assumes that boolean conversions to
32-bit take place immediately after the corresponding CMP instructions.

This has been tested with existing and work-in-progress CTS tests as well
as some ad-hoc VkRunner tests I wrote.

For more context you can read this discussion:
https://lists.freedesktop.org/archives/mesa-dev/2018-April/192751.html

One point raised by Jason during the discussion linked above was that we might
need to canonicalize booleans of different native bit-sizes when they are
combined in boolean expressions. However, as indicated in the commit log for the
last patch in the series, my interpretation of the PRM is that the hardware can
handle this situation without us having to do anything about it. The last patch
contains canonicalization code under a disabled #if guard anyway, just in case
reviewers think this is needed in the end and want to have a look at what it
could look like.

As an alternative to what is being done here, we could also change the way
we construct CMP instructions to always produce canonical 32-bit bools like
NIR expects, taking advantage of the PRM documentation that says that, since
gen5, CMP instructions can mix and match *B, *W and *D for their source and
destination arguments. However, since all hardware gens still produce 16-bit
booleans for half-float, we would still need to handle that case specially with
a similar pass, so we would not be gaining much from that. Also, in that case we
would always operate with 32-bit booleans, losing the possibility to emit native
16-bit boolean instructions where possible.

Iago Toral Quiroga (3):
  intel/compiler: make brw_reg_type_from_bit_size usable from other
places
  intel/compiler: add a region_match() helper
  intel/compiler: add an optimization pass for booleans

 src/intel/compiler/brw_fs.cpp | 291 ++
 src/intel/compiler/brw_fs.h   |   5 +
 src/intel/compiler/brw_fs_nir.cpp |  59 
 src/intel/compiler/brw_ir_fs.h|  13 ++
 4 files changed, 309 insertions(+), 59 deletions(-)

-- 
2.14.1


[Mesa-dev] [PATCH 3/3] intel/compiler: add an optimization pass for booleans

2018-05-15 Thread Iago Toral Quiroga

NIR assumes that all booleans are 32-bit but Intel hardware produces
booleans of the same size as the operands to the CMP instruction, so we
can actually have 8-bit and 16-bit booleans. To work around this
mismatch between NIR and the hardware, we emit boolean conversions to
32-bit right after emitting the CMP instruction during the NIR->FS
pass, which makes interfacing with NIR a lot easier, but can leave
unnecessary boolean conversions in the shader code.

This optimization tries to identify instructions that source from these
conversions and rewrite them to use the original 16-bit boolean result
when possible, hoping that we can later eliminate the 32-bit conversions.

For example, it turns this:

cmp.g.f0(16)  g10<1>HF  (abs)g19<16,8,2>HF  Half Float IMM
mov(16)       g8<1>D    g10<8,8,1>W
mov(16)       g10<1>F   -g8<8,8,1>D

into:

cmp.g.f0(16)  g25<1>HF  (abs)g19<16,8,2>HF  Half Float IMM
mov(16)       g8<1>F    -g25<8,8,1>W

It is worth pointing out that when the original shader has native
boolean expressions of different bit-sizes, it could lead to situations
where we end up operating on these booleans together, such as in the
following pseudo-code:

int   a, b;
int16 c, d;
...
if ((a < b) && (c < d)) {
   ...
}

According to the PRMs (at least since IVB), all logical instructions (CMP,
NOT, AND, OR, XOR) and also the non-logical instructions that could read
boolean operands (MOV, SEL) state:

Src Types     Dst Types
-----------------------
*B, *W, *D    *B, *W, *D

This indicates that these instructions can handle booleans of different
bit-sizes. I have been doing some ad-hoc testing of these scenarios using
VkRunner to verify this and didn't find any indication that it does not
work. Also, I noticed that we already have code in the driver that
exploits this behavior. For example, the helper-invocation tests
in piglit produce GPU code like this:

shr(16) vgrf1:UW, g1<0>:UB, ???
and(16) vgrf2:UD, -vgrf1:UW, 65537u
---
 src/intel/compiler/brw_fs.cpp | 232 ++
 src/intel/compiler/brw_fs.h   |   1 +
 2 files changed, 233 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 458c534c9c7..ebb6307289d 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3014,6 +3014,237 @@ fs_visitor::opt_peephole_csel()
return progress;
 }
 
+/**
+ * For a given integer register type representing a boolean value, obtain
+ * the signed version of the type, which is required to get sign-extension
+ * for producing correct boolean values when converting to larger bit-sizes.
+ */
+static brw_reg_type
+get_signed_bool_type(brw_reg_type type)
+{
+   switch (type) {
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_UD:
+  return BRW_REGISTER_TYPE_D;
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_UW:
+  return BRW_REGISTER_TYPE_W;
+   case BRW_REGISTER_TYPE_B:
+   case BRW_REGISTER_TYPE_UB:
+  return BRW_REGISTER_TYPE_B;
+   default:
+  assert(!"Invalid boolean register type");
+   }
+}
+
+static bool
+inst_supports_boolean(fs_inst *inst)
+{
+   switch (inst->opcode) {
+   case BRW_OPCODE_MOV:
+   case BRW_OPCODE_CMP:
+   case BRW_OPCODE_SEL:
+   case BRW_OPCODE_NOT:
+   case BRW_OPCODE_AND:
+   case BRW_OPCODE_OR:
+   case BRW_OPCODE_XOR:
+  return true;
+   default:
+  return false;
+   }
+}
+
+/**
+ * Modifies the type of a boolean register to accommodate it to the given
+ * bit-size while preserving signedness of its original type.
+ */
+static inline fs_reg
+fix_bool_reg_bit_size(fs_reg reg, unsigned bit_size)
+{
+   const brw_reg_type bool_type =
+  brw_reg_type_from_bit_size(bit_size, reg.type);
+   return retype(reg, bool_type);
+}
+
+/**
+ * Propagates the bit-size of the destination of a boolean instruction to
+ * all its consumers. If propagate_from_source is True, then the producer
+ * is a conversion MOV from a low bit-size boolean to 32-bit, and in that
+ * case the propagation happens from the source of the instruction instead
+ * of its destination.
+ */
+static bool
+propagate_bool_bit_size(fs_inst *inst, bool propagate_from_source)
+{
+   assert(!propagate_from_source || inst->opcode == BRW_OPCODE_MOV);
+
+   bool progress = false;
+
+   const unsigned bit_size = 8 * (propagate_from_source ?
+  type_sz(inst->src[0].type) : type_sz(inst->dst.type));
+
+   /* Look for any follow-up instructions that source from the boolean
+* result of the producer instruction and rewrite them to use the correct
+* bit-size.
+*/
+   foreach_inst_in_block_starting_from(fs_inst, fixup_inst, inst) {
+  if (!inst_supports_boolean(fixup_inst))
+ continue;
+
+  /* For MOV instructions we can always rewrite the boolean source
+   * if the instruction reads the same region we produced in the
+   * 32-bit conversion.
+   */
+  if (fixup_inst->opcode

[Mesa-dev] [PATCH 1/3] intel/compiler: make brw_reg_type_from_bit_size usable from other places

2018-05-15 Thread Iago Toral Quiroga

This was private to brw_fs_nir.cpp but we are going to need it soon in
brw_fs.cpp, so move it there and make it available to other files as we
do for other utility functions.
---
 src/intel/compiler/brw_fs.cpp | 59 +++
 src/intel/compiler/brw_fs.h   |  4 +++
 src/intel/compiler/brw_fs_nir.cpp | 59 ---
 3 files changed, 63 insertions(+), 59 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index dcba4ee8068..458c534c9c7 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -900,6 +900,65 @@ fs_inst::size_read(int arg) const
return 0;
 }
 
+/*
+ * Returns a type based on a reference_type (word, float, half-float) and a
+ * given bit_size.
+ *
+ * Reference BRW_REGISTER_TYPE are HF,F,DF,W,D,UW,UD.
+ *
+ * @FIXME: 64-bit return types are always DF on integer types to maintain
+ * compability with uses of DF previously to the introduction of int64
+ * support.
+ */
+brw_reg_type
+brw_reg_type_from_bit_size(const unsigned bit_size,
+   const brw_reg_type reference_type)
+{
+   switch(reference_type) {
+   case BRW_REGISTER_TYPE_HF:
+   case BRW_REGISTER_TYPE_F:
+   case BRW_REGISTER_TYPE_DF:
+  switch(bit_size) {
+  case 16:
+ return BRW_REGISTER_TYPE_HF;
+  case 32:
+ return BRW_REGISTER_TYPE_F;
+  case 64:
+ return BRW_REGISTER_TYPE_DF;
+  default:
+ unreachable("Invalid bit size");
+  }
+   case BRW_REGISTER_TYPE_W:
+   case BRW_REGISTER_TYPE_D:
+   case BRW_REGISTER_TYPE_Q:
+  switch(bit_size) {
+  case 16:
+ return BRW_REGISTER_TYPE_W;
+  case 32:
+ return BRW_REGISTER_TYPE_D;
+  case 64:
+ return BRW_REGISTER_TYPE_Q;
+  default:
+ unreachable("Invalid bit size");
+  }
+   case BRW_REGISTER_TYPE_UW:
+   case BRW_REGISTER_TYPE_UD:
+   case BRW_REGISTER_TYPE_UQ:
+  switch(bit_size) {
+  case 16:
+ return BRW_REGISTER_TYPE_UW;
+  case 32:
+ return BRW_REGISTER_TYPE_UD;
+  case 64:
+ return BRW_REGISTER_TYPE_UQ;
+  default:
+ unreachable("Invalid bit size");
+  }
+   default:
+  unreachable("Unknown type");
+   }
+}
+
 namespace {
/* Return the subset of flag registers that an instruction could
 * potentially read or write based on the execution controls and flag
diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index e384db809dc..c4d5ebee239 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -525,4 +525,8 @@ fs_reg setup_imm_df(const brw::fs_builder ,
 enum brw_barycentric_mode brw_barycentric_mode(enum glsl_interp_mode mode,
nir_intrinsic_op op);
 
+brw_reg_type
+brw_reg_type_from_bit_size(const unsigned bit_size,
+   const brw_reg_type reference_type);
+
 #endif /* BRW_FS_H */
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 58ddc456bae..490fd4a0461 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -260,65 +260,6 @@ fs_visitor::nir_emit_system_values()
}
 }
 
-/*
- * Returns a type based on a reference_type (word, float, half-float) and a
- * given bit_size.
- *
- * Reference BRW_REGISTER_TYPE are HF,F,DF,W,D,UW,UD.
- *
- * @FIXME: 64-bit return types are always DF on integer types to maintain
- * compability with uses of DF previously to the introduction of int64
- * support.
- */
-static brw_reg_type
-brw_reg_type_from_bit_size(const unsigned bit_size,
-   const brw_reg_type reference_type)
-{
-   switch(reference_type) {
-   case BRW_REGISTER_TYPE_HF:
-   case BRW_REGISTER_TYPE_F:
-   case BRW_REGISTER_TYPE_DF:
-  switch(bit_size) {
-  case 16:
- return BRW_REGISTER_TYPE_HF;
-  case 32:
- return BRW_REGISTER_TYPE_F;
-  case 64:
- return BRW_REGISTER_TYPE_DF;
-  default:
- unreachable("Invalid bit size");
-  }
-   case BRW_REGISTER_TYPE_W:
-   case BRW_REGISTER_TYPE_D:
-   case BRW_REGISTER_TYPE_Q:
-  switch(bit_size) {
-  case 16:
- return BRW_REGISTER_TYPE_W;
-  case 32:
- return BRW_REGISTER_TYPE_D;
-  case 64:
- return BRW_REGISTER_TYPE_Q;
-  default:
- unreachable("Invalid bit size");
-  }
-   case BRW_REGISTER_TYPE_UW:
-   case BRW_REGISTER_TYPE_UD:
-   case BRW_REGISTER_TYPE_UQ:
-  switch(bit_size) {
-  case 16:
- return BRW_REGISTER_TYPE_UW;
-  case 32:
- return BRW_REGISTER_TYPE_UD;
-  case 64:
- return BRW_REGISTER_TYPE_UQ;
-  default:
- unreachable("Invalid bit size");
-  }
-   default:
-  unreachable("Unknown type");
-   }
-}
-
 void
 fs_visitor::nir_emit_impl(nir_function_impl *impl)
 {
-- 
2.14.1


[Mesa-dev] [PATCH 2/4] i965/compiler: handle conversion to smaller type in the lowering pass for that

2018-05-04 Thread Iago Toral Quiroga

This rolls back the revert of this same patch introduced in
commit 7b9c15628aae8729118b648f5f473e6ac926b99b.
---
 src/intel/compiler/brw_fs_lower_conversions.cpp |  5 -
 src/intel/compiler/brw_fs_nir.cpp   | 14 +++---
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp 
b/src/intel/compiler/brw_fs_lower_conversions.cpp
index 663c9674c4..f95b39d3e8 100644
--- a/src/intel/compiler/brw_fs_lower_conversions.cpp
+++ b/src/intel/compiler/brw_fs_lower_conversions.cpp
@@ -54,7 +54,7 @@ fs_visitor::lower_conversions()
   bool saturate = inst->saturate;
 
   if (supports_type_conversion(inst)) {
- if (get_exec_type_size(inst) == 8 && type_sz(inst->dst.type) < 8) {
+ if (type_sz(inst->dst.type) < get_exec_type_size(inst)) {
 /* From the Broadwell PRM, 3D Media GPGPU, "Double Precision Float 
to
  * Single Precision Float":
  *
@@ -64,6 +64,9 @@ fs_visitor::lower_conversions()
  * So we need to allocate a temporary that's two registers, and 
then do
  * a strided MOV to get the lower DWord of every Qword that has the
  * result.
+ *
+ * This restriction applies, in general, whenever we convert to
+ * a type with a smaller bit-size.
  */
 fs_reg temp = ibld.vgrf(get_exec_type(inst));
 fs_reg strided_temp = subscript(temp, dst.type, 0);
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index c7f7bc21b8..1ce89520bf 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -755,19 +755,9 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
*/
 
case nir_op_f2f16_undef:
-   case nir_op_i2i16:
-   case nir_op_u2u16: {
-  /* TODO: Fixing aligment rules for conversions from 32-bits to
-   * 16-bit types should be moved to lower_conversions
-   */
-  fs_reg tmp = bld.vgrf(op[0].type, 1);
-  tmp = subscript(tmp, result.type, 0);
-  inst = bld.MOV(tmp, op[0]);
-  inst->saturate = instr->dest.saturate;
-  inst = bld.MOV(result, tmp);
+  inst = bld.MOV(result, op[0]);
   inst->saturate = instr->dest.saturate;
   break;
-   }
 
case nir_op_f2f64:
case nir_op_f2i64:
@@ -807,6 +797,8 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
case nir_op_f2u16:
case nir_op_i2i32:
case nir_op_u2u32:
+   case nir_op_i2i16:
+   case nir_op_u2u16:
case nir_op_i2f16:
case nir_op_u2f16:
   inst = bld.MOV(result, op[0]);
-- 
2.14.1


[Mesa-dev] [PATCH 4/4] anv/device: expose shaderInt16 support in gen8+

2018-05-04 Thread Iago Toral Quiroga

This rolls back the revert of this patch introduced with
commit 7cf284f18e6774c810ed6db17b98e597bf96f8a5.
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 0563eae5c1..fd516fb846 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -759,7 +759,7 @@ void anv_GetPhysicalDeviceFeatures(
   
pdevice->info.has_64bit_types,
   .shaderInt64  = pdevice->info.gen >= 8 &&
   
pdevice->info.has_64bit_types,
-  .shaderInt16  = false,
+  .shaderInt16  = pdevice->info.gen >= 8,
   .shaderResourceMinLod = false,
   .variableMultisampleRate  = true,
   .inheritedQueries = true,
-- 
2.14.1


[Mesa-dev] [PATCH 1/4] intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms

2018-05-04 Thread Iago Toral Quiroga

These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit.  We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.
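
The alignment requirement can be pictured with a small CPU-side sketch: each
16-bit source element is first copied to the start of its own 64-bit slot, and
only then converted. This is only an analogy for the strided MOV into a
temporary that the driver emits, and the function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy every 16-bit source element into the first two bytes of its own
// 64-bit slot, so that each source element starts on a 64-bit boundary.
static std::vector<uint64_t>
spread_to_qword_slots(const std::vector<int16_t> &src)
{
   std::vector<uint64_t> slots(src.size(), 0);
   for (size_t i = 0; i < src.size(); i++)
      std::memcpy(&slots[i], &src[i], sizeof(int16_t));
   return slots;
}

// Convert by reading the 16-bit value back from the start of each slot
// and sign-extending it to 64-bit.
static std::vector<int64_t>
convert_i16_to_i64(const std::vector<int16_t> &src)
{
   const std::vector<uint64_t> slots = spread_to_qword_slots(src);
   std::vector<int64_t> dst(src.size());
   for (size_t i = 0; i < slots.size(); i++) {
      int16_t v;
      std::memcpy(&v, &slots[i], sizeof(v));
      dst[i] = v;  // implicit int16_t -> int64_t conversion sign-extends
   }
   return dst;
}
```

The same layout works for any source narrower than 64-bit, which is the
generalization this patch makes.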

Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106389
---
 src/intel/compiler/brw_fs_nir.cpp | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index f9fde145a1..c7f7bc21b8 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -785,12 +785,12 @@ fs_visitor::nir_emit_alu(const fs_builder , 
nir_alu_instr *instr)
*the same qword.
* (...)"
*
-   * This means that 32-bit to 64-bit conversions need to have the 32-bit
-   * data elements aligned to 64-bit. This restriction does not apply to
-   * BDW and later.
+   * This means that conversions from bit-sizes smaller than 64-bit to
+   * 64-bit need to have the source data elements aligned to 64-bit.
+   * This restriction does not apply to BDW and later.
*/
   if (nir_dest_bit_size(instr->dest.dest) == 64 &&
-  nir_src_bit_size(instr->src[0].src) == 32 &&
+  nir_src_bit_size(instr->src[0].src) < 64 &&
   (devinfo->is_cherryview || gen_device_info_is_9lp(devinfo))) {
  fs_reg tmp = bld.vgrf(result.type, 1);
  tmp = subscript(tmp, op[0].type, 0);
-- 
2.14.1


[Mesa-dev] [PATCH 3/4] intel/compiler: Fix lower_conversions for 8-bit types.

2018-05-04 Thread Iago Toral Quiroga

From: Jose Maria Casanova Crespo 

For 8-bit types the execution type is word. A byte raw MOV has a 16-bit
execution type and an 8-bit destination, and it shouldn't be considered
a conversion case, so there is no need to change the alignment or to run
lower_conversions on these instructions.

Fixes a regression in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem arises because there is already a case
in the driver that injects Byte instructions like this:

mov(8)  g127<1>UB   g2<32,8,4>UB

And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.
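
Put together, the check amounts to something like the following standalone
sketch. SimpleInst is a hypothetical, single-source simplification of fs_inst,
and exec_type_size() only models the byte-to-word promotion described above:

```cpp
// Hypothetical, simplified description of a single-source instruction.
struct SimpleInst {
   bool is_mov;
   unsigned dst_type_size;   // in bytes
   unsigned src_type_size;   // in bytes
   bool same_src_dst_type;
   bool saturate;
   bool src_negate;
   bool src_abs;
};

// Byte sources execute as words; everything else executes at its own size.
static inline unsigned exec_type_size(const SimpleInst &inst)
{
   return inst.src_type_size == 1 ? 2 : inst.src_type_size;
}

// A raw move: same source and destination type, no source modifiers and
// no saturation. Here restricted to byte destinations.
static inline bool is_byte_raw_mov(const SimpleInst &inst)
{
   return inst.dst_type_size == 1 &&
          inst.is_mov &&
          inst.same_src_dst_type &&
          !inst.saturate &&
          !inst.src_negate &&
          !inst.src_abs;
}

// The destination is narrower than the execution type, and the instruction
// is not one of the byte raw moves that must be left alone.
static inline bool needs_conversion_lowering(const SimpleInst &inst)
{
   return inst.dst_type_size < exec_type_size(inst) &&
          !is_byte_raw_mov(inst);
}
```

In this model a plain UB to UB MOV is left untouched, while a D to W MOV or a
byte MOV with a negate modifier still goes through the lowering.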

v2: (Jason Ekstrand)
   - Simplify is_byte_raw_mov, include reference to PRM and do not
   consider B <-> UB conversions as raw movs.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
---
 src/intel/compiler/brw_fs_lower_conversions.cpp | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_lower_conversions.cpp 
b/src/intel/compiler/brw_fs_lower_conversions.cpp
index f95b39d3e8..f6c936cf21 100644
--- a/src/intel/compiler/brw_fs_lower_conversions.cpp
+++ b/src/intel/compiler/brw_fs_lower_conversions.cpp
@@ -43,6 +43,24 @@ supports_type_conversion(const fs_inst *inst) {
}
 }
 
+/* From the SKL PRM Vol 2a, "Move":
+ *
+ *"A mov with the same source and destination type, no source modifier,
+ *and no saturation is a raw move. A packed byte destination region (B
+ *or UB type with HorzStride == 1 and ExecSize > 1) can only be written
+ *using raw move."
+ */
+static bool
+is_byte_raw_mov (const fs_inst *inst)
+{
+   return type_sz(inst->dst.type) == 1 &&
+  inst->opcode == BRW_OPCODE_MOV &&
+  inst->src[0].type == inst->dst.type &&
+  !inst->saturate &&
+  !inst->src[0].negate &&
+  !inst->src[0].abs;
+}
+
 bool
 fs_visitor::lower_conversions()
 {
@@ -54,7 +72,8 @@ fs_visitor::lower_conversions()
   bool saturate = inst->saturate;
 
   if (supports_type_conversion(inst)) {
- if (type_sz(inst->dst.type) < get_exec_type_size(inst)) {
+ if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
+ !is_byte_raw_mov(inst)) {
 /* From the Broadwell PRM, 3D Media GPGPU, "Double Precision Float 
to
  * Single Precision Float":
  *
-- 
2.14.1


[Mesa-dev] [PATCH 0/4] Intel: fixes to regressions caused by shaderInt16

2018-05-04 Thread Iago Toral Quiroga

This series fixes these two bug reports:
https://bugs.freedesktop.org/show_bug.cgi?id=106393
https://bugs.freedesktop.org/show_bug.cgi?id=106389

Caused by the shaderInt16 series we landed yesterday. Details follow:

Bug #106389 was triggered on BSW-like platforms, which apparently are not
exercised during developer Jenkins runs due to excessive run times. The problem
manifested as some test failures in Intel CI. Although the same tests would
not fail for me, at least on a BSW NUC I have around, I could see assembly
validation errors being printed when INTEL_DEBUG was used. Thankfully, this is
a trivial fix: all conversions to 64b need to be aligned to 64b on these
platforms, and we were already doing this for 32b to 64b conversions, so we
just had to generalize the code.

Bug #106393: I am not sure why we were not seeing this in our Jenkins runs,
but we can certainly reproduce it locally. We had, in fact, seen the problem
before while we were experimenting with int8 support in the compiler since it is
specific to Byte instructions (MOVs in particular). It seems that there is at
least one case in the driver where we emit a Byte MOV instruction already and
that was causing the problem. We already had a fix for this that we had been
discussing with Jason for some time, we just were not aware that we already had
this situation in the driver.

Patch 1 should fix the first bug. I verified that the CTS tests mentioned in
the bug report no longer show validation errors in the assembly output with
INTEL_DEBUG=cs. The tests were already passing for me without the fix though.

Patch 2 is a rollback of the revert of that same commit.

Patch 3 fixes the second bug, which is introduced in Patch 2. Patch 3 was
created on top of patch 2 and it is a very specific fix, so at least for
review purposes we think it makes sense to keep both patches separate. However,
since patch 2 introduces a regression, we think we want to squash them
together before pushing.

Patch 4 re-enables shaderInt16.

Mark, I ran these patches through Jenkins and didn't get any regressions but
since I didn't get them when I was testing the original shaderInt16 I'd ask if
you can verify the fixes on your end (especially on BSW-like platforms).

Iago

Iago Toral Quiroga (3):
  intel/compiler: handle 16-bit to 64-bit conversions in BSW platforms
  i965/compiler: handle conversion to smaller type in the lowering pass
for that
  anv/device: expose shaderInt16 support in gen8+

Jose Maria Casanova Crespo (1):
  intel/compiler: Fix lower_conversions for 8-bit types.

 src/intel/compiler/brw_fs_lower_conversions.cpp | 24 +++-
 src/intel/compiler/brw_fs_nir.cpp   | 22 +++---
 src/intel/vulkan/anv_device.c   |  2 +-
 3 files changed, 31 insertions(+), 17 deletions(-)

-- 
2.14.1


Re: [Mesa-dev] [PATCH v2 00/18] anv: add shaderInt16 support

2018-05-04 Thread Iago Toral

On Thu, 2018-05-03 at 11:44 -0700, Clayton Craft wrote:
> Quoting Iago Toral Quiroga (2018-04-30 07:18:08)
> > This version addresses the feedback received to v1, which includes
> > moving the
> > bit-size lowering pass from intel to core NIR (patch 8) and a
> > separate patch
> > to add Intel's specific configuration for int16 (patch 9), and then
> > it also
> > adds a few things that were missing in the first version, namely, a
> > fix for
> > 16-bit comparisons to emit 32-bit booleans (patch 10 -a patch to
> > optimize the
> > resulting code will come later-) and 16-bit pack/unpack which is
> > needed for
> > 16-bit bitcasts (patches 11-15).
> > 
> > Patches 6-15 need review, the rest (1-5 and 16-18), have already
> > been reviewed
> > and don't have changes.
> > 
> > A branch with the series is available for testing in the
> > 'itoral/shaderInt16ForReview_v2' branch of the Igalia mesa
> > repository at
> > github:
> > 
> > https://github.com/Igalia/mesa/tree/itoral/shaderInt16ForReview_v2
> > 
> > Iago Toral Quiroga (16):
> >   intel/compiler: fix isign for 16-bit integers
> >   i965/compiler: handle conversion to smaller type in the lowering
> > pass
> > for that
> >   intel/compiler: implement conversion between float/int 16-bit
> > types
> >   intel/compiler: implement conversions from 16-bit int/float to
> > bool
> >   intel/compiler: fix brw_imm_w for negative 16-bit integers
> >   compiler/nir: add a lowering pass to convert the bit size of ALU
> > operations
> >   intel/compiler: lower some 16-bit integer operations to 32-bit
> >   intel/compiler: fix 16-bit comparisons
> >   nir: add opcodes for 16-bit packing and unpacking
> >   nir/lower_64bit_packing: extend the pass to handle packing from /
> > to
> > 16-bit.
> >   compiler/lower_64bit_packing: rename the pass to be more generic
> >   compiler/spirv: implement 16-bit bitcasts
> >   intel/compiler: implement 16-bit pack/unpack opcodes
> >   compiler/spirv: add implementation to check for
> > SpvCapabilityInt16
> > support
> >   anv/pipeline: support SpvCapabilityInt16 in gen8+
> >   anv/device: expose shaderInt16 support in gen8+
> > 
> > Jose Maria Casanova Crespo (2):
> >   intel/compiler: implement nir_instr_type_load_const for 16-bit
> > constants
> >   intel/compiler: fix brw_negate_immediate for 16-bit types
> > 
> >  src/amd/vulkan/radv_shader.c   |   2 +-
> >  src/compiler/Makefile.sources  |   3 +-
> >  src/compiler/nir/meson.build   |   3 +-
> >  src/compiler/nir/nir.h |   8 +-
> >  src/compiler/nir/nir_lower_bit_size.c  | 127
> > +
> >  ...r_lower_64bit_packing.c => nir_lower_packing.c} |  70
> > ++--
> >  src/compiler/nir/nir_opcodes.py|  19 +++
> >  src/compiler/shader_info.h |   1 +
> >  src/compiler/spirv/spirv_to_nir.c  |   4 +-
> >  src/compiler/spirv/vtn_alu.c   |  31 +++--
> >  src/intel/compiler/brw_fs_lower_conversions.cpp|   5 +-
> >  src/intel/compiler/brw_fs_nir.cpp  | 100
> > +++-
> >  src/intel/compiler/brw_nir.c   |  23 +++-
> >  src/intel/compiler/brw_reg.h   |   2 +-
> >  src/intel/compiler/brw_shader.cpp  |  11 +-
> >  src/intel/vulkan/anv_device.c  |   2 +-
> >  src/intel/vulkan/anv_pipeline.c|   1 +
> >  src/mesa/state_tracker/st_glsl_to_nir.cpp  |   2 +-
> >  18 files changed, 356 insertions(+), 58 deletions(-)
> >  create mode 100644 src/compiler/nir/nir_lower_bit_size.c
> >  rename src/compiler/nir/{nir_lower_64bit_packing.c =>
> > nir_lower_packing.c} (56%)
> > 
> > -- 
> > 2.14.1
> > 
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
> Since this patch series was merged, we are seeing a number of
> failures in CI on
> BSW, GLK, and BXT platforms:
> 
> dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
> dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
> dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64
> 
> 
> fdo bug: #106389
> 
> 
> Output from tests is:
> 
> dEQP-VK.spirv_as

Re: [Mesa-dev] [PATCH v2 7.5/18] intel/compiler: support negate and abs of half float immediates

2018-05-03 Thread Iago Toral

On Thu, 2018-05-03 at 08:39 +0200, Iago Toral wrote:
> On Wed, 2018-05-02 at 17:57 -0700, Jason Ekstrand wrote:
> > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net>
> > 
> > Have I reviewed everything?  Can we land shaderInt16 now?
> 
> Yes, all patches are reviewed now, thanks Jason.
> I'll send the final set of patches to Jenkins one last time and push
> them today  if we don't see any unexpected results.

I have just pushed the patches to master.
Iago
> Iago
> > --Jason
> > 
> > On Wed, May 2, 2018 at 5:18 PM, Jose Maria Casanova Crespo  > o...@igalia.com> wrote:
> > > ---
> > > 
> > >  src/intel/compiler/brw_shader.cpp | 6 --
> > > 
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > 
> > > 
> > > diff --git a/src/intel/compiler/brw_shader.cpp
> > > b/src/intel/compiler/brw_shader.cpp
> > > 
> > > index 284c2e8233c..537defd05d9 100644
> > > 
> > > --- a/src/intel/compiler/brw_shader.cpp
> > > 
> > > +++ b/src/intel/compiler/brw_shader.cpp
> > > 
> > > @@ -605,7 +605,8 @@ brw_negate_immediate(enum brw_reg_type type,
> > > struct brw_reg *reg)
> > > 
> > > case BRW_REGISTER_TYPE_V:
> > > 
> > >assert(!"unimplemented: negate UV/V immediate");
> > > 
> > > case BRW_REGISTER_TYPE_HF:
> > > 
> > > -  assert(!"unimplemented: negate HF immediate");
> > > 
> > > +  reg->ud ^= 0x80008000;
> > > 
> > > +  return true;
> > > 
> > > case BRW_REGISTER_TYPE_NF:
> > > 
> > >unreachable("no NF immediates");
> > > 
> > > }
> > > 
> > > @@ -651,7 +652,8 @@ brw_abs_immediate(enum brw_reg_type type,
> > > struct brw_reg *reg)
> > > 
> > > case BRW_REGISTER_TYPE_V:
> > > 
> > >assert(!"unimplemented: abs V immediate");
> > > 
> > > case BRW_REGISTER_TYPE_HF:
> > > 
> > > -  assert(!"unimplemented: abs HF immediate");
> > > 
> > > +  reg->ud &= ~0x80008000;
> > > 
> > > +  return true;
> > > 
> > > case BRW_REGISTER_TYPE_NF:
> > > 
> > >unreachable("no NF immediates");
> > > 
> > > }
> > > 
> > > -- 
> > > 
> > > 2.14.3
> > > 
> > > 
> > > ___

Re: [Mesa-dev] [PATCH v2 7.5/18] intel/compiler: support negate and abs of half float immediates

2018-05-03 Thread Iago Toral

On Wed, 2018-05-02 at 17:57 -0700, Jason Ekstrand wrote:
> Reviewed-by: Jason Ekstrand 
> 
> Have I reviewed everything?  Can we land shaderInt16 now?

Yes, all patches are reviewed now, thanks Jason. I'll send the final set
of patches to Jenkins one last time and push them today if we don't
see any unexpected results.
Iago
> --Jason
> 
> On Wed, May 2, 2018 at 5:18 PM, Jose Maria Casanova Crespo  a...@igalia.com> wrote:
> > ---
> > 
> >  src/intel/compiler/brw_shader.cpp | 6 --
> > 
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_shader.cpp
> > b/src/intel/compiler/brw_shader.cpp
> > 
> > index 284c2e8233c..537defd05d9 100644
> > 
> > --- a/src/intel/compiler/brw_shader.cpp
> > 
> > +++ b/src/intel/compiler/brw_shader.cpp
> > 
> > @@ -605,7 +605,8 @@ brw_negate_immediate(enum brw_reg_type type,
> > struct brw_reg *reg)
> > 
> > case BRW_REGISTER_TYPE_V:
> > 
> >assert(!"unimplemented: negate UV/V immediate");
> > 
> > case BRW_REGISTER_TYPE_HF:
> > 
> > -  assert(!"unimplemented: negate HF immediate");
> > 
> > +  reg->ud ^= 0x80008000;
> > 
> > +  return true;
> > 
> > case BRW_REGISTER_TYPE_NF:
> > 
> >unreachable("no NF immediates");
> > 
> > }
> > 
> > @@ -651,7 +652,8 @@ brw_abs_immediate(enum brw_reg_type type,
> > struct brw_reg *reg)
> > 
> > case BRW_REGISTER_TYPE_V:
> > 
> >assert(!"unimplemented: abs V immediate");
> > 
> > case BRW_REGISTER_TYPE_HF:
> > 
> > -  assert(!"unimplemented: abs HF immediate");
> > 
> > +  reg->ud &= ~0x80008000;
> > 
> > +  return true;
> > 
> > case BRW_REGISTER_TYPE_NF:
> > 
> >unreachable("no NF immediates");
> > 
> > }
> > 
> > -- 
> > 
> > 2.14.3
> > 
> > 

[Mesa-dev] [PATCH v3] intel/compiler: fix 16-bit comparisons

2018-05-02 Thread Iago Toral Quiroga

NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.

In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.

v2 (Jason): use the type of the source for the temporary and use
brw_reg_type_from_bit_size for the conversion to 32-bit.
---
 src/intel/compiler/brw_fs_nir.cpp | 42 +++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index b9d8ade4cf..f763dfa4f2 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1017,9 +1017,11 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
case nir_op_feq:
case nir_op_fne: {
   fs_reg dest = result;
-  if (nir_src_bit_size(instr->src[0].src) > 32) {
- dest = bld.vgrf(BRW_REGISTER_TYPE_DF, 1);
-  }
+
+  const uint32_t bit_size =  nir_src_bit_size(instr->src[0].src);
+  if (bit_size != 32)
+ dest = bld.vgrf(op[0].type, 1);
+
   brw_conditional_mod cond;
   switch (instr->op) {
   case nir_op_flt:
@@ -1037,9 +1039,21 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   default:
  unreachable("bad opcode");
   }
+
   bld.CMP(dest, op[0], op[1], cond);
-  if (nir_src_bit_size(instr->src[0].src) > 32) {
+
+  if (bit_size > 32) {
  bld.MOV(result, subscript(dest, BRW_REGISTER_TYPE_UD, 0));
+  } else if(bit_size < 32) {
+ /* When we convert the result to 32-bit we need to be careful and do
+  * it as a signed conversion to get sign extension (for 32-bit true)
+  */
+ const brw_reg_type dst_type =
+brw_reg_type_from_bit_size(32, BRW_REGISTER_TYPE_D);
+ const brw_reg_type src_type =
+brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_D);
+
+ bld.MOV(retype(result, dst_type), retype(dest, src_type));
   }
   break;
}
@@ -1051,9 +1065,10 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
case nir_op_ieq:
case nir_op_ine: {
   fs_reg dest = result;
-  if (nir_src_bit_size(instr->src[0].src) > 32) {
- dest = bld.vgrf(BRW_REGISTER_TYPE_UQ, 1);
-  }
+
+  const uint32_t bit_size = nir_src_bit_size(instr->src[0].src);
+  if (bit_size != 32)
+ dest = bld.vgrf(op[0].type, 1);
 
   brw_conditional_mod cond;
   switch (instr->op) {
@@ -1075,8 +1090,19 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
  unreachable("bad opcode");
   }
   bld.CMP(dest, op[0], op[1], cond);
-  if (nir_src_bit_size(instr->src[0].src) > 32) {
+
+  if (bit_size > 32) {
  bld.MOV(result, subscript(dest, BRW_REGISTER_TYPE_UD, 0));
+  } else if (bit_size < 32) {
+ /* When we convert the result to 32-bit we need to be careful and do
+  * it as a signed conversion to get sign extension (for 32-bit true)
+  */
+ const brw_reg_type dst_type =
+brw_reg_type_from_bit_size(32, BRW_REGISTER_TYPE_D);
+ const brw_reg_type src_type =
+brw_reg_type_from_bit_size(bit_size, BRW_REGISTER_TYPE_D);
+
+ bld.MOV(retype(result, dst_type), retype(dest, src_type));
   }
   break;
}
-- 
2.14.1
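
A minimal C sketch of why the conversion in the patch above has to be
signed. It assumes that a 16-bit CMP writes 0xFFFF for true and that a
32-bit NIR boolean true is all ones; the helper names are hypothetical
and are not part of the i965 backend.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit CMP result: all ones for true. */
static inline uint16_t cmp16_result(int condition)
{
   return condition ? UINT16_C(0xFFFF) : UINT16_C(0x0000);
}

/* Signed widening (W -> D): the sign bit is copied into the upper word,
 * so a 16-bit true stays "all ones" at 32 bits. Written with explicit
 * bit operations to avoid relying on narrowing-cast behavior. */
static inline uint32_t bool16_to_bool32_signed(uint16_t b16)
{
   return (b16 & 0x8000u) ? (0xFFFF0000u | b16) : (uint32_t)b16;
}

/* Unsigned widening (UW -> UD), shown for contrast: the upper word is
 * zero, so the result is not the all-ones 32-bit true. */
static inline uint32_t bool16_to_bool32_unsigned(uint16_t b16)
{
   return (uint32_t)b16;
}
```

With these helpers a true comparison widens to 0xFFFFFFFF through the
signed path and to 0x0000FFFF through the unsigned one, which is the
difference the comment in the patch is warning about.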


Re: [Mesa-dev] [PATCH v2 10/18] intel/compiler: fix 16-bit comparisons

2018-05-02 Thread Iago Toral

On Mon, 2018-04-30 at 14:43 -0700, Jason Ekstrand wrote:
> On Mon, Apr 30, 2018 at 7:18 AM, Iago Toral Quiroga <ito...@igalia.co
> m> wrote:
> > NIR assumes that booleans are always 32-bit, but Intel hardware
> > produces
> > 
> > 16-bit booleans for 16-bit comparisons. This means that we need to
> > convert
> > 
> > the 16-bit result to 32-bit.
> > 
> > 
> > 
> > In the future we want to add an optimization pass to clean this up
> > and
> > 
> > hopefully remove the conversions.
> > 
> > ---
> > 
> >  src/intel/compiler/brw_fs_nir.cpp | 34
> > --
> > 
> >  1 file changed, 28 insertions(+), 6 deletions(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > index b9d8ade4cf..d590a00385 100644
> > 
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > @@ -1017,9 +1017,13 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > 
> > case nir_op_feq:
> > 
> > case nir_op_fne: {
> > 
> >fs_reg dest = result;
> > 
> > -  if (nir_src_bit_size(instr->src[0].src) > 32) {
> > 
> > +
> > 
> > +  const uint32_t bit_size =  nir_src_bit_size(instr-
> > >src[0].src);
> > 
> > +  if (bit_size > 32)
> > 
> >   dest = bld.vgrf(BRW_REGISTER_TYPE_DF, 1);
> > 
> > -  }
> > 
> > +  else if (bit_size < 32)
> > 
> > + dest = bld.vgrf(BRW_REGISTER_TYPE_HF, 1);
> 
> This is going to break for 8-bit.  Maybe add an assert?  For that
> matter, why not just use the type of src0 for the destination type? 
> How do 8-bit comparisons work?  Do they return a 16-bit value?

According to the PRM a Byte CMP writes 0xFF for true, so they return an
8-bit value. I think using the type of the source operand here should
work.
>  
> > +
> > 
> >brw_conditional_mod cond;
> > 
> >switch (instr->op) {
> > 
> >case nir_op_flt:
> > 
> > @@ -1037,9 +1041,17 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > 
> >default:
> > 
> >   unreachable("bad opcode");
> > 
> >}
> > 
> > +
> > 
> >bld.CMP(dest, op[0], op[1], cond);
> > 
> > -  if (nir_src_bit_size(instr->src[0].src) > 32) {
> > 
> > +
> > 
> > +  if (bit_size > 32) {
> > 
> >   bld.MOV(result, subscript(dest, BRW_REGISTER_TYPE_UD,
> > 0));
> > 
> > +  } else if(bit_size < 32) {
> > 
> > + /* When we convert the result to 32-bit we need to be
> > careful and do
> > 
> > +  * it as a signed conversion to get sign extension (for
> > 32-bit true)
> > 
> > +  */
> > 
> > + bld.MOV(retype(result, BRW_REGISTER_TYPE_D),
> > 
> > + retype(dest, BRW_REGISTER_TYPE_W));
> 
> Maybe better to use brw_reg_type_from_bit_size so 8-bit gets
> automatically handled?

Ah yes, that's a good idea, thanks!
Iago
> >}
> > 
> >break;
> > 
> > }
> > 
> > @@ -1051,9 +1063,12 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > 
> > case nir_op_ieq:
> > 
> > case nir_op_ine: {
> > 
> >fs_reg dest = result;
> > 
> > -  if (nir_src_bit_size(instr->src[0].src) > 32) {
> > 
> > +
> > 
> > +  const uint32_t bit_size = nir_src_bit_size(instr-
> > >src[0].src);
> > 
> > +  if (bit_size > 32)
> > 
> >   dest = bld.vgrf(BRW_REGISTER_TYPE_UQ, 1);
> > 
> > -  }
> > 
> > +  else if (bit_size < 32)
> > 
> > + dest = bld.vgrf(BRW_REGISTER_TYPE_W, 1);
> > 
> > 
> > 
> >brw_conditional_mod cond;
> > 
> >switch (instr->op) {
> > 
> > @@ -1075,8 +1090,15 @@ fs_visitor::nir_emit_alu(const fs_builder
> > > &bld, nir_alu_instr *instr)
> > 
> >   unreachable("bad opcode");
> > 
> >}
> > 
> >bld.CMP(dest, op[0], op[1], cond);
> > 
> > -  if (nir_src_bit_size(instr->src[0].src) > 32) {
> > 
> > +
> > 
> > +  if (bit_size > 32) {
> > 
> >   bld.MOV(result, subscript(dest, BRW_REGISTER_TYPE_UD,
> > 0));
> > 
> > +  } else if (bit_size < 32) {
> > 
> > + /* When we convert the result to 32-bit we need to be
> > careful and do
> > 
> > +  * it as a signed conversion to get sign extension (for
> > 32-bit true)
> > 
> > +  */
> > 
> > + bld.MOV(retype(result, BRW_REGISTER_TYPE_D),
> > 
> > + retype(dest, BRW_REGISTER_TYPE_W));
> > 
> >}
> > 
> >break;
> > 
> > }
> > 
> > -- 
> > 
> > 2.14.1
> > 
> > 

[Mesa-dev] [PATCH v2 18/18] anv/device: expose shaderInt16 support in gen8+

2018-04-30 Thread Iago Toral Quiroga

Reviewed-by: Jason Ekstrand 
---
 src/intel/vulkan/anv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index b456d3d4c5..d123ae16ec 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -761,7 +761,7 @@ void anv_GetPhysicalDeviceFeatures(
   .shaderCullDistance   = true,
   .shaderFloat64= pdevice->info.gen >= 8,
   .shaderInt64  = pdevice->info.gen >= 8,
-  .shaderInt16  = false,
+  .shaderInt16  = pdevice->info.gen >= 8,
   .shaderResourceMinLod = false,
   .variableMultisampleRate  = false,
   .inheritedQueries = true,
-- 
2.14.1


[Mesa-dev] [PATCH v2 17/18] anv/pipeline: support SpvCapabilityInt16 in gen8+

2018-04-30 Thread Iago Toral Quiroga

Reviewed-by: Jason Ekstrand 
---
 src/intel/vulkan/anv_pipeline.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 56bea7bf0d..87788de10a 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -144,6 +144,7 @@ anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
  .multiview = true,
  .variable_pointers = true,
  .storage_16bit = device->instance->physicalDevice.info.gen >= 8,
+ .int16 = device->instance->physicalDevice.info.gen >= 8,
  .shader_viewport_index_layer = true,
  .subgroup_arithmetic = true,
  .subgroup_basic = true,
-- 
2.14.1


[Mesa-dev] [PATCH v2 14/18] compiler/spirv: implement 16-bit bitcasts

2018-04-30 Thread Iago Toral Quiroga

---
 src/compiler/spirv/vtn_alu.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/src/compiler/spirv/vtn_alu.c b/src/compiler/spirv/vtn_alu.c
index 3134849ba9..3708a9dc0c 100644
--- a/src/compiler/spirv/vtn_alu.c
+++ b/src/compiler/spirv/vtn_alu.c
@@ -251,10 +251,17 @@ vtn_handle_bitcast(struct vtn_builder *b, struct 
vtn_ssa_value *dest,
   vtn_assert(src_bit_size % dest_bit_size == 0);
   unsigned divisor = src_bit_size / dest_bit_size;
   for (unsigned comp = 0; comp < src_components; comp++) {
- vtn_assert(src_bit_size == 64);
- vtn_assert(dest_bit_size == 32);
- nir_ssa_def *split =
-nir_unpack_64_2x32(&b->nb, nir_channel(&b->nb, src, comp));
+ nir_ssa_def *split;
+ if (src_bit_size == 64) {
+assert(dest_bit_size == 32 || dest_bit_size == 16);
+split = dest_bit_size == 32 ?
+   nir_unpack_64_2x32(&b->nb, nir_channel(&b->nb, src, comp)) :
+   nir_unpack_64_4x16(&b->nb, nir_channel(&b->nb, src, comp));
+ } else {
+vtn_assert(src_bit_size == 32);
+vtn_assert(dest_bit_size == 16);
+split = nir_unpack_32_2x16(&b->nb, nir_channel(&b->nb, src, comp));
+ }
  for (unsigned i = 0; i < divisor; i++)
 dest_chan[divisor * comp + i] = nir_channel(&b->nb, split, i);
   }
@@ -263,11 +270,17 @@ vtn_handle_bitcast(struct vtn_builder *b, struct 
vtn_ssa_value *dest,
   unsigned divisor = dest_bit_size / src_bit_size;
   for (unsigned comp = 0; comp < dest_components; comp++) {
  unsigned channels = ((1 << divisor) - 1) << (comp * divisor);
- nir_ssa_def *src_chan =
-nir_channels(&b->nb, src, channels);
- vtn_assert(dest_bit_size == 64);
- vtn_assert(src_bit_size == 32);
- dest_chan[comp] = nir_pack_64_2x32(&b->nb, src_chan);
+ nir_ssa_def *src_chan = nir_channels(&b->nb, src, channels);
+ if (dest_bit_size == 64) {
+assert(src_bit_size == 32 || src_bit_size == 16);
+dest_chan[comp] = src_bit_size == 32 ?
+   nir_pack_64_2x32(&b->nb, src_chan) :
+   nir_pack_64_4x16(&b->nb, src_chan);
+ } else {
+vtn_assert(dest_bit_size == 32);
+vtn_assert(src_bit_size == 16);
+dest_chan[comp] = nir_pack_32_2x16(&b->nb, src_chan);
+ }
   }
}
dest->def = nir_vec(&b->nb, dest_chan, dest_components);
-- 
2.14.1
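
A rough C model of the channel bookkeeping in the bitcast patch above,
for the 32-bit / 16-bit case only. Plain arrays stand in for SSA vector
components, and both function names are hypothetical; this is a sketch
of the indexing, not the vtn code itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Narrowing bitcast: each 32-bit source component yields
 * divisor = 32 / 16 = 2 destination channels, low half first,
 * stored at dest[divisor * comp + i]. */
static void bitcast_32_to_16(const uint32_t *src, size_t src_components,
                             uint16_t *dest)
{
   const unsigned divisor = 32 / 16;
   for (size_t comp = 0; comp < src_components; comp++) {
      for (unsigned i = 0; i < divisor; i++)
         dest[divisor * comp + i] = (uint16_t)(src[comp] >> (16 * i));
   }
}

/* Widening bitcast: each destination component consumes two
 * consecutive 16-bit source channels, the first one in the low half. */
static void bitcast_16_to_32(const uint16_t *src, size_t dest_components,
                             uint32_t *dest)
{
   for (size_t comp = 0; comp < dest_components; comp++) {
      dest[comp] = (uint32_t)src[2 * comp] |
                   ((uint32_t)src[2 * comp + 1] << 16);
   }
}
```

Casting a two-component 32-bit vector down to 16 bits therefore gives a
four-component vector, and casting that back reproduces the original
values.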


[Mesa-dev] [PATCH v2 16/18] compiler/spirv: add implementation to check for SpvCapabilityInt16 support

2018-04-30 Thread Iago Toral Quiroga

Reviewed-by: Jason Ekstrand 
---
 src/compiler/shader_info.h| 1 +
 src/compiler/spirv/spirv_to_nir.c | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 53a0ef21f6..afc53a8840 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -44,6 +44,7 @@ struct spirv_supported_capabilities {
bool multiview;
bool variable_pointers;
bool storage_16bit;
+   bool int16;
bool shader_viewport_index_layer;
bool subgroup_arithmetic;
bool subgroup_ballot;
diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index 2a835f047e..78437428aa 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -3281,7 +3281,6 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
   case SpvCapabilityFloat16:
   case SpvCapabilityInt64Atomics:
   case SpvCapabilityAtomicStorage:
-  case SpvCapabilityInt16:
   case SpvCapabilityStorageImageMultisample:
   case SpvCapabilityInt8:
   case SpvCapabilitySparseResidency:
@@ -3297,6 +3296,9 @@ vtn_handle_preamble_instruction(struct vtn_builder *b, 
SpvOp opcode,
   case SpvCapabilityInt64:
  spv_check_supported(int64, cap);
  break;
+  case SpvCapabilityInt16:
+ spv_check_supported(int16, cap);
+ break;
 
   case SpvCapabilityAddresses:
   case SpvCapabilityKernel:
-- 
2.14.1


[Mesa-dev] [PATCH v2 12/18] nir/lower_64bit_packing: extend the pass to handle packing from / to 16-bit.

2018-04-30 Thread Iago Toral Quiroga

With 16-bit support we can now do 32-bit packing; a follow-up patch will
rename the pass to something more generic.
---
 src/compiler/nir/nir_lower_64bit_packing.c | 64 +++---
 1 file changed, 59 insertions(+), 5 deletions(-)

diff --git a/src/compiler/nir/nir_lower_64bit_packing.c 
b/src/compiler/nir/nir_lower_64bit_packing.c
index abae173ce3..dd435490e3 100644
--- a/src/compiler/nir/nir_lower_64bit_packing.c
+++ b/src/compiler/nir/nir_lower_64bit_packing.c
@@ -35,19 +35,57 @@
  */
 
 static nir_ssa_def *
-lower_pack_64(nir_builder *b, nir_ssa_def *src)
+lower_pack_64_from_32(nir_builder *b, nir_ssa_def *src)
 {
return nir_pack_64_2x32_split(b, nir_channel(b, src, 0),
 nir_channel(b, src, 1));
 }
 
 static nir_ssa_def *
-lower_unpack_64(nir_builder *b, nir_ssa_def *src)
+lower_unpack_64_to_32(nir_builder *b, nir_ssa_def *src)
 {
return nir_vec2(b, nir_unpack_64_2x32_split_x(b, src),
   nir_unpack_64_2x32_split_y(b, src));
 }
 
+static nir_ssa_def *
+lower_pack_32_from_16(nir_builder *b, nir_ssa_def *src)
+{
+   return nir_pack_32_2x16_split(b, nir_channel(b, src, 0),
+nir_channel(b, src, 1));
+}
+
+static nir_ssa_def *
+lower_unpack_32_to_16(nir_builder *b, nir_ssa_def *src)
+{
+   return nir_vec2(b, nir_unpack_32_2x16_split_x(b, src),
+  nir_unpack_32_2x16_split_y(b, src));
+}
+
+static nir_ssa_def *
+lower_pack_64_from_16(nir_builder *b, nir_ssa_def *src)
+{
+   nir_ssa_def *xy = nir_pack_32_2x16_split(b, nir_channel(b, src, 0),
+   nir_channel(b, src, 1));
+
+   nir_ssa_def *zw = nir_pack_32_2x16_split(b, nir_channel(b, src, 2),
+   nir_channel(b, src, 3));
+
+   return nir_pack_64_2x32_split(b, xy, zw);
+}
+
+static nir_ssa_def *
+lower_unpack_64_to_16(nir_builder *b, nir_ssa_def *src)
+{
+   nir_ssa_def *xy = nir_unpack_64_2x32_split_x(b, src);
+   nir_ssa_def *zw = nir_unpack_64_2x32_split_y(b, src);
+
+   return nir_vec4(b, nir_unpack_32_2x16_split_x(b, xy),
+  nir_unpack_32_2x16_split_y(b, xy),
+  nir_unpack_32_2x16_split_x(b, zw),
+  nir_unpack_32_2x16_split_y(b, zw));
+}
+
 static bool
 lower_64bit_pack_impl(nir_function_impl *impl)
 {
@@ -63,7 +101,11 @@ lower_64bit_pack_impl(nir_function_impl *impl)
  nir_alu_instr *alu_instr = (nir_alu_instr *) instr;
 
  if (alu_instr->op != nir_op_pack_64_2x32 &&
- alu_instr->op != nir_op_unpack_64_2x32)
+ alu_instr->op != nir_op_unpack_64_2x32 &&
+ alu_instr->op != nir_op_pack_64_4x16 &&
+ alu_instr->op != nir_op_unpack_64_4x16 &&
+ alu_instr->op != nir_op_pack_32_2x16 &&
+ alu_instr->op != nir_op_unpack_32_2x16)
 continue;
 
  b.cursor = nir_before_instr(&alu_instr->instr);
@@ -73,10 +115,22 @@ lower_64bit_pack_impl(nir_function_impl *impl)
 
  switch (alu_instr->op) {
  case nir_op_pack_64_2x32:
-dest = lower_pack_64(&b, src);
+dest = lower_pack_64_from_32(&b, src);
 break;
  case nir_op_unpack_64_2x32:
-dest = lower_unpack_64(&b, src);
+dest = lower_unpack_64_to_32(&b, src);
+break;
+ case nir_op_pack_64_4x16:
+dest = lower_pack_64_from_16(&b, src);
+break;
+ case nir_op_unpack_64_4x16:
+dest = lower_unpack_64_to_16(&b, src);
+break;
+ case nir_op_pack_32_2x16:
+dest = lower_pack_32_from_16(&b, src);
+break;
+ case nir_op_unpack_32_2x16:
+dest = lower_unpack_32_to_16(&b, src);
 break;
  default:
 unreachable("Impossible opcode");
-- 
2.14.1
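
A small C sketch of the composition that lower_pack_64_from_16 in the
patch above relies on: two 16-to-32 packs followed by one 32-to-64
pack. The functions operate on plain integers rather than NIR SSA
values, and pack_64_4x16_direct is a hypothetical reference written
only for comparison.

```c
#include <stdint.h>

/* Scalar models of the two "split" pack opcodes. */
static inline uint32_t pack_32_2x16_split(uint16_t lo, uint16_t hi)
{
   return (uint32_t)lo | ((uint32_t)hi << 16);
}

static inline uint64_t pack_64_2x32_split(uint32_t lo, uint32_t hi)
{
   return (uint64_t)lo | ((uint64_t)hi << 32);
}

/* Lowered form: pack x,y and z,w into 32-bit halves, then combine. */
static inline uint64_t pack_64_from_16(const uint16_t src[4])
{
   uint32_t xy = pack_32_2x16_split(src[0], src[1]);
   uint32_t zw = pack_32_2x16_split(src[2], src[3]);
   return pack_64_2x32_split(xy, zw);
}

/* Direct form, for comparison: each channel shifted into place. */
static inline uint64_t pack_64_4x16_direct(const uint16_t src[4])
{
   return (uint64_t)src[0] |
          ((uint64_t)src[1] << 16) |
          ((uint64_t)src[2] << 32) |
          ((uint64_t)src[3] << 48);
}
```

Both forms should agree for every input, which is why the pass can
lower the 4x16 opcode without a dedicated 64-bit / 16-bit split opcode.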


[Mesa-dev] [PATCH v2 15/18] intel/compiler: implement 16-bit pack/unpack opcodes

2018-04-30 Thread Iago Toral Quiroga

---
 src/intel/compiler/brw_fs_nir.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index d590a00385..25e85b9b25 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1313,6 +1313,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   break;
 
case nir_op_pack_64_2x32_split:
+   case nir_op_pack_32_2x16_split:
   bld.emit(FS_OPCODE_PACK, result, op[0], op[1]);
   break;
 
@@ -1325,6 +1326,15 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
nir_alu_instr *instr)
   break;
}
 
+   case nir_op_unpack_32_2x16_split_x:
+   case nir_op_unpack_32_2x16_split_y: {
+  if (instr->op == nir_op_unpack_32_2x16_split_x)
+ bld.MOV(result, subscript(op[0], BRW_REGISTER_TYPE_UW, 0));
+  else
+ bld.MOV(result, subscript(op[0], BRW_REGISTER_TYPE_UW, 1));
+  break;
+   }
+
case nir_op_fpow:
   inst = bld.emit(SHADER_OPCODE_POW, result, op[0], op[1]);
   inst->saturate = instr->dest.saturate;
-- 
2.14.1


[Mesa-dev] [PATCH v2 13/18] compiler/lower_64bit_packing: rename the pass to be more generic

2018-04-30 Thread Iago Toral Quiroga

It can do 32-bit packing too now.
---
 src/amd/vulkan/radv_shader.c| 2 +-
 src/compiler/Makefile.sources   | 2 +-
 src/compiler/nir/meson.build| 2 +-
 src/compiler/nir/nir.h  | 2 +-
 src/compiler/nir/{nir_lower_64bit_packing.c => nir_lower_packing.c} | 6 +++---
 src/intel/compiler/brw_nir.c| 2 +-
 src/mesa/state_tracker/st_glsl_to_nir.cpp   | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)
 rename src/compiler/nir/{nir_lower_64bit_packing.c => nir_lower_packing.c} 
(97%)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index aaa6702975..67956000d1 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -125,7 +125,7 @@ radv_optimize_nir(struct nir_shader *shader)
 progress = false;
 
 NIR_PASS_V(shader, nir_lower_vars_to_ssa);
-   NIR_PASS_V(shader, nir_lower_64bit_pack);
+   NIR_PASS_V(shader, nir_lower_pack);
 NIR_PASS_V(shader, nir_lower_alu_to_scalar);
 NIR_PASS_V(shader, nir_lower_phis_to_scalar);
 
diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index b5321588be..14d024b0f9 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -208,7 +208,6 @@ NIR_FILES = \
nir/nir_liveness.c \
nir/nir_loop_analyze.c \
nir/nir_loop_analyze.h \
-   nir/nir_lower_64bit_packing.c \
nir/nir_lower_alpha_test.c \
nir/nir_lower_alu_to_scalar.c \
nir/nir_lower_atomics.c \
@@ -232,6 +231,7 @@ NIR_FILES = \
nir/nir_lower_io_to_temporaries.c \
nir/nir_lower_io_to_scalar.c \
nir/nir_lower_io_types.c \
+   nir/nir_lower_packing.c \
nir/nir_lower_passthrough_edgeflags.c \
nir/nir_lower_patch_vertices.c \
nir/nir_lower_phis_to_scalar.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 62fe5167d3..1307e52f92 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -103,7 +103,6 @@ files_libnir = files(
   'nir_liveness.c',
   'nir_loop_analyze.c',
   'nir_loop_analyze.h',
-  'nir_lower_64bit_packing.c',
   'nir_lower_alu_to_scalar.c',
   'nir_lower_alpha_test.c',
   'nir_lower_atomics.c',
@@ -127,6 +126,7 @@ files_libnir = files(
   'nir_lower_io_to_temporaries.c',
   'nir_lower_io_to_scalar.c',
   'nir_lower_io_types.c',
+  'nir_lower_packing.c',
   'nir_lower_passthrough_edgeflags.c',
   'nir_lower_patch_vertices.c',
   'nir_lower_phis_to_scalar.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index e7f2b145b3..d328d90b84 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2791,7 +2791,7 @@ typedef enum {
 } nir_lower_doubles_options;
 
 bool nir_lower_doubles(nir_shader *shader, nir_lower_doubles_options options);
-bool nir_lower_64bit_pack(nir_shader *shader);
+bool nir_lower_pack(nir_shader *shader);
 
 bool nir_normalize_cubemap_coords(nir_shader *shader);
 
diff --git a/src/compiler/nir/nir_lower_64bit_packing.c 
b/src/compiler/nir/nir_lower_packing.c
similarity index 97%
rename from src/compiler/nir/nir_lower_64bit_packing.c
rename to src/compiler/nir/nir_lower_packing.c
index dd435490e3..ba9f4bc040 100644
--- a/src/compiler/nir/nir_lower_64bit_packing.c
+++ b/src/compiler/nir/nir_lower_packing.c
@@ -87,7 +87,7 @@ lower_unpack_64_to_16(nir_builder *b, nir_ssa_def *src)
 }
 
 static bool
-lower_64bit_pack_impl(nir_function_impl *impl)
+lower_pack_impl(nir_function_impl *impl)
 {
nir_builder b;
nir_builder_init(&b, impl);
@@ -148,13 +148,13 @@ lower_64bit_pack_impl(nir_function_impl *impl)
 }
 
 bool
-nir_lower_64bit_pack(nir_shader *shader)
+nir_lower_pack(nir_shader *shader)
 {
bool progress = false;
 
nir_foreach_function(function, shader) {
   if (function->impl)
- progress |= lower_64bit_pack_impl(function->impl);
+ progress |= lower_pack_impl(function->impl);
}
 
return false;
diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index fb5e08fb33..22ef0486b7 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -584,7 +584,7 @@ brw_nir_optimize(nir_shader *nir, const struct brw_compiler 
*compiler,
  nir_lower_dfract |
  nir_lower_dround_even |
  nir_lower_dmod);
-  OPT(nir_lower_64bit_pack);
+  OPT(nir_lower_pack);
} while (progress);
 
return nir;
diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index bcf6a7ceb6..dbf506e77b 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -324,7 +324,7 @@ st_nir_opts(nir_shader *nir)

[Mesa-dev] [PATCH v2 11/18] nir: add opcodes for 16-bit packing and unpacking

2018-04-30 Thread Iago Toral Quiroga

Notice that we don't need 'split' versions of the 64-bit to / from
16-bit opcodes, even though pack lowering relies on 'split' opcodes to
implement these operations. This is because they can be expressed as a
collection of 32-bit from / to 16-bit and 64-bit to / from 32-bit
operations, so we don't need new opcodes specifically for them.
---
 src/compiler/nir/nir_opcodes.py | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
index 89a6c6becc..3c3316dcaa 100644
--- a/src/compiler/nir/nir_opcodes.py
+++ b/src/compiler/nir/nir_opcodes.py
@@ -91,6 +91,7 @@ tfloat = "float"
 tint = "int"
 tbool = "bool32"
 tuint = "uint"
+tuint16 = "uint16"
 tfloat32 = "float32"
 tint32 = "int32"
 tuint32 = "uint32"
@@ -282,12 +283,24 @@ dst.x = (src0.x <<  0) |
 (src0.w << 24);
 """)
 
+unop_horiz("pack_32_2x16", 1, tuint32, 2, tuint16,
+   "dst.x = src0.x | ((uint32_t)src0.y << 16);")
+
 unop_horiz("pack_64_2x32", 1, tuint64, 2, tuint32,
"dst.x = src0.x | ((uint64_t)src0.y << 32);")
 
+unop_horiz("pack_64_4x16", 1, tuint64, 4, tuint16,
+   "dst.x = src0.x | ((uint64_t)src0.y << 16) | ((uint64_t)src0.z << 
32) | ((uint64_t)src0.w << 48);")
+
 unop_horiz("unpack_64_2x32", 2, tuint32, 1, tuint64,
"dst.x = src0.x; dst.y = src0.x >> 32;")
 
+unop_horiz("unpack_64_4x16", 4, tuint16, 1, tuint64,
+   "dst.x = src0.x; dst.y = src0.x >> 16; dst.z = src0.x >> 32; dst.w 
= src0.x >> 48;")
+
+unop_horiz("unpack_32_2x16", 2, tuint16, 1, tuint32,
+   "dst.x = src0.x; dst.y = src0.x >> 16;")
+
 # Lowered floating point unpacking operations.
 
 
@@ -296,6 +309,9 @@ unop_horiz("unpack_half_2x16_split_x", 1, tfloat32, 1, 
tuint32,
 unop_horiz("unpack_half_2x16_split_y", 1, tfloat32, 1, tuint32,
"unpack_half_1x16((uint16_t)(src0.x >> 16))")
 
+unop_convert("unpack_32_2x16_split_x", tuint16, tuint32, "src0")
+unop_convert("unpack_32_2x16_split_y", tuint16, tuint32, "src0 >> 16")
+
 unop_convert("unpack_64_2x32_split_x", tuint32, tuint64, "src0")
 unop_convert("unpack_64_2x32_split_y", tuint32, tuint64, "src0 >> 32")
 
@@ -608,6 +624,9 @@ binop_horiz("pack_half_2x16_split", 1, tuint32, 1, 
tfloat32, 1, tfloat32,
 binop_convert("pack_64_2x32_split", tuint64, tuint32, "",
   "src0 | ((uint64_t)src1 << 32)")
 
+binop_convert("pack_32_2x16_split", tuint32, tuint16, "",
+  "src0 | ((uint32_t)src1 << 16)")
+
 # bfm implements the behavior of the first operation of the SM5 "bfi" assembly
 # and that of the "bfi1" i965 instruction. That is, it has undefined behavior
 # if either of its arguments are 32.
-- 
2.14.1
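
A C sketch of the unpack direction, illustrating the point made in the
commit message above: a 64-bit to 4x16 unpack can be written using only
the 64-to-32 and 32-to-16 split operations. The functions are scalar
stand-ins that borrow the opcode names; they are not generated from
nir_opcodes.py.

```c
#include <stdint.h>

/* Scalar models of the split unpack opcodes. */
static inline uint32_t unpack_64_2x32_split_x(uint64_t v)
{
   return (uint32_t)v;
}

static inline uint32_t unpack_64_2x32_split_y(uint64_t v)
{
   return (uint32_t)(v >> 32);
}

static inline uint16_t unpack_32_2x16_split_x(uint32_t v)
{
   return (uint16_t)v;
}

static inline uint16_t unpack_32_2x16_split_y(uint32_t v)
{
   return (uint16_t)(v >> 16);
}

/* unpack_64_4x16 built only from the split operations above. */
static void unpack_64_4x16(uint64_t src, uint16_t dst[4])
{
   uint32_t xy = unpack_64_2x32_split_x(src);
   uint32_t zw = unpack_64_2x32_split_y(src);

   dst[0] = unpack_32_2x16_split_x(xy);
   dst[1] = unpack_32_2x16_split_y(xy);
   dst[2] = unpack_32_2x16_split_x(zw);
   dst[3] = unpack_32_2x16_split_y(zw);
}
```

Every output channel is a shift of the single 64-bit source component
(by 0, 16, 32 and 48 bits), which matches what the constant expression
for unpack_64_4x16 is meant to compute.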


[Mesa-dev] [PATCH v2 09/18] intel/compiler: lower some 16-bit integer operations to 32-bit

2018-04-30 Thread Iago Toral Quiroga

These are not supported in hardware for 16-bit integers.

We do the lowering pass after the optimization loop to ensure that we
lower ALU operations injected by algebraic optimizations too.
---
 src/intel/compiler/brw_nir.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 16b0d86814..fb5e08fb33 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -590,6 +590,25 @@ brw_nir_optimize(nir_shader *nir, const struct 
brw_compiler *compiler,
return nir;
 }
 
+static unsigned
+lower_bit_size_callback(const nir_alu_instr *alu, void *data)
+{
+   assert(alu->dest.dest.is_ssa);
+   if (alu->dest.dest.ssa.bit_size != 16)
+  return 0;
+
+   switch (alu->op) {
+   case nir_op_idiv:
+   case nir_op_imod:
+   case nir_op_irem:
+   case nir_op_udiv:
+   case nir_op_umod:
+  return 32;
+   default:
+  return 0;
+   }
+}
+
 /* Does some simple lowering and runs the standard suite of optimizations
  *
  * This is intended to be called more-or-less directly after you get the
@@ -643,6 +662,8 @@ brw_preprocess_nir(const struct brw_compiler *compiler, 
nir_shader *nir)
 
nir = brw_nir_optimize(nir, compiler, is_scalar);
 
+   nir_lower_bit_size(nir, lower_bit_size_callback, NULL);
+
if (is_scalar) {
   OPT(nir_lower_load_const_to_scalar);
}
-- 
2.14.1
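
A minimal C sketch of what lowering a 16-bit integer operation to
32-bit amounts to: widen the operands, do the operation at 32 bits, and
narrow the result. C's own division and remainder operators stand in
for the hardware instructions, the function names are hypothetical, and
the divisor is assumed to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit signed division done at 32 bits. INT16_MIN / -1 does not fit
 * in 16 bits and is deliberately not covered by this sketch. */
static inline int16_t idiv16_via_32(int16_t a, int16_t b)
{
   assert(b != 0);
   int32_t wide = (int32_t)a / (int32_t)b;
   return (int16_t)wide;
}

/* 16-bit unsigned division done at 32 bits. */
static inline uint16_t udiv16_via_32(uint16_t a, uint16_t b)
{
   assert(b != 0);
   return (uint16_t)((uint32_t)a / (uint32_t)b);
}

/* 16-bit unsigned modulo done at 32 bits. */
static inline uint16_t umod16_via_32(uint16_t a, uint16_t b)
{
   assert(b != 0);
   return (uint16_t)((uint32_t)a % (uint32_t)b);
}
```

The callback in the patch only selects which opcodes get this
treatment; returning 32 requests the widened form and returning 0
leaves the instruction alone.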


[Mesa-dev] [PATCH v2 07/18] intel/compiler: fix brw_negate_immediate for 16-bit types

2018-04-30 Thread Iago Toral Quiroga

From: Jose Maria Casanova Crespo 

From Intel Skylake PRM, vol 07, "Immediate" section (page 768):

"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."

This patch implements float16 negate and fixes the int16/uint16
negate, which wasn't taking into account the replication in the lower
and higher words.

v2: Integer cases are different to Float cases. (Jason Ekstrand)
Included reference to PRM (Jose Maria Casanova)
---
 src/intel/compiler/brw_shader.cpp | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_shader.cpp 
b/src/intel/compiler/brw_shader.cpp
index 9cdf9fcb23..76dd1173fa 100644
--- a/src/intel/compiler/brw_shader.cpp
+++ b/src/intel/compiler/brw_shader.cpp
@@ -580,8 +580,13 @@ brw_negate_immediate(enum brw_reg_type type, struct 
brw_reg *reg)
   reg->d = -reg->d;
   return true;
case BRW_REGISTER_TYPE_W:
-   case BRW_REGISTER_TYPE_UW:
-  reg->d = -(int16_t)reg->ud;
+   case BRW_REGISTER_TYPE_UW: {
+  uint16_t value = -(int16_t)reg->ud;
+  reg->ud = value | value << 16;
+  return true;
+   }
+   case BRW_REGISTER_TYPE_HF:
+  reg->ud ^= 0x80008000;
   return true;
case BRW_REGISTER_TYPE_F:
   reg->f = -reg->f;
@@ -602,8 +607,6 @@ brw_negate_immediate(enum brw_reg_type type, struct brw_reg 
*reg)
case BRW_REGISTER_TYPE_UV:
case BRW_REGISTER_TYPE_V:
   assert(!"unimplemented: negate UV/V immediate");
-   case BRW_REGISTER_TYPE_HF:
-  assert(!"unimplemented: negate HF immediate");
case BRW_REGISTER_TYPE_NF:
   unreachable("no NF immediates");
}
-- 
2.14.1
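
A small C sketch of the replication rule quoted from the PRM above and
of the two negations the patch implements. It works on a bare 32-bit
immediate field instead of struct brw_reg, and the helper names are
hypothetical.

```c
#include <stdint.h>

/* Replicate a 16-bit immediate into both words of the 32-bit field. */
static inline uint32_t replicate16(uint16_t v)
{
   return (uint32_t)v | ((uint32_t)v << 16);
}

/* Negate a W/UW immediate: negate the low word using two's-complement
 * arithmetic, then replicate the result into both words again. */
static inline uint32_t negate_w_immediate(uint32_t imm)
{
   uint16_t value = (uint16_t)(0u - (imm & 0xFFFFu));
   return replicate16(value);
}

/* Negate an HF immediate: flip the sign bit of both replicated halves. */
static inline uint32_t negate_hf_immediate(uint32_t imm)
{
   return imm ^ 0x80008000u;
}
```

For example, the half-float 1.0 (0x3C00) is stored as 0x3C003C00 and
negates to 0xBC00BC00, while the word immediate 5 is stored as
0x00050005 and negates to 0xFFFBFFFB.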

