Re: [Mesa-dev] [PATCH 00/15] GLSL memory allocation rework for faster compilation

2016-10-09 Thread Tapani Pälli



On 10/08/2016 06:58 PM, Jason Ekstrand wrote:

FYI, we use ralloc for a lot more than just the glsl compiler so the
first few changes make me a bit nervous.  There was someone working on
making our driver more I undefined-memory-friendly but I don't know what
happened to those patches.


There's bunch of patches like that in this series:
https://lists.freedesktop.org/archives/mesa-dev/2016-June/120445.html

it looks like it just never landed as would have required more testing 
on misc drivers?




On Oct 8, 2016 3:58 AM, "Marek Olšák" > wrote:

Hi,

This patch series reduces the number of malloc calls in the GLSL
compiler by 63%. That leads to better compile times and less heap
thrashing.

It's done by switching memory allocations in the GLSL compiler to my
new linear allocator that allocates out of a fixed-sized buffer with
a monotonically increasing offset. If more buffers are needed, it
chains them.

The new allocator is used in all places where short-lived allocations
are used with a high number of malloc calls. The series also contains
other improvements not related to the new allocator that also improve
compile times. The results are below.

I tested my shader-db with shaders only being compiled to TGSI.
(noop gallium driver)


master + libc's malloc:

 real   0m54.182s
 user   3m33.640s
 sys0m0.620s
 maxmem 275 MB


master + jemalloc preloaded:

 real   0m45.044s
 user   2m56.356s
 sys0m1.652s
 maxmem 284 MB


the series + libc's malloc:

 real   0m46.221s
 user   3m2.080s
 sys0m0.544s
 maxmem 270 MB


the series + jemalloc preloaded:

 real   0m40.729s
 user   2m39.564s
 sys0m1.232s
 maxmem 284 MB


The series without jemalloc almost caught up with jemalloc + master.
However, jemalloc also benefits.

Current Mesa needs 54.182s and it drops to 40.729s with my series and
jemalloc. The total change in compile time is -25% if we incorporate
both. Without jemalloc, the difference is only -14.7%.

With radeonsi, the improvement is approx. slightly more than 1/2 of that
(if you add the LLVM time). However, radeonsi also has asynchronous
shader compilation hiding LLVM overhead in some cases, so it depends.

Drivers with faster compiler backends will benefit more than radeonsi,
but will probably not reach -25% or -14.7% (except softpipe, which uses
TGSI as-is).

The memory usage looks reasonable in all tested cases.

Note: One of the first patches moves memset from ralloc to rzalloc.
I tested and fixed the GLSL source -> TGSI path, but other codepaths
may break, and you need to use valgrind to find all uninitialized
variables that relied on ralloc doing memset (if there are any).

You can also find it here:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=glsl-alloc-rework


Please review.

 src/compiler/glsl/ast.h |   4 +-
 src/compiler/glsl/ast_to_hir.cpp|   4 +-
 src/compiler/glsl/ast_type.cpp  |  13 ++-
 src/compiler/glsl/glcpp/glcpp-lex.l |   2 +-
 src/compiler/glsl/glcpp/glcpp-parse.y   | 203
+-
 src/compiler/glsl/glcpp/glcpp.h |   1 +
 src/compiler/glsl/glsl_lexer.ll |  16 +--
 src/compiler/glsl/glsl_parser.yy| 202
+++---
 src/compiler/glsl/glsl_parser_extras.cpp|   6 +-
 src/compiler/glsl/glsl_parser_extras.h  |   4 +-
 src/compiler/glsl/glsl_symbol_table.cpp |  19 ++--
 src/compiler/glsl/glsl_symbol_table.h   |   1 +
 src/compiler/glsl/ir.cpp|   4 +
 src/compiler/glsl/ir.h  |  13 ++-
 src/compiler/glsl/link_uniform_blocks.cpp   |   2 +-
 src/compiler/glsl/list.h|   2 +-
 src/compiler/glsl/lower_packed_varyings.cpp |   8 +-
 src/compiler/glsl/opt_constant_propagation.cpp  |  14 ++-
 src/compiler/glsl/opt_copy_propagation.cpp  |   7 +-
 src/compiler/glsl/opt_copy_propagation_elements.cpp |  19 ++--
 src/compiler/glsl/opt_dead_code_local.cpp   |  12 ++-
 src/compiler/glsl_types.cpp |  38 +--
 src/compiler/glsl_types.h   |   6 +-
 src/compiler/nir/nir.c  |   8 +-
 src/compiler/spirv/vtn_variables.c  |   3 +-
 src/gallium/drivers/freedreno/ir3/ir3.c |   2 +-
 src/gallium/drivers/vc4/vc4_cl.c|   2 +-
 

Re: [Mesa-dev] [PATCH 10/22] intel/blorp: Add an entrypoint for clearing depth and stencil

2016-10-09 Thread Jason Ekstrand
On Oct 9, 2016 10:48 PM, "Pohjolainen, Topi" 
wrote:
>
> On Fri, Oct 07, 2016 at 09:41:08PM -0700, Jason Ekstrand wrote:
> > Signed-off-by: Jason Ekstrand 
> > ---
> >  src/intel/blorp/blorp.h   | 10 
> >  src/intel/blorp/blorp_clear.c | 58
+++
> >  2 files changed, 68 insertions(+)
> >
> > diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> > index 480f1bb..263d194 100644
> > --- a/src/intel/blorp/blorp.h
> > +++ b/src/intel/blorp/blorp.h
> > @@ -134,6 +134,16 @@ blorp_clear(struct blorp_batch *batch,
> >  const bool color_write_disable[4]);
> >
> >  void
> > +blorp_clear_depth_stencil(struct blorp_batch *batch,
> > +  const struct blorp_surf *depth,
> > +  const struct blorp_surf *stencil,
> > +  uint32_t level, uint32_t start_layer,
> > +  uint32_t num_layers,
> > +  uint32_t x0, uint32_t y0, uint32_t x1,
uint32_t y1,
> > +  bool clear_depth, float depth_value,
> > +  bool clear_stencil, uint8_t stencil_value);
> > +
> > +void
> >  blorp_ccs_resolve(struct blorp_batch *batch,
> >struct blorp_surf *surf, enum isl_format format);
> >
> > diff --git a/src/intel/blorp/blorp_clear.c
b/src/intel/blorp/blorp_clear.c
> > index a8f29fd..1d6bf1c 100644
> > --- a/src/intel/blorp/blorp_clear.c
> > +++ b/src/intel/blorp/blorp_clear.c
> > @@ -306,6 +306,64 @@ blorp_clear(struct blorp_batch *batch,
> >  }
> >
> >  void
> > +blorp_clear_depth_stencil(struct blorp_batch *batch,
> > +  const struct blorp_surf *depth,
> > +  const struct blorp_surf *stencil,
> > +  uint32_t level, uint32_t start_layer,
> > +  uint32_t num_layers,
> > +  uint32_t x0, uint32_t y0, uint32_t x1,
uint32_t y1,
> > +  bool clear_depth, float depth_value,
> > +  bool clear_stencil, uint8_t stencil_value)
> > +{
> > +   struct blorp_params params;
> > +   blorp_params_init();
> > +
> > +   params.x0 = x0;
> > +   params.y0 = y0;
> > +   params.x1 = x1;
> > +   params.y1 = y1;
> > +
> > +   while (num_layers > 0) {
> > +  params.num_layers = num_layers;
> > +
> > +  if (clear_stencil) {
> > + brw_blorp_surface_info_init(batch->blorp, ,
stencil,
> > + level, start_layer,
> > + ISL_FORMAT_UNSUPPORTED, true);
> > + params.stencil_ref = stencil_value;
> > +
> > + params.dst.surf.samples = params.stencil.surf.samples;
> > + params.dst.surf.logical_level0_px =
> > +params.stencil.surf.logical_level0_px;
> > +
> > + if (params.stencil.view.array_len < params.num_layers)
> > +params.num_layers = params.stencil.view.array_len;
> > +  }
> > +
> > +  if (clear_depth) {
> > + brw_blorp_surface_info_init(batch->blorp, ,
depth,
> > + level, start_layer,
> > + ISL_FORMAT_UNSUPPORTED, true);
> > + params.z = depth_value;
> > + params.depth_format =
> > +isl_format_get_depth_format(depth->surf->format, false);
> > +
> > + params.dst.surf.samples = params.depth.surf.samples;
> > + params.dst.surf.logical_level0_px =
> > +params.depth.surf.logical_level0_px;
> > +
> > + if (params.depth.view.array_len < params.num_layers)
> > +params.num_layers = params.depth.view.array_len;
>
> Stencil already does the same, could we add an assert here:
>
>assert(!clear_stencil ||
>   params.depth.view.array_len ==
>   params.stencil.view.array_len)

Sure.

>
> Moreover, I thought that start_layer + num_layers <= view.array_len should
> always apply. You seem to explicitly prepare for that not to hold. Could
you
> give an example?

I had to do this recently with the other clears as well.  Sandy Bridge has
a max of 512 layers for rendering.  In the surface_info_init function, we
clamp as needed and this ensures that 16k-slice textures get 32 draw calls.

> > +  }
> > +
> > +  batch->blorp->exec(batch, );
> > +
> > +  start_layer += params.num_layers;
> > +  num_layers -= params.num_layers;
> > +   }
> > +}
> > +
> > +void
> >  blorp_ccs_resolve(struct blorp_batch *batch,
> >struct blorp_surf *surf, enum isl_format format)
> >  {
> > --
> > 2.5.0.400.gff86faf
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list

Re: [Mesa-dev] [PATCH 11/22] intel/blorp: Add a flag to make blorp not re-emit dept/stencil buffers

2016-10-09 Thread Pohjolainen, Topi
On Fri, Oct 07, 2016 at 09:41:09PM -0700, Jason Ekstrand wrote:
> In Vulkan, we want to be able to use blorp to perform clears inside of a
> render pass.  If blorp stomps the depth/stencil buffers packets then we'll
> have to re-emit them.  This gets tricky when secondary command buffers get
> involved.  Instead, we'll simply guarantee that the depth and stencil
> buffers we pass to blorp (if any) match those already set in the hardware.

Patches 8,9 and 11 are:

Reviewed-by: Topi Pohjolainen 

> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp.c   |  4 +++-
>  src/intel/blorp/blorp.h   | 13 -
>  src/intel/blorp/blorp_genX_exec.h |  3 ++-
>  src/intel/vulkan/anv_blorp.c  | 18 +-
>  src/mesa/drivers/dri/i965/brw_blorp.c | 12 ++--
>  5 files changed, 32 insertions(+), 18 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp.c b/src/intel/blorp/blorp.c
> index 91513a0..08afffe 100644
> --- a/src/intel/blorp/blorp.c
> +++ b/src/intel/blorp/blorp.c
> @@ -45,10 +45,12 @@ blorp_finish(struct blorp_context *blorp)
>  
>  void
>  blorp_batch_init(struct blorp_context *blorp,
> - struct blorp_batch *batch, void *driver_batch)
> + struct blorp_batch *batch, void *driver_batch,
> + enum blorp_batch_flags flags)
>  {
> batch->blorp = blorp;
> batch->driver_batch = driver_batch;
> +   batch->flags = flags;
>  }
>  
>  void
> diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> index 263d194..7a3e3de 100644
> --- a/src/intel/blorp/blorp.h
> +++ b/src/intel/blorp/blorp.h
> @@ -66,13 +66,24 @@ void blorp_init(struct blorp_context *blorp, void 
> *driver_ctx,
>  struct isl_device *isl_dev);
>  void blorp_finish(struct blorp_context *blorp);
>  
> +enum blorp_batch_flags {
> +   /**
> +* This flag indicates that blorp should *not* re-emit the depth and
> +* stencil buffer packets.  Instead, the driver guarantees that all depth
> +* and stencil images passed in will match what is currently set in the
> +* hardware.
> +*/
> +   BLORP_BATCH_NO_EMIT_DEPTH_STENCIL = (1 << 0),
> +};
> +
>  struct blorp_batch {
> struct blorp_context *blorp;
> void *driver_batch;
> +   enum blorp_batch_flags flags;
>  };
>  
>  void blorp_batch_init(struct blorp_context *blorp, struct blorp_batch *batch,
> -  void *driver_batch);
> +  void *driver_batch, enum blorp_batch_flags flags);
>  void blorp_batch_finish(struct blorp_batch *batch);
>  
>  struct blorp_address {
> diff --git a/src/intel/blorp/blorp_genX_exec.h 
> b/src/intel/blorp/blorp_genX_exec.h
> index 1615d42..ff9e68b 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -1281,7 +1281,8 @@ blorp_exec(struct blorp_batch *batch, const struct 
> blorp_params *params)
>  
> blorp_emit_viewport_state(batch, params);
>  
> -   blorp_emit_depth_stencil_config(batch, params);
> +   if (!(batch->flags & BLORP_BATCH_NO_EMIT_DEPTH_STENCIL))
> +  blorp_emit_depth_stencil_config(batch, params);
>  
> blorp_emit(batch, GENX(3DPRIMITIVE), prim) {
>prim.VertexAccessType = SEQUENTIAL;
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index f149f84..d7a1fd3 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -182,7 +182,7 @@ void anv_CmdCopyImage(
> ANV_FROM_HANDLE(anv_image, dst_image, dstImage);
>  
> struct blorp_batch batch;
> -   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer, 0);
>  
> for (unsigned r = 0; r < regionCount; r++) {
>VkOffset3D srcOffset =
> @@ -244,7 +244,7 @@ copy_buffer_to_image(struct anv_cmd_buffer *cmd_buffer,
>   bool buffer_to_image)
>  {
> struct blorp_batch batch;
> -   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer, 0);
>  
> struct {
>struct blorp_surf surf;
> @@ -404,7 +404,7 @@ void anv_CmdBlitImage(
> }
>  
> struct blorp_batch batch;
> -   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer, 0);
>  
> for (unsigned r = 0; r < regionCount; r++) {
>const VkImageSubresourceLayers *src_res = [r].srcSubresource;
> @@ -575,7 +575,7 @@ void anv_CmdCopyBuffer(
> ANV_FROM_HANDLE(anv_buffer, dst_buffer, dstBuffer);
>  
> struct blorp_batch batch;
> -   blorp_batch_init(_buffer->device->blorp, , cmd_buffer);
> +   blorp_batch_init(_buffer->device->blorp, , cmd_buffer, 0);
>  
> for (unsigned r = 0; r < regionCount; r++) {
>uint64_t src_offset = src_buffer->offset + pRegions[r].srcOffset;
> @@ -636,7 +636,7 @@ void anv_CmdUpdateBuffer(
> ANV_FROM_HANDLE(anv_buffer, dst_buffer, 

Re: [Mesa-dev] [PATCH 10/22] intel/blorp: Add an entrypoint for clearing depth and stencil

2016-10-09 Thread Pohjolainen, Topi
On Fri, Oct 07, 2016 at 09:41:08PM -0700, Jason Ekstrand wrote:
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp.h   | 10 
>  src/intel/blorp/blorp_clear.c | 58 
> +++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
> index 480f1bb..263d194 100644
> --- a/src/intel/blorp/blorp.h
> +++ b/src/intel/blorp/blorp.h
> @@ -134,6 +134,16 @@ blorp_clear(struct blorp_batch *batch,
>  const bool color_write_disable[4]);
>  
>  void
> +blorp_clear_depth_stencil(struct blorp_batch *batch,
> +  const struct blorp_surf *depth,
> +  const struct blorp_surf *stencil,
> +  uint32_t level, uint32_t start_layer,
> +  uint32_t num_layers,
> +  uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
> +  bool clear_depth, float depth_value,
> +  bool clear_stencil, uint8_t stencil_value);
> +
> +void
>  blorp_ccs_resolve(struct blorp_batch *batch,
>struct blorp_surf *surf, enum isl_format format);
>  
> diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
> index a8f29fd..1d6bf1c 100644
> --- a/src/intel/blorp/blorp_clear.c
> +++ b/src/intel/blorp/blorp_clear.c
> @@ -306,6 +306,64 @@ blorp_clear(struct blorp_batch *batch,
>  }
>  
>  void
> +blorp_clear_depth_stencil(struct blorp_batch *batch,
> +  const struct blorp_surf *depth,
> +  const struct blorp_surf *stencil,
> +  uint32_t level, uint32_t start_layer,
> +  uint32_t num_layers,
> +  uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
> +  bool clear_depth, float depth_value,
> +  bool clear_stencil, uint8_t stencil_value)
> +{
> +   struct blorp_params params;
> +   blorp_params_init();
> +
> +   params.x0 = x0;
> +   params.y0 = y0;
> +   params.x1 = x1;
> +   params.y1 = y1;
> +
> +   while (num_layers > 0) {
> +  params.num_layers = num_layers;
> +
> +  if (clear_stencil) {
> + brw_blorp_surface_info_init(batch->blorp, , stencil,
> + level, start_layer,
> + ISL_FORMAT_UNSUPPORTED, true);
> + params.stencil_ref = stencil_value;
> +
> + params.dst.surf.samples = params.stencil.surf.samples;
> + params.dst.surf.logical_level0_px =
> +params.stencil.surf.logical_level0_px;
> +
> + if (params.stencil.view.array_len < params.num_layers)
> +params.num_layers = params.stencil.view.array_len;
> +  }
> +
> +  if (clear_depth) {
> + brw_blorp_surface_info_init(batch->blorp, , depth,
> + level, start_layer,
> + ISL_FORMAT_UNSUPPORTED, true);
> + params.z = depth_value;
> + params.depth_format =
> +isl_format_get_depth_format(depth->surf->format, false);
> +
> + params.dst.surf.samples = params.depth.surf.samples;
> + params.dst.surf.logical_level0_px =
> +params.depth.surf.logical_level0_px;
> +
> + if (params.depth.view.array_len < params.num_layers)
> +params.num_layers = params.depth.view.array_len;

Stencil already does the same, could we add an assert here:

   assert(!clear_stencil || 
  params.depth.view.array_len == 
  params.stencil.view.array_len)


Moreover, I thought that start_layer + num_layers <= view.array_len should
always apply. You seem to explicitly prepare for that not to hold. Could you
give an example?

> +  }
> +
> +  batch->blorp->exec(batch, );
> +
> +  start_layer += params.num_layers;
> +  num_layers -= params.num_layers;
> +   }
> +}
> +
> +void
>  blorp_ccs_resolve(struct blorp_batch *batch,
>struct blorp_surf *surf, enum isl_format format)
>  {
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98135] dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.transform_feedback_varyings wants a different GL error code

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98135

Tapani Pälli  changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |lem...@gmail.com
   |org |
 Status|NEW |ASSIGNED

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 07/22] intel/blorp: Emit more complete DEPTH_STENCIL state

2016-10-09 Thread Pohjolainen, Topi
On Fri, Oct 07, 2016 at 09:41:05PM -0700, Jason Ekstrand wrote:
> This should now set the pipeline up properly for doing depth and/or stencil
> clears by plumbing through depth/stencil test values.

If you like you could also say that color calculator state is now emitted
also for blorp operations without actual shader (no prog_data).

Patches 2-7 are:

Reviewed-by: Topi Pohjolainen 

> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp_genX_exec.h | 56 
> +++
>  src/intel/blorp/blorp_priv.h  |  1 +
>  2 files changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp_genX_exec.h 
> b/src/intel/blorp/blorp_genX_exec.h
> index 6bbabcf..d72965e 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -868,11 +868,17 @@ static uint32_t
>  blorp_emit_color_calc_state(struct blorp_batch *batch,
>  const struct blorp_params *params)
>  {
> +   struct GENX(COLOR_CALC_STATE) cc = { 0 };
> +
> +#if GEN_GEN <= 8
> +   cc.StencilReferenceValue = params->stencil_ref;
> +#endif
> +
> uint32_t offset;
> void *state = blorp_alloc_dynamic_state(batch, AUB_TRACE_CC_STATE,
> GENX(COLOR_CALC_STATE_length) * 4,
> 64, );
> -   memset(state, 0, GENX(COLOR_CALC_STATE_length) * 4);
> +   GENX(COLOR_CALC_STATE_pack)(NULL, state, );
>  
>  #if GEN_GEN >= 7
> blorp_emit(batch, GENX(3DSTATE_CC_STATE_POINTERS), sp) {
> @@ -898,16 +904,44 @@ blorp_emit_depth_stencil_state(struct blorp_batch 
> *batch,
> struct GENX(DEPTH_STENCIL_STATE) ds = { 0 };
>  #endif
>  
> -   ds.DepthBufferWriteEnable = params->depth.addr.buffer != NULL;
> +   if (params->depth.addr.buffer) {
> +  ds.DepthBufferWriteEnable = true;
>  
> -   /* See the following sections of the Sandy Bridge PRM, Volume 1, Part2:
> -*   - 7.5.3.1 Depth Buffer Clear
> -*   - 7.5.3.2 Depth Buffer Resolve
> -*   - 7.5.3.3 Hierarchical Depth Buffer Resolve
> -*/
> -   if (params->hiz_op == BLORP_HIZ_OP_DEPTH_RESOLVE) {
> -  ds.DepthTestEnable = true;
> -  ds.DepthTestFunction = COMPAREFUNCTION_NEVER;
> +  switch (params->hiz_op) {
> +  case BLORP_HIZ_OP_NONE:
> + ds.DepthTestEnable = true;
> + ds.DepthTestFunction = COMPAREFUNCTION_ALWAYS;
> + break;
> +
> +  /* See the following sections of the Sandy Bridge PRM, Volume 2, Part1:
> +   *   - 7.5.3.1 Depth Buffer Clear
> +   *   - 7.5.3.2 Depth Buffer Resolve
> +   *   - 7.5.3.3 Hierarchical Depth Buffer Resolve
> +   */
> +  case BLORP_HIZ_OP_DEPTH_RESOLVE:
> + ds.DepthTestEnable = true;
> + ds.DepthTestFunction = COMPAREFUNCTION_NEVER;
> + break;
> +
> +  case BLORP_HIZ_OP_DEPTH_CLEAR:
> +  case BLORP_HIZ_OP_HIZ_RESOLVE:
> + ds.DepthTestEnable = false;
> + break;
> +  }
> +   }
> +
> +   if (params->stencil.addr.buffer) {
> +  ds.StencilBufferWriteEnable = true;
> +  ds.StencilTestEnable = true;
> +  ds.DoubleSidedStencilEnable = false;
> +
> +  ds.StencilTestFunction = COMPAREFUNCTION_ALWAYS;
> +  ds.StencilPassDepthPassOp = STENCILOP_REPLACE;
> +
> +  ds.StencilWriteMask = 0xff;
> +#if GEN_GEN >= 9
> +  ds.StencilReferenceValue = params->stencil_ref;
> +#endif
> }
>  
>  #if GEN_GEN >= 8
> @@ -1163,8 +1197,8 @@ blorp_exec(struct blorp_batch *batch, const struct 
> blorp_params *params)
>  
> if (params->wm_prog_data) {
>blend_state_offset = blorp_emit_blend_state(batch, params);
> -  color_calc_state_offset = blorp_emit_color_calc_state(batch, params);
> }
> +   color_calc_state_offset = blorp_emit_color_calc_state(batch, params);
> depth_stencil_state_offset = blorp_emit_depth_stencil_state(batch, 
> params);
>  
>  #if GEN_GEN <= 6
> diff --git a/src/intel/blorp/blorp_priv.h b/src/intel/blorp/blorp_priv.h
> index 82a7bef..e9f0124 100644
> --- a/src/intel/blorp/blorp_priv.h
> +++ b/src/intel/blorp/blorp_priv.h
> @@ -180,6 +180,7 @@ struct blorp_params
> uint32_t x1;
> uint32_t y1;
> float z;
> +   uint8_t stencil_ref;
> struct brw_blorp_surface_info depth;
> struct brw_blorp_surface_info stencil;
> uint32_t depth_format;
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98133] GetSynciv should raise an error if bufSize < 0

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98133

Tapani Pälli  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #1 from Tapani Pälli  ---
commit 2d7e0f35c54f49c3280eea308a652253cb3bde46
Author: Tapani Pälli 
Date:   Fri Oct 7 08:41:15 2016 +0300

mesa: throw error if bufSize negative in GetSynciv on OpenGL ES

Fixes following dEQP tests:

  
dEQP-GLES31.functional.debug.negative_coverage.callbacks.state.get_synciv
  
dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_synciv
   dEQP-GLES31.functional.debug.negative_coverage.log.state.get_synciv

v2: drop _mesa_is_gles check (Kenneth)

Signed-off-by: Tapani Pälli 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98133
Reviewed-by: Kenneth Graunke 

-- 
You are receiving this mail because:
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 01/22] intel/blorp: Make the Z component of the primitive adjustable

2016-10-09 Thread Pohjolainen, Topi
On Fri, Oct 07, 2016 at 09:40:59PM -0700, Jason Ekstrand wrote:
> We want to be able to start slow depth clears.  This allows us to adjust

I think you are missing a word or two here, s/start/start using/? Otherwise

Reviewed-by: Topi Pohjolainen 

> the depth we're clearing to.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  src/intel/blorp/blorp_genX_exec.h | 12 ++--
>  src/intel/blorp/blorp_priv.h  |  1 +
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/src/intel/blorp/blorp_genX_exec.h 
> b/src/intel/blorp/blorp_genX_exec.h
> index 62f16a3..f24b495 100644
> --- a/src/intel/blorp/blorp_genX_exec.h
> +++ b/src/intel/blorp/blorp_genX_exec.h
> @@ -171,9 +171,9 @@ blorp_emit_vertex_data(struct blorp_batch *batch,
> uint32_t *size)
>  {
> const float vertices[] = {
> -  /* v0 */ (float)params->x1, (float)params->y1,
> -  /* v1 */ (float)params->x0, (float)params->y1,
> -  /* v2 */ (float)params->x0, (float)params->y0,
> +  /* v0 */ (float)params->x1, (float)params->y1, params->z,
> +  /* v1 */ (float)params->x0, (float)params->y1, params->z,
> +  /* v2 */ (float)params->x0, (float)params->y0, params->z,
> };
>  
> void *data = blorp_alloc_vertex_buffer(batch, sizeof(vertices), addr);
> @@ -225,7 +225,7 @@ blorp_emit_vertex_buffers(struct blorp_batch *batch,
> uint32_t size;
> blorp_emit_vertex_data(batch, params, [0].BufferStartingAddress, 
> );
> vb[0].VertexBufferIndex = 0;
> -   vb[0].BufferPitch = 2 * sizeof(float);
> +   vb[0].BufferPitch = 3 * sizeof(float);
> vb[0].VertexBufferMOCS = batch->blorp->mocs.vb;
>  #if GEN_GEN >= 7
> vb[0].AddressModifyEnable = true;
> @@ -344,11 +344,11 @@ blorp_emit_vertex_elements(struct blorp_batch *batch,
>  
> ve[1].VertexBufferIndex = 0;
> ve[1].Valid = true;
> -   ve[1].SourceElementFormat = ISL_FORMAT_R32G32_FLOAT;
> +   ve[1].SourceElementFormat = ISL_FORMAT_R32G32B32_FLOAT;
> ve[1].SourceElementOffset = 0;
> ve[1].Component0Control = VFCOMP_STORE_SRC;
> ve[1].Component1Control = VFCOMP_STORE_SRC;
> -   ve[1].Component2Control = VFCOMP_STORE_0;
> +   ve[1].Component2Control = VFCOMP_STORE_SRC;
> ve[1].Component3Control = VFCOMP_STORE_1_FP;
>  
> for (unsigned i = 0; i < num_varyings; ++i) {
> diff --git a/src/intel/blorp/blorp_priv.h b/src/intel/blorp/blorp_priv.h
> index a88d0f8..a6ea0ff 100644
> --- a/src/intel/blorp/blorp_priv.h
> +++ b/src/intel/blorp/blorp_priv.h
> @@ -179,6 +179,7 @@ struct blorp_params
> uint32_t y0;
> uint32_t x1;
> uint32_t y1;
> +   float z;
> struct brw_blorp_surface_info depth;
> uint32_t depth_format;
> struct brw_blorp_surface_info src;
> -- 
> 2.5.0.400.gff86faf
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] GlBlitFramebuffer Specification Fix

2016-10-09 Thread Max Qian
The specification says that if the depth AND stencil formats
do not match, not or. This fixes bug #97921.
---
 src/mesa/main/blit.c | 46 +++---
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/src/mesa/main/blit.c b/src/mesa/main/blit.c
index e739130..62afeb4 100644
--- a/src/mesa/main/blit.c
+++ b/src/mesa/main/blit.c
@@ -334,7 +334,7 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
  mask &= ~GL_STENCIL_BUFFER_BIT;
   }
   else {
- int read_z_bits, draw_z_bits;
+ int read_z_bits, draw_z_bits, read_s_bit, draw_s_bit;
 
  if (_mesa_is_gles3(ctx) && (drawRb == readRb)) {
 _mesa_error(ctx, GL_INVALID_OPERATION,
@@ -343,28 +343,24 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
 return;
  }
 
- if (_mesa_get_format_bits(readRb->Format, GL_STENCIL_BITS) !=
- _mesa_get_format_bits(drawRb->Format, GL_STENCIL_BITS)) {
-/* There is no need to check the stencil datatype here, because
- * there is only one: GL_UNSIGNED_INT.
- */
-_mesa_error(ctx, GL_INVALID_OPERATION,
-"%s(stencil attachment format mismatch)", func);
-return;
- }
-
  read_z_bits = _mesa_get_format_bits(readRb->Format, GL_DEPTH_BITS);
  draw_z_bits = _mesa_get_format_bits(drawRb->Format, GL_DEPTH_BITS);
+ read_s_bit = _mesa_get_format_bits(readRb->Format, GL_STENCIL_BITS);
+ draw_s_bit = _mesa_get_format_bits(drawRb->Format, GL_STENCIL_BITS);
 
  /* If both buffers also have depth data, the depth formats must match
   * as well.  If one doesn't have depth, it's not blitted, so we should
   * ignore the depth format check.
   */
- if (read_z_bits > 0 && draw_z_bits > 0 &&
- (read_z_bits != draw_z_bits ||
-  _mesa_get_format_datatype(readRb->Format) !=
-  _mesa_get_format_datatype(drawRb->Format))) {
+ if ((read_s_bit != draw_s_bit) &&
+  (read_z_bits > 0 && draw_z_bits > 0 &&
+  (read_z_bits != draw_z_bits ||
+   _mesa_get_format_datatype(readRb->Format) !=
+   _mesa_get_format_datatype(drawRb->Format {
 
+/* There is no need to check the stencil datatype here, because
+ * there is only one: GL_UNSIGNED_INT.
+ */
 _mesa_error(ctx, GL_INVALID_OPERATION,
 "%s(stencil attachment depth format mismatch)", func);
 return;
@@ -388,7 +384,7 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
  mask &= ~GL_DEPTH_BUFFER_BIT;
   }
   else {
- int read_s_bit, draw_s_bit;
+ int read_z_bits, draw_z_bits, read_s_bit, draw_s_bit;
 
  if (_mesa_is_gles3(ctx) && (drawRb == readRb)) {
 _mesa_error(ctx, GL_INVALID_OPERATION,
@@ -397,15 +393,8 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
 return;
  }
 
- if ((_mesa_get_format_bits(readRb->Format, GL_DEPTH_BITS) !=
-  _mesa_get_format_bits(drawRb->Format, GL_DEPTH_BITS)) ||
- (_mesa_get_format_datatype(readRb->Format) !=
-  _mesa_get_format_datatype(drawRb->Format))) {
-_mesa_error(ctx, GL_INVALID_OPERATION,
-"%s(depth attachment format mismatch)", func);
-return;
- }
-
+ read_z_bits = _mesa_get_format_bits(readRb->Format, GL_DEPTH_BITS);
+ draw_z_bits = _mesa_get_format_bits(drawRb->Format, GL_DEPTH_BITS);
  read_s_bit = _mesa_get_format_bits(readRb->Format, GL_STENCIL_BITS);
  draw_s_bit = _mesa_get_format_bits(drawRb->Format, GL_STENCIL_BITS);
 
@@ -413,9 +402,12 @@ _mesa_blit_framebuffer(struct gl_context *ctx,
   * match as well.  If one doesn't have stencil, it's not blitted, so
   * we should ignore the stencil format check.
   */
- if (read_s_bit > 0 && draw_s_bit > 0 && read_s_bit != draw_s_bit) {
+ if ((read_s_bit > 0 && draw_s_bit > 0 && read_s_bit != draw_s_bit) &&
+ (read_z_bits != draw_z_bits ||
+  (_mesa_get_format_datatype(readRb->Format) !=
+   _mesa_get_format_datatype(drawRb->Format {
 _mesa_error(ctx, GL_INVALID_OPERATION,
-"%s(depth attachment stencil bits mismatch)", func);
+"%s(depth attachment stencil bits and format 
mismatch)", func);
 return;
  }
   }
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

2016-10-09 Thread Dave Airlie
On 10 October 2016 at 13:25, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This is a bit of a hack due to how llvm currently handles
> spilling in it's shader ABI. Currently llvm amdgpu backend
> uses relocations to patch the shader with the address of
> the tmpring. The driver loads the shader and patches the
> relocations.
>
> However for vulkan this doesn't work so well for a few reasons
> a) when we build/load the shaders we aren't constructing the
> command stream yet, and the same shader could be using in multiple
> command streams.
>
> b) multiple command execution engines for compute shaders.
>
> So ideally we'd fix LLVM to understand the ABI convention, possibly
> we'd fix it so user sgpr 0,1 are used (this hack uses 10/11).
>
> This patch when it gets the shader back from llvm it patches
> the relocation dword to a nop, and patches to previous mov command
> to move from SGPR 10 and 11. This works usually as it seems the
> SGPR loading of the spill stuff is at the start of shaders always
> so the 10/11 user sgprs haven't been trashed yet. I'm not 100%
> sure this will work all the time, but for now this should allow
> us to pass a bunch more CTS tests and make the Sascha computeshader
> demo to work.

So I found a shader that this doesn't work so well with unfortunately, so
while I'd like this as a temporary solution I probably need to start
digging into llvm.

My current plan is to add a flag to llvm to denote ability to spill to
the address
in userdata sgpr 0/1, have llvm preload those and use them instead.

Now the other question I have is, should I be killing two user data sgprs for
this purpose, or should we define a better ABI, so that the first descriptor
in the buffer pointed to by these is the scratch buffer, and other things
could be queued after it, (like push constants and dynamic descriptor).

Bas, nha? not sure if Matt is on this list.

Dave.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] android: intel/genxml: add rules to generate xml headers

2016-10-09 Thread Jason Ekstrand
On Sun, Oct 9, 2016 at 3:13 AM, Mauro Rossi  wrote:

> Hi Jason,
>
> I'm sending a patch to add rules for Android,
> even if those headers are not yet used there
> they are inducing a minor building error on Android.
>

These XML includes shouldn't be used for anything in the Anrdoid build as
Aubinator isn't really useful to be building for Android.  Does this fix an
actual build problem or is it just in case we ever need them?  I doubt
we'll be using the XML headers in driver code any time soon, but I'm happy
to add rules for them if you'd like.

--Jason


> Cheers and congratulations for your nice presentation on vulkan at XDC2016
> Mauro
>
> From 52c9b2d9a7d7fc962d7a8c30fc412387a74bf554 Mon Sep 17 00:00:00 2001
> From: Mauro Rossi 
> Date: Sun, 9 Oct 2016 11:40:42 +0200
> Subject: [PATCH] android: intel/genxml: add rules to generate xml headers
>
> New generated headers were introduced by commit 63a366a
> "intel: aubinator: generate a standalone binary"
>
> Once the generated files are listed in Makefile.sources variables,
> Android build system requires rules to avoid building error.
>
> New macro xml-header-gen is basically a copy of automake rules,
> but xxd is invoked directly, as Android build systems stops and shows the
> error
> in case of xxd missing.
>
> Existing macro header-gen is renamed to pack-header-gen as a refinement.
> ---
>  src/intel/Android.genxml.mk | 42 ++
> +++-
>  1 file changed, 33 insertions(+), 9 deletions(-)
>
> diff --git a/src/intel/Android.genxml.mk b/src/intel/Android.genxml.mk
> index 79de784..8991563 100644
> --- a/src/intel/Android.genxml.mk
> +++ b/src/intel/Android.genxml.mk
> @@ -43,7 +43,7 @@ $(intermediates)/dummy.c:
>  # This is the list of auto-generated files headers
>  LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/,
> $(GENXML_GENERATED_FILES))
>
> -define header-gen
> +define pack-header-gen
>   @mkdir -p $(dir $@)
>   @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
>   $(hide) $(PRIVATE_SCRIPT) $(PRIVATE_XML) > $@
> @@ -52,42 +52,66 @@ endef
>  $(intermediates)/genxml/gen4_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen4_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen4.xml
>  $(intermediates)/genxml/gen4_pack.h: $(LOCAL_PATH)/genxml/gen4.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen45_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen45_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen45.xml
>  $(intermediates)/genxml/gen45_pack.h: $(LOCAL_PATH)/genxml/gen45.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen5_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen5_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen5.xml
>  $(intermediates)/genxml/gen5_pack.h: $(LOCAL_PATH)/genxml/gen5.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen6_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen6_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen6.xml
>  $(intermediates)/genxml/gen6_pack.h: $(LOCAL_PATH)/genxml/gen6.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen7_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen7_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen7.xml
>  $(intermediates)/genxml/gen7_pack.h: $(LOCAL_PATH)/genxml/gen7.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen75_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen75_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen75.xml
>  $(intermediates)/genxml/gen75_pack.h: $(LOCAL_PATH)/genxml/gen75.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen8_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen8_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen8.xml
>  $(intermediates)/genxml/gen8_pack.h: $(LOCAL_PATH)/genxml/gen8.xml
> $(LOCAL_PATH)/genxml/gen_pack_header.py
> - $(call header-gen)
> + $(call pack-header-gen)
>
>  $(intermediates)/genxml/gen9_pack.h: PRIVATE_SCRIPT :=
> $(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
>  $(intermediates)/genxml/gen9_pack.h: PRIVATE_XML :=
> $(LOCAL_PATH)/genxml/gen9.xml
>  $(intermediates)/genxml/gen9_pack.h: $(LOCAL_PATH)/genxml/gen9.xml
> 

[Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

2016-10-09 Thread Dave Airlie
From: Dave Airlie 

This is a bit of a hack due to how llvm currently handles
spilling in it's shader ABI. Currently llvm amdgpu backend
uses relocations to patch the shader with the address of
the tmpring. The driver loads the shader and patches the
relocations.

However for vulkan this doesn't work so well for a few reasons
a) when we build/load the shaders we aren't constructing the
command stream yet, and the same shader could be using in multiple
command streams.

b) multiple command execution engines for compute shaders.

So ideally we'd fix LLVM to understand the ABI convention, possibly
we'd fix it so user sgpr 0,1 are used (this hack uses 10/11).

This patch when it gets the shader back from llvm it patches
the relocation dword to a nop, and patches to previous mov command
to move from SGPR 10 and 11. This works usually as it seems the
SGPR loading of the spill stuff is at the start of shaders always
so the 10/11 user sgprs haven't been trashed yet. I'm not 100%
sure this will work all the time, but for now this should allow
us to pass a bunch more CTS tests and make the Sascha computeshader
demo to work.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c  |  38 +++
 src/amd/common/ac_nir_to_llvm.h  |  11 +++--
 src/amd/vulkan/radv_cmd_buffer.c | 102 ++-
 src/amd/vulkan/radv_device.c |  16 ++
 src/amd/vulkan/radv_private.h|   9 
 5 files changed, 170 insertions(+), 6 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 0e70114..4371451 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -343,6 +343,10 @@ static void create_function(struct nir_to_llvm_context 
*ctx,
arg_types[arg_idx++] = const_array(ctx->i8, 1024 * 1024);
 
array_count = arg_idx;
+
+   /* scratch address */
+   arg_types[arg_idx++] = LLVMVectorType(ctx->i32, 2); /* address of 
scratch */
+
switch (nir->stage) {
case MESA_SHADER_COMPUTE:
arg_types[arg_idx++] = LLVMVectorType(ctx->i32, 3); /* grid 
size */
@@ -416,6 +420,7 @@ static void create_function(struct nir_to_llvm_context *ctx,
 
ctx->push_constants = LLVMGetParam(ctx->main_function, arg_idx++);
 
+   arg_idx++; /* scratch address */
switch (nir->stage) {
case MESA_SHADER_COMPUTE:
ctx->num_work_groups =
@@ -4428,6 +4433,37 @@ static void ac_diagnostic_handler(LLVMDiagnosticInfoRef 
di, void *context)
LLVMDisposeMessage(description);
 }
 
+static const char *scratch_rsrc_dword0_symbol =
+   "SCRATCH_RSRC_DWORD0";
+
+static const char *scratch_rsrc_dword1_symbol =
+   "SCRATCH_RSRC_DWORD1";
+
+static void ac_apply_scratch_relocs(struct ac_shader_binary *binary, int sgpr)
+{
+   uint32_t nop_val = 0xbf80;
+   unsigned i;
+   for (i = 0 ; i < binary->reloc_count; i++) {
+   const struct ac_shader_reloc *reloc =
+   >relocs[i];
+   if (!strcmp(scratch_rsrc_dword0_symbol, reloc->name)) {
+   uint32_t reg_val = *(uint32_t *)(binary->code + 
reloc->offset - 4);
+   reg_val &= 0xff00;
+   reg_val |= sgpr;
+   *(uint32_t *)(binary->code + reloc->offset - 4) = 
reg_val;
+   util_memcpy_cpu_to_le32(binary->code + reloc->offset,
+   _val, 4);
+   } else if (!strcmp(scratch_rsrc_dword1_symbol, reloc->name)) {
+   uint32_t reg_val = *(uint32_t *)(binary->code + 
reloc->offset - 4);
+   reg_val &= 0xff00;
+   reg_val |= (sgpr + 1);
+   *(uint32_t *)(binary->code + reloc->offset - 4) = 
reg_val;
+   util_memcpy_cpu_to_le32(binary->code + reloc->offset,
+   _val, 4);
+   }
+   }
+}
+
 static unsigned ac_llvm_compile(LLVMModuleRef M,
 struct ac_shader_binary *binary,
 LLVMTargetMachineRef tm)
@@ -4467,6 +4503,8 @@ static unsigned ac_llvm_compile(LLVMModuleRef M,
/* Clean up */
LLVMDisposeMemoryBuffer(out_buffer);
 
+   /* patch for vulkan scratch rsrc */
+   ac_apply_scratch_relocs(binary, AC_USERDATA_SCRATCH_OFFSET);
 out:
return retval;
 }
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index a17caf2..836e38a 100644
--- a/src/amd/common/ac_nir_to_llvm.h
+++ b/src/amd/common/ac_nir_to_llvm.h
@@ -104,14 +104,15 @@ void ac_compile_nir_shader(LLVMTargetMachineRef tm,
 #define AC_USERDATA_DESCRIPTOR_SET_2 4
 #define AC_USERDATA_DESCRIPTOR_SET_3 6
 #define AC_USERDATA_PUSH_CONST_DYN 8
+#define AC_USERDATA_SCRATCH_OFFSET 10
 
-#define AC_USERDATA_VS_VERTEX_BUFFERS 10
-#define AC_USERDATA_VS_BASE_VERTEX 12
-#define 

[Mesa-dev] [PATCH 1/2] radv: start using defines for the user sgpr offsets

2016-10-09 Thread Dave Airlie
From: Dave Airlie 

This adds some comments and adds defines for the user sgprs,
so that we can move them around easier later and not have
to change/revalidate every one of these.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c  |  7 +--
 src/amd/common/ac_nir_to_llvm.h  | 17 +
 src/amd/vulkan/radv_cmd_buffer.c | 24 
 3 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index e6ff7c8..0e70114 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -334,9 +334,12 @@ static void create_function(struct nir_to_llvm_context 
*ctx,
unsigned array_count = 0;
unsigned sgpr_count = 0, user_sgpr_count;
unsigned i;
+
+   /* 1 for each descriptor set */
for (unsigned i = 0; i < 4; ++i)
arg_types[arg_idx++] = const_array(ctx->i8, 1024 * 1024);
 
+   /* 1 for push constants and dynamic descriptors */
arg_types[arg_idx++] = const_array(ctx->i8, 1024 * 1024);
 
array_count = arg_idx;
@@ -351,7 +354,7 @@ static void create_function(struct nir_to_llvm_context *ctx,
arg_types[arg_idx++] = LLVMVectorType(ctx->i32, 3);
break;
case MESA_SHADER_VERTEX:
-   arg_types[arg_idx++] = const_array(ctx->v16i8, 16);
+   arg_types[arg_idx++] = const_array(ctx->v16i8, 16); /* vertex 
buffers */
arg_types[arg_idx++] = ctx->i32; // base vertex
arg_types[arg_idx++] = ctx->i32; // start instance
user_sgpr_count = sgpr_count = arg_idx;
@@ -361,7 +364,7 @@ static void create_function(struct nir_to_llvm_context *ctx,
arg_types[arg_idx++] = ctx->i32; // instance id
break;
case MESA_SHADER_FRAGMENT:
-   arg_types[arg_idx++] = const_array(ctx->f32, 32);
+   arg_types[arg_idx++] = const_array(ctx->f32, 32); /* sample 
positions */
user_sgpr_count = arg_idx;
arg_types[arg_idx++] = ctx->i32; /* prim mask */
sgpr_count = arg_idx;
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index 550fe84..a17caf2 100644
--- a/src/amd/common/ac_nir_to_llvm.h
+++ b/src/amd/common/ac_nir_to_llvm.h
@@ -96,6 +96,23 @@ void ac_compile_nir_shader(LLVMTargetMachineRef tm,
const struct ac_nir_compiler_options *options,
   bool dump_shader);
 
+/* SHADER ABI defines */
+
+/* offset in dwords */
+#define AC_USERDATA_DESCRIPTOR_SET_0 0
+#define AC_USERDATA_DESCRIPTOR_SET_1 2
+#define AC_USERDATA_DESCRIPTOR_SET_2 4
+#define AC_USERDATA_DESCRIPTOR_SET_3 6
+#define AC_USERDATA_PUSH_CONST_DYN 8
+
+#define AC_USERDATA_VS_VERTEX_BUFFERS 10
+#define AC_USERDATA_VS_BASE_VERTEX 12
+#define AC_USERDATA_VS_START_INSTANCE 13
+
+#define AC_USERDATA_PS_SAMPLE_POS 10
+
+#define AC_USERDATA_CS_GRID_SIZE 10
+
 #ifdef __cplusplus
 extern "C"
 #endif
diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index e3e9e32..d97b29b 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -327,7 +327,7 @@ radv_update_multisample_state(struct radv_cmd_buffer 
*cmd_buffer,
uint64_t va = 
cmd_buffer->device->ws->buffer_get_va(cmd_buffer->upload.upload_bo);
va += samples_offset;
 
-   radeon_set_sh_reg_seq(cmd_buffer->cs, 
R_00B030_SPI_SHADER_USER_DATA_PS_0 + 10 * 4, 2);
+   radeon_set_sh_reg_seq(cmd_buffer->cs, 
R_00B030_SPI_SHADER_USER_DATA_PS_0 + AC_USERDATA_PS_SAMPLE_POS * 4, 2);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
 }
@@ -919,21 +919,21 @@ radv_flush_constants(struct radv_cmd_buffer *cmd_buffer,
 
if (stages & VK_SHADER_STAGE_VERTEX_BIT) {
radeon_set_sh_reg_seq(cmd_buffer->cs,
- R_00B130_SPI_SHADER_USER_DATA_VS_0 + 8 * 
4, 2);
+ R_00B130_SPI_SHADER_USER_DATA_VS_0 + 
AC_USERDATA_PUSH_CONST_DYN * 4, 2);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
}
 
if (stages & VK_SHADER_STAGE_FRAGMENT_BIT) {
radeon_set_sh_reg_seq(cmd_buffer->cs,
- R_00B030_SPI_SHADER_USER_DATA_PS_0 + 8 * 
4, 2);
+ R_00B030_SPI_SHADER_USER_DATA_PS_0 + 
AC_USERDATA_PUSH_CONST_DYN * 4, 2);
radeon_emit(cmd_buffer->cs, va);
radeon_emit(cmd_buffer->cs, va >> 32);
}
 
if (stages & VK_SHADER_STAGE_COMPUTE_BIT) {
radeon_set_sh_reg_seq(cmd_buffer->cs,
- R_00B900_COMPUTE_USER_DATA_0 + 8 * 4, 2);
+ R_00B900_COMPUTE_USER_DATA_0 + 

[Mesa-dev] radv shader spilling support

2016-10-09 Thread Dave Airlie
This is a bit of a workaround to how llvm does spilling, and
we should fix llvm, I'm just not good enough yet, and this will
get us further with CTS tests for now.

Details in patch 2.

Dave.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] nvc0: enable ARB_enhanced_layouts

2016-10-09 Thread Edward O'Callaghan
Reviewed-by: Edward O'Callaghan 

On 10/09/2016 10:48 PM, Samuel Pitoiset wrote:
> All ARB_enhanced_layouts piglit tests pass without any changes
> in our compiler.
> 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index c8b0d41..008ef87 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -240,6 +240,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
> case PIPE_CAP_TGSI_VOTE:
> case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
> +   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
>return 1;
> case PIPE_CAP_COMPUTE:
>return (class_3d < GP100_3D_CLASS);
> @@ -272,7 +273,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_PCI_DEVICE:
> case PIPE_CAP_PCI_FUNCTION:
> case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
> -   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
>return 0;
>  
> case PIPE_CAP_VENDOR_ID:
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/nine: Memset pipe_resource templates

2016-10-09 Thread Edward O'Callaghan
Acked-by: Edward O'Callaghan 

On 10/09/2016 11:26 PM, Axel Davy wrote:
> Fixes regression introduced by
> ecd6fce2611e88ff8468a354cff8eda39f260a31
> and is more future proof than just clearing the next
> field.
> 
> Other nine usages did already zero out the templates.
> 
> Signed-off-by: Axel Davy 
> ---
>  src/gallium/state_trackers/nine/device9.c| 6 ++
>  src/gallium/state_trackers/nine/swapchain9.c | 2 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/src/gallium/state_trackers/nine/device9.c 
> b/src/gallium/state_trackers/nine/device9.c
> index 230ad02..68d2185 100644
> --- a/src/gallium/state_trackers/nine/device9.c
> +++ b/src/gallium/state_trackers/nine/device9.c
> @@ -245,6 +245,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
>  struct pipe_box box;
>  unsigned char *data;
>  
> +memset(, 0, sizeof(tmpl));
>  tmpl.target = PIPE_BUFFER;
>  tmpl.format = PIPE_FORMAT_R8_UNORM;
>  tmpl.width0 = 16; /* 4 floats */
> @@ -277,6 +278,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
>  This->cursor.hotspot.y = -1;
>  {
>  struct pipe_resource tmpl;
> +memset(, 0, sizeof(tmpl));
>  tmpl.target = PIPE_TEXTURE_2D;
>  tmpl.format = PIPE_FORMAT_R8G8B8A8_UNORM;
>  tmpl.width0 = 64;
> @@ -298,6 +300,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
>  {
>  struct pipe_resource tmpl;
>  unsigned max_const_vs, max_const_ps;
> +memset(, 0, sizeof(tmpl));
>  
>  /* vs 3.0: >= 256 float constants, but for cards with exactly 256 
> slots,
>   * we have to take in some more slots for int and bool*/
> @@ -359,6 +362,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
>  struct pipe_resource tmplt;
>  struct pipe_sampler_view templ;
>  struct pipe_sampler_state samp;
> +memset(, 0, sizeof(tmplt));
>  memset(, 0, sizeof(samp));
>  
>  tmplt.target = PIPE_TEXTURE_2D;
> @@ -1119,6 +1123,7 @@ create_zs_or_rt_surface(struct NineDevice9 *This,
>  user_assert(Width && Height, D3DERR_INVALIDCALL);
>  user_assert(Pool != D3DPOOL_MANAGED, D3DERR_INVALIDCALL);
>  
> +memset(, 0, sizeof(templ));
>  templ.target = PIPE_TEXTURE_2D;
>  templ.width0 = Width;
>  templ.height0 = Height;
> @@ -3183,6 +3188,7 @@ NineDevice9_ProcessVertices( struct NineDevice9 *This,
>  if (1) {
>  struct pipe_resource templ;
>  
> +memset(, 0, sizeof(templ));
>  templ.target = PIPE_BUFFER;
>  templ.format = PIPE_FORMAT_R8_UNORM;
>  templ.width0 = buffer_size;
> diff --git a/src/gallium/state_trackers/nine/swapchain9.c 
> b/src/gallium/state_trackers/nine/swapchain9.c
> index 79fba28..838c248 100644
> --- a/src/gallium/state_trackers/nine/swapchain9.c
> +++ b/src/gallium/state_trackers/nine/swapchain9.c
> @@ -228,6 +228,7 @@ NineSwapChain9_Resize( struct NineSwapChain9 *This,
>   * For example 16 bits.*/
>  depth = 24;
>  
> +memset(, 0, sizeof(tmplt));
>  tmplt.target = PIPE_TEXTURE_2D;
>  tmplt.width0 = pParams->BackBufferWidth;
>  tmplt.height0 = pParams->BackBufferHeight;
> @@ -537,6 +538,7 @@ create_present_buffer( struct NineSwapChain9 *This,
>  {
>  struct pipe_resource tmplt;
>  
> +memset(, 0, sizeof(tmplt));
>  tmplt.target = PIPE_TEXTURE_2D;
>  tmplt.width0 = width;
>  tmplt.height0 = height;
> 



signature.asc
Description: OpenPGP digital signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nvc0: fix valid range for shader buffers

2016-10-09 Thread Samuel Pitoiset
When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_compute.c| 1 +
 src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c | 1 +
 src/gallium/drivers/nouveau/nvc0/nve4_compute.c| 1 +
 3 files changed, 3 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
index dc4d1b3..11635c9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_compute.c
@@ -288,6 +288,7 @@ nvc0_compute_validate_buffers(struct nvc0_context *nvc0)
  BCTX_REFN(nvc0->bufctx_cp, CP_BUF, res, RDWR);
  util_range_add(>valid_buffer_range,
 nvc0->buffers[s][i].buffer_offset,
+nvc0->buffers[s][i].buffer_offset +
 nvc0->buffers[s][i].buffer_size);
   } else {
  PUSH_DATA (push, 0);
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
index 6683795..025f4e3 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state_validate.c
@@ -561,6 +561,7 @@ nvc0_validate_buffers(struct nvc0_context *nvc0)
 BCTX_REFN(nvc0->bufctx_3d, 3D_BUF, res, RDWR);
 util_range_add(>valid_buffer_range,
nvc0->buffers[s][i].buffer_offset,
+   nvc0->buffers[s][i].buffer_offset +
nvc0->buffers[s][i].buffer_size);
  } else {
 PUSH_DATA (push, 0);
diff --git a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c 
b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
index e85e9b4..d661c00 100644
--- a/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
+++ b/src/gallium/drivers/nouveau/nvc0/nve4_compute.c
@@ -444,6 +444,7 @@ nve4_compute_validate_buffers(struct nvc0_context *nvc0)
  BCTX_REFN(nvc0->bufctx_cp, CP_BUF, res, RDWR);
  util_range_add(>valid_buffer_range,
 nvc0->buffers[s][i].buffer_offset,
+nvc0->buffers[s][i].buffer_offset +
 nvc0->buffers[s][i].buffer_size);
   } else {
  PUSH_DATA (push, 0);
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] ddebug: add missing pipe_context::clear_texture()

2016-10-09 Thread Nicolai Hähnle

On 09.10.2016 21:19, Samuel Pitoiset wrote:

This fixes a crash while replaying a trace from F1 2015.


I think clear_texture should be handled in dd_draw.c with the draw and 
other clear functions.


Nicolai



Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/ddebug/dd_context.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/ddebug/dd_context.c 
b/src/gallium/drivers/ddebug/dd_context.c
index edcbf2c..5e4cebe 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -748,6 +748,16 @@ dd_context_dump_debug_state(struct pipe_context *_pipe, 
FILE *stream,
return pipe->dump_debug_state(pipe, stream, flags);
 }

+static void
+dd_context_clear_texture(struct pipe_context *_pipe, struct pipe_resource *res,
+ unsigned level, const struct pipe_box *box,
+ const void *data)
+{
+   struct pipe_context *pipe = dd_context(_pipe)->pipe;
+
+   pipe->clear_texture(pipe, res, level, box, data);
+}
+
 struct pipe_context *
 dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
 {
@@ -847,6 +857,7 @@ dd_context_create(struct dd_screen *dscreen, struct 
pipe_context *pipe)
CTX_INIT(set_device_reset_callback);
CTX_INIT(dump_debug_state);
CTX_INIT(emit_string_marker);
+   CTX_INIT(clear_texture);

dd_init_draw_functions(dctx);



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

2016-10-09 Thread Karol Herbst
2016-10-09 21:34 GMT+02:00 Ilia Mirkin :
> On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst  wrote:
>> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset :
>>>
>>>
>>> On 10/08/2016 10:04 PM, Karol Herbst wrote:

 looks great, a few comments below
>>>
>>>
>>> Thanks!
>>>

 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset :
>
> total instructions in shared programs :2286901 -> 2284473 (-0.11%)
> total gprs used in shared programs:335256 -> 335273 (0.01%)
> total local used in shared programs   :31968 -> 31968 (0.00%)
>
> localgpr   inst  bytes
> helped   0  41 852 852
>   hurt   0  44  23  23
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94
> ++
>  1 file changed, 94 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 6efb29e..caf5d1d 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>
>  //
> =
>
> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
> +class LateAlgebraicOpt : public Pass
> +{
> +private:
> +   virtual bool visit(BasicBlock *);
> +
> +   void handleADD(Instruction *);
> +   bool tryADDToSHLADD(Instruction *);
> +
> +   BuildUtil bld;
> +};
> +
> +void
> +LateAlgebraicOpt::handleADD(Instruction *add)
> +{
> +   Value *src0 = add->getSrc(0);
> +   Value *src1 = add->getSrc(1);
> +
> +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
> +  return;
> +
> +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
> +  tryADDToSHLADD(add);
> +}
> +
> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
> +bool
> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
> +{
> +   Value *src0 = add->getSrc(0);
> +   Value *src1 = add->getSrc(1);
> +   Modifier mod[2];
> +   Value *src;
> +   int s;
> +
> +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
> +  return false;
> +
> +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
> +  s = 0;
> +   else
> +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
> +  s = 1;
> +   else
> +  return false;
> +
> +   src = add->getSrc(s);
> +
> +   if (src->getUniqueInsn()->bb != add->bb)
> +  return false;


 is there a reason to check for the bb here?
>>>
>>>
>>> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie.
>>> inside the same basic block). I just apply the same rule here.
>>>

> +
> +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
> +  return false;
> +
> +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
> +  return false;
> +
> +   mod[0] = add->src(0).mod;
> +   mod[1] = add->src(1).mod;
> +
> +   add->op = OP_SHLADD;
> +   add->setSrc(2, add->src(s ? 0 : 1));


 I usually use "s ^ 1" instead.
>>>
>>>
>>> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).
>>>

> +   add->src(2).mod = mod[s];
> +
> +   add->setSrc(0, src->getInsn()->getSrc(0));
> +   add->src(0).mod = mod[0];
> +   add->setSrc(1, src->getInsn()->getSrc(1));
> +   add->src(1).mod = Modifier(0);
> +
> +   return true;
> +}
> +
> +bool
> +LateAlgebraicOpt::visit(BasicBlock *bb)
> +{
> +   Instruction *next;
> +
> +   for (Instruction *i = bb->getEntry(); i; i = next) {
> +  next = i->next;
> +  switch (i->op) {
> +  case OP_ADD:
> + handleADD(i);
> + break;
> +  default:
> + break;
> +  }
> +   }


 you can also just implement visit(Instruction *) and return true; then
 you won't need that silly loop.
>>>
>>>
>>> True.
>>>

> +
> +   return true;
> +}
> +
> +//
> =
> +
>  static inline void
>  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>  {
> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
> RUN_PASS(2, AlgebraicOpt, run);
> RUN_PASS(2, ModifierFolding, run); // before load propagation -> less
> checks
> 

Re: [Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

2016-10-09 Thread Ilia Mirkin
On Sun, Oct 9, 2016 at 3:28 PM, Karol Herbst  wrote:
> 2016-10-09 13:58 GMT+02:00 Samuel Pitoiset :
>>
>>
>> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>>
>>> looks great, a few comments below
>>
>>
>> Thanks!
>>
>>>
>>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset :

 total instructions in shared programs :2286901 -> 2284473 (-0.11%)
 total gprs used in shared programs:335256 -> 335273 (0.01%)
 total local used in shared programs   :31968 -> 31968 (0.00%)

 localgpr   inst  bytes
 helped   0  41 852 852
   hurt   0  44  23  23

 Signed-off-by: Samuel Pitoiset 
 ---
  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94
 ++
  1 file changed, 94 insertions(+)

 diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
 b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
 index 6efb29e..caf5d1d 100644
 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
 +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
 @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)

  //
 =

 +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
 +class LateAlgebraicOpt : public Pass
 +{
 +private:
 +   virtual bool visit(BasicBlock *);
 +
 +   void handleADD(Instruction *);
 +   bool tryADDToSHLADD(Instruction *);
 +
 +   BuildUtil bld;
 +};
 +
 +void
 +LateAlgebraicOpt::handleADD(Instruction *add)
 +{
 +   Value *src0 = add->getSrc(0);
 +   Value *src1 = add->getSrc(1);
 +
 +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
 +  return;
 +
 +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
 +  tryADDToSHLADD(add);
 +}
 +
 +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
 +bool
 +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
 +{
 +   Value *src0 = add->getSrc(0);
 +   Value *src1 = add->getSrc(1);
 +   Modifier mod[2];
 +   Value *src;
 +   int s;
 +
 +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
 +  return false;
 +
 +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
 +  s = 0;
 +   else
 +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
 +  s = 1;
 +   else
 +  return false;
 +
 +   src = add->getSrc(s);
 +
 +   if (src->getUniqueInsn()->bb != add->bb)
 +  return false;
>>>
>>>
>>> is there a reason to check for the bb here?
>>
>>
>> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie.
>> inside the same basic block). I just apply the same rule here.
>>
>>>
 +
 +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
 +  return false;
 +
 +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
 +  return false;
 +
 +   mod[0] = add->src(0).mod;
 +   mod[1] = add->src(1).mod;
 +
 +   add->op = OP_SHLADD;
 +   add->setSrc(2, add->src(s ? 0 : 1));
>>>
>>>
>>> I usually use "s ^ 1" instead.
>>
>>
>> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).
>>
>>>
 +   add->src(2).mod = mod[s];
 +
 +   add->setSrc(0, src->getInsn()->getSrc(0));
 +   add->src(0).mod = mod[0];
 +   add->setSrc(1, src->getInsn()->getSrc(1));
 +   add->src(1).mod = Modifier(0);
 +
 +   return true;
 +}
 +
 +bool
 +LateAlgebraicOpt::visit(BasicBlock *bb)
 +{
 +   Instruction *next;
 +
 +   for (Instruction *i = bb->getEntry(); i; i = next) {
 +  next = i->next;
 +  switch (i->op) {
 +  case OP_ADD:
 + handleADD(i);
 + break;
 +  default:
 + break;
 +  }
 +   }
>>>
>>>
>>> you can also just implement visit(Instruction *) and return true; then
>>> you won't need that silly loop.
>>
>>
>> True.
>>
>>>
 +
 +   return true;
 +}
 +
 +//
 =
 +
  static inline void
  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
  {
 @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
 RUN_PASS(2, AlgebraicOpt, run);
 RUN_PASS(2, ModifierFolding, run); // before load propagation -> less
 checks
 RUN_PASS(1, ConstantFolding, foldAll);
 +   RUN_PASS(2, LateAlgebraicOpt, run);
>>>
>>>
>>> What is the reason to add a new AlgebraicOpt pass for this?
>>
>>
>> Because a bunch of optimizations are applied in ConstantFolding 

Re: [Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

2016-10-09 Thread Karol Herbst
2016-10-09 13:58 GMT+02:00 Samuel Pitoiset :
>
>
> On 10/08/2016 10:04 PM, Karol Herbst wrote:
>>
>> looks great, a few comments below
>
>
> Thanks!
>
>>
>> 2016-10-08 21:55 GMT+02:00 Samuel Pitoiset :
>>>
>>> total instructions in shared programs :2286901 -> 2284473 (-0.11%)
>>> total gprs used in shared programs:335256 -> 335273 (0.01%)
>>> total local used in shared programs   :31968 -> 31968 (0.00%)
>>>
>>> localgpr   inst  bytes
>>> helped   0  41 852 852
>>>   hurt   0  44  23  23
>>>
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94
>>> ++
>>>  1 file changed, 94 insertions(+)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 6efb29e..caf5d1d 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>>
>>>  //
>>> =
>>>
>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>> +class LateAlgebraicOpt : public Pass
>>> +{
>>> +private:
>>> +   virtual bool visit(BasicBlock *);
>>> +
>>> +   void handleADD(Instruction *);
>>> +   bool tryADDToSHLADD(Instruction *);
>>> +
>>> +   BuildUtil bld;
>>> +};
>>> +
>>> +void
>>> +LateAlgebraicOpt::handleADD(Instruction *add)
>>> +{
>>> +   Value *src0 = add->getSrc(0);
>>> +   Value *src1 = add->getSrc(1);
>>> +
>>> +   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
>>> +  return;
>>> +
>>> +   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
>>> +  tryADDToSHLADD(add);
>>> +}
>>> +
>>> +// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
>>> +bool
>>> +LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
>>> +{
>>> +   Value *src0 = add->getSrc(0);
>>> +   Value *src1 = add->getSrc(1);
>>> +   Modifier mod[2];
>>> +   Value *src;
>>> +   int s;
>>> +
>>> +   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
>>> +  return false;
>>> +
>>> +   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
>>> +  s = 0;
>>> +   else
>>> +   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
>>> +  s = 1;
>>> +   else
>>> +  return false;
>>> +
>>> +   src = add->getSrc(s);
>>> +
>>> +   if (src->getUniqueInsn()->bb != add->bb)
>>> +  return false;
>>
>>
>> is there a reason to check for the bb here?
>
>
> Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie.
> inside the same basic block). I just apply the same rule here.
>
>>
>>> +
>>> +   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
>>> +  return false;
>>> +
>>> +   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
>>> +  return false;
>>> +
>>> +   mod[0] = add->src(0).mod;
>>> +   mod[1] = add->src(1).mod;
>>> +
>>> +   add->op = OP_SHLADD;
>>> +   add->setSrc(2, add->src(s ? 0 : 1));
>>
>>
>> I usually use "s ^ 1" instead.
>
>
> Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).
>
>>
>>> +   add->src(2).mod = mod[s];
>>> +
>>> +   add->setSrc(0, src->getInsn()->getSrc(0));
>>> +   add->src(0).mod = mod[0];
>>> +   add->setSrc(1, src->getInsn()->getSrc(1));
>>> +   add->src(1).mod = Modifier(0);
>>> +
>>> +   return true;
>>> +}
>>> +
>>> +bool
>>> +LateAlgebraicOpt::visit(BasicBlock *bb)
>>> +{
>>> +   Instruction *next;
>>> +
>>> +   for (Instruction *i = bb->getEntry(); i; i = next) {
>>> +  next = i->next;
>>> +  switch (i->op) {
>>> +  case OP_ADD:
>>> + handleADD(i);
>>> + break;
>>> +  default:
>>> + break;
>>> +  }
>>> +   }
>>
>>
>> you can also just implement visit(Instruction *) and return true; then
>> you won't need that silly loop.
>
>
> True.
>
>>
>>> +
>>> +   return true;
>>> +}
>>> +
>>> +//
>>> =
>>> +
>>>  static inline void
>>>  updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
>>>  {
>>> @@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
>>> RUN_PASS(2, AlgebraicOpt, run);
>>> RUN_PASS(2, ModifierFolding, run); // before load propagation -> less
>>> checks
>>> RUN_PASS(1, ConstantFolding, foldAll);
>>> +   RUN_PASS(2, LateAlgebraicOpt, run);
>>
>>
>> What is the reason to add a new AlgebraicOpt pass for this?
>
>
> Because a bunch of optimizations are applied in ConstantFolding which means
> this one has to be done *after*. My first attempt was to do it in
> AlgebraicOpt, but it's actually not the best place. :-)
>

Yeah but this applies to many other opts as well. I think in the end
we need to run those passes inside a loop 

[Mesa-dev] [PATCH] ddebug: add missing pipe_context::clear_texture()

2016-10-09 Thread Samuel Pitoiset
This fixes a crash while replaying a trace from F1 2015.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/ddebug/dd_context.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/ddebug/dd_context.c 
b/src/gallium/drivers/ddebug/dd_context.c
index edcbf2c..5e4cebe 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -748,6 +748,16 @@ dd_context_dump_debug_state(struct pipe_context *_pipe, 
FILE *stream,
return pipe->dump_debug_state(pipe, stream, flags);
 }
 
+static void
+dd_context_clear_texture(struct pipe_context *_pipe, struct pipe_resource *res,
+ unsigned level, const struct pipe_box *box,
+ const void *data)
+{
+   struct pipe_context *pipe = dd_context(_pipe)->pipe;
+
+   pipe->clear_texture(pipe, res, level, box, data);
+}
+
 struct pipe_context *
 dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
 {
@@ -847,6 +857,7 @@ dd_context_create(struct dd_screen *dscreen, struct 
pipe_context *pipe)
CTX_INIT(set_device_reset_callback);
CTX_INIT(dump_debug_state);
CTX_INIT(emit_string_marker);
+   CTX_INIT(clear_texture);
 
dd_init_draw_functions(dctx);
 
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98172] Concurrent call to glClientWaitSync results in segfault in one of the waiters.

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98172

shinji.suz...@gmail.com changed:

   What|Removed |Added

   Assignee|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
   |.org|org
 QA Contact|dri-devel@lists.freedesktop |mesa-dev@lists.freedesktop.
   |.org|org
  Component|Drivers/Gallium/r600|Mesa core
 CC||shinji.suz...@gmail.com

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50/ir: only stick one preret per function

2016-10-09 Thread Ilia Mirkin
On Sun, Oct 9, 2016 at 7:53 AM, Samuel Pitoiset
 wrote:
>
>
> On 10/09/2016 06:12 AM, Ilia Mirkin wrote:
>>
>> A function with multiple returns would have had multiple preret settings
>> at the top of the function. While this is unlikely to have caused issues
>> since we don't use funcitons in earnest, it could have in some cases
>
>
> s/funcitons/functions/ :)
>
> This looks good to me, but fyi this doesn't fix the regressions introduced
> with "nv50/ir: start LocalCSE with getFirst to merge PHI instructions".
> Something else is probably wrong.

I don't think I ever suggested it would. Just something I happened to
notice when looking at the compiled output of a shader. The only case
it could possibly fix is if there were >N preret's, which would
overflow the call stack and probably cause an error.

>
> Reviewed-by: Samuel Pitoiset 

Thanks!

>
>
>> overflowed the call stack, in case a function had a lot of early
>> returns.
>>
>> Signed-off-by: Ilia Mirkin 
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 11
>> +++
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> index 5938226..6664366 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> @@ -3475,10 +3475,13 @@ Converter::handleInstruction(const struct
>> tgsi_full_instruction *insn)
>>if (!isEndOfSubroutine(ip + 1)) {
>>   // insert a PRERET at the entry if this is an early return
>>   // (only needed for sharing code in the epilogue)
>> - BasicBlock *pos = getBB();
>> - setPosition(BasicBlock::get(func->cfg.getRoot()), false);
>> - mkFlow(OP_PRERET, leave, CC_ALWAYS, NULL)->fixed = 1;
>> - setPosition(pos, true);
>> + BasicBlock *root = BasicBlock::get(func->cfg.getRoot());
>> + if (root->getEntry() != NULL && root->getEntry()->op !=
>> OP_PRERET) {
>> +BasicBlock *pos = getBB();
>> +setPosition(root, false);
>> +mkFlow(OP_PRERET, leave, CC_ALWAYS, NULL)->fixed = 1;
>> +setPosition(pos, true);
>> + }
>>}
>>mkFlow(OP_RET, NULL, CC_ALWAYS, NULL)->fixed = 1;
>>bb->cfg.attach(>cfg, Graph::Edge::CROSS);
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [Bug 98169] lm_sensors hud option crashes unigine heaven

2016-10-09 Thread Steven Toth
> Comment # 1 on bug 98169 from Christoph Haag
>
> I also noticed that the cpufrequency graphs (cpufreq-cur-cpu0) always show 0
> in
> unigine heaven.
>
> They work correctly in glxgears, so perhaps it's caused by the same thing
> unigine heaven does.

I'll open a bugzilla account tomorrow (monday) and repo/remedy this issue.

Thanks for raising it.

-- 
Steven Toth - Kernel Labs
http://www.kernellabs.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98169] lm_sensors hud option crashes unigine heaven

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98169

--- Comment #1 from Christoph Haag  ---
I also noticed that the cpufrequency graphs (cpufreq-cur-cpu0) always show 0 in
unigine heaven.

They work correctly in glxgears, so perhaps it's caused by the same thing
unigine heaven does.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/nine: Memset pipe_resource templates

2016-10-09 Thread Axel Davy
Fixes regression introduced by
ecd6fce2611e88ff8468a354cff8eda39f260a31
and is more future proof than just clearing the next
field.

Other nine usages did already zero out the templates.

Signed-off-by: Axel Davy 
---
 src/gallium/state_trackers/nine/device9.c| 6 ++
 src/gallium/state_trackers/nine/swapchain9.c | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/src/gallium/state_trackers/nine/device9.c 
b/src/gallium/state_trackers/nine/device9.c
index 230ad02..68d2185 100644
--- a/src/gallium/state_trackers/nine/device9.c
+++ b/src/gallium/state_trackers/nine/device9.c
@@ -245,6 +245,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
 struct pipe_box box;
 unsigned char *data;
 
+memset(, 0, sizeof(tmpl));
 tmpl.target = PIPE_BUFFER;
 tmpl.format = PIPE_FORMAT_R8_UNORM;
 tmpl.width0 = 16; /* 4 floats */
@@ -277,6 +278,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
 This->cursor.hotspot.y = -1;
 {
 struct pipe_resource tmpl;
+memset(, 0, sizeof(tmpl));
 tmpl.target = PIPE_TEXTURE_2D;
 tmpl.format = PIPE_FORMAT_R8G8B8A8_UNORM;
 tmpl.width0 = 64;
@@ -298,6 +300,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
 {
 struct pipe_resource tmpl;
 unsigned max_const_vs, max_const_ps;
+memset(, 0, sizeof(tmpl));
 
 /* vs 3.0: >= 256 float constants, but for cards with exactly 256 
slots,
  * we have to take in some more slots for int and bool*/
@@ -359,6 +362,7 @@ NineDevice9_ctor( struct NineDevice9 *This,
 struct pipe_resource tmplt;
 struct pipe_sampler_view templ;
 struct pipe_sampler_state samp;
+memset(, 0, sizeof(tmplt));
 memset(, 0, sizeof(samp));
 
 tmplt.target = PIPE_TEXTURE_2D;
@@ -1119,6 +1123,7 @@ create_zs_or_rt_surface(struct NineDevice9 *This,
 user_assert(Width && Height, D3DERR_INVALIDCALL);
 user_assert(Pool != D3DPOOL_MANAGED, D3DERR_INVALIDCALL);
 
+memset(, 0, sizeof(templ));
 templ.target = PIPE_TEXTURE_2D;
 templ.width0 = Width;
 templ.height0 = Height;
@@ -3183,6 +3188,7 @@ NineDevice9_ProcessVertices( struct NineDevice9 *This,
 if (1) {
 struct pipe_resource templ;
 
+memset(, 0, sizeof(templ));
 templ.target = PIPE_BUFFER;
 templ.format = PIPE_FORMAT_R8_UNORM;
 templ.width0 = buffer_size;
diff --git a/src/gallium/state_trackers/nine/swapchain9.c 
b/src/gallium/state_trackers/nine/swapchain9.c
index 79fba28..838c248 100644
--- a/src/gallium/state_trackers/nine/swapchain9.c
+++ b/src/gallium/state_trackers/nine/swapchain9.c
@@ -228,6 +228,7 @@ NineSwapChain9_Resize( struct NineSwapChain9 *This,
  * For example 16 bits.*/
 depth = 24;
 
+memset(, 0, sizeof(tmplt));
 tmplt.target = PIPE_TEXTURE_2D;
 tmplt.width0 = pParams->BackBufferWidth;
 tmplt.height0 = pParams->BackBufferHeight;
@@ -537,6 +538,7 @@ create_present_buffer( struct NineSwapChain9 *This,
 {
 struct pipe_resource tmplt;
 
+memset(, 0, sizeof(tmplt));
 tmplt.target = PIPE_TEXTURE_2D;
 tmplt.width0 = width;
 tmplt.height0 = height;
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 98169] lm_sensors hud option crashes unigine heaven

2016-10-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=98169

Bug ID: 98169
   Summary: lm_sensors hud option crashes unigine heaven
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Mesa core
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: haa...@frickel.club
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 127151
  --> https://bugs.freedesktop.org/attachment.cgi?id=127151=edit
valgrind output

For convenient debugging, go to unigine-heaven/bin and run

I'm using GALLIUM_HUD=sensors_temp_cu-amdgpu-pci-0100.temp1 for this.
glxgears works fine with it.
ungine-heaven crashes (I have not tested more applications).

For convenient debugging, go to unigine-heaven/bin and run

./heaven_x64 -project_name Heaven -data_path ../ -engine_config
../data/heaven_4.0.cfg -system_script heaven/unigine.cpp -sound_app openal
-video_app opengl -video_multisample 1 -video_fullscreen 0 -video_mode 6
-extern_define ,RELEASE,LANGUAGE_EN,QUALITY_HIGH,TESSELLATION_NORMAL
-extern_plugin ,GPUMonitor

after printing
Set 1920x1080 windowed video mode
it crashes
*** Error in `./heaven_x64': corrupted double-linked list: 0x01d7b040
***

valgrind detects an error here:

==13792== Invalid free() / delete / delete[] / realloc()
==13792==at 0x4C2AD90: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==13792==by 0x2A0C4512: sensors_cleanup (in /usr/lib/libsensors.so.4.4.0)
==13792==by 0x294B0606: free_query_data (hud_sensors_temp.c:200)
==13792==by 0x294AB4D5: hud_graph_destroy (hud_context.c:863)
==13792==by 0x294AD145: hud_destroy (hud_context.c:1528)
==13792==by 0x29452BA0: dri_destroy_context (dri_context.c:176)
==13792==by 0x29451BCC: driDestroyContext (dri_util.c:500)
==13792==by 0x1B4FA333: dri3_destroy_context (dri3_glx.c:200)

More complete valgrind output attached.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

2016-10-09 Thread Samuel Pitoiset



On 10/08/2016 10:09 PM, Ilia Mirkin wrote:

On Sat, Oct 8, 2016 at 3:55 PM, Samuel Pitoiset
 wrote:

total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs:335256 -> 335273 (0.01%)
total local used in shared programs   :31968 -> 31968 (0.00%)

localgpr   inst  bytes
helped   0  41 852 852
  hurt   0  44  23  23

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6efb29e..caf5d1d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)

 // 
=

+// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
+class LateAlgebraicOpt : public Pass
+{
+private:
+   virtual bool visit(BasicBlock *);
+
+   void handleADD(Instruction *);
+   bool tryADDToSHLADD(Instruction *);
+
+   BuildUtil bld;
+};
+
+void
+LateAlgebraicOpt::handleADD(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+
+   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
+  return;
+
+   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
+  tryADDToSHLADD(add);
+}
+
+// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
+bool
+LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+   Modifier mod[2];
+   Value *src;
+   int s;
+
+   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
+  return false;
+
+   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
+  s = 0;
+   else
+   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
+  s = 1;
+   else
+  return false;
+
+   src = add->getSrc(s);
+
+   if (src->getUniqueInsn()->bb != add->bb)
+  return false;


You flip between UniqueInsn and Insn... I think UniqueInsn is fine,
but you could also save the result off into another Instruction *shl
and keep reusing that. IMHO it'll be easier to follow the logic.


Fine by me.




+
+   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
+  return false;
+
+   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)


This is happening before LoadPropagation, so that may may not always
work (e.g. it could be a mov). I think you want to use getImmediate()
or however that function is called so that you can peer through moves.


Okay, I will have a look.




+  return false;
+
+   mod[0] = add->src(0).mod;
+   mod[1] = add->src(1).mod;
+
+   add->op = OP_SHLADD;
+   add->setSrc(2, add->src(s ? 0 : 1));
+   add->src(2).mod = mod[s];
+
+   add->setSrc(0, src->getInsn()->getSrc(0));
+   add->src(0).mod = mod[0];


mod[!s] presumably?

IMO this would be more readable with judicious use of swapSources and
moveSources.


Okay, but I have just followed the same design as tryADDToMadOrSAD() but 
it seems like we can improve readability with those util methods.





+   add->setSrc(1, src->getInsn()->getSrc(1));
+   add->src(1).mod = Modifier(0);
+
+   return true;
+}
+
+bool
+LateAlgebraicOpt::visit(BasicBlock *bb)
+{
+   Instruction *next;
+
+   for (Instruction *i = bb->getEntry(); i; i = next) {
+  next = i->next;
+  switch (i->op) {
+  case OP_ADD:
+ handleADD(i);
+ break;
+  default:
+ break;
+  }
+   }
+
+   return true;
+}
+
+// 
=
+
 static inline void
 updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
 {
@@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
RUN_PASS(2, AlgebraicOpt, run);
RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
RUN_PASS(1, ConstantFolding, foldAll);
+   RUN_PASS(2, LateAlgebraicOpt, run);
RUN_PASS(1, LoadPropagation, run);
RUN_PASS(1, IndirectPropagation, run);
RUN_PASS(2, MemoryOpt, run);
--
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)

2016-10-09 Thread Samuel Pitoiset



On 10/08/2016 10:04 PM, Karol Herbst wrote:

looks great, a few comments below


Thanks!



2016-10-08 21:55 GMT+02:00 Samuel Pitoiset :

total instructions in shared programs :2286901 -> 2284473 (-0.11%)
total gprs used in shared programs:335256 -> 335273 (0.01%)
total local used in shared programs   :31968 -> 31968 (0.00%)

localgpr   inst  bytes
helped   0  41 852 852
  hurt   0  44  23  23

Signed-off-by: Samuel Pitoiset 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6efb29e..caf5d1d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2132,6 +2132,99 @@ AlgebraicOpt::visit(BasicBlock *bb)

 // 
=

+// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
+class LateAlgebraicOpt : public Pass
+{
+private:
+   virtual bool visit(BasicBlock *);
+
+   void handleADD(Instruction *);
+   bool tryADDToSHLADD(Instruction *);
+
+   BuildUtil bld;
+};
+
+void
+LateAlgebraicOpt::handleADD(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+
+   if (src0->reg.file != FILE_GPR || src1->reg.file != FILE_GPR)
+  return;
+
+   if (prog->getTarget()->isOpSupported(OP_SHLADD, add->dType))
+  tryADDToSHLADD(add);
+}
+
+// ADD(SHL(a, b), c) -> SHLADD(a, b, c)
+bool
+LateAlgebraicOpt::tryADDToSHLADD(Instruction *add)
+{
+   Value *src0 = add->getSrc(0);
+   Value *src1 = add->getSrc(1);
+   Modifier mod[2];
+   Value *src;
+   int s;
+
+   if (add->saturate || add->usesFlags() || typeSizeof(add->dType) == 8)
+  return false;
+
+   if (src0->getUniqueInsn() && src0->getUniqueInsn()->op == OP_SHL)
+  s = 0;
+   else
+   if (src1->getUniqueInsn() && src1->getUniqueInsn()->op == OP_SHL)
+  s = 1;
+   else
+  return false;
+
+   src = add->getSrc(s);
+
+   if (src->getUniqueInsn()->bb != add->bb)
+  return false;


is there a reason to check for the bb here?


Those optimizations like ADD->MAD, ADD->SAD, etc, have to be local (ie. 
inside the same basic block). I just apply the same rule here.





+
+   if (src->getInsn()->usesFlags() || src->getInsn()->subOp)
+  return false;
+
+   if (src->getInsn()->src(1).getFile() != FILE_IMMEDIATE)
+  return false;
+
+   mod[0] = add->src(0).mod;
+   mod[1] = add->src(1).mod;
+
+   add->op = OP_SHLADD;
+   add->setSrc(2, add->src(s ? 0 : 1));


I usually use "s ^ 1" instead.


Yeah, but 's ? 0 : 1' seems to be more used than 's ^ 1' (in peephole).




+   add->src(2).mod = mod[s];
+
+   add->setSrc(0, src->getInsn()->getSrc(0));
+   add->src(0).mod = mod[0];
+   add->setSrc(1, src->getInsn()->getSrc(1));
+   add->src(1).mod = Modifier(0);
+
+   return true;
+}
+
+bool
+LateAlgebraicOpt::visit(BasicBlock *bb)
+{
+   Instruction *next;
+
+   for (Instruction *i = bb->getEntry(); i; i = next) {
+  next = i->next;
+  switch (i->op) {
+  case OP_ADD:
+ handleADD(i);
+ break;
+  default:
+ break;
+  }
+   }


you can also just implement visit(Instruction *) and return true; then
you won't need that silly loop.


True.




+
+   return true;
+}
+
+// 
=
+
 static inline void
 updateLdStOffset(Instruction *ldst, int32_t offset, Function *fn)
 {
@@ -3436,6 +3529,7 @@ Program::optimizeSSA(int level)
RUN_PASS(2, AlgebraicOpt, run);
RUN_PASS(2, ModifierFolding, run); // before load propagation -> less checks
RUN_PASS(1, ConstantFolding, foldAll);
+   RUN_PASS(2, LateAlgebraicOpt, run);


What is the reason to add a new AlgebraicOpt pass for this?


Because a bunch of optimizations are applied in ConstantFolding which 
means this one has to be done *after*. My first attempt was to do it in 
AlgebraicOpt, but it's actually not the best place. :-)





RUN_PASS(1, LoadPropagation, run);
RUN_PASS(1, IndirectPropagation, run);
RUN_PASS(2, MemoryOpt, run);
--
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nv50/ir: only stick one preret per function

2016-10-09 Thread Samuel Pitoiset



On 10/09/2016 06:12 AM, Ilia Mirkin wrote:

A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use funcitons in earnest, it could have in some cases


s/funcitons/functions/ :)

This looks good to me, but fyi this doesn't fix the regressions 
introduced with "nv50/ir: start LocalCSE with getFirst to merge PHI 
instructions". Something else is probably wrong.


Reviewed-by: Samuel Pitoiset 


overflowed the call stack, in case a function had a lot of early
returns.

Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
index 5938226..6664366 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
@@ -3475,10 +3475,13 @@ Converter::handleInstruction(const struct 
tgsi_full_instruction *insn)
   if (!isEndOfSubroutine(ip + 1)) {
  // insert a PRERET at the entry if this is an early return
  // (only needed for sharing code in the epilogue)
- BasicBlock *pos = getBB();
- setPosition(BasicBlock::get(func->cfg.getRoot()), false);
- mkFlow(OP_PRERET, leave, CC_ALWAYS, NULL)->fixed = 1;
- setPosition(pos, true);
+ BasicBlock *root = BasicBlock::get(func->cfg.getRoot());
+ if (root->getEntry() != NULL && root->getEntry()->op != OP_PRERET) {
+BasicBlock *pos = getBB();
+setPosition(root, false);
+mkFlow(OP_PRERET, leave, CC_ALWAYS, NULL)->fixed = 1;
+setPosition(pos, true);
+ }
   }
   mkFlow(OP_RET, NULL, CC_ALWAYS, NULL)->fixed = 1;
   bb->cfg.attach(>cfg, Graph::Edge::CROSS);


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] nvc0: enable GLSL 4.5

2016-10-09 Thread Samuel Pitoiset
This exposes OpenGL 4.5 on Fermi and Kepler GPUs. Maxwell still
only exposes OpenGL 4.1 because I need to finish my instructions
scheduler calculator.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index 008ef87..b699c57 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -126,7 +126,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
   return 128 * 1024 * 1024;
case PIPE_CAP_GLSL_FEATURE_LEVEL:
   if (class_3d <= NVF0_3D_CLASS)
- return 430;
+ return 450;
   return 410;
case PIPE_CAP_MAX_RENDER_TARGETS:
   return 8;
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] nvc0: enable ARB_enhanced_layouts

2016-10-09 Thread Samuel Pitoiset
All ARB_enhanced_layouts piglit tests pass without any changes
in our compiler.

Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index c8b0d41..008ef87 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -240,6 +240,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_ROBUST_BUFFER_ACCESS_BEHAVIOR:
case PIPE_CAP_TGSI_VOTE:
case PIPE_CAP_POLYGON_OFFSET_UNITS_UNSCALED:
+   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
   return 1;
case PIPE_CAP_COMPUTE:
   return (class_3d < GP100_3D_CLASS);
@@ -272,7 +273,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
pipe_cap param)
case PIPE_CAP_PCI_DEVICE:
case PIPE_CAP_PCI_FUNCTION:
case PIPE_CAP_VIEWPORT_SUBPIXEL_BITS:
-   case PIPE_CAP_TGSI_ARRAY_COMPONENTS:
   return 0;
 
case PIPE_CAP_VENDOR_ID:
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] android: intel/genxml: add rules to generate xml headers

2016-10-09 Thread Mauro Rossi
Hi Jason,

I'm sending a patch to add rules for Android,
even if those headers are not yet used there
they are inducing a minor building error on Android.

Cheers and congratulations for your nice presentation on vulkan at XDC2016
Mauro

>From 52c9b2d9a7d7fc962d7a8c30fc412387a74bf554 Mon Sep 17 00:00:00 2001
From: Mauro Rossi 
Date: Sun, 9 Oct 2016 11:40:42 +0200
Subject: [PATCH] android: intel/genxml: add rules to generate xml headers

New generated headers were introduced by commit 63a366a
"intel: aubinator: generate a standalone binary"

Once the generated files are listed in Makefile.sources variables,
Android build system requires rules to avoid building error.

New macro xml-header-gen is basically a copy of automake rules,
but xxd is invoked directly, as Android build systems stops and shows the error
in case of xxd missing.

Existing macro header-gen is renamed to pack-header-gen as a refinement.
---
 src/intel/Android.genxml.mk | 42 +-
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/src/intel/Android.genxml.mk b/src/intel/Android.genxml.mk
index 79de784..8991563 100644
--- a/src/intel/Android.genxml.mk
+++ b/src/intel/Android.genxml.mk
@@ -43,7 +43,7 @@ $(intermediates)/dummy.c:
 # This is the list of auto-generated files headers
 LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/,
$(GENXML_GENERATED_FILES))

-define header-gen
+define pack-header-gen
  @mkdir -p $(dir $@)
  @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
  $(hide) $(PRIVATE_SCRIPT) $(PRIVATE_XML) > $@
@@ -52,42 +52,66 @@ endef
 $(intermediates)/genxml/gen4_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen4_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen4.xml
 $(intermediates)/genxml/gen4_pack.h: $(LOCAL_PATH)/genxml/gen4.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen45_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen45_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen45.xml
 $(intermediates)/genxml/gen45_pack.h: $(LOCAL_PATH)/genxml/gen45.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen5_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen5_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen5.xml
 $(intermediates)/genxml/gen5_pack.h: $(LOCAL_PATH)/genxml/gen5.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen6_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen6_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen6.xml
 $(intermediates)/genxml/gen6_pack.h: $(LOCAL_PATH)/genxml/gen6.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen7_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen7_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen7.xml
 $(intermediates)/genxml/gen7_pack.h: $(LOCAL_PATH)/genxml/gen7.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen75_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen75_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen75.xml
 $(intermediates)/genxml/gen75_pack.h: $(LOCAL_PATH)/genxml/gen75.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen8_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen8_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen8.xml
 $(intermediates)/genxml/gen8_pack.h: $(LOCAL_PATH)/genxml/gen8.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)

 $(intermediates)/genxml/gen9_pack.h: PRIVATE_SCRIPT :=
$(MESA_PYTHON2) $(LOCAL_PATH)/genxml/gen_pack_header.py
 $(intermediates)/genxml/gen9_pack.h: PRIVATE_XML :=
$(LOCAL_PATH)/genxml/gen9.xml
 $(intermediates)/genxml/gen9_pack.h: $(LOCAL_PATH)/genxml/gen9.xml
$(LOCAL_PATH)/genxml/gen_pack_header.py
- $(call header-gen)
+ $(call pack-header-gen)
+
+define xml-header-gen
+ @mkdir -p $(dir $@)
+ @echo "Gen Header: $(PRIVATE_MODULE) <= $(notdir $(@))"
+ $(hide) echo -n "static const uint8_t " > $@; \
+ sed -e 's,_xml.h,,' <<< "`basename $@`_xml[] = {" >> $@; \
+ cat $< | xxd -i >> $@; \
+ echo "};" >> $@
+endef
+
+$(intermediates)/genxml/gen6_xml.h: $(LOCAL_PATH)/genxml/gen6.xml
+ $(call xml-header-gen)
+
+$(intermediates)/genxml/gen7_xml.h: $(LOCAL_PATH)/genxml/gen7.xml
+ $(call xml-header-gen)
+
+$(intermediates)/genxml/gen75_xml.h: $(LOCAL_PATH)/genxml/gen75.xml
+ $(call xml-header-gen)
+

[Mesa-dev] [PATCH v2 4/6] nv50/ir: restructure postraconstantfolding pass

2016-10-09 Thread Karol Herbst
we might want to add more folding passes here, so make it a bit more generic

v2: leave the comment and reword commit message

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 120 +++--
 1 file changed, 62 insertions(+), 58 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index fd1fec6..1f47ba2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2956,75 +2956,79 @@ FlatteningPass::tryPredicateConditional(BasicBlock *bb)
 // constraint SDST == SSRC2
 // TODO:
 // Does NVC0+ have other situations where this pass makes sense?
-class NV50PostRaConstantFolding : public Pass
+class PostRaConstantFolding : public Pass
 {
 private:
-   virtual bool visit(BasicBlock *);
+   virtual bool visit(Instruction *);
+   void handleMAD(Instruction *);
 };
 
-bool
-NV50PostRaConstantFolding::visit(BasicBlock *bb)
+// Fold Immediate into MAD; must be done after register allocation due to
+// constraint SDST == SSRC2
+void
+PostRaConstantFolding::handleMAD(Instruction *i)
 {
-   Value *vtmp;
-   Instruction *def;
-
-   for (Instruction *i = bb->getFirst(); i; i = i->next) {
-  switch (i->op) {
-  case OP_MAD:
- if (i->def(0).getFile() != FILE_GPR ||
- i->src(0).getFile() != FILE_GPR ||
- i->src(1).getFile() != FILE_GPR ||
- i->src(2).getFile() != FILE_GPR ||
- i->getDef(0)->reg.data.id != i->getSrc(2)->reg.data.id)
-break;
-
- if (i->getDef(0)->reg.data.id >= 64 ||
- i->getSrc(0)->reg.data.id >= 64)
-break;
+   if (i->def(0).getFile() != FILE_GPR ||
+   i->src(0).getFile() != FILE_GPR ||
+   i->src(1).getFile() != FILE_GPR ||
+   i->src(2).getFile() != FILE_GPR ||
+   i->getDef(0)->reg.data.id != i->getSrc(2)->reg.data.id)
+  return;
 
- if (i->flagsSrc >= 0 && i->getSrc(i->flagsSrc)->reg.data.id != 0)
-break;
+   if (i->getDef(0)->reg.data.id >= 64 ||
+   i->getSrc(0)->reg.data.id >= 64)
+  return;
 
- if (i->getPredicate())
-break;
+   if (i->flagsSrc >= 0 && i->getSrc(i->flagsSrc)->reg.data.id != 0)
+  return;
 
- def = i->getSrc(1)->getInsn();
- if (def && def->op == OP_SPLIT && typeSizeof(def->sType) == 4)
-def = def->getSrc(0)->getInsn();
- if (def && def->op == OP_MOV && def->src(0).getFile() == 
FILE_IMMEDIATE) {
-vtmp = i->getSrc(1);
-if (isFloatType(i->sType)) {
-   i->setSrc(1, def->getSrc(0));
-} else {
-   ImmediateValue val;
-   bool ret = def->src(0).getImmediate(val);
-   assert(ret);
-   if (i->getSrc(1)->reg.data.id & 1)
-  val.reg.data.u32 >>= 16;
-   val.reg.data.u32 &= 0x;
-   i->setSrc(1, new_ImmediateValue(bb->getProgram(), 
val.reg.data.u32));
-}
+   if (i->getPredicate())
+  return;
 
-/* There's no post-RA dead code elimination, so do it here
- * XXX: if we add more code-removing post-RA passes, we might
- *  want to create a post-RA dead-code elim pass */
-if (vtmp->getInsn()->isDead(true)) {
-   Value *src = vtmp->getInsn()->getSrc(0);
-   // Careful -- splits will have already been removed from the
-   // functions. Don't double-delete.
-   if (vtmp->getInsn()->bb)
-  delete_Instruction(prog, vtmp->getInsn());
-   if (src->getInsn() && src->getInsn()->isDead(true))
-  delete_Instruction(prog, src->getInsn());
-}
+   Value *vtmp;
+   Instruction *def = i->getSrc(1)->getInsn();
+
+   if (def && def->op == OP_SPLIT && typeSizeof(def->sType) == 4)
+  def = def->getSrc(0)->getInsn();
+   if (def && def->op == OP_MOV && def->src(0).getFile() == FILE_IMMEDIATE) {
+  vtmp = i->getSrc(1);
+  if (isFloatType(i->sType)) {
+ i->setSrc(1, def->getSrc(0));
+  } else {
+ ImmediateValue val;
+ bool ret = def->src(0).getImmediate(val);
+ assert(ret);
+ if (i->getSrc(1)->reg.data.id & 1)
+val.reg.data.u32 >>= 16;
+ val.reg.data.u32 &= 0x;
+ i->setSrc(1, new_ImmediateValue(prog, val.reg.data.u32));
+  }
 
-break;
- }
- break;
-  default:
- break;
+  /* There's no post-RA dead code elimination, so do it here
+   * XXX: if we add more code-removing post-RA passes, we might
+   *  want to create a post-RA dead-code elim pass */
+  if (vtmp->getInsn()->isDead(true)) {
+ Value *src = vtmp->getInsn()->getSrc(0);
+ // Careful -- splits will have already been removed from the
+ // 

[Mesa-dev] [PATCH v2 6/6] nv50/ra: always prefer def == src2 for mad/sad

2016-10-09 Thread Karol Herbst
improves the post ra mad folding pass:
total instructions in shared programs : 2811662 -> 2808429 (-0.11%)
total gprs used in shared programs: 379273 -> 379236 (-0.01%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25773432 -> 25743616 (-0.12%)

localgpr   inst  bytes
helped   0  2617361736
  hurt   0  20  78  78

v2: reorder to show the benefit of this patch

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 7e64f7c..0cf35b7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -1468,8 +1468,7 @@ GCRA::allocateRegisters(ArrayList& insns)
  nodes[i].init(regs, lval);
  RIG.insert([i]);
 
- if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL &&
- prog->getTarget()->getChipset() < 0xc0) {
+ if (lval->inFile(FILE_GPR) && lval->getInsn() != NULL) {
 Instruction *insn = lval->getInsn();
 if (insn->op == OP_MAD || insn->op == OP_SAD)
// Short encoding only possible if they're all GPRs, no need to
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 5/6] nv50/ir: implement mad post ra folding for nvc0+

2016-10-09 Thread Karol Herbst
changes for GpuTest /test=pixmark_piano /benchmark /no_scorebox /msaa=0
/benchmark_duration_ms=6 /width=1024 /height=640:

score: 1026 -> 1044

changes for shader-db:

total instructions in shared programs : 2818606 -> 2811662 (-0.25%)
total gprs used in shared programs: 379273 -> 379273 (0.00%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837192 -> 25773432 (-0.25%)

localgpr   inst  bytes
helped   0   030843084
  hurt   0   0   0   0

v2: removed TODO
reorderd to show changes without RA modification
removed stale debugging print() call

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 64 +++---
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 1f47ba2..bcbc0c0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -2954,19 +2954,18 @@ FlatteningPass::tryPredicateConditional(BasicBlock *bb)
 
 // Fold Immediate into MAD; must be done after register allocation due to
 // constraint SDST == SSRC2
-// TODO:
-// Does NVC0+ have other situations where this pass makes sense?
 class PostRaConstantFolding : public Pass
 {
 private:
virtual bool visit(Instruction *);
-   void handleMAD(Instruction *);
+   void handleMADforNV50(Instruction *);
+   void handleMADforNVC0(Instruction *);
 };
 
 // Fold Immediate into MAD; must be done after register allocation due to
 // constraint SDST == SSRC2
 void
-PostRaConstantFolding::handleMAD(Instruction *i)
+PostRaConstantFolding::handleMADforNV50(Instruction *i)
 {
if (i->def(0).getFile() != FILE_GPR ||
i->src(0).getFile() != FILE_GPR ||
@@ -3019,12 +3018,64 @@ PostRaConstantFolding::handleMAD(Instruction *i)
}
 }
 
+void
+PostRaConstantFolding::handleMADforNVC0(Instruction *i)
+{
+   if (i->def(0).getFile() != FILE_GPR ||
+   i->src(0).getFile() != FILE_GPR ||
+   i->src(1).getFile() != FILE_GPR ||
+   i->src(2).getFile() != FILE_GPR ||
+   i->getDef(0)->reg.data.id != i->getSrc(2)->reg.data.id)
+  return;
+
+   int chipset = prog->getTarget()->getChipset();
+   if (i->getPredicate()) {
+  // prior gk110 we can't do that if we have a predicate
+  if (chipset < NVISA_GK20A_CHIPSET)
+ return;
+  // and gk110 can't handle a cc
+  if (chipset < NVISA_GM107_CHIPSET && i->cc)
+ return;
+   }
+
+   // TODO: gm107 can also do this for S32
+   if (i->dType != TYPE_F32)
+  return;
+
+   if ((i->src(2).mod | Modifier(NV50_IR_MOD_NEG)) != 
Modifier(NV50_IR_MOD_NEG))
+  return;
+
+   ImmediateValue val;
+   int s;
+
+   if (i->src(0).getImmediate(val))
+  s = 1;
+   else if (i->src(1).getImmediate(val))
+  s = 0;
+   else
+  return;
+
+   if ((i->src(s).mod | Modifier(NV50_IR_MOD_NEG)) != 
Modifier(NV50_IR_MOD_NEG))
+  return;
+
+   if (s == 1)
+  i->swapSources(0, 1);
+
+   Instruction *imm = i->getSrc(1)->getInsn();
+   i->setSrc(1, imm->getSrc(0));
+   if (imm->isDead(true))
+  delete_Instruction(prog, imm);
+}
+
 bool
 PostRaConstantFolding::visit(Instruction *i)
 {
switch (i->op) {
case OP_MAD:
-  handleMAD(i);
+  if (prog->getTarget()->getChipset() < 0xc0)
+ handleMADforNV50(i);
+  else
+ handleMADforNVC0(i);
   break;
default:
   break;
@@ -3447,8 +3498,7 @@ bool
 Program::optimizePostRA(int level)
 {
RUN_PASS(2, FlatteningPass, run);
-   if (getTarget()->getChipset() < 0xc0)
-  RUN_PASS(2, PostRaConstantFolding, run);
+   RUN_PASS(2, PostRaConstantFolding, run);
 
return true;
 }
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/6] gm107/ir: add LIMM form of mad

2016-10-09 Thread Karol Herbst
v2: renamed commit
reordered modifiers
add assert(dst == src2)

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 35 --
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 3fedafd..5729615 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -1306,7 +1306,7 @@ CodeEmitterGM107::emitFMUL()
 void
 CodeEmitterGM107::emitFFMA()
 {
-   /*XXX: ffma32i exists, but not using it as third src overlaps dst */
+   bool isLongIMMD = false;
switch(insn->src(2).getFile()) {
case FILE_GPR:
   switch (insn->src(1).getFile()) {
@@ -1319,14 +1319,22 @@ CodeEmitterGM107::emitFFMA()
  emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1));
  break;
   case FILE_IMMEDIATE:
- emitInsn(0x3280);
- emitIMMD(0x14, 19, insn->src(1));
+ if (longIMMD(insn->getSrc(1))) {
+assert(insn->getDef(0)->reg.data.id == 
insn->getSrc(2)->reg.data.id);
+isLongIMMD = true;
+emitInsn(0x0c00);
+emitIMMD(0x14, 32, insn->src(1));
+ } else {
+emitInsn(0x3280);
+emitIMMD(0x14, 19, insn->src(1));
+ }
  break;
   default:
  assert(!"bad src1 file");
  break;
   }
-  emitGPR (0x27, insn->src(2));
+  if (!isLongIMMD)
+ emitGPR (0x27, insn->src(2));
   break;
case FILE_MEMORY_CONST:
   emitInsn(0x5180);
@@ -1337,11 +1345,20 @@ CodeEmitterGM107::emitFFMA()
   assert(!"bad src2 file");
   break;
}
-   emitRND (0x33);
-   emitSAT (0x32);
-   emitNEG (0x31, insn->src(2));
-   emitNEG2(0x30, insn->src(0), insn->src(1));
-   emitCC  (0x2f);
+
+   emitRND(0x33);
+
+   if (isLongIMMD) {
+  emitSAT (0x37);
+  emitNEG (0x39, insn->src(2));
+  emitNEG2(0x38, insn->src(0), insn->src(1));
+  emitCC  (0x34);
+   } else {
+  emitSAT (0x32);
+  emitNEG (0x31, insn->src(2));
+  emitNEG2(0x30, insn->src(0), insn->src(1));
+  emitCC  (0x2f);
+   }
 
emitFMZ(0x35, 2);
emitGPR(0x08, insn->src(0));
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 1/6] gk110/ir: add LIMM form of mad

2016-10-09 Thread Karol Herbst
v2: renamed commit
reordered modifiers
add assert(dst == src2)

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 50 ++
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index ce20ed3..7574116 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -47,7 +47,7 @@ private:
 private:
void emitForm_21(const Instruction *, uint32_t opc2, uint32_t opc1);
void emitForm_C(const Instruction *, uint32_t opc, uint8_t ctg);
-   void emitForm_L(const Instruction *, uint32_t opc, uint8_t ctg, Modifier);
+   void emitForm_L(const Instruction *, uint32_t opc, uint8_t ctg, Modifier, 
int sCount = 3);
 
void emitPredicate(const Instruction *);
 
@@ -364,7 +364,7 @@ CodeEmitterGK110::setImmediate32(const Instruction *i, 
const int s,
 
 void
 CodeEmitterGK110::emitForm_L(const Instruction *i, uint32_t opc, uint8_t ctg,
- Modifier mod)
+ Modifier mod, int sCount)
 {
code[0] = ctg;
code[1] = opc << 20;
@@ -373,7 +373,7 @@ CodeEmitterGK110::emitForm_L(const Instruction *i, uint32_t 
opc, uint8_t ctg,
 
defId(i->def(0), 2);
 
-   for (int s = 0; s < 3 && i->srcExists(s); ++s) {
+   for (int s = 0; s < sCount && i->srcExists(s); ++s) {
   switch (i->src(s).getFile()) {
   case FILE_GPR:
  srcId(i->src(s), s ? 42 : 10);
@@ -486,25 +486,41 @@ CodeEmitterGK110::emitNOP(const Instruction *i)
 void
 CodeEmitterGK110::emitFMAD(const Instruction *i)
 {
-   assert(!isLIMM(i->src(1), TYPE_F32));
+   bool neg1 = (i->src(0).mod ^ i->src(1).mod).neg();
 
-   emitForm_21(i, 0x0c0, 0x940);
+   if (isLIMM(i->src(1), TYPE_F32)) {
+  assert(i->getDef(0)->reg.data.id == i->getSrc(2)->reg.data.id);
 
-   NEG_(34, 2);
-   SAT_(35);
-   RND_(36, F);
-   FTZ_(38);
-   DNZ_(39);
+  // last source is dst, so force 2 sources
+  emitForm_L(i, 0x600, 0x0, Modifier(0), 2);
 
-   bool neg1 = (i->src(0).mod ^ i->src(1).mod).neg();
+  NEG_(3b, 0);
+  NEG_(3c, 2);
+  SAT_(3a);
 
-   if (code[0] & 0x1) {
-  if (neg1)
- code[1] ^= 1 << 27;
-   } else
-   if (neg1) {
-  code[1] |= 1 << 19;
+  // neg 1
+  if (neg1) {
+ code[1] |= 1 << 27;
+  }
+   } else {
+  emitForm_21(i, 0x0c0, 0x940);
+
+  NEG_(34, 2);
+  SAT_(35);
+  RND_(36, F);
+
+  // neg 1
+  if (code[0] & 0x1) {
+ if (neg1)
+code[1] ^= 1 << 27;
+  } else
+  if (neg1) {
+ code[1] |= 1 << 19;
+  }
}
+
+   FTZ_(38);
+   DNZ_(39);
 }
 
 void
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 3/6] nv50/ir: replace post_ra_dead by Instruction::isDead

2016-10-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.h|  2 +-
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20 +++-
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index bedbdcc..9d1a72f 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -829,7 +829,7 @@ public:
}
 
inline bool isPseudo() const { return op < OP_MOV; }
-   bool isDead() const;
+   bool isDead(bool postRA = false) const;
bool isNop() const;
bool isCommutationLegal(const Instruction *) const; // must be adjacent !
bool isActionEqual(const Instruction *) const;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 6efb29e..fd1fec6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -61,7 +61,7 @@ Instruction::isNop() const
return false;
 }
 
-bool Instruction::isDead() const
+bool Instruction::isDead(bool postRA) const
 {
if (op == OP_STORE ||
op == OP_EXPORT ||
@@ -70,8 +70,11 @@ bool Instruction::isDead() const
op == OP_WRSV)
   return false;
 
+   if (postRA && op == OP_MOV && subOp == NV50_IR_SUBOP_MOV_FINAL)
+  return false;
+
for (int d = 0; defExists(d); ++d)
-  if (getDef(d)->refCount() || getDef(d)->reg.data.id >= 0)
+  if (getDef(d)->refCount() || (!postRA && getDef(d)->reg.data.id >= 0))
  return false;
 
if (terminator || asFlow())
@@ -2959,15 +2962,6 @@ private:
virtual bool visit(BasicBlock *);
 };
 
-static bool
-post_ra_dead(Instruction *i)
-{
-   for (int d = 0; i->defExists(d); ++d)
-  if (i->getDef(d)->refCount())
- return false;
-   return true;
-}
-
 bool
 NV50PostRaConstantFolding::visit(BasicBlock *bb)
 {
@@ -3014,13 +3008,13 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
 /* There's no post-RA dead code elimination, so do it here
  * XXX: if we add more code-removing post-RA passes, we might
  *  want to create a post-RA dead-code elim pass */
-if (post_ra_dead(vtmp->getInsn())) {
+if (vtmp->getInsn()->isDead(true)) {
Value *src = vtmp->getInsn()->getSrc(0);
// Careful -- splits will have already been removed from the
// functions. Don't double-delete.
if (vtmp->getInsn()->bb)
   delete_Instruction(prog, vtmp->getInsn());
-   if (src->getInsn() && post_ra_dead(src->getInsn()))
+   if (src->getInsn() && src->getInsn()->isDead(true))
   delete_Instruction(prog, src->getInsn());
 }
 
-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 0/6] nv50/ir: PostRaConstantFolding improvements

2016-10-09 Thread Karol Herbst
This series reworks the structure of the pass to make it easier to add
more optimisations to it.

Still have to run a full piglit on my gk106 with this, but g80, gk110 and gm107
should be tested as well, but I can't.

v2: swaped the last two commits

changes for shader-db:
total instructions in shared programs : 2818606 -> 2808429 (-0.36%)
total gprs used in shared programs: 379273 -> 379236 (-0.01%)
total local used in shared programs   : 9505 -> 9505 (0.00%)
total bytes used in shared programs   : 25837192 -> 25743616 (-0.36%)

localgpr   inst  bytes 
helped   0  2640934093 
  hurt   0  20  61  61

Karol Herbst (6):
  gk110/ir: add LIMM form of mad
  gm107/ir: add LIMM form of mad
  nv50/ir: replace post_ra_dead by Instruction::isDead
  nv50/ir: restructure postraconstantfolding pass
  nv50/ir: implement mad post ra folding for nvc0+
  nv50/ra: always prefer def == src2 for mad/sad

 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |   2 +-
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp |  50 --
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  35 +++-
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 184 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp |   3 +-
 5 files changed, 177 insertions(+), 97 deletions(-)

-- 
2.10.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/6] nv50/ir: replace post_ra_dead by Instruction::isDead

2016-10-09 Thread Karol Herbst
2016-10-08 18:39 GMT+02:00 Samuel Pitoiset :
>
>
> On 10/08/2016 05:43 PM, Karol Herbst wrote:
>>
>> Signed-off-by: Karol Herbst 
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir.h|  2 +-
>>  .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 20
>> +++-
>>  2 files changed, 8 insertions(+), 14 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> index bedbdcc..9d1a72f 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>> @@ -829,7 +829,7 @@ public:
>> }
>>
>> inline bool isPseudo() const { return op < OP_MOV; }
>> -   bool isDead() const;
>> +   bool isDead(bool postRA = false) const;
>> bool isNop() const;
>> bool isCommutationLegal(const Instruction *) const; // must be
>> adjacent !
>> bool isActionEqual(const Instruction *) const;
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> index 780820f..3fcadd9 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>> @@ -61,7 +61,7 @@ Instruction::isNop() const
>> return false;
>>  }
>>
>> -bool Instruction::isDead() const
>> +bool Instruction::isDead(bool postRA) const
>>  {
>
>
> You might want to do :
>
> if (postRA)
>   return post_ra_dead(this);
>
> which doesn't require any changes above.

post_ra_dead is basically broken, we were just lucky it worked out so far.

>
>> if (op == OP_STORE ||
>> op == OP_EXPORT ||
>> @@ -70,8 +70,11 @@ bool Instruction::isDead() const
>> op == OP_WRSV)
>>return false;
>>
>> +   if (postRA && op == OP_MOV && subOp == NV50_IR_SUBOP_MOV_FINAL)
>> +  return false;
>
>
> Why do you need that specific case?

those mov defs have a refcount of 0, so they would get eliminated
otherwise, but shouldn't.

>
>> +
>> for (int d = 0; defExists(d); ++d)
>> -  if (getDef(d)->refCount() || getDef(d)->reg.data.id >= 0)
>> +  if (getDef(d)->refCount() || (!postRA && getDef(d)->reg.data.id >=
>> 0))
>>   return false;
>>
>> if (terminator || asFlow())
>> @@ -2959,15 +2962,6 @@ private:
>> virtual bool visit(BasicBlock *);
>>  };
>>
>> -static bool
>> -post_ra_dead(Instruction *i)
>> -{
>> -   for (int d = 0; i->defExists(d); ++d)
>> -  if (i->getDef(d)->refCount())
>> - return false;
>> -   return true;
>> -}
>> -
>>  bool
>>  NV50PostRaConstantFolding::visit(BasicBlock *bb)
>>  {
>> @@ -3014,13 +3008,13 @@ NV50PostRaConstantFolding::visit(BasicBlock *bb)
>>  /* There's no post-RA dead code elimination, so do it here
>>   * XXX: if we add more code-removing post-RA passes, we might
>>   *  want to create a post-RA dead-code elim pass */
>> -if (post_ra_dead(vtmp->getInsn())) {
>> +if (vtmp->getInsn()->isDead(true)) {
>> Value *src = vtmp->getInsn()->getSrc(0);
>> // Careful -- splits will have already been removed from
>> the
>> // functions. Don't double-delete.
>> if (vtmp->getInsn()->bb)
>>delete_Instruction(prog, vtmp->getInsn());
>> -   if (src->getInsn() && post_ra_dead(src->getInsn()))
>> +   if (src->getInsn() && src->getInsn()->isDead(true))
>>delete_Instruction(prog, src->getInsn());
>>  }
>>
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nv50/ir: add LIMM form of mad to gm107

2016-10-09 Thread Karol Herbst
2016-10-08 18:12 GMT+02:00 Samuel Pitoiset :
> Usually we prefix with gm107/ir, gk110/ir, etc...
>
> More comments below.
>
> On 10/08/2016 05:43 PM, Karol Herbst wrote:
>>
>> Signed-off-by: Karol Herbst 
>> ---
>>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 32
>> --
>>  1 file changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>> index 3fedafd..d084e07 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>> @@ -1306,7 +1306,7 @@ CodeEmitterGM107::emitFMUL()
>>  void
>>  CodeEmitterGM107::emitFFMA()
>>  {
>> -   /*XXX: ffma32i exists, but not using it as third src overlaps dst */
>> +   bool isLongIMMD = false;
>> switch(insn->src(2).getFile()) {
>> case FILE_GPR:
>>switch (insn->src(1).getFile()) {
>> @@ -1319,14 +1319,21 @@ CodeEmitterGM107::emitFFMA()
>>   emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1));
>>   break;
>>case FILE_IMMEDIATE:
>> - emitInsn(0x3280);
>> - emitIMMD(0x14, 19, insn->src(1));
>> + if (longIMMD(insn->getSrc(1))) {
>> +isLongIMMD = true;
>> +emitInsn(0x0c00);
>> +emitIMMD(0x14, 32, insn->src(1));
>> + } else {
>> +emitInsn(0x3280);
>> +emitIMMD(0x14, 19, insn->src(1));
>> + }
>>   break;
>>default:
>>   assert(!"bad src1 file");
>>   break;
>>}
>> -  emitGPR (0x27, insn->src(2));
>> +  if (!isLongIMMD)
>> + emitGPR (0x27, insn->src(2));
>>break;
>> case FILE_MEMORY_CONST:
>>emitInsn(0x5180);
>> @@ -1337,12 +1344,19 @@ CodeEmitterGM107::emitFFMA()
>>assert(!"bad src2 file");
>>break;
>> }
>> -   emitRND (0x33);
>> -   emitSAT (0x32);
>> -   emitNEG (0x31, insn->src(2));
>> -   emitNEG2(0x30, insn->src(0), insn->src(1));
>> -   emitCC  (0x2f);
>> +   if (isLongIMMD) {
>> +  emitCC  (0x34);
>> +  emitSAT (0x37);
>> +  emitNEG2(0x38, insn->src(0), insn->src(1));
>> +  emitNEG (0x39, insn->src(2));
>> +   } else {
>> +  emitCC  (0x2f);
>> +  emitNEG2(0x30, insn->src(0), insn->src(1));
>> +  emitNEG (0x31, insn->src(2));
>> +  emitSAT (0x32);
>> +   }
>
>
> Please re-order.
>
> Also, because FFMA32I modifiers are really different than FFMA, it would
> make more sense to add emitFFMA32I() and select the good one regarding the
> LIMM form or not.
>

currently other emit methods do the same and this would also move more
logic inside the big switch. I think a better solution would be to
split inside emitFFMA and call emitFFMA32I() or emitFFMASomething()
from there. But I would leave it as is, just because it is the same
everywhere else.

> Otherwise, I don't see any restrictions which make sure that def == src2 for
> FFMA32I...
>
> If you check that later in the series, this patch has to be *after*.
>
>>
>> +   emitRND(0x33);
>> emitFMZ(0x35, 2);
>> emitGPR(0x08, insn->src(0));
>> emitGPR(0x00, insn->def(0));
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev