Re: [Mesa-dev] [PATCH v2] i965: allocate a SGVS element when VertexID or InstanceID are read

2018-01-09 Thread Iago Toral
Ken, do you have any comments about this patch? I'd like to push it
otherwise.

Iago

On Thu, 2018-01-04 at 14:24 -0800, Jason Ekstrand wrote:
> Reviewed-by: Jason Ekstrand 
> 
> Ken?
> 
> On Wed, Jan 3, 2018 at 6:55 PM, Iago Toral Quiroga  > wrote:
> > Although on gen8+ platforms we can in theory use 3DSTATE_VF_SGVS
> > to put these beyond the last vertex element, it seems that we still
> > need to allocate the SGVS element; otherwise we have observed cases
> > where we end up reading garbage. Specifically, the CTS test mentioned
> > below was flaky with a fail rate of ~1% on some gen9+ platforms, caused
> > by reading garbage for the gl_InstanceID value. The flakiness goes
> > away as soon as we start allocating the SGVS element.
> > 
> > v2:
> >   - Do this for gen8+, not just gen9+, and pull the boolean
> >     outside the #if block (Jason)
> > 
> > Fixes flaky test:
> > KHR-GL45.vertex_attrib_64bit.limits_test
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104335
> > ---
> >  src/mesa/drivers/dri/i965/genX_state_upload.c | 17 ++---
> >  1 file changed, 2 insertions(+), 15 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > index 50ac5bc59f..d0a980f973 100644
> > --- a/src/mesa/drivers/dri/i965/genX_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
> > @@ -486,26 +486,13 @@ genX(emit_vertices)(struct brw_context *brw)
> >     } else {
> >        brw_batch_emit(brw, GENX(3DSTATE_VF_SGVS), vfs);
> >     }
> > +#endif
> > 
> > -   /* Normally we don't need an element for the SGVS attribute because the
> > -    * 3DSTATE_VF_SGVS instruction lets you store the generated attribute in an
> > -    * element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However if
> > -    * we're using draw parameters then we need an element for the those
> > -    * values.  Additionally if there is an edge flag element then the SGVS
> > -    * can't be inserted past that so we need a dummy element to ensure that
> > -    * the edge flag is the last one.
> > -    */
> > -   const bool needs_sgvs_element = (vs_prog_data->uses_basevertex ||
> > -                                    vs_prog_data->uses_baseinstance ||
> > -                                    ((vs_prog_data->uses_instanceid ||
> > -                                      vs_prog_data->uses_vertexid)
> > -                                     && uses_edge_flag));
> > -#else
> >     const bool needs_sgvs_element = (vs_prog_data->uses_basevertex ||
> >                                      vs_prog_data->uses_baseinstance ||
> >                                      vs_prog_data->uses_instanceid ||
> >                                      vs_prog_data->uses_vertexid);
> > -#endif
> > +
> >     unsigned nr_elements =
> >        brw->vb.nr_enabled + needs_sgvs_element + vs_prog_data->uses_drawid;
> > 
> > --
> > 2.11.0
> > 
> 
> ___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
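
For readers following the thread, the change to needs_sgvs_element can be
sketched in isolation. This is only an illustration under assumptions:
struct vs_flags is a hypothetical stand-in for the few brw_vs_prog_data
fields the patch reads, and the helper names are not from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the brw_vs_prog_data flags used by the patch. */
struct vs_flags {
   bool uses_basevertex;
   bool uses_baseinstance;
   bool uses_instanceid;
   bool uses_vertexid;
   bool uses_drawid;
};

/* Rule removed by the patch (gen8+ branch): VertexID/InstanceID only
 * forced an element when an edge flag element was also present. */
static bool
needs_sgvs_element_old_gen8(const struct vs_flags *vs, bool uses_edge_flag)
{
   return vs->uses_basevertex || vs->uses_baseinstance ||
          ((vs->uses_instanceid || vs->uses_vertexid) && uses_edge_flag);
}

/* Rule kept by the patch, now used on every generation. */
static bool
needs_sgvs_element(const struct vs_flags *vs)
{
   return vs->uses_basevertex || vs->uses_baseinstance ||
          vs->uses_instanceid || vs->uses_vertexid;
}

/* Element count as computed right after the changed hunk. */
static unsigned
nr_elements(unsigned nr_enabled, const struct vs_flags *vs)
{
   return nr_enabled + needs_sgvs_element(vs) + vs->uses_drawid;
}
```

With three enabled vertex buffers and a shader that only reads
gl_InstanceID, the old gen8+ rule allocated no extra element while the new
rule yields four elements in total.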


[Mesa-dev] [PATCH 2/5] i965/miptree: Use cpu tiling/detiling when mapping

2018-01-09 Thread Scott D Phillips
Rename the (un)map_gtt functions to (un)map_map (map by
returning a map) and add new functions (un)map_tiled_memcpy that
return a shadow buffer populated with the intel_tiled_memcpy
functions.
---
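A standalone sketch of the extent computation, for illustration only. It
assumes the block size, bytes per block and image offset are passed in
directly rather than looked up from the miptree; struct extents and
map_extents() are hypothetical names, not part of this patch.

```c
#include <assert.h>

#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* x values are in bytes, y values are in rows of blocks. */
struct extents {
   unsigned x1, x2, y1, y2;
};

/* Hypothetical standalone variant of tile_extents(). (x0_el, y0_el) is the
 * image offset of the level/slice, in units of blocks. */
static struct extents
map_extents(unsigned map_x, unsigned map_y, unsigned map_w, unsigned map_h,
            unsigned block_w, unsigned block_h, unsigned block_bytes,
            unsigned x0_el, unsigned y0_el)
{
   struct extents e;

   /* The map origin must sit on a block boundary. */
   assert(map_x % block_w == 0);
   assert(map_y % block_h == 0);

   e.x1 = (map_x / block_w + x0_el) * block_bytes;
   e.y1 = map_y / block_h + y0_el;
   e.x2 = e.x1 + DIV_ROUND_UP(map_w, block_w) * block_bytes;
   e.y2 = e.y1 + DIV_ROUND_UP(map_h, block_h);
   return e;
}
```

For an uncompressed 4-byte format (1x1 blocks) the x range is simply the
pixel range times four; for a 4x4 block format a partial block at the right
or bottom edge of the map is rounded up to a whole block.
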
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 95 ---
 1 file changed, 86 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index ead0c359c0..7a90dafa1e 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -31,6 +31,7 @@
 #include "intel_image.h"
 #include "intel_mipmap_tree.h"
 #include "intel_tex.h"
+#include "intel_tiled_memcpy.h"
 #include "intel_blit.h"
 #include "intel_fbo.h"
 
@@ -3031,10 +3032,10 @@ intel_miptree_unmap_raw(struct intel_mipmap_tree *mt)
 }
 
 static void
-intel_miptree_map_gtt(struct brw_context *brw,
- struct intel_mipmap_tree *mt,
- struct intel_miptree_map *map,
- unsigned int level, unsigned int slice)
+intel_miptree_map_map(struct brw_context *brw,
+  struct intel_mipmap_tree *mt,
+  struct intel_miptree_map *map,
+  unsigned int level, unsigned int slice)
 {
unsigned int bw, bh;
void *base;
@@ -3052,7 +3053,7 @@ intel_miptree_map_gtt(struct brw_context *brw,
y /= bh;
x /= bw;
 
-   base = intel_miptree_map_raw(brw, mt, map->mode);
+   base = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
 
if (base == NULL)
   map->ptr = NULL;
@@ -3078,11 +3079,80 @@ intel_miptree_map_gtt(struct brw_context *brw,
 }
 
 static void
-intel_miptree_unmap_gtt(struct intel_mipmap_tree *mt)
+intel_miptree_unmap_map(struct intel_mipmap_tree *mt)
 {
intel_miptree_unmap_raw(mt);
 }
 
+/* Compute extent parameters for use with tiled_memcpy functions.
+ * xs are in units of bytes and ys are in units of strides. */
+static inline void
+tile_extents(struct intel_mipmap_tree *mt, struct intel_miptree_map *map,
+ unsigned int level, unsigned int slice, unsigned int *x1,
+ unsigned int *x2, unsigned int *y1, unsigned int *y2)
+{
+   unsigned int block_width, block_height, block_bytes;
+   unsigned int x0_el, y0_el;
+
+   _mesa_get_format_block_size(mt->format, &block_width, &block_height);
+   block_bytes = _mesa_get_format_bytes(mt->format);
+
+   assert(map->x % block_width == 0);
+   assert(map->y % block_height == 0);
+
+   intel_miptree_get_image_offset(mt, level, slice, &x0_el, &y0_el);
+   *x1 = (map->x / block_width + x0_el) * block_bytes;
+   *y1 = map->y / block_height + y0_el;
+   *x2 = *x1 + DIV_ROUND_UP(map->w, block_width) * block_bytes;
+   *y2 = *y1 + DIV_ROUND_UP(map->h, block_height);
+}
+
+static void
+intel_miptree_map_tiled_memcpy(struct brw_context *brw,
+   struct intel_mipmap_tree *mt,
+   struct intel_miptree_map *map,
+   unsigned int level, unsigned int slice)
+{
+   unsigned int x1, x2, y1, y2;
+   tile_extents(mt, map, level, slice, &x1, &x2, &y1, &y2);
+   map->stride = _mesa_format_row_stride(mt->format, map->w);
+   map->buffer = map->ptr = malloc(map->stride * (y2 - y1));
+
+   if (!(map->mode & GL_MAP_INVALIDATE_RANGE_BIT)) {
+  char *src = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
+  src += mt->offset;
+
+  tiled_to_linear(x1, x2, y1, y2, map->ptr, src, map->stride,
+  mt->surf.row_pitch, brw->has_swizzling, mt->surf.tiling,
+  memcpy);
+
+  intel_miptree_unmap_raw(mt);
+   }
+}
+
+static void
+intel_miptree_unmap_tiled_memcpy(struct brw_context *brw,
+ struct intel_mipmap_tree *mt,
+ struct intel_miptree_map *map,
+ unsigned int level,
+ unsigned int slice)
+{
+   if (map->mode & GL_MAP_WRITE_BIT) {
+  unsigned int x1, x2, y1, y2;
+  tile_extents(mt, map, level, slice, &x1, &x2, &y1, &y2);
+
+  char *dst = intel_miptree_map_raw(brw, mt, map->mode | MAP_RAW);
+  dst += mt->offset;
+
+  linear_to_tiled(x1, x2, y1, y2, dst, map->ptr, mt->surf.row_pitch,
+  map->stride, brw->has_swizzling, mt->surf.tiling, memcpy);
+
+  intel_miptree_unmap_raw(mt);
+   }
+   free(map->buffer);
+   map->buffer = map->ptr = NULL;
+}
+
 static void
 intel_miptree_map_blit(struct brw_context *brw,
   struct intel_mipmap_tree *mt,
@@ -3640,8 +3710,10 @@ intel_miptree_map(struct brw_context *brw,
   (mt->surf.row_pitch % 16 == 0)) {
   intel_miptree_map_movntdqa(brw, mt, map, level, slice);
 #endif
+   } else if (mt->surf.tiling != ISL_TILING_LINEAR) {
+  intel_miptree_map_tiled_memcpy(brw, mt, map, level, slice);
} else {
-  intel_miptree_map_gtt(brw, mt, map, level, slice);
+  intel_miptree_map_map(brw, mt, map, level, slice);
   

[Mesa-dev] [PATCH 1/5] i965/tiled_memcpy: change linear pointer from (0, 0) to (xt1, yt1)

2018-01-09 Thread Scott D Phillips
In all current uses, the linear surface is only allocated starting
at (xt1, yt1) anyway, so this improves the calling ergonomics.
---
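To illustrate the calling convention only, here is a minimal sketch using a
plain row-by-row copy instead of the tiled copiers. Both helpers are
hypothetical and not part of this patch; the first models the old
convention, where the caller had to hand in the address of (0, 0), and the
second models the new one, where the linear pointer is the address of
(xt1, yt1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old convention: 'dst' is the address of (0, 0) in the linear surface,
 * so every row is indexed with absolute coordinates. */
static void
copy_rows_from_origin(uint32_t xt1, uint32_t xt2, uint32_t yt1, uint32_t yt2,
                      char *dst, const char *src,
                      ptrdiff_t dst_pitch, ptrdiff_t src_pitch)
{
   for (uint32_t y = yt1; y < yt2; y++)
      memcpy(dst + (ptrdiff_t)y * dst_pitch + xt1,
             src + (ptrdiff_t)y * src_pitch + xt1, xt2 - xt1);
}

/* New convention: 'dst' is the address of (xt1, yt1), so the destination
 * is indexed relative to the start of the copied rectangle. */
static void
copy_rows_from_start(uint32_t xt1, uint32_t xt2, uint32_t yt1, uint32_t yt2,
                     char *dst, const char *src,
                     ptrdiff_t dst_pitch, ptrdiff_t src_pitch)
{
   for (uint32_t y = yt1; y < yt2; y++)
      memcpy(dst + ((ptrdiff_t)y - yt1) * dst_pitch,
             src + (ptrdiff_t)y * src_pitch + xt1, xt2 - xt1);
}
```

With the second form a caller can pass a buffer that only holds the copied
rectangle, without first subtracting yoffset * pitch + xoffset * cpp from
its pointer.
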
 src/mesa/drivers/dri/i965/intel_pixel_read.c   |  2 +-
 src/mesa/drivers/dri/i965/intel_tex_image.c|  4 ++--
 src/mesa/drivers/dri/i965/intel_tiled_memcpy.c | 16 
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_pixel_read.c 
b/src/mesa/drivers/dri/i965/intel_pixel_read.c
index 4528d6d265..cf957378f9 100644
--- a/src/mesa/drivers/dri/i965/intel_pixel_read.c
+++ b/src/mesa/drivers/dri/i965/intel_pixel_read.c
@@ -202,7 +202,7 @@ intel_readpixels_tiled_memcpy(struct gl_context * ctx,
tiled_to_linear(
   xoffset * cpp, (xoffset + width) * cpp,
   yoffset, yoffset + height,
-  pixels - (ptrdiff_t) yoffset * dst_pitch - (ptrdiff_t) xoffset * cpp,
+  pixels,
   map + irb->mt->offset,
   dst_pitch, irb->mt->surf.row_pitch,
   brw->has_swizzling,
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 2ee36583c4..28fdd680da 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -294,7 +294,7 @@ intel_texsubimage_tiled_memcpy(struct gl_context * ctx,
   xoffset * cpp, (xoffset + width) * cpp,
   yoffset, yoffset + height,
   map,
-  pixels - (ptrdiff_t) yoffset * src_pitch - (ptrdiff_t) xoffset * cpp,
+  pixels,
   image->mt->surf.row_pitch, src_pitch,
   brw->has_swizzling,
   image->mt->surf.tiling,
@@ -742,7 +742,7 @@ intel_gettexsubimage_tiled_memcpy(struct gl_context *ctx,
tiled_to_linear(
   xoffset * cpp, (xoffset + width) * cpp,
   yoffset, yoffset + height,
-  pixels - (ptrdiff_t) yoffset * dst_pitch - (ptrdiff_t) xoffset * cpp,
+  pixels,
   map,
   dst_pitch, image->mt->surf.row_pitch,
   brw->has_swizzling,
diff --git a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c 
b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
index 53a5679691..e2b7b3496d 100644
--- a/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
+++ b/src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
@@ -624,8 +624,8 @@ ytiled_to_linear_faster(uint32_t x0, uint32_t x1, uint32_t 
x2, uint32_t x3,
  * copy function (\ref tile_copy_fn).
  * The X range is in bytes, i.e. pixels * bytes-per-pixel.
  * The Y range is in pixels (i.e. unitless).
- * 'dst' is the start of the texture and 'src' is the corresponding
- * address to copy from, though copying begins at (xt1, yt1).
+ * 'dst' is the address of (0, 0) in the destination tiled texture.
+ * 'src' is the address of (xt1, yt1) in the source linear texture.
  */
 void
 linear_to_tiled(uint32_t xt1, uint32_t xt2,
@@ -698,8 +698,8 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
  /* Translate by (xt,yt) for single-tile copier. */
  tile_copy(x0-xt, x1-xt, x2-xt, x3-xt,
y0-yt, y1-yt,
-   dst + (ptrdiff_t) xt * th + (ptrdiff_t) yt * dst_pitch,
-   src + (ptrdiff_t) xt  + (ptrdiff_t) yt * src_pitch,
+   dst + (ptrdiff_t)xt * th  +  (ptrdiff_t)yt * dst_pitch,
+   src + (ptrdiff_t)xt - xt1 + ((ptrdiff_t)yt - yt1) * src_pitch,
src_pitch,
swizzle_bit,
mem_copy);
@@ -715,8 +715,8 @@ linear_to_tiled(uint32_t xt1, uint32_t xt2,
  * copy function (\ref tile_copy_fn).
  * The X range is in bytes, i.e. pixels * bytes-per-pixel.
  * The Y range is in pixels (i.e. unitless).
- * 'dst' is the start of the texture and 'src' is the corresponding
- * address to copy from, though copying begins at (xt1, yt1).
+ * 'dst' is the address of (xt1, yt1) in the destination linear texture.
+ * 'src' is the address of (0, 0) in the source tiled texture.
  */
 void
 tiled_to_linear(uint32_t xt1, uint32_t xt2,
@@ -789,8 +789,8 @@ tiled_to_linear(uint32_t xt1, uint32_t xt2,
  /* Translate by (xt,yt) for single-tile copier. */
  tile_copy(x0-xt, x1-xt, x2-xt, x3-xt,
y0-yt, y1-yt,
-   dst + (ptrdiff_t) xt  + (ptrdiff_t) yt * dst_pitch,
-   src + (ptrdiff_t) xt * th + (ptrdiff_t) yt * src_pitch,
+   dst + (ptrdiff_t)xt - xt1 + ((ptrdiff_t)yt - yt1) * dst_pitch,
+   src + (ptrdiff_t)xt * th  +  (ptrdiff_t)yt * src_pitch,
dst_pitch,
swizzle_bit,
mem_copy);
-- 
2.14.3



[Mesa-dev] [PATCH 5/5] i965/miptree: Don't gtt map from map_depthstencil

2018-01-09 Thread Scott D Phillips
Instead of gtt mapping, call out to other map functions (map_map
or map_tiled_memcpy) for the depth surface. Removes a place where
gtt mapping is used.
---
This is a bit icky, perhaps something like mapping z_mt with
BRW_MAP_DIRECT_BIT could be cleaner (but in that case the
depthstencil mapping and the DIRECT one would fight for the map
slot in mt->level[level].slice[slice].map).
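
As an illustration of why the z offset computation gets simpler, here is a
small sketch comparing the two index calculations. The function names are
hypothetical; the arithmetic mirrors the removed and added z_offset lines
in the diff below, under the assumption that the map functions already
apply the image offset and the map origin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before the patch: index (in 32-bit texels) into the raw z miptree, with
 * image offset and map origin added by hand. row_pitch is in bytes. */
static ptrdiff_t
z_offset_raw(unsigned x, unsigned y, unsigned map_x, unsigned map_y,
             unsigned image_x, unsigned image_y, unsigned row_pitch)
{
   return (ptrdiff_t)(y + image_y + map_y) * (row_pitch / 4) +
          (x + image_x + map_x);
}

/* After the patch: index into the mapped window, which already starts at
 * the requested rectangle, so only the map's own stride is needed. */
static ptrdiff_t
z_offset_mapped(unsigned x, unsigned y, unsigned stride)
{
   return (ptrdiff_t)y * (stride / 4) + x;
}
```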

 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 48 +--
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index fa4ae06399..0b9aafe205 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -3460,16 +3460,21 @@ intel_miptree_map_depthstencil(struct brw_context *brw,
 * temporary buffer back out.
 */
if (!(map->mode & GL_MAP_INVALIDATE_RANGE_BIT)) {
+  struct intel_miptree_map z_mt_map = {
+ .mode = map->mode & ~GL_MAP_WRITE_BIT, .x = map->x, .y = map->y,
+ .w = map->w, .h = map->h,
+  };
+  if (z_mt->surf.tiling == ISL_TILING_LINEAR)
+ intel_miptree_map_map(brw, z_mt, &z_mt_map, level, slice);
+  else
+ intel_miptree_map_tiled_memcpy(brw, z_mt, &z_mt_map, level, slice);
+  uint32_t *z_map = z_mt_map.ptr;
   uint32_t *packed_map = map->ptr;
   uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_READ_BIT);
-  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_READ_BIT);
   unsigned int s_image_x, s_image_y;
-  unsigned int z_image_x, z_image_y;
 
   intel_miptree_get_image_offset(s_mt, level, slice,
 &s_image_x, &s_image_y);
-  intel_miptree_get_image_offset(z_mt, level, slice,
-&z_image_x, &z_image_y);
 
   for (uint32_t y = 0; y < map->h; y++) {
 for (uint32_t x = 0; x < map->w; x++) {
@@ -3478,9 +3483,7 @@ intel_miptree_map_depthstencil(struct brw_context *brw,
 map_x + s_image_x,
 map_y + s_image_y,
 brw->has_swizzling);
-   ptrdiff_t z_offset = ((map_y + z_image_y) *
-  (z_mt->surf.row_pitch / 4) +
- (map_x + z_image_x));
+   ptrdiff_t z_offset = y * (z_mt_map.stride / 4) + x;
uint8_t s = s_map[s_offset];
uint32_t z = z_map[z_offset];
 
@@ -3494,12 +3497,15 @@ intel_miptree_map_depthstencil(struct brw_context *brw,
   }
 
   intel_miptree_unmap_raw(s_mt);
-  intel_miptree_unmap_raw(z_mt);
+  if (z_mt->surf.tiling == ISL_TILING_LINEAR)
+ intel_miptree_unmap_map(z_mt);
+  else
+ intel_miptree_unmap_tiled_memcpy(brw, z_mt, &z_mt_map, level, slice);
 
   DBG("%s: %d,%d %dx%d from z mt %p %d,%d, s mt %p %d,%d = %p/%d\n",
  __func__,
  map->x, map->y, map->w, map->h,
- z_mt, map->x + z_image_x, map->y + z_image_y,
+ z_mt, map->x, map->y,
  s_mt, map->x + s_image_x, map->y + s_image_y,
  map->ptr, map->stride);
} else {
@@ -3521,16 +3527,21 @@ intel_miptree_unmap_depthstencil(struct brw_context 
*brw,
bool map_z32f_x24s8 = mt->format == MESA_FORMAT_Z_FLOAT32;
 
if (map->mode & GL_MAP_WRITE_BIT) {
+  struct intel_miptree_map z_mt_map = {
+ .mode = map->mode | GL_MAP_INVALIDATE_RANGE_BIT, .x = map->x,
+ .y = map->y, .w = map->w, .h = map->h,
+  };
+  if (z_mt->surf.tiling == ISL_TILING_LINEAR)
+ intel_miptree_map_map(brw, z_mt, &z_mt_map, level, slice);
+  else
+ intel_miptree_map_tiled_memcpy(brw, z_mt, &z_mt_map, level, slice);
+  uint32_t *z_map = z_mt_map.ptr;
   uint32_t *packed_map = map->ptr;
   uint8_t *s_map = intel_miptree_map_raw(brw, s_mt, GL_MAP_WRITE_BIT);
-  uint32_t *z_map = intel_miptree_map_raw(brw, z_mt, GL_MAP_WRITE_BIT);
   unsigned int s_image_x, s_image_y;
-  unsigned int z_image_x, z_image_y;
 
   intel_miptree_get_image_offset(s_mt, level, slice,
 &s_image_x, &s_image_y);
-  intel_miptree_get_image_offset(z_mt, level, slice,
-&z_image_x, &z_image_y);
 
   for (uint32_t y = 0; y < map->h; y++) {
 for (uint32_t x = 0; x < map->w; x++) {
@@ -3538,9 +3549,7 @@ intel_miptree_unmap_depthstencil(struct brw_context *brw,
 x + s_image_x + map->x,
 y + s_image_y + map->y,
 brw->has_swizzling);
-   ptrdiff_t z_offset = ((y + z_image_y + map->y) *
-  (z_mt->surf.row_pitch / 4) +
- (x + z_image_x + map->x));
+   ptrdiff_t z_offset = y * (z_mt_map.stride 

[Mesa-dev] [PATCH 4/5] i965/miptree: Map with movntdqa for linear buffers only

2018-01-09 Thread Scott D Phillips
Removes a place where gtt mapping is used.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index e4a3f163d2..fa4ae06399 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -3707,7 +3707,8 @@ intel_miptree_map(struct brw_context *brw,
 #if defined(USE_SSE41)
} else if (!(mode & GL_MAP_WRITE_BIT) &&
   !mt->compressed && cpu_has_sse4_1 &&
-  (mt->surf.row_pitch % 16 == 0)) {
+  (mt->surf.row_pitch % 16 == 0) &&
+  (mt->surf.tiling == ISL_TILING_LINEAR)) {
   intel_miptree_map_movntdqa(brw, mt, map, level, slice);
 #endif
} else if (mt->surf.tiling != ISL_TILING_LINEAR) {
@@ -3752,6 +3753,7 @@ intel_miptree_unmap(struct brw_context *brw,
} else if (!(map->mode & GL_MAP_WRITE_BIT) &&
   !mt->compressed && cpu_has_sse4_1 &&
   (mt->surf.row_pitch % 16 == 0) &&
+  (mt->surf.tiling == ISL_TILING_LINEAR) &&
   map->buffer) {
   intel_miptree_unmap_movntdqa(brw, mt, map, level, slice);
 #endif
-- 
2.14.3



[Mesa-dev] [PATCH 3/5] i965/miptree: Initialize mcs with a linear map

2018-01-09 Thread Scott D Phillips
When initializing mcs, map with MAP_RAW and fill in the linear
map. Removes a place where gtt mapping is used.
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 7a90dafa1e..e4a3f163d2 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1652,7 +1652,7 @@ intel_miptree_init_mcs(struct brw_context *brw,
 *
 * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
 */
-   void *map = brw_bo_map(brw, mt->mcs_buf->bo, MAP_WRITE);
+   void *map = brw_bo_map(brw, mt->mcs_buf->bo, MAP_WRITE | MAP_RAW);
if (unlikely(map == NULL)) {
   fprintf(stderr, "Failed to map mcs buffer into GTT\n");
   brw_bo_unreference(mt->mcs_buf->bo);
-- 
2.14.3



[Mesa-dev] [PATCH 20/21] [RFC] r600/sb: make it work?

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This has some hacks in it that, in the end, make Heaven run.
---
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp  |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp  |  1 +
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 10 +++-
 src/gallium/drivers/r600/sb/sb_gcm.cpp | 32 +-
 src/gallium/drivers/r600/sb/sb_gvn.cpp |  3 +++
 src/gallium/drivers/r600/sb/sb_liveness.cpp|  4 
 src/gallium/drivers/r600/sb/sb_pass.h  |  6 ++---
 src/gallium/drivers/r600/sb/sb_ra_checker.cpp  |  5 +++-
 src/gallium/drivers/r600/sb/sb_sched.cpp   |  4 ++--
 src/gallium/drivers/r600/sb/sb_ssa_builder.cpp |  4 +++-
 src/gallium/drivers/r600/sb/sb_valtable.cpp|  2 +-
 11 files changed, 52 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index ea91e197c0..40d950e857 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -398,7 +398,7 @@ int bc_builder::build_alu(alu_node* n) {
.LDS_OP((bc.op_ptr->opcode[1] >> 8) & 0xff)
.IDX_OFFSET_0((bc.lds_idx_offset >> 0) & 1)
.IDX_OFFSET_2((bc.lds_idx_offset >> 2) & 1)
-   .DST_CHAN(bc.dst_chan)
+   .DST_CHAN(0)
.IDX_OFFSET_3((bc.lds_idx_offset >> 3) & 1);
 
return 0;
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 1fa580e66d..823b927881 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -329,6 +329,7 @@ int bc_decoder::decode_alu(unsigned & i, bc_alu& bc) {
bc.src[2].sel = iw1.get_SRC2_SEL();
bc.src[2].rel = iw1.get_SRC2_REL();
bc.dst_chan = iw1.get_DST_CHAN();
+   bc.dst_gpr = 0;
// TODO: clean up
for (size_t k = 0, e = r600_alu_op_table_size(); k != 
e; k++) {
if (((r600_alu_op_table[k].opcode[1] >> 8) & 
0xff) == iw1.get_LDS_OP()) {
diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 099b295f18..5b202c3737 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -293,6 +293,10 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g, 
node *prev_node) {
unsigned slot = n->bc.slot;
value *d = n->dst.empty() ? NULL : n->dst[0];
 
+   if (n->bc.op == LDS_OP1_LDS_READ_RET && !d) {
+   n->remove();
+   continue;
+   }
if (d && d->is_special_reg()) {
assert((n->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit() || d->is_lds_oq() || d->is_lds_access());
d = NULL;
@@ -339,7 +343,8 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g, 
node *prev_node) {
insert_rv6xx_load_ar_workaround(g);
}
}
-   last->bc.last = 1;
+   if (last)
+   last->bc.last = 1;
 }
 
 bool bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a, 
alu_group_node *prev) {
@@ -358,6 +363,9 @@ bool bc_finalizer::finalize_alu_src(alu_group_node* g, 
alu_node* a, alu_group_no
assert(v);
 
bc_alu_src &src = a->bc.src[si];
+
+   if (si >= 3)
+   continue;
sel_chan sc;
src.rel = 0;
 
diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp 
b/src/gallium/drivers/r600/sb/sb_gcm.cpp
index 7776a10fc8..c08648ba4a 100644
--- a/src/gallium/drivers/r600/sb/sb_gcm.cpp
+++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp
@@ -158,18 +158,19 @@ void gcm::sched_early(container_node *n) {
}
 }
 
-void gcm::td_schedule(bb_node *bb, node *n) {
+bool gcm::td_schedule(bb_node *bb, node *n) {
+   bool pushed_front = false;
GCM_DUMP(
sblog << "scheduling : ";
dump::dump_op(n);
sblog << "\n";
);
-   td_release_uses(n->dst);
+   pushed_front = td_release_uses(n->dst);
 
bb->push_back(n);
 
op_map[n].top_bb = bb;
-
+   return pushed_front;
 }
 
 void gcm::td_sched_bb(bb_node* bb) {
@@ -181,8 +182,10 @@ void gcm::td_sched_bb(bb_node* bb) {
for (sq_iterator N, I = ready.begin(), E = ready.end(); I != E;
I = N) {
N = I; ++N;
-   td_schedule(bb, *I);
+   bool pushed_front = td_schedule(bb, *I);
ready.erase(I);
+   if (pushed_front)
+   

[Mesa-dev] [PATCH 18/21] r600/sb: use different stacks for tracking lds and queue usage.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

The normal SSA renumbering isn't sufficient for LDS queue access, so
this uses two stacks: one for the LDS queue and one for the
LDS r/w ordering.

The LDS oq values are incremented at each use, in linear order.
The LDS rw values are incremented at each definition and used
in the next LDS operation to ensure reordering doesn't occur.
---
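A much simplified model of the three numbering rules, written in C for
illustration only. It assumes a single value per kind and collapses the
per-value maps and the rename stacks into plain counters; enum value_kind,
struct rename_state and both functions are hypothetical, and new indices
are assumed to start at 1.

```c
#include <assert.h>

enum value_kind { VALUE_NORMAL, VALUE_LDS_OQ, VALUE_LDS_RW };

/* For each kind: a running counter and the version currently visible. */
struct rename_state {
   unsigned def_count, def_top;
   unsigned lds_oq_count, lds_oq_top;
   unsigned lds_rw_count, lds_rw_top;
};

/* A definition bumps the rw counter for LDS access values and the normal
 * counter for everything else. */
static unsigned
rename_def(struct rename_state *s, enum value_kind kind)
{
   if (kind == VALUE_LDS_RW) {
      s->lds_rw_top = ++s->lds_rw_count;
      return s->lds_rw_top;
   }
   s->def_top = ++s->def_count;
   return s->def_top;
}

/* A use reads the latest rw version, takes a fresh oq version (each queue
 * read is distinct), or reads the latest normal version. */
static unsigned
rename_use(struct rename_state *s, enum value_kind kind)
{
   switch (kind) {
   case VALUE_LDS_RW:
      return s->lds_rw_top;
   case VALUE_LDS_OQ:
      s->lds_oq_top = ++s->lds_oq_count;
      return s->lds_oq_top;
   default:
      return s->def_top;
   }
}
```

Two consecutive queue reads therefore get versions 1 and 2 and cannot be
treated as the same value, while each LDS operation uses the rw version
defined by the previous one, which chains them in order.
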
 src/gallium/drivers/r600/sb/sb_pass.h  |  4 
 src/gallium/drivers/r600/sb/sb_ssa_builder.cpp | 23 ---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_pass.h 
b/src/gallium/drivers/r600/sb/sb_pass.h
index b5818039c2..a21b0bf997 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -634,7 +634,11 @@ class ssa_rename : public vpass {
typedef sb_map def_map;
 
def_map def_count;
+   def_map lds_oq_count;
+   def_map lds_rw_count;
std::stack<def_map> rename_stack;
+   std::stack<def_map> rename_lds_oq_stack;
+   std::stack<def_map> rename_lds_rw_stack;
 
typedef std::map val_map;
val_map values;
diff --git a/src/gallium/drivers/r600/sb/sb_ssa_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_ssa_builder.cpp
index 3ad628bb68..5cd41c2aab 100644
--- a/src/gallium/drivers/r600/sb/sb_ssa_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_ssa_builder.cpp
@@ -132,6 +132,8 @@ bool ssa_prepare::visit(depart_node& n, bool enter) {
 
 int ssa_rename::init() {
rename_stack.push(def_map());
+   rename_lds_oq_stack.push(def_map());
+   rename_lds_rw_stack.push(def_map());
return 0;
 }
 
@@ -287,8 +289,16 @@ void ssa_rename::pop() {
 value* ssa_rename::rename_use(node *n, value* v) {
if (v->version)
return v;
+   unsigned index;
+   if (v->is_lds_access()) {
+   index = get_index(rename_lds_rw_stack.top(), v);
+   } else if (v->is_lds_oq()) {
+   index = new_index(lds_oq_count, v);
+   set_index(rename_lds_oq_stack.top(), v, index);
+   } else {
+   index = get_index(rename_stack.top(), v);
+   }
 
-   unsigned index = get_index(rename_stack.top(), v);
v = sh.get_value_version(v, index);
 
// if (alu) instruction is predicated and source arg comes from psi node
@@ -313,8 +323,15 @@ value* ssa_rename::rename_use(node *n, value* v) {
 }
 
 value* ssa_rename::rename_def(node *n, value* v) {
-   unsigned index = new_index(def_count, v);
-   set_index(rename_stack.top(), v, index);
+   unsigned index;
+
+   if (v->is_lds_access()) {
+   index = new_index(lds_rw_count, v);
+   set_index(rename_lds_rw_stack.top(), v, index);
+   } else {
+   index = new_index(def_count, v);
+   set_index(rename_stack.top(), v, index);
+   }
value *r = sh.get_value_version(v, index);
return r;
 }
-- 
2.14.3



[Mesa-dev] [PATCH 21/21] [RFC] hack enable sb for tess

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

Don't apply this until we have a lot more tests passing.

This disables SB for shaders that use barriers (as those will be a lot of
"fun").
---
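The resulting gating can be sketched as a standalone C function, for
illustration only. struct shader_flags and both helpers are hypothetical;
they mirror the use_sb masking that remains in r600_pipe_shader_create()
after this patch and the way uses_barrier is derived from the opcode
counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-shader flags consulted for SB. */
struct shader_flags {
   bool is_compute;
   bool uses_doubles;
   bool uses_barrier;
   bool uses_atomics;
   bool uses_images;
};

/* A shader uses barriers if it has any BARRIER or MEMBAR opcode. */
static bool
barrier_used(unsigned barrier_count, unsigned membar_count)
{
   return (barrier_count + membar_count) != 0;
}

/* Tess and LS vertex shaders are no longer excluded by type; any shader
 * with one of these features still falls back to the non-SB path. */
static bool
sb_allowed(bool use_sb, const struct shader_flags *s)
{
   use_sb &= !s->is_compute;
   use_sb &= !s->uses_doubles;
   use_sb &= !s->uses_barrier;
   use_sb &= !s->uses_atomics;
   use_sb &= !s->uses_images;
   return use_sb;
}
```
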
 src/gallium/drivers/r600/r600_shader.c | 11 ---
 src/gallium/drivers/r600/r600_shader.h |  1 +
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 773eb079d2..ceb770a01f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -183,18 +183,14 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
R600_ERR("translation from TGSI failed !\n");
goto error;
}
-   if (shader->shader.processor_type == PIPE_SHADER_VERTEX) {
-   /* only disable for vertex shaders in tess paths */
-   if (key.vs.as_ls)
-   use_sb = 0;
-   }
-   use_sb &= (shader->shader.processor_type != PIPE_SHADER_TESS_CTRL);
-   use_sb &= (shader->shader.processor_type != PIPE_SHADER_TESS_EVAL);
use_sb &= (shader->shader.processor_type != PIPE_SHADER_COMPUTE);
 
/* disable SB for shaders using doubles */
use_sb &= !shader->shader.uses_doubles;
 
+   /* disable SB for barriers */
+   use_sb &= !shader->shader.uses_barrier;
+
use_sb &= !shader->shader.uses_atomics;
use_sb &= !shader->shader.uses_images;
 
@@ -3124,6 +3120,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
shader->uses_atomics = ctx.info.file_mask[TGSI_FILE_HW_ATOMIC];
shader->nsys_inputs = 0;
 
+   shader->uses_barrier = ctx.info.opcode_count[TGSI_OPCODE_BARRIER] + 
ctx.info.opcode_count[TGSI_OPCODE_MEMBAR];
shader->uses_images = ctx.info.file_count[TGSI_FILE_IMAGE] > 0 ||
ctx.info.file_count[TGSI_FILE_BUFFER] > 0;
indirect_gprs = ctx.info.indirect_files & ~((1 << TGSI_FILE_CONSTANT) | 
(1 << TGSI_FILE_SAMPLER));
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index 8444907883..ad65055295 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -119,6 +119,7 @@ struct r600_shader {
boolean uses_doubles;
boolean uses_atomics;
boolean uses_images;
+   boolean uses_barrier;
uint8_t atomic_base;
uint8_t rat_base;
uint8_t image_size_const_offset;
-- 
2.14.3



[Mesa-dev] [PATCH 19/21] r600/sb: add lds related peepholes.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

If an LDS instruction has no destination:
a) convert _RET instructions to their non-_RET variants
b) set src0 to undefined if it's a READ_RET; it should then be removed by DCE.
---
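The opcode rewrite in (a) relies on the _RET opcodes being laid out in the
same order as their non-_RET counterparts. A minimal C sketch of that idea
follows; the enum values are made up for illustration and are NOT the real
r600 opcode numbers, and drop_unused_return() is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical numbering. The only property that matters is that each
 * _RET opcode sits at the same distance from its non-_RET variant. */
enum lds_op {
   LDS_ADD = 0,
   LDS_SUB = 1,
   LDS_MSKOR = 2,
   LDS_ADD_RET = 32,
   LDS_SUB_RET = 33,
   LDS_MSKOR_RET = 34,
   LDS_READ_RET = 50
};

/* If the result is unused, map a _RET opcode in the contiguous range to
 * its plain variant by subtracting the range base. READ_RET has no plain
 * variant and is left alone here. */
static enum lds_op
drop_unused_return(enum lds_op op, int has_dst)
{
   if (has_dst)
      return op;
   if (op >= LDS_ADD_RET && op <= LDS_MSKOR_RET)
      return (enum lds_op)(op - LDS_ADD_RET + LDS_ADD);
   return op;
}
```
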
 src/gallium/drivers/r600/sb/sb_peephole.cpp | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_peephole.cpp 
b/src/gallium/drivers/r600/sb/sb_peephole.cpp
index 49a6965b1f..4390a8f525 100644
--- a/src/gallium/drivers/r600/sb/sb_peephole.cpp
+++ b/src/gallium/drivers/r600/sb/sb_peephole.cpp
@@ -68,7 +68,14 @@ void peephole::run_on(container_node* c) {
if (n->is_alu_inst()) {
alu_node *a = static_cast<alu_node*>(n);
 
-   if (a->bc.op_ptr->flags &
+   if (a->bc.op_ptr->flags & AF_LDS) {
+   if (!a->dst[0]) {
+   if (a->bc.op >= 
LDS_OP2_LDS_ADD_RET && a->bc.op <= LDS_OP3_LDS_MSKOR_RET)
+   a->bc.set_op(a->bc.op - 
LDS_OP2_LDS_ADD_RET + LDS_OP2_LDS_ADD);
+   if (a->bc.op == 
LDS_OP1_LDS_READ_RET)
+   a->src[0] = 
sh.get_undef_value();
+   }
+   } else if (a->bc.op_ptr->flags &
(AF_PRED | AF_SET | AF_CMOV | 
AF_KILL)) {
optimize_cc_op(a);
} else if (a->bc.op == ALU_OP1_FLT_TO_INT) {
-- 
2.14.3



[Mesa-dev] [PATCH 14/21] r600/sb: add gcm support to avoid clause between lds read/queue read

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

You have to schedule LDS_READ_RET _, x and MOV reg, LDS_OQ_A_POP
in the same basic block/clause. This makes sure that once we've issued
a MOV we don't add another block until we balance it with an
LDS read.
---
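The balancing can be modelled with a single counter, sketched here in C for
illustration only. struct sched_node, struct sched_state and the two
functions are hypothetical; the sketch assumes bottom-up scheduling, so the
queue consumer (the MOV) is seen before the producer (the LDS read).

```c
#include <assert.h>
#include <stdbool.h>

enum sched_queue { SQ_CF, SQ_ALU, SQ_TEX, SQ_VTX, SQ_GDS };

struct sched_node {
   bool produces_lds_oq;   /* e.g. LDS_READ_RET */
   bool consumes_lds_oq;   /* e.g. MOV reg, LDS_OQ_A_POP */
};

struct sched_state {
   unsigned outstanding_lds_oq;
};

/* Bottom-up: a consumer opens a pending queue read, a producer closes it. */
static void
bu_schedule(struct sched_state *s, const struct sched_node *n)
{
   if (n->produces_lds_oq) {
      /* Guard added for this sketch: never close more than was opened. */
      assert(s->outstanding_lds_oq > 0);
      s->outstanding_lds_oq--;
   }
   if (n->consumes_lds_oq)
      s->outstanding_lds_oq++;
}

/* Only ALU work may be scheduled while a queue read is still pending, so
 * no other clause type can split the consumer from its producer. */
static bool
can_schedule_queue(const struct sched_state *s, enum sched_queue sq)
{
   return sq == SQ_ALU || s->outstanding_lds_oq == 0;
}
```
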
 src/gallium/drivers/r600/sb/sb_gcm.cpp | 15 ++-
 src/gallium/drivers/r600/sb/sb_pass.h  |  4 +++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_gcm.cpp 
b/src/gallium/drivers/r600/sb/sb_gcm.cpp
index fbebe3427d..7776a10fc8 100644
--- a/src/gallium/drivers/r600/sb/sb_gcm.cpp
+++ b/src/gallium/drivers/r600/sb/sb_gcm.cpp
@@ -366,6 +366,9 @@ void gcm::bu_sched_bb(bb_node* bb) {
continue;
}
 
+   if (sq != SQ_ALU && outstanding_lds_oq)
+   continue;
+
if (!bu_ready_next[sq].empty())
bu_ready[sq].splice(bu_ready[sq].end(), 
bu_ready_next[sq]);
 
@@ -388,7 +391,7 @@ void gcm::bu_sched_bb(bb_node* bb) {
}
 
// simple heuristic to limit register pressure,
-   if (sq == SQ_ALU && live_count > rp_threshold &&
+   if (sq == SQ_ALU && live_count > rp_threshold 
&& !outstanding_lds_oq &&
(!bu_ready[SQ_TEX].empty() ||
 !bu_ready[SQ_VTX].empty() ||
 !bu_ready_next[SQ_TEX].empty() 
||
@@ -423,6 +426,12 @@ void gcm::bu_sched_bb(bb_node* bb) {
check_alu_ready_count(24))
break;
 
+
+   if (sq == SQ_ALU && n->consumes_lds_oq() &&
+   (bu_ready[SQ_TEX].size() || 
bu_ready[SQ_VTX].size() || bu_ready[SQ_GDS].size())) {
+   GCM_DUMP( sblog << "switching 
scheduling due to lds op\n"; );
+   break;
+   }
bu_ready[sq].pop_front();
 
if (sq != SQ_CF) {
@@ -513,6 +522,10 @@ void gcm::bu_schedule(container_node* c, node* n) {
 
assert(op_map[n].bottom_bb == bu_bb);
 
+   if (n->produces_lds_oq())
+   outstanding_lds_oq--;
+   if (n->consumes_lds_oq())
+   outstanding_lds_oq++;
bu_release_defs(n->src, true);
bu_release_defs(n->dst, false);
 
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h 
b/src/gallium/drivers/r600/sb/sb_pass.h
index e878f8c70c..b5818039c2 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -223,6 +223,7 @@ class gcm : public pass {
sched_queue ready;
sched_queue ready_above;
 
+   unsigned outstanding_lds_oq;
container_node pending;
 
struct op_info {
@@ -263,7 +264,8 @@ public:
 
gcm(shader &sh) : pass(sh),
bu_ready(), bu_ready_next(), bu_ready_early(),
-   ready(), op_map(), uses(), nuc_stk(1), ucs_level(),
+   ready(), outstanding_lds_oq(),
+   op_map(), uses(), nuc_stk(1), ucs_level(),
bu_bb(), pending_defs(), pending_nodes(), cur_sq(),
live(), live_count(), pending_exec_mask_update() {}
 
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 12/21] r600/sb: handle LDS operations in folding.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

Don't try to fold LDS operations using expressions.
---
 src/gallium/drivers/r600/sb/sb_expr.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 7a5d62c8e8..7d43ef1d1d 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -74,6 +74,8 @@ bool expr_handler::equal(value *l, value *r) {
 
assert(l != r);
 
+   if (l->is_lds_access() || r->is_lds_access())
+   return false;
if (l->gvalue() == r->gvalue())
return true;
 
@@ -383,8 +385,14 @@ bool expr_handler::fold_alu_op1(alu_node& n) {
if (n.src.empty())
return false;
 
+   /* don't fold LDS instructions */
+   if (n.bc.op_ptr->flags & AF_LDS)
+   return false;
+
value* v0 = n.src[0]->gvalue();
 
+   if (v0->is_lds_oq() || v0->is_lds_access())
+   return false;
assert(v0 && n.dst[0]);
 
if (!v0->is_const()) {
@@ -942,6 +950,9 @@ bool expr_handler::fold_alu_op3(alu_node& n) {
value* v1 = n.src[1]->gvalue();
value* v2 = n.src[2]->gvalue();
 
+   /* LDS instructions look like op3 with no dst - don't fold. */
+   if (!n.dst[0])
+   return false;
assert(v0 && v1 && v2 && n.dst[0]);
 
bool isc0 = v0->is_const();
-- 
2.14.3



[Mesa-dev] [PATCH 17/21] r600/sb: schedule LDS ops in appropriate places.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

LDS ops have to go in SLOT_X, and LDS OQ reads have read port
restrictions, so we try to force those into having only one per slot
and to avoid bank swizzles.
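
As a rough illustration of the two constraints described above (not the
actual sb code), the slot mask and the group reservation check could be
modelled as below. The names OpTraits, allowed_slots and can_reserve are
made up for this sketch, and it assumes every non-LDS op is vector-capable;
only the masks 0x01, 0x0F, 0x10 and 0x1F are taken from the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the sb types; only the mask values come from the patch.
constexpr uint8_t SLOT_X       = 0x01; // first vector slot
constexpr uint8_t VECTOR_SLOTS = 0x0F; // slots X, Y, Z, W
constexpr uint8_t TRANS_SLOT   = 0x10; // fifth slot, not present on Cayman

struct OpTraits {
    bool trans_capable; // op may also be placed in the trans slot
    bool is_lds_idx;    // op is an LDS_IDX operation
};

// Slots an op may occupy: LDS_IDX ops are pinned to SLOT_X.
uint8_t allowed_slots(const OpTraits &op, bool has_trans_unit) {
    if (op.is_lds_idx)
        return SLOT_X;
    uint8_t mask = VECTOR_SLOTS;
    if (has_trans_unit && op.trans_capable)
        mask |= TRANS_SLOT;
    return mask;
}

// Whether a node may join the current ALU group:
//  - nothing is added once the group already holds an LDS OQ read;
//  - an LDS OQ read is only accepted into a completely empty group.
bool can_reserve(bool node_reads_oq, bool group_has_oq_read,
                 uint8_t available_slots, bool has_trans_unit) {
    const uint8_t all_free = has_trans_unit ? 0x1F : 0x0F;
    assert((available_slots & ~all_free) == 0); // no slots beyond the hardware's
    if (group_has_oq_read)
        return false;
    if (node_reads_oq && available_slots != all_free)
        return false;
    return true;
}
```

With these definitions, can_reserve(true, false, 0x1E, true) is false
because SLOT_X is already taken, so the OQ read has to wait for a fresh group.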
---
 src/gallium/drivers/r600/sb/sb_bc.h  | 3 +++
 src/gallium/drivers/r600/sb/sb_sched.cpp | 4 
 2 files changed, 7 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 3a3bae9d44..b35671bf0f 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -711,6 +711,9 @@ public:
mask = 0x0F;
if (!is_cayman() && (slot_flags & AF_S))
mask |= 0x10;
+   /* Force LDS_IDX ops into SLOT_X */
+   if (op_ptr->opcode[0] == -1 && ((op_ptr->opcode[1] & 0xFF) == 
0x11))
+   mask = 0x01;
return mask;
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 1feef585df..f5fd84d54a 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -461,6 +461,10 @@ bool alu_group_tracker::try_reserve(alu_node* n) {
if (n->uses_ar() && has_mova)
return false;
 
+   if (consumes_lds_oqa)
+   return false;
+   if (n->consumes_lds_oq() && available_slots != (sh.get_ctx().has_trans 
? 0x1F : 0x0F))
+   return false;
for (unsigned i = 0; i < nsrc; ++i) {
 
unsigned last_id = next_id;
-- 
2.14.3



[Mesa-dev] [PATCH 13/21] r600/sb: handle lds special dest registers.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This adds the LDS special registers to the geometry emit handling.
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 +-
 src/gallium/drivers/r600/sb/sb_sched.cpp   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index d377a3950a..099b295f18 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -294,7 +294,7 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g, 
node *prev_node) {
value *d = n->dst.empty() ? NULL : n->dst[0];
 
if (d && d->is_special_reg()) {
-   assert((n->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit());
+   assert((n->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit() || d->is_lds_oq() || d->is_lds_access());
d = NULL;
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 4158317765..6d7ab671ff 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -1663,7 +1663,7 @@ unsigned post_scheduler::try_add_instruction(node *n) {
value *d = a->dst.empty() ? NULL : a->dst[0];
 
if (d && d->is_special_reg()) {
-   assert((a->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit());
+   assert((a->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit() || d->is_lds_oq() || d->is_lds_access());
d = NULL;
}
 
-- 
2.14.3



[Mesa-dev] [PATCH 15/21] r600/sb: adding lds oq tracking to the scheduler

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This adds support for tracking the LDS OQ reads/writes
so we can avoid scheduling other things in between.

This patch just adds the tracking and an assert to expose
problems.
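
The bookkeeping can be pictured with a small stand-alone model. This is
only a sketch under assumptions: GroupFlags, ClauseTracker and
clause_is_balanced are invented names, and it assumes groups are emitted
bottom-up (as the clause->push_front(g) call in the diff suggests), so a
queue consumer is seen before the LDS op that feeds it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-group summary, mirroring the two new bool flags.
struct GroupFlags {
    bool consumes_oq; // some op in the group pops the LDS output queue
    bool produces_oq; // some op in the group pushes a result onto it
};

class ClauseTracker {
    unsigned outstanding_oq_reads = 0;

public:
    // Called once per emitted group, in bottom-up order.
    void emit_group(const GroupFlags &g) {
        outstanding_oq_reads += g.consumes_oq ? 1u : 0u;
        outstanding_oq_reads -= g.produces_oq ? 1u : 0u;
    }

    bool balanced() const { return outstanding_oq_reads == 0; }

    // Closing a clause with a pending queue read would split the pair
    // across clauses, which is what the new assert is meant to catch.
    void emit_clause() {
        assert(balanced());
        outstanding_oq_reads = 0;
    }
};

// Convenience wrapper: feed a whole clause and report whether it balances.
bool clause_is_balanced(const std::vector<GroupFlags> &bottom_up_groups) {
    ClauseTracker t;
    for (const GroupFlags &g : bottom_up_groups)
        t.emit_group(g);
    return t.balanced();
}
```

A clause holding a consumer group followed (bottom-up) by its producer
group balances; a clause holding only the consumer does not.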
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 13 ++---
 src/gallium/drivers/r600/sb/sb_sched.h   |  5 +
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 6d7ab671ff..26e4811b1c 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -312,7 +312,7 @@ alu_group_tracker::alu_group_tracker(shader &sh)
  gpr(), lt(), slots(),
  max_slots(sh.get_ctx().is_cayman() ? 4 : 5),
  has_mova(), uses_ar(), has_predset(), has_kill(),
- updates_exec_mask(), chan_count(), interp_param(), next_id() {
+ updates_exec_mask(), consumes_lds_oqa(), produces_lds_oqa(), 
chan_count(), interp_param(), next_id() {
 
available_slots = sh.get_ctx().has_trans ? 0x1F : 0x0F;
 }
@@ -680,6 +680,8 @@ void alu_group_tracker::reset(bool keep_packed) {
memset(slots, 0, sizeof(slots));
vmap.clear();
next_id = 0;
+   produces_lds_oqa = 0;
+   consumes_lds_oqa = 0;
has_mova = false;
uses_ar = false;
has_predset = false;
@@ -703,7 +705,8 @@ void alu_group_tracker::update_flags(alu_node* n) {
has_mova |= (flags & AF_MOVA);
has_predset |= (flags & AF_ANY_PRED);
uses_ar |= n->uses_ar();
-
+   consumes_lds_oqa |= n->consumes_lds_oq();
+   produces_lds_oqa |= n->produces_lds_oq();
if (flags & AF_ANY_PRED) {
if (n->dst[2] != NULL)
updates_exec_mask = true;
@@ -1958,6 +1961,7 @@ void alu_kcache_tracker::reset() {
 void alu_clause_tracker::reset() {
group = 0;
slot_count = 0;
+   outstanding_lds_oqa_reads = 0;
grp0.reset();
grp1.reset();
 }
@@ -1966,7 +1970,7 @@ alu_clause_tracker::alu_clause_tracker(shader &sh)
: sh(sh), kt(sh.get_ctx().hw_class), slot_count(),
  grp0(sh), grp1(sh),
  group(), clause(),
- push_exec_mask(),
+ push_exec_mask(), outstanding_lds_oqa_reads(),
  current_ar(), current_pr(), current_idx() {}
 
 void alu_clause_tracker::emit_group() {
@@ -1988,6 +1992,8 @@ void alu_clause_tracker::emit_group() {
 
clause->push_front(g);
 
+   outstanding_lds_oqa_reads += grp().get_consumes_lds_oqa();
+   outstanding_lds_oqa_reads -= grp().get_produces_lds_oqa();
slot_count += grp().slot_count();
 
new_group();
@@ -2000,6 +2006,7 @@ void alu_clause_tracker::emit_clause(container_node *c) {
 
kt.init_clause(clause->bc);
 
+   assert(!outstanding_lds_oqa_reads);
assert(!current_ar);
assert(!current_pr);
 
diff --git a/src/gallium/drivers/r600/sb/sb_sched.h 
b/src/gallium/drivers/r600/sb/sb_sched.h
index 5a2663442b..91a34e078d 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.h
+++ b/src/gallium/drivers/r600/sb/sb_sched.h
@@ -127,6 +127,8 @@ class alu_group_tracker {
bool has_kill;
bool updates_exec_mask;
 
+   bool consumes_lds_oqa;
+   bool produces_lds_oqa;
unsigned chan_count[4];
 
// param index + 1 (0 means that group doesn't refer to Params)
@@ -166,6 +168,8 @@ public:
unsigned literal_slot_count() { return (literal_count() + 1) >> 1; };
unsigned slot_count() { return inst_count() + literal_slot_count(); }
 
+   bool get_consumes_lds_oqa() { return consumes_lds_oqa; }
+   bool get_produces_lds_oqa() { return produces_lds_oqa; }
alu_group_node* emit();
 
rp_kcache_tracker& kcache() { return kc; }
@@ -212,6 +216,7 @@ class alu_clause_tracker {
 
bool push_exec_mask;
 
+   unsigned outstanding_lds_oqa_reads;
 public:
container_node conflict_nodes;
 
-- 
2.14.3



[Mesa-dev] [PATCH 16/21] r600/sb: hit the scheduler with a big hammer to avoid lds splits.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This tries to avoid an LDS queue read getting scheduled separately
from an LDS ret read. The non-sb code uses the same style of hammer;
this isn't foolproof.

We can do better, but it's a bit tricky, as you have to scan ahead
and schedule either more LDS OQ moves or more LDS reads, and that
could lead to you running out of space anyway.
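
To make the effect of the extra reservation concrete, here is a minimal
model of the limit check. It is a sketch, not the sb implementation:
fits_in_clause and its parameters are invented, and the clause capacity
of 128 slots is an assumption about MAX_ALU_SLOTS. The value 60 is the
one added by this patch.

```cpp
#include <cassert>

constexpr unsigned kMaxAluSlots = 128; // assumed value of MAX_ALU_SLOTS
constexpr unsigned kLdsReserve  = 60;  // extra head-room added by the patch

// Returns true if a group needing `group_slots` still fits in the clause.
// When the group starts a new run of LDS queue reads (it consumes the
// queue and nothing is outstanding yet), a large block of slots is held
// back so that the matching LDS reads can land in the same clause.
bool fits_in_clause(unsigned slots_used, unsigned group_slots,
                    unsigned base_reserve, bool group_consumes_oq,
                    unsigned outstanding_oq_reads) {
    unsigned reserve = base_reserve;
    if (group_consumes_oq && outstanding_oq_reads == 0)
        reserve += kLdsReserve;
    assert(reserve <= kMaxAluSlots);
    return slots_used + group_slots <= kMaxAluSlots - reserve;
}
```

Under these assumptions a 5-slot group that starts a queue read is
rejected once 64 or more slots are in use, while the same group is still
accepted at 70 used slots if a queue read is already outstanding.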
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 26e4811b1c..1feef585df 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -2034,6 +2034,9 @@ bool alu_clause_tracker::check_clause_limits() {
// ...and index registers
reserve_slots += (current_idx[0] != NULL) + (current_idx[1] != NULL);
 
+   if (gt.get_consumes_lds_oqa() && !outstanding_lds_oqa_reads)
+   reserve_slots += 60;
+
if (slot_count + slots > MAX_ALU_SLOTS - reserve_slots)
return false;
 
-- 
2.14.3



[Mesa-dev] [PATCH 01/21] r600: emit 0 gds_op for tf write.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This field is ignored for tf writes so should be 0.
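
For reference, the field selection after this change can be sketched as
a small pure function. GdsEncoding and encode_gds are hypothetical names
used only for illustration; the constants 4, 5 and 0x3f and the shift by
8 are the ones in the diff.

```cpp
#include <cstdint>

// The two fields chosen per GDS instruction.
struct GdsEncoding {
    unsigned mem_op; // MEM_OP field
    unsigned gds_op; // GDS_OP field
};

// isa_opcode stands for the value returned by the ISA opcode lookup.
GdsEncoding encode_gds(uint32_t isa_opcode, bool is_tf_write) {
    if (is_tf_write)
        return GdsEncoding{5u, 0u}; // GDS_OP is ignored for TF writes, emit 0
    return GdsEncoding{4u, (isa_opcode >> 8) & 0x3fu};
}
```

So a tess factor write always yields {5, 0}, whatever bits the opcode
table carries, while any other GDS op keeps its 6-bit sub-opcode.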

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/eg_asm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index 8f9d1b85f2..f8651bdff5 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -225,9 +225,10 @@ int eg_bytecode_gds_build(struct r600_bytecode *bc, struct 
r600_bytecode_gds *gd
 {
unsigned gds_op = (r600_isa_fetch_opcode(bc->isa->hw_class, gds->op) >> 
8) & 0x3f;
unsigned opcode;
-   if (gds->op == FETCH_OP_TF_WRITE)
+   if (gds->op == FETCH_OP_TF_WRITE) {
opcode = 5;
-   else
+   gds_op = 0;
+   } else
opcode = 4;
bc->bytecode[id++] = S_SQ_MEM_GDS_WORD0_MEM_INST(2) |
S_SQ_MEM_GDS_WORD0_MEM_OP(opcode) |
-- 
2.14.3



[Mesa-dev] [PATCH 10/21] r600/sb: add initial support for parsing lds operations.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This handles parsing the LDS ops and queue accesses.
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 52 ++--
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 8ab4083a3c..970e4141d5 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -384,7 +384,40 @@ int bc_parser::prepare_alu_group(cf_node* cf, 
alu_group_node *g) {
 
unsigned flags = n->bc.op_ptr->flags;
 
-   if (flags & AF_PRED) {
+   if (flags & AF_LDS) {
+   bool need_rw = false, need_oqa = false, need_oqb = 
false;
+   int ndst = 0, ncount = 0;
+
+   /* all non-read operations have side effects */
+   if (n->bc.op != LDS_OP2_LDS_READ2_RET &&
+   n->bc.op != LDS_OP1_LDS_READ_REL_RET &&
+   n->bc.op != LDS_OP1_LDS_READ_RET) {
+   n->flags |= NF_DONT_KILL;
+   ndst++;
+   need_rw = true;
+   }
+
+   if (n->bc.op >= LDS_OP2_LDS_ADD_RET && n->bc.op <= 
LDS_OP1_LDS_USHORT_READ_RET) {
+   need_oqa = true;
+   ndst++;
+   }
+
+   if (n->bc.op == LDS_OP2_LDS_READ2_RET || n->bc.op == 
LDS_OP1_LDS_READ_REL_RET) {
+   need_oqb = true;
+   ndst++;
+   }
+
+   n->dst.resize(ndst);
+   if (need_oqa)
+   n->dst[ncount++] = 
sh->get_special_value(SV_LDS_OQA);
+   if (need_oqb)
+   n->dst[ncount++] = 
sh->get_special_value(SV_LDS_OQB);
+   if (need_rw)
+   n->dst[ncount++] = 
sh->get_special_value(SV_LDS_RW);
+
+   n->flags |= NF_DONT_MOVE | NF_DONT_HOIST;
+
+   } else if (flags & AF_PRED) {
n->dst.resize(3);
if (n->bc.update_pred)
n->dst[1] = sh->get_special_value(SV_ALU_PRED);
@@ -417,7 +450,7 @@ int bc_parser::prepare_alu_group(cf_node* cf, 
alu_group_node *g) {
 
n->flags |= NF_DONT_HOIST;
 
-   } else if (n->bc.op_ptr->src_count == 3 || n->bc.write_mask) {
+   } else if ((n->bc.op_ptr->src_count == 3 || n->bc.write_mask) 
&& !(flags & AF_LDS)) {
assert(!n->bc.dst_rel || n->bc.index_mode == 
INDEX_AR_X);
 
value *v = sh->get_gpr_value(false, n->bc.dst_gpr, 
n->bc.dst_chan,
@@ -487,6 +520,21 @@ int bc_parser::prepare_alu_group(cf_node* cf, 
alu_group_node *g) {
// param index as equal instructions and leave 
only one of them
n->src[s] = 
sh->get_special_ro_value(sel_chan(src.sel,
  
n->bc.slot));
+   } else if (ctx.is_lds_oq(src.sel)) {
+   switch (src.sel) {
+   case ALU_SRC_LDS_OQ_A:
+   case ALU_SRC_LDS_OQ_B:
+   assert(!"Unsupported LDS queue access 
in SB");
+   break;
+   case ALU_SRC_LDS_OQ_A_POP:
+   n->src[s] = 
sh->get_special_value(SV_LDS_OQA);
+   break;
+   case ALU_SRC_LDS_OQ_B_POP:
+   n->src[s] = 
sh->get_special_value(SV_LDS_OQB);
+   break;
+   }
+   n->flags |= NF_DONT_HOIST | NF_DONT_MOVE;
+
} else {
switch (src.sel) {
case ALU_SRC_0:
-- 
2.14.3



[Mesa-dev] [PATCH 08/21] r600/sb: lds ops have no dst register.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

Although these are op3s they don't have a dst reg.
---
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
index 72a1b24467..3b5d9e77b2 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_dump.cpp
@@ -232,7 +232,7 @@ static void print_dst(sb_ostream &s, bc_alu &alu)
reg_char = 'T';
}
 
-   if (alu.write_mask || alu.op_ptr->src_count == 3) {
+   if (alu.write_mask || (alu.op_ptr->src_count == 3 && alu.op < 
LDS_OP2_LDS_ADD)) {
s << reg_char;
print_sel(s, sel, alu.dst_rel, alu.index_mode, 0);
} else {
-- 
2.14.3



[Mesa-dev] [PATCH 03/21] r600/sb: fix a bug emitting ar load from a constant.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

Some tess shaders were doing MOVA_INT _, c0.x on cayman, and then
hitting an assert in sb_bc_finalize.cpp:translate_kcache.

This makes sure the toplevel kcache tracker gets updated,
and the clause gets fixed up.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 2fbec2f77e..4158317765 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -1130,6 +1130,9 @@ void post_scheduler::emit_clause() {
if (alu.current_ar) {
emit_load_ar();
process_group();
+   if (!alu.check_clause_limits()) {
+   // Can't happen since clause only contains 
MOVA/CF_SET_IDX0/1
+   }
alu.emit_group();
}
 
-- 
2.14.3



[Mesa-dev] [PATCH 09/21] r600/sb: disable if conversion for hs

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This fixes bad interactions with the LDS special values.
---
 src/gallium/drivers/r600/sb/sb_core.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp 
b/src/gallium/drivers/r600/sb/sb_core.cpp
index cdc2862d36..5049b67784 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -191,7 +191,7 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 
// if conversion breaks the dependency tracking between CF_EMIT ops 
when it removes
// the phi nodes for SV_GEOMETRY_EMIT. Just disable it for GS
-   if (sh->target != TARGET_GS)
+   if (sh->target != TARGET_GS && sh->target != TARGET_HS)
SB_RUN_PASS(if_conversion,  1);
 
// if_conversion breaks info about uses, but next pass (peephole)
-- 
2.14.3



[Mesa-dev] [PATCH 06/21] r600/sb: update last_cf if alu is the last clause

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

It's rare to have a final alu clause on normal shaders (exports),
but tess shaders write to LDS as their output, so we see some
alu clauses, and the CF_END gets put in the wrong place.

This makes sure to update last_cf correctly.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index c20640e476..2ec4db624a 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -266,6 +266,7 @@ void bc_finalizer::run_on(container_node* c) {
}
}
}
+   last_cf = c;
} else if (n->is_fetch_inst()) {
finalize_fetch(static_cast<fetch_node*>(n));
} else if (n->is_cf_inst()) {
-- 
2.14.3



[Mesa-dev] [PATCH 05/21] r600/sb: start adding GDS support

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This adds support for GDS ops to sb backend.

This seems to work for atomics and tess factor writes.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_isa.h|  2 +-
 src/gallium/drivers/r600/sb/sb_bc.h|  7 
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp  | 44 +-
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp  |  9 +-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 13 ++--
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |  7 
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 11 +--
 src/gallium/drivers/r600/sb/sb_dump.cpp|  1 +
 src/gallium/drivers/r600/sb/sb_gcm.cpp | 20 +---
 src/gallium/drivers/r600/sb/sb_ir.h|  3 +-
 src/gallium/drivers/r600/sb/sb_peephole.cpp| 14 +++-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp |  2 ++
 src/gallium/drivers/r600/sb/sb_shader.cpp  |  3 ++
 13 files changed, 123 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index b5a36b4e80..f6e26976c5 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -115,7 +115,7 @@ enum alu_op_flags
AF_CC_LE= (5U << AF_CC_SHIFT),
 };
 
-/* flags for FETCH instructions (TEX/VTX) */
+/* flags for FETCH instructions (TEX/VTX/GDS) */
 enum fetch_op_flags
 {
FF_GDS  = (1<<0),
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index fed041cf50..fc3fa5082d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -401,6 +401,7 @@ enum sched_queue_id {
SQ_ALU,
SQ_TEX,
SQ_VTX,
+   SQ_GDS,
 
SQ_NUM
 };
@@ -580,6 +581,11 @@ struct bc_fetch {
unsigned mega_fetch:1;
 
unsigned src2_gpr:7; /* for GDS */
+   unsigned alloc_consume:1;
+   unsigned uav_id:4;
+   unsigned uav_index_mode:2;
+   unsigned bcast_first_req:1;
+
void set_op(unsigned op) { this->op = op; op_ptr = r600_isa_fetch(op); }
 };
 
@@ -966,6 +972,7 @@ private:
int build_fetch_clause(cf_node *n);
int build_fetch_tex(fetch_node *n);
int build_fetch_vtx(fetch_node *n);
+   int build_fetch_gds(fetch_node *n);
 };
 
 } // namespace r600_sb
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index b0df3d9a54..ea91e197c0 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -129,7 +129,9 @@ int bc_builder::build_fetch_clause(cf_node* n) {
I != E; ++I) {
fetch_node *f = static_cast<fetch_node*>(*I);
 
-   if (f->bc.op_ptr->flags & FF_VTX)
+   if (f->bc.op_ptr->flags & FF_GDS)
+   build_fetch_gds(f);
+   else if (f->bc.op_ptr->flags & FF_VTX)
build_fetch_vtx(f);
else
build_fetch_tex(f);
@@ -558,6 +560,46 @@ int bc_builder::build_fetch_tex(fetch_node* n) {
return 0;
 }
 
+int bc_builder::build_fetch_gds(fetch_node *n) {
+   const bc_fetch &bc = n->bc;
+   const fetch_op_info *fop = bc.op_ptr;
+   unsigned gds_op = (ctx.fetch_opcode(bc.op) >> 8) & 0x3f;
+   unsigned mem_op = 4;
+   assert(fop->flags & FF_GDS);
+
+   if (bc.op == FETCH_OP_TF_WRITE) {
+   mem_op = 5;
+   gds_op = 0;
+   }
+
+   bb << MEM_GDS_WORD0_EGCM()
+   .MEM_INST(2)
+   .MEM_OP(mem_op)
+   .SRC_GPR(bc.src_gpr)
+   .SRC_SEL_X(bc.src_sel[0])
+   .SRC_SEL_Y(bc.src_sel[1])
+   .SRC_SEL_Z(bc.src_sel[2]);
+
+   bb << MEM_GDS_WORD1_EGCM()
+   .DST_GPR(bc.dst_gpr)
+   .DST_REL_MODE(bc.dst_rel)
+   .GDS_OP(gds_op)
+   .SRC_GPR(bc.src2_gpr)
+   .UAV_INDEX_MODE(bc.uav_index_mode)
+   .UAV_ID(bc.uav_id)
+   .ALLOC_CONSUME(bc.alloc_consume)
+   .BCAST_FIRST_REQ(bc.bcast_first_req);
+
+   bb << MEM_GDS_WORD2_EGCM()
+   .DST_SEL_X(bc.dst_sel[0])
+   .DST_SEL_Y(bc.dst_sel[1])
+   .DST_SEL_Z(bc.dst_sel[2])
+   .DST_SEL_W(bc.dst_sel[3]);
+
+   bb << 0;
+   return 0;
+}
+
 int bc_builder::build_fetch_vtx(fetch_node* n) {
const bc_fetch &bc = n->bc;
const fetch_op_info *fop = bc.op_ptr;
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 8712abe5f7..1fa580e66d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -415,7 +415,10 @@ int bc_decoder::decode_fetch(unsigned & i, bc_fetch& bc) {
unsigned gds_op;
if (mem_op == 4) {
   

[Mesa-dev] [PATCH 11/21] r600/sb: add finalising for lds output queue special values.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

We need to convert these to the hw special registers.
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 2ec4db624a..d377a3950a 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -428,6 +428,18 @@ bool bc_finalizer::finalize_alu_src(alu_group_node* g, 
alu_node* a, alu_group_no
src.chan = k.chan();
break;
}
+   case VLK_SPECIAL_REG:
+   if (v->select.sel() == SV_LDS_OQA) {
+   src.sel = ALU_SRC_LDS_OQ_A_POP;
+   src.chan = 0;
+   } else if (v->select.sel() == SV_LDS_OQB) {
+   src.sel = ALU_SRC_LDS_OQ_B_POP;
+   src.chan = 0;
+   } else {
+   src.sel = ALU_SRC_0;
+   src.chan = 0;
+   }
+   break;
case VLK_PARAM:
case VLK_SPECIAL_CONST:
src.sel = v->select.sel();
-- 
2.14.3



[Mesa-dev] [PATCH 04/21] r600/sb: add tess/compute initial state registers.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

This stops them being optimised out.
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index ae92a767b4..de3984f596 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -149,11 +149,14 @@ int bc_parser::parse_decls() {
}
}
 
-   if (sh->target == TARGET_VS || sh->target == TARGET_ES || sh->target == 
TARGET_HS)
+   if (sh->target == TARGET_VS || sh->target == TARGET_ES || sh->target == 
TARGET_HS || sh->target == TARGET_LS)
sh->add_input(0, 1, 0x0F);
else if (sh->target == TARGET_GS) {
sh->add_input(0, 1, 0x0F);
sh->add_input(1, 1, 0x0F);
+   } else if (sh->target == TARGET_COMPUTE) {
+   sh->add_input(0, 1, 0x0F);
+   sh->add_input(1, 1, 0x0F);
}
 
bool ps_interp = ctx.hw_class >= HW_CLASS_EVERGREEN
-- 
2.14.3



[Mesa-dev] [PATCH 07/21] r600/sb: introduce special register values for lds support.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

For LDS read/write ordering we use the LDS_RW value: reads
will wait on previous writes.
For ordering LDS reads against reads from the LDS output queue we
use the LDS_OQ values. We define two for now, though initially
we'll just support OQA.

Also add the check for the LDS OQ values.
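
One way to picture how these pseudo-registers order operations is a toy
def/use model. It is an illustration only: Value and Node here are
stripped-down stand-ins for the sb classes, depends_on is an invented
helper showing the usual def-use reading, and the three make_* functions
are hypothetical examples of how ops might be annotated.

```cpp
#include <vector>

enum SpecialReg { SV_GEOMETRY_EMIT, SV_LDS_RW, SV_LDS_OQA, SV_LDS_OQB };

// Simplified stand-in for sb's value class: just a special register id.
struct Value {
    SpecialReg reg;
    bool is_lds_access() const { return reg == SV_LDS_RW; }
    bool is_lds_oq() const { return reg == SV_LDS_OQA || reg == SV_LDS_OQB; }
};

// Simplified stand-in for an ALU node with source and destination values.
struct Node {
    std::vector<Value> src;
    std::vector<Value> dst;

    static bool uses_lds_oq(const std::vector<Value> &vv) {
        for (const Value &v : vv)
            if (v.is_lds_oq())
                return true;
        return false;
    }
    bool consumes_lds_oq() const { return uses_lds_oq(src); }
    bool produces_lds_oq() const { return uses_lds_oq(dst); }
};

// Generic def-use test: `later` depends on `earlier` if it reads a
// special value that `earlier` writes.
bool depends_on(const Node &later, const Node &earlier) {
    for (const Value &s : later.src)
        for (const Value &d : earlier.dst)
            if (s.reg == d.reg)
                return true;
    return false;
}

// Hypothetical annotations: an LDS read that returns data pushes onto
// queue A, a MOV pops it, and an LDS write touches only LDS_RW.
Node make_lds_read_ret() { return Node{{}, {Value{SV_LDS_OQA}}}; }
Node make_oq_pop_mov()   { return Node{{Value{SV_LDS_OQA}}, {}}; }
Node make_lds_write()    { return Node{{}, {Value{SV_LDS_RW}}}; }
```

In this model the MOV that pops the queue depends on the LDS read, but
not the other way round, and an LDS write neither produces nor consumes
a queue value.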

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/sb/sb_bc.h |  4 
 src/gallium/drivers/r600/sb/sb_ir.h | 27 ++-
 src/gallium/drivers/r600/sb/sb_valtable.cpp |  3 +++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index fc3fa5082d..3a3bae9d44 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -722,6 +722,10 @@ public:
return ((sel >= 128 && sel < 192) || (sel >= 256 && sel < 320));
}
 
+   bool is_lds_oq(unsigned sel) {
+   return (sel >= 0xdb && sel <= 0xde);
+   }
+
const char * get_hw_class_name();
const char * get_hw_chip_name();
 
diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
b/src/gallium/drivers/r600/sb/sb_ir.h
index 2390babfcf..bee947504e 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -42,7 +42,10 @@ enum special_regs {
SV_EXEC_MASK,
SV_AR_INDEX,
SV_VALID_MASK,
-   SV_GEOMETRY_EMIT
+   SV_GEOMETRY_EMIT,
+   SV_LDS_RW,
+   SV_LDS_OQA,
+   SV_LDS_OQB,
 };
 
 class node;
@@ -495,6 +498,12 @@ public:
bool is_geometry_emit() {
return is_special_reg() && select == sel_chan(SV_GEOMETRY_EMIT, 
0);
}
+   bool is_lds_access() {
+   return is_special_reg() && select == sel_chan(SV_LDS_RW, 0);
+   }
+   bool is_lds_oq() {
+   return is_special_reg() && (select == sel_chan(SV_LDS_OQA, 0) 
|| select == sel_chan(SV_LDS_OQB, 0));
+   }
 
node* any_def() {
assert(!(def && adef));
@@ -833,6 +842,22 @@ public:
return vec_uses_ar(dst) || vec_uses_ar(src);
}
 
+   bool vec_uses_lds_oq(vvec &vv) {
+   for (vvec::iterator I = vv.begin(), E = vv.end(); I != E; ++I) {
+   value *v = *I;
+   if (v && v->is_lds_oq())
+   return true;
+   }
+   return false;
+   }
+
+   bool consumes_lds_oq() {
+   return vec_uses_lds_oq(src);
+   }
+
+   bool produces_lds_oq() {
+   return vec_uses_lds_oq(dst);
+   }
 
region_node* get_parent_region();
 
diff --git a/src/gallium/drivers/r600/sb/sb_valtable.cpp 
b/src/gallium/drivers/r600/sb/sb_valtable.cpp
index a85537c2ad..41cfbf0946 100644
--- a/src/gallium/drivers/r600/sb/sb_valtable.cpp
+++ b/src/gallium/drivers/r600/sb/sb_valtable.cpp
@@ -56,6 +56,9 @@ sb_ostream& operator << (sb_ostream &o, value &v) {
case SV_EXEC_MASK: o << "EM"; break;
case SV_VALID_MASK: o << "VM"; break;
case SV_GEOMETRY_EMIT: o << "GEOMETRY_EMIT"; break;
+   case SV_LDS_RW: o << "LDS_RW"; break;
+   case SV_LDS_OQA: o << "LDS_OQA"; break;
+   case SV_LDS_OQB: o << "LDS_OQB"; break;
default: o << "???specialreg"; break;
}
break;
-- 
2.14.3



[Mesa-dev] r600 sb tessellation support

2018-01-09 Thread Dave Airlie
This is an attempt to add tessellation support to the SB backend.

The main things needed are GDS access which is used for tess
factor storage (also used for atomic counters), and LDS access
which is needed to pass all the data between stages.

The first 19 patches are the stuff I'm happy with; the
nop/sanity shader tests pass with those (and sb enabled).

The last two patches make Heaven work and turn on sb.
I'm not suggesting these be applied as-is yet.

I think in theory enabling sb for atomics/images/compute should
be fine after this series as well, but I haven't tested that too
much.



[Mesa-dev] [PATCH 02/21] r600/shader: only emit add instruction if param has a value.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

Just saves a pointless a = a + 0;

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/r600/r600_shader.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 1b028e4a8a..773eb079d2 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2864,12 +2864,14 @@ static int r600_tess_factor_read(struct r600_shader_ctx 
*ctx,
if (r)
return r;
 
-   r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
-  temp_reg, 0,
-  temp_reg, 0,
-  V_SQ_ALU_SRC_LITERAL, param * 16);
-   if (r)
-   return r;
+   if (param) {
+   r = single_alu_op2(ctx, ALU_OP2_ADD_INT,
+  temp_reg, 0,
+  temp_reg, 0,
+  V_SQ_ALU_SRC_LITERAL, param * 16);
+   if (r)
+   return r;
+   }
 
do_lds_fetch_values(ctx, temp_reg, dreg, ((1u << nc) - 1));
return 0;
-- 
2.14.3



[Mesa-dev] [PATCH] ac: add load_patch_vertices_in() to the abi

2018-01-09 Thread Timothy Arceri
Fixes the following test for radeonsi nir:

tests/spec/arb_tessellation_shader/execution/quads.shader_test

Also stops 8 other tests from crashing; they now just fail, e.g.

tcs-output-array-float-index-rd-after-barrier.shader_test
---
 src/amd/common/ac_nir_to_llvm.c  | 11 ++-
 src/amd/common/ac_shader_abi.h   |  2 ++
 src/gallium/drivers/radeonsi/si_shader.c | 20 ++--
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 4d2c8f20ab..2023dd49c6 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -4157,6 +4157,13 @@ load_tess_coord(struct ac_shader_abi *abi, LLVMTypeRef 
type,
return LLVMBuildBitCast(ctx->builder, result, type, "");
 }
 
+static LLVMValueRef
+load_patch_vertices_in(struct ac_shader_abi *abi)
+{
+   struct nir_to_llvm_context *ctx = nir_to_llvm_context_from_abi(abi);
+   return LLVMConstInt(ctx->ac.i32, ctx->options->key.tcs.input_vertices, 
false);
+}
+
 static void visit_intrinsic(struct ac_nir_context *ctx,
 nir_intrinsic_instr *instr)
 {
@@ -4357,7 +4364,7 @@ static void visit_intrinsic(struct ac_nir_context *ctx,
result = ctx->abi->load_tess_level(ctx->abi, 
VARYING_SLOT_TESS_LEVEL_INNER);
break;
case nir_intrinsic_load_patch_vertices_in:
-   result = LLVMConstInt(ctx->ac.i32, 
ctx->nctx->options->key.tcs.input_vertices, false);
+   result = ctx->abi->load_patch_vertices_in(ctx->abi);
break;
default:
fprintf(stderr, "Unknown intrinsic: ");
@@ -6688,11 +6695,13 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.tcs_outputs_read = shaders[i]->info.outputs_read;
ctx.tcs_patch_outputs_read = 
shaders[i]->info.patch_outputs_read;
ctx.abi.load_tess_varyings = load_tcs_varyings;
+   ctx.abi.load_patch_vertices_in = load_patch_vertices_in;
ctx.abi.store_tcs_outputs = store_tcs_output;
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_EVAL) {
ctx.tes_primitive_mode = 
shaders[i]->info.tess.primitive_mode;
ctx.abi.load_tess_varyings = load_tes_input;
ctx.abi.load_tess_coord = load_tess_coord;
+   ctx.abi.load_patch_vertices_in = load_patch_vertices_in;
} else if (shaders[i]->info.stage == MESA_SHADER_VERTEX) {
if (shader_info->info.vs.needs_instance_id) {
if (ctx.ac.chip_class == GFX9 &&
diff --git a/src/amd/common/ac_shader_abi.h b/src/amd/common/ac_shader_abi.h
index 6ed7dbb04e..3e9e7a4786 100644
--- a/src/amd/common/ac_shader_abi.h
+++ b/src/amd/common/ac_shader_abi.h
@@ -104,6 +104,8 @@ struct ac_shader_abi {
LLVMTypeRef type,
unsigned num_components);
 
+   LLVMValueRef (*load_patch_vertices_in)(struct ac_shader_abi *abi);
+
LLVMValueRef (*load_tess_level)(struct ac_shader_abi *abi,
unsigned varying_id);
 
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 2e74f4a33c..391ee04741 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1967,6 +1967,17 @@ static LLVMValueRef si_load_tess_level(struct 
ac_shader_abi *abi,
 
 }
 
+static LLVMValueRef si_load_patch_vertices_in(struct ac_shader_abi *abi)
+{
+   struct si_shader_context *ctx = si_shader_context_from_abi(abi);
+   if (ctx->type == PIPE_SHADER_TESS_CTRL)
+   return unpack_param(ctx, ctx->param_tcs_out_lds_layout, 26, 6);
+   else if (ctx->type == PIPE_SHADER_TESS_EVAL)
+   return get_num_tcs_out_vertices(ctx);
+   else
+   assert(!"invalid shader stage for TGSI_SEMANTIC_VERTICESIN");
+}
+
 void si_load_system_value(struct si_shader_context *ctx,
  unsigned index,
  const struct tgsi_full_declaration *decl)
@@ -2075,12 +2086,7 @@ void si_load_system_value(struct si_shader_context *ctx,
break;
 
case TGSI_SEMANTIC_VERTICESIN:
-   if (ctx->type == PIPE_SHADER_TESS_CTRL)
-   value = unpack_param(ctx, 
ctx->param_tcs_out_lds_layout, 26, 6);
-   else if (ctx->type == PIPE_SHADER_TESS_EVAL)
-   value = get_num_tcs_out_vertices(ctx);
-   else
-   assert(!"invalid shader stage for 
TGSI_SEMANTIC_VERTICESIN");
+   value = si_load_patch_vertices_in(&ctx->abi);
break;
 
case TGSI_SEMANTIC_TESSINNER:
@@ -5998,6 +6004,7 @@ static bool 
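
As a reading aid for the patch above, here is a minimal sketch of the two ideas it combines: a per-backend hook stored as a function pointer in an ABI struct, and a packed parameter from which a small bit field is extracted. All names here (struct shader_abi, unpack_bits, the two backend functions) are hypothetical stand-ins, and the assumption that unpack_param(ctx, param, 26, 6) means "6 bits starting at bit 26" is mine, not something the patch states.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct ac_shader_abi: one hook per backend. */
struct shader_abi {
   uint32_t (*load_patch_vertices_in)(struct shader_abi *abi);
   uint32_t packed_layout; /* hypothetical packed parameter */
};

/* Extract `bitwidth` bits starting at bit `rshift`.
 * Assumed semantics of unpack_param(); not taken from the patch. */
static uint32_t
unpack_bits(uint32_t value, unsigned rshift, unsigned bitwidth)
{
   uint32_t mask = bitwidth >= 32 ? 0xffffffffu : ((1u << bitwidth) - 1u);
   return (value >> rshift) & mask;
}

/* Backend A: the vertex count lives in bits 26..31 of a packed value. */
static uint32_t
packed_load_patch_vertices_in(struct shader_abi *abi)
{
   return unpack_bits(abi->packed_layout, 26, 6);
}

/* Backend B: the vertex count is a compile-time constant (3 here). */
static uint32_t
constant_load_patch_vertices_in(struct shader_abi *abi)
{
   (void)abi;
   return 3;
}
```

Shared code then only calls abi->load_patch_vertices_in(abi) and does not need to know which backend installed the hook.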

Re: [Mesa-dev] [PATCH] dri_util: remove ALLOW_RGB10_CONFIGS option (v2)

2018-01-09 Thread Tapani Pälli

Hi Marek,

This one works but only if you add

DRI_CONF_ALLOW_RGB10_CONFIGS("false")

to the DRI_CONF_SECTION_MISCELLANEOUS section in intel_screen. With that 
change: Reviewed-by: Tapani Pälli 



On 01/09/2018 04:04 PM, Marek Olšák wrote:

From: Marek Olšák 

This is unused because it's for libGL/libEGL, not drivers.

v2: i965 was wrong, because it used dri_util instead of its own config.
---
  src/mesa/drivers/dri/common/dri_util.c   | 4 
  src/mesa/drivers/dri/i965/intel_screen.c | 2 +-
  2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/common/dri_util.c 
b/src/mesa/drivers/dri/common/dri_util.c
index d4fba0b..e6a7d23 100644
--- a/src/mesa/drivers/dri/common/dri_util.c
+++ b/src/mesa/drivers/dri/common/dri_util.c
@@ -48,24 +48,20 @@
  #include "main/version.h"
  #include "main/debug_output.h"
  #include "main/errors.h"
  #include "main/macros.h"
  
  const char __dri2ConfigOptions[] =

 DRI_CONF_BEGIN
DRI_CONF_SECTION_PERFORMANCE
   DRI_CONF_VBLANK_MODE(DRI_CONF_VBLANK_DEF_INTERVAL_1)
DRI_CONF_SECTION_END
-
-  DRI_CONF_SECTION_MISCELLANEOUS
- DRI_CONF_ALLOW_RGB10_CONFIGS("true")
-  DRI_CONF_SECTION_END
 DRI_CONF_END;
  
  /*/

  /** \name Screen handling functions  */
  /*/
  /*@{*/
  
  static void

  setupLoaderExtensions(__DRIscreen *psp,
  const __DRIextension **extensions)
diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index 3e016b5..89db821 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -2057,21 +2057,21 @@ intel_screen_make_configs(__DRIscreen *dri_screen)
 __DRIconfig **configs = NULL;
  
 /* Expose only BGRA ordering if the loader doesn't support RGBA ordering. */

 unsigned num_formats;
 if (intel_loader_get_cap(dri_screen, DRI_LOADER_CAP_RGBA_ORDERING))
num_formats = ARRAY_SIZE(formats);
 else
num_formats = ARRAY_SIZE(formats) - 2; /* all - RGBA_ORDERING formats */
  
 /* Shall we expose 10 bpc formats? */

-   bool allow_rgb10_configs = driQueryOptionb(&dri_screen->optionCache,
+   bool allow_rgb10_configs = driQueryOptionb(&screen->optionCache,
"allow_rgb10_configs");
  
 /* Generate singlesample configs without accumulation buffer. */

 for (unsigned i = 0; i < num_formats; i++) {
__DRIconfig **new_configs;
int num_depth_stencil_bits = 2;
  
if (!allow_rgb10_configs &&

(formats[i] == MESA_FORMAT_B10G10R10A2_UNORM ||
 formats[i] == MESA_FORMAT_B10G10R10X2_UNORM))


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC 07/10] mesa: add program blob cache functionality

2018-01-09 Thread Tapani Pälli



On 01/09/2018 05:05 PM, Eric Engestrom wrote:

On Tuesday, 2018-01-09 09:48:19 +0200, Tapani Pälli wrote:

Cache set and get are called in a similar fashion to the disk cache.
Functionality requires ARB_get_program_binary and
EGL_ANDROID_blob_cache support.

Signed-off-by: Tapani Pälli 
---
  src/mesa/Makefile.sources  |   2 +
  src/mesa/main/program_blob_cache.c | 141 +
  src/mesa/main/program_blob_cache.h |  48 +
  src/mesa/meson.build   |   2 +
  src/mesa/program/ir_to_mesa.cpp|   9 ++-
  5 files changed, 201 insertions(+), 1 deletion(-)
  create mode 100644 src/mesa/main/program_blob_cache.c
  create mode 100644 src/mesa/main/program_blob_cache.h

diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources
index 53fa486364..bbcfdb425e 100644
--- a/src/mesa/Makefile.sources
+++ b/src/mesa/Makefile.sources
@@ -177,6 +177,8 @@ MAIN_FILES = \
main/polygon.h \
main/program_binary.c \
main/program_binary.h \
+   main/program_blob_cache.c \
+   main/program_blob_cache.h \
main/program_resource.c \
main/program_resource.h \
main/querymatrix.c \
diff --git a/src/mesa/main/program_blob_cache.c 
b/src/mesa/main/program_blob_cache.c
new file mode 100644
index 00..0b3ea1a549
--- /dev/null
+++ b/src/mesa/main/program_blob_cache.c
@@ -0,0 +1,141 @@
+/*
+ * Mesa 3-D graphics library
+ *
+ * Copyright (C) 2018 Intel Corporation.  All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included
+ * in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "main/errors.h"
+#include "main/mtypes.h"
+#include "main/shaderobj.h"
+#include "main/program_binary.h"
+#include "util/mesa-sha1.h"
+#include "compiler/glsl/program.h"
+
+#include "program_blob_cache.h"
+
+/* This is what Android EGL defines as the maxValueSize in egl_cache_t
+ * class implementation.
+ */
+#define MAX_BLOB_SIZE 64 * 1024
+
+static void
+generate_sha1_string(struct gl_context *ctx, struct gl_shader_program *shProg,
+ char *key)
+{
+   char *buf = create_shader_program_keystr(ctx, shProg);
+   struct mesa_sha1 sha_ctx;
+   unsigned char sha1str[20];
+
+   /* Add driver sha1 to the key string. */
+   uint8_t driver_sha1[20];
+   char driver_sha1buf[41];
+
+   ctx->Driver.GetProgramBinaryDriverSHA1(ctx, driver_sha1);
+   _mesa_sha1_format(driver_sha1buf, driver_sha1);
+   ralloc_asprintf_append(&buf, "%s", driver_sha1buf);
+
+   _mesa_sha1_init(&sha_ctx);
+   _mesa_sha1_update(&sha_ctx, buf, strlen(buf));
+   _mesa_sha1_final(&sha_ctx, sha1str);
+   _mesa_sha1_format(key, sha1str);
+
+   ralloc_free(buf);
+}
+
+void
+_mesa_blob_cache_set(struct gl_context *ctx,
+ struct gl_shader_program *shProg)
+{
+   assert(shProg->data->LinkStatus == linking_success);
+
+   /* ARB_get_program_binary support required. */
+   if (!ctx->blobCacheSet || !ctx->Driver.GetProgramBinaryDriverSHA1)
+  return;
+
+   /* Skip cache for fixed-function programs and programs that use
+* transform feedback.
+*/
+   if (!shProg->Name || shProg->TransformFeedback.NumVarying > 0)
+  return;
+
+   GLint length;
+   _mesa_get_program_binary_length(ctx, shProg, &length);
+
+   /* Skip cache if exceeds max blob size. */
+   if (length > MAX_BLOB_SIZE)
+  return;
+
+   char *blob = (char *) malloc (length);


Nit: in C, malloc returns (void*) so the cast is unnecessary, and the
space after malloc looks weird :)


Yes, will clean this up.
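
For reference, a minimal sketch of the cleaned-up allocation the nit asks for. In C the void * returned by malloc converts implicitly, so no cast is needed. The helper name alloc_blob is hypothetical and the zero-fill is only there to make the sketch observable; neither is part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zeroed blob buffer of `length` bytes.
 * Returns NULL on allocation failure; the caller owns the buffer. */
static char *
alloc_blob(size_t length)
{
   char *blob = malloc(length); /* no cast needed in C */

   if (!blob)
      return NULL;

   memset(blob, 0, length);
   return blob;
}
```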


+
+   if (!blob)
+  return;
+
+   GLsizei real_len;
+   GLenum format;
+   _mesa_get_program_binary(ctx, shProg, length, &real_len,
+                            &format, blob);
+
+   assert(format == GL_PROGRAM_BINARY_FORMAT_MESA);
+
+   char key[41];
+   generate_sha1_string(ctx, shProg, key);
+
+   ctx->blobCacheSet(key, 41, blob, real_len);
+   free(blob);
+}
+
+void
+_mesa_blob_cache_get(struct gl_context *ctx,
+ struct 

Re: [Mesa-dev] [PATCH] util: fix NORETURN for msvc, add HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h

2018-01-09 Thread Brian Paul

On 01/09/2018 07:15 PM, srol...@vmware.com wrote:

From: Roland Scheidegger 

We've seen some problems internally due to macro redefinition.
Fix this by adding HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h,
and defining it for msvc.
And avoid redefinition just in case.
---
  include/c99_compat.h |  1 +
  src/util/macros.h| 12 
  2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/c99_compat.h b/include/c99_compat.h
index cb690c6..81621a7 100644
--- a/include/c99_compat.h
+++ b/include/c99_compat.h
@@ -164,6 +164,7 @@ test_c99_compat_h(const void * restrict a,
  #define HAVE_FUNC_ATTRIBUTE_FORMAT 1
  #define HAVE_FUNC_ATTRIBUTE_PACKED 1
  #define HAVE_FUNC_ATTRIBUTE_ALIAS 1
+#define HAVE_FUNC_ATTRIBUTE_NORETURN 1
  
  #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)

  /* https://gcc.gnu.org/onlinedocs/gcc-4.3.6/gcc/Other-Builtins.html */
diff --git a/src/util/macros.h b/src/util/macros.h
index 2a08407..5ce0e57 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -171,10 +171,14 @@ do {   \
  #define ATTRIBUTE_RETURNS_NONNULL
  #endif
  
-#ifdef HAVE_FUNC_ATTRIBUTE_NORETURN

-#define NORETURN __attribute__((__noreturn__))
-#else
-#define NORETURN
+#ifndef NORETURN
+#  ifdef _MSC_VER
+#define NORETURN __declspec(noreturn)
+#  elif defined HAVE_FUNC_ATTRIBUTE_NORETURN
+#define NORETURN __attribute__((__noreturn__))
+#  else
+#define NORETURN
+#  endif
  #endif
  
  #ifdef __cplusplus




I didn't test this, but looks OK to me.

Reviewed-by: Brian Paul 
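
For anyone wanting to try the guard pattern outside the tree, here is a small standalone sketch. It is an illustration under assumptions, not the Mesa header: it keys the GCC/Clang branch on __GNUC__ rather than HAVE_FUNC_ATTRIBUTE_NORETURN so that it compiles on its own, and the functions fatal and checked_div are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Define NORETURN only if nobody else has, picking the compiler-specific
 * spelling. Standalone variant of the guard in the patch. */
#ifndef NORETURN
#  ifdef _MSC_VER
#    define NORETURN __declspec(noreturn)
#  elif defined(__GNUC__)
#    define NORETURN __attribute__((__noreturn__))
#  else
#    define NORETURN
#  endif
#endif

/* Hypothetical error helper: prints a message and never returns. */
NORETURN static void
fatal(const char *msg)
{
   fprintf(stderr, "fatal: %s\n", msg);
   abort();
}

/* Because fatal() is marked NORETURN, the compiler knows the division
 * below is only reached when b != 0. */
static int
checked_div(int a, int b)
{
   if (b == 0)
      fatal("division by zero");
   return a / b;
}
```

The outer #ifndef is what avoids the redefinition mentioned in the commit message: a header included earlier may already have defined NORETURN.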



Re: [Mesa-dev] [PATCH 2/2] radv: Implement VK_EXT_discard_rectangles.

2018-01-09 Thread Dave Airlie
On 10 January 2018 at 12:34, Bas Nieuwenhuizen  wrote:
> Tested with a modified deferred demo and no regressions in a 1.0.2
> mustpass run.

For the series:

Reviewed-by: Dave Airlie 

> ---
>  src/amd/vulkan/radv_cmd_buffer.c  | 51 
> +++
>  src/amd/vulkan/radv_device.c  |  6 +
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/amd/vulkan/radv_pipeline.c| 35 +++
>  src/amd/vulkan/radv_private.h | 23 +-
>  5 files changed, 110 insertions(+), 6 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 3114ae9fb4..4c42dc2b13 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -91,6 +91,7 @@ radv_bind_dynamic_state(struct radv_cmd_buffer *cmd_buffer,
>  */
> dest->viewport.count = src->viewport.count;
> dest->scissor.count = src->scissor.count;
> +   dest->discard_rectangle.count = src->discard_rectangle.count;
>
> if (copy_mask & RADV_DYNAMIC_VIEWPORT) {
> if (memcmp(&dest->viewport.viewports, 
> &src->viewport.viewports,
> @@ -168,6 +169,16 @@ radv_bind_dynamic_state(struct radv_cmd_buffer 
> *cmd_buffer,
> }
> }
>
> +   if (copy_mask & RADV_DYNAMIC_DISCARD_RECTANGLE) {
> +   if (memcmp(&dest->discard_rectangle.rectangles, 
> &src->discard_rectangle.rectangles,
> +  src->discard_rectangle.count * sizeof(VkRect2D))) {
> +   typed_memcpy(dest->discard_rectangle.rectangles,
> +src->discard_rectangle.rectangles,
> +src->discard_rectangle.count);
> +   dest_mask |= RADV_DYNAMIC_DISCARD_RECTANGLE;
> +   }
> +   }
> +
> cmd_buffer->state.dirty |= dest_mask;
>  }
>
> @@ -1098,6 +1109,8 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
> *cmd_buffer)
> }
> radeon_set_context_reg(cmd_buffer->cs, R_028A6C_VGT_GS_OUT_PRIM_TYPE, 
> pipeline->graphics.gs_out);
>
> +   radeon_set_context_reg(cmd_buffer->cs, R_02820C_PA_SC_CLIPRECT_RULE, 
> pipeline->graphics.pa_sc_cliprect_rule);
> +
> if (unlikely(cmd_buffer->device->trace_bo))
> radv_save_pipeline(cmd_buffer, pipeline, RING_GFX);
>
> @@ -1134,6 +1147,22 @@ radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
>
> cmd_buffer->state.pipeline->graphics.ms.pa_sc_mode_cntl_0 | 
> S_028A48_VPORT_SCISSOR_ENABLE(count ? 1 : 0));
>  }
>
> +static void
> +radv_emit_discard_rectangle(struct radv_cmd_buffer *cmd_buffer)
> +{
> +   if (!cmd_buffer->state.dynamic.discard_rectangle.count)
> +   return;
> +
> +   radeon_set_context_reg_seq(cmd_buffer->cs, 
> R_028210_PA_SC_CLIPRECT_0_TL,
> +  
> cmd_buffer->state.dynamic.discard_rectangle.count * 2);
> +   for (unsigned i = 0; i < 
> cmd_buffer->state.dynamic.discard_rectangle.count; ++i) {
> +   VkRect2D rect = 
> cmd_buffer->state.dynamic.discard_rectangle.rectangles[i];
> +   radeon_emit(cmd_buffer->cs, S_028210_TL_X(rect.offset.x) | 
> S_028210_TL_Y(rect.offset.y));
> +   radeon_emit(cmd_buffer->cs, S_028214_BR_X(rect.offset.x + 
> rect.extent.width) |
> +   S_028214_BR_Y(rect.offset.y + 
> rect.extent.height));
> +   }
> +}
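
The loop quoted above turns each rectangle from offset plus extent into a top-left and a bottom-right corner before emitting them. A minimal sketch of just that arithmetic, using hypothetical plain structs instead of VkRect2D and the register packing macros:

```c
#include <stdint.h>

/* Hypothetical stand-ins for VkRect2D and the emitted corner pair. */
struct rect2d {
   int32_t x, y;           /* offset */
   uint32_t width, height; /* extent */
};

struct corners {
   int32_t tl_x, tl_y; /* top-left, inclusive */
   int32_t br_x, br_y; /* bottom-right, computed as offset + extent */
};

static struct corners
rect_to_corners(struct rect2d r)
{
   struct corners c;

   c.tl_x = r.x;
   c.tl_y = r.y;
   c.br_x = r.x + (int32_t)r.width;
   c.br_y = r.y + (int32_t)r.height;
   return c;
}
```

Whether the hardware treats the bottom-right corner as inclusive or exclusive is not something this sketch settles; it only mirrors the additions in the quoted code.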
> +
>  static void
>  radv_emit_line_width(struct radv_cmd_buffer *cmd_buffer)
>  {
> @@ -1627,6 +1656,9 @@ radv_cmd_buffer_flush_dynamic_state(struct 
> radv_cmd_buffer *cmd_buffer)
>RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS))
> radv_emit_depth_biais(cmd_buffer);
>
> +   if (cmd_buffer->state.dirty & 
> RADV_CMD_DIRTY_DYNAMIC_DISCARD_RECTANGLE)
> +   radv_emit_discard_rectangle(cmd_buffer);
> +
> cmd_buffer->state.dirty &= ~RADV_CMD_DIRTY_DYNAMIC_ALL;
>  }
>
> @@ -2882,6 +2914,25 @@ void radv_CmdSetStencilReference(
> cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_STENCIL_REFERENCE;
>  }
>
> +void radv_CmdSetDiscardRectangleEXT(
> +   VkCommandBuffer commandBuffer,
> +   uint32_tfirstDiscardRectangle,
> +   uint32_tdiscardRectangleCount,
> +   const VkRect2D* pDiscardRectangles)
> +{
> +   RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   struct radv_cmd_state *state = &cmd_buffer->state;
> +   MAYBE_UNUSED const uint32_t total_count = firstDiscardRectangle + 
> discardRectangleCount;
> +
> +   assert(firstDiscardRectangle < MAX_DISCARD_RECTANGLES);
> +   assert(total_count >= 1 && total_count <= MAX_DISCARD_RECTANGLES);
> +
> +
> +   
> 

[Mesa-dev] [PATCH v4 38/38] nvir/nir: implement intrinsic shader_clock

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 0a78c6a593..e60d21bc8a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -2150,6 +2150,14 @@ Converter::visit(nir_intrinsic_instr *insn)
   bar->subOp = getSubOp(op);
   break;
}
+   case nir_intrinsic_shader_clock: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+
+  loadImm(newDefs[0], 0u);
+  mkOp1v(OP_RDSV, dType, newDefs[1], mkSysVal(SV_CLOCK, 0));
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3



[Mesa-dev] [PATCH v4 21/38] nvir/nir: implement nir_alu_instr handling

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 

v2: use bitfield_insert instead of bfi
rework switch helper macros
remove some lowering code (LoweringHelper is now used for this)
v3: add pack_half_2x16_split
add unpack_half_2x16_split_x/y
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 486 -
 1 file changed, 485 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 9c00304ad3..572ccfa4eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -32,6 +32,31 @@
 #include 
 #include 
 
+#define CASE_OPFI(ni) \
+   case nir_op_f ## ni : \
+   case nir_op_i ## ni
+#define CASE_OPFIU(ni) \
+   case nir_op_f ## ni : \
+   case nir_op_i ## ni : \
+   case nir_op_u ## ni
+#define CASE_OPIU(ni) \
+   case nir_op_i ## ni : \
+   case nir_op_u ## ni
+
+#define CASE_OPFI_RET(ni, val) \
+   case nir_op_f ## ni : \
+   case nir_op_i ## ni : \
+  return val
+#define CASE_OPFIU_RET(ni, val) \
+   case nir_op_f ## ni : \
+   case nir_op_i ## ni : \
+   case nir_op_u ## ni : \
+  return val
+#define CASE_OPIU_RET(ni, val) \
+   case nir_op_i ## ni : \
+   case nir_op_u ## ni : \
+  return val
+
 static int
 type_size(const struct glsl_type *type)
 {
@@ -78,6 +103,7 @@ public:
Instruction *loadFrom(DataFile, uint8_t, DataType, Value *def, uint32_t 
base, uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr, bool 
patch = false);
void storeTo(DataFile, operation, DataType, Value *src, uint8_t idx, 
uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr);
 
+   bool visit(nir_alu_instr *);
bool visit(nir_block *);
bool visit(nir_cf_node *);
bool visit(nir_function *);
@@ -100,6 +126,10 @@ public:
std::vector getSTypes(nir_alu_instr*);
DataType getSType(nir_src&, bool isFloat, bool isSigned);
 
+   operation getOperation(nir_op);
+   operation preOperationNeeded(nir_op);
+   int getSubOp(nir_op);
+   CondCode getCondCode(nir_op);
 private:
nir_shader *nir;
 
@@ -109,6 +139,7 @@ private:
unsigned int curLoopDepth;
 
BasicBlock *exit;
+   Value *zero;
 
union {
   struct {
@@ -120,7 +151,10 @@ private:
 Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
: ConverterCommon(prog, info),
  nir(nir),
- curLoopDepth(0) {}
+ curLoopDepth(0)
+{
+   zero = mkImm((uint32_t)0);
+}
 
 BasicBlock *
 Converter::convert(nir_block *block)
@@ -239,6 +273,136 @@ Converter::getSType(nir_src , bool isFloat, bool 
isSigned)
return typeOfSize(bitSize / 8, isFloat, isSigned);
 }
 
+operation
+Converter::getOperation(nir_op op)
+{
+   switch (op) {
+   // basic ops with float and int variants
+   CASE_OPFI_RET(abs, OP_ABS);
+   CASE_OPFI_RET(add, OP_ADD);
+   CASE_OPFI_RET(and, OP_AND);
+   CASE_OPFIU_RET(div, OP_DIV);
+   CASE_OPIU_RET(find_msb, OP_BFIND);
+   CASE_OPFIU_RET(max, OP_MAX);
+   CASE_OPFIU_RET(min, OP_MIN);
+   CASE_OPFIU_RET(mod, OP_MOD);
+   CASE_OPFI_RET(mul, OP_MUL);
+   CASE_OPIU_RET(mul_high, OP_MUL);
+   CASE_OPFI_RET(neg, OP_NEG);
+   CASE_OPFI_RET(not, OP_NOT);
+   CASE_OPFI_RET(or, OP_OR);
+   CASE_OPFI_RET(eq, OP_SET);
+   CASE_OPFIU_RET(ge, OP_SET);
+   CASE_OPFIU_RET(lt, OP_SET);
+   CASE_OPFI_RET(ne, OP_SET);
+   CASE_OPIU_RET(shr, OP_SHR);
+   CASE_OPFI_RET(sub, OP_SUB);
+   CASE_OPFI_RET(xor, OP_XOR);
+   case nir_op_fceil:
+  return OP_CEIL;
+   case nir_op_fcos:
+  return OP_COS;
+   case nir_op_f2f32:
+   case nir_op_f2f64:
+   case nir_op_f2i32:
+   case nir_op_f2i64:
+   case nir_op_f2u32:
+   case nir_op_f2u64:
+   case nir_op_i2f32:
+   case nir_op_i2f64:
+   case nir_op_i2i32:
+   case nir_op_i2i64:
+   case nir_op_u2f32:
+   case nir_op_u2f64:
+   case nir_op_u2u32:
+   case nir_op_u2u64:
+  return OP_CVT;
+   case nir_op_fddx:
+   case nir_op_fddx_coarse:
+   case nir_op_fddx_fine:
+  return OP_DFDX;
+   case nir_op_fddy:
+   case nir_op_fddy_coarse:
+   case nir_op_fddy_fine:
+  return OP_DFDY;
+   case nir_op_fexp2:
+  return OP_EX2;
+   case nir_op_ffloor:
+  return OP_FLOOR;
+   case nir_op_ffma:
+  return OP_FMA;
+   case nir_op_flog2:
+  return OP_LG2;
+   case nir_op_pack_64_2x32_split:
+  return OP_MERGE;
+   case nir_op_frcp:
+  return OP_RCP;
+   case nir_op_frsq:
+  return OP_RSQ;
+   case nir_op_fsat:
+  return OP_SAT;
+   case nir_op_ishl:
+  return OP_SHL;
+   case nir_op_fsin:
+  return OP_SIN;
+   case nir_op_fsqrt:
+  return OP_SQRT;
+   case nir_op_ftrunc:
+  return OP_TRUNC;
+   default:
+  ERROR("couldn't get operation for op %s\n", nir_op_infos[op].name);
+  assert(false);
+  return OP_NOP;
+   }
+}
+
+operation
+Converter::preOperationNeeded(nir_op op)
+{
+   switch (op) {
+   case nir_op_fcos:
+   case nir_op_fsin:
+  return OP_PRESIN;
+   default:

[Mesa-dev] [PATCH v4 27/38] nvir/nir: implement nir_ssa_undef_instr

2018-01-09 Thread Karol Herbst
v2: use mkOp

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index eee7e4ccb5..206a512918 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -114,6 +114,7 @@ public:
bool visit(nir_jump_instr *);
bool visit(nir_load_const_instr*);
bool visit(nir_loop *);
+   bool visit(nir_ssa_undef_instr *);
 
bool run();
 
@@ -1374,6 +1375,10 @@ Converter::visit(nir_instr *insn)
   if (!visit(nir_instr_as_jump(insn)))
  return false;
   break;
+   case nir_instr_type_ssa_undef:
+  if (!visit(nir_instr_as_ssa_undef(insn)))
+ return false;
+  break;
default:
   ERROR("unknown nir_instr type %u\n", insn->type);
   return false;
@@ -1959,6 +1964,16 @@ Converter::visit(nir_alu_instr *insn)
 }
 #undef DEFAULT_CHECKS
 
+bool
+Converter::visit(nir_ssa_undef_instr *insn)
+{
+   LValues &newDefs = convert(&insn->def);
+   for (auto i = 0u; i < insn->def.num_components; ++i) {
+  mkOp(OP_NOP, TYPE_NONE, newDefs[i]);
+   }
+   return true;
+}
+
 bool
 Converter::run()
 {
-- 
2.14.3



[Mesa-dev] [PATCH v4 30/38] nvir/nir: implement vote and ballot

2018-01-09 Thread Karol Herbst
v2: add vote_eq support
use the new subop intrinsic helper
add ballot
v3: add read_(first_)invocation

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 41 ++
 1 file changed, 41 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index ef0e58d4b8..996c202645 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -433,6 +433,12 @@ int
 Converter::getSubOp(nir_intrinsic_op op)
 {
switch (op) {
+   case nir_intrinsic_vote_all:
+  return NV50_IR_SUBOP_VOTE_ALL;
+   case nir_intrinsic_vote_any:
+  return NV50_IR_SUBOP_VOTE_ANY;
+   case nir_intrinsic_vote_eq:
+  return NV50_IR_SUBOP_VOTE_UNI;
default:
   ERROR("couldn't get subop for nir_intrinsic_op %u\n", op);
   assert(false);
@@ -1704,6 +1710,41 @@ Converter::visit(nir_intrinsic_instr *insn)
   loadImm(newDefs[0], 32u);
   break;
}
+   case nir_intrinsic_vote_all:
+   case nir_intrinsic_vote_any:
+   case nir_intrinsic_vote_eq: {
+  LValues &newDefs = convert(&insn->dest);
+  Value *pred = new_LValue(func, FILE_PREDICATE);
+  mkCmp(OP_SET, CC_NE, TYPE_U32, pred, TYPE_U32, getSrc(&insn->src[0], 0), 
zero);
+  mkOp1(OP_VOTE, TYPE_U32, pred, pred)->subOp = getSubOp(op);
+  mkCvt(OP_CVT, TYPE_U32, newDefs[0], TYPE_U8, pred);
+  break;
+   }
+   case nir_intrinsic_ballot: {
+  LValues &newDefs = convert(&insn->dest);
+  Value *pred = new_LValue(func, FILE_PREDICATE);
+  mkCmp(OP_SET, CC_NE, TYPE_U32, pred, TYPE_U32, getSrc(&insn->src[0], 0), 
zero);
+  Instruction *ballot = mkOp1(OP_VOTE, TYPE_U32, getSSA(), pred);
+  ballot->subOp = NV50_IR_SUBOP_VOTE_ANY;
+  mkOp2(OP_MERGE, TYPE_U64, newDefs[0], ballot->getDef(0), 
loadImm(getSSA(), 0));
+  break;
+   }
+   case nir_intrinsic_read_first_invocation:
+   case nir_intrinsic_read_invocation: {
+  LValues &newDefs = convert(&insn->dest);
+  const DataType dType = getDType(insn);
+  Value *tmp = getScratch();
+
+  if (op == nir_intrinsic_read_first_invocation) {
+ mkOp1(OP_VOTE, TYPE_U32, tmp, mkImm(1))->subOp = 
NV50_IR_SUBOP_VOTE_ANY;
+ mkOp2(OP_EXTBF, TYPE_U32, tmp, tmp, mkImm(0x2000))->subOp = 
NV50_IR_SUBOP_EXTBF_REV;
+ mkOp1(OP_BFIND, TYPE_U32, tmp, tmp)->subOp = NV50_IR_SUBOP_BFIND_SAMT;
+  } else
+ tmp = getSrc(&insn->src[1], 0);
+
+  mkOp3(OP_SHFL, dType, newDefs[0], getSrc(&insn->src[0], 0), tmp, 
mkImm(0x1f))->subOp = NV50_IR_SUBOP_SHFL_IDX;
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3



[Mesa-dev] [PATCH v4 35/38] nvir/nir: implement images

2018-01-09 Thread Karol Herbst
v3: fix compiler warnings
v4: use loadFrom helper

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 276 +++--
 1 file changed, 258 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 9105cddf93..f7b51339c2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -87,6 +87,8 @@ public:
LValues& convert(nir_register *);
LValues& convert(nir_ssa_def *);
 
+   ImgFormat convertGLImgFormat(GLuint);
+
// nir_alu_src needs special handling due to neg and abs modifiers
Value* getSrc(nir_alu_src *, uint8_t component = 0);
Value* getSrc(nir_register *, uint8_t);
@@ -141,6 +143,7 @@ public:
 
// tex stuff
Value* applyProjection(Value *src, Value *proj);
+   unsigned int getNIRArgCount(TexInstruction::Target&);
 private:
nir_shader *nir;
 
@@ -435,28 +438,31 @@ Converter::getSubOp(nir_op op)
}
 }
 
+#define CASE_OP_INTR_ATOM(nir, nvir) \
+   case nir_intrinsic_image_atomic_ ## nir : \
+   case nir_intrinsic_shared_atomic_ ## nir : \
+   case nir_intrinsic_ssbo_atomic_ ## nir : \
+  return NV50_IR_SUBOP_ATOM_ ## nvir
+#define CASE_OP_INTR_ATOM_S(nir, nvir) \
+   case nir_intrinsic_shared_atomic_ ## nir : \
+   case nir_intrinsic_ssbo_atomic_ ## nir : \
+  return NV50_IR_SUBOP_ATOM_ ## nvir
 int
 Converter::getSubOp(nir_intrinsic_op op)
 {
switch (op) {
-   case nir_intrinsic_ssbo_atomic_add:
-  return NV50_IR_SUBOP_ATOM_ADD;
-   case nir_intrinsic_ssbo_atomic_and:
-  return NV50_IR_SUBOP_ATOM_AND;
-   case nir_intrinsic_ssbo_atomic_comp_swap:
-  return NV50_IR_SUBOP_ATOM_CAS;
-   case nir_intrinsic_ssbo_atomic_exchange:
-  return NV50_IR_SUBOP_ATOM_EXCH;
-   case nir_intrinsic_ssbo_atomic_or:
-  return NV50_IR_SUBOP_ATOM_OR;
-   case nir_intrinsic_ssbo_atomic_imax:
-   case nir_intrinsic_ssbo_atomic_umax:
-  return NV50_IR_SUBOP_ATOM_MAX;
-   case nir_intrinsic_ssbo_atomic_imin:
-   case nir_intrinsic_ssbo_atomic_umin:
-  return NV50_IR_SUBOP_ATOM_MIN;
-   case nir_intrinsic_ssbo_atomic_xor:
-  return NV50_IR_SUBOP_ATOM_XOR;
+   CASE_OP_INTR_ATOM(add, ADD);
+   CASE_OP_INTR_ATOM(and, AND);
+   CASE_OP_INTR_ATOM(comp_swap, CAS);
+   CASE_OP_INTR_ATOM(exchange, EXCH);
+   CASE_OP_INTR_ATOM(or, OR);
+   case nir_intrinsic_image_atomic_max:
+   CASE_OP_INTR_ATOM_S(imax, MAX);
+   CASE_OP_INTR_ATOM_S(umax, MAX);
+   case nir_intrinsic_image_atomic_min:
+   CASE_OP_INTR_ATOM_S(imin, MIN);
+   CASE_OP_INTR_ATOM_S(umin, MIN);
+   CASE_OP_INTR_ATOM(xor, XOR);
case nir_intrinsic_vote_all:
   return NV50_IR_SUBOP_VOTE_ALL;
case nir_intrinsic_vote_any:
@@ -469,6 +475,8 @@ Converter::getSubOp(nir_intrinsic_op op)
   return 0;
}
 }
+#undef CASE_OP_INTR_ATOM
+#undef CASE_OP_INTR_ATOM_S
 
 CondCode
 Converter::getCondCode(nir_op op)
@@ -1595,6 +1603,68 @@ Converter::convert(nir_intrinsic_op intr)
}
 }
 
+ImgFormat
+Converter::convertGLImgFormat(GLuint format)
+{
+#define FMT_CASE(a, b) \
+  case GL_ ## a: return nv50_ir::FMT_ ## b
+
+   switch (format) {
+   FMT_CASE(NONE, NONE);
+
+   FMT_CASE(RGBA32F, RGBA32F);
+   FMT_CASE(RGBA16F, RGBA16F);
+   FMT_CASE(RG32F, RG32F);
+   FMT_CASE(RG16F, RG16F);
+   FMT_CASE(R11F_G11F_B10F, R11G11B10F);
+   FMT_CASE(R32F, R32F);
+   FMT_CASE(R16F, R16F);
+
+   FMT_CASE(RGBA32UI, RGBA32UI);
+   FMT_CASE(RGBA16UI, RGBA16UI);
+   FMT_CASE(RGB10_A2UI, RGB10A2UI);
+   FMT_CASE(RGBA8UI, RGBA8UI);
+   FMT_CASE(RG32UI, RG32UI);
+   FMT_CASE(RG16UI, RG16UI);
+   FMT_CASE(RG8UI, RG8UI);
+   FMT_CASE(R32UI, R32UI);
+   FMT_CASE(R16UI, R16UI);
+   FMT_CASE(R8UI, R8UI);
+
+   FMT_CASE(RGBA32I, RGBA32I);
+   FMT_CASE(RGBA16I, RGBA16I);
+   FMT_CASE(RGBA8I, RGBA8I);
+   FMT_CASE(RG32I, RG32I);
+   FMT_CASE(RG16I, RG16I);
+   FMT_CASE(RG8I, RG8I);
+   FMT_CASE(R32I, R32I);
+   FMT_CASE(R16I, R16I);
+   FMT_CASE(R8I, R8I);
+
+   FMT_CASE(RGBA16, RGBA16);
+   FMT_CASE(RGB10_A2, RGB10A2);
+   FMT_CASE(RGBA8, RGBA8);
+   FMT_CASE(RG16, RG16);
+   FMT_CASE(RG8, RG8);
+   FMT_CASE(R16, R16);
+   FMT_CASE(R8, R8);
+
+   FMT_CASE(RGBA16_SNORM, RGBA16_SNORM);
+   FMT_CASE(RGBA8_SNORM, RGBA8_SNORM);
+   FMT_CASE(RG16_SNORM, RG16_SNORM);
+   FMT_CASE(RG8_SNORM, RG8_SNORM);
+   FMT_CASE(R16_SNORM, R16_SNORM);
+   FMT_CASE(R8_SNORM, R8_SNORM);
+
+   FMT_CASE(BGRA_INTEGER, BGRA8);
+   default:
+  ERROR("unknown format %x\n", format);
+  assert(false);
+  return nv50_ir::FMT_NONE;
+   }
+#undef FMT_CASE
+}
+
 bool
 Converter::visit(nir_intrinsic_instr *insn)
 {
@@ -1856,6 +1926,28 @@ Converter::visit(nir_intrinsic_instr *insn)
   info->io.globalAccess |= 0x1;
   break;
}
+   case nir_intrinsic_shared_atomic_add:
+   case nir_intrinsic_shared_atomic_and:
+   case nir_intrinsic_shared_atomic_comp_swap:
+   case 

[Mesa-dev] [PATCH v4 24/38] nvir/nir: implement nir_intrinsic_load_input

2018-01-09 Thread Karol Herbst
v3: and load_output
v4: use smarter getIndirect helper
use new getSlotAddress helper

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index d8593ee9cc..748d7740de 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1474,6 +1474,44 @@ Converter::visit(nir_intrinsic_instr *insn)
   }
   break;
}
+   case nir_intrinsic_load_input:
+   case nir_intrinsic_load_output: {
+  const DataType dType = getDType(insn);
+  Value *indirect;
+  bool input = op == nir_intrinsic_load_input;
+
+  LValues &newDefs = convert(&insn->dest);
+  auto idx = getIndirect(insn, 0, 0, 16, &indirect);
+  uint8_t offset = insn->const_index[1];
+  nv50_ir_varying& vary = input ? info->in[idx] : info->out[idx];
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ uint32_t address = getSlotAddress(input, idx, i + offset, dType, 4);
+ Symbol *sym = mkSymbol(input ? FILE_SHADER_INPUT : FILE_SHADER_OUTPUT, 0, dType, address);
+ switch(prog->getType()) {
+ case Program::TYPE_FRAGMENT: {
+operation op;
+auto mode = translateInterpMode(&vary, op);
+if (typeSizeof(dType) == 8) {
+   Value *lo = getSSA();
+   Value *hi = getSSA();
+
+   mkOp2(op, TYPE_U32, lo, sym, op == OP_PINTERP ? fp.position : nullptr)->setInterpolate(mode);
+   Symbol *sym1 = mkSymbol(input ? FILE_SHADER_INPUT : FILE_SHADER_OUTPUT, 0, dType, address + 4);
+   mkOp2(op, TYPE_U32, hi, sym1, op == OP_PINTERP ? fp.position : nullptr)->setInterpolate(mode);
+
+   mkOp2(OP_MERGE, dType, newDefs[i], lo, hi);
+} else
+   mkOp2(op, dType, newDefs[i], sym, op == OP_PINTERP ? fp.position : nullptr)->setInterpolate(mode);
+break;
+ }
+ default:
+mkLoad(dType, newDefs[i], sym, indirect)->perPatch = vary.patch;
+break;
+ }
+  }
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 33/38] nvir/nir: implement nir_intrinsic_load_ubo

2018-01-09 Thread Karol Herbst
v4: use loadFrom helper

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 13 +
 1 file changed, 13 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 27a2c4f886..5ff7eecbc0 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1784,6 +1784,19 @@ Converter::visit(nir_intrinsic_instr *insn)
   mkOp1(getOperation(op), TYPE_U32, NULL, mkImm(idx))->fixed = 1;
   break;
}
+   case nir_intrinsic_load_ubo: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+  Value *indirectIndex;
+  Value *indirectOffset;
+  uint32_t index = getIndirect(&insn->src[0], 0, &indirectIndex) + 1;
+  uint32_t offset = getIndirect(&insn->src[1], 0, &indirectOffset);
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ loadFrom(FILE_MEMORY_CONST, index, dType, newDefs[i], offset, i, indirectOffset, indirectIndex);
+  }
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 23/38] nvir/nir: implement nir_intrinsic_store_(per_vertex_)output

2018-01-09 Thread Karol Herbst
v3: add workaround for RA issues
indirects have to be multiplied by 0x10
fix indirect access
v4: use smarter getIndirect helper
use storeTo helper

Signed-off-by: Karol Herbst 
---
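As a note on the changelog above, here is a minimal standalone sketch of the
two bits of arithmetic this patch relies on: skipping components that are not
in the write mask, and scaling an indirect slot index by 0x10. The helper
names writtenComponents and indirectByteOffset are hypothetical and not part
of the patch, and reading 0x10 as the byte size of one vec4 slot of 32-bit
components is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: list the components a masked store would touch,
// using the same (1u << i) & mask test as the store_output loop.
std::vector<unsigned> writtenComponents(unsigned numComponents, uint32_t writeMask)
{
   assert(numComponents <= 32);
   std::vector<unsigned> comps;
   for (auto i = 0u; i < numComponents; ++i) {
      if (!((1u << i) & writeMask))
         continue; // component i is not written
      comps.push_back(i);
   }
   return comps;
}

// Hypothetical helper: turn an indirect slot index into a byte offset.
// Assumption: one slot is a vec4 of 32-bit values, i.e. 0x10 bytes.
uint32_t indirectByteOffset(uint32_t slotIndex)
{
   return slotIndex * 0x10;
}
```

With a write mask of 0x5 only components 0 and 2 would be stored, and slot 3
would sit at byte offset 0x30.
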
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 43 ++
 1 file changed, 43 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index ae92e94511..d8593ee9cc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1191,6 +1191,11 @@ Converter::visit(nir_function *function)
 
setPosition(entry, true);
 
+   if (info->io.genUserClip > 0) {
+  for (int c = 0; c < 4; ++c)
+ clipVtx[c] = getScratch();
+   }
+
switch (prog->getType()) {
case Program::TYPE_TESSELLATION_CONTROL:
   outBase = mkOp2v(
@@ -1217,6 +1222,8 @@ Converter::visit(nir_function *function)
bb->cfg.attach(&exit->cfg, Graph::Edge::TREE);
setPosition(exit, true);
 
+   if (info->io.genUserClip > 0)
+  handleUserClipPlanes();
// TODO: for non main function this needs to be a OP_RETURN
mkOp(OP_EXIT, TYPE_NONE, NULL)->terminator = 1;
return true;
@@ -1431,6 +1438,42 @@ Converter::visit(nir_intrinsic_instr *insn)
   }
   break;
}
+   case nir_intrinsic_store_output:
+   case nir_intrinsic_store_per_vertex_output: {
+  Value *indirect;
+  DataType dType = getSType(insn->src[0], false, false);
+  auto idx = getIndirect(insn, op == nir_intrinsic_store_output ? 1 : 2, 0, 16, &indirect);
+  uint8_t offset = insn->const_index[2];
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ if (!((1u << i) & nir_intrinsic_write_mask(insn)))
+continue;
+
+ Value *src = getSrc(&insn->src[0], i);
+ switch (prog->getType()) {
+ case Program::TYPE_FRAGMENT: {
+if (info->out[idx].sn == TGSI_SEMANTIC_POSITION) {
+   // TGSI uses a different interface than NIR, TGSI stores that value in the z component, NIR in X
+   offset += 2;
+   src = mkOp1v(OP_SAT, TYPE_F32, getScratch(), src);
+}
+break;
+ }
+ case Program::TYPE_VERTEX: {
+if (info->io.genUserClip > 0) {
+   mkMov(clipVtx[i], src);
+   src = clipVtx[i];
+}
+break;
+ }
+ default:
+break;
+ }
+
+ storeTo(FILE_SHADER_OUTPUT, OP_EXPORT, dType, src, idx, i + offset, indirect);
+  }
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 37/38] nvir/nir: implement load_per_vertex_output

2018-01-09 Thread Karol Herbst
v4: use smarter getIndirect helper
use new getSlotAddress helper

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index aeeca94f4c..0a78c6a593 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1870,6 +1870,31 @@ Converter::visit(nir_intrinsic_instr *insn)
   }
   break;
}
+   case nir_intrinsic_load_per_vertex_output: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+  Value *indirectVertex;
+  Value *indirectOffset;
+  auto baseVertex = getIndirect(&insn->src[0], 0, &indirectVertex);
+  auto idx = getIndirect(insn, 1, 0, 16, &indirectOffset);
+  Value *vtxBase = nullptr;
+
+  if (indirectVertex)
+ vtxBase = indirectVertex;
+  else
+ vtxBase = loadImm(nullptr, baseVertex);
+
+  vtxBase = mkOp2v(OP_ADD, TYPE_U32, getSSA(4, FILE_ADDRESS), outBase, vtxBase);
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ uint32_t address = getSlotAddress(false, idx, nir_intrinsic_component(insn) + i, dType, 4);
+ Symbol *sym = mkSymbol(FILE_SHADER_OUTPUT, 0, dType, address);
+ Instruction *ld = mkLoad(dType, newDefs[i], sym, indirectOffset);
+ ld->setIndirect(0, 1, vtxBase);
+ ld->perPatch = info->in[idx].patch;
+  }
+  break;
+   }
case nir_intrinsic_emit_vertex:
case nir_intrinsic_end_primitive: {
   auto idx = nir_intrinsic_stream_id(insn);
-- 
2.14.3


[Mesa-dev] [PATCH v4 26/38] nvir/nir: implement loading system values

2018-01-09 Thread Karol Herbst
v2: support more sys values
fixed a bug where for multi component reads all values ended up in x
v3: add load_patch_vertices_in
v4: add subgroup stuff

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 108 +
 1 file changed, 108 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 857385931c..eee7e4ccb5 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -81,6 +81,7 @@ public:
LValues& convert(nir_alu_dest *);
BasicBlock* convert(nir_block *);
LValues& convert(nir_dest *);
+   SVSemantic convert(nir_intrinsic_op);
LValues& convert(nir_register *);
LValues& convert(nir_ssa_def *);
 
@@ -1422,6 +1423,67 @@ Converter::visit(nir_jump_instr *insn)
return true;
 }
 
+SVSemantic
+Converter::convert(nir_intrinsic_op intr)
+{
+   switch (intr) {
+   case nir_intrinsic_load_base_vertex:
+  return SV_BASEVERTEX;
+   case nir_intrinsic_load_base_instance:
+  return SV_BASEINSTANCE;
+   case nir_intrinsic_load_draw_id:
+  return SV_DRAWID;
+   case nir_intrinsic_load_front_face:
+  return SV_FACE;
+   case nir_intrinsic_load_instance_id:
+  return SV_INSTANCE_ID;
+   case nir_intrinsic_load_invocation_id:
+  return SV_INVOCATION_ID;
+   case nir_intrinsic_load_local_group_size:
+  return SV_NTID;
+   case nir_intrinsic_load_local_invocation_id:
+  return SV_TID;
+   case nir_intrinsic_load_num_work_groups:
+  return SV_NCTAID;
+   case nir_intrinsic_load_patch_vertices_in:
+  return SV_VERTEX_COUNT;
+   case nir_intrinsic_load_primitive_id:
+  return SV_PRIMITIVE_ID;
+   case nir_intrinsic_load_sample_id:
+  return SV_SAMPLE_INDEX;
+   case nir_intrinsic_load_sample_mask_in:
+  return SV_SAMPLE_MASK;
+   case nir_intrinsic_load_sample_pos:
+  return SV_SAMPLE_POS;
+   case nir_intrinsic_load_subgroup_eq_mask:
+  return SV_LANEMASK_EQ;
+   case nir_intrinsic_load_subgroup_ge_mask:
+  return SV_LANEMASK_GE;
+   case nir_intrinsic_load_subgroup_gt_mask:
+  return SV_LANEMASK_GT;
+   case nir_intrinsic_load_subgroup_le_mask:
+  return SV_LANEMASK_LE;
+   case nir_intrinsic_load_subgroup_lt_mask:
+  return SV_LANEMASK_LT;
+   case nir_intrinsic_load_subgroup_invocation:
+  return SV_LANEID;
+   case nir_intrinsic_load_tess_coord:
+  return SV_TESS_COORD;
+   case nir_intrinsic_load_tess_level_inner:
+  return SV_TESS_INNER;
+   case nir_intrinsic_load_tess_level_outer:
+  return SV_TESS_OUTER;
+   case nir_intrinsic_load_vertex_id:
+  return SV_VERTEX_ID;
+   case nir_intrinsic_load_work_group_id:
+  return SV_CTAID;
+   default:
+  ERROR("unknown SVSemantic for nir_intrinsic_op %s\n", nir_intrinsic_infos[intr].name);
+  assert(false);
+  return SV_LAST;
+   }
+}
+
 bool
 Converter::visit(nir_intrinsic_instr *insn)
 {
@@ -1527,6 +1589,52 @@ Converter::visit(nir_intrinsic_instr *insn)
   mkOp(OP_DISCARD, TYPE_NONE, NULL)->setPredicate(CC_P, pred);
   break;
}
+   case nir_intrinsic_load_base_vertex:
+   case nir_intrinsic_load_base_instance:
+   case nir_intrinsic_load_draw_id:
+   case nir_intrinsic_load_front_face:
+   case nir_intrinsic_load_instance_id:
+   case nir_intrinsic_load_invocation_id:
+   case nir_intrinsic_load_local_group_size:
+   case nir_intrinsic_load_local_invocation_id:
+   case nir_intrinsic_load_num_work_groups:
+   case nir_intrinsic_load_patch_vertices_in:
+   case nir_intrinsic_load_primitive_id:
+   case nir_intrinsic_load_sample_id:
+   case nir_intrinsic_load_sample_mask_in:
+   case nir_intrinsic_load_sample_pos:
+   case nir_intrinsic_load_subgroup_eq_mask:
+   case nir_intrinsic_load_subgroup_ge_mask:
+   case nir_intrinsic_load_subgroup_gt_mask:
+   case nir_intrinsic_load_subgroup_le_mask:
+   case nir_intrinsic_load_subgroup_lt_mask:
+   case nir_intrinsic_load_subgroup_invocation:
+   case nir_intrinsic_load_tess_coord:
+   case nir_intrinsic_load_tess_level_inner:
+   case nir_intrinsic_load_tess_level_outer:
+   case nir_intrinsic_load_vertex_id:
+   case nir_intrinsic_load_work_group_id: {
+  SVSemantic sv = convert(op);
+  LValues &newDefs = convert(&insn->dest);
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ if (sv == SV_TID && info->prop.cp.numThreads[i] == 1) {
+loadImm(newDefs[i], 0u);
+continue;
+ }
+ Symbol *sym = mkSysVal(sv, i);
+ Instruction *rdsv = mkOp1(OP_RDSV, TYPE_U32, newDefs[i], sym);
+ if (sv == SV_TESS_OUTER || sv == SV_TESS_INNER)
+rdsv->perPatch = 1;
+  }
+  break;
+   }
+   // constants
+   case nir_intrinsic_load_subgroup_size: {
+  LValues &newDefs = convert(&insn->dest);
+  loadImm(newDefs[0], 32u);
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op 

[Mesa-dev] [PATCH v4 28/38] nvir/nir: implement nir_instr_type_tex

2018-01-09 Thread Karol Herbst
Many of those fields are not valid for many of the tex ops. I'm not sure
whether it's worth the effort to check for those or whether we should just
keep it like that. It seems to kind of work.

v2: reworked offset handling
add tex support with indirect R/S arguments
handle GLSL_SAMPLER_DIM_EXTERNAL
drop reference in convert(glsl_sampler_dim&, bool, bool)
fix tg4 component selection

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 219 +
 1 file changed, 219 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 206a512918..58c627371b 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -78,6 +78,7 @@ public:
 
Converter(Program *, nir_shader *, nv50_ir_prog_info *);
 
+   TexTarget convert(glsl_sampler_dim, bool isArray, bool isShadow);
LValues& convert(nir_alu_dest *);
BasicBlock* convert(nir_block *);
LValues& convert(nir_dest *);
@@ -115,6 +116,7 @@ public:
bool visit(nir_load_const_instr*);
bool visit(nir_loop *);
bool visit(nir_ssa_undef_instr *);
+   bool visit(nir_tex_instr *);
 
bool run();
 
@@ -129,9 +131,13 @@ public:
DataType getSType(nir_src&, bool isFloat, bool isSigned);
 
operation getOperation(nir_op);
+   operation getOperation(nir_texop);
operation preOperationNeeded(nir_op);
int getSubOp(nir_op);
CondCode getCondCode(nir_op);
+
+   // tex stuff
+   Value* applyProjection(Value *src, Value *proj);
 private:
nir_shader *nir;
 
@@ -358,6 +364,36 @@ Converter::getOperation(nir_op op)
}
 }
 
+operation
+Converter::getOperation(nir_texop op)
+{
+   switch (op) {
+   case nir_texop_tex:
+  return OP_TEX;
+   case nir_texop_lod:
+  return OP_TXLQ;
+   case nir_texop_txb:
+  return OP_TXB;
+   case nir_texop_txd:
+  return OP_TXD;
+   case nir_texop_txf:
+   case nir_texop_txf_ms:
+  return OP_TXF;
+   case nir_texop_tg4:
+  return OP_TXG;
+   case nir_texop_txl:
+  return OP_TXL;
+   case nir_texop_query_levels:
+   case nir_texop_texture_samples:
+   case nir_texop_txs:
+  return OP_TXQ;
+   default:
+  ERROR("couldn't get operation for nir_texop %u\n", op);
+  assert(false);
+  return OP_NOP;
+   }
+}
+
 operation
 Converter::preOperationNeeded(nir_op op)
 {
@@ -1363,6 +1399,10 @@ Converter::visit(nir_instr *insn)
   if (!visit(nir_instr_as_alu(insn)))
  return false;
   break;
+   case nir_instr_type_tex:
+  if (!visit(nir_instr_as_tex(insn)))
+ return false;
+  break;
case nir_instr_type_intrinsic:
   if (!visit(nir_instr_as_intrinsic(insn)))
  return false;
@@ -1974,6 +2014,185 @@ Converter::visit(nir_ssa_undef_instr *insn)
return true;
 }
 
+#define CASE_SAMPLER(ty) \
+   case GLSL_SAMPLER_DIM_ ## ty : \
+  if (isArray && !isShadow) \
+ return TEX_TARGET_ ## ty ## _ARRAY; \
+  else if (!isArray && isShadow) \
+ return TEX_TARGET_## ty ## _SHADOW; \
+  else if (isArray && isShadow) \
+ return TEX_TARGET_## ty ## _ARRAY_SHADOW; \
+  else \
+ return TEX_TARGET_ ## ty
+
+TexTarget
+Converter::convert(glsl_sampler_dim dim, bool isArray, bool isShadow)
+{
+   switch (dim) {
+   CASE_SAMPLER(1D);
+   CASE_SAMPLER(2D);
+   CASE_SAMPLER(CUBE);
+   case GLSL_SAMPLER_DIM_3D:
+  return TEX_TARGET_3D;
+   case GLSL_SAMPLER_DIM_MS:
+  if (isArray)
+ return TEX_TARGET_2D_MS_ARRAY;
+  return TEX_TARGET_2D_MS;
+   case GLSL_SAMPLER_DIM_RECT:
+  if (isShadow)
+ return TEX_TARGET_RECT_SHADOW;
+  return TEX_TARGET_RECT;
+   case GLSL_SAMPLER_DIM_BUF:
+  return TEX_TARGET_BUFFER;
+   case GLSL_SAMPLER_DIM_EXTERNAL:
+  return TEX_TARGET_2D;
+   default:
+  ERROR("unknown glsl_sampler_dim %u\n", dim);
+  assert(false);
+  return TEX_TARGET_COUNT;
+   }
+}
+#undef CASE_SAMPLER
+
+Value*
+Converter::applyProjection(Value *src, Value *proj)
+{
+   if (!proj)
+  return src;
+   return mkOp2v(OP_MUL, TYPE_F32, getScratch(), src, proj);
+}
+
+bool
+Converter::visit(nir_tex_instr *insn)
+{
+   switch (insn->op) {
+   case nir_texop_lod:
+   case nir_texop_query_levels:
+   case nir_texop_tex:
+   case nir_texop_texture_samples:
+   case nir_texop_tg4:
+   case nir_texop_txb:
+   case nir_texop_txd:
+   case nir_texop_txf:
+   case nir_texop_txf_ms:
+   case nir_texop_txl:
+   case nir_texop_txs: {
+  LValues &newDefs = convert(&insn->dest);
+  std::vector srcs;
+  std::vector defs;
+  std::vector offsets;
+  uint8_t mask = 0;
+  bool lz = false;
+  Value *proj = nullptr;
+  TexInstruction::Target target = convert(insn->sampler_dim, insn->is_array, insn->is_shadow);
+  operation op = getOperation(insn->op);
+
+  int biasIdx = nir_tex_instr_src_index(insn, 

[Mesa-dev] [PATCH v4 20/38] nvir/nir: add skeleton for nir_intrinsic_instr

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp  | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 7010e4e468..9c00304ad3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -83,6 +83,7 @@ public:
bool visit(nir_function *);
bool visit(nir_if *);
bool visit(nir_instr *);
+   bool visit(nir_intrinsic_instr *);
bool visit(nir_jump_instr *);
bool visit(nir_load_const_instr*);
bool visit(nir_loop *);
@@ -1185,6 +1186,10 @@ bool
 Converter::visit(nir_instr *insn)
 {
switch (insn->type) {
+   case nir_instr_type_intrinsic:
+  if (!visit(nir_instr_as_intrinsic(insn)))
+ return false;
+  break;
case nir_instr_type_load_const:
   if (!visit(nir_instr_as_load_const(insn)))
  return false;
@@ -1242,6 +1247,20 @@ Converter::visit(nir_jump_instr *insn)
return true;
 }
 
+bool
+Converter::visit(nir_intrinsic_instr *insn)
+{
+   nir_intrinsic_op op = insn->intrinsic;
+
+   switch (op) {
+   default:
+  ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
+  return false;
+   }
+
+   return true;
+}
+
 bool
 Converter::run()
 {
-- 
2.14.3


[Mesa-dev] [PATCH v4 25/38] nvir/nir: implement intrinsic_discard(_if)

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 748d7740de..857385931c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1512,6 +1512,21 @@ Converter::visit(nir_intrinsic_instr *insn)
   }
   break;
}
+   case nir_intrinsic_discard:
+  mkOp(OP_DISCARD, TYPE_NONE, NULL);
+  break;
+   case nir_intrinsic_discard_if: {
+  // we get a nir boolean value
+  Value *pred = new_LValue(func, FILE_PREDICATE);
+  if (insn->num_components > 1) {
+ ERROR("nir_intrinsic_discard_if only with 1 component supported!\n");
+ assert(false);
+ return false;
+  }
+  mkCmp(OP_SET, CC_NE, TYPE_U8, pred, TYPE_U32, getSrc(&insn->src[0], 0), zero);
+  mkOp(OP_DISCARD, TYPE_NONE, NULL)->setPredicate(CC_P, pred);
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 32/38] nvir/nir: implement geometry shader nir_intrinsics

2018-01-09 Thread Karol Herbst
v4: use smarter getIndirect helper
use new getSlotAddress helper
use loadFrom helper

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index b4d75802c6..27a2c4f886 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -402,6 +402,10 @@ operation
 Converter::getOperation(nir_intrinsic_op op)
 {
switch (op) {
+   case nir_intrinsic_emit_vertex:
+  return OP_EMIT;
+   case nir_intrinsic_end_primitive:
+  return OP_RESTART;
default:
   ERROR("couldn't get operation for nir_intrinsic_op %u\n", op);
   assert(false);
@@ -1759,6 +1763,27 @@ Converter::visit(nir_intrinsic_instr *insn)
   mkOp3(OP_SHFL, dType, newDefs[0], getSrc(&insn->src[0], 0), tmp, mkImm(0x1f))->subOp = NV50_IR_SUBOP_SHFL_IDX;
   break;
}
+   case nir_intrinsic_load_per_vertex_input: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+  Value *indirectVertex;
+  Value *indirectOffset;
+  auto baseVertex = getIndirect(&insn->src[0], 0, &indirectVertex);
+  auto idx = getIndirect(insn, 1, 0, 16, &indirectOffset);
+
+  Value *vtxBase = mkOp2v(OP_PFETCH, TYPE_U32, getSSA(4, FILE_ADDRESS), mkImm(baseVertex), indirectVertex);
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ uint32_t address = getSlotAddress(true, idx, i + nir_intrinsic_component(insn), dType, 4);
+ loadFrom(FILE_SHADER_INPUT, 0, dType, newDefs[i], address, 0, indirectOffset, vtxBase, info->in[idx].patch);
+  }
+  break;
+   }
+   case nir_intrinsic_emit_vertex:
+   case nir_intrinsic_end_primitive: {
+  auto idx = nir_intrinsic_stream_id(insn);
+  mkOp1(getOperation(op), TYPE_U32, NULL, mkImm(idx))->fixed = 1;
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 34/38] nvir/nir: implement ssbo intrinsics

2018-01-09 Thread Karol Herbst
v4: use loadFrom helper

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 86 ++
 1 file changed, 86 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 5ff7eecbc0..9105cddf93 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -439,6 +439,24 @@ int
 Converter::getSubOp(nir_intrinsic_op op)
 {
switch (op) {
+   case nir_intrinsic_ssbo_atomic_add:
+  return NV50_IR_SUBOP_ATOM_ADD;
+   case nir_intrinsic_ssbo_atomic_and:
+  return NV50_IR_SUBOP_ATOM_AND;
+   case nir_intrinsic_ssbo_atomic_comp_swap:
+  return NV50_IR_SUBOP_ATOM_CAS;
+   case nir_intrinsic_ssbo_atomic_exchange:
+  return NV50_IR_SUBOP_ATOM_EXCH;
+   case nir_intrinsic_ssbo_atomic_or:
+  return NV50_IR_SUBOP_ATOM_OR;
+   case nir_intrinsic_ssbo_atomic_imax:
+   case nir_intrinsic_ssbo_atomic_umax:
+  return NV50_IR_SUBOP_ATOM_MAX;
+   case nir_intrinsic_ssbo_atomic_imin:
+   case nir_intrinsic_ssbo_atomic_umin:
+  return NV50_IR_SUBOP_ATOM_MIN;
+   case nir_intrinsic_ssbo_atomic_xor:
+  return NV50_IR_SUBOP_ATOM_XOR;
case nir_intrinsic_vote_all:
   return NV50_IR_SUBOP_VOTE_ALL;
case nir_intrinsic_vote_any:
@@ -1797,6 +1815,74 @@ Converter::visit(nir_intrinsic_instr *insn)
   }
   break;
}
+   case nir_intrinsic_get_buffer_size: {
+  LValues &newDefs = convert(&insn->dest);
+  const DataType dType = getDType(insn);
+  Value *indirectBuffer;
+  uint32_t buffer = getIndirect(&insn->src[0], 0, &indirectBuffer);
+
+  Symbol *sym = mkSymbol(FILE_MEMORY_BUFFER, buffer, dType, 0);
+  mkOp1(OP_BUFQ, dType, newDefs[0], sym)->setIndirect(0, 0, indirectBuffer);
+  info->io.globalAccess |= 0x2;
+  break;
+   }
+   case nir_intrinsic_store_ssbo: {
+  DataType sType = getSType(insn->src[0], false, false);
+  Value *indirectBuffer;
+  Value *indirectOffset;
+  uint32_t buffer = getIndirect(&insn->src[1], 0, &indirectBuffer);
+  uint32_t offset = getIndirect(&insn->src[2], 0, &indirectOffset);
+
+  for (auto i = 0u; i < insn->num_components; ++i) {
+ if (!((1u << i) & nir_intrinsic_write_mask(insn)))
+continue;
+ Symbol *sym = mkSymbol(FILE_MEMORY_BUFFER, buffer, sType, offset + i * typeSizeof(sType));
+ mkStore(OP_STORE, sType, sym, indirectOffset, getSrc(&insn->src[0], i))->setIndirect(0, 1, indirectBuffer);
+  }
+  info->io.globalAccess |= 0x2;
+  break;
+   }
+   case nir_intrinsic_load_ssbo: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+  Value *indirectBuffer;
+  Value *indirectOffset;
+  uint32_t buffer = getIndirect(&insn->src[0], 0, &indirectBuffer);
+  uint32_t offset = getIndirect(&insn->src[1], 0, &indirectOffset);
+
+  for (auto i = 0u; i < insn->num_components; ++i)
+ loadFrom(FILE_MEMORY_BUFFER, buffer, dType, newDefs[i], offset, i, indirectOffset, indirectBuffer);
+
+  info->io.globalAccess |= 0x1;
+  break;
+   }
+   case nir_intrinsic_ssbo_atomic_add:
+   case nir_intrinsic_ssbo_atomic_and:
+   case nir_intrinsic_ssbo_atomic_comp_swap:
+   case nir_intrinsic_ssbo_atomic_exchange:
+   case nir_intrinsic_ssbo_atomic_or:
+   case nir_intrinsic_ssbo_atomic_imax:
+   case nir_intrinsic_ssbo_atomic_imin:
+   case nir_intrinsic_ssbo_atomic_umax:
+   case nir_intrinsic_ssbo_atomic_umin:
+   case nir_intrinsic_ssbo_atomic_xor: {
+  const DataType dType = getDType(insn);
+  LValues &newDefs = convert(&insn->dest);
+  Value *indirectBuffer;
+  Value *indirectOffset;
+  uint32_t buffer = getIndirect(&insn->src[0], 0, &indirectBuffer);
+  uint32_t offset = getIndirect(&insn->src[1], 0, &indirectOffset);
+
+  Symbol *sym = mkSymbol(FILE_MEMORY_BUFFER, buffer, dType, offset);
+  Instruction *atom = mkOp2(OP_ATOM, dType, newDefs[0], sym, getSrc(&insn->src[2], 0));
+  if (op == nir_intrinsic_ssbo_atomic_comp_swap)
+ atom->setSrc(2, getSrc(&insn->src[3], 0));
+  atom->setIndirect(0, 0, indirectOffset);
+  atom->subOp = getSubOp(op);
+
+  info->io.globalAccess |= 0x2;
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 31/38] nvir/nir: implement variable indexing

2018-01-09 Thread Karol Herbst
We store those arrays in local memory and reserve some space for each of the
arrays. The arrays are stored in a packed format, because we know the context
of each index quite easily. We don't do that in TGSI so far.

This causes various issues in the MemoryOpt pass, because ld/st with
indirects aren't guaranteed to be aligned to 0x10 anymore.

v3: use fixed size vec4 arrays until we fix MemoryOpt
v4: fix for 64 bit types

Signed-off-by: Karol Herbst 
---
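To illustrate the layout described above, here is a minimal standalone sketch
of how the lmem offsets come out with the fixed-size vec4 slots from v3.
ArrayReg and LmemLayout are hypothetical stand-ins, not types from the patch;
only the arithmetic mirrors the diff below.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the nir_register fields the patch reads.
struct ArrayReg {
   unsigned index;
   unsigned num_array_elems;
   unsigned bit_size; // 32 or 64
};

// Hypothetical helper that hands out local-memory ranges per array register.
struct LmemLayout {
   std::unordered_map<unsigned, uint32_t> regToLmemOffset;
   uint32_t tlsSpace = 0;

   // Reserve one vec4-sized slot per array element (not packed).
   void reserve(const ArrayReg &reg)
   {
      assert(reg.bit_size % 8 == 0);
      if (!reg.num_array_elems)
         return; // not an array, nothing to reserve
      uint32_t size = 4 * reg.num_array_elems * (reg.bit_size / 8);
      regToLmemOffset[reg.index] = tlsSpace;
      tlsSpace += size;
   }

   // Byte address of component comp of element base_offset:
   // goffset + aoffset + comp * size, as in the mov handling.
   uint32_t address(const ArrayReg &reg, uint32_t base_offset, uint32_t comp) const
   {
      uint32_t size = reg.bit_size / 8;
      uint32_t csize = 4 * size;
      return regToLmemOffset.at(reg.index) + csize * base_offset + comp * size;
   }
};
```

For example, a 3-element 32 bit array followed by a 2-element 64 bit array
would take 48 and 64 bytes, so the second array would start at offset 48.
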
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 60 ++
 1 file changed, 60 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 996c202645..b4d75802c6 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -74,6 +74,7 @@ public:
typedef decltype(nir_ssa_def().index) NirSSADefIdx;
typedef decltype(nir_ssa_def().bit_size) NirSSADefBitSize;
typedef std::unordered_map NirDefMap;
+   typedef std::unordered_map NirArrayLMemOffsets;
typedef std::unordered_map 
NirBlockMap;
 
Converter(Program *, nir_shader *, nv50_ir_prog_info *);
@@ -145,6 +146,7 @@ private:
 
NirDefMap ssaDefs;
NirDefMap regDefs;
+   NirArrayLMemOffsets regToLmemOffset;
NirBlockMap blocks;
unsigned int curLoopDepth;
 
@@ -1194,6 +1196,7 @@ Converter::storeTo(DataFile file, operation op, DataType ty, Value *src, uint8_t
 bool
 Converter::parseNIR()
 {
+   info->bin.tlsSpace = 0;
info->io.clipDistances = nir->info.clip_distance_array_size;
info->io.cullDistances = nir->info.cull_distance_array_size;
 
@@ -1281,6 +1284,17 @@ Converter::visit(nir_function *function)
   break;
}
 
+   nir_foreach_register(reg, &function->impl->registers) {
+  if (reg->num_array_elems) {
+ // TODO: packed variables would be nice, but MemoryOpt fails
+ // uint32_t size = reg->num_components * reg->num_array_elems * (reg->bit_size / 8);
+ uint32_t size = 4 * reg->num_array_elems * (reg->bit_size / 8);
+ // reserve some lmem
+ regToLmemOffset[reg->index] = info->bin.tlsSpace;
+ info->bin.tlsSpace += size;
+  }
+   }
+
nir_index_ssa_defs(function->impl);
foreach_list_typed(nir_cf_node, node, node, &function->impl->body) {
   if (!visit(node))
@@ -1886,6 +1900,51 @@ Converter::visit(nir_alu_instr *insn)
 *   2. they basically just merge multiple values into one data type
 */
CASE_OPFI(mov):
+  if (!insn->dest.dest.is_ssa && insn->dest.dest.reg.reg->num_array_elems) {
+ nir_reg_dest& reg = insn->dest.dest.reg;
+ auto goffset = regToLmemOffset[reg.reg->index];
+ auto comps = reg.reg->num_components;
+ auto size = reg.reg->bit_size / 8;
+ auto csize = 4 * size; // TODO after fixing MemoryOpts: comps * size;
+ auto aoffset = csize * reg.base_offset;
+ Value *indirect = nullptr;
+
+ if (reg.indirect)
+indirect = mkOp2v(OP_MUL, TYPE_U32, getSSA(4, FILE_ADDRESS), getSrc(reg.indirect, 0), mkImm(csize));
+
+ for (auto i = 0u; i < comps; ++i) {
+if (!((1u << i) & insn->dest.write_mask))
+   continue;
+
+Symbol *sym = mkSymbol(FILE_MEMORY_LOCAL, 0, dType, goffset + aoffset + i * size);
+mkStore(OP_STORE, dType, sym, indirect, getSrc(&insn->src[0], i));
+ }
+ break;
+  } else if (!insn->src[0].src.is_ssa && insn->src[0].src.reg.reg->num_array_elems) {
+ LValues &newDefs = convert(&insn->dest);
+ nir_reg_src& reg = insn->src[0].src.reg;
+ auto goffset = regToLmemOffset[reg.reg->index];
+ // auto comps = reg.reg->num_components;
+ auto size = reg.reg->bit_size / 8;
+ auto csize = 4 * size; // TODO after fixing MemoryOpts: comps * size;
+ auto aoffset = csize * reg.base_offset;
+ Value *indirect = nullptr;
+
+ if (reg.indirect)
+indirect = mkOp2v(OP_MUL, TYPE_U32, getSSA(4, FILE_ADDRESS), getSrc(reg.indirect, 0), mkImm(csize));
+
+ for (auto i = 0u; i < newDefs.size(); ++i) {
+Symbol *sym = mkSymbol(FILE_MEMORY_LOCAL, 0, dType, goffset + aoffset + i * size);
+mkLoad(dType, newDefs[i], sym, indirect);
+ }
+ break;
+  } else {
+ LValues &newDefs = convert(&insn->dest);
+ for (LValues::size_type c = 0u; c < newDefs.size(); ++c) {
+mkMov(newDefs[c], getSrc(&insn->src[0], c), dType);
+ }
+  }
+  break;
case nir_op_vec2:
case nir_op_vec3:
case nir_op_vec4: {
@@ -2286,6 +2345,7 @@ Converter::run()
   NIR_PASS(progress, nir, nir_opt_dead_cf);
} while (progress);
 
+   NIR_PASS_V(nir, nir_lower_locals_to_regs);
NIR_PASS_V(nir, nir_remove_dead_variables, nir_var_local);
NIR_PASS_V(nir, 

[Mesa-dev] [PATCH v4 36/38] nvir/nir: add memory barriers

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index f7b51339c2..aeeca94f4c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -455,6 +455,10 @@ Converter::getSubOp(nir_intrinsic_op op)
CASE_OP_INTR_ATOM(and, AND);
CASE_OP_INTR_ATOM(comp_swap, CAS);
CASE_OP_INTR_ATOM(exchange, EXCH);
+   case nir_intrinsic_memory_barrier:
+  return NV50_IR_SUBOP_MEMBAR(M, GL);
+   case nir_intrinsic_memory_barrier_shared:
+  return NV50_IR_SUBOP_MEMBAR(M, CTA);
CASE_OP_INTR_ATOM(or, OR);
case nir_intrinsic_image_atomic_max:
CASE_OP_INTR_ATOM_S(imax, MAX);
@@ -2114,6 +2118,13 @@ Converter::visit(nir_intrinsic_instr *insn)
   bar->subOp = NV50_IR_SUBOP_BAR_SYNC;
   break;
}
+   case nir_intrinsic_memory_barrier:
+   case nir_intrinsic_memory_barrier_shared: {
+  Instruction *bar = mkOp(OP_MEMBAR, TYPE_NONE, NULL);
+  bar->fixed = 1;
+  bar->subOp = getSubOp(op);
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3


[Mesa-dev] [PATCH v4 18/38] nvir/nir: implement CFG handling

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 255 -
 1 file changed, 253 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 38f03523f3..66ec4460d9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -49,10 +49,12 @@ public:
typedef decltype(nir_ssa_def().index) NirSSADefIdx;
typedef decltype(nir_ssa_def().bit_size) NirSSADefBitSize;
typedef std::unordered_map<NirSSADefIdx, LValues> NirDefMap;
+   typedef std::unordered_map 
NirBlockMap;
 
Converter(Program *, nir_shader *, nv50_ir_prog_info *);
 
LValues& convert(nir_alu_dest *);
+   BasicBlock* convert(nir_block *);
LValues& convert(nir_dest *);
LValues& convert(nir_register *);
LValues& convert(nir_ssa_def *);
@@ -76,6 +78,14 @@ public:
Instruction *loadFrom(DataFile, uint8_t, DataType, Value *def, uint32_t 
base, uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr, bool 
patch = false);
void storeTo(DataFile, operation, DataType, Value *src, uint8_t idx, 
uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr);
 
+   bool visit(nir_block *);
+   bool visit(nir_cf_node *);
+   bool visit(nir_function *);
+   bool visit(nir_if *);
+   bool visit(nir_instr *);
+   bool visit(nir_jump_instr *);
+   bool visit(nir_loop *);
+
bool run();
 
bool isFloatType(nir_alu_type);
@@ -93,11 +103,34 @@ private:
 
NirDefMap ssaDefs;
NirDefMap regDefs;
+   NirBlockMap blocks;
+   unsigned int curLoopDepth;
+
+   BasicBlock *exit;
+
+   union {
+  struct {
+ Value *position;
+  } fp;
+   };
 };
 
 Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
: ConverterCommon(prog, info),
- nir(nir) {}
+ nir(nir),
+ curLoopDepth(0) {}
+
+BasicBlock *
+Converter::convert(nir_block *block)
+{
+   NirBlockMap::iterator it = blocks.find(block->index);
+   if (it != blocks.end())
+  return (*it).second;
+
+   BasicBlock *bb = new BasicBlock(func);
+   blocks[block->index] = bb;
+   return bb;
+}
 
 bool
 Converter::isFloatType(nir_alu_type type)
@@ -976,6 +1009,219 @@ Converter::parseNIR()
return true;
 }
 
+bool
+Converter::visit(nir_function *function)
+{
+   // we only support emitting the main function for now
+   assert(!strcmp(function->name, "main"));
+   assert(function->impl);
+
+   // usually the blocks will set everything up, but main is special
+   BasicBlock *entry = new BasicBlock(prog->main);
+   exit = new BasicBlock(prog->main);
+   blocks[nir_start_block(function->impl)->index] = entry;
+   prog->main->setEntry(entry);
+   prog->main->setExit(exit);
+
+   setPosition(entry, true);
+
+   switch (prog->getType()) {
+   case Program::TYPE_TESSELLATION_CONTROL:
+  outBase = mkOp2v(
+ OP_SUB, TYPE_U32, getSSA(),
+ mkOp1v(OP_RDSV, TYPE_U32, getSSA(), mkSysVal(SV_LANEID, 0)),
+ mkOp1v(OP_RDSV, TYPE_U32, getSSA(), mkSysVal(SV_INVOCATION_ID, 0)));
+  break;
+   case Program::TYPE_FRAGMENT: {
+  Symbol *sv = mkSysVal(SV_POSITION, 3);
+  fragCoord[3] = mkOp1v(OP_RDSV, TYPE_F32, getSSA(), sv);
+  fp.position = mkOp1v(OP_RCP, TYPE_F32, fragCoord[3], fragCoord[3]);
+  break;
+   }
+   default:
+  break;
+   }
+
+   nir_index_ssa_defs(function->impl);
+   foreach_list_typed(nir_cf_node, node, node, &function->impl->body) {
+  if (!visit(node))
+ return false;
+   }
+
+   bb->cfg.attach(&exit->cfg, Graph::Edge::TREE);
+   setPosition(exit, true);
+
+   // TODO: for non-main functions this needs to be an OP_RETURN
+   mkOp(OP_EXIT, TYPE_NONE, NULL)->terminator = 1;
+   return true;
+}
+
+bool
+Converter::visit(nir_cf_node *node)
+{
+   switch (node->type) {
+   case nir_cf_node_block:
+  if (!visit(nir_cf_node_as_block(node)))
+ return false;
+  break;
+   case nir_cf_node_if:
+  if (!visit(nir_cf_node_as_if(node)))
+ return false;
+  break;
+   case nir_cf_node_loop:
+  if (!visit(nir_cf_node_as_loop(node)))
+ return false;
+  break;
+   default:
+  ERROR("unknown nir_cf_node type %u\n", node->type);
+  return false;
+   }
+   return true;
+}
+
+bool
+Converter::visit(nir_block *block)
+{
+   BasicBlock *bb = convert(block);
+
+   setPosition(bb, true);
+   nir_foreach_instr(insn, block) {
+  if (!visit(insn))
+ return false;
+   }
+   return true;
+}
+
+bool
+Converter::visit(nir_if *nif)
+{
+   DataType sType = getSType(nif->condition, false, false);
+   Value *src = getSrc(&nif->condition, 0);
+
+   nir_block *lastThen = nir_if_last_then_block(nif);
+   nir_block *lastElse = nir_if_last_else_block(nif);
+
+   assert(!lastThen->successors[1]);
+   assert(!lastElse->successors[1]);
+
+   BasicBlock *ifBB = 

[Mesa-dev] [PATCH v4 29/38] nvir/nir: add getOperation for intrinsics

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 24 ++
 1 file changed, 24 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 58c627371b..ef0e58d4b8 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -131,9 +131,11 @@ public:
DataType getSType(nir_src&, bool isFloat, bool isSigned);
 
operation getOperation(nir_op);
+   operation getOperation(nir_intrinsic_op);
operation getOperation(nir_texop);
operation preOperationNeeded(nir_op);
int getSubOp(nir_op);
+   int getSubOp(nir_intrinsic_op);
CondCode getCondCode(nir_op);
 
// tex stuff
@@ -394,6 +396,17 @@ Converter::getOperation(nir_texop op)
}
 }
 
+operation
+Converter::getOperation(nir_intrinsic_op op)
+{
+   switch (op) {
+   default:
+  ERROR("couldn't get operation for nir_intrinsic_op %u\n", op);
+  assert(false);
+  return OP_NOP;
+   }
+}
+
 operation
 Converter::preOperationNeeded(nir_op op)
 {
@@ -416,6 +429,17 @@ Converter::getSubOp(nir_op op)
}
 }
 
+int
+Converter::getSubOp(nir_intrinsic_op op)
+{
+   switch (op) {
+   default:
+  ERROR("couldn't get subop for nir_intrinsic_op %u\n", op);
+  assert(false);
+  return 0;
+   }
+}
+
 CondCode
 Converter::getCondCode(nir_op op)
 {
-- 
2.14.3



[Mesa-dev] [PATCH v4 22/38] nvir/nir: implement nir_intrinsic_load_uniform

2018-01-09 Thread Karol Herbst
v2: use new getIndirect helper
fixes symbols for 64 bit types
v4: use smarter getIndirect helper
simplify address calculation
use loadFrom helper

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 572ccfa4eb..ae92e94511 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -1421,6 +1421,16 @@ Converter::visit(nir_intrinsic_instr *insn)
nir_intrinsic_op op = insn->intrinsic;
 
switch (op) {
+   case nir_intrinsic_load_uniform: {
+  LValues &newDefs = convert(&insn->dest);
+  const DataType dType = getDType(insn);
+  Value *indirect;
+  auto coffset = getIndirect(insn, 0, 0, 16, &indirect);
+  for (auto i = 0; i < insn->num_components; ++i) {
+ loadFrom(FILE_MEMORY_CONST, 0, dType, newDefs[i], 16 * coffset, i, 
indirect);
+  }
+  break;
+   }
default:
   ERROR("unknown nir_intrinsic_op %s\n", nir_intrinsic_infos[op].name);
   return false;
-- 
2.14.3
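
A note on the addressing in this patch: the loop passes `16 * coffset` as the base and the component index `i` to loadFrom, which (per patch 16/38) adds `c * typeSizeof(ty)` on top. The sketch below only illustrates that arithmetic. `uniformByteOffset` is a hypothetical name, not a function from the series, and it assumes `coffset` counts 16-byte slots, as the multiplication in the patch suggests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: byte offset of component `c`
// of the uniform starting at 16-byte slot `slot`, where every component
// occupies `typeSize` bytes (4 for 32-bit types, 8 for 64-bit types).
static uint32_t uniformByteOffset(uint32_t slot, uint8_t c, uint8_t typeSize)
{
   assert(typeSize > 0);
   return 16 * slot + c * typeSize;
}
```

For example, the `.w` component of a 32-bit vec4 in slot 2 lands at byte 44, and the second component of a 64-bit value in slot 1 lands at byte 24.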



[Mesa-dev] [PATCH v4 16/38] nvir/nir: add loadFrom and storeTo helpers

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index f3cd22622d..6eaabef6cb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -72,6 +72,8 @@ public:
bool assignSlots();
 
uint32_t getSlotAddress(bool input, uint8_t idx, uint8_t slot, DataType ty 
= TYPE_U32, uint8_t f = 4);
+   Instruction *loadFrom(DataFile, uint8_t, DataType, Value *def, uint32_t 
base, uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr, bool 
patch = false);
+   void storeTo(DataFile, operation, DataType, Value *src, uint8_t idx, 
uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr);
 
bool run();
 
@@ -876,6 +878,51 @@ Converter::getSlotAddress(bool input, uint8_t idx, uint8_t 
slot, DataType ty, ui
return vary[idx].slot[slot] * f;
 }
 
+Instruction *
+Converter::loadFrom(DataFile file, uint8_t i, DataType ty, Value *def, 
uint32_t base, uint8_t c, Value *indirect0, Value *indirect1, bool patch)
+{
+   if (typeSizeof(ty) == 8 && (file == FILE_MEMORY_CONST || file == 
FILE_MEMORY_BUFFER || indirect0)) {
+  Value *lo = getSSA();
+  Value *hi = getSSA();
+  Instruction *loi = mkLoad(TYPE_U32, lo, mkSymbol(file, i, TYPE_U32, base 
+ c * typeSizeof(ty)), indirect0);
+  loi->setIndirect(0, 1, indirect1);
+  loi->perPatch = patch;
+  Instruction *hii = mkLoad(TYPE_U32, hi, mkSymbol(file, i, TYPE_U32, base 
+ c * typeSizeof(ty) + 4), indirect0);
+  hii->setIndirect(0, 1, indirect1);
+  hii->perPatch = patch;
+  return mkOp2(OP_MERGE, ty, def, lo, hi);
+   } else {
+  Instruction *ld = mkLoad(ty, def, mkSymbol(file, i, ty, base + c * 
typeSizeof(ty)), indirect0);
+  ld->setIndirect(0, 1, indirect1);
+  ld->perPatch = patch;
+  return ld;
+   }
+}
+
+void
+Converter::storeTo(DataFile file, operation op, DataType ty, Value *src, 
uint8_t idx, uint8_t c, Value *indirect0, Value *indirect1)
+{
+   uint8_t size = typeSizeof(ty);
+   uint32_t address = getSlotAddress(false, idx, c, ty, 4);
+
+   if (size == 8 && indirect0) {
+  Value *split[2];
+  mkSplit(split, 4, src);
+
+  if (op == OP_EXPORT) {
+ split[0] = mkMov(getSSA(), split[0], ty)->getDef(0);
+ split[1] = mkMov(getSSA(), split[1], ty)->getDef(0);
+  }
+
+  mkStore(op, TYPE_U32, mkSymbol(file, 0, TYPE_U32, address), indirect0, 
split[0])->perPatch = info->out[idx].patch;
+  mkStore(op, TYPE_U32, mkSymbol(file, 0, TYPE_U32, address + 4), 
indirect0, split[1])->perPatch = info->out[idx].patch;
+   } else {
+  if (op == OP_EXPORT)
+ src = mkMov(getSSA(size), src, ty)->getDef(0);
+  mkStore(op, ty, mkSymbol(file, 0, ty, address), indirect0, 
src)->perPatch = info->out[idx].patch;
+   }
+}
+
 bool
 Converter::run()
 {
-- 
2.14.3
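
To illustrate the 64-bit path of loadFrom above: the value is fetched as two 32-bit words, one at the computed offset and one 4 bytes later, and the halves are then merged with the lower-address word as the low half. The following is a plain C++ sketch of that idea on a byte buffer. `load32` and `load64Split` are hypothetical stand-ins for mkLoad and OP_MERGE, not codegen API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a single 32-bit mkLoad: read one word at a byte offset.
static uint32_t load32(const uint8_t *mem, uint32_t offset)
{
   assert(mem != nullptr);
   uint32_t v;
   std::memcpy(&v, mem + offset, sizeof(v));
   return v;
}

// Load a 64-bit value as two 32-bit words at `offset` and `offset + 4`,
// then merge them (stand-in for OP_MERGE). The word at the lower address
// becomes the low half of the result.
static uint64_t load64Split(const uint8_t *mem, uint32_t offset)
{
   const uint32_t lo = load32(mem, offset);
   const uint32_t hi = load32(mem, offset + 4);
   return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

storeTo does the reverse for indirect 64-bit stores: it splits the source into two 32-bit halves and writes them at `address` and `address + 4`.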



[Mesa-dev] [PATCH v4 17/38] nvir/nir: parse NIR shader info

2018-01-09 Thread Karol Herbst
v2: parse a few more fields
v3: add special handling for GL_ISOLINES

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 58 ++
 1 file changed, 58 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 6eaabef6cb..38f03523f3 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -70,6 +70,7 @@ public:
bool centroid,
unsigned semantics);
bool assignSlots();
+   bool parseNIR();
 
uint32_t getSlotAddress(bool input, uint8_t idx, uint8_t slot, DataType ty 
= TYPE_U32, uint8_t f = 4);
Instruction *loadFrom(DataFile, uint8_t, DataType, Value *def, uint32_t 
base, uint8_t c, Value *indirect0 = nullptr, Value *indirect1 = nullptr, bool 
patch = false);
@@ -923,6 +924,58 @@ Converter::storeTo(DataFile file, operation op, DataType 
ty, Value *src, uint8_t
}
 }
 
+bool
+Converter::parseNIR()
+{
+   info->io.clipDistances = nir->info.clip_distance_array_size;
+   info->io.cullDistances = nir->info.cull_distance_array_size;
+
+   switch(prog->getType()) {
+   case Program::TYPE_COMPUTE:
+  info->prop.cp.numThreads[0] = nir->info.cs.local_size[0];
+  info->prop.cp.numThreads[1] = nir->info.cs.local_size[1];
+  info->prop.cp.numThreads[2] = nir->info.cs.local_size[2];
+  info->bin.smemSize = nir->info.cs.shared_size;
+  break;
+   case Program::TYPE_FRAGMENT:
+  info->prop.fp.earlyFragTests = nir->info.fs.early_fragment_tests;
+  info->prop.fp.persampleInvocation =
+ (nir->info.system_values_read & SYSTEM_BIT_SAMPLE_ID) ||
+ (nir->info.system_values_read & SYSTEM_BIT_SAMPLE_POS);
+  info->prop.fp.postDepthCoverage = nir->info.fs.post_depth_coverage;
+  info->prop.fp.usesDiscard = nir->info.fs.uses_discard;
+  info->prop.fp.usesSampleMaskIn = !!(nir->info.system_values_read & 
SYSTEM_BIT_SAMPLE_MASK_IN);
+  break;
+   case Program::TYPE_GEOMETRY:
+  info->prop.gp.inputPrim = nir->info.gs.input_primitive;
+  info->prop.gp.instanceCount = nir->info.gs.invocations;
+  info->prop.gp.maxVertices = nir->info.gs.vertices_out;
+  info->prop.gp.outputPrim = nir->info.gs.output_primitive;
+  break;
+   case Program::TYPE_TESSELLATION_CONTROL:
+   case Program::TYPE_TESSELLATION_EVAL:
+  if (nir->info.tess.primitive_mode == GL_ISOLINES)
+ info->prop.tp.domain = GL_LINES;
+  else
+ info->prop.tp.domain = nir->info.tess.primitive_mode;
+  info->prop.tp.outputPatchSize = nir->info.tess.tcs_vertices_out;
+  info->prop.tp.outputPrim = nir->info.tess.point_mode ? PIPE_PRIM_POINTS 
: PIPE_PRIM_TRIANGLES;
+  info->prop.tp.partitioning = (nir->info.tess.spacing + 1) % 3;
+  info->prop.tp.winding = !nir->info.tess.ccw;
+  break;
+   case Program::TYPE_VERTEX:
+  info->prop.vp.usesDrawParameters =
+ (nir->info.system_values_read & 
BITFIELD64_BIT(SYSTEM_VALUE_BASE_VERTEX)) ||
+ (nir->info.system_values_read & 
BITFIELD64_BIT(SYSTEM_VALUE_BASE_INSTANCE)) ||
+ (nir->info.system_values_read & BITFIELD64_BIT(SYSTEM_VALUE_DRAW_ID));
+  break;
+   default:
+  break;
+   }
+
+   return true;
+}
+
 bool
 Converter::run()
 {
@@ -960,6 +1013,11 @@ Converter::run()
if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
   nir_print_shader(nir, stderr);
 
+   if (!parseNIR()) {
+  ERROR("Couldn't parse NIR!\n");
+  return false;
+   }
+
if (!assignSlots()) {
   ERROR("Couldn't assign slots!\n");
   return false;
-- 
2.14.3
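
The `(nir->info.tess.spacing + 1) % 3` expression in parseNIR is compact enough to deserve a worked example. It appears to rotate the NIR spacing enum into the order Gallium uses. The sketch below uses local stand-in enums; their numeric values are my assumption about the NIR and Gallium definitions and should be checked against the headers before relying on them.

```cpp
#include <cassert>

// Local stand-ins. The values are assumptions, not copied from Mesa headers.
enum NirSpacing {
   NIR_SPACING_UNSPECIFIED = 0,
   NIR_SPACING_EQUAL = 1,
   NIR_SPACING_FRACTIONAL_ODD = 2,
   NIR_SPACING_FRACTIONAL_EVEN = 3,
};

enum PipeSpacing {
   PIPE_SPACING_FRACTIONAL_ODD = 0,
   PIPE_SPACING_FRACTIONAL_EVEN = 1,
   PIPE_SPACING_EQUAL = 2,
};

// Same arithmetic as the patch: shift by one and wrap modulo three.
static int toPipeSpacing(int spacing)
{
   assert(spacing >= 0 && spacing <= 3);
   return (spacing + 1) % 3;
}
```

Under these assumed values, EQUAL, FRACTIONAL_ODD and FRACTIONAL_EVEN each map to their counterpart, while an unspecified spacing (0) ends up as 1, i.e. fractional even.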



[Mesa-dev] [PATCH v4 19/38] nvir/nir: implement nir_load_const_instr

2018-01-09 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 20 
 1 file changed, 20 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 66ec4460d9..7010e4e468 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -84,6 +84,7 @@ public:
bool visit(nir_if *);
bool visit(nir_instr *);
bool visit(nir_jump_instr *);
+   bool visit(nir_load_const_instr*);
bool visit(nir_loop *);
 
bool run();
@@ -1184,6 +1185,10 @@ bool
 Converter::visit(nir_instr *insn)
 {
switch (insn->type) {
+   case nir_instr_type_load_const:
+  if (!visit(nir_instr_as_load_const(insn)))
+ return false;
+  break;
case nir_instr_type_jump:
   if (!visit(nir_instr_as_jump(insn)))
  return false;
@@ -1195,6 +1200,21 @@ Converter::visit(nir_instr *insn)
return true;
 }
 
+bool
+Converter::visit(nir_load_const_instr *insn)
+{
+   assert(insn->def.bit_size <= 64);
+
+   LValues &newDefs = convert(&insn->def);
+   for (int i = 0; i < insn->def.num_components; i++) {
+  if (insn->def.bit_size > 32)
+ loadImm(newDefs[i], insn->value.u64[i]);
+  else
+ loadImm(newDefs[i], insn->value.u32[i]);
+   }
+   return true;
+}
+
 bool
 Converter::visit(nir_jump_instr *insn)
 {
-- 
2.14.3
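
For readers skimming the load_const visitor: the only decision it makes is which immediate width to emit, based on the bit size of the destination. A minimal sketch of that selection follows. `Imm`, `loadImm` and `constToImm` here are hypothetical stand-ins written for this example, not the BuildUtil functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical immediate: the raw bits plus the width in bytes.
struct Imm {
   uint64_t bits;
   uint8_t size;
};

// Two overloads, mirroring how the patch calls loadImm with u32 or u64 data.
static Imm loadImm(uint32_t v) { return { v, 4 }; }
static Imm loadImm(uint64_t v) { return { v, 8 }; }

// Pick the overload from the bit size, as the visitor does: anything wider
// than 32 bits becomes a 64-bit immediate, everything else a 32-bit one.
static Imm constToImm(uint64_t raw, unsigned bitSize)
{
   assert(bitSize <= 64);
   if (bitSize > 32)
      return loadImm(raw);
   return loadImm(static_cast<uint32_t>(raw));
}
```

Note that with this rule any bit size of 32 or below takes the 32-bit path.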



[Mesa-dev] [PATCH v4 09/38] nvc0: add support for NIR

2018-01-09 Thread Karol Herbst
Not all of those NIR options are actually required; they just made the work a
little easier.

v2: fix asserts
parse compute shaders
don't lower bitfield_insert
v3: fix memory leak
v4: don't lower fmod32

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/Makefile.sources   |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  3 +
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  1 +
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 69 ++
 src/gallium/drivers/nouveau/meson.build| 10 ++--
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 18 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 40 -
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c  | 27 -
 8 files changed, 160 insertions(+), 9 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp

diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
b/src/gallium/drivers/nouveau/Makefile.sources
index ec344c6316..c6a1aff711 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -117,6 +117,7 @@ NV50_CODEGEN_SOURCES := \
codegen/nv50_ir_emit_nv50.cpp \
codegen/nv50_ir_from_common.cpp \
codegen/nv50_ir_from_common.h \
+   codegen/nv50_ir_from_nir.cpp \
codegen/nv50_ir_from_tgsi.cpp \
codegen/nv50_ir_graph.cpp \
codegen/nv50_ir_graph.h \
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index 6f12df70a1..b95ba8e4e9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -1231,6 +1231,9 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
prog->optLevel = info->optLevel;
 
switch (info->bin.sourceRep) {
+   case PIPE_SHADER_IR_NIR:
+  ret = prog->makeFromNIR(info) ? 0 : -2;
+  break;
case PIPE_SHADER_IR_TGSI:
   ret = prog->makeFromTGSI(info) ? 0 : -2;
   break;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index f4f3c70888..e5b4592a61 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -1255,6 +1255,7 @@ public:
inline void del(Function *fn, int& id) { allFuncs.remove(id); }
inline void add(Value *rval, int& id) { allRValues.insert(rval, id); }
 
+   bool makeFromNIR(struct nv50_ir_prog_info *);
bool makeFromTGSI(struct nv50_ir_prog_info *);
bool convertToSSA();
bool optimizeSSA(int level);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
new file mode 100644
index 0000000000..6bccd14bce
--- /dev/null
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -0,0 +1,69 @@
+/*
+ * Copyright 2017 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Karol Herbst 
+ */
+
+#include "compiler/nir/nir.h"
+
+#include "codegen/nv50_ir.h"
+#include "codegen/nv50_ir_from_common.h"
+#include "codegen/nv50_ir_util.h"
+
+namespace {
+
+using namespace nv50_ir;
+
+class Converter : public ConverterCommon
+{
+public:
+   Converter(Program *, nir_shader *, nv50_ir_prog_info *);
+
+   bool run();
+private:
+   nir_shader *nir;
+};
+
+Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
+   : ConverterCommon(prog, info),
+ nir(nir) {}
+
+bool
+Converter::run()
+{
+   return false;
+}
+
+} // unnamed namespace
+
+namespace nv50_ir {
+
+bool
+Program::makeFromNIR(struct nv50_ir_prog_info *info)
+{
+   nir_shader *nir = (nir_shader*)info->bin.source;
+   Converter converter(this, nir, info);
+   bool result = converter.run();
+   tlsSize = info->bin.tlsSpace;
+   return result;
+}
+
+} // namespace nv50_ir
diff --git 

[Mesa-dev] [PATCH v4 15/38] nvir/nir: run assignSlots

2018-01-09 Thread Karol Herbst
v2: add support for geometry shaders
set idx
add some missing mappings
fix for 64bit inputs/outputs
fix up some FP color output index messup
parse centroid flag
v3: fix arrays in outputs as well
fix input/output size calculation for tessellation shaders
v4: add getSlotAddress helper
fix for 64 bit typed inputs

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 575 +
 1 file changed, 575 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index e95b97af19..f3cd22622d 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -65,6 +65,14 @@ public:
uint32_t getIndirect(nir_src *, uint8_t, Value**);
uint32_t getIndirect(nir_intrinsic_instr *, uint8_t s, uint8_t c, uint8_t 
f, Value**);
 
+   void setInterpolate(nv50_ir_varying *,
+   decltype(nir_variable().data.interpolation),
+   bool centroid,
+   unsigned semantics);
+   bool assignSlots();
+
+   uint32_t getSlotAddress(bool input, uint8_t idx, uint8_t slot, DataType ty 
= TYPE_U32, uint8_t f = 4);
+
bool run();
 
bool isFloatType(nir_alu_type);
@@ -306,6 +314,568 @@ Converter::getIndirect(nir_intrinsic_instr *insn, uint8_t 
s, uint8_t c, uint8_t
return idx;
 }
 
+static void
+vert_attrib_to_tgsi_semantic(unsigned slot, unsigned *name, unsigned *index)
+{
+   if (slot >= VERT_ATTRIB_GENERIC0) {
+  *name = TGSI_SEMANTIC_GENERIC;
+  *index = slot - VERT_ATTRIB_GENERIC0;
+  return;
+   }
+
+   if (slot == VERT_ATTRIB_POINT_SIZE) {
+  ERROR("unknown vert attrib slot %u\n", slot);
+  assert(false);
+  return;
+   }
+
+   if (slot >= VERT_ATTRIB_TEX0) {
+  *name = TGSI_SEMANTIC_TEXCOORD;
+  *index = slot - VERT_ATTRIB_TEX0;
+  return;
+   }
+
+   switch (slot) {
+   case VERT_ATTRIB_COLOR0:
+  *name = TGSI_SEMANTIC_COLOR;
+  *index = 0;
+  break;
+   case VERT_ATTRIB_COLOR1:
+  *name = TGSI_SEMANTIC_COLOR;
+  *index = 1;
+  break;
+   case VERT_ATTRIB_EDGEFLAG:
+  *name = TGSI_SEMANTIC_EDGEFLAG;
+  *index = 0;
+  break;
+   case VERT_ATTRIB_FOG:
+  *name = TGSI_SEMANTIC_FOG;
+  *index = 0;
+  break;
+   case VERT_ATTRIB_NORMAL:
+  *name = TGSI_SEMANTIC_NORMAL;
+  *index = 0;
+  break;
+   case VERT_ATTRIB_POS:
+  *name = TGSI_SEMANTIC_POSITION;
+  *index = 0;
+  break;
+   default:
+  ERROR("unknown vert attrib slot %u\n", slot);
+  assert(false);
+   }
+}
+
+static void
+varying_slot_to_tgsi_semantic(unsigned slot, unsigned *name, unsigned *index)
+{
+   if (slot >= VARYING_SLOT_PATCH0) {
+  *name = TGSI_SEMANTIC_PATCH;
+  *index = slot - VARYING_SLOT_PATCH0;
+  return;
+   }
+
+   if (slot >= VARYING_SLOT_VAR0) {
+  *name = TGSI_SEMANTIC_GENERIC;
+  *index = slot - VARYING_SLOT_VAR0;
+  return;
+   }
+
+   if (slot >= VARYING_SLOT_TEX0 && slot <= VARYING_SLOT_TEX7) {
+  *name = TGSI_SEMANTIC_TEXCOORD;
+  *index = slot - VARYING_SLOT_TEX0;
+  return;
+   }
+
+   switch (slot) {
+   case VARYING_SLOT_BFC0:
+  *name = TGSI_SEMANTIC_BCOLOR;
+  *index = 0;
+  break;
+   case VARYING_SLOT_BFC1:
+  *name = TGSI_SEMANTIC_BCOLOR;
+  *index = 1;
+  break;
+   case VARYING_SLOT_CLIP_DIST0:
+  *name = TGSI_SEMANTIC_CLIPDIST;
+  *index = 0;
+  break;
+   case VARYING_SLOT_CLIP_DIST1:
+  *name = TGSI_SEMANTIC_CLIPDIST;
+  *index = 1;
+  break;
+   case VARYING_SLOT_CLIP_VERTEX:
+  *name = TGSI_SEMANTIC_CLIPVERTEX;
+  *index = 0;
+  break;
+   case VARYING_SLOT_COL0:
+  *name = TGSI_SEMANTIC_COLOR;
+  *index = 0;
+  break;
+   case VARYING_SLOT_COL1:
+  *name = TGSI_SEMANTIC_COLOR;
+  *index = 1;
+  break;
+   case VARYING_SLOT_EDGE:
+  *name = TGSI_SEMANTIC_EDGEFLAG;
+  *index = 0;
+  break;
+   case VARYING_SLOT_FACE:
+  *name = TGSI_SEMANTIC_FACE;
+  *index = 0;
+  break;
+   case VARYING_SLOT_FOGC:
+  *name = TGSI_SEMANTIC_FOG;
+  *index = 0;
+  break;
+   case VARYING_SLOT_LAYER:
+  *name = TGSI_SEMANTIC_LAYER;
+  *index = 0;
+  break;
+   case VARYING_SLOT_PNTC:
+  *name = TGSI_SEMANTIC_PCOORD;
+  *index = 0;
+  break;
+   case VARYING_SLOT_POS:
+  *name = TGSI_SEMANTIC_POSITION;
+  *index = 0;
+  break;
+   case VARYING_SLOT_PRIMITIVE_ID:
+  *name = TGSI_SEMANTIC_PRIMID;
+  *index = 0;
+  break;
+   case VARYING_SLOT_PSIZ:
+  *name = TGSI_SEMANTIC_PSIZE;
+  *index = 0;
+  break;
+   case VARYING_SLOT_TESS_LEVEL_INNER:
+  *name = TGSI_SEMANTIC_TESSINNER;
+  *index = 0;
+  break;
+   case VARYING_SLOT_TESS_LEVEL_OUTER:
+  *name = TGSI_SEMANTIC_TESSOUTER;
+  *index = 0;
+ 

[Mesa-dev] [PATCH v4 08/38] nvir: add lowering helper

2018-01-09 Thread Karol Herbst
this is mostly useful for lazy IR converters not wanting to deal with 64 bit
lowering and other illegal stuff

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   2 +
 .../nouveau/codegen/nv50_ir_lowering_helper.cpp| 250 +
 .../nouveau/codegen/nv50_ir_lowering_helper.h  |  52 +
 src/gallium/drivers/nouveau/meson.build|   2 +
 4 files changed, 306 insertions(+)
 create mode 100644 
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
 create mode 100644 
src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
b/src/gallium/drivers/nouveau/Makefile.sources
index fee5e59522..ec344c6316 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -122,6 +122,8 @@ NV50_CODEGEN_SOURCES := \
codegen/nv50_ir_graph.h \
codegen/nv50_ir.h \
codegen/nv50_ir_inlines.h \
+   codegen/nv50_ir_lowering_helper.cpp \
+   codegen/nv50_ir_lowering_helper.h \
codegen/nv50_ir_lowering_nv50.cpp \
codegen/nv50_ir_peephole.cpp \
codegen/nv50_ir_print.cpp \
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
new file mode 100644
index 0000000000..0f6daba953
--- /dev/null
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
@@ -0,0 +1,250 @@
+/*
+ * Copyright 2018 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Karol Herbst 
+ */
+
+#include "codegen/nv50_ir_lowering_helper.h"
+
+namespace nv50_ir {
+
+bool
+LoweringHelper::visit(Instruction *insn)
+{
+   switch (insn->op) {
+   case OP_ABS:
+  return handleABS(insn);
+   case OP_CVT:
+  return handleCVT(insn);
+   case OP_MAX:
+   case OP_MIN:
+  return handleMAXMIN(insn);
+   case OP_MOV:
+  return handleMOV(insn);
+   case OP_NEG:
+  return handleNEG(insn);
+   case OP_SLCT:
+  return handleSLCT(insn->asCmp());
+
+   case OP_AND:
+   case OP_OR:
+   case OP_XOR:
+  return handleLogOp(insn);
+
+   default:
+  return true;
+   }
+}
+
+bool
+LoweringHelper::handleABS(Instruction *insn)
+{
+   DataType dt = insn->dType;
+   if (!(dt == TYPE_U64 || dt == TYPE_S64))
+  return true;
+
+   bld.setPosition(insn, false);
+
+   Value *neg = bld.getSSA(8);
+   Value *negComp[2], *srcComp[2];
+   Value *lo = bld.getSSA(), *hi = bld.getSSA();
+   bld.mkOp2(OP_SUB, dt, neg, bld.mkImm((uint64_t)0), insn->getSrc(0));
+   bld.mkSplit(negComp, 4, neg);
+   bld.mkSplit(srcComp, 4, insn->getSrc(0));
+   bld.mkCmp(OP_SLCT, CC_LT, TYPE_S32, lo, TYPE_S32, negComp[0], srcComp[0], 
srcComp[1]);
+   bld.mkCmp(OP_SLCT, CC_LT, TYPE_S32, hi, TYPE_S32, negComp[1], srcComp[1], 
srcComp[1]);
+   insn->op = OP_MERGE;
+   insn->setSrc(0, lo);
+   insn->setSrc(1, hi);
+
+   return true;
+}
+
+bool
+LoweringHelper::handleCVT(Instruction *insn)
+{
+   DataType dt = insn->dType;
+   DataType st = insn->sType;
+
+   if (typeSizeof(dt) <= 4 && typeSizeof(st) <= 4)
+  return true;
+
+   bld.setPosition(insn, false);
+
+   if ((dt == TYPE_S32 && st == TYPE_S64) ||
+   (dt == TYPE_U32 && st == TYPE_U64)) {
+  Value *src[2];
+  bld.mkSplit(src, 4, insn->getSrc(0));
+  insn->op = OP_MOV;
+  insn->setSrc(0, src[0]);
+   } else if (dt == TYPE_S64 && st == TYPE_S32) {
+  Value *tmp = bld.getSSA();
+  bld.mkOp2(OP_SHR, TYPE_S32, tmp, insn->getSrc(0), 
bld.loadImm(bld.getSSA(), 31));
+  insn->op = OP_MERGE;
+  insn->setSrc(1, tmp);
+   } else if (dt == TYPE_U64 && st == TYPE_U32) {
+  insn->op = OP_MERGE;
+  insn->setSrc(1, bld.loadImm(bld.getSSA(), 0));
+   }
+
+   return true;
+}
+
+bool
+LoweringHelper::handleMAXMIN(Instruction *insn)
+{
+   DataType dt = insn->dType;
+   if (!(dt 

[Mesa-dev] [PATCH v4 10/38] nvir/nir: use lowering helper

2018-01-09 Thread Karol Herbst
this helps with a bunch of piglit tests testing 64 bit types

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 6bccd14bce..73527d4800 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -26,6 +26,7 @@
 
 #include "codegen/nv50_ir.h"
 #include "codegen/nv50_ir_from_common.h"
+#include "codegen/nv50_ir_lowering_helper.h"
 #include "codegen/nv50_ir_util.h"
 
 namespace {
@@ -62,6 +63,10 @@ Program::makeFromNIR(struct nv50_ir_prog_info *info)
nir_shader *nir = (nir_shader*)info->bin.source;
Converter converter(this, nir, info);
bool result = converter.run();
+   if (!result)
+  return result;
+   LoweringHelper lowering;
+   lowering.run(this);
tlsSize = info->bin.tlsSpace;
return result;
 }
-- 
2.14.3
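
As background for why the lowering pass helps with 64 bit types: the helper from patch 08/38 rewrites 64-bit operations in terms of 32-bit halves. For conversions, S32 to S64 builds the high word by shifting the source right by 31 (replicating the sign bit), and U32 to U64 uses a zero high word, both followed by a merge. Below is a small host-side sketch of that idea. The function names are hypothetical and the code is not taken from the series.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for OP_MERGE: combine a low and a high 32-bit word.
static uint64_t merge(uint32_t lo, uint32_t hi)
{
   return (static_cast<uint64_t>(hi) << 32) | lo;
}

// S32 -> S64: the high word is the sign bit replicated 32 times. This is
// what an arithmetic shift right by 31 yields; it is written as a
// conditional here to stay portable across C++ standards.
static uint64_t extendS32(int32_t v)
{
   const uint32_t hi = v < 0 ? 0xffffffffu : 0u;
   // Sanity check: the high word is zero exactly for non-negative inputs.
   assert((hi == 0) == (v >= 0));
   return merge(static_cast<uint32_t>(v), hi);
}

// U32 -> U64: the high word is simply zero.
static uint64_t extendU32(uint32_t v)
{
   return merge(v, 0);
}
```

The results are returned as raw 64-bit patterns so they can be compared directly against the two's complement encoding of the expected value.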



[Mesa-dev] [PATCH v4 13/38] nvir/nir: track defs and provide easy access functions

2018-01-09 Thread Karol Herbst
v2: add helper function for indirects
v4: add new getIndirect overload for easier use

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 136 +
 1 file changed, 136 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 148db464bd..2ba51d2b63 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -29,6 +29,9 @@
 #include "codegen/nv50_ir_lowering_helper.h"
 #include "codegen/nv50_ir_util.h"
 
+#include <unordered_map>
+#include <vector>
+
 static int
 type_size(const struct glsl_type *type)
 {
@@ -42,17 +45,150 @@ using namespace nv50_ir;
 class Converter : public ConverterCommon
 {
 public:
+   typedef std::vector<LValue*> LValues;
+   typedef decltype(nir_ssa_def().index) NirSSADefIdx;
+   typedef std::unordered_map<NirSSADefIdx, LValues> NirDefMap;
+
Converter(Program *, nir_shader *, nv50_ir_prog_info *);
 
+   LValues& convert(nir_alu_dest *);
+   LValues& convert(nir_dest *);
+   LValues& convert(nir_register *);
+   LValues& convert(nir_ssa_def *);
+
+   // nir_alu_src needs special handling due to neg and abs modifiers
+   Value* getSrc(nir_alu_src *, uint8_t component = 0);
+   Value* getSrc(nir_register *, uint8_t);
+   Value* getSrc(nir_src *, uint8_t, bool indirect = false);
+   Value* getSrc(nir_ssa_def *, uint8_t);
+   uint32_t getIndirect(nir_src *, uint8_t, Value**);
+   uint32_t getIndirect(nir_intrinsic_instr *, uint8_t s, uint8_t c, uint8_t f, Value**);
+
bool run();
 private:
nir_shader *nir;
+
+   NirDefMap ssaDefs;
+   NirDefMap regDefs;
 };
 
 Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
: ConverterCommon(prog, info),
  nir(nir) {}
 
+Converter::LValues&
+Converter::convert(nir_dest *dest)
+{
+   if (dest->is_ssa)
+  return convert(&dest->ssa);
+   if (dest->reg.indirect) {
+  ERROR("no support for indirects.");
+  assert(false);
+   }
+   return convert(dest->reg.reg);
+}
+
+Converter::LValues&
+Converter::convert(nir_register *reg)
+{
+   NirDefMap::iterator it = regDefs.find(reg->index);
+   if (it != regDefs.end())
+  return (*it).second;
+
+   LValues newDef(reg->num_components);
+   for (auto i = 0u; i < reg->num_components; i++)
+  newDef[i] = getScratch(reg->bit_size / 8);
+   return regDefs[reg->index] = newDef;
+}
+
+Converter::LValues&
+Converter::convert(nir_ssa_def *def)
+{
+   NirDefMap::iterator it = ssaDefs.find(def->index);
+   if (it != ssaDefs.end())
+  return (*it).second;
+
+   LValues newDef(def->num_components);
+   for (auto i = 0; i < def->num_components; i++)
+  newDef[i] = getScratch(def->bit_size / 8);
+   return ssaDefs[def->index] = newDef;
+}
+
+Value*
+Converter::getSrc(nir_alu_src *src, uint8_t component)
+{
+   if (src->abs || src->negate) {
+  ERROR("modifiers currently not supported on nir_alu_src\n");
+  assert(false);
+   }
+   return getSrc(&src->src, src->swizzle[component]);
+}
+
+Value*
+Converter::getSrc(nir_register *reg, uint8_t idx)
+{
+   NirDefMap::iterator it = regDefs.find(reg->index);
+   if (it == regDefs.end()) {
+  ERROR("Register %u not found\n", reg->index);
+  assert(false);
+  return nullptr;
+   }
+   return (*it).second[idx];
+}
+
+Value*
+Converter::getSrc(nir_src *src, uint8_t idx, bool indirect)
+{
+   if (src->is_ssa)
+  return getSrc(src->ssa, idx);
+
+   if (src->reg.indirect) {
+  if (indirect)
+ return getSrc(src->reg.indirect, idx);
+  ERROR("no support for indirects.");
+  assert(false);
+  return nullptr;
+   }
+
+   return getSrc(src->reg.reg, idx);
+}
+
+Value*
+Converter::getSrc(nir_ssa_def *src, uint8_t idx)
+{
+   NirDefMap::iterator it = ssaDefs.find(src->index);
+   if (it == ssaDefs.end()) {
+  ERROR("SSA value %u not found\n", src->index);
+  assert(false);
+  return nullptr;
+   }
+   return (*it).second[idx];
+}
+
+uint32_t
+Converter::getIndirect(nir_src *src, uint8_t idx, Value **indirect)
+{
+   nir_const_value *offset = nir_src_as_const_value(*src);
+
+   if (offset) {
+  *indirect = nullptr;
+  return offset->u32[0];
+   }
+
+   *indirect = getSrc(src, idx, true);
+   return 0;
+}
+
+uint32_t
+Converter::getIndirect(nir_intrinsic_instr *insn, uint8_t s, uint8_t c, uint8_t f, Value **indirect)
+{
+   auto idx = nir_intrinsic_base(insn) + getIndirect(&insn->src[s], c, indirect);
+   if (f != 1 && *indirect) {
+  *indirect = mkOp2v(OP_MUL, TYPE_U32, getSSA(4, FILE_ADDRESS), *indirect, loadImm(nullptr, f));
+   }
+   return idx;
+}
+
 bool
 Converter::run()
 {
-- 
2.14.3
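
For readers following the series: the def tracking in this patch amounts to a memoized lookup. The first convert() for an index allocates one value per component, and later calls hand back the same storage. Below is a minimal sketch of that idea, assuming toy types; DefTracker and ToyValue are hypothetical names and not part of the patch.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an nv50_ir value; only records its size in bytes.
struct ToyValue {
   unsigned bytes;
};

using ToyValues = std::vector<ToyValue>;

// Hypothetical tracker: maps a def index to its per-component values.
class DefTracker
{
public:
   // Returns the values for 'index', creating them on first use.
   ToyValues &convert(unsigned index, unsigned numComponents, unsigned bitSize)
   {
      auto it = defs.find(index);
      if (it != defs.end())
         return it->second;

      ToyValues fresh(numComponents, ToyValue{bitSize / 8});
      return defs[index] = fresh;
   }

   // Returns one component of an already converted def, or nullptr.
   const ToyValue *getSrc(unsigned index, unsigned component) const
   {
      auto it = defs.find(index);
      if (it == defs.end() || component >= it->second.size())
         return nullptr;
      return &it->second[component];
   }

private:
   std::unordered_map<unsigned, ToyValues> defs;
};
```

The patch keeps two such maps (ssaDefs and regDefs), so SSA indices and register indices cannot collide.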



[Mesa-dev] [PATCH v4 14/38] nvir/nir: add nir type helper functions

2018-01-09 Thread Karol Herbst
v4: treat imul as unsigned

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 117 +
 1 file changed, 117 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 2ba51d2b63..e95b97af19 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -47,6 +47,7 @@ class Converter : public ConverterCommon
 public:
typedef std::vector<LValue*> LValues;
typedef decltype(nir_ssa_def().index) NirSSADefIdx;
+   typedef decltype(nir_ssa_def().bit_size) NirSSADefBitSize;
typedef std::unordered_map<NirSSADefIdx, LValues> NirDefMap;
 
Converter(Program *, nir_shader *, nv50_ir_prog_info *);
@@ -65,6 +66,17 @@ public:
uint32_t getIndirect(nir_intrinsic_instr *, uint8_t s, uint8_t c, uint8_t f, Value**);
 
bool run();
+
+   bool isFloatType(nir_alu_type);
+   bool isSignedType(nir_alu_type);
+   bool isResultFloat(nir_op);
+   bool isResultSigned(nir_op);
+   DataType getDType(nir_alu_instr*);
+   DataType getDType(nir_intrinsic_instr*);
+   DataType getDType(nir_op, NirSSADefBitSize);
+   std::vector<DataType> getSTypes(nir_alu_instr*);
+   DataType getSType(nir_src&, bool isFloat, bool isSigned);
+
 private:
nir_shader *nir;
 
@@ -76,6 +88,111 @@ Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
: ConverterCommon(prog, info),
  nir(nir) {}
 
+bool
+Converter::isFloatType(nir_alu_type type)
+{
+   return !!(nir_alu_type_get_base_type(type) == nir_type_float);
+}
+
+bool
+Converter::isSignedType(nir_alu_type type)
+{
+   return !!(nir_alu_type_get_base_type(type) == nir_type_int);
+}
+
+bool
+Converter::isResultFloat(nir_op op)
+{
+   const nir_op_info &info = nir_op_infos[op];
+   if (info.output_type != nir_type_invalid)
+  return isFloatType(info.output_type);
+
+   switch (op) {
+   default:
+  ERROR("isResultFloat not implemented for %s\n", nir_op_infos[op].name);
+  assert(false);
+  return true;
+   }
+}
+
+bool
+Converter::isResultSigned(nir_op op)
+{
+   switch (op) {
+   case nir_op_imul:
+  return false;
+   default:
+  const nir_op_info &info = nir_op_infos[op];
+  if (info.output_type != nir_type_invalid)
+ return isSignedType(info.output_type);
+  ERROR("isResultSigned not implemented for %s\n", nir_op_infos[op].name);
+  assert(false);
+  return true;
+   }
+}
+
+DataType
+Converter::getDType(nir_alu_instr *insn)
+{
+   if (insn->dest.dest.is_ssa)
+  return getDType(insn->op, insn->dest.dest.ssa.bit_size);
+   else
+  return getDType(insn->op, insn->dest.dest.reg.reg->bit_size);
+}
+
+DataType
+Converter::getDType(nir_intrinsic_instr *insn)
+{
+   if (insn->dest.is_ssa)
+  return typeOfSize(insn->dest.ssa.bit_size / 8, false, false);
+   else
+  return typeOfSize(insn->dest.reg.reg->bit_size / 8, false, false);
+}
+
+DataType
+Converter::getDType(nir_op op, Converter::NirSSADefBitSize bitSize)
+{
+   DataType ty = typeOfSize(bitSize / 8, isResultFloat(op), isResultSigned(op));
+   if (ty == TYPE_NONE) {
+  ERROR("couldn't get Type for op %s with bitSize %u\n", nir_op_infos[op].name, bitSize);
+  assert(false);
+   }
+   return ty;
+}
+
+std::vector<DataType>
+Converter::getSTypes(nir_alu_instr *insn)
+{
+   const nir_op_info &info = nir_op_infos[insn->op];
+   std::vector<DataType> res(info.num_inputs);
+
+   for (auto i = 0u; i < info.num_inputs; ++i) {
+  if (info.input_types[i] != nir_type_invalid)
+ res[i] = getSType(insn->src[i].src, isFloatType(info.input_types[i]), isSignedType(info.input_types[i]));
+  else switch (insn->op) {
+ default:
+ERROR("getSType not implemented for %s idx %u\n", info.name, i);
+assert(false);
+res[i] = TYPE_NONE;
+break;
+  }
+   }
+
+   return res;
+}
+
+DataType
+Converter::getSType(nir_src &src, bool isFloat, bool isSigned)
+{
+   NirSSADefBitSize bitSize;
+   if (src.is_ssa)
+  bitSize = src.ssa->bit_size;
+   else
+  bitSize = src.reg.reg->bit_size;
+
+   return typeOfSize(bitSize / 8, isFloat, isSigned);
+}
+
 Converter::LValues&
 Converter::convert(nir_dest *dest)
 {
-- 
2.14.3
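
The helpers in this patch all end up calling typeOfSize(bytes, isFloat, isSigned) with a byte count derived from the NIR bit size. The sketch below is a hypothetical stand-in that shows the kind of mapping assumed here; ToyType, toyTypeOfSize and toySourceType are illustrative names, and the real codegen enum has more cases than this.

```cpp
#include <cassert>

// Hypothetical, reduced set of data types for illustration only.
enum class ToyType { None, U8, S8, U16, S16, F16, U32, S32, F32, U64, S64, F64 };

// Picks a type from a size in bytes plus float/signed flags.
// Float wins over signed; unknown sizes yield ToyType::None.
ToyType toyTypeOfSize(unsigned bytes, bool isFloat, bool isSigned)
{
   switch (bytes) {
   case 1:
      return isSigned ? ToyType::S8 : ToyType::U8;
   case 2:
      return isFloat ? ToyType::F16 : (isSigned ? ToyType::S16 : ToyType::U16);
   case 4:
      return isFloat ? ToyType::F32 : (isSigned ? ToyType::S32 : ToyType::U32);
   case 8:
      return isFloat ? ToyType::F64 : (isSigned ? ToyType::S64 : ToyType::U64);
   default:
      return ToyType::None;
   }
}

// Mirrors the "bitSize / 8" step used by getSType() in the patch.
ToyType toySourceType(unsigned bitSize, bool isFloat, bool isSigned)
{
   return toyTypeOfSize(bitSize / 8, isFloat, isSigned);
}
```

Under this mapping, the v4 change (treating imul as unsigned) corresponds to calling the helper with isSigned set to false for a 32-bit integer result.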



[Mesa-dev] [PATCH v4 11/38] nvc0/debug: add env var to make nir default

2018-01-09 Thread Karol Herbst
v2: allow for non debug builds as well
v3: read the env var in a more global place
    disable tg4 with multiple offsets with nir
    disable caps for 64 bit types

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/nouveau_screen.c   |  4 
 src/gallium/drivers/nouveau/nouveau_screen.h   |  2 ++
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 12 
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nouveau_screen.c b/src/gallium/drivers/nouveau/nouveau_screen.c
index c144b39b2d..6c52f9e40c 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.c
+++ b/src/gallium/drivers/nouveau/nouveau_screen.c
@@ -175,6 +175,7 @@ nouveau_screen_init(struct nouveau_screen *screen, struct nouveau_device *dev)
void *data;
union nouveau_bo_config mm_config;
 
+   char *use_nir = getenv("NV50_PROG_USE_NIR");
char *nv_dbg = getenv("NOUVEAU_MESA_DEBUG");
if (nv_dbg)
   nouveau_mesa_debug = atoi(nv_dbg);
@@ -261,6 +262,9 @@ nouveau_screen_init(struct nouveau_screen *screen, struct nouveau_device *dev)
NOUVEAU_BO_GART | NOUVEAU_BO_MAP,
&mm_config);
screen->mm_VRAM = nouveau_mm_create(dev, NOUVEAU_BO_VRAM, &mm_config);
+
+   screen->prefer_nir = use_nir && strtol(use_nir, NULL, 0) == 1;
+
return 0;
 }
 
diff --git a/src/gallium/drivers/nouveau/nouveau_screen.h b/src/gallium/drivers/nouveau/nouveau_screen.h
index e4fbae99ca..1229b66b26 100644
--- a/src/gallium/drivers/nouveau/nouveau_screen.h
+++ b/src/gallium/drivers/nouveau/nouveau_screen.h
@@ -62,6 +62,8 @@ struct nouveau_screen {
 
struct disk_cache *disk_shader_cache;
 
+   bool prefer_nir;
+
 #ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS
union {
   uint64_t v[29];
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
index baf3af7346..bc0d053cc9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -112,7 +112,8 @@ static int
 nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
 {
const uint16_t class_3d = nouveau_screen(pscreen)->class_3d;
-   struct nouveau_device *dev = nouveau_screen(pscreen)->device;
+   const struct nouveau_screen *screen = nouveau_screen(pscreen);
+   struct nouveau_device *dev = screen->device;
 
switch (param) {
/* non-boolean caps */
@@ -219,7 +220,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_TEXTURE_QUERY_LOD:
case PIPE_CAP_SAMPLE_SHADING:
-   case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
case PIPE_CAP_TEXTURE_GATHER_SM5:
case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
@@ -259,6 +259,9 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
case PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX:
case PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION:
   return 1;
+   case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
+  /* TODO: nir doesn't support tg4 with multiple offsets */
+  return screen->prefer_nir ? 0 : 1;
case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
   return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ? 1 : 0;
case PIPE_CAP_TGSI_FS_FBFETCH:
@@ -340,7 +343,8 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen,
  enum pipe_shader_type shader,
  enum pipe_shader_cap param)
 {
-   const uint16_t class_3d = nouveau_screen(pscreen)->class_3d;
+   const struct nouveau_screen *screen = nouveau_screen(pscreen);
+   const uint16_t class_3d = screen->class_3d;
 
switch (shader) {
case PIPE_SHADER_VERTEX:
@@ -356,7 +360,7 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen,
 
switch (param) {
case PIPE_SHADER_CAP_PREFERRED_IR:
-  return PIPE_SHADER_IR_TGSI;
+  return screen->prefer_nir ? PIPE_SHADER_IR_NIR : PIPE_SHADER_IR_TGSI;
case PIPE_SHADER_CAP_SUPPORTED_IRS:
   return 1 << PIPE_SHADER_IR_TGSI |
  1 << PIPE_SHADER_IR_NIR;
-- 
2.14.3
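
A note on the env var semantics in this patch: prefer_nir is only set when strtol() parses the value of NV50_PROG_USE_NIR as exactly 1. The sketch below isolates that check so it can be tried outside the driver; preferNirFromValue and preferNirFromEnv are hypothetical helper names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdlib>

// Same predicate as in nouveau_screen_init(): the variable must be set
// and must parse (base 0, so decimal, octal or hex) to the value 1.
bool preferNirFromValue(const char *value)
{
   return value && std::strtol(value, nullptr, 0) == 1;
}

// Convenience wrapper reading the environment variable used by the patch.
bool preferNirFromEnv()
{
   return preferNirFromValue(std::getenv("NV50_PROG_USE_NIR"));
}
```

With this predicate, values such as "0", "2" or "yes" leave TGSI as the preferred IR, while "1" and "0x1" select NIR.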



[Mesa-dev] [PATCH v4 12/38] nvir/nir: run some passes to make the conversion easier

2018-01-09 Thread Karol Herbst
v2: add constant_folding

Signed-off-by: Karol Herbst 
---
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 40 ++
 1 file changed, 40 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
index 73527d4800..148db464bd 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
@@ -29,6 +29,12 @@
 #include "codegen/nv50_ir_lowering_helper.h"
 #include "codegen/nv50_ir_util.h"
 
+static int
+type_size(const struct glsl_type *type)
+{
+   return glsl_count_attribute_slots(type, false);
+}
+
 namespace {
 
 using namespace nv50_ir;
@@ -50,6 +56,40 @@ Converter::Converter(Program *prog, nir_shader *nir, nv50_ir_prog_info *info)
 bool
 Converter::run()
 {
+   bool progress;
+
+   if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
+  nir_print_shader(nir, stderr);
+
+   // converts intrinsic load_var to intrinsic load_uniform
+   NIR_PASS_V(nir, nir_lower_io, nir_var_all, type_size, (nir_lower_io_options)0);
+
+   NIR_PASS_V(nir, nir_lower_regs_to_ssa);
+   NIR_PASS_V(nir, nir_lower_load_const_to_scalar);
+
+   do {
+  progress = false;
+  // we need this to_ssa otherwise the later opts are less effective
+  NIR_PASS_V(nir, nir_lower_vars_to_ssa);
+  NIR_PASS(progress, nir, nir_lower_alu_to_scalar);
+  NIR_PASS(progress, nir, nir_lower_phis_to_scalar);
+  // some ops depend on having constant as sources, but those can also
+  // point to expressions made from constants like 0 + 1
+  NIR_PASS(progress, nir, nir_opt_constant_folding);
+  NIR_PASS(progress, nir, nir_copy_prop);
+  NIR_PASS(progress, nir, nir_opt_dce);
+  NIR_PASS(progress, nir, nir_opt_dead_cf);
+   } while (progress);
+
+   NIR_PASS_V(nir, nir_remove_dead_variables, nir_var_local);
+   NIR_PASS_V(nir, nir_convert_from_ssa, true);
+
+   /* Garbage collect dead instructions */
+   nir_sweep(nir);
+
+   if (prog->dbgFlags & NV50_IR_DEBUG_BASIC)
+  nir_print_shader(nir, stderr);
+
return false;
 }
 
-- 
2.14.3
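
The do/while in Converter::run() follows the usual run-until-no-progress pattern: every pass reports whether it changed anything, and the loop repeats while any pass did. Here is a minimal sketch of that pattern on a toy "shader" (a vector of ints); ToyShader, ToyPass, runUntilStable, dropZeros and foldOnes are all hypothetical and only stand in for the NIR passes.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using ToyShader = std::vector<int>;
using ToyPass = std::function<bool(ToyShader &)>;

// Runs all passes repeatedly until one full round makes no progress.
// Returns the number of rounds executed (including the final idle one).
int runUntilStable(ToyShader &shader, const std::vector<ToyPass> &passes)
{
   int rounds = 0;
   bool progress;
   do {
      progress = false;
      for (const ToyPass &pass : passes)
         progress |= pass(shader);
      ++rounds;
   } while (progress);
   return rounds;
}

// Toy "dead code elimination": removes all zeros.
bool dropZeros(ToyShader &shader)
{
   auto end = std::remove(shader.begin(), shader.end(), 0);
   bool changed = end != shader.end();
   shader.erase(end, shader.end());
   return changed;
}

// Toy "folding": turns every 1 into a 0, which dropZeros can then remove.
bool foldOnes(ToyShader &shader)
{
   bool changed = false;
   for (int &v : shader) {
      if (v == 1) {
         v = 0;
         changed = true;
      }
   }
   return changed;
}
```

The toy passes feed each other in the same way the comment in the patch describes for constant folding and the later opts, which is why a single round is not enough.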



[Mesa-dev] [PATCH v4 06/38] nvir: print the shader type when dumping headers

2018-01-09 Thread Karol Herbst
this makes debugging the shader header a little easier

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
index e6157f550d..fd65859516 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_program.c
@@ -541,6 +541,7 @@ nvc0_program_dump(struct nvc0_program *prog)
unsigned pos;
 
if (prog->type != PIPE_SHADER_COMPUTE) {
+  debug_printf("dumping HDR for type %i\n", prog->type);
   for (pos = 0; pos < ARRAY_SIZE(prog->hdr); ++pos)
  debug_printf("HDR[%02"PRIxPTR"] = 0x%08x\n",
   pos * sizeof(prog->hdr[0]), prog->hdr[pos]);
-- 
2.14.3



[Mesa-dev] [PATCH v4 00/38] Nir support for Nouveau

2018-01-09 Thread Karol Herbst
significant changes since the last series:
* fixing TF with GS for gallium nir drivers
* RA fix for 64 bit values and compounds
* completed support for 64 bit types
* random piglit fixes

Tested with unigine heaven/valley, gputest and RealisticRenderer

piglit run -x glx -x egl -x streaming-texture-leak -x max-texture-size tests/gpu.py:
[26075/26075] skip: 1576, pass: 24333, warn: 9, fail: 143, crash: 14

overview of fails:
* interpolateAt
* indirects in image_load/store
* shader_ballot fails in 'fs-ballot-if-else' and 'fs-builtin-variables'
* two tests fail for RA reasons

thanks to everybody involved in this :)

Connor Abbott (1):
  nv50/ir/ra: Fix copying compound for moves

Karol Herbst (33):
  nvir: print the shader type when dumping headers
  nvir: move common converter code in base class
  nvir: add lowering helper
  nvc0: add support for NIR
  nvir/nir: use lowering helper
  nvc0/debug: add env var to make nir default
  nvir/nir: run some passes to make the conversion easier
  nvir/nir: track defs and provide easy access functions
  nvir/nir: add nir type helper functions
  nvir/nir: run assignSlots
  nvir/nir: add loadFrom and storeTo helpler
  nvir/nir: parse NIR shader info
  nvir/nir: implement CFG handling
  nvir/nir: implement nir_load_const_instr
  nvir/nir: add skeleton for nir_intrinsic_instr
  nvir/nir: implement nir_alu_instr handling
  nvir/nir: implement nir_intrinsic_load_uniform
  nvir/nir: implement nir_intrinsic_store_(per_vertex_)output
  nvir/nir: implement nir_intrinsic_load_input
  nvir/nir: implement intrinsic_discard(_if)
  nvir/nir: implement loading system values
  nvir/nir: implement nir_ssa_undef_instr
  nvir/nir: implement nir_instr_type_tex
  nvir/nir: add getOperation for intrinsics
  nvir/nir: implement vote and ballot
  nvir/nir: implement variable indexing
  nvir/nir: implement geometry shader nir_intrinsics
  nvir/nir: implement nir_intrinsic_load_ubo
  nvir/nir: implement ssbo intrinsics
  nvir/nir: implement images
  nvir/nir: add memory barriers
  nvir/nir: implement load_per_vertex_output
  nvir/nir: implement intrinsic shader_clock

Rob Clark (1):
  mesa/st: translate SO info in glsl_to_nir() case

Timothy Arceri (3):
  compiler: tidy up double_inputs_read uses
  nir: add vs_inputs_dual_locations compiler option
  nir: partially revert c2acf97fcc9b32e

 src/amd/vulkan/radv_shader.c   |1 +
 src/compiler/glsl/glsl_to_nir.cpp  |   21 +-
 src/compiler/glsl/ir_set_program_inouts.cpp|2 +-
 src/compiler/nir/nir.h |6 +
 src/compiler/nir/nir_gather_info.c |   35 +-
 src/compiler/shader_info.h |   10 +-
 src/gallium/drivers/nouveau/Makefile.sources   |5 +
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|3 +
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |1 +
 .../nouveau/codegen/nv50_ir_from_common.cpp|  107 +
 .../drivers/nouveau/codegen/nv50_ir_from_common.h  |   58 +
 .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 2802 
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  106 +-
 .../nouveau/codegen/nv50_ir_lowering_helper.cpp|  250 ++
 .../nouveau/codegen/nv50_ir_lowering_helper.h  |   52 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp |   60 +-
 src/gallium/drivers/nouveau/meson.build|   14 +-
 src/gallium/drivers/nouveau/nouveau_screen.c   |4 +
 src/gallium/drivers/nouveau/nouveau_screen.h   |2 +
 src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   19 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   52 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c  |   27 +-
 src/intel/compiler/brw_compiler.c  |3 +
 src/intel/compiler/brw_vec4.cpp|2 +-
 src/mesa/state_tracker/st_glsl_to_nir.cpp  |2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp |2 +-
 src/mesa/state_tracker/st_program.c|   59 +-
 27 files changed, 3542 insertions(+), 163 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.cpp
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.h
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.h

-- 
2.14.3



[Mesa-dev] [PATCH v4 03/38] nir: add vs_inputs_dual_locations compiler option

2018-01-09 Thread Karol Herbst
From: Timothy Arceri 

Allows NIR drivers to use either a single location or dual locations
for VS double inputs.

i965 uses dual locations for both the OpenGL and Vulkan drivers; for
now, gallium OpenGL drivers only use a single location.

The following patch will also make use of this option when
calling nir_shader_gather_info().
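
As a rough illustration of what the option controls, the remap in nir_remap_attributes() shifts each input location up by the number of double inputs located below it. The sketch is self-contained; maskBelow, popcount64 and remapLocation are hypothetical stand-ins for BITFIELD64_MASK, _mesa_bitcount_64 and the loop body, not Mesa functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for BITFIELD64_MASK: bits [0, location) set.
static uint64_t maskBelow(unsigned location)
{
   return location >= 64 ? ~0ull : ((1ull << location) - 1);
}

// Hypothetical stand-in for _mesa_bitcount_64.
static unsigned popcount64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1; // clear the lowest set bit
      ++n;
   }
   return n;
}

// With dual locations, every double input below 'location' takes one
// extra slot, so the location moves up by that count. Without dual
// locations the location is left unchanged.
unsigned remapLocation(unsigned location, uint64_t doubleInputs, bool dualLocations)
{
   if (!dualLocations)
      return location;
   return location + popcount64(doubleInputs & maskBelow(location));
}
```

This matches the example in the glsl_to_nir() comment: with a dvec3 at location 0 and a vec4 at location 1, the vec4 ends up at location 2 when dual locations are in use and stays at 1 otherwise.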

Tested-by: Karol Herbst 
---
 src/amd/vulkan/radv_shader.c  |  1 +
 src/compiler/glsl/glsl_to_nir.cpp | 14 +-
 src/compiler/nir/nir.h|  6 ++
 src/intel/compiler/brw_compiler.c |  3 +++
 4 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
index 5d777a05e5..8ed9fb91e7 100644
--- a/src/amd/vulkan/radv_shader.c
+++ b/src/amd/vulkan/radv_shader.c
@@ -66,6 +66,7 @@ static const struct nir_shader_compiler_options nir_options = {
.lower_extract_byte = true,
.lower_extract_word = true,
.lower_ffma = true,
+   .vs_inputs_dual_locations = true,
.max_unroll_iterations = 32
 };
 
diff --git a/src/compiler/glsl/glsl_to_nir.cpp b/src/compiler/glsl/glsl_to_nir.cpp
index 505c99bbe3..4e3e9c4610 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -130,11 +130,15 @@ private:
 } /* end of anonymous namespace */
 
 static void
-nir_remap_attributes(nir_shader *shader)
+nir_remap_attributes(nir_shader *shader,
+ const nir_shader_compiler_options *options)
 {
-   nir_foreach_variable(var, &shader->inputs) {
-  var->data.location += _mesa_bitcount_64(shader->info.vs.double_inputs &
-  BITFIELD64_MASK(var->data.location));
+   if (options->vs_inputs_dual_locations) {
+  nir_foreach_variable(var, &shader->inputs) {
+ var->data.location +=
+_mesa_bitcount_64(shader->info.vs.double_inputs &
+  BITFIELD64_MASK(var->data.location));
+  }
}
 
/* Once the remap is done, reset double_inputs_read, so later it will have
@@ -164,7 +168,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
 * location 0 and vec4 attr1 in location 1, in NIR attr0 will use
 * locations/slots 0 and 1, and attr1 will use location/slot 2 */
if (shader->info.stage == MESA_SHADER_VERTEX)
-  nir_remap_attributes(shader);
+  nir_remap_attributes(shader, options);
 
shader->info.name = ralloc_asprintf(shader, "GLSL%d", shader_prog->Name);
if (shader_prog->Label)
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 440c3fe997..4eb9c80ecd 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1892,6 +1892,12 @@ typedef struct nir_shader_compiler_options {
 */
bool use_interpolated_input_intrinsics;
 
+   /**
+* Do vertex shader double inputs use two locations? The Vulkan spec
+* requires two locations to be used, OpenGL allows a single location.
+*/
+   bool vs_inputs_dual_locations;
+
unsigned max_unroll_iterations;
 } nir_shader_compiler_options;
 
diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index e89aeacc7d..e515559acb 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -57,6 +57,7 @@ static const struct nir_shader_compiler_options scalar_nir_options = {
.lower_unpack_snorm_4x8 = true,
.lower_unpack_unorm_2x16 = true,
.lower_unpack_unorm_4x8 = true,
+   .vs_inputs_dual_locations = true,
.max_unroll_iterations = 32,
 };
 
@@ -78,6 +79,7 @@ static const struct nir_shader_compiler_options vector_nir_options = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .vs_inputs_dual_locations = true,
.max_unroll_iterations = 32,
 };
 
@@ -96,6 +98,7 @@ static const struct nir_shader_compiler_options vector_nir_options_gen6 = {
.lower_unpack_unorm_2x16 = true,
.lower_extract_byte = true,
.lower_extract_word = true,
+   .vs_inputs_dual_locations = true,
.max_unroll_iterations = 32,
 };
 
-- 
2.14.3



[Mesa-dev] [PATCH v4 01/38] mesa/st: translate SO info in glsl_to_nir() case

2018-01-09 Thread Karol Herbst
From: Rob Clark 

This was handled for VS, but not for GS.

Fixes for gallium drivers using nir:
spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams-without-invocations
spec@arb_gpu_shader5@arb_gpu_shader5-xfb-streams*
spec@arb_transform_feedback3@arb_transform_feedback3-ext_interleaved_two_bufs_gs*
spec@ext_transform_feedback@geometry-shaders-basic
spec@ext_transform_feedback@* use_gs
spec@glsl-1.50@execution@geometry@primitive-id*
spec@glsl-1.50@execution@geometry@tri-strip-ordering-with-prim-restart gl_triangle_strip *
spec@glsl-1.50@transform-feedback-builtins
spec@glsl-1.50@transform-feedback-type-and-size

v2: we don't need this for TCP

Signed-off-by: Rob Clark 
Reviewed-by: Timothy Arceri 
Tested-by: Karol Herbst 
---
 src/mesa/state_tracker/st_program.c | 57 ++---
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/src/mesa/state_tracker/st_program.c b/src/mesa/state_tracker/st_program.c
index 05e6042f42..2a1a695948 100644
--- a/src/mesa/state_tracker/st_program.c
+++ b/src/mesa/state_tracker/st_program.c
@@ -1421,6 +1421,50 @@ st_translate_program_common(struct st_context *st,
}
 }
 
+/**
+ * Update stream-output info for GS/TCS/TES.  Normally this is done in
+ * st_translate_program_common() but that is not called for glsl_to_nir
+ * case.
+ */
+static void
+st_translate_program_stream_output(struct gl_program *prog,
+   struct pipe_stream_output_info *stream_output)
+{
+   if (!prog->sh.LinkedTransformFeedback)
+  return;
+
+   ubyte outputMapping[VARYING_SLOT_TESS_MAX];
+   GLuint attr;
+   uint num_outputs = 0;
+
+   memset(outputMapping, 0, sizeof(outputMapping));
+
+   /*
+* Determine number of outputs, the (default) output register
+* mapping and the semantic information for each output.
+*/
+   for (attr = 0; attr < VARYING_SLOT_MAX; attr++) {
+  if (prog->info.outputs_written & BITFIELD64_BIT(attr)) {
+ GLuint slot = num_outputs++;
+
+ outputMapping[attr] = slot;
+  }
+   }
+
+   /* Also add patch outputs. */
+   for (attr = 0; attr < 32; attr++) {
+  if (prog->info.patch_outputs_written & (1u << attr)) {
+ GLuint slot = num_outputs++;
+ GLuint patch_attr = VARYING_SLOT_PATCH0 + attr;
+
+ outputMapping[patch_attr] = slot;
+  }
+   }
+
+   st_translate_stream_output_info2(prog->sh.LinkedTransformFeedback,
+outputMapping,
+stream_output);
+}
 
 /**
  * Translate a geometry program to create a new variant.
@@ -1432,8 +1476,10 @@ st_translate_geometry_program(struct st_context *st,
struct ureg_program *ureg;
 
/* We have already compiled to NIR so just return */
-   if (stgp->shader_program)
+   if (stgp->shader_program) {
+  st_translate_program_stream_output(&stgp->Base, &stgp->tgsi.stream_output);
   return true;
+   }
 
ureg = ureg_create_with_screen(PIPE_SHADER_GEOMETRY, st->pipe->screen);
if (ureg == NULL)
@@ -1489,6 +1535,7 @@ st_get_basic_variant(struct st_context *st,
tgsi.ir.nir = nir_shader_clone(NULL, prog->tgsi.ir.nir);
st_finalize_nir(st, &prog->Base, prog->shader_program,
 tgsi.ir.nir);
+tgsi.stream_output = prog->tgsi.stream_output;
 } else
tgsi = prog->tgsi;
  /* fill in new variant */
@@ -1529,7 +1576,7 @@ st_translate_tessctrl_program(struct st_context *st,
 {
struct ureg_program *ureg;
 
-   /* We have already compiler to NIR so just return */
+   /* We have already compiled to NIR so just return */
if (sttcp->shader_program)
   return true;
 
@@ -1558,9 +1605,11 @@ st_translate_tesseval_program(struct st_context *st,
 {
struct ureg_program *ureg;
 
-   /* We have already compiler to NIR so just return */
-   if (sttep->shader_program)
+   /* We have already compiled to NIR so just return */
+   if (sttep->shader_program) {
+  st_translate_program_stream_output(&sttep->Base, &sttep->tgsi.stream_output);
   return true;
+   }
 
ureg = ureg_create_with_screen(PIPE_SHADER_TESS_EVAL, st->pipe->screen);
if (ureg == NULL)
-- 
2.14.3



[Mesa-dev] [PATCH v4 05/38] nv50/ir/ra: Fix copying compound for moves

2018-01-09 Thread Karol Herbst
From: Connor Abbott 

In order to reduce moves when coalescing multiple registers into a
larger register, RA will try to coalesce MERGE instructions with their
definitions. For example, for something like this in GLSL:

uint a = ...;
uint b = ...;
uint64 x = packUint2x32(a, b);

The compiler will try to coalesce x with a and b, in the same way as
something like:

uint a = ...;
uint b = ...;
...
uint x = phi(a, b);

with the crucial difference that the definitions of a and b only clobber
part of the register, instead of the whole thing. This information is
carried through the compound flag and compMask bitmask. If compound is
set, then the value has been coalesced in such a way that not all the
defs clobber the entire register. The compMask bitmask describes which
subregister each def clobbers, although it does it in a slightly
convoluted way. It's an invariant that once compound is set on one def,
it must be set for all the defs in a given coalesced value.

In more detail, the constraints pass will first create extra moves:

uint a = ...;
uint b = ...;
uint a' = a;
uint b' = b;
uint64 x = packUint2x32(a', b');

and then RA will merge values involved in MERGE/SPLIT instructions,
merging x with a' and b' and making the combined value compound -- this
is relatively simple, and will always succeed since we just created a'
and b', so they never interfere with x, and x has no other definitions,
since we haven't started coalescing moves yet. Basically, we just replaced
the MERGE instruction with an equivalent sequence of partial writes to the
destination. The tricky part comes when we try to merge a' with a
and b' with b. We need to transfer the compound information from a' to a
and b' to b, which copyCompound() does, but we also need to transfer it
to any defs coalesced with a and b, which the code failed to do. Similarly,
if x is the argument to a phi instruction, then when we try to merge it
with other arguments to the same phi by coalescing moves, we'd have
problems guaranteeing that all the other merged defs stay up-to-date.

One tricky part of fixing this is that in order to properly propagate
the information from a' to a, we need to do it before the defs for a and
a' are merged in coalesceValues(), since we need to know which defs are
merged with a but not a' -- after coalesceValues() returns, all the defs
have been combined, so we don't know which is which. I took the approach
of calling copyCompound() inside coalesceValues(), instead of
afterwards.
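
To make the invariant concrete, here is a small sketch of the propagation step described above: when the source of a coalesced move is compound, every def already joined with the destination has to become compound too and have its mask narrowed. ToyDef and propagateCompound are hypothetical names, and the mask values are only illustrative; the real compMask encoding is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the two LValue fields discussed above.
struct ToyDef {
   bool compound = false;
   uint8_t compMask = 0;
};

// Copies compound information from 'src' to every def that has already
// been coalesced with the destination, not just to the destination itself.
void propagateCompound(std::vector<ToyDef *> &dstDefs, const ToyDef &src)
{
   if (!src.compound)
      return;

   for (ToyDef *def : dstDefs) {
      if (!def->compound)
         def->compMask = 0xff; // start from "clobbers everything"
      def->compound = true;
      def->compMask &= src.compMask;
   }
}
```

Walking all joined defs is the part the old code skipped, and it only works if it runs before the two def lists are merged, which is the reason given above for calling copyCompound() inside coalesceValues().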

Cc: Ilia Mirkin 
Cc: Karol Herbst 
Tested-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 60 ++
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 3a0e56e138..df3116a6d7 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -890,6 +890,35 @@ GCRA::RIG_Node::init(const RegisterSet& regs, LValue *lval)
livei.insert(lval->livei);
 }
 
+// Used when coalescing moves. The non-compound value will become one, e.g.:
+// mov b32 $r0 $r2/ merge b64 $r0d { $r0 $r1 }
+// split b64 { $r0 $r1 } $r0d / mov b64 $r0d f64 $r2d
+static inline void copyCompound(Value *dst, Value *src)
+{
+   LValue *ldst = dst->asLValue();
+   LValue *lsrc = src->asLValue();
+
+   if (ldst->compound && !lsrc->compound) {
+  LValue *swap = lsrc;
+  lsrc = ldst;
+  ldst = swap;
+   }
+
+   assert(!ldst->compound);
+
+   if (lsrc->compound) {
+  Value *dstRep = ldst->join;
+  for (Value::DefIterator d = dstRep->defs.begin(); d != dstRep->defs.end();
+   ++d) {
+ LValue *ldst = (*d)->get()->asLValue();
+ if (!ldst->compound)
+ldst->compMask = 0xff;
+ ldst->compound = 1;
+ ldst->compMask &= lsrc->compMask;
+  }
+   }
+}
+
 bool
 GCRA::coalesceValues(Value *dst, Value *src, bool force)
 {
@@ -932,9 +961,16 @@ GCRA::coalesceValues(Value *dst, Value *src, bool force)
if (!force && nRep->livei.overlaps(nVal->livei))
   return false;
 
+   // TODO: Handle this case properly.
+   if (!force && rep->compound && val->compound)
+  return false;
+
INFO_DBG(prog->dbgFlags, REG_ALLOC, "joining %%%i($%i) <- %%%i\n",
 rep->id, rep->reg.data.id, val->id);
 
+   if (!force)
+  copyCompound(dst, src);
+
// set join pointer of all values joined with val
for (Value::DefIterator def = val->defs.begin(); def != val->defs.end();
 ++def)
@@ -997,24 +1033,6 @@ static inline uint8_t makeCompMask(int compSize, int 
base, int size)
}
 }
 
-// Used when coalescing moves. The non-compound value will become one, e.g.:
-// mov b32 $r0 $r2/ merge b64 $r0d { $r0 $r1 }
-// split b64 { $r0 $r1 } $r0d / mov b64 $r0d f64 $r2d
-static inline void 

[Mesa-dev] [PATCH v4 02/38] compiler: tidy up double_inputs_read uses

2018-01-09 Thread Karol Herbst
From: Timothy Arceri 

First we move double_inputs_read into a vs struct in the union.
double_inputs_read is only used for vs inputs, so this saves
space and also allows us to add a new double_inputs field.

We add the new field because c2acf97fcc9b changed the behaviour
of double_inputs_read, and while it's no longer used to track
actual reads in i965 we do still want to track this for gallium
drivers.
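
As a rough illustration of how the double_inputs bitfield is consumed by
nir_remap_attributes() in the diff below, here is a minimal standalone sketch.
The example_* names are hypothetical and not part of Mesa; the helpers only
mimic what BITFIELD64_MASK() and _mesa_bitcount_64() are used for in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BITFIELD64_MASK(b): all bits below bit b. */
static uint64_t
example_mask_below(unsigned b)
{
   return b >= 64 ? ~0ull : (1ull << b) - 1;
}

/* Hypothetical stand-in for _mesa_bitcount_64(): count the set bits. */
static unsigned
example_popcount64(uint64_t v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* Shift a location up by one for every dual-slot input located below it. */
static unsigned
example_remap_location(uint64_t double_inputs, unsigned location)
{
   assert(location < 64);
   return location +
          example_popcount64(double_inputs & example_mask_below(location));
}
```

With a double at location 0 (double_inputs == 0x1), an input declared at
location 1 would end up at location 2, while location 0 itself is unchanged.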
---
 src/compiler/glsl/glsl_to_nir.cpp   |  9 +
 src/compiler/glsl/ir_set_program_inouts.cpp |  2 +-
 src/compiler/nir/nir_gather_info.c  |  8 ++--
 src/compiler/shader_info.h  | 10 --
 src/intel/compiler/brw_vec4.cpp |  2 +-
 src/mesa/state_tracker/st_glsl_to_nir.cpp   |  2 +-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp  |  2 +-
 src/mesa/state_tracker/st_program.c |  2 +-
 8 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/src/compiler/glsl/glsl_to_nir.cpp 
b/src/compiler/glsl/glsl_to_nir.cpp
index 0493410aeb..505c99bbe3 100644
--- a/src/compiler/glsl/glsl_to_nir.cpp
+++ b/src/compiler/glsl/glsl_to_nir.cpp
@@ -133,13 +133,13 @@ static void
 nir_remap_attributes(nir_shader *shader)
 {
nir_foreach_variable(var, &shader->inputs) {
-  var->data.location += _mesa_bitcount_64(shader->info.double_inputs_read &
+  var->data.location += _mesa_bitcount_64(shader->info.vs.double_inputs &
   
BITFIELD64_MASK(var->data.location));
}
 
/* Once the remap is done, reset double_inputs_read, so later it will have
 * which location/slots are doubles */
-   shader->info.double_inputs_read = 0;
+   shader->info.vs.double_inputs = 0;
 }
 
 nir_shader *
@@ -363,10 +363,11 @@ nir_visitor::visit(ir_variable *ir)
   }
 
   /* Mark all the locations that require two slots */
-  if (glsl_type_is_dual_slot(glsl_without_array(var->type))) {
+  if (shader->info.stage == MESA_SHADER_VERTEX &&
+  glsl_type_is_dual_slot(glsl_without_array(var->type))) {
  for (uint i = 0; i < glsl_count_attribute_slots(var->type, true); 
i++) {
 uint64_t bitfield = BITFIELD64_BIT(var->data.location + i);
-shader->info.double_inputs_read |= bitfield;
+shader->info.vs.double_inputs |= bitfield;
  }
   }
   break;
diff --git a/src/compiler/glsl/ir_set_program_inouts.cpp 
b/src/compiler/glsl/ir_set_program_inouts.cpp
index 90b06b9f41..1b6c8d750b 100644
--- a/src/compiler/glsl/ir_set_program_inouts.cpp
+++ b/src/compiler/glsl/ir_set_program_inouts.cpp
@@ -118,7 +118,7 @@ mark(struct gl_program *prog, ir_variable *var, int offset, 
int len,
  /* double inputs read is only for vertex inputs */
  if (stage == MESA_SHADER_VERTEX &&
  var->type->without_array()->is_dual_slot())
-prog->info.double_inputs_read |= bitfield;
+prog->info.vs.double_inputs_read |= bitfield;
 
  if (stage == MESA_SHADER_FRAGMENT) {
 prog->info.fs.uses_sample_qualifier |= var->data.sample;
diff --git a/src/compiler/nir/nir_gather_info.c 
b/src/compiler/nir/nir_gather_info.c
index 946939657e..e98129b22c 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -234,7 +234,8 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, 
nir_shader *shader)
  glsl_type_is_dual_slot(glsl_without_array(var->type))) {
 for (uint i = 0; i < glsl_count_attribute_slots(var->type, false); 
i++) {
int idx = var->data.location + i;
-   shader->info.double_inputs_read |= BITFIELD64_BIT(idx);
+   shader->info.vs.double_inputs |= BITFIELD64_BIT(idx);
+   shader->info.vs.double_inputs_read |= BITFIELD64_BIT(idx);
 }
  }
   }
@@ -356,10 +357,13 @@ nir_shader_gather_info(nir_shader *shader, 
nir_function_impl *entrypoint)
shader->info.outputs_written = 0;
shader->info.outputs_read = 0;
shader->info.patch_outputs_read = 0;
-   shader->info.double_inputs_read = 0;
shader->info.patch_inputs_read = 0;
shader->info.patch_outputs_written = 0;
shader->info.system_values_read = 0;
+   if (shader->info.stage == MESA_SHADER_VERTEX) {
+  shader->info.vs.double_inputs = 0;
+  shader->info.vs.double_inputs_read = 0;
+   }
if (shader->info.stage == MESA_SHADER_FRAGMENT) {
   shader->info.fs.uses_sample_qualifier = false;
}
diff --git a/src/compiler/shader_info.h b/src/compiler/shader_info.h
index 4492cad0e8..f6dedb8d62 100644
--- a/src/compiler/shader_info.h
+++ b/src/compiler/shader_info.h
@@ -67,8 +67,6 @@ typedef struct shader_info {
 
/* Which inputs are actually read */
uint64_t inputs_read;
-   /* Which inputs are actually read and are double */
-   uint64_t double_inputs_read;
/* Which outputs are actually written */
uint64_t outputs_written;
/* Which outputs are actually read */
@@ -109,6 +107,14 @@ 

[Mesa-dev] [PATCH v4 04/38] nir: partially revert c2acf97fcc9b32e

2018-01-09 Thread Karol Herbst
From: Timothy Arceri 

c2acf97fcc9b32e changed the use of double_inputs_read to be
inconsistent with its previous meaning. Here we re-enable the
gather info code that was removed as the modified code from
c2acf97fcc9b32e now uses the double_inputs member rather than
double_inputs_read.

This change allows us to use double_inputs_read with gallium
drivers without impacting double_inputs which is used by i965.

We also make use of the compiler option vs_inputs_dual_locations
to allow for the difference in behaviour between drivers that handle
vs inputs as taking up two locations for doubles, versus those that
treat them as taking a single location.
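
The difference between the two conventions can be sketched as follows. This is
only an illustration under the assumption that a dual-slot type costs two
slots per array element unless it is a vertex input counted in single-location
mode; example_count_slots() is a hypothetical helper, not the real
glsl_count_attribute_slots().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot counter.
 * array_len:                number of array elements (1 for non-arrays)
 * is_dual_slot:             true for types such as dvec3/dvec4
 * single_location_vs_input: true when a vertex input double is treated
 *                           as occupying one location
 */
static unsigned
example_count_slots(unsigned array_len, bool is_dual_slot,
                    bool single_location_vs_input)
{
   assert(array_len >= 1);
   unsigned per_element =
      (is_dual_slot && !single_location_vs_input) ? 2 : 1;
   return array_len * per_element;
}
```

For a dvec4[3] vertex input this gives 6 slots in the dual-location
convention and 3 in the single-location one, which is why the offset and
element width computations in the diff have to know which mode is in use.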

Tested-by: Karol Herbst 
---
 src/compiler/nir/nir_gather_info.c | 29 +++--
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/compiler/nir/nir_gather_info.c 
b/src/compiler/nir/nir_gather_info.c
index e98129b22c..743f968035 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -54,6 +54,11 @@ set_io_mask(nir_shader *shader, nir_variable *var, int 
offset, int len,
  else
 shader->info.inputs_read |= bitfield;
 
+ /* double inputs read is only for vertex inputs */
+ if (shader->info.stage == MESA_SHADER_VERTEX &&
+ glsl_type_is_dual_slot(glsl_without_array(var->type)))
+shader->info.vs.double_inputs_read |= bitfield;
+
  if (shader->info.stage == MESA_SHADER_FRAGMENT) {
 shader->info.fs.uses_sample_qualifier |= var->data.sample;
  }
@@ -88,21 +93,27 @@ static void
 mark_whole_variable(nir_shader *shader, nir_variable *var, bool is_output_read)
 {
const struct glsl_type *type = var->type;
+   bool is_vertex_input = false;
 
if (nir_is_per_vertex_io(var, shader->info.stage)) {
   assert(glsl_type_is_array(type));
   type = glsl_get_array_element(type);
}
 
+   if (!shader->options->vs_inputs_dual_locations &&
+   shader->info.stage == MESA_SHADER_VERTEX &&
+   var->data.mode == nir_var_shader_in)
+  is_vertex_input = true;
+
const unsigned slots =
   var->data.compact ? DIV_ROUND_UP(glsl_get_length(type), 4)
-: glsl_count_attribute_slots(type, false);
+: glsl_count_attribute_slots(type, is_vertex_input);
 
set_io_mask(shader, var, 0, slots, is_output_read);
 }
 
 static unsigned
-get_io_offset(nir_deref_var *deref)
+get_io_offset(nir_deref_var *deref, bool is_vertex_input)
 {
unsigned offset = 0;
 
@@ -117,7 +128,7 @@ get_io_offset(nir_deref_var *deref)
 return -1;
  }
 
- offset += glsl_count_attribute_slots(tail->type, false) *
+ offset += glsl_count_attribute_slots(tail->type, is_vertex_input) *
 deref_array->base_offset;
   }
   /* TODO: we can get the offset for structs here see nir_lower_io() */
@@ -163,7 +174,13 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var 
*deref, bool is_output_rea
   return false;
}
 
-   unsigned offset = get_io_offset(deref);
+   bool is_vertex_input = false;
+   if (!shader->options->vs_inputs_dual_locations &&
+   shader->info.stage == MESA_SHADER_VERTEX &&
+   var->data.mode == nir_var_shader_in)
+  is_vertex_input = true;
+
+   unsigned offset = get_io_offset(deref, is_vertex_input);
if (offset == -1)
   return false;
 
@@ -179,7 +196,8 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var 
*deref, bool is_output_rea
}
 
/* double element width for double types that takes two slots */
-   if (glsl_type_is_dual_slot(glsl_without_array(type))) {
+   if (!is_vertex_input &&
+   glsl_type_is_dual_slot(glsl_without_array(type))) {
   elem_width *= 2;
}
 
@@ -235,7 +253,6 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, 
nir_shader *shader)
 for (uint i = 0; i < glsl_count_attribute_slots(var->type, false); 
i++) {
int idx = var->data.location + i;
shader->info.vs.double_inputs |= BITFIELD64_BIT(idx);
-   shader->info.vs.double_inputs_read |= BITFIELD64_BIT(idx);
 }
  }
   }
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v4 07/38] nvir: move common converter code in base class

2018-01-09 Thread Karol Herbst
v2: remove TGSI related bits

Signed-off-by: Karol Herbst 
---
 src/gallium/drivers/nouveau/Makefile.sources   |   2 +
 .../nouveau/codegen/nv50_ir_from_common.cpp| 107 +
 .../drivers/nouveau/codegen/nv50_ir_from_common.h  |  58 +++
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  | 106 +---
 src/gallium/drivers/nouveau/meson.build|   2 +
 5 files changed, 172 insertions(+), 103 deletions(-)
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.cpp
 create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.h

diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
b/src/gallium/drivers/nouveau/Makefile.sources
index 65f08c7d8d..fee5e59522 100644
--- a/src/gallium/drivers/nouveau/Makefile.sources
+++ b/src/gallium/drivers/nouveau/Makefile.sources
@@ -115,6 +115,8 @@ NV50_CODEGEN_SOURCES := \
codegen/nv50_ir_build_util.h \
codegen/nv50_ir_driver.h \
codegen/nv50_ir_emit_nv50.cpp \
+   codegen/nv50_ir_from_common.cpp \
+   codegen/nv50_ir_from_common.h \
codegen/nv50_ir_from_tgsi.cpp \
codegen/nv50_ir_graph.cpp \
codegen/nv50_ir_graph.h \
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.cpp
new file mode 100644
index 00..58e9ab311b
--- /dev/null
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_common.cpp
@@ -0,0 +1,107 @@
+/*
+ * Copyright 2011 Christoph Bumiller
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "codegen/nv50_ir_from_common.h"
+
+namespace nv50_ir {
+
+ConverterCommon::ConverterCommon(Program *prog, nv50_ir_prog_info *info)
+   :  BuildUtil(prog),
+  info(info) {}
+
+ConverterCommon::Subroutine *
+ConverterCommon::getSubroutine(unsigned ip)
+{
+   std::map<unsigned, Subroutine>::iterator it = sub.map.find(ip);
+
+   if (it == sub.map.end())
+      it = sub.map.insert(std::make_pair(
+              ip, Subroutine(new Function(prog, "SUB", ip)))).first;
+
+   return &it->second;
+}
+
+ConverterCommon::Subroutine *
+ConverterCommon::getSubroutine(Function *f)
+{
+   unsigned ip = f->getLabel();
+   std::map<unsigned, Subroutine>::iterator it = sub.map.find(ip);
+
+   if (it == sub.map.end())
+      it = sub.map.insert(std::make_pair(ip, Subroutine(f))).first;
+
+   return &it->second;
+}
+
+uint8_t
+ConverterCommon::translateInterpMode(const nv50_ir_varying *var, operation& op)
+{
+   uint8_t mode = NV50_IR_INTERP_PERSPECTIVE;
+
+   if (var->flat)
+  mode = NV50_IR_INTERP_FLAT;
+   else
+   if (var->linear)
+  mode = NV50_IR_INTERP_LINEAR;
+   else
+   if (var->sc)
+  mode = NV50_IR_INTERP_SC;
+
+   op = (mode == NV50_IR_INTERP_PERSPECTIVE || mode == NV50_IR_INTERP_SC)
+  ? OP_PINTERP : OP_LINTERP;
+
+   if (var->centroid)
+  mode |= NV50_IR_INTERP_CENTROID;
+
+   return mode;
+}
+
+void
+ConverterCommon::handleUserClipPlanes()
+{
+   Value *res[8];
+   int n, i, c;
+
+   for (c = 0; c < 4; ++c) {
+  for (i = 0; i < info->io.genUserClip; ++i) {
+ Symbol *sym = mkSymbol(FILE_MEMORY_CONST, info->io.auxCBSlot,
+TYPE_F32, info->io.ucpBase + i * 16 + c * 4);
+ Value *ucp = mkLoadv(TYPE_F32, sym, NULL);
+ if (c == 0)
+res[i] = mkOp2v(OP_MUL, TYPE_F32, getScratch(), clipVtx[c], ucp);
+ else
+mkOp3(OP_MAD, TYPE_F32, res[i], clipVtx[c], ucp, res[i]);
+  }
+   }
+
+   const int first = info->numOutputs - (info->io.genUserClip + 3) / 4;
+
+   for (i = 0; i < info->io.genUserClip; ++i) {
+  n = i / 4 + first;
+  c = i % 4;
+  Symbol *sym =
+ mkSymbol(FILE_SHADER_OUTPUT, 0, TYPE_F32, info->out[n].slot[c] * 4);
+  mkStore(OP_EXPORT, TYPE_F32, sym, NULL, res[i]);
+   }
+}
+
+} // nv50_ir
diff --git 

Re: [Mesa-dev] [PATCH] r600: Allow egd_tables.py to run with python3 too

2018-01-09 Thread Dave Airlie
On 5 January 2018 at 01:14, Michal Srb  wrote:
> From: =?UTF-8?q?Tom=C3=A1=C5=A1=20Chv=C3=A1tal?= 
>
> Makes the egd_tables.py compatible with both python 2 and 3.

This appears to break the build here, I get a few () lines in the output.

I suspect print() needs to be print('')

Dave.

> ---
>  src/gallium/drivers/r600/egd_tables.py | 52 
> +-
>  1 file changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/egd_tables.py 
> b/src/gallium/drivers/r600/egd_tables.py
> index d7b78c7fb1..c9b5610441 100644
> --- a/src/gallium/drivers/r600/egd_tables.py
> +++ b/src/gallium/drivers/r600/egd_tables.py
> @@ -60,7 +60,7 @@ class StringTable:
>  """
>  fragments = [
>  '"%s\\0" /* %s */' % (
> -te[0].encode('string_escape'),
> +te[0].encode('unicode_escape'),
>  ', '.join(str(idx) for idx in te[2])
>  )
>  for te in self.table
> @@ -217,10 +217,10 @@ def write_tables(regs, packets):
>  strings = StringTable()
>  strings_offsets = IntTable("int")
>
> -print '/* This file is autogenerated by egd_tables.py from evergreend.h. 
> Do not edit directly. */'
> -print
> -print CopyRight.strip()
> -print '''
> +print('/* This file is autogenerated by egd_tables.py from evergreend.h. 
> Do not edit directly. */')
> +print()
> +print(CopyRight.strip())
> +print('''
>  #ifndef EG_TABLES_H
>  #define EG_TABLES_H
>
> @@ -242,20 +242,20 @@ struct eg_packet3 {
>  unsigned name_offset;
>  unsigned op;
>  };
> -'''
> +''')
>
> -print 'static const struct eg_packet3 packet3_table[] = {'
> +print('static const struct eg_packet3 packet3_table[] = {')
>  for pkt in packets:
> -print '\t{%s, %s},' % (strings.add(pkt[5:]), pkt)
> -print '};'
> -print
> +print('\t{%s, %s},' % (strings.add(pkt[5:]), pkt))
> +print('};')
> +print()
>
> -print 'static const struct eg_field egd_fields_table[] = {'
> +print('static const struct eg_field egd_fields_table[] = {')
>
>  fields_idx = 0
>  for reg in regs:
>  if len(reg.fields) and reg.own_fields:
> -print '\t/* %s */' % (fields_idx)
> +print('\t/* %s */' % (fields_idx))
>
>  reg.fields_idx = fields_idx
>
> @@ -266,34 +266,34 @@ struct eg_packet3 {
>  while value[1] >= len(values_offsets):
>  values_offsets.append(-1)
>  values_offsets[value[1]] = 
> strings.add(strip_prefix(value[0]))
> -print '\t{%s, %s(~0u), %s, %s},' % (
> +print('\t{%s, %s(~0u), %s, %s},' % (
>  strings.add(field.name), field.s_name,
> -len(values_offsets), 
> strings_offsets.add(values_offsets))
> +len(values_offsets), 
> strings_offsets.add(values_offsets)))
>  else:
> -print '\t{%s, %s(~0u)},' % (strings.add(field.name), 
> field.s_name)
> +print('\t{%s, %s(~0u)},' % (strings.add(field.name), 
> field.s_name))
>  fields_idx += 1
>
> -print '};'
> -print
> +print('};')
> +print()
>
> -print 'static const struct eg_reg egd_reg_table[] = {'
> +print('static const struct eg_reg egd_reg_table[] = {')
>  for reg in regs:
>  if len(reg.fields):
> -print '\t{%s, %s, %s, %s},' % (strings.add(reg.name), reg.r_name,
> -len(reg.fields), reg.fields_idx if reg.own_fields else 
> reg.fields_owner.fields_idx)
> +print('\t{%s, %s, %s, %s},' % (strings.add(reg.name), reg.r_name,
> +len(reg.fields), reg.fields_idx if reg.own_fields else 
> reg.fields_owner.fields_idx))
>  else:
> -print '\t{%s, %s},' % (strings.add(reg.name), reg.r_name)
> -print '};'
> -print
> +print('\t{%s, %s},' % (strings.add(reg.name), reg.r_name))
> +print('};')
> +print()
>
>  strings.emit(sys.stdout, "egd_strings")
>
> -print
> +print()
>
>  strings_offsets.emit(sys.stdout, "egd_strings_offsets")
>
> -print
> -print '#endif'
> +print()
> +print('#endif')
>
>
>  def main():
> --
> 2.15.1
>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] radv: Implement VK_EXT_discard_rectangles.

2018-01-09 Thread Bas Nieuwenhuizen
Tested with a modified deferred demo and no regressions in a 1.0.2
mustpass run.
---
 src/amd/vulkan/radv_cmd_buffer.c  | 51 +++
 src/amd/vulkan/radv_device.c  |  6 +
 src/amd/vulkan/radv_extensions.py |  1 +
 src/amd/vulkan/radv_pipeline.c| 35 +++
 src/amd/vulkan/radv_private.h | 23 +-
 5 files changed, 110 insertions(+), 6 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 3114ae9fb4..4c42dc2b13 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -91,6 +91,7 @@ radv_bind_dynamic_state(struct radv_cmd_buffer *cmd_buffer,
 */
dest->viewport.count = src->viewport.count;
dest->scissor.count = src->scissor.count;
+   dest->discard_rectangle.count = src->discard_rectangle.count;
 
if (copy_mask & RADV_DYNAMIC_VIEWPORT) {
if (memcmp(&dest->viewport.viewports, &src->viewport.viewports,
@@ -168,6 +169,16 @@ radv_bind_dynamic_state(struct radv_cmd_buffer *cmd_buffer,
}
}
 
+   if (copy_mask & RADV_DYNAMIC_DISCARD_RECTANGLE) {
+   if (memcmp(&dest->discard_rectangle.rectangles, &src->discard_rectangle.rectangles,
+  src->discard_rectangle.count * sizeof(VkRect2D))) {
+   typed_memcpy(dest->discard_rectangle.rectangles,
+src->discard_rectangle.rectangles,
+src->discard_rectangle.count);
+   dest_mask |= RADV_DYNAMIC_DISCARD_RECTANGLE;
+   }
+   }
+
cmd_buffer->state.dirty |= dest_mask;
 }
 
@@ -1098,6 +1109,8 @@ radv_emit_graphics_pipeline(struct radv_cmd_buffer 
*cmd_buffer)
}
radeon_set_context_reg(cmd_buffer->cs, R_028A6C_VGT_GS_OUT_PRIM_TYPE, 
pipeline->graphics.gs_out);
 
+   radeon_set_context_reg(cmd_buffer->cs, R_02820C_PA_SC_CLIPRECT_RULE, 
pipeline->graphics.pa_sc_cliprect_rule);
+
if (unlikely(cmd_buffer->device->trace_bo))
radv_save_pipeline(cmd_buffer, pipeline, RING_GFX);
 
@@ -1134,6 +1147,22 @@ radv_emit_scissor(struct radv_cmd_buffer *cmd_buffer)
   
cmd_buffer->state.pipeline->graphics.ms.pa_sc_mode_cntl_0 | 
S_028A48_VPORT_SCISSOR_ENABLE(count ? 1 : 0));
 }
 
+static void
+radv_emit_discard_rectangle(struct radv_cmd_buffer *cmd_buffer)
+{
+   if (!cmd_buffer->state.dynamic.discard_rectangle.count)
+   return;
+
+   radeon_set_context_reg_seq(cmd_buffer->cs, R_028210_PA_SC_CLIPRECT_0_TL,
+  
cmd_buffer->state.dynamic.discard_rectangle.count * 2);
+   for (unsigned i = 0; i < 
cmd_buffer->state.dynamic.discard_rectangle.count; ++i) {
+   VkRect2D rect = 
cmd_buffer->state.dynamic.discard_rectangle.rectangles[i];
+   radeon_emit(cmd_buffer->cs, S_028210_TL_X(rect.offset.x) | 
S_028210_TL_Y(rect.offset.y));
+   radeon_emit(cmd_buffer->cs, S_028214_BR_X(rect.offset.x + 
rect.extent.width) |
+   S_028214_BR_Y(rect.offset.y + 
rect.extent.height));
+   }
+}
+
 static void
 radv_emit_line_width(struct radv_cmd_buffer *cmd_buffer)
 {
@@ -1627,6 +1656,9 @@ radv_cmd_buffer_flush_dynamic_state(struct 
radv_cmd_buffer *cmd_buffer)
   RADV_CMD_DIRTY_DYNAMIC_DEPTH_BIAS))
radv_emit_depth_biais(cmd_buffer);
 
+   if (cmd_buffer->state.dirty & RADV_CMD_DIRTY_DYNAMIC_DISCARD_RECTANGLE)
+   radv_emit_discard_rectangle(cmd_buffer);
+
cmd_buffer->state.dirty &= ~RADV_CMD_DIRTY_DYNAMIC_ALL;
 }
 
@@ -2882,6 +2914,25 @@ void radv_CmdSetStencilReference(
cmd_buffer->state.dirty |= RADV_CMD_DIRTY_DYNAMIC_STENCIL_REFERENCE;
 }
 
+void radv_CmdSetDiscardRectangleEXT(
+   VkCommandBuffer commandBuffer,
+   uint32_tfirstDiscardRectangle,
+   uint32_tdiscardRectangleCount,
+   const VkRect2D* pDiscardRectangles)
+{
+   RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
+   struct radv_cmd_state *state = &cmd_buffer->state;
+   MAYBE_UNUSED const uint32_t total_count = firstDiscardRectangle + 
discardRectangleCount;
+
+   assert(firstDiscardRectangle < MAX_DISCARD_RECTANGLES);
+   assert(total_count >= 1 && total_count <= MAX_DISCARD_RECTANGLES);
+
+
+   typed_memcpy(&state->dynamic.discard_rectangle.rectangles[firstDiscardRectangle],
+                pDiscardRectangles, discardRectangleCount);
+
+   state->dirty |= RADV_CMD_DIRTY_DYNAMIC_DISCARD_RECTANGLE;
+}
 void radv_CmdExecuteCommands(
VkCommandBuffer commandBuffer,
uint32_tcommandBufferCount,
diff --git a/src/amd/vulkan/radv_device.c 

[Mesa-dev] [PATCH 1/2] radv: Add mapping between dynamic state mask and external enum.

2018-01-09 Thread Bas Nieuwenhuizen
The EXT values are really large, e.g.
VK_DYNAMIC_STATE_DISCARD_RECTANGLE_EXT = 1000099000, so 1 << value
is not going to fit into a 32-bit mask.
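
A minimal sketch of the kind of mapping this implies, assuming a plain switch
from the external enum value to a driver-private bit. All example_* names are
hypothetical; only the numeric value 1000099000 is taken from the Vulkan
headers. Note that in C a shift by that amount is not merely too large for
the mask, it is undefined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few VkDynamicState values. */
enum example_dynamic_state {
   EXAMPLE_STATE_VIEWPORT = 0,
   EXAMPLE_STATE_SCISSOR = 1,
   EXAMPLE_STATE_DISCARD_RECTANGLE_EXT = 1000099000,
};

/* Driver-private compact bits, one per supported dynamic state. */
enum example_dynamic_bit {
   EXAMPLE_DYNAMIC_VIEWPORT          = 1 << 0,
   EXAMPLE_DYNAMIC_SCISSOR           = 1 << 1,
   EXAMPLE_DYNAMIC_DISCARD_RECTANGLE = 1 << 2,
};

/* Translate an external enum value into its compact mask bit. */
static uint32_t
example_dynamic_state_mask(enum example_dynamic_state state)
{
   switch (state) {
   case EXAMPLE_STATE_VIEWPORT:
      return EXAMPLE_DYNAMIC_VIEWPORT;
   case EXAMPLE_STATE_SCISSOR:
      return EXAMPLE_DYNAMIC_SCISSOR;
   case EXAMPLE_STATE_DISCARD_RECTANGLE_EXT:
      return EXAMPLE_DYNAMIC_DISCARD_RECTANGLE;
   default:
      assert(!"unhandled dynamic state");
      return 0;
   }
}
```

With such a mapping the core states and the EXT state share one small mask,
regardless of how far apart their enum values are.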
---
 src/amd/vulkan/radv_cmd_buffer.c | 36 ++---
 src/amd/vulkan/radv_pipeline.c   | 49 +++-
 src/amd/vulkan/radv_private.h| 32 ++
 3 files changed, 79 insertions(+), 38 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index d8f780cfd7..3114ae9fb4 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -92,79 +92,79 @@ radv_bind_dynamic_state(struct radv_cmd_buffer *cmd_buffer,
dest->viewport.count = src->viewport.count;
dest->scissor.count = src->scissor.count;
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_VIEWPORT)) {
+   if (copy_mask & RADV_DYNAMIC_VIEWPORT) {
if (memcmp(&dest->viewport.viewports, &src->viewport.viewports,
   src->viewport.count * sizeof(VkViewport))) {
typed_memcpy(dest->viewport.viewports,
 src->viewport.viewports,
 src->viewport.count);
-   dest_mask |= 1 << VK_DYNAMIC_STATE_VIEWPORT;
+   dest_mask |= RADV_DYNAMIC_VIEWPORT;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_SCISSOR)) {
+   if (copy_mask & RADV_DYNAMIC_SCISSOR) {
if (memcmp(&dest->scissor.scissors, &src->scissor.scissors,
   src->scissor.count * sizeof(VkRect2D))) {
typed_memcpy(dest->scissor.scissors,
 src->scissor.scissors, src->scissor.count);
-   dest_mask |= 1 << VK_DYNAMIC_STATE_SCISSOR;
+   dest_mask |= RADV_DYNAMIC_SCISSOR;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_LINE_WIDTH)) {
+   if (copy_mask & RADV_DYNAMIC_LINE_WIDTH) {
if (dest->line_width != src->line_width) {
dest->line_width = src->line_width;
-   dest_mask |= 1 << VK_DYNAMIC_STATE_LINE_WIDTH;
+   dest_mask |= RADV_DYNAMIC_LINE_WIDTH;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_DEPTH_BIAS)) {
+   if (copy_mask & RADV_DYNAMIC_DEPTH_BIAS) {
if (memcmp(&dest->depth_bias, &src->depth_bias,
   sizeof(src->depth_bias))) {
dest->depth_bias = src->depth_bias;
-   dest_mask |= 1 << VK_DYNAMIC_STATE_DEPTH_BIAS;
+   dest_mask |= RADV_DYNAMIC_DEPTH_BIAS;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_BLEND_CONSTANTS)) {
+   if (copy_mask & RADV_DYNAMIC_BLEND_CONSTANTS) {
if (memcmp(&dest->blend_constants, &src->blend_constants,
   sizeof(src->blend_constants))) {
typed_memcpy(dest->blend_constants,
 src->blend_constants, 4);
-   dest_mask |= 1 << VK_DYNAMIC_STATE_BLEND_CONSTANTS;
+   dest_mask |= RADV_DYNAMIC_BLEND_CONSTANTS;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_DEPTH_BOUNDS)) {
+   if (copy_mask & RADV_DYNAMIC_DEPTH_BOUNDS) {
if (memcmp(&dest->depth_bounds, &src->depth_bounds,
   sizeof(src->depth_bounds))) {
dest->depth_bounds = src->depth_bounds;
-   dest_mask |= 1 << VK_DYNAMIC_STATE_DEPTH_BOUNDS;
+   dest_mask |= RADV_DYNAMIC_DEPTH_BOUNDS;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_STENCIL_COMPARE_MASK)) {
+   if (copy_mask & RADV_DYNAMIC_STENCIL_COMPARE_MASK) {
if (memcmp(&dest->stencil_compare_mask,
   &src->stencil_compare_mask,
   sizeof(src->stencil_compare_mask))) {
dest->stencil_compare_mask = src->stencil_compare_mask;
-   dest_mask |= 1 << VK_DYNAMIC_STATE_STENCIL_COMPARE_MASK;
+   dest_mask |= RADV_DYNAMIC_STENCIL_COMPARE_MASK;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_STENCIL_WRITE_MASK)) {
+   if (copy_mask & RADV_DYNAMIC_STENCIL_WRITE_MASK) {
if (memcmp(&dest->stencil_write_mask, &src->stencil_write_mask,
   sizeof(src->stencil_write_mask))) {
dest->stencil_write_mask = src->stencil_write_mask;
-   dest_mask |= 1 << VK_DYNAMIC_STATE_STENCIL_WRITE_MASK;
+   dest_mask |= RADV_DYNAMIC_STENCIL_WRITE_MASK;
}
}
 
-   if (copy_mask & (1 << VK_DYNAMIC_STATE_STENCIL_REFERENCE)) {
+   if (copy_mask & RADV_DYNAMIC_STENCIL_REFERENCE) {
if 

[Mesa-dev] [PATCH] util: fix NORETURN for msvc, add HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h

2018-01-09 Thread sroland
From: Roland Scheidegger 

We've seen some problems internally due to macro redefinition.
Fix this by adding HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h,
and defining it for msvc.
And avoid redefinition just in case.
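
For illustration, a standalone sketch of the same selection order together
with a function that uses the macro. It is an assumption of this sketch that
__GNUC__ stands in for HAVE_FUNC_ATTRIBUTE_NORETURN; example_die() and
example_checked_div() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Keep an existing definition, prefer the MSVC spelling, then fall back
 * to the GCC-style attribute, then to nothing. */
#ifndef NORETURN
#  ifdef _MSC_VER
#    define NORETURN __declspec(noreturn)
#  elif defined(__GNUC__) /* assumption: replaces HAVE_FUNC_ATTRIBUTE_NORETURN */
#    define NORETURN __attribute__((__noreturn__))
#  else
#    define NORETURN
#  endif
#endif

/* Hypothetical fatal-error helper that never returns to its caller. */
static NORETURN void
example_die(const char *msg)
{
   assert(msg != NULL);
   fprintf(stderr, "fatal: %s\n", msg);
   abort();
}

/* Because example_die() is marked NORETURN, the compiler knows the
 * division below is only reached with a non-zero divisor. */
static int
example_checked_div(int num, int den)
{
   if (den == 0)
      example_die("division by zero");
   return num / den;
}
```

The outer #ifndef is what guards against the redefinition problem: a NORETURN
already provided by another header is left untouched.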
---
 include/c99_compat.h |  1 +
 src/util/macros.h| 12 
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/c99_compat.h b/include/c99_compat.h
index cb690c6..81621a7 100644
--- a/include/c99_compat.h
+++ b/include/c99_compat.h
@@ -164,6 +164,7 @@ test_c99_compat_h(const void * restrict a,
 #define HAVE_FUNC_ATTRIBUTE_FORMAT 1
 #define HAVE_FUNC_ATTRIBUTE_PACKED 1
 #define HAVE_FUNC_ATTRIBUTE_ALIAS 1
+#define HAVE_FUNC_ATTRIBUTE_NORETURN 1
 
 #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)
/* https://gcc.gnu.org/onlinedocs/gcc-4.3.6/gcc/Other-Builtins.html */
diff --git a/src/util/macros.h b/src/util/macros.h
index 2a08407..5ce0e57 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -171,10 +171,14 @@ do {   \
 #define ATTRIBUTE_RETURNS_NONNULL
 #endif
 
-#ifdef HAVE_FUNC_ATTRIBUTE_NORETURN
-#define NORETURN __attribute__((__noreturn__))
-#else
-#define NORETURN
+#ifndef NORETURN
+#  ifdef _MSC_VER
+#define NORETURN __declspec(noreturn)
+#  elif defined HAVE_FUNC_ATTRIBUTE_NORETURN
+#define NORETURN __attribute__((__noreturn__))
+#  else
+#define NORETURN
+#  endif
 #endif
 
 #ifdef __cplusplus
-- 
2.7.4



[Mesa-dev] [Bug 104553] mat4: m[i][j] incorrect result with row_major UBO

2018-01-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=104553

--- Comment #3 from Timothy Arceri  ---
(In reply to Ilia Mirkin from comment #1)
> Ian Romanick (idr) wrote a test generator which generated random shader_test
> files with different ubo arrangements. It caught a lot of bugs back in the
> day, but I don't think tests from it were ever checked in. May be worth
> resurfacing, and extending to SSBO's while one's at it.

I believe I pushed a version of the generator with arrays of arrays and doubles
support merged in but it was never hooked up to run in standard piglit runs. I
believe people were worried about running an excessive number of tests.

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 1/2] i965/fs: Use UW types when using V immediates

2018-01-09 Thread Anuj Phogat
I tested changing the destination register type from W to UW for the mov of
0x76543210V. It fixed 1000+ piglit failures on Cannonlake.

On Tue, Jan 9, 2018 at 4:56 PM, Jason Ekstrand  wrote:
> Gen 10 has a strange hardware bug involving V immediates with W types.
> It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
> getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
> four nibbles are repeated instead of the top four being taken.  (A mov
> of 0x3210V yields the same result.)  This bug does not appear in any
> hardware documentation as far as we can tell and the simulator does not
> implement the bug either.
>
> Commit 6132992cdb858268af0e985727d80e4140be389c was mostly a no-op
> except that it changed the type of the subgroup invocation from UW to W
> and caused us to tickle this bug with basically every compute shader
> that uses any sort of invocation ID (which is most of them).  This is
> also potentially an issue for geometry shader input pulls and SampleID
> setup.  The easy solution is just to change the few places where we use
> a vector integer immediate with a W type to use a UW type.
>
> Cc: Anuj Phogat 
> Cc: mesa-sta...@lists.freedesktop.org
> Fixes: 6132992cdb858268af0e985727d80e4140be389c
> ---
>  src/intel/compiler/brw_fs.cpp | 6 +++---
>  src/intel/compiler/brw_fs_nir.cpp | 4 ++--
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 6d9f0ec..83d28f8 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -1256,16 +1256,16 @@ fs_visitor::emit_sampleid_setup()
> * TODO: These payload bits exist on Gen7 too, but they appear to 
> always
> *   be zero, so this code fails to work.  We should find out why.
> */
> -  fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
> +  fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UW);
>
>abld.SHR(tmp, fs_reg(stride(retype(brw_vec1_grf(1, 0),
> - BRW_REGISTER_TYPE_B), 1, 8, 0)),
> + BRW_REGISTER_TYPE_UB), 1, 8, 0)),
>  brw_imm_v(0x));
>abld.AND(*reg, tmp, brw_imm_w(0xf));
> } else {
>const fs_reg t1 = component(fs_reg(VGRF, alloc.allocate(1),
>   BRW_REGISTER_TYPE_D), 0);
> -  const fs_reg t2(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
> +  const fs_reg t2(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UW);
>
>/* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
> * 8x multisampling, subspan 0 will represent sample N (where N
> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
> b/src/intel/compiler/brw_fs_nir.cpp
> index 01651dd..5c16efa 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -237,7 +237,7 @@ fs_visitor::nir_emit_system_values()
> {
>const fs_builder abld = bld.annotate("gl_SubgroupInvocation", NULL);
>fs_reg &reg = nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION];
> -  reg = abld.vgrf(BRW_REGISTER_TYPE_W);
> +  reg = abld.vgrf(BRW_REGISTER_TYPE_UW);
>
>const fs_builder allbld8 = abld.group(8, 0).exec_all();
>allbld8.MOV(reg, brw_imm_v(0x76543210));
> @@ -2134,7 +2134,7 @@ fs_visitor::emit_gs_input_load(const fs_reg &dst,
>* by 32 (shifting by 5), and add the two together.  This is
>* the final indirect byte offset.
>*/
> - fs_reg sequence = bld.vgrf(BRW_REGISTER_TYPE_W, 1);
> + fs_reg sequence = bld.vgrf(BRW_REGISTER_TYPE_UW, 1);
>   fs_reg channel_offsets = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
>   fs_reg vertex_offset_bytes = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
>   fs_reg icp_offset_bytes = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
> --
> 2.5.0.400.gff86faf
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] r600: add support for ARB_shader_clock.

2018-01-09 Thread Dave Airlie
From: Dave Airlie 

---
 docs/features.txt  |  2 +-
 src/gallium/drivers/r600/r600_pipe.c   |  2 +-
 src/gallium/drivers/r600/r600_shader.c | 29 ++---
 src/gallium/drivers/r600/r600_sq.h |  3 ++-
 4 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index a5f34edd41..80e6c5aaa4 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -305,7 +305,7 @@ Khronos, ARB, and OES extensions that are not part of any 
OpenGL or OpenGL ES ve
   GL_ARB_sample_locations   not started
   GL_ARB_seamless_cubemap_per_texture   DONE (i965, nvc0, 
radeonsi, r600, softpipe, swr)
   GL_ARB_shader_ballot  DONE (i965/gen8+, 
nvc0, radeonsi)
-  GL_ARB_shader_clock   DONE (i965/gen7+, 
nv50, nvc0, radeonsi)
+  GL_ARB_shader_clock   DONE (i965/gen7+, 
nv50, nvc0, r600, radeonsi)
   GL_ARB_shader_stencil_export  DONE (i965/gen9+, 
r600, radeonsi, softpipe, llvmpipe, swr)
   GL_ARB_shader_viewport_layer_arrayDONE (i965/gen6+, 
nvc0, radeonsi)
   GL_ARB_sparse_buffer  DONE (radeonsi/CIK+)
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 95aa2e5383..a8cedb3b90 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -353,6 +353,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
case PIPE_CAP_SAMPLER_VIEW_TARGET:
case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
+   case PIPE_CAP_TGSI_CLOCK:
return family >= CHIP_CEDAR ? 1 : 0;
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return family >= CHIP_CEDAR ? 4 : 0;
@@ -397,7 +398,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_INT64:
case PIPE_CAP_INT64_DIVMOD:
case PIPE_CAP_TGSI_TEX_TXF_LZ:
-   case PIPE_CAP_TGSI_CLOCK:
case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE:
case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE:
case PIPE_CAP_TGSI_BALLOT:
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index f6ff2055ee..f0d5277b2c 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -10168,6 +10168,29 @@ static int tgsi_bfe(struct r600_shader_ctx *ctx)
return 0;
 }
 
+static int tgsi_clock(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = 
>parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int r;
+
+   memset(, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   tgsi_dst(ctx, >Dst[0], 0, );
+   alu.src[0].sel = EG_V_SQ_ALU_SRC_TIME_LO;
+   r = r600_bytecode_add_alu(ctx->bc, );
+   if (r)
+   return r;
+   memset(, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   tgsi_dst(ctx, >Dst[0], 1, );
+   alu.src[0].sel = EG_V_SQ_ALU_SRC_TIME_HI;
+   r = r600_bytecode_add_alu(ctx->bc, );
+   if (r)
+   return r;
+   return 0;
+}
+
 static const struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[] = {
[TGSI_OPCODE_ARL]   = { ALU_OP0_NOP, tgsi_r600_arl},
[TGSI_OPCODE_MOV]   = { ALU_OP1_MOV, tgsi_op2},
@@ -10204,7 +10227,7 @@ static const struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[]
[TGSI_OPCODE_POW]   = { ALU_OP0_NOP, tgsi_pow},
[31]= { ALU_OP0_NOP, tgsi_unsupported},
[32]= { ALU_OP0_NOP, tgsi_unsupported},
-   [33]= { ALU_OP0_NOP, tgsi_unsupported},
+   [TGSI_OPCODE_CLOCK] = { ALU_OP0_NOP, tgsi_unsupported},
[34]= { ALU_OP0_NOP, tgsi_unsupported},
[35]= { ALU_OP0_NOP, tgsi_unsupported},
[TGSI_OPCODE_COS]   = { ALU_OP1_COS, tgsi_trig},
@@ -10402,7 +10425,7 @@ static const struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] =
[TGSI_OPCODE_POW]   = { ALU_OP0_NOP, tgsi_pow},
[31]= { ALU_OP0_NOP, tgsi_unsupported},
[32]= { ALU_OP0_NOP, tgsi_unsupported},
-   [33]= { ALU_OP0_NOP, tgsi_unsupported},
+   [TGSI_OPCODE_CLOCK] = { ALU_OP0_NOP, tgsi_clock},
[34]= { ALU_OP0_NOP, tgsi_unsupported},
[35]= { ALU_OP0_NOP, tgsi_unsupported},
[TGSI_OPCODE_COS]   = { ALU_OP1_COS, tgsi_trig},
@@ -10624,7 +10647,7 @@ static const struct r600_shader_tgsi_instruction 
cm_shader_tgsi_instruction[] =
[TGSI_OPCODE_POW]   = { ALU_OP0_NOP, cayman_pow},
[31]= { 

[Mesa-dev] [PATCH 1/2] i965/fs: Use UW types when using V immediates

2018-01-09 Thread Jason Ekstrand
Gen 10 has a strange hardware bug involving V immediates with W types.
It appears that a mov(8) g2<1>W 0x76543210V will actually result in g2
getting the value {3, 2, 1, 0, 3, 2, 1, 0}.  In particular, the bottom
four nibbles are repeated instead of the top four being taken.  (A mov
of 0x3210V yields the same result.)  This bug does not appear in any
hardware documentation as far as we can tell and the simulator does not
implement the bug either.

Commit 6132992cdb858268af0e985727d80e4140be389c was mostly a no-op
except that it changed the type of the subgroup invocation from UW to W
and caused us to tickle this bug with basically every compute shader
that uses any sort of invocation ID (which is most of them).  This is
also potentially an issue for geometry shader input pulls and SampleID
setup.  The easy solution is just to change the few places where we use
a vector integer immediate with a W type to use a UW type.
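
The symptom described above can be modelled in a few lines of C. This is only a sketch of the observed behaviour, not of the hardware: it assumes a V immediate packs eight signed 4-bit values with lane i taking nibble i, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the 4-bit field that a packed "V" immediate holds for one lane. */
int16_t v_imm_lane(uint32_t imm, unsigned lane)
{
   int nibble = (int)((imm >> (4 * lane)) & 0xf);
   return (int16_t)(nibble >= 8 ? nibble - 16 : nibble);
}

/* Expected expansion: lane i takes nibble i. */
void expand_v_imm(uint32_t imm, int16_t out[8])
{
   for (unsigned lane = 0; lane < 8; lane++)
      out[lane] = v_imm_lane(imm, lane);
}

/* Model of the symptom seen with a W destination: the bottom four
 * nibbles repeat instead of the top four being used.  This mirrors the
 * description in the commit message only; it is an assumption, not
 * documented hardware behaviour.
 */
void expand_v_imm_gen10_w_model(uint32_t imm, int16_t out[8])
{
   for (unsigned lane = 0; lane < 8; lane++)
      out[lane] = v_imm_lane(imm, lane & 3);
}
```

Under this model, 0x76543210 and 0x3210 expand to the same eight lanes, which matches the observation that a mov of 0x3210V yields the same result.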

Cc: Anuj Phogat 
Cc: mesa-sta...@lists.freedesktop.org
Fixes: 6132992cdb858268af0e985727d80e4140be389c
---
 src/intel/compiler/brw_fs.cpp | 6 +++---
 src/intel/compiler/brw_fs_nir.cpp | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6d9f0ec..83d28f8 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -1256,16 +1256,16 @@ fs_visitor::emit_sampleid_setup()
* TODO: These payload bits exist on Gen7 too, but they appear to always
*   be zero, so this code fails to work.  We should find out why.
*/
-  fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
+  fs_reg tmp(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UW);
 
   abld.SHR(tmp, fs_reg(stride(retype(brw_vec1_grf(1, 0),
- BRW_REGISTER_TYPE_B), 1, 8, 0)),
+ BRW_REGISTER_TYPE_UB), 1, 8, 0)),
 brw_imm_v(0x));
   abld.AND(*reg, tmp, brw_imm_w(0xf));
} else {
   const fs_reg t1 = component(fs_reg(VGRF, alloc.allocate(1),
  BRW_REGISTER_TYPE_D), 0);
-  const fs_reg t2(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_W);
+  const fs_reg t2(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UW);
 
   /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
* 8x multisampling, subspan 0 will represent sample N (where N
diff --git a/src/intel/compiler/brw_fs_nir.cpp 
b/src/intel/compiler/brw_fs_nir.cpp
index 01651dd..5c16efa 100644
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -237,7 +237,7 @@ fs_visitor::nir_emit_system_values()
{
   const fs_builder abld = bld.annotate("gl_SubgroupInvocation", NULL);
   fs_reg  = nir_system_values[SYSTEM_VALUE_SUBGROUP_INVOCATION];
-  reg = abld.vgrf(BRW_REGISTER_TYPE_W);
+  reg = abld.vgrf(BRW_REGISTER_TYPE_UW);
 
   const fs_builder allbld8 = abld.group(8, 0).exec_all();
   allbld8.MOV(reg, brw_imm_v(0x76543210));
@@ -2134,7 +2134,7 @@ fs_visitor::emit_gs_input_load(const fs_reg ,
   * by 32 (shifting by 5), and add the two together.  This is
   * the final indirect byte offset.
   */
- fs_reg sequence = bld.vgrf(BRW_REGISTER_TYPE_W, 1);
+ fs_reg sequence = bld.vgrf(BRW_REGISTER_TYPE_UW, 1);
  fs_reg channel_offsets = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
  fs_reg vertex_offset_bytes = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
  fs_reg icp_offset_bytes = bld.vgrf(BRW_REGISTER_TYPE_UD, 1);
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 2/2] i965: Use UD types for gl_SampleID setup

2018-01-09 Thread Jason Ekstrand
We already had to switch all of the W types to UW to prevent issues
with vector immediates on gen10.  We may as well use unsigned types
everywhere.
---
 src/intel/compiler/brw_fs.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 83d28f8..1623a7e 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -1219,7 +1219,7 @@ fs_visitor::emit_sampleid_setup()
assert(devinfo->gen >= 6);
 
const fs_builder abld = bld.annotate("compute sample id");
-   fs_reg *reg = new(this->mem_ctx) fs_reg(vgrf(glsl_type::int_type));
+   fs_reg *reg = new(this->mem_ctx) fs_reg(vgrf(glsl_type::uint_type));
 
if (!key->multisample_fbo) {
   /* As per GL_ARB_sample_shading specification:
@@ -1264,7 +1264,7 @@ fs_visitor::emit_sampleid_setup()
   abld.AND(*reg, tmp, brw_imm_w(0xf));
} else {
   const fs_reg t1 = component(fs_reg(VGRF, alloc.allocate(1),
- BRW_REGISTER_TYPE_D), 0);
+ BRW_REGISTER_TYPE_UD), 0);
   const fs_reg t2(VGRF, alloc.allocate(1), BRW_REGISTER_TYPE_UW);
 
   /* The PS will be run in MSDISPMODE_PERSAMPLE. For example with
@@ -1291,7 +1291,7 @@ fs_visitor::emit_sampleid_setup()
* accomodate 16x MSAA.
*/
   abld.exec_all().group(1, 0)
-  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_D)),
+  .AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)),
brw_imm_ud(0xc0));
   abld.exec_all().group(1, 0).SHR(t1, t1, brw_imm_d(5));
 
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH] i915g: fix crashes with wined3d

2018-01-09 Thread Ilia Mirkin
On Tue, Jan 9, 2018 at 5:40 PM, Christopher Egert  wrote:
> I'm not too familiar with gallium3d, but this fixes
> crashes with 3DMark2001 and GTA3 in wine-staging.
>
> This should be fixed properly in the future.
>
> Signed-off-by: Christopher Egert 
> ---
>  src/gallium/drivers/i915/i915_clear.c| 3 ++-
>  src/gallium/drivers/i915/i915_state_static.c | 4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/i915/i915_clear.c 
> b/src/gallium/drivers/i915/i915_clear.c
> index a1af789104..7a6f3fbb9a 100644
> --- a/src/gallium/drivers/i915/i915_clear.c
> +++ b/src/gallium/drivers/i915/i915_clear.c
> @@ -250,7 +250,8 @@ i915_clear_render(struct pipe_context *pipe, unsigned 
> buffers,
>  {
> struct i915_context *i915 = i915_context(pipe);
>
> -   if (i915->dirty)
> +   /* XXX not sure why this happens, but it works around one crash */

It used to be that there'd always be a draw before the first clear.
This no longer happens, and a ton of drivers had to get fixes for
that. Not sure if this is the right thing to do, but it would be worth
looking at how (or whether) the draw function deals with this.
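
A minimal sketch of the general idea, assuming a driver that tracks dirty bits and validates derived state lazily. All names here are hypothetical and this is not how i915g is structured; it only illustrates that a clear cannot rely on an earlier draw having validated the state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context: names are illustrative, not taken from i915g. */
struct sketch_context {
   unsigned dirty;        /* state changed since the last validation */
   bool derived_valid;    /* derived state has been computed at least once */
   unsigned validations;  /* how many times validation actually ran */
};

/* Recompute derived state if anything changed or nothing was ever derived. */
void sketch_validate(struct sketch_context *ctx)
{
   if (ctx->dirty || !ctx->derived_valid) {
      ctx->dirty = 0;
      ctx->derived_valid = true;
      ctx->validations++;
   }
}

/* Both entry points validate first, so neither depends on the other
 * having run beforehand.
 */
void sketch_draw(struct sketch_context *ctx)
{
   sketch_validate(ctx);
   /* emit draw commands here */
}

void sketch_clear(struct sketch_context *ctx)
{
   sketch_validate(ctx);
   /* emit clear commands here */
}
```

With this shape, a clear issued on a fresh context validates once, and a following draw with no state change does not validate again.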

> +   if (i915->dirty || (i915->hardware_dirty == ~0))
>i915_update_derived(i915);
>
> i915_clear_emit(pipe, buffers, color, depth, stencil,
> diff --git a/src/gallium/drivers/i915/i915_state_static.c 
> b/src/gallium/drivers/i915/i915_state_static.c
> index 88b418b1ac..48bb137019 100644
> --- a/src/gallium/drivers/i915/i915_state_static.c
> +++ b/src/gallium/drivers/i915/i915_state_static.c
> @@ -244,7 +244,9 @@ static void update_dst_buf_vars(struct i915_context *i915)
>i915->current.target_fixup_format = need_fixup;
>i915->current.fixup_swizzle = fixup;
>/* we also send a new program to make sure the fixup for RGBA surfaces 
> happens */
> -  i915->hardware_dirty |= I915_HW_PROGRAM;
> +  /* XXX there is no program to upload, not sure where this should
> +   * be coming from, so comment this out for now */
> +  //i915->hardware_dirty |= I915_HW_PROGRAM;

Note that for that first clear, you might not have a program bound.
Perhaps related?

> }
>  }
>
> --
> 2.15.1


Re: [Mesa-dev] [PATCH 16/29] anv/cmd_buffer: Pass a subpass id into begin_subpass

2018-01-09 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:06:06PM -0800, Jason Ekstrand wrote:
> This is a bit less awkward than passing in the subpass because it means
> we don't have to extract the subpass id from the subpass.
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 6f2fa0a..56036f7 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -3136,13 +3136,11 @@ cmd_buffer_subpass_sync_fast_clear_values(struct 
> anv_cmd_buffer *cmd_buffer)
> }
>  }
>  
> -
>  static void
>  cmd_buffer_begin_subpass(struct anv_cmd_buffer *cmd_buffer,
> - struct anv_subpass *subpass)
> + uint32_t subpass_id)
>  {
> -   cmd_buffer->state.subpass = subpass;
> -   uint32_t subpass_id = anv_get_subpass_id(_buffer->state);
> +   cmd_buffer->state.subpass = 
> _buffer->state.pass->subpasses[subpass_id];
>  
> cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
>  
> @@ -3222,7 +3220,7 @@ void genX(CmdBeginRenderPass)(
>  
> genX(flush_pipeline_select_3d)(cmd_buffer);
>  
> -   cmd_buffer_begin_subpass(cmd_buffer, pass->subpasses);
> +   cmd_buffer_begin_subpass(cmd_buffer, 0);
>  }
>  
>  void genX(CmdNextSubpass)(
> @@ -3236,9 +3234,9 @@ void genX(CmdNextSubpass)(
>  
> assert(cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY);
>  
> +   uint32_t prev_subpass = anv_get_subpass_id(_buffer->state);

The prev_ prefix confused me a little. Maybe ending_subpass_id?
Either way, this patch is
Reviewed-by: Nanley Chery 
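
To make the naming point concrete, here is a small sketch of the end/begin sequence using an index rather than a pointer. The struct and function names are hypothetical stand-ins, and it assumes the subpass id is simply the subpass's offset within the pass's subpass array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structures. */
struct sketch_subpass { int unused; };

struct sketch_pass {
   uint32_t subpass_count;
   struct sketch_subpass *subpasses;
};

struct sketch_state {
   struct sketch_pass *pass;
   struct sketch_subpass *subpass;
};

/* Assumed behaviour: the id is the element offset in the pass's array. */
uint32_t sketch_get_subpass_id(const struct sketch_state *state)
{
   return (uint32_t)(state->subpass - state->pass->subpasses);
}

/* Taking the id directly means no pointer-to-index conversion is needed. */
void sketch_begin_subpass(struct sketch_state *state, uint32_t subpass_id)
{
   state->subpass = &state->pass->subpasses[subpass_id];
}

void sketch_next_subpass(struct sketch_state *state)
{
   /* Capture the id of the subpass that is ending before moving on. */
   uint32_t ending_subpass_id = sketch_get_subpass_id(state);

   /* end-of-subpass work would go here */

   sketch_begin_subpass(state, ending_subpass_id + 1);
}
```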

> cmd_buffer_end_subpass(cmd_buffer);
> -
> -   cmd_buffer_begin_subpass(cmd_buffer, cmd_buffer->state.subpass + 1);
> +   cmd_buffer_begin_subpass(cmd_buffer, prev_subpass + 1);
>  }
>  
>  void genX(CmdEndRenderPass)(
> -- 
> 2.5.0.400.gff86faf
> 


Re: [Mesa-dev] [PATCH 15/29] anv/cmd_buffer: Add begin/end_subpass helpers

2018-01-09 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:06:05PM -0800, Jason Ekstrand wrote:
> Having begin/end_subpass is a bit nicer than the begin/next/end hooks
> that Vulkan gives us.
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 55 
> +-
>  1 file changed, 31 insertions(+), 24 deletions(-)
> 

This patch is
Reviewed-by: Nanley Chery 

> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index bbe97f5..6f2fa0a 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -3138,10 +3138,11 @@ cmd_buffer_subpass_sync_fast_clear_values(struct 
> anv_cmd_buffer *cmd_buffer)
>  
>  
>  static void
> -genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer *cmd_buffer,
> - struct anv_subpass *subpass)
> +cmd_buffer_begin_subpass(struct anv_cmd_buffer *cmd_buffer,
> + struct anv_subpass *subpass)
>  {
> cmd_buffer->state.subpass = subpass;
> +   uint32_t subpass_id = anv_get_subpass_id(_buffer->state);
>  
> cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
>  
> @@ -3155,6 +3156,10 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer 
> *cmd_buffer,
> if (GEN_GEN == 7)
>cmd_buffer->state.vb_dirty |= ~0;
>  
> +   /* Accumulate any subpass flushes that need to happen before the subpass 
> */
> +   cmd_buffer->state.pending_pipe_bits |=
> +  cmd_buffer->state.pass->subpass_flushes[subpass_id];
> +
> /* Perform transitions to the subpass layout before any writes have
>  * occurred.
>  */
> @@ -3174,6 +3179,26 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer 
> *cmd_buffer,
> anv_cmd_buffer_clear_subpass(cmd_buffer);
>  }
>  
> +static void
> +cmd_buffer_end_subpass(struct anv_cmd_buffer *cmd_buffer)
> +{
> +   uint32_t subpass_id = anv_get_subpass_id(_buffer->state);
> +
> +   anv_cmd_buffer_resolve_subpass(cmd_buffer);
> +
> +   /* Perform transitions to the final layout after all writes have occurred.
> +*/
> +   cmd_buffer_subpass_transition_layouts(cmd_buffer, true);
> +
> +   /* Accumulate any subpass flushes that need to happen after the subpass.
> +* Yes, they do get accumulated twice in the NextSubpass case but since
> +* genX_CmdNextSubpass just calls end/begin back-to-back, we just end up
> +* ORing the bits in twice so it's harmless.
> +*/
> +   cmd_buffer->state.pending_pipe_bits |=
> +  cmd_buffer->state.pass->subpass_flushes[subpass_id + 1];
> +}
> +
>  void genX(CmdBeginRenderPass)(
>  VkCommandBuffer commandBuffer,
>  const VkRenderPassBeginInfo*pRenderPassBegin,
> @@ -3197,10 +3222,7 @@ void genX(CmdBeginRenderPass)(
>  
> genX(flush_pipeline_select_3d)(cmd_buffer);
>  
> -   cmd_buffer->state.pending_pipe_bits |=
> -  cmd_buffer->state.pass->subpass_flushes[0];
> -
> -   genX(cmd_buffer_set_subpass)(cmd_buffer, pass->subpasses);
> +   cmd_buffer_begin_subpass(cmd_buffer, pass->subpasses);
>  }
>  
>  void genX(CmdNextSubpass)(
> @@ -3214,17 +3236,9 @@ void genX(CmdNextSubpass)(
>  
> assert(cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY);
>  
> -   anv_cmd_buffer_resolve_subpass(cmd_buffer);
> -
> -   /* Perform transitions to the final layout after all writes have occurred.
> -*/
> -   cmd_buffer_subpass_transition_layouts(cmd_buffer, true);
> -
> -   uint32_t subpass_id = anv_get_subpass_id(_buffer->state);
> -   cmd_buffer->state.pending_pipe_bits |=
> -  cmd_buffer->state.pass->subpass_flushes[subpass_id];
> +   cmd_buffer_end_subpass(cmd_buffer);
>  
> -   genX(cmd_buffer_set_subpass)(cmd_buffer, cmd_buffer->state.subpass + 1);
> +   cmd_buffer_begin_subpass(cmd_buffer, cmd_buffer->state.subpass + 1);
>  }
>  
>  void genX(CmdEndRenderPass)(
> @@ -3235,14 +3249,7 @@ void genX(CmdEndRenderPass)(
> if (anv_batch_has_error(_buffer->batch))
>return;
>  
> -   anv_cmd_buffer_resolve_subpass(cmd_buffer);
> -
> -   /* Perform transitions to the final layout after all writes have occurred.
> -*/
> -   cmd_buffer_subpass_transition_layouts(cmd_buffer, true);
> -
> -   cmd_buffer->state.pending_pipe_bits |=
> -  
> cmd_buffer->state.pass->subpass_flushes[cmd_buffer->state.pass->subpass_count];
> +   cmd_buffer_end_subpass(cmd_buffer);
>  
> cmd_buffer->state.hiz_enabled = false;
>  
> -- 
> 2.5.0.400.gff86faf
> 


[Mesa-dev] [PATCH] i915g: fix crashes with wined3d

2018-01-09 Thread Christopher Egert
I'm not too familiar with gallium3d, but this fixes
crashes with 3DMark2001 and GTA3 in wine-staging.

This should be fixed properly in the future.

Signed-off-by: Christopher Egert 
---
 src/gallium/drivers/i915/i915_clear.c| 3 ++-
 src/gallium/drivers/i915/i915_state_static.c | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/i915/i915_clear.c 
b/src/gallium/drivers/i915/i915_clear.c
index a1af789104..7a6f3fbb9a 100644
--- a/src/gallium/drivers/i915/i915_clear.c
+++ b/src/gallium/drivers/i915/i915_clear.c
@@ -250,7 +250,8 @@ i915_clear_render(struct pipe_context *pipe, unsigned 
buffers,
 {
struct i915_context *i915 = i915_context(pipe);

-   if (i915->dirty)
+   /* XXX not sure why this happens, but it works around one crash */
+   if (i915->dirty || (i915->hardware_dirty == ~0))
   i915_update_derived(i915);

i915_clear_emit(pipe, buffers, color, depth, stencil,
diff --git a/src/gallium/drivers/i915/i915_state_static.c 
b/src/gallium/drivers/i915/i915_state_static.c
index 88b418b1ac..48bb137019 100644
--- a/src/gallium/drivers/i915/i915_state_static.c
+++ b/src/gallium/drivers/i915/i915_state_static.c
@@ -244,7 +244,9 @@ static void update_dst_buf_vars(struct i915_context *i915)
   i915->current.target_fixup_format = need_fixup;
   i915->current.fixup_swizzle = fixup;
   /* we also send a new program to make sure the fixup for RGBA surfaces 
happens */
-  i915->hardware_dirty |= I915_HW_PROGRAM;
+  /* XXX there is no program to upload, not sure where this should
+   * be coming from, so comment this out for now */
+  //i915->hardware_dirty |= I915_HW_PROGRAM;
}
 }

-- 
2.15.1


Re: [Mesa-dev] [PATCH 14/29] anv/cmd_buffer: Apply subpass flushes before set_subpass

2018-01-09 Thread Nanley Chery
On Mon, Nov 27, 2017 at 07:06:04PM -0800, Jason Ekstrand wrote:
> This seems slightly more correct because it means that the flushes
> happen before any clears or resolves implied by the subpass transition.
> ---
>  src/intel/vulkan/genX_cmd_buffer.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

Makes sense. This patch is
Reviewed-by: Nanley Chery 

> diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 2d47179..bbe97f5 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -3197,10 +3197,10 @@ void genX(CmdBeginRenderPass)(
>  
> genX(flush_pipeline_select_3d)(cmd_buffer);
>  
> -   genX(cmd_buffer_set_subpass)(cmd_buffer, pass->subpasses);
> -
> cmd_buffer->state.pending_pipe_bits |=
>cmd_buffer->state.pass->subpass_flushes[0];
> +
> +   genX(cmd_buffer_set_subpass)(cmd_buffer, pass->subpasses);
>  }
>  
>  void genX(CmdNextSubpass)(
> @@ -3220,11 +3220,11 @@ void genX(CmdNextSubpass)(
>  */
> cmd_buffer_subpass_transition_layouts(cmd_buffer, true);
>  
> -   genX(cmd_buffer_set_subpass)(cmd_buffer, cmd_buffer->state.subpass + 1);
> -
> uint32_t subpass_id = anv_get_subpass_id(_buffer->state);
> cmd_buffer->state.pending_pipe_bits |=
>cmd_buffer->state.pass->subpass_flushes[subpass_id];
> +
> +   genX(cmd_buffer_set_subpass)(cmd_buffer, cmd_buffer->state.subpass + 1);
>  }
>  
>  void genX(CmdEndRenderPass)(
> -- 
> 2.5.0.400.gff86faf
> 


[Mesa-dev] [PATCH v2 1/3] util/crc32: don't drop the const qualifier

2018-01-09 Thread Grazvydas Ignotas
Signed-off-by: Grazvydas Ignotas 
---
 src/util/crc32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index 44d637c..f2e01c6 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -109,11 +109,11 @@ util_crc32_table[256] = {
  * @sa http://www.w3.org/TR/PNG/#D-CRCAppendix
  */
 uint32_t
 util_hash_crc32(const void *data, size_t size)
 {
-   uint8_t *p = (uint8_t *)data;
+   const uint8_t *p = data;
uint32_t crc = 0x;
  
while (size--)
   crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

-- 
2.7.4



[Mesa-dev] [PATCH v2 2/3] android, configure, meson: define HAVE_ZLIB

2018-01-09 Thread Grazvydas Ignotas
The next change wants to use some optional zlib functionality; however,
not all platforms currently use zlib. Based on Jordan Justen's earlier
patches and their review feedback.

Signed-off-by: Grazvydas Ignotas 
---
 Android.common.mk | 1 +
 configure.ac  | 1 +
 meson.build   | 1 +
 3 files changed, 3 insertions(+)

diff --git a/Android.common.mk b/Android.common.mk
index d9f871c..52dc7bf 100644
--- a/Android.common.mk
+++ b/Android.common.mk
@@ -68,10 +68,11 @@ LOCAL_CFLAGS += \
-DHAVE___BUILTIN_UNREACHABLE \
-DHAVE_PTHREAD=1 \
-DHAVE_DLADDR \
-DHAVE_DL_ITERATE_PHDR \
-DHAVE_LINUX_FUTEX_H \
+   -DHAVE_ZLIB \
-DMAJOR_IN_SYSMACROS \
-fvisibility=hidden \
-Wno-sign-compare
 
 LOCAL_CPPFLAGS += \
diff --git a/configure.ac b/configure.ac
index 79f275d..e236a3c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -904,10 +904,11 @@ esac
 dnl See if posix_memalign is available
 AC_CHECK_FUNC([posix_memalign], [DEFINES="$DEFINES -DHAVE_POSIX_MEMALIGN"])
 
 dnl Check for zlib
 PKG_CHECK_MODULES([ZLIB], [zlib >= $ZLIB_REQUIRED])
+DEFINES="$DEFINES -DHAVE_ZLIB"
 
 dnl Check for pthreads
 AX_PTHREAD
 if test "x$ax_pthread_ok" = xno; then
 AC_MSG_ERROR([Building mesa on this platform requires pthreads])
diff --git a/meson.build b/meson.build
index 77e4e89..ae31cdd 100644
--- a/meson.build
+++ b/meson.build
@@ -941,10 +941,11 @@ if dep_libdrm.found()
   endif
 endif
 
 # TODO: some of these may be conditional
 dep_zlib = dependency('zlib', version : '>= 1.2.3')
+pre_args += '-DHAVE_ZLIB'
 dep_thread = dependency('threads')
 if dep_thread.found() and host_machine.system() != 'windows'
   pre_args += '-DHAVE_PTHREAD'
 endif
 if with_amd_vk or with_gallium_radeonsi or with_gallium_r600 or 
with_gallium_opencl
-- 
2.7.4



[Mesa-dev] [PATCH v2 3/3] util: use faster zlib's CRC32 implementation

2018-01-09 Thread Grazvydas Ignotas
zlib provides a faster slice-by-4 CRC32 implementation than the
traditional single byte lookup one used by mesa. As most supported
platforms now link zlib unconditionally, we can easily use it.

Improvement for a 1MB buffer (avg MB/s, n=100, zlib 1.2.8):

CPU        mesa  zlib  improvement
i5-6600K    443  1443  225% +/- 2.1%
C2D E4500   403  1175  191% +/- 0.9%

It has been verified that the calculation results stay the same after
this change.
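
As a rough illustration of why inverting zlib's result can leave the output unchanged, the sketch below compares a conventional CRC-32 (start from all ones, invert at the end) with a variant that skips the final inversion. It assumes the existing loop starts from all ones and applies no final inversion, which is what the `~crc32(0, ...)` equivalence implies. The function names are hypothetical and the bitwise core stands in for the table lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32 core (polynomial 0xEDB88320), with no
 * pre- or post-inversion.  A 256-entry table version computes the
 * same thing one byte at a time.
 */
static uint32_t sketch_crc32_update(uint32_t crc, const void *data, size_t size)
{
   const uint8_t *p = data;

   while (size--) {
      crc ^= *p++;
      for (int bit = 0; bit < 8; bit++)
         crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
   }
   return crc;
}

/* Conventional CRC-32: start from all ones and invert the result. */
uint32_t sketch_crc32_standard(const void *data, size_t size)
{
   return ~sketch_crc32_update(0xffffffffu, data, size);
}

/* Variant without the final inversion (assumed to match the table loop). */
uint32_t sketch_crc32_no_final_xor(const void *data, size_t size)
{
   return sketch_crc32_update(0xffffffffu, data, size);
}
```

For any input, the second function returns the bitwise complement of the first, so a caller that wants the un-inverted value from a conventional implementation only needs to apply `~` once.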

Signed-off-by: Grazvydas Ignotas 
---
v2: drop the size threshold check because the size is unlikely to be
that low for the things mesa typically hashes

 src/util/crc32.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/util/crc32.c b/src/util/crc32.c
index f2e01c6..9edd3e1 100644
--- a/src/util/crc32.c
+++ b/src/util/crc32.c
@@ -31,10 +31,13 @@
  * 
  * @author Jose Fonseca
  */
 
 
+#ifdef HAVE_ZLIB
+#include 
+#endif
 #include "crc32.h"
 
 
 static const uint32_t 
 util_crc32_table[256] = {
@@ -112,10 +115,19 @@ uint32_t
 util_hash_crc32(const void *data, size_t size)
 {
const uint8_t *p = data;
uint32_t crc = 0x;
  
+#ifdef HAVE_ZLIB
+   /* zlib's uInt is always "unsigned int" while size_t can be 64bit.
+* Since 1.2.9 there's crc32_z that takes size_t, but use the more
+* available function to avoid build system complications.
+*/
+   if ((uInt)size == size)
+  return ~crc32(0, data, size);
+#endif
+
while (size--)
   crc = util_crc32_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);

return crc;
 }
-- 
2.7.4



Re: [Mesa-dev] [PATCH 1/2] amd/common: do not rely on the pipeline for the push constants logic

2018-01-09 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Tue, Jan 9, 2018 at 6:09 PM, Samuel Pitoiset
 wrote:
> It makes more sense to rely on nir_intrinsic_load_push_constant
> instead of the pipeline layout.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/common/ac_nir_to_llvm.c |  6 +++---
>  src/amd/common/ac_shader_info.c | 10 +-
>  src/amd/common/ac_shader_info.h |  2 +-
>  3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index 70876cfc69..54edeff983 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -597,7 +597,7 @@ static void allocate_user_sgprs(struct 
> nir_to_llvm_context *ctx,
> break;
> }
>
> -   if (ctx->shader_info->info.needs_push_constants)
> +   if (ctx->shader_info->info.loads_push_constants)
> user_sgpr_info->sgpr_count += 2;
>
> uint32_t remaining_sgprs = 16 - user_sgpr_info->sgpr_count;
> @@ -638,7 +638,7 @@ declare_global_input_sgprs(struct nir_to_llvm_context 
> *ctx,
> add_array_arg(args, const_array(type, 32), desc_sets);
> }
>
> -   if (ctx->shader_info->info.needs_push_constants) {
> +   if (ctx->shader_info->info.loads_push_constants) {
> /* 1 for push constants and dynamic descriptors */
> add_array_arg(args, type, >push_constants);
> }
> @@ -729,7 +729,7 @@ set_global_input_locs(struct nir_to_llvm_context *ctx, 
> gl_shader_stage stage,
> ctx->shader_info->need_indirect_descriptor_sets = true;
> }
>
> -   if (ctx->shader_info->info.needs_push_constants) {
> +   if (ctx->shader_info->info.loads_push_constants) {
> set_loc_shader(ctx, AC_UD_PUSH_CONSTANTS, user_sgpr_idx, 2);
> }
>  }
> diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
> index 27896a26bb..443980c7d1 100644
> --- a/src/amd/common/ac_shader_info.c
> +++ b/src/amd/common/ac_shader_info.c
> @@ -76,6 +76,9 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, struct 
> ac_shader_info *info)
> case nir_intrinsic_load_primitive_id:
> info->uses_prim_id = true;
> break;
> +   case nir_intrinsic_load_push_constant:
> +   info->loads_push_constants = true;
> +   break;
> case nir_intrinsic_vulkan_resource_index:
> info->desc_set_used_mask |= (1 << 
> nir_intrinsic_desc_set(instr));
> break;
> @@ -154,11 +157,8 @@ ac_nir_shader_info_pass(struct nir_shader *nir,
>  {
> struct nir_function *func = (struct nir_function 
> *)exec_list_get_head(>functions);
>
> -   info->needs_push_constants = false;
> -   if ((options->layout->push_constant_size &&
> -options->layout->push_constant_stages & (1 << nir->info.stage)) 
> ||
> -   options->layout->dynamic_offset_count)
> -   info->needs_push_constants = true;
> +   if (options->layout->dynamic_offset_count)
> +   info->loads_push_constants = true;
>
> nir_foreach_variable(variable, >inputs)
> gather_info_input_decl(nir, options, variable, info);
> diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
> index 437859f891..9c9a8473a4 100644
> --- a/src/amd/common/ac_shader_info.h
> +++ b/src/amd/common/ac_shader_info.h
> @@ -28,7 +28,7 @@ struct nir_shader;
>  struct ac_nir_compiler_options;
>
>  struct ac_shader_info {
> -   bool needs_push_constants;
> +   bool loads_push_constants;
> uint32_t desc_set_used_mask;
> bool needs_multiview_view_index;
> bool uses_invocation_id;
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH 3/3] radv/gfx9: calculate the number of ES VGPRs for merged shaders

2018-01-09 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

for the series.

On Tue, Jan 9, 2018 at 4:01 PM, Samuel Pitoiset
 wrote:
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_shader.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 58d991e452..6f622dd996 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -416,7 +416,15 @@ radv_fill_shader_variant(struct radv_device *device,
> stage == MESA_SHADER_GEOMETRY) {
> struct ac_shader_info *info = >info.info;
> unsigned es_type = variant->info.gs.es_type;
> -   unsigned gs_vgpr_comp_cnt;
> +   unsigned gs_vgpr_comp_cnt, es_vgpr_comp_cnt;
> +
> +   if (es_type == MESA_SHADER_VERTEX) {
> +   es_vgpr_comp_cnt = variant->info.vs.vgpr_comp_cnt;
> +   } else if (es_type == MESA_SHADER_TESS_EVAL) {
> +   es_vgpr_comp_cnt = 3;
> +   } else {
> +   assert(!"invalid shader ES type");
> +   }
>
> /* If offsets 4, 5 are used, GS_VGPR_COMP_CNT is ignored and
>  * VGPR[0:4] are always loaded.
> @@ -430,9 +438,8 @@ radv_fill_shader_variant(struct radv_device *device,
> else
> gs_vgpr_comp_cnt = 0; /* VGPR0 contains offsets 0, 1 
> */
>
> -   /* TODO: Figure out how many we actually need. */
> variant->rsrc1 |= S_00B228_GS_VGPR_COMP_CNT(gs_vgpr_comp_cnt);
> -   variant->rsrc2 |= S_00B22C_ES_VGPR_COMP_CNT(3) |
> +   variant->rsrc2 |= S_00B22C_ES_VGPR_COMP_CNT(es_vgpr_comp_cnt) |
>   S_00B22C_OC_LDS_EN(es_type == MESA_SHADER_TESS_EVAL);
> } else if (device->physical_device->rad_info.chip_class >= GFX9 &&
> stage == MESA_SHADER_TESS_CTRL)
> --
> 2.15.1
>


Re: [Mesa-dev] [PATCH 2/8] intel/isl: Add support to emit clear value address.

2018-01-09 Thread Nanley Chery
On Mon, Jan 08, 2018 at 04:00:37PM -0800, Jason Ekstrand wrote:
> On Mon, Jan 8, 2018 at 2:29 PM, Nanley Chery  wrote:
> 
> > On Fri, Dec 15, 2017 at 02:53:29PM -0800, Rafael Antognolli wrote:
> > > gen10 can emit the clear color by setting it on a buffer somewhere, and
> > > then adding only the address to the surface state.
> > >
> > > > This commit adds support for that on isl_surf_fill_state, and if that is
> > > requested, skip setting the clear value itself.
> > >
> > > Signed-off-by: Rafael Antognolli 
> > > ---
> > >  src/intel/isl/isl.h   |  9 +
> > >  src/intel/isl/isl_surface_state.c | 15 +++
> > >  2 files changed, 20 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/src/intel/isl/isl.h b/src/intel/isl/isl.h
> > > index e3acb0ec280..c6e1fee27c1 100644
> > > --- a/src/intel/isl/isl.h
> > > +++ b/src/intel/isl/isl.h
> > > @@ -1277,6 +1277,15 @@ struct isl_surf_fill_state_info {
> > >  */
> > > union isl_color_value clear_color;
> > >
> > > +   /**
> > > +* Send only the clear value address
> > > +*
> > > +* If set, we only pass the clear address to the GPU and it will
> > fetch it
> > > +* from wherever it is.
> > > +*/
> > > +   bool use_clear_address;
> > > +   uint64_t clear_address;
> > > +
> > > /**
> > >  * Surface write disables for gen4-5
> > >  */
> > > diff --git a/src/intel/isl/isl_surface_state.c
> > b/src/intel/isl/isl_surface_state.c
> > > index bfb27fa4a44..14741459687 100644
> > > --- a/src/intel/isl/isl_surface_state.c
> > > +++ b/src/intel/isl/isl_surface_state.c
> > > @@ -635,11 +635,18 @@ isl_genX(surf_fill_state_s)(const struct
> > isl_device *dev, void *state,
> > >  #endif
> > >
> > > if (info->aux_usage != ISL_AUX_USAGE_NONE) {
> > > +#if GEN_GEN >= 10
> > > +  s.ClearValueAddressEnable = info->use_clear_address;
> > > +  s.ClearValueAddressHigh = info->clear_address >> 32;
> > > +  s.ClearValueAddressLow = info->clear_address;
> > > +#endif
> > >  #if GEN_GEN >= 9
> > > -  s.RedClearColor = info->clear_color.u32[0];
> > > -  s.GreenClearColor = info->clear_color.u32[1];
> > > -  s.BlueClearColor = info->clear_color.u32[2];
> > > -  s.AlphaClearColor = info->clear_color.u32[3];
> > > +  if (!info->use_clear_address) {
> > > + s.RedClearColor = info->clear_color.u32[0];
> > > + s.GreenClearColor = info->clear_color.u32[1];
> > > + s.BlueClearColor = info->clear_color.u32[2];
> > > + s.AlphaClearColor = info->clear_color.u32[3];
> > > +  }
> >
> > It'd be nice to assert that use_clear_address is false for gen9.
> >
> 
> Yes it would.  How about something like this:
> 
> if (info->use_clear_address) {
> #if GEN_GEN >= 10
>s.ClearValueAddressEnable = true;
>s.ClearValueAddress = info->clear_address;
> #else
>unreachable("Gen9 and earlier do not support indirect clear colors");
> #endif
> } else {
>// Set clear colors
> }
> 
> 

Looks good to me.
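
For reference, a minimal standalone sketch of the shape suggested above, assuming nothing beyond what is quoted in this thread. The struct and function names (sketch_surface_state, sketch_fill_info, sketch_fill_clear) are hypothetical stand-ins, not the isl types; the High/Low field names follow the original patch; the GEN_GEN default exists only so the sketch compiles on its own; and the real hardware bit packing is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef GEN_GEN
#define GEN_GEN 10 /* hypothetical default so the sketch builds standalone */
#endif

/* Hypothetical stand-in for the packed surface state. */
struct sketch_surface_state {
   bool ClearValueAddressEnable;
   uint32_t ClearValueAddressHigh;
   uint32_t ClearValueAddressLow;
   uint32_t ClearColor[4]; /* red, green, blue, alpha */
};

/* Hypothetical stand-in for the relevant isl_surf_fill_state_info fields. */
struct sketch_fill_info {
   bool use_clear_address;
   uint64_t clear_address;
   uint32_t clear_color[4];
};

static void
sketch_fill_clear(struct sketch_surface_state *s,
                  const struct sketch_fill_info *info)
{
   if (info->use_clear_address) {
#if GEN_GEN >= 10
      /* Indirect clear color: program only the address, split in two. */
      s->ClearValueAddressEnable = true;
      s->ClearValueAddressHigh = (uint32_t)(info->clear_address >> 32);
      s->ClearValueAddressLow = (uint32_t)info->clear_address;
#else
      /* Stand-in for unreachable() on gens without the address field. */
      assert(!"Gen9 and earlier do not support indirect clear colors");
#endif
   } else {
      /* Inline clear color: copy the four channels directly. */
      for (int i = 0; i < 4; i++)
         s->ClearColor[i] = info->clear_color[i];
   }
}
```

With this shape the inline color fields are left untouched whenever the address is used, and a gen9 build trips the assert instead of silently ignoring use_clear_address.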

> > -Nanley
> >
> > >  #elif GEN_GEN >= 7
> > >/* Prior to Sky Lake, we only have one bit for the clear color
> > which
> > > * gives us 0 or 1 in whatever the surface's format happens to be.
> > > --
> > > 2.14.3
> > >


Re: [Mesa-dev] [PATCH 3/8] anv: Make the clear state buffer 64 bytes aligned.

2018-01-09 Thread Nanley Chery
On Tue, Jan 09, 2018 at 11:26:26AM -0800, Jason Ekstrand wrote:
> On Tue, Jan 9, 2018 at 10:33 AM, Nanley Chery  wrote:
> 
> > On Mon, Jan 08, 2018 at 04:03:47PM -0800, Jason Ekstrand wrote:
> > > On Mon, Jan 8, 2018 at 3:00 PM, Nanley Chery 
> > wrote:
> > >
> > > > On Fri, Dec 15, 2017 at 02:53:30PM -0800, Rafael Antognolli wrote:
> > > > > On Gen10+, if we use the clear state address field in the surface
> > state
> > > > > instead of the clear color directly, there's a restriction that the
> > > > > address must point to the lower part of a 64 byte cache-line.
> > > > >
> > > > > Signed-off-by: Rafael Antognolli 
> > > > > ---
> > > > >  src/intel/vulkan/anv_private.h | 12 +++-
> > > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> > > > private.h
> > > > > index b7bde4b8ce6..43cbf065724 100644
> > > > > --- a/src/intel/vulkan/anv_private.h
> > > > > +++ b/src/intel/vulkan/anv_private.h
> > > > > @@ -2490,7 +2490,17 @@ anv_fast_clear_state_entry_size(const struct
> > > > anv_device *device)
> > > > >  * GPU memcpy operations.
> > > > >  */
> > > > > assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > > > > -   return device->isl_dev.ss.clear_value_size + 4;
> > > > > +
> > > > > +   const unsigned entry_size = device->isl_dev.ss.clear_value_size
> > + 4;
> > > > > +   /* On Gen10+, we use the clear color address of the surface to
> > point
> > > > to this
> > > > > +* buffer directly. However, according to the bspec:
> > > > > +*
> > > > > +*The memory layout of the clear color pointed to by this
> > > > address is a
> > > > > +*value stored in the lower-order bytes of a 64-byte
> > cache-line.
> > > > > +*
> > > > > +* So add some padding here for Gen10+.
> > > > > +*/
> > > >
> > > > I don't see any indication that the upper bytes may be modified by the
> > > > hardware. For that reason, I think we can assume that the image that
> > > > precedes this entry is at least 64 bytes and avoid padding the entry.
> > >
> > >
> > > I'm not sure what you mean by this.
> > >
> > >
> >
> > My bad, I meant to say:
> >I don't see any indication that the upper bytes of the cacheline may
> >be modified by the hardware. I think we can also assume that the
> >image that precedes this entry is at least 64 bytes. For these
> >reasons we should be able to avoid padding the entry.
> >
> 
> Yes, that means that we are free to put other stuff (such as resolve
> tracking information) after the clear color.  However, for images with
> multiple LODs, we need to have the individual per-LOD entries aligned.
> 
> --Jason
> 
> 

Got it. Thanks.

> > > > > +   return device->info.gen >= 10 ? ALIGN(entry_size, 64) :
> > entry_size;
> > > >
> > > >
> > > > >  }
> > > > >
> > > > >  static inline struct anv_address
> > > > > --
> > > > > 2.14.3
> > > > >


Re: [Mesa-dev] [PATCH 3/8] anv: Make the clear state buffer 64 bytes aligned.

2018-01-09 Thread Nanley Chery
On Mon, Jan 08, 2018 at 04:33:25PM -0800, Rafael Antognolli wrote:
> On Mon, Jan 08, 2018 at 04:03:47PM -0800, Jason Ekstrand wrote:
> > On Mon, Jan 8, 2018 at 3:00 PM, Nanley Chery  wrote:
> > 
> > On Fri, Dec 15, 2017 at 02:53:30PM -0800, Rafael Antognolli wrote:
> > > On Gen10+, if we use the clear state address field in the surface 
> > state
> > > instead of the clear color directly, there's a restriction that the
> > > address must point to the lower part of a 64 byte cache-line.
> > >
> > > Signed-off-by: Rafael Antognolli 
> > > ---
> > >  src/intel/vulkan/anv_private.h | 12 +++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> > private.h
> > > index b7bde4b8ce6..43cbf065724 100644
> > > --- a/src/intel/vulkan/anv_private.h
> > > +++ b/src/intel/vulkan/anv_private.h
> > > @@ -2490,7 +2490,17 @@ anv_fast_clear_state_entry_size(const struct
> > anv_device *device)
> > >  * GPU memcpy operations.
> > >  */
> > > assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > > -   return device->isl_dev.ss.clear_value_size + 4;
> > > +
> > > +   const unsigned entry_size = device->isl_dev.ss.clear_value_size + 
> > 4;
> > > +   /* On Gen10+, we use the clear color address of the surface to 
> > point
> > to this
> > > +* buffer directly. However, according to the bspec:
> > > +*
> > > +*The memory layout of the clear color pointed to by this 
> > address
> > is a
> > > +*value stored in the lower-order bytes of a 64-byte 
> > cache-line.
> > > +*
> > > +* So add some padding here for Gen10+.
> > > +*/
> > 
> > I don't see any indication that the upper bytes may be modified by the
> > hardware. For that reason, I think we can assume that the image that
> > precedes this entry is at least 64 bytes and avoid padding the entry.
> > 
> > 
> > I'm not sure what you mean by this.
> 
> Hmm... maybe my comment is confusing, but the idea is to add padding to
> the entry, so if we have multiple entries (one per level), all of them
> are aligned. I can try to find a better way to guarantee that.
> 
> Without this code, I was hitting the assert in patch 04 on some tests.
> 

Oh, okay. That makes sense. This solution looks good to me. Perhaps we
want the title to mention that we specifically want the buffer entries
to be 64-byte aligned.

> > > +   return device->info.gen >= 10 ? ALIGN(entry_size, 64) : 
> > entry_size;
> > 
> > 
> > >  }
> > >
> > >  static inline struct anv_address
> > > --
> > > 2.14.3
> > >


[Mesa-dev] [PATCH] intel: Add more Coffee Lake PCI IDs

2018-01-09 Thread Anuj Phogat
More Coffee Lake PCI IDs have been added to the spec.

Cc: Rodrigo Vivi 
Signed-off-by: Anuj Phogat 
---
 include/pci_ids/i965_pci_ids.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index 0dd01a4343..9616f7de21 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -167,15 +167,23 @@ CHIPSET(0x3184, glk, "Intel(R) HD Graphics (Geminilake)")
 CHIPSET(0x3185, glk_2x6, "Intel(R) HD Graphics (Geminilake 2x6)")
 CHIPSET(0x3E90, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
 CHIPSET(0x3E93, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
+CHIPSET(0x3E99, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
+CHIPSET(0x3EA1, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
+CHIPSET(0x3EA4, cfl_gt1, "Intel(R) HD Graphics (Coffeelake 2x6 GT1)")
 CHIPSET(0x3E91, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
 CHIPSET(0x3E92, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
 CHIPSET(0x3E96, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
+CHIPSET(0x3E9A, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
 CHIPSET(0x3E9B, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
 CHIPSET(0x3E94, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
+CHIPSET(0x3EA0, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
+CHIPSET(0x3EA3, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
+CHIPSET(0x3EA9, cfl_gt2, "Intel(R) HD Graphics (Coffeelake 3x8 GT2)")
+CHIPSET(0x3EA2, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
+CHIPSET(0x3EA5, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
 CHIPSET(0x3EA6, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
 CHIPSET(0x3EA7, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
 CHIPSET(0x3EA8, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
-CHIPSET(0x3EA5, cfl_gt3, "Intel(R) HD Graphics (Coffeelake 3x8 GT3)")
 CHIPSET(0x5A49, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
 CHIPSET(0x5A4A, cnl_2x8, "Intel(R) HD Graphics (Cannonlake 2x8 GT0.5)")
 CHIPSET(0x5A41, cnl_3x8, "Intel(R) HD Graphics (Cannonlake 3x8 GT1)")
-- 
2.13.6



Re: [Mesa-dev] [PATCH 1/2] i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride

2018-01-09 Thread Scott D Phillips
Matt Turner  writes:

> On Mon, Jan 8, 2018 at 5:01 PM, Scott D Phillips
>  wrote:
>> Matt Turner  writes:
>>
>>> Some cases weren't handled, such as stride 4 which is needed for 64-bit
>>> operations. Presumably fixes the assertion failure mentioned in commit
>>> 2d0457203871 (Revert "i965/fs: Use align1 mode on ternary instructions
>>> on Gen10+") but who can really say since the commit neglected to list
>>> any of them!
>>> ---
>>>  src/intel/compiler/brw_eu_emit.c | 69 
>>> 
>>>  1 file changed, 41 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/src/intel/compiler/brw_eu_emit.c 
>>> b/src/intel/compiler/brw_eu_emit.c
>>> index 85bb6a4cdd..c25d8d6eda 100644
>>> --- a/src/intel/compiler/brw_eu_emit.c
>>> +++ b/src/intel/compiler/brw_eu_emit.c
>>> @@ -673,6 +673,42 @@ get_3src_subreg_nr(struct brw_reg reg)
>>> return reg.subnr / 4;
>>>  }
>>>
>>> +static enum gen10_align1_3src_vertical_stride
>>> +to_3src_align1_vstride(enum brw_vertical_stride vstride)
>>> +{
>>> +   switch (vstride) {
>>> +   case BRW_VERTICAL_STRIDE_0:
>>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_0;
>>> +   case BRW_VERTICAL_STRIDE_2:
>>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_2;
>>> +   case BRW_VERTICAL_STRIDE_4:
>>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_4;
>>> +   case BRW_VERTICAL_STRIDE_8:
>>> +   case BRW_VERTICAL_STRIDE_16:
>>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_8;
>>
>> What is the reasoning for vstride 16 to map to 8 here? Could that cause
>> problems?
>
> Good question. In an exec_size 16 instruction, a 16,16,1 region
> actually reads the same channels in the same order as an 8,8,1 region
> (another confusing thing about regioning is that there are effectively
> duplicates...). That's the most common region, and I seem to recall
> that in some cases the IR contains instructions with a 16,16,1 region.
> Other than in that duplicate case, I don't think we ever use vstride
> 16. As a result, they left it out of the hardware for align1 ternary
> instructions.
>
> If I can get some hardware to test with, it's probably a good idea for
> me to to double check which cases cause us to need to handle vstride
> 16 here. Maybe an assertion that it is the "duplicate" 16,16,1 region
> should be added.

Got it, with the assert or the double check on your todo list, patch is

Reviewed-by: Scott D Phillips 


Re: [Mesa-dev] [PATCH 1/2] i965/fs: Add/use functions to convert to 3src_align1 vstride/hstride

2018-01-09 Thread Matt Turner
On Mon, Jan 8, 2018 at 5:01 PM, Scott D Phillips
 wrote:
> Matt Turner  writes:
>
>> Some cases weren't handled, such as stride 4 which is needed for 64-bit
>> operations. Presumably fixes the assertion failure mentioned in commit
>> 2d0457203871 (Revert "i965/fs: Use align1 mode on ternary instructions
>> on Gen10+") but who can really say since the commit neglected to list
>> any of them!
>> ---
>>  src/intel/compiler/brw_eu_emit.c | 69 
>> 
>>  1 file changed, 41 insertions(+), 28 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_eu_emit.c 
>> b/src/intel/compiler/brw_eu_emit.c
>> index 85bb6a4cdd..c25d8d6eda 100644
>> --- a/src/intel/compiler/brw_eu_emit.c
>> +++ b/src/intel/compiler/brw_eu_emit.c
>> @@ -673,6 +673,42 @@ get_3src_subreg_nr(struct brw_reg reg)
>> return reg.subnr / 4;
>>  }
>>
>> +static enum gen10_align1_3src_vertical_stride
>> +to_3src_align1_vstride(enum brw_vertical_stride vstride)
>> +{
>> +   switch (vstride) {
>> +   case BRW_VERTICAL_STRIDE_0:
>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_0;
>> +   case BRW_VERTICAL_STRIDE_2:
>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_2;
>> +   case BRW_VERTICAL_STRIDE_4:
>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_4;
>> +   case BRW_VERTICAL_STRIDE_8:
>> +   case BRW_VERTICAL_STRIDE_16:
>> +  return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_8;
>
> What is the reasoning for vstride 16 to map to 8 here? Could that cause
> problems?

Good question. In an exec_size 16 instruction, a 16,16,1 region
actually reads the same channels in the same order as an 8,8,1 region
(another confusing thing about regioning is that there are effectively
duplicates...). That's the most common region, and I seem to recall
that in some cases the IR contains instructions with a 16,16,1 region.
Other than in that duplicate case, I don't think we ever use vstride
16. As a result, they left it out of the hardware for align1 ternary
instructions.

If I can get some hardware to test with, it's probably a good idea for
me to double check which cases cause us to need to handle vstride
16 here. Maybe an assertion that it is the "duplicate" 16,16,1 region
should be added.
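
To make the "duplicate" claim concrete, here is a small C sketch, assuming the usual region addressing rule in which channel ch of a <vstride; width, hstride> region reads element (ch / width) * vstride + (ch % width) * hstride. The helper names are hypothetical and the strides are plain element counts, not the BRW_*_STRIDE_* enum encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Element offset (in units of the element type) read by channel `ch` of a
 * <vstride; width, hstride> source region: channels fill a row of `width`
 * elements spaced `hstride` apart, then step `vstride` to the next row.
 */
static unsigned
region_element_offset(unsigned ch, unsigned vstride, unsigned width,
                      unsigned hstride)
{
   return (ch / width) * vstride + (ch % width) * hstride;
}

/* Returns true if both regions read the same elements in the same order
 * for an instruction with `exec_size` channels.
 */
static bool
regions_equivalent(unsigned exec_size,
                   unsigned v0, unsigned w0, unsigned h0,
                   unsigned v1, unsigned w1, unsigned h1)
{
   for (unsigned ch = 0; ch < exec_size; ch++) {
      if (region_element_offset(ch, v0, w0, h0) !=
          region_element_offset(ch, v1, w1, h1))
         return false;
   }
   return true;
}
```

Under that rule, regions_equivalent(16, 16, 16, 1, 8, 8, 1) holds because both regions reduce to offset == ch, which is the property the proposed assertion would check before folding vstride 16 into vstride 8.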

>> +   default:
>> +  unreachable("invalid vstride");
>> +   }
>> +}
>> +
>> +
>> +static enum gen10_align1_3src_src_horizontal_stride
>> +to_3src_align1_hstride(enum brw_horizontal_stride hstride)
>> +{
>> +   switch (hstride) {
>> +   case BRW_HORIZONTAL_STRIDE_0:
>> +  return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_0;
>> +   case BRW_HORIZONTAL_STRIDE_1:
>> +  return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_1;
>> +   case BRW_HORIZONTAL_STRIDE_2:
>> +  return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_2;
>> +   case BRW_HORIZONTAL_STRIDE_4:
>> +  return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_4;
>> +   default:
>> +  unreachable("invalid hstride");
>> +   }
>> +}
>> +
>>  static brw_inst *
>>  brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
>>   struct brw_reg src0, struct brw_reg src1, struct brw_reg src2)
>> @@ -721,41 +757,18 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, 
>> struct brw_reg dest,
>>brw_inst_set_3src_a1_src1_type(devinfo, inst, src1.type);
>>brw_inst_set_3src_a1_src2_type(devinfo, inst, src2.type);
>>
>> -  assert((src0.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -  src0.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> - (src0.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -  src0.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -  assert((src1.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -  src1.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> - (src1.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -  src1.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -  assert((src2.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -  src2.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> - (src2.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -  src2.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -
>
> Were 0,x,0 and 8,x,1 just a list of expected cases before or was it
> toward some restriction? I'm not seeing anything in the documentation
> that implies a restriction, so I'm guessing the former.

No restriction; these were just the most common cases (and, I thought at
the time, the only ones). 8,8,1 is the regular "read every channel in order"
region, and 0,1,0 is the "read a single channel repeatedly" region.


Re: [Mesa-dev] [PATCH 3/8] anv: Make the clear state buffer 64 bytes aligned.

2018-01-09 Thread Jason Ekstrand
On Tue, Jan 9, 2018 at 10:33 AM, Nanley Chery  wrote:

> On Mon, Jan 08, 2018 at 04:03:47PM -0800, Jason Ekstrand wrote:
> > On Mon, Jan 8, 2018 at 3:00 PM, Nanley Chery 
> wrote:
> >
> > > On Fri, Dec 15, 2017 at 02:53:30PM -0800, Rafael Antognolli wrote:
> > > > On Gen10+, if we use the clear state address field in the surface
> state
> > > > instead of the clear color directly, there's a restriction that the
> > > > address must point to the lower part of a 64 byte cache-line.
> > > >
> > > > Signed-off-by: Rafael Antognolli 
> > > > ---
> > > >  src/intel/vulkan/anv_private.h | 12 +++-
> > > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> > > private.h
> > > > index b7bde4b8ce6..43cbf065724 100644
> > > > --- a/src/intel/vulkan/anv_private.h
> > > > +++ b/src/intel/vulkan/anv_private.h
> > > > @@ -2490,7 +2490,17 @@ anv_fast_clear_state_entry_size(const struct
> > > anv_device *device)
> > > >  * GPU memcpy operations.
> > > >  */
> > > > assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > > > -   return device->isl_dev.ss.clear_value_size + 4;
> > > > +
> > > > +   const unsigned entry_size = device->isl_dev.ss.clear_value_size
> + 4;
> > > > +   /* On Gen10+, we use the clear color address of the surface to
> point
> > > to this
> > > > +* buffer directly. However, according to the bspec:
> > > > +*
> > > > +*The memory layout of the clear color pointed to by this
> > > address is a
> > > > +*value stored in the lower-order bytes of a 64-byte
> cache-line.
> > > > +*
> > > > +* So add some padding here for Gen10+.
> > > > +*/
> > >
> > > I don't see any indication that the upper bytes may be modified by the
> > > hardware. For that reason, I think we can assume that the image that
> > > precedes this entry is at least 64 bytes and avoid padding the entry.
> >
> >
> > I'm not sure what you mean by this.
> >
> >
>
> My bad, I meant to say:
>I don't see any indication that the upper bytes of the cacheline may
>be modified by the hardware. I think we can also assume that the
>image that precedes this entry is at least 64 bytes. For these
>reasons we should be able to avoid padding the entry.
>

Yes, that means that we are free to put other stuff (such as resolve
tracking information) after the clear color.  However, for images with
multiple LODs, we need to have the individual per-LOD entries aligned.
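
A rough C sketch of why padding the entry size gives aligned per-LOD entries, assuming the entries are laid out back to back from a base that is itself 64-byte aligned. The function names and the clear value size used in the usage note are hypothetical; only the "+ 4" and the align-to-64 on gen >= 10 come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to a multiple of a, where a is a power of two. */
static inline uint32_t
align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Size of one fast-clear state entry: the clear value plus one dword,
 * padded to a full cache line on gen10+ as in the patch.
 */
static uint32_t
fast_clear_entry_size(uint32_t clear_value_size, int gen)
{
   assert(clear_value_size % 4 == 0);
   const uint32_t entry_size = clear_value_size + 4;
   return gen >= 10 ? align_u32(entry_size, 64) : entry_size;
}

/* Offset of the entry for a given LOD, relative to the first entry. */
static uint32_t
fast_clear_entry_offset(uint32_t clear_value_size, int gen, uint32_t level)
{
   return level * fast_clear_entry_size(clear_value_size, gen);
}
```

With a 16-byte clear value, for example, a gen9 entry is 20 bytes, so LOD 1 starts at offset 20; on gen10 every entry is 64 bytes, so each LOD's clear color lands at the start of its own cache line.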

--Jason


> > > > +   return device->info.gen >= 10 ? ALIGN(entry_size, 64) :
> entry_size;
> > >
> > >
> > > >  }
> > > >
> > > >  static inline struct anv_address
> > > > --
> > > > 2.14.3
> > > >


Re: [Mesa-dev] [PATCH 3/8] anv: Make the clear state buffer 64 bytes aligned.

2018-01-09 Thread Nanley Chery
On Mon, Jan 08, 2018 at 04:03:47PM -0800, Jason Ekstrand wrote:
> On Mon, Jan 8, 2018 at 3:00 PM, Nanley Chery  wrote:
> 
> > On Fri, Dec 15, 2017 at 02:53:30PM -0800, Rafael Antognolli wrote:
> > > On Gen10+, if we use the clear state address field in the surface state
> > > instead of the clear color directly, there's a restriction that the
> > > address must point to the lower part of a 64 byte cache-line.
> > >
> > > Signed-off-by: Rafael Antognolli 
> > > ---
> > >  src/intel/vulkan/anv_private.h | 12 +++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> > private.h
> > > index b7bde4b8ce6..43cbf065724 100644
> > > --- a/src/intel/vulkan/anv_private.h
> > > +++ b/src/intel/vulkan/anv_private.h
> > > @@ -2490,7 +2490,17 @@ anv_fast_clear_state_entry_size(const struct
> > anv_device *device)
> > >  * GPU memcpy operations.
> > >  */
> > > assert(device->isl_dev.ss.clear_value_size % 4 == 0);
> > > -   return device->isl_dev.ss.clear_value_size + 4;
> > > +
> > > +   const unsigned entry_size = device->isl_dev.ss.clear_value_size + 4;
> > > +   /* On Gen10+, we use the clear color address of the surface to point
> > to this
> > > +* buffer directly. However, according to the bspec:
> > > +*
> > > +*The memory layout of the clear color pointed to by this
> > address is a
> > > +*value stored in the lower-order bytes of a 64-byte cache-line.
> > > +*
> > > +* So add some padding here for Gen10+.
> > > +*/
> >
> > I don't see any indication that the upper bytes may be modified by the
> > hardware. For that reason, I think we can assume that the image that
> > precedes this entry is at least 64 bytes and avoid padding the entry.
> 
> 
> I'm not sure what you mean by this.
> 
> 

My bad, I meant to say: 
   I don't see any indication that the upper bytes of the cacheline may
   be modified by the hardware. I think we can also assume that the
   image that precedes this entry is at least 64 bytes. For these
   reasons we should be able to avoid padding the entry.

> > > +   return device->info.gen >= 10 ? ALIGN(entry_size, 64) : entry_size;
> >
> >
> > >  }
> > >
> > >  static inline struct anv_address
> > > --
> > > 2.14.3
> > >


Re: [Mesa-dev] [PATCH] intel: Apply Geminilake "Barrier Mode" workaround.

2018-01-09 Thread Kenneth Graunke
On Monday, January 8, 2018 3:00:30 PM PST Rafael Antognolli wrote:
> On Thu, Jan 04, 2018 at 11:36:48AM -0800, Kenneth Graunke wrote:
> > Apparently, Geminilake requires you to whack a chicken bit to select
> > either compute or tessellation mode for barriers.  The recommendation
> > is to switch between them at PIPELINE_SELECT time.
> > 
> > We may not need to do this all the time, but I don't know that it hurts
> > either.  PIPELINE_SELECT is already a pretty giant stall.
> > 
> > This appears to fix hangs in tessellation control shaders with barriers
> > on Geminilake.  Note that this requires a corresponding kernel change,
> > 
> > drm/i915: Whitelist SLICE_COMMON_ECO_CHICKEN1 on Geminilake.
> > 
> > in order for the register write to actually happen.  Without an updated
> > kernel, this register write will be noop'd and the fix will not work.
> > ---
> >  src/intel/genxml/gen9.xml  |  8 
> >  src/intel/vulkan/genX_cmd_buffer.c | 21 +
> >  src/mesa/drivers/dri/i965/brw_defines.h|  5 +
> >  src/mesa/drivers/dri/i965/brw_misc_state.c | 15 +++
> >  4 files changed, 49 insertions(+)
> > 
> > diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
> > index 1422463693d..019d264fb70 100644
> > --- a/src/intel/genxml/gen9.xml
> > +++ b/src/intel/genxml/gen9.xml
> > @@ -3710,6 +3710,14 @@
> >   > type="bool"/>
> >
> >  
> > +  
> > +
> 
> Kind of nitpicking, but this field means more than a simple
> enable/disable kind of boolean. In other similar places we used "uint"
> instead of "bool" to represent that, especially since you are assigning
> value names to it. For instance, Floating Point Mode is like that, but
> there are other examples. Maybe we should decide one way or the other and
> make it consistent.
> 
> Regardless of that, this patch is
> 
> Reviewed-by: Rafael Antognolli 

Oops...I agree, uint makes more sense than bool for values with
enumerations.  I think that was just a mistake on my part.  I
changed it to uint before pushing.

Thanks for reviewing this!

