[Mesa-dev] [PATCH] util/rb_tree: Fix a compiler warning

2018-07-11 Thread Jason Ekstrand
Gcc 8 warns "cast to pointer from integer of different size" in 32-bit
builds.
---
 src/util/rb_tree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/rb_tree.h b/src/util/rb_tree.h
index e8750b32d0e..c77e9255ea2 100644
--- a/src/util/rb_tree.h
+++ b/src/util/rb_tree.h
@@ -55,7 +55,7 @@ struct rb_node {
 static inline struct rb_node *
 rb_node_parent(struct rb_node *n)
 {
-return (struct rb_node *)(n->parent & ~1ull);
+return (struct rb_node *)(n->parent & ~(uintptr_t)1);
 }
 
 /** A red-black tree
-- 
2.17.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
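
For readers unfamiliar with the idiom, here is a minimal sketch of why the
mask type matters in the rb_tree patch above. It assumes, as the hunk
suggests, that the parent field is a uintptr_t whose lowest bit carries a
one-bit flag. The names demo_node, demo_set_parent, demo_parent and
demo_flag are hypothetical and are not the Mesa API.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct rb_node: the parent pointer and a
 * one-bit flag share a single pointer-sized integer. */
struct demo_node {
   uintptr_t parent;
};

static inline void
demo_set_parent(struct demo_node *n, struct demo_node *p, unsigned flag)
{
   /* Nodes are at least 2-byte aligned, so bit 0 of the address is free. */
   n->parent = (uintptr_t)p | (flag & 1u);
}

static inline struct demo_node *
demo_parent(const struct demo_node *n)
{
   /* ~(uintptr_t)1 has exactly pointer width.  With ~1ull the expression
    * is promoted to unsigned long long, so on a 32-bit target the cast to
    * a pointer starts from a 64-bit integer, which is what GCC warns
    * about. */
   return (struct demo_node *)(n->parent & ~(uintptr_t)1);
}

static inline unsigned
demo_flag(const struct demo_node *n)
{
   return (unsigned)(n->parent & 1u);
}
```

The result is the same on 64-bit builds; the cast only changes the width of
the intermediate integer.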


[Mesa-dev] [PATCH] anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV

2018-07-11 Thread Jason Ekstrand
We've had several broadwell hangs that have come down to this bit just
not working correctly.  Most recently, we've had a pile of hangs
reported with apps running under DXVK:

https://github.com/doitsujin/dxvk/issues/469

Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.

cc: mesa-sta...@lists.freedesktop.org
---
 src/intel/vulkan/genX_pipeline.c | 53 +---
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index aa63ad0e097..4004a36f74f 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1389,6 +1389,28 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
 wm.EarlyDepthStencilControl = EDSC_NORMAL;
  }
 
+#if GEN_GEN >= 8
+ /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
+  * doesn't take into account KillPixels when no depth or stencil
+  * writes are enabled.  In order for occlusion queries to work
+  * correctly with no attachments, we need to force-enable PS thread
+  * dispatch.
+  *
+  * The BDW docs are pretty clear that this bit isn't validated
+  * and probably shouldn't be used in production:
+  *
+  *"This must always be set to Normal. This field should not be
+  *tested for functional validation."
+  *
+  * Unfortunately, however, the other mechanism we have for doing this
+  * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
+  * Given two bad options, we choose the one which works.
+  */
+ if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+ !has_color_buffer_write_enabled(pipeline, blend))
+wm.ForceThreadDispatchEnable = ForceON;
+#endif
+
  wm.BarycentricInterpolationMode =
 wm_prog_data->barycentric_interp_modes;
 
@@ -1583,37 +1605,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
   ps.PixelShaderKillsPixel = subpass->has_ds_self_dep ||
  wm_prog_data->uses_kill;
 
-  /* The stricter cross-primitive coherency guarantees that the hardware
-   * gives us with the "Accesses UAV" bit set for at least one shader stage
-   * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
-   * redundant within the current image, atomic counter and SSBO GL APIs,
-   * which all have very loose ordering and coherency requirements and
-   * generally rely on the application to insert explicit barriers when a
-   * shader invocation is expected to see the memory writes performed by the
-   * invocations of some previous primitive.  Regardless of the value of
-   * "UAV coherency required", the "Accesses UAV" bits will implicitly cause
-   * an in most cases useless DC flush when the lowermost stage with the bit
-   * set finishes execution.
-   *
-   * It would be nice to disable it, but in some cases we can't because on
-   * Gen8+ it also has an influence on rasterization via the PS UAV-only
-   * signal (which could be set independently from the coherency mechanism
-   * in the 3DSTATE_WM command on Gen7), and because in some cases it will
-   * determine whether the hardware skips execution of the fragment shader
-   * or not via the ThreadDispatchEnable signal.  However if we know that
-   * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
-   * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make any
-   * difference so we may just disable it here.
-   *
-   * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
-   * take into account KillPixels when no depth or stencil writes are
-   * enabled. In order for occlusion queries to work correctly with no
-   * attachments, we need to force-enable here.
-   */
-  if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
-  !has_color_buffer_write_enabled(pipeline, blend))
- ps.PixelShaderHasUAV = true;
-
 #if GEN_GEN >= 9
   ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
   ps.PixelShaderPullsBary= wm_prog_data->pulls_bary;
-- 
2.17.1
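
The condition that the anv patch moves from 3DSTATE_PS_EXTRA to 3DSTATE_WM
can be read as a small predicate. The sketch below is only an illustration:
struct demo_ps_info and demo_needs_forced_dispatch are hypothetical names,
and the real driver reads these values from wm_prog_data and the pipeline's
blend state.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs. */
struct demo_ps_info {
   bool has_side_effects;     /* shader writes memory (images, SSBOs, ...) */
   bool uses_kill;            /* shader may discard fragments */
   bool color_write_enabled;  /* at least one color buffer write is enabled */
};

/* Returns true when the pixel shader must run even though the hardware's
 * automatic ThreadDispatchEnable computation could skip it. */
static bool
demo_needs_forced_dispatch(const struct demo_ps_info *ps)
{
   return (ps->has_side_effects || ps->uses_kill) &&
          !ps->color_write_enabled;
}
```

With a color write enabled the predicate is false, since the hardware then
dispatches the shader without any forcing.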



[Mesa-dev] [PATCH 7/9] radeonsi: rework RADEON_PRIO flags to be <= 31

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

This decreases sizeof(struct amdgpu_cs_buffer) from 24 to 16 bytes.
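
A rough illustration of where the saving comes from, and of how a 32-bit
priority mask maps to a kernel priority level. The struct layouts are an
assumption modelled on the description above, not a quote of
struct amdgpu_cs_buffer, and all demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout before the change: the 64-bit mask forces 8-byte
 * alignment of the union and tail padding after `usage`. */
struct demo_buffer_before {
   void *bo;
   union {
      struct { uint64_t priority_usage; } real;
      struct { uint32_t real_idx; } slab;
   } u;
   uint32_t usage;
};

/* Assumed layout after the change: mask and usage pack into 8 bytes. */
struct demo_buffer_after {
   void *bo;
   union {
      struct { uint32_t priority_usage; } real;
      struct { uint32_t real_idx; } slab;
   } u;
   uint32_t usage;
};

/* Position of the highest set bit plus one, or 0 for an empty mask
 * (the contract of Mesa's util_last_bit, reimplemented portably). */
static unsigned
demo_last_bit(uint32_t mask)
{
   unsigned n = 0;
   while (mask) {
      n++;
      mask >>= 1;
   }
   return n;
}

/* Two consecutive RADEON_PRIO_* values share one priority level.
 * The mask must be non-zero, as the winsys asserts. */
static unsigned
demo_kernel_priority(uint32_t priority_usage)
{
   return (demo_last_bit(priority_usage) - 1) / 2;
}
```

On a typical 64-bit ABI these two structs come out at 24 and 16 bytes, which
matches the figures quoted above.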
---
 src/gallium/drivers/radeon/radeon_winsys.h| 39 ++-
 src/gallium/drivers/radeonsi/si_debug.c   |  2 +-
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c |  6 +--
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  4 +-
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c |  2 +-
 src/gallium/winsys/radeon/drm/radeon_drm_cs.h |  2 +-
 6 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_winsys.h 
b/src/gallium/drivers/radeon/radeon_winsys.h
index bcd6831ed35..10c63ae4d82 100644
--- a/src/gallium/drivers/radeon/radeon_winsys.h
+++ b/src/gallium/drivers/radeon/radeon_winsys.h
@@ -108,63 +108,64 @@ enum radeon_value_id {
 RADEON_VRAM_USAGE,
 RADEON_VRAM_VIS_USAGE,
 RADEON_GTT_USAGE,
 RADEON_GPU_TEMPERATURE, /* DRM 2.42.0 */
 RADEON_CURRENT_SCLK,
 RADEON_CURRENT_MCLK,
 RADEON_GPU_RESET_COUNTER, /* DRM 2.43.0 */
 RADEON_CS_THREAD_TIME,
 };
 
-/* Each group of four has the same priority. */
 enum radeon_bo_priority {
+/* Each group of two has the same priority. */
 RADEON_PRIO_FENCE = 0,
 RADEON_PRIO_TRACE,
-RADEON_PRIO_SO_FILLED_SIZE,
+
+RADEON_PRIO_SO_FILLED_SIZE = 2,
 RADEON_PRIO_QUERY,
 
 RADEON_PRIO_IB1 = 4, /* main IB submitted to the kernel */
 RADEON_PRIO_IB2, /* IB executed with INDIRECT_BUFFER */
-RADEON_PRIO_DRAW_INDIRECT,
+
+RADEON_PRIO_DRAW_INDIRECT = 6,
 RADEON_PRIO_INDEX_BUFFER,
 
-RADEON_PRIO_CP_DMA = 12,
+RADEON_PRIO_CP_DMA = 8,
+RADEON_PRIO_BORDER_COLORS,
 
-RADEON_PRIO_CONST_BUFFER = 16,
+RADEON_PRIO_CONST_BUFFER = 10,
 RADEON_PRIO_DESCRIPTORS,
-RADEON_PRIO_BORDER_COLORS,
 
-RADEON_PRIO_SAMPLER_BUFFER = 20,
+RADEON_PRIO_SAMPLER_BUFFER = 12,
 RADEON_PRIO_VERTEX_BUFFER,
 
-RADEON_PRIO_SHADER_RW_BUFFER = 24,
+RADEON_PRIO_SHADER_RW_BUFFER = 14,
 RADEON_PRIO_COMPUTE_GLOBAL,
 
-RADEON_PRIO_SAMPLER_TEXTURE = 28,
+RADEON_PRIO_SAMPLER_TEXTURE = 16,
 RADEON_PRIO_SHADER_RW_IMAGE,
 
-RADEON_PRIO_SAMPLER_TEXTURE_MSAA = 32,
-
-RADEON_PRIO_COLOR_BUFFER = 36,
+RADEON_PRIO_SAMPLER_TEXTURE_MSAA = 18,
+RADEON_PRIO_COLOR_BUFFER,
 
-RADEON_PRIO_DEPTH_BUFFER = 40,
+RADEON_PRIO_DEPTH_BUFFER = 20,
 
-RADEON_PRIO_COLOR_BUFFER_MSAA = 44,
+RADEON_PRIO_COLOR_BUFFER_MSAA = 22,
 
-RADEON_PRIO_DEPTH_BUFFER_MSAA = 48,
+RADEON_PRIO_DEPTH_BUFFER_MSAA = 24,
 
-RADEON_PRIO_SEPARATE_META = 52,
+RADEON_PRIO_SEPARATE_META = 26,
 RADEON_PRIO_SHADER_BINARY, /* the hw can't hide instruction cache misses */
 
-RADEON_PRIO_SHADER_RINGS = 56,
+RADEON_PRIO_SHADER_RINGS = 28,
 
-RADEON_PRIO_SCRATCH_BUFFER = 60,
+RADEON_PRIO_SCRATCH_BUFFER = 30,
 /* 63 is the maximum value */
 };
 
 struct winsys_handle;
 struct radeon_winsys_ctx;
 
 struct radeon_cmdbuf_chunk {
 unsigned cdw;  /* Number of used dwords. */
 unsigned max_dw; /* Maximum number of dwords. */
 uint32_t *buf; /* The base pointer of the chunk. */
@@ -216,21 +217,21 @@ struct radeon_bo_metadata {
 };
 
 enum radeon_feature_id {
 RADEON_FID_R300_HYPERZ_ACCESS, /* ZMask + HiZ */
 RADEON_FID_R300_CMASK_ACCESS,
 };
 
 struct radeon_bo_list_item {
 uint64_t bo_size;
 uint64_t vm_address;
-uint64_t priority_usage; /* mask of (1 << RADEON_PRIO_*) */
+uint32_t priority_usage; /* mask of (1 << RADEON_PRIO_*) */
 };
 
 struct radeon_winsys {
 /**
  * The screen object this winsys was created for
  */
 struct pipe_screen *screen;
 
 /**
  * Decrement the winsys reference count.
diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 50375ce7cbe..d6207e68d12 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -562,21 +562,21 @@ static void si_dump_bo_list(struct si_context *sctx,
(va - previous_va_end) / page_size);
}
}
 
/* Print the buffer. */
fprintf(f, "  %10"PRIu64"0x%013"PRIX64"   
0x%013"PRIX64"   ",
size / page_size, va / page_size, (va + size) / 
page_size);
 
/* Print the usage. */
for (j = 0; j < 64; j++) {
-   if (!(saved->bo_list[i].priority_usage & (1ull << j)))
+   if (!(saved->bo_list[i].priority_usage & (1u << j)))
continue;
 
fprintf(f, "%s%s", !hit ? "" : ", ", 
priority_to_string(j));
hit = true;
}
fprintf(f, "\n");
}
fprintf(f, "\nNote: The holes represent memory not used by the IB.\n"
   "  Other buffers can still be allocated there.\n\n");
 }
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 

[Mesa-dev] [PATCH 2/9] winsys/amdgpu: always update gfx_bo_list_counter

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 9aa489adaa4..77b372d2cea 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -1336,39 +1336,39 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
 if (buffer->bo->is_local)
 continue;
 
  assert(buffer->u.real.priority_usage != 0);
 
  handles[num_handles] = buffer->bo->bo;
  flags[num_handles] = (util_last_bit64(buffer->u.real.priority_usage) - 1) / 4;
 ++num_handles;
   }
 
-  if (acs->ring_type == RING_GFX)
- ws->gfx_bo_list_counter += cs->num_real_buffers;
-
   if (num_handles) {
  r = amdgpu_bo_list_create(ws->dev, num_handles,
handles, flags, &bo_list);
   } else {
  r = 0;
   }
}
 bo_list_error:
 
if (r) {
   fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
   amdgpu_fence_signalled(cs->fence);
   cs->error_code = r;
   goto cleanup;
}
 
+   if (acs->ring_type == RING_GFX)
+  ws->gfx_bo_list_counter += cs->num_real_buffers;
+
if (acs->ctx->num_rejected_cs) {
   r = -ECANCELED;
} else {
   struct drm_amdgpu_cs_chunk chunks[5];
   unsigned num_chunks = 0;
 
   /* Convert from dwords to bytes. */
   cs->ib[IB_MAIN].ib_bytes *= 4;
 
   /* IB */
-- 
2.17.1



[Mesa-dev] [PATCH 5/9] radeonsi: remove non-GFX BO priority flags

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

For a later simplification.
---
 src/gallium/drivers/radeon/radeon_uvd.c | 3 +--
 src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c | 2 +-
 src/gallium/drivers/radeon/radeon_vce.c | 2 +-
 src/gallium/drivers/radeon/radeon_vcn_dec.c | 2 +-
 src/gallium/drivers/radeon/radeon_vcn_enc_1_2.c | 2 +-
 src/gallium/drivers/radeon/radeon_winsys.h  | 5 -
 src/gallium/drivers/radeonsi/si_debug.c | 4 
 src/gallium/drivers/radeonsi/si_dma_cs.c| 6 ++
 8 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c 
b/src/gallium/drivers/radeon/radeon_uvd.c
index dbf3c95175c..923216d77f1 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -109,22 +109,21 @@ static void set_reg(struct ruvd_decoder *dec, unsigned reg, uint32_t val)
 }
 
 /* send a command to the VCPU through the GPCOM registers */
 static void send_cmd(struct ruvd_decoder *dec, unsigned cmd,
 struct pb_buffer* buf, uint32_t off,
 enum radeon_bo_usage usage, enum radeon_bo_domain domain)
 {
int reloc_idx;
 
reloc_idx = dec->ws->cs_add_buffer(dec->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED,
-  domain,
- RADEON_PRIO_UVD);
+  domain, 0);
if (!dec->use_legacy) {
uint64_t addr;
addr = dec->ws->buffer_get_virtual_address(buf);
addr = addr + off;
set_reg(dec, dec->reg.data0, addr);
set_reg(dec, dec->reg.data1, addr >> 32);
} else {
off += dec->ws->buffer_get_reloc_offset(buf);
set_reg(dec, RUVD_GPCOM_VCPU_DATA0, off);
set_reg(dec, RUVD_GPCOM_VCPU_DATA1, reloc_idx * 4);
diff --git a/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c 
b/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
index 42a9fa9abf0..ddb219792ae 100644
--- a/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
+++ b/src/gallium/drivers/radeon/radeon_uvd_enc_1_1.c
@@ -48,21 +48,21 @@ RADEON_ENC_CS(cmd)
enc->total_task_size += *begin;}
 
 static const unsigned index_to_shifts[4] = { 24, 16, 8, 0 };
 
 static void
 radeon_uvd_enc_add_buffer(struct radeon_uvd_encoder *enc,
   struct pb_buffer *buf, enum radeon_bo_usage usage,
   enum radeon_bo_domain domain, signed offset)
 {
enc->ws->cs_add_buffer(enc->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED,
-  domain, RADEON_PRIO_VCE);
+  domain, 0);
uint64_t addr;
addr = enc->ws->buffer_get_virtual_address(buf);
addr = addr + offset;
RADEON_ENC_CS(addr >> 32);
RADEON_ENC_CS(addr);
 }
 
 static void
 radeon_uvd_enc_set_emulation_prevention(struct radeon_uvd_encoder *enc,
 bool set)
diff --git a/src/gallium/drivers/radeon/radeon_vce.c 
b/src/gallium/drivers/radeon/radeon_vce.c
index 6d1b1ff7879..8972253c7c5 100644
--- a/src/gallium/drivers/radeon/radeon_vce.c
+++ b/src/gallium/drivers/radeon/radeon_vce.c
@@ -552,21 +552,21 @@ bool si_vce_is_fw_version_supported(struct si_screen *sscreen)
 /**
  * Add the buffer as relocation to the current command submission
  */
 void si_vce_add_buffer(struct rvce_encoder *enc, struct pb_buffer *buf,
   enum radeon_bo_usage usage, enum radeon_bo_domain domain,
   signed offset)
 {
int reloc_idx;
 
reloc_idx = enc->ws->cs_add_buffer(enc->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED,
-  domain, RADEON_PRIO_VCE);
+  domain, 0);
if (enc->use_vm) {
uint64_t addr;
addr = enc->ws->buffer_get_virtual_address(buf);
addr = addr + offset;
RVCE_CS(addr >> 32);
RVCE_CS(addr);
} else {
offset += enc->ws->buffer_get_reloc_offset(buf);
RVCE_CS(reloc_idx * 4);
RVCE_CS(offset);
diff --git a/src/gallium/drivers/radeon/radeon_vcn_dec.c 
b/src/gallium/drivers/radeon/radeon_vcn_dec.c
index ed7223bbec5..c2e22048cef 100644
--- a/src/gallium/drivers/radeon/radeon_vcn_dec.c
+++ b/src/gallium/drivers/radeon/radeon_vcn_dec.c
@@ -1026,21 +1026,21 @@ static void set_reg(struct radeon_decoder *dec, unsigned reg, uint32_t val)
 }
 
 /* send a command to the VCPU through the GPCOM registers */
 static void send_cmd(struct radeon_decoder *dec, unsigned cmd,
 struct pb_buffer* buf, uint32_t off,
 enum radeon_bo_usage usage, enum radeon_bo_domain domain)
 {
uint64_t addr;
 
dec->ws->cs_add_buffer(dec->cs, buf, usage | RADEON_USAGE_SYNCHRONIZED,
-  domain, RADEON_PRIO_UVD);
+   

[Mesa-dev] [PATCH 9/9] winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

TODO: requires latest libdrm for amdgpu_bo_handle_type_kms_noimport
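
A very rough sketch of the idea, under stated assumptions: instead of
creating a kernel BO list with a separate ioctl, userspace fills an array of
(handle, priority) entries and describes it with one more CS chunk. The
patch itself declares a struct drm_amdgpu_bo_list_in for this; the sketch
flattens that into hypothetical demo_* types, so only the chunk ID value and
the drm_minor check are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CHUNK_ID_BO_HANDLES 0x06 /* value taken from the patch */

/* Hypothetical stand-ins; not the kernel UAPI definitions. */
struct demo_bo_entry {
   uint32_t kms_handle;
   uint32_t priority;
};

struct demo_chunk {
   uint32_t chunk_id;
   uint32_t length_dw;
   uint64_t chunk_data;
};

/* Mirrors the patch's version check: older kernels need the BO list ioctl. */
static bool
demo_use_bo_list_create(unsigned drm_minor)
{
   return drm_minor < 27;
}

/* Fill `entries` from parallel arrays and describe them in one chunk. */
static void
demo_fill_bo_chunk(struct demo_chunk *chunk, struct demo_bo_entry *entries,
                   const uint32_t *handles, const uint32_t *priorities,
                   unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      entries[i].kms_handle = handles[i];
      entries[i].priority = priorities[i];
   }
   chunk->chunk_id = DEMO_CHUNK_ID_BO_HANDLES;
   chunk->length_dw = (uint32_t)(count * sizeof(*entries) / 4);
   chunk->chunk_data = (uint64_t)(uintptr_t)entries;
}
```

Caching the KMS handle in the BO at creation time, as the amdgpu_bo.c hunks
do, is what makes filling such an array cheap at submission time.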
---
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.c |  6 +++
 src/gallium/winsys/amdgpu/drm/amdgpu_bo.h |  2 +
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 54 +--
 3 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
index e3d56613dfa..eba8d6e8b3d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c
@@ -473,20 +473,22 @@ static struct amdgpu_winsys_bo *amdgpu_create_bo(struct amdgpu_winsys *ws,
bo->u.real.va_handle = va_handle;
bo->initial_domain = initial_domain;
bo->unique_id = __sync_fetch_and_add(&ws->next_bo_unique_id, 1);
bo->is_local = !!(request.flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID);
 
if (initial_domain & RADEON_DOMAIN_VRAM)
   ws->allocated_vram += align64(size, ws->info.gart_page_size);
else if (initial_domain & RADEON_DOMAIN_GTT)
   ws->allocated_gtt += align64(size, ws->info.gart_page_size);
 
+   amdgpu_bo_export(bo->bo, amdgpu_bo_handle_type_kms_noimport, &bo->u.real.kms_handle);
+
amdgpu_add_buffer_to_global_list(bo);
 
return bo;
 
 error_va_map:
amdgpu_va_range_free(va_handle);
 
 error_va_alloc:
amdgpu_bo_free(buf_handle);
 
@@ -1330,20 +1332,22 @@ static struct pb_buffer *amdgpu_bo_from_handle(struct radeon_winsys *rws,
if (stride)
   *stride = whandle->stride;
if (offset)
   *offset = whandle->offset;
 
if (bo->initial_domain & RADEON_DOMAIN_VRAM)
   ws->allocated_vram += align64(bo->base.size, ws->info.gart_page_size);
else if (bo->initial_domain & RADEON_DOMAIN_GTT)
   ws->allocated_gtt += align64(bo->base.size, ws->info.gart_page_size);
 
+   amdgpu_bo_export(bo->bo, amdgpu_bo_handle_type_kms_noimport, &bo->u.real.kms_handle);
+
amdgpu_add_buffer_to_global_list(bo);
 
return &bo->base;
 
 error_va_map:
amdgpu_va_range_free(va_handle);
 
 error_query:
amdgpu_bo_free(result.buf_handle);
 
@@ -1429,20 +1433,22 @@ static struct pb_buffer *amdgpu_bo_from_ptr(struct radeon_winsys *rws,
 bo->user_ptr = pointer;
 bo->va = va;
 bo->u.real.va_handle = va_handle;
 bo->initial_domain = RADEON_DOMAIN_GTT;
 bo->unique_id = __sync_fetch_and_add(&ws->next_bo_unique_id, 1);
 
 ws->allocated_gtt += aligned_size;
 
 amdgpu_add_buffer_to_global_list(bo);
 
+amdgpu_bo_export(bo->bo, amdgpu_bo_handle_type_kms_noimport, &bo->u.real.kms_handle);
+
 return (struct pb_buffer*)bo;
 
 error_va_map:
 amdgpu_va_range_free(va_handle);
 
 error_va_alloc:
 amdgpu_bo_free(buf_handle);
 
 error:
 FREE(bo);
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
index b3dbb3515e9..1e07e4734aa 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.h
@@ -59,20 +59,22 @@ struct amdgpu_winsys_bo {
struct pb_buffer base;
union {
   struct {
  struct pb_cache_entry cache_entry;
 
  amdgpu_va_handle va_handle;
  int map_count;
  bool use_reusable_pool;
 
  struct list_head global_list_item;
+
+ uint32_t kms_handle;
   } real;
   struct {
  struct pb_slab_entry entry;
  struct amdgpu_winsys_bo *real;
   } slab;
   struct {
  simple_mtx_t commit_lock;
  amdgpu_va_handle va_handle;
  enum radeon_bo_flag flags;
 
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index ac7160a5e51..c0f8b442b1d 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -30,20 +30,24 @@
 #include "util/os_time.h"
 #include 
 #include 
 
 #include "amd/common/sid.h"
 
 #ifndef AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE
 #define AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE (1 << 3)
 #endif
 
+#ifndef AMDGPU_CHUNK_ID_BO_HANDLES
+#define AMDGPU_CHUNK_ID_BO_HANDLES 0x06
+#endif
+
 DEBUG_GET_ONCE_BOOL_OPTION(noop, "RADEON_NOOP", false)
 
 /* FENCES */
 
 static struct pipe_fence_handle *
 amdgpu_fence_create(struct amdgpu_ctx *ctx, unsigned ip_type,
 unsigned ip_instance, unsigned ring)
 {
struct amdgpu_fence *fence = CALLOC_STRUCT(amdgpu_fence);
 
@@ -1283,45 +1287,79 @@ static bool amdgpu_add_sparse_backing_buffers(struct amdgpu_cs_context *cs)
 
 void amdgpu_cs_submit_ib(void *job, int thread_index)
 {
struct amdgpu_cs *acs = (struct amdgpu_cs*)job;
struct amdgpu_winsys *ws = acs->ctx->ws;
struct amdgpu_cs_context *cs = acs->cst;
int i, r;
amdgpu_bo_list_handle bo_list = NULL;
uint64_t seq_no = 0;
bool has_user_fence = amdgpu_cs_has_user_fence(cs);
+   bool use_bo_list_create = ws->info.drm_minor < 27;
+   struct drm_amdgpu_bo_list_in bo_list_in;
 
-   /* Create the buffer list.
-* Use a buffer list containing all allocated buffers if 

[Mesa-dev] [PATCH 8/9] winsys/amdgpu: clean up error handling in amdgpu_cs_submit_ib

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 872e67a790a..ac7160a5e51 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -1305,31 +1305,28 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
   LIST_FOR_EACH_ENTRY(bo, &ws->global_bo_list, u.real.global_list_item) {
  assert(num < ws->num_buffers);
  handles[num++] = bo->bo;
   }
 
   r = amdgpu_bo_list_create(ws->dev, ws->num_buffers,
 handles, NULL, &bo_list);
   simple_mtx_unlock(&ws->global_bo_list_lock);
   if (r) {
  fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
- amdgpu_fence_signalled(cs->fence);
- cs->error_code = r;
  goto cleanup;
   }
} else {
   unsigned num_handles;
 
   if (!amdgpu_add_sparse_backing_buffers(cs)) {
  fprintf(stderr, "amdgpu: amdgpu_add_sparse_backing_buffers failed\n");
- amdgpu_fence_signalled(cs->fence);
- cs->error_code = -ENOMEM;
+ r = -ENOMEM;
  goto cleanup;
   }
 
   amdgpu_bo_handle *handles = alloca(sizeof(*handles) * cs->num_real_buffers);
   uint8_t *flags = alloca(sizeof(*flags) * cs->num_real_buffers);
 
   num_handles = 0;
   for (i = 0; i < cs->num_real_buffers; ++i) {
  struct amdgpu_cs_buffer *buffer = &cs->real_buffers[i];
 
@@ -1341,22 +1338,20 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
  handles[num_handles] = buffer->bo->bo;
  flags[num_handles] = (util_last_bit(buffer->u.real.priority_usage) - 1) / 2;
 ++num_handles;
   }
 
   if (num_handles) {
  r = amdgpu_bo_list_create(ws->dev, num_handles,
handles, flags, &bo_list);
  if (r) {
 fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
-amdgpu_fence_signalled(cs->fence);
-cs->error_code = r;
 goto cleanup;
  }
   }
}
 
if (acs->ring_type == RING_GFX)
   ws->gfx_bo_list_counter += cs->num_real_buffers;
 
if (acs->ctx->num_rejected_cs) {
   r = -ECANCELED;
@@ -1451,48 +1446,52 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
  chunks[num_chunks].chunk_data = (uintptr_t)sem_chunk;
  num_chunks++;
   }
 
   assert(num_chunks <= ARRAY_SIZE(chunks));
 
   r = amdgpu_cs_submit_raw(ws->dev, acs->ctx->ctx, bo_list,
num_chunks, chunks, &seq_no);
}
 
-   cs->error_code = r;
if (r) {
   if (r == -ENOMEM)
  fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
   else if (r == -ECANCELED)
  fprintf(stderr, "amdgpu: The CS has been cancelled because the context is lost.\n");
   else
  fprintf(stderr, "amdgpu: The CS has been rejected, "
  "see dmesg for more information (%i).\n", r);
 
-  amdgpu_fence_signalled(cs->fence);
-
   acs->ctx->num_rejected_cs++;
   ws->num_total_rejected_cs++;
} else {
   /* Success. */
   uint64_t *user_fence = NULL;
 
   if (has_user_fence)
  user_fence = acs->ctx->user_fence_cpu_address_base + acs->ring_type;
   amdgpu_fence_submitted(cs->fence, seq_no, user_fence);
}
 
/* Cleanup. */
if (bo_list)
   amdgpu_bo_list_destroy(bo_list);
 
 cleanup:
+   /* If there was an error, signal the fence, because it won't be signalled
+* by the hardware. */
+   if (r)
+  amdgpu_fence_signalled(cs->fence);
+
+   cs->error_code = r;
+
for (i = 0; i < cs->num_real_buffers; i++)
   p_atomic_dec(&cs->real_buffers[i].bo->num_active_ioctls);
for (i = 0; i < cs->num_slab_buffers; i++)
   p_atomic_dec(&cs->slab_buffers[i].bo->num_active_ioctls);
for (i = 0; i < cs->num_sparse_buffers; i++)
   p_atomic_dec(&cs->sparse_buffers[i].bo->num_active_ioctls);
 
amdgpu_cs_context_cleanup(cs);
 }
 
-- 
2.17.1
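
The shape this cleanup converges on is the usual single-exit pattern: every
failure path only sets r and jumps, and the cleanup label decides once what
an error means. Below is a minimal sketch with hypothetical names (demo_cs,
demo_submit and the fail_at switch exist only for illustration).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the command-stream state touched on exit. */
struct demo_cs {
   bool fence_signalled;
   int error_code;
   int active_ioctls;
};

/* `fail_at` selects which step fails (0 = none); purely for illustration. */
static void
demo_submit(struct demo_cs *cs, int fail_at)
{
   int r = 0;

   if (fail_at == 1) {          /* e.g. adding backing buffers failed */
      r = -ENOMEM;
      goto cleanup;
   }
   if (fail_at == 2) {          /* e.g. the submission was rejected */
      r = -ECANCELED;
      goto cleanup;
   }

cleanup:
   /* One place decides what an error means: the hardware will never signal
    * the fence, so do it here, then record the result. */
   if (r)
      cs->fence_signalled = true;
   cs->error_code = r;
   cs->active_ioctls--;
}
```

Adding a new failure point then needs only an assignment to r and a goto,
with no risk of forgetting to signal the fence.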



[Mesa-dev] [PATCH 6/9] radeonsi: merge DCC/CMASK/HTILE priority flags

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

For a later simplification.
---
 src/gallium/drivers/r600/evergreen_state.c| 4 ++--
 src/gallium/drivers/r600/r600_state.c | 2 +-
 src/gallium/drivers/radeon/radeon_winsys.h| 4 +---
 src/gallium/drivers/radeonsi/si_debug.c   | 4 +---
 src/gallium/drivers/radeonsi/si_descriptors.c | 2 +-
 src/gallium/drivers/radeonsi/si_state.c   | 4 ++--
 6 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index 76a3e0e441a..57b82e7855f 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1857,21 +1857,21 @@ static void evergreen_emit_framebuffer_state(struct 
r600_context *rctx, struct r
  >b.gfx,
  (struct 
r600_resource*)cb->base.texture,
  RADEON_USAGE_READWRITE,
  tex->resource.b.b.nr_samples > 1 ?
  
RADEON_PRIO_COLOR_BUFFER_MSAA :
  RADEON_PRIO_COLOR_BUFFER);
 
if (tex->cmask_buffer && tex->cmask_buffer != &tex->resource) {
cmask_reloc = radeon_add_to_buffer_list(&rctx->b, &rctx->b.gfx,
tex->cmask_buffer, RADEON_USAGE_READWRITE,
-   RADEON_PRIO_CMASK);
+   RADEON_PRIO_SEPARATE_META);
} else {
cmask_reloc = reloc;
}
 
radeon_set_context_reg_seq(cs, R_028C60_CB_COLOR0_BASE + i * 
0x3C, 13);
radeon_emit(cs, cb->cb_color_base); /* 
R_028C60_CB_COLOR0_BASE */
radeon_emit(cs, cb->cb_color_pitch);/* 
R_028C64_CB_COLOR0_PITCH */
radeon_emit(cs, cb->cb_color_slice);/* 
R_028C68_CB_COLOR0_SLICE */
radeon_emit(cs, cb->cb_color_view); /* 
R_028C6C_CB_COLOR0_VIEW */
radeon_emit(cs, cb->cb_color_info | tex->cb_color_info); /* 
R_028C70_CB_COLOR0_INFO */
@@ -2046,21 +2046,21 @@ static void evergreen_emit_db_state(struct r600_context 
*rctx, struct r600_atom
 
if (a->rsurf && a->rsurf->db_htile_surface) {
struct r600_texture *rtex = (struct r600_texture 
*)a->rsurf->base.texture;
unsigned reloc_idx;
 
radeon_set_context_reg(cs, R_02802C_DB_DEPTH_CLEAR, 
fui(rtex->depth_clear_value));
radeon_set_context_reg(cs, R_028ABC_DB_HTILE_SURFACE, 
a->rsurf->db_htile_surface);
radeon_set_context_reg(cs, R_028AC8_DB_PRELOAD_CONTROL, 
a->rsurf->db_preload_control);
radeon_set_context_reg(cs, R_028014_DB_HTILE_DATA_BASE, 
a->rsurf->db_htile_data_base);
reloc_idx = radeon_add_to_buffer_list(&rctx->b, &rctx->b.gfx, &rtex->resource,
- RADEON_USAGE_READWRITE, RADEON_PRIO_HTILE);
+ RADEON_USAGE_READWRITE, RADEON_PRIO_SEPARATE_META);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, reloc_idx);
} else {
radeon_set_context_reg(cs, R_028ABC_DB_HTILE_SURFACE, 0);
radeon_set_context_reg(cs, R_028AC8_DB_PRELOAD_CONTROL, 0);
}
 }
 
 static void evergreen_emit_db_misc_state(struct r600_context *rctx, struct 
r600_atom *atom)
 {
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index d241d27d1b9..9f3779f16d4 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -1547,21 +1547,21 @@ static void r600_emit_db_state(struct r600_context 
*rctx, struct r600_atom *atom
struct r600_db_state *a = (struct r600_db_state*)atom;
 
if (a->rsurf && a->rsurf->db_htile_surface) {
struct r600_texture *rtex = (struct r600_texture 
*)a->rsurf->base.texture;
unsigned reloc_idx;
 
radeon_set_context_reg(cs, R_02802C_DB_DEPTH_CLEAR, 
fui(rtex->depth_clear_value));
radeon_set_context_reg(cs, R_028D24_DB_HTILE_SURFACE, 
a->rsurf->db_htile_surface);
radeon_set_context_reg(cs, R_028014_DB_HTILE_DATA_BASE, 
a->rsurf->db_htile_data_base);
reloc_idx = radeon_add_to_buffer_list(&rctx->b, &rctx->b.gfx, &rtex->resource,
- RADEON_USAGE_READWRITE, RADEON_PRIO_HTILE);
+ RADEON_USAGE_READWRITE, RADEON_PRIO_SEPARATE_META);
radeon_emit(cs, PKT3(PKT3_NOP, 0, 0));
radeon_emit(cs, reloc_idx);
} else {
radeon_set_context_reg(cs, R_028D24_DB_HTILE_SURFACE, 0);
}
 }
 
 static void r600_emit_db_misc_state(struct r600_context *rctx, struct 

[Mesa-dev] [PATCH 4/9] winsys/amdgpu: use alloca when using global_bo_list

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 1aaa0667310..ec164175dbc 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -1293,37 +1293,29 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
 
/* Create the buffer list.
 * Use a buffer list containing all allocated buffers if requested.
 */
if (ws->debug_all_bos) {
   struct amdgpu_winsys_bo *bo;
   amdgpu_bo_handle *handles;
   unsigned num = 0;
 
   simple_mtx_lock(&ws->global_bo_list_lock);
-
-  handles = malloc(sizeof(handles[0]) * ws->num_buffers);
-  if (!handles) {
- simple_mtx_unlock(>global_bo_list_lock);
- amdgpu_cs_context_cleanup(cs);
- cs->error_code = -ENOMEM;
- return;
-  }
+  handles = alloca(sizeof(handles[0]) * ws->num_buffers);
 
   LIST_FOR_EACH_ENTRY(bo, &ws->global_bo_list, u.real.global_list_item) {
  assert(num < ws->num_buffers);
  handles[num++] = bo->bo;
   }
 
   r = amdgpu_bo_list_create(ws->dev, ws->num_buffers,
 handles, NULL, &bo_list);
-  free(handles);
   simple_mtx_unlock(&ws->global_bo_list_lock);
   if (r) {
  fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
  amdgpu_fence_signalled(cs->fence);
  cs->error_code = r;
  goto cleanup;
   }
} else {
   unsigned num_handles;
 
-- 
2.17.1
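
For context, a small sketch of the malloc-to-alloca trade-off this patch
makes. demo_sum_handles is a hypothetical helper, and the summing merely
stands in for handing the temporary array to another call.

```c
#include <alloca.h>   /* glibc; other platforms may declare alloca elsewhere */
#include <stddef.h>
#include <stdint.h>

/* Copies `count` handles into a temporary array and returns their sum. */
static uint64_t
demo_sum_handles(const uint32_t *src, size_t count)
{
   /* Freed automatically on return: no NULL check and no free(), but the
    * array must fit on the stack, so this only suits bounded counts. */
   uint32_t *tmp = alloca(sizeof(tmp[0]) * count);
   uint64_t sum = 0;

   for (size_t i = 0; i < count; i++)
      tmp[i] = src[i];
   for (size_t i = 0; i < count; i++)
      sum += tmp[i];
   return sum;
}
```

The error path for a failed allocation disappears, at the cost of stack
usage that grows with the number of buffers.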



[Mesa-dev] [PATCH 3/9] winsys/amdgpu: remove label bo_list_error

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 28 +--
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 77b372d2cea..1aaa0667310 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -1311,26 +1311,34 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
 
   LIST_FOR_EACH_ENTRY(bo, &ws->global_bo_list, u.real.global_list_item) {
  assert(num < ws->num_buffers);
  handles[num++] = bo->bo;
   }
 
   r = amdgpu_bo_list_create(ws->dev, ws->num_buffers,
 handles, NULL, &bo_list);
   free(handles);
   simple_mtx_unlock(&ws->global_bo_list_lock);
+  if (r) {
+ fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
+ amdgpu_fence_signalled(cs->fence);
+ cs->error_code = r;
+ goto cleanup;
+  }
} else {
   unsigned num_handles;
 
   if (!amdgpu_add_sparse_backing_buffers(cs)) {
- r = -ENOMEM;
- goto bo_list_error;
+ fprintf(stderr, "amdgpu: amdgpu_add_sparse_backing_buffers failed\n");
+ amdgpu_fence_signalled(cs->fence);
+ cs->error_code = -ENOMEM;
+ goto cleanup;
   }
 
   amdgpu_bo_handle *handles = alloca(sizeof(*handles) * cs->num_real_buffers);
   uint8_t *flags = alloca(sizeof(*flags) * cs->num_real_buffers);
 
   num_handles = 0;
   for (i = 0; i < cs->num_real_buffers; ++i) {
  struct amdgpu_cs_buffer *buffer = &cs->real_buffers[i];
 
 if (buffer->bo->is_local)
@@ -1339,32 +1347,28 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
  assert(buffer->u.real.priority_usage != 0);
 
  handles[num_handles] = buffer->bo->bo;
  flags[num_handles] = (util_last_bit64(buffer->u.real.priority_usage) - 1) / 4;
 ++num_handles;
   }
 
   if (num_handles) {
  r = amdgpu_bo_list_create(ws->dev, num_handles,
handles, flags, &bo_list);
-  } else {
- r = 0;
+ if (r) {
+fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
+amdgpu_fence_signalled(cs->fence);
+cs->error_code = r;
+goto cleanup;
+ }
   }
}
-bo_list_error:
-
-   if (r) {
-  fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
-  amdgpu_fence_signalled(cs->fence);
-  cs->error_code = r;
-  goto cleanup;
-   }
 
if (acs->ring_type == RING_GFX)
   ws->gfx_bo_list_counter += cs->num_real_buffers;
 
if (acs->ctx->num_rejected_cs) {
   r = -ECANCELED;
} else {
   struct drm_amdgpu_cs_chunk chunks[5];
   unsigned num_chunks = 0;
 
-- 
2.17.1



[Mesa-dev] [PATCH 0/9] RadeonSI on AMDGPU: Command submission optimizations

2018-07-11 Thread Marek Olšák
Hi,

This series improves RadeonSI performance for trivial CPU-bound
benchmarks. Other CPU-bound benchmarks may be affected marginally.

The first 8 patches are cleanups that surprisingly increase
performance.

The last patch uses a new chunk type in the CS ioctl for passing
the array of buffer handles to the kernel, skipping and deprecating
the BO list ioctl. Thanks to Andrey Grodzovsky for providing
the kernel patch. My libdrm patch adding
amdgpu_bo_handle_type_kms_noimport is also required.

The maximum glxgears FPS improves as follows.

Initially: 13285
8 patches: 14403 (+8.4% vs initial)
+ patch 9: 15498 (+16.6% vs initial)

This vastly exceeded my expectations.

Please review.

Thanks,
Marek


[Mesa-dev] [PATCH 1/9] winsys/amdgpu: make amdgpu_cs_context::flags & handles local

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.c | 23 +--
 src/gallium/winsys/amdgpu/drm/amdgpu_cs.h |  4 
 2 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
index 6628ff9f170..9aa489adaa4 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c
@@ -895,23 +895,21 @@ static void amdgpu_cs_context_cleanup(struct 
amdgpu_cs_context *cs)
cs->num_syncobj_to_signal = 0;
amdgpu_fence_reference(&cs->fence, NULL);
 
memset(cs->buffer_indices_hashlist, -1, 
sizeof(cs->buffer_indices_hashlist));
cs->last_added_bo = NULL;
 }
 
 static void amdgpu_destroy_cs_context(struct amdgpu_cs_context *cs)
 {
amdgpu_cs_context_cleanup(cs);
-   FREE(cs->flags);
FREE(cs->real_buffers);
-   FREE(cs->handles);
FREE(cs->slab_buffers);
FREE(cs->sparse_buffers);
FREE(cs->fence_dependencies);
FREE(cs->syncobj_to_signal);
 }
 
 
 static struct radeon_cmdbuf *
 amdgpu_cs_create(struct radeon_winsys_ctx *rwctx,
  enum ring_type ring_type,
@@ -1321,54 +1319,43 @@ void amdgpu_cs_submit_ib(void *job, int thread_index)
   free(handles);
   simple_mtx_unlock(&ws->global_bo_list_lock);
} else {
   unsigned num_handles;
 
   if (!amdgpu_add_sparse_backing_buffers(cs)) {
  r = -ENOMEM;
  goto bo_list_error;
   }
 
-  if (cs->max_real_submit < cs->num_real_buffers) {
- FREE(cs->handles);
- FREE(cs->flags);
-
- cs->handles = MALLOC(sizeof(*cs->handles) * cs->num_real_buffers);
- cs->flags = MALLOC(sizeof(*cs->flags) * cs->num_real_buffers);
-
- if (!cs->handles || !cs->flags) {
-cs->max_real_submit = 0;
-r = -ENOMEM;
-goto bo_list_error;
- }
-  }
+  amdgpu_bo_handle *handles = alloca(sizeof(*handles) * 
cs->num_real_buffers);
+  uint8_t *flags = alloca(sizeof(*flags) * cs->num_real_buffers);
 
   num_handles = 0;
   for (i = 0; i < cs->num_real_buffers; ++i) {
  struct amdgpu_cs_buffer *buffer = &cs->real_buffers[i];
 
 if (buffer->bo->is_local)
 continue;
 
  assert(buffer->u.real.priority_usage != 0);
 
- cs->handles[num_handles] = buffer->bo->bo;
- cs->flags[num_handles] = 
(util_last_bit64(buffer->u.real.priority_usage) - 1) / 4;
+ handles[num_handles] = buffer->bo->bo;
+ flags[num_handles] = (util_last_bit64(buffer->u.real.priority_usage) 
- 1) / 4;
 ++num_handles;
   }
 
   if (acs->ring_type == RING_GFX)
  ws->gfx_bo_list_counter += cs->num_real_buffers;
 
   if (num_handles) {
  r = amdgpu_bo_list_create(ws->dev, num_handles,
-   cs->handles, cs->flags, &bo_list);
+   handles, flags, &bo_list);
   } else {
  r = 0;
   }
}
 bo_list_error:
 
if (r) {
   fprintf(stderr, "amdgpu: buffer list creation failed (%d)\n", r);
   amdgpu_fence_signalled(cs->fence);
   cs->error_code = r;
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h 
b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
index 5f96193750b..3b10cc66c21 100644
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_cs.h
@@ -74,24 +74,20 @@ struct amdgpu_ib {
 };
 
 struct amdgpu_cs_context {
struct drm_amdgpu_cs_chunk_ib ib[IB_NUM];
 
/* Buffers. */
unsigned max_real_buffers;
unsigned num_real_buffers;
struct amdgpu_cs_buffer *real_buffers;
 
-   unsigned max_real_submit;
-   amdgpu_bo_handle *handles;
-   uint8_t *flags;
-
unsigned num_slab_buffers;
unsigned max_slab_buffers;
struct amdgpu_cs_buffer *slab_buffers;
 
unsigned num_sparse_buffers;
unsigned max_sparse_buffers;
struct amdgpu_cs_buffer *sparse_buffers;
 
int buffer_indices_hashlist[4096];
 
-- 
2.17.1
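
The patch above replaces two heap arrays cached in the CS context with per-call alloca() scratch arrays. A minimal standalone sketch of that pattern follows. It assumes a glibc-style <alloca.h>, and every demo_* name is hypothetical and not part of Mesa or libdrm.

```c
#include <alloca.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a command-stream buffer entry. */
struct demo_buffer {
   uint32_t handle;
   int is_local;      /* local buffers are skipped, as in the patch */
   uint8_t priority;  /* nonzero in this sketch */
};

/* Hypothetical consumer of the two parallel arrays: sums the handles. */
static uint32_t demo_submit(const uint32_t *handles, const uint8_t *flags,
                            unsigned count)
{
   uint32_t sum = 0;
   for (unsigned i = 0; i < count; i++) {
      assert(flags[i] != 0);
      sum += handles[i];
   }
   return sum;
}

/* Build the handle/flag arrays on the stack instead of caching heap
 * allocations in a long-lived context struct.
 */
uint32_t demo_build_and_submit(const struct demo_buffer *bufs, unsigned num)
{
   if (num == 0)
      return 0;

   /* Stack storage: released automatically when this function returns. */
   uint32_t *handles = alloca(sizeof(*handles) * num);
   uint8_t *flags = alloca(sizeof(*flags) * num);
   unsigned n = 0;

   for (unsigned i = 0; i < num; i++) {
      if (bufs[i].is_local)
         continue;
      handles[n] = bufs[i].handle;
      flags[n] = bufs[i].priority;
      n++;
   }
   return demo_submit(handles, flags, n);
}
```

Unlike malloc(), alloca() has no failure return, so this only looks reasonable when the element count is known to stay small relative to the thread's stack size.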



[Mesa-dev] [PATCH v2] i965/miptree: Init r8stencil_needs_update to false

2018-07-11 Thread Nanley Chery
The current behavior masked two bugs where the flag was not set to true
after modifying the stencil texture. One case was a regression
introduced with commit bdbb527a65fc729e7a9319ae67de60d03d06c3fd and
another was a bug in the depthstencil mapping code. These have since
been fixed.

To prevent such bugs from being masked in the future, initialize
r8stencil_needs_update to false.

v2: Keep the delayed allocation.

Reviewed-by: Topi Pohjolainen  (v1)
---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 187f310d7d6..31f3182d5a5 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -672,8 +672,6 @@ make_separate_stencil_surface(struct brw_context *brw,
if (!mt->stencil_mt)
   return false;
 
-   mt->stencil_mt->r8stencil_needs_update = true;
-
return true;
 }
 
@@ -2937,7 +2935,7 @@ intel_update_r8stencil(struct brw_context *brw,
assert(devinfo->gen >= 7);
struct intel_mipmap_tree *src =
   mt->format == MESA_FORMAT_S_UINT8 ? mt : mt->stencil_mt;
-   if (!src || devinfo->gen >= 8 || !src->r8stencil_needs_update)
+   if (!src || devinfo->gen >= 8)
   return;
 
assert(src->surf.size > 0);
@@ -2961,6 +2959,9 @@ intel_update_r8stencil(struct brw_context *brw,
   assert(mt->r8stencil_mt);
}
 
+   if (src->r8stencil_needs_update == false)
+  return;
+
struct intel_mipmap_tree *dst = mt->r8stencil_mt;
 
for (int level = src->first_level; level <= src->last_level; level++) {
-- 
2.18.0



Re: [Mesa-dev] [PATCH 2/2] u_blitter: Add an option to draw the triangles using an index buffer.

2018-07-11 Thread Roland Scheidegger
Am 12.07.2018 um 00:05 schrieb Eric Anholt:
> For V3D, the HW will interpolate slightly differently along the shared
> edge of the trifan.  The conformance tests manage to catch this in the
> nearest_consistency_* group.  To get interpolation to match, we need the
> last vertex of the triangle to be shared.
> 
> I first tried implementing draw_rectangle to do triangles instead, but
> that was quite a bit (147 lines) of code duplication from u_blitter, and
> this seems much simpler and less likely to break as u_blitter changes.

I'm curious, how does interpolation work on that hw?
Does it use the provoking vertex as some sort of reference? If so would
it actually work if you switched to provoking vertex first (if you can)?
(But I'm really just curious, the patch looks alright to me.)

Roland


> 
> Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D.
> ---
>  src/gallium/auxiliary/util/u_blitter.c | 16 ++--
>  src/gallium/auxiliary/util/u_blitter.h |  2 ++
>  src/gallium/drivers/v3d/v3d_context.c  |  1 +
>  3 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/util/u_blitter.c 
> b/src/gallium/auxiliary/util/u_blitter.c
> index 4748627fc523..eadb76a109fb 100644
> --- a/src/gallium/auxiliary/util/u_blitter.c
> +++ b/src/gallium/auxiliary/util/u_blitter.c
> @@ -1258,8 +1258,20 @@ static void blitter_draw(struct blitter_context_priv 
> *ctx,
> pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, );
> pipe->bind_vertex_elements_state(pipe, vertex_elements_cso);
> pipe->bind_vs_state(pipe, get_vs(>base));
> -   util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
> -  0, num_instances);
> +
> +   if (ctx->base.use_index_buffer) {
> +  /* Note that for V3D,
> +   * dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* require
> +   * that the last vert of the two tris be the same.
> +   */
> +  static uint8_t indices[6] = { 0, 1, 2, 0, 3, 2 };
> +  util_draw_elements_instanced(pipe, indices, 1, 0,
> +   PIPE_PRIM_TRIANGLES, 0, 6,
> +   0, num_instances);
> +   } else {
> +  util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
> + 0, num_instances);
> +   }
> pipe_resource_reference(, NULL);
>  }
>  
> diff --git a/src/gallium/auxiliary/util/u_blitter.h 
> b/src/gallium/auxiliary/util/u_blitter.h
> index 9e945983baac..9ea1dc9b6b28 100644
> --- a/src/gallium/auxiliary/util/u_blitter.h
> +++ b/src/gallium/auxiliary/util/u_blitter.h
> @@ -100,6 +100,8 @@ struct blitter_context
> /* Whether the blitter is running. */
> bool running;
>  
> +   bool use_index_buffer;
> +
> /* Private members, really. */
> struct pipe_context *pipe; /**< pipe context */
>  
> diff --git a/src/gallium/drivers/v3d/v3d_context.c 
> b/src/gallium/drivers/v3d/v3d_context.c
> index cef32ceb069d..6fb807b1aa8a 100644
> --- a/src/gallium/drivers/v3d/v3d_context.c
> +++ b/src/gallium/drivers/v3d/v3d_context.c
> @@ -164,6 +164,7 @@ v3d_context_create(struct pipe_screen *pscreen, void 
> *priv, unsigned flags)
>  v3d->blitter = util_blitter_create(pctx);
>  if (!v3d->blitter)
>  goto fail;
> +v3d->blitter->use_index_buffer = true;
>  
>  v3d->primconvert = util_primconvert_create(pctx,
> (1 << PIPE_PRIM_QUADS) - 
> 1);
> 
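
For reference, the index list in the quoted patch can be compared with the fan it replaces. The sketch below expands a 4-vertex triangle fan into triangles using the usual GL fan ordering and checks which vertex each triangle ends on. The demo_* helpers are hypothetical and nothing here is u_blitter API.

```c
#include <assert.h>
#include <stdint.h>

/* Expand a triangle fan over vertices 0..num_verts-1 into a triangle
 * list: (0,1,2), (0,2,3), ...  Returns the number of indices written.
 */
unsigned demo_fan_to_indices(unsigned num_verts, uint8_t *out)
{
   unsigned n = 0;
   for (unsigned i = 1; i + 1 < num_verts; i++) {
      out[n++] = 0;
      out[n++] = (uint8_t)i;
      out[n++] = (uint8_t)(i + 1);
   }
   return n;
}

/* Returns 1 if every triangle in the list ends on the same vertex index
 * as the first triangle, 0 otherwise.
 */
int demo_last_vertex_shared(const uint8_t *indices, unsigned count)
{
   assert(count >= 3 && count % 3 == 0);
   for (unsigned i = 5; i < count; i += 3) {
      if (indices[i] != indices[2])
         return 0;
   }
   return 1;
}
```

With 4 vertices the fan expands to (0,1,2) and (0,2,3), whose last vertices differ (2 and 3), while the patch's { 0, 1, 2, 0, 3, 2 } ends both triangles on vertex 2.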



Re: [Mesa-dev] [RFC PATCH] mesa/st: don't prematurely optimize for render targets when on virgl

2018-07-11 Thread Marek Olšák
If some formats are not supported as render targets, I recommend that
they are not supported for texturing either. (radeonsi doesn't support
unaligned 3-channel formats either. It only supports aligned formats,
such as R8G8B8X8 and R32G32B32X32.)

If you can't support RGBX formats as render targets, you can still
expose them as long as you support RGBX for texturing. Everything will
be correct except for blending where DST_ALPHA will read the value
from memory instead of returning 1. (and you can fix the blend state to
use ONE instead)

Marek
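
A rough sketch of the blend-state fixup mentioned above, for an RGBX render target whose X channel should read as 1. The enum and function names are hypothetical, not gallium's. Mapping the inverse factor to ZERO is my assumption, following from 1 - 1 = 0.

```c
#include <assert.h>

/* Hypothetical, reduced set of blend factors for illustration only. */
enum demo_blend_factor {
   DEMO_FACTOR_ZERO,
   DEMO_FACTOR_ONE,
   DEMO_FACTOR_SRC_ALPHA,
   DEMO_FACTOR_INV_SRC_ALPHA,
   DEMO_FACTOR_DST_ALPHA,
   DEMO_FACTOR_INV_DST_ALPHA
};

/* When the render target has no real alpha channel, destination alpha
 * must behave as 1, so rewrite the factors that would read it from memory.
 */
enum demo_blend_factor
demo_fix_factor_for_rgbx(enum demo_blend_factor factor)
{
   switch (factor) {
   case DEMO_FACTOR_DST_ALPHA:
      return DEMO_FACTOR_ONE;
   case DEMO_FACTOR_INV_DST_ALPHA:
      return DEMO_FACTOR_ZERO; /* assumption: 1 - dst_alpha with alpha = 1 */
   default:
      return factor;
   }
}
```

A driver would presumably apply this to both the RGB and alpha source/destination factors when the bound colorbuffer format has an X channel.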

On Tue, Jul 10, 2018 at 9:42 AM, Gert Wollny  wrote:
> For three component textures virgl faces the problem that the host driver
> may report that these can not be used as a render target, and when the
> client requests such a texture a four-componet texture will be choosen
> even if only a sampler view was requested. One example where this happens
> is with the Intel i965 driver that doesn't support RGB32* as render target.
> The result is that when allocating a GL_RGB32F and a GL_RGB32I texture, and
> then glCopyImageSubData is called for these two texture, gallium will fail
> with an assertion, because it reports a different per pixel bit count.
>
> Therefore, when using the virgl driver, don't try to enable BIND_RENDER_TARGET
> for RGB textures that were requested with only BIND_SAMPLER_VIEW.
>
> Signed-off-by: Gert Wollny 
> ---
>
> I'm aware that instead of using the device ID, I should probably add a new 
> caps
> flag, but apart from that I was wondering whether there may be better 
> approaches
> to achieve the same goal: The a texture is allocated with the internal format
> as closely as possible to the requested one. Especially it shouldn't change 
> the
> percieved pixel bit count.
>
> In fact, I was a bit surprised to see that the assertions regarding the
> different sizes was hit in st_copy_image:307 (swizzled_copy). It seems that
> there is some check missing that should redirect the copy in such a case.
>
> Many thanks for any comments,
> Gert
>
>  src/mesa/state_tracker/st_format.c | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_format.c 
> b/src/mesa/state_tracker/st_format.c
> index 9ae796eca9..2d8ff756a9 100644
> --- a/src/mesa/state_tracker/st_format.c
> +++ b/src/mesa/state_tracker/st_format.c
> @@ -2285,19 +2285,27 @@ st_ChooseTextureFormat(struct gl_context *ctx, GLenum 
> target,
>
> /* GL textures may wind up being render targets, but we don't know
>  * that in advance.  Specify potential render target flags now for formats
> -* that we know should always be renderable.
> +* that we know should always be renderable, except when we are on virgl,
> +* we don't try this for three component textures, because the host might
> +* not support rendering to them, and then Gallium chooses a four 
> component
> +* internal format and calls to e.g. glCopyImageSubData will fail for 
> format
> +* that should be compatible.
>  */
> bindings = PIPE_BIND_SAMPLER_VIEW;
> if (_mesa_is_depth_or_stencil_format(internalFormat))
>bindings |= PIPE_BIND_DEPTH_STENCIL;
> -   else if (is_renderbuffer || internalFormat == 3 || internalFormat == 4 ||
> -internalFormat == GL_RGB || internalFormat == GL_RGBA ||
> -internalFormat == GL_RGB8 || internalFormat == GL_RGBA8 ||
> +   else if (is_renderbuffer  ||
> +internalFormat == GL_RGBA ||
> +internalFormat == GL_RGBA8 ||
>  internalFormat == GL_BGRA ||
> -internalFormat == GL_RGB16F ||
>  internalFormat == GL_RGBA16F ||
> -internalFormat == GL_RGB32F ||
> -internalFormat == GL_RGBA32F)
> +internalFormat == GL_RGBA32F ||
> +((st->pipe->screen->get_param(st->pipe->screen, 
> PIPE_CAP_DEVICE_ID) != 0x1010) &&
> + (internalFormat == 3 || internalFormat == 4 ||
> +  internalFormat == GL_RGB ||
> +  internalFormat == GL_RGB8 ||
> +  internalFormat == GL_RGB16F ||
> +  internalFormat == GL_RGB32F )))
>bindings |= PIPE_BIND_RENDER_TARGET;
>
> /* GLES allows the driver to choose any format which matches
> --
> 2.17.1
>


Re: [Mesa-dev] [PATCH] i965/fs: unspills shoudn't use grf127 as dest since Gen8+

2018-07-11 Thread Caio Marcelo de Oliveira Filho
On Wed, Jul 11, 2018 at 06:03:05PM +0200, Jose Maria Casanova Crespo wrote:
> At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
> shoudn't use grf127 for sends dest" we didn't take into account the case
> of SEND instructions that are not send_from_grf. But since Gen7+ although
> the backend still uses MRFs internally for sends they are finally asigned
> to a GRFs.

Typo "assigned".


> In the case of unspills the backend assigns directly as source its
> destination because it is suppose to be available. So we always have a
> source-destination overlap. If the reg_allocator asigns registers that

Typo "assigns".


> include de grf127 we fail the validation rule that affects Gen8+

Typo "the".


> "r127 must not be used for return address when there is a src and dest
> overlap in send instruction."
> 
> So this patch activates the grf127_send_hack_node for Gen8+ and if we have
> any register spilled we add interferences to the destination of the unspill
> operations.

I've spent some time testing why this patch was still not covering all
the cases. The opt_bank_conflicts() optimization, which runs after
register allocation, was moving things around, causing r127 to be
used in exactly the condition we were trying to avoid.

The code there already has the idea of not touching certain registers,
so we should add something like

  /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
   * subsection "EUISA Instructions", Send Message (page 990):
   *
   * "r127 must not be used for return address when there is a src and
   * dest overlap in send instruction."
   *
   * Register allocation ensures that, so don't move 127 around to avoid
   * breaking that property.
   */ 
  if (v->devinfo->gen >= 8)
 constrained[p.atom_of_reg(127)] = true;

to function shader_reg_constraints() in
brw_fs_bank_conflicts.cpp. This fixes the crashes I was seeing in
shader-db.
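
To make the rule concrete, here is a small standalone sketch of the check as I read it: a SEND whose destination range both overlaps its source range and includes r127. The struct and function names are hypothetical and this is not the actual validator code. Treating "used for return address" as "destination range contains r127" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical contiguous GRF range: first register and register count. */
struct demo_reg_range {
   unsigned nr;
   unsigned len;
};

static bool demo_ranges_overlap(struct demo_reg_range a,
                                struct demo_reg_range b)
{
   return a.len > 0 && b.len > 0 &&
          a.nr < b.nr + b.len && b.nr < a.nr + a.len;
}

static bool demo_range_contains(struct demo_reg_range r, unsigned reg)
{
   return reg >= r.nr && reg < r.nr + r.len;
}

/* Returns true if a SEND with this dst/src pair would break the Gen8+
 * restriction quoted above.
 */
bool demo_send_violates_r127_rule(int gen, struct demo_reg_range dst,
                                  struct demo_reg_range src)
{
   if (gen < 8)
      return false;
   return demo_ranges_overlap(dst, src) && demo_range_contains(dst, 127);
}
```

An unspill that reuses its destination as the message source always overlaps, so under this reading it only stays legal if the allocator and any later pass keep r127 out of its destination range.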

With the change to bank conflicts and the typos/style fixed, this
patch is

Reviewed-by: Caio Marcelo de Oliveira Filho 


> +  if (spilled_any_registers) {
> + foreach_block_and_inst(block, fs_inst, inst, cfg) {
> +if ((inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
> +inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) &&
> +inst->dst.file ==VGRF) {

Missing space after the "==".

> +   ra_add_node_interference(g, inst->dst.nr, 
> grf127_send_hack_node);
> +}
>   }
>}
> }
>  
> +

Extra newline?


Thanks,
Caio


Re: [Mesa-dev] [PATCH] st/mesa: call resource_changed when binding a EGLImage to a texture

2018-07-11 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Mon, Jul 9, 2018 at 12:53 PM, Lucas Stach  wrote:
> When a EGLImage is newly bound to a texture, we need to make sure the
> driver is informed that the resource might have changed. Fixes stale
> texture content on Etnaviv when binding an existing EGLImage to an
> existing texture object.
>
> Signed-off-by: Lucas Stach 
> ---
>  src/mesa/state_tracker/st_cb_eglimage.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/mesa/state_tracker/st_cb_eglimage.c 
> b/src/mesa/state_tracker/st_cb_eglimage.c
> index bb092a2f6ef1..4f33cb4bb062 100644
> --- a/src/mesa/state_tracker/st_cb_eglimage.c
> +++ b/src/mesa/state_tracker/st_cb_eglimage.c
> @@ -229,6 +229,7 @@ st_bind_egl_image(struct gl_context *ctx,
> pipe_resource_reference(>pt, stimg->texture);
> st_texture_release_all_sampler_views(st, stObj);
> pipe_resource_reference(>pt, stObj->pt);
> +   st->pipe->screen->resource_changed(st->pipe->screen, stImage->pt);
>
> stObj->surface_format = stimg->format;
> stObj->level_override = stimg->level;
> --
> 2.18.0
>


[Mesa-dev] [Bug 107156] earth tessellation bug

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107156

--- Comment #5 from Timothy Arceri  ---
Here are some really quick instructions for building mesa and bisecting the
issue.

Get and Build RADV:
---

git clone https://gitlab.freedesktop.org/mesa/mesa.git

cd mesa

sudo dnf builddep mesa (this is how you get the mesa build deps on Fedora; I'm
not sure how you would do that on Arch)

sudo mkdir /opt/xorg
sudo chown youruserid /opt/xorg

./autogen.sh --prefix=/opt/xorg --with-dri-drivers="" --enable-gles1 \
  --enable-gles2 --enable-shared-glapi --with-gallium-drivers="" \
  --with-vulkan-drivers=radeon --with-egl-platforms=x11,drm --enable-gbm \
  --enable-glx-tls --enable-dri3

make -j4 && make install

Use your built version of RADV:
---

Add the following to ~/.bashrc, then log out and back in. (Remember to remove
this if you want to switch back to your system version of RADV.)

# User specific aliases and functions
export VK_ICD_FILENAMES=/opt/xorg/share/vulkan/icd.d/radeon_icd.x86_64.json

Bisecting the issue:


Here is a simple intro to git bisect http://webchick.net/node/99

The hardest part will probably be finding a commit that works. From there it
should be straightforward. Maybe start with the 18.1 branchpoint 
6754c2e83d79f93b3a4c8 (that was the end of April).

Good luck.



[Mesa-dev] [Bug 107169] [regression] Upgrade from 18.0.4 to 18.1.0 causes severe stuttering in games

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107169

--- Comment #3 from Timothy Arceri  ---
I highly recommend trying to build mesa from git and doing a git bisect from
the 18.0.0 branch point until the 18.1.0 branch point.

There are not many devs actively working on the r600 driver and bisecting the
bad commit will substantially increase the chance of someone actually fixing
this issue.



[Mesa-dev] [PATCH 2/3] docs/features: Move the Vulkan 1.1 extensions to the 1.1 section

2018-07-11 Thread Jason Ekstrand
While we're at it, add some extensions we missed along the way like the
VK_KHR_maintenanceN extensions.
---
 docs/features.txt | 39 +--
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 0705ff9974b..61ca4d2da65 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -354,39 +354,50 @@ we DO NOT WANT implementations of these extensions for 
Mesa.
 
 Vulkan 1.0 -- all DONE: anv, radv
 
-Khronos extensions that are not part of any Vulkan version:
+Vulkan 1.1 -- all DONE: anv, radv
+
   VK_KHR_16bit_storage  in progress (Alejandro)
-  VK_KHR_android_surface  not started
+  VK_KHR_bind_memory2   DONE (anv, radv)
  VK_KHR_dedicated_allocation   DONE (anv, radv)
  VK_KHR_descriptor_update_template DONE (anv, radv)
-  VK_KHR_display  DONE (anv, radv)
-  VK_KHR_display_swapchain  DONE (anv, radv)
+  VK_KHR_device_group   not started
+  VK_KHR_device_group_creation  not started
  VK_KHR_external_fence DONE (anv, radv)
  VK_KHR_external_fence_capabilities  DONE (anv, radv)
-  VK_KHR_external_fence_fd  DONE (anv, radv)
-  VK_KHR_external_fence_win32   not started
  VK_KHR_external_memory  DONE (anv, radv)
  VK_KHR_external_memory_capabilities   DONE (anv, radv)
-  VK_KHR_external_memory_fd DONE (anv, radv)
-  VK_KHR_external_memory_win32  not started
  VK_KHR_external_semaphore DONE (anv, radv)
  VK_KHR_external_semaphore_capabilities  DONE (anv, radv)
-  VK_KHR_external_semaphore_fd  DONE (anv, radv)
-  VK_KHR_external_semaphore_win32   not started
  VK_KHR_get_memory_requirements2   DONE (anv, radv)
  VK_KHR_get_physical_device_properties2  DONE (anv, radv)
+  VK_KHR_maintenance1   DONE (anv, radv)
+  VK_KHR_maintenance2   DONE (anv, radv)
+  VK_KHR_maintenance3   DONE (anv, radv)
+  VK_KHR_multiview  DONE (anv, radv)
+  VK_KHR_relaxed_block_layout   DONE (anv, radv)
+  VK_KHR_sampler_ycbcr_conversion   DONE (anv, radv)
+  VK_KHR_shader_draw_parameters DONE (anv, radv)
+  VK_KHR_storage_buffer_storage_class   DONE (anv, radv)
+  VK_KHR_variable_pointers  DONE (anv, radv)
+
+Khronos extensions that are not part of any Vulkan version:
+  VK_KHR_android_surface  not started
+  VK_KHR_display  DONE (anv, radv)
+  VK_KHR_display_swapchain  DONE (anv, radv)
+  VK_KHR_external_fence_fd  DONE (anv, radv)
+  VK_KHR_external_fence_win32   not started
+  VK_KHR_external_memory_fd DONE (anv, radv)
+  VK_KHR_external_memory_win32  not started
+  VK_KHR_external_semaphore_fd  DONE (anv, radv)
+  VK_KHR_external_semaphore_win32   not started
  VK_KHR_get_surface_capabilities2  DONE (anv, radv)
  VK_KHR_incremental_present  DONE (anv, radv)
-  VK_KHR_maintenance1   DONE (anv, radv)
  VK_KHR_mir_surface  not started
  VK_KHR_push_descriptor  DONE (anv, radv)
  VK_KHR_sampler_mirror_clamp_to_edge   DONE (anv, radv)
-  VK_KHR_shader_draw_parameters DONE (anv, radv)
  VK_KHR_shared_presentable_image   not started
-  VK_KHR_storage_buffer_storage_class   DONE (anv, radv)
  VK_KHR_surface  DONE (anv, radv)
   VK_KHR_swapchain  DONE (anv, radv)
-  VK_KHR_variable_pointers  DONE (anv, radv)
   VK_KHR_wayland_surfaceDONE (anv, radv)
   VK_KHR_win32_keyed_mutex  not started
   VK_KHR_win32_surface  not started
-- 
2.17.1



[Mesa-dev] [PATCH 1/3] docs/features: Mark some Vulkan extensions as done

2018-07-11 Thread Jason Ekstrand
---
 docs/features.txt | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/features.txt b/docs/features.txt
index 81fe8d0e751..0705ff9974b 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -359,23 +359,23 @@ Khronos extensions that are not part of any Vulkan 
version:
  VK_KHR_android_surface  not started
  VK_KHR_dedicated_allocation   DONE (anv, radv)
  VK_KHR_descriptor_update_template DONE (anv, radv)
-  VK_KHR_display  not started
-  VK_KHR_display_swapchain  not started
-  VK_KHR_external_fence not started
-  VK_KHR_external_fence_capabilities  not started
-  VK_KHR_external_fence_fd  not started
+  VK_KHR_display  DONE (anv, radv)
+  VK_KHR_display_swapchain  DONE (anv, radv)
+  VK_KHR_external_fence DONE (anv, radv)
+  VK_KHR_external_fence_capabilities  DONE (anv, radv)
+  VK_KHR_external_fence_fd  DONE (anv, radv)
  VK_KHR_external_fence_win32   not started
  VK_KHR_external_memory  DONE (anv, radv)
  VK_KHR_external_memory_capabilities   DONE (anv, radv)
  VK_KHR_external_memory_fd DONE (anv, radv)
  VK_KHR_external_memory_win32  not started
-  VK_KHR_external_semaphore DONE (radv)
-  VK_KHR_external_semaphore_capabilities  DONE (radv)
-  VK_KHR_external_semaphore_fd  DONE (radv)
+  VK_KHR_external_semaphore DONE (anv, radv)
+  VK_KHR_external_semaphore_capabilities  DONE (anv, radv)
+  VK_KHR_external_semaphore_fd  DONE (anv, radv)
  VK_KHR_external_semaphore_win32   not started
  VK_KHR_get_memory_requirements2   DONE (anv, radv)
  VK_KHR_get_physical_device_properties2  DONE (anv, radv)
-  VK_KHR_get_surface_capabilities2  DONE (anv)
+  VK_KHR_get_surface_capabilities2  DONE (anv, radv)
  VK_KHR_incremental_present  DONE (anv, radv)
  VK_KHR_maintenance1   DONE (anv, radv)
  VK_KHR_mir_surface  not started
-- 
2.17.1



[Mesa-dev] [PATCH 3/3] docs/features: Add the missing KHR extensions

2018-07-11 Thread Jason Ekstrand
---
 docs/features.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/docs/features.txt b/docs/features.txt
index 61ca4d2da65..a70e1d1cc68 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -381,16 +381,21 @@ Vulkan 1.1 -- all DONE: anv, radv
   VK_KHR_variable_pointers  DONE (anv, radv)
 
 Khronos extensions that are not part of any Vulkan version:
+  VK_KHR_8bit_storage   DONE (anv)
  VK_KHR_android_surface  not started
+  VK_KHR_create_renderpass2 DONE (anv)
  VK_KHR_display  DONE (anv, radv)
  VK_KHR_display_swapchain  DONE (anv, radv)
+  VK_KHR_draw_indirect_count  not started
  VK_KHR_external_fence_fd  DONE (anv, radv)
  VK_KHR_external_fence_win32   not started
  VK_KHR_external_memory_fd DONE (anv, radv)
  VK_KHR_external_memory_win32  not started
  VK_KHR_external_semaphore_fd  DONE (anv, radv)
  VK_KHR_external_semaphore_win32   not started
+  VK_KHR_get_display_properties2  DONE (anv, radv)
  VK_KHR_get_surface_capabilities2  DONE (anv, radv)
+  VK_KHR_image_format_list  DONE (anv, radv)
  VK_KHR_incremental_present  DONE (anv, radv)
  VK_KHR_mir_surface  not started
  VK_KHR_push_descriptor  DONE (anv, radv)
-- 
2.17.1



Re: [Mesa-dev] [PATCH 00/18] anv/pipeline: Do cross-stage linking

2018-07-11 Thread Timothy Arceri

On 12/07/18 07:18, Jason Ekstrand wrote:

I sent out a series for this almost a year ago and it just sat on the list
rotting away.  You can find the original series here:

https://patchwork.freedesktop.org/series/32809/

This v2 is a rebase of that series.  I believe Tim reviewed most of the
original but the rebase was painful enough that it probably merits a second
look-over.  I still have yet to actually be able to tie this to performance
data on anything.  I know there are some Skyrim shaders that are affected
by it but it runs at the same speed before and after.


As mentioned on IRC you are likely missing the most important parts for 
linking to be fully effective (splitting arrays and vectors). The 
following are the outstanding patches for i965.


https://patchwork.freedesktop.org/patch/189449/
https://patchwork.freedesktop.org/patch/189451/
https://patchwork.freedesktop.org/patch/189452/

With those there was a noticeable boost to the tessellation Vulkan demo 
for RADV if I'm remembering correctly.



Re: [Mesa-dev] [PATCH 1/3] egl/android: Delete set_damage_region from egl dri vtbl

2018-07-11 Thread Eric Anholt
Harish Krupo  writes:

> Hi Eric,
>
> Eric Anholt  writes:
>
>> Harish Krupo  writes:
>>
>>> The intension of the KHR_partial_update was not to send the damage back
>>> to the platform but to send the damage to the driver to ensure that the
>>> following rendering could be restricted to those regions.
>>> This patch removes the set_damage_region from the egl_dri vtbl and all
>>> the platfrom_*.c files.
>>> Then upcomming patches add a new dri2 interface for the drivers to
>>> implement
>>>
>>> Signed-off-by: Harish Krupo 
>>
>> Why shouldn't the platform know about the damage region in a swap, if
>> it's available?  It looks like it was successfully used for Android, and
>> we should be using it for Present as well.
>
> From the spec [1], the damage region referred to by partial_update spec is
> the damaged part of the buffer when it is used again. The damage that the
> compositor/platform needs to know is the damage between the (n-1)th
> frame and the nth frame. Quoting from the spec:
> "   The surface damage for frame n is the difference between frame n and frame
> (n-1), and represents the area that a compositor must recompose."
> This is the damage referred to by the swap_buffers_with_damage spec [2],
> whereas the partial_update damage region's objective is to restrict the 
> subsequent
> rendering operations on the back buffer, to only those regions which have 
> changed since
> that buffer was last used. This information is available as the buffer
> age. Some more information: [3].

OK, let's document that in the new internal API you're adding then.
Things I'd want to know as an implementer of the hook:

1) Am I guaranteed that it's called before the frame is started?

2) Is the behavior if the client draws outside of the partial update
damage region defined?  (is it "the driver must not change pixels
outside of the partial region" or "the driver might not change pixels
outside of the partial region")

3) Is the client guaranteed to fully initialize pixels in the partial
update region, or might it depend on previous contents?
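
For whoever documents the hook, here is a rough sketch of how buffer age relates the two kinds of damage discussed above, as I understand it. It is not any driver's code, the demo_* names are made up, and it uses a bounding box instead of a real region for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open rectangle; empty when x1 >= x2 or y1 >= y2. */
struct demo_rect {
   int x1, y1, x2, y2;
};

static int demo_rect_empty(struct demo_rect r)
{
   return r.x1 >= r.x2 || r.y1 >= r.y2;
}

/* Bounding box of two rectangles (an over-approximation of their union). */
static struct demo_rect demo_rect_union(struct demo_rect a,
                                        struct demo_rect b)
{
   if (demo_rect_empty(a))
      return b;
   if (demo_rect_empty(b))
      return a;
   struct demo_rect u = {
      a.x1 < b.x1 ? a.x1 : b.x1,
      a.y1 < b.y1 ? a.y1 : b.y1,
      a.x2 > b.x2 ? a.x2 : b.x2,
      a.y2 > b.y2 ? a.y2 : b.y2
   };
   return u;
}

/* history[0] is the surface damage of the frame about to be drawn,
 * history[1] that of the previous frame, and so on.  A back buffer of
 * age N is missing the last N surface damages, so its buffer damage is
 * their union.  Age 0 (undefined contents) or an age older than the
 * recorded history falls back to the full surface.
 */
struct demo_rect demo_buffer_damage(const struct demo_rect *history,
                                    unsigned history_len, unsigned age,
                                    struct demo_rect full)
{
   assert(history != NULL || history_len == 0);

   if (age == 0 || age > history_len)
      return full;

   struct demo_rect acc = { 0, 0, 0, 0 };
   for (unsigned i = 0; i < age; i++)
      acc = demo_rect_union(acc, history[i]);
   return acc;
}
```

With age 1 the result is just the current frame's surface damage, which is the only case where the two specs describe the same region.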




[Mesa-dev] [ANNOUNCE] Mesa 18.1.4 release candidate

2018-07-11 Thread Dylan Baker
Hi List,

Mesa 18.1.4 is planned for release this Friday, July 13th, at or around 10 AM
PDT.

There are currently:
 - 27 queued
 - 1 nominated
 - 0 rejected

The "staging/18.1" branch in the mesa gitlab repo will (unless bugs
are found) be rebased into the 18.1 branch for the release on Friday. It has
already been run through the Intel CI and passes there; anyone who wants to
test on other hardware/drivers, please do so and report any bugs.

Dylan




Re: [Mesa-dev] [PATCH 2/2] u_blitter: Add an option to draw the triangles using an index buffer.

2018-07-11 Thread Marek Olšák
For the series:

Reviewed-by: Marek Olšák 

Marek

On Wed, Jul 11, 2018 at 6:05 PM, Eric Anholt  wrote:
> For V3D, the HW will interpolate slightly differently along the shared
> edge of the trifan.  The conformance tests manage to catch this in the
> nearest_consistency_* group.  To get interpolation to match, we need the
> last vertex of the triangle to be shared.
>
> I first tried implementing draw_rectangle to do triangles instead, but
> that was quite a bit (147 lines) of code duplication from u_blitter, and
> this seems much simpler and less likely to break as u_blitter changes.
>
> Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D.
> ---
>  src/gallium/auxiliary/util/u_blitter.c | 16 ++--
>  src/gallium/auxiliary/util/u_blitter.h |  2 ++
>  src/gallium/drivers/v3d/v3d_context.c  |  1 +
>  3 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_blitter.c 
> b/src/gallium/auxiliary/util/u_blitter.c
> index 4748627fc523..eadb76a109fb 100644
> --- a/src/gallium/auxiliary/util/u_blitter.c
> +++ b/src/gallium/auxiliary/util/u_blitter.c
> @@ -1258,8 +1258,20 @@ static void blitter_draw(struct blitter_context_priv 
> *ctx,
> pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, );
> pipe->bind_vertex_elements_state(pipe, vertex_elements_cso);
> pipe->bind_vs_state(pipe, get_vs(&ctx->base));
> -   util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
> -  0, num_instances);
> +
> +   if (ctx->base.use_index_buffer) {
> +  /* Note that for V3D,
> +   * dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* require
> +   * that the last vert of the two tris be the same.
> +   */
> +  static uint8_t indices[6] = { 0, 1, 2, 0, 3, 2 };
> +  util_draw_elements_instanced(pipe, indices, 1, 0,
> +   PIPE_PRIM_TRIANGLES, 0, 6,
> +   0, num_instances);
> +   } else {
> +  util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
> + 0, num_instances);
> +   }
> pipe_resource_reference(, NULL);
>  }
>
> diff --git a/src/gallium/auxiliary/util/u_blitter.h 
> b/src/gallium/auxiliary/util/u_blitter.h
> index 9e945983baac..9ea1dc9b6b28 100644
> --- a/src/gallium/auxiliary/util/u_blitter.h
> +++ b/src/gallium/auxiliary/util/u_blitter.h
> @@ -100,6 +100,8 @@ struct blitter_context
> /* Whether the blitter is running. */
> bool running;
>
> +   bool use_index_buffer;
> +
> /* Private members, really. */
> struct pipe_context *pipe; /**< pipe context */
>
> diff --git a/src/gallium/drivers/v3d/v3d_context.c 
> b/src/gallium/drivers/v3d/v3d_context.c
> index cef32ceb069d..6fb807b1aa8a 100644
> --- a/src/gallium/drivers/v3d/v3d_context.c
> +++ b/src/gallium/drivers/v3d/v3d_context.c
> @@ -164,6 +164,7 @@ v3d_context_create(struct pipe_screen *pscreen, void 
> *priv, unsigned flags)
>  v3d->blitter = util_blitter_create(pctx);
>  if (!v3d->blitter)
>  goto fail;
> +v3d->blitter->use_index_buffer = true;
>
>  v3d->primconvert = util_primconvert_create(pctx,
> (1 << PIPE_PRIM_QUADS) - 
> 1);
> --
> 2.18.0
>


[Mesa-dev] [PATCH 1/2] u_draw: Add some indices to the util_draw_elements() helpers.

2018-07-11 Thread Eric Anholt
These helpers have been unused, and were definitely not useful since
330d0607ed60 ("gallium: remove pipe_index_buffer and set_index_buffer")
made it so that they never had an index buffer passed in.

For an upcoming u_blitter change to use these helpers, I have just 6 bytes
of index data, so pass it as user data until a more interesting caller
comes along.
---
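Reader's note (not part of the commit): a minimal standalone sketch of what
passing the indices as user data amounts to. The struct sketch_draw_info and
both helpers are hypothetical stand-ins, not gallium's pipe_draw_info; only
the field names has_user_indices, index.user and index_size are taken from
the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few draw-info fields the patch touches. */
struct sketch_draw_info {
   bool has_user_indices;   /* indices live in CPU memory, not a buffer object */
   unsigned index_size;     /* bytes per index: 1, 2 or 4 */
   struct {
      const void *user;     /* pointer to the caller's index array */
   } index;
   unsigned start;
   unsigned count;
};

/* Fill the hypothetical info the way the helpers in the diff do. */
static struct sketch_draw_info
sketch_draw_elements_info(const void *indices, unsigned index_size,
                          unsigned start, unsigned count)
{
   struct sketch_draw_info info = {0};
   info.index.user = indices;
   info.has_user_indices = true;
   info.index_size = index_size;
   info.start = start;
   info.count = count;
   return info;
}

/* Read index number i (relative to info->start) through the user pointer. */
static uint32_t
sketch_fetch_index(const struct sketch_draw_info *info, unsigned i)
{
   assert(info->has_user_indices && i < info->count);
   size_t n = (size_t)info->start + i;

   switch (info->index_size) {
   case 1: return ((const uint8_t *)info->index.user)[n];
   case 2: return ((const uint16_t *)info->index.user)[n];
   case 4: return ((const uint32_t *)info->index.user)[n];
   default: assert(!"unsupported index size"); return 0;
   }
}
```

With six uint8_t indices, index_size is 1 and the array can simply be a
static in the caller, which is what makes the user-pointer path convenient
for very small draws.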
 src/gallium/auxiliary/util/u_draw.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/util/u_draw.h b/src/gallium/auxiliary/util/u_draw.h
index e8af14051b17..d0955fa3f978 100644
--- a/src/gallium/auxiliary/util/u_draw.h
+++ b/src/gallium/auxiliary/util/u_draw.h
@@ -67,7 +67,9 @@ util_draw_arrays(struct pipe_context *pipe,
 }
 
 static inline void
-util_draw_elements(struct pipe_context *pipe, unsigned index_size,
+util_draw_elements(struct pipe_context *pipe,
+   void *indices,
+   unsigned index_size,
int index_bias, enum pipe_prim_type mode,
uint start,
uint count)
@@ -75,6 +77,8 @@ util_draw_elements(struct pipe_context *pipe, unsigned index_size,
struct pipe_draw_info info;
 
util_draw_init_info(&info);
+   info.index.user = indices;
+   info.has_user_indices = true;
info.index_size = index_size;
info.mode = mode;
info.start = start;
@@ -108,6 +112,7 @@ util_draw_arrays_instanced(struct pipe_context *pipe,
 
 static inline void
 util_draw_elements_instanced(struct pipe_context *pipe,
+ void *indices,
  unsigned index_size,
  int index_bias,
  enum pipe_prim_type mode,
@@ -119,6 +124,8 @@ util_draw_elements_instanced(struct pipe_context *pipe,
struct pipe_draw_info info;
 
util_draw_init_info(&info);
+   info.index.user = indices;
+   info.has_user_indices = true;
info.index_size = index_size;
info.mode = mode;
info.start = start;
-- 
2.18.0



[Mesa-dev] [PATCH 2/2] u_blitter: Add an option to draw the triangles using an index buffer.

2018-07-11 Thread Eric Anholt
For V3D, the HW will interpolate slightly differently along the shared
edge of the trifan.  The conformance tests manage to catch this in the
nearest_consistency_* group.  To get interpolation to match, we need the
two triangles to share their last vertex.

I first tried implementing draw_rectangle to do triangles instead, but
that was quite a bit (147 lines) of code duplication from u_blitter, and
this seems much simpler and less likely to break as u_blitter changes.

Fixes dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* on V3D.
---
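Reader's note (not part of the commit): to make the shared-vertex point
concrete, here is a small standalone sketch in plain C with no gallium types.
The helper names are hypothetical. It expands a 4-vertex triangle fan into a
triangle list and checks the last vertex of each triangle, then does the same
for the index list used in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Expand a triangle fan over vertices 0..num_verts-1 into a triangle list.
 * Triangle i of a fan is (0, i + 1, i + 2).  Returns the triangle count. */
static unsigned
sketch_fan_to_list(unsigned num_verts, uint8_t *out)
{
   assert(num_verts >= 3);
   unsigned num_tris = num_verts - 2;
   for (unsigned i = 0; i < num_tris; i++) {
      out[3 * i + 0] = 0;
      out[3 * i + 1] = (uint8_t)(i + 1);
      out[3 * i + 2] = (uint8_t)(i + 2);
   }
   return num_tris;
}

/* True if every triangle in the list ends on the same vertex as the first. */
static bool
sketch_last_vertex_shared(const uint8_t *indices, unsigned num_tris)
{
   for (unsigned i = 1; i < num_tris; i++) {
      if (indices[3 * i + 2] != indices[2])
         return false;
   }
   return true;
}
```

The fan yields (0, 1, 2) and (0, 2, 3), which end on different vertices,
while { 0, 1, 2, 0, 3, 2 } ends both triangles on vertex 2.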
 src/gallium/auxiliary/util/u_blitter.c | 16 ++--
 src/gallium/auxiliary/util/u_blitter.h |  2 ++
 src/gallium/drivers/v3d/v3d_context.c  |  1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_blitter.c b/src/gallium/auxiliary/util/u_blitter.c
index 4748627fc523..eadb76a109fb 100644
--- a/src/gallium/auxiliary/util/u_blitter.c
+++ b/src/gallium/auxiliary/util/u_blitter.c
@@ -1258,8 +1258,20 @@ static void blitter_draw(struct blitter_context_priv *ctx,
pipe->set_vertex_buffers(pipe, ctx->base.vb_slot, 1, );
pipe->bind_vertex_elements_state(pipe, vertex_elements_cso);
pipe->bind_vs_state(pipe, get_vs(&ctx->base));
-   util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
-  0, num_instances);
+
+   if (ctx->base.use_index_buffer) {
+  /* Note that for V3D,
+   * dEQP-GLES3.functional.fbo.blit.rect.nearest_consistency_* require
+   * that the last vert of the two tris be the same.
+   */
+  static uint8_t indices[6] = { 0, 1, 2, 0, 3, 2 };
+  util_draw_elements_instanced(pipe, indices, 1, 0,
+   PIPE_PRIM_TRIANGLES, 0, 6,
+   0, num_instances);
+   } else {
+  util_draw_arrays_instanced(pipe, PIPE_PRIM_TRIANGLE_FAN, 0, 4,
+ 0, num_instances);
+   }
pipe_resource_reference(, NULL);
 }
 
diff --git a/src/gallium/auxiliary/util/u_blitter.h b/src/gallium/auxiliary/util/u_blitter.h
index 9e945983baac..9ea1dc9b6b28 100644
--- a/src/gallium/auxiliary/util/u_blitter.h
+++ b/src/gallium/auxiliary/util/u_blitter.h
@@ -100,6 +100,8 @@ struct blitter_context
/* Whether the blitter is running. */
bool running;
 
+   bool use_index_buffer;
+
/* Private members, really. */
struct pipe_context *pipe; /**< pipe context */
 
diff --git a/src/gallium/drivers/v3d/v3d_context.c b/src/gallium/drivers/v3d/v3d_context.c
index cef32ceb069d..6fb807b1aa8a 100644
--- a/src/gallium/drivers/v3d/v3d_context.c
+++ b/src/gallium/drivers/v3d/v3d_context.c
@@ -164,6 +164,7 @@ v3d_context_create(struct pipe_screen *pscreen, void *priv, unsigned flags)
 v3d->blitter = util_blitter_create(pctx);
 if (!v3d->blitter)
 goto fail;
+v3d->blitter->use_index_buffer = true;
 
 v3d->primconvert = util_primconvert_create(pctx,
(1 << PIPE_PRIM_QUADS) - 1);
-- 
2.18.0



Re: [Mesa-dev] [PATCH 4/6] nir: move lowering of SYSTEM_VALUE_LOCAL_GROUP_SIZE into a function

2018-07-11 Thread Jason Ekstrand
On Wed, Jul 11, 2018 at 2:29 PM Karol Herbst  wrote:

> we already have this code duplicated and we will need it for the global
> group size as well
>
> Signed-off-by: Karol Herbst 
> ---
>  src/compiler/nir/nir_lower_system_values.c | 29 +++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/src/compiler/nir/nir_lower_system_values.c
> b/src/compiler/nir/nir_lower_system_values.c
> index f315b7ae96f..2a1be8fdd45 100644
> --- a/src/compiler/nir/nir_lower_system_values.c
> +++ b/src/compiler/nir/nir_lower_system_values.c
> @@ -28,6 +28,17 @@
>  #include "nir.h"
>  #include "nir_builder.h"
>
> +static nir_ssa_def*
> +handle_local_group_size(nir_builder *b)
>

How about build_local_group_size_imm or something like that?  "handle" is a
bit of an odd name.


> +{
> +   nir_const_value local_size;
> +   memset(&local_size, 0, sizeof(local_size));
> +   local_size.u32[0] = b->shader->info.cs.local_size[0];
> +   local_size.u32[1] = b->shader->info.cs.local_size[1];
> +   local_size.u32[2] = b->shader->info.cs.local_size[2];
> +   return nir_build_imm(b, 3, 32, local_size);
> +}
> +
>  static bool
>  convert_block(nir_block *block, nir_builder *b)
>  {
> @@ -66,18 +77,11 @@ convert_block(nir_block *block, nir_builder *b)
>*"The value of gl_GlobalInvocationID is equal to
>*gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
>*/
> -
> - nir_const_value local_size;
> - memset(&local_size, 0, sizeof(local_size));
> - local_size.u32[0] = b->shader->info.cs.local_size[0];
> - local_size.u32[1] = b->shader->info.cs.local_size[1];
> - local_size.u32[2] = b->shader->info.cs.local_size[2];
> -
> + nir_ssa_def *group_size = handle_local_group_size(b);
>   nir_ssa_def *group_id = nir_load_work_group_id(b);
>   nir_ssa_def *local_id = nir_load_local_invocation_id(b);
>
> - sysval = nir_iadd(b, nir_imul(b, group_id,
> -   nir_build_imm(b, 3, 32,
> local_size)),
> + sysval = nir_iadd(b, nir_imul(b, group_id, group_size),
>local_id);
>

Can this fit on one line now?  Not that it matters much.

4 and 5 are

Reviewed-by: Jason Ekstrand 


>   break;
>}
> @@ -112,12 +116,7 @@ convert_block(nir_block *block, nir_builder *b)
>}
>
>case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
> - nir_const_value local_size;
> - memset(&local_size, 0, sizeof(local_size));
> - local_size.u32[0] = b->shader->info.cs.local_size[0];
> - local_size.u32[1] = b->shader->info.cs.local_size[1];
> - local_size.u32[2] = b->shader->info.cs.local_size[2];
> - sysval = nir_build_imm(b, 3, 32, local_size);
> + sysval = handle_local_group_size(b);
>   break;
>}
>
> --
> 2.17.1
>


Re: [Mesa-dev] [PATCH 6/6] vtn: handle OpConstantComposites with OpUndef members

2018-07-11 Thread Jason Ekstrand
1, 2, 3, and 6 are

Reviewed-by: Jason Ekstrand 

On Wed, Jul 11, 2018 at 2:30 PM Karol Herbst  wrote:

> Signed-off-by: Karol Herbst 
> ---
>  src/compiler/spirv/spirv_to_nir.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/spirv/spirv_to_nir.c
> b/src/compiler/spirv/spirv_to_nir.c
> index 413fbf481c1..235003e872a 100644
> --- a/src/compiler/spirv/spirv_to_nir.c
> +++ b/src/compiler/spirv/spirv_to_nir.c
> @@ -1494,8 +1494,19 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp
> opcode,
>spirv_op_to_string(opcode), elem_count,
> val->type->length);
>
>nir_constant **elems = ralloc_array(b, nir_constant *, elem_count);
> -  for (unsigned i = 0; i < elem_count; i++)
> - elems[i] = vtn_value(b, w[i + 3],
> vtn_value_type_constant)->constant;
> +  for (unsigned i = 0; i < elem_count; i++) {
> + struct vtn_value *val = vtn_untyped_value(b, w[i + 3]);
> +
> + if (val->value_type == vtn_value_type_constant) {
> +elems[i] = val->constant;
> + } else {
> +vtn_fail_if(val->value_type != vtn_value_type_undef,
> +"only constants or undefs allowed for "
> +"SpvOpConstantComposite");
> +/* to make it easier, just insert a NULL constant for now */
> +elems[i] = vtn_null_constant(b, val->type->type);
> + }
> +  }
>
>switch (val->type->base_type) {
>case vtn_base_type_vector: {
> --
> 2.17.1
>


[Mesa-dev] [PATCH 6/6] vtn: handle OpConstantComposites with OpUndef members

2018-07-11 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
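Reader's note (not part of the commit): a reduced standalone sketch of the
member-gathering loop in the diff below. The types and names here are
hypothetical stand-ins for the vtn ones. A plain int plays the role of a
nir_constant, and 0 stands in for the null constant substituted for undef
members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the vtn value kinds. */
enum sketch_value_type {
   SKETCH_VALUE_CONSTANT,
   SKETCH_VALUE_UNDEF,
   SKETCH_VALUE_OTHER,
};

struct sketch_value {
   enum sketch_value_type value_type;
   int constant;   /* only meaningful for SKETCH_VALUE_CONSTANT */
};

/* Gather composite members: constants are copied through, undefs become 0
 * (standing in for a null constant), anything else is rejected.
 * Returns false on the first member that is neither constant nor undef. */
static bool
sketch_gather_members(const struct sketch_value *members, size_t count,
                      int *elems)
{
   assert(members && elems);
   for (size_t i = 0; i < count; i++) {
      if (members[i].value_type == SKETCH_VALUE_CONSTANT) {
         elems[i] = members[i].constant;
      } else if (members[i].value_type == SKETCH_VALUE_UNDEF) {
         elems[i] = 0;
      } else {
         return false;
      }
   }
   return true;
}
```

Returning false here corresponds to the vtn_fail_if() path in the patch; the
sketch just reports the failure instead of aborting the translation.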
 src/compiler/spirv/spirv_to_nir.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 413fbf481c1..235003e872a 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1494,8 +1494,19 @@ vtn_handle_constant(struct vtn_builder *b, SpvOp opcode,
   spirv_op_to_string(opcode), elem_count, val->type->length);
 
   nir_constant **elems = ralloc_array(b, nir_constant *, elem_count);
-  for (unsigned i = 0; i < elem_count; i++)
- elems[i] = vtn_value(b, w[i + 3], vtn_value_type_constant)->constant;
+  for (unsigned i = 0; i < elem_count; i++) {
+ struct vtn_value *val = vtn_untyped_value(b, w[i + 3]);
+
+ if (val->value_type == vtn_value_type_constant) {
+elems[i] = val->constant;
+ } else {
+vtn_fail_if(val->value_type != vtn_value_type_undef,
+"only constants or undefs allowed for "
+"SpvOpConstantComposite");
+/* to make it easier, just insert a NULL constant for now */
+elems[i] = vtn_null_constant(b, val->type->type);
+ }
+  }
 
   switch (val->type->base_type) {
   case vtn_base_type_vector: {
-- 
2.17.1



[Mesa-dev] [PATCH 3/6] compiler: add missing entries to gl_system_value_name

2018-07-11 Thread Karol Herbst
also reorder to match the gl_system_value enum.

It is weird that the STATIC_ASSERT doesn't trigger though.

Signed-off-by: Karol Herbst 
---
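Reader's note (not part of the commit): one possible explanation for the
assert staying quiet, assuming the name table is built from designated
initializers (an ENUM(x) macro expanding to [x] = #x) and the assert compares
the array length against the enum's MAX value. In that case the length is set
by the highest index present, so entries missing from the middle leave NULL
holes without shrinking the array. The enum and table below are hypothetical
and only demonstrate that C behaviour.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_sysval {
   SKETCH_A,
   SKETCH_B,
   SKETCH_C,
   SKETCH_MAX
};

#define SKETCH_ENUM(x) [x] = #x
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* SKETCH_B is deliberately missing from the table. */
static const char *const sketch_names[] = {
   SKETCH_ENUM(SKETCH_A),
   SKETCH_ENUM(SKETCH_C),
};

/* Still passes: the array length is (highest index + 1), not the number of
 * initializers that were written out. */
_Static_assert(SKETCH_ARRAY_SIZE(sketch_names) == SKETCH_MAX,
               "table length matches enum");

/* Look up a name, falling back to "UNKNOWN" for holes in the table. */
static const char *
sketch_name(enum sketch_sysval v)
{
   if ((size_t)v < SKETCH_ARRAY_SIZE(sketch_names) && sketch_names[v])
      return sketch_names[v];
   return "UNKNOWN";
}
```

Here sketch_name(SKETCH_B) falls back to "UNKNOWN" even though the static
assert holds. It also means the order of the entries in such a table does not
affect the lookup.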
 src/compiler/shader_enums.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/compiler/shader_enums.c b/src/compiler/shader_enums.c
index d596d7d04a0..0f3b1acae7a 100644
--- a/src/compiler/shader_enums.c
+++ b/src/compiler/shader_enums.c
@@ -216,16 +216,18 @@ gl_system_value_name(gl_system_value sysval)
  ENUM(SYSTEM_VALUE_INSTANCE_ID),
  ENUM(SYSTEM_VALUE_INSTANCE_INDEX),
  ENUM(SYSTEM_VALUE_VERTEX_ID_ZERO_BASE),
+ ENUM(SYSTEM_VALUE_BASE_VERTEX),
  ENUM(SYSTEM_VALUE_FIRST_VERTEX),
  ENUM(SYSTEM_VALUE_IS_INDEXED_DRAW),
- ENUM(SYSTEM_VALUE_BASE_VERTEX),
  ENUM(SYSTEM_VALUE_BASE_INSTANCE),
  ENUM(SYSTEM_VALUE_DRAW_ID),
  ENUM(SYSTEM_VALUE_INVOCATION_ID),
+ ENUM(SYSTEM_VALUE_FRAG_COORD),
  ENUM(SYSTEM_VALUE_FRONT_FACE),
  ENUM(SYSTEM_VALUE_SAMPLE_ID),
  ENUM(SYSTEM_VALUE_SAMPLE_POS),
  ENUM(SYSTEM_VALUE_SAMPLE_MASK_IN),
+ ENUM(SYSTEM_VALUE_HELPER_INVOCATION),
  ENUM(SYSTEM_VALUE_TESS_COORD),
  ENUM(SYSTEM_VALUE_VERTICES_IN),
  ENUM(SYSTEM_VALUE_PRIMITIVE_ID),
@@ -236,6 +238,7 @@ gl_system_value_name(gl_system_value sysval)
  ENUM(SYSTEM_VALUE_GLOBAL_INVOCATION_ID),
  ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
  ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
+ ENUM(SYSTEM_VALUE_LOCAL_GROUP_SIZE),
  ENUM(SYSTEM_VALUE_DEVICE_INDEX),
  ENUM(SYSTEM_VALUE_VIEW_INDEX),
  ENUM(SYSTEM_VALUE_VERTEX_CNT),
-- 
2.17.1



[Mesa-dev] [PATCH 5/6] nir/vtn: implement BuiltInGlobalSize

2018-07-11 Thread Karol Herbst
Signed-off-by: Karol Herbst 
---
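Reader's note (not part of the commit): the lowering in the diff below
multiplies the local group size by the number of work groups, component by
component. A minimal standalone sketch of that arithmetic, with a
hypothetical helper name and plain arrays instead of NIR values:

```c
#include <assert.h>
#include <stdint.h>

/* Per-component product, mirroring the nir_imul in the diff:
 * global_size[i] = local_size[i] * num_work_groups[i]. */
static void
sketch_global_group_size(const uint32_t local_size[3],
                         const uint32_t num_work_groups[3],
                         uint32_t global_size[3])
{
   for (int i = 0; i < 3; i++)
      global_size[i] = local_size[i] * num_work_groups[i];
}
```

For example, a local size of (8, 4, 1) dispatched over (16, 2, 3) work groups
gives a global size of (128, 8, 3).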
 src/compiler/nir/nir_lower_system_values.c | 7 +++
 src/compiler/shader_enums.c| 1 +
 src/compiler/shader_enums.h| 1 +
 src/compiler/spirv/vtn_variables.c | 4 
 4 files changed, 13 insertions(+)

diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
index 2a1be8fdd45..657289b4f61 100644
--- a/src/compiler/nir/nir_lower_system_values.c
+++ b/src/compiler/nir/nir_lower_system_values.c
@@ -172,6 +172,13 @@ convert_block(nir_block *block, nir_builder *b)
 sysval = nir_imm_int(b, 0);
  break;
 
+  case SYSTEM_VALUE_GLOBAL_GROUP_SIZE: {
+ nir_ssa_def *group_size = handle_local_group_size(b);
+ nir_ssa_def *num_work_groups = nir_load_num_work_groups(b);
+ sysval = nir_imul(b, group_size, num_work_groups);
+ break;
+  }
+
   default:
  break;
   }
diff --git a/src/compiler/shader_enums.c b/src/compiler/shader_enums.c
index 0f3b1acae7a..4eade256604 100644
--- a/src/compiler/shader_enums.c
+++ b/src/compiler/shader_enums.c
@@ -239,6 +239,7 @@ gl_system_value_name(gl_system_value sysval)
  ENUM(SYSTEM_VALUE_WORK_GROUP_ID),
  ENUM(SYSTEM_VALUE_NUM_WORK_GROUPS),
  ENUM(SYSTEM_VALUE_LOCAL_GROUP_SIZE),
+ ENUM(SYSTEM_VALUE_GLOBAL_GROUP_SIZE),
  ENUM(SYSTEM_VALUE_DEVICE_INDEX),
  ENUM(SYSTEM_VALUE_VIEW_INDEX),
  ENUM(SYSTEM_VALUE_VERTEX_CNT),
diff --git a/src/compiler/shader_enums.h b/src/compiler/shader_enums.h
index 1ef4d5a33d0..280bf1d2835 100644
--- a/src/compiler/shader_enums.h
+++ b/src/compiler/shader_enums.h
@@ -585,6 +585,7 @@ typedef enum
SYSTEM_VALUE_WORK_GROUP_ID,
SYSTEM_VALUE_NUM_WORK_GROUPS,
SYSTEM_VALUE_LOCAL_GROUP_SIZE,
+   SYSTEM_VALUE_GLOBAL_GROUP_SIZE,
/*@}*/
 
/** Required for VK_KHR_device_group */
diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables.c
index b7c9e6f2f70..c86416495b6 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1208,6 +1208,10 @@ vtn_get_builtin_location(struct vtn_builder *b,
   *location = FRAG_RESULT_STENCIL;
   vtn_assert(*mode == nir_var_shader_out);
   break;
+   case SpvBuiltInGlobalSize:
+  *location = SYSTEM_VALUE_GLOBAL_GROUP_SIZE;
+  set_mode_system_value(b, mode);
+  break;
default:
   vtn_fail("unsupported builtin");
}
-- 
2.17.1



[Mesa-dev] [PATCH 2/6] nir/vtn: print extension name in fail msg

2018-07-11 Thread Karol Herbst
From: Rob Clark 

Reviewed-by: Karol Herbst 
Signed-off-by: Karol Herbst 
---
 src/compiler/spirv/spirv_to_nir.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index aad4c713f9e..413fbf481c1 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -383,19 +383,20 @@ static void
 vtn_handle_extension(struct vtn_builder *b, SpvOp opcode,
  const uint32_t *w, unsigned count)
 {
+   const char *ext = (const char *)&w[2];
switch (opcode) {
case SpvOpExtInstImport: {
   struct vtn_value *val = vtn_push_value(b, w[1], vtn_value_type_extension);
-  if (strcmp((const char *)&w[2], "GLSL.std.450") == 0) {
+  if (strcmp(ext, "GLSL.std.450") == 0) {
  val->ext_handler = vtn_handle_glsl450_instruction;
-  } else if ((strcmp((const char *)&w[2], "SPV_AMD_gcn_shader") == 0)
+  } else if ((strcmp(ext, "SPV_AMD_gcn_shader") == 0)
 && (b->options && b->options->caps.gcn_shader)) {
  val->ext_handler = vtn_handle_amd_gcn_shader_instruction;
-  } else if ((strcmp((const char *)&w[2], "SPV_AMD_shader_trinary_minmax") == 0)
+  } else if ((strcmp(ext, "SPV_AMD_shader_trinary_minmax") == 0)
 && (b->options && b->options->caps.trinary_minmax)) {
  val->ext_handler = vtn_handle_amd_shader_trinary_minmax_instruction;
   } else {
- vtn_fail("Unsupported extension");
+ vtn_fail("Unsupported extension: %s", ext);
   }
   break;
}
-- 
2.17.1



[Mesa-dev] [PATCH 4/6] nir: move lowering of SYSTEM_VALUE_LOCAL_GROUP_SIZE into a function

2018-07-11 Thread Karol Herbst
we already have this code duplicated and we will need it for the global
group size as well

Signed-off-by: Karol Herbst 
---
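Reader's note (not part of the commit): a standalone sketch of the
refactoring idea, with hypothetical names and a small struct in place of
nir_ssa_def. One helper produces the compile-time local size, and the
gl_GlobalInvocationID computation reuses it instead of rebuilding the
constant inline.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 3-component unsigned vector value. */
struct sketch_uvec3 {
   uint32_t v[3];
};

/* The shared helper: build the compile-time local size in one place. */
static struct sketch_uvec3
sketch_local_group_size(const uint32_t local_size[3])
{
   struct sketch_uvec3 r = {{ local_size[0], local_size[1], local_size[2] }};
   return r;
}

/* gl_GlobalInvocationID =
 *    gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
static struct sketch_uvec3
sketch_global_invocation_id(const uint32_t local_size[3],
                            struct sketch_uvec3 group_id,
                            struct sketch_uvec3 local_id)
{
   struct sketch_uvec3 size = sketch_local_group_size(local_size);
   struct sketch_uvec3 r;
   for (int i = 0; i < 3; i++)
      r.v[i] = group_id.v[i] * size.v[i] + local_id.v[i];
   return r;
}
```

With a local size of (8, 4, 1), work group (2, 0, 1) and local invocation
(3, 1, 0), the global invocation ID is (19, 1, 1).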
 src/compiler/nir/nir_lower_system_values.c | 29 +++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/src/compiler/nir/nir_lower_system_values.c b/src/compiler/nir/nir_lower_system_values.c
index f315b7ae96f..2a1be8fdd45 100644
--- a/src/compiler/nir/nir_lower_system_values.c
+++ b/src/compiler/nir/nir_lower_system_values.c
@@ -28,6 +28,17 @@
 #include "nir.h"
 #include "nir_builder.h"
 
+static nir_ssa_def*
+handle_local_group_size(nir_builder *b)
+{
+   nir_const_value local_size;
+   memset(&local_size, 0, sizeof(local_size));
+   local_size.u32[0] = b->shader->info.cs.local_size[0];
+   local_size.u32[1] = b->shader->info.cs.local_size[1];
+   local_size.u32[2] = b->shader->info.cs.local_size[2];
+   return nir_build_imm(b, 3, 32, local_size);
+}
+
 static bool
 convert_block(nir_block *block, nir_builder *b)
 {
@@ -66,18 +77,11 @@ convert_block(nir_block *block, nir_builder *b)
   *"The value of gl_GlobalInvocationID is equal to
   *gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID"
   */
-
- nir_const_value local_size;
- memset(&local_size, 0, sizeof(local_size));
- local_size.u32[0] = b->shader->info.cs.local_size[0];
- local_size.u32[1] = b->shader->info.cs.local_size[1];
- local_size.u32[2] = b->shader->info.cs.local_size[2];
-
+ nir_ssa_def *group_size = handle_local_group_size(b);
  nir_ssa_def *group_id = nir_load_work_group_id(b);
  nir_ssa_def *local_id = nir_load_local_invocation_id(b);
 
- sysval = nir_iadd(b, nir_imul(b, group_id,
-   nir_build_imm(b, 3, 32, local_size)),
+ sysval = nir_iadd(b, nir_imul(b, group_id, group_size),
   local_id);
  break;
   }
@@ -112,12 +116,7 @@ convert_block(nir_block *block, nir_builder *b)
   }
 
   case SYSTEM_VALUE_LOCAL_GROUP_SIZE: {
- nir_const_value local_size;
- memset(&local_size, 0, sizeof(local_size));
- local_size.u32[0] = b->shader->info.cs.local_size[0];
- local_size.u32[1] = b->shader->info.cs.local_size[1];
- local_size.u32[2] = b->shader->info.cs.local_size[2];
- sysval = nir_build_imm(b, 3, 32, local_size);
+ sysval = handle_local_group_size(b);
  break;
   }
 
-- 
2.17.1



[Mesa-dev] [PATCH 0/6] Some trivial patches for OpenCL support in vtn

2018-07-11 Thread Karol Herbst
Most of the patches can be reviewed independently; these are just some
smaller patches I think we can upstream already.

Karol Herbst (4):
  compiler: add missing entries to gl_system_value_name
  nir: move lowering of SYSTEM_VALUE_LOCAL_GROUP_SIZE into a function
  nir/vtn: implement BuiltInGlobalSize
  vtn: handle OpConstantComposites with OpUndef members

Rob Clark (2):
  nir/vtn: Use imov where we might have 8 bit types
  nir/vtn: print extension name in fail msg

 src/compiler/nir/nir_lower_system_values.c | 36 +-
 src/compiler/shader_enums.c|  6 +++-
 src/compiler/shader_enums.h|  1 +
 src/compiler/spirv/spirv_to_nir.c  | 28 -
 src/compiler/spirv/vtn_variables.c |  4 +++
 5 files changed, 51 insertions(+), 24 deletions(-)

-- 
2.17.1



[Mesa-dev] [PATCH 1/6] nir/vtn: Use imov where we might have 8 bit types

2018-07-11 Thread Karol Herbst
From: Rob Clark 

Otherwise nir_validate may complain about 8 bit floats, which do not exist.

Reviewed-by: Karol Herbst 
Signed-off-by: Karol Herbst 
---
 src/compiler/spirv/spirv_to_nir.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c
index 80a35b1b750..aad4c713f9e 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2798,7 +2798,7 @@ create_vec(struct vtn_builder *b, unsigned num_components, unsigned bit_size)
 {
nir_op op;
switch (num_components) {
-   case 1: op = nir_op_fmov; break;
+   case 1: op = nir_op_imov; break;
case 2: op = nir_op_vec2; break;
case 3: op = nir_op_vec3; break;
case 4: op = nir_op_vec4; break;
@@ -2847,7 +2847,7 @@ nir_ssa_def *
 vtn_vector_extract(struct vtn_builder *b, nir_ssa_def *src, unsigned index)
 {
unsigned swiz[4] = { index };
-   return nir_swizzle(&b->nb, src, swiz, 1, true);
+   return nir_swizzle(&b->nb, src, swiz, 1, false);
 }
 
 nir_ssa_def *
-- 
2.17.1



Re: [Mesa-dev] [PATCH v2 1/3] i965: Sweep NIR after linking phase to free held memory

2018-07-11 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Wed, Jul 11, 2018 at 5:29 AM Danylo Piliaiev 
wrote:

> After optimization passes and many transformations, most of the memory
> NIR holds is garbage, which was only being freed after shader deletion.
> Freeing it at the end of linking saves memory, which is useful when
> a lot of complex shaders are being compiled.
> The common case for this issue is a 32-bit game running under Wine.
>
> The cost of the optimization is around 3-5% of compilation speed
> with complex shaders.
>
> V2: by Jason Ekstrand
> - Move nir_sweep up, right after the last change of NIR
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103274
>
> Signed-off-by: Danylo Piliaiev 
> ---
>  src/mesa/drivers/dri/i965/brw_link.cpp | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp
> b/src/mesa/drivers/dri/i965/brw_link.cpp
> index 1071056f14..378426101b 100644
> --- a/src/mesa/drivers/dri/i965/brw_link.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_link.cpp
> @@ -317,6 +317,8 @@ brw_link_shader(struct gl_context *ctx, struct
> gl_shader_program *shProg)
>NIR_PASS_V(prog->nir, nir_lower_atomics_to_ssbo,
>   prog->nir->info.num_abos);
>
> +  nir_sweep(prog->nir);
> +
>infos[stage] = &prog->nir->info;
>
>update_xfb_info(prog->sh.LinkedTransformFeedback, infos[stage]);
> --
> 2.17.1
>
>


[Mesa-dev] [PATCH 18/18] anv/pipeline: Disable FS dispatch for pointless fragment shaders

2018-07-11 Thread Jason Ekstrand
---
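Reader's note (not part of the commit): the patch carries no description, so
here is a standalone sketch of the check it adds, using a hypothetical
reduced struct and made-up stage bit values rather than the real brw/anv
types. A fragment shader with no color outputs, no side effects, no discard
and no computed depth or stencil cannot affect the result, so its stage bit
is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the fields the check reads. */
struct sketch_fs_info {
   unsigned nr_color_regions;
   bool has_side_effects;
   bool uses_kill;
   bool computes_depth;
   bool computes_stencil;
};

/* Made-up bit positions, for illustration only. */
#define SKETCH_STAGE_VERTEX_BIT   (1u << 0)
#define SKETCH_STAGE_FRAGMENT_BIT (1u << 4)

/* True if running the fragment shader cannot change anything observable. */
static bool
sketch_fs_is_pointless(const struct sketch_fs_info *fs)
{
   return fs->nr_color_regions == 0 &&
          !fs->has_side_effects &&
          !fs->uses_kill &&
          !fs->computes_depth &&
          !fs->computes_stencil;
}

/* Drop the fragment stage from the active mask when it is pointless. */
static uint32_t
sketch_prune_stages(uint32_t active_stages, const struct sketch_fs_info *fs)
{
   if ((active_stages & SKETCH_STAGE_FRAGMENT_BIT) &&
       sketch_fs_is_pointless(fs))
      active_stages &= ~SKETCH_STAGE_FRAGMENT_BIT;
   return active_stages;
}
```

A shader that uses discard is kept, since killing pixels still affects depth
and stencil results even when nothing is written to a color target.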
 src/intel/vulkan/anv_pipeline.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index ab3b95e78ef..35f60a23443 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -826,9 +826,25 @@ anv_pipeline_compile_fs(const struct brw_compiler *compiler,
fs_stage->key.wm.input_slots_valid =
   prev_stage->prog_data.vue.vue_map.slots_valid;
 
-   return brw_compile_fs(compiler, NULL, mem_ctx, &fs_stage->key.wm,
- &fs_stage->prog_data.wm, fs_stage->nir,
- NULL, -1, -1, -1, true, false, NULL, NULL);
+   const unsigned *code =
+  brw_compile_fs(compiler, NULL, mem_ctx, &fs_stage->key.wm,
+ &fs_stage->prog_data.wm, fs_stage->nir,
+ NULL, -1, -1, -1, true, false, NULL, NULL);
+
+   if (fs_stage->key.wm.nr_color_regions == 0 &&
+   !fs_stage->prog_data.wm.has_side_effects &&
+   !fs_stage->prog_data.wm.uses_kill &&
+   fs_stage->prog_data.wm.computed_depth_mode == BRW_PSCDEPTH_OFF &&
+   !fs_stage->prog_data.wm.computed_stencil) {
+  /* This fragment shader has no outputs and no side effects.  Go ahead
+   * and return the code pointer so we don't accidentally think the
+   * compile failed but zero out prog_data which will set program_size to
+   * zero and disable the stage.
+   */
+  memset(&fs_stage->prog_data, 0, sizeof(fs_stage->prog_data));
+   }
+
+   return code;
 }
 
 static VkResult
@@ -910,7 +926,7 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
 
if (found == __builtin_popcount(pipeline->active_stages)) {
   /* We found all our shaders in the cache.  We're done. */
-  return VK_SUCCESS;
+  goto done;
} else if (found > 0) {
   /* We found some but not all of our shaders.  This shouldn't happen
* most of the time but it can if we have a partially populated
@@ -1052,6 +1068,19 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
 
ralloc_free(pipeline_ctx);
 
+done:
+
+   if (pipeline->shaders[MESA_SHADER_FRAGMENT] &&
+   pipeline->shaders[MESA_SHADER_FRAGMENT]->prog_data->program_size == 0) {
+  /* This can happen if we decided to implicitly disable the fragment
+   * shader.  See anv_pipeline_compile_fs().
+   */
+  anv_shader_bin_unref(pipeline->device,
+   pipeline->shaders[MESA_SHADER_FRAGMENT]);
+  pipeline->shaders[MESA_SHADER_FRAGMENT] = NULL;
+  pipeline->active_stages &= ~VK_SHADER_STAGE_FRAGMENT_BIT;
+   }
+
return VK_SUCCESS;
 
 fail:
-- 
2.17.1



[Mesa-dev] [PATCH 15/18] anv/pipeline: Pull most of the anv_pipeline_compile_* into common code

2018-07-11 Thread Jason Ekstrand
This leaves us with a series of little anv_pipeline_compile_* functions
which each take a compiler object, a mem_ctx, the stage to compile, and
the previous stage for VUE linking purposes.  Some of them do
interesting things but most are little more than wrappers around
brw_compile_*.
---
 src/intel/vulkan/anv_pipeline.c | 307 ++--
 1 file changed, 92 insertions(+), 215 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 3f42a81edb7..080a78e1cee 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -560,52 +560,18 @@ anv_pipeline_link_vs(const struct brw_compiler *compiler,
anv_fill_binding_table(&vs_stage->prog_data.vs.base.base, 0);
 }
 
-static VkResult
-anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
-struct anv_pipeline_cache *cache,
-struct anv_pipeline_stage *stage)
+static const unsigned *
+anv_pipeline_compile_vs(const struct brw_compiler *compiler,
+void *mem_ctx,
+struct anv_pipeline_stage *vs_stage)
 {
-   const struct brw_compiler *compiler =
-  pipeline->device->instance->physicalDevice.compiler;
-   struct anv_shader_bin *bin = NULL;
-
-   if (bin == NULL) {
-  void *mem_ctx = ralloc_context(NULL);
+   brw_compute_vue_map(compiler->devinfo,
+   &vs_stage->prog_data.vs.base.vue_map,
+   vs_stage->nir->info.outputs_written,
+   vs_stage->nir->info.separate_shader);
 
-  brw_compute_vue_map(&pipeline->device->info,
-  &stage->prog_data.vs.base.vue_map,
-  stage->nir->info.outputs_written,
-  stage->nir->info.separate_shader);
-
-  const unsigned *shader_code =
- brw_compile_vs(compiler, NULL, mem_ctx, &stage->key.vs,
- &stage->prog_data.vs, stage->nir, -1, NULL);
-  if (shader_code == NULL) {
- ralloc_free(mem_ctx);
- return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-  }
-
-  unsigned code_size = stage->prog_data.vs.base.base.program_size;
-  bin = anv_device_upload_kernel(pipeline->device, cache,
- &stage->cache_key,
- sizeof(stage->cache_key),
- shader_code, code_size,
- stage->nir->constant_data,
- stage->nir->constant_data_size,
- &stage->prog_data.base,
- sizeof(stage->prog_data.vs),
- &stage->bind_map);
-  if (!bin) {
- ralloc_free(mem_ctx);
- return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-  }
-
-  ralloc_free(mem_ctx);
-   }
-
-   pipeline->shaders[MESA_SHADER_VERTEX] = bin;
-
-   return VK_SUCCESS;
+   return brw_compile_vs(compiler, NULL, mem_ctx, &vs_stage->key.vs,
+ &vs_stage->prog_data.vs, vs_stage->nir, -1, NULL);
 }
 
 static void
@@ -683,6 +649,17 @@ anv_pipeline_link_tcs(const struct brw_compiler *compiler,
   tcs_stage->nir->info.patch_outputs_written;
 }
 
+static const unsigned *
+anv_pipeline_compile_tcs(const struct brw_compiler *compiler,
+ void *mem_ctx,
+ struct anv_pipeline_stage *tcs_stage,
+ struct anv_pipeline_stage *prev_stage)
+{
+   return brw_compile_tcs(compiler, NULL, mem_ctx, &tcs_stage->key.tcs,
+  &tcs_stage->prog_data.tcs, tcs_stage->nir,
+  -1, NULL);
+}
+
 static void
 anv_pipeline_link_tes(const struct brw_compiler *compiler,
   struct anv_pipeline_stage *tes_stage,
@@ -691,79 +668,16 @@ anv_pipeline_link_tes(const struct brw_compiler *compiler,
anv_fill_binding_table(&tes_stage->prog_data.tes.base.base, 0);
 }
 
-static VkResult
-anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
- struct anv_pipeline_cache *cache,
- struct anv_pipeline_stage *tcs_stage,
- struct anv_pipeline_stage *tes_stage)
+static const unsigned *
+anv_pipeline_compile_tes(const struct brw_compiler *compiler,
+ void *mem_ctx,
+ struct anv_pipeline_stage *tes_stage,
+ struct anv_pipeline_stage *tcs_stage)
 {
-   const struct brw_compiler *compiler =
-  pipeline->device->instance->physicalDevice.compiler;
-   struct anv_shader_bin *tcs_bin = NULL;
-   struct anv_shader_bin *tes_bin = NULL;
-
-   if (tcs_bin == NULL || tes_bin == NULL) {
-  void *mem_ctx = ralloc_context(NULL);
-
-  const int shader_time_index = -1;
-  const unsigned *shader_code;
-
-  shader_code =
- brw_compile_tcs(compiler, NULL, mem_ctx, &tcs_stage->key.tcs,
- &tcs_stage->prog_data.tcs, 

[Mesa-dev] [PATCH 17/18] anv/pipeline: Do cross-stage linking optimizations

2018-07-11 Thread Jason Ekstrand
This appears to help the Aztec Ruins benchmark by about 2% on my Kaby
Lake gt2 laptop.
---
 src/intel/vulkan/anv_pipeline.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 080a78e1cee..ab3b95e78ef 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -558,6 +558,9 @@ anv_pipeline_link_vs(const struct brw_compiler *compiler,
  struct anv_pipeline_stage *next_stage)
 {
anv_fill_binding_table(&vs_stage->prog_data.vs.base.base, 0);
+
+   if (next_stage)
+  brw_nir_link_shaders(compiler, &vs_stage->nir, &next_stage->nir);
 }
 
 static const unsigned *
@@ -622,6 +625,8 @@ anv_pipeline_link_tcs(const struct brw_compiler *compiler,
 
anv_fill_binding_table(&tcs_stage->prog_data.tcs.base.base, 0);
 
+   brw_nir_link_shaders(compiler, &tcs_stage->nir, &tes_stage->nir);
+
nir_lower_tes_patch_vertices(tes_stage->nir,
 tcs_stage->nir->info.tess.tcs_vertices_out);
 
@@ -666,6 +671,9 @@ anv_pipeline_link_tes(const struct brw_compiler *compiler,
   struct anv_pipeline_stage *next_stage)
 {
anv_fill_binding_table(&tes_stage->prog_data.tes.base.base, 0);
+
+   if (next_stage)
+  brw_nir_link_shaders(compiler, &tes_stage->nir, &next_stage->nir);
 }
 
 static const unsigned *
@@ -686,6 +694,9 @@ anv_pipeline_link_gs(const struct brw_compiler *compiler,
  struct anv_pipeline_stage *next_stage)
 {
anv_fill_binding_table(&gs_stage->prog_data.gs.base.base, 0);
+
+   if (next_stage)
+  brw_nir_link_shaders(compiler, &gs_stage->nir, &next_stage->nir);
 }
 
 static const unsigned *
-- 
2.17.1



[Mesa-dev] [PATCH 14/18] anv/pipeline: Add a separate "link" stage

2018-07-11 Thread Jason Ekstrand
This breaks compilation up a bit into "link" and "compile".  In the
"link" stage, new anv_pipeline_link_* helpers are called which are
responsible for setting up the binding table and doing anything needed
to properly link with the next stage in the pipeline if one exists.
They are called in reverse order starting with the fragment shader so
you can assume linking in later stages is already done.
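
As a rough illustration of the ordering described above (not the code from
this patch), the reverse walk can be sketched as follows. All of the
sketch_* names and the simplified stage struct are hypothetical stand-ins
for the real anv types; the only thing taken from the description is that
stages are visited last-to-first and each helper is handed the next active
stage, if any.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gl_shader_stage and anv_pipeline_stage. */
enum sketch_stage_id {
   SKETCH_VS, SKETCH_TCS, SKETCH_TES, SKETCH_GS, SKETCH_FS, SKETCH_STAGES
};

struct sketch_pipeline_stage {
   const char *entrypoint;   /* NULL when the stage is not present */
   int linked;               /* set once the link helper has run */
   int next_was_linked;      /* was the following stage linked first? */
};

/* Stand-in for an anv_pipeline_link_* helper: it may look at next_stage
 * and rely on that stage having been linked already. */
static void
sketch_link_stage(struct sketch_pipeline_stage *stage,
                  struct sketch_pipeline_stage *next_stage)
{
   stage->next_was_linked = next_stage ? next_stage->linked : 1;
   stage->linked = 1;
}

/* Walk the stages in reverse, starting with the fragment shader, and
 * skip stages that are not part of the pipeline. */
static void
sketch_link_all(struct sketch_pipeline_stage stages[SKETCH_STAGES])
{
   struct sketch_pipeline_stage *next_stage = NULL;

   for (int s = SKETCH_STAGES - 1; s >= 0; s--) {
      if (!stages[s].entrypoint)
         continue;

      sketch_link_stage(&stages[s], next_stage);
      next_stage = &stages[s];
   }
}
```

With only a VS and an FS present, the FS is linked first with no next
stage, and the VS then sees the FS as its next_stage.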
---
 src/intel/vulkan/anv_pipeline.c | 316 +++-
 1 file changed, 189 insertions(+), 127 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 50d6ab358d2..3f42a81edb7 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -552,6 +552,14 @@ anv_fill_binding_table(struct brw_stage_prog_data *prog_data, unsigned bias)
prog_data->binding_table.image_start = bias;
 }
 
+static void
+anv_pipeline_link_vs(const struct brw_compiler *compiler,
+ struct anv_pipeline_stage *vs_stage,
+ struct anv_pipeline_stage *next_stage)
+{
+   anv_fill_binding_table(&vs_stage->prog_data.vs.base.base, 0);
+}
+
 static VkResult
 anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
 struct anv_pipeline_cache *cache,
@@ -564,8 +572,6 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
if (bin == NULL) {
   void *mem_ctx = ralloc_context(NULL);
 
-  anv_fill_binding_table(&stage->prog_data.vs.base.base, 0);
-
   brw_compute_vue_map(&pipeline->device->info,
   &stage->prog_data.vs.base.vue_map,
   stage->nir->info.outputs_written,
@@ -641,13 +647,56 @@ merge_tess_info(struct shader_info *tes_info,
tes_info->tess.point_mode |= tcs_info->tess.point_mode;
 }
 
+static void
+anv_pipeline_link_tcs(const struct brw_compiler *compiler,
+  struct anv_pipeline_stage *tcs_stage,
+  struct anv_pipeline_stage *tes_stage)
+{
+   assert(tes_stage && tes_stage->stage == MESA_SHADER_TESS_EVAL);
+
+   anv_fill_binding_table(&tcs_stage->prog_data.tcs.base.base, 0);
+
+   nir_lower_tes_patch_vertices(tes_stage->nir,
+tcs_stage->nir->info.tess.tcs_vertices_out);
+
+   /* Copy TCS info into the TES info */
+   merge_tess_info(&tes_stage->nir->info, &tcs_stage->nir->info);
+
+   /* Whacking the key after cache lookup is a bit sketchy, but all of
+* this comes from the SPIR-V, which is part of the hash used for the
+* pipeline cache.  So it should be safe.
+*/
+   tcs_stage->key.tcs.tes_primitive_mode =
+  tes_stage->nir->info.tess.primitive_mode;
+   tcs_stage->key.tcs.outputs_written =
+  tcs_stage->nir->info.outputs_written;
+   tcs_stage->key.tcs.patch_outputs_written =
+  tcs_stage->nir->info.patch_outputs_written;
+   tcs_stage->key.tcs.quads_workaround =
+  compiler->devinfo->gen < 9 &&
+  tes_stage->nir->info.tess.primitive_mode == 7 /* GL_QUADS */ &&
+  tes_stage->nir->info.tess.spacing == TESS_SPACING_EQUAL;
+
+   tes_stage->key.tes.inputs_read =
+  tcs_stage->nir->info.outputs_written;
+   tes_stage->key.tes.patch_inputs_read =
+  tcs_stage->nir->info.patch_outputs_written;
+}
+
+static void
+anv_pipeline_link_tes(const struct brw_compiler *compiler,
+  struct anv_pipeline_stage *tes_stage,
+  struct anv_pipeline_stage *next_stage)
+{
+   anv_fill_binding_table(&tes_stage->prog_data.tes.base.base, 0);
+}
+
 static VkResult
 anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
  struct anv_pipeline_cache *cache,
  struct anv_pipeline_stage *tcs_stage,
  struct anv_pipeline_stage *tes_stage)
 {
-   const struct gen_device_info *devinfo = &pipeline->device->info;
const struct brw_compiler *compiler =
   pipeline->device->instance->physicalDevice.compiler;
struct anv_shader_bin *tcs_bin = NULL;
@@ -656,35 +705,6 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
if (tcs_bin == NULL || tes_bin == NULL) {
   void *mem_ctx = ralloc_context(NULL);
 
-  nir_lower_tes_patch_vertices(tes_stage->nir,
-   tcs_stage->nir->info.tess.tcs_vertices_out);
-
-  /* Copy TCS info into the TES info */
-  merge_tess_info(&tes_stage->nir->info, &tcs_stage->nir->info);
-
-  anv_fill_binding_table(&tcs_stage->prog_data.tcs.base.base, 0);
-  anv_fill_binding_table(&tes_stage->prog_data.tes.base.base, 0);
-
-  /* Whacking the key after cache lookup is a bit sketchy, but all of
-   * this comes from the SPIR-V, which is part of the hash used for the
-   * pipeline cache.  So it should be safe.
-   */
-  tcs_stage->key.tcs.tes_primitive_mode =
- tes_stage->nir->info.tess.primitive_mode;
-  tcs_stage->key.tcs.outputs_written =
- tcs_stage->nir->info.outputs_written;
-  tcs_stage->key.tcs.patch_outputs_written =
- 

[Mesa-dev] [PATCH 10/18] anv/pipeline: Pull shader compilation out into a helper.

2018-07-11 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_pipeline.c | 228 +---
 1 file changed, 120 insertions(+), 108 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index fb3ae15210d..9f35bc9c27b 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -1016,6 +1016,122 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
return VK_SUCCESS;
 }
 
+static VkResult
+anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
+  struct anv_pipeline_cache *cache,
+  const VkGraphicsPipelineCreateInfo *info)
+{
+   struct anv_pipeline_stage stages[MESA_SHADER_STAGES] = {};
+
+   VkResult result;
+   for (uint32_t i = 0; i < info->stageCount; i++) {
+  const VkPipelineShaderStageCreateInfo *sinfo = &info->pStages[i];
+  gl_shader_stage stage = vk_to_mesa_shader_stage(sinfo->stage);
+
+  stages[stage].stage = stage;
+  stages[stage].module = anv_shader_module_from_handle(sinfo->module);
+  stages[stage].entrypoint = sinfo->pName;
+  stages[stage].spec_info = sinfo->pSpecializationInfo;
+
+  const struct gen_device_info *devinfo = &pipeline->device->info;
+  switch (stage) {
+  case MESA_SHADER_VERTEX:
+ populate_vs_prog_key(devinfo, &stages[stage].key.vs);
+ break;
+  case MESA_SHADER_TESS_CTRL:
+ populate_tcs_prog_key(devinfo,
+   info->pTessellationState->patchControlPoints,
+   &stages[stage].key.tcs);
+ break;
+  case MESA_SHADER_TESS_EVAL:
+ populate_tes_prog_key(devinfo, &stages[stage].key.tes);
+ break;
+  case MESA_SHADER_GEOMETRY:
+ populate_gs_prog_key(devinfo, &stages[stage].key.gs);
+ break;
+  case MESA_SHADER_FRAGMENT:
+ populate_wm_prog_key(devinfo, pipeline->subpass,
+  info->pMultisampleState,
+  &stages[stage].key.wm);
+ break;
+  default:
+ unreachable("Invalid graphics shader stage");
+  }
+
+  pipeline->active_stages |= sinfo->stage;
+   }
+
+   if (pipeline->active_stages & VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT)
+  pipeline->active_stages |= VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT;
+
+   assert(pipeline->active_stages & VK_SHADER_STAGE_VERTEX_BIT);
+
+   ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
+
+   unsigned char sha1[20];
+   anv_pipeline_hash_graphics(pipeline, layout, stages, sha1);
+
+   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+  if (!stages[s].entrypoint)
+ continue;
+
+  stages[s].cache_key.stage = s;
+  memcpy(stages[s].cache_key.sha1, sha1, sizeof(sha1));
+
+  struct anv_shader_bin *bin =
+ anv_device_search_for_kernel(pipeline->device, cache,
+  &stages[s].cache_key,
+  sizeof(stages[s].cache_key));
+  if (bin)
+ anv_pipeline_add_compiled_stage(pipeline, s, bin);
+   }
+
+   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+  if (!stages[s].entrypoint)
+ continue;
+
+  assert(stages[s].stage == s);
+
+  if (pipeline->shaders[s])
+ continue;
+
+  switch (s) {
+  case MESA_SHADER_VERTEX:
+ result = anv_pipeline_compile_vs(pipeline, cache, info,
+  &stages[s]);
+ break;
+  case MESA_SHADER_TESS_CTRL:
+ /* Handled with TESS_EVAL */
+ break;
+  case MESA_SHADER_TESS_EVAL:
+ result = anv_pipeline_compile_tcs_tes(pipeline, cache, info,
+   &stages[MESA_SHADER_TESS_CTRL],
+   &stages[MESA_SHADER_TESS_EVAL]);
+ break;
+  case MESA_SHADER_GEOMETRY:
+ result = anv_pipeline_compile_gs(pipeline, cache, info, &stages[s]);
+ break;
+  case MESA_SHADER_FRAGMENT:
+ result = anv_pipeline_compile_fs(pipeline, cache, info, &stages[s]);
+ break;
+  default:
+ unreachable("Invalid graphics shader stage");
+  }
+  if (result != VK_SUCCESS)
+ goto fail;
+   }
+
+   return VK_SUCCESS;
+
+fail:
+   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+  if (pipeline->shaders[s])
+ anv_shader_bin_unref(pipeline->device, pipeline->shaders[s]);
+   }
+
+   return result;
+}
+
 VkResult
 anv_pipeline_compile_cs(struct anv_pipeline *pipeline,
 struct anv_pipeline_cache *cache,
@@ -1357,104 +1473,10 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
 
pipeline->active_stages = 0;
 
-   struct anv_pipeline_stage stages[MESA_SHADER_STAGES] = {};
-   for (uint32_t i = 0; i < pCreateInfo->stageCount; i++) {
-  const VkPipelineShaderStageCreateInfo *sinfo = &pCreateInfo->pStages[i];
-  gl_shader_stage stage = vk_to_mesa_shader_stage(sinfo->stage);
-
-  pipeline->active_stages |= sinfo->stage;
-
-  stages[stage].stage = stage;
- 

[Mesa-dev] [PATCH 09/18] anv/pipeline: Call anv_pipeline_compile_* in a loop

2018-07-11 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_pipeline.c | 56 ++---
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index d56f6ce8966..fb3ae15210d 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -1420,35 +1420,39 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
  anv_pipeline_add_compiled_stage(pipeline, s, bin);
}
 
-   if (stages[MESA_SHADER_VERTEX].entrypoint &&
-   !pipeline->shaders[MESA_SHADER_VERTEX]) {
-  result = anv_pipeline_compile_vs(pipeline, cache, pCreateInfo,
-   &stages[MESA_SHADER_VERTEX]);
-  if (result != VK_SUCCESS)
- goto compile_fail;
-   }
+   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+  if (!stages[s].entrypoint)
+ continue;
 
-   if (stages[MESA_SHADER_TESS_EVAL].entrypoint &&
-   !pipeline->shaders[MESA_SHADER_TESS_EVAL]) {
-  result = anv_pipeline_compile_tcs_tes(pipeline, cache, pCreateInfo,
- &stages[MESA_SHADER_TESS_CTRL],
- &stages[MESA_SHADER_TESS_EVAL]);
-  if (result != VK_SUCCESS)
- goto compile_fail;
-   }
+  assert(stages[s].stage == s);
 
-   if (stages[MESA_SHADER_GEOMETRY].entrypoint &&
-   !pipeline->shaders[MESA_SHADER_GEOMETRY]) {
-  result = anv_pipeline_compile_gs(pipeline, cache, pCreateInfo,
-   &stages[MESA_SHADER_GEOMETRY]);
-  if (result != VK_SUCCESS)
- goto compile_fail;
-   }
+  if (pipeline->shaders[s])
+ continue;
 
-   if (stages[MESA_SHADER_FRAGMENT].entrypoint &&
-   !pipeline->shaders[MESA_SHADER_FRAGMENT]) {
-  result = anv_pipeline_compile_fs(pipeline, cache, pCreateInfo,
-   &stages[MESA_SHADER_FRAGMENT]);
+  switch (s) {
+  case MESA_SHADER_VERTEX:
+ result = anv_pipeline_compile_vs(pipeline, cache, pCreateInfo,
+  &stages[s]);
+ break;
+  case MESA_SHADER_TESS_CTRL:
+ /* Handled with TESS_EVAL */
+ break;
+  case MESA_SHADER_TESS_EVAL:
+ result = anv_pipeline_compile_tcs_tes(pipeline, cache, pCreateInfo,
+   &stages[MESA_SHADER_TESS_CTRL],
+   &stages[MESA_SHADER_TESS_EVAL]);
+ break;
+  case MESA_SHADER_GEOMETRY:
+ result = anv_pipeline_compile_gs(pipeline, cache, pCreateInfo,
+  &stages[s]);
+ break;
+  case MESA_SHADER_FRAGMENT:
+ result = anv_pipeline_compile_fs(pipeline, cache, pCreateInfo,
+  &stages[s]);
+ break;
+  default:
+ unreachable("Invalid graphics shader stage");
+  }
   if (result != VK_SUCCESS)
  goto compile_fail;
}
-- 
2.17.1



[Mesa-dev] [PATCH 05/18] anv/pipeline: Add populate_tcs/tes_key helpers

2018-07-11 Thread Jason Ekstrand
They don't really do anything interesting, but it's more consistent this
way.
---
 src/intel/vulkan/anv_pipeline.c | 28 +---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index d6e8e0a0838..e39ce2de010 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -306,6 +306,27 @@ populate_vs_prog_key(const struct gen_device_info *devinfo,
/* XXX: Handle sampler_prog_key */
 }
 
+static void
+populate_tcs_prog_key(const struct gen_device_info *devinfo,
+  unsigned input_vertices,
+  struct brw_tcs_prog_key *key)
+{
+   memset(key, 0, sizeof(*key));
+
+   populate_sampler_prog_key(devinfo, &key->tex);
+
+   key->input_vertices = input_vertices;
+}
+
+static void
+populate_tes_prog_key(const struct gen_device_info *devinfo,
+  struct brw_tes_prog_key *key)
+{
+   memset(key, 0, sizeof(*key));
+
+   populate_sampler_prog_key(devinfo, &key->tex);
+}
+
 static void
 populate_gs_prog_key(const struct gen_device_info *devinfo,
  struct brw_gs_prog_key *key)
@@ -628,9 +649,10 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
struct anv_shader_bin *tcs_bin = NULL;
struct anv_shader_bin *tes_bin = NULL;
 
-   populate_sampler_prog_key(&pipeline->device->info, &tcs_key.tex);
-   populate_sampler_prog_key(&pipeline->device->info, &tes_key.tex);
-   tcs_key.input_vertices = info->pTessellationState->patchControlPoints;
+   populate_tcs_prog_key(&pipeline->device->info,
+ info->pTessellationState->patchControlPoints,
+ &tcs_key);
+   populate_tes_prog_key(&pipeline->device->info, &tes_key);
 
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
-- 
2.17.1



[Mesa-dev] [PATCH 12/18] anv/pipeline: Recompile all shaders if any are missing from the cache

2018-07-11 Thread Jason Ekstrand
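
A minimal sketch of the all-or-nothing cache check named in the subject,
assuming a bitmask of active stages and an array of cached shader pointers.
The sketch_* names are hypothetical, and a portable bit-counting loop
stands in for the __builtin_popcount call used in the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_STAGES 5

enum sketch_cache_result {
   SKETCH_ALL_FOUND,   /* every active stage was in the cache */
   SKETCH_NONE_FOUND,  /* nothing cached, compile everything */
   SKETCH_PARTIAL      /* some cached: drop them and recompile all */
};

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned
sketch_popcount(unsigned v)
{
   unsigned n = 0;
   while (v) {
      v &= v - 1;
      n++;
   }
   return n;
}

/* active_mask has bit s set when stage s is part of the pipeline;
 * shaders[s] is non-NULL when the cache lookup for stage s succeeded. */
static enum sketch_cache_result
sketch_check_cache(unsigned active_mask, const char *shaders[SKETCH_STAGES])
{
   unsigned found = 0;
   for (int s = 0; s < SKETCH_STAGES; s++) {
      if ((active_mask & (1u << s)) && shaders[s])
         found++;
   }

   if (found == sketch_popcount(active_mask))
      return SKETCH_ALL_FOUND;
   if (found == 0)
      return SKETCH_NONE_FOUND;

   /* Partial hit: forget the cached entries so that every stage goes
    * through compilation again. */
   for (int s = 0; s < SKETCH_STAGES; s++)
      shaders[s] = NULL;
   return SKETCH_PARTIAL;
}
```

The real code also drops its reference on each cached shader before
clearing the pointer; the sketch leaves reference counting out.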
---
 src/intel/vulkan/anv_pipeline.c | 41 +
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 200b8748186..bc268b87e55 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -1065,6 +1065,7 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
unsigned char sha1[20];
anv_pipeline_hash_graphics(pipeline, layout, stages, sha1);
 
+   unsigned found = 0;
for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
   if (!stages[s].entrypoint)
  continue;
@@ -1076,8 +1077,42 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
  anv_device_search_for_kernel(pipeline->device, cache,
   &stages[s].cache_key,
   sizeof(stages[s].cache_key));
-  if (bin)
+  if (bin) {
+ found++;
  pipeline->shaders[s] = bin;
+  }
+   }
+
+   if (found == __builtin_popcount(pipeline->active_stages)) {
+  /* We found all our shaders in the cache.  We're done. */
+  return VK_SUCCESS;
+   } else if (found > 0) {
+  /* We found some but not all of our shaders.  This shouldn't happen
+   * most of the time but it can if we have a partially populated
+   * pipeline cache.
+   */
+  assert(found < __builtin_popcount(pipeline->active_stages));
+
+  vk_debug_report(&pipeline->device->instance->debug_report_callbacks,
+  VK_DEBUG_REPORT_WARNING_BIT_EXT |
+  VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT,
+  VK_DEBUG_REPORT_OBJECT_TYPE_PIPELINE_CACHE_EXT,
+  (uint64_t)(uintptr_t)cache,
+  0, 0, "anv",
+  "Found a partial pipeline in the cache.  This is "
+  "most likely caused by an incomplete pipeline cache "
+  "import or export");
+
+  /* We're going to have to recompile anyway, so just throw away our
+   * references to the shaders in the cache.  We'll get them out of the
+   * cache again as part of the compilation process.
+   */
+  for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+ if (pipeline->shaders[s]) {
+anv_shader_bin_unref(pipeline->device, pipeline->shaders[s]);
+pipeline->shaders[s] = NULL;
+ }
+  }
}
 
for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
@@ -1085,9 +1120,7 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
  continue;
 
   assert(stages[s].stage == s);
-
-  if (pipeline->shaders[s])
- continue;
+  assert(pipeline->shaders[s] == NULL);
 
   switch (s) {
   case MESA_SHADER_VERTEX:
-- 
2.17.1



[Mesa-dev] [PATCH 07/18] anv/pipeline: Populate keys up-front

2018-07-11 Thread Jason Ekstrand
Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.
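
To illustrate the idea (this is a sketch under assumptions, not the anv
code), a per-stage struct can carry a union of all key types, be filled in
once before any compilation, and be hashed later using a size looked up
from the stage. The sketch_* types and fields below are hypothetical;
zeroing the struct before populating it keeps padding bytes deterministic
when the key is hashed as raw memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_stage_id { SKETCH_VS, SKETCH_TCS, SKETCH_STAGE_COUNT };

/* Hypothetical, much smaller stand-ins for the brw_*_prog_key structs. */
struct sketch_vs_key  { unsigned flags; };
struct sketch_tcs_key { unsigned flags; unsigned input_vertices; };

/* One union large enough for any stage's key. */
union sketch_any_key {
   struct sketch_vs_key vs;
   struct sketch_tcs_key tcs;
};

struct sketch_stage {
   enum sketch_stage_id stage;
   union sketch_any_key key;
};

/* Number of key bytes that are meaningful for a given stage; this is the
 * size a hashing step would feed to the hash function. */
static size_t
sketch_key_size(enum sketch_stage_id stage)
{
   switch (stage) {
   case SKETCH_VS:  return sizeof(struct sketch_vs_key);
   case SKETCH_TCS: return sizeof(struct sketch_tcs_key);
   default:         return 0;
   }
}

/* Fill the key once, up front, instead of inside each compile function. */
static void
sketch_populate_key(struct sketch_stage *s, enum sketch_stage_id id,
                    unsigned input_vertices)
{
   memset(s, 0, sizeof(*s));
   s->stage = id;
   if (id == SKETCH_TCS)
      s->key.tcs.input_vertices = input_vertices;
}
```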
---
 src/intel/vulkan/anv_pipeline.c | 115 +---
 1 file changed, 60 insertions(+), 55 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index c2ef8878db6..29661433516 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -397,13 +397,14 @@ struct anv_pipeline_stage {
const struct anv_shader_module *module;
const char *entrypoint;
const VkSpecializationInfo *spec_info;
+
+   union brw_any_prog_key key;
 };
 
 static void
 anv_pipeline_hash_shader(struct anv_pipeline *pipeline,
  struct anv_pipeline_layout *layout,
  struct anv_pipeline_stage *stage,
- const void *key, size_t key_size,
  unsigned char *sha1_out)
 {
struct mesa_sha1 ctx;
@@ -425,7 +426,7 @@ anv_pipeline_hash_shader(struct anv_pipeline *pipeline,
   _mesa_sha1_update(&ctx, stage->spec_info->pData,
 stage->spec_info->dataSize);
}
-   _mesa_sha1_update(&ctx, key, key_size);
+   _mesa_sha1_update(&ctx, &stage->key, brw_prog_key_size(stage->stage));
_mesa_sha1_final(&ctx, sha1_out);
 }
 
@@ -526,16 +527,12 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
 {
const struct brw_compiler *compiler =
   pipeline->device->instance->physicalDevice.compiler;
-   struct brw_vs_prog_key key;
struct anv_shader_bin *bin = NULL;
 
-   populate_vs_prog_key(&pipeline->device->info, &key);
-
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
unsigned char sha1[20];
-   anv_pipeline_hash_shader(pipeline, layout, stage,
- &key, sizeof(key), sha1);
+   anv_pipeline_hash_shader(pipeline, layout, stage, sha1);
bin = anv_device_search_for_kernel(pipeline->device, cache, sha1, 20);
 
if (bin == NULL) {
@@ -565,8 +562,8 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
   nir->info.separate_shader);
 
   const unsigned *shader_code =
- brw_compile_vs(compiler, NULL, mem_ctx, &key, &prog_data, nir,
- -1, NULL);
+ brw_compile_vs(compiler, NULL, mem_ctx, &stage->key.vs,
+ &prog_data, nir, -1, NULL);
   if (shader_code == NULL) {
  ralloc_free(mem_ctx);
  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
@@ -641,24 +638,15 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
const struct gen_device_info *devinfo = &pipeline->device->info;
const struct brw_compiler *compiler =
   pipeline->device->instance->physicalDevice.compiler;
-   struct brw_tcs_prog_key tcs_key = {};
-   struct brw_tes_prog_key tes_key = {};
struct anv_shader_bin *tcs_bin = NULL;
struct anv_shader_bin *tes_bin = NULL;
 
-   populate_tcs_prog_key(&pipeline->device->info,
- info->pTessellationState->patchControlPoints,
- &tcs_key);
-   populate_tes_prog_key(&pipeline->device->info, &tes_key);
-
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
unsigned char tcs_sha1[40];
unsigned char tes_sha1[40];
-   anv_pipeline_hash_shader(pipeline, layout, tcs_stage,
- &tcs_key, sizeof(tcs_key), tcs_sha1);
-   anv_pipeline_hash_shader(pipeline, layout, tes_stage,
- &tes_key, sizeof(tes_key), tes_sha1);
+   anv_pipeline_hash_shader(pipeline, layout, tcs_stage, tcs_sha1);
+   anv_pipeline_hash_shader(pipeline, layout, tes_stage, tes_sha1);
memcpy(&tcs_sha1[20], tes_sha1, 20);
memcpy(&tes_sha1[20], tcs_sha1, 20);
 
@@ -710,23 +698,25 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
* this comes from the SPIR-V, which is part of the hash used for the
* pipeline cache.  So it should be safe.
*/
-  tcs_key.tes_primitive_mode = tes_nir->info.tess.primitive_mode;
-  tcs_key.outputs_written = tcs_nir->info.outputs_written;
-  tcs_key.patch_outputs_written = tcs_nir->info.patch_outputs_written;
-  tcs_key.quads_workaround =
+  tcs_stage->key.tcs.tes_primitive_mode = tes_nir->info.tess.primitive_mode;
+  tcs_stage->key.tcs.outputs_written = tcs_nir->info.outputs_written;
+  tcs_stage->key.tcs.patch_outputs_written =
+ tcs_nir->info.patch_outputs_written;
+  tcs_stage->key.tcs.quads_workaround =
  devinfo->gen < 9 &&
  tes_nir->info.tess.primitive_mode == 7 /* GL_QUADS */ &&
  tes_nir->info.tess.spacing == TESS_SPACING_EQUAL;
 
-  tes_key.inputs_read = tcs_key.outputs_written;
-  tes_key.patch_inputs_read = tcs_key.patch_outputs_written;
+  tes_stage->key.tes.inputs_read = tcs_nir->info.outputs_written;
+  tes_stage->key.tes.patch_inputs_read =
+ tcs_nir->info.patch_outputs_written;
 
   const int shader_time_index = -1;
   const unsigned *shader_code;
 
   

[Mesa-dev] [PATCH 11/18] anv/pipeline: Drop anv_pipeline_add_compiled_stage

2018-07-11 Thread Jason Ekstrand
We can set active_stages much more directly and then it's just candy
around setting pipeline->shaders[stage].
---
 src/intel/vulkan/anv_pipeline.c  | 27 ++-
 src/intel/vulkan/genX_pipeline.c |  2 --
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 9f35bc9c27b..200b8748186 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -544,14 +544,6 @@ anv_fill_binding_table(struct brw_stage_prog_data *prog_data, unsigned bias)
prog_data->binding_table.image_start = bias;
 }
 
-static void
-anv_pipeline_add_compiled_stage(struct anv_pipeline *pipeline,
-gl_shader_stage stage,
-struct anv_shader_bin *shader)
-{
-   pipeline->shaders[stage] = shader;
-}
-
 static VkResult
 anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
 struct anv_pipeline_cache *cache,
@@ -615,7 +607,7 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
   ralloc_free(mem_ctx);
}
 
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_VERTEX, bin);
+   pipeline->shaders[MESA_SHADER_VERTEX] = bin;
 
return VK_SUCCESS;
 }
@@ -783,8 +775,8 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
   ralloc_free(mem_ctx);
}
 
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_TESS_CTRL, tcs_bin);
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_TESS_EVAL, tes_bin);
+   pipeline->shaders[MESA_SHADER_TESS_CTRL] = tcs_bin;
+   pipeline->shaders[MESA_SHADER_TESS_EVAL] = tes_bin;
 
return VK_SUCCESS;
 }
@@ -853,7 +845,7 @@ anv_pipeline_compile_gs(struct anv_pipeline *pipeline,
   ralloc_free(mem_ctx);
}
 
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_GEOMETRY, bin);
+   pipeline->shaders[MESA_SHADER_GEOMETRY] = bin;
 
return VK_SUCCESS;
 }
@@ -1011,7 +1003,7 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
   ralloc_free(mem_ctx);
}
 
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_FRAGMENT, bin);
+   pipeline->shaders[MESA_SHADER_FRAGMENT] = bin;
 
return VK_SUCCESS;
 }
@@ -1023,6 +1015,8 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
 {
struct anv_pipeline_stage stages[MESA_SHADER_STAGES] = {};
 
+   pipeline->active_stages = 0;
+
VkResult result;
for (uint32_t i = 0; i < info->stageCount; i++) {
   const VkPipelineShaderStageCreateInfo *sinfo = &info->pStages[i];
@@ -1083,7 +1077,7 @@ anv_pipeline_compile_graphics(struct anv_pipeline *pipeline,
   &stages[s].cache_key,
   sizeof(stages[s].cache_key));
   if (bin)
- anv_pipeline_add_compiled_stage(pipeline, s, bin);
+ pipeline->shaders[s] = bin;
}
 
for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
@@ -1206,7 +1200,8 @@ anv_pipeline_compile_cs(struct anv_pipeline *pipeline,
   ralloc_free(mem_ctx);
}
 
-   anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_COMPUTE, bin);
+   pipeline->active_stages = VK_SHADER_STAGE_COMPUTE_BIT;
+   pipeline->shaders[MESA_SHADER_COMPUTE] = bin;
 
return VK_SUCCESS;
 }
@@ -1471,8 +1466,6 @@ anv_pipeline_init(struct anv_pipeline *pipeline,
 */
memset(pipeline->shaders, 0, sizeof(pipeline->shaders));
 
-   pipeline->active_stages = 0;
-
result = anv_pipeline_compile_graphics(pipeline, cache, pCreateInfo);
if (result != VK_SUCCESS) {
   anv_reloc_list_finish(&pipeline->batch_relocs, alloc);
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 0821d71c9f8..9f06a085677 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1807,8 +1807,6 @@ compute_pipeline_create(
 */
memset(pipeline->shaders, 0, sizeof(pipeline->shaders));
 
-   pipeline->active_stages = 0;
-
pipeline->needs_data_cache = false;
 
assert(pCreateInfo->stage.stage == VK_SHADER_STAGE_COMPUTE_BIT);
-- 
2.17.1



[Mesa-dev] [PATCH 13/18] anv/pipeline: Compile to NIR in compile_graphics

2018-07-11 Thread Jason Ekstrand
This pulls the SPIR-V to NIR step out into common code.
---
 src/intel/vulkan/anv_pipeline.c | 278 +---
 1 file changed, 116 insertions(+), 162 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index bc268b87e55..50d6ab358d2 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -404,6 +404,14 @@ struct anv_pipeline_stage {
   gl_shader_stage stage;
   unsigned char sha1[20];
} cache_key;
+
+   nir_shader *nir;
+
+   struct anv_pipeline_binding surface_to_descriptor[256];
+   struct anv_pipeline_binding sampler_to_descriptor[256];
+   struct anv_pipeline_bind_map bind_map;
+
+   union brw_any_prog_data prog_data;
 };
 
 static void
@@ -547,58 +555,40 @@ anv_fill_binding_table(struct brw_stage_prog_data *prog_data, unsigned bias)
 static VkResult
 anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
 struct anv_pipeline_cache *cache,
-const VkGraphicsPipelineCreateInfo *info,
 struct anv_pipeline_stage *stage)
 {
const struct brw_compiler *compiler =
   pipeline->device->instance->physicalDevice.compiler;
struct anv_shader_bin *bin = NULL;
 
-   ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
-
if (bin == NULL) {
-  struct brw_vs_prog_data prog_data = {};
-  struct anv_pipeline_binding surface_to_descriptor[256];
-  struct anv_pipeline_binding sampler_to_descriptor[256];
-
-  struct anv_pipeline_bind_map map = {
- .surface_to_descriptor = surface_to_descriptor,
- .sampler_to_descriptor = sampler_to_descriptor
-  };
-
   void *mem_ctx = ralloc_context(NULL);
 
-  nir_shader *nir = anv_pipeline_compile(pipeline, mem_ctx, layout, stage,
- &prog_data.base.base, &map);
-  if (nir == NULL) {
- ralloc_free(mem_ctx);
- return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
-  }
-
-  anv_fill_binding_table(&prog_data.base.base, 0);
+  anv_fill_binding_table(&stage->prog_data.vs.base.base, 0);
 
   brw_compute_vue_map(&pipeline->device->info,
-  &prog_data.base.vue_map,
-  nir->info.outputs_written,
-  nir->info.separate_shader);
+  &stage->prog_data.vs.base.vue_map,
+  stage->nir->info.outputs_written,
+  stage->nir->info.separate_shader);
 
   const unsigned *shader_code =
  brw_compile_vs(compiler, NULL, mem_ctx, &stage->key.vs,
- &prog_data, nir, -1, NULL);
+ &stage->prog_data.vs, stage->nir, -1, NULL);
   if (shader_code == NULL) {
  ralloc_free(mem_ctx);
  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
   }
 
-  unsigned code_size = prog_data.base.base.program_size;
+  unsigned code_size = stage->prog_data.vs.base.base.program_size;
   bin = anv_device_upload_kernel(pipeline->device, cache,
  &stage->cache_key,
  sizeof(stage->cache_key),
  shader_code, code_size,
- nir->constant_data,
- nir->constant_data_size,
- &prog_data.base.base, sizeof(prog_data),
- &map);
+ stage->nir->constant_data,
+ stage->nir->constant_data_size,
+ &stage->prog_data.base,
+ sizeof(stage->prog_data.vs),
+ &stage->bind_map);
   if (!bin) {
  ralloc_free(mem_ctx);
  return vk_error(VK_ERROR_OUT_OF_HOST_MEMORY);
@@ -654,7 +644,6 @@ merge_tess_info(struct shader_info *tes_info,
 static VkResult
 anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
  struct anv_pipeline_cache *cache,
- const VkGraphicsPipelineCreateInfo *info,
  struct anv_pipeline_stage *tcs_stage,
  struct anv_pipeline_stage *tes_stage)
 {
@@ -664,85 +653,60 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
struct anv_shader_bin *tcs_bin = NULL;
struct anv_shader_bin *tes_bin = NULL;
 
-   ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
-
if (tcs_bin == NULL || tes_bin == NULL) {
-  struct brw_tcs_prog_data tcs_prog_data = {};
-  struct brw_tes_prog_data tes_prog_data = {};
-  struct anv_pipeline_binding tcs_surface_to_descriptor[256];
-  struct anv_pipeline_binding tcs_sampler_to_descriptor[256];
-  struct anv_pipeline_binding tes_surface_to_descriptor[256];
-  struct anv_pipeline_binding tes_sampler_to_descriptor[256];
-
-  struct anv_pipeline_bind_map tcs_map = {
- 

[Mesa-dev] [PATCH 08/18] anv/pipeline: Hash the entire pipeline in one go

2018-07-11 Thread Jason Ekstrand
Instead of hashing each stage separately (and TES and TCS together), we
hash the entire pipeline.  This means we'll get fewer cache hits if
they, for instance, re-use the same VS over and over again but it also
means we can now safely do cross-stage optimizations.
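
The trade-off can be shown with a small sketch. It assumes a toy 64-bit
FNV-1a hash in place of SHA-1 and hypothetical sketch_* types in place of
the anv structs: because every active stage feeds the same hash, changing
only the fragment shader produces a different key for the whole pipeline,
including the unchanged vertex shader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STAGES 5

/* Hypothetical stand-in for anv_pipeline_stage: the module string plays
 * the role of the shader module's SHA-1. */
struct sketch_stage {
   const char *entrypoint;   /* NULL when the stage is unused */
   const char *module;
};

/* FNV-1a update step; a stand-in for _mesa_sha1_update, not SHA-1. */
static uint64_t
sketch_hash_update(uint64_t h, const void *data, size_t size)
{
   const unsigned char *bytes = data;
   for (size_t i = 0; i < size; i++) {
      h ^= bytes[i];
      h *= 0x100000001b3ull;
   }
   return h;
}

/* Hash every active stage into a single value for the whole pipeline. */
static uint64_t
sketch_hash_pipeline(const struct sketch_stage *stages)
{
   uint64_t h = 0xcbf29ce484222325ull;   /* FNV-1a offset basis */

   for (int s = 0; s < SKETCH_STAGES; s++) {
      if (!stages[s].entrypoint)
         continue;

      h = sketch_hash_update(h, &s, sizeof(s));
      h = sketch_hash_update(h, stages[s].module, strlen(stages[s].module));
      h = sketch_hash_update(h, stages[s].entrypoint,
                             strlen(stages[s].entrypoint));
   }
   return h;
}
```

Per-stage cache entries would then be keyed on the stage index together
with this pipeline-wide hash, which is what the cache_key struct added in
this patch pairs up.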
---
 src/intel/vulkan/anv_pipeline.c | 147 
 1 file changed, 94 insertions(+), 53 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 29661433516..d56f6ce8966 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -399,34 +399,67 @@ struct anv_pipeline_stage {
const VkSpecializationInfo *spec_info;
 
union brw_any_prog_key key;
+
+   struct {
+  gl_shader_stage stage;
+  unsigned char sha1[20];
+   } cache_key;
 };
 
 static void
-anv_pipeline_hash_shader(struct anv_pipeline *pipeline,
- struct anv_pipeline_layout *layout,
- struct anv_pipeline_stage *stage,
- unsigned char *sha1_out)
+anv_pipeline_hash_shader(struct mesa_sha1 *ctx,
+ struct anv_pipeline_stage *stage)
 {
-   struct mesa_sha1 ctx;
-
-   _mesa_sha1_init(&ctx);
-   if (stage->stage != MESA_SHADER_COMPUTE) {
-  _mesa_sha1_update(&ctx, &pipeline->subpass->view_mask,
-sizeof(pipeline->subpass->view_mask));
-   }
-   if (layout)
-  _mesa_sha1_update(&ctx, layout->sha1, sizeof(layout->sha1));
-   _mesa_sha1_update(&ctx, stage->module->sha1, sizeof(stage->module->sha1));
-   _mesa_sha1_update(&ctx, stage->entrypoint, strlen(stage->entrypoint));
-   _mesa_sha1_update(&ctx, &stage->stage, sizeof(stage->stage));
+   _mesa_sha1_update(ctx, &stage->stage, sizeof(stage->stage));
+   _mesa_sha1_update(ctx, stage->module->sha1, sizeof(stage->module->sha1));
+   _mesa_sha1_update(ctx, stage->entrypoint, strlen(stage->entrypoint));
if (stage->spec_info) {
-  _mesa_sha1_update(&ctx, stage->spec_info->pMapEntries,
+  _mesa_sha1_update(ctx, stage->spec_info->pMapEntries,
 stage->spec_info->mapEntryCount *
 sizeof(*stage->spec_info->pMapEntries));
-  _mesa_sha1_update(&ctx, stage->spec_info->pData,
+  _mesa_sha1_update(ctx, stage->spec_info->pData,
 stage->spec_info->dataSize);
}
-   _mesa_sha1_update(&ctx, &stage->key, brw_prog_key_size(stage->stage));
+   _mesa_sha1_update(ctx, &stage->key, brw_prog_key_size(stage->stage));
+}
+
+static void
+anv_pipeline_hash_graphics(struct anv_pipeline *pipeline,
+   struct anv_pipeline_layout *layout,
+   struct anv_pipeline_stage *stages,
+   unsigned char *sha1_out)
+{
+   struct mesa_sha1 ctx;
+   _mesa_sha1_init(&ctx);
+
+   _mesa_sha1_update(&ctx, &pipeline->subpass->view_mask,
+ sizeof(pipeline->subpass->view_mask));
+
+   if (layout)
+  _mesa_sha1_update(&ctx, layout->sha1, sizeof(layout->sha1));
+
+   for (unsigned s = 0; s < MESA_SHADER_STAGES; s++) {
+  if (stages[s].entrypoint)
+ anv_pipeline_hash_shader(&ctx, &stages[s]);
+   }
+
+   _mesa_sha1_final(&ctx, sha1_out);
+}
+
+static void
+anv_pipeline_hash_compute(struct anv_pipeline *pipeline,
+  struct anv_pipeline_layout *layout,
+  struct anv_pipeline_stage *stage,
+  unsigned char *sha1_out)
+{
+   struct mesa_sha1 ctx;
+   _mesa_sha1_init(&ctx);
+
+   if (layout)
+  _mesa_sha1_update(&ctx, layout->sha1, sizeof(layout->sha1));
+
+   anv_pipeline_hash_shader(&ctx, stage);
+
_mesa_sha1_final(&ctx, sha1_out);
 }
 
@@ -531,10 +564,6 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
 
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
-   unsigned char sha1[20];
-   anv_pipeline_hash_shader(pipeline, layout, stage, sha1);
-   bin = anv_device_search_for_kernel(pipeline->device, cache, sha1, 20);
-
if (bin == NULL) {
   struct brw_vs_prog_data prog_data = {};
   struct anv_pipeline_binding surface_to_descriptor[256];
@@ -570,7 +599,9 @@ anv_pipeline_compile_vs(struct anv_pipeline *pipeline,
   }
 
   unsigned code_size = prog_data.base.base.program_size;
-  bin = anv_device_upload_kernel(pipeline->device, cache, sha1, 20,
+  bin = anv_device_upload_kernel(pipeline->device, cache,
+ &stage->cache_key,
+ sizeof(stage->cache_key),
  shader_code, code_size,
  nir->constant_data,
  nir->constant_data_size,
@@ -643,18 +674,6 @@ anv_pipeline_compile_tcs_tes(struct anv_pipeline *pipeline,
 
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
-   unsigned char tcs_sha1[40];
-   unsigned char tes_sha1[40];
-   anv_pipeline_hash_shader(pipeline, layout, tcs_stage, tcs_sha1);
-   anv_pipeline_hash_shader(pipeline, layout, tes_stage, tes_sha1);
-   memcpy(&tcs_sha1[20], tes_sha1, 20);

[Mesa-dev] [PATCH 16/18] nir/lower_indirect: Bail early if modes == 0

2018-07-11 Thread Jason Ekstrand
There's no point in walking the program if we're never going to
actually lower anything.
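
The early-out can be sketched in isolation as follows. This is a toy pass over
a flat array, not NIR; the toy_* names and the visit counter are hypothetical
and exist only to show that an empty mode mask skips the walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item carrying one mode bit; stands in for an instruction. */
struct toy_instr {
   unsigned mode;
   bool lowered;
};

/* Counts how many instructions the pass actually walked. */
static unsigned toy_visits;

static bool
toy_lower(struct toy_instr *instrs, size_t count, unsigned modes)
{
   assert(instrs != NULL || count == 0);

   /* Nothing can match an empty mask, so skip the walk entirely. */
   if (modes == 0)
      return false;

   bool progress = false;
   for (size_t i = 0; i < count; i++) {
      toy_visits++;
      if ((instrs[i].mode & modes) && !instrs[i].lowered) {
         instrs[i].lowered = true;
         progress = true;
      }
   }
   return progress;
}
```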
---
 src/compiler/nir/nir_lower_indirect_derefs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/compiler/nir/nir_lower_indirect_derefs.c 
b/src/compiler/nir/nir_lower_indirect_derefs.c
index d85c1704222..c1f3cf86823 100644
--- a/src/compiler/nir/nir_lower_indirect_derefs.c
+++ b/src/compiler/nir/nir_lower_indirect_derefs.c
@@ -205,6 +205,9 @@ nir_lower_indirect_derefs(nir_shader *shader, 
nir_variable_mode modes)
 {
bool progress = false;
 
+   if (modes == 0)
+  return false;
+
nir_foreach_function(function, shader) {
   if (function->impl)
  progress = lower_indirects_impl(function->impl, modes) || progress;
-- 
2.17.1



[Mesa-dev] [PATCH 06/18] anv/pipeline: Add a helper struct for per-stage info

2018-07-11 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_pipeline.c | 167 ++--
 src/intel/vulkan/anv_private.h  |   2 +-
 2 files changed, 74 insertions(+), 95 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index e39ce2de010..c2ef8878db6 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -99,7 +99,7 @@ static const uint64_t stage_to_debug[] = {
 static nir_shader *
 anv_shader_compile_to_nir(struct anv_pipeline *pipeline,
   void *mem_ctx,
-  struct anv_shader_module *module,
+  const struct anv_shader_module *module,
   const char *entrypoint_name,
   gl_shader_stage stage,
   const VkSpecializationInfo *spec_info)
@@ -391,32 +391,39 @@ populate_cs_prog_key(const struct gen_device_info 
*devinfo,
populate_sampler_prog_key(devinfo, &key->tex);
 }
 
+struct anv_pipeline_stage {
+   gl_shader_stage stage;
+
+   const struct anv_shader_module *module;
+   const char *entrypoint;
+   const VkSpecializationInfo *spec_info;
+};
+
 static void
 anv_pipeline_hash_shader(struct anv_pipeline *pipeline,
  struct anv_pipeline_layout *layout,
- struct anv_shader_module *module,
- const char *entrypoint,
- gl_shader_stage stage,
- const VkSpecializationInfo *spec_info,
+ struct anv_pipeline_stage *stage,
  const void *key, size_t key_size,
  unsigned char *sha1_out)
 {
struct mesa_sha1 ctx;
 
_mesa_sha1_init(&ctx);
-   if (stage != MESA_SHADER_COMPUTE) {
+   if (stage->stage != MESA_SHADER_COMPUTE) {
   _mesa_sha1_update(&ctx, &pipeline->subpass->view_mask,
 sizeof(pipeline->subpass->view_mask));
}
if (layout)
   _mesa_sha1_update(&ctx, layout->sha1, sizeof(layout->sha1));
-   _mesa_sha1_update(&ctx, module->sha1, sizeof(module->sha1));
-   _mesa_sha1_update(&ctx, entrypoint, strlen(entrypoint));
-   _mesa_sha1_update(&ctx, &stage, sizeof(stage));
-   if (spec_info) {
-  _mesa_sha1_update(&ctx, spec_info->pMapEntries,
-spec_info->mapEntryCount * 
sizeof(*spec_info->pMapEntries));
-  _mesa_sha1_update(&ctx, spec_info->pData, spec_info->dataSize);
+   _mesa_sha1_update(&ctx, stage->module->sha1, sizeof(stage->module->sha1));
+   _mesa_sha1_update(&ctx, stage->entrypoint, strlen(stage->entrypoint));
+   _mesa_sha1_update(&ctx, &stage->stage, sizeof(stage->stage));
+   if (stage->spec_info) {
+  _mesa_sha1_update(&ctx, stage->spec_info->pMapEntries,
+stage->spec_info->mapEntryCount *
+sizeof(*stage->spec_info->pMapEntries));
+  _mesa_sha1_update(&ctx, stage->spec_info->pData,
+stage->spec_info->dataSize);
}
_mesa_sha1_update(&ctx, key, key_size);
_mesa_sha1_final(&ctx, sha1_out);
@@ -426,10 +433,7 @@ static nir_shader *
 anv_pipeline_compile(struct anv_pipeline *pipeline,
  void *mem_ctx,
  struct anv_pipeline_layout *layout,
- struct anv_shader_module *module,
- const char *entrypoint,
- gl_shader_stage stage,
- const VkSpecializationInfo *spec_info,
+ struct anv_pipeline_stage *stage,
  struct brw_stage_prog_data *prog_data,
  struct anv_pipeline_bind_map *map)
 {
@@ -437,8 +441,10 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
   pipeline->device->instance->physicalDevice.compiler;
 
nir_shader *nir = anv_shader_compile_to_nir(pipeline, mem_ctx,
-   module, entrypoint, stage,
-   spec_info);
+   stage->module,
+   stage->entrypoint,
+   stage->stage,
+   stage->spec_info);
if (nir == NULL)
   return NULL;
 
@@ -446,10 +452,10 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
 
NIR_PASS_V(nir, anv_nir_lower_push_constants);
 
-   if (stage != MESA_SHADER_COMPUTE)
+   if (nir->info.stage != MESA_SHADER_COMPUTE)
   NIR_PASS_V(nir, anv_nir_lower_multiview, pipeline->subpass->view_mask);
 
-   if (stage == MESA_SHADER_COMPUTE)
+   if (nir->info.stage == MESA_SHADER_COMPUTE)
   prog_data->total_shared = nir->num_shared;
 
nir_shader_gather_info(nir, nir_shader_get_entrypoint(nir));
@@ -485,7 +491,7 @@ anv_pipeline_compile(struct anv_pipeline *pipeline,
if (layout)
   anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data, map);
 
-   if (stage != MESA_SHADER_COMPUTE)
+   if (nir->info.stage != MESA_SHADER_COMPUTE)
   

[Mesa-dev] [PATCH 02/18] anv: Restrict the number of color regions to those actually written

2018-07-11 Thread Jason Ekstrand
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have.  Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.
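
A rough sketch of the counting step, under the assumption that the set of
outputs a shader writes is available as a bitmask. The helper name, the mask
parameter and TOY_MAX_RTS are hypothetical; the fallback to one target mirrors
the null render target case in the surrounding code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RTS 8   /* assumed limit, for illustration only */

/* Count the render targets that both exist in the key and are written by
 * the shader.  At least one (null) target is always reported.
 */
static unsigned
toy_count_written_rts(unsigned nr_color_regions, uint32_t written_mask)
{
   assert(nr_color_regions <= TOY_MAX_RTS);

   unsigned num_rts = 0;
   for (unsigned rt = 0; rt < nr_color_regions; rt++) {
      if (written_mask & (1u << rt))
         num_rts++;
   }

   /* With no render targets, a single null render target is still needed. */
   if (num_rts == 0)
      num_rts = 1;

   return num_rts;
}
```

The returned count is what would then be stored back into the key so the
back-end emits only that many RT writes.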
---
 src/intel/vulkan/anv_pipeline.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 7f89c02407b..f0c8f22c9f0 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -962,6 +962,11 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
  num_rts = 1;
   }
 
+  /* Now that we've determined the actual number of render targets, adjust
+   * the key accordingly.
+   */
+  key.nr_color_regions = num_rts;
+
   assert(num_rts <= max_rt);
   map.surface_to_descriptor -= num_rts;
   map.surface_count += num_rts;
-- 
2.17.1



[Mesa-dev] [PATCH 04/18] anv/pipeline: Rework the parameters to populate_wm_prog_key

2018-07-11 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_pipeline.c | 46 +
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 0d514fbae47..d6e8e0a0838 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -316,21 +316,19 @@ populate_gs_prog_key(const struct gen_device_info 
*devinfo,
 }
 
 static void
-populate_wm_prog_key(const struct anv_pipeline *pipeline,
- const VkGraphicsPipelineCreateInfo *info,
+populate_wm_prog_key(const struct gen_device_info *devinfo,
+ const struct anv_subpass *subpass,
+ const VkPipelineMultisampleStateCreateInfo *ms_info,
  struct brw_wm_prog_key *key)
 {
-   const struct gen_device_info *devinfo = &pipeline->device->info;
-
memset(key, 0, sizeof(*key));
 
populate_sampler_prog_key(devinfo, &key->tex);
 
-   /* TODO: we could set this to 0 based on the information in nir_shader, but
-* this function is called before spirv_to_nir. */
-   const struct brw_vue_map *vue_map =
-  &anv_pipeline_get_last_vue_prog_data(pipeline)->vue_map;
-   key->input_slots_valid = vue_map->slots_valid;
+   /* We set this to 0 here and set to the actual value before we call
+* brw_compile_fs.
+*/
+   key->input_slots_valid = 0;
 
/* Vulkan doesn't specify a default */
key->high_quality_derivatives = false;
@@ -338,32 +336,28 @@ populate_wm_prog_key(const struct anv_pipeline *pipeline,
/* XXX Vulkan doesn't appear to specify */
key->clamp_fragment_color = false;
 
-   assert(pipeline->subpass->color_count <= MAX_RTS);
-   for (uint32_t i = 0; i < pipeline->subpass->color_count; i++) {
-  if (pipeline->subpass->color_attachments[i].attachment !=
-  VK_ATTACHMENT_UNUSED)
+   assert(subpass->color_count <= MAX_RTS);
+   for (uint32_t i = 0; i < subpass->color_count; i++) {
+  if (subpass->color_attachments[i].attachment != VK_ATTACHMENT_UNUSED)
  key->color_outputs_valid |= (1 << i);
}
 
key->nr_color_regions = _mesa_bitcount(key->color_outputs_valid);
 
key->replicate_alpha = key->nr_color_regions > 1 &&
-  info->pMultisampleState &&
-  info->pMultisampleState->alphaToCoverageEnable;
+  ms_info && ms_info->alphaToCoverageEnable;
 
-   if (info->pMultisampleState) {
+   if (ms_info) {
   /* We should probably pull this out of the shader, but it's fairly
* harmless to compute it and then let dead-code take care of it.
*/
-  if (info->pMultisampleState->rasterizationSamples > 1) {
+  if (ms_info->rasterizationSamples > 1) {
  key->persample_interp =
-(info->pMultisampleState->minSampleShading *
- info->pMultisampleState->rasterizationSamples) > 1;
+(ms_info->minSampleShading * ms_info->rasterizationSamples) > 1;
  key->multisample_fbo = true;
   }
 
-  key->frag_coord_adds_sample_pos =
- info->pMultisampleState->sampleShadingEnable;
+  key->frag_coord_adds_sample_pos = ms_info->sampleShadingEnable;
}
 }
 
@@ -864,7 +858,15 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
struct brw_wm_prog_key key;
struct anv_shader_bin *bin = NULL;
 
-   populate_wm_prog_key(pipeline, info, &key);
+   populate_wm_prog_key(&pipeline->device->info, pipeline->subpass,
+info->pMultisampleState, &key);
+
+   /* TODO: we could set this to 0 based on the information in nir_shader, but
+* we need this before we call spirv_to_nir.
+*/
+   const struct brw_vue_map *vue_map =
+  &anv_pipeline_get_last_vue_prog_data(pipeline)->vue_map;
+   key.input_slots_valid = vue_map->slots_valid;
 
ANV_FROM_HANDLE(anv_pipeline_layout, layout, info->layout);
 
-- 
2.17.1



[Mesa-dev] [PATCH 01/18] anv/pipeline: Fix up deref modes if we delete a FS output

2018-07-11 Thread Jason Ekstrand
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference.  Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.
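
To illustrate why a fixup is needed, here is a toy model (not NIR) in which
each deref caches the mode of the variable it references. The types and the
toy_fixup_deref_modes helper are hypothetical; the real pass is
nir_fixup_deref_modes, whose internals are not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_mode { TOY_MODE_OUTPUT, TOY_MODE_LOCAL };

struct toy_var {
   enum toy_mode mode;
};

/* A deref caches the mode of its variable, so the two can drift apart. */
struct toy_deref {
   const struct toy_var *var;
   enum toy_mode mode;
};

/* Re-sync every deref's cached mode with its variable's current mode.
 * Returns true if any deref had to be updated.
 */
static bool
toy_fixup_deref_modes(struct toy_deref *derefs, size_t count)
{
   assert(derefs != NULL || count == 0);

   bool progress = false;
   for (size_t i = 0; i < count; i++) {
      if (derefs[i].mode != derefs[i].var->mode) {
         derefs[i].mode = derefs[i].var->mode;
         progress = true;
      }
   }
   return progress;
}
```

Demoting an output to a local changes only the variable; until the fixup runs,
the deref still claims the old mode.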
---
 src/intel/vulkan/anv_pipeline.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 95a686f7833..7f89c02407b 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -929,6 +929,7 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
  num_rts++;
   }
 
+  bool deleted_output = false;
   nir_foreach_variable_safe(var, &nir->outputs) {
  if (var->data.location < FRAG_RESULT_DATA0)
 continue;
@@ -936,6 +937,7 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
  const unsigned rt = var->data.location - FRAG_RESULT_DATA0;
  if (rt >= key.nr_color_regions) {
 /* Out-of-bounds, throw it away */
+deleted_output = true;
 var->data.mode = nir_var_local;
 exec_node_remove(&var->node);
 exec_list_push_tail(>locals, >node);
@@ -947,6 +949,9 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
  var->data.location = rt_to_bindings[rt] + FRAG_RESULT_DATA0;
   }
 
+  if (deleted_output)
+ nir_fixup_deref_modes(nir);
+
   if (num_rts == 0) {
  /* If we have no render targets, we need a null render target */
  rt_bindings[0] = (struct anv_pipeline_binding) {
-- 
2.17.1



[Mesa-dev] [PATCH 03/18] anv/pipeline: More aggressively optimize away color attachments

2018-07-11 Thread Jason Ekstrand
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass.  This lets us potentially
throw away chunks of the fragment shader.  In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
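
The mask computation can be sketched on its own like this. The toy_* helpers
are hypothetical; TOY_ATTACHMENT_UNUSED stands in for VK_ATTACHMENT_UNUSED
(which Vulkan defines as ~0U), and the hand-rolled bit count is a stand-in for
_mesa_bitcount.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ATTACHMENT_UNUSED UINT32_MAX   /* mirrors VK_ATTACHMENT_UNUSED */
#define TOY_MAX_RTS 8                      /* assumed limit */

/* Build a bitmask with one bit per color attachment the subpass uses. */
static uint8_t
toy_color_outputs_valid(const uint32_t *attachments, uint32_t color_count)
{
   assert(color_count <= TOY_MAX_RTS);

   uint8_t valid = 0;
   for (uint32_t i = 0; i < color_count; i++) {
      if (attachments[i] != TOY_ATTACHMENT_UNUSED)
         valid |= (uint8_t)(1u << i);
   }
   return valid;
}

/* Count set bits by clearing the lowest set bit until none remain. */
static unsigned
toy_bitcount(uint8_t mask)
{
   unsigned n = 0;
   for (; mask; mask &= (uint8_t)(mask - 1))
      n++;
   return n;
}
```

For a DXVK-style subpass with 8 slots of which only slots 0 and 3 are bound,
the mask is 0x09 and only 2 color regions remain.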
---
 src/intel/compiler/brw_compiler.h |  1 +
 src/intel/vulkan/anv_pipeline.c   | 18 +-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_compiler.h 
b/src/intel/compiler/brw_compiler.h
index 9dfcfcc0115..4797c9cf06d 100644
--- a/src/intel/compiler/brw_compiler.h
+++ b/src/intel/compiler/brw_compiler.h
@@ -403,6 +403,7 @@ struct brw_wm_prog_key {
bool force_dual_color_blend:1;
bool coherent_fb_fetch:1;
 
+   uint8_t color_outputs_valid;
uint64_t input_slots_valid;
unsigned program_string_id;
GLenum alpha_test_func;  /* < For Gen4/5 MRT alpha test */
diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index f0c8f22c9f0..0d514fbae47 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -338,7 +338,14 @@ populate_wm_prog_key(const struct anv_pipeline *pipeline,
/* XXX Vulkan doesn't appear to specify */
key->clamp_fragment_color = false;
 
-   key->nr_color_regions = pipeline->subpass->color_count;
+   assert(pipeline->subpass->color_count <= MAX_RTS);
+   for (uint32_t i = 0; i < pipeline->subpass->color_count; i++) {
+  if (pipeline->subpass->color_attachments[i].attachment !=
+  VK_ATTACHMENT_UNUSED)
+ key->color_outputs_valid |= (1 << i);
+   }
+
+   key->nr_color_regions = _mesa_bitcount(key->color_outputs_valid);
 
key->replicate_alpha = key->nr_color_regions > 1 &&
   info->pMultisampleState &&
@@ -903,8 +910,8 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
 continue;
 
  const unsigned rt = var->data.location - FRAG_RESULT_DATA0;
- /* Out-of-bounds */
- if (rt >= key.nr_color_regions)
+ /* Unused or out-of-bounds */
+ if (rt >= MAX_RTS || !(key.color_outputs_valid & (1 << rt)))
 continue;
 
  const unsigned array_len =
@@ -935,8 +942,8 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
 continue;
 
  const unsigned rt = var->data.location - FRAG_RESULT_DATA0;
- if (rt >= key.nr_color_regions) {
-/* Out-of-bounds, throw it away */
+ if (rt >= MAX_RTS || !(key.color_outputs_valid & (1 << rt))) {
+/* Unused or out-of-bounds, throw it away */
 deleted_output = true;
 var->data.mode = nir_var_local;
 exec_node_remove(&var->node);
@@ -966,6 +973,7 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
* the key accordingly.
*/
   key.nr_color_regions = num_rts;
+  key.color_outputs_valid = (1 << num_rts) - 1;
 
   assert(num_rts <= max_rt);
   map.surface_to_descriptor -= num_rts;
-- 
2.17.1



[Mesa-dev] [PATCH 00/18] anv/pipeline: Do cross-stage linking

2018-07-11 Thread Jason Ekstrand
I sent out a series for this almost a year ago and it just sat on the list
rotting away.  You can find the original series here:

https://patchwork.freedesktop.org/series/32809/

This v2 is a rebase of that series.  I believe Tim reviewed most of the
original but the rebase was painful enough that it probably merits a second
look-over.  I still have yet to actually be able to tie this to performance
data on anything.  I know there are some Skyrim shaders that are affected
by it but it runs at the same speed before and after.

Jason Ekstrand (18):
  anv/pipeline: Fix up deref modes if we delete a FS output
  anv: Restrict the number of color regions to those actually written
  anv/pipeline: More aggressively optimize away color attachments
  anv/pipeline: Rework the parameters to populate_wm_prog_key
  anv/pipeline: Add populate_tcs/tes_key helpers
  anv/pipeline: Add a helper struct for per-stage info
  anv/pipeline: Populate keys up-front
  anv/pipeline: Hash the entire pipeline in one go
  anv/pipeline: Call anv_pipeline_compile_* in a loop
  anv/pipeline: Pull shader compilation out into a helper.
  anv/pipeline: Drop anv_pipeline_add_compiled_stage
  anv/pipeline: Recompile all shaders if any are missing from the cache
  anv/pipeline: Compile to NIR in compile_graphics
  anv/pipeline: Add a separate "link" stage
  anv/pipeline: Pull most of the anv_pipeline_compile_* into common code
  nir/lower_indirect: Bail early if modes == 0
  anv/pipeline: Do cross-stage linking optimizations
  anv/pipeline: Disable FS dispatch for pointless fragment shaders

 src/compiler/nir/nir_lower_indirect_derefs.c |3 +
 src/intel/compiler/brw_compiler.h|1 +
 src/intel/vulkan/anv_pipeline.c  | 1110 +-
 src/intel/vulkan/anv_private.h   |2 +-
 src/intel/vulkan/genX_pipeline.c |2 -
 5 files changed, 581 insertions(+), 537 deletions(-)

-- 
2.17.1



[Mesa-dev] [Bug 106644] [llvmpipe] Mesa 18.1.2 fails lp_test_format, lp_test_arit, lp_test_blend, lp_test_printf, lp_test_conv tests

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=106644

--- Comment #34 from Ben Crocker  ---
Also, are you able to run this test on PPC64 BE?
(My PPC64 BE machine is out of service at the moment.)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH] radeonsi: add support for Vega20

2018-07-11 Thread Marek Olšák
From: Marek Olšák 

---
 include/pci_ids/radeonsi_pci_ids.h  | 7 +++
 src/amd/addrlib/amdgpu_asic_addr.h  | 2 ++
 src/amd/addrlib/gfx9/gfx9addrlib.cpp| 3 ++-
 src/amd/addrlib/gfx9/gfx9addrlib.h  | 1 +
 src/amd/common/ac_llvm_util.c   | 4 +++-
 src/amd/common/ac_surface.c | 4 
 src/amd/common/amd_family.h | 1 +
 src/amd/common/gfx9d.h  | 1 +
 src/gallium/drivers/radeonsi/si_get.c   | 1 +
 src/gallium/drivers/radeonsi/si_pipe.c  | 3 ++-
 src/gallium/drivers/radeonsi/si_state.c | 1 +
 src/gallium/drivers/radeonsi/si_state_binning.c | 1 +
 12 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/pci_ids/radeonsi_pci_ids.h 
b/include/pci_ids/radeonsi_pci_ids.h
index 6386d21a19f..c8d30597230 100644
--- a/include/pci_ids/radeonsi_pci_ids.h
+++ b/include/pci_ids/radeonsi_pci_ids.h
@@ -228,11 +228,18 @@ CHIPSET(0x6867, VEGA10)
 CHIPSET(0x6868, VEGA10)
 CHIPSET(0x687F, VEGA10)
 CHIPSET(0x686C, VEGA10)
 
 CHIPSET(0x69A0, VEGA12)
 CHIPSET(0x69A1, VEGA12)
 CHIPSET(0x69A2, VEGA12)
 CHIPSET(0x69A3, VEGA12)
 CHIPSET(0x69AF, VEGA12)
 
+CHIPSET(0x66A0, VEGA20)
+CHIPSET(0x66A1, VEGA20)
+CHIPSET(0x66A2, VEGA20)
+CHIPSET(0x66A3, VEGA20)
+CHIPSET(0x66A7, VEGA20)
+CHIPSET(0x66AF, VEGA20)
+
 CHIPSET(0x15DD, RAVEN)
diff --git a/src/amd/addrlib/amdgpu_asic_addr.h 
b/src/amd/addrlib/amdgpu_asic_addr.h
index b4b8aecd42d..e5838d42a3c 100644
--- a/src/amd/addrlib/amdgpu_asic_addr.h
+++ b/src/amd/addrlib/amdgpu_asic_addr.h
@@ -80,20 +80,21 @@
 #define AMDGPU_POLARIS11_RANGE  0x5A, 0x64
 #define AMDGPU_POLARIS12_RANGE  0x64, 0x6E
 #define AMDGPU_VEGAM_RANGE  0x6E, 0xFF
 
 #define AMDGPU_CARRIZO_RANGE0x01, 0x21
 #define AMDGPU_BRISTOL_RANGE0x10, 0x21
 #define AMDGPU_STONEY_RANGE 0x61, 0xFF
 
 #define AMDGPU_VEGA10_RANGE 0x01, 0x14
 #define AMDGPU_VEGA12_RANGE 0x14, 0x28
+#define AMDGPU_VEGA20_RANGE 0x28, 0xFF
 
 #define AMDGPU_RAVEN_RANGE  0x01, 0x81
 
 #define AMDGPU_EXPAND_FIX(x) x
 #define AMDGPU_RANGE_HELPER(val, min, max) ((val >= min) && (val < max))
 #define AMDGPU_IN_RANGE(val, ...)   AMDGPU_EXPAND_FIX(AMDGPU_RANGE_HELPER(val, 
__VA_ARGS__))
 
 
 // ASICREV_IS(eRevisionId, revisionName)
 #define ASICREV_IS(r, rn)  AMDGPU_IN_RANGE(r, AMDGPU_##rn##_RANGE)
@@ -121,14 +122,15 @@
 #define ASICREV_IS_VEGAM_P(r)  ASICREV_IS(r, VEGAM)
 
 #define ASICREV_IS_CARRIZO(r)  ASICREV_IS(r, CARRIZO)
 #define ASICREV_IS_CARRIZO_BRISTOL(r)  ASICREV_IS(r, BRISTOL)
 #define ASICREV_IS_STONEY(r)   ASICREV_IS(r, STONEY)
 
 #define ASICREV_IS_VEGA10_M(r) ASICREV_IS(r, VEGA10)
 #define ASICREV_IS_VEGA10_P(r) ASICREV_IS(r, VEGA10)
 #define ASICREV_IS_VEGA12_P(r) ASICREV_IS(r, VEGA12)
 #define ASICREV_IS_VEGA12_p(r) ASICREV_IS(r, VEGA12)
+#define ASICREV_IS_VEGA20_P(r) ASICREV_IS(r, VEGA20)
 
 #define ASICREV_IS_RAVEN(r)ASICREV_IS(r, RAVEN)
 
 #endif // _AMDGPU_ASIC_ADDR_H
diff --git a/src/amd/addrlib/gfx9/gfx9addrlib.cpp 
b/src/amd/addrlib/gfx9/gfx9addrlib.cpp
index b88d3243228..ef86c3bc7b5 100644
--- a/src/amd/addrlib/gfx9/gfx9addrlib.cpp
+++ b/src/amd/addrlib/gfx9/gfx9addrlib.cpp
@@ -1223,20 +1223,21 @@ BOOL_32 Gfx9Lib::HwlInitGlobalParams(
 ADDR_ASSERT((m_blockVarSizeLog2 == 0) ||
 ((m_blockVarSizeLog2 >= 17u) && (m_blockVarSizeLog2 <= 
20u)));
 m_blockVarSizeLog2 = Min(Max(17u, m_blockVarSizeLog2), 20u);
 
 if ((m_rbPerSeLog2 == 1) &&
 (((m_pipesLog2 == 1) && ((m_seLog2 == 2) || (m_seLog2 == 3))) ||
  ((m_pipesLog2 == 2) && ((m_seLog2 == 1) || (m_seLog2 == 2)))))
 {
 ADDR_ASSERT(m_settings.isVega10 == FALSE);
 ADDR_ASSERT(m_settings.isRaven == FALSE);
+ADDR_ASSERT(m_settings.isVega20 == FALSE);
 
 if (m_settings.isVega12)
 {
 m_settings.htileCacheRbConflict = 1;
 }
 }
 }
 else
 {
 valid = FALSE;
@@ -1266,21 +1267,21 @@ ChipFamily Gfx9Lib::HwlConvertChipFamily(
 UINT_32 uChipRevision)  ///< [in] chip revision defined in 
"asic_family"_id.h
 {
 ChipFamily family = ADDR_CHIP_FAMILY_AI;
 
 switch (uChipFamily)
 {
 case FAMILY_AI:
 m_settings.isArcticIsland = 1;
 m_settings.isVega10= ASICREV_IS_VEGA10_P(uChipRevision);
 m_settings.isVega12= ASICREV_IS_VEGA12_P(uChipRevision);
-
+m_settings.isVega20= ASICREV_IS_VEGA20_P(uChipRevision);
 m_settings.isDce12 = 1;
 
 if (m_settings.isVega10 == 0)
 {
 m_settings.htileAlignFix = 1;
 m_settings.applyAliasFix = 1;
 }
 
 m_settings.metaBaseAlignFix = 1;
 
diff --git a/src/amd/addrlib/gfx9/gfx9addrlib.h 
b/src/amd/addrlib/gfx9/gfx9addrlib.h
index 

Re: [Mesa-dev] [PATCH 1/2] mesa: MESA_framebuffer_flip_y extension [v3]

2018-07-11 Thread Chad Versace
+Ken, I had a question about GLboolean. I call you by name in the
comments below.

On Fri 29 Jun 2018, Fritz Koenig wrote:
> Adds an extension to glFramebufferParameteri
> that will specify if the framebuffer is vertically
> flipped. Historically system framebuffers are
> vertically flipped and user framebuffers are not.
> Checking to see the state was done by looking at
> the name field.  This adds an explicit field.
> 
> v2:
> * updated spec language [for chadv]
> * correctly specifying ES 3.1 [for chadv]
> * refactor access to rb->Name [for jason]
> * handle GetFramebufferParameteriv [for chadv]
> v3:
> * correct _mesa_GetMultisamplefv [for kusmabite]
> ---

>  docs/specs/MESA_framebuffer_flip_y.spec| 84 ++

Use file extension '.txt'. Khronos no longer uses the '.spec' extension.

File docs/specs/enums.txt needs an update too.

>  include/GLES2/gl2ext.h |  5 ++
>  src/mapi/glapi/registry/gl.xml |  6 ++
>  src/mesa/drivers/dri/i915/intel_fbo.c  |  7 +-
>  src/mesa/drivers/dri/i965/intel_fbo.c  |  7 +-
>  src/mesa/drivers/dri/nouveau/nouveau_fbo.c |  7 +-
>  src/mesa/drivers/dri/radeon/radeon_fbo.c   |  7 +-
>  src/mesa/drivers/dri/radeon/radeon_span.c  |  9 ++-
>  src/mesa/drivers/dri/swrast/swrast.c   |  7 +-
>  src/mesa/drivers/osmesa/osmesa.c   |  5 +-
>  src/mesa/drivers/x11/xm_buffer.c   |  3 +-
>  src/mesa/drivers/x11/xmesaP.h  |  3 +-
>  src/mesa/main/accum.c  | 17 +++--
>  src/mesa/main/dd.h |  3 +-
>  src/mesa/main/extensions_table.h   |  1 +
>  src/mesa/main/fbobject.c   | 18 -
>  src/mesa/main/framebuffer.c|  1 +
>  src/mesa/main/glheader.h   |  3 +
>  src/mesa/main/mtypes.h |  3 +
>  src/mesa/main/readpix.c| 20 +++---
>  src/mesa/state_tracker/st_cb_fbo.c |  7 +-
>  src/mesa/swrast/s_blit.c   | 17 +++--
>  src/mesa/swrast/s_clear.c  |  3 +-
>  src/mesa/swrast/s_copypix.c| 11 +--
>  src/mesa/swrast/s_depth.c  |  6 +-
>  src/mesa/swrast/s_drawpix.c| 26 ---
>  src/mesa/swrast/s_renderbuffer.c   |  6 +-
>  src/mesa/swrast/s_renderbuffer.h   |  3 +-
>  src/mesa/swrast/s_stencil.c|  3 +-
>  29 files changed, 241 insertions(+), 57 deletions(-)
>  create mode 100644 docs/specs/MESA_framebuffer_flip_y.spec
> 
> diff --git a/docs/specs/MESA_framebuffer_flip_y.spec 
> b/docs/specs/MESA_framebuffer_flip_y.spec
> new file mode 100644
> index 00..dca77a9541
> --- /dev/null
> +++ b/docs/specs/MESA_framebuffer_flip_y.spec
> @@ -0,0 +1,84 @@
> +Name
> +
> +MESA_framebuffer_flip_y
> +
> +Name Strings
> +
> +GL_MESA_framebuffer_flip_y
> +
> +Contact
> +
> +Fritz Koenig 
> +
> +Contributors
> +
> +Fritz Koenig, Google
> +Kristian Høgsberg, Google
> +Chad Versace, Google
> +
> +Status
> +
> +Proposal
> +
> +Version
> +
> +Version 1, June 7, 2018
> +
> +Number
> +
> +TBD
> +
> +Dependencies
> +
> +OpenGL ES 3.1 is required, for FramebufferParameteri.
> +
> +Overview
> +
> +Rendered buffers are normally returned right side up, as accessed
> +top to bottom.  This extension allows those buffers to be upside down
> +when accessed top to bottom.
> +
> +This extension defines a new framebuffer parameter,
> +GL_FRAMEBUFFER_FLIP_Y_MESA, that changes the behavior of the reads and
> +writes to the framebuffer attachment points. When 
> GL_FRAMEBUFFER_FLIP_Y_MESA
> +is GL_TRUE, render commands and pixel transfer operations access the
> +backing store of each attachment point with an y-inverted coordinate
> +system. This y-inversion is relative to the coordinate system set when
> +GL_FRAMEBUFFER_FLIP_Y_MESA is GL_FALSE.
> +
> +Access through TexSubImage2D and similar calls will notice the effect of
> +the flip when they are not attached to framebuffer objects because
> +GL_FRAMEBUFFER_FLIP_Y_MESA is associated with the framebuffer object and
> +not the attachment points.
> +
> +IP Status
> +
> +None
> +
> +Issues
> +
> +None
> +
> +New Procedures and Functions
> +
> +None
> +
> +New Types
> +
> +None
> +
> +New Tokens
> +
> +Accepted by the <pname> argument of FramebufferParameteri and
> +GetFramebufferParameteriv:
> +
> +GL_FRAMEBUFFER_FLIP_Y_MESA  0x8BBB
> +
> +Errors
> +GL_INVALID_OPERATION is returned from  GetFramebufferParameteriv if this
> +is called on a winsys framebuffer.


Above, s/on a winsys framebuffer/on the default framebuffer/


> +
> +Revision History
> +
> +Version 1, June, 2018
> +Initial draft (Fritz Koenig)
> diff --git a/include/GLES2/gl2ext.h b/include/GLES2/gl2ext.h
> index a7d19a1fc8..0a93bfb865 100644
> --- a/include/GLES2/gl2ext.h
> +++ b/include/GLES2/gl2ext.h
> @@ -2334,6 

[Mesa-dev] [PATCH 2/2] intel: Make the decoder handle STATE_BASE_ADDRESS not being a buffer.

2018-07-11 Thread Kenneth Graunke
Normally, i965 programs STATE_BASE_ADDRESS every batch, and puts all
state for a given base in a single buffer.

I'm working on a prototype which emits STATE_BASE_ADDRESS only once at
startup, where each base address is a fixed 4GB region of the PPGTT.
State may live in many buffers in that 4GB region, even if there isn't
a buffer located at the actual base address itself.

To handle this, we need to save the STATE_BASE_ADDRESS values across
multiple batches, rather than assuming we'll see the command each time.
Then, each time we see a pointer, we need to ask the driver for the BO
map for that data.  (We can't just use the map for the base address, as
state may be in multiple buffers, and there may not even be a buffer
at the base address to map.)
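
As a sketch of the lookup described above: given a saved base address plus an
offset, find whichever buffer contains the resulting address and return a view
that starts there. The toy_bo struct and toy_get_bo are hypothetical and only
approximate what a driver-provided callback might do; re-basing the returned
map to the queried address is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer record: GPU address range plus a CPU mapping. */
struct toy_bo {
   uint64_t addr;
   uint64_t size;
   const void *map;
};

/* Return a view of the buffer containing addr, re-based so that .map and
 * .addr correspond to addr itself.  If no buffer contains addr, the
 * returned .map is NULL.
 */
static struct toy_bo
toy_get_bo(const struct toy_bo *bos, size_t count, uint64_t addr)
{
   assert(bos != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (addr >= bos[i].addr && addr - bos[i].addr < bos[i].size) {
         const uint64_t offset = addr - bos[i].addr;
         struct toy_bo bo;
         bo.addr = addr;
         bo.size = bos[i].size - offset;
         bo.map = (const char *)bos[i].map + offset;
         return bo;
      }
   }
   return (struct toy_bo){ 0 };
}
```

Note that a lookup at the base address itself can legitimately fail while
lookups at base + offset succeed, which is why each pointer needs its own query.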
---
 src/intel/common/gen_batch_decoder.c | 83 
 src/intel/common/gen_decoder.h   |  9 ++-
 2 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/src/intel/common/gen_batch_decoder.c 
b/src/intel/common/gen_batch_decoder.c
index fe7536da9ec..6cb66bcb257 100644
--- a/src/intel/common/gen_batch_decoder.c
+++ b/src/intel/common/gen_batch_decoder.c
@@ -128,13 +128,13 @@ static void
 ctx_disassemble_program(struct gen_batch_decode_ctx *ctx,
 uint32_t ksp, const char *type)
 {
-   if (!ctx->instruction_base.map)
+   uint64_t addr = ctx->instruction_base.addr + ksp;
+   struct gen_batch_decode_bo bo = ctx_get_bo(ctx, addr);
+   if (!bo.map)
   return;
 
-   printf("\nReferenced %s:\n", type);
-   gen_disasm_disassemble(ctx->disasm,
-  (void *)ctx->instruction_base.map, ksp,
-  ctx->fp);
+   fprintf(ctx->fp, "\nReferenced %s:\n", type);
+   gen_disasm_disassemble(ctx->disasm, bo.map, 0, ctx->fp);
 }
 
 /* Heuristic to determine whether a uint32_t is probably actually a float
@@ -225,35 +225,30 @@ dump_binding_table(struct gen_batch_decode_ctx *ctx, 
uint32_t offset, int count)
if (count < 0)
   count = update_count(ctx, offset, 1, 8);
 
-   if (ctx->surface_base.map == NULL) {
+   struct gen_batch_decode_bo bind_bo =
+  ctx_get_bo(ctx, ctx->surface_base.addr + offset);
+
+   if (bind_bo.map == NULL) {
   fprintf(ctx->fp, "  binding table unavailable\n");
   return;
}
 
-   if (offset % 32 != 0 || offset >= UINT16_MAX ||
-   offset >= ctx->surface_base.size) {
+   if (offset % 32 != 0 || offset >= UINT16_MAX || offset >= bind_bo.size) {
   fprintf(ctx->fp, "  invalid binding table pointer\n");
   return;
}
 
-   struct gen_batch_decode_bo bo = ctx->surface_base;
-   const uint32_t *pointers = ctx->surface_base.map + offset;
+   const uint32_t *pointers = bind_bo.map;
for (int i = 0; i < count; i++) {
   if (pointers[i] == 0)
  continue;
 
-  if (pointers[i] % 32 != 0) {
- fprintf(ctx->fp, "pointer %u: %08x \n", i, pointers[i]);
- continue;
-  }
-
   uint64_t addr = ctx->surface_base.addr + pointers[i];
+  struct gen_batch_decode_bo bo = ctx_get_bo(ctx, addr);
   uint32_t size = strct->dw_length * 4;
 
-  if (addr < bo.addr || addr + size >= bo.addr + bo.size)
- bo = ctx->get_bo(ctx->user_data, addr);
-
-  if (addr < bo.addr || addr + size >= bo.addr + bo.size) {
+  if (pointers[i] % 32 != 0 ||
+  addr < bo.addr || addr + size >= bo.addr + bo.size) {
  fprintf(ctx->fp, "pointer %u: %08x \n", i, pointers[i]);
  continue;
   }
@@ -271,18 +266,20 @@ dump_samplers(struct gen_batch_decode_ctx *ctx, uint32_t 
offset, int count)
if (count < 0)
   count = update_count(ctx, offset, strct->dw_length, 4);
 
-   if (ctx->dynamic_base.map == NULL) {
+   uint64_t state_addr = ctx->dynamic_base.addr + offset;
+   struct gen_batch_decode_bo bo = ctx_get_bo(ctx, state_addr);
+   const void *state_map = bo.map;
+
+   if (state_map == NULL) {
   fprintf(ctx->fp, "  samplers unavailable\n");
   return;
}
 
-   if (offset % 32 != 0 || offset >= ctx->dynamic_base.size) {
+   if (offset % 32 != 0 || state_addr - bo.addr >= bo.size) {
   fprintf(ctx->fp, "  invalid sampler state pointer\n");
   return;
}
 
-   uint64_t state_addr = ctx->dynamic_base.addr + offset;
-   const void *state_map = ctx->dynamic_base.map + offset;
for (int i = 0; i < count; i++) {
   fprintf(ctx->fp, "sampler state %d\n", i);
   ctx_print_group(ctx, strct, state_addr, state_map);
@@ -295,9 +292,6 @@ static void
 handle_media_interface_descriptor_load(struct gen_batch_decode_ctx *ctx,
const uint32_t *p)
 {
-   if (ctx->dynamic_base.map == NULL)
-  return;
-
struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
struct gen_group *desc =
   gen_spec_find_struct(ctx->spec, "INTERFACE_DESCRIPTOR_DATA");
@@ -316,7 +310,12 @@ handle_media_interface_descriptor_load(struct 
gen_batch_decode_ctx *ctx,
}
 
uint64_t desc_addr = ctx->dynamic_base.addr + 

[Mesa-dev] [PATCH 1/2] intel: Make the disassembler take a const pointer to the assembly.

2018-07-11 Thread Kenneth Graunke
Disassembling doesn't modify the assembly.
---
 src/intel/common/gen_disasm.c | 7 ---
 src/intel/common/gen_disasm.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/intel/common/gen_disasm.c b/src/intel/common/gen_disasm.c
index 1de20f576d4..4f835c19883 100644
--- a/src/intel/common/gen_disasm.c
+++ b/src/intel/common/gen_disasm.c
@@ -44,14 +44,15 @@ is_send(uint32_t opcode)
 }
 
 static int
-gen_disasm_find_end(struct gen_disasm *disasm, void *assembly, int start)
+gen_disasm_find_end(struct gen_disasm *disasm,
+const void *assembly, int start)
 {
struct gen_device_info *devinfo = &disasm->devinfo;
int offset = start;
 
/* This loop exits when send-with-EOT or when opcode is 0 */
while (true) {
-  brw_inst *insn = assembly + offset;
+  const brw_inst *insn = assembly + offset;
 
   if (brw_inst_cmpt_control(devinfo, insn)) {
  offset += 8;
@@ -70,7 +71,7 @@ gen_disasm_find_end(struct gen_disasm *disasm, void 
*assembly, int start)
 }
 
 void
-gen_disasm_disassemble(struct gen_disasm *disasm, void *assembly,
+gen_disasm_disassemble(struct gen_disasm *disasm, const void *assembly,
int start, FILE *out)
 {
struct gen_device_info *devinfo = &disasm->devinfo;
diff --git a/src/intel/common/gen_disasm.h b/src/intel/common/gen_disasm.h
index c8c18b2cf03..d979114588d 100644
--- a/src/intel/common/gen_disasm.h
+++ b/src/intel/common/gen_disasm.h
@@ -34,7 +34,7 @@ struct gen_disasm;
 
 struct gen_disasm *gen_disasm_create(const struct gen_device_info *devinfo);
 void gen_disasm_disassemble(struct gen_disasm *disasm,
-void *assembly, int start, FILE *out);
+const void *assembly, int start, FILE *out);
 
 void gen_disasm_destroy(struct gen_disasm *disasm);
 
-- 
2.18.0



Re: [Mesa-dev] [PATCH] intel/batch_decoder: decoding of 3DSTATE_CONSTANT_BODY.

2018-07-11 Thread Kenneth Graunke
On Wednesday, July 11, 2018 4:43:52 AM PDT Sergii Romantsov wrote:
> SNB doesn't have a definition of 3DSTATE_CONSTANT_BODY, which is
> why we got a segmentation fault when using INTEL_DEBUG=bat.
> Fixed by skipping the parsing of 3DSTATE_CONSTANT_BODY if gen_spec
> does not define it.
> 
> Fixes: 169d8e011ae (intel: Fix 3DSTATE_CONSTANT buffer decoding.)
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107190
> Signed-off-by: Sergii Romantsov 
> ---
>  src/intel/common/gen_batch_decoder.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/intel/common/gen_batch_decoder.c 
> b/src/intel/common/gen_batch_decoder.c
> index fe7536d..973221b 100644
> --- a/src/intel/common/gen_batch_decoder.c
> +++ b/src/intel/common/gen_batch_decoder.c
> @@ -561,6 +561,10 @@ decode_3dstate_constant(struct gen_batch_decode_ctx 
> *ctx, const uint32_t *p)
> struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
> struct gen_group *body =
>gen_spec_find_struct(ctx->spec, "3DSTATE_CONSTANT_BODY");
> +   if (body == NULL) {
> +  fprintf(ctx->fp, "did not find 3DSTATE_CONSTANT_BODY info\n");
> +  return;
> +   }
>  
> uint32_t read_length[4] = {0};
> uint64_t read_addr[4];
> 

Could we refactor gen6.xml to have a 3DSTATE_CONSTANT_BODY structure
instead?  It would be laid out a bit differently than the Gen7+ one,
but we've essentially duplicated the same fields in 3DSTATE_CONSTANT_VS,
GS, and PS.  Then, this decoding should actually work, which would be
nicer...




[Mesa-dev] [PATCH 11/26] python: Fix rich comparisons

2018-07-11 Thread Mathieu Bridon
Python 3 lost the cmp() builtin, and no longer calls objects' __cmp__()
methods to compare them.

Instead, Python 3 requires implementing the rich comparison methods
explicitly: __eq__(), __ne__(), __lt__(), __le__(), __gt__() and __ge__().

Fortunately those are trivial to implement by just calling the existing
__cmp__() method, which makes the code compatible with both Python 2 and
Python 3.

This commit only implements the comparison methods which are actually
used by the build scripts.

In addition, this commit provides a fallback definition of the cmp()
builtin on Python 3, where it is needed.

Signed-off-by: Mathieu Bridon 
---
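For illustration only, the pattern described above can be sketched as follows. The `Version` class is hypothetical and much simpler than the real `VkVersion`; only the way the rich comparison methods delegate to `__cmp__()` is the point.

```python
try:
    cmp
except NameError:
    # Python 3 does not have cmp(); this mirrors the old builtin.
    def cmp(a, b):
        return (a > b) - (a < b)


class Version(object):
    """Hypothetical version class used only to show the pattern."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __int_ver(self):
        return (self.major << 16) | self.minor

    def __cmp__(self, other):
        # Negative, zero or positive, following the old cmp() protocol.
        return self.__int_ver() - other.__int_ver()

    # Python 3 only looks at the rich comparison methods, so each one
    # that is actually needed forwards to __cmp__().
    def __gt__(self, other):
        return self.__cmp__(other) > 0

    def __lt__(self, other):
        return self.__cmp__(other) < 0


newest = max([Version(1, 0), Version(1, 1), Version(0, 9)])
```

Because Python 2 also honours `__gt__()` and `__lt__()` when they are defined, the same class behaves identically on both interpreters.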
 src/amd/vulkan/radv_extensions.py  |  6 +-
 src/intel/vulkan/anv_extensions.py |  6 +-
 src/mapi/mapi_abi.py   | 13 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/amd/vulkan/radv_extensions.py 
b/src/amd/vulkan/radv_extensions.py
index c36559f48e..13a05fa21d 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -151,7 +151,11 @@ class VkVersion:
 other = copy.copy(other)
 other.patch = self.patch
 
-return self.__int_ver().__cmp__(other.__int_ver())
+return self.__int_ver() - other.__int_ver()
+
+def __gt__(self, other):
+return self.__cmp__(other) > 0
+
 
 MAX_API_VERSION = VkVersion(MAX_API_VERSION)
 
diff --git a/src/intel/vulkan/anv_extensions.py 
b/src/intel/vulkan/anv_extensions.py
index adc1d75898..47dba164ef 100644
--- a/src/intel/vulkan/anv_extensions.py
+++ b/src/intel/vulkan/anv_extensions.py
@@ -166,7 +166,11 @@ class VkVersion:
 other = copy.copy(other)
 other.patch = self.patch
 
-return self.__int_ver().__cmp__(other.__int_ver())
+return self.__int_ver() - other.__int_ver()
+
+def __gt__(self, other):
+return self.__cmp__(other) > 0
+
 
 
 MAX_API_VERSION = VkVersion('0.0.0')
diff --git a/src/mapi/mapi_abi.py b/src/mapi/mapi_abi.py
index be1d15d922..67fdb10650 100644
--- a/src/mapi/mapi_abi.py
+++ b/src/mapi/mapi_abi.py
@@ -38,6 +38,15 @@ import gl_XML
 import glX_XML
 
 
+try:
+cmp
+
+except NameError:
+# Python 3 does not have cmp()
+def cmp(a, b):
+return ((a > b) - (a < b))
+
+
 # number of dynamic entries
 ABI_NUM_DYNAMIC_ENTRIES = 256
 
@@ -135,6 +144,10 @@ class ABIEntry(object):
 
 return res
 
+def __lt__(self, other):
+return self.__cmp__(other) < 0
+
+
 def abi_parse_xml(xml):
 """Parse a GLAPI XML file for ABI entries."""
 api = gl_XML.parse_GL_API(xml, glX_XML.glx_item_factory())
-- 
2.17.1



[Mesa-dev] [PATCH v2 10/26] python: Use explicit integer divisions

2018-07-11 Thread Mathieu Bridon
In Python 2, dividing two integers returns an integer:

>>> 32 / 4
8

In Python 3 though, the same division returns a float:

>>> 32 / 4
8.0

However, Python 3 has an explicit integer division operator:

>>> 32 // 4
8

That operator exists on Python >= 2.2, so let's use it everywhere to
make the scripts compatible with both Python 2 and 3.

In addition, using __future__.division tells Python 2 to behave the same
way as Python 3, which helps ensure the scripts produce the same output
in both versions of Python.

Signed-off-by: Mathieu Bridon 
---
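A small sketch of why the distinction matters for the scripts touched here, assuming Python 3 (or Python 2 with `from __future__ import division`). The variable names are illustrative and not taken from the scripts.

```python
size = 32

true_div = size / 8     # true division: a float
floor_div = size // 8   # floor division: stays an integer

# Shift amounts must be integers, so the explicit operator is required.
shift = 1 << (size // 2)

# With true division the shift is rejected outright.
try:
    1 << (size / 2)
    shift_rejected = False
except TypeError:
    shift_rejected = True
```

The same concern applies to values that are formatted into generated C code, where a stray `8.0` would change the output.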
 src/gallium/auxiliary/util/u_format_pack.py  | 4 ++--
 src/gallium/auxiliary/util/u_format_parse.py | 7 +--
 src/mapi/glapi/gen/glX_proto_send.py | 4 ++--
 src/mesa/main/format_info.py | 4 ++--
 src/mesa/main/format_pack.py | 8 
 src/mesa/main/format_unpack.py   | 8 
 6 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/src/gallium/auxiliary/util/u_format_pack.py 
b/src/gallium/auxiliary/util/u_format_pack.py
index 7a952a48b3..ad2e49281f 100644
--- a/src/gallium/auxiliary/util/u_format_pack.py
+++ b/src/gallium/auxiliary/util/u_format_pack.py
@@ -36,7 +36,7 @@
 '''
 
 
-from __future__ import print_function
+from __future__ import division, print_function
 
 from u_format_parse import *
 
@@ -240,7 +240,7 @@ def value_to_native(type, value):
 return truncate_mantissa(value, 23)
 return value
 if type.type == FIXED:
-return int(value * (1 << (type.size/2)))
+return int(value * (1 << (type.size // 2)))
 if not type.norm:
 return int(value)
 if type.type == UNSIGNED:
diff --git a/src/gallium/auxiliary/util/u_format_parse.py 
b/src/gallium/auxiliary/util/u_format_parse.py
index c0456f6d15..d3874cd895 100644
--- a/src/gallium/auxiliary/util/u_format_parse.py
+++ b/src/gallium/auxiliary/util/u_format_parse.py
@@ -29,6 +29,9 @@
 '''
 
 
+from __future__ import division
+
+
 VOID, UNSIGNED, SIGNED, FIXED, FLOAT = range(5)
 
 SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_0, SWIZZLE_1, 
SWIZZLE_NONE, = range(7)
@@ -76,7 +79,7 @@ class Channel:
 if self.type == FLOAT:
 return VERY_LARGE
 if self.type == FIXED:
-return (1 << (self.size/2)) - 1
+return (1 << (self.size // 2)) - 1
 if self.norm:
 return 1
 if self.type == UNSIGNED:
@@ -90,7 +93,7 @@ class Channel:
 if self.type == FLOAT:
 return -VERY_LARGE
 if self.type == FIXED:
-return -(1 << (self.size/2))
+return -(1 << (self.size // 2))
 if self.type == UNSIGNED:
 return 0
 if self.norm:
diff --git a/src/mapi/glapi/gen/glX_proto_send.py 
b/src/mapi/glapi/gen/glX_proto_send.py
index a920ecc012..03067d8a3c 100644
--- a/src/mapi/glapi/gen/glX_proto_send.py
+++ b/src/mapi/glapi/gen/glX_proto_send.py
@@ -26,7 +26,7 @@
 #Ian Romanick 
 #Jeremy Kolb 
 
-from __future__ import print_function
+from __future__ import division, print_function
 
 import argparse
 
@@ -809,7 +809,7 @@ generic_%u_byte( GLint rop, const void * ptr )
 # Dividing by the array size (1 for
 # non-arrays) gives us this.
 
-s = p.size() / p.get_element_count()
+s = p.size() // p.get_element_count()
 print("   %s __glXReadReply(dpy, %s, %s, %s);" % 
(return_str, s, p.name, aa))
 got_reply = 1
 
diff --git a/src/mesa/main/format_info.py b/src/mesa/main/format_info.py
index bbecaa2451..00e27b3fba 100644
--- a/src/mesa/main/format_info.py
+++ b/src/mesa/main/format_info.py
@@ -21,7 +21,7 @@
 # TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
 # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 
-from __future__ import print_function
+from __future__ import division, print_function
 
 import format_parser as parser
 import sys
@@ -198,7 +198,7 @@ for fmat in formats:
   chan = fmat.array_element()
   norm = chan.norm or chan.type == parser.FLOAT
   print('  .ArrayFormat = MESA_ARRAY_FORMAT({0}),'.format(', '.join([
- str(chan.size / 8),
+ str(chan.size // 8),
  str(int(chan.sign)),
  str(int(chan.type == parser.FLOAT)),
  str(int(norm)),
diff --git a/src/mesa/main/format_pack.py b/src/mesa/main/format_pack.py
index d3c8d24acd..05262efb5b 100644
--- a/src/mesa/main/format_pack.py
+++ b/src/mesa/main/format_pack.py
@@ -1,4 +1,4 @@
-from __future__ import print_function
+from __future__ import division, print_function
 
 from mako.template import Template
 from sys import argv
@@ -356,7 +356,7 @@ _mesa_pack_ubyte_rgba_row(mesa_format format, GLuint n,
case ${f.name}:
   for (i = 0; i < n; ++i) {
  pack_ubyte_${f.short_name()}(src[i], d);
- d += ${f.block_size() / 8};
+ d += ${f.block_size() // 8};
   }
   break;
 %endfor
@@ 

Re: [Mesa-dev] [PATCH] vulkan: Fix compilation on older platforms

2018-07-11 Thread Dylan Baker
Quoting Danylo Piliaiev (2018-07-11 04:26:03)
> diff --git a/meson.build b/meson.build
> index 7d12af3d51..2683060827 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1088,6 +1088,8 @@ _drm_freedreno_ver = '2.4.92'
>  _drm_intel_ver = '2.4.75'
>  _drm_ver = '2.4.75'
>  
> +_drm_crt_sequence_ver = '2.4.89'
> +
>  _libdrm_checks = [
>['intel', with_dri_i915 or with_gallium_i915],
>['amdgpu', with_amd_vk or with_gallium_radeonsi],
> @@ -1361,11 +1363,18 @@ if with_platform_x11
>  dep_xcb_xfixes = dependency('xcb-xfixes')
>endif
>if with_xlib_lease
> -dep_xcb_xrandr = dependency('xcb-randr', version : '>= 1.12')
> +dep_xcb_xrandr = dependency('xcb-randr', version : '>= 1.13')
>  dep_xlib_xrandr = dependency('xrandr', version : '>= 1.3')
>endif
>  endif
>  
> +if with_any_vk
> +  dep_drm_crt_sequence = dependency('libdrm', version : '>=' + 
> _drm_crt_sequence_ver, required : false)
> +  if dep_drm_crt_sequence.found()
> +pre_args += '-DVK_USE_DISPLAY_CONTROL'
> +  endif
> +endif
> +

Instead of calling into pkg-config again, how about in the "if
dep_libdrm.found()" check around line 1131, we add:

if with_any_vk
  if dep_libdrm.version().version_compare('>= ' + _drm_crt_sequence_ver)
    pre_args += '-DVK_USE_DISPLAY_CONTROL'
  endif
endif

Or (since radv always requires libdrm > 2.4.89), why don't we just set the
minimum to 2.4.89 if vulkan is enabled and be done with it?

Dylan




Re: [Mesa-dev] [RFC] i965/fs: Generalize grf127 hack to dispatch_width > 8

2018-07-11 Thread Caio Marcelo de Oliveira Filho
> > diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp
> > b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > index 59e047483c0..417ddeba09c 100644
> > --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> > +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> > @@ -549,7 +549,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool
> > spill_all)
> >if (devinfo->gen >= 7)
> >   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
> >int grf127_send_hack_node = node_count;
> > -   if (devinfo->gen >= 8 && dispatch_width == 8)
> > +   if (devinfo->gen >= 8 && dispatch_width >= 8)
> 
> dispatch_width is always >= 8.

Yes, I was so focused on the other bits that I completely messed up
this one.


Thanks,
Caio


Re: [Mesa-dev] [PATCH 1/3] egl/android: Delete set_damage_region from egl dri vtbl

2018-07-11 Thread Harish Krupo
Hi Eric,

Eric Anholt  writes:

> Harish Krupo  writes:
>
>> The intention of KHR_partial_update was not to send the damage back
>> to the platform but to send the damage to the driver to ensure that the
>> following rendering could be restricted to those regions.
>> This patch removes set_damage_region from the egl_dri vtbl and all
>> the platform_*.c files.
>> Upcoming patches then add a new dri2 interface for the drivers to
>> implement.
>>
>> Signed-off-by: Harish Krupo 
>
> Why shouldn't the platform know about the damage region in a swap, if
> it's available?  It looks like it was successfully used for Android, and
> we should be using it for Present as well.

From the spec [1], the damage region referred to by the partial_update spec is
the damaged part of the buffer when it is used again. The damage that the
compositor/platform needs to know is the damage between the (n-1)th
frame and the nth frame. Quoting from the spec:
"   The surface damage for frame n is the difference between frame n and frame
(n-1), and represents the area that a compositor must recompose."
This is the damage referred to by the swap_buffers_with_damage spec [2],
whereas the objective of the partial_update damage region is to restrict the
subsequent rendering operations on the back buffer to only those regions
which have changed since that buffer was last used. This information is available as the buffer
age. Some more information: [3].
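To make the distinction concrete, here is a minimal sketch of how buffer damage could be derived from per-frame surface damage and the buffer age. This is only an illustration of the idea in the specs, not code from Mesa; the `buffer_damage` helper and the rectangle tuples are assumptions made for the example.

```python
def buffer_damage(surface_damage, age):
    """Collect the rectangles that must be redrawn in a reused back buffer.

    surface_damage is a list of per-frame rectangle lists, oldest first,
    with the frame currently being drawn last. age is the buffer age: how
    many frames ago this buffer was last used (0 means unknown contents).
    Returns None when the whole buffer has to be redrawn.
    """
    if age == 0 or age > len(surface_damage):
        return None
    rects = []
    for frame in surface_damage[-age:]:
        rects.extend(frame)
    return rects


# Rectangles are (x, y, width, height); the values are made up.
history = [
    [(0, 0, 10, 10)],    # frame n-2
    [(20, 20, 5, 5)],    # frame n-1
    [(40, 0, 8, 8)],     # frame n
]

surface = history[-1]                  # what a compositor must recompose
buffer_age_2 = buffer_damage(history, 2)
```

With an age of 1 the two notions coincide; they only diverge once buffers are reused after more than one frame.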


[1] 
https://www.khronos.org/registry/EGL/extensions/KHR/EGL_KHR_partial_update.txt
[2] 
https://www.khronos.org/registry/EGL/extensions/EXT/EGL_EXT_swap_buffers_with_damage.txt
[3] 
https://community.arm.com/graphics/b/blog/posts/mali-performance-3-is-egl_5f00_buffer_5f00_preserved-a-good-thing

Thank you
Regards
Harish Krupo


Re: [Mesa-dev] [PATCH v2 25/49] docs/meson.html: fix numerous issues spotted by xmllint

2018-07-11 Thread Dylan Baker
Quoting Erik Faye-Lund (2018-07-11 00:39:37)
> On 11 July 2018 01:17, Dylan Baker wrote:
> > ---
> >   docs/meson.html | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> This is a HTML-document, not an XML document or XHTML. What xmllint 
> thinks should be irrelevant.

I would certainly hope that xmllint's --html mode would be for html, and not xml
or xhtml.

At this point however, everything else that it caught (which was quite useful),
has been fixed by other patches, so I'll just drop this one.

> 
> > diff --git a/docs/meson.html b/docs/meson.html
> > index b64ca2ec35e..d4b1861ff26 100644
> > --- a/docs/meson.html
> > +++ b/docs/meson.html
> > @@ -1,9 +1,9 @@
> >> "http://www.w3.org/TR/html4/loose.dtd;>
> >   
> >   
> > -  
> > +  
> 
> No, meta-tags should not have a trailing slash:
> https://www.w3.org/TR/html401/struct/global.html#h-7.4.4
> 
> > Compilation and Installation using Meson
> > -  
> > +  
> 
> Link tags shouldn't have trailing slashes either:
> https://www.w3.org/TR/html401/struct/links.html#edef-LINK

> 
> >   
> >   
> >   
> > @@ -247,7 +247,6 @@ is unrelated to the buildtype; setting the 
> > latter to
> >   
> >   
> >   
> > -
> >   
> >   
> >   
> 




Re: [Mesa-dev] [RFC] i965/fs: Generalize grf127 hack to dispatch_width > 8

2018-07-11 Thread Caio Marcelo de Oliveira Filho
Hi,

Thanks for the explanations :-)

> > -  ra_set_node_reg(g, grf127_send_hack_node, 127);
> > +  ra_set_node_reg(g, grf127_send_hack_node, 128 - reg_width);
> 
> This configuration is more restrictive than needed. The original code
> just avoids any register of any length that uses the physical register
> grf127. Your code works for SIMD16, but as you are setting conflicts
> with grf126 in SIMD16, you are forbidding the use of grf125 with
> regsize=2, and likewise grf123 with size 4, even though these options
> never use grf127. You don't need to take care of the reg_width here, just
> of which physical register you cannot use.

That was my first attempt, but I think it was failing because of the
mistake below.
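The point about register widths can be checked with a little arithmetic. The sketch below is a simplified model and not the register allocator's real data structures: registers are plain (start, width) ranges, and `touches` and `overlaps` are hypothetical helpers.

```python
def touches(start, width, phys=127):
    """True if the range [start, start + width) contains the physical register."""
    return start <= phys < start + width


def overlaps(a_start, a_width, b_start, b_width):
    """True if the two register ranges share at least one physical register."""
    return a_start < b_start + b_width and b_start < a_start + a_width


# SIMD16 case from the discussion: the hack node is placed at 128 - reg_width.
reg_width = 2
hack_start = 128 - reg_width           # grf126, covering grf126 and grf127

# A size-2 register at grf125 covers grf125 and grf126 only.
candidate = (125, 2)
excluded_by_wide_hack = overlaps(candidate[0], candidate[1],
                                 hack_start, reg_width)
uses_grf127 = touches(candidate[0], candidate[1])
```

In this model the candidate is ruled out by the wider hack node even though it never uses grf127, which is the extra restriction being pointed out.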



> >foreach_block_and_inst(block, fs_inst, inst, cfg) {
> >   if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
> >  ra_add_node_interference(g, inst->dst.nr, 
> > grf127_send_hack_node);
> > 
> 
> The issue here is that the unspill instructions aren't covered by
> is_send_from_grf. I thought we could update is_send_from_grf to
> include the read/write scratch operations, but in the end I think that
> it didn't make sense because the source at this point is an MRF that will
> finally be assigned to a GRF on Gen7+.

Yes. Reading more of the spilling code today I can see how this won't
work. I was somehow under the impression that the actual register choice
would be preserved under a spill, but if we are spilling, it is precisely
because we don't have a proper register allocation.

 
> I've sent a patch with my solution that I think solves the case of
> unspill that is creating this problem, but maybe we need to think if
> there are more SEND instructions that could have this problem because of
> using the MRF as source.

Great! I'll take a look.


Thanks,
Caio


Re: [Mesa-dev] [PATCH 09/26] python: Use range() instead of xrange()

2018-07-11 Thread Dylan Baker
I've sent 4-9 to our CI, and assuming that it comes back green I'll go ahead and
merge those patches today.

Quoting Mathieu Bridon (2018-07-05 06:17:40)
> Python 2 has a range() function which returns a list, and an xrange()
> one which returns an iterator.
> 
> Python 3 lost the function returning a list, and renamed the function
> returning an iterator as range().
> 
> As a result, using range() makes the scripts compatible with both Python
> versions 2 and 3.
> 
> Signed-off-by: Mathieu Bridon 
> ---
>  src/amd/vulkan/radv_entrypoints_gen.py   | 2 +-
>  src/broadcom/cle/gen_pack_header.py  | 2 +-
>  src/compiler/glsl/ir_expression_operation.py | 2 +-
>  src/compiler/nir/nir_opcodes.py  | 4 ++--
>  src/intel/vulkan/anv_entrypoints_gen.py  | 2 +-
>  src/mapi/glapi/gen/glX_proto_send.py | 2 +-
>  src/mapi/glapi/gen/gl_XML.py | 2 +-
>  src/mapi/glapi/gen/gl_gentable.py| 4 ++--
>  src/mapi/mapi_abi.py | 2 +-
>  src/mesa/main/format_parser.py   | 4 ++--
>  10 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/src/amd/vulkan/radv_entrypoints_gen.py 
> b/src/amd/vulkan/radv_entrypoints_gen.py
> index 9c4dfd02a0..ca022bcbb0 100644
> --- a/src/amd/vulkan/radv_entrypoints_gen.py
> +++ b/src/amd/vulkan/radv_entrypoints_gen.py
> @@ -136,7 +136,7 @@ static const struct string_map_entry string_map_entries[] 
> = {
>  /* Hash table stats:
>   * size ${len(strmap.sorted_strings)} entries
>   * collisions entries:
> -% for i in xrange(10):
> +% for i in range(10):
>   * ${i}${'+' if i == 9 else ' '} ${strmap.collisions[i]}
>  % endfor
>   */
> diff --git a/src/broadcom/cle/gen_pack_header.py 
> b/src/broadcom/cle/gen_pack_header.py
> index c6e1c564e6..8ad54464cb 100644
> --- a/src/broadcom/cle/gen_pack_header.py
> +++ b/src/broadcom/cle/gen_pack_header.py
> @@ -216,7 +216,7 @@ class Group(object):
>  first_byte = field.start // 8
>  last_byte = field.end // 8
>  
> -for b in xrange(first_byte, last_byte + 1):
> +for b in range(first_byte, last_byte + 1):
>  if not b in bytes:
>  bytes[b] = self.Byte()
>  
> diff --git a/src/compiler/glsl/ir_expression_operation.py 
> b/src/compiler/glsl/ir_expression_operation.py
> index b3dac3da3f..16b98690a6 100644
> --- a/src/compiler/glsl/ir_expression_operation.py
> +++ b/src/compiler/glsl/ir_expression_operation.py
> @@ -116,7 +116,7 @@ constant_template_common = mako.template.Template("""\
>  constant_template_vector_scalar = mako.template.Template("""\
> case ${op.get_enum_name()}:
>  % if "mixed" in op.flags:
> -% for i in xrange(op.num_operands):
> +% for i in range(op.num_operands):
>assert(op[${i}]->type->base_type == ${op.source_types[0].glsl_type} ||
>  % for src_type in op.source_types[1:-1]:
>   op[${i}]->type->base_type == ${src_type.glsl_type} ||
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index 3c3316dcaa..b03c5da2ea 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -367,8 +367,8 @@ for (unsigned bit = 0; bit < bit_size; bit++) {
>  """)
>  
>  
> -for i in xrange(1, 5):
> -   for j in xrange(1, 5):
> +for i in range(1, 5):
> +   for j in range(1, 5):
>unop_horiz("fnoise{0}_{1}".format(i, j), i, tfloat, j, tfloat, "0.0f")
>  
>  
> diff --git a/src/intel/vulkan/anv_entrypoints_gen.py 
> b/src/intel/vulkan/anv_entrypoints_gen.py
> index 8a37336496..5e2cd0740a 100644
> --- a/src/intel/vulkan/anv_entrypoints_gen.py
> +++ b/src/intel/vulkan/anv_entrypoints_gen.py
> @@ -145,7 +145,7 @@ static const struct string_map_entry string_map_entries[] 
> = {
>  /* Hash table stats:
>   * size ${len(strmap.sorted_strings)} entries
>   * collisions entries:
> -% for i in xrange(10):
> +% for i in range(10):
>   * ${i}${'+' if i == 9 else ' '} ${strmap.collisions[i]}
>  % endfor
>   */
> diff --git a/src/mapi/glapi/gen/glX_proto_send.py 
> b/src/mapi/glapi/gen/glX_proto_send.py
> index fba2f0cc1e..a920ecc012 100644
> --- a/src/mapi/glapi/gen/glX_proto_send.py
> +++ b/src/mapi/glapi/gen/glX_proto_send.py
> @@ -392,7 +392,7 @@ static const struct proc_pair
> _glapi_proc proc;
>  } proc_pairs[%d] = {""" % len(procs))
>  names = sorted(procs.keys())
> -for i in xrange(len(names)):
> +for i in range(len(names)):
>  comma = ',' if i < len(names) - 1 else ''
>  print('   { "%s", (_glapi_proc) gl%s }%s' % (names[i], 
> procs[names[i]], comma))
>  print("""};
> diff --git a/src/mapi/glapi/gen/gl_XML.py b/src/mapi/glapi/gen/gl_XML.py
> index bfbb1ec6e0..96dc1b3c12 100644
> --- a/src/mapi/glapi/gen/gl_XML.py
> +++ b/src/mapi/glapi/gen/gl_XML.py
> @@ -834,7 +834,7 @@ class gl_function( gl_item ):
>  versions.
>  """
>  result = []
> -for entry_point, 

Re: [Mesa-dev] [Mesa-stable] [PATCH v3] radv: make sure to wait for CP DMA when needed

2018-07-11 Thread Dylan Baker
Quoting Samuel Pitoiset (2018-07-09 09:02:58)
> This might fix some synchronization issues. I don't know if
> that will affect performance but it's required for correctness.
> 
> v3: - wait for CP DMA in CmdPipelineBarrier()
> - clear the busy value when CP_DMA_SYNC is requested
> v2: - wait for CP DMA in CmdWaitEvents()
> - track if CP DMA is used
> 
> CC: 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 15 +
>  src/amd/vulkan/radv_private.h|  5 +
>  src/amd/vulkan/si_cmd_buffer.c   | 36 
>  3 files changed, 52 insertions(+), 4 deletions(-)
> 
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 9da42fe03e..5dbdb3d996 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2596,6 +2596,11 @@ VkResult radv_EndCommandBuffer(
> si_emit_cache_flush(cmd_buffer);
> }
>  
> +   /* Make sure CP DMA is idle at the end of IBs because the kernel
> +* doesn't wait for it.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> vk_free(&cmd_buffer->pool->alloc, cmd_buffer->state.attachments);
>  
> if (!cmd_buffer->device->ws->cs_finalize(cmd_buffer->cs))
> @@ -4242,6 +4247,11 @@ radv_barrier(struct radv_cmd_buffer *cmd_buffer,
>  0);
> }
>  
> +   /* Make sure CP DMA is idle because the driver might have performed a
> +* DMA operation for copying or filling buffers/images.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> cmd_buffer->state.flush_bits |= dst_flush_bits;
>  }
>  
> @@ -4292,6 +4302,11 @@ static void write_event(struct radv_cmd_buffer 
> *cmd_buffer,
> VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT |
> VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
>  
> +   /* Make sure CP DMA is idle because the driver might have performed a
> +* DMA operation for copying or filling buffers/images.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> /* TODO: Emit EOS events for syncing PS/CS stages. */
>  
> if (!(stageMask & ~top_of_pipe_flags)) {
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 4e4b3a6037..2400de49a2 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -979,6 +979,9 @@ struct radv_cmd_state {
> uint32_t last_num_instances;
> uint32_t last_first_instance;
> uint32_t last_vertex_offset;
> +
> +   /* Whether CP DMA is busy/idle. */
> +   bool dma_is_busy;
>  };
>  
>  struct radv_cmd_pool {
> @@ -1091,6 +1094,8 @@ void si_cp_dma_prefetch(struct radv_cmd_buffer 
> *cmd_buffer, uint64_t va,
>  unsigned size);
>  void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
> uint64_t size, unsigned value);
> +void si_cp_dma_wait_for_idle(struct radv_cmd_buffer *cmd_buffer);
> +
>  void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer);
>  bool
>  radv_cmd_buffer_upload_alloc(struct radv_cmd_buffer *cmd_buffer,
> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
> index 454fd8c39c..6d566a918d 100644
> --- a/src/amd/vulkan/si_cmd_buffer.c
> +++ b/src/amd/vulkan/si_cmd_buffer.c
> @@ -1040,7 +1040,6 @@ static void si_emit_cp_dma(struct radv_cmd_buffer 
> *cmd_buffer,
> struct radeon_cmdbuf *cs = cmd_buffer->cs;
> uint32_t header = 0, command = 0;
>  
> -   assert(size);
> assert(size <= cp_dma_max_byte_count(cmd_buffer));
>  
> radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
> @@ -1099,9 +1098,14 @@ static void si_emit_cp_dma(struct radv_cmd_buffer 
> *cmd_buffer,
>  * indices. If we wanted to execute CP DMA in PFP, this packet
>  * should precede it.
>  */
> -   if ((flags & CP_DMA_SYNC) && cmd_buffer->queue_family_index == 
> RADV_QUEUE_GENERAL) {
> -   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 
> cmd_buffer->state.predicating));
> -   radeon_emit(cs, 0);
> +   if (flags & CP_DMA_SYNC) {
> +   if (cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
> +   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, 
> cmd_buffer->state.predicating));
> +   radeon_emit(cs, 0);
> +   }
> +
> +   /* CP will see the sync flag and wait for all DMAs to 
> complete. */
> +   cmd_buffer->state.dma_is_busy = false;
> }
>  
> if (unlikely(cmd_buffer->device->trace_bo))
> @@ -1165,6 +1169,8 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer 
> *cmd_buffer,
> uint64_t main_src_va, main_dest_va;
> uint64_t skipped_size = 0, realign_size = 0;
>  
> +   /* Assume that we are not going to sync after the last DMA operation. 
> */
> +   

[Mesa-dev] [Bug 107169] [regression] Upgrade from 18.0.4 to 18.1.0 causes severe stuttering in games

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107169

--- Comment #2 from Stan Staykov  ---
(In reply to Denis from comment #1)
> hi. Could you plz clarify, what System Shock game you meant? New one, which
> wasn't released yet? Or some old game?
> Asking because I found demo of new one -
> https://drive.google.com/file/d/1zcBTkDP-ulMYprse2vvdlOHk7BTTVwWg/view
> and it quite slow even on my 530GT :)

I meant the 1994 and 1999 games 1 and 2 respectively.
Notice that I mentioned that downgrading back to 18.0.4 solves the issue,
therefore it's not a hardware problem. My card is kind of slow, but it's good
enough for old games like Baldur's Gate and stuff like that.



Re: [Mesa-dev] [RFC] i965/fs: Generalize grf127 hack to dispatch_width > 8

2018-07-11 Thread Chema Casanova
On 11/07/18 at 03:50, Caio Marcelo de Oliveira Filho wrote:
> Change the hack to always apply, adjusting the register number
> according to the dispatch_width.
> 
> The original change assumed that, since for dispatch_width > 8 we
> already prevent the overlap of source and destination for send, it
> would not be necessary to explicitly add an interference with a
> register that covers r127.
> 
> The problem is that the code for spilling registers ends up generating
> scratch reads that, in Gen7+, reuse the destination register as the
> source, causing a send with overlapping source and destination. So
> prevent r127 (or the overlapping wider register) from being used as the
> destination of sends.
> 
> This patch fixes piglit test
> tests/spec/arb_compute_shader/linker/bug-93840.shader_test.
> 
> Fixes: 232ed898021 "i965/fs: Register allocator shoudn't use grf127 for sends 
> dest"
> ---
> 
> After more digging into the piglit failure, I came up with this
> patch. I'm still seeing crashes in some shader-db executions
> (master has them too), but didn't have time today to drill into them.
> 
>  src/intel/compiler/brw_fs_reg_allocate.cpp | 11 ---
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
> b/src/intel/compiler/brw_fs_reg_allocate.cpp
> index 59e047483c0..417ddeba09c 100644
> --- a/src/intel/compiler/brw_fs_reg_allocate.cpp
> +++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
> @@ -549,7 +549,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
> if (devinfo->gen >= 7)
>node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
> int grf127_send_hack_node = node_count;
> -   if (devinfo->gen >= 8 && dispatch_width == 8)
> +   if (devinfo->gen >= 8 && dispatch_width >= 8)
>node_count ++;
> struct ra_graph *g =
>ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, 
> node_count);
> @@ -656,7 +656,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
>}
> }
>  
> -   if (devinfo->gen >= 8 && dispatch_width == 8) {
> +   if (devinfo->gen >= 8 && dispatch_width >= 8) {
>/* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
> * subsection "EUISA Instructions", Send Message (page 990):
> *
> @@ -665,12 +665,9 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
> spill_all)
> *
> * We are avoiding using grf127 as part of the destination of send
> * messages adding a node interference to the grf127_send_hack_node.
> -   * This node has a fixed asignment to grf127.
> -   *
> -   * We don't apply it to SIMD16 because previous code avoids any 
> register
> -   * overlap between sources and destination.
> +   * This node has a fixed assignment that overlaps with grf127.
> */
> -  ra_set_node_reg(g, grf127_send_hack_node, 127);
> +  ra_set_node_reg(g, grf127_send_hack_node, 128 - reg_width);

This configuration is more restrictive than needed. The original code
just avoids any register, of any length, that uses the physical register
grf127. Your code works for SIMD16, but because you are setting conflicts
with grf126 in SIMD16, you are forbidding the use of grf125 with
regsize=2, and likewise grf123 with size 4, even though these options
never use grf127. You don't need to take reg_width into account here,
just which physical register you cannot use.

In brw_alloc_reg_set() you can check how the different registers are
defined, using classes for the different sizes. It also configures
the conflicts between the registers of different sizes and the physical
registers.

So if at this point you create a node assigned to a physical register,
it conflicts with all the logical registers, of any size, that
overlap with it.
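
To make the arithmetic above concrete, here is a minimal sketch. It assumes
a logical register is simply a run of consecutive physical GRFs; the helper
covers_grf() is hypothetical and is not part of the i965 register allocator.

```cpp
#include <cassert>

// Hypothetical helper, not Mesa code: a logical register that starts at
// physical GRF `base` and spans `size` consecutive GRFs covers the range
// [base, base + size).
static bool covers_grf(int base, int size, int grf)
{
   return base <= grf && grf < base + size;
}
```

With a hack node pinned to grf126, covers_grf(125, 2, 126) is true, so
grf125 with size 2 becomes unusable even though covers_grf(125, 2, 127) is
false. Pinning the node to grf127 only excludes registers that really touch
grf127, such as grf126 with size 2.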

>foreach_block_and_inst(block, fs_inst, inst, cfg) {
>   if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
>  ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
> 

The issue here is that the unspill instructions aren't covered by
is_send_from_grf(). I thought we could update is_send_from_grf() to
include the scratch read/write operations, but in the end I think that
doesn't make sense, because the source at this point is an MRF that will
finally be assigned to a GRF on Gen7+.

I've sent a patch with my solution, which I think solves the unspill
case that is causing this problem, but maybe we need to consider whether
there are more SEND instructions that could have this problem because
they use an MRF as the source.

Chema



Re: [Mesa-dev] [PATCH 1/3] egl/android: Delete set_damage_region from egl dri vtbl

2018-07-11 Thread Eric Anholt
Harish Krupo  writes:

> The intention of KHR_partial_update was not to send the damage back
> to the platform but to send the damage to the driver, to ensure that the
> following rendering could be restricted to those regions.
> This patch removes set_damage_region from the egl_dri vtbl and all
> the platform_*.c files.
> Upcoming patches then add a new dri2 interface for the drivers to
> implement.
>
> Signed-off-by: Harish Krupo 

Why shouldn't the platform know about the damage region in a swap, if
it's available?  It looks like it was successfully used for Android, and
we should be using it for Present as well.




Re: [Mesa-dev] [PATCH v2] egl: Fix missing clamping in eglSetDamageRegionKHR

2018-07-11 Thread Harish Krupo

Harish Krupo  writes:

> Clamp the x and y co-ordinates of the rectangles.
>
> v2: Clamp width/height after converting to co-ordinates
> (Ilia Merkin)
>
> Signed-off-by: Harish Krupo 
> ---
>  src/egl/main/eglapi.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
>
> diff --git a/src/egl/main/eglapi.c b/src/egl/main/eglapi.c
> index c110349119..deb479b6d5 100644
> --- a/src/egl/main/eglapi.c
> +++ b/src/egl/main/eglapi.c
> @@ -1320,9 +1320,7 @@ eglSwapBuffersWithDamageKHR(EGLDisplay dpy, EGLSurface 
> surface,
>  }
>  
>  /**
> - * If the width of the passed rect is greater than the surface's
> - * width then it is clamped to the width of the surface. Same with
> - * height.
> + * Clamp the rectangles so that they lie within the surface.
>   */
>  
>  static void
> @@ -1334,17 +1332,16 @@ _eglSetDamageRegionKHRClampRects(_EGLDisplay* disp, 
> _EGLSurface* surf,
> EGLint surf_width = surf->Width;
>  
> for (i = 0; i < (4 * n_rects); i += 4) {
> -  EGLint x, y, rect_width, rect_height;
> -  x = rects[i];
> -  y = rects[i + 1];
> -  rect_width = rects[i + 2];
> -  rect_height = rects[i + 3];
> -
> -  if (rect_width > surf_width - x)
> - rects[i + 2] = surf_width - x;
> -
> -  if (rect_height > surf_height - y)
> - rects[i + 3] = surf_height - y;
> +  EGLint x1, y1, x2, y2;
> +  x1 = rects[i];
> +  y1 = rects[i + 1];
> +  x2 = rects[i + 2] + x1;
> +  y2 = rects[i + 3] + y1;
> +
> +  rects[i] = CLAMP(x1, 0, surf_width);
> +  rects[i + 1] = CLAMP(y1, 0, surf_height);
> +  rects[i + 2] = CLAMP(x2, 0, surf_width) - rects[i];
> +  rects[i + 3] = CLAMP(y2, 0, surf_height) - rects[i + 1];
> }
>  }

Gentle ping :)

Thank you
Regards
Harish Krupo
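
For readers following the v2 change, a standalone sketch of the same idea:
convert each rectangle to corner co-ordinates, clamp the corners to the
surface, then convert back to width/height. The Rect struct and the
clamp_int()/clamp_rect() helpers are illustrative assumptions, not the
eglapi.c code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types and helpers for illustration only.
struct Rect { int x, y, width, height; };

static int clamp_int(int v, int lo, int hi)
{
   return std::max(lo, std::min(v, hi));
}

// Clamp both corners to the surface, then rebuild width/height from the
// clamped corners so the rectangle never extends past the surface.
static Rect clamp_rect(Rect r, int surf_width, int surf_height)
{
   const int x1 = clamp_int(r.x, 0, surf_width);
   const int y1 = clamp_int(r.y, 0, surf_height);
   const int x2 = clamp_int(r.x + r.width, 0, surf_width);
   const int y2 = clamp_int(r.y + r.height, 0, surf_height);
   return Rect{x1, y1, x2 - x1, y2 - y1};
}
```

Clamping the corners rather than the width/height also handles negative
origins: on a 100x50 surface, a rectangle at (-10, -5) of size 30x20 becomes
(0, 0) of size 20x15.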


[Mesa-dev] [PATCH] i965/fs: unspills shoudn't use grf127 as dest since Gen8+

2018-07-11 Thread Jose Maria Casanova Crespo
At 232ed8980217dd65ab0925df28156f565b94b2e5 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+, although
the backend still uses MRFs internally for sends, they are finally
assigned to GRFs.

In the case of unspills, the backend directly assigns the destination as
the source because it is supposed to be available. So we always have a
source-destination overlap. If the register allocator assigns registers
that include grf127, we fail the validation rule that affects Gen8+:
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we have
any register spilled we add interferences to the destination of the unspill
operations.

Found by Caio Marcelo de Oliveira Filho

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends 
dest"
Cc: 18.1 
Cc: Caio Marcelo de Oliveira Filho 
Cc: Jason Ekstrand 
---
 src/intel/compiler/brw_fs_reg_allocate.cpp | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/intel/compiler/brw_fs_reg_allocate.cpp 
b/src/intel/compiler/brw_fs_reg_allocate.cpp
index 59e047483c0..3ea2e7547c6 100644
--- a/src/intel/compiler/brw_fs_reg_allocate.cpp
+++ b/src/intel/compiler/brw_fs_reg_allocate.cpp
@@ -549,7 +549,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
if (devinfo->gen >= 7)
   node_count += BRW_MAX_GRF - GEN7_MRF_HACK_START;
int grf127_send_hack_node = node_count;
-   if (devinfo->gen >= 8 && dispatch_width == 8)
+   if (devinfo->gen >= 8)
   node_count ++;
struct ra_graph *g =
   ra_alloc_interference_graph(compiler->fs_reg_sets[rsi].regs, node_count);
@@ -656,7 +656,7 @@ fs_visitor::assign_regs(bool allow_spilling, bool spill_all)
   }
}
 
-   if (devinfo->gen >= 8 && dispatch_width == 8) {
+   if (devinfo->gen >= 8) {
   /* At Intel Broadwell PRM, vol 07, section "Instruction Set Reference",
* subsection "EUISA Instructions", Send Message (page 990):
*
@@ -671,13 +671,25 @@ fs_visitor::assign_regs(bool allow_spilling, bool 
spill_all)
* overlap between sources and destination.
*/
   ra_set_node_reg(g, grf127_send_hack_node, 127);
-  foreach_block_and_inst(block, fs_inst, inst, cfg) {
- if (inst->is_send_from_grf() && inst->dst.file == VGRF) {
-ra_add_node_interference(g, inst->dst.nr, grf127_send_hack_node);
+  if (dispatch_width == 8) {
+ foreach_block_and_inst(block, fs_inst, inst, cfg) {
+if (inst->is_send_from_grf() && inst->dst.file == VGRF)
+   ra_add_node_interference(g, inst->dst.nr, 
grf127_send_hack_node);
+ }
+  }
+
+  if (spilled_any_registers) {
+ foreach_block_and_inst(block, fs_inst, inst, cfg) {
+if ((inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
+inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) &&
+inst->dst.file ==VGRF) {
+   ra_add_node_interference(g, inst->dst.nr, 
grf127_send_hack_node);
+}
  }
   }
}
 
+
/* Debug of register spilling: Go spill everything. */
if (unlikely(spill_all)) {
   int reg = choose_spill_reg(g);
-- 
2.17.1
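
As a reading aid, the quoted PRM restriction can be sketched as a predicate
over the physical ranges of a send's source and destination. This is one
interpretation of the rule's wording under the assumption that both operands
are contiguous GRF ranges; SendRegs and violates_r127_rule() are hypothetical
names and this is not the validator's actual code.

```cpp
#include <cassert>

// Hypothetical description of a send: the payload occupies
// [src_base, src_base + src_len) and the response is written to
// [dst_base, dst_base + dst_len), both in physical GRFs.
struct SendRegs {
   int src_base, src_len;
   int dst_base, dst_len;
};

// True when source and destination overlap and the destination range
// includes r127, which is the combination the rule forbids.
static bool violates_r127_rule(const SendRegs &s)
{
   const bool overlap = s.src_base < s.dst_base + s.dst_len &&
                        s.dst_base < s.src_base + s.src_len;
   const bool dst_uses_r127 = s.dst_base <= 127 &&
                              127 < s.dst_base + s.dst_len;
   return overlap && dst_uses_r127;
}
```

An unspill whose payload and two-register response both start at g126
(mlen 1, rlen 2) gives SendRegs{126, 1, 126, 2}, for which the predicate
returns true.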



[Mesa-dev] [PATCH 2/2] nv50/ir: don't rely on inbound edge order in phi nodes

2018-07-11 Thread Rhys Perry
Previously, a phi node's sources were implicitly ordered by the inbound edge
order. This changes that so that a phi node instead has a basic block stored
for each source in a deque.
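
A minimal sketch of the idea, assuming simplified stand-in types (Block and
Phi here are hypothetical and much smaller than the nv50_ir classes): each
phi source carries its predecessor block explicitly, so a lookup by block
keeps working even if the CFG's inbound edges are reordered.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-ins for the IR types; not the nv50_ir classes.
struct Block { int id; };

class Phi {
public:
   // Store source s together with the predecessor block it flows in from.
   void setSrcBB(std::size_t s, int value, const Block *bb)
   {
      if (s >= values.size())
         values.resize(s + 1);
      if (s >= blocks.size())
         blocks.resize(s + 1); // new slots are value-initialized to nullptr
      values[s] = value;
      blocks[s] = bb;
   }

   const Block *getBB(std::size_t s) const
   {
      return s < blocks.size() ? blocks[s] : nullptr;
   }

   // Look a source up by predecessor block instead of by edge position.
   bool valueFrom(const Block *bb, int &out) const
   {
      for (std::size_t s = 0; s < blocks.size(); s++) {
         if (blocks[s] == bb) {
            out = values[s];
            return true;
         }
      }
      return false;
   }

private:
   std::deque<int> values;
   std::deque<const Block *> blocks;
};
```

The sources can be set in any order; what identifies a source is the block
stored next to it, not its index.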

Signed-off-by: Rhys Perry 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 20 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  9 ++-
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  2 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 73 ++
 .../drivers/nouveau/codegen/nv50_ir_ssa.cpp|  7 ++-
 5 files changed, 50 insertions(+), 61 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
index d28022fce5..07a5325cbe 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
@@ -1132,12 +1132,21 @@ PhiInstruction::clone(ClonePolicy& pol, 
BaseInstruction *i) const
   new_PhiInstruction(pol.context(), dType));
 
BaseInstruction::clone(pol, phi);
+   phi->basicBlocks.resize(basicBlocks.size());
+   for (size_t i = 0; i < basicBlocks.size(); i++)
+  phi->basicBlocks[i] = basicBlocks[i];
 
return phi;
 }
 
+BasicBlock *
+PhiInstruction::getBB(int s) const
+{
+   return s >= (int)basicBlocks.size() ? NULL : basicBlocks[s];
+}
+
 void
-PhiInstruction::setSrc(int s, Value *val)
+PhiInstruction::setSrcBB(int s, Value *val, BasicBlock *bb)
 {
int size = srcs.size();
if (s >= size) {
@@ -1146,13 +1155,18 @@ PhiInstruction::setSrc(int s, Value *val)
  srcs[size++].setInsn(this);
}
 
+   size = basicBlocks.size();
+   if (s >= size)
+  basicBlocks.resize(s + 1);
+
srcs[s].set(val);
+   basicBlocks[s] = bb;
 }
 
 void
-PhiInstruction::setSrc(int s, const ValueRef& ref)
+PhiInstruction::setSrcBB(int s, const ValueRef& ref, BasicBlock *bb)
 {
-   setSrc(s, ref.get());
+   setSrcBB(s, ref.get(), bb);
srcs[s].mod = ref.mod;
 }
 
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
index b80397a0b9..1eb9fa7bb2 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
@@ -870,8 +870,13 @@ public:
virtual PhiInstruction *clone(ClonePolicy&,
  BaseInstruction * = NULL) const;
 
-   void setSrc(int s, Value *);
-   void setSrc(int s, const ValueRef&);
+   BasicBlock *getBB(int s) const;
+
+   void setSrcBB(int s, Value *, BasicBlock *);
+   void setSrcBB(int s, const ValueRef&, BasicBlock *);
+
+private:
+   std::deque<BasicBlock *> basicBlocks;
 };
 
 class Instruction : public BaseInstruction
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
index 29e45b7ebf..e4b6b68b56 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
@@ -678,6 +678,8 @@ void BaseInstruction::print() const
   getIndirect(s, 1));
   else
         pos += getSrc(s)->print(&buf[pos], BUFSZ - pos, sType);
+  if (op == OP_PHI)
+ PRINT("%s(BB:%i)", colour[TXT_INSN], asPhi(this)->getBB(s)->getId());
}
if (i && i->exit)
   PRINT("%s exit", colour[TXT_INSN]);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
index 26cbe20fb4..80f83c88c9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp
@@ -372,20 +372,13 @@ typedef unordered_map<
std::pair, Value *, PhiMapHash> PhiMap;
 
 // Critical edges need to be split up so that work can be inserted along
-// specific edge transitions. Unfortunately manipulating incident edges into a
-// BB invalidates all the PHI nodes since their sources are implicitly ordered
-// by incident edge order.
-//
-// TODO: Make it so that that is not the case, and PHI nodes store pointers to
-// the original BBs.
+// specific edge transitions.
 void
 RegAlloc::PhiMovesPass::splitEdges(BasicBlock *bb)
 {
BasicBlock *pb, *pn;
-   PhiInstruction *phi;
Graph::EdgeIterator ei;
    std::stack<BasicBlock *> stack;
-   int j = 0;
 
for (ei = bb->cfg.incident(); !ei.end(); ei.next()) {
   pb = BasicBlock::get(ei.getNode());
@@ -394,22 +387,6 @@ RegAlloc::PhiMovesPass::splitEdges(BasicBlock *bb)
  stack.push(pb);
}
 
-   // No critical edges were found, no need to perform any work.
-   if (stack.empty())
-  return;
-
-   // We're about to, potentially, reorder the inbound edges. This means that
-   // we need to hold on to the (phi, bb) -> src mapping, and fix up the phi
-   // nodes after the graph has been modified.
-   PhiMap phis;
-
-   j = 0;
-   for (ei = bb->cfg.incident(); !ei.end(); ei.next(), j++) {
-  pb = BasicBlock::get(ei.getNode());
-  for (phi = bb->getPhi(); phi; phi = asPhi(phi->next))
- 

[Mesa-dev] [PATCH 0/2] nv50/ir: create PhiInstruction

2018-07-11 Thread Rhys Perry
This series is based on the "nv50/ir: don't rely on inbound edge order in phi
nodes" series. It differs in that it creates BaseInstruction, a class that both
PhiInstruction and Instruction inherit from. This makes it more
difficult to create incorrect phi instructions.

It's rather large, but a good portion of it is just changing type declarations
to use BaseInstruction, adding casts or moving code around for BaseInstruction.

After applying both patches, no regressions were found with Unigine Heaven,
Valley or Superposition with my GTX 1060. Before splitting into two patches,
the changes were tested with shader-db and piglit's quick profile and no
changes were found with the same card.

Rhys Perry (2):
  nv50/ir: create PhiInstruction as a sibling of Instruction
  nv50/ir: don't rely on inbound edge order in phi nodes

 src/gallium/drivers/nouveau/codegen/nv50_ir.cpp| 154 
 src/gallium/drivers/nouveau/codegen/nv50_ir.h  | 189 +--
 src/gallium/drivers/nouveau/codegen/nv50_ir_bb.cpp |  71 +++---
 .../drivers/nouveau/codegen/nv50_ir_build_util.h   |  12 +-
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  24 +-
 .../drivers/nouveau/codegen/nv50_ir_emit_nv50.cpp  |  24 +-
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  10 +-
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |  16 +-
 .../drivers/nouveau/codegen/nv50_ir_inlines.h  |  55 -
 .../nouveau/codegen/nv50_ir_lowering_nv50.cpp  |  24 +-
 .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp  |  54 ++---
 .../nouveau/codegen/nv50_ir_lowering_nvc0.h|  16 +-
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 269 +++--
 .../drivers/nouveau/codegen/nv50_ir_print.cpp  |  83 ---
 src/gallium/drivers/nouveau/codegen/nv50_ir_ra.cpp | 156 +---
 .../drivers/nouveau/codegen/nv50_ir_ssa.cpp|  27 ++-
 .../drivers/nouveau/codegen/nv50_ir_target.cpp |  30 +--
 src/gallium/drivers/nouveau/codegen/nv50_ir_util.h |   2 +
 18 files changed, 688 insertions(+), 528 deletions(-)

-- 
2.14.4



Re: [Mesa-dev] [PATCH 1/3] vulkan: Define new VK_MESA_query_timestamp extension [v2]

2018-07-11 Thread Keith Packard
Pekka Paalanen  writes:

> I did not mean you would be solving that problem. I meant that it would
> be good to figure out what people actually want from the API to be able
> to solve the problem themselves.

Thanks for the clarification. I'd suggest that we not try and solve that
problem until we have someone who needs it?

What I'm using this extension for is to correlate GPU times with WSI
times as required by the GOOGLE_display_timing extension. That extension
reports the time gap between the end of rendering and when a frame is
displayed as reported by the WSI backend, so I needed some way to
translate between those two time domains. To do this, I call this new
function once per presentation, which means I'm getting recent values
for both clocks which should track any minor clock skew.
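
To illustrate the translation between the two time domains, a small sketch
under stated assumptions: one paired sample of both clocks taken at
(nearly) the same instant, and a known GPU tick period in nanoseconds.
ClockSample and gpu_to_cpu_ns() are hypothetical names and are not part of
the proposed extension.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical paired reading of the GPU timestamp counter and the
// clock used by the WSI backend, sampled together.
struct ClockSample {
   std::uint64_t gpu_ticks;
   std::int64_t  cpu_ns;
};

// Translate a GPU timestamp into the CPU/WSI time domain, using the most
// recent paired sample as the reference point.
static std::int64_t gpu_to_cpu_ns(const ClockSample &s,
                                  std::uint64_t gpu_ticks,
                                  double ns_per_tick)
{
   const double delta_ticks = gpu_ticks >= s.gpu_ticks
      ? static_cast<double>(gpu_ticks - s.gpu_ticks)
      : -static_cast<double>(s.gpu_ticks - gpu_ticks);
   return s.cpu_ns + std::llround(delta_ticks * ns_per_tick);
}
```

Taking a fresh sample once per presentation keeps the reference point
close to the timestamps being converted, which limits the error from any
drift between the two clocks.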

-- 
-keith




[Mesa-dev] [Bug 106861] fatal error: wayland-egl-backend.h: No such file or directory compilation terminated.

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=106861

Eero Tamminen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED

-- 
You are receiving this mail because:
You are the QA Contact for the bug.


[Mesa-dev] [Bug 107169] [regression] Upgrade from 18.0.4 to 18.1.0 causes severe stuttering in games

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107169

Denis  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Denis  ---
Hi. Could you please clarify which System Shock game you meant? The new one,
which hasn't been released yet, or one of the old games?
I'm asking because I found a demo of the new one -
https://drive.google.com/file/d/1zcBTkDP-ulMYprse2vvdlOHk7BTTVwWg/view
and it is quite slow even on my 530GT :)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] i965/miptree: Allocate MS texture BOs as BUSY

2018-07-11 Thread Pohjolainen, Topi
On Fri, Jul 06, 2018 at 03:39:26PM -0700, Nanley Chery wrote:
> These buffer objects are never accessed with the CPU.

Reviewed-by: Topi Pohjolainen 

> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index eee83a7a963..3a1d064ef4b 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -694,8 +694,8 @@ miptree_create(struct brw_context *brw,
> enum intel_miptree_create_flags flags)
>  {
> const struct gen_device_info *devinfo = &brw->screen->devinfo;
> -   const uint32_t alloc_flags = (flags & MIPTREE_CREATE_BUSY) ?
> -BO_ALLOC_BUSY : 0;
> +   const uint32_t alloc_flags =
> +  (flags & MIPTREE_CREATE_BUSY || num_samples > 1) ? BO_ALLOC_BUSY : 0;
> isl_tiling_flags_t tiling_flags = ISL_TILING_ANY_MASK;
>  
> /* TODO: This used to be because there wasn't BLORP to handle Y-tiling. */
> -- 
> 2.18.0
> 


Re: [Mesa-dev] [PATCH 00/13] Fix stencil texturing and BO caching bugs

2018-07-11 Thread Pohjolainen, Topi
On Fri, Jul 06, 2018 at 01:29:29PM -0700, Nanley Chery wrote:
> On Fri, Jul 06, 2018 at 03:36:01PM +0300, Pohjolainen, Topi wrote:
> > On Tue, Jun 12, 2018 at 12:21:52PM -0700, Nanley Chery wrote:
> > > This series fixes a couple stencil texturing bugs on HSW and
> > > cache-tracking for certain stencil BOs on all platforms.
> > > 
> > > Nanley Chery (13):
> > >   i965: Set the r8stencil flag in miptree_finish_write
> > >   i965/miptree: Set the r8stencil flag in map_depthstencil
> > >   i965/draw: Set the r8stencil flag after drawing
> > >   i965/draw: Fix adding the stencil bo to the depth cache
> > >   i965/miptree: Use make_surface in map_blit
> > >   i965/miptree: Delete MIPTREE_CREATE_LINEAR
> > >   i965/miptree: Share tiling_flags in miptree_create
> > >   i965/miptree: Share the miptree format in miptree_create
> > >   i965/miptree: Share alloc_flags in miptree_create
> > 
> > I'm not sure if this maintains the BO_ALLOC_BUSY.
> > 
> > >   i965/miptree: Add and use mt_surf_usage
> > >   i965/miptree: Refactor miptree_create
> > >   i965/miptree: Create the r8stencil_mt immediately
> > >   i965/miptree: Inline make_separate_stencil
> > 
> > Same here.
> 
> Yes, those two patches don't maintain BO_ALLOC_BUSY for the cases where
> the user creates a depth, depth-stencil, or stencil texture. It seems
> better this way though since, like a color texture, those are liable to
> be accessed with the CPU immediately.
> 
> I think we can maintain BO_ALLOC_BUSY for multisampled textures, since
> they will never be accessed on the CPU. I'll send a follow-up patch.
> Thoughts?

I'm fine with the change, just mention it in the commit message.

> 
> AFAICT, BO_ALLOC_BUSY wasn't used for those textures until commit
> a73d56dce37ae13f422215de1bf1fdfb8e2f6ed7, which allocated renderbuffers
> for depth stencil textures and implicitly (and perhaps accidentally)
> asked for a BUSY BO. I haven't thoroughly checked though.
> 
> > Otherwise the series is:
> > 
> > Reviewed-by: Topi Pohjolainen 
> > 
> 
> Thanks for the review!
> 
> -Nanley
> 
> > > 
> > >  src/mesa/drivers/dri/i965/brw_blorp.c |   6 +-
> > >  src/mesa/drivers/dri/i965/brw_clear.c |   8 -
> > >  src/mesa/drivers/dri/i965/brw_draw.c  |  14 +-
> > >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 211 --
> > >  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |   5 +-
> > >  src/mesa/drivers/dri/i965/intel_tex_image.c   |   3 -
> > >  6 files changed, 103 insertions(+), 144 deletions(-)
> > > 
> > > -- 
> > > 2.17.0
> > > 


Re: [Mesa-dev] [PATCH 2/9] i965/fs: Register allocator shoudn't use grf127 for sends dest

2018-07-11 Thread Chema Casanova
Including mesa-dev in my previous reply.

On 11/07/18 at 01:08, Caio Marcelo de Oliveira Filho wrote:
>> Since Gen8+ Intel PRM states that "r127 must not be used for return
>> address when there is a src and dest overlap in send instruction."
> 
> The previous patch, that verifies the condition above is causing
> 
> tests/spec/arb_compute_shader/linker/bug-93840.shader_test
> 
> to crash with
> 
> shader_runner: ../src/intel/compiler/brw_fs_generator.cpp:2455: int 
> fs_generator::generate_code(const cfg_t*, int): Assertion `validated' failed.
> 
> I also could reproduce the crash locally. It happens even with this
> patch (which adds the hack) applied.

I've seen it in Jenkins, but couldn't reproduce it so I thought it
wasn't related. Now I've realized that I was using a release build at
that moment.

The good thing is that the validator rule has detected that the
generated instruction was incorrect.


>> This patch implements this restriction by creating a new
>> grf127_send_hack_node in the register allocator. This node has a fixed
>> assignment to grf127.
>>
>> For vgrfs that are used as the destination of send messages we create
>> node interferences with the grf127_send_hack_node. So the register
>> allocator will never assign to these vgrfs a register that involves grf127.
>>
>> If dispatch_width > 8 we don't create these interferences because
>> all instructions have node interferences between sources and destination.
>> That is enough to avoid the r127 restriction.
> 
> I think this will not be enough for either width. The instruction that
> fails the validation is:
> 
> mov(8)    g126<1>UD    g0<8,8,1>UD    { align1 WE_all 1Q };
> mov(1)    g126.2<1>UD  0x0090UD       { align1 WE_all 1N };
> send(16)  g126<1>UW    g126<8,8,1>UD
> data ( DC OWORD block read, 253, 3) mlen 1 rlen 2 { align1 WE_all 1H };
> ERROR: r127 must not be used for return address when there is a src and dest overlap
> 
> Which if I understood correctly comes from the scratch reading being
> created by the spilling logic. In brw_oword_block_read_scratch() we
> see
> 
>if (p->devinfo->gen >= 7) {
>   /* On gen 7 and above, we no longer have message registers and we can
>* send from any register we want.  By using the destination register
>* for the message, we guarantee that the implied message write won't
>* accidentally overwrite anything.  This has been a problem because
>* the MRF registers and source for the final FB write are both fixed
>* and may overlap.
>*/
>   mrf = retype(dest, BRW_REGISTER_TYPE_UD);
>} else {
>   mrf = retype(mrf, BRW_REGISTER_TYPE_UD);
>}
>dest = retype(dest, BRW_REGISTER_TYPE_UW);
> 
> It seems to me we'll have to handle r127 there as well.

Yes. Since in this case the source and destination are coded to be the
same vgrf, we don't have a source/destination interference on SIMD16.

I'm doing some extra testing, but something like the following code in
assign_regs() seems to fix the issue:


  if (spilled_any_registers) {
     foreach_block_and_inst(block, fs_inst, inst, cfg) {
        if (inst->opcode == SHADER_OPCODE_GEN7_SCRATCH_READ ||
            inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ) {
           ra_add_node_interference(g, inst->dst.nr,
                                    grf127_send_hack_node);
        }
     }
  }

Thanks Caio for digging into the problem. I'm sending today a patch to
deal with this case.

Chema


>>
>> This fixes CTS tests that raised this issue as they were executed as SIMD8:
>>
>> dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom
>>
>> Shader-db results on Skylake:
>>total instructions in shared programs: 7686798 -> 7686797 (<.01%)
>>instructions in affected programs: 301 -> 300 (-0.33%)
>>helped: 1
>>HURT: 0
>>
>>total cycles in shared programs: 337092322 -> 337091919 (<.01%)
>>cycles in affected programs: 22420415 -> 22420012 (<.01%)
>>helped: 712
>>HURT: 588
>>
>> Shader-db results on Broadwell:
>>
>>total instructions in shared programs: 7658574 -> 7658625 (<.01%)
>>instructions in affected programs: 19610 -> 19661 (0.26%)
>>helped: 3
>>HURT: 4
>>
>>total cycles in shared programs: 340694553 -> 340676378 (<.01%)
>>cycles in affected programs: 24724915 -> 24706740 (-0.07%)
>>helped: 998
>>HURT: 916
>>
>>total spills in shared programs: 4300 -> 4311 (0.26%)
>>spills in affected programs: 333 -> 344 (3.30%)
>>helped: 1
>>HURT: 3
>>
>>total fills in shared programs: 5370 -> 5378 (0.15%)
>>fills in affected programs: 274 -> 282 (2.92%)
>>helped: 1
>>HURT: 3
>>
>> v2: Avoid duplicating register classes without grf127. Let's use a node
>> with a fixed assignation to grf127 and create interferences to send
>> message vgrf destinations. (Eric Anholt)
>> v3: 

[Mesa-dev] [PATCH v2 1/3] i965: Sweep NIR after linking phase to free held memory

2018-07-11 Thread Danylo Piliaiev
After optimization passes and many transformations, most of the memory
NIR holds is garbage, which was being freed only on shader deletion.
Freeing it at the end of linking saves memory, which is useful
when a lot of complex shaders are being compiled.
The common case for this issue is a 32-bit game running under Wine.

The cost of the optimization is around 3-5% of compilation speed
with complex shaders.

V2: by Jason Ekstrand
- Move nir_sweep up, right after the last change of NIR

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103274

Signed-off-by: Danylo Piliaiev 
---
 src/mesa/drivers/dri/i965/brw_link.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_link.cpp 
b/src/mesa/drivers/dri/i965/brw_link.cpp
index 1071056f14..378426101b 100644
--- a/src/mesa/drivers/dri/i965/brw_link.cpp
+++ b/src/mesa/drivers/dri/i965/brw_link.cpp
@@ -317,6 +317,8 @@ brw_link_shader(struct gl_context *ctx, struct 
gl_shader_program *shProg)
   NIR_PASS_V(prog->nir, nir_lower_atomics_to_ssbo,
  prog->nir->info.num_abos);
 
+  nir_sweep(prog->nir);
+
       infos[stage] = &prog->nir->info;
 
   update_xfb_info(prog->sh.LinkedTransformFeedback, infos[stage]);
-- 
2.17.1



Re: [Mesa-dev] [PATCH] radv: emit a dummy ZPASS_DONE to prevent GPU hangs on GFX9

2018-07-11 Thread Samuel Pitoiset



On 07/11/2018 01:56 PM, Bas Nieuwenhuizen wrote:
> Assuming you confirmed this fixed something:

This doesn't fix anything known.

> Reviewed-by: Bas Nieuwenhuizen
>
> On Wed, Jul 11, 2018 at 11:55 AM, Samuel Pitoiset wrote:

A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.

Signed-off-by: Samuel Pitoiset 
Cc: 18.1 
---
  src/amd/vulkan/radv_cmd_buffer.c | 15 +--
  src/amd/vulkan/radv_device.c |  4 ++--
  src/amd/vulkan/radv_private.h|  7 +--
  src/amd/vulkan/radv_query.c  |  9 ++---
  src/amd/vulkan/si_cmd_buffer.c   | 26 +-
  5 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 9da42fe03e..325e1993f8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -319,11 +319,21 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
 }

 if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
+   unsigned num_db = 
cmd_buffer->device->physical_device->rad_info.num_render_backends;
+   unsigned eop_bug_offset;
 void *fence_ptr;
+
 radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0,
                                      &cmd_buffer->gfx9_fence_offset,
                                      &fence_ptr);
 cmd_buffer->gfx9_fence_bo = cmd_buffer->upload.upload_bo;
+
+   /* Allocate a buffer for the EOP bug on GFX9. */
+   radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 0,
+&eop_bug_offset, &fence_ptr);
+   cmd_buffer->gfx9_eop_bug_va =
+   radv_buffer_get_va(cmd_buffer->upload.upload_bo);
+   cmd_buffer->gfx9_eop_bug_va += eop_bug_offset;
 }

 cmd_buffer->status = RADV_CMD_BUFFER_STATUS_INITIAL;
@@ -473,7 +483,7 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer 
*cmd_buffer,

cmd_buffer->device->physical_device->rad_info.chip_class,
ptr, va,
radv_cmd_buffer_uses_mec(cmd_buffer),
-  flags);
+  flags, cmd_buffer->gfx9_eop_bug_va);
 }

 if (unlikely(cmd_buffer->device->trace_bo))
@@ -4318,7 +4328,8 @@ static void write_event(struct radv_cmd_buffer 
*cmd_buffer,

cmd_buffer->device->physical_device->rad_info.chip_class,

radv_cmd_buffer_uses_mec(cmd_buffer),
V_028A90_BOTTOM_OF_PIPE_TS, 0,
-  EOP_DATA_SEL_VALUE_32BIT, va, 2, 
value);
+  EOP_DATA_SEL_VALUE_32BIT, va, 2, 
value,
+  cmd_buffer->gfx9_eop_bug_va);
 }

 assert(cmd_buffer->cs->cdw <= cdw_max);
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 73c48cef1f..1c0a50c82f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2240,7 +2240,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
RADV_CMD_FLAG_INV_SMEM_L1 |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_INV_GLOBAL_L2 |
-  
RADV_CMD_FLAG_START_PIPELINE_STATS);
+  
RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
 } else if (i == 1) {
 si_cs_emit_cache_flush(cs,

queue->device->physical_device->rad_info.chip_class,
@@ -2251,7 +2251,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
RADV_CMD_FLAG_INV_SMEM_L1 |
RADV_CMD_FLAG_INV_VMEM_L1 |
RADV_CMD_FLAG_INV_GLOBAL_L2 |
-  
RADV_CMD_FLAG_START_PIPELINE_STATS);
+  
RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
 }

 if (!queue->device->ws->cs_finalize(cs))
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 4e4b3a6037..96218f4be2 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -1041,6 +1041,7 @@ struct radv_cmd_buffer {
 uint32_t gfx9_fence_offset;
 struct radeon_winsys_bo *gfx9_fence_bo;
 uint32_t gfx9_fence_idx;
+   uint64_t gfx9_eop_bug_va;

 /**
  * Whether a query pool has been 

Re: [Mesa-dev] [PATCH] radv: emit a dummy ZPASS_DONE to prevent GPU hangs on GFX9

2018-07-11 Thread Bas Nieuwenhuizen
Assuming you confirmed this fixed something:

Reviewed-by: Bas Nieuwenhuizen 

On Wed, Jul 11, 2018 at 11:55 AM, Samuel Pitoiset
 wrote:
> A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
> counters) must immediately precede every timestamp event to
> prevent a GPU hang on GFX9.
>
> Signed-off-by: Samuel Pitoiset 
> Cc: 18.1 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 15 +--
>  src/amd/vulkan/radv_device.c |  4 ++--
>  src/amd/vulkan/radv_private.h|  7 +--
>  src/amd/vulkan/radv_query.c  |  9 ++---
>  src/amd/vulkan/si_cmd_buffer.c   | 26 +-
>  5 files changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 9da42fe03e..325e1993f8 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -319,11 +319,21 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer 
> *cmd_buffer)
> }
>
> if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) 
> {
> +   unsigned num_db = 
> cmd_buffer->device->physical_device->rad_info.num_render_backends;
> +   unsigned eop_bug_offset;
> void *fence_ptr;
> +
> radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0,
>  &cmd_buffer->gfx9_fence_offset,
>  &fence_ptr);
> cmd_buffer->gfx9_fence_bo = cmd_buffer->upload.upload_bo;
> +
> +   /* Allocate a buffer for the EOP bug on GFX9. */
> +   radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 0,
> +&eop_bug_offset, &fence_ptr);
> +   cmd_buffer->gfx9_eop_bug_va =
> +   radv_buffer_get_va(cmd_buffer->upload.upload_bo);
> +   cmd_buffer->gfx9_eop_bug_va += eop_bug_offset;
> }
>
> cmd_buffer->status = RADV_CMD_BUFFER_STATUS_INITIAL;
> @@ -473,7 +483,7 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer 
> *cmd_buffer,
>
> cmd_buffer->device->physical_device->rad_info.chip_class,
>ptr, va,
>radv_cmd_buffer_uses_mec(cmd_buffer),
> -  flags);
> +  flags, cmd_buffer->gfx9_eop_bug_va);
> }
>
> if (unlikely(cmd_buffer->device->trace_bo))
> @@ -4318,7 +4328,8 @@ static void write_event(struct radv_cmd_buffer 
> *cmd_buffer,
>
> cmd_buffer->device->physical_device->rad_info.chip_class,
>
> radv_cmd_buffer_uses_mec(cmd_buffer),
>V_028A90_BOTTOM_OF_PIPE_TS, 0,
> -  EOP_DATA_SEL_VALUE_32BIT, va, 2, 
> value);
> +  EOP_DATA_SEL_VALUE_32BIT, va, 2, 
> value,
> +  cmd_buffer->gfx9_eop_bug_va);
> }
>
> assert(cmd_buffer->cs->cdw <= cdw_max);
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index 73c48cef1f..1c0a50c82f 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -2240,7 +2240,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
>RADV_CMD_FLAG_INV_SMEM_L1 |
>RADV_CMD_FLAG_INV_VMEM_L1 |
>RADV_CMD_FLAG_INV_GLOBAL_L2 |
> -  
> RADV_CMD_FLAG_START_PIPELINE_STATS);
> +  
> RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
> } else if (i == 1) {
> si_cs_emit_cache_flush(cs,
>
> queue->device->physical_device->rad_info.chip_class,
> @@ -2251,7 +2251,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
>RADV_CMD_FLAG_INV_SMEM_L1 |
>RADV_CMD_FLAG_INV_VMEM_L1 |
>RADV_CMD_FLAG_INV_GLOBAL_L2 |
> -  
> RADV_CMD_FLAG_START_PIPELINE_STATS);
> +  
> RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
> }
>
> if (!queue->device->ws->cs_finalize(cs))
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 4e4b3a6037..96218f4be2 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -1041,6 +1041,7 @@ struct radv_cmd_buffer {
> uint32_t gfx9_fence_offset;
> struct radeon_winsys_bo *gfx9_fence_bo;
> uint32_t gfx9_fence_idx;
> +   

[Mesa-dev] [PATCH] intel/batch_decoder: decoding of 3DSTATE_CONSTANT_BODY.

2018-07-11 Thread Sergii Romantsov
SNB doesn't have a definition of 3DSTATE_CONSTANT_BODY, which is why
we got a segmentation fault when using INTEL_DEBUG=bat.
Fixed by skipping the parsing of 3DSTATE_CONSTANT_BODY if the
gen_spec lookup does not find it.
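
The shape of the fix is a plain guard on a lookup that can legitimately fail. A minimal sketch follows, with a hypothetical lookup table standing in for the generation spec; every `example_*` name is illustrative and is not part of the decoder's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a spec entry; only the name matters here. */
struct example_group {
   const char *name;
};

/* Hypothetical lookup: returns NULL when the struct is not defined. */
const struct example_group *
example_find_struct(const struct example_group *spec, size_t count,
                    const char *name)
{
   for (size_t i = 0; i < count; i++) {
      if (strcmp(spec[i].name, name) == 0)
         return &spec[i];
   }
   return NULL;
}

/* Returns 0 on success, -1 if the definition is missing. */
int
example_decode_constant(FILE *fp, const struct example_group *spec,
                        size_t count)
{
   assert(fp != NULL);

   const struct example_group *body =
      example_find_struct(spec, count, "3DSTATE_CONSTANT_BODY");
   if (body == NULL) {
      /* Bail out instead of dereferencing a NULL group. */
      fprintf(fp, "did not find 3DSTATE_CONSTANT_BODY info\n");
      return -1;
   }

   fprintf(fp, "decoding %s\n", body->name);
   return 0;
}
```

Calling `example_decode_constant()` with a table that lacks the entry prints the diagnostic and returns early, which mirrors what the patch does for SNB.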

Fixes: 169d8e011ae (intel: Fix 3DSTATE_CONSTANT buffer decoding.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107190
Signed-off-by: Sergii Romantsov 
---
 src/intel/common/gen_batch_decoder.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/intel/common/gen_batch_decoder.c b/src/intel/common/gen_batch_decoder.c
index fe7536d..973221b 100644
--- a/src/intel/common/gen_batch_decoder.c
+++ b/src/intel/common/gen_batch_decoder.c
@@ -561,6 +561,10 @@ decode_3dstate_constant(struct gen_batch_decode_ctx *ctx, const uint32_t *p)
struct gen_group *inst = gen_spec_find_instruction(ctx->spec, p);
struct gen_group *body =
   gen_spec_find_struct(ctx->spec, "3DSTATE_CONSTANT_BODY");
+   if (body == NULL) {
+  fprintf(ctx->fp, "did not find 3DSTATE_CONSTANT_BODY info\n");
+  return;
+   }
 
uint32_t read_length[4] = {0};
uint64_t read_addr[4];
-- 
2.7.4



[Mesa-dev] [Bug 107170] Build fails if building against X server/libxrandr with no leases support

2018-07-11 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=107170

Danylo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |danylo.pilia...@gmail.com

--- Comment #1 from Danylo  ---
Made a patch to fix compilation:
https://patchwork.freedesktop.org/patch/237719/

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH] vulkan: Fix compilation on older platforms

2018-07-11 Thread Danylo Piliaiev
Make xlease automatically enabled only if xcb-randr >= 1.13,
check its version if manually enabled.

Enable VK_EXT_display_control only when libdrm >= 2.4.89

Check for DRM_EVENT_CONTEXT_VERSION >= 4 to use sequence_handler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107170
  https://bugs.freedesktop.org/show_bug.cgi?id=106972
  https://bugs.freedesktop.org/show_bug.cgi?id=107176

Signed-off-by: Danylo Piliaiev 
---
 configure.ac   | 29 +-
 meson.build| 11 +-
 src/amd/vulkan/radv_extensions.py  |  9 +++-
 src/amd/vulkan/radv_wsi_display.c  |  5 ++---
 src/intel/vulkan/anv_extensions.py |  2 +-
 src/intel/vulkan/anv_extensions_gen.py |  7 +++
 src/intel/vulkan/anv_wsi_display.c |  4 ++--
 src/vulkan/wsi/wsi_common_display.c|  8 +--
 src/vulkan/wsi/wsi_common_display.h|  3 ++-
 9 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/configure.ac b/configure.ac
index f135d05736..0b04525014 100644
--- a/configure.ac
+++ b/configure.ac
@@ -82,6 +82,8 @@ LIBDRM_FREEDRENO_REQUIRED=2.4.92
 LIBDRM_ETNAVIV_REQUIRED=2.4.89
 LIBDRM_VC4_REQUIRED=2.4.89
 
+LIBDRM_CRT_SEQUENCE_REQUIRED=2.4.89
+
 dnl Versions for external dependencies
 DRI2PROTO_REQUIRED=2.8
 GLPROTO_REQUIRED=1.4.14
@@ -97,6 +99,7 @@ XCBDRI2_REQUIRED=1.8
 XCBDRI3_MODIFIERS_REQUIRED=1.13
 XCBGLX_REQUIRED=1.8.1
 XCBPRESENT_MODIFIERS_REQUIRED=1.13
+XCBRANDR_XLEASE_REQUIRED=1.13
 XDAMAGE_REQUIRED=1.1
 XSHMFENCE_REQUIRED=1.1
 XVMC_REQUIRED=1.0.6
@@ -1874,20 +1877,6 @@ if test x"$enable_dri3" = xyes; then
 fi
 fi
 
-
-if echo "$platforms" | grep -q 'x11' && echo "$platforms" | grep -q 'drm'; then
-have_xlease=yes
-else
-have_xlease=no
-fi
-
-if test x"$have_xlease" = xyes; then
-randr_modules="x11-xcb xcb-randr"
-PKG_CHECK_MODULES([XCB_RANDR], [$randr_modules])
-xlib_randr_modules="xrandr"
-PKG_CHECK_MODULES([XLIB_RANDR], [$xlib_randr_modules])
-fi
-
 AM_CONDITIONAL(HAVE_PLATFORM_X11, echo "$platforms" | grep -q 'x11')
 AM_CONDITIONAL(HAVE_PLATFORM_WAYLAND, echo "$platforms" | grep -q 'wayland')
 AM_CONDITIONAL(HAVE_PLATFORM_DRM, echo "$platforms" | grep -q 'drm')
@@ -1905,14 +1894,24 @@ xno)
 ;;
 *)
 if echo "$platforms" | grep -q 'x11' && echo "$platforms" | grep -q 'drm'; then
-enable_xlib_lease=yes
+xlease_modules="x11-xcb xcb-randr >= $XCBRANDR_XLEASE_REQUIRED xrandr"
+PKG_CHECK_EXISTS([$xlease_modules], [enable_xlib_lease=yes], [enable_xlib_lease=no])
 else
 enable_xlib_lease=no
 fi
 esac
 
+if test x"$enable_xlib_lease" = xyes; then
+randr_modules="x11-xcb xcb-randr >= $XCBRANDR_XLEASE_REQUIRED"
+PKG_CHECK_MODULES([XCB_RANDR], [$randr_modules])
+xlib_randr_modules="xrandr"
+PKG_CHECK_MODULES([XLIB_RANDR], [$xlib_randr_modules])
+fi
+
 AM_CONDITIONAL(HAVE_XLIB_LEASE, test "x$enable_xlib_lease" = xyes)
 
+PKG_CHECK_EXISTS([libdrm >= $LIBDRM_CRT_SEQUENCE_REQUIRED], [DEFINES="${DEFINES} -DVK_USE_DISPLAY_CONTROL"], [])
+
 dnl
 dnl More DRI setup
 dnl
diff --git a/meson.build b/meson.build
index 7d12af3d51..2683060827 100644
--- a/meson.build
+++ b/meson.build
@@ -1088,6 +1088,8 @@ _drm_freedreno_ver = '2.4.92'
 _drm_intel_ver = '2.4.75'
 _drm_ver = '2.4.75'
 
+_drm_crt_sequence_ver = '2.4.89'
+
 _libdrm_checks = [
   ['intel', with_dri_i915 or with_gallium_i915],
   ['amdgpu', with_amd_vk or with_gallium_radeonsi],
@@ -1361,11 +1363,18 @@ if with_platform_x11
 dep_xcb_xfixes = dependency('xcb-xfixes')
   endif
   if with_xlib_lease
-dep_xcb_xrandr = dependency('xcb-randr', version : '>= 1.12')
+dep_xcb_xrandr = dependency('xcb-randr', version : '>= 1.13')
 dep_xlib_xrandr = dependency('xrandr', version : '>= 1.3')
   endif
 endif
 
+if with_any_vk
+  dep_drm_crt_sequence = dependency('libdrm', version : '>=' + _drm_crt_sequence_ver, required : false)
+  if dep_drm_crt_sequence.found()
+pre_args += '-DVK_USE_DISPLAY_CONTROL'
+  endif
+endif
+
 if get_option('gallium-extra-hud')
   pre_args += '-DHAVE_GALLIUM_EXTRA_HUD=1'
 endif
diff --git a/src/amd/vulkan/radv_extensions.py b/src/amd/vulkan/radv_extensions.py
index c36559f48e..e60b0d4773 100644
--- a/src/amd/vulkan/radv_extensions.py
+++ b/src/amd/vulkan/radv_extensions.py
@@ -91,7 +91,7 @@ EXTENSIONS = [
 Extension('VK_EXT_direct_mode_display',   1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
 Extension('VK_EXT_acquire_xlib_display',  1, 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
 Extension('VK_EXT_display_surface_counter',   1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
-Extension('VK_EXT_display_control',   1, 'VK_USE_PLATFORM_DISPLAY_KHR'),
+Extension('VK_EXT_display_control',   1, 'VK_USE_DISPLAY_CONTROL'),
 Extension('VK_EXT_debug_report',  9, True),
 Extension('VK_EXT_depth_range_unrestricted',  1, True),
 

Re: [Mesa-dev] Loop unrolling and if statement opts

2018-07-11 Thread Timothy Arceri

On 11/07/18 19:45, Eero Tamminen wrote:

Hi,

On 11.07.2018 12:00, Timothy Arceri wrote:

On 11/07/18 18:20, Eero Tamminen wrote:

Have you considered partial loop unrolling support?

I.e. when loop counter is known, but too high for full unroll, doing 
partial loop unrolling (e.g. unroll 4x times) and dividing loop 
counter by same amount (if it didn't divide evenly, need to unroll 
remainder outside of loop).


This is supported e.g. by Intel Windows compiler.



Do you have any examples of apps this helps?


Sorry, no. This was found a while ago when doing synthetic tests for 
things that are handled by compilers on the CPU side (GCC, LLVM...).




If I remember correctly there are very few (if any?) shaders in shader-db
that are not unrolled due to the limit.


Loop unrolling should probably have some limits based on shader cache 
size and how many instructions the loop content has, not on the loop 
counter, but only backend has that info...




Yes, we already limit based on the contents of the loop. If I recall 
correctly, that part is not blocking too many loops from unrolling; it's 
set at a limit that seems to limit spilling pretty well in a small 
number of shaders. The iteration limit is the main hard limit; I guess 
there probably is room to play with partial unrolling there.



The limiting code is:

static bool
is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li)
{
   unsigned max_iter = shader->options->max_unroll_iterations;

   if (li->trip_count > max_iter)
  return false;

   if (li->force_unroll)
  return true;

   bool loop_not_too_large =
      li->num_instructions * li->trip_count <= max_iter * LOOP_UNROLL_LIMIT;

   return loop_not_too_large;
}
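
To make the arithmetic concrete, here is a self-contained sketch of the same check with stand-in types. The struct, the function name, and the value chosen for `EXAMPLE_LOOP_UNROLL_LIMIT` are assumptions for illustration only; the real definitions live in NIR and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only; check the NIR source. */
#define EXAMPLE_LOOP_UNROLL_LIMIT 26

/* Stand-in for the fields of nir_loop_info used by the check. */
struct example_loop_info {
   unsigned num_instructions;
   unsigned trip_count;
   bool force_unroll;
};

bool
example_small_enough_to_unroll(unsigned max_iter,
                               const struct example_loop_info *li)
{
   assert(li != NULL);

   /* Hard limit on the iteration count. */
   if (li->trip_count > max_iter)
      return false;

   if (li->force_unroll)
      return true;

   /* Size limit: the fully unrolled body must fit the budget. */
   return li->num_instructions * li->trip_count <=
          max_iter * EXAMPLE_LOOP_UNROLL_LIMIT;
}
```

With `max_iter = 32` and the assumed limit of 26, the budget is 832 instructions: a 100-instruction body with 8 iterations (800) passes, the same body with 9 iterations (900) does not, and any loop with more than 32 iterations is rejected by the hard limit regardless of its size.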



 - Eero


There was a measurable impact from the unroll limits on the Talos 
benchmark for RADV. I guess it might be interesting to try partial 
unrolling with that game; it would be good to know where else it might 
help.




 - Eero

On 11.07.2018 09:48, Timothy Arceri wrote:

This series started out as me trying to unroll some useless loops I
spotted in some shaders from DXVK games (see patch 10), but I found
some other issues and improvements along the way.

The biggest winner seems like it could be the Dolphin uber shaders on
i965 (on radeonsi the shaders don't seem to have spilling issues).
The loops in the uber shaders that are unrolled are those used as
wrappers around switches by GLSL IR.

shader-db results for the series on IVB (note as the loops that are
unrolled only have a single iteration I enabled shader-db reporting
on shaders where loops are unrolled):

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15



Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-11 Thread Eero Tamminen

Hi,

On 06.07.2018 00:28, Jason Ekstrand wrote:
On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand > wrote:

[...]

>> Optimizing for the latter case is an essentially
>> heuristic assumption that needs to be verified experimentally.  Have you
>> tested the effect of this pass on non-DX workloads extensively?
>>
>
> Yes, it is a trade-off.  No, I have not done particularly extensive
> testing.  We do, however, know of non-DXVK workloads that would benefit
> from this.  I believe Manhattan is one such example though I have not yet
> benchmarked it.
>

You should grab some numbers then to make sure there are no
regressions...


I'm working on that.  Unfortunately the perf system is giving me
trouble so I don't have the numbers yet.

But keep in mind that the i965 scheduler is already
performing a similar optimization (locally, but with cycle-count
information).  This will only help over the existing optimization if the
shaders that represent a bottleneck in Manhattan have sufficient control
flow for the basic block boundaries to represent a problem to the
(local) scheduler.


I'm not sure about the manhattan shader but the Skyrim shader does
have control flow which the discard has to get moved above.


I have results from the perf system now and somehow this pass makes 
manhattan noticeably worse.  I'll look into that.


Note: All the more complex GfxBench tests use discard:
Egypt, T-Rex, Manhattan, CarChase, AztecRuins...

(Most of the other benchmarks we're running don't use it.)


- Eero


[Mesa-dev] [PATCH] radv: emit a dummy ZPASS_DONE to prevent GPU hangs on GFX9

2018-07-11 Thread Samuel Pitoiset
A ZPASS_DONE or PIXEL_STAT_DUMP_EVENT (of the DB occlusion
counters) must immediately precede every timestamp event to
prevent a GPU hang on GFX9.

Signed-off-by: Samuel Pitoiset 
Cc: 18.1 
---
 src/amd/vulkan/radv_cmd_buffer.c | 15 +--
 src/amd/vulkan/radv_device.c |  4 ++--
 src/amd/vulkan/radv_private.h|  7 +--
 src/amd/vulkan/radv_query.c  |  9 ++---
 src/amd/vulkan/si_cmd_buffer.c   | 26 +-
 5 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 9da42fe03e..325e1993f8 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -319,11 +319,21 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
}
 
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
+   unsigned num_db = cmd_buffer->device->physical_device->rad_info.num_render_backends;
+   unsigned eop_bug_offset;
void *fence_ptr;
+
radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0,
 &cmd_buffer->gfx9_fence_offset,
 &fence_ptr);
cmd_buffer->gfx9_fence_bo = cmd_buffer->upload.upload_bo;
+
+   /* Allocate a buffer for the EOP bug on GFX9. */
+   radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 0,
+&eop_bug_offset, &fence_ptr);
+   cmd_buffer->gfx9_eop_bug_va =
+   radv_buffer_get_va(cmd_buffer->upload.upload_bo);
+   cmd_buffer->gfx9_eop_bug_va += eop_bug_offset;
}
 
cmd_buffer->status = RADV_CMD_BUFFER_STATUS_INITIAL;
@@ -473,7 +483,7 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer *cmd_buffer,
   cmd_buffer->device->physical_device->rad_info.chip_class,
   ptr, va,
   radv_cmd_buffer_uses_mec(cmd_buffer),
-  flags);
+  flags, cmd_buffer->gfx9_eop_bug_va);
}
 
if (unlikely(cmd_buffer->device->trace_bo))
@@ -4318,7 +4328,8 @@ static void write_event(struct radv_cmd_buffer *cmd_buffer,
   cmd_buffer->device->physical_device->rad_info.chip_class,
   radv_cmd_buffer_uses_mec(cmd_buffer),
   V_028A90_BOTTOM_OF_PIPE_TS, 0,
-  EOP_DATA_SEL_VALUE_32BIT, va, 2, value);
+  EOP_DATA_SEL_VALUE_32BIT, va, 2, value,
+  cmd_buffer->gfx9_eop_bug_va);
}
 
assert(cmd_buffer->cs->cdw <= cdw_max);
diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 73c48cef1f..1c0a50c82f 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2240,7 +2240,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
   RADV_CMD_FLAG_INV_SMEM_L1 |
   RADV_CMD_FLAG_INV_VMEM_L1 |
   RADV_CMD_FLAG_INV_GLOBAL_L2 |
-  RADV_CMD_FLAG_START_PIPELINE_STATS);
+  RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
} else if (i == 1) {
si_cs_emit_cache_flush(cs,
   queue->device->physical_device->rad_info.chip_class,
@@ -2251,7 +2251,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
   RADV_CMD_FLAG_INV_SMEM_L1 |
   RADV_CMD_FLAG_INV_VMEM_L1 |
   RADV_CMD_FLAG_INV_GLOBAL_L2 |
-  RADV_CMD_FLAG_START_PIPELINE_STATS);
+  RADV_CMD_FLAG_START_PIPELINE_STATS, 0);
}
 
if (!queue->device->ws->cs_finalize(cs))
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 4e4b3a6037..96218f4be2 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -1041,6 +1041,7 @@ struct radv_cmd_buffer {
uint32_t gfx9_fence_offset;
struct radeon_winsys_bo *gfx9_fence_bo;
uint32_t gfx9_fence_idx;
+   uint64_t gfx9_eop_bug_va;
 
/**
 * Whether a query pool has been resetted and we have to flush caches.
@@ -1072,7 +1073,8 @@ void si_cs_emit_write_event_eop(struct radeon_cmdbuf *cs,
unsigned data_sel,
uint64_t va,
uint32_t old_fence,
- 

Re: [Mesa-dev] Loop unrolling and if statement opts

2018-07-11 Thread Eero Tamminen

Hi,

On 11.07.2018 12:00, Timothy Arceri wrote:

On 11/07/18 18:20, Eero Tamminen wrote:

Have you considered partial loop unrolling support?

I.e. when loop counter is known, but too high for full unroll, doing 
partial loop unrolling (e.g. unroll 4x times) and dividing loop 
counter by same amount (if it didn't divide evenly, need to unroll 
remainder outside of loop).


This is supported e.g. by Intel Windows compiler.



Do you have any examples of apps this helps?


Sorry, no. This was found a while ago when doing synthetic tests for 
things that are handled by compilers on the CPU side (GCC, LLVM...).




If I remember correctly there are very few (if any?) shaders in shader-db
that are not unrolled due to the limit.


Loop unrolling should probably have some limits based on shader cache 
size and how many instructions the loop content has, not on the loop 
counter, but only backend has that info...



- Eero


There was a measurable impact from the unroll limits on the Talos 
benchmark for RADV. I guess it might be interesting to try partial 
unrolling with that game; it would be good to know where else it might 
help.




 - Eero

On 11.07.2018 09:48, Timothy Arceri wrote:

This series started out as me trying to unroll some useless loops I
spotted in some shaders from DXVK games (see patch 10), but I found
some other issues and improvements along the way.

The biggest winner seems like it could be the Dolphin uber shaders on
i965 (on radeonsi the shaders don't seem to have spilling issues).
The loops in the uber shaders that are unrolled are those used as
wrappers around switches by GLSL IR.

shader-db results for the series on IVB (note as the loops that are
unrolled only have a single iteration I enabled shader-db reporting
on shaders where loops are unrolled):

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15



Re: [Mesa-dev] How to apply patches to upstream mesa 9.0.3

2018-07-11 Thread Erik Faye-Lund

On 11. juli 2018 10:22, Thella Shyam Kumar wrote:

Hi All,

We are using an x86-based platform and we need changes in Mesa 
9.0.3, which comes with AOSP 7.1.2r36.
If we want the changes in Mesa in AOSP, our understanding is that 
these changes should be upstreamed in freedesktop.org's Mesa 9.0.3, 
because AOSP gets the code from there.


So firstly will you allow changes into mesa 9.0.3 as it is very old now?


No. Mesa 9.0.3 is already released, so that ship has sailed. And I'm 
pretty sure there are no plans for a 9.0.4, considering the last 9.x 
release happened almost 5 years ago. I think you're going to have to 
look into upgrading (or patching) Mesa in AOSP instead.




Re: [Mesa-dev] Loop unrolling and if statement opts

2018-07-11 Thread Timothy Arceri

On 11/07/18 18:20, Eero Tamminen wrote:

Hi,

Have you considered partial loop unrolling support?

I.e. when loop counter is known, but too high for full unroll, doing 
partial loop unrolling (e.g. unroll 4x times) and dividing loop counter 
by same amount (if it didn't divide evenly, need to unroll remainder 
outside of loop).


This is supported e.g. by Intel Windows compiler.
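
As a rough illustration of the partial unrolling described above, here is a CPU-side sketch in C. It is not NIR code; the function names and the unroll factor of 4 are assumptions chosen for the example. The loop body is replicated four times, the trip count of the main loop is divided by four, and the leftover iterations run outside it:

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one iteration per element. */
long sum_rolled(const int *data, size_t n)
{
   long sum = 0;
   for (size_t i = 0; i < n; i++)
      sum += data[i];
   return sum;
}

/* Hypothetical 4x partial unroll of the loop above. */
long sum_unrolled_by_4(const int *data, size_t n)
{
   assert(data != NULL || n == 0);

   long sum = 0;
   size_t main_trips = n / 4;   /* loop counter divided by the unroll factor */
   size_t i = 0;

   for (size_t t = 0; t < main_trips; t++, i += 4) {
      sum += data[i];
      sum += data[i + 1];
      sum += data[i + 2];
      sum += data[i + 3];
   }

   /* Remainder (n % 4 iterations), handled outside the main loop. */
   for (; i < n; i++)
      sum += data[i];

   return sum;
}
```

Both functions return the same result for any input; the unrolled version simply executes a quarter as many loop-control steps in its main loop, at the cost of a larger body.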



Do you have any examples of apps this helps? If I remember correctly 
there are very few (if any?) shaders in shader-db that are not unrolled 
due to the limit.


There was a measurable impact from the unroll limits on the Talos 
benchmark for RADV. I guess it might be interesting to try partial 
unrolling with that game; it would be good to know where else it might 
help.




 - Eero

On 11.07.2018 09:48, Timothy Arceri wrote:

This series started out as me trying to unroll some useless loops I
spotted in some shaders from DXVK games (see patch 10), but I found
some other issues and improvements along the way.

The biggest winner seems like it could be the Dolphin uber shaders on
i965 (on radeonsi the shaders don't seem to have spilling issues).
The loops in the uber shaders that are unrolled are those used as
wrappers around switches by GLSL IR.

shader-db results for the series on IVB (note as the loops that are
unrolled only have a single iteration I enabled shader-db reporting
on shaders where loops are unrolled):

total instructions in shared programs: 10018187 -> 10016468 (-0.02%)
instructions in affected programs: 104080 -> 102361 (-1.65%)
helped: 36
HURT: 15

total cycles in shared programs: 220065064 -> 154529655 (-29.78%)
cycles in affected programs: 126063017 -> 60527608 (-51.99%)
helped: 51
HURT: 0

total loops in shared programs: 2515 -> 2308 (-8.23%)
loops in affected programs: 903 -> 696 (-22.92%)
helped: 51
HURT: 0

total spills in shared programs: 4370 -> 4124 (-5.63%)
spills in affected programs: 1397 -> 1151 (-17.61%)
helped: 9
HURT: 12

total fills in shared programs: 4581 -> 4419 (-3.54%)
fills in affected programs: 2201 -> 2039 (-7.36%)
helped: 9
HURT: 15



Re: [Mesa-dev] [PATCH v2 4/4] radv: add support for VK_EXT_conditional_rendering

2018-07-11 Thread Bas Nieuwenhuizen
Don't we need to disable predication too for the PipelineBarriers when
a layout change happens?

Also, in cases where the barrier or a blit/copy uses different
predication, do we not need to call si_emit_set_predication_state again,
since the state was overridden?
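
One way to picture the concern: an operation that must ignore conditional rendering has to switch predication off and then re-emit the previous state afterwards. The following is only a sketch with stand-in types and a stub emit function. None of the `example_*` names are radv's; only the general idea of a `predicating` flag is taken from the patch. It is not the driver's implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the predication-related command buffer state. */
struct example_cmd_state {
   bool predicating;         /* conditional rendering requested by the app */
   bool hw_enabled;          /* what the "hardware" currently sees */
   bool op_saw_predication;  /* recorded by the example operation */
   unsigned emit_count;      /* number of state (re)emits */
};

/* Stub for a packet emit: records the state the hardware would see. */
void example_emit_predication(struct example_cmd_state *s, bool enable)
{
   s->hw_enabled = enable;
   s->emit_count++;
}

/* Example operation: notes whether predication was active when it ran. */
void example_record_op(struct example_cmd_state *s)
{
   s->op_saw_predication = s->hw_enabled;
}

/* Run an unpredicated operation, then restore what the app had set. */
void example_unpredicated_op(struct example_cmd_state *s,
                             void (*op)(struct example_cmd_state *))
{
   assert(s != NULL && op != NULL);

   bool old_predicating = s->predicating;

   if (old_predicating) {
      example_emit_predication(s, false);
      s->predicating = false;
   }

   op(s);

   if (old_predicating) {
      /* Re-emit: the disable above overrode the earlier state. */
      example_emit_predication(s, true);
      s->predicating = true;
   }
}
```

In this sketch the operation never observes predication, and the application's state is emitted again once it completes, which is the behaviour the question above is asking about.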

On Mon, Jul 9, 2018 at 2:57 PM, Samuel Pitoiset
 wrote:
> Inherited command buffers are not supported.
>
> v2: - disable predication for blit and copy commands
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c  | 29 +
>  src/amd/vulkan/radv_device.c  |  7 +++
>  src/amd/vulkan/radv_extensions.py |  1 +
>  src/amd/vulkan/radv_meta_blit.c   | 10 ++
>  src/amd/vulkan/radv_meta_buffer.c | 10 ++
>  src/amd/vulkan/radv_meta_copy.c   | 30 ++
>  6 files changed, 87 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c 
> b/src/amd/vulkan/radv_cmd_buffer.c
> index 29199f2b3d..3fd8ebe2d3 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -4376,3 +4376,32 @@ void radv_CmdSetDeviceMask(VkCommandBuffer 
> commandBuffer,
>  {
> /* No-op */
>  }
> +
> +/* VK_EXT_conditional_rendering */
> +void vkCmdBeginConditionalRenderingEXT(
> +   VkCommandBuffer commandBuffer,
> +   const VkConditionalRenderingBeginInfoEXT*   
> pConditionalRenderingBegin)
> +{
> +   RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +   RADV_FROM_HANDLE(radv_buffer, buffer, 
> pConditionalRenderingBegin->buffer);
> +   bool inverted;
> +   uint64_t va;
> +
> +   va = radv_buffer_get_va(buffer->bo) + 
> pConditionalRenderingBegin->offset;
> +
> +   inverted = pConditionalRenderingBegin->flags & 
> VK_CONDITIONAL_RENDERING_INVERTED_BIT_EXT;
> +
> +   /* Enable predication for this command buffer. */
> +   si_emit_set_predication_state(cmd_buffer, inverted, va);
> +   cmd_buffer->state.predicating = true;
> +}
> +
> +void vkCmdEndConditionalRenderingEXT(
> +   VkCommandBuffer commandBuffer)
> +{
> +   RADV_FROM_HANDLE(radv_cmd_buffer, cmd_buffer, commandBuffer);
> +
> +   /* Disable predication for this command buffer. */
> +   si_emit_set_predication_state(cmd_buffer, false, 0);
> +   cmd_buffer->state.predicating = false;
> +}
> diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
> index ad3465f594..06d70d305a 100644
> --- a/src/amd/vulkan/radv_device.c
> +++ b/src/amd/vulkan/radv_device.c
> @@ -806,6 +806,13 @@ void radv_GetPhysicalDeviceFeatures2(
> features->runtimeDescriptorArray = true;
> break;
> }
> +   case 
> VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_CONDITIONAL_RENDERING_FEATURES_EXT: {
> +   VkPhysicalDeviceConditionalRenderingFeaturesEXT 
> *features =
> +   
> (VkPhysicalDeviceConditionalRenderingFeaturesEXT*)ext;
> +   features->conditionalRendering = true;
> +   features->inheritedConditionalRendering = false;
> +   break;
> +   }
> default:
> break;
> }
> diff --git a/src/amd/vulkan/radv_extensions.py 
> b/src/amd/vulkan/radv_extensions.py
> index a0f1038110..6ddbabf26e 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -89,6 +89,7 @@ EXTENSIONS = [
>  Extension('VK_KHR_display',  23, 
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
>  Extension('VK_EXT_direct_mode_display',   1, 
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
>  Extension('VK_EXT_acquire_xlib_display',  1, 
> 'VK_USE_PLATFORM_XLIB_XRANDR_EXT'),
> +Extension('VK_EXT_conditional_rendering', 1, True),
>  Extension('VK_EXT_display_surface_counter',   1, 
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
>  Extension('VK_EXT_display_control',   1, 
> 'VK_USE_PLATFORM_DISPLAY_KHR'),
>  Extension('VK_EXT_debug_report',  9, True),
> diff --git a/src/amd/vulkan/radv_meta_blit.c b/src/amd/vulkan/radv_meta_blit.c
> index a6ee0cb7e9..67c26aabdb 100644
> --- a/src/amd/vulkan/radv_meta_blit.c
> +++ b/src/amd/vulkan/radv_meta_blit.c
> @@ -520,6 +520,7 @@ void radv_CmdBlitImage(
> RADV_FROM_HANDLE(radv_image, src_image, srcImage);
> RADV_FROM_HANDLE(radv_image, dest_image, destImage);
> struct radv_meta_saved_state saved_state;
> +   bool old_predicating;
>
> /* From the Vulkan 1.0 spec:
>  *
> @@ -534,6 +535,12 @@ void radv_CmdBlitImage(
>RADV_META_SAVE_CONSTANTS |
>RADV_META_SAVE_DESCRIPTORS);
>
> +   /* VK_EXT_conditional_rendering says that blit commands should not be
> +* affected by conditional rendering.
> +*/
> +   

Re: [Mesa-dev] [PATCH v3] radv: make sure to wait for CP DMA when needed

2018-07-11 Thread Bas Nieuwenhuizen
Reviewed-by: Bas Nieuwenhuizen 

On Mon, Jul 9, 2018 at 6:02 PM, Samuel Pitoiset wrote:
> This might fix some synchronization issues. I don't know if
> that will affect performance but it's required for correctness.
>
> v3: - wait for CP DMA in CmdPipelineBarrier()
> - clear the busy value when CP_DMA_SYNC is requested
> v2: - wait for CP DMA in CmdWaitEvents()
> - track if CP DMA is used
>
> CC: 
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_cmd_buffer.c | 15 +
>  src/amd/vulkan/radv_private.h|  5 +
>  src/amd/vulkan/si_cmd_buffer.c   | 36 
>  3 files changed, 52 insertions(+), 4 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
> index 9da42fe03e..5dbdb3d996 100644
> --- a/src/amd/vulkan/radv_cmd_buffer.c
> +++ b/src/amd/vulkan/radv_cmd_buffer.c
> @@ -2596,6 +2596,11 @@ VkResult radv_EndCommandBuffer(
> si_emit_cache_flush(cmd_buffer);
> }
>
> +   /* Make sure CP DMA is idle at the end of IBs because the kernel
> +* doesn't wait for it.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> vk_free(&cmd_buffer->pool->alloc, cmd_buffer->state.attachments);
>
> if (!cmd_buffer->device->ws->cs_finalize(cmd_buffer->cs))
> @@ -4242,6 +4247,11 @@ radv_barrier(struct radv_cmd_buffer *cmd_buffer,
>  0);
> }
>
> +   /* Make sure CP DMA is idle because the driver might have performed a
> +* DMA operation for copying or filling buffers/images.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> cmd_buffer->state.flush_bits |= dst_flush_bits;
>  }
>
> @@ -4292,6 +4302,11 @@ static void write_event(struct radv_cmd_buffer *cmd_buffer,
> VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT |
> VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
>
> +   /* Make sure CP DMA is idle because the driver might have performed a
> +* DMA operation for copying or filling buffers/images.
> +*/
> +   si_cp_dma_wait_for_idle(cmd_buffer);
> +
> /* TODO: Emit EOS events for syncing PS/CS stages. */
>
> if (!(stageMask & ~top_of_pipe_flags)) {
> diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
> index 4e4b3a6037..2400de49a2 100644
> --- a/src/amd/vulkan/radv_private.h
> +++ b/src/amd/vulkan/radv_private.h
> @@ -979,6 +979,9 @@ struct radv_cmd_state {
> uint32_t last_num_instances;
> uint32_t last_first_instance;
> uint32_t last_vertex_offset;
> +
> +   /* Whether CP DMA is busy/idle. */
> +   bool dma_is_busy;
>  };
>
>  struct radv_cmd_pool {
> @@ -1091,6 +1094,8 @@ void si_cp_dma_prefetch(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
>  unsigned size);
>  void si_cp_dma_clear_buffer(struct radv_cmd_buffer *cmd_buffer, uint64_t va,
> uint64_t size, unsigned value);
> +void si_cp_dma_wait_for_idle(struct radv_cmd_buffer *cmd_buffer);
> +
>  void radv_set_db_count_control(struct radv_cmd_buffer *cmd_buffer);
>  bool
>  radv_cmd_buffer_upload_alloc(struct radv_cmd_buffer *cmd_buffer,
> diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
> index 454fd8c39c..6d566a918d 100644
> --- a/src/amd/vulkan/si_cmd_buffer.c
> +++ b/src/amd/vulkan/si_cmd_buffer.c
> @@ -1040,7 +1040,6 @@ static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
> struct radeon_cmdbuf *cs = cmd_buffer->cs;
> uint32_t header = 0, command = 0;
>
> -   assert(size);
> assert(size <= cp_dma_max_byte_count(cmd_buffer));
>
> radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 9);
> @@ -1099,9 +1098,14 @@ static void si_emit_cp_dma(struct radv_cmd_buffer *cmd_buffer,
>  * indices. If we wanted to execute CP DMA in PFP, this packet
>  * should precede it.
>  */
> -   if ((flags & CP_DMA_SYNC) && cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
> -   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, cmd_buffer->state.predicating));
> -   radeon_emit(cs, 0);
> +   if (flags & CP_DMA_SYNC) {
> +   if (cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
> +   radeon_emit(cs, PKT3(PKT3_PFP_SYNC_ME, 0, cmd_buffer->state.predicating));
> +   radeon_emit(cs, 0);
> +   }
> +
> +   /* CP will see the sync flag and wait for all DMAs to complete. */
> +   cmd_buffer->state.dma_is_busy = false;
> }
>
> if (unlikely(cmd_buffer->device->trace_bo))
> @@ -1165,6 +1169,8 @@ void si_cp_dma_buffer_copy(struct radv_cmd_buffer *cmd_buffer,
> uint64_t main_src_va, main_dest_va;
> uint64_t skipped_size = 0, realign_size = 0;
>
> +   /* Assume that we are not going to sync after the last DMA 
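
The quoted patch is truncated at this point, so the body of si_cp_dma_wait_for_idle() is not shown. The visible hunks (the new dma_is_busy field, the flag being cleared when CP_DMA_SYNC is set, and the removal of assert(size)) suggest a scheme along the following lines. This is only a sketch under those assumptions: every `sketch_*` name, the flag value, and the zero-size "dummy" DMA are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CP_DMA_SYNC (1u << 0) /* hypothetical flag value */

struct sketch_cmd_state {
    bool dma_is_busy;   /* mirrors the field the patch adds */
    unsigned packets;   /* number of DMA packets emitted (illustration) */
    uint64_t last_size; /* size of the last emitted DMA (illustration) */
};

/* Stand-in for packet emission: a DMA carrying the sync flag makes the
 * CP wait for all outstanding DMAs, so the state becomes idle. */
static void sketch_emit_cp_dma(struct sketch_cmd_state *s, uint64_t size,
                               unsigned flags)
{
    s->packets++;
    s->last_size = size;
    if (flags & SKETCH_CP_DMA_SYNC)
        s->dma_is_busy = false;
}

/* A copy pessimistically marks CP DMA busy before emitting, since the
 * last packet of a copy may not carry the sync flag. */
static void sketch_copy(struct sketch_cmd_state *s, uint64_t size)
{
    s->dma_is_busy = true;
    sketch_emit_cp_dma(s, size, 0);
}

/* Only emit a wait when a DMA may still be in flight. A zero-size DMA
 * with the sync flag is assumed here as the wait mechanism. */
static void sketch_wait_for_idle(struct sketch_cmd_state *s)
{
    if (!s->dma_is_busy)
        return;
    sketch_emit_cp_dma(s, 0, SKETCH_CP_DMA_SYNC);
}
```

In this model, a wait on an idle state emits nothing, which is why the extra calls in EndCommandBuffer, the barrier path and write_event would only cost a packet when a DMA was actually issued since the last sync.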

[Mesa-dev] How to apply patches to upstream mesa 9.0.3

2018-07-11 Thread Thella Shyam Kumar
Hi All,

We are using an x86-based platform and need changes in Mesa 9.0.3, which
ships with AOSP 7.1.2r36.
If we want these changes to reach Mesa in AOSP, our understanding is that
they should be upstreamed to freedesktop.org's Mesa 9.0.3, because AOSP
gets the code from there.

So, firstly, will you accept changes to Mesa 9.0.3, given that it is very old now?
If yes, how can I submit the changes for Mesa 9.0.3 here?

Regards,
Shyam

