Re: [Mesa-dev] [ANNOUNCE] mesa 17.3.4

2018-02-17 Thread Mark Janes
Bas Nieuwenhuizen  writes:

> (-mesa-announce + Mark, Dave and James)
>
> Hi Emil,
>
> radv is broken for nearly all commercial games in 17.3.4. The cause is
>
> commit ad764e365beb8a119369b97f5cb95fc7ea8c
> Author: Bas Nieuwenhuizen 
> Date:   Mon Jan 22 09:01:29 2018 +0100
>
> ac/nir: Use instance_rate_inputs per attribute, not per variable.
>
> This did the wrong thing if we had e.g. an array for which only some
> of the attributes use the instance index. Tripped up some new CTS
> tests.
>
> CC: 
> Reviewed-by: Samuel Pitoiset 
> Reviewed-by: Dave Airlie 
> (cherry picked from commit 5a4dc285002e1924dbc8c72d17481a3dbc4c0142)
>
> Conflicts:
> src/amd/common/ac_nir_to_llvm.c
>
> A typo was introduced during the conflict resolution while
> cherrypicking to 17.3.
>
> First things first, can we mitigate this? Would it be possible to get
> a 17.3.5 with this fixed ASAP? This can be fixed by either rolling
> back ad764e365beb8a119369b97f5cb95fc7ea8c or by applying
> https://patchwork.freedesktop.org/patch/204260/ (obviously not applied
> to master as the issue did not occur there).
>
> Secondly, I'd like to talk about process and how to prevent this in
> the future. Bugfix releases are supposed to be stable so downstream
> maintainers don't have to to deal with this kind of stuff happening,
> so I think that breaking radv pretty much completely is particularly
> egregious and we should look at how to prevent this happnening another
> time.
>
> First a short summary of what happened and when (all times in UTC):
>
> 2018-02-09 4:47: Emil sends out the pre-release announcement, git
> branch contains the faulty commit
> 2018-02-13 16:05: James Legg replies to the pre-release announcement:
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/185355.html
> 2018-02-13 16:06: James Legg sends a patch to fix it:
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/185356.html
> 2018-02-13 16:33: Bas reviews the patch.
> 2018-02-15 12:56: Emil releases 17.3.4 without the fix:
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/185691.html
> 2018-02-16 21:09: Someone else replies with a fix to the pre-release
> announcement: 
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/185908.html
> 2018-02-17 ~13:00: Bas notices the attached fix has "mismerge" in the
> name and decides to investigate the issue
>
>
> So I think there have been a couple of issues in communication here:
>
> 1) Essentially none of the communication talked about this being an
> issue on the branch only. This made me underestimate the severity of
> the issue, since the games kept working on master. This also meant no
> manual pings from my side to the release manager for not applying this
> to master first.
> 2) The reply to the pre-release announcement only said there was an
> issue, not that there was a fix sent out, nor any details like
> severity.
> 3) As far as I can tell no action had been taken by the release
> manager. James' reply did not get a response nor was the fix included
> in the release.
>
> My question would be how to improve the communication here. Could you
> elaborate the reason for (3)? Was it because you thought it would have
> to be picked up by the radv developers first? Would it help if I
> replied to James' reply in the announcement to link to the patch?
>
> The remaining issue is mainly about testing. The initial detection of
> this issue by James was already more than the 24-48 hours after the
> pre-release announcement that is recommended between the announcement
> and release, so a faster release would have prevented timely detection
> in the first place. I suspect radv does not get tested at all on
> releases except maybe a build test?
>
> Since manually testing all games on every release is not scalable for
> either the radv developers or the release manager I'm thinking of
> getting the automated tests from the vulkan CTS and crucible running
> before a release. That said I don't think keeping track of what tests
> are supposed to pass is something we should be pushing on the release
> manager, so I suppose we should be setting up our own CI and surfacing
> results of the stable branches to the release manager?

Tracking results is at best a significant investment which attenuates as
the driver becomes more stable and automation improves.  It takes us
perhaps an hour per day on average to track i965 status in CI.  This
investment pays bigger dividends for larger developer teams.  There is
constant demand to expand the coverage of CI.

Tracking results requires developer time when the driver has been
broken, but this cost is less than any other alternative.
Unfortunately, this cost is often borne by whoever runs the CI, and can
become burdensome.

> I'm assuming there is already a 

[Mesa-dev] [PATCH 10/13] radeonsi: allow fewer input SGPRs in 2nd shader of merged shaders

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_shader.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 783d57f..6d0c81d 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6583,23 +6583,27 @@ static void si_build_wrapper_function(struct 
si_shader_context *ctx,
unsigned param_size;
LLVMValueRef arg = NULL;
 
param = LLVMGetParam(parts[part], param_idx);
param_type = LLVMTypeOf(param);
param_size = ac_get_type_size(param_type) / 4;
is_sgpr = ac_is_sgpr_param(param);
 
if (is_sgpr)
lp_add_function_attr(parts[part], param_idx + 
1, LP_FUNC_ATTR_INREG);
+   else if (out_idx < num_out_sgpr) {
+   /* Skip returned SGPRs the current part doesn't
+* declare on the input. */
+   out_idx = num_out_sgpr;
+   }
 
assert(out_idx + param_size <= (is_sgpr ? num_out_sgpr 
: num_out));
-   assert(is_sgpr || out_idx >= num_out_sgpr);
 
if (param_size == 1)
arg = out[out_idx];
else
arg = lp_build_gather_values(>gallivm, 
[out_idx], param_size);
 
if (LLVMTypeOf(arg) != param_type) {
if (LLVMGetTypeKind(param_type) == 
LLVMPointerTypeKind) {
if 
(LLVMGetPointerAddressSpace(param_type) ==
AC_CONST_32BIT_ADDR_SPACE) {
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/13] radeonsi: don't use struct si_descriptors for vertex buffer descriptors

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

VBO descriptor code will change a lot one day.
---
 src/gallium/drivers/radeonsi/si_blit.c|  2 +-
 src/gallium/drivers/radeonsi/si_cp_dma.c  |  5 ++-
 src/gallium/drivers/radeonsi/si_debug.c   | 14 ++--
 src/gallium/drivers/radeonsi/si_descriptors.c | 50 +--
 src/gallium/drivers/radeonsi/si_hw_context.c  |  2 +-
 src/gallium/drivers/radeonsi/si_pipe.h|  6 +++-
 6 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_blit.c 
b/src/gallium/drivers/radeonsi/si_blit.c
index 370ce04..f1c4f6d 100644
--- a/src/gallium/drivers/radeonsi/si_blit.c
+++ b/src/gallium/drivers/radeonsi/si_blit.c
@@ -79,21 +79,21 @@ void si_blitter_begin(struct pipe_context *ctx, enum 
si_blitter_op op)
 
 void si_blitter_end(struct pipe_context *ctx)
 {
struct si_context *sctx = (struct si_context *)ctx;
 
sctx->b.render_cond_force_off = false;
 
/* Restore shader pointers because the VS blit shader changed all
 * non-global VS user SGPRs. */
sctx->shader_pointers_dirty |= SI_DESCS_SHADER_MASK(VERTEX);
-   sctx->vertex_buffer_pointer_dirty = true;
+   sctx->vertex_buffer_pointer_dirty = sctx->vb_descriptors_buffer != NULL;
si_mark_atom_dirty(sctx, >shader_pointers.atom);
 }
 
 static unsigned u_max_sample(struct pipe_resource *r)
 {
return r->nr_samples ? r->nr_samples - 1 : 0;
 }
 
 static unsigned
 si_blit_dbcb_copy(struct si_context *sctx,
diff --git a/src/gallium/drivers/radeonsi/si_cp_dma.c 
b/src/gallium/drivers/radeonsi/si_cp_dma.c
index b665926..15bd305 100644
--- a/src/gallium/drivers/radeonsi/si_cp_dma.c
+++ b/src/gallium/drivers/radeonsi/si_cp_dma.c
@@ -509,23 +509,22 @@ static void cik_prefetch_shader_async(struct si_context 
*sctx,
assert(state->nbo == 1);
 
cik_prefetch_TC_L2_async(sctx, bo, 0, bo->width0);
 }
 
 static void cik_prefetch_VBO_descriptors(struct si_context *sctx)
 {
if (!sctx->vertex_elements)
return;
 
-   cik_prefetch_TC_L2_async(sctx, >vertex_buffers.buffer->b.b,
-sctx->vertex_buffers.gpu_address -
-sctx->vertex_buffers.buffer->gpu_address,
+   cik_prefetch_TC_L2_async(sctx, >vb_descriptors_buffer->b.b,
+sctx->vb_descriptors_offset,
 sctx->vertex_elements->desc_list_byte_size);
 }
 
 void cik_emit_prefetch_L2(struct si_context *sctx)
 {
/* Prefetch shaders and VBO descriptors to TC L2. */
if (sctx->b.chip_class >= GFX9) {
/* Choose the right spot for the VBO prefetch. */
if (sctx->tes_shader.cso) {
if (sctx->prefetch_L2_mask & SI_PREFETCH_HS)
diff --git a/src/gallium/drivers/radeonsi/si_debug.c 
b/src/gallium/drivers/radeonsi/si_debug.c
index 385ce39..1f25f4e 100644
--- a/src/gallium/drivers/radeonsi/si_debug.c
+++ b/src/gallium/drivers/radeonsi/si_debug.c
@@ -733,24 +733,34 @@ static void si_dump_descriptors(struct si_context *sctx,
enabled_constbuf = 
sctx->const_and_shader_buffers[processor].enabled_mask >>
   SI_NUM_SHADER_BUFFERS;
enabled_shaderbuf = 
sctx->const_and_shader_buffers[processor].enabled_mask &
u_bit_consecutive(0, SI_NUM_SHADER_BUFFERS);
enabled_shaderbuf = util_bitreverse(enabled_shaderbuf) >>
(32 - SI_NUM_SHADER_BUFFERS);
enabled_samplers = sctx->samplers[processor].enabled_mask;
enabled_images = sctx->images[processor].enabled_mask;
}
 
-   if (processor == PIPE_SHADER_VERTEX) {
+   if (processor == PIPE_SHADER_VERTEX &&
+   sctx->vb_descriptors_buffer &&
+   sctx->vb_descriptors_gpu_list &&
+   sctx->vertex_elements) {
assert(info); /* only CS may not have an info struct */
+   struct si_descriptors desc = {};
 
-   si_dump_descriptor_list(sctx->screen, >vertex_buffers, 
name,
+   desc.buffer = sctx->vb_descriptors_buffer;
+   desc.list = sctx->vb_descriptors_gpu_list;
+   desc.gpu_list = sctx->vb_descriptors_gpu_list;
+   desc.element_dw_size = 4;
+   desc.num_active_slots = 
sctx->vertex_elements->desc_list_byte_size / 16;
+
+   si_dump_descriptor_list(sctx->screen, , name,
" - Vertex buffer", 4, info->num_inputs,
si_identity, log);
}
 
si_dump_descriptor_list(sctx->screen,

[SI_SHADER_DESCS_CONST_AND_SHADER_BUFFERS],
name, " - Constant buffer", 4,
util_last_bit(enabled_constbuf),

[Mesa-dev] [PATCH 13/13] radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointers

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

The effect of the last 13 commits on user SGPR counts:

64-bit pointers:
TCS:14 -> 12
Merged VS-TCS: 24 -> 20
Merged VS-GS:  18 -> 16
Merged TES-GS: 18 -> 14

32-bit pointers:
TCS:10 -> 8
Merged VS-TCS: 16 -> 12
Merged VS-GS:  11 -> 9
Merged TES-GS: 11 -> 6
---
 src/gallium/drivers/radeonsi/si_descriptors.c   |  2 +-
 src/gallium/drivers/radeonsi/si_shader.c| 30 +++--
 src/gallium/drivers/radeonsi/si_shader.h|  9 +++-
 src/gallium/drivers/radeonsi/si_state_shaders.c |  5 +++--
 4 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 7fdac23..a4dae44 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -2190,21 +2190,21 @@ void si_emit_graphics_shader_pointers(struct si_context 
*sctx,
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
 
/* Find the location of the VB descriptor pointer. */
/* TODO: In the future, the pointer will be packed in unused
 *   bits of the first 2 VB descriptors. */
unsigned sh_dw_offset = SI_VS_NUM_USER_SGPR;
if (sctx->b.chip_class >= GFX9) {
if (sctx->tes_shader.cso)
sh_dw_offset = GFX9_TCS_NUM_USER_SGPR;
else if (sctx->gs_shader.cso)
-   sh_dw_offset = GFX9_GS_NUM_USER_SGPR;
+   sh_dw_offset = GFX9_VSGS_NUM_USER_SGPR;
}
 
unsigned sh_offset = sh_base[PIPE_SHADER_VERTEX] + sh_dw_offset 
* 4;
si_emit_shader_pointer_head(cs, sh_offset, 1);
si_emit_shader_pointer_body(sctx->screen, cs,

sctx->vb_descriptors_buffer->gpu_address +
sctx->vb_descriptors_offset);
sctx->vertex_buffer_pointer_dirty = false;
}
 
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index dfbb1f2..7b37765 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -3413,21 +3413,26 @@ static void si_set_es_return_value_for_gs(struct 
si_shader_context *ctx)
  8 + SI_SGPR_RW_BUFFERS);
ret = si_insert_input_ptr(ctx, ret,
  ctx->param_bindless_samplers_and_images,
  8 + SI_SGPR_BINDLESS_SAMPLERS_AND_IMAGES);
 
 #if !HAVE_32BIT_POINTERS
ret = si_insert_input_ptr(ctx, ret, ctx->param_vs_state_bits + 1,
  8 + GFX9_SGPR_2ND_SAMPLERS_AND_IMAGES);
 #endif
 
-   unsigned vgpr = 8 + GFX9_GS_NUM_USER_SGPR;
+   unsigned vgpr;
+   if (ctx->type == PIPE_SHADER_VERTEX)
+   vgpr = 8 + GFX9_VSGS_NUM_USER_SGPR;
+   else
+   vgpr = 8 + GFX9_TESGS_NUM_USER_SGPR;
+
for (unsigned i = 0; i < 5; i++) {
unsigned param = ctx->param_gs_vtx01_offset + i;
ret = si_insert_input_ret_float(ctx, ret, param, vgpr++);
}
ctx->return_value = ret;
 }
 
 static void si_llvm_emit_ls_epilogue(struct ac_shader_abi *abi,
 unsigned max_outputs,
 LLVMValueRef *addrs)
@@ -4772,26 +4777,27 @@ static void create_function(struct si_shader_context 
*ctx)
add_arg(, ARG_SGPR, ctx->i32); /* unused 
(SPI_SHADER_PGM_LO/HI_GS << 8) */
add_arg(, ARG_SGPR, ctx->i32); /* unused 
(SPI_SHADER_PGM_LO/HI_GS >> 24) */
 
declare_global_desc_pointers(ctx, );
declare_per_stage_desc_pointers(ctx, ,
(ctx->type == 
PIPE_SHADER_VERTEX ||
 ctx->type == 
PIPE_SHADER_TESS_EVAL));
if (ctx->type == PIPE_SHADER_VERTEX) {
declare_vs_specific_input_sgprs(ctx, );
} else {
-   /* TESS_EVAL (and also GEOMETRY):
-* Declare as many input SGPRs as the VS has. */
ctx->param_tcs_offchip_layout = add_arg(, 
ARG_SGPR, ctx->i32);
ctx->param_tes_offchip_addr = add_arg(, 
ARG_SGPR, ctx->i32);
-   add_arg(, ARG_SGPR, ctx->i32); /* unused */
-   ctx->param_vs_state_bits = add_arg(, ARG_SGPR, 
ctx->i32); /* unused */
+   if (!HAVE_32BIT_POINTERS) {
+   /* Declare as many input SGPRs as the VS has. */
+   add_arg(, ARG_SGPR, ctx->i32); /* unused 
*/
+   ctx->param_vs_state_bits = 

[Mesa-dev] [PATCH 06/13] radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!
---
 src/gallium/drivers/radeonsi/si_shader.c  | 104 +++---
 src/gallium/drivers/radeonsi/si_shader.h  |   6 +-
 src/gallium/drivers/radeonsi/si_shader_internal.h |   4 +-
 src/gallium/drivers/radeonsi/si_state_draw.c  |   8 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c   |  39 ++--
 5 files changed, 70 insertions(+), 91 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 0bf5228..786b250 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1168,42 +1168,60 @@ static LLVMValueRef lds_load(struct 
lp_build_tgsi_context *bld_base,
 static void lds_store(struct si_shader_context *ctx,
  unsigned dw_offset_imm, LLVMValueRef dw_addr,
  LLVMValueRef value)
 {
dw_addr = lp_build_add(>bld_base.uint_bld, dw_addr,
LLVMConstInt(ctx->i32, dw_offset_imm, 0));
 
ac_lds_store(>ac, dw_addr, value);
 }
 
-static LLVMValueRef desc_from_addr_base64k(struct si_shader_context *ctx,
- unsigned param)
+enum si_tess_ring {
+   TCS_OFFCHIP_RING,
+   TCS_FACTOR_RING,
+   TES_OFFCHIP_RING,
+};
+
+static LLVMValueRef get_tess_ring_descriptor(struct si_shader_context *ctx,
+enum si_tess_ring ring)
 {
LLVMBuilderRef builder = ctx->ac.builder;
-
+   unsigned param = ring == TES_OFFCHIP_RING ? ctx->param_tes_offchip_addr 
:
+   
ctx->param_tcs_out_lds_layout;
LLVMValueRef addr = LLVMGetParam(ctx->main_fn, param);
-   addr = LLVMBuildZExt(builder, addr, ctx->i64, "");
-   addr = LLVMBuildShl(builder, addr, LLVMConstInt(ctx->i64, 16, 0), "");
 
-   uint64_t desc2 = 0x;
-   uint64_t desc3 = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
-S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
-S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
-S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
-S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
-S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32);
-   LLVMValueRef hi = LLVMConstInt(ctx->i64, desc2 | (desc3 << 32), 0);
+   /* TCS only receives high 13 bits of the address. */
+   if (ring == TCS_OFFCHIP_RING || ring == TCS_FACTOR_RING) {
+   addr = LLVMBuildAnd(builder, addr,
+   LLVMConstInt(ctx->i32, 0xfff8, 0), "");
+   }
+
+   if (ring == TCS_FACTOR_RING) {
+   unsigned tf_offset = ctx->screen->tess_offchip_ring_size;
+   addr = LLVMBuildAdd(builder, addr,
+   LLVMConstInt(ctx->i32, tf_offset, 0), "");
+   }
 
-   LLVMValueRef desc = LLVMGetUndef(LLVMVectorType(ctx->i64, 2));
-   desc = LLVMBuildInsertElement(builder, desc, addr, ctx->i32_0, "");
-   desc = LLVMBuildInsertElement(builder, desc, hi, ctx->i32_1, "");
-   return LLVMBuildBitCast(builder, desc, ctx->v4i32, "");
+   LLVMValueRef desc[4];
+   desc[0] = addr;
+   desc[1] = LLVMConstInt(ctx->i32,
+  
S_008F04_BASE_ADDRESS_HI(ctx->screen->info.address32_hi), 0);
+   desc[2] = LLVMConstInt(ctx->i32, 0x, 0);
+   desc[3] = LLVMConstInt(ctx->i32,
+  S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+  S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+  S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+  S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+  
S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+  
S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32), 0);
+
+   return ac_build_gather_values(>ac, desc, 4);
 }
 
 static LLVMValueRef fetch_input_tcs(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef dw_addr, stride;
 
@@ -1299,21 +1317,21 @@ static LLVMValueRef fetch_output_tcs(
 }
 
 static LLVMValueRef fetch_input_tes(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
LLVMValueRef buffer, base, addr;
 
-   buffer = desc_from_addr_base64k(ctx, 
ctx->param_tcs_offchip_addr_base64k);
+   buffer = get_tess_ring_descriptor(ctx, TES_OFFCHIP_RING);
 

[Mesa-dev] [PATCH 11/13] radeonsi: set correct num_input_sgprs for VS prolog in merged shaders

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

We need to take num_input_sgprs from VS, not the second shader.
No apps suffered from this.
---
 src/gallium/drivers/radeonsi/si_shader.c | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 6d0c81d..445d994 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6733,47 +6733,47 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
/* TCS main part */
parts[2] = ctx.main_fn;
 
/* TCS epilog */
union si_shader_part_key tcs_epilog_key;
memset(_epilog_key, 0, sizeof(tcs_epilog_key));
tcs_epilog_key.tcs_epilog.states = 
shader->key.part.tcs.epilog;
si_build_tcs_epilog_function(, _epilog_key);
parts[3] = ctx.main_fn;
 
-   /* VS prolog */
-   if (vs_needs_prolog) {
-   union si_shader_part_key vs_prolog_key;
-   si_get_vs_prolog_key(>info,
-
shader->info.num_input_sgprs,
-
>key.part.tcs.ls_prolog,
-shader, _prolog_key);
-   vs_prolog_key.vs_prolog.is_monolithic = true;
-   si_build_vs_prolog_function(, 
_prolog_key);
-   parts[0] = ctx.main_fn;
-   }
-
/* VS as LS main part */
struct si_shader shader_ls = {};
shader_ls.selector = ls;
shader_ls.key.as_ls = 1;
shader_ls.key.mono = shader->key.mono;
shader_ls.key.opt = shader->key.opt;
si_llvm_context_set_tgsi(, _ls);
 
if (!si_compile_tgsi_main(, true)) {
si_llvm_dispose();
return -1;
}
shader->info.uses_instanceid |= 
ls->info.uses_instanceid;
parts[1] = ctx.main_fn;
 
+   /* LS prolog */
+   if (vs_needs_prolog) {
+   union si_shader_part_key vs_prolog_key;
+   si_get_vs_prolog_key(>info,
+
shader_ls.info.num_input_sgprs,
+
>key.part.tcs.ls_prolog,
+shader, _prolog_key);
+   vs_prolog_key.vs_prolog.is_monolithic = true;
+   si_build_vs_prolog_function(, 
_prolog_key);
+   parts[0] = ctx.main_fn;
+   }
+
/* Reset the shader context. */
ctx.shader = shader;
ctx.type = PIPE_SHADER_TESS_CTRL;
 
si_build_wrapper_function(,
  parts + !vs_needs_prolog,
  4 - !vs_needs_prolog, 0,
  vs_needs_prolog ? 2 : 1);
} else {
LLVMValueRef parts[2];
@@ -6797,47 +6797,47 @@ int si_compile_tgsi_shader(struct si_screen *sscreen,
LLVMValueRef gs_main = ctx.main_fn;
 
/* GS prolog */
union si_shader_part_key gs_prolog_key;
memset(_prolog_key, 0, sizeof(gs_prolog_key));
gs_prolog_key.gs_prolog.states = 
shader->key.part.gs.prolog;
gs_prolog_key.gs_prolog.is_monolithic = true;
si_build_gs_prolog_function(, _prolog_key);
gs_prolog = ctx.main_fn;
 
-   /* ES prolog */
-   if (es->vs_needs_prolog) {
-   union si_shader_part_key vs_prolog_key;
-   si_get_vs_prolog_key(>info,
-
shader->info.num_input_sgprs,
-
>key.part.gs.vs_prolog,
-shader, _prolog_key);
-   vs_prolog_key.vs_prolog.is_monolithic = true;
-   si_build_vs_prolog_function(, 
_prolog_key);
-   es_prolog = ctx.main_fn;
-   }
-
/* ES main part */
  

[Mesa-dev] [PATCH 08/13] radeonsi: remove si_descriptors parameter from emit_shader_pointer functions

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_descriptors.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 5083027..76f2a3e 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -2053,89 +2053,90 @@ void si_shader_change_notify(struct si_context *sctx)
  
R_00B330_SPI_SHADER_USER_DATA_ES_0);
else
si_set_user_data_base(sctx, PIPE_SHADER_TESS_EVAL,
  
R_00B130_SPI_SHADER_USER_DATA_VS_0);
} else {
si_set_user_data_base(sctx, PIPE_SHADER_TESS_EVAL, 0);
}
 }
 
 static void si_emit_shader_pointer_head(struct radeon_winsys_cs *cs,
-   struct si_descriptors *desc,
-   unsigned sh_base,
+   unsigned sh_offset,
unsigned pointer_count)
 {
radeon_emit(cs, PKT3(PKT3_SET_SH_REG, pointer_count * 
(HAVE_32BIT_POINTERS ? 1 : 2), 0));
-   radeon_emit(cs, (sh_base + desc->shader_userdata_offset - 
SI_SH_REG_OFFSET) >> 2);
+   radeon_emit(cs, (sh_offset - SI_SH_REG_OFFSET) >> 2);
 }
 
 static void si_emit_shader_pointer_body(struct si_screen *sscreen,
struct radeon_winsys_cs *cs,
-   struct si_descriptors *desc)
+   uint64_t va)
 {
-   uint64_t va = desc->gpu_address;
-
radeon_emit(cs, va);
 
if (HAVE_32BIT_POINTERS)
assert((va >> 32) == sscreen->info.address32_hi);
else
radeon_emit(cs, va >> 32);
 }
 
 static void si_emit_shader_pointer(struct si_context *sctx,
   struct si_descriptors *desc,
   unsigned sh_base)
 {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
+   unsigned sh_offset = sh_base + desc->shader_userdata_offset;
 
-   si_emit_shader_pointer_head(cs, desc, sh_base, 1);
-   si_emit_shader_pointer_body(sctx->screen, cs, desc);
+   si_emit_shader_pointer_head(cs, sh_offset, 1);
+   si_emit_shader_pointer_body(sctx->screen, cs, desc->gpu_address);
 }
 
 static void si_emit_consecutive_shader_pointers(struct si_context *sctx,
unsigned pointer_mask,
unsigned sh_base)
 {
if (!sh_base)
return;
 
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
unsigned mask = sctx->shader_pointers_dirty & pointer_mask;
 
while (mask) {
int start, count;
u_bit_scan_consecutive_range(, , );
 
struct si_descriptors *descs = >descriptors[start];
+   unsigned sh_offset = sh_base + descs->shader_userdata_offset;
 
-   si_emit_shader_pointer_head(cs, descs, sh_base, count);
+   si_emit_shader_pointer_head(cs, sh_offset, count);
for (int i = 0; i < count; i++)
-   si_emit_shader_pointer_body(sctx->screen, cs, descs + 
i);
+   si_emit_shader_pointer_body(sctx->screen, cs,
+   descs[i].gpu_address);
}
 }
 
 static void si_emit_disjoint_shader_pointers(struct si_context *sctx,
 unsigned pointer_mask,
 unsigned sh_base)
 {
if (!sh_base)
return;
 
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
unsigned mask = sctx->shader_pointers_dirty & pointer_mask;
 
while (mask) {
struct si_descriptors *descs = 
>descriptors[u_bit_scan()];
+   unsigned sh_offset = sh_base + descs->shader_userdata_offset;
 
-   si_emit_shader_pointer_head(cs, descs, sh_base, 1);
-   si_emit_shader_pointer_body(sctx->screen, cs, descs);
+   si_emit_shader_pointer_head(cs, sh_offset, 1);
+   si_emit_shader_pointer_body(sctx->screen, cs, 
descs->gpu_address);
}
 }
 
 static void si_emit_global_shader_pointers(struct si_context *sctx,
   struct si_descriptors *descs)
 {
if (sctx->b.chip_class == GFX9) {
/* Broadcast it to all shader stages. */
si_emit_shader_pointer(sctx, descs,
   R_00B530_SPI_SHADER_USER_DATA_COMMON_0);
-- 
2.7.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 02/13] radeonsi: move tessellation ring info into si_screen

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.c  | 38 -
 src/gallium/drivers/radeonsi/si_pipe.h  |  3 ++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 56 ++---
 3 files changed, 52 insertions(+), 45 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index f07ec50..83133cb 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -742,25 +742,59 @@ struct pipe_screen *radeonsi_screen_create(struct 
radeon_winsys *ws,
   si_destroy_shader_cache(sscreen);
   FREE(sscreen);
   return NULL;
}
 
si_handle_env_var_force_family(sscreen);
 
if (!debug_get_bool_option("RADEON_DISABLE_PERFCOUNTERS", false))
si_init_perfcounters(sscreen);
 
+   /* Determine tessellation ring info. */
+   bool double_offchip_buffers = sscreen->info.chip_class >= CIK &&
+ sscreen->info.family != CHIP_CARRIZO &&
+ sscreen->info.family != CHIP_STONEY;
+   /* This must be one less than the maximum number due to a hw limitation.
+* Various hardware bugs in SI, CIK, and GFX9 need this.
+*/
+   unsigned max_offchip_buffers_per_se = double_offchip_buffers ? 127 : 63;
+   unsigned max_offchip_buffers = max_offchip_buffers_per_se *
+  sscreen->info.max_se;
+   unsigned offchip_granularity;
+
/* Hawaii has a bug with offchip buffers > 256 that can be worked
 * around by setting 4K granularity.
 */
-   sscreen->tess_offchip_block_dw_size =
-   sscreen->info.family == CHIP_HAWAII ? 4096 : 8192;
+   if (sscreen->info.family == CHIP_HAWAII) {
+   sscreen->tess_offchip_block_dw_size = 4096;
+   offchip_granularity = V_03093C_X_4K_DWORDS;
+   } else {
+   sscreen->tess_offchip_block_dw_size = 8192;
+   offchip_granularity = V_03093C_X_8K_DWORDS;
+   }
+
+   sscreen->tess_factor_ring_size = 32768 * sscreen->info.max_se;
+   assert(((sscreen->tess_factor_ring_size / 4) & C_030938_SIZE) == 0);
+   sscreen->tess_offchip_ring_size = max_offchip_buffers *
+ sscreen->tess_offchip_block_dw_size * 
4;
+
+   if (sscreen->info.chip_class >= CIK) {
+   if (sscreen->info.chip_class >= VI)
+   --max_offchip_buffers;
+   sscreen->vgt_hs_offchip_param =
+   S_03093C_OFFCHIP_BUFFERING(max_offchip_buffers) |
+   S_03093C_OFFCHIP_GRANULARITY(offchip_granularity);
+   } else {
+   assert(offchip_granularity == V_03093C_X_8K_DWORDS);
+   sscreen->vgt_hs_offchip_param =
+   S_0089B0_OFFCHIP_BUFFERING(max_offchip_buffers);
+   }
 
/* The mere presense of CLEAR_STATE in the IB causes random GPU hangs
 * on SI. */
sscreen->has_clear_state = sscreen->info.chip_class >= CIK;
 
sscreen->has_distributed_tess =
sscreen->info.chip_class >= VI &&
sscreen->info.max_se >= 2;
 
sscreen->has_draw_indirect_multi =
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 3a959f9..7b23e8c 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -97,20 +97,23 @@ struct si_screen {
struct pipe_screen  b;
struct radeon_winsys*ws;
struct disk_cache   *disk_shader_cache;
 
struct radeon_info  info;
uint64_tdebug_flags;
charrenderer_string[100];
 
unsignedgs_table_depth;
unsignedtess_offchip_block_dw_size;
+   unsignedtess_offchip_ring_size;
+   unsignedtess_factor_ring_size;
+   unsignedvgt_hs_offchip_param;
boolhas_clear_state;
boolhas_distributed_tess;
boolhas_draw_indirect_multi;
boolhas_out_of_order_rast;
boolassume_no_z_fights;
boolcommutative_blend_add;
boolclear_db_cache_before_clear;
boolhas_msaa_sample_loc_bug;
boolhas_ls_vgpr_init_bug;
booldpbb_allowed;
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 2cd48f5..9c505ff 

[Mesa-dev] [PATCH 07/13] radeonsi: preload the tess offchip ring in TES

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

so that it's not done multiple times in branches
---
 src/gallium/drivers/radeonsi/si_shader.c  | 21 +
 src/gallium/drivers/radeonsi/si_shader_internal.h |  1 +
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 786b250..783d57f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1315,51 +1315,47 @@ static LLVMValueRef fetch_output_tcs(
 
return lds_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle, 
dw_addr);
 }
 
 static LLVMValueRef fetch_input_tes(
struct lp_build_tgsi_context *bld_base,
const struct tgsi_full_src_register *reg,
enum tgsi_opcode_type type, unsigned swizzle)
 {
struct si_shader_context *ctx = si_shader_context(bld_base);
-   LLVMValueRef buffer, base, addr;
-
-   buffer = get_tess_ring_descriptor(ctx, TES_OFFCHIP_RING);
+   LLVMValueRef base, addr;
 
base = LLVMGetParam(ctx->main_fn, ctx->param_tcs_offchip_offset);
addr = get_tcs_tes_buffer_address_from_reg(ctx, NULL, reg);
 
return buffer_load(bld_base, tgsi2llvmtype(bld_base, type), swizzle,
-  buffer, base, addr, true);
+  ctx->tess_offchip_ring, base, addr, true);
 }
 
 LLVMValueRef si_nir_load_input_tes(struct ac_shader_abi *abi,
   LLVMValueRef vertex_index,
   LLVMValueRef param_index,
   unsigned const_index,
   unsigned location,
   unsigned driver_location,
   unsigned component,
   unsigned num_components,
   bool is_patch,
   bool is_compact,
   bool load_input)
 {
struct si_shader_context *ctx = si_shader_context_from_abi(abi);
struct tgsi_shader_info *info = >shader->selector->info;
-   LLVMValueRef buffer, base, addr;
+   LLVMValueRef base, addr;
 
driver_location = driver_location / 4;
 
-   buffer = get_tess_ring_descriptor(ctx, TES_OFFCHIP_RING);
-
base = LLVMGetParam(ctx->main_fn, ctx->param_tcs_offchip_offset);
 
if (param_index) {
/* Add the constant index to the indirect index */
param_index = LLVMBuildAdd(ctx->ac.builder, param_index,
   LLVMConstInt(ctx->i32, const_index, 
0), "");
} else {
param_index = LLVMConstInt(ctx->i32, const_index, 0);
}
 
@@ -1369,21 +1365,22 @@ LLVMValueRef si_nir_load_input_tes(struct ac_shader_abi 
*abi,
   
info->input_semantic_index,
   is_patch);
 
/* TODO: This will generate rather ordinary llvm code, although it
 * should be easy for the optimiser to fix up. In future we might want
 * to refactor buffer_load(), but for now this maximises code sharing
 * between the NIR and TGSI backends.
 */
LLVMValueRef value[4];
for (unsigned i = component; i < num_components + component; i++) {
-   value[i] = buffer_load(>bld_base, ctx->i32, i, buffer, 
base, addr, true);
+   value[i] = buffer_load(>bld_base, ctx->i32, i,
+  ctx->tess_offchip_ring, base, addr, 
true);
}
 
return ac_build_varying_gather_values(>ac, value, num_components, 
component);
 }
 
 static void store_output_tcs(struct lp_build_tgsi_context *bld_base,
 const struct tgsi_full_instruction *inst,
 const struct tgsi_opcode_info *info,
 unsigned index,
 LLVMValueRef dst[4])
@@ -1983,32 +1980,30 @@ static LLVMValueRef si_load_tess_coord(struct 
ac_shader_abi *abi)
PIPE_PRIM_TRIANGLES)
coord[2] = lp_build_sub(bld, ctx->ac.f32_1,
lp_build_add(bld, coord[0], coord[1]));
 
return lp_build_gather_values(>gallivm, coord, 4);
 }
 
 static LLVMValueRef load_tess_level(struct si_shader_context *ctx,
unsigned semantic_name)
 {
-   LLVMValueRef buffer, base, addr;
+   LLVMValueRef base, addr;
 
int param = si_shader_io_get_unique_index_patch(semantic_name, 0);
 
-   buffer = get_tess_ring_descriptor(ctx, TES_OFFCHIP_RING);
-
base = LLVMGetParam(ctx->main_fn, ctx->param_tcs_offchip_offset);
addr = get_tcs_tes_buffer_address(ctx, get_rel_patch_id(ctx), NULL,
  LLVMConstInt(ctx->i32, 

[Mesa-dev] [PATCH 03/13] radeonsi: put both tessellation rings into 1 buffer

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

---
 src/gallium/drivers/radeonsi/si_pipe.c  |  3 +-
 src/gallium/drivers/radeonsi/si_pipe.h  |  3 +-
 src/gallium/drivers/radeonsi/si_state_draw.c|  2 +-
 src/gallium/drivers/radeonsi/si_state_shaders.c | 39 ++---
 4 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_pipe.c 
b/src/gallium/drivers/radeonsi/si_pipe.c
index 83133cb..6deabba 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -110,22 +110,21 @@ static void si_destroy_context(struct pipe_context 
*context)
 * properly.
 */
struct pipe_framebuffer_state fb = {};
if (context->set_framebuffer_state)
context->set_framebuffer_state(context, );
 
si_release_all_descriptors(sctx);
 
pipe_resource_reference(>esgs_ring, NULL);
pipe_resource_reference(>gsvs_ring, NULL);
-   pipe_resource_reference(>tf_ring, NULL);
-   pipe_resource_reference(>tess_offchip_ring, NULL);
+   pipe_resource_reference(>tess_rings, NULL);
pipe_resource_reference(>null_const_buf.buffer, NULL);
r600_resource_reference(>border_color_buffer, NULL);
free(sctx->border_color_table);
r600_resource_reference(>scratch_buffer, NULL);
r600_resource_reference(>compute_scratch_buffer, NULL);
r600_resource_reference(>wait_mem_scratch, NULL);
 
si_pm4_free_state(sctx, sctx->init_config, ~0);
if (sctx->init_config_gs_rings)
si_pm4_free_state(sctx, sctx->init_config_gs_rings, ~0);
diff --git a/src/gallium/drivers/radeonsi/si_pipe.h 
b/src/gallium/drivers/radeonsi/si_pipe.h
index 7b23e8c..896b640 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.h
+++ b/src/gallium/drivers/radeonsi/si_pipe.h
@@ -505,22 +505,21 @@ struct si_context {
unsignedshader_needs_decompress_mask;
struct si_buffer_resources  rw_buffers;
struct si_buffer_resources  
const_and_shader_buffers[SI_NUM_SHADERS];
struct si_samplers  samplers[SI_NUM_SHADERS];
struct si_imagesimages[SI_NUM_SHADERS];
 
/* other shader resources */
struct pipe_constant_buffer null_const_buf; /* used for 
set_constant_buffer(NULL) on CIK */
struct pipe_resource*esgs_ring;
struct pipe_resource*gsvs_ring;
-   struct pipe_resource*tf_ring;
-   struct pipe_resource*tess_offchip_ring;
+   struct pipe_resource*tess_rings;
union pipe_color_union  *border_color_table; /* in CPU memory, 
any endian */
struct r600_resource*border_color_buffer;
union pipe_color_union  *border_color_map; /* in VRAM (slow 
access), little endian */
unsignedborder_color_count;
unsignednum_vs_blit_sgprs;
uint32_t
vs_blit_sh_data[SI_VS_BLIT_SGPRS_POS_TEXCOORD];
 
/* Vertex and index buffers. */
boolvertex_buffers_dirty;
boolvertex_buffer_pointer_dirty;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index b245a38..3881e3f 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -289,21 +289,21 @@ static void si_emit_derived_tess_state(struct si_context 
*sctx,
R_00B430_SPI_SHADER_USER_DATA_HS_0 + 
GFX6_SGPR_TCS_OFFCHIP_LAYOUT * 4, 4);
radeon_emit(cs, offchip_layout);
radeon_emit(cs, tcs_out_offsets);
radeon_emit(cs, tcs_out_layout);
radeon_emit(cs, tcs_in_layout);
}
 
/* Set userdata SGPRs for TES. */
radeon_set_sh_reg_seq(cs, tes_sh_base + SI_SGPR_TES_OFFCHIP_LAYOUT * 4, 
2);
radeon_emit(cs, offchip_layout);
-   radeon_emit(cs, r600_resource(sctx->tess_offchip_ring)->gpu_address >> 
16);
+   radeon_emit(cs, r600_resource(sctx->tess_rings)->gpu_address >> 16);
 
ls_hs_config = S_028B58_NUM_PATCHES(*num_patches) |
   S_028B58_HS_NUM_INPUT_CP(num_tcs_input_cp) |
   S_028B58_HS_NUM_OUTPUT_CP(num_tcs_output_cp);
 
if (sctx->b.chip_class >= CIK)
radeon_set_context_reg_idx(cs, R_028B58_VGT_LS_HS_CONFIG, 2,
   ls_hs_config);
else
radeon_set_context_reg(cs, R_028B58_VGT_LS_HS_CONFIG,
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c 
b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 9c505ff..57bf622 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -2945,52 +2945,43 @@ static 

[Mesa-dev] [PATCH 00/13] RadeonSI: Reduce user SGPR usage

2018-02-17 Thread Marek Olšák
Hi,

This series has the following effect on user SGPRs:

64-bit pointers:
TCS:14 -> 12
Merged VS-TCS: 24 -> 20
Merged VS-GS:  18 -> 16
Merged TES-GS: 18 -> 14

32-bit pointers:
TCS:10 -> 8
Merged VS-TCS: 16 -> 12
Merged VS-GS:  11 -> 9
Merged TES-GS: 11 -> 6

I tested both monolithic and non-monolithic shaders, and both 64-bit
and 32-bit pointers. (4 combinations)

This series is a prerequisite for VBO descriptors in user SGPRs.

Note that merged LS-HS and ES-GS don't even use s[6:7] input SGPRs
yet. Those only provide 40 bits of scalar data (not 64 bits like
s[0:1]).

Please review.

Thanks,
Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/13] radeonsi: move TCS_OUT_LAYOUT.PatchVerticesIn to lower bits

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

For a later patch.
---
 src/gallium/drivers/radeonsi/si_shader.c  | 2 +-
 src/gallium/drivers/radeonsi/si_shader_internal.h | 2 +-
 src/gallium/drivers/radeonsi/si_state_draw.c  | 7 ---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 1f2338a..cc57ba3 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2005,21 +2005,21 @@ static LLVMValueRef si_load_tess_level(struct 
ac_shader_abi *abi,
}
 
return load_tess_level(ctx, semantic_name);
 
 }
 
 static LLVMValueRef si_load_patch_vertices_in(struct ac_shader_abi *abi)
 {
struct si_shader_context *ctx = si_shader_context_from_abi(abi);
if (ctx->type == PIPE_SHADER_TESS_CTRL)
-   return unpack_param(ctx, ctx->param_tcs_out_lds_layout, 26, 6);
+   return unpack_param(ctx, ctx->param_tcs_out_lds_layout, 13, 6);
else if (ctx->type == PIPE_SHADER_TESS_EVAL)
return get_num_tcs_out_vertices(ctx);
else
unreachable("invalid shader stage for 
TGSI_SEMANTIC_VERTICESIN");
 }
 
 void si_load_system_value(struct si_shader_context *ctx,
  unsigned index,
  const struct tgsi_full_declaration *decl)
 {
diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h 
b/src/gallium/drivers/radeonsi/si_shader_internal.h
index 571df55..a9883d5 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -155,21 +155,21 @@ struct si_shader_context {
/* API TCS */
/* Offsets where TCS outputs and TCS patch outputs live in LDS:
 *   [0:15] = TCS output patch0 offset / 16, max = NUM_PATCHES * 32 * 32
 *   [16:31] = TCS output patch0 offset for per-patch / 16
 * max = (NUM_PATCHES + 1) * 32*32
 */
int param_tcs_out_lds_offsets;
/* Layout of TCS outputs / TES inputs:
 *   [0:12] = stride between output patches in DW, num_outputs * 
num_vertices * 4
 *max = 32*32*4 + 32*4
-*   [26:31] = gl_PatchVerticesIn, max = 32
+*   [13:18] = gl_PatchVerticesIn, max = 32
 */
int param_tcs_out_lds_layout;
int param_tcs_offchip_addr_base64k;
int param_tcs_factor_addr_base64k;
int param_tcs_offchip_offset;
int param_tcs_factor_offset;
 
/* API TES */
int param_tes_u;
int param_tes_v;
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c 
b/src/gallium/drivers/radeonsi/si_state_draw.c
index 06ef84d..b245a38 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -226,21 +226,22 @@ static void si_emit_derived_tess_state(struct si_context 
*sctx,
assert(((output_vertex_size / 4) & ~0xff) == 0);
assert(((input_patch_size / 4) & ~0x1fff) == 0);
assert(((output_patch_size / 4) & ~0x1fff) == 0);
assert(((output_patch0_offset / 16) & ~0x) == 0);
assert(((perpatch_output_offset / 16) & ~0x) == 0);
assert(num_tcs_input_cp <= 32);
assert(num_tcs_output_cp <= 32);
 
tcs_in_layout = S_VS_STATE_LS_OUT_PATCH_SIZE(input_patch_size / 4) |
S_VS_STATE_LS_OUT_VERTEX_SIZE(input_vertex_size / 4);
-   tcs_out_layout = output_patch_size / 4;
+   tcs_out_layout = (output_patch_size / 4) |
+(num_tcs_input_cp << 13);
tcs_out_offsets = (output_patch0_offset / 16) |
  ((perpatch_output_offset / 16) << 16);
offchip_layout = *num_patches |
 (num_tcs_output_cp << 6) |
 (pervertex_output_patch_size * *num_patches << 12);
 
/* Compute the LDS size. */
lds_size = output_patch0_offset + output_patch_size * *num_patches;
 
if (sctx->b.chip_class >= CIK) {
@@ -261,41 +262,41 @@ static void si_emit_derived_tess_state(struct si_context 
*sctx,
S_00B42C_LDS_SIZE(lds_size);
 
radeon_set_sh_reg(cs, R_00B42C_SPI_SHADER_PGM_RSRC2_HS, 
hs_rsrc2);
 
/* Set userdata SGPRs for merged LS-HS. */
radeon_set_sh_reg_seq(cs,
  R_00B430_SPI_SHADER_USER_DATA_LS_0 +
  GFX9_SGPR_TCS_OFFCHIP_LAYOUT * 4, 3);
radeon_emit(cs, offchip_layout);
radeon_emit(cs, tcs_out_offsets);
-   radeon_emit(cs, tcs_out_layout | (num_tcs_input_cp << 26));
+   radeon_emit(cs, tcs_out_layout);
} else {
unsigned ls_rsrc2 = ls_current->config.rsrc2;
 
si_multiwave_lds_size_workaround(sctx->screen, _size);
ls_rsrc2 |= 

[Mesa-dev] [PATCH 12/13] radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR input

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

so that it can be removed and replaced with inline VBO descriptors,
and the pointer can be packed in unused bits of VBO descriptors.
This also removes the pointer from merged TES-GS where it's useless.
---
 src/gallium/drivers/radeonsi/si_descriptors.c   | 14 --
 src/gallium/drivers/radeonsi/si_shader.c| 16 +++-
 src/gallium/drivers/radeonsi/si_shader.h|  9 +++
 src/gallium/drivers/radeonsi/si_state_shaders.c | 34 -
 4 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index f6bc3cf..7fdac23 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -2181,23 +2181,33 @@ void si_emit_graphics_shader_pointers(struct si_context 
*sctx,
 
sh_base[PIPE_SHADER_TESS_CTRL]);
si_emit_disjoint_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(GEOMETRY),
 sh_base[PIPE_SHADER_GEOMETRY]);
}
 
sctx->shader_pointers_dirty &=
~u_bit_consecutive(SI_DESCS_RW_BUFFERS, SI_DESCS_FIRST_COMPUTE);
 
if (sctx->vertex_buffer_pointer_dirty) {
struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
-   unsigned sh_offset = sh_base[PIPE_SHADER_VERTEX] +
-SI_SGPR_VERTEX_BUFFERS * 4;
 
+   /* Find the location of the VB descriptor pointer. */
+   /* TODO: In the future, the pointer will be packed in unused
+*   bits of the first 2 VB descriptors. */
+   unsigned sh_dw_offset = SI_VS_NUM_USER_SGPR;
+   if (sctx->b.chip_class >= GFX9) {
+   if (sctx->tes_shader.cso)
+   sh_dw_offset = GFX9_TCS_NUM_USER_SGPR;
+   else if (sctx->gs_shader.cso)
+   sh_dw_offset = GFX9_GS_NUM_USER_SGPR;
+   }
+
+   unsigned sh_offset = sh_base[PIPE_SHADER_VERTEX] + sh_dw_offset 
* 4;
si_emit_shader_pointer_head(cs, sh_offset, 1);
si_emit_shader_pointer_body(sctx->screen, cs,

sctx->vb_descriptors_buffer->gpu_address +
sctx->vb_descriptors_offset);
sctx->vertex_buffer_pointer_dirty = false;
}
 
if (sctx->graphics_bindless_pointer_dirty) {
si_emit_global_shader_pointers(sctx,
   >bindless_descriptors);
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 445d994..dfbb1f2 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -4541,22 +4541,20 @@ static void declare_global_desc_pointers(struct 
si_shader_context *ctx,
 {
ctx->param_rw_buffers = add_arg(fninfo, ARG_SGPR,
ac_array_in_const32_addr_space(ctx->v4i32));
ctx->param_bindless_samplers_and_images = add_arg(fninfo, ARG_SGPR,
ac_array_in_const32_addr_space(ctx->v8i32));
 }
 
 static void declare_vs_specific_input_sgprs(struct si_shader_context *ctx,
struct si_function_info *fninfo)
 {
-   ctx->param_vertex_buffers = add_arg(fninfo, ARG_SGPR,
-   ac_array_in_const32_addr_space(ctx->v4i32));
add_arg_assign(fninfo, ARG_SGPR, ctx->i32, >abi.base_vertex);
add_arg_assign(fninfo, ARG_SGPR, ctx->i32, >abi.start_instance);
add_arg_assign(fninfo, ARG_SGPR, ctx->i32, >abi.draw_id);
ctx->param_vs_state_bits = add_arg(fninfo, ARG_SGPR, ctx->i32);
 }
 
 static void declare_vs_input_vgprs(struct si_shader_context *ctx,
   struct si_function_info *fninfo,
   unsigned *num_prolog_vgprs)
 {
@@ -4644,20 +4642,22 @@ static void create_function(struct si_shader_context 
*ctx)
add_arg(, ARG_SGPR, ctx->f32); /* 
texcoord.w */
}
 
/* VGPRs */
declare_vs_input_vgprs(ctx, , _prolog_vgprs);
break;
}
 
declare_per_stage_desc_pointers(ctx, , true);
declare_vs_specific_input_sgprs(ctx, );
+   ctx->param_vertex_buffers = add_arg(, ARG_SGPR,
+   ac_array_in_const32_addr_space(ctx->v4i32));
 
if (shader->key.as_es) {
ctx->param_es2gs_offset = add_arg(, ARG_SGPR, 
ctx->i32);
} else if (shader->key.as_ls) {
/* no extra parameters */
} else {
if (shader->is_gs_copy_shader) {
 

[Mesa-dev] [PATCH 05/13] radeonsi: move 2nd-shader descriptor pointers into s[0:1]

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

If 32-bit pointers are supported, both pointers can be moved into s[0:1]
and then ESGS has exactly the same user data SGPR declarations as VS.

If 32-bit pointers are not supported, only one pointer can be moved into
s[0:1]. In that case, the 2nd pointer is moved before TCS constants,
so that the location is the same in HS and GS.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 83 +--
 src/gallium/drivers/radeonsi/si_shader.c  | 94 ++-
 src/gallium/drivers/radeonsi/si_shader.h  | 37 ---
 3 files changed, 140 insertions(+), 74 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 0bad3e1..5083027 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -2107,20 +2107,38 @@ static void si_emit_consecutive_shader_pointers(struct 
si_context *sctx,
u_bit_scan_consecutive_range(, , );
 
struct si_descriptors *descs = >descriptors[start];
 
si_emit_shader_pointer_head(cs, descs, sh_base, count);
for (int i = 0; i < count; i++)
si_emit_shader_pointer_body(sctx->screen, cs, descs + 
i);
}
 }
 
+static void si_emit_disjoint_shader_pointers(struct si_context *sctx,
+unsigned pointer_mask,
+unsigned sh_base)
+{
+   if (!sh_base)
+   return;
+
+   struct radeon_winsys_cs *cs = sctx->b.gfx.cs;
+   unsigned mask = sctx->shader_pointers_dirty & pointer_mask;
+
+   while (mask) {
+   struct si_descriptors *descs = 
>descriptors[u_bit_scan()];
+
+   si_emit_shader_pointer_head(cs, descs, sh_base, 1);
+   si_emit_shader_pointer_body(sctx->screen, cs, descs);
+   }
+}
+
 static void si_emit_global_shader_pointers(struct si_context *sctx,
   struct si_descriptors *descs)
 {
if (sctx->b.chip_class == GFX9) {
/* Broadcast it to all shader stages. */
si_emit_shader_pointer(sctx, descs,
   R_00B530_SPI_SHADER_USER_DATA_COMMON_0);
return;
}
 
@@ -2143,28 +2161,35 @@ void si_emit_graphics_shader_pointers(struct si_context 
*sctx,
 {
uint32_t *sh_base = sctx->shader_pointers.sh_base;
 
if (sctx->shader_pointers_dirty & (1 << SI_DESCS_RW_BUFFERS)) {
si_emit_global_shader_pointers(sctx,
   
>descriptors[SI_DESCS_RW_BUFFERS]);
}
 
si_emit_consecutive_shader_pointers(sctx, SI_DESCS_SHADER_MASK(VERTEX),
sh_base[PIPE_SHADER_VERTEX]);
-   si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(TESS_CTRL),
-   sh_base[PIPE_SHADER_TESS_CTRL]);
si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(TESS_EVAL),
sh_base[PIPE_SHADER_TESS_EVAL]);
-   si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(GEOMETRY),
-   sh_base[PIPE_SHADER_GEOMETRY]);
si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(FRAGMENT),
sh_base[PIPE_SHADER_FRAGMENT]);
+   if (HAVE_32BIT_POINTERS || sctx->b.chip_class <= VI) {
+   si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(TESS_CTRL),
+   
sh_base[PIPE_SHADER_TESS_CTRL]);
+   si_emit_consecutive_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(GEOMETRY),
+   
sh_base[PIPE_SHADER_GEOMETRY]);
+   } else {
+   si_emit_disjoint_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(TESS_CTRL),
+
sh_base[PIPE_SHADER_TESS_CTRL]);
+   si_emit_disjoint_shader_pointers(sctx, 
SI_DESCS_SHADER_MASK(GEOMETRY),
+sh_base[PIPE_SHADER_GEOMETRY]);
+   }
 
sctx->shader_pointers_dirty &=
~u_bit_consecutive(SI_DESCS_RW_BUFFERS, SI_DESCS_FIRST_COMPUTE);
 
if (sctx->vertex_buffer_pointer_dirty) {
si_emit_shader_pointer(sctx, >vertex_buffers,
   sh_base[PIPE_SHADER_VERTEX]);
sctx->vertex_buffer_pointer_dirty = false;
}
 
@@ -2626,54 +2651,70 @@ void si_all_resident_buffers_begin_new_cs(struct 
si_context *sctx)
num_resident_img_handles;
 }
 
 /* INIT/DEINIT/UPLOAD */
 
 void si_init_all_descriptors(struct si_context *sctx)
 {
int i;
 
 #if !HAVE_32BIT_POINTERS
-  

[Mesa-dev] [PATCH 04/13] radeonsi: change si_descriptors::shader_userdata_offset type to short

2018-02-17 Thread Marek Olšák
From: Marek Olšák 

We will want to use SH registers outside of user data SGPRs, like the GFX9
special SGPRs.
---
 src/gallium/drivers/radeonsi/si_descriptors.c | 12 ++--
 src/gallium/drivers/radeonsi/si_state.h   |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_descriptors.c 
b/src/gallium/drivers/radeonsi/si_descriptors.c
index 2c1c4b7..0bad3e1 100644
--- a/src/gallium/drivers/radeonsi/si_descriptors.c
+++ b/src/gallium/drivers/radeonsi/si_descriptors.c
@@ -108,28 +108,28 @@ static void si_init_descriptor_list(uint32_t *desc_list,
 
/* Initialize the array to NULL descriptors if the element size is 8. */
if (null_descriptor) {
assert(element_dw_size % 8 == 0);
for (i = 0; i < num_elements * element_dw_size / 8; i++)
memcpy(desc_list + i * 8, null_descriptor, 8 * 4);
}
 }
 
 static void si_init_descriptors(struct si_descriptors *desc,
-   unsigned shader_userdata_index,
+   short shader_userdata_rel_index,
unsigned element_dw_size,
unsigned num_elements)
 {
desc->list = CALLOC(num_elements, element_dw_size * 4);
desc->element_dw_size = element_dw_size;
desc->num_elements = num_elements;
-   desc->shader_userdata_offset = shader_userdata_index * 4;
+   desc->shader_userdata_offset = shader_userdata_rel_index * 4;
desc->slot_index_to_bind_directly = -1;
 }
 
 static void si_release_descriptors(struct si_descriptors *desc)
 {
r600_resource_reference(>buffer, NULL);
FREE(desc->list);
 }
 
 static bool si_upload_descriptors(struct si_context *sctx,
@@ -970,33 +970,33 @@ static void si_bind_sampler_states(struct pipe_context 
*ctx,
 
sctx->descriptors_dirty |= 1u << 
si_sampler_and_image_descriptors_idx(shader);
}
 }
 
 /* BUFFER RESOURCES */
 
 static void si_init_buffer_resources(struct si_buffer_resources *buffers,
 struct si_descriptors *descs,
 unsigned num_buffers,
-unsigned shader_userdata_index,
+short shader_userdata_rel_index,
 enum radeon_bo_usage shader_usage,
 enum radeon_bo_usage shader_usage_constbuf,
 enum radeon_bo_priority priority,
 enum radeon_bo_priority priority_constbuf)
 {
buffers->shader_usage = shader_usage;
buffers->shader_usage_constbuf = shader_usage_constbuf;
buffers->priority = priority;
buffers->priority_constbuf = priority_constbuf;
buffers->buffers = CALLOC(num_buffers, sizeof(struct pipe_resource*));
 
-   si_init_descriptors(descs, shader_userdata_index, 4, num_buffers);
+   si_init_descriptors(descs, shader_userdata_rel_index, 4, num_buffers);
 }
 
 static void si_release_buffer_resources(struct si_buffer_resources *buffers,
struct si_descriptors *descs)
 {
int i;
 
for (i = 0; i < descs->num_elements; i++) {
pipe_resource_reference(>buffers[i], NULL);
}
@@ -2186,26 +2186,26 @@ void si_emit_compute_shader_pointers(struct si_context 
*sctx)
if (sctx->compute_bindless_pointer_dirty) {
si_emit_shader_pointer(sctx, >bindless_descriptors, base);
sctx->compute_bindless_pointer_dirty = false;
}
 }
 
 /* BINDLESS */
 
 static void si_init_bindless_descriptors(struct si_context *sctx,
 struct si_descriptors *desc,
-unsigned shader_userdata_index,
+short shader_userdata_rel_index,
 unsigned num_elements)
 {
MAYBE_UNUSED unsigned desc_slot;
 
-   si_init_descriptors(desc, shader_userdata_index, 16, num_elements);
+   si_init_descriptors(desc, shader_userdata_rel_index, 16, num_elements);
sctx->bindless_descriptors.num_active_slots = num_elements;
 
/* The first bindless descriptor is stored at slot 1, because 0 is not
 * considered to be a valid handle.
 */
sctx->num_bindless_descriptors = 1;
 
/* Track which bindless slots are used (or not). */
util_idalloc_init(>bindless_used_slots);
util_idalloc_resize(>bindless_used_slots, num_elements);
diff --git a/src/gallium/drivers/radeonsi/si_state.h 
b/src/gallium/drivers/radeonsi/si_state.h
index ea5038d..8d2b934 100644
--- a/src/gallium/drivers/radeonsi/si_state.h
+++ b/src/gallium/drivers/radeonsi/si_state.h
@@ -268,23 +268,23 @@ struct si_descriptors {
 
/* The maximum 

[Mesa-dev] [Bug 105068] vulkaninfo gives an VK_ERROR_INITIALIZATION_FAILED

2018-02-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105068

Pietro Pesci Feltri  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEEDINFO|RESOLVED

--- Comment #7 from Pietro Pesci Feltri  ---
The problem appear in machines containing Southern Islands cards and Sea
Islands cards.

As stated in
https://wiki.archlinux.org/index.php/AMDGPU#Selecting_the_right_driver

The parameters must be added to the kernel but the parameters depend on
what cart you have installed in the case the card is Southern Islands
**or** Sea Islands.

For Southern Islands parameters are: radeon.si_support=0 amdgpu.si_support=1
For Sea Islands parameters are: radeon.cik_support=0 amdgpu.cik_support=1

but not all parameters, or you will end with a frozen display during the boot
as in
my case

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 105068] vulkaninfo gives an VK_ERROR_INITIALIZATION_FAILED

2018-02-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105068

--- Comment #6 from Pietro Pesci Feltri  ---
Problem solved :).

As stated in
https://wiki.archlinux.org/index.php/AMDGPU#Selecting_the_right_driver

The parameters must be added to the kernel but the parameters depend on
what cart you have installed in the case the card is Southern Islands
**or** Sea Islands.

For Southern Islands parameters are: radeon.si_support=0 amdgpu.si_support=1
For Sea Islands parameters are: radeon.cik_support=0 amdgpu.cik_support=1

but not both, or you will end with a frozen display during the boot as in
my case.

I hope this help others.

On Sat, Feb 17, 2018 at 12:56 AM, Pietro Pesci Feltri 
wrote:

> I add the parameters manually pressing "e" in the grub menu 2 days ago.
> One hour later the initialization process was not ended.
> I was forced to reboot the machine.
>
> Anyway thanks.
>
>
>

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [ANNOUNCE] mesa 17.3.4

2018-02-17 Thread Bas Nieuwenhuizen
(-mesa-announce + Mark, Dave and James)

Hi Emil,

radv is broken for nearly all commercial games in 17.3.4. The cause is

commit ad764e365beb8a119369b97f5cb95fc7ea8c
Author: Bas Nieuwenhuizen 
Date:   Mon Jan 22 09:01:29 2018 +0100

ac/nir: Use instance_rate_inputs per attribute, not per variable.

This did the wrong thing if we had e.g. an array for which only some
of the attributes use the instance index. Tripped up some new CTS
tests.

CC: 
Reviewed-by: Samuel Pitoiset 
Reviewed-by: Dave Airlie 
(cherry picked from commit 5a4dc285002e1924dbc8c72d17481a3dbc4c0142)

Conflicts:
src/amd/common/ac_nir_to_llvm.c

A typo was introduced during the conflict resolution while
cherrypicking to 17.3.

First things first, can we mitigate this? Would it be possible to get
a 17.3.5 with this fixed ASAP? This can be fixed by either rolling
back ad764e365beb8a119369b97f5cb95fc7ea8c or by applying
https://patchwork.freedesktop.org/patch/204260/ (obviously not applied
to master as the issue did not occur there).

Secondly, I'd like to talk about process and how to prevent this in
the future. Bugfix releases are supposed to be stable so downstream
maintainers don't have to to deal with this kind of stuff happening,
so I think that breaking radv pretty much completely is particularly
egregious and we should look at how to prevent this happnening another
time.

First a short summary of what happened and when (all times in UTC):

2018-02-09 4:47: Emil sends out the pre-release announcement, git
branch contains the faulty commit
2018-02-13 16:05: James Legg replies to the pre-release announcement:
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185355.html
2018-02-13 16:06: James Legg sends a patch to fix it:
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185356.html
2018-02-13 16:33: Bas reviews the patch.
2018-02-15 12:56: Emil releases 17.3.4 without the fix:
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185691.html
2018-02-16 21:09: Someone else replies with a fix to the pre-release
announcement: 
https://lists.freedesktop.org/archives/mesa-dev/2018-February/185908.html
2018-02-17 ~13:00: Bas notices the attached fix has "mismerge" in the
name and decides to investigate the issue


So I think there have been a couple of issues in communication here:

1) Essentially none of the communication talked about this being an
issue on the branch only. This made me underestimate the severity of
the issue, since the games kept working on master. This also meant no
manual pings from my side to the release manager for not applying this
to master first.
2) The reply to the pre-release announcement only said there was an
issue, not that there was a fix sent out, nor any details like
severity.
3) As far as I can tell no action had been taken by the release
manager. James' reply did not get a response nor was the fix included
in the release.

My question would be how to improve the communication here. Could you
elaborate the reason for (3)? Was it because you thought it would have
to be picked up by the radv developers first? Would it help if I
replied to James' reply in the announcement to link to the patch?

The remaining issue is mainly about testing. The initial detection of
this issue by James was already more than the 24-48 hours after the
pre-release announcement that is recommended between the announcement
and release, so a faster release would have prevented timely detection
in the first place. I suspect radv does not get tested at all on
releases except maybe a build test?

Since manually testing all games on every release is not scalable for
either the radv developers or the release manager I'm thinking of
getting the automated tests from the vulkan CTS and crucible running
before a release. That said I don't think keeping track of what tests
are supposed to pass is something we should be pushing on the release
manager, so I suppose we should be setting up our own CI and surfacing
results of the stable branches to the release manager?

I'm assuming there is already a similar arrangement with intel? Is
this something were the release manager triggers their CI specifically
when doing a pre-release or are they just continuously watching stable
branches? Also how do the results get surfaced back to the release
manager?

Apologies for the barrage of questions!

Thanks,
Bas Nieuwenhuizen

On Thu, Feb 15, 2018 at 1:56 PM, Emil Velikov  wrote:
> Mesa 17.3.4 is now available.
>
> In this release we have:
>
> Dozens of fixes in the i965, ANV and RADV drivers. Additionally
> the r600, virgl, etnaviv and renderonly drivers have also seen some love.
>
> The experimental Vulkan extension VK_KHX_multiview was disabled.
>
> On the video decoding drivers side:
> r600/radeonsi correctly handle new UVD/VCN firmware. 

Re: [Mesa-dev] Mesa 17.3.4 release candidate

2018-02-17 Thread Thomas Backlund

Den 13.02.2018 kl. 18:05, skrev James Legg:

The conflict resolution on this commit has a typo, it should use
(index + i) instead of (index + 1).



Yep, attached fix for 17.3 branch...


--
Thomas

Commmit: ad764e365beb8a119369b97f5cb95fc7ea8c 
 (ac/nir: Use instance_rate_inputs per attribute, not per variable)

in the 17.3 branch which is cherrypicked from 5a4dc285002e1924dbc8c72d17481a3dbc4c0142
and released in 17.3.4 contains a mis-merge noticed by James Legg during 17.3.4 review.

Fix that up.

Signed-off-by: Thomas Backlund 
Reported-by: James Legg 

---

diff -Nurp mesa-17.3.4.orig/src/amd/common/ac_nir_to_llvm.c mesa-17.3.4/src/amd/common/ac_nir_to_llvm.c
--- mesa-17.3.4.orig/src/amd/common/ac_nir_to_llvm.c
+++ mesa-17.3.4/src/amd/common/ac_nir_to_llvm.c
@@ -5117,7 +5117,7 @@ handle_vs_input_decl(struct nir_to_llvm_
 	variable->data.driver_location = idx * 4;
 
 	for (unsigned i = 0; i < attrib_count; ++i, ++idx) {
-		if (ctx->options->key.vs.instance_rate_inputs & (1u << (index + 1))) {
+		if (ctx->options->key.vs.instance_rate_inputs & (1u << (index + i))) {
 			buffer_index = LLVMBuildAdd(ctx->builder, ctx->abi.instance_id,
 		ctx->abi.start_instance, "");
 			ctx->shader_info->vs.vgpr_comp_cnt =
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/gpu_memcpy: CS Stall before a MI memcpy on gen7

2018-02-17 Thread Jason Ekstrand
On Sat, Feb 17, 2018 at 1:21 AM, Kenneth Graunke 
wrote:

> On Saturday, February 17, 2018 1:09:10 AM PST Jason Ekstrand wrote:
> > This fixes a pile of hangs caused by the recent shuffling of resolves
> > and transitions.  The particularly problematic case is when you have at
> > least three attachments with load ops of CLEAR, LOAD, CLEAR.  In this
> > case, we execute the first CLEAR followed by a MI memcpy to copy the
> > clear values over for the LOAD followed by a second CLEAR.  The MI
> > commands cause the first CLEAR to hang which causes us to get stuck on
> > the 3DSTATE_MULTISAMPLE in the second CLEAR.
> >
> > We also add guards for BLORP to fix the same issue.  These shouldn't
> > actually do anything right now because the only use of indirect clears
> > in BLORP today is for resolves which are already guarded by a render
> > cache flush and CS stall.  However, this will guard us against potential
> > issues in the future.
> >
> > Cc: Kenneth Graunke 
> > Cc: Nanley Chery 
> > ---
> >  src/intel/vulkan/genX_blorp_exec.c | 10 ++
> >  src/intel/vulkan/genX_gpu_memcpy.c | 22 ++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/src/intel/vulkan/genX_blorp_exec.c
> b/src/intel/vulkan/genX_blorp_exec.c
> > index 04f7675..f956715 100644
> > --- a/src/intel/vulkan/genX_blorp_exec.c
> > +++ b/src/intel/vulkan/genX_blorp_exec.c
> > @@ -200,6 +200,16 @@ genX(blorp_exec)(struct blorp_batch *batch,
> >genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
> > }
> >
> > +#if GEN_GEN == 7
> > +   /* The MI_LOAD/STORE_REGISTER_MEM commands which BLORP uses to
> implement
> > +* indirect fast-clear colors can cause GPU hangs if we don't stall
> first.
> > +* See genX(cmd_buffer_mi_memcpy) for more details.
> > +*/
> > +   assert(params->src.clear_color_addr.buffer == NULL);
> > +   if (params->dst.clear_color_addr.buffer)
> > +  cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
> > +#endif
> > +
> > genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
> >
> > genX(flush_pipeline_select_3d)(cmd_buffer);
> > diff --git a/src/intel/vulkan/genX_gpu_memcpy.c
> b/src/intel/vulkan/genX_gpu_memcpy.c
> > index f3ada93..1d527fb 100644
> > --- a/src/intel/vulkan/genX_gpu_memcpy.c
> > +++ b/src/intel/vulkan/genX_gpu_memcpy.c
> > @@ -62,6 +62,28 @@ genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer
> *cmd_buffer,
> > assert(dst_offset % 4 == 0);
> > assert(src_offset % 4 == 0);
> >
> > +#if GEN_GEN == 7
> > +   /* On gen7, the combination of commands used
> here(MI_LOAD_REGISTER_MEM
> > +* and MI_STORE_REGISTER_MEM) can cause GPU hangs if any rendering is
> > +* in-flight when they are issued even if the memory touched is not
> > +* currently active for rendering.  The weird bit is that it is not
> the
> > +* MI_LOAD/STORE_REGISTER_MEM commands which hang but rather the
> in-flight
> > +* rendering hangs such that the next stalling command after the
> > +* MI_LOAD/STORE_REGISTER_MEM commands will catch the hang.
> > +*
> > +* It is unclear exactly why this hang occurs.  Both MI commands
> come with
> > +* warnings about the 3D pipeline but that doesn't seem to fully
> explain
> > +* it.  My (Jason's) best theory is that it has something to do with
> the
> > +* fact that we're using a GPU state register as our temporary and
> that
> > +* something with reading/writing it is causing problems.
> > +*
> > +* In order to work around this issue, we emit a PIPE_CONTROL with
> the
> > +* command streamer stall bit set.
> > +*/
> > +   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
> > +   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
> > +#endif
> > +
> > for (uint32_t i = 0; i < size; i += 4) {
> >const struct anv_address src_addr =
> >   (struct anv_address) { src, src_offset + i};
> >
>
> This seems reasonable enough.  I can't say I understand why this is
> necessary,


That makes two of us and, when Nanley gets around to looking at it, I'm
sure it'll make 3. :-)


> but if it fixes the problem...it's probably best to just
> do it.  We can always change it later if we find new information.
>

I'm fairly sure that this is more-or-less the correct fix and 10 Jenkins
runs along with 50 (soon to be 100) runs of
dEQP-VK.renderpass.dedicated_allocation.attachment.* agree.


> Acked-by: Kenneth Graunke 
>

Thanks!


> I'll let Nanley review as well.
>

I can wait.  I've only been fighting this for a week and a half...

--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] anv/gpu_memcpy: CS Stall before a MI memcpy on gen7

2018-02-17 Thread Kenneth Graunke
On Saturday, February 17, 2018 1:09:10 AM PST Jason Ekstrand wrote:
> This fixes a pile of hangs caused by the recent shuffling of resolves
> and transitions.  The particularly problematic case is when you have at
> least three attachments with load ops of CLEAR, LOAD, CLEAR.  In this
> case, we execute the first CLEAR followed by a MI memcpy to copy the
> clear values over for the LOAD followed by a second CLEAR.  The MI
> commands cause the first CLEAR to hang which causes us to get stuck on
> the 3DSTATE_MULTISAMPLE in the second CLEAR.
> 
> We also add guards for BLORP to fix the same issue.  These shouldn't
> actually do anything right now because the only use of indirect clears
> in BLORP today is for resolves which are already guarded by a render
> cache flush and CS stall.  However, this will guard us against potential
> issues in the future.
> 
> Cc: Kenneth Graunke 
> Cc: Nanley Chery 
> ---
>  src/intel/vulkan/genX_blorp_exec.c | 10 ++
>  src/intel/vulkan/genX_gpu_memcpy.c | 22 ++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/src/intel/vulkan/genX_blorp_exec.c 
> b/src/intel/vulkan/genX_blorp_exec.c
> index 04f7675..f956715 100644
> --- a/src/intel/vulkan/genX_blorp_exec.c
> +++ b/src/intel/vulkan/genX_blorp_exec.c
> @@ -200,6 +200,16 @@ genX(blorp_exec)(struct blorp_batch *batch,
>genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
> }
>  
> +#if GEN_GEN == 7
> +   /* The MI_LOAD/STORE_REGISTER_MEM commands which BLORP uses to implement
> +* indirect fast-clear colors can cause GPU hangs if we don't stall first.
> +* See genX(cmd_buffer_mi_memcpy) for more details.
> +*/
> +   assert(params->src.clear_color_addr.buffer == NULL);
> +   if (params->dst.clear_color_addr.buffer)
> +  cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
> +#endif
> +
> genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
>  
> genX(flush_pipeline_select_3d)(cmd_buffer);
> diff --git a/src/intel/vulkan/genX_gpu_memcpy.c 
> b/src/intel/vulkan/genX_gpu_memcpy.c
> index f3ada93..1d527fb 100644
> --- a/src/intel/vulkan/genX_gpu_memcpy.c
> +++ b/src/intel/vulkan/genX_gpu_memcpy.c
> @@ -62,6 +62,28 @@ genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer 
> *cmd_buffer,
> assert(dst_offset % 4 == 0);
> assert(src_offset % 4 == 0);
>  
> +#if GEN_GEN == 7
> +   /* On gen7, the combination of commands used here(MI_LOAD_REGISTER_MEM
> +* and MI_STORE_REGISTER_MEM) can cause GPU hangs if any rendering is
> +* in-flight when they are issued even if the memory touched is not
> +* currently active for rendering.  The weird bit is that it is not the
> +* MI_LOAD/STORE_REGISTER_MEM commands which hang but rather the in-flight
> +* rendering hangs such that the next stalling command after the
> +* MI_LOAD/STORE_REGISTER_MEM commands will catch the hang.
> +*
> +* It is unclear exactly why this hang occurs.  Both MI commands come with
> +* warnings about the 3D pipeline but that doesn't seem to fully explain
> +* it.  My (Jason's) best theory is that it has something to do with the
> +* fact that we're using a GPU state register as our temporary and that
> +* something with reading/writing it is causing problems.
> +*
> +* In order to work around this issue, we emit a PIPE_CONTROL with the
> +* command streamer stall bit set.
> +*/
> +   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
> +   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
> +#endif
> +
> for (uint32_t i = 0; i < size; i += 4) {
>const struct anv_address src_addr =
>   (struct anv_address) { src, src_offset + i};
> 

This seems reasonable enough.  I can't say I understand why this is
necessary, but if it fixes the problem...it's probably best to just
do it.  We can always change it later if we find new information.

Acked-by: Kenneth Graunke 

I'll let Nanley review as well.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] anv/gpu_memcpy: CS Stall before a MI memcpy on gen7

2018-02-17 Thread Jason Ekstrand
This fixes a pile of hangs caused by the recent shuffling of resolves
and transitions.  The particularly problematic case is when you have at
least three attachments with load ops of CLEAR, LOAD, CLEAR.  In this
case, we execute the first CLEAR followed by a MI memcpy to copy the
clear values over for the LOAD followed by a second CLEAR.  The MI
commands cause the first CLEAR to hang which causes us to get stuck on
the 3DSTATE_MULTISAMPLE in the second CLEAR.

We also add guards for BLORP to fix the same issue.  These shouldn't
actually do anything right now because the only use of indirect clears
in BLORP today is for resolves which are already guarded by a render
cache flush and CS stall.  However, this will guard us against potential
issues in the future.

Cc: Kenneth Graunke 
Cc: Nanley Chery 
---
 src/intel/vulkan/genX_blorp_exec.c | 10 ++
 src/intel/vulkan/genX_gpu_memcpy.c | 22 ++
 2 files changed, 32 insertions(+)

diff --git a/src/intel/vulkan/genX_blorp_exec.c 
b/src/intel/vulkan/genX_blorp_exec.c
index 04f7675..f956715 100644
--- a/src/intel/vulkan/genX_blorp_exec.c
+++ b/src/intel/vulkan/genX_blorp_exec.c
@@ -200,6 +200,16 @@ genX(blorp_exec)(struct blorp_batch *batch,
   genX(cmd_buffer_config_l3)(cmd_buffer, cfg);
}
 
+#if GEN_GEN == 7
+   /* The MI_LOAD/STORE_REGISTER_MEM commands which BLORP uses to implement
+* indirect fast-clear colors can cause GPU hangs if we don't stall first.
+* See genX(cmd_buffer_mi_memcpy) for more details.
+*/
+   assert(params->src.clear_color_addr.buffer == NULL);
+   if (params->dst.clear_color_addr.buffer)
+  cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
+#endif
+
genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
 
genX(flush_pipeline_select_3d)(cmd_buffer);
diff --git a/src/intel/vulkan/genX_gpu_memcpy.c 
b/src/intel/vulkan/genX_gpu_memcpy.c
index f3ada93..1d527fb 100644
--- a/src/intel/vulkan/genX_gpu_memcpy.c
+++ b/src/intel/vulkan/genX_gpu_memcpy.c
@@ -62,6 +62,28 @@ genX(cmd_buffer_mi_memcpy)(struct anv_cmd_buffer *cmd_buffer,
assert(dst_offset % 4 == 0);
assert(src_offset % 4 == 0);
 
+#if GEN_GEN == 7
+   /* On gen7, the combination of commands used here(MI_LOAD_REGISTER_MEM
+* and MI_STORE_REGISTER_MEM) can cause GPU hangs if any rendering is
+* in-flight when they are issued even if the memory touched is not
+* currently active for rendering.  The weird bit is that it is not the
+* MI_LOAD/STORE_REGISTER_MEM commands which hang but rather the in-flight
+* rendering hangs such that the next stalling command after the
+* MI_LOAD/STORE_REGISTER_MEM commands will catch the hang.
+*
+* It is unclear exactly why this hang occurs.  Both MI commands come with
+* warnings about the 3D pipeline but that doesn't seem to fully explain
+* it.  My (Jason's) best theory is that it has something to do with the
+* fact that we're using a GPU state register as our temporary and that
+* something with reading/writing it is causing problems.
+*
+* In order to work around this issue, we emit a PIPE_CONTROL with the
+* command streamer stall bit set.
+*/
+   cmd_buffer->state.pending_pipe_bits |= ANV_PIPE_CS_STALL_BIT;
+   genX(cmd_buffer_apply_pipe_flushes)(cmd_buffer);
+#endif
+
for (uint32_t i = 0; i < size; i += 4) {
   const struct anv_address src_addr =
  (struct anv_address) { src, src_offset + i};
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev