Re: [Mesa-dev] [PATCH] r600/sb: remove superfluous assert

2017-09-12 Thread Glenn Kennard

On Tue, 12 Sep 2017 19:25:18 +0200, Vadim Girlin  wrote:


On 09/12/2017 12:49 PM, Gert Wollny wrote:

On Tuesday, 12.09.2017, at 09:56 +0300, Vadim Girlin wrote:

On 09/11/2017 07:09 PM, Emil Velikov wrote:



Anyway, if num_arrays is 0 there, I suspect it can be a result of
some other issue. At the very least it looks like a potential
performance problem, because in that case we assume all shader
registers can be accessed with indirect addressing, which can limit
the optimizations significantly. So it might make sense to figure out
why it's zero in the first place; in theory it shouldn't happen.
Maybe something is wrong with the indirect_files bits?


The shader that's failing is this (i.e. no arrays, and indirect access
only to SV).


Is the tested feature really supported by r600g? AFAICS the indirect
index value is unused in the shader code.

Anyway, at first glance it looks like we don't need indirect addressing
for GPRs in this case, so the outer "if" around that assert probably
should handle this case too and skip the assert. I'm not 100% sure though.



FRAG
DCL SV[0], SAMPLEMASK
DCL OUT[0], COLOR
DCL CONST[0][0]
DCL TEMP[0..1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {1., 0., 0., 0.}
IMM[1] INT32 {1, 0, 0, 0}
   0: MOV TEMP[0], IMM[0].xyyx
   1: UARL ADDR[0].x, CONST[0][0].
   2: USEQ TEMP[1].x, SV[ADDR[0].x]., IMM[1].
   3: UIF TEMP[1].
   4:   MOV TEMP[0].xy, IMM[0].yxyy
   5: ENDIF
   6: MOV OUT[0], TEMP[0]
   7: END

===== SHADER #12 ===== PS/BARTS/EVERGREEN =====
===== 36 dw ===== 8 gprs ===== 1 stack =====
  4005 a418 ALU_PUSH_BEFORE 7 @10 KC0[CB0:0-15]
0010  00f9 00400c90 1 x: MOV R2.x,  1.0
0012  04f8 20400c90   y: MOV R2.y,  0
0014  04f8 40400c90   z: MOV R2.z,  0
0016  00f9 60400c90   w: MOV R2.w,  1.0
0018  8080 00800c90   t: MOV R4.x,  KC0[0].x
0020  801f4800 00601d10 2 x: SETE_INT   R3.x,  R0.z, 1
0022  801f00fe 00e0229c 3 MP  x: PRED_SETNE_INT R7.x,  PV.x, 0
0002  0003 8281 JUMP @6 POP:1
0004  000c a804 ALU_POP_AFTER 2 @24
0024  04f8 00400c90 4 x: MOV R2.x,  0
0026  80f9 20400c90   y: MOV R2.y,  1.0
0006  000e a00c ALU 4 @28
0028  0002 00200c90 5 x: MOV R1.x,  R2.x
0030  0402 20200c90   y: MOV R1.y,  R2.y
0032  0802 40200c90   z: MOV R1.z,  R2.z
0034  8c02 60200c90   w: MOV R1.w,  R2.w
0008  c0008000 95200688 EXPORT_DONE PIXEL 0 R1.xyzw  EOP
===== SHADER_END =====






Hi Gert,

Vadim is correct: the fix is to extend the check in the if statement above to
also exclude TGSI_FILE_SYSTEM_VALUE, and to keep the assert in place, i.e.:

 if (pshader->indirect_files & ~((1 << TGSI_FILE_CONSTANT) | (1 << TGSI_FILE_SAMPLER) | (1 << TGSI_FILE_SYSTEM_VALUE))) {


Although gl_SampleMaskIn is declared as an array in GLSL, it's effectively a
32-bit mask on all hardware supported by Mesa, so the array indexing is simply
ignored. Thanks for looking into this!
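
For illustration, the mask test in that check can be sketched in isolation. This is only a sketch under stated assumptions: the enum values and the names example_file, EXAMPLE_NON_GPR_FILES and example_needs_gpr_indirect are hypothetical stand-ins, not the real TGSI definitions or r600 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the TGSI register-file enum; the real
 * values are defined by Mesa and are not reproduced here. */
enum example_file {
	EXAMPLE_FILE_CONSTANT = 1,
	EXAMPLE_FILE_TEMPORARY,
	EXAMPLE_FILE_SAMPLER,
	EXAMPLE_FILE_SYSTEM_VALUE
};

/* Files whose indirect accesses never need indirect GPR addressing. */
#define EXAMPLE_NON_GPR_FILES \
	((1u << EXAMPLE_FILE_CONSTANT) | \
	 (1u << EXAMPLE_FILE_SAMPLER) | \
	 (1u << EXAMPLE_FILE_SYSTEM_VALUE))

/* Returns true when at least one indirectly addressed file falls outside
 * the excluded set, i.e. when the guarded block (and its assert) runs. */
static bool example_needs_gpr_indirect(unsigned indirect_files)
{
	return (indirect_files & ~EXAMPLE_NON_GPR_FILES) != 0;
}
```

With the system-value bit excluded, a shader such as the one quoted above (indirect access only to SV) no longer enters the guarded block, while indirect access to temporaries still does.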


/Glenn
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600: refactor out some compressed resource state code.

2017-06-05 Thread Glenn Kennard

On Mon, 05 Jun 2017 05:35:02 +0200, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This just takes this out to a separate function as it will
get more complex with images.
---
 src/gallium/drivers/r600/r600_state_common.c | 52 +++-
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 3b24f36..8ace779 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1400,6 +1400,32 @@ static void r600_generate_fixed_func_tcs(struct r600_context *rctx)
 	ureg_create_shader_and_destroy(ureg, &rctx->b.b);
 }
+static void r600_update_compressed_resource_state(struct r600_context *rctx)
+{
+	unsigned i;
+	unsigned counter;
+
+	counter = p_atomic_read(&rctx->screen->b.compressed_colortex_counter);
+	if (counter != rctx->b.last_compressed_colortex_counter) {
+		rctx->b.last_compressed_colortex_counter = counter;
+
+		for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
+			r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
+		}
+	}
+
+	/* Decompress textures if needed. */
+	for (i = 0; i < PIPE_SHADER_TYPES; i++) {
+		struct r600_samplerview_state *views = &rctx->samplers[i].views;
+		if (views->compressed_depthtex_mask) {
+			r600_decompress_depth_textures(rctx, views);
+		}
+		if (views->compressed_colortex_mask) {
+			r600_decompress_color_textures(rctx, views);
+		}
+	}
+}
+
 #define SELECT_SHADER_OR_FAIL(x) do {  \
 	r600_shader_select(ctx, rctx->x##_shader, &x##_dirty);   \
 	if (unlikely(!rctx->x##_shader->current)) \
@@ -1440,30 +1466,8 @@ static bool r600_update_derived_state(struct r600_context *rctx)
 	bool need_buf_const;
 	struct r600_pipe_shader *clip_so_current = NULL;
-	if (!rctx->blitter->running) {
-		unsigned i;
-		unsigned counter;
-
-		counter = p_atomic_read(&rctx->screen->b.compressed_colortex_counter);
-		if (counter != rctx->b.last_compressed_colortex_counter) {
-			rctx->b.last_compressed_colortex_counter = counter;
-
-			for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
-				r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
-			}
-		}
-
-		/* Decompress textures if needed. */
-		for (i = 0; i < PIPE_SHADER_TYPES; i++) {
-			struct r600_samplerview_state *views = &rctx->samplers[i].views;
-			if (views->compressed_depthtex_mask) {
-				r600_decompress_depth_textures(rctx, views);
-			}
-			if (views->compressed_colortex_mask) {
-				r600_decompress_color_textures(rctx, views);
-			}
-		}
-	}
+	if (!rctx->blitter->running)
+		r600_update_compressed_resource_state(rctx);
 	SELECT_SHADER_OR_FAIL(ps);



Patch series is Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


[Mesa-dev] [PATCH 6/9] r600g: Implement scratch buffer state management

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/evergreen_state.c   |  24 +++
 src/gallium/drivers/r600/r600_pipe.c |   3 +
 src/gallium/drivers/r600/r600_pipe.h |  14 
 src/gallium/drivers/r600/r600_shader.h   |   1 +
 src/gallium/drivers/r600/r600_state_common.c | 104 +++
 5 files changed, 146 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c5dd9f7..8e984b9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1976,6 +1976,30 @@ static void evergreen_emit_tcs_constant_buffers(struct 
r600_context *rctx, struc
0);
 }
 
+void evergreen_setup_scratch_buffers(struct r600_context *rctx) {
+	static const struct {
+		unsigned ring_base;
+		unsigned item_size;
+		unsigned ring_size;
+	} regs[EG_NUM_HW_STAGES] = {
+		[R600_HW_STAGE_PS] = { R_008C68_SQ_PSTMP_RING_BASE, R_028914_SQ_PSTMP_RING_ITEMSIZE, R_008C6C_SQ_PSTMP_RING_SIZE },
+		[R600_HW_STAGE_VS] = { R_008C60_SQ_VSTMP_RING_BASE, R_028910_SQ_VSTMP_RING_ITEMSIZE, R_008C64_SQ_VSTMP_RING_SIZE },
+		[R600_HW_STAGE_GS] = { R_008C58_SQ_GSTMP_RING_BASE, R_02890C_SQ_GSTMP_RING_ITEMSIZE, R_008C5C_SQ_GSTMP_RING_SIZE },
+		[R600_HW_STAGE_ES] = { R_008C50_SQ_ESTMP_RING_BASE, R_028908_SQ_ESTMP_RING_ITEMSIZE, R_008C54_SQ_ESTMP_RING_SIZE },
+		[EG_HW_STAGE_LS] = { R_008E10_SQ_LSTMP_RING_BASE, R_028830_SQ_LSTMP_RING_ITEMSIZE, R_008E14_SQ_LSTMP_RING_SIZE },
+		[EG_HW_STAGE_HS] = { R_008E18_SQ_HSTMP_RING_BASE, R_028834_SQ_HSTMP_RING_ITEMSIZE, R_008E1C_SQ_HSTMP_RING_SIZE }
+	};
+
+	for (unsigned i = 0; i < EG_NUM_HW_STAGES; i++) {
+		struct r600_pipe_shader *stage = rctx->hw_shader_stages[i].shader;
+
+		if (stage && unlikely(stage->scratch_space_needed)) {
+			r600_setup_scratch_area_for_shader(rctx, stage,
+				&rctx->scratch_buffers[i], regs[i].ring_base, regs[i].item_size, regs[i].ring_size);
+		}
+	}
+}
+
+
 static void evergreen_emit_sampler_views(struct r600_context *rctx,
 struct r600_samplerview_state *state,
 unsigned resource_id_base, unsigned 
pkt_flags)
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 1803c26..fc03990 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -71,6 +71,9 @@ static void r600_destroy_context(struct pipe_context *context)
 
r600_sb_context_destroy(rctx->sb_context);
 
+	for (sh = 0; sh < (rctx->b.chip_class < EVERGREEN ? R600_NUM_HW_STAGES : EG_NUM_HW_STAGES); sh++) {
+		r600_resource_reference(&rctx->scratch_buffers[sh].buffer, NULL);
+	}
 	r600_resource_reference(&rctx->dummy_cmask, NULL);
 	r600_resource_reference(&rctx->dummy_fmask, NULL);
 
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index cf8eba3..c8cf87f 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -413,6 +413,13 @@ struct r600_shader_state {
struct r600_pipe_shader *shader;
 };
 
+/* Used to spill shader temps */
+struct r600_scratch_buffer {
+   struct r600_resource*buffer;
+   unsignedsize;
+   unsigneditem_size;
+};
+
 struct r600_context {
struct r600_common_context  b;
struct r600_screen  *screen;
@@ -522,6 +529,8 @@ struct r600_context {
struct r600_pipe_shader_selector *last_tcs;
unsigned last_num_tcs_input_cp;
unsigned lds_alloc;
+
+   struct r600_scratch_buffer scratch_buffers[MAX2(R600_NUM_HW_STAGES, 
EG_NUM_HW_STAGES)];
 };
 
 static inline void r600_emit_command_buffer(struct radeon_winsys_cs *cs,
@@ -621,6 +630,7 @@ void evergreen_init_color_surface_rat(struct r600_context 
*rctx,
struct r600_surface *surf);
 void evergreen_update_db_shader_control(struct r600_context * rctx);
 bool evergreen_adjust_gprs(struct r600_context *rctx);
+void evergreen_setup_scratch_buffers(struct r600_context *rctx);
 /* r600_blit.c */
 void r600_init_blit_functions(struct r600_context *rctx);
 void r600_decompress_depth_textures(struct r600_context *rctx,
@@ -665,6 +675,7 @@ boolean r600_is_format_supported(struct pipe_screen *screen,
 unsigned sample_count,
 unsigned usage);
 void r600_update_db_shader_control(struct r600_context * rctx);
+void r600_setup_scratch_buffers(struct r600_context *rctx);
 
 /* r600_hw

[Mesa-dev] [PATCH 8/9] r600g/sb: Add dependency tracking for scratch ops

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.h |  1 +
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 12 
 src/gallium/drivers/r600/sb/sb_core.cpp|  3 ++-
 src/gallium/drivers/r600/sb/sb_ir.h|  6 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_sched.cpp   |  2 +-
 src/gallium/drivers/r600/sb/sb_valtable.cpp|  1 +
 8 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index e94230f..3c35d48 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -67,6 +67,7 @@ struct r600_shader {
boolean uses_kill;
boolean fs_write_all;
boolean two_side;
+   boolean needs_scratch_space;
/* Number of color outputs in the TGSI shader,
 * sometimes it could be higher than nr_cbufs (bug?).
 * Also with writes_all property on eg+ it will be set to max CB number 
*/
diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 82826a9..5d74794 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -293,7 +293,7 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g, 
node *prev_node) {
value *d = n->dst.empty() ? NULL : n->dst[0];
 
if (d && d->is_special_reg()) {
-   assert((n->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit());
+   assert((n->bc.op_ptr->flags & AF_MOVA) || 
d->is_geometry_emit() || d->is_scratch());
d = NULL;
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index ae92a76..9c52342 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -667,6 +667,11 @@ int bc_parser::prepare_fetch_clause(cf_node *cf) {

n->src.push_back(get_cf_index_value(n->bc.resource_index_mode == 
V_SQ_CF_INDEX_1));
}
}
+
+   if (n->bc.op == FETCH_OP_READ_SCRATCH) {
+   n->src.push_back(sh->get_special_value(SV_SCRATCH));
+   n->dst.push_back(sh->get_special_value(SV_SCRATCH));
+   }
}
 
return 0;
@@ -797,6 +802,10 @@ int bc_parser::prepare_ir() {
c->flags |= NF_DONT_KILL;
}
}
+   else if (c->bc.op == CF_OP_MEM_SCRATCH) {
+   
c->src.push_back(sh->get_special_value(SV_SCRATCH));
+   
c->dst.push_back(sh->get_special_value(SV_SCRATCH));
+   }
 
if (!burst_count--)
break;
@@ -831,6 +840,9 @@ int bc_parser::prepare_ir() {

c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));

c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
}
+   } else if (c->bc.op == CF_OP_WAIT_ACK) {
+   c->src.push_back(sh->get_special_value(SV_SCRATCH));
+   c->dst.push_back(sh->get_special_value(SV_SCRATCH));
}
}
 
diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp 
b/src/gallium/drivers/r600/sb/sb_core.cpp
index afea818..283c84f 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -191,7 +191,8 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 
// if conversion breaks the dependency tracking between CF_EMIT ops 
when it removes
// the phi nodes for SV_GEOMETRY_EMIT. Just disable it for GS
-   if (sh->target != TARGET_GS)
+   // Same for shaders spilling to scratch memory using SV_SCRATCH
+   if (sh->target != TARGET_GS && !pshader->needs_scratch_space)
SB_RUN_PASS(if_conversion,  1);
 
// if_conversion breaks info about uses, but next pass (peephole)
diff --git a/src/gallium/drivers/r600/sb/sb_ir.h 
b/src/gallium/drivers/r600/sb/sb_ir.h
index 74c0549..141bf5f 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -42,7 +42,8 @@ enum special_regs {
SV_EXEC_MASK,
SV_AR_INDEX,
SV_VALID_MASK,
-   SV_GEOMETRY

[Mesa-dev] [PATCH 9/9] r600g: Implement spilling of temp arrays

2017-03-05 Thread Glenn Kennard
Pessimistically spills arrays if GPR limit is exceeded.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 308 ++---
 1 file changed, 285 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 8cb3f8b..f716dae 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -165,7 +165,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
bool dump = r600_can_dump_shader(&rctx->screen->b,
 tgsi_get_processor_type(sel->tokens));
unsigned use_sb = !(rctx->screen->b.debug_flags & DBG_NO_SB);
-   unsigned sb_disasm = use_sb || (rctx->screen->b.debug_flags & 
DBG_SB_DISASM);
+   unsigned sb_disasm;
unsigned export_shader;
 
shader->shader.bc.isa = rctx->isa;
@@ -203,6 +203,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
}
}
 
+   sb_disasm = use_sb || (rctx->screen->b.debug_flags & DBG_SB_DISASM);
if (dump && !sb_disasm) {
fprintf(stderr, 
"--\n");
r600_bytecode_disasm(&shader->shader.bc);
@@ -317,6 +318,9 @@ struct eg_interp {
 
 struct r600_shader_ctx {
struct tgsi_shader_info info;
+   struct tgsi_array_info  *array_infos;
+   /* flag for each tgsi temp array if it's been spilled or not */
+   bool*spilled_arrays;
struct tgsi_parse_context   parse;
const struct tgsi_token *tokens;
unsignedtype;
@@ -350,6 +354,7 @@ struct r600_shader_ctx {
unsignedenabled_stream_buffers_mask;
unsignedtess_input_info; /* temp with 
tess input offsets */
unsignedtess_output_info; /* temp with 
tess input offsets */
+   unsignedneed_wait_ack;
 };
 
 struct r600_shader_tgsi_instruction {
@@ -850,6 +855,96 @@ static int tgsi_barrier(struct r600_shader_ctx *ctx)
return 0;
 }
 
+static void choose_spill_arrays(struct r600_shader_ctx *ctx, int *regno, 
unsigned *scratch_space_needed)
+{
+   // pick largest array and spill it, repeat until the number of temps is 
under limit or we run out of arrays
+   unsigned n = ctx->info.array_max[TGSI_FILE_TEMPORARY];
+   unsigned narrays_left = n;
+   bool *spilled = ctx->spilled_arrays; // assumed calloc:ed
+
+   *scratch_space_needed = 0;
+   while (*regno > 124 && narrays_left) {
+   unsigned i;
+   unsigned largest = 0;
+   unsigned largest_index = 0;
+
+   for (i = 0; i < n; i++) {
+   unsigned size = ctx->array_infos[i].range.Last - 
ctx->array_infos[i].range.First + 1;
+   if (!spilled[i] && size > largest) {
+   largest = size;
+   largest_index = i;
+   }
+   }
+
+   spilled[largest_index] = true;
+   *regno -= largest;
+   *scratch_space_needed += largest;
+
+   narrays_left --;
+   }
+
+   if (narrays_left == 0) {
+   ctx->info.indirect_files &= ~(1 << TGSI_FILE_TEMPORARY);
+   }
+}
+
+/* take spilled temp arrays into account when translating tgsi register
+   indexes into r600 gprs if spilled is false, or scratch array offset if
+   spilled is true */
+static int map_tgsi_reg_index_to_r600_gpr(struct r600_shader_ctx *ctx, 
unsigned tgsi_reg_index, bool *spilled) {
+   unsigned i;
+   unsigned spilled_size = 0;
+
+   for (i = 0; i < ctx->info.array_max[TGSI_FILE_TEMPORARY]; i++) {
+   if (tgsi_reg_index >= ctx->array_infos[i].range.First && 
tgsi_reg_index <= ctx->array_infos[i].range.Last) {
+   if (ctx->spilled_arrays[i]) {
+   /* vec4 index into spilled scratch memory */
+   *spilled = true;
+
+   return tgsi_reg_index - 
ctx->array_infos[i].range.First + spilled_size;
+   }
+   else {
+   /* regular GPR array */
+   *spilled = false;
+
+   return tgsi_reg_index - spilled_size + 
ctx->file_offset[TGSI_FILE_TEMPORARY];
+   }
+   }
+
+   if (ctx->spilled_arrays[i]) {
+   spilled_size += ctx->array_infos[i].ran
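
The patch text is cut off at this point in the archive. As a rough, self-contained sketch of the index mapping that the comment above describes, the following may help; it is not the patch's code. The names example_array and example_map_temp are hypothetical, and the sketch assumes that arrays are sorted by their first index, do not overlap, and that only spilled arrays located entirely before an index shift it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct example_array {
	unsigned first;   /* first TGSI temp index covered by the array */
	unsigned last;    /* last TGSI temp index, inclusive */
	bool     spilled; /* true if the array lives in scratch memory */
};

/* Maps a TGSI temp index either to a vec4 offset in scratch memory
 * (*in_scratch set to true) or to a GPR index compacted past the spilled
 * arrays (*in_scratch set to false). Assumes arrays are sorted by 'first'
 * and do not overlap. */
static unsigned example_map_temp(const struct example_array *arrays, size_t n,
				 unsigned index, unsigned gpr_base,
				 bool *in_scratch)
{
	unsigned spilled_before = 0;

	assert(arrays != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct example_array *a = &arrays[i];

		if (index < a->first)
			break; /* remaining arrays start after this index */

		if (index <= a->last) {
			if (a->spilled) {
				*in_scratch = true;
				return index - a->first + spilled_before;
			}
			break; /* inside a regular GPR array */
		}

		/* Array lies entirely before the index. */
		if (a->spilled)
			spilled_before += a->last - a->first + 1;
	}

	*in_scratch = false;
	return index - spilled_before + gpr_base;
}
```

The point of subtracting the size of earlier spilled arrays is that the GPR file is compacted: registers that would have held a spilled array are reused by the temps that follow it.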

[Mesa-dev] [PATCH 1/9] r600g: Add scratch ring register defines

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/evergreend.h | 14 ++
 src/gallium/drivers/r600/r600d.h  |  8 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 40ba7c1..2fbb540 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -2021,10 +2021,24 @@
 #define R_0288EC_SQ_LDS_ALLOC_PS 0x000288EC
 #define R_028900_SQ_ESGS_RING_ITEMSIZE   0x00028900
 #define R_028904_SQ_GSVS_RING_ITEMSIZE   0x00028904
+#define R_008C50_SQ_ESTMP_RING_BASE  0x8C50
 #define R_028908_SQ_ESTMP_RING_ITEMSIZE  0x00028908
+#define R_008C54_SQ_ESTMP_RING_SIZE  0x8C54
+#define R_008C58_SQ_GSTMP_RING_BASE  0x8C58
 #define R_02890C_SQ_GSTMP_RING_ITEMSIZE  0x0002890C
+#define R_008C5C_SQ_GSTMP_RING_SIZE  0x8C5C
+#define R_008C60_SQ_VSTMP_RING_BASE  0x8C60
 #define R_028910_SQ_VSTMP_RING_ITEMSIZE  0x00028910
+#define R_008C64_SQ_VSTMP_RING_SIZE  0x8C64
+#define R_008C68_SQ_PSTMP_RING_BASE  0x8C68
 #define R_028914_SQ_PSTMP_RING_ITEMSIZE  0x00028914
+#define R_008C6C_SQ_PSTMP_RING_SIZE  0x8C6C
+#define R_008E10_SQ_LSTMP_RING_BASE  0x8E10
+#define R_028830_SQ_LSTMP_RING_ITEMSIZE  0x00028830
+#define R_008E14_SQ_LSTMP_RING_SIZE  0x8E14
+#define R_008E18_SQ_HSTMP_RING_BASE  0x8E18
+#define R_028834_SQ_HSTMP_RING_ITEMSIZE  0x00028834
+#define R_008E1C_SQ_HSTMP_RING_SIZE  0x8E1C
 #define R_02891C_SQ_GS_VERT_ITEMSIZE 0x0002891C
 #define R_028920_SQ_GS_VERT_ITEMSIZE_1   0x00028920
 #define R_028924_SQ_GS_VERT_ITEMSIZE_2   0x00028924
diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 75d64c1..9155076 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -219,8 +219,12 @@
 #define R_008C4C_SQ_GSVS_RING_SIZE   0x008C4C
 #define R_008C50_SQ_ESTMP_RING_BASE  0x008C50
 #define R_008C54_SQ_ESTMP_RING_SIZE  0x008C54
-#define R_008C50_SQ_GSTMP_RING_BASE  0x008C58
-#define R_008C54_SQ_GSTMP_RING_SIZE  0x008C5C
+#define R_008C58_SQ_GSTMP_RING_BASE  0x008C58
+#define R_008C5C_SQ_GSTMP_RING_SIZE  0x008C5C
+#define R_008C68_SQ_PSTMP_RING_BASE  0x008C68
+#define R_008C6C_SQ_PSTMP_RING_SIZE  0x008C6C
+#define R_008C60_SQ_VSTMP_RING_BASE  0x008C60
+#define R_008C64_SQ_VSTMP_RING_SIZE  0x008C64
 
 #define R_0088C8_VGT_GS_PER_ES   0x0088C8
 #define R_0088CC_VGT_ES_PER_GS   0x0088CC
-- 
2.7.4



[Mesa-dev] [PATCH 4/9] r600g: Support emitting scratch ops

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/eg_asm.c   |  3 ++-
 src/gallium/drivers/r600/r600_asm.c | 25 +++-
 src/gallium/drivers/r600/r600_asm.h | 15 ++
 src/gallium/drivers/r600/r700_asm.c | 39 +
 4 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c 
b/src/gallium/drivers/r600/eg_asm.c
index 46683c1..fa2e1d4 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -104,7 +104,8 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct 
r600_bytecode_cf *cf)

S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |

S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |

S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(cf->output.comp_mask) |
-   
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size);
+   
S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size) |
+   
S_SQ_CF_ALLOC_EXPORT_WORD1_MARK(cf->output.mark);
if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
bc->bytecode[id] |= 
S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
id++;
diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index f85993d..7415543 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1491,6 +1491,9 @@ int cm_bytecode_add_cf_end(struct r600_bytecode *bc)
 /* common to all 3 families */
 static int r600_bytecode_vtx_build(struct r600_bytecode *bc, struct 
r600_bytecode_vtx *vtx, unsigned id)
 {
+   if (r600_isa_fetch(vtx->op)->flags & FF_MEM)
+   return r700_bytecode_fetch_mem_build(bc, vtx, id);
+
bc->bytecode[id] = S_SQ_VTX_WORD0_BUFFER_ID(vtx->buffer_id) |
S_SQ_VTX_WORD0_FETCH_TYPE(vtx->fetch_type) |
S_SQ_VTX_WORD0_SRC_GPR(vtx->src_gpr) |
@@ -2127,7 +2130,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
o += print_swizzle(7);
}
 
-   if (cf->output.type == 
V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND)
+   if (cf->output.type == 
V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND ||
+   cf->output.type == 3 
/*V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND_ACK */)
o += fprintf(stderr, " R%d", 
cf->output.index_gpr);
 
o += print_indent(o, 67);
@@ -2139,6 +2143,10 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
fprintf(stderr, "NO_BARRIER ");
if (cf->end_of_program)
fprintf(stderr, "EOP ");
+
+   if (cf->output.mark)
+   fprintf(stderr, "MARK ");
+
fprintf(stderr, "\n");
} else {
fprintf(stderr, "%04d %08X %08X  %s ", id, 
bc->bytecode[id],
@@ -2270,6 +2278,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 
o += fprintf(stderr, ", R%d.", vtx->src_gpr);
o += print_swizzle(vtx->src_sel_x);
+   if (r600_isa_fetch(vtx->op)->flags & FF_MEM)
+   o += print_swizzle(vtx->src_sel_y);
 
if (vtx->offset)
fprintf(stderr, " +%db", vtx->offset);
@@ -2286,6 +2296,19 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
if (bc->chip_class >= EVERGREEN && 
vtx->buffer_index_mode)
fprintf(stderr, "SQ_%s ", 
index_mode[vtx->buffer_index_mode]);
 
+   if (r600_isa_fetch(vtx->op)->flags & FF_MEM) {
+   if (vtx->uncached)
+   fprintf(stderr, "UNCACHED ");
+   if (vtx->indexed)
+   fprintf(stderr, "INDEXED:%d ", 
vtx->indexed);
+
+   fprintf(stderr, "ELEM_SIZE:%d ", 
vtx->elem_size);
+   if (vtx->burst_count)
+   fprintf(stderr, "BURST_COUNT:%d ", 
vtx->burst_co

[Mesa-dev] [PATCH 2/9] r600g: Add instruction encoding defines for MEM_RD

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r700_sq.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/gallium/drivers/r600/r700_sq.h 
b/src/gallium/drivers/r600/r700_sq.h
index d881012..81e0e7a 100644
--- a/src/gallium/drivers/r600/r700_sq.h
+++ b/src/gallium/drivers/r600/r700_sq.h
@@ -543,4 +543,34 @@
 #define   G_SQ_TEX_WORD2_SRC_SEL_W(x)(((x) >> 
29) & 0x7)
 #define   C_SQ_TEX_WORD2_SRC_SEL_W   0x1FFF
 
+#define P_SQ_MEM_RD_WORD0
+#define   S_SQ_MEM_RD_WORD0_MEM_INST(x)  (((x) & 
0x1F) << 0)
+#define   S_SQ_MEM_RD_WORD0_ELEM_SIZE(x) (((x) & 
0x3) << 5)
+#define   S_SQ_MEM_RD_WORD0_FETCH_WHOLE_QUAD(x)  (((x) & 
0x1) << 7)
+#define   S_SQ_MEM_RD_WORD0_MEM_OP(x)(((x) & 
0x7) << 8)
+#define   S_SQ_MEM_RD_WORD0_UNCACHED(x)  (((x) & 
0x1) << 11)
+#define   S_SQ_MEM_RD_WORD0_INDEXED(x)   (((x) & 
0x1) << 12)
+#define   S_SQ_MEM_RD_WORD0_SRC_SEL_Y(x) (((x) & 
0x3) << 13)
+#define   S_SQ_MEM_RD_WORD0_SRC_GPR(x)   (((x) & 
0x7F) << 16)
+#define   S_SQ_MEM_RD_WORD0_SRC_REL(x)   (((x) & 
0x1) << 23)
+#define   S_SQ_MEM_RD_WORD0_SRC_SEL_X(x) (((x) & 
0x3) << 24)
+#define   S_SQ_MEM_RD_WORD0_BURST_COUNT(x)   (((x) & 
0xF) << 26)
+#define   S_SQ_MEM_RD_WORD0_LDS_REQ(x)   (((x) & 
0x1) << 30)
+#define   S_SQ_MEM_RD_WORD0_COALESCED_READ(x)(((x) & 
0x1) << 31)
+#define P_SQ_MEM_RD_WORD1
+#define   S_SQ_MEM_RD_WORD1_DST_GPR(x)   (((x) & 
0x7f) << 0)
+#define   S_SQ_MEM_RD_WORD1_DST_REL(x)   (((x) & 
0x1) << 7)
+#define   S_SQ_MEM_RD_WORD1_DST_SEL_X(x) (((x) & 
0x7) << 9)
+#define   S_SQ_MEM_RD_WORD1_DST_SEL_Y(x) (((x) & 
0x7) << 12)
+#define   S_SQ_MEM_RD_WORD1_DST_SEL_Z(x) (((x) & 
0x7) << 15)
+#define   S_SQ_MEM_RD_WORD1_DST_SEL_W(x) (((x) & 
0x7) << 18)
+#define   S_SQ_MEM_RD_WORD1_DATA_FORMAT(x)   (((x) & 
0x3F) << 22)
+#define   S_SQ_MEM_RD_WORD1_NUM_FORMAT_ALL(x)(((x) & 
0x3) << 28)
+#define   S_SQ_MEM_RD_WORD1_FORMAT_COMP_ALL(x)   (((x) & 
0x1) << 30)
+#define   S_SQ_MEM_RD_WORD1_SRF_MODE_ALL(x)  (((x) & 
0x1) << 31)
+#define P_SQ_MEM_RD_WORD2
+#define   S_SQ_MEM_RD_WORD2_ARRAY_BASE(x)(((x) & 
0x1FFF) << 0)
+#define   S_SQ_MEM_RD_WORD2_ENDIAN_SWAP(x)   (((x) & 
0x3) << 16)
+#define   S_SQ_MEM_RD_WORD2_ARRAY_SIZE(x)(((x) & 
0xFFF) << 20)
+
 #endif
-- 
2.7.4
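
As a small illustration of how such S_ field setters are typically combined into one instruction dword, consider the sketch below. The four macros are copied from the patch above; the helper example_pack_mem_rd_word0 and the field values used with it are made up for the example and do not describe a real instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Field setters copied from the patch above (a subset of WORD0). */
#define S_SQ_MEM_RD_WORD0_MEM_INST(x)    (((x) & 0x1F) << 0)
#define S_SQ_MEM_RD_WORD0_ELEM_SIZE(x)   (((x) & 0x3) << 5)
#define S_SQ_MEM_RD_WORD0_SRC_GPR(x)     (((x) & 0x7F) << 16)
#define S_SQ_MEM_RD_WORD0_BURST_COUNT(x) (((x) & 0xF) << 26)

/* Hypothetical helper: each setter masks its argument to the field width
 * and shifts it into place, so the fields can simply be OR-ed together. */
static uint32_t example_pack_mem_rd_word0(uint32_t mem_inst, uint32_t elem_size,
					  uint32_t src_gpr, uint32_t burst_count)
{
	return S_SQ_MEM_RD_WORD0_MEM_INST(mem_inst) |
	       S_SQ_MEM_RD_WORD0_ELEM_SIZE(elem_size) |
	       S_SQ_MEM_RD_WORD0_SRC_GPR(src_gpr) |
	       S_SQ_MEM_RD_WORD0_BURST_COUNT(burst_count);
}
```

Because every setter masks before shifting, an out-of-range value is truncated to its field rather than spilling into a neighbouring one.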



[Mesa-dev] [PATCH 3/9] r600g: Add defines for per-shader engine settings

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600d.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 9155076..0d04708 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -3777,6 +3777,12 @@
 #define SQ_TEX_INST_SAMPLE_C_G_LB  0x1E
 #define SQ_TEX_INST_SAMPLE_C_G_LZ  0x1F
 
+#define EG_0802C_GRBM_GFX_INDEX0x802C
+#define   S_0802C_INSTANCE_INDEX(x)  (((x) 
& 0x) << 0)
+#define   S_0802C_SE_INDEX(x)(((x) 
& 0x3fff) << 16)
+#define   S_0802C_INSTANCE_BROADCAST_WRITES(x)   (((x) & 0x1) << 30)
+#define   S_0802C_SE_BROADCAST_WRITES(x) (((x) & 0x1) 
<< 31)
+
 #define CM_R_028AA8_IA_MULTI_VGT_PARAM0x028AA8
 #define   S_028AA8_PRIMGROUP_SIZE(x)   (((unsigned)(x) & 
0x) << 0)
 #define   G_028AA8_PRIMGROUP_SIZE(x)   (((x) >> 0) & 0x)
-- 
2.7.4



[Mesa-dev] r600g: Support spilling temp arrays

2017-03-05 Thread Glenn Kennard
This patch series implements support for spilling temporary arrays on
R6xx/R7xx/Evergreen/NI if hardware GPR limits are exceeded. It opts for a
simple pessimistic scheme of spilling the largest arrays until things fit.

This fixes some subset of issues where "GPR limit exceeded" or "TGSI
translation error" is printed to the console.

Exercises left to reader:
* Test on R600/R700, I suspect R600 in particular might need some additional
  fixups for write masking in tgsi_src().
* Implement support for spilling regular TGSI temps. Most of the
  infrastructure needed is in this patch series so should be straightforward.
  This would fix the remaining GPR limit exceeded issues.
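
The pessimistic scheme described above (spill the largest array, repeat until things fit) can be sketched on its own as follows. This is a simplified illustration, not the code from the series: example_choose_spills and its parameters are hypothetical names, and the register limit is passed in by the caller rather than taken from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Greedy sketch: spill the largest unspilled array until the register
 * count fits under gpr_limit or no arrays remain. Returns the number of
 * vec4 slots moved to scratch. All names here are illustrative. */
static unsigned example_choose_spills(const unsigned *sizes, bool *spilled,
				      size_t n, unsigned *regs_used,
				      unsigned gpr_limit)
{
	unsigned scratch = 0;

	while (*regs_used > gpr_limit) {
		size_t best = n; /* n means "nothing left to spill" */

		for (size_t i = 0; i < n; i++) {
			if (!spilled[i] && (best == n || sizes[i] > sizes[best]))
				best = i;
		}
		if (best == n)
			break;

		spilled[best] = true;
		assert(*regs_used >= sizes[best]);
		*regs_used -= sizes[best];
		scratch += sizes[best];
	}
	return scratch;
}
```

Picking the largest array first keeps the number of spilled arrays small, at the cost of possibly spilling more registers than strictly necessary, which is why the scheme is called pessimistic.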




[Mesa-dev] [PATCH 5/9] r600g: Add pending output function

2017-03-05 Thread Glenn Kennard
Spills have to happen after the VLIW bundle currently
processed, so defer emitting the spill op.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_asm.c | 18 ++
 src/gallium/drivers/r600/r600_asm.h |  4 
 2 files changed, 22 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 7415543..69bd0d6 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -235,6 +235,15 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
return 0;
 }
 
+int r600_bytecode_add_pending_output(struct r600_bytecode *bc,
+   const struct r600_bytecode_output *output)
+{
+   assert(bc->n_pending_outputs + 1 < ARRAY_SIZE(bc->pending_outputs));
+   bc->pending_outputs[bc->n_pending_outputs++] = *output;
+
+   return 0;
+}
+
 /* alu instructions that can ony exits once per group */
 static int is_alu_once_inst(struct r600_bytecode *bc, struct r600_bytecode_alu 
*alu)
 {
@@ -1304,6 +1313,15 @@ int r600_bytecode_add_alu_type(struct r600_bytecode *bc,
if (nalu->dst.rel && bc->r6xx_nop_after_rel_dst)
insert_nop_r6xx(bc);
 
+	/* Might need to insert spill write ops after current clause */
+	if (nalu->last && bc->n_pending_outputs) {
+		while (bc->n_pending_outputs) {
+			r = r600_bytecode_add_output(bc, &bc->pending_outputs[--bc->n_pending_outputs]);
+			if (r)
+				return r;
+		}
+	}
+
return 0;
 }
 
diff --git a/src/gallium/drivers/r600/r600_asm.h 
b/src/gallium/drivers/r600/r600_asm.h
index 87a7c3a..df46db7 100644
--- a/src/gallium/drivers/r600/r600_asm.h
+++ b/src/gallium/drivers/r600/r600_asm.h
@@ -261,6 +261,8 @@ struct r600_bytecode {
unsignedindex_reg[2]; /* indexing register CF_INDEX_[01] */
unsigneddebug_id;
struct r600_isa* isa;
+   struct r600_bytecode_output pending_outputs[5];
+   int n_pending_outputs;
 };
 
 /* eg_asm.c */
@@ -285,6 +287,8 @@ int r600_bytecode_add_gds(struct r600_bytecode *bc,
const struct r600_bytecode_gds *gds);
 int r600_bytecode_add_output(struct r600_bytecode *bc,
const struct r600_bytecode_output *output);
+int r600_bytecode_add_pending_output(struct r600_bytecode *bc,
+   const struct r600_bytecode_output *output);
 int r600_bytecode_build(struct r600_bytecode *bc);
 int r600_bytecode_add_cf(struct r600_bytecode *bc);
 int r600_bytecode_add_cfinst(struct r600_bytecode *bc,
-- 
2.7.4
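
The deferral described in the commit message of this patch can be modelled with a tiny stand-alone sketch. It is an illustration under assumptions, not the driver code: example_queue, example_add_pending and example_add_alu are hypothetical names, and plain integers stand in for output ops.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_MAX_PENDING 5
#define EXAMPLE_MAX_EMITTED 16

struct example_queue {
	int    pending[EXAMPLE_MAX_PENDING]; /* ops waiting for end of group */
	size_t n_pending;
	int    emitted[EXAMPLE_MAX_EMITTED]; /* ops actually written out */
	size_t n_emitted;
};

/* Defer an output op until the current ALU group is complete.
 * Returns 0 on success, -1 if the pending list is full. */
static int example_add_pending(struct example_queue *q, int op)
{
	if (q->n_pending >= EXAMPLE_MAX_PENDING)
		return -1;
	q->pending[q->n_pending++] = op;
	return 0;
}

/* Called for each ALU instruction; 'last' marks the end of a group.
 * Pending ops are flushed only then, popped from the end of the list. */
static void example_add_alu(struct example_queue *q, int last)
{
	if (!last)
		return;
	while (q->n_pending) {
		assert(q->n_emitted < EXAMPLE_MAX_EMITTED);
		q->emitted[q->n_emitted++] = q->pending[--q->n_pending];
	}
}
```

Like the loop in the patch, the sketch drains the list from the end, so pending ops come out in reverse order of insertion.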



[Mesa-dev] [PATCH 7/9] r600g/sb: Support scratch ops

2017-03-05 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_bc.h   | 11 ++
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp | 46 -
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp | 49 ++-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp| 15 
 src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc | 36 
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 2c662ac..74c8699 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -580,6 +580,15 @@ struct bc_fetch {
unsigned mega_fetch:1;
 
unsigned src2_gpr:7; /* for GDS */
+
+   /* for MEM ops */
+   unsigned elem_size:2;
+   unsigned uncached:1;
+   unsigned indexed:1;
+   unsigned burst_count:4;
+   unsigned array_base:13;
+   unsigned array_size:12;
+
void set_op(unsigned op) { this->op = op; op_ptr = r600_isa_fetch(op); }
 };
 
@@ -747,6 +756,7 @@ private:
 
int decode_fetch_vtx(unsigned &i, bc_fetch &bc);
int decode_fetch_gds(unsigned &i, bc_fetch &bc);
+   int decode_fetch_mem(unsigned &i, bc_fetch &bc);
 };
 
 // bytecode format definition
@@ -966,6 +976,7 @@ private:
int build_fetch_clause(cf_node *n);
int build_fetch_tex(fetch_node *n);
int build_fetch_vtx(fetch_node *n);
+   int build_fetch_mem(fetch_node* n);
 };
 
 } // namespace r600_sb
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index b0df3d9..678844c 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -129,7 +129,9 @@ int bc_builder::build_fetch_clause(cf_node* n) {
I != E; ++I) {
fetch_node *f = static_cast<fetch_node*>(*I);
 
-   if (f->bc.op_ptr->flags & FF_VTX)
+   if (f->bc.op_ptr->flags & FF_MEM)
+   build_fetch_mem(f);
+   else if (f->bc.op_ptr->flags & FF_VTX)
build_fetch_vtx(f);
else
build_fetch_tex(f);
@@ -657,4 +659,46 @@ int bc_builder::build_fetch_vtx(fetch_node* n) {
return 0;
 }
 
+int bc_builder::build_fetch_mem(fetch_node* n) {
+   const bc_fetch &bc = n->bc;
+   const fetch_op_info *fop = bc.op_ptr;
+
+   assert(fop->flags & FF_MEM);
+
+   bb << MEM_RD_WORD0_R7EGCM()
+   .MEM_INST(2)
+   .ELEM_SIZE(bc.elem_size)
+   .FETCH_WHOLE_QUAD(bc.fetch_whole_quad)
+   .MEM_OP(0)
+   .UNCACHED(bc.uncached)
+   .INDEXED(bc.indexed)
+   .SRC_SEL_Y(bc.src_sel[1])
+   .SRC_GPR(bc.src_gpr)
+   .SRC_REL(bc.src_rel)
+   .SRC_SEL_X(bc.src_sel[0])
+   .BURST_COUNT(bc.burst_count)
+   .LDS_REQ(bc.lds_req)
+   .COALESCED_READ(bc.coalesced_read);
+
+   bb << MEM_RD_WORD1_R7EGCM()
+   .DST_GPR(bc.dst_gpr)
+   .DST_REL(bc.dst_rel)
+   .DST_SEL_X(bc.dst_sel[0])
+   .DST_SEL_Y(bc.dst_sel[1])
+   .DST_SEL_Z(bc.dst_sel[2])
+   .DST_SEL_W(bc.dst_sel[3])
+   .DATA_FORMAT(bc.data_format)
+   .NUM_FORMAT_ALL(bc.num_format_all)
+   .FORMAT_COMP_ALL(bc.format_comp_all)
+   .SRF_MODE_ALL(bc.srf_mode_all);
+
+   bb << MEM_RD_WORD2_R7EGCM()
+   .ARRAY_BASE(bc.array_base)
+   .ENDIAN_SWAP(bc.endian_swap)
+   .ARR_SIZE(bc.array_size);
+
+   bb << 0;
+   return 0;
+}
+
 }
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 8712abe..1c63c38 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -413,7 +413,9 @@ int bc_decoder::decode_fetch(unsigned & i, bc_fetch& bc) {
if (fetch_opcode == 2) { // MEM_INST_MEM
unsigned mem_op = (dw0 >> 8) & 0x7;
unsigned gds_op;
-   if (mem_op == 4) {
+   if (mem_op == 0 || mem_op == 2) {
+   fetch_opcode = mem_op == 0 ? FETCH_OP_READ_SCRATCH : FETCH_OP_READ_MEM;
+   } else if (mem_op == 4) {
gds_op = (dw1 >> 9) & 0x1f;
fetch_opcode = FETCH_OP_GDS_ADD + gds_op;
} else if (mem_op == 5)
@@ -422,6 +424,9 @@ int bc_decoder::decode_fetch(unsigned & i, bc_fetch& bc) {
} else
bc.set_op(r600_isa_fetch_by_opcode(ctx.isa, fetch_opcode));
 
+   if (bc.op_ptr->flags & FF_MEM)
+   return decode_fetch_mem(i, bc);
+
  

Re: [Mesa-dev] [PATCH 1/1] r600: Enable FMA on chips that support it

2016-06-15 Thread Glenn Kennard

On Wed, 15 Jun 2016 20:13:13 +0200, Jan Vesely  wrote:


Signed-off-by: Jan Vesely 
---
Untested (I don't have the required hw)

 src/gallium/drivers/r600/r600_pipe.c   | 5 -
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index a49b00f..49c3e1d 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -548,7 +548,6 @@ static int r600_get_shader_param(struct pipe_screen* 
pscreen, unsigned shader, e
return 0;
case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
-   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS:
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
return 0;
@@ -558,6 +557,10 @@ static int r600_get_shader_param(struct pipe_screen* 
pscreen, unsigned shader, e
 *https://bugs.freedesktop.org/show_bug.cgi?id=86720
 */
return 255;
+   case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED:
+   // Enable on CYPRESS(EG) and CAYMAN(NI)
+   return rscreen->b.family == CHIP_CYPRESS ||
+  rscreen->b.family == CHIP_CAYMAN;
}
return 0;
 }
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 101f666..35019e3 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -8917,7 +8917,7 @@ static const struct r600_shader_tgsi_instruction 
r600_shader_tgsi_instruction[]
[TGSI_OPCODE_MAD]   = { ALU_OP3_MULADD, tgsi_op3},
[TGSI_OPCODE_SUB]   = { ALU_OP2_ADD, tgsi_op2},
[TGSI_OPCODE_LRP]   = { ALU_OP0_NOP, tgsi_lrp},
-   [TGSI_OPCODE_FMA]   = { ALU_OP0_NOP, tgsi_unsupported},
+   [TGSI_OPCODE_FMA]   = { ALU_OP3_FMA, tgsi_op3},
[TGSI_OPCODE_SQRT]  = { ALU_OP1_SQRT_IEEE, 
tgsi_trans_srcx_replicate},
[TGSI_OPCODE_DP2A]  = { ALU_OP0_NOP, tgsi_unsupported},
[22]= { ALU_OP0_NOP, tgsi_unsupported},


You probably meant to add the opcode to the eg_shader_tgsi_instruction and 
cm_shader_tgsi_instruction opcode tables rather than the R600/R700 one?


I'll also note in passing that FMA on CYPRESS/HEMLOCK has an issue rate of
4/cycle versus 5/cycle for MULADD, since FMA cannot be issued in the 't' slot.
This may or may not affect performance, depending on whether the GLSL front end
decides to use fma for mul+add operations. On Cayman/Aruba both have the same
rate.
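
To put a rough number on the issue-rate difference, here is a minimal sketch.
It assumes a stream of fully independent ops and ignores every other
scheduling constraint; groups_needed() is a hypothetical helper written for
this note, not a Mesa function.

```c
#include <assert.h>

/* Hypothetical helper, not part of Mesa: number of ALU instruction groups
 * needed to issue n_ops independent ops when at most slots_per_group of
 * them fit into one group. */
static unsigned groups_needed(unsigned n_ops, unsigned slots_per_group)
{
	assert(slots_per_group != 0);
	/* Integer ceiling division without an intermediate sum. */
	return n_ops / slots_per_group + (n_ops % slots_per_group != 0);
}
```

With 20 independent ops, groups_needed(20, 5) gives 4 groups for MULADD
(x, y, z, w and t slots) and groups_needed(20, 4) gives 5 groups for FMA
(no 't' slot), so the worst case under these assumptions is a 25% longer ALU
clause.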


/Glenn


Re: [Mesa-dev] [PATCH 2/2] st/mesa: add GL_ARB_shader_atomic_counter_ops support

2016-03-10 Thread Glenn Kennard

On Thu, 10 Mar 2016 18:13:03 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote:


On Thu, Mar 10, 2016 at 12:04 PM, Glenn Kennard <glenn.kenn...@gmail.com> wrote:

On Thu, 10 Mar 2016 17:02:15 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote:


On Thu, Mar 10, 2016 at 10:57 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:


-   if (c->MaxCombinedAtomicBuffers > 0)
+   if (c->MaxCombinedAtomicBuffers > 0) {
extensions->ARB_shader_atomic_counters = GL_TRUE;
+  extensions->ARB_shader_atomic_counter_ops = GL_TRUE;
+   }




I believe there's pre-GCN AMD hardware which can support atomic counters but
not atomic_counter_ops (at least according to what the closed driver
exposes, I haven't actually checked the docs), so there should probably be a
capability flag here.



I assumed this was due to laziness... seems odd if the SSBO atomic ops
can be supported, but those same ops can't be supported on atomic
buffers. Glenn / Dave - do you guys happen to know what the pre-GCN hw
is capable of?

  -ilia



AFAIK Cayman supports atomic counter ops on SSBOs, evergreen only on counter
buffers, and earlier hardware does neither.


To phrase this a different way, my patch is fine? :) If you support
atomic counters, you support all the various ops in
ARB_shader_atomic_counter_ops (which are basically all the SSBO ops,
but on atomic counters)?



I think so, though the closed driver exposes ARB_shader_atomic_counter_ops only
on Cayman, which may be a hint of something. We can cross that bridge when we
get there...

/Glenn


Re: [Mesa-dev] [PATCH] r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.

2016-03-10 Thread Glenn Kennard

The patch makes a bit more sense to me after realizing a fallthrough was 
changed to a break, so the whole patch is

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH 2/2] st/mesa: add GL_ARB_shader_atomic_counter_ops support

2016-03-10 Thread Glenn Kennard

On Thu, 10 Mar 2016 17:02:15 +0100, Ilia Mirkin  wrote:


On Thu, Mar 10, 2016 at 10:57 AM, Nicolai Hähnle  wrote:

-   if (c->MaxCombinedAtomicBuffers > 0)
+   if (c->MaxCombinedAtomicBuffers > 0) {
extensions->ARB_shader_atomic_counters = GL_TRUE;
+  extensions->ARB_shader_atomic_counter_ops = GL_TRUE;
+   }



I believe there's pre-GCN AMD hardware which can support atomic counters but
not atomic_counter_ops (at least according to what the closed driver
exposes, I haven't actually checked the docs), so there should probably be a
capability flag here.


I assumed this was due to laziness... seems odd if the SSBO atomic ops
can be supported, but those same ops can't be supported on atomic
buffers. Glenn / Dave - do you guys happen to know what the pre-GCN hw
is capable of?

  -ilia



AFAIK Cayman supports atomic counter ops on SSBOs, evergreen only on counter
buffers, and earlier hardware does neither.


/Glenn


Re: [Mesa-dev] [PATCH] r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.

2016-03-09 Thread Glenn Kennard

On Wed, 09 Mar 2016 09:58:48 +0100, Xavier B <xavi...@gmail.com> wrote:


From: xavier <xavi...@gmail.com>

Previously it was doing this transformation for a Trine 3 shader:
 MUL R6.x.12, R13.x.23, 0.5|3f000000
-MULADD R4.x.12, -R6.x.12, 2|40000000, 1|3f800000
+MULADD R4.x.12, -R13.x.23, -1|bf800000, 1|3f800000

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412
Signed-off-by: Xavier Bouchoux <xavi...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_expr.cpp | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp 
b/src/gallium/drivers/r600/sb/sb_expr.cpp
index 556a05d..3dd3a48 100644
--- a/src/gallium/drivers/r600/sb/sb_expr.cpp
+++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
@@ -598,9 +598,13 @@ bool expr_handler::fold_assoc(alu_node *n) {
unsigned op = n->bc.op;
bool allow_neg = false, cur_neg = false;
+   bool distribute_neg = false;
switch(op) {
case ALU_OP2_ADD:
+   distribute_neg = true;



+   allow_neg = true;


I'm not sure this change belongs in this patch, or even if it's correct.


+   break;
case ALU_OP2_MUL:
case ALU_OP2_MUL_IEEE:
allow_neg = true;
@@ -632,7 +636,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
if (v1->is_const()) {
literal arg = v1->get_const_value();
apply_alu_src_mod(a->bc, 1, arg);
-   if (cur_neg)
+   if (cur_neg && distribute_neg)
arg.f = -arg.f;
if (a == n)
@@ -660,7 +664,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
if (v0->is_const()) {
literal arg = v0->get_const_value();
apply_alu_src_mod(a->bc, 0, arg);
-   if (cur_neg)
+   if (cur_neg && distribute_neg)
arg.f = -arg.f;
if (last_arg == 0) {



With the allow_neg change removed, patch is
Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
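
For reference, the miscompile shown in the commit message can be checked
numerically. This is only a sketch that models the three instruction
sequences as plain float arithmetic, based on the before/after lines quoted
above; the function names are made up for illustration and none of this is
the sb optimizer code.

```c
#include <assert.h>

/* Original sequence: R6 = R13 * 0.5; R4 = (-R6) * 2 + 1 */
static float unfolded(float r13)
{
	float r6 = r13 * 0.5f;
	return -r6 * 2.0f + 1.0f;
}

/* Folded form from the commit message: the source keeps its neg modifier
 * and the folded constant (0.5 * 2 = 1) was negated as well. */
static float folded_with_distributed_neg(float r13)
{
	return -r13 * -1.0f + 1.0f;
}

/* Folded form with the neg left on the source only. */
static float folded_without_distributed_neg(float r13)
{
	return -r13 * 1.0f + 1.0f;
}
```

For r13 = 3 the original sequence and the fold without the extra negation
both give -2, while the form with the negated constant gives 4, so the sign
ends up applied twice in the multiplication case.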


[Mesa-dev] [PATCH] r600g: Add support for PK2H/UP2H

2016-01-03 Thread Glenn Kennard
Based off of Ilia's original patch, but with output values replicated so
that it matches the TGSI semantics.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_pipe.c   |   2 +-
 src/gallium/drivers/r600/r600_shader.c | 107 +++--
 2 files changed, 104 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index d71082f..3b5d26c 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -328,6 +328,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_TEXTURE_QUERY_LOD:
case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
case PIPE_CAP_SAMPLER_VIEW_TARGET:
+   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
return family >= CHIP_CEDAR ? 1 : 0;
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return family >= CHIP_CEDAR ? 4 : 0;
@@ -349,7 +350,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_SHAREABLE_SHADERS:
case PIPE_CAP_CLEAR_TEXTURE:
case PIPE_CAP_DRAW_PARAMETERS:
-   case PIPE_CAP_TGSI_PACK_HALF_FLOAT:
return 0;
 
case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 9c040ae..7b1eade 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -8960,6 +8960,105 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
return 0;
 }
 
+static int tgsi_pk2h(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int r, i;
+   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+   /* temp.xy = f32_to_f16(src) */
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_FLT32_TO_FLT16;
+   alu.dst.chan = 0;
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = 1;
+   r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   alu.dst.chan = 1;
+   r600_bytecode_src(&alu.src[0], &ctx->src[0], 1);
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+
+   /* dst.x = temp.y * 0x10000 + temp.x */
+   for (i = 0; i < lasti + 1; i++) {
+   if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+   continue;
+
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP3_MULADD_UINT24;
+   alu.is_op3 = 1;
+   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+   alu.last = i == lasti;
+   alu.src[0].sel = ctx->temp_reg;
+   alu.src[0].chan = 1;
+   alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
+   alu.src[1].value = 0x10000;
+   alu.src[2].sel = ctx->temp_reg;
+   alu.src[2].chan = 0;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
+static int tgsi_up2h(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int r, i;
+   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+   /* temp.x = src.x */
+   /* note: no need to mask out the high bits */
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP1_MOV;
+   alu.dst.chan = 0;
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = 1;
+   r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+
+   /* temp.y = src.x >> 16 */
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP2_LSHR_INT;
+   alu.dst.chan = 1;
+   alu.dst.sel = ctx->temp_reg;
+   alu.dst.write = 1;
+   r600_bytecode_src(&alu.src[0], &ctx->src[0], 0);
+   alu.src[1].sel = V_SQ_ALU_SRC_LITERAL;
+   alu.src[1].value = 16;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(ctx->bc, &alu);
+   if (r)
+   return r;
+
+   /* dst.wz = dst.xy = f16_to_f32(temp.xy) */
+   for (i = 0; i < lasti + 1; i++) {
+   if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+   continue;
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+   alu.op = ALU_OP1_FLT16_TO_FLT32;
+   alu.src[0].sel = ctx->temp_reg;
+   alu.src[0].chan = i % 2;
+   alu.last = i == las

Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition

2015-12-17 Thread Glenn Kennard

On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery  wrote:


From: Nanley Chery 

Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
DIV_ROUND_UP() definitions to one, but chose the more
compute-intensive version in the process. Use the simpler version
instead. Reduces .text size by 1360 bytes.

Output of `size lib/i965_dri.so`:
   text    data     bss     dec     hex filename
7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)

Cc: Axel Davy 
Signed-off-by: Nanley Chery 
---
 src/util/macros.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/util/macros.h b/src/util/macros.h
index 0c8958f..53a98a0 100644
--- a/src/util/macros.h
+++ b/src/util/macros.h
@@ -211,6 +211,6 @@ do {   \
 #endif
/** Compute ceiling of integer quotient of A divided by B. */
-#define DIV_ROUND_UP( A, B )  ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
+#define DIV_ROUND_UP(A, B)  (((A) + (B) - 1) / (B))
#endif /* UTIL_MACROS_H */


I'll point out that these are not equivalent: the new version can overflow in
(A) + (B) - 1, while the old one cannot. You probably want to check that the
call sites have sufficient checks for that before substituting one for the
other.
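
A small sketch of the difference, using unsigned arithmetic so that the
wraparound is well defined (with signed operands the overflow would be
undefined behavior instead). The macro names carry a suffix only so that both
definitions from the patch can coexist; the wrapper functions are hypothetical
and exist just to pin the operand type.

```c
#include <assert.h>
#include <limits.h>

/* The two definitions from the quoted patch, renamed so they can coexist. */
#define DIV_ROUND_UP_MOD(A, B) ((A) % (B) == 0 ? (A) / (B) : (A) / (B) + 1)
#define DIV_ROUND_UP_ADD(A, B) (((A) + (B) - 1) / (B))

/* Old definition: no intermediate sum, so it cannot wrap for b != 0. */
static unsigned div_round_up_mod(unsigned a, unsigned b)
{
	assert(b != 0);
	return DIV_ROUND_UP_MOD(a, b);
}

/* New definition: a + b - 1 wraps around when a is close to UINT_MAX. */
static unsigned div_round_up_add(unsigned a, unsigned b)
{
	assert(b != 0);
	return DIV_ROUND_UP_ADD(a, b);
}
```

For small inputs both agree, e.g. 10 and 4 give 3 either way. For
a = UINT_MAX - 1 and b = 4 the old form returns UINT_MAX / 4 + 1, while in the
new form the sum wraps to 1 and the result is 0.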


/Glenn


Re: [Mesa-dev] [PATCH 11/53] r600/sb: add support for GDS to the sb decoder/dump.

2015-11-30 Thread Glenn Kennard
OOP, 0);
-   s << ".";
-   for (int k = 0; k < 4; ++k)
-   s << chans[n.bc.dst_sel[k]];
-   s << ", ";
+   if (!gds) {
+   s << "R";
+   print_sel(s, n.bc.dst_gpr, n.bc.dst_rel, INDEX_LOOP, 0);
+   s << ".";
+   for (int k = 0; k < 4; ++k)
+   s << chans[n.bc.dst_sel[k]];
+   s << ", ";
+   }
s << "R";
print_sel(s, n.bc.src_gpr, n.bc.src_rel, INDEX_LOOP, 0);
s << ".";
unsigned vtx = n.bc.op_ptr->flags & FF_VTX;
-   unsigned num_src_comp = vtx ? ctx.is_cayman() ? 2 : 1 : 4;
+   unsigned num_src_comp = gds ? 3 : vtx ? ctx.is_cayman() ? 2 : 1 : 4;
for (unsigned k = 0; k < num_src_comp; ++k)
s << chans[n.bc.src_sel[k]];
@@ -450,9 +453,12 @@ void bc_dump::dump(fetch_node& n) {
s << " + " << n.bc.offset[0] << "b ";
}
-   s << ",   RID:" << n.bc.resource_id;
+   if (!gds)
+   s << ",   RID:" << n.bc.resource_id;
+
+   if (gds) {
-   if (vtx) {
+   } else if (vtx) {
s << "  " << fetch_type[n.bc.fetch_type];
if (!ctx.is_cayman() && n.bc.mega_fetch_count)
s << " MFC:" << n.bc.mega_fetch_count;
diff --git a/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc 
b/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc
index 50f73d7..e775499 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc
+++ b/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc
@@ -541,3 +541,31 @@ BC_FIELD(TEX_WORD2, SRC_SEL_Y,  SSY,   
 25, 23)
 BC_FIELD(TEX_WORD2, SRC_SEL_Z,  SSZ,28, 26)
 BC_FIELD(TEX_WORD2, SRC_SEL_W,  SSW,31, 29)
 BC_FORMAT_END(TEX_WORD2)
+
+BC_FORMAT_BEGIN_HW(MEM_GDS_WORD0, EGCM)
+BC_FIELD(MEM_GDS_WORD0,  MEM_INST,   M_INST, 4, 0)
+BC_FIELD(MEM_GDS_WORD0,  MEM_OP,M_OP,  10, 8)
+BC_FIELD(MEM_GDS_WORD0,  SRC_GPR,S_GPR, 17, 11)
+BC_FIELD(MEM_GDS_WORD0,  SRC_REL,SR,19, 18)
+BC_FIELD(MEM_GDS_WORD0,  SRC_SEL_X,  SSX,   22, 20)
+BC_FIELD(MEM_GDS_WORD0,  SRC_SEL_Y,  SSY,   25, 23)
+BC_FIELD(MEM_GDS_WORD0,  SRC_SEL_Z,  SSZ,   28, 26)
+BC_FORMAT_END(MEM_GDS_WORD0)
+
+BC_FORMAT_BEGIN_HW(MEM_GDS_WORD1, EGCM)
+BC_FIELD(MEM_GDS_WORD1, DST_GPR,D_GPR,  6,  0)
+BC_FIELD(MEM_GDS_WORD1, DST_REL,DR, 8,  7)
+BC_FIELD(MEM_GDS_WORD1, GDS_OP, G_OP,  14,  9)
+BC_FIELD(MEM_GDS_WORD1, SRC_GPR,S_GPR, 22, 16)
+BC_FIELD(MEM_GDS_WORD1, UAV_INDEX_MODE, U_IM,  25, 24)
+BC_FIELD(MEM_GDS_WORD1, UAV_ID, U_ID,  29, 26)
+BC_FIELD(MEM_GDS_WORD1, ALLOC_CONSUME,  AC,30, 30)
+BC_FIELD(MEM_GDS_WORD1, BCARD_FIRST_REQ,BFR,   31, 31)
+BC_FORMAT_END(MEM_GDS_WORD1)
+
+BC_FORMAT_BEGIN_HW(MEM_GDS_WORD2, EGCM)
+BC_FIELD(MEM_GDS_WORD2, DST_SEL_X,  DSX,2, 0)
+BC_FIELD(MEM_GDS_WORD2, DST_SEL_Y,  DSY,5, 3)
+BC_FIELD(MEM_GDS_WORD2, DST_SEL_Z,  DSZ,8, 6)
+BC_FIELD(MEM_GDS_WORD2, DST_SEL_W,  DSW,   11, 9)
+BC_FORMAT_END(MEM_GDS_WORD2)
\ No newline at end of file


With src_rel/dst_rel dealt with as suggested above,

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
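
As a reading aid for the format tables in the quoted patch: assuming the last
two BC_FIELD arguments are the high and low bit of each field (so MEM_OP
occupies bits 10..8 of word 0), packing and unpacking a field comes down to
the following. bc_get_bits() and bc_set_bits() are hypothetical helpers for
illustration, not functions from sb.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] (inclusive) from a 32-bit bytecode dword. */
static uint32_t bc_get_bits(uint32_t dw, unsigned hi, unsigned lo)
{
	assert(hi < 32 && lo <= hi);
	unsigned width = hi - lo + 1;
	/* Avoid shifting by 32, which is undefined for a 32-bit type. */
	uint32_t mask = width == 32 ? UINT32_MAX : (((uint32_t)1 << width) - 1);
	return (dw >> lo) & mask;
}

/* Store value into bits [hi:lo] of dw, leaving all other bits untouched. */
static uint32_t bc_set_bits(uint32_t dw, unsigned hi, unsigned lo,
			    uint32_t value)
{
	assert(hi < 32 && lo <= hi);
	unsigned width = hi - lo + 1;
	uint32_t mask = width == 32 ? UINT32_MAX : (((uint32_t)1 << width) - 1);
	return (dw & ~(mask << lo)) | ((value & mask) << lo);
}
```

Under that assumption, bc_get_bits(dw0, 10, 8) is the same operation as the
(dw0 >> 8) & 0x7 that decode_fetch() uses to read mem_op.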


Re: [Mesa-dev] [PATCH 20/53] r600/sb: add LS/HS hw shader types.

2015-11-30 Thread Glenn Kennard

On Mon, 30 Nov 2015 07:20:29 +0100, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This just adds printing for the hw shader types, and hooks it up.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/sb/sb_bc.h  | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 6 --
 src/gallium/drivers/r600/sb/sb_shader.cpp| 4 +++-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index d2e8da0..34e1e58 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -174,6 +174,8 @@ enum shader_target
TARGET_GS_COPY,
TARGET_COMPUTE,
TARGET_FETCH,
+   TARGET_HS,
+   TARGET_LS,
TARGET_NUM
 };
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 28ebfa2..65aa801 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -58,10 +58,12 @@ int bc_parser::decode() {
switch (bc->type) {
case TGSI_PROCESSOR_FRAGMENT: t = TARGET_PS; break;
case TGSI_PROCESSOR_VERTEX:
-   t = pshader->vs_as_es ? TARGET_ES : TARGET_VS;
+   t = pshader->vs_as_ls ? TARGET_LS : (pshader->vs_as_es ? TARGET_ES : TARGET_VS);
break;
case TGSI_PROCESSOR_GEOMETRY: t = TARGET_GS; break;
case TGSI_PROCESSOR_COMPUTE: t = TARGET_COMPUTE; break;
+   case TGSI_PROCESSOR_TESS_CTRL: t = TARGET_HS; break;
+   case TGSI_PROCESSOR_TESS_EVAL: t = pshader->tes_as_es ? TARGET_ES : TARGET_VS; break;
default: assert(!"unknown shader target"); return -1; break;
}
} else {
@@ -146,7 +148,7 @@ int bc_parser::parse_decls() {
}
}
-   if (sh->target == TARGET_VS || sh->target == TARGET_ES)
+   if (sh->target == TARGET_VS || sh->target == TARGET_ES || sh->target == TARGET_HS)
sh->add_input(0, 1, 0x0F);
else if (sh->target == TARGET_GS) {
sh->add_input(0, 1, 0x0F);
diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp 
b/src/gallium/drivers/r600/sb/sb_shader.cpp
index 87e28e9..8c7b39b 100644
--- a/src/gallium/drivers/r600/sb/sb_shader.cpp
+++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
@@ -215,7 +215,7 @@ void shader::init() {
 void shader::init_call_fs(cf_node* cf) {
unsigned gpr = 0;
-   assert(target == TARGET_VS || target == TARGET_ES);
+   assert(target == TARGET_LS || target == TARGET_VS || target == TARGET_ES);
for(inputs_vec::const_iterator I = inputs.begin(),
E = inputs.end(); I != E; ++I, ++gpr) {
@@ -436,6 +436,8 @@ const char* shader::get_shader_target_name() {
case TARGET_ES: return "ES";
case TARGET_PS: return "PS";
case TARGET_GS: return "GS";
+   case TARGET_HS: return "HS";
+   case TARGET_LS: return "LS";
case TARGET_COMPUTE: return "COMPUTE";
case TARGET_FETCH: return "FETCH";
default:


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH] r600: move per-type settings into a switch statement

2015-11-29 Thread Glenn Kennard

On Mon, 30 Nov 2015 01:38:03 +0100, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This will allow adding tess stuff much more cleanly later.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/r600_shader.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 560197c..019fef7 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1909,13 +1909,21 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
shader->processor_type = ctx.type;
ctx.bc->type = shader->processor_type;
-   if (ctx.type == TGSI_PROCESSOR_VERTEX) {
+   switch (ctx.type) {
+   case TGSI_PROCESSOR_VERTEX:
shader->vs_as_gs_a = key.vs.as_gs_a;
shader->vs_as_es = key.vs.as_es;
+   if (shader->vs_as_es)
+   ring_outputs = true;
+   break;
+   case TGSI_PROCESSOR_GEOMETRY:
+   ring_outputs = true;
+   break;
+   case TGSI_PROCESSOR_FRAGMENT:
+   shader->two_side = key.ps.color_two_side;
+   break;
}
-   ring_outputs = shader->vs_as_es || ctx.type == TGSI_PROCESSOR_GEOMETRY;
-
if (shader->vs_as_es) {
ctx.gs_for_vs = &rctx->gs_shader->current->shader;
} else {
@@ -1936,8 +1944,6 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
shader->nr_ps_color_exports = 0;
shader->nr_ps_max_color_exports = 0;
-   if (ctx.type == TGSI_PROCESSOR_FRAGMENT)
-   shader->two_side = key.ps.color_two_side;
/* register allocations */
    /* Values [0,127] correspond to GPR[0..127].


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH] r600: split out common alu_writes pattern.

2015-11-29 Thread Glenn Kennard

On Mon, 30 Nov 2015 01:18:18 +0100, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This just splits out a common pattern into an inline function
to make things cleaner to read.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/r600_asm.c | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 45824f2..29515f2 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -37,6 +37,11 @@
 #define NUM_OF_CYCLES 3
 #define NUM_OF_COMPONENTS 4
+static inline bool alu_writes(struct r600_bytecode_alu *alu)
+{
+   return alu->dst.write || alu->is_op3;
+}
+
 static inline unsigned int r600_bytecode_get_num_operands(
struct r600_bytecode *bc, struct r600_bytecode_alu *alu)
 {
@@ -592,7 +597,7 @@ static int replace_gpr_with_pv_ps(struct r600_bytecode *bc,
return r;
for (i = 0; i < max_slots; ++i) {
-   if (prev[i] && (prev[i]->dst.write || prev[i]->is_op3) && !prev[i]->dst.rel) {
+   if (prev[i] && alu_writes(prev[i]) && !prev[i]->dst.rel) {
if (is_alu_64bit_inst(bc, prev[i])) {
gpr[i] = -1;
@@ -800,8 +805,8 @@ static int merge_inst_groups(struct r600_bytecode *bc, 
struct r600_bytecode_alu
result[4] = slots[i];
} else if (is_alu_any_unit_inst(bc, prev[i])) {
if (slots[i]->dst.sel == prev[i]->dst.sel &&
-   (slots[i]->dst.write == 1 || slots[i]->is_op3) &&
-   (prev[i]->dst.write == 1 || prev[i]->is_op3))
+   alu_writes(slots[i]) &&
+   alu_writes(prev[i]))
return 0;
result[i] = slots[i];
@@ -816,8 +821,8 @@ static int merge_inst_groups(struct r600_bytecode *bc, 
struct r600_bytecode_alu
if (max_slots == 5 && slots[i] && prev[4] &&
slots[i]->dst.sel == prev[4]->dst.sel &&
slots[i]->dst.chan == prev[4]->dst.chan &&
-   (slots[i]->dst.write == 1 || slots[i]->is_op3) &&
-   (prev[4]->dst.write == 1 || prev[4]->is_op3))
+   alu_writes(slots[i]) &&
+   alu_writes(prev[4]))
return 0;
result[i] = slots[i];
@@ -857,7 +862,7 @@ static int merge_inst_groups(struct r600_bytecode *bc, 
struct r600_bytecode_alu
continue;
for (j = 0; j < max_slots; ++j) {
-   if (!prev[j] || !(prev[j]->dst.write || prev[j]->is_op3))
+   if (!prev[j] || !alu_writes(prev[j]))
continue;
/* If it's relative then we can't determin 
which gpr is really used. */
@@ -1846,7 +1851,7 @@ static int print_dst(struct r600_bytecode_alu *alu)
reg_char = 'T';
}
-   if (alu->dst.write || alu->is_op3) {
+   if (alu_writes(alu)) {
o += fprintf(stderr, "%c", reg_char);
o += print_sel(alu->dst.sel, alu->dst.rel, alu->index_mode, 0);
} else {


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH 1/2] r600: define registers required for tessellation

2015-11-23 Thread Glenn Kennard
ES_FS 0x000288A8
+#define R_0288E8_SQ_LDS_ALLOC 0x000288E8
 #define R_0288EC_SQ_LDS_ALLOC_PS 0x000288EC
 #define R_028900_SQ_ESGS_RING_ITEMSIZE   0x00028900
 #define R_028904_SQ_GSVS_RING_ITEMSIZE   0x00028904
@@ -1997,6 +2043,7 @@
 #define R_028980_ALU_CONST_CACHE_VS_0 0x00028980
 #define R_028984_ALU_CONST_CACHE_VS_1 0x00028984
 #define R_0289C0_ALU_CONST_CACHE_GS_0 0x000289C0
+#define R_028F00_ALU_CONST_CACHE_HS_0 0x00028F00
 #define R_028F40_ALU_CONST_CACHE_LS_0 0x00028F40
 #define R_028A04_PA_SU_POINT_MINMAX  0x00028A04
 #define   S_028A04_MIN_SIZE(x) (((x) & 0xFFFF) << 0)
@@ -2090,6 +2137,36 @@
 #define V_028B54_VS_STAGE_REAL   0x00
 #define V_028B54_VS_STAGE_DS 0x01
 #define V_028B54_VS_STAGE_COPY_SHADER0x02
+#define R_028B58_VGT_LS_HS_CONFIG 0x00028B58
+#define   S_028B58_NUM_PATCHES(x) (((x) & 0xFF) << 0)
+#define   G_028B58_NUM_PATCHES(x) (((x) >> 0) & 0xFF)
+#define   C_028B58_NUM_PATCHES 0xFFFFFF00
+#define   S_028B58_HS_NUM_INPUT_CP(x) (((x) & 0x3F) << 8)
+#define   G_028B58_HS_NUM_INPUT_CP(x) (((x) >> 8) & 0x3F)
+#define   C_028B58_HS_NUM_INPUT_CP 0xFFFFC0FF
+#define   S_028B58_HS_NUM_OUTPUT_CP(x) (((x) & 0x3F) << 14)
+#define   G_028B58_HS_NUM_OUTPUT_CP(x) (((x) >> 14) & 0x3F)
+#define   C_028B58_HS_NUM_OUTPUT_CP 0xFFF03FFF
+#define R_028B5C_VGT_LS_SIZE 0x00028B5C
+#define   S_028B5C_SIZE(x) (((x) & 0xFF) << 0)
+#define   G_028B5C_SIZE(x) (((x) >> 0) & 0xFF)
+#define   C_028B5C_SIZE 0xFFFFFF00
+#define   S_028B5C_PATCH_CP_SIZE(x) (((x) & 0x1FFF) << 8)
+#define   G_028B5C_PATCH_CP_SIZE(x) (((x) >> 8) & 0x1FFF)
+#define   C_028B5C_PATCH_CP_SIZE 0xFFF000FF


C_028B5C_PATCH_CP_SIZE should be 0xFFE000FF; it's 13 bits, not 12.


+#define R_028B60_VGT_HS_SIZE 0x00028B60
+#define   S_028B60_SIZE(x) (((x) & 0xFF) << 0)
+#define   G_028B60_SIZE(x) (((x) >> 0) & 0xFF)
+#define   C_028B60_SIZE 0xFFFFFF00
+#define   S_028B60_PATCH_CP_SIZE(x) (((x) & 0x1FFF) << 8)
+#define   G_028B60_PATCH_CP_SIZE(x) (((x) >> 8) & 0x1FFF)
+#define   C_028B60_PATCH_CP_SIZE 0xFFF000FF


Same here, C_028B60_PATCH_CP_SIZE mask should be 0xFFE000FF
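
The corrected value follows directly from the field width and shift used by
the setter. A minimal sketch, with generic FIELD_* macros that are made up
for this note and are not the evergreend.h definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative macros only; width must be less than 32. */
#define FIELD_MASK(width)            ((UINT32_C(1) << (width)) - 1)
/* Setter: place x into the field (what the S_ macros do). */
#define FIELD_SET(x, width, shift)   (((uint32_t)(x) & FIELD_MASK(width)) << (shift))
/* Getter: read the field back out (what the G_ macros do). */
#define FIELD_GET(reg, width, shift) (((uint32_t)(reg) >> (shift)) & FIELD_MASK(width))
/* Clear mask: every bit set except the field (what the C_ macros hold). */
#define FIELD_CLEAR(width, shift)    ((uint32_t)~(FIELD_MASK(width) << (shift)))
```

PATCH_CP_SIZE uses 0x1FFF << 8, i.e. 13 bits at shift 8, so
FIELD_CLEAR(13, 8) evaluates to 0xFFE000FF. The value in the patch,
0xFFF000FF, is what FIELD_CLEAR(12, 8) produces, which leaves the top bit of
the field uncleared.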


+#define R_028B64_VGT_LS_HS_ALLOC 0x00028B64
+#define   S_028B64_HS_TOTAL_OUTPUT(x) (((x) & 0x1FFF) << 0)
+#define   S_028B64_LS_HS_TOTAL_OUTPUT(x) (((x) & 0x1FFF) << 13)
+#define R_028B68_VGT_HS_PATCH_CONST 0x00028B68
+#define   S_028B68_SIZE(x) (((x) & 0x1FFF) << 0)
+#define   S_028B68_STRIDE(x) (((x) & 0x1FFF) << 13)


No getters/masks for these?


 #define R_028B70_DB_ALPHA_TO_MASK0x00028B70
 #define   S_028B70_ALPHA_TO_MASK_ENABLE(x) (((x) & 0x1) << 0)
 #define   S_028B70_ALPHA_TO_MASK_OFFSET0(x)(((x) & 0x3) << 8)
diff --git a/src/gallium/drivers/r600/r600_sq.h 
b/src/gallium/drivers/r600/r600_sq.h
index 1545cf1..37b6d58 100644
--- a/src/gallium/drivers/r600/r600_sq.h
+++ b/src/gallium/drivers/r600/r600_sq.h
@@ -189,6 +189,14 @@
  * 255  SQ_ALU_SRC_PS: previous scalar result.
  * 448  EG - INTERP SRC BASE
  */
+/* LDS are Evergreen/Cayman only */
+#define EG_V_SQ_ALU_SRC_LDS_OQ_A 0x00DB
+#define EG_V_SQ_ALU_SRC_LDS_OQ_B     0x00DC
+#define EG_V_SQ_ALU_SRC_LDS_OQ_A_POP 0x00DD
+#define EG_V_SQ_ALU_SRC_LDS_OQ_B_POP 0x00DE
+#define EG_V_SQ_ALU_SRC_LDS_DIRECT_A 0x00DF
+#define EG_V_SQ_ALU_SRC_LDS_DIRECT_B 0x00E0
+
 #define V_SQ_ALU_SRC_0   0x00F8
 #define V_SQ_ALU_SRC_1   0x00F9
 #define V_SQ_ALU_SRC_1_INT   0x00FA


With above nits fixed,
Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH 2/2] r600: add missing register to initial state

2015-11-23 Thread Glenn Kennard
eend.h  
b/src/gallium/drivers/r600/evergreend.h

index dbee9d5..c43a987 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -2500,7 +2500,6 @@
 #define CM_R_0286FC_SPI_LDS_MGMT 0x286fc
 #define   S_0286FC_NUM_PS_LDS(x) ((x) & 0xff)
 #define   S_0286FC_NUM_LS_LDS(x) ((x) & 0xff) << 8
-#define CM_R_0288E8_SQ_LDS_ALLOC 0x000288E8
#define CM_R_028804_DB_EQAA  0x00028804
 #define   S_028804_MAX_ANCHOR_SAMPLES(x)   (((x) & 0x7) << 0)


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH] r600g: Support TGSI_SEMANTIC_HELPER_INVOCATION

2015-11-13 Thread Glenn Kennard
On Fri, 13 Nov 2015 18:57:28 +0100, Nicolai Hähnle <nhaeh...@gmail.com> wrote:



On 13.11.2015 00:14, Glenn Kennard wrote:

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Maybe there is a better way to check if a thread is a helper invocation?


Is ctx->face_gpr guaranteed to be initialized when  
load_helper_invocation is called?




allocate_system_value_inputs() sets that if needed, and is called before  
parsing any opcodes.


Aside, I'm not sure I understand correctly what this is supposed to do.
The values you're querying are related to multi-sampling, but my
understanding has always been that helper invocations can also happen
without multi-sampling: you always want to process 2x2 quads of pixels
at a time to be able to compute derivatives for texture sampling. When
the boundary of a primitive intersects such a quad, you get helper
invocations outside the primitive.




Non-MSAA buffers act just like 1-sample buffers with regard to the
coverage mask supplied by the hardware, so helper invocations, which have
no coverage, get a 0 for the mask value, and normal fragments get 1. This
works with the piglit test case posted, at least.
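
As a rough illustration of that reasoning, here is a minimal sketch in plain C. It assumes the coverage mask and sample ID are available as ordinary integers, and that "helper" simply means "the bit for this sample is clear"; the function name is hypothetical and this is not the code emitted by the patch, which builds the equivalent ALU instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function: returns true when the
 * invocation's own sample has no coverage, i.e. the invocation only
 * exists to supply derivatives for its 2x2 quad. */
static bool is_helper_invocation(uint32_t coverage_mask, unsigned sample_id)
{
	assert(sample_id < 32);
	/* Bit `sample_id` of the mask is 1 for a covered sample. */
	return ((coverage_mask >> sample_id) & 1u) == 0;
}
```

For a non-MSAA target the sample ID is always 0, so the test reduces to checking whether the mask itself is 0.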



Cheers,
Nicolai

  src/gallium/drivers/r600/r600_shader.c | 83  
+-

  1 file changed, 72 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c  
b/src/gallium/drivers/r600/r600_shader.c

index 560197c..a227d78 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -530,7 +530,8 @@ static int r600_spi_sid(struct r600_shader_io * io)
name == TGSI_SEMANTIC_PSIZE ||
name == TGSI_SEMANTIC_EDGEFLAG ||
name == TGSI_SEMANTIC_FACE ||
-   name == TGSI_SEMANTIC_SAMPLEMASK)
+   name == TGSI_SEMANTIC_SAMPLEMASK ||
+   name == TGSI_SEMANTIC_HELPER_INVOCATION)
index = 0;
else {
if (name == TGSI_SEMANTIC_GENERIC) {
@@ -734,7 +735,8 @@ static int tgsi_declaration(struct r600_shader_ctx  
*ctx)

case TGSI_FILE_SYSTEM_VALUE:
if (d->Semantic.Name == TGSI_SEMANTIC_SAMPLEMASK ||
d->Semantic.Name == TGSI_SEMANTIC_SAMPLEID ||
-   d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS) {
+   d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS ||
+   d->Semantic.Name == TGSI_SEMANTIC_HELPER_INVOCATION) {
break; /* Already handled from 
allocate_system_value_inputs */
} else if (d->Semantic.Name == TGSI_SEMANTIC_INSTANCEID) {
if (!ctx->native_integers) {
@@ -776,13 +778,14 @@ static int allocate_system_value_inputs(struct  
r600_shader_ctx *ctx, int gpr_off

struct {
boolean enabled;
int *reg;
-   unsigned name, alternate_name;
+   unsigned associated_semantics[3];
} inputs[2] = {
-   { false, &ctx->face_gpr, TGSI_SEMANTIC_SAMPLEMASK, ~0u }, /* lives in Front Face GPR.z */
-
-   { false, &ctx->fixed_pt_position_gpr, TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS } /* SAMPLEID is in Fixed Point Position GPR.w */
+   { false, &ctx->face_gpr, { TGSI_SEMANTIC_SAMPLEMASK /* lives in Front Face GPR.z */,
+   TGSI_SEMANTIC_HELPER_INVOCATION, ~0u } },
+   { false, &ctx->fixed_pt_position_gpr, { TGSI_SEMANTIC_SAMPLEID /* in Fixed Point Position GPR.w */,
+   TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_HELPER_INVOCATION } }
};
-   int i, k, num_regs = 0;
+   int i, k, l, num_regs = 0;

if (tgsi_parse_init(&parse, ctx->tokens) != TGSI_PARSE_OK) {
return 0;
@@ -818,9 +821,11 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off
struct tgsi_full_declaration *d = &parse.FullToken.FullDeclaration;

if (d->Declaration.File == TGSI_FILE_SYSTEM_VALUE) {
for (k = 0; k < Elements(inputs); k++) {
-   if (d->Semantic.Name == inputs[k].name ||
-   d->Semantic.Name == inputs[k].alternate_name) {
-   inputs[k].enabled = true;
+   for (l = 0; l < 3; l++) {
+   if (d->Semantic.Name == inputs[k].associated_semantics[l]) {
+   inputs[k].enabled = true;
+   break;
+   }
}
}
}
@@ -832,7 +837,7 @@ static int allocate_system_value_inputs(struct  
r600_shader_ctx *ctx, int gpr_off

for (i = 0; i < Elem

Re: [Mesa-dev] [PATCH v2 2/3] gallium: add support for gl_HelperInvocation semantic

2015-11-12 Thread Glenn Kennard
On Thu, 12 Nov 2015 18:32:25 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote:



Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
---
 src/gallium/auxiliary/tgsi/tgsi_strings.c  | 1 +
 src/gallium/docs/source/tgsi.rst   | 8 
 src/gallium/include/pipe/p_shader_tokens.h | 3 ++-
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 +++-
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c  
b/src/gallium/auxiliary/tgsi/tgsi_strings.c

index 89369d6..fc29a23 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -95,6 +95,7 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] =
"TESSOUTER",
"TESSINNER",
"VERTICESIN",
+   "HELPER_INVOCATION",
 };
const char *tgsi_texture_names[TGSI_TEXTURE_COUNT] =
diff --git a/src/gallium/docs/source/tgsi.rst  
b/src/gallium/docs/source/tgsi.rst

index 01e18f3..e7b0c2f 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -2941,6 +2941,14 @@ TGSI_SEMANTIC_VERTICESIN
 For tessellation evaluation/control shaders, this semantic label  
indicates the
 number of vertices provided in the input patch. Only the X value is  
defined.

+TGSI_SEMANTIC_HELPER_INVOCATION
+"""""""""""""""""""""""""""""""
+
+For fragment shaders, this semantic indicates whether the current
+invocation is covered or not. Helper invocations are created in order
+to properly compute derivatives, however it may be desirable to skip
+some of the logic in those cases. See ``gl_HelperInvocation``  
documentation.

+
Declaration Interpolate
 ^^^
diff --git a/src/gallium/include/pipe/p_shader_tokens.h  
b/src/gallium/include/pipe/p_shader_tokens.h

index e0ab901..a3137ae 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -185,7 +185,8 @@ struct tgsi_declaration_interp
 #define TGSI_SEMANTIC_TESSOUTER  32 /**< outer tessellation levels */
 #define TGSI_SEMANTIC_TESSINNER  33 /**< inner tessellation levels */
 #define TGSI_SEMANTIC_VERTICESIN 34 /**< number of input vertices */
-#define TGSI_SEMANTIC_COUNT  35 /**< number of semantic values */
+#define TGSI_SEMANTIC_HELPER_INVOCATION 35 /**< current invocation is  
helper */

+#define TGSI_SEMANTIC_COUNT  36 /**< number of semantic values */
struct tgsi_declaration_semantic
 {
diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp  
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp

index b565127..3ad1afd 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4408,7 +4408,7 @@ const unsigned  
_mesa_sysval_to_semantic[SYSTEM_VALUE_MAX] = {

TGSI_SEMANTIC_SAMPLEID,
TGSI_SEMANTIC_SAMPLEPOS,
TGSI_SEMANTIC_SAMPLEMASK,
-   0, /* gl_HelperInvocation */
+   TGSI_SEMANTIC_HELPER_INVOCATION,
   /* Tessellation shaders
 */
@@ -5139,6 +5139,8 @@ st_translate_program(
   TGSI_SEMANTIC_BASEVERTEX);
assert(_mesa_sysval_to_semantic[SYSTEM_VALUE_TESS_COORD] ==
   TGSI_SEMANTIC_TESSCOORD);
+   assert(_mesa_sysval_to_semantic[SYSTEM_VALUE_HELPER_INVOCATION] ==
+  TGSI_SEMANTIC_HELPER_INVOCATION);
   t = CALLOC_STRUCT(st_translate);
if (!t) {


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


[Mesa-dev] [PATCH] r600g: Support TGSI_SEMANTIC_HELPER_INVOCATION

2015-11-12 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Maybe there is a better way to check if a thread is a helper invocation?

 src/gallium/drivers/r600/r600_shader.c | 83 +-
 1 file changed, 72 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 560197c..a227d78 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -530,7 +530,8 @@ static int r600_spi_sid(struct r600_shader_io * io)
name == TGSI_SEMANTIC_PSIZE ||
name == TGSI_SEMANTIC_EDGEFLAG ||
name == TGSI_SEMANTIC_FACE ||
-   name == TGSI_SEMANTIC_SAMPLEMASK)
+   name == TGSI_SEMANTIC_SAMPLEMASK ||
+   name == TGSI_SEMANTIC_HELPER_INVOCATION)
index = 0;
else {
if (name == TGSI_SEMANTIC_GENERIC) {
@@ -734,7 +735,8 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
case TGSI_FILE_SYSTEM_VALUE:
if (d->Semantic.Name == TGSI_SEMANTIC_SAMPLEMASK ||
d->Semantic.Name == TGSI_SEMANTIC_SAMPLEID ||
-   d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS) {
+   d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS ||
+   d->Semantic.Name == TGSI_SEMANTIC_HELPER_INVOCATION) {
break; /* Already handled from 
allocate_system_value_inputs */
} else if (d->Semantic.Name == TGSI_SEMANTIC_INSTANCEID) {
if (!ctx->native_integers) {
@@ -776,13 +778,14 @@ static int allocate_system_value_inputs(struct 
r600_shader_ctx *ctx, int gpr_off
struct {
boolean enabled;
int *reg;
-   unsigned name, alternate_name;
+   unsigned associated_semantics[3];
} inputs[2] = {
-   { false, &ctx->face_gpr, TGSI_SEMANTIC_SAMPLEMASK, ~0u }, /* lives in Front Face GPR.z */
-
-   { false, &ctx->fixed_pt_position_gpr, TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS } /* SAMPLEID is in Fixed Point Position GPR.w */
+   { false, &ctx->face_gpr, { TGSI_SEMANTIC_SAMPLEMASK /* lives in Front Face GPR.z */,
+   TGSI_SEMANTIC_HELPER_INVOCATION, ~0u } },
+   { false, &ctx->fixed_pt_position_gpr, { TGSI_SEMANTIC_SAMPLEID /* in Fixed Point Position GPR.w */,
+   TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_HELPER_INVOCATION } }
};
-   int i, k, num_regs = 0;
+   int i, k, l, num_regs = 0;
 
if (tgsi_parse_init(&parse, ctx->tokens) != TGSI_PARSE_OK) {
return 0;
@@ -818,9 +821,11 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off
struct tgsi_full_declaration *d = &parse.FullToken.FullDeclaration;

if (d->Declaration.File == TGSI_FILE_SYSTEM_VALUE) {
for (k = 0; k < Elements(inputs); k++) {
-   if (d->Semantic.Name == inputs[k].name ||
-   d->Semantic.Name == inputs[k].alternate_name) {
-   inputs[k].enabled = true;
+   for (l = 0; l < 3; l++) {
+   if (d->Semantic.Name == inputs[k].associated_semantics[l]) {
+   inputs[k].enabled = true;
+   break;
+   }
}
}
}
@@ -832,7 +837,7 @@ static int allocate_system_value_inputs(struct 
r600_shader_ctx *ctx, int gpr_off
for (i = 0; i < Elements(inputs); i++) {
boolean enabled = inputs[i].enabled;
int *reg = inputs[i].reg;
-   unsigned name = inputs[i].name;
+   unsigned name = inputs[i].associated_semantics[0];
 
if (enabled) {
int gpr = gpr_offset + num_regs++;
@@ -985,6 +990,56 @@ static int load_sample_position(struct r600_shader_ctx 
*ctx, struct r600_shader_
return t1;
 }
 
+static int load_helper_invocation(struct r600_shader_ctx *ctx,
+   int mask_gpr, int mask_chan, int id_gpr, int id_chan)
+{
+   // sample (mask >> sampleid) & 1
+   struct r600_bytecode_alu alu;
+   int r, t = r600_get_temp(ctx);
+
+   memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+   alu.op = ALU_OP2_LSHR_INT;
+   alu.src[0].sel = mask_gpr;
+   alu.src[0].chan = mask_chan;
+   alu.src[1].sel = id_gpr;
+   alu.src[1].chan = id_chan;
+   alu.dst.sel = t;
+   alu.dst.chan = 0;
+   alu.dst.write = 1;
+   alu.last = 1;
+   r = 

Re: [Mesa-dev] [PATCH] r600: initialised PGM_RESOURCES_2 for ES/GS

2015-11-11 Thread Glenn Kennard

On Wed, 11 Nov 2015 23:42:18 +0100, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This fixes the corruption on rendering that we are seeing in
certain geometry shaders.



Specifically, this fixes  
https://bugs.freedesktop.org/show_bug.cgi?id=91780 and probably others



Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/evergreen_state.c | 4 
 src/gallium/drivers/r600/evergreend.h  | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c  
b/src/gallium/drivers/r600/evergreen_state.c

index c6702a9..a3bbbcc 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2362,6 +2362,8 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx)
    r600_store_context_reg(cb, R_028848_SQ_PGM_RESOURCES_2_PS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
    r600_store_context_reg(cb, R_028864_SQ_PGM_RESOURCES_2_VS, S_028864_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+   r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_2_GS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+   r600_store_context_reg(cb, R_028894_SQ_PGM_RESOURCES_2_ES, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));

r600_store_context_reg(cb, R_0288A8_SQ_PGM_RESOURCES_FS, 0);
/* to avoid GPU doing any preloading of constant from random address */
@@ -2801,6 +2803,8 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
    r600_store_context_reg(cb, R_028848_SQ_PGM_RESOURCES_2_PS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
    r600_store_context_reg(cb, R_028864_SQ_PGM_RESOURCES_2_VS, S_028864_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+   r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_2_GS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+   r600_store_context_reg(cb, R_028894_SQ_PGM_RESOURCES_2_ES, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));


Nitpick: use a separate SINGLE_ROUND macro for each register instead of reusing S_028848_SINGLE_ROUND for GS and ES.
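
To make the nitpick concrete, something along these lines would do. This is only a sketch: the GS and ES macro names are hypothetical (they are not defined by the patch), and the field layout is assumed to match the VS variant shown in the evergreend.h context below.

```c
#include <assert.h>
#include <stdint.h>

/* VS variant, as it appears in the evergreend.h context of the patch. */
#define S_028864_SINGLE_ROUND(x)  (((x) & 0x3) << 0)

/* Hypothetical per-register variants; the names and the assumption
 * that the field sits at the same position are mine, not the patch's. */
#define S_02887C_SINGLE_ROUND(x)  (((x) & 0x3) << 0)  /* GS */
#define S_028894_SINGLE_ROUND(x)  (((x) & 0x3) << 0)  /* ES */

/* Encode a 2-bit rounding mode for SQ_PGM_RESOURCES_2_GS. */
static uint32_t gs_single_round(uint32_t mode)
{
	assert(mode <= 0x3);
	return S_02887C_SINGLE_ROUND(mode);
}

/* Encode a 2-bit rounding mode for SQ_PGM_RESOURCES_2_ES. */
static uint32_t es_single_round(uint32_t mode)
{
	assert(mode <= 0x3);
	return S_028894_SINGLE_ROUND(mode);
}
```

The generated values are identical today; the point is only that each register write names its own field macro.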


r600_store_context_reg(cb, R_0288A8_SQ_PGM_RESOURCES_FS, 0);
/* to avoid GPU doing any preloading of constant from random address */
diff --git a/src/gallium/drivers/r600/evergreend.h  
b/src/gallium/drivers/r600/evergreend.h

index 937ffcb..cf8906c 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -1497,6 +1497,7 @@
 #define   S_028878_UNCACHED_FIRST_INST(x)  (((x) & 0x1) << 28)
 #define   G_028878_UNCACHED_FIRST_INST(x)  (((x) >> 28) & 0x1)
 #define   C_028878_UNCACHED_FIRST_INST     0xEFFFFFFF
+#define R_02887C_SQ_PGM_RESOURCES_2_GS     0x02887C
 #define R_028890_SQ_PGM_RESOURCES_ES       0x028890
 #define   S_028890_NUM_GPRS(x)             (((x) & 0xFF) << 0)
@@ -1511,6 +1512,7 @@
 #define   S_028890_UNCACHED_FIRST_INST(x)  (((x) & 0x1) << 28)
 #define   G_028890_UNCACHED_FIRST_INST(x)  (((x) >> 28) & 0x1)
 #define   C_028890_UNCACHED_FIRST_INST     0xEFFFFFFF
+#define R_028894_SQ_PGM_RESOURCES_2_ES     0x028894
 #define R_028864_SQ_PGM_RESOURCES_2_VS     0x028864
 #define   S_028864_SINGLE_ROUND(x)         (((x) & 0x3) << 0)


Tested / Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


[Mesa-dev] [PATCH v2] r600g: Pass conservative depth parameters to hw

2015-10-17 Thread Glenn Kennard
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
v2:
  Use correct register for R700, set from r600_emit_db_misc_state
  Added ps_conservative_z field to r600_db_misc_state
  Shrunk ps_conservative_z to uint8 since only 2 bits are needed

Thanks to Alex for noting the incorrect register on R700; it would have gone
unnoticed for years otherwise, since the feature isn't directly observable.
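
For readers skimming the diff, the mapping from the TGSI depth layout to the 2-bit hardware field can be sketched on its own as below. The field macros follow the patch, with the clear mask written out as ~(0x3 << 16). The enum is a local stand-in for the TGSI_FS_DEPTH_LAYOUT_* values (its numbering is an assumption), and the function name is hypothetical. Unlike the patch, which ORs into a freshly built value, this sketch clears the field first so it can be applied to an existing register value.

```c
#include <assert.h>
#include <stdint.h>

/* Field macros as in the patch; the clear mask is ~(0x3 << 16). */
#define S_02880C_CONSERVATIVE_Z_EXPORT(x) (((x) & 0x03u) << 16)
#define G_02880C_CONSERVATIVE_Z_EXPORT(x) (((x) >> 16) & 0x03u)
#define C_02880C_CONSERVATIVE_Z_EXPORT    0xFFFCFFFFu
#define V_02880C_EXPORT_ANY_Z             0
#define V_02880C_EXPORT_LESS_THAN_Z       1
#define V_02880C_EXPORT_GREATER_THAN_Z    2

/* Local stand-in for TGSI_FS_DEPTH_LAYOUT_*; numbering assumed. */
enum depth_layout {
	LAYOUT_NONE, LAYOUT_ANY, LAYOUT_GREATER, LAYOUT_LESS, LAYOUT_UNCHANGED
};

/* Hypothetical helper: returns db_shader_control with the
 * conservative Z export field set according to the layout. */
static uint32_t set_conservative_z(uint32_t db_shader_control,
                                   enum depth_layout layout)
{
	uint32_t mode;

	switch (layout) {
	case LAYOUT_GREATER: mode = V_02880C_EXPORT_GREATER_THAN_Z; break;
	case LAYOUT_LESS:    mode = V_02880C_EXPORT_LESS_THAN_Z;    break;
	default:             mode = V_02880C_EXPORT_ANY_Z;          break;
	}

	/* Clear the old field, then set the new value. */
	db_shader_control &= C_02880C_CONSERVATIVE_Z_EXPORT;
	return db_shader_control | S_02880C_CONSERVATIVE_Z_EXPORT(mode);
}
```

Layouts without a dedicated hardware mode fall back to EXPORT_ANY_Z, which matches the `default:` case in the patch.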

 src/gallium/drivers/r600/evergreen_state.c | 13 +
 src/gallium/drivers/r600/evergreend.h  |  7 +++
 src/gallium/drivers/r600/r600_pipe.h   |  1 +
 src/gallium/drivers/r600/r600_shader.c |  1 +
 src/gallium/drivers/r600/r600_shader.h |  2 ++
 src/gallium/drivers/r600/r600_state.c  | 22 +-
 src/gallium/drivers/r600/r600d.h   |  8 
 7 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c6702a9..96c6b11 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader
db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export);
db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export);
 
+   switch (rshader->ps_conservative_z) {
+   default: /* fall through */
+   case TGSI_FS_DEPTH_LAYOUT_ANY:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_GREATER:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_LESS:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z);
+   break;
+   }
+
exports_ps = 0;
for (i = 0; i < rshader->noutput; i++) {
if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 937ffcb..a9a65f7 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -815,6 +815,13 @@
 #define V_02880C_EXPORT_DB_FOUR16                 0x01
 #define V_02880C_EXPORT_DB_TWO                    0x02
 #define   S_02880C_ALPHA_TO_MASK_DISABLE(x)       (((x) & 0x1) << 12)
+#define   S_02880C_CONSERVATIVE_Z_EXPORT(x)       (((x) & 0x03) << 16)
+#define   G_02880C_CONSERVATIVE_Z_EXPORT(x)       (((x) >> 16) & 0x03)
+#define   C_02880C_CONSERVATIVE_Z_EXPORT          0xFFFCFFFF
+#define V_02880C_EXPORT_ANY_Z                     0
+#define V_02880C_EXPORT_LESS_THAN_Z               1
+#define V_02880C_EXPORT_GREATER_THAN_Z            2
+#define V_02880C_EXPORT_RESERVED                  3
 
 #define R_028A00_PA_SU_POINT_SIZE                 0x028A00
 #define   S_028A00_HEIGHT(x)                      (((x) & 0xFFFF) << 0)
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index 520b03f..950bb6b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -116,6 +116,7 @@ struct r600_db_misc_state {
unsigned log_samples;
unsigned db_shader_control;
bool htile_clear;
+   uint8_t ps_conservative_z;
 };
 
 struct r600_cb_misc_state {
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 8efe902..613f94e 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
shader->fs_write_all = ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS];
shader->vs_position_window_space = ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
+   shader->ps_conservative_z = (uint8_t)ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT];
 
if (shader->vs_as_gs_a)
vs_add_primid_output(&ctx, key.vs.prim_id_out);
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index c240e71..2040f73 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -76,6 +76,8 @@ struct r600_shader {
boolean uses_tex_buffers;
boolean gs_prim_id_input;
 
+   uint8_t ps_conservative_z;
+
/* Size in bytes of a data item in the ring(s) (single vertex data).
   Stages with only one ring items 123 will be set to 0. */
unsignedring_item_sizes[4];
dif

[Mesa-dev] [PATCH] r600g: Pass conservative depth parameters to hw

2015-10-16 Thread Glenn Kennard
Supported on R700 and up.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Not exactly a commonly used extension, but might as well set the
hardware registers rather than just dropping the hint on the floor.

 src/gallium/drivers/r600/evergreen_state.c | 13 +
 src/gallium/drivers/r600/evergreend.h  |  7 +++
 src/gallium/drivers/r600/r600_shader.c |  1 +
 src/gallium/drivers/r600/r600_shader.h |  2 ++
 src/gallium/drivers/r600/r600_state.c  | 15 +++
 src/gallium/drivers/r600/r600d.h   |  8 
 6 files changed, 46 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c6702a9..96c6b11 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader
db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export);
db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export);
 
+   switch (rshader->ps_conservative_z) {
+   default: /* fall through */
+   case TGSI_FS_DEPTH_LAYOUT_ANY:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_GREATER:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_LESS:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z);
+   break;
+   }
+
exports_ps = 0;
for (i = 0; i < rshader->noutput; i++) {
if (rshader->output[i].name == TGSI_SEMANTIC_POSITION ||
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 937ffcb..a9a65f7 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -815,6 +815,13 @@
 #define V_02880C_EXPORT_DB_FOUR16                 0x01
 #define V_02880C_EXPORT_DB_TWO                    0x02
 #define   S_02880C_ALPHA_TO_MASK_DISABLE(x)       (((x) & 0x1) << 12)
+#define   S_02880C_CONSERVATIVE_Z_EXPORT(x)       (((x) & 0x03) << 16)
+#define   G_02880C_CONSERVATIVE_Z_EXPORT(x)       (((x) >> 16) & 0x03)
+#define   C_02880C_CONSERVATIVE_Z_EXPORT          0xFFFCFFFF
+#define V_02880C_EXPORT_ANY_Z                     0
+#define V_02880C_EXPORT_LESS_THAN_Z               1
+#define V_02880C_EXPORT_GREATER_THAN_Z            2
+#define V_02880C_EXPORT_RESERVED                  3
 
 #define R_028A00_PA_SU_POINT_SIZE                 0x028A00
 #define   S_028A00_HEIGHT(x)                      (((x) & 0xFFFF) << 0)
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 8efe902..560696d 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
shader->fs_write_all = ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS];
shader->vs_position_window_space = ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION];
+   shader->ps_conservative_z = ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT];
 
if (shader->vs_as_gs_a)
vs_add_primid_output(&ctx, key.vs.prim_id_out);
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index c240e71..e085263 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -76,6 +76,8 @@ struct r600_shader {
boolean uses_tex_buffers;
boolean gs_prim_id_input;
 
+   unsigned ps_conservative_z;
+
/* Size in bytes of a data item in the ring(s) (single vertex data).
   Stages with only one ring items 123 will be set to 0. */
unsignedring_item_sizes[4];
diff --git a/src/gallium/drivers/r600/r600_state.c 
b/src/gallium/drivers/r600/r600_state.c
index 1be3e1b..09b2325 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2533,6 +2533,21 @@ void r600_update_ps_state(struct pipe_context *ctx, 
struct r600_pipe_shader *sha
if (rshader->uses_kill)
db_shader_control |= S_02880C_KILL_ENABLE(1);
 
+   if (rctx->b.chip_class >= R700) {
+   switch (rshader->ps_conservative_z) {
+   default: /* fall through */
+   case TGSI_FS_DEPTH_LAYOUT_ANY:
+   db_shader_control |= 
S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z);
+   break;
+   case TGSI_FS_DEPTH_LAYOUT_GRE

[Mesa-dev] [PATCH] r600g: Implement ARB_texture_view

2015-10-15 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
See also the additional texture view piglit test case posted to the piglit
mailing list, which tests cases with layer > 0. Notably, softpipe and llvmpipe
fail that case, but i965/hsw, nv50/nvc0 and r600g pass.
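
The layer handling in the diff below boils down to two small decisions, sketched here in isolation. The enum is a local stand-in assumed to mirror the ordering of gallium's `enum pipe_texture_target` (the MAX2 trick only works if array targets compare greater than their non-array counterparts), and both function names are hypothetical.

```c
#include <assert.h>

/* Local stand-in; ordering assumed to match enum pipe_texture_target. */
enum tex_target {
	TEX_BUFFER, TEX_1D, TEX_2D, TEX_3D, TEX_CUBE,
	TEX_RECT, TEX_1D_ARRAY, TEX_2D_ARRAY, TEX_CUBE_ARRAY
};

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Pick the hardware dimension: a non-cube view into an array resource
 * must be programmed as an array type so the layer offset is honoured. */
static enum tex_target view_dim(enum tex_target view, enum tex_target resource)
{
	if (view == TEX_CUBE)
		return view;
	return MAX2(view, resource);
}

/* A single-layer view of a different target type is clamped to one layer. */
static unsigned view_last_layer(enum tex_target view, enum tex_target resource,
                                unsigned depth, unsigned first_layer,
                                unsigned last_layer)
{
	if (view != resource && depth == 1)
		return first_layer;
	return last_layer;
}
```

For example, a 2D view of layer 3 in a 2D array resource is programmed as a 2D array with first and last layer both set to 3.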

 docs/GL3.txt   |  2 +-
 docs/relnotes/11.1.0.html  |  1 +
 src/gallium/drivers/r600/evergreen_state.c | 23 +--
 src/gallium/drivers/r600/r600_pipe.c   |  2 +-
 4 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 6503e2a..c03a574 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -169,7 +169,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_texture_buffer_range                          DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe)
   GL_ARB_texture_query_levels                          DONE (all drivers that support GLSL 1.30)
   GL_ARB_texture_storage_multisample                   DONE (all drivers that support GL_ARB_texture_multisample)
-  GL_ARB_texture_view                                  DONE (i965, nv50, nvc0, llvmpipe, softpipe)
+  GL_ARB_texture_view                                  DONE (i965, nv50, nvc0, r600, llvmpipe, softpipe)
   GL_ARB_vertex_attrib_binding DONE (all drivers)
 
 
diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index dcf425e..cb8715c 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -53,6 +53,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_texture_query_lod on softpipe
 EGL_KHR_create_context on softpipe, llvmpipe
 EGL_KHR_gl_colorspace on softpipe, llvmpipe
+GL_ARB_texture_view on r600 for Evergreen and later chips
 
 
 Bug fixes
diff --git a/src/gallium/drivers/r600/evergreen_state.c 
b/src/gallium/drivers/r600/evergreen_state.c
index c6702a9..60747d1 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -666,6 +666,7 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
enum pipe_format pipe_format = state->format;
struct radeon_surf_level *surflevel;
unsigned base_level, first_level, last_level;
+   unsigned dim, last_layer;
uint64_t va;
 
if (view == NULL)
@@ -679,7 +680,7 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
view->base.reference.count = 1;
view->base.context = ctx;
 
-   if (texture->target == PIPE_BUFFER)
+   if (state->target == PIPE_BUFFER)
return texture_buffer_sampler_view(rctx, view, width0, height0);
 
swizzle[0] = state->swizzle_r;
@@ -773,12 +774,12 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
}
nbanks = eg_num_banks(rscreen->b.tiling_info.num_banks);
 
-   if (texture->target == PIPE_TEXTURE_1D_ARRAY) {
+   if (state->target == PIPE_TEXTURE_1D_ARRAY) {
height = 1;
depth = texture->array_size;
-   } else if (texture->target == PIPE_TEXTURE_2D_ARRAY) {
+   } else if (state->target == PIPE_TEXTURE_2D_ARRAY) {
depth = texture->array_size;
-   } else if (texture->target == PIPE_TEXTURE_CUBE_ARRAY)
+   } else if (state->target == PIPE_TEXTURE_CUBE_ARRAY)
depth = texture->array_size / 6;
 
va = tmp->resource.gpu_address;
@@ -790,7 +791,13 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
view->is_stencil_sampler = true;
 
view->tex_resource = &tmp->resource;
-   view->tex_resource_words[0] = (S_030000_DIM(r600_tex_dim(texture->target, texture->nr_samples)) |
+
+   /* array type views and views into array types need to use layer offset */
+   dim = state->target;
+   if (state->target != PIPE_TEXTURE_CUBE)
+   dim = MAX2(state->target, texture->target);
+
+   view->tex_resource_words[0] = (S_030000_DIM(r600_tex_dim(dim, texture->nr_samples)) |
   S_030000_PITCH((pitch / 8) - 1) |
   S_030000_TEX_WIDTH(width - 1));
if (rscreen->b.chip_class == CAYMAN)
@@ -818,10 +825,14 @@ evergreen_create_sampler_view_custom(struct pipe_context 
*ctx,
view->tex_resource_words[3] = (surflevel[base_level].offset + 
va) >> 8;
}
 
+   last_layer = state->u.tex.last_layer;
+   if (state->target != texture->target && depth == 1) {
+   last_layer = state->u.tex.first_layer;
+   }
view->tex_resource_words[4] = (word4 |
   S_030010_ENDIAN_SWAP(endian));
view->tex_resource_words[5] = 
S_030014_BASE_ARRAY(state->u.tex.first_layer) |
- 
S_030014_LAST_ARRAY(state->u.tex.last_layer);
+  

[Mesa-dev] [PATCH 1/2] r600g/sb: SB support for UBO indexing

2015-10-07 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
This patch depends on prior patch:
  r600g/sb: Support gs5 sampler indexing

Two items that could be improved in a future patch:
Clauses using UBO indexing still lock the cache line for a
constant used to load the index register, which causes some
instruction groups to be broken up because SB thinks they are
using too many constant read ports.

The MOVA_INT/SET_CF_IDX[01] ops can often be emitted directly into
the preceding clause rather than always creating a new one.
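
As background for the sb_bc.h hunk in this patch, the extended `is_alu_extended()` check can be paraphrased in plain C as follows. The types are simplified stand-ins for the SB structures (only the two fields the check reads are modelled, and the enumerator lists are assumptions); the real code is a C++ member function on `bc_cf`.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the SB kcache types; enumerators assumed. */
enum kc_lock  { KC_LOCK_NONE, KC_LOCK_1, KC_LOCK_2 };
enum kc_index { KC_INDEX_NONE, KC_INDEX_0, KC_INDEX_1 };

struct kc_set {
	enum kc_lock  mode;        /* cache line lock mode for this set */
	enum kc_index index_mode;  /* which CF index register, if any   */
};

/* An ALU clause needs the extended encoding when it uses kcache sets
 * 2 or 3, or when any of the four sets is indexed. */
static bool is_alu_extended(const struct kc_set kc[4])
{
	if (kc[2].mode != KC_LOCK_NONE || kc[3].mode != KC_LOCK_NONE)
		return true;
	for (int i = 0; i < 4; i++)
		if (kc[i].index_mode != KC_INDEX_NONE)
			return true;
	return false;
}
```

Before this patch only the first condition existed, so a clause that indexed set 0 or 1 would not have been flagged as extended.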

 src/gallium/drivers/r600/r600_shader.c |   6 --
 src/gallium/drivers/r600/r600_shader.h |   2 -
 src/gallium/drivers/r600/sb/sb_bc.h|   4 +-
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |   6 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   |  20 -
 src/gallium/drivers/r600/sb/sb_expr.cpp|   3 +-
 src/gallium/drivers/r600/sb/sb_ir.h|   7 ++
 src/gallium/drivers/r600/sb/sb_sched.cpp   | 108 ++---
 src/gallium/drivers/r600/sb/sb_sched.h |   4 +
 src/gallium/drivers/r600/sb/sb_shader.cpp  |   4 +-
 src/gallium/drivers/r600/sb/sb_shader.h|   2 +-
 11 files changed, 139 insertions(+), 27 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 24c3d43..8efe902 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -166,8 +166,6 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 if (rctx->b.chip_class <= R700) {
use_sb &= (shader->shader.processor_type != 
TGSI_PROCESSOR_GEOMETRY);
 }
-   /* disable SB for shaders using ubo array indexing as it doesn't handle 
those currently */
-   use_sb &= !shader->shader.uses_ubo_indexing;
/* disable SB for shaders using doubles */
use_sb &= !shader->shader.uses_doubles;
 
@@ -1250,9 +1248,6 @@ static int tgsi_split_constant(struct r600_shader_ctx 
*ctx)
continue;
}
 
-   if (ctx->src[i].kc_rel)
-   ctx->shader->uses_ubo_indexing = true;
-
if (ctx->src[i].rel) {
int chan = inst->Src[i].Indirect.Swizzle;
int treg = r600_get_temp(ctx);
@@ -1936,7 +1931,6 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
ctx.gs_next_vertex = 0;
ctx.gs_stream_output_info = 
 
-   shader->uses_ubo_indexing = false;
ctx.face_gpr = -1;
ctx.fixed_pt_position_gpr = -1;
ctx.fragcoord_input = -1;
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index 8ba32ae..c240e71 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -75,8 +75,6 @@ struct r600_shader {
boolean has_txq_cube_array_z_comp;
boolean uses_tex_buffers;
boolean gs_prim_id_input;
-   /* Temporarily workaround SB not handling ubo indexing */
-   boolean uses_ubo_indexing;
 
/* Size in bytes of a data item in the ring(s) (single vertex data).
   Stages with only one ring items 123 will be set to 0. */
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index 126750d..9c2a917 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -478,7 +478,9 @@ struct bc_cf {
 
bool is_alu_extended() {
assert(op_ptr->flags & CF_ALU);
-   return kc[2].mode != KC_LOCK_NONE || kc[3].mode != KC_LOCK_NONE;
+   return kc[2].mode != KC_LOCK_NONE || kc[3].mode != KC_LOCK_NONE 
||
+   kc[0].index_mode != KC_INDEX_NONE || kc[1].index_mode 
!= KC_INDEX_NONE ||
+   kc[2].index_mode != KC_INDEX_NONE || kc[3].index_mode 
!= KC_INDEX_NONE;
}
 
 };
diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 522ff9d..17fe2a5 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -514,7 +514,7 @@ void bc_finalizer::copy_fetch_src(fetch_node , 
fetch_node , unsigned arg
 
 void bc_finalizer::emit_set_grad(fetch_node* f) {
 
-   assert(f->src.size() == 12);
+   assert(f->src.size() == 12 || f->src.size() == 13);
unsigned ops[2] = { FETCH_OP_SET_GRADIENTS_V, FETCH_OP_SET_GRADIENTS_H 
};
 
unsigned arg_start = 0;
@@ -809,8 +809,8 @@ void bc_finalizer::finalize_cf(cf_node* c) {
 }
 
 sel_chan bc_finalizer::translate_kcache(cf_node* alu, value* v) {
-   unsigned sel = v->select.sel();
-   unsigned bank = sel >> 12;
+   unsigned sel = v->select.kcache_sel();
+   unsigned bank = v->select.kcache_bank();
uns

[Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension

2015-10-07 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Now that SB supports the GS5 features, we can safely enable the
extension.

Note that the gallium state tracker clamps the GLSL language / GL version,
since GL_ARB_tessellation_shader isn't implemented yet.

 docs/GL3.txt | 16 
 docs/relnotes/11.1.0.html|  1 +
 src/gallium/drivers/r600/r600_pipe.c |  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e17e783..6503e2a 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -96,18 +96,18 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
 
   GL_ARB_draw_buffers_blend                       DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_draw_indirect                            DONE (i965, r600, llvmpipe, softpipe)
-  GL_ARB_gpu_shader5                              DONE (i965)
+  GL_ARB_gpu_shader5                              DONE (i965, r600)
     - 'precise' qualifier                         DONE
-    - Dynamically uniform sampler array indices   DONE (r600, softpipe)
-    - Dynamically uniform UBO array indices       DONE (r600)
+    - Dynamically uniform sampler array indices   DONE (softpipe)
+    - Dynamically uniform UBO array indices       DONE ()
     - Implicit signed -> unsigned conversions     DONE
     - Fused multiply-add                          DONE ()
-    - Packing/bitfield/conversion functions       DONE (r600, softpipe)
-    - Enhanced textureGather                      DONE (r600, softpipe)
-    - Geometry shader instancing                  DONE (r600, llvmpipe, softpipe)
+    - Packing/bitfield/conversion functions       DONE (softpipe)
+    - Enhanced textureGather                      DONE (softpipe)
+    - Geometry shader instancing                  DONE (llvmpipe, softpipe)
     - Geometry shader multiple streams            DONE ()
-    - Enhanced per-sample shading                 DONE (r600)
-    - Interpolation functions                     DONE (r600)
+    - Enhanced per-sample shading                 DONE ()
+    - Interpolation functions                     DONE ()
     - New overload resolution rules               DONE
   GL_ARB_gpu_shader_fp64                          DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                           DONE (i965, nv50, r600)
diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index c755c98..e537d98 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -50,6 +50,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
 GL_ARB_texture_query_lod on softpipe
 GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
+GL_ARB_gpu_shader5 on r600 for Evergreen and later chips
 
 
 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index efb4889..32ce76a 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -305,7 +305,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
 
case PIPE_CAP_GLSL_FEATURE_LEVEL:
if (family >= CHIP_CEDAR)
-  return 330;
+  return 410;
/* pre-evergreen geom shaders need newer kernel */
if (rscreen->b.info.drm_minor >= 37)
   return 330;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension

2015-10-07 Thread Glenn Kennard
On Wed, 07 Oct 2015 19:59:03 +0200, Benjamin Bellec <b.bel...@gmail.com>  
wrote:



On 07/10/2015 19:13, Glenn Kennard wrote:

On Wed, 07 Oct 2015 19:04:15 +0200, Benjamin Bellec
<b.bel...@gmail.com> wrote:


Hi Glenn,

The series doesn't apply on current master.

Regards.



It's not meant to apply directly on master. Quoting from the notes in
patch 1/2:

This patch depends on prior patch:
  r600g/sb: Support gs5 sampler indexing


/Glenn


OK sorry, I read too quickly.

Is it normal that glxinfo still reports GLSL 330? With your series applied
I still get:
OpenGL renderer string: Gallium 0.4 on AMD CYPRESS (DRM 2.42.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel
(git-6ed8fd3)
OpenGL core profile shading language version string: 3.30




Quoting from the notes in patch 2/2:
"Note that gallium state tracker clamps the GLSL language / GL version
since GL_ARB_tessellation_shader isn't implemented yet."


/Glenn
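
For anyone wondering how the driver can return 410 while glxinfo still shows 3.30: a minimal sketch of the kind of clamping the note refers to. The struct and function names are hypothetical, and the requirement table is reduced to a single extension; this is not the actual state tracker code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what a driver advertises. */
struct caps_sketch {
    int  glsl_feature_level;   /* e.g. 410, as returned by the driver */
    bool has_tessellation;     /* is GL_ARB_tessellation_shader exposed? */
};

/* Assumption for illustration only: anything above GLSL 3.30 needs
 * tessellation shaders, so without them the exposed version is capped. */
static int exposed_glsl_version(const struct caps_sketch *caps)
{
    const int max_without_tess = 330;

    if (!caps->has_tessellation &&
        caps->glsl_feature_level > max_without_tess)
        return max_without_tess;

    return caps->glsl_feature_level;
}
```

With the series applied on a Cypress, the first field would be 410 and the second false, so the sketch yields 330, which matches the glxinfo output quoted above.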


Re: [Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension

2015-10-07 Thread Glenn Kennard
On Wed, 07 Oct 2015 19:04:15 +0200, Benjamin Bellec wrote:



Hi Glenn,

The series doesn't apply on current master.

Regards.



It's not meant to apply directly on master. Quoting from the notes in  
patch 1/2:


This patch depends on prior patch:
  r600g/sb: Support gs5 sampler indexing


/Glenn


Re: [Mesa-dev] List of unsupported extensions per driver

2015-09-29 Thread Glenn Kennard

On Tue, 29 Sep 2015 17:00:31 +0200, Marek Olšák  wrote:


On Tue, Sep 29, 2015 at 4:48 PM, Romain Failliot wrote:
What I don't understand is that all the lines starting with a "-" seem to
be part of the GL_ARB_gpu_shader5 extension. See the line here:
http://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt#n99

If I'm right, it means that, considering Ilia's web site, GL_ARB_gpu_shader5
is unsupported by R600, but everything in its sublist is supported. You see
why it is confusing?


No, not everything is supported. GS streams aren't.



Actually GS streams were merged a few weeks ago. gpu_shader5 isn't enabled
yet because SB doesn't support all the features yet, and games etc. getting
unoptimized shaders when trying to use more modern GL4 features is not an
acceptable regression.

Specifically, sampler and UBO indexing need SB support. For the first, I
posted a patch, but it's not merged yet (it needs someone who can grok SB
code to review it). For the second, there is no ETA, but work is happening
on it.

GL3.txt doesn't tell the whole story; it's just a rough idea of what's going
on feature-wise. For the details, inquire, like Romain just did :-)



Marek


[Mesa-dev] [PATCH] r600g/sb: Support gs5 sampler indexing

2015-09-21 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Just UBO support left before gs5 can be enabled.
Could improve how the two index registers are set/used to reduce
the number of clauses, but as is it's about as good as what the blob
emits.

 src/gallium/drivers/r600/r600_shader.c   |  12 ++-
 src/gallium/drivers/r600/r600_shader.h   |   4 +-
 src/gallium/drivers/r600/sb/sb_bc.h  |  10 ++-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp   |  17 +++-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp |  50 +++-
 src/gallium/drivers/r600/sb/sb_gcm.cpp   |  11 ++-
 src/gallium/drivers/r600/sb/sb_sched.cpp | 118 +--
 src/gallium/drivers/r600/sb/sb_sched.h   |   5 +-
 8 files changed, 201 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 1d90582..24c3d43 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -166,8 +166,8 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 if (rctx->b.chip_class <= R700) {
use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
 }
-   /* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */
-   use_sb &= !shader->shader.uses_index_registers;
+   /* disable SB for shaders using ubo array indexing as it doesn't handle those currently */
+   use_sb &= !shader->shader.uses_ubo_indexing;
/* disable SB for shaders using doubles */
use_sb &= !shader->shader.uses_doubles;
 
@@ -1251,7 +1251,7 @@ static int tgsi_split_constant(struct r600_shader_ctx 
*ctx)
}
 
if (ctx->src[i].kc_rel)
-   ctx->shader->uses_index_registers = true;
+   ctx->shader->uses_ubo_indexing = true;
 
if (ctx->src[i].rel) {
int chan = inst->Src[i].Indirect.Swizzle;
@@ -1912,7 +1912,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
 
shader->uses_doubles = ctx.info.uses_doubles;
 
-   indirect_gprs = ctx.info.indirect_files & ~(1 << TGSI_FILE_CONSTANT);
+   indirect_gprs = ctx.info.indirect_files & ~((1 << TGSI_FILE_CONSTANT) | (1 << TGSI_FILE_SAMPLER));
tgsi_parse_init(&ctx.parse, tokens);
ctx.type = ctx.info.processor;
shader->processor_type = ctx.type;
@@ -1936,7 +1936,7 @@ static int r600_shader_from_tgsi(struct r600_context 
*rctx,
ctx.gs_next_vertex = 0;
ctx.gs_stream_output_info = &so;
 
-   shader->uses_index_registers = false;
+   shader->uses_ubo_indexing = false;
ctx.face_gpr = -1;
ctx.fixed_pt_position_gpr = -1;
ctx.fragcoord_input = -1;
@@ -5703,8 +5703,6 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
sampler_src_reg = 3;
 
sampler_index_mode = inst->Src[sampler_src_reg].Indirect.Index == 2 ? 2 : 0; // CF_INDEX_1 : CF_INDEX_NONE
-   if (sampler_index_mode)
-   ctx->shader->uses_index_registers = true;
 
src_gpr = tgsi_tex_get_src_gpr(ctx, 0);
 
diff --git a/src/gallium/drivers/r600/r600_shader.h 
b/src/gallium/drivers/r600/r600_shader.h
index 48de9cd..8ba32ae 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -75,8 +75,8 @@ struct r600_shader {
boolean has_txq_cube_array_z_comp;
boolean uses_tex_buffers;
boolean gs_prim_id_input;
-   /* Temporarily workaround SB not handling CF_INDEX_[01] index registers */
-   boolean uses_index_registers;
+   /* Temporarily workaround SB not handling ubo indexing */
+   boolean uses_ubo_indexing;
 
/* Size in bytes of a data item in the ring(s) (single vertex data).
   Stages with only one ring items 123 will be set to 0. */
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h 
b/src/gallium/drivers/r600/sb/sb_bc.h
index ab988f8..126750d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -48,6 +48,7 @@ class fetch_node;
 class alu_group_node;
 class region_node;
 class shader;
+class value;
 
 class sb_ostream {
 public:
@@ -818,13 +819,16 @@ class bc_parser {
 
bool gpr_reladdr;
 
+   // Note: currently relies on input emitting SET_CF in same basic block as uses
+   value *cf_index_value[2];
+   alu_node *mova;
 public:
 
bc_parser(sb_context &sctx, r600_bytecode *bc, r600_shader* pshader) :
ctx(sctx), dec(), bc(bc), pshader(pshader),
dw(), bc_ndw(), max_cf(),
sh(), error(), slots(), cgroup(),
-   cf_map(), loop_stack(), gpr_reladdr() { }
+   cf_map(), loop_stack(), gpr_reladdr(

Re: [Mesa-dev] [PATCH] st/mesa: avoid integer overflows with buffers >= 512MB

2015-09-15 Thread Glenn Kennard
On Wed, 16 Sep 2015 01:32:10 +0200, Ilia Mirkin <imir...@alum.mit.edu>  
wrote:



This fixes failures with the newly-submitted max-size texture buffer
piglit test for GPUs exposing >= 128M max texels.

Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu>
Cc: "10.6 11.0" <mesa-sta...@lists.freedesktop.org>
---
 src/mesa/state_tracker/st_atom_texture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/state_tracker/st_atom_texture.c  
b/src/mesa/state_tracker/st_atom_texture.c

index 31e0f6b..62312af 100644
--- a/src/mesa/state_tracker/st_atom_texture.c
+++ b/src/mesa/state_tracker/st_atom_texture.c
@@ -264,7 +264,7 @@ st_create_texture_sampler_view_from_stobj(struct  
pipe_context *pipe,

format);
   if (stObj->pt->target == PIPE_BUFFER) {
-  unsigned base, size;
+  uint64_t base, size;
   unsigned f, n;
   const struct util_format_description *desc
  = util_format_description(templ.format);


Tested / Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
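
A minimal sketch of the overflow being avoided, for reference. It assumes the element count is derived as (size * 8) / bits-per-block, which is not visible in the quoted hunk; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit intermediates: size_bytes * 8 wraps around once size_bytes
 * reaches 512MB, because 2^29 * 8 == 2^32. */
static uint32_t num_elements_32(uint32_t size_bytes, uint32_t block_bits)
{
    return (size_bytes * 8u) / block_bits;
}

/* Same computation with a 64-bit size, which is what widening
 * base and size to uint64_t achieves. */
static uint32_t num_elements_64(uint64_t size_bytes, uint32_t block_bits)
{
    return (uint32_t)((size_bytes * 8u) / block_bits);
}
```

For a 512MB buffer of 32-bit texels, the 32-bit variant yields 0 elements while the 64-bit variant yields 128M, which lines up with the ">= 128M max texels" case in the commit message.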


Re: [Mesa-dev] [PATCH v2 2/5] gallium: add PIPE_CAP_TGSI_TXQS to let st know if TXQS is supported

2015-09-13 Thread Glenn Kennard
lium/drivers/nouveau/nvc0/nvc0_screen.c
@@ -200,6 +200,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen,  
enum pipe_cap param)

case PIPE_CAP_VERTEXID_NOBASE:
case PIPE_CAP_RESOURCE_FROM_USER_MEMORY:
case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
+   case PIPE_CAP_TGSI_TXQS:
   return 0;
   case PIPE_CAP_VENDOR_ID:
diff --git a/src/gallium/drivers/r300/r300_screen.c  
b/src/gallium/drivers/r300/r300_screen.c

index 4ca0b26..e669ba2 100644
--- a/src/gallium/drivers/r300/r300_screen.c
+++ b/src/gallium/drivers/r300/r300_screen.c
@@ -195,6 +195,7 @@ static int r300_get_param(struct pipe_screen*  
pscreen, enum pipe_cap param)

 case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
 case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
 case PIPE_CAP_DEPTH_BOUNDS_TEST:
+case PIPE_CAP_TGSI_TXQS:
 return 0;
/* SWTCL-only features. */
diff --git a/src/gallium/drivers/r600/r600_pipe.c  
b/src/gallium/drivers/r600/r600_pipe.c

index fd9c16c..dfbf0e5 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -341,6 +341,7 @@ static int r600_get_param(struct pipe_screen*  
pscreen, enum pipe_cap param)

case PIPE_CAP_VERTEXID_NOBASE:
case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
case PIPE_CAP_DEPTH_BOUNDS_TEST:
+   case PIPE_CAP_TGSI_TXQS:
return 0;
/* Stream output. */
diff --git a/src/gallium/drivers/radeonsi/si_pipe.c  
b/src/gallium/drivers/radeonsi/si_pipe.c

index 9094427..ae1ff7e 100644
--- a/src/gallium/drivers/radeonsi/si_pipe.c
+++ b/src/gallium/drivers/radeonsi/si_pipe.c
@@ -325,6 +325,7 @@ static int si_get_param(struct pipe_screen* pscreen,  
enum pipe_cap param)

case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
case PIPE_CAP_SAMPLER_VIEW_TARGET:
case PIPE_CAP_VERTEXID_NOBASE:
+   case PIPE_CAP_TGSI_TXQS:
return 0;
case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
diff --git a/src/gallium/drivers/softpipe/sp_screen.c  
b/src/gallium/drivers/softpipe/sp_screen.c

index 7ca8a67..d8606f3 100644
--- a/src/gallium/drivers/softpipe/sp_screen.c
+++ b/src/gallium/drivers/softpipe/sp_screen.c
@@ -246,6 +246,7 @@ softpipe_get_param(struct pipe_screen *screen, enum  
pipe_cap param)

case PIPE_CAP_DEVICE_RESET_STATUS_QUERY:
case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS:
case PIPE_CAP_DEPTH_BOUNDS_TEST:
+   case PIPE_CAP_TGSI_TXQS:
   return 0;
}
/* should only get here on unhandled cases */
diff --git a/src/gallium/drivers/svga/svga_screen.c  
b/src/gallium/drivers/svga/svga_screen.c

index f2ae40b..44b6f4a 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -379,6 +379,7 @@ svga_get_param(struct pipe_screen *screen, enum  
pipe_cap param)

case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
case PIPE_CAP_DEPTH_BOUNDS_TEST:
+   case PIPE_CAP_TGSI_TXQS:
   return 0;
}
diff --git a/src/gallium/drivers/vc4/vc4_screen.c  
b/src/gallium/drivers/vc4/vc4_screen.c

index 2dee1d4..c4b52e1 100644
--- a/src/gallium/drivers/vc4/vc4_screen.c
+++ b/src/gallium/drivers/vc4/vc4_screen.c
@@ -180,6 +180,7 @@ vc4_screen_get_param(struct pipe_screen *pscreen,  
enum pipe_cap param)

case PIPE_CAP_TEXTURE_FLOAT_LINEAR:
case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR:
case PIPE_CAP_DEPTH_BOUNDS_TEST:
+   case PIPE_CAP_TGSI_TXQS:
 return 0;
/* Stream output. */
diff --git a/src/gallium/include/pipe/p_defines.h  
b/src/gallium/include/pipe/p_defines.h

index 88e37e9..47fa82a 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -630,6 +630,7 @@ enum pipe_cap
PIPE_CAP_TEXTURE_FLOAT_LINEAR,
PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR,
PIPE_CAP_DEPTH_BOUNDS_TEST,
+   PIPE_CAP_TGSI_TXQS,
 };
#define PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_NV50 (1 << 0)


Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH 2/2] r600g: lower number of driver const buffers

2015-09-11 Thread Glenn Kennard

Series is:

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


[Mesa-dev] [PATCH v2 2/2] r600: Enable fp64 on chips with native support

2015-09-11 Thread Glenn Kennard
Cypress/Cayman/Aruba, earlier r6xx/r7xx chips only support a subset
of the needed fp64 ops, and don't do GL4 anyway.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Changes since v1:
 Updated commit message

 docs/GL3.txt | 4 ++--
 docs/relnotes/11.1.0.html| 2 +-
 src/gallium/drivers/r600/r600_pipe.c | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 8ad1aac..7247eb6 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -109,7 +109,7 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
     - Enhanced per-sample shading                 DONE (r600)
     - Interpolation functions                     DONE (r600)
     - New overload resolution rules               DONE
-  GL_ARB_gpu_shader_fp64                          DONE (llvmpipe, softpipe)
+  GL_ARB_gpu_shader_fp64                          DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                           DONE (i965, nv50, r600)
   GL_ARB_shader_subroutine                        DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_tessellation_shader                      DONE ()
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
   GL_ARB_get_program_binary                       DONE (0 binary formats)
   GL_ARB_separate_shader_objects                  DONE (all drivers)
   GL_ARB_shader_precision                         DONE (all drivers that support GLSL 4.10)
-  GL_ARB_vertex_attrib_64bit                      DONE (llvmpipe, softpipe)
+  GL_ARB_vertex_attrib_64bit                      DONE (r600, llvmpipe, softpipe)
   GL_ARB_viewport_array                           DONE (i965, nv50, r600, llvmpipe)
 
 
diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 4b56f69..f7ff74a 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 GL_ARB_texture_query_lod on softpipe
-TBD.
+GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
 
 
 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index fd9c16c..a18ec49 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -500,6 +500,9 @@ static int r600_get_shader_param(struct pipe_screen* 
pscreen, unsigned shader, e
return PIPE_SHADER_IR_TGSI;
}
case PIPE_SHADER_CAP_DOUBLES:
+   if (rscreen->b.family == CHIP_CYPRESS ||
+       rscreen->b.family == CHIP_CAYMAN || rscreen->b.family == CHIP_ARUBA)
+       return 1;
return 0;
case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
-- 
1.9.1



[Mesa-dev] [PATCH v2 1/2] r600g: Support I2D/U2D/D2I/D2U

2015-09-11 Thread Glenn Kennard
Only for Cypress/Cayman/Aruba, older chips have only partial fp64 support.
Uses float intermediate values so only accurate for int24 range, which
matches what the blob does.
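
A minimal sketch of why the float intermediate limits accuracy to the int24 range. This is plain C on the CPU, not the shader code the patch emits, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Go through a 32-bit float first, the way the I2D path does
 * INT_TO_FLT followed by FLT32_TO_FLT64. */
static double int_to_double_via_float(int32_t v)
{
    volatile float f = (float)v;   /* volatile: force rounding to float */
    return (double)f;
}

/* Reverse direction: FLT64_TO_FLT32 followed by FLT_TO_INT. */
static int32_t double_to_int_via_float(double d)
{
    volatile float f = (float)d;   /* precision is lost here */
    return (int32_t)f;
}
```

A float has a 24-bit significand, so every integer up to 2^24 survives the round trip, while 2^24 + 1 is rounded to 2^24.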

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Changes since v1:
 Split into two functions
 Make names a bit clearer which chips they apply to
 Fix mixup of INT_TO_FLT/UINT_TO_FLT for eg opcode table
 Updated commit message

 src/gallium/drivers/r600/r600_shader.c | 106 ++---
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index f2c9e16..41cb226 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3058,6 +3058,96 @@ static int tgsi_dfracexp(struct r600_shader_ctx *ctx)
return 0;
 }
 
+
+static int egcm_int_to_double(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int i, r;
+   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+   assert(inst->Instruction.Opcode == TGSI_OPCODE_I2D ||
+       inst->Instruction.Opcode == TGSI_OPCODE_U2D);
+
+   for (i = 0; i <= (lasti+1)/2; i++) {
+       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+       alu.op = ctx->inst_info->op;
+
+       r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
+       alu.dst.sel = ctx->temp_reg;
+       alu.dst.chan = i;
+       alu.dst.write = 1;
+       alu.last = 1;
+
+       r = r600_bytecode_add_alu(ctx->bc, &alu);
+       if (r)
+           return r;
+   }
+
+   for (i = 0; i <= lasti; i++) {
+       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+       alu.op = ALU_OP1_FLT32_TO_FLT64;
+
+       alu.src[0].chan = i/2;
+       if (i%2 == 0)
+           alu.src[0].sel = ctx->temp_reg;
+       else {
+           alu.src[0].sel = V_SQ_ALU_SRC_LITERAL;
+           alu.src[0].value = 0x0;
+       }
+       tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+       alu.last = i == lasti;
+
+       r = r600_bytecode_add_alu(ctx->bc, &alu);
+       if (r)
+           return r;
+   }
+
+   return 0;
+}
+
+static int egcm_double_to_int(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int i, r;
+   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+   assert(inst->Instruction.Opcode == TGSI_OPCODE_D2I ||
+       inst->Instruction.Opcode == TGSI_OPCODE_D2U);
+
+   for (i = 0; i <= lasti; i++) {
+       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+       alu.op = ALU_OP1_FLT64_TO_FLT32;
+
+       r600_bytecode_src(&alu.src[0], &ctx->src[0], fp64_switch(i));
+       alu.dst.chan = i;
+       alu.dst.sel = ctx->temp_reg;
+       alu.dst.write = i%2 == 0;
+       alu.last = i == lasti;
+
+       r = r600_bytecode_add_alu(ctx->bc, &alu);
+       if (r)
+           return r;
+   }
+
+   for (i = 0; i <= (lasti+1)/2; i++) {
+       memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+       alu.op = ctx->inst_info->op;
+
+       alu.src[0].chan = i*2;
+       alu.src[0].sel = ctx->temp_reg;
+       tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+       alu.last = 1;
+
+       r = r600_bytecode_add_alu(ctx->bc, &alu);
+       if (r)
+           return r;
+   }
+
+   return 0;
+}
+
 static int cayman_emit_double_instr(struct r600_shader_ctx *ctx)
 {
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8150,10 +8240,10 @@ static const struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] =
[TGSI_OPCODE_DFRAC] = { ALU_OP1_FRACT_64, tgsi_op2_64},
[TGSI_OPCODE_DLDEXP]= { ALU_OP2_LDEXP_64, tgsi_op2_64},
[TGSI_OPCODE_DFRACEXP]  = { ALU_OP1_FREXP_64, tgsi_dfracexp},
-   [TGSI_OPCODE_D2I]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_I2D]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_D2U]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_U2D]   = { ALU_OP0_NOP, tgsi_unsupported},
+   [TGSI_OPCODE_D2I]   = { ALU_OP1_FLT_TO_INT, egcm_double_to_int},
+   [TGSI_OPCODE_I2D]   = { ALU_OP1_INT_TO_FLT, egcm_int_to_double},
+   [TGSI_OPCODE_D2U]   = { ALU_OP1_FLT_TO_UINT, egcm_double_to_int},
+   [TGSI_OPCODE_U2D]   = { ALU_OP1_UINT_TO_FLT, egcm_int_to_double},
[TGSI_OPCODE_DRSQ]  = { ALU_OP2_RECIPSQRT_64, 
cayman_emit_double_instr},

[Mesa-dev] [PATCH 2/2] r600: Enable fp64 on chips with native support

2015-09-10 Thread Glenn Kennard
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 docs/GL3.txt | 4 ++--
 docs/relnotes/11.1.0.html| 2 +-
 src/gallium/drivers/r600/r600_pipe.c | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 8ad1aac..7247eb6 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -109,7 +109,7 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
     - Enhanced per-sample shading                 DONE (r600)
     - Interpolation functions                     DONE (r600)
     - New overload resolution rules               DONE
-  GL_ARB_gpu_shader_fp64                          DONE (llvmpipe, softpipe)
+  GL_ARB_gpu_shader_fp64                          DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                           DONE (i965, nv50, r600)
   GL_ARB_shader_subroutine                        DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_tessellation_shader                      DONE ()
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
   GL_ARB_get_program_binary                       DONE (0 binary formats)
   GL_ARB_separate_shader_objects                  DONE (all drivers)
   GL_ARB_shader_precision                         DONE (all drivers that support GLSL 4.10)
-  GL_ARB_vertex_attrib_64bit                      DONE (llvmpipe, softpipe)
+  GL_ARB_vertex_attrib_64bit                      DONE (r600, llvmpipe, softpipe)
   GL_ARB_viewport_array                           DONE (i965, nv50, r600, llvmpipe)
 
 
diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 4b56f69..f7ff74a 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with 
certain drivers.
 
 
 GL_ARB_texture_query_lod on softpipe
-TBD.
+GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
 
 
 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index fd9c16c..a18ec49 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -500,6 +500,9 @@ static int r600_get_shader_param(struct pipe_screen* 
pscreen, unsigned shader, e
return PIPE_SHADER_IR_TGSI;
}
case PIPE_SHADER_CAP_DOUBLES:
+   if (rscreen->b.family == CHIP_CYPRESS ||
+       rscreen->b.family == CHIP_CAYMAN || rscreen->b.family == CHIP_ARUBA)
+       return 1;
return 0;
case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
-- 
1.9.1



[Mesa-dev] [PATCH 1/2] r600g: Support I2D/U2D/D2I/D2U

2015-09-10 Thread Glenn Kennard
int <-> float <-> double conversion, matches what the blob does

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 95 +++---
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index f2c9e16..1c642fd 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3058,6 +3058,85 @@ static int tgsi_dfracexp(struct r600_shader_ctx *ctx)
return 0;
 }
 
+static int cypress_int_double(struct r600_shader_ctx *ctx)
+{
+   struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+   struct r600_bytecode_alu alu;
+   int i, r;
+   int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+   if (inst->Instruction.Opcode == TGSI_OPCODE_I2D ||
+       inst->Instruction.Opcode == TGSI_OPCODE_U2D) {
+
+       for (i = 0; i <= (lasti+1)/2; i++) {
+           memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+           alu.op = ctx->inst_info->op;
+
+           r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
+           alu.dst.sel = ctx->temp_reg;
+           alu.dst.chan = i;
+           alu.dst.write = 1;
+           alu.last = 1;
+
+           r = r600_bytecode_add_alu(ctx->bc, &alu);
+           if (r)
+               return r;
+       }
+
+       for (i = 0; i <= lasti; i++) {
+           memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+           alu.op = ALU_OP1_FLT32_TO_FLT64;
+
+           alu.src[0].chan = i/2;
+           if (i%2 == 0)
+               alu.src[0].sel = ctx->temp_reg;
+           else {
+               alu.src[0].sel = V_SQ_ALU_SRC_LITERAL;
+               alu.src[0].value = 0x0;
+           }
+           tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+           alu.last = i == lasti;
+
+           r = r600_bytecode_add_alu(ctx->bc, &alu);
+           if (r)
+               return r;
+       }
+   }
+   else { // D2I/D2U
+
+       for (i = 0; i <= lasti; i++) {
+           memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+           alu.op = ALU_OP1_FLT64_TO_FLT32;
+
+           r600_bytecode_src(&alu.src[0], &ctx->src[0], fp64_switch(i));
+           alu.dst.chan = i;
+           alu.dst.sel = ctx->temp_reg;
+           alu.dst.write = i%2 == 0;
+           alu.last = i == lasti;
+
+           r = r600_bytecode_add_alu(ctx->bc, &alu);
+           if (r)
+               return r;
+       }
+
+       for (i = 0; i <= (lasti+1)/2; i++) {
+           memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+           alu.op = ctx->inst_info->op;
+
+           alu.src[0].chan = i*2;
+           alu.src[0].sel = ctx->temp_reg;
+           tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+           alu.last = 1;
+
+           r = r600_bytecode_add_alu(ctx->bc, &alu);
+           if (r)
+               return r;
+       }
+   }
+
+   return 0;
+}
+
 static int cayman_emit_double_instr(struct r600_shader_ctx *ctx)
 {
struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8150,10 +8229,10 @@ static const struct r600_shader_tgsi_instruction 
eg_shader_tgsi_instruction[] =
[TGSI_OPCODE_DFRAC] = { ALU_OP1_FRACT_64, tgsi_op2_64},
[TGSI_OPCODE_DLDEXP]= { ALU_OP2_LDEXP_64, tgsi_op2_64},
[TGSI_OPCODE_DFRACEXP]  = { ALU_OP1_FREXP_64, tgsi_dfracexp},
-   [TGSI_OPCODE_D2I]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_I2D]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_D2U]   = { ALU_OP0_NOP, tgsi_unsupported},
-   [TGSI_OPCODE_U2D]   = { ALU_OP0_NOP, tgsi_unsupported},
+   [TGSI_OPCODE_D2I]   = { ALU_OP1_FLT_TO_INT, cypress_int_double},
+   [TGSI_OPCODE_I2D]   = { ALU_OP1_INT_TO_FLT, cypress_int_double},
+   [TGSI_OPCODE_D2U]   = { ALU_OP1_FLT_TO_UINT, cypress_int_double},
+   [TGSI_OPCODE_U2D]   = { ALU_OP1_INT_TO_FLT, cypress_int_double},
[TGSI_OPCODE_DRSQ]  = { ALU_OP2_RECIPSQRT_64, 
cayman_emit_double_instr},
[TGSI_OPCODE_LAST]  = { ALU_OP0_NOP, tgsi_unsupported},
 };
@@ -8372,10 +8451,10 @@ static const struct r600_shader_tgsi_instruction 
cm_shader_tgsi_instruction[] =
[TGSI_OPCODE_DFRAC] = { ALU_OP1_FRACT_64, tgsi_op2_64},

Re: [Mesa-dev] [PATCH] r600/sb: update last_cf for finalize if.

2015-08-31 Thread Glenn Kennard

On Mon, 31 Aug 2015 06:23:38 +0200, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.

I think this fixes one of the earlier shader going off end
of memory problems we've stopped.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp  
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp

index e8ed5a2..726e438 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -199,6 +199,9 @@ void bc_finalizer::finalize_if(region_node* r) {
cf_node *if_jump = sh.create_cf(CF_OP_JUMP);
cf_node *if_pop = sh.create_cf(CF_OP_POP);
+   if (!last_cf || last_cf->get_parent_region() == r) {
+   last_cf = if_pop;
+   }
if_pop->bc.pop_count = 1;
if_pop->jump_after(if_pop);



Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>


Re: [Mesa-dev] [PATCH] r600: port si_conv_prim_to_gs_out from radeonsi

2015-08-28 Thread Glenn Kennard

On Fri, 28 Aug 2015 02:47:44 +0200, Dave Airlie <airl...@gmail.com> wrote:


From: Dave Airlie <airl...@redhat.com>

This code we broken by the tess merge, and I totally missed it


was broken


until now. I'm not sure this fixes anything but it stops the assert.

Cc: "11.0" <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/r600_pipe.h | 31  
---

 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.h  
b/src/gallium/drivers/r600/r600_pipe.h

index 384ba80..3247aba 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -939,21 +939,22 @@ static inline bool r600_can_read_depth(struct  
r600_texture *rtex)

 static inline unsigned r600_conv_prim_to_gs_out(unsigned mode)
 {
static const int prim_conv[] = {
-   V_028A6C_OUTPRIM_TYPE_POINTLIST,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-   V_028A6C_OUTPRIM_TYPE_TRISTRIP
+   [PIPE_PRIM_POINTS]                   = V_028A6C_OUTPRIM_TYPE_POINTLIST,
+   [PIPE_PRIM_LINES]                    = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_LOOP]                = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_STRIP]               = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_TRIANGLES]                = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_STRIP]           = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_FAN]             = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_QUADS]                    = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_QUAD_STRIP]               = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_POLYGON]                  = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_LINES_ADJACENCY]          = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_LINE_STRIP_ADJACENCY]     = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+   [PIPE_PRIM_TRIANGLES_ADJACENCY]      = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+   [PIPE_PRIM_PATCHES]                  = V_028A6C_OUTPRIM_TYPE_POINTLIST,
+   [R600_PRIM_RECTANGLE_LIST]           = V_028A6C_OUTPRIM_TYPE_TRISTRIP
};
assert(mode < Elements(prim_conv));



A dup of si_conv_prim_to_gs_out(), but probably not worth the hassle of  
sharing.
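
For reference, a minimal sketch of the designated-initializer table the patch switches to, with the bounds assert in front of the lookup. The SK_* names are hypothetical stand-ins for this sketch only, not the real PIPE_PRIM_* or V_028A6C_* definitions:

```c
#include <assert.h>

/* Hypothetical stand-ins for the primitive and output-primitive enums. */
enum sk_prim { SK_PRIM_POINTS, SK_PRIM_LINES, SK_PRIM_TRIANGLES, SK_PRIM_PATCHES };
enum sk_outprim { SK_OUT_POINTLIST, SK_OUT_LINESTRIP, SK_OUT_TRISTRIP };

#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static unsigned sk_conv_prim_to_gs_out(unsigned mode)
{
    /* Each entry names the index it belongs to, so adding or
     * reordering enum values cannot silently shift the table. */
    static const int prim_conv[] = {
        [SK_PRIM_POINTS]    = SK_OUT_POINTLIST,
        [SK_PRIM_LINES]     = SK_OUT_LINESTRIP,
        [SK_PRIM_TRIANGLES] = SK_OUT_TRISTRIP,
        [SK_PRIM_PATCHES]   = SK_OUT_POINTLIST,
    };

    assert(mode < SK_ARRAY_SIZE(prim_conv));
    return prim_conv[mode];
}
```

One thing to keep in mind with this style: any index left out of the initializer list is zero-filled, which in this sketch would read as SK_OUT_POINTLIST.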


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] r600g/sb: Handle undef in read port tracker

2015-08-27 Thread Glenn Kennard
e8e443 missed adding check for undef values also in
unreserve function, leading to an assert triggering.
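
A rough sketch of why the skip condition has to match on both sides, assuming a simple per-GPR use counter. The sk_* types and functions are invented for illustration and are not the rp_gpr_tracker implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NUM_GPRS 4

/* Hypothetical source operand: only the fields the sketch needs. */
struct sk_value { bool readonly; bool undef; unsigned gpr; };
struct sk_tracker { unsigned uses[SK_NUM_GPRS]; };

/* One predicate shared by reserve and unreserve, so the two can't drift. */
static bool sk_untracked(const struct sk_value *v)
{
    return v->readonly || v->undef;
}

static void sk_reserve(struct sk_tracker *t, const struct sk_value *src, unsigned nsrc)
{
    for (unsigned i = 0; i < nsrc; ++i) {
        if (sk_untracked(&src[i]))
            continue;
        t->uses[src[i].gpr]++;
    }
}

static void sk_unreserve(struct sk_tracker *t, const struct sk_value *src, unsigned nsrc)
{
    for (unsigned i = 0; i < nsrc; ++i) {
        if (sk_untracked(&src[i]))
            continue;
        /* Without the undef check above, this would fire for a value
         * that was never counted by sk_reserve(). */
        assert(t->uses[src[i].gpr] > 0);
        t->uses[src[i].gpr]--;
    }
}
```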

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 6268078..c98b8ff 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -236,7 +236,7 @@ void rp_gpr_tracker::unreserve(alu_node* n) {
 
for (i = 0; i < nsrc; ++i) {
value *v = n->src[i];
-   if (v->is_readonly())
+   if (v->is_readonly() || v->is_undef())
continue;
if (i == 1 && opt)
continue;
-- 
1.9.1



[Mesa-dev] [PATCH 3/3] r600g/sb: Don't crash on empty if jump target

2015-08-27 Thread Glenn Kennard
Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 748aae2..c479927 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -792,6 +792,9 @@ int bc_parser::prepare_if(cf_node* c) {
assert(c->bc.addr-1 < cf_map.size());
cf_node *c_else = NULL, *end = cf_map[c->bc.addr];
 
+   if (!end)
+   return 0; // not quite sure how this happens, malformed input?
+
BCP_DUMP(
sblog << "parsing JUMP @" << c->bc.id;
sblog << "\n";
@@ -817,7 +820,7 @@ int bc_parser::prepare_if(cf_node* c) {
if (c_else->parent != c->parent)
c_else = NULL;
 
-   if (end->parent != c->parent)
+   if (end && end->parent != c->parent)
end = NULL;
 
region_node *reg = sh->create_region();
-- 
1.9.1



[Mesa-dev] [PATCH 2/3] r600g/sb: Don't read junk after EOP

2015-08-27 Thread Glenn Kennard
Shaders that contain instruction data after an instruction with EOP could end
up parsing that as an instruction, leading to various crashes and asserts in
SB as it gets very confused if it sees for instance a loop start instruction
jumping off to some random point.

Add a couple of asserts, and print EOP bit if set in old asm printer.
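
As a much-simplified illustration of the idea (stop at the end marker and never index past the buffer), here is a sketch. It assumes two dwords per CF instruction and an invented EOP bit position; it does not reproduce SB's max_cf tracking or the real per-chip encoding:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch only: the end-of-program flag is bit 21 of
 * the second dword. The real bit position depends on the chip. */
#define SK_EOP_BIT (1u << 21)

/* Count CF instructions up to and including the one carrying EOP.
 * Anything after that instruction is treated as junk and ignored. */
static size_t sk_count_cf(const uint32_t *dw, size_t ndw)
{
    size_t i = 0, count = 0;

    /* i + 1 < ndw guarantees both dwords of the pair are in bounds. */
    while (i + 1 < ndw) {
        uint32_t dw1 = dw[i + 1];

        i += 2;
        count++;
        if (dw1 & SK_EOP_BIT)
            break;
    }
    return count;
}
```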

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/r600_asm.c   | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp | 1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp  | 4 +++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c 
b/src/gallium/drivers/r600/r600_asm.c
index 762cc7f..b514c58 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2029,6 +2029,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
fprintf(stderr, "CND:%X ", cf->cond);
if (cf->pop_count)
fprintf(stderr, "POP:%X ", cf->pop_count);
+   if (cf->end_of_program)
+   fprintf(stderr, "EOP ");
fprintf(stderr, "\n");
}
}
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 5e233f9..5fe8f50 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -32,6 +32,7 @@ int bc_decoder::decode_cf(unsigned i, bc_cf bc) {
int r = 0;
uint32_t dw0 = dw[i];
uint32_t dw1 = dw[i+1];
+   assert(i+1 <= ndw);
 
if ((dw1 >> 29) & 1) { // CF_ALU
return decode_cf_alu(i, bc);
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 4879c03..748aae2 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -95,7 +95,7 @@ int bc_parser::decode_shader() {
if ((r = decode_cf(i, eop)))
return r;
 
-   } while (!eop || (i >> 1) <= max_cf);
+   } while (!eop || (i >> 1) < max_cf);
 
return 0;
 }
@@ -769,6 +769,7 @@ int bc_parser::prepare_ir() {
 }
 
 int bc_parser::prepare_loop(cf_node* c) {
+   assert(c->bc.addr-1 < cf_map.size());
 
cf_node *end = cf_map[c->bc.addr - 1];
assert(end->bc.op == CF_OP_LOOP_END);
@@ -788,6 +789,7 @@ int bc_parser::prepare_loop(cf_node* c) {
 }
 
 int bc_parser::prepare_if(cf_node* c) {
+   assert(c->bc.addr-1 < cf_map.size());
cf_node *c_else = NULL, *end = cf_map[c->bc.addr];
 
BCP_DUMP(
-- 
1.9.1



[Mesa-dev] [PATCH] r600g: Fix assert in tgsi_cmp

2015-08-22 Thread Glenn Kennard
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=91726

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/r600_shader.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 6cbfd1b..4c4b600 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -6151,10 +6151,10 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx)
r = tgsi_make_src_for_op3(ctx, temp_regs[0], i, &alu.src[0], &ctx->src[0]);
if (r)
return r;
-   r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[1], &ctx->src[2]);
+   r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[1], &ctx->src[2]);
if (r)
return r;
-   r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[2], &ctx->src[1]);
+   r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[2], &ctx->src[1]);
if (r)
return r;
tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
-- 
1.9.1



[Mesa-dev] [PATCH] r600g: Fix handling of TGSI_OPCODE_ARR with SB

2015-08-13 Thread Glenn Kennard
FLT_TO_INT goes in the vector pipes on evergreen/NI,
not the trans unit as on earlier chips.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Fixes issue found on nine: https://github.com/iXit/Mesa-3D/issues/119

 src/gallium/drivers/r600/r600_isa.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index 381f06d..fdbe1c0 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -262,7 +262,7 @@ static const struct alu_op_info alu_op_table[] = {
{PRED_SETNE_PUSH_INT,   2, { 0x4D, 0x4D },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_NE | AF_INT_CMP },
{PRED_SETLT_PUSH_INT,   2, { 0x4E, 0x4E },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_LT | AF_INT_CMP },
{PRED_SETLE_PUSH_INT,   2, { 0x4F, 0x4F },{  AF_VS, 
AF_VS, AF_VS, AF_VS},  AF_PRED_PUSH | AF_CC_LE | AF_INT_CMP },
-   {FLT_TO_INT,1, { 0x6B, 0x50 },{   AF_S,  
AF_S, AF_VS, AF_VS},  AF_INT_DST | AF_CVT },
+   {FLT_TO_INT,1, { 0x6B, 0x50 },{   AF_S,  
AF_S,  AF_V,  AF_V},  AF_INT_DST | AF_CVT },
{BFREV_INT, 1, {   -1, 0x51 },{  0, 
0, AF_VS, AF_VS},  AF_INT_DST },
{ADDC_UINT, 2, {   -1, 0x52 },{  0, 
0, AF_VS, AF_VS},  AF_UINT_DST },
{SUBB_UINT, 2, {   -1, 0x53 },{  0, 
0, AF_VS, AF_VS},  AF_UINT_DST },
-- 
1.9.1



Re: [Mesa-dev] [PATCH] r600g: fix sampler/ubo indexing on cayman

2015-07-09 Thread Glenn Kennard

On Thu, 09 Jul 2015 07:37:59 +0200, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

Cayman needs a different method to upload the CF IDX0/1

This fixes 31 piglits when ARB_gpu_shader5 is forced on
with cayman.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/eg_asm.c | 17 +++--
 src/gallium/drivers/r600/eg_sq.h  | 11 +++
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c  
b/src/gallium/drivers/r600/eg_asm.c

index d04921e..c32d317 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -161,6 +161,9 @@ int egcm_load_index_reg(struct r600_bytecode *bc,  
unsigned id, bool inside_alu_c

alu.op = ALU_OP1_MOVA_INT;
alu.src[0].sel = bc->index_reg[id];
alu.src[0].chan = 0;
+   if (bc->chip_class == CAYMAN)
+		alu.dst.sel = id == 0 ? CM_V_SQ_MOVA_DST_CF_IDX0 : CM_V_SQ_MOVA_DST_CF_IDX1;
+
alu.last = 1;
r = r600_bytecode_add_alu(bc, &alu);
if (r)
@@ -168,12 +171,14 @@ int egcm_load_index_reg(struct r600_bytecode *bc,  
unsigned id, bool inside_alu_c

bc->ar_loaded = 0; /* clobbered */


Could split ar_loaded into 3 bits for AR/IDX0/IDX1 for cayman, however I  
think it would be better to teach SB to handle sampler/ubo indexing and  
keep things simple here.



-   memset(&alu, 0, sizeof(alu));
-   alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
-   alu.last = 1;
-   r = r600_bytecode_add_alu(bc, &alu);
-   if (r)
-   return r;
+   if (bc->chip_class == EVERGREEN) {
+   memset(&alu, 0, sizeof(alu));
+   alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
+   alu.last = 1;
+   r = r600_bytecode_add_alu(bc, &alu);
+   if (r)
+   return r;
+   }
/* Must split ALU group as index only applies to following group */
if (inside_alu_clause) {
diff --git a/src/gallium/drivers/r600/eg_sq.h  
b/src/gallium/drivers/r600/eg_sq.h

index b534872..10caa07 100644
--- a/src/gallium/drivers/r600/eg_sq.h
+++ b/src/gallium/drivers/r600/eg_sq.h
@@ -521,4 +521,15 @@
#define V_SQ_REL_ABSOLUTE 0
 #define V_SQ_REL_RELATIVE 1
+
+/* CAYMAN has special encoding for MOVA_INT destination */
+#define CM_V_SQ_MOVA_DST_AR_X 0
+#define CM_V_SQ_MOVA_DST_CF_PC 1
+#define CM_V_SQ_MOVA_DST_CF_IDX0 2
+#define CM_V_SQ_MOVA_DST_CF_IDX1 3



+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_7_0 4
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_15_8 5
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_23_16 6
+#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_31_24 7


Can't think of any useful cases for the cayman specific ALU global  
register. Drop these four?



+
 #endif



Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g: move sampler/ubo index registers before temp reg

2015-07-09 Thread Glenn Kennard

On Thu, 09 Jul 2015 08:00:48 +0200, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

temp_reg needs to be last, as we increment things
away from it, otherwise on cayman some tests were overwriting
the index regs.

Fixes 2 piglit with ARB_gpu_shader5 forced on cayman.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c  
b/src/gallium/drivers/r600/r600_shader.c

index af7622e..1a72bf6 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -1931,15 +1931,14 @@ static int r600_shader_from_tgsi(struct  
r600_context *rctx,

ctx.file_offset[TGSI_FILE_IMMEDIATE] = V_SQ_ALU_SRC_LITERAL;
ctx.bc->ar_reg = ctx.file_offset[TGSI_FILE_TEMPORARY] +
ctx.info.file_max[TGSI_FILE_TEMPORARY] + 1;
+   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 1;
+   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 2;
+
if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
-   ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 1;
-   ctx.temp_reg = ctx.bc->ar_reg + 2;
-   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 3;
-   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 4;
+   ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 3;
+   ctx.temp_reg = ctx.bc->ar_reg + 4;
} else {
-   ctx.temp_reg = ctx.bc->ar_reg + 1;
-   ctx.bc->index_reg[0] = ctx.bc->ar_reg + 2;
-   ctx.bc->index_reg[1] = ctx.bc->ar_reg + 3;
+   ctx.temp_reg = ctx.bc->ar_reg + 3;
}
shader->max_arrays = 0;


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
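
To make the new ordering concrete, a small sketch that computes the layout the patch ends up with. The struct and function are made up for illustration; only the offsets relative to ar_reg come from the diff above:

```c
/* Hypothetical container for the register numbers assigned in the patch. */
struct sk_reg_layout {
    unsigned ar_reg;
    unsigned index_reg[2];
    unsigned gs_export_gpr_treg; /* only meaningful for geometry shaders */
    unsigned temp_reg;
};

static struct sk_reg_layout sk_layout_regs(unsigned temp_offset,
                                           unsigned temp_file_max,
                                           int is_geometry)
{
    struct sk_reg_layout l;

    l.ar_reg = temp_offset + temp_file_max + 1;
    l.index_reg[0] = l.ar_reg + 1;
    l.index_reg[1] = l.ar_reg + 2;

    if (is_geometry) {
        l.gs_export_gpr_treg = l.ar_reg + 3;
        l.temp_reg = l.ar_reg + 4;
    } else {
        l.gs_export_gpr_treg = 0; /* unused in this sketch */
        l.temp_reg = l.ar_reg + 3;
    }
    /* temp_reg is now the highest fixed register, so temporaries handed
     * out by counting upward from it cannot land on the index registers. */
    return l;
}
```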


Re: [Mesa-dev] [PATCH 3/4] r600/sb_sched: fix what appears to be a typo in a condition

2015-06-05 Thread Glenn Kennard

On Fri, 05 Jun 2015 19:39:31 +0200, Marek Olšák mar...@gmail.com wrote:


I'd like somebody who knows r600/sb to review this. Glenn, can I
bother you please? :)

Marek

On Fri, Jun 5, 2015 at 2:31 PM, Martin Peres
martin.pe...@linux.intel.com wrote:

Signed-off-by: Martin Peres martin.pe...@linux.intel.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp  
b/src/gallium/drivers/r600/sb/sb_sched.cpp

index 2e38a62..6268078 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -489,7 +489,7 @@ bool alu_group_tracker::try_reserve(alu_node* n) {

n->bc.bank_swizzle = 0;

-   if (!trans & fbs)
+   if (!trans && fbs)
n->bc.bank_swizzle = VEC_210;

if (gpr.try_reserve(n)) {
--
2.4.2



In theory this changes behavior, but forced_bank_swizzle(), the function  
that sets fbs, currently only returns two values, VEC_012=0 or VEC_210=5,  
so the bit tested by the bitwise & coincides with the result of the logical  
&& operation. So if using logical and instead silences gcc 5's irritable  
warning syndrome, it can get a


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
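
A quick sketch of the equivalence argument, using the two values quoted above. The enum and helper names are invented for this example:

```c
#include <stdbool.h>

/* Values as quoted in the review; the enum name is hypothetical. */
enum sk_bank_swizzle { SK_VEC_012 = 0, SK_VEC_210 = 5 };

/* Old form: !trans is 0 or 1, so only bit 0 of fbs is tested. */
static bool sk_cond_bitwise(bool trans, unsigned fbs)
{
    return (!trans & fbs) != 0;
}

/* New form: any non-zero fbs counts as true. */
static bool sk_cond_logical(bool trans, unsigned fbs)
{
    return !trans && fbs;
}
```

For 0 and 5 (binary 101, bit 0 set) the two forms agree. They would differ for a non-zero value with bit 0 clear, such as 2, which is why the change is only a no-op as long as those are the only two values returned.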


Re: [Mesa-dev] [PATCH 5/5] nir: Allow feq/fne/ieq/ine to be optimized with inot.

2015-05-06 Thread Glenn Kennard

On Wed, 06 May 2015 23:12:54 +0200, Matt Turner matts...@gmail.com wrote:


instructions in affected programs: 380 - 376 (-1.05%)
helped:2
---
Did we just completely forget these in commit 391fb32b, or is there a
reason to not include them?

 src/glsl/nir/nir_opt_algebraic.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/glsl/nir/nir_opt_algebraic.py  
b/src/glsl/nir/nir_opt_algebraic.py

index b0a1f24..400d60e 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -83,8 +83,12 @@ optimizations = [
# Comparison simplifications
(('inot', ('flt', a, b)), ('fge', a, b)),
(('inot', ('fge', a, b)), ('flt', a, b)),
+   (('inot', ('feq', a, b)), ('fne', a, b)),
+   (('inot', ('fne', a, b)), ('feq', a, b)),


These two will produce inverted results for NaN inputs. GLSL 4.5 spec  
doesn't mention requiring ieee754 compliant comparison operators though so  
probably okay.



(('inot', ('ilt', a, b)), ('ige', a, b)),
(('inot', ('ige', a, b)), ('ilt', a, b)),
+   (('inot', ('ieq', a, b)), ('ine', a, b)),
+   (('inot', ('ine', a, b)), ('ieq', a, b)),
(('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
(('bcsel', ('flt', a, b), a, b), ('fmin', a, b)),
(('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),



Patches 1-5 are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 00/19] gallium: basic tessellation support

2015-05-03 Thread Glenn Kennard
On Sat, 02 May 2015 22:16:24 +0200, Ilia Mirkin imir...@alum.mit.edu  
wrote:



This series adds tokens and updates some helper gallium functions to
know about tessellation. This provides no actual support for
tessellation in either core or drivers, however this will make it
possible to work on the core and driver pieces without crazy
interdependencies, as well as be landed separately and without
(direct) dependency.

Most of these patches have existed for about a year already, and have
been part of my and Marek's trees enabling tessellation in the nvc0
and radeonsi drivers. I've taken this opportunity to fix up and fold
some of them though.

This should be pretty safe to land, since even if I messed something
up, having this in-tree will make it easier for others to identify and
fix any issues collaboratively.

Ilia Mirkin (11):
  gallium: add tessellation shader types
  gallium: add new PATCHES primitive type
  gallium: add new semantics for tessellation
  gallium: add interfaces for controlling tess program state
  gallium: add tessellation shader properties
  gallium: add patch_vertices to draw info
  gallium: add set_tess_state to configure default tessellation
parameters
  tgsi/scan: allow scanning tessellation shaders
  tgsi/sanity: set implicit in/out array sizes based on patch sizes
  tgsi/ureg: allow ureg_dst to have dimension indices
  tgsi/dump: fix declaration printing of tessellation inputs/outputs

Marek Olšák (8):
  gallium: bump shader input and output limits
  trace: implement new tessellation functions
  gallium/util: print patch_vertices in util_dump_draw_info
  gallium/u_blitter: disable tessellation for all operations
  gallium/cso: add support for tessellation shaders
  gallium/cso: set NULL shaders at context destruction
  gallium: disable tessellation shaders for meta ops
  tgsi/ureg: use correct limit for max input count

 src/gallium/auxiliary/cso_cache/cso_context.c | 100  
++

 src/gallium/auxiliary/cso_cache/cso_context.h |  12 
 src/gallium/auxiliary/hud/hud_context.c   |   6 ++
 src/gallium/auxiliary/postprocess/pp_run.c|   6 ++
 src/gallium/auxiliary/tgsi/tgsi_dump.c|  20 +-
 src/gallium/auxiliary/tgsi/tgsi_info.c|   4 ++
 src/gallium/auxiliary/tgsi/tgsi_sanity.c  |  36 --
 src/gallium/auxiliary/tgsi/tgsi_scan.c|   6 +-
 src/gallium/auxiliary/tgsi/tgsi_strings.c |  19 -
 src/gallium/auxiliary/tgsi/tgsi_strings.h |   2 +-
 src/gallium/auxiliary/tgsi/tgsi_ureg.c|  26 ++-
 src/gallium/auxiliary/tgsi/tgsi_ureg.h|  59 +--
 src/gallium/auxiliary/util/u_blit.c   |   6 ++
 src/gallium/auxiliary/util/u_blitter.c|  27 +++
 src/gallium/auxiliary/util/u_blitter.h|  16 -
 src/gallium/auxiliary/util/u_dump_state.c |   2 +
 src/gallium/docs/source/context.rst   |   5 ++
 src/gallium/docs/source/tgsi.rst  |  70 ++
 src/gallium/drivers/trace/tr_context.c|  26 +++
 src/gallium/drivers/trace/tr_dump_state.c |   2 +
 src/gallium/include/pipe/p_context.h  |  14 
 src/gallium/include/pipe/p_defines.h  |  16 -
 src/gallium/include/pipe/p_shader_tokens.h|  18 -
 src/gallium/include/pipe/p_state.h|   6 +-
 src/mesa/state_tracker/st_cb_bitmap.c |   8 ++-
 src/mesa/state_tracker/st_cb_clear.c  |   6 ++
 src/mesa/state_tracker/st_cb_drawpixels.c |   8 ++-
 src/mesa/state_tracker/st_cb_drawtex.c|   6 ++
 28 files changed, 501 insertions(+), 31 deletions(-)



Some minor nits for patches 1, 6 and 7, see separate mails

Patches 2-5, 8-19 are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 07/19] gallium: add patch_vertices to draw info

2015-05-03 Thread Glenn Kennard
On Sat, 02 May 2015 22:16:31 +0200, Ilia Mirkin imir...@alum.mit.edu  
wrote:



Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/gallium/include/pipe/p_state.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/include/pipe/p_state.h  
b/src/gallium/include/pipe/p_state.h

index e713a44..449c7f1 100644
--- a/src/gallium/include/pipe/p_state.h
+++ b/src/gallium/include/pipe/p_state.h
@@ -543,6 +543,8 @@ struct pipe_draw_info
unsigned start_instance; /**< first instance id */
unsigned instance_count; /**< number of instances */
+   unsigned patch_vertices; /**< the number of vertices per patch */
+


Name this patch_vertex_count; the field isn't the actual patch vertex data.
Don't forget to update patch 10 with the new name.


/**
 * For indexed drawing, these fields apply after index lookup.
 */


With above fixed,
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 01/19] gallium: add tessellation shader types

2015-05-03 Thread Glenn Kennard
On Sat, 02 May 2015 22:16:25 +0200, Ilia Mirkin imir...@alum.mit.edu  
wrote:



Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/gallium/auxiliary/tgsi/tgsi_info.c | 4 
 src/gallium/auxiliary/tgsi/tgsi_strings.c  | 4 +++-
 src/gallium/auxiliary/tgsi/tgsi_strings.h  | 2 +-
 src/gallium/include/pipe/p_defines.h   | 6 --
 src/gallium/include/pipe/p_shader_tokens.h | 4 +++-
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c  
b/src/gallium/auxiliary/tgsi/tgsi_info.c

index 3cab86e..eb447cb 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_info.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
@@ -302,6 +302,10 @@ tgsi_get_processor_name( uint processor )
   return "fragment shader";
case TGSI_PROCESSOR_GEOMETRY:
   return "geometry shader";
+   case TGSI_PROCESSOR_TESSCTRL:
+  return "tessellation control shader";
+   case TGSI_PROCESSOR_TESSEVAL:
+  return "tessellation evaluation shader";
default:
   return "unknown shader type!";
}
diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c  
b/src/gallium/auxiliary/tgsi/tgsi_strings.c

index 9b727cf..e712f30 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -32,11 +32,13 @@
 #include "tgsi_strings.h"
-const char *tgsi_processor_type_names[4] =
+const char *tgsi_processor_type_names[6] =


Don't forget to update the declaration in tgsi_strings.h
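
One way to keep the declaration and definition from drifting apart is to size both from a single enum count instead of a literal. This is only a sketch with hypothetical sk_* names, not a proposal for the actual TGSI headers:

```c
#include <stddef.h>

/* Hypothetical processor enum; the trailing COUNT entry sizes the table. */
enum sk_processor {
    SK_PROC_FRAGMENT,
    SK_PROC_VERTEX,
    SK_PROC_GEOMETRY,
    SK_PROC_TESSCTRL,
    SK_PROC_TESSEVAL,
    SK_PROC_COMPUTE,
    SK_PROC_COUNT
};

/* Would live in the header: no hard-coded 4 or 6 to update. */
extern const char *sk_processor_names[SK_PROC_COUNT];

/* Would live in the .c file. */
const char *sk_processor_names[SK_PROC_COUNT] = {
    [SK_PROC_FRAGMENT] = "FRAG",
    [SK_PROC_VERTEX]   = "VERT",
    [SK_PROC_GEOMETRY] = "GEOM",
    [SK_PROC_TESSCTRL] = "TESSC",
    [SK_PROC_TESSEVAL] = "TESSE",
    [SK_PROC_COMPUTE]  = "COMP",
};

/* Guarded lookup: out-of-range or unfilled slots map to a fallback. */
static const char *sk_processor_name(unsigned p)
{
    if (p >= SK_PROC_COUNT || sk_processor_names[p] == NULL)
        return "UNKNOWN";
    return sk_processor_names[p];
}
```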


 {
"FRAG",
"VERT",
"GEOM",
+   "TESSC",
+   "TESSE",


A bit silly to shorten these when the dumps dedicate an entire line for  
printing the name.



"COMP"
 };
diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.h  
b/src/gallium/auxiliary/tgsi/tgsi_strings.h

index 90014a2..71e7437 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.h
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.h
@@ -38,7 +38,7 @@ extern "C" {
 #endif
-extern const char *tgsi_processor_type_names[4];
+extern const char *tgsi_processor_type_names[6];
extern const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT];
diff --git a/src/gallium/include/pipe/p_defines.h  
b/src/gallium/include/pipe/p_defines.h

index 67f48e4..48c182f 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -404,8 +404,10 @@ enum pipe_flush_flags
 #define PIPE_SHADER_VERTEX   0
 #define PIPE_SHADER_FRAGMENT 1
 #define PIPE_SHADER_GEOMETRY 2
-#define PIPE_SHADER_COMPUTE  3
-#define PIPE_SHADER_TYPES4
+#define PIPE_SHADER_TESSCTRL 3
+#define PIPE_SHADER_TESSEVAL 4


Most of the gallium names are typed out without contractions, ie  
PIPE_SHADER_TESSELLATION_CONTROL/EVALUATION



+#define PIPE_SHADER_COMPUTE  5
+#define PIPE_SHADER_TYPES6
/**
diff --git a/src/gallium/include/pipe/p_shader_tokens.h  
b/src/gallium/include/pipe/p_shader_tokens.h

index c14bcbc..776b0d4 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -43,7 +43,9 @@ struct tgsi_header
 #define TGSI_PROCESSOR_FRAGMENT  0
 #define TGSI_PROCESSOR_VERTEX1
 #define TGSI_PROCESSOR_GEOMETRY  2
-#define TGSI_PROCESSOR_COMPUTE   3
+#define TGSI_PROCESSOR_TESSCTRL  3
+#define TGSI_PROCESSOR_TESSEVAL  4
+#define TGSI_PROCESSOR_COMPUTE   5
struct tgsi_processor
 {


With above niggles fixed
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 06/19] gallium: add tessellation shader properties

2015-05-03 Thread Glenn Kennard
On Sat, 02 May 2015 22:16:30 +0200, Ilia Mirkin imir...@alum.mit.edu  
wrote:



Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/gallium/auxiliary/tgsi/tgsi_strings.c  |  7 ++-
 src/gallium/docs/source/tgsi.rst   | 33  
++

 src/gallium/include/pipe/p_defines.h   |  7 +++
 src/gallium/include/pipe/p_shader_tokens.h |  7 ++-
 4 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c  
b/src/gallium/auxiliary/tgsi/tgsi_strings.c

index dad503e..6781248 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
@@ -131,7 +131,12 @@ const char  
*tgsi_property_names[TGSI_PROPERTY_COUNT] =

"FS_DEPTH_LAYOUT",
"VS_PROHIBIT_UCPS",
"GS_INVOCATIONS",
-   "VS_WINDOW_SPACE_POSITION"
+   "VS_WINDOW_SPACE_POSITION",
+   "TCS_VERTICES_OUT",
+   "TES_PRIM_MODE",
+   "TES_SPACING",
+   "TES_VERTEX_ORDER_CW",
+   "TES_POINT_MODE",


Stray comma


 };
const char *tgsi_return_type_names[TGSI_RETURN_TYPE_COUNT] =
diff --git a/src/gallium/docs/source/tgsi.rst  
b/src/gallium/docs/source/tgsi.rst

index 0116842..f77702a 100644
--- a/src/gallium/docs/source/tgsi.rst
+++ b/src/gallium/docs/source/tgsi.rst
@@ -3071,6 +3071,39 @@ Naturally, clipping is not performed on window  
coordinates either.
 The effect of this property is undefined if a geometry or tessellation  
shader

 are in use.
+TCS_VERTICES_OUT
+
+
+The number of vertices written by the tessellation control shader. This
+effectively defines the patch input size of the tessellation evaluation  
shader

+as well.
+
+TES_PRIM_MODE
+
+
+This sets the tessellation primitive mode, one of  
``PIPE_PRIM_TRIANGLES``,

+``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
+separate isolines settings, the regular lines is assumed to mean  
isolines.)

+
+TES_SPACING
+
+
+This sets the spacing mode of the tessellation generator, one of
+``PIPE_TESS_SPACING_*``.
+
+TES_VERTEX_ORDER_CW
+
+
+This sets the vertex order to be clockwise if the value is 1, or
+counter-clockwise if set to 0.
+
+TES_POINT_MODE
+
+
+If set to a non-zero value, this turns on point mode for the  
tessellator,

+which means that points will be generated instead of primitives.
+
+
 Texture Sampling and Texture Formats
 
diff --git a/src/gallium/include/pipe/p_defines.h  
b/src/gallium/include/pipe/p_defines.h

index 59b7486..14e0db3 100644
--- a/src/gallium/include/pipe/p_defines.h
+++ b/src/gallium/include/pipe/p_defines.h
@@ -432,6 +432,13 @@ enum pipe_flush_flags
/**
+ * Tessellator spacing types
+ */
+#define PIPE_TESS_SPACING_FRACT_ODD  0
+#define PIPE_TESS_SPACING_FRACT_EVEN 1


GL spec types out the FRACTIONAL which is easier to grep the spec for.


+#define PIPE_TESS_SPACING_EQUAL  2
+
+/**
  * Query object types
  */
 #define PIPE_QUERY_OCCLUSION_COUNTER 0
diff --git a/src/gallium/include/pipe/p_shader_tokens.h  
b/src/gallium/include/pipe/p_shader_tokens.h

index c6ab899..ff1f7d6 100644
--- a/src/gallium/include/pipe/p_shader_tokens.h
+++ b/src/gallium/include/pipe/p_shader_tokens.h
@@ -262,7 +262,12 @@ union tgsi_immediate_data
 #define TGSI_PROPERTY_VS_PROHIBIT_UCPS   7
 #define TGSI_PROPERTY_GS_INVOCATIONS 8
 #define TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION 9
-#define TGSI_PROPERTY_COUNT  10
+#define TGSI_PROPERTY_TCS_VERTICES_OUT   10
+#define TGSI_PROPERTY_TES_PRIM_MODE  11
+#define TGSI_PROPERTY_TES_SPACING12
+#define TGSI_PROPERTY_TES_VERTEX_ORDER_CW13
+#define TGSI_PROPERTY_TES_POINT_MODE 14
+#define TGSI_PROPERTY_COUNT  15
struct tgsi_property {
unsigned Type : 4;  /** TGSI_TOKEN_TYPE_PROPERTY */


With above niggles fixed
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


[Mesa-dev] [PATCH] r600g/sb: Skip empty ALU clause while scheduling

2015-04-08 Thread Glenn Kennard
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp 
b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 4248a3f..2e38a62 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -825,6 +825,9 @@ void post_scheduler::init_regmap() {
 
 void post_scheduler::process_alu(container_node *c) {
 
+   if (c->empty())
+   return;
+
ucm.clear();
alu.reset();
 
-- 
1.9.1



[Mesa-dev] [PATCH v2] r600g/sb: Enable SB for geometry shaders

2015-04-06 Thread Glenn Kennard
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.

Mark emit instructions as unkillable so DCE doesn't eat them.

Enable only for evergreen/cayman as there are a few
unexplained GS piglit regressions on R6xx/R7xx with SB
enabled otherwise.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Changes since v1:
* Enable SB only for >= EVERGREEN. Something strange going on
  with GS on R6xx/R7xx that the code emitted by SB triggers,
  haven't been able to pinpoint it yet.
* Avoid splitting live ranges for SV_GEOMETRY_EMIT values, useless
  since they are not actual values. Avoids unnecessary MOV operations
  being emitted.
* Ensure the asm dump prints out the SV_GEOMETRY_EMIT dst values
* One bytecode dumper fix spotted by Coverity

Note:
Requires 'r600g/sb: Update last_cf for loops' for cayman to
pass all GS piglits without regressions - not a GS bug but a
loop handling issue that only triggers in some GS piglit shaders.

 src/gallium/drivers/r600/r600_isa.h|  8 
 src/gallium/drivers/r600/r600_shader.c | 12 
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 25 +
 src/gallium/drivers/r600/sb/sb_core.cpp|  5 -
 src/gallium/drivers/r600/sb/sb_dump.cpp|  4 +++-
 src/gallium/drivers/r600/sb/sb_ir.h|  6 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp |  4 ++--
 src/gallium/drivers/r600/sb/sb_sched.cpp   |  2 +-
 src/gallium/drivers/r600/sb/sb_valtable.cpp|  1 +
 11 files changed, 55 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index ec3f702..381f06d 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -641,7 +641,7 @@ static const struct cf_op_info cf_op_table[] = {
 
    {"MEM_SCRATCH",               { 0x24, 0x24, 0x50, 0x50 },  CF_MEM  },
    {"MEM_REDUCT",                { 0x25, 0x25,   -1,   -1 },  CF_MEM  },
-   {"MEM_RING",                  { 0x26, 0x26, 0x52, 0x52 },  CF_MEM  },
+   {"MEM_RING",                  { 0x26, 0x26, 0x52, 0x52 },  CF_MEM | CF_EMIT },
 
    {"EXPORT",                    { 0x27, 0x27, 0x53, 0x53 },  CF_EXP  },
    {"EXPORT_DONE",               { 0x28, 0x28, 0x54, 0x54 },  CF_EXP  },
@@ -649,9 +649,9 @@ static const struct cf_op_info cf_op_table[] = {
    {"MEM_EXPORT",                {   -1, 0x3A, 0x55, 0x55 },  CF_MEM  },
    {"MEM_RAT",                   {   -1,   -1, 0x56, 0x56 },  CF_MEM | CF_RAT },
    {"MEM_RAT_NOCACHE",           {   -1,   -1, 0x57, 0x57 },  CF_MEM | CF_RAT },
-   {"MEM_RING1",                 {   -1,   -1, 0x58, 0x58 },  CF_MEM  },
-   {"MEM_RING2",                 {   -1,   -1, 0x59, 0x59 },  CF_MEM  },
-   {"MEM_RING3",                 {   -1,   -1, 0x5A, 0x5A },  CF_MEM  },
+   {"MEM_RING1",                 {   -1,   -1, 0x58, 0x58 },  CF_MEM | CF_EMIT },
+   {"MEM_RING2",                 {   -1,   -1, 0x59, 0x59 },  CF_MEM | CF_EMIT },
+   {"MEM_RING3",                 {   -1,   -1, 0x5A, 0x5A },  CF_MEM | CF_EMIT },
    {"MEM_MEM_COMBINED",          {   -1,   -1, 0x5B, 0x5B },  CF_MEM  },
    {"MEM_RAT_COMBINED_NOCACHE",  {   -1,   -1, 0x5C, 0x5C },  CF_MEM | CF_RAT },
    {"MEM_RAT_COMBINED",          {   -1,   -1,   -1, 0x5D },  CF_MEM | CF_RAT }, /* ??? not in cayman isa doc */
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 28b290a..a9338cc 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -159,8 +159,10 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
goto error;
}
 
-   /* disable SB for geom shaders - it can't handle the CF_EMIT instructions */
-   use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
+   /* disable SB for geom shaders on R6xx/R7xx due to some mysterious gs piglit regressions with it enabled. */
+   if (rctx->b.chip_class <= R700) {
+       use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
+   }
    /* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */
    use_sb &= !shader->shader.uses_index_registers;
 
@@ -1141,6 +1143,8 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
    for (i = 0; i < 3; i++) {
        treg[i] = r600_get_temp(ctx);
    }
+   r600_add_gpr_array(ctx->shader, treg[0

Re: [Mesa-dev] [PATCH] r600g: fix op3 abs issue

2015-03-31 Thread Glenn Kennard
].sel = ctx-temp_reg;
@@ -6109,8 +6119,15 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx)
 {
    struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
    struct r600_bytecode_alu alu;
-   int i, r;
+   int i, r, j;
    int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+   int temp_regs[3];
+
+   for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
+       temp_regs[j] = 0;
+       if (ctx->src[j].abs)
+           temp_regs[j] = r600_get_temp(ctx);
+   }
    for (i = 0; i < lasti + 1; i++) {
        if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
@@ -6118,13 +6135,13 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx)
        memset(&alu, 0, sizeof(struct r600_bytecode_alu));
        alu.op = ALU_OP3_CNDGE;
-       r = tgsi_make_src_for_op3(ctx, ctx->temp_reg, 0, &alu.src[0], &ctx->src[0], i);
+       r = tgsi_make_src_for_op3(ctx, temp_regs[0], i, &alu.src[0], &ctx->src[0]);
        if (r)
            return r;
-       r = tgsi_make_src_for_op3(ctx, ctx->temp_reg, 1, &alu.src[1], &ctx->src[2], i);
+       r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[1], &ctx->src[2]);
        if (r)
            return r;
-       r = tgsi_make_src_for_op3(ctx, ctx->temp_reg, 2, &alu.src[2], &ctx->src[1], i);
+       r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[2], &ctx->src[1]);
        if (r)
            return r;
        tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);




Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders

2015-03-25 Thread Glenn Kennard

On Wed, 25 Mar 2015 14:26:40 +0100, Marc Dietrich marvi...@gmx.de wrote:

> On Tuesday, 24 March 2015, 20:05:46, Glenn Kennard wrote:
>> On Tue, 24 Mar 2015 17:21:35 +0100, Dieter Nützel die...@nuetzel-hh.de wrote:
>>> On 20.03.2015 14:13, Glenn Kennard wrote:
>>>> Add SV_GEOMETRY_EMIT special variable type to track the
>>>> implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
>>>> instructions so GCM/scheduler doesn't reorder them.
>>>>
>>>> Mark emit instructions as unkillable so DCE doesn't eat them.
>>>>
>>>> Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
>>>> ---
>>>> The hangs with SB on geometry shaders were all due to the CUT/EMIT
>>>> instructions either being DCE:d or emitted out of order from the
>>>> memory ring writes, so the hardware stalled forever waiting for
>>>> completed primitives.
>>>>
>>>> Tested only on a Turks so far, but should behave the same across
>>>> all R600 generations.
>>>
>>> Hello Glenn,
>>>
>>> what tests are preferred?
>>> Starting with a Turks XT here, too and could do some tests on RV730
>>> (AGP) then.
>>>
>>> -Dieter
>>
>> Just the usual piglit regression testing, at this point it's been tested
>> on a Turks XT, and a RV770. A R6xx card and some VLIW4 gpu would complete
>> the coverage needed.
>
> I would like to, but piglit run quick stalls/crashes the gpu (rs880) too
> often. Maybe you could tell me some special tests to run instead of all.
>
> Marc


-t geometry should be the smallest useful subset. It's likely that most of  
the hangs you get on rs880 (and other r6xx devices) are geometry shader  
related though so that might end up taking as long as a full quick run,  
unfortunately.



[Mesa-dev] [PATCH] r600g/sb: Update last_cf for loops

2015-03-25 Thread Glenn Kennard
CF_END could end up emitted in the middle of a shader on cayman
when there was a loop at the very end.

Fixes glsl-1.50-geometry-end-primitive and
ext_transform_feedback-geometry-shaders-basic piglit tests.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Bug exposed by [PATCH] r600g/sb: Enable SB for geometry shaders

 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 8 
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp 
b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 8d0be06..08b7d77 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -127,6 +127,14 @@ void bc_finalizer::finalize_loop(region_node* r) {
cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);
 
+   // Update last_cf, but don't overwrite it if it's outside the current loop nest since
+   // it may point to a cf that is later in program order.
+   // The single parent level check is sufficient since finalize_loop() is processed in
+   // reverse order from innermost to outermost loop nest level.
+   if (!last_cf || last_cf->get_parent_region() == r) {
+       last_cf = loop_end;
+   }
+
    loop_start->jump_after(loop_end);
    loop_end->jump_after(loop_start);
 
-- 
1.9.1
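
The condition added in finalize_loop() above can be read in isolation as a small decision rule. The following is only a minimal sketch under assumed, simplified types: struct region, struct cf and update_last_cf() are hypothetical stand-ins for illustration, not the sb IR classes.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the sb IR types. */
struct region { int id; };
struct cf { struct region *parent; };

/* Decide what last_cf should be after finalizing loop region r, whose
 * LOOP_END node is loop_end. Mirrors the condition in the patch: take over
 * last_cf only when it is unset or belongs to this same loop region. */
static struct cf *update_last_cf(struct cf *last_cf, struct region *r,
                                 struct cf *loop_end)
{
    if (!last_cf || last_cf->parent == r)
        return loop_end;
    /* last_cf lives outside this loop nest and may be later in program
     * order, so leave it alone. */
    return last_cf;
}
```

With this rule, a last_cf that points at a node after the loop is preserved, while a last_cf inside the loop is replaced by the loop's end node.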



Re: [Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders

2015-03-24 Thread Glenn Kennard
On Tue, 24 Mar 2015 17:21:35 +0100, Dieter Nützel die...@nuetzel-hh.de wrote:

> On 20.03.2015 14:13, Glenn Kennard wrote:
>> Add SV_GEOMETRY_EMIT special variable type to track the
>> implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
>> instructions so GCM/scheduler doesn't reorder them.
>>
>> Mark emit instructions as unkillable so DCE doesn't eat them.
>>
>> Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
>> ---
>> The hangs with SB on geometry shaders were all due to the CUT/EMIT
>> instructions either being DCE:d or emitted out of order from the
>> memory ring writes, so the hardware stalled forever waiting for
>> completed primitives.
>>
>> Tested only on a Turks so far, but should behave the same across
>> all R600 generations.
>
> Hello Glenn,
>
> what tests are preferred?
> Starting with a Turks XT here, too and could do some tests on RV730
> (AGP) then.
>
> -Dieter


Just the usual piglit regression testing, at this point it's been tested  
on a Turks XT, and a RV770. A R6xx card and some VLIW4 gpu would complete  
the coverage needed.



/Glenn
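
For readers following the quoted patch description, the two mechanisms it names (a shared pseudo-value that chains CUT/EMIT/ring-write instructions together, and an unkillable marking for DCE) can be sketched roughly as below. This is an illustrative model only; struct insn and both helper functions are hypothetical and do not correspond to actual sb code.

```c
#include <stdbool.h>

/* Hypothetical instruction record; field names are illustrative only. */
struct insn {
    bool touches_emit_token; /* reads and writes the emit pseudo-value */
    bool has_live_result;    /* some later instruction uses its output */
};

/* Two instructions that both read and write the same pseudo-value form a
 * dependency chain, so a scheduler may not swap them. */
static bool must_keep_order(const struct insn *a, const struct insn *b)
{
    return a->touches_emit_token && b->touches_emit_token;
}

/* Dead-code elimination may drop an instruction only if its result is
 * unused and it is not pinned as unkillable (here: any emit-like op). */
static bool dce_may_remove(const struct insn *i)
{
    return !i->has_live_result && !i->touches_emit_token;
}
```

An EMIT_VERTEX has no register result that anything consumes, so without the pin it would look dead; without the shared token it would look independent of the ring writes.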


[Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders

2015-03-20 Thread Glenn Kennard
Add SV_GEOMETRY_EMIT special variable type to track the
implicit dependencies between CUT/EMIT_VERTEX/MEM_RING
instructions so GCM/scheduler doesn't reorder them.

Mark emit instructions as unkillable so DCE doesn't eat them.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
The hangs with SB on geometry shaders were all due to the CUT/EMIT
instructions either being DCE:d or emitted out of order from the
memory ring writes, so the hardware stalled forever waiting for
completed primitives.

Tested only on a Turks so far, but should behave the same across
all R600 generations.

This patch disables the if-conversion pass when running GS shaders,
didn't seem worth the effort to fix that pass up for the marginal
returns.

 src/gallium/drivers/r600/r600_isa.h|  8 
 src/gallium/drivers/r600/r600_shader.c |  8 
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 25 +
 src/gallium/drivers/r600/sb/sb_core.cpp|  5 -
 src/gallium/drivers/r600/sb/sb_dump.cpp|  4 +++-
 src/gallium/drivers/r600/sb/sb_ir.h|  6 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp |  2 +-
 src/gallium/drivers/r600/sb/sb_sched.cpp   |  2 +-
 src/gallium/drivers/r600/sb/sb_valtable.cpp|  1 +
 10 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h 
b/src/gallium/drivers/r600/r600_isa.h
index ec3f702..381f06d 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -641,7 +641,7 @@ static const struct cf_op_info cf_op_table[] = {
 
    {"MEM_SCRATCH",               { 0x24, 0x24, 0x50, 0x50 },  CF_MEM  },
    {"MEM_REDUCT",                { 0x25, 0x25,   -1,   -1 },  CF_MEM  },
-   {"MEM_RING",                  { 0x26, 0x26, 0x52, 0x52 },  CF_MEM  },
+   {"MEM_RING",                  { 0x26, 0x26, 0x52, 0x52 },  CF_MEM | CF_EMIT },
 
    {"EXPORT",                    { 0x27, 0x27, 0x53, 0x53 },  CF_EXP  },
    {"EXPORT_DONE",               { 0x28, 0x28, 0x54, 0x54 },  CF_EXP  },
@@ -649,9 +649,9 @@ static const struct cf_op_info cf_op_table[] = {
    {"MEM_EXPORT",                {   -1, 0x3A, 0x55, 0x55 },  CF_MEM  },
    {"MEM_RAT",                   {   -1,   -1, 0x56, 0x56 },  CF_MEM | CF_RAT },
    {"MEM_RAT_NOCACHE",           {   -1,   -1, 0x57, 0x57 },  CF_MEM | CF_RAT },
-   {"MEM_RING1",                 {   -1,   -1, 0x58, 0x58 },  CF_MEM  },
-   {"MEM_RING2",                 {   -1,   -1, 0x59, 0x59 },  CF_MEM  },
-   {"MEM_RING3",                 {   -1,   -1, 0x5A, 0x5A },  CF_MEM  },
+   {"MEM_RING1",                 {   -1,   -1, 0x58, 0x58 },  CF_MEM | CF_EMIT },
+   {"MEM_RING2",                 {   -1,   -1, 0x59, 0x59 },  CF_MEM | CF_EMIT },
+   {"MEM_RING3",                 {   -1,   -1, 0x5A, 0x5A },  CF_MEM | CF_EMIT },
    {"MEM_MEM_COMBINED",          {   -1,   -1, 0x5B, 0x5B },  CF_MEM  },
    {"MEM_RAT_COMBINED_NOCACHE",  {   -1,   -1, 0x5C, 0x5C },  CF_MEM | CF_RAT },
    {"MEM_RAT_COMBINED",          {   -1,   -1,   -1, 0x5D },  CF_MEM | CF_RAT }, /* ??? not in cayman isa doc */
diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 28b290a..ff2c784 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -159,8 +159,6 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
goto error;
}
 
-   /* disable SB for geom shaders - it can't handle the CF_EMIT instructions */
-   use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
    /* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */
    use_sb &= !shader->shader.uses_index_registers;
 
@@ -1141,6 +1139,8 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi
    for (i = 0; i < 3; i++) {
        treg[i] = r600_get_temp(ctx);
    }
+   r600_add_gpr_array(ctx->shader, treg[0], 3, 0x0F);
+
    t2 = r600_get_temp(ctx);
    for (i = 0; i < 3; i++) {
        memset(&alu, 0, sizeof(struct r600_bytecode_alu));
@@ -1935,9 +1935,9 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
        ctx.bc->index_reg[1] = ctx.bc->ar_reg + 3;
    }
 
+   shader->max_arrays = 0;
+   shader->num_arrays = 0;
    if (indirect_gprs) {
-       shader->max_arrays = 0;
-       shader->num_arrays = 0;
 
        if (ctx.info.indirect_files & (1 << TGSI_FILE_INPUT)) {
r600_add_gpr_array(shader

Re: [Mesa-dev] [PATCH 4/9] radeonsi: implement gl_SampleMaskIn

2015-03-02 Thread Glenn Kennard

On Mon, 02 Mar 2015 12:54:18 +0100, Marek Olšák mar...@gmail.com wrote:


From: Marek Olšák marek.ol...@amd.com

---
 docs/GL3.txt | 2 +-
 src/gallium/drivers/radeonsi/si_shader.c | 4 
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 43bbf85..0487cdf 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -106,7 +106,7 @@ GL 4.0, GLSL 4.00:
   - Enhanced textureGather                             DONE (r600, radeonsi)
   - Geometry shader instancing                         DONE (r600)
   - Geometry shader multiple streams                   DONE ()
-  - Enhanced per-sample shading                        DONE (r600)
+  - Enhanced per-sample shading                        DONE (r600, radeonsi)
   - Interpolation functions                            DONE (r600)
   - New overload resolution rules                      DONE
   GL_ARB_gpu_shader_fp64                               DONE (nvc0, softpipe)
diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 085a350..8001ea2 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -680,6 +680,10 @@ static void declare_system_value(
        break;
    }
+   case TGSI_SEMANTIC_SAMPLEMASK:
+       value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_SAMPLE_COVERAGE);
+       break;
+
    default:
        assert(!"unknown system value");
        return;


Patches 4-9 are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 2/2] r600g: add doubles support for CAYMAN

2015-02-19 Thread Glenn Kennard
vs_as_gs_a;
unsigned ps_prim_id_input;
struct r600_shader_array * arrays;
+
+   boolean uses_doubles;
 };
struct r600_shader_key {


With above nits fixed,
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g/sb: treat undefined values like constants

2015-02-17 Thread Glenn Kennard

On Wed, 18 Feb 2015 01:17:32 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

When we schedule an instruction with an undefined value, we
eventually use 0, which is a constant. However, sb wasn't
taking this into account and was creating ops with illegal scalar
swizzles.

This replaces my fix for op3 in t slots.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 4fbdc4f..63e7464 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -266,7 +266,7 @@ bool rp_gpr_tracker::try_reserve(alu_node* n) {
    for (i = 0; i < nsrc; ++i) {
        value *v = n->src[i];
-       if (v->is_readonly()) {
+       if (v->is_readonly() || v->is_undef()) {
            const_count++;
            if (trans && const_count == 3)
                break;
@@ -295,7 +295,7 @@ bool rp_gpr_tracker::try_reserve(alu_node* n) {
    if (need_unreserve && i--) {
        do {
            value *v = n->src[i];
-           if (!v->is_readonly()) {
+           if (!v->is_readonly() && !v->is_undef()) {
                if (i == 1 && opt)
                    continue;
                unreserve(bs_cycle(trans, bs, i), n->bc.src[i].sel,


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
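
The effect of the changed condition can be illustrated with a stripped-down counter. This is a hedged sketch, not the sb implementation: struct operand and count_const_like() are hypothetical names, and the only behaviour taken from the patch is that an undefined source is counted the same way as a read-only one.

```c
#include <stdbool.h>

/* Hypothetical operand descriptor. */
struct operand {
    bool is_readonly; /* already a constant-like source */
    bool is_undef;    /* undefined value, eventually emitted as 0 */
};

/* Count the sources that the reservation check should treat as constants.
 * Before the patch only is_readonly sources were counted; afterwards
 * undefined sources are counted too. */
static int count_const_like(const struct operand *src, int nsrc)
{
    int n = 0;
    for (int i = 0; i < nsrc; ++i) {
        if (src[i].is_readonly || src[i].is_undef)
            ++n;
    }
    return n;
}
```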


[Mesa-dev] [PATCH] r600g/sb: Don't fold integer value into float CND

2015-02-12 Thread Glenn Kennard
Don't try to do float comparisons on signed integer values,
some of them look like NaNs.

Fixes fs-temp-array-mat3-index-col-row-rd.shader_test regression
caused by 0d4272cd8e7c45157140dc8e283707714a8238d5.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_peephole.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_peephole.cpp 
b/src/gallium/drivers/r600/sb/sb_peephole.cpp
index d4b9755..4161d59 100644
--- a/src/gallium/drivers/r600/sb/sb_peephole.cpp
+++ b/src/gallium/drivers/r600/sb/sb_peephole.cpp
@@ -250,7 +250,7 @@ void peephole::optimize_CNDcc_op(alu_node* a) {
return;
 
// TODO we can handle some cases for uint comparison
-   if (dcmp_type == AF_UINT_CMP)
+   if (dcmp_type == AF_UINT_CMP || dcmp_type == AF_INT_CMP)
return;
 
if (dcc == AF_CC_NE) {
-- 
1.9.1
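
As background for the commit message above, here is a small self-contained illustration of why a signed integer can "look like a NaN" to a float comparison. It is a sketch of the general IEEE-754 behaviour, not of the sb peephole pass; bits_as_float() and int_bits_are_nan() are names made up for this example.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit signed integer as an IEEE-754 float,
 * the way a float compare instruction would see an integer register. */
static float bits_as_float(int32_t v)
{
    float f;
    memcpy(&f, &v, sizeof f);
    return f;
}

/* Returns 1 if the integer's bit pattern is a NaN when read as a float. */
static int int_bits_are_nan(int32_t v)
{
    return isnan(bits_as_float(v)) ? 1 : 0;
}
```

For instance, -1 is 0xFFFFFFFF, which has an all-ones exponent and a non-zero mantissa, so it is a NaN as a float. An integer test such as `-1 < 0` is true, but the same bits compared as floats against 0.0f compare false, which is why folding an integer value into a float CND is unsafe.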



Re: [Mesa-dev] [PATCH 1/2] r300g: handle unsupported blend factor gracefully

2015-02-06 Thread Glenn Kennard
On Fri, 06 Feb 2015 20:53:21 +0100, Roland Scheidegger srol...@vmware.com wrote:

> FWIW I'm wondering why you'd actually need them in a d3d9 state tracker,
> as this is a feature first seen with d3d10. Unless you'd want to handle
> d3d10 of course, but in this case there's probably not much hope for any
> of the d3d9 capable hw drivers for lots of reasons...
>
> Roland

Actually it got retrofitted into D3D9 Ex for Vista, see
https://msdn.microsoft.com/en-us/library/windows/desktop/bb172513(v=vs.85).aspx
(D3DPBLENDCAPS_INVSRCCOLOR2 and D3DPBLENDCAPS_SRCCOLOR2).


/Glenn's .02 cents


Re: [Mesa-dev] [PATCH v3] r600g: Implement GL_ARB_draw_indirect for EG/CM

2015-02-06 Thread Glenn Kennard

On Fri, 06 Feb 2015 17:08:46 +0100, Marek Olšák mar...@gmail.com wrote:

> Please bump the size of vgt_state for the SQ_VTX_BASE_VTX_LOC
> register. It's set by r600_init_atom in r600_state.c and
> evergreen_state.c
>
> Please bump R600_MAX_DRAW_CS_DWORDS. It's an upper bound of how many
> dwords draw_vbo can emit.

Thanks, will fix.

> I don't understand what get_vfetch_type is good for. Could you please
> explain it in the code? Also, I don't understand what constant buffer
> fetches have to do with VertexID.

I will add more explanation to get_vfetch_type; in particular I can point
at the appropriate parts of the GPU documentation.


As for the interaction of buffer fetches and VertexID, I'll attempt to
explain:

R_03CFF0_SQ_VTX_BASE_VTX_LOC is basically not delivered to the vertex
shader at all. Instead, the hardware pokes the 64 unique values (one per
wavefront thread, 64 state in the documentation) into a hidden state
register in the fetch units, which the shader cannot read, at least not in
any way that I've been able to find.

Setting FETCH_MODE=SQ_VTX_FETCH_VERTEX_DATA (=0) on a VFETCH instruction
then tells the fetch unit to add the BASE_VTX and start instance offsets
before reading the value; see
r600_asm.c:r600_create_vertex_fetch_shader(), which open-codes 0 as the
fetch mode for vertex fetches.

This creates a problem for GLSL gl_VertexID, since the shader cannot apply
the offset. Let's look at the shader for the
tests/spec/arb_draw_indirect/vertexid.c piglit test case:


#version 140

in vec4 piglit_vertex;
out vec3 c;

const vec3 colors[] = vec3[](
  vec3(1, 0, 0),
  vec3(1, 0, 0),
  vec3(1, 0, 0),
  vec3(1, 0, 0),

...
  vec3(1, 0, 1),
  vec3(1, 0, 1),
  vec3(1, 0, 1),
  vec3(1, 0, 1)
);
void main() {
  c = colors[gl_VertexID];
  gl_Position = piglit_vertex;
}

colors here is a constant array, and the base offset needs to be applied to
look up the correct color value; the GL 4.5 spec is quite clear that it
should be applied to gl_VertexID. Since the hardware offers no way to add
base instance to gl_VertexID, I do the next best thing and enable the
offset on the array fetch operation instead.

The detection logic is quite hacky. Really it needs to check whether the
array expression depends in any way on gl_VertexID, which requires looking
at def-use chains, and those aren't available in r600_asm.c. SB could
probably compute the bit instead, but that sort of violates its "don't
change program meaning" principle, not to mention that behavior would
differ with SB disabled.

All the actual shaders that I've found using gl_VertexID in conjunction
with indirect draws only use one constant array. I figure partial support
at least approximately matches what the binary driver supports, which
doesn't produce the correct value for gl_VertexID either for indirect
draws in various cases; in particular, if the shader tries to compare
gl_VertexID against some other expression you get an incorrect value.

The driver does something totally different for direct draws: it adds the
base offset and start offset manually and feeds that to the hardware, with
BASE_VTX always set to 0, which allows it to work for all cases. That is
not an option for indirect draws if you want any sort of performance out
of them.

So to sum up, I don't see the hardware being fully capable of following
the spec for gl_VertexID in conjunction with indirect drawing in all
cases, at least not without some very slow fallbacks that read back the
draw parameters to the CPU, which is useless. One option would be to just
drop the attempt at supporting gl_VertexID from this patch if it's deemed too
hacky.
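
The workaround described in this message can be modelled with plain integer arithmetic. The sketch below is a simplified model of the explanation, not of the hardware or the driver: shader_visible_id() and fetch_with_base() are hypothetical helpers, and the table stands in for the constant colors array.

```c
/* What the shader sees as gl_VertexID under the workaround: the raw value,
 * because the shader cannot read the hidden base-vertex state. */
static int shader_visible_id(int raw_index)
{
    return raw_index;
}

/* Array fetch with the offset enabled: the fetch unit adds base_vertex
 * to the element index before reading, so the lookup itself is correct. */
static int fetch_with_base(const int *table, int raw_index, int base_vertex)
{
    return table[raw_index + base_vertex];
}
```

With base_vertex = 2 and raw index 1, the fetch reads element 3, which is what `colors[gl_VertexID]` should return per the spec. A comparison such as `gl_VertexID == 3` written in the shader would still see 1, which is the incorrect-value case mentioned above.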



[Mesa-dev] [PATCH v3] r600g: Implement GL_ARB_draw_indirect for EG/CM

2015-02-05 Thread Glenn Kennard
Requires Evergreen/Cayman and radeon kernel module
2.41.0 or newer.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Changes since v2:
* Fix failing arb_draw_indirect-vertexid piglit test cases.
* Ensure start_instance, base_vertex, index_offset are reset when
  switching back to direct draws.
* Juggled some header defines to avoid use of magic numbers.
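
The requirement stated at the top of this patch (Evergreen/Cayman plus radeon kernel module 2.41.0 or newer) amounts to a two-part capability check. A minimal sketch follows, assuming a reduced chip enum where only the relative ordering matters; the enum and supports_draw_indirect() are illustrative and not the driver's definitions.

```c
#include <stdbool.h>

/* Hypothetical, reduced chip ordering; only the relative order matters. */
enum chip_family { CHIP_R600, CHIP_RV770, CHIP_CEDAR, CHIP_CAYMAN };

/* Indirect draws need an Evergreen-or-newer chip and a kernel whose
 * command checker accepts the packets (DRM minor version 41 or later). */
static bool supports_draw_indirect(enum chip_family family, int drm_minor)
{
    return family >= CHIP_CEDAR && drm_minor >= 41;
}
```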

 docs/GL3.txt |   4 +-
 docs/relnotes/10.5.0.html|   1 +
 src/gallium/drivers/r600/evergreend.h|   1 -
 src/gallium/drivers/r600/r600_pipe.c |   4 +-
 src/gallium/drivers/r600/r600_pipe.h |   1 +
 src/gallium/drivers/r600/r600_shader.c   |  14 ++-
 src/gallium/drivers/r600/r600_state_common.c | 128 ++-
 src/gallium/drivers/r600/r600d.h |   8 +-
 8 files changed, 130 insertions(+), 31 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 23f5561..ef4f0ae 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00:
 
   GL_ARB_draw_buffers_blend                            DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_draw_indirect                                 DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_draw_indirect                                 DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                                   DONE (i965, nvc0)
   - 'precise' qualifier                                DONE
   - Dynamically uniform sampler array indices          DONE (r600)
@@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_framebuffer_no_attachments                    not started
   GL_ARB_internalformat_query2                         not started
   GL_ARB_invalidate_subdata                            DONE (all drivers)
-  GL_ARB_multi_draw_indirect                           DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_multi_draw_indirect                           DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_program_interface_query                       not started
   GL_ARB_robust_buffer_access_behavior                 not started
   GL_ARB_shader_image_size                             not started
diff --git a/docs/relnotes/10.5.0.html b/docs/relnotes/10.5.0.html
index 4f921ea..47686c0 100644
--- a/docs/relnotes/10.5.0.html
+++ b/docs/relnotes/10.5.0.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_EXT_packed_float on freedreno</li>
 <li>GL_EXT_texture_shared_exponent on freedreno</li>
 <li>GL_EXT_texture_snorm on freedreno</li>
+<li>GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600</li>
 </ul>
 
 
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 4989996..cd4ff46 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -72,7 +72,6 @@
 #define PKT3_REG_RMW                           0x21
 #define PKT3_COND_EXEC                         0x22
 #define PKT3_PRED_EXEC                         0x23
-#define PKT3_START_3D_CMDBUF                   0x24
 #define PKT3_DRAW_INDEX_2                      0x27
 #define PKT3_CONTEXT_CONTROL                   0x28
 #define PKT3_DRAW_INDEX_IMMD_BE                0x29
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index b6f7859..3127e23 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -313,6 +313,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
        return family >= CHIP_CEDAR ? 1 : 0;
    case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
        return family >= CHIP_CEDAR ? 4 : 0;
+   case PIPE_CAP_DRAW_INDIRECT:
+       /* kernel command checker support is also required */
+       return family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41;
 
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -322,7 +325,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param)
case PIPE_CAP_VERTEX_COLOR_CLAMPED:
case PIPE_CAP_USER_VERTEX_BUFFERS:
case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
-   case PIPE_CAP_DRAW_INDIRECT:
case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
case PIPE_CAP_SAMPLER_VIEW_TARGET:
case PIPE_CAP_VERTEXID_NOBASE:
diff --git a/src/gallium/drivers/r600/r600_pipe.h 
b/src/gallium/drivers/r600/r600_pipe.h
index e110efe..1db43c4 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -145,6 +145,7 @@ struct r600_vgt_state {
uint32_t vgt_multi_prim_ib_reset_en;
uint32_t vgt_multi_prim_ib_reset_indx;
uint32_t vgt_indx_offset;
+   bool last_draw_was_indirect;
 };
 
 struct r600_blend_color {
diff --git a/src/gallium/drivers/r600

Re: [Mesa-dev] [PATCH 1/6] glapi: add GL_EXT_polygon_offset_clamp

2015-02-01 Thread Glenn Kennard
On Sun, 01 Feb 2015 16:18:51 +0100, Ilia Mirkin imir...@alum.mit.edu  
wrote:



Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/mapi/glapi/gen/gl_API.xml   | 11 +++
 src/mesa/main/polygon.c |  6 ++
 src/mesa/main/polygon.h |  5 -
 src/mesa/main/tests/dispatch_sanity.cpp |  3 +++
 4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/mapi/glapi/gen/gl_API.xml  
b/src/mapi/glapi/gen/gl_API.xml

index e3cbab3..17bf62a 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -12858,6 +12858,17 @@
 <xi:include href="INTEL_performance_query.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
 
+<category name="GL_EXT_polygon_offset_clamp" number="460">
+    <enum name="POLYGON_OFFSET_CLAMP_EXT" value="0x8E1B">
+        <size name="Get" mode="get"/>
+    </enum>
+    <function name="PolygonOffsetClampEXT" offset="assign">
+        <param name="factor" type="GLfloat"/>
+        <param name="units"  type="GLfloat"/>
+        <param name="clamp"  type="GLfloat"/>
+    </function>
+</category>
+
 <!-- Unnumbered extensions sorted by name. -->
 <category name="GL_ATI_blend_equation_separate">
diff --git a/src/mesa/main/polygon.c b/src/mesa/main/polygon.c
index cdaa244..e3b9073 100644
--- a/src/mesa/main/polygon.c
+++ b/src/mesa/main/polygon.c
@@ -265,6 +265,12 @@ _mesa_PolygonOffsetEXT( GLfloat factor, GLfloat bias )
    _mesa_PolygonOffset(factor, bias * ctx->DrawBuffer->_DepthMaxF );
 }
+void GLAPIENTRY
+_mesa_PolygonOffsetClampEXT( GLfloat factor, GLfloat units, GLfloat clamp )
+{
+
+}
+
/**/
diff --git a/src/mesa/main/polygon.h b/src/mesa/main/polygon.h
index 530adba..6cf14d3 100644
--- a/src/mesa/main/polygon.h
+++ b/src/mesa/main/polygon.h
@@ -55,12 +55,15 @@ extern void GLAPIENTRY
 _mesa_PolygonOffsetEXT( GLfloat factor, GLfloat bias );
 extern void GLAPIENTRY
+_mesa_PolygonOffsetClampEXT( GLfloat factor, GLfloat units, GLfloat clamp );
+
+extern void GLAPIENTRY
 _mesa_PolygonStipple( const GLubyte *mask );
 extern void GLAPIENTRY
 _mesa_GetPolygonStipple( GLubyte *mask );
-extern void
+extern void
 _mesa_init_polygon( struct gl_context * ctx );
 #endif
diff --git a/src/mesa/main/tests/dispatch_sanity.cpp  
b/src/mesa/main/tests/dispatch_sanity.cpp

index ee4db45..1f1a3a8 100644
--- a/src/mesa/main/tests/dispatch_sanity.cpp
+++ b/src/mesa/main/tests/dispatch_sanity.cpp
@@ -988,6 +988,9 @@ const struct function gl_core_functions_possible[] = {
    { "glTextureStorage3DMultisample", 45, -1 },
    { "glTextureBuffer", 45, -1 },
+   /* GL_EXT_polygon_offset_clamp */
+   { "glPolygonOffsetClampEXT", 11, -1 },
+
    { NULL, 0, -1 }
 };



Patches 1-5 (assuming fix for clamp in 2 noted already by Ilia) are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 01/10] r600g, radeonsi: don't append to streamout buffers that haven't been used yet

2015-02-01 Thread Glenn Kennard

On Sun, 01 Feb 2015 18:36:58 +0100, Marek Olšák mar...@gmail.com wrote:


From: Marek Olšák marek.ol...@amd.com

The FILLED_SIZE counter is uninitialized at the beginning, so we can't  
use it.

Instead, use offset = 0, which is what we always do when not appending.

This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.

Cc: 10.3 10.4 mesa-sta...@lists.freedesktop.org
---
 src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
 src/gallium/drivers/radeon/r600_streamout.c   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h  
b/src/gallium/drivers/radeon/r600_pipe_common.h

index 6224668..46a6bf3 100644
--- a/src/gallium/drivers/radeon/r600_pipe_common.h
+++ b/src/gallium/drivers/radeon/r600_pipe_common.h
@@ -294,6 +294,7 @@ struct r600_so_target {
    /* The buffer where BUFFER_FILLED_SIZE is stored. */
    struct r600_resource    *buf_filled_size;
    unsigned                buf_filled_size_offset;
+   bool                    buf_filled_size_valid;
    unsigned                stride_in_dw;
 };
diff --git a/src/gallium/drivers/radeon/r600_streamout.c  
b/src/gallium/drivers/radeon/r600_streamout.c

index c44f0f2..bc8bf97 100644
--- a/src/gallium/drivers/radeon/r600_streamout.c
+++ b/src/gallium/drivers/radeon/r600_streamout.c
@@ -237,7 +237,7 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
            }
        }
-       if (rctx->streamout.append_bitmask & (1 << i)) {
+       if (rctx->streamout.append_bitmask & (1 << i) && t[i]->buf_filled_size_valid) {
            uint64_t va = t[i]->buf_filled_size->gpu_address +
                          t[i]->buf_filled_size_offset;
@@ -302,6 +302,8 @@ void r600_emit_streamout_end(struct r600_common_context *rctx)
         * buffer bound. This ensures that the primitives-emitted query
         * won't increment. */
        r600_write_context_reg(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 0);
+
+       t[i]->buf_filled_size_valid = true;
    }
    rctx->streamout.begin_emitted = false;


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
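
The rule introduced by the quoted patch (append only when the saved FILLED_SIZE has actually been written, otherwise start at offset 0) can be sketched as a single predicate. This is an illustrative reduction: struct so_target and should_append() are hypothetical names modelled on the fields in the diff, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced view of a streamout target. */
struct so_target {
    bool buf_filled_size_valid; /* set once a streamout end stored a size */
};

/* Append (resume from the saved FILLED_SIZE) only when appending was
 * requested for buffer i and the saved size has been initialized.
 * Otherwise the caller falls back to offset 0. */
static bool should_append(uint32_t append_bitmask, unsigned i,
                          const struct so_target *t)
{
    return (append_bitmask & (1u << i)) != 0 && t->buf_filled_size_valid;
}
```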


Re: [Mesa-dev] [PATCH 1/2] glsl: Add define for ARB_shader_precision

2015-02-01 Thread Glenn Kennard
On Wed, 31 Dec 2014 21:43:51 +0100, Micah Fedke  
micah.fe...@collabora.co.uk wrote:



---
 src/glsl/glcpp/glcpp-parse.y| 3 +++
 src/glsl/glsl_parser_extras.cpp | 1 +
 src/glsl/glsl_parser_extras.h   | 2 ++
 src/mesa/main/extensions.c  | 1 +
 src/mesa/main/mtypes.h  | 1 +
 5 files changed, 8 insertions(+)

diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 9b1a4f4..c9cc68f 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -2473,6 +2473,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t versio
              if (extensions->ARB_derivative_control)
                 add_builtin_define(parser, "GL_ARB_derivative_control", 1);
+
+              if (extensions->ARB_shader_precision)
+                 add_builtin_define(parser, "GL_ARB_shader_precision", 1);
           }
 }
diff --git a/src/glsl/glsl_parser_extras.cpp  
b/src/glsl/glsl_parser_extras.cpp

index 27e2eaf3..8555af6 100644
--- a/src/glsl/glsl_parser_extras.cpp
+++ b/src/glsl/glsl_parser_extras.cpp
@@ -532,6 +532,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
    EXT(ARB_shader_atomic_counters,     true,  false,     ARB_shader_atomic_counters),
    EXT(ARB_shader_bit_encoding,        true,  false,     ARB_shader_bit_encoding),
    EXT(ARB_shader_image_load_store,    true,  false,     ARB_shader_image_load_store),
+   EXT(ARB_shader_precision,           true,  false,     ARB_shader_precision),
    EXT(ARB_shader_stencil_export,      true,  false,     ARB_shader_stencil_export),
    EXT(ARB_shader_texture_lod,         true,  false,     ARB_shader_texture_lod),
    EXT(ARB_shading_language_420pack,   true,  false,     ARB_shading_language_420pack),
diff --git a/src/glsl/glsl_parser_extras.h  
b/src/glsl/glsl_parser_extras.h

index e04f7ce..0ca6053 100644
--- a/src/glsl/glsl_parser_extras.h
+++ b/src/glsl/glsl_parser_extras.h
@@ -424,6 +424,8 @@ struct _mesa_glsl_parse_state {
bool ARB_shader_bit_encoding_warn;
bool ARB_shader_image_load_store_enable;
bool ARB_shader_image_load_store_warn;
+   bool ARB_shader_precision_enable;
+   bool ARB_shader_precision_warn;
bool ARB_shader_stencil_export_enable;
bool ARB_shader_stencil_export_warn;
bool ARB_shader_texture_lod_enable;
diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 0df04c2..95c7a37 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -147,6 +147,7 @@ static const struct extension extension_table[] = {
{ "GL_ARB_shader_bit_encoding",      o(ARB_shader_bit_encoding),      GL, 2010 },
{ "GL_ARB_shader_image_load_store",  o(ARB_shader_image_load_store),  GL, 2011 },
{ "GL_ARB_shader_objects",           o(dummy_true),                   GL, 2002 },
+   { "GL_ARB_shader_precision",     o(ARB_shader_precision),         GL, 2014 },


Isn't this extension from 2010 rather than 2014?

{ "GL_ARB_shader_stencil_export",    o(ARB_shader_stencil_export),    GL, 2009 },
{ "GL_ARB_shader_texture_lod",       o(ARB_shader_texture_lod),       GL, 2009 },
{ "GL_ARB_shading_language_100",     o(dummy_true),                   GLL, 2003 },

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index b95dfb9..4c83379 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -3757,6 +3757,7 @@ struct gl_extensions
GLboolean ARB_shader_atomic_counters;
GLboolean ARB_shader_bit_encoding;
GLboolean ARB_shader_image_load_store;
+   GLboolean ARB_shader_precision;
GLboolean ARB_shader_stencil_export;
GLboolean ARB_shader_texture_lod;
GLboolean ARB_shading_language_packing;

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/4] drirc: add workarounds for Unigine Tropics

2015-01-30 Thread Glenn Kennard
On Fri, 30 Jan 2015 15:19:49 +0100, Martin Peres  
martin.pe...@linux.intel.com wrote:



Signed-off-by: Martin Peres martin.pe...@linux.intel.com
---
 src/mesa/drivers/dri/common/drirc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/mesa/drivers/dri/common/drirc  
b/src/mesa/drivers/dri/common/drirc

index cecd6a9..073814e 100644
--- a/src/mesa/drivers/dri/common/drirc
+++ b/src/mesa/drivers/dri/common/drirc
@@ -10,6 +10,11 @@ Application bugs worked around in this file:
   Enabling all extensions for Unigine fixes most issues, but the GLSL  
version

   is still 1.10.
+* Unigine Tropics 1.3 makes use of the sample keyword which is  
reserved
+  with ARB_GL_gpu_shader5 which got enabled by  
force_glsl_extensions_warn.


There seems to be something weird going on here. As far as I can tell,
Tropics is using a GL legacy context, and for those GL_ARB_gpu_shader5
isn't supposed to be enabled: the extension spec mentions GL 3.2
compatibility/core profile being required.


If I test this on r600, the extension cannot be enabled in a legacy
context, only in a core one. Maybe there is a check missing somewhere in
the intel driver?


+  It also makes use of bitwise manipulation (when adding anistropic  
filtering)

+  which is illegal in GLSL 1.10. Adding #version 130 fixes this.
+
 * Unigine Heaven 3.0 with ARB_texture_multisample uses a ivec4 * vec4
   expression, which is illegal in GLSL 1.10.
   Adding #version 130 fixes this.
@@ -41,6 +46,8 @@ TODO: document the other workarounds.
 <application name="Unigine Tropics" executable="Tropics">
     <option name="force_glsl_extensions_warn" value="true" />
+    <option name="mesa_extension_override" value="-GL_ARB_gpu_shader5" />
+    <option name="force_glsl_version" value="130" />
     <option name="disable_blend_func_extended" value="true" />
 </application>



force_glsl_version addition LGTM.


Re: [Mesa-dev] r600g/sb: fix a bug in constants folding optimisation pass

2015-01-30 Thread Glenn Kennard

On Sat, 31 Jan 2015 01:36:30 +0100, Xavier B. xavi...@gmail.com wrote:


r600g/sb: fix a bug in constants folding optimisation
 pass:

ADD R6.y.1,R5.w.1, ~1|3f80
ADD R6.y.2,|R6.y.1|, -0.0001|b8d1b717

was wrongly being converted to

ADD R6.y.1,R5.w.1, ~1|3f80
ADD R6.y.2,R5.w.1, -1.0001|bf800347

because abs() modifier was ignored.

Signed-off-by: Xavier Bouchoux xavi...@gmail.com



Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com

Thanks Xavier! For future patches, please use git send-email as noted in  
http://www.mesa3d.org/devinfo.html so reviewers can comment inline.
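
To illustrate why the fold described in the commit message is invalid, here is a minimal C sketch. It is not the sb optimizer code; the constants C1 and C2 and all function names are made up for illustration, and the values only approximate the literals in the quoted ADD instructions. It shows that merging the two constants is only equivalent when the intermediate result is read without an abs() modifier.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-ins for the two ADD literals (illustration only). */
static const float C1 = -1.0f;
static const float C2 = -0.0001f;

/* What the shader asks for: the second ADD reads |a + C1|. */
float two_adds_with_abs(float a)
{
    float first = a + C1;
    return fabsf(first) + C2;
}

/* What the buggy fold computed: abs() dropped, constants merged. */
float folded_ignoring_abs(float a)
{
    return a + (C1 + C2);
}

/* For a = 0.5 the intermediate value is negative, so the results differ in sign. */
void self_check(void)
{
    assert(two_adds_with_abs(0.5f) > 0.0f);
    assert(folded_ignoring_abs(0.5f) < 0.0f);
}
```

For inputs where a + C1 is non-negative the two functions agree up to rounding, which is presumably why a bug like this can go unnoticed for a while.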



Re: [Mesa-dev] [PATCH 2/2] r600g: add support for primitive id without geom shader

2015-01-27 Thread Glenn Kennard

output[j].swizzle_z = 4; /* 0 */
output[j].swizzle_w = 5; /* 1 */
break;
+   case TGSI_SEMANTIC_PRIMID:
+   output[j].swizzle_x = 2;
+   output[j].swizzle_y = 4; /* 0 */
+   output[j].swizzle_z = 4; /* 0 */
+   output[j].swizzle_w = 4; /* 0 */
+   break;
}
+
break;
case TGSI_PROCESSOR_FRAGMENT:
if (shader-output[i].name == 
TGSI_SEMANTIC_COLOR) {
diff --git a/src/gallium/drivers/r600/r600_shader.h  
b/src/gallium/drivers/r600/r600_shader.h

index ab67013..b2559e9 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -84,6 +84,8 @@ struct r600_shader {
unsignedmax_arrays;
unsignednum_arrays;
unsignedvs_as_es;
+   unsignedvs_as_gs_a;
+   unsignedps_prim_id_input;
struct r600_shader_array * arrays;
 };
@@ -92,6 +94,8 @@ struct r600_shader_key {
unsigned alpha_to_one:1;
unsigned nr_cbufs:4;
unsigned vs_as_es:1;
+   unsigned vs_as_gs_a:1;
+   unsigned vs_prim_id_out:8;
 };
struct r600_shader_array {
diff --git a/src/gallium/drivers/r600/r600_state_common.c  
b/src/gallium/drivers/r600/r600_state_common.c

index 1030620..b498d00 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -707,6 +707,10 @@ static INLINE struct r600_shader_key  
r600_shader_selector_key(struct pipe_contex

key.nr_cbufs = 2;
} else if (sel->type == PIPE_SHADER_VERTEX) {
key.vs_as_es = (rctx->gs_shader != NULL);
+		if (rctx->ps_shader->current->shader.gs_prim_id_input && !rctx->gs_shader) {
+   key.vs_as_gs_a = true;
+			key.vs_prim_id_out = rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
+   }
}
return key;
 }
@@ -1265,6 +1269,7 @@ static bool r600_update_derived_state(struct r600_context *rctx)
r600_update_ps_state(ctx, rctx->ps_shader->current);
}
+   rctx->shader_stages.atom.dirty = true;
 		update_shader_atom(ctx, &rctx->pixel_shader, rctx->ps_shader->current);

}



With r600/r700 bits added and debug print removed:
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH 1/2] r600g: move selecting the pixel shader earlier.

2015-01-27 Thread Glenn Kennard

On Tue, 27 Jan 2015 04:46:32 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_state_common.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c  
b/src/gallium/drivers/r600/r600_state_common.c

index 09d8952..1030620 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1170,6 +1170,10 @@ static bool r600_update_derived_state(struct  
r600_context *rctx)

}
}
+   r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
+   if (unlikely(!rctx->ps_shader->current))
+   return false;
+
update_gs_block_state(rctx, rctx->gs_shader != NULL);
if (rctx->gs_shader) {
@@ -1232,9 +1236,6 @@ static bool r600_update_derived_state(struct r600_context *rctx)
}
}
-   r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
-   if (unlikely(!rctx->ps_shader->current))
-   return false;
 	if (unlikely(ps_dirty || rctx->pixel_shader.shader != rctx->ps_shader->current ||
 		rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable ||



Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] Improving precision of mod(x,y)

2015-01-15 Thread Glenn Kennard
On Thu, 15 Jan 2015 15:32:59 +0100, Roland Scheidegger  
srol...@vmware.com wrote:



Am 15.01.2015 um 10:05 schrieb Iago Toral:

Hi,

We have 16 deqp tests that fail, at least on i965, because of
insufficient precision of the mod GLSL function.

Mesa lowers mod(x,y) to y * fract(x,y) so there can be some precision
lost due to fract operation. Since the result is multiplied by y the
total precision lost usually grows together with the value of y.

Did you mean fract(x/y) here?



Below are some examples to give an idea of the magnitude of this error.
The values on the right represent the precision error for each case:

mod(-1.951171875, 1.9980468750) =  0.000447
mod(121.57, 13.29)  =  0.023842
mod(3769.12, 321.99)=  0.762939
mod(3769.12, 1321.99)   =  0.0001220703
mod(-987654.125, 123456.984375) =  0.0160663128
mod( 987654.125, 123456.984375) =  0.031250

As you see, for large enough values, the precision error becomes
significant.

This can be fixed by lowering mod(x,y) to x - y * floor(x/y) instead,
which is the suggested implementation in the GLSL docs. I have a local
patch in my tree that does this and it does indeed fix the problem. The
downside is that this implementation adds an extra ADD instruction to
the generated code (besides replacing fract with floor, which I guess
has a similar cost).

Since this is a case where there is some trade-off to the fix, I wonder
if we are interested in doing this or not. Is the precision fix worth
the additional ADD?



Well I can tell you that llvmpipe implements frc(x) as x - floor(x), so
this change looks good to me :-).
On a more serious note though, it looks to me like the cost of this
expression would be mostly dominated by the division, hence one more
add shouldn't be that bad. And if the test is legit, I don't think
there's much choice (unless you could make this optional for some old
glsl versions if they didn't require that much precision, but even then
it's probably not worth bothering imho).



FWIW, I just typed out the following little piglit test and tried it on  
R600:


[require]
GLSL >= 3.30

[vertex shader passthrough]
[fragment shader]
uniform float a;
uniform float b;
out vec4 colour;

void
main(void)
{
//  colour = vec4(b * fract(a / b)); // current lowering of mod(x,y)
colour = vec4(a - b * floor(a/b)); // proposed lowering
}

[test]
clear color 0.5 0.5 0.5 0.5
clear

uniform float a 4.2
uniform float b 3.5
draw rect -1 -1 2 2
probe rgba 1 1 0.7 0.7 0.7 0.7


Resulting R600 assembly:

// y * fract(x/y)
// KC0[0].x is x and KC0[1].x is y
1  t: RECIP_IEEE T0.x,  KC0[1].x
2  x: MUL        T0.x,  KC0[0].x, T0.x
3  x: FRACT      T0.x,  T0.x
4  x: MUL        R0.x,  KC0[1].x, T0.x
EXPORT_DONE PIXEL 0 R0.  EOP

// x - y * floor(x/y)
1  t: RECIP_IEEE T0.x,  KC0[1].x
2  x: MUL        T0.x,  KC0[0].x, T0.x
3  x: FLOOR      T0.x,  T0.x
4  x: MULADD     R0.x,  KC0[1].x, -T0.x, KC0[0].x
EXPORT_DONE PIXEL 0 R0.  EOP

Same number of cycles/length of dependency chain/ALU pipe usage for both  
methods.



I'd expect most architectures that can do source negate with multiply-add  
in a single operation should get similar results with no extra cost for  
the subtraction.



/Glenn


Re: [Mesa-dev] [PATCH] gallium: add double opcodes and TGSI execution (v2.1)

2014-12-23 Thread Glenn Kennard

On Tue, 23 Dec 2014 22:50:30 +0100, Dave Airlie airl...@gmail.com wrote:


This patch adds support for a set of double opcodes
to TGSI. It is an update of work done originally
by Michal Krol on the gallium-double-opcodes branch.

The opcodes have a hint where they came from in the
header file.

v2: add unsigned/int - double
v2.1:  update docs.
This is based on code by Michael Krol mic...@vmware.com

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/auxiliary/tgsi/tgsi_exec.c | 743  
-

 src/gallium/auxiliary/tgsi/tgsi_info.c |  24 +-
 src/gallium/docs/source/tgsi.rst   |  76 ++-
 src/gallium/include/pipe/p_shader_tokens.h |  26 +-
 4 files changed, 850 insertions(+), 19 deletions(-)

diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c  
b/src/gallium/auxiliary/tgsi/tgsi_exec.c

index 834568b..6af4730 100644
--- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
+++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
@@ -72,6 +72,16 @@
 #define TILE_BOTTOM_LEFT  2
 #define TILE_BOTTOM_RIGHT 3
+union tgsi_double_channel {
+   double d[TGSI_QUAD_SIZE];
+   unsigned u[TGSI_QUAD_SIZE][2];
+};
+
+struct tgsi_double_vector {
+   union tgsi_double_channel xy;
+   union tgsi_double_channel zw;
+};
+
 static void
 micro_abs(union tgsi_exec_channel *dst,
   const union tgsi_exec_channel *src)
@@ -147,6 +157,55 @@ micro_cos(union tgsi_exec_channel *dst,
 }
static void
+micro_d2f(union tgsi_exec_channel *dst,
+  const union tgsi_double_channel *src)
+{
+   dst->f[0] = (float)src->d[0];
+   dst->f[1] = (float)src->d[1];
+   dst->f[2] = (float)src->d[2];
+   dst->f[3] = (float)src->d[3];
+}
+
+static void
+micro_d2i(union tgsi_exec_channel *dst,
+  const union tgsi_double_channel *src)
+{
+   dst->i[0] = (int)src->d[0];
+   dst->i[1] = (int)src->d[1];
+   dst->i[2] = (int)src->d[2];
+   dst->i[3] = (int)src->d[3];
+}
+
+static void
+micro_d2u(union tgsi_exec_channel *dst,
+  const union tgsi_double_channel *src)
+{
+   dst->u[0] = (unsigned)src->d[0];
+   dst->u[1] = (unsigned)src->d[1];
+   dst->u[2] = (unsigned)src->d[2];
+   dst->u[3] = (unsigned)src->d[3];
+}
+
+static void
+micro_dabs(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = src->d[0] >= 0.0 ? src->d[0] : -src->d[0];
+   dst->d[1] = src->d[1] >= 0.0 ? src->d[1] : -src->d[1];
+   dst->d[2] = src->d[2] >= 0.0 ? src->d[2] : -src->d[2];
+   dst->d[3] = src->d[3] >= 0.0 ? src->d[3] : -src->d[3];
+}
+
+static void
+micro_dadd(union tgsi_double_channel *dst,
+  const union tgsi_double_channel *src)
+{
+   dst->d[0] = src[0].d[0] + src[1].d[0];
+   dst->d[1] = src[0].d[1] + src[1].d[1];
+   dst->d[2] = src[0].d[2] + src[1].d[2];
+   dst->d[3] = src[0].d[3] + src[1].d[3];
+}
+
+static void
 micro_ddx(union tgsi_exec_channel *dst,
   const union tgsi_exec_channel *src)
 {
@@ -167,6 +226,159 @@ micro_ddy(union tgsi_exec_channel *dst,
 }
static void
+micro_ddiv(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = src[0].d[0] / src[1].d[0];
+   dst->d[1] = src[0].d[1] / src[1].d[1];
+   dst->d[2] = src[0].d[2] / src[1].d[2];
+   dst->d[3] = src[0].d[3] / src[1].d[3];
+}
+
+static void
+micro_dmul(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = src[0].d[0] * src[1].d[0];
+   dst->d[1] = src[0].d[1] * src[1].d[1];
+   dst->d[2] = src[0].d[2] * src[1].d[2];
+   dst->d[3] = src[0].d[3] * src[1].d[3];
+}
+
+static void
+micro_dmax(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = src[0].d[0] > src[1].d[0] ? src[0].d[0] : src[1].d[0];
+   dst->d[1] = src[0].d[1] > src[1].d[1] ? src[0].d[1] : src[1].d[1];
+   dst->d[2] = src[0].d[2] > src[1].d[2] ? src[0].d[2] : src[1].d[2];
+   dst->d[3] = src[0].d[3] > src[1].d[3] ? src[0].d[3] : src[1].d[3];
+}
+
+static void
+micro_dmin(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = src[0].d[0] < src[1].d[0] ? src[0].d[0] : src[1].d[0];
+   dst->d[1] = src[0].d[1] < src[1].d[1] ? src[0].d[1] : src[1].d[1];
+   dst->d[2] = src[0].d[2] < src[1].d[2] ? src[0].d[2] : src[1].d[2];
+   dst->d[3] = src[0].d[3] < src[1].d[3] ? src[0].d[3] : src[1].d[3];
+}
+
+static void
+micro_dneg(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->d[0] = -src->d[0];
+   dst->d[1] = -src->d[1];
+   dst->d[2] = -src->d[2];
+   dst->d[3] = -src->d[3];
+}
+
+static void
+micro_dslt(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].d[0] < src[1].d[0] ? ~0U : 0U;
+   dst->u[1][0] = src[0].d[1] < src[1].d[1] ? ~0U : 0U;
+   dst->u[2][0] = src[0].d[2] < src[1].d[2] ? ~0U : 0U;
+   dst->u[3][0] = src[0].d[3] < src[1].d[3] ? ~0U : 0U;
+}
+
+static void
+micro_dsne(union tgsi_double_channel *dst,
+   const union tgsi_double_channel *src)
+{
+   dst->u[0][0] = src[0].d[0] != src[1].d[0] ? ~0U : 0U;
+   dst->u[1][0] = 

Re: [Mesa-dev] [PATCH 004/133] nir: add the core datastructures

2014-12-19 Thread Glenn Kennard
On Tue, 16 Dec 2014 07:04:14 +0100, Jason Ekstrand ja...@jlekstrand.net  
wrote:



From: Connor Abbott connor.abb...@intel.com

This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.

v2: Jason Ekstrand jason.ekstr...@intel.com:
   Include ralloc and hash_table from the util directory
---
 src/glsl/Makefile.sources |2 +
 src/glsl/nir/nir.h| 1150  
+

 src/glsl/nir/nir_intrinsics.c |   49 ++
 src/glsl/nir/nir_intrinsics.h |  158 ++
 src/glsl/nir/nir_opcodes.c|   46 ++
 src/glsl/nir/nir_opcodes.h|  346 +
 6 files changed, 1751 insertions(+)
 create mode 100644 src/glsl/nir/nir.h
 create mode 100644 src/glsl/nir/nir_intrinsics.c
 create mode 100644 src/glsl/nir/nir_intrinsics.h
 create mode 100644 src/glsl/nir/nir_opcodes.c
 create mode 100644 src/glsl/nir/nir_opcodes.h

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index c3a90f7..e8eedd1 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -14,6 +14,8 @@ LIBGLCPP_GENERATED_FILES = \
$(GLSL_BUILDDIR)/glcpp/glcpp-parse.c
NIR_FILES = \
+$(GLSL_SRCDIR)/nir/nir_intrinsics.c \
+$(GLSL_SRCDIR)/nir/nir_opcodes.c \
 $(GLSL_SRCDIR)/nir/nir_types.cpp
# libglsl
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
new file mode 100644
index 000..ef486da
--- /dev/null
+++ b/src/glsl/nir/nir.h
@@ -0,0 +1,1150 @@
+/*
+ * Copyright © 2014 Connor Abbott
+ *
+ * Permission is hereby granted, free of charge, to any person  
obtaining a
+ * copy of this software and associated documentation files (the  
Software),
+ * to deal in the Software without restriction, including without  
limitation
+ * the rights to use, copy, modify, merge, publish, distribute,  
sublicense,

+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the  
next
+ * paragraph) shall be included in all copies or substantial portions  
of the

+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,  
EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF  
MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT  
SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR  
OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,  
ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER  
DEALINGS

+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Connor Abbott (cwabbo...@gmail.com)
+ *
+ */
+
+#pragma once
+
+#include "util/hash_table.h"
+#include "main/set.h"
+#include "../list.h"
+#include "GL/gl.h" /* GLenum */
+#include "util/ralloc.h"
+#include "nir_types.h"
+#include <stdio.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct nir_function_overload;
+struct nir_function;
+
+
+/**
+ * Description of built-in state associated with a uniform
+ *
+ * \sa nir_variable::state_slots
+ */
+typedef struct {
+   int tokens[5];
+   int swizzle;
+} nir_state_slot;
+
+typedef enum {
+   nir_var_shader_in,
+   nir_var_shader_out,
+   nir_var_global,
+   nir_var_local,
+   nir_var_uniform,
+   nir_var_system_value
+} nir_variable_mode;
+
+/**
+ * Data stored in an nir_constant
+ */
+union nir_constant_data {
+  unsigned u[16];
+  int i[16];
+  float f[16];
+  bool b[16];
+};
+
+typedef struct nir_constant {
+   /**
+* Value of the constant.
+*
+* The field used to back the values supplied by the constant is  
determined
+* by the type associated with the \c ir_instruction.  Constants may  
be

+* scalars, vectors, or matrices.
+*/
+   union nir_constant_data value;
+
+   /* Array elements / Structure Fields */
+   struct nir_constant **elements;
+} nir_constant;
+
+/**
+ * \brief Layout qualifiers for gl_FragDepth.
+ *
+ * The AMD/ARB_conservative_depth extensions allow gl_FragDepth to be  
redeclared

+ * with a layout qualifier.
+ */
+typedef enum {
+nir_depth_layout_none, /** No depth layout is specified. */
+nir_depth_layout_any,
+nir_depth_layout_greater,
+nir_depth_layout_less,
+nir_depth_layout_unchanged
+} nir_depth_layout;
+
+/**
+ * Either a uniform, global variable, shader input, or shader output.  
Based on

+ * ir_variable - it should be easy to translate between the two.
+ */
+
+typedef struct {
+   struct exec_node node;
+
+   /**
+* Declared type of the variable
+*/
+   const struct glsl_type *type;
+
+   /**
+* Declared name of the variable
+*/
+   char *name;
+
+   /**
+* For variables which satisfy the is_interface_instance()  
predicate, this

+* points to an array of integers such that if the ith member of the
+* interface block is an array, max_ifc_array_access[i] is the  
maximum
+* array element of that member 

Re: [Mesa-dev] [PATCH 146/133] nir: Use static inlines instead of macros for list getters

2014-12-19 Thread Glenn Kennard
 exec_node_is_tail_sentinel(node-node.next);
+}
NIR_DEFINE_CAST(nir_cf_node_as_block, nir_cf_node, nir_block, cf_node)
 NIR_DEFINE_CAST(nir_cf_node_as_if, nir_cf_node, nir_if, cf_node)


Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com


[Mesa-dev] [PATCH v2] r600g: Implement ARB_draw_indirect for EG/CM

2014-12-12 Thread Glenn Kennard
Requires Evergreen/Cayman and updated radeon kernel module

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Changes since V1:
* Fixed 8 bit index case, only triggerable using GLES 3.1 which isn't supported yet
* Don't read info struct values that have no meaning for indirect case
* Don't update start_instance/instance_count for indirect cases
* Use bool expression directly in get_param

Benjamin, the #defines are essentially used, but due to a header conflict
it's not possible to include them in this file. I would have broken the
indirect cases out into evergreen_state.c, but this is a performance-sensitive
section of code and inlining is critical, so I did the next best thing and
typed out the define names as comments.

Thanks Marek/Benjamin for V1 review

 docs/GL3.txt |   4 +-
 docs/relnotes/10.5.0.html|   1 +
 src/gallium/drivers/r600/evergreend.h|   6 +-
 src/gallium/drivers/r600/r600_pipe.c |   4 +-
 src/gallium/drivers/r600/r600_state_common.c | 116 ++-
 5 files changed, 105 insertions(+), 26 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 648f5ac..435054a 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, 
radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00:
 
   GL_ARB_draw_buffers_blendDONE (i965, nv50, nvc0, 
r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_draw_indirect DONE (i965, nvc0, 
radeonsi, llvmpipe, softpipe)
+  GL_ARB_draw_indirect DONE (i965, nvc0, r600, 
radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5   DONE (i965, nvc0)
   - 'precise' qualifierDONE
   - Dynamically uniform sampler array indices  DONE (r600)
@@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_framebuffer_no_attachmentsnot started
   GL_ARB_internalformat_query2 not started
   GL_ARB_invalidate_subdataDONE (all drivers)
-  GL_ARB_multi_draw_indirect   DONE (i965, nvc0, 
radeonsi, llvmpipe, softpipe)
+  GL_ARB_multi_draw_indirect   DONE (i965, nvc0, r600, 
radeonsi, llvmpipe, softpipe)
   GL_ARB_program_interface_query   not started
   GL_ARB_robust_buffer_access_behavior not started
   GL_ARB_shader_image_size not started
diff --git a/docs/relnotes/10.5.0.html b/docs/relnotes/10.5.0.html
index 2987d53..72bb791 100644
--- a/docs/relnotes/10.5.0.html
+++ b/docs/relnotes/10.5.0.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with 
certain drivers.
 <li>GL_EXT_packed_float on freedreno</li>
 <li>GL_EXT_texture_shared_exponent on freedreno</li>
 <li>GL_EXT_texture_snorm on freedreno</li>
+<li>GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600</li>
 </ul>
 
 
diff --git a/src/gallium/drivers/r600/evergreend.h 
b/src/gallium/drivers/r600/evergreend.h
index 4989996..0725f0d 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -64,6 +64,8 @@
 #define R600_TEXEL_PITCH_ALIGNMENT_MASK0x7
 
 #define PKT3_NOP   0x10
+#define PKT3_SET_BASE  0x11
+#define PKT3_INDEX_BUFFER_SIZE 0x13
 #define PKT3_DEALLOC_STATE 0x14
 #define PKT3_DISPATCH_DIRECT   0x15
 #define PKT3_DISPATCH_INDIRECT 0x16
@@ -72,7 +74,9 @@
 #define PKT3_REG_RMW   0x21
 #define PKT3_COND_EXEC 0x22
 #define PKT3_PRED_EXEC 0x23
-#define PKT3_START_3D_CMDBUF   0x24
+#define PKT3_DRAW_INDIRECT 0x24
+#define PKT3_DRAW_INDEX_INDIRECT   0x25
+#define PKT3_INDEX_BASE0x26
 #define PKT3_DRAW_INDEX_2  0x27
 #define PKT3_CONTEXT_CONTROL   0x28
 #define PKT3_DRAW_INDEX_IMMD_BE0x29
diff --git a/src/gallium/drivers/r600/r600_pipe.c 
b/src/gallium/drivers/r600/r600_pipe.c
index 0b571e4..0d8bac2 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -313,6 +313,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
return family >= CHIP_CEDAR ? 1 : 0;
case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
return family >= CHIP_CEDAR ? 4 : 0;
+   case PIPE_CAP_DRAW_INDIRECT:
+   /* kernel command checker support is also required */
+   return family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41;
 
/* Unsupported features. */
case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -322,7 +325,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum 
pipe_cap param

Re: [Mesa-dev] [PATCH] r600g/sb: implement r600 gpr index workaround. (v3)

2014-12-09 Thread Glenn Kennard
 *pn = static_cast<alu_node*>(*pI);
+   if (pn->bc.dst_gpr == src.sel) {
+   add_nop = true;
+   break;
+   }
+   }
+   }
} else
src.rel = 0;
@@ -393,11 +426,23 @@ void bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a) {
assert(!"unknown value kind");
break;
}
+   if (prev && !add_nop) {
+			for (node_iterator pI = prev->begin(), pE = prev->end(); pI != pE; ++pI) {
+   alu_node *pn = static_cast<alu_node*>(*pI);
+   if (pn->bc.dst_rel) {
+   if (pn->bc.dst_gpr == src.sel) {
+   add_nop = true;
+   break;
+   }
+   }
+   }
+   }
}
while (si < 3) {
a->bc.src[si++].sel = 0;
}
+   return add_nop;
 }
void bc_finalizer::copy_fetch_src(fetch_node &dst, fetch_node &src, unsigned arg_start)
diff --git a/src/gallium/drivers/r600/sb/sb_context.cpp  
b/src/gallium/drivers/r600/sb/sb_context.cpp

index 8e11428..5dba85b 100644
--- a/src/gallium/drivers/r600/sb/sb_context.cpp
+++ b/src/gallium/drivers/r600/sb/sb_context.cpp
@@ -61,6 +61,8 @@ int sb_context::init(r600_isa *isa, sb_hw_chip chip, sb_hw_class cclass) {
uses_mova_gpr = is_r600() && chip != HW_CHIP_RV670;
+	r6xx_gpr_index_workaround = is_r600() && chip != HW_CHIP_RV670 && chip != HW_CHIP_RS780 && chip != HW_CHIP_RS880;

+
switch (chip) {
case HW_CHIP_RV610:
case HW_CHIP_RS780:
diff --git a/src/gallium/drivers/r600/sb/sb_pass.h  
b/src/gallium/drivers/r600/sb/sb_pass.h

index 812d14a..0346df1 100644
--- a/src/gallium/drivers/r600/sb/sb_pass.h
+++ b/src/gallium/drivers/r600/sb/sb_pass.h
@@ -695,8 +695,9 @@ public:
void run_on(container_node *c);
-   void finalize_alu_group(alu_group_node *g);
-   void finalize_alu_src(alu_group_node *g, alu_node *a);
+   void insert_rv6xx_load_ar_workaround(alu_group_node *b4);
+   void finalize_alu_group(alu_group_node *g, node *prev_node);
+	bool finalize_alu_src(alu_group_node *g, alu_node *a, alu_group_node  
*prev_node);

void emit_set_grad(fetch_node* f);
void finalize_fetch(fetch_node *f);


Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com
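
As a reading aid for the hazard check in the hunks above, here is a small C sketch. It is an interpretation of the visible parts of the patch only, not the sb code itself: the structs, field names and the function needs_nop are hypothetical. The assumed rule is that a NOP is wanted when the previous ALU group wrote the GPR that the current source reads, and either the read or the write uses relative (indexed) addressing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of an ALU destination in the previous group. */
struct alu_write {
    unsigned dst_gpr;  /* GPR written */
    bool dst_rel;      /* written with relative (indexed) addressing */
};

/* Hypothetical, simplified view of a source operand in the current group. */
struct alu_read {
    unsigned sel;      /* GPR read */
    bool rel;          /* read with relative (indexed) addressing */
};

/* Returns true if a NOP should separate the previous group from this read. */
bool needs_nop(const struct alu_write *prev, size_t n_prev, struct alu_read src)
{
    for (size_t i = 0; i < n_prev; i++) {
        if (prev[i].dst_gpr != src.sel)
            continue;
        /* Same GPR: hazard if either side is indexed. */
        if (src.rel || prev[i].dst_rel)
            return true;
    }
    return false;
}

/* Example previous group: a plain write to R3 and an indexed write to R5. */
const struct alu_write sample_prev_group[2] = {
    { 3, false },
    { 5, true },
};

void self_check(void)
{
    struct alu_read indexed_r3 = { 3, true };
    assert(needs_nop(sample_prev_group, 2, indexed_r3));
}
```

A plain read of R3 after the plain write to R3 does not trigger the check in this sketch, which matches the second loop in the patch only firing when dst_rel is set.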


Re: [Mesa-dev] [PATCH] r600g: only init GS_VERT_ITEMSIZE on r600

2014-12-09 Thread Glenn Kennard

On Wed, 10 Dec 2014 04:55:21 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

On evergreen there are 4 regs, on r600/700 there is only one.

Don't initialise regs and trash someone elses state.

Not sure this fixes anything, but hey one less stupid.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_state.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state.c  
b/src/gallium/drivers/r600/r600_state.c

index 61f5c5a..9a4b972 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2659,11 +2659,8 @@ void r600_update_gs_state(struct pipe_context  
*ctx, struct r600_pipe_shader *sha

r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
   r600_conv_prim_to_gs_out(rshader->gs_output_prim));
-   r600_store_context_reg_seq(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE, 4);
-   r600_store_value(cb, cp_shader->ring_item_size >> 2);
-   r600_store_value(cb, 0);
-   r600_store_value(cb, 0);
-   r600_store_value(cb, 0);
+   r600_store_context_reg(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE,
+  cp_shader->ring_item_size >> 2);
r600_store_context_reg(cb, R_0288A8_SQ_ESGS_RING_ITEMSIZE,
   (rshader->ring_item_size) >> 2);


Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g: fix regression since UCMP change

2014-12-08 Thread Glenn Kennard

On Tue, 09 Dec 2014 02:31:01 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com
Since d8da6deceadf5e48201d848b7061dad17a5b7cac where the
state tracker started using UCMP on cayman a number of tests
regressed.
This seems to be because r600g is doing CNDGE_INT for UCMP, which tests >= 0;
we should be doing CNDE_INT with reversed arguments.
Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/drivers/r600/r600_shader.c  
b/src/gallium/drivers/r600/r600_shader.c

index 0b988df..28137e1 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -6082,7 +6082,7 @@ static int tgsi_ucmp(struct r600_shader_ctx *ctx)
continue;
memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-   alu.op = ALU_OP3_CNDGE_INT;
+   alu.op = ALU_OP3_CNDE_INT;
r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
r600_bytecode_src(&alu.src[2], &ctx->src[1], i);


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
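
The difference between the two opcodes can be modelled in a few lines of C. This is a sketch under assumed semantics, not driver code: TGSI UCMP is taken as dst = (src0 != 0) ? src1 : src2, CNDE_INT as dst = (src0 == 0) ? src1 : src2, and CNDGE_INT as dst = (src0 >= 0) ? src1 : src2 with a signed compare. All function names are made up for illustration. Both lowerings pass the TGSI sources in the swapped order used by the patch (src0, src2, src1).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed TGSI UCMP semantics: select src1 when src0 is non-zero. */
int32_t tgsi_ucmp_ref(int32_t s0, int32_t s1, int32_t s2)
{
    return s0 != 0 ? s1 : s2;
}

/* Assumed CNDE_INT semantics: select the first value when the condition is zero. */
int32_t alu_cnde_int(int32_t cond, int32_t a, int32_t b)
{
    return cond == 0 ? a : b;
}

/* Assumed CNDGE_INT semantics: signed compare against zero. */
int32_t alu_cndge_int(int32_t cond, int32_t a, int32_t b)
{
    return cond >= 0 ? a : b;
}

/* Lowering after the patch: CNDE_INT with src1 and src2 swapped. */
int32_t ucmp_fixed(int32_t s0, int32_t s1, int32_t s2)
{
    return alu_cnde_int(s0, s2, s1);
}

/* Lowering before the patch: CNDGE_INT with the same operand order. */
int32_t ucmp_old(int32_t s0, int32_t s1, int32_t s2)
{
    return alu_cndge_int(s0, s2, s1);
}

/* The old lowering only goes wrong for positive non-zero conditions. */
void self_check(void)
{
    assert(ucmp_fixed(5, 10, 20) == tgsi_ucmp_ref(5, 10, 20));
    assert(ucmp_old(5, 10, 20) != tgsi_ucmp_ref(5, 10, 20));
}
```

Under these assumptions the old lowering still gives the right answer for conditions of 0 and ~0 (all bits set, i.e. negative), and differs only when the condition is a positive non-zero value.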


Re: [Mesa-dev] [PATCH] r600g/sb: fix issues cause by GLSL switching to loops for switch

2014-11-30 Thread Glenn Kennard

On Fri, 28 Nov 2014 04:36:42 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

Since 73dd50acf6d244979c2a657906aa56d3ac60d550
glsl: implement switch flow control using a loop

The SB backend was falling over in an assert or crashing.

I tracked this down to the loops having no repeats but still requiring
a working break. The initial code just called the loop handler for
all non-if statements, but this caused a regression in
tests/shaders/dead-code-break-interaction.shader_test.
So I had to add further code to detect whether all the departure
nodes are empty and avoid generating an empty loop in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089
Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 51 ++
 1 file changed, 36 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index f0849ca..d91ffa5 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -46,15 +46,22 @@ int bc_finalizer::run() {
 	for (regions_vec::reverse_iterator I = rv.rbegin(), E = rv.rend(); I != E;
 			++I) {
 		region_node *r = *I;
-
+		bool is_if = false;
 		assert(r);
-		bool loop = r->is_loop();
+		assert(r->first);
+		if (r->first->is_container()) {
+			container_node *repdep1 = static_cast<container_node*>(r->first);
+			assert(repdep1->is_depart() || repdep1->is_repeat());
+			if_node *n_if = static_cast<if_node*>(repdep1->first);
+			if (n_if && n_if->is_if())
+				is_if = true;
+		}
-		if (loop)
-			finalize_loop(r);
-		else
+		if (is_if)
 			finalize_if(r);
+		else
+			finalize_loop(r);
 		r->expand();
 	}
@@ -112,16 +119,31 @@ void bc_finalizer::finalize_loop(region_node* r) {
 	cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
 	cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);
+	bool has_instr = false;
+
+	if (!r->is_loop()) {
+		for (depart_vec::iterator I = r->departs.begin(), E = r->departs.end();
+		     I != E; ++I) {
+			depart_node *dep = *I;
+			if (!dep->empty())
+				has_instr = true;


Could break out of the loop here, once has_instr is set.


+		}
+	} else
+		has_instr = true;
-	loop_start->jump_after(loop_end);
-	loop_end->jump_after(loop_start);
+	if (has_instr) {
+		loop_start->jump_after(loop_end);
+		loop_end->jump_after(loop_start);
+	}
 	for (depart_vec::iterator I = r->departs.begin(), E = r->departs.end();
 			I != E; ++I) {
 		depart_node *dep = *I;
-		cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK);
-		loop_break->jump(loop_end);
-		dep->push_back(loop_break);
+		if (has_instr) {
+			cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK);
+			loop_break->jump(loop_end);
+			dep->push_back(loop_break);
+		}
 		dep->expand();
 	}
@@ -137,8 +159,10 @@ void bc_finalizer::finalize_loop(region_node* r) {
 		rep->expand();
 	}
-	r->push_front(loop_start);
-	r->push_back(loop_end);
+	if (has_instr) {
+		r->push_front(loop_start);
+		r->push_back(loop_end);
+	}
 }
 void bc_finalizer::finalize_if(region_node* r) {
@@ -168,9 +192,6 @@ void bc_finalizer::finalize_if(region_node* r) {
 	if (n_if) {
-
-		assert(n_if->is_if());


shouldn't need to remove this assertion


-
 		container_node *repdep2 = static_cast<container_node*>(n_if->first);
 		assert(repdep2->is_depart() || repdep2->is_repeat());



I think I've managed to convince myself that the above logic is correct, so
Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com
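
For reference, the "could break" remark on the departure scan can be
sketched in plain C as below. The struct and function names are
hypothetical stand-ins for the SB classes, used only to show the early
exit; this is not the backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for depart_node: only records whether it is empty. */
struct depart_stub {
	bool empty;
};

/* Returns true as soon as one departure node holds instructions. */
static bool any_depart_has_instr(const struct depart_stub *departs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!departs[i].empty)
			return true; /* early exit: no need to scan the rest */
	}
	return false;
}
```

The result is the same as scanning every node; the early exit only saves
iterations once a non-empty departure has been found.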


Re: [Mesa-dev] [PATCH] r600g: merge the TXQ and BUFFER constant buffers

2014-11-26 Thread Glenn Kennard
,  
R600_TXQ_CONST_BUFFER, cb);

-	pipe_resource_reference(&cb.buffer, NULL);
-}
-
 /* set sample xy locations as array of fragment shader constants */
 void r600_set_sample_locations_constant_buffer(struct r600_context *rctx)
 {
@@ -1175,7 +1151,7 @@ static bool r600_update_derived_state(struct r600_context *rctx)
 	struct pipe_context * ctx = (struct pipe_context*)rctx;
 	bool ps_dirty = false, vs_dirty = false, gs_dirty = false;
 	bool blend_disable;
-
+	bool need_buf_const;
 	if (!rctx->blitter->running) {
 		unsigned i;
@@ -1296,29 +1272,35 @@ static bool r600_update_derived_state(struct r600_context *rctx)
 	/* on R600 we stuff masks + txq info into one constant buffer */
 	/* on evergreen we only need a txq info one */
-	if (rctx->b.chip_class < EVERGREEN) {
-		if (rctx->ps_shader && rctx->ps_shader->current->shader.uses_tex_buffers)
-			r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
-		if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
-			r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
-		if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
-			r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
-	} else {
-		if (rctx->ps_shader && rctx->ps_shader->current->shader.uses_tex_buffers)
-			eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
-		if (rctx->vs_shader && rctx->vs_shader->current->shader.uses_tex_buffers)
-			eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
-		if (rctx->gs_shader && rctx->gs_shader->current->shader.uses_tex_buffers)
-			eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
+	if (rctx->ps_shader) {
+		need_buf_const = rctx->ps_shader->current->shader.uses_tex_buffers || rctx->ps_shader->current->shader.has_txq_cube_array_z_comp;
+		if (need_buf_const) {
+			if (rctx->b.chip_class < EVERGREEN)
+				r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
+			else
+				eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT);
+		}
 	}
+	if (rctx->vs_shader) {
+		need_buf_const = rctx->vs_shader->current->shader.uses_tex_buffers || rctx->vs_shader->current->shader.has_txq_cube_array_z_comp;
+		if (need_buf_const) {
+			if (rctx->b.chip_class < EVERGREEN)
+				r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
+			else
+				eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX);
+		}
+	}
-	if (rctx->ps_shader && rctx->ps_shader->current->shader.has_txq_cube_array_z_comp)
-		r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_FRAGMENT);
-	if (rctx->vs_shader && rctx->vs_shader->current->shader.has_txq_cube_array_z_comp)
-		r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_VERTEX);
-	if (rctx->gs_shader && rctx->gs_shader->current->shader.has_txq_cube_array_z_comp)
-		r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_GEOMETRY);
+	if (rctx->gs_shader) {
+		need_buf_const = rctx->gs_shader->current->shader.uses_tex_buffers || rctx->gs_shader->current->shader.has_txq_cube_array_z_comp;
+		if (need_buf_const) {
+			if (rctx->b.chip_class < EVERGREEN)
+				r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
+			else
+				eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY);
+		}
+	}
 	if (rctx->b.chip_class < EVERGREEN && rctx->ps_shader && rctx->vs_shader) {
 		if (!r600_adjust_gprs(rctx)) {



Passes piglits on a Turks with no obvious regressions, so with nits above  
fixed, consider it

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
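
The per-stage decision the merged code makes can be summarised with a
small C sketch. The enums, the struct and pick_buffer_const_setup() are
hypothetical reductions of the driver types, written only to show the
control flow (one combined "needs buffer constants" test, then a choice by
chip class).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the driver's types. */
enum chip_class_stub { STUB_R600, STUB_R700, STUB_EVERGREEN, STUB_CAYMAN };

struct shader_stub {
	bool uses_tex_buffers;
	bool has_txq_cube_array_z_comp;
};

enum setup_path { SETUP_NONE, SETUP_R600, SETUP_EVERGREEN };

/* Pick which buffer-constant setup path a bound shader needs, if any. */
static enum setup_path pick_buffer_const_setup(const struct shader_stub *sh,
					       enum chip_class_stub chip)
{
	bool need_buf_const;

	if (sh == NULL)
		return SETUP_NONE; /* no shader bound for this stage */

	/* Either feature now lands in the same constant buffer. */
	need_buf_const = sh->uses_tex_buffers || sh->has_txq_cube_array_z_comp;
	if (!need_buf_const)
		return SETUP_NONE;

	return chip < STUB_EVERGREEN ? SETUP_R600 : SETUP_EVERGREEN;
}
```

Running the same helper once per stage mirrors the three near-identical
blocks in the patch.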


Re: [Mesa-dev] [PATCH] r600: fix texture gradients instruction emission (v2)

2014-11-23 Thread Glenn Kennard
 != TGSI_TEXTURE_RECT) {


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g: do all CUBE ALU operations before gradient texture operations (v2)

2014-11-23 Thread Glenn Kennard
 ?  
2 : 0; // CF_INDEX_1 : CF_INDEX_NONE

-	if (sampler_index_mode)
-		ctx->shader->uses_index_registers = true;
 	if ((inst->Texture.Texture == TGSI_TEXTURE_CUBE ||
 	     inst->Texture.Texture == TGSI_TEXTURE_CUBE_ARRAY ||
@@ -5454,6 +5399,69 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 		src_gpr = ctx->temp_reg;
 	}
+	if (inst->Instruction.Opcode == TGSI_OPCODE_TXD) {
+		int temp_h, temp_v;
+		int start_val = 0;
+
+		/* if we've already loaded the src (i.e. CUBE don't reload it). */
+		if (src_loaded == TRUE)
+			start_val = 1;
+		else
+			src_loaded = TRUE;
+		for (i = start_val; i < 3; i++) {
+			int treg = r600_get_temp(ctx);
+
+			if (i == 0)
+				src_gpr = treg;
+			else if (i == 1)
+				temp_h = treg;
+			else
+				temp_v = treg;
+
+			for (j = 0; j < 4; j++) {
+				memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+				alu.op = ALU_OP1_MOV;
+				r600_bytecode_src(&alu.src[0], &ctx->src[i], j);
+				alu.dst.sel = treg;
+				alu.dst.chan = j;
+				if (j == 3)
+					alu.last = 1;
+				alu.dst.write = 1;
+				r = r600_bytecode_add_alu(ctx->bc, &alu);
+				if (r)
+					return r;
+			}
+		}
+		for (i = 1; i < 3; i++) {
+			/* set gradients h/v */
+			memset(&tex, 0, sizeof(struct r600_bytecode_tex));
+			tex.op = (i == 1) ? FETCH_OP_SET_GRADIENTS_H :
+				FETCH_OP_SET_GRADIENTS_V;
+			tex.sampler_id = tgsi_tex_get_src_gpr(ctx, sampler_src_reg);
+			tex.sampler_index_mode = sampler_index_mode;
+			tex.resource_id = tex.sampler_id + R600_MAX_CONST_BUFFERS;
+			tex.resource_index_mode = sampler_index_mode;
+
+			tex.src_gpr = (i == 1) ? temp_h : temp_v;
+			tex.src_sel_x = 0;
+			tex.src_sel_y = 1;
+			tex.src_sel_z = 2;
+			tex.src_sel_w = 3;
+
+			tex.dst_gpr = r600_get_temp(ctx); /* just to avoid confusing the asm scheduler */
+			tex.dst_sel_x = tex.dst_sel_y = tex.dst_sel_z = tex.dst_sel_w = 7;
+			if (inst->Texture.Texture != TGSI_TEXTURE_RECT) {
+				tex.coord_type_x = 1;
+				tex.coord_type_y = 1;
+				tex.coord_type_z = 1;
+				tex.coord_type_w = 1;
+			}
+			r = r600_bytecode_add_tex(ctx->bc, &tex);
+			if (r)
+				return r;
+		}
+	}
+
 	if (src_requires_loading && !src_loaded) {
 		for (i = 0; i < 4; i++) {
 			memset(&alu, 0, sizeof(struct r600_bytecode_alu));



ARB_shader_texture_lod piglits go from 76/90 to 88/90, and this also fixes a
number of tex-miplevel-selection tests.


Some remaining Cube/1DArrayShadow failures.

Worthwhile improvement as is, so
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g: geom shaders: always load texture src regs from inputs

2014-11-18 Thread Glenn Kennard

On Tue, 18 Nov 2014 05:09:05 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

Otherwise we seem to lose the split_gs_inputs and try to
pull from an uninitialised register.

Fixes 9 texelFetch geom shader tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 709fcd7..ab2a838 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4919,7 +4919,8 @@ static inline boolean tgsi_tex_src_requires_loading(struct r600_shader_ctx *ctx,
 	return  (inst->Src[index].Register.File != TGSI_FILE_TEMPORARY &&
 		inst->Src[index].Register.File != TGSI_FILE_INPUT &&
 		inst->Src[index].Register.File != TGSI_FILE_OUTPUT) ||
-		ctx->src[index].neg || ctx->src[index].abs;
+		ctx->src[index].neg || ctx->src[index].abs ||
+		(inst->Src[index].Register.File == TGSI_FILE_INPUT && ctx->type == TGSI_PROCESSOR_GEOMETRY);
 }
 static inline unsigned tgsi_tex_get_src_gpr(struct r600_shader_ctx *ctx,


Confirmed fixes the same set of tests on a Turks.

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
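
As a rough illustration, the predicate after this change behaves like the
C sketch below. The enums and struct are hypothetical simplifications of
the TGSI and r600 types, not the real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced register-file and shader-type enums. */
enum file_stub { FILE_CONSTANT, FILE_INPUT, FILE_OUTPUT, FILE_TEMPORARY };
enum proc_stub { PROC_FRAGMENT, PROC_VERTEX, PROC_GEOMETRY };

struct src_stub {
	enum file_stub file;
	bool neg;
	bool abs;
};

/* True when the texture source must first be copied into a temporary. */
static bool tex_src_requires_loading(const struct src_stub *src, enum proc_stub type)
{
	bool in_gpr_file = src->file == FILE_TEMPORARY ||
			   src->file == FILE_INPUT ||
			   src->file == FILE_OUTPUT;

	return !in_gpr_file || src->neg || src->abs ||
	       /* new case: geometry shader inputs are always loaded */
	       (src->file == FILE_INPUT && type == PROC_GEOMETRY);
}
```

Only the last clause is new; inputs in other shader stages keep the old
behaviour.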


Re: [Mesa-dev] [PATCH] r600g: limit texture offset application to specific types (v2)

2014-11-18 Thread Glenn Kennard

On Tue, 18 Nov 2014 07:59:23 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

For 1D and 2D arrays we don't want the other coordinates being
offset and affecting where we sample. I wrote this patch 6 months
ago but lost it.

Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray

v2: rewrite to handle more cases and be consistent with code
above.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index ab2a838..76daf2c 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5535,9 +5535,24 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 				/* texture offsets do not apply to other texture targets */
 			}
 		} else {
-			offset_x = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleX] << 1;
-			offset_y = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleY] << 1;
-			offset_z = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleZ] << 1;
+			switch (inst->Texture.Texture) {
+			case TGSI_TEXTURE_3D:
+				offset_z = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleZ] << 1;
+				/* fallthrough */
+			case TGSI_TEXTURE_2D:
+			case TGSI_TEXTURE_SHADOW2D:
+			case TGSI_TEXTURE_RECT:
+			case TGSI_TEXTURE_SHADOWRECT:
+			case TGSI_TEXTURE_2D_ARRAY:
+			case TGSI_TEXTURE_SHADOW2D_ARRAY:
+				offset_y = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleY] << 1;
+				/* fallthrough */
+			case TGSI_TEXTURE_1D:
+			case TGSI_TEXTURE_SHADOW1D:
+			case TGSI_TEXTURE_1D_ARRAY:
+			case TGSI_TEXTURE_SHADOW1D_ARRAY:
+				offset_x = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleX] << 1;
+			}
 		}
 	}



Confirmed fixes the same set of tests on a Turks.

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
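
The fallthrough switch in the patch can be modelled in isolation as
follows. This is a minimal sketch with a hypothetical, shortened target
enum and made-up names; it scales by two to mirror the "<< 1" in the
driver while staying well defined for negative offsets.

```c
#include <assert.h>

/* Hypothetical, shortened list of texture targets. */
enum tex_target_stub {
	TEX_1D, TEX_1D_ARRAY, TEX_2D, TEX_2D_ARRAY, TEX_RECT, TEX_3D, TEX_CUBE
};

struct tex_offsets {
	int x, y, z;
};

/* Apply only the offsets that are real coordinates for the given target. */
static struct tex_offsets apply_offsets(enum tex_target_stub target,
					int ox, int oy, int oz)
{
	struct tex_offsets out = { 0, 0, 0 };

	switch (target) {
	case TEX_3D:
		out.z = oz * 2;
		/* fallthrough */
	case TEX_2D:
	case TEX_RECT:
	case TEX_2D_ARRAY:
		out.y = oy * 2;
		/* fallthrough */
	case TEX_1D:
	case TEX_1D_ARRAY:
		out.x = ox * 2;
		break;
	default:
		/* other targets: offsets do not apply */
		break;
	}
	return out;
}
```

With this shape a 1D array never gets a y offset and a 2D array never
gets a z offset, so the array layer coordinate is left alone.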


Re: [Mesa-dev] [PATCH] r600g/cayman: fix integer multiplication output overwrite

2014-11-17 Thread Glenn Kennard

On Tue, 18 Nov 2014 00:56:38 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

This fixes
tests/spec/glsl-1.10/execution/fs-op-assign-mult-ivec2-ivec2-overwrite.shader_test.

Reported-by: ghallberg on irc
Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index aab4215..02efc92 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -2729,6 +2729,9 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx)
 	int i, j, k, r;
 	struct r600_bytecode_alu alu;
 	int last_slot = (inst->Dst[0].Register.WriteMask & 0x8) ? 4 : 3;
+	int t1 = ctx->temp_reg;
+	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
 	for (k = 0; k < last_slot; k++) {
 		if (!(inst->Dst[0].Register.WriteMask & (1 << k)))
 			continue;
@@ -2739,7 +2742,8 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx)
 			for (j = 0; j < inst->Instruction.NumSrcRegs; j++) {
 				r600_bytecode_src(&alu.src[j], &ctx->src[j], k);
 			}
-			tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+			alu.dst.sel = t1;
+			alu.dst.chan = i;
 			alu.dst.write = (i == k);
 			if (i == 3)
 				alu.last = 1;
@@ -2748,6 +2752,23 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx)
 				return r;
 		}
 	}
+
+	for (i = 0 ; i < last_slot; i++) {
+		if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
+			continue;
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		alu.op = ALU_OP1_MOV;
+		alu.src[0].sel = t1;
+		alu.src[0].chan = i;
+		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+		alu.dst.write = 1;
+		if (i == lasti)
+			alu.last = 1;
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
 	return 0;
 }



Trivial nit: last_slot is no longer needed and can be removed.

With a bit of luck it will also fix  
https://bugs.freedesktop.org/show_bug.cgi?id=85376


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
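
The overwrite hazard the patch works around can be shown with a generic
C sketch. It assumes the per-channel multiplies run one after another
and that the destination aliases a swizzled source; the exact expression
used by the piglit test is not reproduced here, and both function names
are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* v = v * v.yx, each channel written straight into v (dst aliases src). */
static void mul_swizzled_in_place(int32_t v[2])
{
	v[0] = v[0] * v[1];
	v[1] = v[1] * v[0]; /* reads the already overwritten v[0] */
}

/* Same operation, but results go to a temporary and are copied back. */
static void mul_swizzled_via_temp(int32_t v[2])
{
	int32_t t[2];

	t[0] = v[0] * v[1];
	t[1] = v[1] * v[0];
	v[0] = t[0]; /* the MOVs added by the patch play this role */
	v[1] = t[1];
}
```

Writing every channel to a temporary first and copying afterwards is the
same idea as the t1 register plus the trailing MOV loop in the patch.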


Re: [Mesa-dev] [PATCH] r600g/cayman: fix texture gather tests

2014-11-17 Thread Glenn Kennard

On Tue, 18 Nov 2014 01:57:13 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

It appears on cayman the TG4 outputs were reordered.

This fixes a lot of piglit tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 4c6ae45..709fcd7 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5763,11 +5763,18 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 		int8_t texture_component_select = ctx->literals[4 * inst->Src[1].Register.Index + inst->Src[1].Register.SwizzleX];
 		tex.inst_mod = texture_component_select;
+		if (ctx->bc->chip_class == CAYMAN) {
 		/* GATHER4 result order is different from TGSI TG4 */
-		tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;
-		tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7;
-		tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7;
-		tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+			tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 0 : 7;
+			tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 1 : 7;
+			tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 2 : 7;
+			tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+		} else {
+			tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;
+			tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7;
+			tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7;
+			tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+		}
 	}
 	else if (inst->Instruction.Opcode == TGSI_OPCODE_LODQ) {
 		tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;


Gotta permute those tex op bit encodings between hardware generations or  
they go stale...


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


Re: [Mesa-dev] [PATCH] r600g/cayman: handle empty vertex shaders

2014-11-17 Thread Glenn Kennard

On Tue, 18 Nov 2014 02:23:51 +0100, Dave Airlie airl...@gmail.com wrote:


From: Dave Airlie airl...@redhat.com

Some of the geom shader tests produce an empty vertex shader,
on cayman we'd crash in the finaliser because last_cf was NULL.

cayman doesn't need the NOP workaround, so if the code arrives
here with no last_cf, just emit an END.

fixes crashes in a bunch of piglit geom shader tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 5c22f96..f0849ca 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -83,14 +83,18 @@ int bc_finalizer::run() {
 		last_cf = c;
 	}
-	if (last_cf->bc.op_ptr->flags & CF_ALU) {
+	if (!ctx.is_cayman() && last_cf->bc.op_ptr->flags & CF_ALU) {
 		last_cf = sh.create_cf(CF_OP_NOP);
 		sh.root->push_back(last_cf);
 	}
-	if (ctx.is_cayman())
-		last_cf->insert_after(sh.create_cf(CF_OP_CF_END));
-	else
+	if (ctx.is_cayman()) {
+		if (!last_cf) {
+			cf_node *c = sh.create_cf(CF_OP_CF_END);
+			sh.root->push_back(c);
+		} else
+			last_cf->insert_after(sh.create_cf(CF_OP_CF_END));
+	} else
 		last_cf->bc.end_of_program = 1;
 	for (unsigned t = EXP_PIXEL; t < EXP_TYPE_COUNT; ++t) {


Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com


[Mesa-dev] [PATCH] r600g: Implement GL_ARB_draw_indirect

2014-11-08 Thread Glenn Kennard
Requires evergreen/cayman, and updated radeon kernel module.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
See also kernel side patch sent to dri-de...@lists.freedesktop.org

 docs/GL3.txt |  4 +-
 docs/relnotes/10.4.html  |  1 +
 src/gallium/drivers/r600/evergreend.h|  7 ++-
 src/gallium/drivers/r600/r600_pipe.c |  6 ++-
 src/gallium/drivers/r600/r600_state_common.c | 80 ++--
 5 files changed, 77 insertions(+), 21 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 2854431..06c52f9 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00:
 
   GL_ARB_draw_buffers_blend                    DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_draw_indirect                         DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_draw_indirect                         DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                           DONE (i965, nvc0)
   - 'precise' qualifier                        DONE
   - Dynamically uniform sampler array indices  DONE (r600)
@@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_framebuffer_no_attachments            not started
   GL_ARB_internalformat_query2                 not started
   GL_ARB_invalidate_subdata                    DONE (all drivers)
-  GL_ARB_multi_draw_indirect                   DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_multi_draw_indirect                   DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_program_interface_query               not started
   GL_ARB_robust_buffer_access_behavior         not started
   GL_ARB_shader_image_size                     not started
diff --git a/docs/relnotes/10.4.html b/docs/relnotes/10.4.html
index d0fbd3b..9c2a491 100644
--- a/docs/relnotes/10.4.html
+++ b/docs/relnotes/10.4.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_ARB_texture_view on nv50, nvc0</li>
 <li>GL_ARB_clip_control on llvmpipe, softpipe, r300, r600, radeonsi</li>
 <li>GL_KHR_context_flush_control on all drivers</li>
+<li>GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600</li>
 </ul>
 
 
diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 4989996..b8880c8 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -64,6 +64,8 @@
 #define R600_TEXEL_PITCH_ALIGNMENT_MASK 0x7
 
 #define PKT3_NOP                        0x10
+#define PKT3_SET_BASE                   0x11
+#define PKT3_INDEX_BUFFER_SIZE          0x13
 #define PKT3_DEALLOC_STATE              0x14
 #define PKT3_DISPATCH_DIRECT            0x15
 #define PKT3_DISPATCH_INDIRECT          0x16
@@ -72,12 +74,15 @@
 #define PKT3_REG_RMW                    0x21
 #define PKT3_COND_EXEC                  0x22
 #define PKT3_PRED_EXEC                  0x23
-#define PKT3_START_3D_CMDBUF            0x24
+#define PKT3_DRAW_INDIRECT              0x24
+#define PKT3_DRAW_INDEX_INDIRECT        0x25
+#define PKT3_INDEX_BASE                 0x26
 #define PKT3_DRAW_INDEX_2               0x27
 #define PKT3_CONTEXT_CONTROL            0x28
 #define PKT3_DRAW_INDEX_IMMD_BE         0x29
 #define PKT3_INDEX_TYPE                 0x2A
 #define PKT3_DRAW_INDEX                 0x2B
+#define PKT3_DRAW_INDIRECT_MULTI        0x2C
 #define PKT3_DRAW_INDEX_AUTO            0x2D
 #define PKT3_DRAW_INDEX_IMMD            0x2E
 #define PKT3_NUM_INSTANCES              0x2F
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 0b571e4..829deaf 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -313,6 +313,11 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 		return family >= CHIP_CEDAR ? 1 : 0;
 	case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
 		return family >= CHIP_CEDAR ? 4 : 0;
+	case PIPE_CAP_DRAW_INDIRECT:
+		/* needs kernel command checking support to work */
+		if (family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41)
+			return 1;
+		return 0;
 
 	/* Unsupported features. */
 	case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -322,7 +327,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
-	case PIPE_CAP_DRAW_INDIRECT:
case

[Mesa-dev] [PATCH 2/2] r600g: Implement sm5 UBO/sampler indexing

2014-10-15 Thread Glenn Kennard
Caveat: Shaders using UBO/sampler indexing will
not be optimized by SB, due to SB not currently
supporting the necessary CF_INDEX_[01] index
registers.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 docs/GL3.txt   |  4 +--
 src/gallium/drivers/r600/eg_asm.c  | 52 ---
 src/gallium/drivers/r600/r600_asm.c| 58 +-
 src/gallium/drivers/r600/r600_asm.h|  9 +
 src/gallium/drivers/r600/r600_shader.c | 52 +++
 src/gallium/drivers/r600/r600_shader.h |  2 ++
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp |  8 -
 src/gallium/drivers/r600/sb/sb_sched.h |  2 ++
 8 files changed, 166 insertions(+), 21 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 5ccfdea..dba36e0 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -98,8 +98,8 @@ GL 4.0, GLSL 4.00:
   GL_ARB_draw_indirect                         DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                           DONE (i965, nvc0)
   - 'precise' qualifier                        DONE
-  - Dynamically uniform sampler array indices  DONE ()
-  - Dynamically uniform UBO array indices      DONE ()
+  - Dynamically uniform sampler array indices  DONE (r600)
+  - Dynamically uniform UBO array indices      DONE (r600)
   - Implicit signed -> unsigned conversions    DONE
   - Fused multiply-add                         DONE ()
   - Packing/bitfield/conversion functions      DONE (r600)
diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c
index acb3040..295cb4d 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -43,10 +43,10 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
 			/* prepend ALU_EXTENDED if we need more than 2 kcache sets */
 			if (cf->eg_alu_extended) {
 				bc->bytecode[id++] =
-						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(V_SQ_CF_INDEX_NONE) |
-						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(V_SQ_CF_INDEX_NONE) |
-						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(V_SQ_CF_INDEX_NONE) |
-						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(V_SQ_CF_INDEX_NONE) |
+						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(cf->kcache[0].index_mode) |
+						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(cf->kcache[1].index_mode) |
+						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(cf->kcache[2].index_mode) |
+						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(cf->kcache[3].index_mode) |
 						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK2(cf->kcache[2].bank) |
 						S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK3(cf->kcache[3].bank) |
 						S_SQ_CF_ALU_WORD0_EXT_KCACHE_MODE2(cf->kcache[2].mode);
@@ -143,3 +143,47 @@ void eg_bytecode_export_read(struct r600_bytecode *bc,
 	output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
 }
 #endif
+
+int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_clause)
+{
+	struct r600_bytecode_alu alu;
+	int r;
+	unsigned type;
+
+	assert(id < 2);
+	assert(bc->chip_class >= EVERGREEN);
+
+	if (bc->index_loaded[id])
+		return 0;
+
+	memset(&alu, 0, sizeof(alu));
+	alu.op = ALU_OP1_MOVA_INT;
+	alu.src[0].sel = bc->index_reg[id];
+	alu.src[0].chan = 0;
+	alu.last = 1;
+	r = r600_bytecode_add_alu(bc, &alu);
+	if (r)
+		return r;
+
+	bc->ar_loaded = 0; /* clobbered */
+
+	memset(&alu, 0, sizeof(alu));
+	alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
+	alu.last = 1;
+	r = r600_bytecode_add_alu(bc, &alu);
+	if (r)
+		return r;
+
+	/* Must split ALU group as index only applies to following group */
+	if (inside_alu_clause) {
+		type = bc->cf_last->op;
+		if ((r = r600_bytecode_add_cf(bc))) {
+			return r;
+		}
+		bc->cf_last->op = type;
+	}
+
+	bc->index_loaded[id] = 1;
+
+	return 0;
+}
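
The caching behaviour of egcm_load_index_reg() above can be modelled
with a small standalone C sketch. The struct and its "emitted" counter
are hypothetical; only the load-once logic and the clobbering of the
address register follow the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced bytecode state for the sketch. */
struct bc_stub {
	bool index_loaded[2];
	bool ar_loaded;
	unsigned emitted; /* number of ALU instructions "emitted" so far */
};

/* Load index register `id` once; later calls are no-ops until invalidated. */
static int load_index_reg(struct bc_stub *bc, unsigned id)
{
	if (id >= 2)
		return -1; /* only CF_IDX0 and CF_IDX1 exist */

	if (bc->index_loaded[id])
		return 0;

	bc->emitted += 2;      /* MOVA_INT followed by SET_CF_IDX0/1 */
	bc->ar_loaded = false; /* MOVA_INT clobbers the address register */
	bc->index_loaded[id] = true;
	return 0;
}
```

Code that changes the index source register would need to clear
index_loaded[id] again; that part is outside this sketch.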
diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 8aa69b5..ce3c2d1 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -819,6 +819,10 @@ static int merge_inst_groups(struct r600_bytecode *bc, struct
