Re: [Mesa-dev] [PATCH] r600/sb: remove superfluos assert
On Tue, 12 Sep 2017 19:25:18 +0200, Vadim Girlin wrote:
On 09/12/2017 12:49 PM, Gert Wollny wrote:
On Tuesday, 12.09.2017, 09:56 +0300, Vadim Girlin wrote:
On 09/11/2017 07:09 PM, Emil Velikov wrote:

Anyway, if num_arrays is 0 there, I suspect it can be a result of some other issue. At the very least it looks like a potential performance problem, because in that case we assume all shader registers can be accessed with indirect addressing and it can limit the optimizations significantly. So it might make sense to figure out why it's zero in the first place; in theory it shouldn't happen. Maybe something is wrong with the indirect_files bits?

The shader that's failing is this (i.e. no arrays, and indirect access only to SV).

Is the tested feature really supported by r600g? AFAICS the indirect index value is unused in the shader code. Anyway, at first glance it looks like we don't need indirect addressing for GPRs in this case, so the outer "if" around that assert probably should handle this case too and skip the assert. I'm not 100% sure though.

FRAG
DCL SV[0], SAMPLEMASK
DCL OUT[0], COLOR
DCL CONST[0][0]
DCL TEMP[0..1], LOCAL
DCL ADDR[0]
IMM[0] FLT32 {1., 0., 0., 0.}
IMM[1] INT32 {1, 0, 0, 0}
  0: MOV TEMP[0], IMM[0].xyyx
  1: UARL ADDR[0].x, CONST[0][0].
  2: USEQ TEMP[1].x, SV[ADDR[0].x]., IMM[1].
  3: UIF TEMP[1].
  4: MOV TEMP[0].xy, IMM[0].yxyy
  5: ENDIF
  6: MOV OUT[0], TEMP[0]
  7: END

= SHADER #12 == PS/BARTS/EVERGREEN =
= 36 dw = 8 gprs = 1 stack =
4005 a418 ALU_PUSH_BEFORE 7 @10 KC0[CB0:0-15]
0010 00f9 00400c90 1 x: MOV R2.x, 1.0
0012 04f8 20400c90 y: MOV R2.y, 0
0014 04f8 40400c90 z: MOV R2.z, 0
0016 00f9 60400c90 w: MOV R2.w, 1.0
0018 8080 00800c90 t: MOV R4.x, KC0[0].x
0020 801f4800 00601d10 2 x: SETE_INT R3.x, R0.z, 1
0022 801f00fe 00e0229c 3 MP x: PRED_SETNE_INT R7.x, PV.x, 0
0002 0003 8281 JUMP @6 POP:1
0004 000c a804 ALU_POP_AFTER 2 @24
0024 04f8 00400c90 4 x: MOV R2.x, 0
0026 80f9 20400c90 y: MOV R2.y, 1.0
0006 000e a00c ALU 4 @28
0028 0002 00200c90 5 x: MOV R1.x, R2.x
0030 0402 20200c90 y: MOV R1.y, R2.y
0032 0802 40200c90 z: MOV R1.z, R2.z
0034 8c02 60200c90 w: MOV R1.w, R2.w
0008 c0008000 95200688 EXPORT_DONE PIXEL 0 R1.xyzw EOP
= SHADER_END

Hi Gert,

Vadim is correct: the fix is to extend the check in the "if" above to also exclude TGSI_FILE_SYSTEM_VALUE, and keep the assert in place, i.e.:

	if (pshader->indirect_files & ~((1 << TGSI_FILE_CONSTANT) |
					(1 << TGSI_FILE_SAMPLER) |
					(1 << TGSI_FILE_SYSTEM_VALUE))) {

Although gl_SampleMaskIn is declared as an array in GLSL, it's effectively a 32-bit mask on all hardware supported by Mesa, so the array indexing is simply ignored.

Thanks for looking into this!

/Glenn
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
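The mask test discussed in this thread can be tried in isolation. What follows is only a minimal sketch: the `demo_file` enum and `demo_needs_gpr_indirect()` are hypothetical stand-ins, not Mesa's real `TGSI_FILE_*` values or any r600g function. It shows how clearing the constant, sampler and system-value bits leaves only the files that would actually require indirect GPR addressing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Mesa's TGSI_FILE_* enum; the real values are
 * defined in Mesa's headers and are not reproduced here. */
enum demo_file {
	DEMO_FILE_CONSTANT,
	DEMO_FILE_TEMPORARY,
	DEMO_FILE_SAMPLER,
	DEMO_FILE_SYSTEM_VALUE,
	DEMO_FILE_COUNT
};

/* Returns true if any register file other than the ignored ones
 * (constants, samplers, system values) is accessed indirectly. */
static bool demo_needs_gpr_indirect(unsigned indirect_files)
{
	const unsigned ignored = (1u << DEMO_FILE_CONSTANT) |
				 (1u << DEMO_FILE_SAMPLER) |
				 (1u << DEMO_FILE_SYSTEM_VALUE);

	/* Only bits for known files may be set in this sketch. */
	assert(indirect_files < (1u << DEMO_FILE_COUNT));
	return (indirect_files & ~ignored) != 0;
}
```

With this shape, a shader like the one quoted above (indirect access only to SV) yields false, so the guarded assert would never be reached for it.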
Re: [Mesa-dev] [PATCH] r600: refactor out some compressed resource state code.
On Mon, 05 Jun 2017 05:35:02 +0200, Dave Airlie <airl...@gmail.com> wrote:

From: Dave Airlie <airl...@redhat.com>

This just takes this out to a separate function as it will get more complex with images.
---
 src/gallium/drivers/r600/r600_state_common.c | 52 +++-
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 3b24f36..8ace779 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1400,6 +1400,32 @@ static void r600_generate_fixed_func_tcs(struct r600_context *rctx)
 	ureg_create_shader_and_destroy(ureg, &rctx->b.b);
 }
 
+static void r600_update_compressed_resource_state(struct r600_context *rctx)
+{
+	unsigned i;
+	unsigned counter;
+
+	counter = p_atomic_read(&rctx->screen->b.compressed_colortex_counter);
+	if (counter != rctx->b.last_compressed_colortex_counter) {
+		rctx->b.last_compressed_colortex_counter = counter;
+
+		for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
+			r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
+		}
+	}
+
+	/* Decompress textures if needed. */
+	for (i = 0; i < PIPE_SHADER_TYPES; i++) {
+		struct r600_samplerview_state *views = &rctx->samplers[i].views;
+		if (views->compressed_depthtex_mask) {
+			r600_decompress_depth_textures(rctx, views);
+		}
+		if (views->compressed_colortex_mask) {
+			r600_decompress_color_textures(rctx, views);
+		}
+	}
+}
+
 #define SELECT_SHADER_OR_FAIL(x) do { \
 		r600_shader_select(ctx, rctx->x##_shader, &x##_dirty); \
 		if (unlikely(!rctx->x##_shader->current)) \
@@ -1440,30 +1466,8 @@ static bool r600_update_derived_state(struct r600_context *rctx)
 	bool need_buf_const;
 	struct r600_pipe_shader *clip_so_current = NULL;
 
-	if (!rctx->blitter->running) {
-		unsigned i;
-		unsigned counter;
-
-		counter = p_atomic_read(&rctx->screen->b.compressed_colortex_counter);
-		if (counter != rctx->b.last_compressed_colortex_counter) {
-			rctx->b.last_compressed_colortex_counter = counter;
-
-			for (i = 0; i < PIPE_SHADER_TYPES; ++i) {
-				r600_update_compressed_colortex_mask(&rctx->samplers[i].views);
-			}
-		}
-
-		/* Decompress textures if needed. */
-		for (i = 0; i < PIPE_SHADER_TYPES; i++) {
-			struct r600_samplerview_state *views = &rctx->samplers[i].views;
-			if (views->compressed_depthtex_mask) {
-				r600_decompress_depth_textures(rctx, views);
-			}
-			if (views->compressed_colortex_mask) {
-				r600_decompress_color_textures(rctx, views);
-			}
-		}
-	}
+	if (!rctx->blitter->running)
+		r600_update_compressed_resource_state(rctx);
 
 	SELECT_SHADER_OR_FAIL(ps);

Patch series is Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
[Mesa-dev] [PATCH 6/9] r600g: Implement scratch buffer state management
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/evergreen_state.c   | 24 +++
 src/gallium/drivers/r600/r600_pipe.c         | 3 +
 src/gallium/drivers/r600/r600_pipe.h         | 14
 src/gallium/drivers/r600/r600_shader.h       | 1 +
 src/gallium/drivers/r600/r600_state_common.c | 104 +++
 5 files changed, 146 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index c5dd9f7..8e984b9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -1976,6 +1976,30 @@ static void evergreen_emit_tcs_constant_buffers(struct r600_context *rctx, struc
 		0);
 }
 
+void evergreen_setup_scratch_buffers(struct r600_context *rctx) {
+	static const struct {
+		unsigned ring_base;
+		unsigned item_size;
+		unsigned ring_size;
+	} regs[EG_NUM_HW_STAGES] = {
+		[R600_HW_STAGE_PS] = { R_008C68_SQ_PSTMP_RING_BASE, R_028914_SQ_PSTMP_RING_ITEMSIZE, R_008C6C_SQ_PSTMP_RING_SIZE },
+		[R600_HW_STAGE_VS] = { R_008C60_SQ_VSTMP_RING_BASE, R_028910_SQ_VSTMP_RING_ITEMSIZE, R_008C64_SQ_VSTMP_RING_SIZE },
+		[R600_HW_STAGE_GS] = { R_008C58_SQ_GSTMP_RING_BASE, R_02890C_SQ_GSTMP_RING_ITEMSIZE, R_008C5C_SQ_GSTMP_RING_SIZE },
+		[R600_HW_STAGE_ES] = { R_008C50_SQ_ESTMP_RING_BASE, R_028908_SQ_ESTMP_RING_ITEMSIZE, R_008C54_SQ_ESTMP_RING_SIZE },
+		[EG_HW_STAGE_LS] = { R_008E10_SQ_LSTMP_RING_BASE, R_028830_SQ_LSTMP_RING_ITEMSIZE, R_008E14_SQ_LSTMP_RING_SIZE },
+		[EG_HW_STAGE_HS] = { R_008E18_SQ_HSTMP_RING_BASE, R_028834_SQ_HSTMP_RING_ITEMSIZE, R_008E1C_SQ_HSTMP_RING_SIZE }
+	};
+
+	for (unsigned i = 0; i < EG_NUM_HW_STAGES; i++) {
+		struct r600_pipe_shader *stage = rctx->hw_shader_stages[i].shader;
+
+		if (stage && unlikely(stage->scratch_space_needed)) {
+			r600_setup_scratch_area_for_shader(rctx, stage,
+				&rctx->scratch_buffers[i], regs[i].ring_base, regs[i].item_size, regs[i].ring_size);
+		}
+	}
+}
+
 static void evergreen_emit_sampler_views(struct r600_context *rctx,
 					 struct r600_samplerview_state *state,
 					 unsigned resource_id_base, unsigned pkt_flags)
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 1803c26..fc03990 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -71,6 +71,9 @@ static void r600_destroy_context(struct pipe_context *context)
 
 	r600_sb_context_destroy(rctx->sb_context);
 
+	for (sh = 0; sh < (rctx->b.chip_class < EVERGREEN ? R600_NUM_HW_STAGES : EG_NUM_HW_STAGES); sh++) {
+		r600_resource_reference(&rctx->scratch_buffers[sh].buffer, NULL);
+	}
 	r600_resource_reference(&rctx->dummy_cmask, NULL);
 	r600_resource_reference(&rctx->dummy_fmask, NULL);
 
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index cf8eba3..c8cf87f 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -413,6 +413,13 @@ struct r600_shader_state {
 	struct r600_pipe_shader *shader;
 };
 
+/* Used to spill shader temps */
+struct r600_scratch_buffer {
+	struct r600_resource	*buffer;
+	unsigned		size;
+	unsigned		item_size;
+};
+
 struct r600_context {
 	struct r600_common_context b;
 	struct r600_screen *screen;
@@ -522,6 +529,8 @@ struct r600_context {
 	struct r600_pipe_shader_selector *last_tcs;
 	unsigned last_num_tcs_input_cp;
 	unsigned lds_alloc;
+
+	struct r600_scratch_buffer scratch_buffers[MAX2(R600_NUM_HW_STAGES, EG_NUM_HW_STAGES)];
 };
 
 static inline void r600_emit_command_buffer(struct radeon_winsys_cs *cs,
@@ -621,6 +630,7 @@ void evergreen_init_color_surface_rat(struct r600_context *rctx,
 					struct r600_surface *surf);
 void evergreen_update_db_shader_control(struct r600_context * rctx);
 bool evergreen_adjust_gprs(struct r600_context *rctx);
+void evergreen_setup_scratch_buffers(struct r600_context *rctx);
 /* r600_blit.c */
 void r600_init_blit_functions(struct r600_context *rctx);
 void r600_decompress_depth_textures(struct r600_context *rctx,
@@ -665,6 +675,7 @@ boolean r600_is_format_supported(struct pipe_screen *screen,
 				 unsigned sample_count,
 				 unsigned usage);
 void r600_update_db_shader_control(struct r600_context * rctx);
+void r600_setup_scratch_buffers(struct r600_context *rctx);
 
 /* r600_hw
[Mesa-dev] [PATCH 8/9] r600g/sb: Add dependency tracking for scratch ops
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.h         | 1 +
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 +-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp   | 12
 src/gallium/drivers/r600/sb/sb_core.cpp        | 3 ++-
 src/gallium/drivers/r600/sb/sb_ir.h            | 6 +-
 src/gallium/drivers/r600/sb/sb_ra_init.cpp     | 2 +-
 src/gallium/drivers/r600/sb/sb_sched.cpp       | 2 +-
 src/gallium/drivers/r600/sb/sb_valtable.cpp    | 1 +
 8 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h
index e94230f..3c35d48 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -67,6 +67,7 @@ struct r600_shader {
 	boolean uses_kill;
 	boolean fs_write_all;
 	boolean two_side;
+	boolean needs_scratch_space;
 	/* Number of color outputs in the TGSI shader,
 	 * sometimes it could be higher than nr_cbufs (bug?).
 	 * Also with writes_all property on eg+ it will be set to max CB number */
diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 82826a9..5d74794 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -293,7 +293,7 @@ void bc_finalizer::finalize_alu_group(alu_group_node* g, node *prev_node) {
 		value *d = n->dst.empty() ? NULL : n->dst[0];
 
 		if (d && d->is_special_reg()) {
-			assert((n->bc.op_ptr->flags & AF_MOVA) || d->is_geometry_emit());
+			assert((n->bc.op_ptr->flags & AF_MOVA) || d->is_geometry_emit() || d->is_scratch());
 			d = NULL;
 		}
 
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index ae92a76..9c52342 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -667,6 +667,11 @@ int bc_parser::prepare_fetch_clause(cf_node *cf) {
 				n->src.push_back(get_cf_index_value(n->bc.resource_index_mode == V_SQ_CF_INDEX_1));
 			}
 		}
+
+		if (n->bc.op == FETCH_OP_READ_SCRATCH) {
+			n->src.push_back(sh->get_special_value(SV_SCRATCH));
+			n->dst.push_back(sh->get_special_value(SV_SCRATCH));
+		}
 	}
 
 	return 0;
@@ -797,6 +802,10 @@ int bc_parser::prepare_ir() {
 						c->flags |= NF_DONT_KILL;
 					}
 				}
+				else if (c->bc.op == CF_OP_MEM_SCRATCH) {
+					c->src.push_back(sh->get_special_value(SV_SCRATCH));
+					c->dst.push_back(sh->get_special_value(SV_SCRATCH));
+				}
 
 				if (!burst_count--)
 					break;
@@ -831,6 +840,9 @@ int bc_parser::prepare_ir() {
 				c->src.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
 				c->dst.push_back(sh->get_special_value(SV_GEOMETRY_EMIT));
 			}
+		} else if (c->bc.op == CF_OP_WAIT_ACK) {
+			c->src.push_back(sh->get_special_value(SV_SCRATCH));
+			c->dst.push_back(sh->get_special_value(SV_SCRATCH));
 		}
 	}
 
diff --git a/src/gallium/drivers/r600/sb/sb_core.cpp b/src/gallium/drivers/r600/sb/sb_core.cpp
index afea818..283c84f 100644
--- a/src/gallium/drivers/r600/sb/sb_core.cpp
+++ b/src/gallium/drivers/r600/sb/sb_core.cpp
@@ -191,7 +191,8 @@ int r600_sb_bytecode_process(struct r600_context *rctx,
 
 	// if conversion breaks the dependency tracking between CF_EMIT ops when it removes
 	// the phi nodes for SV_GEOMETRY_EMIT. Just disable it for GS
-	if (sh->target != TARGET_GS)
+	// Same for for shaders spilling to scratch memory using SV_SCRATCH
+	if (sh->target != TARGET_GS || pshader->needs_scratch_space)
 		SB_RUN_PASS(if_conversion, 1);
 
 	// if_conversion breaks info about uses, but next pass (peephole)
diff --git a/src/gallium/drivers/r600/sb/sb_ir.h b/src/gallium/drivers/r600/sb/sb_ir.h
index 74c0549..141bf5f 100644
--- a/src/gallium/drivers/r600/sb/sb_ir.h
+++ b/src/gallium/drivers/r600/sb/sb_ir.h
@@ -42,7 +42,8 @@ enum special_regs {
 	SV_EXEC_MASK,
 	SV_AR_INDEX,
 	SV_VALID_MASK,
-	SV_GEOMETRY
[Mesa-dev] [PATCH 9/9] r600g: Implement spilling of temp arrays
Pessimistically spills arrays if GPR limit is exceeded.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 308 ++---
 1 file changed, 285 insertions(+), 23 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 8cb3f8b..f716dae 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -165,7 +165,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 	bool dump = r600_can_dump_shader(&rctx->screen->b,
 					 tgsi_get_processor_type(sel->tokens));
 	unsigned use_sb = !(rctx->screen->b.debug_flags & DBG_NO_SB);
-	unsigned sb_disasm = use_sb || (rctx->screen->b.debug_flags & DBG_SB_DISASM);
+	unsigned sb_disasm;
 	unsigned export_shader;
 
 	shader->shader.bc.isa = rctx->isa;
@@ -203,6 +203,7 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 		}
 	}
 
+	sb_disasm = use_sb || (rctx->screen->b.debug_flags & DBG_SB_DISASM);
 	if (dump && !sb_disasm) {
 		fprintf(stderr, "--\n");
 		r600_bytecode_disasm(&shader->shader.bc);
@@ -317,6 +318,9 @@ struct eg_interp {
 
 struct r600_shader_ctx {
 	struct tgsi_shader_info		info;
+	struct tgsi_array_info		*array_infos;
+	/* flag for each tgsi temp array if its been spilled or not */
+	bool				*spilled_arrays;
 	struct tgsi_parse_context	parse;
 	const struct tgsi_token		*tokens;
 	unsigned			type;
@@ -350,6 +354,7 @@ struct r600_shader_ctx {
 	unsigned			enabled_stream_buffers_mask;
 	unsigned			tess_input_info; /* temp with tess input offsets */
 	unsigned			tess_output_info; /* temp with tess input offsets */
+	unsigned			need_wait_ack;
 };
 
 struct r600_shader_tgsi_instruction {
@@ -850,6 +855,96 @@ static int tgsi_barrier(struct r600_shader_ctx *ctx)
 	return 0;
 }
 
+static void choose_spill_arrays(struct r600_shader_ctx *ctx, int *regno, unsigned *scratch_space_needed)
+{
+	// pick largest array and spill it, repeat until the number of temps is under limit or we run out of arrays
+	unsigned n = ctx->info.array_max[TGSI_FILE_TEMPORARY];
+	unsigned narrays_left = n;
+	bool *spilled = ctx->spilled_arrays; // assumed calloc:ed
+
+	*scratch_space_needed = 0;
+	while (*regno > 124 && narrays_left) {
+		unsigned i;
+		unsigned largest = 0;
+		unsigned largest_index = 0;
+
+		for (i = 0; i < n; i++) {
+			unsigned size = ctx->array_infos[i].range.Last - ctx->array_infos[i].range.First + 1;
+			if (!spilled[i] && size > largest) {
+				largest = size;
+				largest_index = i;
+			}
+		}
+
+		spilled[largest_index] = true;
+		*regno -= largest;
+		*scratch_space_needed += largest;
+
+		narrays_left --;
+	}
+
+	if (narrays_left == 0) {
+		ctx->info.indirect_files &= ~(1 << TGSI_FILE_TEMPORARY);
+	}
+}
+
+/* take spilled temp arrays into account when translating tgsi register
+   indexes into r600 gprs if spilled is false, or scratch array offset if
+   spilled is true */
+static int map_tgsi_reg_index_to_r600_gpr(struct r600_shader_ctx *ctx, unsigned tgsi_reg_index, bool *spilled) {
+	unsigned i;
+	unsigned spilled_size = 0;
+
+	for (i = 0; i < ctx->info.array_max[TGSI_FILE_TEMPORARY]; i++) {
+		if (tgsi_reg_index >= ctx->array_infos[i].range.First && tgsi_reg_index <= ctx->array_infos[i].range.Last) {
+			if (ctx->spilled_arrays[i]) {
+				/* vec4 index into spilled scratch memory */
+				*spilled = true;
+
+				return tgsi_reg_index - ctx->array_infos[i].range.First + spilled_size;
+			}
+			else {
+				/* regular GPR array */
+				*spilled = false;
+
+				return tgsi_reg_index - spilled_size + ctx->file_offset[TGSI_FILE_TEMPORARY];
+			}
+		}
+
+		if (ctx->spilled_arrays[i]) {
+			spilled_size += ctx->array_infos[i].ran
[Mesa-dev] [PATCH 1/9] r600g: Add scratch ring register defines
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/evergreend.h | 14 ++
 src/gallium/drivers/r600/r600d.h      | 8 ++--
 2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 40ba7c1..2fbb540 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -2021,10 +2021,24 @@
 #define R_0288EC_SQ_LDS_ALLOC_PS 0x000288EC
 #define R_028900_SQ_ESGS_RING_ITEMSIZE 0x00028900
 #define R_028904_SQ_GSVS_RING_ITEMSIZE 0x00028904
+#define R_008C50_SQ_ESTMP_RING_BASE 0x8C50
 #define R_028908_SQ_ESTMP_RING_ITEMSIZE 0x00028908
+#define R_008C54_SQ_ESTMP_RING_SIZE 0x8C54
+#define R_008C58_SQ_GSTMP_RING_BASE 0x8C58
 #define R_02890C_SQ_GSTMP_RING_ITEMSIZE 0x0002890C
+#define R_008C5C_SQ_GSTMP_RING_SIZE 0x8C5C
+#define R_008C60_SQ_VSTMP_RING_BASE 0x8C60
 #define R_028910_SQ_VSTMP_RING_ITEMSIZE 0x00028910
+#define R_008C64_SQ_VSTMP_RING_SIZE 0x8C64
+#define R_008C68_SQ_PSTMP_RING_BASE 0x8C68
 #define R_028914_SQ_PSTMP_RING_ITEMSIZE 0x00028914
+#define R_008C6C_SQ_PSTMP_RING_SIZE 0x8C6C
+#define R_008E10_SQ_LSTMP_RING_BASE 0x8E10
+#define R_028830_SQ_LSTMP_RING_ITEMSIZE 0x00028830
+#define R_008E14_SQ_LSTMP_RING_SIZE 0x8E14
+#define R_008E18_SQ_HSTMP_RING_BASE 0x8E18
+#define R_028834_SQ_HSTMP_RING_ITEMSIZE 0x00028834
+#define R_008E1C_SQ_HSTMP_RING_SIZE 0x8E1C
 #define R_02891C_SQ_GS_VERT_ITEMSIZE 0x0002891C
 #define R_028920_SQ_GS_VERT_ITEMSIZE_1 0x00028920
 #define R_028924_SQ_GS_VERT_ITEMSIZE_2 0x00028924
diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 75d64c1..9155076 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -219,8 +219,12 @@
 #define R_008C4C_SQ_GSVS_RING_SIZE 0x008C4C
 #define R_008C50_SQ_ESTMP_RING_BASE 0x008C50
 #define R_008C54_SQ_ESTMP_RING_SIZE 0x008C54
-#define R_008C50_SQ_GSTMP_RING_BASE 0x008C58
-#define R_008C54_SQ_GSTMP_RING_SIZE 0x008C5C
+#define R_008C58_SQ_GSTMP_RING_BASE 0x008C58
+#define R_008C5C_SQ_GSTMP_RING_SIZE 0x008C5C
+#define R_008C68_SQ_PSTMP_RING_BASE 0x008C68
+#define R_008C6C_SQ_PSTMP_RING_SIZE 0x008C6C
+#define R_008C60_SQ_VSTMP_RING_BASE 0x008C60
+#define R_008C64_SQ_VSTMP_RING_SIZE 0x008C64
 #define R_0088C8_VGT_GS_PER_ES 0x0088C8
 #define R_0088CC_VGT_ES_PER_GS 0x0088CC
-- 
2.7.4
[Mesa-dev] [PATCH 4/9] r600g: Support emitting scratch ops
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/eg_asm.c   | 3 ++-
 src/gallium/drivers/r600/r600_asm.c | 25 +++-
 src/gallium/drivers/r600/r600_asm.h | 15 ++
 src/gallium/drivers/r600/r700_asm.c | 39 +
 4 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c
index 46683c1..fa2e1d4 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -104,7 +104,8 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
 				S_SQ_CF_ALLOC_EXPORT_WORD1_BARRIER(cf->barrier) |
 				S_SQ_CF_ALLOC_EXPORT_WORD1_CF_INST(opcode) |
 				S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(cf->output.comp_mask) |
-				S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size);
+				S_SQ_CF_ALLOC_EXPORT_WORD1_BUF_ARRAY_SIZE(cf->output.array_size) |
+				S_SQ_CF_ALLOC_EXPORT_WORD1_MARK(cf->output.mark);
 			if (bc->chip_class == EVERGREEN) /* no EOP on cayman */
 				bc->bytecode[id] |= S_SQ_CF_ALLOC_EXPORT_WORD1_END_OF_PROGRAM(cf->end_of_program);
 			id++;
diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index f85993d..7415543 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -1491,6 +1491,9 @@ int cm_bytecode_add_cf_end(struct r600_bytecode *bc)
 /* common to all 3 families */
 static int r600_bytecode_vtx_build(struct r600_bytecode *bc, struct r600_bytecode_vtx *vtx, unsigned id)
 {
+	if (r600_isa_fetch(vtx->op)->flags & FF_MEM)
+		return r700_bytecode_fetch_mem_build(bc, vtx, id);
+
 	bc->bytecode[id] = S_SQ_VTX_WORD0_BUFFER_ID(vtx->buffer_id) |
 			S_SQ_VTX_WORD0_FETCH_TYPE(vtx->fetch_type) |
 			S_SQ_VTX_WORD0_SRC_GPR(vtx->src_gpr) |
@@ -2127,7 +2130,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 					o += print_swizzle(7);
 				}
 
-				if (cf->output.type == V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND)
+				if (cf->output.type == V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND ||
+				    cf->output.type == 3 /*V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_WRITE_IND_ACK */)
 					o += fprintf(stderr, " R%d", cf->output.index_gpr);
 
 				o += print_indent(o, 67);
@@ -2139,6 +2143,10 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 					fprintf(stderr, "NO_BARRIER ");
 				if (cf->end_of_program)
 					fprintf(stderr, "EOP ");
+
+				if (cf->output.mark)
+					fprintf(stderr, "MARK ");
+
 				fprintf(stderr, "\n");
 			} else {
 				fprintf(stderr, "%04d %08X %08X %s ", id, bc->bytecode[id],
@@ -2270,6 +2278,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 
 			o += fprintf(stderr, ", R%d.", vtx->src_gpr);
 			o += print_swizzle(vtx->src_sel_x);
+			if (r600_isa_fetch(vtx->op)->flags & FF_MEM)
+				o += print_swizzle(vtx->src_sel_y);
 
 			if (vtx->offset)
 				fprintf(stderr, " +%db", vtx->offset);
@@ -2286,6 +2296,19 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 			if (bc->chip_class >= EVERGREEN && vtx->buffer_index_mode)
 				fprintf(stderr, "SQ_%s ", index_mode[vtx->buffer_index_mode]);
 
+			if (r600_isa_fetch(vtx->op)->flags & FF_MEM) {
+				if (vtx->uncached)
+					fprintf(stderr, "UNCACHED ");
+				if (vtx->indexed)
+					fprintf(stderr, "INDEXED:%d ", vtx->indexed);
+
+				fprintf(stderr, "ELEM_SIZE:%d ", vtx->elem_size);
+				if (vtx->burst_count)
+					fprintf(stderr, "BURST_COUNT:%d ", vtx->burst_co
[Mesa-dev] [PATCH 2/9] r600g: Add instruction encoding defines for MEM_RD
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r700_sq.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/gallium/drivers/r600/r700_sq.h b/src/gallium/drivers/r600/r700_sq.h
index d881012..81e0e7a 100644
--- a/src/gallium/drivers/r600/r700_sq.h
+++ b/src/gallium/drivers/r600/r700_sq.h
@@ -543,4 +543,34 @@
 #define G_SQ_TEX_WORD2_SRC_SEL_W(x) (((x) >> 29) & 0x7)
 #define C_SQ_TEX_WORD2_SRC_SEL_W 0x1FFF
 
+#define P_SQ_MEM_RD_WORD0
+#define S_SQ_MEM_RD_WORD0_MEM_INST(x) (((x) & 0x1F) << 0)
+#define S_SQ_MEM_RD_WORD0_ELEM_SIZE(x) (((x) & 0x3) << 5)
+#define S_SQ_MEM_RD_WORD0_FETCH_WHOLE_QUAD(x) (((x) & 0x1) << 7)
+#define S_SQ_MEM_RD_WORD0_MEM_OP(x) (((x) & 0x7) << 8)
+#define S_SQ_MEM_RD_WORD0_UNCACHED(x) (((x) & 0x1) << 11)
+#define S_SQ_MEM_RD_WORD0_INDEXED(x) (((x) & 0x1) << 12)
+#define S_SQ_MEM_RD_WORD0_SRC_SEL_Y(x) (((x) & 0x3) << 13)
+#define S_SQ_MEM_RD_WORD0_SRC_GPR(x) (((x) & 0x7F) << 16)
+#define S_SQ_MEM_RD_WORD0_SRC_REL(x) (((x) & 0x1) << 23)
+#define S_SQ_MEM_RD_WORD0_SRC_SEL_X(x) (((x) & 0x3) << 24)
+#define S_SQ_MEM_RD_WORD0_BURST_COUNT(x) (((x) & 0xF) << 26)
+#define S_SQ_MEM_RD_WORD0_LDS_REQ(x) (((x) & 0x1) << 30)
+#define S_SQ_MEM_RD_WORD0_COALESCED_READ(x) (((x) & 0x1) << 31)
+#define P_SQ_MEM_RD_WORD1
+#define S_SQ_MEM_RD_WORD1_DST_GPR(x) (((x) & 0x7f) << 0)
+#define S_SQ_MEM_RD_WORD1_DST_REL(x) (((x) & 0x1) << 7)
+#define S_SQ_MEM_RD_WORD1_DST_SEL_X(x) (((x) & 0x7) << 9)
+#define S_SQ_MEM_RD_WORD1_DST_SEL_Y(x) (((x) & 0x7) << 12)
+#define S_SQ_MEM_RD_WORD1_DST_SEL_Z(x) (((x) & 0x7) << 15)
+#define S_SQ_MEM_RD_WORD1_DST_SEL_W(x) (((x) & 0x7) << 18)
+#define S_SQ_MEM_RD_WORD1_DATA_FORMAT(x) (((x) & 0x3F) << 22)
+#define S_SQ_MEM_RD_WORD1_NUM_FORMAT_ALL(x) (((x) & 0x3) << 28)
+#define S_SQ_MEM_RD_WORD1_FORMAT_COMP_ALL(x) (((x) & 0x1) << 30)
+#define S_SQ_MEM_RD_WORD1_SRF_MODE_ALL(x) (((x) & 0x1) << 31)
+#define P_SQ_MEM_RD_WORD2
+#define S_SQ_MEM_RD_WORD2_ARRAY_BASE(x) (((x) & 0x1FFF) << 0)
+#define S_SQ_MEM_RD_WORD2_ENDIAN_SWAP(x) (((x) & 0x3) << 16)
+#define S_SQ_MEM_RD_WORD2_ARRAY_SIZE(x) (((x) & 0xFFF) << 20)
+
 #endif
-- 
2.7.4
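To see how these field macros combine into an instruction dword, here is a minimal sketch. The five macros are copied from the patch; `demo_pack_mem_rd_word0()` is a hypothetical helper that is not part of the series, and it fills only a subset of WORD0. The constants `MEM_INST = 2` and `MEM_OP = 0` follow what `build_fetch_mem()` in patch 7/9 emits; the argument values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Field macros as they appear in the patch. */
#define S_SQ_MEM_RD_WORD0_MEM_INST(x)    (((x) & 0x1F) << 0)
#define S_SQ_MEM_RD_WORD0_ELEM_SIZE(x)   (((x) & 0x3) << 5)
#define S_SQ_MEM_RD_WORD0_MEM_OP(x)      (((x) & 0x7) << 8)
#define S_SQ_MEM_RD_WORD0_SRC_GPR(x)     (((x) & 0x7F) << 16)
#define S_SQ_MEM_RD_WORD0_BURST_COUNT(x) (((x) & 0xF) << 26)

/* Hypothetical helper: packs a subset of the WORD0 fields for a scratch
 * read. Out-of-range elem_size and burst_count values are silently
 * truncated by the masks in the macros. */
static uint32_t demo_pack_mem_rd_word0(unsigned elem_size, unsigned src_gpr,
				       unsigned burst_count)
{
	/* SRC_GPR is a 7-bit field; catch bad register numbers early. */
	assert(src_gpr < 128);

	return S_SQ_MEM_RD_WORD0_MEM_INST(2u) |
	       S_SQ_MEM_RD_WORD0_ELEM_SIZE(elem_size) |
	       S_SQ_MEM_RD_WORD0_MEM_OP(0u) |
	       S_SQ_MEM_RD_WORD0_SRC_GPR(src_gpr) |
	       S_SQ_MEM_RD_WORD0_BURST_COUNT(burst_count);
}
```

For example, `demo_pack_mem_rd_word0(3, 5, 1)` sets bits 1, 5-6, 16, 18 and 26, giving `0x04050062`.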
[Mesa-dev] [PATCH 3/9] r600g: Add defines for per-shader engine settings
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600d.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h
index 9155076..0d04708 100644
--- a/src/gallium/drivers/r600/r600d.h
+++ b/src/gallium/drivers/r600/r600d.h
@@ -3777,6 +3777,12 @@
 #define SQ_TEX_INST_SAMPLE_C_G_LB 0x1E
 #define SQ_TEX_INST_SAMPLE_C_G_LZ 0x1F
 
+#define EG_0802C_GRBM_GFX_INDEX 0x802C
+#define S_0802C_INSTANCE_INDEX(x) (((x) & 0x) << 0)
+#define S_0802C_SE_INDEX(x) (((x) & 0x3fff) << 16)
+#define S_0802C_INSTANCE_BROADCAST_WRITES(x) (((x) & 0x1) << 30)
+#define S_0802C_SE_BROADCAST_WRITES(x) (((x) & 0x1) << 31)
+
 #define CM_R_028AA8_IA_MULTI_VGT_PARAM 0x028AA8
 #define S_028AA8_PRIMGROUP_SIZE(x) (((unsigned)(x) & 0x) << 0)
 #define G_028AA8_PRIMGROUP_SIZE(x) (((x) >> 0) & 0x)
-- 
2.7.4
[Mesa-dev] r600g: Support spilling temp arrays
This patch series implements support for spilling temporary arrays on R6xx/R7xx/Evergreen/NI if hardware GPR limits are exceeded. It opts for a simple pessimistic scheme: spill the largest arrays until everything fits. This fixes a subset of the issues where "GPR limit exceeded" or "TGSI translation error" is printed to the console.

Exercises left to the reader:
* Test on R600/R700. I suspect R600 in particular might need some additional fixups for write masking in tgsi_src().
* Implement support for spilling regular TGSI temps. Most of the infrastructure needed is in this patch series, so this should be straightforward. This would fix the remaining "GPR limit exceeded" issues.
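The "spill the largest arrays until things fit" scheme can be sketched on its own. This is a minimal illustration, not the code from patch 9/9: `struct demo_array` and `demo_choose_spills()` are hypothetical names, and the GPR limit is a parameter here (the patch compares against 124 in its loop). It assumes every array's registers are already counted in `*gprs_used`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one temp array. */
struct demo_array {
	unsigned size;    /* number of vec4 registers the array occupies */
	bool     spilled; /* set once the array has been moved to scratch */
};

/* Greedily spills the largest unspilled array until *gprs_used fits under
 * gpr_limit or no arrays remain. Returns the number of vec4 scratch slots
 * needed for the spilled arrays. */
static unsigned demo_choose_spills(struct demo_array *arrays, size_t n,
				   unsigned *gprs_used, unsigned gpr_limit)
{
	unsigned scratch = 0;

	assert(arrays != NULL || n == 0);

	while (*gprs_used > gpr_limit) {
		size_t best = n; /* n means "no candidate found yet" */

		for (size_t i = 0; i < n; i++) {
			if (arrays[i].spilled)
				continue;
			if (best == n || arrays[i].size > arrays[best].size)
				best = i;
		}

		if (best == n)
			break; /* nothing left to spill, limit still exceeded */

		/* Assumption: each array was counted in *gprs_used. */
		assert(arrays[best].size <= *gprs_used);

		arrays[best].spilled = true;
		*gprs_used -= arrays[best].size;
		scratch += arrays[best].size;
	}

	return scratch;
}
```

The scheme is pessimistic in the sense that a whole array goes to scratch even if spilling a smaller one, or only part of one, would have been enough; in exchange the selection logic stays a few lines long.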
[Mesa-dev] [PATCH 5/9] r600g: Add pending output function
Spills have to happen after the VLIW bundle currently processed, so defer emitting the spill op.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_asm.c | 18 ++
 src/gallium/drivers/r600/r600_asm.h | 4
 2 files changed, 22 insertions(+)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 7415543..69bd0d6 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -235,6 +235,15 @@ int r600_bytecode_add_output(struct r600_bytecode *bc,
 	return 0;
 }
 
+int r600_bytecode_add_pending_output(struct r600_bytecode *bc,
+		const struct r600_bytecode_output *output)
+{
+	assert(bc->n_pending_outputs + 1 < ARRAY_SIZE(bc->pending_outputs));
+	bc->pending_outputs[bc->n_pending_outputs++] = *output;
+
+	return 0;
+}
+
 /* alu instructions that can ony exits once per group */
 static int is_alu_once_inst(struct r600_bytecode *bc, struct r600_bytecode_alu *alu)
 {
@@ -1304,6 +1313,15 @@ int r600_bytecode_add_alu_type(struct r600_bytecode *bc,
 	if (nalu->dst.rel && bc->r6xx_nop_after_rel_dst)
 		insert_nop_r6xx(bc);
 
+	/* Might need to insert spill write ops after current clause */
+	if (nalu->last && bc->n_pending_outputs) {
+		while (bc->n_pending_outputs) {
+			r = r600_bytecode_add_output(bc, &bc->pending_outputs[--bc->n_pending_outputs]);
+			if (r)
+				return r;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/src/gallium/drivers/r600/r600_asm.h b/src/gallium/drivers/r600/r600_asm.h
index 87a7c3a..df46db7 100644
--- a/src/gallium/drivers/r600/r600_asm.h
+++ b/src/gallium/drivers/r600/r600_asm.h
@@ -261,6 +261,8 @@ struct r600_bytecode {
 	unsigned	index_reg[2]; /* indexing register CF_INDEX_[01] */
 	unsigned	debug_id;
 	struct r600_isa* isa;
+	struct r600_bytecode_output pending_outputs[5];
+	int n_pending_outputs;
 };
 
 /* eg_asm.c */
@@ -285,6 +287,8 @@ int r600_bytecode_add_gds(struct r600_bytecode *bc,
 		const struct r600_bytecode_gds *gds);
 int r600_bytecode_add_output(struct r600_bytecode *bc,
 		const struct r600_bytecode_output *output);
+int r600_bytecode_add_pending_output(struct r600_bytecode *bc,
+		const struct r600_bytecode_output *output);
 int r600_bytecode_build(struct r600_bytecode *bc);
 int r600_bytecode_add_cf(struct r600_bytecode *bc);
 int r600_bytecode_add_cfinst(struct r600_bytecode *bc,
-- 
2.7.4
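The deferral idea in this patch (queue the spill write, then emit it once the last ALU instruction of the group has been added) can be sketched with a small fixed-size queue. All names below (`demo_bytecode`, `demo_add_pending_output()`, `demo_flush_pending()`) are hypothetical and only mimic the shape of the patch; the sketch returns an error on overflow where the patch uses an assert.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PENDING 5
#define DEMO_MAX_EMITTED 16

/* Hypothetical, heavily reduced stand-in for an output instruction. */
struct demo_output {
	unsigned gpr;
};

struct demo_bytecode {
	struct demo_output emitted[DEMO_MAX_EMITTED];
	size_t n_emitted;
	struct demo_output pending[DEMO_MAX_PENDING];
	size_t n_pending;
};

/* Emits an output immediately. */
static int demo_add_output(struct demo_bytecode *bc, const struct demo_output *out)
{
	assert(bc != NULL && out != NULL);
	if (bc->n_emitted >= DEMO_MAX_EMITTED)
		return -1;
	bc->emitted[bc->n_emitted++] = *out;
	return 0;
}

/* Queues an output that must not be emitted until the current ALU group
 * is complete. */
static int demo_add_pending_output(struct demo_bytecode *bc, const struct demo_output *out)
{
	assert(bc != NULL && out != NULL);
	if (bc->n_pending >= DEMO_MAX_PENDING)
		return -1;
	bc->pending[bc->n_pending++] = *out;
	return 0;
}

/* Meant to be called when the last ALU instruction of a group is added.
 * Like the patch, it drains the queue from the back, so outputs are
 * emitted in reverse order of queuing. */
static int demo_flush_pending(struct demo_bytecode *bc)
{
	assert(bc != NULL);
	while (bc->n_pending) {
		int r = demo_add_output(bc, &bc->pending[--bc->n_pending]);
		if (r)
			return r;
	}
	return 0;
}
```

A caller would queue with `demo_add_pending_output()` while translating an instruction and call `demo_flush_pending()` at the point where the patch checks `nalu->last`.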
[Mesa-dev] [PATCH 7/9] r600g/sb: Support scratch ops
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_bc.h           | 11 ++
 src/gallium/drivers/r600/sb/sb_bc_builder.cpp | 46 -
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp | 49 ++-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp    | 15
 src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc | 36
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h
index 2c662ac..74c8699 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -580,6 +580,15 @@ struct bc_fetch {
 	unsigned mega_fetch:1;
 
 	unsigned src2_gpr:7; /* for GDS */
+
+	/* for MEM ops */
+	unsigned elem_size:2;
+	unsigned uncached:1;
+	unsigned indexed:1;
+	unsigned burst_count:4;
+	unsigned array_base:13;
+	unsigned array_size:12;
+
 	void set_op(unsigned op) { this->op = op; op_ptr = r600_isa_fetch(op); }
 };
 
@@ -747,6 +756,7 @@ private:
 
 	int decode_fetch_vtx(unsigned &i, bc_fetch &bc);
 	int decode_fetch_gds(unsigned &i, bc_fetch &bc);
+	int decode_fetch_mem(unsigned &i, bc_fetch &bc);
 };
 
 // bytecode format definition
@@ -966,6 +976,7 @@ private:
 	int build_fetch_clause(cf_node *n);
 	int build_fetch_tex(fetch_node *n);
 	int build_fetch_vtx(fetch_node *n);
+	int build_fetch_mem(fetch_node* n);
 };
 
 } // namespace r600_sb
diff --git a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
index b0df3d9..678844c 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_builder.cpp
@@ -129,7 +129,9 @@ int bc_builder::build_fetch_clause(cf_node* n) {
 			I != E; ++I) {
 		fetch_node *f = static_cast<fetch_node*>(*I);
 
-		if (f->bc.op_ptr->flags & FF_VTX)
+		if (f->bc.op_ptr->flags & FF_MEM)
+			build_fetch_mem(f);
+		else if (f->bc.op_ptr->flags & FF_VTX)
 			build_fetch_vtx(f);
 		else
 			build_fetch_tex(f);
@@ -657,4 +659,46 @@ int bc_builder::build_fetch_vtx(fetch_node* n) {
 	return 0;
 }
 
+int bc_builder::build_fetch_mem(fetch_node* n) {
+	const bc_fetch &bc = n->bc;
+	const fetch_op_info *fop = bc.op_ptr;
+
+	assert(fop->flags & FF_MEM);
+
+	bb << MEM_RD_WORD0_R7EGCM()
+		.MEM_INST(2)
+		.ELEM_SIZE(bc.elem_size)
+		.FETCH_WHOLE_QUAD(bc.fetch_whole_quad)
+		.MEM_OP(0)
+		.UNCACHED(bc.uncached)
+		.INDEXED(bc.indexed)
+		.SRC_SEL_Y(bc.src_sel[1])
+		.SRC_GPR(bc.src_gpr)
+		.SRC_REL(bc.src_rel)
+		.SRC_SEL_X(bc.src_sel[0])
+		.BURST_COUNT(bc.burst_count)
+		.LDS_REQ(bc.lds_req)
+		.COALESCED_READ(bc.coalesced_read);
+
+	bb << MEM_RD_WORD1_R7EGCM()
+		.DST_GPR(bc.dst_gpr)
+		.DST_REL(bc.dst_rel)
+		.DST_SEL_X(bc.dst_sel[0])
+		.DST_SEL_Y(bc.dst_sel[1])
+		.DST_SEL_Z(bc.dst_sel[2])
+		.DST_SEL_W(bc.dst_sel[3])
+		.DATA_FORMAT(bc.data_format)
+		.NUM_FORMAT_ALL(bc.num_format_all)
+		.FORMAT_COMP_ALL(bc.format_comp_all)
+		.SRF_MODE_ALL(bc.srf_mode_all);
+
+	bb << MEM_RD_WORD2_R7EGCM()
+		.ARRAY_BASE(bc.array_base)
+		.ENDIAN_SWAP(bc.endian_swap)
+		.ARR_SIZE(bc.array_size);
+
+	bb << 0;
+	return 0;
+}
+
 }
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 8712abe..1c63c38 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -413,7 +413,9 @@ int bc_decoder::decode_fetch(unsigned & i, bc_fetch& bc) {
 	if (fetch_opcode == 2) { // MEM_INST_MEM
 		unsigned mem_op = (dw0 >> 8) & 0x7;
 		unsigned gds_op;
-		if (mem_op == 4) {
+		if (mem_op == 0 || mem_op == 2) {
+			fetch_opcode = mem_op == 0 ? FETCH_OP_READ_SCRATCH : FETCH_OP_READ_MEM;
+		} else if (mem_op == 4) {
 			gds_op = (dw1 >> 9) & 0x1f;
 			fetch_opcode = FETCH_OP_GDS_ADD + gds_op;
 		} else if (mem_op == 5)
@@ -422,6 +424,9 @@ int bc_decoder::decode_fetch(unsigned & i, bc_fetch& bc) {
 	} else
 		bc.set_op(r600_isa_fetch_by_opcode(ctx.isa, fetch_opcode));
 
+	if (bc.op_ptr->flags & FF_MEM)
+		return decode_fetch_mem(i, bc);
+
Re: [Mesa-dev] [PATCH 1/1] r600: Enable FMA on chips that support it
On Wed, 15 Jun 2016 20:13:13 +0200, Jan Veselywrote: Signed-off-by: Jan Vesely --- Untested (I don't have the required hw) src/gallium/drivers/r600/r600_pipe.c | 5 - src/gallium/drivers/r600/r600_shader.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index a49b00f..49c3e1d 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -548,7 +548,6 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e return 0; case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED: case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED: - case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: case PIPE_SHADER_CAP_MAX_SHADER_BUFFERS: case PIPE_SHADER_CAP_MAX_SHADER_IMAGES: return 0; @@ -558,6 +557,10 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e *https://bugs.freedesktop.org/show_bug.cgi?id=86720 */ return 255; + case PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED: + // Enable on CYPRESS(EG) and CAYMAN(NI) + return rscreen->b.family == CHIP_CYPRESS || + rscreen->b.family == CHIP_CAYMAN; } return 0; } diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 101f666..35019e3 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -8917,7 +8917,7 @@ static const struct r600_shader_tgsi_instruction r600_shader_tgsi_instruction[] [TGSI_OPCODE_MAD] = { ALU_OP3_MULADD, tgsi_op3}, [TGSI_OPCODE_SUB] = { ALU_OP2_ADD, tgsi_op2}, [TGSI_OPCODE_LRP] = { ALU_OP0_NOP, tgsi_lrp}, - [TGSI_OPCODE_FMA] = { ALU_OP0_NOP, tgsi_unsupported}, + [TGSI_OPCODE_FMA] = { ALU_OP3_FMA, tgsi_op3}, [TGSI_OPCODE_SQRT] = { ALU_OP1_SQRT_IEEE, tgsi_trans_srcx_replicate}, [TGSI_OPCODE_DP2A] = { ALU_OP0_NOP, tgsi_unsupported}, [22]= { ALU_OP0_NOP, tgsi_unsupported}, You probably meant to add the opcode to the eg_shader_tgsi_instruction and cm_shader_tgsi_instruction opcode tables 
rather than the R600/R700 one? I'll also note in passing that FMA on CYPRESS/HEMLOCK has an issue rate of 4/cycle vs. 5/cycle for MULADD, since FMA cannot be issued in the 't' slot. That may or may not affect performance, depending on whether the GLSL front end decides to use fma for mul+add operations. On Cayman/Aruba they have the same rate.

/Glenn
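To illustrate the review comment about opcode tables: the reply names separate tables per hardware generation (the R600/R700 one, eg_shader_tgsi_instruction and cm_shader_tgsi_instruction), so an entry added to one of them does not show up in lookups for the others. A rough sketch of that structure, with hypothetical enums and a boolean "translation exists" flag standing in for the real table entries:

```c
#include <stdbool.h>

/* Hypothetical chip classes and opcodes, for illustration only. */
enum chip_class { CLASS_R600_R700, CLASS_EVERGREEN, CLASS_CAYMAN, CLASS_COUNT };
enum opcode { OP_MAD, OP_FMA, OP_COUNT };

/* One opcode table per hardware generation, indexed by opcode.
 * The values are assumptions made for this sketch. */
static const bool op_table[CLASS_COUNT][OP_COUNT] = {
	[CLASS_R600_R700] = { [OP_MAD] = true, [OP_FMA] = false },
	[CLASS_EVERGREEN] = { [OP_MAD] = true, [OP_FMA] = true },
	[CLASS_CAYMAN]    = { [OP_MAD] = true, [OP_FMA] = true },
};

/* The lookup goes through the row that matches the chip, so an entry
 * added only to the R600/R700 row never reaches Evergreen or Cayman. */
static bool op_translatable(enum chip_class c, enum opcode op)
{
	return op_table[c][op];
}
```

In this sketch a table entry only says that a translation exists. Which chips actually advertise FMA is a separate decision, made in the patch by the PIPE_SHADER_CAP_TGSI_FMA_SUPPORTED query.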
Re: [Mesa-dev] [PATCH 2/2] st/mesa: add GL_ARB_shader_atomic_counter_ops support
On Thu, 10 Mar 2016 18:13:03 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote:

> On Thu, Mar 10, 2016 at 12:04 PM, Glenn Kennard <glenn.kenn...@gmail.com> wrote:
>> On Thu, 10 Mar 2016 17:02:15 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote:
>>> On Thu, Mar 10, 2016 at 10:57 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>>>>> -   if (c->MaxCombinedAtomicBuffers > 0)
>>>>> +   if (c->MaxCombinedAtomicBuffers > 0) {
>>>>>        extensions->ARB_shader_atomic_counters = GL_TRUE;
>>>>> +      extensions->ARB_shader_atomic_counter_ops = GL_TRUE;
>>>>> +   }
>>>>
>>>> I believe there's pre-GCN AMD hardware which can support atomic counters but not atomic_counter_ops (at least according to what the closed driver exposes, I haven't actually checked the docs), so there should probably be a capability flag here.
>>>
>>> I assumed this was due to laziness... seems odd if the SSBO atomic ops can be supported, but those same ops can't be supported on atomic buffers. Glenn / Dave - do you guys happen to know what the pre-GCN hw is capable of?
>>>
>>> -ilia
>>
>> AFAIK Cayman supports atomic counter ops on SSBOs, evergreen only on counter buffers, and earlier hardware does neither.
>
> To phrase this a different way, my patch is fine? :) If you support atomic counters, you support all the various ops in ARB_shader_atomic_counter_ops (which are basically all the SSBO ops, but on atomic counters)?

I think so, though the closed driver exposes ARB_shader_atomic_counter_ops only on Cayman, which may be a hint of something. Cross that bridge when we get there...

/Glenn
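For reference, the capability-flag alternative raised earlier in this thread could look roughly like the sketch below. It is only an illustration of the idea: the struct layouts and the atomic_counter_ops field are hypothetical, and no such capability is defined anywhere in the messages above.

```c
#include <stdbool.h>

/* Hypothetical driver-reported limits; atomic_counter_ops is an
 * invented flag used only for this sketch. */
struct driver_caps {
	unsigned max_combined_atomic_buffers;
	bool atomic_counter_ops;
};

/* Hypothetical subset of the extension table. */
struct gl_exts {
	bool ARB_shader_atomic_counters;
	bool ARB_shader_atomic_counter_ops;
};

/* Expose the counter ops only when counters exist and the driver
 * additionally reports that it can do the extra operations. */
static void init_atomic_exts(const struct driver_caps *c, struct gl_exts *e)
{
	e->ARB_shader_atomic_counters = c->max_combined_atomic_buffers > 0;
	e->ARB_shader_atomic_counter_ops =
		e->ARB_shader_atomic_counters && c->atomic_counter_ops;
}
```

The patch as posted takes the simpler route and ties both extensions to MaxCombinedAtomicBuffers alone, which the thread ends up accepting.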
Re: [Mesa-dev] [PATCH] r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.
The patch makes a bit more sense to me after realizing a fallthrough was changed to a break, so the whole patch is

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
Re: [Mesa-dev] [PATCH 2/2] st/mesa: add GL_ARB_shader_atomic_counter_ops support
On Thu, 10 Mar 2016 17:02:15 +0100, Ilia Mirkin wrote:

> On Thu, Mar 10, 2016 at 10:57 AM, Nicolai Hähnle wrote:
>>> -   if (c->MaxCombinedAtomicBuffers > 0)
>>> +   if (c->MaxCombinedAtomicBuffers > 0) {
>>>        extensions->ARB_shader_atomic_counters = GL_TRUE;
>>> +      extensions->ARB_shader_atomic_counter_ops = GL_TRUE;
>>> +   }
>>
>> I believe there's pre-GCN AMD hardware which can support atomic counters but not atomic_counter_ops (at least according to what the closed driver exposes, I haven't actually checked the docs), so there should probably be a capability flag here.
>
> I assumed this was due to laziness... seems odd if the SSBO atomic ops can be supported, but those same ops can't be supported on atomic buffers. Glenn / Dave - do you guys happen to know what the pre-GCN hw is capable of?
>
> -ilia

AFAIK Cayman supports atomic counter ops on SSBOs, evergreen only on counter buffers, and earlier hardware does neither.

/Glenn
Re: [Mesa-dev] [PATCH] r600/sb: Do not distribute neg in expr_handler::fold_assoc() when folding multiplications.
On Wed, 09 Mar 2016 09:58:48 +0100, Xavier B <xavi...@gmail.com> wrote:

> From: xavier <xavi...@gmail.com>
>
> Previously it was doing this transformation for a Trine 3 shader:
>  MUL R6.x.12,R13.x.23, 0.5|3f00
> -MULADD R4.x.12,-R6.x.12, 2|4000, 1|3f80
> +MULADD R4.x.12,-R13.x.23, -1|bf80, 1|3f80
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94412
> Signed-off-by: Xavier Bouchoux <xavi...@gmail.com>
> ---
>  src/gallium/drivers/r600/sb/sb_expr.cpp | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_expr.cpp b/src/gallium/drivers/r600/sb/sb_expr.cpp
> index 556a05d..3dd3a48 100644
> --- a/src/gallium/drivers/r600/sb/sb_expr.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_expr.cpp
> @@ -598,9 +598,13 @@ bool expr_handler::fold_assoc(alu_node *n) {
>  	unsigned op = n->bc.op;
>  	bool allow_neg = false, cur_neg = false;
> +	bool distribute_neg = false;
>  	switch(op) {
>  	case ALU_OP2_ADD:
> +		distribute_neg = true;
> +		allow_neg = true;

I'm not sure this change belongs in this patch, or even if it's correct.

> +		break;
>  	case ALU_OP2_MUL:
>  	case ALU_OP2_MUL_IEEE:
>  		allow_neg = true;
> @@ -632,7 +636,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
>  		if (v1->is_const()) {
>  			literal arg = v1->get_const_value();
>  			apply_alu_src_mod(a->bc, 1, arg);
> -			if (cur_neg)
> +			if (cur_neg && distribute_neg)
>  				arg.f = -arg.f;
>  			if (a == n)
> @@ -660,7 +664,7 @@ bool expr_handler::fold_assoc(alu_node *n) {
>  		if (v0->is_const()) {
>  			literal arg = v0->get_const_value();
>  			apply_alu_src_mod(a->bc, 0, arg);
> -			if (cur_neg)
> +			if (cur_neg && distribute_neg)
>  				arg.f = -arg.f;
>  			if (last_arg == 0) {

With the allow_neg change removed, patch is

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
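A small numeric check of the transformation quoted in the commit message may help here. The sketch below models the three instruction sequences as plain float arithmetic; the function names are made up for this note, and the constants are the 0.5, 2 and 1 from the shader dump. The point is that when constants are folded through a multiplication, the negate stays on the source operand exactly once, so also negating the folded constant flips the sign of the product.

```c
/* Original sequence: R6 = R13 * 0.5; R4 = -R6 * 2 + 1. */
static float unfolded(float r13)
{
	float r6 = r13 * 0.5f;
	return -r6 * 2.0f + 1.0f;
}

/* Fold that keeps the negate on the source and leaves the folded
 * constant (0.5 * 2 = 1) alone. */
static float folded_ok(float r13)
{
	return -r13 * (0.5f * 2.0f) + 1.0f;
}

/* Fold that additionally negates the constant, matching the
 * "-R13 * -1 + 1" output shown in the commit message. */
static float folded_bad(float r13)
{
	return -r13 * -(0.5f * 2.0f) + 1.0f;
}
```

For an input of 3.0 the first two functions give -2.0 while the last gives 4.0. For additions the situation differs, since -(a + c1) + c2 equals -a + (-c1 + c2), which is presumably why the patch sets distribute_neg only in the ALU_OP2_ADD case.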
[Mesa-dev] [PATCH] r600g: Add support for PK2H/UP2H
Based off of Ilia's original patch, but with output values replicated so that it matches the TGSI semantics. Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com> --- src/gallium/drivers/r600/r600_pipe.c | 2 +- src/gallium/drivers/r600/r600_shader.c | 107 +++-- 2 files changed, 104 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index d71082f..3b5d26c 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -328,6 +328,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_QUERY_LOD: case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: case PIPE_CAP_SAMPLER_VIEW_TARGET: + case PIPE_CAP_TGSI_PACK_HALF_FLOAT: return family >= CHIP_CEDAR ? 1 : 0; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family >= CHIP_CEDAR ? 4 : 0; @@ -349,7 +350,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_SHAREABLE_SHADERS: case PIPE_CAP_CLEAR_TEXTURE: case PIPE_CAP_DRAW_PARAMETERS: - case PIPE_CAP_TGSI_PACK_HALF_FLOAT: return 0; case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 9c040ae..7b1eade 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -8960,6 +8960,105 @@ static int tgsi_umad(struct r600_shader_ctx *ctx) return 0; } +static int tgsi_pk2h(struct r600_shader_ctx *ctx) +{ + struct tgsi_full_instruction *inst = >parse.FullToken.FullInstruction; + struct r600_bytecode_alu alu; + int r, i; + int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); + + /* temp.xy = f32_to_f16(src) */ + memset(, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP1_FLT32_TO_FLT16; + alu.dst.chan = 0; + alu.dst.sel = ctx->temp_reg; + alu.dst.write = 1; + r600_bytecode_src([0], >src[0], 0); + r = r600_bytecode_add_alu(ctx->bc, ); + if (r) + return r; + 
alu.dst.chan = 1; + r600_bytecode_src([0], >src[0], 1); + alu.last = 1; + r = r600_bytecode_add_alu(ctx->bc, ); + if (r) + return r; + + /* dst.x = temp.y * 0x1 + temp.x */ + for (i = 0; i < lasti + 1; i++) { + if (!(inst->Dst[0].Register.WriteMask & (1 << i))) + continue; + + memset(, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP3_MULADD_UINT24; + alu.is_op3 = 1; + tgsi_dst(ctx, >Dst[0], i, ); + alu.last = i == lasti; + alu.src[0].sel = ctx->temp_reg; + alu.src[0].chan = 1; + alu.src[1].sel = V_SQ_ALU_SRC_LITERAL; + alu.src[1].value = 0x1; + alu.src[2].sel = ctx->temp_reg; + alu.src[2].chan = 0; + r = r600_bytecode_add_alu(ctx->bc, ); + if (r) + return r; + } + + return 0; +} + +static int tgsi_up2h(struct r600_shader_ctx *ctx) +{ + struct tgsi_full_instruction *inst = >parse.FullToken.FullInstruction; + struct r600_bytecode_alu alu; + int r, i; + int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask); + + /* temp.x = src.x */ + /* note: no need to mask out the high bits */ + memset(, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP1_MOV; + alu.dst.chan = 0; + alu.dst.sel = ctx->temp_reg; + alu.dst.write = 1; + r600_bytecode_src([0], >src[0], 0); + r = r600_bytecode_add_alu(ctx->bc, ); + if (r) + return r; + + /* temp.y = src.x >> 16 */ + memset(, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP2_LSHR_INT; + alu.dst.chan = 1; + alu.dst.sel = ctx->temp_reg; + alu.dst.write = 1; + r600_bytecode_src([0], >src[0], 0); + alu.src[1].sel = V_SQ_ALU_SRC_LITERAL; + alu.src[1].value = 16; + alu.last = 1; + r = r600_bytecode_add_alu(ctx->bc, ); + if (r) + return r; + + /* dst.wz = dst.xy = f16_to_f32(temp.xy) */ + for (i = 0; i < lasti + 1; i++) { + if (!(inst->Dst[0].Register.WriteMask & (1 << i))) + continue; + memset(, 0, sizeof(struct r600_bytecode_alu)); + tgsi_dst(ctx, >Dst[0], i, ); + alu.op = ALU_OP1_FLT16_TO_FLT32; + alu.src[0].sel = ctx->temp_reg; + alu.src[0].chan = i % 2; + alu.last = i == las
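The integer side of the two helpers above can be sketched in plain C. Note one assumption: the flattened patch text reads "temp.y * 0x1 + temp.x", but placing temp.y in the upper half of a 32-bit word needs a 16-bit shift, so this sketch uses a multiplier of 0x10000. That is a guess about what the extraction dropped, not something the text above shows. The float/half conversions themselves are not modelled, and the function names are invented for this note.

```c
#include <stdint.h>

/* Pack two 16-bit values into one word: hi * 0x10000 + lo, which is
 * the multiply-add form of (hi << 16) | lo. */
static uint32_t pack2x16(uint16_t lo, uint16_t hi)
{
	return (uint32_t)hi * 0x10000u + lo;
}

/* Low half: the truncation to 16 bits drops the upper bits. */
static uint16_t unpack_lo(uint32_t v)
{
	return (uint16_t)v;
}

/* High half: shift right by 16, as in the "temp.y = src.x >> 16" step. */
static uint16_t unpack_hi(uint32_t v)
{
	return (uint16_t)(v >> 16);
}
```

With the half-float bit patterns 0x3C00 and 0x4000 as inputs, packing yields 0x40003C00 and unpacking returns the two halves unchanged.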
Re: [Mesa-dev] [PATCH] util/macros: Simplify DIV_ROUND_UP() definition
On Wed, 16 Dec 2015 20:57:51 +0100, Nanley Chery wrote:

> From: Nanley Chery
>
> Commit 64880d073ab21ae1abad0c049ea2d6a1169a3cfa consolidated two
> DIV_ROUND_UP() definitions to one, but chose the more compute-intensive
> version in the process. Use the simpler version instead. Reduces .text
> size by 1360 bytes.
>
> Output of `size lib/i965_dri.so`:
>
>    text    data     bss     dec     hex filename
> 7850440  219264   27240 8096944  7b8cb0 lib/i965_dri.so (before)
> 7849080  219264   27240 8095584  7b8760 lib/i965_dri.so (after)
>
> Cc: Axel Davy
> Signed-off-by: Nanley Chery
> ---
>  src/util/macros.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/util/macros.h b/src/util/macros.h
> index 0c8958f..53a98a0 100644
> --- a/src/util/macros.h
> +++ b/src/util/macros.h
> @@ -211,6 +211,6 @@ do { \
>  #endif
>
>  /** Compute ceiling of integer quotient of A divided by B. */
> -#define DIV_ROUND_UP( A, B ) ( (A) % (B) == 0 ? (A)/(B) : (A)/(B)+1 )
> +#define DIV_ROUND_UP(A, B) (((A) + (B) - 1) / (B))
>
>  #endif /* UTIL_MACROS_H */

I'll point out that these are not equivalent: one can overflow and the other can't. You probably want to check whether the call sites have sufficient checks for that before substituting one for the other.

/Glenn
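The difference between the two definitions is easy to show with unsigned 32-bit operands. In the sketch below the macro suffixes _MOD and _ADD and the wrapper functions are names made up for this note; the macro bodies are the two versions from the patch.

```c
#include <stdint.h>

/* The two definitions from the patch, renamed so both can coexist. */
#define DIV_ROUND_UP_MOD(A, B) ((A) % (B) == 0 ? (A) / (B) : (A) / (B) + 1)
#define DIV_ROUND_UP_ADD(A, B) (((A) + (B) - 1) / (B))

/* Wrappers that pin the operand type to 32-bit unsigned. */
static uint32_t div_round_up_mod(uint32_t a, uint32_t b)
{
	return DIV_ROUND_UP_MOD(a, b);
}

static uint32_t div_round_up_add(uint32_t a, uint32_t b)
{
	/* a + b - 1 wraps around when a is close to UINT32_MAX. */
	return DIV_ROUND_UP_ADD(a, b);
}
```

For small inputs such as 10 and 4 both return 3. For UINT32_MAX and 2 the remainder-based version returns 0x80000000, while the addition-based one wraps to 0 before dividing. With signed operands the intermediate overflow would be undefined behaviour rather than a wrap, which is why the call sites matter.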
Re: [Mesa-dev] [PATCH 11/53] r600/sb: add support for GDS to the sb decoder/dump.
OOP, 0); - s << "."; - for (int k = 0; k < 4; ++k) - s << chans[n.bc.dst_sel[k]]; - s << ", "; + if (!gds) { + s << "R"; + print_sel(s, n.bc.dst_gpr, n.bc.dst_rel, INDEX_LOOP, 0); + s << "."; + for (int k = 0; k < 4; ++k) + s << chans[n.bc.dst_sel[k]]; + s << ", "; + } s << "R"; print_sel(s, n.bc.src_gpr, n.bc.src_rel, INDEX_LOOP, 0); s << "."; unsigned vtx = n.bc.op_ptr->flags & FF_VTX; - unsigned num_src_comp = vtx ? ctx.is_cayman() ? 2 : 1 : 4; + unsigned num_src_comp = gds ? 3 : vtx ? ctx.is_cayman() ? 2 : 1 : 4; for (unsigned k = 0; k < num_src_comp; ++k) s << chans[n.bc.src_sel[k]]; @@ -450,9 +453,12 @@ void bc_dump::dump(fetch_node& n) { s << " + " << n.bc.offset[0] << "b "; } - s << ", RID:" << n.bc.resource_id; + if (!gds) + s << ", RID:" << n.bc.resource_id; + + if (gds) { - if (vtx) { + } else if (vtx) { s << " " << fetch_type[n.bc.fetch_type]; if (!ctx.is_cayman() && n.bc.mega_fetch_count) s << " MFC:" << n.bc.mega_fetch_count; diff --git a/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc b/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc index 50f73d7..e775499 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc +++ b/src/gallium/drivers/r600/sb/sb_bc_fmt_def.inc @@ -541,3 +541,31 @@ BC_FIELD(TEX_WORD2, SRC_SEL_Y, SSY, 25, 23) BC_FIELD(TEX_WORD2, SRC_SEL_Z, SSZ,28, 26) BC_FIELD(TEX_WORD2, SRC_SEL_W, SSW,31, 29) BC_FORMAT_END(TEX_WORD2) + +BC_FORMAT_BEGIN_HW(MEM_GDS_WORD0, EGCM) +BC_FIELD(MEM_GDS_WORD0, MEM_INST, M_INST, 4, 0) +BC_FIELD(MEM_GDS_WORD0, MEM_OP,M_OP, 10, 8) +BC_FIELD(MEM_GDS_WORD0, SRC_GPR,S_GPR, 17, 11) +BC_FIELD(MEM_GDS_WORD0, SRC_REL,SR,19, 18) +BC_FIELD(MEM_GDS_WORD0, SRC_SEL_X, SSX, 22, 20) +BC_FIELD(MEM_GDS_WORD0, SRC_SEL_Y, SSY, 25, 23) +BC_FIELD(MEM_GDS_WORD0, SRC_SEL_Z, SSZ, 28, 26) +BC_FORMAT_END(MEM_GDS_WORD0) + +BC_FORMAT_BEGIN_HW(MEM_GDS_WORD1, EGCM) +BC_FIELD(MEM_GDS_WORD1, DST_GPR,D_GPR, 6, 0) +BC_FIELD(MEM_GDS_WORD1, DST_REL,DR, 8, 7) +BC_FIELD(MEM_GDS_WORD1, GDS_OP, G_OP, 14, 9) +BC_FIELD(MEM_GDS_WORD1, SRC_GPR,S_GPR, 22, 
16) +BC_FIELD(MEM_GDS_WORD1, UAV_INDEX_MODE, U_IM, 25, 24) +BC_FIELD(MEM_GDS_WORD1, UAV_ID, U_ID, 29, 26) +BC_FIELD(MEM_GDS_WORD1, ALLOC_CONSUME, AC,30, 30) +BC_FIELD(MEM_GDS_WORD1, BCARD_FIRST_REQ,BFR, 31, 31) +BC_FORMAT_END(MEM_GDS_WORD1) + +BC_FORMAT_BEGIN_HW(MEM_GDS_WORD2, EGCM) +BC_FIELD(MEM_GDS_WORD2, DST_SEL_X, DSX,2, 0) +BC_FIELD(MEM_GDS_WORD2, DST_SEL_Y, DSY,5, 3) +BC_FIELD(MEM_GDS_WORD2, DST_SEL_Z, DSZ,8, 6) +BC_FIELD(MEM_GDS_WORD2, DST_SEL_W, DSW, 11, 9) +BC_FORMAT_END(MEM_GDS_WORD2) \ No newline at end of file With src_rel/dst_rel dealt with as suggested above, Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 20/53] r600/sb: add LS/HS hw shader types.
On Mon, 30 Nov 2015 07:20:29 +0100, Dave Airlie <airl...@gmail.com> wrote:

> From: Dave Airlie <airl...@redhat.com>
>
> This just adds printing for the hw shader types, and hooks it up.
>
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  src/gallium/drivers/r600/sb/sb_bc.h | 2 ++
>  src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 6 --
>  src/gallium/drivers/r600/sb/sb_shader.cpp | 4 +++-
>  3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h
> index d2e8da0..34e1e58 100644
> --- a/src/gallium/drivers/r600/sb/sb_bc.h
> +++ b/src/gallium/drivers/r600/sb/sb_bc.h
> @@ -174,6 +174,8 @@ enum shader_target
>  	TARGET_GS_COPY,
>  	TARGET_COMPUTE,
>  	TARGET_FETCH,
> +	TARGET_HS,
> +	TARGET_LS,
>  	TARGET_NUM
>  };
> diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
> index 28ebfa2..65aa801 100644
> --- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
> @@ -58,10 +58,12 @@ int bc_parser::decode() {
>  		switch (bc->type) {
>  		case TGSI_PROCESSOR_FRAGMENT: t = TARGET_PS; break;
>  		case TGSI_PROCESSOR_VERTEX:
> -			t = pshader->vs_as_es ? TARGET_ES : TARGET_VS;
> +			t = pshader->vs_as_ls ? TARGET_LS : (pshader->vs_as_es ? TARGET_ES : TARGET_VS);
>  			break;
>  		case TGSI_PROCESSOR_GEOMETRY: t = TARGET_GS; break;
>  		case TGSI_PROCESSOR_COMPUTE: t = TARGET_COMPUTE; break;
> +		case TGSI_PROCESSOR_TESS_CTRL: t = TARGET_HS; break;
> +		case TGSI_PROCESSOR_TESS_EVAL: t = pshader->tes_as_es ? TARGET_ES : TARGET_VS; break;
>  		default: assert(!"unknown shader target"); return -1; break;
>  		}
>  	} else {
> @@ -146,7 +148,7 @@ int bc_parser::parse_decls() {
>  		}
>  	}
> -	if (sh->target == TARGET_VS || sh->target == TARGET_ES)
> +	if (sh->target == TARGET_VS || sh->target == TARGET_ES || sh->target == TARGET_HS)
>  		sh->add_input(0, 1, 0x0F);
>  	else if (sh->target == TARGET_GS) {
>  		sh->add_input(0, 1, 0x0F);
> diff --git a/src/gallium/drivers/r600/sb/sb_shader.cpp b/src/gallium/drivers/r600/sb/sb_shader.cpp
> index 87e28e9..8c7b39b 100644
> --- a/src/gallium/drivers/r600/sb/sb_shader.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_shader.cpp
> @@ -215,7 +215,7 @@ void shader::init() {
>  void shader::init_call_fs(cf_node* cf) {
>  	unsigned gpr = 0;
> -	assert(target == TARGET_VS || target == TARGET_ES);
> +	assert(target == TARGET_LS || target == TARGET_VS || target == TARGET_ES);
>  	for(inputs_vec::const_iterator I = inputs.begin(), E = inputs.end(); I != E; ++I, ++gpr) {
> @@ -436,6 +436,8 @@ const char* shader::get_shader_target_name() {
>  	case TARGET_ES: return "ES";
>  	case TARGET_PS: return "PS";
>  	case TARGET_GS: return "GS";
> +	case TARGET_HS: return "HS";
> +	case TARGET_LS: return "LS";
>  	case TARGET_COMPUTE: return "COMPUTE";
>  	case TARGET_FETCH: return "FETCH";
>  	default:

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
Re: [Mesa-dev] [PATCH] r600: move per-type settings into a switch statement
On Mon, 30 Nov 2015 01:38:03 +0100, Dave Airlie <airl...@gmail.com> wrote:

> From: Dave Airlie <airl...@redhat.com>
>
> This will allow adding tess stuff much cleaner later.
>
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  src/gallium/drivers/r600/r600_shader.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 560197c..019fef7 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -1909,13 +1909,21 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
>  	shader->processor_type = ctx.type;
>  	ctx.bc->type = shader->processor_type;
> -	if (ctx.type == TGSI_PROCESSOR_VERTEX) {
> +	switch (ctx.type) {
> +	case TGSI_PROCESSOR_VERTEX:
>  		shader->vs_as_gs_a = key.vs.as_gs_a;
>  		shader->vs_as_es = key.vs.as_es;
> +		if (shader->vs_as_es)
> +			ring_outputs = true;
> +		break;
> +	case TGSI_PROCESSOR_GEOMETRY:
> +		ring_outputs = true;
> +		break;
> +	case TGSI_PROCESSOR_FRAGMENT:
> +		shader->two_side = key.ps.color_two_side;
> +		break;
>  	}
> -	ring_outputs = shader->vs_as_es || ctx.type == TGSI_PROCESSOR_GEOMETRY;
> -
>  	if (shader->vs_as_es) {
>  		ctx.gs_for_vs = &rctx->gs_shader->current->shader;
>  	} else {
> @@ -1936,8 +1944,6 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
>  	shader->nr_ps_color_exports = 0;
>  	shader->nr_ps_max_color_exports = 0;
> -	if (ctx.type == TGSI_PROCESSOR_FRAGMENT)
> -		shader->two_side = key.ps.color_two_side;
>  	/* register allocations */
>  	/* Values [0,127] correspond to GPR[0..127].

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
Re: [Mesa-dev] [PATCH] r600: split out common alu_writes pattern.
On Mon, 30 Nov 2015 01:18:18 +0100, Dave Airlie <airl...@gmail.com> wrote:

> From: Dave Airlie <airl...@redhat.com>
>
> This just splits out a common pattern into an inline function
> to make things cleaner to read.
>
> Signed-off-by: Dave Airlie <airl...@redhat.com>
> ---
>  src/gallium/drivers/r600/r600_asm.c | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
> index 45824f2..29515f2 100644
> --- a/src/gallium/drivers/r600/r600_asm.c
> +++ b/src/gallium/drivers/r600/r600_asm.c
> @@ -37,6 +37,11 @@
>  #define NUM_OF_CYCLES 3
>  #define NUM_OF_COMPONENTS 4
> +static inline bool alu_writes(struct r600_bytecode_alu *alu)
> +{
> +	return alu->dst.write || alu->is_op3;
> +}
> +
>  static inline unsigned int r600_bytecode_get_num_operands(
>  		struct r600_bytecode *bc, struct r600_bytecode_alu *alu)
>  {
> @@ -592,7 +597,7 @@ static int replace_gpr_with_pv_ps(struct r600_bytecode *bc,
>  		return r;
>  	for (i = 0; i < max_slots; ++i) {
> -		if (prev[i] && (prev[i]->dst.write || prev[i]->is_op3) && !prev[i]->dst.rel) {
> +		if (prev[i] && alu_writes(prev[i]) && !prev[i]->dst.rel) {
>  			if (is_alu_64bit_inst(bc, prev[i])) {
>  				gpr[i] = -1;
> @@ -800,8 +805,8 @@ static int merge_inst_groups(struct r600_bytecode *bc, struct r600_bytecode_alu
>  				result[4] = slots[i];
>  			} else if (is_alu_any_unit_inst(bc, prev[i])) {
>  				if (slots[i]->dst.sel == prev[i]->dst.sel &&
> -					(slots[i]->dst.write == 1 || slots[i]->is_op3) &&
> -					(prev[i]->dst.write == 1 || prev[i]->is_op3))
> +					alu_writes(slots[i]) &&
> +					alu_writes(prev[i]))
>  					return 0;
>  				result[i] = slots[i];
> @@ -816,8 +821,8 @@ static int merge_inst_groups(struct r600_bytecode *bc, struct r600_bytecode_alu
>  			if (max_slots == 5 && slots[i] && prev[4] &&
>  					slots[i]->dst.sel == prev[4]->dst.sel &&
>  					slots[i]->dst.chan == prev[4]->dst.chan &&
> -					(slots[i]->dst.write == 1 || slots[i]->is_op3) &&
> -					(prev[4]->dst.write == 1 || prev[4]->is_op3))
> +					alu_writes(slots[i]) &&
> +					alu_writes(prev[4]))
>  				return 0;
>  			result[i] = slots[i];
> @@ -857,7 +862,7 @@ static int merge_inst_groups(struct r600_bytecode *bc, struct r600_bytecode_alu
>  			continue;
>  		for (j = 0; j < max_slots; ++j) {
> -			if (!prev[j] || !(prev[j]->dst.write || prev[j]->is_op3))
> +			if (!prev[j] || !alu_writes(prev[j]))
>  				continue;
>  			/* If it's relative then we can't determin which gpr is really used. */
> @@ -1846,7 +1851,7 @@ static int print_dst(struct r600_bytecode_alu *alu)
>  		reg_char = 'T';
>  	}
> -	if (alu->dst.write || alu->is_op3) {
> +	if (alu_writes(alu)) {
>  		o += fprintf(stderr, "%c", reg_char);
>  		o += print_sel(alu->dst.sel, alu->dst.rel, alu->index_mode, 0);
>  	} else {

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
Re: [Mesa-dev] [PATCH 1/2] r600: define registers required for tessellation
ES_FS 0x000288A8 +#define R_0288E8_SQ_LDS_ALLOC0x000288E8 #define R_0288EC_SQ_LDS_ALLOC_PS 0x000288EC #define R_028900_SQ_ESGS_RING_ITEMSIZE 0x00028900 #define R_028904_SQ_GSVS_RING_ITEMSIZE 0x00028904 @@ -1997,6 +2043,7 @@ #define R_028980_ALU_CONST_CACHE_VS_00x00028980 #define R_028984_ALU_CONST_CACHE_VS_10x00028984 #define R_0289C0_ALU_CONST_CACHE_GS_00x000289C0 +#define R_028F00_ALU_CONST_CACHE_HS_00x00028F00 #define R_028F40_ALU_CONST_CACHE_LS_00x00028F40 #define R_028A04_PA_SU_POINT_MINMAX 0x00028A04 #define S_028A04_MIN_SIZE(x) (((x) & 0x) << 0) @@ -2090,6 +2137,36 @@ #define V_028B54_VS_STAGE_REAL 0x00 #define V_028B54_VS_STAGE_DS 0x01 #define V_028B54_VS_STAGE_COPY_SHADER0x02 +#define R_028B58_VGT_LS_HS_CONFIG 0x00028B58 +#define S_028B58_NUM_PATCHES(x) (((x) & 0xFF) << 0) +#define G_028B58_NUM_PATCHES(x) (((x) >> 0) & 0xFF) +#define C_028B58_NUM_PATCHES 0xFF00 +#define S_028B58_HS_NUM_INPUT_CP(x) (((x) & 0x3F) << 8) +#define G_028B58_HS_NUM_INPUT_CP(x) (((x) >> 8) & 0x3F) +#define C_028B58_HS_NUM_INPUT_CP 0xC0FF +#define S_028B58_HS_NUM_OUTPUT_CP(x)(((x) & 0x3F) << 14) +#define G_028B58_HS_NUM_OUTPUT_CP(x)(((x) >> 14) & 0x3F) +#define C_028B58_HS_NUM_OUTPUT_CP 0xFFF03FFF +#define R_028B5C_VGT_LS_SIZE 0x00028B5C +#define S_028B5C_SIZE(x)(((x) & 0xFF) << 0) +#define G_028B5C_SIZE(x)(((x) >> 0) & 0xFF) +#define C_028B5C_SIZE 0xFF00 +#define S_028B5C_PATCH_CP_SIZE(x) (((x) & 0x1FFF) << 8) +#define G_028B5C_PATCH_CP_SIZE(x) (((x) >> 8) & 0x1FFF) +#define C_028B5C_PATCH_CP_SIZE 0xFFF000FF C_028B5C_PATCH_CP_SIZE should be 0xFFE000FF, its 13 bits, not 12 +#define R_028B60_VGT_HS_SIZE 0x00028B60 +#define S_028B60_SIZE(x)(((x) & 0xFF) << 0) +#define G_028B60_SIZE(x)(((x) >> 0) & 0xFF) +#define C_028B60_SIZE 0xFF00 +#define S_028B60_PATCH_CP_SIZE(x) (((x) & 0x1FFF) << 8) +#define G_028B60_PATCH_CP_SIZE(x) (((x) >> 8) & 0x1FFF) +#define C_028B60_PATCH_CP_SIZE 0xFFF000FF Same here, C_028B60_PATCH_CP_SIZE mask should be 0xFFE000FF +#define R_028B64_VGT_LS_HS_ALLOC 
0x00028B64 +#define S_028B64_HS_TOTAL_OUTPUT(x) (((x) & 0x1FFF) << 0) +#define S_028B64_LS_HS_TOTAL_OUTPUT(x) (((x) & 0x1FFF) << 13) +#define R_028B68_VGT_HS_PATCH_CONST 0x00028B68 +#define S_028B68_SIZE(x)(((x) & 0x1FFF) << 0) +#define S_028B68_STRIDE(x) (((x) & 0x1FFF) << 13) No getters/masks for these? #define R_028B70_DB_ALPHA_TO_MASK0x00028B70 #define S_028B70_ALPHA_TO_MASK_ENABLE(x) (((x) & 0x1) << 0) #define S_028B70_ALPHA_TO_MASK_OFFSET0(x)(((x) & 0x3) << 8) diff --git a/src/gallium/drivers/r600/r600_sq.h b/src/gallium/drivers/r600/r600_sq.h index 1545cf1..37b6d58 100644 --- a/src/gallium/drivers/r600/r600_sq.h +++ b/src/gallium/drivers/r600/r600_sq.h @@ -189,6 +189,14 @@ * 255 SQ_ALU_SRC_PS: previous scalar result. * 448 EG - INTERP SRC BASE */ +/* LDS are Evergreen/Cayman only */ +#define EG_V_SQ_ALU_SRC_LDS_OQ_A 0x00DB +#define EG_V_SQ_ALU_SRC_LDS_OQ_B 0x00DC +#define EG_V_SQ_ALU_SRC_LDS_OQ_A_POP 0x00DD +#define EG_V_SQ_ALU_SRC_LDS_OQ_B_POP 0x00DE +#define EG_V_SQ_ALU_SRC_LDS_DIRECT_A 0x00DF +#define EG_V_SQ_ALU_SRC_LDS_DIRECT_B 0x00E0 + #define V_SQ_ALU_SRC_0 0x00F8 #define V_SQ_ALU_SRC_1 0x00F9 #define V_SQ_ALU_SRC_1_INT 0x00FA With above nits fixed, Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
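The mask corrections requested in this review can be double-checked mechanically. The header follows an S_ (set), G_ (get), C_ (clear mask) naming pattern per field; the generic macros below are hypothetical helpers written for this note, not part of the driver headers, and they derive all three forms from a field mask and a shift.

```c
#include <stdint.h>

/* Hypothetical generic helpers mirroring the S_/G_/C_ pattern. */

/* Place a value into the field (like S_xxx). */
#define FIELD_S(x, mask, shift) (((uint32_t)(x) & (mask)) << (shift))

/* Read the field back out of a register value (like G_xxx). */
#define FIELD_G(x, mask, shift) (((uint32_t)(x) >> (shift)) & (mask))

/* Clear mask: all bits set except the field's bits (like C_xxx). */
#define FIELD_C(mask, shift) (~((uint32_t)(mask) << (shift)))
```

For PATCH_CP_SIZE, a 13-bit field (mask 0x1FFF) at shift 8, FIELD_C gives 0xFFE000FF, which matches the value asked for in the review. The 0xFFF000FF in the patch is what a 12-bit mask (0xFFF) at the same shift would produce.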
Re: [Mesa-dev] [PATCH 2/2] r600: add missing register to initial state
> eend.h b/src/gallium/drivers/r600/evergreend.h
> index dbee9d5..c43a987 100644
> --- a/src/gallium/drivers/r600/evergreend.h
> +++ b/src/gallium/drivers/r600/evergreend.h
> @@ -2500,7 +2500,6 @@
>  #define CM_R_0286FC_SPI_LDS_MGMT 0x286fc
>  #define S_0286FC_NUM_PS_LDS(x) ((x) & 0xff)
>  #define S_0286FC_NUM_LS_LDS(x) ((x) & 0xff) << 8
> -#define CM_R_0288E8_SQ_LDS_ALLOC 0x000288E8
>  #define CM_R_028804_DB_EQAA 0x00028804
>  #define S_028804_MAX_ANCHOR_SAMPLES(x) (((x) & 0x7) << 0)

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
Re: [Mesa-dev] [PATCH] r600g: Support TGSI_SEMANTIC_HELPER_INVOCATION
On Fri, 13 Nov 2015 18:57:28 +0100, Nicolai Hähnle <nhaeh...@gmail.com> wrote:

> On 13.11.2015 00:14, Glenn Kennard wrote:
>> Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
>> ---
>> Maybe there is a better way to check if a thread is a helper invocation?
>
> Is ctx->face_gpr guaranteed to be initialized when load_helper_invocation is called?

allocate_system_value_inputs() sets that if needed, and is called before parsing any opcodes.

> Aside, I'm not sure I understand correctly what this is supposed to do. The values you're querying are related to multi-sampling, but my understanding has always been that helper invocations can also happen without multi-sampling: you always want to process 2x2 quads of pixels at a time to be able to compute derivatives for texture sampling. When the boundary of primitive intersects such a quad, you get helper invocations outside the primitive.

Non-MSAA buffers act just like 1 sample buffers with regards to the coverage mask supplied by the hardware, so helper invocations which have no coverage get a 0 for the mask value, and normal fragments get 1. Works with the piglit test case posted at least...

Cheers, Nicolai src/gallium/drivers/r600/r600_shader.c | 83 +- 1 file changed, 72 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 560197c..a227d78 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -530,7 +530,8 @@ static int r600_spi_sid(struct r600_shader_io * io) name == TGSI_SEMANTIC_PSIZE || name == TGSI_SEMANTIC_EDGEFLAG || name == TGSI_SEMANTIC_FACE || - name == TGSI_SEMANTIC_SAMPLEMASK) + name == TGSI_SEMANTIC_SAMPLEMASK || + name == TGSI_SEMANTIC_HELPER_INVOCATION) index = 0; else { if (name == TGSI_SEMANTIC_GENERIC) { @@ -734,7 +735,8 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx) case TGSI_FILE_SYSTEM_VALUE: if (d->Semantic.Name == TGSI_SEMANTIC_SAMPLEMASK || d->Semantic.Name == TGSI_SEMANTIC_SAMPLEID || - d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS) { + d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS || + d->Semantic.Name == TGSI_SEMANTIC_HELPER_INVOCATION) { break; /* Already handled from allocate_system_value_inputs */ } else if (d->Semantic.Name == TGSI_SEMANTIC_INSTANCEID) { if (!ctx->native_integers) { @@ -776,13 +778,14 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off struct { boolean enabled; int *reg; - unsigned name, alternate_name; + unsigned associated_semantics[3]; } inputs[2] = { - { false, >face_gpr, TGSI_SEMANTIC_SAMPLEMASK, ~0u }, /* lives in Front Face GPR.z */ - - { false, >fixed_pt_position_gpr, TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS } /* SAMPLEID is in Fixed Point Position GPR.w */ + { false, >face_gpr, { TGSI_SEMANTIC_SAMPLEMASK /* lives in Front Face GPR.z */, + TGSI_SEMANTIC_HELPER_INVOCATION, ~0u } }, + { false, >fixed_pt_position_gpr, { TGSI_SEMANTIC_SAMPLEID /* in Fixed Point Position GPR.w */, + TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_HELPER_INVOCATION } } }; - int i, k, num_regs = 0; + int i, k, l, num_regs = 0; if (tgsi_parse_init(, ctx->tokens) 
!= TGSI_PARSE_OK) { return 0; @@ -818,9 +821,11 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off struct tgsi_full_declaration *d = if (d->Declaration.File == TGSI_FILE_SYSTEM_VALUE) { for (k = 0; k < Elements(inputs); k++) { - if (d->Semantic.Name == inputs[k].name || - d->Semantic.Name == inputs[k].alternate_name) { - inputs[k].enabled = true; + for (l = 0; l < 3; l++) { + if (d->Semantic.Name == inputs[k].associated_semantics[l]) { + inputs[k].enabled = true; + break; + } } } } @@ -832,7 +837,7 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off for (i = 0; i < Elem
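As a plain-C illustration of the explanation given in the reply above: if the hardware supplies a per-fragment coverage mask, and non-MSAA targets behave like 1-sample targets, then a fragment with no covered samples can be treated as a helper invocation. The function name and the 32-bit mask type are assumptions made for this sketch, and the actual patch may combine more inputs than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* A fragment whose coverage mask is 0 covers no samples and exists
 * only so that derivatives can be computed for its 2x2 quad. */
static bool is_helper_invocation(uint32_t coverage_mask)
{
	return coverage_mask == 0;
}
```

A normal fragment on a non-MSAA target would have a mask of 1, and a partially covered MSAA fragment would have some non-zero subset of sample bits set; neither counts as a helper here.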
Re: [Mesa-dev] [PATCH v2 2/3] gallium: add support for gl_HelperInvocation semantic
On Thu, 12 Nov 2015 18:32:25 +0100, Ilia Mirkin <imir...@alum.mit.edu> wrote: Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> --- src/gallium/auxiliary/tgsi/tgsi_strings.c | 1 + src/gallium/docs/source/tgsi.rst | 8 src/gallium/include/pipe/p_shader_tokens.h | 3 ++- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 +++- 4 files changed, 14 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index 89369d6..fc29a23 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -95,6 +95,7 @@ const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT] = "TESSOUTER", "TESSINNER", "VERTICESIN", + "HELPER_INVOCATION", }; const char *tgsi_texture_names[TGSI_TEXTURE_COUNT] = diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 01e18f3..e7b0c2f 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -2941,6 +2941,14 @@ TGSI_SEMANTIC_VERTICESIN For tessellation evaluation/control shaders, this semantic label indicates the number of vertices provided in the input patch. Only the X value is defined. +TGSI_SEMANTIC_HELPER_INVOCATION +""""""""""""""""""""""""""""""" + +For fragment shaders, this semantic indicates whether the current +invocation is covered or not. Helper invocations are created in order +to properly compute derivatives, however it may be desirable to skip +some of the logic in those cases. See ``gl_HelperInvocation`` documentation. 
+ Declaration Interpolate ^^^ diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index e0ab901..a3137ae 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -185,7 +185,8 @@ struct tgsi_declaration_interp #define TGSI_SEMANTIC_TESSOUTER 32 /**< outer tessellation levels */ #define TGSI_SEMANTIC_TESSINNER 33 /**< inner tessellation levels */ #define TGSI_SEMANTIC_VERTICESIN 34 /**< number of input vertices */ -#define TGSI_SEMANTIC_COUNT 35 /**< number of semantic values */ +#define TGSI_SEMANTIC_HELPER_INVOCATION 35 /**< current invocation is helper */ +#define TGSI_SEMANTIC_COUNT 36 /**< number of semantic values */ struct tgsi_declaration_semantic { diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index b565127..3ad1afd 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -4408,7 +4408,7 @@ const unsigned _mesa_sysval_to_semantic[SYSTEM_VALUE_MAX] = { TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_SAMPLEMASK, - 0, /* gl_HelperInvocation */ + TGSI_SEMANTIC_HELPER_INVOCATION, /* Tessellation shaders */ @@ -5139,6 +5139,8 @@ st_translate_program( TGSI_SEMANTIC_BASEVERTEX); assert(_mesa_sysval_to_semantic[SYSTEM_VALUE_TESS_COORD] == TGSI_SEMANTIC_TESSCOORD); + assert(_mesa_sysval_to_semantic[SYSTEM_VALUE_HELPER_INVOCATION] == + TGSI_SEMANTIC_HELPER_INVOCATION); t = CALLOC_STRUCT(st_translate); if (!t) { Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g: Support TGSI_SEMANTIC_HELPER_INVOCATION
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Maybe there is a better way to check if a thread is a helper invocation?

 src/gallium/drivers/r600/r600_shader.c | 83 +-
 1 file changed, 72 insertions(+), 11 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 560197c..a227d78 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -530,7 +530,8 @@ static int r600_spi_sid(struct r600_shader_io * io)
 	    name == TGSI_SEMANTIC_PSIZE ||
 	    name == TGSI_SEMANTIC_EDGEFLAG ||
 	    name == TGSI_SEMANTIC_FACE ||
-	    name == TGSI_SEMANTIC_SAMPLEMASK)
+	    name == TGSI_SEMANTIC_SAMPLEMASK ||
+	    name == TGSI_SEMANTIC_HELPER_INVOCATION)
 		index = 0;
 	else {
 		if (name == TGSI_SEMANTIC_GENERIC) {
@@ -734,7 +735,8 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx)
 	case TGSI_FILE_SYSTEM_VALUE:
 		if (d->Semantic.Name == TGSI_SEMANTIC_SAMPLEMASK ||
 			d->Semantic.Name == TGSI_SEMANTIC_SAMPLEID ||
-			d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS) {
+			d->Semantic.Name == TGSI_SEMANTIC_SAMPLEPOS ||
+			d->Semantic.Name == TGSI_SEMANTIC_HELPER_INVOCATION) {
 			break; /* Already handled from allocate_system_value_inputs */
 		} else if (d->Semantic.Name == TGSI_SEMANTIC_INSTANCEID) {
 			if (!ctx->native_integers) {
@@ -776,13 +778,14 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off
 	struct {
 		boolean enabled;
 		int *reg;
-		unsigned name, alternate_name;
+		unsigned associated_semantics[3];
 	} inputs[2] = {
-		{ false, &ctx->face_gpr, TGSI_SEMANTIC_SAMPLEMASK, ~0u }, /* lives in Front Face GPR.z */
-
-		{ false, &ctx->fixed_pt_position_gpr, TGSI_SEMANTIC_SAMPLEID, TGSI_SEMANTIC_SAMPLEPOS } /* SAMPLEID is in Fixed Point Position GPR.w */
+		{ false, &ctx->face_gpr, { TGSI_SEMANTIC_SAMPLEMASK /* lives in Front Face GPR.z */,
+			TGSI_SEMANTIC_HELPER_INVOCATION, ~0u } },
+		{ false, &ctx->fixed_pt_position_gpr, { TGSI_SEMANTIC_SAMPLEID /* in Fixed Point Position GPR.w */,
+			TGSI_SEMANTIC_SAMPLEPOS, TGSI_SEMANTIC_HELPER_INVOCATION } }
 	};
-	int i, k, num_regs = 0;
+	int i, k, l, num_regs = 0;
 
 	if (tgsi_parse_init(&parse, ctx->tokens) != TGSI_PARSE_OK) {
 		return 0;
@@ -818,9 +821,11 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off
 			struct tgsi_full_declaration *d = &parse.FullToken.FullDeclaration;
 			if (d->Declaration.File == TGSI_FILE_SYSTEM_VALUE) {
 				for (k = 0; k < Elements(inputs); k++) {
-					if (d->Semantic.Name == inputs[k].name ||
-						d->Semantic.Name == inputs[k].alternate_name) {
-						inputs[k].enabled = true;
+					for (l = 0; l < 3; l++) {
+						if (d->Semantic.Name == inputs[k].associated_semantics[l]) {
+							inputs[k].enabled = true;
+							break;
+						}
 					}
 				}
 			}
@@ -832,7 +837,7 @@ static int allocate_system_value_inputs(struct r600_shader_ctx *ctx, int gpr_off
 	for (i = 0; i < Elements(inputs); i++) {
 		boolean enabled = inputs[i].enabled;
 		int *reg = inputs[i].reg;
-		unsigned name = inputs[i].name;
+		unsigned name = inputs[i].associated_semantics[0];
 
 		if (enabled) {
 			int gpr = gpr_offset + num_regs++;
@@ -985,6 +990,56 @@ static int load_sample_position(struct r600_shader_ctx *ctx, struct r600_shader_
 	return t1;
 }
 
+static int load_helper_invocation(struct r600_shader_ctx *ctx,
+	int mask_gpr, int mask_chan, int id_gpr, int id_chan)
+{
+	// sample (mask >> sampleid) & 1
+	struct r600_bytecode_alu alu;
+	int r, t = r600_get_temp(ctx);
+
+	memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+	alu.op = ALU_OP2_LSHR_INT;
+	alu.src[0].sel = mask_gpr;
+	alu.src[0].chan = mask_chan;
+	alu.src[1].sel = id_gpr;
+	alu.src[1].chan = id_chan;
+	alu.dst.sel = t;
+	alu.dst.chan = 0;
+	alu.dst.write = 1;
+	alu.last = 1;
+	r =
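The comment in load_helper_invocation() gives the computation as (mask >> sampleid) & 1. A minimal CPU-side sketch of that bit test in plain C is shown here for reference. The function names are hypothetical, and the reading that a clear bit marks a helper invocation is an assumption, since the patch text is cut off before the result is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns bit `sample_id` of `sample_mask`,
 * i.e. (mask >> sampleid) & 1 as written in the patch comment. */
static uint32_t sample_covered(uint32_t sample_mask, unsigned sample_id)
{
	/* Shifting a 32-bit value by 32 or more is undefined in C. */
	assert(sample_id < 32);
	return (sample_mask >> sample_id) & 1u;
}

/* Assumption for illustration: an invocation whose own sample bit is
 * clear in the coverage mask is treated as a helper invocation. */
static bool is_helper_invocation(uint32_t sample_mask, unsigned sample_id)
{
	return sample_covered(sample_mask, sample_id) == 0;
}
```

On the GPU the same shift is emitted as a single ALU_OP2_LSHR_INT with the mask and sample id taken from the two system-value GPRs; the sketch only mirrors the arithmetic.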
Re: [Mesa-dev] [PATCH] r600: initialised PGM_RESOURCES_2 for ES/GS
On Wed, 11 Nov 2015 23:42:18 +0100, Dave Airlie <airl...@gmail.com> wrote:

From: Dave Airlie <airl...@redhat.com>

This fixes the corruption on rendering that we are seeing in certain geometry shaders. Specifically, this fixes https://bugs.freedesktop.org/show_bug.cgi?id=91780 and probably others.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/evergreen_state.c | 4 ++++
 src/gallium/drivers/r600/evergreend.h      | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index c6702a9..a3bbbcc 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -2362,6 +2362,8 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx)
 	r600_store_context_reg(cb, R_028848_SQ_PGM_RESOURCES_2_PS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
 	r600_store_context_reg(cb, R_028864_SQ_PGM_RESOURCES_2_VS, S_028864_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+	r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_2_GS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+	r600_store_context_reg(cb, R_028894_SQ_PGM_RESOURCES_2_ES, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
 	r600_store_context_reg(cb, R_0288A8_SQ_PGM_RESOURCES_FS, 0);
 
 	/* to avoid GPU doing any preloading of constant from random address */
@@ -2801,6 +2803,8 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
 	r600_store_context_reg(cb, R_028848_SQ_PGM_RESOURCES_2_PS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
 	r600_store_context_reg(cb, R_028864_SQ_PGM_RESOURCES_2_VS, S_028864_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+	r600_store_context_reg(cb, R_02887C_SQ_PGM_RESOURCES_2_GS, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));
+	r600_store_context_reg(cb, R_028894_SQ_PGM_RESOURCES_2_ES, S_028848_SINGLE_ROUND(V_SQ_ROUND_NEAREST_EVEN));

nitpick: separate macros for SINGLE_ROUND for each register

 	r600_store_context_reg(cb, R_0288A8_SQ_PGM_RESOURCES_FS, 0);
 
 	/* to avoid GPU doing any preloading of constant from random address */
diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 937ffcb..cf8906c 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -1497,6 +1497,7 @@
 #define   S_028878_UNCACHED_FIRST_INST(x)      (((x) & 0x1) << 28)
 #define   G_028878_UNCACHED_FIRST_INST(x)      (((x) >> 28) & 0x1)
 #define   C_028878_UNCACHED_FIRST_INST         0xEFFFFFFF
+#define R_02887C_SQ_PGM_RESOURCES_2_GS         0x02887C
 
 #define R_028890_SQ_PGM_RESOURCES_ES           0x028890
 #define   S_028890_NUM_GPRS(x)                 (((x) & 0xFF) << 0)
@@ -1511,6 +1512,7 @@
 #define   S_028890_UNCACHED_FIRST_INST(x)      (((x) & 0x1) << 28)
 #define   G_028890_UNCACHED_FIRST_INST(x)      (((x) >> 28) & 0x1)
 #define   C_028890_UNCACHED_FIRST_INST         0xEFFFFFFF
+#define R_028894_SQ_PGM_RESOURCES_2_ES         0x028894
 
 #define R_028864_SQ_PGM_RESOURCES_2_VS         0x028864
 #define   S_028864_SINGLE_ROUND(x)             (((x) & 0x3) << 0)

Tested / Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
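The nitpick in this review is about the register-field macro convention in evergreend.h: S_<reg>_<FIELD>(x) packs a value into its field, G_<reg>_<FIELD>(x) extracts it, and C_<reg>_<FIELD> is the mask that clears it. The standalone sketch below illustrates the convention with made-up EXAMPLE names (they are not real register defines); only the bit layouts are copied from the patch. Reusing another register's S_ macro, as the patch does for GS and ES, gives the right bits only as long as the field sits at the same position in both registers.

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit field at bit 0, same layout as S_028864_SINGLE_ROUND in the patch.
 * The EXAMPLE prefix marks these as illustrative, not real defines. */
#define S_EXAMPLE_SINGLE_ROUND(x)        (((uint32_t)(x) & 0x3) << 0)
#define G_EXAMPLE_SINGLE_ROUND(x)        (((x) >> 0) & 0x3)
#define C_EXAMPLE_SINGLE_ROUND           0xFFFFFFFCu

/* 1-bit field at bit 28, same layout as S_028878_UNCACHED_FIRST_INST. */
#define S_EXAMPLE_UNCACHED_FIRST_INST(x) (((uint32_t)(x) & 0x1) << 28)
#define G_EXAMPLE_UNCACHED_FIRST_INST(x) (((x) >> 28) & 0x1)
#define C_EXAMPLE_UNCACHED_FIRST_INST    0xEFFFFFFFu

/* Replace the SINGLE_ROUND field of a register value while leaving
 * every other bit untouched: clear with C_, then OR in S_. */
static uint32_t set_single_round(uint32_t reg, unsigned mode)
{
	assert(mode <= 0x3);
	return (reg & C_EXAMPLE_SINGLE_ROUND) | S_EXAMPLE_SINGLE_ROUND(mode);
}
```

The clear mask is always the bitwise complement of the field's set bits, which is why a 1-bit field at bit 28 has the mask 0xEFFFFFFF.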
[Mesa-dev] [PATCH v2] r600g: Pass conservative depth parameters to hw
Supported on R700 and up. Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com> --- v2: Use correct register for R700, set from r600_emit_db_misc_state Added ps_conservative_z field to r600_db_misc_state Shrunk ps_conservative_z to uint8 since only 2 bits are needed Thanks Alex for noting the incorrect register on R700, would have gone unnoticed for years otherwise since the feature isn't directly observable... src/gallium/drivers/r600/evergreen_state.c | 13 + src/gallium/drivers/r600/evergreend.h | 7 +++ src/gallium/drivers/r600/r600_pipe.h | 1 + src/gallium/drivers/r600/r600_shader.c | 1 + src/gallium/drivers/r600/r600_shader.h | 2 ++ src/gallium/drivers/r600/r600_state.c | 22 +- src/gallium/drivers/r600/r600d.h | 8 7 files changed, 53 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index c6702a9..96c6b11 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export); db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export); + switch (rshader->ps_conservative_z) { + default: /* fall through */ + case TGSI_FS_DEPTH_LAYOUT_ANY: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_GREATER: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_LESS: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z); + break; + } + exports_ps = 0; for (i = 0; i < rshader->noutput; i++) { if (rshader->output[i].name == TGSI_SEMANTIC_POSITION || diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index 937ffcb..a9a65f7 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ 
b/src/gallium/drivers/r600/evergreend.h @@ -815,6 +815,13 @@ #define V_02880C_EXPORT_DB_FOUR16 0x01 #define V_02880C_EXPORT_DB_TWO 0x02 #define S_02880C_ALPHA_TO_MASK_DISABLE(x)(((x) & 0x1) << 12) +#define S_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 16) +#define G_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 16) & 0x03) +#define C_02880C_CONSERVATIVE_Z_EXPORT 0xFFFC +#define V_02880C_EXPORT_ANY_Z 0 +#define V_02880C_EXPORT_LESS_THAN_Z1 +#define V_02880C_EXPORT_GREATER_THAN_Z 2 +#define V_02880C_EXPORT_RESERVED 3 #define R_028A00_PA_SU_POINT_SIZE0x028A00 #define S_028A00_HEIGHT(x) (((x) & 0x) << 0) diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 520b03f..950bb6b 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -116,6 +116,7 @@ struct r600_db_misc_state { unsignedlog_samples; unsigneddb_shader_control; boolhtile_clear; + uint8_t ps_conservative_z; }; struct r600_cb_misc_state { diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 8efe902..613f94e 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx, shader->fs_write_all = ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS]; shader->vs_position_window_space = ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; + shader->ps_conservative_z = (uint8_t)ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT]; if (shader->vs_as_gs_a) vs_add_primid_output(, key.vs.prim_id_out); diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h index c240e71..2040f73 100644 --- a/src/gallium/drivers/r600/r600_shader.h +++ b/src/gallium/drivers/r600/r600_shader.h @@ -76,6 +76,8 @@ struct r600_shader { boolean uses_tex_buffers; boolean gs_prim_id_input; + uint8_t ps_conservative_z; + /* Size in bytes of a data item in the 
ring(s) (single vertex data). Stages with only one ring items 123 will be set to 0. */ unsignedring_item_sizes[4]; dif
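The switch added to evergreen_update_ps_state() maps the shader's depth layout property onto the 2-bit CONSERVATIVE_Z_EXPORT field at bit 16 of DB_SHADER_CONTROL. A minimal standalone sketch of that mapping follows. The field position and the hardware values (0, 1, 2) are taken from the patch; the enum depth_layout and its numbering are stand-ins for the TGSI_FS_DEPTH_LAYOUT_* constants and should be treated as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for TGSI_FS_DEPTH_LAYOUT_*; the numbering here is assumed. */
enum depth_layout {
	LAYOUT_NONE,
	LAYOUT_ANY,
	LAYOUT_GREATER,
	LAYOUT_LESS,
	LAYOUT_UNCHANGED
};

/* Field layout and values as defined by the patch in evergreend.h. */
#define S_CONSERVATIVE_Z_EXPORT(x) (((uint32_t)(x) & 0x03) << 16)
#define G_CONSERVATIVE_Z_EXPORT(x) (((x) >> 16) & 0x03)
#define EXPORT_ANY_Z            0
#define EXPORT_LESS_THAN_Z      1
#define EXPORT_GREATER_THAN_Z   2

/* Returns the bits to OR into db_shader_control for a given layout.
 * Anything other than GREATER or LESS falls back to ANY_Z, matching
 * the "default: fall through" arm of the switch in the patch. */
static uint32_t conservative_z_bits(enum depth_layout layout)
{
	switch (layout) {
	case LAYOUT_GREATER:
		return S_CONSERVATIVE_Z_EXPORT(EXPORT_GREATER_THAN_Z);
	case LAYOUT_LESS:
		return S_CONSERVATIVE_Z_EXPORT(EXPORT_LESS_THAN_Z);
	default:
		return S_CONSERVATIVE_Z_EXPORT(EXPORT_ANY_Z);
	}
}
```

Because ANY_Z encodes as 0, shaders without a depth layout leave the register value unchanged, so the feature stays invisible unless a shader actually declares a layout.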
[Mesa-dev] [PATCH] r600g: Pass conservative depth parameters to hw
Supported on R700 and up. Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com> --- Not exactly a commonly used extension, but might as well set the hardware registers rather than just dropping the hint on the floor. src/gallium/drivers/r600/evergreen_state.c | 13 + src/gallium/drivers/r600/evergreend.h | 7 +++ src/gallium/drivers/r600/r600_shader.c | 1 + src/gallium/drivers/r600/r600_shader.h | 2 ++ src/gallium/drivers/r600/r600_state.c | 15 +++ src/gallium/drivers/r600/r600d.h | 8 6 files changed, 46 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index c6702a9..96c6b11 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2940,6 +2940,19 @@ void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export); db_shader_control |= S_02880C_MASK_EXPORT_ENABLE(mask_export); + switch (rshader->ps_conservative_z) { + default: /* fall through */ + case TGSI_FS_DEPTH_LAYOUT_ANY: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_GREATER: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_GREATER_THAN_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_LESS: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_LESS_THAN_Z); + break; + } + exports_ps = 0; for (i = 0; i < rshader->noutput; i++) { if (rshader->output[i].name == TGSI_SEMANTIC_POSITION || diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index 937ffcb..a9a65f7 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ b/src/gallium/drivers/r600/evergreend.h @@ -815,6 +815,13 @@ #define V_02880C_EXPORT_DB_FOUR16 0x01 #define V_02880C_EXPORT_DB_TWO 0x02 #define S_02880C_ALPHA_TO_MASK_DISABLE(x)(((x) & 0x1) << 12) +#define S_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) & 0x03) << 16) +#define 
G_02880C_CONSERVATIVE_Z_EXPORT(x)(((x) >> 16) & 0x03) +#define C_02880C_CONSERVATIVE_Z_EXPORT 0xFFFC +#define V_02880C_EXPORT_ANY_Z 0 +#define V_02880C_EXPORT_LESS_THAN_Z1 +#define V_02880C_EXPORT_GREATER_THAN_Z 2 +#define V_02880C_EXPORT_RESERVED 3 #define R_028A00_PA_SU_POINT_SIZE0x028A00 #define S_028A00_HEIGHT(x) (((x) & 0x) << 0) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 8efe902..560696d 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -2048,6 +2048,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx, shader->fs_write_all = ctx.info.properties[TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS]; shader->vs_position_window_space = ctx.info.properties[TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION]; + shader->ps_conservative_z = ctx.info.properties[TGSI_PROPERTY_FS_DEPTH_LAYOUT]; if (shader->vs_as_gs_a) vs_add_primid_output(, key.vs.prim_id_out); diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h index c240e71..e085263 100644 --- a/src/gallium/drivers/r600/r600_shader.h +++ b/src/gallium/drivers/r600/r600_shader.h @@ -76,6 +76,8 @@ struct r600_shader { boolean uses_tex_buffers; boolean gs_prim_id_input; + unsignedps_conservative_z; + /* Size in bytes of a data item in the ring(s) (single vertex data). Stages with only one ring items 123 will be set to 0. 
*/ unsignedring_item_sizes[4]; diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 1be3e1b..09b2325 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -2533,6 +2533,21 @@ void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *sha if (rshader->uses_kill) db_shader_control |= S_02880C_KILL_ENABLE(1); + if (rctx->b.chip_class >= R700) { + switch (rshader->ps_conservative_z) { + default: /* fall through */ + case TGSI_FS_DEPTH_LAYOUT_ANY: + db_shader_control |= S_02880C_CONSERVATIVE_Z_EXPORT(V_02880C_EXPORT_ANY_Z); + break; + case TGSI_FS_DEPTH_LAYOUT_GRE
[Mesa-dev] [PATCH] r600g: Implement ARB_texture_view
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com> --- See also additional texture view piglit test case posted to piglit ml, which tests cases with layer>0. Notably softpipe and llvmpipe fail that case but i965/hsw, nv50/nvc0 and r600g pass. docs/GL3.txt | 2 +- docs/relnotes/11.1.0.html | 1 + src/gallium/drivers/r600/evergreen_state.c | 23 +-- src/gallium/drivers/r600/r600_pipe.c | 2 +- 4 files changed, 20 insertions(+), 8 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 6503e2a..c03a574 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -169,7 +169,7 @@ GL 4.3, GLSL 4.30: GL_ARB_texture_buffer_range DONE (nv50, nvc0, i965, r600, radeonsi, llvmpipe) GL_ARB_texture_query_levels DONE (all drivers that support GLSL 1.30) GL_ARB_texture_storage_multisample DONE (all drivers that support GL_ARB_texture_multisample) - GL_ARB_texture_view DONE (i965, nv50, nvc0, llvmpipe, softpipe) + GL_ARB_texture_view DONE (i965, nv50, nvc0, r600, llvmpipe, softpipe) GL_ARB_vertex_attrib_binding DONE (all drivers) diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html index dcf425e..cb8715c 100644 --- a/docs/relnotes/11.1.0.html +++ b/docs/relnotes/11.1.0.html @@ -53,6 +53,7 @@ Note: some of the new features are only available with certain drivers. 
GL_ARB_texture_query_lod on softpipe EGL_KHR_create_context on softpipe, llvmpipe EGL_KHR_gl_colorspace on softpipe, llvmpipe +GL_ARB_texture_view on r600 for Evergreen and later chips Bug fixes diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index c6702a9..60747d1 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -666,6 +666,7 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, enum pipe_format pipe_format = state->format; struct radeon_surf_level *surflevel; unsigned base_level, first_level, last_level; + unsigned dim, last_layer; uint64_t va; if (view == NULL) @@ -679,7 +680,7 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, view->base.reference.count = 1; view->base.context = ctx; - if (texture->target == PIPE_BUFFER) + if (state->target == PIPE_BUFFER) return texture_buffer_sampler_view(rctx, view, width0, height0); swizzle[0] = state->swizzle_r; @@ -773,12 +774,12 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, } nbanks = eg_num_banks(rscreen->b.tiling_info.num_banks); - if (texture->target == PIPE_TEXTURE_1D_ARRAY) { + if (state->target == PIPE_TEXTURE_1D_ARRAY) { height = 1; depth = texture->array_size; - } else if (texture->target == PIPE_TEXTURE_2D_ARRAY) { + } else if (state->target == PIPE_TEXTURE_2D_ARRAY) { depth = texture->array_size; - } else if (texture->target == PIPE_TEXTURE_CUBE_ARRAY) + } else if (state->target == PIPE_TEXTURE_CUBE_ARRAY) depth = texture->array_size / 6; va = tmp->resource.gpu_address; @@ -790,7 +791,13 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, view->is_stencil_sampler = true; view->tex_resource = >resource; - view->tex_resource_words[0] = (S_03_DIM(r600_tex_dim(texture->target, texture->nr_samples)) | + + /* array type views and views into array types need to use layer offset */ + dim = state->target; + if (state->target != PIPE_TEXTURE_CUBE) + 
dim = MAX2(state->target, texture->target); + + view->tex_resource_words[0] = (S_03_DIM(r600_tex_dim(dim, texture->nr_samples)) | S_03_PITCH((pitch / 8) - 1) | S_03_TEX_WIDTH(width - 1)); if (rscreen->b.chip_class == CAYMAN) @@ -818,10 +825,14 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx, view->tex_resource_words[3] = (surflevel[base_level].offset + va) >> 8; } + last_layer = state->u.tex.last_layer; + if (state->target != texture->target && depth == 1) { + last_layer = state->u.tex.first_layer; + } view->tex_resource_words[4] = (word4 | S_030010_ENDIAN_SWAP(endian)); view->tex_resource_words[5] = S_030014_BASE_ARRAY(state->u.tex.first_layer) | - S_030014_LAST_ARRAY(state->u.tex.last_layer); +
[Mesa-dev] [PATCH 1/2] r600g/sb: SB support for UBO indexing
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com> --- This patch depends on prior patch: r600g/sb: Support gs5 sampler indexing Two items that could be improved on in some future patch: Clauses using UBO indexing still lock the cache line for a constant used to load the index register, which causes some instruction groups to be broken up as SB thinks they are using too many constant read ports. The MOVA_INT/SET_CF_IDX[01] ops can often be emitted directly into the preceeding clause rather than always creating a new one. src/gallium/drivers/r600/r600_shader.c | 6 -- src/gallium/drivers/r600/r600_shader.h | 2 - src/gallium/drivers/r600/sb/sb_bc.h| 4 +- src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 6 +- src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 20 - src/gallium/drivers/r600/sb/sb_expr.cpp| 3 +- src/gallium/drivers/r600/sb/sb_ir.h| 7 ++ src/gallium/drivers/r600/sb/sb_sched.cpp | 108 ++--- src/gallium/drivers/r600/sb/sb_sched.h | 4 + src/gallium/drivers/r600/sb/sb_shader.cpp | 4 +- src/gallium/drivers/r600/sb/sb_shader.h| 2 +- 11 files changed, 139 insertions(+), 27 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 24c3d43..8efe902 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -166,8 +166,6 @@ int r600_pipe_shader_create(struct pipe_context *ctx, if (rctx->b.chip_class <= R700) { use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY); } - /* disable SB for shaders using ubo array indexing as it doesn't handle those currently */ - use_sb &= !shader->shader.uses_ubo_indexing; /* disable SB for shaders using doubles */ use_sb &= !shader->shader.uses_doubles; @@ -1250,9 +1248,6 @@ static int tgsi_split_constant(struct r600_shader_ctx *ctx) continue; } - if (ctx->src[i].kc_rel) - ctx->shader->uses_ubo_indexing = true; - if (ctx->src[i].rel) { int chan = inst->Src[i].Indirect.Swizzle; int treg = r600_get_temp(ctx); @@ -1936,7 +1931,6 
@@ static int r600_shader_from_tgsi(struct r600_context *rctx, ctx.gs_next_vertex = 0; ctx.gs_stream_output_info = - shader->uses_ubo_indexing = false; ctx.face_gpr = -1; ctx.fixed_pt_position_gpr = -1; ctx.fragcoord_input = -1; diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h index 8ba32ae..c240e71 100644 --- a/src/gallium/drivers/r600/r600_shader.h +++ b/src/gallium/drivers/r600/r600_shader.h @@ -75,8 +75,6 @@ struct r600_shader { boolean has_txq_cube_array_z_comp; boolean uses_tex_buffers; boolean gs_prim_id_input; - /* Temporarily workaround SB not handling ubo indexing */ - boolean uses_ubo_indexing; /* Size in bytes of a data item in the ring(s) (single vertex data). Stages with only one ring items 123 will be set to 0. */ diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h index 126750d..9c2a917 100644 --- a/src/gallium/drivers/r600/sb/sb_bc.h +++ b/src/gallium/drivers/r600/sb/sb_bc.h @@ -478,7 +478,9 @@ struct bc_cf { bool is_alu_extended() { assert(op_ptr->flags & CF_ALU); - return kc[2].mode != KC_LOCK_NONE || kc[3].mode != KC_LOCK_NONE; + return kc[2].mode != KC_LOCK_NONE || kc[3].mode != KC_LOCK_NONE || + kc[0].index_mode != KC_INDEX_NONE || kc[1].index_mode != KC_INDEX_NONE || + kc[2].index_mode != KC_INDEX_NONE || kc[3].index_mode != KC_INDEX_NONE; } }; diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp index 522ff9d..17fe2a5 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp @@ -514,7 +514,7 @@ void bc_finalizer::copy_fetch_src(fetch_node , fetch_node , unsigned arg void bc_finalizer::emit_set_grad(fetch_node* f) { - assert(f->src.size() == 12); + assert(f->src.size() == 12 || f->src.size() == 13); unsigned ops[2] = { FETCH_OP_SET_GRADIENTS_V, FETCH_OP_SET_GRADIENTS_H }; unsigned arg_start = 0; @@ -809,8 +809,8 @@ void bc_finalizer::finalize_cf(cf_node* 
c) { } sel_chan bc_finalizer::translate_kcache(cf_node* alu, value* v) { - unsigned sel = v->select.sel(); - unsigned bank = sel >> 12; + unsigned sel = v->select.kcache_sel(); + unsigned bank = v->select.kcache_bank(); uns
[Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Now that SB supports the GS5 features we can safely enable the extension.

Note that gallium state tracker clamps the GLSL language / GL version since GL_ARB_tessellation_shader isn't implemented yet.

 docs/GL3.txt                         | 16 ++++++++--------
 docs/relnotes/11.1.0.html            |  1 +
 src/gallium/drivers/r600/r600_pipe.c |  2 +-
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index e17e783..6503e2a 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -96,18 +96,18 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
 
   GL_ARB_draw_buffers_blend                            DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_draw_indirect                                 DONE (i965, r600, llvmpipe, softpipe)
-  GL_ARB_gpu_shader5                                   DONE (i965)
+  GL_ARB_gpu_shader5                                   DONE (i965, r600)
   - 'precise' qualifier                                DONE
-  - Dynamically uniform sampler array indices          DONE (r600, softpipe)
-  - Dynamically uniform UBO array indices              DONE (r600)
+  - Dynamically uniform sampler array indices          DONE (softpipe)
+  - Dynamically uniform UBO array indices              DONE ()
   - Implicit signed -> unsigned conversions            DONE
   - Fused multiply-add                                 DONE ()
-  - Packing/bitfield/conversion functions              DONE (r600, softpipe)
-  - Enhanced textureGather                             DONE (r600, softpipe)
-  - Geometry shader instancing                         DONE (r600, llvmpipe, softpipe)
+  - Packing/bitfield/conversion functions              DONE (softpipe)
+  - Enhanced textureGather                             DONE (softpipe)
+  - Geometry shader instancing                         DONE (llvmpipe, softpipe)
   - Geometry shader multiple streams                   DONE ()
-  - Enhanced per-sample shading                        DONE (r600)
-  - Interpolation functions                            DONE (r600)
+  - Enhanced per-sample shading                        DONE ()
+  - Interpolation functions                            DONE ()
   - New overload resolution rules                      DONE
   GL_ARB_gpu_shader_fp64                               DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                                DONE (i965, nv50, r600)
diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index c755c98..e537d98 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -50,6 +50,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_texture_barrier / GL_NV_texture_barrier on i965
 GL_ARB_texture_query_lod on softpipe
 GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips
+GL_ARB_gpu_shader5 on r600 for Evergreen and later chips
 
 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index efb4889..32ce76a 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -305,7 +305,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 
 	case PIPE_CAP_GLSL_FEATURE_LEVEL:
 		if (family >= CHIP_CEDAR)
-		   return 330;
+		   return 410;
 		/* pre-evergreen geom shaders need newer kernel */
 		if (rscreen->b.info.drm_minor >= 37)
 		   return 330;
-- 
1.9.1
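The r600_pipe.c hunk raises the GLSL feature level the driver reports, while the patch notes say the state tracker still clamps what is finally exposed. The sketch below models both steps in plain C. It is a simplification under stated assumptions: the function names and the is_evergreen_or_later flag are hypothetical, the 140 fallback for old kernels is not shown in the patch, and the clamp rule (no tessellation means no more than 3.30) is an assumed reading of the note rather than the state tracker's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Driver side: mirrors the PIPE_CAP_GLSL_FEATURE_LEVEL case after the patch.
 * `is_evergreen_or_later` stands in for (family >= CHIP_CEDAR). */
static int driver_glsl_level(bool is_evergreen_or_later, int drm_minor)
{
	assert(drm_minor >= 0);
	if (is_evergreen_or_later)
		return 410;
	/* pre-evergreen geometry shaders need a newer kernel */
	if (drm_minor >= 37)
		return 330;
	return 140; /* assumed fallback, not part of the quoted hunk */
}

/* State tracker side (assumed rule): without GL_ARB_tessellation_shader
 * the exposed GLSL version is held at 3.30 even if the driver reports more. */
static int exposed_glsl_level(int driver_level, bool has_tessellation_shader)
{
	int cap = has_tessellation_shader ? driver_level : 330;

	return driver_level < cap ? driver_level : cap;
}
```

Under these assumptions an Evergreen chip reports 410 from the driver, but an application querying the context would still see 3.30 until tessellation support is in place.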
Re: [Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension
On Wed, 07 Oct 2015 19:59:03 +0200, Benjamin Bellec <b.bel...@gmail.com> wrote:

On 07/10/2015 19:13, Glenn Kennard wrote:

On Wed, 07 Oct 2015 19:04:15 +0200, Benjamin Bellec <b.bel...@gmail.com> wrote:

Hi Glenn,
The series doesn't apply on current master.
Regards.

It's not meant to apply directly on master. Quoting from the notes in patch 1/2:
"This patch depends on prior patch: r600g/sb: Support gs5 sampler indexing"
/Glenn

OK sorry, I read too quickly. Is it normal that glxinfo still reports GLSL 330? With your series applied I still get:

OpenGL renderer string: Gallium 0.4 on AMD CYPRESS (DRM 2.42.0)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 11.1.0-devel (git-6ed8fd3)
OpenGL core profile shading language version string: 3.30

Quoting from the notes in patch 2/2:
"Note that gallium state tracker clamps the GLSL language / GL version since GL_ARB_tessellation_shader isn't implemented yet."
/Glenn
Re: [Mesa-dev] [PATCH 2/2] r600g: Enable GL_ARB_gpu_shader5 extension
On Wed, 07 Oct 2015 19:04:15 +0200, Benjamin Bellec wrote:

Hi Glenn,
The series doesn't apply on current master.
Regards.

It's not meant to apply directly on master. Quoting from the notes in patch 1/2:
"This patch depends on prior patch: r600g/sb: Support gs5 sampler indexing"
/Glenn
Re: [Mesa-dev] List of unsupported extensions per driver
On Tue, 29 Sep 2015 17:00:31 +0200, Marek Olšák wrote:

On Tue, Sep 29, 2015 at 4:48 PM, Romain Failliot wrote:

What I don't understand is that all the lines starting with a "-" seem to be part of the GL_ARB_gpu_shader5 extension. See the line here: http://cgit.freedesktop.org/mesa/mesa/tree/docs/GL3.txt#n99
If I'm right, it means that, considering Ilia's web site, GL_ARB_gpu_shader5 is unsupported by R600, but everything in its sublist is supported. You see why it is confusing?

No, not everything is supported. GS streams aren't.

Actually, GS streams were merged a few weeks ago. gpu_shader5 isn't enabled yet because SB doesn't support all the features yet, and games etc. getting unoptimized shaders when trying to use more modern GL4 features is not an acceptable regression. Specifically, sampler and UBO indexing need SB support. For the first I posted a patch, but it's not merged yet (it needs someone who can grok SB code to review it); for the second, well, no ETA, but work is happening on it.

GL3.txt doesn't tell the whole story, it's just a rough idea of what's going on feature-wise. For the details, inquire, like Romain just did :-)

Marek
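The reason gpu_shader5 could not simply be switched on is that r600_pipe_shader_create() skips the SB optimizer for any shader SB cannot handle, so such shaders run unoptimized. A minimal sketch of that gating is given below. The struct and function names are hypothetical; the three conditions follow the use_sb lines quoted in the sampler-indexing patch as they stood before SB learned about index registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the per-shader flags checked before running SB. */
struct shader_flags {
	bool is_geometry;          /* geometry shader */
	bool uses_index_registers; /* sampler/UBO array indexing via CF_INDEX_0/1 */
	bool uses_doubles;         /* fp64 operations */
};

/* Returns true when the shader may be passed through the SB optimizer. */
static bool may_use_sb(bool sb_requested, bool r700_or_older,
                       const struct shader_flags *s)
{
	bool use_sb = sb_requested;

	assert(s != NULL);

	/* On R700 and older, geometry shaders bypass SB. */
	if (r700_or_older)
		use_sb &= !s->is_geometry;
	/* Before the SB patches, index-register users were excluded too. */
	use_sb &= !s->uses_index_registers;
	/* Shaders using doubles are excluded as well. */
	use_sb &= !s->uses_doubles;

	return use_sb;
}
```

Each SB patch in the series removes one of these exclusions, which is what makes enabling the extension safe from a performance point of view.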
[Mesa-dev] [PATCH] r600g/sb: Support gs5 sampler indexing
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Just UBO support left before gs5 can be enabled.
Could improve how the two index registers are set/used to reduce the
number of clauses, but as is, it's about as good as what the blob emits.

 src/gallium/drivers/r600/r600_shader.c       |  12 ++-
 src/gallium/drivers/r600/r600_shader.h       |   4 +-
 src/gallium/drivers/r600/sb/sb_bc.h          |  10 ++-
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp   |  17 +++-
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp |  50 +++-
 src/gallium/drivers/r600/sb/sb_gcm.cpp       |  11 ++-
 src/gallium/drivers/r600/sb/sb_sched.cpp     | 118 +--
 src/gallium/drivers/r600/sb/sb_sched.h       |   5 +-
 8 files changed, 201 insertions(+), 26 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 1d90582..24c3d43 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -166,8 +166,8 @@ int r600_pipe_shader_create(struct pipe_context *ctx,
 	if (rctx->b.chip_class <= R700) {
 		use_sb &= (shader->shader.processor_type != TGSI_PROCESSOR_GEOMETRY);
 	}
-	/* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */
-	use_sb &= !shader->shader.uses_index_registers;
+	/* disable SB for shaders using ubo array indexing as it doesn't handle those currently */
+	use_sb &= !shader->shader.uses_ubo_indexing;
 	/* disable SB for shaders using doubles */
 	use_sb &= !shader->shader.uses_doubles;

@@ -1251,7 +1251,7 @@ static int tgsi_split_constant(struct r600_shader_ctx *ctx)
 		}

 		if (ctx->src[i].kc_rel)
-			ctx->shader->uses_index_registers = true;
+			ctx->shader->uses_ubo_indexing = true;

 		if (ctx->src[i].rel) {
 			int chan = inst->Src[i].Indirect.Swizzle;
@@ -1912,7 +1912,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,

 	shader->uses_doubles = ctx.info.uses_doubles;

-	indirect_gprs = ctx.info.indirect_files & ~(1 << TGSI_FILE_CONSTANT);
+	indirect_gprs = ctx.info.indirect_files & ~((1 << TGSI_FILE_CONSTANT) | (1 << TGSI_FILE_SAMPLER));
 	tgsi_parse_init(&ctx.parse, tokens);
 	ctx.type = ctx.info.processor;
 	shader->processor_type = ctx.type;
@@ -1936,7 +1936,7 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
 	ctx.gs_next_vertex = 0;
 	ctx.gs_stream_output_info = &so;

-	shader->uses_index_registers = false;
+	shader->uses_ubo_indexing = false;
 	ctx.face_gpr = -1;
 	ctx.fixed_pt_position_gpr = -1;
 	ctx.fragcoord_input = -1;
@@ -5703,8 +5703,6 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 		sampler_src_reg = 3;

 	sampler_index_mode = inst->Src[sampler_src_reg].Indirect.Index == 2 ? 2 : 0; // CF_INDEX_1 : CF_INDEX_NONE
-	if (sampler_index_mode)
-		ctx->shader->uses_index_registers = true;

 	src_gpr = tgsi_tex_get_src_gpr(ctx, 0);

diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h
index 48de9cd..8ba32ae 100644
--- a/src/gallium/drivers/r600/r600_shader.h
+++ b/src/gallium/drivers/r600/r600_shader.h
@@ -75,8 +75,8 @@ struct r600_shader {
 	boolean has_txq_cube_array_z_comp;
 	boolean uses_tex_buffers;
 	boolean gs_prim_id_input;
-	/* Temporarily workaround SB not handling CF_INDEX_[01] index registers */
-	boolean uses_index_registers;
+	/* Temporarily workaround SB not handling ubo indexing */
+	boolean uses_ubo_indexing;

 	/* Size in bytes of a data item in the ring(s) (single vertex data).
 	   Stages with only one ring items 123 will be set to 0. */
diff --git a/src/gallium/drivers/r600/sb/sb_bc.h b/src/gallium/drivers/r600/sb/sb_bc.h
index ab988f8..126750d 100644
--- a/src/gallium/drivers/r600/sb/sb_bc.h
+++ b/src/gallium/drivers/r600/sb/sb_bc.h
@@ -48,6 +48,7 @@ class fetch_node;
 class alu_group_node;
 class region_node;
 class shader;
+class value;

 class sb_ostream {
 public:
@@ -818,13 +819,16 @@ class bc_parser {

 	bool gpr_reladdr;
+	// Note: currently relies on input emitting SET_CF in same basic block as uses
+	value *cf_index_value[2];
+	alu_node *mova;

 public:

 	bc_parser(sb_context &sctx, r600_bytecode *bc, r600_shader* pshader) :
 		ctx(sctx), dec(), bc(bc), pshader(pshader),
 		dw(), bc_ndw(), max_cf(),
 		sh(), error(), slots(), cgroup(),
-		cf_map(), loop_stack(), gpr_reladdr(
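The indirect_gprs change in r600_shader_from_tgsi masks out the register files whose indirect accesses never need relative GPR addressing. A minimal standalone sketch of that bit test follows. The enum values and the function name are hypothetical stand-ins, not the real TGSI_FILE_* numbering or any driver function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for TGSI_FILE_*; the real enum values differ. */
enum reg_file {
	FILE_CONSTANT = 1,
	FILE_TEMPORARY = 2,
	FILE_SAMPLER = 3
};

/* True if any indirectly addressed file other than constants and
 * samplers is set in the bitmask, i.e. GPRs need relative addressing. */
static bool needs_indirect_gprs(uint32_t indirect_files)
{
	const uint32_t not_gpr_backed =
		(1u << FILE_CONSTANT) | (1u << FILE_SAMPLER);

	return (indirect_files & ~not_gpr_backed) != 0;
}
```

With this shape, a shader that only indexes samplers indirectly reports no indirect GPR use, which is the case the patch is after.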
Re: [Mesa-dev] [PATCH] st/mesa: avoid integer overflows with buffers >= 512MB
On Wed, 16 Sep 2015 01:32:10 +0200, Ilia Mirkin <imir...@alum.mit.edu> wrote: This fixes failures with the newly-submitted max-size texture buffer piglit test for GPUs exposing >= 128M max texels. Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> Cc: "10.6 11.0" <mesa-sta...@lists.freedesktop.org> --- src/mesa/state_tracker/st_atom_texture.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/state_tracker/st_atom_texture.c b/src/mesa/state_tracker/st_atom_texture.c index 31e0f6b..62312af 100644 --- a/src/mesa/state_tracker/st_atom_texture.c +++ b/src/mesa/state_tracker/st_atom_texture.c @@ -264,7 +264,7 @@ st_create_texture_sampler_view_from_stobj(struct pipe_context *pipe, format); if (stObj->pt->target == PIPE_BUFFER) { - unsigned base, size; + uint64_t base, size; unsigned f, n; const struct util_format_description *desc = util_format_description(templ.format); Tested / Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
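To see why widening base and size matters, here is a small sketch of the wraparound. It assumes the overflow comes from scaling a byte count by 8 (bytes to bits) before dividing by the texel size; that computation is not shown in the hunk above, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Byte count to texel count with a 32-bit intermediate:
 * bytes * 8 wraps around once bytes reaches 512 MiB. */
static uint32_t texels_32(uint32_t bytes, uint32_t bits_per_texel)
{
	return (bytes * 8u) / bits_per_texel;
}

/* Same computation with a 64-bit intermediate; no wraparound. */
static uint32_t texels_64(uint64_t bytes, uint32_t bits_per_texel)
{
	return (uint32_t)((bytes * 8u) / bits_per_texel);
}
```

For a 512 MiB buffer of 32-bit texels, the 64-bit version yields 128M texels, which lines up with the ">= 128M max texels" threshold in the commit message, while the 32-bit version wraps to zero.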
Re: [Mesa-dev] [PATCH v2 2/5] gallium: add PIPE_CAP_TGSI_TXQS to let st know if TXQS is supported
lium/drivers/nouveau/nvc0/nvc0_screen.c @@ -200,6 +200,7 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_VERTEXID_NOBASE: case PIPE_CAP_RESOURCE_FROM_USER_MEMORY: case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: + case PIPE_CAP_TGSI_TXQS: return 0; case PIPE_CAP_VENDOR_ID: diff --git a/src/gallium/drivers/r300/r300_screen.c b/src/gallium/drivers/r300/r300_screen.c index 4ca0b26..e669ba2 100644 --- a/src/gallium/drivers/r300/r300_screen.c +++ b/src/gallium/drivers/r300/r300_screen.c @@ -195,6 +195,7 @@ static int r300_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_FLOAT_LINEAR: case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: case PIPE_CAP_DEPTH_BOUNDS_TEST: +case PIPE_CAP_TGSI_TXQS: return 0; /* SWTCL-only features. */ diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index fd9c16c..dfbf0e5 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -341,6 +341,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_VERTEXID_NOBASE: case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: case PIPE_CAP_DEPTH_BOUNDS_TEST: + case PIPE_CAP_TGSI_TXQS: return 0; /* Stream output. 
*/ diff --git a/src/gallium/drivers/radeonsi/si_pipe.c b/src/gallium/drivers/radeonsi/si_pipe.c index 9094427..ae1ff7e 100644 --- a/src/gallium/drivers/radeonsi/si_pipe.c +++ b/src/gallium/drivers/radeonsi/si_pipe.c @@ -325,6 +325,7 @@ static int si_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_GATHER_OFFSETS: case PIPE_CAP_SAMPLER_VIEW_TARGET: case PIPE_CAP_VERTEXID_NOBASE: + case PIPE_CAP_TGSI_TXQS: return 0; case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: diff --git a/src/gallium/drivers/softpipe/sp_screen.c b/src/gallium/drivers/softpipe/sp_screen.c index 7ca8a67..d8606f3 100644 --- a/src/gallium/drivers/softpipe/sp_screen.c +++ b/src/gallium/drivers/softpipe/sp_screen.c @@ -246,6 +246,7 @@ softpipe_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_DEVICE_RESET_STATUS_QUERY: case PIPE_CAP_MAX_SHADER_PATCH_VARYINGS: case PIPE_CAP_DEPTH_BOUNDS_TEST: + case PIPE_CAP_TGSI_TXQS: return 0; } /* should only get here on unhandled cases */ diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c index f2ae40b..44b6f4a 100644 --- a/src/gallium/drivers/svga/svga_screen.c +++ b/src/gallium/drivers/svga/svga_screen.c @@ -379,6 +379,7 @@ svga_get_param(struct pipe_screen *screen, enum pipe_cap param) case PIPE_CAP_TEXTURE_FLOAT_LINEAR: case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: case PIPE_CAP_DEPTH_BOUNDS_TEST: + case PIPE_CAP_TGSI_TXQS: return 0; } diff --git a/src/gallium/drivers/vc4/vc4_screen.c b/src/gallium/drivers/vc4/vc4_screen.c index 2dee1d4..c4b52e1 100644 --- a/src/gallium/drivers/vc4/vc4_screen.c +++ b/src/gallium/drivers/vc4/vc4_screen.c @@ -180,6 +180,7 @@ vc4_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_TEXTURE_FLOAT_LINEAR: case PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR: case PIPE_CAP_DEPTH_BOUNDS_TEST: + case PIPE_CAP_TGSI_TXQS: return 0; /* Stream output. 
*/ diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h index 88e37e9..47fa82a 100644 --- a/src/gallium/include/pipe/p_defines.h +++ b/src/gallium/include/pipe/p_defines.h @@ -630,6 +630,7 @@ enum pipe_cap PIPE_CAP_TEXTURE_FLOAT_LINEAR, PIPE_CAP_TEXTURE_HALF_FLOAT_LINEAR, PIPE_CAP_DEPTH_BOUNDS_TEST, + PIPE_CAP_TGSI_TXQS, }; #define PIPE_QUIRK_TEXTURE_BORDER_COLOR_SWIZZLE_NV50 (1 << 0) Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] r600g: lower number of driver const buffers
Series is: Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
[Mesa-dev] [PATCH v2 2/2] r600: Enable fp64 on chips with native support
Cypress/Cayman/Aruba only; earlier r6xx/r7xx chips support just a subset
of the needed fp64 ops, and don't do GL4 anyway.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Changes since v1:
Updated commit message

 docs/GL3.txt                         | 4 ++--
 docs/relnotes/11.1.0.html            | 2 +-
 src/gallium/drivers/r600/r600_pipe.c | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 8ad1aac..7247eb6 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -109,7 +109,7 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
   - Enhanced per-sample shading                        DONE (r600)
   - Interpolation functions                            DONE (r600)
   - New overload resolution rules                      DONE
-  GL_ARB_gpu_shader_fp64                               DONE (llvmpipe, softpipe)
+  GL_ARB_gpu_shader_fp64                               DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                                DONE (i965, nv50, r600)
   GL_ARB_shader_subroutine                             DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_tessellation_shader                           DONE ()
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
   GL_ARB_get_program_binary                            DONE (0 binary formats)
   GL_ARB_separate_shader_objects                       DONE (all drivers)
   GL_ARB_shader_precision                              DONE (all drivers that support GLSL 4.10)
-  GL_ARB_vertex_attrib_64bit                           DONE (llvmpipe, softpipe)
+  GL_ARB_vertex_attrib_64bit                           DONE (r600, llvmpipe, softpipe)
   GL_ARB_viewport_array                                DONE (i965, nv50, r600, llvmpipe)

diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 4b56f69..f7ff74a 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_texture_query_lod on softpipe
-TBD.
+GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips

 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index fd9c16c..a18ec49 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -500,6 +500,9 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e
 			return PIPE_SHADER_IR_TGSI;
 		}
 	case PIPE_SHADER_CAP_DOUBLES:
+		if (rscreen->b.family == CHIP_CYPRESS ||
+			rscreen->b.family == CHIP_CAYMAN || rscreen->b.family == CHIP_ARUBA)
+			return 1;
 		return 0;
 	case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
 	case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
--
1.9.1
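The capability check in this patch boils down to a per-family allow list. A minimal sketch of that shape is below. The enum is a made-up subset for illustration (the driver's family enum is much larger and uses CHIP_* names), and the function is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical subset of chip families, for illustration only. */
enum chip_family {
	FAMILY_RV770,
	FAMILY_CYPRESS,
	FAMILY_REDWOOD,
	FAMILY_CAYMAN,
	FAMILY_ARUBA
};

/* Reports fp64 only for the three families the patch names;
 * everything else falls through to "unsupported". */
static bool family_has_native_fp64(enum chip_family family)
{
	switch (family) {
	case FAMILY_CYPRESS:
	case FAMILY_CAYMAN:
	case FAMILY_ARUBA:
		return true;
	default:
		return false;
	}
}
```

Defaulting to false keeps any family added later out of the fp64 path until someone opts it in explicitly.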
[Mesa-dev] [PATCH v2 1/2] r600g: Support I2D/U2D/D2I/D2U
Only for Cypress/Cayman/Aruba; older chips have only partial fp64 support.
Uses float intermediate values, so it is only accurate for the int24 range,
which matches what the blob does.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Changes since v1:
Split into two functions
Make names a bit clearer which chips they apply to
Fix mixup of INT_TO_FLT/UINT_TO_FLT for eg opcode table
Updated commit message

 src/gallium/drivers/r600/r600_shader.c | 106 ++---
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index f2c9e16..41cb226 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3058,6 +3058,96 @@ static int tgsi_dfracexp(struct r600_shader_ctx *ctx)
 	return 0;
 }

+
+static int egcm_int_to_double(struct r600_shader_ctx *ctx)
+{
+	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+	struct r600_bytecode_alu alu;
+	int i, r;
+	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+	assert(inst->Instruction.Opcode == TGSI_OPCODE_I2D ||
+		inst->Instruction.Opcode == TGSI_OPCODE_U2D);
+
+	for (i = 0; i <= (lasti+1)/2; i++) {
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		alu.op = ctx->inst_info->op;
+
+		r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
+		alu.dst.sel = ctx->temp_reg;
+		alu.dst.chan = i;
+		alu.dst.write = 1;
+		alu.last = 1;
+
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
+	for (i = 0; i <= lasti; i++) {
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		alu.op = ALU_OP1_FLT32_TO_FLT64;
+
+		alu.src[0].chan = i/2;
+		if (i%2 == 0)
+			alu.src[0].sel = ctx->temp_reg;
+		else {
+			alu.src[0].sel = V_SQ_ALU_SRC_LITERAL;
+			alu.src[0].value = 0x0;
+		}
+		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+		alu.last = i == lasti;
+
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
+static int egcm_double_to_int(struct r600_shader_ctx *ctx)
+{
+	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+	struct r600_bytecode_alu alu;
+	int i, r;
+	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+	assert(inst->Instruction.Opcode == TGSI_OPCODE_D2I ||
+		inst->Instruction.Opcode == TGSI_OPCODE_D2U);
+
+	for (i = 0; i <= lasti; i++) {
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		alu.op = ALU_OP1_FLT64_TO_FLT32;
+
+		r600_bytecode_src(&alu.src[0], &ctx->src[0], fp64_switch(i));
+		alu.dst.chan = i;
+		alu.dst.sel = ctx->temp_reg;
+		alu.dst.write = i%2 == 0;
+		alu.last = i == lasti;
+
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
+	for (i = 0; i <= (lasti+1)/2; i++) {
+		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		alu.op = ctx->inst_info->op;
+
+		alu.src[0].chan = i*2;
+		alu.src[0].sel = ctx->temp_reg;
+		tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+		alu.last = 1;
+
+		r = r600_bytecode_add_alu(ctx->bc, &alu);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
 static int cayman_emit_double_instr(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8150,10 +8240,10 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_DFRAC]	= { ALU_OP1_FRACT_64, tgsi_op2_64},
 	[TGSI_OPCODE_DLDEXP]	= { ALU_OP2_LDEXP_64, tgsi_op2_64},
 	[TGSI_OPCODE_DFRACEXP]	= { ALU_OP1_FREXP_64, tgsi_dfracexp},
-	[TGSI_OPCODE_D2I]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_I2D]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_D2U]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_U2D]	= { ALU_OP0_NOP, tgsi_unsupported},
+	[TGSI_OPCODE_D2I]	= { ALU_OP1_FLT_TO_INT, egcm_double_to_int},
+	[TGSI_OPCODE_I2D]	= { ALU_OP1_INT_TO_FLT, egcm_int_to_double},
+	[TGSI_OPCODE_D2U]	= { ALU_OP1_FLT_TO_UINT, egcm_double_to_int},
+	[TGSI_OPCODE_U2D]	= { ALU_OP1_UINT_TO_FLT, egcm_int_to_double},
 	[TGSI_OPCODE_DRSQ]	= { ALU_OP2_RECIPSQRT_64, cayman_emit_double_instr},
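The "only accurate for int24 range" caveat follows directly from going through a 32-bit float, which carries 24 significant bits. A minimal CPU-side sketch of the same two-step conversion (integer to float, then float to double) is below. It assumes IEEE 754 floats with the default round-to-nearest mode, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Converts an integer to double via a 32-bit float intermediate,
 * mirroring an INT_TO_FLT followed by FLT32_TO_FLT64 sequence. */
static double int_to_double_via_float(int32_t v)
{
	float f = (float)v;   /* rounds to 24 significant bits */

	return (double)f;     /* widening to double is exact */
}
```

Values up to 2^24 in magnitude survive unchanged, while 16777217 (2^24 + 1) comes back as 16777216.0 because the float step has already rounded it.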
[Mesa-dev] [PATCH 2/2] r600: Enable fp64 on chips with native support
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 docs/GL3.txt                         | 4 ++--
 docs/relnotes/11.1.0.html            | 2 +-
 src/gallium/drivers/r600/r600_pipe.c | 3 +++
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 8ad1aac..7247eb6 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -109,7 +109,7 @@ GL 4.0, GLSL 4.00 --- all DONE: nvc0, radeonsi
   - Enhanced per-sample shading                        DONE (r600)
   - Interpolation functions                            DONE (r600)
   - New overload resolution rules                      DONE
-  GL_ARB_gpu_shader_fp64                               DONE (llvmpipe, softpipe)
+  GL_ARB_gpu_shader_fp64                               DONE (r600, llvmpipe, softpipe)
   GL_ARB_sample_shading                                DONE (i965, nv50, r600)
   GL_ARB_shader_subroutine                             DONE (i965, nv50, r600, llvmpipe, softpipe)
   GL_ARB_tessellation_shader                           DONE ()
@@ -127,7 +127,7 @@ GL 4.1, GLSL 4.10 --- all DONE: nvc0, radeonsi
   GL_ARB_get_program_binary                            DONE (0 binary formats)
   GL_ARB_separate_shader_objects                       DONE (all drivers)
   GL_ARB_shader_precision                              DONE (all drivers that support GLSL 4.10)
-  GL_ARB_vertex_attrib_64bit                           DONE (llvmpipe, softpipe)
+  GL_ARB_vertex_attrib_64bit                           DONE (r600, llvmpipe, softpipe)
   GL_ARB_viewport_array                                DONE (i965, nv50, r600, llvmpipe)

diff --git a/docs/relnotes/11.1.0.html b/docs/relnotes/11.1.0.html
index 4b56f69..f7ff74a 100644
--- a/docs/relnotes/11.1.0.html
+++ b/docs/relnotes/11.1.0.html
@@ -45,7 +45,7 @@ Note: some of the new features are only available with certain drivers.
 GL_ARB_texture_query_lod on softpipe
-TBD.
+GL_ARB_gpu_shader_fp64 on r600 for Cypress/Cayman/Aruba chips

 Bug fixes
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index fd9c16c..a18ec49 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -500,6 +500,9 @@ static int r600_get_shader_param(struct pipe_screen* pscreen, unsigned shader, e
 			return PIPE_SHADER_IR_TGSI;
 		}
 	case PIPE_SHADER_CAP_DOUBLES:
+		if (rscreen->b.family == CHIP_CYPRESS ||
+			rscreen->b.family == CHIP_CAYMAN || rscreen->b.family == CHIP_ARUBA)
+			return 1;
 		return 0;
 	case PIPE_SHADER_CAP_TGSI_DROUND_SUPPORTED:
 	case PIPE_SHADER_CAP_TGSI_DFRACEXP_DLDEXP_SUPPORTED:
--
1.9.1
[Mesa-dev] [PATCH 1/2] r600g: Support I2D/U2D/D2I/D2U
int <-> float <-> double conversion, matches what the blob does

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 95 +++---
 1 file changed, 87 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index f2c9e16..1c642fd 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -3058,6 +3058,85 @@ static int tgsi_dfracexp(struct r600_shader_ctx *ctx)
 	return 0;
 }

+static int cypress_int_double(struct r600_shader_ctx *ctx)
+{
+	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
+	struct r600_bytecode_alu alu;
+	int i, r;
+	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
+
+	if (inst->Instruction.Opcode == TGSI_OPCODE_I2D ||
+		inst->Instruction.Opcode == TGSI_OPCODE_U2D) {
+
+		for (i = 0; i <= (lasti+1)/2; i++) {
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+			alu.op = ctx->inst_info->op;
+
+			r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
+			alu.dst.sel = ctx->temp_reg;
+			alu.dst.chan = i;
+			alu.dst.write = 1;
+			alu.last = 1;
+
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
+
+		for (i = 0; i <= lasti; i++) {
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+			alu.op = ALU_OP1_FLT32_TO_FLT64;
+
+			alu.src[0].chan = i/2;
+			if (i%2 == 0)
+				alu.src[0].sel = ctx->temp_reg;
+			else {
+				alu.src[0].sel = V_SQ_ALU_SRC_LITERAL;
+				alu.src[0].value = 0x0;
+			}
+			tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
+			alu.last = i == lasti;
+
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
+	}
+	else { // D2I/D2U
+
+		for (i = 0; i <= lasti; i++) {
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+			alu.op = ALU_OP1_FLT64_TO_FLT32;
+
+			r600_bytecode_src(&alu.src[0], &ctx->src[0], fp64_switch(i));
+			alu.dst.chan = i;
+			alu.dst.sel = ctx->temp_reg;
+			alu.dst.write = i%2 == 0;
+			alu.last = i == lasti;
+
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
+
+		for (i = 0; i <= (lasti+1)/2; i++) {
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+			alu.op = ctx->inst_info->op;
+
+			alu.src[0].chan = i*2;
+			alu.src[0].sel = ctx->temp_reg;
+			tgsi_dst(ctx, &inst->Dst[0], 0, &alu.dst);
+			alu.last = 1;
+
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
+	}
+
+	return 0;
+}
+
 static int cayman_emit_double_instr(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
@@ -8150,10 +8229,10 @@ static const struct r600_shader_tgsi_instruction eg_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_DFRAC]	= { ALU_OP1_FRACT_64, tgsi_op2_64},
 	[TGSI_OPCODE_DLDEXP]	= { ALU_OP2_LDEXP_64, tgsi_op2_64},
 	[TGSI_OPCODE_DFRACEXP]	= { ALU_OP1_FREXP_64, tgsi_dfracexp},
-	[TGSI_OPCODE_D2I]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_I2D]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_D2U]	= { ALU_OP0_NOP, tgsi_unsupported},
-	[TGSI_OPCODE_U2D]	= { ALU_OP0_NOP, tgsi_unsupported},
+	[TGSI_OPCODE_D2I]	= { ALU_OP1_FLT_TO_INT, cypress_int_double},
+	[TGSI_OPCODE_I2D]	= { ALU_OP1_INT_TO_FLT, cypress_int_double},
+	[TGSI_OPCODE_D2U]	= { ALU_OP1_FLT_TO_UINT, cypress_int_double},
+	[TGSI_OPCODE_U2D]	= { ALU_OP1_INT_TO_FLT, cypress_int_double},
 	[TGSI_OPCODE_DRSQ]	= { ALU_OP2_RECIPSQRT_64, cayman_emit_double_instr},
 	[TGSI_OPCODE_LAST]	= { ALU_OP0_NOP, tgsi_unsupported},
 };
@@ -8372,10 +8451,10 @@ static const struct r600_shader_tgsi_instruction cm_shader_tgsi_instruction[] =
 	[TGSI_OPCODE_DFRAC]	= { ALU_OP1_FRACT_64, tgsi_op2_64},
Re: [Mesa-dev] [PATCH] r600/sb: update last_cf for finalize if.
On Mon, 31 Aug 2015 06:23:38 +0200, Dave Airlie <airl...@gmail.com> wrote: From: Dave Airlie <airl...@redhat.com> As Glenn did for finalize_loop we need to update_cf when we add a POP at the end of a shader. I think this fixes one of the earlier shader going off end of memory problems we've stopped. Signed-off-by: Dave Airlie <airl...@redhat.com> --- src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp index e8ed5a2..726e438 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp @@ -199,6 +199,9 @@ void bc_finalizer::finalize_if(region_node* r) { cf_node *if_jump = sh.create_cf(CF_OP_JUMP); cf_node *if_pop = sh.create_cf(CF_OP_POP); + if (!last_cf || last_cf->get_parent_region() == r) { + last_cf = if_pop; + } if_pop->bc.pop_count = 1; if_pop->jump_after(if_pop); Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600: port si_conv_prim_to_gs_out from radeonsi
On Fri, 28 Aug 2015 02:47:44 +0200, Dave Airlie <airl...@gmail.com> wrote:

From: Dave Airlie <airl...@redhat.com>

This code was broken by the tess merge, and I totally missed it was
broken until now. I'm not sure this fixes anything, but it stops the assert.

Cc: "11.0" <mesa-sta...@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/r600_pipe.h | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 384ba80..3247aba 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -939,21 +939,22 @@ static inline bool r600_can_read_depth(struct r600_texture *rtex)
 static inline unsigned r600_conv_prim_to_gs_out(unsigned mode)
 {
 	static const int prim_conv[] = {
-		V_028A6C_OUTPRIM_TYPE_POINTLIST,
-		V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-		V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-		V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-		V_028A6C_OUTPRIM_TYPE_LINESTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP,
-		V_028A6C_OUTPRIM_TYPE_TRISTRIP
+		[PIPE_PRIM_POINTS]                   = V_028A6C_OUTPRIM_TYPE_POINTLIST,
+		[PIPE_PRIM_LINES]                    = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_LOOP]                = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_STRIP]               = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_TRIANGLES]                = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_STRIP]           = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_FAN]             = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_QUADS]                    = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_QUAD_STRIP]               = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_POLYGON]                  = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_LINES_ADJACENCY]          = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_LINE_STRIP_ADJACENCY]     = V_028A6C_OUTPRIM_TYPE_LINESTRIP,
+		[PIPE_PRIM_TRIANGLES_ADJACENCY]      = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_TRIANGLE_STRIP_ADJACENCY] = V_028A6C_OUTPRIM_TYPE_TRISTRIP,
+		[PIPE_PRIM_PATCHES]                  = V_028A6C_OUTPRIM_TYPE_POINTLIST,
+		[R600_PRIM_RECTANGLE_LIST]           = V_028A6C_OUTPRIM_TYPE_TRISTRIP
 	};
 	assert(mode < Elements(prim_conv));

A dup of si_conv_prim_to_gs_out(), but probably not worth the hassle of sharing.

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
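The switch from a positional table to designated initializers is what keeps the lookup in step with the primitive enum when entries are added. A minimal sketch of the idea follows, using a hypothetical four-entry enum rather than the real PIPE_PRIM_* and V_028A6C_* values.

```c
/* Hypothetical primitive enum; PRIM_PATCHES stands in for a newly added entry. */
enum prim {
	PRIM_POINTS,
	PRIM_LINES,
	PRIM_PATCHES,
	PRIM_TRIANGLES,
	PRIM_COUNT
};

enum out_prim { OUT_POINTLIST, OUT_LINESTRIP, OUT_TRISTRIP };

/* Each slot is keyed by its enum value, so adding or reordering enum
 * entries cannot silently shift the other mappings out of place. */
static const enum out_prim prim_conv[PRIM_COUNT] = {
	[PRIM_POINTS]    = OUT_POINTLIST,
	[PRIM_LINES]     = OUT_LINESTRIP,
	[PRIM_PATCHES]   = OUT_POINTLIST,
	[PRIM_TRIANGLES] = OUT_TRISTRIP,
};

static enum out_prim conv_prim_to_gs_out(enum prim mode)
{
	return prim_conv[mode];
}
```

Sizing the array by PRIM_COUNT also means a bounds assert on the table length keeps holding when the enum grows.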
[Mesa-dev] [PATCH 1/3] r600g/sb: Handle undef in read port tracker
e8e443 added a check for undef values but missed the unreserve function,
which led to an assert triggering.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 6268078..c98b8ff 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -236,7 +236,7 @@ void rp_gpr_tracker::unreserve(alu_node* n) {
 	for (i = 0; i < nsrc; ++i) {
 		value *v = n->src[i];
-		if (v->is_readonly())
+		if (v->is_readonly() || v->is_undef())
 			continue;
 		if (i == 1 && opt)
 			continue;
--
1.9.1
[Mesa-dev] [PATCH 3/3] r600g/sb: Don't crash on empty if jump target
Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 748aae2..c479927 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -792,6 +792,9 @@ int bc_parser::prepare_if(cf_node* c) {
 	assert(c->bc.addr-1 < cf_map.size());
 	cf_node *c_else = NULL, *end = cf_map[c->bc.addr];

+	if (!end)
+		return 0; // not quite sure how this happens, malformed input?
+
 	BCP_DUMP(
 		sblog << "parsing JUMP @" << c->bc.id;
 		sblog << "\n";
@@ -817,7 +820,7 @@ int bc_parser::prepare_if(cf_node* c) {
 	if (c_else->parent != c->parent)
 		c_else = NULL;

-	if (end->parent != c->parent)
+	if (end && end->parent != c->parent)
 		end = NULL;

 	region_node *reg = sh->create_region();
--
1.9.1
[Mesa-dev] [PATCH 2/3] r600g/sb: Don't read junk after EOP
For shaders that contain data after an instruction with EOP set, SB could
end up parsing that data as instructions. This led to various crashes and
asserts, since SB gets very confused if it sees, for instance, a loop start
instruction jumping off to some random point.

Add a couple of asserts, and print the EOP bit if set in the old asm printer.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_asm.c           | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_decoder.cpp | 1 +
 src/gallium/drivers/r600/sb/sb_bc_parser.cpp  | 4 +++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 762cc7f..b514c58 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -2029,6 +2029,8 @@ void r600_bytecode_disasm(struct r600_bytecode *bc)
 					fprintf(stderr, "CND:%X ", cf->cond);
 				if (cf->pop_count)
 					fprintf(stderr, "POP:%X ", cf->pop_count);
+				if (cf->end_of_program)
+					fprintf(stderr, "EOP ");
 				fprintf(stderr, "\n");
 			}
 		}
diff --git a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
index 5e233f9..5fe8f50 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_decoder.cpp
@@ -32,6 +32,7 @@ int bc_decoder::decode_cf(unsigned &i, bc_cf& bc) {
 	int r = 0;
 	uint32_t dw0 = dw[i];
 	uint32_t dw1 = dw[i+1];
+	assert(i+1 <= ndw);

 	if ((dw1 >> 29) & 1) { // CF_ALU
 		return decode_cf_alu(i, bc);
diff --git a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
index 4879c03..748aae2 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_parser.cpp
@@ -95,7 +95,7 @@ int bc_parser::decode_shader() {
 		if ((r = decode_cf(i, eop)))
 			return r;

-	} while (!eop || (i >> 1) <= max_cf);
+	} while (!eop || (i >> 1) < max_cf);

 	return 0;
 }
@@ -769,6 +769,7 @@ int bc_parser::prepare_ir() {
 }

 int bc_parser::prepare_loop(cf_node* c) {
+	assert(c->bc.addr-1 < cf_map.size());

 	cf_node *end = cf_map[c->bc.addr - 1];
 	assert(end->bc.op == CF_OP_LOOP_END);
@@ -788,6 +789,7 @@ int bc_parser::prepare_loop(cf_node* c) {
 }

 int bc_parser::prepare_if(cf_node* c) {
+	assert(c->bc.addr-1 < cf_map.size());
 	cf_node *c_else = NULL, *end = cf_map[c->bc.addr];

 	BCP_DUMP(
--
1.9.1
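The general idea behind the loop condition change, namely to stop decoding once EOP has been seen and every known branch target is covered, can be sketched in isolation. The struct and function below are hypothetical simplifications, not the hardware CF encoding or SB's actual decoder, and the exact stop condition here is this sketch's own choice.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified CF instruction. */
struct cf_insn {
	bool end_of_program;
	size_t jump_target;   /* 0 if the instruction does not branch */
};

/* Returns how many leading CF instructions belong to the program.
 * Decoding continues until EOP has been seen and no decoded instruction
 * branches at or past the current position; trailing junk is skipped. */
static size_t count_live_cf(const struct cf_insn *cf, size_t n)
{
	size_t i = 0;
	size_t max_target = 0;
	bool eop = false;

	while (i < n && (!eop || i <= max_target)) {
		if (cf[i].end_of_program)
			eop = true;
		if (cf[i].jump_target > max_target)
			max_target = cf[i].jump_target;
		i++;
	}
	return i;
}
```

The `i < n` bound plays the role of the added asserts: it keeps the walk inside the buffer even when the input is malformed.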
[Mesa-dev] [PATCH] r600g: Fix assert in tgsi_cmp
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=91726

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 src/gallium/drivers/r600/r600_shader.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 6cbfd1b..4c4b600 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -6151,10 +6151,10 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx)
 		r = tgsi_make_src_for_op3(ctx, temp_regs[0], i, &alu.src[0], &ctx->src[0]);
 		if (r)
 			return r;
-		r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[1], &ctx->src[2]);
+		r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[1], &ctx->src[2]);
 		if (r)
 			return r;
-		r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, &alu.src[2], &ctx->src[1]);
+		r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, &alu.src[2], &ctx->src[1]);
 		if (r)
 			return r;
 		tgsi_dst(ctx, &inst->Dst[0], i, &alu.dst);
--
1.9.1
[Mesa-dev] [PATCH] r600g: Fix handling of TGSI_OPCODE_ARR with SB
FLT_TO_INT goes in the vector pipes on evergreen/NI, not the trans unit
as on earlier chips.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
Fixes issue found on nine: https://github.com/iXit/Mesa-3D/issues/119

 src/gallium/drivers/r600/r600_isa.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h
index 381f06d..fdbe1c0 100644
--- a/src/gallium/drivers/r600/r600_isa.h
+++ b/src/gallium/drivers/r600/r600_isa.h
@@ -262,7 +262,7 @@ static const struct alu_op_info alu_op_table[] = {
 		{"PRED_SETNE_PUSH_INT", 2, { 0x4D, 0x4D },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_NE | AF_INT_CMP },
 		{"PRED_SETLT_PUSH_INT", 2, { 0x4E, 0x4E },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_LT | AF_INT_CMP },
 		{"PRED_SETLE_PUSH_INT", 2, { 0x4F, 0x4F },{ AF_VS, AF_VS, AF_VS, AF_VS}, AF_PRED_PUSH | AF_CC_LE | AF_INT_CMP },
-		{"FLT_TO_INT", 1, { 0x6B, 0x50 },{ AF_S, AF_S, AF_VS, AF_VS}, AF_INT_DST | AF_CVT },
+		{"FLT_TO_INT", 1, { 0x6B, 0x50 },{ AF_S, AF_S, AF_V, AF_V}, AF_INT_DST | AF_CVT },
 		{"BFREV_INT", 1, { -1, 0x51 },{ 0, 0, AF_VS, AF_VS}, AF_INT_DST },
 		{"ADDC_UINT", 2, { -1, 0x52 },{ 0, 0, AF_VS, AF_VS}, AF_UINT_DST },
 		{"SUBB_UINT", 2, { -1, 0x53 },{ 0, 0, AF_VS, AF_VS}, AF_UINT_DST },
--
1.9.1
Re: [Mesa-dev] [PATCH] r600g: fix sampler/ubo indexing on cayman
On Thu, 09 Jul 2015 07:37:59 +0200, Dave Airlie airl...@gmail.com wrote:

> From: Dave Airlie airl...@redhat.com
>
> Cayman needs a different method to upload the CF IDX0/1
>
> This fixes 31 piglits when ARB_gpu_shader5 is forced on with cayman.
>
> Signed-off-by: Dave Airlie airl...@redhat.com
> ---
>  src/gallium/drivers/r600/eg_asm.c | 17 +++--
>  src/gallium/drivers/r600/eg_sq.h  | 11 +++
>  2 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c
> index d04921e..c32d317 100644
> --- a/src/gallium/drivers/r600/eg_asm.c
> +++ b/src/gallium/drivers/r600/eg_asm.c
> @@ -161,6 +161,9 @@ int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_c
>      alu.op = ALU_OP1_MOVA_INT;
>      alu.src[0].sel = bc->index_reg[id];
>      alu.src[0].chan = 0;
> +    if (bc->chip_class == CAYMAN)
> +        alu.dst.sel = id == 0 ? CM_V_SQ_MOVA_DST_CF_IDX0 : CM_V_SQ_MOVA_DST_CF_IDX1;
> +
>      alu.last = 1;
>      r = r600_bytecode_add_alu(bc, &alu);
>      if (r)
> @@ -168,12 +171,14 @@ int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_c
>
>      bc->ar_loaded = 0; /* clobbered */

Could split ar_loaded into 3 bits for AR/IDX0/IDX1 for cayman, however I think it would be better to teach SB to handle sampler/ubo indexing and keep things simple here.

>
> -    memset(&alu, 0, sizeof(alu));
> -    alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
> -    alu.last = 1;
> -    r = r600_bytecode_add_alu(bc, &alu);
> -    if (r)
> -        return r;
> +    if (bc->chip_class == EVERGREEN) {
> +        memset(&alu, 0, sizeof(alu));
> +        alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
> +        alu.last = 1;
> +        r = r600_bytecode_add_alu(bc, &alu);
> +        if (r)
> +            return r;
> +    }
>
>      /* Must split ALU group as index only applies to following group */
>      if (inside_alu_clause) {
> diff --git a/src/gallium/drivers/r600/eg_sq.h b/src/gallium/drivers/r600/eg_sq.h
> index b534872..10caa07 100644
> --- a/src/gallium/drivers/r600/eg_sq.h
> +++ b/src/gallium/drivers/r600/eg_sq.h
> @@ -521,4 +521,15 @@
>  #define V_SQ_REL_ABSOLUTE 0
>  #define V_SQ_REL_RELATIVE 1
> +
> +/* CAYMAN has special encoding for MOVA_INT destination */
> +#define CM_V_SQ_MOVA_DST_AR_X 0
> +#define CM_V_SQ_MOVA_DST_CF_PC 1
> +#define CM_V_SQ_MOVA_DST_CF_IDX0 2
> +#define CM_V_SQ_MOVA_DST_CF_IDX1 3
> +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_7_0 4
> +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_15_8 5
> +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_23_16 6
> +#define CM_V_SQ_MOVA_DST_CF_CLAUSE_GLOBAL_31_24 7

Can't think of any useful cases for the cayman specific ALU global register. Drop these four?

> +
>  #endif

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
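The chip-dependent part of the patch above can be summarised in a small decision function. This is a sketch only: the CM_V_SQ_MOVA_DST_* values are taken from the quoted diff, while the local chip_class enum, struct index_load_plan and plan_index_load() are hypothetical names invented for the illustration. It also assumes the destination selector stays 0 on non-Cayman parts because the ALU struct is zeroed beforehand.

```c
#include <assert.h>

/* Values as listed in the quoted eg_sq.h hunk. */
#define CM_V_SQ_MOVA_DST_AR_X    0
#define CM_V_SQ_MOVA_DST_CF_IDX0 2
#define CM_V_SQ_MOVA_DST_CF_IDX1 3

/* Local stand-in for the driver's chip class enum. */
enum chip_class { R600, R700, EVERGREEN, CAYMAN };

/* Hypothetical summary of what must be emitted to load CF index register id. */
struct index_load_plan {
    unsigned mova_dst_sel;  /* destination selector on the MOVA_INT op */
    int needs_set_cf_idx;   /* 1 if a separate SET_CF_IDX0/1 op must follow */
};

static struct index_load_plan plan_index_load(enum chip_class cc, unsigned id)
{
    struct index_load_plan p = { 0, 0 };

    if (cc == CAYMAN) {
        /* Cayman: MOVA_INT writes the CF index register directly. */
        p.mova_dst_sel = id == 0 ? CM_V_SQ_MOVA_DST_CF_IDX0
                                 : CM_V_SQ_MOVA_DST_CF_IDX1;
    } else if (cc == EVERGREEN) {
        /* Evergreen: MOVA_INT loads AR, then SET_CF_IDX0/1 copies it over. */
        p.needs_set_cf_idx = 1;
    }
    return p;
}
```

On Cayman the plan is a single MOVA_INT with a special destination; on Evergreen it is MOVA_INT followed by SET_CF_IDX0 or SET_CF_IDX1.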
Re: [Mesa-dev] [PATCH] r600g: move sampler/ubo index registers before temp reg
On Thu, 09 Jul 2015 08:00:48 +0200, Dave Airlie airl...@gmail.com wrote:

> From: Dave Airlie airl...@redhat.com
>
> temp_reg needs to be last, as we increment things away from it,
> otherwise on cayman some tests were overwriting the index regs.
>
> Fixes 2 piglit with ARB_gpu_shader5 forced on cayman.
>
> Signed-off-by: Dave Airlie airl...@redhat.com
> ---
>  src/gallium/drivers/r600/r600_shader.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index af7622e..1a72bf6 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -1931,15 +1931,14 @@ static int r600_shader_from_tgsi(struct r600_context *rctx,
>      ctx.file_offset[TGSI_FILE_IMMEDIATE] = V_SQ_ALU_SRC_LITERAL;
>      ctx.bc->ar_reg = ctx.file_offset[TGSI_FILE_TEMPORARY] + ctx.info.file_max[TGSI_FILE_TEMPORARY] + 1;
> +    ctx.bc->index_reg[0] = ctx.bc->ar_reg + 1;
> +    ctx.bc->index_reg[1] = ctx.bc->ar_reg + 2;
> +
>      if (ctx.type == TGSI_PROCESSOR_GEOMETRY) {
> -        ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 1;
> -        ctx.temp_reg = ctx.bc->ar_reg + 2;
> -        ctx.bc->index_reg[0] = ctx.bc->ar_reg + 3;
> -        ctx.bc->index_reg[1] = ctx.bc->ar_reg + 4;
> +        ctx.gs_export_gpr_treg = ctx.bc->ar_reg + 3;
> +        ctx.temp_reg = ctx.bc->ar_reg + 4;
>      } else {
> -        ctx.temp_reg = ctx.bc->ar_reg + 1;
> -        ctx.bc->index_reg[0] = ctx.bc->ar_reg + 2;
> -        ctx.bc->index_reg[1] = ctx.bc->ar_reg + 3;
> +        ctx.temp_reg = ctx.bc->ar_reg + 3;
>      }
>
>      shader->max_arrays = 0;

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
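The register ordering after this patch can be sketched as a small layout function. The struct and function names below are hypothetical; only the offsets relative to ar_reg come from the quoted diff. The point it illustrates is that temp_reg always ends up as the highest reserved register, so temporaries allocated upwards from it cannot collide with the index registers.

```c
#include <assert.h>

/* Hypothetical container for the reserved register numbers. */
struct reg_layout {
    int ar_reg;
    int index_reg[2];
    int gs_export_gpr_treg; /* -1 when not a geometry shader */
    int temp_reg;
};

/* first_free: first GPR after the shader's TGSI temporaries (assumed input). */
static struct reg_layout layout_regs(int first_free, int is_geometry)
{
    struct reg_layout l;

    l.ar_reg = first_free;
    l.index_reg[0] = l.ar_reg + 1;
    l.index_reg[1] = l.ar_reg + 2;

    if (is_geometry) {
        l.gs_export_gpr_treg = l.ar_reg + 3;
        l.temp_reg = l.ar_reg + 4;
    } else {
        l.gs_export_gpr_treg = -1;
        l.temp_reg = l.ar_reg + 3;
    }
    return l;
}
```

With the old ordering, temp_reg sat below the index registers, so anything handed out as temp_reg + n could land on index_reg[0] or index_reg[1].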
Re: [Mesa-dev] [PATCH 3/4] r600/sb_sched: fix what appears to be a typo in a condition
On Fri, 05 Jun 2015 19:39:31 +0200, Marek Olšák mar...@gmail.com wrote:

> I'd like somebody who knows r600/sb to review this. Glenn, can I
> bother you please? :)
>
> Marek
>
> On Fri, Jun 5, 2015 at 2:31 PM, Martin Peres martin.pe...@linux.intel.com wrote:
>> Signed-off-by: Martin Peres martin.pe...@linux.intel.com
>> ---
>>  src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
>> index 2e38a62..6268078 100644
>> --- a/src/gallium/drivers/r600/sb/sb_sched.cpp
>> +++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
>> @@ -489,7 +489,7 @@ bool alu_group_tracker::try_reserve(alu_node* n) {
>>
>>      n->bc.bank_swizzle = 0;
>>
>> -    if (!trans & fbs)
>> +    if (!trans && fbs)
>>          n->bc.bank_swizzle = VEC_210;
>>
>>      if (gpr.try_reserve(n)) {
>> --
>> 2.4.2

In theory this changes behavior, but the function that sets fbs, forced_bank_swizzle(), currently only returns two values, VEC_012=0 or VEC_210=5, so the bit tested by the bitwise and coincides with the result of the logical operation. If using logical and instead silences gcc 5's irritable warning syndrome, it can get a

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
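The equivalence argument in the review can be checked mechanically. The sketch below uses the two swizzle values quoted in the reply (VEC_012 = 0, VEC_210 = 5); the helper names are made up, and the value 2 in the last comment is a purely hypothetical swizzle used to show where the two forms would diverge.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as stated in the review. */
enum { VEC_012 = 0, VEC_210 = 5 };

/* Old condition: bitwise and of (!trans), which is 0 or 1, with fbs. */
static bool cond_bitwise(bool trans, unsigned fbs)
{
    return (!trans & fbs) != 0;
}

/* New condition: logical and. */
static bool cond_logical(bool trans, unsigned fbs)
{
    return !trans && fbs;
}

/* For fbs in {0, 5} both forms agree, because 5 has bit 0 set.
 * For an even non-zero value such as 2 they would differ:
 * (1 & 2) is 0, while (1 && 2) is true. */
```

So the patch is behaviour-preserving only as long as every non-zero value returned for fbs has its lowest bit set, which holds for VEC_210.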
Re: [Mesa-dev] [PATCH 5/5] nir: Allow feq/fne/ieq/ine to be optimized with inot.
On Wed, 06 May 2015 23:12:54 +0200, Matt Turner matts...@gmail.com wrote:

> instructions in affected programs: 380 -> 376 (-1.05%)
> helped: 2
> ---
> Did we just completely forget these in commit 391fb32b, or is there a
> reason to not include them?
>
>  src/glsl/nir/nir_opt_algebraic.py | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index b0a1f24..400d60e 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -83,8 +83,12 @@ optimizations = [
>     # Comparison simplifications
>     (('inot', ('flt', a, b)), ('fge', a, b)),
>     (('inot', ('fge', a, b)), ('flt', a, b)),
> +   (('inot', ('feq', a, b)), ('fne', a, b)),
> +   (('inot', ('fne', a, b)), ('feq', a, b)),

These two will produce inverted results for NaN inputs. GLSL 4.5 spec doesn't mention requiring ieee754 compliant comparison operators though so probably okay.

>     (('inot', ('ilt', a, b)), ('ige', a, b)),
>     (('inot', ('ige', a, b)), ('ilt', a, b)),
> +   (('inot', ('ieq', a, b)), ('ine', a, b)),
> +   (('inot', ('ine', a, b)), ('ieq', a, b)),
>     (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
>     (('bcsel', ('flt', a, b), a, b), ('fmin', a, b)),
>     (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),

Patches 1-5 are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
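For readers who want to check how NaN interacts with rewrites of this shape, here is a small host-side C sketch. It assumes the compiler follows IEEE 754 comparison rules (no fast-math) and says nothing about what a particular GPU backend does with feq/fne; the helper names are invented. In C, negating an ordered comparison such as `<` is where a NaN operand changes the answer, whereas `!(a == b)` and `a != b` agree. Hardware whose not-equal is an ordered compare would behave differently, which is the kind of mismatch the review comment is concerned with.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* C-level analogues of the NIR patterns; names are illustrative only. */
static bool not_feq(float a, float b) { return !(a == b); }
static bool fne(float a, float b)     { return a != b; }
static bool not_flt(float a, float b) { return !(a < b); }
static bool fge(float a, float b)     { return a >= b; }

/* Returns true if replacing inot(feq) by fne gives the same result here. */
static bool eq_rewrite_matches(float a, float b)
{
    return not_feq(a, b) == fne(a, b);
}

/* Returns true if replacing inot(flt) by fge gives the same result here. */
static bool lt_rewrite_matches(float a, float b)
{
    return not_flt(a, b) == fge(a, b);
}
```

With ordinary numbers both rewrites match. With a NaN operand, the `<`/`>=` pair diverges under C semantics, and the `==`/`!=` pair only stays consistent if not-equal is the unordered comparison.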
Re: [Mesa-dev] [PATCH 00/19] gallium: basic tessellation support
On Sat, 02 May 2015 22:16:24 +0200, Ilia Mirkin imir...@alum.mit.edu wrote: This series adds tokens and updates some helper gallium functions to know about tessellation. This provides no actual support for tessellation in either core or drivers, however this will make it possible to work on the core and driver pieces without crazy interdependencies, as well as be landed separately and without (direct) dependency. Most of these patches have existed for about a year already, and have been part of my and Marek's trees enabling tessellation in the nvc0 and radeonsi drivers. I've taken this opportunity to fix up and fold some of them though. This should be pretty safe to land, since even if I messed something up, having this in-tree will make it easier for others to identify and fix any issues collaboratively. Ilia Mirkin (11): gallium: add tessellation shader types gallium: add new PATCHES primitive type gallium: add new semantics for tessellation gallium: add interfaces for controlling tess program state gallium: add tessellation shader properties gallium: add patch_vertices to draw info gallium: add set_tess_state to configure default tessellation parameters tgsi/scan: allow scanning tessellation shaders tgsi/sanity: set implicit in/out array sizes based on patch sizes tgsi/ureg: allow ureg_dst to have dimension indices tgsi/dump: fix declaration printing of tessellation inputs/outputs Marek Olšák (8): gallium: bump shader input and output limits trace: implement new tessellation functions gallium/util: print patch_vertices in util_dump_draw_info gallium/u_blitter: disable tessellation for all operations gallium/cso: add support for tessellation shaders gallium/cso: set NULL shaders at context destruction gallium: disable tessellation shaders for meta ops tgsi/ureg: use correct limit for max input count src/gallium/auxiliary/cso_cache/cso_context.c | 100 ++ src/gallium/auxiliary/cso_cache/cso_context.h | 12 src/gallium/auxiliary/hud/hud_context.c | 6 ++ 
src/gallium/auxiliary/postprocess/pp_run.c| 6 ++ src/gallium/auxiliary/tgsi/tgsi_dump.c| 20 +- src/gallium/auxiliary/tgsi/tgsi_info.c| 4 ++ src/gallium/auxiliary/tgsi/tgsi_sanity.c | 36 -- src/gallium/auxiliary/tgsi/tgsi_scan.c| 6 +- src/gallium/auxiliary/tgsi/tgsi_strings.c | 19 - src/gallium/auxiliary/tgsi/tgsi_strings.h | 2 +- src/gallium/auxiliary/tgsi/tgsi_ureg.c| 26 ++- src/gallium/auxiliary/tgsi/tgsi_ureg.h| 59 +-- src/gallium/auxiliary/util/u_blit.c | 6 ++ src/gallium/auxiliary/util/u_blitter.c| 27 +++ src/gallium/auxiliary/util/u_blitter.h| 16 - src/gallium/auxiliary/util/u_dump_state.c | 2 + src/gallium/docs/source/context.rst | 5 ++ src/gallium/docs/source/tgsi.rst | 70 ++ src/gallium/drivers/trace/tr_context.c| 26 +++ src/gallium/drivers/trace/tr_dump_state.c | 2 + src/gallium/include/pipe/p_context.h | 14 src/gallium/include/pipe/p_defines.h | 16 - src/gallium/include/pipe/p_shader_tokens.h| 18 - src/gallium/include/pipe/p_state.h| 6 +- src/mesa/state_tracker/st_cb_bitmap.c | 8 ++- src/mesa/state_tracker/st_cb_clear.c | 6 ++ src/mesa/state_tracker/st_cb_drawpixels.c | 8 ++- src/mesa/state_tracker/st_cb_drawtex.c| 6 ++ 28 files changed, 501 insertions(+), 31 deletions(-) Some minor nits for patches 1, 6 and 7, see separate mails Patches 2-5, 8-19 are Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/19] gallium: add patch_vertices to draw info
On Sat, 02 May 2015 22:16:31 +0200, Ilia Mirkin imir...@alum.mit.edu wrote:

> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
> ---
>  src/gallium/include/pipe/p_state.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/gallium/include/pipe/p_state.h b/src/gallium/include/pipe/p_state.h
> index e713a44..449c7f1 100644
> --- a/src/gallium/include/pipe/p_state.h
> +++ b/src/gallium/include/pipe/p_state.h
> @@ -543,6 +543,8 @@ struct pipe_draw_info
>     unsigned start_instance; /**< first instance id */
>     unsigned instance_count; /**< number of instances */
>
> +   unsigned patch_vertices; /**< the number of vertices per patch */
> +

patch_vertex_count, this field isn't the actual patch vertices data.
Don't forget to update patch 10 with the name.

>     /**
>      * For indexed drawing, these fields apply after index lookup.
>      */

With above fixed,
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH 01/19] gallium: add tessellation shader types
On Sat, 02 May 2015 22:16:25 +0200, Ilia Mirkin imir...@alum.mit.edu wrote:

> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
> ---
>  src/gallium/auxiliary/tgsi/tgsi_info.c     | 4
>  src/gallium/auxiliary/tgsi/tgsi_strings.c  | 4 +++-
>  src/gallium/auxiliary/tgsi/tgsi_strings.h  | 2 +-
>  src/gallium/include/pipe/p_defines.h       | 6 --
>  src/gallium/include/pipe/p_shader_tokens.h | 4 +++-
>  5 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c
> index 3cab86e..eb447cb 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_info.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c
> @@ -302,6 +302,10 @@ tgsi_get_processor_name( uint processor )
>        return "fragment shader";
>     case TGSI_PROCESSOR_GEOMETRY:
>        return "geometry shader";
> +   case TGSI_PROCESSOR_TESSCTRL:
> +      return "tessellation control shader";
> +   case TGSI_PROCESSOR_TESSEVAL:
> +      return "tessellation evaluation shader";
>     default:
>        return "unknown shader type!";
>     }
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c
> index 9b727cf..e712f30 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c
> @@ -32,11 +32,13 @@
>  #include "tgsi_strings.h"
>
> -const char *tgsi_processor_type_names[4] =
> +const char *tgsi_processor_type_names[6] =

Don't forget to update the declaration in tgsi_strings.h

>  {
>     "FRAG",
>     "VERT",
>     "GEOM",
> +   "TESSC",
> +   "TESSE",

A bit silly to shorten these when the dumps dedicate an entire line for printing the name.

>     "COMP"
>  };
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.h b/src/gallium/auxiliary/tgsi/tgsi_strings.h
> index 90014a2..71e7437 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_strings.h
> +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.h
> @@ -38,7 +38,7 @@ extern "C" {
>  #endif
>
> -extern const char *tgsi_processor_type_names[4];
> +extern const char *tgsi_processor_type_names[6];
>
>  extern const char *tgsi_semantic_names[TGSI_SEMANTIC_COUNT];
>
> diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h
> index 67f48e4..48c182f 100644
> --- a/src/gallium/include/pipe/p_defines.h
> +++ b/src/gallium/include/pipe/p_defines.h
> @@ -404,8 +404,10 @@ enum pipe_flush_flags
>  #define PIPE_SHADER_VERTEX   0
>  #define PIPE_SHADER_FRAGMENT 1
>  #define PIPE_SHADER_GEOMETRY 2
> -#define PIPE_SHADER_COMPUTE  3
> -#define PIPE_SHADER_TYPES    4
> +#define PIPE_SHADER_TESSCTRL 3
> +#define PIPE_SHADER_TESSEVAL 4

Most of the gallium names are typed out without contractions, i.e. PIPE_SHADER_TESSELLATION_CONTROL/EVALUATION

> +#define PIPE_SHADER_COMPUTE  5
> +#define PIPE_SHADER_TYPES    6
>
>  /**
> diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h
> index c14bcbc..776b0d4 100644
> --- a/src/gallium/include/pipe/p_shader_tokens.h
> +++ b/src/gallium/include/pipe/p_shader_tokens.h
> @@ -43,7 +43,9 @@ struct tgsi_header
>  #define TGSI_PROCESSOR_FRAGMENT 0
>  #define TGSI_PROCESSOR_VERTEX   1
>  #define TGSI_PROCESSOR_GEOMETRY 2
> -#define TGSI_PROCESSOR_COMPUTE  3
> +#define TGSI_PROCESSOR_TESSCTRL 3
> +#define TGSI_PROCESSOR_TESSEVAL 4
> +#define TGSI_PROCESSOR_COMPUTE  5
>
>  struct tgsi_processor
>  {

With above niggles fixed
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
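The patch has to keep three things in step: the processor numbering, the size of the name table, and its extern declaration. One way to make a mismatch a compile-time error is sketched below. This is not what the patch does: PROCESSOR_TYPE_COUNT, the local table and processor_type_name() are hypothetical names, and only the six short strings and their order are taken from the quoted diff.

```c
#include <assert.h>
#include <string.h>

/* Numbering as in the quoted p_shader_tokens.h hunk; the COUNT entry is
 * an assumption added for this sketch. */
enum {
    TGSI_PROCESSOR_FRAGMENT = 0,
    TGSI_PROCESSOR_VERTEX   = 1,
    TGSI_PROCESSOR_GEOMETRY = 2,
    TGSI_PROCESSOR_TESSCTRL = 3,
    TGSI_PROCESSOR_TESSEVAL = 4,
    TGSI_PROCESSOR_COMPUTE  = 5,
    PROCESSOR_TYPE_COUNT
};

/* Size left implicit so the initializer list decides it. */
static const char *const processor_type_names[] = {
    "FRAG", "VERT", "GEOM", "TESSC", "TESSE", "COMP"
};

/* C11: fails to compile if a processor type is added without a name. */
_Static_assert(sizeof(processor_type_names) / sizeof(processor_type_names[0])
                   == PROCESSOR_TYPE_COUNT,
               "processor_type_names out of sync with processor enum");

static const char *processor_type_name(unsigned processor)
{
    if (processor >= PROCESSOR_TYPE_COUNT)
        return "UNKNOWN";
    return processor_type_names[processor];
}
```

The bounds check in processor_type_name() covers the case the hard-coded [4] to [6] change is guarding against, namely indexing past the end of the table with a newly added processor type.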
Re: [Mesa-dev] [PATCH 06/19] gallium: add tessellation shader properties
On Sat, 02 May 2015 22:16:30 +0200, Ilia Mirkin imir...@alum.mit.edu wrote: Signed-off-by: Ilia Mirkin imir...@alum.mit.edu --- src/gallium/auxiliary/tgsi/tgsi_strings.c | 7 ++- src/gallium/docs/source/tgsi.rst | 33 ++ src/gallium/include/pipe/p_defines.h | 7 +++ src/gallium/include/pipe/p_shader_tokens.h | 7 ++- 4 files changed, 52 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/tgsi/tgsi_strings.c b/src/gallium/auxiliary/tgsi/tgsi_strings.c index dad503e..6781248 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_strings.c +++ b/src/gallium/auxiliary/tgsi/tgsi_strings.c @@ -131,7 +131,12 @@ const char *tgsi_property_names[TGSI_PROPERTY_COUNT] = FS_DEPTH_LAYOUT, VS_PROHIBIT_UCPS, GS_INVOCATIONS, - VS_WINDOW_SPACE_POSITION + VS_WINDOW_SPACE_POSITION, + TCS_VERTICES_OUT, + TES_PRIM_MODE, + TES_SPACING, + TES_VERTEX_ORDER_CW, + TES_POINT_MODE, Stray comma }; const char *tgsi_return_type_names[TGSI_RETURN_TYPE_COUNT] = diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index 0116842..f77702a 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -3071,6 +3071,39 @@ Naturally, clipping is not performed on window coordinates either. The effect of this property is undefined if a geometry or tessellation shader are in use. +TCS_VERTICES_OUT + + +The number of vertices written by the tessellation control shader. This +effectively defines the patch input size of the tessellation evaluation shader +as well. + +TES_PRIM_MODE + + +This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``, +``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no +separate isolines settings, the regular lines is assumed to mean isolines.) + +TES_SPACING + + +This sets the spacing mode of the tessellation generator, one of +``PIPE_TESS_SPACING_*``. + +TES_VERTEX_ORDER_CW + + +This sets the vertex order to be clockwise if the value is 1, or +counter-clockwise if set to 0. 
+ +TES_POINT_MODE + + +If set to a non-zero value, this turns on point mode for the tessellator, +which means that points will be generated instead of primitives. + + Texture Sampling and Texture Formats diff --git a/src/gallium/include/pipe/p_defines.h b/src/gallium/include/pipe/p_defines.h index 59b7486..14e0db3 100644 --- a/src/gallium/include/pipe/p_defines.h +++ b/src/gallium/include/pipe/p_defines.h @@ -432,6 +432,13 @@ enum pipe_flush_flags /** + * Tessellator spacing types + */ +#define PIPE_TESS_SPACING_FRACT_ODD 0 +#define PIPE_TESS_SPACING_FRACT_EVEN 1 GL spec types out the FRACTIONAL which is easier to grep the spec for. +#define PIPE_TESS_SPACING_EQUAL 2 + +/** * Query object types */ #define PIPE_QUERY_OCCLUSION_COUNTER 0 diff --git a/src/gallium/include/pipe/p_shader_tokens.h b/src/gallium/include/pipe/p_shader_tokens.h index c6ab899..ff1f7d6 100644 --- a/src/gallium/include/pipe/p_shader_tokens.h +++ b/src/gallium/include/pipe/p_shader_tokens.h @@ -262,7 +262,12 @@ union tgsi_immediate_data #define TGSI_PROPERTY_VS_PROHIBIT_UCPS 7 #define TGSI_PROPERTY_GS_INVOCATIONS 8 #define TGSI_PROPERTY_VS_WINDOW_SPACE_POSITION 9 -#define TGSI_PROPERTY_COUNT 10 +#define TGSI_PROPERTY_TCS_VERTICES_OUT 10 +#define TGSI_PROPERTY_TES_PRIM_MODE 11 +#define TGSI_PROPERTY_TES_SPACING12 +#define TGSI_PROPERTY_TES_VERTEX_ORDER_CW13 +#define TGSI_PROPERTY_TES_POINT_MODE 14 +#define TGSI_PROPERTY_COUNT 15 struct tgsi_property { unsigned Type : 4; /** TGSI_TOKEN_TYPE_PROPERTY */ With above niggles fixed Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] r600g/sb: Skip empty ALU clause while scheduling
Fixes assert triggered by ext_transform_feedback-intervening-read output use_gs piglit test.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_sched.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
index 4248a3f..2e38a62 100644
--- a/src/gallium/drivers/r600/sb/sb_sched.cpp
+++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
@@ -825,6 +825,9 @@ void post_scheduler::init_regmap() {

 void post_scheduler::process_alu(container_node *c) {

+    if (c->empty())
+        return;
+
     ucm.clear();
     alu.reset();

-- 
1.9.1
[Mesa-dev] [PATCH v2] r600g/sb: Enable SB for geometry shaders
Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Enable only for evergreen/cayman as there are a few unexplained GS piglit regressions on R6xx/R7xx with SB enabled otherwise. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- Changes since v1: * Enable SB only for = EVERGREEN. Something strange going on with GS on R6xx/R7xx that the code emitted by SB triggers, haven't been able to pinpoint it yet. * Avoid splitting live ranges for SV_GEOMETRY_EMIT values, useless since they are not actual values. Avoids unnecessary MOV operations being emitted. * Ensure the asm dump prints out the SV_GEOMETRY_EMIT dst values * One bytecode dumper fix spotted by Coverity Note: Requires 'r600g/sb: Update last_cf for loops' for cayman to pass all GS piglits without regressions - not a GS bug but a loop handling issue that only triggers in some GS piglit shaders. 
src/gallium/drivers/r600/r600_isa.h| 8 src/gallium/drivers/r600/r600_shader.c | 12 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 2 +- src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 +- src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 25 + src/gallium/drivers/r600/sb/sb_core.cpp| 5 - src/gallium/drivers/r600/sb/sb_dump.cpp| 4 +++- src/gallium/drivers/r600/sb/sb_ir.h| 6 +- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 4 ++-- src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +- src/gallium/drivers/r600/sb/sb_valtable.cpp| 1 + 11 files changed, 55 insertions(+), 16 deletions(-) diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h index ec3f702..381f06d 100644 --- a/src/gallium/drivers/r600/r600_isa.h +++ b/src/gallium/drivers/r600/r600_isa.h @@ -641,7 +641,7 @@ static const struct cf_op_info cf_op_table[] = { {MEM_SCRATCH, { 0x24, 0x24, 0x50, 0x50 }, CF_MEM }, {MEM_REDUCT,{ 0x25, 0x25, -1, -1 }, CF_MEM }, - {MEM_RING, { 0x26, 0x26, 0x52, 0x52 }, CF_MEM }, + {MEM_RING, { 0x26, 0x26, 0x52, 0x52 }, CF_MEM | CF_EMIT }, {EXPORT,{ 0x27, 0x27, 0x53, 0x53 }, CF_EXP }, {EXPORT_DONE, { 0x28, 0x28, 0x54, 0x54 }, CF_EXP }, @@ -649,9 +649,9 @@ static const struct cf_op_info cf_op_table[] = { {MEM_EXPORT,{ -1, 0x3A, 0x55, 0x55 }, CF_MEM }, {MEM_RAT, { -1, -1, 0x56, 0x56 }, CF_MEM | CF_RAT }, {MEM_RAT_NOCACHE, { -1, -1, 0x57, 0x57 }, CF_MEM | CF_RAT }, - {MEM_RING1, { -1, -1, 0x58, 0x58 }, CF_MEM }, - {MEM_RING2, { -1, -1, 0x59, 0x59 }, CF_MEM }, - {MEM_RING3, { -1, -1, 0x5A, 0x5A }, CF_MEM }, + {MEM_RING1, { -1, -1, 0x58, 0x58 }, CF_MEM | CF_EMIT }, + {MEM_RING2, { -1, -1, 0x59, 0x59 }, CF_MEM | CF_EMIT }, + {MEM_RING3, { -1, -1, 0x5A, 0x5A }, CF_MEM | CF_EMIT }, {MEM_MEM_COMBINED, { -1, -1, 0x5B, 0x5B }, CF_MEM }, {MEM_RAT_COMBINED_NOCACHE, { -1, -1, 0x5C, 0x5C }, CF_MEM | CF_RAT }, {MEM_RAT_COMBINED, { -1, -1, -1, 0x5D }, CF_MEM | CF_RAT }, /* ??? 
not in cayman isa doc */ diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 28b290a..a9338cc 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -159,8 +159,10 @@ int r600_pipe_shader_create(struct pipe_context *ctx, goto error; } - /* disable SB for geom shaders - it can't handle the CF_EMIT instructions */ - use_sb = (shader-shader.processor_type != TGSI_PROCESSOR_GEOMETRY); +/* disable SB for geom shaders on R6xx/R7xx due to some mysterious gs piglit regressions with it enabled. */ +if (rctx-b.chip_class = R700) { + use_sb = (shader-shader.processor_type != TGSI_PROCESSOR_GEOMETRY); +} /* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */ use_sb = !shader-shader.uses_index_registers; @@ -1141,6 +1143,8 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi for (i = 0; i 3; i++) { treg[i] = r600_get_temp(ctx); } + r600_add_gpr_array(ctx-shader, treg[0
Re: [Mesa-dev] [PATCH] r600g: fix op3 abs issue
].sel = ctx-temp_reg; @@ -6109,8 +6119,15 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx) { struct tgsi_full_instruction *inst = ctx-parse.FullToken.FullInstruction; struct r600_bytecode_alu alu; - int i, r; + int i, r, j; int lasti = tgsi_last_instruction(inst-Dst[0].Register.WriteMask); + int temp_regs[3]; + + for (j = 0; j inst-Instruction.NumSrcRegs; j++) { + temp_regs[j] = 0; + if (ctx-src[j].abs) + temp_regs[j] = r600_get_temp(ctx); + } for (i = 0; i lasti + 1; i++) { if (!(inst-Dst[0].Register.WriteMask (1 i))) @@ -6118,13 +6135,13 @@ static int tgsi_cmp(struct r600_shader_ctx *ctx) memset(alu, 0, sizeof(struct r600_bytecode_alu)); alu.op = ALU_OP3_CNDGE; - r = tgsi_make_src_for_op3(ctx, ctx-temp_reg, 0, alu.src[0], ctx-src[0], i); + r = tgsi_make_src_for_op3(ctx, temp_regs[0], i, alu.src[0], ctx-src[0]); if (r) return r; - r = tgsi_make_src_for_op3(ctx, ctx-temp_reg, 1, alu.src[1], ctx-src[2], i); + r = tgsi_make_src_for_op3(ctx, temp_regs[1], i, alu.src[1], ctx-src[2]); if (r) return r; - r = tgsi_make_src_for_op3(ctx, ctx-temp_reg, 2, alu.src[2], ctx-src[1], i); + r = tgsi_make_src_for_op3(ctx, temp_regs[2], i, alu.src[2], ctx-src[1]); if (r) return r; tgsi_dst(ctx, inst-Dst[0], i, alu.dst); Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders
On Wed, 25 Mar 2015 14:26:40 +0100, Marc Dietrich marvi...@gmx.de wrote:

> On Tuesday, 24 March 2015, 20:05:46, Glenn Kennard wrote:
>> On Tue, 24 Mar 2015 17:21:35 +0100, Dieter Nützel die...@nuetzel-hh.de wrote:
>>> On 20.03.2015 14:13, Glenn Kennard wrote:
>>>> Add SV_GEOMETRY_EMIT special variable type to track the implicit
>>>> dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so
>>>> GCM/scheduler doesn't reorder them.
>>>>
>>>> Mark emit instructions as unkillable so DCE doesn't eat them.
>>>>
>>>> Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
>>>> ---
>>>> The hangs with SB on geometry shaders were all due to the CUT/EMIT
>>>> instructions either being DCE:d or emitted out of order from the
>>>> memory ring writes, so the hardware stalled forever waiting for
>>>> completed primitives.
>>>>
>>>> Tested only on a Turks so far, but should behave the same across
>>>> all R600 generations.
>>>
>>> Hello Glenn,
>>>
>>> what tests are preferred? Starting with a Turks XT here, too, and
>>> could do some tests on RV730 (AGP) then.
>>>
>>> -Dieter
>>
>> Just the usual piglit regression testing. At this point it's been
>> tested on a Turks XT and an RV770. An R6xx card and some VLIW4 GPU
>> would complete the coverage needed.
>
> I would like to, but piglit run quick stalls/crashes the GPU (rs880)
> too often. Maybe you could tell me some specific tests to run instead
> of all of them.
>
> Marc

-t geometry should be the smallest useful subset. It's likely that most of the hangs you get on rs880 (and other r6xx devices) are geometry shader related though, so that might end up taking as long as a full quick run, unfortunately.
[Mesa-dev] [PATCH] r600g/sb: Update last_cf for loops
CF_END could end up emitted in the middle of a shader on cayman when there was a loop at the very end.

Fixes glsl-1.50-geometry-end-primitive and ext_transform_feedback-geometry-shaders-basic piglit tests.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Bug exposed by [PATCH] r600g/sb: Enable SB for geometry shaders

 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 8
 1 file changed, 8 insertions(+)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 8d0be06..08b7d77 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -127,6 +127,14 @@ void bc_finalizer::finalize_loop(region_node* r) {
     cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10);
     cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END);

+    // Update last_cf, but don't overwrite it if it's outside the current loop nest since
+    // it may point to a cf that is later in program order.
+    // The single parent level check is sufficient since finalize_loop() is processed in
+    // reverse order from innermost to outermost loop nest level.
+    if (!last_cf || last_cf->get_parent_region() == r) {
+        last_cf = loop_end;
+    }
+
     loop_start->jump_after(loop_end);
     loop_end->jump_after(loop_start);

-- 
1.9.1
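The update rule in the patch above can be isolated into a few lines. The sketch below is in C rather than the C++ of sb, and the struct and function names are hypothetical stand-ins; only the condition itself mirrors the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sb's region_node and cf_node. */
struct region { int id; };
struct cf     { struct region *parent; };

/* Returns the new last_cf after finalizing loop region r, whose closing
 * LOOP_END instruction is loop_end. */
static struct cf *update_last_cf(struct cf *last_cf, struct cf *loop_end,
                                 struct region *r)
{
    /* Take over only if nothing is recorded yet, or if the recorded cf
     * sits directly inside this loop (and so precedes loop_end). */
    if (!last_cf || last_cf->parent == r)
        return loop_end;

    /* Otherwise last_cf belongs to another region and may already point
     * at something later in program order; leave it alone. */
    return last_cf;
}
```

If last_cf is left pointing inside the loop body, the end-of-program marker gets attached before LOOP_END, which matches the symptom described in the commit message.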
Re: [Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders
On Tue, 24 Mar 2015 17:21:35 +0100, Dieter Nützel die...@nuetzel-hh.de wrote:

> On 20.03.2015 14:13, Glenn Kennard wrote:
>> Add SV_GEOMETRY_EMIT special variable type to track the implicit
>> dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so
>> GCM/scheduler doesn't reorder them.
>>
>> Mark emit instructions as unkillable so DCE doesn't eat them.
>>
>> Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
>> ---
>> The hangs with SB on geometry shaders were all due to the CUT/EMIT
>> instructions either being DCE:d or emitted out of order from the
>> memory ring writes, so the hardware stalled forever waiting for
>> completed primitives.
>>
>> Tested only on a Turks so far, but should behave the same across
>> all R600 generations.
>
> Hello Glenn,
>
> what tests are preferred? Starting with a Turks XT here, too, and
> could do some tests on RV730 (AGP) then.
>
> -Dieter

Just the usual piglit regression testing. At this point it's been tested on a Turks XT and an RV770. An R6xx card and some VLIW4 GPU would complete the coverage needed.

/Glenn
[Mesa-dev] [PATCH] r600g/sb: Enable SB for geometry shaders
Add SV_GEOMETRY_EMIT special variable type to track the implicit dependencies between CUT/EMIT_VERTEX/MEM_RING instructions so GCM/scheduler doesn't reorder them. Mark emit instructions as unkillable so DCE doesn't eat them. Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- The hangs with SB on geometry shaders were all due to the CUT/EMIT instructions either being DCE:d or emitted out of order from the memory ring writes, so the hardware stalled forever waiting for completed primitives. Tested only on a Turks so far, but should behave the same across all R600 generations. This patch disables the if-conversion pass when running GS shaders, didn't seem worth the effort to fix that pass up for the marginal returns. src/gallium/drivers/r600/r600_isa.h| 8 src/gallium/drivers/r600/r600_shader.c | 8 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 2 +- src/gallium/drivers/r600/sb/sb_bc_parser.cpp | 25 + src/gallium/drivers/r600/sb/sb_core.cpp| 5 - src/gallium/drivers/r600/sb/sb_dump.cpp| 4 +++- src/gallium/drivers/r600/sb/sb_ir.h| 6 +- src/gallium/drivers/r600/sb/sb_ra_init.cpp | 2 +- src/gallium/drivers/r600/sb/sb_sched.cpp | 2 +- src/gallium/drivers/r600/sb/sb_valtable.cpp| 1 + 10 files changed, 49 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/r600/r600_isa.h b/src/gallium/drivers/r600/r600_isa.h index ec3f702..381f06d 100644 --- a/src/gallium/drivers/r600/r600_isa.h +++ b/src/gallium/drivers/r600/r600_isa.h @@ -641,7 +641,7 @@ static const struct cf_op_info cf_op_table[] = { {MEM_SCRATCH, { 0x24, 0x24, 0x50, 0x50 }, CF_MEM }, {MEM_REDUCT,{ 0x25, 0x25, -1, -1 }, CF_MEM }, - {MEM_RING, { 0x26, 0x26, 0x52, 0x52 }, CF_MEM }, + {MEM_RING, { 0x26, 0x26, 0x52, 0x52 }, CF_MEM | CF_EMIT }, {EXPORT,{ 0x27, 0x27, 0x53, 0x53 }, CF_EXP }, {EXPORT_DONE, { 0x28, 0x28, 0x54, 0x54 }, CF_EXP }, @@ -649,9 +649,9 @@ static const struct cf_op_info cf_op_table[] = { {MEM_EXPORT,{ -1, 0x3A, 0x55, 0x55 }, CF_MEM }, {MEM_RAT, { -1, -1, 0x56, 0x56 }, CF_MEM | CF_RAT }, 
{MEM_RAT_NOCACHE, { -1, -1, 0x57, 0x57 }, CF_MEM | CF_RAT }, - {MEM_RING1, { -1, -1, 0x58, 0x58 }, CF_MEM }, - {MEM_RING2, { -1, -1, 0x59, 0x59 }, CF_MEM }, - {MEM_RING3, { -1, -1, 0x5A, 0x5A }, CF_MEM }, + {MEM_RING1, { -1, -1, 0x58, 0x58 }, CF_MEM | CF_EMIT }, + {MEM_RING2, { -1, -1, 0x59, 0x59 }, CF_MEM | CF_EMIT }, + {MEM_RING3, { -1, -1, 0x5A, 0x5A }, CF_MEM | CF_EMIT }, {MEM_MEM_COMBINED, { -1, -1, 0x5B, 0x5B }, CF_MEM }, {MEM_RAT_COMBINED_NOCACHE, { -1, -1, 0x5C, 0x5C }, CF_MEM | CF_RAT }, {MEM_RAT_COMBINED, { -1, -1, -1, 0x5D }, CF_MEM | CF_RAT }, /* ??? not in cayman isa doc */ diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 28b290a..ff2c784 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -159,8 +159,6 @@ int r600_pipe_shader_create(struct pipe_context *ctx, goto error; } - /* disable SB for geom shaders - it can't handle the CF_EMIT instructions */ - use_sb = (shader-shader.processor_type != TGSI_PROCESSOR_GEOMETRY); /* disable SB for shaders using CF_INDEX_0/1 (sampler/ubo array indexing) as it doesn't handle those currently */ use_sb = !shader-shader.uses_index_registers; @@ -1141,6 +1139,8 @@ static int fetch_gs_input(struct r600_shader_ctx *ctx, struct tgsi_full_src_regi for (i = 0; i 3; i++) { treg[i] = r600_get_temp(ctx); } + r600_add_gpr_array(ctx-shader, treg[0], 3, 0x0F); + t2 = r600_get_temp(ctx); for (i = 0; i 3; i++) { memset(alu, 0, sizeof(struct r600_bytecode_alu)); @@ -1935,9 +1935,9 @@ static int r600_shader_from_tgsi(struct r600_context *rctx, ctx.bc-index_reg[1] = ctx.bc-ar_reg + 3; } + shader-max_arrays = 0; + shader-num_arrays = 0; if (indirect_gprs) { - shader-max_arrays = 0; - shader-num_arrays = 0; if (ctx.info.indirect_files (1 TGSI_FILE_INPUT)) { r600_add_gpr_array(shader
Re: [Mesa-dev] [PATCH 4/9] radeonsi: implement gl_SampleMaskIn
On Mon, 02 Mar 2015 12:54:18 +0100, Marek Olšák mar...@gmail.com wrote:

> From: Marek Olšák marek.ol...@amd.com
>
> ---
>  docs/GL3.txt                             | 2 +-
>  src/gallium/drivers/radeonsi/si_shader.c | 4
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/docs/GL3.txt b/docs/GL3.txt
> index 43bbf85..0487cdf 100644
> --- a/docs/GL3.txt
> +++ b/docs/GL3.txt
> @@ -106,7 +106,7 @@ GL 4.0, GLSL 4.00:
>    - Enhanced textureGather                             DONE (r600, radeonsi)
>    - Geometry shader instancing                         DONE (r600)
>    - Geometry shader multiple streams                   DONE ()
> -  - Enhanced per-sample shading                        DONE (r600)
> +  - Enhanced per-sample shading                        DONE (r600, radeonsi)
>    - Interpolation functions                            DONE (r600)
>    - New overload resolution rules                      DONE
>    GL_ARB_gpu_shader_fp64                               DONE (nvc0, softpipe)
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 085a350..8001ea2 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -680,6 +680,10 @@ static void declare_system_value(
>  		break;
>  	}
> +	case TGSI_SEMANTIC_SAMPLEMASK:
> +		value = LLVMGetParam(radeon_bld->main_fn, SI_PARAM_SAMPLE_COVERAGE);
> +		break;
> +
>  	default:
>  		assert(!"unknown system value");
>  		return;

Patches 4-9 are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH 2/2] r600g: add doubles support for CAYMAN
> 	vs_as_gs_a;
>  	unsigned		ps_prim_id_input;
>  	struct r600_shader_array * arrays;
> +
> +	boolean			uses_doubles;
>  };
>  struct r600_shader_key {

With above nits fixed,
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH] r600g/sb: treat undefined values like constants
On Wed, 18 Feb 2015 01:17:32 +0100, Dave Airlie airl...@gmail.com wrote:

> From: Dave Airlie airl...@redhat.com
>
> When we schedule an instruction with an undefined value, we eventually
> will use 0, which is a constant. However, sb wasn't taking this into
> account and was creating ops with illegal scalar swizzles.
>
> This replaces my fix for op3 in t slots.
>
> Signed-off-by: Dave Airlie airl...@redhat.com
> ---
>  src/gallium/drivers/r600/sb/sb_sched.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/sb/sb_sched.cpp b/src/gallium/drivers/r600/sb/sb_sched.cpp
> index 4fbdc4f..63e7464 100644
> --- a/src/gallium/drivers/r600/sb/sb_sched.cpp
> +++ b/src/gallium/drivers/r600/sb/sb_sched.cpp
> @@ -266,7 +266,7 @@ bool rp_gpr_tracker::try_reserve(alu_node* n) {
>  	for (i = 0; i < nsrc; ++i) {
>  		value *v = n->src[i];
> -		if (v->is_readonly()) {
> +		if (v->is_readonly() || v->is_undef()) {
>  			const_count++;
>  			if (trans && const_count == 3)
>  				break;
> @@ -295,7 +295,7 @@ bool rp_gpr_tracker::try_reserve(alu_node* n) {
>  	if (need_unreserve && i--) {
>  		do {
>  			value *v = n->src[i];
> -			if (!v->is_readonly()) {
> +			if (!v->is_readonly() && !v->is_undef()) {
>  				if (i == 1 && opt)
>  					continue;
>  				unreserve(bs_cycle(trans, bs, i), n->bc.src[i].sel,

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
[Mesa-dev] [PATCH] r600g/sb: Don't fold integer value into float CND
Don't try to do float comparisons on signed integer values; some of them
look like NaNs.

Fixes fs-temp-array-mat3-index-col-row-rd.shader_test regression
caused by 0d4272cd8e7c45157140dc8e283707714a8238d5.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
 src/gallium/drivers/r600/sb/sb_peephole.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/sb/sb_peephole.cpp b/src/gallium/drivers/r600/sb/sb_peephole.cpp
index d4b9755..4161d59 100644
--- a/src/gallium/drivers/r600/sb/sb_peephole.cpp
+++ b/src/gallium/drivers/r600/sb/sb_peephole.cpp
@@ -250,7 +250,7 @@ void peephole::optimize_CNDcc_op(alu_node* a) {
 		return;
 	// TODO we can handle some cases for uint comparison
-	if (dcmp_type == AF_UINT_CMP)
+	if (dcmp_type == AF_UINT_CMP || dcmp_type == AF_INT_CMP)
 		return;
 	if (dcc == AF_CC_NE) {
--
1.9.1
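For readers wondering why folding an integer compare into a float CND is unsafe, here is a minimal sketch, assuming 32-bit IEEE-754 floats. The helper names are made up for illustration and are not part of sb; the point is only that the same bit pattern can compare differently depending on whether it is read as an int or as a float.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a 32-bit integer as an IEEE-754 float. */
static float bits_as_float(int32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Equality as an integer compare sees it. */
static int int_equal(int32_t a, int32_t b)
{
    return a == b;
}

/* Equality as a float compare sees the very same bits. */
static int float_equal_on_bits(int32_t a, int32_t b)
{
    return bits_as_float(a) == bits_as_float(b);
}
```

With these helpers, -1 (0xFFFFFFFF) reads as a NaN and so never equals itself as a float, while INT32_MIN (0x80000000) reads as -0.0 and compares equal to 0 as a float even though the integers differ.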
Re: [Mesa-dev] [PATCH 1/2] r300g: handle unsupported blend factor gracefully
On Fri, 06 Feb 2015 20:53:21 +0100, Roland Scheidegger srol...@vmware.com wrote:

> FWIW I'm wondering why you'd actually need them in a d3d9 state tracker,
> as this is a feature first seen with d3d10. Unless you'd want to handle
> d3d10 of course, but in this case there's probably not much hope for any
> of the d3d9 capable hw drivers for lots of reasons...
>
> Roland

Actually it got retrofitted into D3D9 Ex for Vista, see
https://msdn.microsoft.com/en-us/library/windows/desktop/bb172513(v=vs.85).aspx
D3DPBLENDCAPS_INVSRCCOLOR2 and D3DPBLENDCAPS_SRCCOLOR2.

/Glenn's .02 cents
Re: [Mesa-dev] [PATCH v3] r600g: Implement GL_ARB_draw_indirect for EG/CM
On Fri, 06 Feb 2015 17:08:46 +0100, Marek Olšák mar...@gmail.com wrote:

> Please bump the size of vgt_state for the SQ_VTX_BASE_VTX_LOC register.
> It's set by r600_init_atom in r600_state.c and evergreen_state.c
>
> Please bump R600_MAX_DRAW_CS_DWORDS. It's an upper bound of how many
> dwords draw_vbo can emit.

Thanks, will fix.

> I don't understand what get_vfetch_type is good for. Could you please
> explain it in the code? Also, I don't understand what constant buffer
> fetches have to do with VertexID.

I will add some more explanation to get_vfetch_type; in particular I can
point at the appropriate parts of the GPU documentation.

As for the interaction of buffer fetches and VertexID, I'll attempt to
explain:

The way R_03CFF0_SQ_VTX_BASE_VTX_LOC is delivered to the vertex shader
is, basically, that it isn't. Instead, the hardware pokes the 64 unique
values (one per wavefront thread, 64 state in the documentation) into a
hidden state register in the fetch units, which the shader cannot read,
at least not in any way that I've been able to find. Setting
FETCH_MODE=SQ_VTX_FETCH_VERTEX_DATA (=0) on a VFETCH instruction then
tells the fetch unit to add the BASE_VTX and start instance offsets
before reading the value. See
r600_asm.c:r600_create_vertex_fetch_shader(), which open codes 0 as the
fetch mode for vertex fetches.

This creates a problem for GLSL gl_VertexID, since the shader cannot
apply the offset. Let's look at the shader for the
tests/spec/arb_draw_indirect/vertexid.c piglit test case:

    "#version 140\n"
    "\n"
    "in vec4 piglit_vertex;\n"
    "out vec3 c;\n"
    "\n"
    "const vec3 colors[] = vec3[](\n"
    "    vec3(1, 0, 0),\n"
    "    vec3(1, 0, 0),\n"
    "    vec3(1, 0, 0),\n"
    "    vec3(1, 0, 0),\n"
    "\n"
    ...
    "    vec3(1, 0, 1),\n"
    "    vec3(1, 0, 1),\n"
    "    vec3(1, 0, 1),\n"
    "    vec3(1, 0, 1)\n"
    ");\n"
    "void main() {\n"
    "    c = colors[gl_VertexID];\n"
    "    gl_Position = piglit_vertex;\n"
    "}\n"

Here colors is a constant array, and the base offset needs to be applied
to look up the correct color value. The GL 4.5 spec is quite clear that
it should be applied to gl_VertexID.

Since the hardware offers no way to add base instance to gl_VertexID, I
do the next best thing and enable the offset on the array fetch
operation instead. The detection logic is quite hacky, since it really
needs to check whether the array expression depends in any way on
gl_VertexID. That requires looking at def-use chains, which aren't
available in r600_asm.c. SB could probably compute the bit instead, but
that sort of violates its "don't change program meaning" principle, not
to mention that behavior would differ with SB disabled.

All the actual shaders that I've found using gl_VertexID in conjunction
with indirect draws only use one constant array. I figure partial
support at least approximately matches what the binary driver supports,
which doesn't produce the correct value for gl_VertexID either for
indirect draws in various cases. In particular, if the shader tries to
compare gl_VertexID against some other expression you get an incorrect
value.

The driver does something totally different for direct draws: it adds
the base offset and start offset manually and feeds that to the
hardware, with BASE_VTX always set to 0, which allows it to work for all
cases. That is not an option for indirect draws if you want any sort of
performance out of them.

So to sum up: I don't see the hardware being fully capable of following
the spec for gl_VertexID in conjunction with indirect drawing in all
cases, at least not without some very slow fallbacks that read the draw
parameters back to the CPU, which is useless.

One option would be to just drop the attempt at supporting gl_VertexID
from this patch if it's deemed too hacky.
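To make the workaround described above a bit more concrete, here is a toy model in C. It is only a sketch under the assumptions stated in the mail: the enum, the function names and the idea of passing base_vertex as a plain argument are all invented for illustration and do not correspond to driver code or hardware registers.

```c
/* Hypothetical model of the two fetch behaviours described in the mail. */
enum fetch_mode_sketch {
    FETCH_PLAIN,    /* index is used as-is */
    FETCH_ADD_BASE  /* fetch unit adds the hidden base vertex first */
};

/* Index the fetch unit actually reads from. */
static unsigned fetch_index(unsigned raw_vertex_id, unsigned base_vertex,
                            enum fetch_mode_sketch mode)
{
    return mode == FETCH_ADD_BASE ? raw_vertex_id + base_vertex
                                  : raw_vertex_id;
}

/* What the shader itself sees: the base is hidden state, so it is ignored. */
static unsigned shader_visible_vertex_id(unsigned raw_vertex_id,
                                         unsigned base_vertex)
{
    (void)base_vertex; /* the shader has no way to read this */
    return raw_vertex_id;
}
```

In this model, colors[gl_VertexID] can be made to read the right element by switching the array fetch to FETCH_ADD_BASE, but any arithmetic or comparison done on the shader-visible id still lacks the base, which is the limitation mentioned above.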
[Mesa-dev] [PATCH v3] r600g: Implement GL_ARB_draw_indirect for EG/CM
Requires Evergreen/Cayman and radeon kernel module 2.41.0 or newer.

Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com
---
Changes since v2:
* Fix failing arb_draw_indirect-vertexid piglit test cases.
* Ensure start_instance, base_vertex, index_offset are reset when
  switching back to direct draws.
* Juggled some header defines to avoid use of magic numbers.

 docs/GL3.txt                                 |   4 +-
 docs/relnotes/10.5.0.html                    |   1 +
 src/gallium/drivers/r600/evergreend.h        |   1 -
 src/gallium/drivers/r600/r600_pipe.c         |   4 +-
 src/gallium/drivers/r600/r600_pipe.h         |   1 +
 src/gallium/drivers/r600/r600_shader.c       |  14 ++-
 src/gallium/drivers/r600/r600_state_common.c | 128 ++-
 src/gallium/drivers/r600/r600d.h             |   8 +-
 8 files changed, 130 insertions(+), 31 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 23f5561..ef4f0ae 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00:
   GL_ARB_draw_buffers_blend                            DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_draw_indirect                                 DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_draw_indirect                                 DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                                   DONE (i965, nvc0)
   - 'precise' qualifier                                DONE
   - Dynamically uniform sampler array indices          DONE (r600)
@@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_framebuffer_no_attachments                    not started
   GL_ARB_internalformat_query2                         not started
   GL_ARB_invalidate_subdata                            DONE (all drivers)
-  GL_ARB_multi_draw_indirect                           DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_multi_draw_indirect                           DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_program_interface_query                       not started
   GL_ARB_robust_buffer_access_behavior                 not started
   GL_ARB_shader_image_size                             not started
diff --git a/docs/relnotes/10.5.0.html b/docs/relnotes/10.5.0.html
index 4f921ea..47686c0 100644
--- a/docs/relnotes/10.5.0.html
+++ b/docs/relnotes/10.5.0.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_EXT_packed_float on freedreno</li>
 <li>GL_EXT_texture_shared_exponent on freedreno</li>
 <li>GL_EXT_texture_snorm on freedreno</li>
+<li>GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600</li>
 </ul>
diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 4989996..cd4ff46 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -72,7 +72,6 @@
 #define PKT3_REG_RMW                           0x21
 #define PKT3_COND_EXEC                         0x22
 #define PKT3_PRED_EXEC                         0x23
-#define PKT3_START_3D_CMDBUF                   0x24
 #define PKT3_DRAW_INDEX_2                      0x27
 #define PKT3_CONTEXT_CONTROL                   0x28
 #define PKT3_DRAW_INDEX_IMMD_BE                0x29
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index b6f7859..3127e23 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -313,6 +313,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 		return family >= CHIP_CEDAR ? 1 : 0;
 	case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
 		return family >= CHIP_CEDAR ? 4 : 0;
+	case PIPE_CAP_DRAW_INDIRECT:
+		/* kernel command checker support is also required */
+		return family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41;
 	/* Unsupported features. */
 	case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -322,7 +325,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
-	case PIPE_CAP_DRAW_INDIRECT:
 	case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
 	case PIPE_CAP_SAMPLER_VIEW_TARGET:
 	case PIPE_CAP_VERTEXID_NOBASE:
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index e110efe..1db43c4 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -145,6 +145,7 @@ struct r600_vgt_state {
 	uint32_t vgt_multi_prim_ib_reset_en;
 	uint32_t vgt_multi_prim_ib_reset_indx;
 	uint32_t vgt_indx_offset;
+	bool last_draw_was_indirect;
 };
 struct r600_blend_color {
diff --git a/src/gallium/drivers/r600
Re: [Mesa-dev] [PATCH 1/6] glapi: add GL_EXT_polygon_offset_clamp
On Sun, 01 Feb 2015 16:18:51 +0100, Ilia Mirkin imir...@alum.mit.edu wrote:

> Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
> ---
>  src/mapi/glapi/gen/gl_API.xml           | 11 +++
>  src/mesa/main/polygon.c                 |  6 ++
>  src/mesa/main/polygon.h                 |  5 -
>  src/mesa/main/tests/dispatch_sanity.cpp |  3 +++
>  4 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
> index e3cbab3..17bf62a 100644
> --- a/src/mapi/glapi/gen/gl_API.xml
> +++ b/src/mapi/glapi/gen/gl_API.xml
> @@ -12858,6 +12858,17 @@
>  <xi:include href="INTEL_performance_query.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
> +<category name="GL_EXT_polygon_offset_clamp" number="460">
> +    <enum name="POLYGON_OFFSET_CLAMP_EXT" value="0x8E1B">
> +        <size name="Get" mode="get"/>
> +    </enum>
> +    <function name="PolygonOffsetClampEXT" offset="assign">
> +        <param name="factor" type="GLfloat"/>
> +        <param name="units" type="GLfloat"/>
> +        <param name="clamp" type="GLfloat"/>
> +    </function>
> +</category>
> +
>  <!-- Unnumbered extensions sorted by name. -->
>  <category name="GL_ATI_blend_equation_separate">
> diff --git a/src/mesa/main/polygon.c b/src/mesa/main/polygon.c
> index cdaa244..e3b9073 100644
> --- a/src/mesa/main/polygon.c
> +++ b/src/mesa/main/polygon.c
> @@ -265,6 +265,12 @@ _mesa_PolygonOffsetEXT( GLfloat factor, GLfloat bias )
>     _mesa_PolygonOffset(factor, bias * ctx->DrawBuffer->_DepthMaxF );
>  }
> +void GLAPIENTRY
> +_mesa_PolygonOffsetClampEXT( GLfloat factor, GLfloat units, GLfloat clamp )
> +{
> +
> +}
> +
>  /**/
> diff --git a/src/mesa/main/polygon.h b/src/mesa/main/polygon.h
> index 530adba..6cf14d3 100644
> --- a/src/mesa/main/polygon.h
> +++ b/src/mesa/main/polygon.h
> @@ -55,12 +55,15 @@ extern void GLAPIENTRY
>  _mesa_PolygonOffsetEXT( GLfloat factor, GLfloat bias );
>  extern void GLAPIENTRY
> +_mesa_PolygonOffsetClampEXT( GLfloat factor, GLfloat units, GLfloat clamp );
> +
> +extern void GLAPIENTRY
>  _mesa_PolygonStipple( const GLubyte *mask );
>  extern void GLAPIENTRY
>  _mesa_GetPolygonStipple( GLubyte *mask );
> -extern void 
> +extern void
>  _mesa_init_polygon( struct gl_context * ctx );
>  #endif
> diff --git a/src/mesa/main/tests/dispatch_sanity.cpp b/src/mesa/main/tests/dispatch_sanity.cpp
> index ee4db45..1f1a3a8 100644
> --- a/src/mesa/main/tests/dispatch_sanity.cpp
> +++ b/src/mesa/main/tests/dispatch_sanity.cpp
> @@ -988,6 +988,9 @@ const struct function gl_core_functions_possible[] = {
>     { "glTextureStorage3DMultisample", 45, -1 },
>     { "glTextureBuffer", 45, -1 },
> +   /* GL_EXT_polygon_offset_clamp */
> +   { "glPolygonOffsetClampEXT", 11, -1 },
> +
>     { NULL, 0, -1 }
>  };

Patches 1-5 (assuming fix for clamp in 2 noted already by Ilia) are
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH 01/10] r600g, radeonsi: don't append to streamout buffers that haven't been used yet
On Sun, 01 Feb 2015 18:36:58 +0100, Marek Olšák mar...@gmail.com wrote:

> From: Marek Olšák marek.ol...@amd.com
>
> The FILLED_SIZE counter is uninitialized at the beginning, so we can't
> use it. Instead, use offset = 0, which is what we always do when not
> appending.
>
> This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
> Yes, the test does use transform feedback.
>
> Cc: 10.3 10.4 mesa-sta...@lists.freedesktop.org
> ---
>  src/gallium/drivers/radeon/r600_pipe_common.h | 1 +
>  src/gallium/drivers/radeon/r600_streamout.c   | 4 +++-
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h b/src/gallium/drivers/radeon/r600_pipe_common.h
> index 6224668..46a6bf3 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -294,6 +294,7 @@ struct r600_so_target {
>  	/* The buffer where BUFFER_FILLED_SIZE is stored. */
>  	struct r600_resource	*buf_filled_size;
>  	unsigned		buf_filled_size_offset;
> +	bool			buf_filled_size_valid;
>  	unsigned		stride_in_dw;
>  };
> diff --git a/src/gallium/drivers/radeon/r600_streamout.c b/src/gallium/drivers/radeon/r600_streamout.c
> index c44f0f2..bc8bf97 100644
> --- a/src/gallium/drivers/radeon/r600_streamout.c
> +++ b/src/gallium/drivers/radeon/r600_streamout.c
> @@ -237,7 +237,7 @@ static void r600_emit_streamout_begin(struct r600_common_context *rctx, struct r
>  			}
>  		}
> -		if (rctx->streamout.append_bitmask & (1 << i)) {
> +		if (rctx->streamout.append_bitmask & (1 << i) && t[i]->buf_filled_size_valid) {
>  			uint64_t va = t[i]->buf_filled_size->gpu_address +
>  				      t[i]->buf_filled_size_offset;
> @@ -302,6 +302,8 @@ void r600_emit_streamout_end(struct r600_common_context *rctx)
>  		 * buffer bound. This ensures that the primitives-emitted query
>  		 * won't increment. */
>  		r600_write_context_reg(cs, R_028AD0_VGT_STRMOUT_BUFFER_SIZE_0 + 16*i, 0);
> +
> +		t[i]->buf_filled_size_valid = true;
>  	}
>  	rctx->streamout.begin_emitted = false;

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
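The logic of the patch above can be summarised in a few lines of C. This is only a sketch: the struct and function names are hypothetical stand-ins, and the real driver keeps FILLED_SIZE in a GPU buffer rather than in a plain field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-target state touched by the patch. */
struct so_target_sketch {
    uint32_t filled_size;       /* last stored FILLED_SIZE value, if any */
    bool     filled_size_valid; /* set once a streamout end has run */
};

/* Offset to start writing at when streamout begins. */
static uint32_t streamout_start_offset(const struct so_target_sketch *t,
                                       bool append)
{
    /* Appending only makes sense once the counter has been written. */
    if (append && t->filled_size_valid)
        return t->filled_size;
    return 0;
}

/* Called when streamout ends: the counter is now meaningful. */
static void streamout_end(struct so_target_sketch *t, uint32_t bytes_written)
{
    t->filled_size = bytes_written;
    t->filled_size_valid = true;
}
```

A target that has never been written starts at offset 0 even if appending was requested, which mirrors the "use offset = 0" behaviour from the commit message.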
Re: [Mesa-dev] [PATCH 1/2] glsl: Add define for ARB_shader_precision
On Wed, 31 Dec 2014 21:43:51 +0100, Micah Fedke micah.fe...@collabora.co.uk wrote:

> ---
>  src/glsl/glcpp/glcpp-parse.y    | 3 +++
>  src/glsl/glsl_parser_extras.cpp | 1 +
>  src/glsl/glsl_parser_extras.h   | 2 ++
>  src/mesa/main/extensions.c      | 1 +
>  src/mesa/main/mtypes.h          | 1 +
>  5 files changed, 8 insertions(+)
>
> diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
> index 9b1a4f4..c9cc68f 100644
> --- a/src/glsl/glcpp/glcpp-parse.y
> +++ b/src/glsl/glcpp/glcpp-parse.y
> @@ -2473,6 +2473,9 @@ _glcpp_parser_handle_version_declaration(glcpp_parser_t *parser, intmax_t versio
>  	      if (extensions->ARB_derivative_control)
>  	         add_builtin_define(parser, "GL_ARB_derivative_control", 1);
> +
> +	      if (extensions->ARB_shader_precision)
> +	         add_builtin_define(parser, "GL_ARB_shader_precision", 1);
>  	   }
>  }
> diff --git a/src/glsl/glsl_parser_extras.cpp b/src/glsl/glsl_parser_extras.cpp
> index 27e2eaf3..8555af6 100644
> --- a/src/glsl/glsl_parser_extras.cpp
> +++ b/src/glsl/glsl_parser_extras.cpp
> @@ -532,6 +532,7 @@ static const _mesa_glsl_extension _mesa_glsl_supported_extensions[] = {
>     EXT(ARB_shader_atomic_counters,     true,  false,     ARB_shader_atomic_counters),
>     EXT(ARB_shader_bit_encoding,        true,  false,     ARB_shader_bit_encoding),
>     EXT(ARB_shader_image_load_store,    true,  false,     ARB_shader_image_load_store),
> +   EXT(ARB_shader_precision,           true,  false,     ARB_shader_precision),
>     EXT(ARB_shader_stencil_export,      true,  false,     ARB_shader_stencil_export),
>     EXT(ARB_shader_texture_lod,         true,  false,     ARB_shader_texture_lod),
>     EXT(ARB_shading_language_420pack,   true,  false,     ARB_shading_language_420pack),
> diff --git a/src/glsl/glsl_parser_extras.h b/src/glsl/glsl_parser_extras.h
> index e04f7ce..0ca6053 100644
> --- a/src/glsl/glsl_parser_extras.h
> +++ b/src/glsl/glsl_parser_extras.h
> @@ -424,6 +424,8 @@ struct _mesa_glsl_parse_state {
>     bool ARB_shader_bit_encoding_warn;
>     bool ARB_shader_image_load_store_enable;
>     bool ARB_shader_image_load_store_warn;
> +   bool ARB_shader_precision_enable;
> +   bool ARB_shader_precision_warn;
>     bool ARB_shader_stencil_export_enable;
>     bool ARB_shader_stencil_export_warn;
>     bool ARB_shader_texture_lod_enable;
> diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
> index 0df04c2..95c7a37 100644
> --- a/src/mesa/main/extensions.c
> +++ b/src/mesa/main/extensions.c
> @@ -147,6 +147,7 @@ static const struct extension extension_table[] = {
>     { "GL_ARB_shader_bit_encoding",      o(ARB_shader_bit_encoding),     GL,  2010 },
>     { "GL_ARB_shader_image_load_store",  o(ARB_shader_image_load_store), GL,  2011 },
>     { "GL_ARB_shader_objects",           o(dummy_true),                  GL,  2002 },
> +   { "GL_ARB_shader_precision",         o(ARB_shader_precision),        GL,  2014 },

Isn't this extension from 2010 rather than 2014?

>     { "GL_ARB_shader_stencil_export",    o(ARB_shader_stencil_export),   GL,  2009 },
>     { "GL_ARB_shader_texture_lod",       o(ARB_shader_texture_lod),      GL,  2009 },
>     { "GL_ARB_shading_language_100",     o(dummy_true),                  GLL, 2003 },
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index b95dfb9..4c83379 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3757,6 +3757,7 @@ struct gl_extensions
>     GLboolean ARB_shader_atomic_counters;
>     GLboolean ARB_shader_bit_encoding;
>     GLboolean ARB_shader_image_load_store;
> +   GLboolean ARB_shader_precision;
>     GLboolean ARB_shader_stencil_export;
>     GLboolean ARB_shader_texture_lod;
>     GLboolean ARB_shading_language_packing;
Re: [Mesa-dev] [PATCH 3/4] drirc: add workarounds for Unigine Tropics
On Fri, 30 Jan 2015 15:19:49 +0100, Martin Peres martin.pe...@linux.intel.com wrote:

> Signed-off-by: Martin Peres martin.pe...@linux.intel.com
> ---
>  src/mesa/drivers/dri/common/drirc | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/common/drirc b/src/mesa/drivers/dri/common/drirc
> index cecd6a9..073814e 100644
> --- a/src/mesa/drivers/dri/common/drirc
> +++ b/src/mesa/drivers/dri/common/drirc
> @@ -10,6 +10,11 @@ Application bugs worked around in this file:
>    Enabling all extensions for Unigine fixes most issues, but the GLSL version
>    is still 1.10.
> +* Unigine Tropics 1.3 makes use of the sample keyword which is reserved
> +  with ARB_GL_gpu_shader5 which got enabled by force_glsl_extensions_warn.

There seems to be something weird going on here: as far as I can tell
Tropics is using a GL legacy context, and for those GL_ARB_gpu_shader5
isn't supposed to be enabled; the extension spec mentions that a GL 3.2
compatibility/core profile is required. If I test this on r600, the
extension cannot be enabled in a legacy context, only in a core one.
Maybe there is a check missing somewhere in the Intel driver?

> +  It also makes use of bitwise manipulation (when adding anistropic filtering)
> +  which is illegal in GLSL 1.10. Adding #version 130 fixes this.
> +
>  * Unigine Heaven 3.0 with ARB_texture_multisample uses a ivec4 * vec4
>    expression, which is illegal in GLSL 1.10. Adding #version 130 fixes this.
> @@ -41,6 +46,8 @@ TODO: document the other workarounds.
>          <application name="Unigine Tropics" executable="Tropics">
>              <option name="force_glsl_extensions_warn" value="true" />
> +            <option name="mesa_extension_override" value="-GL_ARB_gpu_shader5" />
> +            <option name="force_glsl_version" value="130" />
>              <option name="disable_blend_func_extended" value="true" />
>          </application>

force_glsl_version addition LGTM.
Re: [Mesa-dev] r600g/sb: fix a bug in constants folding optimisation pass
On Sat, 31 Jan 2015 01:36:30 +0100, Xavier B. xavi...@gmail.com wrote:

> r600g/sb: fix a bug in constants folding optimisation pass:
>
>     ADD R6.y.1, R5.w.1, ~1|3f80
>     ADD R6.y.2, |R6.y.1|, -0.0001|b8d1b717
>
> was wrongly being converted to
>
>     ADD R6.y.1, R5.w.1, ~1|3f80
>     ADD R6.y.2, R5.w.1, -1.0001|bf800347
>
> because abs() modifier was ignored.
>
> Signed-off-by: Xavier Bouchoux xavi...@gmail.com

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com

Thanks Xavier!

For future patches, please use git send-email as noted in
http://www.mesa3d.org/devinfo.html so reviewers can comment inline.
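As a small illustration of why the fold is invalid when the second ADD reads its source through abs(), here is a sketch in C. The function names are invented, and can_fold() shows just one conservative option (skip the fold); whether the actual patch bails out or handles the modifier some other way is not shown in this mail.

```c
#include <math.h>

/* The original pair of ADDs: t = a + c1; result = |t| + c2. */
static float eval_two_adds(float a, float c1, float c2)
{
    float t = a + c1;
    return fabsf(t) + c2;
}

/* Folding that ignores the abs() modifier: result = a + (c1 + c2). */
static float eval_folded_ignoring_abs(float a, float c1, float c2)
{
    return a + (c1 + c2);
}

/* Conservative rule: only fold when the chained source has no abs(). */
static int can_fold(int second_src_has_abs)
{
    return !second_src_has_abs;
}
```

For a = 0.25, c1 = -1 and c2 = -0.5 the two ADDs give 0.25 while the naive fold gives -1.25, so the two forms are not interchangeable once abs() is involved.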
Re: [Mesa-dev] [PATCH 2/2] r600g: add support for primitive id without geom shader
> 					output[j].swizzle_z = 4; /* 0 */
> 					output[j].swizzle_w = 5; /* 1 */
>  					break;
> +				case TGSI_SEMANTIC_PRIMID:
> +					output[j].swizzle_x = 2;
> +					output[j].swizzle_y = 4; /* 0 */
> +					output[j].swizzle_z = 4; /* 0 */
> +					output[j].swizzle_w = 4; /* 0 */
> +					break;
>  				}
> +				break;
>  			case TGSI_PROCESSOR_FRAGMENT:
>  				if (shader->output[i].name == TGSI_SEMANTIC_COLOR) {
> diff --git a/src/gallium/drivers/r600/r600_shader.h b/src/gallium/drivers/r600/r600_shader.h
> index ab67013..b2559e9 100644
> --- a/src/gallium/drivers/r600/r600_shader.h
> +++ b/src/gallium/drivers/r600/r600_shader.h
> @@ -84,6 +84,8 @@ struct r600_shader {
>  	unsigned		max_arrays;
>  	unsigned		num_arrays;
>  	unsigned		vs_as_es;
> +	unsigned		vs_as_gs_a;
> +	unsigned		ps_prim_id_input;
>  	struct r600_shader_array * arrays;
>  };
> @@ -92,6 +94,8 @@ struct r600_shader_key {
>  	unsigned alpha_to_one:1;
>  	unsigned nr_cbufs:4;
>  	unsigned vs_as_es:1;
> +	unsigned vs_as_gs_a:1;
> +	unsigned vs_prim_id_out:8;
>  };
>  struct r600_shader_array {
> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
> index 1030620..b498d00 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -707,6 +707,10 @@ static INLINE struct r600_shader_key r600_shader_selector_key(struct pipe_contex
>  			key.nr_cbufs = 2;
>  	} else if (sel->type == PIPE_SHADER_VERTEX) {
>  		key.vs_as_es = (rctx->gs_shader != NULL);
> +		if (rctx->ps_shader->current->shader.gs_prim_id_input && !rctx->gs_shader) {
> +			key.vs_as_gs_a = true;
> +			key.vs_prim_id_out = rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid;
> +		}
>  	}
>  	return key;
>  }
> @@ -1265,6 +1269,7 @@ static bool r600_update_derived_state(struct r600_context *rctx)
>  			r600_update_ps_state(ctx, rctx->ps_shader->current);
>  		}
> +		rctx->shader_stages.atom.dirty = true;
>  		update_shader_atom(ctx, &rctx->pixel_shader, rctx->ps_shader->current);
>  	}

With r600/r700 bits added and debug print removed:
Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH 1/2] r600g: move selecting the pixel shader earlier.
On Tue, 27 Jan 2015 04:46:32 +0100, Dave Airlie airl...@gmail.com wrote:

> From: Dave Airlie airl...@redhat.com
>
> In order to detect that a pixel shader has a prim id input when we have
> no geometry shader we need to reorder the shader selection so the pixel
> shader is selected first, then the vertex shader key can take into
> account the primitive id input requirement and lack of geom shader.
>
> Signed-off-by: Dave Airlie airl...@redhat.com
> ---
>  src/gallium/drivers/r600/r600_state_common.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
> index 09d8952..1030620 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium/drivers/r600/r600_state_common.c
> @@ -1170,6 +1170,10 @@ static bool r600_update_derived_state(struct r600_context *rctx)
>  		}
>  	}
> +	r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
> +	if (unlikely(!rctx->ps_shader->current))
> +		return false;
> +
>  	update_gs_block_state(rctx, rctx->gs_shader != NULL);
>  	if (rctx->gs_shader) {
> @@ -1232,9 +1236,6 @@ static bool r600_update_derived_state(struct r600_context *rctx)
>  		}
>  	}
> -	r600_shader_select(ctx, rctx->ps_shader, &ps_dirty);
> -	if (unlikely(!rctx->ps_shader->current))
> -		return false;
>  	if (unlikely(ps_dirty || rctx->pixel_shader.shader != rctx->ps_shader->current ||
>  		rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable ||

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] Improving precision of mod(x,y)
On Thu, 15 Jan 2015 15:32:59 +0100, Roland Scheidegger srol...@vmware.com wrote:

> On 15.01.2015 at 10:05, Iago Toral wrote:
>> Hi,
>>
>> We have 16 deqp tests that fail, at least on i965, because of
>> insufficient precision of the mod GLSL function.
>>
>> Mesa lowers mod(x,y) to y * fract(x,y) so there can be some precision
>> lost due to the fract operation. Since the result is multiplied by y,
>> the total precision lost usually grows together with the value of y.
>
> Did you mean fract(x/y) here?
>
>> Below are some examples to give an idea of the magnitude of this
>> error. The values on the right represent the precision error for each
>> case:
>>
>> mod(-1.951171875, 1.9980468750) = 0.000447
>> mod(121.57, 13.29)              = 0.023842
>> mod(3769.12, 321.99)            = 0.762939
>> mod(3769.12, 1321.99)           = 0.0001220703
>> mod(-987654.125, 123456.984375) = 0.0160663128
>> mod( 987654.125, 123456.984375) = 0.031250
>>
>> As you see, for large enough values, the precision error becomes
>> significant.
>>
>> This can be fixed by lowering mod(x,y) to x - y * floor(x/y) instead,
>> which is the suggested implementation in the GLSL docs. I have a local
>> patch in my tree that does this and it does indeed fix the problem.
>> The downside is that this implementation adds an extra ADD
>> instruction to the generated code (besides replacing fract with floor,
>> which I guess have similar cost).
>>
>> Since this is a case where there is some trade-off to the fix, I
>> wonder if we are interested in doing this or not. Is the precision fix
>> worth the additional ADD?
>
> Well I can tell you that llvmpipe implements frc(x) as x - floor(x), so
> this change looks good to me :-).
> On a more serious note though, it looks to me like the cost of this
> expression would be mostly dominated by the division, hence some add
> more shouldn't be that bad. And if the test is legit, I don't think
> there's much choice (unless you could make this optional for some old
> glsl versions if they didn't require that much precision but even then
> it's probably not worth bothering imho).

FWIW, I just typed out the following little piglit test and tried it on
R600:

[require]
GLSL >= 3.30

[vertex shader passthrough]

[fragment shader]
uniform float a;
uniform float b;
out vec4 colour;
void main(void)
{
    // colour = vec4(b * fract(a / b)); // current lowering of mod(x,y)
    colour = vec4(a - b * floor(a/b)); // proposed lowering
}

[test]
clear color 0.5 0.5 0.5 0.5
clear
uniform float a 4.2
uniform float b 3.5
draw rect -1 -1 2 2
probe rgba 1 1 0.7 0.7 0.7 0.7

Resulting R600 assembly:

// y * fract(x/y)
// KC0[0].x is x and KC0[1].x is y
1 t: RECIP_IEEE T0.x, KC0[1].x
2 x: MUL        T0.x, KC0[0].x, T0.x
3 x: FRACT      T0.x, T0.x
4 x: MUL        R0.x, KC0[1].x, T0.x
EXPORT_DONE PIXEL 0 R0.
EOP

// x - y * floor(x/y)
1 t: RECIP_IEEE T0.x, KC0[1].x
2 x: MUL        T0.x, KC0[0].x, T0.x
3 x: FLOOR      T0.x, T0.x
4 x: MULADD     R0.x, KC0[1].x, -T0.x, KC0[0].x
EXPORT_DONE PIXEL 0 R0.
EOP

Both methods have the same number of cycles, the same dependency chain
length and the same ALU pipe usage. I'd expect most architectures that
can do source negate with multiply-add in a single operation to get
similar results, with no extra cost for the subtraction.

/Glenn
Re: [Mesa-dev] [PATCH] gallium: add double opcodes and TGSI execution (v2.1)
On Tue, 23 Dec 2014 22:50:30 +0100, Dave Airlie airl...@gmail.com wrote:

> This patch adds support for a set of double opcodes to TGSI. It is an
> update of work done originally by Michal Krol on the
> gallium-double-opcodes branch.
>
> The opcodes have a hint where they came from in the header file.
>
> v2: add unsigned/int <-> double
> v2.1: update docs.
>
> This is based on code by Michael Krol mic...@vmware.com
>
> Signed-off-by: Dave Airlie airl...@redhat.com
> ---
>  src/gallium/auxiliary/tgsi/tgsi_exec.c     | 743
>  src/gallium/auxiliary/tgsi/tgsi_info.c     |  24 +-
>  src/gallium/docs/source/tgsi.rst           |  76 ++-
>  src/gallium/include/pipe/p_shader_tokens.h |  26 +-
>  4 files changed, 850 insertions(+), 19 deletions(-)
>
> diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> index 834568b..6af4730 100644
> --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c
> +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c
> @@ -72,6 +72,16 @@
>  #define TILE_BOTTOM_LEFT  2
>  #define TILE_BOTTOM_RIGHT 3
> +union tgsi_double_channel {
> +   double d[TGSI_QUAD_SIZE];
> +   unsigned u[TGSI_QUAD_SIZE][2];
> +};
> +
> +struct tgsi_double_vector {
> +   union tgsi_double_channel xy;
> +   union tgsi_double_channel zw;
> +};
> +
>  static void
>  micro_abs(union tgsi_exec_channel *dst,
>            const union tgsi_exec_channel *src)
> @@ -147,6 +157,55 @@ micro_cos(union tgsi_exec_channel *dst,
>  }
>  static void
> +micro_d2f(union tgsi_exec_channel *dst,
> +          const union tgsi_double_channel *src)
> +{
> +   dst->f[0] = (float)src->d[0];
> +   dst->f[1] = (float)src->d[1];
> +   dst->f[2] = (float)src->d[2];
> +   dst->f[3] = (float)src->d[3];
> +}
> +
> +static void
> +micro_d2i(union tgsi_exec_channel *dst,
> +          const union tgsi_double_channel *src)
> +{
> +   dst->i[0] = (int)src->d[0];
> +   dst->i[1] = (int)src->d[1];
> +   dst->i[2] = (int)src->d[2];
> +   dst->i[3] = (int)src->d[3];
> +}
> +
> +static void
> +micro_d2u(union tgsi_exec_channel *dst,
> +          const union tgsi_double_channel *src)
> +{
> +   dst->u[0] = (unsigned)src->d[0];
> +   dst->u[1] = (unsigned)src->d[1];
> +   dst->u[2] = (unsigned)src->d[2];
> +   dst->u[3] = (unsigned)src->d[3];
> +}
> +static void
> +micro_dabs(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src->d[0] >= 0.0 ? src->d[0] : -src->d[0];
> +   dst->d[1] = src->d[1] >= 0.0 ? src->d[1] : -src->d[1];
> +   dst->d[2] = src->d[2] >= 0.0 ? src->d[2] : -src->d[2];
> +   dst->d[3] = src->d[3] >= 0.0 ? src->d[3] : -src->d[3];
> +}
> +
> +static void
> +micro_dadd(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src[0].d[0] + src[1].d[0];
> +   dst->d[1] = src[0].d[1] + src[1].d[1];
> +   dst->d[2] = src[0].d[2] + src[1].d[2];
> +   dst->d[3] = src[0].d[3] + src[1].d[3];
> +}
> +
> +static void
>  micro_ddx(union tgsi_exec_channel *dst,
>            const union tgsi_exec_channel *src)
>  {
> @@ -167,6 +226,159 @@ micro_ddy(union tgsi_exec_channel *dst,
>  }
>  static void
> +micro_ddiv(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src[0].d[0] / src[1].d[0];
> +   dst->d[1] = src[0].d[1] / src[1].d[1];
> +   dst->d[2] = src[0].d[2] / src[1].d[2];
> +   dst->d[3] = src[0].d[3] / src[1].d[3];
> +}
> +
> +static void
> +micro_dmul(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src[0].d[0] * src[1].d[0];
> +   dst->d[1] = src[0].d[1] * src[1].d[1];
> +   dst->d[2] = src[0].d[2] * src[1].d[2];
> +   dst->d[3] = src[0].d[3] * src[1].d[3];
> +}
> +
> +static void
> +micro_dmax(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src[0].d[0] > src[1].d[0] ? src[0].d[0] : src[1].d[0];
> +   dst->d[1] = src[0].d[1] > src[1].d[1] ? src[0].d[1] : src[1].d[1];
> +   dst->d[2] = src[0].d[2] > src[1].d[2] ? src[0].d[2] : src[1].d[2];
> +   dst->d[3] = src[0].d[3] > src[1].d[3] ? src[0].d[3] : src[1].d[3];
> +}
> +
> +static void
> +micro_dmin(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = src[0].d[0] < src[1].d[0] ? src[0].d[0] : src[1].d[0];
> +   dst->d[1] = src[0].d[1] < src[1].d[1] ? src[0].d[1] : src[1].d[1];
> +   dst->d[2] = src[0].d[2] < src[1].d[2] ? src[0].d[2] : src[1].d[2];
> +   dst->d[3] = src[0].d[3] < src[1].d[3] ? src[0].d[3] : src[1].d[3];
> +}
> +
> +static void
> +micro_dneg(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->d[0] = -src->d[0];
> +   dst->d[1] = -src->d[1];
> +   dst->d[2] = -src->d[2];
> +   dst->d[3] = -src->d[3];
> +}
> +
> +static void
> +micro_dslt(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->u[0][0] = src[0].d[0] < src[1].d[0] ? ~0U : 0U;
> +   dst->u[1][0] = src[0].d[1] < src[1].d[1] ? ~0U : 0U;
> +   dst->u[2][0] = src[0].d[2] < src[1].d[2] ? ~0U : 0U;
> +   dst->u[3][0] = src[0].d[3] < src[1].d[3] ? ~0U : 0U;
> +}
> +
> +static void
> +micro_dsne(union tgsi_double_channel *dst,
> +           const union tgsi_double_channel *src)
> +{
> +   dst->u[0][0] = src[0].d[0] != src[1].d[0] ? ~0U : 0U;
> +   dst->u[1][0] =
Re: [Mesa-dev] [PATCH 004/133] nir: add the core datastructures
On Tue, 16 Dec 2014 07:04:14 +0100, Jason Ekstrand ja...@jlekstrand.net wrote: From: Connor Abbott connor.abb...@intel.com This includes all the instructions, ifs, loops, functions, etc. This is similar to the information in ir.h. v2: Jason Ekstrand jason.ekstr...@intel.com: Include ralloc and hash_table from the util directory --- src/glsl/Makefile.sources |2 + src/glsl/nir/nir.h| 1150 + src/glsl/nir/nir_intrinsics.c | 49 ++ src/glsl/nir/nir_intrinsics.h | 158 ++ src/glsl/nir/nir_opcodes.c| 46 ++ src/glsl/nir/nir_opcodes.h| 346 + 6 files changed, 1751 insertions(+) create mode 100644 src/glsl/nir/nir.h create mode 100644 src/glsl/nir/nir_intrinsics.c create mode 100644 src/glsl/nir/nir_intrinsics.h create mode 100644 src/glsl/nir/nir_opcodes.c create mode 100644 src/glsl/nir/nir_opcodes.h diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index c3a90f7..e8eedd1 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -14,6 +14,8 @@ LIBGLCPP_GENERATED_FILES = \ $(GLSL_BUILDDIR)/glcpp/glcpp-parse.c NIR_FILES = \ +$(GLSL_SRCDIR)/nir/nir_intrinsics.c \ +$(GLSL_SRCDIR)/nir/nir_opcodes.c \ $(GLSL_SRCDIR)/nir/nir_types.cpp # libglsl diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h new file mode 100644 index 000..ef486da --- /dev/null +++ b/src/glsl/nir/nir.h @@ -0,0 +1,1150 @@ +/* + * Copyright © 2014 Connor Abbott + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * 
Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Connor Abbott (cwabbo...@gmail.com) + * + */ + +#pragma once + +#include util/hash_table.h +#include main/set.h +#include ../list.h +#include GL/gl.h /* GLenum */ +#include util/ralloc.h +#include nir_types.h +#include stdio.h + +#ifdef __cplusplus +extern C { +#endif + +struct nir_function_overload; +struct nir_function; + + +/** + * Description of built-in state associated with a uniform + * + * \sa nir_variable::state_slots + */ +typedef struct { + int tokens[5]; + int swizzle; +} nir_state_slot; + +typedef enum { + nir_var_shader_in, + nir_var_shader_out, + nir_var_global, + nir_var_local, + nir_var_uniform, + nir_var_system_value +} nir_variable_mode; + +/** + * Data stored in an nir_constant + */ +union nir_constant_data { + unsigned u[16]; + int i[16]; + float f[16]; + bool b[16]; +}; + +typedef struct nir_constant { + /** +* Value of the constant. +* +* The field used to back the values supplied by the constant is determined +* by the type associated with the \c ir_instruction. Constants may be +* scalars, vectors, or matrices. +*/ + union nir_constant_data value; + + /* Array elements / Structure Fields */ + struct nir_constant **elements; +} nir_constant; + +/** + * \brief Layout qualifiers for gl_FragDepth. + * + * The AMD/ARB_conservative_depth extensions allow gl_FragDepth to be redeclared + * with a layout qualifier. + */ +typedef enum { +nir_depth_layout_none, /** No depth layout is specified. 
*/ +nir_depth_layout_any, +nir_depth_layout_greater, +nir_depth_layout_less, +nir_depth_layout_unchanged +} nir_depth_layout; + +/** + * Either a uniform, global variable, shader input, or shader output. Based on + * ir_variable - it should be easy to translate between the two. + */ + +typedef struct { + struct exec_node node; + + /** +* Declared type of the variable +*/ + const struct glsl_type *type; + + /** +* Declared name of the variable +*/ + char *name; + + /** +* For variables which satisfy the is_interface_instance() predicate, this +* points to an array of integers such that if the ith member of the +* interface block is an array, max_ifc_array_access[i] is the maximum +* array element of that member
Re: [Mesa-dev] [PATCH 146/133] nir: Use static inlines instead of macros for list getters
exec_node_is_tail_sentinel(node->node.next);
+}

 NIR_DEFINE_CAST(nir_cf_node_as_block, nir_cf_node, nir_block, cf_node)
 NIR_DEFINE_CAST(nir_cf_node_as_if, nir_cf_node, nir_if, cf_node)

Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] r600g: Implement ARB_draw_indirect for EG/CM
Requires Evergreen/Cayman and updated radeon kernel module Signed-off-by: Glenn Kennard glenn.kenn...@gmail.com --- Changes since V1: * Fixed 8 bit index case, only triggerable using GLES 3.1 which isn't supported yet * Don't read info struct values that have no meaning for indirect case * Don't update start_instance/instance_count for indirect cases * Use bool expression directly in get_param Benjamin, the #defines are essentially used, but due to a header conflict its not possible to include them in this file. Would have broken the indirect cases into evergreen_state.c, but this is a performance-sensitive section of code and inlining is critical, so did the next best thing and typed out the define names as comments. Thanks Marek/Benjamin for V1 review docs/GL3.txt | 4 +- docs/relnotes/10.5.0.html| 1 + src/gallium/drivers/r600/evergreend.h| 6 +- src/gallium/drivers/r600/r600_pipe.c | 4 +- src/gallium/drivers/r600/r600_state_common.c | 116 ++- 5 files changed, 105 insertions(+), 26 deletions(-) diff --git a/docs/GL3.txt b/docs/GL3.txt index 648f5ac..435054a 100644 --- a/docs/GL3.txt +++ b/docs/GL3.txt @@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft GL 4.0, GLSL 4.00: GL_ARB_draw_buffers_blendDONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe) - GL_ARB_draw_indirect DONE (i965, nvc0, radeonsi, llvmpipe, softpipe) + GL_ARB_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_gpu_shader5 DONE (i965, nvc0) - 'precise' qualifierDONE - Dynamically uniform sampler array indices DONE (r600) @@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30: GL_ARB_framebuffer_no_attachmentsnot started GL_ARB_internalformat_query2 not started GL_ARB_invalidate_subdataDONE (all drivers) - GL_ARB_multi_draw_indirect DONE (i965, nvc0, radeonsi, llvmpipe, softpipe) + GL_ARB_multi_draw_indirect DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe) GL_ARB_program_interface_query not started GL_ARB_robust_buffer_access_behavior 
not started GL_ARB_shader_image_size not started diff --git a/docs/relnotes/10.5.0.html b/docs/relnotes/10.5.0.html index 2987d53..72bb791 100644 --- a/docs/relnotes/10.5.0.html +++ b/docs/relnotes/10.5.0.html @@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers. liGL_EXT_packed_float on freedreno/li liGL_EXT_texture_shared_exponent on freedreno/li liGL_EXT_texture_snorm on freedreno/li +liGL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600/li /ul diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h index 4989996..0725f0d 100644 --- a/src/gallium/drivers/r600/evergreend.h +++ b/src/gallium/drivers/r600/evergreend.h @@ -64,6 +64,8 @@ #define R600_TEXEL_PITCH_ALIGNMENT_MASK0x7 #define PKT3_NOP 0x10 +#define PKT3_SET_BASE 0x11 +#define PKT3_INDEX_BUFFER_SIZE 0x13 #define PKT3_DEALLOC_STATE 0x14 #define PKT3_DISPATCH_DIRECT 0x15 #define PKT3_DISPATCH_INDIRECT 0x16 @@ -72,7 +74,9 @@ #define PKT3_REG_RMW 0x21 #define PKT3_COND_EXEC 0x22 #define PKT3_PRED_EXEC 0x23 -#define PKT3_START_3D_CMDBUF 0x24 +#define PKT3_DRAW_INDIRECT 0x24 +#define PKT3_DRAW_INDEX_INDIRECT 0x25 +#define PKT3_INDEX_BASE0x26 #define PKT3_DRAW_INDEX_2 0x27 #define PKT3_CONTEXT_CONTROL 0x28 #define PKT3_DRAW_INDEX_IMMD_BE0x29 diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 0b571e4..0d8bac2 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -313,6 +313,9 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) return family = CHIP_CEDAR ? 1 : 0; case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS: return family = CHIP_CEDAR ? 4 : 0; + case PIPE_CAP_DRAW_INDIRECT: + /* kernel command checker support is also required */ + return family = CHIP_CEDAR rscreen-b.info.drm_minor = 41; /* Unsupported features. 
*/ case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT: @@ -322,7 +325,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param
Re: [Mesa-dev] [PATCH] r600g/sb: implement r600 gpr index workaround. (v3)
*pn = static_castalu_node*(*pI); + if (pn-bc.dst_gpr == src.sel) { + add_nop = true; + break; + } + } + } } else src.rel = 0; @@ -393,11 +426,23 @@ void bc_finalizer::finalize_alu_src(alu_group_node* g, alu_node* a) { assert(!unknown value kind); break; } + if (prev !add_nop) { + for (node_iterator pI = prev-begin(), pE = prev-end(); pI != pE; ++pI) { + alu_node *pn = static_castalu_node*(*pI); + if (pn-bc.dst_rel) { + if (pn-bc.dst_gpr == src.sel) { + add_nop = true; + break; + } + } + } + } } while (si 3) { a-bc.src[si++].sel = 0; } + return add_nop; } void bc_finalizer::copy_fetch_src(fetch_node dst, fetch_node src, unsigned arg_start) diff --git a/src/gallium/drivers/r600/sb/sb_context.cpp b/src/gallium/drivers/r600/sb/sb_context.cpp index 8e11428..5dba85b 100644 --- a/src/gallium/drivers/r600/sb/sb_context.cpp +++ b/src/gallium/drivers/r600/sb/sb_context.cpp @@ -61,6 +61,8 @@ int sb_context::init(r600_isa *isa, sb_hw_chip chip, sb_hw_class cclass) { uses_mova_gpr = is_r600() chip != HW_CHIP_RV670; + r6xx_gpr_index_workaround = is_r600() chip != HW_CHIP_RV670 chip != HW_CHIP_RS780 chip != HW_CHIP_RS880; + switch (chip) { case HW_CHIP_RV610: case HW_CHIP_RS780: diff --git a/src/gallium/drivers/r600/sb/sb_pass.h b/src/gallium/drivers/r600/sb/sb_pass.h index 812d14a..0346df1 100644 --- a/src/gallium/drivers/r600/sb/sb_pass.h +++ b/src/gallium/drivers/r600/sb/sb_pass.h @@ -695,8 +695,9 @@ public: void run_on(container_node *c); - void finalize_alu_group(alu_group_node *g); - void finalize_alu_src(alu_group_node *g, alu_node *a); + void insert_rv6xx_load_ar_workaround(alu_group_node *b4); + void finalize_alu_group(alu_group_node *g, node *prev_node); + bool finalize_alu_src(alu_group_node *g, alu_node *a, alu_group_node *prev_node); void emit_set_grad(fetch_node* f); void finalize_fetch(fetch_node *f); Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: only init GS_VERT_ITEMSIZE on r600
On Wed, 10 Dec 2014 04:55:21 +0100, Dave Airlie airl...@gmail.com wrote:

From: Dave Airlie airl...@redhat.com

On evergreen there are 4 regs, on r600/700 there is only one. Don't
initialise regs and trash someone else's state.

Not sure this fixes anything, but hey one less stupid.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_state.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index 61f5c5a..9a4b972 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -2659,11 +2659,8 @@ void r600_update_gs_state(struct pipe_context *ctx, struct r600_pipe_shader *sha
 	r600_store_context_reg(cb, R_028A6C_VGT_GS_OUT_PRIM_TYPE,
 			       r600_conv_prim_to_gs_out(rshader->gs_output_prim));
 
-	r600_store_context_reg_seq(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE, 4);
-	r600_store_value(cb, cp_shader->ring_item_size >> 2);
-	r600_store_value(cb, 0);
-	r600_store_value(cb, 0);
-	r600_store_value(cb, 0);
+	r600_store_context_reg(cb, R_0288C8_SQ_GS_VERT_ITEMSIZE,
+			       cp_shader->ring_item_size >> 2);
 
 	r600_store_context_reg(cb, R_0288A8_SQ_ESGS_RING_ITEMSIZE,
 			       (rshader->ring_item_size) >> 2);

Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com
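For anyone skimming the thread, here is a rough model of why the sequence write is a problem on chips that only have the one register. This is only a sketch: reg_window, store_reg and store_reg_seq are made-up stand-ins, not the r600g command buffer helpers, and the register indices are arbitrary.

```c
#include <stdint.h>

#define NUM_REGS 8  /* hypothetical size of the modelled register window */

struct reg_window {
	uint32_t regs[NUM_REGS];
};

/* Program exactly one register. */
static void store_reg(struct reg_window *w, unsigned idx, uint32_t val)
{
	if (idx < NUM_REGS)
		w->regs[idx] = val;
}

/* Program count consecutive registers starting at first. */
static void store_reg_seq(struct reg_window *w, unsigned first,
			  const uint32_t *vals, unsigned count)
{
	for (unsigned i = 0; i < count && first + i < NUM_REGS; i++)
		w->regs[first + i] = vals[i];
}

/*
 * Register 2 plays the role of the item size register, register 3 is
 * some unrelated state that happens to sit next to it. Returns what is
 * left in register 3 after programming register 2.
 */
static uint32_t neighbour_after_write(int use_seq)
{
	struct reg_window w = { .regs = { 0, 0, 0, 0xdeadbeef, 0, 0, 0, 0 } };
	const uint32_t vals[4] = { 16, 0, 0, 0 };

	if (use_seq)
		store_reg_seq(&w, 2, vals, 4);
	else
		store_reg(&w, 2, vals[0]);
	return w.regs[3];
}
```

With the sequence write the neighbouring value is zeroed; with the single write it survives, which is the behaviour the patch is after on r600/700.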
Re: [Mesa-dev] [PATCH] r600g: fix regression since UCMP change
On Tue, 09 Dec 2014 02:31:01 +0100, Dave Airlie airl...@gmail.com wrote:

From: Dave Airlie airl...@redhat.com

Since d8da6deceadf5e48201d848b7061dad17a5b7cac where the state tracker
started using UCMP on cayman a number of tests regressed.

This seems to be because r600g is doing CNDGE_INT for UCMP, which tests
>= 0; we should be doing CNDE_INT with reversed arguments.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 0b988df..28137e1 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -6082,7 +6082,7 @@ static int tgsi_ucmp(struct r600_shader_ctx *ctx)
 			continue;
 
 		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
-		alu.op = ALU_OP3_CNDGE_INT;
+		alu.op = ALU_OP3_CNDE_INT;
 		r600_bytecode_src(&alu.src[0], &ctx->src[0], i);
 		r600_bytecode_src(&alu.src[1], &ctx->src[2], i);
 		r600_bytecode_src(&alu.src[2], &ctx->src[1], i);

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
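To make the difference concrete, here is a small scalar model of the three operations. It assumes UCMP means "dst = src0 ? src1 : src2", CNDE_INT means "src0 == 0 ? src1 : src2" and CNDGE_INT means "src0 >= 0 ? src1 : src2"; the function names are illustrative only and nothing here is driver code.

```c
#include <stdint.h>

/* Reference behaviour assumed for TGSI UCMP: pick a when cond is non-zero. */
static int32_t ucmp_ref(int32_t cond, int32_t a, int32_t b)
{
	return cond != 0 ? a : b;
}

/* Assumed hardware semantics: select src1 when src0 == 0. */
static int32_t cnde_int(int32_t s0, int32_t s1, int32_t s2)
{
	return s0 == 0 ? s1 : s2;
}

/* Assumed hardware semantics: select src1 when src0 >= 0. */
static int32_t cndge_int(int32_t s0, int32_t s1, int32_t s2)
{
	return s0 >= 0 ? s1 : s2;
}

/* Both lowerings pass the operands as (src0, src2, src1), as in the patch. */
static int32_t ucmp_via_cnde(int32_t cond, int32_t a, int32_t b)
{
	return cnde_int(cond, b, a);
}

static int32_t ucmp_via_cndge(int32_t cond, int32_t a, int32_t b)
{
	return cndge_int(cond, b, a);
}
```

Under those assumptions the two lowerings agree for zero and negative conditions and only diverge when the condition is positive, which would explain why the old code worked in some tests and not others.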
Re: [Mesa-dev] [PATCH] r600g/sb: fix issues cause by GLSL switching to loops for switch
On Fri, 28 Nov 2014 04:36:42 +0100, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com Since 73dd50acf6d244979c2a657906aa56d3ac60d550 glsl: implement switch flow control using a loop The SB backend was falling over in an assert or crashing. Tracked this down to the loops having no repeats, but requiring a working break, initial code just called the loop handler for all non-if statements, but this caused a regression in tests/shaders/dead-code-break-interaction.shader_test. So I had to add further code to detect if all the departure nodes are empty and avoid generating an empty loop for that case. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86089 Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 51 ++ 1 file changed, 36 insertions(+), 15 deletions(-) diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp index f0849ca..d91ffa5 100644 --- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp +++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp @@ -46,15 +46,22 @@ int bc_finalizer::run() { for (regions_vec::reverse_iterator I = rv.rbegin(), E = rv.rend(); I != E; ++I) { region_node *r = *I; - + bool is_if = false; assert(r); - bool loop = r-is_loop(); + assert(r-first); + if (r-first-is_container()) { + container_node *repdep1 = static_castcontainer_node*(r-first); + assert(repdep1-is_depart() || repdep1-is_repeat()); + if_node *n_if = static_castif_node*(repdep1-first); + if (n_if n_if-is_if()) + is_if = true; + } - if (loop) - finalize_loop(r); - else + if (is_if) finalize_if(r); + else + finalize_loop(r); r-expand(); } @@ -112,16 +119,31 @@ void bc_finalizer::finalize_loop(region_node* r) { cf_node *loop_start = sh.create_cf(CF_OP_LOOP_START_DX10); cf_node *loop_end = sh.create_cf(CF_OP_LOOP_END); + bool has_instr = false; + + if (!r-is_loop()) { + for (depart_vec::iterator I = r-departs.begin(), E = r-departs.end(); +I != E; ++I) { + 
depart_node *dep = *I; + if (!dep-empty()) + has_instr = true; could break here + } + } else + has_instr = true; - loop_start-jump_after(loop_end); - loop_end-jump_after(loop_start); + if (has_instr) { + loop_start-jump_after(loop_end); + loop_end-jump_after(loop_start); + } for (depart_vec::iterator I = r-departs.begin(), E = r-departs.end(); I != E; ++I) { depart_node *dep = *I; - cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK); - loop_break-jump(loop_end); - dep-push_back(loop_break); + if (has_instr) { + cf_node *loop_break = sh.create_cf(CF_OP_LOOP_BREAK); + loop_break-jump(loop_end); + dep-push_back(loop_break); + } dep-expand(); } @@ -137,8 +159,10 @@ void bc_finalizer::finalize_loop(region_node* r) { rep-expand(); } - r-push_front(loop_start); - r-push_back(loop_end); + if (has_instr) { + r-push_front(loop_start); + r-push_back(loop_end); + } } void bc_finalizer::finalize_if(region_node* r) { @@ -168,9 +192,6 @@ void bc_finalizer::finalize_if(region_node* r) { if (n_if) { - - assert(n_if-is_if()); shouldn't need to remove this assertion - container_node *repdep2 = static_castcontainer_node*(n_if-first); assert(repdep2-is_depart() || repdep2-is_repeat()); I think i've managed to convince myself the above logic is correct, so Reviewed-By: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
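The has_instr decision in finalize_loop can be boiled down to a small predicate. The following is a plain C sketch of my reading of the patch, with a made-up depart struct standing in for depart_node; it also includes the early exit suggested in the inline comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a depart node: only the instruction count matters here. */
struct depart {
	size_t num_instrs;
};

/*
 * A real loop always gets LOOP_START/LOOP_END. A non-loop region that
 * reaches the loop path only needs them if at least one depart is
 * non-empty and therefore needs a working break.
 */
static bool region_needs_loop_cf(bool is_loop, const struct depart *departs,
				 size_t num_departs)
{
	if (is_loop)
		return true;

	for (size_t i = 0; i < num_departs; i++) {
		if (departs[i].num_instrs != 0)
			return true;	/* early exit, no need to scan the rest */
	}
	return false;
}
```

A region with only empty departs returns false, which corresponds to the case the dead-code-break-interaction test exercises.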
Re: [Mesa-dev] [PATCH] r600g: merge the TXQ and BUFFER constant buffers
, R600_TXQ_CONST_BUFFER, cb); - pipe_resource_reference(cb.buffer, NULL); -} - /* set sample xy locations as array of fragment shader constants */ void r600_set_sample_locations_constant_buffer(struct r600_context *rctx) { @@ -1175,7 +1151,7 @@ static bool r600_update_derived_state(struct r600_context *rctx) struct pipe_context * ctx = (struct pipe_context*)rctx; bool ps_dirty = false, vs_dirty = false, gs_dirty = false; bool blend_disable; - + bool need_buf_const; if (!rctx-blitter-running) { unsigned i; @@ -1296,29 +1272,35 @@ static bool r600_update_derived_state(struct r600_context *rctx) /* on R600 we stuff masks + txq info into one constant buffer */ /* on evergreen we only need a txq info one */ - if (rctx-b.chip_class EVERGREEN) { - if (rctx-ps_shader rctx-ps_shader-current-shader.uses_tex_buffers) - r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT); - if (rctx-vs_shader rctx-vs_shader-current-shader.uses_tex_buffers) - r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX); - if (rctx-gs_shader rctx-gs_shader-current-shader.uses_tex_buffers) - r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY); - } else { - if (rctx-ps_shader rctx-ps_shader-current-shader.uses_tex_buffers) - eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT); - if (rctx-vs_shader rctx-vs_shader-current-shader.uses_tex_buffers) - eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX); - if (rctx-gs_shader rctx-gs_shader-current-shader.uses_tex_buffers) - eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY); + if (rctx-ps_shader) { + need_buf_const = rctx-ps_shader-current-shader.uses_tex_buffers || rctx-ps_shader-current-shader.has_txq_cube_array_z_comp; + if (need_buf_const) { + if (rctx-b.chip_class EVERGREEN) + r600_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT); + else + eg_setup_buffer_constants(rctx, PIPE_SHADER_FRAGMENT); + } } + if (rctx-vs_shader) { + need_buf_const = rctx-vs_shader-current-shader.uses_tex_buffers || 
rctx-vs_shader-current-shader.has_txq_cube_array_z_comp; + if (need_buf_const) { + if (rctx-b.chip_class EVERGREEN) + r600_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX); + else + eg_setup_buffer_constants(rctx, PIPE_SHADER_VERTEX); + } + } - if (rctx-ps_shader rctx-ps_shader-current-shader.has_txq_cube_array_z_comp) - r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_FRAGMENT); - if (rctx-vs_shader rctx-vs_shader-current-shader.has_txq_cube_array_z_comp) - r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_VERTEX); - if (rctx-gs_shader rctx-gs_shader-current-shader.has_txq_cube_array_z_comp) - r600_setup_txq_cube_array_constants(rctx, PIPE_SHADER_GEOMETRY); + if (rctx-gs_shader) { + need_buf_const = rctx-gs_shader-current-shader.uses_tex_buffers || rctx-gs_shader-current-shader.has_txq_cube_array_z_comp; + if (need_buf_const) { + if (rctx-b.chip_class EVERGREEN) + r600_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY); + else + eg_setup_buffer_constants(rctx, PIPE_SHADER_GEOMETRY); + } + } if (rctx-b.chip_class EVERGREEN rctx-ps_shader rctx-vs_shader) { if (!r600_adjust_gprs(rctx)) { Passes piglits on a Turks with no obvious regressions, so with nits above fixed, consider it Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
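The per-stage logic after the merge is the same three times over, so it can be summarised as one helper. This is a sketch under the assumption that the chip class comparison in the patch is "less than EVERGREEN"; the enum values, struct and function name are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

enum chip_class { R600, R700, EVERGREEN, CAYMAN };

/* Which (hypothetical) setup routine a stage would end up calling. */
enum buf_const_setup { SETUP_NONE, SETUP_R600, SETUP_EG };

struct shader_flags {
	bool uses_tex_buffers;
	bool has_txq_cube_array_z_comp;
};

/*
 * One merged constant buffer per stage: it is needed if either feature
 * is used, and the chip class picks the r600 or evergreen flavour.
 * A NULL shader means the stage is not bound.
 */
static enum buf_const_setup pick_buf_const_setup(enum chip_class chip,
						 const struct shader_flags *sh)
{
	if (sh == NULL)
		return SETUP_NONE;
	if (!(sh->uses_tex_buffers || sh->has_txq_cube_array_z_comp))
		return SETUP_NONE;
	return chip < EVERGREEN ? SETUP_R600 : SETUP_EG;
}
```

Calling it once each for the fragment, vertex and geometry stages would mirror the three blocks in the diff.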
Re: [Mesa-dev] [PATCH] r600: fix texture gradients instruction emission (v2)
!= TGSI_TEXTURE_RECT) {

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
Re: [Mesa-dev] [PATCH] r600g: do all CUBE ALU operations before gradient texture operations (v2)
? 2 : 0; // CF_INDEX_1 : CF_INDEX_NONE - if (sampler_index_mode) - ctx-shader-uses_index_registers = true; if ((inst-Texture.Texture == TGSI_TEXTURE_CUBE || inst-Texture.Texture == TGSI_TEXTURE_CUBE_ARRAY || @@ -5454,6 +5399,69 @@ static int tgsi_tex(struct r600_shader_ctx *ctx) src_gpr = ctx-temp_reg; } + if (inst-Instruction.Opcode == TGSI_OPCODE_TXD) { + int temp_h, temp_v; + int start_val = 0; + + /* if we've already loaded the src (i.e. CUBE don't reload it). */ + if (src_loaded == TRUE) + start_val = 1; + else + src_loaded = TRUE; + for (i = start_val; i 3; i++) { + int treg = r600_get_temp(ctx); + + if (i == 0) + src_gpr = treg; + else if (i == 1) + temp_h = treg; + else + temp_v = treg; + + for (j = 0; j 4; j++) { + memset(alu, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP1_MOV; +r600_bytecode_src(alu.src[0], ctx-src[i], j); +alu.dst.sel = treg; +alu.dst.chan = j; +if (j == 3) + alu.last = 1; +alu.dst.write = 1; +r = r600_bytecode_add_alu(ctx-bc, alu); +if (r) +return r; + } + } + for (i = 1; i 3; i++) { + /* set gradients h/v */ + memset(tex, 0, sizeof(struct r600_bytecode_tex)); + tex.op = (i == 1) ? FETCH_OP_SET_GRADIENTS_H : + FETCH_OP_SET_GRADIENTS_V; + tex.sampler_id = tgsi_tex_get_src_gpr(ctx, sampler_src_reg); + tex.sampler_index_mode = sampler_index_mode; + tex.resource_id = tex.sampler_id + R600_MAX_CONST_BUFFERS; + tex.resource_index_mode = sampler_index_mode; + + tex.src_gpr = (i == 1) ? 
temp_h : temp_v; + tex.src_sel_x = 0; + tex.src_sel_y = 1; + tex.src_sel_z = 2; + tex.src_sel_w = 3; + + tex.dst_gpr = r600_get_temp(ctx); /* just to avoid confusing the asm scheduler */ + tex.dst_sel_x = tex.dst_sel_y = tex.dst_sel_z = tex.dst_sel_w = 7; + if (inst-Texture.Texture != TGSI_TEXTURE_RECT) { + tex.coord_type_x = 1; + tex.coord_type_y = 1; + tex.coord_type_z = 1; + tex.coord_type_w = 1; + } + r = r600_bytecode_add_tex(ctx-bc, tex); + if (r) + return r; + } + } + if (src_requires_loading !src_loaded) { for (i = 0; i 4; i++) { memset(alu, 0, sizeof(struct r600_bytecode_alu)); ARB_shader_texture_lod piglits go from 76/90 to 88/90, and fixes a number of tex-miplevel-selection tests. Some remaining Cube/1DArrayShadow failures. Worthwhile improvement as is, so Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] r600g: geom shaders: always load texture src regs from inputs
On Tue, 18 Nov 2014 05:09:05 +0100, Dave Airlie airl...@gmail.com wrote:

From: Dave Airlie airl...@redhat.com

Otherwise we seem to lose the split_gs_inputs and try to pull from an
uninitialised register.

Fixes 9 texelFetch geom shader tests.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 709fcd7..ab2a838 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -4919,7 +4919,8 @@ static inline boolean tgsi_tex_src_requires_loading(struct r600_shader_ctx *ctx,
 	return (inst->Src[index].Register.File != TGSI_FILE_TEMPORARY &&
 		inst->Src[index].Register.File != TGSI_FILE_INPUT &&
 		inst->Src[index].Register.File != TGSI_FILE_OUTPUT) ||
-		ctx->src[index].neg || ctx->src[index].abs;
+		ctx->src[index].neg || ctx->src[index].abs ||
+		(inst->Src[index].Register.File == TGSI_FILE_INPUT && ctx->type == TGSI_PROCESSOR_GEOMETRY);
 }
 
 static inline unsigned tgsi_tex_get_src_gpr(struct r600_shader_ctx *ctx,

Confirmed fixes the same set of tests on a Turks.

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
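For reference, the predicate after this patch reads roughly as below when written out standalone. The enums and the src_operand struct are simplified stand-ins I made up for the TGSI register file and processor type; only the boolean structure is taken from the diff.

```c
#include <stdbool.h>

enum reg_file { FILE_CONSTANT, FILE_TEMPORARY, FILE_INPUT, FILE_OUTPUT };
enum shader_type { SHADER_VERTEX, SHADER_FRAGMENT, SHADER_GEOMETRY };

/* Hypothetical, flattened view of one texture source operand. */
struct src_operand {
	enum reg_file file;
	bool neg;
	bool abs;
};

/*
 * A source has to be copied into a temporary GPR first if it does not
 * already live in a GPR-backed file, if it carries a modifier, or (new
 * with this patch) if it is an input of a geometry shader.
 */
static bool tex_src_requires_loading(const struct src_operand *src,
				     enum shader_type type)
{
	bool not_in_gpr = src->file != FILE_TEMPORARY &&
			  src->file != FILE_INPUT &&
			  src->file != FILE_OUTPUT;

	return not_in_gpr || src->neg || src->abs ||
	       (src->file == FILE_INPUT && type == SHADER_GEOMETRY);
}
```

The only case whose answer changes is an unmodified input operand, which now needs loading in geometry shaders but still does not in vertex or fragment shaders.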
Re: [Mesa-dev] [PATCH] r600g: limit texture offset application to specific types (v2)
On Tue, 18 Nov 2014 07:59:23 +0100, Dave Airlie airl...@gmail.com wrote:

From: Dave Airlie airl...@redhat.com

For 1D and 2D arrays we don't want the other coordinates being offset
and affecting where we sample. I wrote this patch 6 months ago but
lost it.

Fixes:
./bin/tex-miplevel-selection textureLodOffset 1DArray
./bin/tex-miplevel-selection textureLodOffset 2DArray
./bin/tex-miplevel-selection textureOffset 1DArray
./bin/tex-miplevel-selection textureOffset 1DArrayShadow
./bin/tex-miplevel-selection textureOffset 2DArray
./bin/tex-miplevel-selection textureOffset(bias) 1DArray
./bin/tex-miplevel-selection textureOffset(bias) 2DArray

v2: rewrite to handle more cases and be consistent with code above.

Signed-off-by: Dave Airlie airl...@redhat.com
---
 src/gallium/drivers/r600/r600_shader.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index ab2a838..76daf2c 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5535,9 +5535,24 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 				/* texture offsets do not apply to other texture targets */
 			}
 		} else {
-			offset_x = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleX] << 1;
-			offset_y = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleY] << 1;
-			offset_z = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleZ] << 1;
+			switch (inst->Texture.Texture) {
+			case TGSI_TEXTURE_3D:
+				offset_z = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleZ] << 1;
+				/* fallthrough */
+			case TGSI_TEXTURE_2D:
+			case TGSI_TEXTURE_SHADOW2D:
+			case TGSI_TEXTURE_RECT:
+			case TGSI_TEXTURE_SHADOWRECT:
+			case TGSI_TEXTURE_2D_ARRAY:
+			case TGSI_TEXTURE_SHADOW2D_ARRAY:
+				offset_y = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleY] << 1;
+				/* fallthrough */
+			case TGSI_TEXTURE_1D:
+			case TGSI_TEXTURE_SHADOW1D:
+			case TGSI_TEXTURE_1D_ARRAY:
+			case TGSI_TEXTURE_SHADOW1D_ARRAY:
+				offset_x = ctx->literals[4 * inst->TexOffsets[0].Index + inst->TexOffsets[0].SwizzleX] << 1;
+			}
 		}
 	}

Confirmed fixes the same set of tests on a Turks.

Reviewed-by: Glenn Kennard glenn.kenn...@gmail.com
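The fallthrough switch in the patch is compact but easy to misread, so here is a reduced version of the same idea. The tex_target enum and tex_offsets struct are made up and cover only a few targets; the sketch doubles each literal, which is my reading of the shift by one in the patch, and uses a multiply so that negative offsets stay well defined in C.

```c
/* Hypothetical, reduced set of texture targets. */
enum tex_target { TEX_1D, TEX_1D_ARRAY, TEX_2D, TEX_2D_ARRAY, TEX_3D, TEX_CUBE };

struct tex_offsets {
	int x, y, z;
};

/*
 * Only offset the coordinates that are real spatial dimensions of the
 * target. Array layer coordinates (y for 1D arrays, z for 2D arrays)
 * are left at zero so the offset cannot move the lookup to another layer.
 */
static struct tex_offsets select_offsets(enum tex_target target, const int lit[3])
{
	struct tex_offsets o = { 0, 0, 0 };

	switch (target) {
	case TEX_3D:
		o.z = lit[2] * 2;
		/* fallthrough */
	case TEX_2D:
	case TEX_2D_ARRAY:
		o.y = lit[1] * 2;
		/* fallthrough */
	case TEX_1D:
	case TEX_1D_ARRAY:
		o.x = lit[0] * 2;
		break;
	default:
		/* offsets do not apply to other targets */
		break;
	}
	return o;
}
```

For a 1D array only x is offset, for a 2D array x and y, and for a 3D texture all three, which matches the list of tests the patch claims to fix.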
Re: [Mesa-dev] [PATCH] r600g/cayman: fix integer multiplication output overwrite
On Tue, 18 Nov 2014 00:56:38 +0100, Dave Airlie airl...@gmail.com wrote: From: Dave Airlie airl...@redhat.com This fixes tests/spec/glsl-1.10/execution/fs-op-assign-mult-ivec2-ivec2-overwrite.shader_test. Reported-by: ghallberg on irc Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/r600/r600_shader.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index aab4215..02efc92 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -2729,6 +2729,9 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx) int i, j, k, r; struct r600_bytecode_alu alu; int last_slot = (inst-Dst[0].Register.WriteMask 0x8) ? 4 : 3; + int t1 = ctx-temp_reg; + int lasti = tgsi_last_instruction(inst-Dst[0].Register.WriteMask); + for (k = 0; k last_slot; k++) { if (!(inst-Dst[0].Register.WriteMask (1 k))) continue; @@ -2739,7 +2742,8 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx) for (j = 0; j inst-Instruction.NumSrcRegs; j++) { r600_bytecode_src(alu.src[j], ctx-src[j], k); } - tgsi_dst(ctx, inst-Dst[0], i, alu.dst); + alu.dst.sel = t1; + alu.dst.chan = i; alu.dst.write = (i == k); if (i == 3) alu.last = 1; @@ -2748,6 +2752,23 @@ static int cayman_mul_int_instr(struct r600_shader_ctx *ctx) return r; } } + + for (i = 0 ; i last_slot; i++) { + if (!(inst-Dst[0].Register.WriteMask (1 i))) + continue; + memset(alu, 0, sizeof(struct r600_bytecode_alu)); + alu.op = ALU_OP1_MOV; + alu.src[0].sel = t1; + alu.src[0].chan = i; + tgsi_dst(ctx, inst-Dst[0], i, alu.dst); + alu.dst.write = 1; + if (i == lasti) + alu.last = 1; + r = r600_bytecode_add_alu(ctx-bc, alu); + if (r) + return r; + } + return 0; } Trivial nit: last_slot is no longer needed and can be removed. 
With a bit of luck it will also fix https://bugs.freedesktop.org/show_bug.cgi?id=85376

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
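For readers following along, the hazard this patch addresses can be sketched outside the driver. This is only an illustration under assumptions: the destination register is also read as a swizzled source (as in `v *= v.yx`), and the names `mul_in_place` and `mul_via_temp` are hypothetical, not r600g functions. Writing each channel straight into the destination can clobber a value that a later channel still needs; accumulating in a temporary and copying afterwards, which is what the patch appears to do with ctx->temp_reg plus a trailing MOV loop, avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not r600g code.
 * swz[k] selects which channel of reg feeds the second operand of channel k. */

/* Writes results directly into reg: a later channel may read a value
 * that an earlier channel has already overwritten. */
void mul_in_place(int *reg, const int *swz, size_t n)
{
	assert(n <= 4);
	for (size_t k = 0; k < n; k++)
		reg[k] = reg[k] * reg[swz[k]];
}

/* Computes every channel into a temporary first, then copies back,
 * so all reads see the original register contents. */
void mul_via_temp(int *reg, const int *swz, size_t n)
{
	int tmp[4];

	assert(n <= 4);
	for (size_t k = 0; k < n; k++)
		tmp[k] = reg[k] * reg[swz[k]];
	for (size_t k = 0; k < n; k++)
		reg[k] = tmp[k];
}
```

With reg = {2, 3} and swz = {1, 0}, the in-place version ends up with {6, 18} because channel 1 reads the already-overwritten channel 0, while the temporary version yields {6, 6}.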
Re: [Mesa-dev] [PATCH] r600g/cayman: fix texture gather tests
On Tue, 18 Nov 2014 01:57:13 +0100, Dave Airlie <airl...@gmail.com> wrote:

From: Dave Airlie <airl...@redhat.com>

It appears on cayman the TG4 outputs were reordered.

This fixes a lot of piglit tests.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/r600_shader.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
index 4c6ae45..709fcd7 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5763,11 +5763,18 @@ static int tgsi_tex(struct r600_shader_ctx *ctx)
 		int8_t texture_component_select = ctx->literals[4 * inst->Src[1].Register.Index + inst->Src[1].Register.SwizzleX];
 		tex.inst_mod = texture_component_select;
+		if (ctx->bc->chip_class == CAYMAN) {
 		/* GATHER4 result order is different from TGSI TG4 */
-		tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;
-		tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7;
-		tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7;
-		tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+			tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 0 : 7;
+			tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 1 : 7;
+			tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 2 : 7;
+			tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+		} else {
+			tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;
+			tex.dst_sel_y = (inst->Dst[0].Register.WriteMask & 4) ? 2 : 7;
+			tex.dst_sel_z = (inst->Dst[0].Register.WriteMask & 1) ? 0 : 7;
+			tex.dst_sel_w = (inst->Dst[0].Register.WriteMask & 8) ? 3 : 7;
+		}
 	} else if (inst->Instruction.Opcode == TGSI_OPCODE_LODQ) {
 		tex.dst_sel_x = (inst->Dst[0].Register.WriteMask & 2) ? 1 : 7;

Gotta permute those tex op bit encodings between hardware generations or they go stale...
Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
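The per-chip destination selects in this patch can be read as a small lookup table. The following is a minimal sketch, assuming only what the diff shows: the write-mask bits tested for x, y, z, w are 2, 4, 1, 8, the selects are 0,1,2,3 on cayman and 1,2,0,3 otherwise, and 7 is the value used for channels that are not written. The helper name `gather4_dst_sel` is hypothetical and does not exist in r600g.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the select logic in the diff; not r600g code.
 * sel[0..3] receive the values for dst_sel_x, dst_sel_y, dst_sel_z, dst_sel_w. */
void gather4_dst_sel(bool is_cayman, unsigned writemask, unsigned sel[4])
{
	/* Write-mask bit tested for x, y, z, w respectively, as in the diff. */
	static const unsigned bit[4] = { 2, 4, 1, 8 };
	static const unsigned cayman_sel[4] = { 0, 1, 2, 3 };
	static const unsigned other_sel[4] = { 1, 2, 0, 3 };
	const unsigned *table = is_cayman ? cayman_sel : other_sel;

	assert(writemask <= 0xf);
	for (int i = 0; i < 4; i++)
		sel[i] = (writemask & bit[i]) ? table[i] : 7;
}
```

Keeping the two orderings side by side as tables makes the difference between the chip generations easier to audit than eight near-identical assignments, though the driver itself spells them out.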
Re: [Mesa-dev] [PATCH] r600g/cayman: handle empty vertex shaders
On Tue, 18 Nov 2014 02:23:51 +0100, Dave Airlie <airl...@gmail.com> wrote:

From: Dave Airlie <airl...@redhat.com>

Some of the geom shader tests produce an empty vertex shader; on cayman we'd crash in the finaliser because last_cf was NULL.

Cayman doesn't need the NOP workaround, so if the code arrives here with no last_cf, just emit an END.

Fixes crashes in a bunch of piglit geom shader tests.

Signed-off-by: Dave Airlie <airl...@redhat.com>
---
 src/gallium/drivers/r600/sb/sb_bc_finalize.cpp | 12
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
index 5c22f96..f0849ca 100644
--- a/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
+++ b/src/gallium/drivers/r600/sb/sb_bc_finalize.cpp
@@ -83,14 +83,18 @@ int bc_finalizer::run() {
 		last_cf = c;
 	}

-	if (last_cf->bc.op_ptr->flags & CF_ALU) {
+	if (!ctx.is_cayman() && last_cf->bc.op_ptr->flags & CF_ALU) {
 		last_cf = sh.create_cf(CF_OP_NOP);
 		sh.root->push_back(last_cf);
 	}

-	if (ctx.is_cayman())
-		last_cf->insert_after(sh.create_cf(CF_OP_CF_END));
-	else
+	if (ctx.is_cayman()) {
+		if (!last_cf) {
+			cf_node *c = sh.create_cf(CF_OP_CF_END);
+			sh.root->push_back(c);
+		} else
+			last_cf->insert_after(sh.create_cf(CF_OP_CF_END));
+	} else
 		last_cf->bc.end_of_program = 1;

 	for (unsigned t = EXP_PIXEL; t < EXP_TYPE_COUNT; ++t) {

Reviewed-by: Glenn Kennard <glenn.kenn...@gmail.com>
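The shape of the fix, guarding against a missing last node before inserting after it, can be sketched in isolation. This is a simplified illustration in plain C rather than the sb C++ classes; `cf_node`, `cf_list` and `append_end` here are hypothetical stand-ins, and the real code uses sh.root->push_back() and insert_after() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sb control-flow list; not the real sb types. */
struct cf_node {
	int op;
	struct cf_node *next;
};

struct cf_list {
	struct cf_node *first;
	struct cf_node *last;	/* NULL while the list is empty */
};

/* Append an END node. With an empty list there is no last node to insert
 * after, so the new node becomes the head instead of dereferencing NULL. */
void append_end(struct cf_list *list, struct cf_node *end)
{
	assert(list != NULL && end != NULL);
	end->next = NULL;
	if (!list->last)
		list->first = end;
	else
		list->last->next = end;
	list->last = end;
}
```

Calling append_end() on an empty list leaves the END node as both first and last; on a one-node list it simply links the END node after the existing node.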
[Mesa-dev] [PATCH] r600g: Implement GL_ARB_draw_indirect
Requires evergreen/cayman, and updated radeon kernel module.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
See also kernel side patch sent to dri-de...@lists.freedesktop.org

 docs/GL3.txt                                 | 4 +-
 docs/relnotes/10.4.html                      | 1 +
 src/gallium/drivers/r600/evergreend.h        | 7 ++-
 src/gallium/drivers/r600/r600_pipe.c         | 6 ++-
 src/gallium/drivers/r600/r600_state_common.c | 80 ++--
 5 files changed, 77 insertions(+), 21 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 2854431..06c52f9 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -95,7 +95,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, llvmpipe, soft
 GL 4.0, GLSL 4.00:
   GL_ARB_draw_buffers_blend                    DONE (i965, nv50, nvc0, r600, radeonsi, llvmpipe, softpipe)
-  GL_ARB_draw_indirect                         DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_draw_indirect                         DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                           DONE (i965, nvc0)
   - 'precise' qualifier                        DONE
   - Dynamically uniform sampler array indices  DONE (r600)
@@ -159,7 +159,7 @@ GL 4.3, GLSL 4.30:
   GL_ARB_framebuffer_no_attachments            not started
   GL_ARB_internalformat_query2                 not started
   GL_ARB_invalidate_subdata                    DONE (all drivers)
-  GL_ARB_multi_draw_indirect                   DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
+  GL_ARB_multi_draw_indirect                   DONE (i965, nvc0, r600, radeonsi, llvmpipe, softpipe)
   GL_ARB_program_interface_query               not started
   GL_ARB_robust_buffer_access_behavior         not started
   GL_ARB_shader_image_size                     not started
diff --git a/docs/relnotes/10.4.html b/docs/relnotes/10.4.html
index d0fbd3b..9c2a491 100644
--- a/docs/relnotes/10.4.html
+++ b/docs/relnotes/10.4.html
@@ -49,6 +49,7 @@ Note: some of the new features are only available with certain drivers.
 <li>GL_ARB_texture_view on nv50, nvc0</li>
 <li>GL_ARB_clip_control on llvmpipe, softpipe, r300, r600, radeonsi</li>
 <li>GL_KHR_context_flush_control on all drivers</li>
+<li>GL_ARB_draw_indirect, GL_ARB_multi_draw_indirect on r600</li>
 </ul>
diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 4989996..b8880c8 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -64,6 +64,8 @@
 #define R600_TEXEL_PITCH_ALIGNMENT_MASK        0x7

 #define PKT3_NOP                               0x10
+#define PKT3_SET_BASE                          0x11
+#define PKT3_INDEX_BUFFER_SIZE                 0x13
 #define PKT3_DEALLOC_STATE                     0x14
 #define PKT3_DISPATCH_DIRECT                   0x15
 #define PKT3_DISPATCH_INDIRECT                 0x16
@@ -72,12 +74,15 @@
 #define PKT3_REG_RMW                           0x21
 #define PKT3_COND_EXEC                         0x22
 #define PKT3_PRED_EXEC                         0x23
-#define PKT3_START_3D_CMDBUF                   0x24
+#define PKT3_DRAW_INDIRECT                     0x24
+#define PKT3_DRAW_INDEX_INDIRECT               0x25
+#define PKT3_INDEX_BASE                        0x26
 #define PKT3_DRAW_INDEX_2                      0x27
 #define PKT3_CONTEXT_CONTROL                   0x28
 #define PKT3_DRAW_INDEX_IMMD_BE                0x29
 #define PKT3_INDEX_TYPE                        0x2A
 #define PKT3_DRAW_INDEX                        0x2B
+#define PKT3_DRAW_INDIRECT_MULTI               0x2C
 #define PKT3_DRAW_INDEX_AUTO                   0x2D
 #define PKT3_DRAW_INDEX_IMMD                   0x2E
 #define PKT3_NUM_INSTANCES                     0x2F
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index 0b571e4..829deaf 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -313,6 +313,11 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 		return family >= CHIP_CEDAR ? 1 : 0;
 	case PIPE_CAP_MAX_TEXTURE_GATHER_COMPONENTS:
 		return family >= CHIP_CEDAR ? 4 : 0;
+	case PIPE_CAP_DRAW_INDIRECT:
+		/* needs kernel command checking support to work */
+		if (family >= CHIP_CEDAR && rscreen->b.info.drm_minor >= 41)
+			return 1;
+		return 0;

 	/* Unsupported features. */
 	case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT:
@@ -322,7 +327,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param)
 	case PIPE_CAP_VERTEX_COLOR_CLAMPED:
 	case PIPE_CAP_USER_VERTEX_BUFFERS:
 	case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
-	case PIPE_CAP_DRAW_INDIRECT:
 	case
[Mesa-dev] [PATCH 2/2] r600g: Implement sm5 UBO/sampler indexing
Caveat: Shaders using UBO/sampler indexing will not be optimized by SB, due to SB not currently supporting the necessary CF_INDEX_[01] index registers.

Signed-off-by: Glenn Kennard <glenn.kenn...@gmail.com>
---
 docs/GL3.txt                               | 4 +--
 src/gallium/drivers/r600/eg_asm.c          | 52 ---
 src/gallium/drivers/r600/r600_asm.c        | 58 +-
 src/gallium/drivers/r600/r600_asm.h        | 9 +
 src/gallium/drivers/r600/r600_shader.c     | 52 +++
 src/gallium/drivers/r600/r600_shader.h     | 2 ++
 src/gallium/drivers/r600/sb/sb_bc_dump.cpp | 8 -
 src/gallium/drivers/r600/sb/sb_sched.h     | 2 ++
 8 files changed, 166 insertions(+), 21 deletions(-)

diff --git a/docs/GL3.txt b/docs/GL3.txt
index 5ccfdea..dba36e0 100644
--- a/docs/GL3.txt
+++ b/docs/GL3.txt
@@ -98,8 +98,8 @@ GL 4.0, GLSL 4.00:
   GL_ARB_draw_indirect                         DONE (i965, nvc0, radeonsi, llvmpipe, softpipe)
   GL_ARB_gpu_shader5                           DONE (i965, nvc0)
   - 'precise' qualifier                        DONE
-  - Dynamically uniform sampler array indices  DONE ()
-  - Dynamically uniform UBO array indices      DONE ()
+  - Dynamically uniform sampler array indices  DONE (r600)
+  - Dynamically uniform UBO array indices      DONE (r600)
   - Implicit signed -> unsigned conversions    DONE
   - Fused multiply-add                         DONE ()
   - Packing/bitfield/conversion functions      DONE (r600)
diff --git a/src/gallium/drivers/r600/eg_asm.c b/src/gallium/drivers/r600/eg_asm.c
index acb3040..295cb4d 100644
--- a/src/gallium/drivers/r600/eg_asm.c
+++ b/src/gallium/drivers/r600/eg_asm.c
@@ -43,10 +43,10 @@ int eg_bytecode_cf_build(struct r600_bytecode *bc, struct r600_bytecode_cf *cf)
 		/* prepend ALU_EXTENDED if we need more than 2 kcache sets */
 		if (cf->eg_alu_extended) {
 			bc->bytecode[id++] =
-				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(V_SQ_CF_INDEX_NONE) |
-				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(V_SQ_CF_INDEX_NONE) |
-				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(V_SQ_CF_INDEX_NONE) |
-				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(V_SQ_CF_INDEX_NONE) |
+				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE0(cf->kcache[0].index_mode) |
+				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE1(cf->kcache[1].index_mode) |
+				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE2(cf->kcache[2].index_mode) |
+				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK_INDEX_MODE3(cf->kcache[3].index_mode) |
 				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK2(cf->kcache[2].bank) |
 				S_SQ_CF_ALU_WORD0_EXT_KCACHE_BANK3(cf->kcache[3].bank) |
 				S_SQ_CF_ALU_WORD0_EXT_KCACHE_MODE2(cf->kcache[2].mode);
@@ -143,3 +143,47 @@ void eg_bytecode_export_read(struct r600_bytecode *bc,
 	output->comp_mask = G_SQ_CF_ALLOC_EXPORT_WORD1_BUF_COMP_MASK(word1);
 }
 #endif
+
+int egcm_load_index_reg(struct r600_bytecode *bc, unsigned id, bool inside_alu_clause)
+{
+	struct r600_bytecode_alu alu;
+	int r;
+	unsigned type;
+
+	assert(id < 2);
+	assert(bc->chip_class >= EVERGREEN);
+
+	if (bc->index_loaded[id])
+		return 0;
+
+	memset(&alu, 0, sizeof(alu));
+	alu.op = ALU_OP1_MOVA_INT;
+	alu.src[0].sel = bc->index_reg[id];
+	alu.src[0].chan = 0;
+	alu.last = 1;
+	r = r600_bytecode_add_alu(bc, &alu);
+	if (r)
+		return r;
+
+	bc->ar_loaded = 0; /* clobbered */
+
+	memset(&alu, 0, sizeof(alu));
+	alu.op = id == 0 ? ALU_OP0_SET_CF_IDX0 : ALU_OP0_SET_CF_IDX1;
+	alu.last = 1;
+	r = r600_bytecode_add_alu(bc, &alu);
+	if (r)
+		return r;
+
+	/* Must split ALU group as index only applies to following group */
+	if (inside_alu_clause) {
+		type = bc->cf_last->op;
+		if ((r = r600_bytecode_add_cf(bc))) {
+			return r;
+		}
+		bc->cf_last->op = type;
+	}
+
+	bc->index_loaded[id] = 1;
+
+	return 0;
+}
diff --git a/src/gallium/drivers/r600/r600_asm.c b/src/gallium/drivers/r600/r600_asm.c
index 8aa69b5..ce3c2d1 100644
--- a/src/gallium/drivers/r600/r600_asm.c
+++ b/src/gallium/drivers/r600/r600_asm.c
@@ -819,6 +819,10 @@ static int merge_inst_groups(struct r600_bytecode *bc, struct