Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs
On 24/07/18 14:24, Marek Olšák wrote: On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri wrote: On 24/07/18 11:15, Marek Olšák wrote: On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie wrote: On 20 July 2018 at 13:12, Marek Olšák wrote: From: Marek Olšák To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 finish sooner on the older CPUs. (otherwise it gets killed and we fail the test) I think this is possibly a bad idea, since it's clear LLVM has some pathalogical behaviour the AMDGPU backend for this shader and we are just papering over it. A quick dig into LLVM shows horrible misuse of a SmallVector data structure for what ends up having 2000 entries in it. I'm not going to out right NAK this, but it would be nice to have it accompanied by a pointer to an llvm bug against the amdgpu backend for the pathalogical case. Even if I comment out the push_back call in LLVM, it's still too slow. (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster, but I don't know yet if it's enough for the test. I hard-coded the second buffer block to column_major rather than row_major which reduced total run time from 15 -> 9 seconds on my machine. So it seems temps would definitely help. Proper packing support would also likely help a little more but not as much. Can you please describe how temps would help? I already have :) https://lists.freedesktop.org/archives/mesa-dev/2018-July/200710.html Thanks, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs
On Mon, Jul 23, 2018 at 11:33 PM, Timothy Arceri wrote: > On 24/07/18 11:15, Marek Olšák wrote: >> >> On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie wrote: >>> >>> On 20 July 2018 at 13:12, Marek Olšák wrote: From: Marek Olšák To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 finish sooner on the older CPUs. (otherwise it gets killed and we fail the test) >>> >>> >>> I think this is possibly a bad idea, since it's clear LLVM has some >>> pathalogical >>> behaviour the AMDGPU backend for this shader and we are just papering >>> over it. >>> >>> A quick dig into LLVM shows horrible misuse of a SmallVector data >>> structure >>> for what ends up having 2000 entries in it. >>> >>> I'm not going to out right NAK this, but it would be nice to have it >>> accompanied >>> by a pointer to an llvm bug against the amdgpu backend for the >>> pathalogical case. >> >> >> Even if I comment out the push_back call in LLVM, it's still too slow. >> (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster, >> but I don't know yet if it's enough for the test. > > > I hard-coded the second buffer block to column_major rather than row_major > which reduced total run time from 15 -> 9 seconds on my machine. So it seems > temps would definitely help. Proper packing support would also likely help a > little more but not as much. Can you please describe how temps would help? Thanks, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] gallium/u_vbuf: handle indirect multidraws correctly and efficiently (v2)
From: Marek Olšák v2: need to do MAX{start+count} instead of MAX{count} added piglit tests --- src/gallium/auxiliary/util/u_vbuf.c | 199 1 file changed, 175 insertions(+), 24 deletions(-) diff --git a/src/gallium/auxiliary/util/u_vbuf.c b/src/gallium/auxiliary/util/u_vbuf.c index 746ff1085ce..ca53e6218fd 100644 --- a/src/gallium/auxiliary/util/u_vbuf.c +++ b/src/gallium/auxiliary/util/u_vbuf.c @@ -1124,20 +1124,45 @@ static void u_vbuf_set_driver_vertex_buffers(struct u_vbuf *mgr) unsigned start_slot, count; start_slot = ffs(mgr->dirty_real_vb_mask) - 1; count = util_last_bit(mgr->dirty_real_vb_mask >> start_slot); pipe->set_vertex_buffers(pipe, start_slot, count, mgr->real_vertex_buffer + start_slot); mgr->dirty_real_vb_mask = 0; } +static void +u_vbuf_split_indexed_multidraw(struct u_vbuf *mgr, struct pipe_draw_info *info, + unsigned *indirect_data, unsigned stride, + unsigned draw_count) +{ + assert(info->index_size); + info->indirect = NULL; + + for (unsigned i = 0; i < draw_count; i++) { + unsigned offset = i * stride / 4; + + info->count = indirect_data[offset + 0]; + info->instance_count = indirect_data[offset + 1]; + + if (!info->count || !info->instance_count) + continue; + + info->start = indirect_data[offset + 2]; + info->index_bias = indirect_data[offset + 3]; + info->start_instance = indirect_data[offset + 4]; + + u_vbuf_draw_vbo(mgr, info); + } +} + void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info) { struct pipe_context *pipe = mgr->pipe; int start_vertex; unsigned min_index; unsigned num_vertices; boolean unroll_indices = FALSE; const uint32_t used_vb_mask = mgr->ve->used_vb_mask; uint32_t user_vb_mask = mgr->user_vb_mask & used_vb_mask; const uint32_t incompatible_vb_mask = @@ -1153,47 +1178,172 @@ void u_vbuf_draw_vbo(struct u_vbuf *mgr, const struct pipe_draw_info *info) if (mgr->dirty_real_vb_mask & used_vb_mask) { u_vbuf_set_driver_vertex_buffers(mgr); } pipe->draw_vbo(pipe, info); return; } new_info = *info; - /* Fallback. We need to know all the parameters. */ + /* Handle indirect (multi)draws. */ if (new_info.indirect) { - struct pipe_transfer *transfer = NULL; - int *data; - - if (new_info.index_size) { - data = pipe_buffer_map_range(pipe, new_info.indirect->buffer, - new_info.indirect->offset, 20, - PIPE_TRANSFER_READ, ); - new_info.index_bias = data[3]; - new_info.start_instance = data[4]; - } - else { - data = pipe_buffer_map_range(pipe, new_info.indirect->buffer, - new_info.indirect->offset, 16, - PIPE_TRANSFER_READ, ); - new_info.start_instance = data[3]; + const struct pipe_draw_indirect_info *indirect = new_info.indirect; + unsigned draw_count = 0; + + /* Get the number of draws. */ + if (indirect->indirect_draw_count) { + pipe_buffer_read(pipe, indirect->indirect_draw_count, + indirect->indirect_draw_count_offset, + 4, _count); + } else { + draw_count = indirect->draw_count; } - new_info.count = data[0]; - new_info.instance_count = data[1]; - new_info.start = data[2]; - pipe_buffer_unmap(pipe, transfer); - new_info.indirect = NULL; - - if (!new_info.count) + if (!draw_count) return; + + unsigned data_size = (draw_count - 1) * indirect->stride + + (new_info.index_size ? 20 : 16); + unsigned *data = alloca(data_size); + + /* Read the used buffer range only once, because the read can be + * uncached. + */ + pipe_buffer_read(pipe, indirect->buffer, indirect->offset, data_size, + data); + + if (info->index_size) { + /* Indexed multidraw. */ + unsigned index_bias0 = data[3]; + bool index_bias_same = true; + + /* If we invoke the translate path, we have to split the multidraw. */ + if (incompatible_vb_mask || + mgr->ve->incompatible_elem_mask) { +u_vbuf_split_indexed_multidraw(mgr, _info, data, + indirect->stride, draw_count); +return; + } + + /* See if index_bias is the same for all draws. */ + for (unsigned i = 1; i < draw_count; i++) { +if (data[i * indirect->stride / 4 + 3] != index_bias0) { + index_bias_same = false; + break; +} + } + + /* Split the multidraw if index_bias is different. */ + if
[Mesa-dev] [PATCH] mesa: allow indirect draws with the default VAO and compatibility profile
From: Marek Olšák --- src/mesa/main/draw_validate.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/main/draw_validate.c b/src/mesa/main/draw_validate.c index c0a234a2bc2..29304bd5144 100644 --- a/src/mesa/main/draw_validate.c +++ b/src/mesa/main/draw_validate.c @@ -1078,21 +1078,22 @@ valid_draw_indirect(struct gl_context *ctx, { const uint64_t end = (uint64_t) (uintptr_t) indirect + size; /* OpenGL ES 3.1 spec. section 10.5: * * "DrawArraysIndirect requires that all data sourced for the * command, including the DrawArraysIndirectCommand * structure, be in buffer objects, and may not be called when * the default vertex array object is bound." */ - if (ctx->Array.VAO == ctx->Array.DefaultVAO) { + if (ctx->API != API_OPENGL_COMPAT && + ctx->Array.VAO == ctx->Array.DefaultVAO) { _mesa_error(ctx, GL_INVALID_OPERATION, "(no VAO bound)"); return GL_FALSE; } /* From OpenGL ES 3.1 spec. section 10.5: * "An INVALID_OPERATION error is generated if zero is bound to * VERTEX_ARRAY_BINDING, DRAW_INDIRECT_BUFFER or to any enabled * vertex array." * * Here we check that for each enabled vertex array we have a vertex -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] gallium/u_vbuf: split u_vbuf_get_minmax_index function (v2)
From: Marek Olšák This will be used by indirect multidraws. v2: clean up the function further, change return types to unsigned --- src/gallium/auxiliary/util/u_vbuf.c | 101 ++-- 1 file changed, 51 insertions(+), 50 deletions(-) diff --git a/src/gallium/auxiliary/util/u_vbuf.c b/src/gallium/auxiliary/util/u_vbuf.c index c04a18a6764..746ff1085ce 100644 --- a/src/gallium/auxiliary/util/u_vbuf.c +++ b/src/gallium/auxiliary/util/u_vbuf.c @@ -1015,111 +1015,111 @@ static boolean u_vbuf_mapping_vertex_buffer_blocks(const struct u_vbuf *mgr) * We could query whether each buffer is busy, but that would * be way more costly than this. */ return (mgr->ve->used_vb_mask & (~mgr->user_vb_mask & ~mgr->incompatible_vb_mask & mgr->ve->compatible_vb_mask_all & mgr->ve->noninstance_vb_mask_any & mgr->nonzero_stride_vb_mask)) != 0; } -static void u_vbuf_get_minmax_index(struct pipe_context *pipe, -const struct pipe_draw_info *info, -int *out_min_index, int *out_max_index) +static void +u_vbuf_get_minmax_index_mapped(const struct pipe_draw_info *info, + const void *indices, unsigned *out_min_index, + unsigned *out_max_index) { - struct pipe_transfer *transfer = NULL; - const void *indices; - unsigned i; - - if (info->has_user_indices) { - indices = (uint8_t*)info->index.user + -info->start * info->index_size; - } else { - indices = pipe_buffer_map_range(pipe, info->index.resource, - info->start * info->index_size, - info->count * info->index_size, - PIPE_TRANSFER_READ, ); - } + unsigned max = 0; + unsigned min = ~0u; switch (info->index_size) { case 4: { const unsigned *ui_indices = (const unsigned*)indices; - unsigned max_ui = 0; - unsigned min_ui = ~0U; if (info->primitive_restart) { - for (i = 0; i < info->count; i++) { + for (unsigned i = 0; i < info->count; i++) { if (ui_indices[i] != info->restart_index) { - if (ui_indices[i] > max_ui) max_ui = ui_indices[i]; - if (ui_indices[i] < min_ui) min_ui = ui_indices[i]; + if (ui_indices[i] > max) max = ui_indices[i]; + if (ui_indices[i] < min) min = ui_indices[i]; } } } else { - for (i = 0; i < info->count; i++) { -if (ui_indices[i] > max_ui) max_ui = ui_indices[i]; -if (ui_indices[i] < min_ui) min_ui = ui_indices[i]; + for (unsigned i = 0; i < info->count; i++) { +if (ui_indices[i] > max) max = ui_indices[i]; +if (ui_indices[i] < min) min = ui_indices[i]; } } - *out_min_index = min_ui; - *out_max_index = max_ui; break; } case 2: { const unsigned short *us_indices = (const unsigned short*)indices; - unsigned max_us = 0; - unsigned min_us = ~0U; if (info->primitive_restart) { - for (i = 0; i < info->count; i++) { + for (unsigned i = 0; i < info->count; i++) { if (us_indices[i] != info->restart_index) { - if (us_indices[i] > max_us) max_us = us_indices[i]; - if (us_indices[i] < min_us) min_us = us_indices[i]; + if (us_indices[i] > max) max = us_indices[i]; + if (us_indices[i] < min) min = us_indices[i]; } } } else { - for (i = 0; i < info->count; i++) { -if (us_indices[i] > max_us) max_us = us_indices[i]; -if (us_indices[i] < min_us) min_us = us_indices[i]; + for (unsigned i = 0; i < info->count; i++) { +if (us_indices[i] > max) max = us_indices[i]; +if (us_indices[i] < min) min = us_indices[i]; } } - *out_min_index = min_us; - *out_max_index = max_us; break; } case 1: { const unsigned char *ub_indices = (const unsigned char*)indices; - unsigned max_ub = 0; - unsigned min_ub = ~0U; if (info->primitive_restart) { - for (i = 0; i < info->count; i++) { + for (unsigned i = 0; i < info->count; i++) { if (ub_indices[i] != info->restart_index) { - if (ub_indices[i] > max_ub) max_ub = ub_indices[i]; - if (ub_indices[i] < min_ub) min_ub = ub_indices[i]; + if (ub_indices[i] > max) max = ub_indices[i]; + if (ub_indices[i] < min) min = ub_indices[i]; } } } else { - for (i = 0; i < info->count; i++) { -if (ub_indices[i] > max_ub) max_ub = ub_indices[i]; -if (ub_indices[i] < min_ub) min_ub = ub_indices[i]; + for
Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs
On 24/07/18 11:15, Marek Olšák wrote: On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie wrote: On 20 July 2018 at 13:12, Marek Olšák wrote: From: Marek Olšák To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 finish sooner on the older CPUs. (otherwise it gets killed and we fail the test) I think this is possibly a bad idea, since it's clear LLVM has some pathalogical behaviour the AMDGPU backend for this shader and we are just papering over it. A quick dig into LLVM shows horrible misuse of a SmallVector data structure for what ends up having 2000 entries in it. I'm not going to out right NAK this, but it would be nice to have it accompanied by a pointer to an llvm bug against the amdgpu backend for the pathalogical case. Even if I comment out the push_back call in LLVM, it's still too slow. (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster, but I don't know yet if it's enough for the test. I hard-coded the second buffer block to column_major rather than row_major which reduced total run time from 15 -> 9 seconds on my machine. So it seems temps would definitely help. Proper packing support would also likely help a little more but not as much. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2] ac, radeonsi: reduce optimizations for complex compute shaders on older APUs
On Fri, Jul 20, 2018 at 12:53 AM, Dave Airlie wrote: > On 20 July 2018 at 13:12, Marek Olšák wrote: >> From: Marek Olšák >> >> To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23 >> finish sooner on the older CPUs. (otherwise it gets killed and we fail >> the test) > > I think this is possibly a bad idea, since it's clear LLVM has some > pathalogical > behaviour the AMDGPU backend for this shader and we are just papering over it. > > A quick dig into LLVM shows horrible misuse of a SmallVector data structure > for what ends up having 2000 entries in it. > > I'm not going to out right NAK this, but it would be nice to have it > accompanied > by a pointer to an llvm bug against the amdgpu backend for the > pathalogical case. Even if I comment out the push_back call in LLVM, it's still too slow. (the dEQP test times out and fails) LLVMCodeGenLevelLess is faster, but I don't know yet if it's enough for the test. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/8] ac: add support for 16bit UBO loads
On Thu, Jul 19, 2018 at 8:48 AM, Daniel Schürmann wrote: > Signed-off-by: Daniel Schürmann > --- > src/amd/common/ac_llvm_build.c | 25 + > src/amd/common/ac_llvm_build.h | 8 > src/amd/common/ac_nir_to_llvm.c | 21 ++--- > 3 files changed, 51 insertions(+), 3 deletions(-) > > diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c > index 4078b005e5..54b7e98701 100644 > --- a/src/amd/common/ac_llvm_build.c > +++ b/src/amd/common/ac_llvm_build.c > @@ -1103,6 +1103,31 @@ LLVMValueRef > ac_build_buffer_load_format_gfx9_safe(struct ac_llvm_context *ctx, >can_speculate, true); > } > > +LLVMValueRef > +ac_build_tbuffer_load_short(struct ac_llvm_context *ctx, > + LLVMValueRef rsrc, > + LLVMValueRef vindex, > + LLVMValueRef voffset, > + LLVMValueRef soffset, > + LLVMValueRef immoffset) > +{ > + const char *name = "llvm.amdgcn.tbuffer.load.i32"; > + LLVMTypeRef type = ctx->i32; > + LLVMValueRef params[] = { > + rsrc, > + vindex, > + voffset, > + soffset, > + immoffset, > + LLVMConstInt(ctx->i32, > V_008F0C_BUF_DATA_FORMAT_16, false), > + LLVMConstInt(ctx->i32, > V_008F0C_BUF_NUM_FORMAT_UINT, false), > + ctx->i1false, > + ctx->i1false, > + }; > + LLVMValueRef res = ac_build_intrinsic(ctx, name, type, params, 9, 0); > + return LLVMBuildTrunc(ctx->builder, res, ctx->i16, ""); > +} > + > /** > * Set range metadata on an instruction. This can only be used on load and > * call instructions. If you know an instruction can only produce the values > diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h > index 4e7cbcd5fa..c5753037e7 100644 > --- a/src/amd/common/ac_llvm_build.h > +++ b/src/amd/common/ac_llvm_build.h > @@ -252,6 +252,14 @@ LLVMValueRef > ac_build_buffer_load_format_gfx9_safe(struct ac_llvm_context *ctx, >bool glc, >bool can_speculate); > > +LLVMValueRef > +ac_build_tbuffer_load_short(struct ac_llvm_context *ctx, > + LLVMValueRef rsrc, > + LLVMValueRef vindex, > + LLVMValueRef voffset, > + LLVMValueRef soffset, > + LLVMValueRef immoffset); > + > LLVMValueRef > ac_get_thread_id(struct ac_llvm_context *ctx); > > diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c > index 43a0b86420..d7a52a536c 100644 > --- a/src/amd/common/ac_nir_to_llvm.c > +++ b/src/amd/common/ac_nir_to_llvm.c > @@ -1663,9 +1663,24 @@ static LLVMValueRef visit_load_ubo_buffer(struct > ac_nir_context *ctx, > if (instr->dest.ssa.bit_size == 64) > num_components *= 2; > > - ret = ac_build_buffer_load(>ac, rsrc, num_components, NULL, > offset, > - NULL, 0, false, false, true, true); > - ret = ac_trim_vector(>ac, ret, num_components); > + if (instr->dest.ssa.bit_size == 16) { > + LLVMValueRef results[num_components]; > + for (unsigned i = 0; i < num_components; ++i) { > + results[i] = ac_build_tbuffer_load_short(>ac, > +rsrc, > + > ctx->ac.i32_0, > +offset, > + > ctx->ac.i32_0, > + > LLVMConstInt(ctx->ac.i32, 2 * i, 0)); > + } FYI, tbuffer.load is significantly slower than SI.load.const. If the offset is aligned to 4, SI.load.const + FPExt or ZExt/SExt would be faster. Marek > + ret = ac_build_gather_values(>ac, results, > num_components); > + } else { > + ret = ac_build_buffer_load(>ac, rsrc, num_components, > NULL, offset, > + NULL, 0, false, false, true, true); > + > + ret = ac_trim_vector(>ac, ret, num_components); > + } > + > return LLVMBuildBitCast(ctx->ac.builder, ret, > get_def_type(ctx, >dest.ssa), ""); > } > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/7] st/mesa: generalize code for the compressed texture map/unmap fallback
From: Marek Olšák in order to support ASTC --- src/mesa/state_tracker/st_cb_texture.c | 41 +++--- src/mesa/state_tracker/st_texture.h| 13 2 files changed, 31 insertions(+), 23 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_texture.c b/src/mesa/state_tracker/st_cb_texture.c index b345b2c6d8b..ecd1f4ef339 100644 --- a/src/mesa/state_tracker/st_cb_texture.c +++ b/src/mesa/state_tracker/st_cb_texture.c @@ -200,23 +200,23 @@ st_FreeTextureImageBuffer(struct gl_context *ctx, DBG("%s\n", __func__); if (stImage->pt) { pipe_resource_reference(>pt, NULL); } free(stImage->transfer); stImage->transfer = NULL; stImage->num_transfers = 0; - if (stImage->etc_data) { - free(stImage->etc_data); - stImage->etc_data = NULL; + if (stImage->compressed_data) { + free(stImage->compressed_data); + stImage->compressed_data = NULL; } /* if the texture image is being deallocated, the structure of the * texture is changing so we'll likely need a new sampler view. */ st_texture_release_all_sampler_views(st, stObj); } bool st_compressed_format_fallback(struct st_context *st, mesa_format format) @@ -224,36 +224,37 @@ st_compressed_format_fallback(struct st_context *st, mesa_format format) if (format == MESA_FORMAT_ETC1_RGB8) return !st->has_etc1; if (_mesa_is_format_etc2(format)) return !st->has_etc2; return false; } static void -etc_fallback_allocate(struct st_context *st, struct st_texture_image *stImage) +compressed_tex_fallback_allocate(struct st_context *st, + struct st_texture_image *stImage) { struct gl_texture_image *texImage = >base; if (!st_compressed_format_fallback(st, texImage->TexFormat)) return; - if (stImage->etc_data) - free(stImage->etc_data); + if (stImage->compressed_data) + free(stImage->compressed_data); unsigned data_size = _mesa_format_image_size(texImage->TexFormat, texImage->Width2, texImage->Height2, texImage->Depth2); - stImage->etc_data = + stImage->compressed_data = malloc(data_size * _mesa_num_tex_faces(texImage->TexObject->Target)); } /** called via ctx->Driver.MapTextureImage() */ static void st_MapTextureImage(struct gl_context *ctx, struct gl_texture_image *texImage, GLuint slice, GLuint x, GLuint y, GLuint w, GLuint h, GLbitfield mode, GLubyte **mapOut, GLint *rowStrideOut) @@ -268,36 +269,42 @@ st_MapTextureImage(struct gl_context *ctx, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT)) == 0); const enum pipe_transfer_usage transfer_flags = st_access_flags_to_transfer_flags(mode, false); map = st_texture_image_map(st, stImage, transfer_flags, x, y, slice, w, h, 1, ); if (map) { if (st_compressed_format_fallback(st, texImage->TexFormat)) { - /* ETC isn't supported by all gallium drivers, where it's represented - * by uncompressed formats. We store the compressed data (as it's - * needed for image copies in OES_copy_image), and decompress as - * necessary in Unmap. + /* Some compressed formats don't have to be supported by drivers, + * and st/mesa transparently handles decompression on upload (Unmap), + * so that drivers don't see the compressed formats. * - * Note: all ETC1/ETC2 formats have 4x4 block sizes. + * We store the compressed data (it's needed for glGetCompressedTex- + * Image and image copies in OES_copy_image). */ unsigned z = transfer->box.z; struct st_texture_image_transfer *itransfer = >transfer[z]; - unsigned bytes = _mesa_get_format_bytes(texImage->TexFormat); + unsigned blk_w, blk_h; + _mesa_get_format_block_size(texImage->TexFormat, _w, _h); + + unsigned y_blocks = DIV_ROUND_UP(texImage->Height2, blk_h); unsigned stride = *rowStrideOut = itransfer->temp_stride = _mesa_format_row_stride(texImage->TexFormat, texImage->Width2); + unsigned block_size = _mesa_get_format_bytes(texImage->TexFormat); + *mapOut = itransfer->temp_data = -stImage->etc_data + ((x / 4) * bytes + (y / 4) * stride) + -z * stride * texImage->Height2 / 4; +stImage->compressed_data + +(z * y_blocks + (y / blk_h)) * stride + +(x / blk_w) * block_size; itransfer->map = map; } else { /* supported mapping */ *mapOut = map; *rowStrideOut = transfer->stride; } } else { *mapOut = NULL; @@ -624,21 +631,21 @@
[Mesa-dev] [PATCH 1/7] mesa: add ASTC 2D LDR decoder
From: Marek Olšák --- src/mesa/Makefile.sources |2 + src/mesa/main/formats.c| 42 + src/mesa/main/formats.h|3 + src/mesa/main/texcompress_astc.cpp | 1871 src/mesa/main/texcompress_astc.h | 47 + src/util/half_float.c | 59 + src/util/half_float.h |5 + 7 files changed, 2029 insertions(+) create mode 100644 src/mesa/main/texcompress_astc.cpp create mode 100644 src/mesa/main/texcompress_astc.h diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources index 63f3734c322..ae8934e2830 100644 --- a/src/mesa/Makefile.sources +++ b/src/mesa/Makefile.sources @@ -207,20 +207,22 @@ MAIN_FILES = \ main/shader_query.cpp \ main/shared.c \ main/shared.h \ main/state.c \ main/state.h \ main/stencil.c \ main/stencil.h \ main/syncobj.c \ main/syncobj.h \ main/texcompress.c \ + main/texcompress_astc.cpp \ + main/texcompress_astc.h \ main/texcompress_bptc.c \ main/texcompress_bptc.h \ main/texcompress_bptc_tmp.h \ main/texcompress_cpal.c \ main/texcompress_cpal.h \ main/texcompress_etc.c \ main/texcompress_etc.h \ main/texcompress_etc_tmp.h \ main/texcompress_fxt1.c \ main/texcompress_fxt1.h \ diff --git a/src/mesa/main/formats.c b/src/mesa/main/formats.c index fdb53afd570..d4cd5d2182c 100644 --- a/src/mesa/main/formats.c +++ b/src/mesa/main/formats.c @@ -620,20 +620,62 @@ _mesa_is_format_etc2(mesa_format format) case MESA_FORMAT_ETC2_SIGNED_RG11_EAC: case MESA_FORMAT_ETC2_RGB8_PUNCHTHROUGH_ALPHA1: case MESA_FORMAT_ETC2_SRGB8_PUNCHTHROUGH_ALPHA1: return GL_TRUE; default: return GL_FALSE; } } +/** + * Return TRUE if format is an ASTC 2D compressed format. + */ +bool +_mesa_is_format_astc_2d(mesa_format format) +{ + switch (format) { + case MESA_FORMAT_RGBA_ASTC_4x4: + case MESA_FORMAT_RGBA_ASTC_5x4: + case MESA_FORMAT_RGBA_ASTC_5x5: + case MESA_FORMAT_RGBA_ASTC_6x5: + case MESA_FORMAT_RGBA_ASTC_6x6: + case MESA_FORMAT_RGBA_ASTC_8x5: + case MESA_FORMAT_RGBA_ASTC_8x6: + case MESA_FORMAT_RGBA_ASTC_8x8: + case MESA_FORMAT_RGBA_ASTC_10x5: + case MESA_FORMAT_RGBA_ASTC_10x6: + case MESA_FORMAT_RGBA_ASTC_10x8: + case MESA_FORMAT_RGBA_ASTC_10x10: + case MESA_FORMAT_RGBA_ASTC_12x10: + case MESA_FORMAT_RGBA_ASTC_12x12: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_4x4: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x4: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_5x5: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x5: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_6x6: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x5: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x6: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_8x8: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x5: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x6: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x8: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_10x10: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_12x10: + case MESA_FORMAT_SRGB8_ALPHA8_ASTC_12x12: + return true; + default: + return false; + } +} + + /** * If the given format is a compressed format, return a corresponding * uncompressed format. */ mesa_format _mesa_get_uncompressed_format(mesa_format format) { switch (format) { case MESA_FORMAT_RGB_FXT1: return MESA_FORMAT_BGR_UNORM8; diff --git a/src/mesa/main/formats.h b/src/mesa/main/formats.h index 2afa886782b..335e4de9955 100644 --- a/src/mesa/main/formats.h +++ b/src/mesa/main/formats.h @@ -714,20 +714,23 @@ _mesa_is_format_unsigned(mesa_format format); extern GLboolean _mesa_is_format_signed(mesa_format format); extern GLboolean _mesa_is_format_integer(mesa_format format); extern bool _mesa_is_format_etc2(mesa_format format); +bool +_mesa_is_format_astc_2d(mesa_format format); + GLenum _mesa_is_format_color_format(mesa_format format); extern GLenum _mesa_get_format_color_encoding(mesa_format format); extern GLuint _mesa_format_image_size(mesa_format format, GLsizei width, GLsizei height, GLsizei depth); diff --git a/src/mesa/main/texcompress_astc.cpp b/src/mesa/main/texcompress_astc.cpp new file mode 100644 index 000..996e8ea28d6 --- /dev/null +++ b/src/mesa/main/texcompress_astc.cpp @@ -0,0 +1,1871 @@ +/* + * Copyright 2015 Philip Taylor + * Copyright 2018 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this
[Mesa-dev] [PATCH 5/7] st/mesa: generalize fallback_copy_image for compressed textures
From: Marek Olšák in order to support ASTC --- src/mesa/state_tracker/st_cb_copyimage.c | 17 +++-- 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_copyimage.c b/src/mesa/state_tracker/st_cb_copyimage.c index 5230b61877f..f5d8c047800 100644 --- a/src/mesa/state_tracker/st_cb_copyimage.c +++ b/src/mesa/state_tracker/st_cb_copyimage.c @@ -525,52 +525,57 @@ copy_image(struct pipe_context *pipe, PIPE_FORMAT_R16G16_UNORM)) return; /* Only allow non-identity swizzling on RGBA8 formats. */ /* Simple copy, memcpy with swizzling, no format conversion. */ swizzled_copy(pipe, dst, dst_level, dstx, dsty, dstz, src, src_level, src_box); } -/* Note, the only allowable compressed format for this function is ETC */ static void fallback_copy_image(struct st_context *st, struct gl_texture_image *dst_image, struct pipe_resource *dst_res, int dst_x, int dst_y, int dst_z, struct gl_texture_image *src_image, struct pipe_resource *src_res, int src_x, int src_y, int src_z, int src_w, int src_h) { uint8_t *dst, *src; int dst_stride, src_stride; struct pipe_transfer *dst_transfer, *src_transfer; unsigned line_bytes; bool dst_is_compressed = dst_image && _mesa_is_format_compressed(dst_image->TexFormat); bool src_is_compressed = src_image && _mesa_is_format_compressed(src_image->TexFormat); + unsigned dst_blk_w = 1, dst_blk_h = 1, src_blk_w = 1, src_blk_h = 1; + if (dst_image) + _mesa_get_format_block_size(dst_image->TexFormat, _blk_w, _blk_h); + if (src_image) + _mesa_get_format_block_size(src_image->TexFormat, _blk_w, _blk_h); + unsigned dst_w = src_w; unsigned dst_h = src_h; unsigned lines = src_h; if (src_is_compressed && !dst_is_compressed) { - dst_w = DIV_ROUND_UP(dst_w, 4); - dst_h = DIV_ROUND_UP(dst_h, 4); + dst_w = DIV_ROUND_UP(dst_w, src_blk_w); + dst_h = DIV_ROUND_UP(dst_h, src_blk_h); } else if (!src_is_compressed && dst_is_compressed) { - dst_w *= 4; - dst_h *= 4; + dst_w *= dst_blk_w; + dst_h *= dst_blk_h; } if (src_is_compressed) { - lines = DIV_ROUND_UP(lines, 4); + lines = DIV_ROUND_UP(lines, src_blk_h); } if (src_image) line_bytes = _mesa_format_row_stride(src_image->TexFormat, src_w); else line_bytes = _mesa_format_row_stride(dst_image->TexFormat, dst_w); if (dst_image) { st->ctx->Driver.MapTextureImage( st->ctx, dst_image, dst_z, -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/7] st/mesa: use st_compressed_format_fallback more
From: Marek Olšák --- src/mesa/state_tracker/st_format.c | 16 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/src/mesa/state_tracker/st_format.c b/src/mesa/state_tracker/st_format.c index 9ae796eca9e..45513e8683e 100644 --- a/src/mesa/state_tracker/st_format.c +++ b/src/mesa/state_tracker/st_format.c @@ -45,20 +45,21 @@ #include "main/macros.h" #include "main/formatquery.h" #include "pipe/p_context.h" #include "pipe/p_defines.h" #include "pipe/p_screen.h" #include "util/u_format.h" #include "st_cb_texture.h" #include "st_context.h" #include "st_format.h" +#include "st_texture.h" /** * Translate Mesa format to Gallium format. */ enum pipe_format st_mesa_format_to_pipe_format(const struct st_context *st, mesa_format mesaFormat) { struct pipe_screen *screen = st->pipe->screen; @@ -1039,41 +1040,34 @@ st_pipe_format_to_mesa_format(enum pipe_format format) */ static void test_format_conversion(struct st_context *st) { GLuint i; /* test all Mesa formats */ for (i = 1; i < MESA_FORMAT_COUNT; i++) { enum pipe_format pf; - /* ETC formats are translated differently, skip them. */ - if (_mesa_is_format_etc2(i)) - continue; - if (i == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1) + if (st_compressed_format_fallback(st, i)) continue; pf = st_mesa_format_to_pipe_format(st, i); if (pf != PIPE_FORMAT_NONE) { mesa_format MAYBE_UNUSED mf = st_pipe_format_to_mesa_format(pf); assert(mf == i); } } /* Test all Gallium formats */ for (i = 1; i < PIPE_FORMAT_COUNT; i++) { - /* ETC formats are translated differently, skip them. */ - if (i == PIPE_FORMAT_ETC1_RGB8 && !st->has_etc1) - continue; - mesa_format mf = st_pipe_format_to_mesa_format(i); - if (_mesa_is_format_etc2(mf) && !st->has_etc2) + if (st_compressed_format_fallback(st, mf)) continue; if (mf != MESA_FORMAT_NONE) { enum pipe_format MAYBE_UNUSED pf = st_mesa_format_to_pipe_format(st, mf); assert(pf == i); } } } @@ -2340,24 +2334,22 @@ st_ChooseTextureFormat(struct gl_context *ctx, GLenum target, pTarget, 0, bindings, GL_TRUE); if (pFormat == PIPE_FORMAT_NONE && !is_renderbuffer) { /* try choosing format again, this time without render target bindings */ pFormat = st_choose_format(st, internalFormat, format, type, pTarget, 0, PIPE_BIND_SAMPLER_VIEW, GL_TRUE); } if (pFormat == PIPE_FORMAT_NONE) { - /* lie about using etc1/etc2 natively if we do decoding tricks */ mFormat = _mesa_glenum_to_compressed_format(internalFormat); - if ((mFormat == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1) || - (_mesa_is_format_etc2(mFormat) && !st->has_etc2)) + if (st_compressed_format_fallback(st, mFormat)) return mFormat; /* no luck at all */ return MESA_FORMAT_NONE; } mFormat = st_pipe_format_to_mesa_format(pFormat); /* Debugging aid */ if (0) { -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/7] st/mesa: implement ASTC 2D LDR fallback for all drivers
From: Marek Olšák --- src/mesa/state_tracker/st_cb_texture.c | 17 +++- src/mesa/state_tracker/st_context.c| 3 ++ src/mesa/state_tracker/st_context.h| 1 + src/mesa/state_tracker/st_extensions.c | 5 +++ src/mesa/state_tracker/st_format.c | 56 ++ 5 files changed, 80 insertions(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_texture.c b/src/mesa/state_tracker/st_cb_texture.c index 80eb171dfd8..56d8c411472 100644 --- a/src/mesa/state_tracker/st_cb_texture.c +++ b/src/mesa/state_tracker/st_cb_texture.c @@ -34,20 +34,21 @@ #include "main/format_utils.h" #include "main/glformats.h" #include "main/image.h" #include "main/imports.h" #include "main/macros.h" #include "main/mipmap.h" #include "main/pack.h" #include "main/pbo.h" #include "main/pixeltransfer.h" #include "main/texcompress.h" +#include "main/texcompress_astc.h" #include "main/texcompress_etc.h" #include "main/texgetimage.h" #include "main/teximage.h" #include "main/texobj.h" #include "main/texstore.h" #include "state_tracker/st_debug.h" #include "state_tracker/st_context.h" #include "state_tracker/st_cb_bitmap.h" #include "state_tracker/st_cb_fbo.h" @@ -220,20 +221,23 @@ st_FreeTextureImageBuffer(struct gl_context *ctx, bool st_compressed_format_fallback(struct st_context *st, mesa_format format) { if (format == MESA_FORMAT_ETC1_RGB8) return !st->has_etc1; if (_mesa_is_format_etc2(format)) return !st->has_etc2; + if (_mesa_is_format_astc_2d(format)) + return !st->has_astc_2d_ldr; + return false; } static void compressed_tex_fallback_allocate(struct st_context *st, struct st_texture_image *stImage) { struct gl_texture_image *texImage = >base; if (!st_compressed_format_fallback(st, texImage->TexFormat)) @@ -330,28 +334,34 @@ st_UnmapTextureImage(struct gl_context *ctx, struct pipe_transfer *transfer = itransfer->transfer; assert(z == transfer->box.z); if (transfer->usage & PIPE_TRANSFER_WRITE) { if (texImage->TexFormat == MESA_FORMAT_ETC1_RGB8) { _mesa_etc1_unpack_rgba(itransfer->map, transfer->stride, itransfer->temp_data, itransfer->temp_stride, transfer->box.width, transfer->box.height); - } - else { + } else if (_mesa_is_format_etc2(texImage->TexFormat)) { bool bgra = stImage->pt->format == PIPE_FORMAT_B8G8R8A8_SRGB; _mesa_unpack_etc2_format(itransfer->map, transfer->stride, itransfer->temp_data, itransfer->temp_stride, transfer->box.width, transfer->box.height, texImage->TexFormat, bgra); + } else if (_mesa_is_format_astc_2d(texImage->TexFormat)) { +_mesa_unpack_astc_2d_ldr(itransfer->map, transfer->stride, + itransfer->temp_data, itransfer->temp_stride, + transfer->box.width, transfer->box.height, + texImage->TexFormat); + } else { +unreachable("unexpected format for a compressed format fallback"); } } itransfer->temp_data = NULL; itransfer->temp_stride = 0; itransfer->map = 0; } st_texture_image_unmap(st, stImage, slice); } @@ -1408,20 +1418,21 @@ st_TexSubImage(struct gl_context *ctx, GLuint dims, unsigned dstz = texImage->Face + texImage->TexObject->MinLayer; unsigned dst_level = 0; st_flush_bitmap_cache(st); st_invalidate_readpix_cache(st); if (stObj->pt == stImage->pt) dst_level = texImage->TexObject->MinLevel + texImage->Level; assert(!_mesa_is_format_etc2(texImage->TexFormat) && + !_mesa_is_format_astc_2d(texImage->TexFormat) && texImage->TexFormat != MESA_FORMAT_ETC1_RGB8); if (!dst) goto fallback; /* Try texture_subdata, which should be the fastest memcpy path. */ if (pixels && !_mesa_is_bufferobj(unpack->BufferObj) && _mesa_texstore_can_use_memcpy(ctx, texImage->_BaseFormat, texImage->TexFormat, format, type, @@ -1868,20 +1879,21 @@ st_GetTexSubImage(struct gl_context * ctx, GLenum gl_target = texImage->TexObject->Target; enum pipe_texture_target pipe_target; unsigned dims; struct pipe_blit_info blit; unsigned bind; struct pipe_transfer *tex_xfer; ubyte *map = NULL; boolean done = FALSE; assert(!_mesa_is_format_etc2(texImage->TexFormat) && + !_mesa_is_format_astc_2d(texImage->TexFormat) && texImage->TexFormat != MESA_FORMAT_ETC1_RGB8); st_flush_bitmap_cache(st); if
[Mesa-dev] [PATCH 6/7] st/mesa: add ETC2 & ASTC fast path for GetTex(Sub)Image
From: Marek Olšák Not sure if GL/GLES can hit this path, but it's just decompression. --- src/mesa/state_tracker/st_cb_texture.c | 41 ++ 1 file changed, 41 insertions(+) diff --git a/src/mesa/state_tracker/st_cb_texture.c b/src/mesa/state_tracker/st_cb_texture.c index ecd1f4ef339..80eb171dfd8 100644 --- a/src/mesa/state_tracker/st_cb_texture.c +++ b/src/mesa/state_tracker/st_cb_texture.c @@ -1960,35 +1960,76 @@ st_GetTexSubImage(struct gl_context * ctx, /* Set the appropriate format for the decompressed texture. * Luminance and sRGB formats shouldn't appear here.*/ switch (src_format) { case PIPE_FORMAT_DXT1_RGB: case PIPE_FORMAT_DXT1_RGBA: case PIPE_FORMAT_DXT3_RGBA: case PIPE_FORMAT_DXT5_RGBA: case PIPE_FORMAT_RGTC1_UNORM: case PIPE_FORMAT_RGTC2_UNORM: case PIPE_FORMAT_ETC1_RGB8: + case PIPE_FORMAT_ETC2_RGB8: + case PIPE_FORMAT_ETC2_RGB8A1: + case PIPE_FORMAT_ETC2_RGBA8: + case PIPE_FORMAT_ASTC_4x4: + case PIPE_FORMAT_ASTC_5x4: + case PIPE_FORMAT_ASTC_5x5: + case PIPE_FORMAT_ASTC_6x5: + case PIPE_FORMAT_ASTC_6x6: + case PIPE_FORMAT_ASTC_8x5: + case PIPE_FORMAT_ASTC_8x6: + case PIPE_FORMAT_ASTC_8x8: + case PIPE_FORMAT_ASTC_10x5: + case PIPE_FORMAT_ASTC_10x6: + case PIPE_FORMAT_ASTC_10x8: + case PIPE_FORMAT_ASTC_10x10: + case PIPE_FORMAT_ASTC_12x10: + case PIPE_FORMAT_ASTC_12x12: case PIPE_FORMAT_BPTC_RGBA_UNORM: dst_glformat = GL_RGBA8; break; case PIPE_FORMAT_RGTC1_SNORM: case PIPE_FORMAT_RGTC2_SNORM: if (!ctx->Extensions.EXT_texture_snorm) goto fallback; dst_glformat = GL_RGBA8_SNORM; break; case PIPE_FORMAT_BPTC_RGB_FLOAT: case PIPE_FORMAT_BPTC_RGB_UFLOAT: if (!ctx->Extensions.ARB_texture_float) goto fallback; dst_glformat = GL_RGBA32F; break; + case PIPE_FORMAT_ETC2_R11_UNORM: + if (!screen->is_format_supported(screen, PIPE_FORMAT_R16_UNORM, + pipe_target, 0, bind)) +goto fallback; + dst_glformat = GL_R16; + break; + case PIPE_FORMAT_ETC2_R11_SNORM: + if (!screen->is_format_supported(screen, PIPE_FORMAT_R16_SNORM, + pipe_target, 0, bind)) +goto fallback; + dst_glformat = GL_R16_SNORM; + break; + case PIPE_FORMAT_ETC2_RG11_UNORM: + if (!screen->is_format_supported(screen, PIPE_FORMAT_R16G16_UNORM, + pipe_target, 0, bind)) +goto fallback; + dst_glformat = GL_RG16; + break; + case PIPE_FORMAT_ETC2_RG11_SNORM: + if (!screen->is_format_supported(screen, PIPE_FORMAT_R16G16_SNORM, + pipe_target, 0, bind)) +goto fallback; + dst_glformat = GL_RG16_SNORM; + break; default: assert(0); goto fallback; } dst_format = st_choose_format(st, dst_glformat, format, type, pipe_target, 0, bind, FALSE); if (dst_format == PIPE_FORMAT_NONE) { /* unable to get an rgba format!?! */ -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/7] ASTC texture compression for all Gallium drivers
Hi, This series enables ASTC texture compression for all Gallium drivers that don't support it in hardware. The works the same as the ETC2 fallback, i.e. it decompresses ASTC inside glCompressedTexImage to a supported uncompressed format. RadeonSI now finally supports the following: - GL_KHR_texture_compression_astc_ldr - GL_ANDROID_extension_pack_es31a - OpenGL ES 3.2 !!! All ASTC dEQP tests pass. Please review. Thanks, Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/7] st/mesa: generalize st_etc_fallback -> st_compressed_format_fallback
From: Marek Olšák for ASTC support later --- src/mesa/state_tracker/st_cb_copyimage.c | 4 ++-- src/mesa/state_tracker/st_cb_texture.c | 24 ++-- src/mesa/state_tracker/st_texture.h | 2 +- 3 files changed, 17 insertions(+), 13 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_copyimage.c b/src/mesa/state_tracker/st_cb_copyimage.c index d160c8c8d30..5230b61877f 100644 --- a/src/mesa/state_tracker/st_cb_copyimage.c +++ b/src/mesa/state_tracker/st_cb_copyimage.c @@ -661,22 +661,22 @@ st_CopyImageSubData(struct gl_context *ctx, dst_z += dst_image->TexObject->MinLayer; } } else { struct st_renderbuffer *dst = st_renderbuffer(dst_renderbuffer); dst_res = dst->texture; dst_level = 0; } u_box_2d_zslice(src_x, src_y, src_z, src_width, src_height, ); - if ((src_image && st_etc_fallback(st, src_image)) || - (dst_image && st_etc_fallback(st, dst_image))) { + if ((src_image && st_compressed_format_fallback(st, src_image->TexFormat)) || + (dst_image && st_compressed_format_fallback(st, dst_image->TexFormat))) { fallback_copy_image(st, dst_image, dst_res, dst_x, dst_y, orig_dst_z, src_image, src_res, src_x, src_y, orig_src_z, src_width, src_height); } else { copy_image(pipe, dst_res, dst_level, dst_x, dst_y, dst_z, src_res, src_level, ); } } void diff --git a/src/mesa/state_tracker/st_cb_texture.c b/src/mesa/state_tracker/st_cb_texture.c index 99209abcd62..b345b2c6d8b 100644 --- a/src/mesa/state_tracker/st_cb_texture.c +++ b/src/mesa/state_tracker/st_cb_texture.c @@ -212,32 +212,37 @@ st_FreeTextureImageBuffer(struct gl_context *ctx, stImage->etc_data = NULL; } /* if the texture image is being deallocated, the structure of the * texture is changing so we'll likely need a new sampler view. */ st_texture_release_all_sampler_views(st, stObj); } bool -st_etc_fallback(struct st_context *st, struct gl_texture_image *texImage) +st_compressed_format_fallback(struct st_context *st, mesa_format format) { - return (_mesa_is_format_etc2(texImage->TexFormat) && !st->has_etc2) || - (texImage->TexFormat == MESA_FORMAT_ETC1_RGB8 && !st->has_etc1); + if (format == MESA_FORMAT_ETC1_RGB8) + return !st->has_etc1; + + if (_mesa_is_format_etc2(format)) + return !st->has_etc2; + + return false; } static void etc_fallback_allocate(struct st_context *st, struct st_texture_image *stImage) { struct gl_texture_image *texImage = >base; - if (!st_etc_fallback(st, texImage)) + if (!st_compressed_format_fallback(st, texImage->TexFormat)) return; if (stImage->etc_data) free(stImage->etc_data); unsigned data_size = _mesa_format_image_size(texImage->TexFormat, texImage->Width2, texImage->Height2, texImage->Depth2); @@ -262,21 +267,21 @@ st_MapTextureImage(struct gl_context *ctx, assert((mode & ~(GL_MAP_READ_BIT | GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT)) == 0); const enum pipe_transfer_usage transfer_flags = st_access_flags_to_transfer_flags(mode, false); map = st_texture_image_map(st, stImage, transfer_flags, x, y, slice, w, h, 1, ); if (map) { - if (st_etc_fallback(st, texImage)) { + if (st_compressed_format_fallback(st, texImage->TexFormat)) { /* ETC isn't supported by all gallium drivers, where it's represented * by uncompressed formats. We store the compressed data (as it's * needed for image copies in OES_copy_image), and decompress as * necessary in Unmap. * * Note: all ETC1/ETC2 formats have 4x4 block sizes. */ unsigned z = transfer->box.z; struct st_texture_image_transfer *itransfer = >transfer[z]; @@ -303,22 +308,23 @@ st_MapTextureImage(struct gl_context *ctx, /** called via ctx->Driver.UnmapTextureImage() */ static void st_UnmapTextureImage(struct gl_context *ctx, struct gl_texture_image *texImage, GLuint slice) { struct st_context *st = st_context(ctx); struct st_texture_image *stImage = st_texture_image(texImage); - if (st_etc_fallback(st, texImage)) { - /* Decompress the ETC texture to the mapped one. */ + if (st_compressed_format_fallback(st, texImage->TexFormat)) { + /* Decompress the compressed image on upload if the driver doesn't + * support the compressed format. */ unsigned z = slice + stImage->base.Face; struct st_texture_image_transfer *itransfer = >transfer[z]; struct pipe_transfer *transfer = itransfer->transfer; assert(z == transfer->box.z);
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #10 from Bas Nieuwenhuizen --- If you're in a position to get more useful info, it would be great if you could dump the *pCreateInfo arg of the crashing radv_CreateImage. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #9 from Bas Nieuwenhuizen --- Yes, images not explicitly using the path for Android cross-api interop should just follow the normal path. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #8 from Mauro Rossi --- In the code between ANDROID braces, is it correct that code falls through in case of !gralloc_info or should it bail out? Just a question as I may be very wrong and confused #ifdef ANDROID const VkNativeBufferANDROID *gralloc_info = vk_find_struct_const(pCreateInfo->pNext, NATIVE_BUFFER_ANDROID); if (gralloc_info)/* this line 1252 */ return radv_image_from_gralloc(device, pCreateInfo, gralloc_info, pAllocator, pImage); #endif -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #7 from Bas Nieuwenhuizen --- Looking at the full backtrace it looks like plain texture creation, nothing Android specific so I'll need to debug the specific app sometime. 07-22 21:31:16.071 5007 5007 F DEBUG : backtrace: 07-22 21:31:16.071 5007 5007 F DEBUG : #00 pc 000f1d47 /system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199) 07-22 21:31:16.071 5007 5007 F DEBUG : #01 pc 000f2f26 /system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166) 07-22 21:31:16.071 5007 5007 F DEBUG : #02 pc b59a /android/system/lib/libvulkan.so 07-22 21:31:16.071 5007 5007 F DEBUG : #03 pc 001488fe /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::native_memory_utility::create_texture(VkImageCreateInfo const&)+78) 07-22 21:31:16.071 5007 5007 F DEBUG : #04 pc 0013ffe9 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::vulkan_device::create_texture(eva::graphics::texture_desc const&, eva::graphics::subresource_data const*)+233) 07-22 21:31:16.071 5007 5007 F DEBUG : #05 pc 001458cf /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so 07-22 21:31:16.071 5007 5007 F DEBUG : #06 pc 0013e474 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::create_dds_texture_from_memory(std::function, eva::graphics::texture_desc*, unsigned char const*, DXGI_FORMAT, unsigned int, unsigned int, unsigned int)+2484) 07-22 21:31:16.071 5007 5007 F DEBUG : #07 pc 00145b8c /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::vulkan::vulkan_resource_factory::load_texture(boost::filesystem::path const&, DXGI_FORMAT, int)+316) 07-22 21:31:16.071 5007 5007 F DEBUG : #08 pc 001ad794 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::graphics::assets::texture_loader::load(std::string const&) const+1412) 07-22 21:31:16.071 5007 5007 F DEBUG : #09 pc 00153e69 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (eva::ui::loading_screen_contents::loading_screen_contents(eva::ui::loading_screen_creation_parameters const&, eva::globals&)+649) 07-22 21:31:16.071 5007 5007 F DEBUG : #10 pc 0011f1a1 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (mobile_player::player::create_loading_screen()+161) 07-22 21:31:16.071 5007 5007 F DEBUG : #11 pc 00125563 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (_ZN13mobile_player6playerC1ERN3eva7globalsESsNS1_4math6vectorIiLj2EEESsRNS1_2ui13invalidatableERNS7_12message_pumpEP13ANativeWindowiNS1_8graphics8platformE+6083) 07-22 21:31:16.071 5007 5007 F DEBUG : #12 pc 00118e37 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (android_native::native_application::create_player(std::string, bool, ANativeWindow*)+375) 07-22 21:31:16.071 5007 5007 F DEBUG : #13 pc 0011cd0e /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::handle_cmd(int)+206) 07-22 21:31:16.071 5007 5007 F DEBUG : #14 pc 0011ce15 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::handle_cmd(android_app*, int)+37) 07-22 21:31:16.071 5007 5007 F DEBUG : #15 pc 0025e46c /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so 07-22 21:31:16.071 5007 5007 F DEBUG : #16 pc 0011cb9e /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (activity::run()+126) 07-22 21:31:16.071 5007 5007 F DEBUG : #17 pc 0011cf02 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so (android_main+98) 07-22 21:31:16.071 5007 5007 F DEBUG : #18 pc 0025ddd4 /data/app/com.futuremark.dmandroid.application--TcMIPU5ttg_gGGrKxrVOQ==/lib/x86/lib3dmark_farandole.so -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/15] nir/types: Add a wrapper to access gl_type
On 23/07/18 22:48, Alejandro Piñeiro wrote: On 21/07/18 05:15, Timothy Arceri wrote: On 21/07/18 13:09, Timothy Arceri wrote: Reviewed-by: Timothy Arceri Actually I take that back. This introduces a dependency on GL in NIR, Hmm, but that dependency is already there. nir.h includes GL/gl.h, and in fact, there is a comment pointing that is there for GLenum. nir_variable includes a "Glenum format" as part of ARB_shader_image_load_store. If it's already there then for now this patch is: Reviewed-by: Timothy Arceri ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #6 from Mauro Rossi --- Created attachment 140802 --> https://bugs.freedesktop.org/attachment.cgi?id=140802=edit extended logcat to see what happens before and after extended logcat to see what happens before and after -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #5 from Mauro Rossi --- >Is ANDROID properly defined? Yes, because it is defined by the build system itself, without need of passing the LOCAL_CFLAGS := -DANDROID Mauro -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107295] Access violation on glDrawArrays with count >= 2048
https://bugs.freedesktop.org/show_bug.cgi?id=107295 Roland Scheidegger changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #9 from Roland Scheidegger --- Fixed by 09828feab0bdcb1feb0865fc66e5304f6164c3b8. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #4 from Bas Nieuwenhuizen --- No, on ANDROID, the call to radv_image_from_gralloc should happen for wsi images and wsi_info should be always NULL. Is ANDROID properly defined? -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] virgl: remove unused stride-arguments
On Jul 23, 2018 22:05, Dave Airlie wrote:On 20 July 2018 at 13:39, Gurchetan Singh wrote: > Reviewed-by: Gurchetan Singh > On Wed, Jul 18, 2018 at 4:06 AM Erik Faye-Lund > wrote: >> >> The IOCTLs doesn't pass this along, so computing them in the first >> place is kinda pointless. >> >> Signed-off-by: Erik Faye-Lund >> --- >> >> This is just a cleanup I noticed based on some discussion with Gert. >> >> A question is, what code here expects this stride to be respected? The >> call-sites in virgl_*_transfer_map and virgl_*_transfer_unmap kinda >> looks like they do... They'll get a bit of a surprise here, no? >> >> Anyway, this is already broken, so I think this should be OK. But >> perhaps this patch shows some code-paths that need some love? I reverted this as it didn't fixup vtest, and it introduced build time warnings. I agree this should get some more investigation but make sure vtest and drm backends don't regress. Dave. Good call! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #3 from Mauro Rossi --- Hi Bas, the problem was obeserved in the last month, but the relevant parts radv_image.c and radv_android.c did not change, so anything between 3da693b and HEAD the addr2line output is not always reliable in terms of line of code, looking at the logcat the call causing the segfault is radv_image_create() at line 1261, the symbols resolved by crash dump should be reliable, based on previous experiences. I had a look at differences between radv and anv and I saw two of them: 1. wsi extension implementation is different, so I tried to revert a50f93e (radv/image: Implement the wsi "extension") but the problem was not solved 2. radv_image_create() passes .scanout parameter and anv_image_create() does not. I have not tried yet to remove .scanout in radv_image_create(), but is it worth an attempt? Mauro -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
On Monday, July 23, 2018 1:24:08 PM PDT Rob Clark wrote: > On Mon, Jul 23, 2018 at 1:25 PM, Eric Anholt wrote: > > Eric Anholt writes: > > > >> This is controlled by a new nir_shader_compiler_options flag, and fixes > >> dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. > >> --- > > > > Ken, Rob, any chance you could take a quick look at this? I think this > > is the last change I need in Mesa for GLES2 conformance. > > > > I was kinda wondering if there was any driver where we *didn't* want > this? I had meant to try that deqp w/ freedreno but got distracted.. > I'll try to give this a look this evening.. > > BR, > -R i965 can configure the point coordinate origin to be either upper left or lower left, so I don't think we need Y-flipping in the shader. But we do need this pass for gl_FragCoord. So I think making it optional is probably a good idea. Looks good to me, Eric. Congratulations on finishing the conformance test bugfixing slog! :) Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
On Monday, July 23, 2018 10:46:31 AM PDT Jason Ekstrand wrote: > The original pass only looked for load_uniform intrinsics but there are > a number of other places that could end up loading a push constant. One > obvious omission was images which always implicitly use a push constant. > Legacy VS clip planes also get pushed into the shader. > > Cc: mesa-sta...@lists.freedesktop.org > Cc: Kenneth Graunke Ugh. Thanks. Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()
On Mon, Jul 23, 2018 at 08:20:15PM +0100, Chris Wilson wrote: > Quoting Nanley Chery (2018-07-23 18:17:15) > > Satisfy the BLT engine's row pitch limitation on the destination > > miptree. The destination miptree is untiled, so its row_pitch will be > > slightly less than or equal to the source miptree's row_pitch. Use the > > source miptree's row_pitch in can_blit_slice instead of its blt_pitch. > > > > Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3 > > ("i965/miptree: Use the correct BLT pitch") > > > > Cc: > > Reported-by: Dylan Baker > > --- > > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > index a18d5ac3624..d8e823e4826 100644 > > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > > @@ -3544,8 +3544,13 @@ static bool > > can_blit_slice(struct intel_mipmap_tree *mt, > > unsigned int level, unsigned int slice) > > { > > - /* See intel_miptree_blit() for details on the 32k pitch limit. */ > > - if (intel_miptree_blt_pitch(mt) >= 32768) > > + /* The blit destination is untiled, so its row_pitch will be slightly > > less > > +* than or equal to the source's row_pitch. The BLT engine only supports > > +* linear row pitches up to but not including 32k. > > +* > > +* See intel_miptree_blit() for details on the 32k pitch limit. > > +*/ > > + if (mt->surf.row_pitch >= 32768) > >return false; > > I see the difference, but do we copy the whole slice or a region of it? I think it depends on the caller. -Nanley > -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #2 from Bas Nieuwenhuizen --- Do you have a hash of the upstream git version you based this on? (just so I can see how the lines correlate in radv_image.c) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
On Mon, Jul 23, 2018 at 1:25 PM, Eric Anholt wrote: > Eric Anholt writes: > >> This is controlled by a new nir_shader_compiler_options flag, and fixes >> dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. >> --- > > Ken, Rob, any chance you could take a quick look at this? I think this > is the last change I need in Mesa for GLES2 conformance. > I was kinda wondering if there was any driver where we *didn't* want this? I had meant to try that deqp w/ freedreno but got distracted.. I'll try to give this a look this evening.. BR, -R > (https://gerrit.khronos.org/#/c/2745/ and > https://gitlab.khronos.org/Tracker/vk-gl-cts/issues/1278 being my CTS > issues) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 1/2] intel/fs: New methods dst_write_pattern and src_read_pattern at fs_inst
These new methods return for a instruction register source/destination the read/write byte pattern of the 32-byte GRF as an unsigned int. The returned pattern takes into account the exec_size of the instruction, the type bitsize, the register stride and a relative offset inside the register. The motivation of this functions if to know the read/written bytes of the instructions to improve the liveness analysis for partial read/writes. We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize parameter they have a different read pattern. v2: (Francisco Jerez) - Split original register_byte_use_pattern into one read and other write. - Check for send like instructions using this->mlen != 0 - Pass functions src number and offset. - Use periodic_mask function with code written by Francisco Jerez to simplify pattern generation. - Avoid breaking silently if source straddles multiple GRFs. v3: (Francisco Jerez) - A SEND could be this->mlen != 0 or this->is_send_from_grf - We only assume that a periodic mask with offset could be applied to reg_offset == 0. - We can assure that for MOVs operations for any offset (Chema) Cc: Francisco Jerez --- src/intel/compiler/brw_fs.cpp | 119 + src/intel/compiler/brw_ir_fs.h | 2 + 2 files changed, 121 insertions(+) diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp index 7ddbd285fe2..4fa0f154c44 100644 --- a/src/intel/compiler/brw_fs.cpp +++ b/src/intel/compiler/brw_fs.cpp @@ -39,6 +39,7 @@ #include "compiler/glsl_types.h" #include "compiler/nir/nir_builder.h" #include "program/prog_parameter.h" +#include using namespace brw; @@ -687,6 +688,124 @@ fs_inst::is_partial_write() const this->dst.offset % REG_SIZE != 0); } +/** + * Returns a periodic mask that is repeated "count" times with a "step" + * size and consecutive "bits" finally shifted "offset" bits to the left. + * + * This helper is used to calculate the representations of byte read/write + * register patterns + * + * Example: periodic_mask(8, 4, 2, 0) would return 0x + * periodic_mask(8, 4, 2, 2) would return 0x + * periodic_masc(8, 2, 2, 16) would return 0x + */ +static inline uint32_t +periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset) +{ + uint32_t m = (count ? (1 << bits) - 1 : 0); + const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT); + + for (unsigned shift = step; shift < max; shift *= 2) + m |= m << shift; + + return m << offset; +} + +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been written by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and the + * stride of the destination register. + * + * The objective of this function is to identify which parts of the register + * are defined for operations that don't write a full register. So we + * we can identify in live range variable analysis if a partial write has + * completelly defined the data used by a partial read. + */ +unsigned +fs_inst::dst_write_pattern(unsigned reg_offset) const +{ + assert(this->dst.file == VGRF); + /* We don't know what is written so we return the worst case */ + if (this->predicate && this->opcode != BRW_OPCODE_SEL) + return 0u; + /* We assume that send destinations are completelly defined */ + if (this->is_send_from_grf() || this->mlen != 0) { + return ~0u; + } + + /* The byte pattern is calculated using a periodic mask for reg_offset == 0 +* because the internal offset will match how the register is written. +* +* We can for any reg_offset on MOV operations. We could add in the future +* other opcodes, but we didn't include them until we have evidences of +* them being used in partial write situations that ensure that the pattern +* is repeated of any reg_offset. +*/ + if (reg_offset == 0 || this->opcode == BRW_OPCODE_MOV) { + return periodic_mask(this->exec_size, + this->dst.stride * type_sz(this->dst.type), + type_sz(this->dst.type), + this->dst.offset % REG_SIZE); + } + /* This shouldn't be reached by in liveness range calcluation but if +* function is other context we know that we write a complete register. +*/ + if (!this->is_partial_write()) + return ~0u; + + /* By default we don't know what is written */ + return 0u; +} + +/** + * Returns a 32-bit uint whose bits represent if the associated register byte + * has been read by the instruction. The returned pattern takes into + * account the exec_size of the instruction, the type bitsize and stride of + * a source register and a register offset. + * + * The objective of this function is to identify
Re: [Mesa-dev] [PATCH 1/2] intel/fs: New method for register_byte_use_pattern for fs_inst
El 20/07/18 a las 22:10, Francisco Jerez escribió: > Chema Casanova writes: > >> El 20/07/18 a las 00:34, Francisco Jerez escribió: >>> Chema Casanova writes: >>> El 14/07/18 a las 00:14, Francisco Jerez escribió: > Jose Maria Casanova Crespo writes: > >> For a register source/destination of an instruction the function returns >> the read/write byte pattern of a 32-byte registers as a unsigned int. >> >> The returned pattern takes into account the exec_size of the instruction, >> the type bitsize, the stride and if the register is source or >> destination. >> >> The objective of the functions if to help to know the read/written bytes >> of the instructions to improve the liveness analysis for partial >> read/writes. >> >> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize >> parameter they have a different read pattern. >> --- >> src/intel/compiler/brw_fs.cpp | 183 + >> src/intel/compiler/brw_ir_fs.h | 1 + >> 2 files changed, 184 insertions(+) >> >> diff --git a/src/intel/compiler/brw_fs.cpp >> b/src/intel/compiler/brw_fs.cpp >> index 2b8363ca362..f3045c4ff6c 100644 >> --- a/src/intel/compiler/brw_fs.cpp >> +++ b/src/intel/compiler/brw_fs.cpp >> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >> this->dst.offset % REG_SIZE != 0); >> } >> >> +/** >> + * Returns a 32-bit uint whose bits represent if the associated >> register byte >> + * has been read/written by the instruction. The returned pattern takes >> into >> + * account the exec_size of the instruction, the type bitsize and the >> register >> + * stride and the register is source or destination for the instruction. >> + * >> + * The objective of this function is to identify which parts of the >> register >> + * are read or written for operations that don't read/write a full >> register. >> + * So we can identify in live range variable analysis if a partial >> write has >> + * completelly defined the part of the register used by a partial read. >> So we >> + * avoid extending the liveness range because all data read was already >> + * defined although the wasn't completely written. >> + */ >> +unsigned >> +fs_inst::register_byte_use_pattern(const fs_reg , boolean is_dst) >> const >> +{ >> + if (is_dst) { > Please split into two functions (like fs_inst::src_read and > ::src_written) since that would make the call-sites of this method more > self-documenting than a boolean parameter. You should be able to share > code by refactoring the common logic into a separate function (see below > for some suggestions on how that could be achieved). Sure, it would improve readability and simplifies the logic, I've chosen dst_write_pattern and src_read_pattern. > >> + /* We don't know what is written so we return the worts case */ > > "worst" Fixed. >> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >> + return 0; >> + /* We assume that send destinations are completely written */ >> + if (this->is_send_from_grf()) >> + return ~0u; > > Some send-like instructions won't be caught by this condition, you > should check for this->mlen != 0 in addition. Would it be enough to check for (this->mlen > 0) and forget about is_send_from_grf? I am using this approach in v2 I am sending. >>> >>> I don't think the mlen > 0 condition would catch all cases either... >>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >>> conditions. Sucks... >> >> That is true, so now we have the: >> (this->is_send_from_grf() || this->mlen != 0) >> >> + } else { >> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >> aligned >> + * so the read pattern depends on the bitsize stored at src[4] >> + */ >> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && >> + this->src[1].nr == r.nr) { > I feel uncomfortable about attempting to guess the source the caller is > referring to by comparing the registers for equality. E.g. you could > potentially end up with two sources that compare equal but have > different semantics (e.g. as a result of CSE) which might cause it to > get the wrong answer. It would probably be better to pass a source > index and a byte offset as argument instead of an fs_reg. I've didn't thought about CSE, I'm now receiving the number of source and the reg_offset. I'm using reg_offset instead of byte offsets as it simplifies the logic. Now we are using always
Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function
Karol Herbst writes: > On Mon, Jul 23, 2018 at 7:02 PM, Eric Anholt wrote: >> Karol Herbst writes: >> >>> this allows to replace some nir_load_system_value calls with the specific >>> system value constructor >>> >>> Reviewed-by: Jason Ekstrand >>> Signed-off-by: Karol Herbst >>> --- >>> src/compiler/nir/nir_builder_opcodes_h.py | 21 +++-- >>> 1 file changed, 19 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/compiler/nir/nir_builder_opcodes_h.py >>> b/src/compiler/nir/nir_builder_opcodes_h.py >>> index 72cf5b4549d..0440875dade 100644 >>> --- a/src/compiler/nir/nir_builder_opcodes_h.py >>> +++ b/src/compiler/nir/nir_builder_opcodes_h.py >>> @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, >>> nir_intrinsic_op op, int index) >>> return >dest.ssa; >>> } >>> >>> +<% >>> +def sysval_decl_list(opcode): >>> + res = '' >>> + if opcode.indices: >>> + res += ', unsigned ' + opcode.indices[0].lower() >>> + return res >>> + >>> +def sysval_arg_list(opcode): >>> + args = [] >>> + if opcode.indices: >>> + args.append(opcode.indices[0].lower()) >>> + else: >>> + args.append('0') >>> + return ', '.join(args) >>> +%> >> >> I was confused why only indices[0] was used, but it looks like system >> values can only have one index. Maybe assert(len(opcode.indices) <= 1)? >> Other than that, these are: >> >> Reviewed-by: Eric Anholt > > maybe a static assert that verifies this instead? We could put it > inside nir_intrinsics_c.py and add a note that nir_load_system_value > needs to be adjusted in case we want more. Either way is fine with me. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] virgl: remove unused stride-arguments
On 20 July 2018 at 13:39, Gurchetan Singh wrote: > Reviewed-by: Gurchetan Singh > On Wed, Jul 18, 2018 at 4:06 AM Erik Faye-Lund > wrote: >> >> The IOCTLs doesn't pass this along, so computing them in the first >> place is kinda pointless. >> >> Signed-off-by: Erik Faye-Lund >> --- >> >> This is just a cleanup I noticed based on some discussion with Gert. >> >> A question is, what code here expects this stride to be respected? The >> call-sites in virgl_*_transfer_map and virgl_*_transfer_unmap kinda >> looks like they do... They'll get a bit of a surprise here, no? >> >> Anyway, this is already broken, so I think this should be OK. But >> perhaps this patch shows some code-paths that need some love? I reverted this as it didn't fixup vtest, and it introduced build time warnings. I agree this should get some more investigation but make sure vtest and drm backends don't regress. Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)
On 24 July 2018 at 03:14, Gert Wollny wrote: > Am Montag, den 23.07.2018, 21:17 +1000 schrieb Dave Airlie: >> On 23 July 2018 at 17:28, Gert Wollny >> wrote: >> > Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie: >> > > On 23 July 2018 at 16:46, Gert Wollny >> > > wrote: >> > > > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie: >> > > > > >> > > > > > > + uint32_t max_shader_buffer = shader == >> > > > > > > PIPE_SHADER_FRAGMENT ? >> > > > > > > + rs->caps.caps.v2.max_shader_buffer_frag_compute : >> > > > > > >> > > > > > Shouldn't max_shader_buffer_frag_compute also be used for >> > > > > > PIPE_SHADER_COMPUTE? >> > > > > >> > > > > virgl doesn't support compute shaders yet. Adding support for >> > > > > them is >> > > > > in the future patches. >> > > > >> > > > I somehow suspected this, but then one should probably >> > > > explicitely >> > > > return zero instead of returning >> > > > max_shader_buffer_other_stages. >> > > >> > > You can never reach this point of code with shader set to >> > > PIPE_SHADER_COMPUTE. >> > > >> > > The outer switch statement defaults to return 0 for compute >> > > shaders. >> > >> > I see, that was not visible in the patch, sorry for the noise. >> >> Is that an r-b in disguise? :-) > > Well, I had to take a closer look for this, and there is one thing that > I'm wondering about: Are the last five macros in virgl_protocol.h for > the shader buffers actually used? I don't see it, but if they supposed > to be used somewhere then the offsets for the last three entries seem > to be off by one, because currently > VIRGL_SET_SHADER_BUFFER_OFFSET(0) == VIRGL_SET_SHADER_BUFFER_START_SLOT > == 2 Those macros are meant to be used on the decode side in virglrenderer, and I just realised they weren't and are wrong, just sent a patch to fix them up there. I'll fixup these for consistency Dave. > > When this is clarified the patch is > Reviewed-By: Gert Wollny > > Best, > Gert >> >> Dave. >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 --- Comment #1 from Mauro Rossi --- Created attachment 140797 --> https://bugs.freedesktop.org/attachment.cgi?id=140797=edit logcat of slingshot extreeme vulkan test -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107351] Android 8.1: radv segfault with 3Dmark vulkan benchmarks
https://bugs.freedesktop.org/show_bug.cgi?id=107351 Bug ID: 107351 Summary: Android 8.1: radv segfault with 3Dmark vulkan benchmarks Product: Mesa Version: git Hardware: x86-64 (AMD64) OS: other Status: NEW Severity: major Priority: medium Component: Drivers/Vulkan/radeon Assignee: mesa-dev@lists.freedesktop.org Reporter: issor.or...@gmail.com QA Contact: mesa-dev@lists.freedesktop.org Created attachment 140796 --> https://bugs.freedesktop.org/attachment.cgi?id=140796=edit logcat of api_overhead test Hi, while performing some test on vulkan radv HAL on Android 8.1 (oreo-x86) build a segfault was noticed and traced with 3Dmark vulkan benchmark. The build is experimental and based on hwcomposer.drm + gralloc.gbm and tested on HD7790 gpu (Bonaire) The same 3Dmark vulkan benchmarks work without segfault on Intel anv running on Skylake GT2, so it is better to investigate why radv crashes. Inspection of crash with addr2line -Cfe returns last line of code: src/amd/vulkan/radv_image.c: 1289 Mauro Rossi android-x86 team - beginning of crash 07-22 21:31:15.938 4949 4995 F libc: Fatal signal 11 (SIGSEGV), code 1, fault addr 0x1c in tid 4995 (Thread-9), pid 4949 (cation:workload) ... 07-22 21:31:15.956 5007 5007 F DEBUG : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** 07-22 21:31:15.956 5007 5007 F DEBUG : Build fingerprint: 'Android-x86/android_x86_64/x86_64:8.1.0/OPM6.171019.030.E1/maur07141051:userdebug/test-keys' 07-22 21:31:15.956 5007 5007 F DEBUG : Revision: '0' 07-22 21:31:15.956 5007 5007 F DEBUG : ABI: 'x86' 07-22 21:31:15.956 5007 5007 F DEBUG : pid: 4949, tid: 4995, name: Thread-9 >>> com.futuremark.dmandroid.application:workload <<< 07-22 21:31:15.956 5007 5007 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x1c 07-22 21:31:15.956 5007 5007 F DEBUG : Cause: null pointer dereference 07-22 21:31:15.956 5007 5007 F DEBUG : eax ebx c65b690c ecx c2a6d0ac edx 0097 07-22 21:31:15.956 5007 5007 F DEBUG : esi c2a6d0ac edi c1fbbb00 07-22 21:31:15.956 5007 5007 F DEBUG : xcs 0023 xds 002b xes 002b xfs 006b xss 002b 07-22 21:31:15.956 5007 5007 F DEBUG : eip c64e0d47 ebp c2b7eb38 esp c2b7eaf0 flags 00010286 07-22 21:31:16.071 5007 5007 F DEBUG : 07-22 21:31:16.071 5007 5007 F DEBUG : backtrace: 07-22 21:31:16.071 5007 5007 F DEBUG : #00 pc 000f1d47 /system/vendor/lib/hw/vulkan.radv.so (radv_image_create+199) 07-22 21:31:16.071 5007 5007 F DEBUG : #01 pc 000f2f26 /system/vendor/lib/hw/vulkan.radv.so (radv_CreateImage+166) 07-22 21:31:16.071 5007 5007 F DEBUG : #02 pc b59a /android/system/lib/libvulkan.so -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()
Quoting Nanley Chery (2018-07-23 18:17:15) > Satisfy the BLT engine's row pitch limitation on the destination > miptree. The destination miptree is untiled, so its row_pitch will be > slightly less than or equal to the source miptree's row_pitch. Use the > source miptree's row_pitch in can_blit_slice instead of its blt_pitch. > > Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3 > ("i965/miptree: Use the correct BLT pitch") > > Cc: > Reported-by: Dylan Baker > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index a18d5ac3624..d8e823e4826 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -3544,8 +3544,13 @@ static bool > can_blit_slice(struct intel_mipmap_tree *mt, > unsigned int level, unsigned int slice) > { > - /* See intel_miptree_blit() for details on the 32k pitch limit. */ > - if (intel_miptree_blt_pitch(mt) >= 32768) > + /* The blit destination is untiled, so its row_pitch will be slightly less > +* than or equal to the source's row_pitch. The BLT engine only supports > +* linear row pitches up to but not including 32k. > +* > +* See intel_miptree_blit() for details on the 32k pitch limit. > +*/ > + if (mt->surf.row_pitch >= 32768) >return false; I see the difference, but do we copy the whole slice or a region of it? -Chris ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107305] glsl/opt_copy_propagation_elements.cpp:72:9: error: delegating constructors are permitted only in C++11
https://bugs.freedesktop.org/show_bug.cgi?id=107305 Caio Marcelo de Oliveira Filho changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #3 from Caio Marcelo de Oliveira Filho --- Fixed by commit 52d831ff83036773978aabf52dde3bb73bb211c7 Author: Caio Marcelo de Oliveira Filho Date: Fri Jul 20 13:21:33 2018 -0700 glsl: remove delegating constructors to allow build with C++98 Delegating constructors is a C++11 feature, so this was breaking when compiling with C++98. Change the copy_propagation_state() calls that used the convenience constructor to use a static member function instead. Since copy_propagation_state is expected to be heap allocated, this change is a good fit. Tested-by: Vinson Lee Reviewed-by: Matt Turner Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305 -- You are receiving this mail because: You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
On 23/07/18 18:46, Jason Ekstrand wrote: The original pass only looked for load_uniform intrinsics but there are a number of other places that could end up loading a push constant. One obvious omission was images which always implicitly use a push constant. Legacy VS clip planes also get pushed into the shader. Cc: mesa-sta...@lists.freedesktop.org Cc: Kenneth Graunke Tested-by: Lionel Landwerlin --- src/intel/compiler/brw_nir.h | 1 + .../compiler/brw_nir_analyze_ubo_ranges.c | 41 +-- src/intel/vulkan/anv_pipeline.c | 2 +- src/mesa/drivers/dri/i965/brw_gs.c| 2 +- src/mesa/drivers/dri/i965/brw_tcs.c | 2 +- src/mesa/drivers/dri/i965/brw_tes.c | 2 +- src/mesa/drivers/dri/i965/brw_vs.c| 2 +- src/mesa/drivers/dri/i965/brw_wm.c| 2 +- 8 files changed, 45 insertions(+), 9 deletions(-) diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h index 19442b47eae..7d82edafe46 100644 --- a/src/intel/compiler/brw_nir.h +++ b/src/intel/compiler/brw_nir.h @@ -148,6 +148,7 @@ void brw_nir_lower_patch_vertices_in_to_uniform(nir_shader *nir); void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, nir_shader *nir, +const struct brw_vs_prog_key *vs_key, struct brw_ubo_range out_ranges[4]); bool brw_nir_opt_peephole_ffma(nir_shader *shader); diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c index cd5137da06e..cfa531675fc 100644 --- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c +++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c @@ -124,12 +124,29 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block) continue; nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); - if (intrin->intrinsic == nir_intrinsic_load_uniform) + switch (intrin->intrinsic) { + case nir_intrinsic_load_uniform: + case nir_intrinsic_image_deref_load: + case nir_intrinsic_image_deref_store: + case nir_intrinsic_image_deref_atomic_add: + case nir_intrinsic_image_deref_atomic_min: + case nir_intrinsic_image_deref_atomic_max: + case nir_intrinsic_image_deref_atomic_and: + case nir_intrinsic_image_deref_atomic_or: + case nir_intrinsic_image_deref_atomic_xor: + case nir_intrinsic_image_deref_atomic_exchange: + case nir_intrinsic_image_deref_atomic_comp_swap: + case nir_intrinsic_image_deref_size: state->uses_regular_uniforms = true; - - if (intrin->intrinsic != nir_intrinsic_load_ubo) continue; + case nir_intrinsic_load_ubo: + break; /* Fall through to the analysis below */ + + default: + continue; /* Not a uniform or UBO intrinsic */ + } + nir_const_value *block_const = nir_src_as_const_value(intrin->src[0]); nir_const_value *offset_const = nir_src_as_const_value(intrin->src[1]); @@ -179,6 +196,7 @@ print_ubo_entry(FILE *file, void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, nir_shader *nir, + const struct brw_vs_prog_key *vs_key, struct brw_ubo_range out_ranges[4]) { const struct gen_device_info *devinfo = compiler->devinfo; @@ -197,6 +215,23 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, _mesa_hash_table_create(mem_ctx, NULL, _mesa_key_pointer_equal), }; + switch (nir->info.stage) { + case MESA_SHADER_VERTEX: + if (vs_key && vs_key->nr_userclip_plane_consts > 0) + state.uses_regular_uniforms = true; + break; + + case MESA_SHADER_COMPUTE: + /* Compute shaders use push constants to get the subgroup ID so it's + * best to just assume some system values are pushed. + */ + state.uses_regular_uniforms = true; + break; + + default: + break; + } + /* Walk the IR, recording how many times each UBO block/offset is used. */ nir_foreach_function(function, nir) { if (function->impl) { diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c index 211cee788b8..1e6bd12b87d 100644 --- a/src/intel/vulkan/anv_pipeline.c +++ b/src/intel/vulkan/anv_pipeline.c @@ -472,7 +472,7 @@ anv_pipeline_compile(struct anv_pipeline *pipeline, anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data, map); if (stage != MESA_SHADER_COMPUTE) - brw_nir_analyze_ubo_ranges(compiler, nir, prog_data->ubo_ranges); + brw_nir_analyze_ubo_ranges(compiler, nir, NULL, prog_data->ubo_ranges); assert(nir->num_uniforms == prog_data->nr_params * 4); diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c index 9acb0337e20..7263f6351e9 100644
Re: [Mesa-dev] [PATCH] r600: enable tess_input_info for TES
Looks reasonable, Reviewed-By: Gert Wollny Am Donnerstag, den 19.07.2018, 14:42 +1000 schrieb Dave Airlie: > From: Dave Airlie > > There might be a nicer way to do this, but this is at least correct. > > This fixes: > KHR-GL44.tessellation_shader.single.max_patch_vertices > KHR- > GL44.tessellation_shader.tessellation_control_to_tessellation_evaluat > ion.gl_PatchVerticesIn > --- > src/gallium/drivers/r600/r600_shader.c | 20 ++-- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/src/gallium/drivers/r600/r600_shader.c > b/src/gallium/drivers/r600/r600_shader.c > index 4b91da3..6effa31be7b 100644 > --- a/src/gallium/drivers/r600/r600_shader.c > +++ b/src/gallium/drivers/r600/r600_shader.c > @@ -1673,19 +1673,11 @@ static void tgsi_src(struct r600_shader_ctx > *ctx, > } else if (ctx- > >info.system_value_semantic_name[tgsi_src->Register.Index] == > TGSI_SEMANTIC_TESSOUTER) { > r600_src->sel = 2; > } else if (ctx- > >info.system_value_semantic_name[tgsi_src->Register.Index] == > TGSI_SEMANTIC_VERTICESIN) { > - if (ctx->type == PIPE_SHADER_TESS_CTRL) { > - r600_src->sel = ctx- > >tess_input_info; > - r600_src->swizzle[0] = 2; > - r600_src->swizzle[1] = 2; > - r600_src->swizzle[2] = 2; > - r600_src->swizzle[3] = 2; > - } else { > - r600_src->sel = ctx- > >tess_input_info; > - r600_src->swizzle[0] = 3; > - r600_src->swizzle[1] = 3; > - r600_src->swizzle[2] = 3; > - r600_src->swizzle[3] = 3; > - } > + r600_src->sel = ctx->tess_input_info; > + r600_src->swizzle[0] = 2; > + r600_src->swizzle[1] = 2; > + r600_src->swizzle[2] = 2; > + r600_src->swizzle[3] = 2; > } else if (ctx->type == PIPE_SHADER_TESS_CTRL && > ctx->info.system_value_semantic_name[tgsi_src->Register.Index] == > TGSI_SEMANTIC_PRIMID) { > r600_src->sel = 0; > r600_src->swizzle[0] = 0; > @@ -3559,7 +3551,7 @@ static int r600_shader_from_tgsi(struct > r600_context *rctx, > ctx.tess_input_info = ++regno; > ctx.tess_output_info = ++regno; > } else if (ctx.type == PIPE_SHADER_TESS_EVAL) { > - ctx.tess_input_info = 0; > + ctx.tess_input_info = ++regno; > ctx.tess_output_info = ++regno; > } else if (ctx.type == PIPE_SHADER_GEOMETRY) { > ctx.gs_export_gpr_tregs[0] = ++regno; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] intel/compiler: Account for built-in uniforms in analyze_ubo_ranges
The original pass only looked for load_uniform intrinsics but there are a number of other places that could end up loading a push constant. One obvious omission was images which always implicitly use a push constant. Legacy VS clip planes also get pushed into the shader. Cc: mesa-sta...@lists.freedesktop.org Cc: Kenneth Graunke --- src/intel/compiler/brw_nir.h | 1 + .../compiler/brw_nir_analyze_ubo_ranges.c | 41 +-- src/intel/vulkan/anv_pipeline.c | 2 +- src/mesa/drivers/dri/i965/brw_gs.c| 2 +- src/mesa/drivers/dri/i965/brw_tcs.c | 2 +- src/mesa/drivers/dri/i965/brw_tes.c | 2 +- src/mesa/drivers/dri/i965/brw_vs.c| 2 +- src/mesa/drivers/dri/i965/brw_wm.c| 2 +- 8 files changed, 45 insertions(+), 9 deletions(-) diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h index 19442b47eae..7d82edafe46 100644 --- a/src/intel/compiler/brw_nir.h +++ b/src/intel/compiler/brw_nir.h @@ -148,6 +148,7 @@ void brw_nir_lower_patch_vertices_in_to_uniform(nir_shader *nir); void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, nir_shader *nir, +const struct brw_vs_prog_key *vs_key, struct brw_ubo_range out_ranges[4]); bool brw_nir_opt_peephole_ffma(nir_shader *shader); diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c index cd5137da06e..cfa531675fc 100644 --- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c +++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c @@ -124,12 +124,29 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block) continue; nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); - if (intrin->intrinsic == nir_intrinsic_load_uniform) + switch (intrin->intrinsic) { + case nir_intrinsic_load_uniform: + case nir_intrinsic_image_deref_load: + case nir_intrinsic_image_deref_store: + case nir_intrinsic_image_deref_atomic_add: + case nir_intrinsic_image_deref_atomic_min: + case nir_intrinsic_image_deref_atomic_max: + case nir_intrinsic_image_deref_atomic_and: + case nir_intrinsic_image_deref_atomic_or: + case nir_intrinsic_image_deref_atomic_xor: + case nir_intrinsic_image_deref_atomic_exchange: + case nir_intrinsic_image_deref_atomic_comp_swap: + case nir_intrinsic_image_deref_size: state->uses_regular_uniforms = true; - - if (intrin->intrinsic != nir_intrinsic_load_ubo) continue; + case nir_intrinsic_load_ubo: + break; /* Fall through to the analysis below */ + + default: + continue; /* Not a uniform or UBO intrinsic */ + } + nir_const_value *block_const = nir_src_as_const_value(intrin->src[0]); nir_const_value *offset_const = nir_src_as_const_value(intrin->src[1]); @@ -179,6 +196,7 @@ print_ubo_entry(FILE *file, void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, nir_shader *nir, + const struct brw_vs_prog_key *vs_key, struct brw_ubo_range out_ranges[4]) { const struct gen_device_info *devinfo = compiler->devinfo; @@ -197,6 +215,23 @@ brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler, _mesa_hash_table_create(mem_ctx, NULL, _mesa_key_pointer_equal), }; + switch (nir->info.stage) { + case MESA_SHADER_VERTEX: + if (vs_key && vs_key->nr_userclip_plane_consts > 0) + state.uses_regular_uniforms = true; + break; + + case MESA_SHADER_COMPUTE: + /* Compute shaders use push constants to get the subgroup ID so it's + * best to just assume some system values are pushed. + */ + state.uses_regular_uniforms = true; + break; + + default: + break; + } + /* Walk the IR, recording how many times each UBO block/offset is used. */ nir_foreach_function(function, nir) { if (function->impl) { diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c index 211cee788b8..1e6bd12b87d 100644 --- a/src/intel/vulkan/anv_pipeline.c +++ b/src/intel/vulkan/anv_pipeline.c @@ -472,7 +472,7 @@ anv_pipeline_compile(struct anv_pipeline *pipeline, anv_nir_apply_pipeline_layout(pipeline, layout, nir, prog_data, map); if (stage != MESA_SHADER_COMPUTE) - brw_nir_analyze_ubo_ranges(compiler, nir, prog_data->ubo_ranges); + brw_nir_analyze_ubo_ranges(compiler, nir, NULL, prog_data->ubo_ranges); assert(nir->num_uniforms == prog_data->nr_params * 4); diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c index 9acb0337e20..7263f6351e9 100644 --- a/src/mesa/drivers/dri/i965/brw_gs.c +++ b/src/mesa/drivers/dri/i965/brw_gs.c @@ -94,7 +94,7 @@ brw_codegen_gs_prog(struct
Re: [Mesa-dev] [PATCH] nir: Add a couple trivial abs optimizations
Funny thing about this... this is how i915 implements ABS. :) Reviewed-by: Ian Romanick On 07/23/2018 12:02 AM, Jason Ekstrand wrote: > Spotted in a shader in Batman: Arkham City. > --- > src/compiler/nir/nir_opt_algebraic.py | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > index f49b43f3213..ba277fdfd0e 100644 > --- a/src/compiler/nir/nir_opt_algebraic.py > +++ b/src/compiler/nir/nir_opt_algebraic.py > @@ -229,6 +229,8 @@ optimizations = [ > (('imax', a, a), a), > (('umin', a, a), a), > (('umax', a, a), a), > + (('fmax', a, ('fneg', a)), ('fabs', a)), > + (('imax', a, ('ineg', a)), ('iabs', a)), > (('fmin', a, ('fneg', a)), ('fneg', ('fabs', a))), > (('imin', a, ('ineg', a)), ('ineg', ('iabs', a))), > (('fmin', a, ('fneg', ('fabs', a))), ('fneg', ('fabs', a))), > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: remove delegating constructors to allow build with C++98
Reviewed-by: Matt Turner and pushed. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add flipping of gl_PointCoord.y in nir_lower_wpos_ytransform.
Eric Anholt writes: > This is controlled by a new nir_shader_compiler_options flag, and fixes > dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D. > --- Ken, Rob, any chance you could take a quick look at this? I think this is the last change I need in Mesa for GLES2 conformance. (https://gerrit.khronos.org/#/c/2745/ and https://gitlab.khronos.org/Tracker/vk-gl-cts/issues/1278 being my CTS issues) signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/miptree: Fix can_blit_slice()
Satisfy the BLT engine's row pitch limitation on the destination miptree. The destination miptree is untiled, so its row_pitch will be slightly less than or equal to the source miptree's row_pitch. Use the source miptree's row_pitch in can_blit_slice instead of its blt_pitch. Fixes 0288fe8d0417730bdd5b3477130dd1dc32bdbcd3 ("i965/miptree: Use the correct BLT pitch") Cc: Reported-by: Dylan Baker --- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index a18d5ac3624..d8e823e4826 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -3544,8 +3544,13 @@ static bool can_blit_slice(struct intel_mipmap_tree *mt, unsigned int level, unsigned int slice) { - /* See intel_miptree_blit() for details on the 32k pitch limit. */ - if (intel_miptree_blt_pitch(mt) >= 32768) + /* The blit destination is untiled, so its row_pitch will be slightly less +* than or equal to the source's row_pitch. The BLT engine only supports +* linear row pitches up to but not including 32k. +* +* See intel_miptree_blit() for details on the 32k pitch limit. +*/ + if (mt->surf.row_pitch >= 32768) return false; return true; -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function
On Mon, Jul 23, 2018 at 7:02 PM, Eric Anholt wrote: > Karol Herbst writes: > >> this allows to replace some nir_load_system_value calls with the specific >> system value constructor >> >> Reviewed-by: Jason Ekstrand >> Signed-off-by: Karol Herbst >> --- >> src/compiler/nir/nir_builder_opcodes_h.py | 21 +++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) >> >> diff --git a/src/compiler/nir/nir_builder_opcodes_h.py >> b/src/compiler/nir/nir_builder_opcodes_h.py >> index 72cf5b4549d..0440875dade 100644 >> --- a/src/compiler/nir/nir_builder_opcodes_h.py >> +++ b/src/compiler/nir/nir_builder_opcodes_h.py >> @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, >> nir_intrinsic_op op, int index) >> return >dest.ssa; >> } >> >> +<% >> +def sysval_decl_list(opcode): >> + res = '' >> + if opcode.indices: >> + res += ', unsigned ' + opcode.indices[0].lower() >> + return res >> + >> +def sysval_arg_list(opcode): >> + args = [] >> + if opcode.indices: >> + args.append(opcode.indices[0].lower()) >> + else: >> + args.append('0') >> + return ', '.join(args) >> +%> > > I was confused why only indices[0] was used, but it looks like system > values can only have one index. Maybe assert(len(opcode.indices) <= 1)? > Other than that, these are: > > Reviewed-by: Eric Anholt maybe a static assert that verifies this instead? We could put it inside nir_intrinsics_c.py and add a note that nir_load_system_value needs to be adjusted in case we want more. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)
Am Montag, den 23.07.2018, 21:17 +1000 schrieb Dave Airlie: > On 23 July 2018 at 17:28, Gert Wollny > wrote: > > Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie: > > > On 23 July 2018 at 16:46, Gert Wollny > > > wrote: > > > > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie: > > > > > > > > > > > > + uint32_t max_shader_buffer = shader == > > > > > > > PIPE_SHADER_FRAGMENT ? > > > > > > > + rs->caps.caps.v2.max_shader_buffer_frag_compute : > > > > > > > > > > > > Shouldn't max_shader_buffer_frag_compute also be used for > > > > > > PIPE_SHADER_COMPUTE? > > > > > > > > > > virgl doesn't support compute shaders yet. Adding support for > > > > > them is > > > > > in the future patches. > > > > > > > > I somehow suspected this, but then one should probably > > > > explicitely > > > > return zero instead of returning > > > > max_shader_buffer_other_stages. > > > > > > You can never reach this point of code with shader set to > > > PIPE_SHADER_COMPUTE. > > > > > > The outer switch statement defaults to return 0 for compute > > > shaders. > > > > I see, that was not visible in the patch, sorry for the noise. > > Is that an r-b in disguise? :-) Well, I had to take a closer look for this, and there is one thing that I'm wondering about: Are the last five macros in virgl_protocol.h for the shader buffers actually used? I don't see it, but if they supposed to be used somewhere then the offsets for the last three entries seem to be off by one, because currently VIRGL_SET_SHADER_BUFFER_OFFSET(0) == VIRGL_SET_SHADER_BUFFER_START_SLOT == 2 When this is clarified the patch is Reviewed-By: Gert Wollny Best, Gert > > Dave. > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] i965: implement MESA_framebuffer_flip_y [v3]
Instead of using _mesa_is_winsys_fbo or _mesa_is_user_fbo to infer if an fbo is flipped use the InvertedY flag. v2: * additional window-system framebuffer checks [for jason] v3: * s/inverted_y/flip_y/g [for chadv] * s/InvertedY/FlipY/g [for chadv] --- src/mesa/drivers/dri/i965/brw_blorp.c | 2 +- src/mesa/drivers/dri/i965/brw_meta_util.c | 4 +- src/mesa/drivers/dri/i965/brw_sf.c| 6 +-- src/mesa/drivers/dri/i965/genX_state_upload.c | 50 +-- src/mesa/drivers/dri/i965/intel_extensions.c | 1 + src/mesa/drivers/dri/i965/intel_fbo.c | 11 ++-- .../drivers/dri/i965/intel_pixel_bitmap.c | 8 +-- src/mesa/drivers/dri/i965/intel_pixel_copy.c | 4 +- src/mesa/drivers/dri/i965/intel_pixel_draw.c | 2 +- 9 files changed, 43 insertions(+), 45 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c index b66ee18ba4..7476cee43a 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp.c +++ b/src/mesa/drivers/dri/i965/brw_blorp.c @@ -740,7 +740,7 @@ brw_blorp_copytexsubimage(struct brw_context *brw, /* Account for the fact that in the system framebuffer, the origin is at * the lower left. */ - bool mirror_y = _mesa_is_winsys_fbo(ctx->ReadBuffer); + bool mirror_y = ctx->ReadBuffer->FlipY; if (mirror_y) apply_y_flip(, , src_rb->Height); diff --git a/src/mesa/drivers/dri/i965/brw_meta_util.c b/src/mesa/drivers/dri/i965/brw_meta_util.c index d292f5a8e2..908b098976 100644 --- a/src/mesa/drivers/dri/i965/brw_meta_util.c +++ b/src/mesa/drivers/dri/i965/brw_meta_util.c @@ -250,13 +250,13 @@ brw_meta_mirror_clip_and_scissor(const struct gl_context *ctx, /* Account for the fact that in the system framebuffer, the origin is at * the lower left. */ - if (_mesa_is_winsys_fbo(read_fb)) { + if (read_fb->FlipY) { GLint tmp = read_fb->Height - *srcY0; *srcY0 = read_fb->Height - *srcY1; *srcY1 = tmp; *mirror_y = !*mirror_y; } - if (_mesa_is_winsys_fbo(draw_fb)) { + if (draw_fb->FlipY) { GLint tmp = draw_fb->Height - *dstY0; *dstY0 = draw_fb->Height - *dstY1; *dstY1 = tmp; diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c index 73bc663f29..f4073fa6cf 100644 --- a/src/mesa/drivers/dri/i965/brw_sf.c +++ b/src/mesa/drivers/dri/i965/brw_sf.c @@ -90,7 +90,7 @@ brw_upload_sf_prog(struct brw_context *brw) return; /* _NEW_BUFFERS */ - bool render_to_fbo = _mesa_is_user_fbo(ctx->DrawBuffer); + bool flip_y = ctx->DrawBuffer->FlipY; memset(, 0, sizeof(key)); @@ -137,7 +137,7 @@ brw_upload_sf_prog(struct brw_context *brw) * Window coordinates in a FBO are inverted, which means point * sprite origin must be inverted, too. */ - if ((ctx->Point.SpriteOrigin == GL_LOWER_LEFT) != render_to_fbo) + if ((ctx->Point.SpriteOrigin == GL_LOWER_LEFT) == flip_y) key.sprite_origin_lower_left = true; /* BRW_NEW_FS_PROG_DATA */ @@ -161,7 +161,7 @@ brw_upload_sf_prog(struct brw_context *brw) * face orientation, just as we invert the viewport in * sf_unit_create_from_key(). */ - key.frontface_ccw = brw->polygon_front_bit == render_to_fbo; + key.frontface_ccw = brw->polygon_front_bit != flip_y; } if (!brw_search_cache(>cache, BRW_CACHE_SF_PROG, , sizeof(key), diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c b/src/mesa/drivers/dri/i965/genX_state_upload.c index 9e0a17b9d9..cfafbb0c37 100644 --- a/src/mesa/drivers/dri/i965/genX_state_upload.c +++ b/src/mesa/drivers/dri/i965/genX_state_upload.c @@ -217,7 +217,7 @@ genX(upload_polygon_stipple)(struct brw_context *brw) * to a FBO (i.e. any named frame buffer object), we *don't* * need to invert - we already match the layout. */ - if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) { + if (ctx->DrawBuffer->FlipY) { for (unsigned i = 0; i < 32; i++) poly.PatternRow[i] = ctx->PolygonStipple[31 - i]; /* invert */ } else { @@ -257,7 +257,7 @@ genX(upload_polygon_stipple_offset)(struct brw_context *brw) * to a user-created FBO then our native pixel coordinate system * works just fine, and there's no window system to worry about. */ - if (_mesa_is_winsys_fbo(ctx->DrawBuffer)) { + if (ctx->DrawBuffer->FlipY) { poly.PolygonStippleYOffset = (32 - (_mesa_geometric_height(ctx->DrawBuffer) & 31)) & 31; } @@ -1468,7 +1468,7 @@ genX(upload_clip_state)(struct brw_context *brw) #endif #if GEN_GEN == 7 - clip.FrontWinding = brw->polygon_front_bit == _mesa_is_user_fbo(fb); + clip.FrontWinding = brw->polygon_front_bit != fb->FlipY; if (ctx->Polygon.CullFlag) { switch (ctx->Polygon.CullFaceMode) { @@ -1583,7 +1583,7 @@ genX(upload_sf)(struct brw_context *brw) #if GEN_GEN <= 7 /* _NEW_BUFFERS */ - bool render_to_fbo =
[Mesa-dev] [PATCH 1/2] mesa: MESA_framebuffer_flip_y extension [v4]
Adds an extension to glFramebufferParameteri that will specify if the framebuffer is vertically flipped. Historically system framebuffers are vertically flipped and user framebuffers are not. Checking to see the state was done by looking at the name field. This adds an explicit field. v2: * updated spec language [for chadv] * correctly specifying ES 3.1 [for chadv] * refactor access to rb->Name [for jason] * handle GetFramebufferParameteriv [for chadv] v3: * correct _mesa_GetMultisamplefv [for kusmabite] v4: * update spec language [for chadv] * s/GLboolean/bool/g [for chadv] * s/InvertedY/FlipY/g [for chadv] * s/inverted_y/flip_y/g [for chadv] * assert changes [for chadv] --- docs/specs/MESA_framebuffer_flip_y.txt | 81 ++ docs/specs/enums.txt | 3 + include/GLES2/gl2ext.h | 5 ++ src/mapi/glapi/registry/gl.xml | 6 ++ src/mesa/drivers/dri/i915/intel_fbo.c | 6 +- src/mesa/drivers/dri/i965/intel_fbo.c | 6 +- src/mesa/drivers/dri/nouveau/nouveau_fbo.c | 6 +- src/mesa/drivers/dri/radeon/radeon_fbo.c | 6 +- src/mesa/drivers/dri/radeon/radeon_span.c | 9 ++- src/mesa/drivers/dri/swrast/swrast.c | 6 +- src/mesa/drivers/osmesa/osmesa.c | 5 +- src/mesa/drivers/x11/xm_buffer.c | 3 +- src/mesa/drivers/x11/xmesaP.h | 3 +- src/mesa/main/accum.c | 17 +++-- src/mesa/main/dd.h | 3 +- src/mesa/main/extensions_table.h | 1 + src/mesa/main/fbobject.c | 16 + src/mesa/main/framebuffer.c| 1 + src/mesa/main/glheader.h | 3 + src/mesa/main/mtypes.h | 4 ++ src/mesa/main/readpix.c| 20 +++--- src/mesa/state_tracker/st_cb_fbo.c | 6 +- src/mesa/swrast/s_blit.c | 17 +++-- src/mesa/swrast/s_clear.c | 3 +- src/mesa/swrast/s_copypix.c| 11 +-- src/mesa/swrast/s_depth.c | 6 +- src/mesa/swrast/s_drawpix.c| 26 --- src/mesa/swrast/s_renderbuffer.c | 6 +- src/mesa/swrast/s_renderbuffer.h | 3 +- src/mesa/swrast/s_stencil.c| 3 +- 30 files changed, 235 insertions(+), 56 deletions(-) create mode 100644 docs/specs/MESA_framebuffer_flip_y.txt diff --git a/docs/specs/MESA_framebuffer_flip_y.txt b/docs/specs/MESA_framebuffer_flip_y.txt new file mode 100644 index 00..697ab4e75d --- /dev/null +++ b/docs/specs/MESA_framebuffer_flip_y.txt @@ -0,0 +1,81 @@ +Name + +MESA_framebuffer_flip_y + +Name Strings + +GL_MESA_framebuffer_flip_y + +Contact + +Fritz Koenig + +Contributors + +Fritz Koenig, Google +Kristian Høgsberg, Google +Chad Versace, Google + +Status + +Proposal + +Version + +Version 1, June 7, 2018 + +Number + +302 + +Dependencies + +OpenGL ES 3.1 is required, for FramebufferParameteri. + +Overview + +This extension defines a new framebuffer parameter, +GL_FRAMEBUFFER_FLIP_Y_MESA, that changes the behavior of the reads and +writes to the framebuffer attachment points. When GL_FRAMEBUFFER_FLIP_Y_MESA +is GL_TRUE, render commands and pixel transfer operations access the +backing store of each attachment point with an y-inverted coordinate +system. This y-inversion is relative to the coordinate system set when +GL_FRAMEBUFFER_FLIP_Y_MESA is GL_FALSE. + +Access through TexSubImage2D and similar calls will notice the effect of +the flip when they are not attached to framebuffer objects because +GL_FRAMEBUFFER_FLIP_Y_MESA is associated with the framebuffer object and +not the attachment points. + +IP Status + +None + +Issues + +None + +New Procedures and Functions + +None + +New Types + +None + +New Tokens + +Accepted by the argument of FramebufferParameteri and +GetFramebufferParameteriv: + +GL_FRAMEBUFFER_FLIP_Y_MESA 0x8BBB + +Errors + +An INVALID_OPERATION error is generated by GetFramebufferParameteriv if the +default framebuffer is bound to and is FRAMEBUFFER_FLIP_Y_MESA. + +Revision History + +Version 1, June, 2018 +Initial draft (Fritz Koenig) diff --git a/docs/specs/enums.txt b/docs/specs/enums.txt index bf3ca9c176..e1b95ec874 100644 --- a/docs/specs/enums.txt +++ b/docs/specs/enums.txt @@ -71,6 +71,9 @@ GL_MESA_tile_raster_order GL_TILE_RASTER_ORDER_INCREASING_X_MESA 0x8BB9 GL_TILE_RASTER_ORDER_INCREASING_Y_MESA 0x8BBA +GL_MESA_framebuffer_flip_y + GL_FRAMEBUFFER_FLIP_Y_MESA 0x8BBB + EGL_MESA_drm_image EGL_DRM_BUFFER_FORMAT_MESA 0x31D0 EGL_DRM_BUFFER_USE_MESA0x31D1 diff --git a/include/GLES2/gl2ext.h b/include/GLES2/gl2ext.h index a7d19a1fc8..0a93bfb865 100644 --- a/include/GLES2/gl2ext.h +++ b/include/GLES2/gl2ext.h @@ -2334,6
Re: [Mesa-dev] i915/swrast vertex array regression
Hi Ville, > I noticed a while back that xonotic had started to misrender the gun > models on i915. Yesterday I bisected it down to commit 64d2a2048054 > ("mesa: Make gl_vertex_array contain pointers to first order VAO > members."). Actually that commit broke things even worse (and the > game would even crash after a while), but a later commit (presumably > 98f35ad63c23 ("vbo: Correctly handle source arrays in vbo_split_copy.") > fixed things a little bit. But even with current master the guns are > still being misrendered. I also verified that the same breakage was > visible with swrast, whereas llvmpipe and i965 seemed ok. > > To reproduce you can run 'xonotic-glx -benchmark demos/the-big-keybench.dem' > and wait until the guy picks up the mortar (I think that's what its > called) which is maybe the third gun he picks up in that demo. The gun > is supposed to have some red color on it but now it has green. > > It looked like there's already a bunch of stuff piled on top of the > regression so reverting didn't seem entirely trivial, and thus I didn't > bother trying. I'll take a look! And no - reverting depends on too much. Mathias ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] nir: add const_index parameters to system value builder function
Karol Herbst writes: > this allows to replace some nir_load_system_value calls with the specific > system value constructor > > Reviewed-by: Jason Ekstrand > Signed-off-by: Karol Herbst > --- > src/compiler/nir/nir_builder_opcodes_h.py | 21 +++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/nir/nir_builder_opcodes_h.py > b/src/compiler/nir/nir_builder_opcodes_h.py > index 72cf5b4549d..0440875dade 100644 > --- a/src/compiler/nir/nir_builder_opcodes_h.py > +++ b/src/compiler/nir/nir_builder_opcodes_h.py > @@ -55,11 +55,28 @@ nir_load_system_value(nir_builder *build, > nir_intrinsic_op op, int index) > return >dest.ssa; > } > > +<% > +def sysval_decl_list(opcode): > + res = '' > + if opcode.indices: > + res += ', unsigned ' + opcode.indices[0].lower() > + return res > + > +def sysval_arg_list(opcode): > + args = [] > + if opcode.indices: > + args.append(opcode.indices[0].lower()) > + else: > + args.append('0') > + return ', '.join(args) > +%> I was confused why only indices[0] was used, but it looks like system values can only have one index. Maybe assert(len(opcode.indices) <= 1)? Other than that, these are: Reviewed-by: Eric Anholt signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/2] i965, anv: Use INTEL_DEBUG for disk_cache driver flags
On 2018-07-23 03:29:21, Eero Tamminen wrote: > Hi, > > Should this fix also: > https://bugs.freedesktop.org/show_bug.cgi?id=106382 > ? > It helped a bit. For me, on a simple test program, it allowed shader time to work once, even when the cache had a non-shader-time program. But, running wit shader-time a second time, it printed 'No shader time collected yet'. Without this patch, the test program would crash if the cache had a non-shader-time program, and this patch prevented that. Regarding the bisect info in that bug, if you happen to run with INTEL_DEBUG=shader_time first after a fresh build, then shader time would probably work, since it would generate the shader time program and put it in the cache. So, the bisect info could be unreliable. I'll take at what is happening on the '2nd shader-time run' issue that I described above. -Jordan > > On 23.07.2018 07:27, Jordan Justen wrote: > > Since various options within INTEL_DEBUG could impact code generation, > > we need to set the disk cache driver_flags parameter based on the > > INTEL_DEBUG flags in use. > > > > An example that will affect the program generated by i965 is the > > INTEL_DEBUG=nocompact option. > > > > The DEBUG_DISK_CACHE_MASK value is added to mask the settings of > > INTEL_DEBUG that can affect program generation. > > > > v2: > > * Use driver_flags (Tim) > > * Also update Anvil (Jason) > > > > Signed-off-by: Jordan Justen > > Reviewed-by: Lionel Landwerlin (v1) > > --- > > src/intel/common/gen_debug.h | 5 + > > src/intel/vulkan/anv_device.c | 3 ++- > > src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ++- > > 3 files changed, 9 insertions(+), 2 deletions(-) > > > > diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h > > index f6c44eeb912..aa9f3cf80d7 100644 > > --- a/src/intel/common/gen_debug.h > > +++ b/src/intel/common/gen_debug.h > > @@ -84,6 +84,11 @@ extern uint64_t INTEL_DEBUG; > > #define DEBUG_COLOR (1ull << 40) > > #define DEBUG_REEMIT (1ull << 41) > > > > +/* These flags may affect program generation */ > > +#define DEBUG_DISK_CACHE_MASK \ > > + (DEBUG_SHADER_TIME | DEBUG_NO16 | DEBUG_NO_DUAL_OBJECT_GS | DEBUG_NO8 | > > \ > > + DEBUG_SPILL_FS | DEBUG_SPILL_VEC4 | DEBUG_NO_COMPACTION | DEBUG_DO32) > > + > > #ifdef HAVE_ANDROID_PLATFORM > > #define LOG_TAG "INTEL-MESA" > > #if ANDROID_API_LEVEL >= 26 > > diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c > > index 247ba641336..97a71563b8a 100644 > > --- a/src/intel/vulkan/anv_device.c > > +++ b/src/intel/vulkan/anv_device.c > > @@ -286,7 +286,8 @@ anv_physical_device_init_disk_cache(struct > > anv_physical_device *device) > > char timestamp[41]; > > _mesa_sha1_format(timestamp, device->driver_build_sha1); > > > > - device->disk_cache = disk_cache_create(renderer, timestamp, 0); > > + const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK; > > + device->disk_cache = disk_cache_create(renderer, timestamp, > > driver_flags); > > #else > > device->disk_cache = NULL; > > #endif > > diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c > > b/src/mesa/drivers/dri/i965/brw_disk_cache.c > > index a678c355b9d..8f1b064fd61 100644 > > --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c > > +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c > > @@ -393,6 +393,7 @@ brw_disk_cache_init(struct intel_screen *screen) > > char timestamp[41]; > > _mesa_sha1_format(timestamp, id_sha1); > > > > - screen->disk_cache = disk_cache_create(renderer, timestamp, 0); > > + const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK; > > + screen->disk_cache = disk_cache_create(renderer, timestamp, > > driver_flags); > > #endif > > } > > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/8] nir: add 16bit type information to glsl types
On Thu, Jul 19, 2018 at 2:48 PM, Daniel Schürmann wrote: > Signed-off-by: Daniel Schürmann > --- > src/compiler/glsl_types.h | 15 +++ > src/compiler/nir_types.cpp | 12 > src/compiler/nir_types.h | 1 + > 3 files changed, 28 insertions(+) > > diff --git a/src/compiler/glsl_types.h b/src/compiler/glsl_types.h > index efc6324865..8cc8177f2d 100644 > --- a/src/compiler/glsl_types.h > +++ b/src/compiler/glsl_types.h > @@ -87,6 +87,13 @@ enum glsl_base_type { > GLSL_TYPE_ERROR > }; > > +static inline bool glsl_base_type_is_16bit(enum glsl_base_type type) > +{ > + return type == GLSL_TYPE_FLOAT16 || > + type == GLSL_TYPE_UINT16 || > + type == GLSL_TYPE_INT16; > +} > + > static inline bool glsl_base_type_is_64bit(enum glsl_base_type type) > { > return type == GLSL_TYPE_DOUBLE || > @@ -551,6 +558,14 @@ public: >return glsl_base_type_is_64bit(base_type); > } > > + /** > +* Query whether or not a type is 64-bit Update comment? > +*/ > + bool is_16bit() const > + { > + return glsl_base_type_is_16bit(base_type); > + } > + > /** > * Query whether or not a type is a non-array boolean type > */ > diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp > index 6f1182b742..3a3864414f 100644 > --- a/src/compiler/nir_types.cpp > +++ b/src/compiler/nir_types.cpp > @@ -164,6 +164,12 @@ glsl_get_record_location_offset(const struct glsl_type > *type, > return type->record_location_offset(length); > } > > +bool > +glsl_type_is_16bit(const glsl_type *type) > +{ > + return type->is_16bit(); > +} > + > bool > glsl_type_is_64bit(const glsl_type *type) > { > @@ -473,6 +479,12 @@ glsl_channel_type(const glsl_type *t) >return glsl_uint64_t_type(); > case GLSL_TYPE_INT64: >return glsl_int64_t_type(); > + case GLSL_TYPE_FLOAT16: > + return glsl_float16_t_type(); > + case GLSL_TYPE_UINT16: > + return glsl_uint16_t_type(); > + case GLSL_TYPE_INT16: > + return glsl_int16_t_type(); > default: >unreachable("Unhandled base type glsl_channel_type()"); > } > diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h > index c128250c7d..817b7a9b34 100644 > --- a/src/compiler/nir_types.h > +++ b/src/compiler/nir_types.h > @@ -121,6 +121,7 @@ glsl_get_bit_size(const struct glsl_type *type) > return 0; > } > > +bool glsl_type_is_16bit(const struct glsl_type *type); > bool glsl_type_is_64bit(const struct glsl_type *type); > bool glsl_type_is_void(const struct glsl_type *type); > bool glsl_type_is_error(const struct glsl_type *type); > -- > 2.17.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: move releases from Fridays to Wednesdays
Acked-by: Dylan Baker Quoting Juan A. Suarez Romero (2018-07-23 02:22:06) > Reviewed-by: Juan A. Suarez > > Reviewed/Acked/Agreed, whatever makes more sense :) > > > > J.A. > > On Thu, 2018-07-19 at 16:00 +0300, Andres Gomez wrote: > > As discussed at: > > https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html > > > > Cc: Emil Velikov > > Cc: Juan A. Suarez Romero > > Cc: Dylan Baker > > Cc: Ian Romanick > > Cc: Carl Worth > > Cc: Mark Janes > > Signed-off-by: Andres Gomez > > --- > > docs/releasing.html | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/docs/releasing.html b/docs/releasing.html > > index a022d0c484b..14315e7a8e4 100644 > > --- a/docs/releasing.html > > +++ b/docs/releasing.html > > @@ -54,8 +54,8 @@ For example: > > Release schedule > > > > > > -Releases should happen on Fridays. Delays can occur although those should > > be keep > > -to a minimum. > > +Releases should happen on Wednesdays. Delays can occur although those > > +should be keep to a minimum. > > > > See our calendar for > > the > > date and other details for individual releases. > signature.asc Description: signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 54971] __glXInitialize can initialize same display multiple times
https://bugs.freedesktop.org/show_bug.cgi?id=54971 Julien Isorce changed: What|Removed |Added CC||julien.iso...@gmail.com -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] i915/swrast vertex array regression
Hi Mathias, I noticed a while back that xonotic had started to misrender the gun models on i915. Yesterday I bisected it down to commit 64d2a2048054 ("mesa: Make gl_vertex_array contain pointers to first order VAO members."). Actually that commit broke things even worse (and the game would even crash after a while), but a later commit (presumably 98f35ad63c23 ("vbo: Correctly handle source arrays in vbo_split_copy.") fixed things a little bit. But even with current master the guns are still being misrendered. I also verified that the same breakage was visible with swrast, whereas llvmpipe and i965 seemed ok. To reproduce you can run 'xonotic-glx -benchmark demos/the-big-keybench.dem' and wait until the guy picks up the mortar (I think that's what its called) which is maybe the third gun he picks up in that demo. The gun is supposed to have some red color on it but now it has green. It looked like there's already a bunch of stuff piled on top of the regression so reverting didn't seem entirely trivial, and thus I didn't bother trying. -- Ville Syrjälä Intel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info
On 07/23/2018 04:02 PM, Bas Nieuwenhuizen wrote: On Mon, Jul 23, 2018 at 1:50 PM, Samuel Pitoiset wrote: When we keep shader info for eg. RADV_TRACE_FILE, we don't free the NIR when creating the pipelines. Though, we have to free it when destroying the shader module. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index aac5b8a21a..ef7c41af9e 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device, if (device->keep_shader_info) { variant->disasm_string = binary.disasm_string; variant->llvm_ir_string = binary.llvm_ir_string; + variant->nir = *shaders; How does this work with merged shaders? You are right. I figured this on friday but I totally forgot during the weekend. :) This still leaks memory for merged shaders. if (!gs_copy_shader && !module->nir) { - variant->nir = *shaders; variant->spirv = (uint32_t *)module->data; variant->spirv_size = module->size; } -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: Update to new VK_EXT_vertex_attribute_divisor to version 2.
Behavior wrt firstInstance got changed, and a divisor of 0 has been disallowed. The new version of the ext got published in specification 1.1.81. --- src/amd/vulkan/radv_extensions.py | 2 +- src/amd/vulkan/radv_nir_to_llvm.c | 7 --- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/src/amd/vulkan/radv_extensions.py b/src/amd/vulkan/radv_extensions.py index a5fbffac33b..d02042e5647 100644 --- a/src/amd/vulkan/radv_extensions.py +++ b/src/amd/vulkan/radv_extensions.py @@ -104,7 +104,7 @@ EXTENSIONS = [ Extension('VK_EXT_sampler_filter_minmax', 1, 'device->rad_info.chip_class >= CIK'), Extension('VK_EXT_shader_viewport_index_layer', 1, True), Extension('VK_EXT_shader_stencil_export', 1, True), -Extension('VK_EXT_vertex_attribute_divisor', 1, True), +Extension('VK_EXT_vertex_attribute_divisor', 2, True), Extension('VK_AMD_draw_indirect_count', 1, True), Extension('VK_AMD_gcn_shader',1, True), Extension('VK_AMD_rasterization_order', 1, 'device->has_out_of_order_rast'), diff --git a/src/amd/vulkan/radv_nir_to_llvm.c b/src/amd/vulkan/radv_nir_to_llvm.c index c7d772fa652..d12ef09c9f3 100644 --- a/src/amd/vulkan/radv_nir_to_llvm.c +++ b/src/amd/vulkan/radv_nir_to_llvm.c @@ -1984,8 +1984,7 @@ handle_vs_input_decl(struct radv_shader_context *ctx, uint32_t divisor = ctx->options->key.vs.instance_rate_divisors[attrib_index]; if (divisor) { - buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.instance_id, - ctx->abi.start_instance, ""); + buffer_index = ctx->abi.instance_id; if (divisor != 1) { buffer_index = LLVMBuildUDiv(ctx->ac.builder, buffer_index, @@ -2000,8 +1999,10 @@ handle_vs_input_decl(struct radv_shader_context *ctx, MAX2(1, ctx->shader_info->vs.vgpr_comp_cnt); } } else { - buffer_index = ctx->ac.i32_0; + unreachable("Invalid vertex attribute divisor of 0."); } + + buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.start_instance, buffer_index, ""); } else buffer_index = LLVMBuildAdd(ctx->ac.builder, ctx->abi.vertex_id, ctx->abi.base_vertex, ""); -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add a couple of iand/ior optimizations
On Mon, Jul 23, 2018 at 1:08 AM Timothy Arceri wrote: > Ian and I have been looking at these type of things recently. Ian has > started work on a pass to cover this stuff without having to add dozens > of these types of opts. > > https://lists.freedesktop.org/archives/mesa-dev/2018-July/200583.html > Does said pass work only on logical operations with booleans or also regular bitwise operations? This patch is for bitwise operations though it naturally works with booleans. --Jason > On 23/07/18 17:36, Jason Ekstrand wrote: > > Spotted in a shader in Batman: Arkham City. > > --- > > src/compiler/nir/nir_opt_algebraic.py | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/src/compiler/nir/nir_opt_algebraic.py > b/src/compiler/nir/nir_opt_algebraic.py > > index ba277fdfd0e..f2007852b21 100644 > > --- a/src/compiler/nir/nir_opt_algebraic.py > > +++ b/src/compiler/nir/nir_opt_algebraic.py > > @@ -377,6 +377,8 @@ optimizations = [ > > (('ixor', a, a), 0), > > (('ixor', a, 0), a), > > (('inot', ('inot', a)), a), > > + (('ior', ('iand', a, b), b), b), > > + (('iand', ('ior', a, b), b), b), > > # DeMorgan's Laws > > (('iand', ('inot', a), ('inot', b)), ('inot', ('ior', a, b))), > > (('ior', ('inot', a), ('inot', b)), ('inot', ('iand', a, b))), > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
Reviewed-by: Jason Ekstrand On Mon, Jul 23, 2018 at 4:05 AM Lionel Landwerlin < lionel.g.landwer...@intel.com> wrote: > Thanks Alex! > > Reviewed-by: Lionel Landwerlin > > On 23/07/18 09:51, Alex Smith wrote: > > According to the spec, these should apply to all read/write access > > types (so would be equivalent to specifying all other access types > > individually). Currently, they were doing nothing. > > > > v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask. > > > > Signed-off-by: Alex Smith > > Cc: mesa-sta...@lists.freedesktop.org > > --- > > src/intel/vulkan/anv_private.h | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/src/intel/vulkan/anv_private.h > b/src/intel/vulkan/anv_private.h > > index cec2842792..1660fcbbc8 100644 > > --- a/src/intel/vulkan/anv_private.h > > +++ b/src/intel/vulkan/anv_private.h > > @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags > flags) > >pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT; > >pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT; > >break; > > + case VK_ACCESS_MEMORY_WRITE_BIT: > > + pipe_bits |= ANV_PIPE_FLUSH_BITS; > > + break; > > default: > >break; /* Nothing to do */ > > } > > @@ -1761,6 +1764,12 @@ > anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags) > > case VK_ACCESS_TRANSFER_READ_BIT: > >pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT; > >break; > > + case VK_ACCESS_MEMORY_READ_BIT: > > + pipe_bits |= ANV_PIPE_INVALIDATE_BITS; > > + break; > > + case VK_ACCESS_MEMORY_WRITE_BIT: > > + pipe_bits |= ANV_PIPE_FLUSH_BITS; > > + break; > > default: > >break; /* Nothing to do */ > > } > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info
On Mon, Jul 23, 2018 at 1:50 PM, Samuel Pitoiset wrote: > When we keep shader info for eg. RADV_TRACE_FILE, we don't > free the NIR when creating the pipelines. Though, we have to > free it when destroying the shader module. > > Signed-off-by: Samuel Pitoiset > --- > src/amd/vulkan/radv_shader.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c > index aac5b8a21a..ef7c41af9e 100644 > --- a/src/amd/vulkan/radv_shader.c > +++ b/src/amd/vulkan/radv_shader.c > @@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device, > if (device->keep_shader_info) { > variant->disasm_string = binary.disasm_string; > variant->llvm_ir_string = binary.llvm_ir_string; > + variant->nir = *shaders; How does this work with merged shaders? > if (!gs_copy_shader && !module->nir) { > - variant->nir = *shaders; > variant->spirv = (uint32_t *)module->data; > variant->spirv_size = module->size; > } > -- > 2.18.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 12/15] nir/types: Add a wrapper to access gl_type
On 21/07/18 05:15, Timothy Arceri wrote: > On 21/07/18 13:09, Timothy Arceri wrote: >> Reviewed-by: Timothy Arceri > > Actually I take that back. This introduces a dependency on GL in NIR, Hmm, but that dependency is already there. nir.h includes GL/gl.h, and in fact, there is a comment pointing that is there for GLenum. nir_variable includes a "Glenum format" as part of ARB_shader_image_load_store. Just in case you want to see it on code: https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir.h#n33 https://cgit.freedesktop.org/mesa/mesa/tree/src/compiler/nir/nir.h#n338 > maybe better to put this one in gl_nir_linker.h? I can't move it directly there, as gl_nir_linker is C-code. After all, that was the reason we placed it at nir_types.[h/cpp], because we need a C-wrapper for gl_nir_link* stuff. Perhaps I could put it at link_util.cpp, but sounds weird. And as I mentioned before, NIR already have a dependency on GL. > If you agree that change would be: > > Reviewed-by: Timothy Arceri > >> >> On 21/07/18 01:08, Alejandro Piñeiro wrote: >>> From: Neil Roberts >>> >>> --- >>> src/compiler/nir_types.cpp | 6 ++ >>> src/compiler/nir_types.h | 2 ++ >>> 2 files changed, 8 insertions(+) >>> >>> diff --git a/src/compiler/nir_types.cpp b/src/compiler/nir_types.cpp >>> index 6f1182b742c..2085138c407 100644 >>> --- a/src/compiler/nir_types.cpp >>> +++ b/src/compiler/nir_types.cpp >>> @@ -81,6 +81,12 @@ glsl_get_column_type(const struct glsl_type *type) >>> return type->column_type(); >>> } >>> +GLenum >>> +glsl_get_gl_type(const struct glsl_type *type) >>> +{ >>> + return type->gl_type; >>> +} >>> + >>> enum glsl_base_type >>> glsl_get_base_type(const struct glsl_type *type) >>> { >>> diff --git a/src/compiler/nir_types.h b/src/compiler/nir_types.h >>> index c128250c7d3..ec5bf2cbc76 100644 >>> --- a/src/compiler/nir_types.h >>> +++ b/src/compiler/nir_types.h >>> @@ -59,6 +59,8 @@ glsl_get_function_return_type(const struct >>> glsl_type *type); >>> const struct glsl_function_param * >>> glsl_get_function_param(const struct glsl_type *type, unsigned >>> index); >>> +GLenum glsl_get_gl_type(const struct glsl_type *type); >>> + >>> enum glsl_base_type glsl_get_base_type(const struct glsl_type *type); >>> unsigned glsl_get_vector_elements(const struct glsl_type *type); >>> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE
https://bugs.freedesktop.org/show_bug.cgi?id=106893 --- Comment #14 from Alex Smith --- If ever there is a need to get a trace, I've given Samuel details of a game option that can be set to disable the background pipeline preloading. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE
https://bugs.freedesktop.org/show_bug.cgi?id=106893 John changed: What|Removed |Added Status|RESOLVED|CLOSED --- Comment #13 from John --- I'm surely wrong but shouldn't that be flushed to a file or something when memory is becoming a problem? It seems like in such a case the trace cannot be used, isn't that problematic? Either way, thank you for looking into it! -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] intel/ppgtt: memory address alignment
Kernel (for ppgtt) requires memory address to be aligned to page size (4096). Added such alignment for buffers marked with EXEC_OBJECT_PINNED. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106997 Fixes: a363bb2cd0e2 (i965: Allocate VMA in userspace for full-PPGTT systems.) Signed-off-by: Sergii Romantsov --- src/mesa/drivers/dri/i965/brw_bufmgr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_bufmgr.c b/src/mesa/drivers/dri/i965/brw_bufmgr.c index 09d45e3..8383735 100644 --- a/src/mesa/drivers/dri/i965/brw_bufmgr.c +++ b/src/mesa/drivers/dri/i965/brw_bufmgr.c @@ -643,7 +643,7 @@ retry: bo->kflags = bufmgr->initial_kflags; if ((bo->kflags & EXEC_OBJECT_PINNED) && bo->gtt_offset == 0ull) { - bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 1); + bo->gtt_offset = vma_alloc(bufmgr, memzone, bo->size, 4096); if (bo->gtt_offset == 0ull) goto err_free; @@ -784,7 +784,7 @@ brw_bo_gem_create_from_name(struct brw_bufmgr *bufmgr, bo->kflags = bufmgr->initial_kflags; if (bo->kflags & EXEC_OBJECT_PINNED) - bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 1); + bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 4096); _mesa_hash_table_insert(bufmgr->handle_table, >gem_handle, bo); _mesa_hash_table_insert(bufmgr->name_table, >global_name, bo); @@ -1424,7 +1424,7 @@ brw_bo_gem_create_from_prime_internal(struct brw_bufmgr *bufmgr, int prime_fd, if (bo->kflags & EXEC_OBJECT_PINNED) { assert(bo->size > 0); - bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 1); + bo->gtt_offset = vma_alloc(bufmgr, BRW_MEMZONE_OTHER, bo->size, 4096); } if (tiling_mode < 0) { -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: fix a little memleak when we keep shader info
When we keep shader info for eg. RADV_TRACE_FILE, we don't free the NIR when creating the pipelines. Though, we have to free it when destroying the shader module. Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_shader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c index aac5b8a21a..ef7c41af9e 100644 --- a/src/amd/vulkan/radv_shader.c +++ b/src/amd/vulkan/radv_shader.c @@ -599,8 +599,8 @@ shader_variant_create(struct radv_device *device, if (device->keep_shader_info) { variant->disasm_string = binary.disasm_string; variant->llvm_ir_string = binary.llvm_ir_string; + variant->nir = *shaders; if (!gs_copy_shader && !module->nir) { - variant->nir = *shaders; variant->spirv = (uint32_t *)module->data; variant->spirv_size = module->size; } -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] draw: force draw pipeline if there's more than 65535 vertices
On 22/07/18 00:09, srol...@vmware.com wrote: From: Roland Scheidegger The pt emit path can only handle 65535 - the number of vertices is truncated to a ushort, resulting in a too small buffer allocation, which will crash. Forcing the pipeline path looks suboptimal, then again this bug is probably there ever since GS is supported, so it seems it's not happening often. (Note that the vertex_id in the vertex header is 16 bit too, however this is only used by the draw pipeline, and it denotes the emit vertex nr, and that uses vbuf code, which will only emit smaller chunks, so should be fine I think.) Other solutions would be to simply allow 32bit counts for vertex allocation, however 65535 is already larger than this was intended for (the idea being it should be more cache friendly). Or could try to teach the pt emit path to split the emit in smaller chunks (only the non-index path can be affected, since gs output is always linear), but it's a bit tricky (we don't know the primitive boundaries up-front). Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295 Cc: --- src/gallium/auxiliary/draw/draw_pt_emit.c | 2 ++ src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c | 10 ++ src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c | 9 + 3 files changed, 21 insertions(+) diff --git a/src/gallium/auxiliary/draw/draw_pt_emit.c b/src/gallium/auxiliary/draw/draw_pt_emit.c index 6fb630b549..984c76fdf9 100644 --- a/src/gallium/auxiliary/draw/draw_pt_emit.c +++ b/src/gallium/auxiliary/draw/draw_pt_emit.c @@ -158,6 +158,7 @@ draw_pt_emit(struct pt_emit *emit, */ render->set_primitive(draw->render, prim_info->prim); + assert(vertex_count <= 65535); render->allocate_vertices(render, (ushort)translate->key.output_stride, (ushort)vertex_count); @@ -229,6 +230,7 @@ draw_pt_emit_linear(struct pt_emit *emit, */ render->set_primitive(draw->render, prim_info->prim); + assert(count <= 65535); if (!render->allocate_vertices(render, (ushort)translate->key.output_stride, (ushort)count)) diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c index aa20b918f5..f76e022994 100644 --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline.c @@ -299,6 +299,16 @@ fetch_pipeline_generic(struct draw_pt_middle_end *middle, FREE(vert_info->verts); vert_info = _vert_info; prim_info = _prim_info; + + /* + * pt emit can only handle ushort number of vertices (see + * render->allocate_vertices). + * vsplit guarantees there's never more than 4096, however GS can + * easily blow this up (by a factor of 256 (or even 1024) max). + */ + if (vert_info->count > 65535) { + opt |= PT_PIPELINE; + } } else { if (draw_prim_assembler_is_required(draw, prim_info, vert_info)) { draw_prim_assembler_run(draw, prim_info, vert_info, diff --git a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c index 5e0c562256..91c9360cce 100644 --- a/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c +++ b/src/gallium/auxiliary/draw/draw_pt_fetch_shade_pipeline_llvm.c @@ -428,6 +428,15 @@ llvm_pipeline_generic(struct draw_pt_middle_end *middle, FREE(vert_info->verts); vert_info = _vert_info; prim_info = _prim_info; + /* + * pt emit can only handle ushort number of vertices (see + * render->allocate_vertices). + * vsplit guarantees there's never more than 4096, however GS can + * easily blow this up (by a factor of 256 (or even 1024) max). + */ + if (vert_info->count > 65535) { + opt |= PT_PIPELINE; + } } else { if (draw_prim_assembler_is_required(draw, prim_info, vert_info)) { draw_prim_assembler_run(draw, prim_info, vert_info, Sounds good to me. Reviewed-by: Jose Fonseca ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 106893] Potential mem leak with radv, linked to RADV_TRACE_FILE
https://bugs.freedesktop.org/show_bug.cgi?id=106893 Samuel Pitoiset changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |NOTABUG --- Comment #12 from Samuel Pitoiset --- We don't have any memory leaks with RADV_TRACE_FILE actually. The thing is that RoTR creates a TON of pipelines in the background menu for the whole game. When RADV_TRACE_FILE is set we keep all shader info, including SPIRV, NIR and assembly. After few minutes the game will crash because we are out of memory. We can't do anything useful for that. Thanks for the report! -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] virgl: add initial shader_storage_buffer_object support. (v2)
On 23 July 2018 at 17:28, Gert Wollny wrote: > Am Montag, den 23.07.2018, 17:13 +1000 schrieb Dave Airlie: >> On 23 July 2018 at 16:46, Gert Wollny >> wrote: >> > Am Montag, den 23.07.2018, 09:04 +1000 schrieb Dave Airlie: >> > > >> > > > > + uint32_t max_shader_buffer = shader == >> > > > > PIPE_SHADER_FRAGMENT ? >> > > > > + rs->caps.caps.v2.max_shader_buffer_frag_compute : >> > > > >> > > > Shouldn't max_shader_buffer_frag_compute also be used for >> > > > PIPE_SHADER_COMPUTE? >> > > >> > > virgl doesn't support compute shaders yet. Adding support for >> > > them is >> > > in the future patches. >> > >> > I somehow suspected this, but then one should probably explicitely >> > return zero instead of returning max_shader_buffer_other_stages. >> >> You can never reach this point of code with shader set to >> PIPE_SHADER_COMPUTE. >> >> The outer switch statement defaults to return 0 for compute shaders. > > I see, that was not visible in the patch, sorry for the noise. Is that an r-b in disguise? :-) Dave. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] forward precise-flag if supported
On 23 July 2018 at 20:48, Erik Faye-Lund wrote: > New versions of virglrenderer supports the precise-flag, so let's > forward it from TGSI if that's the case. Reviewed-by: Dave Airlie > > This fixes a few dEQP-GLES31 tests: > - dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise > - > dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise > - > dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise > - > dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise > - > dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise > - > dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise > > Signed-off-by: Erik Faye-Lund > --- > > This matches this virglrenderer-series, which has been merged upstrea: > https://patchwork.freedesktop.org/series/46361/ > > src/gallium/drivers/virgl/virgl_hw.h | 1 + > src/gallium/drivers/virgl/virgl_tgsi.c | 5 - > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/virgl/virgl_hw.h > b/src/gallium/drivers/virgl/virgl_hw.h > index ee6aa68c5a..8f4f0ab6d8 100644 > --- a/src/gallium/drivers/virgl/virgl_hw.h > +++ b/src/gallium/drivers/virgl/virgl_hw.h > @@ -203,6 +203,7 @@ enum virgl_formats { > #define VIRGL_CAP_TEXTURE_VIEW (1 << 1) > #define VIRGL_CAP_SET_MIN_SAMPLES (1 << 2) > #define VIRGL_CAP_COPY_IMAGE (1 << 3) > +#define VIRGL_CAP_TGSI_PRECISE (1 << 4) > > #define VIRGL_BIND_DEPTH_STENCIL (1 << 0) > #define VIRGL_BIND_RENDER_TARGET (1 << 1) > diff --git a/src/gallium/drivers/virgl/virgl_tgsi.c > b/src/gallium/drivers/virgl/virgl_tgsi.c > index ff5abf6ddb..d1f785d4d2 100644 > --- a/src/gallium/drivers/virgl/virgl_tgsi.c > +++ b/src/gallium/drivers/virgl/virgl_tgsi.c > @@ -31,6 +31,7 @@ > struct virgl_transform_context { > struct tgsi_transform_context base; > bool cull_enabled; > + bool has_precise; > }; > > static void > @@ -76,7 +77,8 @@ static void > virgl_tgsi_transform_instruction(struct tgsi_transform_context *ctx, > struct tgsi_full_instruction *inst) > { > - if (inst->Instruction.Precise) > + struct virgl_transform_context *vtctx = (struct virgl_transform_context > *)ctx; > + if (!vtctx->has_precise && inst->Instruction.Precise) >inst->Instruction.Precise = 0; > > for (unsigned i = 0; i < inst->Instruction.NumSrcRegs; i++) { > @@ -104,6 +106,7 @@ struct tgsi_token *virgl_tgsi_transform(struct > virgl_context *vctx, const struct > transform.base.transform_property = virgl_tgsi_transform_property; > transform.base.transform_instruction = virgl_tgsi_transform_instruction; > transform.cull_enabled = vscreen->caps.caps.v1.bset.has_cull; > + transform.has_precise = vscreen->caps.caps.v2.capability_bits & > VIRGL_CAP_TGSI_PRECISE; > tgsi_transform_shader(tokens_in, new_tokens, newLen, ); > > return new_tokens; > -- > 2.18.0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
Thanks Alex! Reviewed-by: Lionel Landwerlin On 23/07/18 09:51, Alex Smith wrote: According to the spec, these should apply to all read/write access types (so would be equivalent to specifying all other access types individually). Currently, they were doing nothing. v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask. Signed-off-by: Alex Smith Cc: mesa-sta...@lists.freedesktop.org --- src/intel/vulkan/anv_private.h | 9 + 1 file changed, 9 insertions(+) diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index cec2842792..1660fcbbc8 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags) pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT; pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT; break; + case VK_ACCESS_MEMORY_WRITE_BIT: + pipe_bits |= ANV_PIPE_FLUSH_BITS; + break; default: break; /* Nothing to do */ } @@ -1761,6 +1764,12 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags) case VK_ACCESS_TRANSFER_READ_BIT: pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT; break; + case VK_ACCESS_MEMORY_READ_BIT: + pipe_bits |= ANV_PIPE_INVALIDATE_BITS; + break; + case VK_ACCESS_MEMORY_WRITE_BIT: + pipe_bits |= ANV_PIPE_FLUSH_BITS; + break; default: break; /* Nothing to do */ } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] forward precise-flag if supported
New versions of virglrenderer supports the precise-flag, so let's forward it from TGSI if that's the case. This fixes a few dEQP-GLES31 tests: - dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise - dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise - dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise - dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise - dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise - dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise Signed-off-by: Erik Faye-Lund --- This matches this virglrenderer-series, which has been merged upstrea: https://patchwork.freedesktop.org/series/46361/ src/gallium/drivers/virgl/virgl_hw.h | 1 + src/gallium/drivers/virgl/virgl_tgsi.c | 5 - 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/virgl/virgl_hw.h b/src/gallium/drivers/virgl/virgl_hw.h index ee6aa68c5a..8f4f0ab6d8 100644 --- a/src/gallium/drivers/virgl/virgl_hw.h +++ b/src/gallium/drivers/virgl/virgl_hw.h @@ -203,6 +203,7 @@ enum virgl_formats { #define VIRGL_CAP_TEXTURE_VIEW (1 << 1) #define VIRGL_CAP_SET_MIN_SAMPLES (1 << 2) #define VIRGL_CAP_COPY_IMAGE (1 << 3) +#define VIRGL_CAP_TGSI_PRECISE (1 << 4) #define VIRGL_BIND_DEPTH_STENCIL (1 << 0) #define VIRGL_BIND_RENDER_TARGET (1 << 1) diff --git a/src/gallium/drivers/virgl/virgl_tgsi.c b/src/gallium/drivers/virgl/virgl_tgsi.c index ff5abf6ddb..d1f785d4d2 100644 --- a/src/gallium/drivers/virgl/virgl_tgsi.c +++ b/src/gallium/drivers/virgl/virgl_tgsi.c @@ -31,6 +31,7 @@ struct virgl_transform_context { struct tgsi_transform_context base; bool cull_enabled; + bool has_precise; }; static void @@ -76,7 +77,8 @@ static void virgl_tgsi_transform_instruction(struct tgsi_transform_context *ctx, struct tgsi_full_instruction *inst) { - if (inst->Instruction.Precise) + struct virgl_transform_context *vtctx = (struct virgl_transform_context *)ctx; + if (!vtctx->has_precise && inst->Instruction.Precise) inst->Instruction.Precise = 0; for (unsigned i = 0; i < inst->Instruction.NumSrcRegs; i++) { @@ -104,6 +106,7 @@ struct tgsi_token *virgl_tgsi_transform(struct virgl_context *vctx, const struct transform.base.transform_property = virgl_tgsi_transform_property; transform.base.transform_instruction = virgl_tgsi_transform_instruction; transform.cull_enabled = vscreen->caps.caps.v1.bset.has_cull; + transform.has_precise = vscreen->caps.caps.v2.capability_bits & VIRGL_CAP_TGSI_PRECISE; tgsi_transform_shader(tokens_in, new_tokens, newLen, ); return new_tokens; -- 2.18.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v3 3/4] nv50/ir: optimize imul/imad to xmads
This patch is: Reviewied-By: Karol Herbst forgot to add that On Mon, Jul 23, 2018 at 11:40 AM, Rhys Perry wrote: > This hits the shader-db numbers a good bit, though a few xmads is way > faster than an imul or imad and the cost is mitigated by the next commit, > which optimizes many multiplications by immediates into shorter and less > register heavy instructions than the xmads. > > total instructions in shared programs : 5787704 -> 5839715 (0.90%) > total gprs used in shared programs: 669878 -> 670553 (0.10%) > total shared used in shared programs : 548832 -> 548832 (0.00%) > total local used in shared programs : 21068 -> 21164 (0.46%) > > local sharedgpr inst bytes > helped 0 0 39 0 0 > hurt 1 0 36530763076 > > Signed-off-by: Rhys Perry > --- > .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 56 > ++ > .../nouveau/codegen/nv50_ir_target_gm107.cpp | 1 - > 2 files changed, 56 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > index 6deea7a360..a6ddb284b8 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp > @@ -2292,13 +2292,18 @@ AlgebraicOpt::visit(BasicBlock *bb) > // > = > > // ADD(SHL(a, b), c) -> SHLADD(a, b, c) > +// MUL(a, b) -> a few XMADs > +// MAD/FMA(a, b, c) -> a few XMADs > class LateAlgebraicOpt : public Pass > { > private: > virtual bool visit(Instruction *); > > void handleADD(Instruction *); > + void handleMULMAD(Instruction *); > bool tryADDToSHLADD(Instruction *); > + > + BuildUtil bld; > }; > > void > @@ -2359,6 +2364,52 @@ LateAlgebraicOpt::tryADDToSHLADD(Instruction *add) > return true; > } > > +// MUL(a, b) -> a few XMADs > +// MAD/FMA(a, b, c) -> a few XMADs > +void > +LateAlgebraicOpt::handleMULMAD(Instruction *i) > +{ > + // TODO: handle NV50_IR_SUBOP_MUL_HIGH > + if (!prog->getTarget()->isOpSupported(OP_XMAD, TYPE_U32)) > + return; > + if (isFloatType(i->dType) || typeSizeof(i->dType) != 4) > + return; > + if (i->subOp || i->usesFlags() || i->flagsDef >= 0) > + return; > + > + assert(!i->src(0).mod); > + assert(!i->src(1).mod); > + assert(i->op == OP_MUL ? 1 : !i->src(2).mod); > + > + bld.setPosition(i, false); > + > + Value *a = i->getSrc(0); > + Value *b = i->getSrc(1); > + Value *c = i->op == OP_MUL ? bld.mkImm(0) : i->getSrc(2); > + > + Value *tmp0 = bld.getSSA(); > + Value *tmp1 = bld.getSSA(); > + > + Instruction *insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp0, b, a, c); > + insn->setPredicate(i->cc, i->getPredicate()); > + > + insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp1, b, a, bld.mkImm(0)); > + insn->setPredicate(i->cc, i->getPredicate()); > + insn->subOp = NV50_IR_SUBOP_XMAD_MRG | NV50_IR_SUBOP_XMAD_H1(1); > + > + Value *pred = i->getPredicate(); > + i->setPredicate(i->cc, NULL); > + > + i->op = OP_XMAD; > + i->setSrc(0, b); > + i->setSrc(1, tmp1); > + i->setSrc(2, tmp0); > + i->subOp = NV50_IR_SUBOP_XMAD_PSL | NV50_IR_SUBOP_XMAD_CBCC; > + i->subOp |= NV50_IR_SUBOP_XMAD_H1(0) | NV50_IR_SUBOP_XMAD_H1(1); > + > + i->setPredicate(i->cc, pred); > +} > + > bool > LateAlgebraicOpt::visit(Instruction *i) > { > @@ -2366,6 +2417,11 @@ LateAlgebraicOpt::visit(Instruction *i) > case OP_ADD: >handleADD(i); >break; > + case OP_MUL: > + case OP_MAD: > + case OP_FMA: > + handleMULMAD(i); > + break; > default: >break; > } > diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp > b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp > index bb1c234c43..edb823afb4 100644 > --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp > +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp > @@ -166,7 +166,6 @@ TargetGM107::isBarrierRequired(const Instruction *insn) > const >} >break; > case OPCLASS_ARITH: > - // TODO: IMUL/IMAD require barriers too, use of XMAD instead! >if ((insn->op == OP_MUL || insn->op == OP_MAD) && >!isFloatType(insn->dType)) > return true; > -- > 2.14.4 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 1/4] nv50/ir: add preliminary support for OP_XMAD
Signed-off-by: Rhys Perry --- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 26 ++ .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 18 +-- .../drivers/nouveau/codegen/nv50_ir_print.cpp | 19 .../drivers/nouveau/codegen/nv50_ir_target.cpp | 7 +++--- .../nouveau/codegen/nv50_ir_target_gm107.cpp | 1 + .../nouveau/codegen/nv50_ir_target_nv50.cpp| 1 + .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 15 + 7 files changed, 82 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 0b220cc48d..13822a08c3 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -58,6 +58,9 @@ enum operation OP_FMA, OP_SAD, // abs(src0 - src1) + src2 OP_SHLADD, + // extended multiply-add (GM107+), does a lot of things. + // see envytools for detailed documentation + OP_XMAD, OP_ABS, OP_NEG, OP_NOT, @@ -256,6 +259,29 @@ enum operation #define NV50_IR_SUBOP_MINMAX_MED 2 #define NV50_IR_SUBOP_MINMAX_HIGH 3 +// xmad(src0, src1, 0) << 16 + src2 +#define NV50_IR_SUBOP_XMAD_PSL (1 << 0) +// (xmad(src0, src1, src2) & 0x) | (src1 << 16) +#define NV50_IR_SUBOP_XMAD_MRG (1 << 1) +// xmad(src0, src1, src2.lo) +#define NV50_IR_SUBOP_XMAD_CLO (1 << 2) +// xmad(src0, src1, src2.hi) +#define NV50_IR_SUBOP_XMAD_CHI (2 << 2) +// if both operands to the multiplication are non-zero, subtract 65536 for each +// negative operand +#define NV50_IR_SUBOP_XMAD_CSFU (3 << 2) +// xmad(src0, src1, src2) + src1 << 16 +#define NV50_IR_SUBOP_XMAD_CBCC (4 << 2) +#define NV50_IR_SUBOP_XMAD_CMODE_SHIFT 2 +#define NV50_IR_SUBOP_XMAD_CMODE_MASK uint16_t(0x7 << NV50_IR_SUBOP_XMAD_CMODE_SHIFT) + +// use the high 16 bits instead of the low 16 bits for the multiplication. +// if the instruction's sType is signed, sign extend the operand from 16 bits +// to 32 before multiplication. +#define NV50_IR_SUBOP_XMAD_H1_SHIFT 5 +#define NV50_IR_SUBOP_XMAD_H1(i) (1 << (NV50_IR_SUBOP_XMAD_H1_SHIFT + (i))) +#define NV50_IR_SUBOP_XMAD_H1_MASK uint16_t(0x3 << NV50_IR_SUBOP_XMAD_H1_SHIFT) + enum DataType { TYPE_NONE, diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 16022e6f23..6deea7a360 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -191,9 +191,17 @@ void LoadPropagation::checkSwapSrc01(Instruction *insn) { const Target *targ = prog->getTarget(); - if (!targ->getOpInfo(insn).commutative) - if (insn->op != OP_SET && insn->op != OP_SLCT && insn->op != OP_SUB) + if (!targ->getOpInfo(insn).commutative) { + if (insn->op != OP_SET && insn->op != OP_SLCT && + insn->op != OP_SUB && insn->op != OP_XMAD) return; + // XMAD is only commutative if both the CBCC and MRG flags are not set. + if (insn->op == OP_XMAD && + (insn->subOp & NV50_IR_SUBOP_XMAD_CMODE_MASK) == NV50_IR_SUBOP_XMAD_CBCC) + return; + if (insn->op == OP_XMAD && (insn->subOp & NV50_IR_SUBOP_XMAD_MRG)) + return; + } if (insn->src(1).getFile() != FILE_GPR) return; // This is the special OP_SET used for alphatesting, we can't reverse its @@ -236,6 +244,12 @@ LoadPropagation::checkSwapSrc01(Instruction *insn) if (insn->op == OP_SUB) { insn->src(0).mod = insn->src(0).mod ^ Modifier(NV50_IR_MOD_NEG); insn->src(1).mod = insn->src(1).mod ^ Modifier(NV50_IR_MOD_NEG); + } else + if (insn->op == OP_XMAD) { + // swap h1 flags + uint16_t h1 = (insn->subOp >> 1 & NV50_IR_SUBOP_XMAD_H1(0)) | +(insn->subOp << 1 & NV50_IR_SUBOP_XMAD_H1(1)); + insn->subOp = (insn->subOp & ~NV50_IR_SUBOP_XMAD_H1_MASK) | h1; } } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp index ee3506fbae..7eab8b8d70 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp @@ -86,6 +86,7 @@ const char *operationStr[OP_LAST + 1] = "fma", "sad", "shladd", + "xmad", "abs", "neg", "not", @@ -240,6 +241,11 @@ static const char *barOpStr[] = "sync", "arrive", "red and", "red or", "red popc" }; +static const char *xmadOpCModeStr[] = +{ + "clo", "chi", "csfu", "cbcc" +}; + static const char *DataTypeStr[] = { "-", @@ -625,6 +631,19 @@ void Instruction::print() const if (subOp < ARRAY_SIZE(barOpStr)) PRINT("%s ", barOpStr[subOp]); break; + case OP_XMAD: { + if (subOp & NV50_IR_SUBOP_XMAD_PSL) +PRINT("psl "); + if (subOp & NV50_IR_SUBOP_XMAD_MRG) +PRINT("mrg "); + unsigned cmode = (subOp &
[Mesa-dev] [PATCH v3 4/4] nv50/ir: further optimize multiplication by immediates
Strongly mitigates the harm from the previous commit, which made many integer multiplications much more heavy on the register and instruction count. total instructions in shared programs : 5839715 -> 5801926 (-0.65%) total gprs used in shared programs: 670553 -> 669853 (-0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21164 -> 21068 (-0.45%) local sharedgpr inst bytes helped 1 0 40825222522 hurt 0 0 232 23 23 Signed-off-by: Rhys Perry --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 126 ++--- 1 file changed, 112 insertions(+), 14 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index a6ddb284b8..e5d033c9b0 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -379,6 +379,10 @@ private: CmpInstruction *findOriginForTestWithZero(Value *); + Value *createMulMethod1(Value *a, unsigned b, Value *c); + Value *createMulMethod2(Value *a, unsigned b, Value *c); + Value *createMul(Value *a, unsigned b, Value *c); + unsigned int foldCount; BuildUtil bld; @@ -953,6 +957,88 @@ ConstantFolding::opnd3(Instruction *i, ImmediateValue ) } } +Value * +ConstantFolding::createMulMethod1(Value *a, unsigned b, Value *c) +{ + // basically constant folded shift and add multiplication. + Value *res = c ? c : bld.loadImm(NULL, 0u); + bool resZero = !c; + unsigned ashift = 0; + while (b) { + if ((b & 1) && ashift) { + Value *sh = bld.loadImm(NULL, ashift); + if (resZero) +res = bld.mkOp2v(OP_SHL, TYPE_U32, bld.getSSA(), a, sh); + else +res = bld.mkOp3v(OP_SHLADD, TYPE_U32, bld.getSSA(), a, sh, res); + resZero = false; + } else if (b & 1) { + if (resZero) +res = a; + else +res = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), res, a); + resZero = false; + } + b >>= 1; + ashift++; + } + return res; +} + +Value * +ConstantFolding::createMulMethod2(Value *a, unsigned b, Value *c) +{ + // basically does a * b2 - a * (b2 - b) + c + uint64_t b2 = util_next_power_of_two64(b); + unsigned b2shift = ffsll(b2) - 1; + + Value *mul1 = createMulMethod1(a, b2 - b, NULL); + + Value *res; + if (b2shift < 32) { + Instruction *i = bld.mkOp3(OP_SHLADD, TYPE_U32, bld.getSSA(), + a, bld.loadImm(NULL, b2shift), mul1); + res = i->getDef(0); + + // all targets supporting OP_SHLADD should pass this + assert(bld.getProgram()->getTarget()->isModSupported(i, 2, NV50_IR_MOD_NEG)); + i->src(2).mod *= Modifier(NV50_IR_MOD_NEG); + } else { + res = bld.mkOp1v(OP_NEG, TYPE_U32, bld.getSSA(), mul1); + } + + if (c) + res = bld.mkOp2v(OP_ADD, TYPE_U32, bld.getSSA(), res, c); + + return res; +} + +Value * +ConstantFolding::createMul(Value *a, unsigned b, Value *c) +{ + unsigned cost[2]; + + // estimate cost for first method (a << i) + (b << j) + ... + cost[0] = util_bitcount64(b >> 1); + + // estimate cost for second method (a << i) - ((a << j) + (a << k) + ...) + uint64_t rounded_b = util_next_power_of_two64(b); + cost[1] = rounded_b == b ? 1 : (util_bitcount64((rounded_b - b) >> 1) + 1); + if (c) cost[1]++; + + // The general method, multiplication by XMADs, costs three instructions. + // So nothing much larger than that or it could be making things worse. + if (cost[0] > 3 && cost[1] > 3) + return NULL; + + // the cost is the same for both methods with powers of twos + // but method 1 creates more optimizable code + if (cost[0] <= cost[1]) + return createMulMethod1(a, b, c); + else + return createMulMethod2(a, b, c); +} + void ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) { @@ -1040,13 +1126,25 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue , int s) i->setSrc(s, i->getSrc(t)); i->src(s).mod = i->src(t).mod; } else - if (!isFloatType(i->sType) && !imm0.isNegative() && imm0.isPow2()) { - i->op = OP_SHL; - imm0.applyLog2(); - i->setSrc(0, i->getSrc(t)); - i->src(0).mod = i->src(t).mod; - i->setSrc(1, new_ImmediateValue(prog, imm0.reg.data.u32)); - i->src(1).mod = 0; + if (!isFloatType(i->dType)) { + bool optimized = false; + if (target->isOpSupported(OP_SHLADD, TYPE_U32)) { +bld.setPosition(i, false); +Value *val = createMul(i->getSrc(t), imm0.reg.data.u32, NULL); +if (val) { + i->def(0).replace(val, false); + delete_Instruction(prog, i); + optimized = true; +
[Mesa-dev] [PATCH v3 0/4] nv50/ir: Improve Performance of Integer Multiplication
Changes in v3: - stylistic changes - simplify createMulMethod2() - update shader-db statistics - use util_bitcount64 and util_next_power_of_two64 instead of reimplementing them Changes in v2: - rebase - bring back constant folding for multiplication by power-of-twos for nv50 - remove TODO in nv50_ir_target_gm107.cpp - document XMAD's flags - change how XMAD's per-operand flags are represented - move util/bitscan.h stuff into a new patch - stylistic changes This series improve the performance of integer multiplication by removing much usage of the very slow IMAD and IMUL on Maxwell+ and improving multiplication by immediates on Fermi+. The first and second patch add support for the XMAD instruction in codegen The third patch replaces most IMADs and IMULs with a sequence of XMADs on Maxwell+. This is far faster but increases the total instructions in the shader-db by 0.90%, gpr count by 0.10% and local memory by 0.46%. The next patch significantly lowers this number. It replaces many multiplications by immediates with instructions that should be as fast or faster than the XMAD approach. They are also typically smaller and less register heavy, so they decrease the total instruction count by -0.65% and bring the gpr count and local memory back to normal. This series gives about a ~50% speedup in fragment-heavy scenaries with Dolphin 5.0 on my GTX 1060. All timings were made with interesting looking fifos from Dolphin's bugtracker: Wind Waker: 18 FPS -> 26 FPS at 3x internal resolution Wind Waker: 8 FPS -> 11 FPS at 5x internal resolution Paper Mario?: 26 FPS -> 42 FPS at 5x internal resolution SpongeBob Movie: 19 FPS -> 30 FPS at 5x internal resolution Unigine Heaven and Unigine Valley seems to run the same at low quality with no anti-aliasing and no tessellation. SuperTuxKart and 0 A.D. also show no change. It's possible these patches may break something. Piglit shows no functionality regressions though they should probably be tested for improvements or breakage with actual applications. These patches can also be found on my github: https://github.com/pendingchaos/mesa/tree/nv-xmad-v3 The final changes in shader-db are as follows: total instructions in shared programs : 5787704 -> 5801926 (0.25%) total gprs used in shared programs: 669878 -> 669853 (-0.00%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21068 (0.00%) local sharedgpr inst bytes helped 0 0 280 717 717 hurt 0 0 29821712171 Rhys Perry (4): nv50/ir: add preliminary support for OP_XMAD gm107/ir: add support for OP_XMAD on GM107+ nv50/ir: optimize imul/imad to xmads nv50/ir: further optimize multiplication by immediates src/gallium/drivers/nouveau/codegen/nv50_ir.h | 26 +++ .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 65 +++ .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 200 +++-- .../drivers/nouveau/codegen/nv50_ir_print.cpp | 19 ++ .../drivers/nouveau/codegen/nv50_ir_target.cpp | 7 +- .../nouveau/codegen/nv50_ir_target_gm107.cpp | 6 +- .../nouveau/codegen/nv50_ir_target_nv50.cpp| 1 + .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 16 ++ 8 files changed, 320 insertions(+), 20 deletions(-) -- 2.14.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 3/4] nv50/ir: optimize imul/imad to xmads
This hits the shader-db numbers a good bit, though a few xmads is way faster than an imul or imad and the cost is mitigated by the next commit, which optimizes many multiplications by immediates into shorter and less register heavy instructions than the xmads. total instructions in shared programs : 5787704 -> 5839715 (0.90%) total gprs used in shared programs: 669878 -> 670553 (0.10%) total shared used in shared programs : 548832 -> 548832 (0.00%) total local used in shared programs : 21068 -> 21164 (0.46%) local sharedgpr inst bytes helped 0 0 39 0 0 hurt 1 0 36530763076 Signed-off-by: Rhys Perry --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 56 ++ .../nouveau/codegen/nv50_ir_target_gm107.cpp | 1 - 2 files changed, 56 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 6deea7a360..a6ddb284b8 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -2292,13 +2292,18 @@ AlgebraicOpt::visit(BasicBlock *bb) // = // ADD(SHL(a, b), c) -> SHLADD(a, b, c) +// MUL(a, b) -> a few XMADs +// MAD/FMA(a, b, c) -> a few XMADs class LateAlgebraicOpt : public Pass { private: virtual bool visit(Instruction *); void handleADD(Instruction *); + void handleMULMAD(Instruction *); bool tryADDToSHLADD(Instruction *); + + BuildUtil bld; }; void @@ -2359,6 +2364,52 @@ LateAlgebraicOpt::tryADDToSHLADD(Instruction *add) return true; } +// MUL(a, b) -> a few XMADs +// MAD/FMA(a, b, c) -> a few XMADs +void +LateAlgebraicOpt::handleMULMAD(Instruction *i) +{ + // TODO: handle NV50_IR_SUBOP_MUL_HIGH + if (!prog->getTarget()->isOpSupported(OP_XMAD, TYPE_U32)) + return; + if (isFloatType(i->dType) || typeSizeof(i->dType) != 4) + return; + if (i->subOp || i->usesFlags() || i->flagsDef >= 0) + return; + + assert(!i->src(0).mod); + assert(!i->src(1).mod); + assert(i->op == OP_MUL ? 1 : !i->src(2).mod); + + bld.setPosition(i, false); + + Value *a = i->getSrc(0); + Value *b = i->getSrc(1); + Value *c = i->op == OP_MUL ? bld.mkImm(0) : i->getSrc(2); + + Value *tmp0 = bld.getSSA(); + Value *tmp1 = bld.getSSA(); + + Instruction *insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp0, b, a, c); + insn->setPredicate(i->cc, i->getPredicate()); + + insn = bld.mkOp3(OP_XMAD, TYPE_U32, tmp1, b, a, bld.mkImm(0)); + insn->setPredicate(i->cc, i->getPredicate()); + insn->subOp = NV50_IR_SUBOP_XMAD_MRG | NV50_IR_SUBOP_XMAD_H1(1); + + Value *pred = i->getPredicate(); + i->setPredicate(i->cc, NULL); + + i->op = OP_XMAD; + i->setSrc(0, b); + i->setSrc(1, tmp1); + i->setSrc(2, tmp0); + i->subOp = NV50_IR_SUBOP_XMAD_PSL | NV50_IR_SUBOP_XMAD_CBCC; + i->subOp |= NV50_IR_SUBOP_XMAD_H1(0) | NV50_IR_SUBOP_XMAD_H1(1); + + i->setPredicate(i->cc, pred); +} + bool LateAlgebraicOpt::visit(Instruction *i) { @@ -2366,6 +2417,11 @@ LateAlgebraicOpt::visit(Instruction *i) case OP_ADD: handleADD(i); break; + case OP_MUL: + case OP_MAD: + case OP_FMA: + handleMULMAD(i); + break; default: break; } diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp index bb1c234c43..edb823afb4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp @@ -166,7 +166,6 @@ TargetGM107::isBarrierRequired(const Instruction *insn) const } break; case OPCLASS_ARITH: - // TODO: IMUL/IMAD require barriers too, use of XMAD instead! if ((insn->op == OP_MUL || insn->op == OP_MAD) && !isFloatType(insn->dType)) return true; -- 2.14.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v3 2/4] gm107/ir: add support for OP_XMAD on GM107+
Signed-off-by: Rhys Perry --- .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 65 ++ .../nouveau/codegen/nv50_ir_target_gm107.cpp | 6 +- .../nouveau/codegen/nv50_ir_target_nvc0.cpp| 1 + 3 files changed, 71 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index 1d31f181e4..c3d7be0f0e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -156,6 +156,7 @@ private: void emitIMUL(); void emitIMAD(); void emitISCADD(); + void emitXMAD(); void emitIMNMX(); void emitICMP(); void emitISET(); @@ -1892,6 +1893,67 @@ CodeEmitterGM107::emitISCADD() emitGPR (0x00, insn->def(0)); } +void +CodeEmitterGM107::emitXMAD() +{ + assert(insn->src(0).getFile() == FILE_GPR); + + bool constbuf = false; + bool psl_mrg = true; + bool immediate = false; + if (insn->src(2).getFile() == FILE_MEMORY_CONST) { + assert(insn->src(1).getFile() == FILE_GPR); + constbuf = true; + psl_mrg = false; + emitInsn(0x5100); + emitGPR(0x27, insn->src(1)); + emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(2)); + } else if (insn->src(1).getFile() == FILE_MEMORY_CONST) { + assert(insn->src(2).getFile() == FILE_GPR); + constbuf = true; + emitInsn(0x4e00); + emitCBUF(0x22, -1, 0x14, 16, 2, insn->src(1)); + emitGPR(0x27, insn->src(2)); + } else if (insn->src(1).getFile() == FILE_IMMEDIATE) { + assert(insn->src(2).getFile() == FILE_GPR); + assert(!(insn->subOp & NV50_IR_SUBOP_XMAD_H1(1))); + immediate = false; + emitInsn(0x3600); + emitIMMD(0x14, 19, insn->src(1)); + emitGPR(0x27, insn->src(2)); + } else { + assert(insn->src(1).getFile() == FILE_GPR); + assert(insn->src(2).getFile() == FILE_GPR); + emitInsn(0x5b00); + emitGPR(0x14, insn->src(1)); + emitGPR(0x27, insn->src(2)); + } + + if (psl_mrg) + emitField(constbuf ? 0x37 : 0x24, 2, insn->subOp & 0x3); + + unsigned cmode = (insn->subOp & NV50_IR_SUBOP_XMAD_CMODE_MASK); + cmode >>= NV50_IR_SUBOP_XMAD_CMODE_SHIFT; + emitField(0x32, constbuf ? 2 : 3, cmode); + + emitX(constbuf ? 0x36 : 0x26); + emitCC(0x2f); + + emitGPR(0x0, insn->def(0)); + emitGPR(0x8, insn->src(0)); + + // source flags + if (isSignedType(insn->sType)) { + uint16_t h1s = insn->subOp & NV50_IR_SUBOP_XMAD_H1_MASK; + emitField(0x30, 2, h1s >> NV50_IR_SUBOP_XMAD_H1_SHIFT); + } + emitField(0x35, 1, insn->subOp & NV50_IR_SUBOP_XMAD_H1(0) ? 1 : 0); + if (!immediate) { + bool h1 = insn->subOp & NV50_IR_SUBOP_XMAD_H1(1); + emitField(constbuf ? 0x34 : 0x23, 1, h1); + } +} + void CodeEmitterGM107::emitIMNMX() { @@ -3266,6 +3328,9 @@ CodeEmitterGM107::emitInstruction(Instruction *i) case OP_SHLADD: emitISCADD(); break; + case OP_XMAD: + emitXMAD(); + break; case OP_MIN: case OP_MAX: if (isFloatType(insn->dType)) { diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp index 7293fb27dd..bb1c234c43 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_gm107.cpp @@ -60,8 +60,11 @@ TargetGM107::isOpSupported(operation op, DataType ty) const case OP_SQRT: case OP_DIV: case OP_MOD: - case OP_XMAD: return false; + case OP_XMAD: + if (isFloatType(ty)) + return false; + break; default: break; } @@ -231,6 +234,7 @@ TargetGM107::getLatency(const Instruction *insn) const case OP_SUB: case OP_VOTE: case OP_XOR: + case OP_XMAD: if (insn->dType != TYPE_F64) return 6; break; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp index 7e66d2950b..5257f353e4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nvc0.cpp @@ -161,6 +161,7 @@ static const struct opProperties _initPropsGM107[] = { { OP_SUSTP, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4 }, { OP_SUREDB, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4 }, { OP_SUREDP, 0x0, 0x0, 0x0, 0x0, 0x0, 0x4 }, + { OP_XMAD,0x0, 0x0, 0x0, 0x0, 0x6, 0x2 }, }; void TargetNVC0::initProps(const struct opProperties *props, int size) -- 2.14.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 07/15] spirv/nir: Set info.gs.uses_end_primitive
On 20/07/18 18:32, Jason Ekstrand wrote: > This should already be handled by nir_gather_info. Is there some > reason why we need to do it here as well? Seems so. I have just tried to remove this patch, and I got no regressions with run our specific tests, and a full run with the borrowed tests. So lets drop it for now. > > On July 20, 2018 08:09:26 Alejandro Piñeiro wrote: > >> From: Neil Roberts >> >> Whenever SpvOpEndPrimitive or SpvOpEndStreamPrimitive is encountered >> it now sets uses_end_primitive to true. This reflects the code in >> validate_geometry_shader_emissions for GLSL. >> --- >> src/compiler/spirv/spirv_to_nir.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/src/compiler/spirv/spirv_to_nir.c >> b/src/compiler/spirv/spirv_to_nir.c >> index 238298a8340..4d297e60b60 100644 >> --- a/src/compiler/spirv/spirv_to_nir.c >> +++ b/src/compiler/spirv/spirv_to_nir.c >> @@ -3208,6 +3208,7 @@ vtn_handle_barrier(struct vtn_builder *b, SpvOp >> opcode, >> case SpvOpEndPrimitive: >> case SpvOpEndStreamPrimitive: >> intrinsic_op = nir_intrinsic_end_primitive; >> + b->shader->info.gs.uses_end_primitive = true; >> break; >> default: >> unreachable("Invalid opcode"); >> -- >> 2.14.1 >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 2/2] i965, anv: Use INTEL_DEBUG for disk_cache driver flags
Hi, Should this fix also: https://bugs.freedesktop.org/show_bug.cgi?id=106382 ? - Eero On 23.07.2018 07:27, Jordan Justen wrote: Since various options within INTEL_DEBUG could impact code generation, we need to set the disk cache driver_flags parameter based on the INTEL_DEBUG flags in use. An example that will affect the program generated by i965 is the INTEL_DEBUG=nocompact option. The DEBUG_DISK_CACHE_MASK value is added to mask the settings of INTEL_DEBUG that can affect program generation. v2: * Use driver_flags (Tim) * Also update Anvil (Jason) Signed-off-by: Jordan Justen Reviewed-by: Lionel Landwerlin (v1) --- src/intel/common/gen_debug.h | 5 + src/intel/vulkan/anv_device.c | 3 ++- src/mesa/drivers/dri/i965/brw_disk_cache.c | 3 ++- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/intel/common/gen_debug.h b/src/intel/common/gen_debug.h index f6c44eeb912..aa9f3cf80d7 100644 --- a/src/intel/common/gen_debug.h +++ b/src/intel/common/gen_debug.h @@ -84,6 +84,11 @@ extern uint64_t INTEL_DEBUG; #define DEBUG_COLOR (1ull << 40) #define DEBUG_REEMIT (1ull << 41) +/* These flags may affect program generation */ +#define DEBUG_DISK_CACHE_MASK \ + (DEBUG_SHADER_TIME | DEBUG_NO16 | DEBUG_NO_DUAL_OBJECT_GS | DEBUG_NO8 | \ + DEBUG_SPILL_FS | DEBUG_SPILL_VEC4 | DEBUG_NO_COMPACTION | DEBUG_DO32) + #ifdef HAVE_ANDROID_PLATFORM #define LOG_TAG "INTEL-MESA" #if ANDROID_API_LEVEL >= 26 diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index 247ba641336..97a71563b8a 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -286,7 +286,8 @@ anv_physical_device_init_disk_cache(struct anv_physical_device *device) char timestamp[41]; _mesa_sha1_format(timestamp, device->driver_build_sha1); - device->disk_cache = disk_cache_create(renderer, timestamp, 0); + const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK; + device->disk_cache = disk_cache_create(renderer, timestamp, driver_flags); #else device->disk_cache = NULL; #endif diff --git a/src/mesa/drivers/dri/i965/brw_disk_cache.c b/src/mesa/drivers/dri/i965/brw_disk_cache.c index a678c355b9d..8f1b064fd61 100644 --- a/src/mesa/drivers/dri/i965/brw_disk_cache.c +++ b/src/mesa/drivers/dri/i965/brw_disk_cache.c @@ -393,6 +393,7 @@ brw_disk_cache_init(struct intel_screen *screen) char timestamp[41]; _mesa_sha1_format(timestamp, id_sha1); - screen->disk_cache = disk_cache_create(renderer, timestamp, 0); + const uint64_t driver_flags = INTEL_DEBUG & DEBUG_DISK_CACHE_MASK; + screen->disk_cache = disk_cache_create(renderer, timestamp, driver_flags); #endif } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] RFC : Context aware user space Resource control
On Fri, Jul 20, 2018 at 10:59:03AM +0100, Lionel Landwerlin wrote: > On 20/07/18 09:32, aravindan.muthuku...@intel.com wrote: > >diff --git a/src/egl/generate/egl.xml b/src/egl/generate/egl.xml > >index 9250f93..52b0c9f 100644 > >--- a/src/egl/generate/egl.xml > >+++ b/src/egl/generate/egl.xml > >@@ -460,6 +460,14 @@ > > > > > >+ >comment="Reserved for context aware load type"> > >+ > >+ > >+ > >+ > >+ > >+ > >+ > > > comment="Reserved for Tim Renouf, Antix (Khronos bug 4949)"> > > > > > > Are you defining a new extension here? > The chunk above seems to imply this is from IMG, but still commented as > reserved. > > As far as I understand enums need to be allocated through Khronos group > before you can make use of them. > Otherwise this will be unusable by applications as soon as another extension > reuses the same enums... Hello Lionel, Thank you for the comments, yes we used IMG extension as a template and were wondering how the xml is modified , when we found the procedure for proposing an extension : https://www.khronos.org/registry/OpenGL/docs/rules.html , just wondering how does it work for RFC , Do we need to get through the Khronos process first then send for RFC or is the other way round ok ? I shall dig into the mail archives for any existing extension and how it came into being. Thanks, Kedar > > Thanks, > > - > Lionel > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: update calendar to match the 18.2 plan with the one announced
Reviewed-by: Juan A. Suarez J.A. On Thu, 2018-07-19 at 15:21 -0700, Dylan Baker wrote: > Quoting Andres Gomez (2018-07-19 06:03:11) > > Additionally, I've extended the 18.1 cycle by one more release, > > tentatively assigned to Dylan, due to the ~2 weeks delay for 18.2. > > This seems like a good plan to me, and I have no problem making an additional > release. > > Acked-by: Dylan Baker > > > > > Cc: Dylan Baker > > Cc: Juan A. Suarez > > Cc: Emil Velikov > > Signed-off-by: Andres Gomez > > --- > > docs/release-calendar.html | 16 +++- > > 1 file changed, 11 insertions(+), 5 deletions(-) > > > > diff --git a/docs/release-calendar.html b/docs/release-calendar.html > > index 7f029b5b396..8e23b7f9e05 100644 > > --- a/docs/release-calendar.html > > +++ b/docs/release-calendar.html > > @@ -39,7 +39,7 @@ if you'd like to nominate a patch in the next stable > > release. > > Notes > > > > > > -18.1 > > +18.1 > > 2018-07-27 > > 18.1.5 > > Dylan Baker > > @@ -55,29 +55,35 @@ if you'd like to nominate a patch in the next stable > > release. > > 2018-08-24 > > 18.1.7 > > Dylan Baker > > + > > + > > + > > +2018-09-07 > > +18.1.8 > > +Dylan Baker > > Last planned 18.1.x release > > > > > > 18.2 > > -2018-07-20 > > +2018-08-01 > > 18.2.0rc1 > > Andres Gomez > > > > > > > > -2018-07-27 > > +2018-08-08 > > 18.2.0rc2 > > Andres Gomez > > > > > > > > -2018-08-03 > > +2018-08-15 > > 18.2.0rc3 > > Andres Gomez > > > > > > > > -2018-08-10 > > +2018-08-22 > > 18.2.0rc4 > > Andres Gomez > > Last planned RC/Final release > > -- > > 2.18.0 > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] docs: move releases from Fridays to Wednesdays
Reviewed-by: Juan A. Suarez Reviewed/Acked/Agreed, whatever makes more sense :) J.A. On Thu, 2018-07-19 at 16:00 +0300, Andres Gomez wrote: > As discussed at: > https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html > > Cc: Emil Velikov > Cc: Juan A. Suarez Romero > Cc: Dylan Baker > Cc: Ian Romanick > Cc: Carl Worth > Cc: Mark Janes > Signed-off-by: Andres Gomez > --- > docs/releasing.html | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/docs/releasing.html b/docs/releasing.html > index a022d0c484b..14315e7a8e4 100644 > --- a/docs/releasing.html > +++ b/docs/releasing.html > @@ -54,8 +54,8 @@ For example: > Release schedule > > > -Releases should happen on Fridays. Delays can occur although those should be > keep > -to a minimum. > +Releases should happen on Wednesdays. Delays can occur although those > +should be keep to a minimum. > > See our calendar for the > date and other details for individual releases. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 106590] Wrong line numbers expanded while compiling shaders
https://bugs.freedesktop.org/show_bug.cgi?id=106590 --- Comment #3 from Zhaowei Yuan --- I posted patch v2 to cc more maintainer: https://patchwork.freedesktop.org/patch/240324/ -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107156] earth tessellation bug
https://bugs.freedesktop.org/show_bug.cgi?id=107156 Gregor Münch changed: What|Removed |Added Resolution|--- |DUPLICATE Status|REOPENED|RESOLVED --- Comment #10 from Gregor Münch --- *** This bug has been marked as a duplicate of bug 107276 *** -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 107276] radv: OpBitfieldUExtract returns incorrect result when count is zero
https://bugs.freedesktop.org/show_bug.cgi?id=107276 Gregor Münch changed: What|Removed |Added CC||ximi...@mail.ru --- Comment #6 from Gregor Münch --- *** Bug 107156 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
According to the spec, these should apply to all read/write access types (so would be equivalent to specifying all other access types individually). Currently, they were doing nothing. v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask. Signed-off-by: Alex Smith Cc: mesa-sta...@lists.freedesktop.org --- src/intel/vulkan/anv_private.h | 9 + 1 file changed, 9 insertions(+) diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index cec2842792..1660fcbbc8 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags flags) pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT; pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT; break; + case VK_ACCESS_MEMORY_WRITE_BIT: + pipe_bits |= ANV_PIPE_FLUSH_BITS; + break; default: break; /* Nothing to do */ } @@ -1761,6 +1764,12 @@ anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags) case VK_ACCESS_TRANSFER_READ_BIT: pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT; break; + case VK_ACCESS_MEMORY_READ_BIT: + pipe_bits |= ANV_PIPE_INVALIDATE_BITS; + break; + case VK_ACCESS_MEMORY_WRITE_BIT: + pipe_bits |= ANV_PIPE_FLUSH_BITS; + break; default: break; /* Nothing to do */ } -- 2.14.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2] glcpp: Sync line number for macro
Line number of a predefined macro should be set as where it is referenced rather than declared Patch v2 does nothing more except ccing more maintainers Signed-off-by: zhaowei yuan Bugzilla: https://patchwork.freedesktop.org/patch/225818/ --- src/compiler/glsl/glcpp/glcpp-lex.l | 1 + src/compiler/glsl/glcpp/glcpp-parse.y | 55 ++- src/compiler/glsl/glcpp/glcpp.h | 4 ++- src/compiler/glsl/glcpp/pp.c | 3 +- 4 files changed, 54 insertions(+), 9 deletions(-) diff --git a/src/compiler/glsl/glcpp/glcpp-lex.l b/src/compiler/glsl/glcpp/glcpp-lex.l index 9cfcc12..86b82c2 100644 --- a/src/compiler/glsl/glcpp/glcpp-lex.l +++ b/src/compiler/glsl/glcpp/glcpp-lex.l @@ -50,6 +50,7 @@ void glcpp_set_column (int column_no , yyscan_t yyscanner); yylloc->first_line = yylloc->last_line = yylineno; \ yycolumn += yyleng; \ yylloc->last_column = yycolumn + 1; \ + yylloc->position = (yytext - YY_CURRENT_BUFFER_LVALUE->yy_ch_buf); \ parser->has_new_line_number = 0;\ parser->has_new_source_number = 0; \ } while(0); diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y b/src/compiler/glsl/glcpp/glcpp-parse.y index ccb3aa1..68f8477 100644 --- a/src/compiler/glsl/glcpp/glcpp-parse.y +++ b/src/compiler/glsl/glcpp/glcpp-parse.y @@ -1021,7 +1021,7 @@ _token_list_append_list(token_list_t *list, token_list_t *tail) } static token_list_t * -_token_list_copy(glcpp_parser_t *parser, token_list_t *other) +_token_list_copy(glcpp_parser_t *parser, token_list_t *other, token_node_t *macro_node) { token_list_t *copy; token_node_t *node; @@ -1033,6 +1033,12 @@ _token_list_copy(glcpp_parser_t *parser, token_list_t *other) for (node = other->head; node; node = node->next) { token_t *new_token = linear_alloc_child(parser->linalloc, sizeof(token_t)); *new_token = *node->token; + + if(macro_node) { + new_token->location.first_line = macro_node->token->location.first_line; + new_token->location.last_line = macro_node->token->location.last_line; + } + _token_list_append (parser, copy, new_token); } @@ -1349,7 +1355,7 @@ add_builtin_define(glcpp_parser_t *parser, const char *name, int value) glcpp_parser_t * glcpp_parser_create(const struct gl_extensions *extension_list, -glcpp_extension_iterator extensions, void *state, gl_api api) +glcpp_extension_iterator extensions, void *state, gl_api api, const char *input) { glcpp_parser_t *parser; @@ -1377,6 +1383,11 @@ glcpp_parser_create(const struct gl_extensions *extension_list, parser->lex_from_list = NULL; parser->lex_from_node = NULL; + parser->input = _mesa_string_buffer_create(parser, strlen(input) + 1); + strcpy(parser->input->buf, input); + parser->input->buf[strlen(input)] = '\0'; + parser->input->length = strlen(input); + parser->output = _mesa_string_buffer_create(parser, INITIAL_PP_OUTPUT_BUF_SIZE); parser->info_log = _mesa_string_buffer_create(parser, @@ -1441,7 +1452,7 @@ typedef enum function_status static function_status_t _arguments_parse(glcpp_parser_t *parser, argument_list_t *arguments, token_node_t *node, - token_node_t **last) + token_node_t **last, int *end_position) { token_list_t *argument; int paren_count; @@ -1465,8 +1476,10 @@ _arguments_parse(glcpp_parser_t *parser, paren_count++; } else if (node->token->type == ')') { paren_count--; - if (paren_count == 0) + if (paren_count == 0) { +*end_position = node->token->location.position; break; + } } if (node->token->type == ',' && paren_count == 1) { @@ -1702,6 +1715,28 @@ _glcpp_parser_apply_pastes(glcpp_parser_t *parser, token_list_t *list) list->non_space_tail = list->tail; } +static int +_glcpp_parser_get_line(glcpp_parser_t *parser, int offset) +{ + int line = 1; + int i; + + for(i = 0; parser->input->buf[i] && i <= offset; i++) { + if(parser->input->buf[i] == '\n' || parser->input->buf[i] == '\r') + line++; + } + + return line; +} + +static void +_glcpp_sync_location(glcpp_parser_t *parser, token_t *token, token_t *macro_token) +{ + token->location.source = macro_token->location.source; + token->location.first_line = _glcpp_parser_get_line(parser, token->location.position); + token->location.last_line = token->location.first_line; +} + /* This is a helper function that's essentially part of the * implementation of _glcpp_parser_expand_node. It shouldn't be called * except for by that function. @@ -1732,7 +1767,9 @@ _glcpp_parser_expand_function(glcpp_parser_t *parser,
Re: [Mesa-dev] [PATCH] anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
On 20 July 2018 at 19:01, Jason Ekstrand wrote: > On Fri, Jul 20, 2018 at 8:37 AM Lionel Landwerlin < > lionel.g.landwer...@intel.com> wrote: > >> On 20/07/18 11:44, Alex Smith wrote: >> >> According to the spec, these should apply to all read/write access >> types (so would be equivalent to specifying all other access types >> individually). Currently, they were doing nothing. >> >> Signed-off-by: Alex Smith >> >> Cc: mesa-sta...@lists.freedesktop.org >> --- >> src/intel/vulkan/anv_private.h | 6 ++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h >> index cec2842792..775bacaff2 100644 >> --- a/src/intel/vulkan/anv_private.h >> +++ b/src/intel/vulkan/anv_private.h >> @@ -1731,6 +1731,9 @@ anv_pipe_flush_bits_for_access_flags(VkAccessFlags >> flags) >> pipe_bits |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT; >> pipe_bits |= ANV_PIPE_DEPTH_CACHE_FLUSH_BIT; >> break; >> + case VK_ACCESS_MEMORY_WRITE_BIT: >> + pipe_bits |= ANV_PIPE_FLUSH_BITS; >> + break; >>default: >> break; /* Nothing to do */ >>} >> @@ -1761,6 +1764,9 @@ >> anv_pipe_invalidate_bits_for_access_flags(VkAccessFlags flags) >>case VK_ACCESS_TRANSFER_READ_BIT: >> pipe_bits |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT; >> break; >> + case VK_ACCESS_MEMORY_READ_BIT: >> + pipe_bits |= ANV_PIPE_INVALIDATE_BITS; >> + break; >> >> >> I know this function is a bit oddly named for that, but with this part of >> the spec regarding VK_ACCESS_MEMORY_WRITE_BIT : >> >> " >> >>- >> >>When included in a destination access mask, makes all available >>writes visible to all future write accesses on entities known to the >> Vulkan >>device. >> >> " >> >> I would also add : >> >> case VK_ACCESS_MEMORY_WRITE_BIT: >> pipe_bits |= ANV_PIPE_FLUSH_BITS; >> break; >> >> Does that sound fair? >> > > That's quite the heavy hammer But I think it's the right thing to do. > Yes - these bits are supposed to be the heaviest hammer there is. I'll add that. Alex ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Add a couple of iand/ior optimizations
Ian and I have been looking at these type of things recently. Ian has started work on a pass to cover this stuff without having to add dozens of these types of opts. https://lists.freedesktop.org/archives/mesa-dev/2018-July/200583.html On 23/07/18 17:36, Jason Ekstrand wrote: Spotted in a shader in Batman: Arkham City. --- src/compiler/nir/nir_opt_algebraic.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py index ba277fdfd0e..f2007852b21 100644 --- a/src/compiler/nir/nir_opt_algebraic.py +++ b/src/compiler/nir/nir_opt_algebraic.py @@ -377,6 +377,8 @@ optimizations = [ (('ixor', a, a), 0), (('ixor', a, 0), a), (('inot', ('inot', a)), a), + (('ior', ('iand', a, b), b), b), + (('iand', ('ior', a, b), b), b), # DeMorgan's Laws (('iand', ('inot', a), ('inot', b)), ('inot', ('ior', a, b))), (('ior', ('inot', a), ('inot', b)), ('inot', ('iand', a, b))), ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 5/8] nir: add complex_loop bool to loop info
In order to be sure loop_terminator_list is an accurate representation of all the jumps in the loop we need to be sure we didn't encounter any other complex behaviour such as continues, nested breaks, etc during analysis. This will be used in the following patch. --- src/compiler/nir/nir.h | 6 ++ src/compiler/nir/nir_loop_analyze.c | 8 ++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h index 3bfe7d7f7bf..336a2ca5ac4 100644 --- a/src/compiler/nir/nir.h +++ b/src/compiler/nir/nir.h @@ -1745,6 +1745,12 @@ typedef struct { /* Unroll the loop regardless of its size */ bool force_unroll; + /* Does the loop contain complex loop terminators, continues or other +* complex behaviours? If this is true we can't rely on +* loop_terminator_list to be complete or accurate. +*/ + bool complex_loop; + nir_loop_terminator *limiting_terminator; /* A list of loop_terminators terminating this loop. */ diff --git a/src/compiler/nir/nir_loop_analyze.c b/src/compiler/nir/nir_loop_analyze.c index 5454b7691ba..9c3fd2f286f 100644 --- a/src/compiler/nir/nir_loop_analyze.c +++ b/src/compiler/nir/nir_loop_analyze.c @@ -317,15 +317,19 @@ find_loop_terminators(loop_info_state *state) * not find a loop terminator, but there is a break-statement then * we should return false so that we do not try to find trip-count */ - if (!nir_is_trivial_loop_if(nif, break_blk)) + if (!nir_is_trivial_loop_if(nif, break_blk)) { +state->loop->info->complex_loop = true; return false; + } /* Continue if the if contained no jumps at all */ if (!break_blk) continue; - if (nif->condition.ssa->parent_instr->type == nir_instr_type_phi) + if (nif->condition.ssa->parent_instr->type == nir_instr_type_phi) { +state->loop->info->complex_loop = true; return false; + } nir_loop_terminator *terminator = rzalloc(state->loop->info, nir_loop_terminator); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 8/8] nir: add loop unroll support for complex wrapper loops
In GLSL IR we cheat with switch statements and simply convert them into loops with a single iteration. This allowed us to make use of the existing jump instruction handling provided by the loop handing code, it also allows dead code to be cleaned up once we have wrapped the code in a loop. However using loops in this way created previously unrollable loops which limits further optimisations. Here we provide a way to unroll loops that end in a break and have multiple other exits. All shader-db changes are from the dolphin uber shaders. There is a small amount of HURT shaders but in general the improvements far exceed the HURT. shader-db results IVB: total instructions in shared programs: 10018187 -> 10016468 (-0.02%) instructions in affected programs: 104080 -> 102361 (-1.65%) helped: 36 HURT: 15 total cycles in shared programs: 220065064 -> 154529655 (-29.78%) cycles in affected programs: 126063017 -> 60527608 (-51.99%) helped: 51 HURT: 0 total loops in shared programs: 2515 -> 2308 (-8.23%) loops in affected programs: 903 -> 696 (-22.92%) helped: 51 HURT: 0 total spills in shared programs: 4370 -> 4124 (-5.63%) spills in affected programs: 1397 -> 1151 (-17.61%) helped: 9 HURT: 12 total fills in shared programs: 4581 -> 4419 (-3.54%) fills in affected programs: 2201 -> 2039 (-7.36%) helped: 9 HURT: 15 --- src/compiler/nir/nir_opt_loop_unroll.c | 113 + 1 file changed, 76 insertions(+), 37 deletions(-) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index 9c33267cb72..fa60523aac7 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -67,7 +67,6 @@ loop_prepare_for_unroll(nir_loop *loop) /* Remove continue if its the last instruction in the loop */ nir_instr *last_instr = nir_block_last_instr(nir_loop_last_block(loop)); if (last_instr && last_instr->type == nir_instr_type_jump) { - assert(nir_instr_as_jump(last_instr)->type == nir_jump_continue); nir_instr_remove(last_instr); } } @@ -474,54 +473,91 @@ complex_unroll(nir_loop *loop, nir_loop_terminator *unlimit_term, static bool wrapper_unroll(nir_loop *loop) { - bool progress = false; - - nir_block *blk_after_loop = - nir_cursor_current_block(nir_after_cf_node(>cf_node)); - - /* There may still be some single src phis following the loop that -* have not yet been cleaned up by another pass. Tidy those up before -* unrolling the loop. -*/ - nir_foreach_instr_safe(instr, blk_after_loop) { - if (instr->type != nir_instr_type_phi) - break; + if (!list_empty(>info->loop_terminator_list)) { - nir_phi_instr *phi = nir_instr_as_phi(instr); - assert(exec_list_length(>srcs) == 1); + /* Unrolling a loop with a large number of exits can result in a + * large inrease in register pressure. For now we just skip + * unrolling if we have more than 3 exits (not including the break + * at the end of the loop). + * + * TODO: Most loops that fit this pattern are simply switch + * statements that are converted to a loop to take advantage of + * exiting jump instruction handling. In this case we could make + * use of a binary seach pattern like we do in + * nir_lower_indirect_derefs(), this should allow us to unroll the + * loops in an optimal way and should also avoid some of the + * register pressure that comes from simply nesting the + * terminators one after the other. + */ + if (list_length(>info->loop_terminator_list) > 3) + return false; + + loop_prepare_for_unroll(loop); + + nir_cursor cursor = nir_after_block(nir_loop_last_block(loop)); + list_for_each_entry(nir_loop_terminator, terminator, + >info->loop_terminator_list, + loop_terminator_link) { + + /* Remove break from the terminator */ + nir_instr *break_instr = +nir_block_last_instr(terminator->break_block); + nir_instr_remove(break_instr); + + /* Pluck out the loop body. */ + nir_cf_list loop_body; + nir_cf_extract(_body, +nir_after_cf_node(>nif->cf_node), +cursor); + + /* Reinsert loop body into continue from block */ + nir_cf_reinsert(_body, + nir_after_block(terminator->continue_from_block)); + + cursor = terminator->continue_from_then ? + nir_after_block(nir_if_last_then_block(terminator->nif)) : + nir_after_block(nir_if_last_else_block(terminator->nif)); + } + } else { + nir_block *blk_after_loop = + nir_cursor_current_block(nir_after_cf_node(>cf_node)); - nir_phi_src *phi_src = exec_node_data(nir_phi_src, -exec_list_get_head(>srcs), -node); + /* There may
[Mesa-dev] [PATCH v2 6/8] nir/opt_loop_unroll: Remove unneeded phis if we make progress
Now that SSA values can be derefs and they have special rules, we have to be a bit more careful about our LCSSA phis. In particular, we need to clean up in case LCSSA ended up creating a phi node for a deref. This avoids validation issues with some CTS tests with the following patch, but its possible this we could also see the same problem with the existing unrolling passes. --- src/compiler/nir/nir_opt_loop_unroll.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index a1ad0e59814..e0e0b754716 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -575,9 +575,17 @@ nir_opt_loop_unroll_impl(nir_function_impl *impl, _nested_loop); } - if (progress) + if (progress) { nir_lower_regs_to_ssa_impl(impl); + /* Calling nir_convert_loop_to_lcssa() adds extra phi nodes which may + * not be valid if they're used for something such as a deref. + * Remove any unneeded phis. + */ + nir_copy_prop(impl->function->shader); + nir_opt_remove_phis_impl(impl); + } + return progress; } -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/8] nir: allow more nested loops to be unrolled
The innermost check was added to stop us from unrolling multiple loops in a single pass, and to stop outer loops from unrolling. When we successfully unroll a loop we need to run the analysis pass again before deciding if we want to go ahead an unroll a second loop. However the logic was flawed because it never tried to unroll any nested loops other than the first innermost loop it found. If this innermost loop is not unrolled we end up skipping all other nested loops. This unrolls a loop in a Deus Ex: MD shader on ultra settings and also unrolls a loop in a shader from the game Prey when running on DXVK. --- src/compiler/nir/nir_opt_loop_unroll.c | 31 ++ 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index 955dfede694..a1ad0e59814 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -483,9 +483,10 @@ is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li) } static bool -process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *innermost_loop) +process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *has_nested_loop_out) { bool progress = false; + bool has_nested_loop = false; nir_loop *loop; switch (cf_node->type) { @@ -494,32 +495,32 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *innermost_loop) case nir_cf_node_if: { nir_if *if_stmt = nir_cf_node_as_if(cf_node); foreach_list_typed_safe(nir_cf_node, nested_node, node, _stmt->then_list) - progress |= process_loops(sh, nested_node, innermost_loop); + progress |= process_loops(sh, nested_node, has_nested_loop_out); foreach_list_typed_safe(nir_cf_node, nested_node, node, _stmt->else_list) - progress |= process_loops(sh, nested_node, innermost_loop); + progress |= process_loops(sh, nested_node, has_nested_loop_out); return progress; } case nir_cf_node_loop: { loop = nir_cf_node_as_loop(cf_node); foreach_list_typed_safe(nir_cf_node, nested_node, node, >body) - progress |= process_loops(sh, nested_node, innermost_loop); + progress |= process_loops(sh, nested_node, _nested_loop); + break; } default: unreachable("unknown cf node type"); } - if (*innermost_loop) { - /* Don't attempt to unroll outer loops or a second inner loop in - * this pass wait until the next pass as we have altered the cf. - */ - *innermost_loop = false; + /* Don't attempt to unroll a second inner loop in this pass, wait until the +* next pass as we have altered the cf. +*/ + if (!progress) { - if (loop->info->limiting_terminator == NULL) - return progress; + if (has_nested_loop || loop->info->limiting_terminator == NULL) + goto exit; if (!is_loop_small_enough_to_unroll(sh, loop->info)) - return progress; + goto exit; if (loop->info->is_trip_count_known) { simple_unroll(loop); @@ -555,6 +556,8 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *innermost_loop) } } +exit: + *has_nested_loop_out = true; return progress; } @@ -567,9 +570,9 @@ nir_opt_loop_unroll_impl(nir_function_impl *impl, nir_metadata_require(impl, nir_metadata_block_index); foreach_list_typed_safe(nir_cf_node, node, node, >body) { - bool innermost_loop = true; + bool has_nested_loop = false; progress |= process_loops(impl->function->shader, node, -_loop); +_nested_loop); } if (progress) -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 7/8] nir: add loop unroll support for wrapper loops
This adds support for unrolling the classic do { // ... } while (false) that is used to wrap multi-line macros. GLSL IR also wraps switch statements in a loop like this. shader-db results IVB: total loops in shared programs: 2515 -> 2512 (-0.12%) loops in affected programs: 33 -> 30 (-9.09%) helped: 3 HURT: 0 --- src/compiler/nir/nir_opt_loop_unroll.c | 77 ++ 1 file changed, 77 insertions(+) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index e0e0b754716..9c33267cb72 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -465,6 +465,65 @@ complex_unroll(nir_loop *loop, nir_loop_terminator *unlimit_term, _mesa_hash_table_destroy(remap_table, NULL); } +/* Unrolls the classic wrapper loops e.g + * + *do { + *// ... + *} while (false) + */ +static bool +wrapper_unroll(nir_loop *loop) +{ + bool progress = false; + + nir_block *blk_after_loop = + nir_cursor_current_block(nir_after_cf_node(>cf_node)); + + /* There may still be some single src phis following the loop that +* have not yet been cleaned up by another pass. Tidy those up before +* unrolling the loop. +*/ + nir_foreach_instr_safe(instr, blk_after_loop) { + if (instr->type != nir_instr_type_phi) + break; + + nir_phi_instr *phi = nir_instr_as_phi(instr); + assert(exec_list_length(>srcs) == 1); + + nir_phi_src *phi_src = exec_node_data(nir_phi_src, +exec_list_get_head(>srcs), +node); + + nir_ssa_def_rewrite_uses(>dest.ssa, phi_src->src); + nir_instr_remove(instr); + + progress = true; + } + + nir_block *last_loop_blk = nir_loop_last_block(loop); + if (nir_block_ends_in_break(last_loop_blk)) { + + /* Remove break at end of the loop */ + nir_instr *break_instr = nir_block_last_instr(last_loop_blk); + nir_instr_remove(break_instr); + + /* Pluck out the loop body. */ + nir_cf_list loop_body; + nir_cf_extract(_body, nir_before_block(nir_loop_first_block(loop)), + nir_after_block(nir_loop_last_block(loop))); + + /* Reinsert loop body after the loop */ + nir_cf_reinsert(_body, nir_after_cf_node(>cf_node)); + + /* The loop has been unrolled so remove it. */ + nir_cf_node_remove(>cf_node); + + progress = true; + } + + return progress; +} + static bool is_loop_small_enough_to_unroll(nir_shader *shader, nir_loop_info *li) { @@ -516,6 +575,24 @@ process_loops(nir_shader *sh, nir_cf_node *cf_node, bool *has_nested_loop_out) */ if (!progress) { + /* Check for the classic + * + *do { + *// ... + *} while (false) + * + * that is used to wrap multi-line macros. GLSL IR also wraps switch + * statements in a loop like this. + */ + if (loop->info->limiting_terminator == NULL && + list_empty(>info->loop_terminator_list) && + !loop->info->complex_loop) { + + progress = wrapper_unroll(loop); + + goto exit; + } + if (has_nested_loop || loop->info->limiting_terminator == NULL) goto exit; -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 4/8] nir: always attempt to find loop terminators
This will help later patches with unrolling loops that end with a break i.e. loops the always exit on their first interation. --- src/compiler/nir/nir_loop_analyze.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/compiler/nir/nir_loop_analyze.c b/src/compiler/nir/nir_loop_analyze.c index d564296aa67..5454b7691ba 100644 --- a/src/compiler/nir/nir_loop_analyze.c +++ b/src/compiler/nir/nir_loop_analyze.c @@ -717,13 +717,6 @@ get_loop_info(loop_info_state *state, nir_function_impl *impl) } } - /* Induction analysis needs invariance information so get that first */ - compute_invariance_information(state); - - /* We have invariance information so try to find induction variables */ - if (!compute_induction_information(state)) - return; - /* Try to find all simple terminators of the loop. If we can't find any, * or we find possible terminators that have side effects then bail. */ @@ -737,6 +730,13 @@ get_loop_info(loop_info_state *state, nir_function_impl *impl) return; } + /* Induction analysis needs invariance information so get that first */ + compute_invariance_information(state); + + /* We have invariance information so try to find induction variables */ + if (!compute_induction_information(state)) + return; + /* Run through each of the terminators and try to compute a trip-count */ find_trip_count(state); -- 2.17.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] v2 Loop unrolling and if statement opts
v2: - make use of nir_block_dominates() in the if-statement opts as suggested by Connor. This allowed for a reduction in code required but also allowed me to implemented some extra optimisations. - implement feedback from Jason on loop unrolling opts. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev