Re: [Mesa-dev] [PATCH 1/2] radv: Add occlusion query shader.
One trivial comment but otherwise 1&2 are, Reviewed-by: Edward O'CallaghanOn 04/10/2017 09:34 AM, Bas Nieuwenhuizen wrote: > Adds a shader for writing occlusion query results to a buffer, as the > CP packet isn't support on SI or secondary buffers, and doesn't handle > the availability bit (or partial results) nor truncation to 32-bit. > > Signed-off-by: Bas Nieuwenhuizen > --- > src/amd/vulkan/radv_meta.c| 7 + > src/amd/vulkan/radv_meta.h| 3 + > src/amd/vulkan/radv_private.h | 6 + > src/amd/vulkan/radv_query.c | 419 > ++ > 4 files changed, 435 insertions(+) > > diff --git a/src/amd/vulkan/radv_meta.c b/src/amd/vulkan/radv_meta.c > index 04fa247dd36..0098e0844c1 100644 > --- a/src/amd/vulkan/radv_meta.c > +++ b/src/amd/vulkan/radv_meta.c > @@ -324,6 +324,10 @@ radv_device_init_meta(struct radv_device *device) > if (result != VK_SUCCESS) > goto fail_buffer; > > + result = radv_device_init_meta_query_state(device); > + if (result != VK_SUCCESS) > + goto fail_query; > + > result = radv_device_init_meta_fast_clear_flush_state(device); > if (result != VK_SUCCESS) > goto fail_fast_clear; > @@ -337,6 +341,8 @@ fail_resolve_compute: > radv_device_finish_meta_fast_clear_flush_state(device); > fail_fast_clear: > radv_device_finish_meta_buffer_state(device); > +fail_query: > + radv_device_finish_meta_query_state(device); > fail_buffer: > radv_device_finish_meta_depth_decomp_state(device); > fail_depth_decomp: > @@ -363,6 +369,7 @@ radv_device_finish_meta(struct radv_device *device) > radv_device_finish_meta_blit2d_state(device); > radv_device_finish_meta_bufimage_state(device); > radv_device_finish_meta_depth_decomp_state(device); > + radv_device_finish_meta_query_state(device); > radv_device_finish_meta_buffer_state(device); > radv_device_finish_meta_fast_clear_flush_state(device); > radv_device_finish_meta_resolve_compute_state(device); > diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h > index d70fef1e5f1..6cfc6134c53 100644 > --- 
a/src/amd/vulkan/radv_meta.h > +++ b/src/amd/vulkan/radv_meta.h > @@ -85,6 +85,9 @@ void radv_device_finish_meta_blit2d_state(struct > radv_device *device); > VkResult radv_device_init_meta_buffer_state(struct radv_device *device); > void radv_device_finish_meta_buffer_state(struct radv_device *device); > > +VkResult radv_device_init_meta_query_state(struct radv_device *device); > +void radv_device_finish_meta_query_state(struct radv_device *device); > + > VkResult radv_device_init_meta_resolve_compute_state(struct radv_device > *device); > void radv_device_finish_meta_resolve_compute_state(struct radv_device > *device); > void radv_meta_save(struct radv_meta_saved_state *state, > diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h > index 580c1197e64..a03c24c24ac 100644 > --- a/src/amd/vulkan/radv_private.h > +++ b/src/amd/vulkan/radv_private.h > @@ -438,6 +438,12 @@ struct radv_meta_state { > VkPipeline fill_pipeline; > VkPipeline copy_pipeline; > } buffer; > + > + struct { > + VkDescriptorSetLayout occlusion_query_ds_layout; > + VkPipelineLayout occlusion_query_p_layout; > + VkPipeline occlusion_query_pipeline; > + } query; > }; > > /* queue types */ > diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c > index 288bd43a763..5b1fff4eeaa 100644 > --- a/src/amd/vulkan/radv_query.c > +++ b/src/amd/vulkan/radv_query.c > @@ -29,6 +29,8 @@ > #include > #include > > +#include "nir/nir_builder.h" > +#include "radv_meta.h" > #include "radv_private.h" > #include "radv_cs.h" > #include "sid.h" > @@ -49,6 +51,423 @@ static unsigned get_max_db(struct radv_device *device) > return num_db; > } > > +static void radv_break_on_count(nir_builder *b, nir_variable *var, int count) > +{ > + nir_ssa_def *counter = nir_load_var(b, var); > + > + nir_if *if_stmt = nir_if_create(b->shader); > + if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, nir_imm_int(b, > count))); > + nir_cf_node_insert(b->cursor, _stmt->cf_node); > + > + b->cursor = 
nir_after_cf_list(_stmt->then_list); > + > + nir_jump_instr *instr = nir_jump_instr_create(b->shader, > nir_jump_break); > + nir_builder_instr_insert(b, >instr); > + > + b->cursor = nir_after_cf_node(_stmt->cf_node); > + counter = nir_iadd(b, counter, nir_imm_int(b, 1)); > + nir_store_var(b, var, counter, 0x1); > +} > + > +static struct nir_ssa_def * > +radv_load_push_int(nir_builder *b, unsigned offset, const char *name) > +{ > + nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, > nir_intrinsic_load_push_constant); > + flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset)); > +
Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header
On 04/10/2017 12:31 PM, Thomas H.P. Andersen wrote:
> On Sun, Apr 9, 2017 at 8:25 PM, Marek Olšák wrote:
>> Reviewed-by: Marek Olšák
>>
>> Marek
>
> Thanks. I do not have commit access, so will need someone to push it for me.

Done, thanks for the fix!

Kind Regards,
Edward.

>
>> On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen wrote:
>>> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
>>> the header used x_biased, while the implementation used y_biased.
>>> This changes the header to match the implementation since the
>>> uses of the function seem to expect y_biased.
>>> ---
>>>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
>>> index f2f2ca8..89c8922 100644
>>> --- a/src/amd/addrlib/gfx9/rbmap.h
>>> +++ b/src/amd/addrlib/gfx9/rbmap.h
>>> @@ -49,7 +49,7 @@ public:
>>>
>>>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* w, int* h, int* d = NULL);
>>>
>>> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool x_biased,
>>> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool y_biased,
>>>      int comp_block_width_log2, int comp_block_height_log2, int comp_block_depth_log2,
>>>      int& meta_block_width_log2, int& meta_block_height_log2, int& meta_block_depth_log2 );
>>>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int bpp_log2, int num_samples_log2, int pipe_interleave_log2,
>>> --
>>> 2.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header
On Sun, Apr 9, 2017 at 8:25 PM, Marek Olšák wrote:
> Reviewed-by: Marek Olšák
>
> Marek

Thanks. I do not have commit access, so will need someone to push it for me.

> On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen wrote:
>> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
>> the header used x_biased, while the implementation used y_biased.
>> This changes the header to match the implementation since the
>> uses of the function seem to expect y_biased.
>> ---
>>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
>> index f2f2ca8..89c8922 100644
>> --- a/src/amd/addrlib/gfx9/rbmap.h
>> +++ b/src/amd/addrlib/gfx9/rbmap.h
>> @@ -49,7 +49,7 @@ public:
>>
>>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* w, int* h, int* d = NULL);
>>
>> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool x_biased,
>> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool y_biased,
>>      int comp_block_width_log2, int comp_block_height_log2, int comp_block_depth_log2,
>>      int& meta_block_width_log2, int& meta_block_height_log2, int& meta_block_depth_log2 );
>>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int bpp_log2, int num_samples_log2, int pipe_interleave_log2,
>> --
>> 2.9.3
Re: [Mesa-dev] [PATCH] mesa: fix memory leak in arb_fragment_program
Thanks.

Reviewed-by: Timothy Arceri

On 10/04/17 02:37, Bartosz Tomczyk wrote:
---
 src/mesa/program/arbprogparse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/program/arbprogparse.c b/src/mesa/program/arbprogparse.c
index 07bdf1603e..83a501eea6 100644
--- a/src/mesa/program/arbprogparse.c
+++ b/src/mesa/program/arbprogparse.c
@@ -78,6 +78,7 @@ _mesa_parse_arb_fragment_program(struct gl_context* ctx, GLenum target,
    memset(&prog, 0, sizeof(prog));
    memset(&state, 0, sizeof(state));
    state.prog = &prog;
+   state.mem_ctx = program;

    if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
                                 &state)) {
Re: [Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
2017-04-10 9:54 GMT+08:00 Ilia Mirkin:
> On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
>> ---
>>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 28 ++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> index 1bd01a9a32..2ce6f29905 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
>> @@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
>>     NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
>>     NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
>>
>> +   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
>> +   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
>> +   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
>> +
>>     NV50_IR_OPCODE_CASE(END, EXIT);
>>
>>     default:
>> @@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
>>           mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
>>        }
>>        break;
>> +   case TGSI_OPCODE_BALLOT:
>> +      val0 = new_LValue(func, FILE_PREDICATE);
>> +      mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
>> +      mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
>> +      mkMov(dst0[1], zero, TYPE_U32);
>
> Check that dst[n] isn't masked though before writing to it.
>
>> +      break;
>> +   case TGSI_OPCODE_READ_FIRST:
>> +      // ReadFirstInvocationARB(src) is implemented as
>> +      // ReadInvocationARB(src, findLSB(ballot(true)))
>> +      val0 = getScratch();
>> +      mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = NV50_IR_SUBOP_VOTE_ANY;
>> +      mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
>> +         ->subOp = NV50_IR_SUBOP_EXTBF_REV;
>> +      mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = NV50_IR_SUBOP_BFIND_SAMT;
>> +      src1 = val0;
>> +      /* fallthrough */
>
> You could, of course, do this as:
>
> if (false)
>
>> +   case TGSI_OPCODE_READ_INVOC:
>> +      if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)
>
> And then remove this if statement. (Ain't C fun.)
>
> But don't actually do that :) I'm more pointing it out due to the crazy
> factor.

Well, I didn't even think of that ;) But I surely won't take it.

>
> I really do hate that if for somewhat irrational reasons though...
> can't think of a clean way of getting rid of it. Oh well.

Yeah, the 'if' here isn't really great. However, without it, the only alternative I could come up with would duplicate code, which is even worse.

>
>> +         src1 = fetchSrc(1, 0);
>> +      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
>> +         geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f));
>> +         geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
>> +      }
>> +      break;
>>     case TGSI_OPCODE_CLOCK:
>>        // Stick the 32-bit clock into the high dword of the logical result.
>>        if (!tgsi.getDst(0).isMasked(0))
>> --
>> 2.12.1
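The READ_FIRST comment quoted above ("ReadInvocationARB(src, findLSB(ballot(true)))") boils down to "read the value held by the lowest active lane". A minimal CPU-side sketch of that idea in plain C follows. It assumes a 32-lane subgroup, and find_lsb and read_first_invocation are made-up names for illustration, not nv50_ir or TGSI functions.

```c
#include <stdint.h>

/* Index of the lowest set bit, or -1 if no bit is set.
 * Hypothetical helper modelled on GLSL findLSB(). */
static int find_lsb(uint32_t mask)
{
    int i;

    for (i = 0; i < 32; i++) {
        if (mask & (UINT32_C(1) << i))
            return i;
    }
    return -1;
}

/* Model of readFirstInvocationARB(): active_mask plays the role of
 * ballot(true), and the value is taken from the lowest active lane.
 * lane_values holds one value per lane (32 entries assumed). */
static uint32_t read_first_invocation(const uint32_t lane_values[32],
                                      uint32_t active_mask)
{
    int lane = find_lsb(active_mask);

    return lane < 0 ? 0 : lane_values[lane];
}
```

With lanes 2 and 3 active (mask 0xC), the value comes from lane 2. The patch itself appears to get there with a bit reverse followed by BFIND rather than a loop, which this sketch does not try to reproduce.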
Re: [Mesa-dev] [PATCH v2 9/9] nvc0: Enable ARB_shader_ballot on Kepler+
Reviewed-by: Ilia MirkinOn Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote: > readInvocationARB() and readFirstInvocationARB() need SHFL.IDX > instruction which is introduced in Kepler. > --- > docs/features.txt | 2 +- > docs/relnotes/17.1.0.html | 2 +- > src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++- > 3 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/docs/features.txt b/docs/features.txt > index edc56842b9..a2d7785827 100644 > --- a/docs/features.txt > +++ b/docs/features.txt > @@ -292,7 +292,7 @@ Khronos, ARB, and OES extensions that are not part of any > OpenGL or OpenGL ES ve >GL_ARB_sample_locations not started >GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, > radeonsi, r600, softpipe, swr) >GL_ARB_shader_atomic_counter_ops DONE (i965/gen7+, > nvc0, radeonsi, softpipe) > - GL_ARB_shader_ballot DONE (radeonsi) > + GL_ARB_shader_ballot DONE (nvc0, radeonsi) >GL_ARB_shader_clock DONE (i965/gen7+, > nv50, nvc0, radeonsi) >GL_ARB_shader_draw_parameters DONE (i965, nvc0, > radeonsi) >GL_ARB_shader_group_vote DONE (nvc0, radeonsi) > diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html > index 0a5cabe4f1..8f237ed527 100644 > --- a/docs/relnotes/17.1.0.html > +++ b/docs/relnotes/17.1.0.html > @@ -45,7 +45,7 @@ Note: some of the new features are only available with > certain drivers. 
> > > GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, > llvmpipe > -GL_ARB_shader_ballot on radeonsi > +GL_ARB_shader_ballot on nvc0, radeonsi > GL_ARB_shader_clock on nv50, nvc0, radeonsi > GL_ARB_shader_group_vote on radeonsi > GL_ARB_sparse_buffer on radeonsi/CIK+ > diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > index 7ef9bf9c9c..8c6712a121 100644 > --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c > @@ -259,6 +259,8 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum > pipe_cap param) >return class_3d >= NVE4_3D_CLASS; /* needs testing on fermi */ > case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE: >return class_3d >= GM200_3D_CLASS; > + case PIPE_CAP_TGSI_BALLOT: > + return class_3d >= NVE4_3D_CLASS; > > /* unsupported caps */ > case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT: > @@ -289,7 +291,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum > pipe_cap param) > case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY: > case PIPE_CAP_INT64_DIVMOD: > case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE: > - case PIPE_CAP_TGSI_BALLOT: >return 0; > > case PIPE_CAP_VENDOR_ID: > -- > 2.12.1 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)
https://bugs.freedesktop.org/show_bug.cgi?id=100613

--- Comment #3 from Roland Scheidegger ---
(In reply to Stefan Dirsch from comment #2)
> Roland, thanks a lot for your prompt reply! Very much appreciated!
>
> Seems Richard meanwhile switched companies from IBM to ARM. I
> found him on Linkedin. Possibly he's now working on aarch64 (LE). So I'm
> afraid he no longer has access to BE machines.
>
> Unfortunately I'm not familiar with llvmpipe at all. Would it be an option
> not to change the code there for BE, if developers have no access to such
> machines? Reverse-applying the commit is going to break sooner or later I'm
> sure.

That'll be theoretically possible but I can't say I particularly like that solution. It doesn't make much sense that the fetch paths for BE and LE are completely distinct... Chances are it will break sooner or later anyway - this code really desperately wants someone who is willing to test it and keep it working on BE. (That it took 3 months until someone noticed it's broken isn't a good sign...) Otherwise there's probably a build change down the road which just disables the build on BE...

>
> Of course I'm willing to test any proposed change/patch on s390x, but I'm
> not a Mesa/llvmpipe developer per se.
>
> Unfortunately llvmpipe is needed on s390x, since it has become a requirement
> for modern desktops like gdm/gnome-shell. :-(

All the more reason why someone might want to look into it...

> I can't say how fundamental the issue is. gdm and gnome-shell just show a
> black screen. :-(

I don't know what vertex formats these use, but yes, bogus vertex fetch will make for a very bad experience (it's nearly a miracle glxgears still manages to draw something; in fact I like that new look better :-)).

I've taken a closer look now, and I can see some reasons why it doesn't work. That said, I never really understood the vector_justify logic, which just looks odd to me. But in the end the gather really is different for AoS and SoA (and I didn't understand the differences there either wrt vector_justify).

So, looking at the R32G32B32F format (which glxgears uses) (for this format SoA vs. AoS should not actually make that much of a difference, since it doesn't require any actual conversion):

The old code would have called lp_build_fetch_rgba_aos() 4 times - which would have resulted in 4 lp_build_gather calls with vector_justify set to TRUE, block_bits 96 and a dst type of 1x128bit. The gather would have fetched 96 bits, done a ZEXT and then (due to vector_justify - this is the stuff guarded with PIPE_ARCH_BIG_ENDIAN in lp_bld_gather.c) done a left shift of 32 for some reason I don't quite get (I thought it shouldn't make a difference with those array formats if they are fetched on BE or LE but it looks like I'm wrong). The values then would have gone through lp_build_format_swizzle_aos() (and I have no idea if that swizzle looks different on BE) before finally getting transposed to SoA.

The new code will now use one lp_build_fetch_rgba_soa() call. This will still end up with 4 gathers, but in the SoA path, which always uses a vector_justify of false (why? I have no idea but this was like that before), so you don't get the left shift of 32.

Oh, and the values will be fetched as 3x32bit instead of a 96bit int. This particular change was one of the changes preceding this commit, so you could verify independently if it breaks stuff; some piglit texture format tests for instance could show that. Unfortunately lp_test_format only does (scalar) rgba_aos fetch, so it is not exactly helpful for that, but you really want rgba SoA fetches working in general, regardless of vertex fetch. I have no idea really whether that makes any difference. (It will do pad_vector, so use a shuffle to extend the 3x32bit values to 4x32bit instead of using ZExt to 1x128bit, but I'm not worried about that particular bit.) The values will then be transposed and finally go into lp_build_format_swizzle_soa().

So, my guess is maybe things would work a bit better if you'd hack up the vector_justify parameter to lp_build_gather() in lp_build_fetch_rgba_soa(). However, this almost certainly breaks all the other callers of lp_build_fetch_rgba_soa(), which is used for just about all texture formats except the rgba8 ones, so glxgears and desktop compositors might still run but probably not much else; you don't want to do that...
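One possible way to picture the left shift described above is a scaled-down toy model: three 16-bit channels inside a 64-bit integer, standing in for three 32-bit channels inside 128 bits. The C sketch below is not llvmpipe code. It assumes that on a big-endian target element 0 of a vector corresponds to the most significant bits of the equivalent wide integer (and to the least significant bits on little-endian), and every function name in it is made up for illustration.

```c
#include <stdint.h>

/* Element i of a 4x16-bit vector viewed as a 64-bit integer on a
 * big-endian target: element 0 sits in the most significant bits. */
static uint16_t be_lane(uint64_t v, int i)
{
    return (uint16_t)(v >> (16 * (3 - i)));
}

/* Same view on a little-endian target: element 0 is the low bits. */
static uint16_t le_lane(uint64_t v, int i)
{
    return (uint16_t)(v >> (16 * i));
}

/* The 48-bit integer a big-endian load of r, g, b would produce,
 * zero-extended to 64 bits (r ends up most significant). */
static uint64_t be_fetch_zext(uint16_t r, uint16_t g, uint16_t b)
{
    return ((uint64_t)r << 32) | ((uint64_t)g << 16) | b;
}

/* The same load on little-endian (r ends up least significant). */
static uint64_t le_fetch_zext(uint16_t r, uint16_t g, uint16_t b)
{
    return ((uint64_t)b << 32) | ((uint64_t)g << 16) | r;
}

/* vector_justify-style fixup in this model: shift up by one element. */
static uint64_t be_justify(uint64_t v)
{
    return v << 16;
}
```

In this model the zero-extended big-endian value has an empty element 0 and r, g, b in elements 1..3, and the shift moves them to elements 0..2. The little-endian value already has r, g, b in elements 0..2. Whether this is the actual reason for the shift in lp_bld_gather.c is a guess.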
Re: [Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> ---
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 28 ++
>  1 file changed, 28 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index 1bd01a9a32..2ce6f29905 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode)
>     NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE);
>     NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE);
>
> +   NV50_IR_OPCODE_CASE(BALLOT, VOTE);
> +   NV50_IR_OPCODE_CASE(READ_INVOC, SHFL);
> +   NV50_IR_OPCODE_CASE(READ_FIRST, SHFL);
> +
>     NV50_IR_OPCODE_CASE(END, EXIT);
>
>     default:
> @@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn)
>           mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0);
>        }
>        break;
> +   case TGSI_OPCODE_BALLOT:
> +      val0 = new_LValue(func, FILE_PREDICATE);
> +      mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero);
> +      mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY;
> +      mkMov(dst0[1], zero, TYPE_U32);

Check that dst[n] isn't masked though before writing to it.

> +      break;
> +   case TGSI_OPCODE_READ_FIRST:
> +      // ReadFirstInvocationARB(src) is implemented as
> +      // ReadInvocationARB(src, findLSB(ballot(true)))
> +      val0 = getScratch();
> +      mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = NV50_IR_SUBOP_VOTE_ANY;
> +      mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000))
> +         ->subOp = NV50_IR_SUBOP_EXTBF_REV;
> +      mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = NV50_IR_SUBOP_BFIND_SAMT;
> +      src1 = val0;
> +      /* fallthrough */

You could, of course, do this as:

if (false)

> +   case TGSI_OPCODE_READ_INVOC:
> +      if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC)

And then remove this if statement. (Ain't C fun.)

But don't actually do that :) I'm more pointing it out due to the crazy factor.

I really do hate that if for somewhat irrational reasons though... can't think of a clean way of getting rid of it. Oh well.

> +         src1 = fetchSrc(1, 0);
> +      FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) {
> +         geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f));
> +         geni->subOp = NV50_IR_SUBOP_SHFL_IDX;
> +      }
> +      break;
>     case TGSI_OPCODE_CLOCK:
>        // Stick the 32-bit clock into the high dword of the logical result.
>        if (!tgsi.getDst(0).isMasked(0))
> --
> 2.12.1
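The masking remark above ("Check that dst[n] isn't masked") can be modelled in plain C roughly as follows. This is only a sketch of the idea, not converter code: write_ballot is a hypothetical name, and the writemask layout (bit n enables channel n) is an assumption made for illustration.

```c
#include <stdint.h>

/* Write the two-channel ballot result, touching only the channels
 * that the (assumed) writemask enables. Channel 0 carries the ballot
 * bits, channel 1 is the always-zero upper half. */
static void write_ballot(uint32_t dst[2], unsigned writemask, uint32_t ballot)
{
    if (writemask & 0x1)
        dst[0] = ballot;
    if (writemask & 0x2)
        dst[1] = 0;
}
```

A masked-off channel keeps whatever value it held before, which is the behaviour an unconditional write would break.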
Re: [Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination
2017-04-10 9:31 GMT+08:00 Ilia Mirkin:
> Wow, great find!
>
> On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
>> GF100's ISA encoding has a weird form of predicate destination where its
>> 3 bits are split across the whole instruction. Use a dedicated setPDSTL
>> function instead of the original defId, which is incorrect in this case.
>> ---
>>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> index 5467447e35..d5a310f88c 100644
>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>> @@ -58,6 +58,7 @@ private:
>>     void setImmediateS8(const ValueRef&);
>>     void setSUConst16(const Instruction *, const int s);
>>     void setSUPred(const Instruction *, const int s);
>> +   inline void setPDSTL(const ValueDef&);
>>
>>     void emitCondCode(CondCode cc, int pos);
>>     void emitInterpMode(const Instruction *);
>> @@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef &ref)
>>     code[0] |= (s8 >> 6) << 8;
>>  }
>>
>> +void CodeEmitterNVC0::setPDSTL(const ValueDef &def)
>> +{
>> +   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? DDATA(def).id : 7);
>
> Why not just == FILE_PREDICATE? Also, I don't think the outer parens do much.

Okay, will fix it.

>
>> +
>> +   code[0] |= (pred & 3) << 8;
>> +   code[1] |= !!(pred & 7) << 26;
>
> This always makes me nervous... how about
>
> (pred & 4) << (26 - 2)
>
> BTW, this should be pred & 4 in either case, no?

Yeah, should be pred & 4.

>
>> +}
>> +
>>  void
>>  CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
>>  {
>> @@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>>     if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
>>         i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>>        assert(i->defExists(0));
>> -      defId(i->def(0), 8);
>> +      setPDSTL(i->def(0));
>>     }
>>  }
>>
>> @@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>>
>>     if (p >= 0) {
>>        if (targ->getChipset() >= NVISA_GK104_CHIPSET)
>> -         defId(i->def(p), 8);
>> +         setPDSTL(i->def(p));
>>        else
>>           defId(i->def(p), 32 + 18);
>>     }
>> --
>> 2.12.1
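The difference between the two encodings discussed above is easy to check in isolation. The sketch below is standalone C rather than the emitter code: pack_split_pred and pack_split_pred_v2 are hypothetical names, and the bit positions (bits 8 and 9 of word 0, bit 26 of word 1) are simply the ones that appear in the patch.

```c
#include <stdint.h>

/* Form agreed on in the review: low two bits of the predicate id go
 * to bits 8..9 of word 0, the third bit goes to bit 26 of word 1. */
static void pack_split_pred(uint32_t code[2], uint32_t pred)
{
    code[0] |= (pred & 3) << 8;
    code[1] |= (pred & 4) << (26 - 2);
}

/* The v2 form as posted, kept for comparison: it tests all three bits,
 * so any non-zero id sets bit 26. */
static void pack_split_pred_v2(uint32_t code[2], uint32_t pred)
{
    code[0] |= (pred & 3) << 8;
    code[1] |= (uint32_t)!!(pred & 7) << 26;
}
```

For predicate id 3 the high bit should stay clear. The first function leaves word 1 at zero while the v2 form sets bit 26, which is the problem the "pred & 4" remark points at.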
[Mesa-dev] [PATCH V3 2/2] glsl: don't run the GLSL pre-processor when we are skipping compilation
Improves Deus Ex start-up times with a warm cache from ~30 seconds to ~22 seconds. Also fixes the leaking of state. V2: fix indentation v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader. Tested-by (v2): Grazvydas Ignotas--- src/compiler/glsl/glsl_parser_extras.cpp | 19 ++- src/compiler/glsl/shader_cache.cpp | 10 ++ 2 files changed, 20 insertions(+), 9 deletions(-) diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp index ca74b55..eb12eff 100644 --- a/src/compiler/glsl/glsl_parser_extras.cpp +++ b/src/compiler/glsl/glsl_parser_extras.cpp @@ -1998,32 +1998,23 @@ opt_shader_and_create_symbol_table(struct gl_context *ctx, } } _mesa_glsl_initialize_derived_variables(ctx, shader); } void _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, bool dump_ast, bool dump_hir, bool force_recompile) { - struct _mesa_glsl_parse_state *state = - new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader); const char *source = force_recompile && shader->FallbackSource ? 
shader->FallbackSource : shader->Source; - if (ctx->Const.GenerateTemporaryNames) - (void) p_atomic_cmpxchg(_variable::temporaries_allocate_names, - false, true); - - state->error = glcpp_preprocess(state, , >info_log, - add_builtin_defines, state, ctx); - if (!force_recompile) { if (ctx->Cache) { char buf[41]; disk_cache_compute_key(ctx->Cache, source, strlen(source), shader->sha1); if (disk_cache_has_key(ctx->Cache, shader->sha1)) { /* We've seen this shader before and know it compiles */ if (ctx->_Shader->Flags & GLSL_CACHE_INFO) { _mesa_sha1_format(buf, shader->sha1); fprintf(stderr, "deferring compile of shader: %s\n", buf); @@ -2043,20 +2034,30 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, if (shader->CompileStatus == compile_success) return; if (shader->CompileStatus == compiled_no_opts) { opt_shader_and_create_symbol_table(ctx, shader); shader->CompileStatus = compile_success; return; } } + struct _mesa_glsl_parse_state *state = + new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader); + + if (ctx->Const.GenerateTemporaryNames) + (void) p_atomic_cmpxchg(_variable::temporaries_allocate_names, + false, true); + + state->error = glcpp_preprocess(state, , >info_log, + add_builtin_defines, state, ctx); + if (!state->error) { _mesa_glsl_lexer_ctor(state, source); _mesa_glsl_parse(state); _mesa_glsl_lexer_dtor(state); do_late_parsing_checks(state); } if (dump_ast) { foreach_list_typed(ast_node, ast, link, >translation_unit) { ast->print(); diff --git a/src/compiler/glsl/shader_cache.cpp b/src/compiler/glsl/shader_cache.cpp index e51fecd..738e548 100644 --- a/src/compiler/glsl/shader_cache.cpp +++ b/src/compiler/glsl/shader_cache.cpp @@ -1312,20 +1312,30 @@ shader_cache_read_program_metadata(struct gl_context *ctx, prog->SeparateShader ? "T" : "F"); /* A shader might end up producing different output depending on the glsl * version supported by the compiler. 
For example a different path might be * taken by the preprocessor, so add the version to the hash input. */ ralloc_asprintf_append(, "api: %d glsl: %d fglsl: %d\n", ctx->API, ctx->Const.GLSLVersion, ctx->Const.ForceGLSLVersion); + /* We run the preprocessor on shaders after hashing them, so we need to +* add any extension override vars to the hash. If we don't do this the +* preprocessor could result in different output and we could load the +* wrong shader. +*/ + char *ext_override = getenv("MESA_EXTENSION_OVERRIDE"); + if (ext_override) { + ralloc_asprintf_append(, "ext:%s", ext_override); + } + /* DRI config options may also change the output from the compiler so * include them as an input to sha1 creation. */ char sha1buf[41]; _mesa_sha1_format(sha1buf, ctx->Const.dri_config_options_sha1); ralloc_strcat(, sha1buf); for (unsigned i = 0; i < prog->NumShaders; i++) { struct gl_shader *sh = prog->Shaders[i]; _mesa_sha1_format(sha1buf, sh->sha1); -- 2.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
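The reason for hashing MESA_EXTENSION_OVERRIDE can be illustrated with a small standalone sketch: anything that can change the preprocessor output has to feed the cache key, otherwise two different compiles would share one entry. The C below is only an illustration. FNV-1a stands in for the SHA-1 the disk cache really uses, and fnv1a_append and shader_key are made-up names.

```c
#include <stdint.h>

/* Fold a NUL-terminated string into a running 64-bit FNV-1a hash.
 * Stand-in for the real SHA-1 based key computation. */
static uint64_t fnv1a_append(uint64_t h, const char *s)
{
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= UINT64_C(1099511628211);
    }
    return h;
}

/* Key over the shader source plus the extension override, if any.
 * ext_override models the value of getenv("MESA_EXTENSION_OVERRIDE")
 * and may be NULL when the variable is unset. */
static uint64_t shader_key(const char *source, const char *ext_override)
{
    uint64_t h = UINT64_C(14695981039346656037);

    h = fnv1a_append(h, source);
    if (ext_override) {
        h = fnv1a_append(h, "ext:");
        h = fnv1a_append(h, ext_override);
    }
    return h;
}
```

Passing the override in as a parameter rather than calling getenv() inside keeps the sketch easy to exercise; the patch reads the environment variable directly.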
[Mesa-dev] [PATCH V2 1/2] glsl: delay optimisations on individual shaders when cache is available
Due to a max limit of 65,536 entries on the index table that we use to decide if we can skip compiling individual shaders, it is very likely we will have collisions. To avoid doing too much work when the linked program may be in the cache this patch delays calling the optimisations until link time. Improves cold cache start-up times on Deus Ex by ~20 seconds. When deleting the cache index to simulate a worst case scenario of collisions in the index, warm cache start-up time improves by ~45 seconds. V2: fix indentation, make sure to call optimisations on cache fallback, make sure optimisations get called for XFB. Tested-by: Grazvydas Ignotasi Reviewed-by: Nicolai Hähnle --- src/compiler/glsl/glsl_parser_extras.cpp | 166 +-- src/compiler/glsl/linker.cpp | 3 - src/compiler/glsl/shader_cache.cpp | 2 +- src/mesa/main/mtypes.h | 3 +- 4 files changed, 96 insertions(+), 78 deletions(-) diff --git a/src/compiler/glsl/glsl_parser_extras.cpp b/src/compiler/glsl/glsl_parser_extras.cpp index 4629e78..ca74b55 100644 --- a/src/compiler/glsl/glsl_parser_extras.cpp +++ b/src/compiler/glsl/glsl_parser_extras.cpp @@ -1915,20 +1915,99 @@ static void do_late_parsing_checks(struct _mesa_glsl_parse_state *state) { if (state->stage == MESA_SHADER_COMPUTE && !state->has_compute_shader()) { YYLTYPE loc; memset(, 0, sizeof(loc)); _mesa_glsl_error(, state, "Compute shaders require " "GLSL 4.30 or GLSL ES 3.10"); } } +static void +opt_shader_and_create_symbol_table(struct gl_context *ctx, + struct gl_shader *shader) +{ + assert(shader->CompileStatus != compile_failure && + !shader->ir->is_empty()); + + struct gl_shader_compiler_options *options = + >Const.ShaderCompilerOptions[shader->Stage]; + + /* Do some optimization at compile time to reduce shader IR size +* and reduce later work if the same shader is linked multiple times +*/ + if (ctx->Const.GLSLOptimizeConservatively) { + /* Run it just once. 
*/ + do_common_optimization(shader->ir, false, false, options, + ctx->Const.NativeIntegers); + } else { + /* Repeat it until it stops making changes. */ + while (do_common_optimization(shader->ir, false, false, options, +ctx->Const.NativeIntegers)) + ; + } + + validate_ir_tree(shader->ir); + + enum ir_variable_mode other; + switch (shader->Stage) { + case MESA_SHADER_VERTEX: + other = ir_var_shader_in; + break; + case MESA_SHADER_FRAGMENT: + other = ir_var_shader_out; + break; + default: + /* Something invalid to ensure optimize_dead_builtin_uniforms + * doesn't remove anything other than uniforms or constants. + */ + other = ir_var_mode_count; + break; + } + + optimize_dead_builtin_variables(shader->ir, other); + + validate_ir_tree(shader->ir); + + /* Retain any live IR, but trash the rest. */ + reparent_ir(shader->ir, shader->ir); + + /* Destroy the symbol table. Create a new symbol table that contains only +* the variables and functions that still exist in the IR. The symbol +* table will be used later during linking. +* +* There must NOT be any freed objects still referenced by the symbol +* table. That could cause the linker to dereference freed memory. +* +* We don't have to worry about types or interface-types here because those +* are fly-weights that are looked up by glsl_type. 
+*/ + foreach_in_list (ir_instruction, ir, shader->ir) { + switch (ir->ir_type) { + case ir_type_function: + shader->symbols->add_function((ir_function *) ir); + break; + case ir_type_variable: { + ir_variable *const var = (ir_variable *) ir; + + if (var->data.mode != ir_var_temporary) +shader->symbols->add_variable(var); + break; + } + default: + break; + } + } + + _mesa_glsl_initialize_derived_variables(ctx, shader); +} + void _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, bool dump_ast, bool dump_hir, bool force_recompile) { struct _mesa_glsl_parse_state *state = new(shader) _mesa_glsl_parse_state(ctx, shader->Stage, shader); const char *source = force_recompile && shader->FallbackSource ? shader->FallbackSource : shader->Source; if (ctx->Const.GenerateTemporaryNames) @@ -1956,20 +2035,26 @@ _mesa_glsl_compile_shader(struct gl_context *ctx, struct gl_shader *shader, return; } } } else { /* We should only ever end up here if a re-compile has been forced by a * shader cache miss. In which case we can skip the compile if its * already be done by a previous fallback or the
Re: [Mesa-dev] [PATCH v2 7/9] nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
Reviewed-by: Ilia Mirkin

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> ---
>  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 27 ++
>  1 file changed, 27 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> index 3ed7d345c4..1bd01a9a32 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp
> @@ -450,6 +450,12 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval)
>     case TGSI_SEMANTIC_BASEINSTANCE: return nv50_ir::SV_BASEINSTANCE;
>     case TGSI_SEMANTIC_DRAWID: return nv50_ir::SV_DRAWID;
>     case TGSI_SEMANTIC_WORK_DIM: return nv50_ir::SV_WORK_DIM;
> +   case TGSI_SEMANTIC_SUBGROUP_INVOCATION: return nv50_ir::SV_LANEID;
> +   case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: return nv50_ir::SV_LANEMASK_EQ;
> +   case TGSI_SEMANTIC_SUBGROUP_LT_MASK: return nv50_ir::SV_LANEMASK_LT;
> +   case TGSI_SEMANTIC_SUBGROUP_LE_MASK: return nv50_ir::SV_LANEMASK_LE;
> +   case TGSI_SEMANTIC_SUBGROUP_GT_MASK: return nv50_ir::SV_LANEMASK_GT;
> +   case TGSI_SEMANTIC_SUBGROUP_GE_MASK: return nv50_ir::SV_LANEMASK_GE;
>     default:
>        assert(0);
>        return nv50_ir::SV_CLOCK;
> @@ -1667,6 +1673,8 @@ private:
>     Symbol *srcToSym(tgsi::Instruction::SrcRegister, int c);
>     Symbol *dstToSym(tgsi::Instruction::DstRegister, int c);
>
> +   bool isSubGroupMask(uint8_t semantic);
> +
>     bool handleInstruction(const struct tgsi_full_instruction *);
>     void exportOutputs();
>     inline Subroutine *getSubroutine(unsigned ip);
> @@ -1996,6 +2004,21 @@ Converter::adjustTempIndex(int arrayId, int , int ) const
>        idx += it->second;
>  }
>
> +bool
> +Converter::isSubGroupMask(uint8_t semantic)
> +{
> +   switch (semantic) {
> +   case TGSI_SEMANTIC_SUBGROUP_EQ_MASK:
> +   case TGSI_SEMANTIC_SUBGROUP_LT_MASK:
> +   case TGSI_SEMANTIC_SUBGROUP_LE_MASK:
> +   case TGSI_SEMANTIC_SUBGROUP_GT_MASK:
> +   case TGSI_SEMANTIC_SUBGROUP_GE_MASK:
> +      return true;
> +   default:
> +      return false;
> +   }
> +}
> +
>  Value *
>  Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
>  {
> @@ -2041,6 +2064,10 @@ Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr)
>        if (info->sv[idx].sn == TGSI_SEMANTIC_THREAD_ID &&
>            info->prop.cp.numThreads[swz] == 1)
>           return loadImm(NULL, 0u);
> +      if (isSubGroupMask(info->sv[idx].sn) && swz > 0)
> +         return loadImm(NULL, 0u);
> +      if (info->sv[idx].sn == TGSI_SEMANTIC_SUBGROUP_SIZE)
> +         return loadImm(NULL, 32u);
>        ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c));
>        ld->perPatch = info->sv[idx].patch;
>        return ld->getDef(0);
> --
> 2.12.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 5/9] nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> Implementation of readFirstInvocationARB() on nvidia hardware needs a
> ballotARB(true) used to decide the first active thread. This expressed
> in gm107 asm as (supposing output is $r0):
>     vote any $r0 0x1 0x1
>
> To model the always true input, which corresponds to the second 0x1
> above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and
> "not 0x1" in the src field respectively.
>
> v2: Make sure that asImm() is not NULL (Samuel Pitoiset)
> ---
>  .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 24 ++
>  .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 22 +---
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 24 ++
>  3 files changed, 59 insertions(+), 11 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> index 58076ba4d5..87976ffebc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> @@ -1621,7 +1621,8 @@ CodeEmitterGK110::emitSHFL(const Instruction *i)
>  void
>  CodeEmitterGK110::emitVOTE(const Instruction *i)
>  {
> -   assert(i->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
>     code[0] = 0x0002;
>     code[1] = 0x86c0 | (i->subOp << 19);
> @@ -1646,9 +1647,24 @@ CodeEmitterGK110::emitVOTE(const Instruction *i)
>        code[0] |= 255 << 2;
>     if (!(rp & 2))
>        code[1] |= 7 << 16;
> -   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> -      code[1] |= 1 << 13;
> -   srcId(i->src(0), 42);
> +
> +   switch (i->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +      if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> +         code[0] |= 1 << 13;
> +      srcId(i->src(0), 42);
> +      break;
> +   case FILE_IMMEDIATE:
> +      imm = i->src(0).get()->asImm();
> +      assert(imm);
> +      u32 = imm->reg.data.u32;
> +      assert(u32 == 0 || u32 == 1);
> +      code[1] |= (u32 == 1 ? 0x7 : 0xf) << 10;
> +      break;
> +   default:
> +      assert(!"Unhandled src");
> +      break;
> +   }
>  }
>
>  void
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index 944563c93c..0382cb3903 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -2931,7 +2931,8 @@ CodeEmitterGM107::emitMEMBAR()
>  void
>  CodeEmitterGM107::emitVOTE()
>  {
> -   assert(insn->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
>     int r = -1, p = -1;
>     for (int i = 0; insn->defExists(i); i++) {
> @@ -2951,8 +2952,23 @@ CodeEmitterGM107::emitVOTE()
>        emitPRED (0x2d, insn->def(p));
>     else
>        emitPRED (0x2d);
> -   emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
> -   emitPRED (0x27, insn->src(0));
> +
> +   switch (insn->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +      emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT));
> +      emitPRED (0x27, insn->src(0));
> +      break;
> +   case FILE_IMMEDIATE:
> +      imm = insn->src(0).get()->asImm();
> +      assert(imm);
> +      u32 = imm->reg.data.u32;
> +      assert(u32 == 0 || u32 == 1);
> +      emitField(0x27, 4, u32 == 1 ? 0x7 : 0xf);

I'd kinda prefer this to be

emitField(0x2a, 1, u32 == 0);
emitPRED(0x27);

That way you have symmetry with the predicate version. Unfortunately
this is tricky to do in the other emitters -- the helpers in gm107 are
*way* better (well, Ben probably learned from the earlier failures). So
don't worry about trying to do it in the other ones.

> +      break;
> +   default:
> +      assert(!"Unhandled src");
> +      break;
> +   }
>  }
>
>  void
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index ee2d2f06c1..84c3aca1df 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -2587,7 +2587,8 @@ CodeEmitterNVC0::emitSHFL(const Instruction *i)
>  void
>  CodeEmitterNVC0::emitVOTE(const Instruction *i)
>  {
> -   assert(i->src(0).getFile() == FILE_PREDICATE);
> +   const ImmediateValue *imm;
> +   uint32_t u32;
>
>     code[0] = 0x0004 | (i->subOp << 5);
>     code[1] = 0x4800;
> @@ -2612,9 +2613,24 @@ CodeEmitterNVC0::emitVOTE(const Instruction *i)
>        code[0] |= 63 << 14;
>     if (!(rp & 2))
>        code[1] |= 7 << 22;
> -   if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> -      code[0] |= 1 << 23;
> -   srcId(i->src(0), 20);
> +
> +   switch (i->src(0).getFile()) {
> +   case FILE_PREDICATE:
> +      if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> +
Re: [Mesa-dev] [PATCH v2 6/9] nvc0/ir: Add SV_LANEMASK_* system values.
Please add these to nv50_ir_print.cpp's list of names too.

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h              | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 5 +
>  4 files changed, 20 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> index 6e5ffa525d..de6c110536 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> @@ -470,6 +470,11 @@ enum SVSemantic
>     SV_BASEINSTANCE,
>     SV_DRAWID,
>     SV_WORK_DIM,
> +   SV_LANEMASK_EQ,
> +   SV_LANEMASK_LT,
> +   SV_LANEMASK_LE,
> +   SV_LANEMASK_GT,
> +   SV_LANEMASK_GE,
>     SV_UNDEFINED,
>     SV_LAST
>  };
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> index 87976ffebc..bd4bd118f4 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
> @@ -2300,6 +2300,11 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref)
>     case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
>     case SV_LBASE: return 0x34;
>     case SV_SBASE: return 0x30;
> +   case SV_LANEMASK_EQ: return 0x38;
> +   case SV_LANEMASK_LT: return 0x39;
> +   case SV_LANEMASK_LE: return 0x3a;
> +   case SV_LANEMASK_GT: return 0x3b;
> +   case SV_LANEMASK_GE: return 0x3c;
>     case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
>     default:
>        assert(!"no sreg for system value");
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> index 0382cb3903..29426c130b 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
> @@ -269,6 +269,11 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val)
>     case SV_INVOCATION_INFO: id = 0x1d; break;
>     case SV_TID: id = 0x21 + val->reg.data.sv.index; break;
>     case SV_CTAID : id = 0x25 + val->reg.data.sv.index; break;
> +   case SV_LANEMASK_EQ: id = 0x38; break;
> +   case SV_LANEMASK_LT: id = 0x39; break;
> +   case SV_LANEMASK_LE: id = 0x3a; break;
> +   case SV_LANEMASK_GT: id = 0x3b; break;
> +   case SV_LANEMASK_GE: id = 0x3c; break;
>     case SV_CLOCK : id = 0x50 + val->reg.data.sv.index; break;
>     default:
>        assert(!"invalid system value");
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index 84c3aca1df..c549ca1158 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -1989,6 +1989,11 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref)
>     case SV_NCTAID:return 0x2d + SDATA(ref).sv.index;
>     case SV_LBASE: return 0x34;
>     case SV_SBASE: return 0x30;
> +   case SV_LANEMASK_EQ: return 0x38;
> +   case SV_LANEMASK_LT: return 0x39;
> +   case SV_LANEMASK_LE: return 0x3a;
> +   case SV_LANEMASK_GT: return 0x3b;
> +   case SV_LANEMASK_GE: return 0x3c;
>     case SV_CLOCK: return 0x50 + SDATA(ref).sv.index;
>     default:
>        assert(!"no sreg for system value");
> --
> 2.12.1
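The request above is about keeping a printable name table in step with the enum. A minimal sketch of that idea follows. It uses a reduced stand-in for `SVSemantic`, and the name strings and the `svNames` / `sregEncoding` identifiers are illustrative assumptions, not the contents of nv50_ir_print.cpp. Only the 0x38..0x3c encodings are taken from the patch.

```cpp
#include <cassert>
#include <cstring>

// Reduced, illustrative stand-in for the tail of the enum touched by the patch.
enum SVSemantic {
   SV_WORK_DIM,
   SV_LANEMASK_EQ,
   SV_LANEMASK_LT,
   SV_LANEMASK_LE,
   SV_LANEMASK_GT,
   SV_LANEMASK_GE,
   SV_UNDEFINED,
   SV_LAST
};

// Hypothetical name table, one entry per enumerator, in enum order.
static const char *const svNames[] = {
   "WORK_DIM",
   "LANEMASK_EQ",
   "LANEMASK_LT",
   "LANEMASK_LE",
   "LANEMASK_GT",
   "LANEMASK_GE",
   "?"
};

// Fails to compile if an enumerator is added without a matching name.
static_assert(sizeof(svNames) / sizeof(svNames[0]) == SV_LAST,
              "svNames must have one entry per SVSemantic value");

// Special-register encodings as given in the patch; -1 means "not handled here".
static int sregEncoding(SVSemantic sv)
{
   switch (sv) {
   case SV_LANEMASK_EQ: return 0x38;
   case SV_LANEMASK_LT: return 0x39;
   case SV_LANEMASK_LE: return 0x3a;
   case SV_LANEMASK_GT: return 0x3b;
   case SV_LANEMASK_GE: return 0x3c;
   default:             return -1;
   }
}
```

The `static_assert` is the design point: forgetting the print table becomes a build error rather than a review comment.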
Re: [Mesa-dev] [PATCH v2 3/9] nvc0/ir: Emit OP_SHFL
On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> v2: (Samuel Pitoiset)
> Add an assertion to check if the target is Kepler
> Make sure that asImm() is not NULL
> ---
>  .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 59 ++
>  1 file changed, 59 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index d5a310f88c..ee2d2f06c1 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -150,6 +150,8 @@ private:
>
>     void emitPIXLD(const Instruction *);
>
> +   void emitSHFL(const Instruction *);
> +
>     void emitVOTE(const Instruction *);
>
>     inline void defId(const ValueDef&, const int pos);
> @@ -2529,6 +2531,60 @@ CodeEmitterNVC0::emitPIXLD(const Instruction *i)
>  }
>
>  void
> +CodeEmitterNVC0::emitSHFL(const Instruction *i)
> +{
> +   const ImmediateValue *imm;
> +
> +   assert(targ->getChipset() >= NVISA_GK104_CHIPSET);
> +
> +   code[0] = 0x0005;
> +   code[1] = 0x8800 | (i->subOp << 23);
> +
> +   emitPredicate(i);
> +
> +   defId(i->def(0), 14);
> +   srcId(i->src(0), 20);
> +
> +   switch (i->src(1).getFile()) {
> +   case FILE_GPR:
> +      srcId(i->src(1), 26);
> +      break;
> +   case FILE_IMMEDIATE:
> +      imm = i->src(1).get()->asImm();

The common thing to do is i->getSrc(1)->asImm(). Should be identical.
Same below.

> +      assert(imm);
> +      code[0] |= (imm->reg.data.u32 & 0x1f) << 26;
> +      code[0] |= 1 << 5;
> +      break;
> +   default:
> +      assert(!"invalid src1 file");
> +      break;
> +   }
> +
> +   switch (i->src(2).getFile()) {
> +   case FILE_GPR:
> +      srcId(i->src(2), 49);
> +      break;
> +   case FILE_IMMEDIATE:
> +      imm = i->src(2).get()->asImm();
> +      assert(imm);

&& imm->reg.data.u32 < 0x2000

> +      code[1] |= (imm->reg.data.u32 & 0x1fff) << 10;
> +      code[0] |= 1 << 6;
> +      break;
> +   default:
> +      assert(!"invalid src2 file");
> +      break;
> +   }
> +
> +   if (!i->defExists(1)) {
> +      code[0] |= 3 << 8;
> +      code[1] |= 1 << 26;
> +   } else {
> +      assert(i->def(1).getFile() == FILE_PREDICATE);
> +      setPDSTL(i->def(1));

setPDSTL should be able to handle the no-exists case too, no? You might
change the API to be setPDSTL(const Instruction *, int d) to avoid
confusion.

> +   }
> +}
> +
> +void
>  CodeEmitterNVC0::emitVOTE(const Instruction *i)
>  {
>     assert(i->src(0).getFile() == FILE_PREDICATE);
> @@ -2837,6 +2893,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn)
>     case OP_PIXLD:
>        emitPIXLD(insn);
>        break;
> +   case OP_SHFL:
> +      emitSHFL(insn);
> +      break;
>     case OP_VOTE:
>        emitVOTE(insn);
>        break;
> --
> 2.12.1
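For readers following the immediate handling in emitSHFL, here is a small standalone sketch of just the two immediate fields, with the range check from the review comment folded in. The bit positions and masks are the ones in the quoted patch; `ShflWords` and `encodeShflImmediates` are hypothetical names, and the sketch covers only the immediate forms of src1 and src2, not the whole instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emitter's two 32-bit code words.
struct ShflWords {
   uint32_t w[2];
};

// Encode the immediate forms of src1 (lane or offset) and src2 (bound).
static ShflWords encodeShflImmediates(uint32_t lane, uint32_t bound)
{
   // The bound field is 13 bits wide, so reject values that would be
   // silently truncated by the mask below (the check suggested in review).
   assert(bound < 0x2000);

   ShflWords c = {{0, 0}};
   c.w[0] |= (lane & 0x1f) << 26;     // 5-bit src1 immediate
   c.w[0] |= 1u << 5;                 // flag: src1 is an immediate
   c.w[1] |= (bound & 0x1fff) << 10;  // 13-bit src2 immediate
   c.w[0] |= 1u << 6;                 // flag: src2 is an immediate
   return c;
}
```

Without the assertion, an out-of-range bound would still encode, just with its upper bits dropped, which is the silent failure the review comment guards against.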
Re: [Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination
Wow, great find!

On Sun, Apr 9, 2017 at 8:58 PM, Boyan Ding wrote:
> GF100's ISA encoding has a weird form of predicate destination where its
> 3 bits are split across whole the instruction. Use a dedicated setPDSTL
> function instead of original defId which is incorrect in this case.
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> index 5467447e35..d5a310f88c 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
> @@ -58,6 +58,7 @@ private:
>     void setImmediateS8(const ValueRef&);
>     void setSUConst16(const Instruction *, const int s);
>     void setSUPred(const Instruction *, const int s);
> +   inline void setPDSTL(const ValueDef&);
>
>     void emitCondCode(CondCode cc, int pos);
>     void emitInterpMode(const Instruction *);
> @@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef )
>     code[0] |= (s8 >> 6) << 8;
>  }
>
> +void CodeEmitterNVC0::setPDSTL(const ValueDef &def)
> +{
> +   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? DDATA(def).id : 7);

Why not just == FILE_PREDICATE? Also, I don't think the outer parens do
much.

> +
> +   code[0] |= (pred & 3) << 8;
> +   code[1] |= !!(pred & 7) << 26;

This always makes me nervous... how about

(pred & 4) << (26 - 2)

BTW, this should be pred & 4 in either case, no?

> +}
> +
>  void
>  CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
>  {
> @@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
>     if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
>         i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
>        assert(i->defExists(0));
> -      defId(i->def(0), 8);
> +      setPDSTL(i->def(0));
>     }
>  }
>
> @@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
>
>     if (p >= 0) {
>        if (targ->getChipset() >= NVISA_GK104_CHIPSET)
> -         defId(i->def(p), 8);
> +         setPDSTL(i->def(p));
>        else
>           defId(i->def(p), 32 + 18);
>     }
> --
> 2.12.1
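The "pred & 4" remark can be checked with a small round-trip test. The sketch below assumes the layout implied by the patch: the low two bits of the predicate index go to bits 8..9 of the first word and the high bit goes to bit 26 of the second word. `Code`, `encodeAsPosted`, `encodeAsSuggested` and `decode` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emitter's two 32-bit code words.
struct Code {
   uint32_t w[2] = {0, 0};
};

// Encoding as posted: the high bit is derived from (pred & 7).
static Code encodeAsPosted(uint32_t pred)
{
   Code c;
   c.w[0] |= (pred & 3) << 8;
   c.w[1] |= static_cast<uint32_t>(!!(pred & 7)) << 26;
   return c;
}

// Encoding as suggested in review: move bit 2 of pred up to bit 26.
static Code encodeAsSuggested(uint32_t pred)
{
   Code c;
   c.w[0] |= (pred & 3) << 8;
   c.w[1] |= (pred & 4) << (26 - 2);
   return c;
}

// Reassemble the 3-bit predicate index from the split fields.
static uint32_t decode(const Code &c)
{
   return ((c.w[0] >> 8) & 3) | (((c.w[1] >> 26) & 1) << 2);
}

// Count predicate indices 0..7 that do not survive encode followed by decode.
static int countRoundTripFailures(Code (*encode)(uint32_t))
{
   int failures = 0;
   for (uint32_t pred = 0; pred < 8; ++pred)
      if (decode(encode(pred)) != pred)
         ++failures;
   return failures;
}
```

Under these assumptions the posted form sets the high bit for every non-zero index, so indices 1, 2 and 3 decode as 5, 6 and 7, while the `pred & 4` form round-trips all eight values.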
[Mesa-dev] [PATCH v2 9/9] nvc0: Enable ARB_shader_ballot on Kepler+
readInvocationARB() and readFirstInvocationARB() need SHFL.IDX instruction which is introduced in Kepler. --- docs/features.txt | 2 +- docs/relnotes/17.1.0.html | 2 +- src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 ++- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/features.txt b/docs/features.txt index edc56842b9..a2d7785827 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -292,7 +292,7 @@ Khronos, ARB, and OES extensions that are not part of any OpenGL or OpenGL ES ve GL_ARB_sample_locations not started GL_ARB_seamless_cubemap_per_texture DONE (i965, nvc0, radeonsi, r600, softpipe, swr) GL_ARB_shader_atomic_counter_ops DONE (i965/gen7+, nvc0, radeonsi, softpipe) - GL_ARB_shader_ballot DONE (radeonsi) + GL_ARB_shader_ballot DONE (nvc0, radeonsi) GL_ARB_shader_clock DONE (i965/gen7+, nv50, nvc0, radeonsi) GL_ARB_shader_draw_parameters DONE (i965, nvc0, radeonsi) GL_ARB_shader_group_vote DONE (nvc0, radeonsi) diff --git a/docs/relnotes/17.1.0.html b/docs/relnotes/17.1.0.html index 0a5cabe4f1..8f237ed527 100644 --- a/docs/relnotes/17.1.0.html +++ b/docs/relnotes/17.1.0.html @@ -45,7 +45,7 @@ Note: some of the new features are only available with certain drivers. 
GL_ARB_gpu_shader_int64 on i965/gen8+, nvc0, radeonsi, softpipe, llvmpipe -GL_ARB_shader_ballot on radeonsi +GL_ARB_shader_ballot on nvc0, radeonsi GL_ARB_shader_clock on nv50, nvc0, radeonsi GL_ARB_shader_group_vote on radeonsi GL_ARB_sparse_buffer on radeonsi/CIK+ diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c index 7ef9bf9c9c..8c6712a121 100644 --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c @@ -259,6 +259,8 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) return class_3d >= NVE4_3D_CLASS; /* needs testing on fermi */ case PIPE_CAP_POLYGON_MODE_FILL_RECTANGLE: return class_3d >= GM200_3D_CLASS; + case PIPE_CAP_TGSI_BALLOT: + return class_3d >= NVE4_3D_CLASS; /* unsupported caps */ case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT: @@ -289,7 +291,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param) case PIPE_CAP_GLSL_OPTIMIZE_CONSERVATIVELY: case PIPE_CAP_INT64_DIVMOD: case PIPE_CAP_SPARSE_BUFFER_PAGE_SIZE: - case PIPE_CAP_TGSI_BALLOT: return 0; case PIPE_CAP_VENDOR_ID: -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
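As a rough software model of why an indexed shuffle is the building block here: readInvocationARB() reads a value from a named lane, and readFirstInvocationARB() reads from the lowest active lane, which the 5/9 commit message in this series says is found with a ballotARB(true). The sketch below assumes a 32-wide warp. All names (`Warp`, `shflIdx`, `firstActiveLane`, and so on) are hypothetical; this models the semantics only, not the hardware instruction.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWarpSize = 32;            // assumed subgroup width
using Warp = std::array<uint32_t, kWarpSize>; // one value per lane

// Model of an indexed shuffle: the result is the value held by srcLane.
static uint32_t shflIdx(const Warp &values, unsigned srcLane)
{
   return values[srcLane % kWarpSize];
}

// Lowest set bit of the active mask, i.e. the first active lane.
// The mask plays the role of the result of a ballot with an always-true input.
static unsigned firstActiveLane(uint32_t activeMask)
{
   assert(activeMask != 0);
   unsigned lane = 0;
   while (!(activeMask & (1u << lane)))
      ++lane;
   return lane;
}

static uint32_t readInvocation(const Warp &values, unsigned lane)
{
   return shflIdx(values, lane);
}

static uint32_t readFirstInvocation(const Warp &values, uint32_t activeMask)
{
   return shflIdx(values, firstActiveLane(activeMask));
}
```

Both operations reduce to the same shuffle once the source lane is known, which is consistent with the extension being gated on hardware that has the instruction.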
[Mesa-dev] [PATCH v2 6/9] nvc0/ir: Add SV_LANEMASK_* system values.
--- src/gallium/drivers/nouveau/codegen/nv50_ir.h | 5 + src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 5 + src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 5 + src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 5 + 4 files changed, 20 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h index 6e5ffa525d..de6c110536 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h @@ -470,6 +470,11 @@ enum SVSemantic SV_BASEINSTANCE, SV_DRAWID, SV_WORK_DIM, + SV_LANEMASK_EQ, + SV_LANEMASK_LT, + SV_LANEMASK_LE, + SV_LANEMASK_GT, + SV_LANEMASK_GE, SV_UNDEFINED, SV_LAST }; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index 87976ffebc..bd4bd118f4 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -2300,6 +2300,11 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref) case SV_NCTAID:return 0x2d + SDATA(ref).sv.index; case SV_LBASE: return 0x34; case SV_SBASE: return 0x30; + case SV_LANEMASK_EQ: return 0x38; + case SV_LANEMASK_LT: return 0x39; + case SV_LANEMASK_LE: return 0x3a; + case SV_LANEMASK_GT: return 0x3b; + case SV_LANEMASK_GE: return 0x3c; case SV_CLOCK: return 0x50 + SDATA(ref).sv.index; default: assert(!"no sreg for system value"); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index 0382cb3903..29426c130b 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -269,6 +269,11 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val) case SV_INVOCATION_INFO: id = 0x1d; break; case SV_TID: id = 0x21 + val->reg.data.sv.index; break; case SV_CTAID : id = 0x25 + 
val->reg.data.sv.index; break; + case SV_LANEMASK_EQ: id = 0x38; break; + case SV_LANEMASK_LT: id = 0x39; break; + case SV_LANEMASK_LE: id = 0x3a; break; + case SV_LANEMASK_GT: id = 0x3b; break; + case SV_LANEMASK_GE: id = 0x3c; break; case SV_CLOCK : id = 0x50 + val->reg.data.sv.index; break; default: assert(!"invalid system value"); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index 84c3aca1df..c549ca1158 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -1989,6 +1989,11 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref) case SV_NCTAID:return 0x2d + SDATA(ref).sv.index; case SV_LBASE: return 0x34; case SV_SBASE: return 0x30; + case SV_LANEMASK_EQ: return 0x38; + case SV_LANEMASK_LT: return 0x39; + case SV_LANEMASK_LE: return 0x3a; + case SV_LANEMASK_GT: return 0x3b; + case SV_LANEMASK_GE: return 0x3c; case SV_CLOCK: return 0x50 + SDATA(ref).sv.index; default: assert(!"no sreg for system value"); -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 5/9] nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
Implementation of readFirstInvocationARB() on nvidia hardware needs a ballotARB(true) used to decide the first active thread. This expressed in gm107 asm as (supposing output is $r0): vote any $r0 0x1 0x1 To model the always true input, which corresponds to the second 0x1 above, we make OP_VOTE accept immediate value 0/1 and emit "0x1" and "not 0x1" in the src field respectively. v2: Make sure that asImm() is not NULL (Samuel Pitoiset) --- .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 24 ++ .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 22 +--- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 24 ++ 3 files changed, 59 insertions(+), 11 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index 58076ba4d5..87976ffebc 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -1621,7 +1621,8 @@ CodeEmitterGK110::emitSHFL(const Instruction *i) void CodeEmitterGK110::emitVOTE(const Instruction *i) { - assert(i->src(0).getFile() == FILE_PREDICATE); + const ImmediateValue *imm; + uint32_t u32; code[0] = 0x0002; code[1] = 0x86c0 | (i->subOp << 19); @@ -1646,9 +1647,24 @@ CodeEmitterGK110::emitVOTE(const Instruction *i) code[0] |= 255 << 2; if (!(rp & 2)) code[1] |= 7 << 16; - if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT)) - code[1] |= 1 << 13; - srcId(i->src(0), 42); + + switch (i->src(0).getFile()) { + case FILE_PREDICATE: + if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT)) + code[0] |= 1 << 13; + srcId(i->src(0), 42); + break; + case FILE_IMMEDIATE: + imm = i->src(0).get()->asImm(); + assert(imm); + u32 = imm->reg.data.u32; + assert(u32 == 0 || u32 == 1); + code[1] |= (u32 == 1 ? 
0x7 : 0xf) << 10; + break; + default: + assert(!"Unhandled src"); + break; + } } void diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index 944563c93c..0382cb3903 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -2931,7 +2931,8 @@ CodeEmitterGM107::emitMEMBAR() void CodeEmitterGM107::emitVOTE() { - assert(insn->src(0).getFile() == FILE_PREDICATE); + const ImmediateValue *imm; + uint32_t u32; int r = -1, p = -1; for (int i = 0; insn->defExists(i); i++) { @@ -2951,8 +2952,23 @@ CodeEmitterGM107::emitVOTE() emitPRED (0x2d, insn->def(p)); else emitPRED (0x2d); - emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT)); - emitPRED (0x27, insn->src(0)); + + switch (insn->src(0).getFile()) { + case FILE_PREDICATE: + emitField(0x2a, 1, insn->src(0).mod == Modifier(NV50_IR_MOD_NOT)); + emitPRED (0x27, insn->src(0)); + break; + case FILE_IMMEDIATE: + imm = insn->src(0).get()->asImm(); + assert(imm); + u32 = imm->reg.data.u32; + assert(u32 == 0 || u32 == 1); + emitField(0x27, 4, u32 == 1 ? 
0x7 : 0xf); + break; + default: + assert(!"Unhandled src"); + break; + } } void diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index ee2d2f06c1..84c3aca1df 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -2587,7 +2587,8 @@ CodeEmitterNVC0::emitSHFL(const Instruction *i) void CodeEmitterNVC0::emitVOTE(const Instruction *i) { - assert(i->src(0).getFile() == FILE_PREDICATE); + const ImmediateValue *imm; + uint32_t u32; code[0] = 0x0004 | (i->subOp << 5); code[1] = 0x4800; @@ -2612,9 +2613,24 @@ CodeEmitterNVC0::emitVOTE(const Instruction *i) code[0] |= 63 << 14; if (!(rp & 2)) code[1] |= 7 << 22; - if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT)) - code[0] |= 1 << 23; - srcId(i->src(0), 20); + + switch (i->src(0).getFile()) { + case FILE_PREDICATE: + if (i->src(0).mod == Modifier(NV50_IR_MOD_NOT)) + code[0] |= 1 << 23; + srcId(i->src(0), 20); + break; + case FILE_IMMEDIATE: + imm = i->src(0).get()->asImm(); + assert(imm); + u32 = imm->reg.data.u32; + assert(u32 == 0 || u32 == 1); + code[0] |= (u32 == 1 ? 0x7 : 0xf) << 20; + break; + default: + assert(!"Unhandled src"); + break; + } } bool -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
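The review of the gm107 hunk suggests writing the immediate case as a negate bit plus a separate predicate field instead of one 4-bit field. A quick sketch showing that the two forms produce the same bits follows. `emitField` here is a simplified stand-in for the emitter helper, operating on a single 64-bit word, and the assumption that emitPRED(0x27) with no operand writes the value 7 into a 3-bit field is labeled as such in the code.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the emitter helper: OR a `bits`-wide value into a
// 64-bit instruction word at bit position `pos`.
static void emitField(uint64_t &insn, int pos, int bits, uint64_t value)
{
   insn |= (value & ((1ull << bits) - 1)) << pos;
}

// Form used in the patch: a single 4-bit field, 0x7 for "true", 0xf for "false".
static uint64_t encodeCombined(uint32_t imm)
{
   assert(imm == 0 || imm == 1);
   uint64_t insn = 0;
   emitField(insn, 0x27, 4, imm == 1 ? 0x7 : 0xf);
   return insn;
}

// Form suggested in review: negate bit at 0x2a, predicate field at 0x27.
static uint64_t encodeSplit(uint32_t imm)
{
   assert(imm == 0 || imm == 1);
   uint64_t insn = 0;
   emitField(insn, 0x2a, 1, imm == 0); // "not" modifier when the input is false
   emitField(insn, 0x27, 3, 7);        // assumption: what emitPRED(0x27) emits
   return insn;
}
```

Bit 0x2a is the top bit of the 4-bit field starting at 0x27, so 0xf is simply 0x7 with the negate bit set, which is why the split form mirrors the FILE_PREDICATE case.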
[Mesa-dev] [PATCH v2 7/9] nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
--- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 27 ++ 1 file changed, 27 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 3ed7d345c4..1bd01a9a32 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -450,6 +450,12 @@ static nv50_ir::SVSemantic translateSysVal(uint sysval) case TGSI_SEMANTIC_BASEINSTANCE: return nv50_ir::SV_BASEINSTANCE; case TGSI_SEMANTIC_DRAWID: return nv50_ir::SV_DRAWID; case TGSI_SEMANTIC_WORK_DIM: return nv50_ir::SV_WORK_DIM; + case TGSI_SEMANTIC_SUBGROUP_INVOCATION: return nv50_ir::SV_LANEID; + case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: return nv50_ir::SV_LANEMASK_EQ; + case TGSI_SEMANTIC_SUBGROUP_LT_MASK: return nv50_ir::SV_LANEMASK_LT; + case TGSI_SEMANTIC_SUBGROUP_LE_MASK: return nv50_ir::SV_LANEMASK_LE; + case TGSI_SEMANTIC_SUBGROUP_GT_MASK: return nv50_ir::SV_LANEMASK_GT; + case TGSI_SEMANTIC_SUBGROUP_GE_MASK: return nv50_ir::SV_LANEMASK_GE; default: assert(0); return nv50_ir::SV_CLOCK; @@ -1667,6 +1673,8 @@ private: Symbol *srcToSym(tgsi::Instruction::SrcRegister, int c); Symbol *dstToSym(tgsi::Instruction::DstRegister, int c); + bool isSubGroupMask(uint8_t semantic); + bool handleInstruction(const struct tgsi_full_instruction *); void exportOutputs(); inline Subroutine *getSubroutine(unsigned ip); @@ -1996,6 +2004,21 @@ Converter::adjustTempIndex(int arrayId, int , int ) const idx += it->second; } +bool +Converter::isSubGroupMask(uint8_t semantic) +{ + switch (semantic) { + case TGSI_SEMANTIC_SUBGROUP_EQ_MASK: + case TGSI_SEMANTIC_SUBGROUP_LT_MASK: + case TGSI_SEMANTIC_SUBGROUP_LE_MASK: + case TGSI_SEMANTIC_SUBGROUP_GT_MASK: + case TGSI_SEMANTIC_SUBGROUP_GE_MASK: + return true; + default: + return false; + } +} + Value * Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr) { @@ -2041,6 +2064,10 @@ 
Converter::fetchSrc(tgsi::Instruction::SrcRegister src, int c, Value *ptr) if (info->sv[idx].sn == TGSI_SEMANTIC_THREAD_ID && info->prop.cp.numThreads[swz] == 1) return loadImm(NULL, 0u); + if (isSubGroupMask(info->sv[idx].sn) && swz > 0) + return loadImm(NULL, 0u); + if (info->sv[idx].sn == TGSI_SEMANTIC_SUBGROUP_SIZE) + return loadImm(NULL, 32u); ld = mkOp1(OP_RDSV, TYPE_U32, getSSA(), srcToSym(src, c)); ld->perPatch = info->sv[idx].patch; return ld->getDef(0); -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
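To illustrate what the five mask system values hold and why components above the first can be folded to zero, here is a minimal sketch assuming a 32-wide subgroup (the patch itself returns 32 for TGSI_SEMANTIC_SUBGROUP_SIZE). The `LaneMasks`, `laneMasks` and `maskComponent` names are hypothetical. The masks are 64-bit values at the GLSL level, so with 32 lanes only the low word can have bits set.

```cpp
#include <cassert>
#include <cstdint>

// The five per-lane masks, low 32 bits only.
struct LaneMasks {
   uint32_t eq, lt, le, gt, ge;
};

// Compute the masks for a given lane index in a 32-wide subgroup.
static LaneMasks laneMasks(unsigned lane)
{
   assert(lane < 32);
   LaneMasks m;
   m.eq = 1u << lane;   // only this lane
   m.lt = m.eq - 1;     // all lanes below this one
   m.le = m.lt | m.eq;  // this lane and all below
   m.gt = ~m.le;        // all lanes above this one
   m.ge = ~m.lt;        // this lane and all above
   return m;
}

// Component 0 is the low word; higher components are always zero when the
// subgroup has only 32 lanes, mirroring the `swz > 0` check in the patch.
static uint32_t maskComponent(uint32_t lowWord, int swz)
{
   return swz > 0 ? 0u : lowWord;
}
```

Folding the upper components to an immediate zero avoids emitting a system-value read whose result is known in advance.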
[Mesa-dev] [PATCH v2 3/9] nvc0/ir: Emit OP_SHFL
v2: (Samuel Pitoiset) Add an assertion to check if the target is Kepler Make sure that asImm() is not NULL --- .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 59 ++ 1 file changed, 59 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index d5a310f88c..ee2d2f06c1 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -150,6 +150,8 @@ private: void emitPIXLD(const Instruction *); + void emitSHFL(const Instruction *); + void emitVOTE(const Instruction *); inline void defId(const ValueDef&, const int pos); @@ -2529,6 +2531,60 @@ CodeEmitterNVC0::emitPIXLD(const Instruction *i) } void +CodeEmitterNVC0::emitSHFL(const Instruction *i) +{ + const ImmediateValue *imm; + + assert(targ->getChipset() >= NVISA_GK104_CHIPSET); + + code[0] = 0x0005; + code[1] = 0x8800 | (i->subOp << 23); + + emitPredicate(i); + + defId(i->def(0), 14); + srcId(i->src(0), 20); + + switch (i->src(1).getFile()) { + case FILE_GPR: + srcId(i->src(1), 26); + break; + case FILE_IMMEDIATE: + imm = i->src(1).get()->asImm(); + assert(imm); + code[0] |= (imm->reg.data.u32 & 0x1f) << 26; + code[0] |= 1 << 5; + break; + default: + assert(!"invalid src1 file"); + break; + } + + switch (i->src(2).getFile()) { + case FILE_GPR: + srcId(i->src(2), 49); + break; + case FILE_IMMEDIATE: + imm = i->src(2).get()->asImm(); + assert(imm); + code[1] |= (imm->reg.data.u32 & 0x1fff) << 10; + code[0] |= 1 << 6; + break; + default: + assert(!"invalid src2 file"); + break; + } + + if (!i->defExists(1)) { + code[0] |= 3 << 8; + code[1] |= 1 << 26; + } else { + assert(i->def(1).getFile() == FILE_PREDICATE); + setPDSTL(i->def(1)); + } +} + +void CodeEmitterNVC0::emitVOTE(const Instruction *i) { assert(i->src(0).getFile() == FILE_PREDICATE); @@ -2837,6 +2893,9 @@ CodeEmitterNVC0::emitInstruction(Instruction *insn) case OP_PIXLD: emitPIXLD(insn); break; 
+ case OP_SHFL: + emitSHFL(insn); + break; case OP_VOTE: emitVOTE(insn); break; -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
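The immediate-operand paths of the nvc0 emitSHFL above can be sketched in isolation. This is only an illustration of the bit packing, assuming that bits 5 and 6 of the first word act as "source is an immediate" flags (inferred from the FILE_IMMEDIATE cases, not from hardware documentation). ShflWords and packShflImmediates are hypothetical names, not part of the emitter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the emitter's two 32-bit code words.
struct ShflWords {
   uint32_t code0 = 0;
   uint32_t code1 = 0;
};

// Packs immediate lane and bound operands using the shifts and masks from
// the nvc0 patch: 5 lane bits at bit 26 of code0, 13 bound bits at bit 10
// of code1, and one flag bit per immediate operand in code0.
inline ShflWords packShflImmediates(uint32_t lane, uint32_t bound)
{
   ShflWords w;
   w.code0 |= (lane & 0x1f) << 26;     // lane index, truncated to 5 bits
   w.code0 |= 1u << 5;                 // assumed: src1 is an immediate
   w.code1 |= (bound & 0x1fff) << 10;  // bound, truncated to 13 bits
   w.code0 |= 1u << 6;                 // assumed: src2 is an immediate
   return w;
}
```

Values wider than the field are silently truncated by the masks, which is why the emitter only asserts that asImm() is non-NULL rather than range-checking the operand.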
[Mesa-dev] [PATCH v2 1/9] gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
v2: Emit the original hard-coded 0x1c03 when OP_SHFL is used in gm107's lowering (Samuel Pitoiset) --- .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 23 ++ .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 +- 2 files changed, 29 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index c3c0dcd9fc..944563c93c 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -967,11 +967,26 @@ CodeEmitterGM107::emitSHFL() break; } - /*XXX: what is this arg? hardcode immediate for now */ - emitField(0x22, 13, 0x1c03); - type |= 2; + switch (insn->src(2).getFile()) { + case FILE_GPR: + emitGPR(0x27, insn->src(2)); + break; + case FILE_IMMEDIATE: + emitIMMD(0x22, 13, insn->src(2)); + type |= 2; + break; + default: + assert(!"invalid src2 file"); + break; + } + + if (!insn->defExists(1)) + emitPRED(0x30); + else { + assert(insn->def(1).getFile() == FILE_PREDICATE); + emitPRED(0x30, insn->def(1)); + } - emitPRED (0x30); emitField(0x1e, 2, insn->subOp); emitField(0x1c, 2, type); emitGPR (0x08, insn->src(0)); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp index 371ebae40c..6b9edd4864 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_gm107.cpp @@ -41,6 +41,8 @@ namespace nv50_ir { ((QOP_##q << 6) | (QOP_##r << 4) | \ (QOP_##s << 2) | (QOP_##t << 0)) +#define SHFL_BOUND_QUAD 0x1c03 + void GM107LegalizeSSA::handlePFETCH(Instruction *i) { @@ -120,7 +122,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) // mov coordinates from lane l to all lanes bld.mkOp(OP_QUADON, TYPE_NONE, NULL); for (c = 0; c < dim; ++c) { - bld.mkOp2(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array), bld.mkImm(l)); + 
bld.mkOp3(OP_SHFL, TYPE_F32, crd[c], i->getSrc(c + array), + bld.mkImm(l), bld.mkImm(SHFL_BOUND_QUAD)); add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], crd[c], zero); add->subOp = 0x00; add->lanes = 1; /* abused for .ndv */ @@ -128,7 +131,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) // add dPdx from lane l to lanes dx for (c = 0; c < dim; ++c) { - bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l)); + bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdx[c].get(), bld.mkImm(l), + bld.mkImm(SHFL_BOUND_QUAD)); add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]); add->subOp = qOps[l][0]; add->lanes = 1; /* abused for .ndv */ @@ -136,7 +140,8 @@ GM107LoweringPass::handleManualTXD(TexInstruction *i) // add dPdy from lane l to lanes dy for (c = 0; c < dim; ++c) { - bld.mkOp2(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l)); + bld.mkOp3(OP_SHFL, TYPE_F32, tmp, i->dPdy[c].get(), bld.mkImm(l), + bld.mkImm(SHFL_BOUND_QUAD)); add = bld.mkOp2(OP_QUADOP, TYPE_F32, crd[c], tmp, crd[c]); add->subOp = qOps[l][1]; add->lanes = 1; /* abused for .ndv */ @@ -203,8 +208,8 @@ GM107LoweringPass::handleDFDX(Instruction *insn) break; } - shfl = bld.mkOp2(OP_SHFL, TYPE_F32, bld.getScratch(), -insn->getSrc(0), bld.mkImm(xid)); + shfl = bld.mkOp3(OP_SHFL, TYPE_F32, bld.getScratch(), insn->getSrc(0), +bld.mkImm(xid), bld.mkImm(SHFL_BOUND_QUAD)); shfl->subOp = NV50_IR_SUBOP_SHFL_BFLY; insn->op = OP_QUADOP; insn->subOp = qop; -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 2/9] nvc0/ir: Properly handle a "split form" of predicate destination
GF100's ISA encoding has an unusual form of predicate destination whose 3 bits
are split across the instruction: the low two bits sit in code[0] and the high
bit in code[1]. Use a dedicated setPDSTL function instead of the original
defId, which is incorrect in this case.
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index 5467447e35..d5a310f88c 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -58,6 +58,7 @@ private:
    void setImmediateS8(const ValueRef&);
    void setSUConst16(const Instruction *, const int s);
    void setSUPred(const Instruction *, const int s);
+   inline void setPDSTL(const ValueDef&);
 
    void emitCondCode(CondCode cc, int pos);
    void emitInterpMode(const Instruction *);
@@ -375,6 +376,14 @@ void CodeEmitterNVC0::setImmediateS8(const ValueRef &ref)
    code[0] |= (s8 >> 6) << 8;
 }
 
+void CodeEmitterNVC0::setPDSTL(const ValueDef &def)
+{
+   uint32_t pred = (def.get() && def.getFile() != FILE_FLAGS ? DDATA(def).id : 7);
+
+   code[0] |= (pred & 3) << 8;
+   code[1] |= !!(pred & 4) << 26;
+}
+
 void
 CodeEmitterNVC0::emitForm_A(const Instruction *i, uint64_t opc)
 {
@@ -1873,7 +1882,7 @@ CodeEmitterNVC0::emitSTORE(const Instruction *i)
    if (i->src(0).getFile() == FILE_MEMORY_SHARED &&
        i->subOp == NV50_IR_SUBOP_STORE_UNLOCKED) {
       assert(i->defExists(0));
-      defId(i->def(0), 8);
+      setPDSTL(i->def(0));
    }
 }
 
@@ -1945,7 +1954,7 @@ CodeEmitterNVC0::emitLOAD(const Instruction *i)
 
    if (p >= 0) {
       if (targ->getChipset() >= NVISA_GK104_CHIPSET)
-         defId(i->def(p), 8);
+         setPDSTL(i->def(p));
       else
          defId(i->def(p), 32 + 18);
    }
-- 
2.12.1
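As a rough illustration of the split predicate destination, the sketch below encodes a 3-bit predicate id into the two positions used by the patch (bits 8..9 of the first word, bit 26 of the second) and decodes it again. PredWords, encodeSplitPred and decodeSplitPred are hypothetical names for this example only. The high bit is taken from bit 2 of the id alone, so that ids 1 to 3 leave bit 26 clear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container holding only the predicate destination bits of
// the two 32-bit code words.
struct PredWords {
   uint32_t code0;
   uint32_t code1;
};

// Splits a 3-bit predicate id: low two bits at bit 8 of code0, high bit
// at bit 26 of code1 (positions taken from the patch).
inline PredWords encodeSplitPred(uint32_t pred)
{
   PredWords w{0, 0};
   w.code0 |= (pred & 3) << 8;
   w.code1 |= ((pred >> 2) & 1) << 26;
   return w;
}

// Inverse of encodeSplitPred, used to check the round trip.
inline uint32_t decodeSplitPred(PredWords w)
{
   return ((w.code0 >> 8) & 3) | (((w.code1 >> 26) & 1) << 2);
}
```

Id 7, which the emitters use when no predicate is written, sets all three bits; that matches the `3 << 8` and `1 << 26` values the nvc0 SHFL emitter writes when the second def does not exist.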
[Mesa-dev] [PATCH 0/9] nvc0: ARB_shader_ballot for Kepler+ (v2)
This is the v2 series of my ARB_shader_ballot enablement. It adds fixes based
on Samuel Pitoiset's feedback, mainly adapting the existing OP_SHFL usage in
gm107's lowering to the new three-source form and adding several assertion
checks. It is also rebased against current master.

Boyan Ding (9):
  gm107/ir: Emit third src 'bound' and optional predicate output of SHFL
  nvc0/ir: Properly handle a "split form" of predicate destination
  nvc0/ir: Emit OP_SHFL
  gk110/ir: Emit OP_SHFL
  nvc0/ir: Allow 0/1 immediate value as source of OP_VOTE
  nvc0/ir: Add SV_LANEMASK_* system values.
  nvc0/ir: Implement TGSI_SEMANTIC_SUBGROUP_*
  nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
  nvc0: Enable ARB_shader_ballot on Kepler+

 docs/features.txt | 2 +-
 docs/relnotes/17.1.0.html | 2 +-
 src/gallium/drivers/nouveau/codegen/nv50_ir.h | 5 +
 .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 85 -
 .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 50 --
 .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 101 +++--
 .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 55 +++
 .../nouveau/codegen/nv50_ir_lowering_gm107.cpp | 15 ++-
 src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 3 +-
 9 files changed, 293 insertions(+), 25 deletions(-)

-- 
2.12.1
[Mesa-dev] [PATCH v2 8/9] nvc0/ir: Implement TGSI_OPCODE_BALLOT and TGSI_OPCODE_READ_*
--- .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp | 28 ++ 1 file changed, 28 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp index 1bd01a9a32..2ce6f29905 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp @@ -978,6 +978,10 @@ static nv50_ir::operation translateOpcode(uint opcode) NV50_IR_OPCODE_CASE(VOTE_ANY, VOTE); NV50_IR_OPCODE_CASE(VOTE_EQ, VOTE); + NV50_IR_OPCODE_CASE(BALLOT, VOTE); + NV50_IR_OPCODE_CASE(READ_INVOC, SHFL); + NV50_IR_OPCODE_CASE(READ_FIRST, SHFL); + NV50_IR_OPCODE_CASE(END, EXIT); default: @@ -3431,6 +3435,30 @@ Converter::handleInstruction(const struct tgsi_full_instruction *insn) mkCvt(OP_CVT, TYPE_U32, dst0[c], TYPE_U8, val0); } break; + case TGSI_OPCODE_BALLOT: + val0 = new_LValue(func, FILE_PREDICATE); + mkCmp(OP_SET, CC_NE, TYPE_U32, val0, TYPE_U32, fetchSrc(0, 0), zero); + mkOp1(op, TYPE_U32, dst0[0], val0)->subOp = NV50_IR_SUBOP_VOTE_ANY; + mkMov(dst0[1], zero, TYPE_U32); + break; + case TGSI_OPCODE_READ_FIRST: + // ReadFirstInvocationARB(src) is implemented as + // ReadInvocationARB(src, findLSB(ballot(true))) + val0 = getScratch(); + mkOp1(OP_VOTE, TYPE_U32, val0, mkImm(1))->subOp = NV50_IR_SUBOP_VOTE_ANY; + mkOp2(OP_EXTBF, TYPE_U32, val0, val0, mkImm(0x2000)) + ->subOp = NV50_IR_SUBOP_EXTBF_REV; + mkOp1(OP_BFIND, TYPE_U32, val0, val0)->subOp = NV50_IR_SUBOP_BFIND_SAMT; + src1 = val0; + /* fallthrough */ + case TGSI_OPCODE_READ_INVOC: + if (tgsi.getOpcode() == TGSI_OPCODE_READ_INVOC) + src1 = fetchSrc(1, 0); + FOR_EACH_DST_ENABLED_CHANNEL(0, c, tgsi) { + geni = mkOp3(op, dstTy, dst0[c], fetchSrc(0, c), src1, mkImm(0x1f)); + geni->subOp = NV50_IR_SUBOP_SHFL_IDX; + } + break; case TGSI_OPCODE_CLOCK: // Stick the 32-bit clock into the high dword of the logical result. 
if (!tgsi.getDst(0).isMasked(0)) -- 2.12.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
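The comment in the TGSI_OPCODE_READ_FIRST case states the identity ReadFirstInvocationARB(src) = ReadInvocationARB(src, findLSB(ballot(true))). A minimal CPU-side sketch of that identity follows, assuming a 32-lane warp and modelling the set of active lanes as a plain bitmask. All names here are hypothetical and none of this is driver code. The patch itself appears to obtain findLSB by reversing the ballot bits and then searching for the highest set bit; the sketch uses a simple loop instead.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kWarpSize = 32;  // assumption: 32 lanes per warp

// ballot(true): every active lane votes true, so the result is the
// active-lane mask itself.
inline uint32_t ballotTrue(uint32_t activeMask)
{
   return activeMask;
}

// Index of the lowest set bit; the mask must be non-zero.
inline int findLSB(uint32_t mask)
{
   int index = 0;
   while (!(mask & 1u)) {
      mask >>= 1;
      ++index;
   }
   return index;
}

// Reads the value held by one lane; the index is wrapped to 5 bits.
inline uint32_t readInvocation(const std::array<uint32_t, kWarpSize> &values, int lane)
{
   return values[lane & 0x1f];
}

// Reads the value held by the lowest-numbered active lane.
inline uint32_t readFirstInvocation(const std::array<uint32_t, kWarpSize> &values,
                                    uint32_t activeMask)
{
   return readInvocation(values, findLSB(ballotTrue(activeMask)));
}
```

With lanes 4..31 active, the first active lane is lane 4, so every lane receives lane 4's value.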
[Mesa-dev] [PATCH v2 4/9] gk110/ir: Emit OP_SHFL
v2: Make sure that asImm() is not NULL (Samuel Pitoiset) --- .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 56 ++ 1 file changed, 56 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index 1121ae0912..58076ba4d5 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -135,6 +135,8 @@ private: void emitFlow(const Instruction *); + void emitSHFL(const Instruction *); + void emitVOTE(const Instruction *); void emitSULDGB(const TexInstruction *); @@ -1566,6 +1568,57 @@ CodeEmitterGK110::emitFlow(const Instruction *i) } void +CodeEmitterGK110::emitSHFL(const Instruction *i) +{ + const ImmediateValue *imm; + + code[0] = 0x0002; + code[1] = 0x7880 | (i->subOp << 1); + + emitPredicate(i); + + defId(i->def(0), 2); + srcId(i->src(0), 10); + + switch (i->src(1).getFile()) { + case FILE_GPR: + srcId(i->src(1), 23); + break; + case FILE_IMMEDIATE: + imm = i->src(1).get()->asImm(); + assert(imm); + code[0] |= (imm->reg.data.u32 & 0x1f) << 23; + code[0] |= 1 << 31; + break; + default: + assert(!"invalid src1 file"); + break; + } + + switch (i->src(2).getFile()) { + case FILE_GPR: + srcId(i->src(2), 42); + break; + case FILE_IMMEDIATE: + imm = i->src(2).get()->asImm(); + assert(imm); + code[1] |= (imm->reg.data.u32 & 0x1fff) << 5; + code[1] |= 1; + break; + default: + assert(!"invalid src2 file"); + break; + } + + if (!i->defExists(1)) + code[1] |= 7 << 19; + else { + assert(i->def(1).getFile() == FILE_PREDICATE); + defId(i->def(1), 51); + } +} + +void CodeEmitterGK110::emitVOTE(const Instruction *i) { assert(i->src(0).getFile() == FILE_PREDICATE); @@ -2642,6 +2695,9 @@ CodeEmitterGK110::emitInstruction(Instruction *insn) case OP_CCTL: emitCCTL(insn); break; + case OP_SHFL: + emitSHFL(insn); + break; case OP_VOTE: emitVOTE(insn); break; -- 2.12.1 ___ mesa-dev mailing list 
mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] radv: Use a shader for occlusion CmdCopyQueryPoolResults.
Use the new occlusion query copy shader. We don't use the shader for the waiting as a polling loop ineracts badly with having caching enabled. I noticed on my GPU (Tonga) that the values are written out in order, so I just use a WAIT_REG_MEM on the last value. If it turns out other chips don't do that we may need to look a bit more into this. Having 8 WAIT_REG_MEM packets per query doesn't sound ideal. This also restricts the availability word in the pool to timestamp queries only, as occlusion queries don't use it, and pipeline statistic queries likely won't either. Signed-off-by: Bas Nieuwenhuizen--- src/amd/vulkan/radv_query.c | 138 1 file changed, 64 insertions(+), 74 deletions(-) diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c index 5b1fff4eeaa..86be85a5369 100644 --- a/src/amd/vulkan/radv_query.c +++ b/src/amd/vulkan/radv_query.c @@ -486,9 +486,7 @@ VkResult radv_CreateQueryPool( switch(pCreateInfo->queryType) { case VK_QUERY_TYPE_OCCLUSION: - /* 16 bytes tmp. buffer as the compute packet writes 64 bits, but -* the app. may have 32 bits of space. */ - pool->stride = 16 * get_max_db(device) + 16; + pool->stride = 16 * get_max_db(device); break; case VK_QUERY_TYPE_PIPELINE_STATISTICS: pool->stride = 16 * 11; @@ -502,7 +500,9 @@ VkResult radv_CreateQueryPool( pool->type = pCreateInfo->queryType; pool->availability_offset = pool->stride * pCreateInfo->queryCount; - size = pool->availability_offset + 4 * pCreateInfo->queryCount; + size = pool->availability_offset; + if (pCreateInfo->queryType == VK_QUERY_TYPE_TIMESTAMP) + size += 4 * pCreateInfo->queryCount; pool->bo = device->ws->buffer_create(device->ws, size, 64, RADEON_DOMAIN_GTT, 0); @@ -649,6 +649,7 @@ void radv_CmdCopyQueryPoolResults( RADV_FROM_HANDLE(radv_query_pool, pool, queryPool); RADV_FROM_HANDLE(radv_buffer, dst_buffer, dstBuffer); struct radeon_winsys_cs *cs = cmd_buffer->cs; + unsigned elem_size = (flags & VK_QUERY_RESULT_64_BIT) ? 
8 : 4; uint64_t va = cmd_buffer->device->ws->buffer_get_va(pool->bo); uint64_t dest_va = cmd_buffer->device->ws->buffer_get_va(dst_buffer->bo); dest_va += dst_buffer->offset + dstOffset; @@ -656,33 +657,62 @@ void radv_CmdCopyQueryPoolResults( cmd_buffer->device->ws->cs_add_buffer(cmd_buffer->cs, pool->bo, 8); cmd_buffer->device->ws->cs_add_buffer(cmd_buffer->cs, dst_buffer->bo, 8); - for(unsigned i = 0; i < queryCount; ++i, dest_va += stride) { - unsigned query = firstQuery + i; - uint64_t local_src_va = va + query * pool->stride; - unsigned elem_size = (flags & VK_QUERY_RESULT_64_BIT) ? 8 : 4; - - MAYBE_UNUSED unsigned cdw_max = radeon_check_space(cmd_buffer->device->ws, cs, 26); - + switch (pool->type) { + case VK_QUERY_TYPE_OCCLUSION: if (flags & VK_QUERY_RESULT_WAIT_BIT) { - /* TODO, not sure if there is any case where we won't always be ready yet */ - uint64_t avail_va = va + pool->availability_offset + 4 * query; - - - /* This waits on the ME. All copies below are done on the ME */ - radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0)); - radeon_emit(cs, WAIT_REG_MEM_EQUAL | WAIT_REG_MEM_MEM_SPACE(1)); - radeon_emit(cs, avail_va); - radeon_emit(cs, avail_va >> 32); - radeon_emit(cs, 1); /* reference value */ - radeon_emit(cs, 0x); /* mask */ - radeon_emit(cs, 4); /* poll interval */ + for(unsigned i = 0; i < queryCount; ++i, dest_va += stride) { + unsigned query = firstQuery + i; + uint64_t src_va = va + query * pool->stride + pool->stride - 4; + + /* Waits on the upper word of the last DB entry */ + radeon_emit(cs, PKT3(PKT3_WAIT_REG_MEM, 5, 0)); + radeon_emit(cs, /*WAIT_REG_MEM_EQUAL*/ 5 | WAIT_REG_MEM_MEM_SPACE(1)); + radeon_emit(cs, src_va); + radeon_emit(cs, src_va >> 32); + radeon_emit(cs, 0x8000); /* reference value */ + radeon_emit(cs, 0x); /* mask */ + radeon_emit(cs, 4); /* poll interval */ + } } + occlusion_query_shader(cmd_buffer, pool->bo, dst_buffer->bo, + firstQuery *
[Mesa-dev] [PATCH 1/2] radv: Add occlusion query shader.
Adds a shader for writing occlusion query results to a buffer, as the CP packet isn't support on SI or secondary buffers, and doesn't handle the availability bit (or partial results) nor truncation to 32-bit. Signed-off-by: Bas Nieuwenhuizen--- src/amd/vulkan/radv_meta.c| 7 + src/amd/vulkan/radv_meta.h| 3 + src/amd/vulkan/radv_private.h | 6 + src/amd/vulkan/radv_query.c | 419 ++ 4 files changed, 435 insertions(+) diff --git a/src/amd/vulkan/radv_meta.c b/src/amd/vulkan/radv_meta.c index 04fa247dd36..0098e0844c1 100644 --- a/src/amd/vulkan/radv_meta.c +++ b/src/amd/vulkan/radv_meta.c @@ -324,6 +324,10 @@ radv_device_init_meta(struct radv_device *device) if (result != VK_SUCCESS) goto fail_buffer; + result = radv_device_init_meta_query_state(device); + if (result != VK_SUCCESS) + goto fail_query; + result = radv_device_init_meta_fast_clear_flush_state(device); if (result != VK_SUCCESS) goto fail_fast_clear; @@ -337,6 +341,8 @@ fail_resolve_compute: radv_device_finish_meta_fast_clear_flush_state(device); fail_fast_clear: radv_device_finish_meta_buffer_state(device); +fail_query: + radv_device_finish_meta_query_state(device); fail_buffer: radv_device_finish_meta_depth_decomp_state(device); fail_depth_decomp: @@ -363,6 +369,7 @@ radv_device_finish_meta(struct radv_device *device) radv_device_finish_meta_blit2d_state(device); radv_device_finish_meta_bufimage_state(device); radv_device_finish_meta_depth_decomp_state(device); + radv_device_finish_meta_query_state(device); radv_device_finish_meta_buffer_state(device); radv_device_finish_meta_fast_clear_flush_state(device); radv_device_finish_meta_resolve_compute_state(device); diff --git a/src/amd/vulkan/radv_meta.h b/src/amd/vulkan/radv_meta.h index d70fef1e5f1..6cfc6134c53 100644 --- a/src/amd/vulkan/radv_meta.h +++ b/src/amd/vulkan/radv_meta.h @@ -85,6 +85,9 @@ void radv_device_finish_meta_blit2d_state(struct radv_device *device); VkResult radv_device_init_meta_buffer_state(struct radv_device *device); void 
radv_device_finish_meta_buffer_state(struct radv_device *device); +VkResult radv_device_init_meta_query_state(struct radv_device *device); +void radv_device_finish_meta_query_state(struct radv_device *device); + VkResult radv_device_init_meta_resolve_compute_state(struct radv_device *device); void radv_device_finish_meta_resolve_compute_state(struct radv_device *device); void radv_meta_save(struct radv_meta_saved_state *state, diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h index 580c1197e64..a03c24c24ac 100644 --- a/src/amd/vulkan/radv_private.h +++ b/src/amd/vulkan/radv_private.h @@ -438,6 +438,12 @@ struct radv_meta_state { VkPipeline fill_pipeline; VkPipeline copy_pipeline; } buffer; + + struct { + VkDescriptorSetLayout occlusion_query_ds_layout; + VkPipelineLayout occlusion_query_p_layout; + VkPipeline occlusion_query_pipeline; + } query; }; /* queue types */ diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c index 288bd43a763..5b1fff4eeaa 100644 --- a/src/amd/vulkan/radv_query.c +++ b/src/amd/vulkan/radv_query.c @@ -29,6 +29,8 @@ #include #include +#include "nir/nir_builder.h" +#include "radv_meta.h" #include "radv_private.h" #include "radv_cs.h" #include "sid.h" @@ -49,6 +51,423 @@ static unsigned get_max_db(struct radv_device *device) return num_db; } +static void radv_break_on_count(nir_builder *b, nir_variable *var, int count) +{ + nir_ssa_def *counter = nir_load_var(b, var); + + nir_if *if_stmt = nir_if_create(b->shader); + if_stmt->condition = nir_src_for_ssa(nir_uge(b, counter, nir_imm_int(b, count))); + nir_cf_node_insert(b->cursor, _stmt->cf_node); + + b->cursor = nir_after_cf_list(_stmt->then_list); + + nir_jump_instr *instr = nir_jump_instr_create(b->shader, nir_jump_break); + nir_builder_instr_insert(b, >instr); + + b->cursor = nir_after_cf_node(_stmt->cf_node); + counter = nir_iadd(b, counter, nir_imm_int(b, 1)); + nir_store_var(b, var, counter, 0x1); +} + +static struct nir_ssa_def * 
+radv_load_push_int(nir_builder *b, unsigned offset, const char *name) +{ + nir_intrinsic_instr *flags = nir_intrinsic_instr_create(b->shader, nir_intrinsic_load_push_constant); + flags->src[0] = nir_src_for_ssa(nir_imm_int(b, offset)); + flags->num_components = 1; + nir_ssa_dest_init(>instr, >dest, 1, 32, name); + nir_builder_instr_insert(b, >instr); + return >dest.ssa; +} + +static nir_shader * +build_occlusion_query_shader(struct radv_device *device) { + /* the shader this builds is roughly +* +* push constants { +
[Mesa-dev] [PATCH 2/3] r600g: add draw_vbo check for a NULL pixel shader
Taken from radeonsi, required to remove dummy pixel shader in the next patch Signed-off-by: Constantine Kharlamov--- src/gallium/drivers/r600/evergreen_state.c | 1 + src/gallium/drivers/r600/r600_pipe.h | 1 + src/gallium/drivers/r600/r600_state.c| 3 ++- src/gallium/drivers/r600/r600_state_common.c | 7 ++- 4 files changed, 10 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 371e7ce212..5697da4af9 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -471,6 +471,7 @@ static void *evergreen_create_rs_state(struct pipe_context *ctx, rs->clip_halfz = state->clip_halfz; rs->flatshade = state->flatshade; rs->sprite_coord_enable = state->sprite_coord_enable; + rs->rasterizer_discard = state->rasterizer_discard; rs->two_side = state->light_twoside; rs->clip_plane_enable = state->clip_plane_enable; rs->pa_sc_line_stipple = state->line_stipple_enable ? diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 86634b8681..7f1ecc278b 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -279,6 +279,7 @@ struct r600_rasterizer_state { boolscissor_enable; boolmultisample_enable; boolclip_halfz; + boolrasterizer_discard; }; struct r600_poly_offset_state { diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 1f7e9b3aa5..06100abc4a 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -470,6 +470,7 @@ static void *r600_create_rs_state(struct pipe_context *ctx, rs->clip_halfz = state->clip_halfz; rs->flatshade = state->flatshade; rs->sprite_coord_enable = state->sprite_coord_enable; + rs->rasterizer_discard = state->rasterizer_discard; rs->two_side = state->light_twoside; rs->clip_plane_enable = state->clip_plane_enable; rs->pa_sc_line_stipple = state->line_stipple_enable ? 
@@ -622,7 +623,7 @@ static void *r600_create_sampler_state(struct pipe_context *ctx, static struct pipe_sampler_view * texture_buffer_sampler_view(struct r600_pipe_sampler_view *view, unsigned width0, unsigned height0) - + { struct r600_texture *tmp = (struct r600_texture*)view->base.texture; int stride = util_format_get_blocksize(view->base.format); diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 94f85e6dd3..c9b41517cc 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -1708,7 +1708,12 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info return; } - if (unlikely(!rctx->vs_shader || !rctx->ps_shader)) { + if (unlikely(!rctx->vs_shader)) { + assert(0); + return; + } + if (unlikely(!rctx->ps_shader && +(!rctx->rasterizer || !rctx->rasterizer->rasterizer_discard))) { assert(0); return; } -- 2.12.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
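The new draw_vbo condition can be read as a small predicate: a vertex shader is always required, while a missing pixel shader is acceptable only when a rasterizer state is bound and has rasterizer_discard set. The sketch below restates that logic with hypothetical, trimmed-down structures (RasterizerState, DrawContext, canDraw); it is not the r600 code itself.

```cpp
#include <cassert>

// Hypothetical stand-ins for the driver structures; only the fields the
// check reads are modelled.
struct RasterizerState {
   bool rasterizer_discard;
};

struct DrawContext {
   const void *vs_shader;
   const void *ps_shader;
   const RasterizerState *rasterizer;
};

// Returns true when a draw may proceed under the rules of the patch.
inline bool canDraw(const DrawContext &ctx)
{
   if (!ctx.vs_shader)
      return false;  // a vertex shader is always required

   // A NULL pixel shader is only valid when rasterization is discarded.
   if (!ctx.ps_shader &&
       (!ctx.rasterizer || !ctx.rasterizer->rasterizer_discard))
      return false;

   return true;
}
```

Checking the rasterizer pointer before reading rasterizer_discard keeps the test safe when no rasterizer state has been bound yet.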
[Mesa-dev] [PATCH 1/3 v2] r600g: skip repeating vs, gs, and tes shader binds
The idea is taken from radeonsi. The code lacks some checks for a null vs, and
I'm unsure about changing that, so I left the existing null check in place.

Some statistics for GTAⅣ:
Average tessellation shader bind skips per frame: ≈350
Average geometry shader bind skips per frame: ≈260
Vertex shader bind skips occur too rarely to register in the per-frame counter
at all, so I'll just say: they happen.

v2: I had accidentally removed an empty line; don't do that.

Signed-off-by: Constantine Kharlamov
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
-	if (!state)
+	if (!state || rctx->vs_shader == state)
 		return;
 
 	rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->gs_shader)
+		return;
+
 	rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->tes_shader)
+		return;
+
 	rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2
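The bind-skipping idea can be sketched outside the driver as follows. This is an illustration under stated assumptions, not the r600 code: Shader, BindContext and bindGs are hypothetical names, and the two counters exist only to make the effect observable. Unlike the gs and tes hunks in the patch, the sketch keeps a null guard before reading the stride, because a NULL bind that differs from the current shader would otherwise dereference the new (NULL) pointer.

```cpp
#include <cassert>

// Hypothetical, trimmed-down shader object.
struct Shader {
   unsigned so_stride;
};

// Hypothetical context holding one shader stage and derived state.
struct BindContext {
   const Shader *gs_shader = nullptr;
   unsigned stride_in_dw = 0;
   unsigned skipped_binds = 0;  // statistics only
   unsigned state_updates = 0;  // stands in for re-deriving dependent state
};

inline void bindGs(BindContext &ctx, const Shader *state)
{
   // Rebinding the shader that is already bound changes nothing, so skip
   // the dependent-state update entirely.
   if (state == ctx.gs_shader) {
      ++ctx.skipped_binds;
      return;
   }

   ctx.gs_shader = state;
   ++ctx.state_updates;

   // Guard: unbinding (state == nullptr) must not read from the shader.
   if (!state)
      return;

   ctx.stride_in_dw = state->so_stride;
}
```

Counting skips this way is one simple means of collecting per-frame statistics like those quoted in the commit message.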
[Mesa-dev] [PATCH 0/3 v2] r600g: shader logic improvements
Although I didn't see a statistically significant change in the GTAⅣ benchmark,
the series seems to have reduced the stall when opening the door from a house
to the outside world at the first savepoint. No changes in the gpu.py piglit
tests in gbm mode.

v2: The first patch had accidentally removed an empty line; that is undone.
The third patch gains a check I missed because the macro uses a prefix to
choose the shader.

To be honest, I would rather split the ps-related logic out of
r600_update_derived_state(), but after more than an hour of looking into it,
and understanding only half of the logic, I gave up.

Constantine Kharlamov (3):
  r600g: skip repeating vs, gs, and tes shader binds
  r600g: add draw_vbo check for a NULL pixel shader
  r600g: get rid of dummy pixel shader

 src/gallium/drivers/r600/evergreen_state.c | 1 +
 src/gallium/drivers/r600/r600_pipe.c | 9
 src/gallium/drivers/r600/r600_pipe.h | 4 +-
 src/gallium/drivers/r600/r600_state.c | 3 +-
 src/gallium/drivers/r600/r600_state_common.c | 77
 5 files changed, 47 insertions(+), 47 deletions(-)

-- 
2.12.2
[Mesa-dev] [PATCH 3/3 v2] r600g: get rid of dummy pixel shader
The idea is taken from radeonsi. The code mostly was already checking for null pixel shader, so little checks had to be added. Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens a lot at the start (then just stops), but draw_vbo() never actually sees null ps. v2: added a check I missed because of a macros using a prefix to choose a shader. Signed-off-by: Constantine Kharlamov--- src/gallium/drivers/r600/r600_pipe.c | 9 - src/gallium/drivers/r600/r600_pipe.h | 3 -- src/gallium/drivers/r600/r600_state_common.c | 58 ++-- 3 files changed, 30 insertions(+), 40 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 5014f2525c..7d8efd2c9b 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context *context) if (rctx->fixed_func_tcs_shader) rctx->b.b.delete_tcs_state(>b.b, rctx->fixed_func_tcs_shader); - if (rctx->dummy_pixel_shader) { - rctx->b.b.delete_fs_state(>b.b, rctx->dummy_pixel_shader); - } if (rctx->custom_dsa_flush) { rctx->b.b.delete_depth_stencil_alpha_state(>b.b, rctx->custom_dsa_flush); } @@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, r600_begin_new_cs(rctx); - rctx->dummy_pixel_shader = - util_make_fragment_cloneinput_shader(>b.b, 0, -TGSI_SEMANTIC_GENERIC, -TGSI_INTERPOLATE_CONSTANT); - rctx->b.b.bind_fs_state(>b.b, rctx->dummy_pixel_shader); - return >b.b; fail: diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 7f1ecc278b..e636ef0024 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -432,9 +432,6 @@ struct r600_context { void*custom_blend_resolve; void*custom_blend_decompress; void*custom_blend_fastclear; - /* With rasterizer discard, there doesn't have to be a pixel shader. 
-* In that case, we bind this one: */ - void*dummy_pixel_shader; /* These dummy CMASK and FMASK buffers are used to get around the R6xx hardware * bug where valid CMASK and FMASK are required to be present to avoid * a hardlock in certain operations but aren't actually used diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index c9b41517cc..8d1193360b 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct pipe_context *ctx, if (!key->vs.as_ls) key->vs.as_es = (rctx->gs_shader != NULL); - if (rctx->ps_shader->current->shader.gs_prim_id_input && !rctx->gs_shader) { + if (rctx->ps_shader && rctx->ps_shader->current->shader.gs_prim_id_input && + !rctx->gs_shader) { key->vs.as_gs_a = true; key->vs.prim_id_out = rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid; } @@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, void *state) { struct r600_context *rctx = (struct r600_context *)ctx; - if (!state) - state = rctx->dummy_pixel_shader; - rctx->ps_shader = (struct r600_pipe_shader_selector *)state; } @@ -1474,7 +1472,8 @@ static bool r600_update_derived_state(struct r600_context *rctx) } } - SELECT_SHADER_OR_FAIL(ps); + if (rctx->ps_shader) + SELECT_SHADER_OR_FAIL(ps); r600_mark_atom_dirty(rctx, >shader_stages.atom); @@ -1551,37 +1550,40 @@ static bool r600_update_derived_state(struct r600_context *rctx) rctx->b.streamout.enabled_stream_buffers_mask = clip_so_current->enabled_stream_buffers_mask; } - if (unlikely(ps_dirty || rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current || - rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable || - rctx->rasterizer->flatshade != rctx->ps_shader->current->flatshade)) { + if (rctx->ps_shader) { + if (unlikely((ps_dirty || 
rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current || + rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable || +
Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.
On 09/04/17 17:23, Jason Ekstrand wrote: On April 9, 2017 8:48:31 AM Lionel Landwerlinwrote: I have one suggestion at the bottom of the patch, otherwise : Reviewed-by: Lionel Landwerlin On 07/04/17 17:52, Rafael Antognolli wrote: We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. Signed-off-by: Rafael Antognolli --- src/intel/blorp/blorp_genX_exec.h | 35 - src/intel/genxml/gen6.xml | 4 +- src/intel/genxml/gen7.xml | 4 +- src/intel/genxml/gen75.xml| 4 +- src/intel/genxml/gen8.xml | 4 +- src/intel/genxml/gen9.xml | 4 +- src/intel/vulkan/genX_pipeline.c | 53 7 files changed, 58 insertions(+), 50 deletions(-) diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h index 3791462..fc1856f 100644 --- a/src/intel/blorp/blorp_genX_exec.h +++ b/src/intel/blorp/blorp_genX_exec.h @@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch, struct GENX(BLEND_STATE) blend; memset(, 0, sizeof(blend)); + uint32_t offset; + int size = GENX(BLEND_STATE_length) * 4; + size += GENX(BLEND_STATE_ENTRY_length) * 4 * params->num_draw_buffers; + uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, ); + uint32_t *pos = state; + + GENX(BLEND_STATE_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_length); + for (unsigned i = 0; i < params->num_draw_buffers; ++i) { - blend.Entry[i].PreBlendColorClampEnable = true; - blend.Entry[i].PostBlendColorClampEnable = true; - blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT; - - 
blend.Entry[i].WriteDisableRed = params->color_write_disable[0]; - blend.Entry[i].WriteDisableGreen = params->color_write_disable[1]; - blend.Entry[i].WriteDisableBlue = params->color_write_disable[2]; - blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3]; + struct GENX(BLEND_STATE_ENTRY) entry = { 0 }; + entry.PreBlendColorClampEnable = true; + entry.PostBlendColorClampEnable = true; + entry.ColorClampRange = COLORCLAMP_RTFORMAT; + + entry.WriteDisableRed = params->color_write_disable[0]; + entry.WriteDisableGreen = params->color_write_disable[1]; + entry.WriteDisableBlue = params->color_write_disable[2]; + entry.WriteDisableAlpha = params->color_write_disable[3]; + GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_ENTRY_length); } - uint32_t offset; - void *state = blorp_alloc_dynamic_state(batch, - GENX(BLEND_STATE_length) * 4, - 64, ); - GENX(BLEND_STATE_pack)(NULL, state, ); - blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4); + blorp_flush_range(batch, state, size); #if GEN_GEN >= 7 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) { diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml index 5083f07..3059bfc 100644 --- a/src/intel/genxml/gen6.xml +++ b/src/intel/genxml/gen6.xml @@ -452,8 +452,8 @@ end="32" type="bool"/> - - + + type="BLEND_STATE_ENTRY"/> diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml index ada8f74..867a1d4 100644 --- a/src/intel/genxml/gen7.xml +++ b/src/intel/genxml/gen7.xml @@ -507,8 +507,8 @@ end="32" type="bool"/> - - + + type="BLEND_STATE_ENTRY"/> diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml index 16d2d74..594e539 100644 --- a/src/intel/genxml/gen75.xml +++ b/src/intel/genxml/gen75.xml @@ -517,8 +517,8 @@ end="32" type="bool"/> - - + + type="BLEND_STATE_ENTRY"/> diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml index 1390fe6..4985342 100644 --- a/src/intel/genxml/gen8.xml +++ b/src/intel/genxml/gen8.xml @@ 
-546,7 +546,7 @@ - + type="bool"/> end="30" type="bool"/> type="bool"/> @@ -556,7 +556,7 @@ type="bool"/> - + type="BLEND_STATE_ENTRY"/> diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml index 4bf0fb6..a620e78 100644 --- a/src/intel/genxml/gen9.xml +++ b/src/intel/genxml/gen9.xml @@ -555,7 +555,7 @@ - + type="bool"/> end="30" type="bool"/>
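The size arithmetic in the commit message above can be checked with a short sketch. This is only an illustration: the constants and the function name are hypothetical stand-ins for the genxml-generated `GENX(BLEND_STATE_length)` and `GENX(BLEND_STATE_ENTRY_length)` values, with dword counts taken from the commit message (a 1-dword header on gen8+, no header on gen6/gen7, 2 dwords per entry).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the generated length constants, in dwords.
 * The values are assumptions read off the commit message, not from genxml. */
enum {
    HEADER_DWORDS_GEN6_7 = 0, /* gen6/gen7: only BLEND_STATE_ENTRY's */
    HEADER_DWORDS_GEN8   = 1, /* gen8+: one dword before the entries */
    ENTRY_DWORDS         = 2  /* each BLEND_STATE_ENTRY */
};

/* Bytes needed for a header plus num_draw_buffers entries. */
static uint32_t blend_state_size_bytes(uint32_t header_dwords,
                                       uint32_t num_draw_buffers)
{
    assert(num_draw_buffers <= 8); /* assumed upper bound on render targets */
    return (header_dwords + ENTRY_DWORDS * num_draw_buffers) * 4;
}
```

With eight draw buffers this gives the old fixed sizes (16 dwords on gen6/gen7, 17 on gen8+); with one draw buffer on gen8+ it gives 3 dwords, which is the saving the patch is after.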
Re: [Mesa-dev] [PATCH shader-db] Add ".so" shared objects to .gitignore
Reviewed-by: Marek Olšák

Marek

On Sat, Apr 8, 2017 at 9:59 PM, Rhys Kidd wrote:
> For intel_stubs.so
>
> Signed-off-by: Rhys Kidd
> ---
>
> I don't have commit access, so I would appreciate a reviewer pushing this to
> master.
>
>  .gitignore | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/.gitignore b/.gitignore
> index f69750a..95a04f6 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -1,2 +1,3 @@
>  bin
>  run
> +*.so
> --
> 2.9.3
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] amd/addrlib: use correct variable name in header
Reviewed-by: Marek Olšák

Marek

On Sat, Apr 8, 2017 at 8:36 AM, Thomas Hindoe Paaboel Andersen wrote:
> Since the inclusion in 7f160efcde41b52ad78e562316384373dab419e3
> the header used x_biased, while the implementation used y_biased.
> This changes the header to match the implementation since the
> uses of the function seem to expect y_biased.
> ---
>  src/amd/addrlib/gfx9/rbmap.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/addrlib/gfx9/rbmap.h b/src/amd/addrlib/gfx9/rbmap.h
> index f2f2ca8..89c8922 100644
> --- a/src/amd/addrlib/gfx9/rbmap.h
> +++ b/src/amd/addrlib/gfx9/rbmap.h
> @@ -49,7 +49,7 @@ public:
>
>  void Get_Comp_Block_Screen_Space( CoordEq& addr, int bytes_log2, int* w, int* h, int* d = NULL);
>
> -void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool x_biased,
> +void Get_Meta_Block_Screen_Space( int num_comp_blocks_log2, bool is_thick, bool y_biased,
>       int comp_block_width_log2, int comp_block_height_log2, int comp_block_depth_log2,
>       int& meta_block_width_log2, int& meta_block_height_log2, int& meta_block_depth_log2 );
>  void cap_pipe( int xmode, bool is_thick, int& num_ses_log2, int bpp_log2, int num_samples_log2, int pipe_interleave_log2,
> --
> 2.9.3
Re: [Mesa-dev] [PATCH 1/3] amd: fix distcheck
Reviewed-by: Marek Olšák

Marek

On Wed, Apr 5, 2017 at 1:00 PM, Juan A. Suarez Romero wrote:
> Add missing GFX9 files in the EXTRA_DIST.
> ---
>  src/amd/Makefile.sources | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/amd/Makefile.sources b/src/amd/Makefile.sources
> index 46da0fb..816e7e4 100644
> --- a/src/amd/Makefile.sources
> +++ b/src/amd/Makefile.sources
> @@ -21,12 +21,14 @@ ADDRLIB_FILES = \
>  	addrlib/core/addrlib2.h \
>  	addrlib/core/addrobject.cpp \
>  	addrlib/core/addrobject.h \
> +	addrlib/gfx9/chip/gfx9_enum.h \
>  	addrlib/gfx9/coord.cpp \
>  	addrlib/gfx9/coord.h \
>  	addrlib/gfx9/gfx9addrlib.cpp \
>  	addrlib/gfx9/gfx9addrlib.h \
>  	addrlib/gfx9/rbmap.cpp \
>  	addrlib/gfx9/rbmap.h \
> +	addrlib/inc/chip/gfx9/gfx9_gb_reg.h \
>  	addrlib/inc/chip/r800/si_gb_reg.h \
>  	addrlib/inc/lnx_common_defs.h \
>  	addrlib/r800/chip/si_ci_vi_merged_enum.h \
> --
> 2.9.3
Re: [Mesa-dev] [PATCH v2 2/3] nv50/ir: handle logops with NOT in AlgebraicOpt
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst wrote:
> Signed-off-by: Karol Herbst
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index bd60a84998..0de84fe9fc 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1856,6 +1856,12 @@ AlgebraicOpt::handleLOGOP(Instruction *logop)
>
>     set0 = cloneForward(func, set0);
>     set1 = cloneShallow(func, set1);
> +
> +   if (logop->src(0).mod == Modifier(NV50_IR_MOD_NOT))
> +      set0->asCmp()->setCond = inverseCondCode(set0->asCmp()->setCond);
> +   if (logop->src(1).mod == Modifier(NV50_IR_MOD_NOT))
> +      set1->asCmp()->setCond = inverseCondCode(set1->asCmp()->setCond);

set0/set1 may have been swapped further up, so you need to keep track of that. Also, I don't think this will work if one of the sets is a SET_AND -- the condcode applies to the set bit, not to the AND bit. I think you'd also have to flip AND <-> OR and flip the neg.

  -ilia
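The point about SET_AND in the review above is an instance of De Morgan's law, and a small model makes it concrete. This is a sketch only: the enum and functions below are hypothetical and are not the nv50_ir `CondCode`/`inverseCondCode` definitions, the operands are integers (so NaN ordering is ignored), and "flip the neg" is read here as negating the incoming predicate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical condition codes for illustration only. */
enum cond { COND_LT, COND_GE, COND_EQ, COND_NE, COND_COUNT };

static enum cond inverse_cond(enum cond c)
{
    switch (c) {
    case COND_LT: return COND_GE;
    case COND_GE: return COND_LT;
    case COND_EQ: return COND_NE;
    case COND_NE: return COND_EQ;
    default:      assert(!"invalid condition code"); return c;
    }
}

static bool eval_cond(enum cond c, int a, int b)
{
    switch (c) {
    case COND_LT: return a < b;
    case COND_GE: return a >= b;
    case COND_EQ: return a == b;
    case COND_NE: return a != b;
    default:      assert(!"invalid condition code"); return false;
    }
}

/* Models of "compare, then combine with a previous predicate". */
static bool set_and(enum cond c, int a, int b, bool prev)
{
    return eval_cond(c, a, b) && prev;
}

static bool set_or(enum cond c, int a, int b, bool prev)
{
    return eval_cond(c, a, b) || prev;
}

/* Checks NOT(set_and(c, a, b, p)) == set_or(inverse(c), a, b, NOT p)
 * for every condition code over a small operand range. */
static bool demorgan_holds(void)
{
    for (int c = 0; c < COND_COUNT; c++)
        for (int a = -1; a <= 1; a++)
            for (int b = -1; b <= 1; b++)
                for (int p = 0; p <= 1; p++)
                    if (!set_and((enum cond)c, a, b, p) !=
                        set_or(inverse_cond((enum cond)c), a, b, !p))
                        return false;
    return true;
}
```

In this model, inverting only the condition code is not enough: with a = 1, b = 2 and a false previous predicate, the negation of `set_and(COND_LT, ...)` is true, while `set_and(COND_GE, ...)` is still false.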
Re: [Mesa-dev] [PATCH v2 1/3] nv50/ir: fix AlgebraicOpt for slcts with mods
On Mon, Apr 3, 2017 at 11:58 AM, Karol Herbst wrote:
> Signed-off-by: Karol Herbst
> ---
>  src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> index 4c92a1efb5..bd60a84998 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
> @@ -1797,10 +1797,10 @@ AlgebraicOpt::handleSLCT(Instruction *slct)
>        if (slct->getSrc(2)->asImm()->compare(slct->asCmp()->setCond, 0.0f))
>           slct->setSrc(0, slct->getSrc(1));
>     } else
> -   if (slct->getSrc(0) != slct->getSrc(1)) {
> +   if (slct->getSrc(0) != slct->getSrc(1) || slct->src(0).mod != slct->src(1).mod)

SLCT can't have mods on src0/src1. Only on src2. I'd be just as happy to assert that they're both == 0 here. You can also add a helper to ValueRef to see if it's == to another ValueRef, which compares both the Value ptr as well as any modifiers, indirects, etc. But it again doesn't ultimately need to be used here.

>        return;
> -   }
> -   slct->op = OP_MOV;
> +   slct->op = slct->src(0).mod.getOp();
> +   slct->src(0).mod = slct->src(0).mod ^ Modifier(slct->op);
>     slct->setSrc(1, NULL);
>     slct->setSrc(2, NULL);
>  }
> --
> 2.12.2
[Mesa-dev] [PATCH] mesa: fix memory leak in arb_fragment_program
---
 src/mesa/program/arbprogparse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/program/arbprogparse.c b/src/mesa/program/arbprogparse.c
index 07bdf1603e..83a501eea6 100644
--- a/src/mesa/program/arbprogparse.c
+++ b/src/mesa/program/arbprogparse.c
@@ -78,6 +78,7 @@ _mesa_parse_arb_fragment_program(struct gl_context* ctx, GLenum target,
    memset(&prog, 0, sizeof(prog));
    memset(&state, 0, sizeof(state));
    state.prog = &prog;
+   state.mem_ctx = program;
 
    if (!_mesa_parse_arb_program(ctx, target, (const GLubyte*) str, len,
                                 &state)) {
-- 
2.12.2
Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.
On April 9, 2017 8:48:31 AM Lionel Landwerlinwrote: I have one suggestion at the bottom of the patch, otherwise : Reviewed-by: Lionel Landwerlin On 07/04/17 17:52, Rafael Antognolli wrote: We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. Signed-off-by: Rafael Antognolli --- src/intel/blorp/blorp_genX_exec.h | 35 - src/intel/genxml/gen6.xml | 4 +- src/intel/genxml/gen7.xml | 4 +- src/intel/genxml/gen75.xml| 4 +- src/intel/genxml/gen8.xml | 4 +- src/intel/genxml/gen9.xml | 4 +- src/intel/vulkan/genX_pipeline.c | 53 7 files changed, 58 insertions(+), 50 deletions(-) diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h index 3791462..fc1856f 100644 --- a/src/intel/blorp/blorp_genX_exec.h +++ b/src/intel/blorp/blorp_genX_exec.h @@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch, struct GENX(BLEND_STATE) blend; memset(, 0, sizeof(blend)); + uint32_t offset; + int size = GENX(BLEND_STATE_length) * 4; + size += GENX(BLEND_STATE_ENTRY_length) * 4 * params->num_draw_buffers; + uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, ); + uint32_t *pos = state; + + GENX(BLEND_STATE_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_length); + for (unsigned i = 0; i < params->num_draw_buffers; ++i) { - blend.Entry[i].PreBlendColorClampEnable = true; - blend.Entry[i].PostBlendColorClampEnable = true; - blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT; - - blend.Entry[i].WriteDisableRed = params->color_write_disable[0]; - 
blend.Entry[i].WriteDisableGreen = params->color_write_disable[1]; - blend.Entry[i].WriteDisableBlue = params->color_write_disable[2]; - blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3]; + struct GENX(BLEND_STATE_ENTRY) entry = { 0 }; + entry.PreBlendColorClampEnable = true; + entry.PostBlendColorClampEnable = true; + entry.ColorClampRange = COLORCLAMP_RTFORMAT; + + entry.WriteDisableRed = params->color_write_disable[0]; + entry.WriteDisableGreen = params->color_write_disable[1]; + entry.WriteDisableBlue = params->color_write_disable[2]; + entry.WriteDisableAlpha = params->color_write_disable[3]; + GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_ENTRY_length); } - uint32_t offset; - void *state = blorp_alloc_dynamic_state(batch, - GENX(BLEND_STATE_length) * 4, - 64, ); - GENX(BLEND_STATE_pack)(NULL, state, ); - blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4); + blorp_flush_range(batch, state, size); #if GEN_GEN >= 7 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) { diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml index 5083f07..3059bfc 100644 --- a/src/intel/genxml/gen6.xml +++ b/src/intel/genxml/gen6.xml @@ -452,8 +452,8 @@ - - + + diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml index ada8f74..867a1d4 100644 --- a/src/intel/genxml/gen7.xml +++ b/src/intel/genxml/gen7.xml @@ -507,8 +507,8 @@ - - + + diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml index 16d2d74..594e539 100644 --- a/src/intel/genxml/gen75.xml +++ b/src/intel/genxml/gen75.xml @@ -517,8 +517,8 @@ - - + + diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml index 1390fe6..4985342 100644 --- a/src/intel/genxml/gen8.xml +++ b/src/intel/genxml/gen8.xml @@ -546,7 +546,7 @@ - + @@ -556,7 +556,7 @@ - + diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml index 4bf0fb6..a620e78 100644 --- a/src/intel/genxml/gen9.xml +++ b/src/intel/genxml/gen9.xml @@ -555,7 +555,7 
@@ - + @@ -565,7 +565,7 @@ - + diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index 3fd1333..894d584 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -862,28
Re: [Mesa-dev] [PATCH 2/2] genxml: Make BLEND_STATE command support variable length array.
I have one suggestion at the bottom of the patch, otherwise : Reviewed-by: Lionel LandwerlinOn 07/04/17 17:52, Rafael Antognolli wrote: We need to emit BLEND_STATE, which size is 1 + 2 * nr_draw_buffers dwords (on gen8+), but the BLEND_STATE struct length is always 17. By marking it size 1, which is actually the size of the struct minus the BLEND_STATE_ENTRY's, we can emit a BLEND_STATE of variable number of entries. For gen6 and gen7 we set length to 0, since it only contains BLEND_STATE_ENTRY's, and no other data. With this change, we also change the code for blorp and anv to emit only the needed BLEND_STATE_ENTRY's, instead of always emitting 16 dwords on gen6-7 and 17 dwords on gen8+. Signed-off-by: Rafael Antognolli --- src/intel/blorp/blorp_genX_exec.h | 35 - src/intel/genxml/gen6.xml | 4 +- src/intel/genxml/gen7.xml | 4 +- src/intel/genxml/gen75.xml| 4 +- src/intel/genxml/gen8.xml | 4 +- src/intel/genxml/gen9.xml | 4 +- src/intel/vulkan/genX_pipeline.c | 53 7 files changed, 58 insertions(+), 50 deletions(-) diff --git a/src/intel/blorp/blorp_genX_exec.h b/src/intel/blorp/blorp_genX_exec.h index 3791462..fc1856f 100644 --- a/src/intel/blorp/blorp_genX_exec.h +++ b/src/intel/blorp/blorp_genX_exec.h @@ -902,23 +902,30 @@ blorp_emit_blend_state(struct blorp_batch *batch, struct GENX(BLEND_STATE) blend; memset(, 0, sizeof(blend)); + uint32_t offset; + int size = GENX(BLEND_STATE_length) * 4; + size += GENX(BLEND_STATE_ENTRY_length) * 4 * params->num_draw_buffers; + uint32_t *state = blorp_alloc_dynamic_state(batch, size, 64, ); + uint32_t *pos = state; + + GENX(BLEND_STATE_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_length); + for (unsigned i = 0; i < params->num_draw_buffers; ++i) { - blend.Entry[i].PreBlendColorClampEnable = true; - blend.Entry[i].PostBlendColorClampEnable = true; - blend.Entry[i].ColorClampRange = COLORCLAMP_RTFORMAT; - - blend.Entry[i].WriteDisableRed = params->color_write_disable[0]; - blend.Entry[i].WriteDisableGreen = 
params->color_write_disable[1]; - blend.Entry[i].WriteDisableBlue = params->color_write_disable[2]; - blend.Entry[i].WriteDisableAlpha = params->color_write_disable[3]; + struct GENX(BLEND_STATE_ENTRY) entry = { 0 }; + entry.PreBlendColorClampEnable = true; + entry.PostBlendColorClampEnable = true; + entry.ColorClampRange = COLORCLAMP_RTFORMAT; + + entry.WriteDisableRed = params->color_write_disable[0]; + entry.WriteDisableGreen = params->color_write_disable[1]; + entry.WriteDisableBlue = params->color_write_disable[2]; + entry.WriteDisableAlpha = params->color_write_disable[3]; + GENX(BLEND_STATE_ENTRY_pack)(NULL, pos, ); + pos += GENX(BLEND_STATE_ENTRY_length); } - uint32_t offset; - void *state = blorp_alloc_dynamic_state(batch, - GENX(BLEND_STATE_length) * 4, - 64, ); - GENX(BLEND_STATE_pack)(NULL, state, ); - blorp_flush_range(batch, state, GENX(BLEND_STATE_length) * 4); + blorp_flush_range(batch, state, size); #if GEN_GEN >= 7 blorp_emit(batch, GENX(3DSTATE_BLEND_STATE_POINTERS), sp) { diff --git a/src/intel/genxml/gen6.xml b/src/intel/genxml/gen6.xml index 5083f07..3059bfc 100644 --- a/src/intel/genxml/gen6.xml +++ b/src/intel/genxml/gen6.xml @@ -452,8 +452,8 @@ - - + + diff --git a/src/intel/genxml/gen7.xml b/src/intel/genxml/gen7.xml index ada8f74..867a1d4 100644 --- a/src/intel/genxml/gen7.xml +++ b/src/intel/genxml/gen7.xml @@ -507,8 +507,8 @@ - - + + diff --git a/src/intel/genxml/gen75.xml b/src/intel/genxml/gen75.xml index 16d2d74..594e539 100644 --- a/src/intel/genxml/gen75.xml +++ b/src/intel/genxml/gen75.xml @@ -517,8 +517,8 @@ - - + + diff --git a/src/intel/genxml/gen8.xml b/src/intel/genxml/gen8.xml index 1390fe6..4985342 100644 --- a/src/intel/genxml/gen8.xml +++ b/src/intel/genxml/gen8.xml @@ -546,7 +546,7 @@ - + @@ -556,7 +556,7 @@ - + diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml index 4bf0fb6..a620e78 100644 --- a/src/intel/genxml/gen9.xml +++ b/src/intel/genxml/gen9.xml @@ -555,7 +555,7 @@ - + @@ -565,7 +565,7 @@ - + 
diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index 3fd1333..894d584 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -862,28 +862,14 @@ emit_cb_state(struct anv_pipeline *pipeline, { struct
[Mesa-dev] [PATCH 1/3 v3] r600g: skip repeating vs, gs, and tes shader binds
The idea is taken from radeonsi. The code lacks some checks for a null vs, and
I'm unsure about making changes there, so I left it as is.

Some statistics for GTAⅣ:
Average tessellation shader bind skips per frame: ≈350
Average geometry shader bind skips per frame: ≈260
Skips of vertex shader binds occur too rarely to show up in the per-frame
counter at all, so I'll just say: it happens.

v2: I accidentally removed an empty line; don't do this.
v3: fix the mail title so it gets stacked with its series

Signed-off-by: Constantine Kharlamov
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
-	if (!state)
+	if (!state || rctx->vs_shader == state)
 		return;
 
 	rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->gs_shader)
+		return;
+
 	rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->tes_shader)
+		return;
+
 	rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2
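The bind-skip idea described in this patch can be modelled in isolation. The following is a minimal sketch with hypothetical, simplified types (`struct selector`, `struct ctx`, and the `updates` counter are not part of r600g); the counter stands in for the derived-state work that a redundant bind would otherwise repeat. The sketch keeps a NULL guard before reading the new selector, because a bind call may pass NULL to unbind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real r600g structures. */
struct selector {
    unsigned so_stride;
};

struct ctx {
    const struct selector *gs_shader;
    unsigned stride_in_dw;
    unsigned updates; /* counts how often derived state was recomputed */
};

static void bind_gs(struct ctx *c, const struct selector *state)
{
    assert(c != NULL);

    /* Same selector as before (including NULL again): nothing to redo. */
    if (state == c->gs_shader)
        return;

    c->gs_shader = state;
    c->updates++; /* stands in for the viewport-index update */

    /* Unbinding: there is no selector to read the stride from. */
    if (!state)
        return;
    c->stride_in_dw = state->so_stride;
}
```

Binding the same selector twice in a row then costs one pointer comparison on the second call, which is where per-frame skip counts like the ones quoted above would come from.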
[Mesa-dev] [PATCH v2] r600g: skip repeating vs, gs, and tes shader binds
The idea is taken from radeonsi. The code lacks some checks for a null vs, and
I'm unsure about making changes there, so I left it as is.

Some statistics for GTAⅣ:
Average tessellation shader bind skips per frame: ≈350
Average geometry shader bind skips per frame: ≈260
Skips of vertex shader binds occur too rarely to show up in the per-frame
counter at all, so I'll just say: it happens.

v2: I accidentally removed an empty line; don't do this.

Signed-off-by: Constantine Kharlamov
---
 src/gallium/drivers/r600/r600_state_common.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index 4de2a7344b..94f85e6dd3 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -931,7 +931,7 @@ static void r600_bind_vs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
-	if (!state)
+	if (!state || rctx->vs_shader == state)
 		return;
 
 	rctx->vs_shader = (struct r600_pipe_shader_selector *)state;
@@ -943,11 +943,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->gs_shader)
+		return;
+
 	rctx->gs_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride;
 }
 
@@ -962,11 +963,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, void *state)
 {
 	struct r600_context *rctx = (struct r600_context *)ctx;
 
+	if (state == rctx->tes_shader)
+		return;
+
 	rctx->tes_shader = (struct r600_pipe_shader_selector *)state;
 	r600_update_vs_writes_viewport_index(&rctx->b, r600_get_vs_info(rctx));
 
-	if (!state)
-		return;
 	rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride;
 }
 
-- 
2.12.2
[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux
https://bugs.freedesktop.org/show_bug.cgi?id=100629

--- Comment #2 from Giovanni ongaro ---
Those errors are displayed multiple times:

Mesa: User error: GL_INVALID_ENUM in glDrawElements(mode=)
Mesa: User error: GL_INVALID_ENUM in glDrawElementsInstanced(mode=)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.
[Mesa-dev] [PATCH 3/3] r600g: get rid of dummy pixel shader
The idea is taken from radeonsi. The code mostly was already checking for null pixel shader, so little checks had to be added. Interestingly, acc. to testing with GTAⅣ, though binding of null shader happens a lot at the start (then just stops), but draw_vbo() never actually sees null ps. Signed-off-by: Constantine Kharlamov--- src/gallium/drivers/r600/r600_pipe.c | 9 - src/gallium/drivers/r600/r600_pipe.h | 3 --- src/gallium/drivers/r600/r600_state_common.c | 17 - 3 files changed, 8 insertions(+), 21 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 5014f2525c..7d8efd2c9b 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -82,9 +82,6 @@ static void r600_destroy_context(struct pipe_context *context) if (rctx->fixed_func_tcs_shader) rctx->b.b.delete_tcs_state(>b.b, rctx->fixed_func_tcs_shader); - if (rctx->dummy_pixel_shader) { - rctx->b.b.delete_fs_state(>b.b, rctx->dummy_pixel_shader); - } if (rctx->custom_dsa_flush) { rctx->b.b.delete_depth_stencil_alpha_state(>b.b, rctx->custom_dsa_flush); } @@ -209,12 +206,6 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, r600_begin_new_cs(rctx); - rctx->dummy_pixel_shader = - util_make_fragment_cloneinput_shader(>b.b, 0, -TGSI_SEMANTIC_GENERIC, -TGSI_INTERPOLATE_CONSTANT); - rctx->b.b.bind_fs_state(>b.b, rctx->dummy_pixel_shader); - return >b.b; fail: diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 7f1ecc278b..e636ef0024 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -432,9 +432,6 @@ struct r600_context { void*custom_blend_resolve; void*custom_blend_decompress; void*custom_blend_fastclear; - /* With rasterizer discard, there doesn't have to be a pixel shader. 
-* In that case, we bind this one: */ - void*dummy_pixel_shader; /* These dummy CMASK and FMASK buffers are used to get around the R6xx hardware * bug where valid CMASK and FMASK are required to be present to avoid * a hardlock in certain operations but aren't actually used diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index c4b1a22d95..be7db361d1 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -725,7 +725,8 @@ static inline void r600_shader_selector_key(const struct pipe_context *ctx, if (!key->vs.as_ls) key->vs.as_es = (rctx->gs_shader != NULL); - if (rctx->ps_shader->current->shader.gs_prim_id_input && !rctx->gs_shader) { + if (rctx->ps_shader && rctx->ps_shader->current->shader.gs_prim_id_input && + !rctx->gs_shader) { key->vs.as_gs_a = true; key->vs.prim_id_out = rctx->ps_shader->current->shader.input[rctx->ps_shader->current->shader.ps_prim_id_input].spi_sid; } @@ -909,9 +910,6 @@ static void r600_bind_ps_state(struct pipe_context *ctx, void *state) { struct r600_context *rctx = (struct r600_context *)ctx; - if (!state) - state = rctx->dummy_pixel_shader; - rctx->ps_shader = (struct r600_pipe_shader_selector *)state; } @@ -1550,9 +1548,10 @@ static bool r600_update_derived_state(struct r600_context *rctx) rctx->b.streamout.enabled_stream_buffers_mask = clip_so_current->enabled_stream_buffers_mask; } - if (unlikely(ps_dirty || rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current || - rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable || - rctx->rasterizer->flatshade != rctx->ps_shader->current->flatshade)) { + if (unlikely(rctx->ps_shader && +(ps_dirty || rctx->hw_shader_stages[R600_HW_STAGE_PS].shader != rctx->ps_shader->current || + rctx->rasterizer->sprite_coord_enable != rctx->ps_shader->current->sprite_coord_enable || + rctx->rasterizer->flatshade != 
rctx->ps_shader->current->flatshade))) { if (rctx->cb_misc_state.nr_ps_color_outputs != rctx->ps_shader->current->nr_ps_color_outputs) { rctx->cb_misc_state.nr_ps_color_outputs = rctx->ps_shader->current->nr_ps_color_outputs; @@ -1568,7 +1567,7 @@ static bool r600_update_derived_state(struct
[Mesa-dev] [PATCH 2/3] r600g: add draw_vbo check for a NULL pixel shader
Taken from radeonsi; required to remove the dummy pixel shader in the next patch.

Signed-off-by: Constantine Kharlamov
---
 src/gallium/drivers/r600/evergreen_state.c   | 1 +
 src/gallium/drivers/r600/r600_pipe.h         | 1 +
 src/gallium/drivers/r600/r600_state.c        | 3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 7 ++-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
index 371e7ce212..5697da4af9 100644
--- a/src/gallium/drivers/r600/evergreen_state.c
+++ b/src/gallium/drivers/r600/evergreen_state.c
@@ -471,6 +471,7 @@ static void *evergreen_create_rs_state(struct pipe_context *ctx,
 	rs->clip_halfz = state->clip_halfz;
 	rs->flatshade = state->flatshade;
 	rs->sprite_coord_enable = state->sprite_coord_enable;
+	rs->rasterizer_discard = state->rasterizer_discard;
 	rs->two_side = state->light_twoside;
 	rs->clip_plane_enable = state->clip_plane_enable;
 	rs->pa_sc_line_stipple = state->line_stipple_enable ?
diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
index 86634b8681..7f1ecc278b 100644
--- a/src/gallium/drivers/r600/r600_pipe.h
+++ b/src/gallium/drivers/r600/r600_pipe.h
@@ -279,6 +279,7 @@ struct r600_rasterizer_state {
 	bool				scissor_enable;
 	bool				multisample_enable;
 	bool				clip_halfz;
+	bool				rasterizer_discard;
 };
 
 struct r600_poly_offset_state {
diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
index 1f7e9b3aa5..06100abc4a 100644
--- a/src/gallium/drivers/r600/r600_state.c
+++ b/src/gallium/drivers/r600/r600_state.c
@@ -470,6 +470,7 @@ static void *r600_create_rs_state(struct pipe_context *ctx,
 	rs->clip_halfz = state->clip_halfz;
 	rs->flatshade = state->flatshade;
 	rs->sprite_coord_enable = state->sprite_coord_enable;
+	rs->rasterizer_discard = state->rasterizer_discard;
 	rs->two_side = state->light_twoside;
 	rs->clip_plane_enable = state->clip_plane_enable;
 	rs->pa_sc_line_stipple = state->line_stipple_enable ?
@@ -622,7 +623,7 @@ static void *r600_create_sampler_state(struct pipe_context *ctx,
 static struct pipe_sampler_view *
 texture_buffer_sampler_view(struct r600_pipe_sampler_view *view,
 			    unsigned width0, unsigned height0)
-	
+
 {
 	struct r600_texture *tmp = (struct r600_texture*)view->base.texture;
 	int stride = util_format_get_blocksize(view->base.format);
diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
index dab39f19e3..c4b1a22d95 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -1707,7 +1707,12 @@ static void r600_draw_vbo(struct pipe_context *ctx, const struct pipe_draw_info
 		return;
 	}
 
-	if (unlikely(!rctx->vs_shader || !rctx->ps_shader)) {
+	if (unlikely(!rctx->vs_shader)) {
+		assert(0);
+		return;
+	}
+	if (unlikely(!rctx->ps_shader &&
+		     (!rctx->rasterizer || !rctx->rasterizer->rasterizer_discard))) {
 		assert(0);
 		return;
 	}
-- 
2.12.2
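The new draw_vbo condition reads more easily as a standalone predicate. This is a sketch under simplifying assumptions: the structures and the `can_draw` name are hypothetical, and only the three fields the check looks at are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the r600g state involved. */
struct rasterizer {
    bool rasterizer_discard;
};

struct ctx {
    const void *vs_shader;
    const void *ps_shader;
    const struct rasterizer *rasterizer;
};

/* A draw needs a vertex shader. A missing pixel shader is acceptable
 * only when a rasterizer state is bound and it discards all primitives. */
static bool can_draw(const struct ctx *c)
{
    assert(c != NULL);

    if (!c->vs_shader)
        return false;
    if (!c->ps_shader &&
        (!c->rasterizer || !c->rasterizer->rasterizer_discard))
        return false;
    return true;
}
```

The only new accepted combination compared with the old `!vs_shader || !ps_shader` test is "vertex shader bound, no pixel shader, rasterizer discard enabled".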
[Mesa-dev] [PATCH 1/3] r600g: skip repeating vs, gs, and tes shader binds
The idea is taken from radeonsi. The code lacks some checks for null vs, and I'm unsure about some changes against that, so I left it in place. Some statistics for GTAⅣ: Average tesselation shaders bind skip per frame: ≈350 Average geometric shaders bind skip per frame: ≈260 Skip of binding vertex ones occurs rarely enough to not get into per-frame counter at all, so I just gonna say: it happens. Signed-off-by: Constantine Kharlamov--- src/gallium/drivers/r600/r600_state_common.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 4de2a7344b..dab39f19e3 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -926,12 +926,11 @@ static struct tgsi_shader_info *r600_get_vs_info(struct r600_context *rctx) else return NULL; } - static void r600_bind_vs_state(struct pipe_context *ctx, void *state) { struct r600_context *rctx = (struct r600_context *)ctx; - if (!state) + if (!state || rctx->vs_shader == state) return; rctx->vs_shader = (struct r600_pipe_shader_selector *)state; @@ -943,11 +942,12 @@ static void r600_bind_gs_state(struct pipe_context *ctx, void *state) { struct r600_context *rctx = (struct r600_context *)ctx; + if (state == rctx->gs_shader) + return; + rctx->gs_shader = (struct r600_pipe_shader_selector *)state; r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx)); - if (!state) - return; rctx->b.streamout.stride_in_dw = rctx->gs_shader->so.stride; } @@ -962,11 +962,12 @@ static void r600_bind_tes_state(struct pipe_context *ctx, void *state) { struct r600_context *rctx = (struct r600_context *)ctx; + if (state == rctx->tes_shader) + return; + rctx->tes_shader = (struct r600_pipe_shader_selector *)state; r600_update_vs_writes_viewport_index(>b, r600_get_vs_info(rctx)); - if (!state) - return; rctx->b.streamout.stride_in_dw = rctx->tes_shader->so.stride; } -- 2.12.2 ___ 
mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
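For readers following the patch above, the redundant-bind skip can be sketched in isolation. This is only an illustration under stated assumptions: `struct context`, `struct shader_selector`, `bind_gs_state` and the `rebinds` counter are hypothetical stand-ins, not the r600 types or functions. Unlike the posted hunk, the sketch keeps a NULL check before reading the stride, since a NULL state would otherwise be dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; not the real r600 structs. */
struct shader_selector {
    unsigned so_stride;   /* stream-output stride in dwords */
};

struct context {
    struct shader_selector *gs_shader;
    unsigned stride_in_dw;
    unsigned rebinds;     /* counts how often the slow path ran */
};

/* Bind a geometry shader, skipping the work when it is already bound. */
static void bind_gs_state(struct context *ctx, void *state)
{
    if (state == ctx->gs_shader)
        return;                       /* redundant bind: nothing to do */

    ctx->gs_shader = (struct shader_selector *)state;
    ctx->rebinds++;

    if (!state)
        return;                       /* unbinding: no stride to read */

    ctx->stride_in_dw = ctx->gs_shader->so_stride;
}
```

Binding the same selector twice in a row takes the slow path only once; binding NULL afterwards still updates the bound pointer but leaves the stride untouched.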
[Mesa-dev] [PATCH 0/3] r600g: shader logic improvements
Although I didn't see a statistically significant change in the GTAⅣ benchmark, the series seems to have reduced the stall when opening the door from a house to the outer world at the first savepoint. No changes in the gpu.py tests of piglit in gbm mode.

Constantine Kharlamov (3):
  r600g: skip repeating vs, gs, and tes shader binds
  r600g: add draw_vbo check for a NULL pixel shader
  r600g: get rid of dummy pixel shader

 src/gallium/drivers/r600/evergreen_state.c   |  1 +
 src/gallium/drivers/r600/r600_pipe.c         |  9 ---
 src/gallium/drivers/r600/r600_pipe.h         |  4 +--
 src/gallium/drivers/r600/r600_state.c        |  3 ++-
 src/gallium/drivers/r600/r600_state_common.c | 37
 5 files changed, 25 insertions(+), 29 deletions(-)

--
2.12.2
[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux
https://bugs.freedesktop.org/show_bug.cgi?id=100629

--- Comment #1 from Giovanni ongaro ---
Upon starting No Man's Sky under Wine (No Man's Sky needs OpenGL 4.5), only a white screen is displayed in-game.
[Mesa-dev] [Bug 100629] No mans sky renders white screen under wine in linux
https://bugs.freedesktop.org/show_bug.cgi?id=100629

            Bug ID: 100629
           Summary: No mans sky renders white screen under wine in linux
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: giovanni.nic...@ticino.com
        QA Contact: mesa-dev@lists.freedesktop.org
[Mesa-dev] [PATCH] mesa: use single memcpy when strides matches
---
 src/mesa/main/readpix.c  | 15 ++-
 src/mesa/main/texstore.c | 15 +++
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/mesa/main/readpix.c b/src/mesa/main/readpix.c
index 25823230d6..14568de497 100644
--- a/src/mesa/main/readpix.c
+++ b/src/mesa/main/readpix.c
@@ -220,7 +220,7 @@ readpixels_memcpy(struct gl_context *ctx,
    struct gl_renderbuffer *rb =
          _mesa_get_read_renderbuffer_for_format(ctx, format);
    GLubyte *dst, *map;
-   int dstStride, stride, j, texelBytes;
+   int dstStride, stride, j, texelBytes, bytesPerRow;
 
    /* Fail if memcpy cannot be used. */
    if (!readpixels_can_use_memcpy(ctx, format, type, packing)) {
@@ -239,12 +239,17 @@ readpixels_memcpy(struct gl_context *ctx,
    }
 
    texelBytes = _mesa_get_format_bytes(rb->Format);
+   bytesPerRow = texelBytes * width;
 
    /* memcpy*/
-   for (j = 0; j < height; j++) {
-      memcpy(dst, map, width * texelBytes);
-      dst += dstStride;
-      map += stride;
+   if (dstStride == stride && dstStride == bytesPerRow) {
+      memcpy(dst, map, bytesPerRow * height);
+   } else {
+      for (j = 0; j < height; j++) {
+         memcpy(dst, map, bytesPerRow);
+         dst += dstStride;
+         map += stride;
+      }
    }
 
    ctx->Driver.UnmapRenderbuffer(ctx, rb);
diff --git a/src/mesa/main/texstore.c b/src/mesa/main/texstore.c
index 615ba63362..3314e557c0 100644
--- a/src/mesa/main/texstore.c
+++ b/src/mesa/main/texstore.c
@@ -1360,10 +1360,17 @@ _mesa_store_compressed_texsubimage(struct gl_context *ctx, GLuint dims,
       if (dstMap) {
 
          /* copy rows of blocks */
-         for (i = 0; i < store.CopyRowsPerSlice; i++) {
-            memcpy(dstMap, src, store.CopyBytesPerRow);
-            dstMap += dstRowStride;
-            src += store.TotalBytesPerRow;
+         if (dstRowStride == store.TotalBytesPerRow &&
+             dstRowStride == store.CopyBytesPerRow) {
+            memcpy(dstMap, src, store.CopyBytesPerRow * store.CopyRowsPerSlice);
+            src += store.CopyBytesPerRow * store.CopyRowsPerSlice;
+         }
+         else {
+            for (i = 0; i < store.CopyRowsPerSlice; i++) {
+               memcpy(dstMap, src, store.CopyBytesPerRow);
+               dstMap += dstRowStride;
+               src += store.TotalBytesPerRow;
+            }
          }
 
          ctx->Driver.UnmapTextureImage(ctx, texImage, slice + zoffset);
--
2.12.2
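The stride check in the patch above can be illustrated with a small standalone helper. This is a minimal sketch, assuming byte strides and non-overlapping buffers; `copy_rows` and its parameter names are hypothetical and are not Mesa functions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `height` rows of `row_bytes` bytes each from src to dst.
 * Strides are in bytes. Illustrative helper, not a Mesa function.
 */
static void copy_rows(unsigned char *dst, size_t dst_stride,
                      const unsigned char *src, size_t src_stride,
                      size_t row_bytes, size_t height)
{
    if (dst_stride == src_stride && dst_stride == row_bytes) {
        /* Rows are tightly packed on both sides: one contiguous block. */
        memcpy(dst, src, row_bytes * height);
        return;
    }

    /* Padding between rows: copy row by row, skipping the padding. */
    for (size_t j = 0; j < height; j++) {
        memcpy(dst, src, row_bytes);
        dst += dst_stride;
        src += src_stride;
    }
}
```

Both strides must equal the row size for the fast path. If only the two strides matched each other, a single memcpy would also copy the padding bytes and could run past the end of the last row.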
[Mesa-dev] [Bug 100627] EGL fails to fall back to DRI2 when DRI3 is enabled but not available
https://bugs.freedesktop.org/show_bug.cgi?id=100627

            Bug ID: 100627
           Summary: EGL fails to fall back to DRI2 when DRI3 is enabled but not available
           Product: Mesa
           Version: 17.0
          Hardware: All
                OS: FreeBSD
            Status: NEW
          Severity: normal
          Priority: medium
         Component: EGL
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: re...@freebsd.org
        QA Contact: mesa-dev@lists.freedesktop.org

DRI2 will be available, but DRI3 may or may not be, depending on which kernel and drivers are running. When Mesa is compiled with DRI3 support enabled and run on a kernel with only DRI2 support, applications using GLX work fine (modulo scary messages), but applications that use EGL fail.

The cause appears to be insufficient checking during init in libEGL. The GLX init code has separate dri3_create_screen and dri2_create_screen functions, which are called from the respective init functions. When the former bails with "libGL error: Version 7 or imageFromFds image extension not found\nlibGL error: failed to load driver: r600", libGL then tries the DRI2 path and succeeds. As an aside, it would be nice if the first message were only shown when LIBGL_DEBUG is set, and the second should indicate that DRI3 init has failed but DRI2 will be attempted, instead of saying the driver failed to load (which is a source of spurious bug reports).

The EGL path, on the other hand, has only a dri2_create_screen function, which is called from both dri2_initialize_x11_dri2 and dri2_initialize_x11_dri3. Thus the init path succeeds for DRI3 even though it cannot work, and the application ultimately fails because the first detectable error occurs after we are out of the init routines, when it is too late to attempt a fallback to DRI2.

Setting LIBGL_DRI3_DISABLE allows applications using EGL to function correctly, so there is a workaround until the initialization can be fixed to check availability of DRI3 support as is done in libGL.
Example of failure running mesa EGL demos:

% LIBGL_DEBUG=verbose EGL_LOG_LEVEL=debug MESA_DEBUG=1 eglgears_x11
libEGL debug: Native platform type: x11 (autodetected)
libEGL debug: added egl_dri2 to module array
libGL: Can't open configuration file /home/user/.drirc: No such file or directory.
libEGL debug: DRI2: dlopen(/usr/local/lib/dri/r600_dri.so)
libEGL debug: found extension `DRI_Core'
libEGL info: found extension DRI_Core version 1
libEGL debug: found extension `DRI_IMAGE_DRIVER'
libEGL info: found extension DRI_IMAGE_DRIVER version 1
libEGL debug: found extension `DRI_DRI2'
libEGL debug: found extension `DRI_ConfigOptions'
libEGL debug: found extension `DRI2_Fence'
libGL: Can't open configuration file /home/user/.drirc: No such file or directory.
libGL: Can't open configuration file /home/user/.drirc: No such file or directory.
libEGL debug: found extension `DRI_TexBuffer'
libEGL info: found extension DRI_TexBuffer version 2
libEGL debug: found extension `DRI2_Flush'
libEGL info: found extension DRI2_Flush version 4
libEGL debug: found extension `DRI_IMAGE'
libEGL info: found extension DRI_IMAGE version 12
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL debug: found extension `DRI2_Throttle'
libEGL debug: found extension `DRI2_Fence'
libEGL debug: found extension `DRI2_Interop'
libEGL debug: found extension `DRI_TexBuffer'
libEGL debug: found extension `DRI2_Flush'
libEGL debug: found extension `DRI_IMAGE'
libEGL debug: found extension `DRI_RENDERER_QUERY'
libEGL info: found extension DRI_RENDERER_QUERY version 1
libEGL debug: found extension `DRI_CONFIG_QUERY'
libEGL info: found extension DRI_CONFIG_QUERY version 1
libEGL debug: found extension `DRI2_Throttle'
libEGL debug: found extension `DRI2_Fence'
libEGL info: found extension DRI2_Fence version 2
libEGL debug: found extension `DRI2_Interop'
libEGL info: found extension DRI2_Interop version 1
libEGL debug: did not find optional extension DRI_Robustness version 1
libEGL info: Using DRI3
libEGL debug: the best driver is DRI2
EGL_VERSION = 1.4 (DRI2)
zsh: segmentation fault (core dumped)

As can be seen, libEGL runs right through the DRI3 init and then crashes when it tries to draw without having a surface. The backtrace differs according to the driver in use. This is just an example from my machine:

(lldb) bt
* thread #1
  * frame #0: r600_dri.so`_debug_assert_fail(expr="surface", file="state_tracker/st_atom_framebuffer.c", line=61, function="update_framebuffer_size") at u_debug.c:321
    frame #1: r600_dri.so`update_framebuffer_size(framebuffer=0x00080a845028, surface=0x) at st_atom_framebuffer.c:61
    frame #2: r600_dri.so`update_framebuffer_state(st=0x00080a843000) at st_atom_framebuffer.c:181
    frame #3: r600_dri.so`st_validate_state(st=0x00080a843000, pipeline=ST_PIPELINE_RENDER) at st_atom.c:219
    frame #4:
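The fallback behaviour the report asks for (probe DRI3 during init, and move on to DRI2 if the probe fails) can be sketched generically as below. This is only an illustration under stated assumptions: `struct backend`, `choose_backend` and the two probe functions are hypothetical and do not correspond to libEGL or libGL code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend description; not the libEGL data structures. */
struct backend {
    const char *name;
    bool (*probe)(void);   /* returns true only if the backend can really work */
};

/*
 * Try the backends in order of preference and return the first one whose
 * probe succeeds, or NULL if none does.
 */
static const struct backend *
choose_backend(const struct backend *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].probe())
            return &list[i];
    }
    return NULL;
}

/* Stand-in probes simulating a kernel with DRI2 but without DRI3. */
static bool probe_dri3(void) { return false; }
static bool probe_dri2(void) { return true; }
```

The point of the design is that each probe must fail inside the init path, while a fallback is still possible, rather than reporting success and failing later at draw time.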
[Mesa-dev] [Bug 100613] Regression in Mesa 17 on s390x (zSystems)
https://bugs.freedesktop.org/show_bug.cgi?id=100613

--- Comment #2 from Stefan Dirsch ---
Roland, thanks a lot for your prompt reply! Very much appreciated!

It seems Richard has meanwhile switched companies from IBM to ARM; I found him on LinkedIn. Possibly he's now working on aarch64 (LE), so I'm afraid he no longer has access to BE machines.

Unfortunately I'm not familiar with llvmpipe at all. Would it be an option not to change the code there for BE if developers have no access to such machines? Reverse-applying the commit is going to break sooner or later, I'm sure. Of course I'm willing to test any proposed change/patch on s390x, but I'm not a Mesa/llvmpipe developer per se.

Unfortunately llvmpipe is needed on s390x, since it has become a requirement for modern desktops like gdm/gnome-shell. :-( I can't say how fundamental the issue is. gdm and gnome-shell just show a black screen. :-( I found glxgears more useful as an example. ;-)