[Mesa-dev] [Bug 86195] Lightswork video editor segfaults
https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #2 from Iaroslav Andrusyak <pontost...@gmail.com> ---
Created attachment 109393
  --> https://bugs.freedesktop.org/attachment.cgi?id=109393&action=edit
stderr

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 86195] Lightswork video editor segfaults
https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #3 from Michel Dänzer <mic...@daenzer.net> ---
(In reply to Iaroslav Andrusyak from comment #2)
> stderr

Did that crash as well? There's only one LLVM dump in there, and no immediate sign of a crash. If it did crash, can you try again with R600_DEBUG=vs,gs,ps?

Also, did you try DRAW_USE_LLVM=0?
Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.
On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote:
> On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> > +vec4_visitor::emit_math(enum opcode opcode,
> > +                        dst_reg dst, src_reg src0, src_reg src1)
>
> I think you can make the arguments const references too?

Yeah. I've changed the prototype to:

   void emit_math(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
                  const src_reg &src1 = src_reg());

It also meant changing the first few lines to:

   vec4_instruction *math =
      emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1));

since src0 = fix_math_operand(src0) doesn't work with const src_reg &.

> > +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
> > +      /* MATH on Gen6 must be align1, so we can't do writemasks. */
> > +      math->dst = dst_reg(this, glsl_type::vec4_type);
> > +      math->dst.type = dst.type;
> > +      math->dst.writemask = WRITEMASK_XYZW;
>
> I don't think you need to set the writemask (XYZW is the default).

I do, actually - it's guaranteed to not be XYZW at this point.

The caller passed us a destination register with some writemask set. We create the math instruction using dst, so it inherits that writemask. This block executes when dst.writemask != WRITEMASK_XYZW.

The point is to override it back to XYZW, since it isn't.

> > +      emit(MOV(dst, src_reg(math->dst)));
> > +   } else if (brw->gen < 6) {
> > +      math->base_mrf = 1;
> > +      math->mlen = src1.file == BAD_FILE ? 1 : 2;
> >     }
> >  }
>
> Series is Reviewed-by: Matt Turner <matts...@gmail.com>

Thanks!
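Here is a minimal sketch of the const-reference change discussed above. It uses hypothetical, trimmed-down stand-ins (`src_reg_sketch`, `fix_operand_sketch()`, `emit_math_sketch()`), not the real i965 types. Because the parameters are `const src_reg &`, the fixed-up operands go to the instruction as temporaries instead of being assigned back to `src0`. The defaulted `src1` yields a `BAD_FILE` register for one-operand math:

```cpp
#include <cassert>

// Hypothetical register files; BAD_FILE marks "no operand".
enum reg_file { BAD_FILE, GRF, IMM };

// Trimmed-down, hypothetical stand-in for src_reg.
struct src_reg_sketch {
   reg_file file;
   int nr;
   src_reg_sketch() : file(BAD_FILE), nr(0) {}
   src_reg_sketch(reg_file f, int n) : file(f), nr(n) {}
};

struct math_inst_sketch {
   src_reg_sketch src[2];
   int mlen;
};

// Hypothetical stand-in for fix_math_operand(): returns a fixed-up copy
// instead of modifying its argument.
static src_reg_sketch fix_operand_sketch(const src_reg_sketch &src) {
   if (src.file == IMM)
      return src_reg_sketch(GRF, 100);   // pretend the immediate was moved to GRF 100
   return src;
}

// With const reference parameters, "src0 = fix_operand_sketch(src0);" does
// not compile, so the fixed-up operands are passed on as temporaries.
// src1 defaults to a BAD_FILE register, like the new emit_math() prototype.
static math_inst_sketch emit_math_sketch(const src_reg_sketch &src0,
                                         const src_reg_sketch &src1 = src_reg_sketch()) {
   math_inst_sketch inst = { { fix_operand_sketch(src0), fix_operand_sketch(src1) }, 0 };
   inst.mlen = src1.file == BAD_FILE ? 1 : 2;   // same shape as the pre-Gen6 branch
   return inst;
}
```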
[Mesa-dev] [Bug 86195] Lightswork video editor segfaults
https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #4 from Iaroslav Andrusyak <pontost...@gmail.com> ---
DRAW_USE_LLVM=0 does not help, and there is no console output from LW; Lightswork is totally silent. I only have a few logs in the Lightswork folder: StdErr.log and error.log.
[Mesa-dev] [Bug 86195] Lightswork video editor segfaults
https://bugs.freedesktop.org/show_bug.cgi?id=86195

--- Comment #5 from Iaroslav Andrusyak <pontost...@gmail.com> ---
Created attachment 109395
  --> https://bugs.freedesktop.org/attachment.cgi?id=109395&action=edit
error
Re: [Mesa-dev] [PATCH 3/3] gk20a: use NOUVEAU_BO_GART as VRAM domain
On 10/30/2014 12:29 AM, Ilia Mirkin wrote:
> On Mon, Oct 27, 2014 at 6:34 AM, Alexandre Courbot <acour...@nvidia.com> wrote:
>> GK20A does not have dedicated VRAM, therefore allocating in VRAM can be sub-optimal and sometimes even harmful. Set its VRAM domain to NOUVEAU_BO_GART so all objects are allocated in system memory.
>>
>> Signed-off-by: Alexandre Courbot <acour...@nvidia.com>
>> ---
>>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> index ac5823e4a8d5..ad143cd9a140 100644
>> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
>> @@ -620,6 +620,16 @@ nvc0_screen_create(struct nouveau_device *dev)
>>        return NULL;
>>     pscreen = &screen->base.base;
>>
>> +   /* Recognize chipsets with no VRAM */
>> +   switch (dev->chipset) {
>> +   /* GK20A */
>> +   case 0xea:
>> +      screen->base.vram_domain = NOUVEAU_BO_GART;
>
> I think you also want to set vidmem_bindings = 0... although potentially after the |= that's done below. Although I guess that constbuf + command args buf need to be |='d into the sysmem_bindings for this to work out well.
>
> That said, we don't really handle explicit migration well right now, and those PIPE_BIND_* are *incredibly* misleading and don't actually necessarily reflect the current usage. [I have some patches to improve the situation, but you don't really have to worry about that.]

In the light of that it could be that the vram_domain member I am introducing is completely useless. If we set NV_VRAM_DOMAIN to be the following:

#define NV_VRAM_DOMAIN(screen) ((screen)->vidmem_bindings == 0 ? NOUVEAU_BO_GART : NOUVEAU_BO_VRAM)

then I suspect we can just live without it. I tested quickly and it seems to work. Ilia, do you agree? Or could we imagine having GPUs with VRAM for which none of the PIPE_BIND_* targets should reside in VRAM?
Also thinking, prior to setting vidmem_bindings to 0, shouldn't we also do a sysmem_bindings |= vidmem_bindings to make sure all the set bindings are tracked somewhere?
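The `NV_VRAM_DOMAIN` macro proposed above and the suggested `sysmem_bindings |= vidmem_bindings` step can be sketched under the following assumptions. `screen_sketch`, `disable_vram_bindings()` and the `BIND_*` flags are hypothetical. The flag values are placeholders, not the real libdrm or gallium constants:

```cpp
#include <cassert>
#include <cstdint>

// Placeholder flag values for illustration only; the real NOUVEAU_BO_* and
// PIPE_BIND_* constants come from the libdrm and gallium headers.
constexpr uint32_t NOUVEAU_BO_VRAM = 1u << 0;
constexpr uint32_t NOUVEAU_BO_GART = 1u << 1;
constexpr unsigned BIND_VERTEX_BUFFER = 1u << 0;    // hypothetical
constexpr unsigned BIND_CONSTANT_BUFFER = 1u << 1;  // hypothetical
constexpr unsigned BIND_COMMAND_ARGS = 1u << 2;     // hypothetical

// Hypothetical, trimmed-down screen holding only the two binding masks.
struct screen_sketch {
   unsigned vidmem_bindings;
   unsigned sysmem_bindings;
};

// The macro proposed in the thread: no VRAM bindings means GART placement.
#define NV_VRAM_DOMAIN(screen) \
   ((screen)->vidmem_bindings == 0 ? NOUVEAU_BO_GART : NOUVEAU_BO_VRAM)

// For a chipset without VRAM: fold the VRAM bindings into the sysmem set
// first, so every binding stays tracked somewhere, then clear the VRAM set.
inline void disable_vram_bindings(screen_sketch *screen) {
   screen->sysmem_bindings |= screen->vidmem_bindings;
   screen->vidmem_bindings = 0;
}
```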
Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll
Hi Tom,

That's peculiar. It looks like pthreads got into a weird state somehow. I don't precisely understand how, though. Maybe there's a race inside pipe_semaphore_signal() with the destruction of the semaphore.

I think the best thing for now is to revert to the old behavior for non-Windows platforms:

diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c
index 6b54d43..e168766 100644
--- a/src/gallium/drivers/llvmpipe/lp_rast.c
+++ b/src/gallium/drivers/llvmpipe/lp_rast.c
@@ -800,7 +800,9 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data )
       pipe_semaphore_signal(&task->work_done);
    }
 
+#ifdef _WIN32
    pipe_semaphore_signal(&task->work_done);
+#endif
 
    return 0;
 }
@@ -891,7 +893,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast )
     * We don't actually call pipe_thread_wait to avoid dead lock on Windows
     * per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */
    for (i = 0; i < rast->num_threads; i++) {
+#ifdef _WIN32
       pipe_semaphore_wait(&rast->tasks[i].work_done);
+#else
+      pipe_thread_wait(rast->threads[i]);
+#endif
    }
 
    /* Clean up per-thread data */

I don't think the Windows deadlock ever happens on Linux.

Jose

From: Tom Stellard <t...@stellard.net>
Sent: 13 November 2014 01:45
To: Jose Fonseca
Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger
Subject: Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll

On Fri, Nov 07, 2014 at 04:52:25PM +, jfons...@vmware.com wrote:
> From: José Fonseca <jfons...@vmware.com>

Hi Jose,

This patch is causing random segfaults with OpenCL programs on radeonsi. I haven't been able to figure out exactly what is happening, so I was hoping you could help.

I think the problem has something to do with the fact that when clover probes the hardware for OpenCL devices, the pipe_loader creates an llvmpipe screen, checks the value of PIPE_CAP_COMPUTE, and then destroys the screen since PIPE_CAP_COMPUTE is 0.
The only way I can reproduce this bug is by running the piglit OpenCL tests concurrently. If it helps, here are the stack traces from one of the core dumps I captured from a piglit run:

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f6d53cdf700 (LWP 18653))]
#0  0x7f6d53e56d2d in ?? ()
(gdb) bt
#0  0x7f6d53e56d2d in ?? ()
#1  0x in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6d5495f700 (LWP 18652))]
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x7f6d54c71dbb in mtx_init (mtx=0x7f6d54c71dbb <mtx_init+97>, type=0) at ../../../../../include/c11/threads_posix.h:182
#2  0x7f6d54c72157 in radeon_set_fd_access (applier=0x61e828, owner=0x61e800, mutex=0x7f6d54c71dbb <mtx_init+97>, request=0, request_name=0x0, enable=238 '\356') at radeon_drm_winsys.c:70
#3  0x7f6d54c7ad30 in radeon_drm_cs_emit_ioctl (param=0x61e4f0) at radeon_drm_winsys.c:598
#4  0x7f6d54c71ce0 in cnd_wait (cond=0x61e4f0, mtx=0x7f6d54c7ad07 <radeon_drm_cs_emit_ioctl+168>) at ../../../../../include/c11/threads_posix.h:152
#5  0x7f6d5aac91da in start_thread () from /lib64/libpthread.so.0
#6  0x7f6d5afd5d7d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6d5c20c740 (LWP 18649))]
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
(gdb) bt
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
#1  0x7f6d5afae7fe in register_state () from /lib64/libc.so.6
#2  0x7f6d5afb1d39 in re_acquire_state_context () from /lib64/libc.so.6
#3  0x7f6d5afbaa95 in re_compile_internal () from /lib64/libc.so.6
#4  0x7f6d5afbb603 in regcomp () from /lib64/libc.so.6
#5  0x00403e9b in regex_get_matches (src=0x63e6c0 "float", pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$", pmatch=0x0, size=0, cflags=4) at /home/tstellar/piglit/tests/cl/program/program-tester.c:476
#6  0x004040e2 in regex_match (src=0x63e6c0 "float", pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$") at /home/tstellar/piglit/tests/cl/program/program-tester.c:532
#7  0x004059c6 in get_test_arg (src=0x63de70 "1 buffer float[7] 0.5 -0.5 0.0 -0.0 nan -3.99 1.5", test=0x645710, arg_in=true) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1016
#8  0x00406f4a in parse_config (
    config_str=0x63fe30 "\n[config]\nname: Test float trunc built-in on CL 1.1\nclc_version_min: 10\ndimensions: 1\n\n[test]\nname: trunc float1\nkernel_name: test_1_trunc_float\nglobal_size: 7 0 0\n\narg_out: 0 buffer float[7] 0.0 -0.0"..., config=0x60e260 <config>) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1410
#9  0x004074a7 in init (argc=2, argv=0x7fff46612d88, config=0x60e260 <config>) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1555
#10
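The non-Windows branch of the patch replaces the semaphore handshake with `pipe_thread_wait()`, which is a join. Below is a minimal C++ sketch of that shutdown order. It uses `std::thread` instead of Mesa's `pipe_thread` wrappers. `rasterizer_sketch` and its members are hypothetical and model none of llvmpipe's actual rasterization work:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a rasterizer with worker threads. Shutdown
// follows the non-Windows branch of the patch: ask every worker to exit,
// then join it, so no worker is still running when per-thread data is freed.
class rasterizer_sketch {
public:
   explicit rasterizer_sketch(unsigned num_threads) {
      for (unsigned i = 0; i < num_threads; i++)
         threads_.emplace_back([this] { worker(); });
   }

   ~rasterizer_sketch() {
      if (!threads_.empty())
         destroy();
   }

   // Analogue of lp_rast_destroy() on non-Windows: signal, then join.
   void destroy() {
      {
         std::lock_guard<std::mutex> lock(mutex_);
         exit_requested_ = true;
      }
      cond_.notify_all();
      for (std::thread &t : threads_)
         t.join();        // returns only once the worker has fully exited
      threads_.clear();   // per-thread data can now be released safely
   }

   // Only meaningful after destroy(); join() orders the workers' writes
   // before this read.
   unsigned exited_workers() const { return exited_; }

private:
   void worker() {
      std::unique_lock<std::mutex> lock(mutex_);
      cond_.wait(lock, [this] { return exit_requested_; });
      exited_++;          // last touch of shared state before returning
   }

   std::mutex mutex_;
   std::condition_variable cond_;
   bool exit_requested_ = false;
   unsigned exited_ = 0;
   std::vector<std::thread> threads_;
};
```

Joining guarantees that each worker has returned before anything it uses is torn down. Waiting on a semaphore only guarantees that the worker has reached its signal call.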
Re: [Mesa-dev] [PATCH] radeonsi: Disable asynchronous DMA except for PIPE_BUFFER
Reviewed-by: Marek Olšák <marek.ol...@amd.com>

I suggest pasting the commit message into the code.

Marek

On Thu, Nov 13, 2014 at 7:52 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
>
> Using the asynchronous DMA engine for multi-dimensional operations seems to cause random GPU lockups for various people. While the root cause for this might need to be fixed in the kernel, let's disable it for now.
>
> Before re-enabling this, please make sure you can hit all newly enabled paths in your testing, preferably with both piglit and real world apps, and get in touch with people on the bug reports below for stability testing.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85647
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83500
> Cc: mesa-sta...@lists.freedesktop.org
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
> ---
>  src/gallium/drivers/radeonsi/si_dma.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_dma.c b/src/gallium/drivers/radeonsi/si_dma.c
> index b1bd5e7..1d3b524 100644
> --- a/src/gallium/drivers/radeonsi/si_dma.c
> +++ b/src/gallium/drivers/radeonsi/si_dma.c
> @@ -250,6 +250,9 @@ void si_dma_copy(struct pipe_context *ctx,
>                 return;
>         }
>
> +       /* XXX: The paths below cause lockups for some */
> +       goto fallback;
> +
>         if (src->format != dst->format || src_box->depth > 1 ||
>             rdst->dirty_level_mask != 0 ||
>             rdst->cmask.size || rdst->fmask.size ||
> --
> 2.1.3
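The patch's control flow can be sketched as follows, with the rationale kept in a comment as Marek suggests. `target_sketch`, `copy_path` and `dma_copy_sketch()` are hypothetical. The real si_dma_copy() checks many more conditions before it chooses a path:

```cpp
#include <cassert>

// Hypothetical resource targets; only the buffer/non-buffer split matters.
enum class target_sketch { buffer, texture_2d, texture_3d };
enum class copy_path { async_dma, fallback };

copy_path dma_copy_sketch(target_sketch target) {
   if (target == target_sketch::buffer)
      return copy_path::async_dma;   // buffer copies keep using async DMA

   /* Using the asynchronous DMA engine for multi-dimensional operations
    * seems to cause random GPU lockups for various people, so these paths
    * stay disabled until they have been stability-tested
    * (see bugs 85647 and 83500). */
   goto fallback;

   // Unreachable while the bypass is in place; kept so that re-enabling
   // the DMA path is a one-line change.
   if (target == target_sketch::texture_2d)
      return copy_path::async_dma;

fallback:
   return copy_path::fallback;
}
```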
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
Thanks for doing this. It has been long overdue.

Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally. I'm also interested in cutting down the number of opcodes in use, so I'll try to replace their usage with something else. But until then please hold on to those two patches. The rest looks good AFAICT.

Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively. They are quite convenient when translating D3D 9/10 shaders, which also have them. And if one day we need to support recursive subroutines (CUDA 4.0 appears to have them; not sure about OpenCL, but I suppose it's only a matter of time), then they'll be unavoidable, as in-lining subroutines won't work anymore.

Jose

From: mesa-dev <mesa-dev-boun...@lists.freedesktop.org> on behalf of Eric Anholt <e...@anholt.net>
Sent: 13 November 2014 01:18
To: mesa-dev@lists.freedesktop.org
Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

This series removes a bunch of unused opcodes, mostly from TGSI. It doesn't go as far as we could possibly go -- while I welcome discussion for future patch series deleting more, I hope that discussion doesn't derail the review process for these changes.

I haven't messed with the subroutine stuff, since I don't know what people are planning with that. I also haven't messed with the pack/unpack opcodes in TGSI, since they might be useful for some of the GLSL packing stuff.

Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.
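Jose's point about recursion can be made concrete. A translator can in-line every subroutine only if the call graph reachable from the entry point has no cycle. Below is a minimal sketch under that assumption: a depth-first search over a hypothetical `call_graph` map that takes a subroutine name to the names it CALs. Nothing here is TGSI API:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: subroutine name -> subroutines it CALs.
using call_graph = std::map<std::string, std::vector<std::string>>;

// Depth-first search with an "on the current call chain" set: reaching a
// subroutine that is already on the chain means direct or mutual recursion.
static bool reaches_cycle(const call_graph &g, const std::string &sub,
                          std::set<std::string> &on_stack,
                          std::set<std::string> &done) {
   if (on_stack.count(sub))
      return true;                  // back edge: recursion
   if (done.count(sub))
      return false;                 // already known to be acyclic
   on_stack.insert(sub);
   auto it = g.find(sub);
   if (it != g.end())
      for (const std::string &callee : it->second)
         if (reaches_cycle(g, callee, on_stack, done))
            return true;
   on_stack.erase(sub);
   done.insert(sub);
   return false;
}

// True when every subroutine reachable from entry could be in-lined.
bool can_inline_all(const call_graph &g, const std::string &entry) {
   std::set<std::string> on_stack, done;
   return !reaches_cycle(g, entry, on_stack, done);
}
```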
Re: [Mesa-dev] [PATCH 2/4] i965/vec4: Use const references in emit() functions.
Kenneth Graunke <kenn...@whitecape.org> writes:

> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

Reviewed-by: Francisco Jerez <curroje...@riseup.net>

> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h           | 18 ++++++++----------
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 11 ++++++-----
>  2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 3301dd8..ebbf882 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -399,16 +399,14 @@ public:
>     vec4_instruction *emit(vec4_instruction *inst);
>     vec4_instruction *emit(enum opcode opcode);
> -
> -   vec4_instruction *emit(enum opcode opcode, dst_reg dst);
> -
> -   vec4_instruction *emit(enum opcode opcode, dst_reg dst, src_reg src0);
> -
> -   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
> -                          src_reg src0, src_reg src1);
> -
> -   vec4_instruction *emit(enum opcode opcode, dst_reg dst,
> -                          src_reg src0, src_reg src1, src_reg src2);
> +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst);
> +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
> +                          const src_reg &src0);
> +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
> +                          const src_reg &src0, const src_reg &src1);
> +   vec4_instruction *emit(enum opcode opcode, const dst_reg &dst,
> +                          const src_reg &src0, const src_reg &src1,
> +                          const src_reg &src2);
>
>     vec4_instruction *emit_before(bblock_t *block,
>                                   vec4_instruction *inst,
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index b46879b..a8ce498 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -79,8 +79,8 @@ vec4_visitor::emit_before(bblock_t *block, vec4_instruction *inst,
>  }
>
>  vec4_instruction *
> -vec4_visitor::emit(enum opcode opcode, dst_reg dst,
> -                   src_reg src0, src_reg src1, src_reg src2)
> +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
> +                   const src_reg &src1, const src_reg &src2)
>  {
>     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0, src1, src2));
> @@ -88,19 +88,20 @@ vec4_visitor::emit(enum opcode opcode, dst_reg dst,
>
>  vec4_instruction *
> -vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0, src_reg src1)
> +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
> +                   const src_reg &src1)
>  {
>     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0, src1));
>  }
>
>  vec4_instruction *
> -vec4_visitor::emit(enum opcode opcode, dst_reg dst, src_reg src0)
> +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst, const src_reg &src0)
>  {
>     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst, src0));
>  }
>
>  vec4_instruction *
> -vec4_visitor::emit(enum opcode opcode, dst_reg dst)
> +vec4_visitor::emit(enum opcode opcode, const dst_reg &dst)
>  {
>     return emit(new(mem_ctx) vec4_instruction(this, opcode, dst));
>  }
> --
> 2.1.3
Re: [Mesa-dev] [PATCH 1/4] i965: Use macros to create prototypes for emitter helpers.
Kenneth Graunke <kenn...@whitecape.org> writes:

> We do this almost everywhere else; this should make it easier to modify.
>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>

For this patch:

Reviewed-by: Francisco Jerez <curroje...@riseup.net>

> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h | 98 +++-
>  1 file changed, 41 insertions(+), 57 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 750f491..3301dd8 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -414,68 +414,52 @@ public:
>                                   vec4_instruction *inst,
>                                   vec4_instruction *new_inst);
>
> -   vec4_instruction *MOV(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *NOT(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *RNDD(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *RNDE(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *RNDZ(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *FRC(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *F32TO16(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *F16TO32(const dst_reg &dst, const src_reg &src0);
> -   vec4_instruction *ADD(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *MUL(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *MACH(const dst_reg &dst, const src_reg &src0,
> -                          const src_reg &src1);
> -   vec4_instruction *MAC(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *AND(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *OR(const dst_reg &dst, const src_reg &src0,
> -                        const src_reg &src1);
> -   vec4_instruction *XOR(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *DP3(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *DP4(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *DPH(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *SHL(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *SHR(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> -   vec4_instruction *ASR(const dst_reg &dst, const src_reg &src0,
> -                         const src_reg &src1);
> +#define EMIT1(op) vec4_instruction *op(const dst_reg &, const src_reg &);
> +#define EMIT2(op) vec4_instruction *op(const dst_reg &, const src_reg &, const src_reg &);
> +#define EMIT3(op) vec4_instruction *op(const dst_reg &, const src_reg &, const src_reg &, const src_reg &);
> +   EMIT1(MOV)
> +   EMIT1(NOT)
> +   EMIT1(RNDD)
> +   EMIT1(RNDE)
> +   EMIT1(RNDZ)
> +   EMIT1(FRC)
> +   EMIT1(F32TO16)
> +   EMIT1(F16TO32)
> +   EMIT2(ADD)
> +   EMIT2(MUL)
> +   EMIT2(MACH)
> +   EMIT2(MAC)
> +   EMIT2(AND)
> +   EMIT2(OR)
> +   EMIT2(XOR)
> +   EMIT2(DP3)
> +   EMIT2(DP4)
> +   EMIT2(DPH)
> +   EMIT2(SHL)
> +   EMIT2(SHR)
> +   EMIT2(ASR)
>     vec4_instruction *CMP(dst_reg dst, src_reg src0, src_reg src1,
>                           enum brw_conditional_mod condition);
>     vec4_instruction *IF(src_reg src0, src_reg src1,
>                          enum brw_conditional_mod condition);
>     vec4_instruction *IF(enum brw_predicate predicate);
> -   vec4_instruction *PULL_CONSTANT_LOAD(const dst_reg &dst,
> -                                        const src_reg &index);
> -   vec4_instruction *SCRATCH_READ(const dst_reg &dst, const src_reg &index);
> -   vec4_instruction *SCRATCH_WRITE(const dst_reg &dst, const src_reg &src,
> -                                   const src_reg &index);
> -   vec4_instruction *LRP(const dst_reg &dst, const src_reg &a,
> -                         const src_reg &y, const src_reg &x);
> -   vec4_instruction *BFREV(const dst_reg &dst, const src_reg &value);
> -   vec4_instruction *BFE(const dst_reg &dst, const src_reg &bits,
> -                         const src_reg &offset, const src_reg &value);
> -   vec4_instruction *BFI1(const dst_reg &dst, const src_reg &bits,
> -                          const src_reg &offset);
> -   vec4_instruction *BFI2(const dst_reg &dst, const src_reg &bfi1_dst,
> -                          const src_reg &insert, const src_reg &base);
> -   vec4_instruction *FBH(const dst_reg &dst, const src_reg &value);
> -   vec4_instruction *FBL(const dst_reg &dst, const src_reg &value);
> -   vec4_instruction *CBIT(const dst_reg &dst, const src_reg &value);
> -   vec4_instruction *MAD(const dst_reg &dst, const src_reg &c,
> -                         const src_reg &b, const src_reg &a);
> -   vec4_instruction *ADDC(const
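The macro technique in this patch can be tried in isolation. In the sketch below, `visitor_sketch`, `inst_sketch` and the `DEFINE_EMIT*` helpers are hypothetical. The patch itself only generates the prototypes; generating the definitions the same way is shown purely as an illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down register and instruction types.
struct dst_reg_sketch { int nr; };
struct src_reg_sketch { int nr; };
struct inst_sketch { std::string op; int num_srcs; };

// One macro per arity expands to a member declaration, so changing how all
// emitter helpers are declared is a one-line edit per macro.
#define EMIT1(op) inst_sketch op(const dst_reg_sketch &, const src_reg_sketch &);
#define EMIT2(op) inst_sketch op(const dst_reg_sketch &, const src_reg_sketch &, \
                                 const src_reg_sketch &);

class visitor_sketch {
public:
   EMIT1(MOV)
   EMIT1(NOT)
   EMIT2(ADD)
   EMIT2(MUL)
};

// Illustrative only: definitions generated the same way, using #op to
// record the opcode name.
#define DEFINE_EMIT1(op)                                                       \
   inst_sketch visitor_sketch::op(const dst_reg_sketch &, const src_reg_sketch &) \
   { return inst_sketch{#op, 1}; }
#define DEFINE_EMIT2(op)                                                       \
   inst_sketch visitor_sketch::op(const dst_reg_sketch &, const src_reg_sketch &, \
                                  const src_reg_sketch &)                       \
   { return inst_sketch{#op, 2}; }

DEFINE_EMIT1(MOV)
DEFINE_EMIT1(NOT)
DEFINE_EMIT2(ADD)
DEFINE_EMIT2(MUL)
```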
Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.
Kenneth Graunke <kenn...@whitecape.org> writes:

> On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote:
>> On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke <kenn...@whitecape.org> wrote:
>> > +vec4_visitor::emit_math(enum opcode opcode,
>> > +                        dst_reg dst, src_reg src0, src_reg src1)
>>
>> I think you can make the arguments const references too?
>
> Yeah. I've changed the prototype to:
>
>    void emit_math(enum opcode opcode, const dst_reg &dst, const src_reg &src0,
>                   const src_reg &src1 = src_reg());
>
> It also meant changing the first few lines to:
>
>    vec4_instruction *math =
>       emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1));
>
> since src0 = fix_math_operand(src0) doesn't work with const src_reg &.
>
>> > +   if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) {
>> > +      /* MATH on Gen6 must be align1, so we can't do writemasks. */
>> > +      math->dst = dst_reg(this, glsl_type::vec4_type);
>> > +      math->dst.type = dst.type;
>> > +      math->dst.writemask = WRITEMASK_XYZW;
>>
>> I don't think you need to set the writemask (XYZW is the default).
>
> I do, actually - it's guaranteed to not be XYZW at this point.
>
> The caller passed us a destination register with some writemask set. We create the math instruction using dst, so it inherits that writemask. This block executes when dst.writemask != WRITEMASK_XYZW.
>
> The point is to override it back to XYZW, since it isn't.

Are you sure? You are assigning a newly created dst_reg() to math->dst, so it should have the default writemask for a vec4, which is XYZW already.

With that fixed and the change you mention above, this patch is:

Reviewed-by: Francisco Jerez <curroje...@riseup.net>

I had a very similar change in my tree, but you beat me to it ;).

>> > +      emit(MOV(dst, src_reg(math->dst)));
>> > +   } else if (brw->gen < 6) {
>> > +      math->base_mrf = 1;
>> > +      math->mlen = src1.file == BAD_FILE ? 1 : 2;
>> >     }
>> >  }
>>
>> Series is Reviewed-by: Matt Turner <matts...@gmail.com>
>
> Thanks!
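Curro's objection rests on the fact that assigning a freshly constructed register replaces every field, including the writemask. Below is a minimal sketch. It assumes a hypothetical `dst_reg_sketch` whose constructor writes all four components by default, which stands in for what Curro says `dst_reg(this, glsl_type::vec4_type)` does. The writemask values are placeholders:

```cpp
#include <cassert>

// Placeholder writemask bits (x = 1, y = 2, z = 4, w = 8).
enum : unsigned { WRITEMASK_X = 1u, WRITEMASK_Z = 4u, WRITEMASK_XYZW = 15u };

// Trimmed-down, hypothetical stand-in for dst_reg: a new vec4 temporary
// writes all four components unless told otherwise.
struct dst_reg_sketch {
   int nr;
   int type;
   unsigned writemask;
   explicit dst_reg_sketch(int nr_, unsigned mask = WRITEMASK_XYZW)
      : nr(nr_), type(0), writemask(mask) {}
};

struct math_sketch { dst_reg_sketch dst; };

// Gen6-style redirect: the assignment replaces the whole register, so the
// caller's partial writemask is not carried over and no explicit
// "math.dst.writemask = WRITEMASK_XYZW;" is needed.
inline void redirect_to_temp(math_sketch &math, const dst_reg_sketch &dst, int temp_nr) {
   math.dst = dst_reg_sketch(temp_nr);
   math.dst.type = dst.type;
}
```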
Re: [Mesa-dev] [PATCH 4/4] i965/vec4: Make src_reg immediate constructors explicit.
Kenneth Graunke <kenn...@whitecape.org> writes:

> We did this for fs_reg a while back, and it's generally a good idea.

I disagree, explicit constructors aren't one-size-fits-all. IMO there are three scenarios in which explicit constructors may be a good idea:

 - Cases where your constructor may lose relevant information about its argument when used inadvertently. IOW there is a many-to-one mapping between your argument type and your constructed type.

 - Cases where your constructor doesn't leave the constructed object completely initialized and some additional action may be required to bring the constructed object to a well-defined state. IOW there is a one-to-many mapping between your argument type and your constructed type.

 - Cases where your constructor has to do some expensive or run-time environment-dependent operation.

If none of these apply, your argument and constructed objects are effectively the same thing, and declaring the constructor explicit just adds clutter and increases the amount of typing you have to do for no benefit.

I suspect that the immediate register constructors from both back-ends don't fit in any of the three categories. They do the only sane thing they could possibly do without losing any information, so I don't see why we would want them to be explicit. Actually it would make it rather annoying to pass immediates around with the i965 IR builder framework I'm working on for ARB_shader_image_load_store. I would have to change my src_vector type to have a constructor for each immediate type instead of relying on the implicit conversion to src/fs_reg. Then I'd have to maintain another constructor for each possible src/fs_reg constructor argument and keep them up to date.

I agree though that there is a good reason for the src_reg(dst_reg) constructor and its converse to be marked explicit, because they (currently) lose information. dst_reg(src_reg) necessarily loses component ordering information because you cannot represent that as a writemask. Still, the transformation could be better behaved than what we have if it calculated the subset of components referenced by the swizzle of its argument instead of special-casing . There's no good reason why src_reg(dst_reg) should lose information. I think it would make sense, and it would be very convenient, to make it implicit if it fulfills the property 'dst_reg(src_reg(dst_reg(x))) == dst_reg(x)' and we fix it so the following code does the only sane thing:

| dst_reg reg = x;
| ADD(reg, src_reg(reg), y);

I can send patches to address the last two issues; actually I have a fix for them lying around in some branch...

> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.h              |  6 +--
>  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 35 ---
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp    | 12 ++---
>  src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp     | 55 ---
>  4 files changed, 55 insertions(+), 53 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 8abd166..3d2882d 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -99,9 +99,9 @@ public:
>     src_reg(register_file file, int reg, const glsl_type *type);
>     src_reg();
> -   src_reg(float f);
> -   src_reg(uint32_t u);
> -   src_reg(int32_t i);
> +   explicit src_reg(float f);
> +   explicit src_reg(uint32_t u);
> +   explicit src_reg(int32_t i);
>     src_reg(struct brw_reg reg);
>
>     bool equals(const src_reg &r) const;
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> index db0e6cc..58c4df2 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
> @@ -150,7 +150,7 @@ vec4_gs_visitor::emit_prolog()
>      */
>     this->current_annotation = "clear r0.2";
>     dst_reg r0(retype(brw_vec4_grf(0, 0), BRW_REGISTER_TYPE_UD));
> -   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, 0u);
> +   vec4_instruction *inst = emit(GS_OPCODE_SET_DWORD_2, r0, src_reg(0u));
>     inst->force_writemask_all = true;
>
>     /* Create a virtual register to hold the vertex count */
> @@ -158,7 +158,7 @@ vec4_gs_visitor::emit_prolog()
>
>     /* Initialize the vertex_count register to 0 */
>     this->current_annotation = "initialize vertex_count";
> -   inst = emit(MOV(dst_reg(this->vertex_count), 0u));
> +   inst = emit(MOV(dst_reg(this->vertex_count), src_reg(0u)));
>     inst->force_writemask_all = true;
>
>     if (c->control_data_header_size_bits > 0) {
> @@ -173,7 +173,7 @@ vec4_gs_visitor::emit_prolog()
>      */
>     if (c->control_data_header_size_bits <= 32) {
>        this->current_annotation = "initialize control data bits";
> -      inst = emit(MOV(dst_reg(this->control_data_bits), 0u));
> +      inst = emit(MOV(dst_reg(this->control_data_bits),
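Curro's suggested behaviour for `dst_reg(src_reg)` is to write exactly the components the swizzle references. That behaviour, together with the round-trip property `dst_reg(src_reg(dst_reg(x))) == dst_reg(x)`, can be sketched on writemasks and swizzles alone. The 2-bit-per-channel swizzle encoding and both helpers are hypothetical, not taken from i965. How `swizzle_for_writemask()` fills disabled channels is one possible choice among several:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encodings: a swizzle packs four 2-bit channel selectors
// (channel i in bits [2i, 2i+1]); writemask bit c means channel c is written.
using swizzle_t = uint8_t;
using writemask_t = uint8_t;

constexpr swizzle_t make_swizzle(unsigned x, unsigned y, unsigned z, unsigned w) {
   return swizzle_t(x | (y << 2) | (z << 4) | (w << 6));
}

constexpr unsigned swizzle_channel(swizzle_t s, unsigned i) {
   return (s >> (2 * i)) & 3u;
}

// dst_reg(src_reg) direction: write exactly the components the swizzle
// references. Component ordering is lost; no writemask can express it.
inline writemask_t writemask_for_swizzle(swizzle_t s) {
   writemask_t mask = 0;
   for (unsigned i = 0; i < 4; i++)
      mask |= writemask_t(1u << swizzle_channel(s, i));
   return mask;
}

// src_reg(dst_reg) direction: enabled channels read themselves and disabled
// channels repeat the first enabled one, so no extra component is
// referenced. Assumes a non-empty writemask.
inline swizzle_t swizzle_for_writemask(writemask_t mask) {
   unsigned first = 0;
   while (!(mask & (1u << first)))
      first++;
   unsigned sel[4];
   for (unsigned i = 0; i < 4; i++)
      sel[i] = (mask & (1u << i)) ? i : first;
   return make_swizzle(sel[0], sel[1], sel[2], sel[3]);
}
```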
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
It looks like ARR is generated, as src/gallium/state_trackers/nine/nine_shader.c has

#define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \
    { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h }
[...]
    _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL),

Jose

From: mesa-dev <mesa-dev-boun...@lists.freedesktop.org> on behalf of Eric Anholt <e...@anholt.net>
Sent: 13 November 2014 01:43
To: Ilia Mirkin
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)

Ilia Mirkin <imir...@alum.mit.edu> writes:

> AFAIK at least some of these (NRM, ARR, probably others) were being used by the d3d9 state tracker. Not sure what its status is, but I believe the hope was to eventually get it into the tree.

They've got code for lowering NRM and CND to sanity, and no use of ARR, ARA, X2D, RFL, STR, SFL, or BRA.
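The `_OPI` macro above builds a translation table by token pasting, so one argument names both the D3D9 opcode and the TGSI opcode. Here is a reduced sketch of the same idea. The enums, the four-field `op_info` struct and `find_op()` are hypothetical stand-ins for nine_shader.c's real definitions:

```cpp
#include <cassert>

// Hypothetical stand-ins for the D3DSIO_* and TGSI_OPCODE_* enums.
enum d3d_opcode { D3DSIO_MOV, D3DSIO_MOVA };
enum tgsi_opcode { TGSI_OPCODE_MOV, TGSI_OPCODE_ARR };

struct op_info {
   d3d_opcode d3d;
   tgsi_opcode tgsi;
   int num_dst;
   int num_src;
};

// Token pasting turns OPI(MOVA, ARR, ...) into
// { D3DSIO_MOVA, TGSI_OPCODE_ARR, ... }.
#define OPI(o, t, d, s) { D3DSIO_##o, TGSI_OPCODE_##t, d, s }

static const op_info op_table[] = {
   OPI(MOV,  MOV, 1, 1),
   OPI(MOVA, ARR, 1, 1),   // a MOVA in the D3D9 shader produces TGSI ARR
};

inline const op_info *find_op(d3d_opcode op) {
   for (const op_info &info : op_table)
      if (info.d3d == op)
         return &info;
   return nullptr;
}
```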
Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll
On Thu, Nov 13, 2014 at 11:10:39AM +, Jose Fonseca wrote: Hi Tom, That's peculiar. It looks like pthreads got into a weird state somehow. Don't precisely understand how though. Maybe there's a race inside pipe_semaphore_signal() with the destruction of the semaphore. I think the best thing for now is to revert to old behavior for non-windows platforms: diff --git a/src/gallium/drivers/llvmpipe/lp_rast.c b/src/gallium/drivers/llvmpipe/lp_rast.c index 6b54d43..e168766 100644 --- a/src/gallium/drivers/llvmpipe/lp_rast.c +++ b/src/gallium/drivers/llvmpipe/lp_rast.c @@ -800,7 +800,9 @@ static PIPE_THREAD_ROUTINE( thread_function, init_data ) pipe_semaphore_signal(task-work_done); } +#ifdef _WIN32 pipe_semaphore_signal(task-work_done); +#endif return 0; } @@ -891,7 +893,11 @@ void lp_rast_destroy( struct lp_rasterizer *rast ) * We don't actually call pipe_thread_wait to avoid dead lock on Windows * per https://bugs.freedesktop.org/show_bug.cgi?id=76252 */ for (i = 0; i rast-num_threads; i++) { +#ifdef _WIN32 pipe_semaphore_wait(rast-tasks[i].work_done); +#else + pipe_thread_wait(rast-threads[i]); +#endif } /* Clean up per-thread data */ Because I don't think that the Windows deadlock ever happens on Linux. This solution works for me. Feel free to commit. I wonder if the problem may be the pipe-loader is unloading pipe_swrast.so before all the threads have finished. -Tom Jose From: Tom Stellard t...@stellard.net Sent: 13 November 2014 01:45 To: Jose Fonseca Cc: mesa-dev@lists.freedesktop.org; Roland Scheidegger Subject: Re: [Mesa-dev] [PATCH] llvmpipe: Avoid deadlock when unloading opengl32.dll On Fri, Nov 07, 2014 at 04:52:25PM +, jfons...@vmware.com wrote: From: José Fonseca jfons...@vmware.com Hi Jose, This patch is causing random segfaults with OpenCL programs on radeonsi. I haven't been able to figure out exactly what is happening, so I was hoping you could help. 
I think the problem has something to do with the fact that when clover probes the hardware for OpenCL devices, the pipe_loader creates an llvmpipe screen, checks the value of PIPE_CAP_COMPUTE, and then destroys the screen since PIPE_CAP_COMPUTE is 0. The only way I can reproduce this bug is by running the piglit OpenCL tests concurrently. If it helps, here are the stack traces from one of the core dumps I captured from a piglit run:
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f6d53cdf700 (LWP 18653))]
#0  0x7f6d53e56d2d in ?? ()
(gdb) bt
#0  0x7f6d53e56d2d in ?? ()
#1  0x in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f6d5495f700 (LWP 18652))]
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x7f6d5aacd44c in pthread_cond_wait () from /lib64/libpthread.so.0
#1  0x7f6d54c71dbb in mtx_init (mtx=0x7f6d54c71dbb <mtx_init+97>, type=0) at ../../../../../include/c11/threads_posix.h:182
#2  0x7f6d54c72157 in radeon_set_fd_access (applier=0x61e828, owner=0x61e800, mutex=0x7f6d54c71dbb <mtx_init+97>, request=0, request_name=0x0, enable=238 '\356') at radeon_drm_winsys.c:70
#3  0x7f6d54c7ad30 in radeon_drm_cs_emit_ioctl (param=0x61e4f0) at radeon_drm_winsys.c:598
#4  0x7f6d54c71ce0 in cnd_wait (cond=0x61e4f0, mtx=0x7f6d54c7ad07 <radeon_drm_cs_emit_ioctl+168>) at ../../../../../include/c11/threads_posix.h:152
#5  0x7f6d5aac91da in start_thread () from /lib64/libpthread.so.0
#6  0x7f6d5afd5d7d in clone () from /lib64/libc.so.6
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f6d5c20c740 (LWP 18649))]
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
(gdb) bt
#0  0x7f6d5afae73e in re_node_set_insert_last () from /lib64/libc.so.6
#1  0x7f6d5afae7fe in register_state () from /lib64/libc.so.6
#2  0x7f6d5afb1d39 in re_acquire_state_context () from /lib64/libc.so.6
#3  0x7f6d5afbaa95 in re_compile_internal () from /lib64/libc.so.6
#4  0x7f6d5afbb603 in regcomp () from /lib64/libc.so.6
#5  0x00403e9b in regex_get_matches (src=0x63e6c0 "float", pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$", pmatch=0x0, size=0, cflags=4) at /home/tstellar/piglit/tests/cl/program/program-tester.c:476
#6  0x004040e2 in regex_match (src=0x63e6c0 "float", pattern=0x40b940 "^ulong|ulong2|ulong3|ulong4|ulong8|ulong16$") at /home/tstellar/piglit/tests/cl/program/program-tester.c:532
#7  0x004059c6 in get_test_arg (src=0x63de70 "1 buffer float[7] 0.5 -0.5 0.0 -0.0 nan -3.99 1.5", test=0x645710, arg_in=true) at /home/tstellar/piglit/tests/cl/program/program-tester.c:1016
#8  0x00406f4a in parse_config (config_str=0x63fe30 "\n[config]\nname: Test float trunc built-in on CL 1.1\nclc_version_min: 10\ndimensions: 1\n\n[test]\nname: trunc
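The non-Windows branch of Jose's diff replaces the semaphore handshake with pipe_thread_wait(), i.e. a real join. A join only returns once the worker has completely exited, whereas a semaphore signalled just before `return 0;` lets the destroying thread carry on while the worker is still running its last instructions, which matters if, as Tom suspects, the pipe-loader then unloads pipe_swrast.so. A minimal sketch of that join-on-destroy pattern with plain POSIX threads, assuming pipe_thread_wait() behaves like pthread_join(); the `rasterizer` and `task` structures and the function names are hypothetical stand-ins, not llvmpipe's real types:

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4

/* Hypothetical stand-ins for llvmpipe's per-thread task and rasterizer. */
struct task {
   int id;
   int work_done;            /* set by the worker before it exits */
};

struct rasterizer {
   unsigned num_threads;
   pthread_t threads[NUM_THREADS];
   struct task tasks[NUM_THREADS];
};

static void *thread_function(void *arg)
{
   struct task *task = arg;
   task->work_done = 1;      /* the worker's (trivial) job */
   return NULL;              /* after this returns, the thread is gone */
}

static void rast_create(struct rasterizer *rast)
{
   rast->num_threads = NUM_THREADS;
   for (unsigned i = 0; i < rast->num_threads; i++) {
      rast->tasks[i].id = (int)i;
      rast->tasks[i].work_done = 0;
      int rc = pthread_create(&rast->threads[i], NULL,
                              thread_function, &rast->tasks[i]);
      assert(rc == 0);
      (void)rc;
   }
}

/* Non-Windows teardown: join every worker, so none of them can still be
 * executing code (e.g. inside a shared library about to be unloaded) once
 * this returns. Returns how many workers finished their work. */
static unsigned rast_destroy(struct rasterizer *rast)
{
   unsigned finished = 0;
   for (unsigned i = 0; i < rast->num_threads; i++) {
      pthread_join(rast->threads[i], NULL);
      finished += (unsigned)rast->tasks[i].work_done;
   }
   return finished;
}
```

Per the comment in lp_rast_destroy(), joining was avoided on Windows because of a deadlock (bug 76252), which is why the patch keeps the semaphore path under `#ifdef _WIN32`.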
[Mesa-dev] [PATCH] linker: Add a missing space in an error message
---
 src/glsl/linker.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..41d6a82 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,7 @@ reserve_explicit_locations(struct gl_shader_program *prog,
        * or linker error will be generated. */
       linker_error(prog,
-                   "location qualifier for uniform %s overlaps"
+                   "location qualifier for uniform %s overlaps "
                    "previously used location",
                    var->name);
       return false;
--
1.9.3
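The bug comes from adjacent string-literal concatenation in C and C++: the compiler joins the two literals with no separator, so without a space at the end of the first one the message reads "...overlapspreviously used location". A small C check of that behaviour; the two helper functions are hypothetical and only mirror the format strings before and after the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Before the patch: the literals are joined as "...overlapspreviously...". */
static void format_before(char *buf, size_t size, const char *name)
{
   snprintf(buf, size,
            "location qualifier for uniform %s overlaps"
            "previously used location", name);
}

/* After the patch: the trailing space keeps the two words apart. */
static void format_after(char *buf, size_t size, const char *name)
{
   snprintf(buf, size,
            "location qualifier for uniform %s overlaps "
            "previously used location", name);
}
```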
Re: [Mesa-dev] [PATCH] linker: Add a missing space in an error message
On 11/13/2014 08:32 AM, Neil Roberts wrote:
---
 src/glsl/linker.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/glsl/linker.cpp b/src/glsl/linker.cpp
index bd2aa3c..41d6a82 100644
--- a/src/glsl/linker.cpp
+++ b/src/glsl/linker.cpp
@@ -2411,7 +2411,7 @@ reserve_explicit_locations(struct gl_shader_program *prog,
        * or linker error will be generated. */
       linker_error(prog,
-                   "location qualifier for uniform %s overlaps"
+                   "location qualifier for uniform %s overlaps "
                    "previously used location",
                    var->name);
       return false;
Reviewed-by: Brian Paul bri...@vmware.com
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
On 11/12/2014 06:18 PM, Eric Anholt wrote: This series removes a bunch of unused opcodes, mostly from TGSI. It doesn't go as far as we could possibly go -- while I welcome discussion for future patch series deleting more, I hope that discussion doesn't derail the review process for these changes. I haven't messed with the subroutine stuff, since I don't know what people are planning with that. I also haven't messed with the pack/unpack opcodes in TGSI, since they might be useful for some of the GLSL packing stuff. Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe. Except for what Jose said, this looks fine to me. Does anyone remember if there was a reason that the TGSI_OPCODE_ tokens are #defines instead of an enumeration? -Brian ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
Nine can lower ARR into ROUND+ARL easily. Marek On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote: It looks like ARR is generated, as src/gallium/state_trackers/nine/nine_shader.c has #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \ { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h } [...] _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL), Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:43 To: Ilia Mirkin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Ilia Mirkin imir...@alum.mit.edu writes: AFAIK at least some of these (NRM, ARR, probably others) were being used by the d3d9 state tracker. Not sure what its status is, but I believe the hope was to eventually get it into the tree. They've got code for lowering NRM and CND to sanity, and no use of ARR, ARA, X2D, RFL, STR, SFL, or BRA. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
This looks good to me. Other candidates for removal:
SUB (same as ADD with the Negate bit inverted)
CLAMP (same as MIN+MAX), some drivers don't implement this
ABS (same as MOV with the Abs bit set)
Marek
On Thu, Nov 13, 2014 at 2:18 AM, Eric Anholt e...@anholt.net wrote: This series removes a bunch of unused opcodes, mostly from TGSI. It doesn't go as far as we could possibly go -- while I welcome discussion for future patch series deleting more, I hope that discussion doesn't derail the review process for these changes. I haven't messed with the subroutine stuff, since I don't know what people are planning with that. I also haven't messed with the pack/unpack opcodes in TGSI, since they might be useful for some of the GLSL packing stuff. Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.
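Each of Marek's three candidates can be expressed with an opcode that stays, plus a source modifier or a second instruction. A scalar C model of those equivalences, as a sketch only: it treats one TGSI channel as a plain `float`, models the Negate and Abs source modifiers as functions, and ignores NaN and signed-zero details, where real MIN/MAX/ABS semantics may differ:

```c
#include <assert.h>

/* Source modifiers, modelled on a single float channel. */
static float mod_negate(float x) { return -x; }
static float mod_abs(float x) { return x < 0.0f ? -x : x; }

/* Opcodes that remain, modelled per channel. */
static float op_add(float a, float b) { return a + b; }
static float op_mov(float x) { return x; }
static float op_min(float a, float b) { return a < b ? a : b; }
static float op_max(float a, float b) { return a > b ? a : b; }

/* SUB a, b  ==  ADD a, -b  (negate modifier on the second source) */
static float lower_sub(float a, float b) { return op_add(a, mod_negate(b)); }

/* CLAMP x, lo, hi  ==  MIN(MAX(x, lo), hi) */
static float lower_clamp(float x, float lo, float hi)
{
   return op_min(op_max(x, lo), hi);
}

/* ABS x  ==  MOV |x|  (abs modifier on the source) */
static float lower_abs(float x) { return op_mov(mod_abs(x)); }
```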
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
As long as we have NAND, pretty much anything can be lowered to that... I am, of course, not advocating keeping around every insane instruction, but it does seem a bit arbitrary as to which ones we have and which ones we don't... I am personally guilty of adding a bunch, and it was never clear to me how much should be left to the backend optimizer to un-lower and how much should be done as separate instructions. My take was that as long there was a state tracker providing it as input, it made sense to keep the instruction. But perhaps there's a different policy that'd work better. Cheers, -ilia On Thu, Nov 13, 2014 at 11:40 AM, Marek Olšák mar...@gmail.com wrote: Nine can lower ARR into ROUND+ARL easily. Marek On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote: It looks like ARR is generated, as src/gallium/state_trackers/nine/nine_shader.c has #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \ { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h } [...] _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL), Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:43 To: Ilia Mirkin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Ilia Mirkin imir...@alum.mit.edu writes: AFAIK at least some of these (NRM, ARR, probably others) were being used by the d3d9 state tracker. Not sure what its status is, but I believe the hope was to eventually get it into the tree. They've got code for lowering NRM and CND to sanity, and no use of ARR, ARA, X2D, RFL, STR, SFL, or BRA. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/9] i915: Gen2 texturing fixes and a few random patches
On Wed, Aug 06, 2014 at 09:56:30PM +0300, ville.syrj...@linux.intel.com wrote: From: Ville Syrjälä ville.syrj...@linux.intel.com I had a few rainy days during my summer vacation so I decided to fix a chromium-bsu texturing problem that had been nagging me for a while now. I ended up fixing a few other things too that I spotted mostly from reading the code. The aniso vs. mip filter thing probably comes down to personal preference, but at least to me aniso+mip nearest looks better than trilinear. At least when playing the old classic glaxium :) I have no idea if the scissor patch makes any difference anywhere. I just caught the note in the spec and noticed we're doing it in the opposite order. The rest should be pretty clear. Ville Syrjälä (9):
i915: Fix GL_DOT3_RGBA a bit
i915: Use L8A8 instead of I8 to simulate A8 on gen2
i915: Override mip filter to nearest with aniso
i915: Accept GL_DEPTH_STENCIL GL_DEPTH_COMPONENT formats for renderbuffers
i915: Kill intel_context::hw_stencil
i915: Protect macro argument for TEXTURE_SET()
i915: Don't call _mesa_meta_glsl_Clear() on gen2
i915: Emit 3DSTATE_SCISSOR_RECTANGLE_0 before 3DSTATE_SCISSOR_ENABLE
I finally got around to pushing the reviewed patches from this series. Thanks for the reviews.
i915: Only use TEXCOORDTYPE_VECTOR with cube maps on gen2
This one is still lacking a review though, and it's actually for the original bug I set out to fix. So I'd appreciate it if someone could take a look at the patch. There's also this gen3-specific patch I did that I would like to get reviewed: http://patchwork.freedesktop.org/patch/31661/ I also have to confess to having a decent pile of more vertex-related i915 patches sitting in a branch, one of which actually makes glxgears render correctly on gen2 ;) I'd like to post those too, but I wanted to get the old stuff out of the way first... -- Ville Syrjälä Intel OTC
Re: [Mesa-dev] [PATCH] i965: Always enable VF statistics
Kenneth Graunke kenn...@whitecape.org writes: On Wednesday, November 12, 2014 06:54:31 PM Ben Widawsky wrote: Every other unit in the geometry pipeline automatically enables statistics gathering. This part of the pipe has been controlled by the DEBUG_STATS variable, but this is asymmetric. This dates back to the original implementation, and I am not sure if there is a reason for it. I need access to these stats to implement ARB_pipeline_statistics_query. Eric wrote it, and Ken touched it last. Do you have any opposition?
Cc: Eric Anholt e...@anholt.net
Cc: Kenneth Graunke kenn...@whitecape.org
Signed-off-by: Ben Widawsky b...@bwidawsk.net
---
 src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c
index 99fcddc..2c40814 100644
--- a/src/mesa/drivers/dri/i965/brw_misc_state.c
+++ b/src/mesa/drivers/dri/i965/brw_misc_state.c
@@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw)
       const uint32_t _3DSTATE_VF_STATISTICS =
          is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS;
       BEGIN_BATCH(1);
-      OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 |
-                (unlikely(INTEL_DEBUG & DEBUG_STATS) ? 1 : 0));
+      OUT_BATCH(_3DSTATE_VF_STATISTICS << 16 | 1);
       ADVANCE_BATCH();
    }
My only complaint about this patch is that it doesn't go far enough. I'm 100% for removing DEBUG_STATS completely. I've never seen any performance penalty for enabling statistics. I think we should leave them on except when there's some reason to turn them off (e.g. the brw->meta_in_progress flag in the clipper, which prevents us from counting e.g. glClear).
Presumably there's some tiny power cost. I don't know if it's a relevant amount of power, though.
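The patch makes the single-dword `_3DSTATE_VF_STATISTICS` command always carry the enable bit: the opcode is shifted into the upper 16 bits and bit 0 is the statistics enable that used to depend on `INTEL_DEBUG & DEBUG_STATS`. A small C sketch of that packing; the opcode value below is a made-up placeholder, not the real GEN4/GM45 value from the i965 headers. Since `<<` binds tighter than `|`, `opcode << 16 | 1` needs no extra parentheses:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder opcode for illustration only; NOT the real hardware value. */
#define HYPOTHETICAL_VF_STATISTICS_OPCODE 0x1234u

/* Pack the command dword: opcode in bits 31:16, statistics enable in bit 0. */
static uint32_t vf_statistics_dword(uint32_t opcode, int enable)
{
   return opcode << 16 | (enable ? 1u : 0u);
}
```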
[Mesa-dev] [PATCH 06/13 v2] gallium: Drop the unused ARA opcode.
Nothing in the tree generated it. --- The rest of the rebase to deal with the conflicts with this can be found at tgsi-opcode-nuke-2 of my Mesa tree. (CND is also left in there) src/gallium/auxiliary/gallivm/lp_bld_tgsi.c | 1 - src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c | 6 -- src/gallium/auxiliary/tgsi/tgsi_exec.c | 4 src/gallium/auxiliary/tgsi/tgsi_info.c | 2 +- src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h| 1 - src/gallium/docs/source/tgsi.rst| 8 src/gallium/drivers/ilo/shader/toy_tgsi.c | 2 -- src/gallium/drivers/r300/r300_tgsi_to_rc.c | 1 - src/gallium/drivers/r600/r600_shader.c | 6 +++--- src/gallium/include/pipe/p_shader_tokens.h | 2 +- 10 files changed, 5 insertions(+), 28 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c index 4a9ce37..44a44a6 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi.c @@ -212,7 +212,6 @@ lp_build_tgsi_inst_llvm( case TGSI_OPCODE_UP4B: case TGSI_OPCODE_UP4UB: case TGSI_OPCODE_X2D: - case TGSI_OPCODE_ARA: case TGSI_OPCODE_BRA: case TGSI_OPCODE_PUSHA: case TGSI_OPCODE_POPA: diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c index 3b9833a..ed1798d 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_aos.c @@ -798,12 +798,6 @@ lp_emit_instruction_aos( return FALSE; break; - case TGSI_OPCODE_ARA: - /* deprecated */ - assert(0); - return FALSE; - break; - case TGSI_OPCODE_ARR: src0 = lp_build_emit_fetch(bld-bld_base, inst, 0, LP_CHAN_ALL); dst0 = lp_build_round(bld-bld_base.base, src0); diff --git a/src/gallium/auxiliary/tgsi/tgsi_exec.c b/src/gallium/auxiliary/tgsi/tgsi_exec.c index b3ea82f..578d4d8 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_exec.c +++ b/src/gallium/auxiliary/tgsi/tgsi_exec.c @@ -3912,10 +3912,6 @@ exec_instruction( exec_x2d(mach, inst); break; - case TGSI_OPCODE_ARA: - assert 
(0); - break; - case TGSI_OPCODE_ARR: exec_vector_unary(mach, inst, micro_arr, TGSI_EXEC_DATA_INT, TGSI_EXEC_DATA_FLOAT); break; diff --git a/src/gallium/auxiliary/tgsi/tgsi_info.c b/src/gallium/auxiliary/tgsi/tgsi_info.c index d17426f..b94f5ac 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_info.c +++ b/src/gallium/auxiliary/tgsi/tgsi_info.c @@ -97,7 +97,7 @@ static const struct tgsi_opcode_info opcode_info[TGSI_OPCODE_LAST] = { 1, 1, 0, 0, 0, 0, COMP, UP4B, TGSI_OPCODE_UP4B }, { 1, 1, 0, 0, 0, 0, COMP, UP4UB, TGSI_OPCODE_UP4UB }, { 1, 3, 0, 0, 0, 0, COMP, X2D, TGSI_OPCODE_X2D }, - { 1, 1, 0, 0, 0, 0, COMP, ARA, TGSI_OPCODE_ARA }, + { 0, 1, 0, 0, 0, 1, NONE, , 60 }, /* removed */ { 1, 1, 0, 0, 0, 0, COMP, ARR, TGSI_OPCODE_ARR }, { 0, 1, 0, 0, 0, 0, NONE, BRA, TGSI_OPCODE_BRA }, { 0, 0, 0, 1, 0, 0, NONE, CAL, TGSI_OPCODE_CAL }, diff --git a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h index b121d32..60ecb2d 100644 --- a/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h +++ b/src/gallium/auxiliary/tgsi/tgsi_opcode_tmp.h @@ -112,7 +112,6 @@ OP11(UP2US) OP11(UP4B) OP11(UP4UB) OP13(X2D) -OP11(ARA) OP11(ARR) OP01(BRA) OP00_LBL(CAL) diff --git a/src/gallium/docs/source/tgsi.rst b/src/gallium/docs/source/tgsi.rst index c912ec5..2138b18 100644 --- a/src/gallium/docs/source/tgsi.rst +++ b/src/gallium/docs/source/tgsi.rst @@ -701,14 +701,6 @@ This instruction replicates its result. Considered for removal. -.. opcode:: ARA - Address Register Add - - TBD - -.. note:: - - Considered for removal. - .. opcode:: ARR - Address Register Load With Round .. 
math:: diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c b/src/gallium/drivers/ilo/shader/toy_tgsi.c index 1bf9f21..b71d577 100644 --- a/src/gallium/drivers/ilo/shader/toy_tgsi.c +++ b/src/gallium/drivers/ilo/shader/toy_tgsi.c @@ -854,7 +854,6 @@ static const toy_tgsi_translate aos_translate_table[TGSI_OPCODE_LAST] = { [TGSI_OPCODE_UP4B] = aos_unsupported, [TGSI_OPCODE_UP4UB]= aos_unsupported, [TGSI_OPCODE_X2D] = aos_unsupported, - [TGSI_OPCODE_ARA] = aos_unsupported, [TGSI_OPCODE_ARR] = aos_simple, [TGSI_OPCODE_BRA] = aos_unsupported, [TGSI_OPCODE_CAL] = aos_unsupported, @@ -1404,7 +1403,6 @@ static const toy_tgsi_translate soa_translate_table[TGSI_OPCODE_LAST] = { [TGSI_OPCODE_UP4B] = soa_unsupported, [TGSI_OPCODE_UP4UB]= soa_unsupported, [TGSI_OPCODE_X2D] = soa_unsupported, - [TGSI_OPCODE_ARA] = soa_unsupported, [TGSI_OPCODE_ARR] = soa_per_channel, [TGSI_OPCODE_BRA] = soa_unsupported, [TGSI_OPCODE_CAL] =
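In tgsi_info.c the ARA patch does not delete the row; it swaps it for a `/* removed */` placeholder that still carries the number 60. The apparent reason is that the opcode info table is indexed by opcode number, so dropping a row would shift every later opcode (ARR, BRA, CAL, ...) onto the wrong entry. A minimal C sketch of that idea, using a hypothetical three-entry table with designated initializers rather than Gallium's actual struct; the numbers 59 and 61 for X2D and ARR assume the table order shown around ARA:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down opcode info; only the slot layout matters. */
struct opcode_info {
   const char *mnemonic;
   unsigned opcode;
};

/* Indexed directly by opcode number. The removed opcode keeps a placeholder
 * slot so that later opcodes stay at their original indices. */
static const struct opcode_info info_table[] = {
   [59] = { "X2D", 59 },
   [60] = { "",    60 },   /* removed (was ARA) */
   [61] = { "ARR", 61 },
};

#define NUM_OPCODES (sizeof info_table / sizeof info_table[0])

/* Look up a mnemonic; NULL past the end of the table or for unfilled slots. */
static const char *opcode_name(unsigned op)
{
   return op < NUM_OPCODES ? info_table[op].mnemonic : NULL;
}
```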
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
It looks like ARR is mildly useful though, as hw often can implement it natively and it benefits at least one state tracker (not that an optimizing backend couldn't recognize round+arl, but llvmpipe wouldn't, at least right now). So, maybe it would be better to keep it for now. Roland On 13.11.2014 at 17:40, Marek Olšák wrote: Nine can lower ARR into ROUND+ARL easily. Marek On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote: It looks like ARR is generated, as src/gallium/state_trackers/nine/nine_shader.c has #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \ { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h } [...] _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL), Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:43 To: Ilia Mirkin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Ilia Mirkin imir...@alum.mit.edu writes: AFAIK at least some of these (NRM, ARR, probably others) were being used by the d3d9 state tracker. Not sure what its status is, but I believe the hope was to eventually get it into the tree. They've got code for lowering NRM and CND to sanity, and no use of ARR, ARA, X2D, RFL, STR, SFL, or BRA.
Re: [Mesa-dev] [PATCH 3/4] i965/vec4: Combine all the math emitters.
On Thursday, November 13, 2014 03:09:22 PM Francisco Jerez wrote: Kenneth Graunke kenn...@whitecape.org writes: On Wednesday, November 12, 2014 09:57:30 PM Matt Turner wrote: On Wed, Nov 12, 2014 at 9:35 PM, Kenneth Graunke kenn...@whitecape.org wrote: +vec4_visitor::emit_math(enum opcode opcode, + dst_reg dst, src_reg src0, src_reg src1) I think you can make the arguments const references too? Yeah. I've changed the prototype to: void emit_math(enum opcode opcode, const dst_reg &dst, const src_reg &src0, const src_reg &src1 = src_reg()); It also meant changing the first few lines to: vec4_instruction *math = emit(opcode, dst, fix_math_operand(src0), fix_math_operand(src1)) since src0 = fix_math_operand(src0) doesn't work with const src_reg &. + if (brw->gen == 6 && dst.writemask != WRITEMASK_XYZW) { + /* MATH on Gen6 must be align1, so we can't do writemasks. */ + math->dst = dst_reg(this, glsl_type::vec4_type); + math->dst.type = dst.type; + math->dst.writemask = WRITEMASK_XYZW; I don't think you need to set the writemask (XYZW is the default). I do, actually - it's guaranteed to not be XYZW at this point. The caller passed us a destination register with some writemask set. We create the math instruction using dst, so it inherits that writemask. This block executes when dst.writemask != WRITEMASK_XYZW. The point is to override it back to XYZW, since it isn't. Are you sure? You are assigning a newly created dst_reg() to math->dst, so it should have the default writemask for a vec4, which is XYZW already. With that fixed and the change you mention above this patch is: Reviewed-by: Francisco Jerez curroje...@riseup.net I had a very similar change in my tree, but you beat me to it ;). You're both right, of course. I've dropped the XYZW setting.
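The Gen6 path discussed in this thread works around MATH not supporting writemasks (it must be align1): the math result goes to a fresh full-width XYZW temporary, and a following MOV applies the caller's real writemask. A scalar C model of that two-step emission, as a sketch: the writemask bit values assume Mesa's X=1, Y=2, Z=4, W=8 convention, and `math_rcp` is a hypothetical stand-in for whatever MATH function is being emitted:

```c
#include <assert.h>

/* Writemask bits, assuming Mesa's X=1, Y=2, Z=4, W=8 layout. */
#define WRITEMASK_X    0x1u
#define WRITEMASK_Y    0x2u
#define WRITEMASK_Z    0x4u
#define WRITEMASK_W    0x8u
#define WRITEMASK_XYZW 0xfu

/* Hypothetical MATH operation (reciprocal) that always writes all 4 channels. */
static void math_rcp(float tmp[4], const float src[4])
{
   for (int c = 0; c < 4; c++)
      tmp[c] = 1.0f / src[c];
}

/* MOV honouring a writemask: disabled channels keep their old value. */
static void mov_masked(float dst[4], const float src[4], unsigned writemask)
{
   for (int c = 0; c < 4; c++)
      if (writemask & (1u << c))
         dst[c] = src[c];
}

/* Gen6-style emission: MATH into a full XYZW temporary, then a MOV with the
 * caller's writemask. With a full XYZW mask the MOV copies every channel. */
static void emit_math_rcp_gen6(float dst[4], unsigned writemask,
                               const float src[4])
{
   float tmp[4];
   math_rcp(tmp, src);
   mov_masked(dst, tmp, writemask);
}
```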
Re: [Mesa-dev] [PATCH] i965: Always enable VF statistics
On Thu, Nov 13, 2014 at 09:47:10AM -0800, Eric Anholt wrote: Kenneth Graunke kenn...@whitecape.org writes: On Wednesday, November 12, 2014 06:54:31 PM Ben Widawsky wrote: Every other unit in the geometry pipeline automatically enables statistics gathering. This part of the pipe has been controlled by the DEBUG_STATS variable, but this is asymmetric. This dates back to the original implementation, and I am not sure if there is a reason for it. I need access to these stats to implement ARB_pipeline_statistics_query. Eric wrote it, and Ken touched it last. Do you have any opposition? Cc: Eric Anholt e...@anholt.net Cc: Kenneth Graunke kenn...@whitecape.org Signed-off-by: Ben Widawsky b...@bwidawsk.net --- src/mesa/drivers/dri/i965/brw_misc_state.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_misc_state.c b/src/mesa/drivers/dri/i965/brw_misc_state.c index 99fcddc..2c40814 100644 --- a/src/mesa/drivers/dri/i965/brw_misc_state.c +++ b/src/mesa/drivers/dri/i965/brw_misc_state.c @@ -929,8 +929,7 @@ brw_upload_invariant_state(struct brw_context *brw) const uint32_t _3DSTATE_VF_STATISTICS = is_965 ? GEN4_3DSTATE_VF_STATISTICS : GM45_3DSTATE_VF_STATISTICS; BEGIN_BATCH(1); - OUT_BATCH(_3DSTATE_VF_STATISTICS 16 | - (unlikely(INTEL_DEBUG DEBUG_STATS) ? 1 : 0)); + OUT_BATCH(_3DSTATE_VF_STATISTICS 16 | 1); ADVANCE_BATCH(); } My only complaint about this patch is that it doesn't go far enough. I'm 100% for removing DEBUG_STATS completely. I've never seen any performance penalty for enabling statistics. I think we should leave them on except when there's some reason to turn them off (i.e. brw-meta_in_progress flag in the clipper, which prevents us from counting i.e. glClear). Presumably there's some tiny power cost. I don't know if it's a relevat amount of power, though. It would totally be blown away by the PS stats I assume. So not taking that as a NAK. 
BTW, also: bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86145 -- Ben Widawsky, Intel Open Source Technology Center
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
Initially TGSI was a union of all possible opcodes (NV/ARB fp/vp, Mesa IR, D3D Shader Model 1.x and 2.x, and more recently D3D10). But in practice that is too much of a hassle, and many of the opcodes were never handled or generated; many received little to no testing. Particularly when implementing drivers for modern hardware that has no opcodes matching the quirky Shader Model 1.x and 2.x semantics, they are distractions. Furthermore, the apps that used to generate them are simple by today's standards and run fine on fast modern hardware. With a smaller set of opcodes, each one can be tested more easily, so one can have more confidence that they actually work as intended, and developing analysis/optimization/transformation passes becomes easier too. But I have no definite answer on which opcodes should or should not be in TGSI. D3D10/11 assembly is not a bad reference, but it has some omissions. It's a matter of deciding case by case. Jose From: ibmir...@gmail.com ibmir...@gmail.com on behalf of Ilia Mirkin imir...@alum.mit.edu Sent: 13 November 2014 17:13 To: Marek Olšák Cc: Jose Fonseca; Eric Anholt; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) As long as we have NAND, pretty much anything can be lowered to that... I am, of course, not advocating keeping around every insane instruction, but it does seem a bit arbitrary as to which ones we have and which ones we don't... I am personally guilty of adding a bunch, and it was never clear to me how much should be left to the backend optimizer to un-lower and how much should be done as separate instructions. My take was that as long there was a state tracker providing it as input, it made sense to keep the instruction. But perhaps there's a different policy that'd work better. Cheers, -ilia On Thu, Nov 13, 2014 at 11:40 AM, Marek Olšák mar...@gmail.com wrote: Nine can lower ARR into ROUND+ARL easily.
Marek On Thu, Nov 13, 2014 at 3:33 PM, Jose Fonseca jfons...@vmware.com wrote: It looks like ARR is generated, as src/gallium/state_trackers/nine/nine_shader.c has #define _OPI(o,t,vv1,vv2,pv1,pv2,d,s,h) \ { D3DSIO_##o, TGSI_OPCODE_##t, { vv1, vv2 }, { pv1, pv2, }, d, s, h } [...] _OPI(MOVA, ARR, V(2,0), V(3,0), V(0,0), V(0,0), 1, 1, NULL), Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:43 To: Ilia Mirkin Cc: mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Ilia Mirkin imir...@alum.mit.edu writes: AFAIK at least some of these (NRM, ARR, probably others) were being used by the d3d9 state tracker. Not sure what its status is, but I believe the hope was to eventually get it into the tree. They've got code for lowering NRM and CND to sanity, and no use of ARR, ARA, X2D, RFL, STR, SFL, or BRA.
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
I've eliminated our internal dependency on TGSI_OPCODE_CND (by replacing it with SUB+CMP), so you can commit the change to remove it as far as I'm concerned. I have mixed feelings about ARR, because the operation it does is essentially an iround(), i.e., (int)roundf(), and at least when targeting x86, we can generate better code with the combination. That said, none of D3D10, GLSL, or OpenCL C has a built-in for iround(), so to benefit from it we'd need to do pattern matching. So I'm not sure it's worth keeping this around just for that... Jose From: Jose Fonseca Sent: 13 November 2014 13:06 To: Eric Anholt; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Thanks for doing this. It has been long overdue. Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally. I'm also interested in cutting down used opcodes, so I'll try to replace their usage with something else. But until then please hold on to those two patches. The rest looks good AFAICT. Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively. They are quite convenient when translating D3D 9/10 shaders, which also have them. And if one day we need to support recursive subroutines (CUDA 4.0 appears to have them; not sure about OpenCL, but I suppose it's only a matter of time), then they'll be unavoidable, as in-lining subroutines won't work anymore. Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:18 To: mesa-dev@lists.freedesktop.org Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) This series removes a bunch of unused opcodes, mostly from TGSI. It doesn't go as far as we could possibly go -- while I welcome discussion for future patch series deleting more, I hope that discussion doesn't derail the review process for these changes. I haven't messed with the subroutine stuff, since I don't know what people are planning with that.
I also haven't messed with the pack/unpack opcodes in TGSI, since they might be useful for some of the GLSL packing stuff. Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.
[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush
https://bugs.freedesktop.org/show_bug.cgi?id=86070
Sinclair Yeh s...@vmware.com changed:
What       |Removed   |Added
Status     |ASSIGNED  |RESOLVED
Resolution |---       |WONTFIX
--- Comment #7 from Sinclair Yeh s...@vmware.com ---
This issue has been fixed by ce9a3a8997d86f3bf387f23578972acb5b16ac4ac, which is in Mesa 10.1.0 onwards. The fix is not trivial to backport to Mesa 8.0.4, so the easiest way is to wait for the next update of Ubuntu 12.04. There is a temporary workaround if waiting for the next update is not an option:
1. Download Mesa 10.1.2 from freedesktop
2. ./configure --prefix=<PATH OF YOUR CHOOSING> --with-gallium-drivers=svga --enable-xa --disable-dri3
3. make install
After the above is done, before running mplay-bin, do an export LD_LIBRARY_PATH=<PATH OF YOUR CHOOSING FROM EARLIER>/lib
Re: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR)
FWIW, OpenCL explicit conversion instructions have optional rounding mode modifiers. Roland On 13.11.2014 at 21:19, Jose Fonseca wrote: I've eliminated our internal dependency on TGSI_OPCODE_CND (by replacing SUB+CMP). So you can commit the change to remove it as far as I'm concerned. I have mixed feelings about ARR, because the operation it does is essentially an iround(), i.e., (int)roundf(), and at least when targeting x86, we can generate better code with the combination. That said neither D3D10, GLSL, or OpenCL C code has built-ins for iround(), so to be of benefit we'd need to do pattern matching. So I'm not sure if it's worth to keep this around just for that... Jose From: Jose Fonseca Sent: 13 November 2014 13:06 To: Eric Anholt; mesa-dev@lists.freedesktop.org Subject: RE: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) Thanks for doing this. It's has been long overdue. Unfortunately we are relying on TGSI_OPCODE_CND/TGSI_OPCODE_ARR internally. I'm also interested in cutting down used opcodes, so I'll try to replace their usage with something else. But until then please hold on to those two patches. The rest looks good AFAICT. Concerning subroutines, we rely on BGNSUB/ENDSUB/CAL extensively. They are quite convenient when translating D3D 9/10 shaders, which also have them. And if one day we need to support recursive subroutines (CUDA 4.0 appears to have them; not sure about OpenCL, but I suppose it's only a matter of time), then they'll be unavoidable, as in-lining subroutines won't work anymore. Jose From: mesa-dev mesa-dev-boun...@lists.freedesktop.org on behalf of Eric Anholt e...@anholt.net Sent: 13 November 2014 01:18 To: mesa-dev@lists.freedesktop.org Subject: [Mesa-dev] Removing unused opcodes (TGSI, Mesa IR) This series removes a bunch of unused opcodes, mostly from TGSI.
It doesn't go as far as we could possibly go -- while I welcome discussion for future patch series deleting more, I hope that discussion doesn't derail the review process for these changes. I haven't messed with the subroutine stuff, since I don't know what people are planning with that. I also haven't messed with the pack/unpack opcodes in TGSI, since they might be useful for some of the GLSL packing stuff. Testing status: compile-tested ilo/r600/softpipe, touch-tested softpipe.
Re: [Mesa-dev] [PATCH 2/3] mesa/main: Add sse2 streaming clamping
On Wed, 2014-11-12 at 21:47 +0200, Juha-Pekka Heikkila wrote: On 12.11.2014 19:36, Bruno Jimenez wrote: On Wed, 2014-11-12 at 14:50 +0200, Juha-Pekka Heikkila wrote: Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com --- src/mesa/Makefile.am | 8 +++ src/mesa/main/sse2_clamping.c | 138 ++ src/mesa/main/sse2_clamping.h | 49 +++ 3 files changed, 195 insertions(+) create mode 100644 src/mesa/main/sse2_clamping.c create mode 100644 src/mesa/main/sse2_clamping.h diff --git a/src/mesa/Makefile.am b/src/mesa/Makefile.am index 932db4f..43dbe87 100644 --- a/src/mesa/Makefile.am +++ b/src/mesa/Makefile.am @@ -111,6 +111,10 @@ if SSE41_SUPPORTED ARCH_LIBS += libmesa_sse41.la endif +if SSE2_SUPPORTED +ARCH_LIBS += libmesa_sse2.la +endif + MESA_ASM_FILES_FOR_ARCH = if HAVE_X86_ASM @@ -155,6 +159,10 @@ libmesa_sse41_la_SOURCES = \ main/sse_minmax.c libmesa_sse41_la_CFLAGS = $(AM_CFLAGS) -msse4.1 +libmesa_sse2_la_SOURCES = \ + main/sse2_clamping.c +libmesa_sse2_la_CFLAGS = $(AM_CFLAGS) -msse2 + pkgconfigdir = $(libdir)/pkgconfig pkgconfig_DATA = gl.pc diff --git a/src/mesa/main/sse2_clamping.c b/src/mesa/main/sse2_clamping.c new file mode 100644 index 000..66c7dc7 --- /dev/null +++ b/src/mesa/main/sse2_clamping.c @@ -0,0 +1,138 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *    Juha-Pekka Heikkila juhapekka.heikk...@gmail.com + * + */ + +#ifdef __SSE2__ +#include "main/macros.h" +#include "main/sse2_clamping.h" +#include <emmintrin.h> + +/** + * Clamp four float values to [min,max] + */ +static inline void +_mesa_clamp_float_rgba(GLfloat src[4], GLfloat result[4], const float min, + const float max) +{ + __m128 operand, minval, maxval; + + operand = _mm_loadu_ps(src); + minval = _mm_set1_ps(min); + maxval = _mm_set1_ps(max); + operand = _mm_max_ps(operand, minval); + operand = _mm_min_ps(operand, maxval); + _mm_storeu_ps(result, operand); +} + + +/* Clamp n amount float rgba pixels to [min,max] using SSE2 + */ +__attribute__((optimize("unroll-loops"))) +void +_mesa_streaming_clamp_float_rgba(const GLuint n, GLfloat rgba_src[][4], + GLfloat rgba_dst[][4], const GLfloat min, + const GLfloat max) +{ + int c, prefetch_c; + float* worker = &rgba_src[0][0]; + __m128 operand[2], minval, maxval; + + _mm_prefetch((char*) (((unsigned long)worker)|0x1f) + 65, _MM_HINT_T0); ^^^ Hi, May I ask why precisely these numbers? 0x1f, as you note below, is a typo; it should be 0x0f. 65 is the cache line length plus one, to even out the |0x1f operation. Hi, I supposed that it could be something like that, but I wasn't fully sure, thanks for the answer.
+ + minval = _mm_set1_ps(min); + maxval = _mm_set1_ps(max); + + for (c = n*4; c > 0 && (((unsigned long)worker) & 0x1f) != 0; c--, worker++) { ^ I guess that this is for alignment, but you only need to align to a 16-byte boundary, not 32. Or maybe I am missing something obvious. You are correct, 0x1f is a typo; it should be 0x0f. + operand[0] = _mm_load_ss(worker); + operand[0] = _mm_max_ss(operand[0], minval); + operand[0] = _mm_min_ss(operand[0], maxval); + _mm_store_ss(worker, operand[0]); + } + + while (c >= 8) { + _mm_prefetch((char*) worker + 64, _MM_HINT_T0); ^^^ + + for
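The review point is that aligned SSE loads and stores (`_mm_load_ps`/`_mm_store_ps`) need 16-byte alignment, so the scalar prologue should test the address against 0x0f rather than 0x1f. Below is a minimal sketch of that peel-then-vectorize structure. It assumes a flat float array clamped in place. `clamp_floats()` is a hypothetical helper, not the function from the patch, and it falls back to plain scalar code when `__SSE2__` is not defined.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

static inline float clampf(float v, float lo, float hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: clamp n floats at p, in place, to [lo, hi]. */
void clamp_floats(float *p, size_t n, float lo, float hi)
{
   /* Scalar prologue: step one float at a time until p is 16-byte
    * aligned (mask 0x0f; 0x1f would demand 32-byte alignment). */
   while (n > 0 && ((uintptr_t)p & 0x0f) != 0) {
      *p = clampf(*p, lo, hi);
      p++;
      n--;
   }
#ifdef __SSE2__
   const __m128 vlo = _mm_set1_ps(lo);
   const __m128 vhi = _mm_set1_ps(hi);
   /* Aligned body: four floats per iteration. */
   while (n >= 4) {
      __m128 v = _mm_load_ps(p);
      v = _mm_min_ps(_mm_max_ps(v, vlo), vhi);
      _mm_store_ps(p, v);
      p += 4;
      n -= 4;
   }
#endif
   /* Scalar epilogue for the remaining 0-3 floats (or all of them
    * when SSE2 is unavailable). */
   while (n > 0) {
      *p = clampf(*p, lo, hi);
      p++;
      n--;
   }
}
```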
Re: [Mesa-dev] [PATCH 4/4] i965/fs: Remove is_valid_3src().
On Wed, Nov 12, 2014 at 11:15 AM, Matt Turner matts...@gmail.com wrote: --- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 -- src/mesa/drivers/dri/i965/brw_fs.h | 1 - src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- 3 files changed, 1 insertion(+), 8 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 7003691..9196af9 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -622,12 +622,6 @@ fs_reg::is_contiguous() const return stride == 1; } -bool -fs_reg::is_valid_3src() const -{ - return file == GRF || file == UNIFORM; -} - int fs_visitor::type_size(const struct glsl_type *type) { diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 0dae800..9e1dddc 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -83,7 +83,6 @@ public: fs_reg(fs_visitor *v, const struct glsl_type *type); bool equals(const fs_reg &r) const; - bool is_valid_3src() const; bool is_contiguous() const; /** Smear a channel of the reg to all channels. */ diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index ce4d8c8..f112466 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -514,7 +514,7 @@ fs_visitor::visit(ir_expression *ir) ir->operands[operand]->fprint(stderr); fprintf(stderr, "\n"); } - assert(this->result.is_valid_3src()); + assert(this->result.file == GRF || this->result.file == UNIFORM); op[operand] = this->result; /* Matrix expression operands should have been broken down to vector -- 2.0.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev Series is Reviewed-by: Anuj Phogat anuj.pho...@gmail.com ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] r600g/compute: Stop leaking CL shader RAM/VRAM
shader->code_bo was leaked VRAM; shader->bc.bytecode and shader->binary.* were leaked system memory. Signed-off-by: Aaron Watry awa...@gmail.com --- src/gallium/drivers/r600/evergreen_compute.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 5389f96..f3ccffd 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -268,6 +268,13 @@ void evergreen_delete_compute_state(struct pipe_context *ctx, void* state) FREE(shader->kernels); shader->kernels = NULL; } +#else + pipe_resource_reference(&shader->code_bo, NULL); + FREE(shader->bc.bytecode); + FREE(shader->binary.code); + FREE(shader->binary.config); + FREE(shader->binary.global_symbol_offsets); + FREE(shader->binary.rodata); #endif #endif -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
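`pipe_resource_reference(&ptr, NULL)` is Gallium's reference-counting idiom. It drops the reference held through `ptr`, destroys the object once the count reaches zero, and leaves `ptr` holding the new value (here NULL). Below is a simplified single-threaded model of that pattern. `struct resource`, `resource_create()` and `resource_reference()` are hypothetical stand-ins, not Gallium's definitions, and they omit the atomic operations a real implementation needs.

```c
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct pipe_resource. */
struct resource {
   int refcount;
   int *destroyed_flag;   /* lets the caller observe destruction */
};

static struct resource *resource_create(int *destroyed_flag)
{
   struct resource *r = malloc(sizeof(*r));
   if (!r)
      return NULL;
   r->refcount = 1;
   r->destroyed_flag = destroyed_flag;
   *destroyed_flag = 0;
   return r;
}

/* Make *ptr refer to new_res: take a reference on new_res, drop the old
 * one, and free the old object when its count reaches zero. */
static void resource_reference(struct resource **ptr, struct resource *new_res)
{
   struct resource *old = *ptr;
   if (old == new_res)
      return;
   if (new_res)
      new_res->refcount++;
   if (old && --old->refcount == 0) {
      *old->destroyed_flag = 1;
      free(old);
   }
   *ptr = new_res;
}
```

Passing NULL as the new value is what turns the call into a release, which is why the patch uses it on `code_bo` during teardown.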
[Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure
dlopen allocates a string on dlopen failure which is retrieved via dlerror. In order to free that string, you need to retrieve and then free it. In order to keep things legit, the Windows/other util_dl_error paths allocate and then copy their error message into a buffer as well. Signed-off-by: Aaron Watry awa...@gmail.com CC: Ilia Mirkin imir...@alum.mit.edu v3: Switch comment to C-Style v2: Use strdup instead of calloc/strcpy --- src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 + src/gallium/auxiliary/util/u_dl.c | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c b/src/gallium/auxiliary/pipe-loader/pipe_loader.c index 8e79f85..7a4e0b1 100644 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c @@ -25,6 +25,8 @@ * **/ +#include <dlfcn.h> + #include "pipe_loader_priv.h" #include "util/u_inlines.h" @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev, if (lib) { return lib; } + + /* Retrieve the dlerror() str so that it can be freed properly */ + FREE(util_dl_error()); } } diff --git a/src/gallium/auxiliary/util/u_dl.c b/src/gallium/auxiliary/util/u_dl.c index aca435d..00c4d7c 100644 --- a/src/gallium/auxiliary/util/u_dl.c +++ b/src/gallium/auxiliary/util/u_dl.c @@ -87,8 +87,8 @@ util_dl_error(void) #if defined(PIPE_OS_UNIX) return dlerror(); #elif defined(PIPE_OS_WINDOWS) - return "unknown error"; + return strdup("unknown error"); #else - return "unknown error"; + return strdup("unknown error"); #endif } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] r600g/compute: Stop leaking CL kernel bytecode/resources
v3: Rebase and add #if guards v2: fix indentation Signed-off-by: Aaron Watry awa...@gmail.com --- src/gallium/drivers/r600/evergreen_compute.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 4334743..5389f96 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -252,6 +252,25 @@ void evergreen_delete_compute_state(struct pipe_context *ctx, void* state) if (!shader) return; +#if HAVE_OPENCL +#if HAVE_LLVM < 0x0306 + if (shader->kernels) { + for (int i = 0; i < shader->num_kernels; i++) { + if (shader->kernels[i].code_bo) { + pipe_resource_reference( + (struct pipe_resource**)&shader->kernels[i].code_bo, + NULL + ); + } + FREE(shader->kernels[i].bc.bytecode); + shader->kernels[i].bc.bytecode = NULL; + } + FREE(shader->kernels); + shader->kernels = NULL; + } +#endif +#endif + if (shader->ctx){ struct pipe_framebuffer_state *fb_state = &shader->ctx->framebuffer.state; for (int i = fb_state->nr_cbufs - 1; fb_state->nr_cbufs > 0 ; i--){ -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] r600g/compute: Don't leak cbufs in compute state
Walk the array of cbufs backwards and free all of them. v3: Rebase on top of changes since Aug 2014 Signed-off-by: Aaron Watry awa...@gmail.com --- src/gallium/drivers/r600/evergreen_compute.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 90fdd79..4334743 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -252,6 +252,15 @@ void evergreen_delete_compute_state(struct pipe_context *ctx, void* state) if (!shader) return; + if (shader->ctx){ + struct pipe_framebuffer_state *fb_state = &shader->ctx->framebuffer.state; + for (int i = fb_state->nr_cbufs - 1; fb_state->nr_cbufs > 0 ; i--){ + shader->ctx->b.b.surface_destroy(ctx, fb_state->cbufs[i]); + fb_state->cbufs[i] = NULL; + fb_state->nr_cbufs--; + } + } + FREE(shader); } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
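The loop in this patch counts `i` down from `nr_cbufs - 1` while also decrementing `nr_cbufs`. As a result, the array and its count agree after every step. Below is a minimal model of that pattern. It uses a hypothetical surface array, framebuffer struct and destroy callback (not r600 code). As in the patch, the loop condition tests the count rather than the index.

```c
#include <stddef.h>

#define MAX_CBUFS 8

struct surface { int id; };

/* Hypothetical stand-in for pipe_framebuffer_state. */
struct fb_state {
   unsigned nr_cbufs;
   struct surface *cbufs[MAX_CBUFS];
};

typedef void (*surface_destroy_fn)(struct surface *s, void *user);

/* Release every bound colour buffer, last to first, keeping nr_cbufs
 * equal to the number of still-valid entries at each step. */
static void release_cbufs(struct fb_state *fb, surface_destroy_fn destroy,
                          void *user)
{
   while (fb->nr_cbufs > 0) {
      unsigned i = fb->nr_cbufs - 1;
      destroy(fb->cbufs[i], user);
      fb->cbufs[i] = NULL;
      fb->nr_cbufs--;
   }
}

/* Example callback: records the order in which surfaces are destroyed. */
struct destroy_log { int ids[MAX_CBUFS]; unsigned n; };

static void log_destroy(struct surface *s, void *user)
{
   struct destroy_log *log = user;
   log->ids[log->n++] = s->id;
}
```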
Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure
On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote: dlopen allocates a string on dlopen failure which is retrieved via dlerror. In order to free that string, you need to retrieve and then free it. Are you basically saying that glibc leaks memory and you're trying to make up for it? What if you use a non-buggy library? Or is dlopen() specified in such a way that if it fails, you must free the result of dlerror? I see nothing in the man pages to suggest that... In order to keep things legit, the Windows/other util_dl_error paths allocate and then copy their error message into a buffer as well. Signed-off-by: Aaron Watry awa...@gmail.com CC: Ilia Mirkin imir...@alum.mit.edu v3: Switch comment to C-Style v2: Use strdup instead of calloc/strcpy --- src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 + src/gallium/auxiliary/util/u_dl.c | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c b/src/gallium/auxiliary/pipe-loader/pipe_loader.c index 8e79f85..7a4e0b1 100644 --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c @@ -25,6 +25,8 @@ * **/ +#include <dlfcn.h> + #include "pipe_loader_priv.h" #include "util/u_inlines.h" @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev, if (lib) { return lib; } + + /* Retrieve the dlerror() str so that it can be freed properly */ + FREE(util_dl_error()); } } diff --git a/src/gallium/auxiliary/util/u_dl.c b/src/gallium/auxiliary/util/u_dl.c index aca435d..00c4d7c 100644 --- a/src/gallium/auxiliary/util/u_dl.c +++ b/src/gallium/auxiliary/util/u_dl.c @@ -87,8 +87,8 @@ util_dl_error(void) #if defined(PIPE_OS_UNIX) return dlerror(); #elif defined(PIPE_OS_WINDOWS) - return "unknown error"; + return strdup("unknown error"); #else - return "unknown error"; + return strdup("unknown error"); #endif } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org 
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
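Ilia's question turns on who owns the `dlerror()` string. POSIX says the application must not modify it and that a later `dlerror()` call may overwrite or invalidate it. The specification does not describe it as memory the caller must free. Below is a minimal sketch of the conventional way to keep such a message: copy it with `strdup()` and free only the copy. `try_dlopen()` is a hypothetical helper, not part of pipe-loader or u_dl.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: try to open `path`. On success, store the handle
 * and return NULL. On failure, store NULL and return a heap copy of the
 * error message that the caller owns and must free(). */
char *try_dlopen(const char *path, void **handle_out)
{
   *handle_out = dlopen(path, RTLD_NOW | RTLD_LOCAL);
   if (*handle_out)
      return NULL;

   /* dlerror()'s string belongs to the implementation: do not modify or
    * free it, and copy it if it must outlive the next dl* call. */
   const char *msg = dlerror();
   return strdup(msg ? msg : "unknown error");
}
```

On glibc releases older than 2.34, programs using `dlopen()` also need `-ldl` at link time.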
[Mesa-dev] [Bug 86070] Host application crash on vmware fusion 7 in vmw_swc_flush
https://bugs.freedesktop.org/show_bug.cgi?id=86070 --- Comment #8 from Nicholas Yue yue.nicho...@gmail.com --- (In reply to Sinclair Yeh from comment #7) This issue has been fixed by ce9a3a8997d86f3bf387f23578972acb5b16ac4ac, which is in Mesa 10.1.0 onwards. The fix is not trivial to backport to Mesa 8.0.4, so the easiest way is to wait until the next update of Ubuntu 12.04. There is a temporary workaround if waiting for the next update is not an option. 1. Download Mesa 10.1.2 from freedesktop 2. ./configure --prefix=<PATH OF YOUR CHOOSING> --with-gallium-drivers=svga --enable-xa --disable-dri3 3. make install After the above is done, before running mplay-bin, do an export LD_LIBRARY_PATH=<PATH OF YOUR CHOOSING FROM EARLIER>/lib Thanks Sinclair, I have built Mesa 10.1.2 as a temporary solution and the Houdini tools are now running fine on VMware Fusion 7. Cheers -- You are receiving this mail because: You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 08/16] i965: Consolidate code to get struct brw_sampler_prog_key_data
This chunk of code is repeated in a few places, and we're going to add a MESA_SHADER_VERTEX case to it soon. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 37 1 file changed, 16 insertions(+), 21 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 3fc9e39..f36c474 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -1696,6 +1696,17 @@ fs_visitor::emit_texture_gen7(ir_texture_opcode op, fs_reg dst, return inst; } +static struct brw_sampler_prog_key_data * +get_tex(gl_shader_stage stage, const void *key) +{ + switch (stage) { + case MESA_SHADER_FRAGMENT: + return &((brw_wm_prog_key*) key)->tex; + default: + unreachable("unhandled shader stage"); + } +} + fs_reg fs_visitor::rescale_texcoord(fs_reg coordinate, const glsl_type *coord_type, bool is_rect, uint32_t sampler, int texunit) @@ -1703,10 +1714,7 @@ fs_visitor::rescale_texcoord(fs_reg coordinate, const glsl_type *coord_type, fs_inst *inst = NULL; bool needs_gl_clamp = true; fs_reg scale_x, scale_y; - const struct brw_sampler_prog_key_data *tex = - (stage == MESA_SHADER_FRAGMENT) ? - &((brw_wm_prog_key*) this->key)->tex : NULL; - assert(tex); + struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); /* The 965 requires the EU to do the normalization of GL rectangle * texture coordinates. We use the program parameter state @@ -1859,10 +1867,7 @@ fs_visitor::emit_texture(ir_texture_opcode op, uint32_t sampler, fs_reg sampler_reg, int texunit) { - const struct brw_sampler_prog_key_data *tex = - (stage == MESA_SHADER_FRAGMENT) ?
- &((brw_wm_prog_key*) this->key)->tex : NULL; - assert(tex); + struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); fs_inst *inst = NULL; if (op == ir_tg4) { @@ -1952,11 +1957,7 @@ fs_visitor::emit_texture(ir_texture_opcode op, void fs_visitor::visit(ir_texture *ir) { - const struct brw_sampler_prog_key_data *tex = - (stage == MESA_SHADER_FRAGMENT) ? - &((brw_wm_prog_key*) this->key)->tex : NULL; - assert(tex); - + const struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); uint32_t sampler = _mesa_get_sampler_uniform_value(ir->sampler, shader_prog, prog); @@ -2138,10 +2139,7 @@ fs_visitor::emit_gen6_gather_wa(uint8_t wa, fs_reg dst) uint32_t fs_visitor::gather_channel(int orig_chan, uint32_t sampler) { - const struct brw_sampler_prog_key_data *tex = - (stage == MESA_SHADER_FRAGMENT) ? - &((brw_wm_prog_key*) this->key)->tex : NULL; - assert(tex); + struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); int swiz = GET_SWZ(tex->swizzles[sampler], orig_chan); switch (swiz) { case SWIZZLE_X: return 0; @@ -2181,10 +2179,7 @@ fs_visitor::swizzle_result(ir_texture_opcode op, int dest_components, if (op == ir_txs || op == ir_lod || op == ir_tg4) return; - const struct brw_sampler_prog_key_data *tex = - (stage == MESA_SHADER_FRAGMENT) ? - &((brw_wm_prog_key*) this->key)->tex : NULL; - assert(tex); + struct brw_sampler_prog_key_data *tex = get_tex(stage, this->key); if (dest_components == 1) { /* Ignore DEPTH_TEXTURE_MODE swizzling. */ -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
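`get_tex()` replaces five copies of a conditional cast with a single switch over the shader stage. Adding MESA_SHADER_VERTEX later therefore means editing one function. Below is a plain-C sketch of the same shape. The stage enum and key structs are hypothetical stand-ins for gl_shader_stage, brw_wm_prog_key and brw_sampler_prog_key_data, and `abort()` plays the role of Mesa's `unreachable()`. The vertex case shows what the planned extension might look like. The sketch returns a const pointer, whereas the patch's helper casts const away.

```c
#include <stdio.h>
#include <stdlib.h>

enum stage { STAGE_VERTEX, STAGE_FRAGMENT };

struct sampler_key { unsigned swizzles[16]; };

/* Each per-stage program key embeds the shared sampler key. */
struct vs_key { int clamp_vertex_color; struct sampler_key tex; };
struct wm_key { int nr_color_regions; struct sampler_key tex; };

/* Return the sampler key embedded in a stage-specific program key. */
static const struct sampler_key *get_tex(enum stage stage, const void *key)
{
   switch (stage) {
   case STAGE_FRAGMENT:
      return &((const struct wm_key *)key)->tex;
   case STAGE_VERTEX:
      return &((const struct vs_key *)key)->tex;
   default:
      fprintf(stderr, "unhandled shader stage\n");
      abort();
   }
}
```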
[Mesa-dev] [PATCH v2 06/16] i965: Add SIMD8 URB write low-level IR instruction
This is all we need from the generator for SIMD8 vertex shaders. This opcode is just the send instruction, all the hard work will happen in the visitor using LOAD_PAYLOAD. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_defines.h | 3 +++ src/mesa/drivers/dri/i965/brw_fs.cpp | 4 src/mesa/drivers/dri/i965/brw_fs.h| 1 + src/mesa/drivers/dri/i965/brw_fs_generator.cpp| 25 +++ src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 16 ++- src/mesa/drivers/dri/i965/brw_shader.cpp | 4 6 files changed, 52 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index ab45d3d..650fdb9 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -903,6 +903,8 @@ enum opcode { SHADER_OPCODE_GEN4_SCRATCH_WRITE, SHADER_OPCODE_GEN7_SCRATCH_READ, + SHADER_OPCODE_URB_WRITE_SIMD8, + FS_OPCODE_DDX, FS_OPCODE_DDY, FS_OPCODE_PIXEL_X, @@ -1520,6 +1522,7 @@ enum brw_message_target { #define BRW_URB_OPCODE_WRITE_HWORD 0 #define BRW_URB_OPCODE_WRITE_OWORD 1 +#define GEN8_URB_OPCODE_SIMD8_WRITE 7 #define BRW_URB_SWIZZLE_NONE 0 #define BRW_URB_SWIZZLE_INTERLEAVE1 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index bd44b24..9d07857 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -509,6 +509,7 @@ fs_inst::is_send_from_grf() const case FS_OPCODE_INTERPOLATE_AT_PER_SLOT_OFFSET: case SHADER_OPCODE_UNTYPED_ATOMIC: case SHADER_OPCODE_UNTYPED_SURFACE_READ: + case SHADER_OPCODE_URB_WRITE_SIMD8: return true; case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD: return src[1].file == GRF; @@ -919,6 +920,8 @@ fs_inst::regs_read(fs_visitor *v, int arg) const return mlen; } else if (opcode == FS_OPCODE_FB_WRITE arg == 0) { return mlen; + } else if (opcode == SHADER_OPCODE_URB_WRITE_SIMD8 arg == 0) { + return mlen; } else if (opcode == SHADER_OPCODE_UNTYPED_ATOMIC arg == 0) { return mlen; 
} else if (opcode == SHADER_OPCODE_UNTYPED_SURFACE_READ arg == 0) { @@ -1009,6 +1012,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) return 2; case SHADER_OPCODE_UNTYPED_ATOMIC: case SHADER_OPCODE_UNTYPED_SURFACE_READ: + case SHADER_OPCODE_URB_WRITE_SIMD8: case FS_OPCODE_INTERPOLATE_AT_CENTROID: case FS_OPCODE_INTERPOLATE_AT_SAMPLE: case FS_OPCODE_INTERPOLATE_AT_SHARED_OFFSET: diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 7e99b31..457fb4b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -710,6 +710,7 @@ private: struct brw_reg implied_header, GLuint nr); void generate_fb_write(fs_inst *inst, struct brw_reg payload); + void generate_urb_write(fs_inst *inst, struct brw_reg payload); void generate_blorp_fb_write(fs_inst *inst); void generate_pixel_xy(struct brw_reg dst, bool is_x); void generate_linterp(fs_inst *inst, struct brw_reg dst, diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index fe09ad5..75ee2c7 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -236,6 +236,27 @@ fs_generator::generate_fb_write(fs_inst *inst, struct brw_reg payload) } void +fs_generator::generate_urb_write(fs_inst *inst, struct brw_reg payload) +{ + brw_inst *insn; + + insn = brw_next_insn(p, BRW_OPCODE_SEND); + + brw_set_dest(p, insn, brw_null_reg()); + brw_set_src0(p, insn, payload); + brw_set_src1(p, insn, brw_imm_d(0)); + + brw_inst_set_sfid(brw, insn, BRW_SFID_URB); + brw_inst_set_urb_opcode(brw, insn, GEN8_URB_OPCODE_SIMD8_WRITE); + + brw_inst_set_mlen(brw, insn, inst-mlen); + brw_inst_set_rlen(brw, insn, 0); + brw_inst_set_eot(brw, insn, inst-eot); + brw_inst_set_header_present(brw, insn, true); + brw_inst_set_urb_global_offset(brw, insn, inst-offset); +} + +void fs_generator::generate_blorp_fb_write(fs_inst *inst) { brw_fb_WRITE(p, @@ -1879,6 +1900,10 @@ 
fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) generate_scratch_read_gen7(inst, dst); break; + case SHADER_OPCODE_URB_WRITE_SIMD8: +generate_urb_write(inst, src[0]); +break; + case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD: generate_uniform_pull_constant_load(inst, dst, src[0], src[1]); break; diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp index 44c74a3..8165909 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp @@ -385,6
[Mesa-dev] [PATCH v2 04/16] i965: Set shader name for generator from call site
fs_generator no longer knows what stage it's generating code for, so we have to set the debug name of the shader from the call site. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 4 +++- src/mesa/drivers/dri/i965/brw_fs.cpp| 17 -- src/mesa/drivers/dri/i965/brw_fs.h | 7 +++--- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 31 +++-- 4 files changed, 35 insertions(+), 24 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp index 86ed953..f6d0b68 100644 --- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp @@ -31,8 +31,10 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw, : mem_ctx(ralloc_context(NULL)), generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key), (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data), - NULL, NULL, false, debug_flag) + NULL, NULL, false) { + if (debug_flag) + generator.enable_debug(blorp); } brw_blorp_eu_emitter::~brw_blorp_eu_emitter() diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index e417e0c..e96b375 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3743,8 +3743,21 @@ brw_wm_fs_emit(struct brw_context *brw, prog_data-no_8 = false; } - fs_generator g(brw, mem_ctx, (void *) key, prog_data-base, prog, fp-Base, - v.runtime_check_aads_emit, INTEL_DEBUG DEBUG_WM); + fs_generator g(brw, mem_ctx, (void *) key, prog_data-base, prog, + fp-Base, v.runtime_check_aads_emit); + + if (unlikely(INTEL_DEBUG DEBUG_WM)) { + char *name; + if (prog) + name = ralloc_asprintf(mem_ctx, %s fragment shader %d, +prog-Label ? 
prog-Label : unnamed, +prog-Name); + else + name = ralloc_asprintf(mem_ctx, fragment program %d, fp-Base.Id); + + g.enable_debug(name); + } + if (simd8_cfg) g.generate_code(simd8_cfg, 8); if (simd16_cfg) diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index ae21840..ad47875 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -698,10 +698,10 @@ public: struct brw_stage_prog_data *prog_data, struct gl_shader_program *shader_prog, struct gl_program *fp, -bool runtime_check_aads_emit, -bool debug_flag); +bool runtime_check_aads_emit); ~fs_generator(); + void enable_debug(const char *shader_name); int generate_code(const cfg_t *cfg, int dispatch_width); const unsigned *get_assembly(unsigned int *assembly_size); @@ -809,7 +809,8 @@ private: exec_list discard_halt_patches; bool runtime_check_aads_emit; - const bool debug_flag; + bool debug_flag; + const char *shader_name; void *mem_ctx; }; diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 9faecf6..ba9303f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -42,13 +42,12 @@ fs_generator::fs_generator(struct brw_context *brw, struct brw_stage_prog_data *prog_data, struct gl_shader_program *shader_prog, struct gl_program *prog, - bool runtime_check_aads_emit, - bool debug_flag) + bool runtime_check_aads_emit) : brw(brw), key(key), prog_data(prog_data), shader_prog(shader_prog), prog(prog), runtime_check_aads_emit(runtime_check_aads_emit), - debug_flag(debug_flag), mem_ctx(mem_ctx) + debug_flag(false), mem_ctx(mem_ctx) { ctx = brw-ctx; @@ -1508,6 +1507,13 @@ fs_generator::generate_untyped_surface_read(fs_inst *inst, struct brw_reg dst, brw_mark_surface_used(prog_data, surf_index.dw1.ud); } +void +fs_generator::enable_debug(const char *shader_name) +{ + debug_flag = true; + this-shader_name = shader_name; +} + int 
fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) { @@ -2006,21 +2012,10 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) int after_size = p-next_insn_offset - start_offset; if (unlikely(debug_flag)) { - if (shader_prog) { - fprintf(stderr, - Native code for %s fragment shader %d (SIMD%d dispatch):\n, - shader_prog-Label ? shader_prog-Label : unnamed, - shader_prog-Name, dispatch_width); - } else if (prog) { - fprintf(stderr, -
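With the debug flag removed from the constructor, the call site now formats the shader name and passes it to `enable_debug()`. Below is a plain-C sketch of that split. It uses `snprintf()` into a caller-supplied buffer in place of `ralloc_asprintf()`. `struct generator`, `make_fs_name()` and this `enable_debug()` are hypothetical stand-ins for the fs_generator members, not i965 code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for fs_generator's debug state. */
struct generator {
   bool debug_flag;
   const char *shader_name;
};

static void enable_debug(struct generator *g, const char *shader_name)
{
   g->debug_flag = true;
   g->shader_name = shader_name;
}

/* Build the name the call site passes in, mirroring the two cases in
 * brw_wm_fs_emit(): a GLSL program (whose label may be NULL) or an
 * ARB fragment program identified only by its id. */
static void make_fs_name(char *buf, size_t size, bool is_glsl,
                         const char *label, int id)
{
   if (is_glsl)
      snprintf(buf, size, "%s fragment shader %d",
               label ? label : "unnamed", id);
   else
      snprintf(buf, size, "fragment program %d", id);
}
```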
[Mesa-dev] [PATCH v2 12/16] i965: Move fs_visitor optimization pass into new method fs_visitor::optimize()
We'll reuse this toplevel optimization driver for the scalar VS. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 136 ++- src/mesa/drivers/dri/i965/brw_fs.h | 1 + 2 files changed, 72 insertions(+), 65 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 0ffb4d8..cb73b9f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3468,6 +3468,76 @@ fs_visitor::opt_drop_redundant_mov_to_flags() } } +void +fs_visitor::optimize() +{ + calculate_cfg(); + + split_virtual_grfs(); + + move_uniform_array_access_to_pull_constants(); + assign_constant_locations(); + demote_pull_constants(); + + opt_drop_redundant_mov_to_flags(); + +#define OPT(pass, args...) do { \ + pass_num++; \ + bool this_progress = pass(args); \ +\ + if (unlikely(INTEL_DEBUG DEBUG_OPTIMIZER) this_progress) { \ + char filename[64]; \ + snprintf(filename, 64, fs%d-%04d-%02d-%02d- #pass, \ + dispatch_width, shader_prog ? shader_prog-Name : 0, iteration, pass_num); \ +\ + backend_visitor::dump_instructions(filename); \ + } \ +\ + progress = progress || this_progress; \ + } while (false) + + if (unlikely(INTEL_DEBUG DEBUG_OPTIMIZER)) { + char filename[64]; + snprintf(filename, 64, fs%d-%04d-00-start, + dispatch_width, shader_prog ? 
shader_prog-Name : 0); + + backend_visitor::dump_instructions(filename); + } + + bool progress; + int iteration = 0; + do { + progress = false; + iteration++; + int pass_num = 0; + + OPT(remove_duplicate_mrf_writes); + + OPT(opt_algebraic); + OPT(opt_cse); + OPT(opt_copy_propagate); + OPT(opt_peephole_predicated_break); + OPT(dead_code_eliminate); + OPT(opt_peephole_sel); + OPT(dead_control_flow_eliminate, this); + OPT(opt_register_renaming); + OPT(opt_saturate_propagation); + OPT(register_coalesce); + OPT(compute_to_mrf); + + OPT(compact_virtual_grfs); + } while (progress); + + if (lower_load_payload()) { + split_virtual_grfs(); + register_coalesce(); + compute_to_mrf(); + dead_code_eliminate(); + } + + lower_uniform_pull_constant_loads(); +} + bool fs_visitor::run() { @@ -3535,71 +3605,7 @@ fs_visitor::run() emit_fb_writes(); - calculate_cfg(); - - split_virtual_grfs(); - - move_uniform_array_access_to_pull_constants(); - assign_constant_locations(); - demote_pull_constants(); - - opt_drop_redundant_mov_to_flags(); - -#define OPT(pass, args...) do {\ - pass_num++; \ - bool this_progress = pass(args); \ - \ - if (unlikely(INTEL_DEBUG DEBUG_OPTIMIZER) this_progress) { \ - char filename[64];\ - snprintf(filename, 64, fs%d-%04d-%02d-%02d- #pass, \ - dispatch_width, shader_prog ? shader_prog-Name : 0, iteration, pass_num); \ - \ - backend_visitor::dump_instructions(filename); \ - }\ - \ - progress = progress || this_progress;\ - } while (false) - - if (unlikely(INTEL_DEBUG DEBUG_OPTIMIZER)) { - char filename[64]; - snprintf(filename, 64, fs%d-%04d-00-start, - dispatch_width, shader_prog ? 
shader_prog-Name : 0); - - backend_visitor::dump_instructions(filename); - } - - bool progress; - int iteration = 0; - do { -progress = false; - iteration++; - int pass_num = 0; - - OPT(remove_duplicate_mrf_writes); - - OPT(opt_algebraic); - OPT(opt_cse); - OPT(opt_copy_propagate); - OPT(opt_peephole_predicated_break); - OPT(dead_code_eliminate); - OPT(opt_peephole_sel); - OPT(dead_control_flow_eliminate, this); -
[Mesa-dev] [PATCH v2 13/16] i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()
This will be reused for the scalar VS pass. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 132 +++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + 2 files changed, 71 insertions(+), 62 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index cb73b9f..4dce0a2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3538,11 +3538,79 @@ fs_visitor::optimize() lower_uniform_pull_constant_loads(); } +void +fs_visitor::allocate_registers() +{ + bool allocated_without_spills; + + static enum instruction_scheduler_mode pre_modes[] = { + SCHEDULE_PRE, + SCHEDULE_PRE_NON_LIFO, + SCHEDULE_PRE_LIFO, + }; + + /* Try each scheduling heuristic to see if it can successfully register +* allocate without spilling. They should be ordered by decreasing +* performance but increasing likelihood of allocating. +*/ + for (unsigned i = 0; i ARRAY_SIZE(pre_modes); i++) { + schedule_instructions(pre_modes[i]); + + if (0) { + assign_regs_trivial(); + allocated_without_spills = true; + } else { + allocated_without_spills = assign_regs(false); + } + if (allocated_without_spills) + break; + } + + if (!allocated_without_spills) { + /* We assume that any spilling is worse than just dropping back to + * SIMD8. There's probably actually some intermediate point where + * SIMD16 with a couple of spills is still better. + */ + if (dispatch_width == 16) { + fail(Failure to register allocate. Reduce number of + live scalar values to avoid this.); + } else { + perf_debug(Fragment shader triggered register spilling. +Try reducing the number of live scalar values to +improve performance.\n); + } + + /* Since we're out of heuristics, just go spill registers until we + * get an allocation. 
+       */
+      while (!assign_regs(true)) {
+         if (failed)
+            break;
+      }
+   }
+
+   assert(force_uncompressed_stack == 0);
+
+   /* This must come after all optimization and register allocation, since
+    * it inserts dead code that happens to have side effects, and it does
+    * so based on the actual physical registers in use.
+    */
+   insert_gen4_send_dependency_workarounds();
+
+   if (failed)
+      return;
+
+   if (!allocated_without_spills)
+      schedule_instructions(SCHEDULE_POST);
+
+   if (last_scratch > 0)
+      prog_data->total_scratch = brw_get_scratch_size(last_scratch);
+}
+
 bool
 fs_visitor::run()
 {
    sanity_param_count = prog->Parameters->NumParameters;
-   bool allocated_without_spills;
 
    assign_binding_table_offsets();
 
@@ -3555,7 +3623,6 @@ fs_visitor::run()
       emit_dummy_fs();
    } else if (brw->use_rep_send && dispatch_width == 16) {
       emit_repclear_shader();
-      allocated_without_spills = true;
    } else {
       if (INTEL_DEBUG & DEBUG_SHADER_TIME)
          emit_shader_time_begin();
@@ -3610,68 +3677,9 @@ fs_visitor::run()
       assign_curb_setup();
       assign_urb_setup();
 
-      static enum instruction_scheduler_mode pre_modes[] = {
-         SCHEDULE_PRE,
-         SCHEDULE_PRE_NON_LIFO,
-         SCHEDULE_PRE_LIFO,
-      };
-
-      /* Try each scheduling heuristic to see if it can successfully register
-       * allocate without spilling. They should be ordered by decreasing
-       * performance but increasing likelihood of allocating.
-       */
-      for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) {
-         schedule_instructions(pre_modes[i]);
-
-         if (0) {
-            assign_regs_trivial();
-            allocated_without_spills = true;
-         } else {
-            allocated_without_spills = assign_regs(false);
-         }
-         if (allocated_without_spills)
-            break;
-      }
-
-      if (!allocated_without_spills) {
-         /* We assume that any spilling is worse than just dropping back to
-          * SIMD8. There's probably actually some intermediate point where
-          * SIMD16 with a couple of spills is still better.
-          */
-         if (dispatch_width == 16) {
-            fail("Failure to register allocate. Reduce number of "
-                 "live scalar values to avoid this.");
-         } else {
-            perf_debug("Fragment shader triggered register spilling. "
-                       "Try reducing the number of live scalar values to "
-                       "improve performance.\n");
-         }
-
-         /* Since we're out of heuristics, just go spill registers until we
-          * get an allocation.
-          */
-         while (!assign_regs(true)) {
-            if (failed)
-               break;
-         }
-      }
-
-      assert(force_uncompressed_stack == 0);
-
-      /* This must come after all optimization and
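Editor's note: allocate_registers() tries each pre-scheduling heuristic in order, stops at the first one that allocates without spilling, and only then falls back. SIMD16 gives up with fail() and SIMD8 spills. The toy model below reproduces just that control flow under a simplifying assumption: each heuristic is reduced to the peak register pressure it produces, and allocation succeeds when that pressure fits a register budget. `AllocResult`, `allocate()` and the pressure numbers are hypothetical; the real assign_regs() does graph-coloring allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct AllocResult {
   int heuristic;  // index of the heuristic that allocated without spills, or -1
   bool failed;    // true if a SIMD16 compile was abandoned instead of spilling
   int spills;     // registers spilled on the fallback path (toy measure)
};

// pressures[i]: peak register pressure after pre-scheduling with heuristic i,
// ordered like pre_modes[] (best performance first, most likely to fit last).
AllocResult allocate(const std::vector<int> &pressures, int num_regs,
                     int dispatch_width) {
   assert(!pressures.empty());

   // Try each heuristic and stop at the first one that fits without spilling.
   for (std::size_t i = 0; i < pressures.size(); i++) {
      if (pressures[i] <= num_regs)
         return {static_cast<int>(i), false, 0};
   }

   // Spilling is assumed to be worse than dropping back to SIMD8.
   if (dispatch_width == 16)
      return {-1, true, 0};

   // SIMD8: spill from the last schedule tried until the remainder fits.
   return {-1, false, pressures.back() - num_regs};
}
```

Keeping the heuristics in a single ordered array, as the patch does, means that adding or reordering a scheduling mode requires no change to the fallback logic.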
[Mesa-dev] [PATCH v2 00/16] Scalar VS for BDW+
Hi,

Here's v2 of the patch series. It incorporates Matt's review comments and adds a new patch that refactors the way we call fs_generator. The idea is to get rid of the MESA_SHADER_FRAGMENT assertion in generate_assembly() in a nicer way: we now call generate_code() twice, once per dispatch width, and each call returns the offset in the assembly where it placed the generated code.

Kristian

Kristian Høgsberg (16):
  i965: Don't copy propagate sat MOVs into LOAD_PAYLOAD
  i965: Refactor fs_generator API
  i965: Generalize fs_generator further
  i965: Set shader name for generator from call site
  i965: Remove shader program argument and member from fs_generator
  i965: Add SIMD8 URB write low-level IR instruction
  i965: Add new SIMD8 VS prog data flag
  i965: Consolidate code to get struct brw_sampler_prog_key_data
  i965: Prepare for using the ATTR register file in the fs backend
  i965: Rename brw_vec4_prog_data to brw_vue_prog_data
  i965: Move more code into codegen-branch of the fs_visitor::run() if statement
  i965: Move fs_visitor optimization pass into new method fs_visitor::optimize()
  i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()
  i965: Add fs_visitor::run_vs() to generate scalar vertex shader code
  i965: Clean up fs_visitor::run and rename to run_fs
  i965: Generate vs code using scalar backend for BDW+

 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp    |  13 +-
 src/mesa/drivers/dri/i965/brw_context.c            |  13 +
 src/mesa/drivers/dri/i965/brw_context.h            |  23 +-
 src/mesa/drivers/dri/i965/brw_defines.h            |   5 +
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 436 +
 src/mesa/drivers/dri/i965/brw_fs.h                 |  51 ++-
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |   6 +-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 121 +++---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  16 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 351 +++--
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c   |   2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp           |  21 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp             |  66 +++-
 src/mesa/drivers/dri/i965/brw_vec4.h               |  18 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   |   4 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs.c            |   4 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp  |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h    |   2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp     |   4 +-
 src/mesa/drivers/dri/i965/brw_vs.c                 |  10 +-
 src/mesa/drivers/dri/i965/brw_vs.h                 |   2 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c   |  10 +-
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c   |   7 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c          |   2 +-
 src/mesa/drivers/dri/i965/gen7_gs_state.c          |   2 +-
 src/mesa/drivers/dri/i965/gen8_gs_state.c          |   2 +-
 src/mesa/drivers/dri/i965/gen8_vs_state.c          |   4 +-
 src/mesa/drivers/dri/i965/intel_debug.c            |   1 +
 src/mesa/drivers/dri/i965/intel_debug.h            |   1 +
 29 files changed, 869 insertions(+), 330 deletions(-)

-- 
2.1.0
___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 15/16] i965: Clean up fs_visitor::run and rename to run_fs
Now that fs_visitor::run is back to being only fragment shader compilation, we can clean up a few stage == MESA_SHADER_FRAGMENT conditions and rename it to run_fs.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 31 +--
 src/mesa/drivers/dri/i965/brw_fs.h   |  2 +-
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8007977..9bd57c9 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3699,8 +3699,12 @@ fs_visitor::run_vs()
 }
 
 bool
-fs_visitor::run()
+fs_visitor::run_fs()
 {
+   brw_wm_prog_data *wm_prog_data = (brw_wm_prog_data *) this->prog_data;
+   brw_wm_prog_key *wm_key = (brw_wm_prog_key *) this->key;
+   assert(stage == MESA_SHADER_FRAGMENT);
+
    sanity_param_count = prog->Parameters->NumParameters;
 
    assign_binding_table_offsets();
@@ -3729,13 +3733,7 @@ fs_visitor::run()
       /* We handle discards by keeping track of the still-live pixels in f0.1.
        * Initialize it with the dispatched pixels.
        */
-      bool uses_kill =
-         (stage == MESA_SHADER_FRAGMENT) &&
-         ((brw_wm_prog_data*) this->prog_data)->uses_kill;
-      bool alpha_test_func =
-         (stage == MESA_SHADER_FRAGMENT) &&
-         ((brw_wm_prog_key*) this->key)->alpha_test_func;
-      if (uses_kill || alpha_test_func) {
+      if (wm_prog_data->uses_kill || wm_key->alpha_test_func) {
          fs_inst *discard_init = emit(FS_OPCODE_MOV_DISPATCH_TO_FLAGS);
          discard_init->flag_subreg = 1;
       }
@@ -3758,7 +3756,7 @@ fs_visitor::run()
 
       emit(FS_OPCODE_PLACEHOLDER_HALT);
 
-      if (alpha_test_func)
+      if (wm_key->alpha_test_func)
          emit_alpha_test();
 
       emit_fb_writes();
@@ -3773,13 +3771,10 @@ fs_visitor::run()
       return false;
    }
 
-   if (stage == MESA_SHADER_FRAGMENT) {
-      brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
-      if (dispatch_width == 8)
-         prog_data->reg_blocks = brw_register_blocks(grf_used);
-      else
-         prog_data->reg_blocks_16 = brw_register_blocks(grf_used);
-   }
+   if (dispatch_width == 8)
+      wm_prog_data->reg_blocks = brw_register_blocks(grf_used);
+   else
+      wm_prog_data->reg_blocks_16 = brw_register_blocks(grf_used);
 
    /* If any state parameters were appended, then ParameterValues could have
     * been realloced, in which case the driver uniform storage set up by
@@ -3819,7 +3814,7 @@ brw_wm_fs_emit(struct brw_context *brw,
 
    /* Now the main event: Visit the shader IR and generate our FS IR for it.
     */
    fs_visitor v(brw, mem_ctx, key, prog_data, prog, fp, 8);
-   if (!v.run()) {
+   if (!v.run_fs()) {
       if (prog) {
          prog->LinkStatus = false;
          ralloc_strcat(&prog->InfoLog, v.fail_msg);
@@ -3838,7 +3833,7 @@ brw_wm_fs_emit(struct brw_context *brw,
    if (!v.simd16_unsupported) {
       /* Try a SIMD16 compile */
       v2.import_uniforms(&v);
-      if (!v2.run()) {
+      if (!v2.run_fs()) {
          perf_debug("SIMD16 shader failed to compile, falling back to SIMD8 at a 10-20%% performance cost: %s", v2.fail_msg);
       } else {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 6888cdd..b83ea87 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -416,7 +416,7 @@ public:
                                  const fs_reg &varying_offset,
                                  uint32_t const_offset);
 
-   bool run();
+   bool run_fs();
    bool run_vs();
    void optimize();
    void allocate_registers();
-- 
2.1.0
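Editor's note: run_fs() can cast `this->prog_data` and `this->key` to their fragment-shader types once it has asserted the stage. The cast of `prog_data` is valid because the stage-specific prog_data structs begin with a generic `struct brw_stage_prog_data base;` member. Patch 03 relies on the same layout when it passes `&prog_data->base` to the generator. The sketch below illustrates the idiom with hypothetical struct names and fields, not Mesa's definitions. In C++ the cast is well defined because both structs are standard-layout and `base` is the first member.

```cpp
#include <cassert>

enum toy_stage { TOY_VERTEX, TOY_FRAGMENT };

// Hypothetical stand-in for brw_stage_prog_data: what generic code sees.
struct stage_prog_data {
   toy_stage stage;
   unsigned total_scratch;
};

// Hypothetical stand-in for brw_wm_prog_data.
struct wm_prog_data {
   stage_prog_data base;   // must stay the first member for the cast below
   bool uses_kill;
   unsigned reg_blocks;
};

// Generic code only needs the embedded base.
unsigned scratch_of(const stage_prog_data *d) { return d->total_scratch; }

// Stage-specific code checks the stage once, then downcasts.
bool fragment_uses_kill(stage_prog_data *d) {
   assert(d->stage == TOY_FRAGMENT);
   wm_prog_data *wm = reinterpret_cast<wm_prog_data *>(d);
   return wm->uses_kill;
}
```

Checking the stage once at the top, as run_fs() now does, replaces the repeated `stage == MESA_SHADER_FRAGMENT && ...` guards that the patch removes.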
[Mesa-dev] [PATCH v2 05/16] i965: Remove shader program argument and member from fs_generator
Now that the caller passes in the shader debug name, we don't need this anymore.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp | 2 +-
 src/mesa/drivers/dri/i965/brw_fs.cpp            | 2 +-
 src/mesa/drivers/dri/i965/brw_fs.h              | 2 --
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 3 +--
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index f6d0b68..83fccc2 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -31,7 +31,7 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw,
    : mem_ctx(ralloc_context(NULL)),
      generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key),
                (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data),
-               NULL, NULL, false)
+               NULL, false)
 {
    if (debug_flag)
       generator.enable_debug("blorp");
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e96b375..bd44b24 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,7 +3743,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog,
+   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base,
                   &fp->Base, v.runtime_check_aads_emit);
 
    if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index ad47875..7e99b31 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -696,7 +696,6 @@ public:
                 void *mem_ctx,
                 const void *key,
                 struct brw_stage_prog_data *prog_data,
-                struct gl_shader_program *shader_prog,
                 struct gl_program *fp,
                 bool runtime_check_aads_emit);
    ~fs_generator();
@@ -802,7 +801,6 @@ private:
 
    const void * const key;
    struct brw_stage_prog_data * const prog_data;
-   struct gl_shader_program * const shader_prog;
    const struct gl_program *prog;
 
    unsigned dispatch_width; /** 8 or 16 */
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index ba9303f..fe09ad5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -40,12 +40,11 @@ fs_generator::fs_generator(struct brw_context *brw,
                            void *mem_ctx,
                            const void *key,
                            struct brw_stage_prog_data *prog_data,
-                           struct gl_shader_program *shader_prog,
                            struct gl_program *prog,
                            bool runtime_check_aads_emit)
    : brw(brw), key(key),
-     prog_data(prog_data), shader_prog(shader_prog),
+     prog_data(prog_data),
      prog(prog), runtime_check_aads_emit(runtime_check_aads_emit),
      debug_flag(false), mem_ctx(mem_ctx)
 {
-- 
2.1.0
[Mesa-dev] [PATCH v2 02/16] i965: Refactor fs_generator API
We split SIMD8 and SIMD16 generation into separate calls to a new method, generate_code(), which returns the start offset of the generated code. A new get_assembly() method returns the generated code. This avoids asserting MESA_SHADER_FRAGMENT and accessing wm_prog_data in the generator.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  4 ++-
 src/mesa/drivers/dri/i965/brw_fs.cpp            |  9 +++---
 src/mesa/drivers/dri/i965/brw_fs.h              |  6 ++--
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 43 -
 4 files changed, 23 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 3afe0e7..7802c9f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -45,7 +45,9 @@ const unsigned *
 brw_blorp_eu_emitter::get_program(unsigned *program_size)
 {
    cfg_t cfg(insts);
-   return generator.generate_assembly(NULL, &cfg, program_size);
+   generator.generate_code(&cfg, 16);
+
+   return generator.get_assembly(program_size);
 }
 
 /**
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index aa1d8d2..e12fd77 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,11 +3743,12 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   const unsigned *assembly = NULL;
    fs_generator g(brw, mem_ctx, key, prog_data, prog, fp,
                   v.runtime_check_aads_emit, INTEL_DEBUG & DEBUG_WM);
-   assembly = g.generate_assembly(simd8_cfg, simd16_cfg,
-                                  final_assembly_size);
+   if (simd8_cfg)
+      g.generate_code(simd8_cfg, 8);
+   if (simd16_cfg)
+      prog_data->prog_offset_16 = g.generate_code(simd16_cfg, 16);
 
    if (unlikely(brw->perf_debug) && shader) {
       if (shader->compiled_once)
@@ -3760,7 +3761,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       }
    }
 
-   return assembly;
+   return g.get_assembly(final_assembly_size);
 }
 
 bool
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 67956bc..5c21dd0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -702,12 +702,10 @@ public:
                 bool debug_flag);
    ~fs_generator();
 
-   const unsigned *generate_assembly(const cfg_t *simd8_cfg,
-                                     const cfg_t *simd16_cfg,
-                                     unsigned *assembly_size);
+   int generate_code(const cfg_t *cfg, int dispatch_width);
+   const unsigned *get_assembly(unsigned int *assembly_size);
 
 private:
-   void generate_code(const cfg_t *cfg);
    void fire_fb_write(fs_inst *inst,
                       struct brw_reg payload,
                       struct brw_reg implied_header,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index c95beb6..0622b07 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1512,9 +1512,17 @@ fs_generator::generate_untyped_surface_read(fs_inst *inst, struct brw_reg dst,
    brw_mark_surface_used(prog_data, surf_index.dw1.ud);
 }
 
-void
-fs_generator::generate_code(const cfg_t *cfg)
+int
+fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 {
+   /* align to 64 byte boundary. */
+   while (p->next_insn_offset % 64)
+      brw_NOP(p);
+
+   this->dispatch_width = dispatch_width;
+   if (dispatch_width == 16)
+      brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+
    int start_offset = p->next_insn_offset;
 
    int loop_count = 0;
@@ -2024,37 +2032,12 @@ fs_generator::generate_code(const cfg_t *cfg)
       dump_assembly(p->store, annotation.ann_count, annotation.ann, brw, prog);
       ralloc_free(annotation.ann);
    }
+
+   return start_offset;
 }
 
 const unsigned *
-fs_generator::generate_assembly(const cfg_t *simd8_cfg,
-                                const cfg_t *simd16_cfg,
-                                unsigned *assembly_size)
+fs_generator::get_assembly(unsigned int *assembly_size)
 {
-   assert(simd8_cfg || simd16_cfg);
-
-   if (simd8_cfg) {
-      dispatch_width = 8;
-      generate_code(simd8_cfg);
-   }
-
-   if (simd16_cfg) {
-      /* align to 64 byte boundary. */
-      while (p->next_insn_offset % 64) {
-         brw_NOP(p);
-      }
-
-      assert(stage == MESA_SHADER_FRAGMENT);
-      brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
-
-      /* Save off the start of this SIMD16 program */
-      prog_data->prog_offset_16 = p->next_insn_offset;
-
-      brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
-
-      dispatch_width = 16;
-      generate_code(simd16_cfg);
-   }
-
    return brw_get_program(p, assembly_size);
 }
-- 
2.1.0
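Editor's note: under the new API, generate_code() pads the instruction store with NOPs up to a 64-byte boundary, records the start offset, emits the program, and returns that offset. The caller can therefore store the SIMD16 entry point in `prog_offset_16` without the generator knowing anything about wm_prog_data. The sketch below uses a hypothetical `ToyGenerator` that makes several simplifying assumptions: the store is a byte vector, every instruction (including the NOP) is 16 bytes, and the NOP encoding is made up. Compression control and real instruction encoding are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class ToyGenerator {
public:
   // Emit num_insns instructions for one dispatch width and return the byte
   // offset at which they start, like the new generate_code().
   unsigned generate_code(unsigned num_insns, unsigned dispatch_width) {
      // Align to a 64-byte boundary by padding with NOPs.
      while (store_.size() % 64)
         emit(kToyNop);

      const unsigned start_offset = static_cast<unsigned>(store_.size());
      for (unsigned i = 0; i < num_insns; i++)
         emit(static_cast<uint8_t>(dispatch_width));  // toy "opcode" byte
      return start_offset;
   }

   // Return the whole buffer: the SIMD8 and SIMD16 programs back to back.
   const std::vector<uint8_t> &get_assembly(unsigned *assembly_size) const {
      *assembly_size = static_cast<unsigned>(store_.size());
      return store_;
   }

private:
   static const unsigned kInsnSize = 16;  // bytes per uncompacted instruction
   static const uint8_t kToyNop = 0xff;   // made-up NOP encoding

   void emit(uint8_t opcode) {
      store_.push_back(opcode);
      for (unsigned i = 1; i < kInsnSize; i++)
         store_.push_back(0);
   }

   std::vector<uint8_t> store_;
};
```

With five SIMD8 instructions (80 bytes), the SIMD16 call first pads to 128 bytes and returns 128 as its start offset. That is the value brw_wm_fs_emit() now stores in `prog_offset_16`.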
[Mesa-dev] [PATCH v2 01/16] i965: Don't copy propagate sat MOVs into LOAD_PAYLOAD
The LOAD_PAYLOAD opcode can't saturate its sources, so skip saturating MOVs. The register coalescing after lower_load_payload() will clean up the extra MOVs.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index e1989cb..87ea9c2 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -454,8 +454,12 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
       val.effective_width = inst->src[i].effective_width;
 
       switch (inst->opcode) {
-      case BRW_OPCODE_MOV:
       case SHADER_OPCODE_LOAD_PAYLOAD:
+         /* LOAD_PAYLOAD can't sat its sources. */
+         if (entry->saturate)
+            break;
+         /* Otherwise, fall through */
+      case BRW_OPCODE_MOV:
          inst->src[i] = val;
          progress = true;
          break;
-- 
2.1.0
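Editor's note: the saturate check matters because a saturating MOV clamps its result to [0.0, 1.0]. If the unclamped source is substituted into an instruction that cannot apply that clamp itself, the result changes. The sketch below mirrors the shape of the patched switch: LOAD_PAYLOAD rejects saturated entries and otherwise falls through to the MOV case. `Opcode`, `AcpEntry` and `may_propagate()` are hypothetical stand-ins for the real opcodes and acp_entry. The sketch does not model the other opcodes that try_constant_propagate() handles; it simply returns false for them.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the opcodes involved.
enum class Opcode { MOV, LOAD_PAYLOAD, ADD };

// Hypothetical, trimmed-down ACP entry: the MOV we might propagate from.
struct AcpEntry {
   float value;     // constant source of the MOV
   bool saturate;   // was it a MOV.sat?
};

// What a MOV.sat computes: clamp to [0, 1].
inline float saturate(float x) { return std::min(1.0f, std::max(0.0f, x)); }

// Shape of the patched switch: LOAD_PAYLOAD refuses saturated entries,
// otherwise it falls through to the MOV case. Opcodes that this sketch
// does not model return false.
inline bool may_propagate(Opcode op, const AcpEntry &entry) {
   switch (op) {
   case Opcode::LOAD_PAYLOAD:
      if (entry.saturate)
         return false;   // LOAD_PAYLOAD can't saturate its sources
      // fall through
   case Opcode::MOV:
      return true;
   default:
      return false;
   }
}
```

If `mov.sat tmp, 2.5` were propagated into a LOAD_PAYLOAD, the payload would receive 2.5 instead of the clamped 1.0. Per the commit message, skipping the propagation only leaves an extra MOV, which the register coalescing after lower_load_payload() cleans up.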
[Mesa-dev] [PATCH v2 09/16] i965: Prepare for using the ATTR register file in the fs backend
The scalar vertex shader will use the ATTR register file for vertex attributes. This patch adds support for the ATTR file to fs_visitor.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp           | 12 ++--
 src/mesa/drivers/dri/i965/brw_fs.h             |  3 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  2 --
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 11 +--
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 9d07857..00156c7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -76,7 +76,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
       this->exec_size = dst.width;
    } else {
       for (int i = 0; i < sources; ++i) {
-         if (src[i].file != GRF)
+         if (src[i].file != GRF && src[i].file != ATTR)
             continue;
 
          if (this->exec_size <= 1)
@@ -97,6 +97,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
          break;
       case GRF:
       case HW_REG:
+      case ATTR:
          assert(this->src[i].width > 0);
          if (this->src[i].width == 1) {
             this->src[i].effective_width = this->exec_size;
@@ -121,6 +122,7 @@ fs_inst::init(enum opcode opcode, uint8_t exec_size, const fs_reg &dst,
    case GRF:
    case HW_REG:
    case MRF:
+   case ATTR:
       this->regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;
       break;
    case BAD_FILE:
@@ -636,7 +638,7 @@ fs_reg::is_contiguous() const
 bool
 fs_reg::is_valid_3src() const
 {
-   return file == GRF || file == UNIFORM;
+   return file == GRF || file == UNIFORM || file == ATTR;
 }
 
 int
@@ -3148,6 +3150,9 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
    case UNIFORM:
       fprintf(file, "***u%d***", inst->dst.reg + inst->dst.reg_offset);
       break;
+   case ATTR:
+      fprintf(file, "attr%d", inst->dst.reg + inst->dst.reg_offset);
+      break;
    case HW_REG:
       if (inst->dst.fixed_hw_reg.file == BRW_ARCHITECTURE_REGISTER_FILE) {
          switch (inst->dst.fixed_hw_reg.nr) {
@@ -3199,6 +3204,9 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file)
       case MRF:
          fprintf(file, "***m%d***", inst->src[i].reg);
          break;
+      case ATTR:
+         fprintf(file, "attr%d", inst->src[i].reg + inst->src[i].reg_offset);
+         break;
       case UNIFORM:
          fprintf(file, "u%d", inst->src[i].reg + inst->src[i].reg_offset);
          if (inst->src[i].reladdr) {
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 457fb4b..454496e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -132,6 +132,7 @@ byte_offset(fs_reg reg, unsigned delta)
    case BAD_FILE:
       break;
    case GRF:
+   case ATTR:
       reg.reg_offset += delta / 32;
       break;
    case MRF:
@@ -157,6 +158,7 @@ horiz_offset(fs_reg reg, unsigned delta)
       break;
    case GRF:
    case MRF:
+   case ATTR:
       return byte_offset(reg, delta * reg.stride * type_sz(reg.type));
    default:
       assert(delta == 0);
@@ -173,6 +175,7 @@ offset(fs_reg reg, unsigned delta)
       break;
    case GRF:
    case MRF:
+   case ATTR:
       return byte_offset(reg, delta * reg.width * reg.stride * type_sz(reg.type));
    case UNIFORM:
       reg.reg_offset += delta;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 75ee2c7..dee79d3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -1270,8 +1270,6 @@ brw_reg_from_fs_reg(fs_reg *reg)
       /* Probably unused. */
       brw_reg = brw_null_reg();
       break;
-   case UNIFORM:
-      unreachable("not reached");
    default:
       unreachable("not reached");
    }
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index f36c474..0cc51f3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -196,8 +196,15 @@ fs_visitor::visit(ir_dereference_array *ir)
    src.type = brw_type_for_base_type(ir->type);
 
    if (constant_index) {
-      assert(src.file == UNIFORM || src.file == GRF || src.file == HW_REG);
-      src = offset(src, constant_index->value.i[0] * element_size);
+      if (src.file == ATTR) {
+         /* Attribute arrays get loaded as one vec4 per element. In that case
+          * offset the source register.
+          */
+         src.reg += constant_index->value.i[0];
+      } else {
+         assert(src.file == UNIFORM || src.file == GRF || src.file == HW_REG);
+         src = offset(src, constant_index->value.i[0] * element_size);
+      }
    } else {
       /* Variable index array dereference. We attach the
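Editor's note: this patch involves two small computations. First, `regs_written` rounds the destination footprint (width × stride × type size, in bytes) up to whole 32-byte registers. Second, a constant index into an ATTR array bumps the register number by one per element (one vec4 per element), while the other files go through offset(), which advances `reg_offset` by bytes / 32. The plain-integer sketch below shows both. `ToyReg` and `index_array()` are hypothetical, the sketch assumes stride 1, and it ignores any sub-register remainder.

```cpp
#include <cassert>

// Mirrors the fs_inst::init() formula: bytes written, rounded up to whole
// 32-byte registers.
constexpr unsigned regs_written(unsigned width, unsigned stride,
                                unsigned type_size) {
   return (width * stride * type_size + 31) / 32;
}

// Hypothetical register reference (not Mesa's fs_reg).
struct ToyReg {
   bool is_attr;
   unsigned reg;
   unsigned reg_offset;  // in 32-byte registers
};

// Constant-index array access, modeled on visit(ir_dereference_array *).
ToyReg index_array(ToyReg base, unsigned index, unsigned element_size,
                   unsigned width, unsigned type_size) {
   if (base.is_attr) {
      // ATTR arrays: one vec4 per element, so bump the register number.
      base.reg += index;
   } else {
      // GRF: offset() -> byte_offset(), assuming stride 1.
      unsigned bytes = index * element_size * width * type_size;
      base.reg_offset += bytes / 32;
   }
   return base;
}
```

For example, a SIMD8 float destination writes 8 × 1 × 4 = 32 bytes, which is one register. SIMD16 writes 64 bytes, which is two registers.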
[Mesa-dev] [PATCH v2 11/16] i965: Move more code into codegen-branch of the fs_visitor::run() if statement
These last few operations only apply when we've actually generated code, optimized, and allocated registers. The dummy and repclear shaders don't touch force_uncompressed_stack, don't need the gen4 send workaround, and don't spill, so we can move these lines into the else-branch, which will make the following refactoring easier.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 00156c7..0ffb4d8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3649,23 +3649,23 @@ fs_visitor::run()
                break;
          }
       }
-   }
 
-   assert(force_uncompressed_stack == 0);
-   /* This must come after all optimization and register allocation, since
-    * it inserts dead code that happens to have side effects, and it does
-    * so based on the actual physical registers in use.
-    */
-   insert_gen4_send_dependency_workarounds();
+      assert(force_uncompressed_stack == 0);
 
-   if (failed)
-      return false;
+      /* This must come after all optimization and register allocation, since
+       * it inserts dead code that happens to have side effects, and it does
+       * so based on the actual physical registers in use.
+       */
+      insert_gen4_send_dependency_workarounds();
+
+      if (failed)
+         return false;
 
-   if (!allocated_without_spills)
-      schedule_instructions(SCHEDULE_POST);
+      if (!allocated_without_spills)
+         schedule_instructions(SCHEDULE_POST);
 
-   if (last_scratch > 0) {
-      prog_data->total_scratch = brw_get_scratch_size(last_scratch);
+      if (last_scratch > 0)
+         prog_data->total_scratch = brw_get_scratch_size(last_scratch);
    }
 
    if (stage == MESA_SHADER_FRAGMENT) {
-- 
2.1.0
[Mesa-dev] [PATCH v2 07/16] i965: Add new SIMD8 VS prog data flag
This flag signals that we have a SIMD8 VS shader, so we can set up the corresponding state. This boils down to setting the BDW+ SIMD8 enable bit in 3DSTATE_VS and making UBO and pull constant buffers use dword pitch.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_context.h          |  5 -
 src/mesa/drivers/dri/i965/brw_defines.h          |  2 ++
 src/mesa/drivers/dri/i965/brw_gs_surface_state.c |  2 +-
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 10 --
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c |  7 ---
 src/mesa/drivers/dri/i965/gen8_vs_state.c        |  2 ++
 6 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index eb37e75..e7cd30f 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -543,6 +543,8 @@ struct brw_vec4_prog_data {
     * is the size of the URB entry used for output.
     */
    GLuint urb_entry_size;
+
+   bool simd8;
 };
 
 
@@ -1599,7 +1601,8 @@ brw_update_sol_surface(struct brw_context *brw,
 void brw_upload_ubo_surfaces(struct brw_context *brw,
                              struct gl_shader *shader,
                              struct brw_stage_state *stage_state,
-                             struct brw_stage_prog_data *prog_data);
+                             struct brw_stage_prog_data *prog_data,
+                             bool dword_pitch);
 void brw_upload_abo_surfaces(struct brw_context *brw,
                              struct gl_shader_program *prog,
                              struct brw_stage_state *stage_state,
diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 650fdb9..900d8cf 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h
@@ -1688,6 +1688,8 @@ enum brw_message_target {
 # define GEN6_VS_STATISTICS_ENABLE                  (1 << 10)
 # define GEN6_VS_CACHE_DISABLE                      (1 << 1)
 # define GEN6_VS_ENABLE                             (1 << 0)
+/* Gen8+ DW7 */
+# define GEN8_VS_SIMD8_ENABLE                       (1 << 2)
 /* Gen8+ DW8 */
 # define GEN8_VS_URB_ENTRY_OUTPUT_OFFSET_SHIFT      21
 # define GEN8_VS_URB_OUTPUT_LENGTH_SHIFT            16
diff --git a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
index 2c2ba56..42cdddb 100644
--- a/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_gs_surface_state.c
@@ -77,7 +77,7 @@ brw_upload_gs_ubo_surfaces(struct brw_context *brw)
 
    /* CACHE_NEW_GS_PROG */
    brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_GEOMETRY],
-                           &brw->gs.base, &brw->gs.prog_data->base.base);
+                           &brw->gs.base, &brw->gs.prog_data->base.base, false);
 }
 
 const struct brw_tracked_state brw_gs_ubo_surfaces = {
diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 1cc96cf..24bc06d 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -112,6 +112,7 @@ static void
 brw_upload_vs_pull_constants(struct brw_context *brw)
 {
    struct brw_stage_state *stage_state = &brw->vs.base;
+   bool dword_pitch;
 
    /* BRW_NEW_VERTEX_PROGRAM */
    struct brw_vertex_program *vp =
@@ -120,9 +121,11 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
    /* CACHE_NEW_VS_PROG */
    const struct brw_stage_prog_data *prog_data = &brw->vs.prog_data->base.base;
 
+   dword_pitch = brw->vs.prog_data->base.simd8;
+
    /* _NEW_PROGRAM_CONSTANTS */
    brw_upload_pull_constants(brw, BRW_NEW_VS_CONSTBUF, &vp->program.Base,
-                             stage_state, prog_data, false);
+                             stage_state, prog_data, dword_pitch);
 }
 
 const struct brw_tracked_state brw_vs_pull_constants = {
@@ -141,13 +144,16 @@ brw_upload_vs_ubo_surfaces(struct brw_context *brw)
    /* _NEW_PROGRAM */
    struct gl_shader_program *prog =
       ctx->_Shader->CurrentProgram[MESA_SHADER_VERTEX];
+   bool dword_pitch;
 
    if (!prog)
       return;
 
    /* CACHE_NEW_VS_PROG */
+   dword_pitch = brw->vs.prog_data->base.simd8;
    brw_upload_ubo_surfaces(brw, prog->_LinkedShaders[MESA_SHADER_VERTEX],
-                           &brw->vs.base, &brw->vs.prog_data->base.base);
+                           &brw->vs.base, &brw->vs.prog_data->base.base,
+                           dword_pitch);
 }
 
 const struct brw_tracked_state brw_vs_ubo_surfaces = {
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index ef46dd7..ec86841 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -837,7 +837,8 @@ void
 brw_upload_ubo_surfaces(struct
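Editor's note: according to the commit message, the new `simd8` flag ends up as the SIMD8 enable bit in 3DSTATE_VS (`GEN8_VS_SIMD8_ENABLE`, bit 2 of DW7 per the brw_defines.h hunk). It also selects dword pitch for the VS UBO and pull-constant surfaces. The gen8_vs_state.c hunk is not part of this excerpt, so the helper below is a hypothetical illustration of OR-ing the define into a state dword. It is not the patch's code, and it also assumes that `GEN6_VS_ENABLE` (bit 0) shares that dword.

```cpp
#include <cassert>
#include <cstdint>

// Values as they appear in the brw_defines.h hunk above.
#define GEN6_VS_ENABLE       (1 << 0)
#define GEN8_VS_SIMD8_ENABLE (1 << 2)

// Hypothetical helper: build a DW7-style value from the VS prog data flag.
// Assumption: the function-enable bit lives in the same dword.
uint32_t toy_vs_dw7(bool simd8) {
   uint32_t dw7 = GEN6_VS_ENABLE;
   if (simd8)
      dw7 |= GEN8_VS_SIMD8_ENABLE;
   return dw7;
}
```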
[Mesa-dev] [PATCH v2 16/16] i965: Generate vs code using scalar backend for BDW+
With everything in place, we can now use the scalar backend compiler for vertex shaders on BDW+. We make scalar vertex shaders the default on BDW+ but add a new vec4vs debug option to force the vec4 backend.

No piglit regressions.

Performance impact is minimal: I see a ~1.5 improvement on the T-Rex GLBenchmark case, but in general it's in the noise. Some of our internal synthetic, VS-bound benchmarks show large improvements, 20%-40% in some cases, but real-world cases are mostly unaffected.

Signed-off-by: Kristian Høgsberg <k...@bitplanet.net>
---
 src/mesa/drivers/dri/i965/brw_context.c  | 13 +++
 src/mesa/drivers/dri/i965/brw_context.h  |  1 +
 src/mesa/drivers/dri/i965/brw_shader.cpp | 17 +++--
 src/mesa/drivers/dri/i965/brw_vec4.cpp   | 60 +---
 src/mesa/drivers/dri/i965/intel_debug.c  |  1 +
 src/mesa/drivers/dri/i965/intel_debug.h  |  1 +
 6 files changed, 78 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri/i965/brw_context.c
index e1a994a..f56cfb2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -557,6 +557,15 @@ brw_initialize_context_constants(struct brw_context *brw)
    ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = true;
    ctx->Const.ShaderCompilerOptions[MESA_SHADER_GEOMETRY].OptimizeForAOS = true;
 
+   if (brw->scalar_vs) {
+      /* If we're using the scalar backend for vertex shaders, we need to
+       * configure these accordingly.
+       */
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectOutput = true;
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].EmitNoIndirectTemp = true;
+      ctx->Const.ShaderCompilerOptions[MESA_SHADER_VERTEX].OptimizeForAOS = false;
+   }
+
    /* ARB_viewport_array */
    if (brw->gen >= 7 && ctx->API == API_OPENGL_CORE) {
       ctx->Const.MaxViewports = GEN7_NUM_VIEWPORTS;
@@ -755,6 +764,10 @@ brwCreateContext(gl_api api,
 
    brw_process_driconf_options(brw);
    brw_process_intel_debug_variable(brw);
+
+   if (brw->gen >= 8 && !(INTEL_DEBUG & DEBUG_VEC4VS))
+      brw->scalar_vs = true;
+
    brw_initialize_context_constants(brw);
 
    ctx->Const.ResetStrategy = notify_reset
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 463f3d2..f198103 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -1067,6 +1067,7 @@ struct brw_context
    bool has_pln;
    bool no_simd8;
    bool use_rep_send;
+   bool scalar_vs;
 
    /**
     * Some versions of Gen hardware don't do centroid interpolation correctly
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index 3c78afd..26da729 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -71,6 +71,19 @@ brw_shader_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
    return true;
 }
 
+static inline bool
+is_scalar_shader_stage(struct brw_context *brw, int stage)
+{
+   switch (stage) {
+   case MESA_SHADER_FRAGMENT:
+      return true;
+   case MESA_SHADER_VERTEX:
+      return brw->scalar_vs;
+   default:
+      return false;
+   }
+}
+
 static void
 brw_lower_packing_builtins(struct brw_context *brw,
                            gl_shader_stage shader_type,
@@ -91,7 +104,7 @@ brw_lower_packing_builtins(struct brw_context *brw,
     * lowering is needed. For SOA code, the Half2x16 ops must be
     * scalarized.
     */
-   if (shader_type == MESA_SHADER_FRAGMENT) {
+   if (is_scalar_shader_stage(brw, shader_type)) {
       ops |= LOWER_PACK_HALF_2x16_TO_SPLIT
           |  LOWER_UNPACK_HALF_2x16_TO_SPLIT;
    }
@@ -179,7 +192,7 @@ brw_link_shader(struct gl_context *ctx, struct gl_shader_program *shProg)
       do {
          progress = false;
 
-         if (stage == MESA_SHADER_FRAGMENT) {
+         if (is_scalar_shader_stage(brw, stage)) {
             brw_do_channel_expressions(shader->base.ir);
             brw_do_vector_splitting(shader->base.ir);
          }
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 280db47..3e9cc23 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -22,6 +22,7 @@
  */
 
 #include "brw_vec4.h"
+#include "brw_fs.h"
 #include "brw_cfg.h"
 #include "brw_vs.h"
 #include "brw_dead_control_flow.h"
@@ -1863,6 +1864,7 @@ brw_vs_emit(struct brw_context *brw,
 {
    bool start_busy = false;
    double start_time = 0;
+   const unsigned *assembly = NULL;
 
    if (unlikely(brw->perf_debug)) {
       start_busy = (brw->batch.last_bo &&
@@ -1877,23 +1879,55 @@ brw_vs_emit(struct brw_context *brw,
    if (unlikely(INTEL_DEBUG & DEBUG_VS))
       brw_dump_ir("vertex", prog, &shader->base, &c->vp->program.Base);
 
-   vec4_vs_visitor v(brw, c, prog_data, prog, mem_ctx);
-   if (!v.run()) {
-      if
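Editor's note: the backend selection in this patch works in two steps. brwCreateContext() turns scalar VS on for Gen8+ unless the vec4vs debug flag is set. is_scalar_shader_stage() then treats the vertex stage like the fragment stage whenever that flag is on, so the scalarizing lowering passes run for it. The model below is self-contained. `ToyStage`, `TOY_DEBUG_VEC4VS` and `ToyContext` are hypothetical stand-ins for Mesa's stage enum, the `DEBUG_VEC4VS` bit and `brw_context`.

```cpp
#include <cassert>
#include <cstdint>

enum ToyStage { TOY_VERTEX, TOY_GEOMETRY, TOY_FRAGMENT };

// Hypothetical debug bit standing in for DEBUG_VEC4VS.
constexpr uint64_t TOY_DEBUG_VEC4VS = 1ull << 0;

struct ToyContext {
   int gen;
   bool scalar_vs;
};

// Mirrors brwCreateContext(): scalar VS by default on Gen8+, unless the
// vec4 backend is forced via the debug flag.
ToyContext toy_create_context(int gen, uint64_t intel_debug) {
   ToyContext ctx{gen, false};
   if (ctx.gen >= 8 && !(intel_debug & TOY_DEBUG_VEC4VS))
      ctx.scalar_vs = true;
   return ctx;
}

// Mirrors is_scalar_shader_stage().
bool toy_is_scalar_stage(const ToyContext &ctx, ToyStage stage) {
   switch (stage) {
   case TOY_FRAGMENT:
      return true;
   case TOY_VERTEX:
      return ctx.scalar_vs;
   default:
      return false;
   }
}
```

Geometry shaders stay on the vec4 backend in every configuration, which matches the `default: return false;` arm of the patch.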
[Mesa-dev] [PATCH v2 03/16] i965: Generalize fs_generator further
This removes all stage specific data from the generator, and lets us
create a generator for any stage.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  5 ++---
 src/mesa/drivers/dri/i965/brw_fs.cpp            |  2 +-
 src/mesa/drivers/dri/i965/brw_fs.h              |  7 +++
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  | 19 +++
 4 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
index 7802c9f..86ed953 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
@@ -29,9 +29,8 @@ brw_blorp_eu_emitter::brw_blorp_eu_emitter(struct brw_context *brw,
                                            bool debug_flag)
    : mem_ctx(ralloc_context(NULL)),
-     generator(brw, mem_ctx,
-               rzalloc(mem_ctx, struct brw_wm_prog_key),
-               rzalloc(mem_ctx, struct brw_wm_prog_data),
+     generator(brw, mem_ctx, (void *) rzalloc(mem_ctx, struct brw_wm_prog_key),
+               (struct brw_stage_prog_data *) rzalloc(mem_ctx, struct brw_wm_prog_data),
                NULL, NULL, false, debug_flag)
 {
 }
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e12fd77..e417e0c 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -3743,7 +3743,7 @@ brw_wm_fs_emit(struct brw_context *brw,
       prog_data->no_8 = false;
    }
 
-   fs_generator g(brw, mem_ctx, key, prog_data, prog, fp,
+   fs_generator g(brw, mem_ctx, (void *) key, &prog_data->base, prog, &fp->Base,
                   v.runtime_check_aads_emit, INTEL_DEBUG & DEBUG_WM);
    if (simd8_cfg)
       g.generate_code(simd8_cfg, 8);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 5c21dd0..ae21840 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -694,10 +694,10 @@ class fs_generator
 public:
    fs_generator(struct brw_context *brw,
                 void *mem_ctx,
-                const struct brw_wm_prog_key *key,
-                struct brw_wm_prog_data *prog_data,
+                const void *key,
+                struct brw_stage_prog_data *prog_data,
                 struct gl_shader_program *shader_prog,
-                struct gl_fragment_program *fp,
+                struct gl_program *fp,
                 bool runtime_check_aads_emit,
                 bool debug_flag);
    ~fs_generator();
@@ -799,7 +799,6 @@ private:
    struct gl_context *ctx;
 
    struct brw_compile *p;
-   gl_shader_stage stage;
    const void * const key;
 
    struct brw_stage_prog_data * const prog_data;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 0622b07..9faecf6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -38,16 +38,16 @@ extern "C" {
 
 fs_generator::fs_generator(struct brw_context *brw,
                            void *mem_ctx,
-                           const struct brw_wm_prog_key *key,
-                           struct brw_wm_prog_data *prog_data,
+                           const void *key,
+                           struct brw_stage_prog_data *prog_data,
                            struct gl_shader_program *shader_prog,
-                           struct gl_fragment_program *fp,
+                           struct gl_program *prog,
                            bool runtime_check_aads_emit,
                            bool debug_flag)
 
-   : brw(brw), stage(MESA_SHADER_FRAGMENT), key(key),
-     prog_data(&prog_data->base), shader_prog(shader_prog),
-     prog(&fp->Base), runtime_check_aads_emit(runtime_check_aads_emit),
+   : brw(brw), key(key),
+     prog_data(prog_data), shader_prog(shader_prog),
+     prog(prog), runtime_check_aads_emit(runtime_check_aads_emit),
      debug_flag(debug_flag), mem_ctx(mem_ctx)
 {
    ctx = &brw->ctx;
@@ -105,7 +105,6 @@ fs_generator::fire_fb_write(fs_inst *inst,
 {
    uint32_t msg_control;
 
-   assert(stage == MESA_SHADER_FRAGMENT);
    brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
 
    if (brw->gen < 6) {
@@ -146,7 +145,6 @@ fs_generator::fire_fb_write(fs_inst *inst,
 void
 fs_generator::generate_fb_write(fs_inst *inst, struct brw_reg payload)
 {
-   assert(stage == MESA_SHADER_FRAGMENT);
    brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
    const brw_wm_prog_key * const key = (brw_wm_prog_key * const) this->key;
    struct brw_reg implied_header;
@@ -700,7 +698,6 @@ fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src
    assert(quality.file == BRW_IMMEDIATE_VALUE);
    assert(quality.type == BRW_REGISTER_TYPE_D);
 
-   assert(stage == MESA_SHADER_FRAGMENT);
    const brw_wm_prog_key * const key =
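The generalized constructor relies on each per-stage prog_data struct embedding `struct brw_stage_prog_data` as its first member, `base`. Callers pass `&prog_data->base`, and fragment-only paths such as `generate_fb_write()` cast `this->prog_data` back to `brw_wm_prog_data *`. The following is a minimal C sketch of that first-member embedding pattern. The `sketch_*` types and their fields are hypothetical, not the real Mesa structs.

```c
#include <stddef.h>

/* Hypothetical, reduced versions of the Mesa structs. */
struct sketch_stage_prog_data {
   unsigned nr_params;
};

struct sketch_wm_prog_data {
   struct sketch_stage_prog_data base; /* must be the first member */
   unsigned num_varying_inputs;
};

/* The downcast below is only well-defined because base sits at offset 0. */
_Static_assert(offsetof(struct sketch_wm_prog_data, base) == 0,
               "base must be the first member");

/* Stage-agnostic code only sees the base pointer, like the generalized
 * fs_generator that now takes a brw_stage_prog_data pointer. */
struct sketch_generator {
   struct sketch_stage_prog_data *prog_data;
};

static void sketch_generator_init(struct sketch_generator *g,
                                  struct sketch_stage_prog_data *prog_data)
{
   g->prog_data = prog_data;
}

/* A fragment-only path casts back to the derived type. This is valid only
 * if the object really is a sketch_wm_prog_data. */
static unsigned sketch_fs_varyings(const struct sketch_generator *g)
{
   const struct sketch_wm_prog_data *wm =
      (const struct sketch_wm_prog_data *) g->prog_data;
   return wm->num_varying_inputs;
}
```

Removing the `stage` member also removes the `assert(stage == MESA_SHADER_FRAGMENT)` checks. After this patch, the correctness of those casts rests on the caller.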
[Mesa-dev] [PATCH v2 10/16] i965: Rename brw_vec4_prog_data to brw_vue_prog_data
With scalar vertex shader coming up, we're going to reuse
brw_vec4_prog_data in the scalar backend. There's nothing vec4 specific
in the struct, it's instead common state for stages that operate on
VUEs. This patch renames the struct to brw_vue_prog_data which is more
descriptive and will look a lot less awkward when we use it in the
scalar backend.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_context.h           | 17 -
 src/mesa/drivers/dri/i965/brw_vec4.cpp            |  6 +++---
 src/mesa/drivers/dri/i965/brw_vec4.h              | 18 -
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp  |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs.c           |  4 ++--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp    |  4 ++--
 src/mesa/drivers/dri/i965/brw_vs.c                | 10 +-
 src/mesa/drivers/dri/i965/brw_vs.h                |  2 +-
 src/mesa/drivers/dri/i965/gen6_gs_state.c         |  2 +-
 src/mesa/drivers/dri/i965/gen7_gs_state.c         |  2 +-
 src/mesa/drivers/dri/i965/gen8_gs_state.c         |  2 +-
 src/mesa/drivers/dri/i965/gen8_vs_state.c         |  2 +-
 14 files changed, 38 insertions(+), 39 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index e7cd30f..463f3d2 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -145,7 +145,7 @@ extern "C" {
 struct brw_context;
 struct brw_inst;
 struct brw_vs_prog_key;
-struct brw_vec4_prog_key;
+struct brw_vue_prog_key;
 struct brw_wm_prog_key;
 struct brw_wm_prog_data;
 
@@ -528,10 +528,9 @@ struct brw_ff_gs_prog_data {
 };
 
 
-/* Note: brw_vec4_prog_data_compare() must be updated when adding fields to
- * this struct!
+/* Shared data for stages that operate on VUEs (vertex, geometry)
  */
-struct brw_vec4_prog_data {
+struct brw_vue_prog_data {
    struct brw_stage_prog_data base;
    struct brw_vue_map vue_map;
 
@@ -552,7 +551,7 @@ struct brw_vec4_prog_data {
  * struct!
  */
 struct brw_vs_prog_data {
-   struct brw_vec4_prog_data base;
+   struct brw_vue_prog_data base;
 
    GLbitfield64 inputs_read;
 
@@ -610,7 +609,7 @@ struct brw_vs_prog_data {
  */
 struct brw_gs_prog_data
 {
-   struct brw_vec4_prog_data base;
+   struct brw_vue_prog_data base;
 
    /**
     * Size of an output vertex, measured in HWORDS (32 bytes).
@@ -1853,9 +1852,9 @@ void gen8_hiz_exec(struct brw_context *brw, struct intel_mipmap_tree *mt,
 uint32_t get_hw_prim_for_gl_prim(int mode);
 
 void
-brw_setup_vec4_key_clip_info(struct brw_context *brw,
-                             struct brw_vec4_prog_key *key,
-                             bool program_uses_clip_distance);
+brw_setup_vue_key_clip_info(struct brw_context *brw,
+                            struct brw_vue_prog_key *key,
+                            bool program_uses_clip_distance);
 
 void
 gen6_upload_push_constants(struct brw_context *brw,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index df589b8..280db47 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -1911,9 +1911,9 @@ brw_vs_emit(struct brw_context *brw,
 
 
 void
-brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
-                                       struct brw_vec4_prog_key *key,
-                                       GLuint id, struct gl_program *prog)
+brw_vue_setup_prog_key_for_precompile(struct gl_context *ctx,
+                                      struct brw_vue_prog_key *key,
+                                      GLuint id, struct gl_program *prog)
 {
    key->program_string_id = id;
    key->clamp_vertex_color = ctx->API == API_OPENGL_COMPAT;
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
index 750f491..18ec8b3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -50,7 +50,7 @@ struct brw_vec4_compile {
 };
 
 
-struct brw_vec4_prog_key {
+struct brw_vue_prog_key {
    GLuint program_string_id;
 
    /**
@@ -77,7 +77,7 @@ extern "C" {
 
 void
 brw_vec4_setup_prog_key_for_precompile(struct gl_context *ctx,
-                                       struct brw_vec4_prog_key *key,
+                                       struct brw_vue_prog_key *key,
                                        GLuint id, struct gl_program *prog);
 
 #ifdef __cplusplus
@@ -210,7 +210,7 @@ public:
                     const src_reg &src2 = src_reg());
 
    struct brw_reg get_dst(void);
-   struct brw_reg get_src(const struct brw_vec4_prog_data *prog_data, int i);
+   struct brw_reg get_src(const struct brw_vue_prog_data *prog_data, int i);
 
    dst_reg dst;
    src_reg src[3];
@@ -252,8 +252,8 @@ public:
    vec4_visitor(struct brw_context *brw,
                 struct
[Mesa-dev] [PATCH v2 14/16] i965: Add fs_visitor::run_vs() to generate scalar vertex shader code
This patch uses the previous refactoring to add a new run_vs() method
that generates vertex shader code using the scalar visitor and
optimizer.

Signed-off-by: Kristian Høgsberg k...@bitplanet.net
---
 src/mesa/drivers/dri/i965/brw_fs.cpp         |  99 -
 src/mesa/drivers/dri/i965/brw_fs.h           |  21 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 303 ++-
 3 files changed, 412 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 4dce0a2..8007977 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1828,6 +1828,56 @@ fs_visitor::assign_urb_setup()
       urb_start + prog_data->num_varying_inputs * 2;
 }
 
+void
+fs_visitor::assign_vs_urb_setup()
+{
+   brw_vs_prog_data *vs_prog_data = (brw_vs_prog_data *) prog_data;
+   int grf, count, slot, channel, attr;
+
+   assert(stage == MESA_SHADER_VERTEX);
+   count = _mesa_bitcount_64(vs_prog_data->inputs_read);
+   if (vs_prog_data->uses_vertexid || vs_prog_data->uses_instanceid)
+      count++;
+
+   /* Each attribute is 4 regs. */
+   this->first_non_payload_grf =
+      payload.num_regs + prog_data->curb_read_length + count * 4;
+
+   unsigned vue_entries =
+      MAX2(count, vs_prog_data->base.vue_map.num_slots);
+
+   vs_prog_data->base.urb_entry_size = ALIGN(vue_entries, 4) / 4;
+   vs_prog_data->base.urb_read_length = (count + 1) / 2;
+
+   assert(vs_prog_data->base.urb_read_length <= 15);
+
+   /* Rewrite all ATTR file references to the hw grf that they land in. */
+   foreach_block_and_inst(block, fs_inst, inst, cfg) {
+      for (int i = 0; i < inst->sources; i++) {
+         if (inst->src[i].file == ATTR) {
+
+            if (inst->src[i].reg == VERT_ATTRIB_MAX) {
+               slot = count - 1;
+            } else {
+               attr = inst->src[i].reg + inst->src[i].reg_offset / 4;
+               slot = _mesa_bitcount_64(vs_prog_data->inputs_read &
+                                        BITFIELD64_MASK(attr));
+            }
+
+            channel = inst->src[i].reg_offset & 3;
+
+            grf = payload.num_regs +
+               prog_data->curb_read_length +
+               slot * 4 + channel;
+
+            inst->src[i].file = HW_REG;
+            inst->src[i].fixed_hw_reg =
+               retype(brw_vec8_grf(grf, 0), inst->src[i].type);
+         }
+      }
+   }
+}
+
 /**
  * Split large virtual GRFs into separate components if we can.
  *
@@ -3405,6 +3455,13 @@ fs_visitor::setup_payload_gen6()
 }
 
 void
+fs_visitor::setup_vs_payload()
+{
+   /* R0: thread header, R1: urb handles */
+   payload.num_regs = 2;
+}
+
+void
 fs_visitor::assign_binding_table_offsets()
 {
    assert(stage == MESA_SHADER_FRAGMENT);
@@ -3471,6 +3528,8 @@ fs_visitor::opt_drop_redundant_mov_to_flags()
 void
 fs_visitor::optimize()
 {
+   const char *stage_name = stage == MESA_SHADER_VERTEX ? "vs" : "fs";
+
    calculate_cfg();
 
    split_virtual_grfs();
@@ -3487,8 +3546,8 @@ fs_visitor::optimize()
                                                                         \
       if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER) && this_progress) {   \
          char filename[64];                                            \
-         snprintf(filename, 64, "fs%d-%04d-%02d-%02d-" #pass,          \
-                  dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
+         snprintf(filename, 64, "%s%d-%04d-%02d-%02d-" #pass,          \
+                  stage_name, dispatch_width, shader_prog ? shader_prog->Name : 0, iteration, pass_num); \
                                                                         \
          backend_visitor::dump_instructions(filename);                 \
       }                                                                 \
@@ -3498,8 +3557,8 @@ fs_visitor::optimize()
 
    if (unlikely(INTEL_DEBUG & DEBUG_OPTIMIZER)) {
       char filename[64];
-      snprintf(filename, 64, "fs%d-%04d-00-start",
-               dispatch_width, shader_prog ? shader_prog->Name : 0);
+      snprintf(filename, 64, "%s%d-%04d-00-start",
+               stage_name, dispatch_width, shader_prog ? shader_prog->Name : 0);
 
       backend_visitor::dump_instructions(filename);
    }
@@ -3608,6 +3667,38 @@ fs_visitor::allocate_registers()
 }
 
 bool
+fs_visitor::run_vs()
+{
+   assert(stage == MESA_SHADER_VERTEX);
+
+   assign_common_binding_table_offsets(0);
+   setup_vs_payload();
+
+   if (INTEL_DEBUG & DEBUG_SHADER_TIME)
+      emit_shader_time_begin();
+
+   foreach_in_list(ir_instruction, ir, shader->base.ir) {
+      base_ir = ir;
+      this->result = reg_undef;
+      ir->accept(this);
+   }
+   base_ir = NULL;
+   if (failed)
+      return false;
+
+   emit_urb_writes();
+
+   optimize();
+
+   assign_curb_setup();
+   assign_vs_urb_setup();
+
+   allocate_registers();
+
+   return !failed;
+}
+
+bool
 fs_visitor::run()
 {
    sanity_param_count = prog->Parameters->NumParameters;
diff --git
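The register arithmetic in `assign_vs_urb_setup()` can be checked by hand. An attribute's slot is the number of enabled inputs below it. Each slot takes 4 GRFs, one per channel. Attributes start after the thread payload (2 registers, per `setup_vs_payload()`) and the CURBE registers. This minimal C sketch reproduces only that arithmetic. It omits the `VERT_ATTRIB_MAX` (vertex/instance ID) case and replaces `_mesa_bitcount_64()`, `BITFIELD64_MASK()`, `MAX2()` and `ALIGN()` with hypothetical local helpers and inline expressions.

```c
#include <stdint.h>

/* Hypothetical stand-in for _mesa_bitcount_64(). */
static int sketch_popcount64(uint64_t v)
{
   int n = 0;
   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* GRF holding channel (reg_offset & 3) of attribute reg + reg_offset / 4.
 * Assumes the attribute index is below 64. */
static int sketch_attr_grf(uint64_t inputs_read, int reg, int reg_offset,
                           int payload_regs, int curb_read_length)
{
   int attr = reg + reg_offset / 4;
   int channel = reg_offset & 3;
   uint64_t below = (UINT64_C(1) << attr) - 1; /* BITFIELD64_MASK(attr) */
   int slot = sketch_popcount64(inputs_read & below);

   return payload_regs + curb_read_length + slot * 4 + channel;
}

/* urb_entry_size = ALIGN(MAX2(count, vue_slots), 4) / 4 */
static int sketch_urb_entry_size(int count, int vue_slots)
{
   int entries = count > vue_slots ? count : vue_slots;
   return (entries + 3) / 4;
}

/* urb_read_length counts pairs of slots, rounded up. */
static int sketch_urb_read_length(int count)
{
   return (count + 1) / 2;
}
```

Consider `inputs_read` with attributes 0 and 3 enabled and no push constants. Channel 1 of attribute 3 lands in slot 1, so it is in GRF 2 + 4 + 1 = 7.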
Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure
On Thu, Nov 13, 2014 at 6:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote:
>> dlopen allocates a string on dlopen failure which is retrieved via
>> dlerror. In order to free that string, you need to retrieve and then
>> free it.
>
> Are you basically saying that glibc leaks memory and you're trying to
> make up for it? What if you use a non-buggy library? Or is dlopen()
> specified in such a way that if it fails, you must free the result of
> dlerror? I see nothing in the man pages to suggest that...

The closest that I can come to documentation at least implying this is [1]:

RETURN VALUE
If file cannot be found, cannot be opened for reading, is not of an
appropriate object format for processing by dlopen(), or if an error
occurs during the process of loading file or relocating its symbolic
references, dlopen() shall return NULL. More detailed diagnostic
information shall be available through dlerror().

Which implies that libdl needs to keep some sort of state regarding the
last error encountered. I see no requirement that it keep a malloc'd
string, just that it keep some state information around.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html

That does seem to lead one to read the dlerror() page that has this gem:

DESCRIPTION
The dlerror() function shall return a null-terminated character string
(with no trailing newline) that describes the last error that occurred
during dynamic linking processing. If no dynamic linking errors have
occurred since the last invocation of dlerror(), dlerror() shall return
NULL. Thus, invoking dlerror() a second time, immediately following a
prior invocation, shall result in NULL being returned.
<snip>
APPLICATION USAGE
The messages returned by dlerror() may reside in a static buffer that
is overwritten on each call to dlerror()

So, it may or may not return a malloc'd string, and all I've managed
here is to fix a leak in glibc's specific implementation.

The above docs seem to imply that dlopen() triggering an error needs to
populate some state and dlerror() retrieves the last error that has
occurred since the last dlerror() call. Calling dlerror() again at that
point will return NULL.

That being said, I think a simpler and probably more correct fix would
be to do:
dlerror();
dlerror();

Thoughts?

--Aaron

>> In order to keep things legit the windows/other util_dl_error paths
>> allocate and then copy their error message into a buffer as well.
>>
>> Signed-off-by: Aaron Watry awa...@gmail.com
>> CC: Ilia Mirkin imir...@alum.mit.edu
>>
>> v3: Switch comment to C-Style
>> v2: Use strdup instead of calloc/strcpy
>> ---
>>  src/gallium/auxiliary/pipe-loader/pipe_loader.c | 5 +
>>  src/gallium/auxiliary/util/u_dl.c               | 4 ++--
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/gallium/auxiliary/pipe-loader/pipe_loader.c b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> index 8e79f85..7a4e0b1 100644
>> --- a/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> +++ b/src/gallium/auxiliary/pipe-loader/pipe_loader.c
>> @@ -25,6 +25,8 @@
>>   *
>>   **/
>>
>> +#include <dlfcn.h>
>> +
>>  #include "pipe_loader_priv.h"
>>
>>  #include "util/u_inlines.h"
>> @@ -101,6 +103,9 @@ pipe_loader_find_module(struct pipe_loader_device *dev,
>>           if (lib) {
>>              return lib;
>>           }
>> +
>> +         /* Retrieve the dlerror() str so that it can be freed properly */
>> +         FREE(util_dl_error());
>>        }
>>     }
>>
>> diff --git a/src/gallium/auxiliary/util/u_dl.c b/src/gallium/auxiliary/util/u_dl.c
>> index aca435d..00c4d7c 100644
>> --- a/src/gallium/auxiliary/util/u_dl.c
>> +++ b/src/gallium/auxiliary/util/u_dl.c
>> @@ -87,8 +87,8 @@ util_dl_error(void)
>>  #if defined(PIPE_OS_UNIX)
>>     return dlerror();
>>  #elif defined(PIPE_OS_WINDOWS)
>> -   return "unknown error";
>> +   return strdup("unknown error");
>>  #else
>> -   return "unknown error";
>> +   return strdup("unknown error");
>>  #endif
>>  }
>> --
>> 2.1.0
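Here is a minimal sketch of the direction proposed above: handle a failed `dlopen()` without ever freeing the string returned by `dlerror()`. The helper name `sketch_try_dlopen` and the library name in the test are hypothetical. The first `dlerror()` call reports the error. The second call returns NULL per the POSIX text quoted above, and in glibc it also releases the previous message, as described in this thread. On glibc older than 2.34, link with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Try to open a module. On failure, report the error and leave no pending
 * dlerror() state behind. The string belongs to the implementation (a
 * static or thread-local buffer, or glibc's malloc'd copy), so it is
 * never passed to free(). */
static void *sketch_try_dlopen(const char *path)
{
   void *lib = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
   if (!lib) {
      const char *err = dlerror(); /* consumes the pending error */
      if (err)
         fprintf(stderr, "dlopen(%s) failed: %s\n", path, err);
      /* Returns NULL per POSIX; glibc also drops the old message here. */
      (void) dlerror();
   }
   return lib;
}
```

Unlike `FREE(util_dl_error())`, this never frees memory the caller does not own. That matters on implementations whose `dlerror()` returns a static buffer.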
Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure
On Thu, Nov 13, 2014 at 7:54 PM, Aaron Watry awa...@gmail.com wrote:
> On Thu, Nov 13, 2014 at 6:22 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
>> On Thu, Nov 13, 2014 at 6:43 PM, Aaron Watry awa...@gmail.com wrote:
>>> dlopen allocates a string on dlopen failure which is retrieved via
>>> dlerror. In order to free that string, you need to retrieve and then
>>> free it.
[...]
> So, it may or may not return a malloc'd string, and all I've managed
> here is to fix a leak in glibc's specific implementation.
>
> The above docs seem to imply that dlopen() triggering an error needs to
> populate some state and dlerror() retrieves the last error that has
> occurred since the last dlerror() call. calling dlerror() again at that
> point will return null.
>
> That being said, I think a simpler fix and probably more correct fix
> would be to do:
> dlerror();
> dlerror();

That seems like it's vastly less likely to do the wrong thing with any
reasonable implementation. So if it fixes things for glibc, let's do
that instead :)

Also, I wonder, even if glibc malloc's the string, whether it is truly
leaked or valgrind just thinks that. May be interesting to pull the
curtain back and see what's actually going on in glibc...

  -ilia
Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?
But my distribution does build Mesa with debugging symbols. I have the
package libgl1-mesa-dri-dbg installed, which gives me debugging symbols
such as drm_intel_bo_wait_rendering and drm_intel_bo_subdata. I assume
what I don't have is debugging information for JITted code, although
maybe the problem is a bug in perf (they've had problems with artificial
intrinsics before). Please assume I have at least the small quantity of
intelligence needed to have installed the debugging symbols for the
library I a. But tell me, what do you see when you profile glxgears with
perf?

Thank you,
Steven Stewart-Gallus
Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?
Hi Steven,

On 14 November 2014 01:40, Steven Stewart-Gallus sstewartgallu...@mylangara.bc.ca wrote:
> But my distribution does build Mesa with debugging symbols. I have the
> package libgl1-mesa-dri-dbg installed which gives me debugging symbols
> such as drm_intel_bo_wait_rendering and drm_intel_bo_subdata. I assume
> I don't have is debugging information for JITted code although maybe
> the problem is a bug in perf (they've had problems with artificial
> intrinsics before.) Please assume I have at least the small quantity of
> intelligence needed to have installed the debugging symbols for the
> library I a. But tell me, what do you see when you profile glxgears
> with perf?
>
Let me put things a bit differently: the classic drivers (be that i965
or any other) do _not_ use LLVM. So when you say that there are no debug
symbols for "the JITted code", that does not make sense for most (all?)
people, so they assume the closest thing: that you're missing the debug
symbols. That is why they give you instructions on how to get them
(rebuild mesa).

So can you please tell us what makes you think that, when using the i965
driver, execution goes into LLVM/JITted code? Perhaps some sort of
backtrace (I know some function names will be ??) or a testcase might
shed some light :)

Cheers,
-Emil

> Thank you,
> Steven Stewart-Gallus
Re: [Mesa-dev] [PATCH 4/4] aux/pipe_loader: Don't leak dlerror string on dlopen failure
On Thu, Nov 13, 2014 at 7:04 PM, Ilia Mirkin imir...@alum.mit.edu wrote:
> On Thu, Nov 13, 2014 at 7:54 PM, Aaron Watry awa...@gmail.com wrote:
[...]
>> That being said, I think a simpler fix and probably more correct fix
>> would be to do:
>> dlerror();
>> dlerror();
>
> That seems like it's vastly less likely to do the wrong thing with any
> reasonable implementation. So if it fixes things for glibc, let's do
> that instead :)
>
> Also, I wonder, even if glibc malloc's the string, whether it is truly
> leaked or valgrind just thinks that. May be interesting to pull the
> curtain back and see what's actually going on in glibc...

According to glibc's current source, it does actually malloc the error
string and keep it around in either a static or thread-local struct with
a pointer to the error string, depending on how it's built. There are
comments in the source implying that if a dlopen fails, it's recommended
to then run dlerror() to see why it failed...

If you do happen to dlopen() with failure multiple times in a row, it
only leaks the one error string (it cleans up after itself every time
you call dlopen() before it actually attempts to open the new library).
But in general, that malloc'd string will hang around until the next
dlerror() call or the next time you call another dl*() function. We
could have the same issue with dlsym, dlclose, etc.

https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=dlfcn/dlopen.c;hb=HEAD
https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=dlfcn/dlerror.c;hb=HEAD

So for now, I'll just change this entire patch to call dlerror() twice
instead of getting the error and then explicitly freeing it (which
really was wrong. My bad.). The other file (u_dl.c) will remain as is
(no changes in this patch).

--Aaron
Re: [Mesa-dev] [PATCH v2 13/16] i965: Move fs_visitor ra pass to new fs_visitor::allocate_registers()
On Thu, Nov 13, 2014 at 7:28 PM, Kristian Høgsberg k...@bitplanet.net wrote: This will be reused for the scalar VS pass. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs.cpp | 132 +++ src/mesa/drivers/dri/i965/brw_fs.h | 1 + 2 files changed, 71 insertions(+), 62 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index cb73b9f..4dce0a2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -3538,11 +3538,79 @@ fs_visitor::optimize() lower_uniform_pull_constant_loads(); } +void +fs_visitor::allocate_registers() +{ + bool allocated_without_spills; + + static enum instruction_scheduler_mode pre_modes[] = { + SCHEDULE_PRE, + SCHEDULE_PRE_NON_LIFO, + SCHEDULE_PRE_LIFO, + }; + + /* Try each scheduling heuristic to see if it can successfully register +* allocate without spilling. They should be ordered by decreasing +* performance but increasing likelihood of allocating. +*/ + for (unsigned i = 0; i ARRAY_SIZE(pre_modes); i++) { + schedule_instructions(pre_modes[i]); + + if (0) { + assign_regs_trivial(); + allocated_without_spills = true; + } else { + allocated_without_spills = assign_regs(false); + } + if (allocated_without_spills) + break; + } + + if (!allocated_without_spills) { + /* We assume that any spilling is worse than just dropping back to + * SIMD8. There's probably actually some intermediate point where + * SIMD16 with a couple of spills is still better. + */ + if (dispatch_width == 16) { + fail(Failure to register allocate. Reduce number of + live scalar values to avoid this.); + } else { + perf_debug(Fragment shader triggered register spilling. +Try reducing the number of live scalar values to +improve performance.\n); Hmm, this warning will be pretty confusing once we start hitting this path for vertex shaders as well... + } + + /* Since we're out of heuristics, just go spill registers until we + * get an allocation. 
+ */ + while (!assign_regs(true)) { + if (failed) +break; + } + } + + assert(force_uncompressed_stack == 0); + + /* This must come after all optimization and register allocation, since +* it inserts dead code that happens to have side effects, and it does +* so based on the actual physical registers in use. +*/ + insert_gen4_send_dependency_workarounds(); + + if (failed) + return; + + if (!allocated_without_spills) + schedule_instructions(SCHEDULE_POST); + + if (last_scratch > 0) + prog_data->total_scratch = brw_get_scratch_size(last_scratch); +} + bool fs_visitor::run() { sanity_param_count = prog->Parameters->NumParameters; - bool allocated_without_spills; assign_binding_table_offsets(); @@ -3555,7 +3623,6 @@ fs_visitor::run() emit_dummy_fs(); } else if (brw->use_rep_send && dispatch_width == 16) { emit_repclear_shader(); - allocated_without_spills = true; } else { if (INTEL_DEBUG & DEBUG_SHADER_TIME) emit_shader_time_begin(); @@ -3610,68 +3677,9 @@ fs_visitor::run() assign_curb_setup(); assign_urb_setup(); - static enum instruction_scheduler_mode pre_modes[] = { - SCHEDULE_PRE, - SCHEDULE_PRE_NON_LIFO, - SCHEDULE_PRE_LIFO, - }; - - /* Try each scheduling heuristic to see if it can successfully register - * allocate without spilling. They should be ordered by decreasing - * performance but increasing likelihood of allocating. - */ - for (unsigned i = 0; i < ARRAY_SIZE(pre_modes); i++) { - schedule_instructions(pre_modes[i]); - - if (0) { -assign_regs_trivial(); -allocated_without_spills = true; - } else { -allocated_without_spills = assign_regs(false); - } - if (allocated_without_spills) -break; - } - - if (!allocated_without_spills) { - /* We assume that any spilling is worse than just dropping back to - * SIMD8. There's probably actually some intermediate point where - * SIMD16 with a couple of spills is still better. - */ - if (dispatch_width == 16) { -fail("Failure to register allocate. Reduce number of " - "live scalar values to avoid this."); } else { -perf_debug("Fragment shader triggered register spilling. " - "Try reducing the number of live scalar values to " - "improve performance.\n"); - } - - /*
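A minimal sketch of the control flow that allocate_registers() factors out, assuming heavily simplified, hypothetical stand-ins (toy_visitor, and mock schedule_instructions()/assign_regs() whose success depends only on which scheduling heuristic ran; these are not the real fs_visitor methods). It tries each pre-RA scheduling mode in order, stops at the first spill-free allocation, and otherwise either fails a SIMD16 compile or spills in SIMD8. The trivial-allocation debug path, post-RA scheduling and scratch sizing are left out, and the SIMD16 failure returns early rather than falling through to the spill loop as the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real code uses fs_visitor,
 * instruction_scheduler_mode and the real register allocator. */
enum sched_mode { SCHED_PRE, SCHED_PRE_NON_LIFO, SCHED_PRE_LIFO };

struct toy_visitor {
   int dispatch_width;       /* 8 or 16 */
   int first_mode_that_fits; /* mock: spill-free allocation works from this
                                mode onward; 3 means it never works */
   enum sched_mode mode;     /* last pre-RA scheduling mode applied */
   bool failed;
   bool spilled;
};

static void schedule_instructions(struct toy_visitor *v, enum sched_mode m)
{
   v->mode = m;
}

/* Mock allocator: spill-free allocation succeeds only for "good enough"
 * schedules; once spilling is allowed it always succeeds. */
static bool assign_regs(struct toy_visitor *v, bool allow_spilling)
{
   if ((int)v->mode >= v->first_mode_that_fits)
      return true;
   if (allow_spilling) {
      v->spilled = true;
      return true;
   }
   return false;
}

static void allocate_registers(struct toy_visitor *v)
{
   /* Ordered by decreasing performance, increasing chance of allocating. */
   static const enum sched_mode pre_modes[] = {
      SCHED_PRE, SCHED_PRE_NON_LIFO, SCHED_PRE_LIFO,
   };
   bool allocated_without_spills = false;

   assert(v->dispatch_width == 8 || v->dispatch_width == 16);

   for (unsigned i = 0; i < sizeof(pre_modes) / sizeof(pre_modes[0]); i++) {
      schedule_instructions(v, pre_modes[i]);
      allocated_without_spills = assign_regs(v, false);
      if (allocated_without_spills)
         break;
   }

   if (!allocated_without_spills) {
      if (v->dispatch_width == 16) {
         /* Simplified: any spill is treated as worse than recompiling
          * in SIMD8, so the SIMD16 compile just fails here. */
         v->failed = true;
         return;
      }
      /* Out of heuristics: spill until allocation succeeds. */
      while (!assign_regs(v, true)) {
         if (v->failed)
            break;
      }
   }
}
```

With a mock that only fits under the third heuristic, the loop settles on SCHED_PRE_LIFO without spilling; with one that never fits, a SIMD16 compile fails (leaving the caller to fall back to SIMD8), while a SIMD8 compile spills.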
Re: [Mesa-dev] [PATCH 1/4] r600g/compute: Don't leak cbufs in compute state
On 14.11.2014 08:43, Aaron Watry wrote: Walk the array of cbufs backwards and free all of them. v3: Rebase on top of changes since Aug 2014 Signed-off-by: Aaron Watry awa...@gmail.com --- src/gallium/drivers/r600/evergreen_compute.c | 9 + 1 file changed, 9 insertions(+) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 90fdd79..4334743 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -252,6 +252,15 @@ void evergreen_delete_compute_state(struct pipe_context *ctx, void* state) if (!shader) return; + if (shader->ctx){ + struct pipe_framebuffer_state *fb_state = &shader->ctx->framebuffer.state; + for (int i = fb_state->nr_cbufs - 1; fb_state->nr_cbufs > 0 ; i--){ + shader->ctx->b.b.surface_destroy(ctx, fb_state->cbufs[i]); + fb_state->cbufs[i] = NULL; + fb_state->nr_cbufs--; + } + } I think this is the wrong place to do this. It's the state tracker's responsibility to set an empty framebuffer state, so that the driver can unreference the surfaces of the previous framebuffer state. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
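To illustrate Michel's point, here is a minimal sketch built on assumed, hypothetical types (toy_surface, toy_fb_state, driver_set_framebuffer_state). They are modeled loosely on Gallium's pipe_surface_reference() idiom and the pipe_context::set_framebuffer_state hook, but they are not the real definitions. In this model the driver drops its references to the old color buffers whenever new state is bound, so binding an empty framebuffer state releases them and the compute-state destructor never has to touch cbufs.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CBUFS 8

/* Hypothetical, heavily simplified stand-ins for pipe_surface and
 * pipe_framebuffer_state; not the real Gallium definitions. */
struct toy_surface {
   int refcount;
};

struct toy_fb_state {
   unsigned nr_cbufs;
   struct toy_surface *cbufs[TOY_MAX_CBUFS];
};

/* Modeled loosely on the pipe_surface_reference() idiom: reference the new
 * surface, unreference the old one, then store the new pointer. */
static void surface_reference(struct toy_surface **dst, struct toy_surface *src)
{
   assert(src == NULL || src->refcount > 0);
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--; /* a real driver would destroy it at zero */
   *dst = src;
}

/* Driver side: binding new state releases every surface held by the
 * previously bound state, so binding an empty state releases them all. */
static void driver_set_framebuffer_state(struct toy_fb_state *bound,
                                         const struct toy_fb_state *new_state)
{
   for (unsigned i = 0; i < TOY_MAX_CBUFS; i++) {
      struct toy_surface *s =
         i < new_state->nr_cbufs ? new_state->cbufs[i] : NULL;
      surface_reference(&bound->cbufs[i], s);
   }
   bound->nr_cbufs = new_state->nr_cbufs;
}
```

The state tracker's own references (refcount 1 before binding) survive. Binding the empty state drops only the references the driver took when the framebuffer was bound.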
Re: [Mesa-dev] [PATCH v2 01/16] i965: Don't copy propagate sat MOVs into LOAD_PAYLOAD
On Thu, Nov 13, 2014 at 4:28 PM, Kristian Høgsberg k...@bitplanet.net wrote: The LOAD_PAYLOAD opcode can't saturate its sources, so skip saturating MOVs. The register coalescing after lower_load_payload() will clean up the extra MOVs. Signed-off-by: Kristian Høgsberg k...@bitplanet.net --- src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp index e1989cb..87ea9c2 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp @@ -454,8 +454,12 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry) val.effective_width = inst->src[i].effective_width; switch (inst->opcode) { - case BRW_OPCODE_MOV: case SHADER_OPCODE_LOAD_PAYLOAD: + /* LOAD_PAYLOAD can't sat its sources. */ + if (entry->saturate) +break; + /* Otherwise, fall through */ + case BRW_OPCODE_MOV: inst->src[i] = val; progress = true; break; -- 2.1.0 This looks like the same patch as 01/14 in the previous series. I suggested a better approach there: at the beginning of fs_visitor::try_constant_propagate, if (entry->saturate) return false, or just saturate the argument and proceed. We don't want to be propagating the result of a mov.sat 4.0 into anything without saturating the result first.
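The review suggests two alternatives, and both can be sketched with a hypothetical, simplified copy-propagation entry (toy_acp_entry is not the real acp_entry, and the helpers below are illustrative only). The first refuses to propagate a saturating MOV. The second clamps the immediate to [0, 1] before propagating it, so that propagating mov.sat 4.0 yields 1.0 rather than 4.0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified copy-propagation entry for "dst = mov[.sat] imm";
 * not the real acp_entry. */
struct toy_acp_entry {
   float imm;     /* immediate the MOV copies */
   bool saturate; /* true for mov.sat */
};

/* What .sat does to a float result: clamp to [0, 1] (NaN handling ignored). */
static float saturate_f(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Option 1 from the review: refuse to propagate saturating MOVs. */
static bool can_propagate(const struct toy_acp_entry *entry)
{
   return !entry->saturate;
}

/* Option 2 from the review: saturate the immediate, then propagate it. */
static float propagated_value(const struct toy_acp_entry *entry)
{
   return entry->saturate ? saturate_f(entry->imm) : entry->imm;
}
```

Either option keeps the consumer from seeing the unclamped 4.0. The second preserves the propagation opportunity, at the cost of folding the saturation into the constant.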
Re: [Mesa-dev] [PATCH v2 09/16] i965: Prepare for using the ATTR register file in the fs backend
On Thu, Nov 13, 2014 at 4:28 PM, Kristian Høgsberg k...@bitplanet.net wrote: @@ -3148,6 +3150,9 @@ fs_visitor::dump_instruction(backend_instruction *be_inst, FILE *file) case UNIFORM: fprintf(file, "***u%d***", inst->dst.reg + inst->dst.reg_offset); break; + case ATTR: + fprintf(file, "attr%d", inst->dst.reg + inst->dst.reg_offset); This can't be a destination, so put *** around it like the uniform case above.
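Here is a minimal sketch of the suggested dump convention. It uses a hypothetical enum and formatter rather than the real fs_visitor::dump_instruction(). Register files that should never appear as a destination are wrapped in *** so that a bogus write stands out in the dump. The "vgrf%d" spelling for ordinary registers is an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical subset of the backend register files. */
enum toy_file { TOY_GRF, TOY_UNIFORM, TOY_ATTR };

/* Formats a destination operand. Files that should never be written
 * (UNIFORM, ATTR) are wrapped in *** so a bogus destination stands out in
 * the dump. Returns the number of characters snprintf() wanted to write. */
static int format_dst(char *buf, size_t len, enum toy_file file, int nr)
{
   switch (file) {
   case TOY_UNIFORM:
      return snprintf(buf, len, "***u%d***", nr);
   case TOY_ATTR:
      return snprintf(buf, len, "***attr%d***", nr);
   case TOY_GRF:
   default:
      return snprintf(buf, len, "vgrf%d", nr);
   }
}
```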
Re: [Mesa-dev] How difficult would it be to have debugging information for Jitted code show up?
On 14 November 2014 01:48, Emil Velikov emil.l.veli...@gmail.com wrote: Hi Steven, On 14 November 2014 01:40, Steven Stewart-Gallus sstewartgallu...@mylangara.bc.ca wrote: But my distribution does build Mesa with debugging symbols. I have the package libgl1-mesa-dri-dbg installed, which gives me debugging symbols such as drm_intel_bo_wait_rendering and drm_intel_bo_subdata. I assume what I don't have is debugging information for JITted code, although maybe the problem is a bug in perf (they've had problems with artificial intrinsics before). Please assume I have at least the small quantity of intelligence needed to have installed the debugging symbols for the library I a. But tell me, what do you see when you profile glxgears with perf? Let me put things a bit differently: the classic drivers (be that i965 or any other) do _not_ use LLVM. So when you say that there are no debug symbols for the JITted code, that does not make sense to most (all?) people, so they assume the closest thing, which is that you're missing the debug symbols, and therefore give you instructions on how to get them (rebuild Mesa). So can you please tell us what makes you think that using the i965 driver goes into LLVM/JITted code? Perhaps some sort of backtrace (I know some function names will be ??) or a test case might shed some light :) For anyone interested in or hitting this issue: it seems that the debug package did not provide the relevant files, causing this misunderstanding. Now that I think about it, it seems to be exactly the same issue as the one with Debian's build/packaging system [1]. So, as a note to self: don't trust your distribution to always get it right; double check :P -Emil [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755921
[Mesa-dev] [PATCH] i965: Don't call _mesa_load_state_parameters when nr_params == 0.
Saves a tiny bit of CPU overhead. Signed-off-by: Kenneth Graunke kenn...@whitecape.org --- src/mesa/drivers/dri/i965/brw_vs_surface_state.c | 10 +- src/mesa/drivers/dri/i965/gen6_vs_state.c | 12 ++-- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c index 1cc96cf..4e18c7d 100644 --- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c @@ -59,11 +59,6 @@ brw_upload_pull_constants(struct brw_context *brw, int i; uint32_t surf_index = prog_data->binding_table.pull_constants_start; - /* Updates the ParamaterValues[i] pointers for all parameters of the -* basic type of PROGRAM_STATE_VAR. -*/ - _mesa_load_state_parameters(&brw->ctx, prog->Parameters); - if (!prog_data->nr_pull_params) { if (stage_state->surf_offset[surf_index]) { stage_state->surf_offset[surf_index] = 0; @@ -72,6 +67,11 @@ brw_upload_pull_constants(struct brw_context *brw, return; } + /* Updates the ParamaterValues[i] pointers for all parameters of the +* basic type of PROGRAM_STATE_VAR. +*/ + _mesa_load_state_parameters(&brw->ctx, prog->Parameters); + /* CACHE_NEW_*_PROG | _NEW_PROGRAM_CONSTANTS */ uint32_t size = prog_data->nr_pull_params * 4; drm_intel_bo *const_bo = NULL; diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c index 2427407..1de3c26 100644 --- a/src/mesa/drivers/dri/i965/gen6_vs_state.c +++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c @@ -58,15 +58,15 @@ gen6_upload_push_constants(struct brw_context *brw, { struct gl_context *ctx = &brw->ctx; - /* Updates the ParamaterValues[i] pointers for all parameters of the -* basic type of PROGRAM_STATE_VAR. -*/ - /* XXX: Should this happen somewhere before to get our state flag set?
*/ _mesa_load_state_parameters(ctx, prog->Parameters); - if (prog_data->nr_params == 0) { stage_state->push_const_size = 0; } else { + /* Updates the ParamaterValues[i] pointers for all parameters of the + * basic type of PROGRAM_STATE_VAR. + */ + /* XXX: Should this happen somewhere before to get our state flag set? */ + _mesa_load_state_parameters(ctx, prog->Parameters); + gl_constant_value *param; int i; -- 2.1.3
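The patch's idea is to check for an empty parameter list before paying for the state-parameter walk. Here it is in a minimal sketch with hypothetical names: load_state_parameters() only counts calls and stands in for _mesa_load_state_parameters(), and the push-constant size is a placeholder rather than the real register-based sizing. The early return mirrors the brw_upload_pull_constants() hunk, while the gen6 hunk gets the same effect by moving the call into the else branch.

```c
#include <assert.h>

/* Hypothetical stand-in for _mesa_load_state_parameters(): it only counts
 * calls, since the cost being avoided is the walk over the parameter list. */
static int load_calls;

static void load_state_parameters(void)
{
   load_calls++;
}

struct toy_stage_state {
   int push_const_size;
};

/* Skip the parameter walk entirely when there is nothing to upload. */
static void upload_push_constants(struct toy_stage_state *stage,
                                  unsigned nr_params)
{
   if (nr_params == 0) {
      stage->push_const_size = 0;
      return;
   }
   load_state_parameters();
   stage->push_const_size = (int)nr_params; /* placeholder for real sizing */
}
```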