Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding v2.
On Thu, Jul 24, 2014 at 8:07 PM, Alex Deucher alexdeuc...@gmail.com wrote:
On Thu, Jul 24, 2014 at 6:28 PM, j.gli...@gmail.com wrote:

From: Jerome Glisse jgli...@redhat.com

The ucode we got for hawaii does not support the 0xffff1000 special nop
packet type 3 and this leads to the gpu reading invalid memory. As packet
type 2 still exists, just use packet type 2.

Note this only partially fixes hawaii issues and some zbuffer tiling
issues are still present.

Changed since v1:
  - use packet type 2 instead of packet 3.

We don't need this change if we use the updated firmware in my 3.17
tree. Looks like the original hawaii CP ucode didn't support the new
0xffff1000 special case NOP packet.

I would rather have the nop2 packet solution than yet another "is accel
working". Several reasons:
  - 3.16 will be out soon and has the most important fix
  - the nop2 packet can easily be backported to stable mesa
  - testing for 3.16 is easy

So I think it would be cleaner to just do nop2 and 3.16.

Alex

Signed-off-by: Jérôme Glisse jgli...@redhat.com
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index a06ecb2..9ac7d0e 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -447,13 +447,8 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs *rcs,
             /* pad DMA ring to 8 DWs to meet CP fetch alignment requirements
              * r6xx, requires at least 4 dw alignment to avoid a hw bug.
              */
-            if (cs->ws->info.chip_class <= SI) {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
-            } else {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0xffff1000); /* type3 nop packet */
-            }
+            while (rcs->cdw & 7)
+                OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
             break;
         case RING_UVD:
             while (rcs->cdw & 15)
--
1.8.3.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
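
For readers following the padding logic, here is a minimal sketch of what the patched loop does, outside of Mesa. It assumes the usual PM4 encoding in which the packet type sits in bits 31:30, so a type-2 NOP is a single dword with value 0x80000000. The `struct cmd_stream` type and the `pad_with_type2_nops` helper are hypothetical stand-ins for the winsys command stream and `OUT_CS`, not Mesa API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Type-2 packet: bits 31:30 == 2, no payload (assumed PM4 encoding). */
#define PKT2_NOP 0x80000000u

/* Hypothetical stand-in for the winsys command stream. */
struct cmd_stream {
    uint32_t buf[64];
    unsigned cdw;               /* number of dwords written so far */
};

/* Append type-2 NOPs until cdw is a multiple of `align`, which must be a
 * power of two. Returns the number of padding dwords written. */
static unsigned pad_with_type2_nops(struct cmd_stream *cs, unsigned align)
{
    unsigned added = 0;

    assert(align != 0 && (align & (align - 1)) == 0);
    while (cs->cdw & (align - 1)) {
        assert(cs->cdw < sizeof(cs->buf) / sizeof(cs->buf[0]));
        cs->buf[cs->cdw++] = PKT2_NOP;
        added++;
    }
    return added;
}
```

Because every type-2 NOP is exactly one dword, any gap from 1 to `align - 1` dwords can be filled without special cases, which is what makes this variant easy to backport. Calling `pad_with_type2_nops(&cs, 8)` mirrors the GFX case in the diff and `pad_with_type2_nops(&cs, 16)` the UVD case.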
Re: [Mesa-dev] [PATCH] winsys/radeon: fix nop packet padding.
On Thu, Jul 24, 2014 at 05:42:21PM -0400, j.gli...@gmail.com wrote:

From: Jerome Glisse jgli...@redhat.com

The gpu packet prefetcher hates the ugly big nop packet; it leads to
prefetching invalid memory in some cases. Apparently hawaii is
particularly sensitive to this.

Note this only partially fixes hawaii issues and some zbuffer tiling
issues are still present.

Just to clarify, this patch is almost good to go. There is the
cs[MAX_DW-1] case that needs fixing and I am pondering how to do that.
Also, I have not tested on bonaire, but I expect that it should only fix
things and not break things.

Cheers,
Jérôme

Signed-off-by: Jérôme Glisse jgli...@redhat.com
---
 src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
index a06ecb2..502a550 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.c
@@ -451,8 +451,22 @@ static void radeon_drm_cs_flush(struct radeon_winsys_cs *rcs,
                 while (rcs->cdw & 7)
                     OUT_CS(&cs->base, 0x80000000); /* type2 nop packet */
             } else {
-                while (rcs->cdw & 7)
-                    OUT_CS(&cs->base, 0xffff1000); /* type3 nop packet */
+                switch (rcs->cdw & 7) {
+                case 0:
+                    break;
+                case 7:
+                    /* FIXME can this be bad if we are at cs[LAST_DW-1] ? Need to
+                     * think of something.
+                     */
+                    OUT_CS(&cs->base, 0xc0001000);
+                    OUT_CS(&cs->base, 0xcafedead);
+                    /* Note we fallthrough as this will add another 7 dwords */
+                default:
+                    OUT_CS(&cs->base, 0xc0001000 | (((8 - (rcs->cdw & 7)) - 1) << 16));
+                    while (rcs->cdw & 7) {
+                        OUT_CS(&cs->base, 0xcafedead);
+                    }
+                }
             }
             break;
         case RING_UVD:
--
1.8.3.1
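
The constants in this patch are easier to read once the type-3 header is split into fields. The sketch below assumes the r600-family layout used by Mesa's `PKT3()` macro: packet type in bits 31:30, a 14-bit count in bits 29:16 that holds the number of payload dwords minus one, and the opcode in bits 15:8, with NOP being opcode 0x10. The helper names `pkt3_header` and `pkt3_nop_header` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PKT3_NOP_OPCODE 0x10u

/* Build a type-3 header. `count` is the number of payload dwords minus one
 * (assumed convention, as in Mesa's PKT3() macro). */
static uint32_t pkt3_header(uint32_t opcode, uint32_t count)
{
    assert(opcode <= 0xffu && count <= 0x3fffu);
    return (3u << 30) | (count << 16) | (opcode << 8);
}

/* Header for a NOP that occupies `total_dw` dwords including the header.
 * A type-3 packet carries at least one payload dword, so total_dw >= 2. */
static uint32_t pkt3_nop_header(unsigned total_dw)
{
    assert(total_dw >= 2);
    return pkt3_header(PKT3_NOP_OPCODE, total_dw - 2);
}
```

Under these assumptions `pkt3_header(0x10, 0)` gives 0xc0001000, the base value in the diff, and the largest count, 0x3fff, gives 0xffff1000. The two-dword minimum is also why a one-dword gap (`cdw & 7 == 7`) cannot be filled by a single type-3 NOP and needs the separate `case 7` above.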
Re: [Mesa-dev] rules for merging patches to libdrm
On Mon, Nov 18, 2013 at 05:41:50PM +0100, Thierry Reding wrote:
On Mon, Nov 18, 2013 at 11:21:36AM -0500, Rob Clark wrote:
On Mon, Nov 18, 2013 at 10:23 AM, Thierry Reding thierry.red...@gmail.com wrote:
On Mon, Nov 18, 2013 at 10:17:47AM -0500, Rob Clark wrote:
On Mon, Nov 18, 2013 at 8:29 AM, Thierry Reding thierry.red...@gmail.com wrote:
On Sat, Nov 09, 2013 at 01:26:24PM -0800, Ian Romanick wrote:
On 11/09/2013 12:11 AM, Dave Airlie wrote:

How does this interact with the rule that kernel interfaces require an
open source userspace? Is "here are the mesa/libdrm patches that use it"
sufficient to get the kernel interface merged?

That's my understanding: open source userspace needs to exist, but it
doesn't need to be merged upstream yet.

Having an opensource userspace and having it committed to a final repo
are different things. As Daniel said, patches on the mesa-list were
sufficient; there was no hurry to merge them considering a kernel release
with the code wasn't close, esp with a 3 month release window if the
kernel merge window is close to that anyways.

libdrm is easy to change and its releases are cheap. What problem does
committing code that uses an in-progress kernel interface to libdrm
cause? I guess I'm not understanding something.

Releases are cheap, but ABI breaks aren't, so you can't just go release a
libdrm with an ABI for mesa and then decide later it was a bad plan.

Introducing new kernel API usually involves assigning numbers for things:
a new ioctl number, new #defines for bitfield members, and so on.
Multiple patches can be in flight at the same time. For example, Abdiel
and I both defined execbuf2 flags:

#define I915_EXEC_RS (1 << 13)   (Abdiel's code)
#define I915_EXEC_OA (1 << 13)   (my code)

These obviously conflict. One of the two will land, and the second patch
author will need to switch to (1 << 14) and resubmit.
If we both decide to push to libdrm, we might get the order backwards, or
maybe one series won't get pushed at all (in this case, I'm planning to
drop my patch). Waiting until one lands in the kernel avoids that
problem. Normally, I believe we copy the kernel headers to userspace and
fix them up a bit. Dave may have other reasons; this is just the one I
thought of.

But mostly this. We've been stung by this exact thing happening before,
and we made the process to stop it from happening again.

Then in all honesty, commits to libdrm should be controlled by either a
single person or a small cabal... just like the kernel and the xserver.
We're clearly in an uncomfortable middle area where we have a stringent
set of restrictions but no way to actually enforce them.

That doesn't sound like a bad idea at all. It obviously causes more work
for whoever will be the gatekeeper(s). It seems to me that libdrm is
currently more of a free-for-all type of project, and whoever merges some
new feature required for a particular X or Mesa driver cuts a new release
so that the version number can be used to track the dependency.

I wonder if perhaps tying the libdrm releases more tightly to Linux
kernel releases would help. Since there already is a requirement for new
kernel APIs to be merged before the libdrm equivalent can be merged,
having both release cycles in lockstep makes some sense.

Not sure that strictly tying it to kernel releases would be ideal. Not
*everything* in libdrm is about new kernel APIs. It tends to be the place
for things needed both by the xorg ddx and the mesa driver, which I
suppose is why it ends up a bit of a free-for-all.

I didn't mean that every release would need to be tied to the Linux
kernel. But whenever a new Linux kernel release was made, relevant
changes from the public headers could be pulled into libdrm and a release
be made. I could even imagine a matching of version numbers.
libdrm releases could be numbered using the same major and minor as the
Linux kernels that they support. Micro version numbers could be used in
intermediate releases.

Maybe an update-kernel-headers.sh script to grab the headers from
drm-next and update libdrm wouldn't be a bad idea?

Perhaps. But I think it could even be a manual step. It's not something
that one person should be doing alone, but rather something that driver
maintainers should be doing, since they know best what will be needed in
a new version of libdrm. Like I mentioned in another subthread, I think a
subtree-oriented model could work well.

Thierry

Please stop asking for more process bureaucracy. The libdrm development
model works fine. Everyone
Re: [Mesa-dev] Update: UVD status on loongson 3a platform
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:

Hi all,

This thread is about
http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.

We recently found some interesting things about UVD based playback on the
loongson 3a platform, and also found a way to fix the problem.

First, we found that memcpy in
[mesa]src/gallium/drivers/radeon/radeon_uvd.c caused the problem:

* If memcpy is implemented through 16B or 8B load/store instructions, it
  will normally cause video mosaic. When a memcmp is inserted after the
  copying code in memcpy, it reports that src and dest are not equal.
* If memcpy uses 1B load/store instructions only, the memcmp after the
  copying code reports equal.

Then we found that the following changeset fixes our problem:

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 2f98de2..f9599b6 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
                           unsigned size)
 {
     buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
-                                         RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
+                                         RADEON_DOMAIN_GTT);
     if (!buffer->buf)
         return false;

The VRAM is mapped to an uncached area on our platform, so my question
is: what could go wrong while using 4B load/store instructions in the UVD
workflow? Any idea?

How do you map the VRAM into the user process mapping? I.e. do you have
something like Intel PAT, or something like MTRR, or something else? In
other words, can you map a region of io memory (GPU VRAM in this case)
into the process address space and mark it as uncached so that none of
the accesses to it go through the CPU cache?

Cheers,
Jerome
Re: [Mesa-dev] Update: UVD status on loongson 3a platform
On Thu, Sep 05, 2013 at 03:29:52PM -0400, Jerome Glisse wrote:
On Thu, Sep 05, 2013 at 10:14:32PM +0800, Chen Jie wrote:

Hi all,

This thread is about
http://lists.freedesktop.org/archives/dri-devel/2013-April/037598.html.

We recently found some interesting things about UVD based playback on the
loongson 3a platform, and also found a way to fix the problem.

First, we found that memcpy in
[mesa]src/gallium/drivers/radeon/radeon_uvd.c caused the problem:

* If memcpy is implemented through 16B or 8B load/store instructions, it
  will normally cause video mosaic. When a memcmp is inserted after the
  copying code in memcpy, it reports that src and dest are not equal.
* If memcpy uses 1B load/store instructions only, the memcmp after the
  copying code reports equal.

Then we found that the following changeset fixes our problem:

diff --git a/src/gallium/drivers/radeon/radeon_uvd.c b/src/gallium/drivers/radeon/radeon_uvd.c
index 2f98de2..f9599b6 100644
--- a/src/gallium/drivers/radeon/radeon_uvd.c
+++ b/src/gallium/drivers/radeon/radeon_uvd.c
@@ -162,7 +162,7 @@ static bool create_buffer(struct ruvd_decoder *dec,
                           unsigned size)
 {
     buffer->buf = dec->ws->buffer_create(dec->ws, size, 4096, false,
-                                         RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM);
+                                         RADEON_DOMAIN_GTT);
     if (!buffer->buf)
         return false;

The VRAM is mapped to an uncached area on our platform, so my question
is: what could go wrong while using 4B load/store instructions in the UVD
workflow? Any idea?

How do you map the VRAM into the user process mapping? I.e. do you have
something like Intel PAT, or something like MTRR, or something else? In
other words, can you map a region of io memory (GPU VRAM in this case)
into the process address space and mark it as uncached so that none of
the accesses to it go through the CPU cache?

Cheers,
Jerome

Also, it might be that you can't do write combining on your platform,
which would be a major drawback as it's assumed by radeon userspace.
I would need to check the pcie specification, but write combining is
probably not mandatory, meaning that your architecture might not have it.
This would explain why only memcpy with byte-sized copies works. I don't
think there is any easy way to work around that.

Cheers,
Jerome
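
The experiment described in this thread (copy with 1-byte stores only, then compare) can be reproduced with a small helper. This is a minimal sketch with hypothetical names; it assumes `dst` may point at a mapping where wide or combined stores misbehave, and uses `volatile` so the compiler does not merge the byte stores or replace the loop with a library `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes using only 1-byte stores. The volatile qualifier keeps the
 * compiler from widening the stores or substituting memcpy. */
static void copy_bytewise(volatile unsigned char *dst,
                          const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Copy and read back. Returns 1 when dst matches src, 0 otherwise. */
static int copy_and_verify(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    copy_bytewise(dst, src, n);
    return memcmp(dst, src, n) == 0;
}
```

On ordinary cached memory this always reports a match. It only becomes a diagnostic when `dst` is the buffer mapping under suspicion, and even then the `memcmp` readback goes through the same mapping, so a match shows the CPU view is consistent rather than proving what the GPU sees.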
Re: [Mesa-dev] SIGFPE in libdrm_radeon on evergreen
On Mon, May 20, 2013 at 5:13 AM, Vadim Girlin vadimgir...@gmail.com wrote:
On 05/20/2013 11:27 AM, Dragomir Ivanov wrote:

0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848,
    level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0)

It looks like division by 0. tile_split=0 from the call site.

Yes, I'm just not sure why tile_split is 0 here and what the best way to
fix it is. Possibly this is in fact a consequence of some problem in
r600g, not in libdrm. Though libdrm should probably handle it more
gracefully anyway.

Vadim

Just a guess: the ddx is not properly setting tile split on a surface,
and then r600g calls in trying to rebuild the miptree ... I think I fixed
this issue in the ddx a couple of months ago, but maybe I did not.

Cheers,
Jerome

On Mon, May 20, 2013 at 4:11 AM, Vadim Girlin vadimgir...@gmail.com wrote:

Reduced test app attached and below is the gdb backtrace. I suspect
something is not initialized properly but I'm not very familiar with this
code.

Vadim

Program received signal SIGFPE, Arithmetic exception.
0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0) at radeon_surface.c:651
651         slice_pt = tileb / tile_split;
#0  0x769058d7 in eg_surface_init_2d (surf_man=0x633ea0, surf=0x88d848, level=0x88dea8, bpe=1, tile_split=0, offset=65536, start_level=0) at radeon_surface.c:651
#1  0x76905eea in eg_surface_init_2d_miptrees (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:807
#2  0x76906062 in eg_surface_init (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:863
#3  0x76907fe6 in radeon_surface_init (surf_man=0x633ea0, surf=0x88d848) at radeon_surface.c:1901
#4  0x7713260b in radeon_drm_winsys_surface_init (rws=0x6339a0, surf=0x88d848) at radeon_drm_winsys.c:477
#5  0x770a3e1c in r600_setup_surface (screen=0x6340d0, rtex=0x88d760, pitch_in_bytes_override=0) at r600_texture.c:203
#6  0x770a4774 in r600_texture_create_object (screen=0x6340d0, base=0x7fffd6d0, pitch_in_bytes_override=0, buf=0x0, surface=0x7fffc8e0) at r600_texture.c:432
#7  0x770a5268 in r600_texture_create (screen=0x6340d0, templ=0x7fffd6d0) at r600_texture.c:607
#8  0x7708a5bd in r600_resource_create (screen=0x6340d0, templ=0x7fffd6d0) at r600_resource.c:38
#9  0x77125579 in dri2_drawable_process_buffers (drawable=0x88af80, buffers=0x88aea0, buffer_count=1, atts=0x88b628, att_count=2) at dri2.c:283
#10 0x7712590a in dri2_allocate_textures (drawable=0x88af80, statts=0x88b628, statts_count=2) at dri2.c:404
#11 0x77123e6a in dri_st_framebuffer_validate (stfbi=0x88af80, statts=0x88b628, count=2, out=0x7fffd840) at dri_drawable.c:81
#12 0x76e461c1 in st_framebuffer_validate (stfb=0x88b1e0, st=0x883870) at ../../src/mesa/state_tracker/st_manager.c:193
#13 0x76e472a8 in st_api_make_current (stapi=0x7761b9e0 <st_gl_api>, stctxi=0x883870, stdrawi=0x88af80, streadi=0x88af80) at ../../src/mesa/state_tracker/st_manager.c:721
#14 0x77122ce8 in dri_make_current (cPriv=0x7fdb70, driDrawPriv=0x88af40, driReadPriv=0x88af40) at dri_context.c:255
#15 0x76c6ba1f in driBindContext (pcp=0x7fdb70, pdp=0x88af40, prp=0x88af40) at ../../../../src/mesa/drivers/dri/common/dri_util.c:382
#16 0x77dc57e3 in dri2_bind_context (context=0x7fd9d0, old=0x616650, draw=67108873, read=67108873) at dri2_glx.c:172
#17 0x77d8c253 in MakeContextCurrent (dpy=0x602040, draw=67108873, read=67108873, gc_user=0x7fd9d0) at glxcurrent.c:269
#18 0x00384e82713c in fgOpenWindow () from /lib64/libglut.so.3
#19 0x00384e825afa in fgCreateWindow () from /lib64/libglut.so.3
#20 0x00384e825b95 in fgCreateMenu () from /lib64/libglut.so.3
#21 0x00384e823cd3 in glutCreateMenu () from /lib64/libglut.so.3
#22 0x00400816 in main (argc=1, argv=0x7fffdf18) at test.c:17
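
Vadim's remark that libdrm should handle `tile_split == 0` more gracefully can be illustrated with a guarded version of the faulting line `slice_pt = tileb / tile_split;`. This is a sketch under an assumption, not the fix that was applied: it treats an unset tile split as "one slice per tile", and `slices_per_tile` is a hypothetical helper name.

```c
#include <assert.h>

/* Slices per tile for a tile of `tileb` bytes. Falls back to 1 when
 * tile_split is zero (unset) or when the tile already fits in one split,
 * so the division can never trap and never yields 0. */
static unsigned slices_per_tile(unsigned tileb, unsigned tile_split)
{
    if (tile_split == 0 || tileb <= tile_split)
        return 1;
    return tileb / tile_split;
}
```

A guard like this only hides the symptom at the point of the crash. If the ddx or r600g is handing over an uninitialized tile split, as Jerome guesses, the surface layout may still be wrong and the caller needs fixing as well.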
Re: [Mesa-dev] [PATCH 3/3] r600g: Don't set the dest cache bits on surface sync for R600_CONTEXT_FLUSH_AND_INV
On Wed, May 1, 2013 at 1:23 PM, Marek Olšák mar...@gmail.com wrote:

This is a funny subject. Originally, we only used SURFACE_SYNC on
Evergreen, which is what the hw guys recommend using, but then Jerome
came and rewrote it with no reasonable argument to back it up (what he
was trying to fix by his cache-flush rework is not fixed to this day),
such that we now flush and invalidate more caches than we need.

I guess fixing lockup is not reasonable.

Jerome

FLUSH_AND_INV isn't recommended, because it should be slower in theory
when streamout is used. Frequent changes of streamout buffers would also
flush and invalidate the framebuffer cache, which is undesirable.
Unfortunately, I don't know of any apps using streamout.

This patch looks good. However, once we start seeing apps taking full
advantage of GL3 and GL4, we will have to switch back to SURFACE_SYNC at
least for graphics.

Marek

On Fri, Apr 26, 2013 at 7:21 PM, Tom Stellard t...@stellard.net wrote:

From: Tom Stellard thomas.stell...@amd.com

We are already emitting a EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet
when this flush flag is set, so flushing the dest caches with a
SURFACE_SYNC should not be necessary.

The motivation for this change is that emitting a SURFACE_SYNC packet
with the CB bits set was causing compute shaders to hang on Cayman.
---
 src/gallium/drivers/r600/r600_hw_context.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index b4fb3bf..8aebd25 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -231,21 +231,19 @@ void r600_flush_emit(struct r600_context *rctx)
         cs->buf[cs->cdw++] = PKT3(PKT3_EVENT_WRITE, 0, 0);
         cs->buf[cs->cdw++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT) | EVENT_INDEX(0);
         if (rctx->chip_class >= EVERGREEN) {
-            cp_coher_cntl = S_0085F0_CB0_DEST_BASE_ENA(1) |
-                S_0085F0_CB1_DEST_BASE_ENA(1) |
-                S_0085F0_CB2_DEST_BASE_ENA(1) |
-                S_0085F0_CB3_DEST_BASE_ENA(1) |
-                S_0085F0_CB4_DEST_BASE_ENA(1) |
-                S_0085F0_CB5_DEST_BASE_ENA(1) |
-                S_0085F0_CB6_DEST_BASE_ENA(1) |
-                S_0085F0_CB7_DEST_BASE_ENA(1) |
-                S_0085F0_CB8_DEST_BASE_ENA(1) |
-                S_0085F0_CB9_DEST_BASE_ENA(1) |
-                S_0085F0_CB10_DEST_BASE_ENA(1) |
-                S_0085F0_CB11_DEST_BASE_ENA(1) |
-                S_0085F0_DB_DEST_BASE_ENA(1) |
-                S_0085F0_TC_ACTION_ENA(1) |
-                S_0085F0_CB_ACTION_ENA(1) |
+            /* We were previously setting the CB and DB bits on
+             * cp_coher_cntl, but this is unnecessary since
+             * we are emitting the
+             * EVENT_TYPE_CACHE_FLUSH_AND_INV_EVENT packet.
+             * Setting the CB bits was causing lockups when using
+             * compute on cayman.
+             *
+             * XXX: Do even need to emit a surface sync packet here?
+             * Prior to e5e4c07e7964a3258ed02b530bcdc24c0650204b
+             * surface sync was not being emitted with the
+             * R600_CONTEXT_FLUSH_AND_INV flag.
+             */
+            cp_coher_cntl = S_0085F0_TC_ACTION_ENA(1) |
                 S_0085F0_DB_ACTION_ENA(1) |
                 S_0085F0_SH_ACTION_ENA(1) |
                 S_0085F0_SMX_ACTION_ENA(1) |
--
1.8.1.5
Re: [Mesa-dev] [PATCH] r600g: use CP DMA for buffer clears on evergreen+
On Wed, Apr 24, 2013 at 3:15 PM, alexdeuc...@gmail.com wrote:

From: Alex Deucher alexander.deuc...@amd.com

Lighter weight than using streamout. Only evergreen and newer asics
support embedded data as src with CP DMA.

Signed-off-by: Alex Deucher alexander.deuc...@amd.com

Reviewed-by: Jerome Glisse jgli...@redhat.com

---
 src/gallium/drivers/r600/evergreen_hw_context.c | 66 +++
 src/gallium/drivers/r600/evergreend.h           | 42 ++
 src/gallium/drivers/r600/r600_blit.c            | 10 +++-
 src/gallium/drivers/r600/r600_pipe.h            |  3 +
 4 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c
index d980c18..7cab879 100644
--- a/src/gallium/drivers/r600/evergreen_hw_context.c
+++ b/src/gallium/drivers/r600/evergreen_hw_context.c
@@ -106,3 +106,69 @@ void evergreen_dma_copy(struct r600_context *rctx,
     util_range_add(&rdst->valid_buffer_range, dst_offset,
                    dst_offset + size);
 }
+
+/* The max number of bytes to copy per packet. */
+#define CP_DMA_MAX_BYTE_COUNT ((1 << 21) - 8)
+
+void evergreen_cp_dma_clear_buffer(struct r600_context *rctx,
+                                   struct pipe_resource *dst, uint64_t offset,
+                                   unsigned size, uint32_t clear_value)
+{
+    struct radeon_winsys_cs *cs = rctx->rings.gfx.cs;
+
+    assert(size);
+    assert(rctx->screen->has_cp_dma);
+
+    offset += r600_resource_va(&rctx->screen->screen, dst);
+
+    /* We flush the caches, because we might read from or write
+     * to resources which are bound right now. */
+    rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES |
+                   R600_CONTEXT_FLUSH_AND_INV |
+                   R600_CONTEXT_FLUSH_AND_INV_CB_META |
+                   R600_CONTEXT_FLUSH_AND_INV_DB_META |
+                   R600_CONTEXT_STREAMOUT_FLUSH |
+                   R600_CONTEXT_WAIT_3D_IDLE;
+
+    while (size) {
+        unsigned sync = 0;
+        unsigned byte_count = MIN2(size, CP_DMA_MAX_BYTE_COUNT);
+        unsigned reloc;
+
+        r600_need_cs_space(rctx, 10 + (rctx->flags ? R600_MAX_FLUSH_CS_DWORDS : 0), FALSE);
+
+        /* Flush the caches for the first copy only. */
+        if (rctx->flags) {
+            r600_flush_emit(rctx);
+        }
+
+        /* Do the synchronization after the last copy, so that all data is written to memory. */
+        if (size == byte_count) {
+            sync = PKT3_CP_DMA_CP_SYNC;
+        }
+
+        /* This must be done after r600_need_cs_space. */
+        reloc = r600_context_bo_reloc(rctx, &rctx->rings.gfx,
+                                      (struct r600_resource*)dst, RADEON_USAGE_WRITE);
+
+        r600_write_value(cs, PKT3(PKT3_CP_DMA, 4, 0));
+        r600_write_value(cs, clear_value);  /* DATA [31:0] */
+        r600_write_value(cs, sync | PKT3_CP_DMA_SRC_SEL(2));  /* CP_SYNC [31] | SRC_SEL[30:29] */
+        r600_write_value(cs, offset);  /* DST_ADDR_LO [31:0] */
+        r600_write_value(cs, (offset >> 32) & 0xff);  /* DST_ADDR_HI [7:0] */
+        r600_write_value(cs, byte_count);  /* COMMAND [29:22] | BYTE_COUNT [20:0] */
+
+        r600_write_value(cs, PKT3(PKT3_NOP, 0, 0));
+        r600_write_value(cs, reloc);
+
+        size -= byte_count;
+        offset += byte_count;
+    }
+
+    /* Invalidate the read caches. */
+    rctx->flags |= R600_CONTEXT_INVAL_READ_CACHES;
+
+    util_range_add(&r600_resource(dst)->valid_buffer_range, offset,
+                   offset + size);
+}
+
diff --git a/src/gallium/drivers/r600/evergreend.h b/src/gallium/drivers/r600/evergreend.h
index 53b68a4..5d72432 100644
--- a/src/gallium/drivers/r600/evergreend.h
+++ b/src/gallium/drivers/r600/evergreend.h
@@ -118,6 +118,48 @@
 #define PKT3_PREDICATE(x) (((x) >> 0) & 0x1)
 #define PKT0(index, count) (PKT_TYPE_S(0) | PKT0_BASE_INDEX_S(index) | PKT_COUNT_S(count))

+#define PKT3_CP_DMA 0x41
+/* 1. header
+ * 2. SRC_ADDR_LO [31:0] or DATA [31:0]
+ * 3. CP_SYNC [31] | SRC_SEL [30:29] | ENGINE [27] | DST_SEL [21:20] | SRC_ADDR_HI [7:0]
+ * 4. DST_ADDR_LO [31:0]
+ * 5. DST_ADDR_HI [7:0]
+ * 6. COMMAND [29:22] | BYTE_COUNT [20:0]
+ */
+#define PKT3_CP_DMA_CP_SYNC (1 << 31)
+#define PKT3_CP_DMA_SRC_SEL(x) ((x) << 29)
+/* 0 - SRC_ADDR
+ * 1 - GDS (program SAS to 1 as well)
+ * 2 - DATA
+ */
+#define PKT3_CP_DMA_DST_SEL(x) ((x) << 20)
+/* 0 - DST_ADDR
+ * 1 - GDS (program DAS to 1 as well)
+ */
+/* COMMAND */
+#define PKT3_CP_DMA_CMD_SRC_SWAP(x) ((x) << 23)
+/* 0
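
The chunking in `evergreen_cp_dma_clear_buffer` can be looked at in isolation: a clear of `size` bytes is split into packets of at most `CP_DMA_MAX_BYTE_COUNT` bytes, and only the last packet carries the sync bit. Below is a minimal sketch of just that arithmetic. The `struct dma_chunk` type and the `split_clear` function are hypothetical; nothing here touches a command stream.

```c
#include <assert.h>
#include <stdint.h>

/* Same limit as the patch: BYTE_COUNT is a 21-bit field. */
#define CP_DMA_MAX_BYTE_COUNT ((1u << 21) - 8)

/* Hypothetical description of one CP DMA packet. */
struct dma_chunk {
    uint64_t dst;          /* destination address of this chunk */
    uint32_t byte_count;   /* bytes cleared by this chunk */
    int      sync;         /* nonzero only on the final chunk */
};

/* Split [offset, offset + size) into chunks. Fills at most `max_chunks`
 * entries of `out` and returns the number of chunks the range needs. */
static unsigned split_clear(uint64_t offset, uint64_t size,
                            struct dma_chunk *out, unsigned max_chunks)
{
    unsigned n = 0;

    while (size) {
        uint32_t count = size < CP_DMA_MAX_BYTE_COUNT
                       ? (uint32_t)size : CP_DMA_MAX_BYTE_COUNT;
        if (n < max_chunks) {
            out[n].dst = offset;
            out[n].byte_count = count;
            out[n].sync = (size == count);
        }
        n++;
        size -= count;
        offset += count;
    }
    return n;
}
```

The sketch takes `offset` and `size` by value, so the caller's copies survive the loop. That matters for any bookkeeping done afterwards: once the loop finishes, the local `size` is zero and the local `offset` points past the end of the range, so a range update placed after the loop would want the original values.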
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2
On Wed, Mar 27, 2013 at 4:45 AM, Christian König deathsim...@vodafone.de wrote:
On 27.03.2013 01:43, Jerome Glisse wrote:
On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie airl...@gmail.com wrote:

correctly). But Marek is quite right that this only counts for state
objects and makes no sense for set_* and draw_* calls (and I'm currently
thinking about how to avoid that and can't come up with a proper
solution). Anyway, it's definitely not an urgent problem for radeonsi.

It will be a problem once we actually start caring about performance
and, most importantly, the CPU overhead of the driver.

I still think that writing into the command buffers directly (e.g.
without wrapper functions) is a bad idea, because that leads to mixing
driver logic and

I'm convinced the exact opposite is a bad idea, because it adds another
layer all commands must go through. A layer which brings no advantage.
Think about apps which issue 1k-10k draw calls per frame. It's obvious
that every byte moved around counts and the key to high framerate is to
do (almost) nothing in the driver. It looks like the idea here is to make
the driver as slow as possible.

packet building in r600g. For example, just try to figure out how the
relocations in NOPs work by reading the source (please keep in mind that
one of the primary reasons why AMD is supporting this driver is to give
good example code for customers who want to implement that stuff on
their own systems).

I'm shocked. Sacrificing performance in the name of making the code
nicer for some customers? Seriously? I thought the plan was to make the
best graphics driver ever.

Well, maybe I'm repeating myself: performance is not a priority, it's
only nice to have! Sorry to say so, but if we sacrifice a bit of
performance for more code readability then that is perfectly ok with me
(don't get me wrong, I would really prefer to replace the closed source
driver today rather than tomorrow, it's unfortunately just not what I'm
paid for). On the other hand, we are talking about perfectly optimizable
inline functions and/or macros. All I'm saying is that we should
structure the code a bit more.

It's okay to take steps in the right direction, but if you start taking
steps away from performance in lieu of code readability then please be
prepared to deal with objections. The thing is, in a lot of cases code
readability is in the eye of the beholder. I'm sure Jerome thought r600g
was perfectly readable when he wrote it, but a lot of us didn't and spent
a lot of time trying to remove the CPU overheads, not least the amount of
time Marek spent. The thing is, performance is measurable, code
readability isn't.

Dave.

Maybe once again you forgot why I did things the way I did them. I
explained myself to you back then: I designed r600g for a new kernel api
which was violently different from the cs one. My hope was that the
other kernel api would be better; it was not, and I never pushed more on
that front. So the r600g design was definitely not adapted to the cs
ioctl and not designed with it in mind. History often explains a lot of
things and people seem to forget about it.

That being said, I too find the code readability argument ironic. If one
understands the cs ioctl then the r600g code as it is nowadays makes
sense, but the radeonsi code is closer to what r600g used to be. So
assuming the same ioctl, I would say that radeonsi should move towards
what r600g is nowadays.

Anyway, I just wanted to set history straight.

Well, I think you hit the point here quite well. May I ask what your
kernel interface would have looked like?

Christian.

I used to have a branch on fdo with it. Basically, what used to be
r600_hw_context was a nop in gallium and you had state in the kernel (cb,
db, sampler view, sampler, ...). You created the states and then bound
them, so everything was mostly security checks at creation time, and
bind time was pretty quick. It was also transaction based. Relocation was
easier too.
Anyway, it was a bad API. I know that in a closed world or a more obscure
stack you can have a kernel api that doesn't do much security checking
and call it a day, which gives you a lot more freedom on the api.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] winsys/radeon: add command stream replay dump for faulty lockup
On Wed, Mar 27, 2013 at 11:27 AM, j.gli...@gmail.com wrote:
> From: Jerome Glisse jgli...@redhat.com
>
> Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h
> to enable it.
>
> When enabled, after each cs submission the code will try to detect a
> lockup by waiting for one of the buffers of the cs to become idle. After
> a timeout it will consider that the cs triggered a lockup and will write
> a radeon_lockup.c file in the current directory that has all the
> information needed for replaying the cs.
>
> To build this file:
> gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
>
> Signed-off-by: Jerome Glisse jgli...@redhat.com

Maybe I should add the radeon_ctx.h file to the winsys dir, as you need it
to build radeon_lockup.c; I did not want to printf the whole helper.

For example you can check radeon_lockup.c and radeon_ctx.h here:
http://people.freedesktop.org/~glisse/rlockup/

Note this is a radeon si verde capture for a 2d tiling that locks up (it
can be a hard lockup sometimes, so be careful).

Cheers,
Jerome

> ---
>  src/gallium/winsys/radeon/drm/Makefile.sources     |   1 +
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c      |  80 ++--
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.h      |   2 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.c      |   4 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs.h      |   6 +
>  src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c | 135 +
>  6 files changed, 191 insertions(+), 37 deletions(-)
>  create mode 100644 src/gallium/winsys/radeon/drm/radeon_drm_cs_dump.c
>
> diff --git a/src/gallium/winsys/radeon/drm/Makefile.sources b/src/gallium/winsys/radeon/drm/Makefile.sources
> index 1d18d61..4ca5ebb 100644
> --- a/src/gallium/winsys/radeon/drm/Makefile.sources
> +++ b/src/gallium/winsys/radeon/drm/Makefile.sources
> @@ -1,4 +1,5 @@
>  C_SOURCES := \
>  	radeon_drm_bo.c \
>  	radeon_drm_cs.c \
> +	radeon_drm_cs_dump.c \
>  	radeon_drm_winsys.c
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index f4ac526..5a9493a 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -391,14 +391,54 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>      FREE(bo);
>  }
>  
> +void *radeon_bo_do_map(struct radeon_bo *bo)
> +{
> +    struct drm_radeon_gem_mmap args = {0};
> +    void *ptr;
> +
> +    /* Return the pointer if it's already mapped. */
> +    if (bo->ptr)
> +        return bo->ptr;
> +
> +    /* Map the buffer. */
> +    pipe_mutex_lock(bo->map_mutex);
> +    /* Return the pointer if it's already mapped (in case of a race). */
> +    if (bo->ptr) {
> +        pipe_mutex_unlock(bo->map_mutex);
> +        return bo->ptr;
> +    }
> +    args.handle = bo->handle;
> +    args.offset = 0;
> +    args.size = (uint64_t)bo->base.size;
> +    if (drmCommandWriteRead(bo->rws->fd,
> +                            DRM_RADEON_GEM_MMAP,
> +                            &args,
> +                            sizeof(args))) {
> +        pipe_mutex_unlock(bo->map_mutex);
> +        fprintf(stderr, "radeon: gem_mmap failed: %p 0x%08X\n",
> +                bo, bo->handle);
> +        return NULL;
> +    }
> +
> +    ptr = os_mmap(0, args.size, PROT_READ|PROT_WRITE, MAP_SHARED,
> +               bo->rws->fd, args.addr_ptr);
> +    if (ptr == MAP_FAILED) {
> +        pipe_mutex_unlock(bo->map_mutex);
> +        fprintf(stderr, "radeon: mmap failed, errno: %i\n", errno);
> +        return NULL;
> +    }
> +    bo->ptr = ptr;
> +    pipe_mutex_unlock(bo->map_mutex);
> +
> +    return bo->ptr;
> +}
> +
>  static void *radeon_bo_map(struct radeon_winsys_cs_handle *buf,
>                             struct radeon_winsys_cs *rcs,
>                             enum pipe_transfer_usage usage)
>  {
>      struct radeon_bo *bo = (struct radeon_bo*)buf;
>      struct radeon_drm_cs *cs = (struct radeon_drm_cs*)rcs;
> -    struct drm_radeon_gem_mmap args = {0};
> -    void *ptr;
>  
>      /* If it's not unsynchronized bo_map, flush CS if needed and then wait. */
>      if (!(usage & PIPE_TRANSFER_UNSYNCHRONIZED)) {
> @@ -461,41 +501,7 @@ static void *radeon_bo_map(struct radeon_winsys_cs_handle *buf,
>          }
>      }
>  
> -    /* Return the pointer if it's already mapped. */
> -    if (bo->ptr)
> -        return bo->ptr;
> -
> -    /* Map the buffer. */
> -    pipe_mutex_lock(bo->map_mutex);
> -    /* Return the pointer if it's already mapped (in case of a race). */
> -    if (bo->ptr) {
> -        pipe_mutex_unlock(bo->map_mutex);
> -        return bo->ptr;
> -    }
> -    args.handle = bo->handle;
> -    args.offset = 0;
> -    args.size = (uint64_t)bo->base.size;
> -    if (drmCommandWriteRead(bo->rws->fd,
> -                            DRM_RADEON_GEM_MMAP,
> -                            &args,
> -                            sizeof(args))) {
> -        pipe_mutex_unlock(bo->map_mutex);
> -        fprintf(stderr, "radeon: gem_mmap failed: %p 0x%08X\n",
> -                bo, bo->handle);
> -        return NULL;
> -    }
> -
> -    ptr = os_mmap(0
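The radeon_bo_do_map() helper factored out in the patch above maps a buffer lazily: check the cached pointer, take the mutex, check again in case another thread won the race, then map and cache. A minimal sketch of that check/lock/re-check shape is given below. It is only an illustration under stated assumptions: every demo_* name is hypothetical, calloc() stands in for the GEM_MMAP ioctl plus mmap(), and pthread mutexes stand in for pipe_mutex. Like the patch, the fast path reads the pointer without the lock; strictly conforming C11 code would use an atomic load there.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct radeon_bo; only what the pattern needs. */
struct demo_bo {
    pthread_mutex_t map_mutex;
    void *ptr;        /* cached CPU mapping, NULL until the first map */
    size_t size;
    int map_calls;    /* how many times the expensive path actually ran */
};

static void demo_bo_init(struct demo_bo *bo, size_t size)
{
    pthread_mutex_init(&bo->map_mutex, NULL);
    bo->ptr = NULL;
    bo->size = size;
    bo->map_calls = 0;
}

/* Assumption: calloc() replaces the real ioctl + mmap() pair. */
static void *demo_expensive_map(struct demo_bo *bo)
{
    bo->map_calls++;
    return calloc(1, bo->size);
}

static void *demo_bo_do_map(struct demo_bo *bo)
{
    void *ptr;

    /* Fast path: return the pointer if it is already mapped. */
    if (bo->ptr)
        return bo->ptr;

    pthread_mutex_lock(&bo->map_mutex);
    /* Re-check under the lock in case another thread mapped it first. */
    if (bo->ptr) {
        pthread_mutex_unlock(&bo->map_mutex);
        return bo->ptr;
    }

    ptr = demo_expensive_map(bo);
    if (!ptr) {
        /* Every error path must drop the mutex before returning. */
        pthread_mutex_unlock(&bo->map_mutex);
        return NULL;
    }

    bo->ptr = ptr;
    pthread_mutex_unlock(&bo->map_mutex);
    return bo->ptr;
}
```

Calling demo_bo_do_map() twice on the same buffer returns the same pointer and runs the expensive path once, which is the property the cs dump code presumably relies on when it maps every buffer of a submission.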
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2
On Tue, Mar 26, 2013 at 12:40 PM, Marek Olšák mar...@gmail.com wrote:
> On Tue, Mar 26, 2013 at 3:59 PM, Christian König deathsim...@vodafone.de wrote:
>> On 26.03.2013 15:34, Marek Olšák wrote:
>>> Speaking of si_pm4_state, I think it's a horrible mechanism for
>>> anything other than constant state objects (create/bind/delete
>>> functions). For everything else (set/draw functions), you want to emit
>>> directly into the command stream. It's not so different from the bad
>>> state management which r600g used to have (which is now gone). If you
>>> have to call malloc or calloc in a set_* or draw_* function, you're
>>> doing it wrong. Are there plans to change it to something more
>>> efficient (e.g. how r300g and r600g emit non-CSO states right now), or
>>> will it be like this forever?
>>
>> Actually I hoped that r600g sooner or later moves into the same
>> direction some more. The fact that we currently need to malloc every
>> buffer indeed sucks badly, but that is still better than mixing packet
>> generation with driver logic.
>
> I don't understand the last sentence. What mixing? The set_* and draw_*
> commands are supposed to be executed immediately, therefore it's
> reasonable and preferable to write to the CS directly. Having any
> intermediate storage for commands is a waste of time and space.

I agree here. I don't think an uncached bo for the command stream on new hw
would bring a huge perf increase; it will probably just be noise.

>> Also I don't think that emitting directly into the command stream is
>> such a good idea, we sooner or later want that buffer to be a buffer
>> allocated in GART memory. And under this condition it is better to
>> build up the commands in (heavily cached) system memory and then
>> memcpy them to the destination buffer.
>
> AFAIK, GART memory is cached on non-AGP systems, but even uncached
> access shouldn't be a big issue, because the access pattern is
> sequential and write-only.
>
> BTW, I have talked about emitting commands into a buffer object with
> Dave and he thinks it's a bad idea due to the map and unmap overhead.
> Also, we have to disallow writing to certain unsafe registers anyway.
>
> Marek

I think Christian is thinking about new hw (cayman), where we can skip
register checking because of vm and hardware register checking (the hw CP
checks that a register in the user IB is not one of the privileged
registers, and blocks the write and throws an irq if it is). On this kind
of hw you can have the cmd stream in a bo and not do the map/unmap.

Cheers,
Jerome
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] OpenGL ES 3.0 support
On Tue, Mar 26, 2013 at 4:39 AM, violin yanev violin.ya...@gmail.com wrote:
> Thanks for your replies guys! The output of eglinfo is:
>
> EGL API version: 1.4
> EGL vendor string: Mesa Project
> EGL version string: 1.4 (DRI2)
> EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
> EGL extensions string:
>     EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
>     EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
>     EGL_KHR_surfaceless_context EGL_KHR_create_context
>     EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
>     EGL_NV_post_sub_buffer
>
> So apparently ES3.0 is not a supported API :(
>
> @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a
> special flag expected? I had read a message that Fedora 18 will enable
> ES3.0 by default?
>
> Violin

AFAICT Fedora won't enable ES3.0 due to patent uncertainty regarding the
floating point format. You can however build mesa yourself and enable the
floating point format; that would give you ES3.0 support.

Cheers,
Jerome
Re: [Mesa-dev] OpenGL ES 3.0 support
On Tue, Mar 26, 2013 at 2:14 PM, Jordan Justen jljus...@gmail.com wrote:
> On Tue, Mar 26, 2013 at 10:34 AM, Jerome Glisse j.gli...@gmail.com wrote:
>> On Tue, Mar 26, 2013 at 4:39 AM, violin yanev violin.ya...@gmail.com wrote:
>>> Thanks for your replies guys! The output of eglinfo is:
>>>
>>> EGL API version: 1.4
>>> EGL vendor string: Mesa Project
>>> EGL version string: 1.4 (DRI2)
>>> EGL client APIs: OpenGL OpenGL_ES OpenGL_ES2
>>> EGL extensions string:
>>>     EGL_MESA_drm_image EGL_WL_bind_wayland_display EGL_KHR_image_base
>>>     EGL_KHR_image_pixmap EGL_KHR_image EGL_KHR_gl_renderbuffer_image
>>>     EGL_KHR_surfaceless_context EGL_KHR_create_context
>>>     EGL_NOK_swap_region EGL_NOK_texture_from_pixmap
>>>     EGL_NV_post_sub_buffer
>>>
>>> So apparently ES3.0 is not a supported API :(
>>>
>>> @Jordan: do you know if one can reenable ES3 on Intel graphics? Is a
>>> special flag expected? I had read a message that Fedora 18 will
>>> enable ES3.0 by default?
>>>
>>> Violin
>>
>> AFAICT fedora won't enable ES3.0 due to patent uncertainty regarding
>> floating point format
>
> This feature should be usable on Intel hardware which is why it was
> enabled by default (for Intel hardware) in 9bdf5be.
>
> -Jordan

The Fedora patch reverts this commit.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing v2
On Tue, Mar 26, 2013 at 6:45 PM, Dave Airlie airl...@gmail.com wrote:
>>>> correctly). But Marek is quite right that this only counts for state
>>>> objects and makes no sense for set_* and draw_* calls (and I'm
>>>> currently thinking how to avoid that and can't come up with a proper
>>>> solution). Anyway it's definitely not an urgent problem for radeonsi.
>>>
>>> It will be a problem once we actually start caring about performance
>>> and, most importantly, the CPU overhead of the driver.
>>>
>>>> I still think that writing into the command buffers directly (e.g.
>>>> without wrapper functions) is a bad idea, because that leads to
>>>> mixing driver logic and
>>>
>>> I'm convinced the exact opposite is a bad idea, because it adds
>>> another layer all commands must go through. A layer which brings no
>>> advantage. Think about apps which issue 1k-10k draw calls per frame.
>>> It's obvious that every byte moved around counts and the key to high
>>> framerate is to do (almost) nothing in the driver. It looks like the
>>> idea here is to make the driver as slow as possible.
>>>
>>>> packet building in r600g. For example just try to figure out how the
>>>> relocations in NOPs work by reading the source (please keep in mind
>>>> that one of the primary goals why AMD is supporting this driver is
>>>> to give good example code for customers who want to implement that
>>>> stuff on their own systems).
>>>
>>> I'm shocked. Sacrificing performance in the name of making the code
>>> nicer for some customers? Seriously? I thought the plan was to make
>>> the best graphics driver ever.
>>
>> Well, maybe I'm repeating myself: Performance is not a priority, it's
>> only nice to have! Sorry to say so, but if we sacrifice a bit of
>> performance for more code readability then that is perfectly OK with
>> me (Don't get me wrong, I would really prefer to replace the closed
>> source driver today rather than tomorrow, it's unfortunately just not
>> what I'm paid for).
>>
>> On the other hand, we are talking about perfectly optimizable inline
>> functions and/or macros. All I'm saying is that we should structure
>> the code a bit more.
>
> It's okay to take steps in the right direction, but if you start taking
> steps away from performance in lieu of code readability then please be
> prepared to deal with objections.
>
> The thing is in a lot of cases, code readability is in the eye of the
> beholder. I'm sure Jerome thought r600g was perfectly readable when he
> wrote it, but a lot of us didn't and spent a lot of time trying to
> remove the CPU overheads, not least the amount of time Marek spent.
>
> The thing is performance is measurable, code readability isn't.
>
> Dave.

Maybe once again you forgot why I did things the way I did them. I
explained myself to you back then: I designed r600g for a new kernel api
which was violently different from the cs one. My hope was that the other
kernel api would be better; it was not, and I never pushed more on that
front. So the r600g design was definitely not adapted to the cs ioctl and
not designed with it in mind. History often explains a lot of things and
people seem to forget about it.

That being said, I too find the code readability argument ironic. If one
understands the cs ioctl then the r600g code as it is nowadays makes
sense, but the radeonsi code is closer to what r600g used to be. So
assuming the same ioctl I would say that radeonsi should move towards what
r600g is nowadays.

Anyway, I just wanted to set history straight.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing
On Mon, Mar 25, 2013 at 12:17 PM, Michel Dänzer mic...@daenzer.net wrote:
> On Mon, 2013-03-25 at 12:01 -0400, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> Same as on r600, trace cs execution by writing the cs offset after each
>> state. This allows pinpointing a lockup inside the command stream and
>> narrows down the scope of lockup investigation.
>>
>> Signed-off-by: Jerome Glisse jgli...@redhat.com
>
> [...]
>
>> diff --git a/src/gallium/drivers/radeonsi/r600_texture.c b/src/gallium/drivers/radeonsi/r600_texture.c
>> index 6cafc3d..3d074a3 100644
>> --- a/src/gallium/drivers/radeonsi/r600_texture.c
>> +++ b/src/gallium/drivers/radeonsi/r600_texture.c
>> @@ -550,7 +550,7 @@ struct pipe_resource *si_texture_create(struct pipe_screen *screen,
>>  	if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER) &&
>>  	    !(templ->bind & PIPE_BIND_SCANOUT)) {
>> -		array_mode = V_009910_ARRAY_2D_TILED_THIN1;
>> +		array_mode = V_009910_ARRAY_1D_TILED_THIN1;
>>  	}
>>  
>>  	r = r600_init_surface(rscreen, &surface, templ, array_mode,
>
> What's this hunk doing in here? :)
>
> The rest looks good to me on a quick look.

Oops, I did it on top of my 2d tiling stuff.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing
On Mon, Mar 25, 2013 at 12:38 PM, Christian König deathsim...@vodafone.de wrote:
> On 25.03.2013 17:01, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> Same as on r600, trace cs execution by writing the cs offset after each
>> state. This allows pinpointing a lockup inside the command stream and
>> narrows down the scope of lockup investigation.
>>
>> Signed-off-by: Jerome Glisse jgli...@redhat.com
>
> Could you rewrite this to use an si_pm4_state instead of hand coding
> it? It's cleaner and should reduce the needed code quite a bit.
>
> Christian.

Well no, the whole point is to emit inside each si_pm4_state_emit so that
you can pinpoint which reg/packet triggers the lockup.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] radeonsi: add cs tracing
On Mon, Mar 25, 2013 at 1:12 PM, Christian König deathsim...@vodafone.de wrote:
> On 25.03.2013 17:50, Jerome Glisse wrote:
>> On Mon, Mar 25, 2013 at 12:38 PM, Christian König deathsim...@vodafone.de wrote:
>>> On 25.03.2013 17:01, j.gli...@gmail.com wrote:
>>>> From: Jerome Glisse jgli...@redhat.com
>>>>
>>>> Same as on r600, trace cs execution by writing the cs offset after
>>>> each state. This allows pinpointing a lockup inside the command
>>>> stream and narrows down the scope of lockup investigation.
>>>>
>>>> Signed-off-by: Jerome Glisse jgli...@redhat.com
>>>
>>> Could you rewrite this to use an si_pm4_state instead of hand coding
>>> it? It's cleaner and should reduce the needed code quite a bit.
>>>
>>> Christian.
>>
>> Well no, the whole point is to emit inside each si_pm4_state_emit so
>> that you can pinpoint which reg/packet triggers the lockup.
>
> Ok, well then it makes no sense that you increment the counter only
> once per flush.
>
> Christian.

The counter is for tracking the cs number (the number of calls to the cs
ioctl), while in r600_emit_trace I emit both the counter and the cs->cdw
value, so that you have both the dword offset of the last trace that went
through and which cs ioctl call it was. The printf of the command stream
prints both so that you can easily pinpoint things.

Cheers,
Jerome
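The (cs counter, cdw offset) pair described in the reply above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the radeonsi code: the demo_* names and the two opcodes are hypothetical, and the "GPU" is a loop that stops at a chosen bad state value. What it shows is why a marker after every state narrows a hang down to the state emitted right after the last marker that went through.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CS_MAX   64
#define DEMO_OP_STATE 1u  /* hypothetical packet: opcode, one payload dword */
#define DEMO_OP_TRACE 2u  /* hypothetical packet: opcode, cs_id, cdw offset */

struct demo_cs {
    uint32_t buf[DEMO_CS_MAX];
    unsigned cdw;      /* dwords written so far */
    uint32_t cs_id;    /* bumped once per submission (per flush) */
};

struct demo_trace {
    uint32_t cs_id;    /* which submission the last marker belonged to */
    uint32_t cdw;      /* dword offset of the last marker executed */
};

static void demo_emit_state(struct demo_cs *cs, uint32_t value)
{
    assert(cs->cdw + 2 <= DEMO_CS_MAX);
    cs->buf[cs->cdw++] = DEMO_OP_STATE;
    cs->buf[cs->cdw++] = value;
}

/* Emit a marker carrying both the submission id and the current offset. */
static void demo_emit_trace(struct demo_cs *cs)
{
    uint32_t offset = cs->cdw;

    assert(cs->cdw + 3 <= DEMO_CS_MAX);
    cs->buf[cs->cdw++] = DEMO_OP_TRACE;
    cs->buf[cs->cdw++] = cs->cs_id;
    cs->buf[cs->cdw++] = offset;
}

/* Simulated execution: "hangs" on the state whose payload equals bad.
 * Returns 1 if the whole stream ran, 0 on a hang. out holds the last
 * marker that was executed before the hang. */
static int demo_execute(const struct demo_cs *cs, uint32_t bad,
                        struct demo_trace *out)
{
    unsigned i = 0;

    while (i < cs->cdw) {
        if (cs->buf[i] == DEMO_OP_STATE) {
            if (cs->buf[i + 1] == bad)
                return 0;
            i += 2;
        } else {
            out->cs_id = cs->buf[i + 1];
            out->cdw = cs->buf[i + 2];
            i += 3;
        }
    }
    return 1;
}
```

With three states emitted, each followed by a marker, and the third state being the bad one, the trace ends up holding the offset of the second marker, so the culprit is the packet that follows it in the printed command stream.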
Re: [Mesa-dev] [PATCH] r600g: allocate FMASK right after the texture, so that it's aligned with it
On Sun, Mar 3, 2013 at 9:13 AM, Marek Olšák mar...@gmail.com wrote:
> This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/r600_texture.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
> index 484045e..4825592 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -435,8 +435,8 @@ r600_texture_create_object(struct pipe_screen *screen,
>  	}
>  
>  	if (base->nr_samples > 1 && !rtex->is_depth && !buf) {
> -		r600_texture_allocate_cmask(rscreen, rtex);
>  		r600_texture_allocate_fmask(rscreen, rtex);
> +		r600_texture_allocate_cmask(rscreen, rtex);
>  	}
>  
>  	if (!rtex->is_depth && base->nr_samples > 1 &&
> --
> 1.7.10.4
Re: [Mesa-dev] [PATCH 1/6] r600g: inline r600_pipe_shader function
On Sun, Mar 3, 2013 at 8:39 AM, Marek Olšák mar...@gmail.com wrote:
> also change names of other functions, so that they make sense

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |4 +-
>  src/gallium/drivers/r600/r600_pipe.h         |8 +--
>  src/gallium/drivers/r600/r600_shader.c       | 89 --
>  src/gallium/drivers/r600/r600_state.c        |4 +-
>  src/gallium/drivers/r600/r600_state_common.c |4 +-
>  5 files changed, 51 insertions(+), 58 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index 97f91df..5c7cd40 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -3311,7 +3311,7 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
>  	eg_store_loop_const(cb, R_03A200_SQ_LOOP_CONST_0 + (32 * 4), 0x01000FFF);
>  }
>  
> -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader *shader)
> +void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
>  {
>  	struct r600_context *rctx = (struct r600_context *)ctx;
>  	struct r600_pipe_state *rstate = &shader->rstate;
> @@ -3460,7 +3460,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader
>  	shader->flatshade = rctx->rasterizer->flatshade;
>  }
>  
> -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader *shader)
> +void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader)
>  {
>  	struct r600_context *rctx = (struct r600_context *)ctx;
>  	struct r600_pipe_state *rstate = &shader->rstate;
> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
> index 3eb2968..28c7de3 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -626,8 +626,8 @@ void cayman_init_common_regs(struct r600_command_buffer *cb,
>  
>  void evergreen_init_state_functions(struct r600_context *rctx);
>  void evergreen_init_atom_start_cs(struct r600_context *rctx);
> -void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> -void evergreen_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> +void evergreen_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> +void evergreen_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
>  void *evergreen_create_db_flush_dsa(struct r600_context *rctx);
>  void *evergreen_create_resolve_blend(struct r600_context *rctx);
>  void *evergreen_create_decompress_blend(struct r600_context *rctx);
> @@ -701,8 +701,8 @@ r600_create_sampler_view_custom(struct pipe_context *ctx,
>  				unsigned width_first_level, unsigned height_first_level);
>  void r600_init_state_functions(struct r600_context *rctx);
>  void r600_init_atom_start_cs(struct r600_context *rctx);
> -void r600_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> -void r600_pipe_shader_vs(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> +void r600_update_ps_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
> +void r600_update_vs_state(struct pipe_context *ctx, struct r600_pipe_shader *shader);
>  void *r600_create_db_flush_dsa(struct r600_context *rctx);
>  void *r600_create_resolve_blend(struct r600_context *rctx);
>  void *r700_create_resolve_blend(struct r600_context *rctx);
> diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c
> index 949191a..7ecab7b 100644
> --- a/src/gallium/drivers/r600/r600_shader.c
> +++ b/src/gallium/drivers/r600/r600_shader.c
> @@ -58,52 +58,6 @@ issued in the w slot as well.
>  The compiler must issue the source argument to slots z, y, and x
>  */
>  
> -static int r600_pipe_shader(struct pipe_context *ctx, struct r600_pipe_shader *shader)
> -{
> -	struct r600_context *rctx = (struct r600_context *)ctx;
> -	struct r600_shader *rshader = &shader->shader;
> -	uint32_t *ptr;
> -	int i;
> -
> -	/* copy new shader */
> -	if (shader->bo == NULL) {
> -		shader->bo = (struct r600_resource*)
> -			pipe_buffer_create(ctx->screen, PIPE_BIND_CUSTOM, PIPE_USAGE_IMMUTABLE, rshader->bc.ndw * 4);
> -		if (shader->bo == NULL) {
> -			return -ENOMEM;
> -		}
> -		ptr = r600_buffer_mmap_sync_with_rings(rctx, shader->bo, PIPE_TRANSFER_WRITE);
> -		if (R600_BIG_ENDIAN) {
> -			for (i = 0; i < rshader->bc.ndw; ++i) {
> -				ptr[i] = bswap_32(rshader->bc.bytecode[i]);
> -			}
> -		} else {
> -			memcpy(ptr, rshader->bc.bytecode, rshader->bc.ndw * sizeof(*ptr
Re: [Mesa-dev] [PATCH 1/5] r600g: unify vgt states
On Wed, Feb 27, 2013 at 6:11 PM, Marek Olšák mar...@gmail.com wrote:
> The states were split because we thought it caused a hardlock. Now we
> know the hardlock was caused by something else and has since been fixed.

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/evergreen_state.c   |3 +--
>  src/gallium/drivers/r600/r600_hw_context.c   |1 -
>  src/gallium/drivers/r600/r600_pipe.h         |6 --
>  src/gallium/drivers/r600/r600_state.c        |3 +--
>  src/gallium/drivers/r600/r600_state_common.c | 22 +++---
>  5 files changed, 9 insertions(+), 26 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index 205bbc5..244989d 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2615,8 +2615,7 @@ void evergreen_init_state_functions(struct r600_context *rctx)
>  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_GEOMETRY].views.atom, id++, evergreen_emit_gs_sampler_views, 0);
>  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, evergreen_emit_ps_sampler_views, 0);
>  
> -	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 6);
> -	r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, r600_emit_vgt2_state, 3);
> +	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 7);
>  
>  	if (rctx->chip_class == EVERGREEN) {
>  		r600_init_atom(rctx, &rctx->sample_mask.atom, id++, evergreen_emit_sample_mask, 3);
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
> index 91af6b8..b78b004 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -827,7 +827,6 @@ void r600_begin_new_cs(struct r600_context *ctx)
>  	ctx->framebuffer.atom.dirty = true;
>  	ctx->poly_offset_state.atom.dirty = true;
>  	ctx->vgt_state.atom.dirty = true;
> -	ctx->vgt2_state.atom.dirty = true;
>  	ctx->sample_mask.atom.dirty = true;
>  	ctx->scissor.atom.dirty = true;
>  	ctx->config_state.atom.dirty = true;
> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
> index 570a284..4cfade1 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -127,10 +127,6 @@ struct r600_vgt_state {
>  	struct r600_atom atom;
>  	uint32_t vgt_multi_prim_ib_reset_en;
>  	uint32_t vgt_multi_prim_ib_reset_indx;
> -};
> -
> -struct r600_vgt2_state {
> -	struct r600_atom atom;
>  	uint32_t vgt_indx_offset;
>  };
>  
> @@ -506,7 +502,6 @@ struct r600_context {
>  	struct r600_config_state	config_state;
>  	struct r600_stencil_ref_state	stencil_ref;
>  	struct r600_vgt_state		vgt_state;
> -	struct r600_vgt2_state		vgt2_state;
>  	struct r600_viewport_state	viewport;
>  	/* Shaders and shader resources. */
>  	struct r600_cso_state		vertex_fetch_shader;
> @@ -733,7 +728,6 @@ void r600_emit_cso_state(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_alphatest_state(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_blend_color(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_vgt_state(struct r600_context *rctx, struct r600_atom *atom);
> -void r600_emit_vgt2_state(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_clip_misc_state(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_stencil_ref(struct r600_context *rctx, struct r600_atom *atom);
>  void r600_emit_viewport_state(struct r600_context *rctx, struct r600_atom *atom);
> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
> index bbff6bd..fd3b14e 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2312,8 +2312,7 @@ void r600_init_state_functions(struct r600_context *rctx)
>  	r600_init_atom(rctx, &rctx->samplers[PIPE_SHADER_FRAGMENT].views.atom, id++, r600_emit_ps_sampler_views, 0);
>  	r600_init_atom(rctx, &rctx->vertex_buffer_state.atom, id++, r600_emit_vertex_buffers, 0);
>  
> -	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 6);
> -	r600_init_atom(rctx, &rctx->vgt2_state.atom, id++, r600_emit_vgt2_state, 3);
> +	r600_init_atom(rctx, &rctx->vgt_state.atom, id++, r600_emit_vgt_state, 7);
>  
>  	r600_init_atom(rctx, &rctx->seamless_cube_map.atom, id++, r600_emit_seamless_cube_map, 3);
>  	r600_init_atom(rctx, &rctx->sample_mask.atom, id++, r600_emit_sample_mask, 3);
> diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c
> index 4c68506..8906695 100644
> --- a/src/gallium/drivers/r600/r600_state_common.c
> +++ b/src/gallium
Re: [Mesa-dev] [PATCH] r600g/radeonsi: unreference previous fence in flush
On Mon, Mar 4, 2013 at 2:05 PM, Michel Dänzer mic...@daenzer.net wrote:
> On Mon, 2013-03-04 at 13:17 -0500, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> Some code calling the flush function passes a fence pointer that points
>> to an old fence, which should be unreferenced to avoid leaking the
>> fence.
>>
>> Candidate for 9.1
>>
>> Signed-off-by: Jerome Glisse jgli...@redhat.com
>> ---
>>  src/gallium/drivers/r600/r600_pipe.c         | 8 +---
>>  src/gallium/drivers/radeonsi/radeonsi_pipe.c | 9 ++---
>>  2 files changed, 11 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
>> index 78002ae..4bcfc67 100644
>> --- a/src/gallium/drivers/r600/r600_pipe.c
>> +++ b/src/gallium/drivers/r600/r600_pipe.c
>> @@ -145,12 +145,14 @@ static void r600_flush_from_st(struct pipe_context *ctx,
>>  			       enum pipe_flush_flags flags)
>>  {
>>  	struct r600_context *rctx = (struct r600_context *)ctx;
>> -	struct r600_fence **rfence = (struct r600_fence**)fence;
>> +	struct r600_fence *rfence;
>>  	unsigned fflags;
>>  
>>  	fflags = flags & PIPE_FLUSH_END_OF_FRAME ? RADEON_FLUSH_END_OF_FRAME : 0;
>> -	if (rfence) {
>> -		*rfence = r600_create_fence(rctx);
>> +	if (fence) {
>> +		rfence = r600_create_fence(rctx);
>> +		ctx->screen->fence_reference(ctx->screen, fence,
>> +					     (struct pipe_fence_handle *)rfence);
>
> This change increases the reference count of the returned fence from 1
> to 2. I don't think that's correct, but if it is, the change should be
> amended with an explanation why.

No, I have an uncommitted change in my tree. I will probably resend with
the xa patchset.

Jerome
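The reference counting point raised in the review above can be made concrete with a small, self-contained sketch. It is not the r600g code and not necessarily the change that was later committed: the demo_* names are hypothetical, and balancing the extra reference by dropping the creation reference is an assumption about one way to resolve the issue. A reference helper in the fence_reference style takes a reference on the new object and drops one on whatever the destination pointed at before, so a freshly created fence (count 1) ends up at 2 unless the creator releases its own reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted fence; not the r600g structure. */
struct demo_fence {
    int refcount;
};

static int demo_live_fences;   /* fences allocated and not yet freed */

static struct demo_fence *demo_fence_create(void)
{
    struct demo_fence *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    f->refcount = 1;           /* the creator owns one reference */
    demo_live_fences++;
    return f;
}

/* Make *dst point at src: take a reference on src and drop one on the
 * fence *dst pointed at before, freeing it when the count hits zero. */
static void demo_fence_reference(struct demo_fence **dst,
                                 struct demo_fence *src)
{
    struct demo_fence *old = *dst;

    if (src)
        src->refcount++;
    if (old && --old->refcount == 0) {
        free(old);
        demo_live_fences--;
    }
    *dst = src;
}

/* Flush-like function: the caller's pointer may already hold an old
 * fence, which has to be unreferenced rather than overwritten. */
static void demo_flush(struct demo_fence **fence)
{
    struct demo_fence *f;

    if (!fence)
        return;

    f = demo_fence_create();
    demo_fence_reference(fence, f);   /* old fence dropped, new count is 2 */
    demo_fence_reference(&f, NULL);   /* release creation ref, count is 1 */
}
```

Calling demo_flush() twice with the same pointer leaves exactly one live fence with a count of 1. Without the first demo_fence_reference() call the old fence would leak, and without the second the new fence would never reach zero.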
Re: [Mesa-dev] [PATCH] winsys/radeon: Only add bo to hash table when creating flink
On Fri, Mar 1, 2013 at 4:34 PM, Martin Andersson g02ma...@gmail.com wrote:
> The problem is that we mix bo handles and flinked names in the hash
> table. Because kms type handles are not flinked they should not be
> added to the hash table. If we do that we will sooner or later get a
> situation where we will overwrite a correct entry because the bo handle
> was the same as a flinked name.
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 2d41c26..f4ac526 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -957,16 +957,16 @@ static boolean radeon_winsys_bo_get_handle(struct pb_buffer *buffer,
>  
>              bo->flinked = TRUE;
>              bo->flink = flink.name;
> +
> +            pipe_mutex_lock(bo->mgr->bo_handles_mutex);
> +            util_hash_table_set(bo->mgr->bo_handles, (void*)(uintptr_t)bo->flink, bo);
> +            pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
>          }
>          whandle->handle = bo->flink;
>      } else if (whandle->type == DRM_API_HANDLE_TYPE_KMS) {
>          whandle->handle = bo->handle;
>      }
>  
> -    pipe_mutex_lock(bo->mgr->bo_handles_mutex);
> -    util_hash_table_set(bo->mgr->bo_handles, (void*)(uintptr_t)whandle->handle, bo);
> -    pipe_mutex_unlock(bo->mgr->bo_handles_mutex);
> -
>      whandle->stride = stride;
>      return TRUE;
>  }
> --
> 1.8.1.4

Reviewed-by: Jerome Glisse jgli...@redhat.com
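The overwrite described in the commit message above comes from using two unrelated number spaces (bo handles and flink names) as keys in one table. The sketch below reproduces it with a deliberately trivial direct-indexed table. The demo_* names are hypothetical, the table is not Mesa's util_hash_table, and locking is left out. It only shows that a KMS handle numerically equal to another buffer's flink name replaces that buffer's entry, and that inserting flink names only avoids this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOTS 16

/* Hypothetical buffer object: a per-fd GEM handle and a global flink name. */
struct demo_bo {
    uint32_t handle;
    uint32_t flink;
};

/* Trivial direct-indexed table, key -> bo. Keys must be < DEMO_SLOTS. */
struct demo_table {
    struct demo_bo *slot[DEMO_SLOTS];
};

static void demo_table_set(struct demo_table *t, uint32_t key,
                           struct demo_bo *bo)
{
    assert(key < DEMO_SLOTS);
    t->slot[key] = bo;
}

static struct demo_bo *demo_table_get(const struct demo_table *t,
                                      uint32_t key)
{
    assert(key < DEMO_SLOTS);
    return t->slot[key];
}

/* Behaviour before the patch: whatever was exported becomes a key,
 * whether it is a flink name or a plain handle. */
static void demo_export_old(struct demo_table *t, struct demo_bo *bo,
                            int as_flink)
{
    demo_table_set(t, as_flink ? bo->flink : bo->handle, bo);
}

/* Behaviour after the patch: only flink names are stored. */
static void demo_export_new(struct demo_table *t, struct demo_bo *bo,
                            int as_flink)
{
    if (as_flink)
        demo_table_set(t, bo->flink, bo);
}
```

With buffer A exported under flink name 5 and buffer B exported by its KMS handle, which also happens to be 5, a lookup of name 5 returns B with the old behaviour and A with the patched one.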
Re: [Mesa-dev] r600g: status of my work on the shader optimization
On Tue, Feb 26, 2013 at 1:05 PM, Stefan Seifert n...@detonation.org wrote:
> Good news!
>
> I gave the r600-sb branch a good testing at commit
> 265ae41b1f1d086d35d274c7378c43cddb8215c8 and so far I've not had a
> single lockup in about 1 1/2 hours of flight time!
>
> The downside is that this is with R600_HYPERZ=0. But with HYPERZ
> enabled, I get lockups on master as well, so it would seem your branch
> is in pretty good shape.
>
> Testing done on a Radeon HD 5670 using kernel 3.8
>
> Regards,
> Stefan

Hyperz bug # ?

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 0/4] r6xx flushing rework and enable CP DMA
On Fri, Feb 22, 2013 at 2:38 PM, alexdeuc...@gmail.com wrote:
> From: Alex Deucher alexander.deuc...@amd.com
>
> This patch set cleans up the flushing on r6xx in what seems to be a
> logical manner. The last patch enables CP DMA on r6xx. No piglit
> regressions on RS780 which I was testing on.
>
> Alex Deucher (4):
>   r600g: add missing emit_flush for R600_CONTEXT_FLUSH_AND_INV case
>   r600g: synchronize streamout buffers on r6xx too (v2)
>   r600g: set additional cp_coher_cntl bits for 6xx/7xx flush (v2)
>   r600g: enable CP DMA on r6xx (v2)
>
>  src/gallium/drivers/r600/r600_blit.c       |3 +--
>  src/gallium/drivers/r600/r600_hw_context.c | 26 +-
>  2 files changed, 18 insertions(+), 11 deletions(-)

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

> --
> 1.7.7.5
Re: [Mesa-dev] [PATCH] r600g: properly implement S8Z24 depth-stencil format for Evergreen
On Tue, Feb 12, 2013 at 8:06 PM, Marek Olšák mar...@gmail.com wrote:
> I should say "fix", but it has never been used until now.
>
> S8Z24 is the format equivalent to the GL_UNSIGNED_INT_24_8 packing, so
> we'll start to see it more often with st/mesa now making smart
> decisions about formats.
>
> The DB->CB copy can change the channel ordering for transfers, other
> than that, the internal DB format doesn't really matter.
>
> R600-R700 support is possible except shadow mapping. FMT_24_8 is broken
> if the SAMPLE_C instruction is used (no idea why).
>
> Also the sampler swizzling was broken in theory and the fact it worked
> was a lucky coincidence.
>
> radeonsi might need to port this.

Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/evergreen_state.c | 13 +++-
>  src/gallium/drivers/r600/r600_state.c      |8 -
>  src/gallium/drivers/r600/r600_texture.c    | 44 ++--
>  3 files changed, 47 insertions(+), 18 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index 211c218..c6e29db 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -200,6 +200,8 @@ static uint32_t r600_translate_dbformat(enum pipe_format format)
>  		return V_028040_Z_16;
>  	case PIPE_FORMAT_Z24X8_UNORM:
>  	case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +	case PIPE_FORMAT_X8Z24_UNORM:
> +	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
>  		return V_028040_Z_24;
>  	case PIPE_FORMAT_Z32_FLOAT:
>  	case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
> @@ -339,7 +341,7 @@ static uint32_t r600_translate_colorswap(enum pipe_format format)
>  
>  	case PIPE_FORMAT_X8Z24_UNORM:
>  	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -		return V_028C70_SWAP_STD;
> +		return V_028C70_SWAP_STD_REV;
>  
>  	case PIPE_FORMAT_R10G10B10A2_UNORM:
>  	case PIPE_FORMAT_R10G10B10X2_SNORM:
> @@ -1106,6 +1108,11 @@ evergreen_create_sampler_view_custom(struct pipe_context *ctx,
>  	case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
>  		pipe_format = PIPE_FORMAT_Z32_FLOAT;
>  		break;
> +	case PIPE_FORMAT_X8Z24_UNORM:
> +	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> +		/* Z24 is always stored like this. */
> +		pipe_format = PIPE_FORMAT_Z24X8_UNORM;
> +		break;
>  	case PIPE_FORMAT_X24S8_UINT:
>  	case PIPE_FORMAT_S8X24_UINT:
>  	case PIPE_FORMAT_X32_S8X24_UINT:
> @@ -1603,6 +1610,8 @@ static void evergreen_init_depth_surface(struct r600_context *rctx,
>  	switch (surf->base.format) {
>  	case PIPE_FORMAT_Z24X8_UNORM:
>  	case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +	case PIPE_FORMAT_X8Z24_UNORM:
> +	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
>  		surf->pa_su_poly_offset_db_fmt_cntl =
>  			S_028B78_POLY_OFFSET_NEG_NUM_DB_BITS((char)-24);
>  		break;
> @@ -2179,6 +2188,8 @@ static void evergreen_emit_polygon_offset(struct r600_context *rctx, struct r600
>  	switch (state->zs_format) {
>  	case PIPE_FORMAT_Z24X8_UNORM:
>  	case PIPE_FORMAT_Z24_UNORM_S8_UINT:
> +	case PIPE_FORMAT_X8Z24_UNORM:
> +	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
>  		offset_units *= 2.0f;
>  		break;
>  	case PIPE_FORMAT_Z16_UNORM:
> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
> index 5322850..d1f6626 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -270,10 +270,6 @@ static uint32_t r600_translate_colorswap(enum pipe_format format)
>  	case PIPE_FORMAT_Z24_UNORM_S8_UINT:
>  		return V_0280A0_SWAP_STD;
>  
> -	case PIPE_FORMAT_X8Z24_UNORM:
> -	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -		return V_0280A0_SWAP_STD;
> -
>  	case PIPE_FORMAT_R10G10B10A2_UNORM:
>  	case PIPE_FORMAT_R10G10B10X2_SNORM:
>  	case PIPE_FORMAT_R10SG10SB10SA2U_NORM:
> @@ -440,10 +436,6 @@ static uint32_t r600_translate_colorformat(enum pipe_format format)
>  	case PIPE_FORMAT_Z24_UNORM_S8_UINT:
>  		return V_0280A0_COLOR_8_24;
>  
> -	case PIPE_FORMAT_X8Z24_UNORM:
> -	case PIPE_FORMAT_S8_UINT_Z24_UNORM:
> -		return V_0280A0_COLOR_24_8;
> -
>  	case PIPE_FORMAT_Z32_FLOAT_S8X24_UINT:
>  		return V_0280A0_COLOR_X24_8_32_FLOAT;
>  
> diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
> index 85fc887..7f5752d 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -985,11 +985,14 @@ uint32_t r600_translate_texformat(struct pipe_screen *screen,
>  				  const unsigned char *swizzle_view
Re: [Mesa-dev] [PATCH 2/2] r600g: fix lockup when hyperz alpha test are enabled together. v2
On Mon, Feb 11, 2013 at 6:45 PM, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com Seems that alpha test being enabled confuse the GPU on the order in which it should perform the Z testing. So force the order programmed throught db shader control. v2: Only force z order when alpha test is enabled Signed-off-by: Jerome Glisse jgli...@redhat.com Reviewed-by: Marek Olšák mar...@gmail.com This one does not regress piglit (redwood or rv770) and still fix lockup afaict. If no objection i will push tomorrow. Cheers, Jerome --- src/gallium/drivers/r600/evergreen_state.c | 25 +++-- src/gallium/drivers/r600/r600_state.c | 22 +- 2 files changed, 44 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 211c218..b710131 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2251,6 +2251,13 @@ static void evergreen_emit_db_misc_state(struct r600_context *rctx, struct r600_ if (rctx-db_state.rsurf rctx-db_state.rsurf-htile_enabled) { /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */ db_render_override |= S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_OFF); + /* This is to fix a lockup when hyperz and alpha test are enabled at +* the same time some how GPU get confuse on which order to pick for +* z test +*/ + if (rctx-alphatest_state.sx_alpha_test_control) { + db_render_override |= S_02800C_FORCE_SHADER_Z_ORDER(1); + } } else { db_render_override |= S_02800C_FORCE_HIZ_ENABLE(V_02800C_FORCE_DISABLE); } @@ -3240,7 +3247,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader struct r600_context *rctx = (struct r600_context *)ctx; struct r600_pipe_state *rstate = shader-rstate; struct r600_shader *rshader = shader-shader; - unsigned i, exports_ps, num_cout, spi_ps_in_control_0, spi_input_z, spi_ps_in_control_1, db_shader_control; + unsigned i, exports_ps, num_cout, spi_ps_in_control_0, 
spi_input_z, spi_ps_in_control_1, db_shader_control = 0; int pos_index = -1, face_index = -1; int ninterp = 0; boolean have_linear = FALSE, have_centroid = FALSE, have_perspective = FALSE; @@ -3250,7 +3257,6 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader rstate-nregs = 0; - db_shader_control = S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z); for (i = 0; i rshader-ninput; i++) { /* evergreen NUM_INTERP only contains values interpolated into the LDS, POSITION goes via GPRs from the SC so isn't counted */ @@ -3484,6 +3490,21 @@ void evergreen_update_db_shader_control(struct r600_context * rctx) V_02880C_EXPORT_DB_FULL) | S_02880C_ALPHA_TO_MASK_DISABLE(rctx-framebuffer.cb0_is_integer); + /* When alpha test is enabled we can't antrust the hw to make the proper +* decision on the order in which ztest should be run related to fragment +* shader execution. +* +* If alpha test is enabled perform early z rejection (RE_Z) but don't early +* write to the zbuffer. Write to zbuffer is delayed after fragment shader +* execution and thus after alpha test so if discarded by the alpha test +* the z value is not written. 
+*/ + if (rctx-alphatest_state.sx_alpha_test_control) { + db_shader_control |= S_02880C_Z_ORDER(V_02880C_RE_Z); + } else { + db_shader_control |= S_02880C_Z_ORDER(V_02880C_EARLY_Z_THEN_LATE_Z); + } + if (db_shader_control != rctx-db_misc_state.db_shader_control) { rctx-db_misc_state.db_shader_control = db_shader_control; rctx-db_misc_state.atom.dirty = true; diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 5322850..8efd4b3 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -1966,6 +1966,13 @@ static void r600_emit_db_misc_state(struct r600_context *rctx, struct r600_atom if (rctx-db_state.rsurf rctx-db_state.rsurf-htile_enabled) { /* FORCE_OFF means HiZ/HiS are determined by DB_SHADER_CONTROL */ db_render_override |= S_028D10_FORCE_HIZ_ENABLE(V_028D10_FORCE_OFF); + /* This is to fix a lockup when hyperz and alpha test are enabled at +* the same time some how GPU get confuse on which order to pick for +* z test
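The Z-order choice described in the patch comment above can be sketched in isolation. This is only an illustration of the selection logic, assuming the Z_ORDER field sits at bits 5:4 of DB_SHADER_CONTROL and uses the encodings below; the enum, the macros and `select_z_order` are hypothetical stand-ins, not the driver's own S_02880C_*/V_02880C_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings of the DB_SHADER_CONTROL Z_ORDER field. The real
 * values come from the driver's register headers (V_02880C_*). */
enum z_order {
    Z_ORDER_LATE_Z = 0,
    Z_ORDER_EARLY_Z_THEN_LATE_Z = 1,
    Z_ORDER_RE_Z = 2,
    Z_ORDER_EARLY_Z_THEN_RE_Z = 3
};

/* Assumed position of the Z_ORDER field inside the register. */
#define Z_ORDER_SHIFT 4u
#define Z_ORDER_MASK  (0x3u << Z_ORDER_SHIFT)

/* With alpha test enabled, reject early but delay the Z write until after
 * the fragment shader (RE_Z); otherwise keep the default early-then-late
 * order. Any Z_ORDER bits already set in the input are replaced. */
static uint32_t select_z_order(uint32_t db_shader_control, bool alpha_test_enabled)
{
    uint32_t order = alpha_test_enabled ? Z_ORDER_RE_Z
                                        : Z_ORDER_EARLY_Z_THEN_LATE_Z;

    db_shader_control &= ~Z_ORDER_MASK;
    db_shader_control |= (order << Z_ORDER_SHIFT) & Z_ORDER_MASK;
    return db_shader_control;
}
```

Clearing the field before OR-ing mirrors why the patch initializes db_shader_control to 0 in evergreen_pipe_shader_ps and moves the Z_ORDER assignment into evergreen_update_db_shader_control: the order is then decided in exactly one place.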
Re: [Mesa-dev] [PATCH] r600g: add cs memory usage accounting and limit it
On Wed, Jan 30, 2013 at 10:35 PM, Marek Olšák mar...@gmail.com wrote: On Wed, Jan 30, 2013 at 6:14 PM, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com We are now seing cs that can go over the vram+gtt size to avoid failing flush early cs that goes over 70% (gtt+vram) usage. 70% is use to allow some fragmentation. Signed-off-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r600/evergreen_state.c| 4 src/gallium/drivers/r600/r600.h | 1 + src/gallium/drivers/r600/r600_buffer.c| 1 + src/gallium/drivers/r600/r600_hw_context.c| 12 src/gallium/drivers/r600/r600_pipe.c | 3 +++ src/gallium/drivers/r600/r600_pipe.h | 21 + src/gallium/drivers/r600/r600_state.c | 3 +++ src/gallium/drivers/r600/r600_state_common.c | 17 - src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 11 +++ src/gallium/winsys/radeon/drm/radeon_winsys.h | 10 ++ 10 files changed, 82 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index be1c427..84f8dce 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -1668,6 +1668,8 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx, surf = (struct r600_surface*)state-cbufs[i]; rtex = (struct r600_texture*)surf-base.texture; + r600_context_add_resource_size(ctx, state-cbufs[i]-texture); + if (!surf-color_initialized) { evergreen_init_color_surface(rctx, surf); } @@ -1699,6 +1701,8 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx, if (state-zsbuf) { surf = (struct r600_surface*)state-zsbuf; + r600_context_add_resource_size(ctx, state-zsbuf-texture); + if (!surf-depth_initialized) { evergreen_init_depth_surface(rctx, surf); } diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h index a383c90..b9f7d3d 100644 --- a/src/gallium/drivers/r600/r600.h +++ b/src/gallium/drivers/r600/r600.h @@ -50,6 +50,7 @@ struct r600_resource { /* Resource state. 
*/ unsigneddomains; + uint64_tsize; Don't add this. Use r600_resource::buf::size instead, which is already initialized. }; #define R600_BLOCK_MAX_BO 32 diff --git a/src/gallium/drivers/r600/r600_buffer.c b/src/gallium/drivers/r600/r600_buffer.c index 6df0d91..92f549a 100644 --- a/src/gallium/drivers/r600/r600_buffer.c +++ b/src/gallium/drivers/r600/r600_buffer.c @@ -250,6 +250,7 @@ bool r600_init_resource(struct r600_screen *rscreen, break; } + res-size = size; res-buf = rscreen-ws-buffer_create(rscreen-ws, size, alignment, use_reusable_pool, initial_domain); diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index ebafd97..44d3b4d 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -359,6 +359,16 @@ out_err: void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, boolean count_draw_in) { + if (!ctx-ws-cs_memory_below_limit(ctx-rings.gfx.cs, ctx-vram, ctx-gtt)) { + ctx-gtt = 0; + ctx-vram = 0; + ctx-rings.gfx.flush(ctx, RADEON_FLUSH_ASYNC); + return; + } + /* all will be accounted once relocation are emited */ + ctx-gtt = 0; + ctx-vram = 0; + /* The number of dwords we already used in the CS so far. */ num_dw += ctx-rings.gfx.cs-cdw; @@ -784,6 +794,8 @@ void r600_begin_new_cs(struct r600_context *ctx) ctx-pm4_dirty_cdwords = 0; ctx-flags = 0; + ctx-gtt = 0; + ctx-vram = 0; /* Begin a new CS. */ r600_emit_command_buffer(ctx-rings.gfx.cs, ctx-start_cs_cmd); diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index a59578d..cb50cfe 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -333,6 +333,9 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void rctx-chip_class = rscreen-chip_class; rctx-keep_tiling_flags = rscreen-info.drm_minor = 12; + rctx-gtt = 0; + rctx-vram = 0; There is no reason to initialize anything to 0 in context_create. 
The whole context structure is calloc'd. + LIST_INITHEAD(rctx-active_nontimer_queries
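A minimal sketch of the accounting check from the commit message, assuming the limit is applied to the combined vram+gtt size as the message states. `struct mem_sizes` and this `cs_memory_below_limit` signature are hypothetical; the winsys callback in the patch takes a cs pointer and is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for a vram/gtt pair, in bytes. */
struct mem_sizes {
    uint64_t vram;
    uint64_t gtt;
};

/* Returns true while the memory already referenced by the cs plus the
 * memory about to be added stays under 70% of (vram + gtt). The 30%
 * headroom is there to allow for some fragmentation. Dividing before
 * multiplying avoids overflow for very large totals. */
static bool cs_memory_below_limit(const struct mem_sizes *total,
                                  const struct mem_sizes *used,
                                  uint64_t extra_vram, uint64_t extra_gtt)
{
    uint64_t limit = (total->vram + total->gtt) / 10 * 7;
    uint64_t wanted = used->vram + used->gtt + extra_vram + extra_gtt;

    return wanted < limit;
}
```

When the check fails, the caller would flush the cs and reset its pending vram/gtt counters, which is what the r600_need_cs_space hunk above does.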
Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure
On Sat, Jan 5, 2013 at 9:49 AM, Christian König deathsim...@vodafone.de wrote:
> On 04.01.2013 23:19, j.gli...@gmail.com wrote:
>> [SNIP]
>> diff --git a/src/gallium/drivers/r300/r300_emit.c b/src/gallium/drivers/r300/r300_emit.c
>> index d1ed4b3..c824821 100644
>> --- a/src/gallium/drivers/r300/r300_emit.c
>> +++ b/src/gallium/drivers/r300/r300_emit.c
>> @@ -1184,7 +1184,8 @@ validate:
>>              assert(tex && tex->buf && "cbuf is marked, but NULL!");
>>              r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>                                      RADEON_USAGE_READWRITE,
>> -                                    r300_surface(fb->cbufs[i])->domain);
>> +                                    r300_surface(fb->cbufs[i])->domain,
>> +                                    RADEON_RING_DMA);
>
> ??? DMA ring on R300? At least on first glance that looks quite odd, should probably be GFX ring instead.

Yeah, it's a cut and paste error; I caught that when testing on r3xx.

>>          }
>>          /* ...depth buffer... */
>>          if (fb->zsbuf) {
>> @@ -1192,7 +1193,8 @@ validate:
>>              assert(tex && tex->buf && "zsbuf is marked, but NULL!");
>>              r300->rws->cs_add_reloc(r300->cs, tex->cs_buf,
>>                                      RADEON_USAGE_READWRITE,
>> -                                    r300_surface(fb->zsbuf)->domain);
>> +                                    r300_surface(fb->zsbuf)->domain,
>> +                                    RADEON_RING_DMA);
>
> Same here and repeats on a couple of more places.
>
>> [SNIP]
>> diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h b/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> index 16536dc..5ff463e 100644
>> --- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> +++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
>> @@ -43,11 +43,13 @@
>>  #include "pipebuffer/pb_buffer.h"
>>  #include "libdrm/radeon_surface.h"
>>
>> -#define RADEON_MAX_CMDBUF_DWORDS (16 * 1024)
>> +#define RADEON_MAX_CMDBUF_DWORDS        (16 * 1024)
>>
>> -#define RADEON_FLUSH_ASYNC (1 << 0)
>> -#define RADEON_FLUSH_KEEP_TILING_FLAGS (1 << 1) /* needs DRM 2.12.0 */
>> -#define RADEON_FLUSH_COMPUTE (1 << 2)
>> +#define RADEON_FLUSH_ASYNC              (1 << 0)
>> +#define RADEON_FLUSH_KEEP_TILING_FLAGS  (1 << 1) /* needs DRM 2.12.0 */
>> +#define RADEON_FLUSH_COMPUTE            (1 << 2)
>> +#define RADEON_FLUSH_DMA                (1 << 3)
>> +#define RADEON_FLUSH_GFX                (1 << 4)
>>
>>  /* Tiling flags. */
>>  enum radeon_bo_layout {
>> @@ -137,12 +139,19 @@ enum chip_class {
>>      TAHITI,
>>  };
>>
>> +enum radeon_ring_type {
>> +    RADEON_RING_PM4 = 0,
>> +    RADEON_RING_DMA = 1,
>> +};
>> +
>
> Don't use PM4 as identifier here, the PM4 packet format is used for other ring types beside GFX/Compute as well, but those rings can't necessary execute GFX/Compute commands.

I was looking for a 3 letter name that encompasses gfx and compute.

>>  struct winsys_handle;
>>  struct radeon_winsys_cs_handle;
>>
>>  struct radeon_winsys_cs {
>> -    unsigned cdw; /* Number of used dwords. */
>> -    uint32_t *buf; /* The command buffer. */
>> +    unsigned cdw;      /* Number of used dwords. */
>> +    uint32_t *buf;     /* The command buffer. */
>> +    unsigned dma_cdw;  /* Number of used dwords. */
>> +    uint32_t *dma_buf; /* The command buffer. */
>>  };
>
> Why like this? Can't we just have separate instances of the radeon_winsys_cs structure for each ring type we are dealing with?
>
> The rest looks quite good,
> Christian.

No, we can't. For the same context we need to keep track of the dma ring and of the gfx/compute/uvd/other ring at the same time. It's the relocation code that needs that.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 1/3] r600g/radeon/winsys: indentation cleanup
On Fri, Jan 4, 2013 at 5:19 PM, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com Signed-off-by: Jerome Glisse jgli...@redhat.com For the serie piglit says no regression on r7xx/evergreen. I need to test r3xx/r5xx and SI. Cheers, Jerome --- src/gallium/drivers/r600/r600_pipe.c | 18 +- src/gallium/drivers/r600/r600_pipe.h | 2 +- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 3 +-- src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 2 +- 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 65dcbf8..e9d5e0a 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -290,21 +290,21 @@ static struct pipe_context *r600_create_context(struct pipe_screen *screen, void rctx-cs = rctx-ws-cs_create(rctx-ws); rctx-ws-cs_set_flush_callback(rctx-cs, r600_flush_from_winsys, rctx); -rctx-uploader = u_upload_create(rctx-context, 1024 * 1024, 256, - PIPE_BIND_INDEX_BUFFER | - PIPE_BIND_CONSTANT_BUFFER); -if (!rctx-uploader) -goto fail; + rctx-uploader = u_upload_create(rctx-context, 1024 * 1024, 256, + PIPE_BIND_INDEX_BUFFER | + PIPE_BIND_CONSTANT_BUFFER); + if (!rctx-uploader) + goto fail; rctx-allocator_fetch_shader = u_suballocator_create(rctx-context, 64 * 1024, 256, 0, PIPE_USAGE_STATIC, FALSE); -if (!rctx-allocator_fetch_shader) -goto fail; + if (!rctx-allocator_fetch_shader) + goto fail; rctx-allocator_so_filled_size = u_suballocator_create(rctx-context, 4096, 4, - 0, PIPE_USAGE_STATIC, TRUE); + 0, PIPE_USAGE_STATIC, TRUE); if (!rctx-allocator_so_filled_size) -goto fail; + goto fail; rctx-blitter = util_blitter_create(rctx-context); if (rctx-blitter == NULL) diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 6b7c053..934a6f5 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -408,7 +408,7 @@ struct r600_context { struct radeon_winsys*ws; struct 
radeon_winsys_cs *cs; struct blitter_context *blitter; - struct u_upload_mgr *uploader; + struct u_upload_mgr *uploader; struct u_suballocator *allocator_so_filled_size; struct u_suballocator *allocator_fetch_shader; struct util_slab_mempoolpool_transfers; diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c index 07e92c5..897e962 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c @@ -802,8 +802,7 @@ static void radeon_bo_set_tiling(struct pb_buffer *_buf, sizeof(args)); } -static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle( -struct pb_buffer *_buf) +static struct radeon_winsys_cs_handle *radeon_drm_get_cs_handle(struct pb_buffer *_buf) { /* return radeon_bo. */ return (struct radeon_winsys_cs_handle*)get_radeon_bo(_buf); diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h index 6336d3a..286eb6a 100644 --- a/src/gallium/winsys/radeon/drm/radeon_drm_cs.h +++ b/src/gallium/winsys/radeon/drm/radeon_drm_cs.h @@ -33,7 +33,7 @@ struct radeon_cs_context { uint32_tbuf[RADEON_MAX_CMDBUF_DWORDS]; -int fd; +int fd; struct drm_radeon_cscs; struct drm_radeon_cs_chunk chunks[3]; uint64_tchunk_array[3]; -- 1.7.11.7 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] radeon/winsys: add async dma infrastructure
On Fri, Jan 4, 2013 at 6:33 PM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Fri, Jan 4, 2013 at 5:19 PM, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> The design is to take advantage of the fact that the kernel will emit a semaphore when a buffer is referenced by different rings. So the only thing we need to do to enforce synchronization between the dma and gfx/compute rings is to make sure that we never reference the same bo at the same time on the dma and gfx rings.
>>
>> This is achieved by tracking relocations. When we add a relocation to the dma ring for a bo, we first check whether the bo has an active relocation on the gfx ring. If it does, we flush the gfx ring. We do the same when adding a bo to the gfx ring: we check that it does not have a relocation on the dma ring, and if it has one we flush the dma ring.
>>
>> This patch also simplifies the helper query function used to know if a bo has pending write/read commands.
>
> Looks good. A couple of minor comments below. BTW, any performance gains?

No, there aren't many benchmarks that will trigger a lot of buffer copies AFAICT. Here is a WIP patch for texture copy:
http://people.freedesktop.org/~glisse/0001-r600g-r7xx-use-async-dma-for-resource-copy.patch
The kernel mostly rejects the command stream so far; I need to check what's going on.
Cheers, Jerome Alex Signed-off-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r300/r300_emit.c | 21 +- src/gallium/drivers/r300/r300_flush.c | 7 +- src/gallium/drivers/r600/evergreen_hw_context.c| 39 +++ src/gallium/drivers/r600/evergreend.h | 16 ++ src/gallium/drivers/r600/r600.h| 13 + src/gallium/drivers/r600/r600_blit.c | 94 +-- src/gallium/drivers/r600/r600_hw_context.c | 44 +++- src/gallium/drivers/r600/r600_pipe.c | 13 +- src/gallium/drivers/r600/r600_pipe.h | 2 +- src/gallium/drivers/r600/r600_texture.c| 2 +- src/gallium/drivers/r600/r600d.h | 16 ++ src/gallium/drivers/radeonsi/r600_hw_context.c | 2 +- .../drivers/radeonsi/r600_hw_context_priv.h| 2 +- src/gallium/drivers/radeonsi/r600_texture.c| 2 +- src/gallium/drivers/radeonsi/radeonsi_pipe.c | 13 +- src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 10 +- src/gallium/winsys/radeon/drm/radeon_drm_bo.h | 2 + src/gallium/winsys/radeon/drm/radeon_drm_cs.c | 270 + src/gallium/winsys/radeon/drm/radeon_drm_cs.h | 40 ++- src/gallium/winsys/radeon/drm/radeon_drm_winsys.c | 6 + src/gallium/winsys/radeon/drm/radeon_winsys.h | 28 ++- 21 files changed, 509 insertions(+), 133 deletions(-) diff --git a/src/gallium/drivers/r300/r300_emit.c b/src/gallium/drivers/r300/r300_emit.c index d1ed4b3..c824821 100644 --- a/src/gallium/drivers/r300/r300_emit.c +++ b/src/gallium/drivers/r300/r300_emit.c @@ -1184,7 +1184,8 @@ validate: assert(tex tex-buf cbuf is marked, but NULL!); r300-rws-cs_add_reloc(r300-cs, tex-cs_buf, RADEON_USAGE_READWRITE, -r300_surface(fb-cbufs[i])-domain); +r300_surface(fb-cbufs[i])-domain, +RADEON_RING_DMA); } /* ...depth buffer... 
*/ if (fb-zsbuf) { @@ -1192,7 +1193,8 @@ validate: assert(tex tex-buf zsbuf is marked, but NULL!); r300-rws-cs_add_reloc(r300-cs, tex-cs_buf, RADEON_USAGE_READWRITE, -r300_surface(fb-zsbuf)-domain); +r300_surface(fb-zsbuf)-domain, +RADEON_RING_DMA); } } if (r300-textures_state.dirty) { @@ -1204,18 +1206,21 @@ validate: tex = r300_resource(texstate-sampler_views[i]-base.texture); r300-rws-cs_add_reloc(r300-cs, tex-cs_buf, RADEON_USAGE_READ, -tex-domain); +tex-domain, +RADEON_RING_DMA); } } /* ...occlusion query buffer... */ if (r300-query_current) r300-rws-cs_add_reloc(r300-cs, r300-query_current-cs_buf, -RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT); +RADEON_USAGE_WRITE, RADEON_DOMAIN_GTT, +RADEON_RING_DMA); /* ...vertex buffer for SWTCL path... */ if (r300-vbo) r300-rws-cs_add_reloc(r300-cs, r300_resource(r300-vbo)-cs_buf, RADEON_USAGE_READ, -r300_resource(r300-vbo)-domain); +r300_resource(r300-vbo)-domain
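The relocation-tracking rule from the commit message can be sketched as a toy model. This is a sketch under stated assumptions, not the winsys code: `struct bo`, `struct ctx`, `flush_ring` and `add_reloc` are hypothetical names, only two rings are modelled, and "flush" just clears the bookkeeping instead of submitting anything to the kernel.

```c
#include <stdbool.h>

enum ring { RING_GFX = 0, RING_DMA = 1, RING_COUNT };

#define MAX_BOS 8

/* Hypothetical buffer object: remembers on which rings it currently has
 * an unflushed relocation. */
struct bo {
    bool on_ring[RING_COUNT];
};

/* Hypothetical context: the bos referenced by the pending command
 * streams, plus a flush counter per ring. */
struct ctx {
    struct bo *bos[MAX_BOS];
    unsigned nbos;
    unsigned flush_count[RING_COUNT];
};

/* "Flush" ring r: every bo loses its relocation on that ring, and bos
 * that are no longer referenced by any ring are dropped from the list. */
static void flush_ring(struct ctx *c, enum ring r)
{
    unsigned kept = 0;

    for (unsigned i = 0; i < c->nbos; i++) {
        struct bo *b = c->bos[i];

        b->on_ring[r] = false;
        if (b->on_ring[RING_GFX] || b->on_ring[RING_DMA])
            c->bos[kept++] = b;
    }
    c->nbos = kept;
    c->flush_count[r]++;
}

/* Add a relocation for bo on ring r. If the bo is still referenced on the
 * other ring, that ring is flushed first, so the same bo is never live on
 * both rings at once. Returns false if the tracking list is full. */
static bool add_reloc(struct ctx *c, struct bo *bo, enum ring r)
{
    enum ring other = (r == RING_GFX) ? RING_DMA : RING_GFX;
    bool known = false;

    if (bo->on_ring[other])
        flush_ring(c, other);

    for (unsigned i = 0; i < c->nbos; i++) {
        if (c->bos[i] == bo)
            known = true;
    }
    if (!known) {
        if (c->nbos == MAX_BOS)
            return false;
        c->bos[c->nbos++] = bo;
    }
    bo->on_ring[r] = true;
    return true;
}
```

Cross-ring ordering itself is left to the kernel semaphores mentioned in the commit message; the userspace side only guarantees that two unflushed command streams never name the same bo.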
Re: [Mesa-dev] [PATCH] r600g: add cs tracing infrastructure for lockup pin pointing
On Wed, Dec 19, 2012 at 12:17 PM, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com It's a build time option you need to set R600_TRACE_CS to 1 and it will print to stderr all cs along as cs trace point value which gave last offset into a cs process by the GPU. Signed-off-by: Jerome Glisse jgli...@redhat.com For information this is something i have been using for a while and i am just getting tire of porting it over and over so i cleaned it up into something that i believe is usefull. My rdb tools can be used to annotate cs output given by this infrastructure: rdb_annotateib hd2xxx.rdb dumpfile dumpfile.readablebyhuman I gave the the last dw before lockup. If you don't have many application running at the same time it has proven to be accurate most of the time. Note you will need the kernel patch i just sent. Cheers, Jerome --- src/gallium/drivers/r600/r600_hw_context.c | 41 + src/gallium/drivers/r600/r600_hw_context_priv.h | 5 +-- src/gallium/drivers/r600/r600_pipe.c| 20 src/gallium/drivers/r600/r600_pipe.h| 16 ++ src/gallium/drivers/r600/r600_state_common.c| 26 5 files changed, 106 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index cdd31a4..6c8cb9d 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -27,6 +27,7 @@ #include r600d.h #include util/u_memory.h #include errno.h +#include unistd.h /* Get backends mask */ void r600_get_backend_mask(struct r600_context *ctx) @@ -369,6 +370,11 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, for (i = 0; i R600_NUM_ATOMS; i++) { if (ctx-atoms[i] ctx-atoms[i]-dirty) { num_dw += ctx-atoms[i]-num_dw; +#if R600_TRACE_CS + if (ctx-screen-trace_bo) { + num_dw += R600_TRACE_CS_DWORDS; + } +#endif } } @@ -376,6 +382,11 @@ void r600_need_cs_space(struct r600_context *ctx, unsigned num_dw, /* The upper-bound of how much space a draw command would take. 
*/ num_dw += R600_MAX_FLUSH_CS_DWORDS + R600_MAX_DRAW_CS_DWORDS; +#if R600_TRACE_CS + if (ctx-screen-trace_bo) { + num_dw += R600_TRACE_CS_DWORDS; + } +#endif } /* Count in queries_suspend. */ @@ -717,7 +728,37 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags) } /* Flush the CS. */ +#if R600_TRACE_CS + if (ctx-screen-trace_bo) { + struct r600_screen *rscreen = ctx-screen; + unsigned i; + + for (i = 0; i cs-cdw; i++) { + fprintf(stderr, [%4d] [%5d] 0x%08x\n, rscreen-cs_count, i, cs-buf[i]); + } + rscreen-cs_count++; + } +#endif ctx-ws-cs_flush(ctx-cs, flags); +#if R600_TRACE_CS + if (ctx-screen-trace_bo) { + struct r600_screen *rscreen = ctx-screen; + unsigned i; + + for (i = 0; i 10; i++) { + usleep(5); + if (!ctx-ws-buffer_is_busy(rscreen-trace_bo-buf, RADEON_USAGE_READWRITE)) { + break; + } + } + if (i == 10) { + fprintf(stderr, timeout on cs lockup likely happen at cs %d dw %d\n, + rscreen-trace_ptr[1], rscreen-trace_ptr[0]); + } else { + fprintf(stderr, cs %d executed in %dms\n, rscreen-trace_ptr[1], i * 5); + } + } +#endif r600_begin_new_cs(ctx); } diff --git a/src/gallium/drivers/r600/r600_hw_context_priv.h b/src/gallium/drivers/r600/r600_hw_context_priv.h index 050c472..692e6ec 100644 --- a/src/gallium/drivers/r600/r600_hw_context_priv.h +++ b/src/gallium/drivers/r600/r600_hw_context_priv.h @@ -29,8 +29,9 @@ #include r600_pipe.h /* the number of CS dwords for flushing and drawing */ -#define R600_MAX_FLUSH_CS_DWORDS 12 -#define R600_MAX_DRAW_CS_DWORDS 34 +#define R600_MAX_FLUSH_CS_DWORDS 12 +#define R600_MAX_DRAW_CS_DWORDS34 +#define R600_TRACE_CS_DWORDS 7 /* these flags are used in register flags and added into block flags */ #define REG_FLAG_NEED_BO 1 diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index e497744..7990400 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -723,6 +723,12 @@ static void r600_destroy_screen(struct pipe_screen
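The archive stripped the quotes from the fprintf format strings in the hunk above, so here is a small standalone sketch of the per-dword dump line. It assumes the format was "[%4d] [%5d] 0x%08x\n" (cs number, dword index, dword value); `format_cs_dword` and `dump_cs` are hypothetical helpers, and %u is used instead of %d because the counters here are unsigned.

```c
#include <stdint.h>
#include <stdio.h>

/* Format one dump line: "[cs number] [dword index] 0xvalue". */
static int format_cs_dword(char *out, size_t len, unsigned cs_count,
                           unsigned index, uint32_t dword)
{
    return snprintf(out, len, "[%4u] [%5u] 0x%08x\n",
                    cs_count, index, (unsigned)dword);
}

/* Dump a whole command stream of cdw dwords to f, one dword per line. */
static void dump_cs(FILE *f, unsigned cs_count, const uint32_t *buf, unsigned cdw)
{
    char line[40]; /* the longest possible line is 37 characters plus NUL */

    for (unsigned i = 0; i < cdw; i++) {
        format_cs_dword(line, sizeof(line), cs_count, i, buf[i]);
        fputs(line, f);
    }
}
```

Calling dump_cs(stderr, cs_count, cs->buf, cs->cdw) right before the flush would give the kind of output the rdb_annotateib tool mentioned above is meant to annotate.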
Re: [Mesa-dev] [PATCH 2/2] r600g: texture buffer object + glsl 1.40 enable support
On Wed, Dec 19, 2012 at 12:33 PM, Tom Stellard t...@stellard.net wrote: On Sun, Dec 16, 2012 at 08:33:23PM +1000, Dave Airlie wrote: From: Dave Airlie airl...@redhat.com This adds TBO support to r600g, and with GLSL 1.40 enabled, we now get 3.1 core profiles advertised for r600g. This code is evergreen only so far, but I don't think there is much to make it work on r600/700/cayman other than testing. a) buffer txq is broken like cube map txq, this sucks, fix it the exact same way. b) buffer fetches are done with a vertex clause, c) vertex swizzling offsets are different than texture swizzles, but we still need to use the combiner, so make it configurable. d) add implementation of UCMP. TODO: r600/700/cayman testin Signed-off-by: Dave Airlie airl...@redhat.com --- src/gallium/drivers/r600/evergreen_state.c | 55 src/gallium/drivers/r600/r600_asm.c | 2 +- src/gallium/drivers/r600/r600_asm.h | 2 + src/gallium/drivers/r600/r600_pipe.c | 4 +- src/gallium/drivers/r600/r600_pipe.h | 10 +++- src/gallium/drivers/r600/r600_shader.c | 75 src/gallium/drivers/r600/r600_shader.h | 1 + src/gallium/drivers/r600/r600_state_common.c | 58 + src/gallium/drivers/r600/r600_texture.c | 16 -- 9 files changed, 204 insertions(+), 19 deletions(-) [snip] diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index feb7001..60667e7 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -3819,6 +3819,71 @@ static inline unsigned tgsi_tex_get_src_gpr(struct r600_shader_ctx *ctx, return ctx-file_offset[inst-Src[index].Register.File] + inst-Src[index].Register.Index; } +static int do_vtx_fetch_inst(struct r600_shader_ctx *ctx, boolean src_requires_loading) +{ + struct r600_bytecode_vtx vtx; + struct r600_bytecode_alu alu; + struct tgsi_full_instruction *inst = ctx-parse.FullToken.FullInstruction; + int src_gpr, r, i; + + src_gpr = tgsi_tex_get_src_gpr(ctx, 0); + if (src_requires_loading) { + for (i = 0; i 4; i++) { + 
memset(alu, 0, sizeof(struct r600_bytecode_alu)); + alu.inst = CTX_INST(V_SQ_ALU_WORD1_OP2_SQ_OP2_INST_MOV); + r600_bytecode_src(alu.src[0], ctx-src[0], i); + alu.dst.sel = ctx-temp_reg; + alu.dst.chan = i; + if (i == 3) + alu.last = 1; + alu.dst.write = 1; + r = r600_bytecode_add_alu(ctx-bc, alu); + if (r) + return r; + } + src_gpr = ctx-temp_reg; + } + + memset(vtx, 0, sizeof(vtx)); + vtx.inst = 0; + vtx.buffer_id = tgsi_tex_get_src_gpr(ctx, 1) + R600_MAX_CONST_BUFFERS;; + vtx.fetch_type = 2; /* VTX_FETCH_NO_INDEX_OFFSET */ + vtx.src_gpr = src_gpr; + vtx.mega_fetch_count = 16; + vtx.dst_gpr = ctx-file_offset[inst-Dst[0].Register.File] + inst-Dst[0].Register.Index; + vtx.dst_sel_x = (inst-Dst[0].Register.WriteMask 1) ? 0 : 7; /* SEL_X */ + vtx.dst_sel_y = (inst-Dst[0].Register.WriteMask 2) ? 1 : 7; /* SEL_Y */ + vtx.dst_sel_z = (inst-Dst[0].Register.WriteMask 4) ? 2 : 7; /* SEL_Z */ + vtx.dst_sel_w = (inst-Dst[0].Register.WriteMask 8) ? 3 : 7; /* SEL_W */ + vtx.use_const_fields = 1; + vtx.srf_mode_all = 1; /* SRF_MODE_NO_ZERO */ + According to the docs, srf_mode_all will be ignored if use_const_fields is set. However, based on my tests while running compute shaders, other fields like data_format, which are supposed to be ignored weren't being ignored unless the were set to zero. So, I think it would be safer here to set srf_mode_all to zero and make sure that bit gets set on the resource. + if ((r = r600_bytecode_add_vtx(ctx-bc, vtx))) + return r; + return 0; +} + Otherwise, this code for vtx fetch looks good to me. One problem I ran into with vtx fetch instructions while working on compute shaders was that the GPU will hang if you write to vtx.src_gpr in the instruction group following the vtx fetch. Here is a simple example: %T2_Xdef = MOV %ZERO %T3_Xdef = VTX_READ_eg %T2_Xkill, 24 %T2_Xdef = MOV %ZERO I'm not sure if this happens on all GPU variants, but I was able to consistently reproduce this on my SUMO. 
You may want to keep an eye out for this in case you run into any unexplainable hangs. Did the vtx fetch group have the barrier flag set? Cheers, Jerome
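For reference, the write-mask handling in the quoted do_vtx_fetch_inst hunk (the mask tests lost their & in the archive) can be sketched as below. The helper name and SEL_MASK are assumptions for illustration; the patch itself writes the four dst_sel fields out by hand with the literal 7.

```c
/* Assumed name for the destination select value 7, which the quoted
 * patch uses for components that must not be written. */
#define SEL_MASK 7u

/* Map a 4-bit TGSI write mask to per-channel destination selects:
 * an enabled channel selects itself (0 = X, 1 = Y, 2 = Z, 3 = W),
 * a disabled channel is masked out. */
static void dst_sel_from_writemask(unsigned writemask, unsigned dst_sel[4])
{
    for (unsigned chan = 0; chan < 4; chan++)
        dst_sel[chan] = (writemask & (1u << chan)) ? chan : SEL_MASK;
}
```

With a write mask of 0x5 (X and Z), this yields the selects 0, 7, 2, 7.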
Re: [Mesa-dev] [PATCH 1/2] r600g: rework flusing and synchronization pattern v4
On Sat, Dec 8, 2012 at 7:27 PM, Marek Olšák mar...@gmail.com wrote:
> Hi Jerome,
>
> I'm okay with the simplification of r600_flush_emit, I'm not so okay with some other things. There's also some cruft unrelated to flushing.
>
> 1) R600_CONTEXT_FLUSH could have a better name, because it's not clear what it does. (it looks like it only flushed read-only bindings)

GPU_FLUSH ?

> 2) Don't use magic numbers when setting cp_coher_cntl unless you want to hide something from us / obfuscating the code. :)
>
> 3) The definition of R600_MAX_FLUSH_CS_DWORDS should be updated.

Yes, I haven't recomputed the worst case.

> 4) SURFACE_BASE_UPDATE is emitted twice in emit_framebuffer_state. I don't think splitting one packet into two packets doing the same thing is needed.

It's needed. A couple of r6xx/r7xx GPUs lock up after a couple of hours of stress testing otherwise; I wasn't seeing the lockup with it.

> 5) RS780 and RS880 don't need SURFACE_BASE_UPDATE for streamout. Their streamout hardware was actually copied from R700. Doing CHIP_RS780 instead of CHIP_RV770 was correct. The same for r600_flush_emit.

fglrx mostly does the same on r7xx and r6xx for streamout. As I am not sure I have any stress test for that, I side with fglrx.

> 6) In r600_context_flush, don't remove the comment about flushing framebuffer caches, because it's still done there.
>
> 7) Masking out R600_CONTEXT_FLUSH in r600_context_emit_fence is not correct. We should still flush the caches later if they're dirty and even if the fence was emitted. You can't see this regression in piglit, because we don't have a test for that.

True

> 8) There's some inconsistent flushing between graphics and compute colorbuffer bindings. For graphics, you use (WAIT_IDLE | FLUSH_AND_INV), which makes sense. For compute, you use R600_CONTEXT_FLUSH (which is used for vertex buffers and the like elsewhere, but not colorbuffers).

I haven't paid much attention to the compute side; I should probably look at it.
And one question: Why do you use set both FLUSH_AND_INV and STREAMOUT_FLUSH on Evergreen, while r600 only gets FLUSH_AND_INV? Did you overlook this? No, just matching fglrx pattern, i don't think i tested without that change, but it definitly match fglrx. Cheers, Jerome Marek On Thu, Dec 6, 2012 at 8:51 PM, j.gli...@gmail.com wrote: From: Jerome Glisse jgli...@redhat.com This bring r600g allmost inline with closed source driver when it comes to flushing and synchronization pattern. Signed-off-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r600/evergreen_compute.c | 8 +- .../drivers/r600/evergreen_compute_internal.c | 4 +- src/gallium/drivers/r600/evergreen_state.c | 4 +- src/gallium/drivers/r600/r600.h| 16 +-- src/gallium/drivers/r600/r600_hw_context.c | 154 - src/gallium/drivers/r600/r600_state.c | 18 ++- src/gallium/drivers/r600/r600_state_common.c | 19 ++- 7 files changed, 61 insertions(+), 162 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 44831a7..33a5910 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -98,7 +98,7 @@ static void evergreen_cs_set_vertex_buffer( /* The vertex instructions in the compute shaders use the texture cache, * so we need to invalidate it. */ - rctx-flags |= R600_CONTEXT_TEX_FLUSH; + rctx-flags |= R600_CONTEXT_FLUSH; state-enabled_mask |= 1 vb_index; state-dirty_mask |= 1 vb_index; state-atom.dirty = true; @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout, */ r600_emit_command_buffer(ctx-cs, ctx-start_compute_cs_cmd); - ctx-flags |= R600_CONTEXT_CB_FLUSH; + ctx-flags |= R600_CONTEXT_FLUSH; r600_flush_emit(ctx); /* Emit colorbuffers. 
*/ @@ -409,7 +409,7 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout, /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x */ - ctx-flags |= R600_CONTEXT_CB_FLUSH; + ctx-flags |= R600_CONTEXT_FLUSH; r600_flush_emit(ctx); #if 0 @@ -468,7 +468,7 @@ void evergreen_emit_cs_shader( r600_write_value(cs, r600_context_bo_reloc(rctx, kernel-code_bo, RADEON_USAGE_READ)); - rctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH; + rctx-flags |= R600_CONTEXT_FLUSH; } static void evergreen_launch_grid( diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c b/src/gallium/drivers/r600/evergreen_compute_internal.c index 7bc7fb4..187bcf1 100644 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c @@ -538,7 +538,7 @@ void
Re: [Mesa-dev] Proposal: allow hidden security bugs on Mesa's Bugzilla
On Fri, Nov 30, 2012 at 7:43 AM, Benoit Jacob bja...@mozilla.com wrote:
> On 12-11-23 02:21 PM, Benoit Jacob wrote:
>> On 12-11-21 12:48 PM, Chad Versace wrote:
>>> On 11/20/2012 09:29 AM, Benoit Jacob wrote:
>>>> Any questions? Do you support or oppose me asking FD.o admins to allow hidden bugs on Mesa's bugzilla?
>>>>
>>>> Benoit
>>>
>>> I support this. It seems a sensible proposal for addressing security bugs.
>>
>> Thanks. I have just sent the request to FD.o admins.
>>
>> Benoit
>
> This option is now turned on on Bugzilla. See the new checkbox: Mesa Security Group
>
> Thanks!
> Benoit

How does one get into the security group?

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Thu, Nov 01, 2012 at 03:13:31AM +0100, Marek Olšák wrote:
> On Thu, Nov 1, 2012 at 2:13 AM, Alex Deucher alexdeuc...@gmail.com wrote:
>> On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák mar...@gmail.com wrote:
>>> The problem was we set VRAM|GTT for relocations of STATIC resources. Setting just VRAM increases the framerate 4 times on my machine.
>>>
>>> I rewrote the switch statement and adjusted the domains for window framebuffers too.
>>
>> Reviewed-by: Alex Deucher alexander.deuc...@amd.com
>>
>> Stable branches?
>
> Yes, good idea.
>
> Marek

Btw, as a follow-up on this, I did some experiments with ttm and eviction. Blocking any vram eviction improves average fps (20-30%) and minimum fps (40-60%), but it diminishes maximum fps (100%). Overall, blocking eviction just makes the framerate more consistent.

I then tried several heuristics on the eviction process (not evicting a buffer if it was used in the last 1ms, 10ms, 20ms ..., sorting the lru differently between buffers used for rendering and auxiliary buffers used by the kernel, ...); none of those heuristics improved anything. I also removed the bo wait in the eviction pipeline, but still no improvement.

I haven't had time to look further, but the bottom line is that some benchmarks are memory tight and constant eviction hurts. (I used unigine heaven and reaction quake as benchmarks.)

Cheers,
Jerome
Re: [Mesa-dev] R600 tiling halves the frame rate
On Tue, Oct 30, 2012 at 8:49 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
> On 10/30/2012 05:20 PM, Tzvetan Mikov wrote:
>> Thanks a lot! I reproduced the same results here and I think I have
>> figured out what the problem is. The frame buffer is always created in
>> linear mode. The temporary hack included below doubles the performance
>> for me with EGL. Could you please check if it has the same result for
>> you?
>>
>> If it does, what would be the next step to address this? I guess I
>> could try to prepare a real patch to fix this, as soon as I figure the
>> right way to do it... :-) I am new to Mesa, but I am making my way
>> through the code base.
>>
>> regards,
>> Tzvetan
>>
>> commit 10bb3497caba1655022a53a3a04c81be6e122faa
>> Author: Tzvetan Mikov tmi...@jupiter.com
>> Date:   Tue Oct 30 17:12:42 2012 -0700
>>
>>     r600_texture.c: HACK to enforce tiling in the default case
>>
>> diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
>> index 85e4e0c..f415de3 100644
>> --- a/src/gallium/drivers/r600/r600_texture.c
>> +++ b/src/gallium/drivers/r600/r600_texture.c
>> @@ -450,7 +450,7 @@ struct pipe_resource *r600_texture_create(struct pipe_screen *screen,
>>  {
>>  	struct r600_screen *rscreen = (struct r600_screen*)screen;
>>  	struct radeon_surface surface;
>> -	unsigned array_mode = 0;
>> +	unsigned array_mode = V_038000_ARRAY_1D_TILED_THIN1;
>>  	int r;
>>
>>  	if (!(templ->flags & R600_RESOURCE_FLAG_TRANSFER)) {
>
> I just noticed that with this hack the display doesn't look quite
> right, so while it hopefully points in the right direction, the real
> fix is likely to be much more involved. My enthusiasm may have been
> premature :-)
>
> regards,
> Tzvetan

For it to look right, we need Mesa to call into the kernel to tell it
what the bo tiling format is. We should do that for scanout buffers.
This will fix your issue, and you probably want 2D tiled, not 1D, for
scanout.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: fix abysmal performance in Reaction Quake
On Wed, Oct 31, 2012 at 8:05 PM, Marek Olšák mar...@gmail.com wrote:
> The problem was we set VRAM|GTT for relocations of STATIC resources.
> Setting just VRAM increases the framerate 4 times on my machine.
>
> I rewrote the switch statement and adjusted the domains for window
> framebuffers too.

Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/r600_buffer.c  | 42 ---
>  src/gallium/drivers/r600/r600_texture.c |  3 ++-
>  2 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_buffer.c b/src/gallium/drivers/r600/r600_buffer.c
> index f4566ee..116ab51 100644
> --- a/src/gallium/drivers/r600/r600_buffer.c
> +++ b/src/gallium/drivers/r600/r600_buffer.c
> @@ -206,29 +206,31 @@ bool r600_init_resource(struct r600_screen *rscreen,
>  {
>  	uint32_t initial_domain, domains;
>
> -	/* Staging resources particpate in transfers and blits only
> -	 * and are used for uploads and downloads from regular
> -	 * resources. We generate them internally for some transfers.
> -	 */
> -	if (usage == PIPE_USAGE_STAGING) {
> +	switch(usage) {
> +	case PIPE_USAGE_STAGING:
> +		/* Staging resources participate in transfers, i.e. are used
> +		 * for uploads and downloads from regular resources.
> +		 * We generate them internally for some transfers.
> +		 */
> +		initial_domain = RADEON_DOMAIN_GTT;
>  		domains = RADEON_DOMAIN_GTT;
> +		break;
> +	case PIPE_USAGE_DYNAMIC:
> +	case PIPE_USAGE_STREAM:
> +		/* Default to GTT, but allow the memory manager to move it to VRAM. */
>  		initial_domain = RADEON_DOMAIN_GTT;
> -	} else {
>  		domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
> -
> -		switch(usage) {
> -		case PIPE_USAGE_DYNAMIC:
> -		case PIPE_USAGE_STREAM:
> -		case PIPE_USAGE_STAGING:
> -			initial_domain = RADEON_DOMAIN_GTT;
> -			break;
> -		case PIPE_USAGE_DEFAULT:
> -		case PIPE_USAGE_STATIC:
> -		case PIPE_USAGE_IMMUTABLE:
> -		default:
> -			initial_domain = RADEON_DOMAIN_VRAM;
> -			break;
> -		}
> +		break;
> +	case PIPE_USAGE_DEFAULT:
> +	case PIPE_USAGE_STATIC:
> +	case PIPE_USAGE_IMMUTABLE:
> +	default:
> +		/* Don't list GTT here, because the memory manager would put some
> +		 * resources to GTT no matter what the initial domain is.
> +		 * Not listing GTT in the domains improves performance a lot. */
> +		initial_domain = RADEON_DOMAIN_VRAM;
> +		domains = RADEON_DOMAIN_VRAM;
> +		break;
>  	}
>
>  	res->buf = rscreen->ws->buffer_create(rscreen->ws, size, alignment, bind, initial_domain);
> diff --git a/src/gallium/drivers/r600/r600_texture.c b/src/gallium/drivers/r600/r600_texture.c
> index 785eeff..2df390d 100644
> --- a/src/gallium/drivers/r600/r600_texture.c
> +++ b/src/gallium/drivers/r600/r600_texture.c
> @@ -421,9 +421,10 @@ r600_texture_create_object(struct pipe_screen *screen,
>  			return NULL;
>  		}
>  	} else if (buf) {
> +		/* This is usually the window framebuffer. We want it in VRAM, always. */
>  		resource->buf = buf;
>  		resource->cs_buf = rscreen->ws->buffer_get_cs_handle(buf);
> -		resource->domains = RADEON_DOMAIN_GTT | RADEON_DOMAIN_VRAM;
> +		resource->domains = RADEON_DOMAIN_VRAM;
>  	}
>
>  	if (rtex->cmask_size) {
> --
> 1.7.9.5
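The placement policy the patch ends up with can be read in isolation from the driver. Below is a minimal standalone sketch of the same decision table. The `sketch_*` names, the enum, and the numeric domain values are illustrative stand-ins, not the real `PIPE_USAGE_*` or `RADEON_DOMAIN_*` definitions; only the mapping from usage to (initial domain, allowed domains) follows the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the usage hints handled by the patch. */
enum sketch_usage {
    SKETCH_USAGE_DEFAULT,
    SKETCH_USAGE_STATIC,
    SKETCH_USAGE_IMMUTABLE,
    SKETCH_USAGE_DYNAMIC,
    SKETCH_USAGE_STREAM,
    SKETCH_USAGE_STAGING
};

/* Illustrative bit values; the real ones come from the radeon headers. */
#define SKETCH_DOMAIN_GTT  0x2u
#define SKETCH_DOMAIN_VRAM 0x4u

struct sketch_placement {
    uint32_t initial_domain; /* where the buffer is first allocated */
    uint32_t domains;        /* where relocations allow it to live */
};

static struct sketch_placement sketch_pick_domains(enum sketch_usage usage)
{
    struct sketch_placement p;

    switch (usage) {
    case SKETCH_USAGE_STAGING:
        /* Upload/download helpers: system memory only. */
        p.initial_domain = SKETCH_DOMAIN_GTT;
        p.domains = SKETCH_DOMAIN_GTT;
        break;
    case SKETCH_USAGE_DYNAMIC:
    case SKETCH_USAGE_STREAM:
        /* Start in GTT, but let the memory manager move it to VRAM. */
        p.initial_domain = SKETCH_DOMAIN_GTT;
        p.domains = SKETCH_DOMAIN_GTT | SKETCH_DOMAIN_VRAM;
        break;
    case SKETCH_USAGE_DEFAULT:
    case SKETCH_USAGE_STATIC:
    case SKETCH_USAGE_IMMUTABLE:
    default:
        /* VRAM only: GTT is deliberately not listed as an allowed domain. */
        p.initial_domain = SKETCH_DOMAIN_VRAM;
        p.domains = SKETCH_DOMAIN_VRAM;
        break;
    }
    return p;
}
```

The point of the change is the last case: before the patch, static resources were allowed in VRAM|GTT, which let the memory manager place them in GTT regardless of the initial domain.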
Re: [Mesa-dev] R600 tiling halves the frame rate
On Tue, Oct 30, 2012 at 10:43 AM, Tzvetan Mikov tmi...@jupiter.com wrote:
> On 10/30/2012 07:12 AM, Patrick Baggett wrote:
>> Is your screen refresh rate 70 Hz? Because if so, that means that it's
>> syncing to the vblank on Mesa, and not doing so on the proprietary
>> one.
>
> Unfortunately no. In fact the Gallium EGL/R600 doesn't support flip on
> vsync at all - eglSwapInterval is always 0. The output is a standard
> 60Hz LCD, plus I do get different (but still low in absolute terms)
> frame rates with different chips. Off the top of my head:
>
> - HD5430 - 120 FPS
> - HD6450 - 140 FPS
> - HD6460 - 70 FPS
> - HD6750 - 400 FPS
> - HD6760 - 240 FPS
>
> I do think there is something fishy with the page flip though, which I
> am planning to investigate today. It is way too slow - a render loop
> which does nothing but a eglSwapBuffers() (no actual rendering
> whatsoever) runs at only 350 FPS. It should be either 60FPS, or
> thousands.
>
> regards,
> Tzvetan

So I tested, and it's something inside EGL that leads to this. Running
the same program as yours with GLUT on X11 and 2D tiling enabled, 2D
color tiling has a slight advantage: 140fps vs 137fps (windowed, so
there is a blit, which would account for a huge chunk of the perf
difference with fglrx). However, using EGL I got 70fps with color tiling
and 74fps without. So something in EGL is slowing things down.

Cheers,
Jerome
Re: [Mesa-dev] R600 tiling halves the frame rate
On Fri, Oct 26, 2012 at 8:07 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
> Hi,
>
> I have been running tests with Mesa 9.0 and Radeon R600 (Radeon HD
> 6460) and I accidentally noticed that a small hack I did to disable
> texture tiling actually *doubles* the frame rate. With different chips
> (e.g. 6750) the difference is less pronounced, but in all cases texture
> tiling decreased the performance noticeably in my tests.
>
> Can anyone shed some light on this? Is this by design - e.g. is this a
> case of "we know that tiling is currently slower than linear but the
> huge payoff is scheduled to arrive in a future revision"?
>
> Thanks!
> Tzvetan

No, in all the benchmarks I ran on various GPUs from hd2xxx to hd6xxx,
tiling always gave a performance boost of between 5% and 20%.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: avoid shader needing too many gpr to lockup the gpu
On Fri, Oct 26, 2012 at 10:01 PM, j.gli...@gmail.com wrote:
> From: Jerome Glisse jgli...@redhat.com
>
> On r6xx/r7xx, shader resource management needs to make sure that the
> shader does not go over the gpr register limit. Each specific asic has
> a maximum number of registers that can be split between shader stages.
> For each stage, the shader must not use more registers than the limit
> programmed.
>
> Signed-off-by: Jerome Glisse jgli...@redhat.com

I haven't yet fully tested it on a wide range of GPUs, but it fixes the
piglit cases that were locking up, so one can directly use
quick-drivers. I would mostly like feedback on whether we should print a
warning when we discard a draw command because a shader exceeds the
limit.

Note that with this patch the tests that were locking up fail, but with
a simple patch on top of it (decreasing clause temp gpr to 2) they pass.

Regards,
Jerome

> ---
>  src/gallium/drivers/r600/r600_pipe.h         |  1 +
>  src/gallium/drivers/r600/r600_state.c        | 60 +++-
>  src/gallium/drivers/r600/r600_state_common.c | 22 +-
>  3 files changed, 55 insertions(+), 28 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
> index ff2a5fd..2045af3 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -363,6 +363,7 @@ struct r600_context {
>  	enum chip_class			chip_class;
>  	boolean				has_vertex_cache;
>  	boolean				keep_tiling_flags;
> +	bool				discard_draw;
>  	unsigned			default_ps_gprs, default_vs_gprs;
>  	unsigned			r6xx_num_clause_temp_gprs;
>  	unsigned			backend_mask;
> diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c
> index 7d07008..43af934 100644
> --- a/src/gallium/drivers/r600/r600_state.c
> +++ b/src/gallium/drivers/r600/r600_state.c
> @@ -2189,30 +2189,54 @@ void r600_init_state_functions(struct r600_context *rctx)
>  /* Adjust GPR allocation on R6xx/R7xx */
>  void r600_adjust_gprs(struct r600_context *rctx)
>  {
> -	unsigned num_ps_gprs = rctx->default_ps_gprs;
> -	unsigned num_vs_gprs = rctx->default_vs_gprs;
> +	unsigned num_ps_gprs = rctx->ps_shader->current->shader.bc.ngpr;
> +	unsigned num_vs_gprs = rctx->vs_shader->current->shader.bc.ngpr;
> +	unsigned new_num_ps_gprs = num_ps_gprs;
> +	unsigned new_num_vs_gprs = num_vs_gprs;
> +	unsigned cur_num_ps_gprs = G_008C04_NUM_PS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
> +	unsigned cur_num_vs_gprs = G_008C04_NUM_VS_GPRS(rctx->config_state.sq_gpr_resource_mgmt_1);
> +	unsigned def_num_ps_gprs = rctx->default_ps_gprs;
> +	unsigned def_num_vs_gprs = rctx->default_vs_gprs;
> +	unsigned def_num_clause_temp_gprs = rctx->r6xx_num_clause_temp_gprs;
> +	/* hardware will reserve twice num_clause_temp_gprs */
> +	unsigned max_gprs = def_num_ps_gprs + def_num_vs_gprs + def_num_clause_temp_gprs * 2;
>  	unsigned tmp;
> -	int diff;
>
> -	if (rctx->ps_shader->current->shader.bc.ngpr > rctx->default_ps_gprs) {
> -		diff = rctx->ps_shader->current->shader.bc.ngpr - rctx->default_ps_gprs;
> -		num_vs_gprs -= diff;
> -		num_ps_gprs += diff;
> +	/* the sum of all SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS must <= to max_gprs */
> +	if (new_num_ps_gprs > cur_num_ps_gprs || new_num_vs_gprs > cur_num_vs_gprs) {
> +		/* try to use switch back to default */
> +		if (new_num_ps_gprs > def_num_ps_gprs || new_num_vs_gprs > def_num_vs_gprs) {
> +			/* always privilege vs stage so that at worst we have the
> +			 * pixel stage producing wrong output (not the vertex
> +			 * stage) */
> +			new_num_ps_gprs = max_gprs - (new_num_vs_gprs + def_num_clause_temp_gprs * 2);
> +			new_num_vs_gprs = num_vs_gprs;
> +		} else {
> +			new_num_ps_gprs = def_num_ps_gprs;
> +			new_num_vs_gprs = def_num_vs_gprs;
> +		}
> +	} else {
> +		rctx->discard_draw = false;
> +		return;
>  	}
>
> -	if (rctx->vs_shader->current->shader.bc.ngpr > rctx->default_vs_gprs)
> -	{
> -		diff = rctx->vs_shader->current->shader.bc.ngpr - rctx->default_vs_gprs;
> -		num_ps_gprs -= diff;
> -		num_vs_gprs += diff;
> +	/* SQ_PGM_RESOURCES_*.NUM_GPRS must always be program to a value <=
> +	 * SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS otherwise the GPU will lockup
> +	 * Also if a shader use more gpr than SQ_GPR_RESOURCE_MGMT*.NUM_*_GPRS
> +	 * it will lockup. So in this case just discard the draw command
> +	 * and don't change the current gprs repartitions.
> +	 */
> +	rctx->discard_draw = false
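The quoted diff is cut off before the final check, so the following is only a sketch of the partitioning rule as the comments in the patch describe it, not the code that was committed. All `sketch_*` names are hypothetical. Two things are assumptions on my side: the final "discard if a shader still does not fit" test is inferred from the last comment in the diff, and the guard against the vertex stage alone exceeding the budget is added here to avoid unsigned underflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-ASIC defaults (hypothetical container, values chosen by the caller). */
struct sketch_gpr_config {
    unsigned def_ps;      /* default pixel-shader GPR allocation */
    unsigned def_vs;      /* default vertex-shader GPR allocation */
    unsigned clause_temp; /* clause temporaries; hardware reserves twice this */
};

struct sketch_gpr_split {
    unsigned ps;
    unsigned vs;
    bool discard_draw; /* set when the shaders cannot fit in any split */
};

static struct sketch_gpr_split
sketch_adjust_gprs(const struct sketch_gpr_config *cfg,
                   struct sketch_gpr_split cur,
                   unsigned need_ps, unsigned need_vs)
{
    unsigned max_gprs = cfg->def_ps + cfg->def_vs + cfg->clause_temp * 2;
    unsigned new_ps, new_vs;

    cur.discard_draw = false;

    /* Both shaders fit in the current split: leave it alone. */
    if (need_ps <= cur.ps && need_vs <= cur.vs)
        return cur;

    if (need_ps > cfg->def_ps || need_vs > cfg->def_vs) {
        /* Favor the vertex stage and give the pixel stage what is left. */
        unsigned reserved = need_vs + cfg->clause_temp * 2;

        if (reserved > max_gprs) { /* assumption: guard against underflow */
            cur.discard_draw = true;
            return cur;
        }
        new_vs = need_vs;
        new_ps = max_gprs - reserved;
    } else {
        /* The defaults are enough: switch back to them. */
        new_ps = cfg->def_ps;
        new_vs = cfg->def_vs;
    }

    /* A shader that exceeds its limit would lock up the GPU, so the draw
     * is discarded and the current split is kept unchanged. */
    if (need_ps > new_ps || need_vs > new_vs) {
        cur.discard_draw = true;
        return cur;
    }

    cur.ps = new_ps;
    cur.vs = new_vs;
    return cur;
}
```

With defaults of 100 PS, 50 VS and 4 clause temporaries, the budget is 158 registers: a 120/30 request is satisfied by shifting registers from VS to PS, while a 130/30 request cannot fit and the draw is dropped.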
Re: [Mesa-dev] R600 tiling halves the frame rate
On Fri, Oct 26, 2012 at 10:26 PM, Tzvetan Mikov tmi...@jupiter.com wrote:
> -----Original Message-----
> From: Jerome Glisse
>>> Can anyone shed some light on this? Is this by design - e.g. is this
>>> a case of "we know that tiling is currently slower than linear but
>>> the huge payoff is scheduled to arrive in a future revision"?
>>>
>>> Thanks!
>>> Tzvetan
>>
>> No, in all benchmark i made on various gpu from hd2xxx to hd6xxx
>> tiling always gave a performance boost btw 5% up to 20%.
>
> This is interesting. All I am doing is rotating a big texture on the
> screen. I am using EGL+Gallium, so it is as simple as it gets. The hack
> I am using to disable texture tiling is also extremely simple (see
> below). It speeds up the FPS measurably, up to the extreme case of
> doubling it on HD6460.
>
> What am I missing?
>
> Regards,
> Tzvetan

Could you provide a simple GL demo, or point to one, that shows the same
behavior with your patch? That way I have something to tell whether I am
reproducing it or not.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 01/14] r600g: remove the atom variable from r600_command_buffer
On Sun, Oct 07, 2012 at 08:08:03PM +0200, Marek Olšák wrote:
> r600_command_buffer is not an atom. The atoms have evolved into state
> slots (or groups of state slots) where you can bind states. There is a
> fixed amount of atoms (state slots) in the context.
>
> The command buffers are nothing like that. They represent states, not
> state slots.
>
> We could probably give r600_atom a better name someday.

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

> ---
>  src/gallium/drivers/r600/evergreen_compute.c |  4 +--
>  src/gallium/drivers/r600/evergreen_state.c   |  4 +--
>  src/gallium/drivers/r600/r600_hw_context.c   |  4 +--
>  src/gallium/drivers/r600/r600_pipe.h         | 44 +++---
>  src/gallium/drivers/r600/r600_state.c        |  2 +-
>  src/gallium/drivers/r600/r600_state_common.c | 13 +---
>  6 files changed, 34 insertions(+), 37 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
> index b7c7345..abd5b3c 100644
> --- a/src/gallium/drivers/r600/evergreen_compute.c
> +++ b/src/gallium/drivers/r600/evergreen_compute.c
> @@ -329,7 +329,7 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout,
>  	 * See evergreen_init_atom_start_compute_cs() in this file for the list
>  	 * of registers initialized by the start_compute_cs_cmd atom.
>  	 */
> -	r600_emit_atom(ctx, &ctx->start_compute_cs_cmd.atom);
> +	r600_emit_command_buffer(ctx->cs, &ctx->start_compute_cs_cmd);
>
>  	ctx->flags |= R600_CONTEXT_CB_FLUSH;
>  	r600_flush_emit(ctx);
> @@ -625,7 +625,7 @@ void evergreen_init_atom_start_compute_cs(struct r600_context *ctx)
>  	/* since all required registers are initialised in the
>  	 * start_compute_cs_cmd atom, we can EMIT_EARLY here.
>  	 */
> -	r600_init_command_buffer(ctx, cb, 1, 256);
> +	r600_init_command_buffer(cb, 256);
>  	cb->pkt_flags = RADEON_CP_PACKET3_COMPUTE_MODE;
>
>  	switch (ctx->family) {
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index e35314f..a073021 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -2373,7 +2373,7 @@ static void cayman_init_atom_start_cs(struct r600_context *rctx)
>  {
>  	struct r600_command_buffer *cb = &rctx->start_cs_cmd;
>
> -	r600_init_command_buffer(rctx, cb, 0, 256);
> +	r600_init_command_buffer(cb, 256);
>
>  	/* This must be first. */
>  	r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
> @@ -2774,7 +2774,7 @@ void evergreen_init_atom_start_cs(struct r600_context *rctx)
>  		return;
>  	}
>
> -	r600_init_command_buffer(rctx, cb, 0, 256);
> +	r600_init_command_buffer(cb, 256);
>
>  	/* This must be first. */
>  	r600_store_value(cb, PKT3(PKT3_CONTEXT_CONTROL, 1, 0));
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
> index 8245059..723039a 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -815,7 +815,7 @@ void r600_context_flush(struct r600_context *ctx, unsigned flags)
>  {
>  	struct radeon_winsys_cs *cs = ctx->cs;
>
> -	if (cs->cdw == ctx->start_cs_cmd.atom.num_dw)
> +	if (cs->cdw == ctx->start_cs_cmd.num_dw)
>  		return;
>
>  	ctx->timer_queries_suspended = false;
> @@ -875,7 +875,7 @@ void r600_begin_new_cs(struct r600_context *ctx)
>  	ctx->flags = 0;
>
>  	/* Begin a new CS. */
> -	r600_emit_atom(ctx, &ctx->start_cs_cmd.atom);
> +	r600_emit_command_buffer(ctx->cs, &ctx->start_cs_cmd);
>
>  	/* Re-emit states. */
>  	r600_atom_dirty(ctx, &ctx->alphatest_state.atom);
> diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h
> index 607116f..be7b891 100644
> --- a/src/gallium/drivers/r600/r600_pipe.h
> +++ b/src/gallium/drivers/r600/r600_pipe.h
> @@ -59,8 +59,8 @@ struct r600_atom {
>  /* This is an atom containing GPU commands that never change.
>   * This is supposed to be copied directly into the CS. */
>  struct r600_command_buffer {
> -	struct r600_atom atom;
>  	uint32_t *buf;
> +	unsigned num_dw;
>  	unsigned max_num_dw;
>  	unsigned pkt_flags;
>  };
> @@ -504,6 +504,14 @@ struct r600_context {
>  	int			last_start_instance;
>  };
>
> +static INLINE void r600_emit_command_buffer(struct radeon_winsys_cs *cs,
> +					    struct r600_command_buffer *cb)
> +{
> +	assert(cs->cdw + cb->num_dw <= RADEON_MAX_CMDBUF_DWORDS);
> +	memcpy(cs->buf + cs->cdw, cb->buf, 4 * cb->num_dw);
> +	cs->cdw += cb->num_dw;
> +}
> +
>  static INLINE void r600_emit_atom(struct r600_context *rctx, struct r600_atom *atom)
>  {
>  	atom->emit(rctx, atom);
> @@ -696,15 +704,15 @@ unsigned r600_tex_compare(unsigned compare);
>
>  static INLINE void r600_store_value(struct r600_command_buffer *cb, unsigned
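The new `r600_emit_command_buffer` helper in the quoted patch is just a bounds-checked copy of prebuilt dwords into the command stream. A minimal standalone sketch of that idea follows. The `sketch_*` types and the 16-dword limit are made up for illustration; the real structures are `radeon_winsys_cs` and `r600_command_buffer`, and the real limit is `RADEON_MAX_CMDBUF_DWORDS`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_CMDBUF_DWORDS 16 /* illustrative limit only */

/* Command stream being built for submission. */
struct sketch_cs {
    uint32_t buf[SKETCH_MAX_CMDBUF_DWORDS];
    unsigned cdw; /* number of dwords written so far */
};

/* Prebuilt, never-changing block of GPU commands. */
struct sketch_command_buffer {
    const uint32_t *buf;
    unsigned num_dw;
};

/* Append the whole prebuilt block to the command stream. */
static void sketch_emit_command_buffer(struct sketch_cs *cs,
                                       const struct sketch_command_buffer *cb)
{
    assert(cs->cdw + cb->num_dw <= SKETCH_MAX_CMDBUF_DWORDS);
    memcpy(cs->buf + cs->cdw, cb->buf, sizeof(uint32_t) * cb->num_dw);
    cs->cdw += cb->num_dw;
}
```

Because the block is state rather than a state slot, it needs no dirty tracking and no `emit` callback, which is why the patch can drop the embedded `r600_atom` and keep only `num_dw`.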
Re: [Mesa-dev] [PATCH] r600g: add in-place DB decompression and texturing with DB tiling
On Wed, Oct 3, 2012 at 5:50 PM, Marek Olšák mar...@gmail.com wrote:
> The decompression is done in-place and only the compressed tiles are
> decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.
>
> The texture unit is programmed to use non-displayable tiling and depth
> ordering of samples, so that it can fetch the texture in the native DB
> format.
>
> The latest version of the libdrm surface allocator is required for
> stencil texturing to work. The old one didn't create the mipmap tree
> correctly. We need a separate mipmap tree for stencil, because the
> stencil mipmap offsets are not really depth offsets/4.
>
> The DB->CB copy is still used for transfers.
> ---
>
> I sent the libdrm patches a few minutes ago. I guess I will have to
> make another libdrm release.
>
> What's good about this is that it improves performance by 4-5% with the
> 1024x768 resolution in Lightsmark on Evergreen. However, the larger the
> resolution, the smaller the improvement is (something else becomes the
> bottleneck). It also reduces the memory requirements for depth textures
> by 50%, because the flushed depth texture isn't needed anymore.
>
> The catch is fetching the 4th stencil mipmap level gives wrong pixels
> in one not-yet-committed test. What's weird is that all the other
> mipmaps (both smaller and larger) are fetched correctly. That bug has
> yet to be fixed, but who is using a stencil buffer with mipmaps anyway?
> :)

This 4th level might be the usual switching point between 2D tiled and
1D tiled, i.e. we think the hw is still using 2D while it has switched
to 1D (or the other way around).

Otherwise reviewed.

Cheers,
Jerome

>  src/gallium/auxiliary/util/u_blitter.c             |  3 +-
>  .../drivers/r600/evergreen_compute_internal.c      |  6 +-
>  src/gallium/drivers/r600/evergreen_state.c         | 92 +++-
>  src/gallium/drivers/r600/evergreend.h              | 10 ++-
>  src/gallium/drivers/r600/r600_blit.c               | 89 ---
>  src/gallium/drivers/r600/r600_pipe.h               |  1 +
>  src/gallium/drivers/r600/r600_resource.h           | 10 ++-
>  src/gallium/drivers/r600/r600_state.c              | 13 +--
>  src/gallium/drivers/r600/r600_texture.c            | 60 -
>  9 files changed, 216 insertions(+), 68 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_blitter.c b/src/gallium/auxiliary/util/u_blitter.c
> index 4ad7a6b..86109f0 100644
> --- a/src/gallium/auxiliary/util/u_blitter.c
> +++ b/src/gallium/auxiliary/util/u_blitter.c
> @@ -1602,7 +1602,8 @@ void util_blitter_custom_depth_stencil(struct blitter_context *blitter,
>  	blitter_disable_render_cond(ctx);
>
>  	/* bind states */
> -	pipe->bind_blend_state(pipe, ctx->blend[PIPE_MASK_RGBA]);
> +	pipe->bind_blend_state(pipe, cbsurf ? ctx->blend[PIPE_MASK_RGBA] :
> +				     ctx->blend[0]);
>  	pipe->bind_depth_stencil_alpha_state(pipe, dsa_stage);
>  	ctx->bind_fs_state(pipe, blitter_get_fs_col(ctx, 0, FALSE));
>  	pipe->bind_vertex_elements_state(pipe, ctx->velem_state);
> diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c b/src/gallium/drivers/r600/evergreen_compute_internal.c
> index 496d099..b937135 100644
> --- a/src/gallium/drivers/r600/evergreen_compute_internal.c
> +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c
> @@ -480,7 +480,7 @@ void evergreen_set_tex_resource(
>  	unsigned format, endian;
>  	uint32_t word4 = 0, yuv_format = 0, pitch = 0;
> -	unsigned char swizzle[4], array_mode = 0, tile_type = 0;
> +	unsigned char swizzle[4], array_mode = 0, non_disp_tiling = 0;
>  	unsigned height, depth;
>
>  	swizzle[0] = 0;
> @@ -503,7 +503,7 @@ void evergreen_set_tex_resource(
>  	pitch = align(tmp->surface.level[0].nblk_x *
>  		util_format_get_blockwidth(tmp->resource.b.b.format), 8);
>  	array_mode = tmp->array_mode[0];
> -	tile_type = tmp->tile_type;
> +	non_disp_tiling = tmp->non_disp_tiling;
>
>  	assert(view->base.texture->target != PIPE_TEXTURE_1D_ARRAY);
>  	assert(view->base.texture->target != PIPE_TEXTURE_2D_ARRAY);
> @@ -513,7 +513,7 @@ void evergreen_set_tex_resource(
>  	evergreen_emit_raw_value(res,
>  		(S_030000_DIM(r600_tex_dim(view->base.texture->target)) |
>  		S_030000_PITCH((pitch / 8) - 1) |
> -		S_030000_NON_DISP_TILING_ORDER(tile_type) |
> +		S_030000_NON_DISP_TILING_ORDER(non_disp_tiling) |
>  		S_030000_TEX_WIDTH(view->base.texture->width0 - 1)));
>  	evergreen_emit_raw_value(res, (S_030004_TEX_HEIGHT(height - 1) |
>  		S_030004_TEX_DEPTH(depth - 1) |
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index c126e7d..5a14934 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Wed, Sep 12, 2012 at 5:24 PM, Jerome Glisse j.gli...@gmail.com wrote:
> On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
>> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
>>> On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
>>>> Please provide information about the GPU and the test which locks
>>>> up. I'd like to reproduce it. Also please explain what's the cause
>>>> of the lockup if you know it (which registers are not emitted in the
>>>> correct order and how it can be fixed).
>>>>
>>>> Marek
>>>
>>> For instance
>>> http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh
>>> will lock up probably any r6xx/r7xx (definitely rv670 and rv770).
>>>
>>> I know that the whole vgt register order is picky and that most of
>>> them need to be emitted before ta_cntl_aux and before cb/db. But the
>>> ordering relative to pa is kind of weird and keeps moving when
>>> looking at fglrx.
>>
>> I tested RS880, which is very similar to RV670, and it didn't hang. I
>> can test RV670 later and if there's any issue, I'll fix it. I'd like
>> this patch to be fixed instead of dropped, that's why I'm asking and I
>> still haven't got a definitive answer how to change the patch, so that
>> it can be pushed.
>>
>> Besides that... Has it ever occurred to you that the register ordering
>> is changing in fglrx, because the ordering doesn't matter at all, just
>> like Alex said, and the closed driver devs wrote it that way because
>> they didn't care about the ordering either?
>>
>> I think the lockups you are seeing on r600-r700 are actually caused by
>> something entirely different and it confuses you. See this thread from
>> the comment #9 onwards:
>> https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>>
>> Marek
>
> This modified version is fine (rv670, rv770, caicos):
> http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch
>
> Cheers,
> Jerome

This one also works:
http://people.freedesktop.org/~glisse/0001-r600g-convert-the-remnants-of-VGT-state-into-immedia.patch

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9
On Tue, Jul 17, 2012 at 1:58 PM, j.gli...@gmail.com wrote:
> From: Jerome Glisse jgli...@redhat.com
>
> htile is used for HiZ and HiS support and fast Z/S clears.
> This commit just adds the htile setup and Fast Z clear.
> We don't take full advantage of HiS with that patch.
>
> v2 really use fast clear, still random issue with some tiles
>    need to try more flush combination, fix depth/stencil
>    texture decompression
> v3 fix random issue on r6xx/r7xx
> v4 rebase on top of latest mesa, disable CB export when clearing
>    htile surface to avoid wasting bandwidth
> v5 resummarize htile surface when uploading z value. Fix z/stencil
>    decompression, the custom blitter with custom dsa is no longer
>    needed.
> v6 Reorganize render control/override update mechanism, fixing more
>    issues in the process.
> v7 Add nop after depth surface base update to work around some htile
>    flushing issue. Force htile to 8x8 on r6xx/r7xx as other
>    combination have issue. Do not enable hyperz when
>    flushing/uncompressing depth buffer.
> v8 Fix htile surface, preload and prefetch setup. Only set preload
>    and prefetch on htile surface clear like fglrx. Record depth
>    clear value per level. Support several level for the htile
>    surface. First depth clear can't be a fast clear.
> v9 Fix comments, properly account new register in emit function,
>    disable fast zclear if clearing different layer of texture
>    array to different value
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer pell...@gmail.com
> Signed-off-by: Alex Deucher alexander.deuc...@amd.com
> Signed-off-by: Jerome Glisse jgli...@redhat.com

Btw, the v11 version against newer mesa is at:
http://people.freedesktop.org/~glisse/0001-r600g-add-htile-support-v11.patch

Cheers,
Jerome

> ---
>  src/gallium/drivers/r600/evergreen_hw_context.c |   6 +
>  src/gallium/drivers/r600/evergreen_state.c      | 102 -
>  src/gallium/drivers/r600/evergreend.h           |   4 +
>  src/gallium/drivers/r600/r600_blit.c            |  38 +++
>  src/gallium/drivers/r600/r600_hw_context.c      |  25 +
>  src/gallium/drivers/r600/r600_pipe.c            |   8 ++
>  src/gallium/drivers/r600/r600_pipe.h            |  13 ++-
>  src/gallium/drivers/r600/r600_resource.h        |   7 ++
>  src/gallium/drivers/r600/r600_state.c           | 133 ---
>  src/gallium/drivers/r600/r600_texture.c         | 103 ++
>  src/gallium/drivers/r600/r600d.h                |   6 +
>  11 files changed, 399 insertions(+), 46 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c
> index 081701f..546c884 100644
> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
> @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] = {
>  	{GROUP_FORCE_NEW_BLOCK, 0, 0},
>  	{R_028058_DB_DEPTH_SIZE, 0, 0},
>  	{R_02805C_DB_DEPTH_SLICE, 0, 0},
> +	{R_02802C_DB_DEPTH_CLEAR, 0, 0},
> +	{R_028ABC_DB_HTILE_SURFACE, 0, 0},
> +	{R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>  	{R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>  	{R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>  	{R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
> @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = {
>  	{GROUP_FORCE_NEW_BLOCK, 0, 0},
>  	{R_028058_DB_DEPTH_SIZE, 0, 0},
>  	{R_02805C_DB_DEPTH_SLICE, 0, 0},
> +	{R_02802C_DB_DEPTH_CLEAR, 0, 0},
> +	{R_028ABC_DB_HTILE_SURFACE, 0, 0},
> +	{R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>  	{R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>  	{R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>  	{R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
> index a66387b..214d76b 100644
> --- a/src/gallium/drivers/r600/evergreen_state.c
> +++ b/src/gallium/drivers/r600/evergreen_state.c
> @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct pipe_context *ctx,
>  	}
>
>  	blend->cb_target_mask = target_mask;
> -	if (target_mask)
> +	if (target_mask) {
>  		color_control |= S_028808_MODE(V_028808_CB_NORMAL);
> -	else
> +	} else {
>  		color_control |= S_028808_MODE(V_028808_CB_DISABLE);
> +	}
>
>  	r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL,
>  				color_control);
> +	/* only have dual source on MRT0 */
>  	blend->dual_src_blend = util_blend_state_is_dual(state, 0);
>  	for (int i = 0; i < 8; i++) {
> @@ -1668,6 +1670,26 @@ static void evergreen_db(struct r600_context *rctx, struct r600_pipe_state *rsta
>  		}
>  	}
>
> +	/* hyperz */
> +	if (rtex->hyperz) {
> +		uint64_t htile_offset = rtex->hyperz->surface.level[level].offset;
> +
> +		rctx->db_misc_state.hyperz = true
Re: [Mesa-dev] [PATCH 00/19] r600g refactoring and cleanups
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote:
> Nothing too exciting. Besides cleanups, there are fine-grained sampler
> state updates (it emits only the samplers which changed), support for
> geometry shader resources (because it was easy; I am not working on GS
> right now), atomization of some states, some fixes and a major cleanup
> in r600_draw_vbo.
>
> Tested on RS880 and REDWOOD.
>
> Please review.

For the first 18 patches:
Reviewed-by: Jerome Glisse jgli...@redhat.com

NAK for patch 19, see my other reply.

> Marek Olšák (19):
>   r600g: consolidate initialization of common state functions
>   r600g: cleanup state function names
>   r600g: put constant buffer state into an array indexed by shader type
>   r600g: consolidate set_sampler_views functions
>   r600g: consolidate set_viewport_state functions
>   r600g: do fine-grained sampler state updates
>   r600g: put sampler states and views into an array indexed by shader type
>   r600g: add support for geometry shader samplers and constant buffers
>   r600g: initialize the first CS just like any other CS
>   r600g: remove unused state ID definitions
>   r600g: atomize stencil ref state
>   r600g: atomize viewport state
>   r600g: atomize blend color
>   r600g: atomize clip state
>   r600g: fix the number of CS dwords of cb_misc_state
>   r600g: fix computing how much space is needed for a draw command
>   r600g: add clip_misc_state for clip registers emitted in draw_vbo
>   r600g: emit the primitive type and associated regs only if the type is changed
>   r600g: convert the remnants of VGT state into immediate register writes
>
>  src/gallium/drivers/r600/evergreen_hw_context.c | 108 +
>  src/gallium/drivers/r600/evergreen_state.c      | 191 +++-
>  src/gallium/drivers/r600/evergreend.h           |   2 +
>  src/gallium/drivers/r600/r600.h                 |   8 +-
>  src/gallium/drivers/r600/r600_blit.c            |  16 +-
>  src/gallium/drivers/r600/r600_buffer.c          |  31 +-
>  src/gallium/drivers/r600/r600_hw_context.c      | 133 +++---
>  src/gallium/drivers/r600/r600_hw_context_priv.h |   3 +-
>  src/gallium/drivers/r600/r600_pipe.c            |   6 +-
>  src/gallium/drivers/r600/r600_pipe.h            | 169
>  src/gallium/drivers/r600/r600_shader.c          |   3 +-
>  src/gallium/drivers/r600/r600_shader.h          |   1 -
>  src/gallium/drivers/r600/r600_state.c           | 211 +++--
>  src/gallium/drivers/r600/r600_state_common.c    | 526 ++-
>  src/gallium/drivers/r600/r600d.h                |   2 +
>  15 files changed, 615 insertions(+), 795 deletions(-)
>
> Marek
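"Fine-grained sampler state updates (it emits only the samplers which changed)" is commonly done with a per-slot dirty bitmask. The sketch below shows that general technique; it is not taken from the series, whose patches are not quoted here, and every `sketch_*` name as well as the 16-slot limit is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SAMPLERS 16 /* illustrative slot count */

struct sketch_sampler_slots {
    uint32_t state[SKETCH_MAX_SAMPLERS]; /* opaque per-slot sampler state */
    uint32_t dirty_mask;                 /* bit N set: slot N must be re-emitted */
};

/* Bind a state to a slot; only a real change marks the slot dirty. */
static void sketch_bind_sampler(struct sketch_sampler_slots *s,
                                unsigned slot, uint32_t value)
{
    if (s->state[slot] == value)
        return;
    s->state[slot] = value;
    s->dirty_mask |= 1u << slot;
}

/*
 * "Emit" the dirty slots: record their indices in out[] (which must hold
 * SKETCH_MAX_SAMPLERS entries), clear the mask, and return how many slots
 * were emitted. A driver would write register packets here instead.
 */
static unsigned sketch_emit_dirty_samplers(struct sketch_sampler_slots *s,
                                           unsigned *out)
{
    unsigned n = 0;

    for (unsigned slot = 0; slot < SKETCH_MAX_SAMPLERS; slot++) {
        if (s->dirty_mask & (1u << slot))
            out[n++] = slot;
    }
    s->dirty_mask = 0;
    return n;
}
```

Rebinding an unchanged state costs nothing at draw time, and a draw that follows no sampler change emits zero sampler packets.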
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote:

NAK, this one introduces lockups. As I said in another email, register
group/order matters, and with this patch I get a 100% lockup rate in
some test cases, for instance the test case I referenced in my other
email.

> ---
>  src/gallium/drivers/r600/evergreen_hw_context.c | 16 ---
>  src/gallium/drivers/r600/r600.h                 |  7 -
>  src/gallium/drivers/r600/r600_hw_context.c      | 15 ++
>  src/gallium/drivers/r600/r600_hw_context_priv.h |  2 +-
>  src/gallium/drivers/r600/r600_pipe.h            |  8 +++---
>  src/gallium/drivers/r600/r600_state_common.c    | 34 ---
>  6 files changed, 26 insertions(+), 56 deletions(-)
>
> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c
> index 483021f..0c2159a 100644
> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
> @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = {
>  	{R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | REG_FLAG_FLUSH_CHANGE, 0},
>  };
>
> -static const struct r600_reg evergreen_ctl_const_list[] = {
> -	{R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
> -};
> -
>  static const struct r600_reg evergreen_context_reg_list[] = {
>  	{R_028008_DB_DEPTH_VIEW, 0, 0},
>  	{R_028010_DB_RENDER_OVERRIDE2, 0, 0},
> @@ -63,10 +59,6 @@ static const struct r600_reg evergreen_context_reg_list[] = {
>  	{R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
>  	{R_028350_SX_MISC, 0, 0},
>  	{GROUP_FORCE_NEW_BLOCK, 0, 0},
> -	{R_028408_VGT_INDX_OFFSET, 0, 0},
> -	{R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
> -	{R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
> -	{GROUP_FORCE_NEW_BLOCK, 0, 0},
>  	{R_02861C_SPI_VS_OUT_ID_0, 0, 0},
>  	{R_028620_SPI_VS_OUT_ID_1, 0, 0},
>  	{R_028624_SPI_VS_OUT_ID_2, 0, 0},
> @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] = {
>  	{R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0},
>  	{R_028350_SX_MISC, 0, 0},
>  	{GROUP_FORCE_NEW_BLOCK, 0, 0},
> -	{R_028408_VGT_INDX_OFFSET, 0, 0},
> -	{R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0},
> -	{R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0},
> -	{GROUP_FORCE_NEW_BLOCK, 0, 0},
>  	{R_02861C_SPI_VS_OUT_ID_0, 0, 0},
>  	{R_028620_SPI_VS_OUT_ID_1, 0, 0},
>  	{R_028624_SPI_VS_OUT_ID_2, 0, 0},
> @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx)
>  				   Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, EVERGREEN_CONTEXT_REG_OFFSET);
>  	if (r)
>  		goto out_err;
> -	r = r600_context_add_block(ctx, evergreen_ctl_const_list,
> -				   Elements(evergreen_ctl_const_list), PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET);
> -	if (r)
> -		goto out_err;
>
>  	/* PS loop const */
>  	evergreen_loop_const_init(ctx, 0);
> diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h
> index 6363a03..83d21a4 100644
> --- a/src/gallium/drivers/r600/r600.h
> +++ b/src/gallium/drivers/r600/r600.h
> @@ -228,11 +228,4 @@ void _r600_pipe_state_add_reg(struct r600_context *ctx,
>  #define r600_pipe_state_add_reg_bo(state, offset, value, bo, usage) _r600_pipe_state_add_reg_bo(rctx, state, offset, value, CTX_RANGE_ID(offset), CTX_BLOCK_ID(offset), bo, usage)
>  #define r600_pipe_state_add_reg(state, offset, value) _r600_pipe_state_add_reg(rctx, state, offset, value, CTX_RANGE_ID(offset), CTX_BLOCK_ID(offset))
>
> -static inline void r600_pipe_state_mod_reg(struct r600_pipe_state *state,
> -					   uint32_t value)
> -{
> -	state->regs[state->nregs].value = value;
> -	state->nregs++;
> -}
> -
>  #endif
> diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
> index 57dcc7e..122f878 100644
> --- a/src/gallium/drivers/r600/r600_hw_context.c
> +++ b/src/gallium/drivers/r600/r600_hw_context.c
> @@ -233,10 +233,6 @@ static const struct r600_reg r600_config_reg_list[] = {
>  	{R_008C04_SQ_GPR_RESOURCE_MGMT_1, REG_FLAG_ENABLE_ALWAYS | REG_FLAG_FLUSH_CHANGE, 0},
>  };
>
> -static const struct r600_reg r600_ctl_const_list[] = {
> -	{R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0},
> -};
> -
>  static const struct r600_reg r600_context_reg_list[] = {
>  	{R_028A4C_PA_SC_MODE_CNTL, 0, 0},
>  	{GROUP_FORCE_NEW_BLOCK, 0, 0},
> @@ -461,9 +457,6 @@ static
const struct r600_reg r600_context_reg_list[] = { {GROUP_FORCE_NEW_BLOCK, 0, 0}, {R_028850_SQ_PGM_RESOURCES_PS, 0, 0}, {R_028854_SQ_PGM_EXPORTS_PS, 0, 0}, - {R_028408_VGT_INDX_OFFSET, 0, 0}, - {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0}, - {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0}, {R_028C1C_PA_SC_AA_SAMPLE_LOCS_MCTX,
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote: Please provide information about the GPU and the test which locks up. I'd like to reproduce it. Also please explain what's the cause of the lockup if you know it (which registers are not emitted in the correct order and how it can fixed). Marek For instance http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh will lockup probably any r6xx/r7xx (definitely rv670 rv770) I know that the whole vgt register order is picky and that most of them need to be emitted before ta_cntl_aux and before cb/db. But the ordering relative to pa is kind of weird and moving when looking at fglrx. Cheers, Jerome On Tue, Sep 11, 2012 at 6:48 PM, Jerome Glisse j.gli...@gmail.com wrote: On Mon, Sep 10, 2012 at 7:16 PM, Marek Olšák mar...@gmail.com wrote: NAK this one introduce lockup. As i said in another email register group/order matter and with this patch i get 100% lockup rate in some test case for instance the test case i reference in my other email --- src/gallium/drivers/r600/evergreen_hw_context.c | 16 --- src/gallium/drivers/r600/r600.h |7 - src/gallium/drivers/r600/r600_hw_context.c | 15 ++ src/gallium/drivers/r600/r600_hw_context_priv.h |2 +- src/gallium/drivers/r600/r600_pipe.h|8 +++--- src/gallium/drivers/r600/r600_state_common.c| 34 --- 6 files changed, 26 insertions(+), 56 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c index 483021f..0c2159a 100644 --- a/src/gallium/drivers/r600/evergreen_hw_context.c +++ b/src/gallium/drivers/r600/evergreen_hw_context.c @@ -32,10 +32,6 @@ static const struct r600_reg cayman_config_reg_list[] = { {R_00913C_SPI_CONFIG_CNTL_1, REG_FLAG_ENABLE_ALWAYS | REG_FLAG_FLUSH_CHANGE, 0}, }; -static const struct r600_reg evergreen_ctl_const_list[] = { - {R_03CFF4_SQ_VTX_START_INST_LOC, 0, 0}, -}; - static const struct r600_reg evergreen_context_reg_list[] = { {R_028008_DB_DEPTH_VIEW, 0, 0}, 
{R_028010_DB_RENDER_OVERRIDE2, 0, 0}, @@ -63,10 +59,6 @@ static const struct r600_reg evergreen_context_reg_list[] = { {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0}, {R_028350_SX_MISC, 0, 0}, {GROUP_FORCE_NEW_BLOCK, 0, 0}, - {R_028408_VGT_INDX_OFFSET, 0, 0}, - {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0}, - {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0}, - {GROUP_FORCE_NEW_BLOCK, 0, 0}, {R_02861C_SPI_VS_OUT_ID_0, 0, 0}, {R_028620_SPI_VS_OUT_ID_1, 0, 0}, {R_028624_SPI_VS_OUT_ID_2, 0, 0}, @@ -353,10 +345,6 @@ static const struct r600_reg cayman_context_reg_list[] = { {R_028254_PA_SC_VPORT_SCISSOR_0_BR, 0, 0}, {R_028350_SX_MISC, 0, 0}, {GROUP_FORCE_NEW_BLOCK, 0, 0}, - {R_028408_VGT_INDX_OFFSET, 0, 0}, - {R_02840C_VGT_MULTI_PRIM_IB_RESET_INDX, 0, 0}, - {R_028A94_VGT_MULTI_PRIM_IB_RESET_EN, 0, 0}, - {GROUP_FORCE_NEW_BLOCK, 0, 0}, {R_02861C_SPI_VS_OUT_ID_0, 0, 0}, {R_028620_SPI_VS_OUT_ID_1, 0, 0}, {R_028624_SPI_VS_OUT_ID_2, 0, 0}, @@ -664,10 +652,6 @@ int evergreen_context_init(struct r600_context *ctx) Elements(evergreen_context_reg_list), PKT3_SET_CONTEXT_REG, EVERGREEN_CONTEXT_REG_OFFSET); if (r) goto out_err; - r = r600_context_add_block(ctx, evergreen_ctl_const_list, - Elements(evergreen_ctl_const_list), PKT3_SET_CTL_CONST, EVERGREEN_CTL_CONST_OFFSET); - if (r) - goto out_err; /* PS loop const */ evergreen_loop_const_init(ctx, 0); diff --git a/src/gallium/drivers/r600/r600.h b/src/gallium/drivers/r600/r600.h index 6363a03..83d21a4 100644 --- a/src/gallium/drivers/r600/r600.h +++ b/src/gallium/drivers/r600/r600.h @@ -228,11 +228,4 @@ void _r600_pipe_state_add_reg(struct r600_context *ctx, #define r600_pipe_state_add_reg_bo(state, offset, value, bo, usage) _r600_pipe_state_add_reg_bo(rctx, state, offset, value, CTX_RANGE_ID(offset), CTX_BLOCK_ID(offset), bo, usage) #define r600_pipe_state_add_reg(state, offset, value) _r600_pipe_state_add_reg(rctx, state, offset, value, CTX_RANGE_ID(offset), CTX_BLOCK_ID(offset)) -static inline void r600_pipe_state_mod_reg(struct 
r600_pipe_state *state, - uint32_t value) -{ - state-regs[state-nregs].value = value; - state-nregs++; -} - #endif diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c index 57dcc7e..122f878 100644 --- a/src/gallium/drivers/r600/r600_hw_context.c +++ b/src/gallium/drivers/r600/r600_hw_context.c @@ -233,10 +233,6 @@ static const struct r600_reg
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
> > On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
> > > Please provide information about the GPU and the test which locks up. I'd like to reproduce it. Also please explain what's the cause of the lockup if you know it (which registers are not emitted in the correct order and how it can be fixed).
> > >
> > > Marek
> >
> > For instance http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh will lock up probably any r6xx/r7xx (definitely rv670 rv770).
> >
> > I know that the whole vgt register order is picky and that most of them need to be emitted before ta_cntl_aux and before cb/db. But the ordering relative to pa is kind of weird and moving when looking at fglrx.
>
> I tested RS880, which is very similar to RV670, and it didn't hang. I can test RV670 later and if there's any issue, I'll fix it. I'd like this patch to be fixed instead of dropped, that's why I'm asking and I still haven't got a definitive answer how to change the patch, so that it can be pushed.
>
> Besides that... Has it ever occurred to you that the register ordering is changing in fglrx, because the ordering doesn't matter at all, just like Alex said, and the closed driver devs wrote it that way because they didn't care about the ordering either?
>
> I think the lockups you are seeing on r600-r700 are actually caused by something entirely different and it confuses you. See this thread from the comment #9 onwards: https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>
> Marek

It's simple: without that patch no lockup, with it lockup all the time. It's just a hard fact, I am not confused about anything, I know for a fact that reg grouping/order matters somehow. I ran several automated tools that compare register values at draw call time between r600g and fglrx while doing hyperz and there was no difference at all, down to the last bit. One was locking up, the other not.
Cheers,
Jerome
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Tue, Sep 11, 2012 at 3:00 PM, Jerome Glisse j.gli...@gmail.com wrote:
> On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
> > On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
> > > On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
> > > > Please provide information about the GPU and the test which locks up. I'd like to reproduce it. Also please explain what's the cause of the lockup if you know it (which registers are not emitted in the correct order and how it can be fixed).
> > > >
> > > > Marek
> > >
> > > For instance http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh will lock up probably any r6xx/r7xx (definitely rv670 rv770).
> > >
> > > I know that the whole vgt register order is picky and that most of them need to be emitted before ta_cntl_aux and before cb/db. But the ordering relative to pa is kind of weird and moving when looking at fglrx.
> >
> > I tested RS880, which is very similar to RV670, and it didn't hang. I can test RV670 later and if there's any issue, I'll fix it. I'd like this patch to be fixed instead of dropped, that's why I'm asking and I still haven't got a definitive answer how to change the patch, so that it can be pushed.
> >
> > Besides that... Has it ever occurred to you that the register ordering is changing in fglrx, because the ordering doesn't matter at all, just like Alex said, and the closed driver devs wrote it that way because they didn't care about the ordering either?
> >
> > I think the lockups you are seeing on r600-r700 are actually caused by something entirely different and it confuses you. See this thread from the comment #9 onwards: https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
> >
> > Marek
>
> It's simple: without that patch no lockup, with it lockup all the time. It's just a hard fact, I am not confused about anything, I know for a fact that reg grouping/order matters somehow. I ran several automated tools that compare register values at draw call time between r600g and fglrx while doing hyperz and there was no difference at all, down to the last bit. One was locking up, the other not.
>
> Cheers,
> Jerome

And if you're curious, the r600g command streams (good and bad) and the diff between bad and good are at: http://people.freedesktop.org/~glisse/longprim/

If it's the bad one that is emitted before the fbo-stencil test then it locks up, if it's the good one then no lockup.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 19/19] r600g: convert the remnants of VGT state into immediate register writes
On Tue, Sep 11, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
> On Tue, Sep 11, 2012 at 7:41 PM, Jerome Glisse j.gli...@gmail.com wrote:
> > On Tue, Sep 11, 2012 at 1:10 PM, Marek Olšák mar...@gmail.com wrote:
> > > Please provide information about the GPU and the test which locks up. I'd like to reproduce it. Also please explain what's the cause of the lockup if you know it (which registers are not emitted in the correct order and how it can be fixed).
> > >
> > > Marek
> >
> > For instance http://people.freedesktop.org/~glisse/registerposition/lockup-longprim.sh will lock up probably any r6xx/r7xx (definitely rv670 rv770).
> >
> > I know that the whole vgt register order is picky and that most of them need to be emitted before ta_cntl_aux and before cb/db. But the ordering relative to pa is kind of weird and moving when looking at fglrx.
>
> I tested RS880, which is very similar to RV670, and it didn't hang. I can test RV670 later and if there's any issue, I'll fix it. I'd like this patch to be fixed instead of dropped, that's why I'm asking and I still haven't got a definitive answer how to change the patch, so that it can be pushed.
>
> Besides that... Has it ever occurred to you that the register ordering is changing in fglrx, because the ordering doesn't matter at all, just like Alex said, and the closed driver devs wrote it that way because they didn't care about the ordering either?

fglrx definitely emits registers according to certain groupings. Thing is, there is a bunch of registers that are emitted in 2, 3 or 4 different groups at most, from what I have seen. Otherwise all other registers are _always_ emitted as part of the same group, with the whole group being emitted. The issue I have is understanding those registers that are emitted in a few different ways and how fglrx chooses between those different ones.

> I think the lockups you are seeing on r600-r700 are actually caused by something entirely different and it confuses you. See this thread from the comment #9 onwards: https://bugs.freedesktop.org/show_bug.cgi?id=50655#c9
>
> Marek
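[Editor's note] The "whole group being emitted" behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct reg_group`, `group_set` and `group_emit` are hypothetical names, the output is a plain list of (offset, value) pairs rather than real PM4 packets, and nothing here is taken from fglrx or r600g.

```c
#include <assert.h>
#include <stdint.h>

#define GROUP_REGS 3

/* Hypothetical register group: all members are emitted together or not at all. */
struct reg_group {
    uint32_t offset[GROUP_REGS]; /* register offsets belonging to the group */
    uint32_t value[GROUP_REGS];  /* last value programmed for each register */
    int dirty;                   /* set when any member changed */
};

/* Update one register; a change to any member dirties the whole group. */
static void group_set(struct reg_group *g, unsigned idx, uint32_t value)
{
    assert(idx < GROUP_REGS);
    if (g->value[idx] != value) {
        g->value[idx] = value;
        g->dirty = 1;
    }
}

/* Write (offset, value) pairs for every member into out.
 * Returns the number of dwords written, 0 if the group is clean. */
static unsigned group_emit(struct reg_group *g, uint32_t *out)
{
    unsigned n = 0;

    if (!g->dirty)
        return 0;
    for (unsigned i = 0; i < GROUP_REGS; i++) {
        out[n++] = g->offset[i];
        out[n++] = g->value[i];
    }
    g->dirty = 0;
    return n;
}
```

With this shape, changing a single register re-emits its unchanged neighbours too, which is the pattern Jerome reports seeing for most registers in fglrx command streams.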
Re: [Mesa-dev] [PATCH] r600g: simplify flushing
On Sun, Sep 9, 2012 at 1:03 AM, Marek Olšák mar...@gmail.com wrote: Based on the patch called simplify and fix flushing and synchronization by Jerome Glisse. Rebased, removed unneded code, simplified more and cleaned up. Also, SH_ACTION_ENA is not set when changing shaders (hw doesn't seem to need it). It's only used to flush constant buffers. Looks good, still would like to do some stress testing will try to do that today. Reviewed-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r600/evergreen_compute.c | 20 +- .../drivers/r600/evergreen_compute_internal.c |4 +- src/gallium/drivers/r600/evergreen_state.c |7 +- src/gallium/drivers/r600/evergreend.h |7 +- src/gallium/drivers/r600/r600.h| 18 +- src/gallium/drivers/r600/r600_hw_context.c | 218 +--- src/gallium/drivers/r600/r600_hw_context_priv.h|3 +- src/gallium/drivers/r600/r600_pipe.c |2 - src/gallium/drivers/r600/r600_pipe.h |4 - src/gallium/drivers/r600/r600_state.c | 21 +- src/gallium/drivers/r600/r600_state_common.c | 76 ++- src/gallium/drivers/r600/r600d.h | 12 ++ 12 files changed, 210 insertions(+), 182 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c index 3533312..1fb63d6 100644 --- a/src/gallium/drivers/r600/evergreen_compute.c +++ b/src/gallium/drivers/r600/evergreen_compute.c @@ -96,7 +96,7 @@ static void evergreen_cs_set_vertex_buffer( vb-buffer = buffer; vb-user_buffer = NULL; - r600_inval_vertex_cache(rctx); + rctx-flags |= rctx-has_vertex_cache ? 
R600_CONTEXT_VTX_FLUSH : R600_CONTEXT_TEX_FLUSH; state-enabled_mask |= 1 vb_index; state-dirty_mask |= 1 vb_index; r600_atom_dirty(rctx, state-atom); @@ -332,8 +332,11 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout, */ r600_emit_atom(ctx, ctx-start_compute_cs_cmd.atom); + ctx-flags |= R600_CONTEXT_CB_FLUSH; + r600_flush_emit(ctx); + /* Emit cb_state */ -cb_state = ctx-states[R600_PIPE_STATE_FRAMEBUFFER]; + cb_state = ctx-states[R600_PIPE_STATE_FRAMEBUFFER]; r600_context_pipe_state_emit(ctx, cb_state, RADEON_CP_PACKET3_COMPUTE_MODE); /* Set CB_TARGET_MASK XXX: Use cb_misc_state */ @@ -384,15 +387,10 @@ static void compute_emit_cs(struct r600_context *ctx, const uint *block_layout, /* Emit dispatch state and dispatch packet */ evergreen_emit_direct_dispatch(ctx, block_layout, grid_layout); - /* r600_flush_framebuffer() updates the cb_flush_flags and then -* calls r600_emit_atom() on the ctx-surface_sync_cmd.atom, which emits -* a SURFACE_SYNC packet via r600_emit_surface_sync(). -* -* XXX r600_emit_surface_sync() hardcodes the CP_COHER_SIZE to -* 0x, so we will need to add a field to struct -* r600_surface_sync_cmd if we want to manually set this value. 
+ /* XXX evergreen_flush_emit() hardcodes the CP_COHER_SIZE to 0x */ - r600_flush_framebuffer(ctx, true /* Flush now */); + ctx-flags |= R600_CONTEXT_CB_FLUSH; + r600_flush_emit(ctx); #if 0 COMPUTE_DBG(cdw: %i\n, cs-cdw); @@ -444,7 +442,7 @@ void evergreen_emit_cs_shader( r600_write_value(cs, r600_context_bo_reloc(rctx, shader-shader_code_bo, RADEON_USAGE_READ)); - r600_inval_shader_cache(rctx); + rctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH; } static void evergreen_launch_grid( diff --git a/src/gallium/drivers/r600/evergreen_compute_internal.c b/src/gallium/drivers/r600/evergreen_compute_internal.c index 50a60d3..dc95732 100644 --- a/src/gallium/drivers/r600/evergreen_compute_internal.c +++ b/src/gallium/drivers/r600/evergreen_compute_internal.c @@ -562,7 +562,7 @@ void evergreen_set_tex_resource( util_format_get_blockwidth(tmp-resource.b.b.format) * view-base.texture-width0*height*depth; - r600_inval_texture_cache(pipe-ctx); + pipe-ctx-flags |= R600_CONTEXT_TEX_FLUSH; evergreen_emit_force_reloc(res); evergreen_emit_force_reloc(res); @@ -621,7 +621,7 @@ void evergreen_set_const_cache( res-usage = RADEON_USAGE_READ; res-coher_bo_size = size; - r600_inval_shader_cache(pipe-ctx); + pipe-ctx-flags |= R600_CONTEXT_SHADERCONST_FLUSH; } struct r600_resource* r600_compute_buffer_alloc_vram( diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 9a5183e..2a7a35f 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium
Re: [Mesa-dev] [PATCH] r600g: order atom emission
On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
> > On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie airl...@gmail.com wrote:
> > > On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause p...@spth.de wrote:
> > > > On 06.09.2012 07:35, j.gli...@gmail.com wrote:
> > > > > From: Jerome Glisse jgli...@redhat.com
> > > > >
> > > > > To avoid GPU lockup registers must be emitted in a specific order (no kidding ...). This patch reworks atom emission so the order in which atoms are emitted with respect to each other is always the same. We don't have any information on what the correct order is, so the order will need to be inferred from the fglrx command stream.
> > > >
> > > > Shouldn't this be stated in comments, so the next person who comes along and makes a change in this code doesn't inadvertently change the order?
> > >
> > > Also a comment on what ordering matters most, like I suspect this is just hiding a real issue.
> > >
> > > Dave.
> >
> > No it's not hiding an issue, afaict it's how the hw works. The hw does what some amd documents call state validation. So here is how I understand how things happen, and I can be completely wrong. The hw processes register writes in the order it receives them, and to avoid postponing state validation the hw does state validation while processing registers. That means if writing register A triggers state validation that uses some field of register B, the hw might not redo state validation when register B is later written, i.e. only some registers trigger the state validation, no matter what they depend on. I believe state validation is only used as a pipeline optimization by the hw, so the hw knows it can take some shortcuts. But in some rare cases, if shortcuts are taken for the wrong reasons, we end up with a GPU lockup.
> >
> > No matter if my guess is right or wrong, I know for a fact that register order is important in some situations, that's the hard bottom line, no matter what the reasons inside the hw are. This patch is far from having all the order right, it's just a first step, I am atomizing everything and it's what's needed to go forward without regression.
>
> I've talked to the internal hw and sw guys and they said there isn't any specific ordering required and the closed driver doesn't impose any specific order. The pipeline doesn't get kicked off until a draw command is issued, so I don't see why the state update order would matter. It's possible there are subtle ordering requirements and the closed driver just happened to get it right. There are dependencies and hw bug workarounds however. E.g., some blocks snoop registers from other blocks so you need to make sure those dependent registers have been initialized before drawing. I don't know if it's the ordering so much as making sure we emit all the necessary state when needed. The closed driver tends to update a lot more state than is minimally required for a lot of things. That said, it probably wouldn't hurt to mirror the closed driver more closely.
>
> Alex

I don't know what the reasons are, but which registers are emitted and along with which other registers definitely matters. All files I am talking about in this mail are located at: http://people.freedesktop.org/~glisse/registerposition/

So if you apply 0001-r600g-FORCE-LOCKUP-BY-EMITTING-OR-NOT-REGISTER.patch and run the piglit tests like in lockup-longprim.sh you will lock up the GPU (I only tested on r6xx, r7xx so far). I double checked through automated tools that no register that was written by command stream from longprim piglit test are reprogram properly by the fbo test (if you have my constant buffer size patch I sent earlier).

The only diff between the command streams is that in one R_02881C_PA_CL_VS_OUT_CNTL is emitted with each draw and in the other only once; when emitted with each draw it locks up.

bad command stream: r600g-long-prim-simple-b.txt
good one: r600g-long-prim-simple-g.txt
diff: r600g-long-prim-simple-d.txt

Given that the bad one emits more registers, some draw commands are moved to the second cs. Emitting some other registers along with PA_CL_VS_OUT_CNTL fixes the lockup (I don't have a short list) but many other registers behave the same as PA_CL_VS_OUT_CNTL. So if order does not matter then register grouping definitely does.

I really wish that the hw were less picky about how command streams are supposed to be formatted. Anyhow, given that we have no information on which registers need to be emitted together, mimicking fglrx sounds like the way to go.

Cheers,
Jerome
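[Editor's note] A minimal sketch of the "atoms are always emitted in the same relative order" idea from the patch description quoted above, assuming each atom has a fixed slot and emission walks the slots in index order. The enum names and their order are purely illustrative; they are not the order the hardware requires and not the actual r600g code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical atom ids; the enum order is the emission order. */
enum atom_id { ATOM_VGT, ATOM_PA, ATOM_CB, ATOM_DB, ATOM_COUNT };

struct atom_state {
    int dirty[ATOM_COUNT];
};

/* Records which atoms were emitted and in what order (stands in for a CS). */
struct emit_log {
    enum atom_id order[ATOM_COUNT];
    size_t count;
};

static void atom_mark_dirty(struct atom_state *s, enum atom_id id)
{
    assert((unsigned)id < ATOM_COUNT);
    s->dirty[id] = 1;
}

/* Emit dirty atoms by walking the fixed slots, so the relative order never
 * depends on the order in which state changes arrived. */
static void atom_emit_all(struct atom_state *s, struct emit_log *log)
{
    log->count = 0;
    for (int id = 0; id < ATOM_COUNT; id++) {
        if (!s->dirty[id])
            continue;
        log->order[log->count++] = (enum atom_id)id;
        s->dirty[id] = 0;
    }
}
```

Marking ATOM_DB dirty before ATOM_VGT still emits ATOM_VGT first, which is the property the patch is after, whatever the correct hardware order turns out to be.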
Re: [Mesa-dev] [PATCH] r600g: order atom emission
On Thu, Sep 6, 2012 at 11:32 AM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Thu, Sep 6, 2012 at 10:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
> > On Thu, Sep 6, 2012 at 6:20 AM, Dave Airlie airl...@gmail.com wrote:
> > > On Thu, Sep 6, 2012 at 5:21 PM, Philipp Klaus Krause p...@spth.de wrote:
> > > > On 06.09.2012 07:35, j.gli...@gmail.com wrote:
> > > > > From: Jerome Glisse jgli...@redhat.com
> > > > >
> > > > > To avoid GPU lockup registers must be emitted in a specific order (no kidding ...). This patch reworks atom emission so the order in which atoms are emitted with respect to each other is always the same. We don't have any information on what the correct order is, so the order will need to be inferred from the fglrx command stream.
> > > >
> > > > Shouldn't this be stated in comments, so the next person who comes along and makes a change in this code doesn't inadvertently change the order?
> > >
> > > Also a comment on what ordering matters most, like I suspect this is just hiding a real issue.
> > >
> > > Dave.
> >
> > No it's not hiding an issue, afaict it's how the hw works. The hw does what some amd documents call state validation. So here is how I understand how things happen, and I can be completely wrong. The hw processes register writes in the order it receives them, and to avoid postponing state validation the hw does state validation while processing registers. That means if writing register A triggers state validation that uses some field of register B, the hw might not redo state validation when register B is later written, i.e. only some registers trigger the state validation, no matter what they depend on. I believe state validation is only used as a pipeline optimization by the hw, so the hw knows it can take some shortcuts. But in some rare cases, if shortcuts are taken for the wrong reasons, we end up with a GPU lockup.
> >
> > No matter if my guess is right or wrong, I know for a fact that register order is important in some situations, that's the hard bottom line, no matter what the reasons inside the hw are. This patch is far from having all the order right, it's just a first step, I am atomizing everything and it's what's needed to go forward without regression.
>
> I've talked to the internal hw and sw guys and they said there isn't any specific ordering required and the closed driver doesn't impose any specific order. The pipeline doesn't get kicked off until a draw command is issued, so I don't see why the state update order would matter. It's possible there are subtle ordering requirements and the closed driver just happened to get it right. There are dependencies and hw bug workarounds however. E.g., some blocks snoop registers from other blocks so you need to make sure those dependent registers have been initialized before drawing. I don't know if it's the ordering so much as making sure we emit all the necessary state when needed. The closed driver tends to update a lot more state than is minimally required for a lot of things. That said, it probably wouldn't hurt to mirror the closed driver more closely.
>
> Alex

Yeah, it's possible that it's also related to some registers needing to be re-emitted. I often see that fglrx re-emits some registers even if it emitted them with the same value just before, and some registers are emitted several times around other register blocks. Anyhow this patch is a first step to atomize everything and match the fglrx register pattern more closely.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: order atom emission v2
On Thu, Sep 6, 2012 at 4:10 PM, Marek Olšák mar...@gmail.com wrote:
> On Thu, Sep 6, 2012 at 8:34 PM, Jerome Glisse j.gli...@gmail.com wrote:
> > On Thu, Sep 6, 2012 at 2:29 PM, Marek Olšák mar...@gmail.com wrote:
> > > This looks good to me. It's funny to see the r300g architecture being re-implemented in r600g. :)
> > >
> > > There's one optimization that r300g has that this patch doesn't. r300g keeps the index of the first and the last dirty atom and the loops over the list of atoms look like this:
> > >
> > >     for (i = first_dirty; i <= last_dirty; i++)
> > >
> > > And after emission:
> > >
> > >     first_dirty = some large number;
> > >     last_dirty = 0;
> > >
> > > The atoms should be ordered according to how frequently they are updated (except when the ordering is required by the hw). But most importantly, if there are no state changes, the loops are trivially skipped.
> > >
> > > Marek
> >
> > Don't think this optimization is worth it, there won't be much more than 32 atoms in the end and it definitely can't be ordered from most frequent to least frequent, as some of the stuff needs to be emitted last and is updated frequently (primitive type for instance).
>
> I didn't say all atoms *must* be sorted. I meant that some (most?) atoms can be sorted, i.e. you can have some atoms at fixed positions (like the primitive type or the seamless cubemap state), but you always have at least *some* freedom where you put the rest.
>
> The ordering I had in mind was actually from the least frequent to the most frequent, in other words, from the framebuffer (least frequent) to shaders to textures to constant buffers to vertex buffers (most frequent).
>
> Of course, the code should document which atoms must have fixed positions along with an explanation. The comment that all atom positions must not be changed isn't enough, because it's not true.
>
> Marek

I won't try to find which atoms can have a completely floating position. I am just grouping together registers that are always emitted together in fglrx, and then I position these groups relative to each other according to their fglrx position. That means all atoms are always emitted in a specific order. So there won't be any freedom. The only freedom I can think of is between 2 position-forced atoms, and that makes the sorting completely useless and complicated.

Cheers,
Jerome
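[Editor's note] The first_dirty/last_dirty bookkeeping Marek describes for r300g can be sketched as below. This is a minimal illustration, assuming a fixed-size atom array; `struct atom_list`, `mark_dirty` and `emit_dirty` are hypothetical names and this is not the r300g source.

```c
#include <assert.h>
#include <limits.h>

#define NUM_ATOMS 8

struct atom_list {
    int dirty[NUM_ATOMS];
    unsigned first_dirty; /* lowest dirty index, UINT_MAX when nothing is dirty */
    unsigned last_dirty;  /* highest dirty index, 0 when nothing is dirty */
};

static void atom_list_init(struct atom_list *l)
{
    for (unsigned i = 0; i < NUM_ATOMS; i++)
        l->dirty[i] = 0;
    l->first_dirty = UINT_MAX; /* "some large number" */
    l->last_dirty = 0;
}

static void mark_dirty(struct atom_list *l, unsigned idx)
{
    assert(idx < NUM_ATOMS);
    l->dirty[idx] = 1;
    if (idx < l->first_dirty)
        l->first_dirty = idx;
    if (idx > l->last_dirty)
        l->last_dirty = idx;
}

/* Walk only the dirty window. Returns how many slots were visited and
 * stores the number of atoms actually emitted in *emitted. */
static unsigned emit_dirty(struct atom_list *l, unsigned *emitted)
{
    unsigned visited = 0;

    *emitted = 0;
    for (unsigned i = l->first_dirty; i <= l->last_dirty; i++) {
        visited++;
        if (l->dirty[i]) {
            l->dirty[i] = 0; /* a real driver would write the atom to the CS here */
            (*emitted)++;
        }
    }
    /* Reset so the next call skips the loop when nothing changed. */
    l->first_dirty = UINT_MAX;
    l->last_dirty = 0;
    return visited;
}
```

With no state changes the loop condition is false on entry, so emission costs nothing. The window still respects index order, so it does not conflict with the fixed ordering Jerome wants; the disagreement in the thread is only about whether the saving matters for roughly 32 atoms.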
Re: [Mesa-dev] [PATCH 0/7] MSAA on R700 and improvements for Evergreen
On Wed, Aug 22, 2012 at 9:54 PM, Marek Olšák mar...@gmail.com wrote:
> This series adds R700 MSAA support along with compression of MSAA colorbuffers for R700 and Evergreen, which should save a lot of bandwidth with MSAA. There are also some minor fixes.
>
> Please review.
>
> Marek Olšák (7):
>   gallium/u_blitter: initialize sample mask in resolve
>   r600g: set CB_TARGET_MASK to 0xf and not 0xff for resolve on evergreen
>   r600g: fix evergreen 8x MSAA sample positions
>   r600g: cleanup names around depth decompression
>   r600g: implement compression for MSAA colorbuffers for evergreen
>   r600g: change programming of CB_SHADER_MASK on r600-r700
>   r600g: implement MSAA for r700

For the series:
Reviewed-by: Jerome Glisse jgli...@redhat.com

What's wrong with r6xx?

>  src/gallium/auxiliary/util/u_blitter.c          |  46
>  src/gallium/auxiliary/util/u_blitter.h          |   5 +
>  src/gallium/drivers/r600/evergreen_hw_context.c |  64 ++
>  src/gallium/drivers/r600/evergreen_state.c      |  87 ++--
>  src/gallium/drivers/r600/evergreend.h           |  76 ++-
>  src/gallium/drivers/r600/r600_blit.c            |  97 -
>  src/gallium/drivers/r600/r600_hw_context.c      |  16 ++
>  src/gallium/drivers/r600/r600_pipe.c            |   6 +
>  src/gallium/drivers/r600/r600_pipe.h            |  16 +-
>  src/gallium/drivers/r600/r600_resource.h        |  14 +-
>  src/gallium/drivers/r600/r600_state.c           | 262 +++
>  src/gallium/drivers/r600/r600_state_common.c    |  45 +++-
>  src/gallium/drivers/r600/r600_texture.c         | 116 +-
>  src/gallium/drivers/r600/r600d.h                |  20 ++
>  14 files changed, 770 insertions(+), 100 deletions(-)
>
> Marek
Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation
On Fri, Aug 3, 2012 at 11:06 AM, Christian König deathsim...@vodafone.de wrote:
> Wait for VA use to end before reusing it.
>
> Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018
>
> Signed-off-by: Christian König deathsim...@vodafone.de
> ---
>  src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 64 +
>  1 file changed, 43 insertions(+), 21 deletions(-)
>
> diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> index 2626586..0c94461 100644
> --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
> @@ -102,6 +102,7 @@ static INLINE struct radeon_bo *radeon_bo(struct pb_buffer *bo)
>  struct radeon_bo_va_hole {
>      struct list_head list;
> +    uint32_t handle;
>      uint64_t offset;
>      uint64_t size;
>  };
> @@ -204,7 +205,30 @@ static uint64_t radeon_bomgr_find_va(struct radeon_bomgr *mgr, uint64_t size, ui
>      pipe_mutex_lock(mgr->bo_va_mutex);
>      /* first look for a hole */
>      LIST_FOR_EACH_ENTRY_SAFE(hole, n, &mgr->va_holes, list) {
> +        if (hole->handle) {
> +            struct drm_radeon_gem_busy busy_args;
> +            struct drm_gem_close close_args;
> +
> +            memset(&busy_args, 0, sizeof(busy_args));
> +            busy_args.handle = hole->handle;
> +            if (drmCommandWriteRead(mgr->rws->fd, DRM_RADEON_GEM_BUSY,
> +                                    &busy_args, sizeof(busy_args)) != 0) {
> +                continue;
> +            }
> +
> +            memset(&close_args, 0, sizeof(close_args));
> +            close_args.handle = hole->handle;
> +            drmIoctl(mgr->rws->fd, DRM_IOCTL_GEM_CLOSE, &close_args);
> +
> +            hole->handle = 0;
> +        }
>          offset = hole->offset;
> +        if ((offset + hole->size) == mgr->va_offset) {
> +            mgr->va_offset = offset;
> +            list_del(&hole->list);
> +            FREE(hole);
> +            continue;
> +        }
>          waste = 0;
>          if (alignment) {
>              waste = offset % alignment;
> @@ -280,23 +304,21 @@ static void radeon_bomgr_force_va(struct radeon_bomgr *mgr, uint64_t va, uint64_
>      pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
> -static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va, uint64_t size)
> +static void radeon_bomgr_free_va(struct radeon_bomgr *mgr, uint64_t va,
> +                                 uint64_t size, uint32_t handle)
>  {
> +    struct radeon_bo_va_hole *hole;
>      pipe_mutex_lock(mgr->bo_va_mutex);
> -    if ((va + size) == mgr->va_offset) {
> -        mgr->va_offset = va;
> -    } else {
> -        struct radeon_bo_va_hole *hole;
> -        /* FIXME on allocation failure we just lose virtual address space
> -         * maybe print a warning
> -         */
> -        hole = CALLOC_STRUCT(radeon_bo_va_hole);
> -        if (hole) {
> -            hole->size = size;
> -            hole->offset = va;
> -            list_add(&hole->list, &mgr->va_holes);
> -        }
> +    /* FIXME on allocation failure we just lose virtual address space
> +     * maybe print a warning
> +     */
> +    hole = CALLOC_STRUCT(radeon_bo_va_hole);
> +    if (hole) {
> +        hole->handle = handle;
> +        hole->size = size;
> +        hole->offset = va;
> +        list_add(&hole->list, &mgr->va_holes);
>      }
>      pipe_mutex_unlock(mgr->bo_va_mutex);
>  }
> @@ -320,12 +342,12 @@ static void radeon_bo_destroy(struct pb_buffer *_buf)
>          os_munmap(bo->ptr, bo->base.size);
>      if (mgr->va) {
> -        radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +        radeon_bomgr_free_va(mgr, bo->va, bo->va_size, bo->handle);
> +    } else {
> +        /* Close object. */
> +        args.handle = bo->handle;
> +        drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>      }
> -
> -    /* Close object. */
> -    args.handle = bo->handle;
> -    drmIoctl(bo->rws->fd, DRM_IOCTL_GEM_CLOSE, &args);
>      pipe_mutex_destroy(bo->map_mutex);
>      FREE(bo);
>  }
> @@ -540,7 +562,7 @@ static struct pb_buffer *radeon_bomgr_create_bo(struct pb_manager *_mgr,
>              return NULL;
>          }
>          if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -            radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +            radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
>              bo->va = va.offset;
>              radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
>          }
> @@ -865,7 +887,7 @@ done:
>              return NULL;
>          }
>          if (va.operation == RADEON_VA_RESULT_VA_EXIST) {
> -            radeon_bomgr_free_va(mgr, bo->va, bo->va_size);
> +            radeon_bomgr_free_va(mgr, bo->va, bo->va_size, 0);
>              bo->va = va.offset;
>              radeon_bomgr_force_va(mgr, bo->va, bo->va_size);
>          }
> --
> 1.7.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

As i said in the bug report this is not needed. As soon as you call in kernel it should work (given you
Re: [Mesa-dev] [PATCH 1/2] winsys/radeon: fix VA allocation
On Fri, Aug 3, 2012 at 11:06 AM, Christian König deathsim...@vodafone.de wrote:
> Wait for VA use to end before reusing it.
>
> Should fix: https://bugs.freedesktop.org/show_bug.cgi?id=45018
>
> Signed-off-by: Christian König deathsim...@vodafone.de

Actually you are right: Mesa can't free the VA right away, it needs to wait until the kernel is done with it. But the kernel was severely buggy too; it never cleared the page table when freeing an object. I attached a kernel patch. I am in the process of testing.

Cheers,
Jerome
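The idea behind the patch (keep the buffer handle attached to a freed VA range and only hand the range out again once the buffer is idle) can be sketched in isolation. This is only a rough illustration, not the winsys code: `buffer_is_busy`, `fake_busy`, `va_free` and `va_find` are hypothetical names, the busy query is a stub standing in for the DRM_RADEON_GEM_BUSY ioctl, and unlike the real allocator it returns whole holes without splitting or alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_HANDLES 16

/* Hypothetical stand-in for the kernel busy query
 * (the patch uses the DRM_RADEON_GEM_BUSY ioctl for this). */
static bool fake_busy[MAX_HANDLES];

static bool buffer_is_busy(uint32_t handle)
{
    assert(handle < MAX_HANDLES);
    return fake_busy[handle];
}

struct va_hole {
    struct va_hole *next;
    uint32_t handle;   /* non-zero while the old buffer may still be in use */
    uint64_t offset;
    uint64_t size;
};

static struct va_hole *holes;

/* Freeing only records the range; it stays tied to the buffer handle. */
static void va_free(uint64_t offset, uint64_t size, uint32_t handle)
{
    struct va_hole *h = calloc(1, sizeof(*h));
    if (!h)
        return; /* as in the patch, the range is simply lost */
    h->handle = handle;
    h->offset = offset;
    h->size = size;
    h->next = holes;
    holes = h;
}

/* Return the offset of an idle hole of at least `size` bytes and remove it
 * from the list, or UINT64_MAX if no hole can be reused yet. */
static uint64_t va_find(uint64_t size)
{
    struct va_hole **link = &holes;

    while (*link) {
        struct va_hole *h = *link;

        if (h->handle) {
            if (buffer_is_busy(h->handle)) {
                link = &h->next;   /* still in use: skip this hole */
                continue;
            }
            h->handle = 0;         /* idle: the patch closes the GEM handle here */
        }
        if (h->size >= size) {
            uint64_t offset = h->offset;
            *link = h->next;
            free(h);
            return offset;
        }
        link = &h->next;
    }
    return UINT64_MAX;
}
```

A range freed while its buffer is busy is skipped by `va_find` and becomes available on a later call once the stubbed busy query reports idle.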
Re: [Mesa-dev] [PATCH 1/2] r600g: add htile support v9
On Sun, Jul 29, 2012 at 1:50 PM, Marek Olšák mar...@gmail.com wrote:
> On Tue, Jul 17, 2012 at 7:58 PM, j.gli...@gmail.com wrote:
>> From: Jerome Glisse jgli...@redhat.com
>>
>> htile is used for HiZ and HiS support and fast Z/S clears. This commit just adds the htile setup and Fast Z clear. We don't take full advantage of HiS with that patch.
>>
>> v2 really use fast clear, still random issue with some tiles need to try more flush combination, fix depth/stencil texture decompression
>> v3 fix random issue on r6xx/r7xx
>> v4 rebase on top of latest mesa, disable CB export when clearing htile surface to avoid wasting bandwidth
>> v5 resummarize htile surface when uploading z value. Fix z/stencil decompression, the custom blitter with custom dsa is no longer needed.
>> v6 Reorganize render control/override update mechanism, fixing more issues in the process.
>> v7 Add nop after depth surface base update to work around some htile flushing issue. Force htile to 8x8 on r6xx/r7xx as other combination have issue. Do not enable hyperz when flushing/uncompressing depth buffer.
>> v8 Fix htile surface, preload and prefetch setup. Only set preload and prefetch on htile surface clear like fglrx. Record depth clear value per level. Support several level for the htile surface. First depth clear can't be a fast clear.
>> v9 Fix comments, properly account new register in emit function, disable fast zclear if clearing different layer of texture array to different value
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer pell...@gmail.com
>> Signed-off-by: Alex Deucher alexander.deuc...@amd.com
>> Signed-off-by: Jerome Glisse jgli...@redhat.com
>> ---
>>  src/gallium/drivers/r600/evergreen_hw_context.c |    6 +
>>  src/gallium/drivers/r600/evergreen_state.c      |  102 -
>>  src/gallium/drivers/r600/evergreend.h           |    4 +
>>  src/gallium/drivers/r600/r600_blit.c            |   38 +++
>>  src/gallium/drivers/r600/r600_hw_context.c      |   25 +
>>  src/gallium/drivers/r600/r600_pipe.c            |    8 ++
>>  src/gallium/drivers/r600/r600_pipe.h            |   13 ++-
>>  src/gallium/drivers/r600/r600_resource.h        |    7 ++
>>  src/gallium/drivers/r600/r600_state.c           |  133 ---
>>  src/gallium/drivers/r600/r600_texture.c         |  103 ++
>>  src/gallium/drivers/r600/r600d.h                |    6 +
>>  11 files changed, 399 insertions(+), 46 deletions(-)
>>
>> diff --git a/src/gallium/drivers/r600/evergreen_hw_context.c b/src/gallium/drivers/r600/evergreen_hw_context.c
>> index 081701f..546c884 100644
>> --- a/src/gallium/drivers/r600/evergreen_hw_context.c
>> +++ b/src/gallium/drivers/r600/evergreen_hw_context.c
>> @@ -62,6 +62,9 @@ static const struct r600_reg evergreen_context_reg_list[] = {
>>      {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>      {R_028058_DB_DEPTH_SIZE, 0, 0},
>>      {R_02805C_DB_DEPTH_SLICE, 0, 0},
>> +    {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>> +    {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>> +    {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>>      {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>>      {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>>      {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>> @@ -319,6 +322,9 @@ static const struct r600_reg cayman_context_reg_list[] = {
>>      {GROUP_FORCE_NEW_BLOCK, 0, 0},
>>      {R_028058_DB_DEPTH_SIZE, 0, 0},
>>      {R_02805C_DB_DEPTH_SLICE, 0, 0},
>> +    {R_02802C_DB_DEPTH_CLEAR, 0, 0},
>> +    {R_028ABC_DB_HTILE_SURFACE, 0, 0},
>> +    {R_028AC8_DB_PRELOAD_CONTROL, 0, 0},
>>      {R_028204_PA_SC_WINDOW_SCISSOR_TL, 0, 0},
>>      {R_028208_PA_SC_WINDOW_SCISSOR_BR, 0, 0},
>>      {R_028234_PA_SU_HARDWARE_SCREEN_OFFSET, 0, 0},
>> diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c
>> index a66387b..214d76b 100644
>> --- a/src/gallium/drivers/r600/evergreen_state.c
>> +++ b/src/gallium/drivers/r600/evergreen_state.c
>> @@ -710,13 +710,15 @@ static void *evergreen_create_blend_state(struct pipe_context *ctx,
>>      }
>>      blend->cb_target_mask = target_mask;
>> -    if (target_mask)
>> +    if (target_mask) {
>>          color_control |= S_028808_MODE(V_028808_CB_NORMAL);
>> -    else
>> +    } else {
>>          color_control |= S_028808_MODE(V_028808_CB_DISABLE);
>> +    }
>>      r600_pipe_state_add_reg(rstate, R_028808_CB_COLOR_CONTROL, color_control);
>> +
>>      /* only have dual source on MRT0 */
>>      blend->dual_src_blend = util_blend_state_is_dual(state, 0);
>>      for (int i = 0; i < 8; i++) {
>> @@ -1668,6 +1670,26 @@ static void evergreen_db(struct r600_context *rctx, struct r600_pipe_state *rsta
>>          }
>>      }
>> +    /* hyperz */
>> +    if (rtex->hyperz) {
>> +        uint64_t htile_offset = rtex->hyperz->surface.level[level].offset;
>> +
>> +        rctx->db_misc_state.hyperz = true;
>> +        rctx->db_misc_state.db_htile_surface_mask = 0x
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Sun, Jul 22, 2012 at 8:58 PM, Marek Olšák mar...@gmail.com wrote:
> On Fri, Jul 20, 2012 at 4:54 AM, Jerome Glisse j.gli...@gmail.com wrote:
>> On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
>>> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
>>>> On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
>>>>> I actually care a lot about lockups. Well, you are complaining about lockups, yet you have quite obvious bugs in your hyperz code, so let's fix them first. (I wouldn't even try and run the hyperz code in its current state. Please don't take that personally.)
>>>>>
>>>>> Then, if the lockups persist, we can start looking into *what* fixes them. You seem to think that this patch helps a lot, but you don't say why. Aren't you interested in what sequence of GPU commands helps? If I am counting correctly, there are 7 changes in behavior in this patch. It should be pretty easy to nail down the few that help, document them (like /* these two lines fix a lockup with hyperz */), and discard the rest. The documenting part is very important, so that the other developers won't break your code accidentally.
>>>>>
>>>>> Marek
>>>>
>>>> You haven't even tried hyperz and you say I have an obvious bug, that's kind of funny, but you would not know why. I try pretty much all of
>>>
>>> Oh come on, I already told you about all the bugs I found in the hyperz patch. You now know them too, and so does everybody else reading mesa-dev.
>>>
>>> Marek
>>
>> None of the issues you pointed out showed up in piglit, and none of them had an impact on things like openarena, nexuiz, doomIII, lightmark, ... so no, the issues you pointed out do not cripple the hyperz patch; it's working quite well for many things. Before you extrapolate: yes, the issues you pointed out have an impact in backward uses of GL, but nonetheless I addressed them, and I can tell you it does help a bit with lockups.
>
> I have no doubt that it helps with your lockups and I also have no doubt that the piece of code that helps can be bisected. I have mentioned 7 changes in the patch which are questionable, so the bisection should ideally take 3 steps. After we find the change which helps (and document it), we can discard the rest. That should give us the same stability as this patch does, but without unnecessary code which does cost GPU cycles (regardless of whether it is measurable on a particular machine or not).
>
> By the way, in draw_vbo, the emit functions should be called after r600_need_cs_space. Otherwise the command stream may overflow.
>
> Marek

Again, I haven't found a combination other than the outcome of the full patch that helps more. So be my guest, bisect on rv610, rv635, rv670, rv710, rv740, rv770.

Cheers,
Jerome
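The "7 changes, 3 steps" figure quoted above follows from halving the candidate set each round, since ceil(log2(7)) = 3. A minimal sketch of that arithmetic, under the assumption (disputed in this thread) that exactly one change is responsible for the improvement; `range_helps`, `helpful` and `tests_run` are hypothetical stand-ins for "build with only these changes enabled and check whether the lockup goes away":

```c
#include <assert.h>

/* Worst-case number of halving steps needed to isolate one item among n. */
static int bisect_steps(int n)
{
    int steps = 0;

    assert(n >= 1);
    while (n > 1) {
        n = (n + 1) / 2;   /* the larger half is the worst case */
        steps++;
    }
    return steps;
}

/* Hypothetical oracle: pretend change number `helpful` is the one that
 * matters and count how many test runs the bisection needs. */
static int helpful = 4;
static int tests_run;

static int range_helps(int lo, int hi)
{
    tests_run++;
    return helpful >= lo && helpful < hi;
}

/* Bisect over changes numbered 0 .. n-1 and return the one that helps. */
static int bisect_changes(int n)
{
    int lo = 0, hi = n;

    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;

        if (range_helps(lo, mid))
            hi = mid;
        else
            lo = mid;
    }
    return lo;
}
```

If the improvement only shows up when several changes are combined, which is what the reply argues, no single range satisfies the oracle and this kind of bisection does not converge on one change.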
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák mar...@gmail.com wrote:
> On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse j.gli...@gmail.com wrote:
>> Again, I haven't found a combination other than the outcome of the full patch that helps more. So be my guest, bisect on rv610, rv635, rv670, rv710, rv740, rv770.
>
> So your patch doesn't fix any issue with evergreen? That's great. Thanks for keeping that to yourself. It's always a pleasure working with you. :) Now that we know the truth, the questionable changes to the evergreen code can be discarded freely.

No, it helps on evergreen too; redwood, juniper, turks and barts are the only ones I tested with. Evergreen is in a slightly better position, but when it comes to lockups there are no good metrics.

> Concerning older chipsets, I can do the bisection only on rs880, rv670 and rv730. That will have to suffice. One way or another, every single change must be done for a *reason* and that reason should be documented if it's not obvious.
>
> Please give me all the necessary information, so that I can start bisecting. That is what lockups your patch fixes and where (name apps or tests, a specific place in a game, etc.) on what chipsets and whether hyperz is enabled.

Sorry, no such thing. It just helps: pick something, test with and without the patch, and you will see that it locks up less often with it. I did not make any of the changes in isolation to fix a single case; it's just that all the changes together help. But of course you assume that I am dumb, that I spent no time testing, and that I just put together some random things.

> It is very likely that all the changes I questioned in my first email do not make any difference with regard to lockups, because there are also other changes in your patch which may help too and which I fully agree with.
>
> Marek

As I said, it's a package deal: I did not find a solution, but I did find something that improved things overall.

Cheers,
Jerome
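One concrete point raised earlier in this thread is that, in draw_vbo, the emit functions should be called after r600_need_cs_space, otherwise the command stream may overflow. The general pattern (reserve room for the whole draw first, then emit without further checks) might look roughly like the sketch below. It is not the r600g code: `cmd_stream`, `cs_reserve`, `cs_emit`, `cs_flush` and the tiny `CS_MAX_DW` limit are hypothetical and chosen only to make the overflow case easy to see.

```c
#include <assert.h>
#include <stdint.h>

#define CS_MAX_DW 8   /* tiny on purpose so the example fills up quickly */

struct cmd_stream {
    uint32_t buf[CS_MAX_DW];
    unsigned cdw;       /* dwords written so far */
    unsigned flushes;   /* how many times the buffer was submitted */
};

/* Stand-in for submitting the buffer to the kernel. */
static void cs_flush(struct cmd_stream *cs)
{
    cs->cdw = 0;
    cs->flushes++;
}

/* Make sure `ndw` more dwords fit, flushing first if they do not. */
static void cs_reserve(struct cmd_stream *cs, unsigned ndw)
{
    assert(ndw <= CS_MAX_DW);
    if (cs->cdw + ndw > CS_MAX_DW)
        cs_flush(cs);
}

/* Emitting never makes room itself; the caller must have reserved space. */
static void cs_emit(struct cmd_stream *cs, uint32_t dw)
{
    assert(cs->cdw < CS_MAX_DW);
    cs->buf[cs->cdw++] = dw;
}

/* A draw that writes 3 state dwords plus 2 draw dwords. */
static void draw(struct cmd_stream *cs)
{
    unsigned i;

    cs_reserve(cs, 5);              /* reserve for everything up front */
    for (i = 0; i < 3; i++)
        cs_emit(cs, 0x1000 + i);    /* state */
    cs_emit(cs, 0x2000);            /* draw packet, first dword */
    cs_emit(cs, 0x2001);            /* draw packet, second dword */
}
```

If the state dwords were emitted before the `cs_reserve` call, the second draw in a row would start writing at dword 5 of 8 and run past the end of the buffer instead of triggering a flush.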
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Mon, Jul 23, 2012 at 5:28 PM, Marek Olšák mar...@gmail.com wrote:
> On Mon, Jul 23, 2012 at 4:25 PM, Jerome Glisse j.gli...@gmail.com wrote:
>> Again, I haven't found a combination other than the outcome of the full patch that helps more. So be my guest, bisect on rv610, rv635, rv670, rv710, rv740, rv770.
>
> So your patch doesn't fix any issue with evergreen? That's great. Thanks for keeping that to yourself. It's always a pleasure working with you. :) Now that we know the truth, the questionable changes to the evergreen code can be discarded freely.

As usual, you make the worst assumption about me.

Cheers,
Jerome

> Concerning older chipsets, I can do the bisection only on rs880, rv670 and rv730. That will have to suffice. One way or another, every single change must be done for a *reason* and that reason should be documented if it's not obvious.
>
> Please give me all the necessary information, so that I can start bisecting. That is what lockups your patch fixes and where (name apps or tests, a specific place in a game, etc.) on what chipsets and whether hyperz is enabled.
>
> It is very likely that all the changes I questioned in my first email do not make any difference with regard to lockups, because there are also other changes in your patch which may help too and which I fully agree with.
>
> Marek
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Thu, Jul 19, 2012 at 9:00 PM, Marek Olšák mar...@gmail.com wrote:
> On Fri, Jul 20, 2012 at 1:34 AM, Jerome Glisse j.gli...@gmail.com wrote:
>> On Thu, Jul 19, 2012 at 6:07 PM, Marek Olšák mar...@gmail.com wrote:
>>> I have these issues with the patch:
>>>
>>> 1) On GPUs without a vertex cache, you flush the texture cache every draw operation.
>>
>> Are you kidding? Show me one app with perf regression due to that? Or just go look at what fglrx is doing.
>
> I don't believe that fglrx unconditionally emits SURFACE_SYNC with TC_ACTION_ENA before every DRAW packet. I just don't buy that. It's too stupid to be true. And considering that it wasn't needed before, it's not needed now either. Please give me some other argument than just fglrx.

No, fglrx doesn't set it for each draw; fglrx sets it if one of a bunch of registers is touched. Given that right now we pretty much always touch one of those registers between draws, it just turns out that my patch triggers the flush between each draw.

>>> 2) All colorbuffers / streamout buffers are flushed, even those which are not enabled. E.g. instead of flushing only CB0 when there is only one, this code flushes all of them. Why? This either needs an explanation or it should only flush the buffers which are enabled (like the old code did).
>>
>> fglrx + no perf regression ...
>
> The no perf regression argument doesn't apply here, because it just might not be the bottleneck now. I'm willing to step aside from this one issue though.

I am just trying to stick to the fglrx pattern.

>>> 3) Please explain:
>>>
>>> - why you added PS_PARTIAL_FLUSH in r600_texture_barrier and r600_set_framebuffer_state.
>>
>> fglrx is doing something similar
>
> But not exactly the same thing, right? So there's no reason for it to be there.

It's hard to do as fglrx does, as the pattern is evading me: no matter how many different app command streams I look at, I always find an exception to the rule I am formulating.

>>> - why you added CACHE_FLUSH_AND_INV_EVENT in set_framebuffer_state for R700 and evergreen.
>>
>> fglrx ...
>>
>>> - why you applied the CB flush workarounds meant for RV6xx to all R600 and R700 chipsets.
>>
>> fglrx ...
>>
>>> - why the streamout workaround for RV6xx (S_0085F0_DEST_BASE_0_ENA) is applied to all R600, R700, and evergreen chipsets.
>>
>> It didn't hurt, though fglrx is not using that at all, but I did not want to remove it.
>
> Well, you didn't remove it. You added it for those other chipsets. That's a difference. You don't even know what you did there, do you? :)
>
> All the things I mentioned are either half-assed or added for no reason. Fglrx might do all sorts of stupid things or for its own reasons, but that doesn't mean it's automatically good for us. Besides that, it's almost impossible to figure out why a CS was built up exactly the way it was without access to the driver code and to its developers.

Oh yeah, I don't have a fucking clue, I am fucking clueless, I am just a fool who writes fucking random lines of code and has no fucking idea of what I am doing. Of course you know better, please enlighten me.

I am totally on board with fglrx doing stupid things, but yet it does not lock up ... so one of those stupid things is important, and until someone figures out which one, I would rather do more stupid things and not lock up than try to pretend that flushing is a bottleneck with the driver right now.

>>> - why R600_CONTEXT_FLUSH_AND_INV emits SURFACE_SYNC on evergreen, resulting in emission of SURFACE_SYNC twice in a row in most situations.
>>
>> fglrx is doing that and without that lockup ...
>
> Hm, now you're talking. So do you need:
>
> FLUSH_AND_INV + SURFACE_SYNC (COHER_CNTL = ~0)
>
> or do you need:
>
> FLUSH_AND_INV + SURFACE_SYNC (COHER_CNTL = ~0) + SURFACE_SYNC (COHER_CNTL = according to flags)
>
> for it not to lock up?

Flush-and-inv is always followed by a surface sync, with a few exceptions (on which I am not clear), but there is always a surface sync before a draw after a flush-and-inv.

>>> Flushing has always worked without all the changes (1, 2, 3) mentioned above, so please if you don't have a reasonable explanation, revert to the old behavior.
>>
>> Well if you have a better solution please show me ...
>
> I already showed you in the first reply. If you are unwilling to change your patches even a little bit, I'll happily take them over from you.
>
> Marek

Oh, I will change them, just not the way you like. I am trying to avoid lockups; you obviously don't give a shit about that.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Thu, Jul 19, 2012 at 10:06 PM, Marek Olšák mar...@gmail.com wrote:
> I actually care a lot about lockups. Well, you are complaining about lockups, yet you have quite obvious bugs in your hyperz code, so let's fix them first. (I wouldn't even try and run the hyperz code in its current state. Please don't take that personally.)
>
> Then, if the lockups persist, we can start looking into *what* fixes them. You seem to think that this patch helps a lot, but you don't say why. Aren't you interested in what sequence of GPU commands helps? If I am counting correctly, there are 7 changes in behavior in this patch. It should be pretty easy to nail down the few that help, document them (like /* these two lines fix a lockup with hyperz */), and discard the rest. The documenting part is very important, so that the other developers won't break your code accidentally.
>
> Marek

You haven't even tried hyperz and you say I have an obvious bug, that's kind of funny, but you would not know why. I tried pretty much all of the things my patch does, in isolation and in combination with each other, and the only way I got an improvement is with something similar to this patch. Remove one thing and I can find programs that are more likely to lock up.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH 2/2] r600g: simplify and fix flushing and synchronization v2
On Thu, Jul 19, 2012 at 10:25 PM, Marek Olšák mar...@gmail.com wrote:
> On Fri, Jul 20, 2012 at 4:17 AM, Jerome Glisse j.gli...@gmail.com wrote:
>> You haven't even tried hyperz and you say I have an obvious bug, that's kind of funny, but you would not know why. I try pretty much all of
>
> Oh come on, I already told you about all the bugs I found in the hyperz patch. You now know them too, and so does everybody else reading mesa-dev.
>
> Marek

None of the issues you pointed out showed up in piglit, and none of them had an impact on things like openarena, nexuiz, doomIII, lightmark, ... so no, the issues you pointed out do not cripple the hyperz patch; it's working quite well for many things. Before you extrapolate: yes, the issues you pointed out have an impact in backward uses of GL, but nonetheless I addressed them, and I can tell you it does help a bit with lockups.

Cheers,
Jerome
Re: [Mesa-dev] r600g: hyperz
On Sat, Jul 14, 2012 at 9:56 AM, Alex Deucher alexdeuc...@gmail.com wrote:
> On Fri, Jul 13, 2012 at 8:11 PM, Jerome Glisse j.gli...@gmail.com wrote:
>> On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák mar...@gmail.com wrote:
>>> Hi Jerome,
>>>
>>> I couldn't open the patch, because freedesktop.org doesn't seem to work for me today, it always times out.
>>>
>>> Anyway, non-working code shouldn't be merged into Mesa master, because it decreases the quality of the driver and is a pain to maintain. As I said in another email, merging non-working code on purpose is a very bad idea. Please don't do it.
>>>
>>> Marek
>>
>> Code works, no regression, but if you enable hyperz get ready to experience lockups; the likelihood depends on what you are doing. So no, I don't consider this non-working code. It does work and doesn't regress.
>
> Is it just 6xx/7xx that locks or also evergreen? Also even if we don't turn on hyperz, it probably makes sense to always have an htile buffer bound as the htile cache (and backing htile buffer) is used for Z/S compression, culling, fast ops, etc. in addition to HiZ/S if a Z or S buffer is bound.
>
> Alex

Just enabling the htile surface is enough to trigger the lockup, thus we can't bind the htile buffer. Quite frankly, I don't know how much of an issue evergreen is; I pretty much stuck with r6xx/r7xx as they were always locking up with my test case. Though I have been able to lock up evergreen, I did have the feeling that it was a lot less likely to happen.

Basically, to trigger the lockup you have to switch between a lot of depth surfaces/htile surfaces; if you just have a single depth buffer you will be fine. Thus most use cases will just work properly.

Cheers,
Jerome
Re: [Mesa-dev] r600g: hyperz
On Fri, Jul 13, 2012 at 8:08 PM, Marek Olšák mar...@gmail.com wrote:
> Hi Jerome,
>
> I couldn't open the patch, because freedesktop.org doesn't seem to work for me today, it always times out.
>
> Anyway, non-working code shouldn't be merged into Mesa master, because it decreases the quality of the driver and is a pain to maintain. As I said in another email, merging non-working code on purpose is a very bad idea. Please don't do it.
>
> Marek

Code works, no regression, but if you enable hyperz get ready to experience lockups; the likelihood depends on what you are doing. So no, I don't consider this non-working code. It does work and doesn't regress.

Cheers,
Jerome
Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2
On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák mar...@gmail.com wrote: On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin vadimgir...@gmail.com wrote: On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote: On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin vadimgir...@gmail.com wrote:
Use r600_resource_texture::flushed_depth_texture for GPU access, and allocate it in the VRAM. For transfers we'll allocate an untiled texture in the GTT and store it in r600_transfer::staging. Improves performance when the flushed depth texture is frequently used by the GPU (about 30% for Lightsmark). Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on evergreen
Hi, is there any reason this patch hasn't been committed yet?
Hi, I have some doubts because it was benchmarked by phoronix and there were regressions, though I suspect that something is wrong with the results: http://www.phoronix.com/scan.php?page=articleitem=amd_r600g_texdepthnum=4 I was going to look into it but had no time yet. I'd like to be sure that there are no regressions before committing.
Well, there's nothing wrong with your patch. I wouldn't trust benchmarks run with the Unity desktop so much. I myself had to switch from Unity 2D to Xfce just to get consistent results when testing performance. Now that your patch separates flushing for texturing and transfers, I think we could make it a little bit faster by implementing an in-place flush for texturing (that is, without having to allocate another resource). Marek
In-place flushes are useful for the case where you know you won't reuse the depth buffer as a depth buffer, or where you know the next operation will be a glClear on the depth buffer. What I am worried about is that recompression might not work in place; for it to work, you need to have the DB decompressed into the DB tiling format and not the CB tiling format.
Cheers, Jerome
Re: [Mesa-dev] [PATCH] r600g: improve flushed depth texture handling v2
On Tue, Jul 10, 2012 at 5:16 PM, Marek Olšák mar...@gmail.com wrote: On Tue, Jul 10, 2012 at 10:00 PM, Jerome Glisse j.gli...@gmail.com wrote: On Tue, Jul 10, 2012 at 2:10 PM, Marek Olšák mar...@gmail.com wrote: On Tue, Jul 10, 2012 at 6:40 AM, Vadim Girlin vadimgir...@gmail.com wrote: On Sat, 2012-07-07 at 01:48 +0200, Marek Olšák wrote: On Wed, Jun 27, 2012 at 1:34 AM, Vadim Girlin vadimgir...@gmail.com wrote:
Use r600_resource_texture::flushed_depth_texture for GPU access, and allocate it in the VRAM. For transfers we'll allocate an untiled texture in the GTT and store it in r600_transfer::staging. Improves performance when the flushed depth texture is frequently used by the GPU (about 30% for Lightsmark). Signed-off-by: Vadim Girlin vadimgir...@gmail.com --- Fixes fbo-clear-formats, fbo-generatemipmap-formats, no regressions on evergreen
Hi, is there any reason this patch hasn't been committed yet?
Hi, I have some doubts because it was benchmarked by phoronix and there were regressions, though I suspect that something is wrong with the results: http://www.phoronix.com/scan.php?page=articleitem=amd_r600g_texdepthnum=4 I was going to look into it but had no time yet. I'd like to be sure that there are no regressions before committing.
Well, there's nothing wrong with your patch. I wouldn't trust benchmarks run with the Unity desktop so much. I myself had to switch from Unity 2D to Xfce just to get consistent results when testing performance. Now that your patch separates flushing for texturing and transfers, I think we could make it a little bit faster by implementing an in-place flush for texturing (that is, without having to allocate another resource). Marek
In-place flushes are useful for the case where you know you won't reuse the depth buffer as a depth buffer, or where you know the next operation will be a glClear on the depth buffer. What I am worried about is that recompression might not work in place; for it to work, you need to have the DB decompressed into the DB tiling format and not the CB tiling format.
The case where the depth is not reused is the most common one. It might even be the only one in practice. Depth textures are most commonly used for shadow mapping, which is the not-reusing case. They can also be used to implement deferred rendering (though that's not very common), which means the same as shadow mapping for us. Actually, no graphics algorithm comes to mind that would do write-texture-write with the same depth buffer. Marek
I am not saying it's not the most common one; I am saying that recompressing might be more complex (recompress to a different buffer and then copy back to the original buffer, or copy the buffer and uncompress from the copy).
Cheers, Jerome
Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports
On Tue, Jun 26, 2012 at 5:45 AM, Vadim Girlin vadimgir...@gmail.com wrote: On Fri, 2012-06-22 at 14:24 -0400, Jerome Glisse wrote: On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
r600g: avoid unnecessary shader exports
r600g: enable DUAL_EXPORT mode when possible
First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it fixes some depth/stencil tests, though I'm not sure why, haven't looked into it (possibly unexpected color exports were written over the depth exports). Second patch enables DUAL_EXPORT mode when possible, giving about 40% improvement with the results of the fill demo (on juniper). Also it sets DB_SOURCE_FORMAT to EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm not sure yet if it has any effect on performance. I haven't tried to implement the same for pre-evergreen cards - I can't test it anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will be additional requirements for DUAL_EXPORT mode for r6xx (it's documented in the R6xx_3D_Registers.pdf). There are no regressions with piglit on evergreen (juniper).
The r6xx/r7xx version is WIP and not working (well, not improving perf): http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch
AFAIK you've fixed that already, do you have any regressions with dual export on r6xx/7xx? There are some issues reported on rv770 with patch 1 - http://lists.freedesktop.org/archives/mesa-dev/2012-June/023229.html Vadim
Yeah, I have updated patches here that fix the regression. I will send them shortly, once I am confident.
Cheers, Jerome
Re: [Mesa-dev] [PATCH 1/2] r600g: avoid unnecessary shader exports
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote: In some cases TGSI shader has more color outputs than the number of CBs, so it seems we need to limit the number of color exports. This requires different shader variants depending on the nr_cbufs, but on the other hand we are doing less exports, which are very costly. Signed-off-by: Vadim Girlin vadimgir...@gmail.com Reviewed-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r600/evergreen_state.c | 10 +++--- src/gallium/drivers/r600/r600_shader.c | 25 ++--- src/gallium/drivers/r600/r600_shader.h | 7 ++- src/gallium/drivers/r600/r600_state_common.c | 4 ++-- 4 files changed, 33 insertions(+), 13 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index b618ca8..3fe95e1 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -2641,18 +2641,14 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader db_shader_control |= S_02880C_KILL_ENABLE(1); exports_ps = 0; - num_cout = 0; for (i = 0; i rshader-noutput; i++) { if (rshader-output[i].name == TGSI_SEMANTIC_POSITION || rshader-output[i].name == TGSI_SEMANTIC_STENCIL) exports_ps |= 1; - else if (rshader-output[i].name == TGSI_SEMANTIC_COLOR) { - if (rshader-fs_write_all) - num_cout = rshader-nr_cbufs; - else - num_cout++; - } } + + num_cout = rshader-nr_ps_color_exports; + exports_ps |= S_02884C_EXPORT_COLORS(num_cout); if (!exports_ps) { /* always at least export 1 component per pixel */ diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 63b9a03..782113b 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -801,6 +801,12 @@ static int tgsi_declaration(struct r600_shader_ctx *ctx) ctx-cv_output = i; break; } + } else if (ctx-type == TGSI_PROCESSOR_FRAGMENT) { + switch (d-Semantic.Name) { + case 
TGSI_SEMANTIC_COLOR: + ctx-shader-nr_ps_max_color_exports++; + break; + } } break; case TGSI_FILE_CONSTANT: @@ -1153,8 +1159,10 @@ static int r600_shader_from_tgsi(struct r600_context * rctx, struct r600_pipe_sh ctx.colors_used = 0; ctx.clip_vertex_write = 0; + shader-nr_ps_color_exports = 0; + shader-nr_ps_max_color_exports = 0; + shader-two_side = (ctx.type == TGSI_PROCESSOR_FRAGMENT) rctx-two_side; - shader-nr_cbufs = rctx-nr_cbufs; /* register allocations */ /* Values [0,127] correspond to GPR[0..127]. @@ -1289,6 +1297,9 @@ static int r600_shader_from_tgsi(struct r600_context * rctx, struct r600_pipe_sh } } + if (shader-fs_write_all rctx-chip_class = EVERGREEN) + shader-nr_ps_max_color_exports = 8; + if (ctx.fragcoord_input = 0) { if (ctx.bc-chip_class == CAYMAN) { for (j = 0 ; j 4; j++) { @@ -1528,10 +1539,17 @@ static int r600_shader_from_tgsi(struct r600_context * rctx, struct r600_pipe_sh break; case TGSI_PROCESSOR_FRAGMENT: if (shader-output[i].name == TGSI_SEMANTIC_COLOR) { + /* never export more colors than the number of CBs */ + if (next_pixel_base = rctx-nr_cbufs) { + /* skip export */ + j--; + continue; + } output[j].array_base = next_pixel_base++; output[j].type = V_SQ_CF_ALLOC_EXPORT_WORD0_SQ_EXPORT_PIXEL; + shader-nr_ps_color_exports++; if (shader-fs_write_all (rctx-chip_class = EVERGREEN)) { - for (k = 1; k shader-nr_cbufs; k++) { + for (k = 1; k rctx-nr_cbufs; k++) { j++; memset(output[j], 0, sizeof(struct r600_bytecode_output)); output[j].gpr = shader-output[i].gpr; @@ -1545,6 +1563,7 @@ static int
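
For readers skimming the (truncated) diff above, the core idea of the patch, never exporting more colors than there are bound CBs, can be sketched in isolation. This is only a rough illustration: count_color_exports is a hypothetical helper, not a function from r600g, and it deliberately ignores the fs_write_all replication case that the real patch also handles.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of Mesa): given the number of
 * TGSI_SEMANTIC_COLOR outputs a fragment shader declares and the number
 * of bound color buffers, return how many color exports to emit.
 * Assumption: the fs_write_all case is left out for brevity.
 */
static unsigned count_color_exports(unsigned shader_color_outputs,
                                    unsigned nr_cbufs)
{
    unsigned n = shader_color_outputs;

    /* Never export more colors than the number of CBs. */
    if (n > nr_cbufs)
        n = nr_cbufs;

    assert(n <= nr_cbufs);
    return n;
}
```

Because the result depends on nr_cbufs, a shader compiled for one framebuffer configuration may need a different variant for another, which is the trade-off the commit message mentions.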
Re: [Mesa-dev] [PATCH 2/2] r600g: enable DUAL_EXPORT mode when possible
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote: It seems DUAL_EXPORT on evergreen may be enabled when all CBs use 16-bit export mode (EXPORT_4C_16BPC), also there should be at least one CB, and the PS shouldn't export depth/stencil. Signed-off-by: Vadim Girlin vadimgir...@gmail.com Reviewed-by: Jerome Glisse jgli...@redhat.com --- src/gallium/drivers/r600/evergreen_state.c | 46 ++ src/gallium/drivers/r600/evergreend.h | 7 src/gallium/drivers/r600/r600_pipe.h | 5 +++ src/gallium/drivers/r600/r600_state_common.c | 3 ++ 4 files changed, 55 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index 3fe95e1..bddb67e 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -1458,7 +1458,6 @@ static void evergreen_cb(struct r600_context *rctx, struct r600_pipe_state *rsta (desc-channel[i].size 17 desc-channel[i].type == UTIL_FORMAT_TYPE_FLOAT))) { color_info |= S_028C70_SOURCE_FORMAT(V_028C70_EXPORT_4C_16BPC); - rctx-export_16bpc = true; } else { rctx-export_16bpc = false; } @@ -1661,6 +1660,7 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx, struct r600_context *rctx = (struct r600_context *)ctx; struct r600_pipe_state *rstate = CALLOC_STRUCT(r600_pipe_state); uint32_t tl, br; + int i; if (rstate == NULL) return; @@ -1674,10 +1674,16 @@ static void evergreen_set_framebuffer_state(struct pipe_context *ctx, /* build states */ rctx-have_depth_fb = 0; + rctx-export_16bpc = true; rctx-nr_cbufs = state-nr_cbufs; - for (int i = 0; i state-nr_cbufs; i++) { + for (i = 0; i state-nr_cbufs; i++) { evergreen_cb(rctx, rstate, state, i); } + + for (; i 8 ; i++) { + r600_pipe_state_add_reg(rstate, R_028C70_CB_COLOR0_INFO + i * 0x3C, 0); + } + if (state-zsbuf) { evergreen_db(rctx, rstate, state); } @@ -2585,6 +2591,7 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader int ninterp 
= 0; boolean have_linear = FALSE, have_centroid = FALSE, have_perspective = FALSE; unsigned spi_baryc_cntl, sid, tmp, idx = 0; + unsigned z_export = 0, stencil_export = 0; rstate-nregs = 0; @@ -2633,13 +2640,16 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader for (i = 0; i rshader-noutput; i++) { if (rshader-output[i].name == TGSI_SEMANTIC_POSITION) - db_shader_control |= S_02880C_Z_EXPORT_ENABLE(1); + z_export = 1; if (rshader-output[i].name == TGSI_SEMANTIC_STENCIL) - db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(1); + stencil_export = 1; } if (rshader-uses_kill) db_shader_control |= S_02880C_KILL_ENABLE(1); + db_shader_control |= S_02880C_Z_EXPORT_ENABLE(z_export); + db_shader_control |= S_02880C_STENCIL_EXPORT_ENABLE(stencil_export); + exports_ps = 0; for (i = 0; i rshader-noutput; i++) { if (rshader-output[i].name == TGSI_SEMANTIC_POSITION || @@ -2711,8 +2721,9 @@ void evergreen_pipe_shader_ps(struct pipe_context *ctx, struct r600_pipe_shader r600_pipe_state_add_reg(rstate, R_02884C_SQ_PGM_EXPORTS_PS, exports_ps); - r600_pipe_state_add_reg(rstate, R_02880C_DB_SHADER_CONTROL, - db_shader_control); + + shader-db_shader_control = db_shader_control; + shader-ps_depth_export = z_export | stencil_export; shader-sprite_coord_enable = rctx-sprite_coord_enable; if (rctx-rasterizer) @@ -2798,3 +2809,26 @@ void *evergreen_create_db_flush_dsa(struct r600_context *rctx) /* Don't set the 'is_flush' flag in r600_pipe_dsa, evergreen doesn't need it. */ return rstate; } + +void evergreen_update_dual_export_state(struct r600_context * rctx) +{ + unsigned dual_export = rctx-export_16bpc rctx-nr_cbufs + !rctx-ps_shader-ps_depth_export; + + unsigned db_source_format = dual_export ? 
V_02880C_EXPORT_DB_TWO : + V_02880C_EXPORT_DB_FULL; + + unsigned db_shader_control = rctx-ps_shader-db_shader_control | + S_02880C_DUAL_EXPORT_ENABLE(dual_export) | + S_02880C_DB_SOURCE_FORMAT(db_source_format); + + if (db_shader_control != rctx-db_shader_control) { + struct r600_pipe_state rstate
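
The enabling condition described in the commit message above (all CBs use 16-bit export, at least one CB is bound, and the PS exports neither depth nor stencil) boils down to a small predicate. The sketch below is an assumption-laden paraphrase: struct dual_export_inputs and can_use_dual_export are hypothetical names, and only the three field names mirror the ones visible in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of context state the patch reads. */
struct dual_export_inputs {
    bool export_16bpc;     /* every bound CB uses EXPORT_4C_16BPC */
    unsigned nr_cbufs;     /* number of bound color buffers */
    bool ps_depth_export;  /* PS writes depth and/or stencil */
};

/* Returns true when DUAL_EXPORT may be enabled, per the commit message. */
static bool can_use_dual_export(const struct dual_export_inputs *in)
{
    assert(in);
    return in->export_16bpc && in->nr_cbufs > 0 && !in->ps_depth_export;
}
```

Since the inputs come from both framebuffer state and the bound pixel shader, the check has to be re-evaluated whenever either changes, which is presumably why the patch adds a separate update function rather than folding it into shader creation.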
Re: [Mesa-dev] [PATCH 0/2] [RFC] r600g: improve handling of the shader exports
On Fri, Jun 22, 2012 at 10:02 AM, Vadim Girlin vadimgir...@gmail.com wrote:
r600g: avoid unnecessary shader exports
r600g: enable DUAL_EXPORT mode when possible
First patch fixes the lockups with DUAL_EXPORT mode for me, also AFAICS it fixes some depth/stencil tests, though I'm not sure why, haven't looked into it (possibly unexpected color exports were written over the depth exports). Second patch enables DUAL_EXPORT mode when possible, giving about 40% improvement with the results of the fill demo (on juniper). Also it sets DB_SOURCE_FORMAT to EXPORT_DB_TWO when in DUAL_EXPORT mode, though I'm not sure yet if it has any effect on performance. I haven't tried to implement the same for pre-evergreen cards - I can't test it anyway without r600 hw, but I guess it shouldn't be hard. AFAIK there will be additional requirements for DUAL_EXPORT mode for r6xx (it's documented in the R6xx_3D_Registers.pdf). There are no regressions with piglit on evergreen (juniper).
The r6xx/r7xx version is WIP and not working (well, not improving perf): http://people.freedesktop.org/~glisse/0003-r600g-enable-DUAL_EXPORT-mode-when-possible-on-r6xx-.patch
Cheers, Jerome
src/gallium/drivers/r600/evergreen_state.c | 56 --
src/gallium/drivers/r600/evergreend.h | 7
src/gallium/drivers/r600/r600_pipe.h | 5 +++
src/gallium/drivers/r600/r600_shader.c | 25 ++--
src/gallium/drivers/r600/r600_shader.h | 7 +++-
src/gallium/drivers/r600/r600_state_common.c | 7 +++-
6 files changed, 88 insertions(+), 19 deletions(-)
-- 1.7.10.4
Re: [Mesa-dev] [PATCH] r600g: Unify SURFACE_SYNC packet emission for 3D and compute
On Tue, Jun 19, 2012 at 2:06 PM, Tom Stellard thomas.stell...@amd.com wrote: On Tue, Jun 19, 2012 at 07:57:50PM +0200, Marek Olšák wrote:
Hi Tom, This adds new calls to r600_inval_xxx_cache, which just sets the dirty flag in the atom surface_sync_cmd to true, but I couldn't find where the compute code calls r600_emit_atom. The proper way to emit dirty atoms is in r600_state_common.c:843-845.
The compute code is calling r600_flush_framebuffer() from compute_emit_cs, which is what calls r600_emit_atom() for surface_sync_cmd. -Tom
I am heavily refactoring all this for hyperz, but I can rebase once I have it working.
Cheers, Jerome
Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance
On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund fred...@kde.org wrote: On Tuesday 19 June 2012, Brian Paul wrote: On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
Reviewed-by: Brian Paul bri...@vmware.com --- v2: Change baseinstance to base_instance in _mesa_prims and to baseInstance in the vbo_exec functions.
src/mapi/glapi/gen/ARB_base_instance.xml | 40 +++
src/mapi/glapi/gen/Makefile | 1 +
src/mapi/glapi/gen/gl_API.xml | 3 +-
src/mesa/main/dd.h | 10 +++
src/mesa/main/dlist.c | 45
src/mesa/main/extensions.c | 1 +
src/mesa/main/mtypes.h | 1 +
src/mesa/main/vtxfmt.c | 3 +
src/mesa/vbo/vbo.h | 1 +
src/mesa/vbo/vbo_exec_api.c | 1 +
src/mesa/vbo/vbo_exec_array.c | 114 +++---
src/mesa/vbo/vbo_save_api.c | 2 +
src/mesa/vbo/vbo_split_inplace.c | 6 +-
13 files changed, 216 insertions(+), 12 deletions(-)
create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml
Looks good. Do you need me to commit/push these for you?
Yeah, I don't have commit access, so please do. Fredrik
This breaks the gallium driver; nothing renders with it.
Cheers, Jerome
Re: [Mesa-dev] [PATCH 1/4] mesa: Add support for GL_ARB_base_instance
On Tue, Jun 19, 2012 at 4:46 PM, Jerome Glisse j.gli...@gmail.com wrote: On Mon, Jun 18, 2012 at 8:33 PM, Fredrik Höglund fred...@kde.org wrote: On Tuesday 19 June 2012, Brian Paul wrote: On 06/18/2012 02:50 PM, Fredrik Höglund wrote:
Reviewed-by: Brian Paul bri...@vmware.com --- v2: Change baseinstance to base_instance in _mesa_prims and to baseInstance in the vbo_exec functions.
src/mapi/glapi/gen/ARB_base_instance.xml | 40 +++
src/mapi/glapi/gen/Makefile | 1 +
src/mapi/glapi/gen/gl_API.xml | 3 +-
src/mesa/main/dd.h | 10 +++
src/mesa/main/dlist.c | 45
src/mesa/main/extensions.c | 1 +
src/mesa/main/mtypes.h | 1 +
src/mesa/main/vtxfmt.c | 3 +
src/mesa/vbo/vbo.h | 1 +
src/mesa/vbo/vbo_exec_api.c | 1 +
src/mesa/vbo/vbo_exec_array.c | 114 +++---
src/mesa/vbo/vbo_save_api.c | 2 +
src/mesa/vbo/vbo_split_inplace.c | 6 +-
13 files changed, 216 insertions(+), 12 deletions(-)
create mode 100644 src/mapi/glapi/gen/ARB_base_instance.xml
Looks good. Do you need me to commit/push these for you?
Yeah, I don't have commit access, so please do. Fredrik
This breaks the gallium driver; nothing renders with it. Cheers, Jerome
Well, never mind; git clean -fdX did the trick. Sorry for the noise.
Cheers, Jerome
Re: [Mesa-dev] Clarifications w.r.t MSAA
On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote: On 06/12/2012 02:25 PM, Olivier Galibert wrote: On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
First question: how many depths should be computed, and for which coordinates? Which of these values is associated with which sample?
One for each sample point. The depth buffer will be multisampled as well. Coverage sampling (CSAA), where you have extra coverage samples that do NOT (necessarily) correspond to color sample locations, is not covered by the GL spec; it's vendor-specific.
Ok. So that means that if the shader writes z, you have to do full supersampling then.
No, I don't think that's the case. You get per-sample depth values if you use fixed-pipe depth, but shader-computed depth should simply be replicated (to all samples covered by the shader invocation), like color outputs.
I don't think that's how it works; each sample will have its own color and depth value, whether the fixed pipeline is used or not. When resolving the MSAA surface, you only use the samples that cover the surface to compute the average. Anyway, that's my understanding.
Cheers, Jerome
Re: [Mesa-dev] Clarifications w.r.t MSAA
On Tue, Jun 12, 2012 at 1:34 PM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote: On 06/12/2012 06:52 PM, Jerome Glisse wrote: On Tue, Jun 12, 2012 at 8:39 AM, Christoph Bumiller e0425...@student.tuwien.ac.at wrote: On 06/12/2012 02:25 PM, Olivier Galibert wrote: On Tue, Jun 12, 2012 at 01:50:08PM +0200, Christoph Bumiller wrote:
First question: how many depths should be computed, and for which coordinates? Which of these values is associated with which sample?
One for each sample point. The depth buffer will be multisampled as well. Coverage sampling (CSAA), where you have extra coverage samples that do NOT (necessarily) correspond to color sample locations, is not covered by the GL spec; it's vendor-specific.
Ok. So that means that if the shader writes z, you have to do full supersampling then.
No, I don't think that's the case. You get per-sample depth values if you use fixed-pipe depth, but shader-computed depth should simply be replicated (to all samples covered by the shader invocation), like color outputs.
I don't think that's how it works; each sample will have its own color and depth value, whether the fixed pipeline is used or not. When resolving the
Sorry, fixed-pipe was misleading, I meant the z-value from the rasterizer (which can be regarded as fixed functionality), not without (custom) shaders. If the shader is only invoked once for each fragment (i.e. MinSampleShading == 1), all the samples that belong to that fragment will share the same color and depth values.
So I think we agree, but according to the spec, with MinSampleShading=1 the fragment shader is run once for each sample. The MinSampleShading value is a fraction of x/MIN_SAMPLE_SHADING_VALUE_ARB. So if you have an 8-sample surface and you set MinSampleShading to 0.5, you will get the fragment shader invoked for 4 samples. Note that, according to the spec, an implementation might ignore the fraction and only cover the case MinSampleShading==1.
Cheers, Jerome
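
To make the 8-sample / 0.5 arithmetic above concrete, here is a small sketch of the minimum number of shaded samples as I read the ARB_sample_shading text, i.e. max(ceil(MIN_SAMPLE_SHADING_VALUE_ARB * samples), 1). The function name min_shaded_samples is made up for illustration, and, as noted above, an implementation is free to shade more samples than this lower bound.

```c
#include <assert.h>

/*
 * Hypothetical helper: lower bound on per-fragment shader invocations
 * for a given sample count and min sample shading fraction in [0, 1].
 * Computes max(ceil(fraction * samples), 1) without needing libm.
 */
static unsigned min_shaded_samples(unsigned samples, float fraction)
{
    assert(samples > 0);
    assert(fraction >= 0.0f && fraction <= 1.0f);

    float x = fraction * (float)samples;
    unsigned n = (unsigned)x;      /* truncate toward zero */

    if ((float)n < x)              /* round up if there was a remainder */
        n++;

    return n > 1 ? n : 1;          /* at least one invocation */
}
```

With samples = 8 and fraction = 0.5 this gives 4, matching the example in the message; with fraction = 1.0 it gives one invocation per sample.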
Re: [Mesa-dev] Four questions about DRI1 drivers
On Thu, 2012-03-01 at 13:56 -0600, Patrick Baggett wrote:
Now I'm curious. Is it the case that every DRI1 driver could be a DRI2 driver with enough effort? Not talking about emulating hardware features. Patrick
DRI2 imposes nothing on hw capabilities. So any hw can do DRI2, even hw without a 3D engine (see virtual gem for instance).
Cheers, Jerome
Re: [Mesa-dev] [PATCH 00/12] R600g: cleanups and rework of queries
On Tue, Feb 21, 2012 at 7:55 PM, Marek Olšák mar...@gmail.com wrote:
Hi everyone, Besides the cleanups, there are fixes for create_context fail paths and rework of queries. The rework is the most important, because it eliminates buffer_map calls (and therefore buffer_wait) in begin_query. There are no piglit regressions on Evergreen. Please review.
Reviewed. Do you test with 2D tiling on or off?
Cheers, Jerome
[Mesa-dev] [PATCH] r600g tiling final
Hi,
So the tiling work is, I believe, done. I have run piglit across a wide range of hw and sw combinations. Bottom line: new mesa on top of either an old kernel or an old ddx won't regress anything. New mesa on top of a proper kernel will get you 2D tiling for textures and anything allocated by mesa, and if you have a proper DDX with the option ColorTiling2D enabled you will also get 2D tiling for the front buffer and the depth/stencil buffer. For libdrm you need the latest master. I will do a libdrm release on Monday. Afterward I will commit the mesa and ddx patches with proper autoconf voodoo to check for the new libdrm.
kernel patches:
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-kms-add-support-for-streamout-v7.patch
http://people.freedesktop.org/~glisse/tiling/0001-drm-radeon-add-support-for-evergreen-ni-tiling-infor.patch
mesa patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600g-add-support-for-common-surface-allocator-for-t.patch
ddx patch:
http://people.freedesktop.org/~glisse/tiling/0001-r600-evergreen-use-common-surface-allocator-for-tili.patch
Links to piglit tests:
http://people.freedesktop.org/~glisse/tiling/cayman/changes.html
http://people.freedesktop.org/~glisse/tiling/cedar/changes.html
http://people.freedesktop.org/~glisse/tiling/redwood/changes.html
http://people.freedesktop.org/~glisse/tiling/juniper/changes.html
http://people.freedesktop.org/~glisse/tiling/fusion/changes.html
http://people.freedesktop.org/~glisse/tiling/rv770/changes.html
http://people.freedesktop.org/~glisse/tiling/rv710/changes.html
http://people.freedesktop.org/~glisse/tiling/rv635/changes.html
http://people.freedesktop.org/~glisse/tiling/rv610/changes.html
- first column (GPU name): unpatched mesa, unpatched ddx, unpatched kernel.
- second column (surf0-ddx0): patched mesa, patched ddx with 2D tiling disabled and the new mesa code path disabled (basically a check that nothing regresses in the old code path).
- third column: patched mesa, unpatched ddx, using the new mesa code path. This checks that mesa on top of old userspace doesn't break anything.
- fourth column: patched mesa, patched ddx, unpatched kernel. This checks that new mesa on top of an old kernel works properly.
- fifth column: everything is patched and 2D tiling is enabled everywhere.
Note that a few tests just randomly switch from pass to fail (fbo-sys-blit*, read-front, ...). I also extensively tested the old userspace on top of the new kernel for evergreen to make sure that the command checker doesn't regress anything. While it does reject some command streams, those were wrong and never completed successfully, so this leads to no regression in piglit (basically the second column). Fusion doesn't have an unpatched kernel run, as things keep locking up for me with the unpatched kernel.
Cheers, Jerome
Re: [Mesa-dev] [PATCH 00/15] R600g cleanup and rework of cache flushing
On Mon, Jan 30, 2012 at 09:23:03PM +0100, Marek Olšák wrote:
Hi everyone, This patch series is a follow-up to the previous one (Remove all uses of the register mask). First, it cleans up some code and merges r600_context into r600_pipe_context. The split of functionality between the two contexts made absolutely no sense. Next, it adds a new mechanism for emitting states. It's largely inspired by r300g and it's really simple, yet robust. (some people should seriously learn what polymorphism means and how it's used to write software before even writing drivers, because I feel like I am the only one making use of it in r600g, which is really a shame /rant) It can be used to schedule *any* commands for execution before the next draw operation, not just register updates. We'll use that more often in the future. For now, it's only used for cache flushes. Finally, this series completely reworks cache flushes. The problem with the old code was that the flags last_flush and binding, which were stored in resource structs, were possible causes of race conditions. Not only does this new code fix that, it also simplifies the whole thing. The flushes are done explicitly when states are changed according to this scheme:
bind_shader -> r600_inval_shader_cache
set_constant_buffer -> r600_inval_shader_cache
bind_vertex_elements -> r600_inval_shader_cache (for the fetch shader)
bind_vertex_buffers -> r600_inval_vertex_cache
bind_sampler_views -> r600_inval_texture_cache
set_framebuffer -> r600_flush_framebuffer
flush -> r600_flush_framebuffer
Besides that, SURFACE_SYNC is called at most once between draw operations and flushes the whole memory range. The inval/flush functions only accumulate the flush flags. The rework also fixes flushes on RV670. The fbo-drawbuffers test no longer causes issues. Flushing CB1_DEST_BASE was not enough, DEST_BASE_0 must be flushed as well. This fixes 21 piglit tests on RV670. The flushing seems to be fixed finally, but the piglit results are not yet up to par with RV730. All this code has been tested on RV670, RV730, and REDWOOD.
It makes no sense and looks over-engineered only if you forget the initial design decision, which was made for a new kernel API that closely matched what r600g had. But I agree that, against the cs ioctl, this design is just painful. Anyway, it looks good from a quick review.
Cheers, Jerome
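
The "accumulate flags on state change, emit at most one sync before the next draw" scheme described above can be sketched roughly as follows. This is not the r600g code: struct flush_atom, the FLUSH_* flags, accumulate_flush and emit_flush_before_draw are all hypothetical names chosen for illustration, and a real driver would write a SURFACE_SYNC packet where this sketch merely returns the flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, one per cache the scheme above mentions. */
enum {
    FLUSH_SHADER_CACHE  = 1 << 0,
    FLUSH_VERTEX_CACHE  = 1 << 1,
    FLUSH_TEXTURE_CACHE = 1 << 2,
    FLUSH_FRAMEBUFFER   = 1 << 3
};

/* Hypothetical "atom": pending flags plus a dirty bit. */
struct flush_atom {
    unsigned flags;       /* accumulated since the last emit */
    bool dirty;           /* true if something must be emitted */
    unsigned emit_count;  /* how many syncs were actually emitted */
};

/* Called from state setters: only records what needs flushing. */
static void accumulate_flush(struct flush_atom *atom, unsigned flag)
{
    assert(atom);
    atom->flags |= flag;
    atom->dirty = true;
}

/* Called once before a draw: emits a single sync for all pending flags
 * and returns them (0 if nothing was pending). */
static unsigned emit_flush_before_draw(struct flush_atom *atom)
{
    assert(atom);
    if (!atom->dirty)
        return 0;

    unsigned pending = atom->flags;
    atom->flags = 0;
    atom->dirty = false;
    atom->emit_count++;
    return pending;
}
```

The point of the design is that binding a shader and then sampler views costs one combined sync rather than two, and that no flush state lives in the shared resource structs.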
Re: [Mesa-dev] [RFC] r600-r800 2D tiling
On Mon, Jan 16, 2012 at 12:08:17PM +, Simon Farnsworth wrote:
(resending due to my inability to work my e-mail client - I neither cc'd Jerome, nor used the correct identity, so the original appears to be held in moderation).
On Thursday 12 January 2012, Jerome Glisse j.gli...@gmail.com wrote:
Hi, I don't cross post as I am pretty sure all interested people are reading this mailing list. Attached are kernel, libdrm, ddx, and mesa/r600g patches to enable 2D tiling on r600 to cayman. I haven't yet done full regression testing, but 2D tiling seems to work ok. I would like to get feedback on 2 things: - the kernel API
I notice that you don't expose all the available Evergreen parameters to user control (TILE_SPLIT_BYTES, NUM_BANKS are both currently fixed by the kernel). Is this deliberate? It looks like it's leftovers from a previous attempt to force Evergreen's flexible 2D tiling to behave like R600's fixed-by-hardware 2D tiling.
I need to add tile split to the kernel API; num banks is not a surface parameter. Well, it is, but it needs to be set to the same value as the global one. I think it might only be useful in the multi-GPU case with different GPUs (but that's just a wild guess).
- using libdrm/radeon as common place for surface allocation. The second question especially impacts the layering/abstraction of gallium between winsys, as it makes the libdrm/radeon_surface API a part of the winsys. The ddx doesn't need as much knowledge as mesa (pretty much the whole mipmap tree is pointless to the ddx). So does anyone have strong feelings about moving the whole mipmap tree computation to this common code?
I'm in favour - it means that all the code relating to the details of how modern Radeons tile surfaces is in one place. I've looked at the API you introduce to handle this, and it should be very easy to port to a non-libdrm platform - the only element of the API that's currently tied to libdrm is radeon_surface_manager_new, so a new platform shouldn't struggle to adapt it.
I am in the process of reworking the API a bit, but it will be very close, and only the surface manager creator will have drm-specific code.
I do have one question; how are you intending to handle passing the tiling parameters from the DDX to Mesa for GLX_EXT_texture_from_pixmap? Right now, it works because the DDX uses the surface manager's defaults for tiling, as does Mesa; I would expect Mesa to read out the parameters as set in the kernel and use those. At a future date, I can envisage the DDX wanting to choose a different tiling layout for DRI2 buffers, or XComposite backing pixmaps (e.g. because someone's benchmarked it and found that choosing something beyond the bare minimum that meets constraints improves performance); it would be a shame if we can't do this because Mesa's not flexible enough.
We don't use dri2 to communicate tiling info; we go through the kernel for that. So the ddx calls the set_tiling ioctl and mesa calls get_tiling. I haven't hooked up the mesa side to extract the various eg values yet; right now it works because both ddx and mesa use the same surface allocator params, so they end up taking the same values for the various eg fields. Again, I am working on this. Hopefully it should be completely done this week.
Cheers, Jerome
Re: [Mesa-dev] [RFC] r600-r800 2D tiling
On Fri, Jan 13, 2012 at 11:59:28AM +0100, Michel Dänzer wrote: On Thu, 2012-01-12 at 14:50 -0500, Jerome Glisse wrote: Attached are kernel, libdrm, ddx, mesa/r600g patches to enable 2D tiling on r600 to cayman. I haven't yet done full regression testing but 2D tiling seems to work ok. I would like to get feedback on 2 things : - the kernel API - using libdrm/radeon as common place for surface allocation I generally like the idea of centralizing this in libdrm_radeon. The second question especially impacts the layering/abstraction between gallium and the winsys, as it makes the libdrm/radeon_surface API a part of the winsys. That's unfortunate, but then again the Radeon Gallium drivers have never been very clean in this regard. I guess the first one to want to use them on a non-DRM platform gets to clean that up. :) To test you need to set ColorTiling2D to true in your xorg.conf; the plan is to get mesa 8.0 and newer with proper support for 2D tiling and, in 1 year, to move the ColorTiling2D default value from false to true. (The assumption is that by then someone with a working ddx would also have a supported mesa.) Sounds good. Note that the Mesa and X driver changes need to either continue building and working with older libdrm_radeon, or bump the libdrm_radeon version requirement in configure.ac. The plan is to release an updated libdrm before committing to mesa, at which point I will try to dust off my configure.ac foo. I updated the patches; they are now at: http://people.freedesktop.org/~glisse/tiling/ For them to work you need the ddx option, and for mesa you need to set R600_TILING=1 R600_SURF=1. I will remove this once I am confident that it works across various GPUs without regression. Cheers, Jerome
Re: [Mesa-dev] [PATCH 0/1] Delete i965g
On Tue, Nov 29, 2011 at 10:12 AM, Jose Fonseca jfons...@vmware.com wrote: The bulk is there but there are a few places missing. I'll update those, do some sanity checks and commit. Jose Is there a good reason to delete i965g? Maybe some people are interested in it. Cheers, Jerome
Re: [Mesa-dev] Why failover module are not used?
On Fri, Sep 23, 2011 at 3:18 AM, jaco...@viatech.com.cn wrote: Hi all, In our mesa code, there is a pipe driver named failover which is not used at all. I think the failover pipe driver is a good solution for hardware without the full capability to support GL 2.0. But why was it discarded? Is it because a fallback solution isn’t needed for almost all hardware, or because there is a critical bug that prevents using it? Any answer will be appreciated. Thanks. Best Regards, Jacob He I think it was decided that it's better not to lie about hw capabilities and to have the hw driver reject unsupported shaders/features. Regards, Jerome
Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp
On Mon, Jun 27, 2011 at 8:38 AM, Roland Scheidegger srol...@vmware.com wrote: On 25.06.2011 00:22, Vadim Girlin wrote: On 06/24/2011 11:38 PM, Jerome Glisse wrote: On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com wrote: Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440 Signed-off-by: Vadim Girlin vadimgir...@gmail.com As discussed previously, there is a better way to handle this. I think the best solution is to always add the instructions and to execute them conditionally thanks to the boolean constant. If this turns out to have too big an impact on the shader, the other solution I see is adding a cf block with those instructions and enabling or disabling that block (cf_nop) and reuploading the shader; that would avoid a rebuild. I know it's not optimal to do a full rebuild, but a rebuild is needed only when the application uses the same shader in different clamping states. It won't be a problem if the application doesn't change clamping state or if it changes the state but uses each shader in one state only. So assuming that a typical app will not use one shader in both states, it shouldn't be a problem. Is this assumption wrong? I'm not really sure because I don't have much experience in this. But if it's wrong then it's probably better for performance to build and cache both versions. I tend to think you're right; apps probably don't want to use the same shader both with and without clamping. Well, if boolean blocks (see the COND field set to SQ_CF_COND_BOOL in SQ_CF_WORD1) are free from a perf point of view, then I think it's best to have one shader with the clamp instructions inside the boolean-enabled block. Only a benchmark can tell. Cheers, Jerome
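For reference, the "build and cache both versions" option mentioned above could look roughly like the sketch below. This is not r600g code: `shader_variants`, `build_variant` and `get_variant` are hypothetical names, and the build step is a stub that only counts how often it runs, standing in for the real translate-and-upload work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one compiled shader binary. */
struct variant {
    bool built;
    bool clamp;   /* true if clamp (MOV_SAT-style) code was appended */
};

/* One shader keeps up to two compiled variants, indexed by clamp state. */
struct shader_variants {
    struct variant v[2];
    unsigned build_count;   /* number of real builds performed */
};

/* Stub for the expensive step (translation + upload in a real driver). */
static void build_variant(struct shader_variants *s, bool clamp)
{
    s->v[clamp].built = true;
    s->v[clamp].clamp = clamp;
    s->build_count++;
}

/* Return the variant matching the current clamp state, building it lazily. */
static const struct variant *get_variant(struct shader_variants *s, bool clamp)
{
    assert(s != NULL);
    if (!s->v[clamp].built)
        build_variant(s, clamp);
    return &s->v[clamp];
}
```

With this layout an application that flips the clamp state back and forth pays for at most two builds per shader, at the cost of keeping both binaries around.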
Re: [Mesa-dev] [PATCH 3/3] r600g: implement fragment and vertex color clamp
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com wrote: Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440 Signed-off-by: Vadim Girlin vadimgir...@gmail.com As discussed previously, there is better to handle this. I think best solution is to always add the instruction and to conditionally execute them thanks to the boolean constant. If this reveal to have a too big impact on shader, other solution i see is adding a cf block with those instructions and to enable or disable that block (cf_nop) and reupload shader that would avoid a rebuild. But as a mean time solution i think this patch is ok Cheers, Jerome --- src/gallium/drivers/r600/evergreen_state.c | 2 + src/gallium/drivers/r600/r600_pipe.c | 2 +- src/gallium/drivers/r600/r600_pipe.h | 7 +++- src/gallium/drivers/r600/r600_shader.c | 52 +++--- src/gallium/drivers/r600/r600_shader.h | 1 + src/gallium/drivers/r600/r600_state.c | 2 + src/gallium/drivers/r600/r600_state_common.c | 30 ++- 7 files changed, 87 insertions(+), 9 deletions(-) diff --git a/src/gallium/drivers/r600/evergreen_state.c b/src/gallium/drivers/r600/evergreen_state.c index f86e4d4..dfe7896 100644 --- a/src/gallium/drivers/r600/evergreen_state.c +++ b/src/gallium/drivers/r600/evergreen_state.c @@ -256,6 +256,8 @@ static void *evergreen_create_rs_state(struct pipe_context *ctx, } rstate = rs-rstate; + rs-clamp_vertex_color = state-clamp_vertex_color; + rs-clamp_fragment_color = state-clamp_fragment_color; rs-flatshade = state-flatshade; rs-sprite_coord_enable = state-sprite_coord_enable; diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c index 38801d6..12599bf 100644 --- a/src/gallium/drivers/r600/r600_pipe.c +++ b/src/gallium/drivers/r600/r600_pipe.c @@ -377,6 +377,7 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_HALF_INTEGER: case PIPE_CAP_SM3: case PIPE_CAP_SEAMLESS_CUBE_MAP: + case 
PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL: return 1; /* Supported except the original R600. */ @@ -392,7 +393,6 @@ static int r600_get_param(struct pipe_screen* pscreen, enum pipe_cap param) /* Unsupported features. */ case PIPE_CAP_STREAM_OUTPUT: case PIPE_CAP_PRIMITIVE_RESTART: - case PIPE_CAP_FRAGMENT_COLOR_CLAMP_CONTROL: case PIPE_CAP_TGSI_INSTANCEID: case PIPE_CAP_TGSI_FS_COORD_ORIGIN_LOWER_LEFT: case PIPE_CAP_TGSI_FS_COORD_PIXEL_CENTER_INTEGER: diff --git a/src/gallium/drivers/r600/r600_pipe.h b/src/gallium/drivers/r600/r600_pipe.h index 63ddd39..dc9aad0 100644 --- a/src/gallium/drivers/r600/r600_pipe.h +++ b/src/gallium/drivers/r600/r600_pipe.h @@ -88,6 +88,8 @@ struct r600_pipe_sampler_view { struct r600_pipe_rasterizer { struct r600_pipe_state rstate; + boolean clamp_vertex_color; + boolean clamp_fragment_color; boolean flatshade; unsigned sprite_coord_enable; float offset_units; @@ -125,6 +127,7 @@ struct r600_pipe_shader { struct r600_bo *bo; struct r600_bo *bo_fetch; struct r600_vertex_element vertex_elements; + struct tgsi_token *tokens; }; struct r600_pipe_sampler_state { @@ -202,6 +205,8 @@ struct r600_pipe_context { struct pipe_query *saved_render_cond; unsigned saved_render_cond_mode; /* shader information */ + boolean clamp_vertex_color; + boolean clamp_fragment_color; boolean spi_dirty; unsigned sprite_coord_enable; boolean flatshade; @@ -265,7 +270,7 @@ void r600_init_query_functions(struct r600_pipe_context *rctx); void r600_init_context_resource_functions(struct r600_pipe_context *r600); /* r600_shader.c */ -int r600_pipe_shader_create(struct pipe_context *ctx, struct r600_pipe_shader *shader, const struct tgsi_token *tokens); +int r600_pipe_shader_create(struct pipe_context *ctx, struct r600_pipe_shader *shader); void r600_pipe_shader_destroy(struct pipe_context *ctx, struct r600_pipe_shader *shader); int r600_find_vs_semantic_index(struct r600_shader *vs, struct r600_shader *ps, int id); diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c index 904cc69..2e5d4a6 100644 --- a/src/gallium/drivers/r600/r600_shader.c +++ b/src/gallium/drivers/r600/r600_shader.c @@ -118,9 +118,9 @@ static
Re: [Mesa-dev] [PATCH 0/3] r600g patches
On Fri, Jun 24, 2011 at 12:29 PM, Vadim Girlin vadimgir...@gmail.com wrote: #1 fixes slots order for x y writes in the LIT implementation. Without this patch fp-lit-mask piglit test fails after patch 3. It seems wrong order causes wrong PV.* values for the next instruction. #2 reduces unneeded calls to r600_spi_update. #3 implements color clamping in shaders by adding MOV_SAT R,R instructions for each color output before export. Shaders are rebuilt when clamping state changes. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=38440 There are no regressions with r600.tests on evergreen with these patches. r600g: LIT: fix xy slots order r600g: optimize spi update r600g: implement fragment and vertex color clamp src/gallium/drivers/r600/evergreen_state.c | 2 + src/gallium/drivers/r600/r600_pipe.c | 2 +- src/gallium/drivers/r600/r600_pipe.h | 8 +++- src/gallium/drivers/r600/r600_shader.c | 74 -- src/gallium/drivers/r600/r600_shader.h | 1 + src/gallium/drivers/r600/r600_state.c | 2 + src/gallium/drivers/r600/r600_state_common.c | 40 -- 7 files changed, 106 insertions(+), 23 deletions(-) Pushed the series thanks Cheers, Jerome ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings
On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher alexdeuc...@gmail.com wrote: On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger srol...@vmware.com wrote: On 21.06.2011 20:59, Sven Arvidsson wrote: This change broke a whole lot of stuff on r600g, for example Unigine Heaven: shader uses too many varying components (36 > 32) It looks like the r600g driver claims to only support 10 varyings, which the state tracker reduces to 8 (as it subtracts the supposedly included color varyings). At first sight I can't quite see why it's limited to 10, all r600 chips should be able to handle 32 (dx10 requirement) but of course the driver might not (mesa itself is limited to 16 it seems). If it worked just fine before that suggests it indeed works just fine with more... Someone more familiar with the driver should be able to tell if it's safe to increase the limit to 32 (the state tracker will cap it to 16). The hardware definitely supports 32. I'm not sure why it's currently set to 10; I don't see any limitations in the code off hand. Alex IIRC it's just cut and paste from r300g; it can safely be bumped. Cheers, Jerome
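The numbers in this report can be checked with a few lines of arithmetic. The sketch below is not the state tracker's code; `usable_varyings` and `fits` are hypothetical helpers, and it assumes that two of the driver-reported inputs are set aside for colours and that every varying is a vec4.

```c
#include <assert.h>

/* Assumption: two of the inputs the driver reports are reserved for colours. */
#define COLOR_INPUTS 2
/* Each varying is a vec4, i.e. four components. */
#define COMPONENTS_PER_VARYING 4

/* Varyings left for the shader after the subtraction and the core Mesa cap. */
static unsigned usable_varyings(unsigned driver_inputs, unsigned mesa_cap)
{
    assert(driver_inputs >= COLOR_INPUTS);
    unsigned n = driver_inputs - COLOR_INPUTS;
    return n < mesa_cap ? n : mesa_cap;
}

/* Non-zero if a shader using `used` varying components passes the check. */
static int fits(unsigned used, unsigned driver_inputs, unsigned mesa_cap)
{
    return used <= usable_varyings(driver_inputs, mesa_cap) * COMPONENTS_PER_VARYING;
}
```

With a driver value of 10 this gives 8 varyings, or 32 components, so a shader using 36 components is rejected; with a driver value of 32 the cap of 16 applies and the same shader fits.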
Re: [Mesa-dev] [PATCH] linker: Reject shaders that use too many varyings
On Thu, Jun 23, 2011 at 10:38 AM, Roland Scheidegger srol...@vmware.com wrote: On 23.06.2011 16:09, Jerome Glisse wrote: On Wed, Jun 22, 2011 at 10:49 PM, Alex Deucher alexdeuc...@gmail.com wrote: On Wed, Jun 22, 2011 at 10:12 PM, Roland Scheidegger srol...@vmware.com wrote: On 21.06.2011 20:59, Sven Arvidsson wrote: This change broke a whole lot of stuff on r600g, for example Unigine Heaven: shader uses too many varying components (36 > 32) It looks like the r600g driver claims to only support 10 varyings, which the state tracker reduces to 8 (as it subtracts the supposedly included color varyings). At first sight I can't quite see why it's limited to 10, all r600 chips should be able to handle 32 (dx10 requirement) but of course the driver might not (mesa itself is limited to 16 it seems). If it worked just fine before that suggests it indeed works just fine with more... Someone more familiar with the driver should be able to tell if it's safe to increase the limit to 32 (the state tracker will cap it to 16). The hardware definitely supports 32. I'm not sure why it's currently set to 10; I don't see any limitations in the code off hand. Alex IIRC it's just cut and paste from r300g; it can safely be bumped. OK, Marek bumped it to 34. That seems to be lying too; I don't think it could handle 32 generic inputs and 2 colors. But there's no way to really express that right now. Roland Also, iirc, r6xx/r7xx needs special code for handling more than 16 varyings; I can't remember if we had proper code for that. Cheers, Jerome
Re: [Mesa-dev] Status of the GLSL-TGSI translator
On Thu, Jun 16, 2011 at 10:08 AM, Brian Paul bri...@vmware.com wrote: On 06/15/2011 03:38 PM, Bryan Cain wrote: My work on the GLSL IR to TGSI translator I announced on the list this April is now at the point where I think it is ready to be merged into Mesa. It is stable and doesn't regress any piglit tests on softpipe or nv50. It adds native integer support as required by GLSL 1.30, although it is currently disabled for all drivers since GLSL 1.30 support is not complete yet and most Gallium drivers haven't implemented the TGSI integer opcodes. (This would be a good time for Gallium driver developers to add support for TGSI's integer opcodes, which are currently only implemented in softpipe.) Developing this necessitated significant changes elsewhere in Mesa, and some small changes in Gallium. This means that some of the commits in my branch probably need to be reviewed by the developers of those components. If I had commit access to Mesa, I would create a branch for this work in the main Mesa repository. But since I am still waiting on my freedesktop.org account to be created, I have pushed the latest version to the glsl-to-tgsi branch of my personal Mesa repository on GitHub: Git clone URL: git://github.com/Plombo/mesa.git Web interface for viewing commits: https://github.com/Plombo/mesa/commits/glsl-to-tgsi Hopefully my freedesktop.org account will be created soon (I have already had my account request approved), so that I can push this to a branch in the central repository. Looks like nice work, Bryan. Just a few minor questions/comments for now: 1. The st_fragment/vertex/geometry_program structs now have a glsl_to_tgsi field. I did a grep, but I couldn't find where that field is assigned. Can you clue me in? 2. The above mentioned program structs contains an old Mesa instruction program AND/OR(?) a GLSL IR. Do both types of representations co-exist sometimes? Perhaps you could update the comments on those structs to explain that. 3. 
Kind of a follow-on: for glDrawPixels and glBitmap we take the original program code (in Mesa form) and prepend extra instructions for fetching the fragment color or doing the fragment kill. Do we always have the Mesa instructions for this? It seems we don't normally want to generate Mesa instructions all the time but we still need them sometimes. I must be missing something, but why do we need to take the original program for those? Cheers, Jerome
Re: [Mesa-dev] GLSL IR int-to-float pass
On Tue, May 24, 2011 at 8:09 PM, Bryan Cain bryanca...@gmail.com wrote: Hi, In the past few days, I've been working on native integer support in my GLSL to TGSI translator. Something that's come to my attention is that supporting Gallium targets with and without integer support using a single GLSL IR backend will more or less require a GLSL IR pass to convert int, uint, and possibly bool variables and operations to floats. Currently, this is done directly in the backend, in both ir_to_mesa and st_glsl_to_tgsi. However, the mod_to_fract and div_to_mul_rcp lowering passes for GLSL IR need to know whether to lower integer modulus and division operations to their corresponding float operations. (They both do this in Mesa master without asking the backend, but that will be easy to change later.) So a GLSL IR pass will be needed to do the type lowering. Such a pass would also have the advantage of less duplicated functionality between backends, since ir_to_mesa could also take advantage of the pass to eliminate some code. I'm more than willing to try writing such a pass myself if no one else is interested in doing it, but I figure I should make sure there are no objections before starting on it. Bryan TGSI needs to grow type support (int, uint and possibly int8,16,32..) Jerome ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
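As an illustration of the two lowering passes named above, here is what they compute when written out on plain C floats. This is only a sketch of the arithmetic, not the GLSL IR passes themselves; `floor_f` is a small helper added so the example needs no math library, and it assumes inputs that fit in a `long`.

```c
#include <assert.h>

/* Minimal floor() for values that fit in a long (assumption of this sketch). */
static float floor_f(float x)
{
    float t = (float)(long)x;      /* truncates toward zero */
    return (t > x) ? t - 1.0f : t; /* step down for negative non-integers */
}

/* fract(x) = x - floor(x), as in GLSL. */
static float fract(float x)
{
    return x - floor_f(x);
}

/* div_to_mul_rcp: a / b rewritten as a * (1 / b). */
static float div_to_mul_rcp(float a, float b)
{
    assert(b != 0.0f);
    return a * (1.0f / b);
}

/* mod_to_fract: mod(x, y) rewritten as y * fract(x / y). */
static float mod_to_fract(float x, float y)
{
    assert(y != 0.0f);
    return y * fract(x / y);
}
```

On a target without native integers, an integer `a % b` would first be converted to floats and then go through the second form, which is why the passes need to know whether the backend wants integer operations left alone.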
Re: [Mesa-dev] GLSL IR int-to-float pass
On Wed, May 25, 2011 at 9:41 AM, Keith Whitwell kei...@vmware.com wrote: On Wed, 2011-05-25 at 09:32 -0400, Jerome Glisse wrote: On Tue, May 24, 2011 at 8:09 PM, Bryan Cain bryanca...@gmail.com wrote: Hi, In the past few days, I've been working on native integer support in my GLSL to TGSI translator. Something that's come to my attention is that supporting Gallium targets with and without integer support using a single GLSL IR backend will more or less require a GLSL IR pass to convert int, uint, and possibly bool variables and operations to floats. Currently, this is done directly in the backend, in both ir_to_mesa and st_glsl_to_tgsi. However, the mod_to_fract and div_to_mul_rcp lowering passes for GLSL IR need to know whether to lower integer modulus and division operations to their corresponding float operations. (They both do this in Mesa master without asking the backend, but that will be easy to change later.) So a GLSL IR pass will be needed to do the type lowering. Such a pass would also have the advantage of less duplicated functionality between backends, since ir_to_mesa could also take advantage of the pass to eliminate some code. I'm more than willing to try writing such a pass myself if no one else is interested in doing it, but I figure I should make sure there are no objections before starting on it. Bryan TGSI needs to grow type support (int, uint and possibly int8,16,32..) Or go away entirely... I'm not trying to impose a direction on this, but it seems like the GLSL IR-TGSI converter (once running) could be pushed down into the individual drivers and GLSL IR or a close cousin of it could become the gallium-level interface. Then individual drivers could be modified to consume IR directly. 
Keith I am also in favor of getting rid of tgsi. I would prefer having the driver call back into mesa to set the information mesa needs from the shader; for instance, that would allow drivers to pick where they put attributes (might be a huge win on hw like r6xx or newer), and a few other things like that. Cheers, Jerome
Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.
On Tue, May 17, 2011 at 11:22 PM, Eric Anholt e...@anholt.net wrote: One of the pain points of working on compiler optimizations has been justifying them -- sometimes I come up with something I think is useful and spend a day or two on it, but the value doesn't show up as fps in the application that suggested the optimization to me. Then I wonder if this transformation of the code is paying off in general, and thus if I should push it. If I don't push it, I end up bringing that patch out on every application I look at that it could affect, to see if now I finally have justification to get it out of a private branch. At a conference this week, we heard about how another team is are using a database of (assembly) shaders, which they run through their compiler and count resulting instructions for testing purposes. This sounded like a fun idea, so I threw one together. Patch #1 is good in general (hey, link errors, finally!), but also means that a quick hack to glslparsertest makes it link a passing compile shader and therefore generate assembly that gets dumped under INTEL_DEBUG=wm. Patch #2 I used for automatic scraping of shaders in every application I could find on my system at the time. The open-source ones I pushed to: http://cgit.freedesktop.org/~anholt/shader-db And finally, patch #3 is something I built before but couldn't really justify until now. However, given that it reduced fragment shader instructions 0.3% across 831 shaders (affecting 52 of them including yofrankie, warsow, norsetto, and gstreamer) and didn't increase instructions anywhere, I'm a lot happier now. Hopefully we hook up EXT_timer_query to apitrace soon so I can do more targeted optimizations and need this less :) In the meantime, I hope this can prove useful to others -- if you want to contribute appropriately-licensed shaders to the database so we track those, or if you want to make the analysis work on your hardware backend, feel free. 
I have been thinking of doing something slightly different. Sadly, instruction count is not necessarily the best metric for evaluating the optimizations performed by a shader compiler. Hiding the texture fetch latency of a shader can improve performance a lot more than saving 2 instructions. So my idea was to write a GL app that renders the same shader into a framebuffer thousands of times. The point of using an fbo is to keep things like swapbuffers or the like from playing a role, since we are solely interested in shader performance. Also use an fbo as big as possible so the fragment shader has a lot of pixels to go through, and I believe in disabling things like blending, zbuffer ... so that no other part of the pipeline impacts the shader in any way. Other things might play a role; for instance, if we provide a small dummy texture we might just hide the gain a texture fetch optimization would give, as the GPU might be able to keep the texture in cache and thus have very low latency on each texture fetch. The same applies if we use the same texture for all units: the texture cache might hide latency that a real application would otherwise face. So I think we need big enough dummy textures, like 512*512, and a different one for each unit; we should also try to provide random u,v for the texture fetches so that the texture cache doesn't hide too much of the latency. I am sure I am missing other factors that we should try to minimize while testing shader performance. I don't think such a thing is a good fit for piglit, but it could still be added as a subtool (so that we don't add yet another repository). Thanks a lot for extracting all those shaders; I am sure we can get some people to write us shaders with somewhat advanced math under an acceptable license. Cheers, Jerome
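The dummy-texture part of that idea can be sketched without any GL calls. The code below only prepares the data: one pseudo-random RGBA8 image per texture unit, plus random texture coordinates. All names (`lcg_next`, `fill_dummy_texture`, `random_coord`, `TEX_SIZE`) are made up for this sketch, and the generator is a plain LCG chosen for brevity, not for statistical quality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_SIZE 512   /* 512x512 texels, the size suggested above */

/* Small linear congruential generator; good enough for filler data. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill one RGBA8 texture with pseudo-random texels. Seeding by texture
 * unit gives every unit different contents. */
static void fill_dummy_texture(uint8_t *texels, size_t n_texels, unsigned unit)
{
    assert(texels != NULL);
    uint32_t state = 0x9e3779b9u ^ (unit * 0x85ebca6bu);
    for (size_t i = 0; i < n_texels * 4; i++)
        texels[i] = (uint8_t)(lcg_next(&state) >> 24);
}

/* Pseudo-random texture coordinate in [0, 1). */
static float random_coord(uint32_t *state)
{
    return (float)(lcg_next(state) >> 8) / 16777216.0f;
}
```

A benchmark would call `fill_dummy_texture(buf, TEX_SIZE * TEX_SIZE, unit)` once per unit, upload each buffer as its own texture, and feed `random_coord` values in as u,v so that consecutive fetches do not all land in the texture cache.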
Re: [Mesa-dev] shader-db, and justifying an i965 compiler optimization.
On Wed, May 18, 2011 at 3:16 PM, Eric Anholt e...@anholt.net wrote: On Wed, 18 May 2011 11:05:39 -0400, Jerome Glisse j.gli...@gmail.com wrote: On Tue, May 17, 2011 at 11:22 PM, Eric Anholt e...@anholt.net wrote: One of the pain points of working on compiler optimizations has been justifying them -- sometimes I come up with something I think is useful and spend a day or two on it, but the value doesn't show up as fps in the application that suggested the optimization to me. Then I wonder if this transformation of the code is paying off in general, and thus if I should push it. If I don't push it, I end up bringing that patch out on every application I look at that it could affect, to see if now I finally have justification to get it out of a private branch. At a conference this week, we heard about how another team is are using a database of (assembly) shaders, which they run through their compiler and count resulting instructions for testing purposes. This sounded like a fun idea, so I threw one together. Patch #1 is good in general (hey, link errors, finally!), but also means that a quick hack to glslparsertest makes it link a passing compile shader and therefore generate assembly that gets dumped under INTEL_DEBUG=wm. Patch #2 I used for automatic scraping of shaders in every application I could find on my system at the time. The open-source ones I pushed to: http://cgit.freedesktop.org/~anholt/shader-db And finally, patch #3 is something I built before but couldn't really justify until now. However, given that it reduced fragment shader instructions 0.3% across 831 shaders (affecting 52 of them including yofrankie, warsow, norsetto, and gstreamer) and didn't increase instructions anywhere, I'm a lot happier now. 
Hopefully we hook up EXT_timer_query to apitrace soon so I can do more targeted optimizations and need this less :) In the meantime, I hope this can prove useful to others -- if you want to contribute appropriately-licensed shaders to the database so we track those, or if you want to make the analysis work on your hardware backend, feel free. I have been thinking of doing something slightly different. Sadly, instruction count is not necessarily the best metric for evaluating the optimizations performed by a shader compiler. Hiding the texture fetch latency of a shader can improve performance a lot more than saving 2 instructions. So my idea was to write a GL app that renders the same shader into a framebuffer thousands of times. The point of using an fbo is to keep things like swapbuffers or the like from playing a role, since we are solely interested in shader performance. Also use an fbo as big as possible so the fragment shader has a lot of pixels to go through, and I believe in disabling things like blending, zbuffer ... so that no other part of the pipeline impacts the shader in any way. You might take a look at mesa-demos/src/perf for that. I haven't had success using them for performance work due to the noisiness of the results. More generally, imo, the problem with that plan is you have to build the shaders yourself and justify to yourself why that shader you wrote is representative, and you spend all your time on building the tests when you just wanted to know if an instruction-reduction optimization did anything. shader-db took me one evening to build and collect for all applications I had (I've got a personal branch for all the closed-source stuff :/ ) A shader takes a bunch of inputs, so for each shader collected the issue is to provide proper inputs. Textures could be dummy textures unless the shader has some dependency on the texture data (for example, if the fetched texture data determines the number of iterations or is used to kill a fragment, ...).
Well, it's all about going through the known shaders and building a reasonable set of inputs for each of them. It's time consuming, but I believe it brings a lot more from a testing point of view. For actual performance testing of apps without idsoftware-style timedemos, I'm way more excited by the potential of using apitrace with EXT_timer_query to decide which shaders I should be analyzing, and then I'd know afterward whether I impacted a real application by replaying the trace. That is, assuming I didn't increase CPU costs in the process, which is where an apitrace replay would not be representative. Our perspective is: if we are driving the hardware anywhere below what is possible, that is a bug that we should fix. Analyzing the costs of instructions, scheduling impacts, CPU overhead impacts, etc. may be out of scope for shader-db, but does make some types of analysis quick and easy (test all shaders you have ever seen of in a couple minutes). I agree that shader-db provides a useful tool; I am just convinced that the number of instructions in a complex shader is a bad metric, especially when considering things like r6xx and newer classes of hw where texture fetch and instruction can run
Re: [Mesa-dev] [PATCH] r600g: add support for anisotropic filtering
Please resend by attaching the patch not pasting it On Fri, May 6, 2011 at 4:53 PM, Carl-Philip Haensch carl-philip.haen...@mailbox.tu-dresden.de wrote: From b5ad4e6fb399203afcfe2a5ccb35bb8ccad28b65 Mon Sep 17 00:00:00 2001 From: Carl-Philip Haensch carli@carli-laptop.(none) Date: Fri, 6 May 2011 22:48:08 +0200 Subject: [PATCH] r600g: add support for anisotropic filtering --- src/gallium/drivers/r600/r600_state.c | 20 +--- src/gallium/drivers/r600/r600d.h | 9 + 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/src/gallium/drivers/r600/r600_state.c b/src/gallium/drivers/r600/r600_state.c index 3f979cf..aeffb9e 100644 --- a/src/gallium/drivers/r600/r600_state.c +++ b/src/gallium/drivers/r600/r600_state.c @@ -364,6 +364,17 @@ static void *r600_create_rs_state(struct pipe_context *ctx, return rstate; } + + +static inline unsigned r600_tex_aniso_filter(unsigned filter) +{ + if (filter = 1) return 0; + if (filter = 2) return 1; + if (filter = 4) return 2; + if (filter = 8) return 3; + /* else */ return 4; +} + static void *r600_create_sampler_state(struct pipe_context *ctx, const struct pipe_sampler_state *state) { @@ -376,13 +387,15 @@ static void *r600_create_sampler_state(struct pipe_context *ctx, rstate-id = R600_PIPE_STATE_SAMPLER; util_pack_color(state-border_color, PIPE_FORMAT_B8G8R8A8_UNORM, uc); + unsigned aniso_flag_offset = state-max_anisotropy 1 ? 
4 : 0; r600_pipe_state_add_reg(rstate, R_03C000_SQ_TEX_SAMPLER_WORD0_0, S_03C000_CLAMP_X(r600_tex_wrap(state-wrap_s)) | S_03C000_CLAMP_Y(r600_tex_wrap(state-wrap_t)) | S_03C000_CLAMP_Z(r600_tex_wrap(state-wrap_r)) | - S_03C000_XY_MAG_FILTER(r600_tex_filter(state-mag_img_filter)) | - S_03C000_XY_MIN_FILTER(r600_tex_filter(state-min_img_filter)) | + S_03C000_XY_MAG_FILTER(r600_tex_filter(state-mag_img_filter) | aniso_flag_offset) | + S_03C000_XY_MIN_FILTER(r600_tex_filter(state-min_img_filter) | aniso_flag_offset) | S_03C000_MIP_FILTER(r600_tex_mipfilter(state-min_mip_filter)) | + S_03C000_ANISO(r600_tex_aniso_filter(state-max_anisotropy)) | S_03C000_DEPTH_COMPARE_FUNCTION(r600_tex_compare(state-compare_func)) | S_03C000_BORDER_COLOR_TYPE(uc.ui ? V_03C000_SQ_TEX_BORDER_COLOR_REGISTER : 0), 0x, NULL); r600_pipe_state_add_reg(rstate, R_03C004_SQ_TEX_SAMPLER_WORD1_0, @@ -492,7 +505,8 @@ static struct pipe_sampler_view *r600_create_sampler_view(struct pipe_context *c S_038014_BASE_ARRAY(state-u.tex.first_layer) | S_038014_LAST_ARRAY(state-u.tex.last_layer), 0x, NULL); r600_pipe_state_add_reg(rstate, R_038018_RESOURCE0_WORD6, - S_038018_TYPE(V_038010_SQ_TEX_VTX_VALID_TEXTURE), 0x, NULL); + S_038018_TYPE(V_038010_SQ_TEX_VTX_VALID_TEXTURE) | + S_038018_ANISO(4 /* max 16 samples */), 0x, NULL); return resource-base; } diff --git a/src/gallium/drivers/r600/r600d.h b/src/gallium/drivers/r600/r600d.h index 8296b52..c997462 100644 --- a/src/gallium/drivers/r600/r600d.h +++ b/src/gallium/drivers/r600/r600d.h @@ -1012,6 +1012,9 @@ #define S_038018_MPEG_CLAMP(x) (((x) 0x3) 0) #define G_038018_MPEG_CLAMP(x) (((x) 0) 0x3) #define C_038018_MPEG_CLAMP 0xFFFC +#define S_038018_ANISO(x) (((x) 0x7) 2) +#define G_038018_ANISO(x) (((x) 2) 0x7) +#define C_038018_ANISO 0xFFE3 #define S_038018_PERF_MODULATION(x) (((x) 0x7) 5) #define G_038018_PERF_MODULATION(x) (((x) 5) 0x7) #define C_038018_PERF_MODULATION 0xFF1F @@ -1090,6 +1093,9 @@ #define S_03C000_MIP_FILTER(x) (((x) 0x3) 17) #define 
G_03C000_MIP_FILTER(x) (((x) 17) 0x3) #define C_03C000_MIP_FILTER 0xFFF9 +#define S_03C000_ANISO(x) (((x) 0x7) 19) +#define G_03C000_ANISO(x) (((x) 19) 0x7) +#define C_03C000_ANISO 0xFFB7 #define S_03C000_BORDER_COLOR_TYPE(x) (((x) 0x3) 22) #define G_03C000_BORDER_COLOR_TYPE(x) (((x) 22) 0x3) #define C_03C000_BORDER_COLOR_TYPE 0xFF3F @@ -1152,6 +1158,9 @@ #define S_03C008_PERF_Z(x) (((x) 0x3) 18) #define G_03C008_PERF_Z(x) (((x) 18) 0x3) #define C_03C008_PERF_Z 0xFFF3 +#define
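The pasted patch above lost its comparison, shift and mask operators in the archive. As a reading aid, here is a sketch of what the new helper and the sampler ANISO field macros most plausibly look like. It assumes the stripped comparisons were `<=` and that the macros follow the usual mask-then-shift pattern; only the operands (0x7 and 19) are taken from the surviving text.

```c
#include <assert.h>

/* Map max_anisotropy (1, 2, 4, 8, 16) to the 3-bit hardware field.
 * Assumption: the comparisons lost from the archived patch were "<=". */
static unsigned r600_tex_aniso_filter(unsigned filter)
{
    if (filter <= 1) return 0;
    if (filter <= 2) return 1;
    if (filter <= 4) return 2;
    if (filter <= 8) return 3;
    /* else */       return 4;
}

/* Set/get helpers for the sampler word's ANISO field. The "& 0x7" and the
 * shifts by 19 are assumed from the operands that survived in the archive. */
#define S_03C000_ANISO(x) (((x) & 0x7) << 19)
#define G_03C000_ANISO(x) (((x) >> 19) & 0x7)
```

Under the same assumption, the `aniso_flag_offset` line in the patch would read `state->max_anisotropy > 1 ? 4 : 0`, i.e. the filter values are offset by 4 whenever anisotropic filtering is requested.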
Re: [Mesa-dev] KWin and Mesa
On Wed, Apr 20, 2011 at 8:01 AM, Martin Gräßlin mgraess...@kde.org wrote:
> On Wed, 20 Apr 2011 04:32:25 +0200, Henri Verbeet hverb...@gmail.com wrote:
>> On 19 April 2011 16:52, Martin Gräßlin mgraess...@kde.org wrote:
>>> Hi Mesa-devs,
>>>
>>> yesterday I published a rant about Mesa breaking KWin, and given some comments on the Phoronix Forums it seems there is a wish for more communication between our development groups, so I want to start it. Please note that I am not subscribed to this mailing list, so please keep me in CC (I might not be able to reply this week at all). It is my wish to never have to rant about the state of Linux drivers any more and to never have to see Mesa breaking KWin again.
>>
>> I think there are a couple of points here, some of them already made by others. Note that the following is mostly just how I personally see things, not necessarily what anyone else thinks.
>
> Thanks for your mail. This is really constructive and a reply of the kind I hoped to receive. A good starting point to fix the mess we are currently in :-)
>
>> First, there's the specific issue your blog post talks about. While I understand the issue, and can sympathize somewhat, I essentially think you're just wrong there. (Yeah, I can be direct too.) It's perhaps unfortunate that this change happened on a minor release, but the basic issues are that blacklisting / whitelisting drivers is just a bad idea, and you can't depend on renderer strings being stable. If you do it anyway, it's going to break, you get to keep all the pieces, and you can't blame the drivers.
>
> Actually I agree with you and all the others who wrote it: it is a hack and it should not be there. It was added to make KWin at least work around the 4.5 release. As a matter of fact, and the question might sound stupid: where do I find information on the additional API Mesa provides, so that we do not have to parse the renderer/version string? In the responses to the blog post I was told that we should use DRI2QueryVersion. That was the first time I heard this thing existed. Where is it documented? How can we find out about it? I have seriously never heard or read about it in any documentation I have read so far.
>
>> In the more general case, I think hacking around driver bugs is about the worst way to deal with driver bugs in GL applications. In the best case you're just removing an incentive to fix the bug, but it's more likely you just end up creating fragile code or depending on the bug somehow.
>
> The problem is that at the time we release, it has to work. Our users do not care whether it is the driver or not. It just has to work. A big problem in that regard is, as you noticed yourself, the distributions. They do not ship updates to the drivers, so we need to make it work with the driver version out there and not with the next bug fix release. Our work would be much easier if we could just tell the users to update their drivers ;-)

Your issue is right there. gnome-shell has been successful in dealing with that because they target a particular Mesa version and set a lower bar for the GL features they need. Your issue is that you want to enable features that use GL functionality too advanced for the open source drivers; GLSL wasn't that good before Mesa 7.7 (or even 7.8, I can't remember). What you should do is decide what the lowest Mesa version you are ready to support is, and then use that to decide which GL features you can safely use. If you want to support Debian, that would more than likely mean dropping GLSL. Trying to enable features one by one is a really bad idea; again, I believe gnome-shell took the right approach here.

Cheers,
Jerome
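A minimal sketch of the approach Jerome describes above: pick the lowest Mesa version you are willing to support, then derive the default feature set from that single decision instead of enabling features per driver. The flag names, the helper functions, and the 7.8 cutoff for GLSL are all assumptions made for illustration (the thread itself only says GLSL "wasn't that good before mesa 7.7 (or even 7.8)"); none of this is KWin or gnome-shell code.

```c
#include <stdbool.h>

/* Hypothetical feature flags for a compositor; the names are illustrative. */
enum {
    FEATURE_ARB_SHADERS = 1 << 0,
    FEATURE_GLSL        = 1 << 1
};

/* True if version major.minor is at least req_major.req_minor. */
static bool version_at_least(int major, int minor, int req_major, int req_minor)
{
    if (major != req_major)
        return major > req_major;
    return minor >= req_minor;
}

/*
 * Default feature set for the lowest Mesa version the application chooses to
 * support. The baseline is a project decision made once, not something
 * detected from renderer strings at run time.
 * Assumption: ARB shaders are treated as safe on every baseline, and GLSL is
 * only enabled by default from 7.8 onwards.
 */
static unsigned default_features(int baseline_major, int baseline_minor)
{
    unsigned features = FEATURE_ARB_SHADERS;

    if (version_at_least(baseline_major, baseline_minor, 7, 8))
        features |= FEATURE_GLSL;
    return features;
}
```

With a Debian-style baseline of 7.7, `default_features(7, 7)` leaves GLSL off by default, which matches the "dropping glsl" remark in the mail.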
Re: [Mesa-dev] KWin and Mesa
On Wed, Apr 20, 2011 at 9:43 AM, Martin Gräßlin mgraess...@kde.org wrote:
> On Wed, 20 Apr 2011 09:34:01 -0400, Jerome Glisse j.gli...@gmail.com wrote:
>> Your issue is right there. gnome-shell has been successful in dealing with that because they target a particular Mesa version and set a lower bar for the GL features they need. Your issue is that you want to enable features that use GL functionality too advanced for the open source drivers; GLSL wasn't that good before Mesa 7.7 (or even 7.8, I can't remember). What you should do is decide what the lowest Mesa version you are ready to support is, and then use that to decide which GL features you can safely use. If you want to support Debian, that would more than likely mean dropping GLSL. Trying to enable features one by one is a really bad idea; again, I believe gnome-shell took the right approach here.
>
> GNOME Shell was in a much better situation, as they were able to develop against future releases of Mesa. Btw, we do not depend on GLSL. Our important GLSL shaders are also reimplemented with an ARB shader fallback. It should all fall back without problems nowadays. If the driver supports GLSL properly, it gets used; if it supports only ARB shaders, those are used; if neither is supported, the feature gets disabled. If we know the hardware has limitations, we do not use the features it does not support. So we do what you actually ask us to do. E.g. with Debian and Mesa 7.7 it works totally fine with the ARB fallbacks (yeah I tested that ;-)

Then why do you need the hack? To me it seems you are trying to enable features only on some specific driver versions, and that's wrong. My point is: define which Mesa version you want, deduce from that the feature set that is safe to use, and from that define the default features. Absolutely no hack in the code is needed here, and you can still provide an option for people to enable new features if they want to try them with their driver.

Cheers,
Jerome
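The two positions in this exchange can be combined in one small sketch: the fallback chain Martin describes (GLSL, then ARB shaders, then disabled) capped by a default ceiling derived from the chosen Mesa baseline, with a user option to lift the cap as Jerome suggests. The enum, the function name, and the boolean capability inputs are hypothetical; how the capabilities are actually obtained is deliberately left out.

```c
#include <stdbool.h>

/* Shader paths ordered from least to most demanding (illustrative names). */
enum shader_path {
    SHADER_PATH_NONE = 0,
    SHADER_PATH_ARB  = 1,
    SHADER_PATH_GLSL = 2
};

/*
 * Pick the best shader path the driver supports without exceeding the
 * ceiling. The default ceiling comes from the project's Mesa baseline;
 * user_opt_in models an explicit setting that lets users try newer
 * features on their own driver.
 */
static enum shader_path pick_shader_path(bool driver_has_glsl,
                                         bool driver_has_arb,
                                         enum shader_path default_ceiling,
                                         bool user_opt_in)
{
    enum shader_path ceiling = user_opt_in ? SHADER_PATH_GLSL : default_ceiling;

    if (driver_has_glsl && ceiling >= SHADER_PATH_GLSL)
        return SHADER_PATH_GLSL;
    if (driver_has_arb && ceiling >= SHADER_PATH_ARB)
        return SHADER_PATH_ARB;
    return SHADER_PATH_NONE; /* neither path is usable: disable the effect */
}
```

The point of the design is that no driver name or version string appears anywhere: the only inputs are what the driver reports as supported, one project-wide default, and one user setting.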
Re: [Mesa-dev] KWin and Mesa
On Tue, Apr 19, 2011 at 10:52 AM, Martin Gräßlin mgraess...@kde.org wrote:
> Hi Mesa-devs,
>
> yesterday I published a rant about Mesa breaking KWin, and given some comments on the Phoronix Forums it seems there is a wish for more communication between our development groups, so I want to start it. Please note that I am not subscribed to this mailing list, so please keep me in CC (I might not be able to reply this week at all). It is my wish to never have to rant about the state of Linux drivers any more and to never have to see Mesa breaking KWin again.
>
> First of all I want to give a little bit of personal history to help you understand why I have not contacted you so far and have up to now written two rants about Mesa breaking KWin. Let's go back to 2010 and the 4.5 release. In March 2010 I finished my Master's thesis and in April 2010 I started my first job. KDE had the feature freeze for the 4.5 release around the end of April, beginning of May. All new functionality, including the Blur and Lanczos filters, was implemented at that time. Given the change in my life due to the end of my studies, I did not contribute much to the release.
>
> At the time when we implemented the features, the current Mesa version was 7.7. Version 7.8 was under development and, when it got released, was marked as a development version. I read this information and thought "ok, we don't have to deal with 7.8 - it is development". At that time I had a notebook with NVIDIA graphics and an old system with a rather modern Ati card, whose X server crashed if I tried to use the radeon driver. I had no chance to test Mesa drivers at that time!
>
> In the time leading up to the 4.5 release KWin had no active maintainer. Lubos had been inactive for quite some time and made me maintainer in November. At that time I still considered Lubos to be the maintainer and to be responsible for decisions on whether to ship the new features or not. I considered myself responsible only for my own code (which did not cause any problems in that release). I was also running the stable version of KDE (4.4) at that time and the development version only for testing.
>
> During the beta phase we realized that we had a problem. Mesa 7.8.2 was marked as stable (which I did not expect, given that 7.8.0 and 7.8.1 were unstable) and distributions started to include it. Users were complaining about broken features, mostly concerning the blur and lanczos filters and mostly with Ati and Intel drivers. Nobody in our development team had an Intel system at that time. I had had access to a system before through a friend of mine, but unfortunately it broke down at exactly that important time. Later on my friend got a new Intel powered device but ran Debian Testing on it, which, with Mesa 7.7, did not show any of the problems the users reported. Concerning Ati, I knew that Fredrik had been in contact with Mesa developers and that all the new functionality had been implemented on his Ati systems. So we knew that the functionality worked on at least some systems.
>
> With the looming release and more and more obvious problems, we faced two possible solutions: remove the code completely (disabling it by default would not have solved it) or try to get it working somehow. I did not see any reason why we should punish the users of working drivers (e.g. NVIDIA) because other drivers did not work. After all, the new functionality was an important feature for the user experience we provide, and our designers and the Plasma team were demanding it. So our only option was to make it work.
>
> Now you can imagine how difficult it is to work around bugs in hardware you do not have. My solution was to implement a black list and to crowdsource its creation to all our users. Another of the changes was to move the test for whether to use direct rendering into an external application (due to drivers crashing when trying to create a GLContext), and that is where the hack that has now backfired was introduced.
>
> Now why did I not contact you when we were facing these problems? Given that I had a day job, 2 h of travel each day, and was at the same time trying to make the experience as smooth as possible for our users in the evenings, I seriously did not have the time to even think about it. And what would it have changed? We need to get KDE supporting the drivers out there and not the next version! I am sorry that I did not contact you at that time, but I think everybody will understand that sometimes, as a volunteer developer, you don't have the time to do everything which would help. Last but not least, I did not feel responsible as I was not the maintainer of KWin.
>
> When we released 4.5 I was positive that we had successfully established a black list which ensured that no user would face issues. Unfortunately I was wrong. Users still faced the issue and even worse: the desktop started to freeze if you changed settings in KWin. A problem which had been completely unknown to us before the release (there was a