[Mesa-dev] [Bug 53199] New: out-of-bounds read src/gallium/drivers/softpipe/sp_flush.c:59
https://bugs.freedesktop.org/show_bug.cgi?id=53199

             Bug #: 53199
           Summary: out-of-bounds read src/gallium/drivers/softpipe/sp_flush.c:59
    Classification: Unclassified
           Product: Mesa
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Other
        AssignedTo: mesa-dev@lists.freedesktop.org
        ReportedBy: v...@freedesktop.org
                CC: bri...@vmware.com

mesa: 7d65356d8a4d268dce4c933d7704d709e1cdacfa (master)

Coverity reports an out-of-bounds read defect.

44 void
45 softpipe_flush( struct pipe_context *pipe,
46                 unsigned flags,
47                 struct pipe_fence_handle **fence )
48 {
49    struct softpipe_context *softpipe = softpipe_context(pipe);
50    uint i;
51
52    draw_flush(softpipe->draw);
53
      At (1): Condition "flags & 2U", taking true branch
54    if (flags & SP_FLUSH_TEXTURE_CACHE) {
55       unsigned sh;
56
      At (2): Condition "sh < 4U", taking true branch
      At (9): Condition "sh < 4U", taking true branch
      At (16): Condition "sh < 4U", taking true branch
57       for (sh = 0; sh < PIPE_SHADER_TYPES; sh++) {
      At (3): Condition "i < softpipe->num_sampler_views[sh]", taking true branch
      At (5): Condition "i < softpipe->num_sampler_views[sh]", taking true branch
      At (7): Condition "i < softpipe->num_sampler_views[sh]", taking false branch
      At (10): Condition "i < softpipe->num_sampler_views[sh]", taking true branch
      At (12): Condition "i < softpipe->num_sampler_views[sh]", taking true branch
      At (14): Condition "i < softpipe->num_sampler_views[sh]", taking false branch
      At (17): Condition "i < softpipe->num_sampler_views[sh]", taking true branch
58          for (i = 0; i < softpipe->num_sampler_views[sh]; i++) {
      CID 714585: Out-of-bounds read (OVERRUN)
      CID 714587: Out-of-bounds read (OVERRUN_STATIC)
      At (18): Overrunning static array "softpipe->tex_cache", with 3 elements, at position 3 with index variable "sh".
59             sp_flush_tex_tile_cache(softpipe->tex_cache[sh][i]);
      At (4): Jumping back to the beginning of the loop
      At (6): Jumping back to the beginning of the loop
      At (11): Jumping back to the beginning of the loop
      At (13): Jumping back to the beginning of the loop
60          }
      At (8): Jumping back to the beginning of the loop
      At (15): Jumping back to the beginning of the loop
61       }
62    }

src/gallium/include/pipe/p_defines.h

347 /**
348  * Shaders
349  */
350 #define PIPE_SHADER_VERTEX   0
351 #define PIPE_SHADER_FRAGMENT 1
352 #define PIPE_SHADER_GEOMETRY 2
353 #define PIPE_SHADER_COMPUTE  3
354 #define PIPE_SHADER_TYPES    4

src/gallium/drivers/softpipe/sp_context.h

180    /*
181     * Texture caches for vertex, fragment, geometry stages.
182     * Don't use PIPE_SHADER_TYPES here to avoid allocating unused memory
183     * for compute shaders.
184     */
185    struct softpipe_tex_tile_cache *tex_cache[PIPE_SHADER_GEOMETRY+1][PIPE_MAX_SAMPLERS];

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
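The mismatch in the report (an array with PIPE_SHADER_GEOMETRY+1 = 3 rows indexed by a loop that runs to PIPE_SHADER_TYPES = 4) can be reproduced in isolation. The following is only a sketch under stated assumptions: `ctx_sketch`, `MAX_SAMPLERS`, `ARRAY_LEN` and `count_flushed_slots` are hypothetical stand-ins, not softpipe code, and bounding the loop by the array's own size is one possible remedy, not necessarily the change that was committed upstream.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the p_defines.h excerpt in the report. */
#define PIPE_SHADER_VERTEX   0
#define PIPE_SHADER_FRAGMENT 1
#define PIPE_SHADER_GEOMETRY 2
#define PIPE_SHADER_COMPUTE  3
#define PIPE_SHADER_TYPES    4

/* Hypothetical small value, chosen only for this sketch. */
#define MAX_SAMPLERS 2

/* Number of elements in the first dimension of a fixed-size array. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Simplified stand-in for struct softpipe_context. */
struct ctx_sketch {
   unsigned num_sampler_views[PIPE_SHADER_TYPES];
   int tex_cache[PIPE_SHADER_GEOMETRY + 1][MAX_SAMPLERS];
};

/* Count the cache slots a flush would visit. The outer loop is bounded by
 * the array's own first dimension instead of PIPE_SHADER_TYPES, so index 3
 * (compute) is never used on the 3-row array. */
static unsigned
count_flushed_slots(const struct ctx_sketch *ctx)
{
   unsigned visited = 0;
   for (size_t sh = 0; sh < ARRAY_LEN(ctx->tex_cache); sh++) {
      for (unsigned i = 0;
           i < ctx->num_sampler_views[sh] && i < MAX_SAMPLERS; i++) {
         (void) ctx->tex_cache[sh][i];   /* stands in for the flush call */
         visited++;
      }
   }
   return visited;
}
```

With sampler-view counts of {2, 1, 0, 2}, the compute entry is ignored because the loop never reaches row 3.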
[Mesa-dev] [PATCH] translate: Fix typo in is_legal_int_format_combo.
Fixes "same on both sides" defect reported by Coverity.

Signed-off-by: Vinson Lee <v...@freedesktop.org>
---
 src/gallium/auxiliary/translate/translate_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
index 0b6ebf5..72099af 100644
--- a/src/gallium/auxiliary/translate/translate_generic.c
+++ b/src/gallium/auxiliary/translate/translate_generic.c
@@ -773,7 +773,7 @@ is_legal_int_format_combo( const struct util_format_description *src,
    for (i = 0; i < nr; i++) {
       /* The signs must match. */
-      if (src->channel[i].type != src->channel[i].type) {
+      if (src->channel[i].type != dst->channel[i].type) {
          return FALSE;
       }
-- 
1.7.11.1
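A "same on both sides" comparison such as `x != x` is always false for integer types, so the sign check in the original code could never reject anything. The sketch below shows the corrected shape of the check on simplified types; `chan_type`, `chan_desc`, `format_desc` and `signs_match` are hypothetical stand-ins for illustration, not the util_format API.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the format description types. */
enum chan_type { CHAN_UNSIGNED, CHAN_SIGNED };

struct chan_desc { enum chan_type type; };

struct format_desc {
   unsigned nr_channels;
   struct chan_desc channel[4];
};

/* Return true only if every channel shared by src and dst has the same
 * signedness. Comparing src against src here would always return true. */
static bool
signs_match(const struct format_desc *src, const struct format_desc *dst)
{
   unsigned nr = src->nr_channels < dst->nr_channels
               ? src->nr_channels : dst->nr_channels;

   for (unsigned i = 0; i < nr; i++) {
      if (src->channel[i].type != dst->channel[i].type)
         return false;
   }
   return true;
}
```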
Re: [Mesa-dev] R600 VDPAU 422 regression since r600g: make sure copying of all texture formats is accelerated
Marek Olšák wrote:
> Does the attached patch fix this issue?

Not properly - it fixes the invalid command stream but the output is not quite right -

http://www.andyqos.ukfsn.org/vdpau-422-patched.png

> Marek
>
> On Mon, Aug 6, 2012 at 5:40 PM, Andy Furniss <andy...@ukfsn.org> wrote:
>> Kernel is dcn, card is rv790 - vdpau csc/scale regressed.
>> This only shows with 422 colour so most things work.
>>
>> commit 7c371f46958910dd2ca9487c89af1b72bbfdada9
>> Author: Marek Olšák <mar...@gmail.com>
>> Date: Sat Jul 28 00:38:42 2012 +0200
>>
>>     r600g: make sure copying of all texture formats is accelerated
>>
>> [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
>> radeon :01:00.0: texture bo too small ((704 576) (1 1) 0 26 0 -> 1622016 have 884736)
>> radeon :01:00.0: alignments 384 1 1 1
Re: [Mesa-dev] [PATCH] translate: Fix typo in is_legal_int_format_combo.
Good catch.

Reviewed-by: Jose Fonseca <jfons...@vmware.com>

----- Original Message -----
> Fixes "same on both sides" defect reported by Coverity.
>
> Signed-off-by: Vinson Lee <v...@freedesktop.org>
> ---
>  src/gallium/auxiliary/translate/translate_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/gallium/auxiliary/translate/translate_generic.c b/src/gallium/auxiliary/translate/translate_generic.c
> index 0b6ebf5..72099af 100644
> --- a/src/gallium/auxiliary/translate/translate_generic.c
> +++ b/src/gallium/auxiliary/translate/translate_generic.c
> @@ -773,7 +773,7 @@ is_legal_int_format_combo( const struct util_format_description *src,
>     for (i = 0; i < nr; i++) {
>        /* The signs must match. */
> -      if (src->channel[i].type != src->channel[i].type) {
> +      if (src->channel[i].type != dst->channel[i].type) {
>           return FALSE;
>        }
> --
> 1.7.11.1
Re: [Mesa-dev] down to 1 test page failing in WebGL 1.0.1 test on Radeon driver
2012/8/6 Laurent Carlier <lordhea...@gmail.com>:
> On Monday 6 August 2012 17:14:52, Alex Deucher wrote:
>> On Mon, Aug 6, 2012 at 5:14 PM, Alex Deucher <alexdeuc...@gmail.com> wrote:
>>> On Mon, Aug 6, 2012 at 12:43 AM, Benoit Jacob <bja...@mozilla.com> wrote:
>>>> Hi,
>>>>
>>>> Just so you know: the WebGL 1.0.1 tests are now passing on 2 drivers
>>>> on Linux: the Intel Mesa driver, and the NVIDIA driver. Technically
>>>> that's enough for us to claim conformance (we need to pass with 2
>>>> drivers on each OS we support).
>>>>
>>>> But I'd really like to include the Radeon driver in the list of
>>>> drivers we can claim to fully pass conformance tests on.
>>>>
>>>> As of Ubuntu 12.04 64bit / Gallium 0.4 on AMD RV710 / Mesa 8.0.2, I
>>>> have this single test page failing:
>>>>
>>>> https://www.khronos.org/registry/webgl/conformance-suites/1.0.1/conformance/textures/texture-mips.html
>>>
>>> Does it still fail with mesa from git (soon to be 8.1)? I think the
>>> tiling rework may have fixed this, but is too invasive to backport to
>>> the 8.x branch.
>>
>> 8.0 branch that is.
>>
>> Alex
>
> Two fail here with r600g/hd6870 from git and kernel 3.5

I can confirm that with my rv770 / kernel 3.5 / mesa 8.1-git

conformance/textures/texture-mips.html (17 of 19 passed)
failed: texture that is only using the smallest 2 mips should draw with green
failed: texture that is only using smallest mips should draw with cyan

Andreas
[Mesa-dev] [Bug 53199] out-of-bounds read src/gallium/drivers/softpipe/sp_flush.c:59
https://bugs.freedesktop.org/show_bug.cgi?id=53199

Brian Paul <brian.e.p...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #1 from Brian Paul <brian.e.p...@gmail.com> 2012-08-07 14:01:27 UTC ---
Fixed by commit 99695f58fde6d364f2310d97303768782a1e537d
Re: [Mesa-dev] someone regressed tinderbox
On 08/06/2012 10:38 PM, Dave Airlie wrote:
> http://tinderbox.x.org/builds/2012-08-06-0020/logs/libGL/#build
>
> Making all in glx
> gmake[4]: Entering directory `/home/tinderbox/mesa/mesa/src/egl/drivers/glx'
> CC egl_glx.lo
> In file included from ../../../../src/egl/main/egltypedefs.h:37,
>                  from ../../../../src/egl/main/eglconfig.h:37,
>                  from egl_glx.c:44:
> ../../../../include/EGL/eglext.h:454: error: redefinition of typedef 'PFNEGLQUERYSTREAMTIMEKHRPROC'
> ../../../../include/EGL/eglext.h:407: note: previous declaration of 'PFNEGLQUERYSTREAMTIMEKHRPROC' was here
> gmake[4]: Leaving directory `/home/tinderbox/mesa/mesa/src/egl/drivers/glx'

The eglext.h patch I posted last night fixes this. I'll go ahead and push it.

-Brian
[Mesa-dev] [PATCH] gbm: Fix build without gallium_drm_loader
pipe_loader_drm_probe_fd only exists if HAVE_PIPE_LOADER_DRM is defined.

This addresses https://bugs.freedesktop.org/show_bug.cgi?id=52962
---
 src/gallium/targets/gbm/gbm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/gallium/targets/gbm/gbm.c b/src/gallium/targets/gbm/gbm.c
index 7d2af51..3ab3b32 100644
--- a/src/gallium/targets/gbm/gbm.c
+++ b/src/gallium/targets/gbm/gbm.c
@@ -51,9 +51,11 @@ gallium_screen_create(struct gbm_gallium_drm_device *gdrm)
    struct pipe_loader_device *dev;
    int ret;

+#ifdef HAVE_PIPE_LOADER_DRM
    ret = pipe_loader_drm_probe_fd(&dev, gdrm->base.base.fd);
    if (!ret)
       return -1;
+#endif /* HAVE_PIPE_LOADER_DRM */

    gdrm->screen = pipe_loader_create_screen(dev, get_library_search_path());
    if (gdrm->screen == NULL) {
-- 
1.7.8.6
Re: [Mesa-dev] [PATCH 13/15] dri: Simplify use of driConcatConfigs
On 08/06/2012 07:33 PM, Eric Anholt wrote:
> Chad Versace <chad.vers...@linux.intel.com> writes:
>
>> If either argument to driConcatConfigs(a, b) is null or the empty list,
>> then simply return the other argument as the resultant list.
>>
>> All callers were accomplishing that same behavior anyway. And each caller
>> accomplished it with the same pattern. So this patch moves that external
>> pattern into the function.
>>
>> CC: Ian Romanick <i...@freedesktop.org>
>> Reviewed-by: e...@anholt.net
>> Signed-off-by: Chad Versace <chad.vers...@linux.intel.com>
>
> I was going to say reviewed-by on the last patchset with this change.

You gave it reviewed-by on the previous patchset, where it was patch 14/16.
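The behavior described in the commit message (return the other list when one argument is null or empty) can be sketched on plain NULL-terminated pointer arrays. This is an illustration only: `struct config`, `list_len` and `concat_configs` are hypothetical names, and the ownership convention (the function frees the input arrays it no longer needs) is an assumption of this sketch, not a statement about driConcatConfigs itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct config { int id; };   /* hypothetical stand-in for a DRI config */

/* Length of a NULL-terminated list, not counting the terminator. */
static size_t
list_len(struct config **list)
{
   size_t n = 0;
   while (list[n] != NULL)
      n++;
   return n;
}

/* Concatenate two heap-allocated, NULL-terminated lists. The function
 * takes ownership of both list arrays (not of the elements). */
static struct config **
concat_configs(struct config **a, struct config **b)
{
   /* Null or empty on one side: hand back the other side unchanged. */
   if (a == NULL || a[0] == NULL) {
      free(a);
      return b;
   }
   if (b == NULL || b[0] == NULL) {
      free(b);
      return a;
   }

   size_t na = list_len(a), nb = list_len(b);
   struct config **all = malloc((na + nb + 1) * sizeof *all);
   if (all == NULL)
      return NULL;   /* inputs are left untouched on allocation failure */

   memcpy(all, a, na * sizeof *all);
   memcpy(all + na, b, nb * sizeof *all);
   all[na + nb] = NULL;

   free(a);
   free(b);
   return all;
}
```

Folding the null and empty checks into the function means callers can chain calls without guarding each one.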
Re: [Mesa-dev] [PATCH 14/15] intel: Clarify intel_screen_make_configs
On 08/06/2012 07:40 PM, Eric Anholt wrote:
> Chad Versace <chad.vers...@linux.intel.com> writes:
>
>> This function felt sloppy, so this patch cleans it up a little bit.
>>
>> - Rename `color` to `i`. It is not a color value, only an iterator int.
>
> I'm meh on this change.

A quick explanation of why I renamed it... The variable name `color` confused me the first time I encountered this function. It's uncommon for integer iterator variables to have colorful names so I suspected something special was going on, but nothing was.

>> - Move `depth_bits[0] = 0` into the non-accum loop because that is where
>>   it is used. The accum loop later overwrites depth_bits[0].
>
> This makes sense -- move it next to the place that sets up the rest of
> the array.
>
>> - Redefine `msaa_samples_array` as static const because it is never
>>   modified.
>
> Maybe instead, singlesample_samples[] = {0} and multisample_samples[] =
> {4, 8}? The array math in the next patch was not pretty.

Good idea. I don't want git blame to associate me with that array math. I incorporated your suggestion into this and the next patch.
Re: [Mesa-dev] [PATCH mesa] i965: add more Haswell PCI IDs
2012/8/6 Kenneth Graunke <kenn...@whitecape.org>:
> On 08/06/2012 02:50 PM, Paulo Zanoni wrote:
>> From: Paulo Zanoni <paulo.r.zan...@intel.com>
>>
>> Signed-off-by: Paulo Zanoni <paulo.r.zan...@intel.com>
>
> Reviewed-by: Kenneth Graunke <kenn...@whitecape.org>
>
> Do you have push access? If not, I can commit this for you.

I just discovered I have. Patch committed, thanks.

-- 
Paulo Zanoni
Re: [Mesa-dev] [PATCH 10/15] intel: Support mapping multisample miptrees (v2)
On 08/06/2012 07:32 PM, Eric Anholt wrote:
> Chad Versace <chad.vers...@linux.intel.com> writes:
>
>> Add two new functions: intel_miptree_{map,unmap}_multisample, to which
>> intel_miptree_{map,unmap} dispatch. Only mapping flat, renderbuffer-like
>> miptrees is supported.
>>
>> v2:
>> - Move the introduction of
>>   intel_mipmap_tree::singlesample_{width0,height0} to this patch, per
>>   Anholt.
>> - Replace relations `mt->num_samples == 0` and `mt->num_samples > 0`
>>   with `<= 1` and `> 0`, per Anholt.
>> - Don't downsample unnecessarily, found by Anholt.
>>
>> CC: Eric Anholt <e...@anholt.net>
>> CC: Paul Berry <stereotype...@gmail.com>
>> Signed-off-by: Chad Versace <chad.vers...@linux.intel.com>
>> ---
>>  src/mesa/drivers/dri/intel/intel_mipmap_tree.c | 115 +++--
>>  src/mesa/drivers/dri/intel/intel_mipmap_tree.h | 18
>>  2 files changed, 127 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
>> index 23d84c0..6ecb48f 100644
>> --- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
>> +++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
>
>> +   if (!mt->singlesample_mt) {
>> +      map->singlesample_mt_is_tmp = true;
>> +      mt->need_downsample = true;
>
> Move this mt->need_downsample flag setup to after you've successfully
> alloced?

Done. That's a sensible change, and removes the need for unsetting mt->need_downsample in the failure path.

>> +      mt->singlesample_mt =
>> +         intel_miptree_create_for_renderbuffer(intel,
>> +                                               mt->format,
>> +                                               mt->singlesample_width0,
>> +                                               mt->singlesample_height0,
>> +                                               0 /*num_samples*/);
>> +      if (!mt->singlesample_mt) {
>> +         mt->need_downsample = false;
>> +         goto fail;
>> +      }
>> +   }
>> +
>> +   if (mode & GL_MAP_INVALIDATE_RANGE_BIT)
>> +      mt->need_downsample = false;
>> +
>> +   intel_miptree_downsample(intel, mt);
>
> I don't think you can clear need_downsample for
> GL_MAP_INVALIDATE_RANGE_BIT, because the GL_MAP_WRITE_BIT case in the
> unmap (implied by INVALIDATE_RANGE) will upsample the whole singlesample
> buffer back, not just the mapped subset.
>
> Dropping the INVALIDATE_RANGE gets the series up to this patch my r-b.

Ah, you're right. Thanks for catching that subtle error.
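The first review point (set the flag only after the allocation has succeeded, so the failure path has nothing to undo) can be sketched on a toy structure. Everything here is hypothetical: `tree_sketch`, `prepare_map` and the `alloc_fn` hook are illustration names, and the allocator hook exists only so the failure path can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a miptree with a lazily created
 * single-sample temporary. */
struct tree_sketch {
   void *singlesample;
   bool need_downsample;
};

/* Create the temporary on first use. State is modified only after the
 * allocation succeeded, so the error path needs no cleanup. */
static bool
prepare_map(struct tree_sketch *t, void *(*alloc_fn)(size_t), size_t size)
{
   if (t->singlesample == NULL) {
      void *tmp = alloc_fn(size);
      if (tmp == NULL)
         return false;           /* nothing to undo */
      t->singlesample = tmp;
      t->need_downsample = true;
   }
   return true;
}

/* Allocator that always fails, used to exercise the error path. */
static void *
failing_alloc(size_t size)
{
   (void) size;
   return NULL;
}
```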
Re: [Mesa-dev] [PATCH 15/15] intel: Advertise multisample DRI2 configs on gen >= 6
On 08/06/2012 07:49 PM, Eric Anholt wrote:
> Chad Versace <chad.vers...@linux.intel.com> writes:
>
>> +   /* Generate multisample configs.
>> +    *
>> +    * This loop breaks early, and hence is a no-op, on gen < 6.
>> +    *
>> +    * Multisample configs must follow the singlesample configs in order to
>> +    * work around an X server bug present in 1.12. The X server chooses to
>> +    * associate the first listed RGBA888-Z24S8 config, regardless of its
>> +    * sample count, with the 32-bit depth visual used for compositing.
>> +    *
>> +    * Only doublebuffer configs with GLX_SWAP_UNDEFINED_OML behavior are
>> +    * supported. Singlebuffer configs are not supported because that would
>> +    * require that rendering be eventually written to the singlesample buffer
>> +    * even if DRI2Flush is never called; yet we downsample to the singlesample
>> +    * buffer only on DRI2Flush. GLX_SWAP_COPY_OML is not supported because we
>> +    * have no tests for its interaction with MSAA.
>> +    */
>
> We actually need to remove our claiming of GLX_SWAP_COPY_OML in general,
> because pageflipping means that we don't actually support SWAP_COPY. We
> only do UNDEFINED.
>
> I'd say instead "singlebuffer configs are not supported because nobody
> wants them".

I'll update the comments to say:

    * Only doublebuffer configs with GLX_SWAP_UNDEFINED_OML behavior are
    * supported. Singlebuffer configs are not supported because no one wants
    * them. GLX_SWAP_COPY_OML is not supported due to page flipping.
    */

I'll follow up later with a patch that removes advertising of SWAP_COPY configs.

> I think all you need is (pessimistically)
> intel_downsample_for_dri2_flush in intel_flush_front() to make front
> buffer rendering actually work, and it's a problem that exists even in a
> doublebuffer config.

Ah, I failed to realize that this problem exists even in a doublebuffer config. I'll follow up later with a patch that downsamples in intel_flush_front.

> I think this concludes my review. Great work! I'm excited to see this
> finally land.

Thanks. Is this an implicit r-b on this patch?
Re: [Mesa-dev] [PATCH 15/15] intel: Advertise multisample DRI2 configs on gen >= 6
Chad Versace <chad.vers...@linux.intel.com> writes:

> On 08/06/2012 07:49 PM, Eric Anholt wrote:
>> I think this concludes my review. Great work! I'm excited to see this
>> finally land.
>
> Thanks. Is this an implicit r-b on this patch?

Yeah.
Re: [Mesa-dev] [PATCH] i965/msaa: Add sample-alpha-to-coverage support for multiple render targets
Anuj Phogat <anuj.pho...@gmail.com> writes:

> Render Target Write message should include source zero alpha value when
> sample-alpha-to-coverage is enabled for an FBO with multiple render
> targets. Source zero alpha value is used as fragment coverage for all the
> render targets.

> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index fefe2c7..7fc28ac 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -1930,14 +1930,24 @@ fs_visitor::emit_color_write(int target, int index, int first_color_mrf)
>  {
>     int reg_width = c->dispatch_width / 8;
>     fs_inst *inst;
> -   fs_reg color = outputs[target];
> +   fs_reg color;
> +   bool src0_alpha_to_render_target = target > 0 &&
> +                                      c->key.nr_color_regions > 1 &&
> +                                      c->key.sample_alpha_to_coverage;
> +
> +   color = (src0_alpha_to_render_target && !index) ?
> +           outputs[0] :
> +           outputs[target];
>     fs_reg mrf;
>
>     /* If there's no color data to be written, skip it. */
>     if (color.file == BAD_FILE)
>        return;
>
> -   color.reg_offset += index;
> +   if (src0_alpha_to_render_target)
> +      color.reg_offset += !index ? 3 : index - 1;
> +   else
> +      color.reg_offset += index;

Ew, this is really awful. How about instead...

> -   for (unsigned i = 0; i < this->output_components[target]; i++)
> -      emit_color_write(target, i, color_mrf);
> +   /* If src0_alpha_to_render_target is true, include source zero alpha
> +    * data in RenderTargetWrite message for targets > 0.
> +    */
> +   output_components = (target && src0_alpha_to_render_target) ?
> +                       (this->output_components[target] + 1) :
> +                       this->output_components[target];
>
> -   fs_inst *inst = emit(FS_OPCODE_FB_WRITE);
> +   for (unsigned i = 0; i < output_components; i++)
> +      emit_color_write(target, i, color_mrf);

Replace all of this change with:

   if (src0_alpha_to_render_target) {
      emit_color_write(0, 3, color_mrf);
      color_mrf += reg_width;
   }

   for (unsigned i = 0; i < this->output_components[target]; i++)
      emit_color_write(target, i, color_mrf);

> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
> index 5ab0547..210b078 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -546,6 +546,8 @@ static void brw_wm_populate_key( struct brw_context *brw,
>     /* _NEW_BUFFERS */
>     key->nr_color_regions = ctx->DrawBuffer->_NumColorDrawBuffers;

Needs /* _NEW_MULTISAMPLE */

> +   key->sample_alpha_to_coverage = ctx->Multisample.SampleAlphaToCoverage;

and corresponding addition to the state struct below.
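The ordering suggested in the review (write output 0's alpha first, then the target's own components unchanged) can be modeled on a flat float array. This is a sketch under stated assumptions: the payload is a plain array rather than hardware message registers, and `build_rt_payload` and `NUM_COMPONENTS` are hypothetical names that do not exist in the i965 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_COMPONENTS 4   /* r, g, b, a */

/* Fill `payload` for one render target and return the number of slots
 * written. outputs[t][k] is component k of shader output t. For targets
 * above 0, with several color regions and alpha-to-coverage enabled, the
 * alpha of output 0 is emitted first; the per-target loop stays simple. */
static size_t
build_rt_payload(float outputs[][NUM_COMPONENTS], unsigned target,
                 unsigned nr_color_regions, bool sample_alpha_to_coverage,
                 float *payload)
{
   size_t n = 0;
   bool src0_alpha = target > 0 &&
                     nr_color_regions > 1 &&
                     sample_alpha_to_coverage;

   if (src0_alpha)
      payload[n++] = outputs[0][3];

   for (unsigned i = 0; i < NUM_COMPONENTS; i++)
      payload[n++] = outputs[target][i];

   return n;
}
```

Keeping the extra slot outside the loop avoids the index remapping that the review objected to.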
[Mesa-dev] [PATCH 1/8] intel: Rename INTEL_DEBUG=fall to INTEL_DEBUG=perf.
I want to introduce some more debug output for performance surprises that
include fallbacks but aren't necessarily software rasterization. Leave
INTEL_DEBUG=fall in place for those that have used that flag before.
---
 src/mesa/drivers/dri/i915/i915_program.c    |    2 +-
 src/mesa/drivers/dri/i915/intel_tris.c      |    4 ++--
 src/mesa/drivers/dri/i965/brw_fallback.c    |    2 +-
 src/mesa/drivers/dri/i965/brw_urb.c         |    2 +-
 src/mesa/drivers/dri/intel/intel_context.c  |    3 ++-
 src/mesa/drivers/dri/intel/intel_context.h  |    4 ++--
 src/mesa/drivers/dri/intel/intel_tex_copy.c |    4 ++--
 7 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i915/i915_program.c b/src/mesa/drivers/dri/i915/i915_program.c
index 0a600d3..4437167 100644
--- a/src/mesa/drivers/dri/i915/i915_program.c
+++ b/src/mesa/drivers/dri/i915/i915_program.c
@@ -442,7 +442,7 @@ i915_emit_param4fv(struct i915_fragment_program * p, const GLfloat * values)
 void
 i915_program_error(struct i915_fragment_program *p, const char *fmt, ...)
 {
-   if (unlikely((INTEL_DEBUG & (DEBUG_WM | DEBUG_FALLBACKS)) != 0)) {
+   if (unlikely((INTEL_DEBUG & (DEBUG_WM | DEBUG_PERF)) != 0)) {
       va_list args;

       fprintf(stderr, "i915_program_error: ");
diff --git a/src/mesa/drivers/dri/i915/intel_tris.c b/src/mesa/drivers/dri/i915/intel_tris.c
index 5954b24..549af5e 100644
--- a/src/mesa/drivers/dri/i915/intel_tris.c
+++ b/src/mesa/drivers/dri/i915/intel_tris.c
@@ -1223,7 +1223,7 @@ intelFallback(struct intel_context *intel, GLbitfield bit, bool mode)
          assert(!intel->tnl_pipeline_running);

          intel_flush(ctx);
-         if (INTEL_DEBUG & DEBUG_FALLBACKS)
+         if (INTEL_DEBUG & DEBUG_PERF)
             fprintf(stderr, "ENTER FALLBACK %x: %s\n",
                     bit, getFallbackString(bit));
          _swsetup_Wakeup(ctx);
@@ -1236,7 +1236,7 @@ intelFallback(struct intel_context *intel, GLbitfield bit, bool mode)
          assert(!intel->tnl_pipeline_running);

          _swrast_flush(ctx);
-         if (INTEL_DEBUG & DEBUG_FALLBACKS)
+         if (INTEL_DEBUG & DEBUG_PERF)
             fprintf(stderr, "LEAVE FALLBACK %s\n", getFallbackString(bit));
          tnl->Driver.Render.Start = intelRenderStart;
          tnl->Driver.Render.PrimitiveNotify = intelRenderPrimitive;
diff --git a/src/mesa/drivers/dri/i965/brw_fallback.c b/src/mesa/drivers/dri/i965/brw_fallback.c
index 81fc23a..1ae6fc8 100644
--- a/src/mesa/drivers/dri/i965/brw_fallback.c
+++ b/src/mesa/drivers/dri/i965/brw_fallback.c
@@ -37,7 +37,7 @@
 #include "tnl/tnl.h"
 #include "brw_context.h"

-#define FILE_DEBUG_FLAG DEBUG_FALLBACKS
+#define FILE_DEBUG_FLAG DEBUG_PERF

 static bool do_check_fallback(struct brw_context *brw)
 {
diff --git a/src/mesa/drivers/dri/i965/brw_urb.c b/src/mesa/drivers/dri/i965/brw_urb.c
index 7643dc2..b1126b5 100644
--- a/src/mesa/drivers/dri/i965/brw_urb.c
+++ b/src/mesa/drivers/dri/i965/brw_urb.c
@@ -190,7 +190,7 @@ static void recalculate_urb_fence( struct brw_context *brw )
          exit(1);
       }

-      if (unlikely(INTEL_DEBUG & (DEBUG_URB|DEBUG_FALLBACKS)))
+      if (unlikely(INTEL_DEBUG & (DEBUG_URB|DEBUG_PERF)))
          printf("URB CONSTRAINED\n");
    }

diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
index 759fead..a39462b 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -427,7 +427,8 @@ static const struct dri_debug_control debug_control[] = {
    { "ioctl", DEBUG_IOCTL},
    { "blit",  DEBUG_BLIT},
    { "mip",   DEBUG_MIPTREE},
-   { "fall",  DEBUG_FALLBACKS},
+   { "fall",  DEBUG_PERF},
+   { "perf",  DEBUG_PERF},
    { "verb",  DEBUG_VERBOSE},
    { "bat",   DEBUG_BATCH},
    { "pix",   DEBUG_PIXEL},
diff --git a/src/mesa/drivers/dri/intel/intel_context.h b/src/mesa/drivers/dri/intel/intel_context.h
index 29ab187..6d1a81c 100644
--- a/src/mesa/drivers/dri/intel/intel_context.h
+++ b/src/mesa/drivers/dri/intel/intel_context.h
@@ -430,7 +430,7 @@ extern int INTEL_DEBUG;
 #define DEBUG_IOCTL     0x4
 #define DEBUG_BLIT      0x8
 #define DEBUG_MIPTREE   0x10
-#define DEBUG_FALLBACKS 0x20
+#define DEBUG_PERF      0x20
 #define DEBUG_VERBOSE   0x40
 #define DEBUG_BATCH     0x80
 #define DEBUG_PIXEL     0x100
@@ -459,7 +459,7 @@ extern int INTEL_DEBUG;
 } while(0)

 #define fallback_debug(...) do {                        \
-   if (unlikely(INTEL_DEBUG & DEBUG_FALLBACKS))         \
+   if (unlikely(INTEL_DEBUG & DEBUG_PERF))              \
       printf(__VA_ARGS__);                              \
 } while(0)

diff --git a/src/mesa/drivers/dri/intel/intel_tex_copy.c b/src/mesa/drivers/dri/intel/intel_tex_copy.c
index 6da4ec6..f436633 100644
--- a/src/mesa/drivers/dri/intel/intel_tex_copy.c
+++ b/src/mesa/drivers/dri/intel/intel_tex_copy.c
@@ -61,7 +61,7 @@ intel_copy_texsubimage(struct intel_context *intel,
    intel_prepare_render(intel);
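Keeping the old "fall" spelling alive amounts to two table entries that map to the same bit. The sketch below shows that aliasing with a small name-to-flag table; `parse_debug_string` is a hypothetical whole-token parser written for this illustration, not Mesa's own debug-string parser, and only three of the flag values from the patch are reproduced.

```c
#include <assert.h>
#include <string.h>

/* Flag values as they appear in the intel_context.h hunk. */
#define DEBUG_MIPTREE 0x10
#define DEBUG_PERF    0x20
#define DEBUG_VERBOSE 0x40

struct debug_control {
   const char *name;
   int flag;
};

static const struct debug_control debug_control[] = {
   { "mip",  DEBUG_MIPTREE },
   { "fall", DEBUG_PERF },     /* legacy spelling kept as an alias */
   { "perf", DEBUG_PERF },
   { "verb", DEBUG_VERBOSE },
};

/* Hypothetical parser: OR together the flags of every comma-separated
 * token that exactly matches a table entry. Unknown tokens are ignored. */
static int
parse_debug_string(const char *s)
{
   int flags = 0;

   while (s != NULL && *s != '\0') {
      size_t len = strcspn(s, ",");

      for (size_t i = 0;
           i < sizeof debug_control / sizeof debug_control[0]; i++) {
         if (strlen(debug_control[i].name) == len &&
             strncmp(s, debug_control[i].name, len) == 0)
            flags |= debug_control[i].flag;
      }

      s += len;
      if (*s == ',')
         s++;
   }
   return flags;
}
```

Because both names resolve to the same bit, code that tests the flag does not need to know which spelling the user chose.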
[Mesa-dev] [PATCH 2/8] i965: Add INTEL_DEBUG=perf for failure to compile 16-wide shaders.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp              |    5 -
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp |    3 ++-
 src/mesa/drivers/dri/intel/intel_context.h        |    5 +
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d06858e..298c708 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2078,7 +2078,10 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c,
       c->dispatch_width = 16;
       fs_visitor v2(c, prog, shader);
       v2.import_uniforms(&v);
-      v2.run();
+      if (!v2.run()) {
+         perf_debug("16-wide shader failed to compile, falling back to "
+                    "8-wide at a 10-20%% performance cost: %s", v2.fail_msg);
+      }
    }

    c->prog_data.dispatch_width = 8;
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 7618047..e7f11ae 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -238,7 +238,8 @@ fs_visitor::assign_regs()
       if (reg == -1) {
          fail("no register to spill\n");
       } else if (c->dispatch_width == 16) {
-         fail("no spilling support on 16-wide yet\n");
+         fail("Failure to register allocate. Reduce number of live scalar "
+              "values to avoid this.");
       } else {
          spill_reg(reg);
       }
diff --git a/src/mesa/drivers/dri/intel/intel_context.h b/src/mesa/drivers/dri/intel/intel_context.h
index 6d1a81c..c4efa54 100644
--- a/src/mesa/drivers/dri/intel/intel_context.h
+++ b/src/mesa/drivers/dri/intel/intel_context.h
@@ -463,6 +463,11 @@ extern int INTEL_DEBUG;
       printf(__VA_ARGS__);                              \
 } while(0)

+#define perf_debug(...) do {                            \
+   if (unlikely(INTEL_DEBUG & DEBUG_PERF))              \
+      printf(__VA_ARGS__);                              \
+} while(0)
+
 #define PCI_CHIP_845_G   0x2562
 #define PCI_CHIP_I830_M  0x3577
 #define PCI_CHIP_I855_GM 0x3582
-- 
1.7.10.4
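The do { ... } while(0) wrapper used by the perf_debug macro is what lets it sit as the single statement of an if that has an else branch. The standalone sketch below shows that shape; `INTEL_DEBUG_SKETCH`, `perf_messages` and `compile_shader` are hypothetical names for this illustration, the message counter is added only so the behavior can be checked, and the output goes to stderr rather than stdout.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG_PERF 0x20

/* Hypothetical stand-in for the driver's global debug mask. */
static int INTEL_DEBUG_SKETCH = 0;

/* Counts emitted messages; exists only for this sketch. */
static int perf_messages = 0;

/* Wrapped in do/while(0) so the macro expands to exactly one statement. */
#define perf_debug(...) do {                    \
   if (INTEL_DEBUG_SKETCH & DEBUG_PERF) {       \
      perf_messages++;                          \
      fprintf(stderr, __VA_ARGS__);             \
   }                                            \
} while (0)

/* Returns the dispatch width that would be used. The macro is the sole
 * statement of the `if`, followed by `else`, which compiles cleanly only
 * because of the do/while(0) wrapper. */
static int
compile_shader(int wide16_ok)
{
   if (!wide16_ok)
      perf_debug("16-wide compile failed, using 8-wide: %s\n", "sketch");
   else
      return 16;

   return 8;
}
```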
[Mesa-dev] intel: performance debug flag.
One of Valve's requests was for GL_ARB_debug_output for performance traps they should know about. Unfortunately, Mesa's ARB_debug_output support is very limited at the moment, so this just gets messages in place, which we can convert to GL_ARB_debug_output at some later time.
[Mesa-dev] [PATCH 3/8] i965: Add performance debug for register spilling.
---
 src/mesa/drivers/dri/i965/brw_vs.c |    4
 src/mesa/drivers/dri/i965/brw_wm.c |    4
 2 files changed, 8 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index b1b073e..5120167 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -254,6 +254,10 @@ do_vs_prog(struct brw_context *brw,

    /* Scratch space is used for register spilling */
    if (c.last_scratch) {
+      perf_debug("Vertex shader triggered register spilling. "
+                 "Try reducing the number of live vec4 values to "
+                 "improve performance.\n");
+
       c.prog_data.total_scratch = brw_get_scratch_size(c.last_scratch);

       brw_get_scratch_bo(intel, &brw->vs.scratch_bo,
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index 5ab0547..3abc696 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -321,6 +321,10 @@ bool do_wm_prog(struct brw_context *brw,

    /* Scratch space is used for register spilling */
    if (c->last_scratch) {
+      perf_debug("Fragment shader triggered register spilling. "
+                 "Try reducing the number of live scalar values to "
+                 "improve performance.\n");
+
       c->prog_data.total_scratch = brw_get_scratch_size(c->last_scratch);

       brw_get_scratch_bo(intel, &brw->wm.scratch_bo,
-- 
1.7.10.4
[Mesa-dev] [PATCH 4/8] intel: Add performance debug for some common GPU stalls.
---
 src/mesa/drivers/dri/i965/brw_queryobj.c          |    6 ++++++
 src/mesa/drivers/dri/intel/intel_buffer_objects.c |    8 +++++++-
 src/mesa/drivers/dri/intel/intel_regions.c        |    6 ++++++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 240fe32..f84ad0e 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -58,6 +58,12 @@ brw_queryobj_get_results(struct gl_context *ctx,
    if (query->bo == NULL)
       return;
 
+   if (unlikely(INTEL_DEBUG & DEBUG_PERF)) {
+      if (drm_intel_bo_busy(query->bo)) {
+         perf_debug("Stalling on the GPU waiting for a query object.\n");
+      }
+   }
+
    drm_intel_bo_map(query->bo, false);
    results = query->bo->virtual;
    switch (query->Base.Target) {
diff --git a/src/mesa/drivers/dri/intel/intel_buffer_objects.c b/src/mesa/drivers/dri/intel/intel_buffer_objects.c
index 37dc75c..df8ac7f 100644
--- a/src/mesa/drivers/dri/intel/intel_buffer_objects.c
+++ b/src/mesa/drivers/dri/intel/intel_buffer_objects.c
@@ -212,7 +212,8 @@ intel_bufferobj_subdata(struct gl_context * ctx,
          intel_bufferobj_alloc_buffer(intel, intel_obj);
          drm_intel_bo_subdata(intel_obj->buffer, 0, size, data);
       } else {
-         /* Use the blitter to upload the new data. */
+         perf_debug("Using a blit copy to avoid stalling on glBufferSubData() "
+                    "to a busy buffer object.\n");
          drm_intel_bo *temp_bo =
             drm_intel_bo_alloc(intel->bufmgr, "subdata temp", size, 64);
 
@@ -226,6 +227,11 @@ intel_bufferobj_subdata(struct gl_context * ctx,
          drm_intel_bo_unreference(temp_bo);
       }
    } else {
+      if (unlikely(INTEL_DEBUG & DEBUG_PERF)) {
+         if (drm_intel_bo_busy(intel_obj->buffer)) {
+            perf_debug("Stalling on the GPU in glBufferSubData().\n");
+         }
+      }
       drm_intel_bo_subdata(intel_obj->buffer, offset, size, data);
    }
 }
diff --git a/src/mesa/drivers/dri/intel/intel_regions.c b/src/mesa/drivers/dri/intel/intel_regions.c
index 1ef1ac6..9bf9c66 100644
--- a/src/mesa/drivers/dri/intel/intel_regions.c
+++ b/src/mesa/drivers/dri/intel/intel_regions.c
@@ -123,6 +123,12 @@ intel_region_map(struct intel_context *intel, struct intel_region *region,
     * flush is only needed on first map of the buffer.
     */
 
+   if (unlikely(INTEL_DEBUG & DEBUG_PERF)) {
+      if (drm_intel_bo_busy(region->bo)) {
+         perf_debug("Mapping a busy BO, causing a stall on the GPU.\n");
+      }
+   }
+
    _DBG("%s %p\n", __FUNCTION__, region);
    if (!region->map_refcount) {
       intel_flush(&intel->ctx);
-- 
1.7.10.4
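The pattern repeated in these hunks (check whether the buffer is still busy just before a blocking map or upload, and only do so when the perf flag is set) can be sketched without libdrm as follows. Everything here is a stand-in: struct fake_bo, perf_debug_enabled and map_with_stall_report() are hypothetical names for illustration, not driver or libdrm API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a GPU buffer object; not the libdrm type. */
struct fake_bo {
   bool busy;   /* true while the (imaginary) GPU is still using the buffer */
};

static bool perf_debug_enabled;   /* stand-in for INTEL_DEBUG & DEBUG_PERF */

/* Returns true when the map would have stalled, so callers can count stalls. */
static bool
map_with_stall_report(struct fake_bo *bo, const char *what)
{
   bool stalled = false;

   /* Ask about busyness only in debug mode, keeping the extra query off the
    * normal path. */
   if (perf_debug_enabled && bo->busy) {
      printf("Stalling on the GPU while mapping %s.\n", what);
      stalled = true;
   }

   /* A blocking map returns only once the GPU is done with the buffer. */
   bo->busy = false;
   return stalled;
}
```

The outer flag test matters in this sketch for the same reason it appears in the patch: the message itself is already gated, but the busy query should not run at all unless someone asked for perf output.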
[Mesa-dev] [PATCH 6/8] i965: Add performance debug for shader recompiles.
--- src/mesa/drivers/dri/i965/brw_context.h |2 + src/mesa/drivers/dri/i965/brw_fs.cpp|6 ++ src/mesa/drivers/dri/i965/brw_program.h |2 + src/mesa/drivers/dri/i965/brw_vec4_emit.cpp |6 ++ src/mesa/drivers/dri/i965/brw_wm.c | 84 +++ src/mesa/drivers/dri/i965/brw_wm.h |3 + 6 files changed, 103 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h index 8a082ab..bc43557 100644 --- a/src/mesa/drivers/dri/i965/brw_context.h +++ b/src/mesa/drivers/dri/i965/brw_context.h @@ -275,6 +275,8 @@ struct brw_fragment_program { struct brw_shader { struct gl_shader base; + bool compiled_once; + /** Shader IR transformed for native compile, at link time. */ struct exec_list *ir; }; diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 298c708..90a1d92 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2086,6 +2086,12 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c, c-prog_data.dispatch_width = 8; + if (unlikely(INTEL_DEBUG DEBUG_PERF)) { + if (shader-compiled_once) + brw_wm_debug_recompile(brw, prog, c-key); + shader-compiled_once = true; + } + return true; } diff --git a/src/mesa/drivers/dri/i965/brw_program.h b/src/mesa/drivers/dri/i965/brw_program.h index 874238f..9fbc201 100644 --- a/src/mesa/drivers/dri/i965/brw_program.h +++ b/src/mesa/drivers/dri/i965/brw_program.h @@ -45,5 +45,7 @@ struct brw_sampler_prog_key_data { void brw_populate_sampler_prog_key_data(struct gl_context *ctx, const struct gl_program *prog, struct brw_sampler_prog_key_data *key); +bool brw_debug_recompile_sampler_key(const struct brw_sampler_prog_key_data *old_key, + const struct brw_sampler_prog_key_data *key); #endif diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp index 9df7b11..788d7b5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp 
@@ -1031,6 +1031,10 @@ brw_vs_emit(struct gl_shader_program *prog, struct brw_vs_compile *c) printf(\n\n); } + if (shader-compiled_once) { + perf_debug(Recompiling vertex shader for program %d\n, prog-Name); + } + vec4_visitor v(c, prog, shader); if (!v.run()) { prog-LinkStatus = false; @@ -1038,6 +1042,8 @@ brw_vs_emit(struct gl_shader_program *prog, struct brw_vs_compile *c) return false; } + shader-compiled_once = true; + return true; } diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c index 3abc696..323eabd 100644 --- a/src/mesa/drivers/dri/i965/brw_wm.c +++ b/src/mesa/drivers/dri/i965/brw_wm.c @@ -347,6 +347,90 @@ bool do_wm_prog(struct brw_context *brw, return true; } +static bool +key_debug(const char *name, int a, int b) +{ + if (a != b) { + perf_debug( %s %d-%d\n, name, a, b); + return true; + } else { + return false; + } +} + +bool +brw_debug_recompile_sampler_key(const struct brw_sampler_prog_key_data *old_key, +const struct brw_sampler_prog_key_data *key) +{ + bool found = false; + + for (unsigned int i = 0; i BRW_MAX_TEX_UNIT; i++) { + found |= key_debug(EXT_texture_swizzle or DEPTH_TEXTURE_MODE, + key-swizzles[i], old_key-swizzles[i]); + } + found |= key_debug(GL_CLAMP enabled on any texture unit's 1st coordinate, + key-gl_clamp_mask[0], old_key-gl_clamp_mask[0]); + found |= key_debug(GL_CLAMP enabled on any texture unit's 2nd coordinate, + key-gl_clamp_mask[1], old_key-gl_clamp_mask[1]); + found |= key_debug(GL_CLAMP enabled on any texture unit's 3rd coordinate, + key-gl_clamp_mask[2], old_key-gl_clamp_mask[2]); + found |= key_debug(GL_MESA_ycbcr texturing\n, + key-yuvtex_mask, old_key-yuvtex_mask); + found |= key_debug(GL_MESA_ycbcr UV swapping\n, + key-yuvtex_swap_mask, old_key-yuvtex_swap_mask); + + return found; +} + +void +brw_wm_debug_recompile(struct brw_context *brw, + struct gl_shader_program *prog, + const struct brw_wm_prog_key *key) +{ + struct brw_cache_item *c = NULL; + const struct brw_wm_prog_key 
*old_key = NULL; + bool found = false; + + perf_debug(Recompiling fragment shader for program %d\n, prog-Name); + + for (unsigned int i = 0; i brw-cache.size; i++) { + for (c = brw-cache.items[i]; c; c = c-next) { + if (c-cache_id == BRW_WM_PROG) { +old_key = c-key; + +if (old_key-program_string_id == key-program_string_id) + break; + } +
[Mesa-dev] [PATCH 5/8] i965: Add performance debug for fast clear fallbacks.
---
 src/mesa/drivers/dri/i965/brw_clear.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clear.c b/src/mesa/drivers/dri/i965/brw_clear.c
index 31c2e45..71d7c48 100644
--- a/src/mesa/drivers/dri/i965/brw_clear.c
+++ b/src/mesa/drivers/dri/i965/brw_clear.c
@@ -107,14 +107,22 @@ brw_fast_clear_depth(struct gl_context *ctx)
        * a previous clear had happened at a different clear value and resolve it
        * first.
        */
-      if (ctx->Scissor.Enabled)
+      if (ctx->Scissor.Enabled) {
+         perf_debug("Failed to fast clear depth due to scissor being enabled.  "
+                    "Possible 5%% performance win if avoided.\n");
          return false;
+      }
 
       /* The rendered area has to be 8x4 samples, not resolved pixels, so we look
        * at the miptree slice dimensions instead of renderbuffer size.
        */
       if (mt->level[depth_irb->mt_level].width % 8 != 0 ||
           mt->level[depth_irb->mt_level].height % 4 != 0) {
+         perf_debug("Failed to fast clear depth due to width/height %d,%d not "
+                    "being aligned to 8,4.  Possible 5%% performance win if "
+                    "avoided\n",
+                    mt->level[depth_irb->mt_level].width,
+                    mt->level[depth_irb->mt_level].height);
          return false;
       }
 
-- 
1.7.10.4
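The two fallback conditions this patch reports can be restated as a small predicate. This is a sketch only: can_fast_clear_depth_sketch() is a made-up name, and it covers just the scissor and 8x4 alignment checks visible in the hunk, not whatever else brw_fast_clear_depth() tests.

```c
#include <stdbool.h>

/* Hypothetical helper: true when neither fallback from the hunk applies. */
static bool
can_fast_clear_depth_sketch(bool scissor_enabled, unsigned width, unsigned height)
{
   /* A scissored clear touches only part of the buffer, so no fast path. */
   if (scissor_enabled)
      return false;

   /* The slice has to be a whole number of 8x4 blocks. */
   if (width % 8 != 0 || height % 4 != 0)
      return false;

   return true;
}
```

For example, a 64x32 slice passes, while 60x32 or 64x30 would hit the alignment message.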
[Mesa-dev] [PATCH 8/8] i965: Add perf debug for stalls during shader compiles.
--- src/mesa/drivers/dri/i965/brw_fs.cpp| 13 + src/mesa/drivers/dri/i965/brw_vec4_emit.cpp | 20 ++-- src/mesa/drivers/dri/intel/intel_screen.c | 13 + src/mesa/drivers/dri/intel/intel_screen.h |1 + 4 files changed, 45 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 90a1d92..dfd101f 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2044,10 +2044,18 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c, struct gl_shader_program *prog) { struct intel_context *intel = brw-intel; + bool start_busy = false; + float start_time = 0; if (!prog) return false; + if (unlikely(INTEL_DEBUG DEBUG_PERF)) { + start_busy = (intel-batch.last_bo +drm_intel_bo_busy(intel-batch.last_bo)); + start_time = get_time(); + } + struct brw_shader *shader = (brw_shader *) prog-_LinkedShaders[MESA_SHADER_FRAGMENT]; if (!shader) @@ -2090,6 +2098,11 @@ brw_wm_fs_emit(struct brw_context *brw, struct brw_wm_compile *c, if (shader-compiled_once) brw_wm_debug_recompile(brw, prog, c-key); shader-compiled_once = true; + + if (start_busy !drm_intel_bo_busy(intel-batch.last_bo)) { + perf_debug(FS compile took %.03f ms and stalled the GPU\n, +(get_time() - start_time) / 1000); + } } return true; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp index 788d7b5..0db435b 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp @@ -1017,9 +1017,19 @@ extern C { bool brw_vs_emit(struct gl_shader_program *prog, struct brw_vs_compile *c) { + struct intel_context *intel = c-func.brw-intel; + bool start_busy = false; + float start_time = 0; + if (!prog) return false; + if (unlikely(INTEL_DEBUG DEBUG_PERF)) { + start_busy = (intel-batch.last_bo +drm_intel_bo_busy(intel-batch.last_bo)); + start_time = get_time(); + } + struct brw_shader *shader = (brw_shader *) 
prog-_LinkedShaders[MESA_SHADER_VERTEX]; if (!shader) @@ -1031,8 +1041,14 @@ brw_vs_emit(struct gl_shader_program *prog, struct brw_vs_compile *c) printf(\n\n); } - if (shader-compiled_once) { - perf_debug(Recompiling vertex shader for program %d\n, prog-Name); + if (unlikely(INTEL_DEBUG DEBUG_PERF)) { + if (shader-compiled_once) { + perf_debug(Recompiling vertex shader for program %d\n, prog-Name); + } + if (start_busy !drm_intel_bo_busy(intel-batch.last_bo)) { + perf_debug(VS compile took %.03f ms and stalled the GPU\n, +(get_time() - start_time) / 1000); + } } vec4_visitor v(c, prog, shader); diff --git a/src/mesa/drivers/dri/intel/intel_screen.c b/src/mesa/drivers/dri/intel/intel_screen.c index 5c38c8d..56abc12 100644 --- a/src/mesa/drivers/dri/intel/intel_screen.c +++ b/src/mesa/drivers/dri/intel/intel_screen.c @@ -109,6 +109,19 @@ const GLuint __driNConfigOptions = 15; static PFNGLXCREATECONTEXTMODES create_context_modes = NULL; #endif /*USE_NEW_INTERFACE */ +/** + * For debugging, this returns a time in seconds since the first call. + */ +double +get_time(void) +{ + struct timespec tp; + + clock_gettime(CLOCK_MONOTONIC, tp); + + return tp.tv_sec + tp.tv_nsec / 10.0; +} + void aub_dump_bmp(struct gl_context *ctx) { diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h index c0cc284..f5a374d 100644 --- a/src/mesa/drivers/dri/intel/intel_screen.h +++ b/src/mesa/drivers/dri/intel/intel_screen.h @@ -81,6 +81,7 @@ intelMakeCurrent(__DRIcontext * driContextPriv, __DRIdrawable * driDrawPriv, __DRIdrawable * driReadPriv); +double get_time(void); void aub_dump_bmp(struct gl_context *ctx); #endif -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
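For reference, the timing pattern used in this patch (sample a monotonic clock before the compile, report the difference afterwards) can be sketched in plain C as below. This is only an illustration: monotonic_seconds() and elapsed_ms() are made-up names, not the helpers from the patch, and the sketch assumes a POSIX system with clock_gettime(). It keeps the timestamps in double so that sub-millisecond differences survive, and it multiplies by 1000 because it converts seconds to milliseconds.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() under strict -std= modes */
#endif
#include <time.h>

/* Seconds on the monotonic clock, measured from an unspecified starting point. */
static double
monotonic_seconds(void)
{
   struct timespec tp;

   clock_gettime(CLOCK_MONOTONIC, &tp);

   /* tv_nsec counts nanoseconds, so divide by 1e9 to get seconds. */
   return tp.tv_sec + tp.tv_nsec / 1e9;
}

/* Convert a pair of timestamps in seconds to an elapsed time in milliseconds. */
static double
elapsed_ms(double start_s, double end_s)
{
   return (end_s - start_s) * 1000.0;
}
```

A caller would store monotonic_seconds() before the work and pass it, together with a second sample, to elapsed_ms() when printing with a format such as "%.03f ms".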
[Mesa-dev] [PATCH 7/8] i965: Add performance debug for when the state cache gets nuked.
---
 src/mesa/drivers/dri/i965/brw_state_cache.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state_cache.c b/src/mesa/drivers/dri/i965/brw_state_cache.c
index 4ae8e12..57a5ee9 100644
--- a/src/mesa/drivers/dri/i965/brw_state_cache.c
+++ b/src/mesa/drivers/dri/i965/brw_state_cache.c
@@ -375,8 +375,11 @@ brw_state_cache_check_size(struct brw_context *brw)
    /* un-tuned guess.  Each object is generally a page, so 1000 of them is 4 MB of
     * state cache.
     */
-   if (brw->cache.n_items > 1000)
+   if (brw->cache.n_items > 1000) {
+      perf_debug("Exceeded state cache size limit.  Clearing the set "
+                 "of compiled programs, which will trigger recompiles\n");
       brw_clear_cache(brw, &brw->cache);
+   }
 }
 
 
-- 
1.7.10.4
Re: [Mesa-dev] [PATCH 2/8] i965: Add INTEL_DEBUG=perf for failure to compile 16-wide shaders.
On Tue, 2012-08-07 at 11:04 -0700, Eric Anholt wrote:
> diff --git a/src/mesa/drivers/dri/intel/intel_context.h b/src/mesa/drivers/dri/intel/intel_context.h
> index 6d1a81c..c4efa54 100644
> --- a/src/mesa/drivers/dri/intel/intel_context.h
> +++ b/src/mesa/drivers/dri/intel/intel_context.h
> @@ -463,6 +463,11 @@ extern int INTEL_DEBUG;
>                          printf(__VA_ARGS__);                    \
>  } while(0)
>
> +#define perf_debug(...) do {                                    \
> +   if (unlikely(INTEL_DEBUG & DEBUG_PERF))                      \
> +      printf(__VA_ARGS__);                                      \
> +} while(0)

Should perf_debug be used in the paths in PATCH 1/8?

For series:
Reviewed-by: Jordan Justen jordan.l.jus...@intel.com

-Jordan
Re: [Mesa-dev] [PATCH] glsl: Add a lowering pass to turn complicated UBO references to vector loads.
On 08/06/2012 07:00 PM, Eric Anholt wrote: v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping the ir_variable as the variable part of the offset from handle_rvalue(), and track the constant offsets from that with a plain old integer value, avoiding a bunch of temporary variables in the array and struct handling. Also, fix file description doxygen. --- src/glsl/Makefile.sources|1 + src/glsl/ir_optimization.h |1 + src/glsl/lower_ubo_reference.cpp | 313 ++ 3 files changed, 315 insertions(+) create mode 100644 src/glsl/lower_ubo_reference.cpp diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index f2743f7..765f06a 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -66,6 +66,7 @@ LIBGLSL_CXX_FILES = \ $(GLSL_SRCDIR)/lower_vec_index_to_swizzle.cpp \ $(GLSL_SRCDIR)/lower_vector.cpp \ $(GLSL_SRCDIR)/lower_output_reads.cpp \ + $(GLSL_SRCDIR)/lower_ubo_reference.cpp \ $(GLSL_SRCDIR)/opt_algebraic.cpp \ $(GLSL_SRCDIR)/opt_array_splitting.cpp \ $(GLSL_SRCDIR)/opt_constant_folding.cpp \ diff --git a/src/glsl/ir_optimization.h b/src/glsl/ir_optimization.h index c435d77..2220d51 100644 --- a/src/glsl/ir_optimization.h +++ b/src/glsl/ir_optimization.h @@ -74,6 +74,7 @@ bool lower_variable_index_to_cond_assign(exec_list *instructions, bool lower_quadop_vector(exec_list *instructions, bool dont_lower_swz); bool lower_clip_distance(exec_list *instructions); void lower_output_reads(exec_list *instructions); +void lower_ubo_reference(struct gl_shader *shader, exec_list *instructions); bool optimize_redundant_jumps(exec_list *instructions); bool optimize_split_arrays(exec_list *instructions, bool linked); diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp new file mode 100644 index 000..f930da5 --- /dev/null +++ b/src/glsl/lower_ubo_reference.cpp @@ -0,0 +1,313 @@ +/* + * Copyright © 2012 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy 
of this software and associated documentation files (the Software), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +/** + * \file lower_ubo_reference.cpp + * + * IR lower pass to replace dereferences of variables in a uniform + * buffer object with usage of ir_binop_ubo_load expressions, each of + * which can read data up to the size of a vec4. + * + * This relieves drivers of the responsibility to deal with tricky UBO + * layout issues like std140 structures and row_major matrices on + * their own. 
+ */ + +#include ir.h +#include ir_builder.h +#include ir_rvalue_visitor.h +#include main/macros.h + +using namespace ir_builder; + +namespace { +class lower_ubo_reference_visitor : public ir_rvalue_enter_visitor { +public: + lower_ubo_reference_visitor(struct gl_shader *shader) + : shader(shader) + { + } + + void handle_rvalue(ir_rvalue **rvalue); + void emit_ubo_loads(ir_dereference *deref, ir_variable *base_offset, +unsigned int deref_offset); + ir_expression *ubo_load(const struct glsl_type *type, +ir_rvalue *offset); + + void *mem_ctx; + struct gl_shader *shader; + struct gl_uniform_buffer_variable *ubo_var; + unsigned uniform_block; + bool progress; +}; + +static inline unsigned int +align(unsigned int a, unsigned int align) +{ + return (a + align - 1) / align * align; +} + +void +lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue) +{ + if (!*rvalue) + return; + + ir_dereference *deref = (*rvalue)-as_dereference(); + if (!deref) + return; + + ir_variable *var = deref-variable_referenced(); + if (!var || var-uniform_block == -1) + return; + + mem_ctx = ralloc_parent(*rvalue); + uniform_block = var-uniform_block; + struct gl_uniform_block *block =
[Mesa-dev] [PATCH] mesa: Fix glPopAttrib() behavior on GL_FRAMEBUFFER_SRGB.
I happened to notice this while looking at a blit pass in l4d2, which had an
optional push/pop around framebuffer srgb setting. It didn't matter in the
end, but the fix is sitting in my tree now.
---
 src/mesa/main/attrib.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/src/mesa/main/attrib.c b/src/mesa/main/attrib.c
index 8bc7c34..9cab35b 100644
--- a/src/mesa/main/attrib.c
+++ b/src/mesa/main/attrib.c
@@ -135,6 +135,9 @@ struct gl_enable_attrib
    /* GL_ARB_point_sprite / GL_NV_point_sprite */
    GLboolean PointSprite;
    GLboolean FragmentShaderATI;
+
+   /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */
+   GLboolean sRGBEnabled;
 };
 
 
@@ -322,6 +325,9 @@ _mesa_PushAttrib(GLbitfield mask)
       attr->VertexProgramPointSize = ctx->VertexProgram.PointSizeEnabled;
       attr->VertexProgramTwoSide = ctx->VertexProgram.TwoSideEnabled;
       save_attrib_data(&head, GL_ENABLE_BIT, attr);
+
+      /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */
+      attr->sRGBEnabled = ctx->Color.sRGBEnabled;
    }
 
    if (mask & GL_EVAL_BIT) {
@@ -617,6 +623,10 @@ pop_enable_group(struct gl_context *ctx, const struct gl_enable_attrib *enable)
                    enable->VertexProgramTwoSide,
                    GL_VERTEX_PROGRAM_TWO_SIDE_ARB);
 
+   /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */
+   TEST_AND_UPDATE(ctx->Color.sRGBEnabled, enable->sRGBEnabled,
+                   GL_FRAMEBUFFER_SRGB);
+
    /* texture unit enables */
    for (i = 0; i < ctx->Const.MaxTextureUnits; i++) {
       const GLbitfield enabled = enable->Texture[i];
@@ -981,6 +991,9 @@ _mesa_PopAttrib(void)
                _mesa_set_enable(ctx, GL_DITHER, color->DitherFlag);
                _mesa_ClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, color->ClampFragmentColor);
                _mesa_ClampColorARB(GL_CLAMP_READ_COLOR_ARB, color->ClampReadColor);
+
+               /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */
+               _mesa_set_enable(ctx, GL_FRAMEBUFFER_SRGB, color->sRGBEnabled);
             }
             break;
          case GL_CURRENT_BIT:
-- 
1.7.10.4
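The bug class being fixed here (an enable that is not captured on push cannot be restored on pop) is easy to reproduce in a toy attribute stack. The sketch below is not Mesa code: struct enable_state, struct context_sketch, push_enable() and pop_enable() are hypothetical names, and the srgb field plays the role of the sRGBEnabled field added by the patch.

```c
#include <stdbool.h>

#define MAX_STACK_DEPTH 16   /* hypothetical limit for this sketch */

/* The group of enables saved together; any field left out would leak across a pop. */
struct enable_state {
   bool dither;
   bool srgb;   /* plays the role of the field the patch adds */
};

struct context_sketch {
   struct enable_state current;
   struct enable_state stack[MAX_STACK_DEPTH];
   int depth;
};

/* Save a copy of the current enables; false on stack overflow. */
static bool
push_enable(struct context_sketch *ctx)
{
   if (ctx->depth >= MAX_STACK_DEPTH)
      return false;
   ctx->stack[ctx->depth++] = ctx->current;
   return true;
}

/* Restore the most recently saved enables; false on stack underflow. */
static bool
pop_enable(struct context_sketch *ctx)
{
   if (ctx->depth == 0)
      return false;
   ctx->current = ctx->stack[--ctx->depth];
   return true;
}
```

With srgb in the saved struct, enabling it, pushing, disabling it and popping leaves it enabled again, which is the behaviour the patch is after for GL_FRAMEBUFFER_SRGB.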
[Mesa-dev] [Bug 53226] New: mesa/demos does not build with mesa git because of gbm API changes
https://bugs.freedesktop.org/show_bug.cgi?id=53226

             Bug #: 53226
           Summary: mesa/demos does not build with mesa git because of gbm API changes
    Classification: Unclassified
           Product: Mesa
           Version: git
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Demos
        AssignedTo: mesa-dev@lists.freedesktop.org
        ReportedBy: freedesk...@blino.org

mesa/demos does not build with mesa git because of gbm API changes.
gbm_bo_get_pitch() is now gbm_bo_get_stride(), as of mesa commit
7250cd506baa0bd4649b30d87509cdd0cbc06a57
[Mesa-dev] [Bug 53226] mesa/demos does not build with mesa git because of gbm API changes
https://bugs.freedesktop.org/show_bug.cgi?id=53226

--- Comment #1 from Olivier Blin freedesk...@blino.org 2012-08-07 21:14:27 UTC ---
Created attachment 65255
  --> https://bugs.freedesktop.org/attachment.cgi?id=65255
eglkms: adapt to gbm stride API change

This patch fixes build with mesa git.
Re: [Mesa-dev] R600 VDPAU 422 regression since r600g: make sure copying of all texture formats is accelerated
Do you have any idea what could be wrong with the patch? Also could
please tell me how to setup VDPAU and where to download the tests, so
that I can test this.

Marek

On Tue, Aug 7, 2012 at 11:25 AM, Andy Furniss andy...@ukfsn.org wrote:
> Marek Olšák wrote:
>> Does the attached patch fix this issue?
>
> Not properly - it fixes the invalid command stream but the output is not
> quite right -
>
> http://www.andyqos.ukfsn.org/vdpau-422-patched.png
>
>> Marek
>>
>> On Mon, Aug 6, 2012 at 5:40 PM, Andy Furniss andy...@ukfsn.org wrote:
>>> Kernel is dcn card is rv790 - vdpau csc/scale regressed.
>>>
>>> This only shows with 422 colour so most things work.
>>>
>>> commit 7c371f46958910dd2ca9487c89af1b72bbfdada9
>>> Author: Marek Olšák mar...@gmail.com
>>> Date: Sat Jul 28 00:38:42 2012 +0200
>>>
>>> r600g: make sure copying of all texture formats is accelerated
>>>
>>> [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
>>> radeon :01:00.0: texture bo too small ((704 576) (1 1) 0 26 0 - 1622016 have 884736)
>>> radeon :01:00.0: alignments 384 1 1 1
[Mesa-dev] [Bug 51749] make[6]: ../../../../src/mesa/Makefile.old: No such file or directory
https://bugs.freedesktop.org/show_bug.cgi?id=51749

--- Comment #1 from Matt Turner matts...@gmail.com 2012-08-07 21:49:33 UTC ---
Not a problem now, right?
Re: [Mesa-dev] R600 VDPAU 422 regression since r600g: make sure copying of all texture formats is accelerated
On Tue, Aug 7, 2012 at 5:43 PM, Marek Olšák mar...@gmail.com wrote:
> Do you have any idea what could be wrong with the patch? Also could
> please tell me how to setup VDPAU and where to download the tests, so
> that I can test this.

Just add:
--enable-vdpau
to your mesa configure line to enable it. To test it, try and play
back an MPEG1/2 file with mplayer or another app that supports vdpau.

Alex

> Marek
>
> On Tue, Aug 7, 2012 at 11:25 AM, Andy Furniss andy...@ukfsn.org wrote:
>> Marek Olšák wrote:
>>> Does the attached patch fix this issue?
>>
>> Not properly - it fixes the invalid command stream but the output is not
>> quite right -
>>
>> http://www.andyqos.ukfsn.org/vdpau-422-patched.png
>>
>>> Marek
>>>
>>> On Mon, Aug 6, 2012 at 5:40 PM, Andy Furniss andy...@ukfsn.org wrote:
>>>> Kernel is dcn card is rv790 - vdpau csc/scale regressed.
>>>>
>>>> This only shows with 422 colour so most things work.
>>>>
>>>> commit 7c371f46958910dd2ca9487c89af1b72bbfdada9
>>>> Author: Marek Olšák mar...@gmail.com
>>>> Date: Sat Jul 28 00:38:42 2012 +0200
>>>>
>>>> r600g: make sure copying of all texture formats is accelerated
>>>>
>>>> [drm:radeon_cs_ib_chunk] *ERROR* Invalid command stream !
>>>> radeon :01:00.0: texture bo too small ((704 576) (1 1) 0 26 0 - 1622016 have 884736)
>>>> radeon :01:00.0: alignments 384 1 1 1
Re: [Mesa-dev] [PATCH] mesa: Fix glPopAttrib() behavior on GL_FRAMEBUFFER_SRGB.
On 08/07/2012 03:05 PM, Eric Anholt wrote: I happened to notice this while looking at a blit pass in l4d2, which had an optional push/pop around framebuffer srgb setting. It didn't matter in the end, but the fix is sitting in my tree now. --- src/mesa/main/attrib.c | 13 + 1 file changed, 13 insertions(+) diff --git a/src/mesa/main/attrib.c b/src/mesa/main/attrib.c index 8bc7c34..9cab35b 100644 --- a/src/mesa/main/attrib.c +++ b/src/mesa/main/attrib.c @@ -135,6 +135,9 @@ struct gl_enable_attrib /* GL_ARB_point_sprite / GL_NV_point_sprite */ GLboolean PointSprite; GLboolean FragmentShaderATI; + + /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */ + GLboolean sRGBEnabled; }; @@ -322,6 +325,9 @@ _mesa_PushAttrib(GLbitfield mask) attr-VertexProgramPointSize = ctx-VertexProgram.PointSizeEnabled; attr-VertexProgramTwoSide = ctx-VertexProgram.TwoSideEnabled; save_attrib_data(head, GL_ENABLE_BIT, attr); + + /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */ + attr-sRGBEnabled = ctx-Color.sRGBEnabled; } if (mask GL_EVAL_BIT) { @@ -617,6 +623,10 @@ pop_enable_group(struct gl_context *ctx, const struct gl_enable_attrib *enable) enable-VertexProgramTwoSide, GL_VERTEX_PROGRAM_TWO_SIDE_ARB); + /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */ + TEST_AND_UPDATE(ctx-Color.sRGBEnabled, enable-sRGBEnabled, + GL_FRAMEBUFFER_SRGB); + /* texture unit enables */ for (i = 0; i ctx-Const.MaxTextureUnits; i++) { const GLbitfield enabled = enable-Texture[i]; @@ -981,6 +991,9 @@ _mesa_PopAttrib(void) _mesa_set_enable(ctx, GL_DITHER, color-DitherFlag); _mesa_ClampColorARB(GL_CLAMP_FRAGMENT_COLOR_ARB, color-ClampFragmentColor); _mesa_ClampColorARB(GL_CLAMP_READ_COLOR_ARB, color-ClampReadColor); + + /* GL_ARB_framebuffer_sRGB / GL_EXT_framebuffer_sRGB */ + _mesa_set_enable(ctx, GL_FRAMEBUFFER_SRGB, color-sRGBEnabled); } break; case GL_CURRENT_BIT: Looks OK to me. Candidate for 8.0 branch? 
Reviewed-by: Brian Paul bri...@vmware.com
Re: [Mesa-dev] R600 VDPAU 422 regression since r600g: make sure copying of all texture formats is accelerated
Marek Olšák wrote:
> Do you have any idea what could be wrong with the patch? Also could
> please tell me how to setup VDPAU and where to download the tests, so
> that I can test this.

I don't know about the patch. One thing which may be a clue or a red
herring is that when Christian first implemented 422 it was corrupt in
the same way. He said he thought it was to do with tiling - and fixed it
in (IIRC) ~deathsimple/mesa

I run LFS so to get VDPAU I installed the lib from -

git://people.freedesktop.org/~aplattner/libvdpau

mplayer built from svn should find the lib and enable vdpau during
configure.

svn checkout svn://svn.mplayerhq.hu/mplayer/trunk mplayer

I guess

http://www.mplayerhq.hu/MPlayer/releases/MPlayer-1.1.tar.xz

will also work if you don't want to do svn.

For me -vo vdpau is the default output with mplayer - You may need to be
explicit I guess if you already have some distro version with a config
installed.

This issue is just with -vo vdpau not decode (-vc ffmpeg12vdpau) as 422
isn't implemented for decode anyway.

When I autogen mesa I have --enable-gallium-g3dvl

The sample I am using is from

ftp://ftp.tek.com/tv/test/streams/Element/MPEG-Video/625/

The ones ending 400 are 422 the others are 420. The exact file is

ftp://ftp.tek.com/tv/test/streams/Element/MPEG-Video/625/flwr_400.m2v
[Mesa-dev] Pipe control patches redux
Here's v3 of Daniel's PIPE_CONTROL series. I reworked it substantially, moving the length change to the beginning and splitting up the patches into smaller ones that only do one thing at a time, to make it easier to bisect or revert if there are any issues. (I'm pretty paranoid when it comes to PIPE_CONTROLs.) If there are no objections, I'll push these tomorrow.
[Mesa-dev] [PATCH 1/7] intel: Make the length for PIPE_CONTROL explicit.
PIPE_CONTROL has variable length, depending upon generation and whether
we want to do 32-bit or 64-bit data writes. Make it explicit, rather than
hiding a length of 4 in the #define for _3DSTATE_PIPE_CONTROL.

Generated by s/3DSTATE_PIPE_CONTROL/3DSTATE_PIPE_CONTROL | (4 - 2)/g.

This is equivalent since the #define used to have | 2 in it. A grep
through the sources shows that all instances have been converted, so it's
safe to remove the | 2 from the #define.

Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c       | 20 ++--
 src/mesa/drivers/dri/i965/gen6_vs_state.c      | 2 +-
 src/mesa/drivers/dri/intel/intel_batchbuffer.c | 16
 src/mesa/drivers/dri/intel/intel_reg.h         | 2 +-
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 240fe32..921fecd 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -132,7 +132,7 @@ brw_begin_query(struct gl_context *ctx, struct gl_query_object *q)
       if (intel->gen >= 6) {
          BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
          OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
          OUT_RELOC(query->bo,
                    I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
@@ -143,7 +143,7 @@ brw_begin_query(struct gl_context *ctx, struct gl_query_object *q)
       } else {
          BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL |
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
                    PIPE_CONTROL_WRITE_TIMESTAMP);
          OUT_RELOC(query->bo,
                    I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
@@ -202,7 +202,7 @@ brw_end_query(struct gl_context *ctx, struct gl_query_object *q)
    case GL_TIME_ELAPSED_EXT:
       if (intel->gen >= 6) {
          BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
          OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
          OUT_RELOC(query->bo,
                    I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
@@ -213,7 +213,7 @@ brw_end_query(struct gl_context *ctx, struct gl_query_object *q)
       } else {
          BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL |
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
                    PIPE_CONTROL_WRITE_TIMESTAMP);
          OUT_RELOC(query->bo,
                    I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
@@ -340,12 +340,12 @@ brw_emit_query_begin(struct brw_context *brw)
       BEGIN_BATCH(8);
       /* workaround: CS stall required before depth stall. */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
       OUT_BATCH(PIPE_CONTROL_CS_STALL);
       OUT_BATCH(0); /* write address */
       OUT_BATCH(0); /* write data */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
       OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
                 PIPE_CONTROL_WRITE_DEPTH_COUNT);
       OUT_RELOC(brw->query.bo,
@@ -357,7 +357,7 @@ brw_emit_query_begin(struct brw_context *brw)
    } else {
       BEGIN_BATCH(4);
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL |
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
                 PIPE_CONTROL_DEPTH_STALL |
                 PIPE_CONTROL_WRITE_DEPTH_COUNT);
       /* This object could be mapped cacheable, but we don't have an exposed
@@ -397,12 +397,12 @@ brw_emit_query_end(struct brw_context *brw)
    if (intel->gen >= 6) {
       BEGIN_BATCH(8);
       /* workaround: CS stall required before depth stall. */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
       OUT_BATCH(PIPE_CONTROL_CS_STALL);
       OUT_BATCH(0); /* write address */
       OUT_BATCH(0); /* write data */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
       OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
                 PIPE_CONTROL_WRITE_DEPTH_COUNT);
       OUT_RELOC(brw->query.bo,
@@ -414,7 +414,7 @@ brw_emit_query_end(struct brw_context *brw)
    } else {
       BEGIN_BATCH(4);
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL |
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
                 PIPE_CONTROL_DEPTH_STALL |
                 PIPE_CONTROL_WRITE_DEPTH_COUNT);
       OUT_RELOC(brw->query.bo,
diff --git a/src/mesa/drivers/dri/i965/gen6_vs_state.c b/src/mesa/drivers/dri/i965/gen6_vs_state.c
index 3392a9f..c562cc7 100644
--- a/src/mesa/drivers/dri/i965/gen6_vs_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_vs_state.c
@@ -216,7 +216,7 @@ upload_vs_state(struct brw_context *brw)
    intel_emit_post_sync_nonzero_flush(intel);
    BEGIN_BATCH(4);
-
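The `| (4 - 2)` idiom in this patch can be read as "total packet length in DWords, minus two", ORed into the command header. A minimal sketch of that arithmetic follows, assuming the length field sits in the low 8 bits of the header. The opcode value and the `example_*` helper names are illustrative only and are not taken from intel_reg.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative opcode bits only (assumption); the real definition of
 * _3DSTATE_PIPE_CONTROL lives in intel_reg.h. */
#define EXAMPLE_PIPE_CONTROL_OPCODE 0x7a000000u

/* Build a header for a packet that is total_dwords DWords long.
 * The length field stores total_dwords - 2, mirroring "| (4 - 2)". */
static uint32_t
example_pipe_control_header(unsigned total_dwords)
{
   assert(total_dwords >= 2);
   return EXAMPLE_PIPE_CONTROL_OPCODE | (total_dwords - 2);
}

/* Recover the total packet length from a header built above.
 * Assumes the length field occupies the low 8 bits. */
static unsigned
example_header_total_dwords(uint32_t header)
{
   return (header & 0xffu) + 2;
}
```

With the length written at each call site, a 4-DWord packet and a 5-DWord packet differ only in the expression passed to OUT_BATCH, which is what the later patches in the series rely on.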
[Mesa-dev] [PATCH 2/7] i965: Refactor timestamp write PIPE_CONTROLs into a helper function.
This consolidates the complexity in one place, which is important because
it's about to get even more complicated.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 80
 1 file changed, 30 insertions(+), 50 deletions(-)

Eric wanted a helper function. He didn't say exactly what he wanted, so I
made this. It at least consolidates the two timestamp bits.

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 921fecd..229aeb7 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -45,6 +45,33 @@
 #include "intel_batchbuffer.h"
 #include "intel_reg.h"
 
+static void
+write_timestamp(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
+{
+   if (intel->gen >= 6) {
+      BEGIN_BATCH(4);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
+      OUT_RELOC(query_bo,
+                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+                PIPE_CONTROL_GLOBAL_GTT_WRITE |
+                idx * sizeof(uint64_t));
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   } else {
+      BEGIN_BATCH(4);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
+                PIPE_CONTROL_WRITE_TIMESTAMP);
+      OUT_RELOC(query_bo,
+                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+                PIPE_CONTROL_GLOBAL_GTT_WRITE |
+                idx * sizeof(uint64_t));
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   }
+}
+
 /** Waits on the query object's BO and totals the results for this query */
 static void
 brw_queryobj_get_results(struct gl_context *ctx,
@@ -127,32 +154,8 @@ brw_begin_query(struct gl_context *ctx, struct gl_query_object *q)
    switch (query->Base.Target) {
    case GL_TIME_ELAPSED_EXT:
       drm_intel_bo_unreference(query->bo);
-      query->bo = drm_intel_bo_alloc(intel->bufmgr, "timer query",
-                                     4096, 4096);
-
-      if (intel->gen >= 6) {
-         BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-         OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
-         OUT_RELOC(query->bo,
-                   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                   PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                   0);
-         OUT_BATCH(0);
-         ADVANCE_BATCH();
-
-      } else {
-         BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
-                   PIPE_CONTROL_WRITE_TIMESTAMP);
-         OUT_RELOC(query->bo,
-                   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                   PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                   0);
-         OUT_BATCH(0);
-         OUT_BATCH(0);
-         ADVANCE_BATCH();
-      }
+      query->bo = drm_intel_bo_alloc(intel->bufmgr, "timer query", 4096, 4096);
+      write_timestamp(intel, query->bo, 0);
       break;
 
    case GL_SAMPLES_PASSED_ARB:
@@ -200,30 +203,7 @@ brw_end_query(struct gl_context *ctx, struct gl_query_object *q)
    switch (query->Base.Target) {
    case GL_TIME_ELAPSED_EXT:
-      if (intel->gen >= 6) {
-         BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-         OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
-         OUT_RELOC(query->bo,
-                   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                   PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                   8);
-         OUT_BATCH(0);
-         ADVANCE_BATCH();
-
-      } else {
-         BEGIN_BATCH(4);
-         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
-                   PIPE_CONTROL_WRITE_TIMESTAMP);
-         OUT_RELOC(query->bo,
-                   I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                   PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                   8);
-         OUT_BATCH(0);
-         OUT_BATCH(0);
-         ADVANCE_BATCH();
-      }
-
+      write_timestamp(intel, query->bo, 1);
       intel_batchbuffer_flush(intel);
       break;
 
-- 
1.7.11.4
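The helper replaces the hard-coded byte offsets 0 and 8 with `idx * sizeof(uint64_t)`, so slot 0 holds the begin timestamp and slot 1 the end timestamp. A small sketch of that mapping, using made-up `example_*` names, is shown below. The subtraction is only an assumption about how a reader of the buffer might combine the two slots, not a copy of the driver's result code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of 64-bit result slot idx inside the query buffer,
 * matching the "idx * sizeof(uint64_t)" expression in the patch. */
static size_t
example_slot_offset(int idx)
{
   assert(idx >= 0);
   return (size_t)idx * sizeof(uint64_t);
}

/* Hypothetical consumer: elapsed ticks between the value written at
 * slot 0 (begin) and the value written at slot 1 (end). */
static uint64_t
example_elapsed_ticks(uint64_t begin_value, uint64_t end_value)
{
   return end_value - begin_value;
}
```

Passing an index rather than a byte offset keeps the call sites (`write_timestamp(intel, query->bo, 0)` and `..., 1)`) independent of the slot size.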
[Mesa-dev] [PATCH 3/7] i965: Use 64-bit writes for timestamp queries.
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense for
immediate writes.

Daniel discovered this by writing a pattern into the query object bo and
noticing that the high 32-bits were left intact, even on those pipe
control writes that seemingly worked.

Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 229aeb7..afa3091 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -49,14 +49,15 @@ static void
 write_timestamp(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
 {
    if (intel->gen >= 6) {
-      BEGIN_BATCH(4);
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      BEGIN_BATCH(5);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
       OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
       OUT_RELOC(query_bo,
                 I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
                 PIPE_CONTROL_GLOBAL_GTT_WRITE |
                 idx * sizeof(uint64_t));
       OUT_BATCH(0);
+      OUT_BATCH(0);
       ADVANCE_BATCH();
    } else {
       BEGIN_BATCH(4);
-- 
1.7.11.4
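The pattern test described in the commit message can be modelled on the CPU. The sketch below is a simulation under the assumption stated in the message: a 4-DWord packet only updates the low 32 bits of the slot, while a 5-DWord packet updates all 64. It does not talk to hardware, and the pattern value and `example_*` names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary marker written into the buffer before the GPU write. */
#define EXAMPLE_PATTERN 0xdeadbeefdeadbeefull

/* Model of the observed behaviour: return the slot contents after a
 * write of 'value' issued with a packet of total_dwords DWords. */
static uint64_t
example_write(uint64_t old_slot, uint64_t value, unsigned total_dwords)
{
   if (total_dwords >= 5)
      return value;                                /* full 64-bit write */
   return (old_slot & 0xffffffff00000000ull) |     /* high half kept    */
          (value & 0xffffffffull);                 /* low half replaced */
}

/* True if the high 32 bits still carry the marker, i.e. the write
 * only touched the low half. */
static bool
example_high_half_untouched(uint64_t slot)
{
   return (slot >> 32) == (EXAMPLE_PATTERN >> 32);
}
```

Checking the high half against a known marker is what distinguishes a truncated write from a genuine 64-bit result whose upper bits happen to be zero.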
[Mesa-dev] [PATCH 4/7] i965: Emit a CS stall before timestamp writes.
This implements one of the Sandybridge PIPE_CONTROL workarounds. It
doesn't appear to be required for Ivybridge.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 14 ++
 1 file changed, 14 insertions(+)

Unlike Daniel's series, I made this only apply on Sandybridge. It appears
that it isn't required, from reading the docs, and I believe Eric made a
comment to that effect during the v1 review.

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index afa3091..cbe67ad 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -49,6 +49,20 @@ static void
 write_timestamp(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
 {
    if (intel->gen >= 6) {
+      /* Emit workaround flushes: */
+      if (intel->gen == 6) {
+         /* The timestamp write below is a non-zero post-sync op, which on
+          * Gen6 necessitates a CS stall. CS stalls need stall at scoreboard
+          * set. See the comments for intel_emit_post_sync_nonzero_flush().
+          */
+         BEGIN_BATCH(4);
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+         OUT_BATCH(PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD);
+         OUT_BATCH(0);
+         OUT_BATCH(0);
+         ADVANCE_BATCH();
+      }
+
       BEGIN_BATCH(5);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
       OUT_BATCH(PIPE_CONTROL_WRITE_TIMESTAMP);
-- 
1.7.11.4
[Mesa-dev] [PATCH 5/7] i965: Refactor depth count write PIPE_CONTROLs into a helper function.
This consolidates the complexity in one place, which is important because
it's about to get even more complicated.

Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 111 ---
 1 file changed, 43 insertions(+), 68 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index cbe67ad..d45edc1 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -87,6 +87,47 @@ write_timestamp(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
    }
 }
 
+static void
+write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
+{
+   if (intel->gen >= 6) {
+      BEGIN_BATCH(8);
+
+      /* workaround: CS stall required before depth stall. */
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      OUT_BATCH(PIPE_CONTROL_CS_STALL);
+      OUT_BATCH(0); /* write address */
+      OUT_BATCH(0); /* write data */
+
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
+                PIPE_CONTROL_WRITE_DEPTH_COUNT);
+      OUT_RELOC(query_bo,
+                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+                PIPE_CONTROL_GLOBAL_GTT_WRITE |
+                (idx * sizeof(uint64_t)));
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   } else {
+      BEGIN_BATCH(4);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
+                PIPE_CONTROL_DEPTH_STALL |
+                PIPE_CONTROL_WRITE_DEPTH_COUNT);
+      /* This object could be mapped cacheable, but we don't have an exposed
+       * mechanism to support that. Since it's going uncached, tell GEM that
+       * we're writing to it. The usual clflush should be all that's required
+       * to pick up the results.
+       */
+      OUT_RELOC(query_bo,
+                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
+                PIPE_CONTROL_GLOBAL_GTT_WRITE |
+                (idx * sizeof(uint64_t)));
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
+   }
+}
+
 /** Waits on the query object's BO and totals the results for this query */
 static void
 brw_queryobj_get_results(struct gl_context *ctx,
@@ -331,43 +372,7 @@ brw_emit_query_begin(struct brw_context *brw)
    if (!query || brw->query.active)
       return;
 
-   if (intel->gen >= 6) {
-      BEGIN_BATCH(8);
-
-      /* workaround: CS stall required before depth stall. */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-      OUT_BATCH(PIPE_CONTROL_CS_STALL);
-      OUT_BATCH(0); /* write address */
-      OUT_BATCH(0); /* write data */
-
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
-                PIPE_CONTROL_WRITE_DEPTH_COUNT);
-      OUT_RELOC(brw->query.bo,
-                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                ((brw->query.index * 2) * sizeof(uint64_t)));
-      OUT_BATCH(0);
-      ADVANCE_BATCH();
-
-   } else {
-      BEGIN_BATCH(4);
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
-                PIPE_CONTROL_DEPTH_STALL |
-                PIPE_CONTROL_WRITE_DEPTH_COUNT);
-      /* This object could be mapped cacheable, but we don't have an exposed
-       * mechanism to support that. Since it's going uncached, tell GEM that
-       * we're writing to it. The usual clflush should be all that's required
-       * to pick up the results.
-       */
-      OUT_RELOC(brw->query.bo,
-                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                ((brw->query.index * 2) * sizeof(uint64_t)));
-      OUT_BATCH(0);
-      OUT_BATCH(0);
-      ADVANCE_BATCH();
-   }
+   write_depth_count(intel, brw->query.bo, brw->query.index * 2);
 
    if (query->bo != brw->query.bo) {
       if (query->bo != NULL)
@@ -389,37 +394,7 @@ brw_emit_query_end(struct brw_context *brw)
    if (!brw->query.active)
       return;
 
-   if (intel->gen >= 6) {
-      BEGIN_BATCH(8);
-      /* workaround: CS stall required before depth stall. */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-      OUT_BATCH(PIPE_CONTROL_CS_STALL);
-      OUT_BATCH(0); /* write address */
-      OUT_BATCH(0); /* write data */
-
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
-                PIPE_CONTROL_WRITE_DEPTH_COUNT);
-      OUT_RELOC(brw->query.bo,
-                I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
-                PIPE_CONTROL_GLOBAL_GTT_WRITE |
-                ((brw->query.index * 2 + 1) * sizeof(uint64_t)));
-      OUT_BATCH(0);
-      ADVANCE_BATCH();
-
-   } else {
-      BEGIN_BATCH(4);
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
-                PIPE_CONTROL_DEPTH_STALL |
-
[Mesa-dev] [PATCH 6/7] i965: Use 64-bit writes for occlusion queries.
The hardware seems to use the length of the PIPE_CONTROL command to
indicate whether the write is 64-bits or 32-bits. Which makes sense for
immediate writes.

Daniel discovered this by writing a pattern into the query object bo and
noticing that the high 32-bits were left intact, even on those pipe
control writes that seemingly worked.

Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index d45edc1..1e03d08 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -91,7 +91,7 @@ static void
 write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
 {
    if (intel->gen >= 6) {
-      BEGIN_BATCH(8);
+      BEGIN_BATCH(9);
 
       /* workaround: CS stall required before depth stall. */
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
@@ -99,7 +99,7 @@ write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
       OUT_BATCH(0); /* write address */
       OUT_BATCH(0); /* write data */
 
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
       OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
                 PIPE_CONTROL_WRITE_DEPTH_COUNT);
       OUT_RELOC(query_bo,
@@ -107,6 +107,7 @@ write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
                 PIPE_CONTROL_GLOBAL_GTT_WRITE |
                 (idx * sizeof(uint64_t)));
       OUT_BATCH(0);
+      OUT_BATCH(0);
       ADVANCE_BATCH();
    } else {
       BEGIN_BATCH(4);
-- 
1.7.11.4
[Mesa-dev] [PATCH 7/7] i965: Rework the extra flushes surrounding occlusion queries.
Separate out the depth stall from the depth count write. Workarounds say
that a depth stall needs to be preceded with a non-zero post-sync op (in
this case, the depth count write). Also, before the non-zero post-sync
op, we need a CS stall, which needs a stall at scoreboard.

Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>
Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
---
 src/mesa/drivers/dri/i965/brw_queryobj.c | 36
 1 file changed, 27 insertions(+), 9 deletions(-)

This does remove the CS stall on Ivybridge.

diff --git a/src/mesa/drivers/dri/i965/brw_queryobj.c b/src/mesa/drivers/dri/i965/brw_queryobj.c
index 1e03d08..4c561ad 100644
--- a/src/mesa/drivers/dri/i965/brw_queryobj.c
+++ b/src/mesa/drivers/dri/i965/brw_queryobj.c
@@ -91,17 +91,24 @@ static void
 write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
 {
    if (intel->gen >= 6) {
-      BEGIN_BATCH(9);
-
-      /* workaround: CS stall required before depth stall. */
-      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
-      OUT_BATCH(PIPE_CONTROL_CS_STALL);
-      OUT_BATCH(0); /* write address */
-      OUT_BATCH(0); /* write data */
+      /* Emit Sandybridge workaround flush: */
+      if (intel->gen == 6) {
+         /* The timestamp write below is a non-zero post-sync op, which on
+          * Gen6 necessitates a CS stall. CS stalls need stall at scoreboard
+          * set. See the comments for intel_emit_post_sync_nonzero_flush().
+          */
+         BEGIN_BATCH(4);
+         OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+         OUT_BATCH(PIPE_CONTROL_CS_STALL | PIPE_CONTROL_STALL_AT_SCOREBOARD);
+         OUT_BATCH(0);
+         OUT_BATCH(0);
+         ADVANCE_BATCH();
+      }
+      /* Emit the actual depth count write: */
+      BEGIN_BATCH(5);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (5 - 2));
-      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL |
-                PIPE_CONTROL_WRITE_DEPTH_COUNT);
+      OUT_BATCH(PIPE_CONTROL_WRITE_DEPTH_COUNT);
       OUT_RELOC(query_bo,
                 I915_GEM_DOMAIN_INSTRUCTION, I915_GEM_DOMAIN_INSTRUCTION,
                 PIPE_CONTROL_GLOBAL_GTT_WRITE |
@@ -109,6 +116,17 @@ write_depth_count(struct intel_context *intel, drm_intel_bo *query_bo, int idx)
       OUT_BATCH(0);
       OUT_BATCH(0);
       ADVANCE_BATCH();
+
+      /* We need to emit a depth stall to get the right value for the depth
+       * count. As a workaround this needs a preceeding PIPE_CONTROL with a
+       * non-zero post-sync op. The depth count write above does that for us.
+       */
+      BEGIN_BATCH(4);
+      OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2));
+      OUT_BATCH(PIPE_CONTROL_DEPTH_STALL);
+      OUT_BATCH(0);
+      OUT_BATCH(0);
+      ADVANCE_BATCH();
    } else {
       BEGIN_BATCH(4);
       OUT_BATCH(_3DSTATE_PIPE_CONTROL | (4 - 2) |
-- 
1.7.11.4
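To summarise the ordering this last patch ends up with, the sketch below counts the batch DWords that `write_depth_count()` would emit per generation, going only by the BEGIN_BATCH sizes visible in the diff above. The function name is hypothetical, and the sketch assumes no other packets are emitted around these.

```c
#include <assert.h>

/* DWords emitted for one depth count write, per the diff as posted:
 *   gen6 only: CS stall + stall at scoreboard   (4 DWords)
 *   gen6+:     depth count write, 64-bit        (5 DWords)
 *   gen6+:     depth stall                      (4 DWords)
 *   pre-gen6:  single combined PIPE_CONTROL     (4 DWords)
 */
static unsigned
example_depth_count_dwords(int gen)
{
   unsigned total = 0;

   assert(gen >= 4);   /* i965 covers gen4 and later */

   if (gen < 6)
      return 4;

   if (gen == 6)
      total += 4;      /* Sandybridge-only workaround flush */

   total += 5;         /* depth count write (non-zero post-sync op) */
   total += 4;         /* depth stall, preceded by the write above  */
   return total;
}
```

Under these assumptions Sandybridge emits 13 DWords per write and Ivybridge 9, which reflects the note that the CS stall is dropped on Ivybridge.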