Re: [Mesa-dev] glxgears is faster but 3D render is so slow
Hi Brian,

On 3/19/13, Brian Paul <bri...@vmware.com> wrote:
>> It is fair to say that when running the llvm driver on my local machine
>> (a 32-bit CentOS 6.2 without a VNC connection), it was indeed faster
>> than the xlib driver. It seems to me that the llvm driver broke the
>> xlib VNC connection, which could be because either I haven't configured
>> llvm correctly or the mesa llvm compile process has bugs.
>
> I don't understand what you mean by llvm driver broken the xlib VNC
> connection.

I have tested the llvm driver on two platforms:

(1) A local computer running CentOS 6.2, which has no hardware
acceleration but which I can access directly. The llvm driver is indeed
much faster than swrast there; I could run an application with 3D
structure rotation.

(2) A virtual machine running CentOS 6.2, which I have to access via VNC.
I was not able to run the 3D application; the graphics were jerky and it
would not respond. If I switched to swrast, the 3D application's graphics
ran much more smoothly and the response was normal, but the 3D rotation
stopped because swrast was too slow to rotate the 3D structure.

That is what I meant by the llvm driver breaking the xlib VNC connection.
Have you tested the llvm driver over a VNC connection?

>> (2) Compile llvm driver
>>
>> LLVM=/usr/local/libllvm/3.2
>> ${SOURCE}/${CONFIGURE} --prefix=${INSTALL} --enable-xlib-glx
>>   --disable-dri --enable-gallium-llvm --with-gallium-drivers=swrast
>>   --with-llvm-shared-libs=${LLVM}/lib --with-llvm-prefix=${LLVM}
>>
>> Manually change libGL.so and libGL.so.1 to link
>> lib/gallium/libGL.so.1.5.0.
>
> Looks OK to me.

One more question: how can I build the llvm driver without manually
changing the libGL.so link? Did I miss something in my compilation, or is
there an issue in the mesa build and installation process?

Thanks.

Kind regards.

Jupiter
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radeonsi: Emit pixel shader state even when only the vertex shader changed
From: Michel Dänzer <michel.daen...@amd.com>

Fixes random failures with piglit glsl-max-varyings.

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
---
 src/gallium/drivers/radeonsi/si_state_draw.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 1049d2b..a78751b 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -404,6 +404,11 @@ static void si_update_derived_state(struct r600_context *rctx)
 	}
 
 	if (si_pm4_state_changed(rctx, ps) || si_pm4_state_changed(rctx, vs)) {
+		/* XXX: Emitting the PS state even when only the VS changed
+		 * fixes random failures with piglit glsl-max-varyings.
+		 * Not sure why...
+		 */
+		rctx->emitted.named.ps = NULL;
 		si_update_spi_map(rctx);
 	}
 }
-- 
1.8.2.rc3
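The patch above forces the pixel shader state to be re-emitted by clearing the "emitted" pointer. A minimal sketch of that idea follows, assuming the tracker decides whether a state changed by comparing a queued pointer against an emitted pointer. All names here (toy_state, toy_tracker, toy_force_reemit and so on) are hypothetical and are not the radeonsi si_pm4 code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of "queued vs. emitted" state tracking.
 * Every name in this block is hypothetical; it only illustrates the idea. */
struct toy_state { int id; };

struct toy_tracker {
    const struct toy_state *queued;   /* what the next draw should use */
    const struct toy_state *emitted;  /* what was last sent to the hardware */
    unsigned emit_count;              /* number of emissions so far */
};

/* A state counts as changed when the queued and emitted pointers differ. */
static int toy_state_changed(const struct toy_tracker *t)
{
    assert(t != NULL);
    return t->queued != t->emitted;
}

/* Emit only when needed; returns 1 if something was emitted. */
static int toy_emit_if_changed(struct toy_tracker *t)
{
    if (!toy_state_changed(t))
        return 0;
    t->emitted = t->queued;
    t->emit_count++;
    return 1;
}

/* Analogue of "emitted.named.ps = NULL": forget what was emitted so the
 * next emit pass re-sends the state even though queued did not change. */
static void toy_force_reemit(struct toy_tracker *t)
{
    assert(t != NULL);
    t->emitted = NULL;
}
```

With a tracker whose state is already emitted, toy_emit_if_changed() is a no-op until toy_force_reemit() is called, after which the same state is sent once more.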
Re: [Mesa-dev] [PATCH] radeonsi: Emit pixel shader state even when only the vertex shader changed
On 20.03.2013 11:43, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
>
> Fixes random failures with piglit glsl-max-varyings.
>
> NOTE: This is a candidate for the 9.1 branch.
>
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>

Reviewed-by: Christian König <christian.koe...@amd.com>

> ---
>  src/gallium/drivers/radeonsi/si_state_draw.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
> index 1049d2b..a78751b 100644
> --- a/src/gallium/drivers/radeonsi/si_state_draw.c
> +++ b/src/gallium/drivers/radeonsi/si_state_draw.c
> @@ -404,6 +404,11 @@ static void si_update_derived_state(struct r600_context *rctx)
>  	}
>
>  	if (si_pm4_state_changed(rctx, ps) || si_pm4_state_changed(rctx, vs)) {
> +		/* XXX: Emitting the PS state even when only the VS changed
> +		 * fixes random failures with piglit glsl-max-varyings.
> +		 * Not sure why...
> +		 */
> +		rctx->emitted.named.ps = NULL;
>  		si_update_spi_map(rctx);
>  	}
>  }
Re: [Mesa-dev] glxgears is faster but 3D render is so slow
On 03/20/2013 04:07 AM, jupiter wrote:
> Hi Brian,
>
> On 3/19/13, Brian Paul <bri...@vmware.com> wrote:
>>> It is fair to say that when running the llvm driver on my local machine
>>> (a 32-bit CentOS 6.2 without a VNC connection), it was indeed faster
>>> than the xlib driver. It seems to me that the llvm driver broke the
>>> xlib VNC connection, which could be because either I haven't configured
>>> llvm correctly or the mesa llvm compile process has bugs.
>>
>> I don't understand what you mean by llvm driver broken the xlib VNC
>> connection.
>
> I have tested the llvm driver on two platforms:
>
> (1) A local computer running CentOS 6.2, which has no hardware
> acceleration but which I can access directly. The llvm driver is indeed
> much faster than swrast there; I could run an application with 3D
> structure rotation.
>
> (2) A virtual machine running CentOS 6.2, which I have to access via VNC.
> I was not able to run the 3D application; the graphics were jerky and it
> would not respond. If I switched to swrast, the 3D application's graphics
> ran much more smoothly and the response was normal, but the 3D rotation
> stopped because swrast was too slow to rotate the 3D structure.
>
> That is what I meant by the llvm driver breaking the xlib VNC connection.
> Have you tested the llvm driver over a VNC connection?

No, I haven't. I'm really not sure what's happening in this situation.
My only totally wild guess is that there's competition between the VNC
server and Mesa for CPU time. The llvmpipe driver is threaded and creates
as many threads as there are CPU cores. You can set the LP_NUM_THREADS
environment variable to tell llvmpipe how many threads to use (0 for no
threading). How many CPU cores do you have?

>>> (2) Compile llvm driver
>>>
>>> LLVM=/usr/local/libllvm/3.2
>>> ${SOURCE}/${CONFIGURE} --prefix=${INSTALL} --enable-xlib-glx
>>>   --disable-dri --enable-gallium-llvm --with-gallium-drivers=swrast
>>>   --with-llvm-shared-libs=${LLVM}/lib --with-llvm-prefix=${LLVM}
>>>
>>> Manually change libGL.so and libGL.so.1 to link
>>> lib/gallium/libGL.so.1.5.0.
>>
>> Looks OK to me.
>
> One more question: how can I build the llvm driver without manually
> changing the libGL.so link? Did I miss something in my compilation, or is
> there an issue in the mesa build and installation process?

I think that's a deficiency in our configure/install system. I haven't
looked into it though.

-Brian
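For an application that cannot easily be started from a shell with LP_NUM_THREADS set, the variable can also be exported from the program itself. The sketch below assumes a POSIX system and assumes the variable has to be in the environment before the GL context is created; set_llvmpipe_threads() is a hypothetical helper name, not part of Mesa.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: export LP_NUM_THREADS for llvmpipe.
 * Negative values are clamped to 0, which the thread above describes
 * as "no threading". Returns 0 on success, -1 on failure (as setenv). */
static int set_llvmpipe_threads(int n)
{
    char buf[16];

    if (n < 0)
        n = 0;
    snprintf(buf, sizeof buf, "%d", n);
    return setenv("LP_NUM_THREADS", buf, 1 /* overwrite */);
}
```

Calling set_llvmpipe_threads(0) early in main(), before any GLX call, would be one way to test the CPU-contention guess on the VNC machine; lowering the count to one less than the number of cores would be another.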
Re: [Mesa-dev] Error while compiling the MAPI directory
You're building with scons, right?

Jose

----- Original Message -----
> Hi,
>
> I used the latest mesa and I am still receiving the same errors. It works
> perfectly fine in Ubuntu though.
> Can somebody please tell me how the following constant in the file
> mapi_tmp.h gets included?
>
> #include MAPI_ABI_HEADER
>
> Thanks,
> Ritvik
>
> -----Original Message-----
> From: Jose Fonseca [mailto:jfons...@vmware.com]
> Sent: Monday, March 18, 2013 11:29 PM
> To: Sharma, Ritvik
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] Error while compiling the MAPI directory
>
> ----- Original Message -----
> > Hi,
> > I am receiving the following error while compiling the code in the
> > mapi directory. I am using mesa 7.5.
>
> If you're compiling with MSVC I'd recommend using a recent Mesa release
> and save yourself a world of trouble. It's known to build well there.
>
> If you must use this old release, then you'll likely need to search the
> MSVC build fixes and crossport them.
>
> Jose
[Mesa-dev] RFC: TGSI scalar arrays
Sorry, this has become longer than I anticipated ...

I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays.
I need it because I cannot allocate varyings in the same order that the
register index specifies:

=== EXAMPLE:
OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware
                     output space
OUT[1], CLIPDIST[0], 0x2d0
OUT[2], GENERIC[0], between 0x80 and 0x280
OUT[3], GENERIC[1], between 0x80 and 0x280

And without an array specification,
  MOV OUT[TEMP[0].x-1], IMM[0]
would leave me no clue as to whether to use 0x80 or 0x2c0 as the base
address.
===

Now that I'm at it, I'm considering going a step further, which is adding
indirect scalar/component access.
This is motivated by float gl_ClipDistance[], which, if accessed
indirectly, currently leaves us no choice but to generate code like this:

  if ((index & 3) == 0)
     access x component;
  else if ((index & 3) == 1)
     access y component;
  ...

This is undesirable, and the hardware can do better: it supports accessing
individual components, since address registers contain an address in
bytes and we can do scalar read/write.

A second motivation is varying packing, which is required by the GL spec
and may lead to use of TEMP arrays, which, albeit improved now, will
impair performance when used (on nv50 they go to uncached memory, which is
very slow).
That case occurs if, for instance, a varying float[8] is accessed
indirectly and has to be packed into
  OUT[0..1].xyzw, GENERIC[0..1]
instead of
  OUT[0..7].x, GENERIC[0..7]

So far I've come up with 2 choices (all available only if the driver
supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):

1. SCALAR DECLARATIONS

Using float gl_ClipDistance[8] as an example, it could be declared as:
  OUT[0..7].x, CLIPDIST, ARRAY(1)
where the .x now means that it's a single component per OUT[index].

Now this obviously means that a single OUT[i] doesn't always consume
16 bytes / 4 components anymore, which may be somewhat disturbing, since
the address of an output can't be inferred solely from its index anymore.
However, that doesn't really constitute a problem if all access is either
direct or comes with an ARRAY() reference.

For varying packing, which happens only for user defined variables, and
hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier:

(NOTE: GL requires us to be able to support exactly the number of
components we report; failing due to alignment is not allowed. Hence the
GLSL compiler may put some variables at unaligned locations, see
ir_variable.location_frac.)

A GENERIC semantic index should always cover 4 components so that a fixed
location can be assigned for it (drivers usually do this since it makes an
extra dynamic linkage pass when shaders are changed unnecessary, as
intended by GL_ARB_separate_shader_objects).

So, this would be valid:
  OUT[0..3].x, GENERIC[0]
  OUT[4..5].xy, GENERIC[1]
  OUT[6], GENERIC[2]
Note how 3 OUT[indices] only consume 1 GENERIC[index].

If we, instead, allocated a semantic index per register index instead of
per 4 components, we would have:
  OUT[0..3].x, GENERIC[0]
  OUT[4..5].xy, GENERIC[4]
  OUT[6], GENERIC[6]
This would waste space, since GENERIC[4,6] would have to go to
output_space[addresses 0x40, 0x60] so it could link with
  IN[6], GENERIC[6]
where we have no information about the size of GENERIC[0 .. 5], and
wasting space like that means the advertised number of varying components
cannot be satisfied.

And as a last step, if varyings are placed at non-vec4 boundaries, we
would have to be able to specify fractional semantic indices, like this:
  OUT[0..2].x, GENERIC[0].x
  OUT[3].x, GENERIC[0].w

2. SCALAR ADDRESS REGISTER VALUES

All this can be avoided by always declaring full vec4s, and adding the
possibility of doing indirect addressing on a per-component basis:

  varying float a[4]
becomes:
  OUT[0].xyzw, ARRAY(1)

  uniform int i; a[i+5] = 999
becomes:
  UARL_SCALAR ADDR[0].x, CONST[0].
  MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].

The only difficulty with this is that we have to split TGSI instructions
that access unaligned vectors:
(NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings
may have to be packed.)

With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a
single vec2 would look like this:
  OUT[0..3].xyz, GENERIC[0].x
  OUT[4..5].xyz, GENERIC[3].x
  OUT[6].xy, GENERIC[4].zw
and we could still do:
  ADD OUT[5].xyz, TEMP[0], TEMP[1]

Now, these would have to be merged and declared as:
  OUT[0..4].xyzw
and the 2nd vec3 would be { OUT[0].w, OUT[1].xyz } instead of simply
OUT[1].xyz

A problem with this is that the GLSL compiler, while it can do the packing
into vec4s and splitting up access, cannot, iirc, access individual
components of a vec4 indirectly like TGSI would be able to.
To avoid TEMP arrays we'd have to disable the last phase of varying
packing (that actually converts the code to using vec4s). It would still
be able to assign fractional locations to guarantee that linkage works,
but glsl-to-tgsi would likely have
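The component selection in the if-chain above, and the byte addressing the hardware is said to support, both come down to the same arithmetic on a scalar index. A small sketch, assuming 4-byte components packed four to a 16-byte vec4 register as the RFC describes; scalar_loc and scalar_index_to_loc() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Where one scalar array element lives when packed into vec4 registers.
 * Hypothetical type, for illustration only. */
struct scalar_loc {
    unsigned reg;       /* vec4 register index relative to the array base */
    unsigned comp;      /* 0 = x, 1 = y, 2 = z, 3 = w */
    unsigned byte_off;  /* byte offset relative to the array base */
};

/* Map a scalar element index to register, component and byte offset.
 * Assumes 4 bytes per component and 4 components per register. */
static struct scalar_loc scalar_index_to_loc(unsigned index)
{
    struct scalar_loc loc;

    loc.reg = index >> 2;        /* index / 4 */
    loc.comp = index & 3;        /* index % 4, the test in the if-chain */
    loc.byte_off = index * 4;    /* equals reg * 16 + comp * 4 */
    assert(loc.byte_off == loc.reg * 16 + loc.comp * 4);
    return loc;
}
```

For a constant offset of 5 this gives register 1, component y, which matches the `[ADDR[0].x+1].y` operand in the RFC's a[i+5] example. With byte addressing the hardware only needs byte_off, so no per-component branch is required.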
[Mesa-dev] [Bug 61364] LLVM assertion when starting X11
https://bugs.freedesktop.org/show_bug.cgi?id=61364

--- Comment #6 from Jerome Glisse <gli...@freedesktop.org> ---
Yeah, I saw the same issue on radeon.
Re: [Mesa-dev] Error while compiling the MAPI directory
I am using Visual Studio.
I found that all these missing constants, like MAPI_TABLE_NUM_STATIC, get
their values in mapi_abi.py. Since I am building it in UEFI, I am writing
[.inf] files and using them to generate the makefiles, rather than using
the makefiles given in the mesa kit.
Could this be the reason why I am getting the errors? How do the python
functions initialize the C constants?

Thanks,
Ritvik

-----Original Message-----
From: Jose Fonseca [mailto:jfons...@vmware.com]
Sent: Wednesday, March 20, 2013 8:00 PM
To: Sharma, Ritvik
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Error while compiling the MAPI directory

You're building with scons, right?

Jose

----- Original Message -----
> Hi,
>
> I used the latest mesa and I am still receiving the same errors. It works
> perfectly fine in Ubuntu though.
> Can somebody please tell me how the following constant in the file
> mapi_tmp.h gets included?
>
> #include MAPI_ABI_HEADER
>
> Thanks,
> Ritvik
>
> -----Original Message-----
> From: Jose Fonseca [mailto:jfons...@vmware.com]
> Sent: Monday, March 18, 2013 11:29 PM
> To: Sharma, Ritvik
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] Error while compiling the MAPI directory
>
> ----- Original Message -----
> > Hi,
> > I am receiving the following error while compiling the code in the
> > mapi directory. I am using mesa 7.5.
>
> If you're compiling with MSVC I'd recommend using a recent Mesa release
> and save yourself a world of trouble. It's known to build well there.
>
> If you must use this old release, then you'll likely need to search the
> MSVC build fixes and crossport them.
>
> Jose
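On the `#include MAPI_ABI_HEADER` question: C allows the operand of #include to be a macro, which the preprocessor expands before looking up the file. The sketch below shows only that language mechanism. DEMO_HEADER and demo_char_bits() are hypothetical names; in a real build the macro would normally come from the compiler command line (for example a -D option), and the assumption here is that Mesa's own build both defines MAPI_ABI_HEADER and runs mapi_abi.py to generate the header it names, steps a hand-written makefile would have to repeat.

```c
#include <assert.h>

/* Normally supplied by the build system, e.g. -DDEMO_HEADER='<limits.h>'.
 * The fallback keeps this sketch compilable on its own. */
#ifndef DEMO_HEADER
#define DEMO_HEADER <limits.h>
#endif

/* The preprocessor expands DEMO_HEADER first, then includes the result. */
#include DEMO_HEADER

/* Uses a constant that only exists if the macro-named header was found. */
static int demo_char_bits(void)
{
    return CHAR_BIT;
}
```

If the macro is left undefined and there is no fallback, the compiler stops at the #include line, and constants that the generated header would have provided show up as missing.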
Re: [Mesa-dev] [PATCH] st/egl: Fix build after changes in src/egl/wayland/
I pushed a different fix for this. The gallium egl code doesn't have
support for buffer sharing via fd passing, so we can't just ask the
protocol code to advertise that, even if the kernel has the DRM_CAP_PRIME
features. Instead we just pass 0 for the flags argument.

thanks,
Kristian

On Tue, Mar 19, 2013 at 6:07 AM, Michel Dänzer <mic...@daenzer.net> wrote:
> From: Michel Dänzer <michel.daen...@amd.com>
>
> Not sure it actually works though, some buffer callbacks seem to have
> rotted before.
>
> Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
> ---
>  src/gallium/state_trackers/egl/drm/native_drm.c     |  8 +++++++-
>  src/gallium/state_trackers/egl/wayland/native_drm.c |  8 +++++++-
>  src/gallium/state_trackers/egl/x11/native_dri2.c    | 14 +++++++++++++-
>  3 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/state_trackers/egl/drm/native_drm.c b/src/gallium/state_trackers/egl/drm/native_drm.c
> index f0c0f54..65c91cf 100644
> --- a/src/gallium/state_trackers/egl/drm/native_drm.c
> +++ b/src/gallium/state_trackers/egl/drm/native_drm.c
> @@ -207,13 +207,19 @@ drm_display_bind_wayland_display(struct native_display *ndpy,
>                                   struct wl_display *wl_dpy)
>  {
>     struct drm_display *drmdpy = drm_display(ndpy);
> +   int ret, flags = 0;
> +   uint64_t cap;
>
>     if (drmdpy->wl_server_drm)
>        return FALSE;
>
> +   ret = drmGetCap(drmdpy->fd, DRM_CAP_PRIME, &cap);
> +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
> +      flags |= WAYLAND_DRM_PRIME;
> +
>     drmdpy->wl_server_drm = wayland_drm_init(wl_dpy,
>           drmdpy->device_name,
> -         &wl_drm_callbacks, ndpy);
> +         &wl_drm_callbacks, ndpy, flags);
>
>     if (!drmdpy->wl_server_drm)
>        return FALSE;
> diff --git a/src/gallium/state_trackers/egl/wayland/native_drm.c b/src/gallium/state_trackers/egl/wayland/native_drm.c
> index 3801fac..7633379 100644
> --- a/src/gallium/state_trackers/egl/wayland/native_drm.c
> +++ b/src/gallium/state_trackers/egl/wayland/native_drm.c
> @@ -265,13 +265,19 @@ wayland_drm_display_bind_wayland_display(struct native_display *ndpy,
>                                           struct wl_display *wl_dpy)
>  {
>     struct wayland_drm_display *drmdpy = wayland_drm_display(ndpy);
> +   int ret, flags = 0;
> +   uint64_t cap;
>
>     if (drmdpy->wl_server_drm)
>        return FALSE;
>
> +   ret = drmGetCap(drmdpy->fd, DRM_CAP_PRIME, &cap);
> +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
> +      flags |= WAYLAND_DRM_PRIME;
> +
>     drmdpy->wl_server_drm = wayland_drm_init(wl_dpy,
>           drmdpy->device_name,
> -         &wl_drm_callbacks, ndpy);
> +         &wl_drm_callbacks, ndpy, flags);
>
>     if (!drmdpy->wl_server_drm)
>        return FALSE;
> diff --git a/src/gallium/state_trackers/egl/x11/native_dri2.c b/src/gallium/state_trackers/egl/x11/native_dri2.c
> index a989f9e..67ecb60 100644
> --- a/src/gallium/state_trackers/egl/x11/native_dri2.c
> +++ b/src/gallium/state_trackers/egl/x11/native_dri2.c
> @@ -40,6 +40,7 @@
>  #include "common/native_helper.h"
>
>  #ifdef HAVE_WAYLAND_BACKEND
> +#include <xf86drm.h>
>  #include "common/native_wayland_drm_bufmgr_helper.h"
>  #endif
>
> @@ -63,6 +64,7 @@ struct dri2_display {
>     struct util_hash_table *surfaces;
>  #ifdef HAVE_WAYLAND_BACKEND
>     struct wl_drm *wl_server_drm; /* for EGL_WL_bind_wayland_display */
> +   int fd;
>  #endif
>  };
>
> @@ -817,6 +819,10 @@ dri2_display_init_screen(struct native_display *ndpy)
>        return FALSE;
>     }
>
> +#ifdef HAVE_WAYLAND_BACKEND
> +   dri2dpy->fd = fd;
> +#endif
> +
>     return TRUE;
>  }
>
> @@ -855,13 +861,19 @@ dri2_display_bind_wayland_display(struct native_display *ndpy,
>                                    struct wl_display *wl_dpy)
>  {
>     struct dri2_display *dri2dpy = dri2_display(ndpy);
> +   int ret, flags = 0;
> +   uint64_t cap;
>
>     if (dri2dpy->wl_server_drm)
>        return FALSE;
>
> +   ret = drmGetCap(dri2dpy->fd, DRM_CAP_PRIME, &cap);
> +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
> +      flags |= WAYLAND_DRM_PRIME;
> +
>     dri2dpy->wl_server_drm = wayland_drm_init(wl_dpy,
>           x11_screen_get_device_name(dri2dpy->xscr),
> -         &wl_drm_callbacks, ndpy);
> +         &wl_drm_callbacks, ndpy, flags);
>
>     if (!dri2dpy->wl_server_drm)
>        return FALSE;
> --
> 1.8.2.rc3
Re: [Mesa-dev] [PATCH] i965: Add a driconf option to disable flush throttling.
On 19 March 2013 17:10, Eric Anholt <e...@anholt.net> wrote:
> Paul Berry <stereotype...@gmail.com> writes:
>
>> Normally when submitting the first batch buffer after a flush, we
>> check whether the GPU has completed processing of the first batch
>> buffer of the previous frame. If it hasn't, we wait for it to finish
>> before submitting any more batches. This prevents GPU-heavy and
>> CPU-light applications from racing too far ahead of the current frame,
>> but at the expense of possibly lower frame rates. Sometimes when
>> benchmarking we want to disable this mechanism.
>>
>> This patch adds the driconf option disable_throttling to disable the
>> throttling mechanism.
>
> We often do our driver-specific options inside of intel_screen.c, but I
> suppose this way someone could potentially translate it.
>
> Reviewed-by: Eric Anholt <e...@anholt.net>
>
> Have you found any interesting cases where this is a problem?

I think Ken found a benchmark where there was a marginal improvement, but
I don't recall exactly what it was.
[Mesa-dev] [PATCH] meta: fix incorrect slice, r coordinate computation
The arithmetic to convert a 3D texture slice to an R coordinate was
incorrect. Found when MSVC warned of a divide by zero.

Note that we don't actually ever hit this path. We don't decompress
slices of 3D textures and we don't support 3D mipmap generation yet.
---
 src/mesa/drivers/common/meta.c | 13 +++++++++----
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 1a1fd28..8114550 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -3118,6 +3118,7 @@ setup_texture_coords(GLenum faceTarget,
                      GLint slice,
                      GLint width,
                      GLint height,
+                     GLint depth,
                      GLfloat coords0[3],
                      GLfloat coords1[3],
                      GLfloat coords2[3],
@@ -3134,8 +3135,11 @@ setup_texture_coords(GLenum faceTarget,
    case GL_TEXTURE_2D:
    case GL_TEXTURE_3D:
    case GL_TEXTURE_2D_ARRAY:
-      if (faceTarget == GL_TEXTURE_3D)
-         r = 1.0F / slice;
+      if (faceTarget == GL_TEXTURE_3D) {
+         assert(slice < depth);
+         assert(depth >= 1);
+         r = (slice + 0.5f) / depth;
+      }
       else if (faceTarget == GL_TEXTURE_2D_ARRAY)
          r = slice;
       else
@@ -3574,7 +3578,7 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum target,
       /* Setup texture coordinates */
       setup_texture_coords(faceTarget,
                            slice,
-                           0, 0, /* width, height never used here */
+                           0, 0, 1, /* width, height never used here */
                            verts[0].tex,
                            verts[1].tex,
                            verts[2].tex,
@@ -3840,6 +3844,7 @@ decompress_texture_image(struct gl_context *ctx,
    struct gl_texture_object *texObj = texImage->TexObject;
    const GLint width = texImage->Width;
    const GLint height = texImage->Height;
+   const GLint depth = texImage->Depth;
    const GLenum target = texObj->Target;
    GLenum faceTarget;
    struct vertex {
@@ -3935,7 +3940,7 @@ decompress_texture_image(struct gl_context *ctx,
       _mesa_BindSampler(ctx->Texture.CurrentUnit, decompress->Sampler);
    }
 
-   setup_texture_coords(faceTarget, slice, width, height,
+   setup_texture_coords(faceTarget, slice, width, height, depth,
                         verts[0].tex,
                         verts[1].tex,
                         verts[2].tex,
-- 
1.7.3.4
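The corrected slice-to-R arithmetic from the patch can be checked in isolation. This is a standalone sketch, not the meta.c code: slice_to_r() is a hypothetical name, and the extra lower-bound check on slice is an addition for the sketch.

```c
#include <assert.h>

/* Normalized R coordinate of the center of a 3D texture slice.
 * Slice centers sit at (slice + 0.5) / depth, so slice 0 of a
 * 4-deep texture maps to 0.125 rather than to a division by zero. */
static float slice_to_r(int slice, int depth)
{
    assert(depth >= 1);
    assert(slice >= 0 && slice < depth);
    return ((float)slice + 0.5f) / (float)depth;
}
```

The old expression 1.0F / slice divides by zero for slice 0 and does not depend on the texture depth at all, which is why it could not have produced a usable coordinate.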
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 15:41, Christoph Bumiller wrote:
> Sorry, this has become longer than I anticipated ...
> [...]
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 17:05, Roland Scheidegger wrote:
> On 20.03.2013 15:41, Christoph Bumiller wrote:
>> Sorry, this has become longer than I anticipated ...
>> [...]
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 17:46, Christoph Bumiller wrote:
On 20.03.2013 17:05, Roland Scheidegger wrote:
On 20.03.2013 15:41, Christoph Bumiller wrote:

Sorry, this has become longer than I anticipated ...

I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays because, since I cannot allocate varyings in the same order that the register index specifies, I need it:

=== EXAMPLE:
OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware output space
OUT[1], CLIPDIST[0], 0x2d0
OUT[2], GENERIC[0], between 0x80 and 0x280
OUT[3], GENERIC[1], between 0x80 and 0x280

And without array specification

    MOV OUT[TEMP[0].x-1], IMM[0]

would leave me no clue as to whether to use 0x80 or 0x2c0 as base address.
===

Now that I'm on it, I'm considering going a step further, which is adding indirect scalar/component access. This is motivated by float gl_ClipDistance[], which, if accessed indirectly, currently leaves us no choice other than generating code like this:

    if ((index & 3) == 0)
        access x component;
    else if ((index & 3) == 1)
        access y component;
    ...

This is undesirable and the hardware can do better (as it actually supports accessing individual components, since address registers contain an address in bytes and we can do scalar read/write).

A second motivation is varying packing, which is required by the GL spec, and may lead to use of TEMP arrays, which, albeit improved now, will impair performance when used (on nv50 they go to uncached memory which is very slow). That case occurs if, for instance, a varying float[8] is accessed indirectly and has to be packed into

    OUT[0..1].xyzw, GENERIC[0..1]

instead of

    OUT[0..7].x, GENERIC[0..7]

So far I've come up with 2 choices (all available only if the driver supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):

1. SCALAR DECLARATIONS

Using float gl_ClipDistance[8] as example, it could be declared as:

    OUT[0..7].x, CLIPDIST, ARRAY(1)

where the .x now means that it's a single component per OUT[index].

Now this obviously means that a single OUT[i] doesn't always consume 16 bytes / 4 components anymore, which may be somewhat disturbing, since the address of an output can't be inferred solely from its index anymore. However, that doesn't really constitute a problem if all access is either direct or comes with an ARRAY() reference.

For varying packing, which happens only for user defined variables, and hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier.

(NOTE: GL requires us to be able to support exactly the amount of components we report; failing due to alignment is not allowed. Hence the GLSL compiler may put some variables at unaligned locations, see ir_variable.location_frac.)

A GENERIC semantic index should always cover 4 components so that a fixed location can be assigned for it (drivers usually do this since it makes an extra dynamic linkage pass when shaders are changed unnecessary, as intended by GL_ARB_separate_shader_objects). So, this would be valid:

    OUT[0..3].x, GENERIC[0]
    OUT[4..5].xy, GENERIC[1]
    OUT[6], GENERIC[2]

Note how 3 OUT[indices] only consume 1 GENERIC[index].

If we instead allocated a semantic index per register index rather than per 4 components, we would have:

    OUT[0..3].x, GENERIC[0]
    OUT[4..5].xy, GENERIC[4]
    OUT[6], GENERIC[6]

This would waste space, since GENERIC[4,6] would have to go to output_space[addresses 0x40, 0x60] so it could link with

    IN[6], GENERIC[6]

where we have no information about the size of GENERIC[0 .. 5], and wasting space like that means the advertised number of varying components cannot be satisfied.

And as a last step, if varyings are placed at non-vec4 boundaries, we would have to be able to specify fractional semantic indices, like this:

    OUT[0..2].x, GENERIC[0].x
    OUT[3].x, GENERIC[0].w

2. SCALAR ADDRESS REGISTER VALUES

All this can be avoided by always declaring full vec4s, and adding the possibility of doing indirect addressing on a per-component basis:

    varying float a[4];
    uniform int i;
    a[i+5] = 999

becomes:

    OUT[0].xyzw, ARRAY(1)
    UARL_SCALAR ADDR[0].x, CONST[0].
    MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].

The only difficulty with this is that we have to split TGSI instructions that access unaligned vectors. (NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings may have to be packed.)

With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a single vec2 would look like this:

    OUT[0..3].xyz, GENERIC[0].x
    OUT[4..5].xyz, GENERIC[3].x
    OUT[6].xy, GENERIC[4].zw

and we could still do:

    ADD OUT[5].xyz, TEMP[0], TEMP[1]

Now, these would have to be merged and declared as:

    OUT[0..4].xyzw

and the 2nd vec3 would be { OUT[0].w, OUT[1].xyz } instead of simply OUT[1].xyz.

A problem with this is that the GLSL compiler, while it can do the packing into vec4s and splitting up access, cannot, iirc, access individual components of a vec4 indirectly like TGSI would be able to.
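
The component arithmetic behind both suggestions can be sketched in a few lines of C. This is only an illustration, assuming a packed float array whose elements are laid out four per vec4 and an address register that counts bytes, as described above for the hardware. The names scalar_slot, scalar_slot_from_index and scalar_byte_offset are hypothetical and do not exist in TGSI or Mesa.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of TGSI or Mesa. */
struct scalar_slot {
    unsigned reg;   /* vec4 register offset from the array base */
    unsigned comp;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Split the element index of a packed float array into register and
 * component, replacing the if/else chain on (index & 3). */
static struct scalar_slot scalar_slot_from_index(unsigned index)
{
    struct scalar_slot slot;
    slot.reg = index >> 2;   /* four scalars per vec4 register */
    slot.comp = index & 3;   /* position inside that register */
    return slot;
}

/* Byte offset of the element, assuming 16 bytes per vec4 register and
 * 4 bytes per component. */
static uint32_t scalar_byte_offset(uint32_t base, unsigned index)
{
    struct scalar_slot slot = scalar_slot_from_index(index);
    return base + 16u * slot.reg + 4u * slot.comp;
}
```

With a byte-granular address register the two steps collapse into base + 4 * index, which is why scalar addressing avoids generating a branch per component.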
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 18:30, Roland Scheidegger wrote:
> On 20.03.2013 17:46, Christoph Bumiller wrote:
>> On 20.03.2013 17:05, Roland Scheidegger wrote:
>>> Not sure I fully understand this, but I'm thinking whenever in doubt, use something close to what dx10 does since that's likely going to work reasonably with different hw. Maybe declaring those special values differently (not just as output reg) would help?
>>
>> What DX10 does is making indirect access of varyings illegal. That's not possible with OpenGL ...
>
> Hmm I thought dcl_indexRange would be used for indirect access of varyings?

Interesting ... when last I tried that back when working on d3d1x, the compiler didn't like it, and I remember something about indexRange existing only for debugging (and I remember finding that strange).

Also, d3d11 doesn't have the annoying limit that GLSL has so there is no need for it to pack varyings. When I use floats[3] + SV_POSITION, I get "vs_5_0 output limit (32) exceeded, shader uses 33 outputs", but float4[28] works just fine.

For indirect access I still get:

    error X3500: array reference cannot be used as an l-value; not natively addressable

for

    struct IA2VS
    {
        float4 position : POSITION;
        float4 color : COLOR;
    };

    struct VS2PS
    {
        float4 position : SV_POSITION;
        float4 color[2] : WHATEVER;
    };

    VS2PS vs(IA2VS input)
    {
        VS2PS result;
        int i = int(input.position.x);
        result.position = input.position;
        result.color[i] = input.color;
        return result;
    }

    float4 ps(VS2PS input) : SV_TARGET
    {
        return input.color[0];
    }

> Roland

___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Error while compiling the MAPI directory
SCons builds via Visual Studio compilers. So I assume by "I am using Visual Studio" you mean "no, I'm not using SCons"...

I'd strongly recommend using scons instead of replicating the Mesa build with an MSVC project, as the Mesa build is extremely complex (a lot of code generation).

If you are determined to do your own thing, then build Mesa with scons _once_ while recording its output

    scons verbose=1 > scons.log 2>&1

then study the commands used to compile, and mimic them.

I'm afraid I can't help you further diagnose particular failures. Life is too short.

Jose

----- Original Message -----
> I am using Visual Studio. I found that all these missing constants like MAPI_TABLE_NUM_STATIC are getting their values in mapi_abi.py. Since I am building it in UEFI, I am making [.inf] files and using them to generate the makefiles, rather than using the makefiles given in the mesa kit. Could this be the reason why I am getting the errors? How are the python functions initializing the C constants?
>
> Thanks,
> Ritvik
>
> -----Original Message-----
> From: Jose Fonseca [mailto:jfons...@vmware.com]
> Sent: Wednesday, March 20, 2013 8:00 PM
> To: Sharma, Ritvik
> Cc: mesa-dev@lists.freedesktop.org
> Subject: Re: [Mesa-dev] Error while compiling the MAPI directory
>
> You're building with scons right?
>
> Jose
>
> ----- Original Message -----
>> Hi, I used the latest mesa and I am still receiving the same errors. It works perfectly fine in Ubuntu though. Can somebody please tell me how the following constant in the file mapi_tmp.h gets included?
>>
>>     #include MAPI_ABI_HEADER
>>
>> Thanks,
>> Ritvik
>>
>> -----Original Message-----
>> From: Jose Fonseca [mailto:jfons...@vmware.com]
>> Sent: Monday, March 18, 2013 11:29 PM
>> To: Sharma, Ritvik
>> Cc: mesa-dev@lists.freedesktop.org
>> Subject: Re: [Mesa-dev] Error while compiling the MAPI directory
>>
>> ----- Original Message -----
>>> Hi, I am receiving the following error while compiling the code in the mapi directory. I am using mesa 7.5.
>>
>> If you're compiling with MSVC I'd recommend using a recent Mesa release and saving yourself a world of trouble. It's known to build well there. If you must use this old release, then you'll likely need to search for the MSVC build fixes and crossport them.
>>
>> Jose
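
The question about #include MAPI_ABI_HEADER is not answered in the thread. As a hedged illustration only: C allows the operand of #include to be a macro (a "computed include"), and the likely reading is that the regular Mesa build passes MAPI_ABI_HEADER on the compiler command line so that it names a header generated by mapi_abi.py. That reading is an assumption, not something the thread confirms. The sketch below uses a hypothetical macro DEMO_ABI_HEADER and a standard header as a stand-in for the generated one.

```c
/* Hypothetical stand-in: a real build would define this with -D on the
 * compiler command line, pointing at a generated header. */
#ifndef DEMO_ABI_HEADER
#define DEMO_ABI_HEADER <limits.h>
#endif

/* Computed include: the macro is expanded first, then the resulting
 * header name is included. */
#include DEMO_ABI_HEADER

/* Uses a constant that only exists once the header above was included. */
int demo_char_bits(void)
{
    return CHAR_BIT;
}
```

If a hand-written build leaves such a macro undefined, or the generator script never runs, the constants the header would have provided are missing, which would match the symptoms described above.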
[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library
https://bugs.freedesktop.org/show_bug.cgi?id=62571

Jesus Cortez <jesus.corte...@gmail.com> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|nouveau@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
          Component|Drivers/DRI/nouveau           |Mesa core

--
You are receiving this mail because:
You are the assignee for the bug.
Re: [Mesa-dev] RFC: TGSI scalar arrays
On 20.03.2013 19:09, Christoph Bumiller wrote:
> On 20.03.2013 18:30, Roland Scheidegger wrote:
>> On 20.03.2013 17:46, Christoph Bumiller wrote:
>>> On 20.03.2013 17:05, Roland Scheidegger wrote:
>>>> Not sure I fully understand this, but I'm thinking whenever in doubt, use something close to what dx10 does since that's likely going to work reasonably with different hw. Maybe declaring those special values differently (not just as output reg) would help?
>>>
>>> What DX10 does is making indirect access of varyings illegal. That's not possible with OpenGL ...
>>
>> Hmm I thought dcl_indexRange would be used for indirect access of varyings?
>
> Interesting ... when last I tried that back when working on d3d1x, the compiler didn't like it, and I remember something about indexRange existing only for debugging (and I remember finding that strange).
>
> Also, d3d11 doesn't have the annoying limit that GLSL has so there is no need for it to pack varyings. When I use floats[3] + SV_POSITION, I get "vs_5_0 output limit (32) exceeded, shader uses 33 outputs", but float4[28] works just fine.
>
> For indirect access I still get:
>
>     error X3500: array reference cannot be used as an l-value; not natively addressable

Hmm that's odd. On a quick look I couldn't find many examples using it, and in those I found it was used in hull shaders. I can't see anything saying it shouldn't work at all though (it does have some restrictions but they look reasonable to me).

> for
>
>     struct IA2VS
>     {
>         float4 position : POSITION;
>         float4 color : COLOR;
>     };
>
>     struct VS2PS
>     {
>         float4 position : SV_POSITION;
>         float4 color[2] : WHATEVER;
>     };
>
>     VS2PS vs(IA2VS input)
>     {
>         VS2PS result;
>         int i = int(input.position.x);
>         result.position = input.position;
>         result.color[i] = input.color;
>         return result;
>     }
>
>     float4 ps(VS2PS input) : SV_TARGET
>     {
>         return input.color[0];
>     }
[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library
https://bugs.freedesktop.org/show_bug.cgi?id=62571

Maarten Lankhorst <m.b.lankho...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #2 from Maarten Lankhorst <m.b.lankho...@gmail.com> ---
First of all, this is probably a bug in Scientific Linux's packaging.

Second, are you sure that it doesn't exist?

    ldd /usr/lib/dri/nouveau_dri.so

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library
https://bugs.freedesktop.org/show_bug.cgi?id=62571

--- Comment #3 from Jesus Cortez <jesus.corte...@gmail.com> ---
(In reply to comment #2)
> First of all, this is a bug with scientific linux' packaging probably.
>
> Second, are you sure that it doesn't exist?
>
>     ldd /usr/lib/dri/nouveau_dri.so

Yes, nouveau_dri.so is completely missing from the machine. Just to verify that it was the upgrade, I ran

    yum -y upgrade mesa-libGL-devel mesa-libGL mesa-dri-drivers

and then glxinfo, and sure enough, the problem came back.

--
You are receiving this mail because:
You are the assignee for the bug.
[Mesa-dev] [PATCH 1/4] i965: Move brw_vs_prog_data::outputs_written into VUE map.
Future patches will allow for there to be separate VUE maps when both a geometry shader and a vertex shader are in use. When this happens, we will want to have correspondingly separate outputs_written bitfields. Moving outputs_written into the VUE map will make this easy.

For consistency with the terminology used in the VUE map, the bitfield is renamed to slots_valid in the process.
---
 src/mesa/drivers/dri/i965/brw_clip.c           |  2 +-
 src/mesa/drivers/dri/i965/brw_context.h        |  8 +++-
 src/mesa/drivers/dri/i965/brw_gs.c             |  2 +-
 src/mesa/drivers/dri/i965/brw_sf.c             |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  9 -
 src/mesa/drivers/dri/i965/brw_vs.c             | 23 ---
 src/mesa/drivers/dri/i965/brw_wm.c             |  2 +-
 7 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
index d411208..e20f7c2 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -146,7 +146,7 @@ brw_upload_clip_prog(struct brw_context *brw)
    /* BRW_NEW_REDUCED_PRIMITIVE */
    key.primitive = brw->intel.reduced_primitive;
    /* CACHE_NEW_VS_PROG (also part of VUE map) */
-   key.attrs = brw->vs.prog_data->outputs_written;
+   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
    /* _NEW_LIGHT */
    key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
    key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index 9f1aaf5..fe6e639 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -354,6 +354,13 @@ typedef enum
  */
 struct brw_vue_map {
    /**
+    * Bitfield representing all varying slots that are (a) stored in this VUE
+    * map, and (b) actually written by the shader.  Does not include any of
+    * the additional varying slots defined in brw_varying_slot.
+    */
+   GLbitfield64 slots_valid;
+
+   /**
     * Map from gl_varying_slot value to VUE slot.  For gl_varying_slots that are
     * not stored in a slot (because they are not written, or because
     * additional processing is applied before storing them in the VUE), the
@@ -437,7 +444,6 @@ struct brw_vs_prog_data {
    GLuint curb_read_length;
    GLuint urb_read_length;
    GLuint total_grf;
-   GLbitfield64 outputs_written;
    GLuint nr_params;       /**< number of float params/constants */
    GLuint nr_pull_params; /**< number of dwords referenced by pull_param[] */
    GLuint total_scratch;
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c b/src/mesa/drivers/dri/i965/brw_gs.c
index 1328984..e755a10 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -167,7 +167,7 @@ static void populate_key( struct brw_context *brw,
    memset(key, 0, sizeof(*key));

    /* CACHE_NEW_VS_PROG (part of VUE map) */
-   key->attrs = brw->vs.prog_data->outputs_written;
+   key->attrs = brw->vs.prog_data->vue_map.slots_valid;

    /* BRW_NEW_PRIMITIVE */
    key->primitive = brw->primitive;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
index fdc6bd7..c8b7033 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -145,7 +145,7 @@ brw_upload_sf_prog(struct brw_context *brw)
    /* Populate the key, noting state dependencies:
     */
    /* CACHE_NEW_VS_PROG */
-   key.attrs = brw->vs.prog_data->outputs_written;
+   key.attrs = brw->vs.prog_data->vue_map.slots_valid;

    /* BRW_NEW_REDUCED_PRIMITIVE */
    switch (brw->intel.reduced_primitive) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 60575d7..b0a0dd6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2402,7 +2402,7 @@ void
 vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
 {
    if (intel->gen < 6 &&
-       ((c->prog_data.outputs_written & BITFIELD64_BIT(VARYING_SLOT_PSIZ)) ||
+       ((c->prog_data.vue_map.slots_valid & VARYING_BIT_PSIZ) ||
         c->key.userclip_active || brw->has_negative_rhw_bug)) {
       dst_reg header1 = dst_reg(this, glsl_type::uvec4_type);
       dst_reg header1_w = header1;
@@ -2411,7 +2411,7 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)

       emit(MOV(header1, 0u));

-      if (c->prog_data.outputs_written & BITFIELD64_BIT(VARYING_SLOT_PSIZ)) {
+      if (c->prog_data.vue_map.slots_valid & VARYING_BIT_PSIZ) {
          src_reg psiz = src_reg(output_reg[VARYING_SLOT_PSIZ]);

          current_annotation = "Point size";
@@ -2456,7 +2456,7 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
       emit(MOV(retype(reg, BRW_REGISTER_TYPE_UD), 0u));
    } else {
       emit(MOV(retype(reg, BRW_REGISTER_TYPE_D), src_reg(0)));
-      if (c->prog_data.outputs_written
[Mesa-dev] [PATCH 2/4] i965: Create a pointer in brw_context to the geometry output VUE map.
Currently, the GPU pipeline has one active VUE map in effect at any given time: the one representing the layout of vertex data coming from the vertex shader. However, when geometry shaders are added, they will have their own independent VUE map. Later pipeline stages (clip, sf, fs) will need to consult the geometry shader VUE map if a geometry shader is in use, and the vertex shader VUE map otherwise.

This patch adds a new field to brw_context, vue_map_geom_out, which points to whichever VUE map should be used by later pipeline stages. It also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is signalled whenever this pointer changes.

Since we don't support geometry shaders yet, vue_map_geom_out is currently set only by the brw_vs_prog state atom.
---
 src/mesa/drivers/dri/i965/brw_context.h      | 12
 src/mesa/drivers/dri/i965/brw_state_upload.c |  1 +
 src/mesa/drivers/dri/i965/brw_vs.c           |  4
 3 files changed, 17 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
index fe6e639..7ad78f5 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -153,6 +153,7 @@ enum brw_state_id {
    BRW_STATE_PROGRAM_CACHE,
    BRW_STATE_STATE_BASE_ADDRESS,
    BRW_STATE_SOL_INDICES,
+   BRW_STATE_VUE_MAP_GEOM_OUT,
 };

 #define BRW_NEW_URB_FENCE               (1 << BRW_STATE_URB_FENCE)
@@ -182,6 +183,7 @@ enum brw_state_id {
 #define BRW_NEW_PROGRAM_CACHE           (1 << BRW_STATE_PROGRAM_CACHE)
 #define BRW_NEW_STATE_BASE_ADDRESS      (1 << BRW_STATE_STATE_BASE_ADDRESS)
 #define BRW_NEW_SOL_INDICES             (1 << BRW_STATE_SOL_INDICES)
+#define BRW_NEW_VUE_MAP_GEOM_OUT        (1 << BRW_STATE_VUE_MAP_GEOM_OUT)

 struct brw_state_flags {
    /** State update flags signalled by mesa internals */
@@ -917,6 +919,16 @@ struct brw_context
       uint32_t offset;
    } sampler;

+   /**
+    * Layout of vertex data exiting the geometry portion of the pipleine.
+    * This comes from the geometry shader if one exists, otherwise from the
+    * vertex shader.
+    *
+    * BRW_NEW_VUE_MAP_GEOM_OUT is flagged when this pointer (or the data it
+    * points to) changes.
+    */
+   const struct brw_vue_map *vue_map_geom_out;
+
    struct {
       struct brw_vs_prog_data *prog_data;

diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 41dfdc3..5c5c05e 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -376,6 +376,7 @@ static struct dirty_bit_map brw_bits[] = {
    DEFINE_BIT(BRW_NEW_PROGRAM_CACHE),
    DEFINE_BIT(BRW_NEW_STATE_BASE_ADDRESS),
    DEFINE_BIT(BRW_NEW_SOL_INDICES),
+   DEFINE_BIT(BRW_NEW_VUE_MAP_GEOM_OUT),
    {0, 0, 0}
 };

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index d875703..214730d 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -314,6 +314,8 @@ do_vs_prog(struct brw_context *brw,
                     program, program_size,
                     &c.prog_data, sizeof(c.prog_data),
                     &brw->vs.prog_offset, &brw->vs.prog_data);
+   brw->vue_map_geom_out = &brw->vs.prog_data->vue_map;
+   brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
    ralloc_free(mem_ctx);

    return true;
@@ -488,6 +490,8 @@ static void brw_upload_vs_prog(struct brw_context *brw)
       assert(success);
    }
+   brw->vue_map_geom_out = &brw->vs.prog_data->vue_map;
+   brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
 }

 /* See brw_vs.c:
--
1.8.2
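
The selection rule described in the commit message (use the geometry shader's VUE map if a geometry shader is active, otherwise the vertex shader's) can be sketched as below. This is a simplified, self-contained model and not driver code: every demo_ name is hypothetical, and the dirty flag is raised only when the pointer actually changes, whereas the patch itself sets the flag unconditionally from the VS state atom.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for brw_vue_map and brw_context. */
struct demo_vue_map {
    uint64_t slots_valid;
};

#define DEMO_NEW_VUE_MAP_GEOM_OUT (1u << 0)

struct demo_context {
    struct demo_vue_map vs_map;
    struct demo_vue_map gs_map;
    bool gs_active;
    const struct demo_vue_map *vue_map_geom_out;
    uint32_t dirty;
};

/* Point vue_map_geom_out at the map of the last geometry-processing stage
 * and flag the change so later stages (clip, sf, fs) re-read it. */
static void demo_update_geom_out(struct demo_context *ctx)
{
    const struct demo_vue_map *map =
        ctx->gs_active ? &ctx->gs_map : &ctx->vs_map;

    if (ctx->vue_map_geom_out != map) {
        ctx->vue_map_geom_out = map;
        ctx->dirty |= DEMO_NEW_VUE_MAP_GEOM_OUT;
    }
}
```

Later stages then only ever dereference vue_map_geom_out and never need to know which shader stage produced the layout.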
[Mesa-dev] [PATCH 3/4] i965: Use brw.vue_map_geom_out instead of VS output VUE map where appropriate.
This patch modifies post-GS pipeline stages (transform feedback, clip, sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather than brw->vs.prog_data->vue_map. This ensures that when geometry shader support is added, these pipeline stages will consult the geometry shader output VUE map when appropriate, rather than the vertex shader output VUE map.
---
 src/mesa/drivers/dri/i965/brw_clip.c       |  7 +++
 src/mesa/drivers/dri/i965/brw_sf.c         |  7 +++
 src/mesa/drivers/dri/i965/brw_state.h      |  2 +-
 src/mesa/drivers/dri/i965/brw_wm.c         |  6 +++---
 src/mesa/drivers/dri/i965/gen6_sf_state.c  | 10 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c  |  8
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 14 +++---
 7 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c b/src/mesa/drivers/dri/i965/brw_clip.c
index e20f7c2..bc0ebb5 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -69,7 +69,7 @@ static void compile_clip_prog( struct brw_context *brw,
    c.func.single_program_flow = 1;

    c.key = *key;
-   c.vue_map = brw->vs.prog_data->vue_map;
+   c.vue_map = *brw->vue_map_geom_out;

    /* nr_regs is the number of registers filled by reading data from the VUE.
     * This program accesses the entire VUE, so nr_regs needs to be the size of
@@ -146,7 +146,7 @@ brw_upload_clip_prog(struct brw_context *brw)
    /* BRW_NEW_REDUCED_PRIMITIVE */
    key.primitive = brw->intel.reduced_primitive;
    /* CACHE_NEW_VS_PROG (also part of VUE map) */
-   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
+   key.attrs = brw->vue_map_geom_out->slots_valid;
    /* _NEW_LIGHT */
    key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
    key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
@@ -258,8 +258,7 @@ const struct brw_tracked_state brw_clip_prog = {
                 _NEW_TRANSFORM |
                 _NEW_POLYGON |
                 _NEW_BUFFERS),
-      .brw   = (BRW_NEW_REDUCED_PRIMITIVE),
-      .cache = CACHE_NEW_VS_PROG
+      .brw   = (BRW_NEW_REDUCED_PRIMITIVE | BRW_NEW_VUE_MAP_GEOM_OUT)
    },
    .emit = brw_upload_clip_prog
 };
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c b/src/mesa/drivers/dri/i965/brw_sf.c
index c8b7033..d90c0bc 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -65,7 +65,7 @@ static void compile_sf_prog( struct brw_context *brw,
    brw_init_compile(brw, &c.func, mem_ctx);

    c.key = *key;
-   c.vue_map = brw->vs.prog_data->vue_map;
+   c.vue_map = *brw->vue_map_geom_out;
    if (c.key.do_point_coord) {
       /*
        * gl_PointCoord is a FS instead of VS builtin variable, thus it's
@@ -145,7 +145,7 @@ brw_upload_sf_prog(struct brw_context *brw)
    /* Populate the key, noting state dependencies:
     */
    /* CACHE_NEW_VS_PROG */
-   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
+   key.attrs = brw->vue_map_geom_out->slots_valid;

    /* BRW_NEW_REDUCED_PRIMITIVE */
    switch (brw->intel.reduced_primitive) {
@@ -216,8 +216,7 @@ const struct brw_tracked_state brw_sf_prog = {
    .dirty = {
       .mesa  = (_NEW_HINT | _NEW_LIGHT | _NEW_POLYGON | _NEW_POINT |
                 _NEW_TRANSFORM | _NEW_BUFFERS | _NEW_PROGRAM),
-      .brw   = (BRW_NEW_REDUCED_PRIMITIVE),
-      .cache = CACHE_NEW_VS_PROG
+      .brw   = (BRW_NEW_REDUCED_PRIMITIVE | BRW_NEW_VUE_MAP_GEOM_OUT)
    },
    .emit = brw_upload_sf_prog
 };
diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index 02ce57b..1f5e18a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -227,7 +227,7 @@ void upload_default_color(struct brw_context *brw,

 /* gen6_sf_state.c */
 uint32_t
-get_attr_override(struct brw_vue_map *vue_map, int urb_entry_read_offset,
+get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
                   int fs_attr, bool two_side_color, uint32_t *max_source_attr);

 #ifdef __cplusplus
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index e7e9ddc..d121dbf 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -481,7 +481,7 @@ static void brw_wm_populate_key( struct brw_context *brw,

    /* CACHE_NEW_VS_PROG */
    if (intel->gen < 6)
-      key->vp_outputs_written = brw->vs.prog_data->vue_map.slots_valid;
+      key->vp_outputs_written = brw->vue_map_geom_out->slots_valid;

    /* The unique fragment program ID */
    key->program_string_id = fp->id;
@@ -524,8 +524,8 @@ const struct brw_tracked_state brw_wm_prog = {
                 _NEW_MULTISAMPLE),
       .brw   = (BRW_NEW_FRAGMENT_PROGRAM |
                 BRW_NEW_WM_INPUT_DIMENSIONS |
-                BRW_NEW_REDUCED_PRIMITIVE),
-      .cache = CACHE_NEW_VS_PROG,
+                BRW_NEW_REDUCED_PRIMITIVE |
+                BRW_NEW_VUE_MAP_GEOM_OUT)
    },
    .emit =
[Mesa-dev] [PATCH 4/4] i965/fs: Rename vp_outputs_written to input_slots_valid.
With the introduction of geometry shaders, fragment inputs will no longer come exclusively from the vertex shader; sometimes they come from the geometry shader. So the name vp_outputs_written will become a misnomer.

This patch renames vp_outputs_written to input_slots_valid, to reflect the true meaning of the bitfield from the fragment shader's point of view: it indicates which of the possible input slots contain valid data that was written by the previous shader stage.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++---
 src/mesa/drivers/dri/i965/brw_wm.c   | 6 +++---
 src/mesa/drivers/dri/i965/brw_wm.h   | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5a5bfeb..ecce66b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1264,7 +1264,7 @@ fs_visitor::calculate_urb_setup()
          if (i == VARYING_SLOT_PSIZ)
             continue;

-         if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
+         if (c->key.input_slots_valid & BITFIELD64_BIT(i)) {
             /* The back color slot is skipped when the front color is
              * also written to.  In addition, some slots can be
              * written in the vertex shader and not read in the
@@ -2995,7 +2995,7 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
    }

    if (intel->gen < 6)
-      key.vp_outputs_written |= BITFIELD64_BIT(VARYING_SLOT_POS);
+      key.input_slots_valid |= BITFIELD64_BIT(VARYING_SLOT_POS);

    for (int i = 0; i < VARYING_SLOT_MAX; i++) {
       if (!(fp->Base.InputsRead & BITFIELD64_BIT(i)))
@@ -3006,7 +3006,7 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)

       if (intel->gen < 6) {
          if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-            key.vp_outputs_written |= BITFIELD64_BIT(i);
+            key.input_slots_valid |= BITFIELD64_BIT(i);
       }
    }

diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
index d121dbf..a8f2a3a2 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -289,8 +289,8 @@ brw_wm_debug_recompile(struct brw_context *brw,
                       old_key->proj_attrib_mask, key->proj_attrib_mask);
    found |= key_debug(intel, "renderbuffer height",
                       old_key->drawable_height, key->drawable_height);
-   found |= key_debug(intel, "vertex shader outputs",
-                      old_key->vp_outputs_written, key->vp_outputs_written);
+   found |= key_debug(intel, "input slots valid",
+                      old_key->input_slots_valid, key->input_slots_valid);

    found |= brw_debug_recompile_sampler_key(intel, &old_key->tex, &key->tex);

@@ -481,7 +481,7 @@ static void brw_wm_populate_key( struct brw_context *brw,

    /* CACHE_NEW_VS_PROG */
    if (intel->gen < 6)
-      key->vp_outputs_written = brw->vue_map_geom_out->slots_valid;
+      key->input_slots_valid = brw->vue_map_geom_out->slots_valid;

    /* The unique fragment program ID */
    key->program_string_id = fp->id;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.h b/src/mesa/drivers/dri/i965/brw_wm.h
index 8eb71de..f43d42c 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.h
+++ b/src/mesa/drivers/dri/i965/brw_wm.h
@@ -70,7 +70,7 @@ struct brw_wm_prog_key {
    GLbitfield64 proj_attrib_mask; /**< one bit per fragment program attribute */

    GLushort drawable_height;
-   GLbitfield64 vp_outputs_written;
+   GLbitfield64 input_slots_valid;
    GLuint program_string_id:32;

    struct brw_sampler_prog_key_data tex;
--
1.8.2
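
For readers unfamiliar with the bitfield being renamed, the following sketch shows how a 64-bit "slots valid" mask is typically tested, in the spirit of the calculate_urb_setup() loop touched by the patch. It is an illustration under stated assumptions: the demo_ names and the slot numbers are made up (the real values come from Mesa's gl_varying_slot enum), and DEMO_BITFIELD64_BIT only mirrors what Mesa's BITFIELD64_BIT macro is understood to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per varying slot. */
#define DEMO_BITFIELD64_BIT(b) (UINT64_C(1) << (b))

/* Hypothetical slot numbers, chosen only for this example. */
enum demo_slot {
    DEMO_SLOT_POS = 0,
    DEMO_SLOT_PSIZ = 1,
    DEMO_SLOT_VAR0 = 2,
    DEMO_SLOT_MAX = 64
};

/* True if the previous shader stage wrote valid data to this slot. */
static bool demo_slot_valid(uint64_t input_slots_valid, unsigned slot)
{
    return (input_slots_valid & DEMO_BITFIELD64_BIT(slot)) != 0;
}

/* Count the slots a fragment shader could consume, skipping point size
 * in the same way the loop in the patch skips VARYING_SLOT_PSIZ. */
static unsigned demo_count_fs_inputs(uint64_t input_slots_valid)
{
    unsigned count = 0;

    for (unsigned i = 0; i < DEMO_SLOT_MAX; i++) {
        if (i == DEMO_SLOT_PSIZ)
            continue;
        if (demo_slot_valid(input_slots_valid, i))
            count++;
    }
    return count;
}
```

Because the mask only says which slots hold valid data, the same test works whether the producer was a vertex shader or a geometry shader, which is the point of the rename.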
Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server
On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
> On 03/10/2013 11:24 AM, Stefan Brüns wrote:
>> Hi everyone,
>>
>> I have run into a problem leading to a crashing X server for a bunch of indirect GLX calls. This problem affects any OpenGL clients using indirect rendering and calling functions which are outside the ABI. Problems range from malfunctioning to crashes.
>>
>> Problem analysis: The dispatcher functions in glx/indirect_dispatch.c demarshall the function arguments from the GLX wire protocol and then call the appropriate function of the GL library. Function calls are done using a dispatch table with the help of the CALL_* helper macros defined in dispatch.h.
>>
>> Unfortunately there is a mismatch between the dispatch table expected by the xserver (which follows the layout e.g. found in glapitable.h) and the one provided by the GL library, e.g. Mesa.
>
> The dependency is the other way around. The loader (either libGL on the client or the GLX extension in the server) dictates the layout of the dispatch table. The GL driver is required to adapt to the layout dictated by the loader. That's the whole reason the remap table exists. The driver queries the loader to learn where functions are located in the dispatch table. It then stores the dispatch table locations at (fixed) locations in the remap table. So, the driver knows that glFoo is at location 824 in the remap table, and that entry stores the location of glFoo in the dispatch table.
>
> It sounds like something else is going wrong.

Currently this obviously cannot work. The remap table is only used when FEATURE_remap_table is defined, which for the X server is never true.

Now, let's try defining FEATURE_remap_table for the xserver (which breaks OS X and Windows builds of Mesa, but anyway ...)

Even then, the lookup of indexes for the remap table is going wrong. The xserver uses the remap table slots defined in the dispatch.h exported from some past version of Mesa, i.e.

    #define TexBufferARB_remap_index 174

Now the remap table is populated from _mesa_init_remap_table in mesa/main/remap.c, which calls:

    _mesa_do_init_remap_table(_mesa_function_pool,
                              driDispatchRemapTable_size,
                              MESA_remap_table_functions);

Now let's have a look into MESA_remap_table_functions[174], which should have the entry for TexBufferARB, and yes, it's there, so everything is fine!

But wait, do another test: ClientWaitSync is slot 178 in the remap table, at least if you ask the X server, but slot 185 if you ask Mesa ...

So obviously the remap table is filled with the wrong values.

Regards,

Stefan

--
Stefan Brüns / Bergstraße 21 / 52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019
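
The indirection Ian describes in the quoted text (the loader dictates dispatch offsets; the driver records them at fixed remap-table slots) can be sketched as follows. This is a toy model under stated assumptions, not the Mesa implementation: all demo_ names, the loader table and its offsets are hypothetical, and only the two function names are taken from the thread.

```c
#include <stddef.h>
#include <string.h>

/* The loader's view: function name -> offset in its dispatch table. */
struct demo_loader_entry {
    const char *name;
    int offset;
};

/* Ask the "loader" where a function lives; -1 if it does not know it. */
static int demo_loader_offset(const struct demo_loader_entry *table,
                              size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0)
            return table[i].offset;
    }
    return -1;
}

/* The driver's view: fixed remap indices, baked in at driver build time. */
enum demo_remap_index {
    DEMO_REMAP_TEX_BUFFER,
    DEMO_REMAP_CLIENT_WAIT_SYNC,
    DEMO_REMAP_COUNT
};

static const char *const demo_remap_names[DEMO_REMAP_COUNT] = {
    "glTexBufferARB",
    "glClientWaitSync",
};

/* Fill each fixed remap slot with the offset the loader dictates. */
static void demo_init_remap_table(int remap[DEMO_REMAP_COUNT],
                                  const struct demo_loader_entry *table,
                                  size_t count)
{
    for (int i = 0; i < DEMO_REMAP_COUNT; i++)
        remap[i] = demo_loader_offset(table, count, demo_remap_names[i]);
}
```

In this model the remap indices are private to whoever compiled the enum. Code built against a different enum ordering would read the wrong slot, which is one way to picture the 178 versus 185 discrepancy reported above.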
Re: [Mesa-dev] [PATCH 1/2] build libgallium shared by default.
On Tuesday, 19 March 2013, 21:36:47, Andreas Boll wrote:
> 2013/3/19 Johannes Obermayr <johannesoberm...@gmx.de>:
>> On Monday, 18 March 2013, 15:38:31, Maarten Lankhorst wrote:
>>> This is one of the 2 patches used in ubuntu for decreasing size of mesa build. The other one is more hacky, and links libmesagallium into libgallium, and then links libgallium against libdricore too for minimal duplication.
>>
>> I am against both patches:
>>
>> 1. libgallium shared in this version causes duplicate symbols for depending targets if using static LLVM libs, and generally links too many LLVM components/libs
>> 2. libmesagallium shared in a right implementation unconditionally depends on shared libglapi and shared libgallium to avoid duplicate symbols for depending targets
>> 3. It is not -no-undefined but -Wl,--no-undefined to show missing symbols (and currently there are a lot of them in Mesa) ... This is because libtool is broken.
>> 4. I have worked to target the issues of 1. to 3. in a bottom-up series since December while splitting mesa into libmesacore, libmesadri and libmesagallium to reduce binary sizes as much as possible for distributions
>
> Hi Johannes,
>
> any chance you could continue the work on shared libs? We all have the same goals: reduce binary sizes, fix undefined symbols, reduce the number of build configurations, support make dist and make distcheck. Long story short, improve mesa's build system. This time we have more time until the next mesa release to work out all issues.

I have not stopped this work, mainly for my own ego and research (currently it works for my test cases and should be almost finished for all cases).

But I am not really sure whether I will publish the patches, because my general experience has been sad whenever my work was to be pushed to mainline repositories: core devs complained, somebody reinvented the wheel some months later and/or recognized my first approach wasn't so wrong ... Also, asking and begging core devs several times to get patches pushed is not something I want to do anymore.

I know: if it works for my common test cases, it isn't guaranteed that it will work for all cases. But you can find most issues only if patches have landed in git master and are tested by more people / configure switches. The automake work is a good example: people didn't test branches although they were asked to do so, and first complained when configure switches in master were broken after the big push. But you should have seen during that time that my interest was, and is, to quickly fix build failures caused by the automake work ...

If you ensure that core developers agree with unconditionally shared libs and with the "Drop last parts of compatibility for the old Mesa build system." patch, and that the patch series will generally be pushed within a week after being published for testing, it becomes more likely that I will publish the patch series.

Johannes

> Andreas.
>
>> If 4. will be finished right, this patch should become obsolete:
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=cf69a591e1ad16b590c9ae2eba0da6fa6c4fc741
>>
>> And also most of the C++ linker forces will become obsolete. But pushing things like
>> http://cgit.freedesktop.org/mesa/mesa/commit/?id=2506b035031d6022fec0465bffac8eedd43de0f9
>> without saying in which cases it is required (e.g. not for me) doesn't make it easier to achieve less memory consumption ...
>>
>> Johannes
Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server
On 03/20/2013 02:43 PM, Stefan Brüns wrote:
> On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
>> On 03/10/2013 11:24 AM, Stefan Brüns wrote:
>>> Hi everyone,
>>>
>>> I have run into a problem leading to a crashing X server for a bunch of indirect GLX calls. This problem affects any OpenGL clients using indirect rendering and calling functions which are outside the ABI. Problems range from malfunctioning to crashes.
>>>
>>> Problem analysis:
>>> The dispatcher functions in glx/indirect_dispatch.c demarshall the function arguments from the GLX wire protocol and then call the appropriate function of the GL library. Function calls are done using the dispatch table with the help of the CALL_* helper macros defined in dispatch.h.
>>>
>>> Unfortunately there is a mismatch between the dispatch table expected by the xserver (which follows the layout found e.g. in glapitable.h) and the one provided by the GL library, e.g. Mesa.
>>
>> The dependency is the other way around. The loader (either libGL on the client or the GLX extension in the server) dictates the layout of the dispatch table. The GL driver is required to adapt to the layout dictated by the loader. That's the whole reason the remap table exists. The driver queries the loader to learn where functions are located in the dispatch table. It then stores the dispatch table locations at (fixed) locations in the remap table. So, the driver knows that glFoo is at location 824 in the remap table, and that entry stores the location of glFoo in the dispatch table.
>>
>> It sounds like something else is going wrong.
>
> Currently this obviously cannot work. The remap table is only used when FEATURE_remap_table is defined, which for the X server is never true.
>
> Now, let's try defining FEATURE_remap_table for the xserver (which breaks OS X and Windows builds of Mesa, but anyway ...)

Xserver has nothing to do with it. The remap table is entirely in the driver (*_dri.so). The xserver has no knowledge about the remap table whatsoever. The xserver only knows about the dispatch table, and it dictates where things are in that table.

It's been a long time since I wrote this code, but I haven't been able to kill off all the memories of it yet. :)

> Even then, the lookup of indexes for the remap table is going wrong. The xserver uses the remap table slots defined in the dispatch.h exported from some past version of Mesa, i.e. #define TexBufferARB_remap_index 174.
>
> Now the remap table is populated from _mesa_init_remap_table in mesa/main/remap.c, which calls:
>
>     _mesa_do_init_remap_table(_mesa_function_pool, driDispatchRemapTable_size, MESA_remap_table_functions);
>
> Now let's have a look into MESA_remap_table_functions[174], which should have the entry for TexBufferARB, and yes, it's there, so everything is fine! But wait, do another test: ClientWaitSync is slot 178 in the remap table if you ask the X server, but slot 185 if you ask Mesa ... So obviously the remap table is filled with the wrong values.
>
> Regards,
>
> Stefan
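The indirection Ian describes (fixed remap indices in the driver, dispatch offsets dictated by the loader) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Mesa's code: `Loader`, `get_dispatch_offset`, `init_remap_table`, the two-entry enum and all slot numbers are hypothetical, and the sketch does not model the X server side that Stefan is asking about.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical loader: it alone decides where each GL entry point lives
// in the dispatch table.
struct Loader {
    std::map<std::string, int> dispatch_offsets;

    // Returns the dispatch table slot for a function, or -1 if unknown.
    int get_dispatch_offset(const std::string &name) const {
        auto it = dispatch_offsets.find(name);
        return it == dispatch_offsets.end() ? -1 : it->second;
    }
};

// Driver side: remap indices are fixed when the driver is built
// (hypothetical values, only two entries for brevity).
enum RemapIndex {
    TexBufferARB_remap_index = 0,
    ClientWaitSync_remap_index = 1,
    REMAP_COUNT
};

static const char *const remap_names[REMAP_COUNT] = {
    "glTexBufferARB",
    "glClientWaitSync",
};

// At init time the driver asks the loader where each function lives and
// stores that dispatch offset at the function's fixed remap index.
std::vector<int> init_remap_table(const Loader &loader) {
    std::vector<int> remap(REMAP_COUNT, -1);
    for (int i = 0; i < REMAP_COUNT; i++)
        remap[i] = loader.get_dispatch_offset(remap_names[i]);
    return remap;
}
```

In this model the same driver produces different remap table contents for two loaders with different dispatch layouts, while the remap indices themselves never change.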
Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server
On Wednesday 20 March 2013 15:47:24 you wrote:
> On 03/20/2013 02:43 PM, Stefan Brüns wrote:
>> On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
>>> On 03/10/2013 11:24 AM, Stefan Brüns wrote:
>>>> Hi everyone,
>>>>
>>>> I have run into a problem leading to a crashing X server for a bunch of indirect GLX calls. This problem affects any OpenGL clients using indirect rendering and calling functions which are outside the ABI. Problems range from malfunctioning to crashes.
>>>>
>>>> Problem analysis:
>>>> The dispatcher functions in glx/indirect_dispatch.c demarshall the function arguments from the GLX wire protocol and then call the appropriate function of the GL library. Function calls are done using the dispatch table with the help of the CALL_* helper macros defined in dispatch.h.
>>>>
>>>> Unfortunately there is a mismatch between the dispatch table expected by the xserver (which follows the layout found e.g. in glapitable.h) and the one provided by the GL library, e.g. Mesa.
>>>
>>> The dependency is the other way around. The loader (either libGL on the client or the GLX extension in the server) dictates the layout of the dispatch table. The GL driver is required to adapt to the layout dictated by the loader. That's the whole reason the remap table exists. The driver queries the loader to learn where functions are located in the dispatch table. It then stores the dispatch table locations at (fixed) locations in the remap table. So, the driver knows that glFoo is at location 824 in the remap table, and that entry stores the location of glFoo in the dispatch table.
>>>
>>> It sounds like something else is going wrong.
>>
>> Currently this obviously cannot work. The remap table is only used when FEATURE_remap_table is defined, which for the X server is never true.
>>
>> Now, let's try defining FEATURE_remap_table for the xserver (which breaks OS X and Windows builds of Mesa, but anyway ...)
>
> Xserver has nothing to do with it. The remap table is entirely in the driver (*_dri.so). The xserver has no knowledge about the remap table whatsoever. The xserver only knows about the dispatch table, and it dictates where things are in that table.
>
> It's been a long time since I wrote this code, but I haven't been able to kill off all the memories of it yet. :)

Please, look at the code again as I have done! The X server code found in xorg/glx/indirect_dispatch.c directly calls into the dispatch table, and it uses the offsets found in xorg/glx/dispatch.h.

Regards,

Stefan

-- 
Stefan Brüns / Bergstraße 21 / 52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019
Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server
On Wednesday 20 March 2013 15:47:24 you wrote:
> On 03/20/2013 02:43 PM, Stefan Brüns wrote:
>> On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
>>> On 03/10/2013 11:24 AM, Stefan Brüns wrote:
>>>> Hi everyone,
>>>>
>>>> I have run into a problem leading to a crashing X server for a bunch of indirect GLX calls. This problem affects any OpenGL clients using indirect rendering and calling functions which are outside the ABI. Problems range from malfunctioning to crashes.
>>>>
>>>> Problem analysis:
>>>> The dispatcher functions in glx/indirect_dispatch.c demarshall the function arguments from the GLX wire protocol and then call the appropriate function of the GL library. Function calls are done using the dispatch table with the help of the CALL_* helper macros defined in dispatch.h.
>>>>
>>>> Unfortunately there is a mismatch between the dispatch table expected by the xserver (which follows the layout found e.g. in glapitable.h) and the one provided by the GL library, e.g. Mesa.
>>>
>>> The dependency is the other way around. The loader (either libGL on the client or the GLX extension in the server) dictates the layout of the dispatch table. The GL driver is required to adapt to the layout dictated by the loader. That's the whole reason the remap table exists. The driver queries the loader to learn where functions are located in the dispatch table. It then stores the dispatch table locations at (fixed) locations in the remap table. So, the driver knows that glFoo is at location 824 in the remap table, and that entry stores the location of glFoo in the dispatch table.
>>>
>>> It sounds like something else is going wrong.
>>
>> Currently this obviously cannot work. The remap table is only used when FEATURE_remap_table is defined, which for the X server is never true.
>>
>> Now, let's try defining FEATURE_remap_table for the xserver (which breaks OS X and Windows builds of Mesa, but anyway ...)
>
> Xserver has nothing to do with it. The remap table is entirely in the driver (*_dri.so). The xserver has no knowledge about the remap table whatsoever. The xserver only knows about the dispatch table, and it dictates where things are in that table.
>
> It's been a long time since I wrote this code, but I haven't been able to kill off all the memories of it yet. :)

Just one addition: the current code in the xserver inlines the various CALL_* invocations. Maybe inlining this code was never intended?

Regards,

Stefan

-- 
Stefan Brüns / Bergstraße 21 / 52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019
Re: [Mesa-dev] [PATCH v2 03/15] glsl: parse in/out types for interface blocks
Jordan Justen jordan.l.jus...@intel.com writes: Previously only 'uniform' was allowed for uniform blocks. Now, in/out can be parsed, but it will only be allowed for GLSL = 150. basic_interface_block: - UNIFORM NEW_IDENTIFIER '{' member_list '}' instance_name_opt ';' + interface_qualifier NEW_IDENTIFIER '{' member_list '}' instance_name_opt ';' { ast_interface_block *const block = $6; block-block_name = $2; block-declarations.push_degenerate_list_at_head( $4-link); -if (!state-ARB_uniform_buffer_object_enable) { - _mesa_glsl_error( @1, state, -#version 140 / GL_ARB_uniform_buffer_object -required for defining uniform blocks\n); -} else if (state-ARB_uniform_buffer_object_warn) { - _mesa_glsl_warning( @1, state, - #version 140 / GL_ARB_uniform_buffer_object - required for defining uniform blocks\n); +if ($1.flags.q.uniform) { + if (!state-ARB_uniform_buffer_object_enable) { + _mesa_glsl_error( @1, state, + #version 140 / GL_ARB_uniform_buffer_object + required for defining uniform blocks\n); + } else if (state-ARB_uniform_buffer_object_warn) { + _mesa_glsl_warning( @1, state, + #version 140 / GL_ARB_uniform_buffer_object + required for defining uniform blocks\n); + } +} else { + if (state-es_shader || state-language_version 150) { + _mesa_glsl_error( @1, state, + #version 150 required for using + interface blocks.\n); + } } /* Since block arrays require names, and both features are added in @@ -1937,10 +1946,39 @@ basic_interface_block: blocks with an instance name\n); } +unsigned interface_type_mask, interface_type_flags; +struct ast_type_qualifier temp_type_qualifier; + +temp_type_qualifier.flags.i = 0; +temp_type_qualifier.flags.q.uniform = true; +temp_type_qualifier.flags.q.in = true; +temp_type_qualifier.flags.q.out = true; +interface_type_mask = temp_type_qualifier.flags.i; +interface_type_flags = $1.flags.i interface_type_mask; +block-layout.flags.i |= interface_type_flags; Given that an interface_qualifier ($1) only has either uniform, in, or out set, I don't 
see why this masking is needed. pgpHn7EY1psNf.pgp Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 05/15] glsl parser: on desktop GL require GLSL 150 for instance names
Jordan Justen jordan.l.jus...@intel.com writes:

> Interface blocks in GLSL 150 allow an instance name to be used.
>
> Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
> ---
>  src/glsl/glsl_parser.yy | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
> index 8e6b04d..1fd8cc2 100644
> --- a/src/glsl/glsl_parser.yy
> +++ b/src/glsl/glsl_parser.yy
> @@ -1953,11 +1953,16 @@ basic_interface_block:
>         * the same language versions, we don't have to explicitly
>         * version-check both things.
>         */
> -      if (block->instance_name != NULL &&
> -          !(state->language_version == 300 && state->es_shader)) {
> -         _mesa_glsl_error(& @1, state,
> -                          "#version 300 es required for using uniform "
> -                          "blocks with an instance name\n");
> +      if (block->instance_name != NULL) {
> +         if(state->es_shader && state->language_version < 300) {
             ^ missing space
Re: [Mesa-dev] [PATCH v2 06/15] glsl parser: allow in out for interface block members
Jordan Justen jordan.l.jus...@intel.com writes:

> Previously uniform blocks allowed for the 'uniform' keyword to be used with members of a uniform block. With interface blocks, 'in' can be used on 'in' interface block members and 'out' can be used on 'out' interface block members.
>
> The basic_interface_block rule will verify that the same qualifier type is used with the block and each member.

> -      type->qualifier = $1;
> -      type->qualifier.flags.q.uniform = true;
> -      type->specifier = $3;
> +      if (!type->qualifier.merge_qualifier(& @1, state, $1)) {
> +         YYERROR;
> +      }
> +
> +      if (type->qualifier.flags.q.attribute) {
> +         _mesa_glsl_error(& @1, state,
> +                          "keyword 'attribute' cannot be used with "
> +                          "interface block member\n");
> +      } else if (type->qualifier.flags.q.varying) {
> +         _mesa_glsl_error(& @1, state,
> +                          "keyword 'varying' cannot be used with "
> +                          "interface block member\n");
> +      }

I think some more qualifiers are getting allowed now, are they all intentional?

- invariant
- smooth
- flat
- noperspective

Could 7/15 get easily moved before this one, so that we don't allow uniforms in our in/out blocks at this commit?

I'm done for the evening. Patches 1-5 are (other than minor comments):

Reviewed-by: Eric Anholt e...@anholt.net
[Mesa-dev] i965 varying-index uniforms improvement
https://bugs.freedesktop.org/show_bug.cgi?id=61554

It's had more "me too"s than I would have expected, so I've marked all but 2 incidental patches as candidates for 9.1. It's also fairly invasive, so I'm quite uncomfortable doing so. I've tested on gm45, snb, and ivb so far, and it seems to be working, though. The previous iteration of the IVB changes has been confirmed to fix the regression, and I hope to hear back on pre-IVB soon.

The branch is at fs-varying-uniform-gen4 of my tree.
[Mesa-dev] [PATCH 01/13] i965/fs: Allow constant propagation into MACH.
This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step.
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 194ed07..2d0391a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -261,6 +261,7 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
             progress = true;
             break;
 
+         case BRW_OPCODE_MACH:
          case BRW_OPCODE_MUL:
          case BRW_OPCODE_ADD:
             if (i == 1) {
@@ -268,10 +269,11 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
                progress = true;
             } else if (i == 0 && inst->src[1].file != IMM) {
                /* Fit this constant in by commuting the operands.
-                * Exception: we can't do this for 32-bit integer MUL
+                * Exception: we can't do this for 32-bit integer MUL/MACH
                 * because it's asymmetric.
                 */
-               if (inst->opcode == BRW_OPCODE_MUL &&
+               if ((inst->opcode == BRW_OPCODE_MUL ||
+                    inst->opcode == BRW_OPCODE_MACH) &&
                    (inst->src[1].type == BRW_REGISTER_TYPE_D ||
                     inst->src[1].type == BRW_REGISTER_TYPE_UD))
                   break;
-- 
1.7.10.4
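The rule this patch extends to MACH can be modelled in isolation. The sketch below is a simplified stand-in, not the driver code: `Opcode`, `RegType`, `Src`, `Inst` and this free-standing `try_constant_propagate` are hypothetical types and names chosen for illustration. It shows an immediate going straight into source 1, or into source 0 by commuting the operands, except for 32-bit integer MUL/MACH where commuting is not allowed.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the driver's instruction types.
enum Opcode { OP_ADD, OP_MUL, OP_MACH };
enum RegType { TYPE_F, TYPE_D, TYPE_UD };

struct Src {
    bool is_imm;   // true if this source is an immediate constant
    RegType type;
    int id;        // register number or immediate value
};

struct Inst {
    Opcode op;
    Src src[2];
};

// Try to replace source i of inst with the immediate imm.
// Returns true if the instruction was changed.
bool try_constant_propagate(Inst &inst, int i, Src imm) {
    if (i == 1) {
        // An immediate is always acceptable as the second source.
        inst.src[1] = imm;
        return true;
    }
    if (i == 0 && !inst.src[1].is_imm) {
        // Fit the constant in by commuting the operands, except for
        // 32-bit integer MUL/MACH, whose operands are asymmetric.
        bool asymmetric =
            (inst.op == OP_MUL || inst.op == OP_MACH) &&
            (inst.src[1].type == TYPE_D || inst.src[1].type == TYPE_UD);
        if (asymmetric)
            return false;
        inst.src[0] = inst.src[1];
        inst.src[1] = imm;
        return true;
    }
    return false;
}
```

With this model, an integer MACH still accepts a constant in source 1; only the commuting path is refused.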
[Mesa-dev] [PATCH 02/13] i965/fs: Remove creation of a MOV instruction that's never used.
We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5a5bfeb..2fb8989 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -253,7 +253,6 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
    } else {
       instructions.push_tail(MUL(mrf, offset, fs_reg(4)));
    }
-   inst = MOV(mrf, offset);
    inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
                                dst, surf_index);
    inst->header_present = header_present;
-- 
1.7.10.4
[Mesa-dev] [PATCH 03/13] i965/fs: Move varying uniform offset compuation into the helper func.
I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 16 +--- src/mesa/drivers/dri/i965/brw_fs.h |3 ++- src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |5 ++--- 3 files changed, 13 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 2fb8989..89b08e8 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -229,11 +229,15 @@ fs_visitor::CMP(fs_reg dst, fs_reg src0, fs_reg src1, uint32_t condition) exec_list fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index, - fs_reg offset) + fs_reg varying_offset, + uint32_t const_offset) { exec_list instructions; fs_inst *inst; + fs_reg offset = fs_reg(this, glsl_type::uint_type); + instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset))); + if (intel-gen = 7) { inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7, dst, surf_index, offset); @@ -1625,15 +1629,13 @@ fs_visitor::move_uniform_array_access_to_pull_constants() base_ir = inst-ir; current_annotation = inst-annotation; - fs_reg offset = fs_reg(this, glsl_type::int_type); - inst-insert_before(ADD(offset, *inst-src[i].reladdr, - fs_reg(pull_constant_loc[uniform] + -inst-src[i].reg_offset))); - fs_reg surf_index = fs_reg((unsigned)SURF_INDEX_FRAG_CONST_BUFFER); fs_reg temp = fs_reg(this, glsl_type::float_type); exec_list list = VARYING_PULL_CONSTANT_LOAD(temp, - surf_index, offset); + surf_index, + *inst-src[i].reladdr, + pull_constant_loc[uniform] + + inst-src[i].reg_offset); inst-insert_before(list); inst-src[i].file = temp.file; diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h index 254a534..76130b1 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.h +++ b/src/mesa/drivers/dri/i965/brw_fs.h @@ -294,7 +294,8 @@ public: fs_reg 
reg); exec_list VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index, -fs_reg offset); +fs_reg varying_offset, +uint32_t const_offset); bool run(); void setup_payload_gen4(); diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp index 735a33d..6b6af8d 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp @@ -650,9 +650,8 @@ fs_visitor::visit(ir_expression *ir) emit(SHR(base_offset, op[1], fs_reg(2))); for (int i = 0; i ir-type-vector_elements; i++) { -fs_reg offset = fs_reg(this, glsl_type::int_type); -emit(ADD(offset, base_offset, fs_reg(i))); -emit(VARYING_PULL_CONSTANT_LOAD(result, surf_index, offset)); +emit(VARYING_PULL_CONSTANT_LOAD(result, surf_index, +base_offset, i)); if (ir-type-base_type == GLSL_TYPE_BOOL) emit(CMP(result, result, fs_reg(0), BRW_CONDITIONAL_NZ)); -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/13] i965/fs: Avoid inappropriate optimization with regs_written > 1.
Right now we don't have anything with regs_written() > 1 and !inst->mlen, but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index fbe9e3a..f1b0789 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2087,6 +2087,12 @@ fs_visitor::compute_to_mrf()
                break;
             }
 
+            /* Things returning more than one register would need us to
+             * understand coalescing out more than one MOV at a time.
+             */
+            if (scan_inst->regs_written() > 1)
+               break;
+
             /* SEND instructions can't have MRF as a destination. */
             if (scan_inst->mlen)
                break;
-- 
1.7.10.4
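As a rough model of the check added here: when scanning backwards for the instruction that produced a GRF, so that its destination can be rewritten to the MRF directly, the scan gives up if that producer writes more than one register or is a SEND. The `Inst` struct and `find_coalesce_candidate` below are hypothetical simplifications, not the real compute_to_mrf() logic, which tracks considerably more state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified instruction record.
struct Inst {
    int dst_grf;       // GRF number written by this instruction
    int regs_written;  // how many consecutive registers it writes
    int mlen;          // non-zero for SEND-like message instructions
};

// Scan backwards from the MOV at mov_index for the instruction that wrote
// src_grf. Returns its index if its destination could be rewritten to the
// MRF, or -1 if the coalescing has to be abandoned.
int find_coalesce_candidate(const std::vector<Inst> &insts, int mov_index,
                            int src_grf) {
    for (int i = mov_index - 1; i >= 0; i--) {
        const Inst &scan = insts[i];
        if (scan.dst_grf != src_grf)
            continue;
        // A multi-register result would need more than one MOV coalesced.
        if (scan.regs_written > 1)
            return -1;
        // SEND instructions can't have an MRF as a destination.
        if (scan.mlen)
            return -1;
        return i;
    }
    return -1;
}
```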
[Mesa-dev] [PATCH 05/13] i965: Make the fragment shader pull constants index by dwords, not vec4s.
We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. --- src/mesa/drivers/dri/i965/brw_fs.cpp |5 - src/mesa/drivers/dri/i965/brw_state.h |5 - src/mesa/drivers/dri/i965/brw_vs_surface_state.c |2 +- src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 13 - src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |5 +++-- src/mesa/drivers/dri/intel/intel_context.h|5 +++-- 6 files changed, 19 insertions(+), 16 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 89b08e8..fbe9e3a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2483,10 +2483,13 @@ fs_visitor::lower_uniform_pull_constant_loads() continue; if (intel-gen = 7) { + /* The offset arg before was a vec4-aligned byte offset. We need to + * turn it into a dword offset. 
+ */ fs_reg const_offset_reg = inst-src[1]; assert(const_offset_reg.file == IMM const_offset_reg.type == BRW_REGISTER_TYPE_UD); - const_offset_reg.imm.u /= 16; + const_offset_reg.imm.u /= 4; fs_reg payload = fs_reg(this, glsl_type::uint_type); /* This is actually going to be a MOV, but since only the first dword diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h index 02ce57b..29ec276 100644 --- a/src/mesa/drivers/dri/i965/brw_state.h +++ b/src/mesa/drivers/dri/i965/brw_state.h @@ -187,11 +187,6 @@ void *brw_state_batch(struct brw_context *brw, void gen4_init_vtable_surface_functions(struct brw_context *brw); uint32_t brw_get_surface_tiling_bits(uint32_t tiling); uint32_t brw_get_surface_num_multisamples(unsigned num_samples); -void brw_create_constant_surface(struct brw_context *brw, -drm_intel_bo *bo, -uint32_t offset, -int width, -uint32_t *out_offset); uint32_t brw_format_for_mesa_format(gl_format mesa_format); diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c index 6c0b690..675a84c 100644 --- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c @@ -91,7 +91,7 @@ brw_upload_vs_pull_constants(struct brw_context *brw) const int surf = SURF_INDEX_VERT_CONST_BUFFER; intel-vtbl.create_constant_surface(brw, brw-vs.const_bo, 0, size, - brw-vs.surf_offset[surf]); + brw-vs.surf_offset[surf], false); brw-state.dirty.brw |= BRW_NEW_VS_CONSTBUF; } diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c index 98eed15..506ddf0 100644 --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c @@ -912,15 +912,16 @@ brw_update_texture_surface(struct gl_context *ctx, * Create the constant buffer surface. Vertex/fragment shader constants will be * read from this buffer with Data Port Read instructions/messages. 
*/ -void +static void brw_create_constant_surface(struct brw_context *brw, drm_intel_bo *bo, uint32_t offset, uint32_t size, - uint32_t *out_offset) + uint32_t *out_offset, +bool dword_pitch) { struct intel_context *intel = brw-intel; - uint32_t stride = 16; + uint32_t stride = dword_pitch ? 4 : 16; uint32_t elements = ALIGN(size, stride) / stride; const GLint w = elements - 1; uint32_t *surf; @@ -1089,7 +1090,8 @@ brw_upload_wm_pull_constants(struct brw_context *brw) drm_intel_gem_bo_unmap_gtt(brw-wm.const_bo); intel-vtbl.create_constant_surface(brw, brw-wm.const_bo, 0, size, - brw-wm.surf_offset[surf_index]); + brw-wm.surf_offset[surf_index], + true); brw-state.dirty.brw |= BRW_NEW_SURFACES; } @@ -1442,7 +1444,8 @@ brw_upload_ubo_surfaces(struct brw_context *brw, */ intel-vtbl.create_constant_surface(brw, bo, binding-Offset, bo-size - binding-Offset, -
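The arithmetic behind the pitch change can be checked with a small sketch. The helper names `align_up`, `surface_elements` and `byte_offset_to_index` are hypothetical and exist only for this illustration; the stride values 4 and 16 and the divide-by-stride step are taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Round v up to the next multiple of a (hypothetical helper).
constexpr uint32_t align_up(uint32_t v, uint32_t a) {
    return (v + a - 1) / a * a;
}

// Number of surface elements for a constant buffer of size_bytes, using
// either a dword (4-byte) or a vec4 (16-byte) pitch.
uint32_t surface_elements(uint32_t size_bytes, bool dword_pitch) {
    uint32_t stride = dword_pitch ? 4 : 16;
    return align_up(size_bytes, stride) / stride;
}

// Convert a byte offset into an element index for the chosen pitch. With a
// dword pitch any component of a vec4 can be the starting index.
uint32_t byte_offset_to_index(uint32_t byte_offset, bool dword_pitch) {
    return byte_offset / (dword_pitch ? 4 : 16);
}
```

For a 40-byte buffer this gives 3 elements with a vec4 pitch and 10 with a dword pitch, and a byte offset of 32 maps to index 2 or 8 respectively.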
[Mesa-dev] [PATCH 09/13] i965/fs: Clean up the setup of gen4 simd16 message destinations.
I think this makes it much more obvious what's going on here.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 6b6af8d..48c6df3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -916,11 +916,10 @@ fs_visitor::emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate,
        * this weirdness around to the expected layout.
        */
       orig_dst = dst;
-      const glsl_type *vec_type =
-         glsl_type::get_instance(ir->type->base_type, 4, 1);
-      dst = fs_reg(this, glsl_type::get_array_instance(vec_type, 2));
-      dst.type = intel->is_g4x ? brw_type_for_base_type(ir->type)
-                               : BRW_REGISTER_TYPE_F;
+      dst = fs_reg(GRF, virtual_grf_alloc(8),
+                   (intel->is_g4x ?
+                    brw_type_for_base_type(ir->type) :
+                    BRW_REGISTER_TYPE_F));
    }
 
    fs_inst *inst = NULL;
-- 
1.7.10.4
[Mesa-dev] [PATCH 08/13] i965/fs: Do CSE on gen7's varying-index pull constant loads.
This is our first CSE on a regs_written() 1 instruction, so it takes a bit of extra fixup. Reduces the number of loads on kwin's Lanczos shader from 12 to 2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554 NOTE: This is a candidate for the 9.1 branch. --- src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 43 ++ 1 file changed, 32 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp index 02642c9..c89da36 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp @@ -68,6 +68,7 @@ is_expression(const fs_inst *const inst) case BRW_OPCODE_MAD: case BRW_OPCODE_LRP: case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD: + case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7: case FS_OPCODE_CINTERP: case FS_OPCODE_LINTERP: return true; @@ -129,21 +130,41 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb) */ bool no_existing_temp = entry-tmp.file == BAD_FILE; if (no_existing_temp) { - entry-tmp = fs_reg(this, glsl_type::float_type); - entry-tmp.type = inst-dst.type; - - fs_inst *copy = new(ralloc_parent(inst)) - fs_inst(BRW_OPCODE_MOV, entry-generator-dst, entry-tmp); - entry-generator-insert_after(copy); - entry-generator-dst = entry-tmp; + int written = entry-generator-regs_written(); + + fs_reg orig_dst = entry-generator-dst; + fs_reg tmp = fs_reg(GRF, virtual_grf_alloc(written), + orig_dst.type); + entry-tmp = tmp; + entry-generator-dst = tmp; + + for (int i = 0; i written; i++) { + fs_inst *copy = MOV(orig_dst, tmp); + copy-force_writemask_all = + entry-generator-force_writemask_all; + entry-generator-insert_after(copy); + + orig_dst.reg_offset++; + tmp.reg_offset++; + } } /* dest - temp */ +int written = inst-regs_written(); +assert(written == entry-generator-regs_written()); assert(inst-dst.type == entry-tmp.type); - fs_inst *copy = new(ralloc_parent(inst)) - fs_inst(BRW_OPCODE_MOV, inst-dst, entry-tmp); -copy-force_writemask_all = 
inst-force_writemask_all; - inst-replace_with(copy); +fs_reg dst = inst-dst; +fs_reg tmp = entry-tmp; +fs_inst *copy; +for (int i = 0; i written; i++) { + copy = MOV(dst, tmp); + copy-force_writemask_all = inst-force_writemask_all; + inst-insert_before(copy); + + dst.reg_offset++; + tmp.reg_offset++; +} +inst-remove(); /* Appending an instruction may have changed our bblock end. */ if (inst == block-end) { -- 1.7.10.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
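The fixup for multi-register results comes down to emitting one MOV per written register while stepping both register offsets. A minimal model of that loop, using hypothetical `Reg` and `Mov` structs and a hypothetical `copy_registers` helper rather than the driver's fs_reg and fs_inst, might look like this:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a register reference and a MOV instruction.
struct Reg {
    int reg;         // virtual register number
    int reg_offset;  // which register within a multi-register allocation
};

struct Mov {
    Reg dst;
    Reg src;
};

// Build one MOV per register written, advancing both offsets in step, so a
// result spanning `written` registers is copied in full.
std::vector<Mov> copy_registers(Reg dst, Reg src, int written) {
    std::vector<Mov> movs;
    for (int i = 0; i < written; i++) {
        movs.push_back({dst, src});
        dst.reg_offset++;
        src.reg_offset++;
    }
    return movs;
}
```

A single-register expression degenerates to the one MOV that the pass emitted before this patch.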
[Mesa-dev] [PATCH 11/13] i965/fs: Don't double-emit SEND dependency workarounds at control flow.
We weren't setting needs_dep[i] in the loops, so we'd continue on to potentially add the same workaround MOVs to the later basic block boundaries, too. We can either set needs_dep[i] to exit through the normal path, or we can just return since we know we're done.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c128175..5d83e50 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2346,6 +2346,7 @@ fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
                inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
             }
          }
+         return;
       }
 
       bool scan_inst_16wide = (dispatch_width > 8 &&
@@ -2415,6 +2416,7 @@ fs_visitor::insert_gen4_post_send_dependency_workarounds(fs_inst *inst)
             if (needs_dep[i])
                scan_inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
          }
+         return;
       }
 
       /* Clear the flag for registers that actually got read (as expected). */
-- 
1.7.10.4
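The double emission can be reproduced in a toy model. The sketch below is not the driver code: `Inst`, `count_workaround_movs` and the `return_at_boundary` switch are hypothetical, and the scan ignores everything except control-flow boundaries. It only counts how many workaround MOVs would be emitted with and without the early return.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction record: only control-flow boundaries matter here.
struct Inst {
    bool is_control_flow;
};

// Walk backwards from a SEND and count the workaround MOVs that would be
// emitted at control-flow boundaries. With return_at_boundary == false the
// scan keeps going and resolves the same registers again at each earlier
// boundary, which is the behaviour the patch removes.
int count_workaround_movs(const std::vector<Inst> &insts, int send_index,
                          const std::vector<bool> &needs_dep,
                          bool return_at_boundary) {
    int emitted = 0;
    for (int i = send_index - 1; i >= 0; i--) {
        if (!insts[i].is_control_flow)
            continue;
        for (bool dep : needs_dep) {
            if (dep)
                emitted++;
        }
        if (return_at_boundary)
            return emitted;
    }
    return emitted;
}
```

With two boundaries before the SEND and two registers still needing resolution, the early return yields 2 MOVs, while the old behaviour yields 4.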
[Mesa-dev] [PATCH 07/13] i965/fs: Improve performance of varying-index uniform loads on IVB.
Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug.  This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, this improves performance of my old GLSL demo by 315% +/- 2%
(n=4).  This is a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp      | 29 -
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 27 ++-
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f1b0789..f4aa9f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -235,14 +235,33 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
    exec_list instructions;
    fs_inst *inst;
 
-   fs_reg offset = fs_reg(this, glsl_type::uint_type);
-   instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
-
    if (intel->gen >= 7) {
+      /* We have our constant surface use a pitch of 4 bytes, so our index can
+       * be any component of a vector, and then we load 4 contiguous
+       * components starting from that.
+       *
+       * We break down the const_offset to a portion added to the variable
+       * offset and a portion done using reg_offset, which means that if you
+       * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
+       * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
+       * CSE can later notice that those loads are all the same and eliminate
+       * the redundant ones.
+       */
+      fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
+      instructions.push_tail(ADD(vec4_offset,
+                                 varying_offset, const_offset & ~3));
+
+      fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
       inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
-                                  dst, surf_index, offset);
+                                  vec4_result, surf_index, vec4_offset);
       instructions.push_tail(inst);
+
+      vec4_result.reg_offset += const_offset & 3;
+      instructions.push_tail(MOV(dst, vec4_result));
    } else {
+      fs_reg offset = fs_reg(this, glsl_type::uint_type);
+      instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
+
       int base_mrf = 13;
       bool header_present = true;
 
@@ -313,7 +332,7 @@ fs_inst::equals(fs_inst *inst)
 int
 fs_inst::regs_written()
 {
-   if (is_tex())
+   if (is_tex() || opcode == FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7)
       return 4;
 
    /* The SINCOS and INT_DIV_QUOTIENT_AND_REMAINDER math functions return 2,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 712fef6..4b3c43f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -737,28 +737,29 @@ fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
           index.type == BRW_REGISTER_TYPE_UD);
    uint32_t surf_index = index.dw1.ud;
 
-   uint32_t msg_control, rlen, mlen;
+   uint32_t simd_mode, rlen, mlen;
    if (dispatch_width == 16) {
-      msg_control = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_16DWORDS;
-      mlen = rlen = 2;
+      mlen = 2;
+      rlen = 8;
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
    } else {
-      msg_control = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS;
-      mlen = rlen = 1;
+      mlen = 1;
+      rlen = 4;
+      simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
    }
 
    struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
    brw_set_dest(p, send, dst);
    brw_set_src0(p, send, offset);
-   if (intel->gen < 6)
-      send->header.destreg__conditionalmod = inst->base_mrf;
-   brw_set_dp_read_message(p, send,
+   brw_set_sampler_message(p, send,
                            surf_index,
-                           msg_control,
-                           GEN7_DATAPORT_DC_DWORD_SCATTERED_READ,
-                           BRW_DATAPORT_READ_TARGET_DATA_CACHE,
+                           0, /* LD message ignores sampler unit */
+                           GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+                           rlen,
                            mlen,
-                           inst->header_present,
-                           rlen);
+                           false, /* no header */
+                           simd_mode,
+
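For anyone following along, here is a rough standalone illustration of the offset split that the new comment in VARYING_PULL_CONSTANT_LOAD describes. It is only a sketch and not code from the patch: `OffsetSplit` and `split_const_offset` are made-up names, and the offsets are assumed to be in dwords as in the comment.

```cpp
#include <cassert>

// Hypothetical helper, not part of Mesa: splits a constant dword offset into
// the two pieces the patch uses.
struct OffsetSplit {
   unsigned vec4_base;   // const_offset & ~3: folded into the ADD with varying_offset
   unsigned component;   // const_offset & 3: becomes the reg_offset into the vec4 result
};

constexpr OffsetSplit split_const_offset(unsigned const_offset)
{
   return OffsetSplit{const_offset & ~3u, const_offset & 3u};
}

// The four components of one vec4 (const_offset 0..3) share a single load
// base, so their loads are identical and a later CSE pass can merge them.
static_assert(split_const_offset(0).vec4_base == split_const_offset(3).vec4_base,
              "components of one vec4 share a base");
static_assert(split_const_offset(6).vec4_base == 4, "second vec4 starts at dword 4");
static_assert(split_const_offset(6).component == 2, "dword 6 is component 2 of that vec4");
```

Only the component part differs between the four loads of a[i], and it is applied after the load as a register offset, which is why the loads themselves end up identical.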
[Mesa-dev] [PATCH 10/13] i965/fs: Bake regs_written into the IR instead of recomputing it later.
For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp               | 29 +++-
 src/mesa/drivers/dri/i965/brw_fs.h                 |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |  6 ++--
 .../drivers/dri/i965/brw_fs_live_variables.cpp     |  2 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |  8 +++---
 .../dri/i965/brw_fs_schedule_instructions.cpp      |  6 ++--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       |  7 +++--
 7 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f4aa9f7..c128175 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -60,6 +60,9 @@ fs_inst::init()
    this->src[0] = reg_undef;
    this->src[1] = reg_undef;
    this->src[2] = reg_undef;
+
+   /* This will be the case for almost all instructions. */
+   this->regs_written = 1;
 }
 
 fs_inst::fs_inst()
@@ -254,6 +257,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
       fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
       inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
                                   vec4_result, surf_index, vec4_offset);
+      inst->regs_written = 4;
       instructions.push_tail(inst);
 
       vec4_result.reg_offset += const_offset & 3;
@@ -329,26 +333,13 @@ fs_inst::equals(fs_inst *inst)
            offset == inst->offset);
 }
 
-int
-fs_inst::regs_written()
-{
-   if (is_tex() || opcode == FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7)
-      return 4;
-
-   /* The SINCOS and INT_DIV_QUOTIENT_AND_REMAINDER math functions return 2,
-    * but we don't currently use them...nor do we have an opcode for them.
-    */
-
-   return 1;
-}
-
 bool
 fs_inst::overwrites_reg(const fs_reg &reg)
 {
    return (reg.file == dst.file &&
            reg.reg == dst.reg &&
            reg.reg_offset >= dst.reg_offset &&
-           reg.reg_offset < dst.reg_offset + regs_written());
+           reg.reg_offset < dst.reg_offset + regs_written);
 }
 
 bool
@@ -1388,7 +1379,7 @@ fs_visitor::split_virtual_grfs()
       /* If there's a SEND message that requires contiguous destination
        * registers, no splitting is allowed.
        */
-      if (inst->regs_written() > 1) {
+      if (inst->regs_written > 1) {
          split_grf[inst->dst.reg] = false;
       }
    }
@@ -2109,7 +2100,7 @@ fs_visitor::compute_to_mrf()
             /* Things returning more than one register would need us to
              * understand coalescing out more than one MOV at a time.
              */
-            if (scan_inst->regs_written() > 1)
+            if (scan_inst->regs_written > 1)
                break;
 
             /* SEND instructions can't have MRF as a destination. */
@@ -2326,7 +2317,7 @@ void
 fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
 {
    int reg_size = dispatch_width / 8;
-   int write_len = inst->regs_written() * reg_size;
+   int write_len = inst->regs_written * reg_size;
    int first_write_grf = inst->dst.reg;
    bool needs_dep[BRW_MAX_MRF];
    assert(write_len < (int)sizeof(needs_dep) - 1);
@@ -2366,7 +2357,7 @@ fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
        * dependency has more latency than a MOV.
        */
       if (scan_inst->dst.file == GRF) {
-         for (int i = 0; i < scan_inst->regs_written(); i++) {
+         for (int i = 0; i < scan_inst->regs_written; i++) {
             int reg = scan_inst->dst.reg + i * reg_size;
 
             if (reg >= first_write_grf &&
@@ -2405,7 +2396,7 @@ fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
 void
 fs_visitor::insert_gen4_post_send_dependency_workarounds(fs_inst *inst)
 {
-   int write_len = inst->regs_written() * dispatch_width / 8;
+   int write_len = inst->regs_written * dispatch_width / 8;
    int first_write_grf = inst->dst.reg;
    bool needs_dep[BRW_MAX_MRF];
    assert(write_len < (int)sizeof(needs_dep) - 1);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 76130b1..0c5aad1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -174,7 +174,6 @@ public:
            fs_reg src0, fs_reg src1,fs_reg src2);
 
    bool equals(fs_inst *inst);
-   int regs_written();
    bool overwrites_reg(const fs_reg &reg);
    bool is_tex();
    bool is_math();
@@ -192,6 +191,7 @@ public:
    uint8_t flag_subreg;
 
    int mlen; /** SEND message length */
+   int regs_written; /** Number of vgrfs written by a SEND message, or 1 */
    int base_mrf; /** First MRF in the SEND message, if mlen is nonzero. */
    uint32_t texture_offset; /** Texture offset bitfield */
    int sampler;
diff --git
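To make the shape of this change easier to see outside the diff, here is a minimal sketch of an instruction that carries regs_written as data, set by whoever builds it, with the same range check as overwrites_reg(). `Reg` and `Inst` are simplified stand-ins I made up for illustration; they are not the driver's fs_reg and fs_inst.

```cpp
#include <cassert>

// Simplified stand-in for a register reference (hypothetical, not fs_reg).
struct Reg {
   int file;
   int reg;
   int reg_offset;
};

// Simplified stand-in for an IR instruction (hypothetical, not fs_inst).
struct Inst {
   Reg dst{0, 0, 0};
   int regs_written = 1;   // default for almost all instructions

   // Same range test as in the patch: does this instruction write the
   // register that r refers to?
   bool overwrites_reg(const Reg &r) const
   {
      return r.file == dst.file &&
             r.reg == dst.reg &&
             r.reg_offset >= dst.reg_offset &&
             r.reg_offset < dst.reg_offset + regs_written;
   }
};
```

Because the count is stored rather than derived from the opcode, a code path that knows it needs a longer return (such as the gen4 SIMD16-on-SIMD8 case mentioned in the commit message) can simply store the larger number when it creates the instruction.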
[Mesa-dev] [PATCH 12/13] i965/fs: Use LD messages for pre-gen7 varying-index uniform loads
This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14
on my GM45 forced to load all uniforms through the varying-index path),
but we get a whole vec4 at a time to reuse in the next commit.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp      | 92 ++---
 src/mesa/drivers/dri/i965/brw_fs.h        |  3 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp  |  1 +
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 55 +++--
 4 files changed, 84 insertions(+), 67 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5d83e50..e504e3a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -238,57 +238,53 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
    exec_list instructions;
    fs_inst *inst;
 
-   if (intel->gen >= 7) {
-      /* We have our constant surface use a pitch of 4 bytes, so our index can
-       * be any component of a vector, and then we load 4 contiguous
-       * components starting from that.
-       *
-       * We break down the const_offset to a portion added to the variable
-       * offset and a portion done using reg_offset, which means that if you
-       * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
-       * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
-       * CSE can later notice that those loads are all the same and eliminate
-       * the redundant ones.
-       */
-      fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
-      instructions.push_tail(ADD(vec4_offset,
-                                 varying_offset, const_offset & ~3));
-
-      fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
-      inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
-                                  vec4_result, surf_index, vec4_offset);
-      inst->regs_written = 4;
-      instructions.push_tail(inst);
-
-      vec4_result.reg_offset += const_offset & 3;
-      instructions.push_tail(MOV(dst, vec4_result));
-   } else {
-      fs_reg offset = fs_reg(this, glsl_type::uint_type);
-      instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
-
-      int base_mrf = 13;
-      bool header_present = true;
-
-      fs_reg mrf = fs_reg(MRF, base_mrf + header_present);
-      mrf.type = BRW_REGISTER_TYPE_D;
-
-      /* On gen6+ we want the dword offset passed in, but on gen4/5 we need a
-       * dword-aligned byte offset.
+   /* We have our constant surface use a pitch of 4 bytes, so our index can
+    * be any component of a vector, and then we load 4 contiguous
+    * components starting from that.
+    *
+    * We break down the const_offset to a portion added to the variable
+    * offset and a portion done using reg_offset, which means that if you
+    * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
+    * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
+    * CSE can later notice that those loads are all the same and eliminate
+    * the redundant ones.
+    */
+   fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
+   instructions.push_tail(ADD(vec4_offset,
+                              varying_offset, const_offset & ~3));
+
+   int scale = 1;
+   if (intel->gen == 4 && dispatch_width == 8) {
+      /* Pre-gen5, we can either use a SIMD8 message that requires (header,
+       * u, v, r) as parameters, or we can just use the SIMD16 message
+       * consisting of (header, u).  We choose the second, at the cost of a
+       * longer return length.
        */
-      if (intel->gen == 6) {
-         instructions.push_tail(MOV(mrf, offset));
-      } else {
-         instructions.push_tail(MUL(mrf, offset, fs_reg(4)));
-      }
-      inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
-                                  dst, surf_index);
-      inst->header_present = header_present;
-      inst->base_mrf = base_mrf;
-      inst->mlen = header_present + dispatch_width / 8;
+      scale = 2;
+   }
 
-      instructions.push_tail(inst);
+   enum opcode op;
+   if (intel->gen >= 7)
+      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
+   else
+      op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
+   fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type);
+   inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset);
+   inst->regs_written = 4 * scale;
+   instructions.push_tail(inst);
+
+   if (intel->gen < 7) {
+      inst->base_mrf = 13;
+      inst->header_present = true;
+      if (intel->gen == 4)
+         inst->mlen = 3;
+      else
+         inst->mlen = 1 + dispatch_width / 8;
    }
 
+   vec4_result.reg_offset += (const_offset & 3) * scale;
+   instructions.push_tail(MOV(dst, vec4_result));
+
    return instructions;
 }
 
@@ -766,7 +762,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
    case
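As a quick way to check the arithmetic in the hunk above, the following sketch pulls the size computations out into plain functions. This is only my reading of the patch, under the assumption that gen and dispatch_width are passed in as integers; the function names are invented and nothing here exists in the driver.

```cpp
#include <cassert>

// Hypothetical helpers mirroring the arithmetic in VARYING_PULL_CONSTANT_LOAD.

// gen4 in SIMD8 uses the SIMD16 message, so every component takes two
// registers of the result instead of one.
inline int load_scale(int gen, int dispatch_width)
{
   return (gen == 4 && dispatch_width == 8) ? 2 : 1;
}

// Value stored in inst->regs_written (and the size of the allocated vgrf).
inline int load_regs_written(int gen, int dispatch_width)
{
   return 4 * load_scale(gen, dispatch_width);
}

// Register offset of the wanted component inside the vec4 result.
inline int component_reg_offset(int gen, int dispatch_width, unsigned const_offset)
{
   return static_cast<int>(const_offset & 3u) * load_scale(gen, dispatch_width);
}

// Message length for the pre-gen7 path: header plus payload.
inline int pre_gen7_mlen(int gen, int dispatch_width)
{
   assert(gen < 7);
   return gen == 4 ? 3 : 1 + dispatch_width / 8;
}
```

For example, on gen4 SIMD8 the load writes 8 registers and component 2 lives at register offset 4, while on gen5 SIMD8 the same load writes 4 registers and component 2 lives at offset 2.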
[Mesa-dev] [PATCH 13/13] i965/fs: Allow CSE on pre-gen7 varying-index uniform loads
All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.

Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6.

(Due to a deficiency in constant propagation, I can't use the hack I did
in the previous commit to test the performance change)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp |  2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 6984e1a..dca75c6 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -97,7 +97,7 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
         inst = (fs_inst *) inst->next) {
 
       /* Skip some cases. */
-      if (is_expression(inst) && !inst->predicate && inst->mlen == 0 &&
+      if (is_expression(inst) && !inst->predicate &&
           !inst->force_uncompressed && !inst->force_sechalf &&
           !inst->conditional_mod)
       {
-- 
1.7.10.4
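A toy model of why letting these loads through CSE cuts down the number of SENDs, assuming (as in the earlier patches of this series) that each load fetches the whole vec4 at const_offset & ~3. This is not how opt_cse_local() is implemented; `PullLoad` and `count_sends_after_cse` are invented for illustration, and the model ignores basic-block boundaries and intervening writes.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical description of one varying-index pull constant load.
struct PullLoad {
   int surf_index;           // which constant surface is read
   int varying_offset_reg;   // register holding the variable part of the index
   unsigned const_offset;    // constant part of the index, in dwords
};

// Two loads are redundant when they read the same surface with the same
// varying offset and the same vec4 base, so count the distinct combinations.
inline std::size_t count_sends_after_cse(const std::vector<PullLoad> &loads)
{
   std::set<std::tuple<int, int, unsigned>> distinct;
   for (const PullLoad &load : loads) {
      distinct.insert(std::make_tuple(load.surf_index,
                                      load.varying_offset_reg,
                                      load.const_offset & ~3u));
   }
   return distinct.size();
}
```

In this model, reading all four components of a[i] needs a single SEND, and reading a[i] and a[i + 1] needs two; the 16 to 6 figure quoted in the commit message is the author's measurement on the kwin shader and is not something this toy reproduces.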