date:20130320

https://bugs.freedesktop.org/show_bug.cgi?id=61364

--- Comment #6 from Jerome Glisse <gli...@freedesktop.org> ---
Yeah, I saw the same issue on radeon.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Error while compiling the MAPI directory

2013-03-20 Thread Ritvik_Sharma

I am using Visual Studio. I found that all these missing constants, such as 
MAPI_TABLE_NUM_STATIC, get their values from mapi_abi.py. Since I am 
building for UEFI, I am writing [.inf] files and using them to generate the 
makefiles, rather than using the makefiles shipped in the Mesa kit. Could this 
be the reason I am getting the errors?
How do the Python functions initialize the C constants?
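
A minimal sketch of the mechanism being asked about, under stated assumptions. As far as can be told from the thread, the Python script does not set C constants at run time; it writes a C header at build time, and the compiler is told the header's name through a macro, so `#include MAPI_ABI_HEADER` is a "computed include". The names DEMO_ABI_HEADER and DEMO_TABLE_NUM_STATIC below are hypothetical stand-ins, and the value 16 is made up; the fallback to <limits.h> exists only so the sketch compiles on its own.

```c
/*
 * Sketch of a computed include. DEMO_ABI_HEADER is a hypothetical
 * stand-in for MAPI_ABI_HEADER; a real build would define the macro on
 * the compiler command line, e.g. -DDEMO_ABI_HEADER=\"generated.h\".
 */
#ifndef DEMO_ABI_HEADER
#define DEMO_ABI_HEADER <limits.h>   /* fallback so the sketch builds alone */
#endif

/* The preprocessor expands the macro first, then includes the result. */
#include DEMO_ABI_HEADER

/*
 * A generated header would normally supply constants such as this one.
 * The value here is invented for illustration.
 */
#ifndef DEMO_TABLE_NUM_STATIC
#define DEMO_TABLE_NUM_STATIC 16
#endif

/* Returns CHAR_BIT, which is only visible if the include above worked. */
int demo_bits_per_char(void)
{
    return CHAR_BIT;
}

/* Returns the illustrative table size taken from the "generated" macro. */
int demo_table_size(void)
{
    return DEMO_TABLE_NUM_STATIC;
}
```

If a custom build system never runs the generator and never defines the macro, the include line has nothing to expand to and every constant the header would have defined is missing, which would match the errors described.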

Thanks, 
Ritvik  

-----Original Message-----
From: Jose Fonseca [mailto:jfons...@vmware.com] 
Sent: Wednesday, March 20, 2013 8:00 PM
To: Sharma, Ritvik
Cc: mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] Error while compiling the MAPI directory

You're building with scons right?

Jose

----- Original Message -----
 Hi,
 
 I used the latest Mesa and I am still receiving the same errors. It 
 works perfectly fine on Ubuntu, though.
 
 Can somebody please tell me how the following constant in the file 
 mapi_tmp.h gets included?
 #include MAPI_ABI_HEADER
 
 Thanks,
 Ritvik
 
 -----Original Message-----
 From: Jose Fonseca [mailto:jfons...@vmware.com]
 Sent: Monday, March 18, 2013 11:29 PM
 To: Sharma, Ritvik
 Cc: mesa-dev@lists.freedesktop.org
 Subject: Re: [Mesa-dev] Error while compiling the MAPI directory
 
 ----- Original Message -----
  
  
  Hi,
  
  I am receiving the following error while compiling the code in the 
  mapi directory. I am using mesa 7.5.
 
 If you're compiling with MSVC I'd recommend using a recent Mesa 
 release and saving yourself a world of trouble. It's known to build well there.
 
 If you must use this old release, then you'll likely need to search 
 for the MSVC build fixes and cross-port them.
 
 Jose
 

Re: [Mesa-dev] [PATCH] st/egl: Fix build after changes in src/egl/wayland/

2013-03-20 Thread Kristian Høgsberg

I pushed a different fix for this.  The gallium egl code doesn't have
support for buffer sharing via fd passing so we can't just ask the
protocol code to advertise that, even if the kernel has the
DRM_CAP_PRIME features.  Instead we just pass 0 for the flags
argument.
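
A minimal sketch of the decision described above, assuming nothing beyond what the message and the quoted patch say. The constants are local stand-ins so the example builds without libdrm or Mesa headers; their numeric values are assumptions for illustration, and `demo_wayland_drm_flags` and `can_share_fds` are hypothetical names.

```c
#include <stdint.h>

/*
 * Local stand-ins so the sketch builds without libdrm or Mesa headers.
 * The values are assumptions; real code should use the definitions from
 * the libdrm and wayland-drm headers.
 */
#define DEMO_PRIME_CAP_IMPORT  0x1
#define DEMO_PRIME_CAP_EXPORT  0x2
#define DEMO_WAYLAND_DRM_PRIME 0x1

/*
 * ret:           return value of the capability query (0 means success).
 * cap:           capability bits reported by the kernel.
 * can_share_fds: nonzero if the caller implements buffer sharing via fds.
 */
uint32_t demo_wayland_drm_flags(int ret, uint64_t cap, int can_share_fds)
{
    const uint64_t both = DEMO_PRIME_CAP_IMPORT | DEMO_PRIME_CAP_EXPORT;

    /* A caller without fd passing must not advertise PRIME at all. */
    if (!can_share_fds)
        return 0;

    /* Otherwise advertise it only if the kernel can import and export. */
    if (ret == 0 && cap == both)
        return DEMO_WAYLAND_DRM_PRIME;

    return 0;
}
```

With `can_share_fds` set to 0 the function always returns 0, which corresponds to passing 0 for the flags argument regardless of what the kernel reports.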

thanks,
Kristian

On Tue, Mar 19, 2013 at 6:07 AM, Michel Dänzer <mic...@daenzer.net> wrote:
 From: Michel Dänzer <michel.daen...@amd.com>

 Not sure it actually works though, some buffer callbacks seem to have rotted
 before.

 Signed-off-by: Michel Dänzer <michel.daen...@amd.com>
 ---
  src/gallium/state_trackers/egl/drm/native_drm.c |  8 +++-
  src/gallium/state_trackers/egl/wayland/native_drm.c |  8 +++-
  src/gallium/state_trackers/egl/x11/native_dri2.c| 14 +-
  3 files changed, 27 insertions(+), 3 deletions(-)

 diff --git a/src/gallium/state_trackers/egl/drm/native_drm.c 
 b/src/gallium/state_trackers/egl/drm/native_drm.c
 index f0c0f54..65c91cf 100644
 --- a/src/gallium/state_trackers/egl/drm/native_drm.c
 +++ b/src/gallium/state_trackers/egl/drm/native_drm.c
 @@ -207,13 +207,19 @@ drm_display_bind_wayland_display(struct native_display *ndpy,
                                   struct wl_display *wl_dpy)
  {
     struct drm_display *drmdpy = drm_display(ndpy);
 +   int ret, flags = 0;
 +   uint64_t cap;

     if (drmdpy->wl_server_drm)
        return FALSE;

 +   ret = drmGetCap(drmdpy->fd, DRM_CAP_PRIME, &cap);
 +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
 +      flags |= WAYLAND_DRM_PRIME;
 +
     drmdpy->wl_server_drm = wayland_drm_init(wl_dpy,
           drmdpy->device_name,
 -         &wl_drm_callbacks, ndpy);
 +         &wl_drm_callbacks, ndpy, flags);

     if (!drmdpy->wl_server_drm)
        return FALSE;
 diff --git a/src/gallium/state_trackers/egl/wayland/native_drm.c 
 b/src/gallium/state_trackers/egl/wayland/native_drm.c
 index 3801fac..7633379 100644
 --- a/src/gallium/state_trackers/egl/wayland/native_drm.c
 +++ b/src/gallium/state_trackers/egl/wayland/native_drm.c
 @@ -265,13 +265,19 @@ wayland_drm_display_bind_wayland_display(struct native_display *ndpy,
                                           struct wl_display *wl_dpy)
  {
     struct wayland_drm_display *drmdpy = wayland_drm_display(ndpy);
 +   int ret, flags = 0;
 +   uint64_t cap;

     if (drmdpy->wl_server_drm)
        return FALSE;

 +   ret = drmGetCap(drmdpy->fd, DRM_CAP_PRIME, &cap);
 +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
 +      flags |= WAYLAND_DRM_PRIME;
 +
     drmdpy->wl_server_drm =
        wayland_drm_init(wl_dpy, drmdpy->device_name,
 -                       &wl_drm_callbacks, ndpy);
 +                       &wl_drm_callbacks, ndpy, flags);

     if (!drmdpy->wl_server_drm)
        return FALSE;
 diff --git a/src/gallium/state_trackers/egl/x11/native_dri2.c 
 b/src/gallium/state_trackers/egl/x11/native_dri2.c
 index a989f9e..67ecb60 100644
 --- a/src/gallium/state_trackers/egl/x11/native_dri2.c
 +++ b/src/gallium/state_trackers/egl/x11/native_dri2.c
 @@ -40,6 +40,7 @@

  #include "common/native_helper.h"
  #ifdef HAVE_WAYLAND_BACKEND
 +#include <xf86drm.h>
  #include "common/native_wayland_drm_bufmgr_helper.h"
  #endif

 @@ -63,6 +64,7 @@ struct dri2_display {
 struct util_hash_table *surfaces;
  #ifdef HAVE_WAYLAND_BACKEND
 struct wl_drm *wl_server_drm; /* for EGL_WL_bind_wayland_display */
 +   int fd;
  #endif
  };

 @@ -817,6 +819,10 @@ dri2_display_init_screen(struct native_display *ndpy)
return FALSE;
 }

 +#ifdef HAVE_WAYLAND_BACKEND
 +   dri2dpy->fd = fd;
 +#endif
 +
 return TRUE;
  }

 @@ -855,13 +861,19 @@ dri2_display_bind_wayland_display(struct native_display *ndpy,
                                    struct wl_display *wl_dpy)
  {
     struct dri2_display *dri2dpy = dri2_display(ndpy);
 +   int ret, flags = 0;
 +   uint64_t cap;

     if (dri2dpy->wl_server_drm)
        return FALSE;

 +   ret = drmGetCap(dri2dpy->fd, DRM_CAP_PRIME, &cap);
 +   if (ret == 0 && cap == (DRM_PRIME_CAP_IMPORT | DRM_PRIME_CAP_EXPORT))
 +      flags |= WAYLAND_DRM_PRIME;
 +
     dri2dpy->wl_server_drm = wayland_drm_init(wl_dpy,
           x11_screen_get_device_name(dri2dpy->xscr),
 -         &wl_drm_callbacks, ndpy);
 +         &wl_drm_callbacks, ndpy, flags);

     if (!dri2dpy->wl_server_drm)
        return FALSE;
 --
 1.8.2.rc3


Re: [Mesa-dev] [PATCH] i965: Add a driconf option to disable flush throttling.

On 19 March 2013 17:10, Eric Anholt <e...@anholt.net> wrote:

 Paul Berry <stereotype...@gmail.com> writes:

  Normally when submitting the first batch buffer after a flush, we
  check whether the GPU has completed processing of the first batch
  buffer of the previous frame.  If it hasn't, we wait for it to finish
  before submitting any more batches.  This prevents GPU-heavy and
  CPU-light applications from racing too far ahead of the current frame,
  but at the expense of possibly lower frame rates.  Sometimes when
  benchmarking we want to disable this mechanism.
 
  This patch adds the driconf option disable_throttling to disable the
  throttling mechanism.

 We often do our driver-specific options inside of intel_screen.c, but I
 suppose this way someone could potentially translate it.

 Reviewed-by: Eric Anholt <e...@anholt.net>

 Have you found any interesting cases where this is a problem?


I think Ken found a benchmark where there was a marginal improvement, but I
don't recall exactly what it was.
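
The throttling behaviour described in the quoted commit message can be sketched roughly as follows. This is a simplified model, not the driver's code: every name here (`demo_throttle`, `demo_submit`, `demo_flush`, `demo_count_waits`) is hypothetical, batches are plain integer ids that increase over time, and "waiting" is only counted, not performed.

```c
#include <stdbool.h>

/* Hypothetical state; none of these names come from the i965 driver. */
struct demo_throttle {
    bool disable_throttling; /* stands in for the driconf option */
    bool need_throttle;      /* set by a flush, consumed by the next batch */
    int  first_batch;        /* first batch since the last flush, -1 if none */
    int  waits;              /* how often we would have blocked on the GPU */
};

/* A flush marks the frame boundary: the next submission may throttle. */
void demo_flush(struct demo_throttle *t)
{
    t->need_throttle = true;
}

/*
 * Submit one batch. gpu_done is the id of the newest batch the GPU has
 * finished. Returns true if the submission would have had to wait.
 */
bool demo_submit(struct demo_throttle *t, int batch_id, int gpu_done)
{
    bool blocked = false;

    if (t->need_throttle) {
        /* Wait only if the previous frame's first batch is still running. */
        if (!t->disable_throttling &&
            t->first_batch >= 0 && t->first_batch > gpu_done) {
            blocked = true;
            t->waits++;
        }
        t->first_batch = -1;
        t->need_throttle = false;
    }

    /* Remember the first batch of the new frame for the next check. */
    if (t->first_batch < 0)
        t->first_batch = batch_id;

    return blocked;
}

/* Usage: three frames of two batches each, on a GPU that never catches up. */
int demo_count_waits(bool disable_throttling)
{
    struct demo_throttle t = { disable_throttling, false, -1, 0 };
    int gpu_done = 0;
    int id = 1;

    for (int frame = 0; frame < 3; frame++) {
        demo_submit(&t, id++, gpu_done);
        demo_submit(&t, id++, gpu_done);
        demo_flush(&t);
    }
    return t.waits;
}
```

In this model the CPU stalls once per frame after the first when throttling is on, and never when it is disabled, which is the trade-off between frame latency and frame rate that the patch makes configurable.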

[Mesa-dev] [PATCH] meta: fix incorrect slice, r coordinate computation

2013-03-20 Thread Brian Paul

The arithmetic to convert a 3D texture slice to an R coordinate was
incorrect.  Found when MSVC warned of a divide by zero.

Note that we don't actually ever hit this path.  We don't decompress
slices of 3D textures and we don't support 3D mipmap generation yet.
---
 src/mesa/drivers/common/meta.c |   13 +
 1 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/mesa/drivers/common/meta.c b/src/mesa/drivers/common/meta.c
index 1a1fd28..8114550 100644
--- a/src/mesa/drivers/common/meta.c
+++ b/src/mesa/drivers/common/meta.c
@@ -3118,6 +3118,7 @@ setup_texture_coords(GLenum faceTarget,
  GLint slice,
  GLint width,
  GLint height,
+ GLint depth,
  GLfloat coords0[3],
  GLfloat coords1[3],
  GLfloat coords2[3],
@@ -3134,8 +3135,11 @@ setup_texture_coords(GLenum faceTarget,
    case GL_TEXTURE_2D:
    case GL_TEXTURE_3D:
    case GL_TEXTURE_2D_ARRAY:
-      if (faceTarget == GL_TEXTURE_3D)
-         r = 1.0F / slice;
+      if (faceTarget == GL_TEXTURE_3D) {
+         assert(slice < depth);
+         assert(depth >= 1);
+         r = (slice + 0.5f) / depth;
+      }
       else if (faceTarget == GL_TEXTURE_2D_ARRAY)
          r = slice;
       else
@@ -3574,7 +3578,7 @@ _mesa_meta_GenerateMipmap(struct gl_context *ctx, GLenum 
target,
/* Setup texture coordinates */
setup_texture_coords(faceTarget,
 slice,
-0, 0, /* width, height never used here */
+0, 0, 1, /* width, height never used here */
 verts[0].tex,
 verts[1].tex,
 verts[2].tex,
@@ -3840,6 +3844,7 @@ decompress_texture_image(struct gl_context *ctx,
    struct gl_texture_object *texObj = texImage->TexObject;
    const GLint width = texImage->Width;
    const GLint height = texImage->Height;
+   const GLint depth = texImage->Depth;
const GLenum target = texObj-Target;
GLenum faceTarget;
struct vertex {
@@ -3935,7 +3940,7 @@ decompress_texture_image(struct gl_context *ctx,
       _mesa_BindSampler(ctx->Texture.CurrentUnit, decompress->Sampler);
}
 
-   setup_texture_coords(faceTarget, slice, width, height,
+   setup_texture_coords(faceTarget, slice, width, height, depth,
 verts[0].tex,
 verts[1].tex,
 verts[2].tex,
-- 
1.7.3.4
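
For reference, the corrected arithmetic can be exercised on its own. The sketch below assumes the usual convention that normalized texture coordinates run from 0 to 1 across the texture, so slice i of `depth` slices is centered at (i + 0.5) / depth; `demo_slice_to_r` is a hypothetical helper, not a function in meta.c.

```c
#include <assert.h>

/*
 * Normalized R coordinate of the center of one slice of a 3D texture
 * that has `depth` slices. The old expression, 1.0F / slice, divided by
 * zero for slice 0 and did not depend on the texture depth at all.
 */
float demo_slice_to_r(int slice, int depth)
{
    assert(depth >= 1);
    assert(slice >= 0 && slice < depth);
    return ((float)slice + 0.5f) / (float)depth;
}
```

For a texture with four slices this gives 0.125, 0.375, 0.625 and 0.875, and a single-slice texture samples at 0.5.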


Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Roland Scheidegger

On 20.03.2013 15:41, Christoph Bumiller wrote:
 Sorry, this has become longer than I anticipated ...
 
 I've been toying with adding support for TGSI_FILE_INPUT/OUTPUT arrays
 because, since I cannot allocate varyings in the same order that the
 register index specifies, I need it:
 
 ===
 EXAMPLE:
 OUT[0], CLIPDIST[1], must be allocated at address 0x2c0 in hardware
 output space
 OUT[1], CLIPDIST[0], 0x2d0
 OUT[2], GENERIC[0], between 0x80 and 0x280
 OUT[3], GENERIC[1], between 0x80 and 0x280
 
 And without array specification
 MOV OUT[TEMP[0].x-1], IMM[0]
 would leave me no clue as to whether use 0x80 or 0x2c0 as base address.
 ===
 
 Now that I'm on it, I'm considering going a step further and
 adding indirect scalar/component access.
 This is motivated by float gl_ClipDistance[], which, if accessed
 indirectly, currently leaves us no choice but to generate code like this:
 
 if ((index & 3) == 0) access x component; else
 if ((index & 3) == 1) access y component; ...
 
 This is undesirable and the hardware can do better (as it actually
 supports accessing individual components since address registers contain
 an address in bytes and we can do scalar read/write).
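
The two addressing views contrasted in this paragraph can be sketched as below. It assumes 4 components per register and 4 bytes per component, as implied by the "16 bytes / 4 components" remark elsewhere in the RFC; the function and type names are hypothetical.

```c
#include <stdint.h>

enum {
    DEMO_COMPONENTS_PER_REG  = 4,  /* x, y, z, w */
    DEMO_BYTES_PER_COMPONENT = 4   /* one 32-bit float */
};

struct demo_location {
    unsigned reg;   /* vec4 register index */
    unsigned comp;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/*
 * vec4 view: a scalar array index has to be split into a register and
 * a component, and the component selects one branch of the if-chain.
 */
struct demo_location demo_split_index(unsigned index)
{
    struct demo_location loc;
    loc.reg  = index >> 2;
    loc.comp = index & 3;
    return loc;
}

/*
 * Byte view: hardware whose address registers hold byte offsets can use
 * the scaled index directly, with no per-component branching.
 */
uint32_t demo_byte_offset(unsigned index)
{
    return index * DEMO_BYTES_PER_COMPONENT;
}
```

Both views name the same storage: for index 5 the split is register 1, component y, and 1 * 16 + 1 * 4 equals the byte offset 20.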
 
 A second motivation is varying packing, which is required by the GL
 spec, and may lead to use of TEMP arrays, which, albeit improved now,
 will impair performance when used (on nv50 they go to uncached memory
 which is very slow).
 
 That case occurs if, for instance, a varying float[8] is accessed
 indirectly and has to be packed into
 OUT[0..1].xyzw, GENERIC[0..1]
 instead of
 OUT[0..7].x, GENERIC[0..7]
 
 So far I've come up with 2 choices (all available only if the driver
 supports e.g. PIPE_CAP_TGSI_SCALAR_REGISTERS):
 
 
 1. SCALAR DECLARATIONS
 
 Using float gl_ClipDistance[8] as example, it could be declared as:
 
 OUT[0..7].x, CLIPDIST, ARRAY(1) where the .x now means that it's a
 single component per OUT[index]
 
 Now this obviously means that a single OUT[i] doesn't always consume 16
 bytes / 4 components anymore, which may be somewhat disturbing, since
 the address of an output can't be directly inferred solely from its
 index anymore.
 However, that doesn't really constitute a problem if all access is
 either direct or comes with an ARRAY() reference.
 
 For varying packing, which happens only for user defined variables, and
 hence TGSI_SEMANTIC_GENERIC, it gets a bit uglier:
 
 (NOTE: GL requires us to be able to support exactly the amount of
 components we report, failing due to alignment is not allowed. Hence the
 GLSL compiler may put some variables at unaligned locations, see
 ir_variable.location_frac):
 
 A GENERIC semantic index should always cover 4 components so that a
 fixed location can be assigned for it (drivers usually do this since it
 makes an extra dynamic linkage pass when shaders are changed
 unnecessary, as intended by GL_ARB_separate_shader_objects).
 
 So, this would be valid:
 OUT[0..3].x, GENERIC[0]
 OUT[4..5].xy, GENERIC[1]
 OUT[6], GENERIC[2]
 Note how 3 OUT[indices] only consume 1 GENERIC[index].
 
 If we, instead, allocated semantic index per register index instead of
 per 4 components, we would have:
 OUT[0..3].x, GENERIC[0]
 OUT[4..5].xy, GENERIC[4]
 OUT[6], GENERIC[6]
 This would waste space, since GENERIC[4,6] would have to go to
 output_space[addresses 0x40, 0x60] so it could link with
 IN[6], GENERIC[6]
 where we have no information about the size of GENERIC[0 .. 5], and
 wasting space like that means the advertised number of varying
 components cannot be satisfied.
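
The difference between the two allocation schemes can be checked with a small calculation. The sketch assumes 16 bytes per semantic index and that each declaration starts on a 4-component boundary, as in the example; the struct and function names are hypothetical.

```c
/* One output declaration: `num_regs` registers of `comps` components each. */
struct demo_decl {
    unsigned num_regs;
    unsigned comps;
};

/* The three declarations from the example: OUT[0..3].x, OUT[4..5].xy, OUT[6]. */
static const struct demo_decl demo_decls[3] = {
    { 4, 1 }, { 2, 2 }, { 1, 4 },
};

/* Scheme A: one semantic index per 4 components used so far. */
unsigned demo_semantic_per_vec4(const struct demo_decl *d, unsigned which)
{
    unsigned comps = 0;
    for (unsigned i = 0; i < which; i++)
        comps += d[i].num_regs * d[i].comps;
    return comps / 4;
}

/* Scheme B: one semantic index per register index used so far. */
unsigned demo_semantic_per_reg(const struct demo_decl *d, unsigned which)
{
    unsigned regs = 0;
    for (unsigned i = 0; i < which; i++)
        regs += d[i].num_regs;
    return regs;
}

/* Fixed output-space address of a semantic index, 16 bytes apart. */
unsigned demo_output_address(unsigned semantic_index)
{
    return semantic_index * 16u;
}
```

Scheme A yields GENERIC[0], [1], [2] for the three declarations, while scheme B yields GENERIC[0], [4], [6] and therefore addresses 0x40 and 0x60, leaving most of the space in between unused.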
 
 
 And as a last step, if varyings are placed at non-vec4 boundaries, we
 would have to be able to specify fractional semantic indices, like this:
 OUT[0..2].x, GENERIC[0].x
 OUT[3].x, GENERIC[0].w
 
 
 
 2. SCALAR ADDRESS REGISTER VALUES
 
 All this can be avoided by always declaring full vec4s, and adding the
 possibility of doing indirect addressing on a per-component basis:
 
 varying float a[4] becomes:
 uniform int i;
 a[i+5] = 999 becomes:
 
 OUT[0].xyzw, ARRAY(1)
 UARL_SCALAR ADDR[0].x, CONST[0].
 MOV OUT(array 1)[ADDR[0].x+1].y, IMM[0].
 
 The only difficulty with this is that we have to split TGSI
 instructions accessing unaligned vectors:
 (NOTE: this can always be avoided with TGSI_FILE_TEMPORARY, but varyings
 may have to be packed).
 
 With suggestion (1), 2 packed (and hence unaligned) vec3 arrays and a
 single vec2 would look like this:
 OUT[0..3].xyz, GENERIC[0].x
 OUT[4..5].xyz, GENERIC[3].x
 OUT[6].xy, GENERIC[4].zw
 and we could still do:
 ADD OUT[5].xyz, TEMP[0], TEMP[1]
 
 Now, these would have to be merged and declared as:
 OUT[0..4].xyzw
 
 and the 2nd vec3 would be { OUT[0].w, OUT[1].xyz }
 
 instead of simply OUT[1].xyz
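
Where a packed vector lands inside the merged vec4 declaration follows from its flat component offset. The sketch below assumes tight packing with no padding, which is what the example implies; `demo_packed_location` and its parameters are hypothetical names.

```c
#include <stdbool.h>

struct demo_packed_loc {
    unsigned reg;    /* vec4 register holding the first component */
    unsigned comp;   /* first component: 0 = x, 1 = y, 2 = z, 3 = w */
    bool     split;  /* true if the vector continues in the next register */
};

/*
 * Locate element k of an array of `width`-component vectors that is
 * packed tightly, starting at flat component offset base_comp.
 */
struct demo_packed_loc demo_packed_location(unsigned base_comp,
                                            unsigned width,
                                            unsigned k)
{
    unsigned start = base_comp + width * k;
    struct demo_packed_loc loc;

    loc.reg   = start / 4;
    loc.comp  = start % 4;
    loc.split = (loc.comp + width) > 4;   /* crosses a vec4 boundary */
    return loc;
}
```

For the second vec3 (k = 1, base 0) this gives register 0, component w, split across two registers, matching { OUT[0].w, OUT[1].xyz }. Any instruction touching such a vector has to be emitted as two partial writes.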
 
 A problem with this is that the GLSL compiler, while it can do the
 packing into vec4s and splitting up access, cannot, iirc, access
 individual components of a vec4 indirectly like TGSI would be able to.
 To avoid TEMP arrays we'd have to disable the last phase of

Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Christoph Bumiller

On 20.03.2013 17:05, Roland Scheidegger wrote:

Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Roland Scheidegger

On 20.03.2013 17:46, Christoph Bumiller wrote:

Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Christoph Bumiller

On 20.03.2013 18:30, Roland Scheidegger wrote:
 On 20.03.2013 17:46, Christoph Bumiller wrote:
 On 20.03.2013 17:05, Roland Scheidegger wrote:

 Not sure I fully understand this, but I'm thinking whenever in doubt,
 use something close to what dx10 does since that's likely going to work
 reasonable with different hw. Maybe declaring those special values
 differently (not just as output reg) would help?
 What DX10 does is making indirect access of varyings illegal. That's not
 possible with OpenGL ...
 Hmm I thought dcl_indexRange would be used for indirect access of varyings?

Interesting ... when last I tried that back when working on d3d1x, the
compiler didn't like it, and I remember something about indexRange
existing only for debugging (and I remember finding that strange).

Also, d3d11 doesn't have the annoying limit that GLSL has so there is no
need for it to pack varyings.
When I use floats[3] + SV_POSITION, I get "vs_5_0 output limit (32)
exceeded, shader uses 33 outputs", but float4[28] works just fine.

For indirect access I still get:
error X3500: array reference cannot be used as an l-value; not natively
addressable

for

struct IA2VS
{
    float4 position : POSITION;
    float4 color : COLOR;
};

struct VS2PS
{
    float4 position : SV_POSITION;
    float4 color[2] : WHATEVER;
};

VS2PS vs(IA2VS input)
{
    VS2PS result;
    int i = int(input.position.x);
    result.position = input.position;
    result.color[i] = input.color;
    return result;
}

float4 ps(VS2PS input) : SV_TARGET
{
    return input.color[0];
}

 Roland


Re: [Mesa-dev] Error while compiling the MAPI directory

2013-03-20 Thread Jose Fonseca

SCons builds via the Visual Studio compilers. So I assume by "I am using Visual 
Studio" you mean "no, I'm not using SCons"... 

I'd strongly recommend using scons instead of replicating the Mesa build with 
an MSVC project, as the Mesa build is extremely complex (a lot of code generation).

If you are determined to do your own thing, then build Mesa with scons _once_ 
while recording its output

   scons verbose=1 > scons.log 2>&1

then study the commands used to compile, and mimic them.

I'm afraid I can't help you further diagnose particular failures.  Life is too 
short.

Jose


[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library

https://bugs.freedesktop.org/show_bug.cgi?id=62571

Jesus Cortez <jesus.corte...@gmail.com> changed:

       What|Removed                       |Added
----------------------------------------------------------------------------
   Assignee|nouveau@lists.freedesktop.org |mesa-dev@lists.freedesktop.org
  Component|Drivers/DRI/nouveau           |Mesa core


Re: [Mesa-dev] RFC: TGSI scalar arrays

2013-03-20 Thread Roland Scheidegger

On 20.03.2013 19:09, Christoph Bumiller wrote:
 On 20.03.2013 18:30, Roland Scheidegger wrote:
 On 20.03.2013 17:46, Christoph Bumiller wrote:
 On 20.03.2013 17:05, Roland Scheidegger wrote:

 Not sure I fully understand this, but I'm thinking whenever in doubt,
 use something close to what dx10 does since that's likely going to work
 reasonable with different hw. Maybe declaring those special values
 differently (not just as output reg) would help?
 What DX10 does is making indirect access of varyings illegal. That's not
 possible with OpenGL ...
 Hmm I thought dcl_indexRange would be used for indirect access of varyings?
 
 Interesting ... when last I tried that back when working on d3d1x, the
 compiler didn't like it, and I remember something about indexRange
 existing only for debugging (and I remember finding that strange).
 
 Also, d3d11 doesn't have the annoying limit that GLSL has so there is no
 need for it to pack varyings.
 When I use floats[3] + SV_POSITION, I get vs_5_0 output limit (32)
 exceeded, shader uses 33 outputs, but float4[28] works just fine.
 
 For indirect access I still get:
 error X3500: array reference cannot be used as an l-value; not natively
 addressable
Hmm, that's odd. From a quick look I couldn't find many examples using
it, and those I found used it in hull shaders.
I can't see anything saying it shouldn't work at all, though (it does
have some restrictions, but they look reasonable to me).

 

[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library

https://bugs.freedesktop.org/show_bug.cgi?id=62571

Maarten Lankhorst <m.b.lankho...@gmail.com> changed:

       What|Removed                       |Added
----------------------------------------------------------------------------
     Status|NEW                           |RESOLVED
 Resolution|---                           |INVALID

--- Comment #2 from Maarten Lankhorst m.b.lankho...@gmail.com ---
First of all, this is probably a bug in Scientific Linux's packaging.

Second, are you sure that it doesn't exist?

ldd /usr/lib/dri/nouveau_dri.so


[Mesa-dev] [Bug 62571] Mesa 9.0 uses missing nouveau library

https://bugs.freedesktop.org/show_bug.cgi?id=62571

--- Comment #3 from Jesus Cortez jesus.corte...@gmail.com ---
(In reply to comment #2)
 First of all, this is probably a bug in Scientific Linux's packaging.
 
 Second, are you sure that it doesn't exist?
 
 ldd /usr/lib/dri/nouveau_dri.so

Yes, the nouveau_dri.so is completely missing from the machine.

Just to verify that it was the upgrade, I ran 

yum -y upgrade mesa-libGL-devel mesa-libGL mesa-dri-drivers

and then glxinfo, and sure enough, the problem came back.


[Mesa-dev] [PATCH 1/4] i965: Move brw_vs_prog_data::outputs_written into VUE map.

Future patches will allow for there to be separate VUE maps when both
a geometry shader and a vertex shader are in use.  When this happens,
we will want to have correspondingly separate outputs_written
bitfields.  Moving outputs_written into the VUE map will make this
easy.

For consistency with the terminology used in the VUE map, the bitfield
is renamed to slots_valid in the process.
---
 src/mesa/drivers/dri/i965/brw_clip.c   |  2 +-
 src/mesa/drivers/dri/i965/brw_context.h|  8 +++-
 src/mesa/drivers/dri/i965/brw_gs.c |  2 +-
 src/mesa/drivers/dri/i965/brw_sf.c |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp |  9 -
 src/mesa/drivers/dri/i965/brw_vs.c | 23 ---
 src/mesa/drivers/dri/i965/brw_wm.c |  2 +-
 7 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c 
b/src/mesa/drivers/dri/i965/brw_clip.c
index d411208..e20f7c2 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -146,7 +146,7 @@ brw_upload_clip_prog(struct brw_context *brw)
/* BRW_NEW_REDUCED_PRIMITIVE */
key.primitive = brw->intel.reduced_primitive;
/* CACHE_NEW_VS_PROG (also part of VUE map) */
-   key.attrs = brw->vs.prog_data->outputs_written;
+   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
/* _NEW_LIGHT */
key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index 9f1aaf5..fe6e639 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -354,6 +354,13 @@ typedef enum
  */
 struct brw_vue_map {
/**
+* Bitfield representing all varying slots that are (a) stored in this VUE
+* map, and (b) actually written by the shader.  Does not include any of
+* the additional varying slots defined in brw_varying_slot.
+*/
+   GLbitfield64 slots_valid;
+
+   /**
 * Map from gl_varying_slot value to VUE slot.  For gl_varying_slots that 
are
 * not stored in a slot (because they are not written, or because
 * additional processing is applied before storing them in the VUE), the
@@ -437,7 +444,6 @@ struct brw_vs_prog_data {
GLuint curb_read_length;
GLuint urb_read_length;
GLuint total_grf;
-   GLbitfield64 outputs_written;
GLuint nr_params;   /**< number of float params/constants */
GLuint nr_pull_params; /**< number of dwords referenced by pull_param[] */
GLuint total_scratch;
diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
b/src/mesa/drivers/dri/i965/brw_gs.c
index 1328984..e755a10 100644
--- a/src/mesa/drivers/dri/i965/brw_gs.c
+++ b/src/mesa/drivers/dri/i965/brw_gs.c
@@ -167,7 +167,7 @@ static void populate_key( struct brw_context *brw,
memset(key, 0, sizeof(*key));
 
/* CACHE_NEW_VS_PROG (part of VUE map) */
-   key->attrs = brw->vs.prog_data->outputs_written;
+   key->attrs = brw->vs.prog_data->vue_map.slots_valid;
 
/* BRW_NEW_PRIMITIVE */
key->primitive = brw->primitive;
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c 
b/src/mesa/drivers/dri/i965/brw_sf.c
index fdc6bd7..c8b7033 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -145,7 +145,7 @@ brw_upload_sf_prog(struct brw_context *brw)
/* Populate the key, noting state dependencies:
 */
/* CACHE_NEW_VS_PROG */
-   key.attrs = brw->vs.prog_data->outputs_written; 
+   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
 
/* BRW_NEW_REDUCED_PRIMITIVE */
switch (brw->intel.reduced_primitive) {
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 60575d7..b0a0dd6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2402,7 +2402,7 @@ void
 vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
 {
if (intel->gen < 6 &&
-   ((c->prog_data.outputs_written & BITFIELD64_BIT(VARYING_SLOT_PSIZ)) ||
+   ((c->prog_data.vue_map.slots_valid & VARYING_BIT_PSIZ) ||
 c->key.userclip_active || brw->has_negative_rhw_bug)) {
   dst_reg header1 = dst_reg(this, glsl_type::uvec4_type);
   dst_reg header1_w = header1;
@@ -2411,7 +2411,7 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
 
   emit(MOV(header1, 0u));
 
-  if (c->prog_data.outputs_written & BITFIELD64_BIT(VARYING_SLOT_PSIZ)) {
+  if (c->prog_data.vue_map.slots_valid & VARYING_BIT_PSIZ) {
 src_reg psiz = src_reg(output_reg[VARYING_SLOT_PSIZ]);
 
 current_annotation = "Point size";
@@ -2456,7 +2456,7 @@ vec4_visitor::emit_psiz_and_flags(struct brw_reg reg)
   emit(MOV(retype(reg, BRW_REGISTER_TYPE_UD), 0u));
} else {
   emit(MOV(retype(reg, BRW_REGISTER_TYPE_D), src_reg(0)));
-  if (c->prog_data.outputs_written
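
As a rough illustration of the idea behind this patch (not code from the series): once the valid-slot bitfield lives inside the VUE map, any stage holding a map can test whether a slot was written without reaching into VS-specific program data. The names demo_slot, DEMO_BIT, demo_vue_map and demo_slot_written are hypothetical stand-ins, and the slot numbers are invented; Mesa's gl_varying_slot values are different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot numbers for illustration only. */
enum demo_slot {
   DEMO_SLOT_POS = 0,
   DEMO_SLOT_PSIZ = 1,
   DEMO_SLOT_COL0 = 2
};

/* One bit per slot, in the spirit of BITFIELD64_BIT() in the patch. */
#define DEMO_BIT(slot) (UINT64_C(1) << (slot))

/* Stand-in for brw_vue_map: the map itself records which slots are valid. */
struct demo_vue_map {
   uint64_t slots_valid;
};

/* True if the shader that produced this map wrote the given slot. */
static bool
demo_slot_written(const struct demo_vue_map *map, enum demo_slot slot)
{
   return (map->slots_valid & DEMO_BIT(slot)) != 0;
}
```

With this shape, a second map for a geometry shader would simply carry its own slots_valid, which appears to be the motivation stated in the commit message.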

[Mesa-dev] [PATCH 2/4] i965: Create a pointer in brw_context to the geometry output VUE map.

Currently, the GPU pipeline has one active VUE map in effect at any
given time--the one representing the layout of vertex data coming from
the vertex shader.  However, when geometry shaders are added, they
will have their own independent VUE map.  Later pipeline stages (clip,
sf, fs) will need to consult the geometry shader VUE map if a geometry
shader is in use, and the vertex shader VUE map otherwise.

This patch adds a new field to brw_context, vue_map_geom_out, which
points to whichever VUE map should be used by later pipeline stages.
It also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is
signalled whenever this pointer changes.

Since we don't support geometry shaders yet, vue_map_geom_out is
currently set only by the brw_vs_prog state atom.
---
 src/mesa/drivers/dri/i965/brw_context.h  | 12 
 src/mesa/drivers/dri/i965/brw_state_upload.c |  1 +
 src/mesa/drivers/dri/i965/brw_vs.c   |  4 
 3 files changed, 17 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index fe6e639..7ad78f5 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -153,6 +153,7 @@ enum brw_state_id {
BRW_STATE_PROGRAM_CACHE,
BRW_STATE_STATE_BASE_ADDRESS,
BRW_STATE_SOL_INDICES,
+   BRW_STATE_VUE_MAP_GEOM_OUT,
 };
 
 #define BRW_NEW_URB_FENCE   (1 << BRW_STATE_URB_FENCE)
@@ -182,6 +183,7 @@ enum brw_state_id {
 #define BRW_NEW_PROGRAM_CACHE  (1 << BRW_STATE_PROGRAM_CACHE)
 #define BRW_NEW_STATE_BASE_ADDRESS (1 << BRW_STATE_STATE_BASE_ADDRESS)
 #define BRW_NEW_SOL_INDICES (1 << BRW_STATE_SOL_INDICES)
+#define BRW_NEW_VUE_MAP_GEOM_OUT   (1 << BRW_STATE_VUE_MAP_GEOM_OUT)
 
 struct brw_state_flags {
/** State update flags signalled by mesa internals */
@@ -917,6 +919,16 @@ struct brw_context
   uint32_t offset;
} sampler;
 
+   /**
+* Layout of vertex data exiting the geometry portion of the pipeline.
+* This comes from the geometry shader if one exists, otherwise from the
+* vertex shader.
+*
+* BRW_NEW_VUE_MAP_GEOM_OUT is flagged when this pointer (or the data it
+* points to) changes.
+*/
+   const struct brw_vue_map *vue_map_geom_out;
+
struct {
   struct brw_vs_prog_data *prog_data;
 
diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
index 41dfdc3..5c5c05e 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
@@ -376,6 +376,7 @@ static struct dirty_bit_map brw_bits[] = {
DEFINE_BIT(BRW_NEW_PROGRAM_CACHE),
DEFINE_BIT(BRW_NEW_STATE_BASE_ADDRESS),
DEFINE_BIT(BRW_NEW_SOL_INDICES),
+   DEFINE_BIT(BRW_NEW_VUE_MAP_GEOM_OUT),
{0, 0, 0}
 };
 
diff --git a/src/mesa/drivers/dri/i965/brw_vs.c 
b/src/mesa/drivers/dri/i965/brw_vs.c
index d875703..214730d 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -314,6 +314,8 @@ do_vs_prog(struct brw_context *brw,
program, program_size,
&c.prog_data, sizeof(c.prog_data),
&brw->vs.prog_offset, &brw->vs.prog_data);
+   brw->vue_map_geom_out = &brw->vs.prog_data->vue_map;
+   brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
ralloc_free(mem_ctx);
 
return true;
@@ -488,6 +490,8 @@ static void brw_upload_vs_prog(struct brw_context *brw)
 
   assert(success);
}
+   brw->vue_map_geom_out = &brw->vs.prog_data->vue_map;
+   brw->state.dirty.brw |= BRW_NEW_VUE_MAP_GEOM_OUT;
 }
 
 /* See brw_vs.c:
-- 
1.8.2
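
A minimal sketch of the mechanism described in this commit message, assuming a much-simplified context: the producer stage publishes a pointer to its VUE map and raises a dirty bit, and later stages read only through that pointer. Everything prefixed geom_demo_ is hypothetical and is not the driver's real structure layout.

```c
#include <assert.h>
#include <stdint.h>

#define GEOM_DEMO_NEW_VUE_MAP_GEOM_OUT (1u << 0)  /* hypothetical dirty bit */

struct geom_demo_vue_map {
   uint64_t slots_valid;
};

struct geom_demo_context {
   struct geom_demo_vue_map vs_vue_map;               /* owned by the VS stage */
   const struct geom_demo_vue_map *vue_map_geom_out;  /* what later stages read */
   uint32_t dirty;
};

/* Producer: called when a vertex program is (re)uploaded. */
static void
geom_demo_upload_vs(struct geom_demo_context *ctx, uint64_t slots_written)
{
   ctx->vs_vue_map.slots_valid = slots_written;
   ctx->vue_map_geom_out = &ctx->vs_vue_map;
   ctx->dirty |= GEOM_DEMO_NEW_VUE_MAP_GEOM_OUT;
}

/* Consumer: a later stage builds its key through the pointer only. */
static uint64_t
geom_demo_clip_key_attrs(const struct geom_demo_context *ctx)
{
   return ctx->vue_map_geom_out->slots_valid;
}
```

If a geometry stage existed, it would presumably repoint vue_map_geom_out at its own map and raise the same bit, leaving the consumer untouched.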


[Mesa-dev] [PATCH 3/4] i965: Use brw.vue_map_geom_out instead of VS output VUE map where appropriate.

This patch modifies post-GS pipeline stages (transform feedback, clip,
sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather
than brw->vs.prog_data->vue_map.  This ensures that when geometry
shader support is added, these pipeline stages will consult the
geometry shader output VUE map when appropriate, rather than the
vertex shader output VUE map.
---
 src/mesa/drivers/dri/i965/brw_clip.c   |  7 +++
 src/mesa/drivers/dri/i965/brw_sf.c |  7 +++
 src/mesa/drivers/dri/i965/brw_state.h  |  2 +-
 src/mesa/drivers/dri/i965/brw_wm.c |  6 +++---
 src/mesa/drivers/dri/i965/gen6_sf_state.c  | 10 +-
 src/mesa/drivers/dri/i965/gen7_sf_state.c  |  8 
 src/mesa/drivers/dri/i965/gen7_sol_state.c | 14 +++---
 7 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_clip.c 
b/src/mesa/drivers/dri/i965/brw_clip.c
index e20f7c2..bc0ebb5 100644
--- a/src/mesa/drivers/dri/i965/brw_clip.c
+++ b/src/mesa/drivers/dri/i965/brw_clip.c
@@ -69,7 +69,7 @@ static void compile_clip_prog( struct brw_context *brw,
c.func.single_program_flow = 1;
 
c.key = *key;
-   c.vue_map = brw->vs.prog_data->vue_map;
+   c.vue_map = *brw->vue_map_geom_out;
 
/* nr_regs is the number of registers filled by reading data from the VUE.
 * This program accesses the entire VUE, so nr_regs needs to be the size of
@@ -146,7 +146,7 @@ brw_upload_clip_prog(struct brw_context *brw)
/* BRW_NEW_REDUCED_PRIMITIVE */
key.primitive = brw->intel.reduced_primitive;
/* CACHE_NEW_VS_PROG (also part of VUE map) */
-   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
+   key.attrs = brw->vue_map_geom_out->slots_valid;
/* _NEW_LIGHT */
key.do_flat_shading = (ctx->Light.ShadeModel == GL_FLAT);
key.pv_first = (ctx->Light.ProvokingVertex == GL_FIRST_VERTEX_CONVENTION);
@@ -258,8 +258,7 @@ const struct brw_tracked_state brw_clip_prog = {
_NEW_TRANSFORM |
_NEW_POLYGON | 
_NEW_BUFFERS),
-  .brw   = (BRW_NEW_REDUCED_PRIMITIVE),
-  .cache = CACHE_NEW_VS_PROG
+  .brw   = (BRW_NEW_REDUCED_PRIMITIVE | BRW_NEW_VUE_MAP_GEOM_OUT)
},
.emit = brw_upload_clip_prog
 };
diff --git a/src/mesa/drivers/dri/i965/brw_sf.c 
b/src/mesa/drivers/dri/i965/brw_sf.c
index c8b7033..d90c0bc 100644
--- a/src/mesa/drivers/dri/i965/brw_sf.c
+++ b/src/mesa/drivers/dri/i965/brw_sf.c
@@ -65,7 +65,7 @@ static void compile_sf_prog( struct brw_context *brw,
brw_init_compile(brw, &c.func, mem_ctx);
 
c.key = *key;
-   c.vue_map = brw->vs.prog_data->vue_map;
+   c.vue_map = *brw->vue_map_geom_out;
if (c.key.do_point_coord) {
   /*
* gl_PointCoord is a FS instead of VS builtin variable, thus it's
@@ -145,7 +145,7 @@ brw_upload_sf_prog(struct brw_context *brw)
/* Populate the key, noting state dependencies:
 */
/* CACHE_NEW_VS_PROG */
-   key.attrs = brw->vs.prog_data->vue_map.slots_valid;
+   key.attrs = brw->vue_map_geom_out->slots_valid;
 
/* BRW_NEW_REDUCED_PRIMITIVE */
switch (brw->intel.reduced_primitive) {
@@ -216,8 +216,7 @@ const struct brw_tracked_state brw_sf_prog = {
.dirty = {
   .mesa  = (_NEW_HINT | _NEW_LIGHT | _NEW_POLYGON | _NEW_POINT |
 _NEW_TRANSFORM | _NEW_BUFFERS | _NEW_PROGRAM),
-  .brw   = (BRW_NEW_REDUCED_PRIMITIVE),
-  .cache = CACHE_NEW_VS_PROG
+  .brw   = (BRW_NEW_REDUCED_PRIMITIVE | BRW_NEW_VUE_MAP_GEOM_OUT)
},
.emit = brw_upload_sf_prog
 };
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 02ce57b..1f5e18a 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -227,7 +227,7 @@ void upload_default_color(struct brw_context *brw,
 
 /* gen6_sf_state.c */
 uint32_t
-get_attr_override(struct brw_vue_map *vue_map, int urb_entry_read_offset,
+get_attr_override(const struct brw_vue_map *vue_map, int urb_entry_read_offset,
   int fs_attr, bool two_side_color, uint32_t *max_source_attr);
 
 #ifdef __cplusplus
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index e7e9ddc..d121dbf 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -481,7 +481,7 @@ static void brw_wm_populate_key( struct brw_context *brw,
 
/* CACHE_NEW_VS_PROG */
if (intel->gen < 6)
-  key->vp_outputs_written = brw->vs.prog_data->vue_map.slots_valid;
+  key->vp_outputs_written = brw->vue_map_geom_out->slots_valid;
 
/* The unique fragment program ID */
key->program_string_id = fp->id;
@@ -524,8 +524,8 @@ const struct brw_tracked_state brw_wm_prog = {
_NEW_MULTISAMPLE),
   .brw   = (BRW_NEW_FRAGMENT_PROGRAM |
BRW_NEW_WM_INPUT_DIMENSIONS |
-   BRW_NEW_REDUCED_PRIMITIVE),
-  .cache = CACHE_NEW_VS_PROG,
+   BRW_NEW_REDUCED_PRIMITIVE |
+BRW_NEW_VUE_MAP_GEOM_OUT)
},
.emit =

[Mesa-dev] [PATCH 4/4] i965/fs: Rename vp_outputs_written to input_slots_valid.

With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader.  So the name vp_outputs_written will
become a misnomer.  This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++---
 src/mesa/drivers/dri/i965/brw_wm.c   | 6 +++---
 src/mesa/drivers/dri/i965/brw_wm.h   | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5a5bfeb..ecce66b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1264,7 +1264,7 @@ fs_visitor::calculate_urb_setup()
  if (i == VARYING_SLOT_PSIZ)
 continue;
 
-if (c->key.vp_outputs_written & BITFIELD64_BIT(i)) {
+if (c->key.input_slots_valid & BITFIELD64_BIT(i)) {
/* The back color slot is skipped when the front color is
 * also written to.  In addition, some slots can be
 * written in the vertex shader and not read in the
@@ -2995,7 +2995,7 @@ brw_fs_precompile(struct gl_context *ctx, struct 
gl_shader_program *prog)
}
 
if (intel->gen < 6)
-  key.vp_outputs_written |= BITFIELD64_BIT(VARYING_SLOT_POS);
+  key.input_slots_valid |= BITFIELD64_BIT(VARYING_SLOT_POS);
 
for (int i = 0; i < VARYING_SLOT_MAX; i++) {
   if (!(fp->Base.InputsRead & BITFIELD64_BIT(i)))
@@ -3006,7 +3006,7 @@ brw_fs_precompile(struct gl_context *ctx, struct 
gl_shader_program *prog)
 
   if (intel->gen < 6) {
  if (_mesa_varying_slot_in_fs((gl_varying_slot) i))
-key.vp_outputs_written |= BITFIELD64_BIT(i);
+key.input_slots_valid |= BITFIELD64_BIT(i);
   }
   }
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
b/src/mesa/drivers/dri/i965/brw_wm.c
index d121dbf..a8f2a3a2 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.c
+++ b/src/mesa/drivers/dri/i965/brw_wm.c
@@ -289,8 +289,8 @@ brw_wm_debug_recompile(struct brw_context *brw,
   old_key->proj_attrib_mask, key->proj_attrib_mask);
found |= key_debug(intel, "renderbuffer height",
   old_key->drawable_height, key->drawable_height);
-   found |= key_debug(intel, "vertex shader outputs",
-  old_key->vp_outputs_written, key->vp_outputs_written);
+   found |= key_debug(intel, "input slots valid",
+  old_key->input_slots_valid, key->input_slots_valid);
 
found |= brw_debug_recompile_sampler_key(intel, &old_key->tex, &key->tex);
 
@@ -481,7 +481,7 @@ static void brw_wm_populate_key( struct brw_context *brw,
 
/* CACHE_NEW_VS_PROG */
if (intel->gen < 6)
-  key->vp_outputs_written = brw->vue_map_geom_out->slots_valid;
+  key->input_slots_valid = brw->vue_map_geom_out->slots_valid;
 
/* The unique fragment program ID */
key->program_string_id = fp->id;
diff --git a/src/mesa/drivers/dri/i965/brw_wm.h 
b/src/mesa/drivers/dri/i965/brw_wm.h
index 8eb71de..f43d42c 100644
--- a/src/mesa/drivers/dri/i965/brw_wm.h
+++ b/src/mesa/drivers/dri/i965/brw_wm.h
@@ -70,7 +70,7 @@ struct brw_wm_prog_key {
GLbitfield64 proj_attrib_mask; /**< one bit per fragment program attribute */
 
GLushort drawable_height;
-   GLbitfield64 vp_outputs_written;
+   GLbitfield64 input_slots_valid;
GLuint program_string_id:32;
 
struct brw_sampler_prog_key_data tex;
-- 
1.8.2
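
To make the renamed field's meaning concrete, here is a hedged sketch of how a precompile step might guess which input slots are valid: walk every slot the fragment program reads and keep those that can legitimately arrive from the previous stage. The function name, the fs_visible mask and FS_DEMO_SLOT_MAX are assumptions for illustration; the hardware-generation check and the position slot handling from the patch are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define FS_DEMO_SLOT_MAX 64   /* hypothetical slot count */

/* inputs_read: bit i set if the fragment program reads slot i.
 * fs_visible:  bit i set if slot i can be delivered by the previous stage.
 * Returns the guessed input_slots_valid bitfield. */
static uint64_t
fs_demo_guess_input_slots_valid(uint64_t inputs_read, uint64_t fs_visible)
{
   uint64_t valid = 0;

   for (int i = 0; i < FS_DEMO_SLOT_MAX; i++) {
      const uint64_t bit = UINT64_C(1) << i;

      if (!(inputs_read & bit))
         continue;            /* slot not read, nothing to record */
      if (fs_visible & bit)
         valid |= bit;        /* read and deliverable: mark as valid */
   }
   return valid;
}
```

The loop is equivalent to a single AND of the two masks; it is written out only to mirror the per-slot structure of the code being patched.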


Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server

2013-03-20 Thread Stefan Brüns

On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
 On 03/10/2013 11:24 AM, Stefan Brüns wrote:
  Hi everyone,
  
  I have run into a problem leading to a crashing X server for a bunch of
  indirect GLX calls.
  
  This problem affects any OpenGL clients using indirect rendering and
  calling functions which are outside the ABI. Problems range from
  malfunctioning to crashes.
  
  Problem analysis:
  
  The dispatcher functions in glx/indirect_dispatch.c demarshall the
  function
  arguments from the GLX wire protocol and then call the appropriate
  function of the GL library. Function calls are done using dispatch table
  with the help of the CALL_* helper macros defined in dispatch.h.
  
  Unfortunately there is a mismatch between the dispatch table expected by
  the xserver - which follows the layout e.g. found in glapitable.h - and
  the one provided by the GL library, e.g. Mesa.
 
 The dependency is the other way around.  The loader (either libGL on the
 client or the GLX extension in the server) dictates the layout of the
 dispatch table.  The GL driver is required to adapt to the layout
 dictated by the loader.  That's the whole reason the remap table exists.
   The driver queries the loader to learn where functions are located in
 the dispatch table.  It then stores the dispatch table locations at
 (fixed) locations in the remap table.  So, the driver knows that glFoo
 is at location 824 in the remap table, and that entry stores the
 location of glFoo in the dispatch table.
 
 It sounds like something else is going wrong.

Currently this obviously can not work.

The remap table is only used when FEATURE_remap_table is defined, which for 
the X server is never true.

Now, let's try defining FEATURE_remap_table for the xserver (which breaks OS X 
and Windows builds of Mesa, but anyway ...)

Even then, the lookup of indexes for the remap table is going wrong.

The xserver uses the remap table slots defined in the dispatch.h exported from 
some past version of Mesa, i.e. #define TexBufferARB_remap_index 174.

Now the remap table is populated from _mesa_init_remap_table in 
mesa/main/remap.c, which calls:
_mesa_do_init_remap_table(_mesa_function_pool,
driDispatchRemapTable_size, MESA_remap_table_functions);

Now let's have a look into MESA_remap_table_functions[174], which should 
have the entry for TexBufferARB, and yes, it's there, so everything is fine!

But wait, do another test - ClientWaitSync, slot 178 in remap table, at least 
if you ask the x server, but slot 185 if you ask Mesa ...

So obviously the remap table is filled with the wrong values.

Regards, Stefan


-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019
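
The remap scheme Ian describes in the quoted text can be sketched as follows, under the assumption that the loader exposes some name-to-offset query. The sk_ names, the two function names and the offsets are invented for illustration and do not correspond to Mesa's or the X server's actual interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Loader side: owns the dispatch table layout (name -> offset). */
struct sk_loader_entry {
   const char *name;
   int offset;
};

/* Hypothetical query: where does the loader keep this function? */
static int
sk_loader_get_offset(const struct sk_loader_entry *layout, size_t count,
                     const char *name)
{
   for (size_t i = 0; i < count; i++) {
      if (strcmp(layout[i].name, name) == 0)
         return layout[i].offset;
   }
   return -1;   /* unknown to this loader */
}

/* Driver side: remap indices are fixed when the driver is built. */
enum { SK_REMAP_FOO = 0, SK_REMAP_BAR = 1, SK_REMAP_COUNT = 2 };

static const char *const sk_remap_names[SK_REMAP_COUNT] = { "glFoo", "glBar" };

/* Fill the remap table by asking the loader for each function's offset. */
static void
sk_init_remap_table(int remap[SK_REMAP_COUNT],
                    const struct sk_loader_entry *layout, size_t count)
{
   for (int i = 0; i < SK_REMAP_COUNT; i++)
      remap[i] = sk_loader_get_offset(layout, count, sk_remap_names[i]);
}
```

In this model the driver never hard-codes a dispatch offset; it only hard-codes remap indices, and the values stored at those indices come from whichever loader is present at run time.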

Re: [Mesa-dev] [PATCH 1/2] build libgallium shared by default.

2013-03-20 Thread Johannes Obermayr

On Tuesday, 19 March 2013, 21:36:47, Andreas Boll wrote:
2013/3/19 Johannes Obermayr johannesoberm...@gmx.de:
On Monday, 18 March 2013, 15:38:31, Maarten Lankhorst wrote:
This is one of the 2 patches used in ubuntu for decreasing size of mesa
build.

The other one is more hacky, and links libmesagallium into libgallium,
and then links libgallium against libdricore too for minimal duplication.

I am against both patches:

1. libgallium shared in this version causes duplicate symbols for depending
targets if using static LLVM libs and generally links too many LLVM
components/libs

2. libmesagallium shared in a right implementation unconditionally depends
on shared libglapi and shared libgallium to avoid duplicate symbol for
depending targets

3. It is not -no-undefined but -Wl,--no-undefined to show missing
symbols (and currently there are a lot of them in Mesa) ...

This is because libtool is broken.

4. I have worked to target issues of 1. to 3. in a bottom-up series since
December while splitting mesa into libmesacore, libmesadri and
libmesagallium to reduce binary sizes as much as possible for distributions

Hi Johannes,

any chance you could continue the work on shared libs?
We all have the same goals, reduce binary sizes, fix undefined
symbols, reduce the number of build configurations, support for make
dist and make distcheck
- long story short improve mesa's build system.
This time we have more time until the next mesa release to work out all
issues.

I have not stopped this work, mainly for my own ego and research (currently it
works for my test cases and should be almost finished for all cases).

But I am not really sure whether I will publish the patches, because my general
experience has been sad whenever my work was to be pushed to mainline
repositories: core devs complained, and somebody reinvented the wheel some
months later and/or recognized that my first approach wasn't so wrong ...
Also, asking and begging core devs several times to get patches pushed is not
something I want to do anymore.

I know: if it works for my common test cases, that does not guarantee it will
work for all cases.
But most issues can only be found once patches have landed in git master and
are tested by more people and with more configure switches.
The automake work is a good example: people did not test branches although they
were asked to, and only complained once configure switches in master were
broken after the big push. But you should have seen during that time that my
interest was, and is, to quickly fix build failures caused by the automake work ...

If you can ensure that the core developers agree to unconditionally shared
libs, and that the "Drop last parts of compatibility for the old Mesa build
system." patch and the patch series in general will be pushed within a week of
being published for testing, it becomes more likely that I will publish the
patch series.

Johannes

Andreas.

If 4. will be finished right this patch should become obsolete:
http://cgit.freedesktop.org/mesa/mesa/commit/?id=cf69a591e1ad16b590c9ae2eba0da6fa6c4fc741

And also most of the C++ linker forces will become obsolete.

But pushing things like
http://cgit.freedesktop.org/mesa/mesa/commit/?id=2506b035031d6022fec0465bffac8eedd43de0f9
without saying in which cases it is required (e. g. not for me) doesn't
make it easier to fulfill less memory consumption ...

Johannes

Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server

2013-03-20 Thread Ian Romanick


On 03/20/2013 02:43 PM, Stefan Brüns wrote:

On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:

On 03/10/2013 11:24 AM, Stefan Brüns wrote:

Hi everyone,

I have run into a problem leading to a crashing X server for a bunch of
indirect GLX calls.

This problem affects any OpenGL clients using indirect rendering and
calling functions which are outside the ABI. Problems range from
malfunctioning to crashes.

Problem analysis:

The dispatcher functions in glx/indirect_dispatch.c demarshall the
function
arguments from the GLX wire protocol and then call the appropriate
function of the GL library. Function calls are done using dispatch table
with the help of the CALL_* helper macros defined in dispatch.h.

Unfortunately there is a mismatch between the dispatch table expected by
the xserver - which follows the layout e.g. found in glapitable.h - and
the one provided by the GL library, e.g. Mesa.


The dependency is the other way around.  The loader (either libGL on the
client or the GLX extension in the server) dictates the layout of the
dispatch table.  The GL driver is required to adapt to the layout
dictated by the loader.  That's the whole reason the remap table exists.
   The driver queries the loader to learn where functions are located in
the dispatch table.  It then stores the dispatch table locations at
(fixed) locations in the remap table.  So, the driver knows that glFoo
is at location 824 in the remap table, and that entry stores the
location of glFoo in the dispatch table.

It sounds like something else is going wrong.


Currently this obviously can not work.

The remap table is only used when FEATURE_remap_table is defined, which for
the X server is never true.

Now, lets try defining FEATURE_remap_table for the xserver (which breaks OS X
and windows builds of Mesa, but anyway ...)


Xserver has nothing to do with it.  The remap table is entirely in the 
driver (*_dri.so).  The xserver has no knowledge about the remap table 
whatsoever.  The xserver only knows about the dispatch table, and it 
dictates where things are in that table.


It's been a long time since I wrote this code, but I haven't been able 
to kill off all the memories of it yet. :)



Even then, the lookup of indexes for the remap table is going wrong.

The xserver uses the remap table slots defined in the dispatch.h exported from
some past version of Mesa, i.e. #define TexBufferARB_remap_index 174.

Now the remap table is populated from _mesa_init_remap_table in
mesa/main/remap.c, which calls:
_mesa_do_init_remap_table(_mesa_function_pool,
driDispatchRemapTable_size, MESA_remap_table_functions);

Now lets have a look a look into MESA_remap_table_functions[174], which should
have the entry for TexBufferARB, and yes, its there, so everything fine!

But wait, do another test - ClientWaitSync, slot 178 in remap table, at least
if you ask the x server, but slot 185 if you ask Mesa ...

So obviously the remap table is filled with the wrong values.

Regards, Stefan



Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server

2013-03-20 Thread Stefan Brüns

On Wednesday 20 March 2013 15:47:24 you wrote:
 On 03/20/2013 02:43 PM, Stefan Brüns wrote:
  On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
  On 03/10/2013 11:24 AM, Stefan Brüns wrote:
  Hi everyone,
  
  I have run into a problem leading to a crashing X server for a bunch of
  indirect GLX calls.
  
  This problem affects any OpenGL clients using indirect rendering and
  calling functions which are outside the ABI. Problems range from
  malfunctioning to crashes.
  
  Problem analysis:
  
  The dispatcher functions in glx/indirect_dispatch.c demarshall the
  function
  arguments from the GLX wire protocol and then call the appropriate
  function of the GL library. Function calls are done using dispatch table
  with the help of the CALL_* helper macros defined in dispatch.h.
  
  Unfortunately there is a mismatch between the dispatch table expected by
  the xserver - which follows the layout e.g. found in glapitable.h - and
  the one provided by the GL library, e.g. Mesa.
  
  The dependency is the other way around.  The loader (either libGL on the
  client or the GLX extension in the server) dictates the layout of the
  dispatch table.  The GL driver is required to adapt to the layout
  dictated by the loader.  That's the whole reason the remap table exists.
  
 The driver queries the loader to learn where functions are located in
  
  the dispatch table.  It then stores the dispatch table locations at
  (fixed) locations in the remap table.  So, the driver knows that glFoo
  is at location 824 in the remap table, and that entry stores the
  location of glFoo in the dispatch table.
  
  It sounds like something else is going wrong.
  
  Currently this obviously can not work.
  
  The remap table is only used when FEATURE_remap_table is defined, which
  for
  the X server is never true.
  
  Now, lets try defining FEATURE_remap_table for the xserver (which breaks
  OS X and windows builds of Mesa, but anyway ...)
 
 Xserver has nothing to do with it.  The remap table is entirely in the
 driver (*_dri.so).  The xserver has no knowledge about the remap table
 whatsoever.  The xserver only knows about the dispatch table, and it
 dictates where things are in that table.
 
 It's been a long time since I wrote this code, but I haven't been able
 to kill of all the memories of it yet. :)

Please look at the code again, as I have done!

The X server code in xorg/glx/indirect_dispatch.c calls directly into the 
dispatch table, and it uses the offsets found in xorg/glx/dispatch.h.

Regards,

Stefan

-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019
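
The failure mode Stefan is arguing for can be modeled in a few lines: a caller that indexes a function table with offsets baked in from an older header, while the table was filled using a newer layout. This is only a toy model with made-up offsets and mm_ names; whether it matches what actually happens between the X server and Mesa is exactly what the thread disputes.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*mm_func)(void);

static int mm_tex_buffer(void) { return 1; }
static int mm_client_wait_sync(void) { return 2; }

/* Offsets the caller compiled in from an older header (made-up values). */
enum { MM_CALLER_TEX_BUFFER = 0, MM_CALLER_CLIENT_WAIT_SYNC = 1 };

/* Offsets the provider of the table actually uses (made-up values). */
enum {
   MM_PROVIDER_TEX_BUFFER = 0,
   MM_PROVIDER_CLIENT_WAIT_SYNC = 2,
   MM_TABLE_SIZE = 3
};

/* The provider fills the table according to its own layout. */
static void
mm_fill_table(mm_func table[MM_TABLE_SIZE])
{
   for (int i = 0; i < MM_TABLE_SIZE; i++)
      table[i] = NULL;
   table[MM_PROVIDER_TEX_BUFFER] = mm_tex_buffer;
   table[MM_PROVIDER_CLIENT_WAIT_SYNC] = mm_client_wait_sync;
}
```

A function whose offset happens to agree in both layouts resolves correctly, which would explain why one spot check can pass while another fails; a function whose offset moved resolves to an empty or unrelated slot.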

Re: [Mesa-dev] Mismatch between Mesas dispatch table and the one used by the X server

2013-03-20 Thread Stefan Brüns

On Wednesday 20 March 2013 15:47:24 you wrote:
 On 03/20/2013 02:43 PM, Stefan Brüns wrote:
  On Tuesday 12 March 2013 09:25:07 Ian Romanick wrote:
  On 03/10/2013 11:24 AM, Stefan Brüns wrote:
  Hi everyone,
  
  I have run into a problem leading to a crashing X server for a bunch of
  indirect GLX calls.
  
  This problem affects any OpenGL clients using indirect rendering and
  calling functions which are outside the ABI. Problems range from
  malfunctioning to crashes.
  
  Problem analysis:
  
  The dispatcher functions in glx/indirect_dispatch.c demarshall the
  function
  arguments from the GLX wire protocol and then call the appropriate
  function of the GL library. Function calls are done using dispatch table
  with the help of the CALL_* helper macros defined in dispatch.h.
  
  Unfortunately there is a mismatch between the dispatch table expected by
  the xserver - which follows the layout e.g. found in glapitable.h - and
  the one provided by the GL library, e.g. Mesa.
  
  The dependency is the other way around.  The loader (either libGL on the
  client or the GLX extension in the server) dictates the layout of the
  dispatch table.  The GL driver is required to adapt to the layout
  dictated by the loader.  That's the whole reason the remap table exists.
  
  The driver queries the loader to learn where functions are located in
  the dispatch table.  It then stores the dispatch table locations at
  (fixed) locations in the remap table.  So, the driver knows that glFoo
  is at location 824 in the remap table, and that entry stores the
  location of glFoo in the dispatch table.
  
  It sounds like something else is going wrong.
  
  Currently this obviously can not work.
  
  The remap table is only used when FEATURE_remap_table is defined, which
  for
  the X server is never true.
  
  Now, let's try defining FEATURE_remap_table for the xserver (which breaks
  OS X and windows builds of Mesa, but anyway ...)
 
 Xserver has nothing to do with it.  The remap table is entirely in the
 driver (*_dri.so).  The xserver has no knowledge about the remap table
 whatsoever.  The xserver only knows about the dispatch table, and it
 dictates where things are in that table.
 
 It's been a long time since I wrote this code, but I haven't been able
 to kill off all the memories of it yet. :)

Just one addition:

The current code in the xserver inlines the various CALL_* invocations. Maybe 
inlining this code was never intended?

Regards,

Stefan

-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
phone: +49 241 53809034 mobile: +49 151 50412019

Re: [Mesa-dev] [PATCH v2 03/15] glsl: parse in/out types for interface blocks

Jordan Justen jordan.l.jus...@intel.com writes:

 Previously only 'uniform' was allowed for uniform blocks.

 Now, in/out can be parsed, but it will only be allowed for
 GLSL >= 150.


  basic_interface_block:
 - UNIFORM NEW_IDENTIFIER '{' member_list '}' instance_name_opt ';'
 + interface_qualifier NEW_IDENTIFIER '{' member_list '}' instance_name_opt ';'
   {
  ast_interface_block *const block = $6;
  
  block->block_name = $2;
  block->declarations.push_degenerate_list_at_head(& $4->link);
  
 -if (!state->ARB_uniform_buffer_object_enable) {
 -   _mesa_glsl_error(& @1, state,
 -"#version 140 / GL_ARB_uniform_buffer_object "
 -"required for defining uniform blocks\n");
 -} else if (state->ARB_uniform_buffer_object_warn) {
 -   _mesa_glsl_warning(& @1, state,
 -  "#version 140 / GL_ARB_uniform_buffer_object "
 -  "required for defining uniform blocks\n");
 +if ($1.flags.q.uniform) {
 +   if (!state->ARB_uniform_buffer_object_enable) {
 +  _mesa_glsl_error(& @1, state,
 +   "#version 140 / GL_ARB_uniform_buffer_object "
 +   "required for defining uniform blocks\n");
 +   } else if (state->ARB_uniform_buffer_object_warn) {
 +  _mesa_glsl_warning(& @1, state,
 + "#version 140 / GL_ARB_uniform_buffer_object "
 + "required for defining uniform blocks\n");
 +   }
 +} else {
 +   if (state->es_shader || state->language_version < 150) {
 +  _mesa_glsl_error(& @1, state,
 +  "#version 150 required for using "
 +  "interface blocks.\n");
 +   }
  }
  
  /* Since block arrays require names, and both features are added in
 @@ -1937,10 +1946,39 @@ basic_interface_block:
  "blocks with an instance name\n");
  }
  
 +unsigned interface_type_mask, interface_type_flags;
 +struct ast_type_qualifier temp_type_qualifier;
 +
 +temp_type_qualifier.flags.i = 0;
 +temp_type_qualifier.flags.q.uniform = true;
 +temp_type_qualifier.flags.q.in = true;
 +temp_type_qualifier.flags.q.out = true;
 +interface_type_mask = temp_type_qualifier.flags.i;
 +interface_type_flags = $1.flags.i & interface_type_mask;
 +block->layout.flags.i |= interface_type_flags;

Given that an interface_qualifier ($1) only has either uniform, in, or
out set, I don't see why this masking is needed.
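
The point about the mask can be checked with a small sketch. The bit values below are hypothetical stand-ins for the ast_type_qualifier flag bits, not the real layout, and masked_flags is an illustrative helper rather than anything in the patch.

```cpp
#include <cassert>

// Hypothetical flag bits; the real ast_type_qualifier layout differs.
const unsigned Q_UNIFORM = 1u << 0;
const unsigned Q_IN      = 1u << 1;
const unsigned Q_OUT     = 1u << 2;
const unsigned Q_FLAT    = 1u << 3;  // some unrelated qualifier bit

const unsigned INTERFACE_TYPE_MASK = Q_UNIFORM | Q_IN | Q_OUT;

// What the patch does: keep only the uniform/in/out bits.
unsigned masked_flags(unsigned qualifier_flags)
{
   return qualifier_flags & INTERFACE_TYPE_MASK;
}
```

If the grammar guarantees that exactly one of the three bits is set and nothing else, the mask returns its input unchanged. It would only matter if some other bit could reach this rule.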



Re: [Mesa-dev] [PATCH v2 05/15] glsl parser: on desktop GL require GLSL 150 for instance names

Jordan Justen jordan.l.jus...@intel.com writes:

 Interface blocks in GLSL 150 allow an instance name to be used.

 Signed-off-by: Jordan Justen jordan.l.jus...@intel.com
 ---
  src/glsl/glsl_parser.yy |   15 ++-
  1 file changed, 10 insertions(+), 5 deletions(-)

 diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
 index 8e6b04d..1fd8cc2 100644
 --- a/src/glsl/glsl_parser.yy
 +++ b/src/glsl/glsl_parser.yy
 @@ -1953,11 +1953,16 @@ basic_interface_block:
   * the same language versions, we don't have to explicitly
   * version-check both things.
   */
 -if (block->instance_name != NULL
 -&& !(state->language_version == 300 && state->es_shader)) {
 -   _mesa_glsl_error(& @1, state,
 -"#version 300 es required for using uniform "
 -"blocks with an instance name\n");
 +if (block->instance_name != NULL) {
 +   if(state->es_shader && state->language_version < 300) {
^ missing space




Re: [Mesa-dev] [PATCH v2 06/15] glsl parser: allow in out for interface block members

Jordan Justen jordan.l.jus...@intel.com writes:

 Previously, uniform blocks allowed the 'uniform' keyword
 to be used with members of a uniform block. With interface
 blocks 'in' can be used on 'in' interface block members and
 'out' can be used on 'out' interface block members.

 The basic_interface_block rule will verify that the same
 qualifier type is used with the block and each member.

 -type->qualifier = $1;
 -type->qualifier.flags.q.uniform = true;
 -type->specifier = $3;
 +if (!type->qualifier.merge_qualifier(& @1, state, $1)) {
 +   YYERROR;
 +}
 +
 +if (type->qualifier.flags.q.attribute) {
 +   _mesa_glsl_error(& @1, state,
 +   "keyword 'attribute' cannot be used with "
 +   "interface block member\n");
 +} else if (type->qualifier.flags.q.varying) {
 +   _mesa_glsl_error(& @1, state,
 +   "keyword 'varying' cannot be used with "
 +   "interface block member\n");
 +}

I think some more qualifiers are getting allowed now. Are they all intentional?

- invariant
- smooth
- flat
- noperspective

Could 7/15 get easily moved before this one, so that we don't allow
uniforms in our in/out blocks at this commit?

I'm done for the evening.  Patches 1-5 are (other than minor comments):

Reviewed-by: Eric Anholt e...@anholt.net



[Mesa-dev] i965 varying-index uniforms improvement

https://bugs.freedesktop.org/show_bug.cgi?id=61554

It's had more "me too"s than I would have expected, so I've marked all but
2 incidental patches as candidates for 9.1.  It's also fairly invasive,
so I'm quite uncomfortable doing so.  I've tested on gm45, snb, and ivb so
far, and it seems to be working, though.  The previous iteration of the
IVB changes has been confirmed to fix the regression, and I hope to hear
back on pre-IVB soon.

The branch is at fs-varying-uniform-gen4 of my tree.


[Mesa-dev] [PATCH 01/13] i965/fs: Allow constant propagation into MACH.

This happens quite a bit with varying-index uniform loads.  We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.
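
As a rough illustration of the rule this patch extends to MACH, the sketch below decides where a propagated immediate may end up. It assumes, as the existing code suggests, that an immediate is only usable as the second source. The enum names and slot_for_immediate are hypothetical simplifications, not the driver's types.

```cpp
#include <cassert>

enum Opcode { OP_ADD, OP_MUL, OP_MACH };
enum RegType { TYPE_F, TYPE_D, TYPE_UD };

// Returns the source slot the immediate occupies after propagation,
// or -1 if it cannot be propagated.
// i is the index of the source that currently holds the value.
int slot_for_immediate(Opcode op, int i, bool src1_is_imm, RegType src1_type)
{
   if (i == 1)
      return 1;  // the immediate already lands in the second source
   if (i == 0 && !src1_is_imm) {
      bool int32 = (src1_type == TYPE_D || src1_type == TYPE_UD);
      if ((op == OP_MUL || op == OP_MACH) && int32)
         return -1;  // 32-bit integer MUL/MACH is asymmetric: no commute
      return 1;      // commute the operands so the immediate becomes src1
   }
   return -1;
}
```

With this rule a MACH whose second source is the constant is handled like MUL and ADD, while a 32-bit integer MACH with the constant in the first source is left alone.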
---
 src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 194ed07..2d0391a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -261,6 +261,7 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
  progress = true;
  break;
 
+  case BRW_OPCODE_MACH:
   case BRW_OPCODE_MUL:
   case BRW_OPCODE_ADD:
  if (i == 1) {
@@ -268,10 +269,11 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
 progress = true;
  } else if (i == 0 && inst->src[1].file != IMM) {
 /* Fit this constant in by commuting the operands.
- * Exception: we can't do this for 32-bit integer MUL
+ * Exception: we can't do this for 32-bit integer MUL/MACH
  * because it's asymmetric.
  */
-if (inst->opcode == BRW_OPCODE_MUL &&
+if ((inst->opcode == BRW_OPCODE_MUL ||
+ inst->opcode == BRW_OPCODE_MACH) &&
 (inst->src[1].type == BRW_REGISTER_TYPE_D ||
  inst->src[1].type == BRW_REGISTER_TYPE_UD))
break;
-- 
1.7.10.4


[Mesa-dev] [PATCH 02/13] i965/fs: Remove creation of a MOV instruction that's never used.

We weren't inserting it into the list, so it did nothing.  This line was
replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |1 -
 1 file changed, 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5a5bfeb..2fb8989 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -253,7 +253,6 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
   } else {
  instructions.push_tail(MUL(mrf, offset, fs_reg(4)));
   }
-  inst = MOV(mrf, offset);
   inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
   dst, surf_index);
   inst->header_present = header_present;
-- 
1.7.10.4


[Mesa-dev] [PATCH 03/13] i965/fs: Move varying uniform offset computation into the helper func.

I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |   16 +---
 src/mesa/drivers/dri/i965/brw_fs.h   |3 ++-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |5 ++---
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2fb8989..89b08e8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -229,11 +229,15 @@ fs_visitor::CMP(fs_reg dst, fs_reg src0, fs_reg src1, uint32_t condition)
 
 exec_list
 fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
-   fs_reg offset)
+   fs_reg varying_offset,
+   uint32_t const_offset)
 {
exec_list instructions;
fs_inst *inst;
 
+   fs_reg offset = fs_reg(this, glsl_type::uint_type);
+   instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
+
if (intel->gen >= 7) {
   inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
   dst, surf_index, offset);
@@ -1625,15 +1629,13 @@ fs_visitor::move_uniform_array_access_to_pull_constants()
  base_ir = inst->ir;
  current_annotation = inst->annotation;
 
- fs_reg offset = fs_reg(this, glsl_type::int_type);
- inst->insert_before(ADD(offset, *inst->src[i].reladdr,
- fs_reg(pull_constant_loc[uniform] +
-inst->src[i].reg_offset)));
-
  fs_reg surf_index = fs_reg((unsigned)SURF_INDEX_FRAG_CONST_BUFFER);
  fs_reg temp = fs_reg(this, glsl_type::float_type);
  exec_list list = VARYING_PULL_CONSTANT_LOAD(temp,
- surf_index, offset);
+ surf_index,
+ *inst->src[i].reladdr,
+ pull_constant_loc[uniform] +
+ inst->src[i].reg_offset);
  inst->insert_before(list);
 
  inst->src[i].file = temp.file;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
index 254a534..76130b1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -294,7 +294,8 @@ public:
   fs_reg reg);
 
exec_list VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
-fs_reg offset);
+fs_reg varying_offset,
+uint32_t const_offset);
 
bool run();
void setup_payload_gen4();
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 735a33d..6b6af8d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -650,9 +650,8 @@ fs_visitor::visit(ir_expression *ir)
  emit(SHR(base_offset, op[1], fs_reg(2)));
 
  for (int i = 0; i < ir->type->vector_elements; i++) {
-fs_reg offset = fs_reg(this, glsl_type::int_type);
-emit(ADD(offset, base_offset, fs_reg(i)));
-emit(VARYING_PULL_CONSTANT_LOAD(result, surf_index, offset));
+emit(VARYING_PULL_CONSTANT_LOAD(result, surf_index,
+base_offset, i));
 
 if (ir->type->base_type == GLSL_TYPE_BOOL)
emit(CMP(result, result, fs_reg(0), BRW_CONDITIONAL_NZ));
-- 
1.7.10.4


[Mesa-dev] [PATCH 06/13] i965/fs: Avoid inappropriate optimization with regs_written > 1.

Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index fbe9e3a..f1b0789 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2087,6 +2087,12 @@ fs_visitor::compute_to_mrf()
   break;
}
 
+/* Things returning more than one register would need us to
+ * understand coalescing out more than one MOV at a time.
+ */
+if (scan_inst->regs_written() > 1)
+   break;
+
/* SEND instructions can't have MRF as a destination. */
if (scan_inst->mlen)
   break;
-- 
1.7.10.4


[Mesa-dev] [PATCH 05/13] i965: Make the fragment shader pull constants index by dwords, not vec4s.

We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency.  But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.

Note that this change only affects those messages that use the surface
format, like sampler LDs, and not the untyped data cache loads we've
used in other cases.

No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).
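
The arithmetic behind the pitch change can be sketched as follows. The helper names (align_up, surface_elements, element_index) are hypothetical; only the 16-byte vec4 stride and the 4-byte dword stride come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Round v up to a multiple of a (a must be a power of two).
uint32_t align_up(uint32_t v, uint32_t a) { return (v + a - 1) & ~(a - 1); }

// Number of surface elements for a constant buffer of `size` bytes.
uint32_t surface_elements(uint32_t size, bool dword_pitch)
{
   uint32_t stride = dword_pitch ? 4 : 16;
   return align_up(size, stride) / stride;
}

// Convert a byte offset into an element index for the chosen pitch.
uint32_t element_index(uint32_t byte_offset, bool dword_pitch)
{
   return byte_offset / (dword_pitch ? 4 : 16);
}
```

With a dword pitch, a byte offset of 36 maps to element 9, so a load can start at any component. With a vec4 pitch the same offset can only be expressed as element 2, which is why unaligned variable indexing was awkward before.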

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  |5 -
 src/mesa/drivers/dri/i965/brw_state.h |5 -
 src/mesa/drivers/dri/i965/brw_vs_surface_state.c  |2 +-
 src/mesa/drivers/dri/i965/brw_wm_surface_state.c  |   13 -
 src/mesa/drivers/dri/i965/gen7_wm_surface_state.c |5 +++--
 src/mesa/drivers/dri/intel/intel_context.h|5 +++--
 6 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 89b08e8..fbe9e3a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2483,10 +2483,13 @@ fs_visitor::lower_uniform_pull_constant_loads()
  continue;
 
   if (intel->gen >= 7) {
+ /* The offset arg before was a vec4-aligned byte offset.  We need to
+  * turn it into a dword offset.
+  */
  fs_reg const_offset_reg = inst->src[1];
  assert(const_offset_reg.file == IMM &&
 const_offset_reg.type == BRW_REGISTER_TYPE_UD);
- const_offset_reg.imm.u /= 16;
+ const_offset_reg.imm.u /= 4;
  fs_reg payload = fs_reg(this, glsl_type::uint_type);
 
  /* This is actually going to be a MOV, but since only the first dword
diff --git a/src/mesa/drivers/dri/i965/brw_state.h 
b/src/mesa/drivers/dri/i965/brw_state.h
index 02ce57b..29ec276 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -187,11 +187,6 @@ void *brw_state_batch(struct brw_context *brw,
 void gen4_init_vtable_surface_functions(struct brw_context *brw);
 uint32_t brw_get_surface_tiling_bits(uint32_t tiling);
 uint32_t brw_get_surface_num_multisamples(unsigned num_samples);
-void brw_create_constant_surface(struct brw_context *brw,
-drm_intel_bo *bo,
-uint32_t offset,
-int width,
-uint32_t *out_offset);
 
 uint32_t brw_format_for_mesa_format(gl_format mesa_format);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
index 6c0b690..675a84c 100644
--- a/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_vs_surface_state.c
@@ -91,7 +91,7 @@ brw_upload_vs_pull_constants(struct brw_context *brw)
 
const int surf = SURF_INDEX_VERT_CONST_BUFFER;
intel-vtbl.create_constant_surface(brw, brw-vs.const_bo, 0, size,
-  brw-vs.surf_offset[surf]);
+  brw-vs.surf_offset[surf], false);
 
brw-state.dirty.brw |= BRW_NEW_VS_CONSTBUF;
 }
diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c 
b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
index 98eed15..506ddf0 100644
--- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
+++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
@@ -912,15 +912,16 @@ brw_update_texture_surface(struct gl_context *ctx,
  * Create the constant buffer surface.  Vertex/fragment shader constants will 
be
  * read from this buffer with Data Port Read instructions/messages.
  */
-void
+static void
 brw_create_constant_surface(struct brw_context *brw,
drm_intel_bo *bo,
uint32_t offset,
uint32_t size,
-   uint32_t *out_offset)
+   uint32_t *out_offset,
+bool dword_pitch)
 {
struct intel_context *intel = brw-intel;
-   uint32_t stride = 16;
+   uint32_t stride = dword_pitch ? 4 : 16;
uint32_t elements = ALIGN(size, stride) / stride;
const GLint w = elements - 1;
uint32_t *surf;
@@ -1089,7 +1090,8 @@ brw_upload_wm_pull_constants(struct brw_context *brw)
drm_intel_gem_bo_unmap_gtt(brw-wm.const_bo);
 
intel-vtbl.create_constant_surface(brw, brw-wm.const_bo, 0, size,
-  brw-wm.surf_offset[surf_index]);
+  brw-wm.surf_offset[surf_index],
+   true);
 
brw-state.dirty.brw |= BRW_NEW_SURFACES;
 }
@@ -1442,7 +1444,8 @@ brw_upload_ubo_surfaces(struct brw_context *brw,
*/
   intel-vtbl.create_constant_surface(brw, bo, binding-Offset,
  bo-size - binding-Offset,
-

[Mesa-dev] [PATCH 09/13] i965/fs: Clean up the setup of gen4 simd16 message destinations.

I think this makes it much more obvious what's going on here.

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 6b6af8d..48c6df3 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -916,11 +916,10 @@ fs_visitor::emit_texture_gen4(ir_texture *ir, fs_reg dst, fs_reg coordinate,
* this weirdness around to the expected layout.
*/
   orig_dst = dst;
-  const glsl_type *vec_type =
-glsl_type::get_instance(ir->type->base_type, 4, 1);
-  dst = fs_reg(this, glsl_type::get_array_instance(vec_type, 2));
-  dst.type = intel->is_g4x ? brw_type_for_base_type(ir->type)
-  : BRW_REGISTER_TYPE_F;
+  dst = fs_reg(GRF, virtual_grf_alloc(8),
+   (intel->is_g4x ?
+brw_type_for_base_type(ir->type) :
+BRW_REGISTER_TYPE_F));
}
 
fs_inst *inst = NULL;
-- 
1.7.10.4


[Mesa-dev] [PATCH 08/13] i965/fs: Do CSE on gen7's varying-index pull constant loads.

This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup.  Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.
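
The "extra fixup" amounts to copying every register the instruction writes rather than just one. A minimal sketch of that loop, using hypothetical Reg and Mov structs in place of fs_reg and fs_inst:

```cpp
#include <cassert>
#include <vector>

struct Reg { int reg; int reg_offset; };
struct Mov { Reg dst; Reg src; };

// Emit one MOV per register written, stepping both operands together.
std::vector<Mov> copy_regs(Reg dst, Reg src, int regs_written)
{
   std::vector<Mov> movs;
   for (int i = 0; i < regs_written; i++) {
      movs.push_back(Mov{dst, src});
      dst.reg_offset++;
      src.reg_offset++;
   }
   return movs;
}
```

For a load that writes 4 registers this yields 4 MOVs with reg_offset 0 through 3. An ordinary single-register instruction still gets exactly one.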

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp |   43 ++
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
index 02642c9..c89da36 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cse.cpp
@@ -68,6 +68,7 @@ is_expression(const fs_inst *const inst)
case BRW_OPCODE_MAD:
case BRW_OPCODE_LRP:
case FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD:
+   case FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7:
case FS_OPCODE_CINTERP:
case FS_OPCODE_LINTERP:
   return true;
@@ -129,21 +130,41 @@ fs_visitor::opt_cse_local(bblock_t *block, exec_list *aeb)
 */
bool no_existing_temp = entry->tmp.file == BAD_FILE;
if (no_existing_temp) {
-  entry->tmp = fs_reg(this, glsl_type::float_type);
-  entry->tmp.type = inst->dst.type;
-
-  fs_inst *copy = new(ralloc_parent(inst))
- fs_inst(BRW_OPCODE_MOV, entry->generator->dst, entry->tmp);
-  entry->generator->insert_after(copy);
-  entry->generator->dst = entry->tmp;
+   int written = entry->generator->regs_written();
+
+   fs_reg orig_dst = entry->generator->dst;
+   fs_reg tmp = fs_reg(GRF, virtual_grf_alloc(written),
+   orig_dst.type);
+   entry->tmp = tmp;
+   entry->generator->dst = tmp;
+
+   for (int i = 0; i < written; i++) {
+  fs_inst *copy = MOV(orig_dst, tmp);
+  copy->force_writemask_all =
+ entry->generator->force_writemask_all;
+  entry->generator->insert_after(copy);
+
+  orig_dst.reg_offset++;
+  tmp.reg_offset++;
+   }
}
 
/* dest <- temp */
+int written = inst->regs_written();
+assert(written == entry->generator->regs_written());
 assert(inst->dst.type == entry->tmp.type);
-   fs_inst *copy = new(ralloc_parent(inst))
-  fs_inst(BRW_OPCODE_MOV, inst->dst, entry->tmp);
-copy->force_writemask_all = inst->force_writemask_all;
-   inst->replace_with(copy);
+fs_reg dst = inst->dst;
+fs_reg tmp = entry->tmp;
+fs_inst *copy;
+for (int i = 0; i < written; i++) {
+   copy = MOV(dst, tmp);
+   copy->force_writemask_all = inst->force_writemask_all;
+   inst->insert_before(copy);
+
+   dst.reg_offset++;
+   tmp.reg_offset++;
+}
+inst->remove();
 
/* Appending an instruction may have changed our bblock end. */
if (inst == block->end) {
-- 
1.7.10.4


[Mesa-dev] [PATCH 11/13] i965/fs: Don't double-emit SEND dependency workarounds at control flow.

We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too.  We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp |2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index c128175..5d83e50 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2346,6 +2346,7 @@ fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
 }
  }
+ return;
   }
 
   bool scan_inst_16wide = (dispatch_width > 8 &&
@@ -2415,6 +2416,7 @@ fs_visitor::insert_gen4_post_send_dependency_workarounds(fs_inst *inst)
 if (needs_dep[i])
scan_inst->insert_before(DEP_RESOLVE_MOV(first_write_grf + i));
  }
+ return;
   }
 
   /* Clear the flag for registers that actually got read (as expected). */
-- 
1.7.10.4


[Mesa-dev] [PATCH 07/13] i965/fs: Improve performance of varying-index uniform loads on IVB.

Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug.  This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This is a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.
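
The way the constant offset is split on gen7 can be sketched in isolation. OffsetSplit and split_const_offset are hypothetical names; the two masks are the ones the patch applies, with offsets counted in dwords.

```cpp
#include <cassert>
#include <cstdint>

struct OffsetSplit {
   uint32_t aligned;    // folded into the variable offset (multiple of 4)
   uint32_t component;  // selects one of the 4 registers the load returns
};

OffsetSplit split_const_offset(uint32_t const_offset)
{
   return OffsetSplit{ const_offset & ~3u, const_offset & 3u };
}
```

The four components of a single vec4 (constant offsets 0 through 3) all share the same aligned part, so they produce identical loads that differ only in which returned register is read. That is what lets CSE fold them into one.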

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  |   29 -
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp |   27 ++-
 2 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f1b0789..f4aa9f7 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -235,14 +235,33 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
exec_list instructions;
fs_inst *inst;
 
-   fs_reg offset = fs_reg(this, glsl_type::uint_type);
-   instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
-
if (intel->gen >= 7) {
+  /* We have our constant surface use a pitch of 4 bytes, so our index can
+   * be any component of a vector, and then we load 4 contiguous
+   * components starting from that.
+   *
+   * We break down the const_offset to a portion added to the variable
+   * offset and a portion done using reg_offset, which means that if you
+   * have GLSL using something like "uniform vec4 a[20]; gl_FragColor =
+   * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and
+   * CSE can later notice that those loads are all the same and eliminate
+   * the redundant ones.
+   */
+  fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
+  instructions.push_tail(ADD(vec4_offset,
+ varying_offset, const_offset & ~3));
+
+  fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
   inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
-  dst, surf_index, offset);
+  vec4_result, surf_index, vec4_offset);
   instructions.push_tail(inst);
+
+  vec4_result.reg_offset += const_offset & 3;
+  instructions.push_tail(MOV(dst, vec4_result));
} else {
+  fs_reg offset = fs_reg(this, glsl_type::uint_type);
+  instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
+
   int base_mrf = 13;
   bool header_present = true;
 
@@ -313,7 +332,7 @@ fs_inst::equals(fs_inst *inst)
 int
 fs_inst::regs_written()
 {
-   if (is_tex())
+   if (is_tex() || opcode == FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7)
   return 4;
 
/* The SINCOS and INT_DIV_QUOTIENT_AND_REMAINDER math functions return 2,
diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
index 712fef6..4b3c43f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
@@ -737,28 +737,29 @@ fs_generator::generate_varying_pull_constant_load_gen7(fs_inst *inst,
  index.type == BRW_REGISTER_TYPE_UD);
uint32_t surf_index = index.dw1.ud;
 
-   uint32_t msg_control, rlen, mlen;
+   uint32_t simd_mode, rlen, mlen;
if (dispatch_width == 16) {
-  msg_control = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_16DWORDS;
-  mlen = rlen = 2;
+  mlen = 2;
+  rlen = 8;
+  simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
} else {
-  msg_control = BRW_DATAPORT_DWORD_SCATTERED_BLOCK_8DWORDS;
-  mlen = rlen = 1;
+  mlen = 1;
+  rlen = 4;
+  simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD8;
}
 
struct brw_instruction *send = brw_next_insn(p, BRW_OPCODE_SEND);
brw_set_dest(p, send, dst);
brw_set_src0(p, send, offset);
-   if (intel->gen < 6)
-  send->header.destreg__conditionalmod = inst->base_mrf;
-   brw_set_dp_read_message(p, send,
+   brw_set_sampler_message(p, send,
surf_index,
-   msg_control,
-   GEN7_DATAPORT_DC_DWORD_SCATTERED_READ,
-   BRW_DATAPORT_READ_TARGET_DATA_CACHE,
+   0, /* LD message ignores sampler unit */
+   GEN5_SAMPLER_MESSAGE_SAMPLE_LD,
+   rlen,
mlen,
-   inst-header_present,
-   rlen);
+   false, /* no header */
+   simd_mode,
+

[Mesa-dev] [PATCH 10/13] i965/fs: Bake regs_written into the IR instead of recomputing it later.

For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.
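
A minimal sketch of the idea, assuming simplified stand-ins for fs_reg and fs_inst: the count becomes a plain field set when the instruction is built, and range checks such as overwrites_reg read that field instead of recomputing it from the opcode.

```cpp
#include <cassert>

struct Reg { int file; int reg; int reg_offset; };

struct Inst {
   Reg dst;
   int regs_written;  // set when the instruction is built; 1 for most

   // True if r falls inside the run of registers this instruction writes.
   bool overwrites_reg(const Reg &r) const
   {
      return r.file == dst.file &&
             r.reg == dst.reg &&
             r.reg_offset >= dst.reg_offset &&
             r.reg_offset < dst.reg_offset + regs_written;
   }
};
```

Because the emitter knows the target generation and message layout, it can store 8 for the gen4 SIMD16-on-SIMD8 case directly, which an opcode-only lookup cannot distinguish from the usual 4.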

NOTE: This is a candidate for the 9.1 branch.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   |   29 +++-
 src/mesa/drivers/dri/i965/brw_fs.h |2 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp   |6 ++--
 .../drivers/dri/i965/brw_fs_live_variables.cpp |2 +-
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  |8 +++---
 .../dri/i965/brw_fs_schedule_instructions.cpp  |6 ++--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |7 +++--
 7 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f4aa9f7..c128175 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -60,6 +60,9 @@ fs_inst::init()
this-src[0] = reg_undef;
this-src[1] = reg_undef;
this-src[2] = reg_undef;
+
+   /* This will be the case for almost all instructions. */
+   this-regs_written = 1;
 }
 
 fs_inst::fs_inst()
@@ -254,6 +257,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg 
surf_index,
   fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
   inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
   vec4_result, surf_index, vec4_offset);
+  inst-regs_written = 4;
   instructions.push_tail(inst);
 
   vec4_result.reg_offset += const_offset  3;
@@ -329,26 +333,13 @@ fs_inst::equals(fs_inst *inst)
offset == inst-offset);
 }
 
-int
-fs_inst::regs_written()
-{
-   if (is_tex() || opcode == FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7)
-  return 4;
-
-   /* The SINCOS and INT_DIV_QUOTIENT_AND_REMAINDER math functions return 2,
-* but we don't currently use them...nor do we have an opcode for them.
-*/
-
-   return 1;
-}
-
 bool
 fs_inst::overwrites_reg(const fs_reg reg)
 {
return (reg.file == dst.file 
reg.reg == dst.reg 
reg.reg_offset = dst.reg_offset  
-   reg.reg_offset  dst.reg_offset + regs_written());
+   reg.reg_offset  dst.reg_offset + regs_written);
 }
 
 bool
@@ -1388,7 +1379,7 @@ fs_visitor::split_virtual_grfs()
   /* If there's a SEND message that requires contiguous destination
* registers, no splitting is allowed.
*/
-  if (inst-regs_written()  1) {
+  if (inst-regs_written  1) {
 split_grf[inst-dst.reg] = false;
   }
}
@@ -2109,7 +2100,7 @@ fs_visitor::compute_to_mrf()
 /* Things returning more than one register would need us to
  * understand coalescing out more than one MOV at a time.
  */
-if (scan_inst-regs_written()  1)
+if (scan_inst-regs_written  1)
break;
 
/* SEND instructions can't have MRF as a destination. */
@@ -2326,7 +2317,7 @@ void
 fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
 {
int reg_size = dispatch_width / 8;
-   int write_len = inst-regs_written() * reg_size;
+   int write_len = inst-regs_written * reg_size;
int first_write_grf = inst-dst.reg;
bool needs_dep[BRW_MAX_MRF];
assert(write_len  (int)sizeof(needs_dep) - 1);
@@ -2366,7 +2357,7 @@ 
fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
* dependency has more latency than a MOV.
*/
   if (scan_inst-dst.file == GRF) {
- for (int i = 0; i  scan_inst-regs_written(); i++) {
+ for (int i = 0; i  scan_inst-regs_written; i++) {
 int reg = scan_inst-dst.reg + i * reg_size;
 
 if (reg = first_write_grf 
@@ -2405,7 +2396,7 @@ 
fs_visitor::insert_gen4_pre_send_dependency_workarounds(fs_inst *inst)
 void
 fs_visitor::insert_gen4_post_send_dependency_workarounds(fs_inst *inst)
 {
-   int write_len = inst-regs_written() * dispatch_width / 8;
+   int write_len = inst-regs_written * dispatch_width / 8;
int first_write_grf = inst-dst.reg;
bool needs_dep[BRW_MAX_MRF];
assert(write_len  (int)sizeof(needs_dep) - 1);
diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index 76130b1..0c5aad1 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -174,7 +174,6 @@ public:
fs_reg src0, fs_reg src1,fs_reg src2);
 
bool equals(fs_inst *inst);
-   int regs_written();
bool overwrites_reg(const fs_reg reg);
bool is_tex();
bool is_math();
@@ -192,6 +191,7 @@ public:
uint8_t flag_subreg;
 
int mlen; /**< SEND message length */
+   int regs_written; /**< Number of vgrfs written by a SEND message, or 1 */
int base_mrf; /**< First MRF in the SEND message, if mlen is nonzero. */
uint32_t texture_offset; /**< Texture offset bitfield */
int sampler;
diff --git
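
For readers following the regs_written change, here is a minimal sketch of how the dependency-workaround loops in this patch walk the registers an instruction writes. It is an illustration only: InstSketch and written_grfs are hypothetical names, not Mesa code, and the stride of dispatch_width / 8 is taken from the reg_size computation shown in the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for fs_inst, reduced to the two fields the loops use.
struct InstSketch {
   int dst_reg;          // first GRF written (inst->dst.reg in the patch)
   int regs_written = 1; // plain field now: 1 unless a SEND returns more
};

// Collect the GRF numbers an instruction writes, stepping by
// reg_size = dispatch_width / 8 as the workaround code does.
inline std::vector<int> written_grfs(const InstSketch &inst, int dispatch_width)
{
   // Assumption: only SIMD8 and SIMD16 dispatch are considered here.
   assert(dispatch_width == 8 || dispatch_width == 16);

   int reg_size = dispatch_width / 8;
   std::vector<int> regs;
   for (int i = 0; i < inst.regs_written; i++)
      regs.push_back(inst.dst_reg + i * reg_size);
   return regs;
}
```

With regs_written stored as a field, an opcode that returns several registers only has to set the count when the instruction is created; every other instruction keeps the default of 1.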

[Mesa-dev] [PATCH 12/13] i965/fs: Use LD messages for pre-gen7 varying-index uniform loads

This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.

NOTE: This is a candidate for the 9.1 branch.
---
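
As a rough illustration of the offset arithmetic in this patch (a sketch, not the driver code), the constant offset is split into a vec4-aligned part that is added to the varying offset and a component part that selects a register within the returned vec4. PullLoadSplit and split_const_offset are hypothetical names introduced for this example; the scale of 2 for gen4 SIMD8 follows the comment in the diff about using the SIMD16 message there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the three quantities the patch derives.
struct PullLoadSplit {
   uint32_t base_offset; // added to the varying offset: const_offset & ~3
   uint32_t reg_offset;  // register offset into the vec4 result
   int regs_written;     // registers returned by the load: 4 * scale
};

inline PullLoadSplit split_const_offset(uint32_t const_offset,
                                        int gen, int dispatch_width)
{
   assert(dispatch_width == 8 || dispatch_width == 16);

   // Gen4 in SIMD8 uses the SIMD16 message, so each component of the
   // result occupies two registers instead of one.
   int scale = (gen == 4 && dispatch_width == 8) ? 2 : 1;

   PullLoadSplit s;
   s.base_offset = const_offset & ~3u;         // vec4-aligned part
   s.reg_offset = (const_offset & 3u) * scale; // component within the vec4
   s.regs_written = 4 * scale;
   return s;
}
```

For a constant offset of 6, the load starts at offset 4 and the MOV reads component 2 of the result. Because all four components of a vec4 access share the same aligned base, the four loads are identical and CSE can drop three of them.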
 src/mesa/drivers/dri/i965/brw_fs.cpp  |   92 ++---
 src/mesa/drivers/dri/i965/brw_fs.h|3 +-
 src/mesa/drivers/dri/i965/brw_fs_cse.cpp  |1 +
 src/mesa/drivers/dri/i965/brw_fs_emit.cpp |   55 +++--
 4 files changed, 84 insertions(+), 67 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 5d83e50..e504e3a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -238,57 +238,53 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, fs_reg surf_index,
exec_list instructions;
fs_inst *inst;
 
-   if (intel->gen >= 7) {
-  /* We have our constant surface use a pitch of 4 bytes, so our index can
-   * be any component of a vector, and then we load 4 contiguous
-   * components starting from that.
-   *
-   * We break down the const_offset to a portion added to the variable
-   * offset and a portion done using reg_offset, which means that if you
-   * have GLSL using something like uniform vec4 a[20]; gl_FragColor =
-   * a[i], we'll temporarily generate 4 vec4 loads from offset i * 4, and
-   * CSE can later notice that those loads are all the same and eliminate
-   * the redundant ones.
-   */
-  fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
-  instructions.push_tail(ADD(vec4_offset,
- varying_offset, const_offset & ~3));
-
-  fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type);
-  inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7,
-  vec4_result, surf_index, vec4_offset);
-  inst->regs_written = 4;
-  instructions.push_tail(inst);
-
-  vec4_result.reg_offset += const_offset & 3;
-  instructions.push_tail(MOV(dst, vec4_result));
-   } else {
-  fs_reg offset = fs_reg(this, glsl_type::uint_type);
-  instructions.push_tail(ADD(offset, varying_offset, fs_reg(const_offset)));
-
-  int base_mrf = 13;
-  bool header_present = true;
-
-  fs_reg mrf = fs_reg(MRF, base_mrf + header_present);
-  mrf.type = BRW_REGISTER_TYPE_D;
-
-  /* On gen6+ we want the dword offset passed in, but on gen4/5 we need a
-   * dword-aligned byte offset.
+   /* We have our constant surface use a pitch of 4 bytes, so our index can
+* be any component of a vector, and then we load 4 contiguous
+* components starting from that.
+*
+* We break down the const_offset to a portion added to the variable
+* offset and a portion done using reg_offset, which means that if you
+* have GLSL using something like uniform vec4 a[20]; gl_FragColor =
+* a[i], we'll temporarily generate 4 vec4 loads from offset i * 4, and
+* CSE can later notice that those loads are all the same and eliminate
+* the redundant ones.
+*/
+   fs_reg vec4_offset = fs_reg(this, glsl_type::int_type);
+   instructions.push_tail(ADD(vec4_offset,
+  varying_offset, const_offset & ~3));
+
+   int scale = 1;
+   if (intel->gen == 4 && dispatch_width == 8) {
+  /* Pre-gen5, we can either use a SIMD8 message that requires (header,
+   * u, v, r) as parameters, or we can just use the SIMD16 message
+   * consisting of (header, u).  We choose the second, at the cost of a
+   * longer return length.
*/
-  if (intel->gen == 6) {
- instructions.push_tail(MOV(mrf, offset));
-  } else {
- instructions.push_tail(MUL(mrf, offset, fs_reg(4)));
-  }
-  inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD,
-  dst, surf_index);
-  inst->header_present = header_present;
-  inst->base_mrf = base_mrf;
-  inst->mlen = header_present + dispatch_width / 8;
+  scale = 2;
+   }
 
-  instructions.push_tail(inst);
+   enum opcode op;
+   if (intel->gen >= 7)
+  op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7;
+   else
+  op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
+   fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type);
+   inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset);
+   inst->regs_written = 4 * scale;
+   instructions.push_tail(inst);
+
+   if (intel->gen < 7) {
+  inst->base_mrf = 13;
+  inst->header_present = true;
+  if (intel->gen == 4)
+ inst->mlen = 3;
+  else
+ inst->mlen = 1 + dispatch_width / 8;
}
 
+   vec4_result.reg_offset += (const_offset & 3) * scale;
+   instructions.push_tail(MOV(dst, vec4_result));
+
return instructions;
 }
 
@@ -766,7 +762,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
case

[Mesa-dev] [PATCH 13/13] i965/fs: Allow CSE on pre-gen7 varying-index uniform loads