[Mesa-dev] [PATCH] ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.

2013-04-02 Thread Haixia Shi
If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the [0] suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().

This avoids the situation where the output buffer does not have enough space
to hold the [0] suffix, resulting in an incomplete array specification like
foobar[0.
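
For illustration only (not part of the patch): a minimal standalone sketch of the length computation described above. The `struct uniform_info` type and both function names are hypothetical stand-ins, not Mesa's actual uniform storage or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two uniform fields that matter here. */
struct uniform_info {
   const char *name;         /* name as stored, without any "[0]" suffix */
   unsigned array_elements;  /* 0 for non-array uniforms */
};

/* Buffer size needed to return the name: the name itself, the
 * terminating NUL, and three more characters for "[0]" on arrays. */
static size_t
uniform_name_buffer_size(const struct uniform_info *u)
{
   assert(u != NULL && u->name != NULL);
   return strlen(u->name) + 1 + (u->array_elements != 0 ? 3 : 0);
}

/* Largest buffer size over all uniforms, i.e. the value a query like
 * ACTIVE_UNIFORM_MAX_LENGTH would have to report under this scheme. */
static size_t
max_uniform_name_buffer_size(const struct uniform_info *uniforms, size_t count)
{
   size_t max_len = 0;
   size_t i;

   for (i = 0; i < count; i++) {
      const size_t len = uniform_name_buffer_size(&uniforms[i]);
      if (len > max_len)
         max_len = len;
   }
   return max_len;
}
```

With an array uniform named foobar, this yields 10 bytes: 6 for the name, 3 for the suffix, and 1 for the NUL.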

Change-Id: Icd58cd6a73c9de7bbe5659d757b8009021846019
Signed-off-by: Haixia Shi h...@chromium.org
Reviewed-by: Stephane Marchesin marc...@chromium.org
---
 src/mesa/main/shaderapi.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index be69467..68767f4 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -519,8 +519,11 @@ get_programiv(struct gl_context *ctx, GLuint program, GLenum pname, GLint *param
 
       for (i = 0; i < shProg->NumUserUniformStorage; i++) {
          /* Add one for the terminating NUL character.
+          * However if the uniform is an array, then add three extra characters
+          * for the appended [0] suffix, in addition to the terminating NUL.
           */
-         const GLint len = strlen(shProg->UniformStorage[i].name) + 1;
+         const GLint len = strlen(shProg->UniformStorage[i].name) + 1 +
+             ((shProg->UniformStorage[i].array_elements != 0) ? 3 : 0);
 
          if (len > max_len)
             max_len = len;
-- 
1.8.1.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] R600: Emit native instructions for tex

2013-04-02 Thread Sean Silva
Tests?

-- Sean Silva


Re: [Mesa-dev] [PATCH] radeon/llvm: Build libradeonllvm as a static library

2013-04-02 Thread Michel Dänzer
On Mon, 2013-04-01 at 14:11 -0700, Tom Stellard wrote: 
> From: Tom Stellard thomas.stell...@amd.com
> 
> Building libradeonllvm as a shared object has led to a number of bugs
> and build system complications, and I don't think it's necessary for
> such a small library.
> 
> This library was originally changed to a shared object to work around
> linker errors in egl_static.so, but these appear to be fixed now.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=62226
> ---
> 
> Please test to make sure this works for your build configuration.

Tested-by: Michel Dänzer michel.daen...@amd.com


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


[Mesa-dev] [PATCH V2] mesa: don't memcmp() off the end of a cache key.

2013-04-02 Thread Chris Forbes
Reported-by: `per` in #intel-gfx

The size of the cache key varies, so store the actual size as well as
the key blob itself, rather than just assuming it's the same as the size
passed in.
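
For illustration only (not part of the patch): a minimal standalone sketch of a hash-chain lookup that stores each key's size and compares it before calling memcmp(). The `struct item` type and the `item_create` / `chain_search` helpers are hypothetical and much simpler than Mesa's program cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified cache entry; not Mesa's struct cache_item. */
struct item {
   uint32_t hash;
   size_t keysize;      /* size of the blob that key points to */
   void *key;
   int value;
   struct item *next;
};

/* Create an entry that owns a private copy of the key and records its size. */
static struct item *
item_create(uint32_t hash, const void *key, size_t keysize, int value,
            struct item *next)
{
   struct item *c;

   assert(key != NULL && keysize > 0);
   c = malloc(sizeof *c);
   if (!c)
      return NULL;
   c->key = malloc(keysize);
   if (!c->key) {
      free(c);
      return NULL;
   }
   memcpy(c->key, key, keysize);
   c->keysize = keysize;
   c->hash = hash;
   c->value = value;
   c->next = next;
   return c;
}

/* Walk one hash chain.  The size check runs before memcmp(), so memcmp()
 * never reads past the end of a stored key that is shorter than the
 * key being looked up. */
static struct item *
chain_search(struct item *head, uint32_t hash, const void *key, size_t keysize)
{
   struct item *c;

   for (c = head; c; c = c->next) {
      if (c->hash == hash &&
          c->keysize == keysize &&
          memcmp(c->key, key, keysize) == 0)
         return c;
   }
   return NULL;
}
```

Without the stored size, a lookup with a 6-byte key against a stored 2-byte key with the same hash would compare 6 bytes of a 2-byte allocation.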

NOTE: This is a candidate for stable branches.

V2: Don't leave silly holes in structure; use unsigned instead of
GLuint.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
 src/mesa/program/prog_cache.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/program/prog_cache.c b/src/mesa/program/prog_cache.c
index 47f926b..1041f35 100644
--- a/src/mesa/program/prog_cache.c
+++ b/src/mesa/program/prog_cache.c
@@ -37,6 +37,7 @@
 struct cache_item
 {
GLuint hash;
+   unsigned keysize;
void *key;
struct gl_program *program;
struct cache_item *next;
@@ -183,7 +184,10 @@ _mesa_search_program_cache(struct gl_program_cache *cache,
   struct cache_item *c;
 
    for (c = cache->items[hash % cache->size]; c; c = c->next) {
-      if (c->hash == hash && memcmp(c->key, key, keysize) == 0) {
+      if (c->hash == hash &&
+          c->keysize == keysize &&
+          memcmp(c->key, key, keysize) == 0) {
+
          cache->last = c;
          return c->program;
       }
@@ -207,6 +211,7 @@ _mesa_program_cache_insert(struct gl_context *ctx,
 
    c->key = malloc(keysize);
    memcpy(c->key, key, keysize);
+   c->keysize = keysize;
 
    c->program = program;  /* no refcount change */
 
@@ -235,6 +240,7 @@ _mesa_shader_cache_insert(struct gl_context *ctx,
 
    c->key = malloc(keysize);
    memcpy(c->key, key, keysize);
+   c->keysize = keysize;
 
    c->program = (struct gl_program *)program;  /* no refcount change */
 
-- 
1.8.2



Re: [Mesa-dev] [PATCH 4/4] radeonsi: add instance divisor support v3

2013-04-02 Thread Michel Dänzer
On Wed, 2013-03-27 at 16:35 +0100, Christian König wrote: 
> From: Christian König christian.koe...@amd.com
> 
> v2: reduce key size, don't copy key around too much.
> v3: remove key size reduction
> 
> Signed-off-by: Christian König christian.koe...@amd.com

Reviewed-by: Michel Dänzer michel.daen...@amd.com 

-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


[Mesa-dev] [Bug 62921] [llvmpipe] piglit arb_color_buffer_float-drawpixels GL_RGBA16F regression

2013-04-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62921

Roland Scheidegger srol...@vmware.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Roland Scheidegger srol...@vmware.com ---
Fixed by 9b329f4c095a6b0aa5e55519c32fcf4c9d823e2b.

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH] radeonsi: remove sampler writemask v2

2013-04-02 Thread Christian König
From: Christian König christian.koe...@amd.com

v2: fix intrinsic name as well

Signed-off-by: Christian König christian.koe...@amd.com
---
 src/gallium/drivers/radeonsi/radeonsi_shader.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/radeonsi_shader.c b/src/gallium/drivers/radeonsi/radeonsi_shader.c
index 5fdf46e..1c5fa51 100644
--- a/src/gallium/drivers/radeonsi/radeonsi_shader.c
+++ b/src/gallium/drivers/radeonsi/radeonsi_shader.c
@@ -795,10 +795,6 @@ static void tex_fetch_args(
 	unsigned count = 0;
 	unsigned chan;
 
-	/* WriteMask */
-	/* XXX: should be optimized using emit_data->inst->Dst[0].Register.WriteMask*/
-	emit_data->args[0] = lp_build_const_int32(bld_base->base.gallivm, 0xf);
-
 	/* Fetch and project texture coordinates */
 	coords[3] = lp_build_emit_fetch(bld_base, emit_data->inst, 0, TGSI_CHAN_W);
 	for (chan = 0; chan < 3; chan++ ) {
@@ -904,20 +900,19 @@ static void tex_fetch_args(
 	while (count < util_next_power_of_two(count))
 		address[count++] = LLVMGetUndef(LLVMInt32TypeInContext(gallivm->context));
 
-	emit_data->args[1] = lp_build_gather_values(gallivm, address, count);
+	emit_data->args[0] = lp_build_gather_values(gallivm, address, count);
 
 	/* Resource */
-	emit_data->args[2] = si_shader_ctx->resources[emit_data->inst->Src[1].Register.Index];
+	emit_data->args[1] = si_shader_ctx->resources[emit_data->inst->Src[1].Register.Index];
 
 	/* Sampler */
-	emit_data->args[3] = si_shader_ctx->samplers[emit_data->inst->Src[1].Register.Index];
+	emit_data->args[2] = si_shader_ctx->samplers[emit_data->inst->Src[1].Register.Index];
 
 	/* Dimensions */
-	emit_data->args[4] = lp_build_const_int32(bld_base->base.gallivm, target);
+	emit_data->args[3] = lp_build_const_int32(bld_base->base.gallivm, target);
+
+	emit_data->arg_count = 4;
 
-	emit_data->arg_count = 5;
-	/* XXX: To optimize, we could use a float or v2f32, if the last bits of
-	 * the writemask are clear */
 	emit_data->dst_type = LLVMVectorType(
 			LLVMFloatTypeInContext(bld_base->base.gallivm->context),
 			4);
@@ -931,7 +926,7 @@ static void build_tex_intrinsic(const struct lp_build_tgsi_action * action,
 	char intr_name[23];
 
 	sprintf(intr_name, "%sv%ui32", action->intr_name,
-		LLVMGetVectorSize(LLVMTypeOf(emit_data->args[1])));
+		LLVMGetVectorSize(LLVMTypeOf(emit_data->args[0])));
 
 	emit_data->output[emit_data->chan] = build_intrinsic(
 		base->gallivm->builder, intr_name, emit_data->dst_type,
-- 
1.7.9.5



Re: [Mesa-dev] [PATCH] drirc: set always_have_depth_buffer for Topogon

2013-04-02 Thread Jose Fonseca
----- Original Message -----
> On 03/29/2013 05:30 PM, Brian Paul wrote:
> 
> Has this bug been reported to the Topogun developer?

Yes, I reported it via http://www.topogun.com/support/contact-us.htm on 29th 
September 2012.

I have received no reply since, and I did not try a second time.

Also note that there has been no release/update since then either. So it is 
possible that a fix has been incorporated but not released.

Jose
 
>> ---
>>  src/mesa/drivers/dri/common/drirc |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/common/drirc
>> b/src/mesa/drivers/dri/common/drirc
>> index a13941f..556d1b5 100644
>> --- a/src/mesa/drivers/dri/common/drirc
>> +++ b/src/mesa/drivers/dri/common/drirc
>> @@ -25,5 +25,11 @@
>>          <application name="Savage 2" executable="savage2.bin">
>>              <option name="disable_glsl_line_continuations" value="true" />
>>          </application>
>> +        <application name="Topogun (32-bit)" executable="topogun32">
>> +            <option name="always_have_depth_buffer" value="true" />
>> +        </application>
>> +        <application name="Topogun (64-bit)" executable="topogun64">
>> +            <option name="always_have_depth_buffer" value="true" />
>> +        </application>
>>      </device>
>>  </driconf>
 
 


Re: [Mesa-dev] [PATCH 2/2] gallivm: bring back optimized but incorrect float to smallfloat optimizations

2013-04-02 Thread Jose Fonseca


----- Original Message -----
> From: Roland Scheidegger srol...@vmware.com
> 
> Conceptually the same as previously done in float_to_half.
> Should cut down number of instructions from 14 to 10 or so, but
> will promote some NaNs to Infs, so it's disabled.
> It gets a bit tricky though handling all the cases correctly...
> Passes basic tests either way (though there are no tests testing special
> cases, but some manual tests injecting them seemed promising).
> ---
>  .../auxiliary/gallivm/lp_bld_format_float.c    |  124 ++--
>  1 file changed, 86 insertions(+), 38 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_float.c
> b/src/gallium/auxiliary/gallivm/lp_bld_format_float.c
> index 161e392..61b6a60 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_format_float.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_format_float.c
> @@ -79,13 +79,15 @@ lp_build_float_to_smallfloat(struct gallivm_state *gallivm,
>  {
>     LLVMBuilderRef builder = gallivm->builder;
>     LLVMValueRef i32_floatexpmask, i32_smallexpmask, magic, normal;
> -   LLVMValueRef rescale_src, tmp, i32_roundmask, small_max;
> -   LLVMValueRef is_nan, i32_qnanbit, src_abs, shift, infcheck_src, res;
> -   LLVMValueRef is_inf, is_nan_or_inf, nan_or_inf, mask;
> +   LLVMValueRef rescale_src, i32_roundmask, small_max;
> +   LLVMValueRef i32_qnanbit, shift, res;
> +   LLVMValueRef is_nan_or_inf, nan_or_inf, mask, srci;
>     struct lp_type f32_type = lp_type_float_vec(32, 32 * i32_type.length);
>     struct lp_build_context f32_bld, i32_bld;
>     LLVMValueRef zero = lp_build_const_vec(gallivm, f32_type, 0.0f);
>     unsigned exponent_start = mantissa_start + mantissa_bits;
> +   boolean always_preserve_nans = true;
> +   boolean maybe_correct_denorm_rounding = true;
>  
>     lp_build_context_init(&f32_bld, gallivm, f32_type);
>     lp_build_context_init(&i32_bld, gallivm, i32_type);
> @@ -94,35 +96,41 @@ lp_build_float_to_smallfloat(struct gallivm_state *gallivm,
>                                                ((1 << exponent_bits) - 1) << 23);
>     i32_floatexpmask = lp_build_const_int_vec(gallivm, i32_type, 0xff << 23);
>  
> -   src_abs = lp_build_abs(&f32_bld, src);
> -   src_abs = LLVMBuildBitCast(builder, src_abs, i32_bld.vec_type, "");
> +   srci = LLVMBuildBitCast(builder, src, i32_bld.vec_type, "");

Let's use src_int instead of srci (the latter suggests an index more than 
an integer).

  
>     if (has_sign) {
> -      rescale_src = src_abs;
> -      infcheck_src = src_abs;
> -      src = LLVMBuildBitCast(builder, src, i32_bld.vec_type, "");
> +      rescale_src = src;
>     }
>     else {
>        /* clamp to pos range (can still have sign bit if NaN or negative zero) */
> -      rescale_src = lp_build_max(&f32_bld, src, zero);
> -      rescale_src = LLVMBuildBitCast(builder, rescale_src, i32_bld.vec_type, "");
> -      src = LLVMBuildBitCast(builder, src, i32_bld.vec_type, "");
> -      infcheck_src = src;
> +      rescale_src = lp_build_max(&f32_bld, zero, src);
>     }
> +   rescale_src = LLVMBuildBitCast(builder, rescale_src, i32_bld.vec_type, "");
>  
>     /* ordinary number */
> -   /* get rid of excess mantissa bits, and while here also potential sign bit */
> -   i32_roundmask = lp_build_const_int_vec(gallivm, i32_type,
> -                                          ~((1 << (23 - mantissa_bits)) - 1) &
> -                                          0x7fffffff);
> +   /*
> +    * get rid of excess mantissa bits and sign bit
> +    * This is only really needed for correct rounding of denorms I think
> +    * but only if we use the preserve NaN path does using
> +    * src_abs instead save us any instruction.
> +    */
> +   if (maybe_correct_denorm_rounding || !always_preserve_nans) {
> +      i32_roundmask = lp_build_const_int_vec(gallivm, i32_type,
> +                                             ~((1 << (23 - mantissa_bits)) - 1) &
> +                                             0x7fffffff);
> +      rescale_src = LLVMBuildBitCast(builder, rescale_src, i32_bld.vec_type, "");
> +      rescale_src = lp_build_and(&i32_bld, rescale_src, i32_roundmask);
> +      rescale_src = LLVMBuildBitCast(builder, rescale_src, f32_bld.vec_type, "");
> +   }
> +   else {
> +      rescale_src = lp_build_abs(&f32_bld, src);
> +   }
>  
> -   tmp = lp_build_and(&i32_bld, rescale_src, i32_roundmask);
> -   tmp = LLVMBuildBitCast(builder, tmp, f32_bld.vec_type, "");
>     /* bias exponent (and denormalize if necessary) */
>     magic = lp_build_const_int_vec(gallivm, i32_type,
>                                    ((1 << (exponent_bits - 1)) - 1) << 23);
>     magic = LLVMBuildBitCast(builder, magic, f32_bld.vec_type, "");
> -   normal = lp_build_mul(&f32_bld, tmp, magic);
> +   normal = lp_build_mul(&f32_bld, rescale_src, magic);
>  
>     /* clamp to max value - largest non-infinity number */
>     small_max = lp_build_const_int_vec(gallivm, i32_type,
> @@ -141,19 +149,66 @@ lp_build_float_to_smallfloat(struct gallivm_state *gallivm,
>      * (Cannot actually save 

Re: [Mesa-dev] [PATCH] gallium/hud: do .xxxx swizzling for the font texture in the fragment shader

2013-04-02 Thread Brian Paul

On 04/01/2013 07:36 PM, Marek Olšák wrote:

> This allows using L8 and R8 for the font if I8 isn't supported.
> ---
>   src/gallium/auxiliary/hud/hud_context.c |   36 ++++++++++++++++++++++++++++++------
>   1 file changed, 30 insertions(+), 6 deletions(-)



Tested-by: Brian Paul bri...@vmware.com



[Mesa-dev] [PATCH 1/2] svga: refactor occlusion query code

2013-04-02 Thread Brian Paul
From: Brian Paul bri...@vmware.com

This is in preparation for adding new query types for the HUD.
---
 src/gallium/drivers/svga/svga_pipe_query.c |  218 
 1 file changed, 124 insertions(+), 94 deletions(-)

diff --git a/src/gallium/drivers/svga/svga_pipe_query.c b/src/gallium/drivers/svga/svga_pipe_query.c
index 902f84c..b83c7d4 100644
--- a/src/gallium/drivers/svga/svga_pipe_query.c
+++ b/src/gallium/drivers/svga/svga_pipe_query.c
@@ -44,7 +44,10 @@ struct pipe_query {
 
 struct svga_query {
    struct pipe_query base;
-   SVGA3dQueryType type;
+   unsigned type;                  /**< PIPE_QUERY_x or SVGA_QUERY_x */
+   SVGA3dQueryType svga_type;      /**< SVGA3D_QUERYTYPE_x, or zero */
+
+   /** For PIPE_QUERY_OCCLUSION_COUNTER / SVGA3D_QUERYTYPE_OCCLUSION */
    struct svga_winsys_buffer *hwbuf;
    volatile SVGA3dQueryResult *queryResult;
    struct pipe_fence_handle *fence;
@@ -79,31 +82,35 @@ static struct pipe_query *svga_create_query( struct pipe_context *pipe,
    if (!sq)
       goto no_sq;
 
-   sq->type = SVGA3D_QUERYTYPE_OCCLUSION;
-
-   sq->hwbuf = svga_winsys_buffer_create(svga,
-                                         1,
-                                         SVGA_BUFFER_USAGE_PINNED,
-                                         sizeof *sq->queryResult);
-   if(!sq->hwbuf)
-      goto no_hwbuf;
-
-   sq->queryResult = (SVGA3dQueryResult *)sws->buffer_map(sws,
-                                                          sq->hwbuf,
-                                                          PIPE_TRANSFER_WRITE);
-   if(!sq->queryResult)
-      goto no_query_result;
-
-   sq->queryResult->totalSize = sizeof *sq->queryResult;
-   sq->queryResult->state = SVGA3D_QUERYSTATE_NEW;
-
-   /*
-    * We request the buffer to be pinned and assume it is always mapped.
-    *
-    * The reason is that we don't want to wait for fences when checking the
-    * query status.
-    */
-   sws->buffer_unmap(sws, sq->hwbuf);
+   switch (query_type) {
+   case PIPE_QUERY_OCCLUSION_COUNTER:
+      sq->svga_type = SVGA3D_QUERYTYPE_OCCLUSION;
+
+      sq->hwbuf = svga_winsys_buffer_create(svga, 1,
+                                            SVGA_BUFFER_USAGE_PINNED,
+                                            sizeof *sq->queryResult);
+      if (!sq->hwbuf)
+         goto no_hwbuf;
+
+      sq->queryResult = (SVGA3dQueryResult *)
+         sws->buffer_map(sws, sq->hwbuf, PIPE_TRANSFER_WRITE);
+      if (!sq->queryResult)
+         goto no_query_result;
+
+      sq->queryResult->totalSize = sizeof *sq->queryResult;
+      sq->queryResult->state = SVGA3D_QUERYSTATE_NEW;
+
+      /* We request the buffer to be pinned and assume it is always mapped.
+       * The reason is that we don't want to wait for fences when checking the
+       * query status.
+       */
+      sws->buffer_unmap(sws, sq->hwbuf);
+      break;
+   default:
+      assert(!"unexpected query type in svga_create_query()");
+   }
+
+   sq->type = query_type;
 
    return &sq->base;
 
@@ -123,8 +130,16 @@ static void svga_destroy_query(struct pipe_context *pipe,
    struct svga_query *sq = svga_query( q );
 
    SVGA_DBG(DEBUG_QUERY, "%s\n", __FUNCTION__);
-   sws->buffer_destroy(sws, sq->hwbuf);
-   sws->fence_reference(sws, &sq->fence, NULL);
+
+   switch (sq->type) {
+   case PIPE_QUERY_OCCLUSION_COUNTER:
+      sws->buffer_destroy(sws, sq->hwbuf);
+      sws->fence_reference(sws, &sq->fence, NULL);
+      break;
+   default:
+      assert(!"svga: unexpected query type in svga_destroy_query()");
+   }
+
    FREE(sq);
 }
 
@@ -139,39 +154,42 @@ static void svga_begin_query(struct pipe_context *pipe,
 
    SVGA_DBG(DEBUG_QUERY, "%s\n", __FUNCTION__);
 
-   assert(!svga->sq);
-
    /* Need to flush out buffered drawing commands so that they don't
     * get counted in the query results.
     */
    svga_hwtnl_flush_retry(svga);
 
-   if(sq->queryResult->state == SVGA3D_QUERYSTATE_PENDING) {
-      /* The application doesn't care for the pending query result. We cannot
-       * let go the existing buffer and just get a new one because its storage
-       * may be reused for other purposes and clobbered by the host when it
-       * determines the query result. So the only option here is to wait for
-       * the existing query's result -- not a big deal, given that no sane
-       * application would do this.
-       */
-      uint64_t result;
+   switch (sq->type) {
+   case PIPE_QUERY_OCCLUSION_COUNTER:
+      assert(!svga->sq);
+      if (sq->queryResult->state == SVGA3D_QUERYSTATE_PENDING) {
+         /* The application doesn't care for the pending query result. We cannot
+          * let go the existing buffer and just get a new one because its storage
+          * may be reused for other purposes and clobbered by the host when it
+          * determines the query result. So the only option here is to wait for
+          * the existing query's result -- not a big deal, given that no sane
+          * application would do this.
+          */
+         uint64_t 

[Mesa-dev] [PATCH 2/2] svga: add HUD queries for number of draw calls, number of fallbacks

2013-04-02 Thread Brian Paul
From: Brian Paul bri...@vmware.com

The fallbacks count is the number of drawing calls that use a draw
module fallback, such as polygon stipple.
---
 src/gallium/drivers/svga/svga_context.h    |    9 +++++++++
 src/gallium/drivers/svga/svga_pipe_draw.c  |    3 +++
 src/gallium/drivers/svga/svga_pipe_query.c |   27 +++++++++++++++++++++++++++
 src/gallium/drivers/svga/svga_screen.c     |   22 ++++++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/src/gallium/drivers/svga/svga_context.h b/src/gallium/drivers/svga/svga_context.h
index 32671ec..e27778e 100644
--- a/src/gallium/drivers/svga/svga_context.h
+++ b/src/gallium/drivers/svga/svga_context.h
@@ -42,6 +42,11 @@
 #include "svga3d_shaderdefs.h"
 
 
+/** Non-GPU queries for gallium HUD */
+#define SVGA_QUERY_DRAW_CALLS   (PIPE_QUERY_DRIVER_SPECIFIC + 0)
+#define SVGA_QUERY_FALLBACKS    (PIPE_QUERY_DRIVER_SPECIFIC + 1)
+
+
 struct draw_vertex_shader;
 struct draw_fragment_shader;
 struct svga_shader_result;
@@ -370,6 +375,10 @@ struct svga_context
 
    /** List of buffers with queued transfers */
    struct list_head dirty_buffers;
+
+   /** performance / info queries */
+   uint64_t num_draw_calls;  /**< SVGA_QUERY_DRAW_CALLS */
+   uint64_t num_fallbacks;   /**< SVGA_QUERY_FALLBACKS */
 };
 
 /* A flag for each state_tracker state object:
diff --git a/src/gallium/drivers/svga/svga_pipe_draw.c b/src/gallium/drivers/svga/svga_pipe_draw.c
index e72032e..f0da170 100644
--- a/src/gallium/drivers/svga/svga_pipe_draw.c
+++ b/src/gallium/drivers/svga/svga_pipe_draw.c
@@ -330,6 +330,8 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
    enum pipe_error ret = 0;
    boolean needed_swtnl;
 
+   svga->num_draw_calls++;  /* for SVGA_QUERY_DRAW_CALLS */
+
    if (!u_trim_pipe_prim( info->mode, &count ))
       return;
 
@@ -358,6 +360,7 @@ svga_draw_vbo(struct pipe_context *pipe, const struct pipe_draw_info *info)
 #endif
 
    if (svga->state.sw.need_swtnl) {
+      svga->num_fallbacks++;  /* for SVGA_QUERY_FALLBACKS */
       if (!needed_swtnl) {
          /*
           * We're switching from HW to SW TNL.  SW TNL will require mapping all
diff --git a/src/gallium/drivers/svga/svga_pipe_query.c b/src/gallium/drivers/svga/svga_pipe_query.c
index b83c7d4..6fa6fac 100644
--- a/src/gallium/drivers/svga/svga_pipe_query.c
+++ b/src/gallium/drivers/svga/svga_pipe_query.c
@@ -51,6 +51,9 @@ struct svga_query {
    struct svga_winsys_buffer *hwbuf;
    volatile SVGA3dQueryResult *queryResult;
    struct pipe_fence_handle *fence;
+
+   /** For non-GPU SVGA_QUERY_x queries */
+   uint64_t begin_count, end_count;
 };
 
 /***
@@ -106,6 +109,9 @@ static struct pipe_query *svga_create_query( struct pipe_context *pipe,
        */
       sws->buffer_unmap(sws, sq->hwbuf);
       break;
+   case SVGA_QUERY_DRAW_CALLS:
+   case SVGA_QUERY_FALLBACKS:
+      break;
    default:
       assert(!"unexpected query type in svga_create_query()");
    }
@@ -136,6 +142,10 @@ static void svga_destroy_query(struct pipe_context *pipe,
       sws->buffer_destroy(sws, sq->hwbuf);
       sws->fence_reference(sws, &sq->fence, NULL);
       break;
+   case SVGA_QUERY_DRAW_CALLS:
+   case SVGA_QUERY_FALLBACKS:
+      /* nothing */
+      break;
    default:
       assert(!"svga: unexpected query type in svga_destroy_query()");
    }
@@ -187,6 +197,12 @@ static void svga_begin_query(struct pipe_context *pipe,
 
       svga->sq = sq;
       break;
+   case SVGA_QUERY_DRAW_CALLS:
+      sq->begin_count = svga->num_draw_calls;
+      break;
+   case SVGA_QUERY_FALLBACKS:
+      sq->begin_count = svga->num_fallbacks;
+      break;
    default:
       assert(!"unexpected query type in svga_begin_query()");
    }
@@ -224,6 +240,12 @@ static void svga_end_query(struct pipe_context *pipe,
 
       svga->sq = NULL;
       break;
+   case SVGA_QUERY_DRAW_CALLS:
+      sq->end_count = svga->num_draw_calls;
+      break;
+   case SVGA_QUERY_FALLBACKS:
+      sq->end_count = svga->num_fallbacks;
+      break;
    default:
       assert(!"unexpected query type in svga_end_query()");
    }
@@ -277,6 +299,11 @@ static boolean svga_get_query_result(struct pipe_context *pipe,
 
       *result = (uint64_t)sq->queryResult->result32;
       break;
+   case SVGA_QUERY_DRAW_CALLS:
+      /* fall-through */
+   case SVGA_QUERY_FALLBACKS:
+      vresult->u64 = sq->end_count - sq->begin_count;
+      break;
    default:
       assert(!"unexpected query type in svga_get_query_result");
    }
diff --git a/src/gallium/drivers/svga/svga_screen.c b/src/gallium/drivers/svga/svga_screen.c
index 0558a46..70e2fa8 100644
--- a/src/gallium/drivers/svga/svga_screen.c
+++ b/src/gallium/drivers/svga/svga_screen.c
@@ -491,6 +491,27 @@ svga_fence_finish(struct pipe_screen *screen,
 }
 
 
+static int
+svga_get_driver_query_info(struct pipe_screen *screen,
+                           unsigned index,
+                           struct pipe_driver_query_info *info)
+{
+ 

Re: [Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD

2013-04-02 Thread Brian Paul

On 03/30/2013 08:11 AM, Christoph Bumiller wrote:

> NOTE: Changed the semantic index for the drawtex coordinate to
> be the texture unit index instead of always 0.
> Not sure if this is correct but since the value seems to depend
> on the unit it would make sense to use different varying slots.


Tested-by: Brian Paul bri...@vmware.com


Re: [Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD

2013-04-02 Thread Christoph Bumiller
On 02.04.2013 16:39, Brian Paul wrote:
> On 03/30/2013 08:11 AM, Christoph Bumiller wrote:
>> NOTE: Changed the semantic index for the drawtex coordinate to
>> be the texture unit index instead of always 0.
>> Not sure if this is correct but since the value seems to depend
>> on the unit it would make sense to use different varying slots.
>
> Tested-by: Brian Paul bri...@vmware.com
Thanks !
Just to be sure, you're referring to the part that changes the semantic
index so that TEX0..7(max units) is used instead of always TEX0, right ?
I'll push that as a separate patch then.



Re: [Mesa-dev] [PATCH] PowerPC: Altivec IROUND operation

2013-04-02 Thread Jose Fonseca
I don't see the need for, or benefit of, mixing iround (i.e., float -> int) with 
round (i.e., float -> float).

If this is a one-off, then you should just call

  lp_build_intrinsic_unary(builder, "llvm.ppc.altivec.vctsxs", ...)

If you really need a generic intrinsic helper for iround, then please add a new

  lp_build_iround_foo(..., enum lp_build_round_mode mode)
 
which takes enum lp_build_round_mode 

  LP_BUILD_ROUND_NEAREST -> iround
  LP_BUILD_ROUND_FLOOR -> ifloor
  LP_BUILD_ROUND_CEIL -> iceil
  LP_BUILD_ROUND_TRUNCATE -> itrunc
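
For illustration only: a scalar, plain-C analogue of the mode-to-operation mapping listed above, i.e. one float-to-int helper selected by a rounding-mode enum. The `enum round_mode` and `iround_with_mode` names are hypothetical and this is not gallivm code. It assumes the input is finite and fits in an int, and its "nearest" case rounds ties away from zero, which may differ from what a given hardware instruction does.

```c
#include <assert.h>

/* Hypothetical scalar counterpart of a rounding-mode enum. */
enum round_mode {
   ROUND_NEAREST,   /* like iround (ties away from zero in this sketch) */
   ROUND_FLOOR,     /* like ifloor */
   ROUND_CEIL,      /* like iceil */
   ROUND_TRUNCATE   /* like itrunc */
};

/* Convert a float to int using the requested rounding mode.
 * Assumes x is finite and within the range of int. */
static int
iround_with_mode(float x, enum round_mode mode)
{
   const int t = (int)x;             /* C conversion truncates toward zero */
   const float frac = x - (float)t;  /* signed fractional part, |frac| < 1 */

   switch (mode) {
   case ROUND_NEAREST:
      if (frac >= 0.5f)
         return t + 1;
      if (frac <= -0.5f)
         return t - 1;
      return t;
   case ROUND_FLOOR:
      return frac < 0.0f ? t - 1 : t;
   case ROUND_CEIL:
      return frac > 0.0f ? t + 1 : t;
   case ROUND_TRUNCATE:
      return t;
   }
   assert(!"unexpected rounding mode");
   return t;
}
```

The point of the design is that the integer-returning variants share one entry point and one enum, separate from the float-returning round helpers.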

Jose

----- Original Message -----
> From: Adhemerval Zanella azane...@linux.vnet.ibm.com
> 
> This adds another rounding mode to the enum, which happens otherwise to
> match SSE4.1's rounding modes.  This should be safe as long as the
> IROUND case never hits the SSE4.1 path.
> 
> Reviewed-by: Adam Jackson a...@redhat.com
> Signed-off-by: Adhemerval Zanella azane...@linux.vnet.ibm.com
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_arit.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> index ec05026..021cd6e 100644
> --- a/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> +++ b/src/gallium/auxiliary/gallivm/lp_bld_arit.c
> @@ -1360,10 +1360,17 @@ lp_build_int_to_float(struct lp_build_context *bld,
>  static boolean
>  arch_rounding_available(const struct lp_type type)
>  {
> +   /* SSE4 vector rounding. */
>     if ((util_cpu_caps.has_sse4_1 &&
>         (type.length == 1 || type.width*type.length == 128)) ||
>         (util_cpu_caps.has_avx && type.width*type.length == 256))
>        return TRUE;
> +   /* SSE2 vector to word. */
> +   else if ((util_cpu_caps.has_sse2 &&
> +            ((type.width == 32) && (type.length == 1 || type.length == 4))) ||
> +            (util_cpu_caps.has_avx && type.width == 32 && type.length == 8))
> +      return TRUE;
> +   /* Altivec rounding and vector to word. */
>     else if ((util_cpu_caps.has_altivec &&
>              (type.width == 32 && type.length == 4)))
>        return TRUE;
> @@ -1376,7 +1383,8 @@ enum lp_build_round_mode
>     LP_BUILD_ROUND_NEAREST = 0,
>     LP_BUILD_ROUND_FLOOR = 1,
>     LP_BUILD_ROUND_CEIL = 2,
> -   LP_BUILD_ROUND_TRUNCATE = 3
> +   LP_BUILD_ROUND_TRUNCATE = 3,
> +   LP_BUILD_IROUND = 4
>  };
>  
>  /**
> @@ -1400,6 +1408,7 @@ lp_build_round_sse41(struct lp_build_context *bld,
>  
>     assert(lp_check_value(type, a));
>     assert(util_cpu_caps.has_sse4_1);
> +   assert(mode != LP_BUILD_IROUND);
>  
>     if (type.length == 1) {
>        LLVMTypeRef vec_type;
> @@ -1526,8 +1535,6 @@ lp_build_iround_nearest_sse2(struct lp_build_context *bld,
>  }
>  
>  
> -/*
> - */
>  static INLINE LLVMValueRef
>  lp_build_round_altivec(struct lp_build_context *bld,
>                         LLVMValueRef a,
> @@ -1536,8 +1543,10 @@ lp_build_round_altivec(struct lp_build_context *bld,
>     LLVMBuilderRef builder = bld->gallivm->builder;
>     const struct lp_type type = bld->type;
>     const char *intrinsic = NULL;
> +   LLVMTypeRef ret_type = bld->vec_type;
>  
>     assert(type.floating);
> +   assert(type.width == 32);
>  
>     assert(lp_check_value(type, a));
>     assert(util_cpu_caps.has_altivec);
> @@ -1555,9 +1564,12 @@ lp_build_round_altivec(struct lp_build_context *bld,
>     case LP_BUILD_ROUND_TRUNCATE:
>        intrinsic = "llvm.ppc.altivec.vrfiz";
>        break;
> +   case LP_BUILD_IROUND:
> +      ret_type = lp_build_int_vec_type(bld->gallivm, bld->type);
> +      intrinsic = "llvm.ppc.altivec.vctsxs";
>     }
>  
> -   return lp_build_intrinsic_unary(builder, intrinsic, bld->vec_type, a);
> +   return lp_build_intrinsic_unary(builder, intrinsic, ret_type, a);
>  }
>  
>  static INLINE LLVMValueRef
> @@ -1565,7 +1577,9 @@ lp_build_round_arch(struct lp_build_context *bld,
>                      LLVMValueRef a,
>                      enum lp_build_round_mode mode)
>  {
> -   if (util_cpu_caps.has_sse4_1)
> +   if (util_cpu_caps.has_sse2 && (mode == LP_BUILD_IROUND))
> +     return lp_build_iround_nearest_sse2(bld, a);
> +   else if (util_cpu_caps.has_sse4_1)
>       return lp_build_round_sse41(bld, a, mode);
>     else /* (util_cpu_caps.has_altivec) */
>       return lp_build_round_altivec(bld, a, mode);
> @@ -1893,11 +1907,6 @@ lp_build_iround(struct lp_build_context *bld,
>  
>     assert(lp_check_value(type, a));
>  
> -   if ((util_cpu_caps.has_sse2 &&
> -       ((type.width == 32) && (type.length == 1 || type.length == 4))) ||
> -       (util_cpu_caps.has_avx && type.width == 32 && type.length == 8)) {
> -      return lp_build_iround_nearest_sse2(bld, a);
> -   }
>     if (arch_rounding_available(type)) {
>        res = lp_build_round_arch(bld, a, LP_BUILD_ROUND_NEAREST);
>     }
> --
> 1.7.11.4
 


Re: [Mesa-dev] [PATCH] st/mesa: fix bitmap, drawpix, drawtex for PIPE_CAP_TGSI_TEXCOORD

2013-04-02 Thread Brian Paul

On 04/02/2013 08:43 AM, Christoph Bumiller wrote:
> On 02.04.2013 16:39, Brian Paul wrote:
>> On 03/30/2013 08:11 AM, Christoph Bumiller wrote:
>>> NOTE: Changed the semantic index for the drawtex coordinate to
>>> be the texture unit index instead of always 0.
>>> Not sure if this is correct but since the value seems to depend
>>> on the unit it would make sense to use different varying slots.
>>
>> Tested-by: Brian Paul bri...@vmware.com
>
> Thanks !
> Just to be sure, you're referring to the part that changes the semantic
> index so that TEX0..7(max units) is used instead of always TEX0, right ?
> I'll push that as a separate patch then.

I only tested the patch that's described by the subject line.  I don't 
recall seeing the other one.


-Brian


Re: [Mesa-dev] [PATCH] llvmpipe: add print for int128

2013-04-02 Thread Jose Fonseca
Looks good to me in principle, but I have some remarks (inline) concerning the 
implementation.

Jose

- Original Message -
 From: Adhemerval Zanella azane...@linux.vnet.ibm.com
 
 Reviewed-by: Adam Jackson a...@redhat.com
 Signed-off-by: Adhemerval Zanella azane...@linux.vnet.ibm.com
 ---
  src/gallium/auxiliary/gallivm/lp_bld_printf.c | 56
  +++
  1 file changed, 39 insertions(+), 17 deletions(-)
 
 diff --git a/src/gallium/auxiliary/gallivm/lp_bld_printf.c
 b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
 index 7a6bbd9..71c4d1b 100644
 --- a/src/gallium/auxiliary/gallivm/lp_bld_printf.c
 +++ b/src/gallium/auxiliary/gallivm/lp_bld_printf.c
 @@ -83,33 +83,47 @@ lp_build_print_value(struct gallivm_state *gallivm,
 LLVMTypeKind type_kind;
 LLVMTypeRef type_ref;
 LLVMValueRef params[2 + LP_MAX_VECTOR_LENGTH];
 -   char type_fmt[6] = " %x";
 +   char type_fmt[20];
 char format[2 + 5 * LP_MAX_VECTOR_LENGTH + 2] = "%s";
 -   unsigned length;
 +   unsigned vecsize;
 +   unsigned nargs;
 unsigned i;
  
 type_ref = LLVMTypeOf(value);
 type_kind = LLVMGetTypeKind(type_ref);
  
 if (type_kind == LLVMVectorTypeKind) {
 -  length = LLVMGetVectorSize(type_ref);
 +  vecsize = LLVMGetVectorSize(type_ref);
 +  nargs = vecsize;
  
type_ref = LLVMGetElementType(type_ref);
type_kind = LLVMGetTypeKind(type_ref);
 +   } else if (LLVMGetIntTypeWidth(type_ref) == 128) {
 +  vecsize = 1;
 +  nargs = 2;
 } else {
 -  length = 1;
 +  vecsize = 1;
 +  nargs = 1;
 }
  
 if (type_kind == LLVMFloatTypeKind || type_kind == LLVMDoubleTypeKind) {
 -  type_fmt[2] = '.';
 -  type_fmt[3] = '9';
 -  type_fmt[4] = 'g';
 -  type_fmt[5] = '\0';
 +  snprintf(type_fmt, sizeof type_fmt, " %%9g");

The . is missing here.

 } else if (type_kind == LLVMIntegerTypeKind) {
 -  if (LLVMGetIntTypeWidth(type_ref) == 8) {
 - type_fmt[2] = 'u';
 -  } else {
 - type_fmt[2] = 'i';
 +  unsigned typeWidth = LLVMGetIntTypeWidth(type_ref);
 +  if (LLVMGetIntTypeWidth(type_ref) <= 32) {
 + snprintf(type_fmt, sizeof type_fmt, " %%x");

This doesn't look equivalent to me either. Please retain the previous behavior for 
integers <= 32.

 +  } else if (typeWidth == 64) {
 +#if __WORDSIZE == 64

Is __WORDSIZE standard? I'm particularly concerned about Windows. You could 
just use if (sizeof (unsigned)) here and avoid the magic macro.

Or better, you could just use inttypes.h macros, which are also defined for 
MSVC.
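
For illustration, a minimal sketch of the inttypes.h approach, assuming the 128-bit value has already been split into two 64-bit halves. The helper names format_hex64 and format_hex128 are hypothetical and are not part of lp_bld_printf.c:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a 64-bit value as 16 hex digits.
 * PRIx64 expands to the right length modifier for the platform,
 * so no __WORDSIZE check is needed. */
static int
format_hex64(char *buf, size_t size, uint64_t value)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size, "%016" PRIx64, value);
}

/* Hypothetical helper: format a 128-bit value passed as two 64-bit
 * halves, high half first. */
static int
format_hex128(char *buf, size_t size, uint64_t hi, uint64_t lo)
{
   assert(buf != NULL && size > 0);
   return snprintf(buf, size, "%016" PRIx64 "%016" PRIx64, hi, lo);
}
```

The same macros could be concatenated into the format string that the patch builds with snprintf(), which would remove both #if branches.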

 + snprintf(type_fmt, sizeof type_fmt, " %%016lx");
 +#else
 + snprintf(type_fmt, sizeof type_fmt, " %%016llx");
 +#endif
 +  } else if (typeWidth == 128) {
 +#if __WORDSIZE == 64
 + snprintf(type_fmt, sizeof type_fmt, " %%016lx%%016lx");
 +#else
 + snprintf(type_fmt, sizeof type_fmt, " %%016llx%%016llx");
 +#endif
}
 } else {
/* Unsupported type */
 @@ -117,14 +131,22 @@ lp_build_print_value(struct gallivm_state *gallivm,
 }
  
 /* Create format string and arguments */
 -   assert(strlen(format) + strlen(type_fmt) * length + 2 <= sizeof format);
 +   assert(strlen(format) + strlen(type_fmt) * nargs + 2 <= sizeof format);
  
 params[1] = lp_build_const_string(gallivm, msg);
 -   if (length == 1) {
 +   if (vecsize == 1) {
util_strncat(format, type_fmt, sizeof(format) - strlen(format) - 1);
 -  params[2] = value;
 +  if (LLVMGetIntTypeWidth(type_ref) <= 64) {
 + params[2] = value;
 +  } else {
 + LLVMValueRef shift =
 LLVMConstInt(LLVMIntTypeInContext(gallivm->context, 128), 64, 0);
 + LLVMValueRef lshr = LLVMBuildLShr(builder, value, shift, "");
 + LLVMTypeRef type64 = LLVMInt64TypeInContext(gallivm->context);
 + params[2] = LLVMBuildTrunc(builder, lshr, type64, "");
 + params[3] = LLVMBuildTrunc(builder, value, type64, "");
 +  }
 } else {
 -  for (i = 0; i < length; ++i) {
 +  for (i = 0; i < vecsize; ++i) {
   LLVMValueRef param;
   util_strncat(format, type_fmt, sizeof(format) - strlen(format) -
   1);
   param = LLVMBuildExtractElement(builder, value,
   lp_build_const_int32(gallivm, i), "");
 @@ -144,7 +166,7 @@ lp_build_print_value(struct gallivm_state *gallivm,
 util_strncat(format, "\n", sizeof(format) - strlen(format) - 1);
  
 params[0] = lp_build_const_string(gallivm, format);
 -   return lp_build_print_args(gallivm, 2 + length, params);
 +   return lp_build_print_args(gallivm, 2 + nargs, params);
  }
  
  
 --
 1.7.11.4
 


[Mesa-dev] [Bug 62868] solaris build broken with missing ffsll

2013-04-02 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=62868

Brian Paul bri...@vmware.com changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Brian Paul bri...@vmware.com ---
Should be fixed with commit 95df2b28831147b3e7ce2a3b6257bf60c46b4ab4

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] radeon/llvm: Build libradeonllvm as a static library

2013-04-02 Thread Michel Dänzer
On Tue, 2013-04-02 at 10:20 +0200, Michel Dänzer wrote: 
 On Mon, 2013-04-01 at 14:11 -0700, Tom Stellard wrote: 
  From: Tom Stellard thomas.stell...@amd.com
  
  Building libradeonllvm as a shared object has led to a number of bugs
  and build system complications, and I don't think it's necessary for
  such a small library.
  
  This library was originally changed to a shared object to work around
  linker error in egl_static.so, but these appear to be fixed now.
  
  https://bugs.freedesktop.org/show_bug.cgi?id=62226
  ---
  
  Please test to make sure this works for your build configuration.
 
 Tested-by: Michel Dänzer michel.daen...@amd.com

Retracted, and patch NACKed: I had forgotten I needed to test starting X
with glamor, which this still breaks. 

-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast |  Debian, X and DRI developer


Re: [Mesa-dev] [PATCH] i965: Reduce code duplication in handling of depth, stencil, and HiZ.

2013-04-02 Thread Kenneth Graunke

On 03/26/2013 09:54 PM, Paul Berry wrote:

This patch consolidates duplicate code in the brw_depthbuffer and
gen7_depthbuffer state atoms.  Previously, these state atoms contained
5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
Gen4-6 and 2 for Gen7).  Also a lot of logic for determining the
appropriate buffer setup was duplicated between the Gen4-6 and Gen7
functions.

This refactor splits the code into three separate functions:
brw_emit_depthbuffer(), which determines the appropriate buffer setup
in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
which emits the appropriate state packets for Gen4-6, and
gen7_emit_depth_stencil_hiz(), which emits the appropriate state
packets for Gen7.

Tested using Piglit on Gen5-7 (no regressions).


Okay, nice work, you've successfully dealt with the rat's nest.  This is 
definitely better.


Reviewed-by: Kenneth Graunke kenn...@whitecape.org


Re: [Mesa-dev] [PATCH 7/8] glsl: Don't emit spurious errors for constant indexes of the wrong type

2013-04-02 Thread Kenneth Graunke

On 04/01/2013 11:25 AM, Ian Romanick wrote:

From: Ian Romanick ian.d.roman...@intel.com

Previously the shader

uniform float x[6];
void main() { gl_Position.x = x[1.0]; }

would have generated the errors

0:2(33): error: array index must be integer type
0:2(36): error: array index must be < 6

Now only

0:2(33): error: array index must be integer type

will be generated.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
---
  src/glsl/ast_array_index.cpp | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/glsl/ast_array_index.cpp b/src/glsl/ast_array_index.cpp
index 486ff55..c7ebcbd 100644
--- a/src/glsl/ast_array_index.cpp
+++ b/src/glsl/ast_array_index.cpp
@@ -58,7 +58,7 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
  * declared size.
  */
 ir_constant *const const_index = idx->constant_expression_value();
-   if (const_index != NULL) {
+   if (const_index != NULL && idx->type->is_integer()) {
const int idx = const_index->value.i[0];
const char *type_name = "error";
unsigned bound = 0;
@@ -118,7 +118,7 @@ _mesa_ast_array_index_to_hir(void *mem_ctx,
check_builtin_array_max_size(v->name, idx+1, loc, state);
 }
}
-   } else if (array->type->is_array()) {
+   } else if (const_index == NULL && array->type->is_array()) {
if (array->type->array_size() == 0) {
 _mesa_glsl_error(&loc, state, "unsized array index must be constant");
} else if (array->type->fields.array->is_interface()) {


Aww.  Patch 6 cleaned this up so nicely, and now it's getting a bit 
uglier again.


How about simply doing an early-return above:

   if (!idx->type->is_integer()) {
      _mesa_glsl_error(& idx_loc, state, "array index must be integer type");
      return result;
   } else if (!idx->type->is_scalar()) {
      _mesa_glsl_error(& idx_loc, state, "array index must be scalar");
      return result;
   }

Basically, if you hit those errors, you don't want to continue checking 
for more of them.
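
As a rough illustration of that early-return idea, here is a sketch in plain C. The struct and function names are hypothetical stand-ins and do not correspond to the real GLSL compiler types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the type of an index expression. */
struct index_type {
   bool is_integer;
   bool is_scalar;
};

/* Returns the single error to report for an array index, or NULL if the
 * index type is acceptable.  Returning at the first failed check means a
 * non-integer index never reaches any later check, such as a bounds test. */
static const char *
check_index_type(const struct index_type *type)
{
   assert(type != NULL);
   if (!type->is_integer)
      return "array index must be integer type";
   if (!type->is_scalar)
      return "array index must be scalar";
   return NULL;
}
```

With this shape, the x[1.0] shader from the commit message would yield only the integer-type error, because the bounds check is never reached.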



Re: [Mesa-dev] [PATCH 1/8] glsl: Make check_build_array_max_size externally visible

2013-04-02 Thread Kenneth Graunke

On 04/01/2013 11:25 AM, Ian Romanick wrote:

From: Ian Romanick ian.d.roman...@intel.com

A future commit will try to use this function in a different file.

Signed-off-by: Ian Romanick ian.d.roman...@intel.com


Title of patch should be check_builtin_array_max_size (typo).  I was 
wondering what a check_build_array function would do :)


I'm not really sure where you're going with patch 8, and I had a few 
comments on patch 7 (which you can take or leave).  But regardless, this 
series is:


Reviewed-by: Kenneth Graunke kenn...@whitecape.org


[Mesa-dev] [PATCH 1/3] mesa: Add new ctx->Stencil._WriteEnabled derived state flag.

2013-04-02 Thread Kenneth Graunke
i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes.  While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state.  Also, all the plumbing is already in place.

NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/main/mtypes.h  | 1 +
 src/mesa/main/stencil.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index ace6938..e731fe3 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -1015,6 +1015,7 @@ struct gl_stencil_attrib
GLboolean TestTwoSide;  /**< GL_EXT_stencil_two_side */
GLubyte ActiveFace; /**< GL_EXT_stencil_two_side (0 or 2) */
GLboolean _Enabled;  /**< Enabled and stencil buffer present */
+   GLboolean _WriteEnabled; /**< _Enabled and non-zero writemasks */
GLboolean _TestTwoSide;
GLubyte _BackFace;   /**< Current back stencil state (1 or 2) */
GLenum Function[3]; /**< Stencil function */
diff --git a/src/mesa/main/stencil.c b/src/mesa/main/stencil.c
index c161808..3308417 100644
--- a/src/mesa/main/stencil.c
+++ b/src/mesa/main/stencil.c
@@ -551,6 +551,11 @@ _mesa_update_stencil(struct gl_context *ctx)
ctx->Stencil.Ref[0] != ctx->Stencil.Ref[face] ||
ctx->Stencil.ValueMask[0] != ctx->Stencil.ValueMask[face] ||
ctx->Stencil.WriteMask[0] != ctx->Stencil.WriteMask[face]);
+
+   ctx->Stencil._WriteEnabled =
+  ctx->Stencil._Enabled &&
+  (ctx->Stencil.WriteMask[0] != 0 ||
+   (ctx->Stencil._TestTwoSide && ctx->Stencil.WriteMask[face] != 0));
 }
 
 
-- 
1.8.2



[Mesa-dev] [PATCH 2/3] i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+.

2013-04-02 Thread Kenneth Graunke
ctx->Stencil.WriteMask is a statically sized array of 3 elements.
Checking it against 0 actually is a NULL check, and can never fail,
which meant that we always said stencil writes were enabled.

Use the new core Mesa derived state flag to fix this.

NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen7_misc_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_misc_state.c 
b/src/mesa/drivers/dri/i965/gen7_misc_state.c
index 2009070..1d3677d 100644
--- a/src/mesa/drivers/dri/i965/gen7_misc_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_misc_state.c
@@ -50,7 +50,7 @@ gen7_emit_depth_stencil_hiz(struct brw_context *brw,
OUT_BATCH((depth_mt ? depth_mt->region->pitch - 1 : 0) |
  (depthbuffer_format << 18) |
  ((hiz_mt ? 1 : 0) << 22) |
- ((stencil_mt != NULL && ctx->Stencil.WriteMask != 0) << 27) |
+ ((stencil_mt != NULL && ctx->Stencil._WriteEnabled) << 27) |
  ((ctx->Depth.Mask != 0) << 28) |
  (depth_surface_type << 29));
 
-- 
1.8.2
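
For readers following along, a small sketch of why the old check could never fail. The struct is a hypothetical, trimmed-down stand-in for struct gl_stencil_attrib, and the function names are made up for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the stencil state. */
struct stencil_state {
   bool enabled;
   bool test_two_side;
   unsigned write_mask[3];
};

/* The old check in effect tested the address of the array, which is
 * never NULL, so the result does not depend on the mask values. */
static bool
write_enabled_buggy(const struct stencil_state *s)
{
   const unsigned *mask = s->write_mask; /* array decays to a pointer */
   return mask != NULL;
}

/* The derived flag looks at the mask values themselves. */
static bool
write_enabled_derived(const struct stencil_state *s, int back_face)
{
   return s->enabled &&
          (s->write_mask[0] != 0 ||
           (s->test_two_side && s->write_mask[back_face] != 0));
}
```

With all write masks set to zero, the first function still reports writes as enabled, while the second reports them as disabled.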



[Mesa-dev] [PATCH 3/3] i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE.

2013-04-02 Thread Kenneth Graunke
This is the same computation as the _WriteEnabled flag, so we may as
well use it.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen6_depthstencil.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_depthstencil.c 
b/src/mesa/drivers/dri/i965/gen6_depthstencil.c
index 4ea517f..940d91f 100644
--- a/src/mesa/drivers/dri/i965/gen6_depthstencil.c
+++ b/src/mesa/drivers/dri/i965/gen6_depthstencil.c
@@ -74,11 +74,7 @@ gen6_upload_depth_stencil_state(struct brw_context *brw)
 ds->ds1.bf_stencil_test_mask = ctx->Stencil.ValueMask[back];
   }
 
-  /* Not really sure about this:
-   */
-  if (ctx->Stencil.WriteMask[0] ||
- (ctx->Stencil._TestTwoSide && ctx->Stencil.WriteMask[back]))
-ds->ds0.stencil_write_enable = 1;
+  ds->ds0.stencil_write_enable = ctx->Stencil._WriteEnabled;
}
 
/* _NEW_DEPTH */
-- 
1.8.2



Re: [Mesa-dev] [PATCH] glsl: Fix array indexing when constant folding built-in functions.

2013-04-02 Thread Paul Berry
On 1 April 2013 11:43, Kenneth Graunke kenn...@whitecape.org wrote:

 On 04/01/2013 11:30 AM, Ian Romanick wrote:

 On 03/29/2013 02:13 PM, Paul Berry wrote:

 Mesa constant-folds built-in functions by using a miniature GLSL
 interpreter (see
 ir_function_signature::constant_expression_evaluate_expression_list()).
 This interpreter had a bug in its handling of array indexing, which
 caused expressions like m[i][j] (where m is a matrix) to be handled
 incorrectly.  Specifically, it incorrectly treated j as indexing into
 the whole matrix (rather than indexing just into the vector m[i]); as
 a result the offset computed for m[i] was lost and m[i][j] was treated
 as m[j][0].

 Fixes piglit tests inverse-mat[234].{vert,frag}.

 NOTE: This is a candidate for the 9.1 branch.


 Good catch.  The test case fails only in 9.1 and later because it
 requires OpenGL 3.1, but I think the bug exists in earlier versions.


I'm glad you mentioned this because it prompted me to investigate further.
It turns out that the test case *does* fail in 9.0, because 9.0 supports
OpenGL 3.1.  The bug doesn't exist in earlier versions, since the
miniature GLSL interpreter technique was implemented in the 9.0 timeframe.

I'll update the note to mark this patch as a candidate for 9.1 and 9.0.



 Reviewed-by: Ian Romanick ian.d.roman...@intel.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436

 I already pushed my work-around to master (but not to 9.1).  You can
 revert it when you push this change if you like.


 I would like to keep the change to use dot(), as that seems like an actual
 improvement.  For the other changes, I guess I don't have a strong
 preference.

 --Ken


Sounds reasonable to me.  Will do.
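
For reference, the indexing bug described in the commit message can be sketched as follows. This assumes a flat, column-major float array as the matrix storage. The function names are hypothetical and this is not the interpreter's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flat, column-major storage: each column holds `rows`
 * components, so m[i] is column i. */

/* Correct: the offset of column i is kept, and j indexes within it. */
static float
mat_elem(const float *m, size_t rows, size_t i, size_t j)
{
   assert(j < rows);
   return m[i * rows + j];
}

/* The behavior described above: the offset for m[i] is lost and j is
 * applied to the whole matrix, which yields m[j][0]. */
static float
mat_elem_buggy(const float *m, size_t rows, size_t i, size_t j)
{
   (void) i; /* the column offset is dropped */
   return m[j * rows];
}
```

For a 2x2 matrix stored as {1, 2, 3, 4}, m[1][0] should be 3, but the buggy variant returns 1.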


[Mesa-dev] [PATCH 1/5] draw/gs: cleanup some debugging code

2013-04-02 Thread Zack Rusin

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_gs.c |4 
 1 file changed, 4 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_gs.c 
b/src/gallium/auxiliary/draw/draw_gs.c
index b98b133..70db837 100644
--- a/src/gallium/auxiliary/draw/draw_gs.c
+++ b/src/gallium/auxiliary/draw/draw_gs.c
@@ -160,8 +160,6 @@ static void tgsi_fetch_gs_input(struct draw_geometry_shader 
*shader,
 #if DEBUG_INPUTS
 debug_printf("\tSlot = %d, vs_slot = %d, idx = %d:\n",
  slot, vs_slot, idx);
-#endif
-#if 1
 assert(!util_is_inf_or_nan(input[vs_slot][0]));
 assert(!util_is_inf_or_nan(input[vs_slot][1]));
 assert(!util_is_inf_or_nan(input[vs_slot][2]));
@@ -249,8 +247,6 @@ llvm_fetch_gs_input(struct draw_geometry_shader *shader,
 #if DEBUG_INPUTS
 debug_printf("\tSlot = %d, vs_slot = %d, i = %d:\n",
  slot, vs_slot, i);
-#endif
-#if 0
 assert(!util_is_inf_or_nan(input[vs_slot][0]));
 assert(!util_is_inf_or_nan(input[vs_slot][1]));
 assert(!util_is_inf_or_nan(input[vs_slot][2]));
-- 
1.7.10.4



[Mesa-dev] [PATCH 2/5] draw/llvm: use an enum instead of magic numbers

2013-04-02 Thread Zack Rusin
I think this was there before and got accidentally
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_llvm.c |8 
 src/gallium/auxiliary/draw/draw_llvm.h |   17 +++--
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c 
b/src/gallium/auxiliary/draw/draw_llvm.c
index d0199bb..5100ce0 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -203,7 +203,7 @@ create_jit_context_type(struct gallivm_state *gallivm,
 {
LLVMTargetDataRef target = gallivm->target;
LLVMTypeRef float_type = LLVMFloatTypeInContext(gallivm->context);
-   LLVMTypeRef elem_types[5];
+   LLVMTypeRef elem_types[DRAW_JIT_CTX_NUM_FIELDS];
LLVMTypeRef context_type;
 
elem_types[0] = LLVMArrayType(LLVMPointerType(float_type, 0), /* 
vs_constants */
@@ -224,11 +224,11 @@ create_jit_context_type(struct gallivm_state *gallivm,
 #endif
 
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, vs_constants,
-  target, context_type, 0);
+  target, context_type, DRAW_JIT_CTX_CONSTANTS);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, planes,
-  target, context_type, 1);
+  target, context_type, DRAW_JIT_CTX_PLANES);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, viewport,
-  target, context_type, 2);
+  target, context_type, DRAW_JIT_CTX_VIEWPORT);
LP_CHECK_MEMBER_OFFSET(struct draw_jit_context, textures,
   target, context_type,
   DRAW_JIT_CTX_TEXTURES);
diff --git a/src/gallium/auxiliary/draw/draw_llvm.h 
b/src/gallium/auxiliary/draw/draw_llvm.h
index 8df02a2..5909fc1 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.h
+++ b/src/gallium/auxiliary/draw/draw_llvm.h
@@ -130,18 +130,23 @@ struct draw_jit_context
struct draw_jit_sampler samplers[PIPE_MAX_SAMPLERS];
 };
 
+enum {
+   DRAW_JIT_CTX_CONSTANTS   = 0,
+   DRAW_JIT_CTX_PLANES  = 1,
+   DRAW_JIT_CTX_VIEWPORT= 2,
+   DRAW_JIT_CTX_TEXTURES= 3,
+   DRAW_JIT_CTX_SAMPLERS= 4,
+   DRAW_JIT_CTX_NUM_FIELDS
+};
 
 #define draw_jit_context_vs_constants(_gallivm, _ptr) \
-   lp_build_struct_get_ptr(_gallivm, _ptr, 0, "vs_constants")
+   lp_build_struct_get_ptr(_gallivm, _ptr, DRAW_JIT_CTX_CONSTANTS, "vs_constants")
 
 #define draw_jit_context_planes(_gallivm, _ptr) \
-   lp_build_struct_get(_gallivm, _ptr, 1, "planes")
+   lp_build_struct_get(_gallivm, _ptr, DRAW_JIT_CTX_PLANES, "planes")
 
 #define draw_jit_context_viewport(_gallivm, _ptr) \
-   lp_build_struct_get(_gallivm, _ptr, 2, "viewport")
-
-#define DRAW_JIT_CTX_TEXTURES 3
-#define DRAW_JIT_CTX_SAMPLERS 4
+   lp_build_struct_get(_gallivm, _ptr, DRAW_JIT_CTX_VIEWPORT, "viewport")
 
 #define draw_jit_context_textures(_gallivm, _ptr) \
lp_build_struct_get_ptr(_gallivm, _ptr, DRAW_JIT_CTX_TEXTURES, "textures")
-- 
1.7.10.4
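
As a side note, the enum-with-sentinel pattern used in this patch can be sketched in isolation. The names below are hypothetical and shortened. They are not the DRAW_JIT_CTX_* identifiers from draw_llvm.h:

```c
#include <assert.h>

/* Hypothetical field indices.  The last enumerator has no explicit
 * value, so it equals the number of fields declared before it. */
enum {
   CTX_CONSTANTS = 0,
   CTX_PLANES    = 1,
   CTX_VIEWPORT  = 2,
   CTX_NUM_FIELDS
};

/* An array sized by the sentinel grows automatically when a field is
 * added before CTX_NUM_FIELDS. */
static const char *const ctx_field_names[CTX_NUM_FIELDS] = {
   [CTX_CONSTANTS] = "vs_constants",
   [CTX_PLANES]    = "planes",
   [CTX_VIEWPORT]  = "viewport",
};
```

This is presumably why elem_types[5] in the patch can become elem_types[DRAW_JIT_CTX_NUM_FIELDS] without changing its size.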



[Mesa-dev] [PATCH 3/5] draw: remove unused function

2013-04-02 Thread Zack Rusin
we use draw_set_mapped_so_targets nowadays

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |7 ---
 src/gallium/auxiliary/draw/draw_context.h |5 -
 2 files changed, 12 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index ceb74df..bb56f1b 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -735,13 +735,6 @@ draw_set_mapped_so_targets(struct draw_context *draw,
 }
 
 void
-draw_set_mapped_so_buffers(struct draw_context *draw,
-   void *buffers[PIPE_MAX_SO_BUFFERS],
-   unsigned num_buffers)
-{
-}
-
-void
 draw_set_so_state(struct draw_context *draw,
   struct pipe_stream_output_info *state)
 {
diff --git a/src/gallium/auxiliary/draw/draw_context.h 
b/src/gallium/auxiliary/draw/draw_context.h
index b333457..426fd44 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -222,11 +222,6 @@ draw_set_mapped_constant_buffer(struct draw_context *draw,
 unsigned size);
 
 void
-draw_set_mapped_so_buffers(struct draw_context *draw,
-   void *buffers[PIPE_MAX_SO_BUFFERS],
-   unsigned num_buffers);
-
-void
 draw_set_mapped_so_targets(struct draw_context *draw,
int num_targets,
struct draw_so_target 
*targets[PIPE_MAX_SO_BUFFERS]);
-- 
1.7.10.4



[Mesa-dev] [PATCH 4/5] llvmpipe: reset so buffers when not appending

2013-04-02 Thread Zack Rusin
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/drivers/llvmpipe/lp_state_so.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/gallium/drivers/llvmpipe/lp_state_so.c 
b/src/gallium/drivers/llvmpipe/lp_state_so.c
index 58bab39..fa58f79 100644
--- a/src/gallium/drivers/llvmpipe/lp_state_so.c
+++ b/src/gallium/drivers/llvmpipe/lp_state_so.c
@@ -70,6 +70,12 @@ llvmpipe_set_so_targets(struct pipe_context *pipe,
int i;
for (i = 0; i < num_targets; i++) {
   pipe_so_target_reference((struct pipe_stream_output_target 
**)&llvmpipe->so_targets[i], targets[i]);
+  /* if we're not appending then lets reset the internal
+ data of our so target */
+  if (!(append_bitmask & (1 << i)) && llvmpipe->so_targets[i]) {
+ llvmpipe->so_targets[i]->internal_offset = 0;
+ llvmpipe->so_targets[i]->emitted_vertices = 0;
+  }
}
 
for (; i < llvmpipe->num_so_targets; i++) {
-- 
1.7.10.4
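
The per-target reset keyed off append_bitmask can be sketched on its own as follows. The struct and function are hypothetical simplifications and not the llvmpipe types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stream output target. */
struct so_target {
   unsigned internal_offset;
   unsigned emitted_vertices;
};

/* Reset every bound target whose bit is clear in append_bitmask.
 * Targets whose bit is set keep appending where they left off. */
static void
reset_non_appending(struct so_target **targets, unsigned num_targets,
                    unsigned append_bitmask)
{
   unsigned i;

   assert(targets != NULL);
   for (i = 0; i < num_targets; i++) {
      if (!(append_bitmask & (1u << i)) && targets[i] != NULL) {
         targets[i]->internal_offset = 0;
         targets[i]->emitted_vertices = 0;
      }
   }
}
```

With append_bitmask set to 0x2, target 0 is reset and target 1 keeps its offset and vertex count.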



[Mesa-dev] [PATCH 5/5] draw/llvmpipe: allow independent so attachments to the vs

2013-04-02 Thread Zack Rusin
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Let's add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.

Signed-off-by: Zack Rusin za...@vmware.com
---
 src/gallium/auxiliary/draw/draw_context.c |9 -
 src/gallium/auxiliary/draw/draw_context.h |7 +++
 src/gallium/auxiliary/draw/draw_private.h |1 -
 src/gallium/auxiliary/draw/draw_vs.c  |   13 +
 src/gallium/drivers/llvmpipe/lp_draw_arrays.c |   15 +++
 src/gallium/drivers/llvmpipe/lp_state_gs.c|   21 -
 6 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_context.c 
b/src/gallium/auxiliary/draw/draw_context.c
index bb56f1b..2fb9bac 100644
--- a/src/gallium/auxiliary/draw/draw_context.c
+++ b/src/gallium/auxiliary/draw/draw_context.c
@@ -735,15 +735,6 @@ draw_set_mapped_so_targets(struct draw_context *draw,
 }
 
 void
-draw_set_so_state(struct draw_context *draw,
-  struct pipe_stream_output_info *state)
-{
-   memcpy(&draw->so.state,
-  state,
-  sizeof(struct pipe_stream_output_info));
-}
-
-void
 draw_set_sampler_views(struct draw_context *draw,
unsigned shader_stage,
struct pipe_sampler_view **views,
diff --git a/src/gallium/auxiliary/draw/draw_context.h 
b/src/gallium/auxiliary/draw/draw_context.h
index 426fd44..1d25b7f 100644
--- a/src/gallium/auxiliary/draw/draw_context.h
+++ b/src/gallium/auxiliary/draw/draw_context.h
@@ -171,6 +171,9 @@ void draw_bind_vertex_shader(struct draw_context *draw,
  struct draw_vertex_shader *dvs);
 void draw_delete_vertex_shader(struct draw_context *draw,
struct draw_vertex_shader *dvs);
+void draw_vs_attach_so(struct draw_vertex_shader *dvs,
+   const struct pipe_stream_output_info *info);
+void draw_vs_reset_so(struct draw_vertex_shader *dvs);
 
 
 /*
@@ -226,10 +229,6 @@ draw_set_mapped_so_targets(struct draw_context *draw,
int num_targets,
struct draw_so_target 
*targets[PIPE_MAX_SO_BUFFERS]);
 
-void
-draw_set_so_state(struct draw_context *draw,
-  struct pipe_stream_output_info *state);
-
 
 /***
  * draw_pt.c 
diff --git a/src/gallium/auxiliary/draw/draw_private.h 
b/src/gallium/auxiliary/draw/draw_private.h
index 5063c3c..757ed26 100644
--- a/src/gallium/auxiliary/draw/draw_private.h
+++ b/src/gallium/auxiliary/draw/draw_private.h
@@ -279,7 +279,6 @@ struct draw_context
 
/** Stream output (vertex feedback) state */
struct {
-  struct pipe_stream_output_info state;
   struct draw_so_target *targets[PIPE_MAX_SO_BUFFERS];
   uint num_targets;
} so;
diff --git a/src/gallium/auxiliary/draw/draw_vs.c 
b/src/gallium/auxiliary/draw/draw_vs.c
index 266cca7..afec376 100644
--- a/src/gallium/auxiliary/draw/draw_vs.c
+++ b/src/gallium/auxiliary/draw/draw_vs.c
@@ -245,3 +245,16 @@ draw_vs_get_emit( struct draw_context *draw,

return draw->vs.emit;
 }
+
+void
+draw_vs_attach_so(struct draw_vertex_shader *dvs,
+  const struct pipe_stream_output_info *info)
+{
+   dvs->state.stream_output = *info;
+}
+
+void
+draw_vs_reset_so(struct draw_vertex_shader *dvs)
+{
+   memset(&dvs->state.stream_output, 0, sizeof(dvs->state.stream_output));
+}
diff --git a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c 
b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
index ae00c49..efeca25 100644
--- a/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
+++ b/src/gallium/drivers/llvmpipe/lp_draw_arrays.c
@@ -101,6 +101,13 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct 
pipe_draw_info *info)
llvmpipe_prepare_geometry_sampling(lp,
   
lp->num_sampler_views[PIPE_SHADER_GEOMETRY],
   lp->sampler_views[PIPE_SHADER_GEOMETRY]);
+   if (lp->gs && !lp->gs->shader.tokens) {
+  /* we have an empty geometry shader with stream output, so
+ attach the stream output info to the current vertex shader */
+  if (lp->vs) {
+ draw_vs_attach_so(lp->vs->draw_data, &lp->gs->shader.stream_output);
+  }
+   }
 
/* draw! */
draw_vbo(draw, info);
@@ -116,6 +123,14 @@ llvmpipe_draw_vbo(struct pipe_context *pipe, const struct 
pipe_draw_info *info)
}
draw_set_mapped_so_targets(draw, 0, NULL);
 
+   if (lp->gs && !lp->gs->shader.tokens) {
+  /* we have attached stream output to the vs for rendering,
+ now lets reset it */
+  if (lp->vs) {
+ 

[Mesa-dev] [PATCH 1/2] i965: Turn brw->urb.vs_size and gs_size into local variables.

2013-04-02 Thread Kenneth Graunke
These variables are only used within a single function, so we may as
well make them local variables.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/brw_context.h |  9 -
 src/mesa/drivers/dri/i965/gen6_urb.c| 18 +-
 src/mesa/drivers/dri/i965/gen7_urb.c|  7 +++
 3 files changed, 12 insertions(+), 22 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
b/src/mesa/drivers/dri/i965/brw_context.h
index ea5b62a..d3a5042 100644
--- a/src/mesa/drivers/dri/i965/brw_context.h
+++ b/src/mesa/drivers/dri/i965/brw_context.h
@@ -859,15 +859,6 @@ struct brw_context
   GLuint nr_sf_entries;
   GLuint nr_cs_entries;
 
-  /* gen6:
-   * The length of each URB entry owned by the VS (or GS), as
-   * a number of 1024-bit (128-byte) rows.  Should be >= 1.
-   *
-   * gen7: Same meaning, but in 512-bit (64-byte) rows.
-   */
-  GLuint vs_size;
-  GLuint gs_size;
-
   GLuint vs_start;
   GLuint gs_start;
   GLuint clip_start;
diff --git a/src/mesa/drivers/dri/i965/gen6_urb.c 
b/src/mesa/drivers/dri/i965/gen6_urb.c
index 2d69cbe..aa985de 100644
--- a/src/mesa/drivers/dri/i965/gen6_urb.c
+++ b/src/mesa/drivers/dri/i965/gen6_urb.c
@@ -54,7 +54,7 @@ gen6_upload_urb( struct brw_context *brw )
int total_urb_size = brw->urb.size * 1024; /* in bytes */
 
/* CACHE_NEW_VS_PROG */
-   brw->urb.vs_size = MAX2(brw->vs.prog_data->urb_entry_size, 1);
+   unsigned vs_size = MAX2(brw->vs.prog_data->urb_entry_size, 1);
 
/* We use the same VUE layout for VS outputs and GS outputs (as it's what
 * the SF and Clipper expect), so we can simply make the GS URB entry size
@@ -62,14 +62,14 @@ gen6_upload_urb( struct brw_context *brw )
 * where we have few vertex attributes and a lot of varyings, since the VS
 * size is determined by the larger of the two.  For now, it's safe.
 */
-   brw->urb.gs_size = brw->urb.vs_size;
+   unsigned gs_size = vs_size;
 
/* Calculate how many entries fit in each stage's section of the URB */
if (brw->gs.prog_active) {
-  nr_vs_entries = (total_urb_size/2) / (brw->urb.vs_size * 128);
-  nr_gs_entries = (total_urb_size/2) / (brw->urb.gs_size * 128);
+  nr_vs_entries = (total_urb_size/2) / (vs_size * 128);
+  nr_gs_entries = (total_urb_size/2) / (gs_size * 128);
} else {
-  nr_vs_entries = total_urb_size / (brw->urb.vs_size * 128);
+  nr_vs_entries = total_urb_size / (vs_size * 128);
   nr_gs_entries = 0;
}
 
@@ -87,14 +87,14 @@ gen6_upload_urb( struct brw_context *brw )
assert(brw->urb.nr_vs_entries >= 24);
assert(brw->urb.nr_vs_entries % 4 == 0);
assert(brw->urb.nr_gs_entries % 4 == 0);
-   assert(brw->urb.vs_size < 5);
-   assert(brw->urb.gs_size < 5);
+   assert(vs_size < 5);
+   assert(gs_size < 5);
 
BEGIN_BATCH(3);
OUT_BATCH(_3DSTATE_URB << 16 | (3 - 2));
-   OUT_BATCH(((brw->urb.vs_size - 1) << GEN6_URB_VS_SIZE_SHIFT) |
+   OUT_BATCH(((vs_size - 1) << GEN6_URB_VS_SIZE_SHIFT) |
 ((brw->urb.nr_vs_entries) << GEN6_URB_VS_ENTRIES_SHIFT));
-   OUT_BATCH(((brw->urb.gs_size - 1) << GEN6_URB_GS_SIZE_SHIFT) |
+   OUT_BATCH(((gs_size - 1) << GEN6_URB_GS_SIZE_SHIFT) |
 ((brw->urb.nr_gs_entries) << GEN6_URB_GS_ENTRIES_SHIFT));
ADVANCE_BATCH();
 
diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c 
b/src/mesa/drivers/dri/i965/gen7_urb.c
index 481497b..dafe1ad 100644
--- a/src/mesa/drivers/dri/i965/gen7_urb.c
+++ b/src/mesa/drivers/dri/i965/gen7_urb.c
@@ -82,9 +82,9 @@ gen7_upload_urb(struct brw_context *brw)
int handle_region_size = (brw->urb.size - 16) * 1024; /* bytes */
 
/* CACHE_NEW_VS_PROG */
-   brw->urb.vs_size = MAX2(brw->vs.prog_data->urb_entry_size, 1);
+   unsigned vs_size = MAX2(brw->vs.prog_data->urb_entry_size, 1);
 
-   int nr_vs_entries = handle_region_size / (brw->urb.vs_size * 64);
+   int nr_vs_entries = handle_region_size / (vs_size * 64);
if (nr_vs_entries > brw->urb.max_vs_entries)
   nr_vs_entries = brw->urb.max_vs_entries;
 
@@ -100,8 +100,7 @@ gen7_upload_urb(struct brw_context *brw)
assert(!brw->gs.prog_active);
 
gen7_emit_vs_workaround_flush(intel);
-   gen7_emit_urb_state(brw, brw->urb.nr_vs_entries, brw->urb.vs_size,
-   brw->urb.vs_start);
+   gen7_emit_urb_state(brw, brw->urb.nr_vs_entries, vs_size, 
brw->urb.vs_start);
 }
 
 void
-- 
1.8.1.1



[Mesa-dev] [PATCH 2/2] i965: Use a variable for the push constant size in kB.

2013-04-02 Thread Kenneth Graunke
This clarifies that the offset of 2 is actually 16 kB / 8kB units.
It also keys both computations off of a single variable, which should
make it easier to change in the future.

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/gen7_urb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_urb.c 
b/src/mesa/drivers/dri/i965/gen7_urb.c
index dafe1ad..5ac3885 100644
--- a/src/mesa/drivers/dri/i965/gen7_urb.c
+++ b/src/mesa/drivers/dri/i965/gen7_urb.c
@@ -78,8 +78,9 @@ static void
 gen7_upload_urb(struct brw_context *brw)
 {
    struct intel_context *intel = &brw->intel;
+   const int push_size_kB = 16;
    /* Total space for entries is URB size - 16kB for push constants */
-   int handle_region_size = (brw->urb.size - 16) * 1024; /* bytes */
+   int handle_region_size = (brw->urb.size - push_size_kB) * 1024; /* bytes */
 
    /* CACHE_NEW_VS_PROG */
    unsigned vs_size = MAX2(brw->vs.prog_data->urb_entry_size, 1);
@@ -92,7 +93,7 @@ gen7_upload_urb(struct brw_context *brw)
    brw->urb.nr_vs_entries = ROUND_DOWN_TO(nr_vs_entries, 8);
 
    /* URB Starting Addresses are specified in multiples of 8kB. */
-   brw->urb.vs_start = 2; /* skip over push constants */
+   brw->urb.vs_start = push_size_kB / 8; /* skip over push constants */
 
    assert(brw->urb.nr_vs_entries % 8 == 0);
    assert(brw->urb.nr_gs_entries % 8 == 0);
-- 
1.8.1.1
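
For readers following the arithmetic, here is a minimal standalone sketch of the computation the two gen7 patches touch. It is not the driver code: `struct urb_layout`, `partition_urb()` and the local `ROUND_DOWN_TO` macro are hypothetical stand-ins, and the URB size and entry limits passed in are made-up inputs rather than real hardware values.

```c
#include <assert.h>

/* Hypothetical stand-in for the brw->urb fields used by gen7_upload_urb(). */
struct urb_layout {
   int nr_vs_entries;   /* number of VS URB entries, a multiple of 8 */
   int vs_start;        /* VS start offset, in 8 kB units */
};

/* Local helper for this sketch; rounds value down to a multiple of align. */
#define ROUND_DOWN_TO(value, align) ((value) - ((value) % (align)))

static struct urb_layout
partition_urb(int urb_size_kB, unsigned vs_entry_size, int max_vs_entries)
{
   const int push_size_kB = 16;   /* space reserved for push constants */
   struct urb_layout layout;
   int handle_region_size;
   int nr_vs_entries;

   assert(urb_size_kB > push_size_kB);
   assert(vs_entry_size >= 1);

   /* Everything after the push constants is available for entries. */
   handle_region_size = (urb_size_kB - push_size_kB) * 1024; /* bytes */

   /* Entry sizes are counted in 64-byte units. */
   nr_vs_entries = handle_region_size / (int)(vs_entry_size * 64);
   if (nr_vs_entries > max_vs_entries)
      nr_vs_entries = max_vs_entries;

   layout.nr_vs_entries = ROUND_DOWN_TO(nr_vs_entries, 8);

   /* Start addresses are in multiples of 8 kB: 16 kB / 8 kB = 2. */
   layout.vs_start = push_size_kB / 8;
   return layout;
}
```

Both the region size and the start offset come from the single `push_size_kB` constant, which is the point of patch 2/2.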



[Mesa-dev] [PATCH] register_allocate: Fix the type of best_benefit.

2013-04-02 Thread Matt Turner
---
 src/mesa/program/register_allocate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/program/register_allocate.c 
b/src/mesa/program/register_allocate.c
index a9064c3..7d11b73 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -561,7 +561,7 @@ int
 ra_get_best_spill_node(struct ra_graph *g)
 {
unsigned int best_node = -1;
-   unsigned int best_benefit = 0.0;
+   float best_benefit = 0.0;
unsigned int n;
 
    for (n = 0; n < g->count; n++) {
-- 
1.7.8.6
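
A small sketch of why the type matters, assuming (as the surrounding function suggests, though the hunk does not show it) that `best_benefit` is compared against and assigned from a floating-point benefit/cost ratio. The two helpers below are hypothetical and take the ratios directly; they are not the allocator's code.

```c
#include <assert.h>
#include <stddef.h>

/* Pre-patch behaviour: the running maximum is stored in an unsigned int,
 * so every assignment truncates the ratio toward zero. */
static int
pick_with_unsigned(const float *ratio, unsigned count)
{
   unsigned int best_benefit = 0.0;
   int best_node = -1;
   unsigned n;

   assert(ratio != NULL);
   for (n = 0; n < count; n++) {
      if (ratio[n] > best_benefit) {
         best_benefit = ratio[n];   /* 2.9 is stored as 2 */
         best_node = (int)n;
      }
   }
   return best_node;
}

/* Post-patch behaviour: the running maximum keeps its fractional part. */
static int
pick_with_float(const float *ratio, unsigned count)
{
   float best_benefit = 0.0;
   int best_node = -1;
   unsigned n;

   assert(ratio != NULL);
   for (n = 0; n < count; n++) {
      if (ratio[n] > best_benefit) {
         best_benefit = ratio[n];
         best_node = (int)n;
      }
   }
   return best_node;
}
```

With ratios `{2.9, 2.1}` the unsigned version stores 2 after the first node, then accepts 2.1 as an improvement and returns node 1; the float version keeps node 0.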



[Mesa-dev] [PATCH] r600g: Fix UMAD on Cayman

2013-04-02 Thread Martin Andersson
The multiplication part of tgsi_umad did not work on Cayman, because it did
not populate the correct vector slots.
---
 src/gallium/drivers/r600/r600_shader.c | 45 --
 1 file changed, 32 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_shader.c 
b/src/gallium/drivers/r600/r600_shader.c
index 82885d1..6c4cc8f 100644
--- a/src/gallium/drivers/r600/r600_shader.c
+++ b/src/gallium/drivers/r600/r600_shader.c
@@ -5840,7 +5840,7 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
 {
 	struct tgsi_full_instruction *inst = &ctx->parse.FullToken.FullInstruction;
 	struct r600_bytecode_alu alu;
-	int i, j, r;
+	int i, j, k, r;
 	int lasti = tgsi_last_instruction(inst->Dst[0].Register.WriteMask);
 
 	/* src0 * src1 */
@@ -5848,21 +5848,40 @@ static int tgsi_umad(struct r600_shader_ctx *ctx)
 		if (!(inst->Dst[0].Register.WriteMask & (1 << i)))
 			continue;
 
-		memset(&alu, 0, sizeof(struct r600_bytecode_alu));
+		if (ctx->bc->chip_class == CAYMAN) {
+			for (j = 0 ; j < 4; j++) {
+				memset(&alu, 0, sizeof(struct r600_bytecode_alu));
 
-		alu.dst.chan = i;
-		alu.dst.sel = ctx->temp_reg;
-		alu.dst.write = 1;
+				alu.op = ALU_OP2_MULLO_UINT;
+				for (k = 0; k < inst->Instruction.NumSrcRegs; k++) {
+					r600_bytecode_src(&alu.src[k], &ctx->src[k], i);
+				}
+				tgsi_dst(ctx, &inst->Dst[0], j, &alu.dst);
+				alu.dst.sel = ctx->temp_reg;
+				alu.dst.write = (j == i);
+				if (j == 3)
+					alu.last = 1;
+				r = r600_bytecode_add_alu(ctx->bc, &alu);
+				if (r)
+					return r;
+			}
+		} else {
+			memset(&alu, 0, sizeof(struct r600_bytecode_alu));
 
-		alu.op = ALU_OP2_MULLO_UINT;
-		for (j = 0; j < 2; j++) {
-			r600_bytecode_src(&alu.src[j], &ctx->src[j], i);
-		}
+			alu.dst.chan = i;
+			alu.dst.sel = ctx->temp_reg;
+			alu.dst.write = 1;
 
-		alu.last = 1;
-		r = r600_bytecode_add_alu(ctx->bc, &alu);
-		if (r)
-			return r;
+			alu.op = ALU_OP2_MULLO_UINT;
+			for (j = 0; j < 2; j++) {
+				r600_bytecode_src(&alu.src[j], &ctx->src[j], i);
+			}
+
+			alu.last = 1;
+			r = r600_bytecode_add_alu(ctx->bc, &alu);
+			if (r)
+				return r;
+		}
 	}
 
 
-- 
1.8.2



Re: [Mesa-dev] [PATCH] register_allocate: Fix the type of best_benefit.

2013-04-02 Thread Tom Stellard
On Tue, Apr 02, 2013 at 01:38:07PM -0700, Matt Turner wrote:
> ---

Nice catch, will this change have any effect on the compiled code?

-Tom

>  src/mesa/program/register_allocate.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/src/mesa/program/register_allocate.c b/src/mesa/program/register_allocate.c
> index a9064c3..7d11b73 100644
> --- a/src/mesa/program/register_allocate.c
> +++ b/src/mesa/program/register_allocate.c
> @@ -561,7 +561,7 @@ int
>  ra_get_best_spill_node(struct ra_graph *g)
>  {
>     unsigned int best_node = -1;
> -   unsigned int best_benefit = 0.0;
> +   float best_benefit = 0.0;
>     unsigned int n;
>
>     for (n = 0; n < g->count; n++) {
> --
> 1.7.8.6


[Mesa-dev] [PATCH] Avoid spurious GCC warnings in STATIC_ASSERT() macro.

2013-04-02 Thread Paul Berry
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope.  This produces spurious warnings with
the STATIC_ASSERT() macro (which uses a typedef to provoke a compile
error in the event of an assertion failure).

This patch avoids the warning using the GCC __attribute__((unused))
syntax.
---
 src/mesa/main/compiler.h | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h
index 8b23665..ddeb61d 100644
--- a/src/mesa/main/compiler.h
+++ b/src/mesa/main/compiler.h
@@ -249,6 +249,12 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
 #endif
 
 
+#if (__GNUC__ >= 3)
+#define GCC_ATTRIBUTE_UNUSED __attribute__((unused))
+#else
+#define GCC_ATTRIBUTE_UNUSED
+#endif
+
 /**
  * Static (compile-time) assertion.
  * Basically, use COND to dimension an array.  If COND is false/zero the
@@ -256,7 +262,7 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
  */
 #define STATIC_ASSERT(COND) \
    do { \
-      typedef int static_assertion_failed[(!!(COND))*2-1]; \
+      typedef int static_assertion_failed[(!!(COND))*2-1] GCC_ATTRIBUTE_UNUSED; \
    } while (0)
 
 
-- 
1.8.2
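
A self-contained sketch of the patched macro in use, for anyone who wants to try it outside the tree. The macro text follows the patch (with an added `defined(__GNUC__)` guard); the function `check_layout()` and the conditions it asserts are hypothetical examples, not anything from Mesa.

```c
/* Same idea as the patch: mark the typedef unused on GCC so that
 * -Wunused-local-typedefs stays quiet, and expand to nothing elsewhere. */
#if defined(__GNUC__) && (__GNUC__ >= 3)
#define GCC_ATTRIBUTE_UNUSED __attribute__((unused))
#else
#define GCC_ATTRIBUTE_UNUSED
#endif

/* If COND is false the array size becomes -1 and compilation fails. */
#define STATIC_ASSERT(COND) \
   do { \
      typedef int static_assertion_failed[(!!(COND))*2-1] GCC_ATTRIBUTE_UNUSED; \
   } while (0)

/* Hypothetical caller: both conditions hold on any conforming C compiler. */
static int
check_layout(void)
{
   STATIC_ASSERT(sizeof(char) == 1);
   STATIC_ASSERT(sizeof(int) >= 2);
   /* STATIC_ASSERT(sizeof(char) == 2); would not compile. */
   return 1;
}
```

The macro generates no code at run time; `check_layout()` only exists so that the assertions sit inside a function body, which the `do { } while (0)` form requires.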



Re: [Mesa-dev] [PATCH] Avoid spurious GCC warnings in STATIC_ASSERT() macro.

2013-04-02 Thread Brian Paul

On 04/02/2013 04:16 PM, Paul Berry wrote:

> GCC 4.8 now warns about typedefs that are local to a scope and not
> used anywhere within that scope.  This produces spurious warnings with
> the STATIC_ASSERT() macro (which uses a typedef to provoke a compile
> error in the event of an assertion failure).
>
> This patch avoids the warning using the GCC __attribute__((unused))
> syntax.
> ---
>  src/mesa/main/compiler.h | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h
> index 8b23665..ddeb61d 100644
> --- a/src/mesa/main/compiler.h
> +++ b/src/mesa/main/compiler.h
> @@ -249,6 +249,12 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
>  #endif
>
>
> +#if (__GNUC__ >= 3)
> +#define GCC_ATTRIBUTE_UNUSED __attribute__((unused))
> +#else
> +#define GCC_ATTRIBUTE_UNUSED
> +#endif
> +
>  /**
>   * Static (compile-time) assertion.
>   * Basically, use COND to dimension an array.  If COND is false/zero the
> @@ -256,7 +262,7 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
>   */
>  #define STATIC_ASSERT(COND) \
>     do { \
> -      typedef int static_assertion_failed[(!!(COND))*2-1]; \
> +      typedef int static_assertion_failed[(!!(COND))*2-1] GCC_ATTRIBUTE_UNUSED; \
>     } while (0)




Without using gcc-isms, I think this would work too:

#define STATIC_ASSERT(COND) \
   do { \
  int static_assertion_failed[(!!(COND))*2-1]; \
  (void) static_assertion_failed; \
   } while (0)

I don't recall why I used the typedef.
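
The variable-based form above can be exercised the same way. This is only a sketch: `check_sizes()` and its conditions are hypothetical, chosen because they hold on any conforming compiler.

```c
/* Declares a real (unused) local array instead of a typedef; the cast to
 * void marks it as used without relying on compiler-specific attributes. */
#define STATIC_ASSERT(COND) \
   do { \
      int static_assertion_failed[(!!(COND))*2-1]; \
      (void) static_assertion_failed; \
   } while (0)

/* Hypothetical caller. */
static int
check_sizes(void)
{
   STATIC_ASSERT(sizeof(short) <= sizeof(long));
   STATIC_ASSERT(sizeof(unsigned char) == 1);
   return 1;
}
```

A false condition still yields an array of size -1, so the failure mode is a compile error just as with the typedef.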

Also, the same macro should probably be updated in
src/gallium/include/pipe/p_compiler.h


-Brian



Re: [Mesa-dev] [PATCH 1/5] draw/gs: cleanup some debugging code

2013-04-02 Thread Brian Paul

On 03/30/2013 07:27 AM, Zack Rusin wrote:

> Signed-off-by: Zack Rusin za...@vmware.com
> ---
>  src/gallium/auxiliary/draw/draw_gs.c |    4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/src/gallium/auxiliary/draw/draw_gs.c b/src/gallium/auxiliary/draw/draw_gs.c
> index b98b133..70db837 100644
> --- a/src/gallium/auxiliary/draw/draw_gs.c
> +++ b/src/gallium/auxiliary/draw/draw_gs.c
> @@ -160,8 +160,6 @@ static void tgsi_fetch_gs_input(struct draw_geometry_shader *shader,
>  #if DEBUG_INPUTS
>                 debug_printf("\t\tSlot = %d, vs_slot = %d, idx = %d:\n",
>                              slot, vs_slot, idx);
> -#endif
> -#if 1
>                 assert(!util_is_inf_or_nan(input[vs_slot][0]));
>                 assert(!util_is_inf_or_nan(input[vs_slot][1]));
>                 assert(!util_is_inf_or_nan(input[vs_slot][2]));
> @@ -249,8 +247,6 @@ llvm_fetch_gs_input(struct draw_geometry_shader *shader,
>  #if DEBUG_INPUTS
>                 debug_printf("\t\tSlot = %d, vs_slot = %d, i = %d:\n",
>                              slot, vs_slot, i);
> -#endif
> -#if 0
>                 assert(!util_is_inf_or_nan(input[vs_slot][0]));
>                 assert(!util_is_inf_or_nan(input[vs_slot][1]));
>                 assert(!util_is_inf_or_nan(input[vs_slot][2]));


For the series:
Reviewed-by: Brian Paul bri...@vmware.com


[Mesa-dev] [PATCH] gallivm: use f16c hw support for float-half and half-float conversion

2013-04-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com

Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
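
As a reference for what the half-to-float direction computes, here is a plain scalar C sketch of the IEEE 754 binary16 to binary32 widening. It is not the gallivm fallback path and not how the F16C intrinsics are invoked; `half_to_float()` is a hypothetical helper that can be used to sanity-check individual values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen one IEEE 754 half (1 sign, 5 exponent, 10 mantissa bits) to float. */
static float
half_to_float(uint16_t h)
{
   uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
   uint32_t exponent = (h >> 10) & 0x1fu;
   uint32_t mantissa = h & 0x3ffu;
   uint32_t bits;
   float result;

   assert(sizeof(float) == sizeof(uint32_t));

   if (exponent == 0x1fu) {
      /* Inf or NaN: all-ones exponent, mantissa payload preserved. */
      bits = sign | 0x7f800000u | (mantissa << 13);
   } else if (exponent != 0) {
      /* Normal number: rebias the exponent from 15 to 127. */
      bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
   } else if (mantissa != 0) {
      /* Half denormal: renormalize, since every one is a normal float. */
      uint32_t e = 113u;
      while (!(mantissa & 0x400u)) {
         mantissa <<= 1;
         e--;
      }
      bits = sign | (e << 23) | ((mantissa & 0x3ffu) << 13);
   } else {
      /* Signed zero. */
      bits = sign;
   }

   memcpy(&result, &bits, sizeof result);
   return result;
}
```

The conversion is exact in this direction, so no rounding mode is involved; rounding only matters for float-to-half, which is where the patch passes a mode argument to the intrinsic.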
---
 src/gallium/auxiliary/gallivm/lp_bld_conv.c |   45 ---
 src/gallium/auxiliary/gallivm/lp_bld_init.c |   10 ++
 src/gallium/auxiliary/util/u_cpu_detect.c   |1 +
 src/gallium/auxiliary/util/u_cpu_detect.h   |1 +
 4 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_conv.c 
b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
index 38a577c..eb2d096 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_conv.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_conv.c
@@ -175,9 +175,24 @@ lp_build_half_to_float(struct gallivm_state *gallivm,
struct lp_type f32_type = lp_type_float_vec(32, 32 * src_length);
struct lp_type i32_type = lp_type_int_vec(32, 32 * src_length);
LLVMTypeRef int_vec_type = lp_build_vec_type(gallivm, i32_type);
+   LLVMValueRef h;
+
+   if (util_cpu_caps.has_f16c && HAVE_LLVM >= 0x0301 &&
+       (src_length == 4 || src_length == 8)) {
+      const char *intrinsic = NULL;
+      if (src_length == 4) {
+         src = lp_build_pad_vector(gallivm, src, 8);
+         intrinsic = "llvm.x86.vcvtph2ps.128";
+      }
+      else {
+         intrinsic = "llvm.x86.vcvtph2ps.256";
+      }
+      return lp_build_intrinsic_unary(builder, intrinsic,
+                                      lp_build_vec_type(gallivm, f32_type), src);
+   }
 
    /* Convert int16 vector to int32 vector by zero ext (might generate bad code) */
-   LLVMValueRef h = LLVMBuildZExt(builder, src, int_vec_type, "");
+   h = LLVMBuildZExt(builder, src, int_vec_type, "");
    return lp_build_smallfloat_to_float(gallivm, f32_type, h, 10, 5, 0, true);
 }
 
@@ -204,9 +219,31 @@ lp_build_float_to_half(struct gallivm_state *gallivm,
struct lp_type i16_type = lp_type_int_vec(16, 16 * length);
LLVMValueRef result;
 
-   result = lp_build_float_to_smallfloat(gallivm, i32_type, src, 10, 5, 0, true);
-   /* Convert int32 vector to int16 vector by trunc (might generate bad code) */
-   result = LLVMBuildTrunc(builder, result, lp_build_vec_type(gallivm, i16_type), "");
+   if (util_cpu_caps.has_f16c && HAVE_LLVM >= 0x0301 &&
+       (length == 4 || length == 8)) {
+      struct lp_type i168_type = lp_type_int_vec(16, 16 * 8);
+      unsigned mode = 3; /* same as LP_BUILD_ROUND_TRUNCATE */
+      LLVMTypeRef i32t = LLVMInt32TypeInContext(gallivm->context);
+      const char *intrinsic = NULL;
+      if (length == 4) {
+         intrinsic = "llvm.x86.vcvtps2ph.128";
+      }
+      else {
+         intrinsic = "llvm.x86.vcvtps2ph.256";
+      }
+      result = lp_build_intrinsic_binary(builder, intrinsic,
+                                         lp_build_vec_type(gallivm, i168_type),
+                                         src, LLVMConstInt(i32t, mode, 0));
+      if (length == 4) {
+         result = lp_build_extract_range(gallivm, result, 0, 4);
+      }
+   }
+
+   else {
+      result = lp_build_float_to_smallfloat(gallivm, i32_type, src, 10, 5, 0, true);
+      /* Convert int32 vector to int16 vector by trunc (might generate bad code) */
+      result = LLVMBuildTrunc(builder, result, lp_build_vec_type(gallivm, i16_type), "");
+   }
 
/*
 * Debugging code.
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c 
b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index 050eba7..4fa5887 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -468,6 +468,15 @@ lp_build_init(void)
   util_cpu_caps.has_avx = 0;
}
 
+   if (!HAVE_AVX) {
+      /*
+       * note these instructions are VEX-only, so can only emit if we use
+       * avx (don't want to base it on has_avx & has_f16c later as that would
+       * omit it unnecessarily on amd cpus, see above).
+       */
+      util_cpu_caps.has_f16c = 0;
+   }
+
 #ifdef PIPE_ARCH_PPC_64
/* Set the NJ bit in VSCR to 0 so denormalized values are handled as
 * specified by IEEE standard (PowerISA 2.06 - Section 6.3). This garantees
@@ -495,6 +504,7 @@ lp_build_init(void)
util_cpu_caps.has_ssse3 = 0;
util_cpu_caps.has_sse4_1 = 0;
util_cpu_caps.has_avx = 0;
+   util_cpu_caps.has_f16c = 0;
 #endif
 }
 
diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c 
b/src/gallium/auxiliary/util/u_cpu_detect.c
index 0328051..7e6df9d 100644
--- a/src/gallium/auxiliary/util/u_cpu_detect.c
+++ b/src/gallium/auxiliary/util/u_cpu_detect.c
@@ -279,6 +279,7 @@ util_cpu_detect(void)
          util_cpu_caps.has_sse4_1 = (regs2[2] >> 19) & 1;
          util_cpu_caps.has_sse4_2 = (regs2[2] >> 20) & 1;
          util_cpu_caps.has_avx    = (regs2[2] >> 28) & 1;
+         util_cpu_caps.has_f16c   = (regs2[2] >> 29) & 1;
  util_cpu_caps.has_mmx2   = util_cpu_caps.has_sse; /* SSE cpus 
supports mmxext too 

Re: [Mesa-dev] [PATCH] gallivm: use f16c hw support for float-half and half-float conversion

2013-04-02 Thread Brian Paul

On 04/02/2013 05:07 PM, srol...@vmware.com wrote:

> From: Roland Scheidegger srol...@vmware.com
>
> Should be way faster of course on cpus supporting this (includes AMD
> Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
> Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
> ---
>  src/gallium/auxiliary/gallivm/lp_bld_conv.c |   45 ---
>  src/gallium/auxiliary/gallivm/lp_bld_init.c |   10 ++
>  src/gallium/auxiliary/util/u_cpu_detect.c   |    1 +
>  src/gallium/auxiliary/util/u_cpu_detect.h   |    1 +
>  4 files changed, 53 insertions(+), 4 deletions(-)



LGTM.  Reviewed-by: Brian Paul bri...@vmware.com


Re: [Mesa-dev] [PATCH] i965/fs: Don't immediately schedule instructions that were just made available.

2013-04-02 Thread Eric Anholt
Matt Turner matts...@gmail.com writes:

> The original goal of pre-register allocation scheduling was to reduce
> live ranges so we'd use fewer registers and hopefully fit into 16-wide.
> In shader-db, this change causes us to lose 30 16-wide programs, but we
> gain 29... so it's a toss-up. At least by choosing instructions in a
> better order all programs should be slightly faster.

I think this will break the GLES3 test that we created this pass for.

I think we'll get the same performance benefit by round-robining our
allocated registers instead of packing them in the low numbers, which is
what that branch I had mentioned to you was for.




[Mesa-dev] [PATCH] radeonsi: add more cases for copying unsupported formats to resource_copy_region

2013-04-02 Thread Tom Stellard
From: Marek Olšák mar...@gmail.com

Ported from r600g commit:

8891b2f9c91b2f6c8625184c23a10b8e55875dc0

NOTE: This is a candidate for the stable branches.
---
 src/gallium/drivers/radeonsi/r600_blit.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/r600_blit.c 
b/src/gallium/drivers/radeonsi/r600_blit.c
index f9d2568..f11f110 100644
--- a/src/gallium/drivers/radeonsi/r600_blit.c
+++ b/src/gallium/drivers/radeonsi/r600_blit.c
@@ -429,6 +429,18 @@ static void r600_resource_copy_region(struct pipe_context *ctx,
 			r600_change_format(dst, dst_level, &orig_info[1],
 					   PIPE_FORMAT_R8G8B8A8_UNORM);
 			break;
+		case 8:
+			r600_change_format(src, src_level, &orig_info[0],
+					   PIPE_FORMAT_R16G16B16A16_UINT);
+			r600_change_format(dst, dst_level, &orig_info[1],
+					   PIPE_FORMAT_R16G16B16A16_UINT);
+			break;
+		case 16:
+			r600_change_format(src, src_level, &orig_info[0],
+					   PIPE_FORMAT_R32G32B32A32_UINT);
+			r600_change_format(dst, dst_level, &orig_info[1],
+					   PIPE_FORMAT_R32G32B32A32_UINT);
+			break;
 		default:
 			fprintf(stderr, "Unhandled format %s with blocksize %u\n",
 				util_format_short_name(src->format), blocksize);
-- 
1.7.3.4
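
The idea behind the new cases, sketched outside the driver: for a plain copy the substitute format only has to have the same block size as the original, because texels are moved bit for bit. `struct copy_format` and `pick_copy_format()` below are hypothetical; the 8- and 16-byte entries mirror the patch, while mapping 4 bytes to R8G8B8A8_UNORM is an assumption based on the context lines of the hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a substitute format for bitwise copies. */
struct copy_format {
   const char *name;
   unsigned channels;
   unsigned bits_per_channel;
};

/* Returns 1 and fills *out if a substitute exists for this block size
 * (in bytes), 0 otherwise (the "Unhandled format" case in the patch). */
static int
pick_copy_format(unsigned blocksize, struct copy_format *out)
{
   assert(out != NULL);

   switch (blocksize) {
   case 4:   /* assumption: the case preceding the new ones */
      out->name = "PIPE_FORMAT_R8G8B8A8_UNORM";
      out->channels = 4;
      out->bits_per_channel = 8;
      return 1;
   case 8:   /* added by the patch */
      out->name = "PIPE_FORMAT_R16G16B16A16_UINT";
      out->channels = 4;
      out->bits_per_channel = 16;
      return 1;
   case 16:  /* added by the patch */
      out->name = "PIPE_FORMAT_R32G32B32A32_UINT";
      out->channels = 4;
      out->bits_per_channel = 32;
      return 1;
   default:
      return 0;
   }
}
```

In each handled case `channels * bits_per_channel / 8` equals the requested block size, which is the only property the copy relies on.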



Re: [Mesa-dev] [PATCH] i965/fs: Don't immediately schedule instructions that were just made available.

2013-04-02 Thread Matt Turner
On Tue, Apr 2, 2013 at 4:48 PM, Eric Anholt e...@anholt.net wrote:
> Matt Turner matts...@gmail.com writes:
>
>> The original goal of pre-register allocation scheduling was to reduce
>> live ranges so we'd use fewer registers and hopefully fit into 16-wide.
>> In shader-db, this change causes us to lose 30 16-wide programs, but we
>> gain 29... so it's a toss-up. At least by choosing instructions in a
>> better order all programs should be slightly faster.
>
> I think this will break the GLES3 test that we created this pass for.

It does, and I've been trying to figure out another way of solving it.

> I think we'll get the same performance benefit by round-robining our
> allocated registers instead of packing them in the low numbers, which is
> what that branch I had mentioned to you was for.

I don't think so, since the round-robin allocation would have helped
write-after-read stalls, but our hardware doesn't stall on
write-after-read.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] radeon/uvd: add UVD implementation

2013-04-02 Thread Matt Turner
On Tue, Apr 2, 2013 at 4:19 PM, Christian König deathsim...@vodafone.de wrote:
> diff --git a/configure.ac b/configure.ac
> index 81d4a3f..93ec1d2 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1814,6 +1814,7 @@ if test x$with_gallium_drivers != x; then
>  if test x$enable_r600_llvm = xyes -o x$enable_opencl = xyes; then
>  radeon_llvm_check
>  NEED_RADEON_GALLIUM=yes;
> +NEED_RUVD_GALLIUM=yes;
>  R600_NEED_RADEON_GALLIUM=yes;
>  LLVM_COMPONENTS=${LLVM_COMPONENTS} ipo bitreader asmparser
>  fi
> @@ -1832,6 +1833,7 @@ if test x$with_gallium_drivers != x; then
>  GALLIUM_DRIVERS_DIRS=$GALLIUM_DRIVERS_DIRS radeonsi
>  radeon_llvm_check
>  NEED_RADEON_GALLIUM=yes;
> +NEED_RUVD_GALLIUM=yes;
>  gallium_check_st radeon/drm dri-radeonsi xorg-radeonsi  vdpau-radeonsi
>  ;;
>  xnouveau)
> @@ -1987,6 +1989,7 @@ AM_CONDITIONAL(HAVE_GALAHAD_GALLIUM, test x$HAVE_GALAHAD_GALLIUM = xyes)
>  AM_CONDITIONAL(HAVE_IDENTITY_GALLIUM, test x$HAVE_IDENTITY_GALLIUM = xyes)
>  AM_CONDITIONAL(HAVE_NOOP_GALLIUM, test x$HAVE_NOOP_GALLIUM = xyes)
>  AM_CONDITIONAL(NEED_RADEON_GALLIUM, test x$NEED_RADEON_GALLIUM = xyes)
> +AM_CONDITIONAL(NEED_RUVD_GALLIUM, test x$NEED_RUVD_GALLIUM = xyes)
>  AM_CONDITIONAL(R600_NEED_RADEON_GALLIUM, test x$R600_NEED_RADEON_GALLIUM = xyes)
>  AM_CONDITIONAL(USE_R600_LLVM_COMPILER, test x$USE_R600_LLVM_COMPILER = xyes)
>  AM_CONDITIONAL(HAVE_LOADER_GALLIUM, test x$enable_gallium_loader = xyes)
> @@ -2062,6 +2065,7 @@ AC_CONFIG_FILES([Makefile
>  src/gallium/drivers/softpipe/Makefile
>  src/gallium/drivers/svga/Makefile
>  src/gallium/drivers/trace/Makefile
> +src/gallium/drivers/ruvd/Makefile

Keep this list in alphabetical order please.

>  src/gallium/state_trackers/Makefile
>  src/gallium/state_trackers/clover/Makefile
>  src/gallium/state_trackers/dri/Makefile
> diff --git a/docs/README.UVD b/docs/README.UVD
> new file mode 100644
> index 000..36b467e
> --- /dev/null
> +++ b/docs/README.UVD
> @@ -0,0 +1,13 @@
> +The software may implement third party technologies (e.g. third party
> +libraries) that are not licensed to you by AMD and for which you may need
> +to obtain licenses from other parties.  Unless explicitly stated otherwise,
> +these third party technologies are not licensed hereunder.  Such third
> +party technologies include, but are not limited, to H.264, MPEG-2, MPEG-4,
> +AVC, and VC-1.
> +
> +For MPEG-2 Encoding Products ANY USE OF THIS PRODUCT IN ANY MANNER OTHER
> +THAN PERSONAL USE THAT COMPLIES WITH THE MPEG-2 STANDARD FOR ENCODING VIDEO
> +INFORMATION FOR PACKAGED MEDIA IS EXPRESSLY PROHIBITED WITHOUT A LICENSE
> +UNDER APPLICABLE PATENTS IN THE MPEG-2 PATENT PORTFOLIO, WHICH LICENSES IS
> +AVAILABLE FROM MPEG LA, LLC, 6312 S. Fiddlers Green Circle, Suite 400E,
> +Greenwood Village, Colorado 80111 U.S.A.
> diff --git a/src/gallium/drivers/Makefile.am b/src/gallium/drivers/Makefile.am
> index 3477fee..b78a3e0 100644
> --- a/src/gallium/drivers/Makefile.am
> +++ b/src/gallium/drivers/Makefile.am
> @@ -64,4 +64,12 @@ endif

  
 

> +if NEED_RADEON_GALLIUM

Supposed to be NEED_RUVD_GALLIUM?

> +
> +SUBDIRS += ruvd
> +
> +endif
> +
> +
> +
>  SUBDIRS += $(GALLIUM_MAKE_DIRS)
>
> diff --git a/src/gallium/drivers/ruvd/Makefile.am b/src/gallium/drivers/ruvd/Makefile.am
> new file mode 100644
> index 000..1d183e7
> --- /dev/null
> +++ b/src/gallium/drivers/ruvd/Makefile.am
> @@ -0,0 +1,16 @@
> +include Makefile.sources
> +
> +noinst_LTLIBRARIES = libruvd.la
> +
> +AM_CFLAGS = \
> +   -I$(top_srcdir)/src/gallium/include \
> +   -I$(top_srcdir)/src/gallium/auxiliary \
> +   -I$(top_srcdir)/src/gallium/drivers \
> +   -I$(top_srcdir)/include \
> +   $(RADEON_CFLAGS) \
> +   $(DEFINES) \
> +   $(PIC_FLAGS) \

No more PIC_FLAGS.

Congratulations. I bet this has been a really long process for you guys.

Matt


Re: [Mesa-dev] [PATCH] Avoid spurious GCC warnings in STATIC_ASSERT() macro.

2013-04-02 Thread Eric Anholt
Brian Paul bri...@vmware.com writes:

> On 04/02/2013 04:16 PM, Paul Berry wrote:
>> GCC 4.8 now warns about typedefs that are local to a scope and not
>> used anywhere within that scope.  This produces spurious warnings with
>> the STATIC_ASSERT() macro (which uses a typedef to provoke a compile
>> error in the event of an assertion failure).
>>
>> This patch avoids the warning using the GCC __attribute__((unused))
>> syntax.
>> ---
>>  src/mesa/main/compiler.h | 8 +++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/main/compiler.h b/src/mesa/main/compiler.h
>> index 8b23665..ddeb61d 100644
>> --- a/src/mesa/main/compiler.h
>> +++ b/src/mesa/main/compiler.h
>> @@ -249,6 +249,12 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
>>  #endif
>>
>>
>> +#if (__GNUC__ >= 3)
>> +#define GCC_ATTRIBUTE_UNUSED __attribute__((unused))
>> +#else
>> +#define GCC_ATTRIBUTE_UNUSED
>> +#endif
>> +
>>  /**
>>   * Static (compile-time) assertion.
>>   * Basically, use COND to dimension an array.  If COND is false/zero the
>> @@ -256,7 +262,7 @@ static INLINE GLuint CPU_TO_LE32(GLuint x)
>>   */
>>  #define STATIC_ASSERT(COND) \
>>     do { \
>> -      typedef int static_assertion_failed[(!!(COND))*2-1]; \
>> +      typedef int static_assertion_failed[(!!(COND))*2-1] GCC_ATTRIBUTE_UNUSED; \
>>     } while (0)
>>
>>
>
> Without using gcc-isms, I think this would work too:
>
> #define STATIC_ASSERT(COND) \
>    do { \
>       int static_assertion_failed[(!!(COND))*2-1]; \
>       (void) static_assertion_failed; \
>    } while (0)
>
> I don't recall why I used the typedef.
>
> Also, the same macro should probably be updated in
> src/gallium/include/pipe/p_compiler.h

Rusty's CCAN is often a good reference for stuff like this:

http://git.ozlabs.org/?p=ccan;a=blob;f=ccan/build_assert/build_assert.h;h=b9ecd84028e3fbebd1bf009c3c57e8a193e45646;hb=HEAD




[Mesa-dev] [PATCH 1/3] gallivm: minor rho calculation optimization for 1 or 3 coords

2013-04-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com

Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.
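
To make the new single-coord packing concrete, here is a scalar emulation of one quad. Two things are assumptions not visible in the hunks: that the packed vector is formed as the second swizzle minus the first, and that quad lanes are numbered top-left 0, top-right 1, bottom-left 2, bottom-right 3. The names below are hypothetical and only mimic the layout described in the updated comment (`ddx _ -ddy _`).

```c
/* Assumed lane numbering within a quad. */
enum {
   QUAD_TOP_LEFT = 0,
   QUAD_TOP_RIGHT = 1,
   QUAD_BOTTOM_LEFT = 2,
   QUAD_BOTTOM_RIGHT = 3
};

static float
abs_float(float x)
{
   return x < 0.0f ? -x : x;
}

/* a[] holds one coordinate for the four pixels of a quad.
 * out[0] receives ddx, out[2] receives -ddy; lanes 1 and 3 are don't-care. */
static void
packed_ddx_ddy_onecoord(const float a[4], float out[4])
{
   /* First swizzle leaves lanes 0 and 2 where they already are (a no-op). */
   const float vec1[4] = { a[QUAD_TOP_LEFT], 0.0f, a[QUAD_BOTTOM_LEFT], 0.0f };
   /* Second swizzle is the only real shuffle. */
   const float vec2[4] = { a[QUAD_TOP_RIGHT], 0.0f, a[QUAD_TOP_LEFT], 0.0f };
   int i;

   for (i = 0; i < 4; i++)
      out[i] = vec2[i] - vec1[i];
}
```

The sign on the ddy lane does not matter downstream because the rho calculation takes the absolute value first; that is also why the rho code now reads lane 2 rather than lane 1 for the y derivative.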
---
 src/gallium/auxiliary/gallivm/lp_bld_quad.c   |   20 +++-
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |   31 +++--
 2 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_quad.c 
b/src/gallium/auxiliary/gallivm/lp_bld_quad.c
index 1955add..f2a762a 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_quad.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_quad.c
@@ -81,7 +81,8 @@ lp_build_ddy(struct lp_build_context *bld,
 /*
  * Helper for building packed ddx/ddy vector for one coord (scalar per quad
  * values). The vector will look like this (8-wide):
- * dr1dx dr1dy _ _ dr2dx dr2dy _ _
+ * dr1dx _ -dr1dy _ dr2dx _ -dr2dy _
+ * This only requires one shuffle instead of two for more straightforward packing.
  */
 LLVMValueRef
 lp_build_packed_ddx_ddy_onecoord(struct lp_build_context *bld,
@@ -91,19 +92,15 @@ lp_build_packed_ddx_ddy_onecoord(struct lp_build_context 
*bld,
LLVMBuilderRef builder = gallivm-builder;
LLVMValueRef vec1, vec2;
 
-   /* same packing as _twocoord, but can use aos swizzle helper */
+   /* use aos swizzle helper */
 
-   /*
-* XXX could make swizzle1 a noop swizzle by using right top/bottom
-* pair for ddy
-*/
-   static const unsigned char swizzle1[] = {
-  LP_BLD_QUAD_TOP_LEFT, LP_BLD_QUAD_TOP_LEFT,
-  LP_BLD_SWIZZLE_DONTCARE, LP_BLD_SWIZZLE_DONTCARE
+   static const unsigned char swizzle1[] = { /* no-op swizzle */
+  LP_BLD_QUAD_TOP_LEFT, LP_BLD_SWIZZLE_DONTCARE,
+  LP_BLD_QUAD_BOTTOM_LEFT, LP_BLD_SWIZZLE_DONTCARE
};
static const unsigned char swizzle2[] = {
-  LP_BLD_QUAD_TOP_RIGHT, LP_BLD_QUAD_BOTTOM_LEFT,
-  LP_BLD_SWIZZLE_DONTCARE, LP_BLD_SWIZZLE_DONTCARE
+  LP_BLD_QUAD_TOP_RIGHT, LP_BLD_SWIZZLE_DONTCARE,
+  LP_BLD_QUAD_TOP_LEFT, LP_BLD_SWIZZLE_DONTCARE
};
 
vec1 = lp_build_swizzle_aos(bld, a, swizzle1);
@@ -120,6 +117,7 @@ lp_build_packed_ddx_ddy_onecoord(struct lp_build_context 
*bld,
  * Helper for building packed ddx/ddy vector for one coord (scalar per quad
  * values). The vector will look like this (8-wide):
  * ds1dx ds1dy dt1dx dt1dy ds2dx ds2dy dt2dx dt2dy
+ * This only needs 2 (v)shufps.
  */
 LLVMValueRef
 lp_build_packed_ddx_ddy_twocoord(struct lp_build_context *bld,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index fc8bae7..9a00897 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -226,7 +226,6 @@ lp_build_rho(struct lp_build_sample_context *bld,
LLVMValueRef int_size, float_size;
LLVMValueRef rho;
LLVMValueRef first_level, first_level_vec;
-   LLVMValueRef abs_ddx_ddy[2];
unsigned length = coord_bld-type.length;
unsigned num_quads = length / 4;
unsigned i;
@@ -279,32 +278,28 @@ lp_build_rho(struct lp_build_sample_context *bld,
          ddx_ddy[0] = lp_build_packed_ddx_ddy_onecoord(coord_bld, s);
       }
       else if (dims >= 2) {
-         ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld,
-                                                       s, t);
+         ddx_ddy[0] = lp_build_packed_ddx_ddy_twocoord(coord_bld, s, t);
          if (dims > 2) {
             ddx_ddy[1] = lp_build_packed_ddx_ddy_onecoord(coord_bld, r);
          }
       }
 
-      abs_ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
+      ddx_ddy[0] = lp_build_abs(coord_bld, ddx_ddy[0]);
       if (dims > 2) {
-         abs_ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
-      }
-      else {
-         abs_ddx_ddy[1] = NULL;
+         ddx_ddy[1] = lp_build_abs(coord_bld, ddx_ddy[1]);
       }
 
-      if (dims == 1) {
-         static const unsigned char swizzle1[] = {
+      if (dims < 2) {
+         static const unsigned char swizzle1[] = { /* no-op swizzle */
             0, LP_BLD_SWIZZLE_DONTCARE,
             LP_BLD_SWIZZLE_DONTCARE, LP_BLD_SWIZZLE_DONTCARE
          };
          static const unsigned char swizzle2[] = {
-            1, LP_BLD_SWIZZLE_DONTCARE,
+            2, LP_BLD_SWIZZLE_DONTCARE,
             LP_BLD_SWIZZLE_DONTCARE, LP_BLD_SWIZZLE_DONTCARE
          };
-         rho_xvec = lp_build_swizzle_aos(coord_bld, abs_ddx_ddy[0], swizzle1);
-         rho_yvec = lp_build_swizzle_aos(coord_bld, abs_ddx_ddy[0], swizzle2);
+         rho_xvec = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle1);
+         rho_yvec = lp_build_swizzle_aos(coord_bld, ddx_ddy[0], swizzle2);
       }
   else if (dims == 2) {
  static const unsigned char swizzle1[] = {
@@ -315,8 +310,8 @@ lp_build_rho(struct lp_build_sample_context *bld,
 1, 3,
 LP_BLD_SWIZZLE_DONTCARE, LP_BLD_SWIZZLE_DONTCARE
  };
- rho_xvec = lp_build_swizzle_aos(coord_bld, abs_ddx_ddy[0], 

[Mesa-dev] [PATCH 2/3] gallivm: do per-pixel cube face selection (finally!!!)

2013-04-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com

This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get randomly exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in nearly the same values as before (the nearly is only because
before the code would choose the face based on an average vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarily the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't either, but this stays the
same as before (that is, I think a better transform would _somehow_ take
the derivative major axis into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).

As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). I'd have expected that with nearest/nearest filtering,
where this will be fewer instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear, where it is slightly more
instructions; probably the code is less serialized, allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though). For now
that old path is still there but unused.
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |  249 ++---
 src/gallium/auxiliary/gallivm/lp_bld_sample.h |4 +-
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |9 +-
 3 files changed, 180 insertions(+), 82 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index 9a00897..5d50921 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -207,6 +207,7 @@ lp_build_rho(struct lp_build_sample_context *bld,
  LLVMValueRef s,
  LLVMValueRef t,
  LLVMValueRef r,
+ LLVMValueRef cube_rho,
  const struct lp_derivatives *derivs)
 {
struct gallivm_state *gallivm = bld->gallivm;
@@ -240,8 +241,22 @@ lp_build_rho(struct lp_build_sample_context *bld,
int_size = lp_build_minify(int_size_bld, bld->int_size, first_level_vec);
float_size = lp_build_int_to_float(float_size_bld, int_size);
 
-   /* XXX ignoring explicit derivs for cube maps for now */
-   if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
+   if (cube_rho) {
+  LLVMValueRef cubesize;
+  LLVMValueRef index0 = lp_build_const_int32(gallivm, 0);
+  /*
+   * If we have derivs too then we have per-pixel cube_rho - doesn't matter
+   * though until we do per-pixel lod.
+   * Cube map code did already everything except size mul and per-quad extraction.
+   */
+  /* Could optimize this for single quad just skip the broadcast */
+  cubesize = lp_build_extract_broadcast(gallivm, bld->float_size_in_type,
+coord_bld->type, float_size, index0);
+  rho_vec = lp_build_mul(coord_bld, cubesize, cube_rho);
+  rho = lp_build_pack_aos_scalars(bld->gallivm, coord_bld->type,
+  perquadf_bld->type, rho_vec, 0);
+   }
+   else if (derivs && !(bld->static_texture_state->target == PIPE_TEXTURE_CUBE)) {
   LLVMValueRef ddmax[3];
   for (i = 0; i < dims; i++) {
  LLVMValueRef ddx, ddy;
@@ -561,6 +576,7 @@ lp_build_lod_selector(struct lp_build_sample_context *bld,

[Mesa-dev] [PATCH 3/3] gallivm: honor explicit derivatives values for cube maps.

2013-04-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com

This is trivial now, though we need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy, not 2).
Untested (no piglit test); however, since the transform works the same
as implicit derivatives, this should probably work correctly.
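
As a rough scalar illustration of why three values per direction are needed,
the sketch below mirrors the per-coordinate max that the patch enables for
explicit derivatives. The struct and function names are hypothetical
stand-ins, not the gallivm types, and the real code operates on LLVM vector
values rather than floats.

```c
/* Hypothetical scalar stand-in for struct lp_derivatives:
 * one ddx and one ddy value per cube coordinate (s, t, r).
 */
struct derivs3 {
   float ddx[3];
   float ddy[3];
};

/* For each of the three coordinates, keep the larger of ddx and ddy,
 * analogous to dmax[i] = lp_build_max(coord_bld, ddx[i], ddy[i]).
 */
static void deriv_max3(const struct derivs3 *d, float dmax[3])
{
   for (int i = 0; i < 3; i++)
      dmax[i] = d->ddx[i] > d->ddy[i] ? d->ddx[i] : d->ddy[i];
}
```

With only two derivative values per direction, the third entry would be
undefined for cube maps, since the face is selected from all of s, t and r.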
---
 src/gallium/auxiliary/gallivm/lp_bld_sample.c |   10 ++--
 src/gallium/auxiliary/gallivm/lp_bld_sample.h |1 +
 src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |2 +-
 src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c   |   66 ++---
 4 files changed, 52 insertions(+), 27 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
index 5d50921..cc04a70 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.c
@@ -1287,6 +1287,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
  LLVMValueRef s,
  LLVMValueRef t,
  LLVMValueRef r,
+ const struct lp_derivatives *derivs, /* optional */
  LLVMValueRef *face,
  LLVMValueRef *face_s,
  LLVMValueRef *face_t,
@@ -1296,7 +1297,6 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
LLVMBuilderRef builder = bld->gallivm->builder;
struct gallivm_state *gallivm = bld->gallivm;
LLVMValueRef si, ti, ri;
-   boolean implicit_derivs = TRUE;
boolean need_derivs = TRUE;
 
if (1 || coord_bld->type.length > 4) {
@@ -1334,9 +1334,9 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
   assert(PIPE_TEX_FACE_NEG_Z == PIPE_TEX_FACE_POS_Z + 1);
 
   /*
-   * TODO do this only when needed, and implement explicit derivs (trivial).
+   * TODO do this only when needed.
*/
-  if (need_derivs && implicit_derivs) {
+  if (need_derivs && !derivs) {
  LLVMValueRef ddx_ddy[2], tmp[2];
  /*
   * This isn't quite the same as the ordinary path since
@@ -1374,9 +1374,9 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
  dmax[2] = lp_build_max(coord_bld, tmp[0], tmp[1]);
   }
   else if (need_derivs) {
- /* dmax[0] = lp_build_max(coord_bld, derivs->ddx[0], derivs->ddy[0]);
+ dmax[0] = lp_build_max(coord_bld, derivs->ddx[0], derivs->ddy[0]);
  dmax[1] = lp_build_max(coord_bld, derivs->ddx[1], derivs->ddy[1]);
- dmax[2] = lp_build_max(coord_bld, derivs->ddx[2], derivs->ddy[2]); */
+ dmax[2] = lp_build_max(coord_bld, derivs->ddx[2], derivs->ddy[2]);
   }
 
   si = LLVMBuildBitCast(builder, s, lp_build_vec_type(gallivm, intctype), "");
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample.h 
b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
index 5026b0a..72af813 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample.h
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample.h
@@ -433,6 +433,7 @@ lp_build_cube_lookup(struct lp_build_sample_context *bld,
  LLVMValueRef s,
  LLVMValueRef t,
  LLVMValueRef r,
+ const struct lp_derivatives *derivs, /* optional */
  LLVMValueRef *face,
  LLVMValueRef *face_s,
  LLVMValueRef *face_t,
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
index 3b950ea..d2cc0f3 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c
@@ -1102,7 +1102,7 @@ lp_build_sample_common(struct lp_build_sample_context *bld,
 */
if (target == PIPE_TEXTURE_CUBE) {
   LLVMValueRef face, face_s, face_t;
-  lp_build_cube_lookup(bld, *s, *t, *r, &face, &face_s, &face_t, &cube_rho);
+  lp_build_cube_lookup(bld, *s, *t, *r, derivs, &face, &face_s, &face_t, &cube_rho);
   *s = face_s; /* vec */
   *t = face_t; /* vec */
   /* use 'r' to indicate cube face */
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c 
b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
index facfc82..007e3c9 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c
@@ -1276,8 +1276,7 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
LLVMValueRef offsets[3] = { NULL };
struct lp_derivatives derivs;
struct lp_derivatives *deriv_ptr = NULL;
-   unsigned num_coords;
-   unsigned dims;
+   unsigned num_coords, num_derivs, num_offsets;
unsigned i;
 
if (!bld->sampler) {
@@ -1291,37 +1290,52 @@ emit_tex( struct lp_build_tgsi_soa_context *bld,
switch (inst->Texture.Texture) {
case TGSI_TEXTURE_1D:
   num_coords = 1;
-  dims = 1;
+  num_offsets = 1;
+  num_derivs = 1;
   break;
case TGSI_TEXTURE_1D_ARRAY:
   num_coords = 2;
-  dims = 1;
+  num_offsets = 1;
+  num_derivs = 1;
   

[Mesa-dev] [PATCH 1/3] intel: Add support for writing to our linear-temporary-CPU-map case.

2013-04-02 Thread Eric Anholt
This will be used for handling updates of large textures.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c |   25 ++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 66cadeb..ffdaec5 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -1347,9 +1347,30 @@ intel_miptree_unmap_blit(struct intel_context *intel,
 unsigned int level,
 unsigned int slice)
 {
-   assert(!(map->mode & GL_MAP_WRITE_BIT));
-
+   struct gl_context *ctx = &intel->ctx;
drm_intel_bo_unmap(map->bo);
+
+   if (map->mode & GL_MAP_WRITE_BIT) {
+  unsigned int image_x, image_y;
+  int x = map->x;
+  int y = map->y;
+  intel_miptree_get_image_offset(mt, level, slice, &image_x, &image_y);
+  x += image_x;
+  y += image_y;
+
+  bool ok = intelEmitCopyBlit(intel,
+  mt->region->cpp,
+  map->stride, map->bo,
+  0, I915_TILING_NONE,
+  mt->region->pitch, mt->region->bo,
+  mt->offset, mt->region->tiling,
+  0, 0,
+  x, y,
+  map->w, map->h,
+  GL_COPY);
+  WARN_ONCE(!ok, "Failed to blit from linear temporary mapping");
+   }
+
drm_intel_bo_unreference(map->bo);
 }
 
-- 
1.7.10.4



[Mesa-dev] [PATCH 3/3] intel: Avoid making tiled miptrees we won't be able to blit.

2013-04-02 Thread Eric Anholt
Doing so was breaking miptree mapping, which we really need to be able to
handle.  With this change, intel_miptree_map_direct() falls through to
doing a CPU mapping on the buffer like we need.
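
The tiling decision this patch introduces can be sketched in isolation as
follows. This is a simplified illustration, not the driver code: the helper
and macro names are hypothetical, the 512-byte alignment and the 32768 limit
are taken from the diff below, and the other conditions in
intel_miptree_create() (such as the width0 check) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
#define ALIGN_POT(value, alignment) \
   (((value) + (alignment) - 1) & ~((alignment) - 1))

/* The blitter cannot handle pitches of 32768 bytes or more. */
#define BLIT_PITCH_LIMIT 32768u

/* Only pick X tiling if the tile-aligned pitch still fits the blitter. */
static bool can_use_x_tiling(uint32_t total_width, uint32_t cpp)
{
   return ALIGN_POT(total_width * cpp, 512u) < BLIT_PITCH_LIMIT;
}
```

Note that the alignment matters: a 8100-pixel-wide, 4-byte-per-pixel miptree
has a raw pitch of 32400 bytes, but rounds up to 32768 and so is rejected.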

With the previous 2 patches, all of these should be fixed:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37871
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44958
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53494

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c |   35 ++--
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index 5e0cd61..8d2b8a3 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -354,6 +354,18 @@ intel_miptree_create(struct intel_context *intel,
etc_format = (format != tex_format) ? tex_format : MESA_FORMAT_NONE;
base_format = _mesa_get_format_base_format(format);
 
+   mt = intel_miptree_create_layout(intel, target, format,
+ first_level, last_level, width0,
+ height0, depth0,
+ false, num_samples);
+   /*
+* pitch == 0 || height == 0  indicates the null texture
+*/
+   if (!mt || !mt->total_width || !mt->total_height) {
+  intel_miptree_release(&mt);
+  return NULL;
+   }
+
if (num_samples > 1) {
   /* From p82 of the Sandy Bridge PRM, dw3[1] of SURFACE_STATE (Tiled
* Surface):
@@ -377,20 +389,15 @@ intel_miptree_create(struct intel_context *intel,
 tiling = I915_TILING_Y;
   else if (force_y_tiling) {
  tiling = I915_TILING_Y;
-  } else if (width0 >= 64)
-tiling = I915_TILING_X;
-   }
-
-   mt = intel_miptree_create_layout(intel, target, format,
- first_level, last_level, width0,
- height0, depth0,
- false, num_samples);
-   /*
-* pitch == 0 || height == 0  indicates the null texture
-*/
-   if (!mt || !mt->total_width || !mt->total_height) {
-  intel_miptree_release(&mt);
-  return NULL;
+  } else if (width0 >= 64) {
+ if (ALIGN(mt->total_width * mt->cpp, 512) < 32768) {
+tiling = I915_TILING_X;
+ } else {
+perf_debug("%dx%d miptree too large to blit, "
+   "falling back to untiled",
+   mt->total_width, mt->total_height);
+ }
+  }
}
 
total_width = mt->total_width;
-- 
1.7.10.4



[Mesa-dev] [PATCH 2/3] intel: Do temporary CPU maps of textures that are too big to GTT map.

2013-04-02 Thread Eric Anholt
This still fails, since 8192*4bpp == 32768, which is too big to use the
blitter on.
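
The size heuristic this patch adds can be summarised by the following
standalone sketch. The constants come from the comments in the diff below;
the function names and the boolean "tiled" parameter are hypothetical
simplifications of the driver state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Estimate of the largest object we are willing to map through the GTT:
 * a quarter of the (assumed) mappable aperture, which is 256MB on most
 * hardware and 128MB on gen2.
 */
static uint32_t max_gtt_map_object_size(int gen)
{
   uint32_t gtt_size = 256u * 1024 * 1024;
   if (gen == 2)
      gtt_size = 128u * 1024 * 1024;
   return gtt_size / 4;
}

/* Tiled buffers at or above the threshold go through a temporary
 * linear buffer and the blitter instead of a direct GTT map.
 */
static bool should_map_via_blit(bool tiled, uint32_t bo_size, int gen)
{
   return tiled && bo_size >= max_gtt_map_object_size(gen);
}
```

The blit path in turn requires a pitch below 32768 bytes, which is why an
8192-wide, 4-byte-per-pixel texture (8192 * 4 == 32768) still fails here.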

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/intel/intel_mipmap_tree.c |   21 +
 1 file changed, 21 insertions(+)

diff --git a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
index ffdaec5..5e0cd61 100644
--- a/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/intel/intel_mipmap_tree.c
@@ -1703,6 +1703,23 @@ intel_miptree_map_singlesample(struct intel_context *intel,
 {
struct intel_miptree_map *map;
 
+   /* Estimate the size of the mappable aperture into the GTT.  There's an
+* ioctl to get the whole GTT size, but not one to get the mappable subset.
+* It turns out it's basically always 256MB, though some ancient hardware
+* was smaller.
+*/
+   uint32_t gtt_size = 256 * 1024 * 1024;
+   if (intel->gen == 2)
+  gtt_size = 128 * 1024 * 1024;
+
+   /* We don't want to map two objects such that a memcpy between them would
+* just fault one mapping in and then the other over and over forever.  So
+* we would need to divide the GTT size by 2.  Additionally, some GTT is
+* taken up by things like the framebuffer and the ringbuffer and such, so
+* be more conservative.
+*/
+   uint32_t max_gtt_map_object_size = gtt_size / 4;
+
assert(mt->num_samples <= 1);
 
map = intel_miptree_attach_map(mt, level, slice, x, y, w, h, mode);
@@ -1749,6 +1766,10 @@ intel_miptree_map_singlesample(struct intel_context *intel,
 mt->region->tiling == I915_TILING_X &&
 mt->region->pitch < 32768) {
   intel_miptree_map_blit(intel, mt, map, level, slice);
+   } else if (mt->region->tiling != I915_TILING_NONE &&
+  mt->region->bo->size >= max_gtt_map_object_size) {
+  assert(mt->region->pitch < 32768);
+  intel_miptree_map_blit(intel, mt, map, level, slice);
} else {
   intel_miptree_map_gtt(intel, mt, map, level, slice);
}
-- 
1.7.10.4



Re: [Mesa-dev] [PATCH] register_allocate: Fix the type of best_benefit.

2013-04-02 Thread Kenneth Graunke

On 04/02/2013 01:38 PM, Matt Turner wrote:

---
  src/mesa/program/register_allocate.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/mesa/program/register_allocate.c 
b/src/mesa/program/register_allocate.c
index a9064c3..7d11b73 100644
--- a/src/mesa/program/register_allocate.c
+++ b/src/mesa/program/register_allocate.c
@@ -561,7 +561,7 @@ int
  ra_get_best_spill_node(struct ra_graph *g)
  {
 unsigned int best_node = -1;
-   unsigned int best_benefit = 0.0;
+   float best_benefit = 0.0;
 unsigned int n;

 for (n = 0; n < g->count; n++) {


Yikes.  Good catch...
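
To illustrate why the type matters, here is a minimal sketch of a
"pick the largest benefit" loop in both variants. It is a simplified
stand-in, not the actual ra_get_best_spill_node() body; the function names
and the benefit array are assumptions for the example.

```c
/* Buggy variant: the running best is stored in an unsigned int, so every
 * assignment truncates the fractional part of the float benefit.
 */
static int best_index_buggy(const float *benefit, int count)
{
   int best = -1;
   unsigned int best_benefit = 0.0;

   for (int n = 0; n < count; n++) {
      if (benefit[n] > best_benefit) {
         best_benefit = benefit[n];   /* 1.9 is stored as 1 */
         best = n;
      }
   }
   return best;
}

/* Fixed variant: keep the running best as a float. */
static int best_index_fixed(const float *benefit, int count)
{
   int best = -1;
   float best_benefit = 0.0;

   for (int n = 0; n < count; n++) {
      if (benefit[n] > best_benefit) {
         best_benefit = benefit[n];
         best = n;
      }
   }
   return best;
}
```

With benefits { 1.2, 1.9, 1.5 } the buggy loop keeps comparing against a
truncated 1 and ends up picking the last entry rather than the largest one.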

Reviewed-by: Kenneth Graunke kenn...@whitecape.org



Re: [Mesa-dev] [PATCH 1/6] mesa: add texture gather changes

2013-04-02 Thread Kenneth Graunke

On 03/31/2013 02:10 AM, Chris Forbes wrote:

From: Maxence Le Dore maxence.led...@gmail.com

---
  src/mapi/glapi/gen/ARB_texture_gather.xml | 14 ++
  src/mapi/glapi/gen/gl_API.xml |  2 +-
  src/mesa/main/context.c   |  4 
  src/mesa/main/extensions.c|  1 +
  src/mesa/main/get.c   |  1 +
  src/mesa/main/get_hash_params.py  |  6 ++
  src/mesa/main/mtypes.h|  6 ++
  src/mesa/main/tests/enum_strings.cpp  |  3 +++
  8 files changed, 36 insertions(+), 1 deletion(-)
  create mode 100644 src/mapi/glapi/gen/ARB_texture_gather.xml

diff --git a/src/mapi/glapi/gen/ARB_texture_gather.xml 
b/src/mapi/glapi/gen/ARB_texture_gather.xml
new file mode 100644
index 000..cd331ac
--- /dev/null
+++ b/src/mapi/glapi/gen/ARB_texture_gather.xml
@@ -0,0 +1,14 @@
+<?xml version="1.0"?>
+<!DOCTYPE OpenGLAPI SYSTEM "gl_API.dtd">
+
+<OpenGLAPI>
+
+<category name="GL_ARB_texture_gather" number="72">
+
+ <enum name="MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB" value="0x8E5E"/>
+ <enum name="MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB" value="0x8E5F"/>
+ <enum name="MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB" value="0x8F9F"/>
+
+</category>
+
+</OpenGLAPI>
\ No newline at end of file
diff --git a/src/mapi/glapi/gen/gl_API.xml b/src/mapi/glapi/gen/gl_API.xml
index 75957dc..9a957d1 100644
--- a/src/mapi/glapi/gen/gl_API.xml
+++ b/src/mapi/glapi/gen/gl_API.xml
@@ -8188,7 +8188,7 @@

  <!-- 70. GL_ARB_sample_shading -->
  <xi:include href="ARB_texture_cube_map_array.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
-<!-- 72. GL_ARB_texture_gather -->
+<xi:include href="ARB_texture_gather.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
  <!-- 73. GL_ARB_texture_query_lod -->

  <!-- ARB extension number 74 is a WGL extension. -->
diff --git a/src/mesa/main/context.c b/src/mesa/main/context.c
index 0539934..d4e773b 100644
--- a/src/mesa/main/context.c
+++ b/src/mesa/main/context.c
@@ -647,6 +647,10 @@ _mesa_init_constants(struct gl_context *ctx)
 ctx->Const.MinProgramTexelOffset = -8;
 ctx->Const.MaxProgramTexelOffset = 7;

+   /* GL_ARB_texture_gather */
+   ctx->Const.MinProgramTextureGatherOffset = -8;
+   ctx->Const.MaxProgramTextureGatherOffset = 7;
+
 /* GL_ARB_robustness */
 ctx->Const.ResetStrategy = GL_NO_RESET_NOTIFICATION_ARB;

diff --git a/src/mesa/main/extensions.c b/src/mesa/main/extensions.c
index 3116692..593ed1a 100644
--- a/src/mesa/main/extensions.c
+++ b/src/mesa/main/extensions.c
@@ -141,6 +141,7 @@ static const struct extension extension_table[] = {
 { "GL_ARB_texture_env_crossbar", o(ARB_texture_env_crossbar), GLL, 2001 },
 { "GL_ARB_texture_env_dot3", o(ARB_texture_env_dot3), GLL, 2001 },
 { "GL_ARB_texture_float", o(ARB_texture_float), GL, 2004 },
+   { "GL_ARB_texture_gather", o(ARB_texture_gather), GL, 2009 },
 { "GL_ARB_texture_mirrored_repeat", o(dummy_true), GLL, 2001 },
 { "GL_ARB_texture_multisample", o(ARB_texture_multisample), GL, 2009 },
 { "GL_ARB_texture_non_power_of_two", o(ARB_texture_non_power_of_two), GL, 2003 },
diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 582ef31..a182ab4 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -356,6 +356,7 @@ EXTRA_EXT(ARB_map_buffer_alignment);
  EXTRA_EXT(ARB_texture_cube_map_array);
  EXTRA_EXT(ARB_texture_buffer_range);
  EXTRA_EXT(ARB_texture_multisample);
+EXTRA_EXT(ARB_texture_gather);

  static const int
  extra_NV_primitive_restart[] = {
diff --git a/src/mesa/main/get_hash_params.py b/src/mesa/main/get_hash_params.py
index 7d4f7e2..1941d53 100644
--- a/src/mesa/main/get_hash_params.py
+++ b/src/mesa/main/get_hash_params.py
@@ -709,6 +709,12 @@ descriptor=[

  # GL_ARB_texture_cube_map_array
[ "TEXTURE_BINDING_CUBE_MAP_ARRAY_ARB", "LOC_CUSTOM, TYPE_INT, TEXTURE_CUBE_ARRAY_INDEX", "extra_ARB_texture_cube_map_array" ],
+
+# GL_ARB_texture_gather
+  [ "MIN_PROGRAM_TEXTURE_GATHER_OFFSET_ARB", "CONTEXT_INT(Const.MinProgramTextureGatherOffset)", "extra_ARB_texture_gather"],
+  [ "MAX_PROGRAM_TEXTURE_GATHER_OFFSET_ARB", "CONTEXT_INT(Const.MaxProgramTextureGatherOffset)", "extra_ARB_texture_gather"],
+  [ "MAX_PROGRAM_TEXTURE_GATHER_COMPONENTS_ARB", "CONTEXT_INT(Const.MaxProgramTextureGatherComponents)", "extra_ARB_texture_gather"],
+


Maybe drop the ARB suffixes?  They shouldn't be necessary.


  ]},

  # Enums restricted to OpenGL Core profile
diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
index e47e835..37e4b61 100644
--- a/src/mesa/main/mtypes.h
+++ b/src/mesa/main/mtypes.h
@@ -2860,6 +2860,11 @@ struct gl_constants
 /** GL_EXT_gpu_shader4 */
 GLint MinProgramTexelOffset, MaxProgramTexelOffset;

+   /** GL_ARB_texture_gather 

Re: [Mesa-dev] [PATCH 4/6] i965/fs: Add support for ir_tg4

2013-04-02 Thread Kenneth Graunke

On 03/31/2013 02:10 AM, Chris Forbes wrote:

Lowers ir_tg4 (from textureGather and textureGatherOffset builtins) to
SHADER_OPCODE_TG4.

The usual post-sampling swizzle workaround can't work for ir_tg4,
so avoid doing that:

* For R/G/B/A swizzles use the hardware channel select (lives in the
same dword in the header as the texel offset), and then don't do
anything afterward in the shader.
* For 0/1 swizzles blast the appropriate constant over all the output
channels in swizzle_result().
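
The channel-select scheme described in the two points above can be sketched
in plain C as follows. The enum and function names are hypothetical
stand-ins for the Mesa swizzle constants; placing the channel in bits 16-17
follows the "M0.2:16-17" comment in the patch, while masking the texel
offset to the low 16 bits is an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for SWIZZLE_X .. SWIZZLE_ONE. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

/* Map the red-channel swizzle to the hardware gather channel.
 * ZERO/ONE have no hardware equivalent, so fall back to channel 0 and
 * let the shader overwrite the result afterwards.
 */
static uint32_t gather_channel(enum swz red_swizzle)
{
   switch (red_swizzle) {
   case SWZ_X: return 0;
   case SWZ_Y: return 1;
   case SWZ_Z: return 2;
   case SWZ_W: return 3;
   default:    return 0;
   }
}

/* Combine the texel offset bits and the channel select into one dword. */
static uint32_t pack_header(uint32_t texel_offset_bits, enum swz red_swizzle)
{
   return (texel_offset_bits & 0xffffu) | (gather_channel(red_swizzle) << 16);
}
```

Because both fields share one header dword, the same immediate move that
writes the texel offset can carry the channel select as well.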

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 55 
  2 files changed, 56 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.h 
b/src/mesa/drivers/dri/i965/brw_fs.h
index d9d17a2..bc93bdf 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.h
+++ b/src/mesa/drivers/dri/i965/brw_fs.h
@@ -250,6 +250,7 @@ public:
 void visit(ir_function *ir);
 void visit(ir_function_signature *ir);

+   uint32_t gather_channel(ir_texture *ir, int sampler);
 void swizzle_result(ir_texture *ir, fs_reg orig_val, int sampler);

 bool can_do_source_mods(fs_inst *inst);
diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
index 8556b56..2b77883 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
@@ -1119,6 +1119,14 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
base_mrf--;
 }

+   if (ir->op == ir_tg4 && !header_present) {
+  /* ir_tg4 needs to place its channel select in the header,
+   * for interaction with ARB_texture_swizzle */
+  header_present = true;
+  mlen++;
+  base_mrf--;
+   }


I'm not a fan of duplicating this block.  Why not just change the one 
above's condition to:


   if (ir->op == ir_tg4 || (ir->offset && ir->op != ir_txf)) {

Feel free to keep the comment if you like.


+
 if (ir->shadow_comparitor) {
emit(MOV(fs_reg(MRF, base_mrf + mlen), shadow_c));
mlen += reg_width;
@@ -1128,6 +1136,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
 switch (ir->op) {
 case ir_tex:
 case ir_lod:
+   case ir_tg4:
break;
 case ir_txb:
emit(MOV(fs_reg(MRF, base_mrf + mlen), lod));
@@ -1242,6 +1251,7 @@ fs_visitor::emit_texture_gen7(ir_texture *ir, fs_reg dst, fs_reg coordinate,
 case ir_txf_ms: inst = emit(SHADER_OPCODE_TXF_MS, dst); break;
 case ir_txs: inst = emit(SHADER_OPCODE_TXS, dst); break;
 case ir_lod: inst = emit(SHADER_OPCODE_LOD, dst); break;
+   case ir_tg4: inst = emit(SHADER_OPCODE_TG4, dst); break;
 }
 inst->base_mrf = base_mrf;
 inst->mlen = mlen;
@@ -1394,6 +1404,7 @@ fs_visitor::visit(ir_texture *ir)
 switch (ir->op) {
 case ir_tex:
 case ir_lod:
+   case ir_tg4:
break;
 case ir_txb:
ir->lod_info.bias->accept(this);
@@ -1416,6 +1427,8 @@ fs_visitor::visit(ir_texture *ir)
ir->lod_info.sample_index->accept(this);
sample_index = this->result;
break;
+   default:
+  assert(!"Unrecognized texture opcode");
 };

 /* Writemasking doesn't eliminate channels on SIMD8 texture
@@ -1440,6 +1453,9 @@ fs_visitor::visit(ir_texture *ir)
 if (ir->offset != NULL && ir->op != ir_txf)
inst->texture_offset = brw_texture_offset(ir->offset->as_constant());

+   if (ir->op == ir_tg4)
+  inst->texture_offset |= gather_channel(ir, sampler) << 16; // M0.2:16-17


Clever.  Not a bad approach, but perhaps we should rename the field to 
something more general.  Then again, message_header_bits isn't much 
better...



 inst->sampler = sampler;

 if (ir->shadow_comparitor)
@@ -1460,6 +1476,24 @@ fs_visitor::visit(ir_texture *ir)
  }

  /**
+ * Set up the gather channel based on the swizzle, for gather4.
+ */
+uint32_t
+fs_visitor::gather_channel(ir_texture *ir, int sampler)
+{
+   int swiz = GET_SWZ(c->key.tex.swizzles[sampler], 0 /* red */);
+   switch (swiz) {
+  case SWIZZLE_X: return 0;
+  case SWIZZLE_Y: return 1;
+  case SWIZZLE_Z: return 2;
+  case SWIZZLE_W: return 3;
+  default:
+ /* zero, one swizzles */
+ return 0;
+   }
+}
+
+/**
   * Swizzle the result of a texture result.  This is necessary for
   * EXT_texture_swizzle as well as DEPTH_TEXTURE_MODE for shadow comparisons.
   */
@@ -1468,9 +1502,30 @@ fs_visitor::swizzle_result(ir_texture *ir, fs_reg orig_val, int sampler)
  {
 this->result = orig_val;

+   /* txs isn't actually sampling the texture */
 if (ir->op == ir_txs)
return;

+   /* tg4 does the channel select in hardware for 'real' swizzles, but can't
+* do the degenerate ZERO/ONE cases, so we do them here:
+*
+* blast all the output channels with zero or one as appropriate
+*/


So, if texture swizzling selects the channel...then zero/one are stupid.

It might be 

Re: [Mesa-dev] [PATCH 5/6] i965/vs: Add support for ir_tg4

2013-04-02 Thread Kenneth Graunke

On 03/31/2013 02:10 AM, Chris Forbes wrote:

Pretty much the same as the FS case. Channel select goes in the header,
post-sampling swizzle only does the 0/1 cases.

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
  src/mesa/drivers/dri/i965/brw_vec4.h   |  1 +
  src/mesa/drivers/dri/i965/brw_vec4_emit.cpp|  2 +-
  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 47 --
  3 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h 
b/src/mesa/drivers/dri/i965/brw_vec4.h
index 1f832d1..36c7312 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_vec4.h
@@ -443,6 +443,7 @@ public:
 void emit_pack_half_2x16(dst_reg dst, src_reg src0);
 void emit_unpack_half_2x16(dst_reg dst, src_reg src0);

+   uint32_t gather_channel(ir_texture *ir, int sampler);
 void swizzle_result(ir_texture *ir, src_reg orig_val, int sampler);

 void emit_ndc_computation();
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
index 7938c14..d427469 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_emit.cpp
@@ -354,7 +354,7 @@ vec4_generator::generate_tex(vec4_instruction *inst,
brw_MOV(p,
  retype(brw_vec1_reg(BRW_MESSAGE_REGISTER_FILE, inst->base_mrf, 2),
 BRW_REGISTER_TYPE_UD),
- brw_imm_uw(inst->texture_offset));
+ brw_imm_ud(inst->texture_offset));
brw_pop_insn_state(p);
 } else if (inst->header_present) {
/* Set up an implied move from g0 to the MRF. */
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 8bd2fd8..95cfc3b 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -2128,6 +2128,7 @@ vec4_visitor::visit(ir_texture *ir)
break;
 case ir_txb:
 case ir_lod:
+   case ir_tg4:
break;
 }

@@ -2149,15 +2150,21 @@ vec4_visitor::visit(ir_texture *ir)
 case ir_txs:
inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TXS);
break;
+   case ir_tg4:
+  inst = new(mem_ctx) vec4_instruction(this, SHADER_OPCODE_TG4);
+  break;
 case ir_txb:
assert(!"TXB is not valid for vertex shaders.");
break;
 case ir_lod:
assert(!"LOD is not valid for vertex shaders.");
break;
+   default:
+  assert(!"Unrecognized tex op");
 }

-   bool use_texture_offset = ir->offset != NULL && ir->op != ir_txf;
+   bool use_texture_offset = (ir->offset != NULL && ir->op != ir_txf)
+  || ir->op == ir_tg4;


I'd prefer to leave this as is, and instead...


 /* Texel offsets go in the message header; Gen4 also requires headers. */
 inst->header_present = use_texture_offset || intel->gen < 5;


inst->header_present =
   use_texture_offset || ir->op == ir_tg4 || intel->gen < 5;


@@ -2168,9 +2175,13 @@ vec4_visitor::visit(ir_texture *ir)
 inst->dst.writemask = WRITEMASK_XYZW;
 inst->shadow_compare = ir->shadow_comparitor != NULL;

-   if (use_texture_offset)
+   if (use_texture_offset && ir->offset)
inst->texture_offset = brw_texture_offset(ir->offset->as_constant());


Then you can leave this alone too...


+   /* Stuff the channel select bits in the top of the texture offset */
+   if (ir->op == ir_tg4)
+  inst->texture_offset |= gather_channel(ir, sampler)<<16;
+
 /* MRF for the first parameter */
 int param_base = inst->base_mrf + inst->header_present;

@@ -2290,6 +2301,24 @@ vec4_visitor::visit(ir_texture *ir)
 swizzle_result(ir, src_reg(inst->dst), sampler);
  }

+/**
+ * Set up the gather channel based on the swizzle, for gather4.
+ */
+uint32_t
+vec4_visitor::gather_channel(ir_texture *ir, int sampler)
+{
+   int swiz = GET_SWZ(c->key.tex.swizzles[sampler], 0 /* red */);
+   switch (swiz) {
+  case SWIZZLE_X: return 0;
+  case SWIZZLE_Y: return 1;
+  case SWIZZLE_Z: return 2;
+  case SWIZZLE_W: return 3;
+  default:
+ /* zero, one swizzles */
+ return 0;
+   }
+}
+
  void
  vec4_visitor::swizzle_result(ir_texture *ir, src_reg orig_val, int sampler)
  {
@@ -2304,6 +2333,20 @@ vec4_visitor::swizzle_result(ir_texture *ir, src_reg orig_val, int sampler)
return;
 }

+   /* ir_tg4 does its swizzling in hardware, except for ZERO/ONE degenerate
+* cases, which we'll do here
+*/
+   if (ir->op == ir_tg4) {
+  int swiz = GET_SWZ(s,0);
+  if (swiz != SWIZZLE_ZERO && swiz != SWIZZLE_ONE) {
+ emit(MOV(swizzled_result, orig_val));
+ return;
+  }
+
+  emit(MOV(swizzled_result, src_reg(swiz == SWIZZLE_ONE ? 1.0f : 0.0f)));
+  return;
+   }


Again, we should probably do this earlier in visit(ir_texture *).  Then 
you can just add || ir->op == ir_tg4 to the above block which 
short-circuits.



+
 int zero_mask = 0, one_mask = 0, copy_mask = 0;
 int 

Re: [Mesa-dev] [PATCH 6/6] i965: Enable ARB_texture_gather on Gen7

2013-04-02 Thread Kenneth Graunke

On 03/31/2013 04:01 PM, Matt Turner wrote:

On Sun, Mar 31, 2013 at 2:10 AM, Chris Forbes chr...@ijw.co.nz wrote:

Signed-off-by: Chris Forbes chr...@ijw.co.nz
---
  src/mesa/drivers/dri/i965/brw_context.c   | 1 +
  src/mesa/drivers/dri/intel/intel_extensions.c | 4 
  2 files changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index ceaf325..e8f9c60 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -210,6 +210,7 @@ brwCreateContext(int api,
ctx->Const.MaxColorTextureSamples = 8;
ctx->Const.MaxDepthTextureSamples = 8;
ctx->Const.MaxIntegerSamples = 8;
+  ctx->Const.MaxProgramTextureGatherComponents = 4;
 }

 /* if conformance mode is set, swrast can handle any size AA point */
diff --git a/src/mesa/drivers/dri/intel/intel_extensions.c 
b/src/mesa/drivers/dri/intel/intel_extensions.c
index 9efdee4..450c84d 100755
--- a/src/mesa/drivers/dri/intel/intel_extensions.c
+++ b/src/mesa/drivers/dri/intel/intel_extensions.c
@@ -110,6 +110,10 @@ intelInitExtensions(struct gl_context *ctx)
ctx->Extensions.ARB_texture_multisample = true;
 }

+   if (intel->gen == 7) {
+  ctx->Extensions.ARB_texture_gather = true;
+   }
+


Put this above the intel-gen (|=)= 6 blocks? Also update GL3.txt :)


Nope, because Chris hasn't implemented it for Sandybridge in this 
series.  It would need MaxProgramTextureGatherComponents = 1 for starts, 
and probably some workarounds...




Re: [Mesa-dev] [RFC PATCH] ARB_texture_gather support for Gen7 i965.

2013-04-02 Thread Kenneth Graunke

On 03/31/2013 02:10 AM, Chris Forbes wrote:

This series implements ARB_texture_gather in core mesa, and the
driver side for Gen7 i965.

Not quite baked -- green/blue/alpha texture swizzles with VS don't
work yet. Everything else works, though (R/0/1 swizzles in VS; all
swizzles in FS; textureGather and textureGatherOffset).

The first two patches are pretty much what Maxence sent out, but tidied
up so they work; the third patch of that original series (st + softpipe)
is dropped for now, but shouldn't be hard to reinclude.


Patches 1-3 are:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org