Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Pohjolainen, Topi
On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote:
 This fixes bugs with special cases where we have arrays of
 structures containing samplers or arrays of samplers.
 
 I've verified that patch results in calculating same index value as
 returned by _mesa_get_sampler_uniform_value for IR. Patch makes
 following ES3 conformance test pass:
 
   ES3-CTS.shaders.struct.uniform.sampler_array_fragment
 
 Signed-off-by: Tapani Pälli tapani.pa...@intel.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
 ---
  src/glsl/nir/nir_lower_samplers.cpp | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/src/glsl/nir/nir_lower_samplers.cpp 
 b/src/glsl/nir/nir_lower_samplers.cpp
 index cf8ab83..9859cc0 100644
 --- a/src/glsl/nir/nir_lower_samplers.cpp
 +++ b/src/glsl/nir/nir_lower_samplers.cpp
 @@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct 
 gl_shader_program *shader_progr
   instr-sampler_index *= glsl_get_length(deref-type);
   switch (deref_array-deref_array_type) {
   case nir_deref_array_type_direct:
 -instr-sampler_index += deref_array-base_offset;
 +
 +/* If this is an array of samplers. */

Above the case is for arrays and below you check for the sampler. This
comment does not tell much extra :)

 +if (deref-child-type-base_type == GLSL_TYPE_SAMPLER)
 +   instr-sampler_index += deref_array-base_offset;
 +
  if (deref_array-deref.child)
 ralloc_asprintf_append(name, [%u], 
 deref_array-base_offset);
  break;
 -- 
 2.1.0
 
 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 86701] [regression] weston-simple-egl not running anymore inside qemu

2015-05-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=86701

--- Comment #14 from Pekka Paalanen ppaala...@gmail.com ---
Patch series posted by Axel Davy:
http://lists.freedesktop.org/archives/mesa-dev/2015-May/083254.html
Reviews combined from Daniel Stone and Dave Airlie cover it all.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Tapani Pälli
This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.

I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:

ES3-CTS.shaders.struct.uniform.sampler_array_fragment

Signed-off-by: Tapani Pälli tapani.pa...@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
---
 src/glsl/nir/nir_lower_samplers.cpp | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_samplers.cpp 
b/src/glsl/nir/nir_lower_samplers.cpp
index cf8ab83..9859cc0 100644
--- a/src/glsl/nir/nir_lower_samplers.cpp
+++ b/src/glsl/nir/nir_lower_samplers.cpp
@@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct 
gl_shader_program *shader_progr
  instr-sampler_index *= glsl_get_length(deref-type);
  switch (deref_array-deref_array_type) {
  case nir_deref_array_type_direct:
-instr-sampler_index += deref_array-base_offset;
+
+/* If this is an array of samplers. */
+if (deref-child-type-base_type == GLSL_TYPE_SAMPLER)
+   instr-sampler_index += deref_array-base_offset;
+
 if (deref_array-deref.child)
ralloc_asprintf_append(name, [%u], deref_array-base_offset);
 break;
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/6] mesa/es3.1: enable GL_ARB_shader_image_load_store for gles3.1

2015-05-08 Thread marta . lofstedt


 On 05/08/2015 12:13 AM, Ian Romanick wrote:
 On 05/07/2015 12:57 AM, Marta Lofstedt wrote:
 From: Marta Lofstedt marta.lofst...@intel.com

 v2: only expose enums from GL_ARB_shader_image_load_store
 for gles 3.1 and GL core

 Signed-off-by: Marta Lofstedt marta.lofst...@intel.com
 ---
   src/mesa/main/get.c  |  6 ++
   src/mesa/main/get_hash_params.py | 17 -
   2 files changed, 14 insertions(+), 9 deletions(-)

 diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
 index 9898197..73739b6 100644
 --- a/src/mesa/main/get.c
 +++ b/src/mesa/main/get.c
 @@ -355,6 +355,12 @@ static const int extra_ARB_draw_indirect_es31[] =
 {
  EXTRA_END
   };

 +static const int extra_ARB_shader_image_load_store_es31[] = {
 +   EXT(ARB_shader_image_load_store),
 +   EXTRA_API_ES31,

 I think you're missing the patch that adds EXTRA_API_ES31.  Did you
 forget to send that one out?

 Marta's series builds on top of my patch here that adds EXTRA_API_ES31:

 http://lists.freedesktop.org/archives/mesa-dev/2015-May/083593.html

 Also, on a few of these patches, I think the old, non-_es31 set of
 requirements can be removed due to no longer being used.


Ian, are you referring to the stuff that is left with the
extra_ARB_shader_atomic_counters_and_geometry_shader: do you want me to
remove those and it's companion struct in get.c?

 +   EXTRA_END
 +};
 +
   EXTRA_EXT(ARB_texture_cube_map);
   EXTRA_EXT(EXT_texture_array);
   EXTRA_EXT(NV_fog_distance);
 diff --git a/src/mesa/main/get_hash_params.py
 b/src/mesa/main/get_hash_params.py
 index 513d5d2..85c2494 100644
 --- a/src/mesa/main/get_hash_params.py
 +++ b/src/mesa/main/get_hash_params.py
 @@ -413,6 +413,14 @@ descriptor=[
   { apis: [GL_CORE, GLES3], params: [
   # GL_ARB_draw_indirect / GLES 3.1
 [ DRAW_INDIRECT_BUFFER_BINDING, LOC_CUSTOM, TYPE_INT, 0,
 extra_ARB_draw_indirect_es31 ],
 +# GL_ARB_shader_image_load_store / GLES 3.1
 +  [ MAX_IMAGE_UNITS, CONTEXT_INT(Const.MaxImageUnits),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS,
 CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_IMAGE_SAMPLES, CONTEXT_INT(Const.MaxImageSamples),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_VERTEX_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_GEOMETRY_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_FRAGMENT_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms),
 extra_ARB_shader_image_load_store_es31],
 +  [ MAX_COMBINED_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.MaxCombinedImageUniforms),
 extra_ARB_shader_image_load_store_es31],
   ]},

   # Remaining enums are only in OpenGL
 @@ -780,15 +788,6 @@ descriptor=[
 [ MAX_VERTEX_ATTRIB_RELATIVE_OFFSET,
 CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA ],
 [ MAX_VERTEX_ATTRIB_BINDINGS,
 CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA ],

 -# GL_ARB_shader_image_load_store
 -  [ MAX_IMAGE_UNITS, CONTEXT_INT(Const.MaxImageUnits),
 extra_ARB_shader_image_load_store],
 -  [ MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS,
 CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs),
 extra_ARB_shader_image_load_store],
 -  [ MAX_IMAGE_SAMPLES, CONTEXT_INT(Const.MaxImageSamples),
 extra_ARB_shader_image_load_store],
 -  [ MAX_VERTEX_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms),
 extra_ARB_shader_image_load_store],
 -  [ MAX_GEOMETRY_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms),
 extra_ARB_shader_image_load_store_and_geometry_shader],
 -  [ MAX_FRAGMENT_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms),
 extra_ARB_shader_image_load_store],
 -  [ MAX_COMBINED_IMAGE_UNIFORMS,
 CONTEXT_INT(Const.MaxCombinedImageUniforms),
 extra_ARB_shader_image_load_store],
 -
   # GL_ARB_compute_shader
 [ MAX_COMPUTE_WORK_GROUP_INVOCATIONS,
 CONTEXT_INT(Const.MaxComputeWorkGroupInvocations),
 extra_ARB_compute_shader ],
 [ MAX_COMPUTE_UNIFORM_BLOCKS, CONST(MAX_COMPUTE_UNIFORM_BLOCKS),
 extra_ARB_compute_shader ],


 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Tapani Pälli



On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote:

On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote:

This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.

I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:

ES3-CTS.shaders.struct.uniform.sampler_array_fragment

Signed-off-by: Tapani Pälli tapani.pa...@intel.com
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
---
  src/glsl/nir/nir_lower_samplers.cpp | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_samplers.cpp 
b/src/glsl/nir/nir_lower_samplers.cpp
index cf8ab83..9859cc0 100644
--- a/src/glsl/nir/nir_lower_samplers.cpp
+++ b/src/glsl/nir/nir_lower_samplers.cpp
@@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct 
gl_shader_program *shader_progr
   instr-sampler_index *= glsl_get_length(deref-type);
   switch (deref_array-deref_array_type) {
   case nir_deref_array_type_direct:
-instr-sampler_index += deref_array-base_offset;
+
+/* If this is an array of samplers. */


Above the case is for arrays and below you check for the sampler. This
comment does not tell much extra :)


Yeah, not sure how to change it. What I want to state here is that only 
for arrays of samplers we need to do this, otherwise we don't. The only 
other case really is array of structs that contain sampler so maybe I 
should state that instead?




+if (deref-child-type-base_type == GLSL_TYPE_SAMPLER)
+   instr-sampler_index += deref_array-base_offset;
+
  if (deref_array-deref.child)
 ralloc_asprintf_append(name, [%u], 
deref_array-base_offset);
  break;
--
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE

2015-05-08 Thread Tapani Pälli

That is strange, it was introduced in fb370f89d but then has gone missing ..

Reviewed-by: Tapani Pälli tapani.pa...@intel.com


On 05/07/2015 06:13 PM, Marta Lofstedt wrote:

The return type for GL_VERTEX_BINDING_STRIDE is missing,
this cause glGetIntegeri_v to fail.

Signed-off-by: Marta Lofstedt marta.lofst...@linux.intel.com
---
  src/mesa/main/get.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 6fc0f3f..9fb8fba 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -1959,6 +1959,7 @@ find_value_indexed(const char *func, GLenum pname, GLuint 
index, union value *v)
if (index = ctx-Const.Program[MESA_SHADER_VERTEX].MaxAttribs)
goto invalid_value;
v-value_int = 
ctx-Array.VAO-VertexBinding[VERT_ATTRIB_GENERIC(index)].Stride;
+  return TYPE_INT;

 /* ARB_shader_image_load_store */
 case GL_IMAGE_BINDING_NAME: {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Kenneth Graunke
According to Glenn, shifts on R600 have 5x the throughput as multiplies.

Intel GPUs have strange integer multiplication restrictions - on most
hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
means the arguments aren't commutative, which can limit our constant
propagation options.  SHL has no such restrictions.

Shifting is probably reasonable on most people's hardware, so let's just
do that.

i965 shader-db results (using NIR for VS):
total instructions in shared programs: 7432587 - 7388982 (-0.59%)
instructions in affected programs: 1360411 - 1316806 (-3.21%)
helped:5772
HURT:  0

Signed-off-by: Kenneth Graunke kenn...@whitecape.org
Cc: matts...@gmail.com
Cc: ja...@jlekstrand.net
---
 src/glsl/nir/nir_opt_algebraic.py | 5 +
 1 file changed, 5 insertions(+)

So...I found a bizarre issue with this patch.

   (('imul', 4, a), ('ishl', a, 2)),

totally optimizes things.  However...

   (('imul', a, 4), ('ishl', a, 2)),

doesn't seem to do anything, even though imul is commutative, and nir_search
should totally handle that...

 ▄▄  ▄▄▄▄    ▄   ▄▄
 ██  ██   ▀▀▀██▀▀▀  ███  ██
 ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
  ██ ██ ██   ██  ██  ██   ▄██▀   ██
  ███▀▀███   ██  ██   ██ ▀▀
  ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
  ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀

If you know why, let me know, otherwise I may have to look into it when more
awake.

diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index 400d60e..350471f 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -247,6 +247,11 @@ late_optimizations = [
(('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
(('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
(('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
+
+   # Multiplication by 4 comes up fairly often in indirect offset calculations.
+   # Some GPUs have weird integer multiplication limitations, but shifts 
should work
+   # equally well everywhere.
+   (('imul', 4, a), ('ishl', a, 2)),
 ]
 
 print nir_algebraic.AlgebraicPass(nir_opt_algebraic, optimizations).render()
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Kenneth Graunke
On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
 On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
  GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
 
 total instructions in shared programs: 2724483 - 2711790 (-0.47%)
 instructions in affected programs: 1860859 - 1848166 (-0.68%)
 helped:4387
 HURT:  4758
 GAINED:1499
 
  The gained programs are ARB vertext programs that were previously going
  through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
  programs can go through the scalar backend so they show up as gained in
  the shader-db results.
 
 Again, I'm kind of confused and disappointed that we're just okay with
 hurting 4700 programs without more analysis. I guess I'll go do
 that...

I took a stab at that tonight.  The good news is, the majority of the
hurt is pretty stupid.  Indirect uniform address calculations involve
a lot of integer multiplication by 4.

For whatever reason, we're getting 4*x instead of x*4, which doesn't
support immediates.  So we get:

MOV tmp 4
MUL dst tmp x

Normally, constant propagation would commute the operands, giving us
MUL dst x 4 like we want.  But it sees integer multiplication and
chickens out, due to the asymmetry on some platforms.

I think we can extend that - on Broadwell it should work fine, and
might work fine for 16-bit immediates on Gen7 and Cherryview, too.

Alternatively, I wrote a nir_opt_algebraic_late optimization that turns
4*x into x  2, which works around the problem, and is also apparently
much better for R600.

Statistics on the shift patch are:

total instructions in shared programs: 7432587 - 7388982 (-0.59%)
instructions in affected programs: 1360411 - 1316806 (-3.21%)
helped:5772
HURT:  0

Statistics for GLSL IR vs. NIR+(4*x = x  2):

total instructions in shared programs: 7232451 - 7175983 (-0.78%)
instructions in affected programs: 1586917 - 1530449 (-3.56%)
helped:5780
HURT:  1654

which is much better.

Looking at a couple of the shaders that are still worse off...it looks
like a ton of Source shaders used to do MUL/ADD with an attribute and
two immediates, and now are doing MOV/MOV/MAD.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 05:46 PM, Pohjolainen, Topi wrote:
 On Thu, May 07, 2015 at 04:43:21PM +0300, Pohjolainen, Topi wrote:
 On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote:
 This patch implements the binding table enable command which is also
 used to allocate a binding table pool where hardware-generated
 binding table entries are flushed into. Each binding table offset in
 the binding table pool is unique per each shader stage that are
 enabled within a batch.

 Also insert the required brw_tracked_state objects to enable
 hw-generated binding tables in normal render path.

 Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
 ---
  src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 
 ++
  src/mesa/drivers/dri/i965/brw_context.c|  4 ++
  src/mesa/drivers/dri/i965/brw_context.h|  5 ++
  src/mesa/drivers/dri/i965/brw_state.h  |  7 +++
  src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
  src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
  6 files changed, 92 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
 b/src/mesa/drivers/dri/i965/brw_binding_tables.c
 index 459165a..a58e32e 100644
 --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
 +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
 @@ -44,6 +44,11 @@
  #include brw_state.h
  #include intel_batchbuffer.h
  
 +/* Somehow the hw-binding table pool offset must start here, otherwise
 + * the GPU will hang
 + */
 +#define HW_BT_START_OFFSET 256;

 I think we want to understand this a little better before enabling...

Actually, now that I remember I had my notes somewhere when I first
enabled this hw-binding table over a year ago. I dug it up and 256 is
the actually the size of a single stage's hw-binding table state
expressed in hw-binding table format. Details:

From the Bspec 3DSTATE_BINDING_TABLE_POINTERS_x  Pointer to PS Binding
Table section lists the format as:

SurfaceStateOffset[16:6]BINDING_TABLE_STATE*256 When
HW-generated binding table is enabled

So this is 16-bits[1] x 256 = 512 bytes.

Now this offset must be expressed in Pointer to PS Binding Table using
the hw-generated binding table format which must be aligned to 16:6.
However the bit entry field in dw1 of 3DSTATE_BINDING_TABLE_POINTERS_x
must be set within 15:5 so this value should be  1. Hence, the 256
(similar case is evident on function gen7_update_binding_table() in
patch 4 of this series).

Seems the RS hardware is extremely intolerant of even slight variations
hence the hungs when this is not followed closely.

In the next version, I can make the magic numbers a bit more clearer.

[1] 3D-Media-GPGPU Engine  Shared Functions  3D Sampler  State  HW
Generated BINDING_TABLE_STATE


 +
  /**
   * Upload a shader stage's binding table as indirect state.
   *
 @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = {
 .emit = brw_gs_upload_binding_table,
  };
  
 +/**
 + * Hardware-generated binding tables for the resource streamer
 + */
 +void
 +gen7_disable_hw_binding_tables(struct brw_context *brw)
 +{
 +   BEGIN_BATCH(3);
 +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (3 - 2));
 +   OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, 
 BRW_HW_BINDING_TABLE_ENABLE) |
 + brw-is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0);
 +   OUT_BATCH(0);
 +   ADVANCE_BATCH();
 +
 +   /* Pipe control workaround */
 +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
 +}
 +
 +void
 +gen7_enable_hw_binding_tables(struct brw_context *brw)
 +{
 +   if (!brw-has_resource_streamer) {
 +  gen7_disable_hw_binding_tables(brw);

 I started wondering why we really need this - RS is disabled by default and
 we haven't needed to do anything to disable it before.
 
 Right, patch number eight gave me the answer, we want to disable it for blorp.
 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] glx/dri3: Add additional check for gpu offloading case

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy axel.d...@ens.fr wrote:
 Checks blitImage is implemented.
 Initially having the __DRIimageExtension extension
 at version 9 at least meant blitImage was supported.
 However some implementations do advertise version = 9
 without implementing it.

Afaict in the above mentioned case we'll just segfault. A worthy
mesa-stable material imho.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 7/9] egl/wayland: assume EGL_WINDOW_BIT

2015-05-08 Thread Axel Davy

Le 06/05/2015 03:00, Dave Airlie a écrit :

On 2 May 2015 at 20:15, Axel Davy axel.d...@ens.fr wrote:

Only EGL_WINDOW_BIT is supported. Remove tests related.

Is this there no plans to support pixmap/pbuffer/ or any of the other bits?

Seems like a step in the wrong direction if we really should be supporting
other things than WINDOW_BIT in the future.

Dave.

I double checked, and it really seems pbuffers are not supported by the 
current mesa implementation for wayland.

However the spec doesn't tell they shouldn't be supported.

What about changing  the commit message to:
egl/wayland: simplify dri2_wl_create_surface

This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden
for Wayland, and PBuffers are unimplemented.



I'm also fine with dropping the patch.

Yours,

Axel Davy

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value

2015-05-08 Thread Jason Ekstrand
Previously, this case was being handled in match_expression prior to
calling match_value.  However, there is really no good reason for this
given that match_value has all of the information it needs.  Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.

Cc: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir_search.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index 5ba0160..821b86d 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
 {
uint8_t new_swizzle[4];
 
+   /* If the source is an explicitly sized source, then we need to reset
+* both the number of components and the swizzle.
+*/
+   if (nir_op_infos[instr-op].input_sizes[src] != 0) {
+  num_components = nir_op_infos[instr-op].input_sizes[src];
+  swizzle = identity_swizzle;
+   }
+
for (int i = 0; i  num_components; ++i)
   new_swizzle[i] = instr-src[src].swizzle[swizzle[i]];
 
@@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
 
bool matched = true;
for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
-  /* If the source is an explicitly sized source, then we need to reset
-   * both the number of components and the swizzle.
-   */
-  if (nir_op_infos[instr-op].input_sizes[i] != 0) {
- num_components = nir_op_infos[instr-op].input_sizes[i];
- swizzle = identity_swizzle;
-  }
-
   if (!match_value(expr-srcs[i], instr, i, num_components,
swizzle, state)) {
  matched = false;
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Francisco Jerez
Connor Abbott cwabbo...@gmail.com writes:

 On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez curroje...@riseup.net wrote:
 ---
  src/glsl/nir/glsl_to_nir.cpp | 125 
 +++
  1 file changed, 114 insertions(+), 11 deletions(-)

 diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
 index f6b8331..a01ab3b 100644
 --- a/src/glsl/nir/glsl_to_nir.cpp
 +++ b/src/glsl/nir/glsl_to_nir.cpp
 @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir)
   op = nir_intrinsic_atomic_counter_inc_var;
} else if (strcmp(ir-callee_name(), 
 __intrinsic_atomic_predecrement) == 0) {
   op = nir_intrinsic_atomic_counter_dec_var;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_load) == 0) {
 + op = nir_intrinsic_image_load;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_store) == 0) 
 {
 + op = nir_intrinsic_image_store;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_add) 
 == 0) {
 + op = nir_intrinsic_image_atomic_add;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_min) 
 == 0) {
 + op = nir_intrinsic_image_atomic_min;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_max) 
 == 0) {
 + op = nir_intrinsic_image_atomic_max;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_and) 
 == 0) {
 + op = nir_intrinsic_image_atomic_and;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_or) 
 == 0) {
 + op = nir_intrinsic_image_atomic_or;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_xor) 
 == 0) {
 + op = nir_intrinsic_image_atomic_xor;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_exchange) == 0) {
 + op = nir_intrinsic_image_atomic_exchange;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_comp_swap) == 0) {
 + op = nir_intrinsic_image_atomic_comp_swap;
} else {
   unreachable(not reached);
}

nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
 -  ir_dereference *param =
 - (ir_dereference *) ir-actual_parameters.get_head();
 -  instr-variables[0] = evaluate_deref(instr-instr, param);
 -  nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 +
 +  switch (op) {
 +  case nir_intrinsic_atomic_counter_read_var:
 +  case nir_intrinsic_atomic_counter_inc_var:
 +  case nir_intrinsic_atomic_counter_dec_var: {
 + ir_dereference *param =
 +(ir_dereference *) ir-actual_parameters.get_head();
 + instr-variables[0] = evaluate_deref(instr-instr, param);
 + nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 + break;
 +  }
 +  case nir_intrinsic_image_load:
 +  case nir_intrinsic_image_store:
 +  case nir_intrinsic_image_atomic_add:
 +  case nir_intrinsic_image_atomic_min:
 +  case nir_intrinsic_image_atomic_max:
 +  case nir_intrinsic_image_atomic_and:
 +  case nir_intrinsic_image_atomic_or:
 +  case nir_intrinsic_image_atomic_xor:
 +  case nir_intrinsic_image_atomic_exchange:
 +  case nir_intrinsic_image_atomic_comp_swap: {
 + nir_load_const_instr *instr_zero = 
 nir_load_const_instr_create(shader, 1);
 + instr_zero-value.u[0] = 0;
 + nir_instr_insert_after_cf_list(this-cf_node_list, 
 instr_zero-instr);
 +
 + /* Set the image variable dereference. */
 + exec_node *param = ir-actual_parameters.get_head();
 + ir_dereference *image = (ir_dereference *)param;
 + const glsl_type *type =
 +image-variable_referenced()-type-without_array();
 +
 + instr-variables[0] = evaluate_deref(instr-instr, image);
 + param = param-get_next();
 +
 + /* Set the address argument, extending the coordinate vector to 
 four
 +  * components.
 +  */
 + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
 + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
 nir_op_vec4);
 + nir_ssa_dest_init(instr_addr-instr, instr_addr-dest.dest, 4, 
 NULL);
 +
 + for (int i = 0; i  4; i++) {
 +if (i  type-coordinate_components()) {
 +   instr_addr-src[i].src = src_addr;
 +   instr_addr-src[i].swizzle[0] = i;
 +} else {
 +   instr_addr-src[i].src = nir_src_for_ssa(instr_zero-def);

 I think it would better convey the intent to create an ssa_undef_instr
 and use that here instead of zero. Unless something else relies on the
 extra coordinates being zeroed?

Yeah, that would work too.  Zero makes some sense because an
n-dimensional image is largely equivalent to an n+1-dimensional image
with the last dimension equal to one.  Some implementation
*cough*i965*cough* actually requires you to send a whole vec4 of
coordinates (in SIMD4x2 mode) no matter 

Re: [Mesa-dev] Mesa 10.6 release plan

2015-05-08 Thread Emil Velikov
Hello all,

On 21 April 2015 at 13:40, Emil Velikov emil.l.veli...@gmail.com wrote:
 Hi all,

 Here is the preliminary release schedule for Mesa 10.6. Personally I
 would love if we can go up to Mesa 11.0 (Spinal Tap anyone ?) although
 it might happen with the September release.

 May 15th 2015 - Feature freeze/Release candidate 1
 May 22st 2015 - Release candidate 2
 May 29th 2015 - Release candidate 3
 June 5th 2015 - Release candidate 4/Mesa 10.6.0

 This means that we have roughly four weeks to get new features in, and
 another four weeks to get all the serious bugs sorted out.

 Does this sound reasonable with the different teams ? If anyone has
 something special in mind please speak up.

This is a gentle reminder that we have one week until the 10.6 branch
point. As far as I can see we have a few extensions which may or may
not make it in time. I would be fine if we need an extra week to
finish up any of the following extensions considering their importance
and/or how long people have been working on them. If you feel the same
way please speak up, otherwise the schedule will stay as outlined
previously.

GL_ARB_shader_subroutine
GL_ARB_tessellation_shader
GL_ARB_shader_image_load_store
GL_ARB_direct_state_access

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress

2015-05-08 Thread Pohjolainen, Topi
On Fri, May 08, 2015 at 03:23:52PM +0300, Juha-Pekka Heikkil? wrote:
perjantai 8. toukokuuta 2015 Ian Romanick i...@freedesktop.org kirjoitti:
 
  On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote:
   On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote:
   Stop context creation if something failed. If something errored
   during context creation we'd segfault. Now will clean up and
   return error.
  
   Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
   ---
src/mesa/main/shared.c | 66
  +++---
1 file changed, 62 insertions(+), 4 deletions(-)
  
   diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c
   index 0b76cc0..cc05b05 100644
   --- a/src/mesa/main/shared.c
   +++ b/src/mesa/main/shared.c
   @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
  
   mtx_init(shared-Mutex, mtx_plain);
  
   +   /* Mutex and timestamp for texobj state validation */
   +   mtx_init(shared-TexMutex, mtx_recursive);
   +   shared-TextureStateStamp = 0;
  
   Do you really need to move this here?
 
  I was going to ask the same thing.  I think moving it here means that it
  can be unconditionally mtx_destroy'ed in the error path below.
 
Yes, Ian got it correct. When moving mutex creation here there is no need
to go checking about it if need to clean up. I though this makes things
more clean and simple.

As it can't fail, I would have moved it right before the return in the
success path. Saves you from adding the mutex tear-down in the failure path.

 
 
   
 
   +
   shared-DisplayList = _mesa_NewHashTable();
   +   if (!shared-DisplayList)
   +  goto error_out;
   +
   shared-TexObjects = _mesa_NewHashTable();
   +   if (!shared-TexObjects)
   +  goto error_out;
   +
   shared-Programs = _mesa_NewHashTable();
   +   if (!shared-Programs)
   +  goto error_out;
  
   shared-DefaultVertexProgram =
  gl_vertex_program(ctx-Driver.NewProgram(ctx,
   @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
   
   GL_FRAGMENT_PROGRAM_ARB, 0));
  
   shared-ATIShaders = _mesa_NewHashTable();
   +   if (!shared-ATIShaders)
   +  goto error_out;
   +
   shared-DefaultFragmentShader =
  _mesa_new_ati_fragment_shader(ctx, 0);
  
   shared-ShaderObjects = _mesa_NewHashTable();
   +   if (!shared-ShaderObjects)
   +  goto error_out;
  
   shared-BufferObjects = _mesa_NewHashTable();
   +   if (!shared-BufferObjects)
   +  goto error_out;
  
   /* GL_ARB_sampler_objects */
   shared-SamplerObjects = _mesa_NewHashTable();
   +   if (!shared-SamplerObjects)
   +  goto error_out;
  
   /* Allocate the default buffer object */
   shared-NullBufferObj = ctx-Driver.NewBufferObject(ctx, 0);
   +   if (!shared-NullBufferObj)
   +   goto error_out;
  
   /* Create default texture objects */
   for (i = 0; i  NUM_TEXTURE_TARGETS; i++) {
   @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context
  *ctx)
  };
  STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS);
  shared-DefaultTex[i] = ctx-Driver.NewTextureObject(ctx, 0,
  targets[i]);
   +
   +  if (!shared-DefaultTex[i])
   +  goto error_out;
   }
  
   /* sanity check */
   assert(shared-DefaultTex[TEXTURE_1D_INDEX]-RefCount == 1);
  
   -   /* Mutex and timestamp for texobj state validation */
   -   mtx_init(shared-TexMutex, mtx_recursive);
   -   shared-TextureStateStamp = 0;
   -
   shared-FrameBuffers = _mesa_NewHashTable();
   +   if (!shared-FrameBuffers)
   +  goto error_out;
   +
   shared-RenderBuffers = _mesa_NewHashTable();
   +   if (!shared-RenderBuffers)
   +  goto error_out;
  
   shared-SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer,
  _mesa_key_pointer_equal);
   +   if (!shared-SyncObjects)
   +  goto error_out;
  
   return shared;
   +
   +error_out:
   +   for (i = 0; i  NUM_TEXTURE_TARGETS; i++) {
   +  if (shared-DefaultTex[i]) {
   + ctx-Driver.DeleteTexture(ctx, shared-DefaultTex[i]);
   +  }
   +   }
   +
   +   _mesa_reference_buffer_object(ctx, shared-NullBufferObj, NULL);
   +
   +   _mesa_DeleteHashTable(shared-RenderBuffers);
   +   _mesa_DeleteHashTable(shared-FrameBuffers);
   +   _mesa_DeleteHashTable(shared-SamplerObjects);
   +   

Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.

2015-05-08 Thread Francisco Jerez
Connor Abbott cwabbo...@gmail.com writes:

 On IRC, Ken and I were discussing using a scheme inspired by SPIR-V,
 which has an OpImagePointer instruction that forms a pointer to the
 particular texel of the image as well as
 OpAtomic{Load,Store,Exchange,etc.} that operate on an image or shared
 buffer pointer. The advantages would be:

 * Makes translating from SPIR-V easier.
 * Reduces the number of intrinsics we need to add for SSBO support.
 * Reduces the combinatorial explosion enough that we can have separate
 versions for 2, 3, and 4 components and MS vs. non-MS without it being
 unbearable. I'm not sure how much of a benefit that would be though.

 The disadvantages I can think of are:

 * Doesn't actually save any code in the i965 backend, since we need to
 do different things depending on if the pointer is to an image or a
 shared buffer anyways.
 * We'd have to special case nir_convert_from_ssa to ignore the SSA
 value that's really a pointer since we don't have any real type-level
 support for pointers.
 * Since we lower to SSA before converting to i965, there are some ugly
 edge cases when the coordinate argument becomes part of a phi web and
 gets potentially overwritten before the instruction that uses the
 pointer.

Yeah, I actually played around with a SPIR-V-like approach, and decided
to give up the idea in the end mainly because of the disadvantages you
mention, because it's nothing close to what the back-ends will want.

Two other ideas occurred to me that could have made the back-end's life
easier, but it didn't seem like they were worth doing:

 - Write a driver-independent lowering pass to convert SPIR-V-style
   intrinsics into coordinate-based intrinsics while still in SSA form.
   In that case there's likely no point in having the SPIR-V-style
   intrinsics in the first place, no back-end I know of will prefer
   image intrinsics in that form at this point, and we can just let the
   SPIR-V front-end deal with this problem alone.

 - Actually agree on a representation for a texel pointer.  A texel
   pointer would be a triple of pointer-to-object, vec4 of coordinates
   and sample index, together with some static metadata determining the
   memory access properties.  Something more back-end-specific would
   likely work too.  In any case we would also have to agree on a
   representation for a pointer to an image/SSB object.  And we would
   need to type-check pointers correctly to prevent the optimizer from
   doing illegal transformations (e.g. a single store intrinsic that
   writes to either coherent or non-coherent memory depending on the
   parent block, or, if we adhere to the limitations of GLSL strictly, a
   single store intrinsic that might access images based on different
   array uniforms).

   It gets even more interesting if you consider the limitations some
   back-ends have about accessing images non-uniformly -- We would have
   to guarantee that the pointer-to-object inside the texel pointer has
   the same value for all thread invocations executing a given load,
   store or atomic intrinsic, what implies that we would have to forbid
   texel pointers in phi instructions unless it can be proven that
   either the control flow incident into the node is uniform *or* the
   pointer-to-object coming from all parent branches is the same.

Sounds like a lot of work for little benefit at this point.

 I don't have a preference one way or the other, and I guess we could
 always refactor it later if we wanted to, so assuming Ken is OK with
 this, then besides one minor comment on patch 4 the series is

 Reviewed-by: Connor Abbott cwabbo...@gmail.com

Thanks.

 On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez curroje...@riseup.net wrote:
 ---
  src/glsl/nir/nir_intrinsics.h | 27 +++
  1 file changed, 27 insertions(+)

 diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
 index 8e28765..4b13c75 100644
 --- a/src/glsl/nir/nir_intrinsics.h
 +++ b/src/glsl/nir/nir_intrinsics.h
 @@ -89,6 +89,33 @@ ATOMIC(inc, 0)
  ATOMIC(dec, 0)
  ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE)

 +/*
 + * Image load, store and atomic intrinsics.
 + *
 + * All image intrinsics take an image target passed as a nir_variable.  
 Image
 + * variables contain a number of memory and layout qualifiers that influence
 + * the semantics of the intrinsic.
 + *
 + * All image intrinsics take a four-coordinate vector and a sample index as
 + * first two sources, determining the location within the image that will be
 + * accessed by the intrinsic.  Components not applicable to the image target
 + * in use are equal to zero by convention.  Image store takes an additional
 + * four-component argument with the value to be written, and image atomic
 + * operations take either one or two additional scalar arguments with the 
 same
 + * meaning as in the ARB_shader_image_load_store specification.
 + */
 +INTRINSIC(image_load, 2, ARR(4, 1), true, 4, 1, 0,
 +  

Re: [Mesa-dev] [PATCH 1/9] egl/wayland: properly destroy wayland objects

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy axel.d...@ens.fr wrote:
 the wl_registry and the wl_queue allocated weren't destroyed.

Would this have any affect other than leaking memory ? If so can you
please add the mesa-stable tag (prior to pushing).

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
 On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
  GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
 
 total instructions in shared programs: 2724483 - 2711790 (-0.47%)
 instructions in affected programs: 1860859 - 1848166 (-0.68%)
 helped:4387
 HURT:  4758
 GAINED:1499
 
  The gained programs are ARB vertext programs that were previously going
  through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
  programs can go through the scalar backend so they show up as gained in
  the shader-db results.

 Again, I'm kind of confused and disappointed that we're just okay with
 hurting 4700 programs without more analysis. I guess I'll go do
 that...

 I took a stab at that tonight.  The good news is, the majority of the
 hurt is pretty stupid.  Indirect uniform address calculations involve
 a lot of integer multiplication by 4.

 For whatever reason, we're getting 4*x instead of x*4, which doesn't
 support immediates.  So we get:

 MOV tmp 4
 MUL dst tmp x

 Normally, constant propagation would commute the operands, giving us
 MUL dst x 4 like we want.  But it sees integer multiplication and
 chickens out, due to the asymmetry on some platforms.

Right.  I just sent out a patch that puts immediates in src1 for
commutative ALU ops at the emit stage.  We probably still want to do
something in constant propagation in case we can fold something, but
it fixes the problem for now.  It also helps even more than the
shifting patch.

I don't know.  Maybe we just want to make constant propagation do the
right thing on BDW+.  Matt?
--jason

 I think we can extend that - on Broadwell it should work fine, and
 might work fine for 16-bit immediates on Gen7 and Cherryview, too.

 Alternatively, I wrote a nir_opt_algebraic_late optimization that turns
 4*x into x  2, which works around the problem, and is also apparently
 much better for R600.

 Statistics on the shift patch are:

 total instructions in shared programs: 7432587 - 7388982 (-0.59%)
 instructions in affected programs: 1360411 - 1316806 (-3.21%)
 helped:5772
 HURT:  0

 Statistics for GLSL IR vs. NIR+(4*x = x  2):

 total instructions in shared programs: 7232451 - 7175983 (-0.78%)
 instructions in affected programs: 1586917 - 1530449 (-3.56%)
 helped:5780
 HURT:  1654

 which is much better.

 Looking at a couple of the shaders that are still worse off...it looks
 like a ton of Source shaders used to do MUL/ADD with an attribute and
 two immediates, and now are doing MOV/MOV/MAD.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 Shader-db results for fragment shaders on Broadwell:

total instructions in shared programs: 4310987 - 4310663 (-0.01%)
instructions in affected programs: 43242 - 42918 (-0.75%)
helped:142
HURT:  0

 Shader-db results for vertex shaders on Broadwell:

total instructions in shared programs: 2889581 - 2844309 (-1.57%)
instructions in affected programs: 1418720 - 1373448 (-3.19%)
helped:6139
HURT:  0
 ---
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
  1 file changed, 12 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index 555987d..161a262 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -21,6 +21,8 @@
   * IN THE SOFTWARE.
   */

 +#include algorithm
 +
  #include glsl/ir.h
  #include glsl/ir_optimization.h
  #include glsl/nir/glsl_to_nir.h
 @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
op[i] = offset(op[i], instr-src[i].swizzle[channel]);
 }

 +   /* Immediates can only be used as the second source of two-source
 +* instructions.  We have code in opt_algebraic to flip them as needed
 +* for most instructions.  However, it doesn't hurt anything to just do
 +* the right thing if we can detect it at the NIR level.
 +*/
 +   if ((nir_op_infos[instr-op].algebraic_properties  
 NIR_OP_IS_COMMUTATIVE) 
 +   nir_src_as_const_value(instr-src[0].src)) {
 +  std::swap(op[0], op[1]);
 +   }
 +

The real problem is that we haven't lifted the restriction about
non-commutative integer multiply on Broadwell:

brw_fs_copy_propagation.cpp:

   /* Fit this constant in by commuting the operands.
* Exception: we can't do this for 32-bit integer MUL/MACH
* because it's asymmetric.
*/
   if ((inst-opcode == BRW_OPCODE_MUL ||
inst-opcode == BRW_OPCODE_MACH) 
   (inst-src[1].type == BRW_REGISTER_TYPE_D ||
inst-src[1].type == BRW_REGISTER_TYPE_UD))
  break;
   inst-src[0] = inst-src[1];
   inst-src[1] = val;
   progress = true;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] egl/swrast: enable config extension for swrast

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy axel.d...@ens.fr wrote:
 Enables to use dri config for swrast, like vblank_mode.

 Signed-off-by: Axel Davy axel.d...@ens.fr
 ---
  src/egl/drivers/dri2/egl_dri2.c| 21 ++---
  src/gallium/state_trackers/dri/drisw.c |  1 +
  src/mesa/drivers/dri/swrast/swrast.c   |  1 +
  3 files changed, 12 insertions(+), 11 deletions(-)

 diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
 index 689d5ec..1434580 100644
 --- a/src/egl/drivers/dri2/egl_dri2.c
 +++ b/src/egl/drivers/dri2/egl_dri2.c
 @@ -559,6 +559,7 @@ dri2_create_screen(_EGLDisplay *disp)
  {
 const __DRIextension **extensions;
 struct dri2_egl_display *dri2_dpy;
 +   unsigned i;

 dri2_dpy = disp-DriverData;

 @@ -599,25 +600,23 @@ dri2_create_screen(_EGLDisplay *disp)
 extensions = dri2_dpy-core-getExtensions(dri2_dpy-dri_screen);

 if (dri2_dpy-dri2) {
 -  unsigned i;
 -
if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions))
   goto cleanup_dri_screen;
 -
 -  for (i = 0; extensions[i]; i++) {
 -if (strcmp(extensions[i]-name, __DRI2_ROBUSTNESS) == 0) {
 -dri2_dpy-robustness = (__DRIrobustnessExtension *) 
 extensions[i];
 -}
 -if (strcmp(extensions[i]-name, __DRI2_CONFIG_QUERY) == 0) {
 -   dri2_dpy-config = (__DRI2configQueryExtension *) extensions[i];
 -}
 -  }
 } else {
assert(dri2_dpy-swrast);
if (!dri2_bind_extensions(dri2_dpy, swrast_core_extensions, 
 extensions))
   goto cleanup_dri_screen;
 }

 +   for (i = 0; extensions[i]; i++) {
 +  if (strcmp(extensions[i]-name, __DRI2_ROBUSTNESS) == 0) {
 + dri2_dpy-robustness = (__DRIrobustnessExtension *) extensions[i];
 +  }
 +  if (strcmp(extensions[i]-name, __DRI2_CONFIG_QUERY) == 0) {
 + dri2_dpy-config = (__DRI2configQueryExtension *) extensions[i];
 +  }
 +   }
 +
Side note: Applying this hunk might lead to conflict due to the newly
introduced __DRI2_FENCE. Afaict one can move it as well similarly to
robustness and config_query.

Cheers
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Connor Abbott
On Fri, May 8, 2015 at 10:30 AM, Francisco Jerez curroje...@riseup.net wrote:
 Connor Abbott cwabbo...@gmail.com writes:

 On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez curroje...@riseup.net 
 wrote:
 ---
  src/glsl/nir/glsl_to_nir.cpp | 125 
 +++
  1 file changed, 114 insertions(+), 11 deletions(-)

 diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
 index f6b8331..a01ab3b 100644
 --- a/src/glsl/nir/glsl_to_nir.cpp
 +++ b/src/glsl/nir/glsl_to_nir.cpp
 @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir)
   op = nir_intrinsic_atomic_counter_inc_var;
} else if (strcmp(ir-callee_name(), 
 __intrinsic_atomic_predecrement) == 0) {
   op = nir_intrinsic_atomic_counter_dec_var;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_load) == 0) 
 {
 + op = nir_intrinsic_image_load;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_store) == 
 0) {
 + op = nir_intrinsic_image_store;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_add) 
 == 0) {
 + op = nir_intrinsic_image_atomic_add;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_min) 
 == 0) {
 + op = nir_intrinsic_image_atomic_min;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_max) 
 == 0) {
 + op = nir_intrinsic_image_atomic_max;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_and) 
 == 0) {
 + op = nir_intrinsic_image_atomic_and;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_or) 
 == 0) {
 + op = nir_intrinsic_image_atomic_or;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_xor) 
 == 0) {
 + op = nir_intrinsic_image_atomic_xor;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_exchange) == 0) {
 + op = nir_intrinsic_image_atomic_exchange;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_comp_swap) == 0) {
 + op = nir_intrinsic_image_atomic_comp_swap;
} else {
   unreachable(not reached);
}

nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
 -  ir_dereference *param =
 - (ir_dereference *) ir-actual_parameters.get_head();
 -  instr-variables[0] = evaluate_deref(instr-instr, param);
 -  nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 +
 +  switch (op) {
 +  case nir_intrinsic_atomic_counter_read_var:
 +  case nir_intrinsic_atomic_counter_inc_var:
 +  case nir_intrinsic_atomic_counter_dec_var: {
 + ir_dereference *param =
 +(ir_dereference *) ir-actual_parameters.get_head();
 + instr-variables[0] = evaluate_deref(instr-instr, param);
 + nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 + break;
 +  }
 +  case nir_intrinsic_image_load:
 +  case nir_intrinsic_image_store:
 +  case nir_intrinsic_image_atomic_add:
 +  case nir_intrinsic_image_atomic_min:
 +  case nir_intrinsic_image_atomic_max:
 +  case nir_intrinsic_image_atomic_and:
 +  case nir_intrinsic_image_atomic_or:
 +  case nir_intrinsic_image_atomic_xor:
 +  case nir_intrinsic_image_atomic_exchange:
 +  case nir_intrinsic_image_atomic_comp_swap: {
 + nir_load_const_instr *instr_zero = 
 nir_load_const_instr_create(shader, 1);
 + instr_zero-value.u[0] = 0;
 + nir_instr_insert_after_cf_list(this-cf_node_list, 
 instr_zero-instr);
 +
 + /* Set the image variable dereference. */
 + exec_node *param = ir-actual_parameters.get_head();
 + ir_dereference *image = (ir_dereference *)param;
 + const glsl_type *type =
 +image-variable_referenced()-type-without_array();
 +
 + instr-variables[0] = evaluate_deref(instr-instr, image);
 + param = param-get_next();
 +
 + /* Set the address argument, extending the coordinate vector to 
 four
 +  * components.
 +  */
 + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
 + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
 nir_op_vec4);
 + nir_ssa_dest_init(instr_addr-instr, instr_addr-dest.dest, 4, 
 NULL);
 +
 + for (int i = 0; i  4; i++) {
 +if (i  type-coordinate_components()) {
 +   instr_addr-src[i].src = src_addr;
 +   instr_addr-src[i].swizzle[0] = i;
 +} else {
 +   instr_addr-src[i].src = nir_src_for_ssa(instr_zero-def);

 I think it would better convey the intent to create an ssa_undef_instr
 and use that here instead of zero. Unless something else relies on the
 extra coordinates being zeroed?

 Yeah, that would work too.  Zero makes some sense because an
 n-dimensional image is largely equivalent to an n+1-dimensional image
 with the last dimension equal to one.  Some implementation
 *cough*i965*cough* 

Re: [Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value

2015-05-08 Thread Connor Abbott
Makes sense.

Reviewed-by: Connor Abbott cwabbo...@gmail.com

On Fri, May 8, 2015 at 11:38 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 Previously, this case was being handled in match_expression prior to
 calling match_value.  However, there is really no good reason for this
 given that match_value has all of the information it needs.  Also, they
 weren't being handled properly in the commutative case and putting it in
 match_value gives us that for free.

 Cc: Connor Abbott cwabbo...@gmail.com
 ---
  src/glsl/nir/nir_search.c | 16 
  1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index 5ba0160..821b86d 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr 
 *instr, unsigned src,
  {
 uint8_t new_swizzle[4];

 +   /* If the source is an explicitly sized source, then we need to reset
 +* both the number of components and the swizzle.
 +*/
 +   if (nir_op_infos[instr-op].input_sizes[src] != 0) {
 +  num_components = nir_op_infos[instr-op].input_sizes[src];
 +  swizzle = identity_swizzle;
 +   }
 +
 for (int i = 0; i  num_components; ++i)
new_swizzle[i] = instr-src[src].swizzle[swizzle[i]];

 @@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,

 bool matched = true;
 for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
 -  /* If the source is an explicitly sized source, then we need to reset
 -   * both the number of components and the swizzle.
 -   */
 -  if (nir_op_infos[instr-op].input_sizes[i] != 0) {
 - num_components = nir_op_infos[instr-op].input_sizes[i];
 - swizzle = identity_swizzle;
 -  }
 -
if (!match_value(expr-srcs[i], instr, i, num_components,
 swizzle, state)) {
   matched = false;
 --
 2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 Shader-db results for fragment shaders on Broadwell:

total instructions in shared programs: 4310987 - 4310663 (-0.01%)
instructions in affected programs: 43242 - 42918 (-0.75%)
helped:142
HURT:  0

 Shader-db results for vertex shaders on Broadwell:

total instructions in shared programs: 2889581 - 2844309 (-1.57%)
instructions in affected programs: 1418720 - 1373448 (-3.19%)
helped:6139
HURT:  0
 ---
  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
  1 file changed, 12 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 index 555987d..161a262 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 @@ -21,6 +21,8 @@
   * IN THE SOFTWARE.
   */

 +#include algorithm
 +
  #include glsl/ir.h
  #include glsl/ir_optimization.h
  #include glsl/nir/glsl_to_nir.h
 @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
op[i] = offset(op[i], instr-src[i].swizzle[channel]);
 }

 +   /* Immediates can only be used as the second source of two-source
 +* instructions.  We have code in opt_algebraic to flip them as needed
 +* for most instructions.  However, it doesn't hurt anything to just do
 +* the right thing if we can detect it at the NIR level.
 +*/
 +   if ((nir_op_infos[instr-op].algebraic_properties  
 NIR_OP_IS_COMMUTATIVE) 
 +   nir_src_as_const_value(instr-src[0].src)) {
 +  std::swap(op[0], op[1]);

We don't use STL.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 18:06, Axel Davy d...@clipper.ens.fr wrote:
 On Fri, 8 May 2015, Emil Velikov wrote:

 Shouldn't we authenticate with the correct gpu or master/render node ?
 This implementation will auth with GPU1, and then use GPU2 which seems
 a bit odd. I might be missing something ?



 The original patches did do differently: when GPU1 was discovered to not be
 the wanted gpu, it was not authenticating to it.

 However loader_get_user_preferred_fd was introduced and thus that leads to
 something different (which I think is cleaner):

 We authenticate to GPU1, so we have one node we can render to.

 loader_get_user_preferred_fd takes the GPU1 fd, and eventually replaces
 it by GPU2 fd if user wants it and it is possible.

 Given this way of doing, we it makes sense to auth to GPU1, even if we end
 up rendering to GPU2.
No objections against loader_get_user_preferred_fd. I'm thinking that
if we push it into drm_handle_device() we can avoid a needless (in
some cases) auth. Not sure how much it's worth, but it seems like a
good idea imho.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Francisco Jerez
v2: Undefine coordinate components not applicable to the target.

---
 src/glsl/nir/glsl_to_nir.cpp | 126 +++
 1 file changed, 115 insertions(+), 11 deletions(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index f6b8331..fab4508 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_atomic_counter_inc_var;
   } else if (strcmp(ir-callee_name(), __intrinsic_atomic_predecrement) 
== 0) {
  op = nir_intrinsic_atomic_counter_dec_var;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_load) == 0) {
+ op = nir_intrinsic_image_load;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_store) == 0) {
+ op = nir_intrinsic_image_store;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_add) == 
0) {
+ op = nir_intrinsic_image_atomic_add;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_min) == 
0) {
+ op = nir_intrinsic_image_atomic_min;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_max) == 
0) {
+ op = nir_intrinsic_image_atomic_max;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_and) == 
0) {
+ op = nir_intrinsic_image_atomic_and;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_or) == 
0) {
+ op = nir_intrinsic_image_atomic_or;
+  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_xor) == 
0) {
+ op = nir_intrinsic_image_atomic_xor;
+  } else if (strcmp(ir-callee_name(), 
__intrinsic_image_atomic_exchange) == 0) {
+ op = nir_intrinsic_image_atomic_exchange;
+  } else if (strcmp(ir-callee_name(), 
__intrinsic_image_atomic_comp_swap) == 0) {
+ op = nir_intrinsic_image_atomic_comp_swap;
   } else {
  unreachable(not reached);
   }
 
   nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
-  ir_dereference *param =
- (ir_dereference *) ir-actual_parameters.get_head();
-  instr-variables[0] = evaluate_deref(instr-instr, param);
-  nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
+
+  switch (op) {
+  case nir_intrinsic_atomic_counter_read_var:
+  case nir_intrinsic_atomic_counter_inc_var:
+  case nir_intrinsic_atomic_counter_dec_var: {
+ ir_dereference *param =
+(ir_dereference *) ir-actual_parameters.get_head();
+ instr-variables[0] = evaluate_deref(instr-instr, param);
+ nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
+ break;
+  }
+  case nir_intrinsic_image_load:
+  case nir_intrinsic_image_store:
+  case nir_intrinsic_image_atomic_add:
+  case nir_intrinsic_image_atomic_min:
+  case nir_intrinsic_image_atomic_max:
+  case nir_intrinsic_image_atomic_and:
+  case nir_intrinsic_image_atomic_or:
+  case nir_intrinsic_image_atomic_xor:
+  case nir_intrinsic_image_atomic_exchange:
+  case nir_intrinsic_image_atomic_comp_swap: {
+ nir_ssa_undef_instr *instr_undef =
+nir_ssa_undef_instr_create(shader, 1);
+ nir_instr_insert_after_cf_list(this-cf_node_list,
+instr_undef-instr);
+
+ /* Set the image variable dereference. */
+ exec_node *param = ir-actual_parameters.get_head();
+ ir_dereference *image = (ir_dereference *)param;
+ const glsl_type *type =
+image-variable_referenced()-type-without_array();
+
+ instr-variables[0] = evaluate_deref(instr-instr, image);
+ param = param-get_next();
+
+ /* Set the address argument, extending the coordinate vector to four
+  * components.
+  */
+ const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
+ nir_alu_instr *instr_addr = nir_alu_instr_create(shader, nir_op_vec4);
+ nir_ssa_dest_init(instr_addr-instr, instr_addr-dest.dest, 4, 
NULL);
+
+ for (int i = 0; i  4; i++) {
+if (i  type-coordinate_components()) {
+   instr_addr-src[i].src = src_addr;
+   instr_addr-src[i].swizzle[0] = i;
+} else {
+   instr_addr-src[i].src = nir_src_for_ssa(instr_undef-def);
+}
+ }
+
+ nir_instr_insert_after_cf_list(cf_node_list, instr_addr-instr);
+ instr-src[0] = nir_src_for_ssa(instr_addr-dest.dest.ssa);
+ param = param-get_next();
+
+ /* Set the sample argument, which is undefined for single-sample
+  * images.
+  */
+ if (type-sampler_dimensionality == GLSL_SAMPLER_DIM_MS) {
+instr-src[1] = evaluate_rvalue((ir_dereference *)param);
+param = param-get_next();
+ } else {
+instr-src[1] = nir_src_for_ssa(instr_undef-def);
+ }
+
+ /* Set the intrinsic 

Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:36 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 According to Glenn, shifts on R600 have 5x the throughput as multiplies.

 Intel GPUs have strange integer multiplication restrictions - on most
 hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
 means the arguments aren't commutative, which can limit our constant
 propagation options.  SHL has no such restrictions.

 Shifting is probably reasonable on most people's hardware, so let's just
 do that.

 i965 shader-db results (using NIR for VS):
 total instructions in shared programs: 7432587 - 7388982 (-0.59%)
 instructions in affected programs: 1360411 - 1316806 (-3.21%)
 helped:5772
 HURT:  0

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: matts...@gmail.com
 Cc: ja...@jlekstrand.net
 ---
  src/glsl/nir/nir_opt_algebraic.py | 5 +
  1 file changed, 5 insertions(+)

 So...I found a bizarre issue with this patch.

(('imul', 4, a), ('ishl', a, 2)),

 totally optimizes things.  However...

(('imul', a, 4), ('ishl', a, 2)),

 doesn't seem to do anything, even though imul is commutative, and nir_search
 should totally handle that...

  ▄▄  ▄▄▄▄    ▄   ▄▄
  ██  ██   ▀▀▀██▀▀▀  ███  ██
  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
   ██ ██ ██   ██  ██  ██   ▄██▀   ██
   ███▀▀███   ██  ██   ██ ▀▀
   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀

 If you know why, let me know, otherwise I may have to look into it when more
 awake.

I figured it out and I have a patch.  Unfortunately, it regresses a
few programs and looses 8 SIMD8 programs so I'm doing some more
investigation.  I'll send it out soon.

 diff --git a/src/glsl/nir/nir_opt_algebraic.py 
 b/src/glsl/nir/nir_opt_algebraic.py
 index 400d60e..350471f 100644
 --- a/src/glsl/nir/nir_opt_algebraic.py
 +++ b/src/glsl/nir/nir_opt_algebraic.py
 @@ -247,6 +247,11 @@ late_optimizations = [
 (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
 (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
 (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
 +
 +   # Multiplication by 4 comes up fairly often in indirect offset 
 calculations.
 +   # Some GPUs have weird integer multiplication limitations, but shifts 
 should work
 +   # equally well everywhere.
 +   (('imul', 4, a), ('ishl', a, 2)),
  ]

  print nir_algebraic.AlgebraicPass(nir_opt_algebraic, 
 optimizations).render()
 --
 2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
Shader-db results for fragment shaders on Broadwell:

   total instructions in shared programs: 4310987 - 4310663 (-0.01%)
   instructions in affected programs: 43242 - 42918 (-0.75%)
   helped:142
   HURT:  0

Shader-db results for vertex shaders on Broadwell:

   total instructions in shared programs: 2889581 - 2844309 (-1.57%)
   instructions in affected programs: 1418720 - 1373448 (-3.19%)
   helped:6139
   HURT:  0
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 555987d..161a262 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */
 
+#include algorithm
+
 #include glsl/ir.h
 #include glsl/ir_optimization.h
 #include glsl/nir/glsl_to_nir.h
@@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
   op[i] = offset(op[i], instr-src[i].swizzle[channel]);
}
 
+   /* Immediates can only be used as the second source of two-source
+* instructions.  We have code in opt_algebraic to flip them as needed
+* for most instructions.  However, it doesn't hurt anything to just do
+* the right thing if we can detect it at the NIR level.
+*/
+   if ((nir_op_infos[instr-op].algebraic_properties  NIR_OP_IS_COMMUTATIVE) 

+   nir_src_as_const_value(instr-src[0].src)) {
+  std::swap(op[0], op[1]);
+   }
+
switch (instr-op) {
case nir_op_i2f:
case nir_op_u2f:
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/9] egl/wayland: Add support for render-nodes

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy axel.d...@ens.fr wrote:
 It is possible the server advertises a render-node.
 In that case no authentication is needed,
 and Gem names are forbidden.

 Signed-off-by: Axel Davy axel.d...@ens.fr
 ---

 diff --git a/src/egl/drivers/dri2/platform_wayland.c 
 b/src/egl/drivers/dri2/platform_wayland.c
 index a5bcf25..79989cb 100644
 --- a/src/egl/drivers/dri2/platform_wayland.c
 +++ b/src/egl/drivers/dri2/platform_wayland.c
 @@ -800,12 +800,33 @@ bad_format:
 return NULL;
  }

 +static char
 +is_fd_render_node(int fd)
 +{
 +   struct stat render;
 +
 +   if (fstat(fd, render))
 +  return 0;
 +
 +   if (!S_ISCHR(render.st_mode))
 +  return 0;
 +
 +   if (render.st_rdev  0x80)
 +  return 1;
 +   return 0;
 +}
 +
Let's use drmGetNodeTypeFromFd(), rather than hard-coding this ?


 +static EGLBoolean
 +is_render_node_capable(struct dri2_egl_display *dri2_dpy)
 +{
 +   const __DRIextension **extensions;
 +   int i;
 +
 +   /* We cannot use Gem names with render-nodes, only prime fds (dma-buf).
 +* The server needs to accept them */
 +   if (!(dri2_dpy-capabilities  WL_DRM_CAPABILITY_PRIME))
 +  return EGL_FALSE;
 +
 +   /* Check the __DRIimage api is supported (this is required by our
 +* codepath without Gem names) */
 +   extensions = dri2_dpy-driver_extensions;
 +   for (i = 0; extensions[i]; i++) {
 +  if (strcmp(extensions[i]-name, __DRI_IMAGE_DRIVER) == 0)
 + return EGL_TRUE;
If memory serves me right__DRI_IMAGE_DRIVER is a cleaned up version of
__DRI_DRI2. Afaics the former is never looked up (or used) in the EGL
code, unlike the latter. So I'm suspecting that this check is a bit
off ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather

2015-05-08 Thread Neil Roberts
The opt_sampler_eot optimisation seems to break when the last
instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
this so it causes a lot of regressions. I can't find any documentation
or known workarounds to indicate that this is expected behaviour, but
considering that this is probably a pretty unlikely situation in a
real use case we might as well disable it in order to avoid the
regressions. In total this fixes 451 tests.

Reviewed-by: Ben Widawsky b...@bwidawsk.net
---

See here for some more discussion of this:

http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html

As far as I can tell the Jenkins run mentioned in that email doesn't
seem to have any tests on Cherryview or Skylake so that probably
explains why it didn't pick up the regression.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8dd680e..e9528e0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot()
if (unlikely(tex_inst-is_head_sentinel()) || !tex_inst-is_tex())
   return false;
 
+   /* This optimisation doesn't seem to work for textureGather for some
+* reason. I can't find any documentation or known workarounds to indicate
+* that this is expected, but considering that it is probably pretty
+* unlikely that a shader would directly write out the results from
+* textureGather we might as well just disable it.
+*/
+   if (tex_inst-opcode == SHADER_OPCODE_TG4 ||
+   tex_inst-opcode == SHADER_OPCODE_TG4_OFFSET)
+  return false;
+
/* If there's no header present, we need to munge the LOAD_PAYLOAD as well.
 * It's very likely to be the previous instruction.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Connor Abbott
Update the comment in patch 1, and the entire thing is

Reviewed-by: Connor Abbott cwabbo...@gmail.com

On Fri, May 8, 2015 at 12:40 PM, Francisco Jerez curroje...@riseup.net wrote:
 v2: Undefine coordinate components not applicable to the target.

 ---
  src/glsl/nir/glsl_to_nir.cpp | 126 
 +++
  1 file changed, 115 insertions(+), 11 deletions(-)

 diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
 index f6b8331..fab4508 100644
 --- a/src/glsl/nir/glsl_to_nir.cpp
 +++ b/src/glsl/nir/glsl_to_nir.cpp
 @@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir)
   op = nir_intrinsic_atomic_counter_inc_var;
} else if (strcmp(ir-callee_name(), 
 __intrinsic_atomic_predecrement) == 0) {
   op = nir_intrinsic_atomic_counter_dec_var;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_load) == 0) {
 + op = nir_intrinsic_image_load;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_store) == 0) {
 + op = nir_intrinsic_image_store;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_add) 
 == 0) {
 + op = nir_intrinsic_image_atomic_add;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_min) 
 == 0) {
 + op = nir_intrinsic_image_atomic_min;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_max) 
 == 0) {
 + op = nir_intrinsic_image_atomic_max;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_and) 
 == 0) {
 + op = nir_intrinsic_image_atomic_and;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_or) == 
 0) {
 + op = nir_intrinsic_image_atomic_or;
 +  } else if (strcmp(ir-callee_name(), __intrinsic_image_atomic_xor) 
 == 0) {
 + op = nir_intrinsic_image_atomic_xor;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_exchange) == 0) {
 + op = nir_intrinsic_image_atomic_exchange;
 +  } else if (strcmp(ir-callee_name(), 
 __intrinsic_image_atomic_comp_swap) == 0) {
 + op = nir_intrinsic_image_atomic_comp_swap;
} else {
   unreachable(not reached);
}

nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
 -  ir_dereference *param =
 - (ir_dereference *) ir-actual_parameters.get_head();
 -  instr-variables[0] = evaluate_deref(instr-instr, param);
 -  nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 +
 +  switch (op) {
 +  case nir_intrinsic_atomic_counter_read_var:
 +  case nir_intrinsic_atomic_counter_inc_var:
 +  case nir_intrinsic_atomic_counter_dec_var: {
 + ir_dereference *param =
 +(ir_dereference *) ir-actual_parameters.get_head();
 + instr-variables[0] = evaluate_deref(instr-instr, param);
 + nir_ssa_dest_init(instr-instr, instr-dest, 1, NULL);
 + break;
 +  }
 +  case nir_intrinsic_image_load:
 +  case nir_intrinsic_image_store:
 +  case nir_intrinsic_image_atomic_add:
 +  case nir_intrinsic_image_atomic_min:
 +  case nir_intrinsic_image_atomic_max:
 +  case nir_intrinsic_image_atomic_and:
 +  case nir_intrinsic_image_atomic_or:
 +  case nir_intrinsic_image_atomic_xor:
 +  case nir_intrinsic_image_atomic_exchange:
 +  case nir_intrinsic_image_atomic_comp_swap: {
 + nir_ssa_undef_instr *instr_undef =
 +nir_ssa_undef_instr_create(shader, 1);
 + nir_instr_insert_after_cf_list(this-cf_node_list,
 +instr_undef-instr);
 +
 + /* Set the image variable dereference. */
 + exec_node *param = ir-actual_parameters.get_head();
 + ir_dereference *image = (ir_dereference *)param;
 + const glsl_type *type =
 +image-variable_referenced()-type-without_array();
 +
 + instr-variables[0] = evaluate_deref(instr-instr, image);
 + param = param-get_next();
 +
 + /* Set the address argument, extending the coordinate vector to four
 +  * components.
 +  */
 + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
 + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
 nir_op_vec4);
 + nir_ssa_dest_init(instr_addr-instr, instr_addr-dest.dest, 4, 
 NULL);
 +
 + for (int i = 0; i  4; i++) {
 +if (i  type-coordinate_components()) {
 +   instr_addr-src[i].src = src_addr;
 +   instr_addr-src[i].swizzle[0] = i;
 +} else {
 +   instr_addr-src[i].src = nir_src_for_ssa(instr_undef-def);
 +}
 + }
 +
 + nir_instr_insert_after_cf_list(cf_node_list, instr_addr-instr);
 + instr-src[0] = nir_src_for_ssa(instr_addr-dest.dest.ssa);
 + param = param-get_next();
 +
 + /* Set the sample argument, which is undefined for single-sample
 +  * images.
 +  */
 +

Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy axel.d...@ens.fr wrote:
 When the server gpu and requested gpu are different:
 . They likely don't support the same tiling modes
 . They likely do not have fast access to the same locations

 Thus we do:
 . render to a tiled buffer we do not share with the server
 . Copy the content at every swap to a buffer with no tiling
 that we share with the server.

 This is similar to the glx dri3 DRI_PRIME implementation.

 Signed-off-by: Axel Davy axel.d...@ens.fr
 ---

 --- a/src/egl/drivers/dri2/platform_wayland.c
 +++ b/src/egl/drivers/dri2/platform_wayland.c

 @@ -632,6 +658,8 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv,
  {
 struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
 struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
 +   struct dri2_egl_context *dri2_ctx;
 +   _EGLContext *ctx;
Nit: Move the variable declarations where they're used. Otherwise
static analysis tools flag lovely message(s).

 @@ -1084,6 +1123,24 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay 
 *disp)
 if (roundtrip(dri2_dpy)  0 || !dri2_dpy-authenticated)
goto cleanup_fd;

 +   dri2_dpy-fd = loader_get_user_preferred_fd(dri2_dpy-fd,
 +   dri2_dpy-is_different_gpu);
 +   if (dri2_dpy-is_different_gpu) {
 +  free(dri2_dpy-device_name);
 +  dri2_dpy-device_name = loader_get_device_name_for_fd(dri2_dpy-fd);
 +  if (!dri2_dpy-device_name) {
 + _eglError(EGL_BAD_ALLOC, wayland-egl: failed to get device name 
 +  for requested GPU);
 + goto cleanup_fd;
 +  }
 +   }
 +
Shouldn't we authenticate with the correct gpu or master/render node ?
This implementation will auth with GPU1, and then use GPU2 which seems
a bit odd. I might be missing something ?


 @@ -1127,6 +1184,15 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay 
 *disp)
goto cleanup_screen;
 }

 +   if (dri2_dpy-is_different_gpu 
 +   (dri2_dpy-image-base.version  9 ||
 +dri2_dpy-image-blitImage == NULL)) {
 +  _eglLog(_EGL_WARNING, wayland-egl: Different GPU, but image extension 
 
 +version 9 or later not found, or blitImage not 
 +implemented for this driver);
Nit: Alternative message:
Different GPU selected, but the Image extension in the driver is not
compatible. Version 9 or later and blitImage() are required.


Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:08 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
 On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
  GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
 
 total instructions in shared programs: 2724483 - 2711790 (-0.47%)
 instructions in affected programs: 1860859 - 1848166 (-0.68%)
 helped:4387
 HURT:  4758
 GAINED:1499
 
  The gained programs are ARB vertext programs that were previously going
  through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
  programs can go through the scalar backend so they show up as gained in
  the shader-db results.

 Again, I'm kind of confused and disappointed that we're just okay with
 hurting 4700 programs without more analysis. I guess I'll go do
 that...

 I took a stab at that tonight.  The good news is, the majority of the
 hurt is pretty stupid.  Indirect uniform address calculations involve
 a lot of integer multiplication by 4.

 For whatever reason, we're getting 4*x instead of x*4, which doesn't
 support immediates.  So we get:

 MOV tmp 4
 MUL dst tmp x

 Normally, constant propagation would commute the operands, giving us
 MUL dst x 4 like we want.  But it sees integer multiplication and
 chickens out, due to the asymmetry on some platforms.

 Right.  I just sent out a patch that puts immediates in src1 for
 commutative ALU ops at the emit stage.  We probably still want to do
 something in constant propagation in case we can fold something, but
 it fixes the problem for now.  It also helps even more than the
 shifting patch.

 I don't know.  Maybe we just want to make constant propagation do the
 right thing on BDW+.  Matt?

Yes.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
total instructions in shared programs: 7152330 - 7137006 (-0.21%)
instructions in affected programs: 1330548 - 1315224 (-1.15%)
helped:5797
HURT:  76
GAINED:0
LOST:  8
---
 src/glsl/nir/nir_search.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index d86655b..4c83349 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
   }
}
 
+   /* Stash off the current variables_seen bitmask.  This way we can
+* restore it prior to matching in the commutative case below. */
+   unsigned variables_seen_stash = state-variables_seen;
+
bool matched = true;
for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
   /* If the source is an explicitly sized source, then we need to reset
@@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
 
if (nir_op_infos[instr-op].algebraic_properties  NIR_OP_IS_COMMUTATIVE) {
   assert(nir_op_infos[instr-op].num_inputs == 2);
+
+  /* Restore the variables_seen bitmask.  If we don't do this, then we
+   * could end up with an erroneous failure due to variables found in the
+   * first match attempt above not matching those in the second.
+   */
+  state-variables_seen = variables_seen_stash;
+
   if (!match_value(expr-srcs[0], instr, 1, num_components,
swizzle, state))
  return false;
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:11 AM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
 According to Glenn, shifts on R600 have 5x the throughput as multiplies.

 Intel GPUs have strange integer multiplication restrictions - on most
 hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
 means the arguments aren't commutative, which can limit our constant
 propagation options.  SHL has no such restrictions.

 Shifting is probably reasonable on most people's hardware, so let's just
 do that.

 i965 shader-db results (using NIR for VS):
 total instructions in shared programs: 7432587 - 7388982 (-0.59%)
 instructions in affected programs: 1360411 - 1316806 (-3.21%)
 helped:5772
 HURT:  0

 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: matts...@gmail.com
 Cc: ja...@jlekstrand.net
 ---
  src/glsl/nir/nir_opt_algebraic.py | 5 +
  1 file changed, 5 insertions(+)

 So...I found a bizarre issue with this patch.

(('imul', 4, a), ('ishl', a, 2)),

 totally optimizes things.  However...

(('imul', a, 4), ('ishl', a, 2)),

 doesn't seem to do anything, even though imul is commutative, and nir_search
 should totally handle that...

  ▄▄  ▄▄▄▄    ▄   ▄▄
  ██  ██   ▀▀▀██▀▀▀  ███  ██
  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
   ██ ██ ██   ██  ██  ██   ▄██▀   ██
   ███▀▀███   ██  ██   ██ ▀▀
   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀

 If you know why, let me know, otherwise I may have to look into it when more
 awake.

 I've noticed a couple other weird things that I have been unable to
 understand.  Shaders like the one below end with fmul/ffma instaed of
 flrp, for example.  I understand why that happens from GLSL IR
 opt_algebraic, but it seems like nir_opt_algebraic should handle it.

Just a guess, but it's quite possibly due to the commutative
operations bug I just sent a patch for.
--Jason

 [require]
 GLSL = 1.30

 [vertex shader]
 in vec4 v;
 in vec2 tc_in;

 out vec2 tc;

 void main() {
 gl_Position = v;
 tc = tc_in;
 }

 [fragment shader]
 in vec2 tc;

 out vec4 color;

 uniform sampler2D s;
 uniform float a;
 uniform vec3 base_color;

 void main() {
 vec3 tex_color = texture(s, tc).xyz;

 color.xyz = (base_color * a) + (tex_color * (1.0 - a));
 color.a = 1.0;
 }



 diff --git a/src/glsl/nir/nir_opt_algebraic.py 
 b/src/glsl/nir/nir_opt_algebraic.py
 index 400d60e..350471f 100644
 --- a/src/glsl/nir/nir_opt_algebraic.py
 +++ b/src/glsl/nir/nir_opt_algebraic.py
 @@ -247,6 +247,11 @@ late_optimizations = [
 (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
 (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
 (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
 +
 +   # Multiplication by 4 comes up fairly often in indirect offset 
 calculations.
 +   # Some GPUs have weird integer multiplication limitations, but shifts 
 should work
 +   # equally well everywhere.
 +   (('imul', 4, a), ('ishl', a, 2)),

 This should be conditionalized on whether the platform has native integers.

  ]

  print nir_algebraic.AlgebraicPass(nir_opt_algebraic, 
 optimizations).render()


 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 11:36, Tapani Pälli tapani.pa...@intel.com wrote:
 That is strange, it was introduced in fb370f89d but then has gone missing ..

Seems like it broke shortly after it was introduced (around 10.1.0-devel)

commit 902f9df36bec7d67a2d8bc4c24d89d9d57964903
Author: Francisco Jerez curroje...@riseup.net
Date:   Mon Nov 25 10:11:59 2013 -0800

mesa: Add image parameter queries for ARB_shader_image_load_store.


The fix looks great - curious that we don't have piglits for these.

Whomever pushes this, please add

Cc: 10.4 10.5 mesa-sta...@lists.freedesktop.org
Signed-off-by: Emil Velikov emil.l.veli...@gmail.com

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
 On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net 
 wrote:
  Shader-db results for fragment shaders on Broadwell:
 
 total instructions in shared programs: 4310987 - 4310663 (-0.01%)
 instructions in affected programs: 43242 - 42918 (-0.75%)
 helped:142
 HURT:  0
 
  Shader-db results for vertex shaders on Broadwell:
 
 total instructions in shared programs: 2889581 - 2844309 (-1.57%)
 instructions in affected programs: 1418720 - 1373448 (-3.19%)
 helped:6139
 HURT:  0
  ---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  index 555987d..161a262 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -21,6 +21,8 @@
* IN THE SOFTWARE.
*/
 
  +#include algorithm
  +
   #include glsl/ir.h
   #include glsl/ir_optimization.h
   #include glsl/nir/glsl_to_nir.h
  @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 op[i] = offset(op[i], instr-src[i].swizzle[channel]);
  }
 
  +   /* Immediates can only be used as the second source of two-source
  +* instructions.  We have code in opt_algebraic to flip them as needed
  +* for most instructions.  However, it doesn't hurt anything to just 
  do
  +* the right thing if we can detect it at the NIR level.
  +*/
  +   if ((nir_op_infos[instr-op].algebraic_properties  
  NIR_OP_IS_COMMUTATIVE) 
  +   nir_src_as_const_value(instr-src[0].src)) {
  +  std::swap(op[0], op[1]);
  +   }
  +

 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:

 brw_fs_copy_propagation.cpp:

/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst-opcode == BRW_OPCODE_MUL ||
 inst-opcode == BRW_OPCODE_MACH) 
(inst-src[1].type == BRW_REGISTER_TYPE_D ||
 inst-src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst-src[0] = inst-src[1];
inst-src[1] = val;
progress = true;

 Yeah.  I also wrote a patch to do that, adding

(brw-gen  8 || brw-is_cherryview) 

 In that case, shouldn't Cherry View take the same path as hsw when
 emitting multiplies and look for 15-bit constants?

Almost... that path needs to set one of the MUL's source types to W/UW
and the stride to 2, like in commit 0c06d019. I'm planning to rip out
all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp
and move it to a common lowering pass, so I'll fix that at the same
time.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling

2015-05-08 Thread Kenneth Graunke
On Friday, May 08, 2015 07:04:31 PM Neil Roberts wrote:
 Previously whenever a primitive is drawn the driver would call
 _mesa_check_conditional_render which blocks waiting for the result of
 the query to determine whether to render. On Gen7+ there is a bit in
 the 3DPRIMITIVE command which can be used to disable the primitive
 based on the value of a state bit. This state bit can be set based on
 whether two registers have different values using the MI_PREDICATE
 command. We can load these two registers with the pixel count values
 stored in the query begin and end to implement conditional rendering
 without stalling.
 
 Unfortunately these two source registers were not in the whitelist of
 available registers in the kernel driver until v3.19. This patch uses
 the command parser version from intel_screen to detect whether to
 attempt to set the predicate data registers.
 
 The predicate enable bit is currently only used for drawing 3D
 primitives. For blits, clears, bitmaps, copypixels and drawpixels it
 still causes a stall. For most of these it would probably just work to
 call the new brw_check_conditional_render function instead of
 _mesa_check_conditional_render because they already work in terms of
 rendering primitives. However it's a bit trickier for blits because it
 can use the BLT ring or the blorp codepath. I think these operations
 are less useful for conditional rendering than rendering primitives so
 it might be best to leave it for a later patch.
 
 v2: Use the command parser version to detect whether we can write to
 the predicate data registers instead of trying to execute a
 register load command.
 v3: Simple rebase
 ---
  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
  src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 
 +
  src/mesa/drivers/dri/i965/brw_context.c|   4 +
  src/mesa/drivers/dri/i965/brw_context.h|  21 +++
  src/mesa/drivers/dri/i965/brw_defines.h|   1 +
  src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
  src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
  src/mesa/drivers/dri/i965/intel_extensions.c   |   5 +
  src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
  9 files changed, 243 insertions(+), 12 deletions(-)
  create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c
 
 diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
 b/src/mesa/drivers/dri/i965/Makefile.sources
 index 1ae93e1..a24c20a 100644
 --- a/src/mesa/drivers/dri/i965/Makefile.sources
 +++ b/src/mesa/drivers/dri/i965/Makefile.sources
 @@ -18,6 +18,7 @@ i965_FILES = \
   brw_clip_unfilled.c \
   brw_clip_util.c \
   brw_compute.c \
 + brw_conditional_render.c \
   brw_context.c \
   brw_context.h \
   brw_cs.cpp \
 diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
 b/src/mesa/drivers/dri/i965/brw_conditional_render.c
 new file mode 100644
 index 000..e676aa0
 --- /dev/null
 +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
 @@ -0,0 +1,167 @@
 +/*
 + * Copyright © 2014 Intel Corporation
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
 DEALINGS
 + * IN THE SOFTWARE.
 + *
 + * Authors:
 + *Neil Roberts n...@linux.intel.com
 + */
 +
 +/** @file brw_conditional_render.c
 + *
 + * Support for conditional rendering based on query objects
 + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
 + */
 +
 +#include main/imports.h
 +#include main/condrender.h
 +
 +#include brw_context.h
 +#include brw_defines.h
 +#include intel_batchbuffer.h
 +
 +static void
 +set_predicate_enable(struct brw_context *brw,
 + bool value)
 +{
 +   if (value)
 +  brw-predicate.state = BRW_PREDICATE_STATE_RENDER;
 +   else
 +  brw-predicate.state = BRW_PREDICATE_STATE_DONT_RENDER;
 +}
 +
 +static void
 +load_64bit_register(struct brw_context *brw,
 

Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76

I'm doing some looking into the hurt programs.  It seems as if there
are some very strange interatctions between flrp and ffma.  I'm still
working out exactly how to fix it up.
--Jason

 GAINED:0
 LOST:  8
 ---
  src/glsl/nir/nir_search.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index d86655b..4c83349 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,
}
 }

 +   /* Stash off the current variables_seen bitmask.  This way we can
 +* restore it prior to matching in the commutative case below. */
 +   unsigned variables_seen_stash = state-variables_seen;
 +
 bool matched = true;
 for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
/* If the source is an explicitly sized source, then we need to reset
 @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,

 if (nir_op_infos[instr-op].algebraic_properties  NIR_OP_IS_COMMUTATIVE) 
 {
assert(nir_op_infos[instr-op].num_inputs == 2);
 +
 +  /* Restore the variables_seen bitmask.  If we don't do this, then we
 +   * could end up with an erroneous failure due to variables found in the
 +   * first match attempt above not matching those in the second.
 +   */
 +  state-variables_seen = variables_seen_stash;
 +
if (!match_value(expr-srcs[0], instr, 1, num_components,
 swizzle, state))
   return false;
 --
 2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Eric Anholt
Ilia Mirkin imir...@alum.mit.edu writes:

 On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 +   # Multiplication by 4 comes up fairly often in indirect offset 
 calculations.
 +   # Some GPUs have weird integer multiplication limitations, but shifts 
 should work
 +   # equally well everywhere.
 +   (('imul', 4, a), ('ishl', a, 2)),

 Not sure what the cost of doing it this way, but really you want all
 powers of 2... and also udiv - shr. Since this is python, should be
 easy enough to append onto that list. AFAIK all GPU's prefer a shift
 over a mul. Adreno doen't have 32-bit imul in the first place (and no
 idiv either).

I can confirm that I'd love shifts instead of imul/udiv on vc4.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.

2015-05-08 Thread Kenneth Graunke
On Tuesday, May 05, 2015 11:29:52 PM Francisco Jerez wrote:
 ---
  src/glsl/nir/nir_intrinsics.h | 27 +++
  1 file changed, 27 insertions(+)
 
 diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
 index 8e28765..4b13c75 100644
 --- a/src/glsl/nir/nir_intrinsics.h
 +++ b/src/glsl/nir/nir_intrinsics.h
 @@ -89,6 +89,33 @@ ATOMIC(inc, 0)
  ATOMIC(dec, 0)
  ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE)
  
 +/*
 + * Image load, store and atomic intrinsics.
 + *
 + * All image intrinsics take an image target passed as a nir_variable.  Image
 + * variables contain a number of memory and layout qualifiers that influence
 + * the semantics of the intrinsic.
 + *
 + * All image intrinsics take a four-coordinate vector and a sample index as
 + * first two sources, determining the location within the image that will be
 + * accessed by the intrinsic.  Components not applicable to the image target
 + * in use are equal to zero by convention.  Image store takes an additional
 + * four-component argument with the value to be written, and image atomic
 + * operations take either one or two additional scalar arguments with the 
 same
 + * meaning as in the ARB_shader_image_load_store specification.
 + */
 +INTRINSIC(image_load, 2, ARR(4, 1), true, 4, 1, 0,
 +  NIR_INTRINSIC_CAN_ELIMINATE)
 +INTRINSIC(image_store, 3, ARR(4, 1, 4), false, 0, 1, 0, 0)
 +INTRINSIC(image_atomic_add, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_min, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_max, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_and, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_or, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_xor, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_exchange, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
 +INTRINSIC(image_atomic_comp_swap, 4, ARR(4, 1, 1, 1), true, 1, 1, 0, 0)
 +
  #define SYSTEM_VALUE(name, components) \
 INTRINSIC(load_##name, 0, ARR(), true, components, 0, 0, \
 NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
 

Looks great, Curro.  Expanding the coordinate to a vec4 and always
having the parameters present for e.g. sample_index does reduce the
combinatorial explosion a lot.  FWIW, I also prefer undefined.

This should work out fine for SPIR-V too, it's pretty trivial to go
from their style to this (just combine the two - it's SSA after all).

These 5 are:
Reviewed-by: Kenneth Graunke kenn...@whitecape.org


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather

2015-05-08 Thread Chris Forbes
I don't have CHV or SKL hw or docs to try and confirm this, but this
does what it claims to.

Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Sat, May 9, 2015 at 5:10 AM, Neil Roberts n...@linux.intel.com wrote:
 The opt_sampler_eot optimisation seems to break when the last
 instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
 this so it causes a lot of regressions. I can't find any documentation
 or known workarounds to indicate that this is expected behaviour, but
 considering that this is probably a pretty unlikely situation in a
 real use case we might as well disable it in order to avoid the
 regressions. In total this fixes 451 tests.

 Reviewed-by: Ben Widawsky b...@bwidawsk.net
 ---

 See here for some more discussion of this:

 http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html

 As far as I can tell the Jenkins run mentioned in that email doesn't
 seem to have any tests on Cherryview or Skylake so that probably
 explains why it didn't pick up the regression.

  src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++
  1 file changed, 10 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs.cpp
 index 8dd680e..e9528e0 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
 @@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot()
 if (unlikely(tex_inst-is_head_sentinel()) || !tex_inst-is_tex())
return false;

 +   /* This optimisation doesn't seem to work for textureGather for some
 +* reason. I can't find any documentation or known workarounds to indicate
 +* that this is expected, but considering that it is probably pretty
 +* unlikely that a shader would directly write out the results from
 +* textureGather we might as well just disable it.
 +*/
 +   if (tex_inst-opcode == SHADER_OPCODE_TG4 ||
 +   tex_inst-opcode == SHADER_OPCODE_TG4_OFFSET)
 +  return false;
 +
 /* If there's no header present, we need to munge the LOAD_PAYLOAD as 
 well.
  * It's very likely to be the previous instruction.
  */
 --
 1.9.3

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] clover: add --with-icd-file-dir option

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 03:10, Michel Dänzer mic...@daenzer.net wrote:
 On 08.05.2015 03:24, Tom Stellard wrote:
 For this particular situation, I'm happy with any solution that:

 1. Allows a user to install the icd file to /etc if he or she wants to.

 --sysconfdir=/etc

 That covers drirc as well.


 2. Does not require the user to read the spec to know that /etc is the
 correct place to install it.

 I think the above is pretty standard for autotools projects. I think it
 would be better to document this in the appropriate place(s) for OpenCL
 users than to add another convoluted option which doesn't really add any
 flexibility.

Very well said Michel. I'm suspecting that the latest approach will
lead to quite a bit of confusion. Even amongst experienced package
maintainers. If in doubt one can ping the debian, arch and/or fedora
guys (i.e the ones shipping opencl) for their 2c :-)

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling

2015-05-08 Thread Neil Roberts
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of
the query to determine whether to render. On Gen7+ there is a bit in
the 3DPRIMITIVE command which can be used to disable the primitive
based on the value of a state bit. This state bit can be set based on
whether two registers have different values using the MI_PREDICATE
command. We can load these two registers with the pixel count values
stored in the query begin and end to implement conditional rendering
without stalling.

Unfortunately these two source registers were not in the whitelist of
available registers in the kernel driver until v3.19. This patch uses
the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.

The predicate enable bit is currently only used for drawing 3D
primitives. For blits, clears, bitmaps, copypixels and drawpixels it
still causes a stall. For most of these it would probably just work to
call the new brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of
rendering primitives. However it's a bit trickier for blits because it
can use the BLT ring or the blorp codepath. I think these operations
are less useful for conditional rendering than rendering primitives so
it might be best to leave it for a later patch.

v2: Use the command parser version to detect whether we can write to
the predicate data registers instead of trying to execute a
register load command.
v3: Simple rebase
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +
 src/mesa/drivers/dri/i965/brw_context.c|   4 +
 src/mesa/drivers/dri/i965/brw_context.h|  21 +++
 src/mesa/drivers/dri/i965/brw_defines.h|   1 +
 src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   5 +
 src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
 9 files changed, 243 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 1ae93e1..a24c20a 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -18,6 +18,7 @@ i965_FILES = \
brw_clip_unfilled.c \
brw_clip_util.c \
brw_compute.c \
+   brw_conditional_render.c \
brw_context.c \
brw_context.h \
brw_cs.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
b/src/mesa/drivers/dri/i965/brw_conditional_render.c
new file mode 100644
index 000..e676aa0
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the Software),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Neil Roberts n...@linux.intel.com
+ */
+
+/** @file brw_conditional_render.c
+ *
+ * Support for conditional rendering based on query objects
+ * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
+ */
+
+#include main/imports.h
+#include main/condrender.h
+
+#include brw_context.h
+#include brw_defines.h
+#include intel_batchbuffer.h
+
+static void
+set_predicate_enable(struct brw_context *brw,
+ bool value)
+{
+   if (value)
+  brw-predicate.state = BRW_PREDICATE_STATE_RENDER;
+   else
+  brw-predicate.state = BRW_PREDICATE_STATE_DONT_RENDER;
+}
+
+static void
+load_64bit_register(struct brw_context *brw,
+uint32_t reg,
+drm_intel_bo *bo,
+uint32_t offset)
+{
+   int i;
+
+   for (i = 0; i  2; i++) {
+   

[Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen

2015-05-08 Thread Neil Roberts
In order to detect whether the predicate source registers can be used
in a later patch we will need to know the version number for the
command parser. This patch just adds a member to intel_screen and does
an ioctl to get the version.

Reviewed-by: Kenneth Graunke kenn...@whitecape.org
---
 src/mesa/drivers/dri/i965/intel_screen.c | 7 +++
 src/mesa/drivers/dri/i965/intel_screen.h | 8 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index dda1638..896a125 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1407,6 +1407,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
  (ret != -1 || errno != EINVAL);
}
 
+   struct drm_i915_getparam getparam;
+   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
+   getparam.value = intelScreen-cmd_parser_version;
+   const int ret = drmIoctl(psp-fd, DRM_IOCTL_I915_GETPARAM, getparam);
+   if (ret == -1)
+  intelScreen-cmd_parser_version = 0;
+
psp-extensions = !intelScreen-has_context_reset_notification
   ? intelScreenExtensions : intelRobustScreenExtensions;
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index e7a1490..742b3d3 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -72,7 +72,13 @@ struct intel_screen
* Configuration cache with default values for all contexts
*/
driOptionCache optionCache;
-};
+
+   /**
+* Version of the command parser reported by the
+* I915_PARAM_CMD_PARSER_VERSION parameter
+*/
+   int cmd_parser_version;
+ };
 
 extern void intelDestroyContext(__DRIcontext * driContextPriv);
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Kenneth Graunke
On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
  Shader-db results for fragment shaders on Broadwell:
 
 total instructions in shared programs: 4310987 - 4310663 (-0.01%)
 instructions in affected programs: 43242 - 42918 (-0.75%)
 helped:142
 HURT:  0
 
  Shader-db results for vertex shaders on Broadwell:
 
 total instructions in shared programs: 2889581 - 2844309 (-1.57%)
 instructions in affected programs: 1418720 - 1373448 (-3.19%)
 helped:6139
 HURT:  0
  ---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  index 555987d..161a262 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -21,6 +21,8 @@
* IN THE SOFTWARE.
*/
 
  +#include algorithm
  +
   #include glsl/ir.h
   #include glsl/ir_optimization.h
   #include glsl/nir/glsl_to_nir.h
  @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 op[i] = offset(op[i], instr-src[i].swizzle[channel]);
  }
 
  +   /* Immediates can only be used as the second source of two-source
  +* instructions.  We have code in opt_algebraic to flip them as needed
  +* for most instructions.  However, it doesn't hurt anything to just do
  +* the right thing if we can detect it at the NIR level.
  +*/
  +   if ((nir_op_infos[instr-op].algebraic_properties  
  NIR_OP_IS_COMMUTATIVE) 
  +   nir_src_as_const_value(instr-src[0].src)) {
  +  std::swap(op[0], op[1]);
  +   }
  +
 
 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:
 
 brw_fs_copy_propagation.cpp:
 
/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst-opcode == BRW_OPCODE_MUL ||
 inst-opcode == BRW_OPCODE_MACH) 
(inst-src[1].type == BRW_REGISTER_TYPE_D ||
 inst-src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst-src[0] = inst-src[1];
inst-src[1] = val;
progress = true;

Yeah.  I also wrote a patch to do that, adding

   (brw-gen  8 || brw-is_cherryview) 

which also solves the problem.  But it won't help on Cherryview, which I
believe still has the asymmetrical multiply, while switching to shifts
will.  I suppose another alternative to NIR late optimizations is to
have brw_fs_nir's handling of imul check for power of two sizes and emit
a SHL.  That would be easy.

I do think we really need to make logical IMUL opcodes and lower them to
MUL/MACH as needed later, so we don't need to worry about breaking MACH
in cases like this.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
  Shader-db results for fragment shaders on Broadwell:
 
 total instructions in shared programs: 4310987 - 4310663 (-0.01%)
 instructions in affected programs: 43242 - 42918 (-0.75%)
 helped:142
 HURT:  0
 
  Shader-db results for vertex shaders on Broadwell:
 
 total instructions in shared programs: 2889581 - 2844309 (-1.57%)
 instructions in affected programs: 1418720 - 1373448 (-3.19%)
 helped:6139
 HURT:  0
  ---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  index 555987d..161a262 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -21,6 +21,8 @@
* IN THE SOFTWARE.
*/
 
  +#include algorithm
  +
   #include glsl/ir.h
   #include glsl/ir_optimization.h
   #include glsl/nir/glsl_to_nir.h
  @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 op[i] = offset(op[i], instr-src[i].swizzle[channel]);
  }
 
  +   /* Immediates can only be used as the second source of two-source
  +* instructions.  We have code in opt_algebraic to flip them as needed
  +* for most instructions.  However, it doesn't hurt anything to just do
  +* the right thing if we can detect it at the NIR level.
  +*/
  +   if ((nir_op_infos[instr-op].algebraic_properties  
  NIR_OP_IS_COMMUTATIVE) 
  +   nir_src_as_const_value(instr-src[0].src)) {
  +  std::swap(op[0], op[1]);
  +   }
  +

 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:

 brw_fs_copy_propagation.cpp:

/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst-opcode == BRW_OPCODE_MUL ||
 inst-opcode == BRW_OPCODE_MACH) 
(inst-src[1].type == BRW_REGISTER_TYPE_D ||
 inst-src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst-src[0] = inst-src[1];
inst-src[1] = val;
progress = true;

 Yeah.  I also wrote a patch to do that, adding

(brw-gen  8 || brw-is_cherryview) 

In that case, shouldn't Cherry View take the same path as hsw when
emitting multiplies and look for 15-bit constants?

 which also solves the problem.  But it won't help on Cherryview, which I
 believe still has the asymmetrical multiply, while switching to shifts
 will.  I suppose another alternative to NIR late optimizations is to
 have brw_fs_nir's handling of imul check for power of two sizes and emit
 a SHL.  That would be easy.

I really don't think SHL is the issue here.  It's that we're being
stupid about multiplies.  SHL is a nice hack but unless it is actually
faster, I think it's hacking around the problem.

 I do think we really need to make logical IMUL opcodes and lower them to
 MUL/MACH as needed later, so we don't need to worry about breaking MACH
 in cases like this.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context

2015-05-08 Thread Emil Velikov
Hi Tom,

On 8 May 2015 at 00:36, Tom Stellard thomas.stell...@amd.com wrote:
 Events can be added to an OpenCL command queue concurrently from
 multiple threads, but pipe_context and pipe_screen objects
 are not threadsafe.  The threadsafe wrappers protect all pipe_screen
 and pipe_context function calls with a mutex, so we can safely use
 them with multiple threads.
 ---
  src/gallium/state_trackers/clover/Makefile.am  |   6 +-
  src/gallium/state_trackers/clover/Makefile.sources |   4 +
  src/gallium/state_trackers/clover/core/device.cpp  |   2 +
  .../clover/core/pipe_threadsafe_context.c  | 272 
 +
  .../clover/core/pipe_threadsafe_screen.c   | 184 ++
  .../state_trackers/clover/core/threadsafe.h|  39 +++
  src/gallium/targets/opencl/Makefile.am |   3 +-
  7 files changed, 508 insertions(+), 2 deletions(-)
  create mode 100644 
 src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
  create mode 100644 
 src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c
  create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h


 --- a/src/gallium/state_trackers/clover/Makefile.sources
 +++ b/src/gallium/state_trackers/clover/Makefile.sources
 @@ -53,6 +53,10 @@ CPP_SOURCES := \
 util/range.hpp \
 util/tuple.hpp

 +C_SOURCES := \
 +   core/pipe_threadsafe_context.c \
 +   core/pipe_threadsafe_screen.c
 +
Regardless of the approach (re Francisco's comment), please keep the
header (core/threadsafe.h) within the sources list.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/2] i965: Do conditional rendering in hardware

2015-05-08 Thread Neil Roberts
I thought it might be a good idea to try posting these patches again
since it's been 6 months since they were originally posted. The
patches are a lot more useful now since the command parser in the
kernel is working correctly for Haswell. This means the functionality
is no longer restricted to only Ivybridge.

I haven't changed the patches apart from some minor rebasing. I've
rerun it through Piglit on Ivybridge and it doesn't cause any
regressions.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Jason Ekstrand
Over-all, I think this is on the right track, but I still don't think
it's 100% correct.

On Fri, May 8, 2015 at 12:04 AM, Tapani Pälli tapani.pa...@intel.com wrote:


 On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote:

 On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote:

 This fixes bugs with special cases where we have arrays of
 structures containing samplers or arrays of samplers.

 I've verified that patch results in calculating same index value as
 returned by _mesa_get_sampler_uniform_value for IR. Patch makes
 following ES3 conformance test pass:

 ES3-CTS.shaders.struct.uniform.sampler_array_fragment

 Signed-off-by: Tapani Pälli tapani.pa...@intel.com
 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
 ---
   src/glsl/nir/nir_lower_samplers.cpp | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)

 diff --git a/src/glsl/nir/nir_lower_samplers.cpp
 b/src/glsl/nir/nir_lower_samplers.cpp
 index cf8ab83..9859cc0 100644
 --- a/src/glsl/nir/nir_lower_samplers.cpp
 +++ b/src/glsl/nir/nir_lower_samplers.cpp
 @@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct
 gl_shader_program *shader_progr
instr-sampler_index *= glsl_get_length(deref-type);

We really should get rid of the multiply since the sampler index is
zero up until the end and the multiply does nothing but confuse
people.

switch (deref_array-deref_array_type) {
case nir_deref_array_type_direct:
 -instr-sampler_index += deref_array-base_offset;
 +
 +/* If this is an array of samplers. */


 Above the case is for arrays and below you check for the sampler. This
 comment does not tell much extra :)


 Yeah, not sure how to change it. What I want to state here is that only for
 arrays of samplers we need to do this, otherwise we don't. The only other
 case really is array of structs that contain sampler so maybe I should state
 that instead?



 +if (deref-child-type-base_type == GLSL_TYPE_SAMPLER)
 +   instr-sampler_index += deref_array-base_offset;
 +
   if (deref_array-deref.child)
  ralloc_asprintf_append(name, [%u],
 deref_array-base_offset);

The two conditionals above should be combined.  If the deref has a
child, it should not have type SAMPLER and vice-versa.  A better way
to do it would be

if (deref_array-deref.child) {
   ralloc_asprintf_append(name, [%u], deref_array-base_offset);
} else {
   assert(deref-child-type-bbase_type == GLSL_TYPE_SAMPLER);
   instr-sampler_index = deref_array-base_offset;
}

Also, it may be better to do that outside of the switch and turn the
switch into an if (deref_array-deref_array_type ==
deref_array_type_indirect) to handle indirects.  Right now, I don't
think that we correctly handle an indirect with a base offset other
than 0.

Does that make sense?
--Jason

   break;
 --
 2.1.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context

2015-05-08 Thread Francisco Jerez
Tom Stellard thomas.stell...@amd.com writes:

 Events can be added to an OpenCL command queue concurrently from
 multiple threads, but pipe_context and pipe_screen objects
 are not threadsafe.  The threadsafe wrappers protect all pipe_screen
 and pipe_context function calls with a mutex, so we can safely use
 them with multiple threads.
 ---
  src/gallium/state_trackers/clover/Makefile.am  |   6 +-
  src/gallium/state_trackers/clover/Makefile.sources |   4 +
  src/gallium/state_trackers/clover/core/device.cpp  |   2 +
  .../clover/core/pipe_threadsafe_context.c  | 272 
 +
  .../clover/core/pipe_threadsafe_screen.c   | 184 ++
  .../state_trackers/clover/core/threadsafe.h|  39 +++
  src/gallium/targets/opencl/Makefile.am |   3 +-
  7 files changed, 508 insertions(+), 2 deletions(-)
  create mode 100644 
 src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
  create mode 100644 
 src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c
  create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h

 diff --git a/src/gallium/state_trackers/clover/Makefile.am 
 b/src/gallium/state_trackers/clover/Makefile.am
 index f46d9ef..8b615ae 100644
 --- a/src/gallium/state_trackers/clover/Makefile.am
 +++ b/src/gallium/state_trackers/clover/Makefile.am
 @@ -1,5 +1,6 @@
  AUTOMAKE_OPTIONS = subdir-objects
  
 +include $(top_srcdir)/src/gallium/Automake.inc
  include Makefile.sources
  
  AM_CPPFLAGS = \
 @@ -32,6 +33,9 @@ cl_HEADERS = \
   $(top_srcdir)/include/CL/opencl.h
  endif
  
 +AM_CFLAGS = \
 + $(GALLIUM_CFLAGS)
 +
  noinst_LTLIBRARIES = libclover.la libcltgsi.la libclllvm.la
  
  libcltgsi_la_CXXFLAGS = \
 @@ -58,6 +62,6 @@ libclover_la_CXXFLAGS = \
  libclover_la_LIBADD = \
   libcltgsi.la libclllvm.la
  
 -libclover_la_SOURCES = $(CPP_SOURCES)
 +libclover_la_SOURCES = $(CPP_SOURCES) $(C_SOURCES)
  
  EXTRA_DIST = Doxyfile
 diff --git a/src/gallium/state_trackers/clover/Makefile.sources 
 b/src/gallium/state_trackers/clover/Makefile.sources
 index 10bbda0..90e6b7e 100644
 --- a/src/gallium/state_trackers/clover/Makefile.sources
 +++ b/src/gallium/state_trackers/clover/Makefile.sources
 @@ -53,6 +53,10 @@ CPP_SOURCES := \
   util/range.hpp \
   util/tuple.hpp
  
 +C_SOURCES := \
 + core/pipe_threadsafe_context.c \
 + core/pipe_threadsafe_screen.c
 +
  LLVM_SOURCES := \
   llvm/invocation.cpp
  
 diff --git a/src/gallium/state_trackers/clover/core/device.cpp 
 b/src/gallium/state_trackers/clover/core/device.cpp
 index 42b45b7..b145027 100644
 --- a/src/gallium/state_trackers/clover/core/device.cpp
 +++ b/src/gallium/state_trackers/clover/core/device.cpp
 @@ -22,6 +22,7 @@
  
  #include core/device.hpp
  #include core/platform.hpp
 +#include core/threadsafe.h
  #include pipe/p_screen.h
  #include pipe/p_state.h
  
 @@ -47,6 +48,7 @@ device::device(clover::platform platform, 
 pipe_loader_device *ldev) :
   pipe-destroy(pipe);
throw error(CL_INVALID_DEVICE);
 }
 +   pipe = pipe_threadsafe_screen(pipe);
  }
  
  device::~device() {
 diff --git a/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c 
 b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
 new file mode 100644
 index 000..f08f56c
 --- /dev/null
 +++ b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
 @@ -0,0 +1,272 @@
 +/*
 + * Copyright 2015 Advanced Micro Devices, Inc.
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a
 + * copy of this software and associated documentation files (the Software),
 + * to deal in the Software without restriction, including without limitation
 + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 + * and/or sell copies of the Software, and to permit persons to whom the
 + * Software is furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice (including the next
 + * paragraph) shall be included in all copies or substantial portions of the
 + * Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
 THE
 + * SOFTWARE.
 + *
 + * Authors: Tom Stellard thomas.stell...@amd.com
 + *
 + */
 +
 +#include stdio.h
 +
 +/**
 + * \file
 + *
 + * threadsafe_context is a wrapper around a pipe_context to make it thread
 + * safe.
 + */
 +
 +#include os/os_thread.h
 +#include pipe/p_context.h
 +#include util/u_memory.h
 +
 +#include threadsafe.h
 +
 +
 +
 +struct threadsafe_context {
 +   

Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
  Shader-db results for fragment shaders on Broadwell:
 
 total instructions in shared programs: 4310987 - 4310663 (-0.01%)
 instructions in affected programs: 43242 - 42918 (-0.75%)
 helped:142
 HURT:  0
 
  Shader-db results for vertex shaders on Broadwell:
 
 total instructions in shared programs: 2889581 - 2844309 (-1.57%)
 instructions in affected programs: 1418720 - 1373448 (-3.19%)
 helped:6139
 HURT:  0
  ---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  index 555987d..161a262 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -21,6 +21,8 @@
* IN THE SOFTWARE.
*/
 
  +#include algorithm
  +
   #include glsl/ir.h
   #include glsl/ir_optimization.h
   #include glsl/nir/glsl_to_nir.h
  @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 op[i] = offset(op[i], instr-src[i].swizzle[channel]);
  }
 
  +   /* Immediates can only be used as the second source of two-source
  +* instructions.  We have code in opt_algebraic to flip them as needed
  +* for most instructions.  However, it doesn't hurt anything to just do
  +* the right thing if we can detect it at the NIR level.
  +*/
  +   if ((nir_op_infos[instr-op].algebraic_properties  
  NIR_OP_IS_COMMUTATIVE) 
  +   nir_src_as_const_value(instr-src[0].src)) {
  +  std::swap(op[0], op[1]);
  +   }
  +

 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:

 brw_fs_copy_propagation.cpp:

/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst-opcode == BRW_OPCODE_MUL ||
 inst-opcode == BRW_OPCODE_MACH) 
(inst-src[1].type == BRW_REGISTER_TYPE_D ||
 inst-src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst-src[0] = inst-src[1];
inst-src[1] = val;
progress = true;

 Yeah.  I also wrote a patch to do that, adding

(brw-gen  8 || brw-is_cherryview) 

 which also solves the problem.  But it won't help on Cherryview, which I
 believe still has the asymmetrical multiply, while switching to shifts
 will.  I suppose another alternative to NIR late optimizations is to
 have brw_fs_nir's handling of imul check for power of two sizes and emit
 a SHL.  That would be easy.

 I do think we really need to make logical IMUL opcodes and lower them to
 MUL/MACH as needed later, so we don't need to worry about breaking MACH
 in cases like this.

Indeed. I mentioned yesterday I've been planning to do it for a while,
so I'll go ahead and take care of it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Ilia Mirkin
On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke kenn...@whitecape.org wrote:
 +   # Multiplication by 4 comes up fairly often in indirect offset 
 calculations.
 +   # Some GPUs have weird integer multiplication limitations, but shifts 
 should work
 +   # equally well everywhere.
 +   (('imul', 4, a), ('ishl', a, 2)),

Not sure what the cost of doing it this way, but really you want all
powers of 2... and also udiv - shr. Since this is python, should be
easy enough to append onto that list. AFAIK all GPU's prefer a shift
over a mul. Adreno doen't have 32-bit imul in the first place (and no
idiv either).

In nouveau/codegen we just have a single check for whether the
immediate is a power of 2, perhaps that can be encoded here in some
way.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Connor Abbott
On Fri, May 8, 2015 at 2:53 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76
 GAINED:0
 LOST:  8
 ---
  src/glsl/nir/nir_search.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index d86655b..4c83349 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,
}
 }

 +   /* Stash off the current variables_seen bitmask.  This way we can
 +* restore it prior to matching in the commutative case below. */

Puth the */ on the next line, and this is

Reviewed-by: Connor Abbott cwabbo...@gmail.com

 +   unsigned variables_seen_stash = state-variables_seen;
 +
 bool matched = true;
 for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
/* If the source is an explicitly sized source, then we need to reset
 @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,

 if (nir_op_infos[instr-op].algebraic_properties  NIR_OP_IS_COMMUTATIVE) 
 {
assert(nir_op_infos[instr-op].num_inputs == 2);
 +
 +  /* Restore the variables_seen bitmask.  If we don't do this, then we
 +   * could end up with an erroneous failure due to variables found in the
 +   * first match attempt above not matching those in the second.
 +   */
 +  state-variables_seen = variables_seen_stash;
 +
if (!match_value(expr-srcs[0], instr, 1, num_components,
 swizzle, state))
   return false;
 --
 2.4.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range

2015-05-08 Thread Connor Abbott
Reviewed-by: Connor Abbott cwabbo...@gmail.com

On Fri, May 8, 2015 at 3:13 PM, Jason Ekstrand ja...@jlekstrand.net wrote:
 ---
  src/glsl/nir/nir_search.c | 1 +
  1 file changed, 1 insertion(+)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index 5ba0160..d86655b 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr 
 *instr, unsigned src,

 case nir_search_value_variable: {
nir_search_variable *var = nir_search_value_as_variable(value);
 +  assert(var-variable  NIR_SEARCH_MAX_VARIABLES);

if (state-variables_seen  (1  var-variable)) {
   if (!nir_srcs_equal(state-variables[var-variable].src,
 --
 2.4.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x 2 during late optimizations.

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
 According to Glenn, shifts on R600 have 5x the throughput as multiplies.
 
 Intel GPUs have strange integer multiplication restrictions - on most
 hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
 means the arguments aren't commutative, which can limit our constant
 propagation options.  SHL has no such restrictions.
 
 Shifting is probably reasonable on most people's hardware, so let's just
 do that.
 
 i965 shader-db results (using NIR for VS):
 total instructions in shared programs: 7432587 - 7388982 (-0.59%)
 instructions in affected programs: 1360411 - 1316806 (-3.21%)
 helped:5772
 HURT:  0
 
 Signed-off-by: Kenneth Graunke kenn...@whitecape.org
 Cc: matts...@gmail.com
 Cc: ja...@jlekstrand.net
 ---
  src/glsl/nir/nir_opt_algebraic.py | 5 +
  1 file changed, 5 insertions(+)
 
 So...I found a bizarre issue with this patch.
 
(('imul', 4, a), ('ishl', a, 2)),
 
 totally optimizes things.  However...
 
(('imul', a, 4), ('ishl', a, 2)),
 
 doesn't seem to do anything, even though imul is commutative, and nir_search
 should totally handle that...
 
  ▄▄  ▄▄▄▄    ▄   ▄▄
  ██  ██   ▀▀▀██▀▀▀  ███  ██
  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
   ██ ██ ██   ██  ██  ██   ▄██▀   ██
   ███▀▀███   ██  ██   ██ ▀▀
   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀
 
 If you know why, let me know, otherwise I may have to look into it when more
 awake.

I've noticed a couple other weird things that I have been unable to
understand.  Shaders like the one below end with fmul/ffma instaed of
flrp, for example.  I understand why that happens from GLSL IR
opt_algebraic, but it seems like nir_opt_algebraic should handle it.

[require]
GLSL = 1.30

[vertex shader]
in vec4 v;
in vec2 tc_in;

out vec2 tc;

void main() {
gl_Position = v;
tc = tc_in;
}

[fragment shader]
in vec2 tc;

out vec4 color;

uniform sampler2D s;
uniform float a;
uniform vec3 base_color;

void main() {
vec3 tex_color = texture(s, tc).xyz;

color.xyz = (base_color * a) + (tex_color * (1.0 - a));
color.a = 1.0;
}



 diff --git a/src/glsl/nir/nir_opt_algebraic.py 
 b/src/glsl/nir/nir_opt_algebraic.py
 index 400d60e..350471f 100644
 --- a/src/glsl/nir/nir_opt_algebraic.py
 +++ b/src/glsl/nir/nir_opt_algebraic.py
 @@ -247,6 +247,11 @@ late_optimizations = [
 (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
 (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
 (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
 +
 +   # Multiplication by 4 comes up fairly often in indirect offset 
 calculations.
 +   # Some GPUs have weird integer multiplication limitations, but shifts 
 should work
 +   # equally well everywhere.
 +   (('imul', 4, a), ('ishl', a, 2)),

This should be conditionalized on whether the platform has native integers.

  ]
  
  print nir_algebraic.AlgebraicPass(nir_opt_algebraic, 
 optimizations).render()
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:15 AM, Matt Turner matts...@gmail.com wrote:
 On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke kenn...@whitecape.org 
 wrote:
 On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand ja...@jlekstrand.net 
 wrote:
  Shader-db results for fragment shaders on Broadwell:
 
 total instructions in shared programs: 4310987 - 4310663 (-0.01%)
 instructions in affected programs: 43242 - 42918 (-0.75%)
 helped:142
 HURT:  0
 
  Shader-db results for vertex shaders on Broadwell:
 
 total instructions in shared programs: 2889581 - 2844309 (-1.57%)
 instructions in affected programs: 1418720 - 1373448 (-3.19%)
 helped:6139
 HURT:  0
  ---
   src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
   1 file changed, 12 insertions(+)
 
  diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
  b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  index 555987d..161a262 100644
  --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
  @@ -21,6 +21,8 @@
* IN THE SOFTWARE.
*/
 
  +#include algorithm
  +
   #include glsl/ir.h
   #include glsl/ir_optimization.h
   #include glsl/nir/glsl_to_nir.h
  @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 op[i] = offset(op[i], instr-src[i].swizzle[channel]);
  }
 
  +   /* Immediates can only be used as the second source of two-source
  +* instructions.  We have code in opt_algebraic to flip them as 
  needed
  +* for most instructions.  However, it doesn't hurt anything to just 
  do
  +* the right thing if we can detect it at the NIR level.
  +*/
  +   if ((nir_op_infos[instr-op].algebraic_properties  
  NIR_OP_IS_COMMUTATIVE) 
  +   nir_src_as_const_value(instr-src[0].src)) {
  +  std::swap(op[0], op[1]);
  +   }
  +

 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:

 brw_fs_copy_propagation.cpp:

/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst-opcode == BRW_OPCODE_MUL ||
 inst-opcode == BRW_OPCODE_MACH) 
(inst-src[1].type == BRW_REGISTER_TYPE_D ||
 inst-src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst-src[0] = inst-src[1];
inst-src[1] = val;
progress = true;

 Yeah.  I also wrote a patch to do that, adding

(brw-gen  8 || brw-is_cherryview) 

 In that case, shouldn't Cherry View take the same path as hsw when
 emitting multiplies and look for 15-bit constants?

 Almost... that path needs to set one of the MUL's source types to W/UW
 and the stride to 2, like in commit 0c06d019. I'm planning to rip out
 all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp
 and move it to a common lowering pass, so I'll fix that at the same
 time.

Awesome!  Thanks for working on that!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range

2015-05-08 Thread Jason Ekstrand
---
 src/glsl/nir/nir_search.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index 5ba0160..d86655b 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
 
case nir_search_value_variable: {
   nir_search_variable *var = nir_search_value_as_variable(value);
+  assert(var-variable  NIR_SEARCH_MAX_VARIABLES);
 
   if (state-variables_seen  (1  var-variable)) {
  if (!nir_srcs_equal(state-variables[var-variable].src,
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/23] glsl/types: add new subroutine type

2015-05-08 Thread Chris Forbes
Patches 11-13 are:

Reviewed-by: Chris Forbes chr...@ijw.co.nz

On Fri, Apr 24, 2015 at 1:42 PM, Dave Airlie airl...@gmail.com wrote:
 From: Dave Airlie airl...@redhat.com

 This type will be used to store the name of subroutine types

 as in subroutine void myfunc(void);
 will store myfunc into a subroutine type.

 This is required to the parser can identify a subroutine
 type in a uniform decleration as a valid type, and also for
 looking up the type later.

 Also add contains_subroutine method.

 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  src/glsl/glsl_types.cpp| 63 
 ++
  src/glsl/glsl_types.h  | 19 ++
  src/glsl/ir_clone.cpp  |  1 +
  src/glsl/link_uniform_initializers.cpp |  1 +
  4 files changed, 84 insertions(+)

 diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
 index 9c9b7ef..37b5c62 100644
 --- a/src/glsl/glsl_types.cpp
 +++ b/src/glsl/glsl_types.cpp
 @@ -32,6 +32,7 @@ mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP;
  hash_table *glsl_type::array_types = NULL;
  hash_table *glsl_type::record_types = NULL;
  hash_table *glsl_type::interface_types = NULL;
 +hash_table *glsl_type::subroutine_types = NULL;
  void *glsl_type::mem_ctx = NULL;

  void
 @@ -159,6 +160,22 @@ glsl_type::glsl_type(const glsl_struct_field *fields, 
 unsigned num_fields,
 mtx_unlock(glsl_type::mutex);
  }

 +glsl_type::glsl_type(const char *subroutine_name) :
 +   gl_type(0),
 +   base_type(GLSL_TYPE_SUBROUTINE),
 +   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
 +   sampler_type(0), interface_packing(0),
 +   vector_elements(0), matrix_columns(0),
 +   length(0)
 +{
 +   mtx_lock(glsl_type::mutex);
 +
 +   init_ralloc_type_ctx();
 +   assert(subroutine_name != NULL);
 +   this-name = ralloc_strdup(this-mem_ctx, subroutine_name);
 +   this-vector_elements = 1;
 +   mtx_unlock(glsl_type::mutex);
 +}

  bool
  glsl_type::contains_sampler() const
 @@ -229,6 +246,22 @@ glsl_type::contains_opaque() const {
 }
  }

 +bool
 +glsl_type::contains_subroutine() const
 +{
 +   if (this-is_array()) {
 +  return this-fields.array-contains_subroutine();
 +   } else if (this-is_record()) {
 +  for (unsigned int i = 0; i  this-length; i++) {
 +if (this-fields.structure[i].type-contains_subroutine())
 +   return true;
 +  }
 +  return false;
 +   } else {
 +  return this-is_subroutine();
 +   }
 +}
 +
  gl_texture_index
  glsl_type::sampler_index() const
  {
 @@ -826,6 +859,34 @@ glsl_type::get_interface_instance(const 
 glsl_struct_field *fields,
 return t;
  }

 +const glsl_type *
 +glsl_type::get_subroutine_instance(const char *subroutine_name)
 +{
 +   const glsl_type key(subroutine_name);
 +
 +   mtx_lock(glsl_type::mutex);
 +
 +   if (subroutine_types == NULL) {
 +  subroutine_types = hash_table_ctor(64, record_key_hash, 
 record_key_compare);
 +   }
 +
 +   const glsl_type *t = (glsl_type *) hash_table_find(subroutine_types,  
 key);
 +   if (t == NULL) {
 +  mtx_unlock(glsl_type::mutex);
 +  t = new glsl_type(subroutine_name);
 +  mtx_lock(glsl_type::mutex);
 +
 +  hash_table_insert(subroutine_types, (void *) t, t);
 +   }
 +
 +   assert(t-base_type == GLSL_TYPE_SUBROUTINE);
 +   assert(strcmp(t-name, subroutine_name) == 0);
 +
 +   mtx_unlock(glsl_type::mutex);
 +
 +   return t;
 +}
 +

  const glsl_type *
  glsl_type::get_mul_type(const glsl_type *type_a, const glsl_type *type_b)
 @@ -958,6 +1019,7 @@ glsl_type::component_slots() const
 case GLSL_TYPE_SAMPLER:
 case GLSL_TYPE_ATOMIC_UINT:
 case GLSL_TYPE_VOID:
 +   case GLSL_TYPE_SUBROUTINE:
 case GLSL_TYPE_ERROR:
break;
 }
 @@ -1330,6 +1392,7 @@ glsl_type::count_attribute_slots() const
 case GLSL_TYPE_IMAGE:
 case GLSL_TYPE_ATOMIC_UINT:
 case GLSL_TYPE_VOID:
 +   case GLSL_TYPE_SUBROUTINE:
 case GLSL_TYPE_ERROR:
break;
 }
 diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
 index d383dd5..078adaf 100644
 --- a/src/glsl/glsl_types.h
 +++ b/src/glsl/glsl_types.h
 @@ -59,6 +59,7 @@ enum glsl_base_type {
 GLSL_TYPE_INTERFACE,
 GLSL_TYPE_ARRAY,
 GLSL_TYPE_VOID,
 +   GLSL_TYPE_SUBROUTINE,
 GLSL_TYPE_ERROR
  };

 @@ -276,6 +277,11 @@ struct glsl_type {
   const char *block_name);

 /**
 +* Get the instance of an subroutine type
 +*/
 +   static const glsl_type *get_subroutine_instance(const char 
 *subroutine_name);
 +
 +   /**
  * Get the type resulting from a multiplication of \p type_a * \p type_b
  */
 static const glsl_type *get_mul_type(const glsl_type *type_a,
 @@ -526,6 +532,13 @@ struct glsl_type {
 /**
  * Query if a type is unnamed/anonymous (named by the parser)
  */
 +
 +   bool is_subroutine() const
 +   {
 +  return base_type == GLSL_TYPE_SUBROUTINE;
 +   }
 +   bool contains_subroutine() const;
 +
 bool 

Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:20 PM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76

 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.

 Yes, I noticed this too.  Did you see my reply to Ken earlier today?
 The problem I noticed /seems/ restricted to cases where the would-be
 interpolant is scalar but the other values are vector.

It would surprise me a lot of that mattered.  At the point where we do
opt_algebraic in NIR, we've already scalarized things.  That said,
there is a *lot* going on in an optimization loop so anything's
possible.
--Jason

 --Jason

 GAINED:0
 LOST:  8
 ---
  src/glsl/nir/nir_search.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index d86655b..4c83349 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,
}
 }

 +   /* Stash off the current variables_seen bitmask.  This way we can
 +* restore it prior to matching in the commutative case below. */
 +   unsigned variables_seen_stash = state-variables_seen;
 +
 bool matched = true;
 for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
/* If the source is an explicitly sized source, then we need to reset
 @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,

 if (nir_op_infos[instr-op].algebraic_properties  
 NIR_OP_IS_COMMUTATIVE) {
assert(nir_op_infos[instr-op].num_inputs == 2);
 +
 +  /* Restore the variables_seen bitmask.  If we don't do this, then we
 +   * could end up with an erroneous failure due to variables found in 
 the
 +   * first match attempt above not matching those in the second.
 +   */
 +  state-variables_seen = variables_seen_stash;
 +
if (!match_value(expr-srcs[0], instr, 1, num_components,
 swizzle, state))
   return false;
 --
 2.4.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.

2015-05-08 Thread Ian Romanick
From: Ian Romanick ian.d.roman...@intel.com

Shader-db results:

GM45 and Iron Lake:
total instructions in shared programs: 7888585 - 7888585 (0.00%)
instructions in affected programs: 0 - 0

Sandy Bridge, Ivy Bridge, Haswell, and Broadwell:
total instructions in shared programs: 9598608 - 9598572 (-0.00%)
instructions in affected programs: 6506 - 6470 (-0.55%)
helped:36

Signed-off-by: Ian Romanick ian.d.roman...@intel.com
---
 src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index 469f2ea..d72a83a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block)
 
   if ((inst-opcode != BRW_OPCODE_AND 
inst-opcode != BRW_OPCODE_CMP 
-   inst-opcode != BRW_OPCODE_MOV) ||
+   inst-opcode != BRW_OPCODE_MOV 
+   inst-opcode != BRW_OPCODE_NOT) ||
   inst-predicate != BRW_PREDICATE_NONE ||
   !inst-dst.is_null() ||
   inst-src[0].file != GRF ||
@@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block)
   inst-conditional_mod != BRW_CONDITIONAL_NZ)
  continue;
 
+  if (inst-opcode == BRW_OPCODE_NOT 
+  inst-conditional_mod != BRW_CONDITIONAL_Z 
+  inst-conditional_mod != BRW_CONDITIONAL_NZ)
+ continue;
+
   bool read_flag = false;
   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
   block) {
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] clover: Add a mutex to guard queue::queued_events

2015-05-08 Thread Francisco Jerez
Tom Stellard thomas.stell...@amd.com writes:

 This fixes a potential crash where on a sequence like this:

 Thread 0: Check if queue is not empty.
 Thread 1: Remove item from queue, making it empty.
 Thread 0: Do something assuming queue is not empty.
 ---
  src/gallium/state_trackers/clover/core/queue.cpp | 2 ++
  src/gallium/state_trackers/clover/core/queue.hpp | 2 ++
  2 files changed, 4 insertions(+)

 diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
 b/src/gallium/state_trackers/clover/core/queue.cpp
 index 24f9326..87f9dcc 100644
 --- a/src/gallium/state_trackers/clover/core/queue.cpp
 +++ b/src/gallium/state_trackers/clover/core/queue.cpp
 @@ -44,6 +44,7 @@ command_queue::flush() {
 pipe_screen *screen = device().pipe;
 pipe_fence_handle *fence = NULL;
  
 +   std::lock_guardstd::mutex lock(queued_events_mutex);
 if (!queued_events.empty()) {
pipe-flush(pipe, fence, 0);
  
 @@ -69,6 +70,7 @@ command_queue::profiling_enabled() const {
  
  void
  command_queue::sequence(hard_event ev) {
 +   std::lock_guardstd::mutex lock(queued_events_mutex);
 if (!queued_events.empty())
queued_events.back()().chain(ev);
  
 diff --git a/src/gallium/state_trackers/clover/core/queue.hpp 
 b/src/gallium/state_trackers/clover/core/queue.hpp
 index b7166e6..bddb86c 100644
 --- a/src/gallium/state_trackers/clover/core/queue.hpp
 +++ b/src/gallium/state_trackers/clover/core/queue.hpp
 @@ -24,6 +24,7 @@
  #define CLOVER_CORE_QUEUE_HPP
  
  #include deque
 +#include mutex
  
  #include core/object.hpp
  #include core/context.hpp
 @@ -69,6 +70,7 @@ namespace clover {
  
cl_command_queue_properties props;
pipe_context *pipe;
 +  std::mutex queued_events_mutex;
std::dequeintrusive_refhard_event queued_events;
 };
  }
 -- 
 2.0.4

Thanks, this patch is:
Reviewed-by: Francisco Jerez curroje...@riseup.net


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76
 
 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.

Yes, I noticed this too.  Did you see my reply to Ken earlier today?
The problem I noticed /seems/ restricted to cases where the would-be
interpolant is scalar but the other values are vector.

 --Jason
 
 GAINED:0
 LOST:  8
 ---
  src/glsl/nir/nir_search.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
 index d86655b..4c83349 100644
 --- a/src/glsl/nir/nir_search.c
 +++ b/src/glsl/nir/nir_search.c
 @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,
}
 }

 +   /* Stash off the current variables_seen bitmask.  This way we can
 +* restore it prior to matching in the commutative case below. */
 +   unsigned variables_seen_stash = state-variables_seen;
 +
 bool matched = true;
 for (unsigned i = 0; i  nir_op_infos[instr-op].num_inputs; i++) {
/* If the source is an explicitly sized source, then we need to reset
 @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
 nir_alu_instr *instr,

 if (nir_op_infos[instr-op].algebraic_properties  
 NIR_OP_IS_COMMUTATIVE) {
assert(nir_op_infos[instr-op].num_inputs == 2);
 +
 +  /* Restore the variables_seen bitmask.  If we don't do this, then we
 +   * could end up with an erroneous failure due to variables found in 
 the
 +   * first match attempt above not matching those in the second.
 +   */
 +  state-variables_seen = variables_seen_stash;
 +
if (!match_value(expr-srcs[0], instr, 1, num_components,
 swizzle, state))
   return false;
 --
 2.4.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:31 PM, Ian Romanick wrote:
 On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 3:20 PM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net 
 wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76

 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.

 Yes, I noticed this too.  Did you see my reply to Ken earlier today?
 The problem I noticed /seems/ restricted to cases where the would-be
 interpolant is scalar but the other values are vector.

 It would surprise me a lot of that mattered.  At the point where we do
 opt_algebraic in NIR, we've already scalarized things.  That said,
 there is a *lot* going on in an optimization loop so anything's
 possible.
 
 If I take the shader_runner test that I included in the e-mail to Ken
 and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
 opt_algebraic to handle the mix of scalar and vector, and something like
 150 shaders were helped.
 
 I spent about an hour digging into it, and I came up dry.  I have tried
 adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
 flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
 I was going to ask about it at the office on Monday, but it came up on
 the list first.

FWIW, those rules were:

   (('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), 
'!options-lower_flrp'),
   (('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), 
'!options-lower_flrp'),

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 3:20 PM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net 
 wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76

 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.

 Yes, I noticed this too.  Did you see my reply to Ken earlier today?
 The problem I noticed /seems/ restricted to cases where the would-be
 interpolant is scalar but the other values are vector.
 
 It would surprise me a lot of that mattered.  At the point where we do
 opt_algebraic in NIR, we've already scalarized things.  That said,
 there is a *lot* going on in an optimization loop so anything's
 possible.

If I take the shader_runner test that I included in the e-mail to Ken
and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
opt_algebraic to handle the mix of scalar and vector, and something like
150 shaders were helped.

I spent about an hour digging into it, and I came up dry.  I have tried
adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
I was going to ask about it at the office on Monday, but it came up on
the list first.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:32 PM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 03:31 PM, Ian Romanick wrote:
 On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 3:20 PM, Ian Romanick i...@freedesktop.org wrote:
 On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand ja...@jlekstrand.net 
 wrote:
 total instructions in shared programs: 7152330 - 7137006 (-0.21%)
 instructions in affected programs: 1330548 - 1315224 (-1.15%)
 helped:5797
 HURT:  76

 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.

 Yes, I noticed this too.  Did you see my reply to Ken earlier today?
 The problem I noticed /seems/ restricted to cases where the would-be
 interpolant is scalar but the other values are vector.

 It would surprise me a lot of that mattered.  At the point where we do
 opt_algebraic in NIR, we've already scalarized things.  That said,
 there is a *lot* going on in an optimization loop so anything's
 possible.

 If I take the shader_runner test that I included in the e-mail to Ken
 and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
 opt_algebraic to handle the mix of scalar and vector, and something like
 150 shaders were helped.

Ok, that's weird.  I really have no idea why that would make a
difference.  Unless, of course, it affects the optimizations that GLSL
IR is able to do which then, in turn, affects NIR.  That would make
some sense.

 I spent about an hour digging into it, and I came up dry.  I have tried
 adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
 flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
 I was going to ask about it at the office on Monday, but it came up on
 the list first.

 FWIW, those rules were:

(('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, 
 c), '!options-lower_flrp'),
(('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, 
 c), '!options-lower_flrp'),

Did you add them to optimizations or late_optimizations?  The late
ones get run after opt_ffma and the others don't.

I poked around for an hour or two today with similar optimizations (I
actually had 6) but I wasn't able to get it to do any better.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress

2015-05-08 Thread Juha-Pekka Heikkilä
perjantai 8. toukokuuta 2015 Ian Romanick i...@freedesktop.org kirjoitti:

 On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote:
  On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote:
  Stop context creation if something failed. If something errored
  during context creation we'd segfault. Now will clean up and
  return error.
 
  Signed-off-by: Juha-Pekka Heikkila juhapekka.heikk...@gmail.com
 javascript:;
  ---
   src/mesa/main/shared.c | 66
 +++---
   1 file changed, 62 insertions(+), 4 deletions(-)
 
  diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c
  index 0b76cc0..cc05b05 100644
  --- a/src/mesa/main/shared.c
  +++ b/src/mesa/main/shared.c
  @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
 
  mtx_init(shared-Mutex, mtx_plain);
 
  +   /* Mutex and timestamp for texobj state validation */
  +   mtx_init(shared-TexMutex, mtx_recursive);
  +   shared-TextureStateStamp = 0;
 
  Do you really need to move this here?

 I was going to ask the same thing.  I think moving it here means that it
 can be unconditionally mtx_destroy'ed in the error path below.


Yes, Ian got it correct. When moving mutex creation here there is no need
to go checking about it if need to clean up. I though this makes things
more clean and simple.




 +
  shared-DisplayList = _mesa_NewHashTable();
  +   if (!shared-DisplayList)
  +  goto error_out;
  +
  shared-TexObjects = _mesa_NewHashTable();
  +   if (!shared-TexObjects)
  +  goto error_out;
  +
  shared-Programs = _mesa_NewHashTable();
  +   if (!shared-Programs)
  +  goto error_out;
 
  shared-DefaultVertexProgram =
 gl_vertex_program(ctx-Driver.NewProgram(ctx,
  @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
 
  GL_FRAGMENT_PROGRAM_ARB, 0));
 
  shared-ATIShaders = _mesa_NewHashTable();
  +   if (!shared-ATIShaders)
  +  goto error_out;
  +
  shared-DefaultFragmentShader = _mesa_new_ati_fragment_shader(ctx,
 0);
 
  shared-ShaderObjects = _mesa_NewHashTable();
  +   if (!shared-ShaderObjects)
  +  goto error_out;
 
  shared-BufferObjects = _mesa_NewHashTable();
  +   if (!shared-BufferObjects)
  +  goto error_out;
 
  /* GL_ARB_sampler_objects */
  shared-SamplerObjects = _mesa_NewHashTable();
  +   if (!shared-SamplerObjects)
  +  goto error_out;
 
  /* Allocate the default buffer object */
  shared-NullBufferObj = ctx-Driver.NewBufferObject(ctx, 0);
  +   if (!shared-NullBufferObj)
  +   goto error_out;
 
  /* Create default texture objects */
  for (i = 0; i  NUM_TEXTURE_TARGETS; i++) {
  @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
 };
 STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS);
 shared-DefaultTex[i] = ctx-Driver.NewTextureObject(ctx, 0,
 targets[i]);
  +
  +  if (!shared-DefaultTex[i])
  +  goto error_out;
  }
 
  /* sanity check */
  assert(shared-DefaultTex[TEXTURE_1D_INDEX]-RefCount == 1);
 
  -   /* Mutex and timestamp for texobj state validation */
  -   mtx_init(shared-TexMutex, mtx_recursive);
  -   shared-TextureStateStamp = 0;
  -
  shared-FrameBuffers = _mesa_NewHashTable();
  +   if (!shared-FrameBuffers)
  +  goto error_out;
  +
  shared-RenderBuffers = _mesa_NewHashTable();
  +   if (!shared-RenderBuffers)
  +  goto error_out;
 
  shared-SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer,
 _mesa_key_pointer_equal);
  +   if (!shared-SyncObjects)
  +  goto error_out;
 
  return shared;
  +
  +error_out:
  +   for (i = 0; i  NUM_TEXTURE_TARGETS; i++) {
  +  if (shared-DefaultTex[i]) {
  + ctx-Driver.DeleteTexture(ctx, shared-DefaultTex[i]);
  +  }
  +   }
  +
  +   _mesa_reference_buffer_object(ctx, shared-NullBufferObj, NULL);
  +
  +   _mesa_DeleteHashTable(shared-RenderBuffers);
  +   _mesa_DeleteHashTable(shared-FrameBuffers);
  +   _mesa_DeleteHashTable(shared-SamplerObjects);
  +   _mesa_DeleteHashTable(shared-BufferObjects);
  +   _mesa_DeleteHashTable(shared-ShaderObjects);
  +   _mesa_DeleteHashTable(shared-ATIShaders);
  +   _mesa_DeleteHashTable(shared-Programs);
  +   _mesa_DeleteHashTable(shared-TexObjects);
  +   _mesa_DeleteHashTable(shared-DisplayList);
  +
  +   _mesa_reference_vertprog(ctx, shared-DefaultVertexProgram, NULL);
  +   _mesa_reference_geomprog(ctx, shared-DefaultGeometryProgram,
 NULL);
  +   _mesa_reference_fragprog(ctx, shared-DefaultFragmentProgram,
 NULL);
  +
  +   mtx_destroy(shared-Mutex);
  +   mtx_destroy(shared-TexMutex);
  +
  +   free(shared);
  +   return NULL;
   }
 
 
  --
  1.8.5.1
 
  ___
  mesa-dev mailing list
  mesa-dev@lists.freedesktop.org javascript:;
  http://lists.freedesktop.org/mailman/listinfo/mesa-dev
  ___
  mesa-dev mailing 

Re: [Mesa-dev] [PATCH 11/27] i965: Store gather table information in the program data

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 06:17 PM, Pohjolainen, Topi wrote:
 On Tue, Apr 28, 2015 at 11:08:08PM +0300, Abdiel Janulgue wrote:
 The resource streamer is able to gather and pack sparsely-located
 constant data from any buffer object by a referring to a gather table
 This patch adds support for keeping track of these constant data
 fetches into a gather table.

 The gather table is generated from two sources. Ordinary uniform fetches
 are stored first. These are then combined with a separate table containing
 UBO entries. The separate entry for UBOs is needed to make it easier to
 generate the gather mask when combining and packing the constant data.

 Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
 ---
  src/mesa/drivers/dri/i965/brw_context.h  |  9 +
  src/mesa/drivers/dri/i965/brw_gs.c   |  4 
  src/mesa/drivers/dri/i965/brw_program.c  |  5 +
  src/mesa/drivers/dri/i965/brw_shader.cpp |  4 +++-
  src/mesa/drivers/dri/i965/brw_shader.h   | 11 +++
  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +
  src/mesa/drivers/dri/i965/brw_wm.c   |  5 +
  7 files changed, 42 insertions(+), 1 deletion(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
 b/src/mesa/drivers/dri/i965/brw_context.h
 index 7fd49e9..e25c64d 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.h
 +++ b/src/mesa/drivers/dri/i965/brw_context.h
 @@ -355,9 +355,12 @@ struct brw_stage_prog_data {
  
 GLuint nr_params;   /** number of float params/constants */
 GLuint nr_pull_params;
 +   GLuint nr_ubo_params;
 +   GLuint nr_gather_table;
 
 I would introduce these as non gl-types - we are trying to move away from
 them. Perhaps change nr_params and nr_pull_params while you are at it.
 
  
 unsigned curb_read_length;
 unsigned total_scratch;
 +   unsigned max_ubo_const_block;
  
 /**
  * Register where the thread expects to find input data from the URB
 @@ -375,6 +378,12 @@ struct brw_stage_prog_data {
  */
 const gl_constant_value **param;
 const gl_constant_value **pull_param;
 +   struct {
 +  int reg;
 +  unsigned channel_mask;
 +  unsigned const_block;
 +  unsigned const_offset;
 +   } *gather_table;
  };
 
 Below in brw_shader.h you do:
 
struct gather_table {
   int reg;
   unsigned channel_mask;
   unsigned const_block;
   unsigned const_offset;
};
gather_table *ubo_gather_table;
 
 Why not here?

I guess you're right. Yep, I can probably re-use the same definition in
the next version. Thanks!

 
  
  /* Data about a particular attempt to compile a program.  Note that
 diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
 b/src/mesa/drivers/dri/i965/brw_gs.c
 index bea90d8..97658d5 100644
 --- a/src/mesa/drivers/dri/i965/brw_gs.c
 +++ b/src/mesa/drivers/dri/i965/brw_gs.c
 @@ -70,6 +70,10 @@ brw_compile_gs_prog(struct brw_context *brw,
 c.prog_data.base.base.pull_param =
rzalloc_array(NULL, const gl_constant_value *, param_count);
 c.prog_data.base.base.nr_params = param_count;
 +   c.prog_data.base.base.nr_gather_table = 0;
 +   c.prog_data.base.base.gather_table =
 +  rzalloc_size(NULL, sizeof(*c.prog_data.base.base.gather_table) *
 +   (c.prog_data.base.base.nr_params + 
 c.prog_data.base.base.nr_ubo_params));
 
 Wrap this line.
 
  
 if (brw-gen = 7) {
if (gp-program.OutputType == GL_POINTS) {
 diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
 b/src/mesa/drivers/dri/i965/brw_program.c
 index 81a0c19..f27c799 100644
 --- a/src/mesa/drivers/dri/i965/brw_program.c
 +++ b/src/mesa/drivers/dri/i965/brw_program.c
 @@ -573,6 +573,10 @@ brw_stage_prog_data_compare(const struct 
 brw_stage_prog_data *a,
 if (memcmp(a-pull_param, b-pull_param, a-nr_pull_params * sizeof(void 
 *)))
return false;
  
 +   if (memcmp(a-gather_table, b-gather_table,
 +  a-nr_gather_table * sizeof(*a-gather_table)))
 +  return false;
 +
 return true;
  }
  
 @@ -583,6 +587,7 @@ brw_stage_prog_data_free(const void *p)
  
 ralloc_free(prog_data-param);
 ralloc_free(prog_data-pull_param);
 +   ralloc_free(prog_data-gather_table);
  }
  
  void
 diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
 b/src/mesa/drivers/dri/i965/brw_shader.cpp
 index 0d6ac0c..8769f67 100644
 --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
 @@ -739,11 +739,13 @@ backend_visitor::backend_visitor(struct brw_context 
 *brw,
   prog(prog),
   stage_prog_data(stage_prog_data),
   cfg(NULL),
 - stage(stage)
 + stage(stage),
 + ubo_gather_table(NULL)
  {
 debug_enabled = INTEL_DEBUG  intel_debug_flag_for_shader_stage(stage);
 stage_name = _mesa_shader_stage_to_string(stage);
 stage_abbrev = _mesa_shader_stage_to_abbrev(stage);
 +   this-nr_ubo_gather_table = 0;
 
 Any particular reason not to do this in the initializer along with the other
 members?

Nothing really. Yeah, it makes sense to put 

[Mesa-dev] [PATCH 3/9] nir: add an optimization for removing dead control flow

2015-05-08 Thread Connor Abbott
I'm not so sure about where to put the helper currently in nir.c... on
the one hand, it's pretty specific to this pass, but on the other hand
it needs to do some very fiddly low-level things to the control flow
which is why it needs access to a static function in nir.c
(stitch_blocks()) that I'd rather not expose publically.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.c |  26 +++
 src/glsl/nir/nir.h |   8 +++
 src/glsl/nir/nir_opt_dead_cf.c | 150 +
 3 files changed, 184 insertions(+)
 create mode 100644 src/glsl/nir/nir_opt_dead_cf.c

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index 362ac30..efc3f01 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1281,6 +1281,32 @@ nir_cf_node_remove(nir_cf_node *node)
}
 }
 
+/* Takes a control flow list 'cf_list,' presumed to be a child of the control
+ * flow node 'node,' pastes cf_list after node, and then deletes node.
+ */
+
+void
+nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list)
+{
+   nir_cf_node *after = nir_cf_node_next(node);
+   assert(after-type == nir_cf_node_block);
+   nir_block *after_block = nir_cf_node_as_block(after);
+
+   foreach_list_typed(nir_cf_node, child, node, cf_list) {
+  child-parent = node-parent;
+   }
+
+   nir_cf_node *last = exec_node_data(nir_cf_node, exec_list_get_tail(cf_list),
+  node);
+   assert(last-type == nir_cf_node_block);
+   nir_block *last_block = nir_cf_node_as_block(last);
+
+   exec_node_insert_list_before(after-node, cf_list);
+   stitch_blocks(last_block, after_block);
+
+   nir_cf_node_remove(node);
+}
+
 static bool
 add_use_cb(nir_src *src, void *state)
 {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 697d37e..d2bf3d7 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1518,6 +1518,12 @@ void nir_cf_node_insert_end(struct exec_list *list, 
nir_cf_node *node);
 /** removes a control flow node, doing any cleanup necessary */
 void nir_cf_node_remove(nir_cf_node *node);
 
+
+/** Takes a control flow list 'cf_list,' presumed to be a child of the control
+ *  flow node 'node,' pastes cf_list after node, and then deletes node.
+ */
+void nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list);
+
 /** requests that the given pieces of metadata be generated */
 void nir_metadata_require(nir_function_impl *impl, nir_metadata required);
 /** dirties all but the preserved metadata */
@@ -1692,6 +1698,8 @@ bool nir_opt_cse(nir_shader *shader);
 bool nir_opt_dce_impl(nir_function_impl *impl);
 bool nir_opt_dce(nir_shader *shader);
 
+bool nir_opt_dead_cf(nir_shader *shader);
+
 void nir_opt_gcm(nir_shader *shader);
 
 bool nir_opt_peephole_select(nir_shader *shader);
diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
new file mode 100644
index 000..3fbb794
--- /dev/null
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright © 2014 Connor Abbott
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the Software),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Connor Abbott (cwabbo...@gmail.com)
+ *
+ */
+
+#include nir.h
+
+/*
+ * This file implements an optimization that deletes statically unreachable
+ * code. In NIR, one way this can happen if if an if statement has a constant
+ * condition:
+ *
+ * if (true) {
+ *...
+ * }
+ *
+ * We delete the if statement and paste the contents of the always-executed
+ * branch into the surrounding control flow, possibly removing more code if
+ * the branch had a jump at the end.
+ */
+
+/* deletes all the control flow nodes after 'start' in the control flow list.
+ */
+static void
+delete_unreachable_after(nir_cf_node *start)
+{
+   nir_cf_node *current = start;
+   while (!nir_cf_node_is_last(current)) {
+  current = nir_cf_node_next(current);
+  nir_cf_node_remove(current);
+   }
+}
+

[Mesa-dev] [PATCH 9/9] i965/fs/nir: enable the dead control flow optimization

2015-05-08 Thread Connor Abbott
Doesn't do anything on the public shader-db.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/Makefile.sources   | 1 +
 src/mesa/drivers/dri/i965/brw_nir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index d784a81..7bdd895 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -50,6 +50,7 @@ NIR_FILES = \
nir/nir_opt_copy_propagate.c \
nir/nir_opt_cse.c \
nir/nir_opt_dce.c \
+   nir/nir_opt_dead_cf.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_peephole_ffma.c \
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index de4d7aa..4001190 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -52,6 +52,8 @@ nir_optimize(nir_shader *nir)
   nir_validate_shader(nir);
   progress |= nir_opt_constant_folding(nir);
   nir_validate_shader(nir);
+  progress |= nir_opt_dead_cf(nir);
+  nir_validate_shader(nir);
   progress |= nir_opt_remove_phis(nir);
   nir_validate_shader(nir);
} while (progress);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] nir: add nir_block_get_following_loop() helper

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.c | 16 
 src/glsl/nir/nir.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index efc3f01..e3c37dd 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -2097,6 +2097,22 @@ nir_block_get_following_if(nir_block *block)
return nir_cf_node_as_if(next_node);
 }
 
+nir_loop *
+nir_block_get_following_loop(nir_block *block)
+{
+   if (exec_node_is_tail_sentinel(block-cf_node.node))
+  return NULL;
+
+   if (nir_cf_node_is_last(block-cf_node))
+  return NULL;
+
+   nir_cf_node *next_node = nir_cf_node_next(block-cf_node);
+
+   if (next_node-type != nir_cf_node_loop)
+  return NULL;
+
+   return nir_cf_node_as_loop(next_node);
+}
 static bool
 index_block(nir_block *block, void *state)
 {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index d2bf3d7..69daa14 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1607,6 +1607,8 @@ bool nir_foreach_block_reverse(nir_function_impl *impl, 
nir_foreach_block_cb cb,
  */
 nir_if *nir_block_get_following_if(nir_block *block);
 
+nir_loop *nir_block_get_following_loop(nir_block *block);
+
 void nir_index_local_regs(nir_function_impl *impl);
 void nir_index_global_regs(nir_shader *shader);
 void nir_index_ssa_defs(nir_function_impl *impl);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/9] nir/dead_cf: add support for removing useless loops

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir_opt_dead_cf.c | 96 --
 1 file changed, 84 insertions(+), 12 deletions(-)

diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
index 17eadbd..f96bf63 100644
--- a/src/glsl/nir/nir_opt_dead_cf.c
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -28,9 +28,9 @@
 #include nir.h
 
 /*
- * This file implements an optimization that deletes statically unreachable
- * code. In NIR, one way this can happen if if an if statement has a constant
- * condition:
+ * This file implements an optimization that deletes statically
+ * unreachable/dead code. In NIR, one way this can happen if if an if
+ * statement has a constant condition:
  *
  * if (true) {
  *...
@@ -40,7 +40,7 @@
  * branch into the surrounding control flow, possibly removing more code if
  * the branch had a jump at the end.
  *
- * The other way is that control flow can end in a jump so that code after it
+ * Another way is that control flow can end in a jump so that code after it
  * never gets executed. In particular, this can happen after optimizing
  * something like:
  *
@@ -59,6 +59,12 @@
  * }
  * ...
  *
+ * Finally, we also handle removing useless loops, i.e. loops with no side
+ * effects and without any definitions that are used elsewhere. This case is a
+ * little different from the first two in that the code is actually run (it
+ * just never does anything), but there are similar issues with needing to
+ * be careful with restarting after deleting the cf_node (see dead_cf_list())
+ * so this is a convenient place to remove them.
  */
 
 /* deletes all the control flow nodes after 'start' in the control flow list.
@@ -126,19 +132,82 @@ opt_constant_if(nir_if *if_stmt, bool condition)
 }
 
 static bool
+block_has_no_side_effects(nir_block *block, void *state)
+{
+   (void) state;
+
+   nir_foreach_instr(block, instr) {
+  if (instr-type == nir_instr_type_call)
+ return false;
+
+  if (instr-type != nir_instr_type_intrinsic)
+ continue;
+
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  if (!nir_intrinsic_infos[intrin-intrinsic].flags 
+  NIR_INTRINSIC_CAN_ELIMINATE)
+ return false;
+   }
+
+   return true;
+}
+
+static bool
+dest_not_live_out(nir_dest *dest, void *state)
+{
+   nir_block *after = state;
+
+   assert(dest-is_ssa);
+   return !BITSET_TEST(after-live_in, dest-ssa.live_index);
+}
+
+static bool
+loop_is_dead(nir_loop *loop)
+{
+   nir_block *before = nir_cf_node_as_block(nir_cf_node_prev(loop-cf_node));
+   nir_block *after = nir_cf_node_as_block(nir_cf_node_next(loop-cf_node));
+
+   if (!exec_list_is_empty(after-instr_list) 
+   nir_block_last_instr(after)-type == nir_instr_type_phi)
+  return false;
+
+   if (!nir_foreach_block_in_cf_node(loop-cf_node, block_has_no_side_effects,
+ NULL))
+  return false;
+
+   for (nir_block *cur = after-imm_dom; cur != before; cur = cur-imm_dom) {
+  nir_foreach_instr(cur, instr) {
+ if (!nir_foreach_dest(instr, dest_not_live_out, after))
+return false;
+  }
+   }
+
+   return true;
+}
+
+static bool
 dead_cf_block(nir_block *block)
 {
nir_if *following_if = nir_block_get_following_if(block);
-   if (!following_if)
-  return false;
+   if (following_if) {
+ nir_const_value *const_value =
+nir_src_as_const_value(following_if-condition);
+
+ if (!const_value)
+return false;
+
+  opt_constant_if(following_if, const_value-u[0] != 0);
+  return true;
+   }
 
-  nir_const_value *const_value =
- nir_src_as_const_value(following_if-condition);
+   nir_loop *following_loop = nir_block_get_following_loop(block);
+   if (!following_loop)
+  return false;
 
-  if (!const_value)
- return false;
+   if (!loop_is_dead(following_loop))
+  return false;
 
-   opt_constant_if(following_if, const_value-u[0] != 0);
+   nir_cf_node_remove(following_loop-cf_node);
return true;
 }
 
@@ -167,7 +236,7 @@ dead_cf_list(struct exec_list *list, bool 
*list_ends_in_jump)
   case nir_cf_node_block: {
  nir_block *block = nir_cf_node_as_block(cur);
  if (dead_cf_block(block)) {
-/* We just deleted the if after this block, so we may have
+/* We just deleted the if or loop after this block, so we may have
  * deleted the block before or after it -- which one is an
  * implementation detail. Therefore, to recover the place we were
  * at, we have to use the previous cf_node.
@@ -238,6 +307,9 @@ dead_cf_list(struct exec_list *list, bool 
*list_ends_in_jump)
 static bool
 opt_dead_cf_impl(nir_function_impl *impl)
 {
+   nir_metadata_require(impl, nir_metadata_live_variables |
+  nir_metadata_dominance);
+
bool dummy;
bool progress = dead_cf_list(impl-body, dummy);
 
-- 
2.1.0


[Mesa-dev] [PATCH 4/9] nir/validate: validate successors at the end of a loop

2015-05-08 Thread Connor Abbott
I found this useful while debugging some control flow bugs while working
on the dead control flow pass.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir_validate.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
index da92ed9..0332c7d 100644
--- a/src/glsl/nir/nir_validate.c
+++ b/src/glsl/nir/nir_validate.c
@@ -78,6 +78,9 @@ typedef struct {
/* the parent of the current cf node being visited */
nir_cf_node *parent_node;
 
+   /* whether the current loop we're visiting has a break statement */
+   bool has_break;
+
/* the current function implementation being validated */
nir_function_impl *impl;
 
@@ -513,6 +516,8 @@ validate_instr(nir_instr *instr, validate_state *state)
   break;
 
case nir_instr_type_jump:
+  if (nir_instr_as_jump(instr)-type == nir_jump_break)
+ state-has_break = true;
   break;
 
default:
@@ -641,6 +646,9 @@ validate_if(nir_if *if_stmt, validate_state *state)
 static void
 validate_loop(nir_loop *loop, validate_state *state)
 {
+   bool old_has_break = state-has_break;
+   state-has_break = false;
+
assert(!exec_node_is_head_sentinel(loop-cf_node.node.prev));
nir_cf_node *prev_node = nir_cf_node_prev(loop-cf_node);
assert(prev_node-type == nir_cf_node_block);
@@ -663,7 +671,20 @@ validate_loop(nir_loop *loop, validate_state *state)
   validate_cf_node(cf_node, state);
}
 
+   nir_block *last_block = nir_cf_node_as_block(nir_loop_last_cf_node(loop));
+   if (exec_list_is_empty(last_block-instr_list) ||
+   nir_block_last_instr(last_block)-type != nir_instr_type_jump) {
+  assert(last_block-successors[0]-cf_node == 
nir_loop_first_cf_node(loop));
+   }
+   if (state-has_break) {
+  assert(last_block-successors[1] == NULL);
+   } else {
+  nir_block *next = nir_cf_node_as_block(nir_cf_node_next(loop-cf_node));
+  assert(last_block-successors[1] == next);
+   }
+
state-parent_node = old_parent;
+   state-has_break = old_has_break;
 }
 
 static void
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] nir: add a helper for iterating over blocks in a cf node

2015-05-08 Thread Connor Abbott
We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.c | 7 +++
 src/glsl/nir/nir.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index e3c37dd..3cabdf7 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -2055,6 +2055,13 @@ foreach_cf_node(nir_cf_node *node, nir_foreach_block_cb 
cb,
 }
 
 bool
+nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb,
+ void *state)
+{
+   return foreach_cf_node(node, cb, false, state);
+}
+
+bool
 nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb, void 
*state)
 {
foreach_list_typed_safe(nir_cf_node, node, node, impl-body) {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 69daa14..4284374 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1601,6 +1601,8 @@ bool nir_foreach_block(nir_function_impl *impl, 
nir_foreach_block_cb cb,
void *state);
 bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb 
cb,
void *state);
+bool nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb,
+  void *state);
 
 /* If the following CF node is an if, this function returns that if.
  * Otherwise, it returns NULL.
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] NIR dead control-flow removal

2015-05-08 Thread Connor Abbott
This series implements a bunch of related optimizations that run at
once as part of the same pass to eliminate control flow that is
useless. This includes both unreachable code and useless loops, i.e.
loops that don't compute a result used by the rest of the code. To do
this, I needed to change some low-level things that change the behavior
of nir_instr_remove() (see patch 2 for details), but I've tested the
series on piglit and there are no regressions, and there are no
shader-db changes either.

This is rebased on Jason's series to use lists for use-def sets; it
turns out that the changes required were pretty minimal.

The series is also available at:

git://people.freedesktop.org/~cwabbott0/mesa nir-dead-cf-v3

Connor Abbott (9):
  nir/vars_to_ssa: don't rewrite removed instructions
  nir: insert ssa_undef instructions when cleanup up defs/uses
  nir: add an optimization for removing dead control flow
  nir/validate: validate successors at the end of a loop
  nir/dead_cf: delete code that's unreachable due to jumps
  nir: add nir_block_get_following_loop() helper
  nir: add a helper for iterating over blocks in a cf node
  nir/dead_cf: add support for removing useless loops
  i965/fs/nir: enable the dead control flow optimization

 src/glsl/Makefile.sources|   1 +
 src/glsl/nir/nir.c   |  94 --
 src/glsl/nir/nir.h   |  12 ++
 src/glsl/nir/nir_lower_vars_to_ssa.c |   3 +-
 src/glsl/nir/nir_opt_dead_cf.c   | 332 +++
 src/glsl/nir/nir_validate.c  |  21 +++
 src/mesa/drivers/dri/i965/brw_nir.c  |   2 +
 7 files changed, 452 insertions(+), 13 deletions(-)
 create mode 100644 src/glsl/nir/nir_opt_dead_cf.c

-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions

2015-05-08 Thread Connor Abbott
We were rewriting the uses of the intrinsic instruction to point to
something else after removing it, which only happened to work since we
were lax about having dangling uses when removing instructions. Fix that
up.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c 
b/src/glsl/nir/nir_lower_vars_to_ssa.c
index ccb8f99..8d0ae1b 100644
--- a/src/glsl/nir/nir_lower_vars_to_ssa.c
+++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
@@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct 
lower_variables_state *state)
   intrin-num_components, NULL);
 
 nir_instr_insert_before(intrin-instr, mov-instr);
-nir_instr_remove(intrin-instr);
 
 nir_ssa_def_rewrite_uses(intrin-dest.ssa,
  nir_src_for_ssa(mov-dest.dest.ssa),
  state-shader);
+
+nir_instr_remove(intrin-instr);
 break;
  }
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses

2015-05-08 Thread Connor Abbott
The point of cleanup_defs_uses() is to make an instruction safe to
remove by removing any references that the rest of the shader may have
to it. Previously, it was handling register use/def sets and removing
the instruction from the use sets of any SSA sources it had, but if the
instruction defined an SSA value that was used by other instructions it
wasn't doing anything. This was ok, since we were always careful to make
sure that no removed instruction ever had any uses, but now we want to
start removing unreachable instruction which might still be used in
reachable parts of the code. In that case, the value that any uses get
is undefined since the instruction never actually executes, so we can
just replace the instruction with an ssa_undef_instr.

Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir.c | 47 ++-
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index f03e80a..362ac30 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after)
 }
 
 static void
-remove_defs_uses(nir_instr *instr);
+remove_defs_uses(nir_instr *instr, nir_function_impl *impl);
 
 static void
-cleanup_cf_node(nir_cf_node *node)
+cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
 {
switch (node-type) {
case nir_cf_node_block: {
   nir_block *block = nir_cf_node_as_block(node);
   /* We need to walk the instructions and clean up defs/uses */
-  nir_foreach_instr(block, instr)
- remove_defs_uses(instr);
+  nir_foreach_instr(block, instr) {
+ if (instr-type == nir_instr_type_jump)
+handle_remove_jump(block, nir_instr_as_jump(instr)-type);
+ remove_defs_uses(instr, impl);
+  }
   break;
}
 
case nir_cf_node_if: {
   nir_if *if_stmt = nir_cf_node_as_if(node);
   foreach_list_typed(nir_cf_node, child, node, if_stmt-then_list)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   foreach_list_typed(nir_cf_node, child, node, if_stmt-else_list)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
 
   list_del(if_stmt-condition.use_link);
   break;
@@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node)
case nir_cf_node_loop: {
   nir_loop *loop = nir_cf_node_as_loop(node);
   foreach_list_typed(nir_cf_node, child, node, loop-body)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   break;
}
case nir_cf_node_function: {
-  nir_function_impl *impl = nir_cf_node_as_function(node);
   foreach_list_typed(nir_cf_node, child, node, impl-body)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   break;
}
default:
@@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node)
nir_function_impl *impl = nir_cf_node_get_function(node);
nir_metadata_preserve(impl, nir_metadata_none);
 
+   cleanup_cf_node(node, impl);
+
if (node-type == nir_cf_node_block) {
   /*
* Basic blocks can't really be removed by themselves, since they act as
@@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node)
   exec_node_remove(node-node);
   stitch_blocks(before_block, after_block);
}
-
-   cleanup_cf_node(node);
 }
 
 static bool
@@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state)
return true;
 }
 
+static bool
+remove_ssa_def_cb(nir_ssa_def *def, void *state)
+{
+   nir_function_impl *impl = state;
+   nir_shader *shader = impl-overload-function-shader;
+
+   if (!list_empty(def-uses) || !list_empty(def-if_uses)) {
+  nir_ssa_undef_instr *undef =
+ nir_ssa_undef_instr_create(shader, def-num_components);
+  nir_instr_insert_before_cf_list(impl-body, undef-instr);
+  nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(undef-def), shader);
+   }
+
+   return true;
+}
+
+
 static void
-remove_defs_uses(nir_instr *instr)
+remove_defs_uses(nir_instr *instr, nir_function_impl *impl)
 {
nir_foreach_dest(instr, remove_def_cb, instr);
nir_foreach_src(instr, remove_use_cb, instr);
+   nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl);
 }
 
 void nir_instr_remove(nir_instr *instr)
 {
-   remove_defs_uses(instr);
+   nir_function_impl *impl = nir_cf_node_get_function(instr-block-cf_node);
+   remove_defs_uses(instr, impl);
exec_node_remove(instr-node);
 
if (instr-type == nir_instr_type_jump) {
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] nvc0/ir: optimize set 1.0 to produce boolean-float sets

2015-05-08 Thread Ilia Mirkin
This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 ++
 .../nouveau/codegen/nv50_ir_target_nv50.cpp|  2 ++
 2 files changed, 31 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 14446b6..d8af19a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
imm0, int s)
}
   break;
 
+   case OP_AND:
+   {
+  CmpInstruction *cmp = i-getSrc(t)-getInsn()-asCmp();
+  if (!cmp || cmp-op == OP_SLCT)
+ return;
+  if (!prog-getTarget()-isOpSupported(cmp-op, TYPE_F32))
+ return;
+  if (imm0.reg.data.f32 != 1.0)
+ return;
+  if (cmp == NULL)
+ return;
+  if (i-getSrc(t)-getInsn()-dType != TYPE_U32)
+ return;
+
+  i-getSrc(t)-getInsn()-dType = TYPE_F32;
+  if (i-src(t).mod != Modifier(0)) {
+ assert(i-src(t).mod == Modifier(NV50_IR_MOD_NOT));
+ i-src(t).mod = Modifier(0);
+ cmp-setCond = reverseCondCode(cmp-setCond);
+  }
+  i-op = OP_MOV;
+  i-setSrc(s, NULL);
+  if (t) {
+ i-setSrc(0, i-getSrc(t));
+ i-setSrc(t, NULL);
+  }
+   }
+  break;
+
case OP_SHL:
{
   if (s != 1 || i-src(0).mod != Modifier(0))
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 178a167..70180eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
   return false;
case OP_SAD:
   return ty == TYPE_S32;
+   case OP_SET:
+  return !isFloatType(ty);
default:
   return true;
}
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg

2015-05-08 Thread Ilia Mirkin
This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0, however it may, itself, have already been
combined with another comparison. That shouldn't impact the logic of
this pass however. With this and the  1.0 change, code like

0020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e80
0028: 001c 201fc000 and b32 $r0 $r0 0x3f80
0030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
0038: 003c 1980 $p0 discard

becomes

0020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e80
0028: 003c 1980 $p0 discard

Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 51 ++
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index d8af19a..43a2fe9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -278,7 +278,6 @@ private:
 
void tryCollapseChainedMULs(Instruction *, const int s, ImmediateValue);
 
-   // TGSI 'true' is converted to -1 by F2I(NEG(SET)), track back to SET
CmpInstruction *findOriginForTestWithZero(Value *);
 
unsigned int foldCount;
@@ -337,16 +336,10 @@ ConstantFolding::findOriginForTestWithZero(Value *value)
   return NULL;
Instruction *insn = value-getInsn();
 
-   while (insn  insn-op != OP_SET) {
+   while (insn  insn-op != OP_SET  insn-op != OP_SET_AND 
+  insn-op != OP_SET_OR  insn-op != OP_SET_XOR) {
   Instruction *next = NULL;
   switch (insn-op) {
-  case OP_NEG:
-  case OP_ABS:
-  case OP_CVT:
- next = insn-getSrc(0)-getInsn();
- if (insn-sType != next-dType)
-return NULL;
- break;
   case OP_MOV:
  next = insn-getSrc(0)-getInsn();
  break;
@@ -946,29 +939,51 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
imm0, int s)
 
case OP_SET: // TODO: SET_AND,OR,XOR
{
+  /* This optimizes the case where the output of a set is being compared
+   * to zero. Since the set can only produce 0/-1 (int) or 0/1 (float), we
+   * can be a lot cleverer in our comparison.
+   */
   CmpInstruction *si = findOriginForTestWithZero(i-getSrc(t));
   CondCode cc, ccZ;
-  if (i-src(t).mod != Modifier(0))
- return;
-  if (imm0.reg.data.u32 != 0 || !si || si-op != OP_SET)
+  if (imm0.reg.data.u32 != 0 || !si)
  return;
   cc = si-setCond;
   ccZ = (CondCode)((unsigned int)i-asCmp()-setCond  ~CC_U);
+  // We do everything assuming var (cmp) 0, reverse the condition if 0 is
+  // first.
   if (s == 0)
  ccZ = reverseCondCode(ccZ);
+  // If there is a negative modifier, we need to undo that, by flipping
+  // the comparison to zero.
+  if (i-src(t).mod.neg())
+ ccZ = reverseCondCode(ccZ);
+  // If this is a signed comparison, we expect the input to be a regular
+  // boolean, i.e. 0/-1. However the rest of the logic assumes that true
+  // is positive, so just flip the sign.
+  if (i-sType == TYPE_S32) {
+ assert(!isFloatType(si-dType));
+ ccZ = reverseCondCode(ccZ);
+  }
   switch (ccZ) {
-  case CC_LT: cc = CC_FL; break;
-  case CC_GE: cc = CC_TR; break;
-  case CC_EQ: cc = inverseCondCode(cc); break;
-  case CC_LE: cc = inverseCondCode(cc); break;
-  case CC_GT: break;
-  case CC_NE: break;
+  case CC_LT: cc = CC_FL; break; // bool  0 -- this is never true
+  case CC_GE: cc = CC_TR; break; // bool = 0 -- this is always true
+  case CC_EQ: cc = inverseCondCode(cc); break; // bool == 0 -- !bool
+  case CC_LE: cc = inverseCondCode(cc); break; // bool = 0 -- !bool
+  case CC_GT: break; // bool  0 -- bool
+  case CC_NE: break; // bool != 0 -- bool
   default:
  return;
   }
+
+  // Update the condition of this SET to be identical to the origin set,
+  // but with the updated condition code. The original SET should get
+  // DCE'd, ideally.
+  i-op = si-op;
   i-asCmp()-setCond = cc;
   i-setSrc(0, si-src(0));
   i-setSrc(1, si-src(1));
+  if (si-srcExists(2))
+ i-setSrc(2, si-src(2));
   i-sType = si-sType;
}
   break;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] nvc0/ir: allow iset to produce a boolean float

2015-05-08 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 12 
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  8 +++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 28081fa..ab8bf2e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -967,8 +967,8 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i)
   code[0] = (code[0]  ~0xfc) | ((code[0]  3)  0xe0);
   if (i-defExists(1))
  defId(i-def(1), 2);
-   else
-  code[0] |= 0x1c;
+  else
+ code[0] |= 0x1c;
} else {
   switch (i-sType) {
   case TYPE_F32: op2 = 0x000; op1 = 0x800; break;
@@ -990,8 +990,12 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i)
   }
   FTZ_(3a);
 
-  if (i-dType == TYPE_F32)
- code[1] |= 1  23;
+  if (i-dType == TYPE_F32) {
+ if (isFloatType(i-sType))
+code[1] |= 1  23;
+ else
+code[1] |= 1  15;
+  }
}
if (i-sType == TYPE_S32)
   code[1] |= 1  19;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 442cedf..399a6f1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -1830,6 +1830,7 @@ CodeEmitterGM107::emitISET()
emitCond3(0x31, insn-setCond);
emitField(0x30, 1, isSignedType(insn-sType));
emitCC   (0x2f);
+   emitField(0x2c, 1, insn-dType == TYPE_F32);
emitX(0x2b);
emitGPR  (0x08, insn-src(0));
emitGPR  (0x00, insn-def(0));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index c241973..f5992bc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1078,8 +1078,14 @@ CodeEmitterNVC0::emitSET(const CmpInstruction *i)
if (!isFloatType(i-sType))
   lo = 0x3;
 
-   if (isFloatType(i-dType) || isSignedIntType(i-sType))
+   if (isSignedIntType(i-sType))
   lo |= 0x20;
+   if (isFloatType(i-dType)) {
+  if (isFloatType(i-sType))
+ lo |= 0x20;
+  else
+ lo |= 0x80;
+   }
 
switch (i-op) {
case OP_SET_AND: hi = 0x1000; break;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction

2015-05-08 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin imir...@alum.mit.edu
---
Pretty sure there's nothing wrong with it, but it looks odd in the code.

 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +--
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 2 ++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 6bb9620..28081fa 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -1316,6 +1316,8 @@ CodeEmitterGK110::emitFlow(const Instruction *i)
} else
if (mask  2) {
   int32_t pcRel = f-target.bb-binPos - (codeSize + 8);
+  if (writeIssueDelays  !(f-target.bb-binPos  0x3f))
+ pcRel += 8;
   // currently we don't want absolute branches
   assert(!f-absolute);
   code[0] |= (pcRel  0x1ff)  23;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 22db368..442cedf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -509,10 +509,13 @@ CodeEmitterGM107::emitBRA()
emitCond5(0x00, CC_TR);
 
if (!insn-srcExists(0) || insn-src(0).getFile() != FILE_MEMORY_CONST) {
+  int32_t pos = insn-target.bb-binPos;
+  if (writeIssueDelays  !(pos  0x1f))
+ pos += 8;
   if (!insn-absolute)
- emitField(0x14, 24, insn-target.bb-binPos - (codeSize + 8));
+ emitField(0x14, 24, pos - (codeSize + 8));
   else
- emitField(0x14, 32, insn-target.bb-binPos);
+ emitField(0x14, 32, pos);
} else {
   emitCBUF (0x24, gpr, 20, 16, 0, insn-src(0));
   emitField(0x05, 1, 1);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index d9aed34..c241973 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1406,6 +1406,8 @@ CodeEmitterNVC0::emitFlow(const Instruction *i)
} else
if (mask  2) {
   int32_t pcRel = f-target.bb-binPos - (codeSize + 8);
+  if (writeIssueDelays  !(f-target.bb-binPos  0x3f))
+ pcRel += 8;
   // currently we don't want absolute branches
   assert(!f-absolute);
   code[0] |= (pcRel  0x3f)  26;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions

2015-05-08 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand jason.ekstr...@intel.com

On Fri, May 8, 2015 at 10:03 PM, Connor Abbott cwabbo...@gmail.com wrote:
 We were rewriting the uses of the intrinsic instruction to point to
 something else after removing it, which only happened to work since we
 were lax about having dangling uses when removing instructions. Fix that
 up.

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c 
 b/src/glsl/nir/nir_lower_vars_to_ssa.c
 index ccb8f99..8d0ae1b 100644
 --- a/src/glsl/nir/nir_lower_vars_to_ssa.c
 +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
 @@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct 
 lower_variables_state *state)
intrin-num_components, NULL);

  nir_instr_insert_before(intrin-instr, mov-instr);
 -nir_instr_remove(intrin-instr);

  nir_ssa_def_rewrite_uses(intrin-dest.ssa,
   nir_src_for_ssa(mov-dest.dest.ssa),
   state-shader);
 +
 +nir_instr_remove(intrin-instr);
  break;
   }

 --
 2.1.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.

2015-05-08 Thread Ian Romanick
This was not fully baked.  I'll send out a fixed version later.

On 05/08/2015 07:05 PM, Ian Romanick wrote:
 From: Ian Romanick ian.d.roman...@intel.com
 
 Shader-db results:
 
 GM45 and Iron Lake:
 total instructions in shared programs: 7888585 - 7888585 (0.00%)
 instructions in affected programs: 0 - 0
 
 Sandy Bridge, Ivy Bridge, Haswell, and Broadwell:
 total instructions in shared programs: 9598608 - 9598572 (-0.00%)
 instructions in affected programs: 6506 - 6470 (-0.55%)
 helped:36
 
 Signed-off-by: Ian Romanick ian.d.roman...@intel.com
 ---
  src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
 b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
 index 469f2ea..d72a83a 100644
 --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
 @@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block)
  
if ((inst-opcode != BRW_OPCODE_AND 
 inst-opcode != BRW_OPCODE_CMP 
 -   inst-opcode != BRW_OPCODE_MOV) ||
 +   inst-opcode != BRW_OPCODE_MOV 
 +   inst-opcode != BRW_OPCODE_NOT) ||
inst-predicate != BRW_PREDICATE_NONE ||
!inst-dst.is_null() ||
inst-src[0].file != GRF ||
 @@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block)
inst-conditional_mod != BRW_CONDITIONAL_NZ)
   continue;
  
 +  if (inst-opcode == BRW_OPCODE_NOT 
 +  inst-conditional_mod != BRW_CONDITIONAL_Z 
 +  inst-conditional_mod != BRW_CONDITIONAL_NZ)
 + continue;
 +
bool read_flag = false;
foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
block) {
 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [xf86-video-nouveau] dri2: Enable BufferAge support

2015-05-08 Thread Mario Kleiner



On 01/19/2015 12:00 PM, Chris Wilson wrote:

For enable BufferAge support, we just have to be not using the
DRI2Buffer-flags field for any purpose (i.e. it is always expected to
be 0, as it is now) and to be sure to swap the flags field whenever we
exchange buffers. As nouveau does not exactly support TripleBuffer, we
don't have to worry about setting the copying the flags field when
injecting the third buffer.

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
Cc: Maarten Lankhorst maarten.lankho...@ubuntu.com
Cc: Mario Kleiner mario.kleiner...@gmail.com
---
  src/nouveau_dri2.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
index e3445b2..428ef92 100644
--- a/src/nouveau_dri2.c
+++ b/src/nouveau_dri2.c
@@ -711,6 +711,7 @@ nouveau_dri2_finish_swap(DrawablePtr draw, unsigned int 
frame,
}

SWAP(s-dst-name, s-src-name);
+   SWAP(s-dst-flags, s-src-flags);
SWAP(nouveau_pixmap(dst_pix)-bo, nouveau_pixmap(src_pix)-bo);

DamageRegionProcessPending(draw);
@@ -1003,6 +1004,12 @@ nouveau_dri2_init(ScreenPtr pScreen)
dri2.DestroyBuffer2 = nouveau_dri2_destroy_buffer2;
dri2.CopyRegion2 = nouveau_dri2_copy_region2;
  #endif
+
+#if DRI2INFOREC_VERSION = 10
+   dri2.version = 10;
+   dri2.bufferAge = 1;
+#endif
+
return DRI2ScreenInit(pScreen, dri2);
  }




Seems ok to me.

Reviewed-by: Mario Kleiner mario.kleiner...@gmail.com

-mario
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/9] nir/dead_cf: delete code that's unreachable due to jumps

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott cwabbo...@gmail.com
---
 src/glsl/nir/nir_opt_dead_cf.c | 126 ++---
 1 file changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
index 3fbb794..17eadbd 100644
--- a/src/glsl/nir/nir_opt_dead_cf.c
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -39,6 +39,26 @@
  * We delete the if statement and paste the contents of the always-executed
  * branch into the surrounding control flow, possibly removing more code if
  * the branch had a jump at the end.
+ *
+ * The other way is that control flow can end in a jump so that code after it
+ * never gets executed. In particular, this can happen after optimizing
+ * something like:
+ *
+ * if (true) {
+ *...
+ *break;
+ * }
+ * ...
+ *
+ * We also consider the case where both branches of an if end in a jump, e.g.:
+ *
+ * if (...) {
+ *break;
+ * } else {
+ *continue;
+ * }
+ * ...
+ *
  */
 
 /* deletes all the control flow nodes after 'start' in the control flow list.
@@ -106,30 +126,120 @@ opt_constant_if(nir_if *if_stmt, bool condition)
 }
 
 static bool
-dead_cf_cb(nir_block *block, void *state)
+dead_cf_block(nir_block *block)
 {
-   bool *progress = state;
-
nir_if *following_if = nir_block_get_following_if(block);
if (!following_if)
-  return true;
+  return false;
 
   nir_const_value *const_value =
  nir_src_as_const_value(following_if-condition);
 
   if (!const_value)
- return true;
+ return false;
 
opt_constant_if(following_if, const_value-u[0] != 0);
-   *progress = true;
return true;
 }
 
 static bool
-opt_dead_cf_impl(nir_function_impl *impl)
+ends_in_jump(nir_block *block)
+{
+   if (exec_list_is_empty(block-instr_list))
+  return false;
+
+   nir_instr *instr = nir_block_last_instr(block);
+   return instr-type == nir_instr_type_jump;
+}
+
+static bool
+dead_cf_list(struct exec_list *list, bool *list_ends_in_jump)
 {
bool progress = false;
-   nir_foreach_block(impl, dead_cf_cb, progress);
+   *list_ends_in_jump = false;
+
+   nir_cf_node *cur = exec_node_data(nir_cf_node, exec_list_get_head(list),
+ node);
+   nir_cf_node *prev = NULL;
+
+   while (cur != NULL) {
+  switch (cur-type) {
+  case nir_cf_node_block: {
+ nir_block *block = nir_cf_node_as_block(cur);
+ if (dead_cf_block(block)) {
+/* We just deleted the if after this block, so we may have
+ * deleted the block before or after it -- which one is an
+ * implementation detail. Therefore, to recover the place we were
+ * at, we have to use the previous cf_node.
+ */
+
+if (prev) {
+   cur = nir_cf_node_next(prev);
+} else {
+   cur = exec_node_data(nir_cf_node, exec_list_get_head(list),
+node);
+}
+
+block = nir_cf_node_as_block(cur);
+
+progress = true;
+ }
+
+ if (ends_in_jump(block)) {
+*list_ends_in_jump = true;
+
+if (!exec_node_is_tail_sentinel(cur-node.next)) {
+   delete_unreachable_after(cur);
+   return true;
+}
+ }
+
+ break;
+  }
+
+  case nir_cf_node_if: {
+ nir_if *if_stmt = nir_cf_node_as_if(cur);
+ bool then_ends_in_jump, else_ends_in_jump;
+ progress |= dead_cf_list(if_stmt-then_list, then_ends_in_jump);
+ progress |= dead_cf_list(if_stmt-else_list, else_ends_in_jump);
+
+ if (then_ends_in_jump  else_ends_in_jump) {
+*list_ends_in_jump = true;
+nir_block *next = nir_cf_node_as_block(nir_cf_node_next(cur));
+if (!exec_list_is_empty(next-instr_list) ||
+!exec_node_is_tail_sentinel(next-cf_node.node.next)) {
+   delete_unreachable_after(cur);
+   return true;
+}
+ }
+
+ break;
+  }
+
+  case nir_cf_node_loop: {
+ nir_loop *loop = nir_cf_node_as_loop(cur);
+ bool dummy;
+ progress |= dead_cf_list(loop-body, dummy);
+
+ break;
+  }
+
+  default:
+ unreachable(unknown cf node type);
+  }
+
+  prev = cur;
+  cur = nir_cf_node_next(cur);
+   }
+
+   return progress;
+}
+
+static bool
+opt_dead_cf_impl(nir_function_impl *impl)
+{
+   bool dummy;
+   bool progress = dead_cf_list(impl-body, dummy);
 
if (progress)
   nir_metadata_preserve(impl, nir_metadata_none);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 10:03 PM, Connor Abbott cwabbo...@gmail.com wrote:
 The point of cleanup_defs_uses() is to make an instruction safe to
 remove by removing any references that the rest of the shader may have
 to it. Previously, it was handling register use/def sets and removing
 the instruction from the use sets of any SSA sources it had, but if the
 instruction defined an SSA value that was used by other instructions it
 wasn't doing anything. This was ok, since we were always careful to make
 sure that no removed instruction ever had any uses, but now we want to
 start removing unreachable instruction which might still be used in
 reachable parts of the code. In that case, the value that any uses get
 is undefined since the instruction never actually executes, so we can
 just replace the instruction with an ssa_undef_instr.

 Signed-off-by: Connor Abbott cwabbo...@gmail.com
 ---
  src/glsl/nir/nir.c | 47 ++-
  1 file changed, 34 insertions(+), 13 deletions(-)

 diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
 index f03e80a..362ac30 100644
 --- a/src/glsl/nir/nir.c
 +++ b/src/glsl/nir/nir.c
 @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after)
  }

  static void
 -remove_defs_uses(nir_instr *instr);
 +remove_defs_uses(nir_instr *instr, nir_function_impl *impl);

  static void
 -cleanup_cf_node(nir_cf_node *node)
 +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
  {
 switch (node-type) {
 case nir_cf_node_block: {
nir_block *block = nir_cf_node_as_block(node);
/* We need to walk the instructions and clean up defs/uses */
 -  nir_foreach_instr(block, instr)
 - remove_defs_uses(instr);
 +  nir_foreach_instr(block, instr) {
 + if (instr-type == nir_instr_type_jump)
 +handle_remove_jump(block, nir_instr_as_jump(instr)-type);
 + remove_defs_uses(instr, impl);
 +  }

This looks like an unrelated change.  Maybe split that out?  The rest
of the patch looks plausible.
--Jason

break;
 }

 case nir_cf_node_if: {
nir_if *if_stmt = nir_cf_node_as_if(node);
foreach_list_typed(nir_cf_node, child, node, if_stmt-then_list)
 - cleanup_cf_node(child);
 + cleanup_cf_node(child, impl);
foreach_list_typed(nir_cf_node, child, node, if_stmt-else_list)
 - cleanup_cf_node(child);
 + cleanup_cf_node(child, impl);

list_del(if_stmt-condition.use_link);
break;
 @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node)
 case nir_cf_node_loop: {
nir_loop *loop = nir_cf_node_as_loop(node);
foreach_list_typed(nir_cf_node, child, node, loop-body)
 - cleanup_cf_node(child);
 + cleanup_cf_node(child, impl);
break;
 }
 case nir_cf_node_function: {
 -  nir_function_impl *impl = nir_cf_node_as_function(node);
foreach_list_typed(nir_cf_node, child, node, impl-body)
 - cleanup_cf_node(child);
 + cleanup_cf_node(child, impl);
break;
 }
 default:
 @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node)
 nir_function_impl *impl = nir_cf_node_get_function(node);
 nir_metadata_preserve(impl, nir_metadata_none);

 +   cleanup_cf_node(node, impl);
 +
 if (node-type == nir_cf_node_block) {
/*
 * Basic blocks can't really be removed by themselves, since they act 
 as
 @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node)
exec_node_remove(node-node);
stitch_blocks(before_block, after_block);
 }
 -
 -   cleanup_cf_node(node);
  }

  static bool
 @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state)
 return true;
  }

 +static bool
 +remove_ssa_def_cb(nir_ssa_def *def, void *state)
 +{
 +   nir_function_impl *impl = state;
 +   nir_shader *shader = impl-overload-function-shader;
 +
 +   if (!list_empty(def-uses) || !list_empty(def-if_uses)) {
 +  nir_ssa_undef_instr *undef =
 + nir_ssa_undef_instr_create(shader, def-num_components);
 +  nir_instr_insert_before_cf_list(impl-body, undef-instr);
 +  nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(undef-def), shader);
 +   }
 +
 +   return true;
 +}
 +
 +
  static void
 -remove_defs_uses(nir_instr *instr)
 +remove_defs_uses(nir_instr *instr, nir_function_impl *impl)
  {
 nir_foreach_dest(instr, remove_def_cb, instr);
 nir_foreach_src(instr, remove_use_cb, instr);
 +   nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl);
  }

  void nir_instr_remove(nir_instr *instr)
  {
 -   remove_defs_uses(instr);
 +   nir_function_impl *impl = 
 nir_cf_node_get_function(instr-block-cf_node);
 +   remove_defs_uses(instr, impl);
 exec_node_remove(instr-node);

 if (instr-type == nir_instr_type_jump) {
 --
 2.1.0

 ___
 mesa-dev mailing list
 mesa-dev@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 26/27] i965: Disable gather push constants for null constants

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 05:48 PM, Pohjolainen, Topi wrote:
 On Tue, Apr 28, 2015 at 11:08:23PM +0300, Abdiel Janulgue wrote:
 Programming null constants with gather constant tables seems to
 be unsupported and results in a GPU lockup even with the prescribed
 GPU workarounds in the bspec. Found out by trial and error that
 disabling HW gather constant when the constant state for a stage
 needs to be nullified is the only way to go around the issue.
 
 Just a general question. We keep resource streamer itself always enabled
 (except for blorp of course). Does it still do something meaningful without
 gather constants or should we disable them both?
 

The resource streamer is just an infrastructure containing the specific
optimizations hw-generated binding tables and gather constants. The
dependency below flows from left to right.

RS_on ringbuffer -- hw-binding tables -- gather constants

So switching on gather constants require HW-gen binding tables be
enabled as well. But hw-generated binding table is not required to be
switched off for gather constants to be turned off.

The only problem with disabling both is the additional required state
setup packets to tear-down and then re-enable everything again which
increases the size of the command buffers for every 3D primitive

-Abdiel


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 04:43 PM, Pohjolainen, Topi wrote:
 On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote:
 This patch implements the binding table enable command which is also
 used to allocate a binding table pool where hardware-generated
 binding table entries are flushed into. Each binding table offset in
 the binding table pool is unique per each shader stage that are
 enabled within a batch.

 Also insert the required brw_tracked_state objects to enable
 hw-generated binding tables in normal render path.

 Signed-off-by: Abdiel Janulgue abdiel.janul...@linux.intel.com
 ---
  src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 
 ++
  src/mesa/drivers/dri/i965/brw_context.c|  4 ++
  src/mesa/drivers/dri/i965/brw_context.h|  5 ++
  src/mesa/drivers/dri/i965/brw_state.h  |  7 +++
  src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
  src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
  6 files changed, 92 insertions(+)

 diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
 b/src/mesa/drivers/dri/i965/brw_binding_tables.c
 index 459165a..a58e32e 100644
 --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
 +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
 @@ -44,6 +44,11 @@
  #include brw_state.h
  #include intel_batchbuffer.h
  
 +/* Somehow the hw-binding table pool offset must start here, otherwise
 + * the GPU will hang
 + */
 +#define HW_BT_START_OFFSET 256;
 
 I think we want to understand this a little better before enabling...
 
 +
  /**
   * Upload a shader stage's binding table as indirect state.
   *
 @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = {
 .emit = brw_gs_upload_binding_table,
  };
  
 +/**
 + * Hardware-generated binding tables for the resource streamer
 + */
 +void
 +gen7_disable_hw_binding_tables(struct brw_context *brw)
 +{
 +   BEGIN_BATCH(3);
 +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (3 - 2));
 +   OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, 
 BRW_HW_BINDING_TABLE_ENABLE) |
 + brw-is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0);
 +   OUT_BATCH(0);
 +   ADVANCE_BATCH();
 +
 +   /* Pipe control workaround */
 +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
 +}
 +
 +void
 +gen7_enable_hw_binding_tables(struct brw_context *brw)
 +{
 +   if (!brw-has_resource_streamer) {
 +  gen7_disable_hw_binding_tables(brw);
 
 I started wondering why we really need this - RS is disabled by default and
 we haven't needed to do anything to disable it before.
 
 +  return;
 +   }
 +
 +   if (!brw-hw_bt_pool.bo) {
 +  /* From the BSpec, 3D Pipeline  Resource Streamer  Hardware Binding 
 Tables:
 +   *
 +   *  A maximum of 16,383 Binding tables are allowed in any batch 
 buffer.
 +   */
 +  int max_size = 16383 * 4;
 
 But does it really need this much all the time? I guess I need to go and
 read the spec.

This is actually just one re-usable buffer object sticking around for
the lifetime of the context. Compare this with creating lots of bo
every-time we enable the resource streamer. I think it helps with
reducing the amount of relocations we have.

 
 +  brw-hw_bt_pool.bo = drm_intel_bo_alloc(brw-bufmgr, hw_bt,
 +  max_size, 64);
 +  brw-hw_bt_pool.next_offset = HW_BT_START_OFFSET;
 +   }
 +
 +   uint32_t dw1 = SET_FIELD(BRW_HW_BINDING_TABLE_ON, 
 BRW_HW_BINDING_TABLE_ENABLE);
 +   if (brw-is_haswell)
 +  dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_MOCS) | 
 HSW_HW_BINDING_TABLE_RESERVED;
 
 These are overflowing 80 columns.
 
 +
 +   BEGIN_BATCH(3);
 +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC  16 | (3 - 2));
 +   OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
 +   OUT_RELOC(brw-hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
 + brw-hw_bt_pool.bo-size);
 +   ADVANCE_BATCH();
 +
 +   /* Pipe control workaround */
 +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
 
 Would you have a spec reference for this?

3D-Media-GPGPU Engine  Resource Streamer [HSW+]  Hardware Binding
Tables [HSW+]  Programming note

When switching between HW and SW binding table generation, SW must
issue a state cache invalidate.

 
 +}
 +
 +void
 +gen7_reset_rs_pool_offsets(struct brw_context *brw)
 +{
 +   brw-hw_bt_pool.next_offset = HW_BT_START_OFFSET;
 +}
 +
 +const struct brw_tracked_state gen7_hw_binding_tables = {
 +   .dirty = {
 +  .mesa = 0,
 +  .brw = BRW_NEW_BATCH,
 +   },
 +   .emit = gen7_enable_hw_binding_tables
 +};
 +
  /** @} */
  
  /**
 diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
 b/src/mesa/drivers/dri/i965/brw_context.c
 index c7e1e81..9c7ccae 100644
 --- a/src/mesa/drivers/dri/i965/brw_context.c
 +++ b/src/mesa/drivers/dri/i965/brw_context.c
 @@ -953,6 +953,10 @@ intelDestroyContext(__DRIcontext * driContextPriv)
 if (brw-wm.base.scratch_bo)

[Mesa-dev] Stable libEGL ABI ?

2015-05-08 Thread Emil Velikov
Hi all,

Just had a quick look at Debian's repo and noticed something rather
worrying - the ABI of our libEGL is not stable.
Stable in the sense of - we provide a growing list of static symbols
for each new function an EGL extension adds.

This comes as we set EGL_EGLEXT_PROTOTYPES, prior to including
eglext.h. Which in turn exposes the the function prototypes which are
explicitly annotated with default visibility. This change was
introduced with

commit 1ed1027e886980b9b0f48fa6bfcf3d6e209c7787
Author: Brian Paul brian.p...@tungstengraphics.com
Date:   Tue May 27 13:45:41 2008 -0600

assorted changes to compile with new EGL 1.4 headers (untested)

From going through the EGL 1.3 to 1.5 spec, I cannot see any mention
that one should export EGL extension functions from their
implementation. On the contrary, it is explicitly mentioned that some
implementations may do so, but relying on it in your program is not
portable. Quick look at the Nvidia official driver shows only core EGL
functions, which aligns with various undefined symbol
eglCreateImageKHR issues that I've seen not too long ago - mir,
gstreamer, piglit.


So the question here is:

Can we break the already non-portable ABI, and hide everything that
is not core EGL, or alternatively how can we rework things so that as
we add new entry points, required by extensions, they are available
only dynamically ?

Personally I would opt for the former and address any issues in
piglit/waffle/foo accordingly. I would suspect that libepoxy is OK,
although I've never looked too closely at the code.

I would love to get this in some shape or form for 10.6, and avoid
exporting the two new functions from EGL_MESA_image_dma_buf_export :-\

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev