Re: [Mesa-dev] [PATCH 12/59] i965: add brw_imm_df

2016-05-10 Thread Samuel Iglesias Gonsálvez


On 11/05/16 05:56, Francisco Jerez wrote:
> Samuel Iglesias Gonsálvez  writes:
> 
>> From: Connor Abbott 
>>
>> v2 (Iago)
>>   - Fixup accessibility in backend_reg
>>
>> Signed-off-by: Iago Toral Quiroga 
> 
> I've just noticed (while running valgrind) that this patch causes
> serious breakage in the back-end.  The reason is that the extra bits
> required to make room for the df field of the union don't get
> initialized in all codepaths, so backend_reg comparisons done using
> memcmp() can basically return random results now.  Can you please look
> into this?  Some ways to fix it would be to make sure we zero-initialize
> the whole brw_reg in all cases (or at least the union padding), or stop
> using memcmp() to compare registers -- I guess the latter might be
> somewhat less intrusive and increase the likelihood that we can get this
> sorted out timely.
> 

Attached is a patch for it: I initialized all the union bits to zero before
setting them in brw_reg(). Can you test it? If it doesn't fix the problem, would
you mind sending me an example that I can run under valgrind here?

I am thinking that maybe we want to change backend_reg::equals() if this
doesn't work.
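
A minimal C sketch of the two options discussed above, using a hypothetical, heavily simplified `toy_reg` in place of `struct brw_reg` (not the real Mesa layout). The UD constructor clears the whole union first (the same idea as the `reg.df = 0` fix), and `toy_reg_equals()` compares only the bytes that the register type makes meaningful, which is the alternative to `memcmp()`. Note that C leaves the union bytes not covered by the member last stored with unspecified values, so a field-wise comparison does not depend on initialization order at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct brw_reg; not the real Mesa layout. */
enum toy_type { TOY_TYPE_UD, TOY_TYPE_F, TOY_TYPE_DF };

struct toy_reg {
   uint32_t type;
   uint32_t nr;
   union {
      double df;    /* 8 bytes: makes the union wider than the 32-bit members */
      float f;
      uint32_t ud;
   };
};

/* Mirrors the reg.df = 0 fix: clear all 8 union bytes, then store the value. */
static struct toy_reg toy_imm_ud(uint32_t ud)
{
   struct toy_reg reg;
   reg.type = TOY_TYPE_UD;
   reg.nr = 0;
   reg.df = 0;
   reg.ud = ud;
   return reg;
}

static struct toy_reg toy_imm_df(double df)
{
   struct toy_reg reg;
   reg.type = TOY_TYPE_DF;
   reg.nr = 0;
   reg.df = df;
   return reg;
}

/* Field-wise comparison: only the union bytes that belong to the register's
 * type are compared, so stale upper bytes cannot affect the result. */
static bool toy_reg_equals(const struct toy_reg *a, const struct toy_reg *b)
{
   assert(a != NULL && b != NULL);

   if (a->type != b->type || a->nr != b->nr)
      return false;

   switch (a->type) {
   case TOY_TYPE_DF:
      return memcmp(&a->df, &b->df, sizeof(a->df)) == 0;
   case TOY_TYPE_F:
      return memcmp(&a->f, &b->f, sizeof(a->f)) == 0;
   default:
      return a->ud == b->ud;
   }
}
```

Comparing the floating-point members bitwise keeps the old `memcmp()` semantics for immediates (for example, `0.0` and `-0.0` stay distinct), which seems closer to what a register-equality check wants than `==`.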

Sam

>> ---
>>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>>  src/mesa/drivers/dri/i965/brw_shader.h | 1 +
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
>> index b84c709..6d51623 100644
>> --- a/src/mesa/drivers/dri/i965/brw_reg.h
>> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
>> @@ -254,6 +254,7 @@ struct brw_reg {
>>   unsigned pad1:1;
>>};
>>  
>> +  double df;
>>float f;
>>int   d;
>>unsigned ud;
>> @@ -544,6 +545,14 @@ brw_imm_reg(enum brw_reg_type type)
>>  
>>  /** Construct float immediate register */
>>  static inline struct brw_reg
>> +brw_imm_df(double df)
>> +{
>> +   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_DF);
>> +   imm.df = df;
>> +   return imm;
>> +}
>> +
>> +static inline struct brw_reg
>>  brw_imm_f(float f)
>>  {
>> struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_F);
>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
>> index fc228f6..f6f6167 100644
>> --- a/src/mesa/drivers/dri/i965/brw_shader.h
>> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
>> @@ -90,6 +90,7 @@ struct backend_reg : private brw_reg
>> using brw_reg::width;
>> using brw_reg::hstride;
>>  
>> +   using brw_reg::df;
>> using brw_reg::f;
>> using brw_reg::d;
>> using brw_reg::ud;
>> -- 
>> 2.5.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
From 35254624d63b77aa2024bc2b08612e28cae4bb98 Mon Sep 17 00:00:00 2001
From: Samuel Iglesias Gonsálvez
Date: Wed, 11 May 2016 07:44:10 +0200
Subject: [PATCH] i965: initialize struct brw_reg's union bits to zero.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extra bits required to make room for the df field of the union don't get
initialized in all codepaths, so backend_reg comparisons done using
memcmp() can basically return random results.

Initialize them to zero before setting the rest of the union's fields.

Signed-off-by: Samuel Iglesias Gonsálvez 
Reported-by: Francisco Jerez 
---
 src/mesa/drivers/dri/i965/brw_reg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
index 6d51623..3b76d7d 100644
--- a/src/mesa/drivers/dri/i965/brw_reg.h
+++ b/src/mesa/drivers/dri/i965/brw_reg.h
@@ -338,6 +338,9 @@ brw_reg(enum brw_reg_file file,
reg.subnr = subnr * type_sz(type);
reg.nr = nr;
 
+   /* Initialize all union's bits to zero before setting them. */
+   reg.df = 0;
+
/* Could do better: If the reg is r5.3<0;1,0>, we probably want to
 * set swizzle and writemask to W, as the lower bits of subnr will
 * be lost when converted to align16.  This is probably too much to
-- 
2.5.0



Re: [Mesa-dev] [PATCH 05/23] i965/fs: fix copy-propagation with suboffset from constants

2016-05-10 Thread Francisco Jerez
Iago Toral  writes:

> On Tue, 2016-05-03 at 16:21 -0700, Jordan Justen wrote:
>> On 2016-05-03 05:21:54, Samuel Iglesias Gonsálvez wrote:
>> > From: Iago Toral Quiroga 
>> > 
>> > The current code ignores the suboffset in the instruction's source
>> > and just uses the one from the constant. This is not correct
>> > when the instruction's source is accessing the constant with a
>> > different type and using the suboffset to select a specific
>> > chunk of the constant. We generate this kind of code in fp64
>> > when we want to select only the high 32-bit of a particular
>> > double constant.
>> > 
>> > Instead, we should add any existing suboffset in the
>> > instruction's source (modulo the size of the entry's type)
>> > to the suboffset in the constant so we can preserve the original
>> > semantics.
>> > 
>> > This prevents us from turning this:
>> > 
>> > mov(8) vgrf5:DF, u2<0>:DF
>> > mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
>> > 
>> > Into:
>> > 
>> > mov(8) vgrf7:UD, u2<0>:UD
>> > 
>> > And instead, with this patch, we produce:
>> > 
>> > mov(8) vgrf7:UD, u2+0.4<0>:UD
>> > ---
>> >  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 23 
>> > --
>> >  1 file changed, 21 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > index aa4c9c9..5fae10f 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> > @@ -445,8 +445,27 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> > case BAD_FILE:
>> > case ARF:
>> > case FIXED_GRF:
>> > -  inst->src[arg].reg_offset = entry->src.reg_offset;
>> > -  inst->src[arg].subreg_offset = entry->src.subreg_offset;
>> > +  {
>> > + inst->src[arg].reg_offset = entry->src.reg_offset;
>> > + inst->src[arg].subreg_offset = entry->src.subreg_offset;
>> > +
>> > + /* If we copy propagate from a larger type we have to be aware that
>> > +  * the instruction might be using subreg_offset to select a particular
>> > +  * chunk of the data in the entry. For example:
>> > +  *
>> > +  * mov(8) vgrf5:DF, u2<0>:DF
>> > +  * mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
>> > +  *
>> > +  * vgrf5+0.4<2>:UD is actually reading the high 32-bit of u2.0, so if
>> > +  * we want to copy propagate here we have to do it from u2+0.4.
>> > +  */
>> > + int type_sz_src = type_sz(inst->src[arg].type);
>> > + int type_sz_entry = type_sz(entry->src.type);
>> > + if (type_sz_entry > type_sz_src) {
>> > +inst->src[arg].subreg_offset +=
>> > +   inst->src[arg].subreg_offset % type_sz_entry;
>> 
>> Seeing 'inst->src[arg].subreg_offset' on both sides of this += seems
>> strange. Is this correct?
>
> Yes, this looks wrong. I'll fix it, thanks!
>

I just had a look at the version of this patch you have in your tree
(not sure I took the right one), which does:

| @@ -445,8 +445,24 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
| case BAD_FILE:
| case ARF:
| case FIXED_GRF:
| -  inst->src[arg].reg_offset = entry->src.reg_offset;
| -  inst->src[arg].subreg_offset = entry->src.subreg_offset;
| +  {
| + inst->src[arg].reg_offset = entry->src.reg_offset;
| +
| + /* If we copy propagate from a larger type we have to be aware that
| +  * the instruction might be using subreg_offset to select a particular
| +  * chunk of the data in the entry. For example:
| +  *
| +  * mov(8) vgrf5:DF, u2<0>:DF
| +  * mov(8) vgrf7:UD, vgrf5+0.4<2>:UD
| +  *
| +  * vgrf5+0.4<2>:UD is actually reading the high 32-bit of u2.0, so if
| +  * we want to copy propagate here we have to do it from u2+0.4.
| +  */
| + int type_sz_entry = type_sz(entry->src.type);
| + inst->src[arg].subreg_offset =
| +(inst->src[arg].subreg_offset +
| + entry->src.subreg_offset) % type_sz_entry;
| +  }
|break;
| case ATTR:
| case VGRF:

I'm afraid the calculation is still wrong: overflow of subreg_offset is
not handled properly, which would require taking the original value modulo
4 and then updating reg_offset accordingly (oh man there should *really*
just be a single reg_offset field expressed in bytes), and you're applying
the entry->src.subreg_offset offset *before* taking the remainder, as if it
applied to the source region of the instruction instead of the source
region of the copy, which doesn't look correct to me.
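
As a rough illustration of the ordering described above, here is a hypothetical C helper (not Mesa code) that works on a single byte offset. The instruction's read position relative to the copy's destination is first reduced modulo the copy's type size, then the copy's own source offset is added, and any overflow is folded back into `reg_offset`. It assumes the copied value is a scalar (a `<0>` region, as with `u2<0>:DF`), so only the position inside one element matters, and it takes the size of one `reg_offset` step as a parameter (4 for uniforms, REG_SIZE for GRFs, per the note about inconsistent units). The `split_offset` struct and all function names are made up for the sketch.

```c
#include <assert.h>

/* Hypothetical split offset, mimicking the reg_offset/subreg_offset pair. */
struct split_offset {
   unsigned reg_offset;    /* in units of `unit` bytes */
   unsigned subreg_offset; /* in bytes, kept below `unit` */
};

static unsigned to_bytes(struct split_offset o, unsigned unit)
{
   return o.reg_offset * unit + o.subreg_offset;
}

static struct split_offset from_bytes(unsigned bytes, unsigned unit)
{
   struct split_offset o = { bytes / unit, bytes % unit };
   return o;
}

/*
 * entry_src:     offset of the copy's source (u2 in "mov vgrf5:DF, u2<0>:DF")
 * rel_bytes:     byte offset the instruction reads, relative to the start of
 *                the copy's destination (4 for "vgrf5+0.4")
 * entry_type_sz: type size of the copy (8 for DF)
 * unit:          bytes per reg_offset step (4 for uniforms, REG_SIZE for GRFs)
 */
static struct split_offset
propagate_scalar_offset(struct split_offset entry_src, unsigned rel_bytes,
                        unsigned entry_type_sz, unsigned unit)
{
   assert(entry_type_sz > 0 && unit > 0);

   /* The copy's source is a <0> scalar, so every destination element holds
    * the same value: only the position inside one element matters. */
   unsigned within_element = rel_bytes % entry_type_sz;

   /* Apply the copy's own offset afterwards and fold any overflow of the
    * byte offset back into reg_offset. */
   return from_bytes(to_bytes(entry_src, unit) + within_element, unit);
}
```

With `unit = 4`, the example from the patch comment lands 4 bytes past `u2`, the same location the patch spells `u2+0.4`, but normalized to `reg_offset = 1, subreg_offset = 0`.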

Really, fixing this properly amounts to duplicating half of the logic
from the VGRF case below (but replacing REG_SIZE with 4 for uniforms
because of the inconsistent units of reg_offset).  I wonder why the

[Mesa-dev] [PATCH] i965/blorp: Special-case the clear color in MSAA resolves

2016-05-10 Thread Jason Ekstrand
The current MSAA resolve code has a special case for when the MCS value is 0.
In this case we only need to sample once because we know that all values are in
slice 0.  This commit adds a second optimization that detects the magic MCS
value that indicates the clear color, grabs the color from a push
constant, and avoids sampling altogether.  On a microbenchmark written by
Neil Roberts that tests resolving surfaces with just clear color, this
improves performance by 60% for 8x, 40% for 4x, and 28% for 2x MSAA on my
SKL gte3 laptop.  The benchmark can be found on the ML archive:

https://lists.freedesktop.org/archives/mesa-dev/2016-February/108077.html
---
 src/mesa/drivers/dri/i965/brw_blorp.h|  4 +-
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 72 ++--
 2 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h
index 5f7569c..550c6c5 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -197,7 +197,9 @@ struct brw_blorp_wm_push_constants
uint32_t src_z;
 
/* Pad out to an integral number of registers */
-   uint32_t pad[5];
+   uint32_t pad;
+
+   union gl_color_union clear_color;
 };
 
 #define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 97e3908..314034e 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -346,6 +346,7 @@ struct brw_blorp_blit_vars {
   nir_variable *offset;
} u_x_transform, u_y_transform;
nir_variable *u_src_z;
+   nir_variable *u_clear_color;
 
/* gl_FragCoord */
nir_variable *frag_coord;
@@ -374,6 +375,7 @@ brw_blorp_blit_vars_init(nir_builder *b, struct brw_blorp_blit_vars *v,
LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
LOAD_UNIFORM(y_transform.offset, glsl_float_type())
LOAD_UNIFORM(src_z, glsl_uint_type())
+   LOAD_UNIFORM(clear_color, glsl_vec4_type())
 
 #undef DECL_UNIFORM
 
@@ -858,7 +860,8 @@ static nir_ssa_def *
 blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
unsigned tex_samples,
enum intel_msaa_layout tex_layout,
-   enum brw_reg_type dst_type)
+   enum brw_reg_type dst_type,
+   struct brw_blorp_blit_vars *v)
 {
/* If non-null, this is the outer-most if statement */
nir_if *outer_if = NULL;
@@ -867,9 +870,53 @@ blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
   nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
 
nir_ssa_def *mcs = NULL;
-   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
+   if (tex_layout == INTEL_MSAA_LAYOUT_CMS) {
   mcs = blorp_nir_txf_ms_mcs(b, pos);
 
+  /* The MCS buffer stores a packed value that provides a mapping from
+   * samples to array slices.  The magic value of all ones means that all
+   * samples have the clear color.  In this case, we can short-circuit the
+   * sampling process and just use the clear color that we pushed into the
+   * shader.
+   */
+  nir_ssa_def *is_clear_color;
+  switch (tex_samples) {
+  case 2:
+ /* Empirical evidence suggests that the value returned from the
+  * sampler is not always 0x3 for clear color so we need to mask it.
+  */
+ is_clear_color =
+nir_ieq(b, nir_iand(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0x3)),
+   nir_imm_int(b, 0x3));
+ break;
+  case 4:
+ is_clear_color =
+nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0xff));
+ break;
+  case 8:
+ is_clear_color =
+nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0));
+ break;
+  case 16:
+ is_clear_color =
+nir_ior(b, nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, ~0)),
+   nir_ieq(b, nir_channel(b, mcs, 1), nir_imm_int(b, ~0)));
+ break;
+  default:
+ unreachable("Invalid sample count");
+  }
+
+  nir_if *if_stmt = nir_if_create(b->shader);
+  if_stmt->condition = nir_src_for_ssa(is_clear_color);
+  nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
+
+  b->cursor = nir_after_cf_list(&if_stmt->then_list);
+  nir_store_var(b, color, nir_load_var(b, v->u_clear_color), 0xf);
+
+  b->cursor = nir_after_cf_list(&if_stmt->else_list);
+  outer_if = if_stmt;
+   }
+
/* We add together samples using a binary tree structure, e.g. for 4x MSAA:
 *
 *   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
@@ -937,7 +984,8 @@ blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
  nir_store_var(b, color, texture_data[0], 0xf);
 
  b->cursor = 
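
A CPU-side C model of the clear-color test, as a sketch for reasoning about the magic values rather than shader code. It mirrors the per-sample-count constants in the patch (including the 0x3 mask for 2x) and takes the MCS as two 32-bit words because the 16x case reads two channels. Since the magic value of all ones means that all samples have the clear color, the 16x case here requires both words to be all ones. The function name and the array representation are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* mcs[0] is the first MCS channel; mcs[1] is only read for 16x. */
static bool mcs_is_clear_color(const uint32_t mcs[2], unsigned samples)
{
   switch (samples) {
   case 2:
      /* Same mask as the patch: only the low two bits are checked. */
      return (mcs[0] & 0x3) == 0x3;
   case 4:
      return mcs[0] == 0xff;
   case 8:
      return mcs[0] == 0xffffffffu;
   case 16:
      /* All ones across the whole value: both words must match. */
      return mcs[0] == 0xffffffffu && mcs[1] == 0xffffffffu;
   default:
      assert(!"invalid sample count");
      return false;
   }
}
```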

Re: [Mesa-dev] [PATCH 12/23] i965/fs: fix pull constant load component selection for doubles

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> UNIFORM_PULL_CONSTANT_LOAD is used to load a contiguous vec4 starting at a
> constant offset that is 16-byte aligned. If we need to access an unaligned
> offset we emit a load with an aligned offset and use the remaining constant
> offset to select the component within the vec4 result that we are interested
> in. This component must be computed in units of the type size, since that
> is what fs_reg::set_smear expects.
>
> This patch makes this change in the two places where we use this message:
> in demote_pull_constants when we lower uniform access with constant offset
> into the pull constant buffer, and in UBO loads with constant offset.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 3 ++-
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 4 +++-
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 0e69be8..dff13ea 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2268,7 +2268,8 @@ fs_visitor::lower_constant_loads()
>   inst->src[i].file = VGRF;
>   inst->src[i].nr = dst.nr;
>   inst->src[i].reg_offset = 0;
> - inst->src[i].set_smear(pull_index & 3);
> + unsigned type_slots = MAX2(1, type_sz(inst->dst.type) / 4);
> + inst->src[i].set_smear((pull_index & 3) / type_slots);
>  

This cannot be right: why should we care what the destination type of
the instruction is while lowering a uniform source?  Also I don't think
the MAX2 call is correct because *if* type_sz(inst->dst.type) / 4 < 1
you'll force type_slots to 1 and end up interpreting the pull_index in
the wrong units.  How about:

|   inst->src[i].set_smear((pull_index & 3) * 4 /
|  type_sz(inst->src[i].type));
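
To make the units concrete, a small C sketch contrasting the suggested formula with the clamped one from the patch. Here `pull_index & 3` selects a dword within the 16-byte vec4, and multiplying by 4 and dividing by the source type size converts it into the element index that `set_smear()` expects. The helper names are hypothetical. The clamped variant agrees for 4- and 8-byte types but gives a different answer for a 2-byte type.

```c
#include <assert.h>

/* Hypothetical helpers; pull_index is in dwords, the result is in elements
 * of the source type, which is what set_smear() takes. */
static unsigned smear_component(unsigned pull_index, unsigned src_type_sz)
{
   assert(src_type_sz == 2 || src_type_sz == 4 || src_type_sz == 8);
   return (pull_index & 3) * 4 / src_type_sz;
}

/* The variant from the patch, with the slot count clamped to at least one. */
static unsigned smear_component_clamped(unsigned pull_index, unsigned type_sz)
{
   unsigned type_slots = type_sz / 4 > 1 ? type_sz / 4 : 1;
   return (pull_index & 3) / type_slots;
}
```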

>   brw_mark_surface_used(prog_data, index);
>}
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 4cd219a..532ca65 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -2980,8 +2980,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr
>   bld.emit(FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD, packed_consts,
>surf_index, const_offset_reg);
>  
> + unsigned component_base =
> +(const_offset->u32[0] % 16) / MAX2(1, type_sz(dest.type));

Rather than dividing by the type size only to let set_smear multiply by
the type size again, I think it would be cleaner to do something like:

|   const fs_reg consts = byte_offset(packed_consts, const_offset->u32[0] % 16);

>   for (unsigned i = 0; i < instr->num_components; i++) {

then here:

|  bld.MOV(offset(dest, bld, i), component(consts, i));

and then remove the rest of the loop.

> -packed_consts.set_smear(const_offset->u32[0] % 16 / 4 + i);
> +packed_consts.set_smear(component_base + i);
>  
>  /* The std140 packing rules don't allow vectors to cross 16-byte
>   * boundaries, and a reg is 32 bytes.
> -- 
> 2.5.0
>


Re: [Mesa-dev] [PATCH 12/59] i965: add brw_imm_df

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Connor Abbott 
>
> v2 (Iago)
>   - Fixup accessibility in backend_reg
>
> Signed-off-by: Iago Toral Quiroga 

I've just noticed (while running valgrind) that this patch causes
serious breakage in the back-end.  The reason is that the extra bits
required to make room for the df field of the union don't get
initialized in all codepaths, so backend_reg comparisons done using
memcmp() can basically return random results now.  Can you please look
into this?  Some ways to fix it would be to make sure we zero-initialize
the whole brw_reg in all cases (or at least the union padding), or stop
using memcmp() to compare registers -- I guess the latter might be
somewhat less intrusive and increase the likelihood that we can get this
sorted out timely.

> ---
>  src/mesa/drivers/dri/i965/brw_reg.h| 9 +
>  src/mesa/drivers/dri/i965/brw_shader.h | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_reg.h b/src/mesa/drivers/dri/i965/brw_reg.h
> index b84c709..6d51623 100644
> --- a/src/mesa/drivers/dri/i965/brw_reg.h
> +++ b/src/mesa/drivers/dri/i965/brw_reg.h
> @@ -254,6 +254,7 @@ struct brw_reg {
>   unsigned pad1:1;
>};
>  
> +  double df;
>float f;
>int   d;
>unsigned ud;
> @@ -544,6 +545,14 @@ brw_imm_reg(enum brw_reg_type type)
>  
>  /** Construct float immediate register */
>  static inline struct brw_reg
> +brw_imm_df(double df)
> +{
> +   struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_DF);
> +   imm.df = df;
> +   return imm;
> +}
> +
> +static inline struct brw_reg
>  brw_imm_f(float f)
>  {
> struct brw_reg imm = brw_imm_reg(BRW_REGISTER_TYPE_F);
> diff --git a/src/mesa/drivers/dri/i965/brw_shader.h b/src/mesa/drivers/dri/i965/brw_shader.h
> index fc228f6..f6f6167 100644
> --- a/src/mesa/drivers/dri/i965/brw_shader.h
> +++ b/src/mesa/drivers/dri/i965/brw_shader.h
> @@ -90,6 +90,7 @@ struct backend_reg : private brw_reg
> using brw_reg::width;
> using brw_reg::hstride;
>  
> +   using brw_reg::df;
> using brw_reg::f;
> using brw_reg::d;
> using brw_reg::ud;
> -- 
> 2.5.0
>


Re: [Mesa-dev] [PATCH] genxml: avoid using a GNU make pattern rule

2016-05-10 Thread Jason Ekstrand
On Mon, May 2, 2016 at 5:25 PM, Jonathan Gray  wrote:

> On Mon, May 02, 2016 at 11:44:35AM -0700, Jason Ekstrand wrote:
> > On Mon, May 2, 2016 at 2:27 AM, Jonathan Gray  wrote:
> >
> > > On Mon, May 02, 2016 at 02:23:46AM -0700, Jason Ekstrand wrote:
> > > > On May 1, 2016 11:24 PM, "Jonathan Gray"  wrote:
> > > > >
> > > > > % pattern rules are a GNU extension.  Convert the use of one to a
> > > > > suffix rule to allow this to build on OpenBSD.
> > > > >
> > > > > Signed-off-by: Jonathan Gray 
> > > > > ---
> > > > >  src/intel/genxml/Makefile.am | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/src/intel/genxml/Makefile.am b/src/intel/genxml/Makefile.am
> > > > > index f493d48..ea68fb9 100644
> > > > > --- a/src/intel/genxml/Makefile.am
> > > > > +++ b/src/intel/genxml/Makefile.am
> > > > > @@ -28,7 +28,9 @@ BUILT_SOURCES = \
> > > > >
> > > > >  PYTHON3_GEN = $(AM_V_GEN)$(PYTHON3) $(PYTHON_FLAGS)
> > > > >
> > > > > -%_pack.h : %.xml gen_pack_header.py
> > > > > +SUFFIXES = _pack.h .xml
> > > > > +
> > > > > +.xml_pack.h : gen_pack_header.py
> > > > > $(PYTHON3_GEN) $(srcdir)/gen_pack_header.py $< > $@
> > > >
> > > > We'd better also fix up all the places we include these files. :-)
> > >
> > > The generated filenames don't change, there is no need to:
> > >
> >
> > I just read up on Suffix rules (didn't even know they existed).  I think
> > what you're doing there *mostly* works.  The problem is that, according to
> > the GNU make docs (
> > https://www.gnu.org/software/make/manual/html_node/Suffix-Rules.html),
> > suffix rules aren't allowed to have any additional prerequisites declared.
> > If they do, they get treated as normal non-suffix rules.  How do we do this
> > as a suffix rule *and* have it depend on both genN.xml and
> > gen_pack_header.py?
> > --jason
>
> The docs on OpenBSD suggest adding another target rule for this case
> which seems to work as intended here.
>
> see "INFERENCE RULES" in
> http://man.openbsd.org/OpenBSD-current/man1/make.1
>
> commit 715e29f2a1db62f69a92e742cd33fc75889367ff
> Author: Jonathan Gray 
> Date:   Mon May 2 16:14:56 2016 +1000
>
> genxml: avoid using a GNU make pattern rule
>
> % pattern rules are a GNU extension.  Convert the use of one to an
> inference rule to allow this to build on OpenBSD.
>
> v2: inference rules can't have additional prerequisites
> so add a target rule to still depend on gen_pack_header.py
>
> Signed-off-by: Jonathan Gray 
>

Sorry it's taken so long but I just applied this, verified that it still
works with GNU make, added my R-B and pushed it to master.  Thanks!


>
> diff --git a/src/intel/genxml/Makefile.am b/src/intel/genxml/Makefile.am
> index f493d48..0b5b3a6 100644
> --- a/src/intel/genxml/Makefile.am
> +++ b/src/intel/genxml/Makefile.am
> @@ -28,7 +28,11 @@ BUILT_SOURCES = \
>
>  PYTHON3_GEN = $(AM_V_GEN)$(PYTHON3) $(PYTHON_FLAGS)
>
> -%_pack.h : %.xml gen_pack_header.py
> +SUFFIXES = _pack.h .xml
> +
> +$(BUILT_SOURCES): gen_pack_header.py
> +
> +.xml_pack.h:
> $(PYTHON3_GEN) $(srcdir)/gen_pack_header.py $< > $@
>
>  CLEANFILES = $(BUILT_SOURCES)
>


Re: [Mesa-dev] [android-x86-devel] [PATCH] isl: add support for Android libisl static

2016-05-10 Thread Chih-Wei Huang
Hi Mauro,
I think you should put your name in the
copyright section of the Android.*.mk files.
I understand these files were copied,
but you are the one who created and modified them,
so it's better to use your name instead of
Intel, LunarG, ...

2016-05-11 8:31 GMT+08:00 Rob Herring :
> On Tue, May 10, 2016 at 6:56 PM, Mauro Rossi  wrote:
>> This patch adds support for libisl static, needed to build i965.
>>
>> Android.genxml.gen.mk generates the necessary gen%_pack.h headers
>>
>> Android.gen.mk generates isl_format_layout.c
>
> [...]
>
>> diff --git a/src/intel/isl/Makefile.sources b/src/intel/isl/Makefile.sources
>> new file mode 100644
>> index 000..c8eb5b3
>> --- /dev/null
>> +++ b/src/intel/isl/Makefile.sources
>> @@ -0,0 +1,31 @@
>> +libisl_FILES = \
>
> Makefile.am needs to use all of these variables. Do that and this file
> in a separate patch.
>
>> +   isl.c \
>> +   isl.h \
>> +   isl_format.c \
>> +   isl_format_layout.c \
>> +   isl_gen4.c \
>> +   isl_gen4.h \
>> +   isl_gen6.c \
>> +   isl_gen6.h \
>> +   isl_storage_image.c
>> +
>> +libisl_gen7_FILES = \
>> +   isl_gen7.c \
>> +   isl_gen7.h \
>> +   isl_surface_state.c
>> +
>> +libisl_gen75_FILES = \
>> +   isl_surface_state.c
>> +
>> +libisl_gen8_FILES = \
>> +   isl_gen8.c \
>> +   isl_gen8.h \
>> +   isl_surface_state.c
>> +
>> +libisl_gen9_FILES = \
>> +   isl_gen9.c \
>> +   isl_gen9.h \
>> +   isl_surface_state.c
>> +
>> +libisl_GENERATED_FILES = \
>> +   isl_format_layout.c
>> diff --git a/src/mesa/drivers/dri/i965/Android.mk b/src/mesa/drivers/dri/i965/Android.mk
>> index 9fd3a30..b46d5e3 100644
>> --- a/src/mesa/drivers/dri/i965/Android.mk
>> +++ b/src/mesa/drivers/dri/i965/Android.mk
>> @@ -45,14 +45,16 @@ LOCAL_CFLAGS += \
>>  endif
>>
>>  LOCAL_C_INCLUDES := \
>> -   $(MESA_DRI_C_INCLUDES)
>> +   $(MESA_DRI_C_INCLUDES) \
>> +   $(MESA_TOP)/src/intel
>
> This should not be needed if libisl exports includes.
>
>>  LOCAL_SRC_FILES := \
>> $(i965_compiler_FILES) \
>> $(i965_FILES)
>>
>>  LOCAL_WHOLE_STATIC_LIBRARIES := \
>> -   $(MESA_DRI_WHOLE_STATIC_LIBRARIES)
>> +   $(MESA_DRI_WHOLE_STATIC_LIBRARIES) \
>> +   libisl
>>
>>  LOCAL_SHARED_LIBRARIES := \
>> $(MESA_DRI_SHARED_LIBRARIES) \



-- 
Chih-Wei
Android-x86 project
http://www.android-x86.org


Re: [Mesa-dev] [PATCH 04/14] radeonsi: Add buffer load functions.

2016-05-10 Thread Michel Dänzer
On 10.05.2016 19:52, Bas Nieuwenhuizen wrote:
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 81 
> 
>  1 file changed, 81 insertions(+)
> 
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
> index 5897149..d3df4d6 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -733,6 +733,87 @@ static void build_tbuffer_store_dwords(struct si_shader_context *ctx,
>   V_008F0C_BUF_NUM_FORMAT_UINT, 1, 0, 1, 1, 0);
>  }
>  
> +static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
> + LLVMValueRef rsrc,
> + int num_channels,
> + LLVMValueRef vindex,
> + LLVMValueRef voffset,
> + LLVMValueRef soffset,
> + unsigned inst_offset,
> + unsigned glc,
> + unsigned slc)

Looks like the lines for the second and later parameters aren't properly
aligned to the opening parenthesis.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-10 Thread Francisco Jerez
Francisco Jerez  writes:

> Samuel Iglesias Gonsálvez  writes:
>
>> From: Iago Toral Quiroga 
>>
>> There are a few places where we need to shuffle the result of a 32-bit load
>> into valid 64-bit data, so extract this logic into a separate helper that we
>> can reuse.
>>
>> Also, the shuffling needs to operate with WE_all set, which we were missing
>> before, because we are changing the layout of the data across the various
>> channels. Otherwise we will run into problems in non-uniform control-flow
>> scenarios.
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 
>> +---
>>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>>  3 files changed, 73 insertions(+), 73 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index dff13ea..709e4b8 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>>  
>> vec4_result.type = dst.type;
>>  
>> -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. If we
>> -* are reading doubles this means that we get this:
>> -*
>> -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
>> -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
>> -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
>> -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
>> -*
>> -* Fix this up so we return valid double elements:
>> -*
>> -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
>> -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
>> -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
>> -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
>> -*/
>> -   if (type_sz(dst.type) == 8) {
>> -  int multiplier = bld.dispatch_width() / 8;
>> -  fs_reg fixed_res =
>> - fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
>> -  /* We only have 2 doubles in a 32-bit vec4 */
>> -  for (int i = 0; i < 2; i++) {
>> - fs_reg vec4_float =
>> -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
>> - multiplier * 16 * i);
>> -
>> - bld.MOV(stride(fixed_res, 2), vec4_float);
>> - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
>> - horiz_offset(vec4_float, 8 * multiplier));
>> -
>> - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
>> - retype(fixed_res, BRW_REGISTER_TYPE_DF));
>> -  }
>> -   }
>> +   if (type_sz(dst.type) == 8)
>> +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, vec4_result, 2);
>>  
>> int type_slots = MAX2(type_sz(dst.type) / 4, 1);
>> bld.MOV(dst, offset(vec4_result, bld,
>> @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>>  }
>>  
>>  /**
>> + * This helper takes the result of a load operation that reads 32-bit elements
>> + * in this format:
>> + *
>> + * x x x x x x x x
>> + * y y y y y y y y
>> + * z z z z z z z z
>> + * w w w w w w w w
>> + *
>> + * and shuffles the data to get this:
>> + *
>> + * x y x y x y x y
>> + * x y x y x y x y
>> + * z w z w z w z w
>> + * z w z w z w z w
>> + *
>> + * Which is exactly what we want if the load is reading 64-bit components
>> + * like doubles, where x represents the low 32-bit of the x double component
>> + * and y represents the high 32-bit of the x double component (likewise with
>> + * z and w for double component y). The parameter @components represents
>> + * the number of 64-bit components present in @src. This would typically be
>> + * 2 at most, since we can only fit 2 double elements in the result of a
>> + * vec4 load.
>> + *
>> + * Notice that @dst and @src can be the same register.
>> + */
>> +void
>> +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder &bld,
>
> I don't see any reason to make this an fs_visitor method.  Declare this
> as a static function local to brw_fs_nir.cpp, which should improve
> encapsulation and reduce the amount of boilerplate.  Also please don't
> write it in capitals unless you want people to shout the name of your
> function while discussing it out loud. ;)
>
>> +const fs_reg dst,
>> +const fs_reg src,
>> +uint32_t components)
>> +{
>> +   int multiplier = bld.dispatch_width() / 8;
>
> This definition is redundant with the changes below taken into account.
>
>> +
>> +   /* A temporary that we will use to shuffle the 32-bit data of each
>> +* component in the vector into valid 64-bit data
>> +*/
>> +   fs_reg tmp =
>> +  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
>
> I don't think there is any reason to do this inside a temporary instead
> of writing into the destination register directly.
>
>> +
>> +   /* We 
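
The data movement described in the helper's comment can be modeled on the CPU with plain arrays. The sketch below is hypothetical: it assumes SIMD8, represents one register row as 8 dwords, and writes each component back through a temporary only after reading it completely, so `dst` and `src` may be the same buffer, as the comment allows. It models only the layout change, not the WE_all requirement, which matters for the real EU instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIMD_WIDTH 8 /* hypothetical: one register row = 8 dwords (SIMD8) */

/* For component c, src row 2c holds the low dwords of all channels and row
 * 2c + 1 the high dwords; dst gets (low, high) pairs per channel instead, so
 * each component can be read back as SIMD_WIDTH doubles. */
static void shuffle_load_to_64bit(uint32_t *dst, const uint32_t *src,
                                  unsigned components)
{
   assert(dst != NULL && src != NULL);

   for (unsigned c = 0; c < components; c++) {
      const uint32_t *lo = src + c * 2 * SIMD_WIDTH;
      const uint32_t *hi = lo + SIMD_WIDTH;
      uint32_t tmp[2 * SIMD_WIDTH];

      for (unsigned ch = 0; ch < SIMD_WIDTH; ch++) {
         tmp[2 * ch] = lo[ch];
         tmp[2 * ch + 1] = hi[ch];
      }

      /* Written back only after the whole component was read, so dst == src works. */
      memcpy(dst + c * 2 * SIMD_WIDTH, tmp, sizeof(tmp));
   }
}
```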

Re: [Mesa-dev] [PATCH 18/23] i965/fs: add SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE helper

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> This does the inverse operation of SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA
> and we will use it when we need to write 64-bit data in the layout expected
> by untyped write messages.
>
> Again, this needs to operate with WE_all set for the same reasons as the
> inverse operation.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 42 
> 
>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 +
>  2 files changed, 47 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 709e4b8..80803a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -285,6 +285,48 @@ fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder &bld,
>  }
>  
>  /**
> + * This helper does the inverse operation of
> + * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
> + *
> + * We need to do this when we are going to use untyped write messages that
> + * operate with 32-bit components, in order to arrange our 64-bit data
> + * in the expected layout.
> + */
> +void
> +fs_visitor::SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE(const fs_builder ,
> +   const fs_reg dst,
> +   const fs_reg src,
> +   uint32_t components)
> +{
> +   int multiplier = bld.dispatch_width() / 8;
> +
> +   /* A temporary that we will use to shuffle the 64-bit data of each
> +* component in the vector into 32-bit data that we can write.
> +*/
> +   fs_reg tmp =
> +  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to operate the source in units of 32-bit */
> +   fs_reg src_data = retype(src, BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to operate on the dst in units of 64-bit */
> +   fs_reg dst_data = retype(dst, BRW_REGISTER_TYPE_DF);
> +
> +   /* Shuffle the data */
> +   for (unsigned i = 0; i < components; i++) {
> +  fs_reg component_i = horiz_offset(src_data, multiplier * 16 * i);
> +
> +  bld.MOV(tmp, stride(component_i, 2))->force_writemask_all = true;
> +  bld.MOV(horiz_offset(tmp, 8 * multiplier),
> +  stride(horiz_offset(component_i, 1), 2))
> + ->force_writemask_all = true;
> +
> +  bld.MOV(horiz_offset(dst_data, multiplier * 8 * i),
> +  retype(tmp, BRW_REGISTER_TYPE_DF))->force_writemask_all = true;
> +   }
> +}
> +

Looks like pretty much all of my comments on PATCH 15 apply here too.
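As a rough illustration of the layout change SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE performs, here is a CPU-side C++ model of one SIMD8 component. It is only a sketch, not Mesa code: the function and struct names are hypothetical, plain arrays stand in for GRFs, and the host is assumed to be little-endian so that the low dword of each double is stored first.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of one SIMD8 component (not Mesa code).
// The two 8-dword "registers" hold all low dwords, then all high dwords,
// which is the 32-bit layout an untyped write would consume.
struct SplitRegs {
  std::array<uint32_t, 8> lo;  // first register: low 32 bits of each channel
  std::array<uint32_t, 8> hi;  // second register: high 32 bits of each channel
};

SplitRegs split_64bit_for_write(const std::array<double, 8>& src) {
  // View the 8 doubles as 16 interleaved dwords; assumes a little-endian host.
  std::array<uint32_t, 16> dwords{};
  std::memcpy(dwords.data(), src.data(), sizeof(dwords));

  SplitRegs out{};
  for (int ch = 0; ch < 8; ++ch) {
    out.lo[ch] = dwords[2 * ch];      // even dwords: stride-2 read starting at 0
    out.hi[ch] = dwords[2 * ch + 1];  // odd dwords: stride-2 read starting at 1
  }
  return out;
}
```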

> +/**
>   * A helper for MOV generation for fixing up broken hardware SEND dependency
>   * handling.
>   */
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 1e78f0c..9178347 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -111,6 +111,11 @@ public:
>  const fs_reg src,
>  uint32_t components);
>  
> +   void SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE(const brw::fs_builder ,
> +   const fs_reg dst,
> +   const fs_reg src,
> +   uint32_t components);
> +
> void do_untyped_vector_read(const brw::fs_builder ,
> const fs_reg surf_index,
> const fs_reg offset_reg,
> -- 
> 2.5.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


signature.asc
Description: PGP signature


Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> There are a few places where we need to shuffle the result of a 32-bit load
> into valid 64-bit data, so extract this logic into a separate helper that we
> can reuse.
>
> Also, the shuffling needs to operate with WE_all set, which we were missing
> before, because we are changing the layout of the data across the various
> channels. Otherwise we will run into problems in non-uniform control-flow
> scenarios.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 +---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>  3 files changed, 73 insertions(+), 73 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index dff13ea..709e4b8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
>  
> vec4_result.type = dst.type;
>  
> -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. If we
> -* are reading doubles this means that we get this:
> -*
> -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
> -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
> -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
> -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
> -*
> -* Fix this up so we return valid double elements:
> -*
> -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
> -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
> -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
> -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
> -*/
> -   if (type_sz(dst.type) == 8) {
> -  int multiplier = bld.dispatch_width() / 8;
> -  fs_reg fixed_res =
> - fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> -  /* We only have 2 doubles in a 32-bit vec4 */
> -  for (int i = 0; i < 2; i++) {
> - fs_reg vec4_float =
> -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
> - multiplier * 16 * i);
> -
> - bld.MOV(stride(fixed_res, 2), vec4_float);
> - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
> - horiz_offset(vec4_float, 8 * multiplier));
> -
> - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
> - retype(fixed_res, BRW_REGISTER_TYPE_DF));
> -  }
> -   }
> +   if (type_sz(dst.type) == 8)
> +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, vec4_result, 2);
>  
> int type_slots = MAX2(type_sz(dst.type) / 4, 1);
> bld.MOV(dst, offset(vec4_result, bld,
> @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
>  }
>  
>  /**
> + * This helper takes the result of a load operation that reads 32-bit elements
> + * in this format:
> + *
> + * x x x x x x x x
> + * y y y y y y y y
> + * z z z z z z z z
> + * w w w w w w w w
> + *
> + * and shuffles the data to get this:
> + *
> + * x y x y x y x y
> + * x y x y x y x y
> + * z w z w z w z w
> + * z w z w z w z w
> + *
> + * Which is exactly what we want if the load is reading 64-bit components
> + * like doubles, where x represents the low 32-bit of the x double component
> + * and y represents the high 32-bit of the x double component (likewise with
> + * z and w for double component y). The parameter @components represents
> + * the number of 64-bit components present in @src. This would typically be
> + * 2 at most, since we can only fit 2 double elements in the result of a
> + * vec4 load.
> + *
> + * Notice that @dst and @src can be the same register.
> + */
> +void
> +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder ,

I don't see any reason to make this an fs_visitor method.  Declare this
as a static function local to brw_fs_nir.cpp, which should improve
encapsulation and reduce the amount of boilerplate.  Also, please don't
write it in capitals unless you want people to shout the name of your
function while discussing it out loud. ;)

> +const fs_reg dst,
> +const fs_reg src,
> +uint32_t components)
> +{
> +   int multiplier = bld.dispatch_width() / 8;

This definition is redundant with the changes below taken into account.

> +
> +   /* A temporary that we will use to shuffle the 32-bit data of each
> +* component in the vector into valid 64-bit data
> +*/
> +   fs_reg tmp =
> +  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);

I don't think there is any reason to do this inside a temporary instead
of writing into the destination register directly.
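To make the register layouts in the comment above concrete, here is a CPU-side C++ model of what the helper does for a single 64-bit component in SIMD8. It is a sketch under stated assumptions, not Mesa code: the function name is hypothetical, two `std::array<uint32_t, 8>` stand in for the two 32-bit GRFs returned by the load (low dwords in the first, high dwords in the second), and the host is assumed to be little-endian. The model writes into a separate output array, so it sidesteps the question of @dst and @src aliasing.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model (not Mesa code) of one SIMD8 component after a 32-bit
// load of 64-bit data: `lo` holds the low dwords of all 8 channels and `hi`
// holds the high dwords.
std::array<double, 8> shuffle_32bit_to_64bit(const std::array<uint32_t, 8>& lo,
                                             const std::array<uint32_t, 8>& hi) {
  // Interleave into lo0 hi0 lo1 hi1 ..., i.e. the "x0 x1 x0 x1" rows of the
  // comment, which is what a DF-typed region expects.
  std::array<uint32_t, 16> interleaved{};
  for (int ch = 0; ch < 8; ++ch) {
    interleaved[2 * ch] = lo[ch];      // like a MOV to a stride-2 dst at offset 0
    interleaved[2 * ch + 1] = hi[ch];  // like a MOV to a stride-2 dst at offset 1
  }
  // Reinterpret each pair of dwords as a double; assumes a little-endian host.
  std::array<double, 8> out{};
  std::memcpy(out.data(), interleaved.data(), sizeof(out));
  return out;
}
```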

> +
> +   /* We are going to manipulate the data in elements of 32-bit */
> +   fs_reg src_data = retype(src, BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to manipulate the dst in elements of 64-bit */
> 

[Mesa-dev] [Bug 95323] GL33-CTS.CommonBugs.CommonBug_ReservedNames fails

2016-05-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95323

--- Comment #1 from Ian Romanick  ---
At least per the GLSL 1.50 specification, this test enforces NON-conformance! 
The grammar makes it quite clear that

    in struct { ... } foo;

declares a shader input "foo" that is an anonymous structure.  Here are the
relevant bits of the grammar (with some omissions for clarity):

single_declaration:
    ...
    fully_specified_type IDENTIFIER
    ...

fully_specified_type:
    type_specifier
    type_qualifier type_specifier

type_qualifier:
    storage_qualifier
    ...

storage_qualifier:
    ...
    IN
    ...

type_specifier:
    type_specifier_no_prec
    precision_qualifier type_specifier_no_prec

type_specifier_no_prec:
    type_specifier_nonarray
    ...

type_specifier_nonarray:
    VOID
    FLOAT
    INT
    ...
    struct_specifier
    TYPE_NAME

struct_specifier:
    ...
    STRUCT LEFT_BRACE struct_declaration_list RIGHT_BRACE

Neither section 4.3.4 (Inputs) nor section 4.3.7 (Interface Blocks) adds any
limitations that I can see.  I'll dig through later specs to see if this
changed.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 07/23] i965/fs: fix copy propagation of partially invalidated entries

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We were not invalidating entries with a src that reads more than one register
> when we find writes that overwrite any register read by entry->src after
> the first. This leads to incorrect copy propagation because we re-use
> entries from the ACP that have been partially invalidated. Same thing for
> entries with a dst that writes to more than one register.
> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 35 --
>  1 file changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 23df877..fe37676 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -44,6 +44,7 @@ struct acp_entry : public exec_node {
> fs_reg dst;
> fs_reg src;
> uint8_t regs_written;
> +   uint8_t regs_read;
> enum opcode opcode;
> bool saturate;
>  };
> @@ -768,18 +769,32 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
>/* kill the destination from the ACP */
>if (inst->dst.file == VGRF) {
>   foreach_in_list_safe(acp_entry, entry, [inst->dst.nr % ACP_HASH_SIZE]) {
> - if (inst->overwrites_reg(entry->dst)) {
> -entry->remove();
> - }
> -  }
> +fs_reg tmp = entry->dst;
> +for (int n = 0; n < entry->regs_written; n++) {
> +   if (inst->overwrites_reg(tmp)) {
> +  entry->remove();
> +  break;
> +   }
> +   tmp.reg_offset++;
> +}
> + }

The loop shouldn't be necessary; I suggest we do something like:

|  if (regions_overlap(entry->dst, entry->regs_written,
|  inst->dst, inst->regs_written)) {
| entry->remove();
|  }

where:

| inline bool
| regions_overlap(const fs_reg , unsigned n, const fs_reg , unsigned m)
| {
| return r.file == s.file && r.nr == s.nr &&
|!(r.reg_offset + n <= s.reg_offset ||
|  s.reg_offset + m <= r.reg_offset);
| }

Alternatively you could extend 'fs_inst::overwrites_reg()' to take a
register range instead of a single register -- Whatever you do it would
be nice to see a v2 of the patch before you push.
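The overlap test suggested above can be tried in isolation with the sketch below. `simple_reg` and `sketch_file` are hypothetical stand-ins for `fs_reg` and its register file, keeping only the fields the test reads. The ranges are treated as half-open register intervals, so `<=` is used and two ranges that merely touch are not reported as overlapping. One call per ACP entry would replace the per-register loop in the patch.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for fs_reg (not Mesa's definition).
enum sketch_file { FILE_VGRF, FILE_OTHER };

struct simple_reg {
  sketch_file file;
  unsigned nr;          // virtual register number
  unsigned reg_offset;  // offset in whole registers
};

// True if [r.reg_offset, r.reg_offset + n) and [s.reg_offset, s.reg_offset + m)
// share at least one register of the same virtual register.
inline bool
regions_overlap(const simple_reg &r, unsigned n, const simple_reg &s, unsigned m)
{
   return r.file == s.file && r.nr == s.nr &&
          !(r.reg_offset + n <= s.reg_offset ||
            s.reg_offset + m <= r.reg_offset);
}
```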

>  
>   /* Oops, we only have the chaining hash based on the destination, not
>* the source, so walk across the entire table.
>*/
>   for (int i = 0; i < ACP_HASH_SIZE; i++) {
>  foreach_in_list_safe(acp_entry, entry, [i]) {
> -   if (inst->overwrites_reg(entry->src))
> -  entry->remove();
> +   /* Make sure we kill the entry if this instructions overwrites
> +* _any_ of the registers that it reads
> +*/
> +   fs_reg tmp = entry->src;
> +   for (int n = 0; n < entry->regs_read; n++) {
> +  if (inst->overwrites_reg(tmp)) {
> + entry->remove();
> + break;
> +  }
> +  tmp.reg_offset++;
> +   }

Use 'regions_overlap' here too instead of a loop.

>  }
>}
>}
> @@ -788,10 +803,11 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
> * operand of another instruction, add it to the ACP.
> */
>if (can_propagate_from(inst)) {
> -  acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
> -  entry->dst = inst->dst;
> -  entry->src = inst->src[0];
> + acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
> + entry->dst = inst->dst;
> + entry->src = inst->src[0];
>   entry->regs_written = inst->regs_written;
> + entry->regs_read = inst->regs_read(0);
>   entry->opcode = inst->opcode;
>   entry->saturate = inst->saturate;
>   acp[entry->dst.nr % ACP_HASH_SIZE].push_tail(entry);
> @@ -807,6 +823,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
> entry->dst.reg_offset = offset;
> entry->src = inst->src[i];
> entry->regs_written = regs_written;
> +   entry->regs_read = inst->regs_read(i);
> entry->opcode = inst->opcode;
> if (!entry->dst.equals(inst->src[i])) {
>acp[entry->dst.nr % ACP_HASH_SIZE].push_tail(entry);
> -- 
> 2.5.0
>



[Mesa-dev] [PATCH] i965: Drop perf_debug about rasterizer discard in SOL vs. clipper.

2016-05-10 Thread Kenneth Graunke
I recently experimented with performing rasterizer discard in the SOL
unit instead of the clipper, and as far as I can tell, it's basically
the same performance.  The clipper comes directly after SOL anyway,
and setting the clipper to REJECT_ALL should be pretty darn cheap.

Keep the perf_debug on Sandybridge, where the GS actually does work.

Signed-off-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/gen6_clip_state.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen6_clip_state.c b/src/mesa/drivers/dri/i965/gen6_clip_state.c
index 8ae19c8..7bcae9f 100644
--- a/src/mesa/drivers/dri/i965/gen6_clip_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_clip_state.c
@@ -163,9 +163,11 @@ upload_clip_state(struct brw_context *brw)
/* BRW_NEW_RASTERIZER_DISCARD */
if (ctx->RasterDiscard) {
   dw2 |= GEN6_CLIP_MODE_REJECT_ALL;
-  perf_debug("Rasterizer discard is currently implemented via the clipper; "
- "%s be faster.\n", brw->gen >= 7 ? "using the SOL unit may" :
- "having the GS not write primitives would likely");
+  if (brw->gen == 6) {
+ perf_debug("Rasterizer discard is currently implemented via the "
+"clipper; having the GS not write primitives would "
+"likely be faster.\n");
+  }
}
 
uint32_t enable;
-- 
2.8.2



[Mesa-dev] [PATCH] glsl: use enum glsl_interface_packing in more places.

2016-05-10 Thread Dave Airlie
From: Dave Airlie 

Although glsl_types.h stores this in a bitfield,
we should hide that from everyone else. Hide the cast
in an accessor method and use the enum everywhere.

This makes things a bit nicer in gdb, and improves type
safety.

Signed-off-by: Dave Airlie 
---
 src/compiler/glsl/ir.h  |  4 
 src/compiler/glsl/link_uniform_block_active_visitor.cpp |  6 ++
 src/compiler/glsl/link_uniform_blocks.cpp   |  6 +++---
 src/compiler/glsl/link_uniforms.cpp | 16 
 src/compiler/glsl/linker.h  |  2 +-
 src/compiler/glsl/lower_buffer_access.cpp   |  2 +-
 src/compiler/glsl/lower_buffer_access.h |  2 +-
 src/compiler/glsl/lower_shared_reference.cpp|  6 +++---
 src/compiler/glsl/lower_ubo_reference.cpp   | 16 
 src/compiler/glsl/opt_dead_code.cpp |  2 +-
 src/compiler/glsl_types.h   |  8 
 11 files changed, 40 insertions(+), 30 deletions(-)

diff --git a/src/compiler/glsl/ir.h b/src/compiler/glsl/ir.h
index 2bf04e68..ab06b13 100644
--- a/src/compiler/glsl/ir.h
+++ b/src/compiler/glsl/ir.h
@@ -534,6 +534,10 @@ public:
   return this->interface_type;
}
 
+   enum glsl_interface_packing get_interface_type_packing() const
+   {
+ return this->interface_type->get_interface_packing();
+   }
/**
 * Get the max_ifc_array_access pointer
 *
diff --git a/src/compiler/glsl/link_uniform_block_active_visitor.cpp b/src/compiler/glsl/link_uniform_block_active_visitor.cpp
index 54fea70..df8b221 100644
--- a/src/compiler/glsl/link_uniform_block_active_visitor.cpp
+++ b/src/compiler/glsl/link_uniform_block_active_visitor.cpp
@@ -167,8 +167,7 @@ link_uniform_block_active_visitor::visit(ir_variable *var)
 * also considered active, even if no member of the block is
 * referenced."
 */
-   if (var->get_interface_type()->interface_packing ==
-   GLSL_INTERFACE_PACKING_PACKED)
+   if (var->get_interface_type_packing() == GLSL_INTERFACE_PACKING_PACKED)
   return visit_continue;
 
/* Process the block.  Bail if there was an error.
@@ -258,8 +257,7 @@ link_uniform_block_active_visitor::visit_enter(ir_dereference_array *ir)
 * std140 layout qualifier, all its instances have been already marked
 * as used in link_uniform_block_active_visitor::visit(ir_variable *).
 */
-   if (var->get_interface_type()->interface_packing ==
-   GLSL_INTERFACE_PACKING_PACKED) {
+   if (var->get_interface_type_packing() == GLSL_INTERFACE_PACKING_PACKED) {
   b->var = var;
   process_arrays(this->mem_ctx, ir, b);
}
diff --git a/src/compiler/glsl/link_uniform_blocks.cpp b/src/compiler/glsl/link_uniform_blocks.cpp
index ac415b5..6f04cf6 100644
--- a/src/compiler/glsl/link_uniform_blocks.cpp
+++ b/src/compiler/glsl/link_uniform_blocks.cpp
@@ -68,7 +68,7 @@ private:
}
 
virtual void enter_record(const glsl_type *type, const char *,
- bool row_major, const unsigned packing) {
+ bool row_major, const enum glsl_interface_packing packing) {
   assert(type->is_record());
   if (packing == GLSL_INTERFACE_PACKING_STD430)
  this->offset = glsl_align(
@@ -79,7 +79,7 @@ private:
}
 
virtual void leave_record(const glsl_type *type, const char *,
- bool row_major, const unsigned packing) {
+ bool row_major, const enum glsl_interface_packing packing) {
   assert(type->is_record());
 
   /* If this is the last field of a structure, apply rule #9.  The
@@ -104,7 +104,7 @@ private:
 
virtual void visit_field(const glsl_type *type, const char *name,
 bool row_major, const glsl_type *,
-const unsigned packing,
+const enum glsl_interface_packing packing,
 bool last_field)
{
   assert(this->index < this->num_variables);
diff --git a/src/compiler/glsl/link_uniforms.cpp b/src/compiler/glsl/link_uniforms.cpp
index 92f1095..3021538 100644
--- a/src/compiler/glsl/link_uniforms.cpp
+++ b/src/compiler/glsl/link_uniforms.cpp
@@ -65,7 +65,7 @@ program_resource_visitor::process(const glsl_type *type, const char *name)
 
unsigned record_array_count = 1;
char *name_copy = ralloc_strdup(NULL, name);
-   unsigned packing = type->interface_packing;
+   enum glsl_interface_packing packing = type->get_interface_packing();
 
recursion(type, _copy, strlen(name), false, NULL, packing, false,
  record_array_count, NULL);
@@ -79,9 +79,9 @@ program_resource_visitor::process(ir_variable *var)
const bool row_major =
   var->data.matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR;
 
-   const unsigned packing = var->get_interface_type() ?
- 
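The pattern in Dave Airlie's patch above (store the value compactly in a bitfield, but expose it only through an accessor that returns the enum) can be sketched in isolation. `sketch_type` is hypothetical and is not Mesa's glsl_type; the enumerator names mirror those used in the patch, while their order, their values and the 2-bit field width are assumptions made for the example. The cast lives in exactly one place, and callers get a typed enum instead of a bare unsigned.

```cpp
#include <cassert>

// Names mirror the patch; order/values and field width are assumptions.
enum glsl_interface_packing {
   GLSL_INTERFACE_PACKING_STD140,
   GLSL_INTERFACE_PACKING_SHARED,
   GLSL_INTERFACE_PACKING_PACKED,
   GLSL_INTERFACE_PACKING_STD430,
};

// Hypothetical type: packing is stored in a 2-bit unsigned bitfield to save
// space, but the only public view of it is the enum-typed accessor.
class sketch_type {
public:
   explicit sketch_type(glsl_interface_packing p) : interface_packing(p) {}

   glsl_interface_packing get_interface_packing() const
   {
      // The single place where the bitfield is cast back to the enum.
      return static_cast<glsl_interface_packing>(interface_packing);
   }

private:
   unsigned interface_packing : 2;
};
```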

Re: [Mesa-dev] [android-x86-devel] [PATCH] isl: add support for Android libisl static

2016-05-10 Thread Rob Herring
On Tue, May 10, 2016 at 6:56 PM, Mauro Rossi  wrote:
> This patch adds support for libisl static, needed to build i965.
>
> Android.genxml.gen.mk generates the necessary gen%_pack.h headers
>
> Android.gen.mk generates isl_format_layout.c

[...]

> diff --git a/src/intel/isl/Makefile.sources b/src/intel/isl/Makefile.sources
> new file mode 100644
> index 000..c8eb5b3
> --- /dev/null
> +++ b/src/intel/isl/Makefile.sources
> @@ -0,0 +1,31 @@
> +libisl_FILES = \

Makefile.am needs to use all of these variables. Do that, and add this
file, in a separate patch.

> +   isl.c \
> +   isl.h \
> +   isl_format.c \
> +   isl_format_layout.c \
> +   isl_gen4.c \
> +   isl_gen4.h \
> +   isl_gen6.c \
> +   isl_gen6.h \
> +   isl_storage_image.c
> +
> +libisl_gen7_FILES = \
> +   isl_gen7.c \
> +   isl_gen7.h \
> +   isl_surface_state.c
> +
> +libisl_gen75_FILES = \
> +   isl_surface_state.c
> +
> +libisl_gen8_FILES = \
> +   isl_gen8.c \
> +   isl_gen8.h \
> +   isl_surface_state.c
> +
> +libisl_gen9_FILES = \
> +   isl_gen9.c \
> +   isl_gen9.h \
> +   isl_surface_state.c
> +
> +libisl_GENERATED_FILES = \
> +   isl_format_layout.c
> diff --git a/src/mesa/drivers/dri/i965/Android.mk b/src/mesa/drivers/dri/i965/Android.mk
> index 9fd3a30..b46d5e3 100644
> --- a/src/mesa/drivers/dri/i965/Android.mk
> +++ b/src/mesa/drivers/dri/i965/Android.mk
> @@ -45,14 +45,16 @@ LOCAL_CFLAGS += \
>  endif
>
>  LOCAL_C_INCLUDES := \
> -   $(MESA_DRI_C_INCLUDES)
> +   $(MESA_DRI_C_INCLUDES) \
> +   $(MESA_TOP)/src/intel

This should not be needed if libisl exports includes.

>  LOCAL_SRC_FILES := \
> $(i965_compiler_FILES) \
> $(i965_FILES)
>
>  LOCAL_WHOLE_STATIC_LIBRARIES := \
> -   $(MESA_DRI_WHOLE_STATIC_LIBRARIES)
> +   $(MESA_DRI_WHOLE_STATIC_LIBRARIES) \
> +   libisl
>
>  LOCAL_SHARED_LIBRARIES := \
> $(MESA_DRI_SHARED_LIBRARIES) \


Re: [Mesa-dev] [PATCH 09/23] i965/fs: don't copy propagate from a larger type if the stride is not 1

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> Because the stride is in units of the type, if we copy-propagate from
> another instruction using a larger type, then we need to make sure
> that the source in that instruction (the one we will be copy-propagating
> from) reads consecutive elements. Otherwise, when it is read using a
> smaller type, the actual elements read would change.
>
> This prevents us from turning this:
> mov(8) vgrf3+2.0:DF, vgrf11<0>:DF
> load_payload(8) vgrf15:UD, ..., vgrf3+2.0<0>:D, vgrf3+3.0<0>:D
>
> Into:
> mov(8) vgrf3+2.0:DF, vgrf11<0>:DF
> load_payload(8) vgrf15:UD, ..., vgrf11<0>:D, vgrf11<0>:D
>

Sorry, but I don't see the problem; the two assembly examples look fully
equivalent to me.

> In the original instructions, vgrf3+2.0<0>:D reads a replicated 64-bit
> value, while the result after copy propagation only reads the first
> 32-bit of the value.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 9fc06cb..f98ab41 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -400,6 +400,15 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
> if (type_sz(entry->dst.type) < type_sz(inst->src[arg].type))
>return false;
>  
> +   /* Bail if the instruction type is larger than the execution type of the
> +* copy and the stride of the source we would be copy propagating from
> +* has a stride other than 1. Otherwise, since the stride is in units of
> +* the type, we would be changing the region effectively sourced.
> +*/
> +   if (type_sz(entry->dst.type) > type_sz(inst->src[arg].type) &&
> +   entry->src.stride != 1)
> +  return false;

NAK, there is a more accurate condition just a few lines below making
sure that the strides can be composed correctly when the original
entry->src.stride is not one.  Some special cases of
'type_sz(entry->dst.type) > type_sz(inst->src[arg].type) &&
entry->src.stride != 1' are handled by copy propagation, and the cases
that are not should already be caught by the check immediately below.
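The premise of this patch, that a region's stride is counted in elements of the register's type rather than in bytes, can be checked with a few lines of arithmetic. The helper below is hypothetical and only computes where each element of a region starts; it does not model the actual check in try_copy_propagate(). It shows, for instance, that a DF region with stride 1 and a D region with stride 2 start their elements at the same byte offsets, which is why strides have to be composed rather than copied when a copy is propagated into a source of a different type size.

```cpp
#include <cassert>
#include <vector>

// Byte offsets of the first `n` elements of a region, given the element
// size in bytes and the stride in elements (hypothetical helper).
std::vector<unsigned> element_byte_offsets(unsigned type_size, unsigned stride,
                                           unsigned n) {
  std::vector<unsigned> offsets;
  for (unsigned i = 0; i < n; ++i)
    offsets.push_back(i * stride * type_size);
  return offsets;
}
```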

> +
> /* Bail if the result of composing both strides cannot be expressed
>  * as another stride. This avoids, for example, trying to transform
>  * this:
> -- 
> 2.5.0
>




Re: [Mesa-dev] [PATCH] glsl: use var with initializer on global var validation

2016-05-10 Thread Kenneth Graunke
On Friday, May 6, 2016 10:55:36 AM PDT Juan A. Suarez Romero wrote:
> Currently, when cross validating global variables, all global variables
> seen in the shaders that are part of a program are saved in a table.
> 
> When checking a variable that already exists in the table, we check that both
> are initialized to the same value. If the already saved variable does
> not have an initializer, we copy it from the new variable.
> 
> Unfortunately this is wrong, as we are modifying something that is
> constant. Also, if this modified variable is used in
> another program, it will keep the initializer, when it should have none.
> 
> Instead of copying the initializer, this commit replaces the old
> variable with the new one. That way, if we see the same variable again
> with an initializer, we can check whether both initializers are the same.
> ---
>  src/compiler/glsl/glsl_symbol_table.cpp | 10 ++
>  src/compiler/glsl/glsl_symbol_table.h   |  5 +
>  src/compiler/glsl/linker.cpp| 27 +--
>  3 files changed, 20 insertions(+), 22 deletions(-)
> 
> diff --git a/src/compiler/glsl/glsl_symbol_table.cpp b/src/compiler/glsl/glsl_symbol_table.cpp
> index 6c682ac..6d7baad 100644
> --- a/src/compiler/glsl/glsl_symbol_table.cpp
> +++ b/src/compiler/glsl/glsl_symbol_table.cpp
> @@ -278,3 +278,13 @@ glsl_symbol_table::disable_variable(const char *name)
>entry->v = NULL;
> }
>  }
> +
> +void
> +glsl_symbol_table::replace_variable(const char *name,
> +ir_variable *v)
> +{
> +   symbol_table_entry *entry = get_entry(name);
> +   if (entry != NULL) {
> +  entry->v = v;
> +   }
> +}
> diff --git a/src/compiler/glsl/glsl_symbol_table.h b/src/compiler/glsl/glsl_symbol_table.h
> index 5d654e5..2f94d4c 100644
> --- a/src/compiler/glsl/glsl_symbol_table.h
> +++ b/src/compiler/glsl/glsl_symbol_table.h
> @@ -100,6 +100,11 @@ struct glsl_symbol_table {
>  */
> void disable_variable(const char *name);
>  
> +   /**
> +* Replaces the variable in the entry by the new variable.
> +*/
> +   void replace_variable(const char *name, ir_variable *v);
> +
>  private:
> symbol_table_entry *get_entry(const char *name);
>  
> diff --git a/src/compiler/glsl/linker.cpp b/src/compiler/glsl/linker.cpp
> index 9c72478..4522734 100644
> --- a/src/compiler/glsl/linker.cpp
> +++ b/src/compiler/glsl/linker.cpp
> @@ -1093,21 +1093,11 @@ cross_validate_globals(struct gl_shader_program *prog,
>return;
> }
>  } else {
> -   /* If the first-seen instance of a particular uniform did not
> -* have an initializer but a later instance does, copy the
> -* initializer to the version stored in the symbol table.
> -*/
> -   /* FINISHME: This is wrong.  The constant_value field should
> -* FINISHME: not be modified!  Imagine a case where a shader
> -* FINISHME: without an initializer is linked in two different
> -* FINISHME: programs with shaders that have differing
> -* FINISHME: initializers.  Linking with the first will
> -* FINISHME: modify the shader, and linking with the second
> -* FINISHME: will fail.
> -*/
> -   existing->constant_initializer =
> -  var->constant_initializer->clone(ralloc_parent(existing),
> -   NULL);
> +   /* If the first-seen instance of a particular uniform did
> +* not have an initializer but a later instance does,
> +* replace the former with the later.
> +   */

Please convert the tabs in the above lines to spaces.

Thanks for fixing this!

Reviewed-by: Kenneth Graunke 

> +  variables.replace_variable(existing->name, var);
>  }
>   }
>  
> @@ -1121,13 +,6 @@ cross_validate_globals(struct gl_shader_program *prog,
>  var->name);
> return;
>  }
> -
> -/* Some instance had an initializer, so keep track of that.  In
> - * this location, all sorts of initializers (constant or
> - * otherwise) will propagate the existence to the variable
> - * stored in the symbol table.
> - */
> -existing->data.has_initializer = true;
>   }
>  
>   if (existing->data.invariant != var->data.invariant) {
> 
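A simplified model of the approach described in this patch: when the first-seen declaration of a global has no initializer and a later one does, the table entry is pointed at the later variable instead of copying its initializer into the first one. `sketch_var`, `cross_validate()` and the `std::unordered_map` table are hypothetical stand-ins for ir_variable, cross_validate_globals() and glsl_symbol_table, and the model ignores every other check the linker performs. Because the shared variable is never modified, a declaration without an initializer can be linked into two programs whose other shaders use different initializers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for ir_variable: a name plus an optional initializer.
struct sketch_var {
  std::string name;
  bool has_initializer;
  int initializer;
};

// Cross-validate one global against the globals seen so far in a program.
// Returns false on an initializer mismatch. A later initializer is adopted
// by re-pointing the table entry, never by mutating the first-seen variable.
bool cross_validate(std::unordered_map<std::string, const sketch_var *> &table,
                    const sketch_var *var) {
  auto it = table.find(var->name);
  if (it == table.end()) {
    table[var->name] = var;
    return true;
  }
  const sketch_var *existing = it->second;
  if (var->has_initializer) {
    if (existing->has_initializer)
      return existing->initializer == var->initializer;
    it->second = var;  // replace the entry; `existing` stays untouched
  }
  return true;
}
```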





Re: [Mesa-dev] [PATCH 1/3] nir/algebraic: Separate ffma lowering from fusing

2016-05-10 Thread Kenneth Graunke
On Thursday, May 5, 2016 5:51:18 PM PDT Jason Ekstrand wrote:
> The i965 driver has its own pass for fusing mul+add combinations that's
> much smarter than what nir_opt_algebraic can do, so we don't want to get the
> nir_opt_algebraic one just because we didn't set lower_ffma.
> ---
>  src/compiler/nir/nir.h  | 1 +
>  src/compiler/nir/nir_opt_algebraic.py   | 2 +-
>  src/gallium/drivers/freedreno/ir3/ir3_nir.c | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)

Series is:
Reviewed-by: Kenneth Graunke 

I always thought it was pretty sketchy that we were splitting up
actual fma() GLSL built-in calls.  The whole point of using fma()
is that you want higher precision.  Otherwise, there's a * b + c.

It may have been technically legal (maybe), but this seems like the
right thing to do.
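The precision point above can be seen on the CPU with standard C++: `std::fma` rounds once, while `a * b + c` rounds the product first. This only illustrates IEEE double behavior on the host; the precision actually guaranteed for GLSL fma() on a GPU is set by the GLSL specification and the hardware, not by this example. The inputs are chosen so that the exact product needs more than 53 significand bits.

```cpp
#include <cassert>
#include <cmath>

// a*a is exactly 1 + 2^-26 + 2^-54, which does not fit in a double.
// Returns the residual a*a - c computed with one rounding and with two.
struct residuals { double fused, unfused; };

residuals fma_vs_mul_add() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double c = 1.0 + std::ldexp(1.0, -26);
  // One rounding: the exact product takes part in the subtraction.
  double fused = std::fma(a, a, -c);
  // Two roundings: the product is rounded to double first. `volatile`
  // keeps the compiler from contracting this into an fma on its own.
  volatile double product = a * a;
  double unfused = product - c;
  return {fused, unfused};
}
```

With these inputs the fused residual is 2^-54, while the unfused one is 0 because the 2^-54 term is lost when the product is rounded.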




Re: [Mesa-dev] [PATCH 08/23] i965/fs: fix copy propagation from load payload

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We were not considering the case where the load payload is writing to
> a destination with a reg_offset > 0.

Reviewed-by: Francisco Jerez 
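The effect of this one-character fix can be modelled outside the compiler. In this hypothetical sketch, a LOAD_PAYLOAD-like instruction writes its sources back to back into a destination that itself starts `dst_reg_offset` whole registers into a VGRF; each ACP entry must record the destination's offset plus the running offset (`+=`), not the running offset alone (`=`). The function name and the whole-register granularity are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Returns the absolute register offset at which each payload source lands,
// given the destination's own starting offset and each source's size in
// registers (hypothetical model).
std::vector<unsigned> payload_source_offsets(unsigned dst_reg_offset,
                                             const std::vector<unsigned> &regs_per_src) {
  std::vector<unsigned> offsets;
  unsigned offset = 0;  // relative to the start of the destination
  for (unsigned regs : regs_per_src) {
    // Add the relative offset to the destination's own offset instead of
    // overwriting it.
    offsets.push_back(dst_reg_offset + offset);
    offset += regs;
  }
  return offsets;
}
```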

> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index fe37676..9fc06cb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -820,7 +820,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
>  if (inst->src[i].file == VGRF) {
> acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
> entry->dst = inst->dst;
> -   entry->dst.reg_offset = offset;
> +   entry->dst.reg_offset += offset;
> entry->src = inst->src[i];
> entry->regs_written = regs_written;
> entry->regs_read = inst->regs_read(i);
> -- 
> 2.5.0
>




Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-10 Thread Francisco Jerez
Francisco Jerez  writes:

> Samuel Iglesias Gonsálvez  writes:
>
>> From: Iago Toral Quiroga 
>>
>> We were not accounting for subreg_offset in the check for the start
>> of the region. This meant that we would allow copy propagation even if
>> the dst wrote to subreg_offset 4 and our source read from
>> subreg_offset 0, which is not correct. This was observed in fp64 code,
>> since there we use subreg_offset to select the high 32 bits of a double.
>>
> I don't think this paragraph is accurate: copy instructions with
> non-zero destination subreg offset are currently considered partial
> writes and should never have been added to the ACP hash table in the
> first place.
>
>> Also, fs_reg::regs_read() already takes the stride into account, so we
>> should not multiply its result by the stride again. This made copy
>> propagation miss cases that would otherwise be safe to
>> copy-propagate. Again, this was observed in fp64 code, since strides
>> greater than 1 are common there.
>>
>> Incidentally, these fixes open up more possibilities for copy propagation,
>> which uncovered new bugs in copy propagation. The following patches address
>> each of these new issues.
>
> Oh man, that sucks...
>
>> ---
>>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp| 21 +
>>  1 file changed, 13 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> index 5fae10f..23df877 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
>> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
>> return true;
>>  }
>>  
>> +static inline bool
>> +region_match(fs_reg src, unsigned regs_read,
>> + fs_reg dst, unsigned regs_written)
>
Forgot to mention that there's no reason to pass the registers by value
here; please use 'const fs_reg &' instead.

> How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I

Oops, in case it wasn't clear from my sentence above, I didn't
intend to suggest using different argument names for this function; I
just mistyped them. regs_written sounds fine to me.

> personally wouldn't mind 'region_match' but
> 'write_region_contains_read_region' sounds a bit too long for my taste).
>
>> +{
>> +   return src.reg_offset >= dst.reg_offset &&
>> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
>> +  src.subreg_offset >= dst.subreg_offset;
>
> This works under the assumption that src.subreg_offset is strictly less
> than the reg_offset unit, which *should* be the case unless we've
> messed up that restriction somewhere (we have in the past :P).  To
> be on the safe side you could do something like the following, if you like:
>
> |   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
> |   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
> |  src.reg_offset + regs_read <= dst.reg_offset + regs_written;
>
> With the above taken into account:
>
> Reviewed-by: Francisco Jerez 
>
>> +}
>> +
>>  bool
>>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>>  {
>> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>> /* Bail if inst is reading a range that isn't contained in the range
>>  * that entry is writing.
>>  */
>> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
>> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
>> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
>> -   (entry->dst.reg_offset + entry->regs_written) * 32)
>> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
>> + entry->dst, entry->regs_written))
>>return false;
>>  
>> /* we can't generally copy-propagate UD negations because we
>> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
>>/* Bail if inst is reading a range that isn't contained in the range
>> * that entry is writing.
>> */
>> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
>> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
>> -   inst->regs_read(i) * inst->src[i].stride * 32) >
>> -  (entry->dst.reg_offset + entry->regs_written) * 32)
>> +  if (!region_match(inst->src[i], inst->regs_read(i),
>> +entry->dst, entry->regs_written))
>>   continue;
>>  
>>fs_reg val = entry->src;
>> -- 
>> 2.5.0
>>


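To make the regioning discussion above concrete, here is a minimal C++ sketch comparing the field-by-field start check from the patch with the byte-based variant suggested in the review. The reg_region struct, REG_SIZE = 32 and both function names are hypothetical stand-ins, not Mesa's fs_reg API. The field-by-field form never accepts a read that starts before the write (if both the register and subregister parts are greater or equal, so is the byte position), but it is more conservative: it rejects a read starting at the beginning of a later register when the write starts partway into an earlier one.

```cpp
#include <cassert>

// Hypothetical stand-ins: REG_SIZE is assumed to be 32 bytes (one GRF),
// matching the "* 32" in the old check, and reg_region keeps only the two
// fs_reg fields the check looks at.
static const unsigned REG_SIZE = 32;

struct reg_region {
   unsigned reg_offset;    // offset in whole registers
   unsigned subreg_offset; // byte offset within the register
};

// Start-of-region test as written in the patch: the register and
// subregister parts are compared independently.
static bool
region_match_fieldwise(const reg_region &src, unsigned regs_read,
                       const reg_region &dst, unsigned regs_written)
{
   return src.reg_offset >= dst.reg_offset &&
          src.reg_offset + regs_read <= dst.reg_offset + regs_written &&
          src.subreg_offset >= dst.subreg_offset;
}

// Variant suggested in review: compare the start of each region in bytes.
static bool
region_match_bytes(const reg_region &src, unsigned regs_read,
                   const reg_region &dst, unsigned regs_written)
{
   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
           dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
          src.reg_offset + regs_read <= dst.reg_offset + regs_written;
}
```

For example, a one-register read at register 1, subregister 0 of a two-register write that begins 16 bytes into register 0 is rejected by region_match_fieldwise() but accepted by region_match_bytes().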

Re: [Mesa-dev] [PATCH 3/3] i965/fs: Recognize and emit ld_lz, sample_lz, sample_c_lz.

2016-05-10 Thread Kenneth Graunke
On Wednesday, May 4, 2016 5:41:37 PM PDT Kenneth Graunke wrote:
> On Wednesday, May 4, 2016 3:54:14 PM PDT Matt Turner wrote:
> > Ken suggested that, instead of a big and complicated optimization pass,
> > we just recognize the operations here. It's certainly less code and a lot
> > prettier, but it seems to actually perform worse for currently unknown
> > reasons.
> 
> One potential pitfall is that is_zero() currently returns false for
> -0.0f.  The fp64 series that's supposedly landing fixes this.
> 

That's fixed now.  With get_nir_src_imm simplified, this series is:

Reviewed-by: Kenneth Graunke 

I'm not sure why the other one does better, but this is better than
nothing, and it's really simple.  We may as well land it...


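The is_zero() pitfall Ken mentions comes down to the fact that negative zero compares equal to 0.0f but does not share its bit pattern (0x80000000 versus 0x00000000). The following sketch is an assumption about how such a check can go wrong, not a copy of Mesa's is_zero(): both helpers are hypothetical, and only the raw-bits version misses -0.0f.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: compares the raw IEEE 754 bit pattern with 0,
// so -0.0f (sign bit set) is reported as non-zero.
static bool
is_zero_bits(float f)
{
   uint32_t bits;
   std::memcpy(&bits, &f, sizeof bits);
   return bits == 0;
}

// Hypothetical helper: an ordinary floating-point comparison, for which
// -0.0f == 0.0f is true.
static bool
is_zero_value(float f)
{
   return f == 0.0f;
}
```

Note that the value comparison is only safe for this purpose when the compiler is not told to ignore signed zeros (for example via fast-math options).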


[Mesa-dev] [PATCH] isl: add support for Android libisl static

2016-05-10 Thread Mauro Rossi
This patch adds Android build support for a static libisl, which is needed
to build i965.

Android.genxml.gen.mk generates the necessary gen%_pack.h headers.

Android.gen.mk generates isl_format_layout.c.
---
 Android.mk   |   1 +
 src/intel/isl/Android.gen.mk |  47 
 src/intel/isl/Android.genxml.gen.mk  |  75 ++
 src/intel/isl/Android.mk | 143 +++
 src/intel/isl/Makefile.sources   |  31 
 src/mesa/drivers/dri/i965/Android.mk |   6 +-
 6 files changed, 301 insertions(+), 2 deletions(-)
 create mode 100644 src/intel/isl/Android.gen.mk
 create mode 100644 src/intel/isl/Android.genxml.gen.mk
 create mode 100644 src/intel/isl/Android.mk
 create mode 100644 src/intel/isl/Makefile.sources

diff --git a/Android.mk b/Android.mk
index bd42bc6..222c6b9 100644
--- a/Android.mk
+++ b/Android.mk
@@ -94,6 +94,7 @@ SUBDIRS := \
src/mesa \
src/util \
src/egl \
+   src/intel/isl \
src/mesa/drivers/dri
 
 INC_DIRS := $(call all-named-subdir-makefiles,$(SUBDIRS))
diff --git a/src/intel/isl/Android.gen.mk b/src/intel/isl/Android.gen.mk
new file mode 100644
index 000..dcfdfaf
--- /dev/null
+++ b/src/intel/isl/Android.gen.mk
@@ -0,0 +1,47 @@
+#
+# Copyright (C) 2016 Linaro, Ltd., Rob Herring 
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+#
+
+ifeq ($(LOCAL_MODULE_CLASS),)
+LOCAL_MODULE_CLASS := STATIC_LIBRARIES
+endif
+
+isl_format_layout_deps := \
+   $(LOCAL_PATH)/isl_format_layout_gen.bash \
+   $(LOCAL_PATH)/isl_format_layout.csv
+
+intermediates := $(call local-generated-sources-dir)
+
+define bash-gen
+   @mkdir -p $(dir $@)
+   @echo "Gen Bash: $(PRIVATE_MODULE) <= $(notdir $(@))"
+   $(hide) $(PRIVATE_SCRIPT) < $(PRIVATE_CSV) > $@
+endef
+
+$(intermediates)/isl_format_layout.c: PRIVATE_SCRIPT := bash -c $(LOCAL_PATH)/isl_format_layout_gen.bash
+$(intermediates)/isl_format_layout.c: PRIVATE_CSV := $(LOCAL_PATH)/isl_format_layout.csv
+$(intermediates)/isl_format_layout.c: $(isl_format_layout_deps)
+   $(call bash-gen)
+
+LOCAL_SRC_FILES := $(filter-out $(libisl_GENERATED_FILES), $(LOCAL_SRC_FILES))
+
+LOCAL_GENERATED_SOURCES += $(addprefix $(intermediates)/, \
+   $(libisl_GENERATED_FILES))
diff --git a/src/intel/isl/Android.genxml.gen.mk b/src/intel/isl/Android.genxml.gen.mk
new file mode 100644
index 000..4b69c76
--- /dev/null
+++ b/src/intel/isl/Android.genxml.gen.mk
@@ -0,0 +1,75 @@
+# Mesa 3-D graphics library
+#
+# Copyright (C) 2010-2011 Chia-I Wu 
+# Copyright (C) 2010-2011 LunarG Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a
+# copy of this software and associated documentation files (the "Software"),
+# to deal in the Software without restriction, including without limitation
+# the rights to use, copy, modify, merge, publish, distribute, sublicense,
+# and/or sell copies of the Software, and to permit persons to whom the
+# Software is furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included
+# in all copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+# DEALINGS IN THE SOFTWARE.
+
+# included by isl Android.mk for source generation
+
+ifeq ($(LOCAL_MODULE_CLASS),)
+LOCAL_MODULE_CLASS := STATIC_LIBRARIES
+endif
+
+intermediates := $(call local-generated-sources-dir)
+
+LOCAL_C_INCLUDES += $(intermediates)
+
+# This is the list of auto-generated 

Re: [Mesa-dev] [PATCH 06/23] i965/fs: fix copy/constant propagation regioning checks

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We were not accounting for subreg_offset in the check for the start
> of the region. This meant that we would allow copy-propagation even if
> the dst wrote to subreg_offset 4 and our source read from
> subreg_offset 0, which is not correct. This was observed in fp64 code,
> since there we use subreg_offset to select the high 32 bits of a double.
>
I don't think this paragraph is accurate: copy instructions with
non-zero destination subreg offset are currently considered partial
writes and should never have been added to the ACP hash table in the
first place.

> Also, fs_reg::regs_read() already takes the stride into account, so we
> should not multiply its result by the stride again. This was making
> copy-propagation fail to copy-propagate cases that would otherwise be
> safe to copy-propagate. Again, this was observed in fp64 code, since
> there we use stride > 1 often.
>
> Incidentally, these fixes open up more possibilities for copy propagation
> which uncovered new bugs in copy-propagation. The following patches address
> each of these new issues.

Oh man, that sucks...

> ---
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 5fae10f..23df877 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -329,6 +329,15 @@ can_take_stride(fs_inst *inst, unsigned arg, unsigned stride,
> return true;
>  }
>  
> +static inline bool
> +region_match(fs_reg src, unsigned regs_read,
> + fs_reg dst, unsigned regs_written)

How about 'region_contained_in(dst, regs_write, src, regs_read)'? (I
personally wouldn't mind 'region_match' but
'write_region_contains_read_region' sounds a bit too long for my taste).

> +{
> +   return src.reg_offset >= dst.reg_offset &&
> +  (src.reg_offset + regs_read) <= (dst.reg_offset + regs_written) &&
> +  src.subreg_offset >= dst.subreg_offset;

This works under the assumption that src.subreg_offset is strictly less
than the reg_offset unit, which *should* be the case unless we've
messed up that restriction somewhere (we have in the past :P).  To
be on the safe side you could do something like the following, if you like:

|   return (src.reg_offset * REG_SIZE + src.subreg_offset >=
|   dst.reg_offset * REG_SIZE + dst.subreg_offset) &&
|  src.reg_offset + regs_read <= dst.reg_offset + regs_written;

With the above taken into account:

Reviewed-by: Francisco Jerez 

> +}
> +
>  bool
>  fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>  {
> @@ -351,10 +360,8 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
> /* Bail if inst is reading a range that isn't contained in the range
>  * that entry is writing.
>  */
> -   if (inst->src[arg].reg_offset < entry->dst.reg_offset ||
> -   (inst->src[arg].reg_offset * 32 + inst->src[arg].subreg_offset +
> -inst->regs_read(arg) * inst->src[arg].stride * 32) >
> -   (entry->dst.reg_offset + entry->regs_written) * 32)
> +   if (!region_match(inst->src[arg], inst->regs_read(arg),
> + entry->dst, entry->regs_written))
>return false;
>  
> /* we can't generally copy-propagate UD negations because we
> @@ -554,10 +561,8 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
>/* Bail if inst is reading a range that isn't contained in the range
> * that entry is writing.
> */
> -  if (inst->src[i].reg_offset < entry->dst.reg_offset ||
> -  (inst->src[i].reg_offset * 32 + inst->src[i].subreg_offset +
> -   inst->regs_read(i) * inst->src[i].stride * 32) >
> -  (entry->dst.reg_offset + entry->regs_written) * 32)
> +  if (!region_match(inst->src[i], inst->regs_read(i),
> +entry->dst, entry->regs_written))
>   continue;
>  
>fs_reg val = entry->src;
> -- 
> 2.5.0
>


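The second paragraph of the commit message (regs_read() already accounts for the stride) can be illustrated with a small sketch. It assumes a simplified, hypothetical model of regs_read() in which a source spans ceil(exec_width * stride * type_size / REG_SIZE) registers; the real fs_inst::regs_read() handles more cases, and all names below are illustrative. Under that model, a SIMD8 source of 32-bit values with stride 2 reads two registers, and the old end-of-region test, which multiplies by the stride a second time, treats it as four and rejects copy propagation even when the ACP entry writes exactly those two registers.

```cpp
#include <cassert>

static const unsigned REG_SIZE = 32; // bytes per GRF (assumed)

// Hypothetical, simplified model of regs_read() for a plain source: the
// number of whole registers spanned by exec_width elements of type_size
// bytes placed stride elements apart.  The stride is already folded in.
static unsigned
model_regs_read(unsigned exec_width, unsigned stride, unsigned type_size)
{
   unsigned bytes = exec_width * stride * type_size;
   return (bytes + REG_SIZE - 1) / REG_SIZE;
}

// End-of-region test from the old code (negated so that true means "ok"):
// regs_read is multiplied by the stride again.
static bool
old_end_check_ok(unsigned src_reg_offset, unsigned src_subreg_offset,
                 unsigned regs_read, unsigned stride,
                 unsigned dst_reg_offset, unsigned regs_written)
{
   return src_reg_offset * REG_SIZE + src_subreg_offset +
             regs_read * stride * REG_SIZE <=
          (dst_reg_offset + regs_written) * REG_SIZE;
}

// End-of-region test from the patch: regs_read is used as-is.
static bool
new_end_check_ok(unsigned src_reg_offset, unsigned regs_read,
                 unsigned dst_reg_offset, unsigned regs_written)
{
   return src_reg_offset + regs_read <= dst_reg_offset + regs_written;
}
```

With stride 1 both tests agree; the difference only shows up for strided sources, which is why it surfaced in fp64 code.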


Re: [Mesa-dev] [PATCH 4/5] compiler/glsl: move list node downcasts after sentinel/counter checks

2016-05-10 Thread Ian Romanick
On 05/07/2016 03:05 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> ---
>  src/compiler/glsl/ast_function.cpp  | 4 ++--
>  src/compiler/glsl/link_uniform_initializers.cpp | 8 +++-
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/src/compiler/glsl/ast_function.cpp b/src/compiler/glsl/ast_function.cpp
> index 37fb3e79..65d3be1 100644
> --- a/src/compiler/glsl/ast_function.cpp
> +++ b/src/compiler/glsl/ast_function.cpp
> @@ -1670,8 +1670,6 @@ process_record_constructor(exec_list *instructions,
>  
> exec_node *node = actual_parameters.head;
> for (unsigned i = 0; i < constructor_type->length; i++) {
> -  ir_rvalue *ir = (ir_rvalue *) node;
> -
>if (node->is_tail_sentinel()) {
>   _mesa_glsl_error(loc, state,
>"insufficient parameters to constructor for `%s'",
> @@ -1679,6 +1677,8 @@ process_record_constructor(exec_list *instructions,
>   return ir_rvalue::error_value(ctx);
>}
>  
> +  ir_rvalue *ir = (ir_rvalue *) node;
> +
> +  if (apply_implicit_conversion(constructor_type->fields.structure[i].type,
>   ir, state)) {
>   node->replace_with(ir);
> diff --git a/src/compiler/glsl/link_uniform_initializers.cpp b/src/compiler/glsl/link_uniform_initializers.cpp
> index c6346d5..eec4e99 100644
> --- a/src/compiler/glsl/link_uniform_initializers.cpp
> +++ b/src/compiler/glsl/link_uniform_initializers.cpp
> @@ -179,17 +179,15 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program *prog,
>  {
> const glsl_type *t_without_array = type->without_array();
> if (type->is_record()) {
> -  ir_constant *field_constant;
> +  exec_node *node = val->components.get_head();
>  
> -  field_constant = (ir_constant *)val->components.get_head();
> -
> -  for (unsigned int i = 0; i < type->length; i++) {
> +  for (unsigned int i = 0; i < type->length; i++, node = node->next) {
> +  ir_constant *field_constant = (ir_constant *)node;
  ^ space here, i.e. '(ir_constant *) node'

Since field_constant should never be changed, I'd declare it as
'ir_constant *const'.

With those changes, this patch is

Reviewed-by: Ian Romanick 

>const glsl_type *field_type = type->fields.structure[i].type;
>const char *field_name = ralloc_asprintf(mem_ctx, "%s.%s", name,
>   type->fields.structure[i].name);
>set_uniform_initializer(mem_ctx, prog, field_name,
>   field_type, field_constant, boolean_true);
> -  field_constant = (ir_constant *)field_constant->next;
>}
>return;
> } else if (t_without_array->is_record() ||
> 

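The point of this patch is that a list node may only be downcast once it is known not to be a sentinel: in C++, casting a base pointer to a derived type when the object is not actually of that type is undefined behaviour even if the result is never dereferenced, and UBSan's downcast checks can flag it. Here is a minimal sketch of the record-constructor loop with the check placed before the cast, also using the const-qualified pointer Ian suggests. param_node and param are simplified, hypothetical stand-ins for exec_node and ir_rvalue, not Mesa's list.h.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for exec_node: the tail sentinel is
// the node whose next pointer is null.
struct param_node {
   param_node *next = nullptr;
   bool is_tail_sentinel() const { return next == nullptr; }
};

// Hypothetical stand-in for ir_rvalue.
struct param : param_node {
   int value;
   explicit param(int v) : value(v) {}
};

// Consume exactly `count` parameters starting at `node`, the way a record
// constructor walks its fields.  The sentinel check happens before the
// downcast, so the sentinel is never treated as a param.  Returns false
// if there are too few parameters.
static bool
sum_params(param_node *node, unsigned count, int *out)
{
   int sum = 0;
   for (unsigned i = 0; i < count; i++, node = node->next) {
      if (node->is_tail_sentinel())
         return false;
      const param *const p = static_cast<const param *>(node);
      sum += p->value;
   }
   *out = sum;
   return true;
}
```

Advancing with node = node->next in the loop header, as the link_uniform_initializers.cpp hunk does, keeps the cast on a node that has already been validated.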


Re: [Mesa-dev] [PATCH 3/5] compiler/list: add and use for_range_list macro

2016-05-10 Thread Ian Romanick
On 05/07/2016 03:05 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> This macro avoids undefined downcasting of list sentinels that crashes gcc's
> ubsan.
> ---
>  src/compiler/glsl/list.h| 8 
>  src/compiler/glsl/opt_tree_grafting.cpp | 5 +
>  2 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
> index 12389aa..f05d437 100644
> --- a/src/compiler/glsl/list.h
> +++ b/src/compiler/glsl/list.h
> @@ -719,6 +719,14 @@ inline void exec_node::insert_before(exec_list *before)
>  /**
>   * Iterate over a range [begin, end) of nodes.
>   */
> +#define for_range_list(__type, __node, __begin, __end) \
> +   for (__type *(__node), **__flag = &(__node); __flag; __flag = NULL) \
> +  for (exec_node *__cur = (__begin),   \
> + *__end_stored = (__end);  \
> +   __cur != __end_stored &&\
> +   (((__node) = (__type *) __cur) || true);\
> +   __cur = __cur->next)
> +
>  #define for_range_list_safe(__type, __node, __begin, __end) \
> for (__type *(__node), **__flag = &(__node); __flag; __flag = NULL) \
>for (struct exec_node *__cur = (__begin),\
> diff --git a/src/compiler/glsl/opt_tree_grafting.cpp b/src/compiler/glsl/opt_tree_grafting.cpp
> index 47fca7d..539ed57 100644
> --- a/src/compiler/glsl/opt_tree_grafting.cpp
> +++ b/src/compiler/glsl/opt_tree_grafting.cpp
> @@ -323,10 +323,7 @@ try_tree_grafting(ir_assignment *start,
>fprintf(stderr, "\n");
> }
>  
> -   for (ir_instruction *ir = (ir_instruction *)start->next;
> - ir != bb_last->next;
> - ir = (ir_instruction *)ir->next) {
> -

Does this also work?

   for (exec_node *node = start->next;
        node != bb_last->next;
        node = node->next) {
      ir_instruction *const ir = (ir_instruction *) node;

> +   for_range_list(ir_instruction, ir, start->next, bb_last->next) {
>if (debug) {
>fprintf(stderr, "- ");
>ir->fprint(stderr);
> 

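Ian's alternative keeps the loop variable as a plain exec_node * and casts only inside the body, so the end marker (bb_last->next, which can be the tail sentinel) is compared but never cast. A self-contained sketch of that pattern follows; range_node, instr and sum_ids() are hypothetical stand-ins for exec_node, ir_instruction and a pass body.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for exec_node.
struct range_node {
   range_node *next = nullptr;
   range_node *prev = nullptr;
};

// Hypothetical stand-in for ir_instruction.
struct instr : range_node {
   int id;
   explicit instr(int i) : id(i) {}
};

static void
link_nodes(range_node *a, range_node *b)
{
   a->next = b;
   b->prev = a;
}

// Iterate over [begin, end) using base-class pointers.  `end` may be the
// tail sentinel, which is not an instr, so it is only compared, never cast.
static int
sum_ids(range_node *begin, range_node *end)
{
   int sum = 0;
   for (range_node *n = begin; n != end; n = n->next) {
      instr *const ir = static_cast<instr *>(n);
      sum += ir->id;
   }
   return sum;
}
```

Calling sum_ids(a.next, c.next) on a list head, a, b, c, tail mirrors the start->next / bb_last->next range from opt_tree_grafting.cpp.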


Re: [Mesa-dev] [PATCH] Added pbuffer hooks for surfaceless platform

2016-05-10 Thread Gurchetan Singh
Hi Chad,

Thanks for the review and good suggestions.  I am interested to know your
opinion regarding an issue that comes up when implementing a back-buffered
pbuffer.  The problem is that we'll be creating a front renderbuffer, not a
back renderbuffer, if we pass in single-buffered visuals.

For example, in intelCreateBuffer in the i965 driver, the back buffer is
only created when double-buffered visuals are passed in:

>>_mesa_add_renderbuffer(fb, BUFFER_FRONT_LEFT, &rb->Base.Base);

>>if (mesaVis->doubleBufferMode) {
>>  rb = intel_create_renderbuffer(rgbFormat, num_samples);
>>  _mesa_add_renderbuffer(fb, BUFFER_BACK_LEFT, &rb->Base.Base);
>>}

Due to this, with a single-buffered configuration, we'll have to do the
following in surfaceless_image_get_buffers:

>>buffers->image_mask |= __DRI_IMAGE_BUFFER_FRONT;

not this:

>>buffers->image_mask |= __DRI_IMAGE_BUFFER_BACK;

If this is a problem, we'll have to pass in a parameter in the visual
indicating we're trying to create an EGL pbuffer (e.g.,
mesaVis->is_egl_pbuffer).  Then we'll set the proper renderbuffer and
framebuffer options from that config in several places.  Some of these
places include:

http://pastebin.com/28ddebWF

(I just changed the single-buffer defaults in that code snippet so I could
test on my surfaceless platform; a proper implementation would have to
query the updated visuals.)

Is the fact that we'll work with the default front renderbuffer a problem?
If so, what's the best way to fix the issue?

On Fri, May 6, 2016 at 9:27 PM, Chad Versace  wrote:

> On 05/06/2016 03:39 PM, Stéphane Marchesin wrote:
> > On Fri, May 6, 2016 at 3:32 PM, Gurchetan Singh
> >  wrote:
> >> This change enables the creation of pbuffer
> >> surfaces on the surfaceless platform.
> >>
> >> V2: Use double-buffered pbuffer configuration
> >
> > Reviewed-by: Stéphane Marchesin 
> >
> > Chad, do you also want to take a look at it?
>
> On a philosophical note, it's ironic that we're now creating a *surface*
> in the *surfaceless* platform. Don't you agree? Anyway, let's ignore the
> irony and focus on practical matters.
>
> There are a few bugs and minor style issues. See the comments below.
>
> There is also one major issue that needs discussion. I believe pbuffers
> are single-buffered, and that the single buffer is the back buffer.
>
> As precedent, platform_x11.c implements pbuffers as single-buffered.
>
> The relevant language in the EGL 1.5 spec is phrased badly, and could be
> interpreted either way: (a) pbuffers are double-buffered, or (b) pbuffers
> have only a back buffer.  If I recall correctly, an internal Khronos
> conversation in 2014 arrived at conclusion (b).  Here are the relevant
> quotes from the spec:
>
> - Some platforms may not allow rendering directly to the front buffer of
>   a window surface. When such windows are made current to a context, the
>   context will always have an EGL_RENDER_BUFFER attribute value of
>   EGL_BACK_BUFFER. From the client API point of view these surfaces have
>   only a back buffer and no front buffer, similar to pbuffer rendering (see
>   section 2.2.2).
>
> - Querying EGL_RENDER_BUFFER [with eglQueryContext()] returns the buffer
>   which client API rendering is requested to use. [...] For a pbuffer
>   surface, it is always EGL_BACK_BUFFER.
>
> - [When eglSwapBuffers() is called,] If surface is a back-buffered window
>   surface, then the color buffer is copied to the native window associated
>   with that surface. [Otherwise, if] surface is a single-buffered window,
>   pixmap, or pbuffer surface, eglSwapBuffers has no effect.
>
> The single-buffered nature has an impact on the implementation of
> eglSwapBuffers, according to the last bullet: it's a no-op. As precedent,
> platform_x11.c correctly does nothing when swapping a pbuffer.
>
> If you think my interpretation of the spec is wrong, and that
> platform_x11.c is incorrect, then I'm open to discussing it. I'm
> especially interested to learn whether any non-Mesa EGL implementations
> treat pbuffers as double-buffered.
>
> (Also, this patch should probably set EGL_MIN/MAX_SWAP_INTERVAL to 0/0
> for pbuffer configs. But let's overlook that for now, as I don't think
> it's important for the initial implementation. Depending on how Google
> uses this patch, perhaps the swap interval bounds are never relevant.)
>
> >> ---
> >>  src/egl/drivers/dri2/egl_dri2.h |   8 +-
> >>  src/egl/drivers/dri2/platform_surfaceless.c | 219 +++-
> >>  2 files changed, 222 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/src/egl/drivers/dri2/egl_dri2.h b/src/egl/drivers/dri2/egl_dri2.h
> >> index ddb5f39..ecf9c76 100644
> >> --- a/src/egl/drivers/dri2/egl_dri2.h
> >> +++ b/src/egl/drivers/dri2/egl_dri2.h
> >> @@ -291,8 +291,14 @@ struct dri2_egl_surface
> >> /* EGL-owned buffers */
> >> 

Re: [Mesa-dev] [PATCH 2/5] compiler/list: add and use for_range_list_safe

2016-05-10 Thread Ian Romanick
As time goes on, I become less and less a fan of the proliferation of
for_* macros... especially ones that are only used in one or two places.

On 05/07/2016 03:05 PM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> This macro avoids undefined behaviour that crashes gcc's ubsan.
> ---
>  src/compiler/glsl/list.h  | 13 +
>  src/compiler/glsl/opt_dead_code_local.cpp |  7 +--
>  src/compiler/glsl/opt_tree_grafting.cpp   |  5 +
>  3 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
> index 8da9514..12389aa 100644
> --- a/src/compiler/glsl/list.h
> +++ b/src/compiler/glsl/list.h
> @@ -716,4 +716,17 @@ inline void exec_node::insert_before(exec_list *before)
> (((__node) = exec_node_data(__type, __cur, __field)) || true);  \
> __cur = __prev, __prev = __prev->prev)
>  
> +/**
> + * Iterate over a range [begin, end) of nodes.
> + */
> +#define for_range_list_safe(__type, __node, __begin, __end) \
> +   for (__type *(__node), **__flag = &(__node); __flag; __flag = NULL) \
> +  for (struct exec_node *__cur = (__begin),\
> +*__next = __cur->next, \
> +*__end_stored = (__end);   \
> +   __cur != __end_stored &&\
> +   (((__node) = (__type *) __cur) || true);\
> +   __cur = __next, __next = __next->next)
> +
> +
>  #endif /* LIST_CONTAINER_H */
> diff --git a/src/compiler/glsl/opt_dead_code_local.cpp b/src/compiler/glsl/opt_dead_code_local.cpp
> index d38fd2b..5dd3bfd 100644
> --- a/src/compiler/glsl/opt_dead_code_local.cpp
> +++ b/src/compiler/glsl/opt_dead_code_local.cpp
> @@ -291,7 +291,6 @@ dead_code_local_basic_block(ir_instruction *first,
>ir_instruction *last,
>void *data)
>  {
> -   ir_instruction *ir, *ir_next;
> /* List of avaialble_copy */
> exec_list assignments;
> bool *out_progress = (bool *)data;
> @@ -299,8 +298,7 @@ dead_code_local_basic_block(ir_instruction *first,
>  
> void *ctx = ralloc_context(NULL);
> /* Safe looping, since process_assignment */
> -   for (ir = first, ir_next = (ir_instruction *)first->next;;
> - ir = ir_next, ir_next = (ir_instruction *)ir->next) {
> +   for_range_list_safe(ir_instruction, ir, first, last->next) {

In this case it seems like changing all the types to exec_node* and adding

 ir_instruction *ir = (ir_instruction *) node;

right here would be sufficient.

Unless you're able to hit a case where first or last isn't really an
ir_instruction.  If that's the case, there's a much bigger bug.

>ir_assignment *ir_assign = ir->as_assignment();
>  
>if (debug) {
> @@ -314,9 +312,6 @@ dead_code_local_basic_block(ir_instruction *first,
>kill_for_derefs_visitor kill();
>ir->accept();
>}
> -
> -  if (ir == last)
> -  break;
> }
> *out_progress = progress;
> ralloc_free(ctx);
> diff --git a/src/compiler/glsl/opt_tree_grafting.cpp b/src/compiler/glsl/opt_tree_grafting.cpp
> index a40e5f7..47fca7d 100644
> --- a/src/compiler/glsl/opt_tree_grafting.cpp
> +++ b/src/compiler/glsl/opt_tree_grafting.cpp
> @@ -347,11 +347,8 @@ tree_grafting_basic_block(ir_instruction *bb_first,
> void *data)
>  {
> struct tree_grafting_info *info = (struct tree_grafting_info *)data;
> -   ir_instruction *ir, *next;
>  
> -   for (ir = bb_first, next = (ir_instruction *)ir->next;
> - ir != bb_last->next;
> - ir = next, next = (ir_instruction *)ir->next) {

I think this could be fixed by changing next to be exec_node*.

> +   for_range_list_safe(ir_instruction, ir, bb_first, bb_last->next) {
>ir_assignment *assign = ir->as_assignment();
>  
>if (!assign)
> 

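For the _safe variant, the essential property is that the next pointer is read before the loop body runs, because the body may unlink the current node; Ian's suggestion of making next an exec_node * achieves that without a new macro. A minimal sketch of that approach follows. safe_node, value_instr and remove_dead() are hypothetical stand-ins loosely modelled on exec_node, ir_instruction and a dead-code pass; remove() clears the node's links, as this sketch assumes exec_node::remove() does.

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for exec_node.
struct safe_node {
   safe_node *next = nullptr;
   safe_node *prev = nullptr;

   // Unlink this node and clear its own links.
   void remove() {
      prev->next = next;
      next->prev = prev;
      next = prev = nullptr;
   }
};

// Hypothetical stand-in for an instruction that may turn out to be dead.
struct value_instr : safe_node {
   bool dead;
   explicit value_instr(bool d) : dead(d) {}
};

static void
link_pair(safe_node *a, safe_node *b)
{
   a->next = b;
   b->prev = a;
}

// Remove every dead instruction in [begin, end).  `end` is a parameter, so
// it is evaluated once, before any node is unlinked; `next` is saved before
// the body runs because remove() clears cur->next.
static unsigned
remove_dead(safe_node *begin, safe_node *end)
{
   unsigned removed = 0;
   for (safe_node *cur = begin; cur != end;) {
      safe_node *const next = cur->next;
      value_instr *const ir = static_cast<value_instr *>(cur);
      if (ir->dead) {
         ir->remove();
         removed++;
      }
      cur = next;
   }
   return removed;
}
```

As with any "safe" iteration, this only tolerates removing the current node; unlinking the saved next node inside the body would still break the loop.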


[Mesa-dev] [PATCH 28/28] i965/blorp: Delete the old blorp shader emit code

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/Makefile.sources  |2 -
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp| 1288 +--
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp |  145 ---
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h   |  212 
 src/mesa/drivers/dri/i965/brw_defines.h |1 -
 src/mesa/drivers/dri/i965/brw_fs.h  |1 -
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp  |   21 -
 src/mesa/drivers/dri/i965/brw_shader.cpp|2 -
 8 files changed, 8 insertions(+), 1664 deletions(-)
 delete mode 100644 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp
 delete mode 100644 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources
index fa5cf5a..e94605e 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -99,8 +99,6 @@ i965_compiler_GENERATED_FILES = \
 i965_FILES = \
brw_binding_tables.c \
brw_blorp_blit.cpp \
-   brw_blorp_blit_eu.cpp \
-   brw_blorp_blit_eu.h \
brw_blorp_clear.cpp \
brw_blorp.c \
brw_blorp.h \
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index d1c39b0..97e3908 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -31,7 +31,6 @@
 
 #include "brw_blorp.h"
 #include "brw_context.h"
-#include "brw_blorp_blit_eu.h"
 #include "brw_state.h"
 #include "brw_meta_util.h"
 
@@ -1403,1236 +1402,6 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
return b.shader;
 }
 
-class brw_blorp_blit_program : public brw_blorp_eu_emitter
-{
-public:
-   brw_blorp_blit_program(struct brw_context *brw,
-  const brw_blorp_blit_prog_key *key);
-
-   const GLuint *compile(struct brw_context *brw, bool debug_flag,
- GLuint *program_size);
-
-   brw_blorp_prog_data prog_data;
-
-private:
-   void alloc_regs();
-   void alloc_push_const_regs(int base_reg);
-   void compute_frag_coords();
-   void translate_tiling(bool old_tiled_w, bool new_tiled_w);
-   void encode_msaa(unsigned num_samples, intel_msaa_layout layout);
-   void decode_msaa(unsigned num_samples, intel_msaa_layout layout);
-   void translate_dst_to_src();
-   void clamp_tex_coords(struct brw_reg regX, struct brw_reg regY,
- struct brw_reg clampX0, struct brw_reg clampY0,
- struct brw_reg clampX1, struct brw_reg clampY1);
-   void single_to_blend();
-   void manual_blend_average(unsigned num_samples);
-   void manual_blend_bilinear(unsigned num_samples);
-   void sample(struct brw_reg dst);
-   void texel_fetch(struct brw_reg dst);
-   void mcs_fetch();
-   void texture_lookup(struct brw_reg dst, enum opcode op,
-   const sampler_message_arg *args, int num_args);
-   void render_target_write();
-
-   /**
-* Base-2 logarithm of the maximum number of samples that can be blended.
-*/
-   static const unsigned LOG2_MAX_BLEND_SAMPLES = 3;
-
-   struct brw_context *brw;
-   const brw_blorp_blit_prog_key *key;
-
-   /* Thread dispatch header */
-   struct brw_reg R0;
-
-   /* Pixel X/Y coordinates (always in R1). */
-   struct brw_reg R1;
-
-   /* Push constants */
-   struct brw_reg dst_x0;
-   struct brw_reg dst_x1;
-   struct brw_reg dst_y0;
-   struct brw_reg dst_y1;
-   /* Top right coordinates of the rectangular grid used for scaled blitting */
-   struct brw_reg rect_grid_x1;
-   struct brw_reg rect_grid_y1;
-   struct {
-  struct brw_reg multiplier;
-  struct brw_reg offset;
-   } x_transform, y_transform;
-   struct brw_reg src_z;
-
-   /* Data read from texture (4 vec16's per array element) */
-   struct brw_reg texture_data[LOG2_MAX_BLEND_SAMPLES + 1];
-
-   /* Auxiliary storage for the contents of the MCS surface.
-*
-* Since the sampler always returns 8 registers worth of data, this is 8
-* registers wide, even though we only use the first 2 registers of it.
-*/
-   struct brw_reg mcs_data;
-
-   /* X coordinates.  We have two of them so that we can perform coordinate
-* transformations easily.
-*/
-   struct brw_reg x_coords[2];
-
-   /* Y coordinates.  We have two of them so that we can perform coordinate
-* transformations easily.
-*/
-   struct brw_reg y_coords[2];
-
-   /* X, Y coordinates of the pixel from which we need to fetch the specific
-*  sample. These are used for multisample scaled blitting.
-*/
-   struct brw_reg x_sample_coords;
-   struct brw_reg y_sample_coords;
-
-   /* Fractional parts of the x and y coordinates, used as bilinear interpolation coefficients */
-   struct brw_reg x_frac;
-   struct brw_reg y_frac;
-
-   /* Which element of x_coords and y_coords is currently in use.
-*/
-   int xy_coord_index;
-
-   /* True if, at the point in the program currently being compiled, the
- 

[Mesa-dev] [PATCH 25/28] i965/blorp: Add support for averaging resolves to the NIR path

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 162 ---
 1 file changed, 144 insertions(+), 18 deletions(-)
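
The comment in the hunk below relies on the fact that, after pushing sample i, the number of "add top two entries" steps equals the number of trailing 1 bits in i. A scalar C++ sketch of that scheme follows; pairwise_average() is hypothetical and works on plain floats rather than NIR, and the popcount loop stands in for _mesa_bitcount(). The patch's __builtin_ctz(~value) path would be undefined only for an all-ones value (since __builtin_ctz(0) is undefined), which a sample index never is.

```cpp
#include <cassert>

// Mirrors the patch's fallback: value & ~(value + 1) isolates the run of
// trailing 1 bits, which is then counted.
static int
count_trailing_one_bits(unsigned value)
{
   unsigned run = value & ~(value + 1);
   int count = 0;
   while (run) {
      count += run & 1;
      run >>= 1;
   }
   return count;
}

// Hypothetical scalar version of the stack-based averaging: push each
// sample, then perform as many "add top two" steps as sample i has trailing
// 1 bits.  num_samples must be a power of two no larger than 16.
static float
pairwise_average(const float *samples, unsigned num_samples)
{
   assert(num_samples > 0 && num_samples <= 16 &&
          (num_samples & (num_samples - 1)) == 0);
   float stack[5]; // log2(16) + 1 entries
   int top = 0;
   for (unsigned i = 0; i < num_samples; i++) {
      stack[top++] = samples[i];
      int adds = count_trailing_one_bits(i);
      for (int j = 0; j < adds; j++) {
         top--;
         stack[top - 1] += stack[top];
      }
   }
   assert(top == 1);
   return stack[0] / num_samples;
}
```

Averaging four copies of 0.1f returns exactly 0.1f, since every addition adds two equal values and the final division is by a power of two.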

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 7b01da8..83cdac5 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -840,6 +840,133 @@ blorp_nir_decode_msaa(nir_builder *b, nir_ssa_def *pos,
 }
 
 /**
+ * Count the number of trailing 1 bits in the given value.  For example:
+ *
+ * count_trailing_one_bits(0) == 0
+ * count_trailing_one_bits(7) == 3
+ * count_trailing_one_bits(11) == 2
+ */
+static inline int count_trailing_one_bits(unsigned value)
+{
+#ifdef HAVE___BUILTIN_CTZ
+   return __builtin_ctz(~value);
+#else
+   return _mesa_bitcount(value & ~(value + 1));
+#endif
+}
+
+static nir_ssa_def *
+blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
+   unsigned tex_samples,
+   enum intel_msaa_layout tex_layout,
+   enum brw_reg_type dst_type)
+{
+   /* If non-null, this is the outer-most if statement */
+   nir_if *outer_if = NULL;
+
+   nir_variable *color =
+  nir_local_variable_create(b->impl, glsl_vec4_type(), "color");
+
+   nir_ssa_def *mcs = NULL;
+   if (tex_layout == INTEL_MSAA_LAYOUT_CMS)
+  mcs = blorp_nir_txf_ms_mcs(b, pos);
+
+   /* We add together samples using a binary tree structure, e.g. for 4x MSAA:
+*
+*   result = ((sample[0] + sample[1]) + (sample[2] + sample[3])) / 4
+*
+* This ensures that when all samples have the same value, no numerical
+* precision is lost, since each addition operation always adds two equal
+* values, and summing two equal floating point values does not lose
+* precision.
+*
+* We perform this computation by treating the texture_data array as a
+* stack and performing the following operations:
+*
+* - push sample 0 onto stack
+* - push sample 1 onto stack
+* - add top two stack entries
+* - push sample 2 onto stack
+* - push sample 3 onto stack
+* - add top two stack entries
+* - add top two stack entries
+* - divide top stack entry by 4
+*
+* Note that after pushing sample i onto the stack, the number of add
+* operations we do is equal to the number of trailing 1 bits in i.  This
+* works provided the total number of samples is a power of two, which it
+* always is for i965.
+*
+* For integer formats, we replace the add operations with average
+* operations and skip the final division.
+*/
+   nir_ssa_def *texture_data[4];
+   unsigned stack_depth = 0;
+   for (unsigned i = 0; i < tex_samples; ++i) {
+  assert(stack_depth == _mesa_bitcount(i)); /* Loop invariant */
+
+  /* Push sample i onto the stack */
+  assert(stack_depth < ARRAY_SIZE(texture_data));
+
+  nir_ssa_def *ms_pos = nir_vec3(b, nir_channel(b, pos, 0),
+nir_channel(b, pos, 1),
+nir_imm_int(b, i));
+  texture_data[stack_depth++] = blorp_nir_txf_ms(b, ms_pos, mcs, dst_type);
+
+  if (i == 0 && tex_layout == INTEL_MSAA_LAYOUT_CMS) {
+ /* The Ivy Bridge PRM, Vol4 Part1 p27 (Multisample Control Surface)
+  * suggests an optimization:
+  *
+  * "A simple optimization with probable large return in
+  * performance is to compare the MCS value to zero (indicating
+  * all samples are on sample slice 0), and sample only from
+  * sample slice 0 using ld2dss if MCS is zero."
+  *
+  * Note that in the case where the MCS value is zero, sampling from
+  * sample slice 0 using ld2dss and sampling from sample 0 using
+  * ld2dms are equivalent (since all samples are on sample slice 0).
+  * Since we have already sampled from sample 0, all we need to do is
+  * skip the remaining fetches and averaging if MCS is zero.
+  */
+ nir_ssa_def *mcs_zero =
+nir_ieq(b, nir_channel(b, mcs, 0), nir_imm_int(b, 0));
+ nir_if *if_stmt = nir_if_create(b->shader);
+ if_stmt->condition = nir_src_for_ssa(mcs_zero);
+ nir_cf_node_insert(b->cursor, &if_stmt->cf_node);
+
+ b->cursor = nir_after_cf_list(&if_stmt->then_list);
+ nir_store_var(b, color, texture_data[0], 0xf);
+
+ b->cursor = nir_after_cf_list(&if_stmt->else_list);
+ outer_if = if_stmt;
+  }
+
+  for (int j = 0; j < count_trailing_one_bits(i); j++) {
+ assert(stack_depth >= 2);
+ --stack_depth;
+
+ assert(dst_type == BRW_REGISTER_TYPE_F);
+ texture_data[stack_depth - 1] =
+nir_fadd(b, texture_data[stack_depth - 1],
+texture_data[stack_depth]);
+  }
+   }
+
+   /* We should have just 1 sample on the stack now. */
+   
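Both claims in the comment above (the trailing-ones identity and the exactness of pairwise sums of equal values) can be checked on the CPU. The sketch below is a plain-C model, not the NIR code: bitcount() is a hypothetical stand-in for _mesa_bitcount(), and blend_average() is a hypothetical helper that replays the push/add sequence on floats. It assumes a power-of-two sample count of at most 8, which keeps at most four partial sums live and matches texture_data[4].

```c
#include <assert.h>

/* Portable popcount, standing in for Mesa's _mesa_bitcount(). */
static int bitcount(unsigned v)
{
   int n = 0;
   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      n++;
   }
   return n;
}

/* The fallback branch of count_trailing_one_bits(): value & ~(value + 1)
 * keeps exactly the run of trailing 1 bits. */
static int count_trailing_one_bits(unsigned value)
{
   return bitcount(value & ~(value + 1));
}

/* CPU model of the push/add stack described in the patch comment.
 * num_samples must be a power of two no larger than 8. */
static float blend_average(const float *samples, unsigned num_samples)
{
   float stack[4];
   unsigned depth = 0;

   assert(num_samples >= 1 && num_samples <= 8);
   assert((num_samples & (num_samples - 1)) == 0);

   for (unsigned i = 0; i < num_samples; i++) {
      assert(depth == (unsigned)bitcount(i)); /* loop invariant */
      stack[depth++] = samples[i];            /* push sample i */

      /* One pop-and-add per trailing 1 bit of i. */
      for (int j = 0; j < count_trailing_one_bits(i); j++) {
         depth--;
         stack[depth - 1] += stack[depth];
      }
   }

   assert(depth == 1);
   return stack[0] / (float)num_samples;
}
```

With eight identical inputs, every addition doubles an exact value and the final divide is by a power of two, so blend_average() returns the input unchanged; that is the precision argument made in the comment.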

[Mesa-dev] [PATCH 20/28] i965/blorp: Refactor getting the blit kernel into a helper

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 35 +---
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 7067c06..ea64b11 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1737,6 +1737,27 @@ brw_blorp_blit_program::texture_lookup(struct brw_reg dst,
 #undef S
 #undef SWAP_XY_AND_XPYP
 
+static void
+brw_blorp_get_blit_kernel(struct brw_context *brw,
+  struct brw_blorp_params *params,
+  const struct brw_blorp_blit_prog_key *prog_key)
+{
+   if (brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
+prog_key, sizeof(*prog_key),
+&params->wm_prog_kernel, &params->wm_prog_data))
+  return;
+
+   brw_blorp_blit_program prog(brw, prog_key);
+   GLuint program_size;
+   const GLuint *program = prog.compile(brw, INTEL_DEBUG & DEBUG_BLORP,
+&program_size);
+   brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
+prog_key, sizeof(*prog_key),
+program, program_size,
+&prog.prog_data, sizeof(prog.prog_data),
+&params->wm_prog_kernel, &params->wm_prog_data);
+}
+
 void
 brw_blorp_blit_program::render_target_write()
 {
@@ -2185,19 +2206,7 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
   params.src.y_offset /= 2;
}
 
-   if (!brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
- &wm_prog_key, sizeof(wm_prog_key),
- &params.wm_prog_kernel, &params.wm_prog_data)) {
-  brw_blorp_blit_program prog(brw, &wm_prog_key);
-  GLuint program_size;
-  const GLuint *program = prog.compile(brw, INTEL_DEBUG & DEBUG_BLORP,
-   &program_size);
-  brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
-   &wm_prog_key, sizeof(wm_prog_key),
-   program, program_size,
-   &prog.prog_data, sizeof(prog.prog_data),
-   &params.wm_prog_kernel, &params.wm_prog_data);
-   }
+   brw_blorp_get_blit_kernel(brw, &params, &wm_prog_key);
 
params.src.swizzle = src_swizzle;
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 21/28] i965/blorp: Add initial support for NIR-based blit shaders

2016-05-10 Thread Jason Ekstrand
Many of the more complex cases still fall back to the old shader builder.
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 425 +--
 1 file changed, 401 insertions(+), 24 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index ea64b11..f94dd6f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -25,6 +25,8 @@
 #include "main/teximage.h"
 #include "main/fbobject.h"
 
+#include "compiler/nir/nir_builder.h"
+
 #include "intel_fbo.h"
 
 #include "brw_blorp.h"
@@ -332,6 +334,226 @@ enum sampler_message_arg
SAMPLER_MESSAGE_ARG_ZERO_INT,
 };
 
+struct brw_blorp_blit_vars {
+   /* Uniform values from brw_blorp_wm_push_constants */
+   nir_variable *u_dst_x0;
+   nir_variable *u_dst_x1;
+   nir_variable *u_dst_y0;
+   nir_variable *u_dst_y1;
+   nir_variable *u_rect_grid_x1;
+   nir_variable *u_rect_grid_y1;
+   struct {
+  nir_variable *multiplier;
+  nir_variable *offset;
+   } u_x_transform, u_y_transform;
+   nir_variable *u_src_z;
+
+   /* gl_FragCoord */
+   nir_variable *frag_coord;
+
+   /* gl_FragColor */
+   nir_variable *color_out;
+};
+
+static void
+brw_blorp_blit_vars_init(nir_builder *b, struct brw_blorp_blit_vars *v,
+ const struct brw_blorp_blit_prog_key *key)
+{
+#define LOAD_UNIFORM(name, type)\
+   v->u_##name = nir_variable_create(b->shader, nir_var_uniform, type, #name); \
+   v->u_##name->data.location = \
+  offsetof(struct brw_blorp_wm_push_constants, name);
+
+   LOAD_UNIFORM(dst_x0, glsl_uint_type())
+   LOAD_UNIFORM(dst_x1, glsl_uint_type())
+   LOAD_UNIFORM(dst_y0, glsl_uint_type())
+   LOAD_UNIFORM(dst_y1, glsl_uint_type())
+   LOAD_UNIFORM(rect_grid_x1, glsl_float_type())
+   LOAD_UNIFORM(rect_grid_y1, glsl_float_type())
+   LOAD_UNIFORM(x_transform.multiplier, glsl_float_type())
+   LOAD_UNIFORM(x_transform.offset, glsl_float_type())
+   LOAD_UNIFORM(y_transform.multiplier, glsl_float_type())
+   LOAD_UNIFORM(y_transform.offset, glsl_float_type())
+   LOAD_UNIFORM(src_z, glsl_uint_type())
+
+#undef LOAD_UNIFORM
+
+   v->frag_coord = nir_variable_create(b->shader, nir_var_shader_in,
+   glsl_vec4_type(), "gl_FragCoord");
+   v->frag_coord->data.location = VARYING_SLOT_POS;
+   v->frag_coord->data.origin_upper_left = true;
+
+   v->color_out = nir_variable_create(b->shader, nir_var_shader_out,
+  glsl_vec4_type(), "gl_FragColor");
+   v->color_out->data.location = FRAG_RESULT_COLOR;
+}
+
+nir_ssa_def *
+blorp_blit_get_frag_coords(nir_builder *b,
+   const struct brw_blorp_blit_prog_key *key,
+   struct brw_blorp_blit_vars *v)
+{
+   nir_ssa_def *coord = nir_f2i(b, nir_load_var(b, v->frag_coord));
+
+   if (key->persample_msaa_dispatch) {
+  return nir_vec3(b, nir_channel(b, coord, 0), nir_channel(b, coord, 1),
+ nir_load_system_value(b, nir_intrinsic_load_sample_id, 0));
+   } else {
+  return nir_vec2(b, nir_channel(b, coord, 0), nir_channel(b, coord, 1));
+   }
+}
+
+nir_ssa_def *
+blorp_blit_apply_transform(nir_builder *b, nir_ssa_def *src_pos,
+   struct brw_blorp_blit_vars *v)
+{
+   nir_ssa_def *offset = nir_vec2(b, nir_load_var(b, v->u_x_transform.offset),
+ nir_load_var(b, v->u_y_transform.offset));
+   nir_ssa_def *mul = nir_vec2(b, nir_load_var(b, v->u_x_transform.multiplier),
+  nir_load_var(b, v->u_y_transform.multiplier));
+
+   nir_ssa_def *pos = nir_ffma(b, src_pos, mul, offset);
+
+   if (src_pos->num_components == 3) {
+  /* Leave the sample id alone */
+  pos = nir_vec3(b, nir_channel(b, pos, 0), nir_channel(b, pos, 1),
+nir_channel(b, src_pos, 2));
+   }
+
+   return pos;
+}
+
+static nir_tex_instr *
+blorp_create_nir_tex_instr(nir_shader *shader, nir_texop op,
+   nir_ssa_def *pos, unsigned num_srcs,
+   enum brw_reg_type dst_type)
+{
+   nir_tex_instr *tex = nir_tex_instr_create(shader, num_srcs);
+
+   tex->op = op;
+
+   switch (dst_type) {
+   case BRW_REGISTER_TYPE_F:
+  tex->dest_type = nir_type_float;
+  break;
+   case BRW_REGISTER_TYPE_D:
+  tex->dest_type = nir_type_int;
+  break;
+   case BRW_REGISTER_TYPE_UD:
+  tex->dest_type = nir_type_uint;
+  break;
+   default:
+  unreachable("Invalid texture return type");
+   }
+
+   tex->is_array = false;
+   tex->is_shadow = false;
+
+   /* Blorp only has one texture and it's bound at unit 0 */
+   tex->texture = NULL;
+   tex->sampler = NULL;
+   tex->texture_index = 0;
+   tex->sampler_index = 0;
+
+   nir_ssa_dest_init(&tex->instr, &tex->dest, 4, 32, NULL);
+
+   return tex;
+}
+
+static nir_ssa_def *
+blorp_nir_tex(nir_builder *b, nir_ssa_def *pos, enum brw_reg_type dst_type)
+{
+   
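LOAD_UNIFORM relies on two facts: `#name` stringifies the macro argument (dots included), and offsetof() accepts a nested member path such as `x_transform.multiplier` (GCC and Clang both support this). A standalone sketch of the same trick, using a hypothetical push-constant struct and a hypothetical DESCRIBE_UNIFORM macro; the real brw_blorp_wm_push_constants layout is not shown in this patch, so the field order here is an assumption. The trailing `#undef` names the macro that was actually defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real brw_blorp_wm_push_constants may differ. */
struct push_constants {
   unsigned dst_x0, dst_x1, dst_y0, dst_y1;
   float rect_grid_x1, rect_grid_y1;
   struct {
      float multiplier;
      float offset;
   } x_transform, y_transform;
   unsigned src_z;
};

struct uniform_desc {
   const char *name; /* produced by #name */
   size_t location;  /* byte offset, like data.location in the patch */
};

/* Same trick as LOAD_UNIFORM: one argument yields both the string and the
 * offsetof() member designator, even for a dotted path. */
#define DESCRIBE_UNIFORM(name) \
   { #name, offsetof(struct push_constants, name) }

static const struct uniform_desc uniforms[] = {
   DESCRIBE_UNIFORM(dst_x0),
   DESCRIBE_UNIFORM(x_transform.multiplier),
   DESCRIBE_UNIFORM(y_transform.offset),
   DESCRIBE_UNIFORM(src_z),
};

/* Undefine the macro that was actually defined above. */
#undef DESCRIBE_UNIFORM
```

Because the location is derived from offsetof(), reordering fields in the struct cannot silently desynchronize the uniform locations from the push-constant layout.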

[Mesa-dev] [PATCH 18/28] i965/blorp: Create the program key in get_clear_kernel

2016-05-10 Thread Jason Ekstrand
There's no reason to be passing a whole struct around just for a single
boolean.  We can create it later when we actually need to use it as a key.
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 32 +--
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index c298889..94b8277 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -99,16 +99,20 @@ brw_blorp_const_color_program::~brw_blorp_const_color_program()
 static void
 brw_blorp_params_get_clear_kernel(struct brw_context *brw,
   struct brw_blorp_params *params,
-  brw_blorp_const_color_prog_key *wm_prog_key)
+  bool use_replicated_data)
 {
+   struct brw_blorp_const_color_prog_key blorp_key;
+   memset(&blorp_key, 0, sizeof(blorp_key));
+   blorp_key.use_simd16_replicated_data = use_replicated_data;
+
 if (!brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
- wm_prog_key, sizeof(*wm_prog_key),
+ &blorp_key, sizeof(blorp_key),
  &params->wm_prog_kernel, &params->wm_prog_data)) {
-  brw_blorp_const_color_program prog(brw, wm_prog_key);
+  brw_blorp_const_color_program prog(brw, &blorp_key);
   GLuint program_size;
   const GLuint *program = prog.compile(brw, &program_size);
   brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
-   wm_prog_key, sizeof(*wm_prog_key),
+   &blorp_key, sizeof(blorp_key),
 program, program_size,
 &prog.prog_data, sizeof(prog.prog_data),
 &params->wm_prog_kernel, &params->wm_prog_data);
@@ -257,10 +261,7 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 memcpy(&params.wm_push_consts.dst_x0,
   ctx->Color.ClearColor.f, sizeof(float) * 4);
 
-   brw_blorp_const_color_prog_key wm_prog_key;
-   memset(&wm_prog_key, 0, sizeof(wm_prog_key));
-
-   wm_prog_key.use_simd16_replicated_data = true;
+   bool use_simd16_replicated_data = true;
 
/* From the SNB PRM (Vol4_Part1):
 *
@@ -269,17 +270,17 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 *  (untiled) memory is UNDEFINED."
 */
if (irb->mt->tiling == I915_TILING_NONE)
-  wm_prog_key.use_simd16_replicated_data = false;
+  use_simd16_replicated_data = false;
 
/* Constant color writes ignore everyting in blend and color calculator
 * state.  This is not documented.
 */
if (set_write_disables(irb, ctx->Color.ColorMask[buf],
   params.color_write_disable))
-  wm_prog_key.use_simd16_replicated_data = false;
+  use_simd16_replicated_data = false;
 
if (irb->mt->fast_clear_state != INTEL_FAST_CLEAR_STATE_NO_MCS &&
-   !partial_clear && wm_prog_key.use_simd16_replicated_data &&
+   !partial_clear && use_simd16_replicated_data &&
brw_is_color_fast_clear_compatible(brw, irb->mt,
   &ctx->Color.ClearColor)) {
   memset(&params.wm_push_consts, 0xff, 4*sizeof(float));
@@ -292,7 +293,7 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
, );
}
 
-   brw_blorp_params_get_clear_kernel(brw, &params, &wm_prog_key);
+   brw_blorp_params_get_clear_kernel(brw, &params, use_simd16_replicated_data);
 
const bool is_fast_clear =
   params.fast_clear_op == GEN7_PS_RENDER_TARGET_FAST_CLEAR_ENABLE;
@@ -326,7 +327,7 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
const char *clear_type;
if (is_fast_clear)
   clear_type = "fast";
-   else if (wm_prog_key.use_simd16_replicated_data)
+   else if (use_simd16_replicated_data)
   clear_type = "replicated";
else
   clear_type = "slow";
@@ -417,11 +418,8 @@ brw_blorp_resolve_color(struct brw_context *brw, struct intel_mipmap_tree *mt)
 * ensure that the fragment shader delivers the data using the "replicated
 * color" message.
 */
-   brw_blorp_const_color_prog_key wm_prog_key;
-   memset(&wm_prog_key, 0, sizeof(wm_prog_key));
-   wm_prog_key.use_simd16_replicated_data = true;
 
-   brw_blorp_params_get_clear_kernel(brw, &params, &wm_prog_key);
+   brw_blorp_params_get_clear_kernel(brw, &params, true);
 
 brw_blorp_exec(brw, &params);
mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
-- 
2.5.0.400.gff86faf

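Building the key inside brw_blorp_params_get_clear_kernel() keeps the memset() right next to the cache lookup. The zeroing matters whenever keys are compared byte-wise, the same hazard raised about memcmp() on brw_reg at the top of this thread. Below is a self-contained model of the lookup-or-compile flow; struct prog_cache, cache_search(), compile_kernel() and the integer kernel handles are hypothetical stand-ins for brw_search_cache()/brw_upload_cache() and the real compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical key with the same shape as brw_blorp_const_color_prog_key. */
struct prog_key {
   bool use_simd16_replicated_data;
   bool pad[3];
};

#define MAX_ENTRIES 8

/* Hypothetical stand-in for the i965 program cache. */
struct prog_cache {
   struct prog_key keys[MAX_ENTRIES];
   int kernels[MAX_ENTRIES];
   unsigned count;
   unsigned compiles; /* how many times compile_kernel() ran */
};

static bool cache_search(const struct prog_cache *c,
                         const struct prog_key *key, int *kernel)
{
   for (unsigned i = 0; i < c->count; i++) {
      /* Byte-wise comparison: every byte of the key, pad[] included, counts. */
      if (memcmp(&c->keys[i], key, sizeof(*key)) == 0) {
         *kernel = c->kernels[i];
         return true;
      }
   }
   return false;
}

static int compile_kernel(struct prog_cache *c, const struct prog_key *key)
{
   c->compiles++;
   return key->use_simd16_replicated_data ? 16 : 8; /* fake kernel handle */
}

/* Same flow as the patched get_clear_kernel(): build a zeroed key from the
 * boolean, return early on a hit, otherwise compile and upload. */
static int get_clear_kernel(struct prog_cache *c, bool use_replicated_data)
{
   struct prog_key key;
   memset(&key, 0, sizeof(key));
   key.use_simd16_replicated_data = use_replicated_data;

   int kernel;
   if (cache_search(c, &key, &kernel))
      return kernel;

   kernel = compile_kernel(c, &key);
   assert(c->count < MAX_ENTRIES);
   memcpy(&c->keys[c->count], &key, sizeof(key));
   c->kernels[c->count++] = kernel;
   return kernel;
}
```

Passing only the boolean means no caller can hand in a key with stale padding; the function that owns the cache lookup also owns the key's byte image.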


[Mesa-dev] [PATCH 22/28] i965/blorp: Add support for discard-based bounds checks to the NIR path

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index f94dd6f..27aab20 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -423,6 +423,23 @@ blorp_blit_apply_transform(nir_builder *b, nir_ssa_def *src_pos,
return pos;
 }
 
+static inline void
+blorp_nir_discard_if_outside_rect(nir_builder *b, nir_ssa_def *pos,
+  struct brw_blorp_blit_vars *v)
+{
+   nir_ssa_def *c0, *c1, *c2, *c3;
+   c0 = nir_ult(b, nir_channel(b, pos, 0), nir_load_var(b, v->u_dst_x0));
+   c1 = nir_uge(b, nir_channel(b, pos, 0), nir_load_var(b, v->u_dst_x1));
+   c2 = nir_ult(b, nir_channel(b, pos, 1), nir_load_var(b, v->u_dst_y0));
+   c3 = nir_uge(b, nir_channel(b, pos, 1), nir_load_var(b, v->u_dst_y1));
+   nir_ssa_def *oob = nir_ior(b, nir_ior(b, c0, c1), nir_ior(b, c2, c3));
+
+   nir_intrinsic_instr *discard =
+  nir_intrinsic_instr_create(b->shader, nir_intrinsic_discard_if);
+   discard->src[0] = nir_src_for_ssa(oob);
+   nir_builder_instr_insert(b, &discard->instr);
+}
+
 static nir_tex_instr *
 blorp_create_nir_tex_instr(nir_shader *shader, nir_texop op,
nir_ssa_def *pos, unsigned num_srcs,
@@ -781,7 +798,7 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
 * now is the time to do it.
 */
if (key->use_kill)
-  goto fail;
+  blorp_nir_discard_if_outside_rect(&b, dst_pos, &v);
 
 src_pos = blorp_blit_apply_transform(&b, nir_i2f(&b, dst_pos), &v);
 
-- 
2.5.0.400.gff86faf

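The discard condition is the complement of a half-open rectangle test: a pixel survives only if dst_x0 <= x < dst_x1 and dst_y0 <= y < dst_y1, with all comparisons unsigned (nir_ult/nir_uge). A CPU sketch of that predicate, where struct dst_rect and outside_rect() are hypothetical stand-ins for the four u_dst_* uniforms and the emitted NIR:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the u_dst_x0/x1/y0/y1 uniforms. */
struct dst_rect {
   unsigned x0, x1, y0, y1;
};

/* CPU model of the condition fed to nir_intrinsic_discard_if: true when
 * (x, y) falls outside the half-open rectangle [x0, x1) x [y0, y1). */
static bool outside_rect(unsigned x, unsigned y, const struct dst_rect *r)
{
   bool c0 = x < r->x0;  /* nir_ult */
   bool c1 = x >= r->x1; /* nir_uge */
   bool c2 = y < r->y0;  /* nir_ult */
   bool c3 = y >= r->y1; /* nir_uge */
   return (c0 || c1) || (c2 || c3); /* same OR tree as the patch */
}
```

The right and bottom edges (x1, y1) are exclusive, so a rectangle from x0 = 10 to x1 = 20 covers exactly ten columns.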


[Mesa-dev] [PATCH 27/28] i965/blorp: Refactor coordinate munging

2016-05-10 Thread Jason Ekstrand
The original code flow tried to mirror the original blorp code.  This moves
things to where they belong and simplifies some of the logic.
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 59 ++--
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index c4d80a7..d1c39b0 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1317,12 +1317,6 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
 
 src_pos = blorp_blit_apply_transform(&b, nir_i2f(&b, dst_pos), &v);
 
-   if (key->blit_scaled && key->blend) {
-   } else if (!key->bilinear_filter) {
-  /* We're going to use a texelFetch, so we need integers */
-  src_pos = nir_f2i(&b, src_pos);
-   }
-
/* If the source image is not multisampled, then we want to fetch sample
 * number 0, because that's the only sample there is.
 */
@@ -1334,6 +1328,9 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
 * irrelevant, because we are going to fetch all samples.
 */
if (key->blend && !key->blit_scaled) {
+  /* Resolves (effectively) use texelFetch, so we need integers */
+  src_pos = nir_f2i(&b, src_pos);
+
   if (brw->gen == 6) {
  /* When looking up samples in an MSAA texture using the SAMPLE
   * message, Gen6 requires the texture coordinates to be odd integers
@@ -1354,33 +1351,35 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
} else if (key->blend && key->blit_scaled) {
   color = blorp_nir_manual_blend_bilinear(&b, src_pos, key->src_samples, key, &v);
} else {
-  /* We aren't blending, which means we just want to fetch a single sample
-   * from the source surface.  The address that we want to fetch from is
-   * related to the X, Y and S values according to the formula:
-   *
-   * (X, Y, S) = decode_msaa(src_samples, detile(src_tiling, offset)).
-   *
-   * If the actual tiling and sample count of the source surface are not
-   * the same as the configuration of the texture, then we need to adjust
-   * the coordinates to compensate for the difference.
-   */
-  if ((tex_tiled_w != key->src_tiled_w ||
-   key->tex_samples != key->src_samples ||
-   key->tex_layout != key->src_layout) &&
-  !key->bilinear_filter) {
- src_pos = blorp_nir_encode_msaa(&b, src_pos, key->src_samples,
- key->src_layout);
- /* Now (X, Y, S) = detile(src_tiling, offset) */
- if (tex_tiled_w != key->src_tiled_w)
-src_pos = blorp_nir_retile_w_to_y(&b, src_pos);
- /* Now (X, Y, S) = detile(tex_tiling, offset) */
- src_pos = blorp_nir_decode_msaa(&b, src_pos, key->tex_samples,
- key->tex_layout);
-  }
-
   if (key->bilinear_filter) {
  color = blorp_nir_tex(&b, src_pos, key->texture_data_type);
   } else {
+ /* We're going to use texelFetch, so we need integers */
+ src_pos = nir_f2i(&b, src_pos);
+
+ /* We aren't blending, which means we just want to fetch a single
+  * sample from the source surface.  The address that we want to fetch
+  * from is related to the X, Y and S values according to the formula:
+  *
+  * (X, Y, S) = decode_msaa(src_samples, detile(src_tiling, offset)).
+  *
+  * If the actual tiling and sample count of the source surface are
+  * not the same as the configuration of the texture, then we need to
+  * adjust the coordinates to compensate for the difference.
+  */
+ if (tex_tiled_w != key->src_tiled_w ||
+ key->tex_samples != key->src_samples ||
+ key->tex_layout != key->src_layout) {
+src_pos = blorp_nir_encode_msaa(&b, src_pos, key->src_samples,
+key->src_layout);
+/* Now (X, Y, S) = detile(src_tiling, offset) */
+if (tex_tiled_w != key->src_tiled_w)
+   src_pos = blorp_nir_retile_w_to_y(&b, src_pos);
+/* Now (X, Y, S) = detile(tex_tiling, offset) */
+src_pos = blorp_nir_decode_msaa(&b, src_pos, key->tex_samples,
+key->tex_layout);
+ }
+
  /* Now (X, Y, S) = decode_msaa(tex_samples, detile(tex_tiling, offset)).
   *
   * In other words: X, Y, and S now contain values which, when passed to
-- 
2.5.0.400.gff86faf

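After this refactor the float-to-int conversion happens only on the paths that end in texelFetch (resolves and the plain single-sample fetch), while the bilinear_filter and scaled-blend paths keep the float coordinate. A rough CPU model of that ordering; struct axis_transform, apply_transform() and fetch_coord() are hypothetical stand-ins for blorp_blit_apply_transform() followed by nir_f2i, which, like a C cast, truncates toward zero.

```c
#include <assert.h>

/* Hypothetical per-axis parameters, mirroring u_x_transform / u_y_transform. */
struct axis_transform {
   float multiplier;
   float offset;
};

/* dst * multiplier + offset, the per-axis math of blorp_blit_apply_transform()
 * (the NIR code uses nir_ffma; plain C may or may not fuse this). */
static float apply_transform(int dst, struct axis_transform t)
{
   return (float)dst * t.multiplier + t.offset;
}

/* texelFetch paths only: convert to an integer afterwards.  The cast
 * truncates toward zero, as nir_f2i does. */
static int fetch_coord(int dst, struct axis_transform t)
{
   return (int)apply_transform(dst, t);
}
```

Keeping the conversion out of the shared path means the bilinear sampler still sees fractional positions such as 6.5, which it needs for filtering.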


[Mesa-dev] [PATCH 19/28] i965/blorp: Use NIR for clear shaders

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 184 ++
 1 file changed, 39 insertions(+), 145 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 94b8277..3925d28 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -37,6 +37,8 @@
 #include "brw_eu.h"
 #include "brw_state.h"
 
+#include "nir_builder.h"
+
 #define FILE_DEBUG_FLAG DEBUG_BLORP
 
 struct brw_blorp_const_color_prog_key
@@ -45,78 +47,55 @@ struct brw_blorp_const_color_prog_key
bool pad[3];
 };
 
-class brw_blorp_const_color_program
+static void
+brw_blorp_params_get_clear_kernel(struct brw_context *brw,
+  struct brw_blorp_params *params,
+  bool use_replicated_data)
 {
-public:
-   brw_blorp_const_color_program(struct brw_context *brw,
- const brw_blorp_const_color_prog_key *key);
-   ~brw_blorp_const_color_program();
+   struct brw_blorp_const_color_prog_key blorp_key;
+   memset(&blorp_key, 0, sizeof(blorp_key));
+   blorp_key.use_simd16_replicated_data = use_replicated_data;
 
-   const GLuint *compile(struct brw_context *brw, GLuint *program_size);
+   if (brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
+&blorp_key, sizeof(blorp_key),
+&params->wm_prog_kernel, &params->wm_prog_data))
+  return;
 
-   brw_blorp_prog_data prog_data;
+   void *mem_ctx = ralloc_context(NULL);
 
-private:
-   void alloc_regs();
+   nir_builder b;
+   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_FRAGMENT, NULL);
+   b.shader->info.name = ralloc_strdup(b.shader, "BLORP-clear");
 
-   void *mem_ctx;
-   const brw_blorp_const_color_prog_key *key;
-   struct brw_codegen func;
+   nir_variable *u_color = nir_variable_create(b.shader, nir_var_uniform,
+   glsl_vec4_type(), "u_color");
+   u_color->data.location = 0;
 
-   /* Thread dispatch header */
-   struct brw_reg R0;
+   nir_variable *frag_color = nir_variable_create(b.shader, nir_var_shader_out,
+  glsl_vec4_type(),
+  "gl_FragColor");
+   frag_color->data.location = FRAG_RESULT_COLOR;
 
-   /* Pixel X/Y coordinates (always in R1). */
-   struct brw_reg R1;
+   nir_copy_var(&b, frag_color, u_color);
 
-   /* Register with push constants (a single vec4) */
-   struct brw_reg clear_rgba;
+   struct brw_wm_prog_key wm_key;
+   brw_blorp_init_wm_prog_key(&wm_key);
 
-   /* MRF used for render target writes */
-   GLuint base_mrf;
-};
+   struct brw_blorp_prog_data prog_data;
+   brw_blorp_prog_data_init(&prog_data);
 
-brw_blorp_const_color_program::brw_blorp_const_color_program(
-  struct brw_context *brw,
-  const brw_blorp_const_color_prog_key *key)
-   : mem_ctx(ralloc_context(NULL)),
- key(key),
- R0(),
- R1(),
- clear_rgba(),
- base_mrf(0)
-{
-   prog_data.first_curbe_grf_0 = 0;
-   prog_data.persample_msaa_dispatch = false;
-   brw_init_codegen(brw->intelScreen->devinfo, &func, mem_ctx);
-}
+   unsigned program_size;
+   const unsigned *program =
+  brw_blorp_compile_nir_shader(brw, b.shader, &wm_key, use_replicated_data,
+   &prog_data, &program_size);
 
-brw_blorp_const_color_program::~brw_blorp_const_color_program()
-{
-   ralloc_free(mem_ctx);
-}
-
-static void
-brw_blorp_params_get_clear_kernel(struct brw_context *brw,
-  struct brw_blorp_params *params,
-  bool use_replicated_data)
-{
-   struct brw_blorp_const_color_prog_key blorp_key;
-   memset(&blorp_key, 0, sizeof(blorp_key));
-   blorp_key.use_simd16_replicated_data = use_replicated_data;
+   brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
+&blorp_key, sizeof(blorp_key),
+program, program_size,
+&prog_data, sizeof(prog_data),
+&params->wm_prog_kernel, &params->wm_prog_data);
 
-   if (!brw_search_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
- &blorp_key, sizeof(blorp_key),
- &params->wm_prog_kernel, &params->wm_prog_data)) {
-  brw_blorp_const_color_program prog(brw, &blorp_key);
-  GLuint program_size;
-  const GLuint *program = prog.compile(brw, &program_size);
-  brw_upload_cache(&brw->cache, BRW_CACHE_BLORP_PROG,
-   &blorp_key, sizeof(blorp_key),
-   program, program_size,
-   &prog.prog_data, sizeof(prog.prog_data),
-   &params->wm_prog_kernel, &params->wm_prog_data);
-   }
+   ralloc_free(mem_ctx);
 }
 
 static bool
@@ -142,91 +121,6 @@ set_write_disables(const struct intel_renderbuffer *irb,
return disables;
 }
 
-void
-brw_blorp_const_color_program::alloc_regs()
-{
-   int reg = 0;
-   this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
-   this->R1 = retype(brw_vec8_grf(reg++, 0), 

[Mesa-dev] [PATCH 24/28] i965/blorp: Add MSAA encode/decode support to the NIR path

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 203 +--
 1 file changed, 194 insertions(+), 9 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index c0c02cf..7b01da8 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -664,6 +664,182 @@ blorp_nir_retile_w_to_y(nir_builder *b, nir_ssa_def *pos)
 }
 
 /**
+ * Emit code to compensate for the difference between MSAA and non-MSAA
+ * surfaces.
+ *
+ * This code modifies the X and Y coordinates according to the formula:
+ *
+ *   (X', Y', S') = encode_msaa(num_samples, IMS, X, Y, S)
+ *
+ * (See brw_blorp_blit_program).
+ */
+static inline nir_ssa_def *
+blorp_nir_encode_msaa(nir_builder *b, nir_ssa_def *pos,
+  unsigned num_samples, enum intel_msaa_layout layout)
+{
+   assert(pos->num_components == 2 || pos->num_components == 3);
+
+   switch (layout) {
+   case INTEL_MSAA_LAYOUT_NONE:
+  assert(pos->num_components == 2);
+  return pos;
+   case INTEL_MSAA_LAYOUT_CMS:
+  /* We can't compensate for compressed layout since at this point in the
+   * program we haven't read from the MCS buffer.
+   */
+  unreachable("Bad layout in encode_msaa");
+   case INTEL_MSAA_LAYOUT_UMS:
+  /* No translation needed */
+  return pos;
+   case INTEL_MSAA_LAYOUT_IMS: {
+  nir_ssa_def *x_in = nir_channel(b, pos, 0);
+  nir_ssa_def *y_in = nir_channel(b, pos, 1);
+  nir_ssa_def *s_in = pos->num_components == 2 ? nir_imm_int(b, 0) :
+ nir_channel(b, pos, 2);
+
+  nir_ssa_def *x_out = nir_imm_int(b, 0);
+  nir_ssa_def *y_out = nir_imm_int(b, 0);
+  switch (num_samples) {
+  case 2:
+  case 4:
+ /* encode_msaa(2, IMS, X, Y, S) = (X', Y', 0)
+  *   where X' = (X & ~0b1) << 1 | (S & 0b1) << 1 | (X & 0b1)
+  * Y' = Y
+  *
+  * encode_msaa(4, IMS, X, Y, S) = (X', Y', 0)
+  *   where X' = (X & ~0b1) << 1 | (S & 0b1) << 1 | (X & 0b1)
+  * Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
+  */
+ x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffe, 1);
+ x_out = nir_mask_shift_or(b, x_out, s_in, 0x1, 1);
+ x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
+ if (num_samples == 2) {
+y_out = y_in;
+ } else {
+y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffe, 1);
+y_out = nir_mask_shift_or(b, y_out, s_in, 0x2, 0);
+y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
+ }
+ break;
+
+  case 8:
+ /* encode_msaa(8, IMS, X, Y, S) = (X', Y', 0)
+  *   where X' = (X & ~0b1) << 2 | (S & 0b100) | (S & 0b1) << 1
+  *  | (X & 0b1)
+  * Y' = (Y & ~0b1) << 1 | (S & 0b10) | (Y & 0b1)
+  */
+ x_out = nir_mask_shift_or(b, x_out, x_in, 0xfffe, 2);
+ x_out = nir_mask_shift_or(b, x_out, s_in, 0x4, 0);
+ x_out = nir_mask_shift_or(b, x_out, s_in, 0x1, 1);
+ x_out = nir_mask_shift_or(b, x_out, x_in, 0x1, 0);
+ y_out = nir_mask_shift_or(b, y_out, y_in, 0xfffe, 1);
+ y_out = nir_mask_shift_or(b, y_out, s_in, 0x2, 0);
+ y_out = nir_mask_shift_or(b, y_out, y_in, 0x1, 0);
+ break;
+
+  default:
+ unreachable("Invalid number of samples for IMS layout");
+  }
+
+  return nir_vec2(b, x_out, y_out);
+   }
+
+   default:
+  unreachable("Invalid MSAA layout");
+   }
+}
+
+/**
+ * Emit code to compensate for the difference between MSAA and non-MSAA
+ * surfaces.
+ *
+ * This code modifies the X and Y coordinates according to the formula:
+ *
+ *   (X', Y', S) = decode_msaa(num_samples, IMS, X, Y, S)
+ *
+ * (See brw_blorp_blit_program).
+ */
+static inline nir_ssa_def *
+blorp_nir_decode_msaa(nir_builder *b, nir_ssa_def *pos,
+  unsigned num_samples, enum intel_msaa_layout layout)
+{
+   assert(pos->num_components == 2 || pos->num_components == 3);
+
+   switch (layout) {
+   case INTEL_MSAA_LAYOUT_NONE:
+  /* No translation necessary, and S should already be zero. */
+  assert(pos->num_components == 2);
+  return pos;
+   case INTEL_MSAA_LAYOUT_CMS:
+  /* We can't compensate for compressed layout since at this point in the
+   * program we don't have access to the MCS buffer.
+   */
+  unreachable("Bad layout in decode_msaa");
+   case INTEL_MSAA_LAYOUT_UMS:
+  /* No translation necessary. */
+  return pos;
+   case INTEL_MSAA_LAYOUT_IMS: {
+  assert(pos->num_components == 2);
+
+  nir_ssa_def *x_in = nir_channel(b, pos, 0);
+  nir_ssa_def *y_in = nir_channel(b, pos, 1);
+
+  nir_ssa_def *x_out = nir_imm_int(b, 0);
+  nir_ssa_def *y_out = nir_imm_int(b, 0);
+  nir_ssa_def *s_out = nir_imm_int(b, 0);
+  switch 
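The encode_msaa() formulas in the comments above are plain bit shuffles, so they can be checked on the CPU. This sketch implements them for unsigned coordinates. ims_decode() is not taken from the patch (the decode switch is cut off here); it is an inverse worked out from the encode formulas, included only so the round trip can be tested. struct coord and both function names are hypothetical. The NIR version masks with 0xfffe, so it assumes 16-bit coordinates, whereas ~1u here keeps all 32 bits.

```c
#include <assert.h>

struct coord {
   unsigned x, y, s;
};

/* encode_msaa(n, IMS, X, Y, S) = (X', Y', 0), per the patch comments. */
static struct coord ims_encode(unsigned n, struct coord in)
{
   struct coord out = { 0, 0, 0 };
   switch (n) {
   case 2:
   case 4:
      out.x = (in.x & ~1u) << 1 | (in.s & 1u) << 1 | (in.x & 1u);
      out.y = n == 2 ? in.y
                     : ((in.y & ~1u) << 1 | (in.s & 2u) | (in.y & 1u));
      break;
   case 8:
      out.x = (in.x & ~1u) << 2 | (in.s & 4u) | (in.s & 1u) << 1 | (in.x & 1u);
      out.y = (in.y & ~1u) << 1 | (in.s & 2u) | (in.y & 1u);
      break;
   default:
      assert(!"invalid sample count for IMS");
   }
   return out;
}

/* Inverse of ims_encode(), derived by hand from the formulas above. */
static struct coord ims_decode(unsigned n, struct coord in)
{
   struct coord out = { 0, 0, 0 };
   switch (n) {
   case 2:
      out.x = (in.x & ~3u) >> 1 | (in.x & 1u);
      out.y = in.y;
      out.s = (in.x & 2u) >> 1;
      break;
   case 4:
      out.x = (in.x & ~3u) >> 1 | (in.x & 1u);
      out.y = (in.y & ~3u) >> 1 | (in.y & 1u);
      out.s = (in.y & 2u) | (in.x & 2u) >> 1;
      break;
   case 8:
      out.x = (in.x & ~7u) >> 2 | (in.x & 1u);
      out.y = (in.y & ~3u) >> 1 | (in.y & 1u);
      out.s = (in.x & 4u) | (in.y & 2u) | (in.x & 2u) >> 1;
      break;
   default:
      assert(!"invalid sample count for IMS");
   }
   return out;
}
```

For example, with 4x IMS, pixel (1, 0) sample 3 encodes to (3, 2): sample bit 0 lands in bit 1 of X', and sample bit 1 lands in bit 1 of Y'.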

[Mesa-dev] [PATCH 02/28] i965/fs: Rework the persample shading key/prog_data bits

2016-05-10 Thread Jason Ekstrand
This commit reworks and simplifies the way we handle persample shading in
the shader key and prog_data.  The previous approach used three key bits
with subtly different, hard-to-discern meanings.  This commit replaces them
with two easily understood bits that communicate everything we need:

 1) key->persample_interp: means that the user has requested persample
interpolation through the API.  This is equivalent to having
SAMPLE_SHADING enabled and having MIN_SAMPLE_SHADING_VALUE set high
enough that you actually get multiple per-sample invocations.

 2) key->multisample_fbo: means that the shader will be running on an
actual multi-sampled framebuffer.
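The anv hunk below turns item 1 into arithmetic: persample_interp is set when minSampleShading * rasterizationSamples exceeds 1, i.e. when more than one sample would be shaded per pixel. A small C model of that rule; the function name is hypothetical, and the explicit sample_shading_enable parameter is an assumption added here (Vulkan only consults minSampleShading when sampleShadingEnable is VK_TRUE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the anv hunk derives key->persample_interp. */
static bool wants_persample_interp(bool sample_shading_enable,
                                   float min_sample_shading,
                                   unsigned rasterization_samples)
{
   /* Assumption: minSampleShading is ignored unless sample shading is on. */
   if (!sample_shading_enable)
      return false;

   /* More than one shaded sample per pixel means per-sample interpolation. */
   return min_sample_shading * (float)rasterization_samples > 1.0f;
}
```

With 4x MSAA, a minimum fraction of 0.25 still shades one sample per pixel, so per-sample interpolation stays off; 0.5 or higher turns it on.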

This commit also adds a new "persample_dispatch" bit to prog_data which
indicates that the shader should be run in persample mode.  This way the
state setup code doesn't have to look at the fragment program or GL state
and can just pull that data out of the prog_data.

In theory, this shuffle could mean more recompiles.  However, in practice,
we were shoving enough state into the key before that we were probably
hitting a recompile on every per-sample shader anyway.
---
 src/intel/vulkan/anv_pipeline.c   |  5 ++--
 src/mesa/drivers/dri/i965/brw_compiler.h  |  4 ++--
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 40 +--
 src/mesa/drivers/dri/i965/brw_wm.c| 19 +++
 src/mesa/drivers/dri/i965/gen6_wm_state.c |  6 ++---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 20 +++-
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 22 +++--
 7 files changed, 60 insertions(+), 56 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index ba088b6..4e52c91 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -286,8 +286,9 @@ populate_wm_prog_key(const struct brw_device_info *devinfo,
   /* We should probably pull this out of the shader, but it's fairly
* harmless to compute it and then let dead-code take care of it.
*/
-  key->persample_shading = info->pMultisampleState->sampleShadingEnable;
-  key->compute_pos_offset = info->pMultisampleState->sampleShadingEnable;
+  key->persample_interp =
+ (info->pMultisampleState->minSampleShading *
+  info->pMultisampleState->rasterizationSamples) > 1;
   key->multisample_fbo = true;
}
 }
diff --git a/src/mesa/drivers/dri/i965/brw_compiler.h b/src/mesa/drivers/dri/i965/brw_compiler.h
index 5807305..2482516 100644
--- a/src/mesa/drivers/dri/i965/brw_compiler.h
+++ b/src/mesa/drivers/dri/i965/brw_compiler.h
@@ -242,12 +242,11 @@ struct brw_wm_prog_key {
uint8_t iz_lookup;
bool stats_wm:1;
bool flat_shade:1;
-   bool persample_shading:1;
unsigned nr_color_regions:5;
bool replicate_alpha:1;
bool render_to_fbo:1;
bool clamp_fragment_color:1;
-   bool compute_pos_offset:1;
+   bool persample_interp:1;
bool multisample_fbo:1;
unsigned line_aa:2;
bool high_quality_derivatives:1;
@@ -386,6 +385,7 @@ struct brw_wm_prog_data {
bool early_fragment_tests;
bool no_8;
bool dual_src_blend;
+   bool persample_dispatch;
bool uses_pos_offset;
bool uses_omask;
bool uses_kill;
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 2a542b8..71e759d 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1195,8 +1195,8 @@ fs_visitor::emit_general_interpolation(fs_reg *attr, const char *name,
   inst->no_dd_clear = true;
 
inst = emit_linterp(*attr, fs_reg(interp), interpolation_mode,
-   mod_centroid && !key->persample_shading,
-   mod_sample || key->persample_shading);
+   mod_centroid && !key->persample_interp,
+   mod_sample || key->persample_interp);
inst->predicate = BRW_PREDICATE_NORMAL;
inst->predicate_inverse = false;
if (devinfo->has_pln)
@@ -1204,8 +1204,8 @@ fs_visitor::emit_general_interpolation(fs_reg *attr, const char *name,
 
 } else {
emit_linterp(*attr, fs_reg(interp), interpolation_mode,
-mod_centroid && !key->persample_shading,
-mod_sample || key->persample_shading);
+mod_centroid && !key->persample_interp,
+mod_sample || key->persample_interp);
 }
 if (devinfo->gen < 6 && interpolation_mode == INTERP_QUALIFIER_SMOOTH) {
bld.MUL(*attr, *attr, this->pixel_w);
@@ -1262,10 +1262,10 @@ void
 fs_visitor::compute_sample_position(fs_reg dst, fs_reg int_sample_pos)
 {
assert(stage == MESA_SHADER_FRAGMENT);
-   brw_wm_prog_key *key = (brw_wm_prog_key*) 

[Mesa-dev] [PATCH 16/28] blorp: Add initial state setup support for SIMD8 dispatch

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp.c |  6 +-
 src/mesa/drivers/dri/i965/brw_blorp.h |  8 +++-
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  2 +-
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp |  4 ++--
 src/mesa/drivers/dri/i965/gen6_blorp.c| 23 ---
 src/mesa/drivers/dri/i965/gen7_blorp.c| 27 +--
 src/mesa/drivers/dri/i965/gen8_blorp.c| 23 +++
 7 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 1379804..6c3b83a 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -137,7 +137,11 @@ brw_blorp_compute_tile_offsets(const struct brw_blorp_surface_info *info,
 void
 brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data)
 {
-   prog_data->first_curbe_grf = 0;
+   prog_data->dispatch_8 = false;
+   prog_data->dispatch_16 = true;
+   prog_data->first_curbe_grf_0 = 0;
+   prog_data->first_curbe_grf_2 = 0;
+   prog_data->ksp_offset_2 = 0;
prog_data->persample_msaa_dispatch = false;
 
prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h
index c2f33a1..b38b689 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -208,7 +208,13 @@ static const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
 
 struct brw_blorp_prog_data
 {
-   unsigned int first_curbe_grf;
+   bool dispatch_8;
+   bool dispatch_16;
+
+   uint8_t first_curbe_grf_0;
+   uint8_t first_curbe_grf_2;
+
+   uint32_t ksp_offset_2;
 
/**
 * True if the WM program should be run in MSDISPMODE_PERSAMPLE with more
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index ed43184..7067c06 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -778,7 +778,7 @@ brw_blorp_blit_program::alloc_regs()
int reg = 0;
this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
this->R1 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
-   prog_data.first_curbe_grf = reg;
+   prog_data.first_curbe_grf_0 = reg;
alloc_push_const_regs(reg);
reg += BRW_BLORP_NUM_PUSH_CONST_REGS;
for (unsigned i = 0; i < ARRAY_SIZE(texture_data); ++i) {
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 5ed46e1..c298889 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -86,7 +86,7 @@ brw_blorp_const_color_program::brw_blorp_const_color_program(
  clear_rgba(),
  base_mrf(0)
 {
-   prog_data.first_curbe_grf = 0;
+   prog_data.first_curbe_grf_0 = 0;
prog_data.persample_msaa_dispatch = false;
brw_init_codegen(brw->intelScreen->devinfo, , mem_ctx);
 }
@@ -145,7 +145,7 @@ brw_blorp_const_color_program::alloc_regs()
this->R0 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
this->R1 = retype(brw_vec8_grf(reg++, 0), BRW_REGISTER_TYPE_UW);
 
-   prog_data.first_curbe_grf = reg;
+   prog_data.first_curbe_grf_0 = reg;
clear_rgba = retype(brw_vec4_grf(reg++, 0), BRW_REGISTER_TYPE_F);
reg += BRW_BLORP_NUM_PUSH_CONST_REGS;
 
diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c b/src/mesa/drivers/dri/i965/gen6_blorp.c
index 950e2b9..32049eb 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.c
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
@@ -619,7 +619,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
   const struct brw_blorp_params *params)
 {
const struct brw_blorp_prog_data *prog_data = params->wm_prog_data;
-   uint32_t dw2, dw4, dw5, dw6;
+   uint32_t dw2, dw4, dw5, dw6, ksp0, ksp2;
 
/* Even when thread dispatch is disabled, max threads (dw5.25:31) must be
 * nonzero to prevent the GPU from hanging.  While the documentation doesn't
@@ -630,7 +630,7 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
 * configure the WM state whether or not there is a WM program.
 */
 
-   dw2 = dw4 = dw5 = dw6 = 0;
+   dw2 = dw4 = dw5 = dw6 = ksp0 = ksp2 = 0;
switch (params->hiz_op) {
case GEN6_HIZ_OP_DEPTH_CLEAR:
   dw4 |= GEN6_WM_DEPTH_CLEAR;
@@ -652,9 +652,18 @@ gen6_blorp_emit_wm_config(struct brw_context *brw,
dw6 |= 0 << GEN6_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT; /* No interp */
dw6 |= 0 << GEN6_WM_NUM_SF_OUTPUTS_SHIFT; /* No inputs from SF */
if (params->wm_prog_data) {
-  dw4 |= prog_data->first_curbe_grf << GEN6_WM_DISPATCH_START_GRF_SHIFT_0;
-  dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
   dw5 |= GEN6_WM_DISPATCH_ENABLE; /* We are rendering */
+
+  dw4 |= prog_data->first_curbe_grf_0 << GEN6_WM_DISPATCH_START_GRF_SHIFT_0;
+  dw4 |= prog_data->first_curbe_grf_2 << 

[Mesa-dev] [PATCH 09/28] i965/fs: Use MRF0 for the repclear message

2016-05-10 Thread Jason Ekstrand
This is what BLORP does.  Making them match cuts down on the noise when
looking at AUB diffs.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index f9bfe03..e6073e4 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2958,7 +2958,7 @@ void
 fs_visitor::emit_repclear_shader()
 {
brw_wm_prog_key *key = (brw_wm_prog_key*) this->key;
-   int base_mrf = 1;
+   int base_mrf = 0;
int color_mrf = base_mrf + 2;
fs_inst *mov;
 
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/28] i965/fs: Organize prog_data by ksp number rather than SIMD width

2016-05-10 Thread Jason Ekstrand
The hardware packets organize kernel pointers and GRF start by slots that
don't map directly to dispatch width.  This means that all of the state
setup code has to re-arrange the data from prog_data into these slots.
This logic has been duplicated 4 times in the GL driver and one more time
in the Vulkan driver.  Let's just put it all in brw_fs.cpp.
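The slot arrangement that this patch centralizes is essentially the logic
removed from anv_pipeline_compile_fs() in the first hunk: SIMD8 goes in
KSP0 with SIMD16 (if present) in KSP2, and a SIMD16-only program goes in
KSP0.  A standalone sketch of that mapping follows; the structs, field
names and the SKETCH_NO_KERNEL sentinel are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NO_KERNEL UINT32_MAX /* hypothetical "no kernel" sentinel */

/* Hypothetical input: what the compiler reports per SIMD width. */
struct fs_compile_result {
   uint32_t kernel;         /* offset of the uploaded program */
   bool no_8;               /* no SIMD8 program was emitted */
   uint32_t prog_offset_16; /* SIMD16 offset from kernel, 0 = none */
   uint8_t grf_start_8;     /* dispatch GRF start for SIMD8 */
   uint8_t grf_start_16;    /* dispatch GRF start for SIMD16 */
};

/* Hypothetical output: the same data arranged by hardware KSP slot. */
struct ps_ksp_slots {
   uint32_t ksp0, ksp2;
   uint8_t grf_start0, grf_start2;
};

static struct ps_ksp_slots
sketch_arrange_by_ksp(const struct fs_compile_result *r)
{
   uint32_t simd8 = r->no_8 ? SKETCH_NO_KERNEL : r->kernel;
   uint32_t simd16 = (r->no_8 || r->prog_offset_16)
                     ? r->kernel + r->prog_offset_16 : SKETCH_NO_KERNEL;
   struct ps_ksp_slots s = { SKETCH_NO_KERNEL, 0, 0, 0 };

   if (simd8 != SKETCH_NO_KERNEL) {
      /* SIMD8 always takes slot 0; SIMD16, if present, goes in slot 2. */
      s.ksp0 = simd8;
      s.grf_start0 = r->grf_start_8;
      if (simd16 != SKETCH_NO_KERNEL) {
         s.ksp2 = simd16;
         s.grf_start2 = r->grf_start_16;
      }
   } else if (simd16 != SKETCH_NO_KERNEL) {
      /* A SIMD16-only program is placed in slot 0. */
      s.ksp0 = simd16;
      s.grf_start0 = r->grf_start_16;
   }

   /* With this input model, at least one program always exists. */
   assert(s.ksp0 != SKETCH_NO_KERNEL);
   return s;
}
```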
---
 src/intel/vulkan/anv_pipeline.c  | 41 --
 src/intel/vulkan/anv_private.h   |  5 ---
 src/intel/vulkan/gen7_pipeline.c | 12 +++---
 src/intel/vulkan/gen8_pipeline.c | 12 +++---
 src/mesa/drivers/dri/i965/brw_compiler.h | 12 +++---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 52 +--
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 +-
 src/mesa/drivers/dri/i965/brw_wm_state.c | 31 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c| 63 +---
 src/mesa/drivers/dri/i965/gen7_wm_state.c| 35 +---
 src/mesa/drivers/dri/i965/gen8_ps_state.c| 37 +---
 11 files changed, 110 insertions(+), 192 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 4e52c91..19f24dd 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -585,17 +585,17 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
const struct brw_stage_prog_data *stage_prog_data;
struct anv_pipeline_bind_map map;
struct brw_wm_prog_key key;
-   uint32_t kernel = NO_KERNEL;
unsigned char sha1[20];
 
populate_wm_prog_key(&pipeline->device->info, info, extra, &key);
 
if (module->size > 0) {
   anv_hash_shader(sha1, &key, sizeof(key), module, entrypoint, spec_info);
-  kernel = anv_pipeline_cache_search(cache, sha1, &stage_prog_data, &map);
+  pipeline->ps_ksp0 =
+ anv_pipeline_cache_search(cache, sha1, &stage_prog_data, &map);
}
 
-   if (kernel == NO_KERNEL) {
+   if (pipeline->ps_ksp0 == NO_KERNEL) {
   struct brw_wm_prog_data prog_data = { 0, };
   struct anv_pipeline_binding surface_to_descriptor[256];
   struct anv_pipeline_binding sampler_to_descriptor[256];
@@ -682,43 +682,16 @@ anv_pipeline_compile_fs(struct anv_pipeline *pipeline,
   }
 
   stage_prog_data = &prog_data.base;
-  kernel = anv_pipeline_cache_upload_kernel(cache,
-module->size > 0 ? sha1 : NULL,
-shader_code, code_size,
+  pipeline->ps_ksp0 =
+ anv_pipeline_cache_upload_kernel(cache,
+  module->size > 0 ? sha1 : NULL,
+  shader_code, code_size,
 &stage_prog_data, sizeof(prog_data),
 &map);
 
   ralloc_free(mem_ctx);
}
 
-   const struct brw_wm_prog_data *wm_prog_data =
-  (const struct brw_wm_prog_data *) stage_prog_data;
-
-   if (wm_prog_data->no_8)
-  pipeline->ps_simd8 = NO_KERNEL;
-   else
-  pipeline->ps_simd8 = kernel;
-
-   if (wm_prog_data->no_8 || wm_prog_data->prog_offset_16) {
-  pipeline->ps_simd16 = kernel + wm_prog_data->prog_offset_16;
-   } else {
-  pipeline->ps_simd16 = NO_KERNEL;
-   }
-
-   pipeline->ps_ksp2 = 0;
-   pipeline->ps_grf_start2 = 0;
-   if (pipeline->ps_simd8 != NO_KERNEL) {
-  pipeline->ps_ksp0 = pipeline->ps_simd8;
-  pipeline->ps_grf_start0 = wm_prog_data->base.dispatch_grf_start_reg;
-  if (pipeline->ps_simd16 != NO_KERNEL) {
- pipeline->ps_ksp2 = pipeline->ps_simd16;
- pipeline->ps_grf_start2 = wm_prog_data->dispatch_grf_start_reg_16;
-  }
-   } else if (pipeline->ps_simd16 != NO_KERNEL) {
-  pipeline->ps_ksp0 = pipeline->ps_simd16;
-  pipeline->ps_grf_start0 = wm_prog_data->dispatch_grf_start_reg_16;
-   }
-
anv_pipeline_add_compiled_stage(pipeline, MESA_SHADER_FRAGMENT,
stage_prog_data, &map);
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index d8a2194..c55f1db 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1418,12 +1418,7 @@ struct anv_pipeline {
struct anv_state blend_state;
uint32_t vs_simd8;
uint32_t vs_vec4;
-   uint32_t ps_simd8;
-   uint32_t ps_simd16;
uint32_t ps_ksp0;
-   uint32_t ps_ksp2;
-   uint32_t ps_grf_start0;
-   uint32_t ps_grf_start2;
uint32_t gs_kernel;
uint32_t cs_simd;
 
diff --git a/src/intel/vulkan/gen7_pipeline.c b/src/intel/vulkan/gen7_pipeline.c
index 

[Mesa-dev] [PATCH 05/28] i965/fs: Stop setting dispatch_grf_start_reg from the visitor

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 18 --
 src/mesa/drivers/dri/i965/brw_shader.cpp  |  1 +
 src/mesa/drivers/dri/i965/brw_vec4.cpp|  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  1 +
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp|  1 +
 5 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index d136ba8..7d40135 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -1549,20 +1549,6 @@ fs_visitor::emit_gs_thread_end()
 void
 fs_visitor::assign_curb_setup()
 {
-   if (dispatch_width == 8) {
-  prog_data->dispatch_grf_start_reg = payload.num_regs;
-   } else {
-  if (stage == MESA_SHADER_FRAGMENT) {
- brw_wm_prog_data *prog_data = (brw_wm_prog_data*) this->prog_data;
- prog_data->dispatch_grf_start_reg_16 = payload.num_regs;
-  } else if (stage == MESA_SHADER_COMPUTE) {
- brw_cs_prog_data *prog_data = (brw_cs_prog_data*) this->prog_data;
- prog_data->dispatch_grf_start_reg_16 = payload.num_regs;
-  } else {
- unreachable("Unsupported shader type!");
-  }
-   }
-
prog_data->curb_read_length = ALIGN(stage_prog_data->nr_params, 8) / 8;
 
/* Map the offsets in the UNIFORM file to fixed HW regs. */
@@ -6029,6 +6015,7 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
   return NULL;
} else if (likely(!(INTEL_DEBUG & DEBUG_NO8))) {
   simd8_cfg = v8.cfg;
+  prog_data->base.dispatch_grf_start_reg = v8.payload.num_regs;
}
 
if (!v8.simd16_unsupported &&
@@ -6044,6 +6031,7 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
v16.fail_msg);
   } else {
  simd16_cfg = v16.cfg;
+ prog_data->dispatch_grf_start_reg_16 = v16.payload.num_regs;
   }
}
 
@@ -6167,6 +6155,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,
   } else {
  cfg = v8.cfg;
  prog_data->simd_size = 8;
+ prog_data->base.dispatch_grf_start_reg = v8.payload.num_regs;
   }
}
 
@@ -6191,6 +6180,7 @@ brw_compile_cs(const struct brw_compiler *compiler, void *log_data,
   } else {
  cfg = v16.cfg;
  prog_data->simd_size = 16;
+ prog_data->dispatch_grf_start_reg_16 = v16.payload.num_regs;
   }
}
 
diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp b/src/mesa/drivers/dri/i965/brw_shader.cpp
index a23f14e..cab79fa 100644
--- a/src/mesa/drivers/dri/i965/brw_shader.cpp
+++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
@@ -1388,6 +1388,7 @@ brw_compile_tes(const struct brw_compiler *compiler,
  return NULL;
   }
 
+  prog_data->base.base.dispatch_grf_start_reg = v.payload.num_regs;
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
   fs_generator g(compiler, log_data, mem_ctx, (void *) key,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 815eaed..385afc1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -2140,6 +2140,8 @@ brw_compile_vs(const struct brw_compiler *compiler, void *log_data,
  return NULL;
   }
 
+  prog_data->base.base.dispatch_grf_start_reg = v.payload.num_regs;
+
   fs_generator g(compiler, log_data, mem_ctx, (void *) key,
  &prog_data->base.base, v.promoted_constants,
  v.runtime_check_aads_emit, MESA_SHADER_VERTEX);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 7df6c72..6f8c5bb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -818,6 +818,7 @@ brw_compile_gs(const struct brw_compiler *compiler, void *log_data,
shader_time_index);
   if (v.run_gs()) {
  prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
+ prog_data->base.base.dispatch_grf_start_reg = v.payload.num_regs;
 
  fs_generator g(compiler, log_data, mem_ctx, ,
 &prog_data->base.base, v.promoted_constants,
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index 6d39474..2e1a9a6 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
@@ -507,6 +507,7 @@ brw_compile_tcs(const struct brw_compiler *compiler,
  return NULL;
   }
 
+  prog_data->base.base.dispatch_grf_start_reg = v.payload.num_regs;
   prog_data->base.dispatch_mode = DISPATCH_MODE_SIMD8;
 
   fs_generator g(compiler, log_data, mem_ctx, (void *) key,
-- 
2.5.0.400.gff86faf

[Mesa-dev] [PATCH 08/28] i965/blorp: Simplify the sample layout calculation

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index dd22e6d..897ce99 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1506,6 +1506,9 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned num_samples)
   *--
   *| 6 | 7 || 7 | 1 |
   *--
+  *
+  * Fortunately, this can be done fairly easily as:
+  * S' = (0x17306425 >> (S * 4)) & 0xf
   */
   emit_frc(vec16(t1_f), x_sample_coords);
   emit_frc(vec16(t2_f), y_sample_coords);
@@ -1515,23 +1518,10 @@ brw_blorp_blit_program::manual_blend_bilinear(unsigned num_samples)
   emit_mov(vec16(S), t1_f);
 
   if (num_samples == 8) {
- /* Map the sample index to a sample number */
- emit_cmp_if(BRW_CONDITIONAL_L, S, brw_imm_d(4));
- {
-emit_mov(vec16(t2), brw_imm_d(5));
-emit_if_eq_mov(S, 1, vec16(t2), 2);
-emit_if_eq_mov(S, 2, vec16(t2), 4);
-emit_if_eq_mov(S, 3, vec16(t2), 6);
- }
- emit_else();
- {
-emit_mov(vec16(t2), brw_imm_d(0));
-emit_if_eq_mov(S, 5, vec16(t2), 3);
-emit_if_eq_mov(S, 6, vec16(t2), 7);
-emit_if_eq_mov(S, 7, vec16(t2), 1);
- }
- emit_endif();
- emit_mov(vec16(S), t2);
+ emit_mov(vec16(t2), brw_imm_d(0x17306425));
+ emit_shl(vec16(S), S, brw_imm_d(2));
+ emit_shr(vec16(S), t2, S);
+ emit_and(vec16(S), S, brw_imm_d(0xf));
   }
   texel_fetch(texture_data[i]);
}
-- 
2.5.0.400.gff86faf
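
The nibble table can be checked on the CPU against the branchy mapping it
replaces.  A small sketch; the function names are hypothetical.  Note that
the emitted code computes S * 4 as S << 2 before the right shift.

```c
#include <assert.h>

/* Sample index (row-major slot in the 2x4 grid) -> sample number, using
 * the packed nibble table: S' = (0x17306425 >> (S * 4)) & 0xf. */
static unsigned
sketch_sample_number_8x(unsigned s)
{
   assert(s < 8);
   return (0x17306425u >> (s * 4)) & 0xf;
}

/* The if/else mapping that the patch removes, kept for comparison. */
static unsigned
sketch_sample_number_8x_branchy(unsigned s)
{
   unsigned t;
   if (s < 4) {
      t = 5;
      if (s == 1) t = 2;
      if (s == 2) t = 4;
      if (s == 3) t = 6;
   } else {
      t = 0;
      if (s == 5) t = 3;
      if (s == 6) t = 7;
      if (s == 7) t = 1;
   }
   return t;
}
```

Each nibble of 0x17306425, read from the least significant end, is the
sample number for the corresponding index: 5, 2, 4, 6, 0, 3, 7, 1.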



[Mesa-dev] [PATCH 26/28] i965/blorp: Add bilinear blending support to the NIR path

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 120 +--
 1 file changed, 114 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 83cdac5..c4d80a7 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -966,6 +966,119 @@ blorp_nir_manual_blend_average(nir_builder *b, nir_ssa_def *pos,
return nir_load_var(b, color);
 }
 
+static inline nir_ssa_def *
+nir_imm_vec2(nir_builder *build, float x, float y)
+{
+   nir_const_value v;
+
+   memset(&v, 0, sizeof(v));
+   v.f32[0] = x;
+   v.f32[1] = y;
+
+   return nir_build_imm(build, 4, 32, v);
+}
+
+static nir_ssa_def *
+blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,
+unsigned tex_samples,
+const brw_blorp_blit_prog_key *key,
+struct brw_blorp_blit_vars *v)
+{
+   nir_ssa_def *pos_xy = nir_channels(b, pos, 0x3);
+
+   nir_ssa_def *scale = nir_imm_vec2(b, key->x_scale, key->y_scale);
+
+   /* Translate coordinates to lay out the samples in a rectangular  grid
+* roughly corresponding to sample locations.
+*/
+   pos_xy = nir_fmul(b, pos_xy, scale);
+   /* Adjust coordinates so that integers represent pixel centers rather
+* than pixel edges.
+*/
+   pos_xy = nir_fadd(b, pos_xy, nir_imm_float(b, -0.5));
+   /* Clamp the X, Y texture coordinates to properly handle the sampling of
+* texels on texture edges.
+*/
+   pos_xy = nir_fmin(b, nir_fmax(b, pos_xy, nir_imm_float(b, 0.0)),
+nir_vec2(b, nir_load_var(b, v->u_rect_grid_x1),
+nir_load_var(b, v->u_rect_grid_y1)));
+
+   /* Store the fractional parts to be used as bilinear interpolation
+* coefficients.
+*/
+   nir_ssa_def *frac_xy = nir_ffract(b, pos_xy);
+   /* Round the float coordinates down to nearest integer */
+   pos_xy = nir_fdiv(b, nir_ftrunc(b, pos_xy), scale);
+
+   nir_ssa_def *tex_data[4];
+   for (unsigned i = 0; i < 4; ++i) {
+  float sample_off_x = (float)(i & 0x1) / key->x_scale;
+  float sample_off_y = (float)((i >> 1) & 0x1) / key->y_scale;
+  nir_ssa_def *sample_off = nir_imm_vec2(b, sample_off_x, sample_off_y);
+
+  nir_ssa_def *sample_coords = nir_fadd(b, pos_xy, sample_off);
+  nir_ssa_def *sample_coords_int = nir_f2i(b, sample_coords);
+
+  /* The MCS value we fetch has to match up with the pixel that we're
+   * sampling from. Since we sample from different pixels in each
+   * iteration of this "for" loop, the call to mcs_fetch() should be
+   * here inside the loop after computing the pixel coordinates.
+   */
+  nir_ssa_def *mcs = NULL;
+  if (key->tex_layout == INTEL_MSAA_LAYOUT_CMS)
+ mcs = blorp_nir_txf_ms_mcs(b, sample_coords_int);
+
+  /* Compute sample index and map the sample index to a sample number.
+   * Sample index layout shows the numbering of slots in a rectangular
+   * grid of samples with in a pixel. Sample number layout shows the
+   * rectangular grid of samples roughly corresponding to the real sample
+   * locations with in a pixel.
+   * In case of 4x MSAA, layout of sample indices matches the layout of
+   * sample numbers:
+   *   -
+   *   | 0 | 1 |
+   *   -
+   *   | 2 | 3 |
+   *   -
+   *
+   * In case of 8x MSAA the two layouts don't match.
+   * sample index layout :  -sample number layout :  -
+   *| 0 | 1 || 5 | 2 |
+   *--
+   *| 2 | 3 || 4 | 6 |
+   *--
+   *| 4 | 5 || 0 | 3 |
+   *--
+   *| 6 | 7 || 7 | 1 |
+   *--
+   *
+   * Fortunately, this can be done fairly easily as:
+   * S' = (0x17306425 >> (S * 4)) & 0xf
+   */
+  nir_ssa_def *frac = nir_ffract(b, sample_coords);
+  nir_ssa_def *sample =
+ nir_fdot2(b, frac, nir_imm_vec2(b, key->x_scale,
+key->x_scale * key->y_scale));
+  sample = nir_f2i(b, sample);
+
+  if (tex_samples == 8) {
+ sample = nir_iand(b, nir_ishr(b, nir_imm_int(b, 0x17306425),
+   nir_ishl(b, sample, nir_imm_int(b, 2))),
+   nir_imm_int(b, 0xf));
+  }
+  nir_ssa_def *pos_ms = nir_vec3(b, 

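A scalar C model of the sample-index step in the new
blorp_nir_manual_blend_bilinear() is handy for checking the dot product by
hand.  This is a sketch that assumes the per-sample coordinates are
already in pixel units and non-negative (the shader clamps them with fmax
before this point).  The function names are hypothetical, and nir_f2i is
modelled as a C float-to-int truncation.

```c
#include <assert.h>

/* fract() for non-negative inputs; truncation equals floor here. */
static float
sketch_fract_nonneg(float x)
{
   assert(x >= 0.0f);
   return x - (float)(int)x;
}

/* The fractional part of the per-sample coordinate, dotted with
 * (x_scale, x_scale * y_scale), gives the row-major sample index; for
 * 8x MSAA it is then remapped to a sample number with the nibble table. */
static int
sketch_bilinear_sample(float x, float y, float x_scale, float y_scale,
                       unsigned tex_samples)
{
   float fx = sketch_fract_nonneg(x);
   float fy = sketch_fract_nonneg(y);
   int sample = (int)(fx * x_scale + fy * x_scale * y_scale);

   if (tex_samples == 8) {
      assert(sample >= 0 && sample < 8);
      sample = (0x17306425 >> (sample << 2)) & 0xf;
   }
   return sample;
}
```

For 8x MSAA (x_scale = 2, y_scale = 4) the fractional parts step by 1/2 in
X and 1/4 in Y, so the dot product walks the indices 0..7 in row-major
order before the remap.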
[Mesa-dev] [PATCH 23/28] i965/blorp: Add support for W-[de]tiling to the NIR path

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp | 107 ++-
 1 file changed, 105 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 27aab20..c0c02cf 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -571,6 +571,98 @@ blorp_nir_txf_ms_mcs(nir_builder *b, nir_ssa_def *pos)
return >dest.ssa;
 }
 
+static nir_ssa_def *
+nir_mask_shift_or(struct nir_builder *b, nir_ssa_def *dst, nir_ssa_def *src,
+  uint32_t src_mask, int src_left_shift)
+{
+   nir_ssa_def *masked = nir_iand(b, src, nir_imm_int(b, src_mask));
+
+   nir_ssa_def *shifted;
+   if (src_left_shift > 0) {
+  shifted = nir_ishl(b, masked, nir_imm_int(b, src_left_shift));
+   } else if (src_left_shift < 0) {
+  shifted = nir_ushr(b, masked, nir_imm_int(b, -src_left_shift));
+   } else {
+  assert(src_left_shift == 0);
+  shifted = masked;
+   }
+
+   return nir_ior(b, dst, shifted);
+}
+
+static inline nir_ssa_def *
+blorp_nir_retile_y_to_w(nir_builder *b, nir_ssa_def *pos)
+{
+   assert(pos->num_components == 2);
+   nir_ssa_def *x_Y = nir_channel(b, pos, 0);
+   nir_ssa_def *y_Y = nir_channel(b, pos, 1);
+
+   /* Given X and Y coordinates that describe an address using Y tiling,
+* translate to the X and Y coordinates that describe the same address
+* using W tiling.
+*
+* If we break down the low order bits of X and Y, using a
+* single letter to represent each low-order bit:
+*
+*   X = A << 7 | 0bBCDEFGH
+*   Y = J << 5 | 0bKLMNP   (1)
+*
+* Then we can apply the Y tiling formula to see the memory offset being
+* addressed:
+*
+*   offset = (J * tile_pitch + A) << 12 | 0bBCDKLMNPEFGH   (2)
+*
+* If we apply the W detiling formula to this memory location, that the
+* corresponding X' and Y' coordinates are:
+*
+*   X' = A << 6 | 0bBCDPFH (3)
+*   Y' = J << 6 | 0bKLMNEG
+*
+* Combining (1) and (3), we see that to transform (X, Y) to (X', Y'),
+* we need to make the following computation:
+*
+*   X' = (X & ~0b1011) >> 1 | (Y & 0b1) << 2 | X & 0b1 (4)
+*   Y' = (Y & ~0b1) << 1 | (X & 0b1000) >> 2 | (X & 0b10) >> 1
+*/
+   nir_ssa_def *x_W = nir_imm_int(b, 0);
+   x_W = nir_mask_shift_or(b, x_W, x_Y, 0xfff4, -1);
+   x_W = nir_mask_shift_or(b, x_W, y_Y, 0x1, 2);
+   x_W = nir_mask_shift_or(b, x_W, x_Y, 0x1, 0);
+
+   nir_ssa_def *y_W = nir_imm_int(b, 0);
+   y_W = nir_mask_shift_or(b, y_W, y_Y, 0xfffe, 1);
+   y_W = nir_mask_shift_or(b, y_W, x_Y, 0x8, -2);
+   y_W = nir_mask_shift_or(b, y_W, x_Y, 0x2, -1);
+
+   return nir_vec2(b, x_W, y_W);
+}
+
+static inline nir_ssa_def *
+blorp_nir_retile_w_to_y(nir_builder *b, nir_ssa_def *pos)
+{
+   assert(pos->num_components == 2);
+   nir_ssa_def *x_W = nir_channel(b, pos, 0);
+   nir_ssa_def *y_W = nir_channel(b, pos, 1);
+
+   /* Applying the same logic as above, but in reverse, we obtain the
+* formulas:
+*
+* X' = (X & ~0b101) << 1 | (Y & 0b10) << 2 | (Y & 0b1) << 1 | X & 0b1
+* Y' = (Y & ~0b11) >> 1 | (X & 0b100) >> 2
+*/
+   nir_ssa_def *x_Y = nir_imm_int(b, 0);
+   x_Y = nir_mask_shift_or(b, x_Y, x_W, 0xfffa, 1);
+   x_Y = nir_mask_shift_or(b, x_Y, y_W, 0x2, 2);
+   x_Y = nir_mask_shift_or(b, x_Y, y_W, 0x1, 1);
+   x_Y = nir_mask_shift_or(b, x_Y, x_W, 0x1, 0);
+
+   nir_ssa_def *y_Y = nir_imm_int(b, 0);
+   y_Y = nir_mask_shift_or(b, y_Y, y_W, 0xfffc, -1);
+   y_Y = nir_mask_shift_or(b, y_Y, x_W, 0x4, -2);
+
+   return nir_vec2(b, x_Y, y_Y);
+}
+
 /**
  * Generator for WM programs used in BLORP blits.
  *
@@ -786,7 +878,12 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
if (rt_tiled_w != key->dst_tiled_w ||
key->rt_samples != key->dst_samples ||
key->rt_layout != key->dst_layout) {
-  goto fail;
+  if (key->rt_samples != key->dst_samples ||
+  key->rt_layout != key->dst_layout ||
+  key->rt_samples != 0)
+ goto fail;
+  if (rt_tiled_w != key->dst_tiled_w)
+ dst_pos = blorp_nir_retile_y_to_w(&b, dst_pos);
}
 
/* Now (X, Y, S) = decode_msaa(dst_samples, detile(dst_tiling, offset)).
@@ -832,7 +929,13 @@ brw_blorp_build_nir_shader(struct brw_context *brw,
key->tex_samples != key->src_samples ||
key->tex_layout != key->src_layout) &&
   !key->bilinear_filter) {
- goto fail;
+ if (key->tex_samples != key->src_samples ||
+ key->tex_layout != key->src_layout ||
+ key->tex_samples != 0)
+goto fail;
+
+ if (tex_tiled_w != key->src_tiled_w)
+src_pos = blorp_nir_retile_w_to_y(&b, src_pos);
   }
 
   if (key->bilinear_filter) {
-- 
2.5.0.400.gff86faf


[Mesa-dev] [PATCH 12/28] nir: Add texture opcodes and source types for multisample compression

2016-05-10 Thread Jason Ekstrand
Intel hardware does a form of multisample compression that involves an
auxiliary surface called the MCS.  When an MCS is in use, you have to first
sample from the MCS with a special opcode and then pass the result of that
operation into the next sample instruction.  Normally, we just do this
ourselves in the back-end, but we want to expose that functionality to NIR
so that we can use MCS values directly in NIR-based blorp.
---
 src/compiler/nir/nir.h   | 6 ++
 src/compiler/nir/nir_print.c | 6 ++
 2 files changed, 12 insertions(+)

diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 4b70a45..9c59dc4 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1068,6 +1068,7 @@ typedef enum {
nir_tex_src_bias,
nir_tex_src_lod,
nir_tex_src_ms_index, /* MSAA sample index */
+   nir_tex_src_ms_mcs, /* MSAA compression value */
nir_tex_src_ddx,
nir_tex_src_ddy,
nir_tex_src_texture_offset, /* < dynamically uniform indirect offset */
@@ -1087,6 +1088,7 @@ typedef enum {
nir_texop_txd,/**< Texture look-up with partial derivatvies */
nir_texop_txf,/**< Texel fetch with explicit LOD */
nir_texop_txf_ms,/**< Multisample texture fetch */
+   nir_texop_txf_ms_mcs, /**< Multisample compression value fetch */
nir_texop_txs,/**< Texture size */
nir_texop_lod,/**< Texture lod query */
nir_texop_tg4,/**< Texture gather */
@@ -1215,6 +1217,7 @@ nir_tex_instr_is_query(nir_tex_instr *instr)
case nir_texop_lod:
case nir_texop_texture_samples:
case nir_texop_query_levels:
+   case nir_texop_txf_ms_mcs:
   return true;
case nir_texop_tex:
case nir_texop_txb:
@@ -1235,6 +1238,9 @@ nir_tex_instr_src_size(nir_tex_instr *instr, unsigned src)
if (instr->src[src].src_type == nir_tex_src_coord)
   return instr->coord_components;
 
+   /* The MCS value is expected to be a vec4 returned by a txf_ms_mcs */
+   if (instr->src[src].src_type == nir_tex_src_ms_mcs)
+  return 4;
 
if (instr->src[src].src_type == nir_tex_src_offset ||
instr->src[src].src_type == nir_tex_src_ddx ||
diff --git a/src/compiler/nir/nir_print.c b/src/compiler/nir/nir_print.c
index a36561e..583f66c 100644
--- a/src/compiler/nir/nir_print.c
+++ b/src/compiler/nir/nir_print.c
@@ -626,6 +626,9 @@ print_tex_instr(nir_tex_instr *instr, print_state *state)
case nir_texop_txf_ms:
   fprintf(fp, "txf_ms ");
   break;
+   case nir_texop_txf_ms_mcs:
+  fprintf(fp, "txf_ms_mcs ");
+  break;
case nir_texop_txs:
   fprintf(fp, "txs ");
   break;
@@ -676,6 +679,9 @@ print_tex_instr(nir_tex_instr *instr, print_state *state)
   case nir_tex_src_ms_index:
  fprintf(fp, "(ms_index)");
  break;
+  case nir_tex_src_ms_mcs:
+ fprintf(fp, "(ms_mcs)");
+ break;
   case nir_tex_src_ddx:
  fprintf(fp, "(ddx)");
  break;
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 06/28] i965/gen7_wm: Move where we set the fast clear op

2016-05-10 Thread Jason Ekstrand
This better matches the gen8 state setup.
---
 src/mesa/drivers/dri/i965/gen7_wm_state.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/gen7_wm_state.c b/src/mesa/drivers/dri/i965/gen7_wm_state.c
index 17dea99..8d2e2c3 100644
--- a/src/mesa/drivers/dri/i965/gen7_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen7_wm_state.c
@@ -214,6 +214,8 @@ gen7_upload_ps_state(struct brw_context *brw,
if (prog_data->num_varying_inputs != 0)
   dw4 |= GEN7_PS_ATTRIBUTE_ENABLE;
 
+   dw4 |= fast_clear_op;
+
if (prog_data->prog_offset_16 || prog_data->no_8) {
   dw4 |= GEN7_PS_16_DISPATCH_ENABLE;
 
@@ -243,8 +245,6 @@ gen7_upload_ps_state(struct brw_context *brw,
   ksp0 = stage_state->prog_offset;
}
 
-   dw4 |= fast_clear_op;
-
BEGIN_BATCH(8);
OUT_BATCH(_3DSTATE_PS << 16 | (8 - 2));
OUT_BATCH(ksp0);
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 14/28] i965/blorp: Add a prog_data_init helper

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 8 
 src/mesa/drivers/dri/i965/brw_blorp.h | 2 ++
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 2 +-
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 2 +-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 247fd75..4bbe45f 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -135,6 +135,14 @@ brw_blorp_compute_tile_offsets(const struct brw_blorp_surface_info *info,
 
 
 void
+brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data)
+{
+   prog_data->first_curbe_grf = 0;
+   prog_data->persample_msaa_dispatch = false;
+}
+
+
+void
 brw_blorp_params_init(struct brw_blorp_params *params)
 {
memset(params, 0, sizeof(*params));
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h
index c5c2c4e..4a0e46e 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -214,6 +214,8 @@ struct brw_blorp_prog_data
bool persample_msaa_dispatch;
 };
 
+void brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data);
+
 struct brw_blorp_params
 {
uint32_t x0;
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index 897ce99..ed43184 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -632,7 +632,7 @@ brw_blorp_blit_program::compile(struct brw_context *brw, bool debug_flag,
   (key->dst_samples == 0));
 
/* Set up prog_data */
-   memset(&prog_data, 0, sizeof(prog_data));
+   brw_blorp_prog_data_init(&prog_data);
prog_data.persample_msaa_dispatch = key->persample_msaa_dispatch;
 
alloc_regs();
diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index ed537ba..5ed46e1 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -160,7 +160,7 @@ brw_blorp_const_color_program::compile(struct brw_context *brw,
GLuint *program_size)
 {
/* Set up prog_data */
-   memset(&prog_data, 0, sizeof(prog_data));
+   brw_blorp_prog_data_init(&prog_data);
prog_data.persample_msaa_dispatch = false;
 
alloc_regs();
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 15/28] i965/blorp: Add a param array to prog_data

2016-05-10 Thread Jason Ekstrand
This array allows the push constants to be re-arranged on upload.  The
actual arrangement will, eventually, come from the back-end compiler.
---
 src/mesa/drivers/dri/i965/brw_blorp.c  |  4 
 src/mesa/drivers/dri/i965/brw_blorp.h  |  6 ++
 src/mesa/drivers/dri/i965/gen6_blorp.c | 12 +++-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 4bbe45f..1379804 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -139,6 +139,10 @@ brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data)
 {
prog_data->first_curbe_grf = 0;
prog_data->persample_msaa_dispatch = false;
+
+   prog_data->nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
+   for (unsigned i = 0; i < BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS; i++)
+  prog_data->param[i] = i;
 }
 
 
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h
index 4a0e46e..c2f33a1 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -199,6 +199,9 @@ struct brw_blorp_wm_push_constants
uint32_t pad[5];
 };
 
+#define BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS \
+   (sizeof(struct brw_blorp_wm_push_constants) / 4)
+
 /* Every 32 bytes of push constant data constitutes one GEN register. */
 static const unsigned int BRW_BLORP_NUM_PUSH_CONST_REGS =
sizeof(struct brw_blorp_wm_push_constants) / 32;
@@ -212,6 +215,9 @@ struct brw_blorp_prog_data
 * than one sample per pixel.
 */
bool persample_msaa_dispatch;
+
+   uint8_t nr_params;
+   uint8_t param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
 };
 
 void brw_blorp_prog_data_init(struct brw_blorp_prog_data *prog_data);
diff --git a/src/mesa/drivers/dri/i965/gen6_blorp.c b/src/mesa/drivers/dri/i965/gen6_blorp.c
index 1955811..950e2b9 100644
--- a/src/mesa/drivers/dri/i965/gen6_blorp.c
+++ b/src/mesa/drivers/dri/i965/gen6_blorp.c
@@ -308,11 +308,13 @@ gen6_blorp_emit_wm_constants(struct brw_context *brw,
 {
uint32_t wm_push_const_offset;
 
-   void *constants = brw_state_batch(brw, AUB_TRACE_WM_CONSTANTS,
- sizeof(params->wm_push_consts),
- 32, &wm_push_const_offset);
-   memcpy(constants, &params->wm_push_consts,
-  sizeof(params->wm_push_consts));
+   uint32_t *constants = brw_state_batch(brw, AUB_TRACE_WM_CONSTANTS,
+ sizeof(params->wm_push_consts),
+ 32, &wm_push_const_offset);
+
+   uint32_t *push_consts = (uint32_t *)&params->wm_push_consts;
+   for (unsigned i = 0; i < params->wm_prog_data->nr_params; i++)
+  constants[i] = push_consts[params->wm_prog_data->param[i]];
 
return wm_push_const_offset;
 }
-- 
2.5.0.400.gff86faf


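For the param array introduced in PATCH 15/28, the upload loop in gen6_blorp_emit_wm_constants() is a gather: destination dword `i` is read from source dword `param[i]`. The following is a minimal, stand-alone sketch of that idea, not the driver code; `struct example_push_consts`, its 8-dword size, and the `example_*` helpers are hypothetical stand-ins for `struct brw_blorp_wm_push_constants`, `brw_blorp_prog_data_init()` and the upload function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct brw_blorp_wm_push_constants. */
struct example_push_consts {
   uint32_t dw[8];
};

#define EXAMPLE_NUM_PUSH_CONSTANT_DWORDS \
   (sizeof(struct example_push_consts) / 4)

struct example_prog_data {
   uint8_t nr_params;
   uint8_t param[EXAMPLE_NUM_PUSH_CONSTANT_DWORDS];
};

/* Identity mapping, as in brw_blorp_prog_data_init(): slot i reads dword i. */
static void
example_prog_data_init(struct example_prog_data *prog_data)
{
   prog_data->nr_params = EXAMPLE_NUM_PUSH_CONSTANT_DWORDS;
   for (unsigned i = 0; i < EXAMPLE_NUM_PUSH_CONSTANT_DWORDS; i++)
      prog_data->param[i] = i;
}

/* Gather: destination slot i receives source dword param[i]. */
static void
example_upload(uint32_t *dst, const struct example_push_consts *src,
               const struct example_prog_data *prog_data)
{
   const uint32_t *push_consts = (const uint32_t *)src;

   for (unsigned i = 0; i < prog_data->nr_params; i++) {
      assert(prog_data->param[i] < EXAMPLE_NUM_PUSH_CONSTANT_DWORDS);
      dst[i] = push_consts[prog_data->param[i]];
   }
}
```

With the identity mapping the gather degenerates to the old memcpy(); only a non-identity mapping, which the commit message says will eventually come from the back-end compiler, changes anything. Storing the indices as `uint8_t`, as the patch does, assumes fewer than 256 push-constant dwords.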

[Mesa-dev] [PATCH 11/28] nir/builder: Add a helper for grabbing multiple channels from an ssa def

2016-05-10 Thread Jason Ekstrand
This is similar to nir_channel except that it lets you grab more than one
channel by providing a mask.
---
 src/compiler/nir/nir_builder.h | 14 ++
 src/intel/vulkan/anv_meta_blit2d.c |  4 +---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 756188f..32d619f 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -324,6 +324,20 @@ nir_channel(nir_builder *b, nir_ssa_def *def, unsigned c)
return nir_swizzle(b, def, swizzle, 1, false);
 }
 
+static inline nir_ssa_def *
+nir_channels(nir_builder *b, nir_ssa_def *def, unsigned mask)
+{
+   unsigned num_channels = 0, swizzle[4] = { 0, 0, 0, 0 };
+
+   for (unsigned i = 0; i < 4; i++) {
+  if ((mask & (1 << i)) == 0)
+ continue;
+  swizzle[num_channels++] = i;
+   }
+
+   return nir_swizzle(b, def, swizzle, num_channels, false);
+}
+
 /**
  * Turns a nir_src into a nir_ssa_def * so it can be passed to
  * nir_build_alu()-based builder calls.
diff --git a/src/intel/vulkan/anv_meta_blit2d.c b/src/intel/vulkan/anv_meta_blit2d.c
index 577eeae..06e1043 100644
--- a/src/intel/vulkan/anv_meta_blit2d.c
+++ b/src/intel/vulkan/anv_meta_blit2d.c
@@ -1010,9 +1010,7 @@ build_nir_w_tiled_fragment_shader(struct anv_device *device,
discard->src[0] = nir_src_for_ssa(oob);
nir_builder_instr_insert(&b, &discard->instr);
 
-   unsigned swiz[4] = { 0, 1, 0, 0 };
-   nir_ssa_def *tex_off =
-  nir_swizzle(&b, nir_load_var(&b, tex_off_in), swiz, 2, false);
+   nir_ssa_def *tex_off = nir_channels(&b, nir_load_var(&b, tex_off_in), 0x3);
nir_ssa_def *tex_pos = nir_iadd(&b, nir_vec2(&b, x_W, y_W), tex_off);
nir_ssa_def *tex_pitch = nir_channel(&b, nir_load_var(&b, tex_off_in), 2);
 
-- 
2.5.0.400.gff86faf


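The heart of nir_channels() from PATCH 11/28 is turning a channel mask into a packed swizzle array plus a component count, which it then passes to nir_swizzle(). That step can be isolated without any NIR types; `mask_to_swizzle()` below is a hypothetical name used only for this sketch.

```c
#include <assert.h>

/* Pack the set bits of a 4-bit channel mask into consecutive swizzle
 * slots.  Returns the number of channels selected; unused slots stay 0,
 * matching the zero-initialised swizzle[] in nir_channels(). */
static unsigned
mask_to_swizzle(unsigned mask, unsigned swizzle[4])
{
   unsigned num_channels = 0;

   assert(mask <= 0xf); /* only x, y, z, w exist */

   for (unsigned i = 0; i < 4; i++)
      swizzle[i] = 0;

   for (unsigned i = 0; i < 4; i++) {
      if ((mask & (1u << i)) == 0)
         continue;
      swizzle[num_channels++] = i;
   }

   return num_channels;
}
```

For `mask = 0x3` this produces `{ 0, 1, 0, 0 }` with a count of 2, the same swizzle the anv_meta_blit2d.c hunk used to spell out by hand.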

[Mesa-dev] [PATCH 10/28] nir/builder: Generate the alu helpers directly in python

2016-05-10 Thread Jason Ekstrand
There's no reason for having a macro *and* a python generator.  We can
easily just do the whole thing in python.  This has the advantage that we
are no longer defining ALU# macros which conflict with the ones in
brw_fs_builder.h.
---
 src/compiler/nir/nir_builder.h| 30 --
 src/compiler/nir/nir_builder_opcodes_h.py | 14 +-
 2 files changed, 13 insertions(+), 31 deletions(-)

diff --git a/src/compiler/nir/nir_builder.h b/src/compiler/nir/nir_builder.h
index 4fa9779..756188f 100644
--- a/src/compiler/nir/nir_builder.h
+++ b/src/compiler/nir/nir_builder.h
@@ -233,36 +233,6 @@ nir_build_alu(nir_builder *build, nir_op op, nir_ssa_def *src0,
return &instr->dest.dest.ssa;
 }
 
-#define ALU1(op)  \
-static inline nir_ssa_def *   \
-nir_##op(nir_builder *build, nir_ssa_def *src0)   \
-{ \
-   return nir_build_alu(build, nir_op_##op, src0, NULL, NULL, NULL);  \
-}
-
-#define ALU2(op)  \
-static inline nir_ssa_def *   \
-nir_##op(nir_builder *build, nir_ssa_def *src0, nir_ssa_def *src1)\
-{ \
-   return nir_build_alu(build, nir_op_##op, src0, src1, NULL, NULL);  \
-}
-
-#define ALU3(op)  \
-static inline nir_ssa_def *   \
-nir_##op(nir_builder *build, nir_ssa_def *src0,   \
- nir_ssa_def *src1, nir_ssa_def *src2)\
-{ \
-   return nir_build_alu(build, nir_op_##op, src0, src1, src2, NULL);  \
-}
-
-#define ALU4(op)  \
-static inline nir_ssa_def *   \
-nir_##op(nir_builder *build, nir_ssa_def *src0,   \
- nir_ssa_def *src1, nir_ssa_def *src2, nir_ssa_def *src3) \
-{ \
-   return nir_build_alu(build, nir_op_##op, src0, src1, src2, src3);  \
-}
-
 #include "nir_builder_opcodes.h"
 
 static inline nir_ssa_def *
diff --git a/src/compiler/nir/nir_builder_opcodes_h.py b/src/compiler/nir/nir_builder_opcodes_h.py
index 038e2b4..42eb6e0 100644
--- a/src/compiler/nir/nir_builder_opcodes_h.py
+++ b/src/compiler/nir/nir_builder_opcodes_h.py
@@ -26,8 +26,20 @@ template = """\
 #ifndef _NIR_BUILDER_OPCODES_
 #define _NIR_BUILDER_OPCODES_
 
+<%
+def src_decl_list(num_srcs):
+   return ', '.join('nir_ssa_def *src' + str(i) for i in range(num_srcs))
+
+def src_list(num_srcs):
+   return ', '.join('src' + str(i) if i < num_srcs else 'NULL' for i in range(4))
+%>
+
 % for name, opcode in sorted(opcodes.iteritems()):
-ALU${opcode.num_inputs}(${name})
+static inline nir_ssa_def *
+nir_${name}(nir_builder *build, ${src_decl_list(opcode.num_inputs)})
+{
+   return nir_build_alu(build, nir_op_${name}, ${src_list(opcode.num_inputs)});
+}
 % endfor
 
 #endif /* _NIR_BUILDER_OPCODES_ */"""
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 04/28] i965/fs: Clean up the logic in compile_fs a bit

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 73 
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 71e759d..d136ba8 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -6017,52 +6017,56 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
key->persample_interp,
shader);
 
-   fs_visitor v(compiler, log_data, mem_ctx, key,
-&prog_data->base, prog, shader, 8,
-shader_time_index8);
-   if (!v.run_fs(false /* do_rep_send */)) {
+   cfg_t *simd8_cfg = NULL, *simd16_cfg = NULL;
+
+   fs_visitor v8(compiler, log_data, mem_ctx, key,
+ &prog_data->base, prog, shader, 8,
+ shader_time_index8);
+   if (!v8.run_fs(false /* do_rep_send */)) {
   if (error_str)
- *error_str = ralloc_strdup(mem_ctx, v.fail_msg);
+ *error_str = ralloc_strdup(mem_ctx, v8.fail_msg);
 
   return NULL;
+   } else if (likely(!(INTEL_DEBUG & DEBUG_NO8))) {
+  simd8_cfg = v8.cfg;
}
 
-   cfg_t *simd16_cfg = NULL;
-   fs_visitor v2(compiler, log_data, mem_ctx, key,
- &prog_data->base, prog, shader, 16,
- shader_time_index16);
-   if (likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
-  if (!v.simd16_unsupported) {
- /* Try a SIMD16 compile */
- v2.import_uniforms();
- if (!v2.run_fs(use_rep_send)) {
-compiler->shader_perf_log(log_data,
-  "SIMD16 shader failed to compile: %s",
-  v2.fail_msg);
- } else {
-simd16_cfg = v2.cfg;
- }
+   if (!v8.simd16_unsupported &&
+   likely(!(INTEL_DEBUG & DEBUG_NO16) || use_rep_send)) {
+  /* Try a SIMD16 compile */
+  fs_visitor v16(compiler, log_data, mem_ctx, key,
+ &prog_data->base, prog, shader, 16,
+ shader_time_index16);
+  v16.import_uniforms();
+  if (!v16.run_fs(use_rep_send)) {
+ compiler->shader_perf_log(log_data,
+   "SIMD16 shader failed to compile: %s",
+   v16.fail_msg);
+  } else {
+ simd16_cfg = v16.cfg;
   }
}
 
+   /* When the caller requests a repclear shader, they want SIMD16-only */
+   if (use_rep_send)
+  simd8_cfg = NULL;
+
+   /* Prior to Iron Lake, the PS had a single shader offset with a jump table
+* at the top to select the shader.  We've never implemented that.
+* Instead, we just give them exactly one shader and we pick the widest one
+* available.
+*/
+   if (compiler->devinfo->gen < 5 && simd16_cfg)
+  simd8_cfg = NULL;
+
/* We have to compute the flat inputs after the visitor is finished running
 * because it relies on prog_data->urb_setup which is computed in
 * fs_visitor::calculate_urb_setup().
 */
brw_compute_flat_inputs(prog_data, key->flat_shade, shader);
 
-   cfg_t *simd8_cfg;
-   int no_simd8 = (INTEL_DEBUG & DEBUG_NO8) || use_rep_send;
-   if ((no_simd8 || compiler->devinfo->gen < 5) && simd16_cfg) {
-  simd8_cfg = NULL;
-  prog_data->no_8 = true;
-   } else {
-  simd8_cfg = v.cfg;
-  prog_data->no_8 = false;
-   }
-
fs_generator g(compiler, log_data, mem_ctx, (void *) key, &prog_data->base,
-  v.promoted_constants, v.runtime_check_aads_emit,
+  v8.promoted_constants, v8.runtime_check_aads_emit,
   MESA_SHADER_FRAGMENT);
 
if (unlikely(INTEL_DEBUG & DEBUG_WM)) {
@@ -6072,8 +6076,13 @@ brw_compile_fs(const struct brw_compiler *compiler, void *log_data,
  shader->info.name));
}
 
-   if (simd8_cfg)
+   if (simd8_cfg) {
   g.generate_code(simd8_cfg, 8);
+  prog_data->no_8 = false;
+   } else {
+  prog_data->no_8 = true;
+   }
+
if (simd16_cfg)
   prog_data->prog_offset_16 = g.generate_code(simd16_cfg, 16);
 
-- 
2.5.0.400.gff86faf


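The reworked control flow in brw_compile_fs() (PATCH 04/28) is easier to audit when reduced to booleans. This is a sketch under that simplification, not driver code: `pick_fs_widths()` and `struct fs_widths` are hypothetical, `simd8_ok`/`simd16_ok` stand for v8.run_fs()/v16.run_fs() succeeding, and `no8`/`no16` for the DEBUG_NO8/DEBUG_NO16 bits of INTEL_DEBUG. It tracks only which CFGs survive and the resulting prog_data->no_8.

```c
#include <assert.h>
#include <stdbool.h>

struct fs_widths {
   bool simd8;   /* simd8_cfg != NULL */
   bool simd16;  /* simd16_cfg != NULL */
   bool no_8;    /* value stored in prog_data->no_8 */
   bool failed;  /* brw_compile_fs() returns NULL */
};

static struct fs_widths
pick_fs_widths(int gen, bool simd8_ok, bool simd16_unsupported,
               bool simd16_ok, bool no8, bool no16, bool use_rep_send)
{
   struct fs_widths w = { false, false, false, false };

   assert(gen >= 4);

   /* The SIMD8 compile always runs and its failure is fatal. */
   if (!simd8_ok) {
      w.failed = true;
      return w;
   }
   w.simd8 = !no8;

   /* SIMD16 is tried unless unsupported or disabled; repclear forces it. */
   if (!simd16_unsupported && (!no16 || use_rep_send))
      w.simd16 = simd16_ok;

   /* Repclear shaders are SIMD16-only. */
   if (use_rep_send)
      w.simd8 = false;

   /* Pre-Iron Lake hardware gets exactly one program: the widest one. */
   if (gen < 5 && w.simd16)
      w.simd8 = false;

   w.no_8 = !w.simd8;
   return w;
}
```

Reading the diff this way also shows one behavioural difference from the removed code: previously DEBUG_NO8 or repclear only dropped SIMD8 when a SIMD16 CFG existed (`(no_simd8 || gen < 5) && simd16_cfg`), whereas now SIMD8 is dropped unconditionally, so, for example, `pick_fs_widths(9, true, false, false, true, false, false)` ends up with neither width.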

[Mesa-dev] [PATCH 13/28] i965/fs: Implement the new NIR MCS texturing

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index c2274ba..350c14f 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -3671,6 +3671,7 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr)
  switch (instr->op) {
  case nir_texop_txf:
  case nir_texop_txf_ms:
+ case nir_texop_txf_ms_mcs:
  case nir_texop_samples_identical:
 srcs[TEX_LOGICAL_SRC_COORDINATE] = retype(src, BRW_REGISTER_TYPE_D);
 break;
@@ -3745,13 +3746,19 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr)
  break;
   }
 
+  case nir_tex_src_ms_mcs:
+ assert(instr->op == nir_texop_txf_ms);
+ srcs[TEX_LOGICAL_SRC_MCS] = retype(src, BRW_REGISTER_TYPE_D);
+ break;
+
   default:
  unreachable("unknown texture source");
   }
}
 
-   if (instr->op == nir_texop_txf_ms ||
-   instr->op == nir_texop_samples_identical) {
+   if (srcs[TEX_LOGICAL_SRC_MCS].file == BAD_FILE &&
+   (instr->op == nir_texop_txf_ms ||
+instr->op == nir_texop_samples_identical)) {
   if (devinfo->gen >= 7 &&
   key_tex->compressed_multisample_layout_mask & (1 << texture)) {
  srcs[TEX_LOGICAL_SRC_MCS] =
@@ -3797,6 +3804,9 @@ fs_visitor::nir_emit_texture(const fs_builder &bld, nir_tex_instr *instr)
   else
  opcode = SHADER_OPCODE_TXF_CMS_LOGICAL;
   break;
+   case nir_texop_txf_ms_mcs:
+  opcode = SHADER_OPCODE_TXF_MCS_LOGICAL;
+  break;
case nir_texop_query_levels:
case nir_texop_txs:
   opcode = SHADER_OPCODE_TXS_LOGICAL;
-- 
2.5.0.400.gff86faf


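In PATCH 13/28 the implicit MCS fetch becomes conditional on NIR not already supplying a `nir_tex_src_ms_mcs` source. A condensed sketch of that decision, with hypothetical names: `has_mcs_src` stands for `srcs[TEX_LOGICAL_SRC_MCS].file != BAD_FILE`, `compressed_mask` for `key_tex->compressed_multisample_layout_mask`, and `enum example_texop` for the relevant nir_texop values. What the driver does when no fetch is needed lies outside the quoted hunk and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of nir_texop used by this sketch. */
enum example_texop {
   EX_TXF,
   EX_TXF_MS,
   EX_TXF_MS_MCS,
   EX_SAMPLES_IDENTICAL,
};

static bool
needs_implicit_mcs_fetch(enum example_texop op, bool has_mcs_src, int gen,
                         unsigned compressed_mask, unsigned texture)
{
   assert(texture < 32);

   /* An explicit MCS source from NIR wins: nothing to fetch. */
   if (has_mcs_src)
      return false;

   /* Only multisample fetches and samples_identical consume MCS data. */
   if (op != EX_TXF_MS && op != EX_SAMPLES_IDENTICAL)
      return false;

   /* The hunk only emits the fetch on gen7+ for compressed textures. */
   return gen >= 7 && (compressed_mask & (1u << texture)) != 0;
}
```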

[Mesa-dev] [PATCH 17/28] i965/blorp: Add a helper for compiling NIR shaders

2016-05-10 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 95 +++
 src/mesa/drivers/dri/i965/brw_blorp.h | 10 
 2 files changed, 105 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 6c3b83a..161fb90 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -26,6 +26,8 @@
 #include "intel_fbo.h"
 
 #include "brw_blorp.h"
+#include "brw_compiler.h"
+#include "brw_nir.h"
 #include "brw_state.h"
 
 #define FILE_DEBUG_FLAG DEBUG_BLORP
@@ -161,6 +163,99 @@ brw_blorp_params_init(struct brw_blorp_params *params)
params->num_layers = 1;
 }
 
+void
+brw_blorp_init_wm_prog_key(struct brw_wm_prog_key *wm_key)
+{
+   memset(wm_key, 0, sizeof(*wm_key));
+   wm_key->nr_color_regions = 1;
+   for (int i = 0; i < MAX_SAMPLERS; i++)
+  wm_key->tex.swizzles[i] = SWIZZLE_XYZW;
+}
+
+static int
+nir_uniform_type_size(const struct glsl_type *type)
+{
+   /* Only very basic types are allowed */
+   assert(glsl_type_is_vector_or_scalar(type));
+   assert(glsl_get_bit_size(glsl_get_base_type(type)) == 32);
+
+   return glsl_get_vector_elements(type) * 4;
+}
+
+const unsigned *
+brw_blorp_compile_nir_shader(struct brw_context *brw, struct nir_shader *nir,
+ const struct brw_wm_prog_key *wm_key,
+ bool use_repclear,
+ struct brw_blorp_prog_data *prog_data,
+ unsigned *program_size)
+{
+   const struct brw_compiler *compiler = brw->intelScreen->compiler;
+
+   void *mem_ctx = ralloc_context(NULL);
+
+   /* Calling brw_preprocess_nir and friends is destructive and, if cloning is
+* enabled, may end up completely replacing the nir_shader.  Therefore, we
+* own it and might as well put it in our context for easy cleanup.
+*/
+   ralloc_steal(mem_ctx, nir);
+   nir->options =
+  compiler->glsl_compiler_options[MESA_SHADER_FRAGMENT].NirOptions;
+
+   struct brw_wm_prog_data wm_prog_data;
+   memset(&wm_prog_data, 0, sizeof(wm_prog_data));
+
+   /* We set up the params array but instead of making them point at actual
+* GL constant values, they just store an index.  This is just fine as the
+* backend compiler never looks at the contents of the pointers, it just
+* re-arranges them for us.
+*/
+   const union gl_constant_value *param[BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS];
+   for (unsigned i = 0; i < ARRAY_SIZE(param); i++)
+  param[i] = (const union gl_constant_value *)(intptr_t)i;
+
+   wm_prog_data.base.nr_params = BRW_BLORP_NUM_PUSH_CONSTANT_DWORDS;
+   wm_prog_data.base.param = param;
+
+   /* BLORP always just uses the first two binding table entries */
+   wm_prog_data.binding_table.render_target_start = 0;
+   wm_prog_data.base.binding_table.texture_start = 1;
+
+   nir = brw_preprocess_nir(compiler, nir);
+   nir_remove_dead_variables(nir, nir_var_shader_in);
+   nir_shader_gather_info(nir, nir_shader_get_entrypoint(nir)->impl);
+
+   /* Uniforms are required to be lowered before going into compile_fs.  For
+* BLORP, we'll assume that whoever builds the shader sets the location
+* they want so we just need to lower them and figure out how many we have
+* in total.
+*/
+   nir->num_uniforms = 0;
+   nir_foreach_variable(var, &nir->uniforms) {
+  var->data.driver_location = var->data.location;
+  unsigned end = var->data.location + nir_uniform_type_size(var->type);
+  nir->num_uniforms = MAX2(nir->num_uniforms, end);
+   }
+   nir_lower_io(nir, nir_var_uniform, nir_uniform_type_size);
+
+   const unsigned *program =
+  brw_compile_fs(compiler, brw, mem_ctx, wm_key, &wm_prog_data, nir,
+ NULL, -1, -1, use_repclear, program_size, NULL);
+
+   /* Copy the relevant bits of wm_prog_data over into the blorp prog data */
+   prog_data->dispatch_8 = wm_prog_data.dispatch_8;
+   prog_data->dispatch_16 = wm_prog_data.dispatch_16;
+   prog_data->first_curbe_grf_0 = wm_prog_data.base.dispatch_grf_start_reg;
+   prog_data->first_curbe_grf_2 = wm_prog_data.dispatch_grf_start_reg_2;
+   prog_data->ksp_offset_2 = wm_prog_data.prog_offset_2;
+   prog_data->persample_msaa_dispatch = wm_prog_data.persample_dispatch;
+
+   prog_data->nr_params = wm_prog_data.base.nr_params;
+   for (unsigned i = 0; i < ARRAY_SIZE(param); i++)
+  prog_data->param[i] = (uintptr_t)wm_prog_data.base.param[i];
+
+   return program;
+}
+
 /**
  * Perform a HiZ or depth resolve operation.
  *
diff --git a/src/mesa/drivers/dri/i965/brw_blorp.h b/src/mesa/drivers/dri/i965/brw_blorp.h
index b38b689..5f7569c 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.h
+++ b/src/mesa/drivers/dri/i965/brw_blorp.h
@@ -29,6 +29,7 @@
 #include "intel_mipmap_tree.h"
 
 struct brw_context;
+struct brw_wm_prog_key;
 
 #ifdef __cplusplus
 extern "C" {
@@ -356,6 +357,15 @@ struct brw_blorp_blit_prog_key
  * Used internally by gen6_blorp_exec() and gen7_blorp_exec().
  */
 
+void 

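Two details of brw_blorp_compile_nir_shader() (PATCH 17/28) can be shown in isolation: `param[]` holds small integer indices disguised as pointers, which survive because the back-end only permutes the pointers, and `num_uniforms` is the furthest byte reached by any uniform (`location + 4 * components`). The sketch below assumes a hypothetical `struct example_uniform` in place of NIR variables and glsl types. Converting integers to pointers and back is implementation-defined in ISO C, but the patch relies on the same `intptr_t`/`uintptr_t` round trip.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NUM_PARAMS 8

/* Hypothetical stand-in for a NIR uniform variable. */
struct example_uniform {
   unsigned location;    /* byte offset chosen by whoever built the shader */
   unsigned components;  /* 1..4 32-bit components */
};

/* Mirrors nir_uniform_type_size(): 4 bytes per 32-bit component. */
static unsigned
example_uniform_size(const struct example_uniform *u)
{
   assert(u->components >= 1 && u->components <= 4);
   return u->components * 4;
}

/* num_uniforms = max over all uniforms of (location + size). */
static unsigned
example_num_uniforms(const struct example_uniform *u, unsigned count)
{
   unsigned num_uniforms = 0;

   for (unsigned i = 0; i < count; i++) {
      unsigned end = u[i].location + example_uniform_size(&u[i]);
      if (end > num_uniforms)
         num_uniforms = end;
   }
   return num_uniforms;
}

/* Store indices in pointer slots instead of real constant pointers. */
static void
example_init_params(const void *param[EXAMPLE_NUM_PARAMS])
{
   for (unsigned i = 0; i < EXAMPLE_NUM_PARAMS; i++)
      param[i] = (const void *)(intptr_t)i;
}

/* Recover the index after the compiler has rearranged the slots. */
static unsigned
example_param_index(const void *p)
{
   return (unsigned)(uintptr_t)p;
}
```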
[Mesa-dev] [PATCH 01/28] nir: Add an info bit for uses_sample_qualifier

2016-05-10 Thread Jason Ekstrand
---
 src/compiler/nir/glsl_to_nir.cpp   | 1 +
 src/compiler/nir/nir.h | 5 +
 src/compiler/nir/nir_gather_info.c | 8 +++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/glsl_to_nir.cpp b/src/compiler/nir/glsl_to_nir.cpp
index e3fa623..4e50d5c 100644
--- a/src/compiler/nir/glsl_to_nir.cpp
+++ b/src/compiler/nir/glsl_to_nir.cpp
@@ -184,6 +184,7 @@ glsl_to_nir(const struct gl_shader_program *shader_prog,
  (struct gl_fragment_program *)sh->Program;
 
   shader->info.fs.uses_discard = fp->UsesKill;
+  shader->info.fs.uses_sample_qualifier = fp->IsSample != 0;
   shader->info.fs.early_fragment_tests = sh->EarlyFragmentTests;
   shader->info.fs.depth_layout = fp->FragDepthLayout;
   break;
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index 8a616d4..4b70a45 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -1743,6 +1743,11 @@ typedef struct nir_shader_info {
  bool uses_discard;
 
  /**
+  * Whether any inputs are declared with the "sample" qualifier.
+  */
+ bool uses_sample_qualifier;
+
+ /**
   * Whether early fragment tests are enabled as defined by
   * ARB_shader_image_load_store.
   */
diff --git a/src/compiler/nir/nir_gather_info.c b/src/compiler/nir/nir_gather_info.c
index d45b1a2..89a6302 100644
--- a/src/compiler/nir/nir_gather_info.c
+++ b/src/compiler/nir/nir_gather_info.c
@@ -125,9 +125,15 @@ nir_shader_gather_info(nir_shader *shader, nir_function_impl *entrypoint)
   shader->stage == MESA_SHADER_FRAGMENT ||
   shader->stage == MESA_SHADER_COMPUTE);
 
+   bool uses_sample_qualifier = false;
shader->info.inputs_read = 0;
-   foreach_list_typed(nir_variable, var, node, &shader->inputs)
+   foreach_list_typed(nir_variable, var, node, &shader->inputs) {
   shader->info.inputs_read |= get_io_mask(var, shader->stage);
+  uses_sample_qualifier |= var->data.sample;
+   }
+
+   if (shader->stage == MESA_SHADER_FRAGMENT)
+  shader->info.fs.uses_sample_qualifier = uses_sample_qualifier;
 
/* TODO: Some day we may need to add stream support to NIR */
shader->info.outputs_written = 0;
-- 
2.5.0.400.gff86faf


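The gathering added in PATCH 01/28 is an OR-reduction over the input variables, stored only for fragment shaders. A stand-alone sketch, assuming a hypothetical `struct example_var` (a linked list standing in for the exec_list of nir_variable) and `enum example_stage`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum example_stage { EXAMPLE_STAGE_VERTEX, EXAMPLE_STAGE_FRAGMENT };

struct example_var {
   bool sample;                    /* declared with the "sample" qualifier */
   const struct example_var *next; /* next input variable, or NULL */
};

/* Returns whether any input uses "sample"; *out is written only for
 * fragment shaders, mirroring the stage check in the patch. */
static bool
gather_uses_sample_qualifier(enum example_stage stage,
                             const struct example_var *inputs, bool *out)
{
   bool uses_sample_qualifier = false;

   assert(out != NULL);

   for (const struct example_var *var = inputs; var != NULL; var = var->next)
      uses_sample_qualifier |= var->sample;

   if (stage == EXAMPLE_STAGE_FRAGMENT)
      *out = uses_sample_qualifier;

   return uses_sample_qualifier;
}
```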

[Mesa-dev] [PATCH 03/28] i965/state: Clean up WM/PS state to pull more things out of prog_data

2016-05-10 Thread Jason Ekstrand
Now that we have a persample_shading bit in prog_data, we can reduce how
much the state setup code needs to look at the GL state.  In
particular, it no longer pulls anything directly out of the
gl_fragment_program and no longer depends on NEW_FRAGMENT_PROGRAM.
---
 src/mesa/drivers/dri/i965/brw_state.h |  8 ++--
 src/mesa/drivers/dri/i965/gen6_wm_state.c | 28 
 src/mesa/drivers/dri/i965/gen7_wm_state.c |  4 +---
 src/mesa/drivers/dri/i965/gen8_ps_state.c | 19 ---
 4 files changed, 15 insertions(+), 44 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_state.h b/src/mesa/drivers/dri/i965/brw_state.h
index e89b388..1fb618b 100644
--- a/src/mesa/drivers/dri/i965/brw_state.h
+++ b/src/mesa/drivers/dri/i965/brw_state.h
@@ -305,15 +305,12 @@ void gen7_init_vtable_surface_functions(struct brw_context *brw);
 
 /* gen8_ps_state.c */
 void gen8_upload_ps_state(struct brw_context *brw,
-  const struct gl_fragment_program *fp,
   const struct brw_stage_state *stage_state,
   const struct brw_wm_prog_data *prog_data,
   uint32_t fast_clear_op);
 
 void gen8_upload_ps_extra(struct brw_context *brw,
-  const struct gl_fragment_program *fp,
-  const struct brw_wm_prog_data *prog_data,
-  bool multisampled_fbo);
+  const struct brw_wm_prog_data *prog_data);
 
 /* gen7_sol_state.c */
 void gen7_upload_3dstate_so_decl_list(struct brw_context *brw,
@@ -368,10 +365,9 @@ void brw_update_sampler_state(struct brw_context *brw,
 /* gen6_wm_state.c */
 void
 gen6_upload_wm_state(struct brw_context *brw,
- const struct brw_fragment_program *fp,
  const struct brw_wm_prog_data *prog_data,
  const struct brw_stage_state *stage_state,
- bool multisampled_fbo, int min_inv_per_frag,
+ bool multisampled_fbo,
  bool dual_source_blend_enable, bool kill_enable,
  bool color_buffer_write_enable, bool msaa_enabled,
  bool line_stipple_enable, bool polygon_stipple_enable,
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c b/src/mesa/drivers/dri/i965/gen6_wm_state.c
index dd33926..4a5aa12 100644
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
@@ -69,10 +69,9 @@ const struct brw_tracked_state gen6_wm_push_constants = {
 
 void
 gen6_upload_wm_state(struct brw_context *brw,
- const struct brw_fragment_program *fp,
  const struct brw_wm_prog_data *prog_data,
  const struct brw_stage_state *stage_state,
- bool multisampled_fbo, int min_inv_per_frag,
+ bool multisampled_fbo,
  bool dual_source_blend_enable, bool kill_enable,
  bool color_buffer_write_enable, bool msaa_enabled,
  bool line_stipple_enable, bool polygon_stipple_enable,
@@ -163,10 +162,11 @@ gen6_upload_wm_state(struct brw_context *brw,
if (polygon_stipple_enable)
   dw5 |= GEN6_WM_POLYGON_STIPPLE_ENABLE;
 
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   if (fp->program.Base.InputsRead & VARYING_BIT_POS)
-  dw5 |= GEN6_WM_USES_SOURCE_DEPTH | GEN6_WM_USES_SOURCE_W;
-   if (fp->program.Base.OutputsWritten & BITFIELD64_BIT(FRAG_RESULT_DEPTH))
+   if (prog_data->uses_src_depth)
+  dw5 |= GEN6_WM_USES_SOURCE_DEPTH;
+   if (prog_data->uses_src_w)
+  dw5 |= GEN6_WM_USES_SOURCE_W;
+   if (prog_data->computed_depth_mode != BRW_PSCDEPTH_OFF)
   dw5 |= GEN6_WM_COMPUTED_DEPTH;
dw6 |= prog_data->barycentric_interp_modes <<
   GEN6_WM_BARYCENTRIC_INTERPOLATION_MODE_SHIFT;
@@ -277,23 +277,12 @@ static void
 upload_wm_state(struct brw_context *brw)
 {
struct gl_context *ctx = &brw->ctx;
-   /* BRW_NEW_FRAGMENT_PROGRAM */
-   const struct brw_fragment_program *fp =
-  brw_fragment_program_const(brw->fragment_program);
/* BRW_NEW_FS_PROG_DATA */
const struct brw_wm_prog_data *prog_data = brw->wm.prog_data;
 
/* _NEW_BUFFERS */
const bool multisampled_fbo = _mesa_geometric_samples(ctx->DrawBuffer) > 1;
 
-   /* In case of non 1x per sample shading, only one of SIMD8 and SIMD16
-* should be enabled. We do 'SIMD16 only' dispatch if a SIMD16 shader
-* is successfully compiled. In majority of the cases that bring us
-* better performance than 'SIMD8 only' dispatch.
-*/
-   const int min_inv_per_frag = _mesa_get_min_invocations_per_fragment(
-   ctx, brw->fragment_program, false);
-
/* BRW_NEW_FS_PROG_DATA | _NEW_COLOR */
const bool dual_src_blend_enable = prog_data->dual_src_blend &&
   (ctx->Color.BlendEnabled & 1) &&
@@ -310,8 +299,8 @@ 

[Mesa-dev] [PATCH 00/28] i965/blorp: Use NIR for compiling shaders

2016-05-10 Thread Jason Ekstrand
When Paul originally wrote blorp he hand-rolled a shader builder that
builds i965 shaders directly.  This has caused headaches because every time
we make a change to the back-end compiler, we have to update blorp.  NIR, on
the other hand, tends to be more stable at this point since it has many
different users all across Mesa.

Using NIR also means that we get decent optimizations, register allocation,
and scheduling.  The original blorp codegen code tried fairly hard to emit
reasonably efficient code in that it didn't do more work than needed but it
was fairly naive when it came to register allocation and scheduling.
Using the full compiler stack also means that we get new features for free
without having to re-implement them in blorp.  On Sky Lake, for instance,
we are now generating shaders with sampler-EOT.

In spite of all this, this series shows no measurable performance
difference on Haswell with every benchmark in sixonyx run 25 times.

Jason Ekstrand (28):
  nir: Add an info bit for uses_sample_qualifier
  i965/fs: Rework the persample shading key/prog_data bits
  i965/state: Clean up WM/PS state to pull more things out of prog_data
  i965/fs: Clean up the logic in compile_fs a bit
  i965/fs: Stop setting dispatch_grf_start_reg from the visitor
  i965/gen7_wm: Move where we set the fast clear op
  i965/fs: Organize prog_data by ksp number rather than SIMD width
  i965/blorp: Simplify the sample layout calculation
  i965/fs: Use MRF0 for the repclear message
  nir/builder: Generate the alu helpers directly in python
  nir/builder: Add a helper for grabbing multiple channels from an ssa def
  nir: Add texture opcodes and source types for multisample compression
  i965/fs: Implement the new NIR MCS texturing
  i965/blorp: Add a prog_data_init helper
  i965/blorp: Add a param array to prog_data
  blorp: Add initial state setup support for SIMD8 dispatch
  i965/blorp: Add a helper for compiling NIR shaders
  i965/blorp: Create the program key in get_clear_kernel
  i965/blorp: Use NIR for clear shaders
  i965/blorp: Refactor getting the blit kernel into a helper
  i965/blorp: Add initial support for NIR-based blit shaders
  i965/blorp: Add support for discard-based bounds checks to the NIR path
  i965/blorp: Add support for W-[de]tiling to the NIR path
  i965/blorp: Add MSAA encode/decode support to the NIR path
  i965/blorp: Add support for averaging resolves to the NIR path
  i965/blorp: Add bilinear blending support to the NIR path
  i965/blorp: Refactor coordinate munging
  i965/blorp: Delete the old blorp shader emit code

 src/compiler/nir/glsl_to_nir.cpp  |1 +
 src/compiler/nir/nir.h|   11 +
 src/compiler/nir/nir_builder.h|   44 +-
 src/compiler/nir/nir_builder_opcodes_h.py |   14 +-
 src/compiler/nir/nir_gather_info.c|8 +-
 src/compiler/nir/nir_print.c  |6 +
 src/intel/vulkan/anv_meta_blit2d.c|4 +-
 src/intel/vulkan/anv_pipeline.c   |   46 +-
 src/intel/vulkan/anv_private.h|5 -
 src/intel/vulkan/gen7_pipeline.c  |   12 +-
 src/intel/vulkan/gen8_pipeline.c  |   12 +-
 src/mesa/drivers/dri/i965/Makefile.sources|2 -
 src/mesa/drivers/dri/i965/brw_blorp.c |  111 ++
 src/mesa/drivers/dri/i965/brw_blorp.h |   26 +-
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 2208 +
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp   |  145 --
 src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h |  212 --
 src/mesa/drivers/dri/i965/brw_blorp_clear.cpp |  200 +-
 src/mesa/drivers/dri/i965/brw_compiler.h  |   16 +-
 src/mesa/drivers/dri/i965/brw_defines.h   |1 -
 src/mesa/drivers/dri/i965/brw_fs.cpp  |  173 +-
 src/mesa/drivers/dri/i965/brw_fs.h|1 -
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp|   21 -
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp  |   14 +-
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp  |2 +-
 src/mesa/drivers/dri/i965/brw_shader.cpp  |3 +-
 src/mesa/drivers/dri/i965/brw_state.h |8 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp|2 +
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |1 +
 src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp|1 +
 src/mesa/drivers/dri/i965/brw_wm.c|   19 +-
 src/mesa/drivers/dri/i965/brw_wm_state.c  |   31 +-
 src/mesa/drivers/dri/i965/gen6_blorp.c|   35 +-
 src/mesa/drivers/dri/i965/gen6_wm_state.c |   93 +-
 src/mesa/drivers/dri/i965/gen7_blorp.c|   27 +-
 src/mesa/drivers/dri/i965/gen7_wm_state.c |   47 +-
 src/mesa/drivers/dri/i965/gen8_blorp.c|   23 +-
 src/mesa/drivers/dri/i965/gen8_ps_state.c |   64 +-
 38 files changed, 1446 insertions(+), 2203 deletions(-)
 delete mode 100644 

Re: [Mesa-dev] [PATCH v2] util/ralloc: Remove double zero'ing of rzalloc buffers

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 10, 2016 3:54:05 PM PDT Jordan Justen wrote:
> Juha-Pekka found this back in May 2015:
> <1430915727-28677-1-git-send-email-juhapekka.heikk...@gmail.com>
> 
> From the discussion, obviously it would be preferable to make
> ralloc_size no longer return zeroed memory, but Juha-Pekka found that
> it would break Mesa.
> 
> In <56af1c57.2030...@gmail.com>, Juha-Pekka mentioned that patches
> exist to fix i965 when ralloc_size is fixed to not zero memory, but
> the patches have not made their way to mesa-dev yet.
> 
> For now, let's stop doing the double zeroing of rzalloc buffers.
> 
> v2:
>  * Move ralloc_size code to rzalloc_size, and add a comment as
>suggested by Ken.
> 
> Signed-off-by: Jordan Justen 
> Cc: Juha-Pekka Heikkila 
> Cc: Kenneth Graunke 
> ---
>  src/util/ralloc.c | 21 -
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/src/util/ralloc.c b/src/util/ralloc.c
> index 6d4032b..9526011 100644
> --- a/src/util/ralloc.c
> +++ b/src/util/ralloc.c
> @@ -110,6 +110,18 @@ ralloc_context(const void *ctx)
>  void *
>  ralloc_size(const void *ctx, size_t size)
>  {
> +   /* ralloc_size was originally implemented using calloc, which meant some
> +* code accidentally relied on its zero filling behavior.
> +*
> +* TODO: Make ralloc_size not zero fill memory, and cleanup any code that
> +* should instead be using rzalloc.
> +*/
> +   return rzalloc_size(ctx, size);
> +}
> +
> +void *
> +rzalloc_size(const void *ctx, size_t size)
> +{
> void *block = calloc(1, size + sizeof(ralloc_header));
> ralloc_header *info;
> ralloc_header *parent;
> @@ -128,15 +140,6 @@ ralloc_size(const void *ctx, size_t size)
> return PTR_FROM_HEADER(info);
>  }
>  
> -void *
> -rzalloc_size(const void *ctx, size_t size)
> -{
> -   void *ptr = ralloc_size(ctx, size);
> -   if (likely(ptr != NULL))
> -  memset(ptr, 0, size);
> -   return ptr;
> -}
> -
>  /* helper function - assumes ptr != NULL */
>  static void *
>  resize(void *ptr, size_t size)

Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2] util/ralloc: Remove double zero'ing of rzalloc buffers

2016-05-10 Thread Jordan Justen
Juha-Pekka found this back in May 2015:
<1430915727-28677-1-git-send-email-juhapekka.heikk...@gmail.com>

From the discussion, obviously it would be preferable to make
ralloc_size no longer return zeroed memory, but Juha-Pekka found that
it would break Mesa.

In <56af1c57.2030...@gmail.com>, Juha-Pekka mentioned that patches
exist to fix i965 when ralloc_size is fixed to not zero memory, but
the patches have not made their way to mesa-dev yet.

For now, let's stop doing the double zeroing of rzalloc buffers.

v2:
 * Move ralloc_size code to rzalloc_size, and add a comment as
   suggested by Ken.

Signed-off-by: Jordan Justen 
Cc: Juha-Pekka Heikkila 
Cc: Kenneth Graunke 
---
 src/util/ralloc.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/util/ralloc.c b/src/util/ralloc.c
index 6d4032b..9526011 100644
--- a/src/util/ralloc.c
+++ b/src/util/ralloc.c
@@ -110,6 +110,18 @@ ralloc_context(const void *ctx)
 void *
 ralloc_size(const void *ctx, size_t size)
 {
+   /* ralloc_size was originally implemented using calloc, which meant some
+* code accidentally relied on its zero filling behavior.
+*
+* TODO: Make ralloc_size not zero fill memory, and cleanup any code that
+* should instead be using rzalloc.
+*/
+   return rzalloc_size(ctx, size);
+}
+
+void *
+rzalloc_size(const void *ctx, size_t size)
+{
void *block = calloc(1, size + sizeof(ralloc_header));
ralloc_header *info;
ralloc_header *parent;
@@ -128,15 +140,6 @@ ralloc_size(const void *ctx, size_t size)
return PTR_FROM_HEADER(info);
 }
 
-void *
-rzalloc_size(const void *ctx, size_t size)
-{
-   void *ptr = ralloc_size(ctx, size);
-   if (likely(ptr != NULL))
-  memset(ptr, 0, size);
-   return ptr;
-}
-
 /* helper function - assumes ptr != NULL */
 static void *
 resize(void *ptr, size_t size)
-- 
2.8.1
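For readers new to ralloc, the shape of this change can be sketched outside Mesa. The following is a hypothetical, heavily simplified allocator (the `toy_*` names and the two-field header are assumptions, not Mesa's ralloc.c). The zeroing entry point performs a single calloc() that covers both header and payload, and the non-zeroing entry point forwards to it, so the payload is cleared exactly once instead of by calloc() and then again by memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, much-simplified header; the real ralloc_header also
 * tracks parent/child/sibling links and a destructor. */
typedef struct toy_header {
   const void *parent;
   size_t size;
} toy_header;

#define TOY_PTR_FROM_HEADER(h) ((void *)((toy_header *)(h) + 1))
#define TOY_HEADER_FROM_PTR(p) ((toy_header *)(p) - 1)

/* Zero-filling allocator: calloc() clears header and payload in one go,
 * so no extra memset() of the payload is needed. */
void *
toy_rzalloc_size(const void *ctx, size_t size)
{
   toy_header *info = calloc(1, size + sizeof(toy_header));
   if (info == NULL)
      return NULL;
   info->parent = ctx;
   info->size = size;
   return TOY_PTR_FROM_HEADER(info);
}

/* Non-zeroing entry point.  Like ralloc_size() after the patch, it still
 * returns zeroed memory because existing callers may rely on that. */
void *
toy_ralloc_size(const void *ctx, size_t size)
{
   return toy_rzalloc_size(ctx, size);
}

void
toy_free(void *ptr)
{
   if (ptr != NULL)
      free(TOY_HEADER_FROM_PTR(ptr));
}
```

Keeping the calloc() inside the zeroing function means that, once callers stop depending on zeroed memory, only the non-zeroing wrapper has to change, which is what the TODO comment anticipates.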



Re: [Mesa-dev] [PATCH] i965/fs: Default all constants to a location of -1

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 10, 2016 1:54:58 PM PDT Jason Ekstrand wrote:
> Otherwise constants which aren't live get an undefined constant location.
> When we go to set up param and pull_param we end up assigning all unused
> uniforms to slot 0.  This causes the Vulkan driver to segfault because it
> doesn't have pull_param.
> 
> This fixes bugs in the Vulkan driver introduced in c3fab3d000.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)

Reviewed-by: Kenneth Graunke 




Re: [Mesa-dev] [PATCH] i965/fs: Default all constants to a location of -1

2016-05-10 Thread Mark Janes
Reviewed-by: Mark Janes 

Jason Ekstrand  writes:

> Otherwise constants which aren't live get an undefined constant location.
> When we go to set up param and pull_param we end up assigning all unused
> uniforms to slot 0.  This causes the Vulkan driver to segfault because it
> doesn't have pull_param.
>
> This fixes bugs in the Vulkan driver introduced in c3fab3d000.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 88d6722..ac714c5 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2124,6 +2124,10 @@ fs_visitor::assign_constant_locations()
> push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
> pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
>  
> +   /* Default to -1 meaning no location */
> +   memset(push_constant_loc, -1, uniforms * sizeof(*push_constant_loc));
> +   memset(pull_constant_loc, -1, uniforms * sizeof(*pull_constant_loc));
> +
> int chunk_start = -1;
>  
> /* First push 64-bit uniforms to ensure they are properly aligned */
> @@ -2131,9 +2135,6 @@ fs_visitor::assign_constant_locations()
>if (!is_live[u] || !is_live_64bit[u])
>   continue;
>  
> -  pull_constant_loc[u] = -1;
> -  push_constant_loc[u] = -1;
> -
>set_push_pull_constant_loc(u, _start, contiguous[u],
>   push_constant_loc, pull_constant_loc,
>   _push_constants, _pull_constants,
> @@ -2147,9 +2148,6 @@ fs_visitor::assign_constant_locations()
>if (!is_live[u] || is_live_64bit[u])
>   continue;
>  
> -  pull_constant_loc[u] = -1;
> -  push_constant_loc[u] = -1;
> -
>set_push_pull_constant_loc(u, _start, contiguous[u],
>   push_constant_loc, pull_constant_loc,
>   _push_constants, _pull_constants,
> -- 
> 2.5.0.400.gff86faf
>


Re: [Mesa-dev] [PATCH 04/23] i965/fs: fix requirements to allow type change in copy-propagation

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> When source modifiers are present and the types of the source and
> the entry's source are different, there are certain cases in which
> we allow copy-propagation to replace the type of the source with the
> type of the entry's source we are copy propagating from.
>
> However, it is not generally safe to do this if the types involved
> have different sizes, since parameters like the stride would change
> the semantics of the resulting instruction.
>
AFAICT this will be a problem regardless of strides and other regioning
parameters -- The problem is that because the semantics of source
modifiers are type-dependent, the type of the original source of the
copy must be kept unmodified while propagating it into some instruction,
which implies that we need to have the guarantee that the meaning of the
instruction is going to remain the same after we've changed the types.
In particular when the size of the new type is different from the size
of the old type the new and old instructions cannot possibly be
equivalent because the new instruction will be reading more data than
the old one was.

I suggest you clarify the paragraph above and/or introduce a short
comment in the code explaining why a copy instruction with source
modifiers requires the instruction being propagated into to have a
special form.
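As a concrete illustration of why modifier semantics depend on the type, consider a negate applied to the same 32 bits, read first as a float and then as an integer. This is a host-side C sketch that models the modifier with ordinary C arithmetic, not the hardware's exact rules; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negate a 32-bit value interpreted as a float: only the sign bit flips. */
uint32_t
negate_as_float(uint32_t bits)
{
   float f;
   memcpy(&f, &bits, sizeof(f));
   f = -f;
   memcpy(&bits, &f, sizeof(bits));
   return bits;
}

/* Negate the same 32 bits interpreted as an integer: two's-complement
 * negation, computed on unsigned values to avoid signed overflow. */
uint32_t
negate_as_int(uint32_t bits)
{
   return 0u - bits;
}
```

For the bit pattern of 1.0f (0x3f800000) the float negate yields 0xbf800000 while the integer negate yields 0xc0800000, so propagating a modified source under a different type silently changes the result.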

> This prevents us from turning this:
>
> load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
> mov(8) vgrf18:DF, vgrf17:DF 1sthalf
> load_payload(8) vgrf5:DF, vgrf18:DF, vgrf20:DF NoMask 1sthalf WE_all
> load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
> mov(8) vgrf22:UD, vgrf21:UD 1sthalf
>
> into:
>
> load_payload(8) vgrf17:DF, |vgrf4+0.0|:DF 1sthalf
> mov(8) vgrf18:DF, |vgrf4+0.0|:DF 1sthalf
> load_payload(8) vgrf5:DF, |vgrf4+0.0|:DF, |vgrf4+2.0|:DF NoMask 1sthalf WE_all
> load_payload(8) vgrf21:UD, vgrf5+0.4<2>:UD 1sthalf
> mov(8) vgrf22:DF, |vgrf4+0.4|<2>:DF 1sthalf
>
> where the semantics of the last instruction have changed.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index abc68c8..aa4c9c9 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -411,8 +411,9 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>return false;
>  
> if (has_source_modifiers &&
> -   entry->dst.type != inst->src[arg].type &&
> -   !inst->can_change_types())
> +   ((entry->dst.type != inst->src[arg].type &&
> + !inst->can_change_types()) ||
> +(type_sz(entry->dst.type) != type_sz(inst->src[arg].type

I would find the expression above more readable (less parentheses
overall) if you did something like:

| (has_source_modifiers &&
|  entry->dst.type != inst->src[arg].type &&
|  (!inst->can_change_types() ||
|   type_sz(entry->dst.type) != type_sz(inst->src[arg].type)))
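The two forms can only disagree when the types are identical but their sizes differ, which cannot happen. A brute-force sketch (the boolean parameters stand in for the real conditions and the function names are hypothetical) checks every other combination:

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch. */
bool
reject_patch(bool mods, bool types_differ, bool can_change, bool sizes_differ)
{
   return mods && ((types_differ && !can_change) || sizes_differ);
}

/* Condition as suggested in the review. */
bool
reject_review(bool mods, bool types_differ, bool can_change, bool sizes_differ)
{
   return mods && types_differ && (!can_change || sizes_differ);
}

/* True if both forms agree on every consistent input, i.e. every input
 * except "same type but different size". */
bool
forms_agree(void)
{
   for (int i = 0; i < 16; i++) {
      bool mods = i & 1, types_differ = i & 2;
      bool can_change = i & 4, sizes_differ = i & 8;
      if (!types_differ && sizes_differ)
         continue; /* impossible: equal types have equal sizes */
      if (reject_patch(mods, types_differ, can_change, sizes_differ) !=
          reject_review(mods, types_differ, can_change, sizes_differ))
         return false;
   }
   return true;
}
```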

Either way looks correct to me:

Reviewed-by: Francisco Jerez 

>return false;
>  
> if (devinfo->gen >= 8 && (entry->src.negate || entry->src.abs) &&
> -- 
> 2.5.0
>




Re: [Mesa-dev] [PATCH 02/23] i965/fs: fix copy propagation from sources with stride 0

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> We should not offset into them based on the relative offset of
> our source and the destination of the instruction we are copy
> propagating from, so we don't turn this:
>
> mov(16) vgrf6:F, vgrf7+0.0<0>:F
> (...)
> load_payload(8) vgrf28:F, vgrf6+1.0:F 2ndhalf
> mov(8) vgrf29:DF, vgrf28:F 2ndhalf
>
> into:
>
> mov(16) vgrf6:F, vgrf7+0.0<0>:F
> (...)
> load_payload(8) vgrf28:F, vgrf7+1.0<0>:F 2ndhalf
> mov(8) vgrf29:DF, vgrf7+1.0<0>:F 2ndhalf
>
> and instead we do this:
>
> mov(16) vgrf6:F, vgrf7+0.0<0>:F
> (...)
> load_payload(8) vgrf28:F, vgrf7+0.0<0>:F 2ndhalf
> mov(8) vgrf29:DF, vgrf7+0.0<0>:F 2ndhalf
> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index becc8bc..9147e60 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -460,10 +460,20 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
>* parts of vgrfs so we have to do some reg_offset magic.
>*/
>  
> - /* Compute the offset of inst->src[arg] relative to inst->dst */
> + /* Compute the offset of inst->src[arg] relative to inst->dst
> +  *
> +  * If the source we are copy propagating from has a stride of 0, then
> +  * we must not offset into it based on the offset of our source
> +  * relative to entry->dst
> +  */
>   assert(entry->dst.subreg_offset == 0);
> - int rel_offset = inst->src[arg].reg_offset - entry->dst.reg_offset;
> - int rel_suboffset = inst->src[arg].subreg_offset;
> + int rel_offset, rel_suboffset;
> + if (entry->src.stride != 0) {
> +rel_offset = inst->src[arg].reg_offset - entry->dst.reg_offset;
> +rel_suboffset = inst->src[arg].subreg_offset;
> + } else {
> +rel_offset = rel_suboffset = 0;
> + }

Sigh, this fixes the problem for stride == 0 but leaves the logic broken
in the general case.  Turns out I came across the same problem with
SIMD32 and came up with the fix below that should work regardless of the
stride value...
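The idea behind a stride-aware fix can be modeled in isolation. For a copy `mov dst:T, src<stride>:T`, destination component i comes from source component i * stride, so a byte offset read from the copy's destination has to be mapped back through the stride rather than added to the source offset directly. This C sketch uses a flat byte-offset model with a hypothetical function name; it is not the code of the attached patch, which is truncated in this archive.

```c
#include <assert.h>

/* Map a byte offset read from the destination of a copy back to a byte
 * offset in the copy's source.
 *
 *   src_offset: byte offset of the copy's source
 *   stride:     source stride in components (0 means a scalar broadcast)
 *   type_size:  component size in bytes (same type on both sides)
 *   rel_offset: byte offset of the read relative to the copy's destination
 */
unsigned
map_copy_offset(unsigned src_offset, unsigned stride, unsigned type_size,
                unsigned rel_offset)
{
   assert(type_size > 0);
   unsigned component = rel_offset / type_size; /* which dst component */
   unsigned byte = rel_offset % type_size;      /* byte within it */
   return src_offset + component * stride * type_size + byte;
}
```

With stride 0, a read 32 bytes into the destination (vgrf6+1.0 in the example above) maps back to the start of the source (vgrf7+0.0); with stride 1 it maps to byte 32, and with stride 2 to byte 64. Adding the relative offset directly is only correct in the stride 1 case.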

>  
>   /* Compute the final register offset (in bytes) */
>   int offset = entry->src.reg_offset * 32 + entry->src.subreg_offset;
> -- 
> 2.5.0
>

From 325c80c10ea9028271116094bf8b674ffc0eb9f2 Mon Sep 17 00:00:00 2001
From: Francisco Jerez 
Date: Mon, 25 Apr 2016 15:40:05 -0700
Subject: [PATCH] i965/fs: Fix propagation of copies with strided source.

This has likely been broken since we started propagating copies not
matching the offset of the instruction exactly
(1728e74957a62b1b4b9fbb62a7de2c12b77c8a75).  The copy source stride
needs to be taken into account to find out the offset at the origin
that corresponds to the offset at the destination of the copy which is
being read by the instruction.  This has led to program miscompilation
on both my SIMD32 branch and Igalia's FP64 branch.
---
 .../drivers/dri/i965/brw_fs_copy_propagation.cpp   | 30 ++
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
index 3c702d8..83791bf 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
@@ -460,16 +460,26 @@ fs_visitor::try_copy_propagate(fs_inst *inst, int arg, acp_entry *entry)
   * parts of vgrfs so we have to do some reg_offset magic.
   */
 
- /* Compute the offset of inst->src[arg] relative to inst->dst */
- assert(entry->dst.subreg_offset == 0);
- int rel_offset = inst->src[arg].reg_offset - entry->dst.reg_offset;
- int rel_suboffset = inst->src[arg].subreg_offset;
-
- /* Compute the final register offset (in bytes) */
- int offset = entry->src.reg_offset * 32 + entry->src.subreg_offset;
- offset += rel_offset * 32 + rel_suboffset;
- inst->src[arg].reg_offset = offset / 32;
- inst->src[arg].subreg_offset = offset % 32;
+ /* Compute the offset of inst->src[arg] relative to entry->dst */
+ const unsigned rel_offset = (inst->src[arg].reg_offset
+  - entry->dst.reg_offset) * REG_SIZE +
+ inst->src[arg].subreg_offset;
+
+ /* Compute the first component of the copy that the instruction is
+  * reading, and the base 

[Mesa-dev] [PATCH] i965/fs: Default all constants to a location of -1

2016-05-10 Thread Jason Ekstrand
Otherwise constants which aren't live get an undefined constant location.
When we go to set up param and pull_param we end up assigning all unused
uniforms to slot 0.  This causes the Vulkan driver to segfault because it
doesn't have pull_param.

This fixes bugs in the Vulkan driver introduced in c3fab3d000.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 88d6722..ac714c5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2124,6 +2124,10 @@ fs_visitor::assign_constant_locations()
push_constant_loc = ralloc_array(mem_ctx, int, uniforms);
pull_constant_loc = ralloc_array(mem_ctx, int, uniforms);
 
+   /* Default to -1 meaning no location */
+   memset(push_constant_loc, -1, uniforms * sizeof(*push_constant_loc));
+   memset(pull_constant_loc, -1, uniforms * sizeof(*pull_constant_loc));
+
int chunk_start = -1;
 
/* First push 64-bit uniforms to ensure they are properly aligned */
@@ -2131,9 +2135,6 @@ fs_visitor::assign_constant_locations()
   if (!is_live[u] || !is_live_64bit[u])
  continue;
 
-  pull_constant_loc[u] = -1;
-  push_constant_loc[u] = -1;
-
   set_push_pull_constant_loc(u, _start, contiguous[u],
  push_constant_loc, pull_constant_loc,
  _push_constants, _pull_constants,
@@ -2147,9 +2148,6 @@ fs_visitor::assign_constant_locations()
   if (!is_live[u] || is_live_64bit[u])
  continue;
 
-  pull_constant_loc[u] = -1;
-  push_constant_loc[u] = -1;
-
   set_push_pull_constant_loc(u, _start, contiguous[u],
  push_constant_loc, pull_constant_loc,
  _push_constants, _pull_constants,
-- 
2.5.0.400.gff86faf
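The memset() call works because -1 converted to a byte is 0xff, and an int whose bytes are all 0xff is -1 on two's-complement machines. Below is a standalone sketch of the "default everything to -1, then assign live entries" pattern; `assign_locations()` and its liveness array are hypothetical stand-ins for fs_visitor::assign_constant_locations(), not its real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for assign_constant_locations(): every slot
 * starts at -1 ("no location"), and only live uniforms get a location. */
int *
assign_locations(const bool *is_live, int uniforms)
{
   int *loc = malloc((size_t)uniforms * sizeof(*loc));
   if (loc == NULL)
      return NULL;

   /* Every byte becomes 0xff, which reads back as -1 for each int. */
   memset(loc, -1, (size_t)uniforms * sizeof(*loc));

   int next = 0;
   for (int u = 0; u < uniforms; u++) {
      if (!is_live[u])
         continue; /* dead uniforms keep the -1 default */
      loc[u] = next++;
   }
   return loc;
}
```

Because dead uniforms never reach the assignment, they keep the -1 default instead of depending on what the allocator returned (zero-filled memory would alias slot 0).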



Re: [Mesa-dev] [PATCH 03/23] i965/fs: Fix copy propagation of load payload for double operands

2016-05-10 Thread Francisco Jerez
Samuel Iglesias Gonsálvez  writes:

> From: Iago Toral Quiroga 
>
> Specifically, consider the size of the data type of the operand to compute
> the number of registers written.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> index 9147e60..abc68c8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_copy_propagation.cpp
> @@ -775,7 +775,7 @@ fs_visitor::opt_copy_propagate_local(void *copy_prop_ctx, bblock_t *block,
>   int offset = 0;
>   for (int i = 0; i < inst->sources; i++) {
>  int effective_width = i < inst->header_size ? 8 : inst->exec_size;
> -int regs_written = effective_width / 8;
> +int regs_written = effective_width / 8 * type_sz(inst->src[i].type) / 4;

Please use 'effective_width * type_sz(...) / REG_SIZE' instead, they are
not necessarily equivalent due to rounding and only the latter is
correct when they aren't.  (The existing code still looks broken when
the result is not an exact multiple of REG_SIZE but that probably
belongs in a separate patch...)

With that fixed:
Reviewed-by: Francisco Jerez 
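A small sketch of why the two formulas differ: integer division truncates at every step, so dividing the width by 8 first loses information when the effective width is below 8. `REG_SIZE` is 32 bytes in the i965 back-end. The round-up variant is an assumption about what a fix for the non-multiple case might look like, not code from this thread, and the function names are hypothetical.

```c
#include <assert.h>

#define REG_SIZE 32 /* bytes per GRF in the i965 back-end */

/* Formula from the patch: truncates effective_width / 8 first. */
int
regs_written_patch(int effective_width, int type_size)
{
   return effective_width / 8 * type_size / 4;
}

/* Formula suggested in the review. */
int
regs_written_review(int effective_width, int type_size)
{
   return effective_width * type_size / REG_SIZE;
}

/* Hypothetical round-up variant for byte counts that are not a multiple
 * of REG_SIZE (the case the review says is still broken). */
int
regs_written_round_up(int effective_width, int type_size)
{
   return (effective_width * type_size + REG_SIZE - 1) / REG_SIZE;
}
```

For 4 channels of 8-byte data, the patch's formula gives 0 registers while the suggested one gives 1; for 8 channels of 2-byte data both give 0, and only the rounding-up form reports the single partially written register.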

>  if (inst->src[i].file == VGRF) {
> acp_entry *entry = ralloc(copy_prop_ctx, acp_entry);
> entry->dst = inst->dst;
> -- 
> 2.5.0
>




Re: [Mesa-dev] [PATCH] mesa/vbo: fix check for zero aliases with 2/10/10/10

2016-05-10 Thread Dave Airlie
On 11 May 2016 at 04:06, Ian Romanick  wrote:
> It seems like at least some of these recent fixes should be candidates
> for stable.

I didn't think they were fixing any real-world problems, so I hadn't
really bothered.

I think I tagged one of them so far. My goal was to try and get Haswell
to pass the GL 3.3 CTS, either by fixing the CTS or fixing the driver.
So far it has mostly involved fixing the CTS (I added you and a few
others to my gitlab cts branch).

GL33-CTS.clip_distance.functional,Fail
GL33-CTS.transform_feedback.capture_special_interleaved_test,Fail
GL33-CTS.CommonBugs.CommonBug_ReservedNames,Fail
GL33-CTS.texture_size_promotion.functional,InternalError
are the 4 tests I have left.

The first fails somewhat randomly on i965, but passes on llvmpipe.
The second is a bug in i965 somewhere that I haven't located
(https://bugs.freedesktop.org/show_bug.cgi?id=95322).
The third is a GLSL compiler bug
(https://bugs.freedesktop.org/show_bug.cgi?id=95323).
The last, after much test fixing, is failing on depth textures somehow,
but it's inconsistent and might still be the test; however, it did find
the snorm clamping bug.

I'm also, in passing, running the CTS on radeonsi forced to GL 4.5 to
get better coverage from the CTS, but I'll probably run out of
time/steam soon.

Dave.


[Mesa-dev] [PATCH] compiler: guard list iteration macros against undefined behavior (v2)

2016-05-10 Thread Nicolai Hähnle
From: Nicolai Hähnle 

The old iteration casts sentinel nodes (which are mere exec_nodes) into
whatever type we're looping over, which leads to badness (in fact, gcc's
undefined behaviour sanitizer crashes while trying to verify that we have
the correct type at hand).

These modified looping constructs postpone the cast until its correctness
has been established. The odd two-level loop construct is used to be able
to define variables of different types, and the __flag variable
ensures that the outer loop is only run once. gcc is able to optimize the
outer loop away in the cases I have examined.

v2: fix a compile error in brw_dead_control_flow.cpp (noticed by Shawn Starr)
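The two-level loop can be reproduced in plain C. The following is a simplified, self-contained model; the `node`/`list`/`item` types and the `FOREACH_IN_LIST` macro are hypothetical and not list.h. The outer loop exists only to declare the typed cursor and runs once, while the inner loop walks untyped nodes and assigns the typed pointer only after the sentinel check has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
   struct node *next;
   struct node *prev;
};

/* A node is the tail sentinel when its next pointer is NULL
 * (mirroring exec_node::is_tail_sentinel()). */
struct list {
   struct node *head;
   struct node tail_sentinel;
};

struct item {
   struct node link; /* first member, so the cast below is valid */
   int value;
};

static inline bool
is_tail_sentinel(const struct node *n)
{
   return n->next == NULL;
}

/* Outer loop: declares the typed cursor and runs exactly once.
 * Inner loop: walks untyped nodes and casts only after the sentinel
 * check, so the sentinel is never converted to the element type. */
#define FOREACH_IN_LIST(type, var, lst)                        \
   for (type *var, **var##_flag = &var; var##_flag;            \
        var##_flag = NULL)                                     \
      for (struct node *var##_cur = (lst)->head;               \
           !is_tail_sentinel(var##_cur) &&                     \
           (((var) = (type *)var##_cur) || true);              \
           var##_cur = var##_cur->next)

void
list_init(struct list *l)
{
   l->tail_sentinel.next = NULL;
   l->tail_sentinel.prev = NULL;
   l->head = &l->tail_sentinel;
}

/* Insert an item at the front of the list. */
void
list_push_front(struct list *l, struct item *it)
{
   it->link.next = l->head;
   it->link.prev = NULL;
   l->head->prev = &it->link;
   l->head = &it->link;
}

int
list_sum(const struct list *l)
{
   int sum = 0;
   FOREACH_IN_LIST(struct item, it, l)
      sum += it->value;
   return sum;
}
```

A `break` in the loop body leaves the inner loop; the outer loop then clears its flag and exits, so the construct still behaves like a single loop.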
---
 src/compiler/glsl/list.h   | 109 -
 .../drivers/dri/i965/brw_dead_control_flow.cpp |   2 +-
 2 files changed, 65 insertions(+), 46 deletions(-)

diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
index a1c4d82..8da9514 100644
--- a/src/compiler/glsl/list.h
+++ b/src/compiler/glsl/list.h
@@ -621,36 +621,55 @@ inline void exec_node::insert_before(exec_list *before)
 }
 #endif
 
-#define foreach_in_list(__type, __inst, __list)  \
-   for (__type *(__inst) = (__type *)(__list)->head; \
-!(__inst)->is_tail_sentinel();   \
-(__inst) = (__type *)(__inst)->next)
-
-#define foreach_in_list_reverse(__type, __inst, __list)   \
-   for (__type *(__inst) = (__type *)(__list)->tail_pred; \
-!(__inst)->is_head_sentinel();\
-(__inst) = (__type *)(__inst)->prev)
+/* The somewhat odd-looking multi-loop construct here is to avoid casting
+ * sentinel nodes, which would be undefined behavior (which is indeed flagged /
+ * leads to crashes with gcc's ubsan).
+ */
+#define foreach_in_list(__type, __node, __list)  \
+   for (__type *(__node), **__flag = &(__node);  \
+__flag; __flag = NULL)   \
+  for (struct exec_node *__cur = (__list)->head; \
+   !__cur->is_tail_sentinel() && \
+   (((__node) = (__type *) __cur) || true);  \
+   __cur = __cur->next)  \
+
+#define foreach_in_list_reverse(__type, __node, __list)   \
+   for (__type *(__node), **__flag = &(__node);   \
+__flag; __flag = NULL)\
+  for (struct exec_node *__cur = (__list)->tail_pred; \
+   !__cur->is_head_sentinel() &&  \
+   (((__node) = (__type *) __cur) || true);   \
+   __cur = __cur->prev)
 
 /**
  * This version is safe even if the current node is removed.
  */ 
 #define foreach_in_list_safe(__type, __node, __list) \
-   for (__type *__node = (__type *)(__list)->head,   \
-   *__next = (__type *)__node->next; \
-__next != NULL;  \
-__node = __next, __next = (__type *)__next->next)
+   for (__type *(__node), **__flag = &(__node);  \
+__flag; __flag = NULL)   \
+  for (struct exec_node *__cur = (__list)->head, \
+*__next = __cur->next;   \
+   __next != NULL && \
+   (((__node) = (__type *) __cur) || true);  \
+   __cur = __next, __next = __next->next)
 
 #define foreach_in_list_reverse_safe(__type, __node, __list) \
-   for (__type *__node = (__type *)(__list)->tail_pred,  \
-   *__prev = (__type *)__node->prev; \
-__prev != NULL;  \
-__node = __prev, __prev = (__type *)__prev->prev)
-
-#define foreach_in_list_use_after(__type, __inst, __list) \
-   __type *(__inst);  \
-   for ((__inst) = (__type *)(__list)->head;  \
-!(__inst)->is_tail_sentinel();\
-(__inst) = (__type *)(__inst)->next)
+   for (__type *(__node), **__flag = &(__node);  \
+__flag; __flag = NULL)   \
+  for (struct exec_node *__cur = (__list)->tail_pred,\
+*__prev = __cur->prev;   \
+   __prev != NULL && \
+   (((__node) = (__type *) __cur) || true);  \
+   __cur = __prev, __prev = __prev->prev)
+
+#define foreach_in_list_use_after(__type, __node, __list) \
+   __type *(__node);  \
+   for (__type **__flag = &(__node);  \
+__flag; __flag = NULL)\
+  for (struct exec_node *__cur = (__list)->head;  \
+   !__cur->is_tail_sentinel() &&  \
+   (((__node) = (__type *) __cur) || true);   \
+   __cur = __cur->next)
 /**
  * Iterate through two lists at once.  Stops at the end of the shorter list.
  *
@@ -668,33 +687,33 @@ inline void 

Re: [Mesa-dev] [PATCH 3/9] compiler: guard list iteration macros against undefined behavior

2016-05-10 Thread Nicolai Hähnle

On 10.05.2016 14:17, Ian Romanick wrote:

On 04/30/2016 12:24 AM, Nicolai Hähnle wrote:

From: Nicolai Hähnle 

The old iteration casts sentinel nodes (which are mere exec_nodes) into
whatever type we're looping over, which leads to badness (in fact, gcc's
undefined behaviour sanitizer crashes while trying to verify that we have
the correct type at hand).


Bwah.  I was going to suggest a different method, but I found that my
other ideas wouldn't work for various reasons.


These modified looping constructs postpone the cast until its correctness
has been established. The odd two-level loop construct is used to be able
to define variables of different types, and the __sentinel variable

   ^^
You mean __flag, right?


Yes, sorry.


ensures that the outer loop is only run once. gcc is able to optimize the
outer loop away in the cases I have examined.


Does 'size' report any difference on the resulting .so?


I just ran the experiment on a build with -g -O2 
-fno-omit-frame-pointer, and the results are (first line before the 
series, each subsequent line after the next patch in the pure compiler 
series that I sent out):


   text    data     bss     dec     hex filename
7314874  229656 2051744 9596274  926d72 ./lib/gallium/radeonsi_dri.so
7310166  229656 2051744 9591566  925b0e ./lib/gallium/radeonsi_dri.so
7310214  229656 2051744 9591614  925b3e ./lib/gallium/radeonsi_dri.so
7310150  229656 2051744 9591550  925afe ./lib/gallium/radeonsi_dri.so
7310134  229656 2051744 9591534  925aee ./lib/gallium/radeonsi_dri.so
7310134  229656 2051744 9591534  925aee ./lib/gallium/radeonsi_dri.so

I did not expect that! Turns out that there's some arithmetic involved 
in the casts - apparently the compiler decides to put the VMT before the 
exec_node part - and that could explain the change in code size.
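In C terms this is the difference between a free cast and a container_of-style pointer adjustment: when the embedded node is not the first thing in the object (for example because a vtable pointer comes first), converting a node pointer to the enclosing object's pointer has to subtract the node's offset. A minimal C sketch, where the hypothetical `struct instr` stands in for a C++ class whose vtable pointer precedes its exec_node base:

```c
#include <assert.h>
#include <stddef.h>

struct node {
   struct node *next;
   struct node *prev;
};

/* Models a C++ object laid out as [vtable pointer][exec_node][fields]. */
struct instr {
   const void *vtable; /* stand-in for the compiler's vtable pointer */
   struct node link;
   int opcode;
};

/* Converting node -> containing object needs pointer arithmetic
 * whenever offsetof(type, member) != 0. */
#define container_of(ptr, type, member) \
   ((type *)((char *)(ptr) - offsetof(type, member)))

struct instr *
instr_from_node(struct node *n)
{
   return container_of(n, struct instr, link);
}
```

On typical ABIs the node sits one pointer-size into the object, so every such conversion costs a subtraction; a construct that performs fewer conversions can plausibly emit less code.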


I'm sending out a v2 of this patch in a moment - Shawn pointed out a 
compiler error in the Intel driver caused by someone directly accessing 
the __next variable, tsk tsk! ;-)


Cheers,
Nicolai


---
  src/compiler/glsl/list.h | 109 ---
  1 file changed, 64 insertions(+), 45 deletions(-)

diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
index a1c4d82..8da9514 100644
--- a/src/compiler/glsl/list.h
+++ b/src/compiler/glsl/list.h
@@ -621,36 +621,55 @@ inline void exec_node::insert_before(exec_list *before)
  }
  #endif

-#define foreach_in_list(__type, __inst, __list)  \
-   for (__type *(__inst) = (__type *)(__list)->head; \
-!(__inst)->is_tail_sentinel();   \
-(__inst) = (__type *)(__inst)->next)
-
-#define foreach_in_list_reverse(__type, __inst, __list)   \
-   for (__type *(__inst) = (__type *)(__list)->tail_pred; \
-!(__inst)->is_head_sentinel();\
-(__inst) = (__type *)(__inst)->prev)
+/* The somewhat odd-looking multi-loop construct here is to avoid casting
+ * sentinel nodes, which would be undefined behavior (which is indeed flagged /
+ * leads to crashes with gcc's ubsan).
+ */
+#define foreach_in_list(__type, __node, __list)  \
+   for (__type *(__node), **__flag = &(__node);  \
+__flag; __flag = NULL)   \
+  for (struct exec_node *__cur = (__list)->head; \
+   !__cur->is_tail_sentinel() && \
+   (((__node) = (__type *) __cur) || true);  \
+   __cur = __cur->next)  \
+
+#define foreach_in_list_reverse(__type, __node, __list)   \
+   for (__type *(__node), **__flag = &(__node);   \
+__flag; __flag = NULL)\
+  for (struct exec_node *__cur = (__list)->tail_pred; \
+   !__cur->is_head_sentinel() &&  \
+   (((__node) = (__type *) __cur) || true);   \
+   __cur = __cur->prev)

  /**
   * This version is safe even if the current node is removed.
   */
  #define foreach_in_list_safe(__type, __node, __list) \
-   for (__type *__node = (__type *)(__list)->head,   \
-   *__next = (__type *)__node->next; \
-__next != NULL;  \
-__node = __next, __next = (__type *)__next->next)
+   for (__type *(__node), **__flag = &(__node);  \
+__flag; __flag = NULL)   \
+  for (struct exec_node *__cur = (__list)->head, \
+*__next = __cur->next;   \
+   __next != NULL && \
+   (((__node) = (__type *) __cur) || true);  \
+   __cur = __next, __next = __next->next)

  #define foreach_in_list_reverse_safe(__type, __node, __list) \
-   for (__type *__node = (__type *)(__list)->tail_pred,  \
-   *__prev = (__type *)__node->prev; \
-__prev != NULL;  \
-__node = __prev, __prev = (__type *)__prev->prev)
-

Re: [Mesa-dev] [PATCH 08/15] i965/vec4: use attribute slots to calculate URB read length

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 10, 2016 6:24:39 PM PDT Juan A. Suarez Romero wrote:
> On Mon, 2016-05-09 at 23:37 -0700, Kenneth Graunke wrote:
> > void *log_data,
> > > const unsigned *assembly = NULL;
> > >  
> > > unsigned nr_attributes = _mesa_bitcount_64(prog_data->inputs_read);
> > > +   unsigned nr_attribute_slots = 0;
> > 
> > Can't you just do:
> > 
> > nir_shader *nir = vp->program.Base.nir;
> > unsigned nr_attribute_slots =
> >_mesa_bitcount_64(nir->info.inputs_read) +
> >_mesa_bitcount_64(nir->info.double_inputs_read);
> > 
> > It seems like that should work without the need to iterate variables.
> 
> 
> Right. I'll use that instead. Thank you!
> 
>   J.A.
> 

Cool!  Have a:
Reviewed-by: Kenneth Graunke 
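The suggestion reduces to two population counts. A standalone sketch, where `bitcount64` is a portable stand-in for _mesa_bitcount_64 and the masks in the usage below are made-up example values. It assumes, as the suggestion implies, that each input flagged in `double_inputs_read` needs one extra slot.

```c
#include <assert.h>
#include <stdint.h>

/* Portable popcount, standing in for _mesa_bitcount_64(). */
unsigned
bitcount64(uint64_t mask)
{
   unsigned count = 0;
   while (mask) {
      mask &= mask - 1; /* clear the lowest set bit */
      count++;
   }
   return count;
}

/* One slot per input read, plus one extra slot for each input flagged
 * in double_inputs_read (assumption described above). */
unsigned
attribute_slots(uint64_t inputs_read, uint64_t double_inputs_read)
{
   return bitcount64(inputs_read) + bitcount64(double_inputs_read);
}
```

For example, three inputs read (mask 0xb) of which one is flagged as double (mask 0x2) gives 4 slots, without iterating over the shader's variables.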




Re: [Mesa-dev] [PATCH 05/15] i965/fs: shuffle 32bits into 64bits for doubles

2016-05-10 Thread Kenneth Graunke
On Thursday, April 28, 2016 1:40:35 PM PDT Antia Puentes wrote:
> From: "Juan A. Suarez Romero" 
> 
> The VS thread payload handles attributes in the URB as vec4, regardless
> of whether they are actually single or double precision.
>
> So with double-precision types, the value ends up split across the
> registers in 32-bit chunks, in different positions.
>
> We need to shuffle the chunks to reassemble the doubles correctly.
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 0ff3eaf..4362308 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -3173,6 +3173,12 @@ fs_visitor::nir_emit_intrinsic(const fs_builder , nir_intrinsic_instr *instr
>for (unsigned j = 0; j < instr->num_components; j++) {
>   bld.MOV(offset(dest, bld, j), offset(src, bld, j));
>}
> +  if (type_sz(src.type) == 8)
> + SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld,
> + offset(dest, bld, 0),
> + offset(dest, bld, 0),

Isn't this just dest, dest then? :)

Reviewed-by: Kenneth Graunke 

> + instr->num_components);
> +
>break;
> }
>  
> 
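A simplified host-side model of the shuffle: if the payload delivers each channel's double as two 32-bit halves in separate registers (low dwords in one, high dwords in the next), the shuffle interleaves them back into one 64-bit value per channel. This illustrates that layout assumption with hypothetical helper names; it is not the SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild one double from its low and high 32-bit halves. */
double
combine_halves(uint32_t lo, uint32_t hi)
{
   uint64_t bits = ((uint64_t)hi << 32) | lo;
   double d;
   memcpy(&d, &bits, sizeof(d));
   return d;
}

/* Split a double into its low and high 32-bit halves. */
void
split_double(double d, uint32_t *lo, uint32_t *hi)
{
   uint64_t bits;
   memcpy(&bits, &d, sizeof(bits));
   *lo = (uint32_t)bits;
   *hi = (uint32_t)(bits >> 32);
}

/* Interleave per-channel halves held in two "registers" of `width`
 * 32-bit channels each into `width` doubles. */
void
shuffle_to_doubles(const uint32_t *lo_reg, const uint32_t *hi_reg,
                   double *out, int width)
{
   for (int c = 0; c < width; c++)
      out[c] = combine_halves(lo_reg[c], hi_reg[c]);
}
```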





Re: [Mesa-dev] [PATCH 00/23] Finishing arb_gpu_shader_fp64 support to the i965 scalar backend

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 3, 2016 2:21:49 PM PDT Samuel Iglesias Gonsálvez wrote:
> Hello,
> 
> This series adds the final bits to support arb_gpu_shader_fp64 in the
> i965 scalar backend for BDW+ hardware. It sits on top of the previous
> series we sent last week [0] and which is going through review at the
> moment. Specifically, this series adds:
> 
> 1. Fixes to copy propagation required for the pass to do the right
>thing in various cases specific to fp64 code as well as fixes to
>other more generic bugs in the pass that we detected.
> 
> 2. Support for double ubo/ssbo/shared-variable load/store.
> 
> The series also includes a fix to the execmasking issue in the SIMD
> lowering pass that we discussed in the previous series (patch 10). As
> we explained in that series, Curro is working on a better solution but
> we decided to include it here so the problem is made explicit to
> reviewers and also for testing, since that is necessary to fix some
> problems. The plan is to go with Curro's patch when it is available and
> we can test it.
> 
> The series does not introduce any regressions in piglit on ILK, SNB,
> HSW, BDW or SKL compared with master's HEAD 5541e11b9.
> 
> Piglit's fp64 test totals with this series, as is, look like this:
> 
>pass: 2548
>fail:   43
>   crash:  425
>skip:   16
>   total: 3032
> 
> The crashes come from the lack of vec4 support required by the GS and
> TESS stages at the moment. The plan for this is to use scalar GS and
> TESS in gen8+ so these stages run through the scalar backend as well.
> This means that we need to make a decision to either use these by
> default in gen8+ or detect if we are using fp64 to enable them
> selectively. If we enable scalar GS, TES backends (INTEL_SCALAR_TES=1
> INTEL_SCALAR_GS=1), we obtain the following results:
> 
>pass: 2971
>fail:   43
>   crash:    2
>skip:   16
>   total: 3032
> 
> The failures and crashes observed are related to the register spilling
> issues we mentioned in the previous series (we still need to check if
> Curro's branch helps with that), so basically the tests fail to
> compile. The 2 crashes happen when we try to compile a couple of GS
> tests with the scalar backend and upon failure we attempt a vec4
> compilation. Since we don't support vec4 fp64 yet, that ends up hitting
> an assert. We think that we probably want to detect if GS/TESS programs
> use fp64 and if so avoid falling back to vec4 if the scalar compilation
> failed rather than attempting a backend that just does not implement
> the feature.
> 
> Notice that scalar TCS support is not in master yet but the patches
> have already been sent for review [2] and reviewed [3].
> 
> A branch with this series is available for testing here:
> 
> $ git clone -b i965-fp64-scalar-backend-part-2 https://github.com/Igalia/mesa.git
> 
> Thanks,
> 
> Sam
> 
> [0] https://lists.freedesktop.org/archives/mesa-dev/2016-April/115014.html
> [1] https://lists.freedesktop.org/archives/mesa-dev/2016-April/115216.html
> [2] https://lists.freedesktop.org/archives/mesa-dev/2016-April/114191.html
> [3] https://lists.freedesktop.org/archives/mesa-dev/2016-April/114968.html
> 
> 
> Iago Toral Quiroga (23):
>   i965/fs: fix subreg_offset overflow in byte_offset()
>   i965/fs: fix copy propagation from sources with stride 0
>   i965/fs: Fix copy propagation of load payload for double operands
>   i965/fs: fix requirements to allow type change in copy-propagation
>   i965/fs: fix copy-propagation with suboffset from constants
>   i965/fs: fix copy/constant propagation regioning checks
>   i965/fs: fix copy propagation of partially invalidated entries
>   i965/fs: fix copy propagation from load payload
>   i965/fs: don't copy propagate from a larger type if the stride is not 1
>   i965/fs/lower_simd_width: fix result transposition
>   i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles
>   i965/fs: fix pull constant load component selection for doubles
>   i965/fs: support doubles with UBO loads
>   i965/fs: support doubles with SSBO loads
>   i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper
>   i965/fs: Add do_untyped_vector_read helper
>   i965/fs: support doubles with shared variable loads
>   i965/fs: add SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE helper
>   i965/fs: support doubles with ssbo stores
>   i965/fs: support doubles with shared variable stores
>   i965: Enable ARB_gpu_shader_fp64 for gen8+
>   docs: Mark ARB_gpu_shader_fp64 as done for i965/gen8+
>   i965: Expose OpenGL 4.0 for gen8+
> 
>  docs/GL3.txt   |   2 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 156 ++-
>  src/mesa/drivers/dri/i965/brw_fs.h |  16 ++
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   

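The fallback policy floated in the cover letter above (do not retry with vec4 when a GS/TESS program uses fp64 and scalar compilation fails) could be sketched as a small decision function. This is only an illustration under stated assumptions: `toy_shader_info`, `toy_compile_result` and `toy_pick_backend` are hypothetical names, and the real decision would live in the i965 compile paths, whose exact shape is not shown here.

```c
#include <stdbool.h>

/* Hypothetical summary of a GS/TESS program; not a real Mesa structure. */
struct toy_shader_info {
   bool uses_fp64;            /* program uses double-precision types */
};

enum toy_compile_result {
   TOY_SCALAR_OK,             /* scalar backend produced code */
   TOY_TRY_VEC4,              /* fall back to the vec4 backend */
   TOY_COMPILE_FAILED         /* report a compile/link error instead */
};

/* If scalar compilation failed for a program that uses fp64, do not fall
 * back to vec4 (which has no fp64 support yet and would hit an assert);
 * fail cleanly instead. Programs without fp64 keep the old fallback. */
static enum toy_compile_result
toy_pick_backend(const struct toy_shader_info *info, bool scalar_succeeded)
{
   if (scalar_succeeded)
      return TOY_SCALAR_OK;
   if (info->uses_fp64)
      return TOY_COMPILE_FAILED;
   return TOY_TRY_VEC4;
}
```

Failing cleanly trades a crash for an ordinary compile error, which is what the two remaining GS crashes in the numbers above would turn into.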
[Mesa-dev] [PATCH v2] doxygen: Add missing modules to Windows runner

2016-05-10 Thread Elie TOURNIER
---
 doxygen/doxy.bat | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/doxygen/doxy.bat b/doxygen/doxy.bat
index e566ca3..408964e 100644
--- a/doxygen/doxy.bat
+++ b/doxygen/doxy.bat
@@ -6,6 +6,9 @@ doxygen swrast_setup.doxy
 doxygen tnl.doxy
 doxygen core.doxy
 doxygen glapi.doxy
+doxygen glsl.doxy
+doxygen nir.doxy
+doxygen i965.doxy
 
 echo Building again, to resolve tags
 doxygen tnl_dd.doxy
@@ -14,4 +17,8 @@ doxygen math.doxy
 doxygen swrast.doxy
 doxygen swrast_setup.doxy
 doxygen tnl.doxy
+doxygen core.doxy
 doxygen glapi.doxy
+doxygen glsl.doxy
+doxygen nir.doxy
+doxygen i965.doxy
-- 
1.9.1



Re: [Mesa-dev] [PATCHv2 10/23] i965/fs: Stop using the LOAD_PAYLOAD instruction in lower_simd_width.

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 3, 2016 9:26:13 PM PDT Francisco Jerez wrote:
> Instead of using the LOAD_PAYLOAD instruction (emitted through the
> emit_transpose() helper that is no longer useful and this commit
> removes) which had to be marked force_writemask_all in some cases,
> emit a series of moves to apply proper channel enable signals to the
> destination.  Until now lower_simd_width() had mainly been used to
> lower things that invariably had a basic block-local temporary as
> destination so it didn't seem like a big deal, but I found it to be
> the reason for several Piglit regressions in my SIMD32 branch and
> Igalia discovered the same issue independently while working on FP64
> support.
> ---
> This is taken from the following WIP series:
>   https://cgit.freedesktop.org/~currojerez/mesa/log/?h=i965-late-simd-lowering
> 
> See also:
>   https://lists.freedesktop.org/archives/mesa-dev/2016-May/115596.html
> 
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 58 +++-
>  1 file changed, 18 insertions(+), 40 deletions(-)

Reviewed-by: Kenneth Graunke 
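The failure mode described in the commit message can be illustrated with a toy model (an assumption for illustration, not Mesa's IR): treat a SIMD8 register as eight 32-bit channels and a MOV as writing only the channels enabled in an execution mask, unless force_writemask_all is set. When the destination is a live register rather than a block-local temporary, the forced write clobbers channels that non-uniform control flow had disabled. `toy_mov` and `SIMD_WIDTH` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIMD_WIDTH 8  /* toy SIMD8 register: one 32-bit value per channel */

/* Hypothetical model of a MOV: copy src into dst for every channel whose
 * bit is set in exec_mask; with force_writemask_all (WE_all), write every
 * channel regardless of the mask. */
static void toy_mov(uint32_t dst[SIMD_WIDTH], const uint32_t src[SIMD_WIDTH],
                    uint8_t exec_mask, bool force_writemask_all)
{
   for (int c = 0; c < SIMD_WIDTH; c++) {
      if (force_writemask_all || (exec_mask & (1u << c)))
         dst[c] = src[c];
   }
}
```

Inside an `if` where only channels 0-3 are active (mask 0x0F), the masked copy leaves channels 4-7 of the destination untouched, while the forced copy overwrites them with values the inactive channels never computed.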




[Mesa-dev] [v7 05/11] i965: Deferred allocation of mcs for lossless compressed

2016-05-10 Thread Topi Pohjolainen
Until now mcs was associated with single-sampled buffers only for
fast clear purposes, and it was therefore the responsibility of the
clear logic to allocate the aux buffer when needed. Now that normal
3D rendering and blorp blits may also render with mcs enabled, they
need to prepare the mcs as well.

v2: Do not enable for scanout buffers
v3 (Ben):
   - Fix typo in commit message.
   - Check for gen < 9 and return early in brw_predraw_set_aux_buffers()
   - Check for gen < 9 and return early in intel_miptree_prepare_mcs()
v4: Check for msaa_layout and number of samples to determine if
lossless compression is to be used. Otherwise one cannot distinguish
between fast clear with and without compression.

Signed-off-by: Topi Pohjolainen 
Reviewed-by: Ben Widawsky  (v3)
---
 src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  |  2 ++
 src/mesa/drivers/dri/i965/brw_draw.c  | 20 ++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 40 +++
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  4 +++
 4 files changed, 66 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
index ab2ceec..0672213 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
@@ -1881,6 +1881,8 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
intel_miptree_slice_resolve_depth(brw, src_mt, src_level, src_layer);
intel_miptree_slice_resolve_depth(brw, dst_mt, dst_level, dst_layer);
 
+   intel_miptree_prepare_mcs(brw, dst_mt);
+
DBG("%s from %dx %s mt %p %d %d (%f,%f) (%f,%f)"
"to %dx %s mt %p %d %d (%f,%f) (%f,%f) (flip %d,%d)\n",
__func__,
diff --git a/src/mesa/drivers/dri/i965/brw_draw.c b/src/mesa/drivers/dri/i965/brw_draw.c
index 9d034cf..dcbb681 100644
--- a/src/mesa/drivers/dri/i965/brw_draw.c
+++ b/src/mesa/drivers/dri/i965/brw_draw.c
@@ -391,6 +391,25 @@ brw_postdraw_set_buffers_need_resolve(struct brw_context *brw)
}
 }
 
+static void
+brw_predraw_set_aux_buffers(struct brw_context *brw)
+{
+   if (brw->gen < 9)
+  return;
+
+   struct gl_context *ctx = >ctx;
+   struct gl_framebuffer *fb = ctx->DrawBuffer;
+
+   for (unsigned i = 0; i < fb->_NumColorDrawBuffers; i++) {
+  struct intel_renderbuffer *irb =
+ intel_renderbuffer(fb->_ColorDrawBuffers[i]);
+
+  if (irb) {
+ intel_miptree_prepare_mcs(brw, irb->mt);
+  }
+   }
+}
+
 /* May fail if out of video memory for texture or vbo upload, or on
  * fallback conditions.
  */
@@ -438,6 +457,7 @@ brw_try_draw_prims(struct gl_context *ctx,
   _mesa_fls(ctx->VertexProgram._Current->Base.SamplersUsed);
 
intel_prepare_render(brw);
+   brw_predraw_set_aux_buffers(brw);
 
/* This workaround has to happen outside of brw_upload_render_state()
 * because it may flush the batchbuffer for a blit, affecting the state
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 8f6dc24..0b432ec 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -1622,6 +1622,46 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
return mt->mcs_mt;
 }
 
+void
+intel_miptree_prepare_mcs(struct brw_context *brw,
+  struct intel_mipmap_tree *mt)
+{
+   if (mt->mcs_mt)
+  return;
+
+   if (brw->gen < 9)
+  return;
+
+   /* Single sample compression is represented re-using msaa compression
+* layout type: "Compressed Multisampled Surfaces".
+*/
+   if (mt->msaa_layout != INTEL_MSAA_LAYOUT_CMS || mt->num_samples > 1)
+  return;
+
+   /* Clients are not currently capable of consuming compressed
+* single-sampled buffers.
+*/
+   if (mt->is_scanout)
+  return;
+
+   assert(intel_tiling_supports_non_msrt_mcs(brw, mt->tiling) ||
+  intel_miptree_supports_lossless_compressed(brw, mt));
+
+   /* Consider if lossless compression is supported but the needed
+* auxiliary buffer doesn't exist yet.
+*
+* Failing to allocate the auxiliary buffer means running out of
+* memory. The pointer to the aux miptree is left NULL which should
+* signal non-compressed behavior.
+*/
+   if (!intel_miptree_alloc_non_msrt_mcs(brw, mt)) {
+  _mesa_warning(NULL,
+"Failed to allocate aux buffer for lossless"
+" compressed %p %u:%u %s\n",
+mt, mt->logical_width0, mt->logical_height0,
+_mesa_get_format_name(mt->format));
+   }
+}
 
 /**
  * Helper for intel_miptree_alloc_hiz() that sets
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index 7f6e771..4fb5b69 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -693,6 +693,10 @@ bool
 

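The early-out order in intel_miptree_prepare_mcs() amounts to a lazy, allocate-on-first-use policy. The sketch below restates it with hypothetical simplified types (`toy_miptree`, `toy_prepare_mcs`) and uses `calloc()` as a stand-in for intel_miptree_alloc_non_msrt_mcs(). Only the conditions are taken from the patch; the real function returns void, and the bool return here exists only to make the sketch testable.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for brw_context and
 * intel_mipmap_tree; only the fields the early-outs look at are kept. */
enum toy_msaa_layout { TOY_LAYOUT_NONE, TOY_LAYOUT_CMS };

struct toy_miptree {
   int gen;                        /* hardware generation */
   enum toy_msaa_layout msaa_layout;
   unsigned num_samples;
   bool is_scanout;
   void *mcs;                      /* aux buffer, allocated lazily */
};

/* Same order of checks as the patch: already allocated, pre-gen9, not a
 * single-sampled CMS surface, or a scanout buffer => nothing to do.
 * Otherwise try to allocate; on failure mcs stays NULL, which callers
 * treat as "render uncompressed". Returns true if an aux buffer exists. */
static bool toy_prepare_mcs(struct toy_miptree *mt, size_t aux_size)
{
   if (mt->mcs)
      return true;
   if (mt->gen < 9)
      return false;
   if (mt->msaa_layout != TOY_LAYOUT_CMS || mt->num_samples > 1)
      return false;
   if (mt->is_scanout)
      return false;
   mt->mcs = calloc(1, aux_size);  /* stand-in for the real allocation */
   return mt->mcs != NULL;
}
```

Because brw_predraw_set_aux_buffers() and the blorp blit path call the function before every draw or blit, the first call allocates and every later call returns on the `mcs` check.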
Re: [Mesa-dev] report ARB_cull_distance v3

2016-05-10 Thread Tobias Klausmann


On 08.05.2016 23:29, Tobias Klausmann wrote:

On 08.05.2016 22:50, Ilia Mirkin wrote:

What exactly gets fed into the CLIPDIST and CULLDIST semantics? e.g.
is CULLDIST[0].x the first cull distance, or is it the first entity in
the combined cull/clip distance array? If the former, then this won't
work as implemented on nouveau. If the latter, then why bother with
the separate semantics in the first place?


It's the latter case, as you mostly already suspected. Everything is
lowered to gl_ClipDistanceMESA and comes in with CLIPDIST. Right now
only the properties [1], [2] are used to configure clip/cull within
the nvc0 part. For now I kept them because somebody "could" face
hardware configured differently, compared to the NVIDIA one with its
combined clip/cull.


[1] 
https://cgit.freedesktop.org/mesa/mesa/commit/?id=e70c66197ea10cf052010c7352420a2ae0b0a50a
[2] 
https://cgit.freedesktop.org/mesa/mesa/commit/?id=5227e915803079e5e72a0b2fde3a11d62af8df99


Greetings,
Tobias


So, shall we reach a verdict on the CULLDIST semantics? Keep it or
try to get rid of it?
(Adding Marek and Dave, but feel free to add more people to this
conversation directly)
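To make the "combined array" answer concrete: assuming the lowering packs the cull distances immediately after the clip distances, and that the combined array is read as vec4 CLIPDIST slots holding at most 8 entries, the slot and component of a given cull distance follow from plain integer arithmetic. The function name and the 8-entry limit are assumptions of this sketch, not taken from the patches.

```c
#include <assert.h>

/* Position of one distance in the combined array, viewed as vec4 slots
 * (CLIPDIST[0] holds elements 0..3, CLIPDIST[1] holds elements 4..7). */
struct dist_location {
   unsigned slot;       /* which CLIPDIST vec4 */
   unsigned component;  /* 0 = x, 1 = y, 2 = z, 3 = w */
};

/* Assumed layout: clip distances first, cull distances right after,
 * at most 8 entries in total. */
static struct dist_location
cull_distance_location(unsigned num_clip, unsigned cull_index)
{
   unsigned combined = num_clip + cull_index;
   assert(combined < 8);
   struct dist_location loc = { combined / 4, combined % 4 };
   return loc;
}
```

With three clip distances, the first cull distance lands in CLIPDIST[0].w (slot 0, component 3) rather than in CULLDIST[0].x, which is the distinction Ilia asked about.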






On Sun, May 8, 2016 at 4:44 PM, Tobias Klausmann wrote:
After the cleanup of my patches in v2, this is another take on finishing this extension.

v2: cleanup, reordering of patches, split lowering pass adaptation (Dave Airlie)

v3:
  - drop wrong code section for array size check (suggested by Timothy Arceri) and
    with it the now useless helper to see if an array was unsized
  - fix GL3.txt, add release note

Dave Airlie (1):
   glsl: rename lower_clip_distance to lower_distance.

Tobias Klausmann (10):
   glapi: Add GL_ARB_cull_distance
   mesa/main: Add support for GL_ARB_cull_distance (v2)
   mesa/prog: Add varyings for arb_cull_distance
   glsl: Extend lowering pass for gl_ClipDistance to support other arrays (v2)
   glsl: Add arb_cull_distance support
   gallium: Add a pipe cap for arb_cull_distance
   mesa/st: Add support for GL_ARB_cull_distance
   nv50/ir: Check for TGSI_SEMANTIC_CULLDIST in tgsi declarations
   llvmpipe: Enable already implemented cull_distance
   nvc0: Implement cull_distance as a special form of clip distance

  docs/GL3.txt   |   2 +-
  docs/relnotes/11.3.0.html  |   1 +
  src/compiler/Makefile.sources  |   2 +-
  src/compiler/glsl/ast_to_hir.cpp   |  14 +
  src/compiler/glsl/builtin_variables.cpp|  11 +-
  src/compiler/glsl/glcpp/glcpp-parse.y  |   3 +
  src/compiler/glsl/glsl_parser_extras.cpp   |   1 +
  src/compiler/glsl/glsl_parser_extras.h |   2 +
  src/compiler/glsl/ir_optimization.h|   3 +-
  src/compiler/glsl/link_varyings.cpp|  12 +-
  src/compiler/glsl/link_varyings.h  |   1 +
  src/compiler/glsl/linker.cpp   | 113 +++-
  src/compiler/glsl/lower_clip_distance.cpp  | 574 
  src/compiler/glsl/lower_distance.cpp   | 601 +
  src/compiler/glsl/standalone_scaffolding.cpp   |   1 +
  src/compiler/glsl/tests/varyings_test.cpp  |  27 +
  src/compiler/shader_enums.h|   4 +
  src/gallium/docs/source/screen.rst |   2 +
  src/gallium/drivers/freedreno/freedreno_screen.c   |   1 +
  src/gallium/drivers/i915/i915_screen.c |   1 +
  src/gallium/drivers/ilo/ilo_screen.c   |   1 +
  src/gallium/drivers/llvmpipe/lp_screen.c   |   2 +
  .../drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp  |   3 +
  src/gallium/drivers/nouveau/nv30/nv30_screen.c |   1 +
  src/gallium/drivers/nouveau/nv50/nv50_screen.c |   1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_program.c|   2 +
  src/gallium/drivers/nouveau/nvc0/nvc0_program.h|   1 +
  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c |   1 +
  src/gallium/drivers/r300/r300_screen.c |   1 +
  src/gallium/drivers/r600/r600_pipe.c   |   1 +
  src/gallium/drivers/radeonsi/si_pipe.c |   1 +
  src/gallium/drivers/softpipe/sp_screen.c   |   1 +
  src/gallium/drivers/svga/svga_screen.c |   1 +
  src/gallium/drivers/vc4/vc4_screen.c   |   1 +
  src/gallium/include/pipe/p_defines.h   |   1 +
  src/mapi/glapi/gen/gl_API.xml  |   7 +-
  src/mesa/drivers/dri/i965/brw_compiler.c   |   2 +-
  src/mesa/main/extensions_table.h   |   1 +
  src/mesa/main/get.c|   1 +
  src/mesa/main/get_hash_params.py   |   4 +
  src/mesa/main/mtypes.h |  14 +-
  src/mesa/main/shaderapi.c  |   3 +
  src/mesa/program/prog_print.c  |   4 +
  src/mesa/state_tracker/st_extensions.c  

Re: [Mesa-dev] [PATCH 01/17] scons: Build NIR.

2016-05-10 Thread Emil Velikov
On 9 May 2016 at 20:33, Rob Clark  wrote:
> From: Jose Fonseca 
>
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/SConscript | 57 +++--
>  1 file changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/SConscript b/src/compiler/SConscript
> index 10c79c4..dde4dfd 100644
> --- a/src/compiler/SConscript
> +++ b/src/compiler/SConscript
> @@ -1,5 +1,7 @@
>  Import('*')
>
> +from sys import executable as python_cmd
> +
>  env = env.Clone()
>
>  env.MSVC2013Compat()
> @@ -11,13 +13,64 @@ env.Prepend(CPPPATH = [
>  '#src/mesa',
>  '#src/gallium/include',
>  '#src/gallium/auxiliary',
> +'#src/compiler',
> +'#src/compiler/nir',
> +])
> +
> +
> +# Make generated headers reachable from the include path.
> +env.Append(CPPPATH = [
> +   Dir('nir').abspath
>  ])
>
> -sources = env.ParseSourceList('Makefile.sources', 'LIBCOMPILER_FILES')
> +# nir generated sources
> +
> +nir_builder_opcodes_h = env.CodeGenerate(
> +target = 'nir/nir_builder_opcodes.h',
> +script = 'nir/nir_builder_opcodes_h.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_constant_expressions.c',
> +script = 'nir/nir_constant_expressions.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opcodes.h',
> +script = 'nir/nir_opcodes_h.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opcodes.c',
> +script = 'nir/nir_opcodes_c.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opt_algebraic.c',
> +script = 'nir/nir_algebraic.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +# parse Makefile.sources
> +source_lists = env.ParseSourceList('Makefile.sources')
> +
> +nir_sources = []
> +nir_sources += source_lists['LIBCOMPILER_FILES']
> +nir_sources += source_lists['NIR_FILES']
> +nir_sources += source_lists['NIR_GENERATED_FILES']
>
>  compiler = env.ConvenienceLibrary(
>  target = 'compiler',
> -source = sources
> +source = nir_sources
>  )
>  Export('compiler')
>
NIR already has scons build support. One just needs to add the static
(convenience in scons speak) library 'nir' into the respective
place(s). Something like the following untested hunk should do it. And
yes, it is a bit nasty looking.

-Emil

diff --git a/src/compiler/SConscript.glsl b/src/compiler/SConscript.glsl
index 43a11d1..4e5133b 100644
--- a/src/compiler/SConscript.glsl
+++ b/src/compiler/SConscript.glsl
@@ -64,6 +64,8 @@ if env['msvc']:
 env.Prepend(CPPPATH = ['#/src/getopt'])
 env.PrependUnique(LIBS = [getopt])

+env.Prepend(LIBS = [nir])
+
 # Copy these files to avoid generation object files into src/mesa/program
 env.Prepend(CPPPATH = ['#src/mesa/main'])
 env.Command('glsl/imports.c', '#src/mesa/main/imports.c', Copy('$TARGET', '$SOURCE'))


Re: [Mesa-dev] [PATCH 18/23] i965/fs: add SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE helper

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 3, 2016 2:22:07 PM PDT Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga 
> 
> This does the inverse operation of SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA
> and we will use it when we need to write 64-bit data in the layout expected
> by untyped write messages.
> 
> Again, this needs to operate with WE_all set for the same reasons as the
> inverse operation.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++
>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 +
>  2 files changed, 47 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 709e4b8..80803a6 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -285,6 +285,48 @@ fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder ,
>  }
>  
>  /**
> + * This helper does the inverse operation of
> + * SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA.
> + *
> + * We need to do this when we are going to use untyped write messages that
> + * operate with 32-bit components in order to arrange our 64-bit data to be
> + * in the expected layout.
> + */
> +void
> +fs_visitor::SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE(const fs_builder ,
> +   const fs_reg dst,
> +   const fs_reg src,
> +   uint32_t components)
> +{
> +   int multiplier = bld.dispatch_width() / 8;
> +
> +   /* A temporary that we will use to shuffle the 64-bit data of each
> +* component in the vector into 32-bit data that we can write.
> +*/
> +   fs_reg tmp =
> +  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to operate the source in units of 32-bit */
> +   fs_reg src_data = retype(src, BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to operate on the dst in units of 64-bit */
> +   fs_reg dst_data = retype(dst, BRW_REGISTER_TYPE_DF);
> +

   const fs_builder nomask_bld = bld.exec_all();

> +   /* Shuffle the data */
> +   for (unsigned i = 0; i < components; i++) {
> +  fs_reg component_i = horiz_offset(src_data, multiplier * 16 * i);
> +
> +  bld.MOV(tmp, stride(component_i, 2))->force_writemask_all = true;
> +  bld.MOV(horiz_offset(tmp, 8 * multiplier),
> +  stride(horiz_offset(component_i, 1), 2))
> + ->force_writemask_all = true;
> +
> +  bld.MOV(horiz_offset(dst_data, multiplier * 8 * i),
> +  retype(tmp, BRW_REGISTER_TYPE_DF))->force_writemask_all = true;

nomask_bld.MOV(...);

> +   }
> +}
> +
> +/**
>   * A helper for MOV generation for fixing up broken hardware SEND dependency
>   * handling.
>   */
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 1e78f0c..9178347 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -111,6 +111,11 @@ public:
>  const fs_reg src,
>  uint32_t components);
>  
> +   void SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE(const brw::fs_builder ,
> +   const fs_reg dst,
> +   const fs_reg src,
> +   uint32_t components);
> +
> void do_untyped_vector_read(const brw::fs_builder ,
> const fs_reg surf_index,
> const fs_reg offset_reg,
> 
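Why the write-side shuffle needs WE_all can be seen in a toy SIMD8 model (hypothetical names; one 64-bit component; a 0 in `tmp` stands for "never written"). Each loop models one of the patch's MOVs executed per channel. The final DF-typed MOV in channel c copies dwords 2c and 2c+1 of the temporary, which the earlier float MOVs wrote from other channels, so data crosses channels and a masked execution loses part of it.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SIMD8 8  /* dispatch width 8, i.e. multiplier == 1 */

/* Toy SHUFFLE_32BIT_DATA_FOR_64BIT_WRITE for one 64-bit component.
 * src: 8 doubles viewed as 16 dwords (lo0 hi0 lo1 hi1 ...).
 * dst: layout untyped writes expect (lo0..lo7, then hi0..hi7).
 * A channel executes a MOV only if it is in exec_mask or write_all. */
static void toy_shuffle_for_write(uint32_t dst[2 * TOY_SIMD8],
                                  const uint32_t src[2 * TOY_SIMD8],
                                  uint8_t exec_mask, bool write_all)
{
   uint32_t tmp[2 * TOY_SIMD8] = {0};

   for (int c = 0; c < TOY_SIMD8; c++) {
      bool on = write_all || (exec_mask & (1u << c));
      if (on) {
         tmp[c] = src[2 * c];                  /* MOV tmp, stride(src, 2) */
         tmp[TOY_SIMD8 + c] = src[2 * c + 1];  /* MOV tmp + 8, stride(src + 1, 2) */
      }
   }
   /* MOV as DF: channel c copies the 64-bit pair tmp[2c], tmp[2c + 1]. */
   for (int c = 0; c < TOY_SIMD8; c++) {
      bool on = write_all || (exec_mask & (1u << c));
      if (on) {
         dst[2 * c] = tmp[2 * c];
         dst[2 * c + 1] = tmp[2 * c + 1];
      }
   }
}
```

With only channels 0 and 1 enabled, the forced version still produces lo0..lo7 followed by hi0..hi7, while the masked version never writes the high-dword half at all (dst[8] stays 0 instead of hi0), even though channel 0 is active.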





Re: [Mesa-dev] [PATCH 15/23] i965/fs: add SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA helper

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 3, 2016 2:22:04 PM PDT Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga 
> 
> There are a few places where we need to shuffle the result of a 32-bit load
> into valid 64-bit data, so extract this logic into a separate helper that we
> can reuse.
> 
> Also, the shuffling needs to operate with WE_all set, which we were missing
> before, because we are changing the layout of the data across the various
> channels. Otherwise we will run into problems in non-uniform control-flow
> scenarios.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 95 +---
>  src/mesa/drivers/dri/i965/brw_fs.h   |  5 ++
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 46 ++--
>  3 files changed, 73 insertions(+), 73 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index dff13ea..709e4b8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -216,39 +216,8 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
>  
> vec4_result.type = dst.type;
>  
> -   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. If we
> -* are reading doubles this means that we get this:
> -*
> -*  r0: x0 x0 x0 x0 x0 x0 x0 x0
> -*  r1: x1 x1 x1 x1 x1 x1 x1 x1
> -*  r2: y0 y0 y0 y0 y0 y0 y0 y0
> -*  r3: y1 y1 y1 y1 y1 y1 y1 y1
> -*
> -* Fix this up so we return valid double elements:
> -*
> -*  r0: x0 x1 x0 x1 x0 x1 x0 x1
> -*  r1: x0 x1 x0 x1 x0 x1 x0 x1
> -*  r2: y0 y1 y0 y1 y0 y1 y0 y1
> -*  r3: y0 y1 y0 y1 y0 y1 y0 y1
> -*/
> -   if (type_sz(dst.type) == 8) {
> -  int multiplier = bld.dispatch_width() / 8;
> -  fs_reg fixed_res =
> - fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> -  /* We only have 2 doubles in a 32-bit vec4 */
> -  for (int i = 0; i < 2; i++) {
> - fs_reg vec4_float =
> -horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
> - multiplier * 16 * i);
> -
> - bld.MOV(stride(fixed_res, 2), vec4_float);
> - bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
> - horiz_offset(vec4_float, 8 * multiplier));
> -
> - bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
> - retype(fixed_res, BRW_REGISTER_TYPE_DF));
> -  }
> -   }
> +   if (type_sz(dst.type) == 8)
> +  SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(bld, vec4_result, vec4_result, 2);
>  
> int type_slots = MAX2(type_sz(dst.type) / 4, 1);
> bld.MOV(dst, offset(vec4_result, bld,
> @@ -256,6 +225,66 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
>  }
>  
>  /**
> + * This helper takes the result of a load operation that reads 32-bit elements
> + * in this format:
> + *
> + * x x x x x x x x
> + * y y y y y y y y
> + * z z z z z z z z
> + * w w w w w w w w
> + *
> + * and shuffles the data to get this:
> + *
> + * x y x y x y x y
> + * x y x y x y x y
> + * z w z w z w z w
> + * z w z w z w z w
> + *
> + * Which is exactly what we want if the load is reading 64-bit components
> + * like doubles, where x represents the low 32 bits of the x double component
> + * and y represents the high 32 bits of the x double component (likewise with
> + * z and w for double component y). The parameter @components represents
> + * the number of 64-bit components present in @src. This would typically be
> + * 2 at most, since we can only fit 2 double elements in the result of a
> + * vec4 load.
> + *
> + * Notice that @dst and @src can be the same register.
> + */
> +void
> +fs_visitor::SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA(const fs_builder ,
> +const fs_reg dst,
> +const fs_reg src,
> +uint32_t components)
> +{
> +   int multiplier = bld.dispatch_width() / 8;
> +
> +   /* A temporary that we will use to shuffle the 32-bit data of each
> +* component in the vector into valid 64-bit data
> +*/
> +   fs_reg tmp =
> +  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to manipulate the data in elements of 32-bit */
> +   fs_reg src_data = retype(src, BRW_REGISTER_TYPE_F);
> +
> +   /* We are going to manipulate the dst in elements of 64-bit */
> +   fs_reg dst_data = retype(dst, BRW_REGISTER_TYPE_DF);

How about:

const fs_builder nomask_bld = bld.exec_all();

...

nomask_bld.MOV(...);

instead of bld.MOV(...)->force_writemask_all = true?


> +
> +   /* Shuffle the data */
> +   for (unsigned i = 0; i < components; i++) {
> +  fs_reg component_i = horiz_offset(src_data, multiplier * 16 * i);
> +
> +  bld.MOV(stride(tmp, 2), component_i)->force_writemask_all = true;
> +  bld.MOV(stride(horiz_offset(tmp, 1), 2),
> +  

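A toy SIMD8 model of the load-side shuffle makes the layout in the patch comment concrete. It assumes one 64-bit component and a little-endian host, and `toy_shuffle_load_result` is a hypothetical name. The load returns all low dwords followed by all high dwords, and interleaving them turns every 64-bit lane into one complete double.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SIMD8 8  /* dispatch width 8, i.e. multiplier == 1 */

/* Toy SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA for one 64-bit component.
 * src: low dwords of all 8 channels, then high dwords ("x..x / y..y").
 * dst: interleaved "x y x y ...", one complete 64-bit value per lane.
 * dst may alias src: everything goes through a temporary first. */
static void toy_shuffle_load_result(uint32_t *dst, const uint32_t *src)
{
   uint32_t tmp[2 * TOY_SIMD8];

   for (int c = 0; c < TOY_SIMD8; c++) {
      tmp[2 * c] = src[c];                  /* low half of channel c */
      tmp[2 * c + 1] = src[TOY_SIMD8 + c];  /* high half of channel c */
   }
   memcpy(dst, tmp, sizeof(tmp));
}
```

The in-place call is allowed for the same reason the patch states that @dst and @src can be the same register: the temporary decouples the reads from the writes.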
Re: [Mesa-dev] [PATCH 11/23] i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 10, 2016 12:07:26 PM PDT Kenneth Graunke wrote:
> On Tuesday, May 3, 2016 2:22:00 PM PDT Samuel Iglesias Gonsálvez wrote:
> > From: Iago Toral Quiroga 
> > 
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++--
> >  1 file changed, 47 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index bc81a80..0e69be8 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -193,8 +193,15 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
> > else
> >op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
> >  
> > +   /* The pull load message will load a vec4 (16 bytes). If we are loading
> > +* a double this means we are only loading 2 elements worth of data.
> > +* We also want to use a 32-bit data type for the dst of the load operation
> > +* so other parts of the driver don't get confused about the size of the
> > +* result.
> > +*/
> > int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
> > -   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written), dst.type);
> > +   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written),
> > +   BRW_REGISTER_TYPE_F);
> > fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
> > inst->regs_written = regs_written;
> >  
> > @@ -207,7 +214,45 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder ,
> >   inst->mlen = 1 + bld.dispatch_width() / 8;
> > }
> >  
> > -   bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
> > +   vec4_result.type = dst.type;
> > +
> > +   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. If we
> > +* are reading doubles this means that we get this:
> > +*
> > +*  r0: x0 x0 x0 x0 x0 x0 x0 x0
> > +*  r1: x1 x1 x1 x1 x1 x1 x1 x1
> > +*  r2: y0 y0 y0 y0 y0 y0 y0 y0
> > +*  r3: y1 y1 y1 y1 y1 y1 y1 y1
> > +*
> > +* Fix this up so we return valid double elements:
> > +*
> > +*  r0: x0 x1 x0 x1 x0 x1 x0 x1
> > +*  r1: x0 x1 x0 x1 x0 x1 x0 x1
> > +*  r2: y0 y1 y0 y1 y0 y1 y0 y1
> > +*  r3: y0 y1 y0 y1 y0 y1 y0 y1
> > +*/
> 
> I think this could be simplified a little...
> 
> > +   if (type_sz(dst.type) == 8) {
> > +  int multiplier = bld.dispatch_width() / 8;
> > +  fs_reg fixed_res =
> > + fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> > +  /* We only have 2 doubles in a 32-bit vec4 */
> > +  for (int i = 0; i < 2; i++) {
> > + fs_reg vec4_float =
> > +horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
> > + multiplier * 16 * i);
> > +
> > + bld.MOV(stride(fixed_res, 2), vec4_float);
> 
> ^^^ copies x0 or y0 into fixed_res
> 
> > + bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
> > + horiz_offset(vec4_float, 8 * multiplier));
> > +
> ^^^ copies x1 or y1 into fixed_res
> 
> > + bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
> > + retype(fixed_res, BRW_REGISTER_TYPE_DF));
> 
> This just copies fixed_res back over vec4_result?  I don't think we need
> to do this - vec4_result is just a temporary.  We really want the final
> result in dst.
> 
> > +  }
> > +   }
> > +
> > +   int type_slots = MAX2(type_sz(dst.type) / 4, 1);
> > +   bld.MOV(dst, offset(vec4_result, bld,
> > +   ((const_offset & 0xf) / (4 * type_slots)) * scale));
> >  }
> 
> How about we simplify this to:
> 
>if (type_sz(dst.type) == 8) {
>   int multiplier = bld.dispatch_width() / 8;
>   fs_reg fixed_res =
>  fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
>   /* We only have 2 doubles in a 32-bit vec4 */
>   for (int i = 0; i < 2; i++) {
>  fs_reg vec4_float =
> horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
>  multiplier * 16 * i);
> 
>  bld.MOV(stride(fixed_res, 2), vec4_float);
>  bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
>  horiz_offset(vec4_float, 8 * multiplier));
> 
>  // Differences start here: no extra MOV, simplify the end a bit
>   }
> 
>   assert(scale == 1);
>   bld.MOV(dst, offset(fixed_res, bld, (const_offset & 0xf) / 8));
>} else {
>   bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
>}
> 
> With that change,
> Reviewed-by: Kenneth Graunke 

Eh...now that I've read a bit further...I suppose the equivalent change
would be to make SHUFFLE_32BIT_LOAD_RESULT_TO_64BIT_DATA return a register
rather than copying to a dst parameter.  And the other callers seem OK
as is.

We can always clean things up later, I'd rather not 

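The index arithmetic in the patch follows from the load returning one vec4 (16 bytes): `const_offset & 0xf` is the byte offset inside that vec4, which holds four 32-bit values or two doubles. The helper below restates the patch's expression with `scale` taken as 1 (Ken's double path asserts `scale == 1`). The function name is hypothetical.

```c
/* Index of the value to extract from one 16-byte pull-constant load,
 * following ((const_offset & 0xf) / (4 * type_slots)) * scale with
 * scale == 1. type_size is 4 for float/int and 8 for double. */
static unsigned pull_load_component(unsigned const_offset, unsigned type_size)
{
   /* MAX2(type_sz(dst.type) / 4, 1) in the patch */
   unsigned type_slots = type_size / 4 > 1 ? type_size / 4 : 1;
   return (const_offset & 0xf) / (4 * type_slots);
}
```

For doubles, `4 * type_slots` is 8, so the expression reduces to `(const_offset & 0xf) / 8`, the simplified form proposed in the review; a double at byte 24 of the buffer is the second (index 1) value of its vec4.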
Re: [Mesa-dev] [PATCH 01/17] scons: Build NIR.

2016-05-10 Thread Eric Anholt
Rob Clark  writes:

> From: Jose Fonseca 
>
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/SConscript | 57 +++--
>  1 file changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/SConscript b/src/compiler/SConscript
> index 10c79c4..dde4dfd 100644
> --- a/src/compiler/SConscript
> +++ b/src/compiler/SConscript
> @@ -1,5 +1,7 @@
>  Import('*')
>  
> +from sys import executable as python_cmd
> +
>  env = env.Clone()
>  
>  env.MSVC2013Compat()
> @@ -11,13 +13,64 @@ env.Prepend(CPPPATH = [
>  '#src/mesa',
>  '#src/gallium/include',
>  '#src/gallium/auxiliary',
> +'#src/compiler',
> +'#src/compiler/nir',
> +])
> +
> +
> +# Make generated headers reachable from the include path.
> +env.Append(CPPPATH = [
> +   Dir('nir').abspath
>  ])
>  
> -sources = env.ParseSourceList('Makefile.sources', 'LIBCOMPILER_FILES')
> +# nir generated sources
> +
> +nir_builder_opcodes_h = env.CodeGenerate(
> +target = 'nir/nir_builder_opcodes.h',
> +script = 'nir/nir_builder_opcodes_h.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_constant_expressions.c',
> +script = 'nir/nir_constant_expressions.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opcodes.h',
> +script = 'nir/nir_opcodes_h.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opcodes.c',
> +script = 'nir/nir_opcodes_c.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +env.CodeGenerate(
> +target = 'nir/nir_opt_algebraic.c',
> +script = 'nir/nir_algebraic.py',
> +source = [],
> +command = python_cmd + ' $SCRIPT > $TARGET'
> +)
> +
> +# parse Makefile.sources
> +source_lists = env.ParseSourceList('Makefile.sources')
> +
> +nir_sources = []
> +nir_sources += source_lists['LIBCOMPILER_FILES']
> +nir_sources += source_lists['NIR_FILES']
> +nir_sources += source_lists['NIR_GENERATED_FILES']
>  
>  compiler = env.ConvenienceLibrary(
>  target = 'compiler',
> -source = sources
> +source = nir_sources
>  )
>  Export('compiler')

Possibly s/nir_sources/sources/ to reduce diff, but either way,

Reviewed-by: Eric Anholt 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] compiler: guard list iteration macros against undefined behavior

2016-05-10 Thread Ian Romanick
On 04/30/2016 12:24 AM, Nicolai Hähnle wrote:
> From: Nicolai Hähnle 
> 
> The old iteration casts sentinel nodes (which are mere exec_nodes) into
> whatever type we're looping over, which leads to badness (in fact, gcc's
> undefined behaviour sanitizer crashes while trying to verify that we have
> the correct type at hand).

Bwah.  I was going to suggest a different method, but I found that my
other ideas wouldn't work for various reasons.

> These modified looping constructs postpone the cast until its correctness
> has been established. The odd two-level loop construct is used to be able
> to define variables of different types, and the __sentinel variable
  ^^
You mean __flag, right?

> ensures that the outer loop is only run once. gcc is able to optimize the
> outer loop away in the cases I have examined.

Does 'size' report any difference on the resulting .so?

> ---
>  src/compiler/glsl/list.h | 109 
> ---
>  1 file changed, 64 insertions(+), 45 deletions(-)
> 
> diff --git a/src/compiler/glsl/list.h b/src/compiler/glsl/list.h
> index a1c4d82..8da9514 100644
> --- a/src/compiler/glsl/list.h
> +++ b/src/compiler/glsl/list.h
> @@ -621,36 +621,55 @@ inline void exec_node::insert_before(exec_list *before)
>  }
>  #endif
>  
> -#define foreach_in_list(__type, __inst, __list)  \
> -   for (__type *(__inst) = (__type *)(__list)->head; \
> -!(__inst)->is_tail_sentinel();   \
> -(__inst) = (__type *)(__inst)->next)
> -
> -#define foreach_in_list_reverse(__type, __inst, __list)   \
> -   for (__type *(__inst) = (__type *)(__list)->tail_pred; \
> -!(__inst)->is_head_sentinel();\
> -(__inst) = (__type *)(__inst)->prev)
> +/* The somewhat odd-looking multi-loop construct here is to avoid casting
> + * sentinel nodes, which would be undefined behavior (which is indeed flagged /
> + * leads to crashes with gcc's ubsan).
> + */
> +#define foreach_in_list(__type, __node, __list)  \
> +   for (__type *(__node), **__flag = &(__node);  \
> +__flag; __flag = NULL)   \
> +  for (struct exec_node *__cur = (__list)->head; \
> +   !__cur->is_tail_sentinel() && \
> +   (((__node) = (__type *) __cur) || true);  \
> +   __cur = __cur->next)  \
> +
> +#define foreach_in_list_reverse(__type, __node, __list)   \
> +   for (__type *(__node), **__flag = &(__node);   \
> +__flag; __flag = NULL)\
> +  for (struct exec_node *__cur = (__list)->tail_pred; \
> +   !__cur->is_head_sentinel() &&  \
> +   (((__node) = (__type *) __cur) || true);   \
> +   __cur = __cur->prev)
>  
>  /**
>   * This version is safe even if the current node is removed.
>   */ 
>  #define foreach_in_list_safe(__type, __node, __list) \
> -   for (__type *__node = (__type *)(__list)->head,   \
> -   *__next = (__type *)__node->next; \
> -__next != NULL;  \
> -__node = __next, __next = (__type *)__next->next)
> +   for (__type *(__node), **__flag = &(__node);  \
> +__flag; __flag = NULL)   \
> +  for (struct exec_node *__cur = (__list)->head, \
> +*__next = __cur->next;   \
> +   __next != NULL && \
> +   (((__node) = (__type *) __cur) || true);  \
> +   __cur = __next, __next = __next->next)
>  
>  #define foreach_in_list_reverse_safe(__type, __node, __list) \
> -   for (__type *__node = (__type *)(__list)->tail_pred,  \
> -   *__prev = (__type *)__node->prev; \
> -__prev != NULL;  \
> -__node = __prev, __prev = (__type *)__prev->prev)
> -
> -#define foreach_in_list_use_after(__type, __inst, __list) \
> -   __type *(__inst);  \
> -   for ((__inst) = (__type *)(__list)->head;  \
> -!(__inst)->is_tail_sentinel();\
> -(__inst) = (__type *)(__inst)->next)
> +   for (__type *(__node), **__flag = &(__node);  \
> +__flag; __flag = NULL)   \
> +  for (struct exec_node *__cur = (__list)->tail_pred,\
> +*__prev = __cur->prev;   \
> +   __prev != NULL && \
> +   (((__node) = (__type *) __cur) || true);  \
> +   __cur = __prev, __prev = __prev->prev)
> +
> +#define foreach_in_list_use_after(__type, __node, __list) \
> +   __type *(__node);  \
> +   for (__type **__flag = &(__node);  \
> +__flag; __flag = NULL)\
> +  for 
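
For readers unfamiliar with the trick, the two-level loop described in the commit message can be tried outside Mesa. The following is a minimal C sketch, not the list.h code: struct node, struct list, node_is_tail_sentinel() and this foreach_in_list are hypothetical simplifications. It only mirrors the idea of declaring the typed iteration variable in a run-once outer loop and casting a node to the element type after the sentinel check has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal intrusive list, loosely modeled on exec_list:
 * the head and tail sentinels are bare nodes, never real elements. */
struct node {
   struct node *next;
   struct node *prev;
};

struct list {
   struct node head_sentinel;
   struct node tail_sentinel;
};

static void list_init(struct list *l)
{
   l->head_sentinel.prev = NULL;
   l->head_sentinel.next = &l->tail_sentinel;
   l->tail_sentinel.prev = &l->head_sentinel;
   l->tail_sentinel.next = NULL;
}

static void list_push_tail(struct list *l, struct node *n)
{
   assert(l != NULL && n != NULL);
   n->next = &l->tail_sentinel;
   n->prev = l->tail_sentinel.prev;
   n->prev->next = n;
   l->tail_sentinel.prev = n;
}

static bool node_is_tail_sentinel(const struct node *n)
{
   return n->next == NULL;
}

/* The outer loop runs exactly once; it only exists so that ELEM can be
 * declared with the element type. The inner loop walks raw nodes and
 * casts to the element type only after the sentinel check has passed. */
#define foreach_in_list(ELEM_TYPE, ELEM, LIST)                          \
   for (ELEM_TYPE *ELEM, **foreach_flag = &ELEM; foreach_flag;         \
        foreach_flag = NULL)                                            \
      for (struct node *foreach_cur = (LIST)->head_sentinel.next;      \
           !node_is_tail_sentinel(foreach_cur) &&                       \
           ((ELEM = (ELEM_TYPE *)foreach_cur) || true);                 \
           foreach_cur = foreach_cur->next)

struct item {
   struct node link; /* first member, so node <-> item casts are valid */
   int value;
};

static int sum_items(struct list *l)
{
   int sum = 0;
   foreach_in_list(struct item, it, l)
      sum += it->value;
   return sum;
}
```

Because the outer loop only runs once, a break in the body leaves the inner loop and the outer loop then terminates immediately, so the macro still behaves like a single loop.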

Re: [Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 10, 2016 2:57:03 PM PDT Rob Clark wrote:
> From: Rob Clark 
> 
> Some optimizations, like converting integer multiply/divide into left/
> right shifts, have additional constraints on the search expression,
> such as requiring that a variable is a constant power of two.  Support
> these cases by allowing a function name to be appended to the search
> variable expression (i.e. "#a@32(is_power_of_two)").
> 
> TODO: update the doc/comment explaining the search variable syntax.
> TODO: the eagle-eyed viewer might have noticed that this could also
> replace the existing const syntax (i.e. "#a").  Not sure if we should
> keep that... we could make it syntactic sugar (i.e. '#' automatically
> sets the cond function pointer to 'is_const') or just get rid of it
> entirely?  Maybe that is a follow-on clean-up patch?
> 
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/nir/nir_algebraic.py |  8 +++--
>  src/compiler/nir/nir_opt_algebraic.py |  5 +++
>  src/compiler/nir/nir_search.c |  3 ++
>  src/compiler/nir/nir_search.h | 10 ++
>  src/compiler/nir/nir_search_helpers.h | 66 +++
>  5 files changed, 90 insertions(+), 2 deletions(-)
>  create mode 100644 src/compiler/nir/nir_search_helpers.h
> 
> diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/nir_algebraic.py
> index 285f853..19ac6ee 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -76,6 +76,7 @@ class Value(object):
>   return Constant(val, name_base)
>  
> __template = mako.template.Template("""
> +#include "compiler/nir/nir_search_helpers.h"
>  static const ${val.c_type} ${val.name} = {
> { ${val.type_enum}, ${val.bit_size} },
>  % if isinstance(val, Constant):
> @@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
> ${val.index}, /* ${val.var_name} */
> ${'true' if val.is_constant else 'false'},
> ${val.type() or 'nir_type_invalid' },
> +   ${val.cond if val.cond else 'NULL'},
>  % elif isinstance(val, Expression):
> ${'true' if val.inexact else 'false'},
> nir_op_${val.opcode},
> @@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
>  Variable=Variable,
>  Expression=Expression)
>  
> -_constant_re = re.compile(r"(?P[^@]+)(?:@(?P\d+))?")
> +_constant_re = re.compile(r"(?P[^@\(]+)(?:@(?P\d+))?")
>  
>  class Constant(Value):
> def __init__(self, val, name):
> @@ -150,7 +152,8 @@ class Constant(Value):
>   return "nir_type_float"
>  
>  _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
> -  r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
> +  r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
> +  r"(?P<cond>\([^\)]+\))?")
>  
>  class Variable(Value):
> def __init__(self, val, name, varset):
> @@ -161,6 +164,7 @@ class Variable(Value):
>  
>self.var_name = m.group('name')
>self.is_constant = m.group('const') is not None
> +  self.cond = m.group('cond')
>self.required_type = m.group('type')
>self.bit_size = int(m.group('bits')) if m.group('bits') else 0
>  
> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> index 0a95725..952a91a 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -62,6 +62,11 @@ d = 'd'
>  # constructed value should have that bit-size.
>  
>  optimizations = [
> +
> +   (('imul', a, '#b@32(is_power_of_two)'), ('ishl', a, ('find_lsb', b))),
> +   (('udiv', a, '#b@32(is_power_of_two)'), ('ushr', a, ('find_lsb', b))),
> +   (('umod', a, '#b(is_power_of_two)'),('iand', a, ('isub', b, 1))),
> +
> (('fneg', ('fneg', a)), a),
> (('ineg', ('ineg', a)), a),
> (('fabs', ('fabs', a)), ('fabs', a)),
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index 2c2fd92..b21fb2c 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -127,6 +127,9 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src,
>   instr->src[src].src.ssa->parent_instr->type != nir_instr_type_load_const)
>  return false;
>  
> + if (var->cond && !var->cond(instr, src, num_components, new_swizzle))
> +return false;
> +
>   if (var->type != nir_type_invalid) {
>  if (instr->src[src].src.ssa->parent_instr->type != nir_instr_type_alu)
> return false;
> diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
> index a500feb..f55d797 100644
> --- a/src/compiler/nir/nir_search.h
> +++ b/src/compiler/nir/nir_search.h
> @@ -68,6 +68,16 @@ typedef struct {
>  * never match anything.
>  */
> nir_alu_type type;
> +
> +   /** Optional condition fxn ptr
> +*
> +* This is only allowed in search expressions, and allows 

Re: [Mesa-dev] [PATCH 11/23] i965/fs: Fix fs_visitor::VARYING_PULL_CONSTANT_LOAD for doubles

2016-05-10 Thread Kenneth Graunke
On Tuesday, May 3, 2016 2:22:00 PM PDT Samuel Iglesias Gonsálvez wrote:
> From: Iago Toral Quiroga 
> 
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 49 ++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index bc81a80..0e69be8 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -193,8 +193,15 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
> else
>op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD;
>  
> +   /* The pull load message will load a vec4 (16 bytes). If we are loading
> +* a double this means we are only loading 2 elements worth of data.
> +* We also want to use a 32-bit data type for the dst of the load operation
> +* so other parts of the driver don't get confused about the size of the
> +* result.
> +*/
> int regs_written = 4 * (bld.dispatch_width() / 8) * scale;
> -   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written), dst.type);
> +   fs_reg vec4_result = fs_reg(VGRF, alloc.allocate(regs_written),
> +   BRW_REGISTER_TYPE_F);
> fs_inst *inst = bld.emit(op, vec4_result, surf_index, vec4_offset);
> inst->regs_written = regs_written;
>  
> @@ -207,7 +214,45 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld,
>   inst->mlen = 1 + bld.dispatch_width() / 8;
> }
>  
> -   bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
> +   vec4_result.type = dst.type;
> +
> +   /* Our VARYING_PULL_CONSTANT_LOAD reads a vector of 32-bit elements. If we
> +* are reading doubles this means that we get this:
> +*
> +*  r0: x0 x0 x0 x0 x0 x0 x0 x0
> +*  r1: x1 x1 x1 x1 x1 x1 x1 x1
> +*  r2: y0 y0 y0 y0 y0 y0 y0 y0
> +*  r3: y1 y1 y1 y1 y1 y1 y1 y1
> +*
> +* Fix this up so we return valid double elements:
> +*
> +*  r0: x0 x1 x0 x1 x0 x1 x0 x1
> +*  r1: x0 x1 x0 x1 x0 x1 x0 x1
> +*  r2: y0 y1 y0 y1 y0 y1 y0 y1
> +*  r3: y0 y1 y0 y1 y0 y1 y0 y1
> +*/

I think this could be simplified a little...

> +   if (type_sz(dst.type) == 8) {
> +  int multiplier = bld.dispatch_width() / 8;
> +  fs_reg fixed_res =
> + fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
> +  /* We only have 2 doubles in a 32-bit vec4 */
> +  for (int i = 0; i < 2; i++) {
> + fs_reg vec4_float =
> +horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
> + multiplier * 16 * i);
> +
> + bld.MOV(stride(fixed_res, 2), vec4_float);

^^^ copies x0 or y0 into fixed_res

> + bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
> + horiz_offset(vec4_float, 8 * multiplier));
> +
^^^ copies x1 or y1 into fixed_res

> + bld.MOV(horiz_offset(vec4_result, multiplier * 8 * i),
> + retype(fixed_res, BRW_REGISTER_TYPE_DF));

This just copies fixed_res back over vec4_result?  I don't think we need
to do this - vec4_result is just a temporary.  We really want the final
result in dst.

> +  }
> +   }
> +
> +   int type_slots = MAX2(type_sz(dst.type) / 4, 1);
> +   bld.MOV(dst, offset(vec4_result, bld,
> +   ((const_offset & 0xf) / (4 * type_slots)) * scale));
>  }

How about we simplify this to:

   if (type_sz(dst.type) == 8) {
  int multiplier = bld.dispatch_width() / 8;
  fs_reg fixed_res =
 fs_reg(VGRF, alloc.allocate(2 * multiplier), BRW_REGISTER_TYPE_F);
  /* We only have 2 doubles in a 32-bit vec4 */
  for (int i = 0; i < 2; i++) {
 fs_reg vec4_float =
horiz_offset(retype(vec4_result, BRW_REGISTER_TYPE_F),
 multiplier * 16 * i);

 bld.MOV(stride(fixed_res, 2), vec4_float);
 bld.MOV(stride(horiz_offset(fixed_res, 1), 2),
 horiz_offset(vec4_float, 8 * multiplier));

 // Differences start here: no extra MOV, simplify the end a bit
  }

  assert(scale == 1);
  bld.MOV(dst, offset(fixed_res, bld, (const_offset & 0xf) / 8));
   } else {
  bld.MOV(dst, offset(vec4_result, bld, ((const_offset & 0xf) / 4) * scale));
   }

With that change,
Reviewed-by: Kenneth Graunke 
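
The dword shuffle described in the patch comment (x0/x1 and y0/y1 halves interleaved into valid doubles) can be checked on the CPU. This is a hypothetical host-side model in C, not driver code: it assumes SIMD8 dispatch (so multiplier is 1), treats each GRF as eight 32-bit channels, and shuffle_to_64bit(), read_df_channel() and split_df() are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIMD_WIDTH 8 /* assumption: SIMD8 dispatch, so multiplier == 1 */

/* Mirrors the two strided MOVs: even dwords of the result take the low
 * halves (r0 or r2), odd dwords take the high halves (r1 or r3). */
static void shuffle_to_64bit(const uint32_t lo[SIMD_WIDTH],
                             const uint32_t hi[SIMD_WIDTH],
                             uint32_t fixed[2 * SIMD_WIDTH])
{
   for (int ch = 0; ch < SIMD_WIDTH; ch++) {
      fixed[2 * ch] = lo[ch];
      fixed[2 * ch + 1] = hi[ch];
   }
}

/* Reads channel ch of the shuffled dwords back as a double, low dword
 * first. */
static double read_df_channel(const uint32_t fixed[2 * SIMD_WIDTH], int ch)
{
   assert(ch >= 0 && ch < SIMD_WIDTH);
   uint64_t bits = (uint64_t)fixed[2 * ch] |
                   ((uint64_t)fixed[2 * ch + 1] << 32);
   double d;
   memcpy(&d, &bits, sizeof d);
   return d;
}

/* Splits a double into the low/high dwords a 32-bit vec4 pull load sees. */
static void split_df(double d, uint32_t *lo, uint32_t *hi)
{
   uint64_t bits;
   memcpy(&bits, &d, sizeof bits);
   *lo = (uint32_t)bits;
   *hi = (uint32_t)(bits >> 32);
}
```

Each channel's double only becomes valid once its low and high dwords are adjacent, which is what the stride-2 writes into fixed_res achieve.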




Re: [Mesa-dev] [PATCH 0/6] Update generated GLX server code

2016-05-10 Thread Eric Anholt
Adam Jackson  writes:

> Another attempt at syncing the GLX generator scripts with xserver.
>
> Jon mentioned a couple of issues in the last series, namely that these two
> patches were still necessary:
>
> https://lists.x.org/archives/xorg-devel/2014-April/041597.html
> https://lists.x.org/archives/xorg-devel/2014-April/041919.html
>
> The former doesn't work at all: 'api' isn't a global, nor is it a parameter
> to emit_function_call(). So that's not included here, and anyone updating
> the xserver side will just need to be careful about the ARB_multitexture
> bits. It's also fairly ugly! Turns out there really isn't enough
> information attached to the entrypoint itself to be able to walk backward
> to the GL version or extension name directly. I've been fiddling with
> getting it working and not having a lot of fun.
>
> The latter is addressed in a different way, I've just dropped the "skip
> that which is marked for skipping" patch from the series. This means we
> will generate protocol code for NV_*_program again, which is harmless
> enough. But it does mean xserver will need to revert the commit that
> removes the open-coded size logic for GetProgramString.

I've looked through the python and the generated diff to the server.
The server's diff is still unreasonably huge, but this series' change to
the diff seems sensible to me.

Reviewed-by: Eric Anholt 




[Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations

2016-05-10 Thread Rob Clark
From: Rob Clark 

Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression,
such as requiring that a variable is a constant power of two.  Support
these cases by allowing a function name to be appended to the search
variable expression (i.e. "#a@32(is_power_of_two)").

TODO: update the doc/comment explaining the search variable syntax.
TODO: the eagle-eyed viewer might have noticed that this could also
replace the existing const syntax (i.e. "#a").  Not sure if we should
keep that... we could make it syntactic sugar (i.e. '#' automatically
sets the cond function pointer to 'is_const') or just get rid of it
entirely?  Maybe that is a follow-on clean-up patch?

Signed-off-by: Rob Clark 
---
 src/compiler/nir/nir_algebraic.py |  8 +++--
 src/compiler/nir/nir_opt_algebraic.py |  5 +++
 src/compiler/nir/nir_search.c |  3 ++
 src/compiler/nir/nir_search.h | 10 ++
 src/compiler/nir/nir_search_helpers.h | 66 +++
 5 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 src/compiler/nir/nir_search_helpers.h

diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/nir_algebraic.py
index 285f853..19ac6ee 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -76,6 +76,7 @@ class Value(object):
  return Constant(val, name_base)
 
__template = mako.template.Template("""
+#include "compiler/nir/nir_search_helpers.h"
 static const ${val.c_type} ${val.name} = {
{ ${val.type_enum}, ${val.bit_size} },
 % if isinstance(val, Constant):
@@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
${val.index}, /* ${val.var_name} */
${'true' if val.is_constant else 'false'},
${val.type() or 'nir_type_invalid' },
+   ${val.cond if val.cond else 'NULL'},
 % elif isinstance(val, Expression):
${'true' if val.inexact else 'false'},
nir_op_${val.opcode},
@@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
 Variable=Variable,
 Expression=Expression)
 
-_constant_re = re.compile(r"(?P[^@]+)(?:@(?P\d+))?")
+_constant_re = re.compile(r"(?P[^@\(]+)(?:@(?P\d+))?")
 
 class Constant(Value):
def __init__(self, val, name):
@@ -150,7 +152,8 @@ class Constant(Value):
  return "nir_type_float"
 
 _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
-  r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
+  r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
+  r"(?P<cond>\([^\)]+\))?")
 
 class Variable(Value):
def __init__(self, val, name, varset):
@@ -161,6 +164,7 @@ class Variable(Value):
 
   self.var_name = m.group('name')
   self.is_constant = m.group('const') is not None
+  self.cond = m.group('cond')
   self.required_type = m.group('type')
   self.bit_size = int(m.group('bits')) if m.group('bits') else 0
 
diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
index 0a95725..952a91a 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -62,6 +62,11 @@ d = 'd'
 # constructed value should have that bit-size.
 
 optimizations = [
+
+   (('imul', a, '#b@32(is_power_of_two)'), ('ishl', a, ('find_lsb', b))),
+   (('udiv', a, '#b@32(is_power_of_two)'), ('ushr', a, ('find_lsb', b))),
+   (('umod', a, '#b(is_power_of_two)'),('iand', a, ('isub', b, 1))),
+
(('fneg', ('fneg', a)), a),
(('ineg', ('ineg', a)), a),
(('fabs', ('fabs', a)), ('fabs', a)),
diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index 2c2fd92..b21fb2c 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -127,6 +127,9 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src,
  instr->src[src].src.ssa->parent_instr->type != nir_instr_type_load_const)
 return false;
 
+ if (var->cond && !var->cond(instr, src, num_components, new_swizzle))
+return false;
+
  if (var->type != nir_type_invalid) {
 if (instr->src[src].src.ssa->parent_instr->type != nir_instr_type_alu)
return false;
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
index a500feb..f55d797 100644
--- a/src/compiler/nir/nir_search.h
+++ b/src/compiler/nir/nir_search.h
@@ -68,6 +68,16 @@ typedef struct {
 * never match anything.
 */
nir_alu_type type;
+
+   /** Optional condition fxn ptr
+*
+* This is only allowed in search expressions, and allows additional
+* constraints to be placed on the match.  Typically used for 'is_constant'
+* variables to require, for example, power-of-two in order for the search
+* to match.
+*/
+   bool (*cond)(nir_alu_instr *instr, unsigned src,
+unsigned num_components, const 
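
The three new rules rely on standard identities for a power-of-two b: multiplication becomes a left shift by log2(b), unsigned division a logical right shift, and unsigned modulo a mask with b - 1. A minimal C check of those identities follows. nir_search_helpers.h is not shown in this excerpt, so is_pow2_u32() is only an assumption about what an is_power_of_two condition tests, and the other helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an is_power_of_two() condition: true only
 * for values with exactly one bit set. */
static bool is_pow2_u32(uint32_t v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Index of the lowest set bit, which is what find_lsb yields for v != 0. */
static unsigned find_lsb_u32(uint32_t v)
{
   assert(v != 0);
   unsigned i = 0;
   while (!(v & 1u)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* imul a, b  ->  ishl a, find_lsb(b). Unsigned arithmetic is used because
 * the low 32 bits of a product are the same for signed and unsigned
 * operands, and signed left-shift overflow is undefined in C. */
static uint32_t mul_by_pow2(uint32_t a, uint32_t b)
{
   assert(is_pow2_u32(b));
   return a << find_lsb_u32(b);
}

/* udiv a, b  ->  ushr a, find_lsb(b) */
static uint32_t udiv_by_pow2(uint32_t a, uint32_t b)
{
   assert(is_pow2_u32(b));
   return a >> find_lsb_u32(b);
}

/* umod a, b  ->  iand a, (b - 1) */
static uint32_t umod_by_pow2(uint32_t a, uint32_t b)
{
   assert(is_pow2_u32(b));
   return a & (b - 1);
}
```

The rewrites are only valid when the condition holds, which is why the search variable needs a condition hook rather than plain constant matching.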

[Mesa-dev] [Bug 95005] Unreal engine demos segfault after shader compilation error with OpenGL 4.3

2016-05-10 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=95005

--- Comment #8 from Ian Romanick  ---
This seems related to a bug that was reported via the mesa-dev mailing list
back in 2013:

https://lists.freedesktop.org/archives/mesa-dev/2013-November/048843.html

-- 
You are receiving this mail because:
You are on the CC list for the bug.


[Mesa-dev] [PATCH v2] gallium/ddebug: Support compute states.

2016-05-10 Thread Bas Nieuwenhuizen
v2: Reuse the macro for bind & delete.

Note that we may not be able to share the delete long-term, as
pipe_compute_state contains members not in pipe_shader_state,
and we need to distinguish the pointer location if we add that
struct to the union.

Signed-off-by: Bas Nieuwenhuizen 
---
 src/gallium/drivers/ddebug/dd_context.c | 56 +++--
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/src/gallium/drivers/ddebug/dd_context.c b/src/gallium/drivers/ddebug/dd_context.c
index d06efbc..0f8ef18 100644
--- a/src/gallium/drivers/ddebug/dd_context.c
+++ b/src/gallium/drivers/ddebug/dd_context.c
@@ -250,22 +250,7 @@ DD_CSO_DELETE(vertex_elements)
  * shaders
  */
 
-#define DD_SHADER(NAME, name) \
-   static void * \
-   dd_context_create_##name##_state(struct pipe_context *_pipe, \
-const struct pipe_shader_state *state) \
-   { \
-  struct pipe_context *pipe = dd_context(_pipe)->pipe; \
-  struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
- \
-  if (!hstate) \
- return NULL; \
-  hstate->cso = pipe->create_##name##_state(pipe, state); \
-  hstate->state.shader = *state; \
-  hstate->state.shader.tokens = tgsi_dup_tokens(state->tokens); \
-  return hstate; \
-   } \
-\
+#define DD_SHADER_NOCREATE(NAME, name) \
static void \
dd_context_bind_##name##_state(struct pipe_context *_pipe, void *state) \
{ \
@@ -289,12 +274,48 @@ DD_CSO_DELETE(vertex_elements)
   FREE(hstate); \
}
 
+#define DD_SHADER(NAME, name) \
+   static void * \
+   dd_context_create_##name##_state(struct pipe_context *_pipe, \
+const struct pipe_shader_state *state) \
+   { \
+  struct pipe_context *pipe = dd_context(_pipe)->pipe; \
+  struct dd_state *hstate = CALLOC_STRUCT(dd_state); \
+ \
+  if (!hstate) \
+ return NULL; \
+  hstate->cso = pipe->create_##name##_state(pipe, state); \
+  hstate->state.shader = *state; \
+  hstate->state.shader.tokens = tgsi_dup_tokens(state->tokens); \
+  return hstate; \
+   } \
+\
+   DD_SHADER_NOCREATE(NAME, name)
+
 DD_SHADER(FRAGMENT, fs)
 DD_SHADER(VERTEX, vs)
 DD_SHADER(GEOMETRY, gs)
 DD_SHADER(TESS_CTRL, tcs)
 DD_SHADER(TESS_EVAL, tes)
 
+static void *
+dd_context_create_compute_state(struct pipe_context *_pipe,
+ const struct pipe_compute_state *state)
+{
+   struct pipe_context *pipe = dd_context(_pipe)->pipe;
+   struct dd_state *hstate = CALLOC_STRUCT(dd_state);
+
+   if (!hstate)
+  return NULL;
+   hstate->cso = pipe->create_compute_state(pipe, state);
+
+   if (state->ir_type == PIPE_SHADER_IR_TGSI)
+  hstate->state.shader.tokens = tgsi_dup_tokens(state->prog);
+
+   return hstate;
+}
+
+DD_SHADER_NOCREATE(COMPUTE, compute)
 
 /
  * immediate states
@@ -703,6 +724,9 @@ dd_context_create(struct dd_screen *dscreen, struct pipe_context *pipe)
CTX_INIT(create_tes_state);
CTX_INIT(bind_tes_state);
CTX_INIT(delete_tes_state);
+   CTX_INIT(create_compute_state);
+   CTX_INIT(bind_compute_state);
+   CTX_INIT(delete_compute_state);
CTX_INIT(create_vertex_elements_state);
CTX_INIT(bind_vertex_elements_state);
CTX_INIT(delete_vertex_elements_state);
-- 
2.8.2
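
The v2 restructuring, where a create-less macro generates bind/delete and the full macro adds create on top, is a general C macro-layering pattern. A stripped-down sketch with hypothetical types and names (not the real dd_state or pipe_* structures):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical simplified state types; not the gallium structs. */
struct shader_state { int tokens; };
struct compute_state { int ir_type; int prog; };
enum { IR_TGSI = 0, IR_NATIVE = 1 };

struct wrapped_state {
   int tokens;
   int bound;
};

/* bind/delete only touch the wrapper, so every stage can share them. */
#define SHADER_NOCREATE(name)                                        \
   static void bind_##name##_state(struct wrapped_state *s)         \
   {                                                                \
      if (s)                                                        \
         s->bound = 1;                                              \
   }                                                                \
   static void delete_##name##_state(struct wrapped_state *s)       \
   {                                                                \
      free(s);                                                      \
   }

/* Graphics stages also get a generated create taking shader_state. */
#define SHADER(name)                                                 \
   static struct wrapped_state *                                    \
   create_##name##_state(const struct shader_state *state)          \
   {                                                                \
      assert(state != NULL);                                        \
      struct wrapped_state *s = calloc(1, sizeof(*s));              \
      if (!s)                                                       \
         return NULL;                                               \
      s->tokens = state->tokens;                                    \
      return s;                                                     \
   }                                                                \
   SHADER_NOCREATE(name)

SHADER(fs)
SHADER(vs)

/* Compute takes a different state struct, so its create is written by
 * hand and only bind/delete come from the shared macro. */
static struct wrapped_state *
create_compute_state(const struct compute_state *state)
{
   assert(state != NULL);
   struct wrapped_state *s = calloc(1, sizeof(*s));
   if (!s)
      return NULL;
   if (state->ir_type == IR_TGSI)
      s->tokens = state->prog;
   return s;
}

SHADER_NOCREATE(compute)
```

The hand-written compute create mirrors the patch only in copying tokens when the IR is TGSI; everything else about these types is invented for illustration.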



Re: [Mesa-dev] [PATCH] mesa/vbo: fix check for zero aliases with 2/10/10/10

2016-05-10 Thread Ian Romanick
It seems like at least some of these recent fixes should be candidates
for stable.

On 05/09/2016 08:37 PM, Kenneth Graunke wrote:
> On Tuesday, May 10, 2016 11:07:23 AM PDT Dave Airlie wrote:
>> From: Dave Airlie 
>>
>> This fixes:
>> GL33-CTS.gtf33.GL3Tests.vertex_type_2_10_10_10_rev.vertex_type_2_10_10_10_rev_attrib
>>
>> Signed-off-by: Dave Airlie 
>> ---
>>  src/mesa/vbo/vbo_attrib_tmp.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/vbo/vbo_attrib_tmp.h b/src/mesa/vbo/vbo_attrib_tmp.h
>> index e73b8fb..ed0b2de 100644
>> --- a/src/mesa/vbo/vbo_attrib_tmp.h
>> +++ b/src/mesa/vbo/vbo_attrib_tmp.h
>> @@ -226,7 +226,7 @@ static inline float conv_i2_to_norm_float(const struct gl_context *ctx, int i2)
>> } while(0)
>>  
>>  #define ATTR_UI_INDEX(ctx, val, type, normalized, index, arg) do {  \
>> -  if ((index) == 0) {   \
>> +  if ((index) == 0 && _mesa_attr_zero_aliases_vertex(ctx)) {\
>>   ATTR_UI(ctx, val, (type), normalized, 0, (arg));   \
>>} else if ((index) < MAX_VERTEX_GENERIC_ATTRIBS) {\
>>   ATTR_UI(ctx, val, (type), normalized, VBO_ATTRIB_GENERIC0 + (index), (arg)); \
>>
> 
> It does look like we do this in every other case, so...
> 
> Reviewed-by: Kenneth Graunke 
> 
> 
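
The effect of the one-line fix on slot selection can be modelled in a few lines of C. This is a simplified sketch, not vbo code: the slot numbers and attrib_slot() are hypothetical, and the boolean parameter stands in for _mesa_attr_zero_aliases_vertex(ctx). Before the fix, index 0 always went to slot 0; with it, index 0 is an ordinary generic attribute unless attribute 0 aliases the vertex position.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical slot layout; Mesa's real VBO_ATTRIB_* values differ. */
enum {
   SLOT_INVALID = -1,
   SLOT_POS = 0,
   SLOT_GENERIC0 = 16,
   MAX_GENERIC_ATTRIBS = 16,
};

/* Picks the slot for generic attribute `index` after the fix: index 0
 * only maps to the position slot when attribute 0 aliases the vertex
 * position; otherwise it is treated like any other generic attribute. */
static int attrib_slot(unsigned index, bool attr_zero_aliases_vertex)
{
   if (index == 0 && attr_zero_aliases_vertex)
      return SLOT_POS;
   if (index < MAX_GENERIC_ATTRIBS)
      return SLOT_GENERIC0 + (int)index;
   return SLOT_INVALID; /* out of range */
}
```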






Re: [Mesa-dev] [v4 09/11] i965: Set render state for lossless compressed

2016-05-10 Thread Ben Widawsky
On Tue, May 10, 2016 at 08:14:00PM +0300, Pohjolainen, Topi wrote:
> On Thu, Apr 21, 2016 at 02:59:04PM +0300, Topi Pohjolainen wrote:
> > v2: Add support for blorp and removed the support for meta
> > 
> > Signed-off-by: Topi Pohjolainen 
> > ---
> >  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 3 +++
> >  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 6 ++
> >  src/mesa/drivers/dri/i965/brw_draw.c  | 7 ++-
> >  3 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > index 74fe3c0..84f4ca5 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> > @@ -107,6 +107,9 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
> > brw_blorp_exec(brw, );
> >  
> > intel_miptree_slice_set_needs_hiz_resolve(dst_mt, dst_level, dst_layer);
> > +
> > +   if (intel_miptree_is_lossless_compressed(brw, dst_mt))
> > +  dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
> >  }
> >  
> >  static int
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > index b1da935..bf8d231 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> > @@ -398,6 +398,12 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
> > * redundant clears.
> > */
> >irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> > +   } else if (intel_miptree_is_lossless_compressed(brw, irb->mt)) {
> > +  /* Compressed buffers can also be cleared using a normal rep-clear. In
> > +   * that case they behave as if they were drawn using the normal 3D
> > +   * render pipeline, and we simply mark the MCS as dirty.
> > +   */
> > +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
> 
> We discussed on IRC with Ben that this should only be hit with partial
> clears. I added:
> 
>  assert(partial_clear);

Can you make it:
assert(partial_clear && !fast_clear)?

> 
> I didn't see this trigger, and Ben, I didn't actually see assertion failures
> in your test run either. The two deqp regressions:
> 
> piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64
> piglit.spec.arb_pixel_buffer_object.texsubimage-unpack pbo.sklm64
> 
> are actually real rendering errors that I haven't seen before. We analysed
> this a little with Ken and were able to make these pass with:
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> index 7cfaae7..363b558 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> @@ -30,6 +30,7 @@
>  #include "util/ralloc.h"
>  
>  #include "intel_fbo.h"
> +#include "intel_reg.h"
>  
>  #include "brw_blorp.h"
>  #include "brw_meta_util.h"
> @@ -435,6 +436,9 @@ brw_blorp_resolve_color(struct brw_context *brw, struct intel_mipmap_tree *mt)
>  
> brw_blorp_exec(brw, );
> mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
> +
> +   brw_emit_pipe_control_flush(brw,
> +   PIPE_CONTROL_RENDER_TARGET_FLUSH);
>  }
>  
>  } /* extern "C" */
> 
> 
> In bspec there is:
> 
> "When performing a render target resolve, PIPE_CONTROL with end of pipe sync
> must be delivered."


Re: [Mesa-dev] [PATCH 12/17] nir/lower-io: add support for lowering inputs

2016-05-10 Thread Jason Ekstrand
On Mon, May 9, 2016 at 12:34 PM, Rob Clark  wrote:

> From: Rob Clark 
>
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/nir/nir.h |  3 +-
>  src/compiler/nir/nir_lower_io_to_temporaries.c | 56 +-
>  src/mesa/drivers/dri/i965/brw_nir.c|  4 +-
>  3 files changed, 52 insertions(+), 11 deletions(-)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 5410f0b..c96eaf9 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2279,7 +2279,8 @@ bool nir_lower_indirect_derefs(nir_shader *shader, nir_variable_mode modes);
>
>  bool nir_lower_locals_to_regs(nir_shader *shader);
>
> -void nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint);
> +void nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint,
> + bool outputs, bool inputs);
>
>  void nir_shader_gather_info(nir_shader *shader, nir_function_impl *entrypoint);
>
> diff --git a/src/compiler/nir/nir_lower_io_to_temporaries.c b/src/compiler/nir/nir_lower_io_to_temporaries.c
> index 9df2ba0..34e7477 100644
> --- a/src/compiler/nir/nir_lower_io_to_temporaries.c
> +++ b/src/compiler/nir/nir_lower_io_to_temporaries.c
> @@ -22,9 +22,12 @@
>   */
>
>  /*
> - * Implements a pass that lowers output variables to a temporary plus an
> - * output variable with a single copy at each exit point of the shader.
> - * This way the output variable is only ever written.
> + * Implements a pass that lowers output and/or input variables to a
> + * temporary plus an output variable with a single copy at each exit
> + * point of the shader and/or an input variable with a single copy
> + * at the entrance point of the shader.  This way the output variable
> + * is only ever written once and/or input is only read once, and there
> + * are no indirect output/input accesses.
>   */
>
>  #include "nir.h"
> @@ -33,6 +36,7 @@ struct lower_io_state {
> nir_shader *shader;
> nir_function *entrypoint;
> struct exec_list old_outputs;
> +   struct exec_list old_inputs;
>  };
>
>  static void
> @@ -49,7 +53,6 @@ emit_copies(nir_cursor cursor, nir_shader *shader, struct exec_list *new_vars,
>   nir_intrinsic_instr_create(shader, nir_intrinsic_copy_var);
>copy->variables[0] = nir_deref_var_create(copy, newv);
>copy->variables[1] = nir_deref_var_create(copy, temp);
> -
>

I don't think this was intended.


>nir_instr_insert(cursor, >instr);
> }
>  }
> @@ -90,6 +93,20 @@ emit_output_copies_impl(struct lower_io_state *state, nir_function_impl *impl)
> }
>  }
>
> +static void
> +emit_input_copies(nir_cursor cursor, struct lower_io_state *state)
> +{
> +   emit_copies(cursor, state->shader, >old_inputs, >shader->inputs);
> +}
> +
> +static void
> +emit_input_copies_impl(struct lower_io_state *state, nir_function_impl *impl)
> +{
> +   if (impl->function == state->entrypoint) {
> +  emit_input_copies(nir_before_block(nir_start_block(impl)), state);
> +   }
> +}
>

I was questioning the need for the wrapper before, but this is a bit silly :-)

I think I'd like to see the extra layer of wrappers go if you're not too
attached to them, and I had one other trivial change above.  Other than
that, the lower_io_to_temporaries patches are

Reviewed-by: Jason Ekstrand 


> +
>  static nir_variable *
>  create_shadow_temp(struct lower_io_state *state, nir_variable *var)
>  {
> @@ -105,8 +122,8 @@ create_shadow_temp(struct lower_io_state *state, nir_variable *var)
> /* Reparent the constant initializer (if any) */
> ralloc_steal(nvar, nvar->constant_initializer);
>
> -   /* Give the output a new name with @out-temp appended */
> -   const char *mode = "out";
> +   /* Give the original a new name with @-temp appended */
> +   const char *mode = (temp->data.mode == nir_var_shader_in) ? "in" : "out";
> temp->name = ralloc_asprintf(var, "%s@%s-temp", mode, nvar->name);
> temp->data.mode = nir_var_global;
> temp->constant_initializer = NULL;
> @@ -115,7 +132,8 @@ create_shadow_temp(struct lower_io_state *state, nir_variable *var)
>  }
>
>  void
> -nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint)
> +nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint,
> +bool outputs, bool inputs)
>  {
> struct lower_io_state state;
>
> @@ -124,7 +142,16 @@ nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint)
>
> state.shader = shader;
> state.entrypoint = entrypoint;
> -   exec_list_move_nodes_to(>outputs, _outputs);
> +
> +   if (inputs)
> +  exec_list_move_nodes_to(>inputs, _inputs);
> +   else
> +  exec_list_make_empty(_inputs);
> +
> +   if (outputs)
> +  exec_list_move_nodes_to(>outputs, _outputs);
> +   else
> +  exec_list_make_empty(_outputs);
>

Re: [Mesa-dev] [PATCH 11/17] nir/lower-io: split out some helper fxns

2016-05-10 Thread Jason Ekstrand
On Mon, May 9, 2016 at 12:33 PM, Rob Clark  wrote:

> From: Rob Clark 
>
> Prep work to reduce the noise in the next patch.
>
> Signed-off-by: Rob Clark 
> ---
>  src/compiler/nir/nir_lower_io_to_temporaries.c | 124 ++---
>  1 file changed, 72 insertions(+), 52 deletions(-)
>
> diff --git a/src/compiler/nir/nir_lower_io_to_temporaries.c b/src/compiler/nir/nir_lower_io_to_temporaries.c
> index bf16aec..9df2ba0 100644
> --- a/src/compiler/nir/nir_lower_io_to_temporaries.c
> +++ b/src/compiler/nir/nir_lower_io_to_temporaries.c
> @@ -31,29 +31,89 @@
>
>  struct lower_io_state {
> nir_shader *shader;
> +   nir_function *entrypoint;
> struct exec_list old_outputs;
>  };
>
>  static void
> -emit_output_copies(nir_cursor cursor, struct lower_io_state *state)
> +emit_copies(nir_cursor cursor, nir_shader *shader, struct exec_list *new_vars,
> +  struct exec_list *old_vars)
>  {
> -   assert(exec_list_length(>shader->outputs) ==
> -  exec_list_length(>old_outputs));
> +   assert(exec_list_length(new_vars) == exec_list_length(old_vars));
>
> -   foreach_two_lists(out_node, >shader->outputs,
> - temp_node, >old_outputs) {
> -  nir_variable *output = exec_node_data(nir_variable, out_node, node);
> -  nir_variable *temp = exec_node_data(nir_variable, temp_node, node);
> +   foreach_two_lists(new_node, new_vars, old_node, old_vars) {
> +  nir_variable *newv = exec_node_data(nir_variable, new_node, node);
> +  nir_variable *temp = exec_node_data(nir_variable, old_node, node);
>
>nir_intrinsic_instr *copy =
> - nir_intrinsic_instr_create(state->shader, nir_intrinsic_copy_var);
> -  copy->variables[0] = nir_deref_var_create(copy, output);
> + nir_intrinsic_instr_create(shader, nir_intrinsic_copy_var);
> +  copy->variables[0] = nir_deref_var_create(copy, newv);
>copy->variables[1] = nir_deref_var_create(copy, temp);
>
>nir_instr_insert(cursor, >instr);
> }
>  }
>
> +static void
> +emit_output_copies(nir_cursor cursor, struct lower_io_state *state)
> +{
> +   emit_copies(cursor, state->shader, >shader->outputs, >old_outputs);
> +}
>

I question whether or not this is useful.  I'm inclined to say not.


> +
> +static void
> +emit_output_copies_impl(struct lower_io_state *state, nir_function_impl *impl)
> +{
> +   if (state->shader->stage == MESA_SHADER_GEOMETRY) {
> +  /* For geometry shaders, we have to emit the output copies right
> +   * before each EmitVertex call.
> +   */
> +  nir_foreach_block(block, impl) {
> + nir_foreach_instr(instr, block) {
> +if (instr->type != nir_instr_type_intrinsic)
> +   continue;
> +
> +nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> +if (intrin->intrinsic == nir_intrinsic_emit_vertex) {
> +   emit_output_copies(nir_before_instr(>instr), state);
> +}
> + }
> +  }
> +   } else if (impl->function == state->entrypoint) {
> +  /* For all other shader types, we need to do the copies right before
> +   * the jumps to the end block.
> +   */
> +  struct set_entry *block_entry;
> +  set_foreach(impl->end_block->predecessors, block_entry) {
> + struct nir_block *block = (void *)block_entry->key;
> + emit_output_copies(nir_after_block_before_jump(block), state);
> +  }
> +   }
> +}
> +
> +static nir_variable *
> +create_shadow_temp(struct lower_io_state *state, nir_variable *var)
> +{
> +   nir_variable *nvar = ralloc(state->shader, nir_variable);
> +   memcpy(nvar, var, sizeof *nvar);
> +
> +   /* The original is now the temporary */
> +   nir_variable *temp = var;
> +
> +   /* Reparent the name to the new variable */
> +   ralloc_steal(nvar, nvar->name);
> +
> +   /* Reparent the constant initializer (if any) */
> +   ralloc_steal(nvar, nvar->constant_initializer);
> +
> +   /* Give the output a new name with @out-temp appended */
> +   const char *mode = "out";
> +   temp->name = ralloc_asprintf(var, "%s@%s-temp", mode, nvar->name);
> +   temp->data.mode = nir_var_global;
> +   temp->constant_initializer = NULL;
> +
> +   return nvar;
> +}
> +
>  void
>  nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint)
>  {
> @@ -63,29 +123,14 @@ nir_lower_io_to_temporaries(nir_shader *shader, nir_function *entrypoint)
>return;
>
> state.shader = shader;
> +   state.entrypoint = entrypoint;
> exec_list_move_nodes_to(>outputs, _outputs);
>
> /* Walk over all of the outputs turn each output into a temporary and
>  * make a new variable for the actual output.
>  */
> nir_foreach_variable(var, _outputs) {
> -  nir_variable *output = ralloc(shader, nir_variable);
> -  memcpy(output, var, sizeof *output);
> -
> -  /* The orignal is now the temporary */
> -  nir_variable 

Re: [Mesa-dev] [PATCH 04/14] radeonsi: Add buffer load functions.

2016-05-10 Thread Nicolai Hähnle

On 10.05.2016 11:36, Bas Nieuwenhuizen wrote:

On Tue, May 10, 2016 at 6:28 PM, Nicolai Hähnle  wrote:

On 10.05.2016 11:25, Bas Nieuwenhuizen wrote:


On Tue, May 10, 2016 at 6:13 PM, Nicolai Hähnle 
wrote:


On 10.05.2016 05:52, Bas Nieuwenhuizen wrote:



Signed-off-by: Bas Nieuwenhuizen 
---
src/gallium/drivers/radeonsi/si_shader.c | 81

1 file changed, 81 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 5897149..d3df4d6 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -733,6 +733,87 @@ static void build_tbuffer_store_dwords(struct si_shader_context *ctx,
  V_008F0C_BUF_NUM_FORMAT_UINT, 1, 0, 1, 1, 0);
}

+static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
+   LLVMValueRef rsrc,
+   int num_channels,
+   LLVMValueRef vindex,
+   LLVMValueRef voffset,
+   LLVMValueRef soffset,
+   unsigned inst_offset,
+   unsigned glc,
+   unsigned slc)
+{
+   struct gallivm_state *gallivm = >radeon_bld.gallivm;
+   LLVMValueRef args[] = {
+   LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v16i8, ""),
+   voffset ? voffset : vindex,
+   soffset,
+   LLVMConstInt(ctx->i32, inst_offset, 0),
+   LLVMConstInt(ctx->i32, voffset ? 1 : 0, 0), // offen
+   LLVMConstInt(ctx->i32, vindex ? 1 : 0, 0), //idxen
+   LLVMConstInt(ctx->i32, glc, 0),
+   LLVMConstInt(ctx->i32, slc, 0),
+   LLVMConstInt(ctx->i32, 0, 0), // TFE
+   };
+
+   unsigned func = CLAMP(num_channels, 1, 3) - 1;
+   LLVMTypeRef types[] = {ctx->i32, LLVMVectorType(ctx->i32, 2), ctx->v4i32};
+   const char *type_names[] = {"i32", "v2i32", "v4i32"};
+   const char *arg_type = "i32";
+
+   if (voffset && vindex) {
+   LLVMValueRef vaddr[] = {vindex, voffset};
+
+   arg_type = "v2i32";
+   args[1] = lp_build_gather_values(gallivm, vaddr, 2);
+   }
+
+   char name[256];
+   snprintf(name, sizeof(name), "llvm.SI.buffer.load.dword.%s.%s",
+type_names[func], arg_type);




We're generally trying to get away from the llvm.SI.* intrinsics and use the
llvm.amdgcn.* intrinsics instead; in this case, llvm.amdgcn.buffer.load.



As far as I can see, llvm.amdgcn.buffer.load doesn't allow specifying the
VGPR, SGPR and immediate offsets separately. Furthermore, I was trying to
avoid an LLVM 3.9 dependency, although I can solve that with an if based
on the LLVM version.



Fair enough on the LLVM version dependency.

I also think you're right about llvm.amdgcn.buffer.load, but that's
something that should be fixed on the LLVM side eventually without
introducing a new intrinsic.


I am not sure LLVM will eventually be able to. I don't know whether the
VGPR + SGPR + immediate offset sum wraps around at 32 bits. If it doesn't,
a v_add and the offset fields behave differently, and LLVM should not
sink the add into the load instruction.
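To make the wrap-around question concrete, here is a small C sketch. It assumes nothing about the real hardware: one function models the IR-level sum, where every add wraps modulo 2^32, and the other models a hypothetical address unit that sums the VGPR, SGPR and immediate offsets with more than 32 bits of precision. Both function names are made up for illustration.

```c
#include <stdint.h>

/* IR-level model: offsets combined with 32-bit adds (e.g. a v_add),
 * so every intermediate sum wraps modulo 2^32. */
static uint32_t offset_wrapping_32bit(uint32_t vgpr, uint32_t sgpr, uint32_t imm)
{
   return vgpr + sgpr + imm;
}

/* Hypothetical hardware model: the three offset fields are summed
 * without 32-bit wrap-around (not confirmed anywhere in this thread). */
static uint64_t offset_without_wrap(uint32_t vgpr, uint32_t sgpr, uint32_t imm)
{
   return (uint64_t)vgpr + sgpr + imm;
}
```

With `vgpr = 0xfffffff0`, `sgpr = 0x20` and `imm = 4` the two models disagree (`0x14` versus `0x100000014`), while for small offsets they agree. Sinking the add into the load's offset fields is only safe if the hardware matches the wrapping model.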


That's a good point. I don't know the answer unfortunately...

Nicolai



- Bas



I think we should go for the if (HAVE_LLVM) approach even if it produces
slightly worse code for now (it really should be only one additional v_add
at most).
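A minimal sketch of the version-gated approach, written as a plain C helper so it compiles on its own. The function name is hypothetical, the version encoding follows Mesa's `HAVE_LLVM` convention (e.g. `0x0309` for LLVM 3.9), and the `llvm.amdgcn.buffer.load.<type>` spelling is an assumption about the LLVM 3.9 intrinsic. In the driver this would more likely be an `#if HAVE_LLVM >= 0x0309` block; on the amdgcn path the SGPR and immediate offsets would have to be folded into the single VGPR offset with a `v_add`, which is the extra instruction mentioned above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: pick the buffer-load intrinsic name from the LLVM
 * version (encoded like HAVE_LLVM). Returns the snprintf() length. */
static int buffer_load_intrinsic_name(unsigned llvm_version,
                                      int num_channels,
                                      int vaddr_is_v2i32,
                                      char *name, size_t size)
{
   /* Same mapping as the patch: 1 -> i32, 2 -> v2i32, 3 or 4 -> v4i32. */
   static const char *const si_types[] = {"i32", "v2i32", "v4i32"};
   /* Assumed overload suffixes for llvm.amdgcn.buffer.load. */
   static const char *const amdgcn_types[] = {"f32", "v2f32", "v4f32"};
   int clamped = num_channels < 1 ? 1 : (num_channels > 3 ? 3 : num_channels);
   int func = clamped - 1;

   if (llvm_version >= 0x0309)
      return snprintf(name, size, "llvm.amdgcn.buffer.load.%s",
                      amdgcn_types[func]);

   /* Pre-3.9 path: the llvm.SI.* name, built the same way as in the patch. */
   return snprintf(name, size, "llvm.SI.buffer.load.dword.%s.%s",
                   si_types[func], vaddr_is_v2i32 ? "v2i32" : "i32");
}
```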

Nicolai



- Bas



Nicolai



+
+   return lp_build_intrinsic(gallivm->builder, name, types[func], args,
+ ARRAY_SIZE(args), LLVMReadOnlyAttribute |
+ LLVMNoUnwindAttribute);
+}
+
+static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
+enum tgsi_opcode_type type, unsigned swizzle,
+LLVMValueRef buffer, LLVMValueRef offset,
+LLVMValueRef base)
+{
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef value, value2;
+   LLVMTypeRef llvm_type = tgsi2llvmtype(bld_base, type);
+   LLVMTypeRef vec_type = LLVMVectorType(llvm_type, 4);
+
+   if (swizzle == ~0) {
+
+   value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
+ 0, 1, 0);
+
+   return LLVMBuildBitCast(gallivm->builder, value, vec_type, "");
+   }
+
+   if (type != TGSI_TYPE_DOUBLE) {
+   value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
+ 0, 1, 0);
+
+   value = LLVMBuildBitCast(gallivm->builder, value, vec_type, "");
+   return 

Re: [Mesa-dev] [v4 09/11] i965: Set render state for lossless compressed

2016-05-10 Thread Pohjolainen, Topi
On Thu, Apr 21, 2016 at 02:59:04PM +0300, Topi Pohjolainen wrote:
> v2: Add support for blorp and removed the support for meta
> 
> Signed-off-by: Topi Pohjolainen 
> ---
>  src/mesa/drivers/dri/i965/brw_blorp_blit.cpp  | 3 +++
>  src/mesa/drivers/dri/i965/brw_blorp_clear.cpp | 6 ++
>  src/mesa/drivers/dri/i965/brw_draw.c  | 7 ++-
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> index 74fe3c0..84f4ca5 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_blit.cpp
> @@ -107,6 +107,9 @@ brw_blorp_blit_miptrees(struct brw_context *brw,
> brw_blorp_exec(brw, );
>  
> intel_miptree_slice_set_needs_hiz_resolve(dst_mt, dst_level, dst_layer);
> +
> +   if (intel_miptree_is_lossless_compressed(brw, dst_mt))
> +  dst_mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;
>  }
>  
>  static int
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> index b1da935..bf8d231 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
> @@ -398,6 +398,12 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
> * redundant clears.
> */
>irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
> +   } else if (intel_miptree_is_lossless_compressed(brw, irb->mt)) {
> +  /* Compressed buffers can also be cleared using a normal rep-clear. In
> +   * that case they behave as if they were drawn using the normal 3D
> +   * render pipeline, and we simply mark the mcs as dirty.
> +   */
> +  irb->mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_UNRESOLVED;

We discussed on IRC with Ben that this should only be hit with partial
clears. I added:

 assert(partial_clear);

I didn't see this trigger, and Ben, I didn't actually see assertion failures
in your test run either. The two piglit regressions:

piglit.spec.arb_pixel_buffer_object.texsubimage pbo.sklm64
piglit.spec.arb_pixel_buffer_object.texsubimage-unpack pbo.sklm64

are actually real rendering errors that I haven't seen before. We analysed
this a little with Ken and were able to make these pass with:

diff --git a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
index 7cfaae7..363b558 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
+++ b/src/mesa/drivers/dri/i965/brw_blorp_clear.cpp
@@ -30,6 +30,7 @@
 #include "util/ralloc.h"
 
 #include "intel_fbo.h"
+#include "intel_reg.h"
 
 #include "brw_blorp.h"
 #include "brw_meta_util.h"
@@ -435,6 +436,9 @@ brw_blorp_resolve_color(struct brw_context *brw, struct intel_mipmap_tree *mt)
 
brw_blorp_exec(brw, );
mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_RESOLVED;
+
+   brw_emit_pipe_control_flush(brw,
+   PIPE_CONTROL_RENDER_TARGET_FLUSH);
 }
 
 } /* extern "C" */


In bspec there is:

"When performing a render target resolve, PIPE_CONTROL with end of pipe sync
must be delivered."
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 14/14] radeonsi: Allow TES distribution between shader engines.

2016-05-10 Thread Nicolai Hähnle



On 10.05.2016 05:53, Bas Nieuwenhuizen wrote:

Setting 028B6C_DISTRIBUTION_MODE to a non-zero value and
either setting 028B6C_NUM_DS_WAVES_PER_SIMD to a non-zero
value or storing a zero control word hangs my card.

The R_028B50_VGT_TESS_DISTRIBUTION value is copied from
amdgpu-pro. Smaller values in the ACCUM fields seem to
decrease the performance advantage from this patch, higher
values don't seem to matter.

Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_state.c |  5 
  src/gallium/drivers/radeonsi/si_state_draw.c|  8 ++
  src/gallium/drivers/radeonsi/si_state_shaders.c | 36 ++---
  3 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state.c b/src/gallium/drivers/radeonsi/si_state.c
index c4af77e..eb48a9e 100644
--- a/src/gallium/drivers/radeonsi/si_state.c
+++ b/src/gallium/drivers/radeonsi/si_state.c
@@ -3787,6 +3787,11 @@ static void si_init_config(struct si_context *sctx)
   S_028424_OVERWRITE_COMBINER_WATERMARK(4));
si_pm4_set_reg(pm4, R_028C58_VGT_VERTEX_REUSE_BLOCK_CNTL, 30);
si_pm4_set_reg(pm4, R_028C5C_VGT_OUT_DEALLOC_CNTL, 32);
+   si_pm4_set_reg(pm4, R_028B50_VGT_TESS_DISTRIBUTION,
+  S_028B50_ACCUM_ISOLINE(32) |
+  S_028B50_ACCUM_TRI(11) |
+  S_028B50_ACCUM_QUAD(11) |
+  S_028B50_DONUT_SPLIT(16));
}

if (sctx->b.family == CHIP_STONEY)
diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 3150489..7ad9422 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -271,6 +271,14 @@ static unsigned si_get_ia_multi_vgt_param(struct si_context *sctx,
 sctx->b.family == CHIP_BONAIRE) &&
sctx->gs_shader.cso)
partial_vs_wave = true;
+
+   /* Needed for 028B6C_DISTRIBUTION_MODE != 0 */
+   if (sctx->b.chip_class >= VI) {
+   if (sctx->gs_shader.cso)
+   partial_es_wave = true;
+   else
+   partial_vs_wave = true;
+   }
}

/* This is a hardware requirement. */
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 43f4a84..d7ed31d 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -249,7 +249,8 @@ void si_destroy_shader_cache(struct si_screen *sscreen)

  /* SHADER STATES */

-static void si_set_tesseval_regs(struct si_shader *shader,
+static void si_set_tesseval_regs(struct si_screen *sscreen,
+struct si_shader *shader,
 struct si_pm4_state *pm4)
  {
struct tgsi_shader_info *info = >selector->info;
@@ -257,7 +258,7 @@ static void si_set_tesseval_regs(struct si_shader *shader,
unsigned tes_spacing = info->properties[TGSI_PROPERTY_TES_SPACING];
bool tes_vertex_order_cw = info->properties[TGSI_PROPERTY_TES_VERTEX_ORDER_CW];
bool tes_point_mode = info->properties[TGSI_PROPERTY_TES_POINT_MODE];
-   unsigned type, partitioning, topology;
+   unsigned type, partitioning, topology, distribution_mode;

switch (tes_prim_mode) {
case PIPE_PRIM_LINES:
@@ -299,10 +300,13 @@ static void si_set_tesseval_regs(struct si_shader *shader,
else
topology = V_028B6C_OUTPUT_TRIANGLE_CW;

+   distribution_mode = sscreen->b.chip_class >= VI ? 2 : 0;


The named values for distribution_mode are:

0 - NO_DIST
1 - PATCHES
2 - DONUTS

It makes sense to add those to sid.h.
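A sketch of what the suggested sid.h additions could look like. The macro names are hypothetical, following the `V_028B6C_` prefix convention already used for this register, and the helper stands in for the `sscreen->b.chip_class >= VI ? 2 : 0` expression in `si_set_tesseval_regs()`, with a plain flag instead of the chip-class comparison.

```c
/* Hypothetical named values for the VGT_TF_PARAM DISTRIBUTION_MODE field. */
#define V_028B6C_DISTRIBUTION_MODE_NO_DIST 0
#define V_028B6C_DISTRIBUTION_MODE_PATCHES 1
#define V_028B6C_DISTRIBUTION_MODE_DONUTS  2

/* Stand-in for the mode selection in si_set_tesseval_regs(): VI and
 * later use donut distribution, older chips keep distribution off. */
static unsigned pick_distribution_mode(int chip_is_vi_or_later)
{
   return chip_is_vi_or_later ? V_028B6C_DISTRIBUTION_MODE_DONUTS
                              : V_028B6C_DISTRIBUTION_MODE_NO_DIST;
}
```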

Apart from that, patches 8-12 and this one are

Reviewed-by: Nicolai Hähnle 


+
si_pm4_set_reg(pm4, R_028B6C_VGT_TF_PARAM,
   S_028B6C_TYPE(type) |
   S_028B6C_PARTITIONING(partitioning) |
-  S_028B6C_TOPOLOGY(topology));
+  S_028B6C_TOPOLOGY(topology) |
+  S_028B6C_DISTRIBUTION_MODE(distribution_mode));
  }

  static void si_shader_ls(struct si_shader *shader)
@@ -359,7 +363,7 @@ static void si_shader_hs(struct si_shader *shader)
   S_00B42C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));
  }

-static void si_shader_es(struct si_shader *shader)
+static void si_shader_es(struct si_screen *sscreen, struct si_shader *shader)
  {
struct si_pm4_state *pm4;
unsigned num_user_sgprs;
@@ -402,7 +406,7 @@ static void si_shader_es(struct si_shader *shader)
   S_00B32C_SCRATCH_EN(shader->config.scratch_bytes_per_wave > 0));

if 

Re: [Mesa-dev] [v6 05/11] i965: Deferred allocation of mcs for lossless compressed

2016-05-10 Thread Pohjolainen, Topi
On Mon, May 09, 2016 at 10:30:25AM -0700, Ben Widawsky wrote:
> On Mon, May 09, 2016 at 10:29:28AM -0700, Ben Widawsky wrote:
> > On Fri, May 06, 2016 at 11:38:25AM +0300, Topi Pohjolainen wrote:
> > > Until now mcs was associated to single sampled buffers only for
> > > fast clear purposes and it was therefore the responsibility of the
> > > clear logic to allocate the aux buffer when needed. Now that normal
> > > 3D render or blorp blit may render with mcs enabled also, they need
> > > to prepare the mcs just as well.
> > > 
> > > v2: Do not enable for scanout buffers
> > > v3 (Ben):
> > >- Fix typo in commit message.
> > >- Check for gen < 9 and return early in brw_predraw_set_aux_buffers()
> > >- Check for gen < 9 and return early in intel_miptree_prepare_mcs()
> > > 
> > > Signed-off-by: Topi Pohjolainen 
> > > Reviewed-by: Ben Widawsky 
> > 
> > v5 is also Reviewed-by: Ben Widawsky 
> 
> oops, v6 :P

This is after all not good enough. It works when lossless compression is
enabled, but adds an unnecessary mcs allocation for normal 3D rendering.
I'll post an update.


Re: [Mesa-dev] [PATCH] st/glsl_to_tgsi: attach image to correct instruction for samples

2016-05-10 Thread Ilia Mirkin
Reviewed-by: Ilia Mirkin 

On Tue, May 10, 2016 at 1:54 AM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This fixes a crash (but not the test):
> GL45-CTS.shader_texture_image_samples_tests.functional_test
>
> Signed-off-by: Dave Airlie 
> ---
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 6e9c19a..9cf204a 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -3511,9 +3511,9 @@ glsl_to_tgsi_visitor::visit_image_intrinsic(ir_call *ir)
>st_src_reg res = get_temp(glsl_type::ivec4_type);
>st_dst_reg dstres = st_dst_reg(res);
>dstres.writemask = WRITEMASK_W;
> -  emit_asm(ir, TGSI_OPCODE_RESQ, dstres);
> +  inst = emit_asm(ir, TGSI_OPCODE_RESQ, dstres);
>res.swizzle = SWIZZLE_;
> -  inst = emit_asm(ir, TGSI_OPCODE_MOV, dst, res);
> +  emit_asm(ir, TGSI_OPCODE_MOV, dst, res);
> } else {
>st_src_reg arg1 = undef_src, arg2 = undef_src;
>st_src_reg coord;
> --
> 2.5.5
>


Re: [Mesa-dev] [PATCH 13/14] radeonsi: Process multiple patches per threadgroup.

2016-05-10 Thread Nicolai Hähnle

On 10.05.2016 05:53, Bas Nieuwenhuizen wrote:

Using more than one wave per threadgroup generally increases
performance.  Not using too many patches per threadgroup also
increases performance. Both Catalyst and amdgpu-pro seem to
use 40 patches as their maximum, but I haven't seen any
performance increase from limiting the number of patches
to 40 instead of 64.

Note that the trick where we overlap the input and output LDS
does not work anymore as the insertion of the tess factors
changes the patch stride.


So this still doesn't take into account potential LDS use by the shaders 
themselves, e.g. because of alloca lowering. Perhaps this should be 
noted in a comment somewhere?




Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_state_draw.c | 42 ++--
  1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_state_draw.c b/src/gallium/drivers/radeonsi/si_state_draw.c
index 7aa886a..3150489 100644
--- a/src/gallium/drivers/radeonsi/si_state_draw.c
+++ b/src/gallium/drivers/radeonsi/si_state_draw.c
@@ -108,20 +108,7 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
unsigned input_patch_size, output_patch_size, output_patch0_offset;
unsigned perpatch_output_offset, lds_size, ls_rsrc2;
unsigned tcs_in_layout, tcs_out_layout, tcs_out_offsets;
-   unsigned offchip_layout;
-
-   *num_patches = 1; /* TODO: calculate this */
-
-   if (sctx->last_ls == ls->current &&
-   sctx->last_tcs == tcs &&
-   sctx->last_tes_sh_base == tes_sh_base &&
-   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
-   return;
-
-   sctx->last_ls = ls->current;
-   sctx->last_tcs = tcs;
-   sctx->last_tes_sh_base = tes_sh_base;
-   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
+   unsigned offchip_layout, hardware_lds_size;

/* This calculates how shader inputs and outputs among VS, TCS, and TES
 * are laid out in LDS. */
@@ -146,9 +133,23 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;

-   output_patch0_offset = sctx->tcs_shader.cso ? input_patch_size * *num_patches : 0;
+   *num_patches = MIN2(256 / num_tcs_input_cp, 256 / num_tcs_output_cp);
+
+   /* Make sure that the data fits in LDS. */
+   hardware_lds_size = sctx->b.chip_class >= CIK ? 65536 : 32768;
+   *num_patches = MIN2(*num_patches, hardware_lds_size / (input_patch_size +
+  output_patch_size));
+
+   /* Make sure the output data fits in the offchip buffer */
+   *num_patches = MIN2(*num_patches, 32768 / output_patch_size);


That's a bit too much magic constants for my taste. Can you define a 
named constant somewhere that both this calculation and the offchip 
buffer allocation are based on?
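One way to address this, sketched as a standalone C function: give each magic number a name, so the offchip buffer allocation can use the same `SI_TESS_OFFCHIP_BLOCK_SIZE` constant. All names here are hypothetical; the values (256, 65536/32768, 32768 and 64) are the ones hard-coded in the patch, and all counts and sizes are assumed to be non-zero.

```c
#ifndef MIN2
#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical names for the constants used in si_emit_derived_tess_state(). */
#define SI_MAX_TCS_THREADS_PER_TG  256        /* control points per threadgroup */
#define SI_TESS_OFFCHIP_BLOCK_SIZE (8192 * 4) /* offchip bytes per threadgroup; shared with the allocation */
#define SI_MAX_PATCHES_PER_TG      64         /* performance cap, not needed for correctness */

static unsigned compute_num_patches(unsigned num_tcs_input_cp,
                                    unsigned num_tcs_output_cp,
                                    unsigned input_patch_size,
                                    unsigned output_patch_size,
                                    int is_cik_or_later)
{
   unsigned hardware_lds_size = is_cik_or_later ? 65536 : 32768;
   unsigned n = MIN2(SI_MAX_TCS_THREADS_PER_TG / num_tcs_input_cp,
                     SI_MAX_TCS_THREADS_PER_TG / num_tcs_output_cp);

   /* Input and output patches must fit in LDS together. */
   n = MIN2(n, hardware_lds_size / (input_patch_size + output_patch_size));
   /* The output data must fit in the offchip buffer. */
   n = MIN2(n, SI_TESS_OFFCHIP_BLOCK_SIZE / output_patch_size);
   return MIN2(n, SI_MAX_PATCHES_PER_TG);
}
```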



+   /* Not necessary for correctness, but improves performance */
+   *num_patches = MIN2(*num_patches, 64);
+
+   output_patch0_offset = input_patch_size * *num_patches;
perpatch_output_offset = output_patch0_offset + 
pervertex_output_patch_size;

+


Extraneous blank line.

Cheers,
Nicolai


lds_size = output_patch0_offset + output_patch_size * *num_patches;
ls_rsrc2 = ls->current->config.rsrc2;

@@ -160,6 +161,17 @@ static void si_emit_derived_tess_state(struct si_context *sctx,
ls_rsrc2 |= S_00B52C_LDS_SIZE(align(lds_size, 256) / 256);
}

+   if (sctx->last_ls == ls->current &&
+   sctx->last_tcs == tcs &&
+   sctx->last_tes_sh_base == tes_sh_base &&
+   sctx->last_num_tcs_input_cp == num_tcs_input_cp)
+   return;
+
+   sctx->last_ls = ls->current;
+   sctx->last_tcs = tcs;
+   sctx->last_tes_sh_base = tes_sh_base;
+   sctx->last_num_tcs_input_cp = num_tcs_input_cp;
+
/* Due to a hw bug, RSRC2_LS must be written twice with another
 * LS register written in between. */
if (sctx->b.chip_class == CIK && sctx->b.family != CHIP_HAWAII)




Re: [Mesa-dev] [PATCH 08/14] radeonsi: Store inputs to memory when not using a TCS.

2016-05-10 Thread Nicolai Hähnle

On 10.05.2016 05:52, Bas Nieuwenhuizen wrote:

We need to copy the VS outputs to memory. I decided to do this
using a shader key, as the value depends on other shaders.

I also switch the fixed-function TCS over to monolithic, as
otherwise many of the user SGPRs would need to be passed to the
epilog, which increases either register pressure or the complexity
needed to avoid that.
that interesting to precompile anyway, since we do it on
demand and it is very small.

Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_shader.c| 45 +
  src/gallium/drivers/radeonsi/si_shader.h|  1 +
  src/gallium/drivers/radeonsi/si_state_shaders.c |  3 ++
  3 files changed, 49 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index 90830ee..50c48bf 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -2423,6 +2423,48 @@ handle_semantic:
}
  }

+static void si_copy_tcs_inputs(struct lp_build_tgsi_context *bld_base)
+{
+   struct si_shader_context *ctx = si_shader_context(bld_base);
+   struct gallivm_state *gallivm = bld_base->base.gallivm;
+   LLVMValueRef invocation_id, rw_buffers, buffer, buffer_offset;
+   LLVMValueRef lds_vertex_stride, lds_vertex_offset, lds_base;
+   unsigned num_outputs, i;
+
+   invocation_id = unpack_param(ctx, SI_PARAM_REL_IDS, 8, 5);
+
+   rw_buffers = LLVMGetParam(ctx->radeon_bld.main_fn, SI_PARAM_RW_BUFFERS);
+   buffer = build_indexed_load_const(ctx, rw_buffers,
+   lp_build_const_int32(gallivm, SI_HS_RING_TESS_OFFCHIP));
+
+   buffer_offset = LLVMGetParam(ctx->radeon_bld.main_fn, ctx->param_oc_lds);
+
+   lds_vertex_stride = unpack_param(ctx, SI_PARAM_TCS_IN_LAYOUT, 13, 8);
+   lds_vertex_offset = LLVMBuildMul(gallivm->builder, invocation_id,
+lds_vertex_stride, "");
+   lds_base = get_tcs_in_current_patch_offset(ctx);
+   lds_base = LLVMBuildAdd(gallivm->builder, lds_base, lds_vertex_offset, "");
+
+   num_outputs = util_last_bit64(ctx->shader->key.tcs.epilog.inputs_to_copy);
+   for (i = 0; i < num_outputs; i++) {
+   if (!((1llu << i) & ctx->shader->key.tcs.epilog.inputs_to_copy))
+   continue;


Use u_bit_scan64, please.
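For reference, the loop shape being asked for, as a self-contained C sketch. `bit_scan64()` is a local stand-in written to behave like Mesa's `u_bit_scan64()` from `util/bitscan.h` (return the index of the lowest set bit and clear it); `visit_inputs()` is hypothetical and only records the indices where the real loop would emit the LDS load and buffer store.

```c
#include <stdint.h>

/* Stand-in for u_bit_scan64(): *mask must be non-zero. Returns the index
 * of the lowest set bit and clears that bit from *mask. */
static int bit_scan64(uint64_t *mask)
{
   int i = 0;
   while (!(*mask & (UINT64_C(1) << i)))
      i++;
   *mask &= ~(UINT64_C(1) << i);
   return i;
}

/* Visit only the set bits of inputs_to_copy instead of testing every index
 * up to util_last_bit64(). Records at most max indices in out[]. */
static unsigned visit_inputs(uint64_t inputs_to_copy, int *out, unsigned max)
{
   uint64_t mask = inputs_to_copy;
   unsigned count = 0;

   while (mask && count < max) {
      int i = bit_scan64(&mask);
      out[count++] = i; /* real code: lds_load + build_tbuffer_store_dwords for attribute i */
   }
   return count;
}
```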

Nicolai


+
+   LLVMValueRef lds_ptr = LLVMBuildAdd(gallivm->builder, lds_base,
+   lp_build_const_int32(gallivm, 4 * i),
+"");
+
+   LLVMValueRef buffer_addr = get_buffer_address(ctx, invocation_id,
+ lp_build_const_int32(gallivm, i));
+
+   LLVMValueRef value = lds_load(bld_base, TGSI_TYPE_SIGNED, ~0,
+ lds_ptr);
+
+   build_tbuffer_store_dwords(ctx, buffer, value, 4, buffer_addr,
+  buffer_offset, 0);
+   }
+}
+
  static void si_write_tess_factors(struct lp_build_tgsi_context *bld_base,
  LLVMValueRef rel_patch_id,
  LLVMValueRef invocation_id,
@@ -2564,6 +2606,7 @@ static void si_llvm_emit_tcs_epilogue(struct lp_build_tgsi_context *bld_base)
return;
}

+   si_copy_tcs_inputs(bld_base);
si_write_tess_factors(bld_base, rel_patch_id, invocation_id, tf_lds_offset);
  }

@@ -7374,6 +7417,8 @@ int si_shader_create(struct si_screen *sscreen, LLVMTargetMachineRef tm,
  shader->key.vs.as_ls != mainp->key.vs.as_ls)) ||
(shader->selector->type == PIPE_SHADER_TESS_EVAL &&
 shader->key.tes.as_es != mainp->key.tes.as_es) ||
+   (shader->selector->type == PIPE_SHADER_TESS_CTRL &&
+shader->key.tcs.epilog.inputs_to_copy) ||
shader->selector->type == PIPE_SHADER_COMPUTE) {
/* Monolithic shader (compiled as a whole, has many variants,
 * may take a long time to compile).
diff --git a/src/gallium/drivers/radeonsi/si_shader.h b/src/gallium/drivers/radeonsi/si_shader.h
index 26be25e..67b457b 100644
--- a/src/gallium/drivers/radeonsi/si_shader.h
+++ b/src/gallium/drivers/radeonsi/si_shader.h
@@ -304,6 +304,7 @@ struct si_vs_epilog_bits {
  /* Common TCS bits between the shader key and the epilog key. */
  struct si_tcs_epilog_bits {
unsignedprim_mode:3;
+   uint64_tinputs_to_copy;
  };

  /* Common PS bits between the shader key and the prolog key. */
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 32ac95d..f48582a 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -841,6 +841,9 @@ 

Re: [Mesa-dev] [PATCH 07/14] radeonsi: Add offchip buffer address calculation.

2016-05-10 Thread Nicolai Hähnle

On 10.05.2016 11:25, Nicolai Hähnle wrote:

On 10.05.2016 05:52, Bas Nieuwenhuizen wrote:

Instead of creating a memory area per patch and per vertex, we put
the same attribute of every vertex & patch together. Most loads
and stores access the same attribute across all lanes, only for
different patches and vertices.

For the TCS this results in tightly packed data for 4-component
stores.

For the TES this is not the case as within a patch the loads
often also access the same vertex. However if there are < 4
vertices/patch, this still results in a reduction of the number
of cache lines. In the LDS situation we only do better than worst
case if the data per patch < 64 bytes, which due to the
tessellation factors is pretty much never.

We do not use hardware swizzling for this. It would slightly reduce
the number of executed VALU instructions, but I had issues with
increased wait times that I haven't been able to solve yet.
Furthermore, the tbuffer_store intrinsic does not support both
VGPR offset and an index, so we have a problem storing
indirectly indexed outputs. This can be solved by temporarily
storing arrays in LDS and then copying them, but I don't think
that is worth the effort. The VALU cycles saved by hardware
swizzling amount to about 0.2% of total busy cycles, and that
is without handling the array case.

I chose to group by attribute instead of by component, as the
components of an attribute are often accessed together, and the
software swizzling takes VALU cycles for calculating offsets.

Signed-off-by: Bas Nieuwenhuizen 
---
  src/gallium/drivers/radeonsi/si_shader.c | 143 +++
  1 file changed, 143 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
index adbee73..90830ee 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -664,6 +664,149 @@ static LLVMValueRef get_dw_address(struct si_shader_context *ctx,
  lp_build_const_int32(gallivm, param * 4), "");
  }

+/* The offchip buffer layout for TCS->TES is
+ *
+ * - attribute 0 of patch 0 vertex 0
+ * - attribute 0 of patch 0 vertex 1
+ * - attribute 0 of patch 0 vertex 2
+ *   ...
+ * - attribute 0 of patch 1 vertex 0
+ * - attribute 0 of patch 1 vertex 1
+ *   ...
+ * - attribute 1 of patch 0 vertex 0
+ * - attribute 1 of patch 0 vertex 1
+ *   ...
+ * - per patch attribute 0 of patch 0 vertex 0
+ * - per patch attribute 0 of patch 0 vertex 1


Should be "... of patch 0" and "... of patch 1", right?


+ *   ...
+ *
+ * Note that every attribute has 4 components.
+ */
+static LLVMValueRef get_buffer_address(struct si_shader_context *ctx,
+   LLVMValueRef vertex_index,
+   LLVMValueRef param_index)


Actually, could you change those function names to something like 
get_tcs_tes_buffer_address?


Thanks,
Nicolai


+{
+	struct gallivm_state *gallivm = ctx->radeon_bld.soa.bld_base.base.gallivm;
+	LLVMValueRef base_addr, vertices_per_patch, num_patches, total_vertices;
+	LLVMValueRef param_stride, constant16;
+
+	vertices_per_patch = unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 9, 6);
+	num_patches = unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 0, 9);
+	total_vertices = LLVMBuildMul(gallivm->builder, vertices_per_patch,
+	                              num_patches, "");
+
+	constant16 = lp_build_const_int32(gallivm, 16);
+	if (vertex_index) {
+		base_addr = LLVMBuildMul(gallivm->builder,
+		                         constant16,
+		                         vertices_per_patch, "");
+
+		base_addr = LLVMBuildMul(gallivm->builder, get_rel_patch_id(ctx),
+		                         base_addr, "");
+
+		base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+		                         LLVMBuildMul(gallivm->builder, vertex_index,
+		                                      constant16, ""), "");


Nitpick, but... apply the distributive law and calculate (rel_patch_id *
vertices_per_patch + vertex_idx) * 16.


+
+		param_stride = LLVMBuildMul(gallivm->builder, total_vertices,
+		                            constant16, "");
+	} else {
+		LLVMValueRef patch_data_offset =
+		           unpack_param(ctx, SI_PARAM_TCS_OFFCHIP_LAYOUT, 16, 16);
+
+		base_addr = LLVMBuildMul(gallivm->builder, get_rel_patch_id(ctx),
+		                         constant16, "");
+
+		base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+		                         patch_data_offset, "");
+
+		param_stride = LLVMBuildMul(gallivm->builder, num_patches,
+		                            constant16, "");
+	}
+
+	base_addr = LLVMBuildAdd(gallivm->builder, base_addr,
+	                         LLVMBuildMul(gallivm->builder, param_index,
+	                                      param_stride, ""), "");
+
+	return base_addr;

Re: [Mesa-dev] [PATCH 04/14] radeonsi: Add buffer load functions.

2016-05-10 Thread Bas Nieuwenhuizen
On Tue, May 10, 2016 at 6:28 PM, Nicolai Hähnle  wrote:
> On 10.05.2016 11:25, Bas Nieuwenhuizen wrote:
>>
>> On Tue, May 10, 2016 at 6:13 PM, Nicolai Hähnle 
>> wrote:
>>>
>>> On 10.05.2016 05:52, Bas Nieuwenhuizen wrote:


 Signed-off-by: Bas Nieuwenhuizen 
 ---
    src/gallium/drivers/radeonsi/si_shader.c | 81
    1 file changed, 81 insertions(+)

 diff --git a/src/gallium/drivers/radeonsi/si_shader.c b/src/gallium/drivers/radeonsi/si_shader.c
 index 5897149..d3df4d6 100644
 --- a/src/gallium/drivers/radeonsi/si_shader.c
 +++ b/src/gallium/drivers/radeonsi/si_shader.c
 @@ -733,6 +733,87 @@ static void build_tbuffer_store_dwords(struct si_shader_context *ctx,
  V_008F0C_BUF_NUM_FORMAT_UINT, 1, 0, 1, 1, 0);
}

 +static LLVMValueRef build_buffer_load(struct si_shader_context *ctx,
 +   LLVMValueRef rsrc,
 +   int num_channels,
 +   LLVMValueRef vindex,
 +   LLVMValueRef voffset,
 +   LLVMValueRef soffset,
 +   unsigned inst_offset,
 +   unsigned glc,
 +   unsigned slc)
 +{
 +   struct gallivm_state *gallivm = &ctx->radeon_bld.gallivm;
 +   LLVMValueRef args[] = {
 +   LLVMBuildBitCast(gallivm->builder, rsrc, ctx->v16i8, ""),
 +   voffset ? voffset : vindex,
 +   soffset,
 +   LLVMConstInt(ctx->i32, inst_offset, 0),
 +   LLVMConstInt(ctx->i32, voffset ? 1 : 0, 0), // offen
 +   LLVMConstInt(ctx->i32, vindex ? 1 : 0, 0), //idxen
 +   LLVMConstInt(ctx->i32, glc, 0),
 +   LLVMConstInt(ctx->i32, slc, 0),
 +   LLVMConstInt(ctx->i32, 0, 0), // TFE
 +   };
 +
 +   unsigned func = CLAMP(num_channels, 1, 3) - 1;
 +   LLVMTypeRef types[] = {ctx->i32, LLVMVectorType(ctx->i32, 2), ctx->v4i32};
 +   const char *type_names[] = {"i32", "v2i32", "v4i32"};
 +   const char *arg_type = "i32";
 +
 +   if (voffset && vindex) {
 +   LLVMValueRef vaddr[] = {vindex, voffset};
 +
 +   arg_type = "v2i32";
 +   args[1] = lp_build_gather_values(gallivm, vaddr, 2);
 +   }
 +
 +   char name[256];
 +   snprintf(name, sizeof(name), "llvm.SI.buffer.load.dword.%s.%s",
 +type_names[func], arg_type);
>>>
>>>
>>>
>>> We're generally trying to get away from the llvm.SI.* intrinsics and use the
>>> llvm.amdgcn.* intrinsics instead; in this case, llvm.amdgcn.buffer.load.
>>
>>
>> As far as I can see, llvm.amdgcn.buffer.load doesn't allow specifying the
>> VGPR, SGPR and immediate offsets separately, though. Furthermore, I was
>> trying to avoid an LLVM 3.9 dependency, although I can solve that with an
>> #if based on the LLVM version.
>
>
> Fair enough on the LLVM version dependency.
>
> I also think you're right about llvm.amdgcn.buffer.load, but that's
> something that should be fixed on the LLVM side eventually without
> introducing a new intrinsic.

I am not sure LLVM will ever be able to do that. I don't know whether the
VGPR + SGPR + immediate offset sum wraps around at 32 bits. If it does
not, a separate v_add and the combined offsets behave differently, and
LLVM must not sink the add into the load instruction.

- Bas

>
> I think we should go for the if (HAVE_LLVM) approach even if it produces
> slightly worse code for now (it really should be only one additional v_add
> at most).
>
> Nicolai
>
>
>> - Bas
>>
>>>
>>> Nicolai
>>>
>>>
 +
 +   return lp_build_intrinsic(gallivm->builder, name, types[func], args,
 +                             ARRAY_SIZE(args), LLVMReadOnlyAttribute |
 +                             LLVMNoUnwindAttribute);
 +}
 +
 +static LLVMValueRef buffer_load(struct lp_build_tgsi_context *bld_base,
 +                                enum tgsi_opcode_type type, unsigned swizzle,
 +                                LLVMValueRef buffer, LLVMValueRef offset,
 +                                LLVMValueRef base)
 +{
 +   struct si_shader_context *ctx = si_shader_context(bld_base);
 +   struct gallivm_state *gallivm = bld_base->base.gallivm;
 +   LLVMValueRef value, value2;
 +   LLVMTypeRef llvm_type = tgsi2llvmtype(bld_base, type);
 +   LLVMTypeRef vec_type = LLVMVectorType(llvm_type, 4);
 +
 +   if (swizzle == ~0) {
 +
 +   value = build_buffer_load(ctx, buffer, 4, NULL, base, offset,
 +                             0, 1, 0);
 +
 +  
