Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Jason Ekstrand
On Wed, Sep 14, 2016 at 12:02 AM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > Just looking at the channel enables is not sufficient, at least not on
> Sky
> > Lake.  Channels that are disabled by the sample_mask may show up in the
> > channel enable register as being enabled even if they are not executing.
> > This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
> > executing.  In our handling of interpolateAtSample we do a clever trick
> > with emit_uniformize to call the interpolator once for each unique sample
> > id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
> > infinite loop which hangs the GPU.
> >
> > Signed-off-by: Jason Ekstrand 
>
> NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by
> design (see the doxygen comment in brw_defines.h), which is necessary
> for the instruction to return a well-defined result when only helper
> invocations are enabled in the execution mask.  Several users of the
> instruction are likely to be relying on this.
>

Perhaps I need to be a bit more specific about what problem is being fixed
here.  It's not just that we need to take sample mask into account.  It's
actually a far more subtle problem.  It can happen (Yes, I've seen this in
practice) that a group of channels is disabled (i.e., doesn't execute, not
just helper invocations) but the corresponding bits in ec0 are set to 1.
Maybe this means that ec0 is still broken on Sky Lake.  Maybe it just means
that ec0 doesn't mean what it looks like it means.  I'm not sure.  What I
do know is that FIND_LIVE_CHANNEL is returning a 100% dead channel.

I'm fine if this is the wrong fix.  Maybe we need to just do the gen7 thing
everywhere.


> And isn't interpolateAtSample supposed to give a well-defined result too
> when it's run from a helper invocation?
>

Probably?


> > ---
> >  src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
> >  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22
> +++---
> >  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
> >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
> >  5 files changed, 21 insertions(+), 11 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu.h
> b/src/mesa/drivers/dri/i965/brw_eu.h
> > index 3e52764..9aaab78 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu.h
> > +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> > @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
> >
> >  void
> >  brw_find_live_channel(struct brw_codegen *p,
> > -  struct brw_reg dst);
> > +  struct brw_reg dst,
> > +  struct brw_reg sample_mask);
> >
> >  void
> >  brw_broadcast(struct brw_codegen *p,
> > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > index 3b12030..f593a8d 100644
> > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> > @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen
> *p,
> >  }
> >
> >  void
> > -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
> > +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
> > +  struct brw_reg sample_mask)
> >  {
> > const struct gen_device_info *devinfo = p->devinfo;
> > const unsigned exec_size = 1 << brw_inst_exec_size(devinfo,
> p->current);
> > @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p,
> struct brw_reg dst)
> >
> >if (devinfo->gen >= 8) {
> >   /* Getting the first active channel index is easy on Gen8:
> Just find
> > -  * the first bit set in the mask register.  The same register
> exists
> > -  * on HSW already but it reads back as all ones when the
> current
> > -  * instruction has execution masking disabled, so it's kind of
> > -  * useless.
> > +  * the first bit set in the mask register AND the sample
> mask.  The
> > +  * same register exists on HSW already but it reads back as
> all ones
> > +  * when the current instruction has execution masking
> disabled, so
> > +  * it's kind of useless.
> >*/
> > - inst = brw_FBL(p, vec1(dst),
> > -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
> > + struct brw_reg mask_reg = retype(brw_mask_reg(0),
> > +  BRW_REGISTER_TYPE_UD);
> > + if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
> > + sample_mask.ud != 0x) {
> > +brw_AND(p, vec1(dst), mask_reg, sample_mask);
> > +mask_reg = vec1(dst);
> > + }
> > +
> > + inst = brw_FBL(p, vec1(dst), mask_reg);
> >
> >   /* Quarter control has the effect of magically shifting the
> value of
> >* this register so you'll get the first active cha

Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-14 Thread Francisco Jerez
Jason Ekstrand  writes:

> Just looking at the channel enables is not sufficient, at least not on Sky
> Lake.  Channels that are disabled by the sample_mask may show up in the
> channel enable register as being enabled even if they are not executing.
> This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
> executing.  In our handling of interpolateAtSample we do a clever trick
> with emit_uniformize to call the interpolator once for each unique sample
> id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
> infinite loop which hangs the GPU.
>
> Signed-off-by: Jason Ekstrand 

NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by
design (see the doxygen comment in brw_defines.h), which is necessary
for the instruction to return a well-defined result when only helper
invocations are enabled in the execution mask.  Several users of the
instruction are likely to be relying on this.

And isn't interpolateAtSample supposed to give a well-defined result too
when it's run from a helper invocation?

> ---
>  src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22 +++---
>  src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
>  5 files changed, 21 insertions(+), 11 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
> b/src/mesa/drivers/dri/i965/brw_eu.h
> index 3e52764..9aaab78 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu.h
> +++ b/src/mesa/drivers/dri/i965/brw_eu.h
> @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
>  
>  void
>  brw_find_live_channel(struct brw_codegen *p,
> -  struct brw_reg dst);
> +  struct brw_reg dst,
> +  struct brw_reg sample_mask);
>  
>  void
>  brw_broadcast(struct brw_codegen *p,
> diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
> b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> index 3b12030..f593a8d 100644
> --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
> +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
> @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
>  }
>  
>  void
> -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
> +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
> +  struct brw_reg sample_mask)
>  {
> const struct gen_device_info *devinfo = p->devinfo;
> const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
> @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct 
> brw_reg dst)
>  
>if (devinfo->gen >= 8) {
>   /* Getting the first active channel index is easy on Gen8: Just find
> -  * the first bit set in the mask register.  The same register exists
> -  * on HSW already but it reads back as all ones when the current
> -  * instruction has execution masking disabled, so it's kind of
> -  * useless.
> +  * the first bit set in the mask register AND the sample mask.  The
> +  * same register exists on HSW already but it reads back as all ones
> +  * when the current instruction has execution masking disabled, so
> +  * it's kind of useless.
>*/
> - inst = brw_FBL(p, vec1(dst),
> -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
> + struct brw_reg mask_reg = retype(brw_mask_reg(0),
> +  BRW_REGISTER_TYPE_UD);
> + if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
> + sample_mask.ud != 0x) {
> +brw_AND(p, vec1(dst), mask_reg, sample_mask);
> +mask_reg = vec1(dst);
> + }
> +
> + inst = brw_FBL(p, vec1(dst), mask_reg);
>  
>   /* Quarter control has the effect of magically shifting the value of
>* this register so you'll get the first active channel relative to
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
> b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> index 483672f..45b5f88 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
> @@ -407,7 +407,8 @@ namespace brw {
>   const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
>   const dst_reg dst = vgrf(src.type);
>  
> - ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
> + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
> +   sample_mask_reg());
>   ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
> 0));
>  
>   return src_reg(component(dst, 0));
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> index 2f4ba7b..d923b0b 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp

[Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL

2016-09-13 Thread Jason Ekstrand
Just looking at the channel enables is not sufficient, at least not on Sky
Lake.  Channels that are disabled by the sample_mask may show up in the
channel enable register as being enabled even if they are not executing.
This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually
executing.  In our handling of interpolateAtSample we do a clever trick
with emit_uniformize to call the interpolator once for each unique sample
id.  Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an
infinite loop which hangs the GPU.

Signed-off-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_eu.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_eu_emit.c  | 22 +++---
 src/mesa/drivers/dri/i965/brw_fs_builder.h   |  3 ++-
 src/mesa/drivers/dri/i965/brw_fs_generator.cpp   |  2 +-
 src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  2 +-
 5 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_eu.h 
b/src/mesa/drivers/dri/i965/brw_eu.h
index 3e52764..9aaab78 100644
--- a/src/mesa/drivers/dri/i965/brw_eu.h
+++ b/src/mesa/drivers/dri/i965/brw_eu.h
@@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 
 void
 brw_find_live_channel(struct brw_codegen *p,
-  struct brw_reg dst);
+  struct brw_reg dst,
+  struct brw_reg sample_mask);
 
 void
 brw_broadcast(struct brw_codegen *p,
diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c 
b/src/mesa/drivers/dri/i965/brw_eu_emit.c
index 3b12030..f593a8d 100644
--- a/src/mesa/drivers/dri/i965/brw_eu_emit.c
+++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c
@@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p,
 }
 
 void
-brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst)
+brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst,
+  struct brw_reg sample_mask)
 {
const struct gen_device_info *devinfo = p->devinfo;
const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current);
@@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct 
brw_reg dst)
 
   if (devinfo->gen >= 8) {
  /* Getting the first active channel index is easy on Gen8: Just find
-  * the first bit set in the mask register.  The same register exists
-  * on HSW already but it reads back as all ones when the current
-  * instruction has execution masking disabled, so it's kind of
-  * useless.
+  * the first bit set in the mask register AND the sample mask.  The
+  * same register exists on HSW already but it reads back as all ones
+  * when the current instruction has execution masking disabled, so
+  * it's kind of useless.
   */
- inst = brw_FBL(p, vec1(dst),
-retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD));
+ struct brw_reg mask_reg = retype(brw_mask_reg(0),
+  BRW_REGISTER_TYPE_UD);
+ if (sample_mask.file != BRW_IMMEDIATE_VALUE ||
+ sample_mask.ud != 0x) {
+brw_AND(p, vec1(dst), mask_reg, sample_mask);
+mask_reg = vec1(dst);
+ }
+
+ inst = brw_FBL(p, vec1(dst), mask_reg);
 
  /* Quarter control has the effect of magically shifting the value of
   * this register so you'll get the first active channel relative to
diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h 
b/src/mesa/drivers/dri/i965/brw_fs_builder.h
index 483672f..45b5f88 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_builder.h
+++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h
@@ -407,7 +407,8 @@ namespace brw {
  const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD);
  const dst_reg dst = vgrf(src.type);
 
- ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index);
+ ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index,
+   sample_mask_reg());
  ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 
0));
 
  return src_reg(component(dst, 0));
diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
index 2f4ba7b..d923b0b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
@@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
  break;
 
   case SHADER_OPCODE_FIND_LIVE_CHANNEL:
- brw_find_live_channel(p, dst);
+ brw_find_live_channel(p, dst, src[0]);
  break;
 
   case SHADER_OPCODE_BROADCAST:
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
index 256abae..63fca6f 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
@@ -1863,7 +1863,7 @@ generate_code(struct brw_c