Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL
On Wed, Sep 14, 2016 at 12:02 AM, Francisco Jerez wrote: > Jason Ekstrand writes: > > > Just looking at the channel enables is not sufficient, at least not on > Sky > > Lake. Channels that are disabled by the sample_mask may show up in the > > channel enable register as being enabled even if they are not executing. > > This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually > > executing. In our handling of interpolateAtSample we do a clever trick > > with emit_uniformize to call the interpolator once for each unique sample > > id. Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an > > infinite loop which hangs the GPU. > > > > Signed-off-by: Jason Ekstrand > > NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by > design (see the doxygen comment in brw_defines.h), which is necessary > for the instruction to return a well-defined result when only helper > invocations are enabled in the execution mask. Several users of the > instruction are likely to be relying on this. > Perhaps I need to be a bit more specific about what problem is being fixed here. It's not just that we need to take sample mask into account. It's actually a far more subtle problem. It can happen (Yes, I've seen this in practice) that a group of channels is disabled (i.e., doesn't execute, not just helper invocations) but the corresponding bits in ec0 are set to 1. Maybe this means that ec0 is still broken on Sky Lake. Maybe it just means that ec0 doesn't mean what it looks like it means. I'm not sure. What I do know is that FIND_LIVE_CHANNEL is returning a 100% dead channel. I'm fine if this is the wrong fix. Maybe we need to just do the gen7 thing everywhere. > And isn't interpolateAtSample supposed to give a well-defined result too > when it's run from a helper invocation? > Probably? > > --- > > src/mesa/drivers/dri/i965/brw_eu.h | 3 ++- > > src/mesa/drivers/dri/i965/brw_eu_emit.c | 22 > +++--- > > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 ++- > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- > > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +- > > 5 files changed, 21 insertions(+), 11 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_eu.h > b/src/mesa/drivers/dri/i965/brw_eu.h > > index 3e52764..9aaab78 100644 > > --- a/src/mesa/drivers/dri/i965/brw_eu.h > > +++ b/src/mesa/drivers/dri/i965/brw_eu.h > > @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p, > > > > void > > brw_find_live_channel(struct brw_codegen *p, > > - struct brw_reg dst); > > + struct brw_reg dst, > > + struct brw_reg sample_mask); > > > > void > > brw_broadcast(struct brw_codegen *p, > > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > index 3b12030..f593a8d 100644 > > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > > @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen > *p, > > } > > > > void > > -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst) > > +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst, > > + struct brw_reg sample_mask) > > { > > const struct gen_device_info *devinfo = p->devinfo; > > const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, > p->current); > > @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, > struct brw_reg dst) > > > >if (devinfo->gen >= 8) { > > /* Getting the first active channel index is easy on Gen8: > Just find > > - * the first bit set in the mask register. The same register > exists > > - * on HSW already but it reads back as all ones when the > current > > - * instruction has execution masking disabled, so it's kind of > > - * useless. > > + * the first bit set in the mask register AND the sample > mask. The > > + * same register exists on HSW already but it reads back as > all ones > > + * when the current instruction has execution masking > disabled, so > > + * it's kind of useless. > >*/ > > - inst = brw_FBL(p, vec1(dst), > > -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD)); > > + struct brw_reg mask_reg = retype(brw_mask_reg(0), > > + BRW_REGISTER_TYPE_UD); > > + if (sample_mask.file != BRW_IMMEDIATE_VALUE || > > + sample_mask.ud != 0x) { > > +brw_AND(p, vec1(dst), mask_reg, sample_mask); > > +mask_reg = vec1(dst); > > + } > > + > > + inst = brw_FBL(p, vec1(dst), mask_reg); > > > > /* Quarter control has the effect of magically shifting the > value of > >* this register so you'll get the first active cha
Re: [Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL
Jason Ekstrand writes: > Just looking at the channel enables is not sufficient, at least not on Sky > Lake. Channels that are disabled by the sample_mask may show up in the > channel enable register as being enabled even if they are not executing. > This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually > executing. In our handling of interpolateAtSample we do a clever trick > with emit_uniformize to call the interpolator once for each unique sample > id. Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an > infinite loop which hangs the GPU. > > Signed-off-by: Jason Ekstrand NAK, FIND_LIVE_CHANNEL returns channels from the EU execution mask by design (see the doxygen comment in brw_defines.h), which is necessary for the instruction to return a well-defined result when only helper invocations are enabled in the execution mask. Several users of the instruction are likely to be relying on this. And isn't interpolateAtSample supposed to give a well-defined result too when it's run from a helper invocation? > --- > src/mesa/drivers/dri/i965/brw_eu.h | 3 ++- > src/mesa/drivers/dri/i965/brw_eu_emit.c | 22 +++--- > src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 ++- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +- > 5 files changed, 21 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_eu.h > b/src/mesa/drivers/dri/i965/brw_eu.h > index 3e52764..9aaab78 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu.h > +++ b/src/mesa/drivers/dri/i965/brw_eu.h > @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p, > > void > brw_find_live_channel(struct brw_codegen *p, > - struct brw_reg dst); > + struct brw_reg dst, > + struct brw_reg sample_mask); > > void > brw_broadcast(struct brw_codegen *p, > diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c > b/src/mesa/drivers/dri/i965/brw_eu_emit.c > index 3b12030..f593a8d 100644 > --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c > +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c > @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p, > } > > void > -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst) > +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst, > + struct brw_reg sample_mask) > { > const struct gen_device_info *devinfo = p->devinfo; > const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current); > @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct > brw_reg dst) > >if (devinfo->gen >= 8) { > /* Getting the first active channel index is easy on Gen8: Just find > - * the first bit set in the mask register. The same register exists > - * on HSW already but it reads back as all ones when the current > - * instruction has execution masking disabled, so it's kind of > - * useless. > + * the first bit set in the mask register AND the sample mask. The > + * same register exists on HSW already but it reads back as all ones > + * when the current instruction has execution masking disabled, so > + * it's kind of useless. >*/ > - inst = brw_FBL(p, vec1(dst), > -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD)); > + struct brw_reg mask_reg = retype(brw_mask_reg(0), > + BRW_REGISTER_TYPE_UD); > + if (sample_mask.file != BRW_IMMEDIATE_VALUE || > + sample_mask.ud != 0x) { > +brw_AND(p, vec1(dst), mask_reg, sample_mask); > +mask_reg = vec1(dst); > + } > + > + inst = brw_FBL(p, vec1(dst), mask_reg); > > /* Quarter control has the effect of magically shifting the value of >* this register so you'll get the first active channel relative to > diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h > b/src/mesa/drivers/dri/i965/brw_fs_builder.h > index 483672f..45b5f88 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h > +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h > @@ -407,7 +407,8 @@ namespace brw { > const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); > const dst_reg dst = vgrf(src.type); > > - ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); > + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index, > + sample_mask_reg()); > ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, > 0)); > > return src_reg(component(dst, 0)); > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > index 2f4ba7b..d923b0b 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
[Mesa-dev] [PATCH] i965/fs: Take the sample mask into account in FIND_LIVE_CHANNEL
Just looking at the channel enables is not sufficient, at least not on Sky Lake. Channels that are disabled by the sample_mask may show up in the channel enable register as being enabled even if they are not executing. This can cause FIND_LIVE_CHANNEL to return a channel that isn't actually executing. In our handling of interpolateAtSample we do a clever trick with emit_uniformize to call the interpolator once for each unique sample id. Thanks to FIND_LIVE_CHANNEL returning a dead channel, we can get an infinite loop which hangs the GPU. Signed-off-by: Jason Ekstrand --- src/mesa/drivers/dri/i965/brw_eu.h | 3 ++- src/mesa/drivers/dri/i965/brw_eu_emit.c | 22 +++--- src/mesa/drivers/dri/i965/brw_fs_builder.h | 3 ++- src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 2 +- 5 files changed, 21 insertions(+), 11 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_eu.h b/src/mesa/drivers/dri/i965/brw_eu.h index 3e52764..9aaab78 100644 --- a/src/mesa/drivers/dri/i965/brw_eu.h +++ b/src/mesa/drivers/dri/i965/brw_eu.h @@ -488,7 +488,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p, void brw_find_live_channel(struct brw_codegen *p, - struct brw_reg dst); + struct brw_reg dst, + struct brw_reg sample_mask); void brw_broadcast(struct brw_codegen *p, diff --git a/src/mesa/drivers/dri/i965/brw_eu_emit.c b/src/mesa/drivers/dri/i965/brw_eu_emit.c index 3b12030..f593a8d 100644 --- a/src/mesa/drivers/dri/i965/brw_eu_emit.c +++ b/src/mesa/drivers/dri/i965/brw_eu_emit.c @@ -3361,7 +3361,8 @@ brw_pixel_interpolator_query(struct brw_codegen *p, } void -brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst) +brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst, + struct brw_reg sample_mask) { const struct gen_device_info *devinfo = p->devinfo; const unsigned exec_size = 1 << brw_inst_exec_size(devinfo, p->current); @@ -3377,13 +3378,20 @@ brw_find_live_channel(struct brw_codegen *p, struct brw_reg dst) if (devinfo->gen >= 8) { /* Getting the first active channel index is easy on Gen8: Just find - * the first bit set in the mask register. The same register exists - * on HSW already but it reads back as all ones when the current - * instruction has execution masking disabled, so it's kind of - * useless. + * the first bit set in the mask register AND the sample mask. The + * same register exists on HSW already but it reads back as all ones + * when the current instruction has execution masking disabled, so + * it's kind of useless. */ - inst = brw_FBL(p, vec1(dst), -retype(brw_mask_reg(0), BRW_REGISTER_TYPE_UD)); + struct brw_reg mask_reg = retype(brw_mask_reg(0), + BRW_REGISTER_TYPE_UD); + if (sample_mask.file != BRW_IMMEDIATE_VALUE || + sample_mask.ud != 0x) { +brw_AND(p, vec1(dst), mask_reg, sample_mask); +mask_reg = vec1(dst); + } + + inst = brw_FBL(p, vec1(dst), mask_reg); /* Quarter control has the effect of magically shifting the value of * this register so you'll get the first active channel relative to diff --git a/src/mesa/drivers/dri/i965/brw_fs_builder.h b/src/mesa/drivers/dri/i965/brw_fs_builder.h index 483672f..45b5f88 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_builder.h +++ b/src/mesa/drivers/dri/i965/brw_fs_builder.h @@ -407,7 +407,8 @@ namespace brw { const dst_reg chan_index = vgrf(BRW_REGISTER_TYPE_UD); const dst_reg dst = vgrf(src.type); - ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index); + ubld.emit(SHADER_OPCODE_FIND_LIVE_CHANNEL, chan_index, + sample_mask_reg()); ubld.emit(SHADER_OPCODE_BROADCAST, dst, src, component(chan_index, 0)); return src_reg(component(dst, 0)); diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp index 2f4ba7b..d923b0b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp @@ -2041,7 +2041,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width) break; case SHADER_OPCODE_FIND_LIVE_CHANNEL: - brw_find_live_channel(p, dst); + brw_find_live_channel(p, dst, src[0]); break; case SHADER_OPCODE_BROADCAST: diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp index 256abae..63fca6f 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp @@ -1863,7 +1863,7 @@ generate_code(struct brw_c