Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
Hello, Sorry for late response. I was on vacation. I am glad that this fix is ok. Thanks a lot for your review. Regards, Andrii. On 2018-06-25 12:11 PM, Iago Toral wrote: Thanks for testing Mark. Andrii, I'll add my Reviewed-by and and push the patch to master later today (I'll also queue it for the next stable release). Thanks for fixing this! Iago On Fri, 2018-06-22 at 13:18 -0700, Mark Janes wrote: Tested-by: Mark Janes Iago Toral writes: Thanks Andrii, this version looks good to me. Mark: this change fixes a GPU hang in sandy bridge with geometry shaders (the change itself affects a path in the driver that is only executed in SNB with GS, so nothing else is affected). While I think the change in here is correct according to the PRMs and in fact it seems to fix the GPU hang reported in Bugzilla, since this is tinkering with the way in which we allocate and free VUEs for SNB GS I believe that if this breaks anything it might produce a GPU hang and in that case I would rather not hang Jenkins for everyone else until you have a chance to restore it, so in order to minimize that risk, could you run this through Jenkins when you are available? If that is inconvenient for you just let me know and I will send it myself late in my day on Monday to minimize the risk. Thanks, Iago On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote: We can not use the VUE Dereference flags combination for EOT message under ILK and SNB because the threads are not initialized there with initial VUE handle unlike Pre-IL. So to avoid GPU hangs on SNB and ILK we need to avoid usage of the VUE Dereference flags combination. (Was tested only on SNB but according to the specification SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 the ILK must behave itself in the similar way) v2: Approach to fix this issue was changed. Instead of different EOT flags in the program end we will create VUE every time even if GS produces no output. v3: Clean up the patch. Signed-off-by: Andrii Simiklit Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 --- src/intel/compiler/gen6_gs_visitor.cpp | 42 +--- -- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/intel/compiler/gen6_gs_visitor.cpp b/src/intel/compiler/gen6_gs_visitor.cpp index 66c69fb..c9571cc 100644 --- a/src/intel/compiler/gen6_gs_visitor.cpp +++ b/src/intel/compiler/gen6_gs_visitor.cpp @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); /* Issue the FF_SYNC message and obtain the initial VUE handle. */ + this->current_annotation = "gen6 thread end: ff_sync"; + + vec4_instruction *inst = NULL; + if (prog->info.has_transform_feedback_varyings) { + src_reg sol_temp(this, glsl_type::uvec4_type); + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, + dst_reg(this->svbi), + this->vertex_count, + this->prim_count, + sol_temp); + inst = emit(GS_OPCODE_FF_SYNC, + dst_reg(this->temp), this->prim_count, this- svbi); + } else { + inst = emit(GS_OPCODE_FF_SYNC, + dst_reg(this->temp), this->prim_count, brw_imm_ud(0u)); + } + inst->base_mrf = base_mrf; + emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G)); emit(IF(BRW_PREDICATE_NORMAL)); { - this->current_annotation = "gen6 thread end: ff_sync"; - - vec4_instruction *inst; - if (prog->info.has_transform_feedback_varyings) { - src_reg sol_temp(this, glsl_type::uvec4_type); - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, - dst_reg(this->svbi), - this->vertex_count, - this->prim_count, - sol_temp); - inst = emit(GS_OPCODE_FF_SYNC, - dst_reg(this->temp), this->prim_count, this- svbi); - } else { - inst = emit(GS_OPCODE_FF_SYNC, - dst_reg(this->temp), this->prim_count, brw_imm_ud(0u)); - } - inst->base_mrf = base_mrf; - /* Loop over all buffered vertices and emit URB write messages */ this->current_annotation = "gen6 thread end: urb writes init"; src_reg vertex(this, glsl_type::uint_type); @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() dst_reg reg = dst_reg(MRF, mrf); reg.type = output_reg[varying][0].type; data.type = reg.type; - vec4_instruction *inst = emit(MOV(reg, data)); + inst = emit(MOV(reg, data)); inst->force_writemask_all = true; mrf++; @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end() * * However, this would lead us to end the program with an ENDIF opcode, * which we want to avoid, so what we do is that we always request a new -* VUE handle every time we do a URB WRITE, even for the last vertex we emit. +* VUE handle every time, even if GS
Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
Thanks for testing Mark. Andrii, I'll add my Reviewed-by and and push the patch to master later today (I'll also queue it for the next stable release). Thanks for fixing this! Iago On Fri, 2018-06-22 at 13:18 -0700, Mark Janes wrote: > Tested-by: Mark Janes > > Iago Toral writes: > > > Thanks Andrii, this version looks good to me. > > > > Mark: this change fixes a GPU hang in sandy bridge with geometry > > shaders (the change itself affects a path in the driver that is > > only > > executed in SNB with GS, so nothing else is affected). While I > > think > > the change in here is correct according to the PRMs and in fact it > > seems to fix the GPU hang reported in Bugzilla, since this is > > tinkering > > with the way in which we allocate and free VUEs for SNB GS I > > believe > > that if this breaks anything it might produce a GPU hang and in > > that > > case I would rather not hang Jenkins for everyone else until you > > have a > > chance to restore it, so in order to minimize that risk, could you > > run > > this through Jenkins when you are available? If that is > > inconvenient > > for you just let me know and I will send it myself late in my day > > on > > Monday to minimize the risk. > > > > Thanks, > > Iago > > > > On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote: > > > We can not use the VUE Dereference flags combination for EOT > > > message under ILK and SNB because the threads are not initialized > > > there with initial VUE handle unlike Pre-IL. > > > So to avoid GPU hangs on SNB and ILK we need > > > to avoid usage of the VUE Dereference flags combination. > > > (Was tested only on SNB but according to the specification > > > SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 > > > the ILK must behave itself in the similar way) > > > > > > v2: Approach to fix this issue was changed. > > > Instead of different EOT flags in the program end > > > we will create VUE every time even if GS produces no output. > > > > > > v3: Clean up the patch. > > > Signed-off-by: Andrii Simiklit > > > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 > > > > > > --- > > > src/intel/compiler/gen6_gs_visitor.cpp | 42 +--- > > > > > > -- > > > 1 file changed, 21 insertions(+), 21 deletions(-) > > > > > > diff --git a/src/intel/compiler/gen6_gs_visitor.cpp > > > b/src/intel/compiler/gen6_gs_visitor.cpp > > > index 66c69fb..c9571cc 100644 > > > --- a/src/intel/compiler/gen6_gs_visitor.cpp > > > +++ b/src/intel/compiler/gen6_gs_visitor.cpp > > > @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() > > > int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); > > > > > > /* Issue the FF_SYNC message and obtain the initial VUE > > > handle. > > > */ > > > + this->current_annotation = "gen6 thread end: ff_sync"; > > > + > > > + vec4_instruction *inst = NULL; > > > + if (prog->info.has_transform_feedback_varyings) { > > > + src_reg sol_temp(this, glsl_type::uvec4_type); > > > + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, > > > + dst_reg(this->svbi), > > > + this->vertex_count, > > > + this->prim_count, > > > + sol_temp); > > > + inst = emit(GS_OPCODE_FF_SYNC, > > > + dst_reg(this->temp), this->prim_count, this- > > > > svbi); > > > > > > + } else { > > > + inst = emit(GS_OPCODE_FF_SYNC, > > > + dst_reg(this->temp), this->prim_count, > > > brw_imm_ud(0u)); > > > + } > > > + inst->base_mrf = base_mrf; > > > + > > > emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), > > > BRW_CONDITIONAL_G)); > > > emit(IF(BRW_PREDICATE_NORMAL)); > > > { > > > - this->current_annotation = "gen6 thread end: ff_sync"; > > > - > > > - vec4_instruction *inst; > > > - if (prog->info.has_transform_feedback_varyings) { > > > - src_reg sol_temp(this, glsl_type::uvec4_type); > > > - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, > > > - dst_reg(this->svbi), > > > - this->vertex_count, > > > - this->prim_count, > > > - sol_temp); > > > - inst = emit(GS_OPCODE_FF_SYNC, > > > - dst_reg(this->temp), this->prim_count, > > > this- > > > > svbi); > > > > > > - } else { > > > - inst = emit(GS_OPCODE_FF_SYNC, > > > - dst_reg(this->temp), this->prim_count, > > > brw_imm_ud(0u)); > > > - } > > > - inst->base_mrf = base_mrf; > > > - > > >/* Loop over all buffered vertices and emit URB write > > > messages > > > */ > > >this->current_annotation = "gen6 thread end: urb writes > > > init"; > > >src_reg vertex(this, glsl_type::uint_type); > > > @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() > > > dst_reg reg = dst_reg(MRF, mrf); > > > reg.type = output_reg[varying][0].type; > > > data.type = reg.type; > > > - vec4_instruction *inst =
Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
Tested-by: Mark Janes Iago Toral writes: > Thanks Andrii, this version looks good to me. > > Mark: this change fixes a GPU hang in sandy bridge with geometry > shaders (the change itself affects a path in the driver that is only > executed in SNB with GS, so nothing else is affected). While I think > the change in here is correct according to the PRMs and in fact it > seems to fix the GPU hang reported in Bugzilla, since this is tinkering > with the way in which we allocate and free VUEs for SNB GS I believe > that if this breaks anything it might produce a GPU hang and in that > case I would rather not hang Jenkins for everyone else until you have a > chance to restore it, so in order to minimize that risk, could you run > this through Jenkins when you are available? If that is inconvenient > for you just let me know and I will send it myself late in my day on > Monday to minimize the risk. > > Thanks, > Iago > > On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote: >> We can not use the VUE Dereference flags combination for EOT >> message under ILK and SNB because the threads are not initialized >> there with initial VUE handle unlike Pre-IL. >> So to avoid GPU hangs on SNB and ILK we need >> to avoid usage of the VUE Dereference flags combination. >> (Was tested only on SNB but according to the specification >> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 >> the ILK must behave itself in the similar way) >> >> v2: Approach to fix this issue was changed. >> Instead of different EOT flags in the program end >> we will create VUE every time even if GS produces no output. >> >> v3: Clean up the patch. >> Signed-off-by: Andrii Simiklit >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 >> >> --- >> src/intel/compiler/gen6_gs_visitor.cpp | 42 +--- >> -- >> 1 file changed, 21 insertions(+), 21 deletions(-) >> >> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp >> b/src/intel/compiler/gen6_gs_visitor.cpp >> index 66c69fb..c9571cc 100644 >> --- a/src/intel/compiler/gen6_gs_visitor.cpp >> +++ b/src/intel/compiler/gen6_gs_visitor.cpp >> @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() >> int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); >> >> /* Issue the FF_SYNC message and obtain the initial VUE handle. >> */ >> + this->current_annotation = "gen6 thread end: ff_sync"; >> + >> + vec4_instruction *inst = NULL; >> + if (prog->info.has_transform_feedback_varyings) { >> + src_reg sol_temp(this, glsl_type::uvec4_type); >> + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, >> + dst_reg(this->svbi), >> + this->vertex_count, >> + this->prim_count, >> + sol_temp); >> + inst = emit(GS_OPCODE_FF_SYNC, >> + dst_reg(this->temp), this->prim_count, this- >> >svbi); >> + } else { >> + inst = emit(GS_OPCODE_FF_SYNC, >> + dst_reg(this->temp), this->prim_count, >> brw_imm_ud(0u)); >> + } >> + inst->base_mrf = base_mrf; >> + >> emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), >> BRW_CONDITIONAL_G)); >> emit(IF(BRW_PREDICATE_NORMAL)); >> { >> - this->current_annotation = "gen6 thread end: ff_sync"; >> - >> - vec4_instruction *inst; >> - if (prog->info.has_transform_feedback_varyings) { >> - src_reg sol_temp(this, glsl_type::uvec4_type); >> - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, >> - dst_reg(this->svbi), >> - this->vertex_count, >> - this->prim_count, >> - sol_temp); >> - inst = emit(GS_OPCODE_FF_SYNC, >> - dst_reg(this->temp), this->prim_count, this- >> >svbi); >> - } else { >> - inst = emit(GS_OPCODE_FF_SYNC, >> - dst_reg(this->temp), this->prim_count, >> brw_imm_ud(0u)); >> - } >> - inst->base_mrf = base_mrf; >> - >>/* Loop over all buffered vertices and emit URB write messages >> */ >>this->current_annotation = "gen6 thread end: urb writes init"; >>src_reg vertex(this, glsl_type::uint_type); >> @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() >> dst_reg reg = dst_reg(MRF, mrf); >> reg.type = output_reg[varying][0].type; >> data.type = reg.type; >> - vec4_instruction *inst = emit(MOV(reg, data)); >> + inst = emit(MOV(reg, data)); >> inst->force_writemask_all = true; >> >> mrf++; >> @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end() >> * >> * However, this would lead us to end the program with an ENDIF >> opcode, >> * which we want to avoid, so what we do is that we always >> request a new >> -* VUE handle every time we do a URB WRITE, even for the last >> vertex we emit. >> +* VUE handle every time, even if GS produces no output. >> * With this we make sure that whether we have emitted at least >> one vertex >>
Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
Hi Iago, It is usually safe to push changes to CI which provoke GPU hangs. Occasionally, systems do not recover, so I appreciate the extra notification. I've sent this commit for testing and will let you know how it goes. -Mark Iago Toral writes: > Thanks Andrii, this version looks good to me. > > Mark: this change fixes a GPU hang in sandy bridge with geometry > shaders (the change itself affects a path in the driver that is only > executed in SNB with GS, so nothing else is affected). While I think > the change in here is correct according to the PRMs and in fact it > seems to fix the GPU hang reported in Bugzilla, since this is tinkering > with the way in which we allocate and free VUEs for SNB GS I believe > that if this breaks anything it might produce a GPU hang and in that > case I would rather not hang Jenkins for everyone else until you have a > chance to restore it, so in order to minimize that risk, could you run > this through Jenkins when you are available? If that is inconvenient > for you just let me know and I will send it myself late in my day on > Monday to minimize the risk. > > Thanks, > Iago > > On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote: >> We can not use the VUE Dereference flags combination for EOT >> message under ILK and SNB because the threads are not initialized >> there with initial VUE handle unlike Pre-IL. >> So to avoid GPU hangs on SNB and ILK we need >> to avoid usage of the VUE Dereference flags combination. >> (Was tested only on SNB but according to the specification >> SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 >> the ILK must behave itself in the similar way) >> >> v2: Approach to fix this issue was changed. >> Instead of different EOT flags in the program end >> we will create VUE every time even if GS produces no output. >> >> v3: Clean up the patch. >> Signed-off-by: Andrii Simiklit >> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 >> >> --- >> src/intel/compiler/gen6_gs_visitor.cpp | 42 +--- >> -- >> 1 file changed, 21 insertions(+), 21 deletions(-) >> >> diff --git a/src/intel/compiler/gen6_gs_visitor.cpp >> b/src/intel/compiler/gen6_gs_visitor.cpp >> index 66c69fb..c9571cc 100644 >> --- a/src/intel/compiler/gen6_gs_visitor.cpp >> +++ b/src/intel/compiler/gen6_gs_visitor.cpp >> @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() >> int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); >> >> /* Issue the FF_SYNC message and obtain the initial VUE handle. >> */ >> + this->current_annotation = "gen6 thread end: ff_sync"; >> + >> + vec4_instruction *inst = NULL; >> + if (prog->info.has_transform_feedback_varyings) { >> + src_reg sol_temp(this, glsl_type::uvec4_type); >> + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, >> + dst_reg(this->svbi), >> + this->vertex_count, >> + this->prim_count, >> + sol_temp); >> + inst = emit(GS_OPCODE_FF_SYNC, >> + dst_reg(this->temp), this->prim_count, this- >> >svbi); >> + } else { >> + inst = emit(GS_OPCODE_FF_SYNC, >> + dst_reg(this->temp), this->prim_count, >> brw_imm_ud(0u)); >> + } >> + inst->base_mrf = base_mrf; >> + >> emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), >> BRW_CONDITIONAL_G)); >> emit(IF(BRW_PREDICATE_NORMAL)); >> { >> - this->current_annotation = "gen6 thread end: ff_sync"; >> - >> - vec4_instruction *inst; >> - if (prog->info.has_transform_feedback_varyings) { >> - src_reg sol_temp(this, glsl_type::uvec4_type); >> - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, >> - dst_reg(this->svbi), >> - this->vertex_count, >> - this->prim_count, >> - sol_temp); >> - inst = emit(GS_OPCODE_FF_SYNC, >> - dst_reg(this->temp), this->prim_count, this- >> >svbi); >> - } else { >> - inst = emit(GS_OPCODE_FF_SYNC, >> - dst_reg(this->temp), this->prim_count, >> brw_imm_ud(0u)); >> - } >> - inst->base_mrf = base_mrf; >> - >>/* Loop over all buffered vertices and emit URB write messages >> */ >>this->current_annotation = "gen6 thread end: urb writes init"; >>src_reg vertex(this, glsl_type::uint_type); >> @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() >> dst_reg reg = dst_reg(MRF, mrf); >> reg.type = output_reg[varying][0].type; >> data.type = reg.type; >> - vec4_instruction *inst = emit(MOV(reg, data)); >> + inst = emit(MOV(reg, data)); >> inst->force_writemask_all = true; >> >> mrf++; >> @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end() >> * >> * However, this would lead us to end the program with an ENDIF >> opcode, >> * which we want to avoid, so what we do is that we always >> request a new >> -* VUE handle every time we do
Re: [Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
Thanks Andrii, this version looks good to me. Mark: this change fixes a GPU hang in sandy bridge with geometry shaders (the change itself affects a path in the driver that is only executed in SNB with GS, so nothing else is affected). While I think the change in here is correct according to the PRMs and in fact it seems to fix the GPU hang reported in Bugzilla, since this is tinkering with the way in which we allocate and free VUEs for SNB GS I believe that if this breaks anything it might produce a GPU hang and in that case I would rather not hang Jenkins for everyone else until you have a chance to restore it, so in order to minimize that risk, could you run this through Jenkins when you are available? If that is inconvenient for you just let me know and I will send it myself late in my day on Monday to minimize the risk. Thanks, Iago On Fri, 2018-06-22 at 10:59 +0300, Andrii Simiklit wrote: > We can not use the VUE Dereference flags combination for EOT > message under ILK and SNB because the threads are not initialized > there with initial VUE handle unlike Pre-IL. > So to avoid GPU hangs on SNB and ILK we need > to avoid usage of the VUE Dereference flags combination. > (Was tested only on SNB but according to the specification > SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 > the ILK must behave itself in the similar way) > > v2: Approach to fix this issue was changed. > Instead of different EOT flags in the program end > we will create VUE every time even if GS produces no output. > > v3: Clean up the patch. > Signed-off-by: Andrii Simiklit > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 > > --- > src/intel/compiler/gen6_gs_visitor.cpp | 42 +--- > -- > 1 file changed, 21 insertions(+), 21 deletions(-) > > diff --git a/src/intel/compiler/gen6_gs_visitor.cpp > b/src/intel/compiler/gen6_gs_visitor.cpp > index 66c69fb..c9571cc 100644 > --- a/src/intel/compiler/gen6_gs_visitor.cpp > +++ b/src/intel/compiler/gen6_gs_visitor.cpp > @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() > int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); > > /* Issue the FF_SYNC message and obtain the initial VUE handle. > */ > + this->current_annotation = "gen6 thread end: ff_sync"; > + > + vec4_instruction *inst = NULL; > + if (prog->info.has_transform_feedback_varyings) { > + src_reg sol_temp(this, glsl_type::uvec4_type); > + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, > + dst_reg(this->svbi), > + this->vertex_count, > + this->prim_count, > + sol_temp); > + inst = emit(GS_OPCODE_FF_SYNC, > + dst_reg(this->temp), this->prim_count, this- > >svbi); > + } else { > + inst = emit(GS_OPCODE_FF_SYNC, > + dst_reg(this->temp), this->prim_count, > brw_imm_ud(0u)); > + } > + inst->base_mrf = base_mrf; > + > emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), > BRW_CONDITIONAL_G)); > emit(IF(BRW_PREDICATE_NORMAL)); > { > - this->current_annotation = "gen6 thread end: ff_sync"; > - > - vec4_instruction *inst; > - if (prog->info.has_transform_feedback_varyings) { > - src_reg sol_temp(this, glsl_type::uvec4_type); > - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, > - dst_reg(this->svbi), > - this->vertex_count, > - this->prim_count, > - sol_temp); > - inst = emit(GS_OPCODE_FF_SYNC, > - dst_reg(this->temp), this->prim_count, this- > >svbi); > - } else { > - inst = emit(GS_OPCODE_FF_SYNC, > - dst_reg(this->temp), this->prim_count, > brw_imm_ud(0u)); > - } > - inst->base_mrf = base_mrf; > - >/* Loop over all buffered vertices and emit URB write messages > */ >this->current_annotation = "gen6 thread end: urb writes init"; >src_reg vertex(this, glsl_type::uint_type); > @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() > dst_reg reg = dst_reg(MRF, mrf); > reg.type = output_reg[varying][0].type; > data.type = reg.type; > - vec4_instruction *inst = emit(MOV(reg, data)); > + inst = emit(MOV(reg, data)); > inst->force_writemask_all = true; > > mrf++; > @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end() > * > * However, this would lead us to end the program with an ENDIF > opcode, > * which we want to avoid, so what we do is that we always > request a new > -* VUE handle every time we do a URB WRITE, even for the last > vertex we emit. > +* VUE handle every time, even if GS produces no output. > * With this we make sure that whether we have emitted at least > one vertex > * or none at all, we have to finish the thread without writing > to the URB, > * which works for both cases by setting the COMPLETE and UNUSED > flags in > @@ -476,7 +476,7 @@
[Mesa-dev] [PATCH v3] i965/gen6/gs: Handle case where a GS doesn't allocate VUE
We can not use the VUE Dereference flags combination for EOT message under ILK and SNB because the threads are not initialized there with initial VUE handle unlike Pre-IL. So to avoid GPU hangs on SNB and ILK we need to avoid usage of the VUE Dereference flags combination. (Was tested only on SNB but according to the specification SNB Volume 2 Part 1: 1.6.5.3, 1.6.5.6 the ILK must behave itself in the similar way) v2: Approach to fix this issue was changed. Instead of different EOT flags in the program end we will create VUE every time even if GS produces no output. v3: Clean up the patch. Signed-off-by: Andrii Simiklit Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105399 --- src/intel/compiler/gen6_gs_visitor.cpp | 42 +- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/intel/compiler/gen6_gs_visitor.cpp b/src/intel/compiler/gen6_gs_visitor.cpp index 66c69fb..c9571cc 100644 --- a/src/intel/compiler/gen6_gs_visitor.cpp +++ b/src/intel/compiler/gen6_gs_visitor.cpp @@ -350,27 +350,27 @@ gen6_gs_visitor::emit_thread_end() int max_usable_mrf = FIRST_SPILL_MRF(devinfo->gen); /* Issue the FF_SYNC message and obtain the initial VUE handle. */ + this->current_annotation = "gen6 thread end: ff_sync"; + + vec4_instruction *inst = NULL; + if (prog->info.has_transform_feedback_varyings) { + src_reg sol_temp(this, glsl_type::uvec4_type); + emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, + dst_reg(this->svbi), + this->vertex_count, + this->prim_count, + sol_temp); + inst = emit(GS_OPCODE_FF_SYNC, + dst_reg(this->temp), this->prim_count, this->svbi); + } else { + inst = emit(GS_OPCODE_FF_SYNC, + dst_reg(this->temp), this->prim_count, brw_imm_ud(0u)); + } + inst->base_mrf = base_mrf; + emit(CMP(dst_null_ud(), this->vertex_count, brw_imm_ud(0u), BRW_CONDITIONAL_G)); emit(IF(BRW_PREDICATE_NORMAL)); { - this->current_annotation = "gen6 thread end: ff_sync"; - - vec4_instruction *inst; - if (prog->info.has_transform_feedback_varyings) { - src_reg sol_temp(this, glsl_type::uvec4_type); - emit(GS_OPCODE_FF_SYNC_SET_PRIMITIVES, - dst_reg(this->svbi), - this->vertex_count, - this->prim_count, - sol_temp); - inst = emit(GS_OPCODE_FF_SYNC, - dst_reg(this->temp), this->prim_count, this->svbi); - } else { - inst = emit(GS_OPCODE_FF_SYNC, - dst_reg(this->temp), this->prim_count, brw_imm_ud(0u)); - } - inst->base_mrf = base_mrf; - /* Loop over all buffered vertices and emit URB write messages */ this->current_annotation = "gen6 thread end: urb writes init"; src_reg vertex(this, glsl_type::uint_type); @@ -414,7 +414,7 @@ gen6_gs_visitor::emit_thread_end() dst_reg reg = dst_reg(MRF, mrf); reg.type = output_reg[varying][0].type; data.type = reg.type; - vec4_instruction *inst = emit(MOV(reg, data)); + inst = emit(MOV(reg, data)); inst->force_writemask_all = true; mrf++; @@ -460,7 +460,7 @@ gen6_gs_visitor::emit_thread_end() * * However, this would lead us to end the program with an ENDIF opcode, * which we want to avoid, so what we do is that we always request a new -* VUE handle every time we do a URB WRITE, even for the last vertex we emit. +* VUE handle every time, even if GS produces no output. * With this we make sure that whether we have emitted at least one vertex * or none at all, we have to finish the thread without writing to the URB, * which works for both cases by setting the COMPLETE and UNUSED flags in @@ -476,7 +476,7 @@ gen6_gs_visitor::emit_thread_end() emit(GS_OPCODE_SET_DWORD_2, dst_reg(MRF, base_mrf), data); } - vec4_instruction *inst = emit(GS_OPCODE_THREAD_END); + inst = emit(GS_OPCODE_THREAD_END); inst->urb_write_flags = BRW_URB_WRITE_COMPLETE | BRW_URB_WRITE_UNUSED; inst->base_mrf = base_mrf; inst->mlen = 1; -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev