Re: [Mesa-dev] [Mesa-stable] [PATCH v3 14/48] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-12-07 Thread Francisco Jerez
Connor Abbott  writes:

> On Thu, Dec 7, 2017 at 5:30 PM, Francisco Jerez  wrote:
>> Jason Ekstrand  writes:
>>
>>> On Fri, Oct 27, 2017 at 5:24 PM, Francisco Jerez 
>>> wrote:
>>>
 I've submitted an alternative to this approach here [1], more along the
 lines of the v1 patch you borrowed from my jenkins branch.  Most of the
 reasons for that we have discussed already in the office (except for the
 last point), but FTR:

  - This patch only works around the problem by tweaking the live
intervals, which makes the live intervals inconsistent with the block
live sets (the latter are used by some back-end passes that this
patch isn't going to help with, like DCE and scheduling).

>>>
>>> I'm not sure that's strictly true.  It changes the mapping function from
>>> the live sets to the liveness intervals but it's still directly computable
>>> from live sets + cfg.  That said, there is something to be said for
>>> treating back-edges specially when computing the live sets.
>>>
>>
>> The thing is a number of back-end passes derive their liveness
>> information based on the live sets rather than the live intervals.  Take
>> the scheduler as an example -- After this change variables whose live
>> range exits a loop increase the register pressure at the beginning of
>> the loop, but the scheduler won't know because it obtains that
>> information out of the live sets, so its register pressure decisions
>> become inaccurate with this patch (unless a similar hack is implemented
>> as we have right now in the scheduler to extend live sets for variables
>> whose live range crosses through an else block).
>
> Actually, this isn't true -- I made the scheduler "fill in" its
> liveness sets to match the liveness intervals used by the register
> allocator, i.e. it already does your "hack":
>
> https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/compiler/brw_schedule_instructions.cpp#n646
>
> If it didn't do that, its results would be inaccurate, which would
> defeat the purpose of tracking register pressure in the first place.
> Not that it matters much now, though, since my patches to do limit
> scheduling never got merged.
>

Yes, I'm aware that such a thing exists, but I don't think it will be
sufficient to get the scheduler's live sets in sync with the live
intervals calculated here with this patch in place, that's what I meant
to say earlier.  The reason for that is that the live intervals extended
by this patch won't overlap the end IP of the block immediately before
the loop, so the scheduler's workaround will probably not kick in at the
top block of the loop.

>>
>>>
  - This patch doesn't fix the same bug on the VEC4 back-end.

  - I feel that the alternative approach [1] admits more straightforward
extension to more exotic control-flow constructs.

  - It doesn't actually fix the problem for all possible patterns of
divergent loop execution.  Consider the following GLSL pseudocode:

| b = 0;
| do {
|use(b);
|b = non_uniform_condition();
|if (b)
|   continue;
|
|stomp_neighboring_channels();
|break;
|
| } while(true);

With this series applied, the live interval of 'b' will be calculated
to be between the 'b = 0' and 'continue' statements, which allows
stomp_neighboring_channels() to overwrite the contents of 'b'
depending on how the register allocator lays things out in the
register file.  OTOH with [1] applied the live interval calculated
for 'b' will be the region between the 'b = 0' and 'break'
statements, which overlaps the whole region of divergent control flow
as expected.

>>>
>>> It does handle that case correctly before your patch to add defin/out but
>>> not after. :-)  This is because we always propagate livein from the top of
>>> the loop into the liveout of the while so anything livein at the top
>>> automatically gets extended down.  We do not, however, propagate defout
>>> that way (since the while is not reachable) so your patch breaks it.
>>>
>>
>> Probably, but the main use of this patch is fixing a bug that comes out
>> to surface as a side effect of my PATCH 15 of this series, so it better
>> work correctly in combination with the patch it's preparing the compiler
>> for ;)
>>
>>>
 Cheers.

 [1] https://lists.freedesktop.org/archives/mesa-dev/2017-
 October/174591.html

 Jason Ekstrand  writes:

 > No Shader-db changes.
 >
 > Cc: mesa-sta...@lists.freedesktop.org
 > ---
 >  src/intel/compiler/brw_fs_live_variables.cpp | 55
 
 >  1 file changed, 55 insertions(+)
 >
 > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp

Re: [Mesa-dev] [Mesa-stable] [PATCH v3 14/48] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-12-07 Thread Connor Abbott
On Thu, Dec 7, 2017 at 5:30 PM, Francisco Jerez  wrote:
> Jason Ekstrand  writes:
>
>> On Fri, Oct 27, 2017 at 5:24 PM, Francisco Jerez 
>> wrote:
>>
>>> I've submitted an alternative to this approach here [1], more along the
>>> lines of the v1 patch you borrowed from my jenkins branch.  Most of the
>>> reasons for that we have discussed already in the office (except for the
>>> last point), but FTR:
>>>
>>>  - This patch only works around the problem by tweaking the live
>>>intervals, which makes the live intervals inconsistent with the block
>>>live sets (the latter are used by some back-end passes that this
>>>patch isn't going to help with, like DCE and scheduling).
>>>
>>
>> I'm not sure that's strictly true.  It changes the mapping function from
>> the live sets to the liveness intervals but it's still directly computable
>> from live sets + cfg.  That said, there is something to be said for
>> treating back-edges specially when computing the live sets.
>>
>
> The thing is a number of back-end passes derive their liveness
> information based on the live sets rather than the live intervals.  Take
> the scheduler as an example -- After this change variables whose live
> range exits a loop increase the register pressure at the beginning of
> the loop, but the scheduler won't know because it obtains that
> information out of the live sets, so its register pressure decisions
> become inaccurate with this patch (unless a similar hack is implemented
> as we have right now in the scheduler to extend live sets for variables
> whose live range crosses through an else block).

Actually, this isn't true -- I made the scheduler "fill in" its
liveness sets to match the liveness intervals used by the register
allocator, i.e. it already does your "hack":

https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/compiler/brw_schedule_instructions.cpp#n646

If it didn't do that, its results would be inaccurate, which would
defeat the purpose of tracking register pressure in the first place.
Not that it matters much now, though, since my patches to do limit
scheduling never got merged.

>
>>
>>>  - This patch doesn't fix the same bug on the VEC4 back-end.
>>>
>>>  - I feel that the alternative approach [1] admits more straightforward
>>>extension to more exotic control-flow constructs.
>>>
>>>  - It doesn't actually fix the problem for all possible patterns of
>>>divergent loop execution.  Consider the following GLSL pseudocode:
>>>
>>>| b = 0;
>>>| do {
>>>|use(b);
>>>|b = non_uniform_condition();
>>>|if (b)
>>>|   continue;
>>>|
>>>|stomp_neighboring_channels();
>>>|break;
>>>|
>>>| } while(true);
>>>
>>>With this series applied, the live interval of 'b' will be calculated
>>>to be between the 'b = 0' and 'continue' statements, which allows
>>>stomp_neighboring_channels() to overwrite the contents of 'b'
>>>depending on how the register allocator lays things out in the
>>>register file.  OTOH with [1] applied the live interval calculated
>>>for 'b' will be the region between the 'b = 0' and 'break'
>>>statements, which overlaps the whole region of divergent control flow
>>>as expected.
>>>
>>
>> It does handle that case correctly before your patch to add defin/out but
>> not after. :-)  This is because we always propagate livein from the top of
>> the loop into the liveout of the while so anything livein at the top
>> automatically gets extended down.  We do not, however, propagate defout
>> that way (since the while is not reachable) so your patch breaks it.
>>
>
> Probably, but the main use of this patch is fixing a bug that comes out
> to surface as a side effect of my PATCH 15 of this series, so it better
> work correctly in combination with the patch it's preparing the compiler
> for ;)
>
>>
>>> Cheers.
>>>
>>> [1] https://lists.freedesktop.org/archives/mesa-dev/2017-
>>> October/174591.html
>>>
>>> Jason Ekstrand  writes:
>>>
>>> > No Shader-db changes.
>>> >
>>> > Cc: mesa-sta...@lists.freedesktop.org
>>> > ---
>>> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
>>> 
>>> >  1 file changed, 55 insertions(+)
>>> >
>>> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
>>> b/src/intel/compiler/brw_fs_live_variables.cpp
>>> > index c449672..380060d 100644
>>> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
>>> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>>> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>>> >   }
>>> >}
>>> > }
>>> > +
>>> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
>>> to be a
>>> > +* bit more careful with live ranges and loops.  Consider the
>>> following
>>> > +* example:
>>> > +*
>>> > +*vec4 color2;
>>> > +*while (1) {
>>> > +*   

Re: [Mesa-dev] [Mesa-stable] [PATCH v3 14/48] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-12-07 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Fri, Oct 27, 2017 at 5:24 PM, Francisco Jerez 
> wrote:
>
>> I've submitted an alternative to this approach here [1], more along the
>> lines of the v1 patch you borrowed from my jenkins branch.  Most of the
>> reasons for that we have discussed already in the office (except for the
>> last point), but FTR:
>>
>>  - This patch only works around the problem by tweaking the live
>>intervals, which makes the live intervals inconsistent with the block
>>live sets (the latter are used by some back-end passes that this
>>patch isn't going to help with, like DCE and scheduling).
>>
>
> I'm not sure that's strictly true.  It changes the mapping function from
> the live sets to the liveness intervals but it's still directly computable
> from live sets + cfg.  That said, there is something to be said for
> treating back-edges specially when computing the live sets.
>

The thing is a number of back-end passes derive their liveness
information based on the live sets rather than the live intervals.  Take
the scheduler as an example -- After this change variables whose live
range exits a loop increase the register pressure at the beginning of
the loop, but the scheduler won't know because it obtains that
information out of the live sets, so its register pressure decisions
become inaccurate with this patch (unless a similar hack is implemented
as we have right now in the scheduler to extend live sets for variables
whose live range crosses through an else block).

>
>>  - This patch doesn't fix the same bug on the VEC4 back-end.
>>
>>  - I feel that the alternative approach [1] admits more straightforward
>>extension to more exotic control-flow constructs.
>>
>>  - It doesn't actually fix the problem for all possible patterns of
>>divergent loop execution.  Consider the following GLSL pseudocode:
>>
>>| b = 0;
>>| do {
>>|use(b);
>>|b = non_uniform_condition();
>>|if (b)
>>|   continue;
>>|
>>|stomp_neighboring_channels();
>>|break;
>>|
>>| } while(true);
>>
>>With this series applied, the live interval of 'b' will be calculated
>>to be between the 'b = 0' and 'continue' statements, which allows
>>stomp_neighboring_channels() to overwrite the contents of 'b'
>>depending on how the register allocator lays things out in the
>>register file.  OTOH with [1] applied the live interval calculated
>>for 'b' will be the region between the 'b = 0' and 'break'
>>statements, which overlaps the whole region of divergent control flow
>>as expected.
>>
>
> It does handle that case correctly before your patch to add defin/out but
> not after. :-)  This is because we always propagate livein from the top of
> the loop into the liveout of the while so anything livein at the top
> automatically gets extended down.  We do not, however, propagate defout
> that way (since the while is not reachable) so your patch breaks it.
>

Probably, but the main use of this patch is fixing a bug that comes out
to surface as a side effect of my PATCH 15 of this series, so it better
work correctly in combination with the patch it's preparing the compiler
for ;)

>
>> Cheers.
>>
>> [1] https://lists.freedesktop.org/archives/mesa-dev/2017-
>> October/174591.html
>>
>> Jason Ekstrand  writes:
>>
>> > No Shader-db changes.
>> >
>> > Cc: mesa-sta...@lists.freedesktop.org
>> > ---
>> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
>> 
>> >  1 file changed, 55 insertions(+)
>> >
>> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
>> b/src/intel/compiler/brw_fs_live_variables.cpp
>> > index c449672..380060d 100644
>> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
>> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
>> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>> >   }
>> >}
>> > }
>> > +
>> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
>> to be a
>> > +* bit more careful with live ranges and loops.  Consider the
>> following
>> > +* example:
>> > +*
>> > +*vec4 color2;
>> > +*while (1) {
>> > +*   vec4 color = texture();
>> > +*   if (...) {
>> > +*  color2 = color * 2;
>> > +*  break;
>> > +*   }
>> > +*}
>> > +*gl_FragColor = color2;
>> > +*
>> > +* In this case, the definition of color2 dominates the use because
>> the
>> > +* loop only has the one exit.  This means that the live range
>> interval for
>> > +* color2 goes from the statement in the if to it's use below the
>> loop.
>> > +* Now suppose that the texture operation has a header register that
>> gets
>> > +* assigned one of the registers used for color2.  If the loop
>> condition is
>> > +* non-uniform and some of the threads will take the and others 

Re: [Mesa-dev] [Mesa-stable] [PATCH v3 14/48] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-12-07 Thread Jason Ekstrand
On Fri, Oct 27, 2017 at 5:24 PM, Francisco Jerez 
wrote:

> I've submitted an alternative to this approach here [1], more along the
> lines of the v1 patch you borrowed from my jenkins branch.  Most of the
> reasons for that we have discussed already in the office (except for the
> last point), but FTR:
>
>  - This patch only works around the problem by tweaking the live
>intervals, which makes the live intervals inconsistent with the block
>live sets (the latter are used by some back-end passes that this
>patch isn't going to help with, like DCE and scheduling).
>

I'm not sure that's strictly true.  It changes the mapping function from
the live sets to the liveness intervals but it's still directly computable
from live sets + cfg.  That said, there is something to be said for
treating back-edges specially when computing the live sets.


>  - This patch doesn't fix the same bug on the VEC4 back-end.
>
>  - I feel that the alternative approach [1] admits more straightforward
>extension to more exotic control-flow constructs.
>
>  - It doesn't actually fix the problem for all possible patterns of
>divergent loop execution.  Consider the following GLSL pseudocode:
>
>| b = 0;
>| do {
>|use(b);
>|b = non_uniform_condition();
>|if (b)
>|   continue;
>|
>|stomp_neighboring_channels();
>|break;
>|
>| } while(true);
>
>With this series applied, the live interval of 'b' will be calculated
>to be between the 'b = 0' and 'continue' statements, which allows
>stomp_neighboring_channels() to overwrite the contents of 'b'
>depending on how the register allocator lays things out in the
>register file.  OTOH with [1] applied the live interval calculated
>for 'b' will be the region between the 'b = 0' and 'break'
>statements, which overlaps the whole region of divergent control flow
>as expected.
>

It does handle that case correctly before your patch to add defin/out but
not after. :-)  This is because we always propagate livein from the top of
the loop into the liveout of the while so anything livein at the top
automatically gets extended down.  We do not, however, propagate defout
that way (since the while is not reachable) so your patch breaks it.


> Cheers.
>
> [1] https://lists.freedesktop.org/archives/mesa-dev/2017-
> October/174591.html
>
> Jason Ekstrand  writes:
>
> > No Shader-db changes.
> >
> > Cc: mesa-sta...@lists.freedesktop.org
> > ---
> >  src/intel/compiler/brw_fs_live_variables.cpp | 55
> 
> >  1 file changed, 55 insertions(+)
> >
> > diff --git a/src/intel/compiler/brw_fs_live_variables.cpp
> b/src/intel/compiler/brw_fs_live_variables.cpp
> > index c449672..380060d 100644
> > --- a/src/intel/compiler/brw_fs_live_variables.cpp
> > +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> > @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
> >   }
> >}
> > }
> > +
> > +   /* Due to the explicit way the SIMD data is handled on GEN, we need
> to be a
> > +* bit more careful with live ranges and loops.  Consider the
> following
> > +* example:
> > +*
> > +*vec4 color2;
> > +*while (1) {
> > +*   vec4 color = texture();
> > +*   if (...) {
> > +*  color2 = color * 2;
> > +*  break;
> > +*   }
> > +*}
> > +*gl_FragColor = color2;
> > +*
> > +* In this case, the definition of color2 dominates the use because
> the
> > +* loop only has the one exit.  This means that the live range
> interval for
> > +* color2 goes from the statement in the if to it's use below the
> loop.
> > +* Now suppose that the texture operation has a header register that
> gets
> > +* assigned one of the registers used for color2.  If the loop
> condition is
> > +* non-uniform and some of the threads will take the and others will
> > +* continue.  In this case, the next pass through the loop, the
> WE_all
> > +* setup of the header register will stomp the disabled channels of
> color2
> > +* and corrupt the value.
> > +*
> > +* This same problem can occur if you have a mix of 64, 32, and
> 16-bit
> > +* registers because the channels do not line up or if you have a
> SIMD16
> > +* program and the first half of one value overlaps the second half
> of the
> > +* other.
> > +*
> > +* To solve this problem, we take any VGRFs whose live ranges cross
> the
> > +* while instruction of a loop and extend their live ranges to the
> top of
> > +* the loop.  This more accurately models the hardware because the
> value in
> > +* the VGRF needs to be carried through subsequent loop iterations
> in order
> > +* to remain valid when we finally do break.
> > +*/
> > +   foreach_block (block, cfg) {
> > +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> > +

Re: [Mesa-dev] [Mesa-stable] [PATCH v3 14/48] i965/fs: Extend the live ranges of VGRFs which leave loops

2017-10-27 Thread Francisco Jerez
I've submitted an alternative to this approach here [1], more along the
lines of the v1 patch you borrowed from my jenkins branch.  Most of the
reasons for that we have discussed already in the office (except for the
last point), but FTR:

 - This patch only works around the problem by tweaking the live
   intervals, which makes the live intervals inconsistent with the block
   live sets (the latter are used by some back-end passes that this
   patch isn't going to help with, like DCE and scheduling).

 - This patch doesn't fix the same bug on the VEC4 back-end.

 - I feel that the alternative approach [1] admits more straightforward
   extension to more exotic control-flow constructs.

 - It doesn't actually fix the problem for all possible patterns of
   divergent loop execution.  Consider the following GLSL pseudocode:

   | b = 0;
   | do {
   |use(b);
   |b = non_uniform_condition();
   |if (b)
   |   continue;
   | 
   |stomp_neighboring_channels();
   |break;
   | 
   | } while(true);

   With this series applied, the live interval of 'b' will be calculated
   to be between the 'b = 0' and 'continue' statements, which allows
   stomp_neighboring_channels() to overwrite the contents of 'b'
   depending on how the register allocator lays things out in the
   register file.  OTOH with [1] applied the live interval calculated
   for 'b' will be the region between the 'b = 0' and 'break'
   statements, which overlaps the whole region of divergent control flow
   as expected.

Cheers.

[1] https://lists.freedesktop.org/archives/mesa-dev/2017-October/174591.html

Jason Ekstrand  writes:

> No Shader-db changes.
>
> Cc: mesa-sta...@lists.freedesktop.org
> ---
>  src/intel/compiler/brw_fs_live_variables.cpp | 55 
> 
>  1 file changed, 55 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_live_variables.cpp 
> b/src/intel/compiler/brw_fs_live_variables.cpp
> index c449672..380060d 100644
> --- a/src/intel/compiler/brw_fs_live_variables.cpp
> +++ b/src/intel/compiler/brw_fs_live_variables.cpp
> @@ -223,6 +223,61 @@ fs_live_variables::compute_start_end()
>   }
>}
> }
> +
> +   /* Due to the explicit way the SIMD data is handled on GEN, we need to be 
> a
> +* bit more careful with live ranges and loops.  Consider the following
> +* example:
> +*
> +*vec4 color2;
> +*while (1) {
> +*   vec4 color = texture();
> +*   if (...) {
> +*  color2 = color * 2;
> +*  break;
> +*   }
> +*}
> +*gl_FragColor = color2;
> +*
> +* In this case, the definition of color2 dominates the use because the
> +* loop only has the one exit.  This means that the live range interval 
> for
> +* color2 goes from the statement in the if to it's use below the loop.
> +* Now suppose that the texture operation has a header register that gets
> +* assigned one of the registers used for color2.  If the loop condition 
> is
> +* non-uniform and some of the threads will take the and others will
> +* continue.  In this case, the next pass through the loop, the WE_all
> +* setup of the header register will stomp the disabled channels of color2
> +* and corrupt the value.
> +*
> +* This same problem can occur if you have a mix of 64, 32, and 16-bit
> +* registers because the channels do not line up or if you have a SIMD16
> +* program and the first half of one value overlaps the second half of the
> +* other.
> +*
> +* To solve this problem, we take any VGRFs whose live ranges cross the
> +* while instruction of a loop and extend their live ranges to the top of
> +* the loop.  This more accurately models the hardware because the value 
> in
> +* the VGRF needs to be carried through subsequent loop iterations in 
> order
> +* to remain valid when we finally do break.
> +*/
> +   foreach_block (block, cfg) {
> +  if (block->end()->opcode != BRW_OPCODE_WHILE)
> + continue;
> +
> +  /* This is a WHILE instrution. Find the DO block. */
> +  bblock_t *do_block = NULL;
> +  foreach_list_typed(bblock_link, child_link, link, >children) {
> + if (child_link->block->start_ip < block->end_ip) {
> +assert(do_block == NULL);
> +do_block = child_link->block;
> + }
> +  }
> +  assert(do_block);
> +
> +  for (int i = 0; i < num_vars; i++) {
> + if (start[i] < block->end_ip && end[i] > block->end_ip)
> +start[i] = MIN2(start[i], do_block->start_ip);
> +  }
> +   }
>  }
>  
>  fs_live_variables::fs_live_variables(fs_visitor *v, const cfg_t *cfg)
> -- 
> 2.5.0.400.gff86faf
>
> ___
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable


signature.asc
Description: PGP signature