Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-11 Thread Eero Tamminen

Hi,

On 06.07.2018 00:28, Jason Ekstrand wrote:
On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand > wrote:

[...]

>> Optimizing for the latter case is an essentially
>> heuristic assumption that needs to be verified experimentally.  Have 
you
>> tested the effect of this pass on non-DX workloads extensively?
>>
>
> Yes, it is a trade-off.  No, I have not done particularly extensive
> testing.  We do, however, know of non-DXVK workloads that would 
benefit
> from this.  I believe Manhattan is one such example though I have not 
yet
> benchmarked it.
>

You should grab some numbers then to make sure there are no
regressions...


I'm working on that.  Unfortunately the perf system is giving me
trouble so I don't have the numbers yet.

But keep in mind that the i965 scheduler is already
performing a similar optimization (locally, but with cycle-count
information).  This will only help over the existing
optimization if the
shaders that represent a bottleneck in Manhattan have sufficient
control
flow for the basic block boundaries to represent a problem to the
(local) scheduler.


I'm not sure about the manhattan shader but the Skyrim shader does
have control flow which the discard has to get moved above.


I have results from the perf system now and somehow this pass makes 
manhattan noticeably worse.  I'll look into that.


Note: All the more complex GfxBench tests use discard:
Egypt, T-Rex, Manhattan, CarChase, AztecRuins...

(Most of the other benchmarks we're running, don't use them.)


- Eero
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-05 Thread Jason Ekstrand
On Thu, Jul 5, 2018 at 2:18 PM, Jason Ekstrand  wrote:

> On Thu, Jul 5, 2018 at 11:03 AM, Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
>> > wrote:
>> >
>> >> Jason Ekstrand  writes:
>> >>
>> >> > Many fragment shaders do a discard using relatively little
>> information
>> >> > but still put the discard fairly far down in the shader for no good
>> >> > reason.  If the discard is moved higher up, we can possibly avoid
>> doing
>> >> > some or almost all of the work in the shader.  When this lets us skip
>> >> > texturing operations, it's an especially high win.
>> >> >
>> >> > One of the biggest offenders here is DXVK.  The D3D APIs have
>> different
>> >> > rules for discards than OpenGL and Vulkan.  One effective way (which
>> is
>> >> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
>> >> > wait until the very end of the shader to discard.  This ends up in
>> the
>> >> > pessimal case where we always do all of the work before discarding.
>> >> > This pass helps some DXVK shaders significantly.
>> >> >
>> >>
>> >> One thing to keep in mind is that this sort of transformation is
>> trading
>> >> off run-time of fragment shader invocations that don't call discard (or
>> >> do so non-uniformly, which means that the code the discard jump is
>> >> protecting will be executed anyway, so doing this can actually increase
>> >> the critical path of the program) in favour of invocations that call
>> >> discard uniformly (so executing discard early will effectively
>> terminate
>> >> the program early).
>> >
>> >
>> > It's not really a uniform vs. non-uniform thing.  Even if a shader only
>> > discards some of the fragments, it sill reduces the number of live
>> channels
>> > which reduces the cost of later non-uniform control-flow.
>> >
>>
>> Which only helps if the shader's control flow is sufficiently
>> non-uniform that the additional cost from performing those computations
>> early pays off -- Or not at all if the discarded fragments need to be
>> executed (non-compliantly) anyway in order to provide
>> derivatives_safe_after_discard.  However, if the discard condition is
>> uniform (across a warp), the thread can be terminated early by the
>> back-end most certainly, which gives you the maximum pay-off.  Uniform
>> discard conditions are therefore the best-case scenario for this
>> optimization pass.
>>
>
> Yes, that is correct.  Fortunately, things that discard tend to discard
> fairly large chunks of the polygon at one time so this case is fairly
> common.
>
>
>> >
>> >> Optimizing for the latter case is an essentially
>> >> heuristic assumption that needs to be verified experimentally.  Have
>> you
>> >> tested the effect of this pass on non-DX workloads extensively?
>> >>
>> >
>> > Yes, it is a trade-off.  No, I have not done particularly extensive
>> > testing.  We do, however, know of non-DXVK workloads that would benefit
>> > from this.  I believe Manhattan is one such example though I have not
>> yet
>> > benchmarked it.
>> >
>>
>> You should grab some numbers then to make sure there are no
>> regressions...
>
>
> I'm working on that.  Unfortunately the perf system is giving me trouble
> so I don't have the numbers yet.
>
>
>> But keep in mind that the i965 scheduler is already
>> performing a similar optimization (locally, but with cycle-count
>> information).  This will only help over the existing optimization if the
>> shaders that represent a bottleneck in Manhattan have sufficient control
>> flow for the basic block boundaries to represent a problem to the
>> (local) scheduler.
>>
>
> I'm not sure about the manhattan shader but the Skyrim shader does have
> control flow which the discard has to get moved above.
>

I have results from the perf system now and somehow this pass makes
manhattan noticeably worse.  I'll look into that.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-05 Thread Jason Ekstrand
On Thu, Jul 5, 2018 at 11:03 AM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
> > wrote:
> >
> >> Jason Ekstrand  writes:
> >>
> >> > Many fragment shaders do a discard using relatively little information
> >> > but still put the discard fairly far down in the shader for no good
> >> > reason.  If the discard is moved higher up, we can possibly avoid
> doing
> >> > some or almost all of the work in the shader.  When this lets us skip
> >> > texturing operations, it's an especially high win.
> >> >
> >> > One of the biggest offenders here is DXVK.  The D3D APIs have
> different
> >> > rules for discards than OpenGL and Vulkan.  One effective way (which
> is
> >> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
> >> > wait until the very end of the shader to discard.  This ends up in the
> >> > pessimal case where we always do all of the work before discarding.
> >> > This pass helps some DXVK shaders significantly.
> >> >
> >>
> >> One thing to keep in mind is that this sort of transformation is trading
> >> off run-time of fragment shader invocations that don't call discard (or
> >> do so non-uniformly, which means that the code the discard jump is
> >> protecting will be executed anyway, so doing this can actually increase
> >> the critical path of the program) in favour of invocations that call
> >> discard uniformly (so executing discard early will effectively terminate
> >> the program early).
> >
> >
> > It's not really a uniform vs. non-uniform thing.  Even if a shader only
> > discards some of the fragments, it sill reduces the number of live
> channels
> > which reduces the cost of later non-uniform control-flow.
> >
>
> Which only helps if the shader's control flow is sufficiently
> non-uniform that the additional cost from performing those computations
> early pays off -- Or not at all if the discarded fragments need to be
> executed (non-compliantly) anyway in order to provide
> derivatives_safe_after_discard.  However, if the discard condition is
> uniform (across a warp), the thread can be terminated early by the
> back-end most certainly, which gives you the maximum pay-off.  Uniform
> discard conditions are therefore the best-case scenario for this
> optimization pass.
>

Yes, that is correct.  Fortunately, things that discard tend to discard
fairly large chunks of the polygon at one time so this case is fairly
common.


> >
> >> Optimizing for the latter case is an essentially
> >> heuristic assumption that needs to be verified experimentally.  Have you
> >> tested the effect of this pass on non-DX workloads extensively?
> >>
> >
> > Yes, it is a trade-off.  No, I have not done particularly extensive
> > testing.  We do, however, know of non-DXVK workloads that would benefit
> > from this.  I believe Manhattan is one such example though I have not yet
> > benchmarked it.
> >
>
> You should grab some numbers then to make sure there are no
> regressions...


I'm working on that.  Unfortunately the perf system is giving me trouble so
I don't have the numbers yet.


> But keep in mind that the i965 scheduler is already
> performing a similar optimization (locally, but with cycle-count
> information).  This will only help over the existing optimization if the
> shaders that represent a bottleneck in Manhattan have sufficient control
> flow for the basic block boundaries to represent a problem to the
> (local) scheduler.
>

I'm not sure about the manhattan shader but the Skyrim shader does have
control flow which the discard has to get moved above.


> >
> >> > v2 (Jason Ekstrand):
> >> >  - Fix a couple of typos (Grazvydas, Ian)
> >> >  - Use the new nir_instr_move helper
> >> >  - Find all movable discards before moving anything so we don't
> >> >accidentally re-order anything and break dependencies
> >> > ---
> >> >  src/compiler/Makefile.sources  |   1 +
> >> >  src/compiler/nir/meson.build   |   1 +
> >> >  src/compiler/nir/nir.h |  10 +
> >> >  src/compiler/nir/nir_opt_discard.c | 396
> +
> >> >  4 files changed, 408 insertions(+)
> >> >  create mode 100644 src/compiler/nir/nir_opt_discard.c
> >> >
> >> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.
> >> sources
> >> > index 9e3fbdc2612..8600ce81281 100644
> >> > --- a/src/compiler/Makefile.sources
> >> > +++ b/src/compiler/Makefile.sources
> >> > @@ -271,6 +271,7 @@ NIR_FILES = \
> >> >   nir/nir_opt_cse.c \
> >> >   nir/nir_opt_dce.c \
> >> >   nir/nir_opt_dead_cf.c \
> >> > + nir/nir_opt_discard.c \
> >> >   nir/nir_opt_gcm.c \
> >> >   nir/nir_opt_global_to_local.c \
> >> >   nir/nir_opt_if.c \
> >> > diff --git a/src/compiler/nir/meson.build
> b/src/compiler/nir/meson.build
> >> > index 28aa8de7014..e339258bb94 100644
> >> > --- a/src/compiler/nir/meson.build
> >> > +++ b/src/compiler/nir/meson.build
> >> > @@ -156,6 +156,7 @@ files_libnir = files(
> 

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-05 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > Many fragment shaders do a discard using relatively little information
>> > but still put the discard fairly far down in the shader for no good
>> > reason.  If the discard is moved higher up, we can possibly avoid doing
>> > some or almost all of the work in the shader.  When this lets us skip
>> > texturing operations, it's an especially high win.
>> >
>> > One of the biggest offenders here is DXVK.  The D3D APIs have different
>> > rules for discards than OpenGL and Vulkan.  One effective way (which is
>> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
>> > wait until the very end of the shader to discard.  This ends up in the
>> > pessimal case where we always do all of the work before discarding.
>> > This pass helps some DXVK shaders significantly.
>> >
>>
>> One thing to keep in mind is that this sort of transformation is trading
>> off run-time of fragment shader invocations that don't call discard (or
>> do so non-uniformly, which means that the code the discard jump is
>> protecting will be executed anyway, so doing this can actually increase
>> the critical path of the program) in favour of invocations that call
>> discard uniformly (so executing discard early will effectively terminate
>> the program early).
>
>
> It's not really a uniform vs. non-uniform thing.  Even if a shader only
> discards some of the fragments, it sill reduces the number of live channels
> which reduces the cost of later non-uniform control-flow.
>

Which only helps if the shader's control flow is sufficiently
non-uniform that the additional cost from performing those computations
early pays off -- Or not at all if the discarded fragments need to be
executed (non-compliantly) anyway in order to provide
derivatives_safe_after_discard.  However, if the discard condition is
uniform (across a warp), the thread can be terminated early by the
back-end most certainly, which gives you the maximum pay-off.  Uniform
discard conditions are therefore the best-case scenario for this
optimization pass.

>
>> Optimizing for the latter case is an essentially
>> heuristic assumption that needs to be verified experimentally.  Have you
>> tested the effect of this pass on non-DX workloads extensively?
>>
>
> Yes, it is a trade-off.  No, I have not done particularly extensive
> testing.  We do, however, know of non-DXVK workloads that would benefit
> from this.  I believe Manhattan is one such example though I have not yet
> benchmarked it.
>

You should grab some numbers then to make sure there are no
regressions...  But keep in mind that the i965 scheduler is already
performing a similar optimization (locally, but with cycle-count
information).  This will only help over the existing optimization if the
shaders that represent a bottleneck in Manhattan have sufficient control
flow for the basic block boundaries to represent a problem to the
(local) scheduler.

>
>> > v2 (Jason Ekstrand):
>> >  - Fix a couple of typos (Grazvydas, Ian)
>> >  - Use the new nir_instr_move helper
>> >  - Find all movable discards before moving anything so we don't
>> >accidentally re-order anything and break dependencies
>> > ---
>> >  src/compiler/Makefile.sources  |   1 +
>> >  src/compiler/nir/meson.build   |   1 +
>> >  src/compiler/nir/nir.h |  10 +
>> >  src/compiler/nir/nir_opt_discard.c | 396 +
>> >  4 files changed, 408 insertions(+)
>> >  create mode 100644 src/compiler/nir/nir_opt_discard.c
>> >
>> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.
>> sources
>> > index 9e3fbdc2612..8600ce81281 100644
>> > --- a/src/compiler/Makefile.sources
>> > +++ b/src/compiler/Makefile.sources
>> > @@ -271,6 +271,7 @@ NIR_FILES = \
>> >   nir/nir_opt_cse.c \
>> >   nir/nir_opt_dce.c \
>> >   nir/nir_opt_dead_cf.c \
>> > + nir/nir_opt_discard.c \
>> >   nir/nir_opt_gcm.c \
>> >   nir/nir_opt_global_to_local.c \
>> >   nir/nir_opt_if.c \
>> > diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> > index 28aa8de7014..e339258bb94 100644
>> > --- a/src/compiler/nir/meson.build
>> > +++ b/src/compiler/nir/meson.build
>> > @@ -156,6 +156,7 @@ files_libnir = files(
>> >'nir_opt_cse.c',
>> >'nir_opt_dce.c',
>> >'nir_opt_dead_cf.c',
>> > +  'nir_opt_discard.c',
>> >'nir_opt_gcm.c',
>> >'nir_opt_global_to_local.c',
>> >'nir_opt_if.c',
>> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> > index c40a88c8ccc..dac019c17e8 100644
>> > --- a/src/compiler/nir/nir.h
>> > +++ b/src/compiler/nir/nir.h
>> > @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
>> >  */
>> > bool vs_inputs_dual_locations;
>> >
>> > +   /**
>> > +* Whether or not derivatives are still a safe operation after a
>> discard
>> > +* has occurred.  Optimization passes may be 

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Jason Ekstrand

On July 4, 2018 15:35:15 Bas Nieuwenhuizen  wrote:


On Wed, Jul 4, 2018 at 11:00 PM, Jason Ekstrand  wrote:

On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
wrote:


Jason Ekstrand  writes:


Many fragment shaders do a discard using relatively little information
but still put the discard fairly far down in the shader for no good
reason.  If the discard is moved higher up, we can possibly avoid doing
some or almost all of the work in the shader.  When this lets us skip
texturing operations, it's an especially high win.

One of the biggest offenders here is DXVK.  The D3D APIs have different
rules for discards than OpenGL and Vulkan.  One effective way (which is
what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
wait until the very end of the shader to discard.  This ends up in the
pessimal case where we always do all of the work before discarding.
This pass helps some DXVK shaders significantly.


One thing to keep in mind is that this sort of transformation is trading
off run-time of fragment shader invocations that don't call discard (or
do so non-uniformly, which means that the code the discard jump is
protecting will be executed anyway, so doing this can actually increase
the critical path of the program) in favour of invocations that call
discard uniformly (so executing discard early will effectively terminate
the program early).



It's not really a uniform vs. non-uniform thing.  Even if a shader only
discards some of the fragments, it sill reduces the number of live channels
which reduces the cost of later non-uniform control-flow.



Optimizing for the latter case is an essentially
heuristic assumption that needs to be verified experimentally.  Have you
tested the effect of this pass on non-DX workloads extensively?



Yes, it is a trade-off.  No, I have not done particularly extensive testing.
We do, however, know of non-DXVK workloads that would benefit from this.  I
believe Manhattan is one such example though I have not yet benchmarked it.


Out of curiosity, what is the performance trade-off here? What extra
costs could we
get by discarding early?


It's typically not high but there may be some cost from extra stalls from 
less latency hiding.  For instance, if you put the discard right after a 
texture, you can't process any more of the shader until it returns so you 
know whether or not to discard.  Also, you could end up increasing register 
pressure if you end up moving something to the top that's uses both in the 
discard and in something else.  It's the usual set of trade-offs you get 
every time you move instructions made possibly worse by how aggressive this 
pass is in making the only instructions before the discard the ones that 
are explicitly needed.










v2 (Jason Ekstrand):
- Fix a couple of typos (Grazvydas, Ian)
- Use the new nir_instr_move helper
- Find all movable discards before moving anything so we don't
accidentally re-order anything and break dependencies
---
src/compiler/Makefile.sources  |   1 +
src/compiler/nir/meson.build   |   1 +
src/compiler/nir/nir.h |  10 +
src/compiler/nir/nir_opt_discard.c | 396 +
4 files changed, 408 insertions(+)
create mode 100644 src/compiler/nir/nir_opt_discard.c

diff --git a/src/compiler/Makefile.sources
b/src/compiler/Makefile.sources
index 9e3fbdc2612..8600ce81281 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -271,6 +271,7 @@ NIR_FILES = \
nir/nir_opt_cse.c \
nir/nir_opt_dce.c \
nir/nir_opt_dead_cf.c \
+ nir/nir_opt_discard.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_if.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 28aa8de7014..e339258bb94 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -156,6 +156,7 @@ files_libnir = files(
'nir_opt_cse.c',
'nir_opt_dce.c',
'nir_opt_dead_cf.c',
+  'nir_opt_discard.c',
'nir_opt_gcm.c',
'nir_opt_global_to_local.c',
'nir_opt_if.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c40a88c8ccc..dac019c17e8 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
*/
bool vs_inputs_dual_locations;

+   /**
+* Whether or not derivatives are still a safe operation after a
discard
+* has occurred.  Optimization passes may be able to be a bit more
+* agressive if this is true.
+*/
+   bool derivatives_safe_after_discard;
+


It's worth noting in the comment above that any driver that is in
position to enable this option (e.g. i965) is strictly speaking
non-compliant with GLSL and SPIR-V, whether or not this optimization
pass is used.  The reason is that derivatives being safe after a
non-uniform discard implies that any invocations involved in derivative
computations must be executed even though they aren't supposed to
according to the spec, and even though doing so might lead to undefined

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Bas Nieuwenhuizen
On Wed, Jul 4, 2018 at 11:00 PM, Jason Ekstrand  wrote:
> On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
> wrote:
>>
>> Jason Ekstrand  writes:
>>
>> > Many fragment shaders do a discard using relatively little information
>> > but still put the discard fairly far down in the shader for no good
>> > reason.  If the discard is moved higher up, we can possibly avoid doing
>> > some or almost all of the work in the shader.  When this lets us skip
>> > texturing operations, it's an especially high win.
>> >
>> > One of the biggest offenders here is DXVK.  The D3D APIs have different
>> > rules for discards than OpenGL and Vulkan.  One effective way (which is
>> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
>> > wait until the very end of the shader to discard.  This ends up in the
>> > pessimal case where we always do all of the work before discarding.
>> > This pass helps some DXVK shaders significantly.
>> >
>>
>> One thing to keep in mind is that this sort of transformation is trading
>> off run-time of fragment shader invocations that don't call discard (or
>> do so non-uniformly, which means that the code the discard jump is
>> protecting will be executed anyway, so doing this can actually increase
>> the critical path of the program) in favour of invocations that call
>> discard uniformly (so executing discard early will effectively terminate
>> the program early).
>
>
> It's not really a uniform vs. non-uniform thing.  Even if a shader only
> discards some of the fragments, it sill reduces the number of live channels
> which reduces the cost of later non-uniform control-flow.
>
>>
>> Optimizing for the latter case is an essentially
>> heuristic assumption that needs to be verified experimentally.  Have you
>> tested the effect of this pass on non-DX workloads extensively?
>
>
> Yes, it is a trade-off.  No, I have not done particularly extensive testing.
> We do, however, know of non-DXVK workloads that would benefit from this.  I
> believe Manhattan is one such example though I have not yet benchmarked it.

Out of curiosity, what is the performance trade-off here? What extra
costs could we
get by discarding early?

>
>>
>> > v2 (Jason Ekstrand):
>> >  - Fix a couple of typos (Grazvydas, Ian)
>> >  - Use the new nir_instr_move helper
>> >  - Find all movable discards before moving anything so we don't
>> >accidentally re-order anything and break dependencies
>> > ---
>> >  src/compiler/Makefile.sources  |   1 +
>> >  src/compiler/nir/meson.build   |   1 +
>> >  src/compiler/nir/nir.h |  10 +
>> >  src/compiler/nir/nir_opt_discard.c | 396 +
>> >  4 files changed, 408 insertions(+)
>> >  create mode 100644 src/compiler/nir/nir_opt_discard.c
>> >
>> > diff --git a/src/compiler/Makefile.sources
>> > b/src/compiler/Makefile.sources
>> > index 9e3fbdc2612..8600ce81281 100644
>> > --- a/src/compiler/Makefile.sources
>> > +++ b/src/compiler/Makefile.sources
>> > @@ -271,6 +271,7 @@ NIR_FILES = \
>> >   nir/nir_opt_cse.c \
>> >   nir/nir_opt_dce.c \
>> >   nir/nir_opt_dead_cf.c \
>> > + nir/nir_opt_discard.c \
>> >   nir/nir_opt_gcm.c \
>> >   nir/nir_opt_global_to_local.c \
>> >   nir/nir_opt_if.c \
>> > diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
>> > index 28aa8de7014..e339258bb94 100644
>> > --- a/src/compiler/nir/meson.build
>> > +++ b/src/compiler/nir/meson.build
>> > @@ -156,6 +156,7 @@ files_libnir = files(
>> >'nir_opt_cse.c',
>> >'nir_opt_dce.c',
>> >'nir_opt_dead_cf.c',
>> > +  'nir_opt_discard.c',
>> >'nir_opt_gcm.c',
>> >'nir_opt_global_to_local.c',
>> >'nir_opt_if.c',
>> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> > index c40a88c8ccc..dac019c17e8 100644
>> > --- a/src/compiler/nir/nir.h
>> > +++ b/src/compiler/nir/nir.h
>> > @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
>> >  */
>> > bool vs_inputs_dual_locations;
>> >
>> > +   /**
>> > +* Whether or not derivatives are still a safe operation after a
>> > discard
>> > +* has occurred.  Optimization passes may be able to be a bit more
>> > +* agressive if this is true.
>> > +*/
>> > +   bool derivatives_safe_after_discard;
>> > +
>>
>> It's worth noting in the comment above that any driver that is in
>> position to enable this option (e.g. i965) is strictly speaking
>> non-compliant with GLSL and SPIR-V, whether or not this optimization
>> pass is used.  The reason is that derivatives being safe after a
>> non-uniform discard implies that any invocations involved in derivative
>> computations must be executed even though they aren't supposed to
>> according to the spec, and even though doing so might lead to undefined
>> behaviour that wasn't present in the original program, e.g.:
>>
>> | int delta = non_uniform_computation();
>> | if (delta == 0)
>> |  discard;
>> |
>> | for (int i = 0; i < N; i += delta) {
>> |   // 

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Jason Ekstrand
On Wed, Jul 4, 2018 at 1:20 PM, Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > Many fragment shaders do a discard using relatively little information
> > but still put the discard fairly far down in the shader for no good
> > reason.  If the discard is moved higher up, we can possibly avoid doing
> > some or almost all of the work in the shader.  When this lets us skip
> > texturing operations, it's an especially high win.
> >
> > One of the biggest offenders here is DXVK.  The D3D APIs have different
> > rules for discards than OpenGL and Vulkan.  One effective way (which is
> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
> > wait until the very end of the shader to discard.  This ends up in the
> > pessimal case where we always do all of the work before discarding.
> > This pass helps some DXVK shaders significantly.
> >
>
> One thing to keep in mind is that this sort of transformation is trading
> off run-time of fragment shader invocations that don't call discard (or
> do so non-uniformly, which means that the code the discard jump is
> protecting will be executed anyway, so doing this can actually increase
> the critical path of the program) in favour of invocations that call
> discard uniformly (so executing discard early will effectively terminate
> the program early).


It's not really a uniform vs. non-uniform thing.  Even if a shader only
discards some of the fragments, it sill reduces the number of live channels
which reduces the cost of later non-uniform control-flow.


> Optimizing for the latter case is an essentially
> heuristic assumption that needs to be verified experimentally.  Have you
> tested the effect of this pass on non-DX workloads extensively?
>

Yes, it is a trade-off.  No, I have not done particularly extensive
testing.  We do, however, know of non-DXVK workloads that would benefit
from this.  I believe Manhattan is one such example though I have not yet
benchmarked it.


> > v2 (Jason Ekstrand):
> >  - Fix a couple of typos (Grazvydas, Ian)
> >  - Use the new nir_instr_move helper
> >  - Find all movable discards before moving anything so we don't
> >accidentally re-order anything and break dependencies
> > ---
> >  src/compiler/Makefile.sources  |   1 +
> >  src/compiler/nir/meson.build   |   1 +
> >  src/compiler/nir/nir.h |  10 +
> >  src/compiler/nir/nir_opt_discard.c | 396 +
> >  4 files changed, 408 insertions(+)
> >  create mode 100644 src/compiler/nir/nir_opt_discard.c
> >
> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.
> sources
> > index 9e3fbdc2612..8600ce81281 100644
> > --- a/src/compiler/Makefile.sources
> > +++ b/src/compiler/Makefile.sources
> > @@ -271,6 +271,7 @@ NIR_FILES = \
> >   nir/nir_opt_cse.c \
> >   nir/nir_opt_dce.c \
> >   nir/nir_opt_dead_cf.c \
> > + nir/nir_opt_discard.c \
> >   nir/nir_opt_gcm.c \
> >   nir/nir_opt_global_to_local.c \
> >   nir/nir_opt_if.c \
> > diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> > index 28aa8de7014..e339258bb94 100644
> > --- a/src/compiler/nir/meson.build
> > +++ b/src/compiler/nir/meson.build
> > @@ -156,6 +156,7 @@ files_libnir = files(
> >'nir_opt_cse.c',
> >'nir_opt_dce.c',
> >'nir_opt_dead_cf.c',
> > +  'nir_opt_discard.c',
> >'nir_opt_gcm.c',
> >'nir_opt_global_to_local.c',
> >'nir_opt_if.c',
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index c40a88c8ccc..dac019c17e8 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
> >  */
> > bool vs_inputs_dual_locations;
> >
> > +   /**
> > +* Whether or not derivatives are still a safe operation after a
> discard
> > +* has occurred.  Optimization passes may be able to be a bit more
> > +* agressive if this is true.
> > +*/
> > +   bool derivatives_safe_after_discard;
> > +
>
> It's worth noting in the comment above that any driver that is in
> position to enable this option (e.g. i965) is strictly speaking
> non-compliant with GLSL and SPIR-V, whether or not this optimization
> pass is used.  The reason is that derivatives being safe after a
> non-uniform discard implies that any invocations involved in derivative
> computations must be executed even though they aren't supposed to
> according to the spec, and even though doing so might lead to undefined
> behaviour that wasn't present in the original program, e.g.:
>
> | int delta = non_uniform_computation();
> | if (delta == 0)
> |  discard;
> |
> | for (int i = 0; i < N; i += delta) {
> |   // Will loop forever if discarded fragments are incorrectly executed
> |   // by the back-end.
> | }
>
> The above shader is specified to terminate if the semantics of discard
> are as defined by GLSL or SPIRV, but not necessarily as defined by DX.
>

That is an interesting point.  One possible 

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Francisco Jerez
Jason Ekstrand  writes:

> Many fragment shaders do a discard using relatively little information
> but still put the discard fairly far down in the shader for no good
> reason.  If the discard is moved higher up, we can possibly avoid doing
> some or almost all of the work in the shader.  When this lets us skip
> texturing operations, it's an especially high win.
>
> One of the biggest offenders here is DXVK.  The D3D APIs have different
> rules for discards than OpenGL and Vulkan.  One effective way (which is
> what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
> wait until the very end of the shader to discard.  This ends up in the
> pessimal case where we always do all of the work before discarding.
> This pass helps some DXVK shaders significantly.
>

One thing to keep in mind is that this sort of transformation is trading
off run-time of fragment shader invocations that don't call discard (or
do so non-uniformly, which means that the code the discard jump is
protecting will be executed anyway, so doing this can actually increase
the critical path of the program) in favour of invocations that call
discard uniformly (so executing discard early will effectively terminate
the program early).  Optimizing for the latter case is an essentially
heuristic assumption that needs to be verified experimentally.  Have you
tested the effect of this pass on non-DX workloads extensively?

> v2 (Jason Ekstrand):
>  - Fix a couple of typos (Grazvydas, Ian)
>  - Use the new nir_instr_move helper
>  - Find all movable discards before moving anything so we don't
>accidentally re-order anything and break dependencies
> ---
>  src/compiler/Makefile.sources  |   1 +
>  src/compiler/nir/meson.build   |   1 +
>  src/compiler/nir/nir.h |  10 +
>  src/compiler/nir/nir_opt_discard.c | 396 +
>  4 files changed, 408 insertions(+)
>  create mode 100644 src/compiler/nir/nir_opt_discard.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 9e3fbdc2612..8600ce81281 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -271,6 +271,7 @@ NIR_FILES = \
>   nir/nir_opt_cse.c \
>   nir/nir_opt_dce.c \
>   nir/nir_opt_dead_cf.c \
> + nir/nir_opt_discard.c \
>   nir/nir_opt_gcm.c \
>   nir/nir_opt_global_to_local.c \
>   nir/nir_opt_if.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 28aa8de7014..e339258bb94 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -156,6 +156,7 @@ files_libnir = files(
>'nir_opt_cse.c',
>'nir_opt_dce.c',
>'nir_opt_dead_cf.c',
> +  'nir_opt_discard.c',
>'nir_opt_gcm.c',
>'nir_opt_global_to_local.c',
>'nir_opt_if.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index c40a88c8ccc..dac019c17e8 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
>  */
> bool vs_inputs_dual_locations;
>  
> +   /**
> +* Whether or not derivatives are still a safe operation after a discard
> +* has occurred.  Optimization passes may be able to be a bit more
> +* agressive if this is true.
> +*/
> +   bool derivatives_safe_after_discard;
> +

It's worth noting in the comment above that any driver that is in
position to enable this option (e.g. i965) is strictly speaking
non-compliant with GLSL and SPIR-V, whether or not this optimization
pass is used.  The reason is that derivatives being safe after a
non-uniform discard implies that any invocations involved in derivative
computations must be executed even though they aren't supposed to
according to the spec, and even though doing so might lead to undefined
behaviour that wasn't present in the original program, e.g.:

| int delta = non_uniform_computation();
| if (delta == 0)
|  discard;
| 
| for (int i = 0; i < N; i += delta) {
|   // Will loop forever if discarded fragments are incorrectly executed
|   // by the back-end.
| }

The above shader is specified to terminate if the semantics of discard
are as defined by GLSL or SPIRV, but not necessarily as defined by DX.
This makes me think that DXVK is in a privileged position to decide
where the discard jump should end up at, since it can make assumptions
about code lexically after a discard being well-defined even if the
discard condition evaluates to true.  It's unfortunate that it behaves
so suboptimally currently that you need to work around it here.

> unsigned max_unroll_iterations;
>  } nir_shader_compiler_options;
>  
> @@ -2901,6 +2908,9 @@ bool nir_opt_dce(nir_shader *shader);
>  
>  bool nir_opt_dead_cf(nir_shader *shader);
>  
> +bool nir_opt_discard_if(nir_shader *shader);
> +bool nir_opt_move_discards_to_top(nir_shader *shader);
> +
>  bool nir_opt_gcm(nir_shader *shader, bool value_number);
>  
>  bool nir_opt_if(nir_shader 

Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Jason Ekstrand
On Wed, Jul 4, 2018 at 10:00 AM, Matt Turner  wrote:

> On Wed, Jul 4, 2018 at 9:59 AM Jason Ekstrand 
> wrote:
> >
> > Many fragment shaders do a discard using relatively little information
> > but still put the discard fairly far down in the shader for no good
> > reason.  If the discard is moved higher up, we can possibly avoid doing
> > some or almost all of the work in the shader.  When this lets us skip
> > texturing operations, it's an especially high win.
> >
> > One of the biggest offenders here is DXVK.  The D3D APIs have different
> > rules for discards than OpenGL and Vulkan.  One effective way (which is
> > what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
> > wait until the very end of the shader to discard.  This ends up in the
> > pessimal case where we always do all of the work before discarding.
> > This pass helps some DXVK shaders significantly.
> >
> > v2 (Jason Ekstrand):
> >  - Fix a couple of typos (Grazvydas, Ian)
> >  - Use the new nir_instr_move helper
> >  - Find all movable discards before moving anything so we don't
> >accidentally re-order anything and break dependencies
> > ---
> >  src/compiler/Makefile.sources  |   1 +
> >  src/compiler/nir/meson.build   |   1 +
> >  src/compiler/nir/nir.h |  10 +
> >  src/compiler/nir/nir_opt_discard.c | 396 +
> >  4 files changed, 408 insertions(+)
> >  create mode 100644 src/compiler/nir/nir_opt_discard.c
> >
> > diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.
> sources
> > index 9e3fbdc2612..8600ce81281 100644
> > --- a/src/compiler/Makefile.sources
> > +++ b/src/compiler/Makefile.sources
> > @@ -271,6 +271,7 @@ NIR_FILES = \
> > nir/nir_opt_cse.c \
> > nir/nir_opt_dce.c \
> > nir/nir_opt_dead_cf.c \
> > +   nir/nir_opt_discard.c \
> > nir/nir_opt_gcm.c \
> > nir/nir_opt_global_to_local.c \
> > nir/nir_opt_if.c \
> > diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> > index 28aa8de7014..e339258bb94 100644
> > --- a/src/compiler/nir/meson.build
> > +++ b/src/compiler/nir/meson.build
> > @@ -156,6 +156,7 @@ files_libnir = files(
> >'nir_opt_cse.c',
> >'nir_opt_dce.c',
> >'nir_opt_dead_cf.c',
> > +  'nir_opt_discard.c',
> >'nir_opt_gcm.c',
> >'nir_opt_global_to_local.c',
> >'nir_opt_if.c',
> > diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> > index c40a88c8ccc..dac019c17e8 100644
> > --- a/src/compiler/nir/nir.h
> > +++ b/src/compiler/nir/nir.h
> > @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
> >  */
> > bool vs_inputs_dual_locations;
> >
> > +   /**
> > +* Whether or not derivatives are still a safe operation after a
> discard
> > +* has occurred.  Optimization passes may be able to be a bit more
> > +* agressive if this is true.
>
> s/agressive/aggressive/
>

fixed locally.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Matt Turner
On Wed, Jul 4, 2018 at 9:59 AM Jason Ekstrand  wrote:
>
> Many fragment shaders do a discard using relatively little information
> but still put the discard fairly far down in the shader for no good
> reason.  If the discard is moved higher up, we can possibly avoid doing
> some or almost all of the work in the shader.  When this lets us skip
> texturing operations, it's an especially high win.
>
> One of the biggest offenders here is DXVK.  The D3D APIs have different
> rules for discards than OpenGL and Vulkan.  One effective way (which is
> what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
> wait until the very end of the shader to discard.  This ends up in the
> pessimal case where we always do all of the work before discarding.
> This pass helps some DXVK shaders significantly.
>
> v2 (Jason Ekstrand):
>  - Fix a couple of typos (Grazvydas, Ian)
>  - Use the new nir_instr_move helper
>  - Find all movable discards before moving anything so we don't
>accidentally re-order anything and break dependencies
> ---
>  src/compiler/Makefile.sources  |   1 +
>  src/compiler/nir/meson.build   |   1 +
>  src/compiler/nir/nir.h |  10 +
>  src/compiler/nir/nir_opt_discard.c | 396 +
>  4 files changed, 408 insertions(+)
>  create mode 100644 src/compiler/nir/nir_opt_discard.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 9e3fbdc2612..8600ce81281 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -271,6 +271,7 @@ NIR_FILES = \
> nir/nir_opt_cse.c \
> nir/nir_opt_dce.c \
> nir/nir_opt_dead_cf.c \
> +   nir/nir_opt_discard.c \
> nir/nir_opt_gcm.c \
> nir/nir_opt_global_to_local.c \
> nir/nir_opt_if.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 28aa8de7014..e339258bb94 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -156,6 +156,7 @@ files_libnir = files(
>'nir_opt_cse.c',
>'nir_opt_dce.c',
>'nir_opt_dead_cf.c',
> +  'nir_opt_discard.c',
>'nir_opt_gcm.c',
>'nir_opt_global_to_local.c',
>'nir_opt_if.c',
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index c40a88c8ccc..dac019c17e8 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
>  */
> bool vs_inputs_dual_locations;
>
> +   /**
> +* Whether or not derivatives are still a safe operation after a discard
> +* has occurred.  Optimization passes may be able to be a bit more
> +* agressive if this is true.

s/agressive/aggressive/
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v2 2/3] nir: Add a discard optimization pass

2018-07-04 Thread Jason Ekstrand
Many fragment shaders do a discard using relatively little information
but still put the discard fairly far down in the shader for no good
reason.  If the discard is moved higher up, we can possibly avoid doing
some or almost all of the work in the shader.  When this lets us skip
texturing operations, it's an especially high win.

One of the biggest offenders here is DXVK.  The D3D APIs have different
rules for discards than OpenGL and Vulkan.  One effective way (which is
what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
wait until the very end of the shader to discard.  This ends up in the
pessimal case where we always do all of the work before discarding.
This pass helps some DXVK shaders significantly.

v2 (Jason Ekstrand):
 - Fix a couple of typos (Grazvydas, Ian)
 - Use the new nir_instr_move helper
 - Find all movable discards before moving anything so we don't
   accidentally re-order anything and break dependencies
---
 src/compiler/Makefile.sources  |   1 +
 src/compiler/nir/meson.build   |   1 +
 src/compiler/nir/nir.h |  10 +
 src/compiler/nir/nir_opt_discard.c | 396 +
 4 files changed, 408 insertions(+)
 create mode 100644 src/compiler/nir/nir_opt_discard.c

diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
index 9e3fbdc2612..8600ce81281 100644
--- a/src/compiler/Makefile.sources
+++ b/src/compiler/Makefile.sources
@@ -271,6 +271,7 @@ NIR_FILES = \
nir/nir_opt_cse.c \
nir/nir_opt_dce.c \
nir/nir_opt_dead_cf.c \
+   nir/nir_opt_discard.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_if.c \
diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 28aa8de7014..e339258bb94 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -156,6 +156,7 @@ files_libnir = files(
   'nir_opt_cse.c',
   'nir_opt_dce.c',
   'nir_opt_dead_cf.c',
+  'nir_opt_discard.c',
   'nir_opt_gcm.c',
   'nir_opt_global_to_local.c',
   'nir_opt_if.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index c40a88c8ccc..dac019c17e8 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -2022,6 +2022,13 @@ typedef struct nir_shader_compiler_options {
 */
bool vs_inputs_dual_locations;
 
+   /**
+* Whether or not derivatives are still a safe operation after a discard
+* has occurred.  Optimization passes may be able to be a bit more
+* agressive if this is true.
+*/
+   bool derivatives_safe_after_discard;
+
unsigned max_unroll_iterations;
 } nir_shader_compiler_options;
 
@@ -2901,6 +2908,9 @@ bool nir_opt_dce(nir_shader *shader);
 
 bool nir_opt_dead_cf(nir_shader *shader);
 
+bool nir_opt_discard_if(nir_shader *shader);
+bool nir_opt_move_discards_to_top(nir_shader *shader);
+
 bool nir_opt_gcm(nir_shader *shader, bool value_number);
 
 bool nir_opt_if(nir_shader *shader);
diff --git a/src/compiler/nir/nir_opt_discard.c 
b/src/compiler/nir/nir_opt_discard.c
new file mode 100644
index 000..c61af163707
--- /dev/null
+++ b/src/compiler/nir/nir_opt_discard.c
@@ -0,0 +1,396 @@
+/*
+ * Copyright © 2018 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_builder.h"
+#include "nir_control_flow.h"
+#include "nir_worklist.h"
+
+static bool
+block_has_only_discard(nir_block *block)
+{
+   nir_instr *instr = nir_block_first_instr(block);
+   if (instr == NULL || instr != nir_block_last_instr(block))
+  return false;
+
+   if (instr->type != nir_instr_type_intrinsic)
+  return false;
+
+   nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+   return intrin->intrinsic == nir_intrinsic_discard;
+}
+
+static bool
+opt_discard_if_impl(nir_function_impl *impl)
+{
+   bool progress = false;
+
+   nir_builder b;
+   nir_builder_init(, impl);
+
+