Re: [Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

2018-02-26 Thread Roland Scheidegger
Am 26.02.2018 um 23:21 schrieb Ian Romanick:
> On 02/23/2018 05:14 PM, Jason Ekstrand wrote:
>> On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick > > wrote:
>>
>> From: Ian Romanick > >
>>
>> shader-db results:
>>
>> Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
>> total instructions in shared programs: 14514817 -> 14514808 (<.01%)
>> instructions in affected programs: 229 -> 220 (-3.93%)
>> helped: 3
>> HURT: 0
>> helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
>> helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
>>
>> total cycles in shared programs: 533145211 -> 533144939 (<.01%)
>> cycles in affected programs: 37268 -> 36996 (-0.73%)
>> helped: 8
>> HURT: 0
>> helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
>> helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
>>
>> Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
>> total cycles in shared programs: 257618409 -> 257618403 (<.01%)
>> cycles in affected programs: 12582 -> 12576 (-0.05%)
>> helped: 3
>> HURT: 0
>> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
>> helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
>>
>> No changes on Iron Lake or GM45.
>>
>> Signed-off-by: Ian Romanick > >
>> ---
>>  src/compiler/nir/nir_opt_algebraic.py | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> b/src/compiler/nir/nir_opt_algebraic.py
>> index d40d59b..f5f9e94 100644
>> --- a/src/compiler/nir/nir_opt_algebraic.py
>> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> @@ -170,6 +170,8 @@ optimizations = [
>>     (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
>>     (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
>>     (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
>> +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
>> +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
>>
>>
>> Please flag as inexact.  As per the stupid GLSL definition, these are
>> not the same as fmin/fmax when you throw in a NaN.
> 
> I'm having some trouble rectifying this with the existing
> transformations and the Intel hardware implementation.
> 
> GLSL spec says min(x, y) "Returns y if y < x; otherwise it returns x."
> From that I infer min(x, NaN) == x, and min(NaN, y) == NaN.  The
> expression ('bcsel', ('flt', b, a), b, a) has the same behavior.
> 
> I think if I rewrite the fmin transform as (swapping the argument order)
> 
> (('bcsel', ('fge', a, b), b, a), ('fmin', a, b)),
> 
> it should be at least as valid for as the existing transforms.  A
> similar modification should work for fmax.
> 
> The Intel SEL instruction which says that with the .L or .GE modifier,
> if one argument is NaN, the other value is always returned.  This means
> that min(NaN, y) will be y.
> 
> This is valid for min and max because section 4.7.1 (Range and
> Precision) says:
> 
> Operations and built-in functions that operate on a NaN are not
> required to return a NaN as the result.
> 
> I don't think returning non-NaN for ('bcsel', ('flt', b, NaN), b, NaN)
> is valid, so I think the existing transformations should also be marked
> inexact for platforms that implement the "never NaN" behavior for fmin
> or fmax.
Everybody will return non-nan for fmin/fmax (if you don't, some apps
will very likely break), at least for pc graphic chips (with the
exception of pre-dx10 chips, where all bets are off). This is the
required behavior by dx10 (hence what everyone will do, since gl doesn't
have a required behavior as far as NaNs are concerned). So yes,
transforming bcsel + flt/fge into fmin/fmax looks inexact to me too.
(glsl NaN behavior really sucks - in theory nearly everything goes, but
in practice apps will break if you don't follow more or less dx10 rules...)

Roland


> 
>>     (('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
>>     (('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
>>     (('bcsel', a, True, 'b@bool'), ('ior', a, b)),
>> --
>> 2.9.5
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org 
>> 
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwIGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=svCuExX5hvVdvO6FOv-EAbkLHTwvBj8vZ2p7r-AXx28&s=40JZMjENRthrMifGyr5Y9JqxTeCxba6QBTkrgaFTkos&e=
>> 
>> 

Re: [Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

2018-02-26 Thread Ian Romanick
On 02/26/2018 02:43 PM, Jason Ekstrand wrote:
> On Mon, Feb 26, 2018 at 2:21 PM, Ian Romanick  > wrote:
> 
> On 02/23/2018 05:14 PM, Jason Ekstrand wrote:
> > On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick  
> > >> wrote:
> >
> >     From: Ian Romanick  
> >      >>
> >
> >     shader-db results:
> >
> >     Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
> >     total instructions in shared programs: 14514817 -> 14514808 (<.01%)
> >     instructions in affected programs: 229 -> 220 (-3.93%)
> >     helped: 3
> >     HURT: 0
> >     helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
> >     helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
> >
> >     total cycles in shared programs: 533145211 -> 533144939 (<.01%)
> >     cycles in affected programs: 37268 -> 36996 (-0.73%)
> >     helped: 8
> >     HURT: 0
> >     helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
> >     helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
> >
> >     Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
> >     total cycles in shared programs: 257618409 -> 257618403 (<.01%)
> >     cycles in affected programs: 12582 -> 12576 (-0.05%)
> >     helped: 3
> >     HURT: 0
> >     helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> >     helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
> >
> >     No changes on Iron Lake or GM45.
> >
> >     Signed-off-by: Ian Romanick  
> >      >>
> >     ---
> >      src/compiler/nir/nir_opt_algebraic.py | 2 ++
> >      1 file changed, 2 insertions(+)
> >
> >     diff --git a/src/compiler/nir/nir_opt_algebraic.py
> >     b/src/compiler/nir/nir_opt_algebraic.py
> >     index d40d59b..f5f9e94 100644
> >     --- a/src/compiler/nir/nir_opt_algebraic.py
> >     +++ b/src/compiler/nir/nir_opt_algebraic.py
> >     @@ -170,6 +170,8 @@ optimizations = [
> >         (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
> >         (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
> >         (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
> >     +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
> >     +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
> >
> >
> > Please flag as inexact.  As per the stupid GLSL definition, these are
> > not the same as fmin/fmax when you throw in a NaN.
> 
> I'm having some trouble rectifying this with the existing
> transformations and the Intel hardware implementation.
> 
> 
> Me too. :-)  Really, I think D3D10 spec'd the right thing. With GL, it
> looks more like a case of someone wrote the obvious thing down when NaN
> wasn't a thing and it got extended to NaN without much thought.  How
> would you feel about just defining the NIR fmin/fmax to have the D3D10
> behavior and telling people that they can make new opcodes if they want
> something different?  At the very least, that would make it
> constant-fold consistently for us.  It also sounds like it's a correct
> transformation as per the spec comment below.
> 
> GLSL spec says min(x, y) "Returns y if y < x; otherwise it returns x."
> From that I infer min(x, NaN) == x, and min(NaN, y) == NaN.  The
> expression ('bcsel', ('flt', b, a), b, a) has the same behavior.
> 
> I think if I rewrite the fmin transform as (swapping the argument order)
> 
>     (('bcsel', ('fge', a, b), b, a), ('fmin', a, b)),
> 
> it should be at least as valid for as the existing transforms.  A
> similar modification should work for fmax.
> 
> 
> Yes, it should.
>  
> 
> The Intel SEL instruction which says that with the .L or .GE modifier,
> if one argument is NaN, the other value is always returned.  This means
> that min(NaN, y) will be y.
> 
> This is valid for min and max because section 4.7.1 (Range and
> Precision) says:
> 
>     Operations and built-in functions that operate on a NaN are not
>     required to return a NaN as the result.
> 
> 
> It's good to know that the DX behavior is considered a valid
> transformation.  I didn't know exactly what the rules were there.
>  
> 
> I don't think returning non-NaN for ('bcsel', ('flt', b, NaN), b, NaN)
> is valid, so I think the existing transformations should also be marked
> inexact for platforms that implement the "never NaN" behavior for fmin
> or fmax.
> 
> 
> I think I agree.  I think you could probably argue the point and I doubt
> real apps would care but...  Probably best to flag it as ine

Re: [Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

2018-02-26 Thread Jason Ekstrand
On Mon, Feb 26, 2018 at 2:21 PM, Ian Romanick  wrote:

> On 02/23/2018 05:14 PM, Jason Ekstrand wrote:
> > On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick  > > wrote:
> >
> > From: Ian Romanick  > >
> >
> > shader-db results:
> >
> > Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
> > total instructions in shared programs: 14514817 -> 14514808 (<.01%)
> > instructions in affected programs: 229 -> 220 (-3.93%)
> > helped: 3
> > HURT: 0
> > helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
> > helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
> >
> > total cycles in shared programs: 533145211 -> 533144939 (<.01%)
> > cycles in affected programs: 37268 -> 36996 (-0.73%)
> > helped: 8
> > HURT: 0
> > helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
> > helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
> >
> > Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
> > total cycles in shared programs: 257618409 -> 257618403 (<.01%)
> > cycles in affected programs: 12582 -> 12576 (-0.05%)
> > helped: 3
> > HURT: 0
> > helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> > helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
> >
> > No changes on Iron Lake or GM45.
> >
> > Signed-off-by: Ian Romanick  > >
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py
> > b/src/compiler/nir/nir_opt_algebraic.py
> > index d40d59b..f5f9e94 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -170,6 +170,8 @@ optimizations = [
> > (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
> > (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
> > (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
> > +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
> > +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
> >
> >
> > Please flag as inexact.  As per the stupid GLSL definition, these are
> > not the same as fmin/fmax when you throw in a NaN.
>
> I'm having some trouble rectifying this with the existing
> transformations and the Intel hardware implementation.
>

Me too. :-)  Really, I think D3D10 spec'd the right thing. With GL, it
looks more like a case of someone wrote the obvious thing down when NaN
wasn't a thing and it got extended to NaN without much thought.  How would
you feel about just defining the NIR fmin/fmax to have the D3D10 behavior
and telling people that they can make new opcodes if they want something
different?  At the very least, that would make it constant-fold
consistently for us.  It also sounds like it's a correct transformation as
per the spec comment below.

GLSL spec says min(x, y) "Returns y if y < x; otherwise it returns x."
> From that I infer min(x, NaN) == x, and min(NaN, y) == NaN.  The
> expression ('bcsel', ('flt', b, a), b, a) has the same behavior.
>
> I think if I rewrite the fmin transform as (swapping the argument order)
>
> (('bcsel', ('fge', a, b), b, a), ('fmin', a, b)),
>
> it should be at least as valid for as the existing transforms.  A
> similar modification should work for fmax.
>

Yes, it should.


> The Intel SEL instruction which says that with the .L or .GE modifier,
> if one argument is NaN, the other value is always returned.  This means
> that min(NaN, y) will be y.
>
> This is valid for min and max because section 4.7.1 (Range and
> Precision) says:
>
> Operations and built-in functions that operate on a NaN are not
> required to return a NaN as the result.
>

It's good to know that the DX behavior is considered a valid
transformation.  I didn't know exactly what the rules were there.


> I don't think returning non-NaN for ('bcsel', ('flt', b, NaN), b, NaN)
> is valid, so I think the existing transformations should also be marked
> inexact for platforms that implement the "never NaN" behavior for fmin
> or fmax.
>

I think I agree.  I think you could probably argue the point and I doubt
real apps would care but...  Probably best to flag it as inexact and move
on with life.  If you're using precise and open-coding min/max in some
horifically stupid way, you're doing it wrong.


> > (('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
> > (('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
> > (('bcsel', a, True, 'b@bool'), ('ior', a, b)),
> > --
> > 2.9.5
> >
> > ___
> > mesa-dev mailing list
> > mesa-dev@lists.freedesktop.org  freedesktop.org>
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> > 

Re: [Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

2018-02-26 Thread Ian Romanick
On 02/23/2018 05:14 PM, Jason Ekstrand wrote:
> On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick  > wrote:
> 
> From: Ian Romanick  >
> 
> shader-db results:
> 
> Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
> total instructions in shared programs: 14514817 -> 14514808 (<.01%)
> instructions in affected programs: 229 -> 220 (-3.93%)
> helped: 3
> HURT: 0
> helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
> helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
> 
> total cycles in shared programs: 533145211 -> 533144939 (<.01%)
> cycles in affected programs: 37268 -> 36996 (-0.73%)
> helped: 8
> HURT: 0
> helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
> helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
> 
> Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
> total cycles in shared programs: 257618409 -> 257618403 (<.01%)
> cycles in affected programs: 12582 -> 12576 (-0.05%)
> helped: 3
> HURT: 0
> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
> 
> No changes on Iron Lake or GM45.
> 
> Signed-off-by: Ian Romanick  >
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index d40d59b..f5f9e94 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -170,6 +170,8 @@ optimizations = [
>     (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
>     (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
>     (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
> +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
> +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
> 
> 
> Please flag as inexact.  As per the stupid GLSL definition, these are
> not the same as fmin/fmax when you throw in a NaN.

I'm having some trouble rectifying this with the existing
transformations and the Intel hardware implementation.

GLSL spec says min(x, y) "Returns y if y < x; otherwise it returns x."
From that I infer min(x, NaN) == x, and min(NaN, y) == NaN.  The
expression ('bcsel', ('flt', b, a), b, a) has the same behavior.

I think if I rewrite the fmin transform as (swapping the argument order)

(('bcsel', ('fge', a, b), b, a), ('fmin', a, b)),

it should be at least as valid for as the existing transforms.  A
similar modification should work for fmax.

The Intel SEL instruction which says that with the .L or .GE modifier,
if one argument is NaN, the other value is always returned.  This means
that min(NaN, y) will be y.

This is valid for min and max because section 4.7.1 (Range and
Precision) says:

Operations and built-in functions that operate on a NaN are not
required to return a NaN as the result.

I don't think returning non-NaN for ('bcsel', ('flt', b, NaN), b, NaN)
is valid, so I think the existing transformations should also be marked
inexact for platforms that implement the "never NaN" behavior for fmin
or fmax.

>     (('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
>     (('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
>     (('bcsel', a, True, 'b@bool'), ('ior', a, b)),
> --
> 2.9.5
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org 
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/22] nir: Recognize some more open-coded fmin / fmax

2018-02-23 Thread Jason Ekstrand
On Fri, Feb 23, 2018 at 3:55 PM, Ian Romanick  wrote:

> From: Ian Romanick 
>
> shader-db results:
>
> Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
> total instructions in shared programs: 14514817 -> 14514808 (<.01%)
> instructions in affected programs: 229 -> 220 (-3.93%)
> helped: 3
> HURT: 0
> helped stats (abs) min: 1 max: 4 x̄: 3.00 x̃: 4
> helped stats (rel) min: 2.86% max: 4.12% x̄: 3.70% x̃: 4.12%
>
> total cycles in shared programs: 533145211 -> 533144939 (<.01%)
> cycles in affected programs: 37268 -> 36996 (-0.73%)
> helped: 8
> HURT: 0
> helped stats (abs) min: 2 max: 134 x̄: 34.00 x̃: 2
> helped stats (rel) min: 0.02% max: 14.22% x̄: 3.53% x̃: 0.05%
>
> Sandy Bridge and Ivy Bridge had similar results. (Ivy Bridge shown)
> total cycles in shared programs: 257618409 -> 257618403 (<.01%)
> cycles in affected programs: 12582 -> 12576 (-0.05%)
> helped: 3
> HURT: 0
> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> helped stats (rel) min: 0.05% max: 0.05% x̄: 0.05% x̃: 0.05%
>
> No changes on Iron Lake or GM45.
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index d40d59b..f5f9e94 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -170,6 +170,8 @@ optimizations = [
> (('fge', ('fneg', ('fabs', a)), 0.0), ('feq', a, 0.0)),
> (('bcsel', ('flt', b, a), b, a), ('fmin', a, b)),
> (('bcsel', ('flt', a, b), b, a), ('fmax', a, b)),
> +   (('bcsel', ('fge', b, a), a, b), ('fmin', a, b)),
> +   (('bcsel', ('fge', a, b), a, b), ('fmax', a, b)),
>

Please flag as inexact.  As per the stupid GLSL definition, these are not
the same as fmin/fmax when you throw in a NaN.


> (('bcsel', ('inot', a), b, c), ('bcsel', a, c, b)),
> (('bcsel', a, ('bcsel', a, b, c), d), ('bcsel', a, b, d)),
> (('bcsel', a, True, 'b@bool'), ('ior', a, b)),
> --
> 2.9.5
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev