Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-29 Thread Erik Faye-Lund
On Tue, 2019-01-29 at 14:41 +, Roland Scheidegger wrote:
> On 29.01.19 at 10:10, Erik Faye-Lund wrote:
> > On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> > > Use the trick of adding and then subtracting 2**52 (52 is the
> > > number of explicit mantissa bits a double-precision floating-point
> > > value has) to implement round-to-even.
> > > 
> > > Cuts the number of instructions on SKL of the piglit test
> > > fs-roundEven-double.shader_test from 109 to 21.
> > 
> > Won't this approach only work for "small" values, that is, values
> > equal to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll
> > get infinity, and you can't subtract 2**52 away again without being
> > stuck with infinity, no...
> 
> It would actually work for very large numbers in theory.
> The only numbers the magic trick won't work for are those with magnitude
> between 2^52 and 2^104 (those are already integral, and the add will
> cause some of them to be rounded to a different number, which the sub
> does not undo afterwards). For larger ones it will work again, up to
> and including inf.
> But in any case, that's what the bcsel is for: for numbers with a
> magnitude of 2^52 or more, src is returned unchanged.
> 

Doh, I missed the bcsel somehow. Thanks for setting me straight.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-29 Thread Roland Scheidegger
On 29.01.19 at 10:10, Erik Faye-Lund wrote:
> On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
>> Use the trick of adding and then subtracting 2**52 (52 is the number
>> of explicit mantissa bits a double-precision floating-point value has)
>> to implement round-to-even.
>>
>> Cuts the number of instructions on SKL of the piglit test
>> fs-roundEven-double.shader_test from 109 to 21.
> 
> Won't this approach only work for "small" values, that is values equal
> to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
> infinity, and you can't subtract 2**52 away again without being stuck
> with infinity, no...

It would actually work for very large numbers in theory.
The only numbers the magic trick won't work for are those with magnitude
between 2^52 and 2^104 (those are already integral, and the add will
cause some of them to be rounded to a different number, which the sub
does not undo afterwards). For larger ones it will work again, up to
and including inf.
But in any case, that's what the bcsel is for: for numbers with a
magnitude of 2^52 or more, src is returned unchanged.

Roland
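
To make the affected range concrete, here is a small C sketch of the bare
add/sub next to a guarded version. It assumes IEEE 754 binary64 doubles,
the default round-to-nearest-even mode, and non-negative inputs. The names
add_sub_two52 and guarded_add_sub are made up for illustration and are not
Mesa functions; the plain comparison only stands in for the
nir_flt/nir_bcsel pair in the patch.

```c
/* Bare trick with no magnitude guard (hypothetical helper). */
static double add_sub_two52(double x)
{
    const double two52 = 4503599627370496.0; /* 2**52 */
    /* volatile keeps the compiler from folding (x + c) - c back to x. */
    volatile double sum = x + two52;
    return sum - two52;
}

/* Same trick behind a guard: inputs >= 2**52 are returned unchanged. */
static double guarded_add_sub(double x)
{
    const double two52 = 4503599627370496.0; /* 2**52 */
    return (x < two52) ? add_sub_two52(x) : x;
}
```

With these, add_sub_two52(4503599627370497.0), i.e. 2^52 + 1, returns 2^52:
the intermediate 2^53 + 1 is not representable and the tie goes to the even
neighbour. guarded_add_sub returns that input unchanged. For 1e300 the
bare version also returns its input, because 2^52 is absorbed.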




Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-29 Thread Philipp Zabel
On Tue, 2019-01-29 at 10:10 +0100, Erik Faye-Lund wrote:
> On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> > Use the trick of adding and then subtracting 2**52 (52 is the number
> > of explicit mantissa bits a double-precision floating-point value has)
> > to implement round-to-even.
> > 
> > Cuts the number of instructions on SKL of the piglit test
> > fs-roundEven-double.shader_test from 109 to 21.
> 
> Won't this approach only work for "small" values, that is values equal
> to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
> infinity, and you can't subtract 2**52 away again without being stuck
> with infinity, no...

2**52 is so small compared to anything close to DBL_MAX that it will
just be absorbed by rounding; the sum does not overflow to infinity.

regards
Philipp
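
The absorption can be checked directly. A minimal C sketch, assuming
IEEE 754 binary64 doubles and the default round-to-nearest-even mode; the
helper name two52_is_absorbed is made up for illustration:

```c
#include <float.h> /* DBL_MAX, for callers of the helper */

/* Hypothetical helper: returns 1 if adding 2**52 leaves x unchanged. */
static int two52_is_absorbed(double x)
{
    const double two52 = 4503599627370496.0; /* 2**52 */
    /* volatile forces the sum to be rounded to double precision. */
    volatile double sum = x + two52;
    return sum == x;
}
```

two52_is_absorbed(DBL_MAX) is 1: neighbouring doubles near DBL_MAX are
2**971 apart, so the sum rounds back to DBL_MAX rather than overflowing.
two52_is_absorbed(1.0) is 0.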


Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-29 Thread Erik Faye-Lund
On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> Use the trick of adding and then subtracting 2**52 (52 is the number
> of explicit mantissa bits a double-precision floating-point value has)
> to implement round-to-even.
> 
> Cuts the number of instructions on SKL of the piglit test
> fs-roundEven-double.shader_test from 109 to 21.

Won't this approach only work for "small" values, that is values equal
to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
infinity, and you can't subtract 2**52 away again without being stuck
with infinity, no...




Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-28 Thread Matt Turner
On Mon, Jan 28, 2019 at 10:25 AM Roland Scheidegger wrote:
>
> I like it :-).
> That said, there are some caveats, as discussed on IRC. In particular,
> on GPUs which don't do round-to-nearest-even for ordinary fp64 math (or
> where the rounding mode could be set to something else manually), it
> won't do the right thing.

I don't know that there are any. Round-to-even is the simplest thing
to do in hardware.

> And if fast-math can be enabled, then it probably won't round at all
> (at least I think it would be legal to eliminate the add/sub in this
> case).
> So I'm no longer entirely sure whether this can be used unconditionally.
> But I can't really tell whether those potential caveats actually matter, hence
> Reviewed-by: Roland Scheidegger 


Thanks!


Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-28 Thread Roland Scheidegger
I like it :-).
That said, there are some caveats, as discussed on IRC. In particular,
on GPUs which don't do round-to-nearest-even for ordinary fp64 math (or
where the rounding mode could be set to something else manually), it
won't do the right thing.
And if fast-math can be enabled, then it probably won't round at all
(at least I think it would be legal to eliminate the add/sub in this
case).
So I'm no longer entirely sure whether this can be used unconditionally.
But I can't really tell whether those potential caveats actually matter, hence
Reviewed-by: Roland Scheidegger 

On 28.01.19 at 18:31, Matt Turner wrote:
> Use the trick of adding and then subtracting 2**52 (52 is the number of
> explicit mantissa bits a double-precision floating-point value has) to
> implement round-to-even.
> 
> Cuts the number of instructions on SKL of the piglit test
> fs-roundEven-double.shader_test from 109 to 21.



[Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()

2019-01-28 Thread Matt Turner
Use the trick of adding and then subtracting 2**52 (52 is the number of
explicit mantissa bits a double-precision floating-point value has) to
implement round-to-even.

Cuts the number of instructions on SKL of the piglit test
fs-roundEven-double.shader_test from 109 to 21.
---
 src/compiler/nir/nir_lower_double_ops.c | 56 ++---
 1 file changed, 12 insertions(+), 44 deletions(-)

diff --git a/src/compiler/nir/nir_lower_double_ops.c b/src/compiler/nir/nir_lower_double_ops.c
index 4d4cdf635ea..054fce9c168 100644
--- a/src/compiler/nir/nir_lower_double_ops.c
+++ b/src/compiler/nir/nir_lower_double_ops.c
@@ -392,50 +392,18 @@ lower_fract(nir_builder *b, nir_ssa_def *src)
 static nir_ssa_def *
 lower_round_even(nir_builder *b, nir_ssa_def *src)
 {
-   /* If fract(src) == 0.5, then we will have to decide the rounding direction.
-    * We will do this by computing the mod(abs(src), 2) and testing if it
-    * is < 1 or not.
-    *
-    * We compute mod(abs(src), 2) as:
-    * abs(src) - 2.0 * floor(abs(src) / 2.0)
-    */
-   nir_ssa_def *two = nir_imm_double(b, 2.0);
-   nir_ssa_def *abs_src = nir_fabs(b, src);
-   nir_ssa_def *mod =
-      nir_fsub(b,
-               abs_src,
-               nir_fmul(b,
-                        two,
-                        nir_ffloor(b,
-                                   nir_fmul(b,
-                                            abs_src,
-                                            nir_imm_double(b, 0.5)))));
-
-   /*
-    * If fract(src) != 0.5, then we round as floor(src + 0.5)
-    *
-    * If fract(src) == 0.5, then we have to check the modulo:
-    *
-    *   if it is < 1 we need a trunc operation so we get:
-    *      0.5 -> 0,   -0.5 -> -0
-    *      2.5 -> 2,   -2.5 -> -2
-    *
-    *   otherwise we need to check if src >= 0, in which case we need to round
-    *   upwards, or not, in which case we need to round downwards so we get:
-    *      1.5 -> 2,   -1.5 -> -2
-    *      3.5 -> 4,   -3.5 -> -4
-    */
-   nir_ssa_def *fract = nir_ffract(b, src);
-   return nir_bcsel(b,
-                    nir_fne(b, fract, nir_imm_double(b, 0.5)),
-                    nir_ffloor(b, nir_fadd(b, src, nir_imm_double(b, 0.5))),
-                    nir_bcsel(b,
-                              nir_flt(b, mod, nir_imm_double(b, 1.0)),
-                              nir_ftrunc(b, src),
-                              nir_bcsel(b,
-                                        nir_fge(b, src, nir_imm_double(b, 0.0)),
-                                        nir_fadd(b, src, nir_imm_double(b, 0.5)),
-                                        nir_fsub(b, src, nir_imm_double(b, 0.5)))));
+   /* Add and subtract 2**52 to round off any fractional bits. */
+   nir_ssa_def *two52 = nir_imm_double(b, (double)(1ull << 52));
+   nir_ssa_def *sign = nir_iand(b, nir_unpack_64_2x32_split_y(b, src),
+                                nir_imm_int(b, 1ull << 31));
+
+   b->exact = true;
+   nir_ssa_def *res = nir_fsub(b, nir_fadd(b, nir_fabs(b, src), two52), two52);
+   b->exact = false;
+
+   return nir_bcsel(b, nir_flt(b, nir_fabs(b, src), two52),
+                    nir_pack_64_2x32_split(b, nir_unpack_64_2x32_split_x(b, res),
+                                           nir_ior(b, nir_unpack_64_2x32_split_y(b, res), sign)), src);
 }
 
 static nir_ssa_def *
-- 
2.19.2
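
For readers who want to try the idea outside of NIR, the following is a
rough scalar model of the new lowering in plain C. It is only a sketch
under stated assumptions: IEEE 754 binary64 doubles, the default
round-to-nearest-even mode, and no fast-math. The function name
round_even_sketch is made up; volatile plays roughly the role that
b->exact plays in the patch, and the sign is taken from bit 63 of the
64-bit pattern instead of bit 31 of the unpacked high word.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the lowering; not Mesa code. */
static double round_even_sketch(double src)
{
    const double two52 = 4503599627370496.0; /* 2**52 */
    uint64_t src_bits, res_bits;

    /* Save the sign bit so that e.g. -0.5 yields -0.0. */
    memcpy(&src_bits, &src, sizeof src_bits);
    uint64_t sign = src_bits & (UINT64_C(1) << 63);

    /* Add and subtract 2**52 on |src|; volatile prevents the
     * compiler from simplifying (x + c) - c to x. */
    volatile double sum = fabs(src) + two52;
    double res = sum - two52;

    /* OR the saved sign back into the rounded magnitude. */
    memcpy(&res_bits, &res, sizeof res_bits);
    res_bits |= sign;
    memcpy(&res, &res_bits, sizeof res);

    /* |src| >= 2**52 is already integral; NaN fails the comparison
     * and is returned as-is. */
    return fabs(src) < two52 ? res : src;
}
```

Under those assumptions, round_even_sketch(2.5) gives 2.0,
round_even_sketch(-3.5) gives -4.0, and round_even_sketch(-0.5) gives
-0.0, matching the examples in the comment removed by the patch.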
