Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
On Tue, 2019-01-29 at 14:41 +, Roland Scheidegger wrote:
> On 29.01.19 at 10:10, Erik Faye-Lund wrote:
> > On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> > > Use the trick of adding and then subtracting 2**52 (52 is the
> > > number of explicit mantissa bits a double-precision floating-point
> > > value has) to implement round-to-even.
> > >
> > > Cuts the number of instructions on SKL of the piglit test
> > > fs-roundEven-double.shader_test from 109 to 21.
> >
> > Won't this approach only work for "small" values, that is values
> > equal to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll
> > get infinity, and you can't subtract 2**52 away again without being
> > stuck with infinity, no...
>
> It would actually work for very large numbers in theory.
> The only numbers the magic trick won't work for are those with
> magnitude between 2^52 and 2^104 (those are already integral and the
> add will cause some of them to be rounded up to another number with
> the sub not doing anything afterwards), for larger ones it will work
> again, up to and including inf.
> But in any case, that's what the bcsel is for, for numbers larger
> than 2^52 no operations are performed at all.
>

Doh, I missed the bcsel somehow. Thanks for setting me straight.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
On 29.01.19 at 10:10, Erik Faye-Lund wrote:
> On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
>> Use the trick of adding and then subtracting 2**52 (52 is the number
>> of explicit mantissa bits a double-precision floating-point value
>> has) to implement round-to-even.
>>
>> Cuts the number of instructions on SKL of the piglit test
>> fs-roundEven-double.shader_test from 109 to 21.
>
> Won't this approach only work for "small" values, that is values equal
> to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
> infinity, and you can't subtract 2**52 away again without being stuck
> with infinity, no...

It would actually work for very large numbers in theory.
The only numbers the magic trick won't work for are those with magnitude
between 2^52 and 2^104 (those are already integral and the add will
cause some of them to be rounded up to another number with the sub not
doing anything afterwards), for larger ones it will work again, up to
and including inf.
But in any case, that's what the bcsel is for, for numbers larger than
2^52 no operations are performed at all.

Roland
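The range argument above can be checked on a CPU. The following is a minimal sketch, not Mesa code: the helper names `add_sub_two52` and `add_sub_two52_guarded` are made up for illustration, and it assumes IEEE 754 binary64 with the default round-to-nearest-even mode. It shows that an odd integer just above 2^52 is changed by the bare add/sub, while the range check (the job the bcsel does in the patch) leaves it alone.

```c
#include <assert.h>

/* Hypothetical helper: the bare add/sub step with no range check.
 * Only magnitudes (non-negative inputs) are handled in this sketch. */
static double add_sub_two52(double abs_src)
{
   const double two52 = 4503599627370496.0; /* 2**52 */
   volatile double sum; /* volatile: keep the compiler from folding (x + c) - c */

   assert(abs_src >= 0.0);
   sum = abs_src + two52;
   return sum - two52;
}

/* Hypothetical helper: the same step, but skipped for inputs >= 2**52,
 * mirroring the nir_flt/nir_bcsel guard in the patch. */
static double add_sub_two52_guarded(double abs_src)
{
   const double two52 = 4503599627370496.0; /* 2**52 */

   return abs_src < two52 ? add_sub_two52(abs_src) : abs_src;
}
```

For example, 2^52 + 1 (4503599627370497.0) comes back from the unguarded helper as 2^52, because the intermediate sum 2^53 + 1 is not representable and ties to the even neighbour 2^53; the guarded helper returns the input unchanged.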
Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
On Tue, 2019-01-29 at 10:10 +0100, Erik Faye-Lund wrote:
> On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> > Use the trick of adding and then subtracting 2**52 (52 is the number
> > of explicit mantissa bits a double-precision floating-point value
> > has) to implement round-to-even.
> >
> > Cuts the number of instructions on SKL of the piglit test
> > fs-roundEven-double.shader_test from 109 to 21.
>
> Won't this approach only work for "small" values, that is values equal
> to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
> infinity, and you can't subtract 2**52 away again without being stuck
> with infinity, no...

2**52 is such a small value compared to anything close to DBL_MAX, it
will just be absorbed.

regards
Philipp
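The absorption is easy to confirm on a CPU. This is a small sketch under the assumption of IEEE 754 binary64 and round-to-nearest-even; `add_two52` is a hypothetical helper name, not something from the patch.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: add 2**52 to a non-negative double and return the
 * rounded sum. */
static double add_two52(double x)
{
   const double two52 = 4503599627370496.0; /* 2**52 */
   volatile double sum; /* volatile: force the sum to be rounded to double */

   assert(x >= 0.0);
   sum = x + two52;
   return sum;
}
```

With these assumptions `add_two52(DBL_MAX)` compares equal to `DBL_MAX`: the spacing between doubles near DBL_MAX is 2^971, so adding 2^52 is far below half a unit in the last place and the sum rounds straight back to DBL_MAX instead of overflowing to infinity.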
Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
On Mon, 2019-01-28 at 09:31 -0800, Matt Turner wrote:
> Use the trick of adding and then subtracting 2**52 (52 is the number
> of explicit mantissa bits a double-precision floating-point value has)
> to implement round-to-even.
>
> Cuts the number of instructions on SKL of the piglit test
> fs-roundEven-double.shader_test from 109 to 21.

Won't this approach only work for "small" values, that is values equal
to or smaller than DBL_MAX - 2**52? Once you add 2**52, you'll get
infinity, and you can't subtract 2**52 away again without being stuck
with infinity, no...
Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
On Mon, Jan 28, 2019 at 10:25 AM Roland Scheidegger wrote:
>
> I like it :-).
> That said, there's some caveats as discussed on IRC - in particular for
> gpus which don't do round-to-nearest-even for ordinary fp64 math (or
> rounding mode could be set to something else manually) it won't do the
> right thing.

I don't know that there are any. Round-to-even is the simplest thing to
do in hardware.

> And if you can have fast-math enabled, then it probably won't round at
> all (at least I think it would be legal to eliminate the add/sub in this
> case).
> So I'm not entirely sure anymore if this can be used unconditionally.
> But I can't really tell if those potential caveats actually matter, hence
> Reviewed-by: Roland Scheidegger

Thanks!
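To make the rounding-mode caveat concrete, here is a worked example. It is my own arithmetic rather than anything measured on hardware, and it assumes IEEE 754 binary64, where values in [2^52, 2^53) are spaced exactly 1 apart. Take src = 2.25, for which roundEven should return 2:

```latex
\begin{align*}
\text{exact sum:} \quad
  & 2.25 + 2^{52} = 2^{52} + 2.25 \quad \text{(not representable)} \\
\text{round-to-nearest-even:} \quad
  & (2^{52} + 2) - 2^{52} = 2 \\
\text{round toward } +\infty\text{:} \quad
  & (2^{52} + 3) - 2^{52} = 3
\end{align*}
```

So the trick inherits whatever rounding the fp64 add uses. Under round-toward-zero, src = 2.75 would likewise come out as 2 rather than 3.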
Re: [Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
I like it :-).
That said, there's some caveats as discussed on IRC - in particular for
gpus which don't do round-to-nearest-even for ordinary fp64 math (or
rounding mode could be set to something else manually) it won't do the
right thing.
And if you can have fast-math enabled, then it probably won't round at
all (at least I think it would be legal to eliminate the add/sub in this
case).
So I'm not entirely sure anymore if this can be used unconditionally.
But I can't really tell if those potential caveats actually matter, hence
Reviewed-by: Roland Scheidegger

On 28.01.19 at 18:31, Matt Turner wrote:
> Use the trick of adding and then subtracting 2**52 (52 is the number of
> explicit mantissa bits a double-precision floating-point value has) to
> implement round-to-even.
>
> Cuts the number of instructions on SKL of the piglit test
> fs-roundEven-double.shader_test from 109 to 21.
> ---
>  src/compiler/nir/nir_lower_double_ops.c | 56 ++---
>  1 file changed, 12 insertions(+), 44 deletions(-)
>
> diff --git a/src/compiler/nir/nir_lower_double_ops.c b/src/compiler/nir/nir_lower_double_ops.c
> index 4d4cdf635ea..054fce9c168 100644
> --- a/src/compiler/nir/nir_lower_double_ops.c
> +++ b/src/compiler/nir/nir_lower_double_ops.c
> @@ -392,50 +392,18 @@ lower_fract(nir_builder *b, nir_ssa_def *src)
>  static nir_ssa_def *
>  lower_round_even(nir_builder *b, nir_ssa_def *src)
>  {
> -   /* If fract(src) == 0.5, then we will have to decide the rounding direction.
> -    * We will do this by computing the mod(abs(src), 2) and testing if it
> -    * is < 1 or not.
> -    *
> -    * We compute mod(abs(src), 2) as:
> -    * abs(src) - 2.0 * floor(abs(src) / 2.0)
> -    */
> -   nir_ssa_def *two = nir_imm_double(b, 2.0);
> -   nir_ssa_def *abs_src = nir_fabs(b, src);
> -   nir_ssa_def *mod =
> -      nir_fsub(b,
> -               abs_src,
> -               nir_fmul(b,
> -                        two,
> -                        nir_ffloor(b,
> -                                   nir_fmul(b,
> -                                            abs_src,
> -                                            nir_imm_double(b, 0.5)))));
> -
> -   /*
> -    * If fract(src) != 0.5, then we round as floor(src + 0.5)
> -    *
> -    * If fract(src) == 0.5, then we have to check the modulo:
> -    *
> -    *   if it is < 1 we need a trunc operation so we get:
> -    *      0.5 -> 0,   -0.5 -> -0
> -    *      2.5 -> 2,   -2.5 -> -2
> -    *
> -    *   otherwise we need to check if src >= 0, in which case we need to round
> -    *   upwards, or not, in which case we need to round downwards so we get:
> -    *      1.5 -> 2,   -1.5 -> -2
> -    *      3.5 -> 4,   -3.5 -> -4
> -    */
> -   nir_ssa_def *fract = nir_ffract(b, src);
> -   return nir_bcsel(b,
> -                    nir_fne(b, fract, nir_imm_double(b, 0.5)),
> -                    nir_ffloor(b, nir_fadd(b, src, nir_imm_double(b, 0.5))),
> -                    nir_bcsel(b,
> -                              nir_flt(b, mod, nir_imm_double(b, 1.0)),
> -                              nir_ftrunc(b, src),
> -                              nir_bcsel(b,
> -                                        nir_fge(b, src, nir_imm_double(b, 0.0)),
> -                                        nir_fadd(b, src, nir_imm_double(b, 0.5)),
> -                                        nir_fsub(b, src, nir_imm_double(b, 0.5)))));
> +   /* Add and subtract 2**52 to round off any fractional bits. */
> +   nir_ssa_def *two52 = nir_imm_double(b, (double)(1ull << 52));
> +   nir_ssa_def *sign = nir_iand(b, nir_unpack_64_2x32_split_y(b, src),
> +                                nir_imm_int(b, 1ull << 31));
> +
> +   b->exact = true;
> +   nir_ssa_def *res = nir_fsub(b, nir_fadd(b, nir_fabs(b, src), two52), two52);
> +   b->exact = false;
> +
> +   return nir_bcsel(b, nir_flt(b, nir_fabs(b, src), two52),
> +                    nir_pack_64_2x32_split(b, nir_unpack_64_2x32_split_x(b, res),
> +                                           nir_ior(b, nir_unpack_64_2x32_split_y(b, res), sign)), src);
>  }
>
>  static nir_ssa_def *
>
[Mesa-dev] [PATCH] nir: Optimize double-precision lower_round_even()
Use the trick of adding and then subtracting 2**52 (52 is the number of
explicit mantissa bits a double-precision floating-point value has) to
implement round-to-even.

Cuts the number of instructions on SKL of the piglit test
fs-roundEven-double.shader_test from 109 to 21.
---
 src/compiler/nir/nir_lower_double_ops.c | 56 ++---
 1 file changed, 12 insertions(+), 44 deletions(-)

diff --git a/src/compiler/nir/nir_lower_double_ops.c b/src/compiler/nir/nir_lower_double_ops.c
index 4d4cdf635ea..054fce9c168 100644
--- a/src/compiler/nir/nir_lower_double_ops.c
+++ b/src/compiler/nir/nir_lower_double_ops.c
@@ -392,50 +392,18 @@ lower_fract(nir_builder *b, nir_ssa_def *src)
 static nir_ssa_def *
 lower_round_even(nir_builder *b, nir_ssa_def *src)
 {
-   /* If fract(src) == 0.5, then we will have to decide the rounding direction.
-    * We will do this by computing the mod(abs(src), 2) and testing if it
-    * is < 1 or not.
-    *
-    * We compute mod(abs(src), 2) as:
-    * abs(src) - 2.0 * floor(abs(src) / 2.0)
-    */
-   nir_ssa_def *two = nir_imm_double(b, 2.0);
-   nir_ssa_def *abs_src = nir_fabs(b, src);
-   nir_ssa_def *mod =
-      nir_fsub(b,
-               abs_src,
-               nir_fmul(b,
-                        two,
-                        nir_ffloor(b,
-                                   nir_fmul(b,
-                                            abs_src,
-                                            nir_imm_double(b, 0.5)))));
-
-   /*
-    * If fract(src) != 0.5, then we round as floor(src + 0.5)
-    *
-    * If fract(src) == 0.5, then we have to check the modulo:
-    *
-    *   if it is < 1 we need a trunc operation so we get:
-    *      0.5 -> 0,   -0.5 -> -0
-    *      2.5 -> 2,   -2.5 -> -2
-    *
-    *   otherwise we need to check if src >= 0, in which case we need to round
-    *   upwards, or not, in which case we need to round downwards so we get:
-    *      1.5 -> 2,   -1.5 -> -2
-    *      3.5 -> 4,   -3.5 -> -4
-    */
-   nir_ssa_def *fract = nir_ffract(b, src);
-   return nir_bcsel(b,
-                    nir_fne(b, fract, nir_imm_double(b, 0.5)),
-                    nir_ffloor(b, nir_fadd(b, src, nir_imm_double(b, 0.5))),
-                    nir_bcsel(b,
-                              nir_flt(b, mod, nir_imm_double(b, 1.0)),
-                              nir_ftrunc(b, src),
-                              nir_bcsel(b,
-                                        nir_fge(b, src, nir_imm_double(b, 0.0)),
-                                        nir_fadd(b, src, nir_imm_double(b, 0.5)),
-                                        nir_fsub(b, src, nir_imm_double(b, 0.5)))));
+   /* Add and subtract 2**52 to round off any fractional bits. */
+   nir_ssa_def *two52 = nir_imm_double(b, (double)(1ull << 52));
+   nir_ssa_def *sign = nir_iand(b, nir_unpack_64_2x32_split_y(b, src),
+                                nir_imm_int(b, 1ull << 31));
+
+   b->exact = true;
+   nir_ssa_def *res = nir_fsub(b, nir_fadd(b, nir_fabs(b, src), two52), two52);
+   b->exact = false;
+
+   return nir_bcsel(b, nir_flt(b, nir_fabs(b, src), two52),
+                    nir_pack_64_2x32_split(b, nir_unpack_64_2x32_split_x(b, res),
+                                           nir_ior(b, nir_unpack_64_2x32_split_y(b, res), sign)), src);
 }
 
 static nir_ssa_def *
-- 
2.19.2
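For readers who want to try the lowering outside of NIR, here is a rough CPU-side sketch of the same sequence in plain C. It is not part of the patch: `bits_of`, `double_of` and `round_even_sketch` are hypothetical names, the 64-bit sign mask stands in for the 2x32 unpack/pack of the high word, and the code assumes IEEE 754 binary64 with the default round-to-nearest-even mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a double as its 64-bit pattern and back. */
static uint64_t bits_of(double d)
{
   uint64_t u;
   assert(sizeof(double) == sizeof(uint64_t));
   memcpy(&u, &d, sizeof u);
   return u;
}

static double double_of(uint64_t u)
{
   double d;
   memcpy(&d, &u, sizeof d);
   return d;
}

/* Sketch of the patched lower_round_even() on the CPU. */
static double round_even_sketch(double src)
{
   const double two52 = 4503599627370496.0;      /* 2**52 */
   const uint64_t sign_bit = UINT64_C(1) << 63;  /* bit 31 of the high word */
   uint64_t sign = bits_of(src) & sign_bit;
   double abs_src = double_of(bits_of(src) & ~sign_bit);
   /* volatile stops the compiler from simplifying (x + c) - c to x,
    * which is the role b->exact plays in the patch. */
   volatile double sum = abs_src + two52;
   double res = sum - two52;

   /* Inputs with magnitude >= 2**52 (and NaN) pass through unchanged. */
   if (!(abs_src < two52))
      return src;

   /* OR the original sign back in so that e.g. -0.5 becomes -0.0. */
   return double_of(bits_of(res) | sign);
}
```

Under those assumptions this returns 0.0, 2.0, 2.0 and 4.0 for 0.5, 1.5, 2.5 and 3.5, and -0.0 for -0.5, matching the cases listed in the comment the patch removes.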