> Then you probably want SIMD vector ops too, which, AFAIK, are not yet
> supported. FP math in Racket does use the SIMD unit on most targets,
> but normal math computes one value at a time, using only one slot per
> SIMD register, as opposed to the N slots available at the given precision.
> [This is the same as in C: if you want vector ops, you use SIMD
> intrinsics instead of the normal C operators.]

We already make heavy use of SIMD instructions in our main codebase, so I don't need Racket to do SIMD; I plan to use Racket only for offline analysis.

> How long do you want to wait for "truth" calculations. Done using
> either rationals (software bigint / bigint fractions), or bigfloats
> (software adjustable width FP) with results converted to rational for
> comparison, the truth calculation is going to be many orders of
> magnitude slower than hardware FP math.
>
> Do you have enough memory? Rationals can expand to fill all available
> space.

I can wait a while, but it can't be too slow, of course. If we're talking hours just to get a single computation done that involves only a handful of adds or multiplies, then this is untenable for me. In my experience, though, Racket is plenty fast for this simple case. Are there cases where it takes a surprising amount of extra time to perform a series of multiplies and adds?

As for memory, I have 32 GB to spare. Should I be concerned about this when my computations typically contain only a few multiplies or adds? (FYI, it's not guaranteed that I'll restrict myself to such simple cases. We perform many 4x4 matrix operations that I can definitely see myself looking into, some of which do orthonormalization or matrix inverses.)

> Perhaps some kind of relative error measurement would be more
> appropriate? Without knowing the algorithm in question, nobody can
> really give better suggestions.

Yes, for sure, but at the moment I only care about ULPs.

-Dale Kim

On Tuesday, April 10, 2018 at 1:48:16 AM UTC-7, gneuner2 wrote:
>
> On 4/10/2018 1:36 AM, dk...@insomniacgames.com wrote:
> > For the applications I work on, double precision floats are too costly
> > to use; although the CPU cycle count to operate on doubles tend to be
> > the same as single precision floats on modern hardware, the bandwidth
> > cost is too prohibitive.
> > We really do need single precision floats,
> > and in many cases, 16 bit half precision floats due to the bandwidth
> > savings.
>
> Then you probably want SIMD vector ops too, which, AFAIK, are not yet
> supported. FP math in Racket does use the SIMD unit on most targets,
> but normal math computes one value at a time, using only one slot per
> SIMD register, as opposed to the N slots available at the given precision.
> [This is the same as in C: if you want vector ops, you use SIMD
> intrinsics instead of the normal C operators.]
>
> In Racket, there are tricks you can play with typed arrays and/or unsafe
> operations to get more speed from bypassing the language's type
> safeguards ... but you won't get vector ops AFAIK unless you drop into C
> code.
>
> And again, there is no half precision available. Half precision is
> available only in GPUs or certain DSPs - no CPU implements it.
>
> > With regard to exactness, I don't need exactness to compare two single
> > precision floats. I would like to have exactness in the ground truth
> > that I compute to be able to calculate the error in the single
> > precision float version of the computation. The idea is that I
> > implement two versions of an algorithm. One uses the exact numbers
> > supported by Racket and the other would use single precision floats,
> > then I would like to compute error with (flulp-error x r) or something
> > similar.
>
> How long do you want to wait for "truth" calculations. Done using
> either rationals (software bigint / bigint fractions), or bigfloats
> (software adjustable width FP) with results converted to rational for
> comparison, the truth calculation is going to be many orders of
> magnitude slower than hardware FP math.
>
> Do you have enough memory? Rationals can expand to fill all available
> space.
>
> > Is there a better approach to do this kind of analysis?
>
> You really haven't specified any "analysis" per se.
> Thus far you have
> said only that you want to execute two versions of the same algorithm:
> one using exact (or maybe high precision float) values, and one using
> low (single) precision values, and compare the results.
>
> What you proposed is fine as far as it goes, but I question whether
> measuring ulps error really is what you want to do. That more typically
> would be done to compare answers computed to the same precision using
> different algorithms. In your case, the low precision value will likely
> lead to large errors vs the exact one - think about how intermediate
> values overflowing or underflowing might affect the end result.
>
> Perhaps some kind of relative error measurement would be more
> appropriate? Without knowing the algorithm in question, nobody can
> really give better suggestions.
>
> > -Dale Kim
>
> YMMV,
> George

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
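
For readers following the thread, here is a minimal sketch of the comparison discussed above: run the same small computation once with exact rationals and once with every intermediate result rounded to single precision, then express the difference in single-precision ULPs. It is written in Python with the standard library only, not Racket, purely to illustrate the idea. The helper names (`to_f32`, `f32_ulp`, `ulp_error`, `dot_exact`, `dot_f32`) are made up for this example, and `ulp_error` is only a rough analogue of the `(flulp-error x r)` call mentioned in the thread, not a reimplementation of it.

```python
import math
import struct
from fractions import Fraction


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value.

    struct.pack raises OverflowError if x is too large for binary32.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_ulp(x: float) -> Fraction:
    """Exact spacing between adjacent binary32 values at the magnitude of x."""
    if x == 0.0:
        return Fraction(1, 2**149)  # smallest binary32 subnormal
    _, e = math.frexp(abs(x))  # abs(x) == m * 2**e with 0.5 <= m < 1
    # 24-bit significand; clamp to the subnormal spacing for tiny values.
    return Fraction(2) ** max(e - 24, -149)


def ulp_error(approx: float, exact: Fraction) -> float:
    """Hypothetical helper: |approx - exact| measured in binary32 ULPs of approx."""
    return float(abs(Fraction(approx) - exact) / f32_ulp(approx))


def dot_exact(a, b) -> Fraction:
    """Ground truth: dot product in exact rational arithmetic."""
    return sum(Fraction(x) * Fraction(y) for x, y in zip(a, b))


def dot_f32(a, b) -> float:
    """Dot product with each multiply and add rounded to binary32.

    Assumes the inputs are already binary32 values. Computing one + or *
    in binary64 and then rounding to binary32 gives the same result as a
    native binary32 operation, so this emulates single-precision math.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(x * y))
    return acc


# Inputs are rounded to binary32 first so both versions see identical data.
a = [to_f32(v) for v in (1.0, 2.0**-25)]
b = [to_f32(v) for v in (1.0, 1.0)]
print(ulp_error(dot_f32(a, b), dot_exact(a, b)))  # 0.25
```

With only a handful of adds and multiplies the exact rationals stay small, so the ground-truth pass is cheap. The cost and memory concerns raised in the thread become relevant when denominators grow, for example across long chains of divisions such as repeated matrix inversions.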