Thank you to the multiple people who worked out the rounding proof for FP
ADD.  Now, I could use a little help with multiplication.

For the moment, I'm not going to bother with multiplication of numbers
where the normalization bit is zero.  This is an inconsistency with
add/sub, but it was easy enough there while a bit harder here.  If we
decide to correct this later, we can.  But at the moment, if you multiply
by a denormalized number, you'll get a result that's smaller than it should
be, in those cases where it doesn't underflow anyway.

So, making this simplification, after denormalization we get two 24-bit
numbers in the fixed-point format 1.23.  These numbers will be anywhere
between (inclusive) 0x800000 and 0xFFFFFF.

The result coming out of the multiplier circuit is in the format 2.46.  The
smallest result is 0x400000000000, and the largest is 0xFFFFFE000001.  So
at worst, we have to right shift by one, and we have a stage for that
already right after rounding.  We'll right-shift the result by a constant
23 (which is free) out of the multiplier, and then use that last stage to
right shift once more.

Now, there's the issue of rounding:  If you look at the C version I wrote,
I right shift by 22 bits, then use that lowest bit to decide if I need to
round or not.  Can that bit, in fact, ever be non-zero?  If it is, then is
it possible to overflow such that we'd have to right shift by 2 to properly
normalize?

I could just not round, and that would solve the problem, but if we can
avoid a bit of round-off error, that would be great, and this is already
going through the rounding stage written for add/sub, so it doesn't cost
anything extra.

While someone else tries to do an analytical proof, I'm going to be an
idiot and write an exhaustive numerical proof, where I try all the
combinations.  We'll see who wins, although I have a huge head start.  :)
 Also, I'm an even bigger idiot, because I swear I've done this experiment
before, and I don't remember the result.


Is it good or bad I didn't specialize in theory of computation?  I've never
been good at proofs.  Let's find a way to equate this here graph theory
problem with 3SAT so we can prove the solution is at least as hard as the
hardest problem where a correct answer can be verified in polynomial time.
 PAIN!


-- 
Timothy Normand Miller, PhD
Assistant Professor of Computer Science, Binghamton University
http://www.cs.binghamton.edu/~millerti/<http://www.cse.ohio-state.edu/~millerti>
Open Graphics Project
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)

Reply via email to