It's impossible NOT to 'rearrange' +/ if you want speed. With 1 ALU,
you add items 0+1+2... in order, except: suppose your add latency is 4
cycles? Then you keep 4 accumulators and interleave the additions
0+4+8+12..., 1+5+9+13..., etc. But now AVX comes along and you add 4 at
a time, so with 4 accumulators the first lane is adding 0+16+32+... etc.,
and then there are 6 ways to combine the accumulators.
With AVX512 everything changes again.
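
A minimal sketch of the 4-accumulator interleaving described above, in
Python rather than J or the engine's C. Python will not show any speed
benefit; this only models the order of the additions. The function names,
the sample data, and the choice of (a0+a1)+(a2+a3) as the final combine
are assumptions for illustration, not how any particular J release does it.

```python
def sum_sequential(xs):
    """Add items strictly left to right: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_interleaved(xs):
    """Model 4 independent accumulators: item i goes to accumulator i % 4."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for i, x in enumerate(xs):
        acc[i % 4] += x
    # One of several possible orders for combining the accumulators.
    return (acc[0] + acc[1]) + (acc[2] + acc[3])


# Illustrative data: the exact sum is 6, but 1e19 absorbs any 1 added to it.
xs = [1e19, 1.0, 1.0, 1.0, -1e19, 1.0, 1.0, 1.0]

print(sum_sequential(xs))   # 3.0
print(sum_interleaved(xs))  # 6.0
```

Here the interleaved order happens to give the exact answer and the
sequential one does not; other data can make it go the other way.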
Henry Rich
On 1/9/2023 8:32 PM, Marshall Lochbaum wrote:
Well, true, I'm not in favor of rearranging +/ either. The dangers of
floating point don't include nondeterminism, unless you make them.
However, I also think matrix products have it worse. Numbers with widely
varying exponents are a bit of an edge case. But when you're multiplying
a few large matrices together they can show up naturally, so I expect
it's not so rare to have a product that's numerically stable in one
direction and not in the other.
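
A small sketch of a matrix chain that is accurate when grouped one way and
not the other, assuming IEEE double precision. It is written in Python
rather than J, and the matrices A, B, C and the helper matmul are made up
for illustration. The exact product is 2.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
    return out


A = [[1.0, 1.0]]                  # 1x2
B = [[1e19, 1.0], [-1e19, 1.0]]   # 2x2, entries with widely varying exponents
C = [[1.0], [1.0]]                # 2x1

# (A B) C: the large entries cancel exactly before the small ones are added.
left = matmul(matmul(A, B), C)
# A (B C): each 1 is absorbed into +/-1e19 first, then the cancellation
# leaves nothing behind.
right = matmul(A, matmul(B, C))

print(left)   # [[2.0]]
print(right)  # [[0.0]]
```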
Marshall
On Mon, Jan 09, 2023 at 05:52:34PM -0600, Omar Antolín Camarena wrote:
But that's just normal floating-point non-associativity. It happens even
for addition of "integers":
   1 + (_1e19 + 1e19)
1
   (1 + _1e19) + 1e19
0
People using floating point are probably aware of the dangers or at least
should be.
--
Omar
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm