It's impossible NOT to 'rearrange' +/ if you want speed.  With 1 ALU, you add 0+1+2+..., except: suppose your add latency is 4 cycles?  Then you keep 4 accumulators and interleave the additions: 0+4+8+12, 1+5+9+13, etc.
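
To make the reordering concrete, here is a minimal Python sketch. It is an illustration only, not J's actual implementation, and the names `sequential_sum` and `strided_sum` are made up for this note. It keeps `lanes` independent accumulators, as a pipelined adder would, and combines them left to right at the end. The loops are written out because Python's built-in `sum` may apply compensated summation on recent versions, which would hide the effect.

```python
def sequential_sum(xs):
    """Add strictly left to right with a single accumulator."""
    total = 0.0
    for x in xs:
        total += x
    return total


def strided_sum(xs, lanes=4):
    """Add with `lanes` interleaved accumulators: lane k sees xs[k], xs[k+lanes], ..."""
    acc = [0.0] * lanes
    for i, x in enumerate(xs):
        acc[i % lanes] += x
    # Combine the partial sums left to right (one of several possible orders).
    total = 0.0
    for a in acc:
        total += a
    return total


# 1e16 is large enough that adding 1.0 to it is lost to rounding.
data = [1e16, 1.0, 1.0, 1.0, -1e16, 1.0, 1.0, 1.0]
print(sequential_sum(data))  # 3.0
print(strided_sum(data))     # 6.0 (the exact sum)
```

Neither order is more accurate in general; with this particular input the interleaved order happens to land on the exact sum.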

But now AVX comes along and you add 4 at a time, so with 4 accumulators you are adding 0+16+32+..., etc., and then there are 6 ways to combine the accumulators.
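
The final combining step is itself a choice of order. A small Python sketch, using four made-up accumulator values chosen to exaggerate the effect, shows three of the possible orders producing three different results. It does not try to enumerate every order.

```python
# Hypothetical partial sums left in four accumulators.
a, b, c, d = 1e16, 1.0, -1e16, 1.0

left_to_right = ((a + b) + c) + d   # 1.0
adjacent_pairs = (a + b) + (c + d)  # 0.0
strided_pairs = (a + c) + (b + d)   # 2.0 (the exact sum)

print(left_to_right, adjacent_pairs, strided_pairs)
```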

With AVX512 everything changes again.

Henry Rich

On 1/9/2023 8:32 PM, Marshall Lochbaum wrote:
Well, true, I'm not in favor of rearranging +/ either. The dangers of
floating point don't include nondeterminism, unless you make them.

However, I also think matrix products have it worse. Numbers with widely
varying exponents are a bit of an edge case. But when you're multiplying
a few large matrices together they can show up naturally, so I expect
it's not so rare to have a product that's numerically stable in one
direction and not in the other.
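
As a rough illustration of a product that behaves in one direction and not in the other, here is a Python sketch with made-up matrices. Overflow is used as an extreme stand-in for loss of accuracy, and `matmul` is a hypothetical helper written for this note, not a library routine.

```python
def matmul(x, y):
    """Plain triple-loop product of two matrices given as lists of rows."""
    rows, inner, cols = len(x), len(y), len(y[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += x[i][k] * y[k][j]
    return out


big = 2.0 ** 600     # large but finite
tiny = 2.0 ** -600   # its exact reciprocal

A = [[big, big], [big, big]]
B = [[big, big], [big, big]]
C = [[tiny, tiny], [tiny, tiny]]

left = matmul(matmul(A, B), C)    # A*B overflows first: every entry is inf
right = matmul(A, matmul(B, C))   # B*C is moderate: every entry is 2.0**602

print(left[0][0], right[0][0])
```

Multiplying left to right overflows to infinity, while multiplying right to left stays finite and exact for these inputs.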

Marshall

On Mon, Jan 09, 2023 at 05:52:34PM -0600, Omar Antolín Camarena wrote:
But that's just normal floating-point non-associativity. It happens even for
addition of "integers":

    1 + (_1e19 + 1e19)
1
    (1 + _1e19) + 1e19
0

People using floating point are probably aware of the dangers or at least 
should be.

--
Omar
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm