Steven D'Aprano <[EMAIL PROTECTED]> wrote: > If the values vary greatly in magnitude, you probably want to add them > from smallest to biggest; other than that, how else can you calculate the > mean? > It doesn't have to be smallest to largest, but the important thing is not to be adding the millionth element to the sum of the preceding 999999 values. So if I remember correctly one option is to add pairs of numbers, then sum the pairs and so on.
I think my point really is that nobody is going to bother doing this when they just want to sum a few numbers, and I'm not really surprised that the builtin sum doesn't do it either, but I was slightly surprised that numpy doesn't try to minimise loss of precision. Here's a quick attempt at implementing it which demonstrates how the builtin sum loses precision somewhat faster than a pairwise sum (and is rather dependant on the input order). sumpair is of course a lot slower than the builtin sum, but it might be interesting to see how a C coded version compared. I can't see that it would necessarily be much slower: it just adds some counters, bit twiddling and a short temporary array to the calculation. C:\temp>sumpairs.py builtin sum 500000.00000083662 pairwise sum 500000.00000499998 sorted builtin sum 500000.00000499998 pairwise sum 500000.00000499998 reverse sorted builtin sum 500000.0 pairwise sum 500000.00000500004 all the same builtin sum 1000000.0000016732 pairwise sum 1000000.00001 ------------ sumpairs.py ------------------------------------------ # Sum implemented so that intermediate results sum a similar number # of the input values. def sumpairs(seq): tmp = [] for i,v in enumerate(seq): if i&1: tmp[-1] += v i = i + 1 n = i & -i while n > 2: t = tmp.pop(-1) tmp[-1] = tmp[-1] + t n >>= 1 else: tmp.append(v) tmp.reverse() return sum(tmp) v = [1000000, 0.00001] * 1000000 print "builtin sum ", repr(sum(v)/len(v)) print "pairwise sum", repr(sumpairs(v)/len(v)) print "sorted" v.sort() print "builtin sum ", repr(sum(v)/len(v)) print "pairwise sum", repr(sumpairs(v)/len(v)) print "reverse sorted" v.reverse() print "builtin sum ", repr(sum(v)/len(v)) print "pairwise sum", repr(sumpairs(v)/len(v)) print "all the same" v = [1000000.00001] * 1000000 print "builtin sum ", repr(sum(v)/len(v)) print "pairwise sum", repr(sumpairs(v)/len(v))----------------------------- ---------------------------------- In case the code is a bit hard to follow this version makes it clearer what is happening: ------------- sumpair2.py --------------- def sumpairs(seq): tmp = [] for i,v in enumerate(seq): if i&1: tmp[-1] += v print "add ", tmp i = i + 1 n = i & -i while n > 2: t = tmp.pop(-1) tmp[-1] = tmp[-1] + t n >>= 1 print "reduce", tmp else: tmp.append(v) print "append", tmp tmp.reverse() return sum(tmp) print sumpairs([1]*40) ----------------------------------------- which gives the output: append [1] add [2] append [2, 1] add [2, 2] reduce [4] append [4, 1] add [4, 2] append [4, 2, 1] add [4, 2, 2] reduce [4, 4] reduce [8] append [8, 1] add [8, 2] append [8, 2, 1] add [8, 2, 2] reduce [8, 4] append [8, 4, 1] add [8, 4, 2] append [8, 4, 2, 1] add [8, 4, 2, 2] reduce [8, 4, 4] reduce [8, 8] reduce [16] append [16, 1] add [16, 2] append [16, 2, 1] add [16, 2, 2] reduce [16, 4] append [16, 4, 1] add [16, 4, 2] append [16, 4, 2, 1] add [16, 4, 2, 2] reduce [16, 4, 4] reduce [16, 8] append [16, 8, 1] add [16, 8, 2] append [16, 8, 2, 1] add [16, 8, 2, 2] reduce [16, 8, 4] append [16, 8, 4, 1] add [16, 8, 4, 2] append [16, 8, 4, 2, 1] add [16, 8, 4, 2, 2] reduce [16, 8, 4, 4] reduce [16, 8, 8] reduce [16, 16] reduce [32] append [32, 1] add [32, 2] append [32, 2, 1] add [32, 2, 2] reduce [32, 4] append [32, 4, 1] add [32, 4, 2] append [32, 4, 2, 1] add [32, 4, 2, 2] reduce [32, 4, 4] reduce [32, 8] 40 -- http://mail.python.org/mailman/listinfo/python-list