Jussi Salmela <[EMAIL PROTECTED]> wrote:
> I've run a couple of tests and it seems to me that Dennis Lee Bieber is
> on the trail of the truth when he claims that smallest magnitude to
> the largest is the way to do the summation. Actually it isn't THE way
> although it diminishes the error. I was sort of astonished because I
> also had somewhere along the years formed the same notion as Dennis.
>
> "Your" pairwise method beats the former method by a large margin
> although I don't understand why. To tell you the truth: I started out to
> show you were wrong because intuitively (for me at least) the former
> method should work better than yours.
I think Dennis is spot on when he says that smallest to largest is the
best way. What he has missed, though, is that if you have a list of
similar-sized values, then after you have added the first two together
the next addition still needs to follow the smallest-to-largest rule,
so it shouldn't include the result of the first addition. One option,
if you have access to the full list when you start, would be to
reinsert each partial sum back into the sorted list as you calculate
it: a heap queue might be an appropriate implementation for that.

The pairwise sum gives you most, although not quite all, of the
benefit of sorting the data: obviously, if the input values are
identical it is almost exactly equivalent to adding the smallest
values and then inserting the partial result back into the list. Even
in the 1000000, 0.00001 case it avoids most of the precision-losing
additions, such as 500000000005.0 + 0.00001, which the simple sum
hits. It also has the further advantages of still working with other
summable objects (e.g. lists), so long as the operation is
associative, and of not requiring you to know all the values, or even
the length of the sequence, in advance.

For floating point you can of course use methods to preserve the lost
bits, such as the Kahan sum, or even just do all intermediate
calculations to a higher internal precision.

By the way, I found another bug in my sumpairs: += is a very bad idea
here if the input can contain mutables, so the relevant line should be:

    tmp[-1] = tmp[-1] + v

Once I fixed that one, so it no longer attempts to modify lists
in-place, it appears that the more balanced intermediate results give
a benefit when summing lists as well: sumpairs on a list of lists
seems to beat the builtin sum as soon as the length of the list
exceeds about 210 items. At 1000 items it is about 3 times faster, and
30 times faster at 10,000 items.

If I get the time I think I really should code up a C version of
sumpairs and see how that compares against Python's builtin sum for
numerical sums.

I guess either everyone else missed that computer science lecture I
thought I remembered, or maybe I was the one who was asleep and I
dreamed it all. :)

I've appended a few sketches below: the heap-based reinsertion, a
quick demonstration of that lost addition, a Kahan sum, and sumpairs
with the fix in place.
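For the heap-based reinsertion, something along these lines should do:
heapq gives you a min-heap, so keying each entry on abs() keeps the
two smallest-magnitude values on top (the function name here is just
for illustration, it isn't from any posted code):

    import heapq

    def smallest_first_sum(values):
        # Pair each value with its magnitude so the min-heap always
        # yields the smallest-magnitude value first.
        heap = [(abs(v), v) for v in values]
        heapq.heapify(heap)
        if not heap:
            return 0.0
        # Repeatedly add the two smallest values and push the partial
        # sum back into the heap, as described above.
        while len(heap) > 1:
            _, a = heapq.heappop(heap)
            _, b = heapq.heappop(heap)
            s = a + b
            heapq.heappush(heap, (abs(s), s))
        return heap[0][1]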
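The lost addition is easy to see at the interactive prompt: 0.00001 is
below half a unit in the last place of 500000000005.0, so adding it
changes nothing at all:

    >>> 500000000005.0 + 0.00001
    500000000005.0
    >>> 500000000005.0 + 0.00001 == 500000000005.0
    True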
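The Kahan sum, for reference, is only a few lines; this is the
textbook form rather than anything tuned:

    def kahan_sum(values):
        total = 0.0
        c = 0.0               # running compensation for lost low-order bits
        for v in values:
            y = v - c         # apply the correction from the previous step
            t = total + y     # low-order bits of y may be lost here...
            c = (t - total) - y   # ...so recover them algebraically
            total = t
        return total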
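And for anyone who doesn't have the earlier post to hand, here is a
pairwise sum along the lines of my sumpairs with the corrected line in
place (a sketch, not necessarily character-for-character what I
benchmarked):

    def sumpairs(seq):
        # Stack of partial sums: each entry holds the sum of a
        # power-of-two sized block of consecutive items.
        tmp = []
        for i, v in enumerate(seq):
            if i & 1:
                # Second item of a pair: fold it into the block on
                # top.  Plain + rather than +=, so mutable values such
                # as lists are never modified in-place.
                tmp[-1] = tmp[-1] + v
                # Whenever the two topmost blocks have grown to the
                # same size, merge them; the lowest set bit of the
                # item count says how many merges are due.
                n = (i + 1) & -(i + 1)
                while n > 2:
                    t = tmp.pop()
                    tmp[-1] = tmp[-1] + t
                    n >>= 1
            else:
                tmp.append(v)
        # Fold any unpaired blocks left over when the length of the
        # sequence is not a power of two.
        while len(tmp) > 1:
            t = tmp.pop()
            tmp[-1] = tmp[-1] + t
        return tmp[0]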