On 01/21/2015 09:27 AM, Arne Scheffer wrote:
Sorry, this is a corrected second try because of copy&paste mistakes:
VlG-Arne
Comments appreciated.
Definition var_samp = (Sum of squared differences) / (n-1)
Definition stddev_samp = sqrt(var_samp)
Example N=4
1.) Sum of squared differences
1_4Sum(Xi-XM4)²
=
2.) adding nothing
1_4Sum(Xi-XM4)²
+0
+0
+0
=
3.) nothing changed
1_4Sum(Xi-XM4)²
+(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
+(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
+(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
=
4.) parts reordered
(1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
+(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
+(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
+1_1Sum(X1-XM1)²
=
5.)
(X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-XM1)
+ (X1-XM1)²
=
6.) XM1=X1 => There it is - the iteration part of Welford's algorithm
(in reverse order)
(X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-X1)
+ 0
The missing piece is the step from 4.) to 5.);
it's algebra, see e.g.:
http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
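For what it's worth, the step from 4.) to 5.) rests on the identity that adding the k-th value grows the sum of squared differences by exactly (Xk-XMk)(Xk-XM(k-1)), where XMk is the mean of the first k values. Below is a minimal Python sketch that checks this numerically. It is only an illustration, not the code under discussion; the function names and the sample values are made up for the example.

```python
import math


def welford_sum_sq(xs):
    """Accumulate the sum of squared differences one value at a time.

    Returns (n, mean, m2), where m2 is the running sum of
    (Xk - XM(k-1)) * (Xk - XMk) over all values seen so far.
    """
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean          # Xk - XM(k-1), using the old mean
        mean += delta / n         # mean is now XMk
        m2 += delta * (x - mean)  # (Xk - XM(k-1)) * (Xk - XMk)
    return n, mean, m2


def direct_sum_sq(xs):
    """Two-pass reference: sum of (Xi - XMn)^2 over the whole list."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Hypothetical sample with N=4, matching the size used in the example above.
sample = [2.0, 4.0, 4.0, 6.0]
n, mean, m2 = welford_sum_sq(sample)

var_samp = m2 / (n - 1)            # divisor n-1, as in the definition above
stddev_samp = math.sqrt(var_samp)
```

For the first value the added term is zero because XM1=X1, which is the "+ 0" in step 6.).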
I have no idea what you are saying here.
Here are comments in email to me from the author of
<http://www.johndcook.com/blog/standard_deviation> regarding the divisor
used:
My code is using the unbiased form of the sample variance, dividing
by n-1.
It's usually not worthwhile to make a distinction between a sample
and a population because the "population" is often itself a sample.
For example, if you could measure the height of everyone on earth at
one instant, that's the entire population, but it's still a sample
from all who have lived and who ever will live.
Also, for large n, there's hardly any difference between 1/n and
1/(n-1).
Maybe I should add that in the code comments. Otherwise, I don't think
we need a change.
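To put a number on the "hardly any difference" remark, here is a small Python sketch using the standard library's statistics module, where variance() divides by n-1 and pvariance() divides by n. The helper name and the data are made up for illustration; the ratio of the two results is simply n/(n-1).

```python
import statistics


def divisor_ratio(xs):
    """Ratio of the n-1 (sample) variance to the n (population) variance."""
    return statistics.variance(xs) / statistics.pvariance(xs)


# Hypothetical data: a tiny sample and a larger one.
small = [2.0, 4.0, 4.0, 6.0]
large = [float(i % 10) for i in range(10_000)]

ratio_small = divisor_ratio(small)   # n/(n-1) with n = 4
ratio_large = divisor_ratio(large)   # n/(n-1) with n = 10000
```

With four values the two variances differ by a third; with ten thousand values they differ by about one part in ten thousand.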
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers