On 01/21/2015 09:27 AM, Arne Scheffer wrote:
Sorry, here is a corrected second try (the first had copy-and-paste mistakes):
Best regards, Arne

Comments appreciated.
Definition: var_samp = (sum of squared differences from the mean) / (n-1)
Definition: stddev_samp = sqrt(var_samp)
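A minimal Python sketch of the two definitions, for illustration only. The function names mirror the SQL aggregates, but this is not PostgreSQL code. Returning None for fewer than two values is an assumption made here because the n-1 divisor would be zero:

```python
import math

def var_samp(xs):
    """Sum of squared differences from the mean, divided by n-1."""
    n = len(xs)
    if n < 2:
        return None  # assumption: treat n < 2 as undefined (divisor n-1 would be 0)
    mean = sum(xs) / n
    sum_sq_diff = sum((x - mean) ** 2 for x in xs)
    return sum_sq_diff / (n - 1)

def stddev_samp(xs):
    """Square root of the sample variance."""
    v = var_samp(xs)
    return None if v is None else math.sqrt(v)

print(var_samp([2, 4, 4, 4, 5, 5, 7, 9]))  # 32/7, about 4.571
```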
Example N=4 (notation: 1_kSum(...) is the sum over i = 1..k; XMk is the mean of X1..Xk)
1.) Sum of squared differences
   1_4Sum(Xi-XM4)²
=
2.) adding nothing
   1_4Sum(Xi-XM4)²
  +0
  +0
  +0
=
3.) nothing changed
  1_4Sum(Xi-XM4)²
  +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
  +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
  +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
=
4.) parts reordered
   (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
  +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
  +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
  +1_1Sum(Xi-XM1)²
=
5.) each bracket simplified, using 1_kSum(Xi-XMk)² - 1_(k-1)Sum(Xi-XM(k-1))² = (Xk-XMk)(Xk-XM(k-1))
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-XM1)
+ (X1-XM1)²
=
6.) Since XM1 = X1: there it is, the iteration step of Welford's algorithm (in reverse order)
   (X4-XM4)(X4-XM3)
+ (X3-XM3)(X3-XM2)
+ (X2-XM2)(X2-X1)
+ 0
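Steps 1.) to 6.) can be checked numerically with a hedged Python sketch; the helper names are hypothetical. `ssd` computes 1_kSum(Xi-XMk)², `telescoped` adds up the terms of step 6.), and `welford_var_samp` is the usual forward-order online update, which accumulates the same (Xk-XMk)(Xk-XM(k-1)) terms one value at a time:

```python
def ssd(xs):
    """1_kSum(Xi-XMk)^2: squared differences of xs from their own mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def telescoped(xs):
    """Step 6: sum over k >= 2 of (Xk-XMk)(Xk-XM(k-1)); the k = 1 term is 0."""
    total = 0.0
    for k in range(2, len(xs) + 1):
        xm_k = sum(xs[:k]) / k              # XMk
        xm_km1 = sum(xs[:k - 1]) / (k - 1)  # XM(k-1)
        x_k = xs[k - 1]                     # Xk
        total += (x_k - xm_k) * (x_k - xm_km1)
    return total

def welford_var_samp(xs):
    """Online form: one pass, keeping only n, the running mean and m2."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean          # x - XM(n-1)
        mean += delta / n         # now XMn
        m2 += delta * (x - mean)  # adds (x - XM(n-1)) * (x - XMn)
    return m2 / (n - 1) if n >= 2 else None

xs = [3.0, 1.0, 4.0, 1.0]
print(ssd(xs), telescoped(xs), welford_var_samp(xs))
```

For xs = [3.0, 1.0, 4.0, 1.0] both ssd and telescoped come to 6.75 (up to floating-point rounding), so the online version gives 6.75 / 3 = 2.25. The telescoped function recomputes every mean and exists only to mirror the derivation; the online form is the one worth using in practice.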
The missing piece is the step from 4.) to 5.); it's plain algebra, see e.g.:
http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
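A sketch of that algebra for k >= 2, writing S_k for 1_kSum(Xi-XMk)² and \bar{X}_k for XMk (symbols introduced here only for readability). The only fact used is that k times the mean of the first k values equals (k-1) times the mean of the first k-1 values plus X_k:

```latex
\begin{aligned}
S_k &= \sum_{i=1}^{k} (X_i - \bar{X}_k)^2 = \sum_{i=1}^{k} X_i^2 - k\,\bar{X}_k^2 \\
S_k - S_{k-1} &= X_k^2 - k\,\bar{X}_k^2 + (k-1)\,\bar{X}_{k-1}^2 \\
(X_k - \bar{X}_k)(X_k - \bar{X}_{k-1}) &= X_k^2 - X_k(\bar{X}_k + \bar{X}_{k-1}) + \bar{X}_k\bar{X}_{k-1} \\
&= X_k^2 - k\,\bar{X}_k^2 + (k-1)\,\bar{X}_{k-1}^2
   \qquad \text{using } X_k = k\,\bar{X}_k - (k-1)\,\bar{X}_{k-1} \\
&= S_k - S_{k-1}
\end{aligned}
```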





I have no idea what you are saying here.

Here are comments emailed to me by the author of <http://www.johndcook.com/blog/standard_deviation> regarding the divisor used:

   My code is using the unbiased form of the sample variance, dividing
   by n-1.

   It's usually not worthwhile to make a distinction between a sample
   and a population because the "population" is often itself a sample.
   For example, if you could measure the height of everyone on earth at
   one instant, that's the entire population, but it's still a sample
   from all who have lived and who ever will live.

   Also, for large n, there's hardly any difference between 1/n and
   1/(n-1).
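The large-n point can be checked with Python's standard `statistics` module, whose `variance` divides by n-1 and `pvariance` by n. On any data their ratio is n/(n-1), so the relative gap is 1/(n-1). The data below is arbitrary and only for illustration:

```python
import statistics

def divisor_gap(xs):
    """Relative difference between the n-1 and n divisors on the same data."""
    return statistics.variance(xs) / statistics.pvariance(xs) - 1

for n in (4, 100, 10_000):
    xs = [float(i % 7) for i in range(n)]  # arbitrary illustrative data
    print(n, divisor_gap(xs), 1 / (n - 1))
```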


Maybe I should add that in the code comments. Otherwise, I don't think we need a change.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)