Re: [Jprogramming] Welford's method for standard deviations

Mike Day Wed, 14 Sep 2011 06:19:23 -0700

Here's a somewhat J-ish way to do the job.  It's ok for small data
sets but falls down on a few million cases.


NB. given x,  append a reversed column of k, then append a row: 0 0
NB. If you like, this reverts to origin zero, setting M0 = 0
NB. x gets reversed on the way,  but I don't think that usually matters
NB. If it does matter,  just do Mk on |.x

makexM =: 0 ,~ ] ,.~ [: >: [: i. [: - #

NB. carry out Mk-1 + (xk - Mk-1)/k on (xk, xk-1,. Mk, Mk-1)
NB. and prepend xk to produce xk, Mk

    mk =: ({:@] , {:@[ - -&{: % {.@])

NB. Put these verbs together and clean up the resulting 2 column
NB. array of (reversed) x and Mk on reversed x:

    Mk =:  }.@: |.@: (mk ~/\.) @: makexM

NB. diffs xk - Mk

    difxk_Mk =: -/"1

NB. diffs xk - Mk-1

    difxk_Mkm1 =: }:@:(-~//.)

NB. combine these for Sk:

    Sk =: +/@: (difxk_Mk * difxk_Mkm1) @: Mk

NB. eg given x5
    x5

0.971379 0.0466292 0.330109 0.01346 0.220004

    Mk x5

  0.220004 0.220004
   0.01346 0.116732
  0.330109 0.187858
0.0466292 0.152551
  0.971379 0.316316

    Sk x5

0.603026

    ssdev x5  NB. cf stats.ijs verb

0.603026

It's a bit quicker than Victor's Wstd.

However, Sk ran out of space on his
    a=:1e9+2e7?@$0;
It does run with
    a=:1e9+5e6?@$0,
but is then much slower than ssdev (about 60
times!).  On my machine, Wstd is another 4
times slower than that.   All 3 verbs gave
the same sum of squares for this value of a .

For practical purposes,  I should think ssdev
and stddev  will do for now.  Instability in
standard deviation might be associated with
problems with the data,  and I would begin to
worry about whether I should transform the
data and alter my conceptual model for the
system producing the results.  I don't suppose
many statisticians would be too concerned by
variations in estimates of a deviance of
1 part in a billion.

Mike

On 13/09/2011 8:23 PM, Johann Hibschman wrote:
> I just came across Welford's method for calculating standard deviations,
> and I realized I didn't know how to implement it in J.
>
> See
>
>    - http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
>    - 
> http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
>    - http://www.johndcook.com/standard_deviation.html
>
> for the various articles that got me wondering about this.
>
> Here's the algorithm, as stated in the above links.
>
>    1. Initialize m_1 = x_1 and s_1 = 0.
>    2. For subsequent xs,
>       m_k = m_{k-1} + (x_k - m_{k-1})/k
>       s_k = s_{k-1} + (x_k - m_{k-1})*(x_k - m_k)
>    3. Then the kth running standard deviation is:
>       stddev_k = s_k / (k - 1)
>
> I've not tried to switch it from 1-based indexing to 0-based indexing,
> or to use J-like phrasing in the problem, for fear I'd make a stupid
> mistake.
>
> How would I do this in J?
>
> Cheers,
> Johann
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] Welford's method for standard deviations

Reply via email to