Interesting problem.  Welford's method doesn't translate to J very well, 
but there might be other ways of achieving the same result.

The basic problem is that +/ doesn't give the right answer for very long 
arrays with large average values and some important small deviations. 
As more and more elements are added, the total gets large, so that as 
each successive number is added to the total, it is truncated with loss 
of precision.

One way to work around the problem would be to do the +/ in pieces, to 
give the fractions a fighting chance:

    a=:1e9+2e7?@$0
    stddev a
0.398581    NB. the problem: result should be 0.288675
    mean
+/ % #
    mean_z_ =: # %~ [: +/ _1e6&(+/\)
    stddev a
0.288687  NB. problem fixed

A problem with this method is that it requires analysis of the data to 
decide what size of infix is adequate.  For the specific case of finding 
the mean, you can take an approximate mean and refine it to a more 
accurate value, without analyzing the data:

    mean_z_ =: (] + (+/ % #)@:-) (+/ % #)
    stddev a
0.288687

These run pretty fast in J.

Henry Rich

On 9/14/2011 12:58 AM, Viktor Cerovski wrote:
>
>
> Johann Hibschman-4 wrote:
>>
>> I just came across Welford's method for calculating standard deviations,
>> and I realized I didn't know how to implement it in J.
>>
>> See
>>
>>    - http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
>>    -
>> http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/
>>    - http://www.johndcook.com/standard_deviation.html
>>
>> for the various articles that got me wondering about this.
>>
>> Here's the algorithm, as stated in the above links.
>>
>>    1. Initialize m_1 = x_1 and s_1 = 0.
>>    2. For subsequent xs,
>>       m_k = m_{k-1} + (x_k - m_{k-1})/k
>>       s_k = s_{k-1} + (x_k - m_{k-1})*(x_k - m_k)
>>    3. Then the kth running standard deviation is:
>>       stddev_k = s_k / (k - 1)
>>
>> I've not tried to switch it from 1-based indexing to 0-based indexing,
>> or to use J-like phrasing in the problem, for fear I'd make a stupid
>> mistake.
>>
>> How would I do this in J?
>>
>> Cheers,
>> Johann
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>>
>
> Very interesting. Here is one solution based on the reduce:
>
> We take three values in the accumulator, k, M and S,
> and the iteration is (using index starting with 1 notation):
>
>     f =. dyad : 0
>     'k M S'=.y
>     Mk=.M+k%~x-M
>     (k+1);Mk;S+(x-M)*x-Mk
> )
>
>
>
> The work is then done according to the scheme:
>
>     array f reduce acc0,
>
> where acc0 is initial value of k;M;S, giving the following:
>
>      Wstd =: }: f~ RL~ 2;0;~{:
>
>      RL =: 1 : 0
> :
>    for_e.y do.x=.x u e end.
>    x
> )
>
> Finally, we get the std. deviation from the result of Wstd as:
>
>     Wstddev =: [: %: [: ({: % 2-~{.)>@:Wstd
>
>     The code could be shortened a bit by a better arrangement
> of what comes from the left/right side (e.g using f RR instead of f~RL~),
> but I don't know how to make it significantly shorter than this.
>
>      Here RL is not crucial--Wstd should be doable with Insert using
> a slight modification of the above scheme, although I don't think
> the code will be much shorter (assuming that we don't count
> the definition of RL, of course.)
>
>     Examples of the stability of Welford's algorithm against
> cancellation errors:
>
>     a=:1e6+1e6?@$0
>     stddev a
> 0.288662
>     Wstddev a
> 0.288662
>
>     a=:1e12+1e6?@$0
>     stddev a
> 0.573414
>     Wstddev a
> 0.288817
>
>     18j16":%%:12       NB. std. dev. of the uniform distribution.
> 0.2886751345948129
>
> It's also interesting to take longer arrays and keep
> more bits of array elements intact:
>
>     a=:1e9+2e7?@$0
>     stddev a
> 0.398581    NB. all the bits of the result gone...
>     Wstddev a
> 0.288684
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to