Apologies for misspelling Viktor's name.
Yes - I've had a look at the more scalar approach.
FWIW, The best explicit form I've come up with is :
kMS =: {:@:(kMS2/)@:,&0 NB. appending zero sets k0=M0=S0=0
kMS2 =: 3 : 0
:
k =. >: km1 =. <.{.y NB. k, k-1
dx =. x - 1{y,0 NB. x - Mk-1
y + 1, (1, km1) * (dx ^ 1 2) % k NB. do it!
)
A tacit expression which does the same job as
kMS2 is
kMSt =: {:@:(kMS2t/)@:,&0
kMSt2=:13 : 'y + 1, (x - 1{y,0) ((1 2^~[) * (1&,%>:)@]) <.{.y'
Performance isn't great: x is ?1000#0 -
timer'kMS 1e9+ 1000000$x'
+-------+-------+
|15.8811|82981.7|
+-------+-------+
timer'kMSt 1e9+ 1000000$x' NB. tacit version
+-------+-------+
|7.61304|82981.7|
+-------+-------+
timer'Sk 1e9+ 1000000$x' NB. my earlier contribution
+-------+-------+
|5.50684|82981.7|
+-------+-------+
timer'ssdev 1e9+ 1000000$x' NB. statfns.ijs
+--------+-------+
|0.171143|82981.7|
+--------+-------+
Stability is not ideal but less compromised for kMS
than for ssdev for a larger array:
timer'ssdev 10000000$x' NB. statfns.ijs
+-------+------+
|1.49707|829817|
+-------+------+
timer'kMSt 10000000$x' NB. tacit version
+---------+------+
|1 15.1301|829817|
+---------+------+
NB. ok when mean is ~0
timer'ssdev 1e9+ 10000000$x' NB. statfns.ijs
+-------+------+
|1.60718|859034|
+-------+------+
timer'Sk 1e9+ 10000000$x' NB. my earlier contribution
|out of memory: makexM
| Sk 1000000000+10000000$x
timer'kMSt 1e9+ 10000000$x' NB. tacit version
+--------+------+
|1 15.052|829821|
+--------+------+
NB. less good when mean ~ 1e9
I hadn't heard of Welford when I reinvented this wheel
a few years ago, for different reasons: if you only
know the mean, standard deviation and sample size for
two (or more) different samples, I used an analogous
technique to estimate the mean & std were those samples
to be combined, assuming of course that their
underlying distributions were the same.
But if you do have all the data, I still think that
the simplest approach is satisfactory. If it isn't,
in a real statistical investigation, I would look for
outliers, try transforming the data (eg taking logs),
and generally wonder why it's unstable.
Mike
On 15/09/2011 5:33 PM, Viktor Cerovski wrote:
>
> Mike Day-3 wrote:
>> Here's a somewhat J-ish way to do the job. It's ok for small data
>> sets but falls down on a few million cases.
>>
>> NB. given x, append a reversed column of k, then append a row: 0 0
>> NB. If you like, this reverts to origin zero, setting M0 = 0
>> NB. x gets reversed on the way, but I don't think that usually matters
>> NB. If it does matter, just do Mk on |.x
>>
>> makexM =: 0 ,~ ] ,.~ [:>: [: i. [: - #
>>
>> NB. carry out Mk-1 + (xk - Mk-1)/k on (xk, xk-1,. Mk, Mk-1)
>> NB. and prepend xk to produce xk, Mk
>>
>> mk =: ({:@] , {:@[ - -&{: % {.@])
>>
>> NB. Put these verbs together and clean up the resulting 2 column
>> NB. array of (reversed) x and Mk on reversed x:
>>
>> Mk =: }.@: |.@: (mk ~/\.) @: makexM
>>
>> NB. diffs xk - Mk
>>
>> difxk_Mk =: -/"1
>>
>> NB. diffs xk - Mk-1
>>
>> difxk_Mkm1 =: }:@:(-~//.)
>>
>> NB. combine these for Sk:
>>
>> Sk =: +/@: (difxk_Mk * difxk_Mkm1) @: Mk
>>
>> NB. eg given x5
>> x5
>>
>> 0.971379 0.0466292 0.330109 0.01346 0.220004
>>
>> Mk x5
>>
>> 0.220004 0.220004
>> 0.01346 0.116732
>> 0.330109 0.187858
>> 0.0466292 0.152551
>> 0.971379 0.316316
>>
>> Sk x5
>>
>> 0.603026
>>
>> ssdev x5 NB. cf stats.ijs verb
>>
>> 0.603026
>>
> Looks interesting to me, I'll study it more when I get the chance.
>
>
>
>> It's a bit quicker than Victor's Wstd.
>>
>> However, Sk ran out of space on his
>> a=:1e9+2e7?@$0;
>> It does run with
>> a=:1e9+5e6?@$0,
>> but is then much slower than ssdev (about 60
>> times!). On my machine, Wstd is another 4
>> times slower than that. All 3 verbs gave
>> the same sum of squares for this value of a .
>>
> Wstd can be sped up by simply rewriting it
> as a for. loop, getting a standard imperative-style
> program. The speedup should be considerable since
> there will be no boxing/unboxing being done for
> every element of x, and because of the simpler iteration.
[snipped the rest - Mike]
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm