The current verbs for calculating covariance and correlation in the
stats/base/multivariate.ijs script are dyadic and designed to calculate
the cov/corr between two variables, e.g.
  load 'stats'
  X=: 1 1 1 1 2 2 2 2 3 3 3 3
  Y=: 1 2 2 3 5 5 6 7 10 11 11 12
  Z=: 1 1 2 2 4 6 5 4 8 7 9 10
  X cov Y
3.27273
  X corr Y
0.97547

Often we want to calculate a cov/corr matrix for more than two variables.
The current definitions can be used for this purpose:

   cov"1/~ X,Y,:Z
0.727273 3.27273 2.54545
 3.27273 15.4773 11.6591
 2.54545 11.6591  9.7197

   corr"1/~ X,Y,:Z
       1  0.97547 0.957393
 0.97547        1 0.950585
0.957393 0.950585        1

but they are slower than these alternatives, which compute the whole
matrix with a single matrix product:
   ((+/ .*~ |:)@dev % <:@#) X,.Y,.Z
   ((+/ .*~ |:)@(dev %"_1 _ stddev) % <:@#) X,.Y,.Z


This topic has come up in the forums at least a couple of times.
http://www.jsoftware.com/pipermail/programming/2011-March/022417.html
http://www.jsoftware.com/pipermail/programming/2009-September/016321.html
http://www.jsoftware.com/pipermail/programming/2007-June/007186.html

I propose to add the following definitions to the
stats/base/multivariate.ijs script. Any algorithmic or naming suggestions
are welcome.

XtX=: |: +/ .* ]
cov_multi=: XtX@dev % <:@#
corr_multi=: XtX@(dev %"_1 _ stddev) % <:@#
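
As a quick sanity check, the proposed verbs can be compared with the
pairwise results on the example data above. This is only a sketch: it
assumes the stats addon is installed and that cov, corr, dev and stddev
are the definitions it loads. M is just a name picked here for the table
with one variable per column.

```j
load 'stats'                 NB. assumed to supply cov, corr, dev, stddev

XtX=: |: +/ .* ]
cov_multi=: XtX@dev % <:@#
corr_multi=: XtX@(dev %"_1 _ stddev) % <:@#

X=: 1 1 1 1 2 2 2 2 3 3 3 3
Y=: 1 2 2 3 5 5 6 7 10 11 11 12
Z=: 1 1 2 2 4 6 5 4 8 7 9 10
M=: X,.Y,.Z                  NB. hypothetical name: 12 rows, one column per variable

NB. tolerant match of each new result against the pairwise table
(cov_multi M) -: cov"1/~ X,Y,:Z
(corr_multi M) -: corr"1/~ X,Y,:Z
```

Note the different orientation: the new verbs take the variables as
columns (X,.Y,.Z), whereas the cov"1/~ form takes them as rows (X,Y,:Z).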

Note that my testing suggests that the fork (|: +/ .* ]) is slightly
faster and leaner than the equivalent hook (+/ .*~ |:).
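
One way to repeat the fork/hook comparison on your own machine is
sketched below. The data size, the repetition count and the names M,
fork and hook are arbitrary choices for illustration, not what the
original timings used, and results will vary by J version and hardware.

```j
NB. hypothetical test data: 100000 observations of 3 variables
M=: ? 100000 3 $ 0

fork=: |: +/ .* ]            NB. (|: M) +/ .* M
hook=: +/ .*~ |:             NB. M +/ .*~ |: M, i.e. the same product

(fork M) -: hook M           NB. check that the two forms agree

NB. x timespacex y gives average seconds over x runs, then bytes used
10 timespacex 'fork M'
10 timespacex 'hook M'
```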
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
