The current verbs for calculating covariance and correlation in the stats/base/multivariate.ijs script are dyadic and designed to calculate the cov/corr between 2 variables, e.g.

   load 'stats'
   X=: 1 1 1 1 2 2 2 2 3 3 3 3
   Y=: 1 2 2 3 5 5 6 7 10 11 11 12
   Z=: 1 1 2 2 4 6 5 4 8 7 9 10
   X cov Y
3.27273
   X corr Y
0.97547
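For reference, the pairwise statistics follow the usual sample formulas: cov is the sum of the products of deviations from the mean divided by n-1, and corr is cov divided by the product of the two sample standard deviations. Below is a minimal explicit sketch of those formulas. It assumes the stats addon provides dev and stddev as used in this post. cov2 and corr2 are hypothetical names for illustration only, not the library's own definitions of cov and corr:

```j
load 'stats'

X=: 1 1 1 1 2 2 2 2 3 3 3 3
Y=: 1 2 2 3 5 5 6 7 10 11 11 12

NB. hypothetical explicit versions of the pairwise formulas
NB. sample covariance: sum of products of deviations, divided by n-1
cov2=: 4 : '(+/ (dev x) * dev y) % <: # x'
NB. Pearson correlation: covariance over the product of sample std devs
corr2=: 4 : '(x cov2 y) % (stddev x) * stddev y'

X cov2 Y     NB. 3.27273, same as X cov Y
X corr2 Y    NB. 0.97547, same as X corr Y
```

Written out like this, each call recomputes the deviations, and the standard deviations for corr, of both arguments. That is one plausible reason why applying these verbs to every pair of variables is slower than working on the whole data matrix at once.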
Often we want to calculate a cov/corr matrix for more than 2 variables. The current definitions can be used for this purpose:

   cov"1/~ X,Y,:Z
0.727273 3.27273 2.54545
 3.27273 15.4773 11.6591
 2.54545 11.6591  9.7197
   corr"1/~ X,Y,:Z
       1  0.97547 0.957393
 0.97547        1 0.950585
0.957393 0.950585        1

but they are slower than the following alternatives. These take a single matrix with one observation per row and one variable per column, hence X,.Y,.Z rather than X,Y,:Z:

   ((+/ .*~ |:)@dev % <:@#) X,.Y,.Z
   ((+/ .*~ |:)@(dev %"_1 _ stddev) % <:@#) X,.Y,.Z

This topic has come up in the forums at least a couple of times:
http://www.jsoftware.com/pipermail/programming/2011-March/022417.html
http://www.jsoftware.com/pipermail/programming/2009-September/016321.html
http://www.jsoftware.com/pipermail/programming/2007-June/007186.html

I propose to add the following definitions to the stats/base/multivariate.ijs script. Any algorithmic or naming suggestions are welcome.

   XtX=: |: +/ .* ]
   cov_multi=: XtX@dev % <:@#
   corr_multi=: XtX@(dev %"_1 _ stddev) % <:@#

The fork (|: +/ .* ]) and the hook (+/ .*~ |:) both compute the matrix product of the transposed data with itself. My testing suggests the fork is slightly faster and leaner than the equivalent hook.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
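A minimal usage sketch of the proposed verbs on the example data follows. It assumes the stats addon loads as in the post. The sketch checks that cov_multi and corr_multi reproduce the cov"1/~ and corr"1/~ tables. It then shows one way to compare time and space using the foreign conjunctions 6!:2 (seconds to execute a sentence) and 7!:2 (bytes used). The matrix big and its size are hypothetical test data. Timings depend on the machine and J version, so no figures are claimed here:

```j
load 'stats'

NB. proposed verbs from the post
XtX=: |: +/ .* ]                              NB. fork: (|: y) +/ .* y
cov_multi=: XtX@dev % <:@#                    NB. centred cross-products / (n-1)
corr_multi=: XtX@(dev %"_1 _ stddev) % <:@#   NB. standardise columns first

X=: 1 1 1 1 2 2 2 2 3 3 3 3
Y=: 1 2 2 3 5 5 6 7 10 11 11 12
Z=: 1 1 2 2 4 6 5 4 8 7 9 10
M=: X,.Y,.Z        NB. 12x3: rows are observations, columns are variables

cov_multi M        NB. same 3x3 table as cov"1/~ X,Y,:Z
corr_multi M       NB. same 3x3 table as corr"1/~ X,Y,:Z

NB. each of these should give 1 (-: uses tolerant comparison)
(cov"1/~ X,Y,:Z) -: cov_multi M
(corr"1/~ X,Y,:Z) -: corr_multi M
(XtX M) -: (+/ .*~ |:) M       NB. fork and hook give the same product

NB. hypothetical larger test data: 1000 observations of 10 variables
big=: ? 1000 10 $ 0
6!:2 'cov"1/~ |: big'          NB. seconds, pairwise version
6!:2 'cov_multi big'           NB. seconds, matrix version
7!:2 'XtX big'                 NB. bytes, fork
7!:2 '(+/ .*~ |:) big'         NB. bytes, hook
```

Note that the pairwise version needs the variables as rows (|: big), while the matrix versions take observations as rows directly.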