Dear R developers,
there appears to be a small problem with function cmdscale: for non-Euclidean distance matrices, using option add=FALSE (the default), cmdscale misses the smallest eigenvalue. This affects GOF statistic g.1 (See Mardia, Kent + Bibby (1979): Multivariate Analysis, eq. (14.4.7). The corresponding formula in Cox + Cox (2001): Multidimensional Scaling, 2nd ed., p 38, would seem to contain a misprint, it should be n instead of n-1.)
Example:
R> cmdscale(eurodist, eig=TRUE)$GOF [1] 0.7968344 0.8679134
The full set of eigenvalues can be obtained via
R> H <- function(n) { diag(n) - matrix(1, n, n)/n } R> eigen( -.5 * H(21) %*% (as.matrix(eurodist))^2 %*% H(21) , + only.values=TRUE)$values [1] 1.953838e+07 1.185656e+07 1.528844e+06 [4] 1.118742e+06 7.893472e+05 5.816552e+05 [7] 2.623192e+05 1.925976e+05 1.450845e+05 [10] 1.079673e+05 5.139484e+04 3.538963e-10 [13] -9.496124e+03 -5.305820e+04 -1.322166e+05 [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 [19] -9.191491e+05 -1.006504e+06 -2.251844e+06
whereas cmdscale allows access of the first 20 of these via
R> cmdscale(eurodist, k=20, eig=TRUE)$eig [1] 1.953838e+07 1.185656e+07 1.528844e+06 [4] 1.118742e+06 7.893472e+05 5.816552e+05 [7] 2.623192e+05 1.925976e+05 1.450845e+05 [10] 1.079673e+05 5.139484e+04 4.656613e-10 [13] -9.496124e+03 -5.305820e+04 -1.322166e+05 [16] -2.573360e+05 -3.326719e+05 -5.162523e+05 [19] -9.191491e+05 -1.006504e+06 Warning messages: ...
Suggestion: replace
evalus <- e$values[-n]
with
evalus <- e$values
This yields
R> cmdscale(eurodist, eig=TRUE)$GOF [1] 0.7537543 0.8679134
Minor points:
May I suggest to allow access of all eigenvalues, instead of just the first k as delivered by cmdscale(... , eig=TRUE)$eig, in order to permit inspection of the severity of the problem? (I am aware that a comparison of g.1 and g.2 indicates the presence of negative eigenvalues.) It might also be useful to deliver a warning whenever some eigenvalues are < 0, using something like
if (any(e$values < 0)) warning("some eigenvalues are < 0")
Also, the documentation says that cmdscale(...)$eig returns "the n-1 eigenvalues computed during the scaling process if 'eig' is true", whereas in fact it returns the first k.
Incidentally, there is an undocumented feature: depending on the value of if(eig || x.ret || add) the value is a list including a component 'ac' (add. constant) in addition to those mentioned in the documentation.
Best regards, Christian Kleiber
Version: R 2.0.1 OS: Win XP Pro
--
*****************************************
Dr. Christian Kleiber FB Statistik Universitaet Dortmund Vogelpothsweg 78 D-44227 DORTMUND Germany
Tel: ++49-(0)231-755 5419 Fax: ++49-(0)231-755 5284 E-mail: [EMAIL PROTECTED]
______________________________________________ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel