The values should probably be labeled "initial", which is how they are
labeled in the source code, instead of "raw". The Details section of the
manual page indicates that the first step is to identify a subset of h
observations (a fraction between .5 and 1 of the original data) whose
covariance matrix has the lowest possible determinant. The next paragraph
says:
The raw MCD estimate of location is then the average of these h points,
whereas the raw MCD estimate of scatter is their covariance matrix,
multiplied by a consistency factor and a finite sample correction factor (to
make it consistent at the normal model and unbiased at small samples).
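To make that definition concrete, here is a rough base-R sketch that finds the minimum-determinant subset by brute force. It is only an illustration, not what covMcd() does internally (covMcd() uses a fast search algorithm), and the subset size h = floor((n + p + 1)/2) is my assumption about the default for alpha = 1/2. The names subsets, dets, and best.subset are made up for the example.

```r
# Brute-force illustration of the raw MCD idea on a tiny data set.
# Assumption: h = floor((n + p + 1) / 2), which I believe is the default
# subset size for alpha = 1/2; check ?covMcd for your version.
set.seed(42)
x <- matrix(rnorm(10 * 3), ncol = 3)
n <- nrow(x)
p <- ncol(x)
h <- floor((n + p + 1) / 2)

# Every possible subset of h rows (one subset per column).
subsets <- combn(n, h)

# Determinant of the covariance matrix of each subset.
dets <- apply(subsets, 2, function(idx) det(cov(x[idx, , drop = FALSE])))

# The subset with the smallest determinant, and its plain mean.
best.subset <- subsets[, which.min(dets)]
best.subset
colMeans(x[best.subset, ])
```

This is only feasible because choose(10, 7) is small. The uncorrected cov(x[best.subset, ]) will not equal raw.cov, because the manual says the latter is multiplied by the consistency and finite-sample correction factors.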
Following your example:
library(robustbase)
set.seed(42)
x <- matrix(rnorm(10*3), ncol = 3)
xmeans <- colMeans(x)
Sx <- cov(x)
D2rb <- covMcd(x)
D2rb$raw.weights
[1] 0 1 1 1 1 1 0 1 0 1   <== Note that the raw weights eliminate obs 1, 7, and 9
xmeans; D2rb$raw.center
[1]  0.5472968 -0.1634567 -0.1780795      <== Compare original means
[1]  0.08172336 -0.03067387 -0.23956925   <== and raw means
colMeans(x[as.logical(D2rb$raw.weights),])   <== means with 1, 7, and 9 eliminated
[1]  0.08172336 -0.03067387 -0.23956925   <== This matches D2rb$raw.center
So the raw values are computed from a subset of h = 7 observations: 2, 3,
4, 5, 6, 8, and 10. Given that raw.center and raw.cov are based on a subset
of the original data (and raw.cov is further multiplied by the correction
factors mentioned above), the Mahalanobis distances will not be the same as
the classical ones either.
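If it helps, the comparison can be laid out side by side. This is a sketch that assumes robustbase is installed; I would expect the second and third columns to agree, since raw.mah is documented as being based on the raw location and scatter, but treat that as something to check on your version rather than a guarantee. The component best is documented as the subset used for the raw estimates.

```r
library(robustbase)

set.seed(42)
x <- matrix(rnorm(10 * 3), ncol = 3)

# Classical squared Mahalanobis distances: full-sample mean and covariance.
D2 <- mahalanobis(x, colMeans(x), cov(x))

# Raw MCD estimates, then distances recomputed from them by hand.
mcd <- covMcd(x)
D2raw <- mahalanobis(x, mcd$raw.center, mcd$raw.cov)

# Side-by-side comparison of the three sets of distances.
cbind(classical = D2, from.raw = D2raw, raw.mah = mcd$raw.mah)

# Observations in the subset that the raw estimates are based on.
sort(mcd$best)
```

One quick sanity check on the classical distances: with the usual sample covariance they always sum to (n - 1) * p, here 27, which is also true of the D2 values in your message.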
--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Fraser D. Neiman
Sent: Friday, July 27, 2012 7:16 AM
To: r-help@r-project.org
Subject: [R] puzzling classical Mahalanobis distances from covMcd() {robustbase}
Greetings,
I am puzzled about why the _classical_ Mahalanobis distances that I get
using the {stats} mahalanobis() function do not match the distances I get
from the {robustbase} covMcd() function. Here is an example:
x <- matrix(rnorm(10*3), ncol = 3)
#here is the {stats} result:
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
D2
[1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
[9] 2.2172150 0.3859475
#here is the {robustbase} result
library(robustbase)
D2rb <- covMcd(x)
D2rb$raw.mah
[1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
[9] 0.9420184 0.4210470
According to the help file for covMcd {robustbase}:
raw.mah    mahalanobis distances of the observations based on the raw
           estimate of the location and scatter.
So I think the second set of numbers should match the first. But they do
not.
What am I missing here?
Thanks, Fraser
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.