Re: [R] puzzling classical Mahalanobis distances from covMcd() {robustbase}

2012-07-28 Thread David L Carlson
The values should probably be labeled "initial" instead of "raw", which is
how they are labeled in the source. The Details section of the manual
indicates that the first step is to identify a subset of the original data,
containing between .5 and 1 of the observations, whose covariance matrix has
the lowest possible determinant. The next paragraph reads:

The raw MCD estimate of location is then the average of these h points,
whereas the raw MCD estimate of scatter is their covariance matrix,
multiplied by a consistency factor and a finite sample correction factor (to
make it consistent at the normal model and unbiased at small samples).

Following your example:
 library(robustbase)
 set.seed(42)
 x <- matrix(rnorm(10*3), ncol = 3)
 xmeans <- colMeans(x)
 Sx <- cov(x)
 D2rb <- covMcd(x)
 D2rb$raw.weights
 [1] 0 1 1 1 1 1 0 1 0 1   # <== Note that the raw weights eliminate obs 1, 7, and 9
 xmeans; D2rb$raw.center
[1]  0.5472968 -0.1634567 -0.1780795      # <== Compare original means
[1]  0.08172336 -0.03067387 -0.23956925   # <== and raw means
 colMeans(x[as.logical(D2rb$raw.weights),])   # <== means with 1, 7, and 9 eliminated
[1]  0.08172336 -0.03067387 -0.23956925   # <== This matches D2rb$raw.center

So the raw values are computed from a subset, h, which includes observations
2, 3, 4, 5, 6, 8, and 10. Given that raw.center and raw.cov are based on
a subset of the original data, the Mahalanobis distances will not be the
same as the classical ones either.
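
A minimal base-R sketch of that last point, in case it helps. It does not
call covMcd(); the index vector `keep` is simply copied from the raw.weights
printed above, and the names d2_all, d2_sub, sub_center and sub_cov are made
up for this illustration. No consistency or finite-sample correction factor
is applied, so d2_sub is not expected to reproduce D2rb$raw.mah exactly. It
only shows that distances measured from a subset's center and scatter differ
from the classical ones.

```r
# Base R only (stats is attached by default).
set.seed(42)
x <- matrix(rnorm(10 * 3), ncol = 3)

# Assumption: rows with raw weight 1, copied from the output above.
keep <- c(2, 3, 4, 5, 6, 8, 10)

# Classical squared distances: center and scatter from all 10 rows.
d2_all <- mahalanobis(x, colMeans(x), cov(x))

# Squared distances of all 10 rows from the center and scatter of the
# 7-row subset (uncorrected covariance, unlike the documented raw scatter).
sub_center <- colMeans(x[keep, ])
sub_cov    <- cov(x[keep, ])
d2_sub     <- mahalanobis(x, sub_center, sub_cov)

# Side-by-side comparison; the two columns differ row by row.
print(round(cbind(d2_all, d2_sub), 3))
```

To compare with the robust fit itself, the same call with the fitted values,
mahalanobis(x, D2rb$raw.center, D2rb$raw.cov), is the natural thing to set
against D2rb$raw.mah.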

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Fraser D. Neiman
 Sent: Friday, July 27, 2012 7:16 AM
 To: r-help@r-project.org
 Subject: [R] puzzling classical Mahalanobis distances from covMcd()
 {robustbase}
 
 Greetings,
 
 I am puzzled about why the _classical_ Mahalanobis distances that I get using
 the {stats} mahalanobis() function do not match the distances I get from the
 {robustbase} covMcd() function. Here is an example:
 
 x <- matrix(rnorm(10*3), ncol = 3)
 
 #here is the {stats} result:
 Sx <- cov(x)
 D2 <- mahalanobis(x, colMeans(x), Sx)
 D2
 
 [1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
 [9] 2.2172150 0.3859475
 
 
 #here is the {robustbase} result
 library(robustbase)
 D2rb <- covMcd(x)
 D2rb$raw.mah
 
 [1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
 [9] 0.9420184 0.4210470
 
 According to the help file for covMcd{robustbase}
 
 raw.mah   mahalanobis distances of the observations based on the raw
 estimate of the location and scatter.
 
 So I think the second set of numbers should match the first. But they do not.
 What am I missing here?
 
 Thanks, Fraser
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.



[R] puzzling classical Mahalanobis distances from covMcd() {robustbase}

2012-07-27 Thread Fraser D. Neiman
Greetings,

I am puzzled about why the _classical_ Mahalanobis distances that I get using
the {stats} mahalanobis() function do not match the distances I get from the
{robustbase} covMcd() function. Here is an example:

x <- matrix(rnorm(10*3), ncol = 3)

#here is the {stats} result:
Sx <- cov(x)
D2 <- mahalanobis(x, colMeans(x), Sx)
D2

[1] 1.5135795 1.3761046 1.0367444 1.8111585 4.3038621 5.3195918 3.2798665 5.7559301
[9] 2.2172150 0.3859475


#here is the {robustbase} result
library(robustbase)
D2rb <- covMcd(x)
D2rb$raw.mah

[1] 0.7737193 1.1177445 0.7290794 0.6275703 3.5517622 6.0334350 1.0582663 5.7169250
[9] 0.9420184 0.4210470

According to the help file for covMcd{robustbase}

raw.mah mahalanobis distances of the observations based on the raw estimate of
the location and scatter.

So I think the second set of numbers should match the first. But they do not.
What am I missing here?

Thanks, Fraser
