Re: [R] mahalanobis
Yianni,

You probably would have gotten more helpful replies if you had indicated the substantive problem you were trying to solve. From your description, it seems that you want to calculate the leverage of the predictors (X1, X2) in lm(y ~ X1 + X2). My crystal ball says you may be an SPSS user, for whom Mahalanobis D^2 of the predictors is what you have to beg for to get leverages. In R, you will get the most happiness from ?leverage.plot in the car package. Mahalanobis D^2 values are, up to an additive constant of 1/n, proportional to leverage.

-Michael

[EMAIL PROTECTED] wrote:
> Hi, I am not sure I am using the mahalanobis distance method correctly...
> Suppose I have a response variable Y and predictor variables X1 and X2
>
>   all <- cbind(Y, X1, X2)
>   mahalanobis(all, colMeans(all), cov(all));
>
> However, my results from this are different from the ones I am getting
> using another statistical software. I was reading that the comparison is
> with the means of the predictor variables, which led me to think that the
> above should be transformed into:
>
>   predictors <- cbind(X1, X2)
>   mahalanobis(all, colMeans(predictors), cov(all))
>
> But still the results are different.
>
> Am I doing something wrong, or have I misunderstood something in the use
> of the function mahalanobis? Thanks.

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
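The link between Mahalanobis D^2 of the predictors and leverage can be checked numerically. The following is a minimal sketch on simulated data (the values of Y, X1 and X2 are made up for illustration and are not the poster's data). It assumes a regression with an intercept, in which case the hat values satisfy h = 1/n + D^2/(n - 1), where D^2 is computed from the predictors only. This also shows why the response does not enter: leverage depends only on the predictor values.

```r
set.seed(123)                      # simulated data, purely illustrative
n  <- 50
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - X2 + rnorm(n)

predictors <- cbind(X1, X2)

# Squared Mahalanobis distance of each case from the predictor centroid
D2 <- mahalanobis(predictors, colMeans(predictors), cov(predictors))

# Leverage (hat values) from the fitted regression with an intercept
h <- hatvalues(lm(Y ~ X1 + X2))

# For a model with an intercept: h = 1/n + D2 / (n - 1)
max_diff <- max(abs(h - (1 / n + D2 / (n - 1))))
print(max_diff)                    # should be zero up to rounding error
```

Changing Y in this sketch leaves both D2 and h unchanged, which is consistent with the predictors-only call being the one that matches other software.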
Re: [R] mahalanobis
On 31/05/07, Anup Nandialath [EMAIL PROTECTED] wrote:
> oops forgot the example
>
> example
>
> try this line
>
>   sqrt(mahalanobis(all, colMeans(predictors), cov(all), FALSE))

Hi, and thanks for the reply, Anup. Unfortunately, I had looked at the example before posting, but it was not much help... I did some further tests, and in order to get the same results I must run mahalanobis on the predictors-only dataset, i.e. mahalanobis(predictors, colMeans(predictors), cov(predictors)). At first glance it seems a bit strange to me that the influence of these points on a regression is measured without taking the response variable into account (provided that the other stat software calculates the Mahalanobis distances correctly), but I guess this is something I will have to resolve by studying the Mahalanobis distance on my own... Thanks again.

> now cross check with other software
>
> best
>
> Anup

--
yianni
[R] mahalanobis
Hi, I am not sure I am using the mahalanobis distance method correctly... Suppose I have a response variable Y and predictor variables X1 and X2

  all <- cbind(Y, X1, X2)
  mahalanobis(all, colMeans(all), cov(all));

However, my results from this are different from the ones I am getting using another statistical software. I was reading that the comparison is with the means of the predictor variables, which led me to think that the above should be transformed into:

  predictors <- cbind(X1, X2)
  mahalanobis(all, colMeans(predictors), cov(all))

But still the results are different.

Am I doing something wrong, or have I misunderstood something in the use of the function mahalanobis? Thanks.

--
yianni
[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution
I want to calculate the probability that a group will include a particular point using the squared Mahalanobis distance to the centroid. I understand that the squared Mahalanobis distance is distributed as chi-squared, but that for a small number of random samples from a multivariate normal population the Hotelling's T2 (T squared) distribution should be used.

I cannot find a function for Hotelling's T2 distribution in R (although from a previous post I have been provided with functions for the Hotelling test). My understanding is that the Hotelling's T2 distribution is related to the F distribution by the equation:

  T2(u,v) = F(u, v-u+1)*vu/(v-u+1)

where u is the number of variables and v the number of group members.

I have written the R code below to compare the results from the chi-squared distribution with the Hotelling's T2 distribution for the probability of a member being included within a group. Please can anyone confirm whether or not this is the correct way to use Hotelling's T2 distribution for probability of group membership. Also, when testing a particular group member, is it preferable to leave that member out when calculating the centre and covariance of the group for the Mahalanobis distances?

Thanks
Mike White

  ## Hotelling T^2 distribution function
  ph <- function(q, u, v, ...) {
    # q vector of quantiles as in function pf
    # u number of independent variables
    # v number of observations
    if (!(v > u + 1)) stop("v must be greater than u+1")
    df1 <- u
    df2 <- v - u + 1
    pf(q * df2 / (v * u), df1, df2, ...)
  }

  # compare Chi-squared and Hotelling T^2 distributions for a group member
  u <- 3
  v <- 10
  set.seed(1)
  mat <- matrix(rnorm(v * u), nrow = v, ncol = u)
  MD2 <- mahalanobis(mat, center = colMeans(mat), cov = cov(mat))
  d <- MD2[order(MD2)]
  # select a point midway between nearest and furthest from centroid
  dm <- d[length(d) / 2]
  1 - ph(dm, u, v)    # probability using Hotelling T^2 distribution
  # [1] 0.6577069
  1 - pchisq(dm, u)   # probability using Chi-squared distribution
  # [1] 0.5538466
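On the leave-one-out question, one possible approach is sketched below. It is not a confirmed answer from the list, and the function names loo_md2 and p_new_point are hypothetical. The sketch assumes multivariate normal data and a test point that is independent of the n points used to estimate the centre and covariance (which is what leaving the point out achieves). Under those assumptions, D^2 * n * (n - p) / (p * (n - 1) * (n + 1)) follows an F(p, n - p) distribution. Note that this scaling uses n - 1 degrees of freedom for the covariance and an extra factor (n + 1)/n for the estimated mean, so it is not identical to ph() above.

```r
# Leave-one-out squared Mahalanobis distances: each row is compared with the
# centre and covariance estimated from all the OTHER rows.
loo_md2 <- function(x) {
  x <- as.matrix(x)
  vapply(seq_len(nrow(x)), function(i) {
    rest <- x[-i, , drop = FALSE]
    mahalanobis(x[i, , drop = FALSE], colMeans(rest), cov(rest))
  }, numeric(1))
}

# Upper-tail probability for the squared distance d2 of a point that was NOT
# used in estimating the centre and covariance (assumption: normal data).
# p = number of variables, n = number of points used for the estimates.
p_new_point <- function(d2, p, n) {
  stopifnot(n > p)
  scale <- p * (n - 1) * (n + 1) / (n * (n - p))
  pf(d2 / scale, df1 = p, df2 = n - p, lower.tail = FALSE)
}

set.seed(1)
u <- 3
v <- 10
mat <- matrix(rnorm(v * u), nrow = v, ncol = u)

d2_loo <- loo_md2(mat)
# each point is judged against the remaining v - 1 points
p_loo <- p_new_point(d2_loo, p = u, n = v - 1)
print(round(cbind(d2_loo, p_loo), 4))
```

Leaving the point out keeps an outlying observation from pulling the centre towards itself and inflating the covariance, which would otherwise make it look less extreme than it is.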
[R] Mahalanobis distances
Dear R community

I have just recently got back into R after a long break and have been amazed at how much it has grown, and how active the list is! Thank you so much to all those who contribute to this amazing project.

My question: I am trying to calculate Mahalanobis distances for a matrix called fgmatrix

  dim(fgmatrix)
  [1] 76 15
  fg.cov <- cov.wt(fgmatrix)
  mahalanobis(fgmatrix, center = fg.cov$center, cov = fg.cov$cov)

Then I get an error message "Covariance matrix is apparently singular".

What does this mean? I can't see anything strange about the covariance matrix, and am not getting anywhere with the help files.

  dim(fg.cov$cov)
  [1] 15 15
  length(fg.cov$center)
  [1] 15

Thanks

--
Karen Kotschy
Centre for Water in the Environment
University of the Witwatersrand
Johannesburg
South Africa
P/Bag X3, Wits, 2050
Tel: +2711 717-6425
Re: [R] Mahalanobis distances
The first thing I'd try is scale(), as that should not affect the Mahalanobis distances:

  Fgmat <- scale(fgmatrix)
  Fg.cov <- cov.wt(Fgmat)
  mahalanobis(Fgmat, center = Fg.cov$center, cov = Fg.cov$cov)

Does this give you the same error? If not, the problem was that fgmatrix was not sufficiently well conditioned to support this computation.

If this does NOT solve the problem, I'd manually construct a generalized inverse of Fg.cov$cov, proceeding roughly as outlined in the following example:

  set.seed(1)
  X10 <- array(rnorm(760), dim = c(76, 10))
  X15.10 <- cbind(X10, X10[, 1:5])   # 15 columns, but only rank 10
  fg.cov <- cov.wt(X15.10)
  # this call is expected to fail, because fg.cov$cov is singular:
  mahalanobis(X15.10, center = fg.cov$center, cov = fg.cov$cov)
  (S15.10 <- eigen(fg.cov$cov, symmetric = TRUE))
  # Only 10 non-zero eigenvalues
  # generalized inverse V D^-1 V' built from the 10 non-zero eigenpairs
  fg.Info <- tcrossprod(S15.10$vectors[, 1:10] /
                        rep(sqrt(S15.10$values[1:10]), each = 15))
  mahalanobis(X15.10, center = fg.cov$center, cov = fg.Info, inverted = TRUE)

The key is computing your own generalized inverse and using that with inverted=TRUE.

spencer graves

Karen Kotschy wrote:
> Dear R community
>
> I have just recently got back into R after a long break and have been
> amazed at how much it has grown, and how active the list is! Thank you so
> much to all those who contribute to this amazing project.
>
> My question: I am trying to calculate Mahalanobis distances for a matrix
> called fgmatrix
>
>   dim(fgmatrix)
>   [1] 76 15
>   fg.cov <- cov.wt(fgmatrix)
>   mahalanobis(fgmatrix, center = fg.cov$center, cov = fg.cov$cov)
>
> Then I get an error message "Covariance matrix is apparently singular".
>
> What does this mean? I can't see anything strange about the covariance
> matrix, and am not getting anywhere with the help files.
>
>   dim(fg.cov$cov)
>   [1] 15 15
>   length(fg.cov$center)
>   [1] 15
>
> Thanks

--
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
[EMAIL PROTECTED]
www.pdf.com http://www.pdf.com
Tel: 408-938-4420
Fax: 408-280-7915
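The generalized-inverse idea can also be wrapped in a small helper that picks the rank automatically instead of hard-coding 10. This is only a sketch: the function name mahalanobis_pinv and the relative tolerance tol are assumptions made for illustration, not part of base R or of the posts above. It keeps the eigen-directions whose eigenvalues are numerically non-zero, builds the pseudo-inverse from them, and passes it to mahalanobis() with inverted = TRUE.

```r
# Hypothetical helper: squared Mahalanobis distances that tolerate a
# singular sample covariance by using a Moore-Penrose pseudo-inverse.
mahalanobis_pinv <- function(x, tol = 1e-8) {
  x <- as.matrix(x)
  S <- cov(x)
  e <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)       # numerically non-zero directions
  V <- e$vectors[, keep, drop = FALSE]
  S_pinv <- V %*% (t(V) / e$values[keep])      # V diag(1/lambda) V'
  list(d2 = mahalanobis(x, colMeans(x), S_pinv, inverted = TRUE),
       rank = sum(keep))
}

# Same construction as in the example above: 15 columns, rank 10
set.seed(1)
X10 <- array(rnorm(760), dim = c(76, 10))
X15.10 <- cbind(X10, X10[, 1:5])

res <- mahalanobis_pinv(X15.10)
print(res$rank)

# The redundant columns add no information, so the distances agree with
# those computed from the 10 independent columns alone.
d2_full_rank <- mahalanobis(X10, colMeans(X10), cov(X10))
print(all.equal(res$d2, d2_full_rank))
```

A relative tolerance is used because eigenvalues that are zero in exact arithmetic usually come out as tiny positive or negative numbers in floating point.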
Re: [R] Mahalanobis distances
On Fri, 24 Jun 2005, Spencer Graves wrote:
> (...)
> The key is computing your own generalized inverse and using that with
> inverted=TRUE.
> (...)

One method to do this is function solvecov in package fpc.

Christian

> spencer graves
>
> Karen Kotschy wrote:
> > Dear R community
> >
> > I have just recently got back into R after a long break and have been
> > amazed at how much it has grown, and how active the list is! Thank you
> > so much to all those who contribute to this amazing project.
> >
> > My question: I am trying to calculate Mahalanobis distances for a
> > matrix called fgmatrix
> >
> >   dim(fgmatrix)
> >   [1] 76 15
> >   fg.cov <- cov.wt(fgmatrix)
> >   mahalanobis(fgmatrix, center = fg.cov$center, cov = fg.cov$cov)
> >
> > Then I get an error message "Covariance matrix is apparently singular".
> >
> > What does this mean? I can't see anything strange about the covariance
> > matrix, and am not getting anywhere with the help files.
> >
> >   dim(fg.cov$cov)
> >   [1] 15 15
> >   length(fg.cov$center)
> >   [1] 15
> >
> > Thanks
>
> --
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
> [EMAIL PROTECTED]
> www.pdf.com http://www.pdf.com
> Tel: 408-938-4420
> Fax: 408-280-7915

*** NEW ADDRESS! ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche
[R] mahalanobis distance
Is there a function that calculates the Mahalanobis distance in R? The dist function calculates 'euclidean', 'maximum', 'manhattan', 'canberra', 'binary' or 'minkowski'.

Thanks ../Murli
RE: [R] mahalanobis distance
See (surprisingly enough) ?mahalanobis...

Andy

From: Murli Nair
> Is there a function that calculates the Mahalanobis distance in R? The
> dist function calculates 'euclidean', 'maximum', 'manhattan', 'canberra',
> 'binary' or 'minkowski'.
>
> Thanks ../Murli
RE: [R] mahalanobis distance
Dear Murli,

Try ?mahalanobis, which, by the way, is turned up by help.search("mahalanobis").

I hope this helps,
John

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Murli Nair
Sent: Sunday, September 12, 2004 3:17 PM
To: [EMAIL PROTECTED]
Subject: [R] mahalanobis distance

> Is there a function that calculates the Mahalanobis distance in R? The
> dist function calculates 'euclidean', 'maximum', 'manhattan', 'canberra',
> 'binary' or 'minkowski'.
>
> Thanks ../Murli
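Since the question mentions dist(), it may help to note that mahalanobis() returns the squared distance of each row from a single centre, not a pairwise distance object. A minimal sketch of both uses follows, on simulated data (the matrix x is made up for illustration). The pairwise part relies on a standard whitening step: after multiplying by the inverse of the Cholesky factor of the covariance, ordinary Euclidean distances between rows equal Mahalanobis distances.

```r
set.seed(42)
x <- matrix(rnorm(40), nrow = 10, ncol = 4)   # simulated data, illustrative only
S <- cov(x)

# 1) Squared Mahalanobis distance of each row from the centroid
d2_center <- mahalanobis(x, center = colMeans(x), cov = S)

# 2) Pairwise Mahalanobis distances in the same form as dist() output.
#    chol(S) returns upper-triangular R with t(R) %*% R == S, so rows of
#    z = x %*% solve(R) have identity covariance.
R <- chol(S)
z <- x %*% solve(R)
d_pair <- dist(z)             # Euclidean in whitened space = Mahalanobis

# Cross-check one pair against mahalanobis(), which returns SQUARED distances
d12_sq <- as.numeric(mahalanobis(x[1, , drop = FALSE], center = x[2, ], cov = S))
print(all.equal(as.matrix(d_pair)[1, 2]^2, d12_sq))
```

The resulting d_pair can be passed to functions that accept a dist object, such as hclust().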
[R] Mahalanobis
Dear all

Why isn't it possible to calculate Mahalanobis distances with R for a matrix with 1 row (observations) more than the number of columns (variables)?

  mydata <- matrix(runif(12, -5, 5), 4, 3)
  mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
  [1] 2.25 2.25 2.25 2.25

  mydata <- matrix(runif(420, -5, 5), 21, 20)
  mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
   [1] 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762
  [13] 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762

  mydata <- matrix(runif(132, -5, 5), 12, 11)
  mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
   [1] 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333

Thanks in advance

Alberto Murta

  version
           _
  platform i686-pc-linux-gnu
  arch     i686
  os       linux-gnu
  system   i686, linux-gnu
  status
  major    1
  minor    8.1
  year     2003
  month    11
  day      21
  language R

--
Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062
Fax: +351 213015948 | http://ipimar-iniap.ipimar.pt/pelagicos/
RE: [R] Mahalanobis
If I'm not mistaken, the data you generated form a simplex in the p-dimensional space. The Mahalanobis distance for such data, using the sample mean and covariance, just gives the distance to the centroid after normalization. The normalization step makes all the points equidistant from the centroid.

To see this, try generating 3 points in 2D and plotting the principal component scores (scaled to unit variance): you'll see the points at the vertices of a regular triangle.

Andy

From: Alberto Murta
> Dear all
>
> Why isn't it possible to calculate Mahalanobis distances with R for a
> matrix with 1 row (observations) more than the number of columns
> (variables)?
>
>   mydata <- matrix(runif(12, -5, 5), 4, 3)
>   mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
>   [1] 2.25 2.25 2.25 2.25
>
>   mydata <- matrix(runif(420, -5, 5), 21, 20)
>   mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
>    [1] 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762
>   [13] 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762 19.04762
>
>   mydata <- matrix(runif(132, -5, 5), 12, 11)
>   mahalanobis(x = mydata, center = apply(mydata, 2, mean), cov = var(mydata))
>    [1] 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333 10.08333
>
> Thanks in advance
>
> Alberto Murta
>
>   version
>            _
>   platform i686-pc-linux-gnu
>   arch     i686
>   os       linux-gnu
>   system   i686, linux-gnu
>   status
>   major    1
>   minor    8.1
>   year     2003
>   month    11
>   day      21
>   language R
>
> --
> Alberto G. Murta
> Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
> Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062
> Fax: +351 213015948 | http://ipimar-iniap.ipimar.pt/pelagicos/
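The constant values in the post can be predicted exactly. With n = p + 1 rows, a regression on the p columns plus an intercept fits any response perfectly, so every leverage equals 1; combined with the relation h = 1/n + D^2/(n - 1), this gives D^2 = (n - 1)^2 / n for every row. That is 9/4 = 2.25 for n = 4, 121/12 = 10.08333 for n = 12, and 400/21 = 19.04762 for n = 21, matching the output above. A minimal sketch that checks this on fresh random data (the helper name simplex_d2 is hypothetical):

```r
# Squared Mahalanobis distances for a random n x p matrix with n = p + 1 rows,
# generated the same way as in the post.
simplex_d2 <- function(p) {
  n <- p + 1
  x <- matrix(runif(n * p, -5, 5), n, p)
  mahalanobis(x, colMeans(x), cov(x))
}

set.seed(2004)
for (p in c(3, 11, 20)) {
  n <- p + 1
  d2 <- simplex_d2(p)
  # every distance should equal (n - 1)^2 / n, up to rounding error
  print(c(n = n, expected = (n - 1)^2 / n, min = min(d2), max = max(d2)))
}
```

So the calculation is possible, but with only one more observation than variables the distances carry no information about which points are unusual.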