Re: [R] Linear Discriminant Analysis
Region and Name are effectively the same variable.

cor(olive[,4:11]) will also show you that there are strong correlations
between some of the variables - this is something you might want to avoid.

From: [EMAIL PROTECTED] on behalf of Soare Marcian-Alin
Sent: Wed 06/06/2007 4:45 PM
To: Uwe Ligges; [email protected]
Subject: Re: [R] Linear Discriminant Analysis
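
A minimal sketch of the point made in this reply, using synthetic data rather than the olive file itself. The data frame `d`, its columns `x1` and `x2`, and the helper `lda_warning()` are hypothetical names made up for illustration; only `Region` and `Name` echo the thread. Here `Name` is nested within `Region`, so including it as a predictor makes the within-group design rank deficient, which is one way to get the "variables are collinear" warning from lda().

```r
library(MASS)

set.seed(1)
# Synthetic stand-in for the olive data: Name is nested within Region,
# so knowing Name already determines Region.
region <- rep(c("a", "b", "c"), each = 40)
d <- data.frame(
  Region = factor(region),
  Name   = factor(paste0(region, rep(1:2, times = 60))),
  x1     = rnorm(120, mean = rep(1:3, each = 40)),
  x2     = rnorm(120)
)

# Check the class of each explanatory variable before fitting.
sapply(d, class)

# Return the warning text raised by lda(), or NA if there is none.
lda_warning <- function(data) {
  tryCatch({
    lda(Region ~ ., data = data)
    NA_character_
  }, warning = function(w) conditionMessage(w))
}

with_name    <- lda_warning(d)                        # Name included
without_name <- lda_warning(d[, names(d) != "Name"])  # Name dropped

# Correlations among the numeric predictors only.
cor(d[, c("x1", "x2")])
```

Dropping the redundant factor removes the warning in this toy case. Strongly correlated numeric predictors can produce the same warning, so the correlation matrix is worth inspecting as well.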
Re: [R] Linear Discriminant Analysis
So what about asking your teacher (who seems to be Peter Filzmoser) and
trying to work out your homework yourself?
You might want to think about some assumptions that must hold for LDA
and look at the class of your explanatory variables ...
Uwe Ligges
Soare Marcian-Alin wrote:
> Hello,
>
> I want to run a linear discriminant analysis on the dataset olive, and I
> always get this warning:
> Warning message:
> variables are collinear in: lda.default(x, grouping, ...)
>
> ## Loading Data
> library(MASS)
> olive <- url("http://www.statistik.tuwien.ac.at/public/filz/students/multi/ss07/olive.R")
> print(load(olive))
>
> y <- 1:572
> x <- sample(y)
> y1 <- x[1:286]
>
> train <- olive[y1,-11]
> test <- olive[-y1,-11]
>
> summary(train)
> summary(test)
>
> table(train$Region)
> table(test$Region)
>
> # Linear Discriminant Analysis
> z <- lda(Region ~ . , train)
> predict(z, train)
>
> z <- lda(Region ~ . , test)
> predict(z, test)
>
> Thanks in advance!
__
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Discriminant Analysis
Thanks for explaining...
I have been sitting at this homework for 6 hours, after a week on
antibiotics for tonsillitis...
I just wanted some tips for solving this homework, but thanks, I will try
to get help another way :)
I think I solved it, but I still get this error :(
## Loading Data
library(MASS)
olive <- url("http://www.statistik.tuwien.ac.at/public/filz/students/multi/ss07/olive.R")
print(load(olive))
dim(olive)
summary(olive)
index <- sample(nrow(olive), 286)
train <- olive[index,-11]
test <- olive[-index,-11]
summary(train)
summary(test)
table(train$Region)
table(test$Region)
# Linear Discriminant Analysis
z <- lda(Region ~ . , train)
zn <- predict(z, newdata=test)$class
mean(zn != test$Region)
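
The hold-out procedure in the code above can be sketched in a form that runs without the course file. This version assumes the built-in iris data as a stand-in for olive, and adds a fixed seed and a confusion table, neither of which is in the original post.

```r
library(MASS)

set.seed(42)                      # make the random split reproducible
n     <- nrow(iris)
index <- sample(n, n %/% 2)       # row numbers of the training half
train <- iris[index, ]
test  <- iris[-index, ]

# Fit on the training half only, then classify the held-out half.
fit  <- lda(Species ~ ., data = train)
pred <- predict(fit, newdata = test)$class

err       <- mean(pred != test$Species)   # hold-out misclassification rate
confusion <- table(predicted = pred, actual = test$Species)
err
confusion
```

Fitting and predicting on the same rows, as in the first version of the homework code, tends to understate the error rate, which is why the fit here never sees the test rows.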
2007/6/6, Uwe Ligges <[EMAIL PROTECTED]>:
> So what about asking your teacher (who seems to be Peter Filzmoser) and
> trying to work out your homework yourself?
> You might want to think about some assumptions that must hold for LDA
> and look at the class of your explanatory variables ...
>
> Uwe Ligges
--
Best regards
Soare Marcian-Alin
Re: [R] linear discriminant analysis in MASS
Dear Prof. Ripley

I'm sorry about the confusion; this reply will simply avoid any attempts
at humor (good or bad).

About "S"

I'm sorry; as a "user" I was not aware of any "S" still existing outside
of S-PLUS or R. So you're right, the procedure I was referring to was run
in S-PLUS. I used the GUI to construct the analysis, so I really don't
know if the discrim() call I copied from the "command" window is
accurate. But when I re-run the analysis with that as the command line, I
get the same results. And it does provide a matrix of Mahalanobis
distances between groups and a test of their significance (Hotelling's T
Squared for Differences in Means Between Each Group).

About the credits

My data set is in JMP (SAS). It's great at manipulating and exploring
data sets. The software does allow for many analysis types too, so my
very first discriminant analysis was actually in JMP. But like many GUI
programs, it lacks options. JMP approaches the distance problem by
drawing 95% confidence interval spheres around group means. That's very
nice (although it doesn't account for multiple comparisons) for LDA
problems with few groups, but I have 12, so it became messy (graphically).
Besides, I have the - I think very healthy - problem of never trusting
just one piece of software, especially the black box type, for my
analysis.

I was also accumulating literature on the subject (ecophysiology of
trees, not statistics!) and I came across this paper

Delagrange, S., Messier, C., Lechowicz, M.J. and Dizengremel, P. 2004.
Physiological, morphological and allocational plasticity in understory
deciduous trees: importance of plant size and light availability. Tree
Physiol. 24(7): 775-784.

which presented a test on Mahalanobis distances from an LDA. They used
SAS (CANDISC with the ANOVA option) for their analysis. I tried it in R
(lda in MASS and discrimin in ade4), without success (I get the
discriminant analysis, but not the test). So I tried it in S-PLUS, and
voilà!

You could say that my first encounter with the procedure was actually
with SAS, then R, and only then S-PLUS. I use the "vegan" package a lot
for permutational statistics, as well as code developed at Pierre
Legendre's lab, and I cite them accordingly, just as I believe I did with
lda in MASS in the present e-mail.

Thanks for your advice on multiple comparisons and normality. By the way,
the S-PLUS procedure also outputs tests of normality and of covariance
homogeneity. I do have multivariate normality, but for now (!), I have
covariance heterogeneity. I was of course planning on a Dunn-Sidak
correction for multiple comparisons.

Thank you for the quick reply,

Alain

-- 
Alain Paquette
Laboratoire d'écologie végétale
Institut de recherche en biologie végétale
Université de Montréal
4101 rue Sherbrooke Est
Montréal (Québec) H1X 2B2
[EMAIL PROTECTED]
lab (514) 872-8488
fax (514) 872-9406
http://www.irbv.umontreal.ca/francais/personnel/cogliastro-paquette.htm
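
For the Dunn-Sidak correction mentioned in this message, the arithmetic can be sketched as follows. The 12 groups come from the message; the overall level of 0.05 and the helper name `sidak_adjust()` are assumptions made for illustration.

```r
alpha  <- 0.05                          # assumed family-wise level
groups <- 12                            # number of groups in the analysis
m      <- choose(groups, 2)             # number of pairwise comparisons

# Per-comparison level that keeps the family-wise level at alpha,
# assuming independent tests.
alpha_sidak <- 1 - (1 - alpha)^(1 / m)

# Equivalent view: adjust each p-value instead of the threshold.
sidak_adjust <- function(p, m) 1 - (1 - p)^m

m
alpha_sidak
```

With 12 groups there are 66 pairwise comparisons, so the per-comparison threshold is far stricter than 0.05, though slightly less strict than the Bonferroni value alpha / m.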
Re: [R] linear discriminant analysis in MASS
On Mon, 20 Feb 2006, Alain Paquette wrote:

> Hello R people
>
> I now know how to run my discriminant analysis with the lda function in
> MASS:
> lda.alain=lda(Groupes ~ Ht.D0 + Lc.Dc + Ram + IDF, gr, CV = FALSE)
> and it works fine.

CV=FALSE is the default and so not needed.

> But I am missing a test and cannot find any help on how to get it, if it
> exist.
>
> The "S" equivalent:

There is no such function in S, and I rather object as the S equivalent
is lda() (and as the author of both I should know). Credit where credit
is due: discrim() is an S-PLUS function, indebted to lda().

> discrim(structure(.Data = Groupes ~ Ht.D0 + Lc.Dc + Ram + IDF, class =
> "formula"), data = gr, family = Canonical(cov.structure =
> "homoscedastic"), na.action = na.omit, prior = "proportional")
> outputs a nice matrix of Mahalanobis distances between groups and even
> tests (Hotelling's T Squared) for significant distances.

Well, it seems not to. That is part of the output of the summary()
method, which itself calls the multicomp() method.

> Why don't I just take the "S" output you say? Because like you, I'd
> rather put in my paper that I did it using R of course!

No `of course' applies. If you learnt of this output from S-PLUS, I
urge you to credit it honestly and accurately. (If you refer to lda,
you should credit that, not just R.)

> Does anyone know of a way to get this test out of lda? Or of another R
> package that does it?

Mahalanobis distance between groups is easy, as this is just Euclidean
distance between group centres in the scaled space. The test
statistics can be produced, but

- they are critically dependent on the unrealistic assumptions of
  multivariate normality and variance homogeneity and

- there needs to be an adjustment for multiple comparisons.

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
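
The recipe in this reply (Euclidean distance between group centres in the scaled space) can be sketched as below. This is an illustration under stated assumptions, not a reproduction of the S-PLUS discrim() output: it uses the built-in iris data in place of the poster's data, and the conversion of pairwise Hotelling T^2 to an F statistic with a covariance matrix pooled over all groups is a textbook formula added here, not taken from the thread. The p-values are only meaningful under the multivariate normality and equal-covariance assumptions flagged above.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# Group centres projected onto the discriminant axes ("scaled space").
centres <- fit$means %*% fit$scaling

# Euclidean distances there are Mahalanobis distances between groups,
# taken with respect to the pooled within-group covariance.
D <- as.matrix(dist(centres))

# Pairwise Hotelling T^2 converted to F (assumes normality and equal
# covariance matrices; see the caveats in the message above).
n  <- fit$counts               # group sizes
p  <- ncol(fit$means)          # number of predictors
nu <- fit$N - length(n)        # df of the pooled covariance estimate
T2 <- D^2 * outer(n, n) / outer(n, n, "+")
Fstat <- T2 * (nu - p + 1) / (nu * p)
pval  <- pf(Fstat, p, nu - p + 1, lower.tail = FALSE)
diag(pval) <- NA               # a group is not compared with itself

# Sidak adjustment across all pairwise comparisons.
m <- choose(length(n), 2)
pval_adj <- 1 - (1 - pval)^m

D
pval_adj
```

The same distances can be cross-checked with mahalanobis() on the raw group means and a hand-computed pooled covariance, which is a useful sanity test before trusting the derived statistics.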
