Re: [R] help with lda function from MASS package

David Winsemius Tue, 29 Sep 2009 10:35:27 -0700

If you want to see how Venables and Ripley get their lda distances,then this is a quick path to the (uncommented) source:


1) > methods(lda)
[1] lda.data.frame* lda.default*    lda.formula*    lda.matrix*
2) since you call involved a formula I looked first at:
> getAnywhere(lda.formula)
A single object matching ‘lda.formula’ was found

<snipped output because after setup the data was dispatched tolda.default

3) > getAnywhere(lda.default)
A single object matching ‘lda.default’ was found
<snipped output.

<scroll down to the section where they do their scaling and learn fromthe masters.

The source code in the original package may have advantages in thatthe comments would be visible. There was an R News, if memory serves,by Uwe Ligges, if memory serves, a few years ago on getting access tothe source that is good reading.

--
David


2)On Sep 29, 2009, at 12:39 PM, Pete Shepard wrote:

Thanks David,

Yes, I am talking about the MASS package.Thank you for pointing outthat

these scale the same. My question is, how do I get from the V1 data:

      V1
1 164.4283
2 166.2492
3 170.5232
4 156.5622
5 127.7540
6 136.7704
7 136.3436

to the other set of data:


+ 1 -2.3769280
+ 2 -2.7049437
+ 3 -3.4748309
+ 4 -0.9599825
+ 5  4.2293774
+ 6  2.6052193
+ 7  2.6820884

On Mon, Sep 28, 2009 at 3:41 PM, David Winsemius <dwinsem...@comcast.net>wrote:

Your results are the same (after scaling and sign reversal) out tothe 4thdecimal place as those from lda (which by the way is almostcertainly from

the MASS package and not from an impossible to find "lda package".)

read.table(textConnection(txt))

est <-read.table(textConnection(txt))
scale(est)

           V1
[1,]  0.7656185
[2,]  0.8712707
[3,]  1.1192567
[4,]  0.3092117
[5,] -1.3622976
[6,] -0.8391481
[7,] -0.8639119
attr(,"scaled:center")
   V1
151.233
attr(,"scaled:scale")
    V1
17.23484

LD1est <- read.table(textConnection(" LD1

+ 1 -2.3769280
+ 2 -2.7049437
+ 3 -3.4748309
+ 4 -0.9599825
+ 5  4.2293774
+ 6  2.6052193
+ 7  2.6820884"), header=T)

scale(LD1est)

       LD1
1 -0.7656170
2 -0.8712721
3 -1.1192555
4 -0.3092138
5  1.3622976
6  0.8391505
7  0.8639103
attr(,"scaled:center")
        LD1
-3.172066e-17
attr(,"scaled:scale")
   LD1
3.104591


On Sep 28, 2009, at 5:54 PM, Pete Shepard wrote:

I am having a problem understanding the lda package. I have a dataset

here:

 [,1] [,2] [,3]
[1,] 2.95 6.63    0
[2,] 2.53 7.79    0
[3,] 3.57 5.65    0
[4,] 3.16 5.47    0
[5,] 2.58 4.46    1
[6,] 2.16 6.22    1
[7,] 3.27 3.52    1

If I do the following;

"names(d)<-c("y","x1","x2")
d$x1 = d$x1 * 100
d$x2 = d$x2 * 100
g<-lda( y ~ x1 + x2, data=d)
v2 <- predict(g, d)",

I get;
     LD1
1 -2.3769280
2 -2.7049437
3 -3.4748309
4 -0.9599825
5  4.2293774
6  2.6052193
7  2.6820884

However, If I do it manually,

"rawdata<-matrix(scan("tab1_1.


dat"),ncol=3,byrow=T)
group <- rawdata[,1]
X <- 100 * rawdata[,2:3]
Apf <- X[group==1,]
Af <- X[group==0,]
xbar1 <- apply(Af, 2, mean)
S1 <- var(Af)
N1 <- dim(Af)[1]
xbar2 <- apply(Apf, 2, mean)
S2 <- var(Apf)
N2 <- dim(Apf)[1]
S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
Sinv=solve(S)
d<-xbar1-xbar2
b <- Sinv %*% d
v <- X %*% b",

I get;

     [,1]
[1,] 164.4283
[2,] 166.2492
[3,] 170.5232
[4,] 156.5622
[5,] 127.7540
[6,] 136.7704
[7,] 136.3436

I am having a problem understanding the lda package. I have adataset

here:

 [,1] [,2] [,3]
[1,] 2.95 6.63    0
[2,] 2.53 7.79    0
[3,] 3.57 5.65    0
[4,] 3.16 5.47    0
[5,] 2.58 4.46    1
[6,] 2.16 6.22    1
[7,] 3.27 3.52    1

If I do the following;

"names(d)<-c("y","x1","x2")
d$x1 = d$x1 * 100
d$x2 = d$x2 * 100
g<-lda( y ~ x1 + x2, data=d)
v2 <- predict(g, d)",

I get;
     LD1
1 -2.3769280
2 -2.7049437
3 -3.4748309
4 -0.9599825
5  4.2293774
6  2.6052193
7  2.6820884

However, If I do it manually,

"rawdata<-matrix(scan("tab1_1.dat"),ncol=3,byrow=T)
group <- rawdata[,1]
X <- 100 * rawdata[,2:3]
Apf <- X[group==1,]
Af <- X[group==0,]
xbar1 <- apply(Af, 2, mean)
S1 <- var(Af)
N1 <- dim(Af)[1]
xbar2 <- apply(Apf, 2, mean)
S2 <- var(Apf)
N2 <- dim(Apf)[1]
S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
Sinv=solve(S)
d<-xbar1-xbar2
b <- Sinv %*% d
v <- X %*% b",

I get;

     [,1]
[1,] 164.4283
[2,] 166.2492
[3,] 170.5232
[4,] 156.5622
[5,] 127.7540
[6,] 136.7704
[7,] 136.3436

It seems there is an extra step that I am missing? The predictstep thatadds a constant to the second set of values? Can anyone clearthis up for

me?



David Winsemius, MD
Heritage Laboratories
West Hartford, CT


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help with lda function from MASS package

Reply via email to