Hi,

I am working on a classification problem using the pamr package. I used
the pamr.adaptthresh() function to optimize the accuracy of the classifier.
I must not be doing it right, since it does not return the threshold values
for optimum classification. For example, if I run it on a dataset, I get the
following result using pamr.adaptthresh():

            predicted
   true     (1)   (2)
   (1)       32     8
   (2)        5    17

i.e. a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17) = 13/62.

However, if I just use an arbitrary threshold (in this case, I chose 2), I
get the following result:

            predicted
   true     (1)   (2)
   (1)       35     5
   (2)        5    17

i.e. a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17) = 10/62, which
is clearly better than the one that I got from using pamr.adaptthresh().

Am I doing something wrong? What do I need to do to ensure that
pamr.adaptthresh() returns the lowest misclassification error rate? I have
tried using different values for 'ntries' and 'reduction.factor' in
pamr.adaptthresh(), without any success.

I have reproduced my code below. Any comments
would be appreciated! Thanks.

########################### CODE #################################
library(base)
library(graphics)
library(pamr)

rm(list = ls())
gc()

makeColon <- function() {
    # This dataset has 24 cancer, and 9 normal samples
    n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")
    # The first row holds the class labels; split the columns by class
    cancdat <- n2[, n2[1, ] == 'tumor']
    normdat <- n2[, n2[1, ] == 'normal']
    # Drop the label row so only the measurements remain
    cancdat <- cancdat[-1, ]
    normdat <- normdat[-1, ]
    mat <- as.matrix(cbind(cancdat, normdat))
    # Class 1 = tumor columns, class 2 = normal columns
    actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
    return(list(mat, actclass))
}

m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
mat <- matrix(as.numeric(mat), nrow(mat), ncol(mat))

geneid = as.character(1:nrow(mat))
gs = as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)

mytrain <- pamr.train(mydata)
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)
mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
mycv <- pamr.cv(mytrain2, mydata, nfold = 10)

res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)

res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)
########################### END CODE ###############################
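
To make the comparison concrete, here is a minimal base-R sketch of the
error-rate arithmetic for the two confusion tables quoted in this post. The
helper name misclass_rate and the variable names are made up for
illustration and are not part of pamr. The sketch assumes rows are the true
class and columns are the predicted class.

```r
# Misclassification rate of a square confusion table.
# Assumption: rows = true class, columns = predicted class.
# misclass_rate is a hypothetical helper, not a pamr function.
misclass_rate <- function(tab) {
    # Diagonal cells are correct predictions; everything else is an error.
    (sum(tab) - sum(diag(tab))) / sum(tab)
}

class_names <- list(true = c("1", "2"), predicted = c("1", "2"))

# Table reported in the post for the pamr.adaptthresh() run.
tab_adapt <- matrix(c(32,  8,
                       5, 17),
                    nrow = 2, byrow = TRUE, dimnames = class_names)

# Table reported in the post for the arbitrary threshold of 2.
tab_fixed <- matrix(c(35,  5,
                       5, 17),
                    nrow = 2, byrow = TRUE, dimnames = class_names)

rate_adapt <- misclass_rate(tab_adapt)  # (8 + 5) / 62
rate_fixed <- misclass_rate(tab_fixed)  # (5 + 5) / 62

print(c(adaptthresh = rate_adapt, threshold_2 = rate_fixed))
```

Both tables contain the same 62 samples, so the two rates can be compared
directly.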