[R] package pamr: pamr.adaptthresh() error rates

Tim Smith Thu, 27 Apr 2006 05:49:25 -0700

   
Hi,

I was working on a classification problem using the pamr package. I used the
pamr.adaptthresh() function to find the optimal accuracy of the classifier. I
must not be doing it right, since it doesn't return the threshold values for
optimum classification.

For example, if I run it on a dataset, I get the following result using
pamr.adaptthresh():

      predicted
true   (1) (2)
  (1)   32   8
  (2)    5  17

i.e., a misclassification rate of (5 + 8) / (32 + 8 + 5 + 17).

However, if I just use an arbitrary threshold (in this case, I chose '2'), I
get the following result:

      predicted
true   (1) (2)
  (1)   35   5
  (2)    5  17

i.e., a misclassification rate of (5 + 5) / (35 + 5 + 5 + 17), which is clearly
better than the one that I got from using pamr.adaptthresh().

Am I doing something wrong? What do I need to do to ensure that
pamr.adaptthresh() returns the lowest misclassification error rate?

I have tried using different values for 'ntries' and 'reduction.factor' in
pamr.adaptthresh(), without any success.

I have reproduced my code below. Any comments would be appreciated!

Thanks.

########################### CODE #################################

library(base)
library(graphics)
library(pamr)

rm(list = ls())
gc()

makeColon <- function(){
  # This dataset has 24 cancer, and 9 normal samples
  n2 <- read.table("data/Colon.data", header = FALSE, sep = ",")

  cancdat <- n2[, n2[1,] == 'tumor']
  normdat <- n2[, n2[1,] == 'normal']
  cancdat <- cancdat[-1,]
  normdat <- normdat[-1,]
  mat <- as.matrix(cbind(cancdat, normdat))
  actclass <- rep(c(1, 2), c(ncol(cancdat), ncol(normdat)))
  return(list(mat, actclass))
}


m <- makeColon()
mat <- m[[1]]
actclass <- m[[2]]
mat <- matrix(as.numeric(mat), nrow(mat), ncol(mat))

geneid = as.character(1:nrow(mat))
gs = as.character(1:nrow(mat))
mydata <- list(x = mat, y = factor(actclass), geneid = geneid, genenames = gs)

mytrain <- pamr.train(mydata)
new.scales <- pamr.adaptthresh(mytrain, ntries = 10, reduction.factor = 0.9)

mytrain2 <- pamr.train(mydata, threshold.scale = new.scales)
mycv <- pamr.cv(mytrain2, mydata, nfold = 10)

res1 <- pamr.confusion(mycv, threshold = mytrain2$threshold.scale, extra = FALSE)
print(res1)

res2 <- pamr.confusion(mycv, threshold = 2, extra = FALSE)
print(res2)

########################### END CODE ###############################
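
As a sanity check on the arithmetic quoted above, the two misclassification
rates can be recomputed directly from the confusion tables. This is only a
minimal base-R sketch: the names conf_adapt, conf_fixed and misclass_rate are
made up for illustration and are not part of pamr, and the counts are simply
copied from the two tables in the message.

```r
# Confusion tables copied from the message (rows = true class, columns = predicted).
# conf_adapt, conf_fixed and misclass_rate are illustrative names, not pamr objects.
classes <- c("(1)", "(2)")

conf_adapt <- matrix(c(32,  8,
                        5, 17),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(true = classes, predicted = classes))

conf_fixed <- matrix(c(35,  5,
                        5, 17),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(true = classes, predicted = classes))

# Misclassification rate = off-diagonal counts divided by the total count.
misclass_rate <- function(conf) {
  (sum(conf) - sum(diag(conf))) / sum(conf)
}

misclass_rate(conf_adapt)  # 13/62, about 0.21
misclass_rate(conf_fixed)  # 10/62, about 0.16
```

Both tables sum to 62 samples, so the two rates are directly comparable.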
    


                

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
