Rusers: I am trying to apply a quadratic discriminant function to find the best classification outcomes. 1 is assigned to the values greater than a threshold value; and 0 otherwise. I would like to see how the apparent error rates and the optimal error rate change with increasing threshold values.
I have a 1000*10 data matrix: n=1000 and p=10. Here is what I wrote so far, but seems to be inefficient. I appreciate if someone help me out. library(foreign) library(MASS) D<-read.dbf('data/Indianapolis015.dbf') # import a data # data looks like this LONGLAB X Y Perimeter Area X_UTM Y_UTM F0 F1 F2 1 TAZ 18011:1000 -86.25985 39.95286 2.061630 0.1862549 50600.38 4435792 235 0 35 2 TAZ 18011:1001 -86.31030 39.97591 3.657006 0.7305006 46440.80 4438608 0 0 0 3 TAZ 18011:1002 -86.29542 39.97054 3.516089 0.6408084 47677.31 4437936 155 0 15 4 TAZ 18011:1003 -86.27574 39.97294 5.000185 1.2592142 49374.91 4438102 835 0 55 5 TAZ 18011:1004 -86.25967 39.97197 4.788531 1.1930984 50741.38 4437913 425 0 80 6 TAZ 18011:1005 -86.29245 39.98580 6.189141 1.6734483 48031.44 4439616 185 0 35 7 TAZ 18011:1006 -86.24899 39.98259 7.525633 2.0564466 51723.80 4439040 505 0 45 8 TAZ 18011:1007 -86.30974 39.99014 3.773037 0.7790234 46583.20 4440186 30 0 10 9 TAZ 18011:1008 -86.27151 39.99040 4.589226 1.2212674 49850.92 4440021 40 0 0 10 TAZ 18011:9215 -86.58085 40.13588 37.278521 69.6681954 24438.13 4457794 2095 85 200 thrs<-seq(1000,10000,length=50) ED<-D[,383]/D[,5] # employment density CBDx<-D[,6]-58277.194363 # convert a coordinate for x CBDy<-D[,7]-4414486.03135 # convert a coordinate for y AER<-vector("numeric",length(thrs)) OER<-vector("numeric",length(thrs)) MER<-vector("numeric",length(thrs)) # compute the apparent error rates for each threshold value for (j in 1:length(thrs)){ ctgy<-ifelse(ED>thrs[j],2,1) # 2 categories are created by the threshold test1<-qda(cbind(ED,CBDx,CBDy),ctgy) est1<-cbind(ctgy,predict(test1)$class) AER[j]<-sum((est1[,1]-est1[2])==0)/dim(D)[1] } # OER computation for ith location taken out for the thresholds for (k in 1:dim(D)[1]){ for (j in 1:length(thrs)){ ctgy<-ifelse(ED>thrs[j],2,1) test2<-qda(cbind(ED[-k],CBDx[-k],CBDy[-k]),ctgy[-k]) est2<-cbind(ctgy[-k],predict(test2)$class) OER[j]<-mean(sum((est2[,1]-est2[2])==0)/(dim(D)[1]-1)) }} ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.