Jonathan, It's still hard to tell. Try this:
options(warn = 1) # see ?options for explanation
## RUN YOUR CODE
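To narrow things down inside a long loop, it can help to note the iteration at which each warning fires. The following is only a sketch: the `values` vector and the `flagged` variable are made up for illustration, and `as.numeric()` stands in for whatever step of the real code raises the warning.

```r
# Print each warning as soon as it is raised rather than at the end
old <- options(warn = 1)

# Hypothetical stand-in data: the third entry cannot be converted
values <- c("0.25", "1.5", "n/a", "3")
flagged <- integer(0)

for (i in seq_along(values)) {
  withCallingHandlers(
    as.numeric(values[i]),
    # Note the iteration; the warning itself still prints as usual
    warning = function(w) flagged <<- c(flagged, i)
  )
}

options(old)  # restore the previous warning setting
flagged       # iterations that raised a warning
```

Wrapping the `predict()` call or the `as.integer()` call of the real loop the same way should show which slices are involved.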
Regards, Sundar
Jonathan Greenberg wrote:
It's hard for me to pinpoint where this is happening, since I'm working on an image that's about 10000 x 20000 pixels and 12 bands deep, and I'm using a set of for-next loops to pull out subsections of data. I can guarantee the input values are all floating point values.
To be more specific, I have created a classification tree, and I want to apply it to that large floating point image (all the band names match up) and write the prediction (probability) values to a file. What happens if a decision tree tries to classify a set of input values that are completely outside of the range of the input tree?
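One way to probe the out-of-range question is to fit a small tree and predict on deliberately extreme inputs. The sketch below swaps in `rpart` and the built-in `iris` data purely so that it runs on a stock R installation; it is not the `tree` package used in this thread, and the same check would need to be repeated with `predict()` on the actual `tree` object before drawing conclusions. Because each split is a threshold comparison, the expectation is that extreme values still fall through to some leaf.

```r
library(rpart)

# Small classification tree on a built-in data set (stand-in for the real tree)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Two made-up observations far outside the range of the training data
extreme <- data.frame(
  Sepal.Length = c(-100, 1e6),
  Sepal.Width  = c(-100, 1e6),
  Petal.Length = c(-100, 1e6),
  Petal.Width  = c(-100, 1e6)
)

# One row of class probabilities per input row
probs <- predict(fit, newdata = extreme, type = "prob")
probs
rowSums(probs)
```

If the row count of the result matches the row count of the input here, rows silently dropped by the tree would not explain a short output file, at least for this stand-in.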
Here's the code I was using. I should mention that this worked on a small subset (400 x 400 pixels) that wouldn't have any "weird" values (negative or zero). The output file from this is turning out to be slightly smaller than it should be given the samples, lines, bands and number type, which is why I'm wondering if the tree is simply dropping those "bad" values rather than giving them some value (e.g. 0):
## Creating the tree
library(tree)
bands=12
bandnames<-paste(c("B"),1:bands,sep="")
treetraindata=read.csv("classtrainshad040205.csv",header=TRUE)
names(treetraindata)[2:6]<-bandnames[1:5]
names(treetraindata)[8:14]<-bandnames[6:12]
treetraindata$Class_Name<-as.factor(treetraindata$Class_Name)
## Create an overfit tree
treetrain<-tree(Class_Name ~ B1 + B2 + B3 + B4+B5+B6+B7+B8+B9+B10+B11+B12,treetraindata,mincut=1,minsize=2,mindev=0)
## Extracts a slice of data out of an ENVI BSQ file
envigetslice<-function(fileconnection,samples,lines,bands,interleave,datatype,maxpixels) {
  currentloc=seek(fileconnection,where=NA,origin="current")
  ## If data is integer
  if(datatype==3) {
    numbersize=2
    datatype=integer()
    if ((samples*lines)-(currentloc/numbersize) < maxpixels)
      maxpixels=(samples*lines)-(currentloc/numbersize)
    envislice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
    newloc=seek(fileconnection,where=NA,origin="current")
    if (bands > 1) {
      for (i in 1:(bands-1)) {
        seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
        currentslice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
        envislice=data.frame(envislice,currentslice)
      }
    }
  }
  ## If data is floating point
  if(datatype==4) {
    numbersize=4
    if ((samples*lines)-(currentloc/numbersize) < maxpixels)
      maxpixels=(samples*lines)-(currentloc/numbersize)
    envislice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
    newloc=seek(fileconnection,where=NA,origin="current")
    if (bands > 1) {
      for (i in 1:(bands-1)) {
        seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
        currentslice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
        envislice=data.frame(envislice,currentslice)
      }
    }
  }
  seek(fileconnection,where=newloc,origin="start")
  envislice
}
## Read ENVI files in subsets
## interleave: 1=bsq
## datatype: (follows ENVI format):
## 3: long integer
## 4: floating point
## Apply the classifier
imageclasstree<-function(infile,outfile,dectree,samples,lines,bands,interleave,datatype,maxpixels) {
  fileconnection<-file(infile,open="rb")
  outfileconnection=file(outfile,open="wb")
  numpixels = samples * lines
  numslices=ceiling(numpixels/maxpixels)
  if (numslices == floor(numpixels/maxpixels))
    numslices=numslices-1
  bandnames<-paste(c("B"),1:bands,sep="")
  ## Loop for processing images
  for(j in 0:numslices) {
    print((j/numslices)*100)
    envislice<-envigetslice(fileconnection,samples,lines,bands,interleave,datatype,maxpixels)
    names(envislice)<-bandnames
    predictslice<-predict(treetrain,envislice,type=c("vector"))
    predictslice<-as.integer(round(as.vector(t(predictslice*10000)),digits=0))
    predictslice
    writeBin(predictslice,outfileconnection,size=2)
  }
  close(fileconnection)
  close(outfileconnection)
}
imageclasstree("flt4aall","flt4adt", treetrain,11216,18173,12,1,4,25000)
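Since the symptom is an output file that is slightly too small, it may be worth checking the slice bookkeeping independently of the tree. The sketch below is a hedged illustration, not a fix: `slice_sizes()` and `expected_bytes()` are hypothetical helpers, and `nclasses <- 5` is a made-up value that should be replaced with the real number of classes in `Class_Name`. It assumes that `predict(..., type = "vector")` yields one probability per class per pixel and that each value is written as a 2-byte integer, as in the posted code. Incidentally, if I read the posted loop correctly, `for(j in 0:numslices)` makes one more pass than there are slices whenever the pixel count is not an exact multiple of `maxpixels`.

```r
# Sizes of the chunks needed to cover numpixels in steps of maxpixels
slice_sizes <- function(numpixels, maxpixels) {
  nfull <- numpixels %/% maxpixels       # number of full-size slices
  remainder <- numpixels %% maxpixels    # pixels left over for a final partial slice
  c(rep(maxpixels, nfull), if (remainder > 0) remainder)
}

# Bytes expected in the output: one value per class per pixel
expected_bytes <- function(numpixels, nclasses, bytes_per_value = 2) {
  numpixels * nclasses * bytes_per_value
}

numpixels <- 11216 * 18173
sizes <- slice_sizes(numpixels, 25000)
length(sizes)             # how many slices are actually needed
sum(sizes) == numpixels   # every pixel is covered exactly once

nclasses <- 5             # hypothetical: use the real number of classes
expected_bytes(numpixels, nclasses)
# Compare against the real file, e.g. file.info("flt4adt")$size
```

The difference between the expected and actual byte counts, divided by `nclasses * 2`, gives the number of pixels that never made it to the file, which may hint at whether a whole slice or only scattered pixels went missing.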
On 2/18/04 2:25 PM, "Sundar Dorai-Raj" <[EMAIL PROTECTED]> wrote:
Jonathan Greenberg wrote:
I'm running a decision tree on a large dataset, and I'm getting multiple instances of "NAs introduced by coercion" (> 50). What does this mean?
--j
My guess would be you're trying to convert from character to numeric and are unable to do so. As in,
as.numeric("A")
[1] NA
Warning message:
NAs introduced by coercion
as.numeric("1")
[1] 1
But without more information from you it's impossible to tell.
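If the character-to-numeric guess is right, the offending entries can be listed directly. This is a minimal sketch on a made-up vector `x`; in practice `x` would be whichever column or vector is being converted.

```r
# Hypothetical input: one entry is not a number, one is already missing
x <- c("12.5", "7", "A", "1e3", NA)

# Convert quietly, then look for values that became NA only during conversion
converted <- suppressWarnings(as.numeric(x))
bad <- which(is.na(converted) & !is.na(x))

bad      # positions of the entries that failed to convert
x[bad]   # the offending strings themselves
```

Entries that were already `NA` before conversion are excluded, so only genuine coercion failures are reported.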
See the posting guide at
http://www.R-project.org/posting-guide.html
Regards, Sundar