Jonathan,
  It's still hard to tell. Try this:

options(warn = 1) # see ?options for explanation

## RUN YOUR CODE
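
A rough sketch of what the setting changes, not part of the original suggestion. With warn = 1 each warning prints as soon as it is raised, so it shows up next to any progress output. With warn = 2 a warning is turned into an error, which stops the run at the offending call. The helper noisy_step is a hypothetical stand-in for one step of the real loop.

```r
old <- options(warn = 1)   # print each warning immediately instead of deferring it

# Hypothetical stand-in for one step of a long-running loop
noisy_step <- function(x) as.numeric(x)

for (v in c("1", "A", "3")) {
  cat("processing", v, "\n")
  noisy_step(v)            # the warning for "A" prints right after "processing A"
}

# With warn = 2 a warning becomes an error, so the offending call can be caught
options(warn = 2)
result <- tryCatch(noisy_step("A"), error = function(e) conditionMessage(e))
print(result)

options(old)               # restore the previous setting
```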

Regards,
Sundar


Jonathan Greenberg wrote:


It's hard for me to pinpoint where this is happening, since I'm working on an
image that's about 10000 x 20000 pixels and 12 bands deep, and I'm using a
set of for-next loops to pull out subsections of data.  I can guarantee the
input values are all floating point values.

To be more specific, I have created a classification tree, and I want to
apply it to that large floating point image (all the band names match up)
and write the prediction (probability) values to a file.  What happens if a
decision tree tries to classify a set of input values that are completely
outside of the range of the input tree?
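
For a numeric predictor, each split in a tree is a threshold comparison, so a value far outside the training range still follows one branch and reaches a leaf. A minimal base-R sketch of that idea, using a hypothetical one-split stump rather than the tree package itself (the split point and leaf probabilities are made up, and the tree package's own handling of missing values is not reproduced here):

```r
# Hypothetical stump: one split on B1 at 0.5, with a class probability per leaf
stump_predict <- function(b1) {
  ifelse(b1 < 0.5, 0.9, 0.2)   # probability of class "A" in the left/right leaf
}

# Suppose the training data spanned 0..1; two of these inputs lie far outside it
inrange_and_outliers <- stump_predict(c(-1000, 0.3, 0.7, 1e6))
inrange_and_outliers

# A missing value is different: in this sketch the comparison itself yields NA
missing_case <- stump_predict(NA_real_)
missing_case
```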

Here's the code I was using.  I should mention that this worked on a small
subset (400 x 400 pixels) that wouldn't have any "weird" values (negative or
zero).  The output file from this is turning out to be slightly smaller than
it should be given the samples, lines, bands and number type, which is why I'm
wondering if the tree is simply dropping those "bad" values rather than
giving them some value (e.g. 0):

## Creating the tree
library(tree)
bands=12
bandnames<-paste(c("B"),1:bands,sep="")
treetraindata=read.csv("classtrainshad040205.csv",header=TRUE)
names(treetraindata)[2:6]<-bandnames[1:5]
names(treetraindata)[8:14]<-bandnames[6:12]
treetraindata$Class_Name<-as.factor(treetraindata$Class_Name)

## Create an overfit tree
treetrain<-tree(Class_Name ~ B1 + B2 + B3 +
B4+B5+B6+B7+B8+B9+B10+B11+B12,treetraindata,mincut=1,minsize=2,mindev=0)

## Extracts a slice of data out of an ENVI BSQ file
envigetslice<-function(fileconnection,samples,lines,bands,interleave,datatype,maxpixels) {
  ## Remember where this slice starts within band 1
  currentloc=seek(fileconnection,where=NA,origin="current")
  ## If data is integer
  if(datatype==3) {
    numbersize=2
    ## Shrink the final slice to the pixels that remain
    if ((samples*lines)-(currentloc/numbersize) < maxpixels)
      maxpixels=(samples*lines)-(currentloc/numbersize)
    envislice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
    newloc=seek(fileconnection,where=NA,origin="current")
    if (bands > 1) {
      for (i in 1:(bands-1)) {
        ## Jump to the same pixels in the next band
        seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
        currentslice <- readBin(fileconnection,integer(),maxpixels,size=numbersize)
        envislice=data.frame(envislice,currentslice)
      }
    }
  }
  ## If data is floating point
  if(datatype==4) {
    numbersize=4
    ## Shrink the final slice to the pixels that remain
    if ((samples*lines)-(currentloc/numbersize) < maxpixels)
      maxpixels=(samples*lines)-(currentloc/numbersize)
    envislice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
    newloc=seek(fileconnection,where=NA,origin="current")
    if (bands > 1) {
      for (i in 1:(bands-1)) {
        ## Jump to the same pixels in the next band
        seek(fileconnection,where=currentloc+(samples*lines*numbersize*i),origin="start")
        currentslice <- readBin(fileconnection,double(),maxpixels,size=numbersize)
        envislice=data.frame(envislice,currentslice)
      }
    }
  }
  ## Return to the end of this slice in band 1, ready for the next call
  seek(fileconnection,where=newloc,origin="start")
  envislice
}
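
The seek arithmetic in envigetslice relies on the band-sequential (BSQ) layout, in which all pixels of band 1 are stored first, then all pixels of band 2, and so on. A small self-contained sketch of that layout, assuming a toy 4-byte float file written to a temporary path (the file, sizes and variable names here are illustrative and not taken from the original code):

```r
npix <- 6L; nbands <- 3L; numbersize <- 4L

# Write a toy BSQ file: all pixels of band 1, then band 2, then band 3
path <- tempfile(fileext = ".bsq")
values <- matrix(seq_len(npix * nbands) / 2, nrow = npix, ncol = nbands)
con <- file(path, open = "wb")
writeBin(as.vector(values), con, size = numbersize)
close(con)

# Read pixels 3..4 (0-based offset 2, length 2) from every band
first <- 2L; n <- 2L
con <- file(path, open = "rb")
slice <- sapply(seq_len(nbands), function(k) {
  # Byte offset of this pixel block inside band k
  seek(con, where = ((k - 1L) * npix + first) * numbersize, origin = "start")
  readBin(con, what = double(), n = n, size = numbersize)
})
close(con)

colnames(slice) <- paste("B", seq_len(nbands), sep = "")
slice   # one row per pixel, one column per band
```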


## Read ENVI files in subsets
## interleave: 1=bsq
## datatype: (follows ENVI format):
##    3: long integer
##    4:floating point


## Apply the classifier
imageclasstree<-function(infile,outfile,dectree,samples,lines,bands,interleave,datatype,maxpixels) {

  fileconnection<-file(infile,open="rb")
  outfileconnection=file(outfile,open="wb")

  numpixels = samples * lines
  numslices=ceiling(numpixels/maxpixels)
  if (numslices == floor(numpixels/maxpixels)) numslices=numslices-1

  bandnames<-paste(c("B"),1:bands,sep="")

  ## Loop for processing images
  for(j in 0:numslices) {
    print((j/numslices)*100)
    envislice<-envigetslice(fileconnection,samples,lines,bands,interleave,datatype,maxpixels)
    names(envislice)<-bandnames
    ## Class probabilities from the tree passed in as dectree
    predictslice<-predict(dectree,envislice,type=c("vector"))
    ## Scale to integers, ordered pixel by pixel
    predictslice<-as.integer(round(as.vector(t(predictslice*10000)),digits=0))
    writeBin(predictslice,outfileconnection,size=2)
  }
  close(fileconnection)
  close(outfileconnection)
}
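
The slice bookkeeping in this function can be checked by hand. A short sketch using the image dimensions from the call in this message. Note that when the pixel count is not an exact multiple of maxpixels, a loop over 0:numslices runs once more than there are slices, so the final pass would ask for zero pixels. Whether that matters here is an open question, not a confirmed diagnosis.

```r
samples <- 11216; lines <- 18173; maxpixels <- 25000

numpixels <- samples * lines
numslices <- ceiling(numpixels / maxpixels)
lastslice <- numpixels - (numslices - 1) * maxpixels

numpixels   # 203828368
numslices   # 8154
lastslice   # 3368 pixels in the final, partial slice

# Iterations of the loop as written (after its adjustment for exact multiples)
loopslices <- numslices
if (loopslices == floor(numpixels / maxpixels)) loopslices <- loopslices - 1
length(0:loopslices)   # 8155, one more than numslices
```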


imageclasstree("flt4aall","flt4adt", treetrain,11216,18173,12,1,4,25000)
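
One way to test the "dropped values" suspicion is to compare the output file's size with the expected byte count. writeBin should write one element per input value, missing values included, so a short file would more likely point to fewer values being produced or to the loop stopping early. A toy sketch with a temporary file (the names and values are illustrative); for the full image the expected size would be samples * lines * (number of classes) * 2 bytes, assuming one probability per class per pixel.

```r
out <- tempfile()
con <- file(out, open = "wb")
vals <- c(1234L, NA, 10000L)       # includes a missing value
writeBin(vals, con, size = 2)      # 2 bytes per element
close(con)

expected <- length(vals) * 2       # bytes that should be on disk
actual <- file.info(out)$size      # bytes that are on disk
c(expected = expected, actual = actual)
```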

On 2/18/04 2:25 PM, "Sundar Dorai-Raj" <[EMAIL PROTECTED]> wrote:



Jonathan Greenberg wrote:



I'm running a decision tree on a large dataset, and I'm getting multiple
instances of "NAs introduced by coercion" (> 50).  What does this mean?

--j


My guess would be you're trying to convert from character to numeric and are unable to do so. As in,


as.numeric("A")

[1] NA
Warning message:
NAs introduced by coercion

as.numeric("1")

[1] 1


But without more information from you it's impossible to tell.
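
To find which entries trigger the warning, one option is to compare missingness before and after the conversion. A minimal sketch with made-up values:

```r
x <- c("1", "A", "2.5", NA, "3e2", "n/a")

num <- suppressWarnings(as.numeric(x))   # convert quietly
bad <- is.na(num) & !is.na(x)            # NA only after conversion

which(bad)   # positions of the offending entries
x[bad]       # the values that could not be converted
```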

See the posting guide at

http://www.R-project.org/posting-guide.html

Regards,
Sundar




