Thanks for clarifying!

On Thu, Jan 10, 2013 at 12:47 PM, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:
>
>
> On 08.01.2013 21:14, Claus O'Rourke wrote:
>>
>> Hi all,
>> I've encountered an issue using svm (e1071) in the specific case of
>> supplying new data which may not have the full range of levels that
>> were present in the training data.
>>
>> I've constructed this really primitive example to illustrate the point:
>>
>>> library(e1071)
>>> training.data <- data.frame(x = c("yellow","red","yellow","red"), a = c("alpha","alpha","beta","beta"), b = c("a", "b", "a", "c"))
>>> my.model <- svm(x ~ .,data=training.data)
>>> test.data <- data.frame(x = c("yellow","red"), a = c("alpha","beta"), b = c("a", "b"))
>>> predict(my.model,test.data)
>>
>> Error in predict.svm(my.model, test.data) :
>>   test data does not match model !
>>>
>>>
>>> levels(test.data$b) <- levels(training.data$b)
>>> predict(my.model,test.data)
>>
>>      1      2
>> yellow    red
>> Levels: red yellow
>>
>> In the first case test.data$b does not have the level "c" and this
>> results in the input data being rejected. I've debugged this down to
>> the point of model matrix creation in the SVM R code. Once I fill up
>> the levels in the test data with the levels from the original data,
>> then there is no problem at all.
>>
>> Assuming my test data has to come from another source where the number
>> of category levels seen might not always be as large as those for the
>> original training data, is there a better way I should be handling
>> this?
>
>
>
> You have to tell the factor about the possible levels; it does not
> necessarily contain examples of each.
> That means:
>
> levels(test.data$b) <- c("a", "b", "c")
> predict(my.model,test.data)
>
> will help.
>
> Best,
> Uwe Ligges
>
>
>
>> Thanks
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
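
For the archives, here is a minimal sketch of how the advice above might be applied to a whole data frame. One caveat I am fairly sure of: `levels(x) <- ...` relabels the existing levels by position, so it is only safe when the levels present in the test data are the leading levels of the training factor, in the same order (as with "a", "b" above). Rebuilding the column with `factor(x, levels = ...)` matches by label instead. The helper `align_levels` is a hypothetical name of my own, not part of e1071, and the test data is altered from the example in the thread so that a middle level is missing. `stringsAsFactors = TRUE` is spelled out because R 4.0.0 and later no longer convert strings to factors by default.

```r
# Hypothetical helper (not part of e1071): re-encode each factor column of
# `new` using the level set of the same-named column in `train`.
align_levels <- function(new, train) {
  for (nm in intersect(names(new), names(train))) {
    if (is.factor(train[[nm]])) {
      # factor() matches by label; values never seen in training become NA
      new[[nm]] <- factor(as.character(new[[nm]]),
                          levels = levels(train[[nm]]))
    }
  }
  new
}

training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE
)

# Assumption for illustration: level "b" is the one missing here, so a
# positional `levels<-` assignment would silently turn "c" into "b".
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "c"),
  stringsAsFactors = TRUE
)

aligned <- align_levels(test.data, training.data)
levels(aligned$b)        # the full training level set
as.character(aligned$b)  # the original values, unchanged

# With my.model fitted as in the thread (requires e1071):
# predict(my.model, aligned)
```

Values that appear only in the new data come back as NA from `factor()`, so it is probably worth checking `anyNA(aligned)` before calling `predict`.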