Hi all, I've encountered an issue using svm() from e1071 when the new data passed to predict() does not contain the full range of factor levels that were present in the training data.
I've constructed this really primitive example to illustrate the point:

> library(e1071)
> training.data <- data.frame(x = c("yellow","red","yellow","red"),
+                             a = c("alpha","alpha","beta","beta"),
+                             b = c("a", "b", "a", "c"))
> my.model <- svm(x ~ ., data=training.data)
> test.data <- data.frame(x = c("yellow","red"),
+                         a = c("alpha","beta"),
+                         b = c("a", "b"))
> predict(my.model, test.data)
Error in predict.svm(my.model, test.data) : test data does not match model !
>
> levels(test.data$b) <- levels(training.data$b)
> predict(my.model, test.data)
     1      2
yellow    red
Levels: red yellow

In the first case test.data$b does not have the level "c" and this results in the input data being rejected. I've debugged this down to the point of model matrix creation in the SVM R code. Once I fill up the levels in the test data with the levels from the original data, then there is no problem at all.

Assuming my test data has to come from another source where the number of category levels seen might not always be as large as those for the original training data, is there a better way I should be handling this?

Thanks
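
For reference, the level-filling workaround described above could be generalised to every factor column along these lines. This is only a minimal sketch in base R, not anything provided by e1071: align_levels is a hypothetical helper name, and stringsAsFactors = TRUE is an assumption added so the columns are factors regardless of the R version's default. It matches levels by label rather than by position, since assigning with levels(x) <- ... renames levels positionally and could silently relabel values if the test data happened to be missing a level from the middle of the set.

```r
# Hypothetical helper (not part of e1071): give every factor column of
# `newdata` the same level set as the matching column in `train`.
align_levels <- function(newdata, train) {
  for (col in intersect(names(newdata), names(train))) {
    if (is.factor(train[[col]])) {
      # Match by label, not position; values never seen in training become NA.
      newdata[[col]] <- factor(as.character(newdata[[col]]),
                               levels = levels(train[[col]]))
    }
  }
  newdata
}

# Same toy data as in the post; stringsAsFactors = TRUE is an assumption
# so that the columns are factors on any R version.
training.data <- data.frame(x = c("yellow", "red", "yellow", "red"),
                            a = c("alpha", "alpha", "beta", "beta"),
                            b = c("a", "b", "a", "c"),
                            stringsAsFactors = TRUE)
test.data <- data.frame(x = c("yellow", "red"),
                        a = c("alpha", "beta"),
                        b = c("a", "b"),
                        stringsAsFactors = TRUE)

test.data <- align_levels(test.data, training.data)
levels(test.data$b)  # now includes "c" even though no test row uses it
```

After this step test.data carries the training level sets, so it should be in the same state as after the manual levels() assignment in the transcript; whether predict() then accepts it is as reported there, not something this sketch checks.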