[R] Levels in new data fed to SVM

Claus O'Rourke Tue, 08 Jan 2013 12:17:33 -0800

Hi all,
I've encountered an issue using svm (e1071) in the specific case of
supplying new data which may not have the full range of levels that
were present in the training data.


I've constructed this really primitive example to illustrate the point:

> library(e1071)
> training.data <- data.frame(x = c("yellow","red","yellow","red"),
+                             a = c("alpha","alpha","beta","beta"),
+                             b = c("a", "b", "a", "c"))
> my.model <- svm(x ~ .,data=training.data)
> test.data <- data.frame(x = c("yellow","red"),
+                         a = c("alpha","beta"),
+                         b = c("a", "b"))
> predict(my.model,test.data)
Error in predict.svm(my.model, test.data) :
  test data does not match model !
>
> levels(test.data$b) <- levels(training.data$b)
> predict(my.model,test.data)
     1      2
yellow    red
Levels: red yellow

In the first case test.data$b does not have the level "c", and this
results in the input data being rejected. I've traced this to the
point where the model matrix is created in the SVM R code. Once I
fill in the levels of the test data with the levels from the training
data, the prediction works.
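
A note on the workaround above: levels(x) <- value relabels the existing
levels by position, so it only gives the right answer here because "a" and
"b" happen to be the first two training levels. If the test data held only
"a" and "c", the "c" values would silently be renamed to "b". A minimal
sketch of a label-matching alternative follows, using base R only. The
helper name align_levels is hypothetical (it is not part of e1071), and
stringsAsFactors = TRUE is an assumption added so the example also behaves
the same on R 4.0 and later.

```r
# Hypothetical helper (not part of e1071): re-encode every factor column of
# `new_data` with the level set of the same-named column in `reference`.
align_levels <- function(new_data, reference) {
  for (col in intersect(names(new_data), names(reference))) {
    if (is.factor(reference[[col]])) {
      # factor(x, levels = ...) matches by label, so values keep their meaning;
      # labels that do not occur in `reference` become NA instead of being renamed.
      new_data[[col]] <- factor(as.character(new_data[[col]]),
                                levels = levels(reference[[col]]))
    }
  }
  new_data
}

training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE  # assumption: force factors regardless of R version
)
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "b"),
  stringsAsFactors = TRUE
)

aligned <- align_levels(test.data, training.data)
levels(aligned$b)  # level set now taken from training.data$b
```

The aligned data frame can then be passed to predict(my.model, aligned) in
place of test.data. Any NA produced by the helper marks a value that was
never seen in training and needs separate handling.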

Assuming my test data has to come from another source, where a factor
may not always contain every level that was seen in the original
training data, is there a better way I should be handling this?

Thanks

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
