I'm trying to fit a naive Bayes model and predict on a new data set using
the functions naiveBayes() and predict() from the e1071 package.

R version 2.5.1 on a Linux machine

My data set looks like this: "class" is the response and k1-k3 are the
independent variables. All of them are factors. The response has 52 levels
and k1-k3 have 2-6 levels each. I actually have about 9,300 independent
variables, but I omit them here to keep the example short. There are no
missing values in the observations.

   class k1 k2 k3
      1  0  0  1
      8  0  0  0
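For anyone who wants to reproduce this without my data, here is a minimal sketch of a synthetic stand-in with the same structure. The values are random and hypothetical, not my real data; only the column names, the 52-level response and the 2-6 predictor levels follow the description above.

```r
# Synthetic, hypothetical stand-in for the real data (random values).
# "class" is a 52-level factor response; k1-k3 are factor predictors
# with 2, 4 and 6 levels, within the 2-6 levels described above.
set.seed(1)
n <- 200
train <- data.frame(
  class = factor(sample(1:52, n, replace = TRUE), levels = 1:52),
  k1    = factor(sample(0:1, n, replace = TRUE), levels = 0:1),
  k2    = factor(sample(0:3, n, replace = TRUE), levels = 0:3),
  k3    = factor(sample(0:5, n, replace = TRUE), levels = 0:5)
)
str(train)
```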

# fit the model (I also tried laplace = 0, but it didn't help)
 nbmodel <- naiveBayes(class~., data=train, laplace=1)

# predict
 nb.fit <- predict(nbmodel, x.test[,-1])
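One thing I'm not sure about is whether the test data has to match the training data column for column and level for level. As a sketch of what I could try, align_to_train() below is my own hypothetical helper (not an e1071 function). It selects the predictor columns by name rather than by position and rebuilds every factor with the training levels, so the integer codes mean the same thing in both data sets.

```r
# Hypothetical helper (not part of e1071): align a test data frame with
# the training data before calling predict().
# - keeps only the predictor columns, in the training column order
# - rebuilds each factor with the training levels; values whose level
#   never occurred in training become NA
align_to_train <- function(test, train, response = "class") {
  preds <- setdiff(names(train), response)
  missing_cols <- setdiff(preds, names(test))
  if (length(missing_cols) > 0)
    stop("test data lacks columns: ", paste(missing_cols, collapse = ", "))
  out <- test[, preds, drop = FALSE]
  for (v in preds) {
    if (is.factor(train[[v]]))
      out[[v]] <- factor(as.character(out[[v]]), levels = levels(train[[v]]))
  }
  out
}
```

Something like nb.fit <- predict(nbmodel, align_to_train(x.test, train)) would then pass only the predictors. Judging from the predict.naiveBayes code in the traceback, an NA value gets rep(1, length(object$apriori)), i.e. log(1) = 0, so an unseen level would effectively be skipped for that row.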

Fitting the model works without problems, and for some of my large data
sets predict() also returns predictions. For other data sets, however, the
model fits without any error message (nbmodel$tables looks OK), but when I
call predict() it keeps giving me the following message:

# my data set has 1 response variable and 9318 independent variables
Error in FUN(1:9319[[4L]], ...) : subscript out of bounds
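The inner sapply() runs over 1:9319, one index per column of the data passed to predict(), while the model has only 9318 predictors, and it fails at attribute 4 ([[4L]]). That may mean the response column was passed along too (the call in the traceback is predict(nbmodel, validf[1:30, ]), not x.test[,-1]). Another possible cause is factor levels that differ between the training and test data. A rough base-R check for the latter, with compare_levels() as a hypothetical helper of my own:

```r
# Hypothetical helper: for each factor column present in both data sets,
# report columns whose level sets differ and which test levels were never
# seen in training. Returns NULL when every shared factor matches exactly.
compare_levels <- function(train, test) {
  shared <- intersect(names(train), names(test))
  rows <- lapply(shared, function(v) {
    a <- train[[v]]
    b <- test[[v]]
    if (!is.factor(a) || !is.factor(b) || identical(levels(a), levels(b)))
      return(NULL)
    data.frame(variable        = v,
               n_train_levels  = nlevels(a),
               n_test_levels   = nlevels(b),
               unseen_in_train = paste(setdiff(levels(b), levels(a)),
                                       collapse = ","),
               stringsAsFactors = FALSE)
  })
  rows <- Filter(Negate(is.null), rows)
  if (length(rows) == 0) return(NULL)
  do.call(rbind, rows)
}
```

Running compare_levels(train, x.test) and comparing length(nbmodel$tables) with the number of columns actually passed to predict() would cover both suspicions.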

# Here's what traceback() returns
10: FUN(1:9319[[4L]], ...)
9: lapply(X, FUN, ...)
8: sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   })
7: log(sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   }))
6: apply(log(sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   })), 1, sum)
5: FUN(1:30[[1L]], ...)
4: lapply(X, FUN, ...)
3: sapply(1:nrow(newdata), function(i) {
       ndata <- newdata[i, ]
       L <- log(object$apriori) + apply(log(sapply(1:nattribs, function(v) {
           nd <- ndata[v]
           if (is.na(nd))
               rep(1, length(object$apriori))
           else {
               prob <- if (isnumeric[v]) {
                   msd <- object$tables[[v]]
                   dnorm(nd, msd[, 1], msd[, 2])
               }
               else object$tables[[v]][, nd]
               prob[prob == 0] <- threshold
               prob
           }
       })), 1, sum)
       if (type == "class")
           L
       else {
           L <- exp(L)
           L/sum(L)
       }
   })
2: predict.naiveBayes(nbmodel, validf[1:30, ])
1: predict(nbmodel, validf[1:30, ])
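The frame that fails is object$tables[[v]][, nd], i.e. the lookup of the training table column for the observed test value. If nd reaches that line as a factor (or as its integer code), R indexes by the code, not by the label, so a test factor with more or differently ordered levels than in training can point past the end of the table or at the wrong column. A small base-R illustration with a made-up 2-level table (the probabilities are hypothetical):

```r
# Hypothetical conditional probability table, shaped like one entry of
# object$tables: rows = classes, columns = levels of k1 seen in training.
tab <- matrix(c(0.7, 0.3,
                0.4, 0.6),
              nrow = 2, byrow = TRUE,
              dimnames = list(class = c("1", "8"), k1 = c("0", "1")))

# Test value from a factor with an extra level "2" (integer code 3).
nd <- factor("2", levels = c("0", "1", "2"))

# Factor subscripts use the integer code, not the label, so this asks
# for column 3 of a 2-column table.
err <- tryCatch(tab[, nd], error = function(e) conditionMessage(e))
print(err)

# A test factor whose only level is "1" gives that value code 1, so
# code-based indexing silently returns the "0" column instead.
nd2 <- factor("1")
wrong_col <- tab[, nd2]

# Indexing by label picks the intended column.
right_col <- tab[, as.character(nd2)]
print(rbind(wrong_col, right_col))
```

If this is the cause, re-creating each test factor with the training levels, e.g. factor(x, levels = levels(train$x)), before calling predict() should make the codes agree.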


Does anyone have an idea what went wrong? Thanks in advance.


______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
