I'm trying to fit a naive Bayes model and predict on a new data set using the functions naiveBayes and predict (package = e1071).
R version 2.5.1 on a Linux machine.

My data set looks like this. "class" is the response and k1 - k3 are the independent variables. All of them are factors. The response has 52 levels and k1 - k3 have 2-6 levels. I have about 9,300 independent variables but omit the long list here for simple demonstration. There are no missing values in the observations.

class k1 k2 k3
1     0  0  1
8     0  0  0

# model fitting; I also tried setting laplace=0, but that didn't help
nbmodel <- naiveBayes(class~., data=train, laplace=1)
# predict
nb.fit <- predict(nbmodel, x.test[,-1])

Fitting the model gave me no trouble, and R also returned predictions for some of my large data sets. For other data sets, R can fit the model (no error message, and nbmodel$tables looks OK), but when I invoke the predict function it keeps giving me the following message:

# my data set has 1 response variable and 9318 independent variables
Error in FUN(1:9319[[4L]], ...) : subscript out of bounds

# Here's what traceback() returns
10: FUN(1:9319[[4L]], ...)
9: lapply(X, FUN, ...)
8: sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   })
7: log(sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   }))
6: apply(log(sapply(1:nattribs, function(v) {
       nd <- ndata[v]
       if (is.na(nd))
           rep(1, length(object$apriori))
       else {
           prob <- if (isnumeric[v]) {
               msd <- object$tables[[v]]
               dnorm(nd, msd[, 1], msd[, 2])
           }
           else object$tables[[v]][, nd]
           prob[prob == 0] <- threshold
           prob
       }
   })), 1, sum)
5: FUN(1:30[[1L]], ...)
4: lapply(X, FUN, ...)
3: sapply(1:nrow(newdata), function(i) {
       ndata <- newdata[i, ]
       L <- log(object$apriori) + apply(log(sapply(1:nattribs, function(v) {
           nd <- ndata[v]
           if (is.na(nd))
               rep(1, length(object$apriori))
           else {
               prob <- if (isnumeric[v]) {
                   msd <- object$tables[[v]]
                   dnorm(nd, msd[, 1], msd[, 2])
               }
               else object$tables[[v]][, nd]
               prob[prob == 0] <- threshold
               prob
           }
       })), 1, sum)
       if (type == "class")
           L
       else {
           L <- exp(L)
           L/sum(L)
       }
   })
2: predict.naiveBayes(nbmodel, validf[1:30, ])
1: predict(nbmodel, validf[1:30, ])

Does anyone have an idea what went wrong?

Thanks in advance.
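
In case it helps with diagnosis, here is a minimal sketch of a check that might narrow this down. It assumes (this is a guess, not something the traceback proves) that the failure comes from the categorical lookup object$tables[[v]][, nd], which would go out of bounds if a test row carries a factor level, or a column, that the fitted tables do not have. The toy data frames and the helper names unseen_levels and extra_columns are hypothetical, and the sketch uses base R only rather than e1071 itself.

```r
# Hypothetical toy data standing in for the real training and test sets.
train <- data.frame(
  class = factor(c("1", "8", "1", "8")),
  k1    = factor(c("0", "0", "1", "1")),
  k2    = factor(c("0", "1", "0", "1"))
)
test <- data.frame(
  k1 = factor(c("0", "2")),  # level "2" never occurs in train$k1
  k2 = factor(c("1", "0"))
)

# A conditional table for k1, shaped like the per-variable tables in the
# traceback: rows are classes, columns are levels of the predictor.
tab_k1 <- prop.table(table(train$class, train$k1), 1)

# Asking the table for a column it does not have raises an error;
# tryCatch turns the error into its message so the script keeps running.
lookup <- tryCatch(tab_k1[, "2"], error = function(e) conditionMessage(e))
print(lookup)

# Hypothetical helper: for each shared predictor, list the test levels
# that never appear in the training data.
unseen_levels <- function(train, test) {
  vars <- intersect(names(train), names(test))
  out <- lapply(vars, function(v) {
    setdiff(unique(as.character(test[[v]])), levels(factor(train[[v]])))
  })
  names(out) <- vars
  out[lengths(out) > 0]
}

# Hypothetical helper: columns of the new data that are not predictors
# of the fitted model (for example, a response column left in by mistake).
extra_columns <- function(train, test, response = "class") {
  setdiff(names(test), setdiff(names(train), response))
}

print(unseen_levels(train, test))
print(extra_columns(train, test))
```

On the real data the same two checks could be run on train and on the data frame passed to predict. Note that the traceback call passes validf[1:30, ] and loops over 1:9319, one more than the 9318 independent variables, so a leftover response column is worth ruling out alongside unseen levels.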