Re: [R] subscript out of bounds Error in predict.naivebayes

2007-08-28 Thread Stephen Weigand
On 8/22/07, Polly He [EMAIL PROTECTED] wrote:
 I'm trying to fit a naive Bayes model and predict on a new data set using
 the functions naiveBayes and predict (package = e1071).

 R version 2.5.1 on a Linux machine

 My data set looks like this. class is the response and k1 - k3 are the
 independent variables. All of them are factors. The response has 52 levels
 and k1 - k3 have 2-6 levels. I have about 9,300 independent variables but
 omit the long list here for simple demonstration. There are no missing
 values in the observations.

class k1 k2 k3
   1  0  0  1
   8  0  0  0

 # model fitting, I also tried setting laplace=0 but didn't help
  nbmodel <- naiveBayes(class~., data=train, laplace=1)

 # predict
  nb.fit <- predict(nbmodel, x.test[,-1])

 First I had no trouble fitting the model. R also returned the predictions
 for some of my large data sets. But for some data sets, R can fit the model
 (no error message, nbmodel$tables look ok). When I invoked the predict
 function, it kept giving me the following message:

 # my data set has 1 response variable and 9318 independent variables
 Error in FUN(1:9319[[4L]], ...) : subscript out of bounds
[...]

In my experience, some predict methods have trouble when
newdata does not have all levels of a factor. This seems
to be the case with predict.naiveBayes:

example(naiveBayes)
predict(model, subset(HouseVotes84, V1 == "n"))

gives

Error in object$tables[[v]] : subscript out of bounds
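
The traceback in the original post shows the predict method indexing
object$tables[[v]] by column position and then by the factor value, so it
may be worth checking newdata for extra columns and for factors whose
levels differ from the training data. A minimal base-R sketch of such a
check; check_newdata, train and test are hypothetical names, and the toy
data is invented for illustration (it is not the poster's data):

```r
# Hypothetical helper (not part of e1071): compare the predictors used for
# training with the data frame that will be passed to predict().
check_newdata <- function(train_x, newdata) {
  problems <- character(0)

  # Columns present on one side only (e.g. a response column left in newdata).
  extra   <- setdiff(names(newdata), names(train_x))
  missing <- setdiff(names(train_x), names(newdata))
  if (length(extra) > 0)
    problems <- c(problems,
                  paste("extra columns:", paste(extra, collapse = ", ")))
  if (length(missing) > 0)
    problems <- c(problems,
                  paste("missing columns:", paste(missing, collapse = ", ")))

  # Factor columns whose level sets differ between the two data frames.
  for (nm in intersect(names(train_x), names(newdata))) {
    if (is.factor(train_x[[nm]]) &&
        !identical(levels(train_x[[nm]]), levels(newdata[[nm]]))) {
      problems <- c(problems, paste("levels differ in column:", nm))
    }
  }
  problems
}

# Toy data, invented for illustration only.
train <- data.frame(class = factor(c(1, 8, 1)),
                    k1    = factor(c(0, 1, 0)),
                    k2    = factor(c(0, 1, 2)))
test  <- data.frame(class = factor(c(8, 1)),
                    k1    = factor(c(0, 0)),
                    k2    = factor(c(2, 1)))

# The response is dropped from the training side but left in `test`.
issues <- check_newdata(train[, -1], test)
print(issues)
```

An empty result would mean the two data frames agree on column names and
factor levels; it does not by itself guarantee that predict will succeed.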

One workaround is to predict for a bigger data set
and retain a subset of the predictions.
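
A rough sketch of that workaround, assuming the model's predict method
accepts a newdata argument. predict_then_subset is a hypothetical helper
name, and lm on the built-in mtcars data is only a stand-in so the code
runs without extra packages; in this thread the model would be the
naiveBayes fit and the bigger data set one that contains every factor
level:

```r
# Hypothetical helper: predict for every row of a larger data frame, then
# keep only the predictions for the rows of interest.
predict_then_subset <- function(model, full_data, keep, ...) {
  preds <- predict(model, newdata = full_data, ...)
  # Some predict methods return a matrix (one row per observation).
  if (is.matrix(preds)) preds[keep, , drop = FALSE] else preds[keep]
}

# Stand-in model and data from base R (an assumption for illustration).
fit  <- lm(mpg ~ wt, data = mtcars)
keep <- mtcars$cyl == 4          # rows whose predictions we want

p <- predict_then_subset(fit, mtcars, keep)
print(length(p) == sum(keep))
```

The point of the design is that predict always sees the full data frame,
so the subsetting happens on the predictions rather than on newdata.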

Hope this helps,

Stephen


-- 
Rochester, Minn. USA

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] subscript out of bounds Error in predict.naivebayes

2007-08-22 Thread Polly He
I'm trying to fit a naive Bayes model and predict on a new data set using
the functions naiveBayes and predict (package = e1071).

R version 2.5.1 on a Linux machine

My data set looks like this. class is the response and k1 - k3 are the
independent variables. All of them are factors. The response has 52 levels
and k1 - k3 have 2-6 levels. I have about 9,300 independent variables but
omit the long list here for simple demonstration. There are no missing
values in the observations.

class k1 k2 k3
   1  0  0  1
   8  0  0  0

# model fitting, I also tried setting laplace=0 but didn't help
 nbmodel <- naiveBayes(class~., data=train, laplace=1)

# predict
 nb.fit <- predict(nbmodel, x.test[,-1])

First I had no trouble fitting the model. R also returned the predictions
for some of my large data sets. But for some data sets, R can fit the model
(no error message, nbmodel$tables look ok). When I invoked the predict
function, it kept giving me the following message:

# my data set has 1 response variable and 9318 independent variables
Error in FUN(1:9319[[4L]], ...) : subscript out of bounds

# Here's what traceback() returns
10: FUN(1:9319[[4L]], ...)
9: lapply(X, FUN, ...)
8: sapply(1:nattribs, function(v) {
   nd <- ndata[v]
   if (is.na(nd))
   rep(1, length(object$apriori))
   else {
   prob <- if (isnumeric[v]) {
   msd <- object$tables[[v]]
   dnorm(nd, msd[, 1], msd[, 2])
   }
   else object$tables[[v]][, nd]
   prob[prob == 0] <- threshold
   prob
   }
   })
7: log(sapply(1:nattribs, function(v) {
   nd <- ndata[v]
   if (is.na(nd))
   rep(1, length(object$apriori))
   else {
   prob <- if (isnumeric[v]) {
   msd <- object$tables[[v]]
   dnorm(nd, msd[, 1], msd[, 2])
   }
   else object$tables[[v]][, nd]
   prob[prob == 0] <- threshold
   prob
   }
   }))
6: apply(log(sapply(1:nattribs, function(v) {
   nd <- ndata[v]
   if (is.na(nd))
   rep(1, length(object$apriori))
   else {
   prob <- if (isnumeric[v]) {
   msd <- object$tables[[v]]
   dnorm(nd, msd[, 1], msd[, 2])
   }
   else object$tables[[v]][, nd]
   prob[prob == 0] <- threshold
   prob
   }
   })), 1, sum)
5: FUN(1:30[[1L]], ...)
4: lapply(X, FUN, ...)
3: sapply(1:nrow(newdata), function(i) {
   ndata <- newdata[i, ]
   L <- log(object$apriori) + apply(log(sapply(1:nattribs, function(v) {
   nd <- ndata[v]
   if (is.na(nd))
   rep(1, length(object$apriori))
   else {
   prob <- if (isnumeric[v]) {
   msd <- object$tables[[v]]
   dnorm(nd, msd[, 1], msd[, 2])
   }
   else object$tables[[v]][, nd]
   prob[prob == 0] <- threshold
   prob
   }
   })), 1, sum)
   if (type == "class")
   L
   else {
   L <- exp(L)
   L/sum(L)
   }
   })
2: predict.naiveBayes(nbmodel, validf[1:30, ])
1: predict(nbmodel, validf[1:30, ])


Does anyone have an idea what went wrong? Thanks in advance.
