Melanie Vida wrote:

Hi All,

My question concerns an error generated when using randomForest in R. Is there a special way to format the data to avoid this error, or am I misunderstanding what the error implies?

"Error in randomForest.default(m, y, ...) :
       Can not handle categorical predictors with more than 32 categories."

This is generated from the command line:
> credit.rf <- randomForest(V16 ~ ., data=credit, mtry=2, importance = TRUE, do.trace=100)


The data set is the credit-screening data from the UCI repository, ftp://ftp.ics.uci.edu/pub/machine-learning-databases/credit-screening/crx.data. It consists of 690 samples and 16 attributes.
The attribute information includes:


   A1:    b, a.
   A2:    continuous.
   A3:    continuous.
   A4:    u, y, l, t.
   A5:    g, p, gg.
   A6:    c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff.
   A7:    v, h, bb, j, n, z, dd, ff, o.
   A8:    continuous.
   A9:    t, f.
   A10:   t, f.
   A11:   continuous.
   A12:   t, f.
   A13:   g, p, s.
   A14:   continuous.
   A15:   continuous.
   A16:   +, -    (class attribute)

Has anyone tried randomForests in R on the credit-screening data set from the UCI repository?


You most likely forgot to set na.strings = "?" in read.table().
Look at str(credit): you will see that some numeric columns have been converted to factors for that reason, because the "?" missing-value markers were read as text.


Uwe Ligges
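
The suggestion above can be sketched as follows. This is only an illustration, not code from the thread: the three rows in crx_lines are made up in the comma-separated, header-less layout of crx.data, and the object names crx_lines and bad are hypothetical. stringsAsFactors = TRUE is set explicitly because newer versions of R no longer convert strings to factors by default.

```r
# Hypothetical rows in the crx.data layout (comma-separated, no header,
# "?" marking a missing value). The values are invented for illustration.
crx_lines <- c(
  "b,30.83,0,u,g,w,v,1.25,t,t,1,f,g,202,0,+",
  "a,?,4.46,u,g,q,h,3.04,t,t,6,f,g,?,560,+",
  "b,24.50,0.5,u,g,q,h,1.5,t,f,0,f,g,280,824,-"
)

# Without na.strings, "?" is treated as ordinary text, so the continuous
# columns V2 and V14 are read as factors with one level per distinct value.
bad <- read.table(text = crx_lines, sep = ",", header = FALSE,
                  stringsAsFactors = TRUE)

# With na.strings = "?", the same columns are read as numbers with NA.
credit <- read.table(text = crx_lines, sep = ",", header = FALSE,
                     na.strings = "?", stringsAsFactors = TRUE)

# Compare the column types of the two data frames.
str(bad)
str(credit)

# Number of levels per column (0 for non-factor columns); on the full file
# this shows which columns would exceed the limit named in the error.
sapply(bad, nlevels)
```

With the real file, the same read.table() call would take the file name or URL in place of text = crx_lines. Note that the rows containing NA still have to be dealt with before fitting, for example by passing na.action = na.omit to randomForest(); which choice is appropriate depends on the analysis.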



Thanks in advance for any useful hints and tips,

Melanie

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

