[R] Question about random Forest function in R

2012-05-29 Thread Kelly Cool


Hello, 

I am trying to run the randomForest function on a data.frame using the 
following code:

myrf <- randomForest(y=sample_data_metal, x=Train, importance=TRUE, 
proximity=TRUE)


However, this fails with the error "Can not handle categorical predictors 
with more than 32 categories."

My x=Train data.frame is quite large and my y=sample_data_metal is one 
column. 

I'm not sure how to go about fixing this error or if there is even a way to get 
around this error. Thanks in advance for any help. 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about random Forest function in R

2012-05-29 Thread Liaw, Andy
Hi Kelly,

The function has a limitation that it cannot handle any column in your x that 
is a categorical variable with more than 32 categories.  One possibility is to 
see if you can bin some of the categories into one to get below 32 categories.

Andy 
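
Before binning anything, it helps to know which columns actually trip the limit. The following is a minimal sketch, not part of the original reply: the toy `Train` data frame and the names `n_levels` and `too_many` are made up for illustration, and the cutoff of 32 is taken from the error message quoted in this thread.

```r
# Toy stand-in for the poster's Train data.frame (hypothetical data).
Train <- data.frame(
  site = factor(paste0("s", 1:40)),           # 40 levels: over the limit
  soil = factor(rep(c("clay", "sand"), 20)),  # 2 levels: fine
  ph   = seq(5, 8, length.out = 40)           # numeric: not categorical
)

# Count the levels of every factor column; non-factor columns count as 0.
n_levels <- vapply(
  Train,
  function(col) if (is.factor(col)) nlevels(col) else 0L,
  integer(1)
)

# Names of the columns that exceed the limit from the error message.
too_many <- names(n_levels)[n_levels > 32]
print(too_many)
```

If a column shows up here unexpectedly, it is worth checking with str(Train) whether it was meant to be numeric and was read in as a factor by mistake.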



Re: [R] Question about random Forest function in R

2012-05-29 Thread Liu, Weifeng Aaron
This is a well-known limitation. To work around it, you have to group some 
of the categories together so that no predictor has more than 32.

--
Weifeng (aaron) liu  |  retail systems pricing  |  sr research scientist
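
One possible way to do the grouping is to keep the most frequent categories and lump the rest into a single catch-all level. This is only a sketch under assumptions: `lump_rare_levels` is a hypothetical helper (it is not part of the randomForest package), the label "Other" is arbitrary, and lumping by frequency is just one choice; grouping by subject-matter knowledge may suit the data better.

```r
# Hypothetical helper: keep the (max_levels - 1) most frequent levels of a
# factor and replace all remaining levels with a single `other` level.
lump_rare_levels <- function(f, max_levels = 32, other = "Other") {
  f <- factor(f)                       # also drops unused levels
  if (nlevels(f) <= max_levels) return(f)
  counts <- sort(table(f), decreasing = TRUE)
  keep <- names(counts)[seq_len(max_levels - 1)]
  out <- as.character(f)
  # Leave missing values as NA; relabel only the rare, non-missing levels.
  out[!is.na(out) & !(out %in% keep)] <- other
  factor(out, levels = unique(c(keep, other)))
}

# Toy example (hypothetical data): 40 levels, level "s1" seen 40 times,
# "s2" 39 times, and so on down to "s40" seen once.
site <- factor(rep(paste0("s", 1:40), times = 40:1))
binned <- lump_rare_levels(site)
nlevels(binned)
```

Applied to each offending column of Train, for example with Train$site <- lump_rare_levels(Train$site), this brings the column under the limit at the cost of losing the distinction between the rare categories.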

