[R] Question about random Forest function in R
Hello,

I am trying to run the randomForest function on a data.frame using the following code:

    myrf <- randomForest(y=sample_data_metal, x=Train, importance=TRUE, proximity=TRUE)

However, an error occurs saying "can not handle categorical predictors with more than 32 categories". My x=Train data.frame is quite large and my y=sample_data_metal is one column. I'm not sure how to fix this error, or whether there is even a way around it.

Thanks in advance for any help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Question about random Forest function in R
Hi Kelly,

The function has a limitation: it cannot handle any column in your x that is a categorical variable with more than 32 categories. One possibility is to see if you can bin some of the categories into one, to get down to 32 categories or fewer.

Andy

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Kelly Cool
Sent: Tuesday, May 29, 2012 10:47 AM
To: r-help@r-project.org
Subject: [R] Question about random Forest function in R
Re: [R] Question about random Forest function in R
This is a well-known limitation. You have to group the levels of the offending categorical attributes together to work around it.

--
Weifeng (aaron) liu | retail systems pricing | sr research scientist
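
The grouping suggested in the replies can be sketched roughly as follows. This is only an illustration under stated assumptions, not the randomForest authors' recommended procedure: the `Train` data below is made up, `lump_levels` is a hypothetical helper (not part of any package), and merging the rarest levels into a single "Other" level is just one possible binning rule. A grouping based on subject knowledge may well be preferable.

```r
# Made-up stand-in for the poster's Train data.frame (assumption).
Train <- data.frame(
  site  = factor(rep(paste0("site", 1:40), times = 40:1)),  # 40 levels
  soil  = factor(rep_len(c("clay", "sand", "loam"), 820)),  # 3 levels
  depth = seq_len(820) / 820                                # numeric
)

max_levels <- 32  # limit quoted in the error message

# Hypothetical helper: keep the (max_levels - 1) most frequent levels
# and merge all remaining levels into one "Other" level.
lump_levels <- function(f, max_levels = 32, other = "Other") {
  f <- factor(f)
  if (nlevels(f) <= max_levels) return(f)
  counts <- sort(table(f), decreasing = TRUE)
  keep <- names(counts)[seq_len(max_levels - 1)]
  out <- as.character(f)
  # Leave missing values as NA; relabel only the rare levels.
  out[!is.na(out) & !(out %in% keep)] <- other
  factor(out, levels = unique(c(keep, other)))
}

# Step 1: find the factor columns that exceed the limit.
n_levels <- sapply(Train, function(col) if (is.factor(col)) nlevels(col) else 0L)
too_many <- names(n_levels)[n_levels > max_levels]

# Step 2: bin only those columns, leaving the rest untouched.
Train_binned <- Train
Train_binned[too_many] <- lapply(Train[too_many], lump_levels,
                                 max_levels = max_levels)

print(too_many)
print(sapply(Train_binned[too_many], nlevels))
```

The binned data.frame would then be passed as `x` in place of `Train`. The limit is kept in a variable because it depends on the installed randomForest version; later releases are believed to allow more categories, so check the error message you actually get.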