The current "classwt" option in the randomForest package has been there since 
the beginning, and it differs from how the official Fortran code (version 4 
and later) implements class weights.  It simply accounts for the class weights 
in the Gini index calculation when splitting nodes, exactly as a single CART 
tree does when given class weights.  Prof. Breiman came up with the newer 
class weighting scheme, implemented in the newer versions of his Fortran code, 
after we found that simply using the weights in the Gini index didn't seem to 
help much with extremely unbalanced data (say 1:100 or worse).  If using 
weighted Gini helps in your situation, by all means use it.  I can only say 
that in the past it didn't give us the results we were expecting.
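
To make "using the weights in the Gini index" concrete, here is a minimal 
sketch in Python.  It is not the randomForest code itself: the function name 
weighted_gini and the example counts and weights are made up for 
illustration, and it only shows the node impurity calculation, not the split 
search.

```python
def weighted_gini(counts, weights):
    """Gini impurity of a node after scaling each class count by its weight."""
    # Weighted class masses: n_k * w_k for each class k.
    masses = [n * w for n, w in zip(counts, weights)]
    total = sum(masses)
    if total == 0:
        return 0.0
    # Gini impurity: 1 minus the sum of squared weighted class proportions.
    return 1.0 - sum((m / total) ** 2 for m in masses)


# Hypothetical node with a 1:100 class imbalance.
counts = [1, 100]
unweighted = weighted_gini(counts, [1, 1])    # equal weights
reweighted = weighted_gini(counts, [100, 1])  # rare class up-weighted 100x
```

With equal weights this node looks almost pure; with a weight of 100 on the 
rare class it is scored as an evenly mixed node.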

Best,
Andy 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of James Long
> Sent: Tuesday, September 13, 2011 2:10 AM
> To: [email protected]
> Subject: [R] class weights with Random Forest
> 
> Hi All,
> 
> I am looking for a reference that explains how the randomForest function in
> the randomForest package uses the classwt parameter. Here:
> 
> http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html
> 
> Andy Liaw suggests not using classwt. And according to:
> 
> http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html
> 
> it has "not been implemented" as of 2007. However it improved classification
> performance for a problem I am working on, more than adjusting the sampsize
> parameter. So I'm wondering if it has been implemented recently (since 2007)
> or if there is a detailed explanation of what this unimplemented version is
> doing.
> 
> Thanks!
> James
> 
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 