Re: [R] Random Forest weighting

2008-12-05 Thread Raghu Naik
Andy, Thanks for your email. I understand that, by default, the sampsize argument will use the behavior variable that we are classifying as the strata variable. Then, I could set sampsize=c(no=89, yes=11). I implemented that, but I got a 99% classification error rate on the yes class. When I

Re: [R] Random Forest weighting

2008-12-04 Thread Liaw, Andy
If I understand your situation correctly, you may be able to make use of the strata and sampsize arguments in randomForest() to get bootstrap samples that resemble the original data distribution. They allow you to specify stratified samples using the strata variable. Best, Andy
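
For readers following the thread, here is a minimal sketch of what the strata and sampsize suggestion might look like in practice. It is not code from either poster. The toy data, the predictor names x1 and x2, the seed, and ntree = 200 are all assumptions made up for illustration; only the response name "behavior" and the 89/11 split come from the thread. It also assumes the randomForest package is installed.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(1)

# Hypothetical toy data: roughly 89% "no" and 11% "yes",
# mirroring the proportions mentioned in this thread.
n <- 1000
behavior <- factor(
  sample(c("no", "yes"), n, replace = TRUE, prob = c(0.89, 0.11)),
  levels = c("no", "yes")
)
x <- data.frame(
  x1 = rnorm(n, mean = ifelse(behavior == "yes", 1, 0)),  # weakly informative
  x2 = rnorm(n)                                           # pure noise
)

# Stratify on the response and draw 89 "no" rows and 11 "yes" rows
# for each tree. The sampsize entries are matched to the strata
# levels by position, so they are listed in factor-level order.
fit <- randomForest(
  x = x, y = behavior,
  strata = behavior,
  sampsize = c(no = 89, yes = 11),
  ntree = 200
)

# Out-of-bag confusion matrix with a per-class error column.
print(fit$confusion)
```

The class.error column of fit$confusion is where a high error rate on the yes class, like the one reported later in this thread, would show up. Drawing equal counts per class (for example sampsize = c(50, 50)) is one commonly tried alternative when the minority class is poorly predicted; whether it helps in this particular case is not something the thread excerpt settles.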

[R] Random Forest weighting

2008-12-03 Thread Raghu Naik
Folks, I have a query around weighting in Random Forest (RF). I know that several earlier emails in this group have raised this issue, but I did not find an answer to my query. I am working on a dataset (dataset1) that consists of 4 million records that can be reduced to a dataset (dataset2) of