Hi Weiwei, thanks a lot for the detailed help! I tried option 2 in R, and it works pretty well. You mentioned that you also implemented RF. Could you please share your code with me? Thanks!
Betty

On 1/29/07, Weiwei Shi <[EMAIL PROTECTED]> wrote:
>
> Hi, Betty:
>
> 1. Fortran code
> (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_examples/prog.f):
>
>       if(jclasswt.eq.0) then
>          do j=1,nclass
>             classwt(j)=1
>          enddo
>       endif
>       if(jclasswt.eq.1) then
> c     fill in classwt(j) for each j:
> c     classwt(1)=1.
> c     classwt(2)=10.
>
> You need to set jclasswt = 1 (you can find it by searching through the
> code), then uncomment the last two lines. That gives you classwt in
> Fortran. You can use this classwt for extremely imbalanced
> classification problems. Down-sampling is another possible choice, but
> it is not directly implemented in rf. Check the following paper; it
> might help.
> http://oz.berkeley.edu/users/chenchao/666.pdf
>
> 2. As to the wrapper function, the idea is that you create a set of
> samples by applying some sampling probabilities to implement
> down-sampling, then build an rf model for each sample. Suppose you call
> rf this way for each sample:
> my.rf <- randomForest(...)
>
> Then you can access the OOB scores and the prediction scores through
> my.rf$votes and my.rf$test$votes, respectively.
>
> You can then average those scores yourself. It is just like a simple
> meta-learning process, but it does exactly what down-sampling plus rf
> does, even though down-sampling is not implemented.
>
> 3. classwt and cutoff are used in different places. classwt is used in
> two places: calculating the Gini criterion and calculating the final
> vote from the leaf. cutoff is used only in the final voting. So cutoff
> won't change the splitting, while classwt can. However, since the
> current R rf cannot do classwt, you can try cutoff to see if it helps
> in your case.
>
> 4. The fourth option is to use my implementation of rf, but I did not
> write a manual for it, and it cannot show the splitting yet.
>
> HTH,
>
> weiwei
>
> On 1/29/07, Betty Health <[EMAIL PROTECTED]> wrote:
> > Thank you very much, Weiwei and Jim!
> >
> > Yeah, I did read the post by Andy, the contributor of this package. It
> > seems that classwt is not implemented yet. For Weiwei's options, I
> > have a few more questions. Thanks!
> >
> > "1. try to use rf in fortran by following the link below
> > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm"
> >
> > I read the Fortran code briefly, but I did not find an option for
> > down-sampling. Does that mean I need to do the down-sampling myself?
> > Could you explain a little more about "2. make a wrapper function to
> > do the down sampling by yourself"? Do you mean I can do it in R or in
> > Fortran? Could you point me to some links, please? I haven't done this
> > before.
> >
> > Yeah, cutoff did change the final classification results. However,
> > from what I tried, it did not influence how the nodes are split. So I
> > would like to go further with the above 2 options.
> >
> > Thank you again!
> >
> > Betty
> >
> > On 1/28/07, Weiwei Shi <[EMAIL PROTECTED]> wrote:
> > > Dear Betty:
> > >
> > > I could suggest 3 options:
> > >
> > > 1. try to use rf in fortran by following the link below
> > > http://www.stat.berkeley.edu/~breiman/RandomForests/cc_software.htm
> > >
> > > 2. make a wrapper function to do the down sampling by yourself
> > >
> > > 3. try to use cutoff in randomForest, which might help in your
> > > situation.
> > >
> > > HTH,
> > >
> > > weiwei
> > >
> > > On 1/28/07, Betty Health <[EMAIL PROTECTED]> wrote:
> > > > Hello there,
> > > >
> > > > I am working on an extremely unbalanced two-class classification
> > > > problem. I want to use "classwt" together with "down sampling".
> > > > From checking rfNews() in R, it looks like classwt is not working
> > > > yet. Then I looked at the software from Salford, but I did not find
> > > > a down-sampling option. I am wondering if you have any experience
> > > > dealing with this problem. Do you know of any method or software
> > > > that can handle it?
> > > >
> > > > Thank you very much!!
> > > >
> > > > Betty
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > [email protected] mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III
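
For anyone reading this thread later, option 2 above (down-sample yourself, build one rf per sample, then average the votes) could be sketched in R roughly as follows. This is only a minimal sketch under assumptions, not Weiwei's implementation: the function names downsample_indices, average_votes and downsampled_rf_votes, and the arguments minority, ratio and n_models, are made up for illustration. Only randomForest() and its $test$votes component come from the thread. The sketch averages the test-set votes only, because the OOB votes in my.rf$votes cover a different subset of rows for each down-sample and would have to be aligned by row before averaging.

```r
# Sketch of "option 2": down-sample the majority class several times,
# fit one forest per sample, and average the vote fractions.
# All function and argument names below are illustrative assumptions.

# Return row indices holding every minority case plus a random subset of
# the other cases, of size ratio * (number of minority cases).
downsample_indices <- function(y, minority, ratio = 1) {
  min_idx <- which(y == minority)
  maj_idx <- which(y != minority)
  n_maj <- min(length(maj_idx), ceiling(ratio * length(min_idx)))
  # Sample positions rather than values, so a length-1 maj_idx is safe.
  c(min_idx, maj_idx[sample.int(length(maj_idx), n_maj)])
}

# Element-wise mean of a list of vote matrices with identical dimensions.
average_votes <- function(vote_list) {
  Reduce(`+`, vote_list) / length(vote_list)
}

# Fit n_models forests on independent down-samples and average their
# test-set votes (my.rf$test$votes in the thread's notation).
# Needs the randomForest package; y must be a factor.
downsampled_rf_votes <- function(x, y, xtest, minority,
                                 n_models = 10, ratio = 1, ...) {
  votes <- lapply(seq_len(n_models), function(i) {
    idx <- downsample_indices(y, minority, ratio)
    fit <- randomForest::randomForest(x[idx, , drop = FALSE], y[idx],
                                      xtest = xtest, ...)
    fit$test$votes
  })
  average_votes(votes)
}
```

A call would look something like downsampled_rf_votes(x, y, xtest, minority = "rare"), where "rare" stands for whatever the minority class label is in your data. The result is a matrix of averaged vote fractions with one row per test case, which you can threshold yourself in the same spirit as cutoff.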
