Hi all,

In Mahout 0.8, the distributed Random Forest (RF) implementation doesn't seem to compute the out-of-bag error while building the model. Can anyone confirm whether that is really the case?
While browsing the source of earlier Mahout versions (0.2 to 0.5), I came across distributed code that computes the out-of-bag error while building the RF model. The classes of note are Step2Job and Step2Mapper, neither of which exists in 0.8. I also no longer see the 'callback' package in 0.8. I was wondering why Mahout no longer supports those implementations.

I'm currently using Mahout's RF PartialBuilder and am looking for ways to evaluate the model it builds. What is the best strategy for evaluating RF models built via PartialBuilder? I can always split my data into train and test samples and compute metrics like AUC on the test sample. However, if Mahout's distributed RF implementation computed the out-of-bag error while generating the model, I would not need to split my offline data into train and test samples at all.

Looking forward to hearing your thoughts on this.

Thanks,
Som
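
P.S. To make the holdout option concrete, here is a minimal sketch of the AUC computation I have in mind for the test sample. It is plain Java and does not call any Mahout API. The class name HoldoutAuc and the scores/labels arrays are placeholders I made up, and I'm assuming the forest's positive-class score for each held-out row has already been collected by some other means. It uses the rank-sum (Mann-Whitney) formulation, with tied scores given their average rank.

```java
import java.util.Arrays;
import java.util.Comparator;

public class HoldoutAuc {

    /**
     * AUC via the rank-sum (Mann-Whitney U) formulation.
     * scores[i]: score for the positive class on held-out row i (hypothetical input).
     * labels[i]: true when row i is actually positive.
     */
    public static double auc(double[] scores, boolean[] labels) {
        if (scores.length != labels.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        int n = scores.length;

        // Sort row indices by ascending score.
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> scores[idx]));

        double positiveRankSum = 0.0;
        long positives = 0;
        int start = 0;
        while (start < n) {
            // Find the run of tied scores beginning at 'start'.
            int end = start;
            while (end + 1 < n && scores[order[end + 1]] == scores[order[start]]) {
                end++;
            }
            // 1-based average rank shared by every row in the tied run.
            double avgRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                if (labels[order[k]]) {
                    positiveRankSum += avgRank;
                    positives++;
                }
            }
            start = end + 1;
        }

        long negatives = n - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("AUC needs at least one positive and one negative row");
        }
        double u = positiveRankSum - positives * (positives + 1) / 2.0;
        return u / ((double) positives * negatives);
    }

    public static void main(String[] args) {
        // Toy held-out sample, not real data.
        double[] scores = {0.1, 0.4, 0.35, 0.8};
        boolean[] labels = {false, false, true, true};
        System.out.println(auc(scores, labels)); // prints 0.75
    }
}
```

An out-of-bag estimate would let me skip the split entirely, which is why I'm asking about it. Without one, this is the fallback I would run on the test sample.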
