I got what I was looking for: https://issues.apache.org/jira/browse/MAHOUT-835

Thanks,
Som
On Thu, Jan 9, 2014 at 8:43 AM, Som Satpathy <[email protected]> wrote:

> Hi all,
>
> In Mahout 0.8, the distributed Random Forest implementation doesn't seem
> to compute the out-of-bag error while building the RF model. I wanted
> to confirm whether that is really the case.
>
> While browsing through previous versions of the Mahout source code (versions
> 0.2 to 0.5), I came across distributed code that computes the out-of-bag error
> while building the RF model. The classes of note here are Step2Job and
> Step2Mapper, neither of which exists in 0.8. I also don't see the
> 'callback' package any more in 0.8. I was wondering why Mahout no longer
> supports those implementations.
>
> I'm currently using Mahout's RF PartialBuilder and am working on ways
> to evaluate the model it builds. I wanted to know the best strategy for evaluating
> RF models built via PartialBuilder. I can always split my data into train
> and test samples and get metrics like AUC. My thinking is that if
> Mahout's distributed RF implementation computed the out-of-bag
> error while generating the model, there would be no need to split my offline
> data into train and test samples.
>
> Looking forward to hearing your thoughts on this.
>
> Thanks,
> Som
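Since the question hinges on what out-of-bag (OOB) error provides, here is a minimal Python sketch of the general technique. It is not Mahout's implementation (Mahout's RF is Java, and Step2Job/Step2Mapper are not reproduced here). Each tree is trained on a bootstrap sample, each example is scored only by the trees whose bootstrap sample did not contain it, and the OOB error is the majority-vote error over those examples. The names `train_stump` and `oob_error` are hypothetical, and a one-feature threshold stump stands in for a real decision tree to keep the example short.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a one-feature threshold classifier (hypothetical stand-in for a decision tree)."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def oob_error(xs, ys, n_trees=25, seed=0):
    """Out-of-bag error of a bagged ensemble of stumps (binary labels 0/1)."""
    rng = random.Random(seed)
    n = len(xs)
    # Per-example votes, collected only from trees that did NOT train on that example.
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        # Bootstrap: draw n indices with replacement; the rest are out of bag for this tree.
        sample = [rng.randrange(n) for _ in range(n)]
        in_bag = set(sample)
        model = train_stump([xs[i] for i in sample], [ys[i] for i in sample])
        for i in range(n):
            if i not in in_bag:
                votes[i][model(xs[i])] += 1
    # Majority vote over OOB trees only; skip examples that were in bag for every tree.
    scored = [(v.most_common(1)[0][0], ys[i]) for i, v in enumerate(votes) if v]
    if not scored:
        return float("nan")
    return sum(pred != y for pred, y in scored) / len(scored)


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0 if x < 10 else 1 for x in xs]
    print(f"OOB error: {oob_error(xs, ys):.3f}")
```

Because every example is scored only by trees that never saw it, the OOB error acts like a held-out estimate without a separate test split, which is the trade-off described above. Note that it yields an error rate, not AUC. Getting AUC from OOB predictions would additionally require a per-example score, such as the fraction of OOB trees voting for the positive class.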
