Query regarding Mahout's distributed Random Forest implementation

Som Satpathy Thu, 09 Jan 2014 08:48:38 -0800

Hi all,

In Mahout 0.8, the distributed Random Forest implementation doesn't seem to
compute the out-of-bag error while building the RF model. I wanted to
confirm whether that is really the case.


While browsing the source of earlier Mahout versions (0.2 to 0.5), I came
across distributed code that computes the out-of-bag error while building
the RF model. The classes of note are Step2Job and Step2Mapper, neither of
which exists in 0.8. I also no longer see the 'callback' package in 0.8. I
was wondering why Mahout no longer supports those implementations.

I'm currently using Mahout's RF PartialBuilder and am working on ways to
evaluate the model it builds. What is the best strategy for evaluating RF
models built via PartialBuilder? I can always split my data into train and
test samples and compute metrics like AUC. My thinking is that if Mahout's
distributed RF implementation computed the out-of-bag error while
generating the model, there would be no need to split my offline data into
train and test samples.
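
To make the question concrete, here is a minimal sketch of the out-of-bag
error computation I have in mind. It is plain Java and does not use any
Mahout classes: the class name OobErrorSketch, the method oobError and the
array layout are all hypothetical, chosen only for illustration. It assumes
that for every tree we know which samples were drawn into its bootstrap
sample and what the tree predicts for every sample.

```java
// Hypothetical illustration only; not Mahout's API.
final class OobErrorSketch {

    /**
     * Out-of-bag error by majority vote.
     *
     * @param inBag       inBag[t][i] is true if sample i was drawn into the bootstrap sample of tree t
     * @param predictions predictions[t][i] is the class index tree t predicts for sample i
     * @param labels      true class index of each sample
     * @param numClasses  number of distinct classes
     * @return fraction of samples misclassified by the trees that did not train on them,
     *         or NaN if every sample was in-bag for every tree
     */
    static double oobError(boolean[][] inBag, int[][] predictions, int[] labels, int numClasses) {
        int counted = 0;
        int wrong = 0;
        for (int i = 0; i < labels.length; i++) {
            int[] votes = new int[numClasses];
            int voters = 0;
            // Only trees that never saw sample i during training may vote on it.
            for (int t = 0; t < inBag.length; t++) {
                if (!inBag[t][i]) {
                    votes[predictions[t][i]]++;
                    voters++;
                }
            }
            if (voters == 0) {
                continue; // sample i was in-bag for every tree, so it has no OOB estimate
            }
            // Majority vote; ties go to the lowest class index.
            int best = 0;
            for (int c = 1; c < numClasses; c++) {
                if (votes[c] > votes[best]) {
                    best = c;
                }
            }
            counted++;
            if (best != labels[i]) {
                wrong++;
            }
        }
        return counted == 0 ? Double.NaN : (double) wrong / counted;
    }

    public static void main(String[] args) {
        boolean[][] inBag = {{true, false, false}, {false, true, false}};
        int[][] predictions = {{0, 1, 0}, {1, 1, 1}};
        int[] labels = {0, 1, 0};
        System.out.println("OOB error: " + oobError(inBag, predictions, labels, 2));
    }
}
```

Since each sample is scored only by trees that were not trained on it, this
estimate plays the role of a held-out test error without a separate split.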

Looking forward to hearing your thoughts on this.

Thanks,
Som
