Random Decision Forests - Binary Classification of large data set

Night Wolf Fri, 29 Jul 2011 09:25:41 -0700

Hi all,

I have been playing around with the Random Decision Forests in Mahout. The
classifier seems to produce good results with the test programs.


I am wondering whether this classifier can be used on larger data sets, with
around 35,000 features and 100k+ message instances to classify, on a small
Hadoop cluster or even a single-node development install.

Has anyone used the Random Forest classifier on massive data sets reliably
and with high accuracy? My previous experience with the RF model has been
good for sparse data sets, and I think this is one area where Mahout could
really shine. The data sets I'm testing with now are just too large for
tools like Weka and even R to handle well, so I was hoping Mahout might be
the answer for this problem as well.

So is it worth working with the Random Forest classifier to get a production
or near-production system running?

Does anyone have any examples and stories of their Mahout RF usage?

Thanks!
