Run more than one mapper for TestForest?

Adam Baron Fri, 05 Jul 2013 14:02:13 -0700

I'm attempting to run org.apache.mahout.classifier.df.mapreduce.TestForest
on a CSV with 200,000 rows and 500,000 features per row. However,
TestForest is running extremely slowly, likely because only 1 mapper was
assigned to the job. This seems strange because the
org.apache.mahout.classifier.df.mapreduce.BuildForest step on the same
data used 1772 mappers and took about 6 minutes. (BTW: I know I
*shouldn't* use the same data set for the training and testing steps;
this is purely a technical experiment to see if Mahout's Random Forest can
handle the data sizes we typically deal with.)


Any idea how to get org.apache.mahout.classifier.df.mapreduce.TestForest
to use more mappers? From a glance at the code (and from thinking about
what the job does), it looks ripe for parallelization.
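
For context, here is a minimal sketch of how Hadoop's FileInputFormat sizes
input splits, since the number of splits normally determines the number of
mappers. It is a simplified model, not Mahout's code: the class and method
names are hypothetical, the 100 GiB file size and 64 MiB block size are
made-up illustration values, and whether the input format used by TestForest
is splittable is an assumption that would need checking against the source.

```java
// Simplified model of FileInputFormat split sizing (hypothetical names).
final class SplitEstimate {
    // Hadoop lets the final split run up to 10% over the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the min and max settings.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and so, roughly, mappers) for one non-empty file.
    static long estimateSplits(long fileBytes, long splitSize, boolean splittable) {
        if (fileBytes <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        if (!splittable) {
            return 1; // a non-splittable format yields one split per file
        }
        long splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;              // assumed 64 MiB block size
        long size = splitSize(block, 1L, Long.MAX_VALUE);
        long file = 100L * 1024 * 1024 * 1024;       // hypothetical 100 GiB CSV
        System.out.println(estimateSplits(file, size, true));
        System.out.println(estimateSplits(file, size, false));
    }
}
```

If this model applies, a single input file read through a non-splittable
input format would get one mapper regardless of its size, and splitting the
CSV into several files would be one way to test that.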

Thanks,
        Adam
