Yes, I'm on a project in which we classify a large data set. We use
MapReduce to do the classification, as the data set is much larger than
the working memory. We have a non-Mahout implementation...
So we put the decision forest in memory via the distributed cache,
partition the data set, and run it past the models. The models are
getting pretty big, and keeping them in memory is a challenge. I guess I
was looking for an implementation that doesn't require keeping the
decision forest in memory. I'll have a look at the TestForest
implementation.
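(As an aside, the pattern described above can be sketched in a few lines. This is a hypothetical, language-agnostic illustration of map-side classification, not Mahout code: `load_model`, `classify`, and `map_partition` are stand-ins, and the "model" here is a trivial threshold rather than a decision forest.)

```python
# Hypothetical sketch of the map-side pattern described above: each mapper
# loads the model once from the distributed cache, then streams its input
# split through it, so only the model -- never the full data set -- must
# fit in memory.

def load_model():
    # Stand-in for deserializing the forest from the distributed cache.
    # Here the "model" is just a threshold classifier.
    return {"threshold": 5.0}

def classify(model, record):
    # Stand-in for running one record through the forest: one label each.
    return 1 if record > model["threshold"] else 0

def map_partition(records):
    """Classify one input split, as a single mapper would."""
    model = load_model()  # loaded once per mapper, held in memory
    return [classify(model, r) for r in records]

# Partitions are independent, so they can run in parallel across mappers
# while every mapper shares the same cached model.
partitions = [[1.0, 7.5], [9.9, 2.2, 6.1]]
results = [map_partition(p) for p in partitions]
print(results)  # [[0, 1], [1, 0, 1]]
```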
On 12/06/2012 12:06 AM, deneche abdelhakim wrote:
You mean you want to classify a large dataset?
The partial implementation is useful when the training dataset is too large
to fit in memory. If it does fit, then you'd be better off training the
forest using the in-memory implementation.
If you want to classify a large number of rows, then you can add the
parameter -mr to TestForest to classify the data using MapReduce. An
example of this can be found in the wiki:
https://cwiki.apache.org/MAHOUT/partial-implementation.html
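For reference, an invocation would look roughly like this (a sketch only: the jar version, input paths, and model directory are placeholders you'd replace with your own):

```shell
# Classify a dataset with a previously trained forest, using MapReduce (-mr).
# -i  : input data to classify
# -ds : dataset descriptor generated during training
# -m  : path to the stored decision forest
# -a  : analyze the results (confusion matrix)
# -o  : output directory for the predictions
hadoop jar $MAHOUT_HOME/mahout-core-<version>-job.jar \
  org.apache.mahout.classifier.df.mapreduce.TestForest \
  -i testdata/mydata.arff -ds traindata/mydata.info \
  -m myforest -a -mr -o predictions
```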
On Thu, Dec 6, 2012 at 2:45 AM, Marty Kube <
[email protected]> wrote:
Hi,
I'm working on improving classification throughput for a decision forest. I
was wondering about the use case for the Partial Implementation.
The quick start guide suggests that the Partial Implementation is designed
for building a forest on large datasets.
My problem is classification after training. Is the Partial Implementation
helpful for this use case?