Debasish, we've tested the MLlib decision tree a bit, and it uses too much
memory for RF purposes.
Once a tree reached depth 8~9, it was easy to hit a heap exception, even
with 2~4 GB of memory per worker.

With RF, trees easily reach a depth of 100+ even with only 100,000+ rows,
because the trees are usually not balanced. Additionally, the lack of
multi-class classification limits its applicability.

Also, RF needs a random subset of features at each tree node to be
effective (not just bootstrap samples), and the MLlib decision tree doesn't
support that.
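
To make the per-node feature point concrete, here is a minimal Python sketch. It is only an illustration: the function name `sample_node_features` and the square-root default for the subset size are assumptions for this example, not anything taken from MLlib. The idea is that every node draws its own candidate features, independently of the bootstrap sample used for the tree.

```python
import math
import random

def sample_node_features(n_features, rng, k=None):
    """Return the candidate feature indices for a single tree node.

    Hypothetical helper: k defaults to floor(sqrt(n_features)), a common
    heuristic for classification forests (an assumption, not an MLlib rule).
    """
    if k is None:
        k = max(1, math.isqrt(n_features))
    # Sample without replacement so no feature is considered twice at a node.
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
# Each node gets a fresh draw; only these features are searched for a split.
node_a = sample_node_features(16, rng)
node_b = sample_node_features(16, rng)
```

A tree builder that always searches all features at every node gives only bagging, which is why bootstrap samples alone are not enough.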


On Thu, Apr 17, 2014 at 10:27 AM, Debasish Das <debasish.da...@gmail.com> wrote:

> MLlib has a decision tree. There is an RF PR which is not active now;
> take that and swap its tree builder with the fast tree builder that's in
> MLlib. Search for the Spark JIRA. The code is based on the Google PLANET
> paper.
>
> I am sure people on the dev list are already working on it. Send an email
> there to find out the status.
>
> There is also an RF in Cloudera Oryx, but we could not run it on our data
> yet.
>
> Weka 3.7.10 has a multi-threaded RF that is good for some ad hoc runs, but
> it does not scale.
>  On Apr 17, 2014 2:45 AM, "Laeeq Ahmed" <laeeqsp...@yahoo.com> wrote:
>
>> Hi,
>>
>> For one of my applications, I want to use random forests (RF) on top of
>> Spark. I see that MLlib currently does not have an implementation of RF.
>> Which other open-source RF implementations would work well with Spark in
>> terms of speed?
>>
>> Regards,
>> Laeeq Ahmed,
>> KTH, Sweden.
>>
>>
