Re: [scikit-learn] best way to scale on the random forest for text w bag of words ...

Sasha Kacanski Thu, 16 Mar 2017 05:40:11 -0700

Thanks Joel, what would be your approach?



Sasha Kacanski

On Mar 15, 2017 9:46 PM, "Joel Nothman" <[email protected]> wrote:

> Trees are not a traditional choice for bag of words models, but you should
> make sure you are at least using the parameters of the random forest to
> limit the size (depth, branching) of the trees.
>
> On 16 March 2017 at 12:20, Sasha Kacanski <[email protected]> wrote:
>
>> Hi,
>> As soon as the number of trees and features goes up, 70 GB of RAM is gone
>> and I am getting out-of-memory errors.
>> The file size is 700 MB. The DataFrame quickly shrinks from 14 to 2 columns,
>> but there is a ton of text ...
>> With 10 estimators and 100 features per word I can't tackle ~900k
>> records ...
>> The training set, about 15% of the data, does perfectly fine, but when the
>> test set comes, that is it.
>>
>> I can split the data and multiprocess it, but I believe that will simply
>> skew the results...
>>
>> Any ideas?
>>
>>
>> --
>> Aleksandar Kacanski
>>
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
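
A minimal sketch of the two ideas raised in the quoted messages, not the
approach of anyone in the thread: limit tree size through the forest's own
parameters, and predict on the large test set in batches. The toy corpus,
the helper name predict_in_batches, and every numeric value (max_features,
max_depth, min_samples_leaf, batch_size) are assumptions chosen for
illustration. For a fitted forest each row is predicted independently, so
batching the test set changes memory use and leaves the predictions as they
were; splitting the training data would be a different matter.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus; the thread's real data (~900k records) is not shown.
train_texts = ["cheap pills online", "meeting at noon",
               "win cash now", "lunch tomorrow?"] * 25
train_labels = [1, 0, 1, 0] * 25
test_texts = ["cash pills", "noon meeting tomorrow", "win online now"] * 10

# Cap the vocabulary; the result stays a scipy sparse matrix.
vectorizer = CountVectorizer(max_features=1000)
X_train = vectorizer.fit_transform(train_texts)

# Bound the size of every tree (values are illustrative, not tuned).
forest = RandomForestClassifier(
    n_estimators=10,
    max_depth=20,          # limit depth
    min_samples_leaf=5,    # stop splitting very small nodes
    random_state=0,
)
forest.fit(X_train, train_labels)


def predict_in_batches(model, vec, texts, batch_size=8):
    """Vectorize and predict a slice at a time to bound peak memory."""
    parts = []
    for start in range(0, len(texts), batch_size):
        X_batch = vec.transform(texts[start:start + batch_size])
        parts.append(model.predict(X_batch))
    return np.concatenate(parts)


batched = predict_in_batches(forest, vectorizer, test_texts)
```

max_leaf_nodes is another RandomForestClassifier parameter that bounds tree
size, and a larger batch_size trades memory for fewer calls.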
