Trees are not a traditional choice for bag-of-words models, but at a minimum
make sure you are using the random forest's parameters to limit the size
(depth, branching) of the trees.
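As a minimal sketch (the tiny synthetic dataset here is just a stand-in for
your text features), these are the RandomForestClassifier parameters that
bound how large each tree can grow:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data; in practice this would be your vectorized text.
X, y = make_classification(n_samples=1000, n_features=100, random_state=0)

clf = RandomForestClassifier(
    n_estimators=10,
    max_depth=20,         # cap depth: node count can grow exponentially with depth
    min_samples_leaf=5,   # larger leaves -> fewer nodes per tree
    max_features="sqrt",  # fewer candidate features considered per split
    n_jobs=1,             # each parallel worker holds its own copy of the data
    random_state=0,
)
clf.fit(X, y)

# Every fitted tree respects the depth cap.
print(max(tree.tree_.max_depth for tree in clf.estimators_))
```

Note that `n_jobs` matters for peak memory too: with joblib-based parallelism
each worker process gets its own copy of the training data, so fewer jobs
means a smaller peak footprint at the cost of wall-clock time.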

On 16 March 2017 at 12:20, Sasha Kacanski <[email protected]> wrote:

> Hi,
> As soon as the number of trees and features goes up, 70 GB of RAM is gone
> and I am getting out-of-memory errors.
> The file size is 700 MB. The dataframe quickly shrinks from 14 to 2
> columns, but there is a ton of text ...
> With 10 estimators and 100 features per word I can't tackle ~900k
> records ...
> The training set, about 15% of the data, does perfectly fine, but when the
> test set comes, that is it.
>
> I can split the data and multiprocess it, but I believe that would simply
> skew the results...
>
> Any ideas?
>
>
> --
> Aleksandar Kacanski
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>