Hi,

As soon as the number of trees and features goes up, 70 GB of RAM is gone and I get out-of-memory errors. The file size is 700 MB. The dataframe quickly shrinks from 14 to 2 columns, but there is a ton of text ... with 10 estimators and 100 features per word I can't get through ~900k records ... The training set, about 15% of the data, does perfectly fine, but when the test set comes, that's it.
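For concreteness, here is a rough sketch of the kind of setup I mean (simplified; the TfidfVectorizer / RandomForestClassifier choice and all the names below are stand-ins, not my exact code):

    # Rough sketch only -- the vectorizer, classifier and column names
    # are placeholders for illustration.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy stand-in for the real dataframe (~900k rows of free text + label).
    df = pd.DataFrame({
        "text": ["free text record %d" % i for i in range(1000)],
        "label": np.random.randint(0, 2, size=1000),
    })

    # Vectorize the text column (sparse output).
    vect = TfidfVectorizer(max_features=100)
    X = vect.fit_transform(df["text"])

    clf = RandomForestClassifier(n_estimators=10, n_jobs=-1)

    # Fitting on ~15% of the rows works fine...
    train_mask = np.random.rand(len(df)) < 0.15
    clf.fit(X[train_mask], df["label"].values[train_mask])

    # ...but predicting on the full set is where memory runs out.
    pred = clf.predict(X)

(In the real run it is the predict step over all ~900k rows that dies.)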
I could split the data and multiprocess it, but I believe that would simply skew the results ... Any ideas?

-- Aleksandar Kacanski
