Thanks Joel, what would be your approach?
Sasha Kacanski

On Mar 15, 2017 9:46 PM, "Joel Nothman" <[email protected]> wrote:

> Trees are not a traditional choice for bag of words models, but you should
> make sure you are at least using the parameters of the random forest to
> limit the size (depth, branching) of the trees.
>
> On 16 March 2017 at 12:20, Sasha Kacanski <[email protected]> wrote:
>
>> Hi,
>> As soon as the number of trees and features goes higher, 70 GB of RAM is
>> gone and I am getting out-of-memory errors.
>> The file size is 700 MB. The dataframe quickly shrinks from 14 to 2 columns,
>> but there is a ton of text ...
>> With 10 estimators and 100 features per word I can't tackle ~900 k
>> records ...
>> The training set, about 15% of the data, does perfectly fine, but when the
>> test set comes, that is it.
>>
>> I can split stuff and multiprocess it, but I believe that will simply skew
>> results...
>>
>> Any ideas?
>>
>> --
>> Aleksandar Kacanski
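For reference, the suggestion quoted above (limiting depth and branching) could look roughly like the sketch below. It is only an illustration, not anything Joel posted: the toy corpus, the specific parameter values, and the helper `predict_in_batches` are all assumptions. It also predicts in slices, on the assumption that the memory blow-up happens when the whole test set is transformed and scored at once. Predictions from an already fitted model are computed per row, so batching the prediction step should not change the results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the real text column (hypothetical data).
texts = ["cheap flights to paris", "paris hotel deals", "python memory error",
         "random forest out of memory", "book cheap hotel", "numpy array memory"] * 20
labels = np.array([0, 0, 1, 1, 0, 1] * 20)

# Keep the bag-of-words matrix sparse and cap the vocabulary size.
vectorizer = TfidfVectorizer(max_features=100, dtype=np.float32)
X = vectorizer.fit_transform(texts)

# Bound tree size: depth, leaf size, and number of leaves are all limited.
# The values here are arbitrary and would need tuning on real data.
forest = RandomForestClassifier(
    n_estimators=10,
    max_depth=8,
    min_samples_leaf=5,
    max_leaf_nodes=64,
    random_state=0,
)
forest.fit(X, labels)


def predict_in_batches(model, vectorizer, documents, batch_size=50):
    """Vectorize and predict one slice at a time (hypothetical helper)."""
    parts = []
    for start in range(0, len(documents), batch_size):
        batch = vectorizer.transform(documents[start:start + batch_size])
        parts.append(model.predict(batch))
    return np.concatenate(parts)


predictions = predict_in_batches(forest, vectorizer, texts)
```

Unlike prediction, splitting the training data across separately fitted models would give a different model, so the batching here is applied only after fitting.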
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
