> You could cut that in half by converting coef_ and optionally
> intercept_ to np.float32 (that's not officially supported, but with
> the current implementation it should work):
>
>     clf.coef_ = clf.coef_.astype(np.float32)
>
> You could also try the HashingVectorizer in sklearn.feature_extraction
> and see if performance is still acceptable with a small number of
> features. That also skips storing the vocabulary, which I imagine will
> be quite large as well.

HashingVectorizer might indeed save some space. I will test whether performance is still acceptable with it.
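For reference, this is roughly what I plan to try (an untested sketch; the n_features value and the toy train_docs / y_train data are placeholders for my real corpus and labels):

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy stand-ins for the real documents and labels.
train_docs = ["first sample document", "second sample document",
              "third one about something else", "fourth one, different topic"]
y_train = [0, 0, 1, 1]

# Hashing trick: no vocabulary is stored, so the vectorizer adds almost
# nothing to the pickled model. n_features is the knob that trades
# accuracy against the size of coef_.
vect = HashingVectorizer(n_features=2**18)
X_train = vect.transform(train_docs)

clf = SGDClassifier()
clf.fit(X_train, y_train)

# Roughly halve the estimator size by downcasting, as suggested above
# (not officially supported, but works with the current implementation).
clf.coef_ = clf.coef_.astype(np.float32)
clf.intercept_ = clf.intercept_.astype(np.float32)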
> (I hope you meant 12000 documents *per class*?)

:( Unfortunately, no, I have 12000 documents in all, at least as a starting point. Initially the goal is just to collect metrics; as time goes on, more documents per category will be added. Besides, I am also limited on training time, which seems to go over an hour as the number of samples goes up (my very first attempt was with 200k documents). Thanks for the suggestions.
