> 
> You could cut that in half by converting coef_ and optionally
> intercept_ to np.float32 (that's not officially supported, but with
> the current implementation it should work):
> 
>     clf.coef_ = clf.coef_.astype(np.float32)
> 
> You could also try the HashingVectorizer in sklearn.feature_extraction
> and see if performance is still acceptable with a small number of
> features. That also skips storing the vocabulary, which I imagine will
> be quite large as well.
> 
 HashingVectorizer might indeed save some space; I will test whether accuracy
stays acceptable with a smaller number of features. A rough sketch of what I
plan to try is below.
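
Something along these lines is what I have in mind (the classifier choice,
the 2**16 feature count, and the placeholder documents/labels are just my own
guesses for the test, not settled values):

    import numpy as np
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    docs = ["first training document", "second training document"]  # placeholder corpus
    labels = [0, 1]                                                  # placeholder labels

    # Small, fixed number of features; no vocabulary_ to store alongside the model.
    vectorizer = HashingVectorizer(n_features=2 ** 16)
    X = vectorizer.transform(docs)

    clf = SGDClassifier()
    clf.fit(X, labels)

    # Downcast the learned weights as suggested; roughly halves their memory.
    print(clf.coef_.nbytes)
    clf.coef_ = clf.coef_.astype(np.float32)
    clf.intercept_ = clf.intercept_.astype(np.float32)
    print(clf.coef_.nbytes)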

> (I hope you meant 12000 document *per class*?)
> 
 :( Unfortunately, no, I have 12000 documents in all, at least as a starting
point. Initially the goal is just to collect metrics, and more documents per
category will be added over time. I am also limited on training time, which
already goes over an hour as the number of samples grows. (My very first
attempt was with 200k documents.)
Thanks for the suggestions.




