> The size is dominated by the n_features * n_classes coef_ matrix,
> which you can't get rid of just like that. What does your problem look
> like?
Document classification with ~3000 categories and ~12000 documents. The number of features comes out to 500,000 (in which case the dumped joblib classifier is 10 GB). If I use SelectKBest to select the 200,000 best features, the size comes down to 4.8 GB while maintaining the accuracy at 97%. But I am not sure whether there is another alternative that does not sacrifice accuracy.
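As a rough sanity check on those sizes, here is a back-of-the-envelope sketch of how much memory a dense coef_ matrix would need. It assumes the coefficients are stored as a dense array of 8-byte floats (float64) with no compression, which is an assumption and not something confirmed in this thread; the helper name `dense_coef_bytes` is hypothetical.

```python
import struct

# Assumption: coefficients are dense float64 values (8 bytes each).
FLOAT64_BYTES = struct.calcsize("d")


def dense_coef_bytes(n_features, n_classes, itemsize=FLOAT64_BYTES):
    """Approximate size in bytes of a dense (n_classes, n_features) array."""
    return n_features * n_classes * itemsize


n_classes = 3000
for n_features in (500_000, 200_000):
    size_gb = dense_coef_bytes(n_features, n_classes) / 1e9
    print(f"{n_features:>7} features x {n_classes} classes: ~{size_gb:.1f} GB")
```

Under that assumption, 200,000 features give about 4.8 GB, which matches the dump size after SelectKBest, and 500,000 features give about 12 GB, which is in the same ballpark as the 10 GB dump. This is consistent with the size scaling linearly with n_features * n_classes.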
