2013/5/18 Joel Nothman <[email protected]>:
>> I think that this is software sophistication that makes it harder to use
>> for people who are not used to complex software constructs (think the
>> Matlab 101 user), and for this reason, I am -1.
Agree...

> So you'd +1 the transform_threshold object parameter?

There's one issue with this: grid searching transform_threshold would
re-train the estimator many times in the loop, just to change a parameter
that does not actually affect fit.

As regards methods vs. meta-estimators, I'm not too fond of extra methods
on classifiers that overload them with feature selection. If we use a
meta-estimator, then we can add an option to make it "forget" the
underlying estimator and keep only the mask. I'm currently doing this
manually with linear SVMs, because in multiclass classification, coef_
takes n_classes × n_features × sizeof(np.float64) bytes, while
n_features × sizeof(np.bool) suffices for feature selection. At 3e6
features × 6 classes, this greatly reduces the stored model's size.

(Also, would it be an idea to extend SelectKBest and SelectPercentile to
work with estimators that have feature_importances_?)

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
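To illustrate the "forget the estimator, keep only the mask" idea: a minimal
sketch of such a meta-estimator. This is hypothetical code, not an existing
scikit-learn class (the class name, the threshold parameter, and the dummy
estimator are all made up for the example); it just shows how fitting can
produce a boolean mask from coef_ and then drop the fitted model.

```python
import numpy as np

class MaskingSelector:
    """Hypothetical meta-estimator: fit a linear model, derive a boolean
    feature mask from its coef_, then discard the model so only the
    n_features-sized mask is stored."""

    def __init__(self, estimator, threshold=1e-5):
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y):
        est = self.estimator
        est.fit(X, y)
        coef = np.atleast_2d(est.coef_)  # shape (n_classes, n_features)
        # keep a feature if any class gives it a weight above the threshold
        self.mask_ = (np.abs(coef) > self.threshold).any(axis=0)
        self.estimator = None            # "forget" the fitted estimator
        return self

    def transform(self, X):
        return X[:, self.mask_]

# Tiny stand-in estimator so the sketch runs without scikit-learn.
class DummyLinear:
    def fit(self, X, y):
        # pretend only the first two features got nonzero weights
        self.coef_ = np.array([[0.8, -0.5, 0.0, 0.0]])
        return self

X = np.random.rand(5, 4)
sel = MaskingSelector(DummyLinear()).fit(X, np.zeros(5))
print(sel.mask_)                # [ True  True False False]
print(sel.transform(X).shape)   # (5, 2)

# The memory argument from above: at 3e6 features and 6 classes, coef_
# costs roughly 3e6 * 6 * 8 bytes ≈ 144 MB, while the boolean mask costs
# only about 3e6 bytes ≈ 3 MB.
```

The key point is that after fit, nothing but the boolean mask survives, so
pickling the selector stores n_features bytes instead of the full
n_classes × n_features float matrix.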
