2013/5/18 Joel Nothman <[email protected]>:
>> I think that this is software sophistication that makes it harder to use
>> for people who are not used to complex software construct (think the
>> matlab 101 user), and I for this reason, am -1.

Agree...

> So you'd +1 the transform_threshold object parameter?

There's one issue with this, which is that grid searching
transform_threshold would re-train the estimator many times in the
loop to change a parameter that does not actually affect fit.
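
To illustrate the retraining concern, here is a minimal sketch of the
alternative: fit once, then sweep the threshold at transform time. The
ThresholdSelector class and its transform(X, threshold) signature are
hypothetical, not scikit-learn API, and the least-squares fit is only a
stand-in for an expensive estimator.

```python
import numpy as np


class ThresholdSelector:
    """Hypothetical selector (not scikit-learn API): threshold applied at transform time."""

    def __init__(self):
        self.n_fits = 0  # counts how often the expensive step runs

    def fit(self, X, y):
        self.n_fits += 1
        # Least squares stands in for an expensive model fit.
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def transform(self, X, threshold):
        # The threshold only affects which columns are kept, not the fit.
        return X[:, np.abs(self.coef_) >= threshold]


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = X @ np.array([3.0, 0.0, 0.5, 0.0, 2.0, 0.0, 0.1, 0.0])

selector = ThresholdSelector().fit(X, y)
# Sweep the threshold without refitting.
widths = [selector.transform(X, t).shape[1] for t in (0.05, 0.3, 1.0)]
```

A grid search that treats the threshold as a constructor parameter would
instead call fit once per candidate value.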

As regards methods vs. meta-estimators, I'm not too fond of extra
methods on classifiers to overload them with feature selection. If we
use a meta-estimator, then we can add an option to make it "forget"
the underlying estimator and keep only the mask. I'm currently doing
this manually with linear SVMs, because in multiclass classification,
the coef_ is n_classes × n_features × sizeof(np.float64), while
n_features × sizeof(np.bool) suffices to do feature selection. At 3e6
features × 6 classes, this greatly reduces the stored model's size.
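
For scale: 3e6 features × 6 classes × 8 bytes is about 144 MB of
coefficients, against about 3 MB for a boolean mask. A minimal sketch of
the "keep only the mask" idea, using a small random array as a stand-in
for a fitted coef_ (the sizes and the sparsification step are
assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 6, 1000  # small stand-in for 6 x 3e6

# Stand-in for a sparse multiclass coef_; small weights are zeroed out.
coef = rng.normal(size=(n_classes, n_features))
coef[np.abs(coef) < 1.0] = 0.0

# Keep a feature if any class gives it a nonzero weight.
mask = (coef != 0).any(axis=0)

# The mask alone is enough to reduce new data; coef can be discarded.
X = rng.normal(size=(5, n_features))
X_selected = X[:, mask]

coef_bytes = coef.nbytes  # n_classes * n_features * 8
mask_bytes = mask.nbytes  # n_features * 1
```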

(Also, is it an idea to extend SelectKBest and SelectPercentile to
work with estimators that have feature_importances_?)
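
A rough sketch of what top-k selection from importances could look like.
The helper select_k_best_mask is hypothetical, and the importances array
is made-up data standing in for a fitted estimator's
feature_importances_:

```python
import numpy as np


def select_k_best_mask(importances, k):
    """Hypothetical helper: boolean mask of the k largest importances."""
    importances = np.asarray(importances)
    if not 1 <= k <= importances.shape[0]:
        raise ValueError("k must be between 1 and the number of features")
    top = np.argsort(importances)[-k:]  # indices of the k largest values
    mask = np.zeros(importances.shape[0], dtype=bool)
    mask[top] = True
    return mask


# Made-up values standing in for feature_importances_.
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
mask = select_k_best_mask(importances, k=2)
```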

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
