On 11/13/2015 12:32 AM, Scott Turner wrote:
On Thu, Nov 12, 2015 at 3:41 PM, <scikit-learn-general-requ...@lists.sourceforge.net> wrote:

    https://github.com/scikit-learn/scikit-learn/pull/5805


I wish all my off-hand remarks got such speedy service :-).

To return for a moment to Andreas Mueller's concern over whether an averaging ensemble of regressors is useful: the obvious example is the Netflix Prize, where the winning entries were exactly such blended ensembles of regressors. But let me moderate my suggestion and instead request that VotingClassifier be generalized into a StackingEnsemble.

A StackingEnsemble would take a list of base estimators, a meta-estimator, a partitioning scheme, and a few flags. To fit, it would use the partitioning scheme to split X and y into (X1, y1) and (X2, y2), train the base estimators on the first split, have them predict the second split (call those predictions y2p), and then train the meta-estimator on y2p (plus X2, if a flag for including the base features is set).
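In rough sklearn idiom that might look like the minimal sketch below, using a plain holdout split as a stand-in for the partitioning scheme (all class and parameter names here are hypothetical, not an existing API):

import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.model_selection import train_test_split


class StackingEnsemble(BaseEstimator):
    """Hypothetical sketch of the proposal above; not an sklearn API."""

    def __init__(self, base_estimators, meta_estimator,
                 holdout_size=0.5, use_base_features=False):
        self.base_estimators = base_estimators
        self.meta_estimator = meta_estimator
        self.holdout_size = holdout_size  # stand-in "partitioning scheme"
        self.use_base_features = use_base_features

    def fit(self, X, y):
        # Partition: bases train on (X1, y1); the meta-estimator trains
        # on the bases' predictions for the held-out (X2, y2).
        X1, X2, y1, y2 = train_test_split(X, y, test_size=self.holdout_size)
        self.fitted_bases_ = [clone(e).fit(X1, y1)
                              for e in self.base_estimators]
        Z = self._base_predictions(X2)
        if self.use_base_features:
            Z = np.hstack([Z, X2])
        self.fitted_meta_ = clone(self.meta_estimator).fit(Z, y2)
        return self

    def _base_predictions(self, X):
        # One column of predictions per base estimator (the y2p above).
        return np.column_stack([e.predict(X) for e in self.fitted_bases_])

    def predict(self, X):
        Z = self._base_predictions(X)
        if self.use_base_features:
            Z = np.hstack([Z, X])
        return self.fitted_meta_.predict(Z)

A real partitioning scheme would presumably be pluggable (null split, holdout, K-fold out-of-fold predictions); a single holdout just keeps the sketch short.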

With classifiers as the base estimators, a Voting meta-estimator, and a null partition (no split at all), this is the VotingClassifier. With a holdout partition it becomes blending; with something more sophisticated as the meta-estimator it becomes stacking.

With regressors as the base estimators, a Mean meta-estimator, and a null partition, this is the AveragingEnsemble. With a holdout partition it becomes blending; with something more sophisticated as the meta-estimator it becomes stacking.
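For example, continuing the hypothetical sketch above (MeanRegressor is illustrative only):

from sklearn.base import BaseEstimator
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor


class MeanRegressor(BaseEstimator):
    """Hypothetical meta-estimator that just averages base predictions."""

    def fit(self, Z, y):
        return self

    def predict(self, Z):
        return Z.mean(axis=1)


# Blended averaging ensemble of regressors: bases fit on one half,
# the meta-estimator sees their predictions on the held-out half.
averager = StackingEnsemble(
    base_estimators=[Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)],
    meta_estimator=MeanRegressor(),
    holdout_size=0.5)

# Swap in a learned meta-estimator and it becomes stacking.
stacker = StackingEnsemble(
    base_estimators=[Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)],
    meta_estimator=Ridge(),
    holdout_size=0.5)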

If you use StackingEnsembles as base estimators in another StackingEnsemble, you get multi-level stacking.
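With the sketch above, that is just composition (again hypothetical, reusing the estimators imported in the previous snippet):

inner_a = StackingEnsemble([Ridge(), DecisionTreeRegressor()], Ridge())
inner_b = StackingEnsemble([Ridge(alpha=10.0),
                            DecisionTreeRegressor(max_depth=2)], Ridge())
two_level = StackingEnsemble([inner_a, inner_b], Ridge())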

If you allow an additional optional input to StackingEnsemble.fit() to pass in meta-features that would be used only by the meta-estimator, you get the sort of ensemble that was effective in the Netflix Competition.
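That variant might look roughly like this, again extending the hypothetical sketch (meta_features is assumed to be a 2-D array aligned row-for-row with X):

class MetaFeatureStackingEnsemble(StackingEnsemble):
    """Hypothetical variant: extra features seen only by the meta-estimator."""

    def fit(self, X, y, meta_features=None):
        if meta_features is None:
            return super().fit(X, y)
        # Split everything identically; the base estimators never see
        # meta_features, only the meta-estimator does.
        X1, X2, y1, y2, _, m2 = train_test_split(
            X, y, meta_features, test_size=self.holdout_size)
        self.fitted_bases_ = [clone(e).fit(X1, y1)
                              for e in self.base_estimators]
        Z = np.hstack([self._base_predictions(X2), m2])
        self.fitted_meta_ = clone(self.meta_estimator).fit(Z, y2)
        return self

    def predict(self, X, meta_features=None):
        Z = self._base_predictions(X)
        if meta_features is not None:
            Z = np.hstack([Z, meta_features])
        return self.fitted_meta_.predict(Z)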

There's probably more thought needed about the design and options, but an approach like this would seem to add a lot of capability without overly complicating sklearn with a lot of individual ensemble estimators.

-- Scott


------------------------------------------------------------------------------
It's not a good idea, but I'll post it:
It seems it's possible to build a StackingClassifier fairly easily even in the current master. You can compose a VotingClassifier and some meta-estimator into a Pipeline, because VotingClassifier has a transform method that returns the separate predictions of each base estimator before voting, so it can be fitted and used as a transformer inside a pipeline. With a FeatureUnion you can also pass some features directly to the meta-estimator.

Custom partitioning behaviour would require small modifications to the code, though, and you would have to slice your dataset into feature subsets somehow if you want to use a FeatureUnion in the Pipeline.
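Roughly like this (the estimator choices are just for illustration; with voting='hard', VotingClassifier.transform returns an (n_samples, n_classifiers) array of each base classifier's predicted labels):

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

stacking = Pipeline([
    # Intermediate step: VotingClassifier used purely as a transformer;
    # its transform() emits each base classifier's predictions.
    ('bases', VotingClassifier(estimators=[
        ('lr', LogisticRegression()),
        ('dt', DecisionTreeClassifier()),
        ('nb', GaussianNB())], voting='hard')),
    # Final step: the meta-estimator trained on those predictions.
    ('meta', LogisticRegression()),
])
# Note: stacking.fit(X, y) fits the bases and the meta-estimator on the
# *same* data -- the "null partition" case; a holdout would need the code
# changes mentioned above.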
