On 11/13/2015 12:32 AM, Scott Turner wrote:
On Thu, Nov 12, 2015 at 3:41 PM,
<scikit-learn-general-requ...@lists.sourceforge.net> wrote:
https://github.com/scikit-learn/scikit-learn/pull/5805
I wish all my off-hand remarks got such speedy service :-).
To return for a moment to Andreas Mueller's concern over whether an
averaging ensemble of regressors is useful: the obvious example is the
Netflix Prize. But let me moderate my suggestion and instead request
that VotingClassifier be generalized into a StackingEnsemble.
A StackingEnsemble would take a list of base estimators, a
meta-estimator, a partitioning scheme, and a few flags. To fit, it
would use the partitioning scheme to split X and y into two parts,
train the base estimators on the first split, use the trained base
estimators to predict the second split, and then train the
meta-estimator on those predictions, y2p (plus X2 if a flag for
including the base features is set).
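To make that concrete, here's a rough sketch of what I have in mind.
Everything below is hypothetical (the class name, the constructor
arguments, the partitioner callable); it's meant to show the data flow,
not to be a finished implementation:

    import numpy as np
    from sklearn.base import BaseEstimator, clone

    class StackingEnsemble(BaseEstimator):
        """Hypothetical sketch -- not an existing scikit-learn class."""

        def __init__(self, base_estimators, meta_estimator,
                     partitioner=None, use_base_features=False):
            self.base_estimators = base_estimators
            self.meta_estimator = meta_estimator
            self.partitioner = partitioner          # None = null partition
            self.use_base_features = use_base_features

        def fit(self, X, y):
            if self.partitioner is None:
                X1, y1, X2, y2 = X, y, X, y         # null partition: reuse all data
            else:
                X1, y1, X2, y2 = self.partitioner(X, y)
            self.base_estimators_ = [clone(e).fit(X1, y1)
                                     for e in self.base_estimators]
            y2p = self._base_predictions(X2)        # base predictions on split 2
            if self.use_base_features:
                y2p = np.hstack([y2p, X2])          # optionally append X2 itself
            self.meta_estimator_ = clone(self.meta_estimator).fit(y2p, y2)
            return self

        def _base_predictions(self, X):
            # one column of predictions per fitted base estimator
            return np.column_stack([e.predict(X) for e in self.base_estimators_])

        def predict(self, X):
            Z = self._base_predictions(X)
            if self.use_base_features:
                Z = np.hstack([Z, X])
            return self.meta_estimator_.predict(Z)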
With classifiers as the base estimators, a Voting meta-estimator, and a
null partition, this is the VotingClassifier. With a holdout partition
it becomes blending. With something more sophisticated as the
meta-estimator it becomes stacking.
With regressors as the base estimators, a Mean meta-estimator, and a
null partition, this is the AveragingEnsemble. Again, with a holdout it
becomes blending, and with a more sophisticated meta-estimator it
becomes stacking. Both configurations are illustrated below.
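In code, with the hypothetical class above (MeanRegressor and holdout
are made up for the example; train_test_split is in
sklearn.cross_validation in older releases):

    from sklearn.base import BaseEstimator
    from sklearn.linear_model import Ridge, LinearRegression
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split

    class MeanRegressor(BaseEstimator):
        # trivial meta-estimator: column-wise mean of the base predictions
        def fit(self, Z, y):
            return self
        def predict(self, Z):
            return Z.mean(axis=1)

    def holdout(X, y):
        # 50/50 holdout partitioner for the sketch above
        X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5,
                                          random_state=0)
        return X1, y1, X2, y2

    bases = [Ridge(), SVR()]

    # null partition + mean meta-estimator -> the AveragingEnsemble case
    averaging = StackingEnsemble(bases, MeanRegressor())

    # holdout partition + linear meta-estimator -> blending
    blending = StackingEnsemble(bases, LinearRegression(),
                                partitioner=holdout)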
If you use StackingEnsembles as base estimators in another
StackingEnsemble, you get multi-level stacking.
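Continuing the sketch, the nesting would just be:

    # level-1 ensembles become base estimators for a level-2 ensemble
    level1_a = StackingEnsemble([Ridge(), SVR()], LinearRegression(),
                                partitioner=holdout)
    level1_b = StackingEnsemble([Ridge(alpha=10.0), SVR(C=0.1)],
                                LinearRegression(), partitioner=holdout)
    two_level = StackingEnsemble([level1_a, level1_b], LinearRegression(),
                                 partitioner=holdout)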
If you allow an additional optional input to StackingEnsemble.fit() to
pass in meta-features that would be used only by the meta-estimator,
you get the sort of ensemble that was effective in the Netflix
Competition.
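A procedural sketch of that data flow (M2 below stands for the
meta-features aligned with the second split; none of these names are
real API):

    import numpy as np
    from sklearn.base import clone

    def fit_with_meta_features(bases, meta_estimator, X1, y1, X2, y2, M2):
        # base models see only the ordinary features
        fitted = [clone(b).fit(X1, y1) for b in bases]
        y2p = np.column_stack([b.predict(X2) for b in fitted])
        # only the meta-estimator sees the meta-features
        meta_estimator.fit(np.hstack([y2p, M2]), y2)
        return fitted, meta_estimator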
There's probably more thought needed on the design and options, but an
approach like this seems to add a lot of capability without overly
complicating sklearn with many individual ensemble estimators.
-- Scott
It's not a good idea, but I'll post it:
It seems possible to create a StackingClassifier quite easily even on
the current master. You can compose a VotingClassifier and a
meta-estimator into a Pipeline, because VotingClassifier has a
transform method that returns the separate predictions of each base
estimator before voting, so it can be fitted and used as a transformer
inside a pipeline. With FeatureUnion you can also pass some features
directly to the meta-estimator. Custom partitioning behaviour would
still require small modifications to the code, though, and you have to
slice your dataset into feature subsets somehow if you want to use
FeatureUnion in a Pipeline.
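Something like this (a sketch against the actual scikit-learn API; with
voting='hard' the transform output is one predicted-label column per
base estimator, while soft voting returns a differently shaped array,
so check the version you're on):

    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.preprocessing import FunctionTransformer
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    base = VotingClassifier(
        estimators=[('lr', LogisticRegression()),
                    ('nb', GaussianNB()),
                    ('dt', DecisionTreeClassifier())],
        voting='hard')  # transform() -> (n_samples, n_estimators) labels

    # FeatureUnion passes the first two raw columns straight through to
    # the meta-estimator, alongside the base predictions
    features = FeatureUnion([
        ('preds', base),
        ('raw', FunctionTransformer(lambda X: X[:, :2], validate=False)),
    ])

    # base estimators and meta-estimator are fit on the same data here
    # (the null-partition case); blending needs the partition changes
    # mentioned above
    stack = Pipeline([('features', features),
                      ('meta', LogisticRegression())])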