I've realised that one way to get around `clone` clearing the fitted model
is to pass a bound method of the fitted estimator (e.g. its predict_proba),
rather than the estimator itself, into your stacking transformer:
https://gist.github.com/jnothman/8074321
But it won't pickle.
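
A minimal sketch of the idea follows. It is not the code from the gist: `MethodTransformer`, the toy data and the variable names are made up for illustration. It relies on `clone` copying constructor parameters that are not estimators instead of resetting them, so the fitted model behind the bound method should survive.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class MethodTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that exposes a bound method as transform()."""

    def __init__(self, method):
        self.method = method

    def fit(self, X, y=None):
        # Nothing to learn here: the method belongs to an already-fitted model.
        return self

    def transform(self, X):
        out = np.asarray(self.method(X))
        # Return a 2-D feature block even for 1-D outputs such as predict().
        return out.reshape(-1, 1) if out.ndim == 1 else out


# Toy data standing in for the page-level features and labels.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
page_clf = LogisticRegression().fit(X, y)

stacker = MethodTransformer(page_clf.predict_proba)
cloned = clone(stacker)  # the same operation cross_val_score applies
page_features = cloned.fit(X, y).transform(X)  # one column per class
```

The clone still produces the probabilities of the original fitted classifier, because only the wrapper is rebuilt.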
- Joel
On Fri, Dec 20, 2013 at 7:49 AM, Joel Nothman <[email protected]> wrote:
> Unless it is an estimator with warm_start=True, fit() should not be
> affected by previous state (I hope I'm right in that :P).
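
A small check of that claim, as a sketch only: it uses LogisticRegression with its default `warm_start=False` and made-up random data, and compares a refitted estimator against a fresh clone trained on the second data set alone.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_a = rng.randn(40, 3)
y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.randn(40, 3)
y_b = (X_b[:, 1] > 0).astype(int)

clf = LogisticRegression()  # warm_start defaults to False
clf.fit(X_a, y_a)           # first fit
clf.fit(X_b, y_b)           # second fit on the same object, no clone

# An unfitted copy that only ever sees the second data set.
fresh = clone(clf).fit(X_b, y_b)

same_model = np.allclose(clf.coef_, fresh.coef_) and np.allclose(
    clf.intercept_, fresh.intercept_
)
```

If `same_model` is true, the first call to fit() left no trace in the second one for this estimator.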
>
> And there's no shame in doing cross-validation by hand =) But it would
> indeed be nice if stacking were easier in scikit-learn.
>
>
> On Fri, Dec 20, 2013 at 6:47 AM, José Ricardo <[email protected]> wrote:
>
>> Hi Joel, thank you for such a comprehensive answer. Only one more
>> question, if you don't mind.
>>
>> I'm considering doing the cross-validation by hand. Are there any
>> concerns about calling fit() multiple times on the same classifier
>> (without cloning it)?
>>
>> Best regards,
>>
>> José
>>
>>
>> On Wed, Dec 18, 2013 at 7:33 PM, Joel Nothman <[email protected]> wrote:
>>
>>> Hi José,
>>>
>>> Scikit-learn doesn't currently have anything out-of-the-box on this
>>> front, and you've identified some ways in which the API makes it tricky.
>>>
>>> Yes, there could be a meta-estimator which turns a predictor into a
>>> transformer (via predict, predict_proba, or decision_function), although
>>> excessive meta-estimator nesting is never very neat. It could instead be a
>>> mixin, in which case you would have to subclass the estimator to make it
>>> into a transformer (although many estimators already have a mixin that
>>> provides a feature-selection transform() method).
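
A minimal sketch of such a meta-estimator, assuming the predict_proba route. `ProbaTransformer` is a hypothetical name, not something scikit-learn provides, and the data is made up.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical meta-estimator: predict_proba exposed as transform()."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a clone so the estimator passed to __init__ stays untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # One output column per class.
        return self.estimator_.predict_proba(X)


rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = (X[:, 0] > 0).astype(int)

stacked = ProbaTransformer(LogisticRegression()).fit(X, y).transform(X)
```

Because it follows the fit/transform convention, a wrapper like this could sit inside a Pipeline or FeatureUnion in front of the final classifier.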
>>>
>>> You could then incorporate these features either by training and fixing
>>> the model (e.g. trained on other data), or by including it in
>>> cross-validation. In either case, the default clone operation that happens
>>> in cross_val_score and *SearchCV will clear any fitted attributes, which
>>> breaks a fixed model and makes CV do repeated work.
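
To make the effect of clone concrete, here is a small sketch on made-up data: the constructor parameters survive, the fitted attributes do not.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

fitted = LogisticRegression(C=10.0).fit(X, y)
unfitted = clone(fitted)

has_model_before = hasattr(fitted, "coef_")    # True: fit() set coef_
has_model_after = hasattr(unfitted, "coef_")   # False: clone() dropped it
same_params = unfitted.get_params() == fitted.get_params()  # True
```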
>>>
>>> It might be nice to have a way to fix a model. CV doing repeated work is
>>> a problem for pipelines in general: changing a later estimator's
>>> parameter doesn't affect the fitting of an earlier estimator's model, yet
>>> that earlier model is refit anyway. The
>>> proposed solution is to use memoisation (via joblib's Memory), but the
>>> details, and how to supply this feature without adding complexity, are up
>>> for debate (see e.g.
>>> https://github.com/scikit-learn/scikit-learn/pull/2086).
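
For reference, later scikit-learn releases let Pipeline take a `memory` argument that caches fitted transformers through joblib. The sketch below assumes such a release; the data, the parameter grid and the temporary cache directory are made up for illustration.

```python
import shutil
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

cache_dir = tempfile.mkdtemp()
try:
    pipe = Pipeline(
        [("scale", StandardScaler()), ("clf", LogisticRegression())],
        memory=cache_dir,  # fitted transformers are cached in this directory
    )
    # Only the final step's parameter changes, so within each fold the
    # scaler fit can be loaded from the cache instead of being recomputed.
    search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
    search.fit(X, y)
    best_C = search.best_params_["clf__C"]
finally:
    shutil.rmtree(cache_dir)  # remove the temporary cache
```

The saving is negligible for a scaler, but the same pattern applies when the earlier step is an expensive model.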
>>>
>>> Cheers,
>>>
>>> - Joel
>>>
>>>
>>> On Thu, Dec 19, 2013 at 7:23 AM, José Ricardo
>>> <[email protected]> wrote:
>>>
>>>> Hi, I'm trying to stack two classifiers. Right now, it's quite simple.
>>>>
>>>> I want to classify paragraphs of text and want to use their page
>>>> classification as one of the features (pages can be classified in two
>>>> classes).
>>>>
>>>> In other words: I want to use the page classifier's predict_proba as a
>>>> feature of the paragraph classifier.
>>>>
>>>> Searching the scikit-learn docs, I didn't manage to find a standard
>>>> way to stack classifiers like this. Is there any helper for this task?
>>>>
>>>> I created a wrapper that allows me to use the predict_proba method of a
>>>> classifier as a feature. But when I try to cross-validate (via
>>>> cross_val_score) the paragraph classifier, the page classifier is reset
>>>> (cross_val_score tries to fit all classifiers again).
>>>>
>>>> Sorry for the long text. I'm still a machine learning beginner, and I'm
>>>> wondering whether there are better ways to accomplish this task.
>>>>
>>>> Any help will be appreciated.
>>>>
>>>> Best regards,
>>>>
>>>> José Ricardo
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>>> organizations don't have a clear picture of how application performance
>>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>>> your
>>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>>> AppDynamics Pro!
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>