Joel, thank you so much for the lesson on scikit-learn details that I
didn't know.
Your approach is way cleaner than my previous attempt.
Thank you.

On Sat, Dec 21, 2013 at 6:13 PM, Joel Nothman <[email protected]> wrote:
> I've realised that one way to get around the clearing of the model on
> `clone` is to pass a method, rather than an estimator, into your stacking
> transformer. https://gist.github.com/jnothman/8074321 But it won't
> pickle. - Joel
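
A minimal sketch of the idea quoted above, as I understand it. This is not a copy of the gist. The class name MethodTransformer and the toy data are made up for illustration, and the sketch assumes a current scikit-learn on Python 3, where clone() deep-copies any parameter that is not itself an estimator, so the bound method should carry a copy of the fitted model along with it.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class MethodTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper around a bound method of a fitted model."""

    def __init__(self, method):
        # Stored untouched so that get_params() and clone() work as usual.
        self.method = method

    def fit(self, X, y=None):
        # Nothing to learn here: the wrapped model is already fitted.
        return self

    def transform(self, X):
        # Delegate to the wrapped method, e.g. predict_proba.
        return self.method(X)


# Toy stand-in for the page classifier.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
page_clf = LogisticRegression().fit(X, y)

stacker = MethodTransformer(page_clf.predict_proba)
cloned = clone(stacker)  # the same operation cross_val_score applies

# The clone should still produce the fitted model's probabilities.
survives_clone = np.allclose(cloned.transform(X), page_clf.predict_proba(X))
print(survives_clone)
```

Passing page_clf itself instead of page_clf.predict_proba would hand clone() an estimator, and the fitted state would be dropped.
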
>
>
> On Fri, Dec 20, 2013 at 7:49 AM, Joel Nothman <[email protected]> wrote:
>
>> Unless it is an estimator with warm_start=True, fit() should not be
>> affected by previous state (I hope I'm right in that :P).
>>
>> And there's no shame in doing cross-validation by hand =) But it would
>> indeed be nice if stacking were easier in scikit-learn.
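
A small sketch of both points, under assumptions: it uses LogisticRegression with default settings (so warm_start is off), made-up deterministic toy data, and the KFold class from sklearn.model_selection found in current releases (the module layout was different in 2013). The first part checks that a second call to fit() gives the same coefficients as a fresh estimator; the second part is cross-validation by hand with one reused classifier object.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Deterministic toy data: alternating labels, first feature tracks the label.
idx = np.arange(40)
y = idx % 2
X = np.column_stack([y + 0.1 * np.sin(idx), np.cos(idx)])

# 1) Refitting the same object versus fitting a fresh one on the same data.
reused = LogisticRegression()
reused.fit(X[:20], y[:20])
reused.fit(X[20:], y[20:])  # second fit replaces the first model
fresh = LogisticRegression().fit(X[20:], y[20:])
same_model = np.allclose(reused.coef_, fresh.coef_)

# 2) Cross-validation by hand, reusing one classifier object across folds.
clf = LogisticRegression()
scores = []
for train_idx, test_idx in KFold(n_splits=4).split(X):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(same_model, len(scores))
```
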
>>
>>
>> On Fri, Dec 20, 2013 at 6:47 AM, José Ricardo <[email protected]> wrote:
>>
>>> Hi Joel, thank you for such a comprehensive answer. Only one more
>>> question, if you don't mind.
>>>
>>> I'm considering doing the cross-validation by hand. Are there any
>>> concerns about calling fit() multiple times on the same classifier
>>> (without cloning it)?
>>>
>>> Best regards,
>>>
>>> José
>>>
>>>
>>> On Wed, Dec 18, 2013 at 7:33 PM, Joel Nothman <[email protected]> wrote:
>>>
>>>> Hi José,
>>>>
>>>> Scikit-learn doesn't currently have anything out-of-the-box on this
>>>> front, and you've identified some ways in which the API makes it tricky.
>>>>
>>>> Yes, there could be a meta-estimator which turns a predictor into a
>>>> transformer (via predict, predict_proba, or decision_function), although
>>>> excessive meta-estimator nesting is never very neat. It could instead be a
>>>> mixin, in which case you would have to subclass the estimator to make it
>>>> into a transformer (although many estimators already have a mixin that
>>>> provides a feature selection transform() method).
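
A rough sketch of the meta-estimator variant described above, assuming a current scikit-learn. ProbaTransformer is a hypothetical name, not an existing scikit-learn class, and the data is a toy stand-in. It fits a clone of the wrapped classifier and returns predict_proba output from transform().

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical meta-estimator: exposes predict_proba as features."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # One output column per class.
        return self.estimator_.predict_proba(X)


# Toy data standing in for the real features and labels.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

features = ProbaTransformer(LogisticRegression()).fit(X, y).transform(X)
print(features.shape)
```

Because the wrapper is refitted whenever fit() is called, this corresponds to including the model in cross-validation rather than fixing it.
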
>>>>
>>>> You could then incorporate these features either by training and fixing
>>>> the model (e.g. trained on other data), or including it in
>>>> cross-validation. In either case, the default clone operation that happens
>>>> in cross_val_score and *SearchCV will clear any fitted attributes, which
>>>> breaks a fixed model, and makes CV do repeated work.
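
The clearing behaviour can be seen directly with sklearn.base.clone. The sketch below uses made-up toy data and LogisticRegression purely as an example: the fitted estimator has a coef_ attribute, while its clone keeps only the constructor parameters.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

fitted = LogisticRegression(C=0.5).fit(X, y)
copy_of_fitted = clone(fitted)

# Learned attributes (trailing underscore) are not carried over by clone().
has_coef_before = hasattr(fitted, "coef_")
has_coef_after = hasattr(copy_of_fitted, "coef_")

# Constructor parameters are carried over.
same_params = copy_of_fitted.get_params()["C"] == 0.5

print(has_coef_before, has_coef_after, same_params)
```
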
>>>>
>>>> It might be nice to have a way to fix a model. CV doing repeated work
>>>> is a problem for pipelines in general, as a change to a later estimator's
>>>> parameter doesn't affect the fitting of an earlier estimator's model. The
>>>> proposed solution is to use memoisation (via joblib's Memory), but the
>>>> details, and how to supply this feature without adding complexity, are up
>>>> for debate (see e.g.
>>>> https://github.com/scikit-learn/scikit-learn/pull/2086).
>>>>
>>>> Cheers,
>>>>
>>>> - Joel
>>>>
>>>>
>>>> On Thu, Dec 19, 2013 at 7:23 AM, José Ricardo <[email protected]> wrote:
>>>>
>>>>> Hi, I'm trying to stack two classifiers. Right now, it's quite simple.
>>>>>
>>>>> I want to classify paragraphs of text and want to use their page
>>>>> classification as one of the features (pages can be classified in two
>>>>> classes).
>>>>>
>>>>> In other words: I want to use the page classifier's predict_proba as a
>>>>> feature of the paragraph classifier.
>>>>>
>>>>> Searching the scikit-learn docs, I didn't manage to find a standard
>>>>> way to stack classifiers like this. Is there any helper for this task?
>>>>>
>>>>> I created a wrapper that allows me to use the predict_proba method of
>>>>> a classifier as a feature. But when I try to cross-validate (via
>>>>> cross_val_score) the paragraph classifier, the page classifier is reset
>>>>> (cross_val_score tries to fit all classifiers again).
>>>>>
>>>>> Sorry for the long text, but I'm wondering if there are better ways to
>>>>> accomplish this task and I'm still a Machine Learning beginner.
>>>>>
>>>>> Any help will be appreciated.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> José Ricardo
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> Rapidly troubleshoot problems before they affect your business. Most IT
>>>>> organizations don't have a clear picture of how application performance
>>>>> affects their revenue. With AppDynamics, you get 100% visibility into
>>>>> your
>>>>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of
>>>>> AppDynamics Pro!
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> [email protected]
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>