Hi Joel, thank you for such a comprehensive answer. Only one more question,
if you don't mind.

I'm considering doing the cross-validation by hand. Are there any concerns
about calling fit() multiple times on the same classifier (without cloning it)?
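
For concreteness, a hand-rolled loop of the kind in question might look like
the sketch below. The synthetic data, the three folds, and LogisticRegression
as a stand-in classifier are all assumptions made for illustration. It refits
one object on every fold and compares the scores with those from a fresh
clone.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(60, 3)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Three hand-made folds over a shuffled index.
folds = np.array_split(rng.permutation(len(X)), 3)

clf = LogisticRegression()
reused_scores, cloned_scores = [], []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])

    # The same object is refitted on every fold, without cloning.
    clf.fit(X[train_idx], y[train_idx])
    reused_scores.append(clf.score(X[test_idx], y[test_idx]))

    # A fresh, unfitted copy with the same parameters, for comparison.
    fresh = clone(clf).fit(X[train_idx], y[train_idx])
    cloned_scores.append(fresh.score(X[test_idx], y[test_idx]))
```

For estimators that follow the usual scikit-learn convention (fit discards
earlier state unless warm_start is set or partial_fit is used), the two lists
would be expected to match. A custom wrapper that keeps state between calls
may behave differently, so this is worth checking case by case.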

Best regards,

José


On Wed, Dec 18, 2013 at 7:33 PM, Joel Nothman <[email protected]> wrote:

> Hi José,
>
> Scikit-learn doesn't currently have anything out-of-the-box on this front,
> and you've identified some ways in which the API makes it tricky.
>
> Yes, there could be a meta-estimator which turns a predictor into a
> transformer (via predict, predict_proba, or decision_function), although
> excessive meta-estimator nesting is never very neat. It could instead be a
> mixin, in which case you would have to subclass the estimator to make it
> into a transformer (although many estimators already have a mixin that
> provides a feature-selection transform() method).
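
A minimal sketch of the meta-estimator variant described above. The class name
ProbaTransformer is hypothetical (it is not part of scikit-learn), and the
data is synthetic. It fits a clone of the wrapped classifier and returns its
predict_proba output from transform().

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper exposing predict_proba as transform()."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        # Fit a clone so the estimator passed in is left untouched.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # One column per class; these become features for a later model.
        return self.estimator_.predict_proba(X)


# Synthetic two-class data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(40, 4)
y = (X[:, 0] > 0).astype(int)

features = ProbaTransformer(LogisticRegression()).fit(X, y).transform(X)
```

Because the wrapper is itself an estimator, cross_val_score or a grid search
would clone and refit it on every fold, which is the behaviour discussed in
the next paragraph.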
>
> You could then incorporate these features either by training and fixing
> the model (e.g. trained on other data), or including it in
> cross-validation. In either case, the default clone operation that happens
> in cross_val_score and *SearchCV will clear any fitted attributes, which
> breaks a fixed model, and makes CV do repeated work.
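
The effect of clone on a fitted model can be seen in a few lines. This is a
small sketch on synthetic data, with LogisticRegression as an arbitrary
example: the copy keeps the constructor parameters but none of the fitted
attributes.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = (X[:, 0] > 0).astype(int)

fitted = LogisticRegression(C=0.5).fit(X, y)
copy = clone(fitted)

# Fitted attributes end in an underscore; clone does not carry them over.
fitted_has_coef = hasattr(fitted, "coef_")
copy_has_coef = hasattr(copy, "coef_")

# Constructor parameters, on the other hand, are preserved.
same_params = copy.get_params() == fitted.get_params()
```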
>
> It might be nice to have a way to fix a model. CV doing repeated work is a
> problem for pipelines in general, as a change to a later estimator's
> parameter doesn't affect the fitting of an earlier estimator's model. The
> proposed solution is to use memoisation (via joblib's Memory), but the
> details, and how to supply this feature without adding complexity, are up
> for debate (see e.g.
> https://github.com/scikit-learn/scikit-learn/pull/2086).
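
As a rough illustration of the memoisation idea only (not of the design
debated in that pull request), a fit call can be cached on disk with joblib's
Memory. The helper fit_cached and the temporary cache directory are
assumptions made for this sketch.

```python
import tempfile

import numpy as np
from joblib import Memory
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

# Cache directory; a temporary folder is used here purely for the example.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

real_fits = []  # appended to only when the function body actually runs


@memory.cache
def fit_cached(estimator, X, y):
    """Hypothetical helper: fit a clone, memoised on (estimator, X, y)."""
    real_fits.append(1)
    return clone(estimator).fit(X, y)


# Synthetic two-class data (illustrative only).
rng = np.random.RandomState(0)
X = rng.randn(30, 2)
y = (X[:, 0] > 0).astype(int)

first = fit_cached(LogisticRegression(), X, y)
second = fit_cached(LogisticRegression(), X, y)  # same inputs: read from cache
```

The cache key is a hash of the arguments, so a change to the estimator's
parameters or to the training data triggers a new fit, while a repeated call
with identical inputs is loaded from disk.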
>
> Cheers,
>
> - Joel
>
>
> On Thu, Dec 19, 2013 at 7:23 AM, José Ricardo <[email protected]> wrote:
>
>> Hi, I'm trying to stack two classifiers. Right now, it's quite simple.
>>
>> I want to classify paragraphs of text and want to use their page
>> classification as one of the features (pages can be classified in two
>> classes).
>>
>> In other words: I want to use the page classifier's predict_proba as a
>> feature of the paragraph classifier.
>>
>> Searching the scikit-learn docs, I didn't manage to find a standard way
>> to stack classifiers in this way. Is there any helper for this task?
>>
>> I created a wrapper that allows me to use the predict_proba method of a
>> classifier as a feature. But when I try to cross-validate (via
>> cross_val_score) the paragraph classifier, the page classifier is reset
>> (cross_val_score tries to fit all classifiers again).
>>
>> Sorry for the long text, but I'm wondering if there are better ways to
>> accomplish this task and I'm still a Machine Learning beginner.
>>
>> Any help will be appreciated.
>>
>> Best regards,
>>
>> José Ricardo
>>
>>
>> ------------------------------------------------------------------------------
>> Rapidly troubleshoot problems before they affect your business. Most IT
>> organizations don't have a clear picture of how application performance
>> affects their revenue. With AppDynamics, you get 100% visibility into your
>> Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics
>> Pro!
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general