Hello,

I think a pipeline does precisely what you are asking for:
http://scikit-learn.org/stable/modules/pipeline.html

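If you include the scaler as a step in the pipeline, it is refit on the
training part of each cross-validation split and then applied to the
matching test part, which is the behaviour you described in your first
email.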
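
Since an example was requested earlier in the thread, here is a minimal
sketch of the idea, not a definitive recipe. It assumes a recent
scikit-learn release, where GridSearchCV is imported from
sklearn.model_selection (older releases had it in sklearn.grid_search).
The synthetic data, the SVC classifier, the step names "scaler" and "clf",
and the parameter values are all placeholders chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data, used only so the sketch is self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The scaler is a pipeline step, so every fit of the pipeline fits the
# scaler on the data passed to that fit (the training part of a CV split).
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", SVC()),
])

# Parameters of a step are addressed as <step name>__<parameter name>.
param_grid = {
    "clf__C": [0.1, 1.0, 10.0],
    "clf__gamma": [0.01, 0.1],
}

# For each parameter combination and each CV split, GridSearchCV clones
# the pipeline, fits it on the training part and scores it on the test
# part, which is transformed with the scaler fitted on the training part.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

With this setup there is no need to modify grid_search.py: the scaling
happens inside each fit of the pipeline rather than once on the whole
dataset.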

Laurent

On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm not trying to scale the dataset at the very beginning. I would like to 
> scale while doing GridSearchCV.
>
> Thanks,
>
>
> -----Original Message-----
> From: Pagliari, Roberto [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:52 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale 
> cross-validation training/test dataset
>
> I'm not sure how to do it when using gridsearch. Can you provide an example?
>
> Thank you,
>
>
> -----Original Message-----
> From: Gael Varoquaux [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:50 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale 
> cross-validation training/test dataset
>
> Use a pipeline.
>
> G
>
> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>> Hello,
>> Gridsearch with CV is something like this at a high level:
>
>
>> for every combination of parameters:
>>     for every partition of training data:
>>       split training into train_cv and test_cv
>>       train_classifier(train_cv).predict(test_cv)
>>       compute score
>>     average score
>>     if max so far, then update best params
>
>
>> I would like to do something like this:
>
>
>> for every combination of parameters:
>>     for every partition of training data:
>>       split training into train_cv and test_cv
>>       scaler = StandardScaler()
>>       scaler.fit(train_cv)
>>       train_cv = scaler.transform(train_cv)
>>       test_cv = scaler.transform(test_cv)
>>       train_classifier(train_cv).predict(test_cv)
>>       compute score
>>     average score
>>     if max so far, then update best params
>
>
>> basically, I would like to scale training data and test data (using
>> training data params) every time a CV train/test is generated.
>> Can someone suggest the best way to modify grid_search.py to do this?
>
>
>> Thank you,
>
>
>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>

