Hi,
Yes, I think you are right.
Is the code below how it should be done (scaling + LinearSVC)?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV
estimators = [('scaler', StandardScaler()), ('linear_svm', LinearSVC())]
clf = Pipeline(estimators)
params = dict(linear_svm__C=[0.1, 10, 100])  # prefix is the step name, not the class name
gs = GridSearchCV(clf, param_grid=params)
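And then I would just call the following, where X_train / y_train are placeholders for my training data:

gs.fit(X_train, y_train)
print(gs.best_params_)

Is it correct that the scaler is then re-fit on the training part of every CV split?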
Thank you,
-----Original Message-----
From: Laurent Direr [mailto:[email protected]]
Sent: Thursday, September 11, 2014 11:15 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] modify gridsearch to scale cross-validation
training/test dataset
Hello,
I think a pipeline does precisely what you are asking for:
http://scikit-learn.org/stable/modules/pipeline.html
If you include the scaler as a step in the pipeline it should behave the way
you described in your first email.
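For instance, something along these lines (just a sketch with toy data; plug in your own estimator and data):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import numpy as np

# toy data, only for illustration
X_train, y_train = np.random.randn(20, 3), np.array([0, 1] * 10)
X_test = np.random.randn(5, 3)

pipe = Pipeline([('scaler', StandardScaler()), ('svm', LinearSVC())])
pipe.fit(X_train, y_train)      # the scaler is fit on X_train only
print(pipe.predict(X_test))     # X_test is scaled with X_train's mean/std

When such a pipeline is used inside GridSearchCV, the scaler gets re-fit on the training part of every CV split, so the test fold never leaks into the scaling.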
Laurent
On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm not trying to scale the dataset at the very beginning. I would like to
> scale while doing GridSearchCV, on each cross-validation split.
>
> Thanks,
>
>
> -----Original Message-----
> From: Pagliari, Roberto [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:52 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> I'm not sure how to do it when using GridSearchCV. Can you provide an example?
>
> Thank you,
>
>
> -----Original Message-----
> From: Gael Varoquaux [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:50 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> Use a pipeline.
>
> G
>
> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>> Hello,
>> Grid search with CV is something like this at a high level:
>
>
>> for every combination of parameters:
>>     for every partition of the training data:
>>         split training into train_cv and test_cv
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> I would like to do something like this:
>
>
>> for every combination of parameters:
>>     for every partition of the training data:
>>         split training into train_cv and test_cv
>>         scaler = StandardScaler()
>>         scaler.fit(train_cv)
>>         train_cv = scaler.transform(train_cv)
>>         test_cv = scaler.transform(test_cv)
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> Basically, I would like to scale the training and test data (using parameters
>> fitted on the training data) every time a CV train/test split is generated.
>> Can someone suggest the best way to modify grid_search.py to do this?
>
>
>> Thank you,
>
>
>