Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

Pagliari, Roberto Thu, 11 Sep 2014 09:58:46 -0700

Thank you,
I’m going to try shortly.

And in general, if I want to put my own step in the pipeline, is the only
requirement that its class has a “fit” method?


Thank you again,
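
For reference on the question above: as far as the scikit-learn Pipeline documentation goes, every step except the last must implement both fit and transform, while the last step only needs fit. Below is a minimal sketch of a custom step under that assumption. The class name ColumnCenterer and the synthetic data are hypothetical and only for illustration; they are not from this thread.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


class ColumnCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical custom step: subtracts the per-column training mean."""

    def fit(self, X, y=None):
        # Learn the statistics from the training data only.
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the statistics learned in fit() to any data set.
        return np.asarray(X) - self.mean_


# Synthetic data, only for illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

pipe = Pipeline([
    ('centerer', ColumnCenterer()),             # intermediate step: fit + transform
    ('linear_svm', LinearSVC(random_state=0)),  # final step: fit is enough
])
pipe.fit(X, y)

# The fitted custom step is reachable by its step name.
centered = pipe.named_steps['centerer'].transform(X)
predictions = pipe.predict(X)
```

Inheriting from BaseEstimator also provides get_params and set_params, which GridSearchCV relies on if the custom step has constructor parameters to tune.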


From: Josh Vredevoogd [mailto:[email protected]]
Sent: Thursday, September 11, 2014 12:52 PM
To: [email protected]
Subject: Re: [Scikit-learn-general] modify gridsearch to scale cross-validation 
training/test dataset

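You're missing "estimators =" at the start of the first line, I guess.
Also, the parameter key has to use the step name you gave in the pipeline
('linear_svm'), not the class name, so params should be:
params = dict(linear_svm__C=[0.1, 10, 100])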
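
Putting the corrections together, a complete version might look like the sketch below. It assumes a recent scikit-learn release, where GridSearchCV is imported from sklearn.model_selection and the scaler is called StandardScaler (as in the first email) rather than Scaler. The synthetic data, cv=5 and max_iter are illustrative choices, not something from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Each step is a (name, estimator) pair; the names are used in the grid keys.
estimators = [
    ('scaler', StandardScaler()),
    ('linear_svm', LinearSVC(random_state=0, max_iter=10000)),
]
clf = Pipeline(estimators)

# Grid keys have the form '<step name>__<parameter name>'.
params = dict(linear_svm__C=[0.1, 10, 100])

# For every CV split, the whole pipeline is refit on the training part,
# so the scaler only ever sees the training fold.
gs = GridSearchCV(clf, param_grid=params, cv=5)
gs.fit(X, y)

print(gs.best_params_)
print(gs.best_score_)
```

After fitting, gs.best_estimator_ is the pipeline refit on the full data with the best C, so gs.predict(new_X) scales new_X with the training statistics before classifying.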

On Thu, Sep 11, 2014 at 9:44 AM, Pagliari, Roberto <[email protected]> wrote:
Hi,
Yes, I think you are right.

Is the code below how it should be done (scaling+linearsvc)?

[('scaler', Scaler()), ('linear_svm', LinearSVC())]
clf = Pipeline(estimators)
params = dict(LinearSVC__C=[0.1, 10, 100])
gs = GridSearchCV(clf, param_grid=params)

Thank you,


-----Original Message-----
From: Laurent Direr [mailto:[email protected]]
Sent: Thursday, September 11, 2014 11:15 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] modify gridsearch to scale cross-validation 
training/test dataset

Hello,

I think a pipeline does precisely what you are asking for:
http://scikit-learn.org/stable/modules/pipeline.html

If you include the scaler as a step in the pipeline it should behave the way 
you described in your first email.
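
As a rough check of that claim, the sketch below compares the manual per-fold scaling loop from the first email with cross-validating a pipeline on the same splits. It assumes a recent scikit-learn release; the synthetic data, C=1.0 and the 5-fold StratifiedKFold are illustrative choices, not something from this thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=5)

# Manual version: fit the scaler on the training fold only,
# then apply it to both the training and the test fold.
manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    scaler = StandardScaler().fit(X[train_idx])
    svm = LinearSVC(C=1.0, random_state=0, max_iter=10000)
    svm.fit(scaler.transform(X[train_idx]), y[train_idx])
    manual_scores.append(svm.score(scaler.transform(X[test_idx]), y[test_idx]))

# Pipeline version: the same steps, handled by cross_val_score.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('linear_svm', LinearSVC(C=1.0, random_state=0, max_iter=10000)),
])
pipe_scores = cross_val_score(pipe, X, y, cv=cv)

print(np.allclose(manual_scores, pipe_scores))
```

GridSearchCV does the same thing internally for each parameter combination, which is why grid_search.py does not need to be modified.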

Laurent

On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm not trying to scale the dataset at the very beginning. I would like to 
> scale while doing gridsearchCV.
>
> Thanks,
>
>
> -----Original Message-----
> From: Pagliari, Roberto [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:52 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> I'm not sure how to do it when using gridsearch. Can you provide an example?
>
> Thank you,
>
>
> -----Original Message-----
> From: Gael Varoquaux [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:50 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> Use a pipeline.
>
> G
>
> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>> Hello,
>> Gridsearch with CV is something like this at a high level:
>
>
>> for every combination of parameters:
>>     for every partition of training data
>>       split training into train_cv and test_cv
>>       train_classifier(train_cv).predict(test_cv)
>>       compute score
>>     average score
>>     if max so far, then update best params
>
>
>> I would like to do something like this:
>
>
>> for every combination of parameters:
>>     for every partition of training data
>>       split training into train_cv and test_cv
>>       scaler = StandardScaler()
>>       scaler.fit(train_cv)
>>       train_cv = scaler.transform(train_cv)
>>       test_cv = scaler.transform(test_cv)
>>       train_classifier(train_cv).predict(test_cv)
>>       compute score
>>     average score
>>     if max so far, then update best params
>
>
>> basically, I would like to scale training data and test data (using
>> training data params) every time a CV train/test is generated.
>> Can someone suggest the best way to modify grid_search.py to do this?
>
>
>> Thank you,
>
>
>
>> ------------------------------------------------------------------------------
>> Want excitement?
>> Manually upgrade your production database.
>> When you want reliability, choose Perforce
>> Perforce version control. Predictably reliable.
>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>


