Hi,
Yes, I think you are right.
Is the code below how it should be done (scaling + LinearSVC)?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.grid_search import GridSearchCV
estimators = [('scaler', StandardScaler()), ('linear_svm', LinearSVC())]
clf = Pipeline(estimators)
params = dict(linear_svm__C=[0.1, 10, 100])  # prefix is the step name, not the class name
gs = GridSearchCV(clf, param_grid=params)
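And then I would just call the following, where X_train / y_train are placeholders for my training data:

gs.fit(X_train, y_train)
print(gs.best_params_)

Is it correct that the scaler is then re-fit on the training part of every CV split?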
Thank you,
-----Original Message-----
From: Laurent Direr [mailto:[email protected]]
Sent: Thursday, September 11, 2014 11:15 AM
To: [email protected]
Subject: Re: [Scikit-learn-general] modify gridsearch to scale cross-validation
training/test dataset
Hello,
I think a pipeline does precisely what you are asking for:
http://scikit-learn.org/stable/modules/pipeline.html
If you include the scaler as a step in the pipeline it should behave the way
you described in your first email.
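For instance, something along these lines (just a sketch with toy data; plug in your own estimator and data):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
import numpy as np

# toy data, only for illustration
X_train, y_train = np.random.randn(20, 3), np.array([0, 1] * 10)
X_test = np.random.randn(5, 3)

pipe = Pipeline([('scaler', StandardScaler()), ('svm', LinearSVC())])
pipe.fit(X_train, y_train)      # the scaler is fit on X_train only
print(pipe.predict(X_test))     # X_test is scaled with X_train's mean/std

When such a pipeline is used inside GridSearchCV, the scaler gets re-fit on the training part of every CV split, so the test fold never leaks into the scaling.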
Laurent
On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm not trying to scale the dataset at the very beginning. I would like to
> scale while doing GridSearchCV, on each cross-validation split.
>
> Thanks,
>
>
> -----Original Message-----
> From: Pagliari, Roberto [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:52 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> I'm not sure how to do it when using GridSearchCV. Can you provide an example?
>
> Thank you,
>
>
> -----Original Message-----
> From: Gael Varoquaux [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:50 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> Use a pipeline.
>
> G
>
> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>> Hello,
>> Grid search with CV is something like this at a high level:
>
>
>> for every combination of parameters:
>>     for every partition of the training data:
>>         split training into train_cv and test_cv
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> I would like to do something like this:
>
>
>> for every combination of parameters:
>>     for every partition of the training data:
>>         split training into train_cv and test_cv
>>         scaler = StandardScaler()
>>         scaler.fit(train_cv)
>>         train_cv = scaler.transform(train_cv)
>>         test_cv = scaler.transform(test_cv)
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> Basically, I would like to scale the training and test data (using parameters
>> fitted on the training data) every time a CV train/test split is generated.
>> Can someone suggest the best way to modify grid_search.py to do this?
>
>
>> Thank you,
>
>
>