Hi,
If you test this code you will see it raises an error ;).
The parameter names in param_grid have to use the step names you gave
to the Pipeline object, in the form <step name>__<parameter name>.
GridSearchCV performs the grid search on the Pipeline object, whose steps
are called 'scaler' and 'linear_svm', so it cannot resolve the
'LinearSVC__C' parameter: there is no step named 'LinearSVC'.
If you replace it with 'linear_svm__C' it works just fine.
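
For reference, here is a minimal sketch of the corrected version. It assumes
a recent scikit-learn release, where GridSearchCV lives in
sklearn.model_selection and the scaler is called StandardScaler; the
synthetic data from make_classification and the cv=3 setting are only there
to make the example self-contained, they are not from your code.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data, purely for illustration (an assumption, not the real dataset).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The step names ('scaler', 'linear_svm') are what the grid keys refer to.
estimators = [('scaler', StandardScaler()), ('linear_svm', LinearSVC())]
clf = Pipeline(estimators)

# Key format: <step name>__<parameter name>
params = {'linear_svm__C': [0.1, 10, 100]}

gs = GridSearchCV(clf, param_grid=params, cv=3)
# For every CV split the whole pipeline is cloned and fit on the training
# part only, so the scaler never sees the held-out part during fitting.
gs.fit(X, y)
print(gs.best_params_)

# The valid grid keys can be checked against the pipeline itself.
print('linear_svm__C' in clf.get_params())
print('LinearSVC__C' in clf.get_params())
```

Calling clf.get_params() is a quick way to see which names GridSearchCV
will accept before launching a long search.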
On 09/11/2014 06:44 PM, Pagliari, Roberto wrote:
> Hi,
> Yes, I think you are right.
>
> Is the code below how it should be done (scaling+linearsvc)?
>
> estimators = [('scaler', Scaler()), ('linear_svm', LinearSVC())]
> clf = Pipeline(estimators)
> params = dict(LinearSVC__C=[0.1, 10, 100])
> gs = GridSearchCV(clf, param_grid=params)
>
> Thank you,
>
>
> -----Original Message-----
> From: Laurent Direr [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 11:15 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> Hello,
>
> I think a pipeline does precisely what you are asking for:
> http://scikit-learn.org/stable/modules/pipeline.html
>
> If you include the scaler as a step in the pipeline it should behave the way
> you described in your first email.
>
> Laurent
>
> On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
>> I'm not trying to scale the dataset at the very beginning. I would like to
>> scale while doing gridsearchCV.
>>
>> Thanks,
>>
>>
>> -----Original Message-----
>> From: Pagliari, Roberto [mailto:[email protected]]
>> Sent: Thursday, September 11, 2014 10:52 AM
>> To: [email protected]
>> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
>> cross-validation training/test dataset
>>
>> I'm not sure how to do it when using gridsearch. Can you provide an example?
>>
>> Thank you,
>>
>>
>> -----Original Message-----
>> From: Gael Varoquaux [mailto:[email protected]]
>> Sent: Thursday, September 11, 2014 10:50 AM
>> To: [email protected]
>> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
>> cross-validation training/test dataset
>>
>> Use a pipeline.
>>
>> G
>>
>> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>>> Hello,
>>> Gridsearch with CV is something like this at a high level:
>>
>>> for every combination of parameters:
>>>     for every partition of training data
>>>         split training into train_cv and test_cv
>>>         train_classifier(train_cv).predict(test_cv)
>>>         compute score
>>>     average score
>>>     if max so far, then update best params
>>
>>> I would like to do something like this:
>>
>>> for every combination of parameters:
>>>     for every partition of training data
>>>         split training into train_cv and test_cv
>>>         scaler = StandardScaler()
>>>         scaler.fit(train_cv)
>>>         train_cv = scaler.transform(train_cv)
>>>         test_cv = scaler.transform(test_cv)
>>>         train_classifier(train_cv).predict(test_cv)
>>>         compute score
>>>     average score
>>>     if max so far, then update best params
>>
>>> basically, I would like to scale training data and test data (using
>>> training data params) every time a CV train/test is generated.
>>> Can someone suggest the best way to modify grid_search.py to do this?
>>
>>> Thank you,
>>
>>
>>> ---------------------------------------------------------------------
>>> -
>>> --------
>>> Want excitement?
>>> Manually upgrade your production database.
>>> When you want reliability, choose Perforce Perforce version control.
>>> Predictably reliable.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.
>>> clktrk
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general