Hello, I think a pipeline does precisely what you are asking for: http://scikit-learn.org/stable/modules/pipeline.html
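As a rough sketch of what that could look like (not taken from the thread): the step names "scaler" and "clf", the SVC classifier, the iris data and the C grid below are all assumptions chosen for illustration. The import path is the one used by recent scikit-learn releases; older releases exposed GridSearchCV under sklearn.grid_search.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example data only; any (X, y) pair would do.
X, y = load_iris(return_X_y=True)

# The scaler is a pipeline step, so for each CV split it is fit on the
# training fold and then applied to both the training and the test fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", SVC()),
])

# Parameters of a step are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
```

Because the grid search clones and refits the whole pipeline for every split, the scaling parameters are never computed from the held-out fold.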
If you include the scaler as a step in the pipeline, it should behave the way you described in your first email.

Laurent

On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm not trying to scale the dataset at the very beginning. I would like to
> scale while doing gridsearchCV.
>
> Thanks,
>
>
> -----Original Message-----
> From: Pagliari, Roberto [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:52 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> I'm not sure how to do it when using gridsearch. Can you provide an example?
>
> Thank you,
>
>
> -----Original Message-----
> From: Gael Varoquaux [mailto:[email protected]]
> Sent: Thursday, September 11, 2014 10:50 AM
> To: [email protected]
> Subject: Re: [Scikit-learn-general] modify gridsearch to scale
> cross-validation training/test dataset
>
> Use a pipeline.
>
> G
>
> On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
>> Hello,
>> Gridsearch with CV is something like this at a high level:
>
>
>> for every combination of parameters:
>>     for every partition of training data
>>         split training into train_cv and test_cv
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> I would like to do something like this:
>
>
>> for every combination of parameters:
>>     for every partition of training data
>>         split training into train_cv and test_cv
>>         scaler = StandardScaler()
>>         scaler.fit(train_cv)
>>         train_cv = scaler.transform(train_cv)
>>         test_cv = scaler.transform(test_cv)
>>         train_classifier(train_cv).predict(test_cv)
>>         compute score
>>     average score
>>     if max so far, then update best params
>
>
>> basically, I would like to scale training data and test data (using
>> training data params) every time a CV train/test is generated.
>> Can someone suggest the best way to modify grid_search.py to do this?
>> Thank you,

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
