Hi Roberto.
GridSearchCV uses the estimator's default score method if no other
scoring is specified, which for classifiers is plain accuracy, so there
should be no difference.
Could you provide code?
Do you also create a pipeline when using your own grid search? I would
imagine the difference lies in how the fitting is done inside the pipeline.
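For reference, a minimal sketch of what I mean (toy data and a made-up
C range; the import follows the 0.15-era layout, newer versions use
sklearn.model_selection). Making the scoring explicit rules the metric
out as the source of the discrepancy:

from sklearn.datasets import make_classification
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy data standing in for your training set.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

pipe = Pipeline([('scale', StandardScaler()), ('svc', LinearSVC())])
# 'accuracy' is what classifiers score with by default anyway; naming it
# explicitly removes any doubt about which metric drives the selection.
grid = GridSearchCV(pipe, {'svc__C': [0.01, 0.1, 1, 10, 100]},
                    scoring='accuracy', cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)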
Cheers,
Andy
On 09/12/2014 05:09 PM, Pagliari, Roberto wrote:
Regarding my previous question, I suspect the difference lies in the
scoring function.
What is the default scoring function used by GridSearchCV?
In my own implementation I am using
number of correctly classified samples (no weighting) / total number
of samples
The sklearn grid search must be using something else, or maybe the
same but with weighting?
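To make that concrete, a tiny check with toy labels (assuming
sklearn.metrics.accuracy_score is the plain accuracy I have in mind):

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# My score: number of correctly classified samples / total number of samples.
manual = np.sum(y_pred == y_true) / float(len(y_true))
# If sklearn's grid search also uses plain accuracy, these must agree.
assert manual == accuracy_score(y_true, y_pred)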
Thanks,
*From:* Pagliari, Roberto
*Sent:* Friday, September 12, 2014 10:21 AM
*To:* 'scikit-learn-general@lists.sourceforge.net'
*Subject:* getting different results with sklearn gridsearchCV
I am comparing the results of sklearn's cross-validation with my own
cross-validation. I tested LinearSVC under the following conditions:
- data scaling, per grid search
- data scaling + 2-level quantization, per grid search
Specifically, I have done the following:

sklearn GridSearchCV (see the sketch after this list):
- Create a pipeline with [StandardScaler, LinearSVC] if no binning is
used, or [StandardScaler, Binarizer, LinearSVC] if binning is used
- Invoke sklearn's grid search (only C is provided as a parameter to
optimize over)
- When done with the grid search:
  o Scale the entire training set
  o Scale the test set (with the mean/std found on the training set)
  o Quantize, if quantization is used
  o Run LinearSVC with the best C value found
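In code, roughly this (a sketch, not my actual script: toy data, a
placeholder C range, and 0.15-era imports; newer versions use
sklearn.model_selection):

from sklearn.cross_validation import train_test_split
from sklearn.datasets import make_classification
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, StandardScaler
from sklearn.svm import LinearSVC

# Toy data standing in for my training/test split.
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Binning variant; drop the 'bin' step for the scaling-only variant.
pipe = Pipeline([('scale', StandardScaler()),
                 ('bin', Binarizer()),  # threshold 0.0 = above/below the mean after scaling
                 ('svc', LinearSVC())])
grid = GridSearchCV(pipe, {'svc__C': [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X_train, y_train)

# With refit=True (the default) the whole pipeline is refit on the full
# training set using the best C, so the final steps collapse to:
print(grid.score(X_test, y_test))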
My own grid search (see the sketch after this list):
- Search over all possible values of C (same range as above)
- For each value of C, use StratifiedKFold with random_state set to a
random number:
  o Scale the train cross-validation dataset, and the test
cross-validation dataset, with the train CV mean and std
  o If binning is used, apply binary binning (my own function) on top of
StandardScaler
  o Compute the average score over all partitions, where the score is
defined as number of correctly classified samples / total number of
samples
- When done with the grid search:
  o Scale the entire training set
  o Scale the test set (with the mean/std found on the training set)
  o Quantize, if quantization is used
  o Run LinearSVC with the best C value found
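A condensed sketch of this procedure (toy data, placeholder C range;
StratifiedKFold's shuffle/random_state follow the 0.15 API, and a fixed
random_state stands in for my random seed):

import numpy as np
from sklearn.cross_validation import StratifiedKFold
from sklearn.datasets import make_classification
from sklearn.preprocessing import Binarizer, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)  # toy data
use_binning = True

best_C, best_score = None, -np.inf
for C in [0.01, 0.1, 1, 10, 100]:
    fold_scores = []
    for train_idx, test_idx in StratifiedKFold(y, n_folds=5, shuffle=True,
                                               random_state=0):
        # The scaler is fit on the CV training split only.
        scaler = StandardScaler().fit(X[train_idx])
        X_tr = scaler.transform(X[train_idx])
        X_te = scaler.transform(X[test_idx])
        if use_binning:
            # Stand-in for my own binary binning function.
            X_tr, X_te = Binarizer().transform(X_tr), Binarizer().transform(X_te)
        clf = LinearSVC(C=C).fit(X_tr, y[train_idx])
        # Unweighted accuracy: correctly classified / total.
        fold_scores.append(np.mean(clf.predict(X_te) == y[test_idx]))
    if np.mean(fold_scores) > best_score:
        best_C, best_score = C, np.mean(fold_scores)
print(best_C, best_score)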
For some reason, I'm getting different results. In particular, the
sklearn grid search performs better than my own when no quantization is
used, and worse when quantization is used; my own grid search shows the
opposite trend.
Is my understanding of sklearn gridsearch wrong, or are there any
issues with it?
Thank you,