I am comparing the results of sklearn cross-validation with those of my own
cross-validation.
I tested LinearSVC under the following conditions:
- Data scaling, per grid search
- Data scaling + 2-level quantization, per grid search
Specifically, I have done the following:
Sklearn GridSearchCV
- Create a pipeline with [StandardScaler, LinearSVC] if no binning is
used, or [StandardScaler, Binarizer, LinearSVC] if binning is used
- Invoke the sklearn grid search (C is the only parameter to optimize
over)
- When done with the grid search,
  o Scale the entire training set
  o Scale the test set (with the mean/std found on the training set)
  o Quantize, if quantization is used
  o Run LinearSVC with the best C value found
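The sklearn steps above can be sketched roughly as follows. This is only an illustration, not my actual script: the synthetic data, the C range in `C_GRID`, the 5-fold setting, the `Binarizer` threshold of 0.0 and the helper names are all assumptions, and it uses the `sklearn.model_selection` import path of recent scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, StandardScaler
from sklearn.svm import LinearSVC

# Assumed stand-ins for the real data and the real C range.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
C_GRID = [0.01, 0.1, 1.0, 10.0]


def build_pipeline(use_binning):
    """[StandardScaler, LinearSVC], with a Binarizer in between if binning is used."""
    steps = [("scale", StandardScaler())]
    if use_binning:
        steps.append(("bin", Binarizer(threshold=0.0)))  # threshold is an assumption
    steps.append(("svc", LinearSVC(max_iter=10000, random_state=0)))
    return Pipeline(steps)


def sklearn_search(use_binning):
    # Grid search over C only; the scaler is re-fit on each training fold
    # because it lives inside the pipeline.
    search = GridSearchCV(build_pipeline(use_binning), {"svc__C": C_GRID}, cv=5)
    search.fit(X_train, y_train)
    best_C = search.best_params_["svc__C"]

    # Final evaluation: scale with training-set statistics, optionally quantize,
    # then train LinearSVC with the best C and score on the test set.
    scaler = StandardScaler().fit(X_train)
    X_tr, X_te = scaler.transform(X_train), scaler.transform(X_test)
    if use_binning:
        binarizer = Binarizer(threshold=0.0)
        X_tr, X_te = binarizer.transform(X_tr), binarizer.transform(X_te)
    clf = LinearSVC(C=best_C, max_iter=10000, random_state=0).fit(X_tr, y_train)
    return best_C, clf.score(X_te, y_test)


sklearn_results = {flag: sklearn_search(flag) for flag in (False, True)}
for flag, (best_C, acc) in sklearn_results.items():
    print(f"binning={flag}: best C={best_C}, test accuracy={acc:.3f}")
```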
My own grid search
- Search over all possible values of C (same range as above)
- For each value of C, use StratifiedKFold with the random seed set to a
random number
  o Scale the cross-validation training set and the cross-validation test
  set with the mean and std of the cross-validation training set
  o If binning is used, apply binary binning (my own function) on top of
  StandardScaler
  o For each value of C, compute the average score over all partitions,
  where the score is defined as the number of correctly classified samples /
  total number of samples
- When done with the grid search,
  o Scale the entire training set
  o Scale the test set (with the mean/std found on the training set)
  o Quantize, if quantization is used
  o Run LinearSVC with the best C value found
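My own grid search, as described above, looks roughly like the sketch below. Again this is an illustration under assumptions: the data, `C_GRID`, the number of folds, `shuffle=True`, and the `binarize` helper (a hypothetical stand-in for my binning function, thresholding at 0.0) are not taken from my real code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumed stand-ins for the real data and the real C range.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
C_GRID = [0.01, 0.1, 1.0, 10.0]


def binarize(X, threshold=0.0):
    """Hypothetical 2-level quantizer: 1.0 where X > threshold, else 0.0."""
    return (X > threshold).astype(float)


def cv_accuracy(X, y, C, use_binning, seed, n_splits=5):
    """Mean fraction of correctly classified samples over stratified folds."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for tr, va in skf.split(X, y):
        # Scale both parts with the mean/std of the CV training part only.
        scaler = StandardScaler().fit(X[tr])
        X_tr, X_va = scaler.transform(X[tr]), scaler.transform(X[va])
        if use_binning:
            X_tr, X_va = binarize(X_tr), binarize(X_va)
        clf = LinearSVC(C=C, max_iter=10000, random_state=0).fit(X_tr, y[tr])
        scores.append(np.mean(clf.predict(X_va) == y[va]))
    return float(np.mean(scores))


def manual_search(use_binning, rng):
    # A fresh random seed per value of C, as in the description above.
    mean_scores = {
        C: cv_accuracy(X_train, y_train, C, use_binning,
                       seed=int(rng.integers(0, 2**31 - 1)))
        for C in C_GRID
    }
    best_C = max(mean_scores, key=mean_scores.get)

    # Final evaluation on the held-out test set with the best C.
    scaler = StandardScaler().fit(X_train)
    X_tr, X_te = scaler.transform(X_train), scaler.transform(X_test)
    if use_binning:
        X_tr, X_te = binarize(X_tr), binarize(X_te)
    clf = LinearSVC(C=best_C, max_iter=10000, random_state=0).fit(X_tr, y_train)
    return best_C, clf.score(X_te, y_test)


rng = np.random.default_rng(0)
manual_results = {flag: manual_search(flag, rng) for flag in (False, True)}
for flag, (best_C, acc) in manual_results.items():
    print(f"binning={flag}: best C={best_C}, test accuracy={acc:.3f}")
```

Note that in this sketch the fold assignment changes with every value of C, whereas GridSearchCV evaluates all candidate values on the same folds.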
For some reason, I'm getting different results. In particular, the sklearn
grid search performs better than my own grid search when quantization is not
used, and its results get worse with quantization. With my own grid search I'm
getting the opposite trend.
Is my understanding of the sklearn grid search wrong, or are there any issues
with it?
Thank you,
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general