Re: [Scikit-learn-general] getting different results with sklearn gridsearchCV

Andy Fri, 12 Sep 2014 10:19:16 -0700

As Laurent said, using StandardScaler again is not necessary.

If you don't provide code for your custom grid-search, it is hard to say what the difference might be ;) Are the same parameters selected, and are the scores during the grid-search the same?
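
To illustrate why the extra StandardScaler is not needed, here is a minimal sketch. It assumes a recent scikit-learn (sklearn.model_selection rather than the grid_search module used in this thread), synthetic data from make_classification, and a made-up grid of C values; none of these come from the original code. After fitting, the refitted pipeline already contains a scaler fitted on the full training set, so predict() should receive the raw test data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data (an assumption), shifted and rescaled so that scaling matters.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X = X * 50.0 + 100.0
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("linear_svm", LinearSVC(max_iter=10000))])
clf = GridSearchCV(pipe, param_grid={"linear_svm__C": [0.01, 0.1, 1.0]}, cv=5)
clf.fit(X_train, y_train)  # raw data: the pipeline scales inside each fold

# With the default refit=True, the best pipeline is refitted on all of
# X_train, so its scaler holds the training-set statistics.
inner_scaler = clf.best_estimator_.named_steps["scaler"]

# Intended usage: pass the raw test set, the pipeline scales it.
pred_raw = clf.predict(X_test)

# What the code in this thread does: scale by hand, then predict. The
# pipeline scales a second time, so the SVM sees doubly scaled features.
X_test_scaled = StandardScaler().fit(X_train).transform(X_test)
seen_by_svm = inner_scaler.transform(X_test_scaled)
pred_double = clf.predict(X_test_scaled)
```

Comparing pred_raw and pred_double against y_test on your own data shows how much the double scaling costs.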





On 09/12/2014 06:31 PM, Pagliari, Roberto wrote:


Hi Andy,

I don’t think the accuracy is an issue. I explicitly provided a score function and the problem persists.

With my own grid search I don’t use a pipeline, just StratifiedKFold and an average for every combination of the parameters.


This is an example with scaling+svm using sklearn pipeline:

    estimators = [('scaler', StandardScaler()),
                  ('linear_svm', svm.LinearSVC(class_weight='auto'))]
    clf_pipeline = Pipeline(estimators)
    params = dict(linear_svm__C=<some array of values>)
    clf = grid_search.GridSearchCV(clf_pipeline, param_grid=params)
    # X_train is not scaled here, since I assume the grid search scales it while searching
    clf.fit(X_train, y_train)


After this I make the predictions

    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    y_predictions = clf.predict(X_test)

With binning, I would just add the Binarizer to the pipeline, and apply it again right before computing y_predictions.
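
For reference, a minimal sketch of the binning variant, assuming a recent scikit-learn, synthetic data, a threshold of 0.0, and an illustrative grid of C values (all assumptions, not taken from the code above). Once the Binarizer is a pipeline step, the fitted search applies scaling and binarization itself at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Binarizer, StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real data (an assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

binned_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("binarizer", Binarizer(threshold=0.0)),  # 2-level quantization after scaling
    ("linear_svm", LinearSVC(max_iter=10000)),
])
search = GridSearchCV(binned_pipeline,
                      param_grid={"linear_svm__C": [0.01, 0.1, 1.0]}, cv=5)
search.fit(X, y)

# The refitted best estimator keeps all three steps, so predict() takes raw
# features and runs scaler -> binarizer -> SVM internally.
step_names = [name for name, _ in search.best_estimator_.steps]
predictions = search.predict(X)
```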


Is there anything wrong with what I’m doing?

Thank you

*From:*Andy [mailto:t3k...@gmail.com]
*Sent:* Friday, September 12, 2014 12:12 PM
*To:* scikit-learn-general@lists.sourceforge.net

*Subject:* Re: [Scikit-learn-general] getting different results with sklearn gridsearchCV


Hi Roberto.

GridSearchCV uses accuracy for selection if no other method is specified, so there should be no difference.
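
A quick way to check this, sketched on synthetic data with a recent scikit-learn (both are assumptions): the classifier's default score() is plain accuracy, i.e. correctly classified samples divided by total samples. As far as I know, class_weight only influences fitting, not how score() counts.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = LinearSVC(class_weight="balanced", max_iter=10000).fit(X, y)
pred = model.predict(X)

# Hand-rolled score: correctly classified samples / total samples.
manual_accuracy = np.sum(pred == y) / len(y)
# Default estimator score, which GridSearchCV falls back to when no scoring
# is given.
default_score = model.score(X, y)
# The explicit metric.
metric_score = accuracy_score(y, pred)
```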


Could you provide code?

Do you also create a pipeline when using your own grid search? I would imagine there is some difference in how you do the fitting in the pipeline.


Cheers,
Andy


On 09/12/2014 05:09 PM, Pagliari, Roberto wrote:

    Regarding my previous question, I suspect the difference lies in
    the scoring function.

    What is the default scoring function used by gridsearch?

    In my own implementation I am using

    number of correctly classified samples (no weighting) / total
    number of samples

    sklearn gridsearch function must be using something else, or maybe
    the same, but with weighting?

    Thanks,

    *From:* Pagliari, Roberto
    *Sent:* Friday, September 12, 2014 10:21 AM
    *To:* 'scikit-learn-general@lists.sourceforge.net
    <mailto:scikit-learn-general@lists.sourceforge.net>'
    *Subject:* getting different results with sklearn gridsearchCV

    I am comparing the results of sklearn cross-validation and my own
    cross validation.

    I tested linearSVC under the following conditions:

    - Data scaling per grid search

    - Data scaling + 2-level quantization, per grid search

    Specifically, I have done the following:

    Sklearn GridSearchCV

    - Create a pipeline with [StandardScaler, LinearSVC] if no binning
      is used, or [StandardScaler, Binarizer, LinearSVC] if binning is
      used

    - Invoke sklearn grid search (only C is provided as a parameter to
      optimize over)

    - When done with the grid search:

      o Scale the entire training set

      o Scale the test set (with mean/std found on the training set)

      o Quantize, if quantization is used

      o Run LinearSVC with the best C value found

    My own grid search

    - Search over all possible values of C (same range as above)

    - For each value of C, use StratifiedKFold with random_seed set to
      a random number:

      o Scale the train cross-validation dataset, and the test
        cross-validation dataset with the train CV mean and std

      o If binning is used, apply binary binning (my own function) on
        top of StandardScaler

      o For each value of C, compute the average score over all
        partitions, where the score is defined as number of correctly
        classified samples / total number of samples

    - When done with the grid search:

      o Scale the entire training set

      o Scale the test set (with mean/std found on the training set)

      o Quantize, if quantization is used

      o Run LinearSVC with the best C value found

    For some reason, I’m getting different results. In particular,
    sklearn gridsearch performs better than my own gridsearch when not
    using quantization, and it gets worse with quantization. With my
    own gridsearch I’m getting the opposite trend.

    Is my understanding of sklearn gridsearch wrong, or are there any
    issues with it?

    Thank you,
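
One way to compare the two procedures described above is to give both exactly the same folds. The sketch below is only an illustration under stated assumptions: a recent scikit-learn, synthetic data, a made-up grid of C values, no binning, and a hypothetical helper make_svm. With shared folds and a fixed random_state, the hand-written loop and GridSearchCV should produce the same mean scores, so any remaining gap points at something else, for example folds that differ per value of C or data scaled twice before prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data and C grid are assumptions for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
C_values = [0.01, 0.1, 1.0]

# Materialize the folds once so both searches see identical splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = list(cv.split(X, y))

def make_svm(C):
    # Hypothetical helper: fixed random_state keeps the solver repeatable.
    return LinearSVC(C=C, random_state=0, max_iter=10000)

# Manual grid search: scale inside each fold, score = correct / total.
manual_scores = []
for C in C_values:
    fold_scores = []
    for train_idx, val_idx in folds:
        scaler = StandardScaler().fit(X[train_idx])
        model = make_svm(C).fit(scaler.transform(X[train_idx]), y[train_idx])
        pred = model.predict(scaler.transform(X[val_idx]))
        fold_scores.append(np.mean(pred == y[val_idx]))
    manual_scores.append(np.mean(fold_scores))

# The same search through a pipeline and GridSearchCV, on the same folds.
pipe = Pipeline([("scaler", StandardScaler()), ("linear_svm", make_svm(1.0))])
search = GridSearchCV(pipe, {"linear_svm__C": C_values}, cv=folds)
search.fit(X, y)
grid_scores = search.cv_results_["mean_test_score"]
```

Printing manual_scores next to grid_scores for each C makes it easy to see whether the two searches agree before looking at the final test-set step.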




    
------------------------------------------------------------------------------

    Want excitement?

    Manually upgrade your production database.

    When you want reliability, choose Perforce

    Perforce version control. Predictably reliable.

    http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk




    _______________________________________________

    Scikit-learn-general mailing list

    Scikit-learn-general@lists.sourceforge.net  
<mailto:Scikit-learn-general@lists.sourceforge.net>

    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



