Re: [Scikit-learn-general] Identical scores across repetitions of repeated CV ?? (figure included)

Joel Nothman Wed, 31 Jul 2013 21:27:34 -0700

I think all those results correspond to the RBF kernel. You have far too
few samples to learn an RBF model, so it has stored trivial coefficients
that are independent of C and gamma.
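
A minimal sketch of why a trivial model would give exactly the same score on
every repetition, assuming "trivial" here means the classifier predicts a
single class regardless of its input. With stratified splits every test set
has the same class counts, so a constant prediction yields the same
macro-averaged F1 each time. The labels, the helper names `macro_f1` and
`stratified_test_split`, and the per-class rounding are illustrative
assumptions: a simplified stand-in for StratifiedShuffleSplit, not its actual
implementation, and not the original poster's data.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_test_split(y, test_fraction, rng):
    """Pick test indices so each class contributes the same count every call."""
    test_idx = []
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
    return test_idx


# Hypothetical labels: 32 samples, 3 classes (assumed for illustration).
y = [0] * 16 + [1] * 8 + [2] * 8
labels = [0, 1, 2]
majority = Counter(y).most_common(1)[0][0]

rng = random.Random(0)
scores = []
for _ in range(100):
    test_idx = stratified_test_split(y, 0.25, rng)
    y_test = [y[i] for i in test_idx]
    y_pred = [majority] * len(y_test)  # degenerate model: ignores the inputs
    scores.append(macro_f1(y_test, y_pred, labels))

distinct_scores = set(scores)
```

For these assumed class sizes `distinct_scores` holds a single value, 2/9:
which samples land in the test set changes on every repetition, but the class
counts do not, and a constant prediction only depends on the counts.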



On Thu, Aug 1, 2013 at 1:56 PM, Josh Wasserstein <[email protected]> wrote:

> Hi,
>
> I am noticing that for some models in my grid search I get virtually the
> same exact results across 100 repetitions of CV. Is this normal? In case it
> matters, I am working with ~30 data points (I know, it's a small dataset)
> with ~5 dimensions.
>
> Below are the details of the configuration that I used for grid search:
>
> with K=4:
> sfs = StratifiedShuffleSplit(y,n_iter=100,test_size=1.0/K)
>
> I am working on a 3-labels classification problem with the following SVM
> kernels:
>
>   tuned_parameters = [
>       {'kernel': ['linear'],
>        'C': np.power(2, np.arange(-8., 8., step_size))},
>       {'kernel': ['rbf'],
>        'C': np.power(2, np.arange(-8., 8., step_size)),
>        'gamma': np.power(2, np.arange(-8., 8., step_size))},
>   ]
>
> #....#
> clf = GridSearchCV(SVC(C=1, cache_size=5000),
>  tuned_parameters,
>  scoring=f1_macro,
>  verbose=1, n_jobs=1, cv=sfs)
> clf.fit(X,y)
> #....#
>
> Below is a plot of the *cv_validation_scores* (mean, min, max,
> mean-std and mean+std) from *clf.cv_scores_*
>
> More specifically:
>
>   all_scores = [x.cv_validation_scores for x in clf.cv_scores_]
>   all_scores = np.vstack(all_scores).transpose()
>
>   # Load the scores into a pandas DataFrame and sort the columns (the models)
>   all_scores_df = pd.DataFrame(all_scores)
>   sorted_columns = all_scores_df.mean().order(ascending=False).index
>   sorted_scores = all_scores_df.reindex_axis(sorted_columns, axis=1)
>
>   # Plot envelope:
>   max_values  = sorted_scores.max().values
>   min_values  = sorted_scores.min().values
>   mean_values = sorted_scores.mean().values
>   std_values  = sorted_scores.std().values
>
>   fig = plt.figure()
>   fig.hold(True)
>   plt.plot(max_values, color='r')
>   plt.plot(min_values, color='r')
>   plt.plot(mean_values, color='b')
>
>   above = mean_values + std_values
>   above = np.minimum(above,max_values)
>   plt.plot(above, color='g', linestyle='--', linewidth=2.0)
>   below = mean_values - std_values
>   below = np.maximum(below,min_values)
>   plt.plot(below, color='g', linestyle='--', linewidth=2.0)
>
> [image: Inline image 1]
> And here is an example of one of those models:
>
> > clf.cv_scores_[8].cv_validation_scores
>
> array([ 0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
>         0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376])
>
> Thanks,
>
> Josh
>
>
> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases havent
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>

<<f1_macro_envelope.jpg>>

