[Scikit-learn-general] Identical scores across repetitions of repeated CV ?? (figure included)

Josh Wasserstein Wed, 31 Jul 2013 20:58:14 -0700

Hi,

I am noticing that some models in my grid search give virtually identical
scores across 100 repetitions of CV. Is this normal? In case it matters, I
am working with ~30 data points (I know, it's a small dataset) in ~5
dimensions.


Below are the details of the configuration that I used for grid search:

with K=4:
sfs = StratifiedShuffleSplit(y,n_iter=100,test_size=1.0/K)

I am working on a 3-label classification problem with the following SVM
kernels:

  tuned_parameters = [
      {'kernel': ['linear'],
       'C': np.power(2, np.arange(-8., 8., step_size))},
      {'kernel': ['rbf'],
       'C': np.power(2, np.arange(-8., 8., step_size)),
       'gamma': np.power(2, np.arange(-8., 8., step_size))},
  ]

#....#
clf = GridSearchCV(SVC(C=1, cache_size=5000),
 tuned_parameters,
 scoring=f1_macro,
 verbose=1, n_jobs=1, cv=sfs)
clf.fit(X,y)
#....#
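
To give a sense of the size of this search, here is a small sketch that counts the candidate models and fits implied by the grid above. The value of step_size is not given in this post, so the 1.0 used here is purely an assumption, and n_grid_values is a hypothetical helper that mirrors the length of np.arange(start, stop, step).

```python
import math


def n_grid_values(start, stop, step):
    """Number of values np.arange(start, stop, step) would produce."""
    return max(0, math.ceil((stop - start) / step))


step_size = 1.0  # assumption: the actual value is not stated in the post
n_iter = 100     # number of shuffle-split repetitions, from the post

n_values = n_grid_values(-8.0, 8.0, step_size)
n_linear = n_values            # linear kernel: one axis (C)
n_rbf = n_values * n_values    # rbf kernel: C x gamma
n_models = n_linear + n_rbf
n_fits = n_models * n_iter

print(n_models, n_fits)
```

With a smaller step_size the rbf part of the grid grows quadratically, so the number of models in the plot changes accordingly.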

Below is a plot of the *cv_validation_scores* (mean, min, max,
mean-std and mean+std) from *clf.cv_scores_*

More specifically:

  all_scores = [x.cv_validation_scores for x in clf.cv_scores_]
  all_scores = np.vstack(all_scores).transpose()

  # Load the scores in a pandas dataframe and sort the columns (the models)
  all_scores_df = pd.DataFrame(all_scores)
  sorted_columns = all_scores_df.mean().order(ascending=False).index
  sorted_scores = all_scores_df.reindex_axis(sorted_columns, axis=1)

  # Plot envelope:
  max_values  = sorted_scores.max().values
  min_values  = sorted_scores.min().values
  mean_values = sorted_scores.mean().values
  std_values  = sorted_scores.std().values

  fig = plt.figure()
  fig.hold(True)
  plt.plot(max_values, color='r')
  plt.plot(min_values, color='r')
  plt.plot(mean_values, color='b')

  above = mean_values + std_values
  above = np.minimum(above,max_values)
  plt.plot(above, color='g', linestyle='--', linewidth=2.0)
  below = mean_values - std_values
  below = np.maximum(below,min_values)
  plt.plot(below, color='g', linestyle='--', linewidth=2.0)

[image: Inline image 1]
And here is an example of one of those models:

> clf.cv_scores_[8].cv_validation_scores

array([ 0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376,
        0.21505376,  0.21505376,  0.21505376,  0.21505376,  0.21505376])
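
One situation that would produce exactly repeated scores like these is a model that predicts the same label for every test sample: with stratified splits the class counts in each test set are fixed, so macro-F1 cannot vary. Whether that is what happens here is not established by this post. The sketch below uses only the standard library, and the label counts, per-class test sizes, and helper names (macro_f1, stratified_test_split) are all hypothetical.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no TP/FP/FN scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def stratified_test_split(y, per_class_test, rng):
    """Draw test indices with a fixed number of samples from each class."""
    test = []
    for label, k in per_class_test.items():
        idx = [i for i, t in enumerate(y) if t == label]
        test.extend(rng.sample(idx, k))
    return test


# Hypothetical data: 30 samples, 3 classes (not the data from the post).
y = [0] * 14 + [1] * 8 + [2] * 8
per_class_test = {0: 4, 1: 2, 2: 2}  # assumed stratified test composition
rng = random.Random(0)
majority = Counter(y).most_common(1)[0][0]

scores = []
for _ in range(100):
    test_idx = stratified_test_split(y, per_class_test, rng)
    y_true = [y[i] for i in test_idx]
    y_pred = [majority] * len(y_true)  # constant prediction
    scores.append(macro_f1(y_true, y_pred, labels=[0, 1, 2]))

print(len(set(scores)), scores[0])
```

A quick way to check the real models would be to look at the predictions of one of the affected parameter settings on a few test splits and see whether they collapse to a single class.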

Thanks,

Josh

<<f1_macro_envelope.jpg>>
