Hi, I am noticing that for some models in my grid search I get exactly the same score across all 100 repetitions of CV. Is this normal? In case it matters, I am working with ~30 data points (I know, it's a small dataset) in ~5 dimensions.
Below are the details of the configuration that I used for the grid search.
With K=4, the CV splitter is:
sfs = StratifiedShuffleSplit(y,n_iter=100,test_size=1.0/K)
I am working on a 3-class classification problem with the following SVM kernels:
tuned_parameters = [
    {'kernel': ['linear'],
     'C': np.power(2, np.arange(-8., 8., step_size))},
    {'kernel': ['rbf'],
     'C': np.power(2, np.arange(-8., 8., step_size)),
     'gamma': np.power(2, np.arange(-8., 8., step_size))},
]
#....#
clf = GridSearchCV(SVC(C=1, cache_size=5000),
                   tuned_parameters,
                   scoring=f1_macro,
                   verbose=1, n_jobs=1, cv=sfs)
clf.fit(X,y)
#....#
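For readers on a newer scikit-learn release, the same setup might look roughly like the sketch below. It is not the original code: it assumes the `sklearn.model_selection` API (where `StratifiedShuffleSplit` takes `n_splits` and per-split scores live in `cv_results_`), uses synthetic data from `make_classification` in place of the real data set, picks `step_size = 2.0` arbitrarily because the post does not give it, and runs only 10 splits to stay quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

K = 4
n_splits = 10     # the post uses 100 repetitions; reduced here for speed
step_size = 2.0   # assumption: the post does not state the step size

# Synthetic stand-in for the real data: ~30 points, 5 features, 3 classes.
X, y = make_classification(n_samples=30, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

grid = np.power(2.0, np.arange(-8.0, 8.0, step_size))
tuned_parameters = [
    {"kernel": ["linear"], "C": grid},
    {"kernel": ["rbf"], "C": grid, "gamma": grid},
]

cv = StratifiedShuffleSplit(n_splits=n_splits, test_size=1.0 / K,
                            random_state=0)
clf = GridSearchCV(SVC(), tuned_parameters, scoring="f1_macro", cv=cv)
clf.fit(X, y)

# One row per split, one column per parameter combination.
all_scores = np.vstack([clf.cv_results_["split%d_test_score" % i]
                        for i in range(n_splits)])

# Flag the parameter combinations whose score never changes across splits.
constant = np.ptp(all_scores, axis=0) == 0
print(all_scores.shape, int(constant.sum()))
```

The `constant` mask gives a quick way to list which parameter combinations produce an identical score on every split, so they can be inspected one by one.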
Below is a plot of the *cv_validation_scores* (mean, min, max, mean-std and mean+std) from *clf.cv_scores_*.
More specifically:
all_scores = [x.cv_validation_scores for x in clf.cv_scores_]
all_scores = np.vstack(all_scores).transpose()
# Load the scores in a dataframe in pandas and sort the columns (the models)
all_scores_df = pd.DataFrame(all_scores)
sorted_columns = all_scores_df.mean().order(ascending=False).index
sorted_scores = all_scores_df.reindex_axis(sorted_columns, axis=1)
# Plot envelope:
max_values = sorted_scores.max().values
min_values = sorted_scores.min().values
mean_values = sorted_scores.mean().values
std_values = sorted_scores.std().values
fig = plt.figure()
fig.hold(True)
plt.plot(max_values, color='r')
plt.plot(min_values, color='r')
plt.plot(mean_values, color='b')
above = mean_values + std_values
above = np.minimum(above,max_values)
plt.plot(above, color='g', linestyle='--', linewidth=2.0)
below = mean_values - std_values
below = np.maximum(below,min_values)
plt.plot(below, color='g', linestyle='--', linewidth=2.0)
[image: Inline image 1]
And here is an example of one of those models:
> clf.cv_scores_[8].cv_validation_scores
array([ 0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376,
0.21505376, 0.21505376, 0.21505376, 0.21505376, 0.21505376])
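One thing that may be worth checking is whether such a model predicts the same class for every test point. The pure-Python sketch below illustrates the mechanism under stated assumptions: with stratified splits the class counts in each test set are fixed, so a constant prediction yields the same macro-F1 on every split. The class counts are made up (and larger than the ~30 points mentioned above); they were chosen only because a test set with 10 of 21 points in the predicted class gives 20/93, which rounds to the value shown. The helpers `macro_f1` and `stratified_test_split` are hypothetical stand-ins, not scikit-learn functions.

```python
import random

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no hits scores 0."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def stratified_test_split(y, test_fraction, rng):
    """Draw round(test_fraction * n_c) test indices from each class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    test = []
    for idx in by_class.values():
        test.extend(rng.sample(idx, round(test_fraction * len(idx))))
    return test

# Hypothetical labels: a quarter of each class gives test counts 10, 6, 5.
y = [0] * 40 + [1] * 24 + [2] * 20
rng = random.Random(0)

scores = []
for _ in range(100):
    test_idx = stratified_test_split(y, 0.25, rng)
    y_test = [y[i] for i in test_idx]
    y_pred = [0] * len(y_test)  # degenerate model: always predicts class 0
    scores.append(macro_f1(y_test, y_pred, labels=[0, 1, 2]))

print(len(set(scores)), round(scores[0], 8))
```

If the real model behaves this way, comparing `clf.predict` output on a few test splits against the labels would show it directly; if the predictions vary, the cause lies elsewhere.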
Thanks,
Josh
<<f1_macro_envelope.jpg>>
