I'm not sure what the issue with your custom CV is, but this seems like a
complicated way to implement it.
Try model_selection.LeaveOneGroupOut, which directly implements LOSO
(leave-one-subject-out).
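
A minimal sketch of what nested LOSO with LeaveOneGroupOut could look like. The data here is synthetic and the grid is deliberately small; `groups`, `outer_scores` and the array sizes are illustrative assumptions, not taken from the original post. It assumes one row per subject, so each subject is its own group.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 12 subjects, 20 features, 2 classes.
rng = np.random.RandomState(0)
n_subjects, n_features = 12, 20
X = rng.randn(n_subjects, n_features)
y = np.repeat([0, 1], n_subjects // 2)
X[y == 1] += 1.0
groups = np.arange(n_subjects)  # one row per subject, so one group per row

pipe = Pipeline([("scl", StandardScaler()), ("clf", SVC(kernel="rbf"))])
param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": [1e-3, 1e-2]}

outer = LeaveOneGroupOut()
inner = LeaveOneGroupOut()
outer_scores = []
for train_idx, test_idx in outer.split(X, y, groups):
    # Subset first, then let the inner splitter index into the subset.
    X_train, y_train, g_train = X[train_idx], y[train_idx], groups[train_idx]
    search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=inner)
    search.fit(X_train, y_train, groups=g_train)  # inner LOSO on training subjects
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

outer_scores = np.array(outer_scores)
print("mean outer accuracy:", outer_scores.mean())
```

Because the splitter generates indices relative to whatever array it is given, there is no need to build the inner index lists by hand.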
On 12/04/2016 03:12 PM, Ludovico Coletta wrote:
Dear scikit experts,
I'm struggling with the implementation of a nested cross validation.
My data: I have 26 subjects (13 per class) x 6670 features. I used a
feature reduction algorithm (you may have heard about Boruta) to
reduce the dimensionality of my data. Problems start now: I defined
LOSO as outer partitioning schema. Therefore, for each of the 26 cv
folds I used 24 subjects for feature reduction. This led to a
different number of features in each cv fold. Now, for each cv fold I
would like to use the same 24 subjects for hyperparameter optimization
(SVM with rbf kernel).
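
The per-fold procedure described above can also be expressed by putting the feature reduction step inside a Pipeline, so that it is refit on the training subjects of every fold. This is only a sketch under assumptions: the data is synthetic, and `SelectKBest` is used as a stand-in because Boruta is not part of scikit-learn. Unlike Boruta, it keeps a fixed number of features (`k`).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 12 subjects, 50 features,
# of which the first 5 carry class information.
rng = np.random.RandomState(0)
X = rng.randn(12, 50)
y = np.repeat([0, 1], 6)
X[y == 1, :5] += 2.0

pipe = Pipeline([
    ("scl", StandardScaler()),
    ("select", SelectKBest(f_classif, k=5)),  # stand-in for Boruta
    ("clf", SVC(kernel="rbf")),
])

# Each fold refits scaler, selector and classifier on the training subjects
# only, so the held-out subject never influences the feature selection.
fold_scores = cross_val_score(pipe, X, y, cv=LeaveOneOut(), scoring="accuracy")
print(fold_scores.mean())
```

With the selector inside the pipeline, the selected features may differ from fold to fold without any per-fold bookkeeping of the data arrays.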
This is what I did:
cv = list(LeaveOneOut(len(y)))  # in y I stored the labels

inner_train = [None] * len(y)
inner_test = [None] * len(y)

ii = 0
while ii < len(y):
    cv = list(LeaveOneOut(len(y)))
    a = cv[ii][0]
    a = a[:-1]
    inner_train[ii] = a

    b = cv[ii][0]
    b = np.array(b[((len(cv[0][0]))-1)])
    inner_test[ii] = b

    ii = ii + 1

custom_cv = zip(inner_train, inner_test)  # inner cv

pipe_logistic = Pipeline([('scl', StandardScaler()),
                          ('clf', SVC(kernel="rbf"))])

parameters = [{'clf__C': np.logspace(-2, 10, 13),
               'clf__gamma': np.logspace(-9, 3, 13)}]

scores = [None] * (len(y))

ii = 0
while ii < len(scores):
    a = data[ii][0]  # data for train
    b = data[ii][1]  # data for test
    c = np.concatenate((a, b))  # shape: number of subjects * number of features
    d = cv[ii][0]  # labels for train
    e = cv[ii][1]  # label for test
    f = np.concatenate((d, e))

    grid_search = GridSearchCV(estimator=pipe_logistic,
                               param_grid=parameters, verbose=1,
                               scoring='accuracy',
                               cv=zip(([custom_cv[ii][0]]), ([custom_cv[ii][1]])))

    scores[ii] = cross_validation.cross_val_score(
        grid_search, c, y[f], scoring='accuracy',
        cv=zip(([cv[ii][0]]), ([cv[ii][1]])))

    ii = ii + 1

However, I got the following error message: index 25 is out of bounds
for size 25.
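
One plausible reading of this error, not confirmed anywhere in the thread: the inner CV indices are taken from the full set of 26 subjects, but the grid search is fit on the 25-row outer training subset, where only positions 0..24 exist. The NumPy-only sketch below reproduces that kind of mismatch; `held_out`, `outer_train` and `inner_splits` are hypothetical names chosen for illustration.

```python
import numpy as np

n = 26
X = np.zeros((n, 3))                 # placeholder data, one row per subject
held_out = 0                         # hypothetical outer test subject
outer_train = np.delete(np.arange(n), held_out)  # global indices 1..25
X_train = X[outer_train]             # 25 rows, valid positions are 0..24

# Reusing the global indices on the subset fails, because index 25
# does not exist in a 25-row array.
try:
    X_train[outer_train]
except IndexError as err:
    message = str(err)
    print(message)

# Inner splits must be expressed in positions relative to the subset.
local = np.arange(len(outer_train))  # 0..24
inner_splits = [(np.delete(local, i), local[i:i + 1]) for i in local]
```

Splitters such as LeaveOneOut or LeaveOneGroupOut produce subset-relative positions automatically when called on the training subset.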
Would it be so bad if I skipped the nested LOSO and used the default
setting for hyperparameter optimization?
Any help would be really appreciated.
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn