Re: [scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit

Andy Mon, 05 Dec 2016 05:56:29 -0800

I'm not sure what the issue with your custom CV is, but this seems like a complicated way to implement this.

Try model_selection.LeaveOneGroupOut, which directly implements LOSO.
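
A minimal sketch of what nested LOSO with LeaveOneGroupOut could look like, assuming one sample per subject. The data here is synthetic, SelectKBest is only a stand-in for Boruta (which is not part of scikit-learn), and the parameter grid is shortened to keep the run quick; all of these are assumptions for illustration, not the original setup. Because the feature selection step sits inside the Pipeline, it is refit on each training split, so the number of selected features never has to be tracked by hand. The outer loop is written out explicitly so that each training set's group labels can be handed to the inner search.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 26 subjects, 13 per class,
# one sample per subject, and only 20 features to keep this fast.
rng = np.random.RandomState(0)
n_subjects = 26
X = rng.randn(n_subjects, 20)
y = np.array([0, 1] * 13)
groups = np.arange(n_subjects)  # one group id per subject

# SelectKBest is a hypothetical substitute for Boruta; any transformer
# placed in the pipeline is refit on every training split.
pipe = Pipeline([
    ("scl", StandardScaler()),
    ("sel", SelectKBest(f_classif, k=10)),
    ("clf", SVC(kernel="rbf")),
])
# Shortened grid (assumption); widen it for real use.
param_grid = {
    "clf__C": np.logspace(-1, 1, 2),
    "clf__gamma": np.logspace(-2, 0, 2),
}

outer_cv = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in outer_cv.split(X, y, groups):
    # Inner LOSO runs only on the 25 training subjects of this outer fold.
    search = GridSearchCV(pipe, param_grid, scoring="accuracy",
                          cv=LeaveOneGroupOut())
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Score the refit best model on the single held-out subject.
    scores.append(search.score(X[test_idx], y[test_idx]))

scores = np.array(scores)
print("Mean outer accuracy:", scores.mean())
```

With one sample per subject, each outer score is either 0 or 1, so only the mean across the 26 folds is meaningful.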


On 12/04/2016 03:12 PM, Ludovico Coletta wrote:

Dear scikit experts,

I'm struggling with the implementation of a nested cross validation.
My data: I have 26 subjects (13 per class) x 6670 features. I used a feature reduction algorithm (you may have heard about Boruta) to reduce the dimensionality of my data. Problems start now: I defined LOSO as the outer partitioning schema. Therefore, for each of the 26 cv folds I used 24 subjects for feature reduction. This led to a different number of features in each cv fold. Now, for each cv fold I would like to use the same 24 subjects for hyperparameter optimization (SVM with rbf kernel).
This is what I did:

cv = list(LeaveOneOut(len(y))) # in y I stored the labels

inner_train = [None] * len(y)
inner_test = [None] * len(y)

ii = 0
while ii < len(y):
    cv = list(LeaveOneOut(len(y)))
    a = cv[ii][0]
    a = a[:-1]
    inner_train[ii] = a

    b = cv[ii][0]
    b = np.array(b[((len(cv[0][0]))-1)])
    inner_test[ii] = b

    ii = ii + 1

custom_cv = zip(inner_train,inner_test) # inner cv

pipe_logistic = Pipeline([('scl', StandardScaler()),('clf',SVC(kernel="rbf"))])

parameters = [{'clf__C': np.logspace(-2, 10, 13),'clf__gamma':np.logspace(-9, 3, 13)}]

scores = [None] * (len(y))

ii = 0
while ii < len(scores):
    a = data[ii][0] # data for train
    b = data[ii][1] # data for test
    c = np.concatenate((a,b)) # shape: number of subjects * number of features
    d = cv[ii][0] # labels for train
    e = cv[ii][1] # label for test
    f = np.concatenate((d,e))

    grid_search = GridSearchCV(estimator=pipe_logistic,param_grid=parameters, verbose=1, scoring='accuracy', cv=zip(([custom_cv[ii][0]]), ([custom_cv[ii][1]])))

    scores[ii] = cross_validation.cross_val_score(grid_search, c,y[f], scoring='accuracy', cv = zip(([cv[ii][0]]), ([cv[ii][1]])))

    ii = ii + 1

However, I got the following error message: index 25 is out of bounds for size 25
Would it be so bad if I do not perform a nested LOSO but use the default setting for hyperparameter optimization?
Any help would be really appreciated.



_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
