Unfortunately, it did not work. I think I am doing something wrong when passing the nested cv, but I do not understand where. If I omit the cv argument in the grid search, it runs smoothly. I would like to have LeaveOneOut in both the outer and inner cv; how would you implement such a thing?
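[Editor's note] A minimal sketch of the nested LeaveOneOut setup in the new `sklearn.model_selection` API, where the inner splitter is simply handed to GridSearchCV and re-splits each outer training set automatically, so no manual index bookkeeping is needed. The data here is a tiny synthetic stand-in for the real 26 x 6670 matrix, and the grid is deliberately much smaller than the one in the thread, just to keep the sketch cheap to run:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 26 subjects (13 per class), 10 features
rng = np.random.RandomState(0)
X = rng.randn(26, 10)
y = np.repeat([0, 1], 13)

pipe = Pipeline([('scl', StandardScaler()), ('clf', SVC(kernel='rbf'))])

# Reduced grid for illustration; the thread uses 13 x 13 log-spaced values
param_grid = {'clf__C': [1.0, 10.0], 'clf__gamma': [0.01, 0.1]}

# Inner loop: LeaveOneOut over whatever training set GridSearchCV receives
grid = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=LeaveOneOut())

# Outer loop: LeaveOneOut over the full data set; grid search is refit per fold
scores = cross_val_score(grid, X, y, scoring='accuracy', cv=LeaveOneOut())
print(scores.mean())
```

Note that `LeaveOneOut` from `model_selection` takes no constructor argument (unlike the old `cross_validation.LeaveOneOut(n)`), which is what makes this nesting work without precomputing index lists. With the full 13 x 13 grid this is on the order of 100,000 SVC fits, so expect it to be slow on real data.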
Best,
Ludovico

________________________________
From: scikit-learn <scikit-learn-bounces+ludo25_90=hotmail....@python.org> on behalf of scikit-learn-requ...@python.org <scikit-learn-requ...@python.org>
Sent: Sunday, 4 December 2016, 22:27
To: scikit-learn@python.org
Subject: scikit-learn Digest, Vol 9, Issue 13

Today's Topics:

   1. Nested Leave One Subject Out (LOSO) cross validation with scikit (Ludovico Coletta)
   2. Re: Adding samplers for intersection/Jensen-Shannon kernels (a...@mccme.ru)
   3. Re: Nested Leave One Subject Out (LOSO) cross validation with scikit (Raghav R V)

----------------------------------------------------------------------

Message: 1
Date: Sun, 4 Dec 2016 20:12:29 +0000
From: Ludovico Coletta <ludo25...@hotmail.com>
To: "scikit-learn@python.org" <scikit-learn@python.org>
Subject: [scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit

Dear scikit experts,

I'm struggling with the implementation of a nested cross-validation.

My data: I have 26 subjects (13 per class) x 6670 features.
I used a feature reduction algorithm (you may have heard of Boruta) to reduce the dimensionality of my data. The problems start now: I defined LOSO as the outer partitioning scheme. Therefore, for each of the 26 cv folds I used 24 subjects for feature reduction. This led to a different number of features in each cv fold. Now, for each cv fold, I would like to use the same 24 subjects for hyperparameter optimization (SVM with rbf kernel).

This is what I did:

cv = list(LeaveOneOut(len(y)))  # y holds the labels

inner_train = [None] * len(y)
inner_test = [None] * len(y)

ii = 0
while ii < len(y):
    cv = list(LeaveOneOut(len(y)))
    a = cv[ii][0]
    a = a[:-1]
    inner_train[ii] = a

    b = cv[ii][0]
    b = np.array(b[((len(cv[0][0])) - 1)])
    inner_test[ii] = b

    ii = ii + 1

custom_cv = zip(inner_train, inner_test)  # inner cv

pipe_logistic = Pipeline([('scl', StandardScaler()), ('clf', SVC(kernel="rbf"))])

parameters = [{'clf__C': np.logspace(-2, 10, 13), 'clf__gamma': np.logspace(-9, 3, 13)}]

scores = [None] * (len(y))

ii = 0
while ii < len(scores):
    a = data[ii][0]  # data for train
    b = data[ii][1]  # data for test
    c = np.concatenate((a, b))  # shape: number of subjects * number of features
    d = cv[ii][0]  # labels for train
    e = cv[ii][1]  # labels for test
    f = np.concatenate((d, e))

    grid_search = GridSearchCV(estimator=pipe_logistic, param_grid=parameters, verbose=1, scoring='accuracy', cv=zip(([custom_cv[ii][0]]), ([custom_cv[ii][1]])))

    scores[ii] = cross_validation.cross_val_score(grid_search, c, y[f], scoring='accuracy', cv=zip(([cv[ii][0]]), ([cv[ii][1]])))

    ii = ii + 1

However, I got the following error message: index 25 is out of bounds for size 25.

Would it be so bad if I did not perform a nested LOSO but used the default setting for hyperparameter optimization?
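[Editor's note] Since the feature reduction is refit on the training subjects of every outer fold, it can live inside the Pipeline itself; GridSearchCV and cross_val_score then refit it per fold automatically, which removes the need for the manual index lists above. A minimal sketch, using SelectKBest as a stand-in for Boruta (which is not part of scikit-learn) and synthetic data:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 26 x 6670 data set
rng = np.random.RandomState(0)
X = rng.randn(26, 50)
y = np.repeat([0, 1], 13)

# The selector is refit on the training subjects of each outer fold,
# so every fold can end up with its own feature subset -- no manual loop.
pipe = Pipeline([
    ('select', SelectKBest(f_classif, k=10)),  # stand-in for Boruta
    ('scl', StandardScaler()),
    ('clf', SVC(kernel='rbf')),
])

scores = cross_val_score(pipe, X, y, scoring='accuracy', cv=LeaveOneOut())
print(scores.mean())
```

Keeping selection inside the pipeline also avoids the optimistic bias that comes from selecting features on data that later appears in a test fold.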
------------------------------

Message: 2
Date: Sun, 04 Dec 2016 23:50:21 +0300
From: a...@mccme.ru
To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Adding samplers for intersection/Jensen-Shannon kernels

I see now. So I'll proceed with adding documentation and unit tests for those kernels to complete their support. And I don't think they're too specialized, given that many kinds of feature vectors in e.g. computer vision are in fact histograms, and all of those kernels are histogram-oriented.

Andy wrote, 2016-12-04 00:23:
> Hi Valery.
> I didn't include them because the Chi2 worked better for my task ;)
> In hindsight, I'm not sure if these kernels are not a bit too
> specialized for scikit-learn.
> But given that we have the (slightly more obscure) SkewedChi2 and
> AdditiveChi2,
> I think the intersection one would be a good addition if you found it
> useful.
>
> Andy
>
> On 12/03/2016 03:39 PM, Valery Anisimovsky via scikit-learn wrote:
>> Hello,
>>
>> In the course of my work, I've made samplers for
>> intersection/Jensen-Shannon kernels, just by small modifications to
>> sklearn.kernel_approximation.AdditiveChi2Sampler code. Intersection
>> kernel proved to be the best one for my task (clustering Docstrum
>> feature vectors), so perhaps it'd be good to add those samplers
>> alongside AdditiveChi2Sampler? Should I proceed with creating a pull
>> request? Or, perhaps, those kernels were not already included for some
>> good reason?
>>
>> With best regards,
>> -- Valery

------------------------------

Message: 3
Date: Sun, 4 Dec 2016 22:27:02 +0100
From: Raghav R V <rag...@gmail.com>
To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Nested Leave One Subject Out (LOSO) cross validation with scikit

Hi!

It looks like you are using the cross-validators from the old `sklearn.cross_validation` module, which has been deprecated since v0.18. Use the cross-validators from `sklearn.model_selection` instead (there, `LeaveOneLabelOut` has become `LeaveOneGroupOut`); that should fix your issue, I think (though I have not looked at your code in detail).

HTH!

On Sun, Dec 4, 2016 at 9:12 PM, Ludovico Coletta <ludo25...@hotmail.com> wrote:
> Dear scikit experts,
>
> I'm struggling with the implementation of a nested cross validation.
>
> My data: I have 26 subjects (13 per class) x 6670 features. I used a
> feature reduction algorithm (you may have heard about Boruta) to reduce the
> dimensionality of my data. Problems start now: I defined LOSO as outer
> partitioning schema.
> [...]
>
> Any help would be really appreciated

--
Raghav RV
https://github.com/raghavrv

------------------------------

End of scikit-learn Digest, Vol 9, Issue 13
*******************************************
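[Editor's note] On the "leave one subject out" point in Raghav's reply: in the `sklearn.model_selection` API the group-aware splitters take a `groups` array, one entry per sample, which is the natural fit when a subject contributes more than one sample. A minimal sketch with synthetic data (four hypothetical subjects, three samples each):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

# Synthetic data: 4 subjects x 3 samples each, 5 features
rng = np.random.RandomState(0)
X = rng.randn(12, 5)
y = np.tile([0, 0, 1], 4)              # labels
groups = np.repeat(np.arange(4), 3)    # subject id per sample

logo = LeaveOneGroupOut()
print(logo.get_n_splits(X, y, groups))  # one fold per subject

# All samples of one subject are held out together in each fold
scores = cross_val_score(SVC(), X, y, groups=groups, cv=logo)
print(scores)
```

When there is exactly one sample per subject, as in the 26 x 6670 data set above, `LeaveOneGroupOut` with per-subject group ids reduces to plain `LeaveOneOut`.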
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn