Hi Joel, thanks for pointing out the indentation issue. I have fixed it.
Can someone explain what the three tests that were run automatically on my code are, and why the AppVeyor and Travis ones failed?

Sincerely,
Basil Beirouti

> On Jul 11, 2016, at 11:00 AM, [email protected] wrote:
>
> Send scikit-learn mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mail.python.org/mailman/listinfo/scikit-learn
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of scikit-learn digest..."
>
>
> Today's Topics:
>
> 1. Re: Scikit learn GridSearchCV fit method ValueError Found
> array with 0 sample (Maciek Wójcikowski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 11 Jul 2016 13:33:28 +0200
> From: Maciek Wójcikowski <[email protected]>
> To: Scikit-learn user and developer mailing list
> <[email protected]>
> Subject: Re: [scikit-learn] Scikit learn GridSearchCV fit method
> ValueError Found array with 0 sample
> Message-ID:
> <CAH2JJR1BqHC0PzNv7uaugkQ9GDBUTev4yuJ1qOWuJa=ewz1...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Shouldn't you pass labels (binary) instead of continuous data? If you wish
> to stick to logK's and keep the distribution unchanged then you'd better
> reduce the number of classes (eg round the values to nearest integer?).
>
> It might be the case that the counts per class are floored and you get 0
> for some cases.
>
> ----
> Pozdrawiam, | Best regards,
> Maciek Wójcikowski
> [email protected]
>
> 2016-07-11 13:16 GMT+02:00 Michał Nowotka <[email protected]>:
>
>> Hi Maciek,
>>
>> Thanks for suggestion, I think the problem indeed is related to the
>> StratifiedKFold because if I use KFold instead the code works fine.
>> However, if I print StratifiedKFold object it looks fine to me:
>>
>> sklearn.cross_validation.StratifiedKFold(labels=[ 5.43 8.74 8.1
>> 6.55 7.66 6.52 8.6 7.1 6.4 8.05 7.89 6.68
>> 8.06 6.17 5.5 7.96 5.78 6. 7.74 5.83 6.51 6.31 6.68 9.22
>> 6.07 7.06 7.12 8.64 5.72 6.4 7.64 5.74 7.41 6.49 6.81 7.1
>> 7.66 6.68 7.05 6.28 5.49 6.35 6.9 6.2 7.51 5.65 9.3 5.84
>> 6.92 5.75 6.92 8.8 7.04 5.81 5.73 5.31 7.13 7.66 6.98 5.93
>> 8.24 6.96 8.22 7.27 7.34 5.91 5.57 6.5 7.28 6.74 4.92 6.88
>> 5.8 9.15 6.63 6.37 8.66 6.4 ], n_folds=5, shuffle=False,
>> random_state=None)
>>
>>
>> On Fri, Jul 8, 2016 at 10:42 PM, Maciek Wójcikowski
>> <[email protected]> wrote:
>>> Hi Michał,
>>>
>>> What are the class counts in that set? Maybe there is a problem with
>>> generating stratified subsamples (eg some classes get below 1 sample)?
>>>
>>> ----
>>> Pozdrawiam, | Best regards,
>>> Maciek Wójcikowski
>>> [email protected]
>>>
>>> 2016-07-08 17:22 GMT+02:00 Michał Nowotka <[email protected]>:
>>>>
>>>> Hi,
>>>>
>>>> Sorry for cross posting
>>>>
>>>> (http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample)
>>>> but I don't know where is better to get help with my problem.
>>>> I'm working on a VM with Jupyter notebook server installed.
>>>> From time to time I add new notebooks and reevaluate old ones to see
>>>> if they still work.
>>>>
>>>> This notebook stopped working due to some changes in scikit-learn API
>>>> and some parameters become obsolete:
>>>> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
>>>>
>>>> I've created a corrected version of the notebook here:
>>>>
>>>> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
>>>>
>>>> But I'm stuck in cell 36 on this code:
>>>>
>>>> from sklearn.cross_validation import KFold
>>>> from sklearn.grid_search import GridSearchCV
>>>>
>>>> X_traina, X_testa, y_traina, y_testa =
>>>> cross_validation.train_test_split(x, y, test_size=0.95,
>>>> random_state=23)
>>>>
>>>> params = {'min_samples_split': [8], 'max_depth': [20],
>>>> 'min_samples_leaf': [1],'n_estimators':[200]}
>>>> cv = KFold(n=len(X_traina),n_folds=10,shuffle=True)
>>>> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
>>>> gs = GridSearchCV(custom_forest, params,
>>>> cv=cv_stratified,verbose=1,refit=True)
>>>> gs.fit(X_traina,y_traina)
>>>>
>>>> This gives me:
>>>>
>>>> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
>>>> minimum of 1 is required.
>>>>
>>>> Now I don't understand this because when I print shapes of the samples:
>>>>
>>>> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
>>>>
>>>> I'm getting:
>>>>
>>>> ((78, 491), (1489, 491), (78,), (1489,))
>>>>
>>>> Interestingly, if I change the test_size parameter to 0.88 (like in
>>>> the example corrected notebook) it works and this is the highest value
>>>> where it works. For this value, the shapes are:
>>>>
>>>> ((188, 491), (1379, 491), (188,), (1379,))
>>>>
>>>> So the question is - what should I change in my code to make it work
>>>> for test_size set to 0.95 as well?
>>>>
>>>> Kind regards,
>>>>
>>>> Michal Nowotka
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> [email protected]
>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>
> ------------------------------
>
> End of scikit-learn Digest, Vol 4, Issue 15
> *******************************************
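The quoted thread hinges on how StratifiedKFold reads a continuous target: every distinct y value is treated as its own class. A minimal pure-Python sketch, using the 78 logK values from the printed StratifiedKFold repr, counts class sizes before and after rounding to the nearest integer (Maciek's suggestion). The helpers class_counts and report and the constant N_FOLDS = 5 are illustrative assumptions; the sketch does not reproduce scikit-learn's fold-assignment code, it only checks whether each class has enough members to show up in all five folds.

```python
from collections import Counter

# logK targets as printed in the quoted StratifiedKFold repr (78 samples).
LABELS = [
    5.43, 8.74, 8.1, 6.55, 7.66, 6.52, 8.6, 7.1, 6.4, 8.05, 7.89, 6.68,
    8.06, 6.17, 5.5, 7.96, 5.78, 6.0, 7.74, 5.83, 6.51, 6.31, 6.68, 9.22,
    6.07, 7.06, 7.12, 8.64, 5.72, 6.4, 7.64, 5.74, 7.41, 6.49, 6.81, 7.1,
    7.66, 6.68, 7.05, 6.28, 5.49, 6.35, 6.9, 6.2, 7.51, 5.65, 9.3, 5.84,
    6.92, 5.75, 6.92, 8.8, 7.04, 5.81, 5.73, 5.31, 7.13, 7.66, 6.98, 5.93,
    8.24, 6.96, 8.22, 7.27, 7.34, 5.91, 5.57, 6.5, 7.28, 6.74, 4.92, 6.88,
    5.8, 9.15, 6.63, 6.37, 8.66, 6.4,
]
N_FOLDS = 5  # matches n_folds=5 in the quoted code


def class_counts(labels):
    """Hypothetical helper: members per class, treating each distinct value
    as a class (how a stratified splitter interprets a continuous target)."""
    return Counter(labels)


def report(name, labels, n_folds=N_FOLDS):
    """Hypothetical helper: print class statistics and return the counts."""
    counts = class_counts(labels)
    too_small = sum(1 for c in counts.values() if c < n_folds)
    print(f"{name}: {len(counts)} classes, largest has "
          f"{max(counts.values())} members, {too_small} classes have "
          f"fewer than {n_folds} members")
    return counts


raw = report("raw logK", LABELS)
# Bin the continuous target by rounding to the nearest integer.
# Python's round() sends exact halves to the even integer (5.5 -> 6, 6.5 -> 6).
binned = report("rounded logK", [round(v) for v in LABELS])
```

Run as is, every raw class has at most three members, so none can be spread over five folds, which is consistent with Maciek's guess that some per-class counts end up floored to zero. Rounding collapses the 70 distinct values into 5 classes, but the class for the rounded value 5 still has only four members, so coarser bins or fewer folds may still be needed; that choice is a judgment call the thread leaves open.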
