Shouldn't you pass class labels (e.g. binary) instead of continuous data? If you wish to stick to logK values and keep the distribution unchanged, you'd better reduce the number of classes (e.g. round the values to the nearest integer?).
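The rounding idea can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the thread. It takes the first 20 logK values from the StratifiedKFold printout quoted below and compares the class counts before and after rounding. As raw floats, every value is a separate class with a single member. Rounded to the nearest integer, the values collapse into five classes.

```python
import numpy as np

# First 20 logK values from the StratifiedKFold printout in this thread
# (a subset, used here only for illustration).
y = np.array([5.43, 8.74, 8.1, 6.55, 7.66, 6.52, 8.6, 7.1, 6.4, 8.05,
              7.89, 6.68, 8.06, 6.17, 5.5, 7.96, 5.78, 6.0, 7.74, 5.83])

# Treated as labels, every distinct float is its own class.
values, counts = np.unique(y, return_counts=True)
print(len(values), counts.max())  # 20 classes, each with 1 member

# Rounding to the nearest integer gives a handful of larger classes.
y_classes = np.rint(y).astype(int)
classes, class_counts = np.unique(y_classes, return_counts=True)
print(dict(zip(classes.tolist(), class_counts.tolist())))
# {5: 1, 6: 6, 7: 4, 8: 7, 9: 2}
```

`np.rint` rounds halves to the even integer, so 5.5 becomes 6. Class 5 still has a single member, which is fewer than `n_folds=5`, so sparse classes may need to be merged or binned more coarsely.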
It might be that the per-class counts get floored, so you end up with 0 samples in some cases.

----
Pozdrawiam, | Best regards,
Maciek Wójcikowski
[email protected]

2016-07-11 13:16 GMT+02:00 Michał Nowotka <[email protected]>:
> Hi Maciek,
>
> Thanks for the suggestion. I think the problem is indeed related to
> StratifiedKFold, because if I use KFold instead, the code works fine.
> However, if I print the StratifiedKFold object, it looks fine to me:
>
> sklearn.cross_validation.StratifiedKFold(labels=[
>     5.43 8.74 8.1 6.55 7.66 6.52 8.6 7.1 6.4 8.05 7.89 6.68
>     8.06 6.17 5.5 7.96 5.78 6. 7.74 5.83 6.51 6.31 6.68 9.22
>     6.07 7.06 7.12 8.64 5.72 6.4 7.64 5.74 7.41 6.49 6.81 7.1
>     7.66 6.68 7.05 6.28 5.49 6.35 6.9 6.2 7.51 5.65 9.3 5.84
>     6.92 5.75 6.92 8.8 7.04 5.81 5.73 5.31 7.13 7.66 6.98 5.93
>     8.24 6.96 8.22 7.27 7.34 5.91 5.57 6.5 7.28 6.74 4.92 6.88
>     5.8 9.15 6.63 6.37 8.66 6.4 ],
>     n_folds=5, shuffle=False, random_state=None)
>
>
> On Fri, Jul 8, 2016 at 10:42 PM, Maciek Wójcikowski
> <[email protected]> wrote:
> > Hi Michał,
> >
> > What are the class counts in that set? Maybe there is a problem with
> > generating stratified subsamples (e.g. some classes get fewer than 1 sample)?
> >
> > ----
> > Pozdrawiam, | Best regards,
> > Maciek Wójcikowski
> > [email protected]
> >
> > 2016-07-08 17:22 GMT+02:00 Michał Nowotka <[email protected]>:
> >>
> >> Hi,
> >>
> >> Sorry for cross-posting
> >> (http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample),
> >> but I don't know where it is better to ask for help with my problem.
> >> I'm working on a VM with a Jupyter notebook server installed.
> >> From time to time I add new notebooks and re-evaluate old ones to see
> >> if they still work.
> >>
> >> This notebook stopped working due to some changes in the scikit-learn
> >> API; some parameters became obsolete:
> >>
> >> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
> >>
> >> I've created a corrected version of the notebook here:
> >>
> >> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
> >>
> >> But I'm stuck at cell 36, on this code:
> >>
> >> from sklearn import cross_validation
> >> from sklearn.cross_validation import KFold, StratifiedKFold
> >> from sklearn.grid_search import GridSearchCV
> >>
> >> X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
> >>     x, y, test_size=0.95, random_state=23)
> >>
> >> params = {'min_samples_split': [8], 'max_depth': [20],
> >>           'min_samples_leaf': [1], 'n_estimators': [200]}
> >> cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
> >> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
> >> gs = GridSearchCV(custom_forest, params,
> >>                   cv=cv_stratified, verbose=1, refit=True)
> >> gs.fit(X_traina, y_traina)
> >>
> >> This gives me:
> >>
> >> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a minimum of 1 is required.
> >>
> >> I don't understand this, because when I print the shapes of the samples:
> >>
> >> print(X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
> >>
> >> I'm getting:
> >>
> >> ((78, 491), (1489, 491), (78,), (1489,))
> >>
> >> Interestingly, if I change the test_size parameter to 0.88 (as in the
> >> corrected example notebook), it works, and this is the highest value
> >> for which it works. For this value, the shapes are:
> >>
> >> ((188, 491), (1379, 491), (188,), (1379,))
> >>
> >> So the question is: what should I change in my code to make it work
> >> for test_size set to 0.95 as well?
> >>
> >> Kind regards,
> >>
> >> Michal Nowotka
> >> _______________________________________________
> >> scikit-learn mailing list
> >> [email protected]
> >> https://mail.python.org/mailman/listinfo/scikit-learn
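One plausible reading of the error, consistent with the replies above, is this: most of the 78 logK values occur only once. When the values themselves are used as labels, every "class" therefore has fewer members than `n_folds=5`, and some test folds can come out empty.

Below is a minimal sketch of one workaround for current scikit-learn (0.18 or later, where `sklearn.cross_validation` and `sklearn.grid_search` were replaced by `sklearn.model_selection`). It stratifies on binned targets and passes the precomputed splits to `GridSearchCV`, while the regressor is still fitted on the continuous values. Several parts are assumptions and do not come from the thread:

- The data are random stand-ins.
- `RandomForestRegressor` stands in for the notebook's `custom_forest`.
- Quantile binning is swapped in for the nearest-integer rounding suggested at the top of the thread, so that every stratum is guaranteed enough members.

In recent releases, `StratifiedKFold` rejects a continuous target with a `ValueError` instead of building folds from it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.RandomState(23)

# Hypothetical stand-ins for the notebook's data: 78 training samples,
# 20 random features and continuous, logK-like targets.
X_train = rng.rand(78, 20)
y_train = rng.normal(loc=7.0, scale=1.0, size=78)

# Stratify on binned targets instead of the raw floats: quintile edges
# yield 5 strata of roughly equal size.
edges = np.percentile(y_train, [20, 40, 60, 80])
strata = np.digitize(y_train, edges)

# Precompute (train, test) index pairs from the strata; GridSearchCV
# accepts any iterable of such pairs as its cv argument.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=23)
splits = list(skf.split(X_train, strata))

params = {'min_samples_split': [8], 'max_depth': [20],
          'min_samples_leaf': [1], 'n_estimators': [200]}
# RandomForestRegressor is an assumed stand-in for custom_forest.
gs = GridSearchCV(RandomForestRegressor(random_state=23), params,
                  cv=splits, verbose=1, refit=True)
# The model itself is still fitted on the continuous targets.
gs.fit(X_train, y_train)

print([len(test) for _, test in splits])  # test-fold sizes, none empty
```

The key design choice is to pass `strata` rather than `y_train` to `split`. Stratification only needs discrete groups, while the regressor keeps the original values. Plain `KFold`, which the original poster found to work, is the simpler option when stratification is not needed.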
