Hi Michał,

What are the class counts in that set? Maybe there is a problem with generating the stratified subsamples (e.g. some classes end up with fewer than 1 sample in a fold)?
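A quick way to check this, as a minimal sketch rather than a diagnosis: count the samples per class and list any class that has fewer samples than the number of folds. The helper name classes_below_fold_count and the labels in y_example are made up for illustration; in your notebook you would pass y_traina and the n_folds value you give to StratifiedKFold.

```python
from collections import Counter


def classes_below_fold_count(labels, n_folds):
    """Return {label: count} for classes with fewer samples than n_folds.

    Hypothetical helper, not part of scikit-learn.
    """
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_folds}


# Invented labels standing in for y_traina: 78 samples, only 3 in class 1.
y_example = [0] * 75 + [1] * 3

print(Counter(y_example))                              # per-class counts
print(classes_below_fold_count(y_example, n_folds=5))  # classes too small for 5 folds
```

If the second print shows a non-empty dict for your real labels, those classes cannot be spread over every fold, which would be worth ruling out first.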
----
Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
[email protected]

2016-07-08 17:22 GMT+02:00 Michał Nowotka <[email protected]>:

> Hi,
>
> Sorry for cross posting
> (
> http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample
> )
> but I don't know where is better to get help with my problem.
> I'm working on a VM with Jupyter notebook server installed.
> From time to time I add new notebooks and reevaluate old ones to see
> if they still work.
>
> This notebook stopped working due to some changes in scikit-learn API
> and some parameters become obsolete:
>
> https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb
>
> I've created a corrected version of the notebook here:
>
> https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433
>
> But I'm stuck in cell 36 on this code:
>
> from sklearn.cross_validation import KFold
> from sklearn.grid_search import GridSearchCV
>
> X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(
>     x, y, test_size=0.95, random_state=23)
>
> params = {'min_samples_split': [8], 'max_depth': [20],
>           'min_samples_leaf': [1], 'n_estimators': [200]}
> cv = KFold(n=len(X_traina), n_folds=10, shuffle=True)
> cv_stratified = StratifiedKFold(y_traina, n_folds=5)
> gs = GridSearchCV(custom_forest, params,
>                   cv=cv_stratified, verbose=1, refit=True)
> gs.fit(X_traina, y_traina)
>
> This gives me:
>
> ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a
> minimum of 1 is required.
>
> Now I don't understand this because when I print shapes of the samples:
>
> print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)
>
> I'm getting:
>
> ((78, 491), (1489, 491), (78,), (1489,))
>
> Interestingly, if I change the test_size parameter to 0.88 (like in
> the example corrected notebook) it works and this is the highest value
> where it works.
> For this value, the shapes are:
>
> ((188, 491), (1379, 491), (188,), (1379,))
>
> So the question is - what should I change in my code to make it work
> for test_size set to 0.95 as well?
>
> Kind regards,
>
> Michal Nowotka
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
