Hi,

Sorry for cross-posting (http://stackoverflow.com/questions/38263933/scikit-learn-gridsearchcv-fit-method-valueerror-found-array-with-0-sample), but I don't know where it is better to get help with my problem.

I'm working on a VM with a Jupyter notebook server installed. From time to time I add new notebooks and re-evaluate old ones to see if they still work.

The following notebook stopped working because of changes in the scikit-learn API and because some parameters became obsolete:

https://github.com/chembl/mychembl/blob/master/ipython_notebooks/10_myChEMBL_machine_learning.ipynb

I've created a corrected version of the notebook here:

https://gist.github.com/anonymous/676c55cc501ffa48fecfcc1e1252d433

But I'm stuck in cell 36 on this code:

    from sklearn.cross_validation import KFold
    from sklearn.grid_search import GridSearchCV

    X_traina, X_testa, y_traina, y_testa = cross_validation.train_test_split(x, y, test_size=0.95, random_state=23)
    params = {'min_samples_split': [8], 'max_depth': [20], 'min_samples_leaf': [1],'n_estimators':[200]}
    cv = KFold(n=len(X_traina),n_folds=10,shuffle=True)
    cv_stratified = StratifiedKFold(y_traina, n_folds=5)
    gs = GridSearchCV(custom_forest, params, cv=cv_stratified,verbose=1,refit=True)
    gs.fit(X_traina,y_traina)

This gives me:

    ValueError: Found array with 0 sample(s) (shape=(0, 491)) while a minimum of 1 is required.

I don't understand this, because when I print the shapes of the samples:

    print (X_traina.shape, X_testa.shape, y_traina.shape, y_testa.shape)

I get:

    ((78, 491), (1489, 491), (78,), (1489,))

Interestingly, if I change the test_size parameter to 0.88 (as in the corrected example notebook), it works, and this is the highest value for which it works. For this value, the shapes are:

    ((188, 491), (1379, 491), (188,), (1379,))

So the question is: what should I change in my code to make it work with test_size set to 0.95 as well?

Kind regards,
Michal Nowotka
