Hello everyone, I am new to sklearn and I have a question about GridSearchCV:
I am running the following code in a Jupyter notebook:

----------------------*code*-------------------------------
opt_models = dict()
for feature in [features1, features2, features3, features4]:
    cmb = CMB(x_train, y_train, x_test, y_test, feature)
    cmb.fit()
    cmb.predict()
    opt_models[str(feature)] = cmb.get_best_model()
-------------------------------------------------------

The CMB class is just a class that contains different classification models (SVC, decision tree, etc.). When cmb.fit() runs, a GridSearchCV is performed on the SVC model (which lives inside the cmb instance) in order to tune the hyperparameters C, gamma, and kernel. The SVC model is implemented using the sklearn.svm.SVC class. Here is the output of the first and second iterations of the for loop:

---------------------*output*-------------------------------------
-> 1st iteration
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
[Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
[Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
[Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
[Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
[Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
[Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
[Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
[Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
[Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
[Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
[Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
[Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
[Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
[Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
[Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
[Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s

-> 2nd iteration
Fitting 5 folds for each of 12 candidates, totalling 60 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
[Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
---------------------------------------------------------------------------------------------------------------------

As you can see, the elapsed time of the first iteration is much larger than that of the second. Does that make sense? I am afraid that the model is doing some kind of caching or taking a shortcut based on the first iteration, which could degrade the model training/performance. I have already read the sklearn documentation and I didn't see any warning/note about this kind of behaviour.
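To make the question easier to reproduce, here is a minimal standalone sketch of what, as far as I understand it, cmb.fit() does for the SVC model. The dataset, the grid values, and the two-pass loop are placeholders I made up for illustration, not my real code:

----------------------*code*-------------------------------
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data standing in for my real feature sets
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Placeholder grid: 3 * 2 * 2 = 12 candidates, matching the
# "12 candidates" shown in my log above
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf", "linear"],
}

for i in range(2):  # simulates two iterations of my feature loop
    # A brand-new GridSearchCV and SVC are created on every pass
    search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=10)
    search.fit(X, y)
    print(i, search.best_params_)
-------------------------------------------------------

Note that a fresh GridSearchCV and SVC are constructed on every pass here, just as a fresh CMB instance is constructed on every pass of my loop, so I don't see how the second run could be reusing anything from the first, yet it still finishes much faster.

Thank you very much for your time :)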