Regarding scikit-learn, the only thing that we cache is the transformer processing in the pipeline (see the memory parameter in Pipeline).
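For reference, this is roughly what that caching looks like; the data and
pipeline steps below are only illustrative, and the cache directory is a
temporary one:

from tempfile import mkdtemp
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data, only for the sketch
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# With memory set, the fitted transformers (scaler, pca) are cached on
# disk; the final estimator (here SVC) is never cached and is always refit.
pipe = Pipeline(
    [("scaler", StandardScaler()),
     ("pca", PCA(n_components=5)),
     ("svc", SVC())],
    memory=mkdtemp(),
)
pipe.fit(X, y)
pipe.fit(X, y)  # second call reuses the cached transformer fits

GridSearchCV itself keeps no state between two separate calls, so nothing
from your 1st iteration is reused in the 2nd.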
It seems that you are passing a different set of features at each iteration. Is the number of features different?

On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.c...@gmail.com> wrote:

> Hello fellows,
>
> I am new to sklearn and I have a question about GridSearchCV.
>
> I am running the following code in a Jupyter notebook:
>
> ----------------------code-------------------------------
>
> opt_models = dict()
> for feature in [features1, features2, features3, features4]:
>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>     cmb.fit()
>     cmb.predict()
>     opt_models[str(feature)] = cmb.get_best_model()
>
> -------------------------------------------------------
>
> The CMB class is just a class that contains different classification
> models (SVC, decision tree, etc.). When cmb.fit() runs, a GridSearchCV
> is performed on the SVC model (which is within the cmb instance) in
> order to tune the hyperparameters C, gamma, and kernel. The SVC model
> is implemented using the sklearn.svm.SVC class. Here is the output of
> the first and second iterations of the for loop:
>
> ---------------------output------------------------------
>
> -> 1st iteration
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
> [Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
> [Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
> [Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
> [Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
> [Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
> [Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
> [Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
> [Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
> [Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
> [Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
> [Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
> [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
> [Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
> [Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
> [Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
> [Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s
>
> -> 2nd iteration
>
> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>
> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
> [Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
>
> ----------------------------------------------------------
>
> As you can see, the first iteration gets a much larger elapsed time
> than the 2nd iteration. Does that make sense? I am afraid that the
> model is doing some kind of caching or shortcut from the 1st iteration,
> which could consequently degrade the model training/performance. I
> already read the sklearn documentation and I didn't see any
> warning/note about this kind of behaviour.
>
> Thank you very much for your time :)
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Guillaume Lemaitre
Scikit-learn @ Inria Foundation
https://glemaitre.github.io/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn