Hey Guillaume, first of all, thank you for the help. I checked my code and memory is turned off (the parameter is left at its default). And yes, I am using a different number of features every time.
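To make explicit what "left at its default" means here, a minimal sketch of the Pipeline memory parameter you mention (the scaler step is only illustrative, not my actual pipeline):

----------------------*code*-------------------------------

from tempfile import mkdtemp
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative steps only; the point is the memory argument.
# Default: memory=None, so nothing is cached between fits.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Caching would only kick in if memory were set explicitly, and even
# then it only caches the fitted *transformers*, never the SVC fits.
cached_pipe = Pipeline(
    [("scale", StandardScaler()), ("svc", SVC())],
    memory=mkdtemp(),
)

-------------------------------------------------------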
Guillaume Lemaître <g.lemaitr...@gmail.com> wrote on Wednesday, 27/05/2020 at 16:55:

> Regarding scikit-learn, the only thing that we cache is the transformer
> processing in the pipeline (see the memory parameter in Pipeline).
>
> It seems that you are passing a different set of features at each
> iteration. Is the number of features different?
>
> On Sun, 29 Mar 2020 at 19:23, Pedro Cardoso <pedro.cardoso.c...@gmail.com> wrote:
>
>> Hello fellows,
>>
>> I am new to sklearn and I have a question about GridSearchCV.
>>
>> I am running the following code in a Jupyter notebook:
>>
>> ----------------------*code*-------------------------------
>>
>> opt_models = dict()
>> for feature in [features1, features2, features3, features4]:
>>     cmb = CMB(x_train, y_train, x_test, y_test, feature)
>>     cmb.fit()
>>     cmb.predict()
>>     opt_models[str(feature)] = cmb.get_best_model()
>>
>> -------------------------------------------------------
>>
>> The CMB class is just a class that contains different classification
>> models (SVC, decision tree, etc.). When cmb.fit() is running, a
>> GridSearchCV is performed on the SVC model (which is within the cmb
>> instance) in order to tune the hyperparameters C, gamma, and kernel. The
>> SVC model is implemented using the sklearn.svm.SVC class. Here is the
>> output of the first and second iterations of the for loop:
>>
>> ---------------------*output*-------------------------------------
>> -> 1st iteration
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 6.1s
>> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 6.2s
>> [Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 13 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 14 tasks | elapsed: 6.3s
>> [Parallel(n_jobs=-1)]: Done 15 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 16 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 17 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 18 tasks | elapsed: 6.4s
>> [Parallel(n_jobs=-1)]: Done 19 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 20 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 21 tasks | elapsed: 6.5s
>> [Parallel(n_jobs=-1)]: Done 22 tasks | elapsed: 6.6s
>> [Parallel(n_jobs=-1)]: Done 23 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 24 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 25 tasks | elapsed: 6.7s
>> [Parallel(n_jobs=-1)]: Done 26 tasks | elapsed: 6.8s
>> [Parallel(n_jobs=-1)]: Done 27 tasks | elapsed: 6.8s
>> [Parallel(n_jobs=-1)]: Done 28 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 29 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 30 tasks | elapsed: 6.9s
>> [Parallel(n_jobs=-1)]: Done 31 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 32 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 33 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.0s
>> [Parallel(n_jobs=-1)]: Done 35 tasks | elapsed: 7.1s
>> [Parallel(n_jobs=-1)]: Done 36 tasks | elapsed: 7.1s
>> [Parallel(n_jobs=-1)]: Done 37 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 38 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 39 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 40 tasks | elapsed: 7.2s
>> [Parallel(n_jobs=-1)]: Done 41 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 43 tasks | elapsed: 7.3s
>> [Parallel(n_jobs=-1)]: Done 44 tasks | elapsed: 7.4s
>> [Parallel(n_jobs=-1)]: Done 45 tasks | elapsed: 7.4s
>> [Parallel(n_jobs=-1)]: Done 46 tasks | elapsed: 7.5s
>>
>> -> 2nd iteration
>>
>> Fitting 5 folds for each of 12 candidates, totalling 60 fits
>>
>> [Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
>> [Parallel(n_jobs=-1)]: Done 1 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Batch computation too fast (0.0260s.) Setting batch_size=14.
>> [Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 0.0s
>> [Parallel(n_jobs=-1)]: Done 60 out of 60 | elapsed: 0.7s finished
>>
>> -------------------------------------------------------
>>
>> As you can see, the first iteration has a much larger elapsed time than
>> the second one. Does that make sense? I am afraid that the model is doing
>> some kind of cache or shortcut from the 1st iteration, which could
>> consequently degrade the model training/performance. I have already read
>> the sklearn documentation and I didn't see any warning/note about this
>> kind of behaviour.
>>
>> Thank you very much for your time :)
>
> --
> Guillaume Lemaitre
> Scikit-learn @ Inria Foundation
> https://glemaitre.github.io/
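For reference, the SVC tuning inside cmb.fit() described above boils down to something like the sketch below. The grid values are placeholders of mine; only the structure (a brand-new GridSearchCV with 5-fold CV and n_jobs=-1 on each loop iteration) is taken from the thread, which is why no fitted model can be carried over between feature sets.

----------------------*code*-------------------------------

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical reconstruction; the grid values are placeholders that
# happen to give 3 * 2 * 2 = 12 candidates, matching
# "Fitting 5 folds for each of 12 candidates, totalling 60 fits".
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],
    "kernel": ["rbf", "linear"],
}

search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1, verbose=10)
search.fit(x_train, y_train)          # a fresh search object every iteration
best_model = search.best_estimator_   # nothing is reused across iterations

-------------------------------------------------------

As far as I know, GridSearchCV itself keeps no state between separate fit calls; the speed-up in the 2nd-iteration log is the kind of thing joblib's Loky backend does on its own (reusing already-started worker processes and enlarging the batch size, as in the "Batch computation too fast ... Setting batch_size=14" line), not a shortcut over the model fits.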