Guillaume - thank you for the comments. Indeed, an approach to "freeze" a fitted classifier would solve our problem. The GitHub issue seems to have been inactive for a while, but I will check whether anyone else is working on it.
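Until such a "freeze" mechanism exists, one possible workaround can be sketched as follows. It relies on the documented behaviour that clone(obj, safe=False) falls back to copy.deepcopy for objects that are not estimators (that is, objects without get_params). FittedPool is a hypothetical holder class invented for this illustration, and MyClassifier is a simplified version of the class in the quoted example; neither is part of scikit-learn or DESlib.

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


class FittedPool:
    """Hypothetical holder for an already-fitted model.

    It deliberately defines no get_params, so clone(..., safe=False)
    deep-copies it instead of rebuilding an unfitted estimator.
    """

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        return self.model.predict(X)


class MyClassifier(ClassifierMixin, BaseEstimator):
    def __init__(self, base_classifiers, k=1):
        self.base_classifiers = base_classifiers  # a FittedPool
        self.k = k  # parameter to grid-search over

    def fit(self, X_dsel, y_dsel):
        # Only the dynamic-selection parameters would be fitted here.
        return self

    def predict(self, X):
        return self.base_classifiers.predict(X)


X, y = load_iris(return_X_y=True)
X_train, X_dsel, y_train, y_dsel = train_test_split(
    X, y, test_size=0.5, random_state=0
)

pool = FittedPool(BaggingClassifier(random_state=0).fit(X_train, y_train))
clf = MyClassifier(pool, k=1)

# clone() deep-copies the holder, so the copy is still fitted.
cloned = clone(clf)
predictions = cloned.fit(X_dsel, y_dsel).predict(X_dsel)

# The grid search clones clf for every candidate and fold.
grid = GridSearchCV(clf, {"k": [1, 3, 5, 7]})
grid.fit(X_dsel, y_dsel)
```

The trade-off is that every clone made by the grid search carries its own deep copy of the whole pool, which may be costly for large ensembles.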
Luiz Gustavo

On Wed, Sep 19, 2018 at 12:02 PM <scikit-learn-requ...@python.org> wrote:

> Message: 1
> Date: Wed, 19 Sep 2018 17:38:46 +0200
> From: Guillaume Lemaître <g.lemaitr...@gmail.com>
> To: Scikit-learn user and developer mailing list <scikit-learn@python.org>
> Subject: Re: [scikit-learn] Issues with clone for ensemble of classifiers
>
> However, there is an issue about freezing a fitted classifier. You can refer to:
>
> https://github.com/scikit-learn/scikit-learn/issues/8370
>
> with the associated discussion.
>
> On Wed, 19 Sep 2018 at 17:34, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
> >
> > Oops, I misread your comment. I don't think we currently have a mechanism to avoid cloning a classifier internally.
> >
> > On Wed, 19 Sep 2018 at 17:31, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
> > >
> > > Nowhere in your class MyClassifier do you call base_classifier.fit(...), so calling base_classifier.predict(...) will tell you that it was not fitted.
> > >
> > > On Wed, 19 Sep 2018 at 16:43, Luiz Gustavo Hafemann <luiz...@gmail.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I am one of the developers of a library for Dynamic Ensemble Selection (DES) methods (the library is called DESlib), and we are currently working to make the library fully compatible with scikit-learn (in order to submit it to scikit-learn-contrib). We have "check_estimator" working for most of the classes, but now I am having problems making the classes compatible with GridSearch / other CV functions.
> > > >
> > > > One of the main use cases of this library is to facilitate research in this field, and this led to a design decision that the base classifiers are fit by the user, and the DES methods receive a pool of base classifiers that were already fit (this allows users to compare many DES techniques with the same base classifiers). This is creating an issue with GridSearch, since the clone function (defined in sklearn.base) is not cloning the classes as we would like. It does a shallow (non-deep) copy of the parameters, but we would like the pool of base classifiers to be deep-copied.
> > > >
> > > > I analyzed this issue and I could not find a solution that does not require changes to the scikit-learn code. Here is the sequence of steps that causes the problem:
> > > >
> > > > 1. GridSearchCV calls "clone" on the DES estimator (link)
> > > > 2. The clone function calls the "get_params" function of the DES estimator (link, line 60). We don't re-implement this function, so it gets all the parameters, including the pool of classifiers (at this point, they are still "fitted")
> > > > 3. The clone function then clones each parameter with safe=False (line 62). When cloning the pool of classifiers, the result is a pool that is not "fitted" anymore.
> > > >
> > > > The problem is that, to my knowledge, there is no way for my classifier to inform "clone" that a parameter should always be deep-copied.
> > > > I see that other ensemble methods in sklearn always fit the base classifiers within the "fit" method of the ensemble, so this problem does not happen there. I would like to know if there is a solution for this problem while having the base classifiers fitted elsewhere.
> > > >
> > > > Here is a short code that reproduces the issue:
> > > >
> > > > ---------------------------
> > > >
> > > > from sklearn.model_selection import GridSearchCV, train_test_split
> > > > from sklearn.base import BaseEstimator, ClassifierMixin
> > > > from sklearn.ensemble import BaggingClassifier
> > > > from sklearn.datasets import load_iris
> > > >
> > > >
> > > > class MyClassifier(BaseEstimator, ClassifierMixin):
> > > >     def __init__(self, base_classifiers, k):
> > > >         # Base classifiers that are already trained
> > > >         self.base_classifiers = base_classifiers
> > > >         # Simulate a parameter that we want to do a grid search on
> > > >         self.k = k
> > > >
> > > >     def fit(self, X_dsel, y_dsel):
> > > >         # Here we would fit any parameters for the Dynamic selection
> > > >         # method, not the base classifiers
> > > >         pass
> > > >
> > > >     def predict(self, X):
> > > >         # In practice the methods would do something with the
> > > >         # predictions of each classifier
> > > >         return self.base_classifiers.predict(X)
> > > >
> > > >
> > > > X, y = load_iris(return_X_y=True)
> > > > X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5)
> > > >
> > > > base_classifiers = BaggingClassifier()
> > > > base_classifiers.fit(X_train, y_train)
> > > >
> > > > clf = MyClassifier(base_classifiers, k=1)
> > > >
> > > > params = {'k': [1, 3, 5, 7]}
> > > > grid = GridSearchCV(clf, params)
> > > >
> > > > # Raises error that the bagging classifiers are not fitted
> > > > grid.fit(X_dsel, y_dsel)
> > > >
> > > > ---------------------------
> > > >
> > > > Btw, here is the branch that we are using to make the library compatible with sklearn: https://github.com/Menelau/DESlib/tree/sklearn-estimators.
> > > > The failing test related to this issue is in https://github.com/Menelau/DESlib/blob/sklearn-estimators/deslib/tests/test_des_integration.py#L36
> > > >
> > > > Thanks in advance for any help on this case,
> > > >
> > > > Luiz Gustavo Hafemann
> > > >
> > > > _______________________________________________
> > > > scikit-learn mailing list
> > > > scikit-learn@python.org
> > > > https://mail.python.org/mailman/listinfo/scikit-learn
> > >
> > > --
> > > Guillaume Lemaitre
> > > INRIA Saclay - Parietal team
> > > Center for Data Science Paris-Saclay
> > > https://glemaitre.github.io/
>
> End of scikit-learn Digest, Vol 30, Issue 14
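For reference, the behaviour described in the quoted message (a cloned estimator losing its fitted state) can be checked in isolation. The following is a minimal sketch that uses only public scikit-learn calls; the variable names are chosen for illustration, and the iris data and BaggingClassifier simply mirror the quoted example.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.exceptions import NotFittedError

X, y = load_iris(return_X_y=True)

fitted = BaggingClassifier(random_state=0).fit(X, y)
copy_of_fitted = clone(fitted)

# fit() creates the learned attribute estimators_; clone() only carries
# over the constructor parameters, so the copy does not have it.
has_state_before = hasattr(fitted, "estimators_")
has_state_after = hasattr(copy_of_fitted, "estimators_")

# Predicting with the unfitted copy is what fails inside GridSearchCV.
try:
    copy_of_fitted.predict(X)
    raised = False
except NotFittedError:
    raised = True

print(has_state_before, has_state_after, raised)
```

Because clone is applied recursively to every parameter that is itself an estimator, a fitted pool passed as a constructor argument is rebuilt unfitted in the same way.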