This code raises a PicklingError:

from sklearn.datasets import load_boston
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.grid_search import RandomizedSearchCV
from sklearn.externals import joblib
from scipy.stats import randint

X, y = load_boston().data, load_boston().target
pipe = Pipeline([("rf", RandomForestRegressor())])
params = {"rf__n_estimators": randint(2,3)}
random_search = RandomizedSearchCV(pipe, params, n_iter=1).fit(X, y)
joblib.dump(random_search, "final_model.pkl", compress=3)

In params, if randint(2,3) is changed to range(2,3), no pickling error occurs.

In 0.16.2, changing all the parameters in a large grid search to ranges causes a memory error (due to all possible combinations being saved to an array), so this is not a workable solution.

Pickling just the best_estimator_ works (which is now what I do), but currently there does not seem to be a way to pickle a grid search that has a large number of hyper-parameters (very common with RandomizedSearchCV) in 0.16.2.

You all do amazing work. Thank you all so much for your contributions to the project.

Jason
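
P.S. To make the memory point concrete, here is a rough sketch in plain Python. It is not how scikit-learn stores or samples parameters internally. It only counts how many combinations a search space built from ranges would contain if it were fully expanded, and contrasts that with drawing a few random settings directly. The parameter names, the range bounds, and the helper sample_params are all made up for illustration.

```python
import math
import random

# Hypothetical search space: names and bounds are illustrative only.
param_ranges = {
    "rf__n_estimators": range(10, 1000),
    "rf__max_depth": range(1, 50),
    "rf__min_samples_split": range(2, 100),
    "rf__min_samples_leaf": range(1, 100),
}

# Size of the full grid if every range were expanded into all combinations.
n_combinations = math.prod(len(values) for values in param_ranges.values())


def sample_params(ranges, n_iter, seed=0):
    """Draw n_iter random settings without enumerating the full grid."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in ranges.items()}
        for _ in range(n_iter)
    ]


samples = sample_params(param_ranges, n_iter=5)

print("combinations in the full grid:", n_combinations)
print("settings actually drawn:", len(samples))
```

With only four parameters the full grid already has hundreds of millions of entries, while the sampler above only ever holds n_iter settings in memory.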