This is a known scipy deficiency. See https://github.com/scipy/scipy/pull/4821 and related issues.
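Until that is resolved, one possible workaround is to replace the frozen scipy distribution with a small picklable object of your own, since RandomizedSearchCV only needs each distribution to expose an rvs method. The sketch below is untested against 0.16.2; the class name PicklableRandInt is hypothetical, and its handling of random_state is a simplifying assumption rather than a copy of scipy's behaviour.

```python
import pickle
import random


class PicklableRandInt(object):
    """Hypothetical stand-in for scipy.stats.randint(low, high).

    It only provides the rvs() method used for parameter sampling,
    and it holds nothing but two integers, so it pickles cleanly.
    """

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def rvs(self, random_state=None):
        # Draw one integer from the half-open range [low, high),
        # the same convention as scipy.stats.randint.
        # Assumption: random_state is honoured only when it is an int
        # seed; any other value (e.g. a numpy RandomState) is ignored.
        if isinstance(random_state, int):
            rng = random.Random(random_state)
        else:
            rng = random
        return rng.randrange(self.low, self.high)


# Round-trip through pickle to check the object survives serialisation.
dist = PicklableRandInt(2, 3)
restored = pickle.loads(pickle.dumps(dist))
```

With something like this, params = {"rf__n_estimators": PicklableRandInt(2, 3)} should sample the same values as randint(2, 3) without materialising a full range of candidates.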
On 15 August 2015 at 05:37, Jason Sanchez <jason.sanchez.m...@statefarm.com> wrote:
> This code raises a PicklingError:
>
> from sklearn.datasets import load_boston
> from sklearn.pipeline import Pipeline
> from sklearn.ensemble import RandomForestRegressor
> from sklearn.grid_search import RandomizedSearchCV
> from sklearn.externals import joblib
> from scipy.stats import randint
>
> X, y = load_boston().data, load_boston().target
> pipe = Pipeline([("rf", RandomForestRegressor())])
> params = {"rf__n_estimators": randint(2,3)}
> random_search = RandomizedSearchCV(pipe, params, n_iter=1).fit(X, y)
> joblib.dump(random_search, "final_model.pkl", compress=3)
>
>
> In params, if randint(2,3) is changed to range(2,3), no pickling error
> occurs.
>
> In 0.16.2, changing all the parameters in a large grid search to ranges
> causes a memory error (due to all possible combinations being saved to an
> array), so this is not a workable solution.
>
> Pickling just the best_estimator_ works (which is now what I do), but
> currently there does not seem to be a way to pickle a gridsearch that has a
> large number of hyper-parameters (very common with RandomizedSearchCV) in
> 0.16.2.
>
> You all do amazing work. Thank you all so much for your contributions to
> the project.
>
> Jason
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>