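This is a known scipy deficiency: the frozen distribution returned by
randint(2, 3) is what fails to pickle, not the search object itself. See
https://github.com/scipy/scipy/pull/4821 and related issues.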
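
Until that is fixed, one possible workaround is sketched below.
RandomizedSearchCV only requires that each distribution expose an rvs
method, so a small module-level class can stand in for the frozen randint.
The class name IntRange is hypothetical (it is not part of scipy or
scikit-learn), and the sketch uses only the standard library. I have not
run it against 0.16.2, so treat it as a starting point rather than a
confirmed fix.

```python
import pickle
import random


class IntRange(object):
    """Hypothetical stand-in for scipy.stats.randint(low, high).

    Draws integers uniformly from [low, high). It is defined at module
    level and stores only two ints, so the pickle module can serialize it.
    """

    def __init__(self, low, high):
        if high <= low:
            raise ValueError("high must be greater than low")
        self.low = low
        self.high = high

    def rvs(self, random_state=None):
        # Some scikit-learn versions pass random_state; others call rvs()
        # with no arguments, so the parameter is optional.
        # Assumption: a supplied random_state is NumPy-style, i.e. its
        # randint(low, high) excludes the upper bound.
        if random_state is not None and hasattr(random_state, "randint"):
            return int(random_state.randint(self.low, self.high))
        return random.randrange(self.low, self.high)


# Round-trip through pickle to confirm the object survives serialization.
dist = IntRange(2, 3)
restored = pickle.loads(pickle.dumps(dist))
sample = restored.rvs()
```

With this in place, params = {"rf__n_estimators": IntRange(2, 3)} should
let joblib.dump serialize the fitted search, provided IntRange is
importable from the same module path when the file is loaded again.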

On 15 August 2015 at 05:37, Jason Sanchez <jason.sanchez.m...@statefarm.com>
wrote:

> This code raises a PicklingError:
>
> from sklearn.datasets import load_boston
> from sklearn.pipeline import Pipeline
> from sklearn.ensemble import RandomForestRegressor
> from sklearn.grid_search import RandomizedSearchCV
> from sklearn.externals import joblib
> from scipy.stats import randint
>
> X, y = load_boston().data, load_boston().target
> pipe = Pipeline([("rf", RandomForestRegressor())])
> params = {"rf__n_estimators": randint(2,3)}
> random_search = RandomizedSearchCV(pipe, params, n_iter=1).fit(X, y)
> joblib.dump(random_search, "final_model.pkl", compress=3)
>
>
> In params, if randint(2,3) is changed to range(2,3), no pickling error
> occurs.
>
> In 0.16.2, changing all the parameters in a large grid search to ranges
> causes a memory error (due to all possible combinations being saved to an
> array), so this is not a workable solution.
>
> Pickling just the best_estimator_ works (which is what I do now), but in
> 0.16.2 there does not seem to be a way to pickle a grid search over a
> large number of hyper-parameters (very common with RandomizedSearchCV).
>
> You all do amazing work. Thank you all so much for your contributions to
> the project.
>
> Jason
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>