[Scikit-learn-general] Gridsearch pickle error with scipy distributions

Jason Sanchez Fri, 14 Aug 2015 13:10:18 -0700

This code raises a PicklingError:

from sklearn.datasets import load_boston
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.grid_search import RandomizedSearchCV
from sklearn.externals import joblib
from scipy.stats import randint


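# Load the dataset once rather than calling load_boston() twice.
boston = load_boston()
X, y = boston.data, boston.target

pipe = Pipeline([("rf", RandomForestRegressor())])
# The search space is a frozen scipy.stats distribution, not a list.
params = {"rf__n_estimators": randint(2, 3)}
random_search = RandomizedSearchCV(pipe, params, n_iter=1).fit(X, y)
# Dump the whole fitted search object, not just the best estimator.
joblib.dump(random_search, "final_model.pkl", compress=3)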


In params, if randint(2,3) is changed to range(2,3), no pickling error occurs. 

In 0.16.2, changing all the parameters in a large grid search to ranges causes 
a memory error (due to all possible combinations being saved to an array), so 
this is not a workable solution.
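
To give a rough sense of why ranges blow up, here is a minimal sketch that only counts the combinations a set of ranges implies. The parameter names and bounds are hypothetical, chosen for illustration; they are not my actual grid, and the sketch does not touch scikit-learn at all.

```python
from math import prod  # available in Python 3.8+

# Hypothetical search space: every entry is a range, as in the attempted fix.
param_ranges = {
    "rf__n_estimators": range(10, 500),
    "rf__max_depth": range(1, 50),
    "rf__min_samples_split": range(2, 100),
    "rf__min_samples_leaf": range(1, 100),
}

# The full grid holds one entry per combination, so its size is the
# product of the individual range lengths.
n_combinations = prod(len(values) for values in param_ranges.values())
print(n_combinations)
```

Even four modest ranges give hundreds of millions of combinations, which is consistent with the memory error when they are all materialized.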

Pickling just best_estimator_ works, and that is what I do now. But in 0.16.2 
there does not seem to be a way to pickle the search object itself when it 
covers a large number of hyper-parameters (very common with RandomizedSearchCV). 
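
For reference, a minimal sketch of the best_estimator_ workaround. It makes several assumptions that differ from my setup above: it imports RandomizedSearchCV from sklearn.model_selection (the location in newer releases; 0.16.2 uses sklearn.grid_search), it uses synthetic data from make_regression instead of the Boston dataset, and it uses the standard pickle module in place of joblib.

```python
import pickle

import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV  # assumption: newer scikit-learn
from sklearn.pipeline import Pipeline

# Small synthetic regression problem (stand-in for the real data).
X, y = make_regression(n_samples=60, n_features=5, random_state=0)

pipe = Pipeline([("rf", RandomForestRegressor(random_state=0))])
params = {"rf__n_estimators": randint(2, 3)}  # randint(2, 3) can only draw 2
search = RandomizedSearchCV(pipe, params, n_iter=1, cv=3, random_state=0).fit(X, y)

# Pickle only the refitted pipeline, not the search object that holds
# the scipy distribution.
payload = pickle.dumps(search.best_estimator_)
restored = pickle.loads(payload)

# The restored pipeline should predict exactly like the fitted search.
same_predictions = np.array_equal(restored.predict(X), search.predict(X))
```

The trade-off is that everything else on the search object, such as the scores of the sampled candidates, is not saved.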

You all do amazing work. Thank you all so much for your contributions to the 
project.

Jason

