The problem is excessive memory consumption. When every parameter is given as a
discrete list, sampling without replacement was implemented by first building
the complete list of parameter settings, so memory use is O(number of possible
parameter settings).
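To put a number on it: the grid defined by the params dict in the quoted
message has one entry per combination, so its size is the product of the
lengths of its six ranges. A quick check (Python 3.8+ for math.prod; the dict
is copied from the message):

```python
import math

# The params dict from the quoted message
params = {
    "rf__n_estimators": range(10, 50),
    "rf__max_depth": range(5, 10),
    "rf__max_features": range(1, 5),
    "rf__min_samples_split": range(5, 101),
    "rf__min_samples_leaf": range(20, 50),
    "rf__max_leaf_nodes": range(200, 350),
}

# The full grid is the Cartesian product, so its size is the product of lengths
grid_size = math.prod(len(values) for values in params.values())
print(grid_size)  # 345600000
```

That is 345.6 million parameter dicts, each built by dict(zip(keys, v)) in the
innermost frame of the traceback and all held in memory by
list(ParameterGrid(...)), just to pick n_iter of them (10 by default).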

The fix was merged into master only hours ago. Feel free to work with master,
or to cherry-pick commit febefb0.
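For the curious: sampling without replacement only needs n_iter distinct
indices into the grid, and each index can be decoded into a parameter setting
on demand. Here is a minimal stdlib-only sketch of that idea.
sample_grid_without_replacement is a hypothetical helper written for
illustration; it is not scikit-learn's code and not necessarily what febefb0
does.

```python
import math
import random


def sample_grid_without_replacement(param_grid, n_iter, seed=None):
    """Return n_iter distinct settings from a discrete grid without listing it.

    Hypothetical helper for illustration; not scikit-learn's implementation.
    param_grid maps each parameter name to a finite sequence of values.
    """
    keys = sorted(param_grid)  # fixed key order so a seed is reproducible
    axes = [list(param_grid[k]) for k in keys]  # O(sum of lengths), not O(product)
    sizes = [len(axis) for axis in axes]
    grid_size = math.prod(sizes)
    if n_iter > grid_size:
        raise ValueError("n_iter is larger than the number of grid points")

    rng = random.Random(seed)
    samples = []
    # random.sample on a range draws distinct indices; in CPython, when n_iter
    # is much smaller than grid_size it keeps only the chosen indices in a set
    # instead of copying the whole range.
    for flat_index in rng.sample(range(grid_size), n_iter):
        params = {}
        # Decode the flat index as a mixed-radix number (last key varies
        # fastest), which maps 0..grid_size-1 one-to-one onto grid points.
        for key, axis, size in zip(reversed(keys), reversed(axes), reversed(sizes)):
            flat_index, i = divmod(flat_index, size)
            params[key] = axis[i]
        samples.append(params)
    return samples


if __name__ == "__main__":
    # The params dict from the quoted message: 345,600,000 grid points.
    params = {
        "rf__n_estimators": range(10, 50),
        "rf__max_depth": range(5, 10),
        "rf__max_features": range(1, 5),
        "rf__min_samples_split": range(5, 101),
        "rf__min_samples_leaf": range(20, 50),
        "rf__max_leaf_nodes": range(200, 350),
    }
    for setting in sample_grid_without_replacement(params, n_iter=10, seed=0):
        print(setting)
```

Time and memory here scale with n_iter plus the total length of the value
lists, rather than with the 345.6 million points of the full grid.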

On 25 June 2015 at 16:22, Jason Sanchez <jason.sanchez.m...@statefarm.com> wrote:

> This code that uses RandomizedSearchCV works fine in 0.15.2:
>
> from sklearn.pipeline import Pipeline
> from sklearn.datasets import load_iris
> from sklearn.ensemble import RandomForestClassifier
> from sklearn.grid_search import RandomizedSearchCV
>
> iris = load_iris()
> X = iris.data
> y = iris.target
>
> pipeline = Pipeline([("rf", RandomForestClassifier())])
>
> params = {  "rf__n_estimators": range(10,50),
>             "rf__max_depth": range(5,10),
>             "rf__max_features": range(1, 5),
>             "rf__min_samples_split": range(5,101),
>             "rf__min_samples_leaf": range(20,50),
>             "rf__max_leaf_nodes": range(200, 350)}
>
> random_search = RandomizedSearchCV(pipeline, params).fit(X, y)
>
>
> It does not work in 0.16.1: the call hangs, and when I kill the process I get this traceback:
> ---------------------------------------------------------------------------
> KeyboardInterrupt                         Traceback (most recent call last)
> <ipython-input-108-8794e7d30469> in <module>()
>      24 random_search = RandomizedSearchCV(pipeline, params, n_iter=n_iter_search, cv=2, refit=True, n_jobs=1)
>      25
> ---> 26 random_search.fit(X_iris, y_iris)
>
> /.../lib/python2.7/site-packages/sklearn/grid_search.pyc in fit(self, X, y)
>     896                                           self.n_iter,
>     897                                           random_state=self.random_state)
> --> 898         return self._fit(X, y, sampled_params)
>
> /.../lib/python2.7/site-packages/sklearn/grid_search.pyc in _fit(self, X, y, parameter_iterable)
>     503                                     self.fit_params, return_parameters=True,
>     504                                     error_score=self.error_score)
> --> 505                 for parameters in parameter_iterable
>     506                 for train, test in cv)
>     507
>
> /.../lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
>     656                 os.environ[JOBLIB_SPAWNED_PROCESS] = '1'
>     657             self._iterating = True
> --> 658             for function, args, kwargs in iterable:
>     659                 self.dispatch(function, args, kwargs)
>     660
>
> /.../lib/python2.7/site-packages/sklearn/grid_search.pyc in <genexpr>(***failed resolving arguments***)
>     499             pre_dispatch=pre_dispatch
>     500         )(
> --> 501             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
>     502                                     train, test, self.verbose, parameters,
>     503                                     self.fit_params, return_parameters=True,
>
> /.../lib/python2.7/site-packages/sklearn/grid_search.pyc in __iter__(self)
>     180         if all_lists:
>     181             # get complete grid and yield from it
> --> 182             param_grid = list(ParameterGrid(self.param_distributions))
>     183             grid_size = len(param_grid)
>     184
>
> /.../lib/python2.7/site-packages/sklearn/grid_search.pyc in __iter__(self)
>     100                 keys, values = zip(*items)
>     101                 for v in product(*values):
> --> 102                     params = dict(zip(keys, v))
>     103                     yield params
>     104
>
> KeyboardInterrupt:
>
>
> Any thoughts?
>
>
> ------------------------------------------------------------------------------
> Monitor 25 network devices or servers for free with OpManager!
> OpManager is web-based network management software that monitors
> network devices and physical & virtual servers, alerts via email & sms
> for fault. Monitor 25 devices for free with no restriction. Download now
> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>