Hi Emanuele,

I cannot make any sense of the stack trace. Have you tried running GridSearchCV with n_jobs=1? If so, does it work OK?
When do you write the parameters into the DB - at the end of ``score``? It would be great if you could provide a gist which exposes the problem.

BTW: it's strange that grid_search.py is using your "global" joblib (/usr/lib/pymodules/python2.7/joblib/) and not the one that comes with sklearn (in sklearn/externals/joblib).

best,
 Peter

2012/6/22 Emanuele Olivetti <[email protected]>:
> Dear All,
>
> After my previous question on how to use GridSearchCV with
> boosting (thanks Peter and Andy!), here is another related one.
>
> I'd like to keep a detailed log of the results of each fold
> of each assignment to the parameters during model selection
> via GridSearchCV. Peter suggested redefining the .score()
> function of the regressor of interest so I am going along
> that direction. Note that I am heavily using the parallelization
> capabilities of GridSearchCV, i.e. joblib, so each evaluation of
> the regressor lies in a different process.
>
> I know that sharing a global variable, e.g. a list, where to
> put all the results of the classifier instances does not work here
> because each process has its own copy of that variable - in short
> global variables do not work with multiprocessing. My current attempt is with
> sqlite + sqlalchemy + pickle, i.e. writing a very short layer
> by which the detailed results of each GridSearchCV step
> are first pickled and then transparently mapped into a sqlite db thanks
> to sqlalchemy. SQLite can handle concurrent writing of different
> processes to the same db... so this solution should work. But it does not...
>
> Unfortunately I get this exception:
> ---
> /tmp/python-15120YlP.py in <module>()
>      56 clf = GridSearchCV(GradientBoostingRegressor(loss='ls', random_state=seed),
>             param_grid=parameters, loss_func=None, n_jobs=-1, cv=n_folds, verbose=10)
>      57
> ---> 58 clf.fit(X, y)
>      59
>
> /usr/lib/pymodules/python2.7/sklearn/grid_search.pyc in fit(self, X, y, **params)
>     396                 X, y, base_clf, clf_params, train, test, self.loss_func,
>     397                 self.score_func, self.verbose, **self.fit_params)
> --> 398                     for clf_params in grid for train, test in cv)
>     399
>     400         # Out is a list of triplet: score, estimator, n_test_samples
>
> /usr/lib/pymodules/python2.7/joblib/parallel.pyc in __call__(self, iterable)
>     473                 self.dispatch(function, args, kwargs)
>     474
> --> 475             self.retrieve()
>     476             # Make sure that we get a last message telling us we are done
>     477             elapsed_time = time.time() - self._start_time
>
> /usr/lib/pymodules/python2.7/joblib/parallel.pyc in retrieve(self)
>     425                     # Convert this to a JoblibException
>     426                     exception_type = _mk_exception(exception.etype)[0]
> --> 427                     raise exception_type(report)
>     428                 raise exception
>     429
>
> JoblibUnmappedInstanceError: JoblibUnmappedInstanceError
> ___________________________________________________________________________
> Class '__builtin__.unicode' is not mapped
> ___________________________________________________________________________
> -----------
> which I do not understand. The issue is triggered by "session.add(result)"
> that lies within the overridden score function of the regressor
> as suggested by Peter in the previous thread.
>
> I understand all this could sound confusing and not well explained.
> I am working on a minimal example to expose the issue. In the meanwhile
> I am trying this first brief attempt to capture your interest on
> this problem. Maybe some of you immediately see the issue without
> further explanation. If you spot it, could you please explain?
>
> BTW, is there a preferred way to communicate between the subprocesses
> and the parent process within sklearn?
>
> Best,
>
> Emanuele

-- 
Peter Prettenhofer

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
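
On the question of getting per-fold results from the worker processes back to the parent, here is a minimal sketch, not the setup from the thread. It assumes the standard-library sqlite3 module in place of SQLAlchemy, and the names init_db, log_result, load_results and evaluate, as well as the "results" table, are hypothetical. Each call opens its own connection, so no connection is shared across processes, and the pickled result is stored as a BLOB in a row of the table. As far as I know, SQLAlchemy raises UnmappedInstanceError when session.add() receives an object whose class has no mapping, so one possible reading of "Class '__builtin__.unicode' is not mapped" is that a unicode string reached session.add() directly instead of an instance of a mapped class. That is only a guess without the actual code.

```python
import os
import pickle
import sqlite3
import tempfile
from multiprocessing import Pool


def init_db(path):
    """Create the (hypothetical) results table once, in the parent process."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success
            conn.execute(
                "CREATE TABLE IF NOT EXISTS results (pid INTEGER, payload BLOB)"
            )
    finally:
        conn.close()


def log_result(path, result):
    """Pickle `result` and append it to the table.

    A fresh connection is opened on every call, so no connection object
    is ever shared between processes.
    """
    payload = sqlite3.Binary(pickle.dumps(result, pickle.HIGHEST_PROTOCOL))
    # The timeout makes a writer wait if another process holds the lock.
    conn = sqlite3.connect(path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO results (pid, payload) VALUES (?, ?)",
                (os.getpid(), payload),
            )
    finally:
        conn.close()


def load_results(path):
    """Read back and unpickle every logged result, in insertion order."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT payload FROM results ORDER BY rowid"
        ).fetchall()
    finally:
        conn.close()
    return [pickle.loads(bytes(row[0])) for row in rows]


def evaluate(args):
    """Stand-in for one (parameters, fold) evaluation; the score is fake."""
    path, params, fold = args
    result = {"params": params, "fold": fold, "score": 0.0}
    log_result(path, result)
    return fold


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "results.sqlite")
    init_db(db_path)
    jobs = [(db_path, {"n_estimators": n}, fold)
            for n in (50, 100) for fold in range(3)]
    with Pool(2) as pool:
        pool.map(evaluate, jobs)
    print(len(load_results(db_path)))
```

In the GridSearchCV setting, the body of evaluate would correspond to whatever the overridden score() does before returning; only the parent process calls load_results once fit() has finished.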
