Hi Emanuele,

I cannot make any sense of the stack trace. Have you tried running
GridSearchCV with n_jobs=1? If so, does it work OK?

When do you write the parameters into the DB - at the end of ``score``?

It would be great if you could provide a gist which exposes the problem.

BTW: it's strange that grid_search.py is using your "global" joblib
(/usr/lib/pymodules/python2.7/joblib/) and not the one that comes with
sklearn (in sklearn/externals/joblib).
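To check which joblib the interpreter actually picks up, a small Python 3 sketch using only the standard library may help. `module_origin` is a hypothetical helper name. On Python 2.7, as in your traceback, `import joblib; print(joblib.__file__)` gives the same information.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    find_spec() locates the module without running its code, so this is a
    cheap way to see whether a system-wide copy (e.g. of joblib) shadows
    another one.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the module is not importable at all
    return spec.origin


if __name__ == "__main__":
    for name in ("json", "joblib"):
        print(name, "->", module_origin(name))
```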

best,
 Peter


2012/6/22 Emanuele Olivetti <[email protected]>:
> Dear All,
>
> After my previous question on how to use GridSearchCV with
> boosting (thanks Peter and Andy!), here is another related one.
>
> I'd like to keep a detailed log of the results of each fold
> for each parameter setting during model selection
> via GridSearchCV. Peter suggested redefining the .score()
> function of the regressor of interest, so I am going in
> that direction. Note that I am heavily using the parallelization
> capabilities of GridSearchCV, i.e. joblib, so each evaluation of
> the regressor runs in a different process.
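The "each process has its own copy" effect is easy to reproduce with a stdlib-only sketch. It assumes Python 3 and the POSIX "fork" start method, and `evaluate` is a hypothetical stand-in for one fold evaluation, not sklearn code. Each worker appends to its own copy of a module-level list, so the parent's list stays empty; only return values travel back.

```python
import multiprocessing

# Module-level list that every worker appends to.
RESULTS = []


def evaluate(fold):
    """Hypothetical stand-in for one fold evaluation; runs in a worker."""
    RESULTS.append(fold)  # mutates the worker's own copy of RESULTS
    return len(RESULTS)   # size of this worker's copy


def run_demo(n_folds=4):
    # "fork" (POSIX only) is assumed so the demo does not depend on the
    # platform default; each child starts with a *copy* of RESULTS.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=2) as pool:
        lengths = pool.map(evaluate, range(n_folds))
    return lengths, list(RESULTS)


if __name__ == "__main__":
    lengths, parent_results = run_demo()
    print("returned from workers:", lengths)
    print("RESULTS in parent:", parent_results)  # empty: nothing came back
```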
>
> I know that sharing a global variable, e.g. a list in which to
> collect the results of all the classifier instances, does not work here
> because each process has its own copy of that variable; in short,
> global variables do not work with multiprocessing. My current attempt uses
> sqlite + sqlalchemy + pickle, i.e. a very short layer
> by which the detailed results of each GridSearchCV step
> are first pickled and then transparently mapped into a sqlite db thanks
> to sqlalchemy. SQLite can handle concurrent writes from different
> processes to the same db, so this solution should work. But it does not...
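For comparison, here is a minimal sketch of the sqlite route with the standard-library sqlite3 module swapped in for SQLAlchemy. It assumes Python 3 and POSIX "fork"; the table layout, `log_result`, `evaluate`, `run_logged_search` and the placeholder score are all hypothetical. What it illustrates is that every process opens its own connection (SQLite's documentation warns against using a connection across fork()), and that `timeout` lets a writer wait for another writer's lock instead of failing.

```python
import multiprocessing
import os
import pickle
import sqlite3
import tempfile

SCHEMA = "CREATE TABLE IF NOT EXISTS results (pid INTEGER, payload BLOB)"


def log_result(db_path, record):
    """Append one pickled record from whichever process calls this."""
    # Fresh connection opened in the calling process, never inherited
    # across fork(). timeout= waits up to 30 s for another writer's lock.
    conn = sqlite3.connect(db_path, timeout=30.0)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO results (pid, payload) VALUES (?, ?)",
                (os.getpid(), pickle.dumps(record)),
            )
    finally:
        conn.close()


def evaluate(task):
    """Hypothetical stand-in for an overridden score() that logs details."""
    db_path, params, fold = task
    score = float(fold)  # placeholder for a real score computation
    log_result(db_path, {"params": params, "fold": fold, "score": score})
    return score


def run_logged_search(db_path, n_procs=3):
    # Create the table once, in the parent, before any worker starts.
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(SCHEMA)
    conn.close()

    tasks = [(db_path, {"max_depth": d}, fold)
             for d in (1, 2, 3) for fold in range(3)]
    ctx = multiprocessing.get_context("fork")  # POSIX-only assumption
    with ctx.Pool(processes=n_procs) as pool:
        pool.map(evaluate, tasks)

    # Read everything back in the parent and unpickle the payloads.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT payload FROM results").fetchall()
    finally:
        conn.close()
    return [pickle.loads(row[0]) for row in rows]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "gridsearch_log.sqlite")
    for record in run_logged_search(path):
        print(record)
```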
>
> Unfortunately I get this exception:
> ---
> /tmp/python-15120YlP.py in <module>()
>      56     clf = GridSearchCV(GradientBoostingRegressor(loss='ls', random_state=seed), param_grid=parameters, loss_func=None, n_jobs=-1, cv=n_folds, verbose=10)
>      57
> ---> 58     clf.fit(X, y)
>      59
>
> /usr/lib/pymodules/python2.7/sklearn/grid_search.pyc in fit(self, X, y, **params)
>     396                 X, y, base_clf, clf_params, train, test, self.loss_func,
>     397                 self.score_func, self.verbose, **self.fit_params)
> --> 398                     for clf_params in grid for train, test in cv)
>     399
>     400         # Out is a list of triplet: score, estimator, n_test_samples
>
> /usr/lib/pymodules/python2.7/joblib/parallel.pyc in __call__(self, iterable)
>     473                 self.dispatch(function, args, kwargs)
>     474
> --> 475             self.retrieve()
>     476             # Make sure that we get a last message telling us we are done
>     477             elapsed_time = time.time() - self._start_time
>
> /usr/lib/pymodules/python2.7/joblib/parallel.pyc in retrieve(self)
>     425                     # Convert this to a JoblibException
>     426                     exception_type = _mk_exception(exception.etype)[0]
> --> 427                     raise exception_type(report)
>     428                 raise exception
>     429
>
> JoblibUnmappedInstanceError: JoblibUnmappedInstanceError
> ___________________________________________________________________________
> Class '__builtin__.unicode' is not mapped
> ___________________________________________________________________________
> -----------
> which I do not understand. The issue is triggered by "session.add(result)",
> which lies within the overridden score function of the regressor,
> as suggested by Peter in the previous thread.
>
> I understand all this may sound confusing and not well explained.
> I am working on a minimal example to expose the issue. In the meantime
> I am sending this brief first description to get your interest in
> this problem. Maybe some of you will immediately see the issue without
> further explanation. If you spot it, could you please explain?
>
> BTW, is there a preferred way within sklearn to communicate between the
> subprocesses and the parent process?
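One common pattern, sketched here under the same assumptions (Python 3, stdlib only, POSIX "fork", hypothetical names; not an sklearn API): keep the workers free of side effects and let them return the detailed record. The return values are pickled and sent back to the parent, which can then write everything to the DB from a single process, so there are no concurrent writers at all. GridSearchCV itself collects its results this way, as the comment in the traceback shows ("Out is a list of triplet: score, estimator, n_test_samples").

```python
import multiprocessing


def evaluate(task):
    """Hypothetical per-fold job: return the details instead of storing them."""
    params, fold = task
    score = float(fold)  # placeholder for a real fit/score step
    return {"params": params, "fold": fold, "score": score}


def run_and_collect(tasks, n_procs=2):
    # "fork" (POSIX only) is assumed to keep the sketch short.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=n_procs) as pool:
        # Each dict is pickled in a worker and unpickled in the parent;
        # pool.map returns the results in the same order as tasks.
        return pool.map(evaluate, tasks)


if __name__ == "__main__":
    tasks = [({"max_depth": d}, fold) for d in (1, 2) for fold in range(3)]
    for record in run_and_collect(tasks):
        print(record)  # the parent can now insert these into the DB
```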
>
> Best,
>
> Emanuele
>
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



-- 
Peter Prettenhofer
