[Scikit-learn-general] GridSearchCV/joblib and concurrent writing

Emanuele Olivetti Fri, 22 Jun 2012 08:12:54 -0700

Dear All,

After my previous question on how to use GridSearchCV with
boosting (thanks Peter and Andy!), here is another related one.


I'd like to keep a detailed log of the results of each fold
for each parameter setting during model selection
via GridSearchCV. Peter suggested redefining the .score()
method of the regressor of interest, so I am going in
that direction. Note that I am heavily using the parallelization
capabilities of GridSearchCV, i.e. joblib, so each evaluation of
the regressor runs in a different process.
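
To make the idea concrete, here is a minimal single-process sketch of
overriding .score() so that every call is also recorded. MeanRegressor
is a hypothetical stand-in for the real regressor (it is not part of
scikit-learn), and the `log` attribute is an assumed hook, not an
existing API.

```python
class MeanRegressor:
    """Hypothetical stand-in for a scikit-learn regressor (not sklearn API)."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

    def score(self, X, y):
        # Negative mean squared error, so that larger is better.
        predictions = self.predict(X)
        return -sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)


class LoggingRegressor(MeanRegressor):
    """Same regressor, but every call to score() is also recorded."""

    def __init__(self, log=None):
        # `log` is any object with an append() method (assumed hook).
        self.log = [] if log is None else log

    def score(self, X, y):
        value = super().score(X, y)
        self.log.append({"n_test_samples": len(y), "score": value})
        return value


X_train, y_train = [[0], [1], [2]], [1.0, 2.0, 3.0]
X_test, y_test = [[3], [4]], [2.0, 4.0]

model = LoggingRegressor().fit(X_train, y_train)
print(model.score(X_test, y_test))
print(model.log)
```

The override only wraps the parent's score() and passes its return
value through unchanged, so model selection sees the same number.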

I know that sharing a global variable, e.g. a list, in which to
collect the results of all the classifier instances does not work here,
because each process has its own copy of that variable. In short,
global variables do not work with multiprocessing. My current attempt is with
sqlite + sqlalchemy + pickle, i.e. writing a very thin layer
by which the detailed results of each GridSearchCV step
are first pickled and then transparently mapped into a sqlite db thanks
to sqlalchemy. SQLite can handle concurrent writes from different
processes to the same db, so this solution should work. But it does not.

Unfortunately I get this exception:
---
/tmp/python-15120YlP.py in <module>()
      56     clf = GridSearchCV(GradientBoostingRegressor(loss='ls', random_state=seed), param_grid=parameters, loss_func=None, n_jobs=-1, cv=n_folds, verbose=10)
      57
---> 58     clf.fit(X, y)
      59

/usr/lib/pymodules/python2.7/sklearn/grid_search.pyc in fit(self, X, y, **params)
     396                 X, y, base_clf, clf_params, train, test, self.loss_func,
     397                 self.score_func, self.verbose, **self.fit_params)
--> 398                     for clf_params in grid for train, test in cv)
     399
     400         # Out is a list of triplet: score, estimator, n_test_samples


/usr/lib/pymodules/python2.7/joblib/parallel.pyc in __call__(self, iterable)
     473                 self.dispatch(function, args, kwargs)
     474
--> 475             self.retrieve()
     476             # Make sure that we get a last message telling us we are done
     477             elapsed_time = time.time() - self._start_time

/usr/lib/pymodules/python2.7/joblib/parallel.pyc in retrieve(self)
     425                     # Convert this to a JoblibException
     426                     exception_type = _mk_exception(exception.etype)[0]
--> 427                     raise exception_type(report)
     428                 raise exception
     429

JoblibUnmappedInstanceError: JoblibUnmappedInstanceError
___________________________________________________________________________
Class '__builtin__.unicode' is not mapped
___________________________________________________________________________
-----------
which I do not understand. The issue is triggered by "session.add(result)",
which lies within the overridden score function of the regressor,
as suggested by Peter in the previous thread.

I understand all this may sound confusing and not well explained.
I am working on a minimal example to expose the issue. In the meantime,
I am making this first brief attempt to capture your interest in
this problem. Maybe some of you will immediately see the issue without
further explanation. If you spot it, could you please explain?

BTW, is there a preferred way to communicate between the subprocesses
and the parent process within sklearn?

Best,

Emanuele




_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
