Greetings,
I wanted to ask the developers about an issue with scikit-learn's
parallelism. I previously filed it as an issue on GitHub
(https://github.com/scikit-learn/scikit-learn/issues/3754) but have not heard
anything back.
The gist of the issue is that scikit-learn currently cannot nest parallel
tasks. That is, if I run a grid search over a classifier that supports
parallelism via the n_jobs argument, I can't set both the classifier's
n_jobs > 1 and the grid search's n_jobs > 1. Variations of this limitation
show up throughout the package.
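The root cause is that Python's multiprocessing module does not allow a new
process Pool to be created inside a pool worker: pool workers are daemonic
processes, and daemonic processes are not allowed to have children.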
So my proposal is to change the APIs by adding a new parameter, "delayed"
(default False), to everything that accepts the n_jobs parameter.
If delayed is False, the call runs exactly as it does today.
If delayed is True, then rather than running the task and returning the
result, the call returns a (delayed_jobs, completion_func) tuple.
delayed_jobs would be a tuple of joblib.delayed objects, which could be run
by a joblib.Parallel() set up by the caller. The results of the delayed
jobs could then be passed into completion_func to finish the computation
and return the expected result.
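To make the proposed contract concrete, here is a minimal sketch of an
n_jobs-aware function that honours a `delayed` flag. Everything in it is
hypothetical: `sum_of_squares` is a toy stand-in for an estimator method,
`delayed_call` mimics joblib.delayed by producing (function, args, kwargs)
triples, and `run_parallel` is a small thread-pool stand-in for
joblib.Parallel, used only so the sketch runs with the standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def delayed_call(func):
    """Stand-in for joblib.delayed: capture a call as a (func, args, kwargs) triple."""
    def wrapper(*args, **kwargs):
        return func, args, kwargs
    return wrapper


def run_parallel(jobs, n_jobs=1):
    """Stand-in for joblib.Parallel(n_jobs=...)(jobs); results keep input order."""
    jobs = list(jobs)
    if n_jobs == 1:
        return [func(*args, **kwargs) for func, args, kwargs in jobs]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(func, *args, **kwargs) for func, args, kwargs in jobs]
        return [fut.result() for fut in futures]


def _partial_sum_sq(chunk):
    # One independent unit of work that could run in any worker.
    return sum(x * x for x in chunk)


def sum_of_squares(data, n_jobs=1, delayed=False, n_chunks=4):
    """Hypothetical n_jobs-aware function implementing the proposed `delayed` flag."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    delayed_jobs = tuple(delayed_call(_partial_sum_sq)(c) for c in chunks)

    def completion_func(results):
        # Combine the per-job results into the value a normal call returns.
        return sum(results)

    if delayed:
        # Hand the work back to the caller instead of running it here.
        return delayed_jobs, completion_func
    return completion_func(run_parallel(delayed_jobs, n_jobs=n_jobs))


data = list(range(10))
eager = sum_of_squares(data, n_jobs=2)
jobs, finish = sum_of_squares(data, delayed=True)
lazy = finish(run_parallel(jobs, n_jobs=2))
print(eager, lazy)  # 285 285
```

Both paths compute the same value; the only difference is who owns the
pool. Note that in real code a parameter literally named `delayed` would
shadow an imported joblib.delayed inside the function body, which is why
the helper here has a different name.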
This would allow the caller to obtain multiple sets of delayed_jobs, run
them all together in a single parallel pool, and then pass each set of
results back to its respective completion_func to finish up. Effectively,
this would make it possible to nest parallelism.
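The caller-side gather/run/split step could look roughly like the sketch
below. It assumes jobs are (function, args, kwargs) triples like those
produced by joblib.delayed; `run_nested` and `power_sum_delayed` are
hypothetical names, and `run_parallel` is a thread-pool stand-in for
joblib.Parallel so the example runs with the standard library alone.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(jobs, n_jobs=1):
    """Stand-in for joblib.Parallel(n_jobs=n_jobs)(jobs); jobs are (func, args, kwargs)."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(f, *a, **kw) for f, a, kw in jobs]
        return [fut.result() for fut in futures]


def run_nested(delayed_results, n_jobs=1):
    """Run several (delayed_jobs, completion_func) pairs in one parallel pass.

    All jobs are flattened into a single list and executed together, then the
    flat result list is split back so each completion_func sees only its own results.
    """
    flat_jobs, sizes = [], []
    for jobs, _ in delayed_results:
        flat_jobs.extend(jobs)
        sizes.append(len(jobs))
    flat_results = run_parallel(flat_jobs, n_jobs=n_jobs)
    outputs, start = [], 0
    for (_, completion_func), size in zip(delayed_results, sizes):
        outputs.append(completion_func(flat_results[start:start + size]))
        start += size
    return outputs


def _power_sum(chunk, power):
    return sum(x ** power for x in chunk)


def power_sum_delayed(data, power, n_chunks=2):
    """Hypothetical inner call made with delayed=True: returns jobs + completion_func."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    jobs = tuple((_power_sum, (data[i:i + size], power), {})
                 for i in range(0, len(data), size))
    return jobs, sum  # summing the partial results finishes the computation


data = [1, 2, 3, 4]
# The outer level (e.g. a grid search) gathers every candidate's jobs...
candidates = [power_sum_delayed(data, p) for p in (1, 2, 3)]
# ...and runs all 6 inner jobs in one pool instead of nesting pools.
print(run_nested(candidates, n_jobs=4))  # [10, 30, 100]
```

The caller only needs to remember how many jobs each set contributed so the
flat result list can be sliced back apart; this relies on the runner
returning results in submission order, which joblib.Parallel does.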
Let me know if there's any interest in pursuing a solution to this problem;
if so, I'd be willing to work on the PR.
Thanks,
Dan Spitz
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general