Hi Satra.
Thanks for your comments.
Can you explain what the "grab an engine" strategy means?
Does it mean that the jobs are distributed to the engines before any
of them start, rather than being held in a queue?
This should be ok if my jobs and my engines are pretty homogeneous, right?
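To make the question concrete, here is a toy sketch of the two
scheduling styles as I understand them. It is only a simulation: the
function names and job durations are made up, and it assumes that
"static" means round-robin assignment up front while "queued" means
each job goes to whichever engine becomes free first.

```python
import heapq


def static_makespan(durations, n_engines):
    """Total runtime when jobs are assigned round-robin before anything runs."""
    loads = [0.0] * n_engines
    for i, duration in enumerate(durations):
        loads[i % n_engines] += duration
    return max(loads)


def queued_makespan(durations, n_engines):
    """Total runtime when each job is handed to the engine that frees up first."""
    free_at = [0.0] * n_engines  # time at which each engine becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available engine
        heapq.heappush(free_at, start + duration)
    return max(free_at)
```

With equal durations such as [1, 1, 1, 1] on two engines both functions
give the same total runtime, which is why I would expect the static
strategy to be fine for homogeneous jobs. With mixed durations such as
[5, 1, 5, 1] the queued version finishes earlier.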
The main question for me is if there is an easy, non-intrusive way to
let my interface interact with sklearn.
As far as I can tell, the work that is planned for the sprint is much
more "low level" than what joblib is doing at the moment.
There are two use cases for me: model selection and ensemble learning.
For both of them, parallelisation should be fairly easy.
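A minimal sketch of what I mean for model selection, assuming every
parameter setting can be evaluated independently. The `evaluate`
function is a hypothetical stand-in for fitting and scoring one model,
and `select_best` takes any map-like callable, so the builtin `map`
could later be swapped for something like the `map_sync` method of an
IPython view (untested here).

```python
from itertools import product


def param_grid(grid):
    """Expand {'a': [1, 2], 'b': [3]} into a list of parameter dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[key] for key in keys))]


def evaluate(params):
    """Hypothetical stand-in for fitting and scoring one model."""
    score = -(params["C"] - 1.0) ** 2 - (params["gamma"] - 0.1) ** 2
    return params, score


def select_best(map_func, grid):
    """Score every setting through map_func and return the best (params, score)."""
    results = list(map_func(evaluate, param_grid(grid)))
    return max(results, key=lambda result: result[1])
```

Calling `select_best(map, {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]})`
runs everything serially. Ensemble learning has the same shape, with one
task per ensemble member instead of one per parameter setting.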
I do have a shared file system and some engines might share memory, but
I don't really want to go there.
The runtime of my jobs is much longer than the memory transfer over the
network would be, so I think I'd be happy with just using the IPython
library.
Unfortunately, I can't come to PyCon, and I need a working solution
before that ;)
Well, I actually have a working solution for forests, but it's not good
enough ^^
Cheers,
Andy
On 01/27/2012 04:00 PM, Satrajit Ghosh wrote:
hi andreas,
a few notes:
- a sprint planned for pycon will be looking at parallel computing
with scikit-learn and ipython (http://wiki.ipython.org/PyCon12Sprint)
- ipython currently uses a "grab an engine and don't release it"
strategy in the context of distributed systems like SGE/PBS/LSF. this
implies that the load distribution happens at engine instantiation
time, not at execution time. depending on your cluster this may be a
positive or a negative thing.
- in nipype we do distributed computing by offering the ability to use
ipython as the point of distribution or directly interfacing with the
cluster engine. here is the ipython plugin:
https://github.com/nipy/nipype/blob/master/nipype/pipeline/plugins/ipythonxi.py
- there is also a python library called soma-workflow that offers
a python interface to distributed computing using drmaa.
the key decision about which route to take will depend on how the data
gets to the compute node (whether by files, pickling, or shared
memory), and on whether the file system is shared or the data is
moved between processes.
cheers,
satra
On Fri, Jan 27, 2012 at 9:44 AM, Andreas <[email protected]> wrote:
Hi everybody.
This question basically goes out to Gael, but might also be
interesting for others.
I am using sklearn on an SGE cluster at the moment and it is not
as nice as it could be.
So I was wondering whether there would be a non-intrusive way to make
sklearn parallelize over the cluster.
At the moment all parallelism is handled by joblib. On the other
hand, it seems IPython can talk to the SGE scheduler.
So I would love to have a way for joblib to talk to IPython.
Is there an easy way to make this possible?
I was thinking about monkey-patching the Parallel class to use
"LoadBalancedView" from IPython.
Do you think this is feasible?
Another question is whether there are additional assumptions made
by sklearn about the way the parallelism works.
IPython basically provides a "map" interface similar to "Parallel",
so I would hope that there are no problems. Do you think there will be?
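A rough sketch of the idea, not a tested patch. It assumes, as far as I
understand joblib, that "Parallel" is called with an iterable of
(function, args, kwargs) tasks. `MapParallel`, `delayed` and `_run_task`
below are illustrative stand-ins written for this example, not joblib
or IPython code.

```python
def delayed(function):
    """Stand-in for a joblib-style delayed(): capture a call as a task tuple."""
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture


def _run_task(task):
    """Execute one (function, args, kwargs) task and return its result."""
    function, args, kwargs = task
    return function(*args, **kwargs)


class MapParallel(object):
    """Parallel-like callable that delegates to any map-style function.

    map_func could be the builtin map for serial execution, or (as an
    assumption, untested here) the map_sync method of an IPython
    LoadBalancedView.
    """

    def __init__(self, map_func):
        self.map_func = map_func

    def __call__(self, tasks):
        # Results come back in task order, like a plain map.
        return list(self.map_func(_run_task, list(tasks)))
```

For example, `MapParallel(map)(delayed(pow)(2, n) for n in range(4))`
runs the tasks serially. With a remote map, the task functions and
their arguments would also have to be picklable and importable on the
engines.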
Any help would be welcome.
If I actually get this to work, I feel this might be quite a success
story for sklearn ;)
Cheers,
Andy
------------------------------------------------------------------------------
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft
developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3,
MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
_______________________________________________
Scikit-learn-general mailing list
[email protected]
<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general