hi andreas,
when you launch ipcluster on SGE, for example, it queues up a set of python
engines as jobs. these jobs get distributed to the SGE execution pool
depending on its current job distribution. the key point to note here is that
no real job execution (in your case, forests) has taken place yet. the
controller then sends jobs to these engines.
if you have 1000s of nodes with few users and a light load on the cluster,
this is not a problem at all. but if you have few nodes and lots of users,
this is a no go, because the engines hold on to their slots whether or not
they have work to do.
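
to put rough numbers on that, here is a toy calculation (plain python, not
IPython or SGE code). the function name and the figures are made up for
illustration; it only compares the slot time reserved by long-lived engines
with the slot time actually spent on tasks.

```python
# Toy model, not IPython or SGE code: engines reserve cluster slots for the
# whole session, while tasks only occupy them for part of that time.

def slot_usage(n_engines, session_hours, task_hours):
    """Return (reserved, used, idle_fraction), with times in slot-hours."""
    reserved = n_engines * session_hours   # slots held by the engines
    used = sum(task_hours)                 # time spent on real work
    idle_fraction = 1.0 - used / reserved  # share of reserved time wasted
    return reserved, used, idle_fraction


# Hypothetical numbers: 8 engines held for 10 hours, 20 tasks of 2 hours each.
reserved, used, idle = slot_usage(8, 10.0, [2.0] * 20)
print(reserved, used, idle)
```

on a lightly loaded cluster the idle share costs nobody anything; on a busy
one it is time other users' jobs spend waiting in the queue.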
i would think that joblib would be the best place to add this distributed
functionality.
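
a minimal sketch of what such a hook could look like, assuming a "view"
object with a map_sync(func, iterable) method, as IPython's LoadBalancedView
provides. run_tasks and FakeView are hypothetical names for illustration and
are not part of joblib or IPython; FakeView only stands in for a real view so
the example runs without any engines.

```python
def run_tasks(func, arg_list, view=None):
    """Apply func to every item, serially or through an IPython-style view.

    `view` is assumed to expose map_sync(func, iterable); with view=None
    the tasks simply run one after another in the current process.
    """
    if view is None:
        return [func(arg) for arg in arg_list]
    return list(view.map_sync(func, arg_list))


class FakeView:
    """Stand-in for a cluster view; runs everything locally."""

    def map_sync(self, func, iterable):
        return [func(item) for item in iterable]


def square(x):
    return x * x


print(run_tasks(square, range(5)))                   # serial fallback
print(run_tasks(square, range(5), view=FakeView()))  # through the "view"
```

with a running cluster, the view would come from something like
Client().load_balanced_view() in IPython.parallel instead of FakeView.
keeping the dispatch behind one function means calling code does not need to
know which backend is in use.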
cheers,
satra
On Fri, Jan 27, 2012 at 10:48 AM, Andreas <[email protected]> wrote:
> Hi Satra.
> Thanks for your comments.
> Can you explain what the "grab an engine" strategy means?
> Does it mean that you distribute the jobs to the engines before starting
> any of them, rather than keeping them in a queue?
> This should be ok if my jobs and my engines are pretty homogeneous, right?
>
> The main question for me is if there is an easy, non-intrusive way to let
> my interface interact with sklearn.
>
> As far as I can tell, the work that is planned for the sprint is much more
> "low level" than what joblib is doing at the moment.
> There are two use cases for me: model selection and ensemble learning.
> For both of them, parallelisation should be fairly easy.
>
> I do have a shared file system and some engines might share memory but I
> don't really want to go there.
> The runtime of my jobs is much longer than memory transfer over the
> network would be, so I think
> I'd be happy with just using the IPython library.
>
> Unfortunately, I can't come to Pycon. And I need a working solution before
> that ;)
> Well I actually have a working solution for forests but it's not good
> enough ^^
>
> Cheers,
> Andy
>
>
>
>
> On 01/27/2012 04:00 PM, Satrajit Ghosh wrote:
>
> hi andreas,
>
> a few notes:
>
> - a sprint planned for pycon will be looking at parallel computing with
> scikit-learn and ipython (http://wiki.ipython.org/PyCon12Sprint)
>
> - ipython currently uses a "grab an engine and do not release it" strategy in
> the context of distributed systems like SGE/PBS/LSF. this implies that the
> load distribution happens at engine instantiation time not at execution
> time. depending on your cluster this may be a positive or a negative thing.
>
> - in nipype we do distributed computing by offering the ability to use
> ipython as the point of distribution or directly interfacing with the
> cluster engine. here is the ipython plugin:
>
>
> https://github.com/nipy/nipype/blob/master/nipype/pipeline/plugins/ipythonxi.py
>
> - there is also a python library called soma-workflow that offers a
> python interface to distributed computing using drmaa.
>
> which route to choose will depend on how the data gets
> to the compute node (by files, pickling, or shared memory),
> and on whether the file system is shared or the data is moved
> between processes.
>
> cheers,
>
> satra
>
>
> On Fri, Jan 27, 2012 at 9:44 AM, Andreas <[email protected]> wrote:
>
>> Hi everybody.
>> This question basically goes out to Gael, but might also be interesting
>> for others.
>> I am using sklearn on an SGE cluster at the moment and it is not as nice
>> as it could be.
>> So I was wondering whether there would be a non-intrusive way to make
>> sklearn
>> parallelize over the cluster.
>> At the moment all parallelism is handled by joblib. On the other hand it
>> seems
>> IPython can talk to the SGE scheduler.
>> So I would love to have a way for joblib to talk to IPython.
>>
>> Is there an easy way to make this possible?
>> I was thinking about monkey-patching the Parallel class to use
>> "LoadBalancedView" from IPython.
>> Do you think this is feasible?
>>
>> Another question is whether there are additional assumptions made
>> by sklearn about the way the parallelism works.
>> IPython basically provides a "map" interface similar to "Parallel",
>> so I would hope that there are no problems. Do you think there will be?
>>
>> Any help would be welcome.
>>
>> If I actually get this to work, I feel this might be quite a success
>> story for sklearn ;)
>>
>> Cheers,
>> Andy
>>
>>
>> ------------------------------------------------------------------------------
>> Try before you buy = See our experts in action!
>> The most comprehensive online learning library for Microsoft developers
>> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
>> Metro Style Apps, more. Free future releases when you subscribe now!
>> http://p.sf.net/sfu/learndevnow-dev2
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>