Re: [scikit-learn] Scaling model selection on a cluster

federico vaggi Sun, 07 Aug 2016 02:08:34 -0700

This might be interesting to you:

http://blaze.pydata.org/blog/2015/10/19/dask-learn/
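
If you would rather write something yourself, one library-free route is to
split the parameter grid across the tasks of an LSF job array, so that each
task runs on its own node and evaluates only its share of the combinations.
Below is a minimal sketch of the splitting step only, using nothing but the
standard library. It assumes LSF sets LSB_JOBINDEX (1-based) for job-array
tasks; the helper names expand_grid, shard and task_index_from_env, the
N_TASKS variable, and the example grid are all hypothetical, chosen for
illustration.

```python
import itertools
import os


def expand_grid(param_grid):
    """Return every parameter combination in the grid as a list of dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values))
            for values in itertools.product(*value_lists)]


def shard(items, task_index, n_tasks):
    """Round-robin split: task k (0-based) gets items k, k + n_tasks, ..."""
    return items[task_index::n_tasks]


def task_index_from_env(environ):
    """Map the 1-based LSB_JOBINDEX to a 0-based index.

    Falls back to 0 when the variable is unset or 0 (not a job array).
    """
    job_index = int(environ.get("LSB_JOBINDEX", "0"))
    return max(job_index - 1, 0)


if __name__ == "__main__":
    # Hypothetical grid; replace with the real search space.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    # N_TASKS is an assumed variable that must match the job-array size.
    n_tasks = int(os.environ.get("N_TASKS", "1"))
    task_index = task_index_from_env(os.environ)
    for params in shard(expand_grid(param_grid), task_index, n_tasks):
        # Fit and score the model for `params` here, then write the
        # result to a per-task file to be merged after all tasks finish.
        print(params)
```

Submitted with something like bsub -J "grid[1-60]" and N_TASKS=60, each task
would then work through roughly 1/60 of the grid, and within a task you can
still set n_jobs to the number of cores on that node.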



On Sun, 7 Aug 2016 at 10:42 Vlad Ionescu <[email protected]> wrote:

> Hello,
>
> I am interested in scaling grid searches on an HPC LSF cluster with about
> 60 nodes, each with 20 cores. I thought I could just set n_jobs=1000 then
> submit a job with bsub -n 1000, but then I dug deeper and understood that
> the underlying joblib used by scikit-learn will create all of those jobs on
> a single node, resulting in no performance benefits. So I am stuck using a
> single node.
>
> I read a lengthy discussion some time ago about adding something like
> this in scikit-learn:
> https://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/[email protected]/
>
>
> However, it hasn't materialized in any way, as far as I can tell.
>
> Do you know of any way to do this, or any modern cluster computing
> libraries for python that might help me write something myself (I found a
> lot, but it's hard to tell what's considered good or even still under
> development)?
>
> Also, are there still plans to implement this in scikit-learn? You seemed
> to like the idea back then.
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>

