Hello,
I am interested in scaling grid searches on an HPC LSF cluster with about
60 nodes, each with 20 cores. I thought I could just set n_jobs=1000 and then
submit a job with bsub -n 1000, but then I dug deeper and understood that
the underlying joblib used by scikit-learn will create all of those jobs as
processes on a single node rather than across the cluster.
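For concreteness, what I had in mind was roughly the following (the
estimator and parameter grid here are just placeholders, not my real
workload):

    # Submitted as e.g. `bsub -n 1000 python search.py`, hoping that n_jobs
    # would fan the work out over the whole reservation.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    digits = load_digits()
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-4, 1e-3, 1e-2]}

    search = GridSearchCV(SVC(), param_grid, n_jobs=1000)
    # joblib only spawns worker processes on the node running this script
    search.fit(digits.data, digits.target)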
This might be interesting to you:
http://blaze.pydata.org/blog/2015/10/19/dask-learn/
On Sun, 7 Aug 2016 at 10:42 Vlad Ionescu wrote:
> Hello,
>
> I am interested in scaling grid searches on an HPC LSF cluster with about
> 60 nodes, each with 20 cores. I thought I could just set n_jobs=1000 th
Thanks, that looks interesting. I've looked into dask-learn's grid search (
https://github.com/mrocklin/dask-learn/blob/master/grid_search.py), but it
does not seem to make use of the n_jobs parameter. Will this work in a
distributed fashion? The link you gave seemed to focus more on optimizing
the grid search computation itself.
Could someone disable the Travis cache once and for all please?
I have seen several frustrating incidents where the Travis fails the PR
because of this caching of old files.
I also don't understand why it is enabled in the first place. It would
really be super helpful if it were disabled for good.
hi,
I just flushed all the caches.
HTH
Alex
On Sun, Aug 7, 2016 at 2:39 PM, Raghav R V wrote:
> Could someone disable the Travis cache once and for all please?
>
> I have seen several frustrating incidents where the Travis fails the PR
> because of this caching of old files.
>
> I also don't un
Parallel computing in scikit-learn is built upon joblib. In the
development version of scikit-learn, the included joblib can be extended
with a distributed backend:
http://distributed.readthedocs.io/en/latest/joblib.html
that can distribute code on a cluster.
This is still bleeding edge, but this should let a grid search run across
the nodes of a cluster rather than on a single machine.
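A minimal sketch of how the hook-up looks (the scheduler address is a
placeholder, and the exact backend name has changed between versions of
distributed, so check the page above for your version):

    # Assumes `pip install distributed` and a dask scheduler plus workers
    # already running somewhere on the cluster.
    import joblib
    from distributed import Client
    from math import sqrt

    client = Client("scheduler-host:8786")  # placeholder address

    # Inside this block, joblib-based parallelism (including scikit-learn's
    # n_jobs machinery) ships its tasks to the dask workers instead of
    # spawning local processes.
    with joblib.parallel_backend("dask"):
        results = joblib.Parallel(n_jobs=-1)(
            joblib.delayed(sqrt)(i) for i in range(100))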
I copy-pasted the example in the link you gave, only made the search take
longer. I used dask-ssh to set up worker nodes and a scheduler, then
connected to the scheduler in my code.
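For reference, the setup was along these lines (hostnames and the
parameter grid are placeholders):

    # Cluster started from a login node with something like:
    #   dask-ssh node001 node002 ... node060
    # (scheduler on the first host, one worker per host)
    import joblib
    from distributed import Client
    from sklearn.datasets import load_digits
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.svm import SVC

    client = Client("node001:8786")  # scheduler launched by dask-ssh

    digits = load_digits()
    param_distributions = {"C": [0.1, 1, 10, 100, 1000],
                           "gamma": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]}

    search = RandomizedSearchCV(SVC(), param_distributions, n_iter=50,
                                n_jobs=20)  # this is the knob I tweaked
    with joblib.parallel_backend("dask"):
        search.fit(digits.data, digits.target)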
Tweaking the n_jobs parameter for the randomized search does not bring any
performance benefit. The connection
Why do you think it should be disabled instead of fixed?
On 08/07/2016 08:39 AM, Raghav R V wrote:
Could someone disable the Travis cache once and for all please?
I have seen several frustrating incidents where the Travis fails the
PR because of this caching of old files.
I also don't under
My guess is that your model evaluations are too fast, and that you are
not getting the benefits of distributed computing as the overhead is
hiding them.
Anyhow, I don't think that this is ready for prime-time usage. It
probably requires tweaking and understanding the tradeoffs.
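A quick way to check is to time a single fit on one node; if it only takes
a fraction of a second, the cost of scheduling tasks and shipping data to
remote workers can easily dominate (sketch with a placeholder model and
dataset):

    import time
    from sklearn.datasets import load_digits
    from sklearn.svm import SVC

    digits = load_digits()
    clf = SVC(C=10, gamma=1e-3)

    start = time.time()
    clf.fit(digits.data, digits.target)
    # Compare this with the per-task overhead of the distributed scheduler
    # plus the time needed to move the data to the workers.
    print("single fit took %.3f s" % (time.time() - start))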
G
On Sun, Aug 07,
I don't think they're too fast. I tried with slower models and bigger data
sets as well. I get the best results with n_jobs=20, which is the number of
cores on a single node. Anything below is considerably slower; anything
above is mostly the same, sometimes a little slower.
Is there a way to see