In terms of memory: I gather joblib.Parallel is meant to automatically
memmap large arrays (>100MB). However, each subprocess will then extract a
non-contiguous set of samples from the data for training under a
cross-validation regime. Since advanced (fancy) indexing of a memmapped
array materializes a full in-memory copy per fold per job, would I be right
in thinking that's where the memory blowout comes from? When such expensive
indexing is a risk, should we be using sample_weight (where the base
estimator supports it) to select portions of the training data without a
copy?
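To make the first point concrete, here's a minimal sketch (not scikit-learn internals; the file path is made up) showing that a basic slice of a memmapped array stays file-backed, while fancy indexing with a non-contiguous index set copies the selected rows into RAM:

```python
import os
import tempfile

import numpy as np

# Create a small file-backed array standing in for the >100MB case.
path = os.path.join(tempfile.mkdtemp(), 'X.dat')
X = np.memmap(path, dtype='float64', mode='w+', shape=(10000, 100))

contiguous = X[:5000]                     # basic slice: a view, still file-backed
idx = np.random.permutation(10000)[:5000] # non-contiguous CV fold indices
fold = X[idx]                             # fancy indexing: rows copied into RAM

print(np.shares_memory(X, contiguous))    # True  -> no extra memory used
print(np.shares_memory(X, fold))          # False -> a fresh copy per fold, per job
```

With n_jobs workers each holding its own fold copy, that per-fold copy is multiplied across subprocesses, which would match the blowout described below.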
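The sample_weight idea sketched, using Ridge as an example estimator that accepts sample_weight (this is an illustration of the proposal, not anything GridSearchCV does today): give held-out rows zero weight and pass the full array, so no row copy is made. Note the estimator still sees all n samples, so this helps when the copy, not the compute, is the problem.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.rand(200, 10)
y = rng.rand(200)

train_idx = rng.permutation(200)[:150]
w = np.zeros(200)
w[train_idx] = 1.0                    # held-out rows get zero weight

# Fit on the full arrays with zero-weighted test rows...
full = Ridge(alpha=1.0).fit(X, y, sample_weight=w)
# ...versus the usual copy-making subset fit.
subset = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])

print(np.allclose(full.coef_, subset.coef_))  # True: same fit, no row copy
```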

On 24 September 2015 at 23:21, Dale Smith <dsm...@nexidia.com> wrote:

> My experiences with parallel GridSearchCV and RFECV have not been
> pleasant. Memory usage was a huge problem, as apparently each job got a
> copy of the data with an out-of-the-box scikit-learn installation using
> Anaconda 3. No matter how I set pre_dispatch, I could not get n_jobs=2 to
> work, even with no one else using a 100 GB, 24-core Windows box.
>
>
>
> I can create some reproducible code if anyone has time to work on it.
>
>
>
>
> *Dale Smith, Ph.D.*
> Data Scientist
> *d.* 404.495.7220 x 4008   *f.* 404.795.7221
> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta,
> GA 30305
>
>
> *From:* Clyde Fare [mailto:clyde.f...@gmail.com]
> *Sent:* Thursday, September 24, 2015 8:38 AM
> *To:* scikit-learn-general@lists.sourceforge.net
> *Subject:* [Scikit-learn-general] GridSearchCV using too many cores?
>
>
>
> Hi,
>
>
>
> I'm trying to run GridSearchCV on a computational cluster but my jobs keep
> failing with an error from the queuing system claiming I'm using too many
> cores.
>
>
>
> If I set n_jobs to 1, the job doesn't fail, but if it's more than
> one, no matter what the number, the job fails.
>
>
>
> In the example below I've set n_jobs to 6 and pre_dispatch to 12, and
> asked for 8 processors from the queue. I got the following error after ~10
> minutes: "PBS: job killed: ncpus 19.73 exceeded limit 8 (sum)"
>
>
>
> I've tried playing around with pre_dispatch, but it makes no difference.
> There will be other people running calculations on these nodes, so might
> there be some kind of interference between GridSearchCV and the other jobs?
>
>
>
> Anyone come across anything like this before?
>
>
>
> Cheers
>
>
>
> Clyde
>
>
>
>
>
> import dill
> import numpy as np
>
> from sklearn.kernel_ridge import KernelRidge
> from sklearn.grid_search import GridSearchCV
>
> label = 'test_grdsrch3'
> X_train = np.random.rand(971, 276)
> y_train = np.random.rand(971)
>
> kr = GridSearchCV(KernelRidge(), cv=10,
>                   param_grid={"kernel": ['rbf', 'laplacian'],
>                               "alpha": [2**i for i in np.arange(-40, -5, 0.5)],          # alpha = lambda
>                               "gamma": [1/(2.**(2*i)) for i in np.arange(5, 18, 0.5)]},  # gamma = 1/sigma^2
>                   pre_dispatch=12,
>                   n_jobs=6)
>
> kr.fit(X_train, y_train)
>
> with open(label + '.pkl', 'wb') as data_f:  # 'wb': dill/pickle requires binary mode
>     dill.dump(kr, data_f)
>
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>