Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Sturla Molden
On 31/01/14 05:45, Thomas Johnson wrote: > I think it uses SMP, since they offer machines with up to 16 cores Perhaps, I don't know. You get 16 "cores", but do they share memory? There are SVMs for cluster architectures as well: http://books.nips.cc/papers/files/nips20/NIPS2007_0435.pdf https

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Thomas Johnson
I think it uses SMP, since they offer machines with up to 16 cores On Thu, Jan 30, 2014 at 10:41 PM, Sturla Molden wrote: > On 31/01/14 05:16, Thomas Johnson wrote:> It's definitely the bottleneck > for my particular use case. I spawn ~180 > > processes for a grid search on my Google Compute En

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Sturla Molden
On 31/01/14 05:16, Thomas Johnson wrote:> It's definitely the bottleneck for my particular use case. I spawn ~180 > processes for a grid search on my Google Compute Engine cluster, but > still end up waiting >90 minutes just for a few individual long-running > processes with high C values. The

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Thomas Johnson
It's definitely the bottleneck for my particular use case. I spawn ~180 processes for a grid search on my Google Compute Engine cluster, but still end up waiting >90 minutes just for a few individual long-running processes with high C values. On Thu, Jan 30, 2014 at 6:39 PM, Frédéric Bastien wro
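The straggler effect described here can be sketched with the standard library alone. This is a hypothetical illustration, not the poster's setup: `fit_one` is a stand-in that merely sleeps longer for larger C, and the scores are made up. It shows that the wall-clock time of a parallel grid search is bounded below by its slowest single job, however many workers are available.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fit_one(C):
    """Stand-in for one SVM fit (assumption): larger C sleeps longer."""
    time.sleep(0.001 * C)
    return C, 1.0 / (1.0 + C)  # (parameter, made-up score)


def run_grid(C_values, max_workers=4):
    """Run all jobs concurrently and collect results as they finish."""
    start = time.time()
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fit_one, C) for C in C_values]
        for future in as_completed(futures):
            finished.append(future.result())
    return finished, time.time() - start


results, elapsed = run_grid([0.1, 1, 10, 100])
print(results)
print("elapsed seconds:", round(elapsed, 3))
```

With one worker per job, the elapsed time is roughly that of the C=100 job alone, which is why parallelizing inside a single fit is attractive for this use case.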

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Ken Arnold
On Thu, Jan 30, 2014 at 5:21 PM, Sturla Molden wrote: > As I understand it from reading about this a LONG time ago (apologies if my > memory is rusty), "Bayesian optimization" means maximizing the > log-likelihood using the Newton-Raphson method. Probably that was how the term was typically used

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Joel Nothman
> With a grid search, we can run all jobs in parallel. But I have the impression that these algorithms remove that possibility. ... You can still run all folds in, say, 10-fold cross-validation in parallel. > But the most interesting question, if we start many jobs in parallel, if the jobs don't finis

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Patrick Mineault
Sure you can: http://www.cs.toronto.edu/~jasper/bayesopt.pdf And some Python code: https://github.com/JasperSnoek/spearmint On Thu, Jan 30, 2014 at 7:53 PM, Frédéric Bastien wrote: > I have a question about this type of algorithm for hyperparameter > optimization. With a grid search, we can run

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Frédéric Bastien
I have a question about this type of algorithm for hyperparameter optimization. With a grid search, we can run all jobs in parallel. But I have the impression that these algorithms remove that possibility. Is there a way to sample many starting configurations with these algorithms? But the most interesting quest

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Frédéric Bastien
Just a comment (I won't use this). I think most people can live with the restriction of not using Python multiprocessing if they use a parallel SVM. From my limited understanding of SVMs, my guess is that this is the bottleneck ~100% of the time. So the need for multiprocessing in conjunction with a parallel SVM is

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-01-30 Thread Joel Nothman
Submit a PR and see what people think! On 31 January 2014 09:52, Faraz Mirzaei wrote: > It seems that removing lines 1065-1067 of cross_validation.py solves the > problem for now: > > > 1065: if not isinstance(score, numbers.Number): > > 1066: raise ValueError("scoring must return a number,

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Sturla Molden
Sturla Molden wrote: > > This is actually a GNU problem. libgomp cannot be used on both sides of a > fork without an exec. Other common OpenMP implementations (Intel, > Microsoft) do not have this problem. It is interesting that Apple's GCD and > Accelerate framework have exactly the same issue a

Re: [Scikit-learn-general] API change for scoring in cross validation @rev 0.14.1

2014-01-30 Thread Faraz Mirzaei
It seems that removing lines 1065-1067 of cross_validation.py solves the problem for now: 1065: if not isinstance(score, numbers.Number): 1066: raise ValueError("scoring must return a number, got %s (%s)" 1067: " instead." % (str(score), type(score))) Can we pa
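A minimal sketch of what the quoted check does, reproduced outside scikit-learn. The function name `check_score` is hypothetical, and the tuple score is only an assumed example of a non-numeric return value; the snippet does not say what the poster's scorer actually returned.

```python
import numbers


def check_score(score):
    """Mirror the quoted lines: reject any score that is not a number."""
    if not isinstance(score, numbers.Number):
        raise ValueError("scoring must return a number, got %s (%s)"
                         " instead." % (str(score), type(score)))
    return score


check_score(0.93)  # accepted: a float is a numbers.Number

try:
    check_score((0.93, 0.88))  # rejected: a tuple of two metrics
except ValueError as exc:
    print(exc)
```

Removing the check lets such values through, but anything downstream that averages scores across folds would then have to cope with non-scalar values.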

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Sturla Molden
Lars Buitinck wrote: > But anyway, the modification has not been implemented in scikit-learn > because the combination of OpenMP and Python multiprocessing is rather > problematic. This is actually a GNU problem. libgomp cannot be used on both sides of a fork without an exec. Other common OpenMP
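One way to stay on the safe side of the "fork without an exec" restriction is to start workers as fresh interpreters rather than forked copies. The sketch below assumes Python 3.4 or later, where `multiprocessing` offers the "spawn" start method; it is a generic illustration and not something scikit-learn did at the time of this thread. The helper names are hypothetical.

```python
import math
import multiprocessing as mp


def spawn_context():
    """Return a context whose workers are new interpreter processes.

    On POSIX, "spawn" launches each worker with fork followed by exec,
    so no threading runtime state from the parent is carried over.
    """
    return mp.get_context("spawn")


def sqrt_in_workers(values, processes=2):
    """Map math.sqrt over values in spawned worker processes."""
    ctx = spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(math.sqrt, values)
```

Call `sqrt_in_workers([1.0, 4.0, 9.0])` from under an `if __name__ == "__main__":` guard, since spawned workers re-import the main module. The trade-off is slower worker start-up compared with a plain fork.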

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Gael Varoquaux
On Thu, Jan 30, 2014 at 11:23:28AM -0800, James Jensen wrote: > Bayesian optimization is an efficient method used especially for > functions that are expensive to evaluate. The basic idea is to fit the > function using Gaussian processes, using a surrogate function that > determines where to eva
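The idea summarized in the quoted message can be sketched in pure Python. Everything here is an assumption made for illustration: a zero-mean Gaussian process with an RBF kernel of fixed length scale, expected improvement as the rule for choosing the next point, and a one-dimensional grid of candidates. It is not the method of any package mentioned in the thread.

```python
import math


def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of the GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v))
    return mean, math.sqrt(max(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over `best` when maximizing."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(f, candidates, n_init=3, n_iter=8):
    """Maximize f over candidates, one expensive evaluation per step."""
    step = max(1, len(candidates) // n_init)
    xs = list(candidates[::step][:n_init])
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        offset = sum(ys) / len(ys)          # centre targets for the zero-mean GP
        centred = [y - offset for y in ys]
        best = max(centred)

        def acquisition(c):
            mean, std = gp_posterior(xs, centred, c)
            return expected_improvement(mean, std, best)

        x_next = max(remaining, key=acquisition)
        xs.append(x_next)
        ys.append(f(x_next))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs
```

For example, `bayes_opt(lambda x: -(x - 0.7) ** 2, [i / 40 for i in range(41)])` evaluates the function at 11 of the 41 candidates. Each step depends on all earlier results, which is the sequential character questioned elsewhere in this thread.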

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Sturla Molden
As I understand it from reading about this a LONG time ago (apologies if my memory is rusty), "Bayesian optimization" means maximizing the log-likelihood using the Newton-Raphson method. The word "Bayesian" comes from an obfuscated explanation of what really happens: If we assume a flat or Gaussian

Re: [Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Lars Buitinck
2014-01-30 Thomas Johnson : > The scikit-learn docs say that the SVM/SVC classes are based on libsvm. The > libsvm faq (http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f432) says > that libsvm can automatically parallelize kernel evaluations using openmp if > compiled correctly. Is there any way

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread James Jensen
Hi, Had, It's true that I'd have limited time (working on a PhD). I imagine most possible contributors are also quite busy. Mainly, I lack the expertise necessary to do this properly; I understand Bayesian optimization at a high level but don't have much of a foundation in the underlying math,

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Zach Dwiel
It seems that with GridSearchCV and RandomizedSearchCV both already included in scikit-learn, it would make sense to include other common, more efficient hyperparameter searchers as well. zach On Thu, Jan 30, 2014 at 3:11 PM, Hadayat Seddiqi wrote: > Hi, > > So I was the one who volunteer
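To make the two existing strategies concrete, here is a library-free sketch of grid search versus random search. It deliberately does not call GridSearchCV or RandomizedSearchCV; `toy_score` is a made-up stand-in for a cross-validated score, and the parameter ranges are assumptions chosen only for the example.

```python
import itertools
import math
import random


def toy_score(C, gamma):
    """Made-up validation score peaking at C=10, gamma=0.01 (not real data)."""
    return -((math.log10(C) - 1.0) ** 2 + (math.log10(gamma) + 2.0) ** 2)


def grid_search(C_values, gamma_values):
    """Evaluate every combination of the listed values."""
    candidates = list(itertools.product(C_values, gamma_values))
    return max(candidates, key=lambda p: toy_score(*p)), len(candidates)


def random_search(n_iter, seed=0):
    """Evaluate n_iter points drawn log-uniformly from fixed ranges."""
    rng = random.Random(seed)
    candidates = [(10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-4, 1))
                  for _ in range(n_iter)]
    return max(candidates, key=lambda p: toy_score(*p)), len(candidates)


best_grid, n_grid = grid_search([0.1, 1, 10, 100], [1e-3, 1e-2, 1e-1])
best_rand, n_rand = random_search(n_iter=12)
print("grid:  ", best_grid, "after", n_grid, "evaluations")
print("random:", best_rand, "after", n_rand, "evaluations")
```

Both strategies choose all their candidates up front, so every evaluation is independent of the others. A model-based searcher would instead use earlier scores to pick later candidates.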

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Hadayat Seddiqi
Hi, So I was the one who volunteered to contribute my GP code for a revamp of the scikits module. I'm far from an expert, and I can't say I understand how this would fit off the top of my head, but if someone is knowledgeable and willing to work on this then I'd be more than happy to lend a hand as

[Scikit-learn-general] Parallel SVM/SVC?

2014-01-30 Thread Thomas Johnson
The scikit-learn docs say that the SVM/SVC classes are based on libsvm. The libsvm faq (http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f432) says that libsvm can automatically parallelize kernel evaluations using openmp if compiled correctly. Is there any way to parallelize the SVM or SVC implem

Re: [Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread Dan Haiduc
Actually, I wanted to create exactly this myself. I was then discouraged by the fact that Scikit-learn did not pull from a guy who implemented Multi-Armed Bandit, on the grounds that Scikit-learn doesn't do reinforcement learning. I'm new here (ev

[Scikit-learn-general] Bayesian optimization for hyperparameter tuning

2014-01-30 Thread James Jensen
I usually hesitate to suggest a new feature in a library like this unless I am in a position to work on it myself. However, given the number of people who seem eager to find something to contribute, and given the recent discussion about improving the Gaussian process module, I thought I'd ventu

Re: [Scikit-learn-general] What's up with our Debian popcon results

2014-01-30 Thread federico vaggi
Most computing clusters run CentOS, I think. Even if they were running Debian, it's very rare for all the nodes of a cluster to be connected to the internet; usually you have to build the binaries yourself, which for some packages is a huge pain. On Wed, Jan 29, 2014 at 12:36 PM, Olivie