I usually hesitate to suggest a new feature in a library like this 
unless I am in a position to work on it myself. However, given the 
number of people who seem eager to find something to contribute, and 
given the recent discussion about improving the Gaussian process module, 
I thought I'd venture an idea.

Bayesian optimization is an efficient method for optimizing functions 
that are expensive to evaluate. The basic idea is to fit the function 
with a Gaussian process, which serves as a surrogate model, and in 
each iteration to pick the next point to evaluate by maximizing an 
acquisition function over that surrogate. The acquisition function 
strikes a balance between exploration (sampling regions you haven't 
tried before) and exploitation (if previous samples in some 
neighborhood scored well, that neighborhood is likely to score well 
again). Some of the math behind it is beyond me, but the general idea 
is very intuitive. Brochu, Cora, and de Freitas (2010), "A Tutorial on 
Bayesian Optimization of Expensive Cost Functions", is a good 
introduction.
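
To make this concrete, here is a rough sketch of the whole loop, 
written against a GaussianProcessRegressor-style interface with 
predict(..., return_std=True), i.e. roughly what the revised GP 
module is expected to offer. The objective f, the bounds, and the 
crude random-candidate maximization of the acquisition are 
placeholders; a real implementation would be more careful about 
kernel choice and acquisition optimization.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expected_improvement(gp, X_candidates, y_best):
        """Expected improvement over the best observed value (minimization)."""
        mu, sigma = gp.predict(X_candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
        z = (y_best - mu) / sigma
        return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    def bayes_opt(f, bounds, n_init=5, n_iter=20, n_candidates=1000, seed=0):
        """Minimize f over a box given by bounds, an array of shape (d, 2)."""
        rng = np.random.RandomState(seed)
        d = bounds.shape[0]
        # Start from a handful of uniformly random points
        X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, d))
        y = np.array([f(x) for x in X])
        for _ in range(n_iter):
            # Fit the GP surrogate to everything observed so far
            gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
            # Pick the next point by maximizing the acquisition over a
            # cheap set of random candidate points
            cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_candidates, d))
            x_next = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
            X = np.vstack([X, x_next])
            y = np.append(y, f(x_next))
        best = np.argmin(y)
        return X[best], y[best]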

One useful application of Bayesian optimization is hyperparameter 
tuning. It can be used to optimize the cross-validation score, as an 
alternative to, for example, grid search. Grid search is simple and 
parallelizable, there is no overhead in choosing which hyperparameter 
settings to try, and the nature of some estimators lets grid search be 
used with them very efficiently. Bayesian optimization is inherently 
sequential and carries a small overhead for fitting the surrogate and 
maximizing the acquisition at each step. But it is generally much more 
efficient at finding good solutions, and it particularly shines when 
the scoring function is costly or when there are more than one or two 
hyperparameters to tune; there grid search is less attractive and 
sometimes completely impractical.
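
For tuning, the objective handed to such a loop is simply the 
cross-validation score as a function of the hyperparameters. A sketch 
re-using the bayes_opt helper above, with an SVC on the digits data 
purely as a stand-in example (the parameter ranges are arbitrary):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X_data, y_data = load_digits(return_X_y=True)

    def cv_objective(params):
        """Negative 3-fold CV accuracy; params are log10(C) and log10(gamma)."""
        log_C, log_gamma = params
        clf = SVC(C=10.0 ** log_C, gamma=10.0 ** log_gamma)
        return -cross_val_score(clf, X_data, y_data, cv=3).mean()

    # Search log10(C) in [-3, 3] and log10(gamma) in [-5, 1]
    bounds = np.array([[-3.0, 3.0], [-5.0, 1.0]])
    best_params, best_obj = bayes_opt(cv_objective, bounds, n_init=5, n_iter=15)
    print("log10(C), log10(gamma):", best_params, "CV accuracy:", -best_obj)

Each call to cv_objective is a full cross-validation run, which is 
exactly the "expensive function" setting where this approach pays off.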

In one of my own applications, involving 4 regularization parameters, 
I've been using the BayesOpt library 
(http://rmcantin.bitbucket.org/html/index.html), which provides 
Bayesian optimization as a general-purpose optimizer that one can 
integrate manually with one's own cross-validation code. In general, 
it works quite well, but 
there are some limitations to its design that can make its integration 
inconvenient. Having this functionality directly integrated into 
scikit-learn and specifically tailored to hyperparameter tuning would be 
useful. I have been impressed with the ease of use of such convenience 
classes as GridSearchCV, and dream of having a corresponding BayesOptCV, 
etc.
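
Purely to illustrate the kind of interface I have in mind (the class 
name, constructor arguments, and the way the search space is specified 
below are all invented; nothing like this exists yet):

    # Hypothetical sketch only: BayesOptCV does not exist. The point is
    # that it would mirror GridSearchCV, but take continuous ranges and
    # a budget of evaluations instead of an explicit grid.
    from sklearn.svm import SVC

    search = BayesOptCV(
        estimator=SVC(),
        param_bounds={"C": (1e-3, 1e3), "gamma": (1e-5, 1e1)},
        n_iter=30,      # number of CV evaluations to spend
        cv=3,
    )
    search.fit(X_data, y_data)
    print(search.best_params_, search.best_score_)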

As a general-use optimization method, Bayesian optimization would belong 
elsewhere than in scikit-learn, e.g. in scipy.optimize. But specifically 
as a method for hyperparameter tuning, it seems it would fit well in the 
scope of scikit-learn, especially since I expect it would not be much 
more than a layer or two of functionality on top of what scikit-learn's 
GP module offers (or will offer once revised). And it would be of more 
general utility than an additional estimator here or there.

I'm curious to hear what others think about the idea. Would this be a 
good fit for scikit-learn? Do we have people with the interest, 
expertise, and time to take this on at some point?




