I usually hesitate to suggest a new feature in a library like this unless I am in a position to work on it myself. However, given the number of people who seem eager to find something to contribute, and given the recent discussion about improving the Gaussian process module, I thought I'd venture an idea.
Bayesian optimization is an efficient method for optimizing functions that are expensive to evaluate. The basic idea is to fit a Gaussian process model to the evaluations made so far and, at each iteration, use an acquisition function over that model to decide where to evaluate next. The acquisition function strikes a balance between exploration (sampling regions you haven't tried before) and exploitation (if previous samples in a region scored well, the chances of a high score nearby are good). Some of the math behind it is beyond me, but the general idea is very intuitive. Brochu, Cora, and de Freitas (2010), "A Tutorial on Bayesian Optimization of Expensive Cost Functions", is a good introduction.

One useful application of Bayesian optimization is hyperparameter tuning: it can be used to optimize the cross-validation score, as an alternative to, for example, grid search. Grid search is simple and parallelizable, there is no overhead in choosing the hyperparameters to try, and the nature of some estimators allows them to be used with it very efficiently. Bayesian optimization is sequential and has a small amount of overhead in evaluating the acquisition function. But it is generally much more efficient at finding good solutions, and it particularly shines when the scoring function is costly or when there are more than one or two hyperparameters to tune, where grid search becomes less attractive and sometimes completely impractical.

In one of my own applications, involving 4 regularization parameters, I've been using the BayesOpt library (http://rmcantin.bitbucket.org/html/index.html), which offers it as a general-purpose optimization technique that one can manually integrate with one's cross-validation code. In general it works quite well, but some limitations of its design can make the integration inconvenient.

Having this functionality directly integrated into scikit-learn and specifically tailored to hyperparameter tuning would be useful. I have been impressed with the ease of use of convenience classes such as GridSearchCV, and dream of having a corresponding BayesOptCV, etc. (a rough sketch of the kind of loop I have in mind is appended at the end of this message). As a general-purpose optimization method, Bayesian optimization would belong elsewhere than in scikit-learn, e.g. in scipy.optimize. But specifically as a method for hyperparameter tuning, it seems it would fit well within the scope of scikit-learn, especially since I expect it would not be much more than a layer or two of functionality on top of what scikit-learn's GP module offers (or will offer once revised). And it would be of more general utility than an additional estimator here or there.

I'm curious to hear what others think about the idea. Would this be a good fit for scikit-learn? Do we have people with the interest, expertise, and time to take this on at some point?
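
Here is the sketch I mentioned above, just to make the idea concrete. It is not a proposal for the actual API: I'm assuming a GP regressor with a fit/predict interface that returns a predictive standard deviation (along the lines of GaussianProcessRegressor), using expected improvement as the acquisition criterion, and tuning a single parameter (log10(C) of an SVR) purely as an illustration; the helper names and constants are all mine.

import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

def cv_score(log_C):
    # The expensive objective: mean CV score of an SVR as a function of log10(C).
    return cross_val_score(SVR(C=10.0 ** log_C), X, y, cv=3).mean()

def expected_improvement(candidates, gp, best_so_far):
    # Acquisition criterion: large where the GP predicts either a high mean
    # (exploitation) or a large uncertainty (exploration).
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

lo, hi = -3.0, 3.0                              # search log10(C) in [1e-3, 1e3]
rng = np.random.RandomState(0)
params = list(rng.uniform(lo, hi, size=3))      # a few random points to start
scores = [cv_score(p) for p in params]

for _ in range(15):                             # each iteration costs one CV run
    gp = GaussianProcessRegressor(normalize_y=True)
    gp.fit(np.array(params).reshape(-1, 1), scores)
    candidates = np.linspace(lo, hi, 500).reshape(-1, 1)
    ei = expected_improvement(candidates, gp, max(scores))
    next_p = candidates[np.argmax(ei), 0]       # evaluate where EI is largest
    params.append(next_p)
    scores.append(cv_score(next_p))

best = params[int(np.argmax(scores))]
print("best log10(C): %.2f, CV score: %.3f" % (best, max(scores)))

In practice one would also want to handle noisy CV scores, mixed parameter types, proper optimization of the acquisition function, and so on, which is exactly the kind of detail a BayesOptCV-style class could hide behind a GridSearchCV-like interface.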