yes: http://people.fas.harvard.edu/~bergstra/files/pub/11_nips_hyperopt.pdf
and a nice blog post by Alex Passos: http://atpassos.posterous.com/bayesian-optimization

(Rough sketches of both the random-search baseline and the GP idea are pasted below the quoted thread.)

Alex

On Thu, Mar 8, 2012 at 9:25 PM, Jacob VanderPlas <[email protected]> wrote:
> Interesting!
> Has anyone ever seen Gaussian process learning used for this sort of
> hyperparameter estimation? I'm thinking of something similar to the
> Kriging approach to likelihood surfaces, where some random starting
> points are used to train a GPML solution, and this surface is minimized
> to guess the next best location to try (or locations, if things are
> being done in parallel). In this case, the points would be locations in
> hyper-parameter space, and the evaluation is the cross-validation score.
> It seems like this sort of approach could out-perform the random
> selection used in this paper.
> Jake
>
> Olivier Grisel wrote:
>> Some fresh news from the hyper-parameter tuning front lines:
>>
>> http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
>>
>> Some interesting snippets from the conclusion (I have not yet read the
>> rest of the paper):
>>
>> """
>> We have shown that random experiments are more efficient than grid
>> experiments for hyper-parameter optimization in the case of several
>> learning algorithms on several data sets. Our analysis of the
>> hyper-parameter response surface (Ψ) suggests that random experiments
>> are more efficient because not all hyper-parameters are equally
>> important to tune. Grid search experiments allocate too many trials to
>> the exploration of dimensions that do not matter and suffer from poor
>> coverage in dimensions that are important.
>> """
>>
>> """
>> Random experiments are also easier to carry out than grid experiments
>> for practical reasons related to the statistical independence of every
>> trial.
>>
>> • The experiment can be stopped any time and the trials form a
>> complete experiment.
>>
>> • If extra computers become available, new trials can be added to an
>> experiment without having to adjust the grid and commit to a much
>> larger experiment.
>>
>> • Every trial can be carried out asynchronously.
>>
>> • If the computer carrying out a trial fails for any reason, its trial
>> can be either abandoned or restarted without jeopardizing the
>> experiment.
>> """
>>
>> I wonder how this would transpose to scikit-learn models, which often
>> have far fewer hyper-parameters than the average Deep Belief Network.
>> Still, it's very interesting food for thought if someone wants to dive
>> into improving the model selection tooling in the scikit.
>>
>> Maybe a new GSoC topic? Would anybody be interested as a mentor or candidate?
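To make the random-vs-grid point above concrete, here is a minimal sketch of the random-search baseline. The cv_score function and the log-uniform ranges for C and gamma are made-up placeholders standing in for a real cross-validated scikit-learn estimator, not an existing or proposed API:

import numpy as np

rng = np.random.RandomState(42)

def cv_score(params):
    # Stand-in for e.g. the cross-validated score of an SVM with these
    # parameters; a real version would call cross-validation here.
    return -((np.log10(params["C"]) - 1.0) ** 2
             + (np.log10(params["gamma"]) + 0.5) ** 2)

def sample_params(rng):
    # Each hyper-parameter gets its own (here log-uniform) distribution,
    # so unimportant dimensions cost nothing extra and no grid has to be
    # committed to in advance.
    return {"C": 10.0 ** rng.uniform(-3, 3),
            "gamma": 10.0 ** rng.uniform(-3, 3)}

trials = [sample_params(rng) for _ in range(60)]   # any budget, any time
scores = [cv_score(p) for p in trials]             # each trial is independent
best = trials[int(np.argmax(scores))]
print("best params:", best, "best score:", max(scores))

The properties quoted from the paper fall out for free: every trial is independent, so the loop can be stopped, extended, or parallelized at any point.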
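And a toy version of the Kriging-style idea Jake describes: fit a GP to the (hyper-parameter, CV score) pairs seen so far, then choose the next trial by maximizing expected improvement over a pool of random candidates. The fixed RBF kernel, the length scale, and the synthetic cv_score are assumptions made for the sketch, not a recommendation:

import numpy as np
from scipy.stats import norm

def cv_score(x):
    # Stand-in for a real cross-validation score at the hyper-parameter
    # vector x (think [log10(C), log10(gamma)] for an SVM); made up here.
    return -((x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2)

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between two sets of points.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(X, y, X_new, noise=1e-6):
    # Plain GP regression with a zero-mean prior and a fixed kernel.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    K_s = rbf_kernel(X, X_new)
    mu = np.dot(K_s.T, np.dot(K_inv, y))
    var = np.diag(rbf_kernel(X_new, X_new) - np.dot(K_s.T, np.dot(K_inv, K_s)))
    return mu, np.sqrt(np.maximum(var, 1e-12))

rng = np.random.RandomState(0)
low, high = np.array([-3.0, -3.0]), np.array([3.0, 3.0])  # search box (log space)

X = rng.uniform(low, high, size=(5, 2))   # random starting points, as in Kriging
y = np.array([cv_score(x) for x in X])

for it in range(20):
    candidates = rng.uniform(low, high, size=(500, 2))
    mu, sigma = gp_posterior(X, y, candidates)
    best = y.max()
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = candidates[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, cv_score(x_next))

print("best hyper-parameters:", X[np.argmax(y)], "score:", y.max())

In real use cv_score would run cross-validation and the GP hyper-parameters would be refit at each iteration; that is close in spirit to what the paper linked at the top of this message does.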
