Interesting! Has anyone ever seen Gaussian process learning used for this sort of hyperparameter estimation? I'm thinking of something similar to the Kriging approach to likelihood surfaces: a few random starting points are used to train a GPML-style model of the response surface, and that surface is then minimized to guess the next best location to try (or locations, if things are being done in parallel). In this case, the points would be locations in hyperparameter space, and the evaluation at each point would be the cross-validation score. It seems like this sort of approach could outperform the random selection used in this paper. A rough sketch of the loop I have in mind is below.

Jake
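To make that concrete, here is a minimal sketch of the loop. The SVC-on-digits task, the Matern kernel, and the greedy "try the predicted minimum next" rule are all placeholder choices of mine (a real implementation would use an acquisition function such as expected improvement), and it assumes scikit-learn's GaussianProcessRegressor API:

# Rough sketch of the sequential GP search loop described above.
# Assumptions (mine, for illustration only): SVC on the digits data,
# a Matern kernel, and a greedy "try the predicted minimum next" step
# in place of a proper acquisition function such as expected improvement.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
lo, hi = np.array([-2.0, -7.0]), np.array([4.0, -1.0])  # log10(C), log10(gamma)

def cv_error(log_C, log_gamma):
    """Cross-validation error at one point in hyperparameter space."""
    clf = SVC(C=10.0 ** log_C, gamma=10.0 ** log_gamma)
    return 1.0 - cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
points = rng.uniform(lo, hi, size=(5, 2))           # random starting points
scores = np.array([cv_error(*p) for p in points])

for _ in range(10):
    # Fit a GP to the (hyperparameters, CV error) pairs seen so far;
    # alpha adds jitter because CV scores are noisy.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True, alpha=1e-6)
    gp.fit(points, scores)
    # Greedy step: evaluate the GP mean on random candidates and try
    # the predicted minimum next (no exploration bonus here).
    candidates = rng.uniform(lo, hi, size=(1000, 2))
    nxt = candidates[np.argmin(gp.predict(candidates))]
    points = np.vstack([points, nxt])
    scores = np.append(scores, cv_error(*nxt))

best = points[np.argmin(scores)]
print("best (log10 C, log10 gamma):", best, "CV error:", scores.min())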
Olivier Grisel wrote:
> Some fresh news from the hyperparameter tuning front lines:
>
> http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
>
> Some interesting snippets from the conclusion (I have not yet read the
> rest of the paper):
>
> """
> We have shown that random experiments are more efficient than grid
> experiments for hyper-parameter optimization in the case of several
> learning algorithms on several data sets. Our analysis of the
> hyper-parameter response surface (Ψ) suggests that random experiments
> are more efficient because not all hyper-parameters are equally
> important to tune. Grid search experiments allocate too many trials to
> the exploration of dimensions that do not matter and suffer from poor
> coverage in dimensions that are important.
> """
>
> """
> Random experiments are also easier to carry out than grid experiments
> for practical reasons related to the statistical independence of every
> trial.
>
> • The experiment can be stopped any time and the trials form a
> complete experiment.
>
> • If extra computers become available, new trials can be added to an
> experiment without having to adjust the grid and commit to a much
> larger experiment.
>
> • Every trial can be carried out asynchronously.
>
> • If the computer carrying out a trial fails for any reason, its trial
> can be either abandoned or restarted without jeopardizing the
> experiment.
> """
>
> I wonder how this would transpose to scikit-learn models, which often
> have far fewer hyper-parameters than the average Deep Belief Network.
> Still, it's very interesting food for thought if someone wants to dive
> into improving the model selection tooling in the scikit.
>
> Maybe a new GSoC topic? Would anybody be interested as a mentor or candidate?
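For comparison, the random search the paper advocates needs almost no machinery; a minimal sketch, reusing the same SVC-on-digits setup and log-uniform ranges as above (again my illustrative assumptions, not anything from the paper):

# Minimal version of the random search the paper describes: draw each
# hyperparameter independently from a distribution instead of walking
# a grid. The SVC/digits setup and log-uniform ranges are illustrative
# assumptions, not from the paper.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)

best_score, best_params = -np.inf, None
for _ in range(25):
    # Every trial is statistically independent, so trials can run
    # asynchronously, be added later, or be abandoned, as the quoted
    # conclusion points out.
    C = 10.0 ** rng.uniform(-2, 4)
    gamma = 10.0 ** rng.uniform(-7, -1)
    score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, (C, gamma)

print("best (C, gamma):", best_params, "CV accuracy:", best_score)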
