Interesting!
Has anyone ever seen Gaussian process learning used for this sort of 
hyper-parameter estimation?  I'm thinking of something similar to the 
Kriging approach to likelihood surfaces, where a few random starting 
points are used to train a GP model (a la GPML), and that fitted surface 
is then optimized to suggest the next best location to try (or locations, 
if things are being done in parallel).  In this case, the points would be 
locations in hyper-parameter space, and the evaluation would be the 
cross-validation score.  It seems like this sort of approach could 
outperform the random selection used in this paper.
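
Something along these lines, purely as an illustration (this assumes the
current scikit-learn GaussianProcessRegressor API, uses an SVC on the
digits data as a stand-in for the model being tuned, and picks the next
point by expected improvement over a random candidate pool; none of these
choices come from the paper):

import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X, y = load_digits(return_X_y=True)

def cv_score(params):
    # params = (log10(C), log10(gamma)); the evaluation is the CV score
    C, gamma = 10.0 ** params
    return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

def sample(n):
    # random points in log10 hyper-parameter space
    return rng.uniform([-2.0, -5.0], [3.0, 0.0], size=(n, 2))

# a few random starting points to train the GP, as described above
trials = sample(5)
scores = np.array([cv_score(p) for p in trials])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                  normalize_y=True).fit(trials, scores)
    candidates = sample(1000)
    mu, sigma = gp.predict(candidates, return_std=True)
    # expected improvement over the best score seen so far
    z = (mu - scores.max()) / np.maximum(sigma, 1e-9)
    ei = (mu - scores.max()) * norm.cdf(z) + sigma * norm.pdf(z)
    nxt = candidates[np.argmax(ei)]
    trials = np.vstack([trials, nxt])
    scores = np.append(scores, cv_score(nxt))

best = np.argmax(scores)
print("best CV score %.4f at log10(C), log10(gamma) = %s"
      % (scores[best], trials[best]))

(The kernel, the acquisition rule and the candidate pool are all up for
debate, of course; the point is just that each new trial is informed by
the previous ones rather than being drawn blindly.)
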
   Jake

Olivier Grisel wrote:
> Some fresh news from the hyper-parameter tuning front lines:
>
>   http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
>
> Some interesting snippets from the conclusion (I have not yet read the
> rest of the paper):
>
> """
> We have shown that random experiments are more efficient than grid
> experiments for hyper-parameter optimization in the case of several
> learning algorithms on several data sets. Our analysis of the
> hyper-parameter response surface (Ψ) suggests that random experiments
> are more efficient because not all hyper-parameters are equally
> important to tune. Grid search experiments allocate too many trials to
> the exploration of dimensions that do not matter and suffer from poor
> coverage in dimensions that are important.
> """
>
> """
> Random experiments are also easier to carry out than grid experiments
> for practical reasons related to the statistical independence of every
> trial.
>
> • The experiment can be stopped any time and the trials form a
> complete experiment.
>
> • If extra computers become available, new trials can be added to an
> experiment without having to adjust the grid and commit to a much
> larger experiment.
>
> • Every trial can be carried out asynchronously.
>
> • If the computer carrying out a trial fails for any reason, its trial
> can be either abandoned or restarted without jeopardizing the
> experiment.
> """
>
> I wonder how this would transpose to scikit-learn models, which often
> have far fewer hyper-parameters than the average Deep Belief
> Network. Still, it's very interesting food for thought if someone
> wants to dive into improving the model selection tooling in the
> scikit.
>
> Maybe a new GSoC topic? Would anybody be interested as a mentor or candidate?
>
>   
