yes:

http://people.fas.harvard.edu/~bergstra/files/pub/11_nips_hyperopt.pdf

and a nice blog post by Alex Passos:

http://atpassos.posterous.com/bayesian-optimization
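
For concreteness, here is a rough sketch of the kind of loop Jake
describes below: fit a GP surrogate to the cross-validation scores seen
so far, then pick the next hyper-parameter point by maximizing an upper
confidence bound on the surrogate. The SVC-on-digits setup, the log-scale
parameter ranges and the UCB criterion are just illustrative choices on
my part, not something taken from the papers above:

import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_score
from sklearn.gaussian_process import GaussianProcess

digits = load_digits()
X, y = digits.data, digits.target

def cv_score(log_C, log_gamma):
    """Cross-validation score at one point in (log10 C, log10 gamma) space."""
    return cross_val_score(SVC(C=10 ** log_C, gamma=10 ** log_gamma),
                           X, y, cv=3).mean()

rng = np.random.RandomState(0)
bounds = np.array([[-2., 3.], [-5., 0.]])  # illustrative log10 ranges for C, gamma

# a few random starting points, as in the Kriging setting Jake mentions
params = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
scores = np.array([cv_score(c, g) for c, g in params])

# fixed pool of candidate points at which the surrogate is evaluated
candidates = rng.uniform(bounds[:, 0], bounds[:, 1], size=(200, 2))

for it in range(10):
    # fit the GP surrogate to the (hyper-parameters, CV score) pairs;
    # the small nugget accounts for the noise in CV scores
    gp = GaussianProcess(theta0=1e-1, thetaL=1e-3, thetaU=1., nugget=1e-3)
    gp.fit(params, scores)
    mu, mse = gp.predict(candidates, eval_MSE=True)
    # upper confidence bound: favour high predicted score or high uncertainty
    ucb = mu + 1.96 * np.sqrt(mse)
    pick = np.argmax(ucb)
    next_point = candidates[pick]
    candidates = np.delete(candidates, pick, axis=0)  # avoid duplicate points
    params = np.vstack([params, next_point])
    scores = np.append(scores, cv_score(*next_point))

best = np.argmax(scores)
print("best CV score %0.3f at log10(C), log10(gamma) = %s"
      % (scores[best], params[best]))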

Alex

On Thu, Mar 8, 2012 at 9:25 PM, Jacob VanderPlas
<[email protected]> wrote:
> Interesting!
> Has anyone ever seen Gaussian process learning used for this sort of
> hyperparameter estimation?  I'm thinking of something similar to the
> Kriging approach to likelihood surfaces, where some random starting
> points are used to train a GPML solution, and this surface is minimized
> to guess the next best location to try (or locations, if things are
> being done in parallel).  In this case, the points would be locations in
> hyper-parameter space, and the evaluation is the cross-validation score.
> It seems like this sort of approach could outperform the random
> selection used in this paper.
>   Jake
>
> Olivier Grisel wrote:
>> Some fresh news from the hyper-parameter tuning front lines:
>>
>>   http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf
>>
>> Some interesting snippets from the conclusion (I have not yet read the
>> rest of the paper):
>>
>> """
>> We have shown that random experiments are more efficient than grid
>> experiments for hyper-parameter optimization in the case of several
>> learning algorithms on several data sets. Our analysis of the
>> hyper-parameter response surface (Ψ) suggests that random experiments
>> are more efficient because not all hyper-parameters are equally
>> important to tune. Grid search experiments allocate too many trials to
>> the exploration of dimensions that do not matter and suffer from poor
>> coverage in dimensions that are important.
>> """
>>
>> """
>> Random experiments are also easier to carry out than grid experiments
>> for practical reasons related to the statistical independence of every
>> trial.
>>
>> • The experiment can be stopped any time and the trials form a
>> complete experiment.
>>
>> • If extra computers become available, new trials can be added to an
>> experiment without having to adjust the grid and commit to a much
>> larger experiment.
>>
>> • Every trial can be carried out asynchronously.
>>
>> • If the computer carrying out a trial fails for any reason, its trial
>> can be either abandoned or restarted without jeopardizing the
>> experiment.
>> """
>>
>> I wonder how this would transpose to scikit-learn models, which
>> often have far fewer hyper-parameters than the average Deep Belief
>> Network. Still, it's very interesting food for thought if someone
>> wants to dive into improving the model selection tooling in the
>> scikit.
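
Plain random search itself needs almost no new tooling in the scikit, by
the way; a minimal sketch of such a loop (the SVC-on-digits setup and the
log-uniform ranges are again just illustrative choices):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_score

digits = load_digits()
X, y = digits.data, digits.target

rng = np.random.RandomState(0)
n_trials = 30  # every trial is independent, so they could run asynchronously

best_score, best_params = -np.inf, None
for i in range(n_trials):
    # draw each hyper-parameter independently (log-uniform, illustrative ranges)
    trial = {"C": 10 ** rng.uniform(-2, 3), "gamma": 10 ** rng.uniform(-5, 0)}
    score = cross_val_score(SVC(**trial), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, trial

print("best CV score %0.3f with %r" % (best_score, best_params))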
>>
>> Maybe a new GSoC topic? Would anybody be interested as a mentor or candidate?
>>
>>
>
