I submitted my final proposal to Melange.
Thanks everybody for your suggestions!
Christof
On 20150326 22:51, Andy wrote:
I think you should focus on first creating a prototype without
ParamSklearn.
On 03/26/2015 06:19 PM, Christof Angermueller wrote:
Hi Matthias,
Using HPOlib to benchmark GPSearchCV on the same datasets that were
used to benchmark spearmint, TPE, and SMAC is a good idea, and I
will include it in my proposal. However, I plan to primarily compare
GPSearchCV with GridSearchCV and RandomizedSearchCV, with spearmint
as the only external optimizer. Including TPE and SMAC is
optional; I could do that after GSoC or in the unlikely case that
time is left at the end.
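The two scikit-learn baselines named above can be run side by side with the same evaluation budget. The following is only a minimal sketch: GPSearchCV is the proposed estimator and is not part of scikit-learn, so it is not shown. The dataset (iris), the SVC model, and the parameter ranges are assumptions chosen for illustration, and the imports assume a recent scikit-learn where both classes live in sklearn.model_selection.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# Illustrative dataset and model (assumptions, not from the thread).
X, y = load_iris(return_X_y=True)

# Grid search: 4 x 4 = 16 fixed parameter combinations.
grid = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]},
    cv=3,
)
grid.fit(X, y)

# Random search: the same budget of 16 evaluations, sampled
# log-uniformly from the same ranges.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-1, 1e2), "gamma": loguniform(1e-3, 1)},
    n_iter=16,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print("grid  :", grid.best_params_, grid.best_score_)
print("random:", rand.best_params_, rand.best_score_)
```

Keeping the number of evaluations equal across optimizers is what makes such a comparison fair; a GP-based search would be given the same budget.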
At the current stage, I cannot tell you whether I will use ParamSklearn
to define hyperparameters. Maybe I will come back to you when I think
more carefully about how to define parameters.
Thanks for your suggestions,
Christof
On 20150326 16:08, Andreas Mueller wrote:
Hi Matthias.
As far as I know, the main goal for TPE was to support
tree-structured parameter spaces. I am not sure we want to go there
yet because of the more complex API.
On non-tree structured spaces, I think TPE performed worse than SMAC
and GP.
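To make the distinction concrete: in a tree-structured (conditional) space, some parameters only exist for certain values of another parameter. The sketch below uses scikit-learn's existing ParameterGrid, which accepts a list of dicts, to contrast a flat space with a small conditional one. The SVM-style parameter names and values are assumptions chosen for illustration; this is not how TPE or hyperopt represent such spaces.

```python
from sklearn.model_selection import ParameterGrid

# Flat space: every parameter is always active.
flat = ParameterGrid({"C": [1, 10], "gamma": [0.01, 0.1]})

# Conditional space: "gamma" only exists when kernel == "rbf".
conditional = ParameterGrid([
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
])

print(len(flat))         # 4 combinations
print(len(conditional))  # 2 (linear) + 4 (rbf) = 6 combinations

for params in conditional:
    print(params)
```

A GP over a flat space can treat each configuration as a fixed-length vector; in the conditional case the set of active dimensions changes from point to point, which is part of what makes the API more complex.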
With regard to your code: There might be touchy legal issues
involved if you didn't publish your code and we base our
implementation on it.
If your code is public and BSD / MIT licensed, it would probably be
much safer. Why don't you just push your code under a permissive
license?
Thank you for providing your benchmarks; they might be quite helpful.
Cheers,
Andy
On 03/26/2015 11:17 AM, Matthias Feurer wrote:
Dear Christof, dear scikit-learn team,
This is a great idea. I strongly encourage integrating
Bayesian optimization into scikit-learn, since automatically
configuring scikit-learn is quite powerful. The three winning teams
of the first automated machine learning competition did exactly
that: https://sites.google.com/a/chalearn.org/automl/
I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our
lab (together with some people from other universities) developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency
could easily be replaced by the scikit-learn GP. It is probably not
as lightweight as you would want for scikit-learn, but you
might want to have a look at the source code. I will provide a link
as soon as the project is public (which is soon). In the meantime,
I can grant read access to those who are interested. It might be
helpful for you to have a look at the structure of the module.
Besides these remarks, I think that using a GP is a good way to
tune the few hyperparameters of a single model. Another remark:
Instead of comparing GPSearchCV to spearmint only, you should also
consider the TPE algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the
following benchmarks:
1. Together with a fellow student I implemented a library called
HPOlib, which provides a few benchmarks for hyperparameter
optimization (for example some from the 2012 spearmint paper):
https://github.com/automl/HPOlib
It is further described in this paper:
http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring
function together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
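The small pipeline in item 2 could be sketched as follows, with `percentile` as the single hyperparameter and a simple GP-driven search over it. This is only an illustration under stated assumptions, not the proposed GPSearchCV: the synthetic dataset, the LogisticRegression classifier, the f_classif scoring function, the Matern kernel, the expected-improvement acquisition, and the helper names (`cv_score`, `candidates`, `tried`) are all hypothetical choices made here.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data (an assumption; any classification dataset would do).
X, y = make_classification(n_samples=300, n_features=40, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectPercentile(score_func=f_classif)),  # fixed scoring function
    ("clf", LogisticRegression(max_iter=1000)),
])


def cv_score(percentile):
    """Mean 3-fold CV accuracy of the pipeline for one percentile value."""
    pipe.set_params(select__percentile=int(percentile))
    return cross_val_score(pipe, X, y, cv=3).mean()


candidates = np.arange(5, 101, 5)   # values of the single hyperparameter
tried = [5, 50, 100]                # small initial design
scores = [cv_score(p) for p in tried]

for _ in range(5):
    # Fit a GP to the (percentile, score) pairs observed so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3,
                                  normalize_y=True, random_state=0)
    gp.fit(np.array(tried, dtype=float).reshape(-1, 1) / 100.0, scores)
    mu, sigma = gp.predict(candidates.reshape(-1, 1) / 100.0, return_std=True)

    # Expected improvement over the best score seen so far.
    best = max(scores)
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    ei[np.isin(candidates, tried)] = -np.inf  # never re-evaluate a point

    nxt = int(candidates[np.argmax(ei)])
    tried.append(nxt)
    scores.append(cv_score(nxt))

best_percentile = tried[int(np.argmax(scores))]
print("best percentile:", best_percentile, "score:", max(scores))
```

With only one dimension, the GP has very little to model, which is why a pipeline like this makes a convenient first benchmark before moving to larger spaces.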
Best regards,
Matthias
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now.http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Christof Angermueller
cangermuel...@gmail.com
http://cangermueller.com