I think the class that you introduce should really be geared towards scikit-learn estimators. But there could be a "lower level" function that just optimizes a black-box function. That is probably desirable from a modularity standpoint and for testing anyhow.
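A minimal sketch of that layering. It assumes a hypothetical low-level function `minimize_black_box(func, bounds, ...)` that knows nothing about estimators: it takes a function of a parameter vector and a box of (low, high) bounds. To keep the sketch short, plain random search stands in for the GP-based proposal step. An estimator-oriented class could then call such a function with an objective built from an estimator, and the function itself can be tested on toy functions.

```python
import numpy as np


def minimize_black_box(func, bounds, n_iter=50, random_state=None):
    """Minimize ``func`` over the box given by ``bounds``.

    Hypothetical low-level API: ``func`` maps a 1-D array of parameters to a
    float and ``bounds`` is a sequence of (low, high) pairs. Random search is
    a stand-in for the GP-based proposal step.
    """
    rng = np.random.RandomState(random_state)
    bounds = np.asarray(bounds, dtype=float)
    best_x, best_y = None, np.inf
    for _ in range(n_iter):
        # Propose a candidate uniformly inside the box.
        x = rng.uniform(bounds[:, 0], bounds[:, 1])
        y = func(x)
        # Keep the best point seen so far.
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y


# Toy black-box function with its minimum at (0.3, 0.3).
x_opt, y_opt = minimize_black_box(
    lambda x: float(np.sum((x - 0.3) ** 2)), [(0, 1), (0, 1)],
    n_iter=200, random_state=0)
```

Testing the low-level function on a toy objective like this needs no data or estimator, which is the modularity benefit mentioned above.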

On 03/26/2015 05:07 PM, Christof Angermueller wrote:
GridSearchCV and RandomizedSearchCV inherit from BaseCV and require an estimator object with fit() and predict() as their first constructor argument. Hence, the estimator must follow the sklearn convention with fit() and predict(). Alternatively, the estimator could be implemented as a black-box function f(x) that takes some arguments and returns a value, as is done in spearmint. This makes it easier to optimize any algorithm, not just those implemented in sklearn.
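One way to reconcile the two conventions is to wrap a fit()/predict() estimator into a black-box function of its hyperparameters. The sketch assumes the current `sklearn.model_selection.cross_val_score` (in 2015 it lived in `sklearn.cross_validation`); `make_cv_objective` is a hypothetical helper, not an existing scikit-learn API.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def make_cv_objective(estimator, X, y, cv=3):
    """Turn an estimator into a black-box function f(**params) -> loss."""
    def objective(**params):
        # Work on a fresh copy so repeated calls do not share fitted state.
        model = clone(estimator).set_params(**params)
        scores = cross_val_score(model, X, y, cv=cv)
        # Negate the mean score so that lower is better.
        return -np.mean(scores)
    return objective


X, y = load_iris(return_X_y=True)
f = make_cv_objective(LogisticRegression(max_iter=1000), X, y)
loss = f(C=1.0)
```

With this wrapper, a search class only ever needs to see a black-box function, while users of sklearn estimators can keep passing an estimator.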

For consistency, GPSearchCV should also inherit from BaseCV. But what do you think about extending the current interface to make it easier to optimize any learner?

Christof


On 2015-03-26 20:02, Christof Angermueller wrote:
Hi Andy and others,

I revised my proposal (https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing) and submitted it to Melange. Could you have a look and tell me whether anything essential (formal) is missing?
I will submit the final version tomorrow.

Cheers,
Christof

On 2015-03-26 16:08, Andreas Mueller wrote:
Hi Matthias.
As far as I know, the main goal for TPE was to support tree-structured parameter spaces. I am not sure we want to go there yet because of the more complex API. On non-tree structured spaces, I think TPE performed worse than SMAC and GP.
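To make "tree-structured parameter space" concrete: in the hypothetical encoding below, `gamma` exists only when `kernel` is `"rbf"`, so a sampler must follow the chosen branch. This is an illustrative format, not hyperopt's API. For comparison, scikit-learn's `GridSearchCV` already accepts a list of dicts for such conditional grids.

```python
import math
import random

# Hypothetical tree-structured space: which hyperparameters exist depends on
# the value chosen for "kernel".
SPACE = {
    "kernel": {
        "linear": {"C": (1e-3, 1e3)},
        "rbf": {"C": (1e-3, 1e3), "gamma": (1e-4, 1e1)},
    }
}


def sample(space, rng):
    """Draw one configuration, visiting only the active branch."""
    params = {}
    for name, choices in space.items():
        choice = rng.choice(sorted(choices))
        params[name] = choice
        for sub_name, (low, high) in choices[choice].items():
            # Sample scale parameters log-uniformly between low and high.
            params[sub_name] = 10 ** rng.uniform(math.log10(low),
                                                 math.log10(high))
    return params


config = sample(SPACE, random.Random(0))
```

A GP over a fixed-length vector has no natural place for the inactive `gamma`, which is part of why such spaces complicate the API.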

With regard to your code: There might be touchy legal issues involved if you didn't publish your code and we base our implementation on it. If your code is public and BSD / MIT licensed, it would probably be much safer. Why don't you just push your code under a permissive license?

Thank you for providing your benchmarks, they might be quite helpful.

Cheers,
Andy



On 03/26/2015 11:17 AM, Matthias Feurer wrote:
Dear Christof, dear scikit-learn team,

This is a great idea. I strongly encourage your plan to integrate Bayesian optimization into scikit-learn, since automatically configuring scikit-learn is quite powerful. The three winning teams of the first automated machine learning competition did exactly that: https://sites.google.com/a/chalearn.org/automl/

I am writing this e-mail because our research group on learning, optimization and automated algorithm design (http://aad.informatik.uni-freiburg.de/) is working on very similar things, which might be useful in this context. Some people in our lab (together with people from other universities) developed a framework for robust Bayesian optimization with minimal external dependencies. It currently depends on GPy, but this dependency could easily be replaced by the scikit-learn GP. It is probably not as lightweight as you would want for scikit-learn, but you might want to have a look at the source code. I will provide a link as soon as the project is public (which will be soon). In the meantime, I can grant read access to those who are interested. It might be helpful for you to have a look at the structure of the module.
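For readers new to the technique, a minimal sketch of a GP-based Bayesian optimization loop on a 1-D black-box function. It assumes the current scikit-learn `GaussianProcessRegressor` API (which replaced the older `GaussianProcess` class after this thread), a Matern kernel, and expected improvement as the acquisition function, maximized over a fixed candidate grid. The names `gp_minimize_1d` and `expected_improvement` are hypothetical; this is not the Freiburg framework's code.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """Expected improvement (for minimization) at each candidate point."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # guard against zero predictive std
    improvement = y_best - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def gp_minimize_1d(func, low, high, n_init=3, n_iter=15, random_state=0):
    """Minimize a scalar black-box function on [low, high] with a GP."""
    rng = np.random.RandomState(random_state)
    # Initial design: a few uniformly random points.
    X = rng.uniform(low, high, size=(n_init, 1))
    y = np.array([func(x[0]) for x in X])
    # Candidates at which the acquisition function is evaluated.
    grid = np.linspace(low, high, 500).reshape(-1, 1)
    for _ in range(n_iter):
        # Refit the surrogate on all observations so far; the small alpha
        # adds jitter in case a point is evaluated twice.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True,
                                      random_state=random_state)
        gp.fit(X, y)
        # Evaluate the true function where expected improvement is largest.
        x_next = grid[np.argmax(expected_improvement(grid, gp, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, func(x_next[0]))
    best = np.argmin(y)
    return X[best, 0], y[best]


x_best, y_best = gp_minimize_1d(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

The grid-based maximization of the acquisition function only makes sense in low dimensions; a real implementation would need a proper inner optimizer.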

Besides these remarks, I think that using a GP is a good way to tune the few hyperparameters of a single model. Another remark: Instead of comparing GPSearchCV to spearmint only, you should also consider the TPE algorithm implemented in hyperopt (https://github.com/hyperopt/hyperopt). You could consider the following benchmarks:

1. Together with a fellow student I implemented a library called HPOlib, which provides a few benchmarks for hyperparameter optimization (for example, some from the 2012 spearmint paper): https://github.com/automl/HPOlib
   It is further described in this paper: http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use sklearn.feature_selection.SelectPercentile with a fixed scoring function together with a classification algorithm. It adds a single hyperparameter, which should be a good fit for the GP.
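A sketch of the small pipeline from point 2, assuming synthetic data from `make_classification` and `LogisticRegression` as the classifier (both are stand-ins; the e-mail names neither a dataset nor a classifier). The scoring function is fixed to `f_classif`, so `percentile` is the only hyperparameter, and `objective` is the kind of 1-D black-box function a GP could model.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# Fixed scoring function (ANOVA F-value); only "percentile" is tuned.
pipe = make_pipeline(SelectPercentile(score_func=f_classif),
                     LogisticRegression(max_iter=1000))


def objective(percentile):
    """Mean 5-fold CV accuracy for a given feature percentile."""
    pipe.set_params(selectpercentile__percentile=percentile)
    return cross_val_score(pipe, X, y, cv=5).mean()


scores = {p: objective(p) for p in (5, 10, 25, 50, 100)}
```

Since `percentile` is a single continuous value in (0, 100], a GP search would replace the hand-picked list above.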

Best regards,
Matthias




------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now.http://goparallel.sourceforge.net/


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general



--
Christof Angermueller
cangermuel...@gmail.com
http://cangermueller.com
