Dear scikit-learn team,

In light of Christoph Angermüller's proposal to enhance scikit-learn with Bayesian optimization as a GSoC project (http://sourceforge.net/p/scikit-learn/mailman/message/33630274/), you might also want to think again about integrating a hyperparameter concept into scikit-learn.

Our group built a framework called ParamSklearn (https://bitbucket.org/mfeurer/paramsklearn/overview), which provides hyperparameter definitions for a subset of the classifiers, regressors and preprocessors in scikit-learn. The result is similar to what James Bergstra did in hpsklearn (https://github.com/hyperopt/hyperopt-sklearn) and to a post from 2010 (http://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/aanlktilvznvavqr-sbiixcguwyuf6jyq_ijvytdx7...@mail.gmail.com/?page=0). In the end you get a configuration space that can be read by a Sequential Model-based Optimization package. For example, we used this module for our AutoSklearn entry in the first automated machine learning competition: https://sites.google.com/a/chalearn.org/automl/

Optimizing hyperparameters is a challenge in itself, but defining relevant ranges is also difficult for non-experts. Thus, it would be nice to find a way to integrate the hyperparameter definitions into scikit-learn (see bottom of this e-mail for a suggestion) such that they can be used by the not-yet-existing GPSearchCV, by the already existing RandomizedSearchCV and GridSearchCV, and also by external tools like our ParamSklearn. With such definitions in place, a user would be left with only two mandatory choices: the estimator to use and the number of evaluations (or the runtime).

What do you think?

Best regards,
Matthias Feurer

Currently, we define the hyperparameters with a package called HPOlibConfigSpace (https://github.com/automl/HPOlibConfigSpace). For the SVC it looks like this:

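from HPOlibConfigSpace.configuration_space import ConfigurationSpace
from HPOlibConfigSpace.conditions import EqualsCondition, InCondition
from HPOlibConfigSpace.hyperparameters import UniformFloatHyperparameter, \
    UniformIntegerHyperparameter, CategoricalHyperparameter, \
    UnParametrizedHyperparameter

C = UniformFloatHyperparameter("C", 0.03125, 32768, log=True, default=1.0)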
kernel = CategoricalHyperparameter(name="kernel",
    choices=["rbf", "poly", "sigmoid"], default="rbf")
degree = UniformIntegerHyperparameter("degree", 1, 5, default=3)
gamma = UniformFloatHyperparameter("gamma", 3.0517578125e-05, 8,
    log=True, default=0.1)
coef0 = UniformFloatHyperparameter("coef0", -1, 1, default=0)
shrinking = CategoricalHyperparameter("shrinking", ["True", "False"],
                                      default="True")
tol = UniformFloatHyperparameter("tol", 1e-5, 1e-1, default=1e-4,
                                 log=True)
class_weight = CategoricalHyperparameter("class_weight",
    ["None", "auto"], default="None")
max_iter = UnParametrizedHyperparameter("max_iter", -1)

cs = ConfigurationSpace()
cs.add_hyperparameter(C)
cs.add_hyperparameter(kernel)
cs.add_hyperparameter(degree)
cs.add_hyperparameter(gamma)
cs.add_hyperparameter(coef0)
cs.add_hyperparameter(shrinking)
cs.add_hyperparameter(tol)
cs.add_hyperparameter(class_weight)
cs.add_hyperparameter(max_iter)

degree_depends_on_poly = EqualsCondition(degree, kernel, "poly")
coef0_condition = InCondition(coef0, kernel, ["poly", "sigmoid"])
cs.add_condition(degree_depends_on_poly)
cs.add_condition(coef0_condition)
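
To illustrate what the two conditions above express, here is a minimal plain-Python sketch. It does not use HPOlibConfigSpace; the names CONDITIONS, log_uniform and sample_configuration are made up for this example, and only a subset of the SVC hyperparameters is included. It draws a random configuration and drops degree and coef0 whenever the sampled kernel makes them inactive.

```python
import math
import random

# Hypothetical stand-in for the conditions above:
# child hyperparameter -> (parent, parent values that activate the child).
CONDITIONS = {
    "degree": ("kernel", {"poly"}),
    "coef0": ("kernel", {"poly", "sigmoid"}),
}


def log_uniform(rng, low, high):
    """Draw a float whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_configuration(rng):
    """Sample every hyperparameter, then remove the inactive ones."""
    config = {
        "C": log_uniform(rng, 0.03125, 32768),
        "kernel": rng.choice(["rbf", "poly", "sigmoid"]),
        "degree": rng.randint(1, 5),
        "gamma": log_uniform(rng, 3.0517578125e-05, 8),
        "coef0": rng.uniform(-1, 1),
    }
    for child, (parent, allowed) in CONDITIONS.items():
        if config[parent] not in allowed:
            del config[child]
    return config


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_configuration(rng))
```

An optimizer that knows about these conditions never spends evaluations on, say, different degree values for an rbf kernel, where they would have no effect.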

The SVC definition above is more verbose than it has to be, but we are working on this. The ConfigurationSpace object could then be exposed by each estimator through a @staticmethod and be used as a parameter description object inside *SearchCV. We can provide a stripped-down version of HPOlibConfigSpace for integration in sklearn.externals, as well as the hyperparameter definitions we have so far.
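
As a rough sketch of how the @staticmethod idea might look, assuming nothing about the eventual scikit-learn API: the class ToyEstimator, the method name get_hyperparameter_search_space and the function random_search below are all hypothetical, and the score method is only a stand-in for a cross-validated score. The point is that the search routine needs only the estimator class and a number of evaluations.

```python
import math
import random


class ToyEstimator:
    """Hypothetical estimator that ships its own hyperparameter definitions."""

    def __init__(self, alpha=1.0, mode="a"):
        self.alpha = alpha
        self.mode = mode

    @staticmethod
    def get_hyperparameter_search_space():
        # Assumed format: name -> callable that draws one value from an RNG.
        return {
            "alpha": lambda rng: 10 ** rng.uniform(-3, 3),  # log-uniform
            "mode": lambda rng: rng.choice(["a", "b"]),
        }

    def score(self):
        # Stand-in for a cross-validated score; best at alpha=1, mode="b".
        penalty = abs(math.log10(self.alpha))
        return -penalty + (0.5 if self.mode == "b" else 0.0)


def random_search(estimator_cls, n_iter, seed=0):
    """Random search driven only by the estimator's own definitions."""
    rng = random.Random(seed)
    space = estimator_cls.get_hyperparameter_search_space()
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in space.items()}
        score = estimator_cls(**params).score()
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    print(random_search(ToyEstimator, n_iter=50))
```

A real integration would return a ConfigurationSpace (or something equivalent) instead of a dict of callables, so that conditions and defaults are available to the search as well.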
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
