Hi Matthias.
I think that is an interesting direction to go in, and I have
actually thought a bit about whether and how we could add something
like that to scikit-learn.
Is there online documentation for paramsklearn?
It is a bit hard to say what good defaults are, I think, and they
often encode intuition about the problem.
The parameter spaces that you want to search probably also differ
between GridSearchCV and a model-based approach.
Do you have any examples or benchmarks available online?
Cheers,
Andy
On 03/24/2015 03:50 PM, Matthias Feurer wrote:
Dear scikit-learn team,
After reading Christoph Angermüller's proposal to enhance
scikit-learn with Bayesian optimization
(http://sourceforge.net/p/scikit-learn/mailman/message/33630274/) as a
GSoC project, you might also want to think again about integrating a
hyperparameter concept into scikit-learn.
Our group built a framework called ParamSklearn
(https://bitbucket.org/mfeurer/paramsklearn/overview), which provides
hyperparameter definitions for a subset of classifiers, regressors and
preprocessors in scikit-learn. The result is similar to what James
Bergstra did in hpsklearn
(https://github.com/hyperopt/hyperopt-sklearn) and to a post from 2010
(http://sourceforge.net/p/scikit-learn/mailman/scikit-learn-general/thread/aanlktilvznvavqr-sbiixcguwyuf6jyq_ijvytdx7...@mail.gmail.com/?page=0).
In the end you get a configuration space which can then be read by a
Sequential Model-based Optimization package. For example, we used this
module for our AutoSklearn entry in the first automated machine
learning competition: https://sites.google.com/a/chalearn.org/automl/
Optimizing hyperparameters is a challenge in itself, but defining
relevant ranges is also a difficult task for non-experts. Thus, it
would be nice to find a way to integrate the hyperparameter
definitions into scikit-learn (see the bottom of this e-mail for a
suggestion) such that they can be used by the not-yet-existing
GPSearchCV, by the already existing RandomizedSearchCV and
GridSearchCV, and also by external tools like our ParamSklearn. The
hyperparameter definitions would leave a user with only two mandatory
choices: the number of evaluations/runtime and the estimator to use.
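For comparison, this is roughly what a user has to write by hand
today; a minimal sketch against the scikit-learn 0.16 module layout,
where the distributions and ranges are purely illustrative:

# Today the user must pick distributions and ranges themselves.
# Module path follows scikit-learn 0.16 (sklearn.grid_search); the
# scales below are illustrative guesses, not recommendations.
from scipy.stats import expon
from sklearn.svm import SVC
from sklearn.grid_search import RandomizedSearchCV

param_distributions = {
    "C": expon(scale=10),                  # user must guess a scale
    "gamma": expon(scale=0.1),             # likewise
    "kernel": ["rbf", "poly", "sigmoid"],  # sampled uniformly
}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=50)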
What do you think?
Best regards,
Matthias Feurer
Currently, we define the hyperparameters with a package called
HPOlibConfigSpace (https://github.com/automl/HPOlibConfigSpace). For
the SVC it looks like this:
# Imports assuming the HPOlibConfigSpace package layout.
from HPOlibConfigSpace.configuration_space import ConfigurationSpace
from HPOlibConfigSpace.hyperparameters import (
    CategoricalHyperparameter, UniformFloatHyperparameter,
    UniformIntegerHyperparameter, UnParametrizedHyperparameter)
from HPOlibConfigSpace.conditions import EqualsCondition, InCondition

# Each hyperparameter gets a range and a default; log=True means the
# value is sampled on a logarithmic scale.
C = UniformFloatHyperparameter("C", 0.03125, 32768, log=True, default=1.0)
kernel = CategoricalHyperparameter(name="kernel",
                                   choices=["rbf", "poly", "sigmoid"],
                                   default="rbf")
degree = UniformIntegerHyperparameter("degree", 1, 5, default=3)
gamma = UniformFloatHyperparameter("gamma", 3.0517578125e-05, 8,
                                   log=True, default=0.1)
coef0 = UniformFloatHyperparameter("coef0", -1, 1, default=0)
shrinking = CategoricalHyperparameter("shrinking", ["True", "False"],
                                      default="True")
tol = UniformFloatHyperparameter("tol", 1e-5, 1e-1, default=1e-4,
                                 log=True)
class_weight = CategoricalHyperparameter("class_weight",
                                         ["None", "auto"], default="None")
max_iter = UnParametrizedHyperparameter("max_iter", -1)

# Collect everything in a single configuration space.
cs = ConfigurationSpace()
cs.add_hyperparameter(C)
cs.add_hyperparameter(kernel)
cs.add_hyperparameter(degree)
cs.add_hyperparameter(gamma)
cs.add_hyperparameter(coef0)
cs.add_hyperparameter(shrinking)
cs.add_hyperparameter(tol)
cs.add_hyperparameter(class_weight)
cs.add_hyperparameter(max_iter)

# Conditions encode when a hyperparameter is active: "degree" only
# matters for the polynomial kernel, "coef0" only for "poly"/"sigmoid".
degree_depends_on_poly = EqualsCondition(degree, kernel, "poly")
coef0_condition = InCondition(coef0, kernel, ["poly", "sigmoid"])
cs.add_condition(degree_depends_on_poly)
cs.add_condition(coef0_condition)
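To illustrate how such a space can be consumed, here is a
hypothetical sketch of a plain random search over it;
sample_configuration() and get_dictionary() are assumed method names
modeled on what samplers in such packages typically provide, not
necessarily the verbatim HPOlibConfigSpace API:

# Hypothetical consumption sketch. Note that the string-valued
# categoricals above ("True"/"False"/"None") have to be cast back to
# Python objects before they can be passed to SVC.
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_score  # 0.16 layout

_CAST = {"True": True, "False": False, "None": None}

def random_search(cs, X, y, n_evaluations=50):
    best_score, best_params = -float("inf"), None
    for _ in range(n_evaluations):
        config = cs.sample_configuration()  # assumed: random valid config
        params = {k: _CAST.get(v, v)
                  for k, v in config.get_dictionary().items()}
        score = cross_val_score(SVC(**params), X, y).mean()
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score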
The code is more verbose than it has to be, but we are working on
reducing this. The ConfigurationSpace object can then be exposed via
a @staticmethod and used as a parameter description object inside
*SearchCV. We can provide a stripped-down version of
HPOlibConfigSpace for integration into sklearn.externals, as well as
the hyperparameter definitions we have so far.
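Concretely, the @staticmethod access could look roughly like this (a
minimal sketch; the method name get_hyperparameter_search_space and
the GPSearchCV class are hypothetical, not existing scikit-learn API):

# Minimal sketch of the suggested integration, reusing the cs object
# built above; the names flagged as hypothetical are not real API.
from sklearn.svm import SVC

class ConfigurableSVC(SVC):

    @staticmethod
    def get_hyperparameter_search_space():
        # Ranges, defaults and conditions all live in one place, so a
        # *SearchCV or an external tool can read them directly.
        return cs

# A search tool would then need only the estimator and a budget:
# search = GPSearchCV(ConfigurableSVC(), n_evaluations=100)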