2013/2/19 James Bergstra <james.bergs...@gmail.com>:
> Further to this: I started a project on github to look at how to
> combine hyperopt with sklearn.
> https://github.com/jaberg/hyperopt-sklearn
>
> I've only wrapped one algorithm so far: Perceptron
> https://github.com/jaberg/hyperopt-sklearn/blob/master/hpsklearn/perceptron.py
>
> My idea is that little files like perceptron.py would encode
> (a) domain expertise about what values make sense for a particular
> hyper-parameter (see the `search_space()` function) and
> (b) a sklearn-style fit/predict interface that encapsulates search
> over those hyper-parameters (see `AutoPerceptron`)

I'm not sure what your long-term goals with this project are, but I
see three problems with this approach:

1. The values might be problem-dependent rather than
   estimator-dependent. In your example, you're optimizing for
   accuracy, but you might want to optimize for F1-score instead.
2. The number of estimators is *huge* if you also consider
   combinations like SelectKBest(chi2) -> RBFSampler -> SGDClassifier
   pipelines (a classifier that I was trying out only yesterday).
3. The estimator parameters change sometimes, so this would have to
   be kept in sync with scikit-learn.

When I wrote the scikit-learn wrapper for NLTK [1], I chose a strategy
where *no scikit-learn code is imported at all* (except when the user
runs the demo or unit tests). Instead, the user is responsible for
importing it and constructing the appropriate estimator. This makes
the code robust to API changes, and it can handle arbitrarily complex
sklearn.pipeline.Pipeline objects, as well as estimators that follow
the API conventions but are not in scikit-learn proper.

I think a similar approach can be followed here. While some
suggestions for parameters to try might be shipped as examples, an
estimator- and evaluation-agnostic wrapper class ("meta-estimator") is
a stronger basis for a package like the one you're writing.
scikit-learn's own GridSearchCV is also implemented like this, to a
large extent.

[1] https://github.com/nltk/nltk/blob/f7f3b73f0f051639d87cfeea43b0aabf6f167b8f/nltk/classify/scikitlearn.py

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
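
To make the estimator- and evaluation-agnostic "meta-estimator" idea above concrete, here is a minimal sketch. It is not code from hyperopt-sklearn, NLTK, or scikit-learn. The names `SearchWrapper`, `ThresholdClassifier` and `accuracy` are hypothetical, and a plain exhaustive loop stands in for hyperopt's search. The only assumption about the estimator is that it offers `set_params`, `fit` and `predict`, so nothing from scikit-learn is imported.

```python
import copy
import itertools


class SearchWrapper:
    """Hypothetical meta-estimator: tries parameter settings on any object
    that follows the set_params/fit/predict conventions."""

    def __init__(self, estimator, param_grid, score_func):
        self.estimator = estimator    # constructed by the user, never imported here
        self.param_grid = param_grid  # dict: parameter name -> candidate values
        self.score_func = score_func  # callable(y_true, y_pred) -> float, higher is better

    def fit(self, X_train, y_train, X_valid, y_valid):
        names = sorted(self.param_grid)
        best_score = None
        # Exhaustive loop over all combinations (a stand-in for a smarter search).
        for values in itertools.product(*(self.param_grid[n] for n in names)):
            params = dict(zip(names, values))
            candidate = copy.deepcopy(self.estimator)  # leave the user's object untouched
            candidate.set_params(**params)
            candidate.fit(X_train, y_train)
            score = self.score_func(y_valid, candidate.predict(X_valid))
            if best_score is None or score > best_score:
                best_score = score
                self.best_params_ = params
                self.best_estimator_ = candidate
        self.best_score_ = best_score
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)


class ThresholdClassifier:
    """Toy estimator outside scikit-learn that follows the same conventions."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        return self  # nothing to learn; the threshold is the only parameter

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


def accuracy(y_true, y_pred):
    """User-supplied metric; an F1 function could be passed instead."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


X = [0.1, 0.4, 0.6, 0.9]
y = [0, 0, 1, 1]
search = SearchWrapper(ThresholdClassifier(), {"threshold": [0.0, 0.5, 1.0]}, accuracy)
search.fit(X, y, X, y)  # same data for training and validation, only to keep the demo short
print(search.best_params_, search.best_score_)
```

Because the metric and the estimator are both passed in by the caller, swapping accuracy for F1 or a single classifier for a pipeline needs no change to the wrapper. A scikit-learn Pipeline also exposes `set_params` (with `step__param` names), so it should be usable here as-is, though this sketch does not test that.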