2013/2/19 James Bergstra <james.bergs...@gmail.com>:
> Further to this: I started a project on github to look at how to
> combine hyperopt with sklearn.
> https://github.com/jaberg/hyperopt-sklearn
>
> I've only wrapped one algorithm so far: Perceptron
> https://github.com/jaberg/hyperopt-sklearn/blob/master/hpsklearn/perceptron.py
>
> My idea is that little files like perceptron.py would encode
> (a) domain expertise about what values make sense for a particular
> hyper-parameter (see the `search_space()` function) and
> (b) a sklearn-style fit/predict interface that encapsulates search
> over those hyper-parameters (see `AutoPerceptron`)

I'm not sure what your long-term goals with this project are, but I
see three problems with this approach:
1. The sensible values might be problem-dependent rather than
estimator-dependent. In your example, you're optimizing for accuracy,
but you might want to optimize for F1-score instead.
2. The number of estimators is *huge* if you also consider
combinations like SelectKBest(chi2) -> RBFSampler -> SGDClassifier
pipelines (a classifier that I was trying out only yesterday).
3. The estimator parameters change sometimes, so this would have to be
kept in sync with scikit-learn.
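
To put a rough number on point 2, here is a back-of-the-envelope
sketch. The menu of stages below is made up for illustration (plain
strings, nothing is imported from scikit-learn), and it ignores the
hyper-parameters of each stage entirely:

```python
from itertools import product

# Hypothetical menu of pipeline stages; None means "skip this stage".
selectors = ["SelectKBest(chi2)", "SelectKBest(f_classif)", None]
transformers = ["RBFSampler", "Nystroem", None]
classifiers = ["SGDClassifier", "Perceptron", "LogisticRegression"]

# Every combination of one choice per stage is a distinct pipeline.
pipelines = list(product(selectors, transformers, classifiers))
n_pipelines = len(pipelines)
print(n_pipelines)
```

The count is the product of the number of choices per stage, so every
extra stage or alternative multiplies the number of wrapper files that
would have to be written and maintained.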

When I wrote the scikit-learn wrapper for NLTK [1], I chose a strategy
where *no scikit-learn code is imported at all* (except when the user
runs the demo or unit tests). Instead, the user is responsible for
importing it and constructing the appropriate estimator. This makes
the code robust to API changes, and it can handle arbitrarily complex
sklearn.pipeline.Pipeline objects, as well as estimators that follow the API
conventions but are not in scikit-learn proper.
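
A minimal sketch of that strategy, to make it concrete. This is not
the NLTK code itself: the class and method names are hypothetical, and
MajorityClass is a toy stand-in for whatever estimator the user would
construct and pass in.

```python
class DuckTypedClassifier:
    """Wrap any object that exposes fit(X, y) and predict(X).

    Hypothetical name; note that nothing from scikit-learn is imported.
    """

    def __init__(self, estimator):
        # Only the API conventions are checked, never the concrete type.
        for method in ("fit", "predict"):
            if not callable(getattr(estimator, method, None)):
                raise TypeError("estimator lacks a %s() method" % method)
        self.estimator = estimator

    def train(self, X, y):
        self.estimator.fit(X, y)
        return self

    def classify_many(self, X):
        return list(self.estimator.predict(X))


class MajorityClass:
    """Toy user-supplied estimator (an assumption, not a scikit-learn class)."""

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


clf = DuckTypedClassifier(MajorityClass())
clf.train([[0], [1], [2]], ["a", "b", "b"])
predictions = clf.classify_many([[3], [4]])
```

Because the wrapper only relies on fit and predict being present, a
Pipeline or a third-party estimator could be passed in the same way.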

I think a similar approach can be followed here. While some
suggestions for parameters to try might be shipped as examples, an
estimator- and evaluation-agnostic wrapper class ("meta-estimator") is
a stronger basis for a package like the one you're writing.
scikit-learn's own GridSearchCV is also implemented like this, to a
large extent.
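
Roughly, I am thinking of something like the sketch below. All names
are hypothetical, the exhaustive loop is only a placeholder for
whatever search hyperopt would perform, and a single held-out
validation set stands in for proper cross-validation. The point is
that the estimator, the candidate values and the scoring function are
all supplied by the user:

```python
import itertools


class SearchEstimator:
    """Hypothetical meta-estimator; exhaustive search as a placeholder."""

    def __init__(self, make_estimator, param_grid, score):
        self.make_estimator = make_estimator  # callable(**params) -> estimator
        self.param_grid = param_grid          # dict: name -> list of candidates
        self.score = score                    # callable(y_true, y_pred), higher is better

    def fit(self, X_train, y_train, X_valid, y_valid):
        names = sorted(self.param_grid)
        self.best_score_ = None
        for values in itertools.product(*(self.param_grid[n] for n in names)):
            params = dict(zip(names, values))
            est = self.make_estimator(**params)
            est.fit(X_train, y_train)
            s = self.score(y_valid, est.predict(X_valid))
            if self.best_score_ is None or s > self.best_score_:
                self.best_score_ = s
                self.best_params_ = params
                self.best_estimator_ = est
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)


class ThresholdClassifier:
    """Toy estimator following the fit/predict conventions (an assumption)."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn; the threshold is a hyper-parameter

    def predict(self, X):
        return [1 if x[0] > self.threshold else 0 for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / float(len(y_true))


X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]
search = SearchEstimator(ThresholdClassifier,
                         {"threshold": [0.0, 0.5, 1.0]},
                         accuracy)
search.fit(X, y, X, y)  # same data reused as validation set, for brevity only
```

Swapping accuracy for an F1 function, or ThresholdClassifier for a
function that builds a Pipeline, requires no change to the wrapper.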

[1] https://github.com/nltk/nltk/blob/f7f3b73f0f051639d87cfeea43b0aabf6f167b8f/nltk/classify/scikitlearn.py

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
