On Tue, Feb 19, 2013 at 7:55 PM, Lars Buitinck <l.j.buiti...@uva.nl> wrote:
> 2013/2/19 James Bergstra <james.bergs...@gmail.com>:
>> Further to this: I started a project on github to look at how to
>> combine hyperopt with sklearn.
>> https://github.com/jaberg/hyperopt-sklearn
>>
>> I've only wrapped one algorithm so far: Perceptron
>> https://github.com/jaberg/hyperopt-sklearn/blob/master/hpsklearn/perceptron.py
>>
>> My idea is that little files like perceptron.py would encode
>> (a) domain expertise about what values make sense for a particular
>> hyper-parameter (see the `search_space()` function) and
>> (b) a sklearn-style fit/predict interface that encapsulates search
>> over those hyper-parameters (see `AutoPerceptron`)
>
> I'm not sure what your long-term goals with this project are, but I
> see three problems with this approach:
> 1. The values might be problem-dependent rather than
> estimator-dependent. In your example, you're optimizing for accuracy,
> but you might want to optimize for F1-score instead.
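(Before replying point by point: for anyone on the list who hasn't opened
perceptron.py, search_space() encodes roughly the kind of thing sketched
below. This is illustrative only, written from memory against hyperopt's
hp module; the parameter names follow sklearn's Perceptron, but the ranges
are rough guesses rather than the committed code.)

    from hyperopt import hp

    def search_space():
        # Illustrative only: rough prior knowledge about ranges that tend
        # to be reasonable for sklearn's Perceptron, not the actual
        # contents of perceptron.py.  hp.loguniform bounds are in
        # log-space: exp(-9) ~ 1e-4, exp(-1) ~ 0.4, exp(-3) ~ 0.05.
        return {
            'penalty': hp.choice('penalty', [None, 'l2', 'l1', 'elasticnet']),
            'alpha': hp.loguniform('alpha', -9, -1),
            'eta0': hp.loguniform('eta0', -3, 1),
            'fit_intercept': hp.choice('fit_intercept', [True, False]),
        }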
Good point, and if I understand correctly, it's related to your other point below about GridSearch. I think you are pointing out that the design of AutoPerceptron is off the mark for two reasons:

1. There is only one line in that class that actually refers to Perceptron, so why not make the actual estimator a constructor argument? (I agree, it should be an argument.)

2. The class mainly consists of plumbing, but it is also hard-coded to compute classification error. This is silly; it would be better to use either (a) the native loss of the estimator or (b) some specific user-supplied validation metric.

I agree with both of these points. Let me know if I misunderstood you, though.

> 2. The number of estimators is *huge* if you also consider
> combinations like SelectKBest(chi2) -> RBFSamples -> SGDClassifier
> pipelines (a classifier that I was trying out only yesterday).

Yes, the number of estimators in a search space can be huge. In my research on visual system models I found that hyperopt was surprisingly useful, even in the face of daunting configuration problems. The point of this project, for me, is to see how it stacks up.

One design aspect that doesn't come through in the current code sample is that the hard-coded parameter spaces (which I'll come to in a second) must compose. What I mean is that if someone has written up a standard SGDClassifier search space, and someone has coded up search spaces for SelectKBest and RBFSamples, then you should be able to string those all together and search the joint space without much trouble. Your particular case is exactly the sort of case I hope eventually to address: it's difficult to give sensible defaults to each of those modules before knowing either (a) what kind of data they will process or (b) what's going on in the rest of the pipeline. Tuning a bunch of interacting variables whose effects are only measured by long-running programs is hard for people; automatic methods don't actually have to be all that efficient to be competitive.

> 3. The estimator parameters change sometimes, so this would have to be
> kept in sync with scikit-learn.

This is a price I was expecting to have to pay; I don't see any way around it. Part of the value of this library is encoding parameter ranges for specific estimators. That tight coupling is not something to be dodged.

- James

> When I wrote the scikit-learn wrapper for NLTK [1], I chose a strategy
> where *no scikit-learn code is imported at all* (except when the user
> runs the demo or unit tests). Instead, the user is responsible for
> importing it and constructing the appropriate estimator. This makes
> the code robust to API changes, and it can handle arbitrarily complex
> sklearn.Pipeline objects, as well as estimators that follow the API
> conventions but are not in scikit-learn proper.
>
> I think a similar approach can be followed here. While some
> suggestions for parameters to try might be shipped as examples, an
> estimator- and evaluation-agnostic wrapper class ("meta-estimator") is
> a stronger basis for a package like the one you're writing.
> scikit-learn's own GridSearch is also implemented like this, to a
> large extent.
>
> [1]
> https://github.com/nltk/nltk/blob/f7f3b73f0f051639d87cfeea43b0aabf6f167b8f/nltk/classify/scikitlearn.py

Thanks, yes, there is a strong similarity between what I'm trying to do and GridSearch, so it makes sense to use similar strategies for comparing model outputs. The "AutoPerceptron" class would be improved by being more generic, like GridSearch.
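To make that concrete, here is a rough sketch of the kind of estimator- and evaluation-agnostic meta-estimator I have in mind. It is illustrative only: the class name, constructor arguments, and helpers are made up for this email, it holds out a single validation split rather than doing proper cross-validation, and it assumes dense numpy inputs.

    import numpy as np
    from hyperopt import fmin, tpe

    class HyperoptEstimator(object):
        """Illustrative meta-estimator: a hyperopt search wrapped in a
        fit/predict interface.  No sklearn imports in here; the user
        supplies a factory that builds the (possibly Pipeline) estimator."""

        def __init__(self, make_estimator, space, scoring,
                     max_evals=50, valid_fraction=0.2, seed=0):
            # make_estimator(params) -> unfitted estimator (or Pipeline)
            # space: a hyperopt search space, e.g. a dict of hp expressions
            # scoring(estimator, X_valid, y_valid) -> score to *maximize*
            self.make_estimator = make_estimator
            self.space = space
            self.scoring = scoring
            self.max_evals = max_evals
            self.valid_fraction = valid_fraction
            self.seed = seed

        def fit(self, X, y):
            # Assumes X and y are dense numpy arrays (indexable by index arrays).
            X, y = np.asarray(X), np.asarray(y)
            rng = np.random.RandomState(self.seed)
            perm = rng.permutation(len(X))
            n_valid = int(self.valid_fraction * len(X))
            valid, train = perm[:n_valid], perm[n_valid:]

            best = {'score': -np.inf, 'params': None}

            def objective(params):
                est = self.make_estimator(params)
                est.fit(X[train], y[train])
                score = self.scoring(est, X[valid], y[valid])
                if score > best['score']:
                    best['score'], best['params'] = score, params
                return -score  # hyperopt minimizes, so negate the score

            fmin(objective, self.space, algo=tpe.suggest,
                 max_evals=self.max_evals)

            # Refit the best configuration found on all of the data.
            self.best_params_ = best['params']
            self.best_estimator_ = self.make_estimator(self.best_params_)
            self.best_estimator_.fit(X, y)
            return self

        def predict(self, X):
            return self.best_estimator_.predict(X)

Usage would look something like the following (hypothetical names again; X_train, y_train, X_test, y_test are whatever the user has). The wrapper itself never imports scikit-learn: the user-supplied make_estimator is free to build the SelectKBest(chi2) -> RBFSamples -> SGDClassifier pipeline you mentioned, with the sub-spaces for each step merged into one search space, and the scoring argument can be F1 instead of accuracy, which addresses your point 1.

    # Hypothetical user code; the sklearn imports live here, not in the wrapper.
    from sklearn.linear_model import Perceptron
    from sklearn.metrics import f1_score

    def make_perceptron(params):
        return Perceptron(**params)

    def f1(est, X_valid, y_valid):
        return f1_score(y_valid, est.predict(X_valid))

    # search_space() is the Perceptron sketch from earlier in this message.
    model = HyperoptEstimator(make_perceptron, search_space(), f1, max_evals=100)
    model.fit(X_train, y_train)
    print(f1(model, X_test, y_test))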
- James