I would focus on the API of this functionality: what users will be
allowed to specify, and how. To me, this is a particularly tricky bit of the
PR. As Vlad said, take a close look at GridSearchCV and
RandomizedSearchCV and see how they interact with the codebase. Do you
plan to find good defaults for existing estimators? Or use simple
ones? Even setting simple hyperparameter ranges for estimators will
take some work. Is there a way to do this automagically?
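
To make the question concrete, here is one shape the user-facing API
could take. This is purely a sketch of mine - GPSearchCV, the (low,
high) bounds format, and n_iter are my assumptions, not anything that
exists in the PR:

# Hypothetical GPSearchCV usage (all names here are assumptions of
# mine, mirroring RandomizedSearchCV's constructor shape).
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# One option: (low, high) bounds per hyperparameter, with the searcher
# choosing a default prior - this is where "automagic" defaults for
# existing estimators would have to come from.
search_space = {
    "C": (1e-3, 1e3),        # presumably log-uniform by default?
    "gamma": (1e-4, 1e1),
}

search = GPSearchCV(SVC(), search_space, n_iter=25, cv=3)  # hypothetical class
search.fit(X, y)
print(search.best_params_, search.best_score_)

Whether plain bounds like these are enough, or whether users need full
distributions as in RandomizedSearchCV, is exactly the API decision to
pin down early.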

Slice sampling and parallelization - is it necessary to have these so
early in the timeline? I would move benchmarking, profiling, and
documentation up. Those things tend to take more time than expected,
and good documentation will be key for this work. Parallelization and
slice sampling are both useful, but they are mostly internal-facing -
and I would expect you will need benchmark code to prove that they
actually help. The docs you write should apply essentially unchanged
to the code before and after parallelization and slice sampling are
added.
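
To give an idea of what I mean by benchmark code, here is a minimal
sketch of mine (existing estimators only): time a random search and
track the best score found so far at each iteration - the curve a
GP-based searcher, with or without parallelization, would have to beat:

# Minimal benchmark sketch (mine, not from the PR): total wall time
# plus best-score-so-far per iteration for a random-search baseline.
import time
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later releases

digits = load_digits()
X, y = digits.data, digits.target

param_distributions = {"max_depth": randint(2, 12),
                       "max_features": randint(1, 33)}

start = time.time()
search = RandomizedSearchCV(RandomForestClassifier(n_estimators=50),
                            param_distributions, n_iter=20, cv=3,
                            random_state=0)
search.fit(X, y)
elapsed = time.time() - start

# grid_scores_ is in evaluation order, so a running max gives the
# score-vs-iteration curve to compare across search strategies.
scores = [s.mean_validation_score for s in search.grid_scores_]
best_so_far = [max(scores[:i + 1]) for i in range(len(scores))]
print("total time: %.1fs" % elapsed)
print(best_so_far)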

I think it is also essential that you take a close look at the new GP
interface. The PR code is fairly mature, but being *very* familiar
with how it works will be a key part of success in this task.
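
For anyone following along, the core of that interface (as proposed in
the PR at the time of writing - details may still change before merge)
boils down to fit plus a predict that returns both mean and standard
deviation, which is exactly what an acquisition function needs:

# Sketch based on the new GP interface proposed in the PR (subject to
# change before merge): predictive mean and std for an acquisition
# function such as expected improvement.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(20)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X, y)

X_new = np.linspace(-3, 3, 50)[:, np.newaxis]
mu, sigma = gp.predict(X_new, return_std=True)  # inputs to EI and friends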

It is good to think about external compatibility (I am especially
interested in this for selfish reasons), but it is most important to
get something that works well for sklearn alone. I don't think testing
on deep networks is particularly useful for sklearn, especially since
spearmint, hyperopt, whetlab, and many other packages already target
that use case. IMO, random forests or GBRT are great candidates for
examples.
Focusing on a simple, well thought out CV object with *great*
documentation and examples is most important, and will have the
largest benefit for users.
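
To illustrate why GBRT is a nice candidate: its important
hyperparameters mix continuous and integer types, which is exactly the
kind of search space a GP-based searcher has to model sensibly. A docs
example could start from a plain grid-search baseline like this sketch
of mine and then show the GP version improving on it:

# Baseline a GPSearchCV docs example could improve on: GBRT tuned with
# plain GridSearchCV (learning_rate continuous, max_depth integral).
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in later releases

digits = load_digits()
X, y = digits.data, digits.target

param_grid = {"learning_rate": [0.01, 0.1, 0.5],
              "max_depth": [2, 3, 5],
              "n_estimators": [50, 100]}

search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)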

Overall, like Vlad said, the more you can break this into smaller
changes, the better. I am not really sure how to do this beyond
one PR with base GPSearchCV and associated code, then optimizations
like parallelization/slice sampling/EIperS in smaller following PRs,
but it is very important to think about.

On Tue, Mar 24, 2015 at 8:24 PM, Vlad Niculae <zephy...@gmail.com> wrote:
> Hi Cristoph, Gael, hi everyone,
>
>> On 24 Mar 2015, at 18:09, Gael Varoquaux <gael.varoqu...@normalesup.org> 
>> wrote:
>>
>>> Don't you think that I could also benchmark models that are not
>>> implemented in sklearn? […]
>>
>> I am personally less interested in that. We already have a lot in
>> scikit-learn and more than enough to test the model selection code.
>
> On top of this, people have already been using dedicated hyperparameter 
> optimizer toolkits for Theano deep nets. I don’t think we should aim to 
> compete with hyperopt/spearmint from day 0 (or ever), but, just like Gael 
> said,
>
>> The focus should be on providing code that is readily-usable.
>
> As for your proposal, I have a few comments.
>
> 1. In 3.1 you say “It will have same interfaces as GridSearchCV and 
> RandomizedSearchCV”. Even the use of the plural “interfaces” here points at a 
> problem: those two objects do not have identical interfaces. Which interface 
> will GPSearchCV have? Will it take (prior) distributions over 
> hyperparameters? (In the same format as RandomizedSearchCV?) Or ranges, with 
> a fixed prior assumed? I think a more detailed discussion of the user-facing 
> API would be useful.
>
> 2. Ideally this module would fully reuse the GP module. We should have no 
> code redundancy, but the way your proposal is written, it does not focus much 
> on the interaction of your changes with the GP module. (For example, will 
> slice sampling be a contribution to the GP module?) Change sets that reach 
> deeper will take longer to review and merge.
>
> 3. Your point in 4.4 about optimizing improvement *per second* seems 
> desirable; where does it fit in the timeline? Will everything be done with 
> this in mind from the start?
>
> 4. Parallelization is interesting and seems non-trivial. I’m a bit dense but 
> I managed to understand Gael’s seeding suggestion earlier. The paragraph in 
> your proposal confused me though, especially the part “I will use all 
> completed, and integrate over pending evaluations to approximate the expected 
> acquisition function.” Could you clarify?
>
> 5. (Timeline stuff.) I’m not sure what the relationship between “Build 
> optimizing pipeline from parts implemented so far” and “First working 
> prototype” is. Testing features shouldn’t come so late; it should happen at 
> the same time as development. In general, the timeline would benefit from a slight shift of 
> perspective: when would you like to have the PR on X functionally complete 
> (this includes tests)? Overall complete (includes docs, examples and at least 
> some review)?
>
> Hope my comments are helpful,
>
> Yours,
> Vlad
>
>
>>
>> I am worried that such a task will be very time consuming and will not move
>> us much closer to code that improves model selection in scikit-learn.
>>
>> Gaël
