Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

Raghav R V Wed, 25 Mar 2015 15:28:50 -0700

Hi all,

Thanks a lot for the comments!

I've just edited and reformatted my proposal based on all of your comments:

https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Multiple-metric-support-for-CV-and-grid_search-and-other-general-improvements

The only thing left is to plan what I should do in July.
(For August, I intend to finish any leftovers and clean up the tutorials,
documentation and docstrings.)

I have the following options for July:
* discussing and attempting an implementation of generalized CV and early
stopping, as suggested by @amueller
* evaluating and attempting to implement, or at least document, how
out-of-core grid search / CV can be done, as suggested by @ogrisel
* a new CV generator that is a blend of `ShuffleSplit` and `LeavePLabel`, as
suggested by @ogrisel (I have a feeling this is trivial and can be
completed in one or two weeks at most)
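
To make the third option concrete, here is a minimal sketch of what such a
generator could look like, assuming it should hold out whole labels at random
on each iteration (the random resampling of `ShuffleSplit` combined with the
label-wise hold-out of `LeavePLabel`). The name `label_shuffle_split` and its
parameters are hypothetical and not part of scikit-learn; it uses only the
standard library so it can be run as-is.

```python
import math
import random


def label_shuffle_split(labels, n_iter=5, test_size=0.2, seed=None):
    """Yield (train_indices, test_indices) pairs.

    Hypothetical helper: on each iteration a random subset of the
    distinct labels is held out, so no label is ever split across
    the train and test sides.
    """
    rng = random.Random(seed)
    unique_labels = sorted(set(labels))
    # Number of distinct labels (not samples) to hold out per iteration.
    n_test = max(1, math.ceil(test_size * len(unique_labels)))
    for _ in range(n_iter):
        test_labels = set(rng.sample(unique_labels, n_test))
        train = [i for i, lab in enumerate(labels) if lab not in test_labels]
        test = [i for i, lab in enumerate(labels) if lab in test_labels]
        yield train, test


# Usage: four labels with two samples each, half of the labels held out.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
splits = list(label_shuffle_split(labels, n_iter=3, test_size=0.5, seed=0))
```

Note that `test_size` here is a fraction of the distinct labels, not of the
samples; whether it should be one or the other is an open design choice.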

Please let me know what you think of this revised proposal, and which of
these options I should take on in July.

On Thu, Mar 26, 2015 at 12:59 AM, Andreas Mueller <[email protected]> wrote:

>
>
> On 03/24/2015 07:39 PM, Vlad Niculae wrote:
> > Hi Raghav, hi everyone,
> >
> > If I may, I have a very high-level comment on your proposal. It clearly
> > shows that you are very involved in the project and understand the
> > internals well. However, I feel like it’s written from a way too technical
> > perspective.  Your proposal contains implementation details, but little or
> > no discussion of why each change is important and how it impacts users.
> > Taking a step back and writing such discussion can help gain perspective,
> > which is important for planning.
>
> Great comment! (as are your following points).
> >
> > 3. How does multiple metric support interfere with model selection APIs?
> > Suddenly there is no more “best_{score|params|estimator}_”. There is an API
> > discussion to be had there, and your review of possible options would be a
> > great addition to the proposal.  For example, will model selection objects
> > gain a “criterion” function, that maybe defaults to getting the first
> > specified metric? If so, could this API be used to make global decisions,
> > e.g. "the model which is within 1 standard error of the best score, but has
> > the largest C?” Or should it essentially just return a number per parameter
> > configuration, that we then sort by?
>
> Actually I would not fiddle with this. Why not always the first one? The
> rest is just additional information.
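
A minimal sketch of the "criterion" idea from point 3, assuming the search
exposes per-fold scores of the first metric for every parameter setting. The
function name `largest_c_within_one_se` and the `results` layout are
hypothetical illustrations, not an existing or proposed scikit-learn API.

```python
import math


def largest_c_within_one_se(results):
    """Return the largest C whose mean score is within one standard
    error of the best mean score.

    `results` is a list of dicts with keys "C" and "scores" (per-fold
    scores, at least two folds). Hypothetical layout for illustration.
    """
    summaries = []
    for r in results:
        scores = r["scores"]
        n = len(scores)
        mean = sum(scores) / n
        # Sample variance across folds, then standard error of the mean.
        var = sum((s - mean) ** 2 for s in scores) / (n - 1)
        summaries.append((r["C"], mean, math.sqrt(var / n)))

    # Best configuration by mean score, and its standard error.
    _, best_mean, best_se = max(summaries, key=lambda s: s[1])
    threshold = best_mean - best_se
    candidates = [c for c, mean, _ in summaries if mean >= threshold]
    return max(candidates)


results = [
    {"C": 0.1, "scores": [0.80, 0.82, 0.84]},
    {"C": 1.0, "scores": [0.88, 0.90, 0.92]},
    {"C": 10.0, "scores": [0.87, 0.90, 0.91]},
    {"C": 100.0, "scores": [0.70, 0.72, 0.71]},
]
chosen_c = largest_c_within_one_se(results)
```

With "always the first metric" as the default, a plain argmax over the mean
scores would play the same role as this function.
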
> > 4. There is another API discussion about `sample_weight`: is that the
> > only parameter that we want to route to scoring? I have some applications
> > where I want some notion of `sample_group`. (This would allow to use
> > scikit-learn directly for e.g. query-grouped search results ranking.)  I
> > proposed the `sample_*` API convention but it has quite a few downsides; if
> > I remember correctly Joel proposed a param_routing API where you would pass
> > a routing dict {‘sample_group’: ‘fit’, ‘score’}: such an API would be much
> > more extensible.
>
> Yep, we need to have this discussion at some point.
>
>
> Andy
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>

