Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

Raghav R V Thu, 26 Mar 2015 05:42:09 -0700

Hey Gael,

I am sorry that I missed this comment of yours -


> > 1. The design of multiple metric support is important and would bring
> > an immense usability gain.

> But it will also require a framework of its own. I would say that this is
> to be considered in a second step.

Could you expand a little on this? Do you mean that I should allocate time
in the proposal for designing the framework and API that this would involve?

Thanks,

Raghav  RV (ragv)

On Thu, Mar 26, 2015 at 3:57 AM, Raghav R V <[email protected]> wrote:

> Hi all,
>
> thanks a lot for the comments!
>
> I've just edited and reformatted my proposal based on all of your comments:
>
>
> https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Multiple-metric-support-for-CV-and-grid_search-and-other-general-improvements
>
> The only thing left to do is to plan what I should work on in July.
> (For August I intend to finish any leftovers and clean up the tutorials,
> documentation, and docstrings.)
>
> I have the following options for July -
> * discussing and attempting an implementation of generalized CV and early
> stopping, as suggested by @amueller
> * evaluating and attempting to implement, or at least document, how
> out-of-core grid search / CV can be done, as suggested by @ogrisel
> * a new CV generator that is a blend of `ShuffleSplit` and `LeavePLabel`,
> as suggested by @ogrisel (I have a feeling this is trivial and can be
> completed in one or two weeks at most)
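
For reference, the last option in the list above (a blend of `ShuffleSplit` and `LeavePLabel`) can be read as "randomly hold out whole labels on each iteration". The following is only a minimal sketch of that reading using plain NumPy; the function name `label_shuffle_split` and its parameters are hypothetical and are not part of scikit-learn or of the proposal.

```python
import numpy as np


def label_shuffle_split(labels, n_iter=5, test_size=0.2, random_state=None):
    """Yield (train_indices, test_indices) pairs.

    Hypothetical helper: on every iteration a random subset of the unique
    labels is held out, so all samples sharing a label land on the same
    side of the split.
    """
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    # Number of distinct labels to hold out (at least one).
    n_test = max(1, int(round(test_size * len(unique_labels))))
    rng = np.random.RandomState(random_state)
    for _ in range(n_iter):
        test_labels = rng.permutation(unique_labels)[:n_test]
        test_mask = np.isin(labels, test_labels)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

Note that `test_size` here is a fraction of the distinct labels, not of the samples; whether the real generator should count labels or samples is one of the design questions this sketch leaves open.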
>
> Kindly let me know how you feel about this revised proposal and also let
> me know which one I could do for the month of July.
>
> On Thu, Mar 26, 2015 at 12:59 AM, Andreas Mueller <[email protected]>
> wrote:
>
>>
>>
>> On 03/24/2015 07:39 PM, Vlad Niculae wrote:
>> > Hi Raghav, hi everyone,
>> >
>> > If I may, I have a very high-level comment on your proposal. It clearly
>> > shows that you are very involved in the project and understand the
>> > internals well. However, I feel like it’s written from a way too technical
>> > perspective.  Your proposal contains implementation details, but little or
>> > no discussion of why each change is important and how it impacts users.
>> > Taking a step back and writing such discussion can help gain perspective,
>> > which is important for planning.
>> Great comment! (as are your following points).
>> >
>> > 3. How does multiple metric support interfere with model selection
>> > APIs? Suddenly there is no more “best_{score|params|estimator}_”. There is
>> > an API discussion to be had there, and your review of possible options
>> > would be a great addition to the proposal.  For example, will model
>> > selection objects gain a “criterion” function, that maybe defaults to
>> > getting the first specified metric? If so, could this API be used to make
>> > global decisions, e.g. “the model which is within 1 standard error of the
>> > best score, but has the largest C”? Or should it essentially just return a
>> > number per parameter configuration, that we then sort by?
>> Actually I would not fiddle with this. Why not always the first one? The
>> rest is just additional information.
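
The two selection rules discussed in point 3 can be compared in a small sketch. This is an illustration under stated assumptions, not the scikit-learn API: the function names, the `scores` layout (metric name mapped to an array of shape `(n_candidates, n_folds)`), and the use of the best candidate's standard error as the tolerance are all hypothetical choices.

```python
import numpy as np


def best_index_first_metric(scores, metrics):
    """Index of the candidate with the highest mean score on metrics[0].

    The remaining metrics are ignored for selection and kept only as
    additional information.
    """
    primary = np.asarray(scores[metrics[0]], dtype=float)
    return int(np.argmax(primary.mean(axis=1)))


def one_standard_error_index(fold_scores, C_values):
    """Among candidates within one standard error of the best mean score,
    return the index of the one with the largest C."""
    fold_scores = np.asarray(fold_scores, dtype=float)
    means = fold_scores.mean(axis=1)
    n_folds = fold_scores.shape[1]
    std_errs = fold_scores.std(axis=1, ddof=1) / np.sqrt(n_folds)
    best = np.argmax(means)
    # Candidates whose mean is no worse than (best mean - SE of the best).
    eligible = np.flatnonzero(means >= means[best] - std_errs[best])
    return int(eligible[np.argmax(np.asarray(C_values)[eligible])])
```

A "criterion" hook on the search object would amount to letting the user swap the first function for something like the second.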
>> > 4. There is another API discussion about `sample_weight`: is that the
>> > only parameter that we want to route to scoring? I have some applications
>> > where I want some notion of `sample_group`. (This would allow using
>> > scikit-learn directly for e.g. query-grouped search results ranking.)  I
>> > proposed the `sample_*` API convention but it has quite a few downsides; if
>> > I remember correctly Joel proposed a param_routing API where you would pass
>> > a routing dict {'sample_group': ['fit', 'score']}: such an API would be
>> > much more extensible.
>> Yep, we need to have this discussion at some point.
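
To make the routing idea in point 4 concrete, here is a minimal sketch that reads the routing dict as a mapping from a parameter name to the list of methods that should receive it. That reading, the helper name `route_params`, and the example parameter values are assumptions for illustration; no such function exists in scikit-learn.

```python
def route_params(routing, **params):
    """Group keyword arguments by the method that should receive them.

    `routing` maps a parameter name to a list of method names, for example
    {'sample_group': ['fit', 'score']}. Parameters without a routing entry
    are dropped. Hypothetical helper, not a scikit-learn API.
    """
    routed = {}
    for name, value in params.items():
        for method in routing.get(name, []):
            routed.setdefault(method, {})[name] = value
    return routed


# Example: sample_group goes to both fit and score, sample_weight only to fit.
routing = {"sample_group": ["fit", "score"], "sample_weight": ["fit"]}
routed = route_params(
    routing,
    sample_group=[0, 0, 1],
    sample_weight=[1.0, 1.0, 2.0],
)
```

A search object could then call `estimator.fit(X, y, **routed.get('fit', {}))` and pass `routed.get('score', {})` to the scorer, which is what makes this design more extensible than a fixed `sample_*` naming convention.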
>>
>>
>> Andy
>>
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming The Go Parallel Website,
>> sponsored
>> by Intel and developed in partnership with Slashdot Media, is your hub
>> for all
>> things parallel software development, from weekly thought leadership
>> blogs to
>> news, videos, case studies, tutorials and more. Take a look and join the
>> conversation now. http://goparallel.sourceforge.net/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>
>

