Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

Andreas Mueller Wed, 25 Mar 2015 12:29:50 -0700


On 03/24/2015 07:39 PM, Vlad Niculae wrote:
> Hi Raghav, hi everyone,
>
> If I may, I have a very high-level comment on your proposal. It clearly shows
> that you are very involved in the project and understand the internals well.
> However, I feel like it’s written from a way too technical perspective. Your
> proposal contains implementation details, but little or no discussion of why
> each change is important and how it impacts users. Taking a step back and
> writing such a discussion can help gain perspective, which is important for
> planning.
Great comment! (as are your following points).
>
> 3. How does multiple metric support interfere with model selection APIs? 
> Suddenly there is no more “best_{score|params|estimator}_”. There is an API 
> discussion to be had there, and your review of possible options would be a 
> great addition to the proposal.  For example, will model selection objects 
> gain a “criterion” function, that maybe defaults to getting the first 
> specified metric? If so, could this API be used to make global decisions,
> e.g. “the model which is within 1 standard error of the best score, but has
> the largest C”? Or should it essentially just return a number per parameter
> configuration that we then sort by?
Actually, I would not fiddle with this. Why not always use the first one? The
rest is just additional information.
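To make the "always the first one" suggestion concrete, here is a minimal sketch in plain Python, not scikit-learn code. `grid_search_multi_metric`, `toy_evaluate` and the fold scores are hypothetical and made up purely for illustration, and it assumes every metric is "higher is better". Each parameter setting gets a mean score per metric; only the first metric picks the winner, and the other metrics are simply reported alongside it.

```python
from statistics import mean


def grid_search_multi_metric(evaluate, param_grid, metrics):
    """Hypothetical helper: score every parameter setting with several metrics.

    evaluate(params, metric) returns a list of per-fold scores (higher is better).
    metrics is an ordered list of metric names; only the first decides "best".
    """
    results = []
    for params in param_grid:
        # Mean cross-validation score for every requested metric.
        scores = {m: mean(evaluate(params, m)) for m in metrics}
        results.append({"params": params, "scores": scores})
    primary = metrics[0]
    # Selection uses the first metric only; the rest is additional information.
    best = max(results, key=lambda r: r["scores"][primary])
    return best, results


# Made-up fold scores, for illustration only.
FOLD_SCORES = {
    (0.1, "accuracy"): [0.80, 0.82, 0.78],
    (0.1, "f1"): [0.60, 0.62, 0.58],
    (1.0, "accuracy"): [0.85, 0.84, 0.86],
    (1.0, "f1"): [0.55, 0.57, 0.53],
}


def toy_evaluate(params, metric):
    return FOLD_SCORES[(params["C"], metric)]


best, results = grid_search_multi_metric(
    toy_evaluate, [{"C": 0.1}, {"C": 1.0}], ["accuracy", "f1"]
)
print(best["params"])  # {'C': 1.0}
```

Here accuracy, listed first, selects C=1.0 even though f1 alone would have preferred C=0.1; under this convention a user who cares most about f1 would just list it first.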
> 4. There is another API discussion about `sample_weight`: is that the only
> parameter that we want to route to scoring? I have some applications where I
> want some notion of `sample_group`. (This would allow one to use scikit-learn
> directly for e.g. query-grouped search result ranking.) I proposed the
> `sample_*` API convention but it has quite a few downsides; if I remember
> correctly, Joel proposed a param_routing API where you would pass a routing
> dict {‘sample_group’: [‘fit’, ‘score’]}: such an API would be much more
> extensible.
Yep, we need to have this discussion at some point.
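For the routing idea in point 4, here is a rough sketch of how a routing dict could dispatch extra per-sample arrays inside one cross-validation split. `route_params` and `subset` are hypothetical helpers, and the dict shape (parameter name mapped to a tuple of destination methods) is an assumption about what such an API might look like, not an existing scikit-learn interface.

```python
def route_params(routing, params):
    """Hypothetical helper: split extra per-sample arrays by destination.

    routing maps a parameter name to the methods that should receive it,
    e.g. {"sample_weight": ("fit", "score"), "sample_group": ("score",)}.
    Returns {"fit": {...}, "score": {...}}.
    """
    routed = {"fit": {}, "score": {}}
    for name, value in params.items():
        if name not in routing:
            raise ValueError(f"no routing specified for parameter {name!r}")
        for dest in routing[name]:
            if dest not in routed:
                raise ValueError(f"unknown destination {dest!r} for {name!r}")
            routed[dest][name] = value
    return routed


def subset(params, indices):
    """Index every per-sample array with the given fold indices."""
    return {k: [v[i] for i in indices] for k, v in params.items()}


routing = {"sample_weight": ("fit", "score"), "sample_group": ("score",)}
extra = {"sample_weight": [1.0, 2.0, 1.0, 1.0], "sample_group": [0, 0, 1, 1]}
routed = route_params(routing, extra)

# One made-up train/test split.
train_idx, test_idx = [0, 1], [2, 3]
fit_kwargs = subset(routed["fit"], train_idx)
score_kwargs = subset(routed["score"], test_idx)
print(fit_kwargs)    # {'sample_weight': [1.0, 2.0]}
print(score_kwargs)  # {'sample_weight': [1.0, 1.0], 'sample_group': [1, 1]}
```

Because each array is indexed with the fold's train or test indices before being passed on, the same mechanism would cover `sample_weight` and a ranking-style `sample_group` without a special case for either.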



Andy

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general