Dear Artem,

Congratulations on the acceptance of your GSoC proposal! I am certain there
will be a very interesting summer ahead of us. Kyle and I are excited to be
mentors and will do our best to provide all the guidance necessary for your
project to succeed. The project is very rich and will be a great addition to
the codebase.

Your blog post <http://barmaley-exe.blogspot.ru/2015/05/introduction.html>
on the gist of the methods is clearly written and gives a good overview of
the topics you are going to address in depth. It shows that you have the
right intuitions and are ready to delve into the intricacies of the
methods [1]. Take advantage of the coming weeks to do so!
Let's make sure we hit the ground running at the end of this warm-up phase.

As for your next steps, sketching the algorithms in very high-level
pseudo-code is of course an excellent idea and could be your next blog post.
After this, you can zoom in on the details of how each pseudo-code step can
be implemented. If you get the level of detail right, I recommend the
Python language to describe your algorithms ;) -- what I mean is that
getting a minimal version of the algorithm to work, just as a function, not
a sklearn estimator, is a valuable baseline to have, and it usually deepens
the understanding as well.
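
To illustrate what I mean by "just a function", here is a rough sketch. It
is deliberately not one of the methods from your proposal: as a stand-in it
merely whitens the pooled within-class covariance, which is about the
simplest supervised Mahalanobis-type metric I can think of. The names
fit_within_class_whitening and learned_distance and the reg parameter are
made up for this example and are not scikit-learn API.

```python
import numpy as np


def fit_within_class_whitening(X, y, reg=1e-6):
    """Return a matrix L so that d(a, b) = ||L (a - b)||_2.

    Stand-in baseline (an assumption for illustration): L is the inverse
    square root of the pooled within-class covariance of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape

    # Center every sample on the mean of its own class.
    centered = np.empty_like(X)
    for label in np.unique(y):
        mask = y == label
        centered[mask] = X[mask] - X[mask].mean(axis=0)

    # Pooled within-class covariance, regularized to stay invertible.
    cov = centered.T @ centered / n_samples
    cov += reg * np.eye(n_features)

    # Symmetric inverse square root via the eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T


def learned_distance(L, a, b):
    """Euclidean distance between a and b after mapping both through L."""
    diff = L @ (np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sqrt(diff @ diff))
```

A plain function like this is easy to check on toy data and can serve as the
reference to test against once the same logic is wrapped in an estimator.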

As for the API questions, it is of course quite essential to remain
conscious at all times of the issues that have been identified in prior
discussion and to think of ways to add a metric learning module without
succumbing to excessive feature creep. My hunch is that given some working
minimal versions of the algorithms, we can perhaps crystallize out what is
absolutely necessary in terms of additions, so I would prefer that order of
priorities. There is also some work to be done in identifying other parts
of scikit-learn that already deal with (dis-)similarity type data (cf., e.g.,
the kernels defined in the PR for Gaussian processes) and in seeing how these
can be made to work in a consistent way.

A crucial aspect that we need to figure out is "what is a good outcome?":
Of course we would like to have some PRs merged at the end of summer, yes.
But what makes a concrete implementation good? Speed (with respect to
what)? Readability? Maintainability (yes please!)? Elegance (what does that
even mean?)?

It may be helpful if you could devise a more fine-grained timeline
<https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Metric-Learning-module#timeline>
for the community bonding period than what is currently stated on the wiki
page. How about blogging your progress in understanding? Writing things
down for others to understand is a very good way of identifying any doubts
you may have on particular aspects. A mini blog-entry at the end of each
week simply recounting what has been done and also what has been discussed
will also yield an effective overview of ongoing topics.

In the meantime, please don't hesitate to bombard us, the other mentors,
and the list with any questions you may have.

Michael

[1]  [image: Inline image 1]
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
