Hi Artem, thanks for your blog posts and summary email. Andy has already pointed you in the right direction, but here are my ε's:
1. What Andy says: no Cython unless absolutely necessary, and if so, only at the end of the development process. It is great that you already have a working version. As you indicate, there are some obvious vectorizations that can be done, e.g. as in https://github.com/RolT/NCA-python/blob/master/nca_cost.py#L54 .

2. Once these changes are made you can start benchmarking and get a feeling for the memory/speed trade-offs you have brought up.

3. For the extreme of recomputing at every iteration, you already have a baseline to compare other optimization algorithms against: Roland Thiollière's code, from which I cite the line above. It uses an optimizer from scipy, which you can also exchange for the other ones that you and Andy have mentioned.

> What do you think about "cell-wise" optimization (fixing all cells of matrix but one)?

Do you mean coordinate descent/ascent, as in e.g. the Lasso estimator? You can try writing out the optimization problem, but the implementation will probably be useless without Cython, so I'd keep that for the end if there is time (and if I understand the idea correctly; maybe there is a vectorizable axis that I overlooked and that sufficiently compensates for looping over the others).

> Brian suggested to try a rather radical optimization, where you only consider pik for pairs that are close in the original space. It is unclear how bad an approximation that would be.

This goes in the direction of the idea behind LMNN, so it should probably be considered when LMNN is attacked.

As for example data, I really like the one you show in your blog post. It would be great to have a few more low-dimensional examples like that. Maybe you can find something useful in the already existing datasets, or think up some extra ones that are more or less difficult for your methods to resolve.

Code-wise, I would attack the problem as a function first: write a function that takes X and y (plus maybe some options) and gives back L.
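To make the "function first" suggestion concrete, here is a rough NumPy sketch of what such a function could look like, with a fully vectorized objective and gradient and scipy's L-BFGS, along the lines of the code linked above. All names here (`nca`, `nca_loss_grad`, the `n_components` option, the identity initialization) are my own placeholder choices, not a proposed API. Note that the `diff` tensor costs O(n²·d) memory, which is exactly the trade-off from point 2:

```python
import numpy as np
from scipy.optimize import minimize


def nca_loss_grad(A_flat, X, y, n_components):
    """Negative NCA objective and its gradient, fully vectorized."""
    n, d = X.shape
    A = A_flat.reshape(n_components, d)
    Xa = X @ A.T                                   # points in the transformed space
    diff = Xa[:, None, :] - Xa[None, :, :]         # (n, n, c): O(n^2) memory!
    dist = np.einsum('ijk,ijk->ij', diff, diff)    # squared pairwise distances
    np.fill_diagonal(dist, np.inf)                 # p_ii is defined as 0
    dist -= dist.min(axis=1, keepdims=True)        # stabilize the softmax
    p = np.exp(-dist)
    p /= p.sum(axis=1, keepdims=True)              # p[i, j] = p_ij
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    p_i = (p * same).sum(axis=1)                   # prob. of classifying i correctly
    f = p_i.sum()                                  # NCA objective, to be maximized
    # grad f = 2 A sum_ij W_ij (x_i - x_j)(x_i - x_j)^T, with W_ij as below;
    # expanded so no explicit loop over the n^2 outer products is needed
    W = p * p_i[:, None] - p * same
    deg = W.sum(axis=1) + W.sum(axis=0)
    M = (X * deg[:, None]).T @ X - X.T @ (W + W.T) @ X
    return -f, -(2 * A @ M).ravel()                # negated: scipy minimizes


def nca(X, y, n_components=None):
    """Fit a linear transform L maximizing the NCA objective."""
    d = X.shape[1]
    n_components = d if n_components is None else n_components
    L0 = np.eye(n_components, d)                   # start from (truncated) identity
    res = minimize(nca_loss_grad, L0.ravel(), args=(X, y, n_components),
                   jac=True, method='L-BFGS-B')
    return res.x.reshape(n_components, d)
```

An estimator skeleton would then just call `nca` from `fit` and store the result, transforming new data via `X @ L.T`; swapping `'L-BFGS-B'` for `'CG'` would give the conjugate-gradient variant mentioned later in the thread.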
You can put a skeleton of a sklearn estimator around it by calling this function from fit. Please keep your code either in a sklearn WIP PR or a public gist, so it can be reviewed.

Writing benchmarks can be framed as writing examples, i.e. plot_* functions (maybe Andy or Olivier have a comment on how benchmarks have been handled in the past?).

Michael

On Thu, May 28, 2015 at 8:33 PM, Andreas Mueller <t3k...@gmail.com> wrote:

> I talked to my office mate Brian McFee for some feedback. Apparently there are some memory- and computation-saving methods, but they are all not well published, unfortunately.
> I think you should try to create some benchmarks for your current implementation, using classification with 1 nearest neighbor; maybe start with iris, digits and make_classification.
>
> You should probably start by comparing precomputing all outer products (which will not scale) with recomputing at every iteration. And using recomputation at every iteration, try to compare SGD and L-BFGS (from scipy).
>
> Brian suggested to try a rather radical optimization, where you only consider pik for pairs that are close in the original space. It is unclear how bad an approximation that would be. The problem with tracking the large pik is that computing the pik is already pretty expensive (n ** 2). Maybe doing mini-batch SGD over the i with AdaGrad would work.
>
> By the way, there are some formatting issues in your blog post code.
>
> On 05/28/2015 06:25 AM, Artem wrote:
>
> Hello everyone,
>
> I finally published my API blog post <http://barmaley-exe.blogspot.ru/2015/05/api-design.html>. It mostly summarizes ideas discussed during the project proposal stage.
>
> For now I'd like to discuss the NCA implementation. I already wrote <http://barmaley-exe.blogspot.ru/2015/05/nca.html> a small prototype that works, but there are issues I want to talk about:
>
> 1. Efficiency. Should I aim for a Cython implementation, or is having everything vectorized fine?
> In general, some rule of thumb for when to use Cython would be nice.
>
> 2. Memory issues. If we cache outer products, there might not be enough memory. What's your opinion on my idea (see the blog post) of a cache of fixed size? There's a cache_size parameter for SVC, but I haven't looked into that yet. Is it of any help for this problem?
>
> 3. Optimization. At the moment I use plain batch gradient ascent with a fixed learning rate.
>
>    - I think it'd be better to use something more advanced like Momentum / AdaGrad (line search seems too impractical).
>    - Another point to consider is stochastic (mini-batch?) gradient ascent.
>    - What do you think about "cell-wise" optimization (fixing all cells of the matrix but one)?
>    - Or maybe it's enough to use scipy's conjugate gradient optimizer?
>
> On Mon, May 4, 2015 at 2:02 PM, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:
>
>> Dear Artem,
>>
>> congratulations on the acceptance of your GSoC proposal! I am certain there will be a very interesting summer ahead of us. Kyle and I are excited to be mentors and will do our best to provide all the guidance necessary for your project to succeed. It is very rich and will be a great addition to the codebase.
>>
>> Your blog post <http://barmaley-exe.blogspot.ru/2015/05/introduction.html> on the gists of the methods is written in a very understandable way and permits a good overview of the topics you are going to address in depth. It shows that you have the right intuitions, and are ready to delve into the intricacies of the methods [1]. Take advantage of the next weeks to do so! Let's make sure we hit the ground running at the end of this warm-up phase.
>>
>> As for your next plans, sketching the algorithms in very high-level pseudo-code is of course an excellent idea and can be a next blog post. After this, you can zoom in on the details of how each pseudo-code step can be implemented.
>> If you get the level of detail right, I recommend the Python language to describe your algorithms ;) -- what I mean is that getting a minimal version of the algorithm to work, just as a function, not a sklearn estimator, is a valuable baseline to have, and it usually deepens the understanding as well.
>>
>> As for the API questions, it is of course quite essential to remain conscious at all times of the issues that have been identified in prior discussion, and to think of ways to add a metric-learning module without succumbing to excessive feature creep. My hunch is that, given some working minimal versions of the algorithms, we can perhaps crystallize out what is absolutely necessary in terms of additions, so I would prefer that order of priorities. There is also some work to be done in identifying other parts of scikit-learn that already deal with (dis-)similarity-type data (cf. e.g. the kernels defined in the PR for Gaussian processes) and seeing how these can be made to work in a consistent way.
>>
>> A crucial aspect that we need to figure out is "what is a good outcome?": of course we would like to have some PRs merged at the end of summer, yes. But what makes a concrete implementation good? Speed (with respect to what)? Readability? Maintainability (yes please!)? Elegance (what does that even mean?)?
>>
>> It may be helpful if you could devise a more fine-grained timeline <https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Metric-Learning-module#timeline> for the community bonding period than what is currently stated on the wiki page. How about blogging your progress in understanding? Writing things down for others to understand is a very good way of identifying any doubts you may have on particular aspects. A mini blog entry at the end of each week, simply recounting what has been done and also what has been discussed, will also yield an effective overview of ongoing topics.
>> In the meantime, please don't hesitate to bombard us, the other mentors, and the list with any questions you may have.
>>
>> Michael
>>
>> [1] http://i.imgur.com/at3vIHh.png
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general