Re: building a (weighted) movie similarity measure

Ted Dunning Thu, 15 Sep 2011 15:57:36 -0700

2011/9/15 eric konsirald <eric.konsir...@gmail.com>

> ... When you say that co-occurrence processing will not work the way i might
> like, you mean because those might be feature hashed independently and will
> hardly be linked together, right?
>


No.  I meant that you will get odd artifacts.  For instance, if you
use two probes, then the two probes for a single item will appear to be a
very strong cooccurrence.  Likewise, when items collide on a single probe,
that collision will make the cooccurrence appear as if it were real.

You can repair this somewhat by normalizing the encoding and then doing the
direct matrix multiplication.  You will get some spreading of the
cooccurrence energy, but it should be mostly focussed on the points you
want.
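
As a rough illustration of the repair, here is a minimal sketch in plain
Python. It is one possible reading of "normalize, then multiply", not Mahout
code: the salted-MD5 probe function, the toy data, and all names (probes,
encoding, estimate) are assumptions made up for this example. Each item's
encoding row is scaled to unit length, cooccurrence is computed in the hashed
space, and the result is projected back onto items.

```python
import hashlib

def probes(item, num_slots, num_probes=2):
    """Slot index for each probe of an item (salted MD5 is an assumption)."""
    return [int(hashlib.md5(f"{p}:{item}".encode()).hexdigest(), 16) % num_slots
            for p in range(num_probes)]

def encoding(items, num_slots, num_probes=2):
    """Item-by-slot encoding matrix, each row scaled to unit L2 norm."""
    H = []
    for item in items:
        row = [0.0] * num_slots
        for s in probes(item, num_slots, num_probes):
            row[s] += 1.0
        norm = sum(v * v for v in row) ** 0.5
        H.append([v / norm for v in row])
    return H

def matmul(X, Y):
    """Plain dense matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

items = ["alien", "brazil", "casablanca", "dune"]
# Toy user-by-item matrix: 1.0 means the user interacted with the item.
A = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0, 1.0]]

H = encoding(items, num_slots=16)
B = matmul(A, H)                     # users expressed in hashed slots
hashed = matmul(transpose(B), B)     # slot-by-slot counts, artifacts included
true = matmul(transpose(A), A)       # item-by-item counts we actually want
# Project back onto items: estimate = H * hashed * H^T
estimate = matmul(matmul(H, hashed), transpose(H))
```

In hashed, the two slots belonging to a single item show up as a strong
cooccurrence, which is the artifact described above. In estimate, every entry
is at least the corresponding entry of true; the excess is the spreading
caused by items that share a slot.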

> maybe what i can do is to do an independent co-occurrence analysis and then
> inject the meaningful co-occurrences as a single token (each word separated
> e.g. by a '_') in the feature vector.
>

That is an excellent approach.
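
A minimal sketch of that idea in plain Python, under stated assumptions: the
function names are hypothetical, and a raw count threshold (min_count) stands
in for whatever significance test you actually use to decide which
co-occurrences are meaningful.

```python
from collections import Counter
from itertools import combinations

def significant_pairs(docs, min_count=2):
    """Pairs of distinct tokens that appear together in >= min_count docs."""
    counts = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

def augment(tokens, pairs):
    """Append one 'a_b' token for each significant pair present in tokens."""
    present = set(tokens)
    extra = [f"{a}_{b}" for a, b in sorted(pairs)
             if a in present and b in present]
    return list(tokens) + extra

docs = [["space", "alien", "war"],
        ["space", "alien"],
        ["war", "romance"]]
pairs = significant_pairs(docs)
augmented = [augment(tokens, pairs) for tokens in docs]
```

The augmented token lists can then be fed to the hashed encoder like any
other tokens, so each injected pair gets its own probes rather than relying
on collisions in the hashed space.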

> - with SGD for instance, i'm more familiar seeing it at work as a learning
> algorithm for e.g. classification (referring to the 20NewsGroup example i
> saw in Mahout), but i'm not sure how to use it as a standalone optimization
> algorithm for minimizing a given objective function.


The SGD algorithm we have in Mahout uses gradient descent to optimize an
error criterion.  If you compute the gradient you can use it for whatever
optimization you like.  The sparse update stuff will all change, but it
should still work.
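
To make "compute the gradient and use it for whatever you like" concrete,
here is a generic sketch in plain Python. It is not Mahout's implementation;
the function sgd and its parameters (rate, epochs, seed) are made up for
illustration, and it uses a constant learning rate and dense updates only.

```python
import random

def sgd(grad, w, samples, rate=0.1, epochs=50, seed=0):
    """Minimize a sum of per-sample losses, given only their gradient.

    grad(w, sample) must return the gradient of that sample's loss at w.
    """
    rng = random.Random(seed)
    w = list(w)
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for sample in order:
            g = grad(w, sample)
            for k in range(len(w)):
                w[k] -= rate * g[k]
    return w

# Example objective: sum over samples x of (w[0] - x)^2.
# Its minimum is at the mean of the samples.
def grad_squared_distance(w, x):
    return [2.0 * (w[0] - x)]

w = sgd(grad_squared_distance, [0.0], [1.0, 2.0, 3.0], rate=0.05, epochs=200)
```

Because the rate is constant, w[0] hovers near the mean (2.0) rather than
settling on it exactly; a decaying rate would tighten that.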


> For instance, to use
> the terminology from the paper i mentioned (
> http://www2008.org/papers/pdf/p1041-debnath.pdf ), if i'd like to find the
> optimized values for the weights ω0,..., ωN given a set of equations of the
> form
> ω0 + ω1f(A1i, A1j ) + ω2f(A2i, A2j ) + · · · + ωNf(Ani, Anj ) = E(Oi, Oj)
> is there any easy way to do that in Mahout?
>

Not directly, and how you would do this would likely vary with the
definition of f.
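
Outside of Mahout, one way to sketch it is to treat each equation as a
training example and minimize the squared error between the left-hand side
and E(Oi, Oj). The plain-Python example below assumes a hypothetical f (1.0
when two attribute values are equal, 0.0 otherwise) and made-up data; with a
different f only the features function changes.

```python
import random

def f(a, b):
    """Hypothetical attribute similarity: 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0

def features(item_i, item_j):
    """[1, f(A1i, A1j), ..., f(ANi, ANj)]; the leading 1 pairs with w0."""
    return [1.0] + [f(a, b) for a, b in zip(item_i, item_j)]

def fit_weights(pairs, targets, rate=0.1, epochs=500, seed=0):
    """SGD on the squared error of w . features(i, j) - E(Oi, Oj)."""
    rng = random.Random(seed)
    rows = [(features(i, j), e) for (i, j), e in zip(pairs, targets)]
    w = [0.0] * len(rows[0][0])
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, e in rows:
            err = sum(wk * xk for wk, xk in zip(w, x)) - e
            for k, xk in enumerate(x):
                w[k] -= rate * err * xk   # gradient of 0.5 * err^2
    return w

# Items are (genre, director) tuples; targets are made-up values of E(Oi, Oj).
base = ("drama", "X")
pairs = [(base, ("comedy", "Y")),   # no attribute matches
         (base, ("drama", "Y")),    # genre matches
         (base, ("comedy", "X")),   # director matches
         (base, ("drama", "X"))]    # both match
targets = [0.1, 0.6, 0.4, 0.9]
weights = fit_weights(pairs, targets)
```

The toy targets were generated from w0 = 0.1, w1 = 0.5, w2 = 0.3, so the
fitted weights should land close to those values. Real data will not be
exactly consistent, in which case the result is a least-squares compromise.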


The rest of your questions are all good.  I can't really spare the time just
now (plane about to leave) to think through a good answer, but I think,
based on your questions, that you have some good momentum in a good
direction.
