Yeah OK, that makes sense. That's a pretty helpful paper. I can get through the Lanczos algorithm without much trouble, I think. The piece I'm looking for pointers on at the moment is the eigen decomposition of the tridiagonal matrix.
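To make sure I'm picturing the right step, here's a toy sketch of what I assume that eigendecomposition amounts to when done densely in memory: alpha and beta are the diagonal and off-diagonal entries a Lanczos pass would have produced (made-up values here), T gets expanded to a dense symmetric matrix, and a plain cyclic-Jacobi sweep stands in for whatever dense eigensolver the code actually calls -- the class and method names below are mine, not the real Mahout ones.

import java.util.Arrays;

/** Toy sketch: expand the Lanczos tridiagonal T (alpha/beta) into a dense
 *  symmetric matrix and eigendecompose it in memory with cyclic Jacobi
 *  rotations. Stand-in for whatever dense eigensolver is really used. */
public class TridiagEigenSketch {

  public static void main(String[] args) {
    // alpha = diagonal of T, beta = off-diagonal; these would come from
    // the Lanczos iterations (hypothetical example values here).
    double[] alpha = {4.0, 3.0, 2.5, 1.0};
    double[] beta  = {1.0, 0.5, 0.2};
    int k = alpha.length;

    // Build T as a dense k x k matrix -- the "treat it as dense" step.
    double[][] t = new double[k][k];
    for (int i = 0; i < k; i++) {
      t[i][i] = alpha[i];
      if (i + 1 < k) {
        t[i][i + 1] = beta[i];
        t[i + 1][i] = beta[i];
      }
    }

    double[][] v = new double[k][k];   // accumulates eigenvectors (columns)
    double[] eig = jacobiEigen(t, v);  // eigenvalues of T (t is overwritten)
    System.out.println("eigenvalues: " + Arrays.toString(eig));
  }

  /** Cyclic Jacobi rotations on a dense symmetric matrix a (destroyed);
   *  eigenvectors are accumulated as the columns of v. */
  static double[] jacobiEigen(double[][] a, double[][] v) {
    int n = a.length;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        v[i][j] = (i == j) ? 1.0 : 0.0;
      }
    }
    for (int sweep = 0; sweep < 50; sweep++) {
      double off = 0.0;
      for (int p = 0; p < n; p++) {
        for (int q = p + 1; q < n; q++) {
          off += a[p][q] * a[p][q];
        }
      }
      if (off < 1e-20) {
        break;                         // off-diagonal mass is essentially gone
      }
      for (int p = 0; p < n; p++) {
        for (int q = p + 1; q < n; q++) {
          if (a[p][q] == 0.0) {
            continue;
          }
          // Rotation angle chosen to zero out a[p][q].
          double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
          double sign = theta >= 0.0 ? 1.0 : -1.0;
          double tRot = sign / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
          double c = 1.0 / Math.sqrt(tRot * tRot + 1.0);
          double s = tRot * c;
          for (int r = 0; r < n; r++) {  // A <- A * R (rotate columns p, q)
            double arp = a[r][p], arq = a[r][q];
            a[r][p] = c * arp - s * arq;
            a[r][q] = s * arp + c * arq;
          }
          for (int r = 0; r < n; r++) {  // A <- R' * A (rotate rows p, q)
            double apr = a[p][r], aqr = a[q][r];
            a[p][r] = c * apr - s * aqr;
            a[q][r] = s * apr + c * aqr;
          }
          for (int r = 0; r < n; r++) {  // V <- V * R (track eigenvectors)
            double vrp = v[r][p], vrq = v[r][q];
            v[r][p] = c * vrp - s * vrq;
            v[r][q] = s * vrp + c * vrq;
          }
        }
      }
    }
    double[] eigenvalues = new double[n];
    for (int i = 0; i < n; i++) {
      eigenvalues[i] = a[i][i];
    }
    return eigenvalues;
  }
}

If the tridiagonal really is on the order of m x m, a dense step like this is obviously the part that won't scale, which is what prompts the question below.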
Jake, am I right that this is done in memory right now? I am sure your previous
comment actually answers this next question, but I haven't connected the dots
yet. If we're getting eigenvectors of AT * A, and that matrix is huge, then
isn't the tridiagonal matrix still huge -- albeit with only 3m entries, not
m^2? It seems like the code just calls EigenDecomposer, which treats it as a
dense m x m matrix. (Is it worth me figuring out the QR decomposition enough
to build a distributed version of it?)

On Sun, Nov 6, 2011 at 7:25 PM, Sebastian Schelter <[email protected]> wrote:
> On 06.11.2011 18:52, Sean Owen wrote:
>> Following up on this very old thread.
>>
>> I understood all this bit, thanks, that greatly clarified.
>>
>> You multiply a new user vector by V to project it into the new
>> "pseudo-item", reduced-dimension space.
>> And to get back to real items, you multiply by V's inverse, which is
>> its transpose.
>> And so you are really multiplying the user vector by V VT, which is
>> not a no-op, since those are truncated matrices and aren't actually
>> exact inverses (?)
>>
>> The original paper talks about cosine similarities between users or
>> items in the reduced-dimension space, but can anyone shine light on
>> the point of that?
>
> I think this interpretation is coming from LSI, where you project a query
> onto the document feature space and use the cosine to find the most
> similar documents (which can be done by simple matrix-vector
> multiplication, as the singular vectors are orthonormal and computing dot
> products with the projected query is therefore equal to computing cosines).
>
> Something similar could be done to find the most similar items or users; the
> bad thing is that AFAIK you have to look at every other user or item, as
> U and V are dense.
>
> Maybe this paper can help to shine light on that:
>
> "Using Linear Algebra for Intelligent Information Retrieval"
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3138&rep=rep1&type=pdf
>
>> From the paper also, it seems like they say the
>> predictions are just computed as vector products as above.
>>
>> Finally, separately, I'm trying to understand the Lanczos method as
>> part of computing an SVD. Lanczos operates on a real symmetric matrix,
>> right? And am I right that it comes into play when you are computing
>> an SVD...
>>
>> A = U * S * VT
>>
>> ... because U is actually the eigenvectors of (symmetric) A*AT and V
>> is the eigenvectors of AT*A? And so Lanczos is used to answer those
>> questions to complete the SVD?
>>
>> On Fri, Jun 4, 2010 at 6:48 AM, Ted Dunning <[email protected]> wrote:
>>> You are correct. The paper has an appalling treatment of the folding-in
>>> approach.
>>>
>>> In fact, the procedure is dead simple.
>>>
>>> The basic idea is to leave the coordinate system derived in the original SVD
>>> intact and simply project the new users into that space.
>>>
>>> The easiest way to see what is happening is to start again with the original
>>> rating matrix A as decomposed:
>>>
>>> A = U S V'
>>>
>>> where A is users x items. If we multiply on the right by V, we get
>>>
>>> A V = U S V' V = U S
>>>
>>> (because V' V = I, by definition). This result is (users x items) x (items
>>> x k) = users x k; that is, it gives a k-dimensional vector for each user.
>>> Similarly, multiplication on the left by U' gives a k x items matrix which,
>>> when transposed, gives a k-dimensional vector for each item.
>>>
>>> This implies that if we augment U with new user row vectors U_new, we should
>>> be able to simply compute new k-dimensional vectors for the new users and
>>> adjoin these new vectors to the previous vectors. Concisely put,
>>>
>>>     ( A     )        ( A V     )
>>>     (       )  V  =  (         )
>>>     ( A_new )        ( A_new V )
>>>
>>> This isn't really magical. It just says that we can compute new user
>>> vectors at any time by multiplying the new users' ratings by V.
>>>
>>> The diagram in figure one is hideously confusing because it looks like a
>>> picture of some kind of multiplication whereas it is really depicting some
>>> odd kind of flow diagram.
>>>
>>> Does this solve the problem?
>>>
>>> On Thu, Jun 3, 2010 at 9:26 AM, Sean Owen <[email protected]> wrote:
>>>
>>>> Section 3 is hard to understand.
>>>>
>>>> - Ak and P are defined, but not used later
>>>> - Definition of P has UTk x Nu as a computation. UTk is a k x m
>>>> matrix, and Nu is "t" x 1. t is not defined.
>>>> - This only makes sense if t = m. But m is the number of users, and Nu
>>>> is a user vector, so it should have a number of elements equal to n, the
>>>> number of items
>>>> - Sk * VTk is described as a k x "d" matrix but d is undefined
>>>> - The diagram suggests that VTk is multiplied by all the Nu, which
>>>> makes more sense -- but only if Nu is multiplied by VTk, not the
>>>> other way around. And the diagram depicts neither of those.
>>>> - Conceptually I would understand Nu x VTk, but then P is defined by
>>>> an additional product with Uk
>>>>
>>>> In short... what?
>>>>
>>>>
>>>> On Thu, Jun 3, 2010 at 4:15 PM, Ted Dunning <[email protected]> wrote:
>>>>> Fire away.
>>>>>
>>>>> On Thu, Jun 3, 2010 at 3:52 AM, Sean Owen <[email protected]> wrote:
>>>>>
>>>>>> Is anyone out there familiar enough with this to a) discuss this paper
>>>>>> with me or b) point me to another writeup on the approach?
>>>>>>
>>>>>
>>>>
>>>
>
>
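
P.S. For my own notes, here is how I'm reading the folding-in recipe above, as a tiny self-contained sketch -- toy numbers and plain arrays rather than anything from the actual code; V_k, the new user's ratings, and the existing user's reduced vector below are all made up. The new user's k-dimensional vector is just the ratings times V_k (the (A_new V) block above), and the LSI-style reading is that you then compare users by cosine in that reduced space.

import java.util.Arrays;

/** Sketch of folding-in: project a new user's rating vector into the
 *  k-dimensional space via V_k (n items x k), then compare to an existing
 *  user's reduced vector with cosine similarity. Toy numbers throughout. */
public class FoldInSketch {

  public static void main(String[] args) {
    // V_k from a previously computed truncated SVD: n = 4 items, k = 2.
    double[][] vk = {
        { 0.5,  0.1},
        { 0.5, -0.2},
        { 0.5,  0.6},
        { 0.5, -0.7}
    };

    // Ratings of a new user over the same 4 items (0 = unrated).
    double[] newUser = {5.0, 0.0, 4.0, 1.0};

    // Fold in: a_new * V_k gives the new user's k-dimensional vector,
    // i.e. the (A_new V) block from the derivation above.
    double[] folded = multiply(newUser, vk);
    System.out.println("reduced vector: " + Arrays.toString(folded));

    // An existing user's reduced vector (a row of U*S, again made up).
    double[] existing = {6.1, -0.4};
    System.out.println("cosine to existing user: " + cosine(folded, existing));
  }

  /** Row vector (1 x n) times matrix (n x k) -> 1 x k. */
  static double[] multiply(double[] row, double[][] m) {
    int n = m.length;
    int k = m[0].length;
    double[] out = new double[k];
    for (int j = 0; j < k; j++) {
      for (int i = 0; i < n; i++) {
        out[j] += row[i] * m[i][j];
      }
    }
    return out;
  }

  /** Cosine similarity between two dense vectors of equal length. */
  static double cosine(double[] x, double[] y) {
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      nx += x[i] * x[i];
      ny += y[i] * y[i];
    }
    return dot / (Math.sqrt(nx) * Math.sqrt(ny));
  }
}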
