Thanks, it helped!

After giving some thought to the outcome of the prediction, I have a 
question about measuring the quality of my model.
If I'm using a technique that ends up predicting a preference value 
(implicit / explicit), I can easily evaluate my model by applying it to a test 
dataset and calculating RMSE and so on.
But if I'm only estimating the likelihood that the user will like the item (as 
with co-occurrence item-based recommendation), that lets me rank items, but 
it doesn't give me a value to compare against the test data.
How can I measure the success of my ranking in that case?
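
To make the two cases concrete, here is a minimal Python sketch of what I mean.
The function names and the toy data are made up for illustration, and
precision at k is only one ranking measure I could imagine using here, not
something anyone in this thread has recommended.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out preference values."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def precision_at_k(ranked_items, relevant_items, k):
    """Fraction of the top-k ranked items that the user actually liked."""
    top = ranked_items[:k]
    return sum(1 for item in top if item in relevant_items) / k

# Case 1: the model predicts preference values, so RMSE on a test set works.
error = rmse([3.0, 4.0], [3.0, 2.0])

# Case 2: the model only produces an ordering, so compare the top of the
# ranking against held-out items the user is known to like (hypothetical data).
ranking = ["a", "b", "c", "d"]
liked = {"a", "c", "e"}
score = precision_at_k(ranking, liked, 3)  # 2 of the top 3 are liked
```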

-----Original Message-----
From: Sean Owen
Sent: Friday, July 06, 2012 12:35
To: Mahout User List
Subject: Shortcut to finding the best recs from factored matrices?

Here's one I've been puzzling over for a bit. In a factorization based
on the SVD or what have you, you reconstruct the approximate original
matrix (well, one row) by multiplying the matrices back together and
looking for the largest elements. This is essentially multiplying a
user feature vector by the entire item-feature matrix to reconstruct
one approximate row of the input.

That's not necessarily so slow, but it's not the fastest thing. I want
to speed it up. It seems like there ought to be some shortcut, even if
it means a probabilistic approach that could get it slightly wrong at
times.

I say so because most item feature vectors are nowhere near the user
feature vector in feature space. Their dot product is not going to be
the largest. In fact, given the user feature vector you can say
exactly where in feature space (which direction) you want to look for
the top items. For example, if the user feature vector is (2,1), you
are looking for the item vector (x,y) that maximizes 2x+y, that is, the
one with the largest projection onto the direction of (2,1).
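
As a baseline for comparison, the brute-force approach can be sketched in a
few lines of Python. This is only an illustration of the full scan that a
shortcut would need to beat, using the (2,1) user vector from the example; the
item ids, item vectors and the name top_items are invented for the sketch.

```python
import heapq

def top_items(user_vec, item_vecs, k):
    """Score every item by its dot product with the user vector, keep the k best."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = ((item_id, dot(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])

user = (2, 1)
# Hypothetical item feature vectors (x, y); each score is 2x + y.
items = {"a": (1, 0), "b": (0, 1), "c": (1, 1), "d": (-1, 3)}
best = top_items(user, items, 2)
print(best)  # [('c', 3), ('a', 2)]
```

The cost is one dot product per item for every user, which is the work an
index over the item vectors would try to avoid.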

When feature space is 50+-dimensional though, I'm having a hard time
thinking of an efficient way to index those item feature vectors such
that one could quickly find a few buckets of items to check and with
high confidence have found the best recommendations. The bases I have
are not necessarily orthogonal let alone orthonormal either. I bet,
hope, someone will have an insight?

You could cluster the items with k-means, quickly, I suppose. I was
hoping for something less heavy.

Sean