Daniel,

You have to distinguish between explicit data (ratings from a
predefined scale) and implicit data (counting how often you observed
some behavior).

For explicit data, you can't interpret missing values as zeros,
because you simply don't know what rating the user would give. In
order to still use matrix factorization techniques, the decomposition
has to be computed in a different way than with standard SVD
approaches. The error function stays the same as with SVD (minimize
the squared error between the known ratings and the corresponding
entries of the product of the factor matrices), but the computation
uses only the known entries. That's nothing Mahout-specific; Mahout
has implementations of the approaches described in
http://sifter.org/~simon/journal/20061211.html and in
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2797&rep=rep1&type=pdf
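To make that concrete, here's a rough Funk-style SGD sketch in Java.
It's not actual Mahout code; the class, the field names and the
hyperparameters are made up for illustration. The point is just that
the loop only ever touches the observed (user, item, rating) triples,
so missing cells never enter the error:

    import java.util.List;

    // One observed (user, item, rating) triple -- the only cells we train on.
    class Rating {
      final int user;
      final int item;
      final double value;
      Rating(int user, int item, double value) {
        this.user = user; this.item = item; this.value = value;
      }
    }

    class SgdFactorizerSketch {
      static void factorize(List<Rating> knownRatings,
                            double[][] userFeatures,   // numUsers x numFeatures
                            double[][] itemFeatures,   // numItems x numFeatures
                            int epochs, double learningRate, double lambda) {
        int numFeatures = userFeatures[0].length;
        for (int epoch = 0; epoch < epochs; epoch++) {
          for (Rating r : knownRatings) {              // only observed cells, no zeros
            double[] u = userFeatures[r.user];
            double[] v = itemFeatures[r.item];
            double prediction = 0;
            for (int f = 0; f < numFeatures; f++) {
              prediction += u[f] * v[f];
            }
            double error = r.value - prediction;       // squared-error gradient step
            for (int f = 0; f < numFeatures; f++) {
              double uf = u[f];
              u[f] += learningRate * (error * v[f] - lambda * uf);
              v[f] += learningRate * (error * uf - lambda * v[f]);
            }
          }
        }
      }
    }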

For implicit data, the situation is different: if you haven't
observed a user conducting some behavior with an item, then your
matrix should indeed have a 0 in that cell. The problem here is that
the user might simply not have had the opportunity to interact with a
lot of items, which means that you can't really 'trust' the zero
entries as much as the other entries. There is a great paper that
introduces a 'confidence' value for implicit data to solve this
problem: www2.research.att.com/~yifanhu/PUB/cf.pdf
Generally speaking, with this technique the factorization uses the
whole matrix, but 'favors' the non-zero entries.
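To illustrate the weighting idea (this is neither Mahout's nor the
paper's actual code -- the paper minimizes this with an ALS trick
rather than the naive loop below, and 'alpha' and all the names here
are just placeholders), the error function roughly looks like this:
every cell contributes, but observed cells get a larger confidence:

    class ImplicitWeightingSketch {
      static double weightedError(double[][] observedCounts,  // numUsers x numItems
                                  double[][] userFeatures,    // numUsers x numFeatures
                                  double[][] itemFeatures,    // numItems x numFeatures
                                  double alpha) {
        int numFeatures = userFeatures[0].length;
        double error = 0;
        for (int u = 0; u < observedCounts.length; u++) {
          for (int i = 0; i < observedCounts[u].length; i++) {
            double count = observedCounts[u][i];
            double preference = count > 0 ? 1 : 0;      // did we see any interaction?
            double confidence = 1 + alpha * count;      // zeros still count, but weakly
            double prediction = 0;
            for (int f = 0; f < numFeatures; f++) {
              prediction += userFeatures[u][f] * itemFeatures[i][f];
            }
            double diff = preference - prediction;
            error += confidence * diff * diff;          // whole matrix, non-zeros favored
          }
        }
        return error;
      }
    }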

--sebastian

2012/4/29 Sean Owen <[email protected]>:
> They're implicitly zero as far as the math goes IIRC
>
> On Sun, Apr 29, 2012 at 10:45 PM, Daniel Quach <[email protected]> wrote:
>> ah sorry, I meant in the context of the SVDRecommender.
>>
>> Your earlier email mentioned that the DataModel does NOT do any subtraction, 
>> nor add back in the end, ensuring the matrix remains sparse. Does that mean 
>> it inserts zero values?
