Daniel, you have to distinguish between explicit data (ratings on a predefined scale) and implicit data (counts of how often you observed some behavior).
For explicit data, you can't interpret missing values as zeros, because you simply don't know what rating the user would give. To still use matrix factorization techniques, the decomposition has to be computed differently than with standard SVD approaches: the error function stays the same as with SVD (minimize the squared error between the ratings and the product of the factor matrices), but it is evaluated over the known entries only. That's nothing Mahout-specific; Mahout has implementations of the approaches described in
http://sifter.org/~simon/journal/20061211.html and in
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.2797&rep=rep1&type=pdf

For implicit data, the situation is different: if you haven't observed a user conducting some behavior with an item, then your matrix should indeed have a 0 in that cell. The problem here is that the user might simply not have had the opportunity to interact with a lot of items, which means you can't really 'trust' the zero entries as much as the other entries. There is a great paper that introduces a 'confidence' value for implicit data to solve this problem:
www2.research.att.com/~yifanhu/PUB/cf.pdf
Generally speaking, with this technique the factorization uses the whole matrix, but 'favors' the non-zero entries.

--sebastian

2012/4/29 Sean Owen <[email protected]>:
> They're implicitly zero as far as the math goes IIRC
>
> On Sun, Apr 29, 2012 at 10:45 PM, Daniel Quach <[email protected]> wrote:
>> ah sorry, I meant in the context of the SVDRecommender.
>>
>> Your earlier email mentioned that the DataModel does NOT do any subtraction,
>> nor add back in the end, ensuring the matrix remains sparse. Does that mean
>> it inserts zero values?
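PS: to make the explicit-data case concrete, here is a toy NumPy sketch (not Mahout's actual code) of gradient descent that minimizes the squared error over the known entries only; the ratings matrix, rank, and hyperparameters are made up for illustration:

```python
import numpy as np

# Toy explicit ratings; 0 marks an UNKNOWN entry, not a rating of zero.
R = np.array([[5, 3, 0],
              [4, 0, 1],
              [0, 2, 4]], dtype=float)
known = R > 0                       # mask of observed cells

k, lr, reg, epochs = 2, 0.01, 0.02, 2000
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))   # item factors

for _ in range(epochs):
    # SGD update on KNOWN entries only -- missing cells never enter the loss
    for u, i in zip(*known.nonzero()):
        pu = U[u].copy()
        err = R[u, i] - pu @ V[i]
        U[u] += lr * (err * V[i] - reg * pu)
        V[i] += lr * (err * pu - reg * V[i])
```

After training, `U @ V.T` fills in predictions for the missing cells without ever having treated them as zeros.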
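And a similarly toy sketch of the confidence-weighted ALS from the Hu/Koren/Volinsky paper above (again illustrative NumPy, not Mahout's implementation; the counts and the alpha value are invented): every cell of the matrix enters the loss, but observed cells get a much larger confidence weight than the zeros.

```python
import numpy as np

# Toy implicit counts; here 0 really means "no observed interaction".
C_raw = np.array([[3, 0, 1],
                  [0, 5, 0],
                  [2, 0, 4]], dtype=float)
P = (C_raw > 0).astype(float)       # binary preference
alpha = 40.0
C = 1.0 + alpha * C_raw             # confidence: zeros keep weight 1, observed cells much more

k, reg, iters = 2, 0.1, 15
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(C.shape[0], k))
V = rng.normal(scale=0.1, size=(C.shape[1], k))
I = np.eye(k)

for _ in range(iters):
    # Alternating least squares over the WHOLE matrix, weighted by confidence
    for u in range(C.shape[0]):
        Cu = np.diag(C[u])
        U[u] = np.linalg.solve(V.T @ Cu @ V + reg * I, V.T @ Cu @ P[u])
    for i in range(C.shape[1]):
        Ci = np.diag(C[:, i])
        V[i] = np.linalg.solve(U.T @ Ci @ U + reg * I, U.T @ Ci @ P[:, i])
```

The zeros are used, but with confidence 1 they are easily overruled, which is exactly the 'favoring' of non-zero entries mentioned above.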
