Yeah, I've turned that over in my head, and I'm not sure I have a great answer. But I interpret the net effect to be that the model prefers simple explanations for active users, at the cost of more error in the approximation: one would rather pick a basis that more naturally explains the data observed for active users. I can see how that could be a useful assumption -- these users are not as extremely sparse.
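To make that concrete (my notation, not from either paper, and ignoring Hu-Koren-Volinsky's confidence weighting for a moment): write I_u for the set of items user u has interacted with, n_u = |I_u|, and Y_u for the matrix stacking the factor vectors of those items. The two schemes then differ only in the ridge term of the per-user solve:

  unweighted:            x_u = (Y_u^T Y_u + \lambda I)^{-1} Y_u^T r_u
  Zhou et al. (ALS-WR):  x_u = (Y_u^T Y_u + \lambda n_u I)^{-1} Y_u^T r_u

Since Y_u^T Y_u itself grows roughly linearly in n_u, the weighted ridge keeps the regularization-to-data ratio about constant across users, whereas the unweighted one lets regularization become negligible for very active users. Put the other way around, the weighted scheme shrinks an active user's factors by an absolute amount that grows with n_u -- the "simpler explanations for active users" effect.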
On Mon, Jun 16, 2014 at 8:50 PM, Dmitriy Lyubimov <[email protected]> wrote:
> Probably a question for Sebastian.
>
> As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use
> slightly different loss functions.
>
> Zhou et al. are fairly unique in that they additionally multiply the
> norms of the U, V vectors by the number of observed interactions.
>
> The paper doesn't explain why this works, except saying, along the lines
> of, "we tried several regularization matrices, and this one worked
> better in our case".
>
> I tried to figure out why that is, and I'm still not sure why it would
> be better. So basically we say that, by giving smaller sets of
> observations smaller regularization values, it is ok for smaller
> observation sets to overfit slightly more than larger observation sets.
>
> This seems counterintuitive. Intuition tells us that smaller sets would
> actually tend to overfit more, not less, and therefore might call for a
> larger regularization rate, not a smaller one. Sebastian, what's your
> take on weighting the regularization in ALS-WR?
>
> thanks.
> -d
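For anyone following along, here is a minimal numpy sketch of the per-user solve under each scheme; the function names, shapes, and toy data are illustrative only, not Mahout's API:

import numpy as np

def solve_user_weighted(Y_u, r_u, lam):
    # Zhou et al. (ALS-WR): ridge term scaled by n_u, the number of
    # observed interactions for this user.
    n_u, k = Y_u.shape
    return np.linalg.solve(Y_u.T @ Y_u + lam * n_u * np.eye(k), Y_u.T @ r_u)

def solve_user_plain(Y_u, r_u, lam):
    # Unweighted Tikhonov ridge, independent of n_u.
    k = Y_u.shape[1]
    return np.linalg.solve(Y_u.T @ Y_u + lam * np.eye(k), Y_u.T @ r_u)

# Two users under the weighted scheme: the active user's solve carries a
# 100x larger ridge term (300 * lam vs. 3 * lam).
rng = np.random.default_rng(0)
k, lam = 5, 0.1
Y_small, r_small = rng.normal(size=(3, k)), rng.normal(size=3)
Y_large, r_large = rng.normal(size=(300, k)), rng.normal(size=300)
print(np.linalg.norm(solve_user_weighted(Y_small, r_small, lam)))
print(np.linalg.norm(solve_user_weighted(Y_large, r_large, lam)))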
