Drilling down just a bit more. If I just use simple Tikhonov regularization, I set both regularization matrices to the identity (lambda = 1) and iterate like this (MATLAB):
rank = 50;                              % number of latent features
for i = 1:6
    Y = (X'*X + eye(rank)) \ (X'*A);    % update item factors (rank-by-n)
    X = (A*Y') / (Y*Y' + eye(rank));    % update user factors (m-by-rank)
end

Now, can I use weighted regularization and preserve the matrix notation?
Because it seems to me that I have to go one row of X (one column of Y)
at a time. Is that really so, or am I missing something? (Sketches of
what I mean are at the bottom of this message, below the quote.)

On Wed, Jan 9, 2013 at 10:13 AM, Koobas <[email protected]> wrote:
>
> On Wed, Jan 9, 2013 at 12:40 AM, Sean Owen <[email protected]> wrote:
>
>> I think the model you're referring to can use explicit or implicit
>> feedback. It's using the values -- however they are derived -- as
>> weights in the loss function rather than values to be approximated
>> directly. So you still use P even with implicit feedback.
>>
>> Of course, you can also use ALS to factor R directly if you wanted.
>>
> Yes, I see it now.
> It is weighted regression, whether explicit or implicit data.
> Thank you so much.
> I think I finally got the picture.
>
>> Overfitting is as much an issue as in any ML algorithm. Hard to
>> quantify it more than that, but you certainly don't want to use
>> lambda = 0.
>>
>> The right value of lambda depends on the data -- and depends even more
>> on what you mean by lambda! There are different usages in different
>> papers. More data means you need less lambda. The effective weight on
>> the overfitting / Tikhonov terms is about 1 in my experience -- these
>> terms should be weighted roughly like the loss function terms. But
>> that can mean using values for lambda much smaller than 1, since
>> lambda is just one multiplier of those terms in many formulations.
>>
>> The rank has to be greater than the effective rank of the data (of
>> course). It's also something you have to fit to the data
>> experimentally. For normal-ish data sets of normal-ish size, the right
>> number of features is probably 20 - 100. I'd test in that range to
>> start.
>>
>> More features tend to let the model overfit more, so in theory you
>> need more lambda with more features, all else equal.
>>
>> It's *really* something you just have to fit to representative sample
>> data. The optimal answer is way too dependent on the nature,
>> distribution, and size of the data to say more than the above.
>>
>> On Tue, Jan 8, 2013 at 8:54 PM, Koobas <[email protected]> wrote:
>>
>> > Okay, I got a little bit further in my understanding.
>> > The matrix of ratings R is replaced with the binary matrix P.
>> > Then R is used again in regularization.
>> > I get it.
>> > This takes care of the situations when you have user-item
>> > interactions but you don't have the rating.
>> > So, it can handle explicit feedback, implicit feedback, and mixed
>> > (partial / missing feedback).
>> > If I have implicit feedback, I just drop R altogether, right?
>> >
>> > Now the only remaining "trick" is Tikhonov regularization,
>> > which leads to a couple of questions:
>> > 1) How much of a problem is overfitting?
>> > 2) How do I pick lambda?
>> > 3) How do I pick the rank of the approximation in the first place?
>> >    How does the overfitting problem depend on the rank of the
>> >    approximation?
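
P.S. To make the question concrete, here is the row-by-row update I have
in mind, following the Hu/Koren confidence weighting C = 1 + alpha*R
(alpha, lambda, and the loop are my own illustrative choices, untested).
Since every user u gets a different diagonal weight matrix Cu, the normal
equations differ per row, which is why I don't see how to keep a single
matrix expression:

% Weighted ALS, one row of X at a time (Hu/Koren-style confidence weights).
% P is the m-by-n binary preference matrix, R the raw ratings/counts,
% Y is rank-by-n as above; alpha and lambda are illustrative values.
alpha = 40; lambda = 1;
m = size(P, 1);
YtY = Y*Y';                              % shared part, precompute once per sweep
for u = 1:m
    cu = 1 + alpha*R(u,:);               % confidence weights for user u (1-by-n)
    % per-user normal equations: x_u = (Y*Cu*Y' + lambda*I) \ (Y*Cu*p_u)
    M = YtY + Y*bsxfun(@times, (cu-1)', Y') + lambda*eye(rank);
    X(u,:) = (M \ (Y*(cu .* P(u,:))'))';
end
% (and symmetrically, one column of Y at a time, holding X fixed)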

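P.P.S. And for lambda and the rank, following Sean's advice to fit them
to a representative sample, I'd just grid-search on held-out entries.
A rough sketch; als() is a placeholder for the alternating iteration
above trained only on the kept entries, and the split fraction and grids
are arbitrary:

% Hold out 10% of the observed ratings, grid-search (rank, lambda),
% keep whichever pair gives the lowest held-out RMSE.
[iu, ij, v] = find(R);                   % observed (user, item, rating) triples
n = numel(v);
held = false(n,1); held(randperm(n, round(0.1*n))) = true;

best = inf;
for r = [20 50 100]
    for lambda = [0.01 0.1 1]
        [X, Y] = als(R, iu(~held), ij(~held), r, lambda);  % hypothetical trainer
        pred = sum(X(iu(held),:) .* Y(:,ij(held))', 2);    % predict held-out cells
        rmse = sqrt(mean((v(held) - pred).^2));
        if rmse < best, best = rmse; r_best = r; lambda_best = lambda; end
    end
end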