On Wed, Jan 9, 2013 at 12:40 AM, Sean Owen <[email protected]> wrote:

> I think the model you're referring to can use explicit or implicit
> feedback. It's using the values -- however they are derived -- as
> weights in the loss function rather than values to be approximated
> directly. So you still use P even with implicit feedback.
>
> Of course you can also use ALS to factor R directly if you wanted, also.

Yes, I see it now. It is weighted regression, whether the data are
explicit or implicit. Thank you so much. I think I finally got the
picture.
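For the record, here is the loss I now understand us to be talking
about, assuming the model in question is the Hu/Koren/Volinsky
implicit-feedback one (the alpha below is that paper's
confidence-scaling parameter, not something defined in this thread):

  \min_{X,Y} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2
             + \lambda ( \sum_u ||x_u||^2 + \sum_i ||y_i||^2 )

  where c_{ui} = 1 + \alpha r_{ui}, and p_{ui} = 1 if r_{ui} > 0, else 0.

So R is never approximated directly; it only sets the confidence
weights c_{ui} on the binary preferences P.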
> Overfitting is as much an issue as in any ML algorithm. Hard to
> quantify it more than that, but you certainly don't want to use
> lambda = 0.
>
> The right value of lambda depends on the data -- depends even more on
> what you mean by lambda! There are different usages in different
> papers. More data means you need less lambda. The effective weight on
> the overfitting / Tikhonov terms is about 1 in my experience -- these
> terms should be weighted roughly like the loss function terms. But
> that can mean using values for lambda much smaller than 1, since
> lambda is just one multiplier of those terms in many formulations.
>
> The rank has to be greater than the effective rank of the data (of
> course). It's also something you have to fit to the data
> experimentally. For normal-ish data sets of normal-ish size, the
> right number of features is probably 20-100. I'd test in that range
> to start.
>
> More features tend to let the model overfit more, so in theory you
> need more lambda with more features, all else equal.
>
> It's *really* something you just have to fit to representative sample
> data. The optimal answer is way too dependent on the nature,
> distribution and size of the data to say more than the above.
>
> On Tue, Jan 8, 2013 at 8:54 PM, Koobas <[email protected]> wrote:
>
>> Okay, I got a little bit further in my understanding.
>> The matrix of ratings R is replaced with the binary matrix P.
>> Then R is used again in regularization.
>> I get it.
>> This takes care of the situations when you have user-item
>> interactions, but you don't have the rating.
>> So, it can handle explicit feedback, implicit feedback, and mixed
>> (partial / missing feedback).
>> If I have implicit feedback, I just drop R altogether, right?
>>
>> Now the only remaining "trick" is Tikhonov regularization,
>> which leads to a couple of questions:
>> 1) How much of a problem is overfitting?
>> 2) How do I pick lambda?
>> 3) How do I pick the rank of the approximation in the first place?
>> How does the overfitting problem depend on the rank of the
>> approximation?
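To make the "fit it to representative sample data" advice concrete for
anyone reading the archive later, here is a minimal sketch of such a
tuning loop. This is not Mahout's code: the solver is a bare-bones,
unweighted ALS in NumPy, and the toy data, grids, and all names are
made up for illustration.

import numpy as np

rng = np.random.default_rng(0)

def als(R, mask, rank, lam, iters=20):
    """Factor R ~= X @ Y.T over observed entries, with Tikhonov weight lam."""
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_items, rank))
    I = np.eye(rank)
    for _ in range(iters):
        # Alternate: solve a ridge regression per user, then per item.
        for u in range(n_users):
            obs = mask[u]
            X[u] = np.linalg.solve(Y[obs].T @ Y[obs] + lam * I,
                                   Y[obs].T @ R[u, obs])
        for i in range(n_items):
            obs = mask[:, i]
            Y[i] = np.linalg.solve(X[obs].T @ X[obs] + lam * I,
                                   X[obs].T @ R[obs, i])
    return X, Y

def rmse(R, mask, X, Y):
    err = (R - X @ Y.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Toy data: a rank-8 matrix plus noise; hold out ~20% of observed entries.
true = rng.normal(size=(60, 8)) @ rng.normal(size=(8, 40))
R = true + 0.1 * rng.normal(size=true.shape)
observed = rng.random(R.shape) < 0.5
test = observed & (rng.random(R.shape) < 0.2)
train = observed & ~test

best = None
for rank in (5, 10, 20):          # on real data, sweep the 20-100 range
    for lam in (0.01, 0.1, 1.0):  # the right scale depends on the formulation
        X, Y = als(R, train, rank, lam)
        score = rmse(R, test, X, Y)
        if best is None or score < best[0]:
            best = (score, rank, lam)
print("best holdout RMSE %.4f at rank=%d, lambda=%.2g" % best)

On real data you would sweep the feature counts and lambdas suggested
above, and for the implicit-feedback model the per-user and per-item
solves would carry the confidence weights c_{ui} instead of being
plain least squares.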
