It may be that they weren't solving the problem they thought they were.  By
regularizing prolific users more vigorously, they may really have just been
down-weighting them.
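
One way to see that, assuming the standard ALS-WR per-user solve (Y_u is the
sub-matrix of item factors for the items user u touched, r_u the matching
ratings, n_u the count of those observations, lambda the regularization
rate): the user update solves

  (Y_u^T Y_u + lambda * n_u * I) x_u = Y_u^T r_u

and dividing through by n_u gives

  ((1/n_u) Y_u^T Y_u + lambda * I) x_u = (1/n_u) Y_u^T r_u

so the data term becomes an average over the user's observations rather than
a sum. A user with 10,000 events pulls no harder against the prior than one
with 10.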

We effectively do the same thing in ISJ by down-sampling the data.  It is very
important to do so, but not because of regularization.  The real reason is
that the most prolific users are extremely prolific and extremely odd; they
often look unhinged because they are bots or QA teams.  Weighting the
behavior of these users heavily is a recipe for disaster.
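
Just to make "down-sampling" concrete, here is a rough sketch of the kind of
per-user cap I mean (the names and the 500 cap are made up for illustration,
not anything specific to our pipeline):

    import random

    MAX_PER_USER = 500  # illustrative cap on interactions kept per user

    def downsample_users(interactions_by_user, cap=MAX_PER_USER, seed=42):
        # interactions_by_user: dict mapping user id -> list of interactions
        rng = random.Random(seed)
        capped = {}
        for user, events in interactions_by_user.items():
            if len(events) <= cap:
                capped[user] = list(events)
            else:
                # uniformly sample a fixed-size subset from prolific users
                capped[user] = rng.sample(events, cap)
        return capped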


On Mon, Jun 16, 2014 at 1:28 PM, Sean Owen <[email protected]> wrote:

> Yeah I've turned that over in my head. I am not sure I have a great
> answer. But I interpret the net effect to be that the model prefers
> simple explanations for active users, at the cost of more error in the
> approximation. One would rather pick a basis that more naturally
> explains the data observed in active users. I think I can see that
> this could be a useful assumption -- these users are less extremely
> sparse.
>
>
> On Mon, Jun 16, 2014 at 8:50 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
> > Probably a question for Sebastian.
> >
> > As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use
> > slightly different loss functions.
> >
> > Zhou et al. are fairly unique in that they additionally multiply the norms
> > of the U, V vectors by the number of observed interactions.
> >
> > The paper doesn't explain why it works, except to say something along the
> > lines of "we tried several regularization matrices, and this one worked
> > better in our case".
> >
> > I tried to figure out why that is, and I'm still not sure why it would be
> > better. So basically we say that, by giving smaller observation sets
> > smaller regularization values, it is OK for smaller observation sets to
> > overfit slightly more than larger ones.
> >
> > This seems counterintuitive. Intuition tells us that smaller sets would
> > actually tend to overfit more, not less, and therefore might call for a
> > larger regularization rate, not a smaller one. Sebastian, what's your
> > take on weighting regularization in ALS-WR?
> >
> > thanks.
> > -d
>
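
For reference, the two objectives as I read the papers, with c_ui the
confidence weight, p_ui the binarized preference, and n_u / n_i the number of
observed ratings for user u / item i:

  Hu-Koren-Volinsky (implicit feedback):
    sum_{u,i} c_ui (p_ui - x_u^T y_i)^2
      + lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2),  c_ui = 1 + alpha * r_ui

  Zhou et al. (ALS-WR, weighted-lambda-regularization):
    sum over observed (u,i) of (r_ui - x_u^T y_i)^2
      + lambda * (sum_u n_u ||x_u||^2 + sum_i n_i ||y_i||^2)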
