Yeah, so that was my best guess as well. Nothing to do with regularization, just importance weighting.
The reason I was asking is because I was traditionally including "do WR / do not do WR" as a training parameter but wasn't sure if it made much sense. Now I was revisiting this for M-1365 again. I guess I will leave it here with "do WR" on by default.

On Mon, Jun 16, 2014 at 2:27 PM, Ted Dunning <[email protected]> wrote:

> It may actually be that they weren't solving the problem they thought. By regularizing prolific users more vigorously, they may actually have just been down-weighting them.
>
> We effectively do the same in ISJ by down-sampling the data. It is very important to do so, but not because of regularization. The real reason is that the most prolific users are soooo prolific and soooo odd. The reason that they appear unhinged is that they are often bots or QA teams. Weighting the behavior of these users highly is a recipe for disaster.
>
>
> On Mon, Jun 16, 2014 at 1:28 PM, Sean Owen <[email protected]> wrote:
>
> > Yeah, I've turned that over in my head. I am not sure I have a great answer. But I interpret the net effect to be that the model prefers simple explanations for active users, at the cost of more error in the approximation. One would rather pick a basis that more naturally explains the data observed in active users. I think I can see that this could be a useful assumption -- these users are less extremely sparse.
> >
> >
> > On Mon, Jun 16, 2014 at 8:50 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > > Probably a question for Sebastian.
> > >
> > > As we know, the two papers (Hu-Koren-Volinsky and Zhou et al.) use slightly different loss functions.
> > >
> > > Zhou et al. are fairly unique in that they additionally multiply the norm of the U, V vectors by the number of observed interactions.
> > >
> > > The paper doesn't explain why it works, except saying something along the lines of "we tried several regularization matrices, and this one worked better in our case".
> > >
> > > I tried to figure out why that is, and I'm still not sure why it would be better. So basically we say that, by allowing smaller observation sets to have smaller regularization values, it is OK for smaller observation sets to overfit slightly more than larger observation sets.
> > >
> > > This seems counterintuitive. Intuition tells us that smaller sets would actually tend to overfit more, not less, and therefore might possibly call for a larger regularization rate, not a smaller one. Sebastian, what's your take on weighting the regularization in ALS-WR?
> > >
> > > thanks.
> > > -d
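
For reference, a rough numpy sketch of where the weighting enters the per-user solve in ALS. The names and shapes are my own; the flat-lambda branch is just the plain L2 alternative for comparison, not an exact restatement of Hu-Koren-Volinsky's confidence-weighted objective:

import numpy as np

def als_user_update(Y_u, r_u, lam, weighted_reg=True):
    # Y_u : (n_u, k) rows of the item-factor matrix for items this user touched
    # r_u : (n_u,)   the user's observed interaction values
    # lam : regularization rate lambda
    # With weighted_reg=True the penalty is lam * n_u, i.e. Zhou et al.'s
    # weighted-lambda regularization; otherwise a flat lam is used.
    n_u, k = Y_u.shape
    reg = lam * n_u if weighted_reg else lam
    # Ridge normal equations: (Y_u^T Y_u + reg * I) x_u = Y_u^T r_u
    A = Y_u.T @ Y_u + reg * np.eye(k)
    b = Y_u.T @ r_u
    return np.linalg.solve(A, b)

With weighted_reg=True the ridge term grows with n_u, so the penalty keeps pace with the Y_u^T Y_u data term as a user's observation count grows; with a flat lam the data term increasingly dominates for prolific users, which is the "smaller sets get smaller regularization" reading above.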
