And I was referring to the SVD recommender, not SGD, here. SGD indeed takes care of that kind of problem, since it doesn't examine "empty cells" when computing latent factors while solving the factorization problem.
But I think there's a similar problem with missing side-information labels in the SGD case: say we have a bunch of probes and we read signals off of them at certain intervals, but now and then we fail to read some of them. Actually, we fail pretty often. Regular SGD doesn't 'freeze' learning for inputs we failed to read, so we are forced to put some value there; the least harmful choice seems to be the average, since it doesn't cause any learning to happen on that particular input. But I think it does cause regularization to count a generation, thus cancelling some of the learning. Whereas if we grouped the missing inputs into separate learners and did hierarchical learning, that would not happen. That's what I meant by SGD producing slightly more errors in this case compared to what seems possible with hierarchies.

The similarity between the two cases (sparse SVD inputs and SGD inputs) is that in each we are forced to feed 'made-up' data to the learners, because we failed to observe it in the sample.

On Wed, Feb 2, 2011 at 11:05 PM, Ted Dunning <[email protected]> wrote:
> That is a property of sparsity and connectedness, not SGD.
>
> On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> As one guy from Stanford demonstrated on
>> Netflix data, the whole system collapses very quickly after certain
>> threshold of sample sparsity is reached.
>
>
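P.S. The regularization effect I mean can be sketched in a few lines (this is a minimal illustration with made-up names and numbers, not anyone's actual implementation). With a mean-centered feature, imputing the mean makes the missing input zero, so the data-driven part of the gradient for that weight vanishes; the L2 penalty term, however, still shrinks the weight on every step:

```python
# One SGD step for linear regression with an L2 penalty.
# Feature 1 is "missing" and imputed with its (centered) mean, i.e. 0.0,
# so err * x[1] == 0 and only the lam * w[1] regularization term acts on w[1].

def sgd_step(w, x, y, lr=0.1, lam=0.01):
    pred = sum(wi * xi for wi, xi in zip(w, x))
    err = pred - y
    # per-weight gradient: err * x_i  (data term)  +  lam * w_i  (L2 term)
    return [wi - lr * (err * xi + lam * wi) for wi, xi in zip(w, x)]

w = [0.5, 0.5]
x = [1.0, 0.0]          # second input failed to read, imputed with mean 0.0
w2 = sgd_step(w, x, y=1.0)
print(w2[1])            # smaller than 0.5: shrunk by regularization alone
```

So the weight for the unobserved input decays even though no observation touched it, which is the bit of learning that gets cancelled.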
