Hi Dmitriy,

I'm not sure whether this algorithm,
http://www.stanford.edu/~raghuram/optspace/index.html, could help with
missing information in SGD, but it seems to be a very efficient approach
to unknown ratings in CF tasks using SVD.

2011/2/3 Dmitriy Lyubimov <[email protected]>

> And I was referring to the SVD recommender here, not SGD. SGD indeed
> takes care of that kind of problem, since it doesn't examine "empty
> cells" when computing latent factors while solving factorization
> problems.
>
> But I think there's a similar problem with missing side-information
> labels in the case of SGD: say we have a bunch of probes and we are
> reading signals off of them at certain intervals, but now and then we
> fail to read some of them. Actually, we fail pretty often. But regular
> SGD doesn't 'freeze' learning for inputs we failed to read. We are
> forced to put some value there, and the least harmful, it seems, is
> the average, since it doesn't cause any learning to happen on that
> particular input. But I think it does cause regularization to count a
> generation, thus cancelling some of the learning. Whereas if we
> grouped missing inputs into separate learners and did hierarchical
> learning, that would not happen. That's what I meant by SGD producing
> slightly more errors in this case compared to what seems possible
> with hierarchies.
>
> The similarity between these cases (sparse SVD and SGD inputs) is
> that in each case we are forced to feed made-up data to the learners,
> because we failed to observe it in the sample.
>
> On Wed, Feb 2, 2011 at 11:05 PM, Ted Dunning <[email protected]> wrote:
> > That is a property of sparsity and connectedness, not SGD.
> >
> > On Wed, Feb 2, 2011 at 8:54 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >>
> >> As one guy from Stanford demonstrated on Netflix data, the whole
> >> system collapses very quickly after a certain threshold of sample
> >> sparsity is reached.
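The point about mean-imputation still "counting a generation" of regularization can be sketched with a toy example. This is a minimal illustration, not Mahout code; the learning rate, regularization strength, and the mean-centered-inputs assumption (so the imputed average is 0) are all my own choices. With the fill value at 0, the gradient term for the missing feature vanishes, yet the L2 shrinkage still fires on every step, so the imputing learner's weight decays while a separate learner that never sees the missing feature keeps it intact:

```python
import random

def sgd_step(w, x, y, lr=0.01, reg=0.001):
    """One L2-regularized linear SGD step: w <- w - lr * (err * x + reg * w)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    err = pred - y
    return [wi - lr * (err * xi + reg * wi) for wi, xi in zip(w, x)]

w_impute = [0.5, 0.5]  # learner that imputes the mean for the missing feature
w_skip = [0.5, 0.5]    # "hierarchical" setup: a separate learner never sees it

feature_mean = 0.0  # assumed mean-centered inputs, so the average fill is 0

for _ in range(1000):
    x_obs = random.gauss(0.0, 1.0)
    y = 0.5 * x_obs  # target depends only on the observed feature

    # Learner A: feature 1 failed to read; fill in the mean and step anyway.
    w_impute = sgd_step(w_impute, [x_obs, feature_mean], y)

    # Learner B: update only the observed feature; weight 1 is frozen.
    w_skip[0] = sgd_step([w_skip[0]], [x_obs], y)[0]

# Gradient on the imputed feature was always zero (err * 0), but the
# regularization term shrank its weight by (1 - lr*reg) every generation.
print(w_impute[1])  # ~0.495 after 1000 steps of pure shrinkage
print(w_skip[1])    # exactly 0.5, untouched
```

The decay here is tiny because the example is small, but the mechanism matches the argument above: the imputed input does no learning, yet it is not free, because the regularizer is applied regardless.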
