On Wed, Dec 29, 2010 at 11:37 AM, Dmitriy Lyubimov <[email protected]> wrote:
> What I don't understand is why they locked themselves to a log-linear
> model there in that paper. It is a very general link function that gives
> good probability scores.

Note that log-linear == soft-max == multinomial logistic regression. Only
the names were changed to protect the innocent.

> There's a Yahoo research paper around where they generalize the case to
> GLMs and do a very similar thing, except it doesn't have to be log-linear
> and could actually be two-pass learning using whatever technique in each
> pass (as long as the architecture allows plugging those models into one
> another).

The important point is not the log-linear output step. The important thing
is generalizing the equivalent of logistic regression to the dyadic case
with side information.

> Intuitively I feel that the most promising approach is something like
> incremental SVD (with the first 20 or so items being soft-limited by a
> logit), perhaps using weighted regularization to learn the side-information
> parameters subsequently in mini-batches for online learning.

This is very nearly what Mohan and Elkan did. They have an incremental
SVD-ish decomposition, an effective learning algorithm and, with SGD
algorithms for the learning, the ability to do incremental learning or
learning to rank.

> Also, I am still not quite sure of the best way to do input normalization
> for continuous and nominal inputs together with the existing framework.
> The last thing I heard was that it was not working well in the existing
> SGD solution.

What isn't working well enough with the SGD solution is inputs that mix
sparse and dense features. Input normalization is a separate issue and is
easy to deal with using on-line estimators such as OnlineSummarizer.

> ... But I am fairly sure Mahout doesn't support hierarchical plugs at the
> moment at all.

Depending on what you mean here, I think it does support some pretty good
hierarchical capabilities.
It doesn't support multi-layer learning, but that isn't usually what helps in the large sparse problems that are the focus of Mahout classification.
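To make the log-linear == soft-max == multinomial-logistic point concrete,
here is a minimal Python sketch (the function names are mine, not from any
Mahout API) showing that a two-class soft-max collapses to the logistic
sigmoid, i.e. the same model under different names:

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# With two classes scored (0, x), the soft-max probability of the second
# class is 1 / (1 + exp(-x)) -- exactly the logistic sigmoid of x.
x = 1.7
p_softmax = softmax([0.0, x])[1]
p_logit = sigmoid(x)
assert abs(p_softmax - p_logit) < 1e-12
```

Generalizing to k > 2 classes is what makes it "multinomial"; the link
function itself is unchanged.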
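On the SGD side, the core update being discussed is just the stochastic
gradient step for regularized logistic regression. A hedged sketch (the
function name and the learning-rate/regularization defaults are
illustrative, not Mahout's actual API or settings):

```python
import math

def sgd_logistic_step(w, x, y, lr=0.1, l2=1e-4):
    """One SGD step for L2-regularized logistic regression.
    w: weight vector, x: feature vector, y: label in {0, 1}.
    lr and l2 are illustrative hyperparameter values."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))          # predicted probability of y=1
    # Gradient of the log-loss is (p - y) * x; add the L2 penalty term.
    return [wi - lr * ((p - y) * xi + l2 * wi) for wi, xi in zip(w, x)]

# One step on a positive example pulls the predicted probability upward.
w = sgd_logistic_step([0.0, 0.0], [1.0, 1.0], 1.0)
```

Because each update touches only one example, the same loop gives you
incremental (online) learning for free, which is what makes the approach
attractive for the learning-to-rank setting mentioned above.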
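And on input normalization: the point about on-line estimators is that you
can normalize continuous features from running statistics without a second
pass over the data. A simplified analogue using Welford's algorithm for a
streaming mean and standard deviation (this class is my sketch, not the
OnlineSummarizer implementation, which tracks order statistics as well):

```python
class OnlineNormalizer:
    """Streaming mean/variance via Welford's algorithm -- a simplified
    stand-in for an on-line estimator such as OnlineSummarizer."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sd(self):
        # Sample standard deviation; fall back to 1.0 with too few points.
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 1.0

    def normalize(self, x):
        return (x - self.mean) / (self.sd() or 1.0)
```

Nominal inputs don't need this; they are typically hashed or one-hot
encoded into sparse 0/1 features, which is exactly where the sparse/dense
mixing issue above comes in.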
