On Wed, Dec 29, 2010 at 11:37 AM, Dmitriy Lyubimov <[email protected]> wrote:

> What i don't understand is why they locked themselves to log-linear model
> there in that paper.
>

It is a very general link function that gives good probability scores.

Note that log-linear == soft-max == multinomial logistic regression

Only the names were changed to protect the innocent.
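To make the equivalence concrete: the soft-max link just exponentiates the per-class linear scores and normalizes them to sum to one, and with two classes it collapses to the familiar sigmoid of logistic regression. A minimal plain-Java sketch (no Mahout classes, names are illustrative):

```java
public class Softmax {
    // Soft-max link: exponentiate scores and normalize so they sum to 1.
    // Subtracting the max score first is the standard trick for numerical
    // stability; it does not change the result.
    public static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0;
        double[] p = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            p[i] = Math.exp(scores[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }
}
```

With scores {0, z} the second component comes out exactly 1/(1+e^-z), i.e. the binary logistic link.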


> There's a Yahoo research paper around where they generalize the case for
> GLMs and do a very similar thing. Except it doesn't have to be log-linear
> and could actually be two-pass learning using whatever technique fits each
> pass (as long as the architecture allows plugging those models into one
> another).
>

The important point is not the log-linear output step.   The important thing
is generalizing the equivalent of logistic regression to the dyadic case
with side information.
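The shape of that generalization, hedged as a sketch rather than any particular paper's exact model: the single linear score of logistic regression becomes an inner product of latent factors for the two sides of the dyad, plus an ordinary linear term for the side-information features. All names here are hypothetical, not Mahout API:

```java
public class DyadicLogit {
    // Hypothetical illustration: probability that row entity i responds to
    // column entity j, given latent factors ui and vj and side-information
    // features x with weights w.  The latent inner product ui.vj plays the
    // role a coefficient*feature term plays in plain logistic regression.
    public static double predict(double[] ui, double[] vj, double[] w, double[] x) {
        double z = dot(ui, vj) + dot(w, x);
        return 1.0 / (1.0 + Math.exp(-z));    // logistic link on the combined score
    }

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }
}
```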

> Intuitively I feel that the most promising approach is something like
> incremental SVD (with the first 20 or so items being soft-limited by logit)
> and perhaps using weighted regularization to learn the side-information
> parameters subsequently in minibatches for online learning.
>

This is very nearly what Mohan and Elkan did.  They have an incremental
SVD-ish decomposition and an effective learning algorithm, and because the
learning is SGD-based they get incremental learning and learning to rank
almost for free.
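For the factorization part, one SGD step on such a model typically takes the gradient of the log-loss with respect to the two latent vectors and adds L2 regularization. This is a generic sketch of that kind of update, not Mohan and Elkan's exact algorithm; the learning rate and lambda are illustrative, untuned values:

```java
public class SgdFactorStep {
    // One SGD step for a logistic matrix-factorization model.
    // y is the observed label (0 or 1); ui and vj are updated in place.
    public static void step(double[] ui, double[] vj, double y,
                            double rate, double lambda) {
        double z = 0;
        for (int k = 0; k < ui.length; k++) {
            z += ui[k] * vj[k];
        }
        double p = 1.0 / (1.0 + Math.exp(-z));
        double err = y - p;                    // gradient of log-loss w.r.t. z
        for (int k = 0; k < ui.length; k++) {
            double u = ui[k];
            double v = vj[k];
            ui[k] += rate * (err * v - lambda * u);   // gradient + L2 shrinkage
            vj[k] += rate * (err * u - lambda * v);
        }
    }
}
```

Because each observed dyad touches only its own two latent vectors, the same step works unchanged for online or minibatch updates.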


> Also I am still not quite sure what the best way is to do input
> normalization for both continuous and nominal inputs together with the
> existing framework. The last thing I heard was that it is not working well
> in the existing SGD solution.


What isn't working well enough in the SGD solution is mixing sparse and
dense inputs.  Input normalization is a separate issue and is easy to
handle with on-line estimators such as OnlineSummarizer.
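The point of such estimators is that mean and spread are maintained in a single streaming pass, so each incoming value can be normalized against the statistics seen so far. As a self-contained illustration (a Welford-style sketch standing in for what an OnlineSummarizer-style estimator provides, not the actual Mahout class):

```java
public class OnlineNormalizer {
    // Welford's one-pass algorithm for running mean and variance.
    private long n = 0;
    private double mean = 0;
    private double m2 = 0;

    public void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    public double mean() { return mean; }

    public double sd() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0;  // sample standard deviation
    }

    // Normalize a new value using the statistics accumulated so far.
    public double normalize(double x) {
        double s = sd();
        return s > 0 ? (x - mean) / s : x - mean;
    }
}
```

Nominal inputs would bypass this path entirely (they are typically hashed or one-hot encoded), which is one reason mixing the two input kinds is the harder problem.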

> .... But I am fairly sure Mahout
> doesn't support hierarchical plugs at the moment at all.
>

Depending on what you mean here, I think it does support some pretty good
hierarchical capabilities.

It doesn't support multi-layer learning, but that isn't usually what helps
in the large sparse problems that are the focus
of Mahout classification.
