> Depending on what you mean here, I think it does support some pretty good
hierarchical capabilities.


That Yahoo paper implied that you could learn a Bayesian hierarchy as a
combination of models under an inverse GLM link function.

In their particular case, they learned a latent factor model first, then
froze its learning and added another model that learns from side
information, under the inverse link.

They sort of made a case that 1) it doesn't really matter what the inverse
link is (they considered logistic and, I think, Poisson regression as
examples) and 2) it doesn't really matter what method (or even model) is
used to train the models under the inverse link function.
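As a toy sketch of that idea (my own illustration with made-up names, not code from the paper): each stage contributes a score on the link (linear) scale, the frozen first-stage score and the side-information score are summed, and only then is the inverse link applied.

```python
import math

def inv_logit(eta):
    """Inverse link for logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))

def combined_prediction(latent_score, side_score):
    # Stage 1 (latent factors) is frozen; stage 2 (side information)
    # adds its contribution on the link scale before the inverse link
    # maps the sum back to a probability.
    return inv_logit(latent_score + side_score)
```

Swapping `inv_logit` for, say, `math.exp` (the Poisson inverse link) leaves the composition unchanged, which is the sense in which the choice of link doesn't really matter.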

Also, another observation they made was that if we learn latent links and a
regression based on static user/item profiles as a first-stage model, and
then add a model that considers side information as a second stage (with
the first stage frozen), then we can have a reasonable prediction for a
user based on its profile out of the door, before the user has made any
ratings (i.e. before any side information is available). Thus they say it
is a reasonable cold-start solution (which I think was part of the inquiry
in the original post; they say they add hundreds of users per day and would
like to make recommendations out of the door). I think they implied that
this was a technique Yahoo media was using.
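To make the cold-start point concrete, here is my own toy illustration (not code from the paper): with the first stage frozen, a brand-new user simply has no side-information terms yet, so the prediction degrades gracefully to the static-profile score.

```python
import math

def inv_logit(eta):
    """Inverse link for logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))

def predict(profile_eta, side_terms=()):
    # side_terms is empty for a new user, so the prediction falls back
    # to the frozen first-stage (static profile) score alone.
    return inv_logit(profile_eta + sum(side_terms))

new_user = predict(0.4)            # no ratings yet: profile only
known_user = predict(0.4, (0.7,))  # side information shifts the score
```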

What I think this means is that if there were a generic Bayesian GLM
architecture that allowed combining first-stage and second-stage (or
possibly more) GLM-based models, it would open up interesting opportunities
for experimentation and ad-hoc customization while reusing standard
implementations to as great a degree as possible.

One might even think of a 3-stage Bayesian hierarchy in this case: 1st
stage -- static profile regression; 2nd stage -- latent links; 3rd stage --
side information. I can imagine even more complicated ad-hoc hierarchies,
or even adaptive hierarchy selection in some cases (we have a case for some
flavor of that). But it all requires some sort of pluggable architecture
for GLM hierarchies.
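The kind of pluggable architecture I have in mind might look something like this (purely hypothetical names, not a Mahout API): each stage is any callable producing a score on the link scale, stages would be trained one at a time with earlier stages frozen, and prediction sums them before applying the inverse link.

```python
import math

def inv_logit(eta):
    """Logistic inverse link."""
    return 1.0 / (1.0 + math.exp(-eta))

class GlmHierarchy:
    """Hypothetical pluggable GLM hierarchy (illustrative only)."""

    def __init__(self, inverse_link):
        self.inverse_link = inverse_link
        self.stages = []

    def add_stage(self, stage_fn):
        # Any callable mapping features -> link-scale score can plug in.
        self.stages.append(stage_fn)
        return self

    def predict(self, features):
        # Sum the stage contributions on the link scale, then invert.
        eta = sum(stage(features) for stage in self.stages)
        return self.inverse_link(eta)

# The 3-stage example: static profile, latent links, side information;
# missing stages contribute zero, which also covers cold start.
h = (GlmHierarchy(inv_logit)
     .add_stage(lambda f: f["profile"])
     .add_stage(lambda f: f.get("latent", 0.0))
     .add_stage(lambda f: f.get("side", 0.0)))
```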


On Wed, Dec 29, 2010 at 1:03 PM, Ted Dunning <[email protected]> wrote:

> On Wed, Dec 29, 2010 at 11:37 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > What I don't understand is why they locked themselves into a log-linear
> > model there in that paper.
> >
>
> It is a very general link function that gives good probability scores.
>
> Note that log-linear == soft-max == multinomial logistic regression
>
> Only the names were changed to protect the innocent.
>
>
> > There's a Yahoo research paper around where they generalize the case for
> > GLMs and do a very similar thing. Except it doesn't have to be log-linear
> > and could actually be two-pass learning using whatever techniques in each
> > pass (as long as the architecture allows plugging those models one into
> > another).
> >
>
> The important point is not the log-linear output step.   The important
> thing
> is generalizing the equivalent of logistic regression to the dyadic case
> with side information.
>
> > Intuitively I feel that the most promising approach is something like
> > incremental SVD (with the first 20 or so items being soft-limited by
> > logit) and perhaps using weighted regularization to learn side-information
> > parameters subsequently in minibatches for online learning.
> >
>
> This is very nearly what Mohan and Elkan did.  They have an incremental
> SVD-ish model, an effective learning algorithm and, with SGD algorithms
> for learning, the ability to do incremental learning or learning to rank.
>
>
> > Also I am still not quite sure what the best way is to do input
> > normalization for both continuous and nominal inputs within the existing
> > framework. The last thing I heard was that it is not working well in the
> > existing SGD solution.
>
>
> What isn't working well enough with the SGD solution is inputs that mix
> sparse and non-sparse inputs.  Input normalization is a separate issue and
> easy to deal with using on-line estimators such as OnlineSummarizer.
>
> > .... But I am fairly sure Mahout
> > doesn't support hierarchical plugs at the moment at all.
> >
>
> Depending on what you mean here, I think it does support some pretty good
> hierarchical capabilities.
>
> It doesn't support multi-layer learning, but that isn't usually what helps
> in the large sparse problems that are the focus
> of Mahout classification.
>