Re: Persisting trained models in Mahout

Vinod Thu, 08 Dec 2011 05:02:40 -0800

Hi Sean,

Neither Recommender nor any of its parent interfaces extends Serializable, so
there is no way for me to serialize it directly.


I agree that the implementations may not have much startup overhead. However,
training a model on millions of rows is CPU-, memory- and time-intensive.
For example, when the data set is changed from 100K to 1M in chapter 4, the
program crashes with an OutOfMemory error after a significant amount of time.

I feel that training should be done in development only. Once a developer
is satisfied with the test results, they should be able to save an instance of
the trained and tested model (e.g. a recommender or classifier).

Only these saved instances of trained and tested models should be deployed
to production.
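
To make the manual route from the quoted reply concrete (query the user-user
similarities, save them, load them again later), here is a rough sketch in
plain Java. It deliberately uses no Mahout classes: SimilarityStore, Entry,
save and load are hypothetical names made up for illustration, and the file
layout (a count followed by long/long/double triples) is just an assumption.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: stores precomputed
// user-user similarities as (userA, userB, value) triples.
final class SimilarityStore {

    /** One precomputed similarity between two user IDs. */
    static final class Entry {
        final long userA;
        final long userB;
        final double value;

        Entry(long userA, long userB, double value) {
            this.userA = userA;
            this.userB = userB;
            this.value = value;
        }
    }

    /** Writes the number of entries, then each triple, to the given file. */
    static void save(List<Entry> entries, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(entries.size());
            for (Entry e : entries) {
                out.writeLong(e.userA);
                out.writeLong(e.userB);
                out.writeDouble(e.value);
            }
        }
    }

    /** Reads back the triples in the order they were written by save(). */
    static List<Entry> load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            List<Entry> entries = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                long userA = in.readLong();
                long userB = in.readLong();
                double value = in.readDouble();
                entries.add(new Entry(userA, userB, value));
            }
            return entries;
        }
    }
}
```

In development, the triples would be collected by querying the trained
similarity for each user pair of interest and passed to save(). In production,
the output of load() would be converted into whatever GenericUserSimilarity
expects, as suggested in the quoted reply. That conversion is left out here so
the sketch stays free of Mahout dependencies.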

Thoughts?

regards,
Vinod



On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen <[email protected]> wrote:

> Ah right. No, there's still not a provision for this. You would just have
> to serialize it yourself if you like.
> Most of the implementations don't have a great deal of startup overhead, so
> don't really need this. The exception is perhaps slope-one, but there you
> can actually save and supply pre-computed diffs.
> Still it would be valid to store and re-supply user-user similarities or
> something. You can do this, manually, by querying for user-user
> similarities, saving them, then loading them and supplying them via
> GenericUserSimilarity for instance.
>
> On Thu, Dec 8, 2011 at 12:27 PM, Vinod <[email protected]> wrote:
>
> > Hi Sean,
> >
> > Thanks for the quick response.
> >
> > By model, I am not referring to the data model but to a "trained" recommender
> > instance.
> >
> > Weka, for example, has the ability to save and load models:
> > http://weka.wikispaces.com/Serialization
> > http://weka.wikispaces.com/Saving+and+loading+models
> >
> > This avoids the need to train the model (recommender) every time a server is
> > bounced or a program is restarted.
> >
> > regards,
> > Vinod
> >
> >
> > On Thu, Dec 8, 2011 at 5:43 PM, Sean Owen <[email protected]> wrote:
> >
> > > The classes aren't Serializable, no. In the case of DataModel, it's assumed
> > > that you already have some persisted model somewhere, in a DB or file or
> > > something, so this would be redundant.
> > >
> > > On Thu, Dec 8, 2011 at 12:07 PM, Vinod <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > This is my first day of experimentation with Mahout. I am following the
> > > > "Mahout in Action" book and, looking at the sample code provided, it seems
> > > > that models (e.g. a recommender) need to be trained at the start of the
> > > > program (start/restart). The Recommender interface extends Refreshable,
> > > > which doesn't extend Serializable. So I am wondering whether Mahout provides
> > > > an alternate mechanism to persist trained models (a recommender instance in
> > > > this case).
> > > >
> > > > Apologies if this is a very silly question.
> > > >
> > > > Thanks & regards,
> > > > Vinod
> > > >
> > >
> >
>
