Sure Suneel. Thanks.

On Thu, Dec 8, 2011 at 8:00 PM, Suneel Marthi <[email protected]> wrote:
> Would ModelSerializer class in Mahout be what you are looking for? I had
> used it to persist trained models for SGD classifiers, you may want to look
> into it.
>
> ________________________________
> From: Vinod <[email protected]>
> To: [email protected]
> Sent: Thursday, December 8, 2011 8:46 AM
> Subject: Re: Persisting trained models in Mahout
>
> I'll use the first example from Chapter 2 of your book to clarify what I
> mean by training:-
>
> Following code trains the recommender:-
> DataModel model = new FileDataModel(new File("intro.csv"));
>
> UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
> UserNeighborhood neighborhood =
>     new NearestNUserNeighborhood(2, similarity, model);
>
> Recommender recommender = new GenericUserBasedRecommender(
>     model, neighborhood, similarity);
>
> At this point, recommender is trained on preferences of users 1 to 5 in
> intro.csv.
>
> We should now be able to serialize() this recommender instance into a file,
> say "Movie Recommender.model" using steps mentioned here (
> http://java.sun.com/developer/technicalArticles/Programming/serialization/
> )
>
> All we need to do now is deploy "Movie Recommender.model" to production.
>
> If I understand the behavior correctly, this model should now be able to
> predict recommendations for a new user.
>
> As an example, let's assume that production has a different user base. If
> recommender instance is loaded from "Movie Recommender.model" file and
> asked to provide recommendations for user '7' who has rated 101 and 102 as
> 4 and 3 respectively, it should be able to predict recommendations for 7.
> Right?
>
> regards,
> Vinod
>
> On Thu, Dec 8, 2011 at 6:49 PM, Sean Owen <[email protected]> wrote:
>
> > Yes, I mean you need to write it and read it in your own code.
> >
> > What do you mean by training a model? Computing similarities? I don't
> > know if there's such a thing here as "training" on one data set and
> > running on another. The implementations always use all currently
> > available info. Is this a cold-start issue?
> >
> > OutOfMemoryError is nothing to do with this; on such a small data set it
> > indicates you didn't set your JVM heap size above the default.
> >
> > On Thu, Dec 8, 2011 at 1:02 PM, Vinod <[email protected]> wrote:
> >
> > > Hi Sean,
> > >
> > > Neither Recommender nor any of its parent interfaces extends
> > > Serializable, so there is no way that I'd be able to serialize it.
> > >
> > > I agree that the implementations may not have startup overhead.
> > > However, training a model on millions of rows is a CPU, memory & time
> > > consuming activity. For example, when data set is changed from 100K to
> > > 1M in chapter 4, program crashes with OutOfMemory after significant
> > > amount of time.
> > >
> > > I feel that training should be done in development only. Once a
> > > developer is ok with test results, he should be able to save instance
> > > of the trained and tested model (for ex:- recommender or classifier).
> > >
> > > Only these saved instances of trained and tested models should be
> > > deployed to production.
> > >
> > > Thoughts?
> > >
> > > regards,
> > > Vinod
> > >
> > > On Thu, Dec 8, 2011 at 6:00 PM, Sean Owen <[email protected]> wrote:
> > >
> > > > Ah right. No, there's still not a provision for this. You would just
> > > > have to serialize it yourself if you like.
> > > > Most of the implementations don't have a great deal of startup
> > > > overhead, so don't really need this. The exception is perhaps
> > > > slope-one, but there you can actually save and supply pre-computed
> > > > diffs.
> > > > Still it would be valid to store and re-supply user-user similarities
> > > > or something. You can do this, manually, by querying for user-user
> > > > similarities, saving them, then loading them and supplying them via
> > > > GenericUserSimilarity for instance.
> > > >
> > > > On Thu, Dec 8, 2011 at 12:27 PM, Vinod <[email protected]> wrote:
> > > >
> > > > > Hi Sean,
> > > > >
> > > > > Thanks for the quick response.
> > > > >
> > > > > By model, I am not referring to data model but to a "trained"
> > > > > recommender instance.
> > > > >
> > > > > Weka, for example, has the ability to save and load models:-
> > > > > http://weka.wikispaces.com/Serialization
> > > > > http://weka.wikispaces.com/Saving+and+loading+models
> > > > >
> > > > > This avoids the need to train model (recommender) every time a
> > > > > server is bounced or program is restarted.
> > > > >
> > > > > regards,
> > > > > Vinod
> > > > >
> > > > > On Thu, Dec 8, 2011 at 5:43 PM, Sean Owen <[email protected]> wrote:
> > > > >
> > > > > > The classes aren't Serializable, no. In the case of DataModel,
> > > > > > it's assumed that you already have some persisted model
> > > > > > somewhere, in a DB or file or something, so this would be
> > > > > > redundant.
> > > > > >
> > > > > > On Thu, Dec 8, 2011 at 12:07 PM, Vinod <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > This is my first day of experimentation with Mahout. I am
> > > > > > > following "Mahout in Action" book and looking at the sample
> > > > > > > code provided, it seems that models, for ex:- recommender,
> > > > > > > need to be trained at the start of the program
> > > > > > > (start/restart). Recommender interface extends Refreshable
> > > > > > > which doesn't extend Serializable. So, I am wondering if
> > > > > > > Mahout provides an alternate mechanism to persist trained
> > > > > > > models (recommender instance in this case).
> > > > > > >
> > > > > > > Apologies if this is a very silly question.
> > > > > > >
> > > > > > > Thanks & regards,
> > > > > > > Vinod
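
For anyone finding this thread later: Sean's suggestion (query the user-user similarities once, save them, then load them on startup) can be sketched in plain Java. This is only a minimal sketch under assumptions, not Mahout code. The class SimilarityCache, its method names, the CSV layout and the sample ratings are all hypothetical, and the Pearson computation is a simple stand-in for whatever similarity is actually in use. In Mahout the loaded values would be supplied via GenericUserSimilarity, as Sean notes; that wiring is left out so the example runs with the JDK alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: precompute user-user similarities, save them, reload them.
class SimilarityCache {

    // Pearson correlation over the items both users have rated.
    // Returns NaN if fewer than two items overlap or either side has zero variance.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) continue; // item not rated by both users
            double x = e.getValue(), y = other;
            n++;
            sumA += x; sumB += y;
            sumAA += x * x; sumBB += y * y; sumAB += x * y;
        }
        if (n < 2) return Double.NaN;
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA <= 0 || varB <= 0) return Double.NaN;
        return cov / Math.sqrt(varA * varB);
    }

    // Key for an unordered user pair, smaller ID first, e.g. "1,3".
    static String key(long u, long v) {
        return Math.min(u, v) + "," + Math.max(u, v);
    }

    // The expensive step: one similarity per user pair (undefined pairs are skipped).
    static Map<String, Double> computeAll(Map<Long, Map<Long, Double>> prefs) {
        Map<String, Double> sims = new TreeMap<>();
        List<Long> users = new ArrayList<>(prefs.keySet());
        for (int i = 0; i < users.size(); i++) {
            for (int j = i + 1; j < users.size(); j++) {
                long u = users.get(i), v = users.get(j);
                double s = pearson(prefs.get(u), prefs.get(v));
                if (!Double.isNaN(s)) sims.put(key(u, v), s);
            }
        }
        return sims;
    }

    // Write one "userA,userB,similarity" line per pair.
    static void save(Map<String, Double> sims, Path file) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Double> e : sims.entrySet()) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        try {
            Files.write(file, lines);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Read the file back into the same map shape that computeAll produces.
    static Map<String, Double> load(Path file) {
        Map<String, Double> sims = new TreeMap<>();
        try {
            for (String line : Files.readAllLines(file)) {
                if (line.isEmpty()) continue;
                int cut = line.lastIndexOf(',');
                sims.put(line.substring(0, cut),
                         Double.parseDouble(line.substring(cut + 1)));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sims;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical ratings: userID -> (itemID -> rating).
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        prefs.put(1L, Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5));
        prefs.put(2L, Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0));
        prefs.put(3L, Map.of(101L, 4.0, 102L, 3.0));

        Path file = Files.createTempFile("user-similarities", ".csv");
        save(computeAll(prefs), file);           // done once, e.g. offline
        Map<String, Double> loaded = load(file); // done at every startup
        System.out.println(loaded);
    }
}
```

The saved file can be shipped with the application, so a restart only pays for reading it back instead of recomputing every pair. Note that this caches similarities only; as Sean says, the recommender still needs the current preference data to produce recommendations.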
