After training, save the model to a file. To apply the model,

- load the model from the file
- read a document
- tokenize and vectorize the document
- call model.classify() or some variant.

Done!

See the example server in chapter 16 of the Mahout book for an example. I
think that source is also available on my GitHub account (without the book,
of course).

On Wed, Sep 21, 2011 at 8:54 PM, Dan <[email protected]> wrote:

> Hi,
>
> Once the classifier is trained what are the standard approaches for:
>
> "and then apply it to documents"
>
> dan
>
> --- On Tue, 9/20/11, Ken Krugler <[email protected]> wrote:
>
> From: Ken Krugler <[email protected]>
> Subject: Re: new to mahout and need direction.
> To: [email protected]
> Date: Tuesday, September 20, 2011, 7:52 PM
>
> Hi Dan,
>
> I don't think you really need Mahout for the actual processing pipeline.
>
> If I understand the issue correctly, you're trying to come up with
> potential categories for job postings that are flowing through your system.
>
> So that feels more like a typical train-a-classifier (offline) and then
> apply it to documents via whatever mechanism fits best with your current
> workflow.
>
> Which classifier to use, extracting features for training/classification,
> etc is where Mahout could be useful.
>
> -- Ken
>
> On Sep 20, 2011, at 7:01pm, Dan wrote:
>
> > Hello,
> >
> > I am new to using mahout. I have setup hadoop, nutch, pig and I feel I am
> > very knowledgeable about solr and fully understand lucene. I am a php
> > developer and have only tinkered with java code.
> >
> > I have 2 million jobs and I need to build a categorization system I
> > figured mahout should do the trick. So I setup the 20newsgroup example ran
> > it. I am trying to figure out how mahout will fit into the
> > job-posting-into-solr chain.
> >
> > Currently a job posting will go into a queue to be processed into a solr
> > document. we currently have a bunch of processes that will add to the
> > document like calling google to get a latitude/longitude based on the job
> > posting location, etc. I figure mahout would be in one of these worker
> > queues.
> >
> > What are my options for accessing mahout from php? webservice.. bash? I
> > would like a system where I post it a chunk of text and it would return a
> > list of suggested categories since a job posting could belong to multiple
> > categories.
> >
> > Any pointers in the right direction would be appreciated
> >
> > dan
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
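
P.S. To make the four steps at the top of this reply concrete, here is a rough, self-contained Java sketch of the load / read / tokenize-and-vectorize / classify flow. It is not Mahout code. The class JobClassifierSketch, its plain-text model format (a category, a tab, then term:weight pairs) and the 0.6 threshold are all made up for illustration, and a real setup would substitute the trained Mahout model and whatever vectorizer was used during training. Each category is scored independently, so one posting can come back with several suggested categories.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a trained classifier. Nothing here is Mahout API.
final class JobClassifierSketch {
    // category -> (term -> weight): a linear model with one weight set per category
    private final Map<String, Map<String, Double>> weights = new LinkedHashMap<>();

    // Step 1: load the model. Assumed (made-up) format, one category per line:
    //   category<TAB>term:weight term:weight ...
    static JobClassifierSketch load(BufferedReader in) throws IOException {
        JobClassifierSketch model = new JobClassifierSketch();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;
            String[] parts = line.split("\t", 2);
            Map<String, Double> termWeights = new HashMap<>();
            for (String pair : parts[1].trim().split("\\s+")) {
                int colon = pair.lastIndexOf(':');
                termWeights.put(pair.substring(0, colon),
                        Double.parseDouble(pair.substring(colon + 1)));
            }
            model.weights.put(parts[0], termWeights);
        }
        return model;
    }

    // Step 3a: tokenize by lowercasing and splitting on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Step 3b: vectorize as a sparse term-frequency vector (term -> count).
    static Map<String, Integer> vectorize(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) tf.merge(t, 1, Integer::sum);
        return tf;
    }

    // Step 4: score every category independently and keep those whose logistic
    // score reaches the threshold, so a document may get more than one category.
    List<String> classify(Map<String, Integer> vector, double threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> category : weights.entrySet()) {
            double score = 0.0;
            for (Map.Entry<String, Integer> term : vector.entrySet()) {
                score += term.getValue()
                        * category.getValue().getOrDefault(term.getKey(), 0.0);
            }
            double probability = 1.0 / (1.0 + Math.exp(-score));
            if (probability >= threshold) result.add(category.getKey());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Inline toy model so the sketch runs without a file; a FileReader would
        // take the place of the StringReader when loading from disk.
        String modelText = "engineering\tjava:2.0 developer:1.5 nurse:-2.0\n"
                + "healthcare\tnurse:2.5 hospital:1.5 java:-1.0\n";
        JobClassifierSketch model = load(new BufferedReader(new StringReader(modelText)));
        // Step 2: read a document (hard-coded here).
        String document = "Senior Java developer for hospital software";
        System.out.println(model.classify(vectorize(tokenize(document)), 0.6));
    }
}
```

With the toy weights above, main prints [engineering, healthcare]. Putting the classify() call behind a small HTTP endpoint is one possible way to reach it from PHP, in the spirit of the example server mentioned at the top, though the details depend on your worker setup.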
