After training, save the model to a file.

To apply the model:

- load the model from the file
- read a document
- tokenize and vectorize the document
- call model.classify() or some variant.

Done!
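
The steps above can be sketched roughly as follows. This is a minimal,
self-contained Python illustration and not Mahout's API: the JSON model file,
the per-label token weights, and the helper names (save_model, load_model,
tokenize, vectorize, classify) are all assumptions made for the example. A real
setup would load a trained Mahout model and use its own encoders instead.

```python
import json
import math
import re
from collections import Counter


def save_model(path, weights):
    """Write per-label token weights to a JSON file (stand-in for a trained model)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(weights, f)


def load_model(path):
    """Load the model from the file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tokenize(text):
    """Lower-case the document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens):
    """Turn tokens into a sparse bag-of-words vector (token -> count)."""
    return Counter(tokens)


def classify(model, vector):
    """Score every label and return (label, probability) pairs, best first."""
    scores = {
        label: sum(weights.get(tok, 0.0) * n for tok, n in vector.items())
        for label, weights in model.items()
    }
    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    ranked = [(label, e / total) for label, e in exps.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Usage would look something like
classify(load_model("model.json"), vectorize(tokenize(text))). Returning the
whole ranked list rather than a single label makes it easy to suggest several
categories for one job posting.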

See the example server in chapter 16 of the Mahout book.  I think that
source is also available on my GitHub account (without the book, of
course).

On Wed, Sep 21, 2011 at 8:54 PM, Dan <[email protected]> wrote:

> Hi,
>
> Once the classifier is trained what are the standard approaches for:
>
> "and then apply it to documents"
>
>
> dan
>
>
>
> --- On Tue, 9/20/11, Ken Krugler <[email protected]> wrote:
>
> From: Ken Krugler <[email protected]>
> Subject: Re: new to mahout and need direction.
> To: [email protected]
> Date: Tuesday, September 20, 2011, 7:52 PM
>
> Hi Dan,
>
> I don't think you really need Mahout for the actual processing pipeline.
>
> If I understand the issue correctly, you're trying to come up with
> potential categories for job postings that are flowing through your system.
>
> So that feels more like a typical train-a-classifier (offline) and then
> apply it to documents via whatever mechanism fits best with your current
> workflow.
>
> Which classifier to use, how to extract features for training/classification,
> etc. is where Mahout could be useful.
>
> -- Ken
>
> On Sep 20, 2011, at 7:01pm, Dan wrote:
>
> > Hello,
> >
> > I am new to using Mahout. I have set up Hadoop, Nutch, and Pig, and I
> > feel I am very knowledgeable about Solr and fully understand Lucene. I
> > am a PHP developer and have only tinkered with Java code.
> >
> > I have 2 million jobs and I need to build a categorization system. I
> > figured Mahout should do the trick, so I set up the 20newsgroup example
> > and ran it. I am trying to figure out how Mahout will fit into the
> > job-posting-into-solr chain.
> >
> > Currently a job posting goes into a queue to be processed into a Solr
> > document. We have a bunch of processes that add to the document, such as
> > calling Google to get a latitude/longitude based on the job posting
> > location. I figure Mahout would be in one of these worker queues.
> >
> > What are my options for accessing Mahout from PHP? A web service? Bash? I
> > would like a system where I post it a chunk of text and it returns a
> > list of suggested categories, since a job posting could belong to
> > multiple categories.
> >
> >
> > Any pointers in the right direction would be appreciated
> >
> > dan
> >
> >
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
>
>
>
>
