Hi, 

Once the classifier is trained, what are the standard approaches for the
"and then apply it to documents" step?


dan



--- On Tue, 9/20/11, Ken Krugler <[email protected]> wrote:

From: Ken Krugler <[email protected]>
Subject: Re: new to mahout and need direction.
To: [email protected]
Date: Tuesday, September 20, 2011, 7:52 PM

Hi Dan,

I don't think you really need Mahout for the actual processing pipeline.

If I understand the issue correctly, you're trying to come up with potential 
categories for job postings that are flowing through your system.

So that feels more like a typical setup where you train a classifier (offline) 
and then apply it to documents via whatever mechanism fits best with your 
current workflow.

Choosing which classifier to use, extracting features for 
training/classification, etc. is where Mahout could be useful.
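
A minimal sketch of that offline/online split, assuming a toy one-vs-rest 
naive Bayes model in plain Python rather than Mahout itself. The names 
train, suggest and TRAINING_DOCS, and the sample postings, are hypothetical 
and only illustrate the idea: train once offline, then have a worker call 
suggest() on each job posting to get zero or more candidate categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Offline step: collect word counts from (text, [categories]) pairs."""
    model = {
        "n_docs": 0,
        "doc_counts": Counter(),              # category -> number of docs
        "word_counts": defaultdict(Counter),  # category -> word -> count
        "total_counts": Counter(),            # word -> count over all docs
    }
    for text, categories in labeled_docs:
        tokens = tokenize(text)
        model["n_docs"] += 1
        model["total_counts"].update(tokens)
        for category in set(categories):
            model["doc_counts"][category] += 1
            model["word_counts"][category].update(tokens)
    return model


def suggest(model, text, threshold=0.0):
    """Online step: return categories whose log-odds exceed the threshold."""
    vocab_size = len(model["total_counts"])
    total_tokens = sum(model["total_counts"].values())
    # Words never seen in training carry no evidence, so they are skipped.
    tokens = [t for t in tokenize(text) if t in model["total_counts"]]
    scored = []
    for category, n_in in model["doc_counts"].items():
        in_counts = model["word_counts"][category]
        in_tokens = sum(in_counts.values())
        out_tokens = total_tokens - in_tokens
        n_out = model["n_docs"] - n_in
        # Smoothed prior odds of "in category" versus "not in category".
        score = math.log((n_in + 1) / (n_out + 1))
        for token in tokens:
            # Laplace-smoothed word likelihoods inside and outside the category.
            p_in = (in_counts[token] + 1) / (in_tokens + vocab_size)
            out_count = model["total_counts"][token] - in_counts[token]
            p_out = (out_count + 1) / (out_tokens + vocab_size)
            score += math.log(p_in / p_out)
        if score > threshold:
            scored.append((score, category))
    # Best-scoring categories first.
    return [category for score, category in sorted(scored, reverse=True)]


# Hypothetical training data; a posting may carry several categories.
TRAINING_DOCS = [
    ("senior php developer building web applications with mysql", ["software"]),
    ("java engineer for backend web services", ["software"]),
    ("registered nurse for hospital night shift", ["healthcare"]),
    ("medical assistant for busy clinic", ["healthcare"]),
    ("developer for hospital patient records software", ["software", "healthcare"]),
    ("sales manager for regional accounts", ["sales"]),
]

if __name__ == "__main__":
    model = train(TRAINING_DOCS)
    print(suggest(model, "php developer for hospital software"))
```

Because each category is scored independently against "everything else", a 
posting can come back with several categories or none. Raising the threshold 
argument trades fewer suggestions for higher confidence.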

-- Ken

On Sep 20, 2011, at 7:01pm, Dan wrote:

> Hello,
> 
> I am new to using Mahout. I have set up Hadoop, Nutch and Pig, and I feel I 
> am very knowledgeable about Solr and fully understand Lucene. I am a PHP 
> developer and have only tinkered with Java code.
> 
> I have 2 million jobs and I need to build a categorization system, and I 
> figured Mahout should do the trick. So I set up the 20newsgroup example and 
> ran it. I am trying to figure out how Mahout will fit into the 
> job-posting-into-Solr chain.
> 
> Currently a job posting goes into a queue to be processed into a Solr 
> document. We currently have a bunch of processes that add to the 
> document, like calling Google to get a latitude/longitude based on the job 
> posting location, etc. I figure Mahout would be in one of these worker queues.
> 
> What are my options for accessing Mahout from PHP? A web service? Bash? I 
> would like a system where I post it a chunk of text and it returns a list of 
> suggested categories, since a job posting could belong to multiple categories.
> 
> 
> Any pointers in the right direction would be appreciated.
> 
> dan
> 
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr


