Hello, I am new to Mahout. I have set up Hadoop, Nutch, and Pig, and I am comfortable with Solr and Lucene. I am a PHP developer and have only tinkered with Java code.
I have 2 million job postings and need to build a categorization system, and I figured Mahout should do the trick. So I set up the 20 Newsgroups example and ran it. Now I am trying to figure out how Mahout fits into our job-posting-to-Solr chain.
Currently a job posting goes into a queue to be processed into a Solr document. We have a number of worker processes that enrich the document, for example by calling Google to get a latitude/longitude for the job posting location. I figure Mahout would be one of these worker steps.
What are my options for accessing Mahout from PHP? A web service? A bash call? I would like a system where I post it a chunk of text and it returns a list of suggested categories, since a job posting can belong to more than one category.
Any pointers in the right direction would be appreciated.
Dan
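P.S. To make the "post text, get back a list of categories" idea concrete, here is a rough sketch of the worker step I have in mind. It is written in Python only to show the shape of it. Everything in it is an assumption for illustration: the `description` and `categories` field names, the threshold, and the `fake_scorer` stand-in, which is not Mahout and only marks where a call to whatever Mahout exposes (web service, command line, etc.) would go.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps text to {category: score}.
# A real version would wrap a call out to Mahout; the transport is undecided.
Scorer = Callable[[str], Dict[str, float]]


def suggest_categories(text: str, scorer: Scorer,
                       threshold: float = 0.5,
                       max_categories: int = 3) -> List[str]:
    """Return every category scoring at or above the threshold, best first.

    More than one category can come back, because a posting may belong
    to several. The threshold and cap are made-up defaults.
    """
    scores = scorer(text)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold][:max_categories]


def categorize_step(doc: dict, scorer: Scorer) -> dict:
    """One enrichment step in the queue: add a 'categories' field to a copy."""
    enriched = dict(doc)
    enriched["categories"] = suggest_categories(doc.get("description", ""), scorer)
    return enriched


def fake_scorer(text: str) -> Dict[str, float]:
    """Stand-in scorer using keyword checks, only so the sketch runs."""
    lowered = text.lower()
    return {
        "engineering": 0.9 if "developer" in lowered else 0.1,
        "web": 0.7 if "php" in lowered else 0.2,
        "sales": 0.05,
    }


if __name__ == "__main__":
    posting = {"title": "PHP developer",
               "description": "Senior PHP developer wanted"}
    print(categorize_step(posting, fake_scorer))
```

The part I do not know how to fill in is the scorer: that is where the PHP-to-Mahout call would live, and it is the piece I am asking about.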
