Here's one way, albeit an indirect one:

a. Index your DB into Solr, using the DataImportHandler:
   http://wiki.apache.org/solr/DataImportHandler#Usage_with_RDBMS
b. You will then have a Lucene index, which you can import into Mahout:
   https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html#CreatingVectorsfromText-FromLucene
c. Train your classifier in Mahout.
d. Run the classifier on the records you need to label, producing an output file in the format:
   <record id> <label>
e. Use a script to insert the results into the database.

I am a Mahout newbie, so there may be more efficient ways.

Cheers,
Yuval

On Mon, Nov 14, 2011 at 5:47 AM, Sam Cunningham <[email protected]> wrote:

> I have a database of documents; in other words, each tuple contains a
> document that needs to be classified. Does the Mahout API provide a way to
> connect to the DB, fetch each document, classify it, and write the label
> back to the database?
>
> I am aware that I can connect to the DB separately, loop through the tuples,
> convert each tuple to a document, use the Mahout API to classify it, and
> finally write the results back to the database. Is this the way to go?
>
> To be more specific, does BayesFileFormatter in the Mahout API come with a
> readerToDatabase method? Or is there a way to use the readerToDocument
> method with a database tuple instead of Files.newReader()?
>
> What is the best practice for reading from and writing to a DB from a
> Mahout classifier?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Classifying-documents-in-database-tp3505846p3505846.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
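Step (e) in the reply leaves the write-back script to the reader. Below is a minimal Java sketch of such a script. It assumes the classifier output holds one whitespace-separated `<record id> <label>` pair per line and that the database is reachable over JDBC. The class name `LabelWriter` and the table and column names (`documents`, `id`, `label`) are hypothetical placeholders, not part of Mahout or of the original setup. The code uses only the standard `java.sql` API, not any Mahout API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of step (e): read "<record id> <label>" lines from the classifier
 * output and write each label back to the database.
 * The table and column names below are hypothetical.
 */
final class LabelWriter {

    // Hypothetical schema: adjust table and column names to your database.
    static final String UPDATE_SQL = "UPDATE documents SET label = ? WHERE id = ?";

    /** Parses "<record id> <label>" lines, skipping blank lines. */
    static Map<String, String> parseLabels(BufferedReader reader) throws IOException {
        Map<String, String> labels = new LinkedHashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split on the first run of whitespace only, so a label may contain spaces.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length != 2) {
                throw new IOException("Malformed line: " + line);
            }
            labels.put(parts[0], parts[1]);
        }
        return labels;
    }

    /** Writes all labels as one JDBC batch inside a single transaction. */
    static int[] writeLabels(Connection conn, Map<String, String> labels) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement stmt = conn.prepareStatement(UPDATE_SQL)) {
            for (Map.Entry<String, String> e : labels.entrySet()) {
                stmt.setString(1, e.getValue());
                // Assumes ids are stored as text; use setLong for a numeric key column.
                stmt.setString(2, e.getKey());
                stmt.addBatch();
            }
            int[] counts = stmt.executeBatch();
            conn.commit();
            return counts;
        } catch (SQLException ex) {
            // Undo any partial updates so the table is not left half-labelled.
            conn.rollback();
            throw ex;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A caller would open a connection with `DriverManager.getConnection(url, user, password)` for whatever JDBC driver the database uses, read the classifier output with `Files.newBufferedReader(path)`, and pass the parsed map to `writeLabels`. Running everything in one transaction keeps a failed run from leaving some rows labelled and others not. For very large outputs you might instead execute and commit the batch every few thousand rows.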
