On Sun, Nov 13, 2011 at 7:47 PM, Sam Cunningham <[email protected]> wrote:

> I have a database of documents. In other words, each tuple contains a
> document that needs to be classified. Does Mahout API provide such
> capability that I connect to DB, get the document, classify and write the
> label back to database?
>

Yes. You can do this, particularly with the SGD classifiers. The API for
the Naive Bayes classifiers is a bit more complex, but it also supports
this scenario.

> I am aware I can connect to DB separately, loop through tuples, convert
> each tuple to a document, then use Mahout API to classify, and write back
> to the database, at the end. Is this the way to go?
>

More or less, yes. Document encoding is inherently application specific.

You can also parallelize this for higher throughput, but watch out: a large
number of parallel tasks can slam your database pretty easily.

> To be more specific, does BayesFileFormatter in Mahout API come with
> readerToDatabase method? or is there a way to use readerToDocument method
> along with a database tuple instead of Files.newReader()?
>

No.

> What is the best practice to connect and read/write from/to DB from Mahout
> classifier?
>

I think you described it right off the bat.
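
For what it's worth, the loop described above (connect, read each tuple, classify, write the label back) might look roughly like the sketch below. This is only an illustration under stated assumptions, not Mahout code: the table `documents` and its columns `id`, `body`, and `label` are made up, `DocumentClassifier` is a hypothetical interface standing in for a trained model plus your own document encoding, and `toyClassify` is a placeholder so the example is self-contained. It also buffers all ids and labels in memory before writing, which is fine for a modest table but would need paging for a large one.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

public class DbClassifySketch {

    /** Hypothetical stand-in for "encode the document and ask the model"; not a Mahout type. */
    public interface DocumentClassifier {
        String classify(String text);
    }

    /** Placeholder classifier so the sketch runs on its own; swap in a real model. */
    public static String toyClassify(String text) {
        return text.toLowerCase().contains("refund") ? "billing" : "other";
    }

    /**
     * Reads unlabeled rows, classifies each one, and writes the labels back in batches.
     * Table and column names (documents, id, body, label) are assumptions.
     * Returns the number of rows that were labeled.
     */
    public static int classifyTable(Connection conn, DocumentClassifier model, int batchSize)
            throws SQLException {
        // Phase 1: read and classify. Labels are buffered so the result set
        // is closed before any update runs on the same connection.
        Map<Long, String> labels = new LinkedHashMap<>();
        try (Statement select = conn.createStatement();
             ResultSet rows = select.executeQuery(
                     "SELECT id, body FROM documents WHERE label IS NULL")) {
            while (rows.next()) {
                labels.put(rows.getLong("id"), model.classify(rows.getString("body")));
            }
        }

        // Phase 2: write the labels back, flushing every batchSize rows
        // to keep the number of round trips to the database small.
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE documents SET label = ? WHERE id = ?")) {
            int pending = 0;
            for (Map.Entry<Long, String> entry : labels.entrySet()) {
                update.setString(1, entry.getValue());
                update.setLong(2, entry.getKey());
                update.addBatch();
                if (++pending == batchSize) {
                    update.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                update.executeBatch();
            }
        }
        return labels.size();
    }
}
```

You would call it as `DbClassifySketch.classifyTable(conn, DbClassifySketch::toyClassify, 500)` with a `Connection` from whatever JDBC driver you use. If you parallelize, running one such loop per worker over disjoint id ranges keeps the number of open connections bounded, which speaks to the database-load concern above.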
