On Sun, Nov 13, 2011 at 7:47 PM, Sam Cunningham <[email protected]> wrote:

> I have a database of documents. In other words, each tuple contains a
> document that needs to be classified. Does Mahout API provide such
> capability that I connect to DB, get the document, classify and write the
> label back to database?
>

Yes.  You can do this, particularly with the SGD classifiers.  The API for
the Naive Bayes classifiers is a bit more complex, but it also supports
this scenario.


> I am aware I can connect to DB separately, loop through tuples, convert each
> tuple to a document, then use Mahout API to classify, and write back to the
> database, at the end. Is this the way to go?
>

More or less, yes.  Document encoding is inherently application specific.
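
As a rough illustration of that loop, here is a minimal sketch using plain
JDBC. Everything named here is an assumption for illustration: the table
`documents` with columns `id`, `body` and `label`, the `DocumentClassifier`
interface (a hypothetical stand-in for whatever trained model you wrap, such
as one of the SGD classifiers), and the hashed bag-of-words `encode` method,
which is not Mahout's own encoder.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Locale;

class DbClassifyLoop {

    /** Hypothetical stand-in for a trained model; not a Mahout interface. */
    interface DocumentClassifier {
        String classify(double[] features);
    }

    /** Application-specific encoding: a simple hashed bag of words. */
    static double[] encode(String text, int dims) {
        double[] v = new double[dims];
        if (text == null) {
            return v; // treat a NULL column as an empty document
        }
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                v[Math.floorMod(token.hashCode(), dims)] += 1.0;
            }
        }
        return v;
    }

    /** Reads unlabeled rows, classifies each, and writes labels back in batches. */
    static int labelAll(Connection conn, DocumentClassifier model, int dims)
            throws SQLException {
        int count = 0;
        // Table and column names are assumed; some drivers may need a second
        // connection for the updates while the result set is still open.
        try (Statement read = conn.createStatement();
             ResultSet rows = read.executeQuery(
                     "SELECT id, body FROM documents WHERE label IS NULL");
             PreparedStatement write = conn.prepareStatement(
                     "UPDATE documents SET label = ? WHERE id = ?")) {
            while (rows.next()) {
                String label = model.classify(encode(rows.getString("body"), dims));
                write.setString(1, label);
                write.setLong(2, rows.getLong("id"));
                write.addBatch();
                if (++count % 500 == 0) {
                    write.executeBatch(); // flush every 500 rows
                }
            }
            write.executeBatch(); // flush the remainder
        }
        return count;
    }
}
```

Batching the updates keeps the number of round trips to the database down;
the batch size of 500 is arbitrary and worth tuning for your setup.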

You can also parallelize this for higher throughput, but watch out: a large
number of parallel tasks can easily overwhelm your database.
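
One way to keep the parallelism in check is a fixed-size thread pool, so the
number of tasks touching the database at once never exceeds a limit you
choose. The sketch below is only an illustration under that assumption: the
class name, the `classify` function and the `maxWorkers` parameter are all
hypothetical, and in a real job each task would also do its own read and
write against the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class BoundedParallelLabeler {

    /**
     * Classifies every document using at most maxWorkers threads.
     * The returned labels are in the same order as the input documents.
     */
    static List<String> labelAll(List<String> docs,
                                 Function<String, String> classify,
                                 int maxWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkers);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> classify.apply(doc);
                pending.add(pool.submit(task));
            }
            List<String> labels = new ArrayList<>();
            for (Future<String> f : pending) {
                labels.add(f.get()); // blocks until that task has finished
            }
            return labels;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Starting with a small `maxWorkers` and raising it while watching database
load is usually safer than starting high.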



> To be more specific, does BayesFileFormatter in Mahout API come with
> readerToDatabase method? or is there a way to use readerToDocument method
> along with a database tuple instead of Files.newReader()?
>

No.


> What is the best practice to connect and read/write from/to DB from Mahout
> classifier?
>

I think you described it right off the bat.
