FWIW, there will be more classification capabilities coming in the next several months.
-Jason

On Mon, Apr 23, 2012 at 5:12 PM, Jörn Kottmann <[email protected]> wrote:

> OpenNLP uses either a Maxent or a Perceptron classifier
> to classify a piece of text. This can give you back the probabilities
> for the various categories, but it's not designed to tell you how
> much each topic is represented in your input document.
>
> You could take a document, assume each paragraph has one topic,
> and then classify it paragraph by paragraph.
> We sadly don't have support for topic models, such as LDA.
>
> All the training logs are still written to the console. We have plans
> to properly capture them and report training process back via an
> API. This output should then be logged and maybe just stored inside
> the model for later debugging.
>
> Jörn
>
>
> On 04/23/2012 07:41 PM, Alex Kudlick wrote:
>
>> Hi,
>>
>> I've just started using OpenNLP for a project to classify scientific
>> articles into subjects. I have a few questions:
>>
>> 1. How do I configure logging for the model? I'm using slf4j-log4j for the
>> rest of my application, but the training output from the model just goes
>> to stdout.
>>
>> 2. Is there any support for classifying documents with multiple classes?
>> For instance, a given article may be classified as Computational Biology,
>> Cell Biology, and Molecular Biology.
>>
>> Thanks,
>>
>> Alex Kudlick
>>
>>
>
--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
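
For what it's worth, here is a minimal sketch of the paragraph-by-paragraph idea from Jörn's reply. It is not OpenNLP code. The class `ParagraphTopics`, its methods, and the keyword-based `toyClassifier` are all hypothetical names made up for illustration. The classifier is passed in as a plain function from text to category probabilities, on the assumption that a trained OpenNLP categorizer could be wrapped behind it (that wiring is not shown). Splitting paragraphs on blank lines is also an assumption about the input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ParagraphTopics {

    /** Splits a document on blank lines and returns the non-empty paragraphs. */
    public static List<String> paragraphs(String document) {
        List<String> result = new ArrayList<>();
        for (String chunk : document.split("\\n\\s*\\n")) {
            String trimmed = chunk.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /**
     * Classifies each paragraph, keeps its most probable category, and
     * returns the fraction of paragraphs assigned to each category.
     */
    public static Map<String, Double> topicShares(
            String document, Function<String, Map<String, Double>> classifier) {
        List<String> paras = paragraphs(document);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String para : paras) {
            String best = null;
            double bestProb = Double.NEGATIVE_INFINITY;
            for (Map.Entry<String, Double> e : classifier.apply(para).entrySet()) {
                if (e.getValue() > bestProb) {
                    best = e.getKey();
                    bestProb = e.getValue();
                }
            }
            if (best != null) {
                counts.merge(best, 1, Integer::sum);
            }
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), e.getValue() / (double) paras.size());
        }
        return shares;
    }

    /** Returns every category whose share of paragraphs is at least the threshold. */
    public static List<String> labelsAbove(Map<String, Double> shares, double threshold) {
        List<String> labels = new ArrayList<>();
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            if (e.getValue() >= threshold) {
                labels.add(e.getKey());
            }
        }
        return labels;
    }

    /** Toy stand-in for a trained model (hypothetical): a single keyword check. */
    public static Map<String, Double> toyClassifier(String text) {
        boolean computational = text.toLowerCase().contains("algorithm");
        Map<String, Double> probs = new HashMap<>();
        probs.put("Computational Biology", computational ? 0.9 : 0.2);
        probs.put("Cell Biology", computational ? 0.1 : 0.8);
        return probs;
    }

    public static void main(String[] args) {
        String doc = "Cells divide.\n\nAn algorithm aligns reads.\n\n"
                + "Cell membranes matter.\n\nThe cell cycle.";
        Map<String, Double> shares = topicShares(doc, ParagraphTopics::toyClassifier);
        System.out.println(shares);
        System.out.println(labelsAbove(shares, 0.2));
    }
}
```

The shares are only a rough proxy for how much of the document each subject covers, since every paragraph is forced into a single category. The threshold passed to `labelsAbove` is a knob you would have to tune on your own data.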
