Hi Andreas,

Your question should probably be directed at the OpenNLP folks. However, I am using OpenNLP with UIMA and can tell you that the OpenNLP SentenceDetector only works in "single view mode".
See the AE code at:

https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/sentdetect/AbstractSentenceDetector.java
https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/sentdetect/SentenceDetector.java

and the corresponding descriptor file:

https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/descriptors/SentenceDetector.xml

As far as the OpenNLP philosophy goes, you'd use a container type that determines which part of the SOFA is a title, subtitle, document, or any other content you are interested in sentence-segmenting, and only process text within that particular container type; the default is to process the entire content if no container type is set. See AbstractSentenceDetector#process(CAS).

If you really need to process multiple views, you could use multiple, aggregated SOFA/view mappings (see http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.specifying_cas_view_for_single_view ) or write a wrapper around the OpenNLP annotator that works with multiple views; something like this should work, too:

    public class MySentenceAnnotator extends SentenceDetector {

        @Override
        public void process(CAS cas) throws AnalysisEngineProcessException {
            super.process(cas.getView("view-a"));
            super.process(cas.getView("view-b"));
            // and so forth...
        }
    }

(Footnote: Naturally, you'd probably use an array init parameter for the views you wish to process and a loop instead of the hardcoded string constants; this is just to show the basic idea...)

On a side note, it seems a bit "fishy" to me that you are trying to split your SOFA into views depending on whether the relevant bit is a title, subtitle, body, or any other part of the SOFA.
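To flesh out that footnote: a minimal sketch of the parameterized variant, assuming a user-defined, multi-valued String configuration parameter (here called "ViewNames" -- my name, not an official OpenNLP parameter) declared in the wrapper's own descriptor:

```java
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceInitializationException;

import opennlp.uima.sentdetect.SentenceDetector;

public class MultiViewSentenceAnnotator extends SentenceDetector {

    private String[] viewNames;

    @Override
    public void initialize(UimaContext context) throws ResourceInitializationException {
        // let the OpenNLP annotator load its model etc. first
        super.initialize(context);
        // "ViewNames": a hypothetical multi-valued String parameter
        // declared in this wrapper's descriptor
        viewNames = (String[]) context.getConfigParameterValue("ViewNames");
    }

    @Override
    public void process(CAS cas) throws AnalysisEngineProcessException {
        // run sentence detection on each configured view in turn
        for (String name : viewNames) {
            super.process(cas.getView(name));
        }
    }
}
```

This is just a sketch on top of the UIMA/OpenNLP jars, of course; you'd still have to list the views in the descriptor and make sure each one exists in the CAS before calling getView().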
In this respect, I think the OpenNLP approach with a "container annotation type" feels more UIMA-like: Views should be *different* views of the *same* content (e.g., different languages, raw [byte] document content vs. plain text, etc.), and not the "same" view [types] of different content.

Hope this helps a bit!

Cheers,
Florian

On 16 Aug 2012, at 21:28, Andreas Niekler wrote:

> Hello,
>
> i wonder if it is possible to define multiple sofa's (views) in a UIMA
> Collection Reader and pass those differnt contents to the sentence annotator
> of the openNLP Tools. Will there be a sentence annotation for each sofa
> (view) or does openNLP UIMA automatically choose the first sofa in the data?
>
> How could i implement such a CAS case where i'm able to annotate title,
> document and subtitle (for example) seperately in one chain?
>
> Thank you
>
> Andreas

-- 
Florian Leitner, PhD <[email protected]>
Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)
Address: C/ Melchor Fernandez Almagro 3; E-28029 Madrid
Phone: +34 91 732 8000
Fax: +34 91 224 6980
Internet: http://www.cnio.es
