Hi Andreas,

Your question should probably be directed to the OpenNLP guys, I think. 
However, I am using OpenNLP with UIMA and can tell you that the OpenNLP 
SentenceDetector only works in "single view mode".

See the AE code at:

https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/sentdetect/AbstractSentenceDetector.java
https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/src/main/java/opennlp/uima/sentdetect/SentenceDetector.java

and the corresponding descriptor file:

https://svn.apache.org/repos/asf/opennlp/trunk/opennlp-uima/descriptors/SentenceDetector.xml

As far as the OpenNLP philosophy goes, you'd use a container type that 
determines which part of the SOFA is a title, subtitle, document, or any other 
content you are interested in sentence-segmenting, and only process the text 
within that particular container type; by default, the entire content is 
processed if no container type is set. See AbstractSentenceDetector#process(CAS)
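
In the descriptor, that container type is set via a configuration parameter; 
a minimal fragment might look like this (the parameter name 
opennlp.uima.ContainerType is the one used in the OpenNLP UIMA descriptors 
linked above, while org.example.Title is just a made-up placeholder for a 
type from your own type system):

```xml
<!-- Fragment of the AE descriptor's configurationParameterSettings.
     org.example.Title is a placeholder for your own annotation type. -->
<nameValuePair>
  <name>opennlp.uima.ContainerType</name>
  <value>
    <string>org.example.Title</string>
  </value>
</nameValuePair>
```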

If you really need to process multiple views, you could use multiple 
aggregate SOFA/view mappings (see

http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.mvs.specifying_cas_view_for_single_view

) or write a wrapper around the OpenNLP annotator that works with multiple 
views; something like this should work, too:

import opennlp.uima.sentdetect.SentenceDetector;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;

public class MySentenceAnnotator extends SentenceDetector {
        @Override
        public void process(CAS cas) throws AnalysisEngineProcessException {
                super.process(cas.getView("view-a"));
                super.process(cas.getView("view-b"));
                // ...and so on for any further views
        }
}
        
(Footnote: Naturally, you'd probably do this with an array init parameter 
listing the views you wish to process and a loop, instead of the hardcoded 
string constants; this is just to show the basic idea...)
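
To sketch that parameterized variant (note that "opennlp.uima.ViewNames" is a 
hypothetical parameter name I made up for this example, not something the 
OpenNLP UIMA package defines; you'd declare it as a multi-valued String 
parameter in your descriptor):

```java
// Sketch only: "opennlp.uima.ViewNames" is a made-up parameter name.
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.resource.ResourceInitializationException;

import opennlp.uima.sentdetect.SentenceDetector;

public class MultiViewSentenceAnnotator extends SentenceDetector {

        private String[] viewNames;

        @Override
        public void initialize(UimaContext context)
                        throws ResourceInitializationException {
                super.initialize(context);
                // Read the view names from a (hypothetical) multi-valued
                // String parameter declared in the AE descriptor.
                viewNames = (String[]) context
                                .getConfigParameterValue("opennlp.uima.ViewNames");
        }

        @Override
        public void process(CAS cas) throws AnalysisEngineProcessException {
                // Run the inherited sentence detection once per configured view.
                for (String name : viewNames) {
                        super.process(cas.getView(name));
                }
        }
}
```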

On a side note, it seems a bit "fishy" to me that you are trying to split your 
SOFA into views depending on whether the relevant bit is a title, subtitle, 
body, or any other part of the SOFA. In this respect, I think the OpenNLP 
approach with a "container annotation type" feels more UIMA-like: views should 
be *different* views of the *same* content (e.g., different languages, raw 
[byte] document content vs. plain text, etc.), and not the "same" view [types] 
of different content.

Hope this helps a bit!

Cheers,
Florian

On 16 Aug 2012, at 21:28, Andreas Niekler wrote:

> Hello,
> 
> I wonder if it is possible to define multiple SOFAs (views) in a UIMA 
> Collection Reader and pass those different contents to the sentence annotator 
> of the OpenNLP tools. Will there be a sentence annotation for each SOFA 
> (view), or does OpenNLP UIMA automatically choose the first SOFA in the data?
> 
> How could I implement such a CAS case where I'm able to annotate title, 
> document and subtitle (for example) separately in one chain?
> 
> Thank you
> 
> Andreas

-- 
Florian Leitner, PhD <[email protected]>

Structural Biology and BioComputing Programme
Spanish National Cancer Research Centre (CNIO)

Address: C/ Melchor Fernandez Almagro 3; E-28029 Madrid
Phone: +34 91 732 8000
Fax: +34 91 224 6980
Internet: http://www.cnio.es
