Hello,
after switching from "org.apache.uima.examples" to the "opennlp.uima"
UIMA wrappers, I wanted to create an aggregate engine using a Sentence
Detector, Tokenizer and POS Tagger. In the OpenNLP 1.4.3 UIMA
wrappers, each component comes with its own descriptor file, but an
aggregate engine applying all three steps in sequence is not included.
I encountered the following behavior when trying to run the components
individually in the CAS Visual Debugger:
1) Tokenizer: loads, runs, generates Token annotations
(opennlp.uima.token)
2) PosTagger: loads, runs, generates no annotations.
I guess this is expected since tokenization was not done before.
3) SentenceDetector: loads, but does not run and reports the error
"opennlp.uima.util.OpenNlpAnnotatorProcessException: The
required parameter opennlp.uima.ContainerType can not be found!"
So the parameter "opennlp.uima.ContainerType" is missing, but I do
not know what value to set it to. With every value I tried, the
result was an empty annotation. I assumed the SentenceDetector would
run out of the box without dependencies, since it is the first step
in the pipeline.
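For illustration, setting the parameter in the SentenceDetector
descriptor would look roughly like the fragment below. This is only a
sketch: the value uima.tcas.DocumentAnnotation (UIMA's built-in
annotation covering the whole document) is an assumption and is not
confirmed to work with the 1.4.3 wrappers. The parameter also has to
be declared before a value can be assigned to it.

```xml
<!-- Sketch only: fragments inside analysisEngineMetaData of the
     SentenceDetector descriptor -->
<configurationParameters>
  <configurationParameter>
    <name>opennlp.uima.ContainerType</name>
    <type>String</type>
    <multiValued>false</multiValued>
    <mandatory>false</mandatory>
  </configurationParameter>
</configurationParameters>
<configurationParameterSettings>
  <nameValuePair>
    <name>opennlp.uima.ContainerType</name>
    <value>
      <!-- Assumed value: the annotation type in which sentences are searched -->
      <string>uima.tcas.DocumentAnnotation</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```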
Finally, I also tried to create an aggregate engine which applies the
Tokenizer and the PosTagger. I only got Token annotations, but no POS
tags. Either my aggregate engine descriptor (attached below) is badly
specified, or the PosTagger needs the sentence splitting step to run
before tokenization.
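For reference, a three-step flow with sentence detection first could
be sketched as below. This is only a sketch: the file name
SentenceDetector.xml and the delegate key "SentenceDetector" are
assumptions, and whether the PosTagger actually requires sentence
annotations is not confirmed.

```xml
<!-- Sketch: only the parts of the aggregate descriptor that change -->
<delegateAnalysisEngineSpecifiers>
  <!-- Assumed file name for the sentence detector descriptor -->
  <delegateAnalysisEngine key="SentenceDetector">
    <import location="SentenceDetector.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="Tokenizer">
    <import location="Tokenizer.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="PosTagger">
    <import location="PosTagger.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<!-- Inside analysisEngineMetaData: delegates run in the listed order -->
<flowConstraints>
  <fixedFlow>
    <node>SentenceDetector</node>
    <node>Tokenizer</node>
    <node>PosTagger</node>
  </fixedFlow>
</flowConstraints>
```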
I would highly appreciate any help!
Thanks.
Toby
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
AnalysisEngine (Tokenizer+PosTagger from opennlp.uima)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="PosTagger">
      <import location="PosTagger.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="Tokenizer">
      <import location="Tokenizer.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>OpenNLPAggregate</name>
    <description>Aggregate analysis engine that performs tokenization and POS tagging using OpenNLP.</description>
    <version>1.0</version>
    <vendor>The Apache Software Foundation</vendor>
    <configurationParameters searchStrategy="language_fallback"/>
    <configurationParameterSettings/>
    <flowConstraints>
      <fixedFlow>
        <node>Tokenizer</node>
        <node>PosTagger</node>
      </fixedFlow>
    </flowConstraints>
    <typePriorities/>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type allAnnotatorFeatures="true">uima.tcas.DocumentAnnotation</type>
          <type allAnnotatorFeatures="true">uima.tcas.Annotation</type>
        </outputs>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
</analysisEngineDescription>