Hello,

After switching from "org.apache.uima.examples" to the "opennlp.uima" UIMA wrappers, I wanted to create an aggregate engine using a Sentence Detector, Tokenizer and POS Tagger. The OpenNLP 1.4.3 UIMA wrappers ship a descriptor file for each component, but no aggregate engine that applies all 3 steps in sequence. When I tried to run the components individually in the CAS Visual Debugger, I saw the following behavior:

  1) Tokenizer: loads, runs, generates Token annotations (opennlp.uima.token)

  2) PosTagger: loads, runs, generates no annotation

      I guess this is expected since tokenization was not done before.

  3) SentenceDetector: loads, does not run, and generates this error:

"opennlp.uima.util.OpenNlpAnnotatorProcessException: The required parameter opennlp.uima.ContainerType can not be found!"

The message points to the missing parameter "opennlp.uima.ContainerType", but I do not know what value to set it to. Whatever value I set, the result is an empty annotation. I assumed the SentenceDetector would run out of the box without dependencies, since it is the first step in the pipeline.
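
For illustration, a string parameter such as this one would typically be set in the settings block of the component's descriptor, roughly as in the fragment below. This is only a sketch: the value uima.tcas.DocumentAnnotation is an assumed example of a container type, not a confirmed fix, and the parameter would also have to be declared in the descriptor's configurationParameters section.

```xml
<!-- Sketch only: fragment of a primitive descriptor's settings block.
     The value below is an assumed example, not a confirmed fix. -->
<configurationParameterSettings>
  <nameValuePair>
    <name>opennlp.uima.ContainerType</name>
    <value>
      <string>uima.tcas.DocumentAnnotation</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```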

Finally, I also tried to create an aggregate engine that applies the Tokenizer and then the PosTagger. I got only Token annotations and no POS tags. Either my aggregate engine descriptor (attached below) is badly specified, or the PosTagger needs the sentence-splitting step to run before tokenization.

I would highly appreciate any help!

Thanks.

Toby



- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  AnalysisEngine (Tokenizer+PosTagger from opennlp.uima)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>false</primitive>
<delegateAnalysisEngineSpecifiers>
<delegateAnalysisEngine key="PosTagger">
<import location="PosTagger.xml"/>
</delegateAnalysisEngine>
<delegateAnalysisEngine key="Tokenizer">
<import location="Tokenizer.xml"/>
</delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>
<analysisEngineMetaData>
<name>OpenNLPAggregate</name>
<description>Aggregate analysis engine that performs tokenization and POS tagging using OpenNLP.</description>
<version>1.0</version>
<vendor>The Apache Software Foundation</vendor>
<configurationParameters searchStrategy="language_fallback"/>
<configurationParameterSettings/>
<flowConstraints>
<fixedFlow>
<node>Tokenizer</node>
<node>PosTagger</node>
</fixedFlow>
</flowConstraints>
<typePriorities/>
<fsIndexCollection/>
<capabilities>
<capability>
<inputs/>
<outputs>
<type allAnnotatorFeatures="true">uima.tcas.DocumentAnnotation</type>
<type allAnnotatorFeatures="true">uima.tcas.Annotation</type>
</outputs>
<languagesSupported/>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
<resourceManagerConfiguration/>
</analysisEngineDescription>
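
For comparison, the three-step aggregate described at the top (sentence detection, then tokenization, then POS tagging) might be laid out as in the sketch below. It assumes a descriptor named SentenceDetector.xml sits next to Tokenizer.xml and PosTagger.xml; that file name, the delegate key and the engine name are placeholders. The sketch does not address the ContainerType error, so the SentenceDetector would still need to be configured correctly before this runs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: assumes SentenceDetector.xml exists alongside the other
     two descriptors; key and engine names are placeholders. -->
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>
  <delegateAnalysisEngineSpecifiers>
    <delegateAnalysisEngine key="SentenceDetector">
      <import location="SentenceDetector.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="Tokenizer">
      <import location="Tokenizer.xml"/>
    </delegateAnalysisEngine>
    <delegateAnalysisEngine key="PosTagger">
      <import location="PosTagger.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>
  <analysisEngineMetaData>
    <name>OpenNLPThreeStepAggregate</name>
    <description>Sketch: sentence detection, tokenization and POS tagging in sequence.</description>
    <version>1.0</version>
    <configurationParameters searchStrategy="language_fallback"/>
    <configurationParameterSettings/>
    <flowConstraints>
      <!-- Delegates run in the order listed here, not in the order
           they are declared above. -->
      <fixedFlow>
        <node>SentenceDetector</node>
        <node>Tokenizer</node>
        <node>PosTagger</node>
      </fixedFlow>
    </flowConstraints>
    <capabilities>
      <capability>
        <inputs/>
        <outputs>
          <type allAnnotatorFeatures="true">uima.tcas.Annotation</type>
        </outputs>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
</analysisEngineDescription>
```

The fixedFlow element decides the execution order, so each later step sees the annotations produced by the steps before it.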


