descriptor files in opennlp.uima wrappers (e.g. SentenceDetector)

Tobias Wunner Tue, 28 Apr 2009 06:53:00 -0700

Hello,

After switching from the "org.apache.uima.examples" to the "opennlp.uima" UIMA wrappers, I wanted to create an aggregate engine using a SentenceDetector, Tokenizer and POS Tagger. In the OpenNLP 1.4.3 UIMA wrappers, each component comes with a descriptor file, but an aggregate engine applying all three steps in sequence is not included. I encountered the following behavior when trying to run the components in the CAS Visual Debugger:

  1) Tokenizer: loads, runs, generates a Token annotation (opennlp.uima.token)

  2) PosTagger: loads, runs, generates no annotation

      I guess this is expected since tokenization was not done before.

  3) SentenceDetector: loads, does not run, and generates the error

      "opennlp.uima.util.OpenNlpAnnotatorProcessException: The required parameter opennlp.uima.ContainerType can not be found!"

This indicates that the parameter "opennlp.uima.ContainerType" is missing, but I do not know what value to set it to. Setting it to an arbitrary value returns an empty annotation. I assumed the SentenceDetector should run out of the box without dependencies, since it is the first step in the pipeline.
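
For reference, a UIMA string parameter is normally declared and then given a value inside the analysisEngineMetaData element of the primitive descriptor. The fragment below is only a sketch of that syntax: the value uima.tcas.DocumentAnnotation (the built-in annotation spanning the whole document) is an assumption and has not been confirmed to be what the 1.4.3 SentenceDetector wrapper expects.

```xml
<!-- Sketch only: goes inside <analysisEngineMetaData> of the SentenceDetector descriptor. -->
<configurationParameters>
  <configurationParameter>
    <name>opennlp.uima.ContainerType</name>
    <type>String</type>
    <multiValued>false</multiValued>
    <mandatory>false</mandatory>
  </configurationParameter>
</configurationParameters>
<configurationParameterSettings>
  <nameValuePair>
    <name>opennlp.uima.ContainerType</name>
    <value>
      <!-- Assumed value: the annotation type whose span is searched for sentences. -->
      <string>uima.tcas.DocumentAnnotation</string>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```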

Finally, I also tried to create an aggregate engine which applies the Tokenizer and the PosTagger. I only got a Token annotation, no POS tags. Either my aggregate engine descriptor (attached below) is badly specified, or it needs the sentence splitting step before tokenization.
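
If sentence splitting does have to come first, the aggregate would need a third delegate and a three-node flow. The fragment below is a sketch of that variant, not a tested descriptor: the file name SentenceDetector.xml and the key "SentenceDetector" are assumptions, and the delegate would still need its required parameters set before it runs.

```xml
<!-- Sketch only: replaces the corresponding elements of the aggregate descriptor. -->
<delegateAnalysisEngineSpecifiers>
  <delegateAnalysisEngine key="SentenceDetector">
    <!-- Assumed file name for the SentenceDetector descriptor. -->
    <import location="SentenceDetector.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="Tokenizer">
    <import location="Tokenizer.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="PosTagger">
    <import location="PosTagger.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>

<!-- Inside <analysisEngineMetaData>: run the delegates in pipeline order. -->
<flowConstraints>
  <fixedFlow>
    <node>SentenceDetector</node>
    <node>Tokenizer</node>
    <node>PosTagger</node>
  </fixedFlow>
</flowConstraints>
```

Each node in the fixedFlow must match the key of a delegate; the order of the delegateAnalysisEngine entries themselves does not determine the execution order.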


I would highly appreciate any help!

Thanks.

Toby

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  AnalysisEngine (Tokenizer+PosTagger from opennlp.uima)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


<?xml version="1.0" encoding="UTF-8"?>

<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">

<frameworkImplementation>org.apache.uima.java</frameworkImplementation>
<primitive>false</primitive>
<delegateAnalysisEngineSpecifiers>
<delegateAnalysisEngine key="PosTagger">
<import location="PosTagger.xml"/>
</delegateAnalysisEngine>
<delegateAnalysisEngine key="Tokenizer">
<import location="Tokenizer.xml"/>
</delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>
<analysisEngineMetaData>
<name>OpenNLPAggregate</name>

<description>Aggregate analysis engine that performs sentence detection, tokenization, and POS tagging using OpenNLP.</description>

<version>1.0</version>
<vendor>The Apache Software Foundation</vendor>
<configurationParameters searchStrategy="language_fallback"/>
<configurationParameterSettings/>
<flowConstraints>
<fixedFlow>
<node>Tokenizer</node>
<node>PosTagger</node>
</fixedFlow>
</flowConstraints>
<typePriorities/>
<fsIndexCollection/>
<capabilities>
<capability>
<inputs/>
<outputs>
<type allAnnotatorFeatures="true">uima.tcas.DocumentAnnotation</type>
<type allAnnotatorFeatures="true">uima.tcas.Annotation</type>
</outputs>
<languagesSupported/>
</capability>
</capabilities>
<operationalProperties>
<modifiesCas>true</modifiesCas>
<multipleDeploymentAllowed>true</multipleDeploymentAllowed>
<outputsNewCASes>false</outputsNewCASes>
</operationalProperties>
</analysisEngineMetaData>
<resourceManagerConfiguration/>
</analysisEngineDescription>
