Hi,
I wrote some components useful for integrating UIMA components into the
Spring framework.
These components are Spring FactoryBeans that can produce
CasProcessors/Consumers, CollectionReaders and type systems.
A component can be built "totally programmatically", from a descriptor,
or from a PEAR.
I would like to release these components to the community, if that sounds good.
This work builds on code posted by Steven Bethard on this mailing list.
Thanks a lot, Steven!
Here are some usage examples:
<!-- collection reader -->
<bean name="cr" class="it.celi.uima.bean.CollectionReaderFactoryBean"
parent="baseAnnotator">
<property name="componentClass"
value="it.celi.components.collection.RecursiveFileSytemCollectionReader"
/>
<property name="configurationParameters">
<map>
<entry key="application" value="language" />
<entry key="language" value="it" />
</map>
</property>
</bean>
where baseAnnotator is:
<bean name="baseAnnotator"
class="it.celi.uima.bean.AbstractUIMAComponentsFactoryBean"
abstract="true">
<property name="typeSystem" ref="typeSystem" />
</bean>
<bean name="typeSystem" class="it.celi.uima.bean.TypeSytemFactoryBean">
<property name="typeSytemPath"
value="file:../dd4-typeSystem/src/main/resources/CeliTypeSystem.xml"
/>
</bean>
Processors/consumers:
<bean name="sentenceAnnotator"
class="it.celi.uima.bean.CasProcessorFactoryBean"
parent="baseAnnotator">
<property name="componentClass"
value="it.celi.annotators.language.SentenceAnnotator" />
<property name="configurationParameters">
<map>
<entry key="abbreviationsFiles"
value="abbreviations_*.txt" />
<entry key="additionalSeparatorsFiles"
value="sentenceSeparators_*.txt" />
</map>
</property>
</bean>
<bean name="xslSerializerCasConsumer"
class="it.celi.uima.bean.CasConsumerFactoryBean"
parent="baseAnnotator">
<property name="componentClass"
value="it.celi.components.consumer.XslSerializerCasConsumer" />
<property name="configurationParameters">
<map>
<entry key="fileExtension" value=".xml" />
</map>
</property>
</bean>
PEAR files (overriding configuration parameters is not allowed!):
<bean name="japeAnnotator"
class="it.celi.uima.bean.CasProcessorFactoryBean">
<property name="descriptorPath"
value="file:./pears/JapeAnnotator.pear" />
<property name="redeployPear" value="true"/>
<property name="configurationParameters">
<map>
</map>
</property>
</bean>
From a descriptor, with parameter overrides:
<bean name="regExpTokenizer"
class="it.celi.uima.bean.CasProcessorFactoryBean">
<property name="descriptorPath"
value="file:./desc/RegExpTokenizer.xml" />
<property name="configurationParameters">
<map>
<entry key="commandsFileName"
value="commands_tokenizer_*.xml" />
</map>
</property>
</bean>
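To make the three modes above concrete (component class, descriptor, PEAR), here is a rough sketch of how a factory bean could decide which one applies. It is not the real it.celi.uima.bean code: ComponentSpec, Mode and resolveMode are hypothetical names, detecting a PEAR by its file extension is an assumption, and so is enforcing "override not allowed" by rejecting a non-empty parameter map. The sketch is plain Java with no UIMA or Spring types, so it only shows the decision, not the instantiation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the settings Spring injects into one of the
 * factory beans. Not the real it.celi.uima.bean implementation.
 */
class ComponentSpec {

    /** The three ways a component can be built, as listed in this mail. */
    enum Mode { PROGRAMMATIC, DESCRIPTOR, PEAR }

    private final String componentClass;
    private final String descriptorPath;
    private final Map<String, Object> configurationParameters;

    ComponentSpec(String componentClass, String descriptorPath,
                  Map<String, Object> configurationParameters) {
        this.componentClass = componentClass;
        this.descriptorPath = descriptorPath;
        // Treat a missing map as "no overrides" and copy it defensively.
        this.configurationParameters = configurationParameters == null
                ? Collections.<String, Object>emptyMap()
                : new LinkedHashMap<String, Object>(configurationParameters);
    }

    Mode resolveMode() {
        if (descriptorPath != null) {
            // Assumption: a PEAR is recognised by its file extension.
            if (descriptorPath.toLowerCase().endsWith(".pear")) {
                // Assumption: "override not allowed" means any parameter is rejected.
                if (!configurationParameters.isEmpty()) {
                    throw new IllegalStateException(
                            "configuration parameters cannot be overridden for a PEAR: "
                                    + descriptorPath);
                }
                return Mode.PEAR;
            }
            return Mode.DESCRIPTOR;
        }
        if (componentClass != null) {
            return Mode.PROGRAMMATIC;
        }
        throw new IllegalStateException("set either componentClass or descriptorPath");
    }
}
```

With this sketch, the japeAnnotator bean (PEAR path, empty map) resolves to PEAR, the RegExpTokenizer descriptor bean resolves to DESCRIPTOR, and the sentenceAnnotator bean (componentClass only) resolves to PROGRAMMATIC.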
A simple use case could be:
Configuration:
<bean name="cpm" class="org.apache.uima.UIMAFramework"
factory-method="newCollectionProcessingManager">
</bean>
<bean name="uimaCPM" class="it.celi.uima.engine.CpmUIMAEngine">
<property name="cpm" ref="cpm" />
<property name="listeners">
</property>
<property name="readers">
<list>
<ref bean="rfcr" />
</list>
</property>
<property name="processors">
<list>
<ref bean="sentenceAnnotator" />
<ref bean="regExpTokenizer" />
<ref bean="japeAnnotator" />
</list>
</property>
<property name="consumers">
<list>
<ref bean="xslSerializerCasConsumer" />
</list>
</property>
</bean>
The last bean is a CPM wrapper that internally does the following.
Methods to add consumers and processors to the CPM (the lists are injected by
the configuration above):
private void addAllConsumersToCpm() {
    for (CasConsumer casConsumer : consumers) {
        String name = casConsumer.getProcessingResourceMetaData().getName();
        try {
            logger.info("adding consumer to pipeline::" + name);
            cpm.addCasConsumer(casConsumer);
        } catch (ResourceConfigurationException e) {
            // log and continue, so one bad consumer does not stop the others
            logger.error("unable to add consumer :: " + name, e);
        }
    }
}

private void addAllProcessorToCpm() {
    for (CasProcessor casProcessor : processors) {
        String name = casProcessor.getProcessingResourceMetaData().getName();
        try {
            logger.info("adding processor to pipeline::" + name);
            cpm.addCasProcessor(casProcessor);
        } catch (ResourceConfigurationException e) {
            // log and continue, so one bad processor does not stop the others
            logger.error("unable to add processor :: " + name, e);
        }
    }
}
and then a method can do:
cpm.setCollectionReader(reader);
cpm.process();
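The two add methods above differ only in the component type and the CPM call, so they could probably be folded into one generic helper. Below is a minimal sketch of that idea in plain Java, with no UIMA types: PipelineRegistrar, Adder and addAll are hypothetical names made up for illustration, and returning the list of failed names is an addition of the sketch, not something the wrapper does today. It keeps the same behaviour as the loops above: a component that cannot be added is reported and skipped, and the remaining ones are still added in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper, not part of the wrapper described in this mail. */
class PipelineRegistrar {

    /** Stand-in for a CPM call such as addCasConsumer, which may throw a checked exception. */
    interface Adder<T> {
        void add(T component) throws Exception;
    }

    /**
     * Adds every component in list order. A failure is reported on stderr
     * and the loop continues. Returns the names of the components that failed.
     */
    static <T> List<String> addAll(String kind, List<T> components,
                                   Function<T, String> nameOf, Adder<T> adder) {
        List<String> failed = new ArrayList<>();
        for (T component : components) {
            String name = nameOf.apply(component);
            try {
                adder.add(component);
            } catch (Exception e) {
                failed.add(name);
                System.err.println("unable to add " + kind + " :: " + name
                        + " (" + e.getMessage() + ")");
            }
        }
        return failed;
    }
}
```

Inside the wrapper, the call for consumers would presumably look something like addAll("consumer", consumers, c -> c.getProcessingResourceMetaData().getName(), cpm::addCasConsumer), but I have not compiled that against UIMA.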
Some advantages:
-only one simple file to configure a cpm
-easy to inject components
-easy to embed cpm/AE inside existing applications
-can use SpringIDE inside Eclipse
-....whatever?
Disadvantages:
-if you don't use Spring, there's another framework to learn
-you can't use the Eclipse UIMA plugins to edit/manage descriptors
-Aggregates are not supported programmatically (via descriptors there is
no problem)
-....whatever?
Is it interesting? Let me know.
Roberto
--
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:[EMAIL PROTECTED] skype:ro.franchini